Improving classification of tweets using word-word co-occurrence information from a large external corpus
Chapter, Peer reviewed
Accepted version
Date
2016
Original version
Hammer HL, Yazidi A, Bai A, Engelstad P.E.: Improving classification of tweets using word-word co-occurrence information from a large external corpus. In: Ossowski S. Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16), 2016. Association for Computing Machinery (ACM) p. 1174-1177 http://dx.doi.org/10.1145/2851613.2851986
Abstract
Classifying tweets is an intrinsically hard task, as tweets are
short messages, which makes traditional bag-of-words
approaches inefficient. In fact, bag-of-words approaches
ignore relationships between important terms that do not
co-occur literally.
In this paper we resort to word-word co-occurrence information
from a large corpus to expand the vocabulary of another
corpus consisting of tweets. Our results show that we are
able to reduce the number of erroneous classifications by
14% using co-occurrence information.
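
The general idea of expanding a short text with co-occurring words can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how co-occurrence is counted or how expansion terms are selected, so the sentence-level counting, the top_k cutoff, and the function names build_cooccurrence and expand_tokens are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(corpus_sentences):
    """Count how often two distinct words appear in the same sentence.

    Assumption: co-occurrence is measured per sentence of the external corpus.
    """
    counts = defaultdict(Counter)
    for sentence in corpus_sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def expand_tokens(tweet, cooccurrence, top_k=2):
    """Return the tweet's tokens followed by each token's top_k co-occurring words.

    Assumption: the most frequent co-occurring words are the ones added.
    """
    tokens = tweet.lower().split()
    expanded = list(tokens)
    for token in tokens:
        neighbours = cooccurrence.get(token, Counter())
        for neighbour, _ in neighbours.most_common(top_k):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


# Tiny stand-in for the large external corpus (hypothetical data).
external_corpus = [
    "the striker scored a goal",
    "the goal won the match",
    "striker and goal",
]
cooccurrence = build_cooccurrence(external_corpus)
expanded = expand_tokens("striker", cooccurrence, top_k=1)
```

The expanded token list could then be passed to any bag-of-words classifier in place of the original tweet tokens, so that two tweets sharing no literal terms may still overlap through their added words.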