Improving classification of tweets using word-word co-occurrence information from a large external corpus
Chapter, Peer reviewed
Original version: Hammer HL, Yazidi A, Bai A, Engelstad P.E.: Improving classification of tweets using word-word co-occurrence information from a large external corpus. In: Ossowski S. Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16), 2016. Association for Computing Machinery (ACM), pp. 1174-1177. http://dx.doi.org/10.1145/2851613.2851986
Classifying tweets is an intrinsically hard task: tweets are short messages, which makes traditional bag-of-words approaches inefficient. In particular, bag-of-words approaches ignore relationships between important terms that do not literally co-occur. In this paper, we use word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Our results show that co-occurrence information reduces the number of erroneous classifications by 14%.
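The abstract does not spell out how the expansion is performed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes co-occurrence is counted at document level in the external corpus, and that each tweet's bag of words is extended with the most frequent co-occurring terms of its words. The function names `cooccurrence_counts` and `expand_tweet`, the parameters `top_k` and `min_count`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy stand-in for the "large external corpus".
EXTERNAL_CORPUS = [
    "striker scored goal",
    "striker goal penalty",
    "keeper saved penalty",
]


def cooccurrence_counts(corpus):
    """Count, for every pair of distinct words, the documents containing both.

    Assumption: co-occurrence is measured per document; a sliding
    window over the text is a common alternative.
    """
    counts = defaultdict(Counter)
    for doc in corpus:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def expand_tweet(tweet, counts, top_k=2, min_count=2):
    """Return the tweet's bag of words extended with co-occurring terms.

    Hypothetical expansion rule: for each word in the tweet, consider its
    top_k most frequent co-occurring words and add those that co-occurred
    at least min_count times and are not already in the tweet.
    """
    bag = Counter(tweet.lower().split())
    expansion = Counter()
    for word in bag:
        # .get avoids inserting empty entries into the defaultdict.
        for neighbour, n in counts.get(word, Counter()).most_common(top_k):
            if n >= min_count and neighbour not in bag:
                expansion[neighbour] += 1
    return bag + expansion
```

With these defaults, the tweets "what a goal" and "nice striker" share no words before expansion, but afterwards both contain "goal" and "striker", which illustrates how external co-occurrence can link terms that never appear together in the tweets themselves. Raising `min_count` or lowering `top_k` adds fewer, more reliable terms at the cost of coverage.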