Improving classification of tweets using word-word co-occurrence information from a large external corpus

Hammer HL, Yazidi A, Bai A, Engelstad P.E.: Improving classification of tweets using word-word co-occurrence information from a large external corpus. In: Ossowski S. Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16), 2016. Association for Computing Machinery (ACM) p. 1174-1177 http://dx.doi.org/10.1145/2851613.2851986

Abstract

Classifying tweets is an intrinsically hard task because tweets are short messages, which makes traditional bag-of-words approaches inefficient. In fact, bag-of-words approaches ignore relationships between important terms that do not co-occur literally.

In this paper we resort to word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Our results show that we are able to reduce the number of erroneous classifications by 14% using co-occurrence information.
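
The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes document-level co-occurrence counts and a top-k expansion rule; the function names `build_cooccurrence` and `expand_tweet`, the `top_k` parameter, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often each pair of distinct words appears in the same document.

    Assumption: co-occurrence is measured per document of the external corpus.
    """
    cooc = defaultdict(Counter)
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_tweet(tweet, cooc, top_k=1):
    """Return the tweet's words plus the top_k most co-occurring words per token.

    Assumption: expansion simply adds the strongest neighbours to the word set.
    """
    tokens = tweet.lower().split()
    expanded = set(tokens)
    for tok in tokens:
        # Words unseen in the external corpus contribute no neighbours.
        for neighbour, _count in cooc.get(tok, Counter()).most_common(top_k):
            expanded.add(neighbour)
    return expanded


# Toy stand-in for the large external corpus.
external_corpus = [
    "football goal striker",
    "football goal",
    "election vote",
]
cooc = build_cooccurrence(external_corpus)

# Two tweets with no literal word overlap share vocabulary after expansion.
tweet_a = expand_tweet("goal", cooc)
tweet_b = expand_tweet("football", cooc)
print(tweet_a & tweet_b)
```

In this toy setting the two single-word tweets have no term in common before expansion, but both expand to the same word set, which is the kind of non-literal relationship a plain bag-of-words representation misses.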

Publisher

Association for Computing Machinery (ACM)