dc.contributor.author | Hammer, Hugo Lewi | |
dc.contributor.author | Yazidi, Anis | |
dc.contributor.author | Bai, Aleksander | |
dc.contributor.author | Engelstad, Paal E. | |
dc.date.accessioned | 2017-01-25T10:24:18Z | |
dc.date.accessioned | 2017-03-17T10:14:52Z | |
dc.date.available | 2017-01-25T10:24:18Z | |
dc.date.available | 2017-03-17T10:14:52Z | |
dc.date.issued | 2016 | |
dc.identifier.citation | Hammer HL, Yazidi A, Bai A, Engelstad PE. Improving Classification of Tweets Using Linguistic Information from a Large External Corpus. In: Maglaras. Industrial Networks and Intelligent Systems, 2016. Springer, p. 122-134 | language |
dc.identifier.issn | 1867-8211 | |
dc.identifier.issn | 1867-822X | |
dc.identifier.uri | https://hdl.handle.net/10642/4326 | |
dc.description.abstract | The bag-of-words representation of documents is often unsatisfactory
as it ignores relationships between important terms that do not
co-occur literally. Improvements might be achieved by expanding the
vocabulary with other relevant words, such as synonyms.
In this paper we use word-word co-occurrence information from a large
corpus to expand the vocabulary of another corpus consisting of tweets.
Several methods for including the co-occurrence information
are constructed and tested on the classification of real Twitter data.
Our results show that we are able to reduce the number of erroneous
classifications by 14% using co-occurrence information. | language |
dc.rights | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-52569-3_11 | |
dc.title | Improving Classification of Tweets Using Linguistic Information from a Large External Corpus | language |
dc.type | Journal article | |
dc.type | Peer reviewed | |
dc.date.updated | 2017-01-25T10:24:18Z | |
dc.description.version | acceptedVersion | language |
dc.identifier.cristin | 1437272 | |
dc.source.isbn | 978-3-319-52568-6 | |