Improving Classification of Tweets Using Linguistic Information from a Large External Corpus

Hammer, Hugo Lewi; Yazidi, Anis; Bai, Aleksander; Engelstad, Paal E.

dc.contributor.author	Hammer, Hugo Lewi
dc.contributor.author	Yazidi, Anis
dc.contributor.author	Bai, Aleksander
dc.contributor.author	Engelstad, Paal E.
dc.date.accessioned	2017-01-25T10:24:18Z
dc.date.accessioned	2017-03-17T10:14:52Z
dc.date.available	2017-01-25T10:24:18Z
dc.date.available	2017-03-17T10:14:52Z
dc.date.issued	2016
dc.identifier.citation	Hammer HL, Yazidi A, Bai A, Engelstad P.E.: Improving Classification of Tweets Using Linguistic Information from a Large External Corpus. In: Maglaras. Industrial Networks and Intelligent Systems, 2016. Springer p. 122-134
dc.identifier.issn	1867-8211
dc.identifier.issn	1867-822X
dc.identifier.uri	https://hdl.handle.net/10642/4326
dc.description.abstract	The bag-of-words representation of documents is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. Improvements might be achieved by expanding the vocabulary with other relevant words, like synonyms. In this paper we use word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Several different methods for including the co-occurrence information are constructed and tested on the classification of real Twitter data. Our results show that we are able to reduce the number of erroneous classifications by 14% using co-occurrence information.
dc.rights	The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-52569-3_11
dc.title	Improving Classification of Tweets Using Linguistic Information from a Large External Corpus
dc.type	Journal article
dc.type	Peer reviewed
dc.date.updated	2017-01-25T10:24:18Z
dc.description.version	acceptedVersion
dc.identifier.cristin	1437272
dc.source.isbn	978-3-319-52568-6
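
The idea summarized in the abstract, expanding a tweet's bag of words with terms that co-occur with its words in a larger external corpus, can be illustrated with a minimal sketch. This is not the authors' method: the function names, the co-occurrence window, the `top_k` cutoff, and the `weight` given to added terms are all assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(corpus, window=2):
    """Count how often two different words appear within `window` tokens
    of each other in an external corpus (hypothetical helper)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[max(0, i - window): i + window + 1]:
                if other != word:
                    counts[word][other] += 1
    return counts


def expand_bag_of_words(tweet, counts, top_k=2, weight=0.5):
    """Add the `top_k` most frequent co-occurring words of each tweet word
    to the tweet's bag of words, down-weighted by `weight` (assumed scheme)."""
    bag = Counter(tweet.lower().split())
    expanded = Counter({word: float(freq) for word, freq in bag.items()})
    for word, freq in bag.items():
        neighbours = counts.get(word, Counter())
        for neighbour, _ in neighbours.most_common(top_k):
            expanded[neighbour] += weight * freq
    return expanded


# Toy external corpus; a real one would be far larger than the tweet set.
external_corpus = ["good movie", "good film"]
counts = cooccurrence_counts(external_corpus)
expanded = expand_bag_of_words("movie", counts, top_k=1)
```

In this toy run the tweet "movie" gains the term "good", so it now shares a feature with a tweet containing "film" even though the two tweets have no word in common. The expanded counts could then be passed to any standard classifier.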

Associated file(s)

Filename: Manuscript.pdf
Size: 287.7 KB
Format: PDF


This item appears in the following collection(s)

TKD - Institutt for informasjonsteknologi [945]
TKD - Department of Computer Science
