Building domain specific sentiment lexicons combining information from many sentiment lexicons and a domain specific corpus

Hammer, Hugo Lewi; Yazidi, Anis; Bai, Aleksander; Engelstad, Paal E.

Peer reviewed, Chapter

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-19578-0_17

Open

Postprint (256.4Kb)

Permanent link

https://hdl.handle.net/10642/3175

Publication date

2015

Collections

TKD - Institutt for informasjonsteknologi

Original version

Hammer, H., Yazidi, A., Bai, A., & Engelstad, P. (2015). Building Domain Specific Sentiment Lexicons Combining Information from Many Sentiment Lexicons and a Domain Specific Corpus. In A. Amine, L. Bellatreche, Z. Elberrichi, J. E. Neuhold, & R. Wrembel (Eds.), Computer Science and Its Applications: 5th IFIP TC 5 International Conference, CIIA 2015, Saida, Algeria, May 20-21, 2015, Proceedings (pp. 205-216). Cham: Springer International Publishing. 10.1007/978-3-319-19578-0_17

Abstract

Most approaches to sentiment analysis require a sentiment lexicon in order to automatically predict sentiment or opinion in a text. The lexicon is generated by selecting words and assigning scores to the words, and the performance of the sentiment analysis depends on the quality of the assigned scores. This paper addresses an aspect of sentiment lexicon generation that has been overlooked so far, namely that the most appropriate score assigned to a word in the lexicon depends on the domain. The common practice, on the contrary, is to use the same lexicon without adjustments across different domains, ignoring the fact that the scores are normally highly sensitive to the domain. Consequently, the same lexicon might perform well on one domain while performing poorly on another, unless some score adjustment is performed. In this paper, we advocate that a sentiment lexicon needs some further adjustments in order to perform well in a specific domain. In order to cope with these domain-specific adjustments, we adopt a stochastic formulation of the sentiment score assignment problem instead of the classical deterministic formulation. Viewing a sentiment score as a stochastic variable permits us to accommodate the domain-specific adjustments. Experimental results demonstrate the feasibility of our approach and its superiority to generic lexicons without domain adjustments.
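The abstract does not spell out the stochastic formulation, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes that a word's scores from several generic lexicons can be summarised as a normal prior, and that domain evidence arrives as numeric observations (for example, ratings of domain texts containing the word) that update the prior through a normal-normal conjugate update. The function names, the variance floor, the observation variance, and all data values are hypothetical.

```python
from statistics import mean, pvariance


def prior_from_lexicons(scores, min_var=0.25):
    """Summarise one word's scores across generic lexicons as (mean, variance).

    min_var is an assumed floor so that lexicons agreeing exactly
    do not produce a zero-variance (immovable) prior.
    """
    return mean(scores), max(pvariance(scores), min_var)


def posterior_score(prior_mu, prior_var, domain_obs, obs_var=1.0):
    """Normal-normal conjugate update of a word score with domain observations.

    obs_var is an assumed, known noise variance of each domain observation.
    """
    n = len(domain_obs)
    if n == 0:
        # No domain evidence: the generic score is kept unchanged.
        return prior_mu, prior_var
    precision = 1.0 / prior_var + n / obs_var
    post_mu = (prior_mu / prior_var + sum(domain_obs) / obs_var) / precision
    return post_mu, 1.0 / precision


# Hypothetical word that generic lexicons rate as positive,
# but that co-occurs with negative ratings in the target domain.
generic_scores = [2.0, 1.0, 3.0]
domain_obs = [-1.0, -2.0, -1.0, -2.0]

mu0, var0 = prior_from_lexicons(generic_scores)
mu1, var1 = posterior_score(mu0, var0, domain_obs)
print(f"generic score: {mu0:.2f}, domain-adjusted score: {mu1:.2f}")
```

In this sketch, a word with little domain evidence stays close to its generic score, while a word with plenty of domain evidence is pulled toward what the domain corpus suggests, which is the kind of adjustment a deterministic single-score lexicon cannot express.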

Publisher

Springer

Series

Computer Science and Its Applications: 5th IFIP TC 5 International Conference, CIIA 2015, Saida, Algeria, May 20-21, 2015, Proceedings