Tsetlin Machine for Sentiment Analysis and Spam Review Detection in Chinese

In Natural Language Processing (NLP), deep-learning neural networks have superior performance but pose transparency and explainability barriers, due to their black box nature, and, thus, there is lack of trustworthiness. On the other hand, classical machine learning techniques are intuitive and easy to understand but often cannot perform satisfactorily. Fortunately, many research studies have recently indicated that the newly introduced model, Tsetlin Machine (TM), has reliable performance and, at the same time, enjoys human-level interpretability by nature, which is a promising approach to trade off effectiveness and interpretability. However, nearly all of the related works so far have concentrated on the English language, while research on other languages is relatively scarce. So, we propose a novel method, based on the TM model, in which the learning process is transparent and easily-understandable for Chinese NLP tasks. Our model can learn semantic information in the Chinese language by clauses. For evaluation, we conducted experiments in two domains, namely sentiment analysis and spam review detection. The experimental results showed thatm for both domains, our method could provide higher accuracy and a higher F1 score than complex, but non-transparent, deep-learning models, such as BERT and ERINE.

Tidsskrift

Algorithms

Med mindre annet er angitt, så er denne innførselen lisensiert som Navngivelse 4.0 Internasjonal