SentiCode: A new paradigm for one-time training and global prediction in multilingual sentiment analysis.
Mohamed Raouf KanfoudAbdelkrim BouramoulPublished in: Journal of intelligent information systems (2022)
The main objective of multilingual sentiment analysis is to analyze reviews regardless of the original language in which they are written. Switching from one language to another is very common on social media platforms. Analyzing these multilingual reviews is a challenge since each language is different in terms of syntax, grammar, etc. This paper presents a new language-independent representation approach for sentiment analysis, SentiCode. Unlike previous work in multilingual sentiment analysis, the proposed approach does not rely on machine translation to bridge the gap between different languages. Instead, it exploits common features of languages, such as part-of-speech tags used in Universal Dependencies. Equally important, SentiCode enables sentiment analysis in multi-language and multi-domain environments simultaneously. Several experiments were conducted using machine/deep learning techniques to evaluate the performance of SentiCode in multilingual (English, French, German, Arabic, and Russian) and multi-domain environments. In addition, the vocabulary proposed by SentiCode and the effect of each token were evaluated by the ablation method. The results highlight the 70% accuracy of SentiCode, with the best trade-off between efficiency and computing time (training and testing) in a total of about 0.67 seconds, which is very convenient for real-time applications.