Generalized approach to sentiment analysis of short text messages in natural language processing
Keywords:
natural language processing, machine learning, deep learning, vectorization, modeling, preprocessing, automatic machine learning, transfer learning
Abstract
Introduction: Sentiment analysis is a complex problem whose solution depends essentially on the context, the field of study, and the
amount of text data. An analysis of publications shows that authors often do not use the full range of possible data transformations
and their combinations. Only some of the transformations are used, which limits the ways of developing high-quality classification models.
Purpose: Developing and exploring a generalized approach to building a model that consists of passing sequentially through the stages of exploratory data analysis, obtaining a baseline solution, vectorization, preprocessing, hyperparameter optimization, and modeling.
Results: Comparative experiments with classical machine learning and deep learning algorithms, conducted using the generalized approach to solve the problem of sentiment analysis of short text messages, have demonstrated that classification quality increases from one stage to the next. For classical algorithms this increase was insignificant, whereas for deep learning it averaged 8% per stage. Additional studies have shown that automatic machine learning based on classical classification algorithms is comparable in quality to manual model development but takes much longer. Transfer learning has a small but positive effect on classification quality.
Practical relevance: The proposed sequential approach can significantly improve the quality of models developed for natural language processing problems.
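The staged procedure named under Purpose (exploratory data analysis, baseline, vectorization, preprocessing, hyperparameter optimization) can be illustrated with a minimal sketch that scores a classifier after each stage. This is not the authors' implementation. It assumes Python with only the standard library, a hypothetical six-sentence toy dataset, multinomial naive Bayes as a stand-in classifier, a majority-class baseline, bag-of-words vectorization, lowercasing and punctuation stripping as the preprocessing step, and a small hypothetical grid over the smoothing parameter `alpha`. None of these choices comes from the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data for illustration only: (text, label) pairs.
TRAIN = [
    ("Great movie, loved it!", "pos"),
    ("What a wonderful day", "pos"),
    ("I love this phone", "pos"),
    ("Terrible service, never again.", "neg"),
    ("I hate waiting in line", "neg"),
    ("Awful, boring film", "neg"),
]
VALID = [
    ("loved the film", "pos"),
    ("boring and awful", "neg"),
    ("great phone", "pos"),
    ("I hate this", "neg"),
]


def raw_tokens(text):
    # Vectorization without preprocessing: split on whitespace only.
    return text.split()


def clean_tokens(text):
    # Preprocessing (assumed): lowercase and keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with additive smoothing."""

    def __init__(self, tokenize, alpha=1.0):
        self.tokenize = tokenize
        self.alpha = alpha  # smoothing parameter; must be > 0

    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = defaultdict(Counter)
        for text, label in data:
            self.word_counts[label].update(self.tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in self.tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predict, data):
    return sum(predict(text) == label for text, label in data) / len(data)


def run_stages(train, valid):
    scores = {}
    # Exploratory data analysis: inspect the class balance.
    print("class distribution:", Counter(label for _, label in train))
    # Baseline: always predict the most frequent training class.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    scores["baseline"] = accuracy(lambda text: majority, valid)
    # Vectorization: bag of raw whitespace tokens.
    scores["vectorization"] = accuracy(NaiveBayes(raw_tokens).fit(train).predict, valid)
    # Preprocessing: normalized tokens, same default alpha.
    scores["preprocessing"] = accuracy(NaiveBayes(clean_tokens).fit(train).predict, valid)
    # Hyperparameter optimization: grid search over alpha (hypothetical grid).
    scores["tuning"] = max(
        accuracy(NaiveBayes(clean_tokens, a).fit(train).predict, valid)
        for a in (0.1, 0.5, 1.0, 2.0)
    )
    return scores


if __name__ == "__main__":
    for stage, score in run_stages(TRAIN, VALID).items():
        print(f"{stage}: {score:.2f}")
```

Because the tuning grid includes the default `alpha=1.0`, the tuned score can never fall below the preprocessing score in this sketch. Selecting `alpha` on the same validation set that is reported makes that last number optimistic, so a separate held-out test set would be needed for an unbiased comparison of stages.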