TY - JOUR
T1 - Data-based prediction of sentiments using heterogeneous model ensembles
AU - Winkler, Stephan
AU - Schaller, Susanne
AU - Dorfer, Viktoria
AU - Affenzeller, Michael
AU - Petz, Gerald
AU - Karpowicz, Michal Jan
N1 - Publisher Copyright: © 2014, Springer-Verlag Berlin Heidelberg.
PY - 2015/12/1
Y1 - 2015/12/1
AB - In this paper, we present an ensemble modeling approach for sentiment analysis using machine learning algorithms. The main goal of sentiment analysis is to develop estimators that can identify the sentiment orientation (positive, negative, or neutral) of sentences from arbitrary sources. The novel approach presented here relies on analyzing the words found in sentences and on forming large sets of heterogeneous models, i.e., binary as well as multi-class classification models trained by various machine learning methods; these models represent the relationship between the presence of given words (or combinations of words) and sentiments. All models trained during the learning phase are applied during the test phase, and the final sentiment assessment is annotated with a confidence value that specifies how reliable the models are regarding the presented decision. In the empirical part of this paper, we show results achieved using a German corpus of Amazon reviews and a set of machine learning methods (decision trees and adaptive boosting, Gaussian processes, random forests, k-nearest neighbor classification, support vector machines and artificial neural networks with evolutionary feature and parameter optimization, and genetic programming). Using a heterogeneous model ensemble learning approach that combines multi-class classifiers with binary classifiers, the classification accuracy can be increased significantly, and the ratio of totally wrongly classified samples (i.e., those assigned to the completely opposite sentiment orientation) can be decreased significantly.
KW - Evolutionary computation
KW - Heterogeneous model ensembles
KW - Machine learning
KW - Sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=84947127449&partnerID=8YFLogxK
U2 - 10.1007/s00500-014-1325-6
DO - 10.1007/s00500-014-1325-6
M3 - Article
SN - 1433-7479
VL - 19
SP - 3401
EP - 3412
JO - Soft Computing
JF - Soft Computing
IS - 12
ER -