Sentiment Analysis Based on Aspects Using FastText Feature Expansion and NBSVM Classification Method


  • Sukmawati Dwi Lestari, Telkom University, Bandung, Indonesia
  • Erwin Budi Setiawan *, Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Sentiment Analysis Based on Aspect; Feature Expansion; FastText; NBSVM; Handling Imbalanced Data

Abstract

Telkomsel is a telecommunications service widely used in Indonesia. Users often post tweets about Telkomsel's service and signal, in language ranging from harsh to favorable, because they continue to demand better service. Aspect-based sentiment analysis makes it possible to determine users' opinions on each of these aspects separately. In this study, NBSVM is used as the classification model and is shown to perform well compared with two other methods, MNB and SVM. FastText feature expansion affects model performance; the best results are obtained with the Top 1 features on the signal aspect and the Top 5 features on the service aspect, using a corpus that combines Twitter and news data. The data used are imbalanced, and the imbalance is handled by applying SMOTE and AdaBoost to the FastText feature expansion model. In the tests carried out, SMOTE handles the data imbalance better than AdaBoost. With SMOTE applied, the FastText feature expansion model achieves an F1-Score of 91.24% on the signal aspect and 88.75% on the service aspect.
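
The abstract names NBSVM without describing it. As a minimal sketch, assuming the usual formulation in which Naive Bayes log-count ratios rescale the features before a linear SVM is trained, the ratio step could look like the following. The toy matrix, the three-word vocabulary, the smoothing value `alpha`, and the function names are all hypothetical illustrations rather than the study's data or code, and the SVM training step is left out.

```python
import math


def log_count_ratio(rows, labels, alpha=1.0):
    """Naive Bayes log-count ratio for binary labels (1 = positive, 0 = negative).

    rows   : list of equal-length feature vectors (e.g. binary term indicators)
    labels : list of 0/1 class labels, one per row
    alpha  : additive smoothing (assumed value, not taken from the article)
    """
    n_features = len(rows[0])
    p = [alpha] * n_features  # smoothed feature totals over positive rows
    q = [alpha] * n_features  # smoothed feature totals over negative rows
    for row, label in zip(rows, labels):
        target = p if label == 1 else q
        for j, value in enumerate(row):
            target[j] += value
    p_norm, q_norm = sum(p), sum(q)
    return [math.log((p[j] / p_norm) / (q[j] / q_norm)) for j in range(n_features)]


def scale_rows(rows, r):
    """Element-wise product of every row with r; a linear SVM would be fit on this."""
    return [[value * r_j for value, r_j in zip(row, r)] for row in rows]


# Hypothetical toy data; columns stand for the terms ["fast", "slow", "signal"].
X = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
y = [1, 1, 0, 0]

r = log_count_ratio(X, y)
X_scaled = scale_rows(X, r)
```

In this toy case the first term appears only in positive rows and gets a positive ratio, the second appears only in negative rows and gets a negative ratio, and the third is evenly split so its ratio is zero. The abstract does not state how the study configured the SVM or the smoothing, so those choices remain open here.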


Article History
Submitted: 2022-08-26
Published: 2022-09-05
Abstract View: 827 times
PDF Download: 436 times
Section
Articles