Sentiment Analysis Based on Aspects Using FastText Feature Expansion and NBSVM Classification Method
Abstract
Telkomsel is a mobile telecommunications service widely used in Indonesia. Users often post opinions about its service and signal quality on Twitter, in language that ranges from harsh to complimentary, because they continue to demand better service. Aspect-based sentiment analysis is therefore needed to determine users' opinions on each aspect, here Telkomsel's service and signal. This study uses NBSVM as the classification model, which performed well compared with the other methods tested, namely MNB and SVM. FastText feature expansion affects model performance; the best results were obtained with the Top 1 feature on the signal aspect and the Top 5 features on the service aspect, using a combined Twitter and news corpus. The data used in this study are imbalanced, which was handled by applying the SMOTE and AdaBoost techniques to the FastText feature expansion model. In the tests carried out, SMOTE handled the data imbalance better than AdaBoost. After SMOTE was applied, the FastText feature expansion model achieved an F1-score of 91.24% on the signal aspect and 88.75% on the service aspect.
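To make the classifier named above more concrete, the following is a minimal sketch of the naive Bayes log-count ratio that NBSVM, as formulated by Wang and Manning (2012), uses to rescale bag-of-words features before a linear SVM is trained. It is not the authors' implementation: the function names `log_count_ratio` and `nb_features`, the smoothing value `alpha=1.0`, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: r} with r = log((p / ||p||_1) / (q / ||q||_1)).

    p and q are smoothed document counts for the positive (label 1)
    and negative (label 0) class. alpha is an assumed smoothing value.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        target = pos if label == 1 else neg
        # Binarized counts: each token is counted once per document.
        target.update(set(doc))
    p = {t: alpha + pos[t] for t in vocab}
    q = {t: alpha + neg[t] for t in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {t: math.log((p[t] / p_norm) / (q[t] / q_norm)) for t in vocab}


def nb_features(doc, ratio):
    """Scale a binarized bag-of-words vector elementwise by the ratio."""
    present = set(doc)
    return [ratio[t] if t in present else 0.0 for t in sorted(ratio)]


# Toy, made-up tokenized tweets (1 = positive, 0 = negative).
# These are illustrative only and are not taken from the study's data.
docs = [
    ["sinyal", "bagus"],
    ["sinyal", "lemot"],
    ["layanan", "bagus"],
    ["layanan", "lemot"],
]
labels = [1, 0, 1, 0]

ratio = log_count_ratio(docs, labels)
features = [nb_features(doc, ratio) for doc in docs]
```

In this toy data, tokens seen mostly in positive tweets receive a positive ratio, tokens seen mostly in negative tweets receive a negative one, and tokens split evenly between the classes receive zero. The scaled vectors would then be passed to a linear SVM (for example scikit-learn's `LinearSVC`), which is the step this sketch leaves out.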
Pages: 469-477
Copyright (c) 2022 Sukmawati Dwi Lestari, Erwin Budi Setiawan
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).