Sentiment Analysis of Tokopedia Customer Reviews using IndoBERT and SMOTE for Class Imbalance Handling


  • Imam Saputra (*), Sekolah Tinggi Ilmu Manajemen Sukma Medan, Medan, Indonesia
  • Mesran Mesran, Sekolah Tinggi Ilmu Manajemen Sukma Medan, Medan, Indonesia
  • Guidio Leonarde Ginting, Sekolah Tinggi Ilmu Manajemen Sukma Medan, Medan, Indonesia
  • (*) Corresponding Author
Keywords: Sentiment Analysis; IndoBERT; SMOTE; Class Imbalance; E-commerce; Natural Language Processing

Abstract

Sentiment analysis in the Indonesian e-commerce sector is challenging because reviews are written in informal language and the classes are severely imbalanced, with neutral reviews often underrepresented. This research proposes a hybrid framework that combines the deep semantic capabilities of IndoBERT with the Synthetic Minority Over-sampling Technique (SMOTE) to improve classification fairness. Using a dataset of Tokopedia customer reviews, the study compares a baseline model against a balanced model in which SMOTE is applied to 768-dimensional IndoBERT features. The baseline model achieved a high overall accuracy of 83% but suffered from an "accuracy paradox": its recall for the neutral class was only 0.07. With SMOTE, neutral-class recall rose to 0.29, a relative improvement of about 314% in minority class detection. Although overall accuracy decreased slightly to 81%, the Macro Average F1-Score increased from 0.61 to 0.65, indicating that the balanced model is more robust and more reliable across all sentiment polarities. The study demonstrates that trading a marginal loss in accuracy for improved minority sensitivity is vital for providing accurate business intelligence in the digital marketplace. These findings offer a roadmap for developing more equitable automated sentiment analysis systems in Indonesia.
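
The oversampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_oversample`, the toy 3-dimensional vectors (stand-ins for 768-dimensional IndoBERT features), the neighbour count `k`, and the class sizes are all assumptions made for illustration. The sketch shows only the core SMOTE idea, namely that each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; a real pipeline would more likely
    rely on an established library implementation of SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy stand-ins for neutral-class feature vectors (hypothetical data).
neutral = [[0.0, 0.0, 1.0], [0.2, 0.1, 0.9], [0.1, 0.3, 0.8]]
n_majority = 10  # hypothetical size of the largest class

# Generate enough synthetic neutral samples to match the largest class.
new_points = smote_oversample(neutral, n_new=n_majority - len(neutral), k=2)
balanced_neutral = neutral + new_points
print(len(balanced_neutral))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class in feature space. The reported relative gain in neutral recall follows from the abstract's own figures: (0.29 - 0.07) / 0.07 ≈ 3.14, that is, about 314%.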




Article History
Submitted: 2025-11-18
Published: 2025-11-30
Section
Articles