Sentiment Analysis of Tokopedia Customer Reviews using IndoBERT and SMOTE for Class Imbalance Handling
Abstract
Sentiment analysis in the Indonesian e-commerce sector is complicated by informal language and severe class imbalance, with neutral reviews typically underrepresented. This research proposes a hybrid framework that combines the semantic representations of IndoBERT with the Synthetic Minority Over-sampling Technique (SMOTE) to improve classification fairness. Using a dataset of Tokopedia customer reviews, the study compares a baseline model against a balanced model obtained by applying SMOTE to 768-dimensional IndoBERT features. The baseline achieved a high overall accuracy of 83% but exhibited an "accuracy paradox": its recall for the neutral class was only 0.07. With SMOTE, neutral-class recall rose to 0.29, a relative improvement of roughly 314% in minority-class detection. Although overall accuracy decreased slightly to 81%, the macro-average F1-score increased from 0.61 to 0.65, indicating that the balanced model performs more evenly across all sentiment polarities. The study argues that trading a marginal loss in accuracy for better minority-class sensitivity is worthwhile when sentiment analysis feeds business intelligence in the digital marketplace. These findings offer guidance for developing more equitable automated sentiment analysis systems in Indonesia.
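The abstract does not include code, so the following is only a minimal sketch of the core SMOTE step (interpolating between a minority sample and one of its nearest minority neighbours) together with the relative-improvement arithmetic behind the reported recall gain. The function names `smote_oversample` and `relative_gain`, the choice of `k`, and the toy 2-D vectors standing in for 768-dimensional IndoBERT features are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea).
    Hypothetical helper; requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between `base` and `neighbour`.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def relative_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Toy 2-D stand-ins for the neutral-class feature vectors (assumption).
neutral = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(neutral, n_new=6)
print(len(new_points))

# Neutral-class recall reported in the abstract: 0.07 before, 0.29 after.
print(round(relative_gain(0.07, 0.29)))
```

The last line reproduces the roughly 314% figure: (0.29 - 0.07) / 0.07 is about 3.14. Because every synthetic point is a convex combination of two existing minority points, SMOTE adds no information from outside the region the minority class already occupies, which is consistent with the modest, rather than dramatic, gain in macro-average F1.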
Pages: 20-27
Copyright (c) 2025 Imam Saputra, Mesran Mesran, Guidio Leonarde Ginting

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).