Customer Sentiment Analysis of E-Commerce Products Using the Naïve Bayes Method and Word Embedding
Abstract
This study analyzes customer sentiment toward e-commerce products using the Naïve Bayes method combined with word embedding techniques to improve the semantic understanding of Indonesian-language customer reviews. The study is motivated by the rapid growth of e-commerce, which has created a strong need to understand consumer opinions expressed in online reviews. The main challenge in sentiment analysis lies in the complexity of natural language, including informal words, abbreviations, and diverse emotional expressions. The dataset comprises 40,607 Tokopedia customer reviews across five product categories, labeled with three sentiment classes (positive, neutral, and negative). The research stages are data collection, text preprocessing (case folding, tokenization, stopword removal, stemming, and slang normalization), feature representation using Word2Vec and FastText, and classification using Multinomial Naïve Bayes. Experimental results show that Word2Vec with Naïve Bayes achieved an accuracy of 87.92%, while FastText with Naïve Bayes achieved 91.52%. The FastText-based model handled morphological variations and non-standard spellings better, making it more effective for Indonesian customer review texts. The word cloud visualization shows that words such as “sesuai” (appropriate), “barang” (item), and “cepat” (fast) dominate the positive reviews, indicating customer satisfaction with product conformity and service speed. The confusion matrix indicates a bias toward the positive class caused by class imbalance: the model still struggles to recognize the neutral and negative classes. Overall, the study demonstrates that integrating word embeddings with Naïve Bayes improves classification performance and provides richer semantic representations than traditional Bag of Words approaches. This approach could be applied to developing data-driven recommendation systems and marketing strategies within Indonesia’s e-commerce ecosystem.
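Two of the steps summarized above can be illustrated with a minimal sketch: the text preprocessing stage, and the character n-gram idea that lets FastText relate non-standard spellings to their standard forms. The code is an illustration under stated assumptions, not the authors' implementation. The slang dictionary, the stopword list, the sample review, and the function names `preprocess`, `char_ngrams`, and `shared_ngrams` are all hypothetical. Stemming is omitted because it would need an Indonesian stemmer, and no embedding library is used.

```python
import re

# Hypothetical, tiny lookup tables for illustration only; the study's actual
# slang dictionary and stopword list are not given in the abstract.
SLANG = {"brg": "barang", "cpt": "cepat", "gak": "tidak", "yg": "yang"}
STOPWORDS = {"yang", "dan", "di", "ini"}


def preprocess(review):
    """Case folding, tokenization, slang normalization, stopword removal."""
    tokens = re.findall(r"[a-z]+", review.lower())    # fold case, keep letter runs
    tokens = [SLANG.get(t, t) for t in tokens]        # map slang to standard forms
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    }


def shared_ngrams(a, b):
    """N-grams that two spellings have in common."""
    return char_ngrams(a) & char_ngrams(b)


print(preprocess("Brg sesuai dan pengiriman CPT, yg ini bagus"))
# ['barang', 'sesuai', 'pengiriman', 'cepat', 'bagus']

print(sorted(shared_ngrams("cepat", "cepet")))
# ['<ce', '<cep', 'cep']
```

Slang normalization runs before stopword removal here so that an abbreviation such as "yg" is first expanded to "yang" and then filtered out. The second call shows that the standard spelling "cepat" and the informal "cepet" share several subword units. FastText builds a word vector from the vectors of such n-grams (the real library also includes the whole word and hashes n-grams into buckets), so misspelled or unseen variants can still receive a vector close to the standard form, whereas Word2Vec treats each spelling as an unrelated token.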
Pages: 2025-2034
Copyright (c) 2025 Bartolomius Harpad, Azahari Azahari, Salmon Salmon

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).





















