Performance Comparison of Pre-Trained Word Embeddings for Sentiment Classification of Tokopedia Product Reviews with Long Short-Term Memory (LSTM)


  • Naufal Angling Dirfas, Universitas Muhammadiyah Malang, Indonesia
  • Vinna Rahmayanti Setyaning Nastiti (*), Universitas Muhammadiyah Malang, Indonesia
  • (*) Corresponding Author
Keywords: LSTM; GloVe; Word2Vec; FastText; Pre-Trained Word Embedding

Abstract

Product reviews are a rapidly growing source of data. Growth in the number of internet users and in online shopping has sharply increased the volume of product review data, particularly for Indonesian online stores such as Tokopedia. Sellers need an automated sentiment classification system to process large numbers of reviews and learn what consumers think of their products. This research evaluates three pre-trained word embeddings, namely FastText, GloVe, and Word2Vec, in a Long Short-Term Memory (LSTM) model for sentiment classification of Tokopedia product reviews. The sample comprises 1,079 data points. A deep learning architecture allows the model to learn and recognize the contextual information carried by review sentences. The research was conducted in three stages (model selection, layer setup, and hyperparameter optimization) in order to test the architecture in depth and to find the combination of layers and parameters that yields high sentiment classification performance. The experimental results show that FastText with LSTM performs best at 85.08% accuracy, followed by Word2Vec at 84.62% and GloVe at 83.04%. The main contributions are an in-depth evaluation of how different pre-trained word embeddings affect LSTM performance on the product review dataset, and a deep learning architecture, together with the combination of layers and parameters, that performs best at recognizing sentiment in that dataset. This architecture achieves higher performance than the BERT method with CNN and BiLSTM layers.
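
The abstract does not say how the pre-trained vectors were passed to the LSTM. One common approach, sketched here as an assumption rather than as the authors' implementation, is to build an embedding matrix whose row i holds the pre-trained vector for the token with id i, and to use that matrix to initialize the model's embedding layer. The function names, the toy vectors, and the toy reviews below are all hypothetical. The sketch treats each embedding as a plain lookup table in the text format (token followed by its floats) in which Word2Vec, GloVe, and FastText vectors are often distributed.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def load_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Parse text-format embeddings: one token per line, followed by its floats."""
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            # Skip blank lines and the "<count> <dim>" header some files begin with.
            # (Assumes real vectors have at least two dimensions.)
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_vocab(tokenized_reviews: Iterable[List[str]]) -> Dict[str, int]:
    """Map each token to an integer id; id 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for tokens in tokenized_reviews:
        for token in tokens:
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def build_embedding_matrix(
    vocab: Dict[str, int], vectors: Dict[str, List[float]], dim: int
) -> Tuple[List[List[float]], List[str]]:
    """Return (matrix, missing): row i is the vector for token id i.

    Tokens without a pre-trained vector keep an all-zero row and are
    reported in `missing` so coverage can be compared across embeddings.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    missing: List[str] = []
    for token, idx in vocab.items():
        vec = vectors.get(token)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
        else:
            missing.append(token)
    return matrix, missing


# Toy data (hypothetical): a 3-dimensional embedding file and two tokenized reviews.
TOY_VECTORS = "2 3\nbarang 0.1 0.2 0.3\nbagus 0.4 0.5 0.6\n"
reviews = [["barang", "bagus"], ["pengiriman", "bagus"]]

vectors = load_vectors(io.StringIO(TOY_VECTORS))
vocab = build_vocab(reviews)
matrix, missing = build_embedding_matrix(vocab, vectors, dim=3)
print(len(matrix), missing)
```

Running the same routine once per embedding file would give three matrices over the same vocabulary, so that only the embedding changes between LSTM runs. The `missing` list gives a rough view of how much of the review vocabulary each embedding covers.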

Article History
Submitted: 2024-07-19
Published: 2024-09-09
How to Cite
Dirfas, N., & Nastiti, V. (2024). Perbandingan Kinerja Pre-Trained Word Embedding Terhadap Performa Klasifikasi Sentimen Ulasan Produk Tokopedia Dengan Long Short-Term Memory(LSTM). Building of Informatics, Technology and Science (BITS), 6(2), 878–889. https://doi.org/10.47065/bits.v6i2.5634
Section
Articles