Performance Analysis of IndoBERT for Sentiment Classification in Indonesian Hotel Review Data

Yerik Afrianto Singgalen

doi:10.47065/josh.v6i2.6505

Yerik Afrianto Singgalen * Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia

(*) Corresponding Author


Keywords: IndoBERT; Sentiment Analysis; Hotel Reviews; Class Imbalance; Model Performance

Abstract

This study investigates the performance of a sentiment classification model based on IndoBERT for Indonesian hotel review data. Sentiment analysis is crucial for extracting actionable insights from customer reviews, yet challenges such as linguistic diversity and imbalanced datasets complicate accurate classification. The dataset is highly imbalanced, comprising 90% Positive, 5% Neutral, and 5% Negative reviews. IndoBERT was fine-tuned for three epochs, and its performance was assessed using accuracy, precision, recall, F1-score, confusion matrices, and ROC and Precision-Recall curves. The model achieves a high overall accuracy of 92.52% and strong performance on the Positive class (F1-score: 96.09%, AUC: 0.90). However, it performs poorly on the minority classes: the Neutral class obtains a precision, recall, and F1-score of 0.00, and the Negative class reaches an F1-score of only 28.57%. These findings underscore the influence of dataset imbalance, as the dominance of the Positive class skews the model's predictions. Future research should explore techniques such as oversampling (e.g., SMOTE), loss-function reweighting, or hybrid architectures to mitigate the imbalance and improve performance across all sentiment categories. This research contributes to sentiment classification methodology for Indonesian text and has practical implications for customer feedback analysis in the hospitality industry.
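
The abstract's central observation, that a high overall accuracy can coexist with a Neutral F1-score of 0.00, follows directly from how these metrics are computed from a confusion matrix. The sketch below assumes a hypothetical 100-review test set with the same 90/5/5 class split; the counts are invented for illustration and are not the study's confusion matrix. It computes per-class precision, recall, and F1 (treating 0/0 as 0.0), overall accuracy, the accuracy of always predicting the majority class, and macro-averaged F1. It also derives inverse-frequency class weights, n_samples / (n_classes * n_c), as one common way to implement the loss reweighting mentioned above. All function names are hypothetical helpers, not the study's code.

```python
# Hypothetical 3-class confusion matrix (rows = true label, columns = predicted label).
# These counts are illustrative only; they are NOT the study's results.
LABELS = ["Positive", "Neutral", "Negative"]
CM = [
    [89, 0, 1],  # true Positive
    [4, 0, 1],   # true Neutral: the model never predicts Neutral
    [3, 0, 2],   # true Negative
]


def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)}, using 0.0 whenever a ratio is 0/0."""
    k = len(cm)
    metrics = {}
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row sum: true members of class c
        predicted = sum(cm[r][c] for r in range(k))  # column sum: predictions of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        # 2PR / (P + R) simplifies to 2TP / (predicted + actual)
        f1 = 2 * tp / (predicted + actual) if (predicted + actual) else 0.0
        metrics[labels[c]] = (precision, recall, f1)
    return metrics


def accuracy(cm):
    """Fraction of all samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def majority_baseline(cm):
    """Accuracy of a trivial classifier that always predicts the most frequent true class."""
    total = sum(sum(row) for row in cm)
    return max(sum(row) for row in cm) / total


def balanced_class_weights(cm):
    """Inverse-frequency weights n_samples / (n_classes * n_c); assumes every n_c > 0."""
    counts = [sum(row) for row in cm]
    total, k = sum(counts), len(counts)
    return [total / (k * n) for n in counts]


if __name__ == "__main__":
    m = per_class_metrics(CM, LABELS)
    for label, (p, r, f) in m.items():
        print(f"{label:9s} P={p:.3f} R={r:.3f} F1={f:.3f}")
    macro_f1 = sum(f for _, _, f in m.values()) / len(m)
    print(f"accuracy={accuracy(CM):.3f} "
          f"majority_baseline={majority_baseline(CM):.3f} macro_F1={macro_f1:.3f}")
    print("class weights:", [round(w, 3) for w in balanced_class_weights(CM)])
```

With these hypothetical counts, accuracy is 0.910 against 0.900 for always predicting Positive, while macro-F1 is only about 0.467, which illustrates why per-class or macro-averaged metrics are more informative than accuracy under a 90/5/5 split. The resulting weights (about 0.37 for Positive and 6.67 for each minority class) match the "balanced" heuristic in scikit-learn's `compute_class_weight` and could, for example, be passed as the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss` during fine-tuning.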


