Implementation of IndoBERT in Sarcasm Detection using Random Forest Towards Sentiment Analysis


  • Sabrina Adela Br Sibarani (*), Universitas Mikroskil, Medan, Indonesia
  • Ronsen Purba, Universitas Mikroskil, Medan, Indonesia
  • Ricky Paian Limbong, Universitas Mikroskil, Medan, Indonesia
  • (*) Corresponding Author
Keywords: Sarcasm Detection; Random Forest; IndoBERT; Natural Language Processing; 10-Fold Cross Validation

Abstract

Sarcasm, a subtle form of irony, often introduces a discrepancy between the literal meaning of words and the intended message, making it a significant challenge for sentiment analysis systems. Misinterpreting sarcasm in social media comments can lead to inaccurate sentiment classification, hindering decision-making processes in areas like customer feedback analysis and social opinion mining. This study addresses this issue by evaluating the effectiveness of sarcasm detection in Indonesian text using a Random Forest Classifier (RFC) integrated with IndoBERT. The research employs 10-fold cross-validation to measure performance. Without IndoBERT, the RFC model achieved average accuracy, precision, recall, and F1-score of 78.83%, 78.83%, 79.01%, and 78.83%, respectively. Incorporating IndoBERT significantly improved performance, with all metrics exceeding 84%. Furthermore, 5-fold cross-validation achieved the highest performance, with all metrics reaching 97.24%. This research contributes to developing more robust natural language processing models tailored to Indonesian linguistic contexts, specifically for sarcasm detection.
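The evaluation protocol described in the abstract (k-fold cross-validation with accuracy, precision, recall, and F1-score averaged over the folds) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' code: the names `k_fold_indices`, `binary_metrics`, `cross_validate`, and `threshold_fit_predict` are hypothetical, the toy threshold rule merely stands in for a Random Forest trained on IndoBERT features, and the metrics are computed for the positive (sarcastic) class, which is an assumption because the abstract does not state how the averages were taken.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def cross_validate(features, labels, k, fit_predict):
    """Hold out each fold once, train on the rest, and average the fold metrics."""
    per_fold = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        per_fold.append(binary_metrics([labels[i] for i in test_idx], predictions))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


def threshold_fit_predict(train_x, train_y, test_x):
    """Toy stand-in classifier (NOT a Random Forest): threshold at the training mean."""
    threshold = mean(train_x)
    return [1 if x > threshold else 0 for x in test_x]


# Toy data: one scalar feature per sample, alternating non-sarcastic / sarcastic.
toy_features = [0.1, 0.9] * 10
toy_labels = [0, 1] * 10
averaged = cross_validate(toy_features, toy_labels, k=10, fit_predict=threshold_fit_predict)
print(averaged)
```

In practice the folds would usually be shuffled or stratified by class before splitting, and `fit_predict` would wrap the actual classifier; the contiguous split here only keeps the sketch short.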




Article History
Submitted: 2024-08-21
Published: 2025-03-01
How to Cite
Sibarani, S., Purba, R., & Limbong, R. (2025). Implementation of IndoBERT in Sarcasm Detection using Random Forest Towards Sentiment Analysis. Building of Informatics, Technology and Science (BITS), 6(4), 2120-2130. https://doi.org/10.47065/bits.v6i4.5801