Cyberbullying Detection in Indonesian-Language Social Media Comments Using a Hybrid IndoBERTweet-BiLSTM Approach

Reza Ramadhon Aditya; Arry Maulana Syarif

doi:10.47065/bits.v7i4.9249

Reza Ramadhon Aditya * Universitas Dian Nuswantoro, Semarang, Indonesia
Arry Maulana Syarif Universitas Dian Nuswantoro, Semarang, Indonesia

(*) Corresponding Author


Keywords: Cyberbullying; Text Classification; IndoBERTweet; BiLSTM; Social Media; Indonesian NLP

Abstract

Cyberbullying on Indonesian-language social media has become a serious issue with significant psychological and social consequences, creating a need for reliable automated detection systems. However, the informal, ambiguous, and highly contextual nature of social media language, including frequent slang and sarcasm, poses substantial challenges for conventional text classification approaches. This study proposes a hybrid cyberbullying detection model that integrates the domain-specific pre-trained language model IndoBERTweet with a Bidirectional Long Short-Term Memory (BiLSTM) architecture. IndoBERTweet generates contextualized semantic representations aligned with the linguistic characteristics of Indonesian Twitter data, while the BiLSTM captures bidirectional sequential dependencies at the sentence level. Experiments used a publicly available, manually annotated Indonesian Twitter dataset of 13,091 samples, reformulated into a binary classification scheme. To address class imbalance, class weighting and label smoothing were combined during model training. Model performance was evaluated using Accuracy, Precision, Recall, F1-Score, ROC-AUC, and PR-AUC. The IndoBERTweet–BiLSTM model achieved the best performance, with an F1-Score of 87.53%, Recall of 88.80%, Precision of 86.31%, ROC-AUC of 92.91%, and PR-AUC of 94.25%, consistently outperforming baseline models built on IndoBERT and IndoBERT-p1 with identical architectural configurations. These findings highlight the importance of domain alignment for cyberbullying detection on Indonesian social media text.
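
The abstract names class weighting and label smoothing as the remedy for class imbalance but does not give the formulation. The following is a minimal sketch of one common way to combine the two for a single sample, not the authors' implementation. The function names, the inverse-frequency weighting rule, the smoothing value of 0.1, and the toy labels are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_class_weights(labels, num_classes=2):
    """Inverse-frequency weights: n_samples / (num_classes * count_of_class).

    Assumes every class appears at least once in `labels`.
    """
    counts = [labels.count(c) for c in range(num_classes)]
    return [len(labels) / (num_classes * c) for c in counts]


def weighted_smoothed_ce(logits, label, class_weights, smoothing=0.1):
    """Cross-entropy for one sample with class weighting and label smoothing."""
    num_classes = len(logits)
    probs = softmax(logits)
    # Smoothed target: most of the mass on the true class,
    # the remainder spread uniformly over all classes (sums to 1).
    target = [smoothing / num_classes] * num_classes
    target[label] += 1.0 - smoothing
    ce = -sum(t * math.log(p) for t, p in zip(target, probs))
    # Scale by the true class's weight so errors on the rarer class cost more.
    return class_weights[label] * ce


# Hypothetical toy labels: three samples of class 0, one of class 1.
toy_labels = [0, 0, 0, 1]
weights = balanced_class_weights(toy_labels)
loss = weighted_smoothed_ce([2.0, -1.0], label=0, class_weights=weights)
```

With these toy labels the rarer class receives the larger weight, so a misclassified minority sample contributes more to the loss than a misclassified majority sample. Deep learning frameworks may combine the two techniques with a slightly different formula, so this sketch should not be read as matching any particular library.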


