Sentiment Analysis of Cyberbullying on Twitter (X) Using Improved Word Vectors and BERT

Madya Dharma Nusantara; Fajri Rakhmat Umbara; Puspita Nurul Sabrina

doi:10.47065/bits.v7i2.7968

Madya Dharma Nusantara (*), Universitas Jenderal Achmad Yani, Cimahi, Indonesia
Fajri Rakhmat Umbara, Universitas Jenderal Achmad Yani, Cimahi, Indonesia
Puspita Nurul Sabrina, Universitas Jenderal Achmad Yani, Cimahi, Indonesia

(*) Corresponding Author

Keywords: Text Mining; Sentiment Analysis; Cyberbullying; IWV; BERT

Abstract

Text mining is an important approach for analyzing text data, particularly for detecting negative sentiment such as cyberbullying on social media. Twitter (X), as an open platform, often becomes a space where hate speech and abusive behavior spread in text form. This study aims to improve the performance of sentiment classification models on Twitter (X) data by combining Improved Word Vectors (IWV) with Bidirectional Encoder Representations from Transformers (BERT). The dataset consists of 9,874 Indonesian-language tweets labeled into three categories: Hate Speech (HS), Abusive, and Neutral. It is taken from previous research and is the result of re-annotating an original dataset of 13,169 tweets. IWV combines Word2Vec, GloVe, part-of-speech (POS) tagging, and emotion-lexicon features to enrich the semantic representation of each word. Preprocessing consists of tokenization, filtering, stemming/lemmatization, and normalization. The extracted IWV features are then concatenated with BERT embeddings to produce a high-dimensional vector representation. The models were evaluated using precision, recall, and F1-score. The results show that the combined IWV+BERT model outperforms BERT alone. Balancing the class distribution of the data also contributed to the improvement, with the highest accuracy reaching 91%. These findings indicate that integrating word-level representation features from IWV with sentence-level context from BERT can improve the effectiveness of text mining for cyberbullying-related sentiment analysis on social media.
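The abstract describes IWV as a combination of Word2Vec, GloVe, POS-tag, and emotion-lexicon features that is concatenated with a BERT embedding. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: embeddings are supplied as plain dictionaries with the same dimension for Word2Vec and GloVe, POS tags arrive pre-assigned as (token, tag) pairs, token-level IWVs are mean-pooled into one sentence vector, and `bert_vec` stands in for a sentence-level BERT vector (for example a [CLS] embedding) computed elsewhere by a pretrained model. All names (`SLANG`, `STOPWORDS`, `POS_TAGS`, `LEXICON_DIM`, `preprocess`, `iwv_token`, `iwv_bert_features`) and toy values are hypothetical.

```python
import re

# Hypothetical resources for illustration only; the study's actual
# normalization dictionary, stopword list, tagset, and lexicon are not given.
SLANG = {"gk": "tidak", "yg": "yang"}        # slang-normalization map (hypothetical)
STOPWORDS = {"yang", "di", "dan"}             # stopword filter list (hypothetical)
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]   # one-hot POS vocabulary (hypothetical)
LEXICON_DIM = 2                               # e.g. two emotion scores per word (hypothetical)


def preprocess(text):
    """Lowercase, drop mentions/URLs, tokenize, normalize slang, filter stopwords.

    Stemming/lemmatization is omitted; an Indonesian stemmer would be applied
    to the returned tokens in a full pipeline.
    """
    text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)               # tokenization (ASCII letters only)
    tokens = [SLANG.get(t, t) for t in tokens]         # normalization
    return [t for t in tokens if t not in STOPWORDS]   # filtering


def iwv_token(token, pos, w2v, glove, lexicon, dim):
    """IWV for one token: Word2Vec + GloVe + POS one-hot + emotion-lexicon scores."""
    zeros = [0.0] * dim                                # out-of-vocabulary fallback
    pos_onehot = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    emotion = lexicon.get(token, [0.0] * LEXICON_DIM)
    return list(w2v.get(token, zeros)) + list(glove.get(token, zeros)) + pos_onehot + list(emotion)


def iwv_bert_features(tagged_tokens, bert_vec, w2v, glove, lexicon, dim):
    """Mean-pool token IWVs into a sentence vector, then concatenate the BERT vector."""
    iwv_size = 2 * dim + len(POS_TAGS) + LEXICON_DIM
    if not tagged_tokens:
        sentence_iwv = [0.0] * iwv_size                # empty tweet after preprocessing
    else:
        vecs = [iwv_token(t, p, w2v, glove, lexicon, dim) for t, p in tagged_tokens]
        sentence_iwv = [sum(col) / len(vecs) for col in zip(*vecs)]
    return sentence_iwv + list(bert_vec)
```

With toy inputs `w2v = {"bodoh": [0.1, 0.2]}`, `glove = {"bodoh": [0.3, 0.4]}`, `lexicon = {"bodoh": [0.9, 0.0]}`, `dim=2`, and a three-dimensional `bert_vec`, the tagged tokens `[("kamu", "OTHER"), ("bodoh", "ADJ")]` yield a 13-dimensional vector: 10 IWV dimensions followed by 3 BERT dimensions. Mean pooling is only one possible choice here; the abstract does not state how word-level IWVs are aggregated before concatenation.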

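The abstract evaluates the models with precision, recall, and F1-score over three classes and reports that balancing the data helped, without naming the balancing technique. A minimal sketch of both steps follows, assuming single-label classification and using random oversampling as one possible balancing method; this is an assumption, not necessarily what the authors used. The function names `random_oversample` and `per_class_metrics` are hypothetical.

```python
import random
from collections import defaultdict

LABELS = ["HS", "Abusive", "Neutral"]  # the three classes named in the abstract


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class has as many
    samples as the largest class. Intended for the training split only."""
    if not samples:
        return [], []
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Per-class precision, recall, and F1, plus macro-averaged F1 and accuracy."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    report["macro_f1"] = sum(report[label]["f1"] for label in labels) / len(labels)
    report["accuracy"] = sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0
    return report
```

Oversampling belongs after the train/test split and should touch only the training data; otherwise duplicated tweets can appear in both splits and inflate the reported scores.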