Sentiment Analysis of Cyberbullying on Twitter (X) Using Improved Word Vectors and BERT
Abstract
Text mining is an important approach for analyzing text data, particularly for detecting negative sentiment such as cyberbullying on social media. Twitter (X), as an open platform, is frequently a venue for hate speech and abusive behavior expressed in text. This study aims to improve sentiment classification on Twitter (X) data by combining Improved Word Vectors (IWV) with Bidirectional Encoder Representations from Transformers (BERT). The dataset consists of 9,874 Indonesian-language tweets labeled with three categories: Hate Speech (HS), Abusive, and Neutral. It comes from previous research and is a re-annotation of an original dataset of 13,169 tweets. IWV combines Word2Vec, GloVe, POS tagging, and emotion lexicon features to enrich the semantic representation of words. Preprocessing comprises tokenization, filtering, stemming/lemmatization, and normalization. The extracted IWV features are then concatenated with BERT embeddings to produce high-dimensional vector representations. The models are evaluated using precision, recall, and F1-score. The results show that the combined IWV+BERT model outperforms BERT alone. Balancing the data also contributed to the improvement, with the highest accuracy reaching 91%. These findings indicate that integrating word-level features from IWV with sentence-level context from BERT can improve the effectiveness of text mining for sentiment analysis of cyberbullying on social media.
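The concatenation and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes (300 for IWV, 768 as in BERT-base), the random placeholder features, the toy labels, and the helper names `concat_features` and `per_class_scores` are all assumptions made for illustration.

```python
import numpy as np

LABELS = ["HS", "Abusive", "Neutral"]  # the three classes named in the abstract


def concat_features(iwv_vecs, bert_vecs):
    """Join per-tweet IWV and BERT vectors along the feature axis."""
    iwv_vecs = np.asarray(iwv_vecs, dtype=np.float32)
    bert_vecs = np.asarray(bert_vecs, dtype=np.float32)
    if iwv_vecs.shape[0] != bert_vecs.shape[0]:
        raise ValueError("both inputs must describe the same tweets")
    return np.concatenate([iwv_vecs, bert_vecs], axis=1)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Placeholder features for four tweets (assumed sizes, random values).
rng = np.random.default_rng(0)
iwv = rng.normal(size=(4, 300))    # hypothetical IWV dimension
bert = rng.normal(size=(4, 768))   # hidden size of BERT-base
combined = concat_features(iwv, bert)

# Toy labels, only to exercise the metric computation.
y_true = ["HS", "Abusive", "Neutral", "HS"]
y_pred = ["HS", "Neutral", "Neutral", "HS"]
scores = per_class_scores(y_true, y_pred)
```

Under these assumptions each tweet ends up with a 1,068-dimensional vector (300 + 768), which a downstream classifier would consume; the per-class scores can then be averaged if a single summary figure is needed.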
Pages: 1057-1068
Copyright (c) 2025 Madya Dharma Nusantara, Fajri Rakhmat Umbara, Puspita Nurul Sabrina

This work is licensed under a Creative Commons Attribution 4.0 International License.