Hate Speech Detection on Twitter through Natural Language Processing using LSTM Model
Abstract
Social media has become a place to express opinions, which can be positive or negative. Lately, however, the opinions that appear are often negative, such as hate speech. Hate speech on social media frequently takes the form of malicious comments intended to insult individuals or groups. Based on WeAreSocial data from 2021, Twitter is one of the most used social media platforms in Indonesia, with 63.6% of users. According to the Indonesian National Police, hate speech cases were the dominant type of case during the period from April 2020 to July 2021. Efforts are therefore needed to identify hate speech on the Twitter platform. One way to detect hate speech is deep learning. In this research, we use a Long Short-Term Memory (LSTM) deep learning model with word embeddings. FastText and Global Vectors (GloVe) are the word embeddings we use as input for word representation and classification. FastText uses subword information to build word embeddings, while GloVe uses an unsupervised learning method trained on a corpus to generate distributional feature vectors. In the evaluation of the experimental models, LSTM-FastText with random oversampling performs best, with an F1-score of 89.91%, compared with an F1-score of 82.14% for LSTM-GloVe.
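The three ideas named in the abstract (subword information, random oversampling, and the F1-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `char_ngrams`, `random_oversample`, and `f1_score` are hypothetical, the 3 to 6 character n-gram range is the default described in the FastText paper, and the F1-score is assumed here to be the binary score for a single positive (hate speech) class, which the abstract does not specify.

```python
import random
from collections import Counter


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of a word wrapped in '<' and '>' boundary
    markers, the kind of subword unit FastText builds word vectors from."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_texts, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(char_ngrams("cat", 3, 3))
    texts, labels = random_oversample(["a", "b", "c", "d"], [0, 0, 0, 1])
    print(Counter(labels))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a typical setup, oversampling of this kind would be applied to the training split only, so that duplicated samples do not leak into the data used to compute the F1-score.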
References
“Digital in Indonesia: All the Statistics You Need in 2021,” DataReportal – Global Digital Insights. https://datareportal.com/reports/digital-2021-indonesia (accessed Dec. 04, 2021).
P. Fortuna and S. Nunes, “A Survey on Automatic Detection of Hate Speech in Text,” ACM Comput. Surv., vol. 51, no. 4, p. 85:1-85:30, Jul. 2018, doi: 10.1145/3232676.
M. Teja, “MEDIA SOSIAL: UJARAN KEBENCIAN DAN PERSEKUSI,” p. 4.
“Kasus Hate Speech Mendominasi Kejahatan Siber, Melebihi Laporan Konten Porno,” kumparan. https://kumparan.com/kumparannews/kasus-hate-speech-mendominasi-kejahatan-siber-melebihi-laporan-konten-porno-1wEebgKLVuE (accessed Dec. 07, 2021).
I. Alfina, R. Mulia, M. I. Fanany, and Y. Ekanata, “Hate Speech Detection in the Indonesian Language: A Dataset and Preliminary Study,” Oct. 2017. doi: 10.1109/ICACSIS.2017.8355039.
B. van Aken, J. Risch, R. Krestel, and A. Löser, “Challenges for Toxic Comment Classification: An In-Depth Error Analysis,” ArXiv180907572 Cs, Sep. 2018, Accessed: Dec. 27, 2021. [Online]. Available: http://arxiv.org/abs/1809.07572
T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated Hate Speech Detection and the Problem of Offensive Language.” arXiv, Mar. 11, 2017. Accessed: Jul. 25, 2022. [Online]. Available: http://arxiv.org/abs/1703.04009
H. T.-T. Do, H. D. Huynh, K. Van Nguyen, N. L.-T. Nguyen, and A. G.-T. Nguyen, “Hate Speech Detection on Vietnamese Social Media Text using the Bidirectional-LSTM Model,” ArXiv191103648 Cs, Nov. 2019, Accessed: Nov. 22, 2021. [Online]. Available: http://arxiv.org/abs/1911.03648
T. Van Huynh, V. D. Nguyen, K. Van Nguyen, N. L.-T. Nguyen, and A. G.-T. Nguyen, “Hate Speech Detection on Vietnamese Social Media Text using the Bi-GRU-LSTM-CNN Model,” ArXiv191103644 Cs, Dec. 2019, Accessed: Nov. 26, 2021. [Online]. Available: http://arxiv.org/abs/1911.03644
P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep Learning for Hate Speech Detection in Tweets,” in Proceedings of the 26th International Conference on World Wide Web Companion - WWW ’17 Companion, 2017, pp. 759–760. doi: 10.1145/3041021.3054223.
A. Aprianto, “Reaksi Kebablasan Promosi Holywings,” Tempo, Jun. 27, 2022. https://kolom.tempo.co/read/1606006/reaksi-kebablasan-promosi-holywings (accessed Jul. 25, 2022).
T. detikcom, “Seruan Boikot Netflix dan The Umbrella Academy Gegara Lafaz Allah di Lantai,” detikhot. https://hot.detik.com/tv-news/d-6147508/seruan-boikot-netflix-dan-the-umbrella-academy-gegara-lafaz-allah-di-lantai (accessed Jul. 25, 2022).
“Kurang Ajar! Perusahaan China Jadikan Lafaz Allah Hiasan Bikini | Hukum.” https://www.gatra.com/news-546669-hukum-kurang-ajar-perusahaan-china-jadikan-lafaz-allah-hiasan-bikini-.html (accessed Aug. 10, 2022).
“‘Kami Percaya ACT’ Jadi Trending Topic, Publik Ramai Bandingkan dengan Tikus Berdasi,” suara.com, Jul. 05, 2022. https://www.suara.com/news/2022/07/05/104130/kami-percaya-act-jadi-trending-topic-publik-ramai-bandingkan-dengan-tikus-berdasi (accessed Jul. 25, 2022).
F. M. Sidik, “Ini Alasan Kemensos Cabut Izin Pengumpulan Uang dan Barang ACT,” detiknews. https://news.detik.com/berita/d-6164336/ini-alasan-kemensos-cabut-izin-pengumpulan-uang-dan-barang-act (accessed Jul. 25, 2022).
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching Word Vectors with Subword Information,” ArXiv160704606 Cs, Jun. 2017, Accessed: Jan. 24, 2022. [Online]. Available: http://arxiv.org/abs/1607.04606
J. Pennington, R. Socher, and C. Manning, “Glove: Global Vectors for Word Representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014, pp. 1532–1543. doi: 10.3115/v1/D14-1162.
A. Bisht, A. Singh, H. Bhadauria, J. Virmani, and D. Kriti, “Detection of Hate Speech and Offensive Language in Twitter Data Using LSTM Model,” 2020, pp. 243–264. doi: 10.1007/978-981-15-2740-1_17.
C. Raj, A. Agarwal, G. Bharathy, B. Narayan, and M. Prasad, “Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques,” Electronics, vol. 10, no. 22, Art. no. 22, Jan. 2021, doi: 10.3390/electronics10222810.
I. G. M. Putra and D. Nurjanah, “Hate Speech Detection In Indonesian Language Instagram,” in 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Oct. 2020, pp. 413–420. doi: 10.1109/ICACSIS51025.2020.9263084.
“What is Gensim? — gensim.” https://radimrehurek.com/gensim/intro.html (accessed Aug. 14, 2022).
M. F. Ahmed, Z. Mahmud, Z. T. Biash, A. A. N. Ryen, A. Hossain, and F. B. Ashraf, “Cyberbullying Detection Using Deep Neural Network from Social Media Comments in Bangla Language,” ArXiv210604506 Cs, Jun. 2021, Accessed: Oct. 18, 2021. [Online]. Available: http://arxiv.org/abs/2106.04506
W. K. Sari, D. P. Rini, and R. F. Malik, “Text Classification Using Long Short-Term Memory With GloVe Features,” J. Ilm. Tek. Elektro Komput. Dan Inform., vol. 5, no. 2, Art. no. 2, Dec. 2019, doi: 10.26555/jiteki.v5i2.15021.
Y. Gal and Z. Ghahramani, “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” https://arxiv.org/abs/1512.05287 (accessed Nov. 29, 2022).
Pages: 1548−1557
Copyright (c) 2022 Cepthari Ningtyas Arbaatun, Dade Nurjanah, Hani Nurrahmi

This work is licensed under a Creative Commons Attribution 4.0 International License.