Hate Speech Detection on Twitter through Natural Language Processing using LSTM Model


  • Cepthari Ningtyas Arbaatun * Telkom University, Bandung, Indonesia
  • Dade Nurjanah Telkom University, Bandung, Indonesia
  • Hani Nurrahmi Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Hate Speech; Twitter; FastText; GloVe; LSTM

Abstract

Social media has become a place to express opinions, which may be positive or negative. Lately, however, negative opinions such as hate speech appear often. Hate speech on social media frequently takes the form of malicious comments intended to insult individuals or groups. Based on WeAreSocial data from 2021, Twitter is one of the most used social media platforms in Indonesia, with 63.6% of users. According to the Indonesian National Police, hate speech cases were dominant during the period from April 2020 to July 2021. Efforts are therefore needed to identify hate speech on Twitter. One way to detect hate speech is deep learning. In this research, we use a Long Short-Term Memory (LSTM) deep learning model with word embeddings. FastText and Global Vectors (GloVe) are the word embeddings that we use as input for word representation and classification. FastText builds word embeddings from subword information, while GloVe uses an unsupervised learning method trained on a corpus to generate distributional feature vectors. In the evaluation of the experimental models, LSTM-FastText with random oversampling performs best, with an F1-score of 89.91%, compared with 82.14% for LSTM-GloVe.
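
The abstract names three techniques: subword information in FastText, random oversampling, and the F1-score. The sketch below illustrates each in plain Python. It is not the authors' implementation. The function names, the n-gram range of 3 to 6 (the FastText default), the fixed random seed, and the binary definition of F1 are all assumptions made for illustration.

```python
import random
from collections import Counter


def char_ngrams(word, n_min=3, n_max=6):
    """Return FastText-style character n-grams of a word.

    The word is wrapped in '<' and '>' boundary markers before slicing.
    The 3..6 range is an assumed default, not a setting from the abstract.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match
    the size of the largest class. Original samples are kept unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    balanced_texts, balanced_labels = random_oversample(
        ["a", "b", "c", "d"], [0, 0, 0, 1]
    )
    print(Counter(balanced_labels))
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

In practice, oversampling is usually applied only to the training split so that duplicated samples do not leak into the evaluation data. Whether the study did so is not stated in the abstract.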


Article History
Submitted: 2022-12-20
Published: 2022-12-30
How to Cite
Arbaatun, C., Nurjanah, D., & Nurrahmi, H. (2022). Hate Speech Detection on Twitter through Natural Language Processing using LSTM Model. Building of Informatics, Technology and Science (BITS), 4(3), 1548–1557. https://doi.org/10.47065/bits.v4i3.2718