Hate Speech Detection on YouTube Using Long Short-Term Memory and Latent Dirichlet Allocation Method

Andi Fadil Adiyaksa; Donny Richasdy; Aditya Firman Ihsan

doi:10.47065/josh.v3i4.1875

Andi Fadil Adiyaksa*, Telkom University, Bandung, Indonesia
Donny Richasdy, Telkom University, Bandung, Indonesia
Aditya Firman Ihsan, Telkom University, Bandung, Indonesia

(*) Corresponding Author

Keywords: Hate Speech; YouTube; Topic Modelling; Long Short-Term Memory; Latent Dirichlet Allocation

Abstract

YouTube is a popular social media platform that people use to find information and express opinions. An opinion can be categorized as hate speech if it attacks a targeted individual or group. Hate speech is prohibited behavior, words, or actions, because it can lead to violence against individuals and groups. Because it is so common, hate speech expressed as opinion remains a problem that is very difficult for the authorities to overcome. This study therefore builds a system that detects hate speech in YouTube comment sections using Long Short-Term Memory (LSTM) and Latent Dirichlet Allocation (LDA). Several methods were tried in order to obtain the best accuracy, and topic modeling with LDA produced three topics containing the words that appear most often in YouTube comments. In the tests carried out, the best accuracy obtained was 0.657 (about 66%).
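The topic-modeling step mentioned in the abstract can be illustrated with a minimal sketch of LDA fitted by collapsed Gibbs sampling, using three topics as in the study. This is not the authors' implementation: the function names (`lda_gibbs`, `top_words`), the hyperparameters `alpha` and `beta`, the iteration count, and the toy tokenized comments are all assumptions made for illustration, and the study's actual data and preprocessing are not reproduced here.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=3, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper: alpha, beta and iters are illustrative defaults,
    not values reported in the paper.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Conditional probability of each topic for this token
                # (unnormalized; random.choices normalizes the weights).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n most frequent words per topic, skipping zero counts."""
    return [
        [w for w, c in counts.most_common() if c > 0][:n]
        for counts in topic_word
    ]


# Made-up tokenized comments, for illustration only (not the study's data).
docs = [
    ["video", "music", "great", "music"],
    ["channel", "video", "upload", "channel"],
    ["song", "music", "great", "song"],
    ["upload", "channel", "subscribe", "video"],
    ["news", "report", "news", "topic"],
    ["report", "topic", "news", "video"],
]

topic_word, doc_topic = lda_gibbs(docs, num_topics=3)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

On a real comment corpus, a library implementation would normally be preferable to this hand-written sampler, which is kept dependency-free only so the mechanics of topic assignment are visible.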
