Hoax Detection on Indonesian Tweets using Naïve Bayes Classifier with TF-IDF

Ichwanul Muslim Karo Karo; Romia Romia; Sri Dewi; Putri Maulidina Fadilah

doi:10.47065/josh.v4i3.3317

Ichwanul Muslim Karo Karo * Medan State University, Medan, Indonesia
Romia Romia STMIK Citra Mandiri Padangsidimpuan, Padangsidimpuan, Indonesia
Sri Dewi Medan State University, Medan, Indonesia
Putri Maulidina Fadilah Medan State University, Medan, Indonesia

(*) Corresponding Author

Keywords: Hoax; Naïve Bayes; TF-IDF; Text Preprocessing; Performance

Abstract

Twitter is one of the most popular social media platforms in the world. Indonesia has the fifth-largest Twitter user base in the world, and its users actively express themselves and obtain information through tweets. A hoax is a falsehood presented as if it were true, and hoaxes are often spread via tweets. Their spread is dangerous because it can cause misunderstanding and social discord, so hoaxes must be countered. This study aims to build a system that detects hoaxes in Indonesian tweets using the Naïve Bayes classifier with Term Frequency-Inverse Document Frequency (TF-IDF). The study collects and annotates tweets drawn from hoax posts sent by a user account, and applies several text preprocessing techniques to prepare the dataset. To obtain the best hoax prediction model, the dataset is split into training and testing sets under four experimental scenarios that differ in how the data is split. The experimental results show that the hoax prediction model using Naïve Bayes with TF-IDF achieved 64% accuracy and recall, 69% precision, and a 67% F1-score. This result is superior to that of the Naïve Bayes classifier without TF-IDF, which indicates that TF-IDF contributes positively to model performance. Finally, this research contributes a way of detecting news with a proclivity for hoaxes and filtering content into hoax and non-hoax classes.
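The abstract does not state how the classifier was implemented, so the following is only a minimal sketch of the general technique it names: TF-IDF weighting fed into a multinomial Naïve Bayes classifier. The whitespace tokenizer, the smoothed IDF variant, the Laplace smoothing value, the class name `TfidfNaiveBayes`, and the four toy tweets are all assumptions made for illustration and are not taken from the study or its dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed minimal preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen during fitting are ignored.
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights (illustrative only)."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        self.classes = sorted(set(labels))
        # Accumulate the TF-IDF mass of every term per class.
        mass = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, w in tfidf(doc, self.idf).items():
                mass[label][term] += w
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels))
                          for c in self.classes}
        # Laplace-smoothed log-likelihood of each term given the class.
        self.log_like = {}
        for c in self.classes:
            total = sum(mass[c].values()) + alpha * len(self.idf)
            self.log_like[c] = {t: math.log((mass[c].get(t, 0.0) + alpha) / total)
                                for t in self.idf}
        return self

    def predict(self, text):
        feats = tfidf(tokenize(text), self.idf)
        scores = {c: self.log_prior[c]
                  + sum(w * self.log_like[c][t] for t, w in feats.items())
                  for c in self.classes}
        return max(scores, key=scores.get)


# Invented toy tweets, for demonstration only.
train_texts = [
    "vaksin mengandung chip rahasia sebarkan",
    "sebarkan segera air garam sembuhkan virus",
    "pemerintah umumkan jadwal vaksin resmi",
    "kementerian rilis data resmi kasus virus",
]
train_labels = ["hoax", "hoax", "not_hoax", "not_hoax"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("sebarkan chip rahasia"))
print(model.predict("kementerian rilis jadwal resmi"))
```

Comparing this against a variant that uses raw term counts in place of `tfidf` weights would mirror the with/without TF-IDF comparison described above, although results on a toy set say nothing about the reported figures.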
