Hoax Detection on Indonesian Tweets using Naïve Bayes Classifier with TF-IDF
Abstract
Twitter is one of the most popular social media platforms in the world today. Indonesia has the fifth-largest number of Twitter users in the world, and these users are consistently active in expressing themselves and obtaining information through tweets. A hoax is a falsehood presented as if it were true, and hoaxes are often spread via tweets. The spread of hoaxes is extremely dangerous because it can cause misunderstanding and even social discord; therefore, hoaxes must be countered. This study aims to build a system that identifies hoaxes in Indonesian tweets using the Naïve Bayes classifier with Term Frequency-Inverse Document Frequency (TF-IDF). The study collects and annotates tweets drawn from hoax tweet posts sent by a user account, and applies several text preprocessing techniques to prepare the dataset. To obtain the best hoax prediction model, the dataset is split into training and testing sets under four experimental scenarios, each defined by a different split. The experimental results show that the hoax prediction model using Naïve Bayes with TF-IDF achieved 64% accuracy, 64% recall, 69% precision, and a 67% F1-score. This result is also superior to that of the Naïve Bayes classifier without TF-IDF, indicating that TF-IDF contributes positively to model performance. Finally, this research contributes a way to detect news that tends toward hoaxes and to filter content according to whether or not it is classified as a hoax.
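The pipeline summarized in the abstract (preprocess tweets, weight terms with TF-IDF, classify with Naïve Bayes, then score with accuracy, precision, recall, and F1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The following are all hypothetical: the tokenizer, the smoothed IDF formula, the Laplace constant `alpha`, the multinomial variant of Naïve Bayes, the names `NaiveBayesTfidf` and `weighted_scores`, and the toy tweets. The paper's four train/test split scenarios are not reproduced. Setting `use_tfidf=False` falls back to raw term counts, mirroring the with/without TF-IDF comparison.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Illustrative preprocessing only (assumption): lowercase and keep runs of letters.
    # The paper's actual preprocessing steps are not reproduced here.
    return "".join(ch.lower() if ch.isalpha() else " " for ch in text).split()


class NaiveBayesTfidf:
    """Hypothetical multinomial Naive Bayes whose term weights are TF-IDF
    scores (use_tfidf=True) or raw term counts (use_tfidf=False)."""

    def __init__(self, use_tfidf=True, alpha=1.0):
        self.use_tfidf = use_tfidf
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def _weights(self, tokens):
        # Ignore tokens never seen during training.
        counts = Counter(t for t in tokens if t in self.idf)
        if not self.use_tfidf:
            return dict(counts)
        # TF-IDF weight = raw term frequency * smoothed IDF.
        return {t: c * self.idf[t] for t, c in counts.items()}

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        n_docs = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(doc))  # document frequency: count each term once per doc
        # Smoothed IDF (one common variant, assumed): ln((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n_docs)
            totals = defaultdict(float)  # summed term weights within class c
            for doc in class_docs:
                for t, w in self._weights(doc).items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(self.idf)
            self.log_lik[c] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.idf
            }
        return self

    def predict(self, text):
        weights = self._weights(tokenize(text))
        # Log-posterior up to a constant: log P(c) + sum_t w_t * log P(t | c)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][t] for t, w in weights.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    result = {"accuracy": sum(t == p for t, p in pairs) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        support = sum(t == c for t, _ in pairs)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / n  # weight each class by its share of true labels
        result["precision"] += weight * prec
        result["recall"] += weight * rec
        result["f1"] += weight * f1
    return result


# Hypothetical toy tweets, for illustration only (not the paper's dataset).
train_texts = [
    "vaksin mengandung chip pelacak, sebarkan segera",
    "sebarkan! air garam menyembuhkan virus",
    "pemerintah resmi mengumumkan jadwal vaksinasi",
    "kementerian kesehatan merilis data resmi vaksinasi",
]
train_labels = ["hoax", "hoax", "fact", "fact"]

model = NaiveBayesTfidf(use_tfidf=True).fit(train_texts, train_labels)
print(model.predict("sebarkan segera, vaksin ada chip"))  # hoax
print(model.predict("data resmi kementerian kesehatan"))  # fact
```

Support-weighted recall is mathematically equal to accuracy, because the class weights cancel the per-class denominators. Any report that uses weighted averaging will therefore show identical accuracy and recall. That may explain why both are 64% above, although the abstract does not state how the averages were computed.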
Pages: 914-919
Copyright (c) 2023 Ichwanul Muslim Karo Karo, Romia Romia, Sri Dewi, Putri Maulidina Fadilah

This work is licensed under a Creative Commons Attribution 4.0 International License.