Classification of Scam Messages on the WhatsApp Platform Using the Naïve Bayes Method Based on TF-IDF, N-Gram, and Chi-Square


  • Hardika Nur Saputra, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Ardytha Luthfiarta (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding author
Keywords: Text Classification; Scam WhatsApp; Naive Bayes; TF-IDF; Chi-Square

Abstract

The rapid growth of digital communication has increased message traffic across platforms and, with it, the spread of fraudulent messages (scams). This calls for an automated system that can identify and classify messages quickly and accurately. This study develops a text-based message classification system for the WhatsApp platform using the Naïve Bayes algorithm. Text preprocessing consists of case folding, cleaning, normalization, stopword removal, and stemming to improve data quality. Features are then extracted using Term Frequency-Inverse Document Frequency (TF-IDF) with unigram N-Grams, so that each word in the text is represented as a feature, and Chi-Square feature selection retains the features most relevant to classification. The dataset consists of three categories of WhatsApp messages: normal, promotional, and fraudulent. Random Oversampling is applied to the training data to increase the number of minority-class samples. The main contribution of this research is the combination of TF-IDF unigram features, Chi-Square feature selection, and Random Oversampling with the Naïve Bayes algorithm to improve the classification of Indonesian-language WhatsApp messages, particularly under imbalanced class distributions. The model is evaluated using a confusion matrix with accuracy, precision, recall, and F1-score metrics. In testing, the model achieved an accuracy of 95.63%, indicating that the approach is effective for classifying WhatsApp messages accurately.
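The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, which the abstract does not show. The messages, the labels `normal`, `promotional`, and `fraudulent`, the feature count `k=10`, the smoothing value `alpha`, the random `seed`, and all function names are assumptions made for illustration. Normalization, stopword removal, and stemming are omitted, and the smoothed IDF formula is one common choice rather than the one confirmed by the paper.

```python
import math
import random
import re
from collections import Counter, defaultdict


def preprocess(text):
    # Case folding and cleaning only; returns unigram tokens.
    return re.findall(r"[a-z]+", text.lower())


def oversample(docs, labels, seed=0):
    # Random Oversampling: duplicate randomly chosen minority-class samples
    # until every class is as large as the largest class.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_docs += items + extra
        out_labels += [label] * target
    return out_docs, out_labels


def chi_square_select(docs, labels, k):
    # Score each term by its largest chi-square statistic over the classes
    # (2x2 table: term presence vs. class membership); keep the top k terms.
    n, class_n = len(docs), Counter(labels)
    df, df_in = Counter(), defaultdict(Counter)
    for toks, label in zip(docs, labels):
        for t in set(toks):
            df[t] += 1
            df_in[label][t] += 1
    scores = {}
    for t in df:
        best = 0.0
        for label, nc in class_n.items():
            a, b = df_in[label][t], df[t] - df_in[label][t]
            c, d = nc - a, n - nc - b
            den = (a + b) * (c + d) * (a + c) * (b + d)
            if den:
                best = max(best, n * (a * d - b * c) ** 2 / den)
        scores[t] = best
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def fit_idf(docs, vocab):
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(t for toks in docs for t in set(toks) if t in vocab)
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def tfidf(toks, idf):
    # Sparse TF-IDF vector restricted to the selected vocabulary.
    return {t: tf * idf[t] for t, tf in Counter(toks).items() if t in idf}


def train_nb(vecs, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes on TF-IDF weights with additive smoothing.
    prior, weight = Counter(labels), defaultdict(Counter)
    for vec, label in zip(vecs, labels):
        weight[label].update(vec)
    model = {}
    for label in prior:
        total = sum(weight[label].values()) + alpha * len(vocab)
        log_lik = {t: math.log((weight[label][t] + alpha) / total) for t in vocab}
        model[label] = (math.log(prior[label] / len(labels)), log_lik)
    return model


def predict(model, vec):
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(sorted(model), key=score)


# Invented toy messages (not from the paper's dataset), deliberately imbalanced.
train = [
    ("besok kita rapat jam sembilan", "normal"),
    ("jangan lupa makan siang ya", "normal"),
    ("aku sudah sampai di rumah", "normal"),
    ("nanti sore kita rapat lagi", "normal"),
    ("diskon besar belanja hari ini", "promotional"),
    ("promo diskon khusus member", "promotional"),
    ("selamat anda menang hadiah transfer pulsa sekarang", "fraudulent"),
    ("anda menang undian kirim pin sekarang", "fraudulent"),
]
docs, labels = oversample([preprocess(m) for m, _ in train], [y for _, y in train])
vocab = chi_square_select(docs, labels, k=10)
idf = fit_idf(docs, vocab)
model = train_nb([tfidf(d, idf) for d in docs], labels, vocab)

test = [("anda menang sekarang", "fraudulent"),
        ("diskon promo hari ini", "promotional"),
        ("kita rapat besok", "normal")]
preds = [predict(model, tfidf(preprocess(m), idf)) for m, _ in test]
confusion = Counter((y, p) for (_, y), p in zip(test, preds))  # (true, predicted)
accuracy = sum(v for (y, p), v in confusion.items() if y == p) / len(test)
print(dict(confusion), accuracy)
```

Oversampling is applied only to the training messages, as the abstract specifies, so the test messages keep their original distribution. Precision, recall, and F1-score per class can be derived from the same `confusion` counts.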

Article History
Submitted: 2026-05-13
Published: 2026-06-23
How to Cite
Saputra, H., & Luthfiarta, A. (2026). Klasifikasi Pesan Penipuan pada Platform WhatsApp Menggunakan Metode Naïve Bayes Berbasis TF-IDF, N-Gram, dan Chi-Square. Building of Informatics, Technology and Science (BITS), 8(1), 278-292. https://doi.org/10.47065/bits.v8i1.9941