Comparison of Support Vector Machine, Decision Tree, Naïve Bayes, and Neural Network Algorithms in Email Classification


  • Dika Wicaksono (*), Universitas Amikom Yogyakarta, Yogyakarta, Indonesia
  • I Made Artha Agastya, Universitas Amikom Yogyakarta, Yogyakarta, Indonesia
  • (*) Corresponding Author
Keywords: Email Classification; Neural Network; Support Vector Machine (SVM); Naive Bayes; Decision Tree

Abstract

This study compares the effectiveness of four machine learning models for email classification: Support Vector Machine (SVM), Decision Tree, Naive Bayes, and Neural Network. Two datasets obtained from Kaggle are used. The first contains 18,650 emails (7,328 phishing and 11,322 non-phishing). The second was built by merging two Indonesian-language spam datasets, giving a total of 4,681 emails (2,670 spam and 2,011 non-spam); the merge was done to obtain a more representative amount of data for model evaluation. Averaged over the two datasets, Neural Network achieved the highest accuracy at 96.60%, followed closely by SVM at 96.43%. Decision Tree reached a fairly high 92.38%, while Naive Bayes recorded the lowest average accuracy at 90.22%. Although Neural Network is the most accurate, other models may be more suitable depending on the needs of the system. Naive Bayes, despite its lower accuracy, is computationally efficient and can be the more useful option in systems with limited computing resources. SVM balances high accuracy against computational cost, making it a good choice for systems that need strong performance without a heavy computational burden. Decision Tree is the easiest to interpret, making it suitable for applications that require transparency in decision making.
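To put the reported accuracies in context, the short sketch below recomputes the dataset totals from the class counts given in the abstract and derives the majority-class baseline for each dataset, that is, the accuracy obtained by always predicting the more frequent class. It also ranks the four models by their reported average accuracy. The dictionary keys and function names are illustrative assumptions, not names used by the authors, and the abstract does not report per-dataset accuracies, so only the averages are used.

```python
# Class counts exactly as reported in the abstract.
# The dictionary keys are illustrative labels, not names used by the authors.
DATASETS = {
    "phishing": {"positive": 7328, "negative": 11322},
    "indonesian_spam": {"positive": 2670, "negative": 2011},
}

# Average accuracy (%) over both datasets, as reported in the abstract.
REPORTED_AVG_ACCURACY = {
    "Neural Network": 96.60,
    "SVM": 96.43,
    "Decision Tree": 92.38,
    "Naive Bayes": 90.22,
}


def total(counts):
    """Total number of emails in a dataset."""
    return counts["positive"] + counts["negative"]


def majority_baseline(counts):
    """Accuracy of always predicting the more frequent class."""
    return max(counts.values()) / total(counts)


def ranked_models(scores):
    """Models sorted from highest to lowest reported accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap_in_points(scores, model_a, model_b):
    """Difference in percentage points between two reported accuracies."""
    return round(scores[model_a] - scores[model_b], 2)


for name, counts in DATASETS.items():
    baseline = majority_baseline(counts)
    print(f"{name}: {total(counts)} emails, majority baseline {baseline:.2%}")

for model, accuracy in ranked_models(REPORTED_AVG_ACCURACY):
    print(f"{model}: {accuracy:.2f}%")
```

With these counts the majority-class baseline is about 60.71% for the phishing dataset and about 57.04% for the Indonesian spam dataset, so all four reported averages sit well above trivial guessing. The gap between Neural Network and SVM is 0.17 percentage points, against 6.38 points between Neural Network and Naive Bayes.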



Article History
Submitted: 2025-02-08
Published: 2025-03-13
How to Cite
Wicaksono, D., & Agastya, I. M. (2025). Perbandingan Algoritma Support Vector Machine, Decision Tree, Naïve Bayes, dan Neural Network dalam Klasifikasi Email. Building of Informatics, Technology and Science (BITS), 6(4), 2559-2572. https://doi.org/10.47065/bits.v6i4.6949