Phishing URL Detection Using Natural Language Processing and Support Vector Machine Based on Machine Learning
Abstract
Phishing is a significant cybersecurity threat that uses malicious URLs to trick users into revealing sensitive information. This research develops a machine learning model for phishing URL detection that combines structural URL feature extraction, Natural Language Processing (NLP) techniques, and the Support Vector Machine (SVM) classification algorithm. Structural indicators of phishing are derived from features such as URL length, the number of dots, and the number of slashes, while the textual content of each URL is converted into numerical vectors using Term Frequency-Inverse Document Frequency (TF-IDF). All features are then combined and used as input to an SVM with a linear kernel for classification. The classification report indicates that the combination of TF-IDF and a linear-kernel SVM achieves the best performance, with 90% accuracy, 92% precision, 89% recall, and a 90% F1-score. The values computed directly from the confusion matrix are consistent with these rounded figures: 90.29% accuracy, 91.66% precision, 88.62% recall, and a 90.12% F1-score. The main contribution of this study is the integration of NLP and SVM into a single adaptive phishing detection model that combines the structural and textual aspects of URLs. This approach improves phishing detection relative to techniques that rely only on manually engineered features. Unlike earlier research that focused on specific cases or did not use NLP, this model is designed to detect many categories of phishing URLs in general, which makes it more relevant to the evolving nature of attacks.
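The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the textbook TF-IDF variant (tf × log(N / df)), the function names, and the sample URLs are all assumptions made for illustration. It only shows how the three structural features named above (URL length, dot count, slash count) could be concatenated with a TF-IDF vector before classification.

```python
import math
import re
from collections import Counter


def structural_features(url: str) -> list[float]:
    """Return [URL length, number of dots, number of slashes]."""
    return [float(len(url)), float(url.count(".")), float(url.count("/"))]


def tokenize(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tfidf_vectors(urls: list[str]) -> tuple[list[str], list[list[float]]]:
    """Compute unsmoothed TF-IDF, tf * log(N / df), over URL tokens."""
    docs = [tokenize(u) for u in urls]
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: in how many URLs each token appears.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1  # avoid division by zero for empty URLs
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def combined_features(urls: list[str]) -> tuple[list[str], list[list[float]]]:
    """Concatenate structural features with the TF-IDF vector of each URL."""
    vocab, text_vecs = tfidf_vectors(urls)
    rows = [structural_features(u) + v for u, v in zip(urls, text_vecs)]
    return vocab, rows


# Hypothetical sample URLs, used only to exercise the functions.
urls = [
    "http://secure-login.example.com/verify/account",
    "https://example.org/docs",
]
vocab, X = combined_features(urls)
```

Each row of `X` holds three structural values followed by one TF-IDF weight per vocabulary token. In a full pipeline, rows like these would be passed, together with phishing/legitimate labels, to a linear-kernel SVM such as scikit-learn's `LinearSVC` or `SVC(kernel="linear")`. Library vectorizers typically apply a smoothed IDF and normalization, so their weights would differ from this unsmoothed variant.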
Pages: 526-537
Copyright (c) 2025 Nabila Nabila, Emilia Hesti, Aryanti Aryanti

This work is licensed under a Creative Commons Attribution 4.0 International License.





















