Classification of Indonesian Undergraduate Students’ Awareness Level of Phishing Attacks using Decision Tree Algorithm

George Morris William Tangka; Edson Yahuda Putra

doi:10.47065/josh.v6i4.7859

George Morris William Tangka * Universitas Klabat, Manado, Indonesia
Edson Yahuda Putra Universitas Klabat, Manado, Indonesia

(*) Corresponding Author


Keywords: Phishing Awareness; Decision Tree; CART; CRISP-DM; Higher Education

Abstract

Phishing remains a dominant cyber-crime vector in higher-education settings, yet most Indonesian campus studies stop at descriptive awareness surveys. This study sets out (i) to build a fully interpretable predictive model that classifies students’ phishing-awareness levels from a concise questionnaire and (ii) to demonstrate how the model’s rules can be mapped to established behavioural theory for targeted educational intervention. Guided by the Cross-Industry Standard Process for Data Mining (CRISP-DM), we encoded the responses of 153 undergraduates (82 male; 71 female) to a ten-item phishing-awareness instrument as a 153 × 10 binary matrix and analysed it with a Classification and Regression Tree (CART) pruned by cost-complexity pruning. The optimal tree (depth = 5, 19 leaves) achieved 94.9% accuracy, 93.4% recall, 95.8% precision, and a ROC-AUC of 0.971 under stratified 10-fold cross-validation. These metrics are comparable to those of ensemble methods, but they come from a glass-box structure that exposes explicit IF-THEN rules. The three most salient splits (URL-domain mismatch, urgency cues, and misconceptions about the HTTPS lock icon) align directly with Protection Motivation Theory constructs, providing actionable targets for micro-learning modules. Because the dataset originates from a single campus and governance prerequisites (fairness audit, GDPR impact assessment, SOP alignment) are pending, the model will run in “shadow mode” next term to collect longitudinal evidence and monitor concept drift. Overall, the findings show that concise, theory-grounded instruments combined with pruned decision trees can achieve high predictive power and immediate pedagogical value without sacrificing transparency.
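The modelling pipeline described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' code, and the study's data are not available here: the 153 × 10 binary matrix below is randomly generated, the labelling rule and the item names (`item_1` to `item_10`) are hypothetical, and the choice of accuracy as the tuning criterion is an assumption. The sketch only shows how a CART model can be pruned by cost-complexity pruning, scored under stratified 10-fold cross-validation, and exported as IF-THEN style rules; its scores say nothing about the figures reported above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic stand-in for the 153 x 10 binary response matrix (NOT the study's data).
X = rng.integers(0, 2, size=(153, 10))
# Hypothetical labelling rule: "aware" (1) if at least two of the first three
# items are answered correctly. The real instrument's scoring is not given here.
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

# Hypothetical item names used only to make the exported rules readable.
feature_names = [f"item_{i + 1}" for i in range(10)]

# Candidate pruning strengths taken from the cost-complexity pruning path
# of an unpruned tree; clip guards against tiny negative rounding errors.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))

# Stratified 10-fold cross-validation, as named in the abstract.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Pick the ccp_alpha with the best mean accuracy (the tuning metric is an assumption).
search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
tree = search.best_estimator_

# Score the selected configuration on the four metrics reported in the abstract.
metrics = ["accuracy", "recall", "precision", "roc_auc"]
scores = cross_validate(tree, X, y, cv=cv, scoring=metrics)
summary = {m: float(scores[f"test_{m}"].mean()) for m in metrics}

# Export the fitted tree as nested IF-THEN style rules.
rules = export_text(tree, feature_names=feature_names)

print(summary)
print(f"depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
print(rules)
```

Because the same folds are used both to select `ccp_alpha` and to report the scores, the estimates from this sketch are optimistic; a nested cross-validation or a held-out test set would give a less biased picture.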


