Classification of Indonesian Undergraduate Students’ Awareness Level of Phishing Attacks using Decision Tree Algorithm
Abstract
Phishing remains a dominant cyber-crime vector in higher-education settings, yet most Indonesian campus studies stop at descriptive awareness surveys. This study sets out (i) to build a fully interpretable predictive model that classifies students’ phishing-awareness levels from a concise questionnaire and (ii) to demonstrate how the model’s rules can be mapped to established behavioural theory for targeted educational intervention. Guided by the Cross-Industry Standard Process for Data Mining (CRISP-DM), we encoded the responses of 153 undergraduates (82 male, 71 female) to a ten-item phishing-awareness instrument as a 153 × 10 binary matrix and analysed it with a cost-complexity-pruned Classification and Regression Tree (CART). The optimal tree (depth 5, 19 leaves) achieved 94.9% accuracy, 93.4% recall, 95.8% precision, and a ROC-AUC of 0.971 under stratified 10-fold cross-validation. These metrics are comparable to those of ensemble methods, but they come from a glass-box structure that exposes explicit IF-THEN rules. The three most salient splits (URL-domain mismatch, urgency cues, and misconceptions about the HTTPS lock icon) align directly with Protection Motivation Theory constructs and provide actionable targets for micro-learning modules. Because the dataset originates from a single campus and governance prerequisites (fairness audit, GDPR impact assessment, SOP alignment) are still pending, the model will run in “shadow mode” next term to collect longitudinal evidence and monitor concept drift. Overall, the findings show that concise, theory-grounded instruments combined with pruned decision trees can achieve high predictive power and immediate pedagogical value without sacrificing transparency.
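The abstract does not include the authors' code or data. As a minimal sketch of the idea behind a single CART split on binary questionnaire items, the following pure-Python example picks the item with the lowest weighted Gini impurity and turns it into a one-rule classifier. The eight respondents, three items, labels, and all function names are hypothetical and chosen only for illustration; cost-complexity pruning and stratified 10-fold cross-validation are not shown, and the metrics are computed on the same toy rows used to choose the split.

```python
from collections import Counter

# Hypothetical toy sample (NOT the study's data): 8 respondents x 3 binary items.
# Item 0 might stand for "spots a URL-domain mismatch"; 1 = correct answer.
ROWS = [
    [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
    [0, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0],
]
# Assumed labels: 1 = "aware", 0 = "not aware".
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(rows, labels):
    """Return (item_index, weighted_gini) for the best 0/1 split.

    item_index is None if no item improves on the unsplit impurity.
    """
    n = len(rows)
    best = (None, gini(labels))
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not left or not right:
            continue  # item does not separate the sample
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (j, score)
    return best


def stump_predict(rows, labels, item):
    """Predict each row with the majority label of its branch on `item`.

    Assumes both branches are non-empty, as best_binary_split guarantees.
    """
    majority = {}
    for value in (0, 1):
        branch = [y for x, y in zip(rows, labels) if x[item] == value]
        majority[value] = Counter(branch).most_common(1)[0][0]
    return [majority[x[item]] for x in rows]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


item, score = best_binary_split(ROWS, LABELS)
predictions = stump_predict(ROWS, LABELS, item)
metrics = binary_metrics(LABELS, predictions)
print(item, round(score, 3), metrics)
```

The chosen item reads directly as an IF-THEN rule ("if the respondent answers this item incorrectly, classify as not aware"), which is the kind of transparency the abstract attributes to the pruned tree. A full tree would repeat this split search within each branch and then prune.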
Pages: 1911-1919
Copyright (c) 2025 George Morris William Tangka, Edson Yahuda Putra

This work is licensed under a Creative Commons Attribution 4.0 International License.