Evaluating the Effectiveness of Machine Learning Models for Cyberattack Detection: A Study on Model Generalization and Dataset Imbalance
Abstract
Detecting and preventing cyberattacks is crucial for securing networks and data. This study evaluates several machine learning models, namely RandomForest, GradientBoosting, XGBoost, LightGBM, CatBoost, Support Vector Classifier (SVC), Logistic Regression, and an ensemble Voting Classifier, on the task of detecting and classifying cyberattacks. The models were tested on a real-world cybersecurity dataset with significant class imbalance, in which benign traffic vastly outnumbers malicious traffic. Some models, such as RandomForest and the Voting Classifier, achieved high training accuracy but overfit, with test accuracies not exceeding 34%. Boosting models such as XGBoost and LightGBM generalized better than RandomForest but still struggled with the complexity of the dataset. The primary limitations of this study are the dataset's imbalance, the high dimensionality of the features, and the models' tendency to overfit. These challenges point to the need for more robust data preprocessing, hyperparameter tuning, and exploration of more advanced models, such as deep learning architectures, in future work. The findings offer insight into the challenges of using machine learning for cyberattack detection and suggest directions for improving model performance in real-world settings.
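The overfitting pattern described above (high training accuracy, low test accuracy on an imbalanced dataset) can be checked with a few simple diagnostics. The following is a minimal sketch in plain Python, not the study's own evaluation code: the function names and the toy labels are hypothetical and chosen only for illustration. It reports the train/test accuracy gap, the accuracy of a trivial majority-class predictor, and per-class recall, which together help show whether a headline accuracy figure is meaningful under class imbalance.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_train, y_test):
    """Test accuracy of always predicting the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def generalization_report(y_train, train_pred, y_test, test_pred):
    """Summarize the train/test gap alongside imbalance-aware diagnostics."""
    train_acc = accuracy(y_train, train_pred)
    test_acc = accuracy(y_test, test_pred)
    return {
        "train_accuracy": train_acc,
        "test_accuracy": test_acc,
        "gap": train_acc - test_acc,  # a large positive gap suggests overfitting
        "majority_baseline": majority_baseline(y_train, y_test),
        "per_class_recall": per_class_recall(y_test, test_pred),
    }


# Toy labels, invented for illustration only (not data from the study).
y_train = ["benign"] * 8 + ["attack"] * 2
train_pred = list(y_train)  # a model that has memorized the training set
y_test = ["benign"] * 4 + ["attack"]
test_pred = ["attack", "benign", "attack", "attack", "benign"]

report = generalization_report(y_train, train_pred, y_test, test_pred)
for name, value in report.items():
    print(name, value)
```

In this toy case the model scores well below the majority-class baseline on the test split despite perfect training accuracy, which is the kind of signal that would motivate the resampling, regularization, and tuning steps the abstract recommends.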
Pages: 619-628
Copyright (c) 2024 Gregorius Airlangga

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).