Evaluating the Effectiveness of Machine Learning Models for Cyberattack Detection: A Study on Model Generalization and Dataset Imbalance


  • Gregorius Airlangga *, Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia
  • (*) Corresponding Author
Keywords: Cyberattack Detection; Machine Learning; Imbalanced Datasets; Model Overfitting; XGBoost; RandomForest

Abstract

Detecting and preventing cyberattacks has become crucial for securing networks and data. This study evaluates several machine learning models for detecting and classifying cyberattacks: RandomForest, GradientBoosting, XGBoost, LightGBM, CatBoost, Support Vector Classifier (SVC), Logistic Regression, and an ensemble Voting Classifier. The models were tested on a real-world cybersecurity dataset with significant class imbalance, in which benign traffic vastly outnumbers malicious attacks. Some models, such as RandomForest and the Voting Classifier, achieved high training accuracy but overfit, with test accuracies not exceeding 34%. Boosting models such as XGBoost and LightGBM generalized better than RandomForest but still struggled with the complexity of the dataset. The primary limitations of this study are the dataset's imbalance, the high dimensionality of the features, and the models' tendency to overfit. These challenges highlight the need for more robust data preprocessing, hyperparameter tuning, and exploration of more advanced models, such as deep learning architectures, in future work. The findings provide insight into the challenges of applying machine learning to cyberattack detection and point toward directions for improving model performance in real-world settings.
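The two evaluation concerns raised in the abstract, class imbalance and the gap between training and test accuracy, can be illustrated with a minimal sketch. This is not the study's code: the toy labels, the 95/5 class split, the assumed training accuracy of 1.0, and the helper names `accuracy`, `balanced_accuracy`, and `generalization_gap` are all hypothetical and chosen only for illustration. It shows why plain accuracy can look strong on imbalanced traffic while per-class recall reveals that no attacks are detected.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def generalization_gap(train_acc, test_acc):
    """Training accuracy minus test accuracy; large values suggest overfitting."""
    return train_acc - test_acc


# Hypothetical toy data (not from the study): 95 benign flows, 5 attacks.
y_true = ["benign"] * 95 + ["attack"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["benign"] * 100

acc = accuracy(y_true, y_pred)           # high, despite detecting no attacks
bal = balanced_accuracy(y_true, y_pred)  # penalized for missing every attack

# Assumed training accuracy of 1.0 against the 34% test ceiling from the abstract.
gap = generalization_gap(1.0, 0.34)

print(f"accuracy={acc:.2f} balanced_accuracy={bal:.2f} gap={gap:.2f}")
```

In practice, a library metric such as scikit-learn's `balanced_accuracy_score` would typically be used in place of the hand-written helper; the pure-Python version here only keeps the sketch free of dependencies.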


Article History
Submitted: 2024-10-17
Published: 2024-10-31
How to Cite
Airlangga, G. (2024). Evaluating the Effectiveness of Machine Learning Models for Cyberattack Detection: A Study on Model Generalization and Dataset Imbalance. Journal of Information System Research (JOSH), 6(1), 619-628. https://doi.org/10.47065/josh.v6i1.6089
Section
Articles
