Optimizing Android Malware Detection on the Drebin Dataset Using Ensemble Learning

Haidar Nafiis Usmany; Wildanil Ghozi

doi:10.47065/bits.v7i4.9443

Haidar Nafiis Usmany (*), Universitas Dian Nuswantoro, Semarang, Indonesia
Wildanil Ghozi, Universitas Dian Nuswantoro, Semarang, Indonesia

(*) Corresponding Author


Keywords: Android Malware Detection; Random Forest Feature Selection; Ensemble Learning; Drebin Dataset; Hyperparameter Tuning

Abstract

The growing volume and complexity of Android malware call for detection systems that are accurate, efficient, and able to handle high-dimensional data. Machine learning is now widely adopted for this task in cybersecurity research, but classifier performance is often degraded by redundant features and suboptimal hyperparameter configurations. This study evaluates the effectiveness of combining Random Forest-based feature selection with modern boosting classifiers for Android malware detection. The experiments use the Drebin 215 dataset, chosen because it is one of the most widely used static-analysis benchmarks for Android malware detection, which allows a more objective comparison with previous studies. Features were ranked by Random Forest feature importance to reduce dimensionality before classification. Three classifiers were evaluated: XGBoost, Light Gradient Boosting Machine (LightGBM), and CatBoost. Each was tested under two scenarios: without hyperparameter optimization (non-tuning) and with hyperparameter optimization using Grid Search. Performance was measured by accuracy, precision, recall, F1-score, and ROC-AUC, together with computation time. All models achieved accuracy above 0.98 on the Drebin benchmark. LightGBM performed best, with an accuracy of 0.9900 and an F1-score of 0.9865. This advantage is likely due to its histogram-based learning and leaf-wise tree growth strategy, which enable faster and more effective learning on high-dimensional data.
However, these results were obtained on a single benchmark dataset, so further evaluation on more diverse datasets or in dynamic environments is needed to establish how well the models generalize to real-world scenarios. The findings indicate that combining Random Forest-based feature selection with boosting algorithms can be an effective way to improve both the efficiency and the performance of Android malware detection systems.
