Android Malware Detection Based on a Soft Voting Ensemble of LightGBM, Logistic Regression, and CatBoost
Abstract
The Android operating system faces increasingly complex and diverse malware. This research proposes an Android malware detection system based on a soft voting ensemble that integrates three algorithms (LightGBM, Logistic Regression, and CatBoost) to improve detection accuracy while maintaining computational efficiency. The dataset is CCCS-CIC-AndMal-2020, a highly imbalanced collection of more than 400,000 Android application samples. The proposed model uses hybrid features that combine static information (permissions, intents, and API calls from the AndroidManifest) with dynamic behavior (memory activity, runtime API calls, logcat, and network traffic in an emulated environment), balancing low extraction cost against improved robustness to obfuscation. Multi-stage preprocessing (IQR capping at 40×, StandardScaler, RFE to 150 features, and SMOTE at 30%) improves data quality and reduces dimensionality by 56% without losing important information. The ensemble members are weighted by F1-Macro (33.46% LightGBM, 30.99% Logistic Regression, 35.55% CatBoost), which is close to a 1:1:1 proportion. On the testing set, the model achieves Accuracy 95.58%, Balanced Accuracy 92.21%, F1-Macro 0.9208, True Positive Rate 100%, and False Alarm Rate 0.00%. Taken together, the True Positive Rate and False Alarm Rate indicate that the model detected every malware sample in the testing set without raising false positives on benign applications, which supports its suitability for production deployment. This research contributes by demonstrating that an efficient soft voting ensemble of only three models is effective for Android malware detection, evaluated with multi-dimensional metrics that are representative of imbalanced data.
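The weighted soft voting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example probabilities, and the rule of normalizing F1-Macro scores so they sum to one are all assumptions. Only the three weight values come from the abstract.

```python
from typing import Dict, List

# Voting weights as reported in the abstract (fractions of the total vote).
WEIGHTS: Dict[str, float] = {
    "lightgbm": 0.3346,
    "logistic_regression": 0.3099,
    "catboost": 0.3555,
}


def weights_from_f1(f1_scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model F1-Macro scores into voting weights.

    Assumption: weights are the scores divided by their sum. The abstract
    only says the weights are F1-Macro based.
    """
    total = sum(f1_scores.values())
    return {name: score / total for name, score in f1_scores.items()}


def soft_vote(probas: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of each model's class probabilities."""
    n_classes = len(next(iter(probas.values())))
    weight_sum = sum(weights[name] for name in probas)
    combined = [0.0] * n_classes
    for name, class_probs in probas.items():
        for k in range(n_classes):
            combined[k] += weights[name] * class_probs[k] / weight_sum
    return combined


def predict(probas: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Pick the class index with the highest combined probability."""
    combined = soft_vote(probas, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical probabilities for one sample, ordered as [benign, malware].
example = {
    "lightgbm": [0.40, 0.60],
    "logistic_regression": [0.70, 0.30],
    "catboost": [0.35, 0.65],
}
label = predict(example, WEIGHTS)
```

In this hypothetical sample, Logistic Regression favors the benign class, but the two boosting models outweigh it and the combined vote selects the malware class. Because the reported weights are nearly equal, the result stays close to a plain average of the three probability vectors.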
Pages: 2193-2204
Copyright (c) 2026 Ardian Danendra, Elkaf Rahmawan Pramudya

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).





















