Analysis of SelectKBest for VPN Traffic Classification Using Random Forest and SVM
Abstract
The growing use of Virtual Private Networks (VPNs) complicates network monitoring and traffic management, particularly the task of distinguishing VPN from non-VPN traffic accurately and efficiently. This study analyzes how effectively the SelectKBest feature selection method improves VPN traffic classification with the Random Forest and Support Vector Machine (SVM) algorithms. The dataset is the CIC VPN-NonVPN Traffic Dataset provided by the Canadian Institute for Cybersecurity (CIC), a widely used benchmark in network security research. Feature selection was performed using SelectKBest with the ANOVA F-test scoring function (f_classif), reducing the original feature set to the 15 most relevant features. Experimental results show that the Random Forest classifier achieved a test accuracy of 84.94%, along with high F1-score and ROC-AUC values, and an average cross-validation accuracy of 95.18% with low variance. In contrast, the SVM model performed relatively poorly, with a test accuracy of approximately 62%, indicating its limited ability to capture the complex patterns in network traffic data. Further analysis using ROC curves, Precision–Recall curves, confusion matrices, and learning curves confirms that Random Forest generalizes better than SVM. These findings indicate that combining SelectKBest with Random Forest delivers high classification performance while improving computational efficiency through feature dimensionality reduction, making it suitable for large-scale VPN traffic classification scenarios.
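To illustrate the feature-selection step named in the abstract, the following is a minimal sketch of what an ANOVA F-test ranking does: it scores each feature by the ratio of between-class to within-class variance and keeps the k highest-scoring features. This is not the authors' code. The function names, the toy data, and k=1 are hypothetical and chosen only for illustration; in scikit-learn the corresponding step would be `SelectKBest(score_func=f_classif, k=15)`.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples = len(values)
    n_classes = len(groups)
    grand_mean = mean(values)

    # Between-class variability: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class variability: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    if ms_within == 0:
        # Simplification for this sketch: a constant feature scores 0,
        # a perfectly separating feature scores infinity.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def select_k_best(rows, labels, k):
    """Return (sorted indices of the k highest-scoring features, all scores)."""
    n_features = len(rows[0])
    scores = [anova_f_score([row[j] for row in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: 3 flow features per sample; label 1 = VPN, 0 = non-VPN.
rows = [
    [0.10, 5.0, 1.0],
    [0.12, 3.0, 2.0],
    [0.11, 4.0, 1.5],
    [0.90, 4.5, 1.2],
    [0.95, 3.5, 1.8],
    [0.92, 4.2, 1.4],
]
labels = [0, 0, 0, 1, 1, 1]

selected, scores = select_k_best(rows, labels, k=1)
print(selected)  # [0]: only the first feature separates the two classes
```

In this toy example only the first feature differs clearly between the two classes, so it receives by far the largest F-score. The F-test scores each feature independently, so it does not account for interactions between features.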
Pages: 1533-1541
Copyright (c) 2026 Andri Nurdiansyah, Dwi Robiul R, Sururi Sururi, Nana Sujana

This work is licensed under a Creative Commons Attribution 4.0 International License.