Analysis of SelectKBest in VPN Traffic Classification Using Random Forest and SVM


  • Andri Nurdiansyah (*), Politeknik Pajajaran, Bandung, Indonesia
  • Dwi Robiul R, Politeknik Pajajaran, Bandung, Indonesia
  • Sururi Sururi, Politeknik Pajajaran, Bandung, Indonesia
  • Nana Sujana, Politeknik Pajajaran, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: SelectKBest; Random Forest; Support Vector Machine; VPN Traffic Classification

Abstract

The increasing use of Virtual Private Networks (VPNs) in modern networks poses significant challenges for network monitoring and traffic management, particularly in distinguishing VPN from non-VPN traffic accurately and efficiently. This study analyzes the effectiveness of the SelectKBest feature selection method in improving VPN traffic classification performance with the Random Forest and Support Vector Machine (SVM) algorithms. The dataset is the CIC VPN-NonVPN Traffic Dataset provided by the Canadian Institute for Cybersecurity (CIC), which is widely recognized as a standard benchmark in network security research. Feature selection was performed using SelectKBest with the ANOVA F-test (f_classif) scoring function, reducing the original feature set to the 15 most relevant features. Experimental results show that the Random Forest classifier achieved a test accuracy of 84.94%, along with high F1-score and ROC-AUC values, and an average cross-validation accuracy of 95.18% with low variance. In contrast, the SVM model performed relatively poorly, with a test accuracy of approximately 62%, indicating its limitations in capturing the complex patterns of network traffic data. Further analysis using ROC curves, Precision–Recall curves, confusion matrices, and learning curves confirms that Random Forest generalizes better than SVM. These findings indicate that the combination of SelectKBest and Random Forest not only delivers high classification performance but also improves computational efficiency through feature dimensionality reduction, making it suitable for large-scale VPN traffic classification scenarios.
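The workflow summarized in the abstract (ANOVA-based SelectKBest with k = 15, followed by Random Forest and SVM, evaluated by cross-validation and a held-out test set) could be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' implementation: the synthetic data from `make_classification` stands in for the CIC VPN-NonVPN features, and the feature scaling step, the 80/20 split, the 5-fold cross-validation, the RBF kernel, and all hyperparameters are assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for flow features (assumption: NOT the CIC dataset).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def build(classifier):
    # Scaling and selection sit inside the pipeline, so both are refit on
    # each training fold and no information leaks from validation data.
    return make_pipeline(
        StandardScaler(),                          # assumed preprocessing step
        SelectKBest(score_func=f_classif, k=15),   # keep the 15 top-scoring features
        classifier,
    )


models = {
    "random_forest": build(RandomForestClassifier(n_estimators=100, random_state=0)),
    "svm": build(SVC(kernel="rbf")),               # kernel choice is an assumption
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training split (fold count is assumed).
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    model.fit(X_train, y_train)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "cv_std": cv_scores.std(),
        "test_accuracy": model.score(X_test, y_test),
    }

# Number of features that survived selection in the fitted pipeline.
n_selected = models["random_forest"].named_steps["selectkbest"].get_support().sum()

for name, metrics in results.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Placing SelectKBest inside the pipeline is a design choice worth noting: if the features were selected on the full dataset before cross-validation, the validation folds would influence the selection and the cross-validation accuracy could be optimistically biased.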




Article History
Published: 2026-01-31
Section: Articles