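Optimizing Machine Learning Algorithms Using XGBoost Feature Selection for Breast Cancer Classification (Optimasi Algoritma Machine Learning Menggunakan Seleksi Fitur Xgboost Untuk Klasifikasi Kanker Payudara)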


  • Naufal Cahya Ramadhan (*), Universitas Buana Perjuangan, Karawang, Indonesia
  • Hanny Hikmayanti H, Universitas Buana Perjuangan, Karawang, Indonesia
  • Tatang Rohana, Universitas Buana Perjuangan, Karawang, Indonesia
  • Amril Mutoi Siregar, Universitas Buana Perjuangan, Karawang, Indonesia
  • (*) Corresponding Author
Keywords: Feature Selection; Extreme Gradient Boosting (XGBoost); K-Nearest Neighbor (KNN); Naïve Bayes; Random Forest

Abstract

This study analyzes the performance of the K-Nearest Neighbors (KNN), Naïve Bayes, and Random Forest algorithms in classifying breast cancer diagnoses using the Wisconsin Breast Cancer dataset. The problem addressed is how to improve the accuracy of breast cancer diagnosis classification through appropriate preprocessing techniques. The objective is to evaluate and compare the three algorithms after preprocessing that includes data cleaning (handling missing values, duplicate records, and outliers), feature selection using XGBoost to identify the most relevant features, and SMOTE oversampling to balance the class distribution in the dataset. Evaluation based on the confusion matrix shows that Random Forest performs best, with high accuracy, precision, recall, and F1-score, and reaches an AUC of 98% after SMOTE is applied. The combination of feature selection and SMOTE was shown to significantly improve model performance, although the effect differed by algorithm: KNN showed a decrease in performance with SMOTE, while Naïve Bayes improved considerably. The study demonstrates the importance of preprocessing in developing machine learning models for medical applications: appropriate techniques can significantly improve classification performance and lead to more accurate diagnoses.
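
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. This is not the authors' code. The function name smote_sketch, its parameters, and the toy data are hypothetical and chosen only for illustration, and a real experiment would more likely rely on an existing library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea).
    Hypothetical helper for illustration, not the paper's implementation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data made up for illustration: 6 majority vs 3 minority samples.
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]]

# Add just enough synthetic minority points to match the majority count.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because every synthetic point lies between two existing minority samples, the minority class grows without simply duplicating records. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics are computed on untouched data.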

Article History
Published: 2024-07-31