Optimizing Machine Learning Algorithms Using XGBoost Feature Selection for Breast Cancer Classification
Abstract
This study analyzes the performance of the K-Nearest Neighbors (KNN), Naïve Bayes, and Random Forest algorithms for breast cancer diagnosis classification on the Wisconsin Breast Cancer dataset. The problem addressed is how appropriate preprocessing can improve the accuracy of this classification. The objective is to evaluate and compare the three algorithms after a preprocessing stage that comprises three parts. The first is data cleaning, which handles missing values, duplicate records, and outliers. The second is feature selection with XGBoost to identify the most relevant features. The third is SMOTE oversampling to balance the class distribution. Evaluation with confusion-matrix metrics shows that Random Forest performs best, with high accuracy, precision, recall, and F1-score, and reaches an AUC of 98% after SMOTE is applied. The combination of feature selection and SMOTE significantly improved model performance overall, although KNN performed worse with SMOTE, whereas Naïve Bayes improved considerably. The study highlights the importance of preprocessing when developing machine learning models for medical applications: appropriate techniques can substantially improve classification performance and yield more accurate diagnoses.
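The abstract names a concrete sequence of steps:

- duplicate, missing-value, and outlier handling;
- XGBoost importance-based feature selection;
- SMOTE oversampling;
- three classifiers, evaluated with confusion-matrix metrics and AUC.

The Python sketch below strings those steps together. It is a minimal illustration under stated assumptions, not the authors' code. The following choices are ours and are not taken from the paper:

- **Data:** scikit-learn's `load_breast_cancer`, which is the Wisconsin Diagnostic variant and may differ from the version the authors used.
- **Missing values:** median imputation.
- **Outliers:** 1.5 * IQR clipping.
- **Feature selection:** keeping the 10 most important features.
- **Evaluation:** an 80/20 stratified split with default model hyperparameters.
- **SMOTE:** a hand-written minimal SMOTE; imbalanced-learn's `SMOTE` class is the usual library choice.

The helper names `smote`, `run`, `top_k`, and `Booster` are hypothetical. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different gradient-boosting implementation. Its scores should not be expected to reproduce the paper's figures.

```python
# Hedged sketch of the pipeline described in the abstract; NOT the authors' code.
# Assumptions: scikit-learn's copy of the Wisconsin Diagnostic Breast Cancer data,
# median imputation, 1.5*IQR clipping for outliers, top-10 features, default models.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.preprocessing import StandardScaler

try:  # prefer XGBoost; fall back to scikit-learn gradient boosting if it is not installed
    from xgboost import XGBClassifier as Booster
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def smote(X, y, minority, k=5, seed=0):
    """Minimal SMOTE: add synthetic minority points on segments to minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)  # number needed to balance the classes
    if n_new <= 0:
        return X, y
    # Column 0 of each neighbour list is normally the point itself, so request k + 1.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min, return_distance=False)
    base = rng.integers(0, len(X_min), n_new)             # random minority anchor points
    partner = nbrs[base, rng.integers(1, k + 1, n_new)]   # one of each anchor's k neighbours
    gap = rng.random((n_new, 1))                          # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def run(top_k=10, seed=42):
    X, y = load_breast_cancer(return_X_y=True)
    y = 1 - y  # scikit-learn encodes malignant as 0; make malignant the positive class 1
    _, keep = np.unique(np.column_stack([X, y]), axis=0, return_index=True)
    keep = np.sort(keep)     # preserve the original row order
    X, y = X[keep], y[keep]  # drop exact duplicate records
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=seed)

    # Every statistic below is fitted on the training split only, to avoid leakage.
    imputer = SimpleImputer(strategy="median").fit(X_tr)
    X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)
    q1, q3 = np.percentile(X_tr, [25, 75], axis=0)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    X_tr, X_te = np.clip(X_tr, lo, hi), np.clip(X_te, lo, hi)  # cap outliers at IQR fences

    # Feature selection: keep the top_k features by gradient-boosting importance.
    booster = Booster(random_state=seed).fit(X_tr, y_tr)
    top = np.argsort(booster.feature_importances_)[::-1][:top_k]
    X_tr, X_te = X_tr[:, top], X_te[:, top]

    scaler = StandardScaler().fit(X_tr)  # KNN and SMOTE both rely on distances
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    X_bal, y_bal = smote(X_tr, y_tr, minority=int(np.bincount(y_tr).argmin()))

    models = {"KNN": KNeighborsClassifier(n_neighbors=5), "Naive Bayes": GaussianNB(),
              "Random Forest": RandomForestClassifier(random_state=seed)}
    results = {}
    for name, model in models.items():
        for variant, (Xf, yf) in {"no SMOTE": (X_tr, y_tr), "SMOTE": (X_bal, y_bal)}.items():
            pred = model.fit(Xf, yf).predict(X_te)
            results[(name, variant)] = {
                "confusion": confusion_matrix(y_te, pred),  # rows: true 0/1, cols: predicted 0/1
                "accuracy": accuracy_score(y_te, pred),
                "precision": precision_score(y_te, pred, zero_division=0),
                "recall": recall_score(y_te, pred, zero_division=0),
                "f1": f1_score(y_te, pred, zero_division=0),
                "auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
            }
    return results


if __name__ == "__main__":
    for (name, variant), m in run().items():
        scores = "  ".join(f"{k}={v:.3f}" for k, v in m.items() if k != "confusion")
        print(f"{name:<13} {variant:<8} {scores}")
```

Two design choices in the sketch are worth noting:

- The imputer, outlier fences, feature ranking, scaler, and SMOTE are all fitted on the training split only. This keeps information from the test set out of training.
- Features are standardized before SMOTE, because both SMOTE's neighbour search and KNN depend on distances.

The abstract itself does not specify these details.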
Pages: 162-171
Copyright (c) 2024 Naufal Cahya Ramadhan, Hanny Hikmayanti H, Tatang Rohana, Amril Mutoi Siregar

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).