Optimization of the Support Vector Machine Method Using Feature Selection via Recursive Feature Elimination and Forward Selection for Breast Cancer Classification
Abstract
Cancer, a leading cause of death worldwide, results from abnormal cell growth that spreads beyond the boundaries of normal tissue. Breast cancer is one of the most common types of cancer, with approximately 2.26 million cases reported in 2020. This research aims to develop a more effective Support Vector Machine (SVM) model for breast cancer classification through efficient feature selection. Previous studies have used algorithms such as K-Nearest Neighbor and Logistic Regression to identify breast cancer; this study instead focuses on improving accuracy with two alternative feature selection methods, Recursive Feature Elimination (RFE) and Forward Selection. The dataset, sourced from the UCI Machine Learning Repository, consists of 569 instances with 32 attributes, each labeled as benign or malignant. The data were pre-processed through data cleaning, encoding, and feature selection, with RFE and Forward Selection used to identify the most important features for model training. The improved SVM model achieves a training accuracy of nearly 100% and a cross-validation accuracy of 97%, demonstrating the effectiveness of the proposed approach for breast cancer classification. In addition, the learning curve and test results indicate that the model is stable, with no signs of overfitting or underfitting. This study therefore shows that combining SVM with feature selection improves accuracy in breast cancer classification.
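The pipeline summarized above (encoding, RFE or Forward Selection, SVM, cross-validation, learning curve) can be sketched with scikit-learn. This is a minimal, non-authoritative sketch, not the authors' code. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`), 5-fold stratified cross-validation, an RBF-kernel `SVC` as the final classifier, a linear-kernel SVM to rank features for RFE, and a hypothetical `N_FEATURES = 5`; none of these settings are reported in the abstract.

```python
# Minimal sketch (assumptions: scikit-learn's bundled WDBC copy, 5-fold CV,
# RBF-kernel SVC, and a hypothetical N_FEATURES; not the paper's settings).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold, cross_validate, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_FEATURES = 5  # hypothetical number of features to keep

# load_breast_cancer drops the ID column and already encodes the diagnosis
# (0 = malignant, 1 = benign), standing in for the cleaning/encoding steps.
X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def rfe_svm():
    # RFE repeatedly drops the feature with the smallest squared weight of a
    # linear SVM until N_FEATURES remain; an RBF SVM is then trained on them.
    selector = RFE(SVC(kernel="linear"), n_features_to_select=N_FEATURES)
    return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))


def forward_svm():
    # Forward selection starts from no features and greedily adds the one
    # that most improves inner 3-fold CV accuracy of the SVM.
    selector = SequentialFeatureSelector(
        SVC(kernel="rbf"), n_features_to_select=N_FEATURES,
        direction="forward", cv=3)
    return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))


results = {}
for name, build in [("RFE", rfe_svm), ("Forward", forward_svm)]:
    # Selection is refit inside each training fold, so the held-out fold
    # never influences which features are chosen.
    scores = cross_validate(build(), X, y, cv=cv, return_train_score=True)
    results[name] = (scores["train_score"].mean(), scores["test_score"].mean())
    print(f"{name}: train={results[name][0]:.3f}  cv={results[name][1]:.3f}")

# Learning curve for the RFE model: a small, stable gap between training and
# validation accuracy suggests neither overfitting nor underfitting.
sizes, train_scores, val_scores = learning_curve(
    rfe_svm(), X, y, cv=cv, train_sizes=[0.2, 0.4, 0.6, 0.8, 1.0],
    shuffle=True, random_state=0)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):3d}  train={tr:.3f}  val={va:.3f}")
```

Keeping the scaler and selector inside the pipeline is the main design choice here: it prevents information from the validation folds leaking into feature selection. The accuracies this sketch prints depend on the assumed settings and need not match the 97% reported in the study.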
References
Abidin, M. I., Notodiputro, K. A., & Sartono, B. (2021). Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter. Indonesian Journal of Statistics and Its Applications, 5(1), 26–38. https://doi.org/10.29244/ijsa.v5i1p26-38
Adnyana, I. M. B. (2019). Penerapan Feature Selection untuk Prediksi Lama Studi Mahasiswa. Jurnal Sistem Dan Informatika (JSI), 13(2), 72–76.
Aji, P. W. S., & Suprianto, S. (2023). Stroke disease prediction using random forest method. Universitas Muhammadiyah Sidoarjo. http://dx.doi.org/10.21070/ups.2643
Barinov, R., Gai, V., Kuznetsov, G., & Golubenko, V. (2023). Automatic evaluation of neural network training results. Computers, 12(2), 26. https://doi.org/10.3390/computers12020026
Cancer Today. (2021, March). Global Cancer Observatory. https://gco.iarc.fr/today/data/factsheets/populations/360-indonesia-fact-sheets.pdf
Chazar, C., & Erawan, B. (2020). Machine learning diagnosis kanker payudara menggunakan algoritma support vector machine. INFORMASI (Jurnal Informatika Dan Sistem Informasi), 12(1), 67–80. https://doi.org/10.37424/informasi.v12i1.48
Farahdiba, S., Kartini, D., Nugroho, R. A., Herteno, R., & Saragih, T. H. (2023). Backward elimination for feature selection on breast cancer classification using logistic regression and support vector machine algorithms. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 17(4), 429. https://doi.org/10.22146/ijccs.88926
Fitriyani, N., Amalia, D. R., Handayani, H. H., & Masruriyah, A. F. N. (2023). Aplikasi Berbasis Web Berdasarkan Model Klasifikasi Algoritma SVM dan Logistic Regression Terhadap Data Diabetes. REMIK: Riset Dan E-Jurnal Manajemen Informatika Komputer, 7(4), 1762–1771. https://doi.org/10.33395/remik.v7i4.13001
Ginting, V. S., Kusrini, K., & Taufiq, E. (2020). Implementasi Algoritma C4.5 untuk Memprediksi Keterlambatan Pembayaran Sumbangan Pembangunan Pendidikan Sekolah Menggunakan Python. Inspiration: Jurnal Teknologi Informasi Dan Komunikasi, 10(1). https://doi.org/10.35585/inspir.v10i1.2535
Ismafillah, D., Rohana, T., & Cahyana, Y. (2023). Analisis algoritma pohon keputusan untuk memprediksi penyakit diabetes menggunakan oversampling smote. INFOTECH : Jurnal Informatika & Teknologi, 4(1), 27–36. https://doi.org/10.37373/infotech.v4i1.452
Koirunnisa, Siregar, A. M., & Faisal, S. (2023). Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification. Jurnal Ilmiah Teknik Elektro Dan Informatika (JITEKI), 9(4), 1131–1143.
Kusnawi, K., & Pratama, K. I. F. (2023). Komparasi Algoritma Supervised Learning dan Feature Selection pada Klasifikasi Penyakit Gagal Jantung. Indonesian Journal of Computer Science, 12(6). https://doi.org/10.33022/ijcs.v12i6.3487
Masruriyah, A., Novita, H., Sukmawati, C., Ramadhan, A., Arif, S., & Dermawan, B. (2024). Pengukuran Kinerja Model Klasifikasi dengan Data Oversampling pada Algoritma Supervised Learning untuk Penyakit Jantung. Computer Science (CO-SCIENCE), 4(1), 62–70. https://doi.org/10.31294/coscience.v4i1.2389
Maulana, A., Nugroho, A., & Romli, I. (2022). Optimalisasi support vector machine menggunakan particle swarm optimization untuk mendiagnosa penyakit Kanker Payudara. Journal of Practical Computer Science, 1(2), 1–11. https://doi.org/10.37366/jpcs.v1i2.940
Nikfalazar, S., Yeh, C.-H., Bedingfield, S., & Khorshidi, H. A. (2019). Missing data imputation using decision trees and fuzzy clustering with iterative learning. Knowledge and Information Systems, 62(6), 2419–2437. https://doi.org/10.1007/s10115-019-01427-1
Nurjanah, N., Rani, A. N., Masruriyah, A. F. N., & Handayani, H. H. (2023). Implementasi Model Klasifikasi Jenis Kanker Payudara Menggunakan Algoritma SVM dan Logistic Regression berbasis Web. Riset Dan E-Jurnal Manajemen Informatika Komputer.
Pratama, A. R. I., Latipah, S. A., & Sari, B. N. (2022). Optimasi klasifikasi curah hujan menggunakan support vector machine (SVM) dan recursive feature elimination (RFE). JIPI (Jurnal Ilmiah Penelitian Dan Pembelajaran Informatika), 7(2), 314–324. https://doi.org/10.29100/jipi.v7i2.2675
Qadrini, L., Hijrah, M., Hikmah, L., & Handayani, H. (2023). The application of the neighborhood cleaning rule in conjunction with random forest, k-fold cross-validation, and grid search for addressing imbalanced datasets. TIN: Terapan Informatika Nusantara, 3(8), 286–293. https://doi.org/10.47065/tin.v3i8.4124
Ramadhan, N. G. (2021). Comparative analysis of ADASYN-SVM and SMOTE-SVM methods on the detection of type 2 diabetes mellitus. Scientific Journal of Informatics, 8(2), 276–282. https://doi.org/10.15294/sji.v8i2.32484
Robbani, A. A. (2021). Klasifikasi Penderita Penyakit Diabetes Menggunakan Algoritma C4.5. Universitas Buana Perjuangan Karawang.
Setiawan, Y. (2023). Data Mining berbasis Nearest Neighbor dan Seleksi Fitur untuk Deteksi Kanker Payudara. Jurnal Informatika: Jurnal Pengembangan IT, 8(2), 89–96. https://doi.org/10.30591/jpit.v8i2.4994
Tuntun, R., Kusrini, K., & Kusnawi, K. (2022). Analisis Perbandingan Kinerja Algoritma Klasifikasi dengan Menggunakan Metode K-Fold Cross Validation. JURNAL MEDIA INFORMATIKA BUDIDARMA, 6(4), 2111. https://doi.org/10.30865/mib.v6i4.4681
World Health Organization. (2022, February 3). Cancer. https://www.who.int/news-room/fact-sheets/detail/cancer
Pages: 144-154
Copyright (c) 2024 Eva Senia Septiany, Hanny Hikmayanti Handayani, Tohirin Al Mudzakir, Anis Fitri Nur Masruriyah

This work is licensed under a Creative Commons Attribution 4.0 International License.