Optimizing the Support Vector Machine Method Using Recursive Feature Elimination and Forward Selection Feature Selection for Breast Cancer Classification


  • Eva Senia Septiany (*), Universitas Buana Perjuangan Karawang, Karawang, Indonesia
  • Hanny Hikmayanti Handayani, Universitas Buana Perjuangan Karawang, Karawang, Indonesia
  • Tohirin Al Mudzakir, Universitas Buana Perjuangan Karawang, Karawang, Indonesia
  • Anis Fitri Nur Masruriyah, Universitas Buana Perjuangan Karawang, Karawang, Indonesia
  • (*) Corresponding Author
Keywords: Breast Cancer; RFE; Forward Selection; SVM

Abstract

Cancer, a leading cause of death worldwide, results from abnormal cell proliferation that spreads beyond normal tissue boundaries. Breast cancer is one of the most common types, with approximately 2.26 million cases reported in 2020. This research aims to improve Support Vector Machine (SVM) classification of breast cancer through feature selection. Previous studies have applied algorithms such as K-Nearest Neighbor and Logistic Regression to breast cancer identification. This study focuses on improving accuracy with two feature selection methods, Recursive Feature Elimination (RFE) and Forward Selection. The dataset, sourced from the UCI Machine Learning Repository, consists of 569 instances with 32 features, each labeled benign or malignant. Pre-processing included data cleaning, encoding, and feature selection. RFE and Forward Selection were used to identify the most important features for model training. The resulting SVM model achieved a training accuracy of nearly 100% and a cross-validation accuracy of 97%, demonstrating the effectiveness of the proposed approach for breast cancer classification. In addition, the learning curve and testing indicated that the model is stable, with no signs of overfitting or underfitting. The study therefore shows that combining SVM with feature selection yields better accuracy in breast cancer classification.
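The workflow summarized in the abstract (feature selection with RFE or Forward Selection, an SVM classifier, cross-validation, and a learning curve) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn and its bundled copy of the Wisconsin Diagnostic Breast Cancer data (569 instances, 30 numeric features after the ID and diagnosis columns are dropped), and the linear kernel, the standard scaling step, the number of selected features (`N_FEATURES`), and the fold counts are all illustrative choices that the abstract does not specify.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.model_selection import cross_val_score, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy of the Wisconsin Diagnostic dataset.
X, y = load_breast_cancer(return_X_y=True)

N_FEATURES = 5  # Assumption: the abstract does not say how many features were kept.


def build_pipeline(selector):
    """Scale the features, select a subset, then classify with a linear SVM."""
    return make_pipeline(StandardScaler(), selector, SVC(kernel="linear"))


# RFE repeatedly fits the estimator and drops the feature with the smallest weight.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=N_FEATURES)

# Forward selection starts from no features and greedily adds the one that
# improves the inner cross-validated score the most.
forward = SequentialFeatureSelector(
    SVC(kernel="linear"),
    n_features_to_select=N_FEATURES,
    direction="forward",
    cv=3,
)

# Selection happens inside each fold, so the held-out data never influences
# which features are chosen.
scores = {}
for name, selector in {"RFE": rfe, "Forward Selection": forward}.items():
    scores[name] = cross_val_score(build_pipeline(selector), X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")

# Learning curve for the RFE pipeline: a persistent gap between training and
# validation accuracy would suggest overfitting.
train_sizes, train_scores, val_scores = learning_curve(
    build_pipeline(rfe), X, y, cv=5, train_sizes=[0.3, 0.6, 1.0]
)
for size, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={size}: train accuracy = {tr:.3f}, validation accuracy = {va:.3f}")
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: selecting features on the full dataset before cross-validation would leak information from the validation folds and inflate the reported accuracy.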




Article History
Published: 2024-07-31
Section: Articles