Evaluation of Genetic Algorithm Feature Selection for Stroke Classification Using K-Nearest Neighbor on an Overlapping Dataset

Aqmal Syarif Fadilah; Siska Kurnia Gusti; Iwan Iskandar; Iis Afrianty

doi:10.47065/bulletinds.v5i3.10179

Aqmal Syarif Fadilah, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Siska Kurnia Gusti (*), Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Iwan Iskandar, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Iis Afrianty, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia

(*) Corresponding Author

Keywords: Genetic Algorithm; K-Nearest Neighbor; Overlapping Dataset; Stroke; Feature Selection

Abstract

Stroke is a disease that disrupts brain function because blood flow is obstructed, and it requires early detection to reduce the risk of disability and death. K-Nearest Neighbor (KNN) is widely used for stroke prediction, but the algorithm is sensitive to irrelevant features and complex data distributions. This study evaluates how feature selection with a Genetic Algorithm (GA) affects KNN classification performance on a stroke prediction dataset characterized by high overlap between classes. The dataset consists of 15,000 patient records with 21 initial attributes, which expanded to 38 attributes after preprocessing. Models were evaluated using 10-fold stratified cross-validation at three train-test split ratios, namely 90:10, 80:20, and 70:30, with K = 13. The results show that GA substantially reduced the dimensionality, to 15–17 selected features, and increased the average internal validation fitness by around 3%. However, evaluation on the test data shows that both the KNN and GA-KNN models remain stagnant at 48–52% accuracy, with ROC-AUC values close to 0.500, the level of a random classifier. This indicates that GA tends to find locally optimal solutions that are sensitive to how the training data are partitioned, so the model struggles to generalize to new data. The extreme overlap in this dataset is further demonstrated by comparative experiments using feature engineering and the Random Forest algorithm, both of which also stalled at around 50% accuracy. These results indicate that GA feature selection successfully reduces data complexity but does not by itself solve the class separation problem when overlap is an inherent characteristic of the data.
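
The GA-KNN wrapper described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python on synthetic data in which both classes are drawn from the same distribution, as a stand-in for extreme class overlap. The function names, the population size, the number of generations, the mutation rate, and the six-feature toy dataset are all assumptions made for illustration. Only K = 13, the 10 stratified folds, and the 80:20 split come from the abstract.

```python
import random

def knn_predict(train_X, train_y, x, k, mask):
    """Majority vote of the k nearest training rows, using only masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    nearest = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def accuracy(train_X, train_y, test_X, test_y, k, mask):
    hits = sum(knn_predict(train_X, train_y, x, k, mask) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def stratified_folds(y, n_folds, rng):
    """Deal the indices of each class round-robin so folds keep the class ratio."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

def cv_fitness(X, y, mask, k, folds):
    """Mean held-out accuracy across folds; an empty feature subset scores 0."""
    if not any(mask):
        return 0.0
    scores = []
    for fold in folds:
        held = set(fold)
        rest = [i for i in range(len(y)) if i not in held]
        scores.append(accuracy([X[i] for i in rest], [y[i] for i in rest],
                               [X[i] for i in fold], [y[i] for i in fold], k, mask))
    return sum(scores) / len(scores)

def ga_select(X, y, k, folds, rng, pop_size=8, generations=4, p_mut=0.1):
    """Elitist GA over boolean feature masks; returns (best_mask, best_fitness)."""
    n = len(X[0])
    pop = [[True] * n]  # seed with the all-features baseline (an assumption)
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    cache = {}
    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_fitness(X, y, mask, k, folds)
        return cache[key]
    for _ in range(generations):
        parents = sorted(pop, key=fit, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = parents + children
    best = max(pop, key=fit)
    return best, fit(best)

def make_overlapping_data(n_rows, n_features, rng):
    """Both classes share one distribution, so no feature carries any signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [i % 2 for i in range(n_rows)]
    return X, y

def run_demo(seed=0):
    rng = random.Random(seed)
    X, y = make_overlapping_data(150, 6, rng)
    train_X, train_y, test_X, test_y = X[:120], y[:120], X[120:], y[120:]  # 80:20
    folds = stratified_folds(train_y, 10, rng)
    all_on = [True] * 6
    mask, ga_fit = ga_select(train_X, train_y, 13, folds, rng)
    return {
        "baseline_cv": cv_fitness(train_X, train_y, all_on, 13, folds),
        "ga_cv": ga_fit,
        "n_selected": sum(mask),
        "baseline_test": accuracy(train_X, train_y, test_X, test_y, 13, all_on),
        "ga_test": accuracy(train_X, train_y, test_X, test_y, 13, mask),
    }

if __name__ == "__main__":
    print(run_demo())
```

Because the sketch keeps the best masks in every generation and seeds the population with the all-features mask, the internal fitness returned by the GA can never fall below the all-features cross-validation score. That guarantee applies only to the folds the GA optimized against. On the held-out rows nothing prevents the selected subset from performing no better than the baseline, which is the gap between internal validation and test performance that the abstract reports.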

Article History

Published: 2026-06-23

How to Cite

Fadilah, A., Gusti, S., Iskandar, I., & Afrianty, I. (2026). Evaluasi Seleksi Fitur Algoritma Genetika pada Klasifikasi Penyakit Stroke Menggunakan K-Nearest Neighbor pada Dataset Overlapping. Bulletin of Data Science, 5(3), 233-243. https://doi.org/10.47065/bulletinds.v5i3.10179

Issue

Vol 5 No 3 (2026): June 2026
Pages: 233-243

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).
