Evaluation of Genetic Algorithm Feature Selection for Stroke Disease Classification Using K-Nearest Neighbor on an Overlapping Dataset


  • Aqmal Syarif Fadilah Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Siska Kurnia Gusti (*) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Iwan Iskandar Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Iis Afrianty Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • (*) Corresponding Author
Keywords: Genetic Algorithm; K-Nearest Neighbor; Overlapping Dataset; Stroke; Feature Selection

Abstract

Stroke is a disease that disrupts brain function because blood flow is obstructed, and early detection is needed to reduce the risk of disability and death. K-Nearest Neighbor (KNN) is widely used for stroke prediction, but the algorithm is sensitive to irrelevant features and complex data distributions. This study evaluates how effectively Genetic Algorithm (GA) feature selection improves KNN classification on a stroke prediction dataset characterized by high overlap between classes. The dataset consists of 15,000 patient records with 21 initial attributes, which expanded to 38 attributes after preprocessing. The models were evaluated using 10-fold stratified cross-validation at three train/test split ratios (90:10, 80:20, and 70:30), with K = 13. The results show that GA substantially reduced data dimensionality, down to 15–17 selected features, and raised the average internal validation fitness by about 3%. However, on the test data both the KNN and GA-KNN models stagnated at an accuracy of 48–52%, with ROC-AUC values close to 0.500. This indicates that GA tends to converge on locally optimal solutions that are sensitive to how the training data are split, which makes it difficult for the model to generalize to new data. The extreme degree of overlap in this dataset is further demonstrated by comparative experiments using feature engineering and the Random Forest algorithm, both of which also plateaued at around 50% accuracy. These results indicate that GA feature selection helps manage data complexity but cannot by itself resolve poor class separation when overlap is an inherent characteristic of the data.
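
The abstract describes a wrapper setup: a GA searches over feature subsets, and KNN (K = 13) scored with stratified 10-fold cross-validation serves as the fitness. The abstract does not report the GA's encoding, operators, or rates, so the sketch below is an assumed, textbook variant rather than the authors' implementation. Each chromosome is a binary mask over features, and fitness is the stratified CV accuracy of a hand-written KNN. The search uses tournament selection, one-point crossover, bit-flip mutation, and single-individual elitism. The population size, generation count, crossover and mutation probabilities, and every function name (`ga_select`, `cv_accuracy`, `stratified_folds`, `knn_predict`, `make_demo_data`) are hypothetical. The example runs on a small synthetic dataset, not the stroke data, and assumes all features are already numeric and on comparable scales.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, mask, k=13):
    """Majority vote of the k nearest training rows, using only features where mask[j] == 1."""
    dists = []
    for row, label in zip(train_X, train_y):
        # Squared Euclidean distance restricted to the selected features
        d = sum((row[j] - x[j]) ** 2 for j in range(len(x)) if mask[j])
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def stratified_folds(y, n_folds, rng):
    """Split sample indices into n_folds folds that keep the class proportions."""
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % n_folds].append(i)  # deal each class out round-robin
    return folds


def cv_accuracy(X, y, mask, k=13, n_folds=10, seed=0):
    """Stratified k-fold CV accuracy of KNN on the masked features (used as GA fitness)."""
    if not any(mask):
        return 0.0  # an empty feature subset is treated as invalid
    folds = stratified_folds(y, n_folds, random.Random(seed))
    correct = 0
    for fold in folds:
        held_out = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct += sum(knn_predict(tr_X, tr_y, X[i], mask, k) == y[i] for i in fold)
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=15, p_cross=0.8, p_mut=0.05, k=13, seed=42):
    """Binary-mask GA. Operators and default rates are assumptions, not the paper's settings."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # memoize: CV is the expensive step
            cache[key] = cv_accuracy(X, y, mask, k=k)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < p_cross:
                cut = rng.randint(1, n_feat - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


def make_demo_data(n_per_class=50, n_noise=4, seed=1):
    """Synthetic toy data (not the stroke dataset): 2 informative features plus noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(center, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    mask, fit = ga_select(X, y)
    print("selected feature indices:", [j for j, g in enumerate(mask) if g])
    print("stratified 10-fold CV accuracy:", round(fit, 3))
```

Because the fitness is computed on the same cross-validation folds that guide the search, the score the GA returns is an internal and optimistic estimate, and a GA can raise it partly by fitting fold-specific noise. Scoring the selected mask on a separate held-out split, as the study does with its 90:10, 80:20, and 70:30 test sets, is what shows whether the internal gain carries over to new data.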

Article History
Published: 2026-06-23
How to Cite
Fadilah, A., Gusti, S., Iskandar, I., & Afrianty, I. (2026). Evaluasi Seleksi Fitur Algoritma Genetika pada Klasifikasi Penyakit Stroke Menggunakan K-Nearest Neighbor pada Dataset Overlapping. Bulletin of Data Science, 5(3), 233-243. https://doi.org/10.47065/bulletinds.v5i3.10179
Issue: Vol. 5, No. 3 (2026)
Section: Articles
