Feature Selection Using Eigen Vector to Improve the Performance of K-Means Clustering in Data Grouping


  • Nugroho Syahputra * Universitas Sumatera Utara, Medan, Indonesia
  • Muhammad Zarlis Universitas Sumatera Utara, Medan, Indonesia
  • Syahril Efendi Universitas Sumatera Utara, Medan, Indonesia
  • (*) Corresponding Author
Keywords: Clustering; K-Means Clustering; Eigen Vector; Sum of Square Error

Abstract

When data are grouped with K-Means Clustering, a large number of attributes in the data set can increase the number of iterations the algorithm needs. In this research, Eigen Vector is used to perform feature selection on the data set, and the selected attributes are then clustered using K-Means Clustering. Two data sets are used: the Wine Quality Dataset from the UCI Machine Learning Repository, with 11 attributes, 4898 records and 7 classes, and the South German Credit Dataset from kaggle.com, with 20 attributes, 1000 records and 2 classes. K-Means without feature selection required 11 iterations on the Wine Quality Dataset and 10 iterations on the South German Credit Dataset. K-Means with Eigen Vector feature selection required 5 iterations on the Wine Quality Dataset and 4 iterations on the South German Credit Dataset. Clustering quality was evaluated using the Sum of Square Error (SSE). Without feature selection, the SSE is 678.5735 on the Wine Quality Dataset and 1534.3167 on the South German Credit Dataset. With Eigen Vector feature selection, the SSE is 383.0517 on the Wine Quality Dataset and 469.0698 on the South German Credit Dataset. These results indicate that the proposed method, K-Means Clustering with feature selection using Eigen Vector, reduces the clustering error and the number of iterations on both data sets.
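
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the eigenvector is turned into a feature ranking, so everything specific here is an assumption made for illustration: min-max normalization, the covariance matrix as the input to the eigen decomposition, ranking attributes by the absolute value of their entry in the principal eigenvector, and the function names themselves. The data are synthetic, not the Wine Quality or South German Credit data sets, so the numbers it prints are unrelated to the results reported above.

```python
import numpy as np


def min_max_normalize(X):
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.min(axis=0)) / span


def select_features_by_eigenvector(X, n_keep):
    """Return the indices of the n_keep attributes with the largest absolute
    entry in the principal eigenvector of the covariance matrix.
    (Assumed ranking criterion; the paper's exact rule is not given.)"""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    principal = eigvecs[:, -1]              # eigenvector of the largest eigenvalue
    order = np.argsort(-np.abs(principal))
    return np.sort(order[:n_keep])


def kmeans(X, k, max_iter=100, seed=0):
    """Plain Lloyd's K-Means. Returns labels, centroids, the number of
    iterations until the assignments stopped changing, and the SSE."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # SSE: sum of squared distances from each point to its own centroid
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, iteration, sse


def make_synthetic_data(seed=42):
    """Two informative attributes (two separated groups) plus four noise attributes."""
    rng = np.random.default_rng(seed)
    informative = np.vstack([
        rng.normal(0.0, 0.3, size=(100, 2)),
        rng.normal(3.0, 0.3, size=(100, 2)),
    ])
    noise = rng.uniform(0.0, 1.0, size=(200, 4))
    return min_max_normalize(np.hstack([informative, noise]))


if __name__ == "__main__":
    X = make_synthetic_data()
    keep = select_features_by_eigenvector(X, n_keep=2)
    _, _, it_full, sse_full = kmeans(X, k=2)
    _, _, it_sel, sse_sel = kmeans(X[:, keep], k=2)
    print("selected attribute indices:", keep)
    print("all attributes      -> iterations:", it_full, "SSE:", round(sse_full, 4))
    print("selected attributes -> iterations:", it_sel, "SSE:", round(sse_sel, 4))
```

Note that SSE values computed on different numbers of attributes are not on the same scale, since every dropped attribute also removes its contribution to the squared distances. The sketch only shows where the iteration count and SSE in such a comparison come from.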


Author Biographies

Nugroho Syahputra, Universitas Sumatera Utara, Medan

Master's Program in Informatics Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara

Muhammad Zarlis, Universitas Sumatera Utara, Medan

Faculty of Computer Science and Information Technology, Universitas Sumatera Utara

Syahril Efendi, Universitas Sumatera Utara, Medan

Faculty of Computer Science and Information Technology, Universitas Sumatera Utara



Article History
Submitted: 2022-08-01
Published: 2022-09-29
How to Cite
Syahputra, N., Zarlis, M., & Efendi, S. (2022). Seleksi Fitur Menggunakan Eigen Vector Untuk Peningkatan Kinerja K-Means Clustering Dalam Pengelompokan Data. Building of Informatics, Technology and Science (BITS), 4(2), 1010−1017. https://doi.org/10.47065/bits.v4i2.2022