In-Situ Database Machine Learning: Evaluating SQL-Based K-Means for E-Commerce Sales Analysis


  • Joanne Polama Putri Sembiring Institut Teknologi Sumatera, South Lampung, Indonesia
  • Rajif Agung Yunmar * Institut Teknologi Sumatera, South Lampung, Indonesia
  • (*) Corresponding Author
Keywords: K-Means; In-Situ Database Machine Learning; Stored Procedure; Silhouette Score; E-Commerce

Abstract

Conventional machine learning techniques, such as K-Means clustering, often require transferring data out of the database for analysis, which introduces inefficiencies, potential data inconsistencies, and security and privacy concerns. This research proposes an in-situ database machine learning approach that implements the K-Means clustering algorithm directly within the database management system using stored procedures. The methodology comprises five main stages: collection of a public dataset (from Kaggle), data preparation and cleaning, data transformation through cyclical feature encoding to preserve temporal context, in-database K-Means implementation, and performance evaluation. The evaluation used the Silhouette Score and execution time to compare the proposed in-situ approach with a conventional off-database implementation. The in-situ database clustering achieved a Silhouette Score of S ≈ 0.914 in 0.0121 seconds. The conventional off-database clustering achieved an identical score but required 1.2956 seconds. To reach the same cluster quality, the in-situ method is therefore approximately 107.07 times faster than the off-database method (1.2956 s / 0.0121 s). The identical score confirms that the SQL-based implementation is mathematically consistent with the conventional one, and a score this close to 1 indicates excellent cluster quality. These findings demonstrate that in-situ database clustering matches off-database clustering in quality while running far faster. This efficiency, validated by the successful categorization of e-commerce sales data into distinct demand patterns, lays a strong foundation for more effective predictive analytics and data-driven decision-making, particularly for inventory planning.
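The cyclical feature encoding stage and the reported speedup can be illustrated with a minimal sketch. It assumes the temporal feature is the month of the year (period 12) and uses the common sine/cosine mapping; the abstract does not state which temporal fields or which formula the authors used, so the function names and the choice of month here are hypothetical. The sketch shows why the encoding helps a distance-based algorithm such as K-Means: December and January end up close together instead of 11 units apart.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. month 1..12) to a point on the unit circle.

    Assumption: plain sine/cosine encoding; the source does not give its formula.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def euclidean(a, b):
    """Euclidean distance between two 2-D points, as used by K-Means."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical example values: months of the year, period 12.
january = encode_cyclical(1, 12)
june = encode_cyclical(6, 12)
december = encode_cyclical(12, 12)

# With raw month numbers, December (12) is farther from January (1) than June (6) is.
# After encoding, December is January's neighbour on the circle.
dist_dec_jan = euclidean(december, january)
dist_jun_jan = euclidean(june, january)
print("December is closer to January than June is:", dist_dec_jan < dist_jun_jan)

# Speedup implied by the two execution times reported in the abstract.
off_database_seconds = 1.2956
in_situ_seconds = 0.0121
speedup = off_database_seconds / in_situ_seconds
print("Speedup:", round(speedup, 2))
```

Each temporal value becomes two numeric columns (sine and cosine), which can be stored alongside the other features before clustering.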




Article History
Submitted: 2025-10-06
Published: 2025-10-29
How to Cite
Sembiring, J., & Yunmar, R. (2025). In-Situ Database Machine Learning: Evaluating SQL-Based K-Means for E-Commerce Sales Analysis. Journal of Information System Research (JOSH), 7(1), 85-100. https://doi.org/10.47065/josh.v7i1.8468
Section
Articles