Implementation of K-Means as a Self-Labeling Mechanism in an Ensemble Voting Classifier Architecture for Sales Prediction of Micro, Small, and Medium Enterprises (MSMEs) on Unlabeled Data


  • Muhammad Aqil Fahmi (*), Universitas Dian Nuswantoro, Semarang, Indonesia
  • Defri Kurniawan Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: Sales Prediction; Self-Labeling; K-Means Clustering; Ensemble Voting; Loss

Abstract

Sales forecasting in the Micro, Small, and Medium Enterprises (MSME) sector is hampered by noisy, fluctuating data and by the absence of the class labels that supervised learning models need for training. This study proposes a sequential hybrid architecture in which the K-Means algorithm serves as a Self-Labeling mechanism that automatically converts raw, unlabeled transaction data into class labels (“Low” and “High”). These synthetic labels are then used to train an Ensemble Voting Classifier that aggregates the predictions of XGBoost, LightGBM, and CatBoost. In the experimental evaluation, the single XGBoost model achieves slightly higher accuracy (96.24%) than the Ensemble model (96.07%), but the hybrid Ensemble Voting model is better calibrated: it achieves the lowest Loss value, 0.1532, compared with 0.1646 for XGBoost and 0.1772 for LightGBM, which indicates more reliable and stable prediction confidence. The model is also well balanced, with an F1-Score of 0.95 and a Recall of 0.96 for the majority class. These results indicate that the hybrid approach is effective in reducing uncertainty in MSME stock management.
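The pipeline summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation and rests on assumptions the abstract does not state: K-Means (k = 2) is run on a single numeric sales feature, the cluster with the higher centroid is named “High”, the Ensemble Voting Classifier uses equal-weight soft (probability-averaging) voting, and the reported Loss is binary log loss. The helper names (`kmeans_1d_two_clusters`, `self_label`, `soft_vote`, `log_loss`) are hypothetical, and the per-model probabilities are made-up placeholders standing in for trained XGBoost, LightGBM, and CatBoost outputs.

```python
import math
from statistics import mean


def kmeans_1d_two_clusters(values, max_iter=100):
    """Lloyd's K-Means with k = 2 on one numeric feature (e.g. sales quantity).

    Returns a 0/1 cluster index per value (0 = lower centroid) and the centroids.
    """
    # Initialise the two centroids at the smallest and largest observations.
    c_low, c_high = min(values), max(values)
    assign = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid wins; ties go to the low cluster.
        new_assign = [0 if abs(v - c_low) <= abs(v - c_high) else 1 for v in values]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        low = [v for v, a in zip(values, assign) if a == 0]
        high = [v for v, a in zip(values, assign) if a == 1]
        if low:
            c_low = mean(low)
        if high:
            c_high = mean(high)
    return assign, (c_low, c_high)


def self_label(values):
    """Turn unlabeled sales values into synthetic 'Low'/'High' class labels."""
    assign, centroids = kmeans_1d_two_clusters(values)
    # In 1-D, c_low never overtakes c_high, so cluster 1 is the high-sales group.
    return ["High" if a == 1 else "Low" for a in assign], centroids


def soft_vote(member_probs):
    """Equal-weight soft voting: average each sample's P(High) over the models."""
    return [mean(p) for p in zip(*member_probs)]


def log_loss(y_true, p_high, eps=1e-15):
    """Binary log loss (cross-entropy); y_true uses 1 = 'High', 0 = 'Low'."""
    total = 0.0
    for y, p in zip(y_true, p_high):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(y_true)


if __name__ == "__main__":
    sales = [3, 5, 4, 6, 40, 45, 38, 5, 42]  # hypothetical sales quantities
    labels, centroids = self_label(sales)
    print(labels)  # → ['Low', 'Low', 'Low', 'Low', 'High', 'High', 'High', 'Low', 'High']

    # Hypothetical held-out labels and P(High) from three already-trained models.
    y = [0, 1, 1, 0]
    members = {
        "XGBoost": [0.10, 0.80, 0.95, 0.40],
        "LightGBM": [0.20, 0.70, 0.90, 0.30],
        "CatBoost": [0.15, 0.75, 0.85, 0.20],
    }
    ensemble = soft_vote(list(members.values()))
    for name, probs in members.items():
        print(f"{name}: {log_loss(y, probs):.4f}")
    print(f"Ensemble: {log_loss(y, ensemble):.4f}")
```

Because −log is convex, the log loss of the averaged probabilities can never exceed the average of the members' log losses, which is one reason soft voting tends to improve calibration. It can still be higher than the best single member's loss, as happens with the hypothetical numbers above.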

Article History
Submitted: 2025-11-24
Published: 2025-12-26
How to Cite
Fahmi, M., & Kurniawan, D. (2025). Implementasi K-Means sebagai Mekanisme Self-Labeling dalam Arsitektur Ensemble Voting Classifier untuk Prediksi Penjualan Usaha Mikro Kecil dan Menengah (UMKM) pada Data Tanpa Label. Building of Informatics, Technology and Science (BITS), 7(3), 2006-2016. https://doi.org/10.47065/bits.v7i3.8779
Section: Articles
