A Comparative Study of Machine Learning Classifiers with SMOTE for Predicting Purchase Intention


  • Khairunnisa Khairunnisa * Politeknik Negeri Sriwijaya, Palembang, Indonesia
  • Sopian Soim Politeknik Negeri Sriwijaya, Palembang, Indonesia
  • Lindawati Lindawati Politeknik Negeri Sriwijaya, Palembang, Indonesia
  • (*) Corresponding Author
Keywords: Class Imbalance; E-Commerce; Machine Learning; Predictive Modeling; SMOTE

Abstract

The rapid growth of e-commerce has made it increasingly important for online platforms to understand user behavior, particularly when predicting purchase intention. This study applies three machine learning models, Logistic Regression, Random Forest, and Gradient Boosting, to classify purchase intention using real transaction session data. A primary obstacle is the class imbalance in the dataset: 10,422 records indicate no purchase, while only 1,908 indicate a completed purchase. This disparity can bias a model toward the dominant class and limit its ability to detect the minority class, which in this case is the actual purchase. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied during the data preprocessing phase. Accuracy, precision, recall, and F1-score were used to evaluate each model. The results indicate that, after applying SMOTE, the Random Forest model attained the best accuracy at 93%, followed by Gradient Boosting at 90% and Logistic Regression at 84%. These findings demonstrate that the use of SMOTE significantly improves model sensitivity and balance. This study provides useful insights into designing fairer and more effective predictive systems in the field of e-commerce.
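
To make the imbalance and the oversampling idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It uses only the class counts reported in the abstract; the function name `smote_sketch`, the toy data points, and the parameter values (`k`, `seed`) are hypothetical and chosen for illustration. The sketch follows the usual description of SMOTE: each synthetic sample is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random

# Class counts reported in the abstract.
N_NO_PURCHASE = 10422
N_PURCHASE = 1908


def smote_sketch(minority, n_new, k=5, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper).

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


total = N_NO_PURCHASE + N_PURCHASE
# Accuracy of a model that always predicts "no purchase".
majority_baseline_accuracy = N_NO_PURCHASE / total
# Synthetic purchase records needed for a 1:1 class ratio.
samples_to_balance = N_NO_PURCHASE - N_PURCHASE

# Hypothetical 2-D minority points, used only to exercise the sketch.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],
                [1.2, 2.5], [1.8, 1.6], [2.3, 2.0]]
toy_synthetic = smote_sketch(toy_minority, n_new=10, k=3)

print(f"total records: {total}")
print(f"always-'no purchase' accuracy: {majority_baseline_accuracy:.3f}")
print(f"synthetic samples for 1:1 balance: {samples_to_balance}")
print(f"toy synthetic points generated: {len(toy_synthetic)}")
```

With the reported counts, a classifier that always predicts "no purchase" would already be correct on roughly 84.5% of records while detecting no purchases at all, which is why precision, recall, and F1-score are reported alongside accuracy. Oversampling is commonly applied to the training split only, so that evaluation data contains no synthetic records; the abstract does not state how the split was handled here.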

Article History
Submitted: 2025-06-17
Published: 2025-09-02
How to Cite
Khairunnisa, K., Soim, S., & Lindawati, L. (2025). A Comparative Study of Machine Learning Classifiers with SMOTE for Predicting Purchase Intention. Building of Informatics, Technology and Science (BITS), 7(2), 993-1004. https://doi.org/10.47065/bits.v7i2.7615
Section
Articles