A Comparative Study of Machine Learning Classifiers with SMOTE for Predicting Purchase Intention
Abstract
The rapid growth of e-commerce has made it increasingly important for online platforms to understand user behavior, particularly to predict purchase intention. This study applies three machine learning models, Logistic Regression, Random Forest, and Gradient Boosting, to classify purchase intention using real transaction session data. A primary obstacle is class imbalance in the dataset: 10,422 records indicate no purchase, while only 1,908 indicate a completed purchase. This disparity can bias a model toward the majority class and limit its ability to detect the minority class, which here is the actual purchase. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied during data preprocessing. Accuracy, precision, recall, and F1-score were used to evaluate each model. After SMOTE was applied, the Random Forest model attained the best accuracy of 93%, followed by Gradient Boosting at 90% and Logistic Regression at 84%. These findings indicate that SMOTE improves model sensitivity and class balance. This study provides useful insights into designing fairer and more effective predictive systems for e-commerce.
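The class counts, the oversampling step, and the evaluation metrics named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract does not say which SMOTE library or settings were used, and the function names `imbalance_summary`, `smote_sketch`, and `classification_metrics`, the neighbour count `k`, and the seed are all hypothetical choices made for illustration.

```python
import math
import random

# Class counts reported in the abstract.
N_NO_PURCHASE = 10422
N_PURCHASE = 1908


def imbalance_summary(n_major, n_minor):
    """Return (minority share, majority:minority ratio, majority-only accuracy)."""
    total = n_major + n_minor
    return n_minor / total, n_major / n_minor, n_major / total


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Illustrative only; k and seed are assumptions, not values from the study."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Purchases are about 15.5% of sessions, a ratio of roughly 5.46:1.
share, ratio, baseline = imbalance_summary(N_NO_PURCHASE, N_PURCHASE)
print(f"minority share={share:.3f}, ratio={ratio:.2f}:1, "
      f"majority-only accuracy={baseline:.3f}")
```

With these counts, a classifier that always predicts "no purchase" would already score about 84.5% accuracy while detecting no purchases at all, which is why recall and F1-score are reported alongside accuracy.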
Pages: 993-1004
Copyright (c) 2025 Khairunnisa Khairunnisa, Sopian Soim, Lindawati Lindawati

This work is licensed under a Creative Commons Attribution 4.0 International License.