Handling Imbalanced Data Sets Using SMOTE and ADASYN to Improve Classification Performance of Ecoli Data Sets
Abstract
In the digital era, machine learning is a technology in demand by organizations and individuals alike, and the ability to process data efficiently has become essential. As the amount of data grows, machine learning faces various problems, and one that is frequently encountered is class imbalance. Class imbalance is a condition in which one class dominates another; for example, the positive class may have far fewer instances than the negative class. The less numerous class is called the minority class, while the class that dominates the dataset is called the majority class. Class imbalance can degrade classification performance, so it must be handled to improve classification results. In this study, Random Forest achieves satisfactory results on imbalanced data, and SMOTE and ADASYN are applied as oversampling methods because they are popular and easy to implement. The best SMOTE model is obtained on the dataset with an imbalance ratio (IR) of 10%, with a balanced accuracy of 98.75%, and the best ADASYN model is obtained on the dataset with an IR of 13%, with a balanced accuracy of 99.03%.
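The two oversampling methods named in the abstract can be sketched in plain Python. This is a minimal illustrative sketch under stated assumptions, not the implementation used in the study (the abstract does not say which library was used). The function names `smote`, `adasyn`, `nearest` and `interpolate`, the parameters `k`, `beta` and `seed`, and the use of Euclidean distance are all assumptions made for illustration. SMOTE creates each synthetic sample at a random point on the segment between a minority sample and one of its k nearest minority neighbors. ADASYN uses the same interpolation but gives more synthetic samples to minority samples whose k nearest neighbors (in the whole dataset) include more majority samples. The sketch assumes numeric features and at least two minority samples.

```python
import math
import random


def nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def interpolate(x, neighbor, rng):
    """One synthetic point at a random position on the segment from x to neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE: pick a random minority sample, pick one of its k nearest
    minority neighbors, and interpolate between the two."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def adasyn(minority, majority, beta=1.0, k=5, seed=0):
    """ADASYN: SMOTE-style interpolation, but minority samples with more
    majority-class neighbors (harder to learn) get more synthetic samples."""
    rng = random.Random(seed)
    total = round((len(majority) - len(minority)) * beta)  # G: samples to create
    everyone = minority + majority  # indices < len(minority) are minority samples
    # r_i: share of majority samples among the k nearest neighbors of x_i
    ratios = [
        sum(j >= len(minority) for j in nearest(i, everyone, k)) / k
        for i in range(len(minority))
    ]
    norm = sum(ratios)
    if norm == 0:  # no minority sample has a majority neighbor; nothing to weight
        return []
    synthetic = []
    for i, r in enumerate(ratios):
        g_i = round(r / norm * total)  # per-sample count; rounding may shift the total slightly
        candidates = nearest(i, minority, k)  # interpolate only toward minority neighbors
        for _ in range(g_i):
            synthetic.append(interpolate(minority[i], minority[rng.choice(candidates)], rng))
    return synthetic
```

For real experiments, the `imbalanced-learn` library's `SMOTE` and `ADASYN` classes (both exposing `fit_resample(X, y)`) are the usual choice. In either case, oversampling should be applied only to the training split so that synthetic points do not leak into the evaluation data.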
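The abstract reports results as balanced accuracy rather than plain accuracy. The following minimal sketch shows why that matters on imbalanced data. It assumes the standard definition of balanced accuracy as the mean of per-class recall (the definition used by scikit-learn's `balanced_accuracy_score`), since the abstract does not define it. The helper names and the 90/10 example labels are hypothetical and are not taken from the study's data.

```python
def recall_per_class(y_true, y_pred):
    """Recall (true positive rate) for each class label present in y_true."""
    recalls = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls[label] = hits / len(idx)
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes this equals (TPR + TNR) / 2."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: 90 negatives, 10 positives, and a classifier that
# always predicts the majority class.
y_true = ["negative"] * 90 + ["positive"] * 10
y_pred = ["negative"] * 100
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A classifier that ignores the minority class entirely still scores 0.9 accuracy here, but only 0.5 balanced accuracy. This is why balanced accuracy is a more informative headline metric for imbalanced data sets.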
Pages: 246–253
Copyright (c) 2023 Anthony Mas Halim, Mahendra Dwifebri, Fhira Nhita

This work is licensed under a Creative Commons Attribution 4.0 International License.