Comparative Analysis of Machine Learning Models for Classifying Human DNA Sequences: Performance Metrics and Strategic Recommendations
Abstract
This study evaluates seven machine learning models for classifying human DNA sequences and discusses their potential applications in genomics. We compared Logistic Regression, Support Vector Machines (SVM), Random Forest, Decision Trees, Gradient Boosting, Naive Bayes, and XGBoost, using 5-fold stratified cross-validation (StratifiedKFold) so that each fold preserves the class distribution. Naive Bayes achieved near-perfect accuracy, precision, recall, and F1 scores, suggesting that it is well suited to rapid and efficient genomic classification. Logistic Regression also performed well and remained effective in multi-class classification of complex genetic data. Conversely, Decision Trees were prone to overfitting and SVM was computationally expensive, indicating the need for careful parameter tuning and optimization in practical applications. The study addresses these challenges and proposes strategies for improving model robustness and computational efficiency, such as advanced regularization techniques and hybrid modeling approaches. These insights aid in selecting appropriate models for specific genomic tasks and point toward future research on integrating machine learning with genomic science to advance personalized medicine and genetic research. The findings encourage ongoing refinement of these models for genomic applications.
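To make the evaluation protocol concrete, the following is a minimal sketch of stratified 5-fold cross-validation with a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the study's code. The overlapping 3-mer encoding, the synthetic GC-rich and AT-rich toy sequences, the macro-averaged F1 score, and every function name are assumptions made for illustration; the abstract does not state how sequences were encoded or how metrics were averaged.

```python
import math
import random
from collections import Counter, defaultdict


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a DNA string (assumed feature encoding)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def stratified_folds(labels, n_splits=5):
    """Deal the indices of each class round-robin into n_splits test folds."""
    folds = [[] for _ in range(n_splits)]
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


def train_nb(seqs, labels, k=3):
    """Fit multinomial Naive Bayes with add-one smoothing over k-mer counts."""
    kmer_totals = defaultdict(Counter)
    for seq, label in zip(seqs, labels):
        kmer_totals[label].update(kmer_counts(seq, k))
    vocab_size = 4 ** k  # all possible k-mers over A, C, G, T
    model = {}
    for label, n in Counter(labels).items():
        total = sum(kmer_totals[label].values())
        # Store log prior, per-class k-mer counts, and the smoothed denominator.
        model[label] = (math.log(n / len(labels)), kmer_totals[label], total + vocab_size)
    return model


def predict_nb(model, seq, k=3):
    """Return the class with the highest log posterior for one sequence."""
    features = kmer_counts(seq, k)

    def log_posterior(label):
        log_prior, counts, denom = model[label]
        return log_prior + sum(
            c * math.log((counts[kmer] + 1) / denom) for kmer, c in features.items()
        )

    return max(model, key=log_posterior)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_validate(seqs, labels, n_splits=5, k=3):
    """Train on all folds but one, score on the held-out fold, and repeat."""
    fold_scores = []
    for test_idx in stratified_folds(labels, n_splits):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(seqs)) if i not in held_out]
        model = train_nb([seqs[i] for i in train_idx], [labels[i] for i in train_idx], k)
        preds = [predict_nb(model, seqs[i], k) for i in test_idx]
        fold_scores.append(macro_f1([labels[i] for i in test_idx], preds))
    return fold_scores


def toy_dataset(n_per_class=20, length=60, seed=0):
    """Synthetic stand-in data: class 0 is GC-rich, class 1 is AT-rich."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for label, weights in ((0, (1, 4, 4, 1)), (1, (4, 1, 1, 4))):
        for _ in range(n_per_class):
            seqs.append("".join(rng.choices("ACGT", weights=weights, k=length)))
            labels.append(label)
    return seqs, labels


seqs, labels = toy_dataset()
scores = cross_validate(seqs, labels)
print("macro-F1 per fold:", [round(s, 3) for s in scores])
```

In practice, a library such as scikit-learn provides `StratifiedKFold` and ready-made implementations of the seven classifiers, so the hand-written fold assignment and classifier here serve only to show what the protocol does: every fold keeps the class proportions of the full dataset, and each model is scored only on sequences it did not see during training.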
Pages: 729-738
Copyright (c) 2024 Gregorius Airlangga

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).






















