Comparative Analysis of Deep Learning Architectures for DNA Sequence Classification: Performance Evaluation and Model Insights

Gregorius Airlangga

doi:10.47065/josyc.v5i3.5170

Gregorius Airlangga * Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia

(*) Corresponding Author

Keywords: Deep Learning; DNA Classification; Convolutional Neural Networks; Recurrent Neural Networks; Genomic Data Analysis

Abstract

Deep learning models for DNA sequence classification offer promising avenues for advances in genomics and personalized medicine. This study evaluates several deep learning architectures for classifying human DNA sequences into functional categories: Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), Bidirectional LSTMs (BiLSTMs), and hybrid models that combine CNNs with various recurrent networks. We used a dataset of approximately 100,000 labeled sequences, balanced across seven classes so that model performance could be compared fairly. Each model was assessed on accuracy, precision, recall, F1 score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The CNN achieved the highest accuracy (74.86%) and the highest AUC (94.64%), indicating its effectiveness in capturing spatial patterns within sequences. The LSTM and GRU models also performed well, particularly in balancing precision and recall, which suggests that they handle sequential dependencies effectively. The hybrid models, however, fell short of expectations and scored lower on the overall metrics, highlighting the challenges of model integration and complexity management. These findings suggest that CNNs excel at feature extraction, while sequence-based models such as LSTMs and GRUs are valuable for capturing the long-range dependencies essential to comprehensive genomic analysis. The study underscores the need for optimized hybrid models and for further research into model robustness and generalizability.
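The evaluation metrics named in the abstract can be illustrated with a small sketch. The abstract does not say how per-class scores were averaged across the seven classes, so macro averaging and one-vs-rest AUC are assumptions here, and the function names (`macro_metrics`, `binary_auc`, `macro_ovr_auc`) are hypothetical helpers rather than anything taken from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


def binary_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the sample count, so this is
    only suitable for small illustrations.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, n_classes):
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    aucs = [
        binary_auc([t == c for t in y_true], [row[c] for row in probs])
        for c in range(n_classes)
    ]
    return sum(aucs) / n_classes


if __name__ == "__main__":
    # Toy labels for three classes; not data from the study.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_metrics(y_true, y_pred, n_classes=3))
```

With `probs` holding one row of class probabilities per sequence, `macro_ovr_auc(y_true, probs, 7)` would give a single AUC figure for a seven-class problem under these assumptions. Macro averaging weights every class equally, which is a natural choice for a balanced dataset.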
