Comparative Analysis of Random Forest and Adaptive Boosting Methods for Leukemia Prediction with Microarray Data


  • Juleha Irianti Heremba (*), Universitas Papua, Manokwari, Indonesia
  • Christian Dwi Suhendra, Universitas Papua, Manokwari, Indonesia
  • Marlinda Sanglise, Universitas Papua, Manokwari, Indonesia
  • (*) Corresponding Author
Keywords: Random Forest; AdaBoost; Leukemia; Microarray

Abstract

Cancer is the uncontrolled growth of cells that can spread to other parts of the body. Cancers are generally named after the organ in which they originate. One of them is blood cancer, or leukemia, a cancer of the bone marrow caused by genetic mutations. According to Global Cancer Statistics 2020, there were an estimated 19.3 million new cancer cases and 10 million cancer deaths in 2020, and the number of new cases is projected to rise by 47% worldwide, from 19.3 million to 28.4 million, by 2040. Leukemia ranked ninth among cancers in Indonesia in 2020, with 14,979 new cases and 11,530 deaths. One effort to prevent leukemia is to diagnose the acute leukemia category using DNA and genetic information. The purpose of this study is to compare the performance of the Random Forest and Adaptive Boosting (AdaBoost) methods in predicting leukemia type from microarray data, in order to determine which method classifies more effectively. The dataset consists of gene expression measurements from bone marrow and blood samples covering two categories of acute leukemia, Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL), obtained with DNA microarray technology. The gene expression data are classified with the Random Forest and AdaBoost methods to predict the acute leukemia category. The results show that Random Forest is the better method for predicting acute leukemia, with an Area Under Curve (AUC) of 100%, accuracy of 92.9%, precision of 93.7%, recall of 92.9%, and F1-score of 92.7%, compared with AdaBoost, which reached an AUC of 83.3%, accuracy of 85.7%, precision of 88.6%, recall of 85.7%, and F1-score of 85.1%.
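
The abstract does not say how precision, recall, and F1-score were averaged over the two classes, nor how large the test set was. In both reported result sets recall equals accuracy, which is what support-weighted averaging always produces, so the sketch below assumes weighted averaging. The two confusion matrices are hypothetical: they assume a 14-sample test split (8 ALL, 6 AML) and were chosen only because they round to the figures quoted above. They are not taken from the paper, and the function name `weighted_metrics` is likewise an illustration.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy and support-weighted precision/recall/F1, in percent.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(cm[c])                                  # true samples of class c
        predicted = sum(cm[r][c] for r in range(n_classes))   # samples predicted as c
        p = cm[c][c] / predicted if predicted else 0.0
        r = cm[c][c] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                              # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": round(100 * accuracy, 1),
        "precision": round(100 * precision, 1),
        "recall": round(100 * recall, 1),
        "f1": round(100 * f1, 1),
    }


# ASSUMPTION: hypothetical confusion matrices (rows: true ALL, true AML;
# columns: predicted ALL, predicted AML). Not reported in the abstract.
rf_cm = [[8, 0], [1, 5]]    # one AML sample predicted as ALL
ada_cm = [[8, 0], [2, 4]]   # two AML samples predicted as ALL

rf_scores = weighted_metrics(rf_cm)
ada_scores = weighted_metrics(ada_cm)
print("Random Forest:", rf_scores)
# {'accuracy': 92.9, 'precision': 93.7, 'recall': 92.9, 'f1': 92.7}
print("AdaBoost:     ", ada_scores)
# {'accuracy': 85.7, 'precision': 88.6, 'recall': 85.7, 'f1': 85.1}
```

Under these assumptions the gap between the two methods amounts to a single additional misclassified sample, which is worth keeping in mind when reading percentage differences on a small test set. AUC is not computed here because it depends on the ranking of predicted scores, not on the confusion matrix alone.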


Article History
Submitted: 2025-02-02
Published: 2025-04-06
How to Cite
Heremba, J., Suhendra, C., & Sanglise, M. (2025). Analisis Perbandingan Metode Random Forest dan Adaptive Boosting Untuk Prediksi Leukemia dengan Data Microarray. Journal of Information System Research (JOSH), 6(3), 1564-1572. https://doi.org/10.47065/josh.v6i3.6898
Section
Articles