Implementation of Hyperparameters to the Ensemble Learning Method for Lung Cancer Classification


  • Ridlo Yanuar * Telkom University, Bandung, Indonesia
  • Siti Sa’adah Telkom University, Bandung, Indonesia
  • Prasti Eko Yunanto Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Lung Cancer; Classification; Ensemble Learning; Bagging; Boosting

Abstract

Lung cancer is the most common cause of cancer death. This is largely because the lungs play a vital role as the organ of respiration and of oxygen distribution throughout the body, so early identification of lung cancer is crucial to reducing its mortality rate. Accuracy matters because it indicates how reliably a model or system makes correct predictions; a high level of accuracy shows that the model produces trustworthy findings, which is essential for making effective decisions based on the available data. In this research, ensemble learning approaches, namely bagging and boosting methods, were employed to classify lung cancer. Hyperparameters, a class of configuration parameters, are crucial to the effectiveness of these models, so a thorough investigation was conducted to identify the best hyperparameter combination and thereby increase the classification model's accuracy. The dataset used is a medical dataset containing the records of patients who either have or have not been diagnosed with lung cancer; it was taken from mysarahmadbhat on Kaggle and cancerdatahp on data.world. To evaluate the models, this study used the confusion matrix, which compares each model's predictions with the ground truth. The findings revealed that a 70:30 dataset split ratio produced the best results, with the Random Forest, CatBoost, and XGBoost models each achieving 98% accuracy, 0.98 precision, 0.98 recall, and a 0.98 F1-score. For AdaBoost, the best results were obtained with an 80:20 split ratio: 96% accuracy, 0.97 precision, 0.96 recall, and a 0.96 F1-score.
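For readers who want a concrete picture of the workflow described above, the sketch below shows one way to implement it in Python: a 70:30 train/test split, a grid search over hyperparameters for a bagging model (Random Forest) and three boosting models (AdaBoost, XGBoost, CatBoost), and confusion-matrix-based evaluation. The file name, column names, and parameter grids are illustrative assumptions, not the authors' published configuration.

```python
# Minimal illustrative sketch: split, tune, and evaluate the four ensemble
# classifiers named in the abstract. The dataset path, column names, and
# hyperparameter grids are assumptions for illustration only.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

# Hypothetical file/column layout for the Kaggle "survey lung cancer" data
df = pd.read_csv("survey_lung_cancer.csv")
X = pd.get_dummies(df.drop(columns=["LUNG_CANCER"]))
y = (df["LUNG_CANCER"] == "YES").astype(int)

# 70:30 split, the ratio the abstract reports as best for most models
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Illustrative search spaces; the paper does not list its exact grids
models = {
    "RandomForest": (RandomForestClassifier(random_state=42),
                     {"n_estimators": [100, 300], "max_depth": [None, 10]}),
    "AdaBoost": (AdaBoostClassifier(random_state=42),
                 {"n_estimators": [50, 200], "learning_rate": [0.5, 1.0]}),
    "XGBoost": (XGBClassifier(eval_metric="logloss", random_state=42),
                {"n_estimators": [100, 300], "max_depth": [3, 6]}),
    "CatBoost": (CatBoostClassifier(verbose=0, random_state=42),
                 {"iterations": [200, 500], "depth": [4, 6]}),
}

for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy", n_jobs=-1)
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    print(name, search.best_params_)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, digits=2))
```

Repeating the same procedure with test_size=0.20 reproduces the 80:20 split on which the abstract reports AdaBoost performing best.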




Article History
Submitted: 2023-08-16
Published: 2023-09-30
How to Cite
Yanuar, R., Sa’adah, S., & Yunanto, P. (2023). Implementation of Hyperparameters to the Ensemble Learning Method for Lung Cancer Classification. Building of Informatics, Technology and Science (BITS), 5(2), 498−508. https://doi.org/10.47065/bits.v5i2.4096
Section
Articles