Applying Data Mining Techniques to Investigate the Impact of Smoking Prevalence on Life Expectancy in Indonesia: Insights from Random Forest Models
Abstract
This study investigates the relationship between smoking prevalence and life expectancy across Indonesian provinces using data mining techniques, specifically random forest models. The primary objective is to quantify the potential effect of reducing smoking prevalence on population health outcomes. Life expectancy and smoking prevalence data for 2021 to 2023 were obtained from the Indonesian Central Bureau of Statistics. Life expectancy data were aggregated from the district to the province level, and a random forest model was then fitted to predict life expectancy from smoking prevalence and other socioeconomic indicators. The results show a weak to moderate negative correlation between smoking prevalence and life expectancy: provinces with higher smoking rates tend to have lower life expectancy. Predictive modeling suggests that reducing smoking prevalence could improve life expectancy. For example, a 5% reduction in smoking rates could increase average life expectancy by approximately 0.3 years, while a 15% reduction could increase it by about 0.9 years by 2025. These results are consistent with a detrimental effect of smoking on population health and highlight the importance of effective tobacco control measures. The predictive models developed in this study can help policymakers target public health strategies and allocate resources. This research demonstrates the utility of data mining techniques in public health and provides a province-level analysis of the relationship between smoking and life expectancy in Indonesia. The findings support the urgent implementation of smoking cessation programs to increase life expectancy and improve public health outcomes.
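The workflow summarized in the abstract (district-to-province aggregation, a random forest fit, and a reduced-smoking scenario) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the data are synthetic, the column names, the `income_index` feature, the unweighted mean used for aggregation, the hyperparameters, and the reading of "5% reduction" as a relative reduction are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# SYNTHETIC data for illustration only (these are not BPS figures).
n_prov, n_dist = 30, 5  # hypothetical: 30 provinces, 5 districts each
names = [f"P{i:02d}" for i in range(n_prov)]
indicators = pd.DataFrame({
    "province": names,
    "smoking_prevalence": rng.uniform(20, 35, n_prov),  # percent, made up
    "income_index": rng.uniform(8, 15, n_prov),         # hypothetical indicator
})

# District-level life expectancy generated with a made-up negative smoking effect.
districts = indicators.loc[indicators.index.repeat(n_dist)].reset_index(drop=True)
districts["life_expectancy"] = (
    70.0
    - 0.15 * districts["smoking_prevalence"]
    + 0.30 * districts["income_index"]
    + rng.normal(0.0, 0.5, len(districts))
)

# Step 1: aggregate district values to province level (unweighted mean; assumption).
prov = (
    districts.groupby("province", as_index=False)["life_expectancy"].mean()
    .merge(indicators, on="province")
)

# Step 2: fit a random forest regressor on province-level features.
features = ["smoking_prevalence", "income_index"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(prov[features], prov["life_expectancy"])

# Step 3: predicted change in mean life expectancy under a relative
# reduction in smoking prevalence (e.g. 0.05 for 5%).
def scenario_gain(reduction):
    scenario = prov[features].copy()
    scenario["smoking_prevalence"] *= (1.0 - reduction)
    baseline = model.predict(prov[features]).mean()
    return float(model.predict(scenario).mean() - baseline)

for r in (0.05, 0.15):
    print(f"{r:.0%} reduction: predicted change {scenario_gain(r):+.2f} years")
```

Because a random forest predicts by averaging training targets within tree leaves, it cannot extrapolate below the lowest smoking prevalence it has seen, so scenario estimates for the lowest-prevalence provinces flatten out. Numbers produced by this sketch reflect only the synthetic data and say nothing about the study's reported 0.3 and 0.9 year figures.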
Pages: 460-468
Copyright (c) 2024 Abdul Hakim Dalimunthe, Samsir, Selamat Subagio, Taufiqqurrahman Nur Siagian, Ronal Watrianthos

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).