Applying Data Mining Techniques to Investigate the Impact of Smoking Prevalence on Life Expectancy in Indonesia: Insights from Random Forest Models


  • Abdul Hakim Dalimunthe, Universitas Al Washliyah, Rantauprapat, Indonesia
  • Samsir Samsir, Universitas Al Washliyah, Rantauprapat, Indonesia
  • Selamat Subagio, Universitas Al Washliyah, Rantauprapat, Indonesia
  • Taufiqqurrahman Nur Siagian, Universitas Al Washliyah, Rantauprapat, Indonesia
  • Ronal Watrianthos *, Universitas Al Washliyah, Rantauprapat, Indonesia
  • (*) Corresponding Author
Keywords: Smoking Prevalence; Life Expectancy; Random Forest Model; Data Mining; Public Health Policy

Abstract

This study investigates the relationship between smoking prevalence and life expectancy across Indonesian provinces using data mining techniques, specifically random forest models. The primary objective is to quantify the potential impact of reducing smoking prevalence on population health outcomes. Life expectancy and smoking prevalence data for 2021 to 2023 were sourced from the Indonesian Central Bureau of Statistics. Life expectancy data were first aggregated from the district level to the province level, and a random forest model was then applied to predict life expectancy from smoking prevalence and other socioeconomic indicators. The key finding is a weak to moderate negative correlation between smoking prevalence and life expectancy: higher smoking rates are associated with lower life expectancy. Predictive modeling suggests that reducing smoking prevalence could measurably improve life expectancy. For example, a 5% reduction in smoking rates could increase average life expectancy by approximately 0.3 years, while a 15% reduction could yield an increase of about 0.9 years by 2025. These results underscore the detrimental impact of smoking on population health and the importance of effective tobacco control measures. The predictive models developed in this study give policymakers a basis for targeting public health strategies and allocating resources. The research contributes to the field by demonstrating the utility of data mining techniques in public health and by analyzing the relationship between smoking and life expectancy in Indonesia at the province level. The findings support the urgent implementation of smoking cessation programs to raise life expectancy and improve public health outcomes.
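The workflow described in the abstract (district-to-province aggregation, an ensemble of regression trees, then a reduced-smoking scenario) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: all province labels and numbers are hypothetical, smoking prevalence is the only feature, the "5%" and "15%" cuts are assumed to be relative reductions, and the model is a bootstrap-aggregated ensemble of one-split regression trees, a deliberately simplified stand-in for a full random forest.

```python
import random
from statistics import mean

# Hypothetical district-level records: (province, life expectancy in years).
districts = [
    ("A", 70.0), ("A", 72.0),
    ("B", 68.0), ("B", 69.0),
    ("C", 73.0), ("C", 74.0),
    ("D", 66.0), ("D", 67.0),
]
# Hypothetical province-level smoking prevalence (% of population).
smoking = {"A": 27.0, "B": 31.0, "C": 23.0, "D": 33.0}


def aggregate_by_province(rows):
    """Average district values into one value per province."""
    groups = {}
    for province, value in rows:
        groups.setdefault(province, []).append(value)
    return {p: mean(v) for p, v in groups.items()}


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        sse = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, mean(left), mean(right))
    if best is None:  # bootstrap sample had a single distinct x: predict its mean
        return (float("inf"), mean(ys), mean(ys))
    return best[1:]


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the provinces."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict(forest, x):
    """Average the predictions of all stumps."""
    return mean(left if x <= t else right for t, left, right in forest)


life = aggregate_by_province(districts)
provinces = sorted(life)
xs = [smoking[p] for p in provinces]
ys = [life[p] for p in provinces]
forest = fit_forest(xs, ys)

# Scenario: compare predictions at current and reduced prevalence.
baseline = mean(predict(forest, x) for x in xs)
for cut in (0.05, 0.15):
    scenario = mean(predict(forest, x * (1 - cut)) for x in xs)
    print(f"{cut:.0%} relative reduction: {scenario - baseline:+.2f} years")
```

With real data, additional socioeconomic features and a full random forest implementation would replace the single-feature stumps used here; the gains printed by this toy example have no relation to the 0.3 and 0.9 year figures reported in the study.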



Article History
Submitted: 2024-05-20
Published: 2024-06-30
How to Cite
Dalimunthe, A. H., Samsir, S., Subagio, S., Siagian, T. N., & Watrianthos, R. (2024). Applying Data Mining Techniques to Investigate the Impact of Smoking Prevalence on Life Expectancy in Indonesia: Insights from Random Forest Models. Building of Informatics, Technology and Science (BITS), 6(1), 460-468. https://doi.org/10.47065/bits.v6i1.5201