Stock Industry Sector Prediction Based on Financial Reports Using Random Forest
Abstract
This study predicts the industry sector of stocks listed on the Indonesia Stock Exchange (IDX) from their financial reports using the Random Forest method. A machine learning approach is used because financial data are complex and call for robust, adaptive methods to achieve accurate predictions. The dataset covers companies from 10 industry sectors on the IDX over 2010-2022 and includes 17 features from each financial report. The number of companies per sector is imbalanced: sector B accounts for 14.76% of the companies, while sector G accounts for only 1.98%. Because this imbalance biases the analysis, the SMOTE oversampling method is applied to address it. The research process consists of data cleaning, splitting the data into 80% training and 20% testing sets, applying SMOTE oversampling, and comparing predictions from the imbalanced and balanced datasets. Random Forest is chosen for its ability to handle complex datasets in industry sector classification. Without oversampling, the model achieves an accuracy of 73.57%, precision of 74.29%, recall of 73.57%, and F1-score of 73.51%. With oversampling, these metrics improve to an accuracy of 80.21%, precision of 81.34%, recall of 80.21%, and F1-score of 80.45%.
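The oversampling and evaluation steps can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the abstract does not state which SMOTE implementation, neighbour count, or metric averaging was used, so the function names (`smote_class`, `smote_balance`, `weighted_scores`), the default `k=5`, and the toy data are all hypothetical. The sketch follows the SMOTE idea of placing each synthetic row on the line segment between a real row of an under-represented class and one of its k nearest same-class neighbours, and it is meant to be applied to the training split only, so no synthetic rows reach the 20% test set. The Random Forest classifier itself is not shown. The metric helper assumes support-weighted averaging of per-class precision, recall, and F1; under that assumption weighted recall always equals accuracy, which is consistent with the identical accuracy and recall figures reported in the abstract.

```python
import math
import random
from collections import Counter


def smote_class(samples, n_new, k=5, rng=None):
    """Create n_new synthetic rows for one class (hypothetical helper).

    Each synthetic row lies on the segment between a randomly chosen real row
    and one of its k nearest neighbours (Euclidean) from the same class.
    """
    rng = rng or random.Random(0)
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two rows in the class")
    k = min(k, len(samples) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # indices of the other rows of the same class, nearest first
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: math.dist(base, samples[j]))
        neighbour = samples[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def smote_balance(X, y, k=5, seed=0):
    """Oversample every class up to the largest class size (training split only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = [list(row) for row in X], list(y)
    for label, count in counts.items():
        if count < target:
            members = [row for row, lab in zip(X, y) if lab == label]
            new_rows = smote_class(members, target - count, k, rng)
            X_res.extend(new_rows)
            y_res.extend([label] * len(new_rows))
    return X_res, y_res


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support / n  # each class weighted by its share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy two-feature training split (hypothetical): "B" has 6 rows, "G" only 2.
    X_train = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1], [0.9, 1.8],
               [1.0, 2.3], [5.0, 5.0], [5.5, 4.8]]
    y_train = ["B"] * 6 + ["G"] * 2
    X_bal, y_bal = smote_balance(X_train, y_train, k=5, seed=42)
    print(Counter(y_bal))  # both classes now have 6 rows
    # Weighted recall equals accuracy (up to floating-point rounding).
    print(weighted_scores(["B", "B", "G", "G"], ["B", "G", "G", "G"]))
```

Balancing every sector up to the size of the largest one is only one possible target; in a full pipeline the classifier would be fitted on the balanced training rows and scored on the untouched test rows.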
Pages: 1002-1011
Copyright (c) 2024 Kamil Elian Zhafran, Deni Saepudin
This work is licensed under a Creative Commons Attribution 4.0 International License.