Market-Adaptive Stock Trading through B-WEMA Driven Proximal Policy Optimization

Mulia Ichsan; Amalia Zahra

doi:10.47065/bits.v7i4.9349

Mulia Ichsan (*), Bina Nusantara University, Jakarta, Indonesia
Amalia Zahra, Bina Nusantara University, Jakarta, Indonesia

(*) Corresponding Author

Keywords: Deep Reinforcement Learning; Proximal Policy Optimization; B-WEMA; Risk-Adjusted Trading Performance

Abstract

Developing automated trading strategies that achieve stable returns while controlling risk remains a central challenge in quantitative finance. Many reinforcement learning-based trading systems focus on reward maximization but offer little justification for their choice of forecasting indicators, and they are often not benchmarked comprehensively against alternative strategies and risk measures. This study addresses the problem of integrating a statistically grounded price-smoothing technique with a policy optimization scheme to improve sequential trading decisions under market uncertainty. We propose a hybrid model that combines Brown’s Weighted Exponential Moving Average (B-WEMA) as a trend-sensitive forecasting indicator with a Deep Reinforcement Learning agent trained using Proximal Policy Optimization (PPO). B-WEMA supplies structured price signals that reduce sensitivity to noise, while PPO selects buy and sell actions through policy updates that are constrained for stable learning. The proposed model is evaluated over a 10-month trading horizon and compared with a buy-and-hold benchmark and an alternative reinforcement learning method, Advantage Actor-Critic (A2C), both implemented under the same experimental conditions. Empirical results show that the proposed B-WEMA-PPO framework achieved a cumulative return of 23.43% over the test period, outperforming both the buy-and-hold benchmark and the A2C-based agent. In addition to cumulative return, risk-adjusted performance metrics, namely volatility and maximum drawdown, are reported to give a balanced assessment of profitability and risk exposure. These findings suggest that incorporating structured exponential smoothing into policy optimization may enhance the stability and effectiveness of reinforcement learning-based trading strategies.
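The abstract names three evaluation measures (cumulative return, volatility, and maximum drawdown) without stating their formulas. The following is a minimal sketch of how such measures are commonly computed from a series of portfolio values. It assumes simple per-period returns, a sample standard deviation without annualization, and drawdown measured relative to the running peak; the authors' exact definitions may differ. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def cumulative_return(values):
    """Total return over the whole period, as a fraction of the starting value."""
    return values[-1] / values[0] - 1.0


def period_returns(values):
    """Simple return between each pair of consecutive portfolio values."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def volatility(values):
    """Sample standard deviation of per-period returns (assumed: not annualized)."""
    r = period_returns(values)
    mean = sum(r) / len(r)
    variance = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(variance)


def max_drawdown(values):
    """Largest decline from a running peak, as a positive fraction of that peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


# Hypothetical portfolio values for illustration only (not the paper's data).
portfolio = [100.0, 110.0, 99.0, 120.0, 90.0, 108.0]

if __name__ == "__main__":
    print("cumulative return:", cumulative_return(portfolio))
    print("volatility:", volatility(portfolio))
    print("max drawdown:", max_drawdown(portfolio))
```

Under these assumed definitions, the same three functions could be applied to the value series of each strategy (B-WEMA-PPO, A2C, and buy-and-hold) so that profitability and risk exposure are compared on an identical basis.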
