Development of AI-Based Presentation Application using Deep Learning for Individuals With Disabilities


  • Carli Apriansyah Hutagalung (*), MNC University, Jakarta, Indonesia
  • Adi Fitrianto, MNC University, Jakarta, Indonesia
  • Gebran Akbar, MNC University, Jakarta, Indonesia
  • (*) Corresponding Author
Keywords: Speech Recognition; LSTM-GRU Model; AI Application; Disabilities; Deep Learning

Abstract

This study addresses the difficulty that individuals with disabilities face in controlling presentation devices, particularly in noisy environments, by developing an AI-based application built on a hybrid Long Short-Term Memory and Gated Recurrent Unit (LSTM-GRU) model. The primary objective is to improve recognition accuracy for commonly used presentation voice commands, such as “next” and “back,” under varying noise conditions. The architecture combines LSTM and GRU layers with an attention mechanism that focuses on the most relevant temporal features. The model was trained on the Speech Commands Dataset and then fine-tuned on noise-augmented data to simulate real-world environments. The LSTM-GRU model achieved high accuracy in clean environments, maintained reasonable performance in noisy conditions, and outperformed traditional models such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). At its optimal epoch, the fine-tuned model showed robust performance with balanced precision and recall, making it suitable for deployment in real-world scenarios. The study concludes that although deep learning models offer significant improvements, further refinement is needed to enhance noise resilience in practical applications.
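The abstract does not specify which attention variant sits on top of the LSTM-GRU layers. As a hedged illustration only, the NumPy sketch below shows one common choice, additive attention pooling: every time step of a recurrent layer's output receives a relevance score, the scores are normalized with a softmax, and the weighted sum of hidden states becomes a single fixed-size vector for the command classifier. The helper `attention_pool`, the parameter shapes, and the random inputs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def attention_pool(h: np.ndarray, w: np.ndarray, b: np.ndarray, v: np.ndarray):
    """Additive attention pooling over time (hypothetical helper, not the paper's code).

    h: (T, D) hidden states from a recurrent layer (assumed: the last GRU layer).
    w: (D, A) projection, b: (A,) bias, v: (A,) scoring vector.
    Returns the (D,) context vector and the (T,) attention weights.
    """
    scores = np.tanh(h @ w + b) @ v      # (T,) relevance score per time step
    weights = softmax(scores)            # (T,) non-negative, sums to 1
    context = weights @ h                # (D,) weighted sum of hidden states
    return context, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, A = 50, 64, 32                 # time steps, hidden size, attention size (assumed)
    h = rng.standard_normal((T, D))
    w = rng.standard_normal((D, A)) * 0.1
    b = np.zeros(A)
    v = rng.standard_normal(A)
    context, weights = attention_pool(h, w, b, v)
    print(context.shape, weights.shape, round(float(weights.sum()), 6))
```

In a trained model `w`, `b`, and `v` would be learned jointly with the recurrent layers; a large weight on a time step means that frame contributed most to the pooled representation of the spoken command.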
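The abstract states that the model was fine-tuned on noise-augmented data but does not give the noise sources or levels. A common way to build such data, sketched below under that assumption, is to mix a background-noise clip into each clean command at a chosen signal-to-noise ratio (SNR), where SNR in dB is 10 · log10(P_clean / P_noise) and P is mean signal power. The helpers `mix_at_snr` and `snr_db`, the SNR levels, and the synthetic tone and noise standing in for real recordings are hypothetical.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean` at the requested SNR in dB (hypothetical helper).

    The noise is tiled or cropped to the clean clip's length, then scaled so
    that 10 * log10(P_clean / P_scaled_noise) == snr_db, with P = mean power.
    """
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        return clean.copy()              # silent noise clip: nothing to mix in
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Measured SNR of `noisy` relative to its clean reference, in dB."""
    residual = noisy - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))


if __name__ == "__main__":
    sr = 16_000                          # Speech Commands clips are 1 s at 16 kHz
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 440 * t)                   # stand-in for a spoken "next"
    noise = np.random.default_rng(0).standard_normal(sr // 2)  # stand-in background noise
    for target in (20, 10, 0):           # assumed SNR levels, not taken from the paper
        noisy = mix_at_snr(clean, noise, target)
        print(target, round(snr_db(clean, noisy), 2))
```

Drawing the target SNR at random per example (for instance from a range of clean-to-very-noisy levels) is a typical design choice, since it exposes the model to the spread of conditions a presenter may encounter.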
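The abstract reports balanced precision and recall without listing the figures. For readers reproducing that kind of evaluation, the standard-library sketch below computes per-command precision, recall, and F1 from true and predicted labels. The helper `per_class_metrics`, the label names, and the toy predictions are invented for illustration and are not results from the study.

```python
from collections import Counter


def per_class_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall and F1 for every label seen in y_true or y_pred (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1      # predicted p, but the true label was different
            fn[t] += 1      # the true label t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


if __name__ == "__main__":
    # Toy labels for illustration only, NOT results from the paper.
    y_true = ["next", "next", "back", "back", "next", "unknown"]
    y_pred = ["next", "back", "back", "back", "next", "next"]
    for label, m in per_class_metrics(y_true, y_pred).items():
        print(label, {k: round(v, 2) for k, v in m.items()})
```

Reporting these values per command, rather than a single accuracy figure, shows whether errors concentrate on one command (for example "back" being triggered by other words), which matters more for a presentation controller than the overall average.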

Article History
Submitted: 2024-10-30
Published: 2024-12-26
How to Cite
Hutagalung, C., Fitrianto, A., & Akbar, G. (2024). Development of AI-Based Presentation Application using Deep Learning for Individuals With Disabilities. Building of Informatics, Technology and Science (BITS), 6(3), 1910-1918. https://doi.org/10.47065/bits.v6i3.6162
Issue: Vol. 6, No. 3 (2024)
Section: Articles