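Temporal Pattern Modeling of Action Units for Facial Expression Recognition Based on Bidirectional LSTM (Pemodelan Pola Temporal Action Unit untuk Pengenalan Ekspresi Wajah Berbasis Bidirectional LSTM)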


  • Muhammad Ghozali Sulton, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Sugiyanto Sugiyanto (*), Universitas Dian Nuswantoro, Semarang, Indonesia
(*) Corresponding author
Keywords: BiLSTM; DCAP-SWOZ; FACS; Facial Action Units; Automated Labeling; Temporal Smoothing

Abstract

This study develops a facial expression recognition system that operates on facial Action Unit (AU) data using a Bidirectional Long Short-Term Memory (BiLSTM) model. The AU data were obtained from a supervisor and originate from DCAP-SWOZ (USC Institute for Creative Technologies), a multimodal corpus containing AU values extracted from videos of human interaction; 188 AU files were used. Because the dataset has no emotion labels of its own, initial labels were generated with rules based on the Facial Action Coding System (FACS) and used as pseudo-labels to train the BiLSTM model. The BiLSTM acts as a temporal smoother that reduces the noise and label inconsistencies common in frame-by-frame rule-based labeling. After training, the model runs inference on the same data to produce final labels with improved temporal stability. The data were processed into 30-frame sequences using a sliding window with a 1-frame step to capture the dynamics of expressions, and the model was trained with two hidden layers and dropout regularization. Evaluation measured the model's consistency with the FACS rules and qualitatively analyzed the temporal stability of the generated labels. The model reached 96.61% consistency with the FACS rules, with high scores across all emotion classes: anger (99.11%), disgust (97.98%), fear (94.08%), happiness (99.29%), neutral (96.42%), sadness (98.31%), and surprise (99.16%). The qualitative analysis showed that the model reduced frame-by-frame label fluctuations by 73% relative to the pure rule-based approach, producing more stable and realistic emotion segmentation.
These results indicate that combining FACS-based labeling with a BiLSTM model yields a temporally consistent automated labeling system that can accelerate the creation of labeled datasets, although validation against human-annotated ground truth remains future work.
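The abstract mentions 30-frame sequences built with a sliding window and a percentage reduction in frame-by-frame label fluctuations. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes that the window advances one frame at a time and that a "fluctuation" is any frame whose label differs from the previous frame's label; the abstract does not define either precisely. The function names and the toy label sequences are hypothetical.

```python
from typing import List, Sequence

WINDOW = 30  # frames per sequence, as stated in the abstract
STRIDE = 1   # assumption: the window advances one frame at a time


def sliding_windows(frames: Sequence, window: int = WINDOW,
                    stride: int = STRIDE) -> List[list]:
    """Return every run of `window` consecutive frames, stepping by `stride`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(frames) - window
    return [list(frames[i:i + window]) for i in range(0, last_start + 1, stride)]


def count_label_changes(labels: Sequence[str]) -> int:
    """Count frames whose label differs from the previous frame's label."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def fluctuation_reduction(rule_labels: Sequence[str],
                          smoothed_labels: Sequence[str]) -> float:
    """Percentage drop in label changes from rule labels to smoothed labels.

    Assumed definition; the paper may compute its 73% figure differently.
    """
    before = count_label_changes(rule_labels)
    after = count_label_changes(smoothed_labels)
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


# Toy example with made-up labels (not data from the study).
rule = ["neutral", "happiness", "neutral", "happiness",
        "happiness", "neutral", "neutral", "sadness"]
smoothed = ["neutral", "neutral", "happiness", "happiness",
            "happiness", "neutral", "neutral", "neutral"]

windows = sliding_windows(list(range(32)))  # 32 frames give 3 windows of 30
reduction = fluctuation_reduction(rule, smoothed)
```

With a stride of one frame, a recording of N frames yields N - 29 overlapping sequences, so nearly every frame appears in up to 30 windows. That overlap is one reason a sequence model can smooth isolated single-frame label flips.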

Article History
Submitted: 2026-01-30
Published: 2026-03-20
How to Cite
Sulton, M., & Sugiyanto, S. (2026). Pemodelan Pola Temporal Action Unit untuk Pengenalan Ekspresi Wajah Berbasis Bidirectional LSTM. Building of Informatics, Technology and Science (BITS), 7(4), 2629-2639. https://doi.org/10.47065/bits.v7i4.9315