Eye-Image-Based Gender Classification Using Vision Transformer (ViT) with a Discriminative Fine-Tuning Strategy


  • Gde Made Hanura (*), Universitas Pendidikan Ganesha, Singaraja, Indonesia
  • Putu Hendra Suputra, Universitas Pendidikan Ganesha, Singaraja, Indonesia
  • (*) Corresponding Author
Keywords: Vision Transformer; Gender Classification; Eye Image; Discriminative Learning Rate; Fine Tuning

Abstract

Face-based biometric identification systems are significantly limited when a subject's face is covered, whether by masks worn following the COVID-19 pandemic or by face veils worn for cultural and religious reasons. This creates real security gaps, as evidenced by the gender-disguise infiltration incident at Masjid Jannatul Firdaus in Makassar. In such situations, the eyes remain the only consistently exposed biometric feature. This study applies a Vision Transformer (ViT-B/16) pretrained on ImageNet-21K, with a progressive fine-tuning strategy based on the discriminative learning rate principle, to classify gender from eye images. The Female and Male Eyes dataset from Kaggle contains 11,525 eye images, divided into training (64%), validation (16%), and testing (20%) sets. Experiments were conducted in two series: Series B varied the number of unfrozen transformer blocks (0–6), and Series C varied the discriminative learning rate ratio between the classifier and the encoder (5:1, 10:1, 3:1). The best configuration, with 6 unfrozen blocks and a 3:1 ratio, achieved 95.70% accuracy, 97.67% precision, 92.69% recall, and a weighted F1-score of 0.9569, surpassing MobileNet (93.90%) and K-Nearest Neighbor (68.81%). These results indicate that ViT with discriminative fine-tuning is effective for gender classification from eye images and has potential for biometric security applications.
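
The configuration described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. It assumes that the unfrozen blocks are the last ones of the 12-block ViT-B/16 encoder (those closest to the classifier), that the patch embedding stays frozen, and that a ratio of 3:1 means the classifier learning rate is three times the encoder learning rate. The names `plan_fine_tuning`, `split_sizes`, and `ParamGroup`, and the base learning rate of 3e-4, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

TOTAL_BLOCKS = 12  # ViT-B/16 has 12 transformer encoder blocks


@dataclass
class ParamGroup:
    name: str
    trainable: bool
    lr: float  # 0.0 for frozen groups


def plan_fine_tuning(unfrozen_blocks: int, head_lr: float, ratio: float) -> list:
    """Build a per-group plan: the head trains at head_lr, and the last
    `unfrozen_blocks` encoder blocks train at head_lr / ratio.

    Assumption: blocks are unfrozen from the top of the encoder downward.
    """
    if not 0 <= unfrozen_blocks <= TOTAL_BLOCKS:
        raise ValueError("unfrozen_blocks must be between 0 and 12")
    if ratio <= 0:
        raise ValueError("ratio must be positive")

    encoder_lr = head_lr / ratio
    first_unfrozen = TOTAL_BLOCKS - unfrozen_blocks

    # Assumption: the patch embedding is kept frozen in every configuration.
    groups = [ParamGroup("patch_embed", False, 0.0)]
    for i in range(TOTAL_BLOCKS):
        trainable = i >= first_unfrozen
        groups.append(ParamGroup(f"block_{i}", trainable, encoder_lr if trainable else 0.0))
    groups.append(ParamGroup("classifier_head", True, head_lr))
    return groups


def split_sizes(total: int, percents: tuple) -> tuple:
    """Integer split sizes for whole-number percentages that sum to 100."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    sizes = [total * p // 100 for p in percents[:-1]]
    sizes.append(total - sum(sizes))  # remainder goes to the last split
    return tuple(sizes)


if __name__ == "__main__":
    # 6 unfrozen blocks, classifier:encoder ratio of 3:1 (hypothetical base LR).
    for group in plan_fine_tuning(unfrozen_blocks=6, head_lr=3e-4, ratio=3.0):
        state = f"lr={group.lr:.1e}" if group.trainable else "frozen"
        print(f"{group.name:16s} {state}")

    # 64% / 16% / 20% of 11,525 images.
    print(split_sizes(11525, (64, 16, 20)))
```

In a deep learning framework, each trainable group in this plan would typically become one optimizer parameter group with its own learning rate, and frozen groups would have gradient computation disabled. The split sizes are plain arithmetic on the stated percentages; the exact counts in the study depend on the authors' splitting procedure.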




Article History
Submitted: 2026-04-28
Published: 2026-05-26
Section
Articles