Deep Fake Image Detection Using Vision Transformer with Random Oversampling Technique


  • Dipo Paudro Tirto Prakoso, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Sugiyanto Sugiyanto *, Universitas Dian Nuswantoro, Semarang, Indonesia
  • (*) Corresponding Author
Keywords: Deepfake Detection; Vision Transformer; Image Classification; Random Oversampling; Transfer Learning

Abstract

Recent developments in deep learning have made it possible to generate visually convincing deepfake images, raising serious concerns about the reliability and security of digital media. The central challenge is detecting these sophisticated manipulations while handling imbalanced datasets, a common issue in deepfake detection research. This research designs a robust deepfake image classification model based on the Vision Transformer (ViT) architecture to differentiate between authentic and manipulated images. The main objectives are to: (1) adapt and fine-tune a pre-trained Vision Transformer for binary classification, (2) evaluate the effectiveness of Random Oversampling in addressing class imbalance while preventing data leakage, and (3) assess model performance using comprehensive metrics.

Methods: A pre-trained Vision Transformer model (Deep-Fake-Detector-v2-Model) was adapted and fine-tuned on a dataset of 190,335 images. The dataset was divided into training and testing subsets in an 80:20 ratio, and, to prevent data leakage, a Random Oversampling strategy was applied exclusively to the training set after the split. During training, data augmentation techniques such as image rotation, sharpness variation, and pixel normalization were employed. The model was trained for four epochs with a learning rate of 1×10⁻⁶ and a batch size of 32.

Results: The proposed model achieves a classification accuracy of 94.46% on the test set. Precision is 97.56% for fake images and 91.74% for real images, with corresponding recall rates of 91.21% and 97.72%. The F1-score reaches 94.46% for both classes, indicating balanced performance.
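The key ordering in the methodology is that the split happens first and oversampling touches only the training portion, so no duplicated sample can appear on both sides. A minimal sketch of that split-then-oversample ordering, in plain Python (the function name and data layout are illustrative, not from the paper, which works on image files):

```python
import random
from collections import Counter

def split_then_oversample(samples, labels, test_frac=0.2, seed=0):
    """Split first, then randomly oversample minority classes in the
    training set only, so no duplicate can leak into the test set."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train = [(samples[i], labels[i]) for i in idx[:cut]]
    test_set = [(samples[i], labels[i]) for i in idx[cut:]]

    # Duplicate minority-class items (with replacement) up to the
    # majority-class count -- Random Oversampling.
    counts = Counter(lbl for _, lbl in train)
    target = max(counts.values())
    by_class = {}
    for item in train:
        by_class.setdefault(item[1], []).append(item)
    for lbl, items in by_class.items():
        train.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(train)
    return train, test_set
```

Applying oversampling before the split would copy some images into both subsets and inflate the test accuracy, which is exactly the leakage the paper's ordering avoids.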
Novelty: This research presents a novel application of Vision Transformer architecture for deepfake detection, combining efficient transfer learning with strategic oversampling to handle imbalanced datasets while preventing data leakage. The study demonstrates that ViT-based models can effectively capture subtle manipulation artifacts in deepfake images, achieving superior performance compared to traditional convolutional neural network approaches.
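As a consistency check, the standard F1 formula (harmonic mean of precision and recall) can be applied to the per-class figures reported above. The per-class values computed this way differ slightly from the stated 94.46%, which is consistent with rounding of the inputs; their macro-average does land at 94.46%:

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported per-class precision/recall from the abstract
f1_fake = f1(0.9756, 0.9121)   # fake class, ~0.9428
f1_real = f1(0.9174, 0.9772)   # real class, ~0.9464
macro_f1 = (f1_fake + f1_real) / 2   # ~0.9446
```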






Article History
Submitted: 2026-01-30
Published: 2026-03-06
How to Cite
Tirto Prakoso, D., & Sugiyanto, S. (2026). Deep Fake Image Detection Using Vision Transformer with Random Oversampling Technique. Building of Informatics, Technology and Science (BITS), 7(4), 2361–2369. https://doi.org/10.47065/bits.v7i4.9316