Enhanced Detection of Diffusion Model-Generated Deepfakes Using CNN Feature Maps and Forgery Traces
DOI: https://doi.org/10.29304/jqcsm.2024.16.41790
Keywords: GAN, CNN, DDPM, LDM, VQGAN, CelebA-HQ
Abstract
Recent advancements in generative artificial intelligence powered by deep learning have significantly improved image generation and manipulation, resulting in highly realistic images that pose substantial risks to multimedia security. The increasing similarity between authentic and deepfake images, especially those generated by Diffusion Models, highlights the pressing need for effective detection mechanisms. This study provides a comprehensive evaluation of counterfeit facial image detection, focusing on the generalizability and robustness of various detection methods. Using a dataset comprising real images from CelebA-HQ and synthetic images generated by five state-of-the-art models (StyleGAN2, VQGAN, DDPM, PNDM, and LDM), we benchmark four leading detection algorithms: Wang2020, Grag2021, Mandelli2022, and Ojha2023.
We evaluated the performance of these detectors across different generative models and under various image perturbations, such as resizing, noise, blur, and compression. Additionally, we analyzed frequency-domain artifacts, revealing that GAN-generated images exhibit distinct frequency patterns, whereas DM-generated images closely resemble authentic ones. We further propose a novel hybrid approach combining spatial and frequency-domain features, which yields superior performance in detecting AI-generated human faces. Among the methods tested, Mandelli2022 achieved an AUC of 98.38%, while our ResNet-50+FFT model slightly outperformed it with an AUC of 98.42%. These results highlight the effectiveness of hybrid approaches in improving detection accuracy. However, detectors still struggle to generalize across diverse datasets, underscoring the need for more adaptable and robust detection strategies.
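The detector comparison above is summarized with ROC AUC, i.e. the probability that a randomly chosen fake image receives a higher "fake" score than a randomly chosen real one. As a minimal illustration of that metric (not the paper's actual evaluation code), a rank-based AUC over raw detector scores can be computed as:

```python
import numpy as np

def roc_auc(fake_scores, real_scores):
    """Rank-based ROC AUC: probability that a randomly chosen fake
    image scores higher than a randomly chosen real one
    (higher score = "more likely fake"). Ties are not handled
    in this sketch."""
    scores = np.concatenate([fake_scores, real_scores])
    n_fake, n_real = len(fake_scores), len(real_scores)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    rank_sum = ranks[:n_fake].sum()  # ranks of the fake (positive) class
    return (rank_sum - n_fake * (n_fake + 1) / 2) / (n_fake * n_real)

# Perfectly separated scores give AUC = 1.0
print(roc_auc(np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.2, 0.3])))  # → 1.0
```

An AUC of 98.42% therefore means near-perfect ranking of fake above real, while degradations such as blur or JPEG compression pull the two score distributions together and lower it.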
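The frequency-domain analysis mentioned above is typically done by inspecting an image's 2-D Fourier power spectrum, where GAN upsampling artifacts show up as distinctive peaks while diffusion-model outputs look closer to natural spectra. The sketch below is only an illustration of the hybrid idea, not the paper's model: the actual ResNet-50+FFT detector uses CNN feature maps, for which the simple pixel statistics here are a hypothetical stand-in.

```python
import numpy as np

def radial_spectrum(img):
    """Radially averaged log-power spectrum of a grayscale image.
    GAN grid artifacts tend to appear as anomalies in this profile."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.log1p(np.abs(f))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.maximum(np.bincount(r.ravel()), 1)  # guard empty bins
    return sums / counts

def hybrid_features(img):
    """Concatenate spatial statistics (stand-in for CNN feature maps)
    with the frequency-domain profile, mirroring the spatial+FFT fusion."""
    spatial = np.array([img.mean(), img.std()])
    return np.concatenate([spatial, radial_spectrum(img)])

img = np.random.rand(64, 64)  # placeholder for a face crop
print(hybrid_features(img).shape)
```

In the full system, the fused feature vector would feed a binary real/fake classifier; the point of the fusion is that spatial cues and spectral cues fail on different generators, so combining them improves cross-model detection.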
References
Suwajanakorn, S., Seitz, S. M., and Kemelmacher-Shlizerman, I. Synthesizing Obama: learning lip sync from audio. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
Zhou, X. and Zafarani, R. A survey of fake news: Fundamental theories, detection methods, and opportunities, September 2020. ISSN 1557-7341. URL http://dx.doi.org/10.1145/3395046
Chen, H. and Magramo, K. Finance worker pays out $25 million after video call with deepfake ‘chief financial officer’, 2024.
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
Tero Karras, Samuli Laine, and Timo Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–4410.
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. CVPR, 2020.
Patrick Esser, Robin Rombach, and Björn Ommer, “Taming transformers for high-resolution image synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12873–12883.
Diederik P Kingma and Max Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner, “Faceforensics++: Learning to detect manipulated facial images,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1–11.
Kaede Shiohara and Toshihiko Yamasaki, “Detecting deepfakes with self-blended images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18720–18729.
Yuhang Lu and Touradj Ebrahimi, “Assessment framework for deepfake detection in real-world situations,” arXiv preprint arXiv:2304.06125, 2023.
David Holz. Midjourney. https://docs.midjourney.com/docs/model-versions, 2022. [Online; accessed 26-June-2023].
OpenAI. DALL-E 2. https://labs.openai.com, 2022. [Online; accessed 27-June-2023].
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510, 2023.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. arXiv preprint arXiv:2301.13188, 2023.
Derui Zhu, Dingfan Chen, Jens Grossklags, and Mario Fritz. Data forensics in diffusion models: A systematic analysis of membership privacy. arXiv preprint arXiv:2302.07801, 2023.
Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros, “Cnn-generated images are surprisingly easy to spot...for now,” in CVPR, 2020.
Francesco Marra, Diego Gragnaniello, Luisa Verdoliva, and Giovanni Poggi, “Do GANs leave artificial fingerprints?,” in 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2019, pp. 506–511.
Ning Yu, Larry S Davis, and Mario Fritz, “Attributing fake images to gans: Learning and analyzing gan fingerprints,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 7556–7566.
Sara Mandelli, Nicolò Bonettini, Paolo Bestagini, and Stefano Tubaro, “Training CNNs in presence of JPEG compression: Multimedia forensics vs computer vision,” in IEEE International Workshop on Information Forensics and Security (WIFS), 2020.
Sara Mandelli, Nicolò Bonettini, Paolo Bestagini, and Stefano Tubaro, “Detecting GAN-generated images by orthogonal training of multiple CNNs,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 3091–3095.
Utkarsh Ojha, Yuheng Li, and Yong Jae Lee, “Towards universal fake image detectors that generalize across generative models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 24480–24489.
Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, and Yunchao Wei, “Learning on gradients: Generalized artifacts representation for GAN-generated images detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12105–12114.
Jonas Ricker, Simon Damm, Thorsten Holz, and Asja Fischer, “Towards the detection of diffusion model deepfakes,” arXiv preprint arXiv:2210.14571, 2022.
Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva, “On the detection of synthetic images generated by diffusion models,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023.
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations. https://openreview.net/forum?id=Hk99zCeAb
Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
Jonathan Ho, Ajay Jain, and Pieter Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds. arXiv preprint arXiv:2202.09778, 2022.
Scott McCloskey and Michael Albright, “Detecting GAN-generated imagery using color cues,” arXiv preprint arXiv:1812.08247, 2018.
Scott McCloskey and Michael Albright, “Detecting gan-generated imagery using saturation cues,” in 2019 IEEE international conference on image processing (ICIP). IEEE, 2019, pp. 4584–4588.
Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo, “Face X-ray for more general face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5001–5010.
François Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
Francesco Marra, Diego Gragnaniello, Davide Cozzolino, and Luisa Verdoliva, “Detection of GAN-generated fake images over social networks,” in 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2018, pp. 384–389.
Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. Dire for diffusion-generated image detection. arXiv preprint arXiv:2303.09295, 2023.
Diego Gragnaniello, Davide Cozzolino, Francesco Marra, Giovanni Poggi, and Luisa Verdoliva, “Are gan generated images easy to detect? a critical analysis of the state-of-the-art,” in 2021 IEEE international conference on multimedia and expo (ICME). IEEE, 2021, pp. 1–6.
Mingxing Tan and Quoc Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning. PMLR, 2019, pp. 6105–6114.
Chengdong Dong, Ajay Kumar, and Eryun Liu. Think twice before detecting GAN-generated fake images from their spectral domain imprints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7865–7874, 2022.
Peter Lorenz, Ricard L Durall, and Janis Keuper, “Detecting images generated by deep diffusion models using their local intrinsic dimensionality,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 448–459.
Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz, “Leveraging frequency analysis for deep fake image recognition,” in International conference on machine learning. PMLR, 2020, pp. 3247–3258.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao, “LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365, 2015.
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila, “Alias-free generative adversarial networks,” Advances in Neural Information Processing Systems, vol. 34, 2021.
License
Copyright (c) 2025 Banan Jamil Awrahman, Zhir Jamil Awrahman, Chya Fatah Aziz
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.