
SSII2021 [SS2] Deepfake Generation and Detection – An Overview (ディープフェイクの生成と検出)




June 10 (Thu), 14:30-15:00
Speaker: Huy H. Nguyen (The Graduate University for Advanced Studies (SOKENDAI) / National Institute of Informatics)

Abstract: Advances in machine learning and their interplay with computer graphics allow us to easily generate high-quality images and videos. State-of-the-art manipulation methods enable the real-time manipulation of videos obtained from social networks. It is also possible to generate videos from a single portrait image. By combining these methods with speech synthesis, attackers can create a realistic video of a person saying something that they never said and distribute it on the internet. This erodes social trust, causes confusion, and harms people's reputations. Several countermeasures have been proposed to tackle this problem, ranging from hand-crafted features to convolutional neural networks. Some countermeasures use images as input, and others leverage temporal information in videos. Their output can be binary (bona fide or fake), multi-class (deepfake detection), or segmentation masks (manipulation localization). Since deepfake methods evolve rapidly, dealing with unseen ones is still a challenging problem. Some solutions have been proposed; however, the problem is not completely solved. In this talk, I will provide an overview of both deepfake generation and deepfake detection/localization. I will mainly focus on the image and video domains and also introduce some audiovisual-based methods on both sides. Some open discussions and future directions are also included.
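
The three output forms named in the abstract (binary, multi-class, and segmentation mask) can be illustrated with a minimal sketch. It is not taken from the talk: the logit values, the three-class layout, and the 0.5 threshold are all assumptions chosen for illustration, and a real detector would produce these scores from a trained network.

```python
import numpy as np

def sigmoid(x):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Map a vector of logits to probabilities that sum to 1."""
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

# Hypothetical raw detector outputs; the numbers are made up.
binary_logit = 2.0                                   # one score per image
class_logits = np.array([0.5, 2.0, -1.0])            # e.g. real, method A, method B
mask_logits = np.array([[-3.0, 4.0], [5.0, -2.0]])   # one score per pixel

# Binary output: bona fide or fake.
p_fake = sigmoid(binary_logit)
is_fake = p_fake > 0.5

# Multi-class output: which class the input most likely belongs to.
class_probs = softmax(class_logits)
predicted_class = int(np.argmax(class_probs))

# Segmentation output: a per-pixel mask marking manipulated regions.
mask = sigmoid(mask_logits) > 0.5
```

The same network backbone can in principle feed any of these heads; only the shape of the final layer and the loss differ.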


  1. Deepfake Generation and Detection - An Overview. June 2021. Huy H. Nguyen (SOKENDAI)
  2. About me
     Education:
     • 2009-2013: BS, University of Science, Vietnam National University – Ho Chi Minh City.
     • 2016-Now: Ph.D. candidate (Echizen Lab), The Graduate University for Advanced Studies (SOKENDAI), in association with the National Institute of Informatics, Japan.
     Research topics: Machine learning, deepfake detection, biometrics.
     Research contributions:
     • Reviewer: APSIPA, ICME, IEEE Access, IEEE TIFS, IEEE/CAA JAS.
     • APSIPA 2020 Special Session Chair: Deep Generative Models for Media Clones and Its Detection.
     Contact: Huy H. Nguyen, nhhuy@nii.ac.jp
  3. Outline
     1. What is Deepfake?
     2. Deepfake Generation
        2.1. Entire face synthesis
        2.2. Attribute manipulation
        2.3. Facial reenactment
        2.4. Speaking manipulation
        2.5. Face swap
     3. Deepfake Detection
        3.1. Datasets
        3.2. Image-based deepfake detection
        3.3. Image-based deepfake segmentation
        3.4. Video-based deepfake detection
        3.5. Generalizability
     4. Discussion
     5. Q&A
  4. 1. What is Deepfake?
  5. 1. What is Deepfake?
     Deepfake / facial generation & manipulation:
     • Entire face synthesis
     • Attribute manipulation: hair, skin color, expression
     • Facial reenactment
     • Speaking manipulation
     • Face swap
  6. 1. Examples of Deepfake
     • Realistic images generated by StyleGAN 2 (Karras et al. 2020). More examples can be found at https://thispersondoesnotexist.com/
     • A French charity published a deepfake of Trump saying 'AIDS is over'. Source: Euronews
  7. 1. Deepfake's threats
     • Breaking authentication systems → identity theft (Chingovska et al. 2012)
     • Pornography
     • Fraudulent / spying purposes (Image: The Verge)
     • Spreading disinformation (Image: CNN)
     • Breaking border control (Image: MIT Technology Review)
     • Phony, blackmail (Image: Military Times)
  8. 2. Deepfake Generation
  9. 2.1. Entire Face Synthesis
     VAEs vs. GANs
     • StyleGAN / StyleGAN 2 [1] (Karras et al. 2019/2020): uses a progressive training strategy and a style-based image generation approach.
     • VQ-VAE 2 (Razavi et al. 2019): uses a multi-stage image generation strategy.
     [1] Demo can be found at: https://thispersondoesnotexist.com/
     References:
     - Razavi, Ali, Aaron van den Oord, and Oriol Vinyals. "Generating diverse high-fidelity images with vq-vae-2." NeurIPS (2019).
     - Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." CVPR, pp. 4401-4410. 2019.
     - Karras, Tero, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Analyzing and improving the image quality of stylegan." CVPR, pp. 8110-8119. 2020.
  10. 2.2. Attribute Manipulation
      • ELEGANT (Xiao et al. 2018): exchanging latent encodings for transferring multiple face attributes.
      • StarGAN (Choi et al. 2018): image-to-image translation for multiple domains.
      References:
      - Choi, Yunjey, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. "Stargan: Unified generative adversarial networks for multi-domain image-to-image translation." CVPR, pp. 8789-8797. 2018.
      - Xiao, Taihong, Jiapeng Hong, and Jinwen Ma. "ELEGANT: Exchanging latent encodings with GAN for transferring multiple face attributes." ECCV, pp. 168-184. 2018.
  11. 2.3. Facial Reenactment
      Video (attacker) + video (victim) → forged video
      • Face2Face (Thies et al. 2016): transferring the facial movements of one person to another.
      • Deep Video Portraits (Kim et al. 2018): an extension of Face2Face that also transfers head poses.
      References:
      - Thies, Justus, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. "Face2Face: Real-time face capture and reenactment of RGB videos." CVPR, pp. 2387-2395. 2016.
      - Kim, Hyeongwoo, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, and Christian Theobalt. "Deep video portraits." ACM Transactions on Graphics (TOG) 37, no. 4 (2018): 1-14.
  12. 2.3. Facial Reenactment
      Video (attacker) + video (victim) → forged video
      • Head2Head++ (Doukas et al. 2021)
      • NeuralTextures (Thies et al. 2019)
      References:
      - Doukas, Michail Christos, Mohammad Rami Koujan, Viktoriia Sharmanska, Anastasios Roussos, and Stefanos Zafeiriou. "Head2Head++: Deep Facial Attributes Re-Targeting." IEEE Transactions on Biometrics, Behavior, and Identity Science 3, no. 1 (2021): 31-43.
      - Thies, Justus, Michael Zollhöfer, and Matthias Nießner. "Deferred neural rendering: Image synthesis using neural textures." ACM Transactions on Graphics (TOG) 38, no. 4 (2019): 1-12.
  13. 2.3. Facial Reenactment
      Video (attacker) + image (victim) → forged video
      • Bringing Portraits to Life (Averbuch-Elor et al. 2017)
      • Neural Talking Head Models (Zakharov et al. 2019)
      • ICFace (Tripathy et al. 2020)
      References:
      - Averbuch-Elor, Hadar, Daniel Cohen-Or, Johannes Kopf, and Michael F. Cohen. "Bringing portraits to life." ACM Transactions on Graphics (TOG) 36, no. 6 (2017): 1-13.
      - Zakharov, Egor, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky. "Few-shot adversarial learning of realistic neural talking head models." ICCV, pp. 9459-9468. 2019.
      - Tripathy, Soumya, Juho Kannala, and Esa Rahtu. "ICFace: Interpretable and controllable face reenactment using GANs." WACV, pp. 3385-3394. 2020.
  14. 2.4. Speaking Manipulation
      Synthesized speech (attacker) + image/video (victim) → forged video
      • Speech2Vid (Jamaludin et al. 2019)
      • Synthesizing Obama (Suwajanakorn et al. 2017)
      References:
      - Jamaludin, Amir, Joon Son Chung, and Andrew Zisserman. "You said that?: Synthesising talking faces from audio." International Journal of Computer Vision 127, no. 11 (2019): 1767-1779.
      - Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. "Synthesizing obama: learning lip sync from audio." ACM Transactions on Graphics (TOG) 36, no. 4 (2017): 1-13.
  15. 2.4. Speaking Manipulation
      Modified text (attacker) + video (victim) → forged video
      • Text-based Editing of Talking-head Video (Fried et al. 2019)
      References:
      - Fried, Ohad, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B. Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, and Maneesh Agrawala. "Text-based editing of talking-head video." ACM Transactions on Graphics (TOG) 38, no. 4 (2019): 1-14.
  16. 2.5. Face Swap
      Traditional (computer-graphics-based) face swap
      • Face Swapping (Bitouk et al. 2008)
      • FaceSwap [1] (Kowalski 2016)
      [1] A course project from the Warsaw University of Technology. Access at https://github.com/MarekKowalski/FaceSwap
      References:
      - Bitouk, Dmitri, Neeraj Kumar, Samreen Dhillon, Peter Belhumeur, and Shree K. Nayar. "Face swapping: automatically replacing faces in photographs." SIGGRAPH 2008, pp. 1-8. 2008.
  17. 2.5. Face Swap
      Deep-learning-based face swap
      • Original Deepfake (Faceswap) [1] (Image: Alan Zucconi)
      • Faceswap-GAN [2] (Image: shaoanlu)
      [1] https://github.com/deepfakes/faceswap
      [2] https://github.com/shaoanlu/faceswap-GAN
  18. 3. Deepfake Detection
  19. 3. Overview of Deepfake Detection
      Other terms/similar topics:
      • Image splicing detection
      • Computer-generated image/video detection
      • Presentation attack detection
      • Image/video manipulation/forgery detection
  20. 3. Overview of Deepfake Detection
      Deepfake detection methods can be categorized by:
      • Input: image/video frame, video
      • Output: classification, segmentation
      • Feature extraction: hand-crafted, automatic (deep learning), semi-automatic
      • Architecture: single network, two-stream, multi-task learning, ensemble
  21. 3.1. Datasets

      | Dataset | Year | #Real | #Fake | #Person | Manipulation methods |
      |---|---|---|---|---|---|
      | DF-TIMIT [1] | 2018 | 320 | 320 | 1 | Deepfake |
      | UADFV [2] | 2018 | 49 | 49 | 1 | Deepfake |
      | FaceForensics++ [3] | 2019 | 1,000 | 5,000 | 1 | Deepfake family, Face2Face, FaceSwap, NeuralTextures, FaceShifter |
      | Google DFD [4] | 2019 | 363 | 3,068 | 1 | Deepfake |
      | Facebook DFDC [5] | 2020 | 23,654 | 104,500 | ~1 | Various |
      | Celeb-DF [6] | 2020 | 590 | 5,639 | 1 | Deepfake |
      | DeeperForensics [7] | 2020 | 1,000 (from FF++) | 1,000 (raw) → 10,000 (aug.) | 1 | DeepFake-VAE |
      | WildDeepfake [8] | 2020 | 707 (real/fake split not listed) | | 1 | No information |
      | Face Forensics in the Wild (FFIW) [9] | 2021 | 10,000 | 10,000 | 3.15 | DeepFaceLab, FaceSwap, FaceSwap-GAN |

      [Figure: sample frames from FaceForensics++, DFDC, DeeperForensics, and Celeb-DF]
      References:
      [1] Korshunov, P. and Marcel, S., 2018. Deepfakes: a new threat to face recognition? assessment and detection. arXiv preprint arXiv:1812.08685.
      [2] Li, Yuezun, Ming-Ching Chang, and Siwei Lyu. "In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking." WIFS. 2018.
      [3] Rossler, Andreas, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. "Faceforensics++: Learning to detect manipulated facial images." ICCV. 2019.
      [4] Google AI blog. Contributing data to deepfake detection research. Access at https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html. 2019.
      [5] Dolhansky, Brian, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. "The deepfake detection challenge dataset." arXiv preprint arXiv:2006.07397 (2020).
      [6] Li, Yuezun, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. "Celeb-DF: A large-scale challenging dataset for deepfake forensics." CVPR. 2020.
      [7] Jiang, Liming, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. "Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection." CVPR. 2020.
      [8] Zi, Bojia, Minghao Chang, Jingjing Chen, Xingjun Ma, and Yu-Gang Jiang. "WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection." ACM Multimedia. 2020.
      [9] Zhou, Tianfei, Wenguan Wang, Zhiyuan Liang, and Jianbing Shen. "Face Forensics in the Wild." CVPR. 2021.
  22. 3.2. Image-based Deepfake Detection
      • Using hand-crafted residuals to extract features and an ensemble classifier (Fridrich and Kodovsky 2012).
      • Reimplementing Fridrich and Kodovsky's work as a CNN (Cozzolino et al. 2017).
      • Proposing a new convolutional layer in which the coefficients in the green region sum to 1 (Bayar and Stamm 2016).
      References:
      - Fridrich, Jessica, and Jan Kodovsky. "Rich models for steganalysis of digital images." IEEE Transactions on Information Forensics and Security 7, no. 3 (2012): 868-882.
      - Cozzolino, Davide, Giovanni Poggi, and Luisa Verdoliva. "Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection." ACM Workshop on Information Hiding and Multimedia Security, pp. 159-164. 2017.
      - Bayar, Belhassen, and Matthew C. Stamm. "A deep learning approach to universal image manipulation detection using a new convolutional layer." ACM Workshop on Information Hiding and Multimedia Security. 2016.
  23. 3.2. Image-based Deepfake Detection
      CNN-based single-network deepfake detectors
      • MesoNet (Afchar et al. 2018) is a compact network using residual blocks (He et al. 2016).
      • Transfer learning applied to XceptionNet (Chollet 2017) for deepfake detection (Rossler et al. 2019).
      • EfficientNet (Tan and Le 2019) is another solid architecture for deepfake detection, which achieved high scores in the DFDC (Dolhansky et al. 2020).
      References:
      - Afchar, Darius, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. "Mesonet: a compact facial video forgery detection network." WIFS. IEEE, 2018.
      - He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." CVPR. 2016.
      - Chollet, François. "Xception: Deep learning with depthwise separable convolutions." CVPR. 2017.
      - Rossler, Andreas, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. "Faceforensics++: Learning to detect manipulated facial images." ICCV. 2019.
      - Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." ICML. PMLR, 2019.
      - Dolhansky, Brian, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. "The deepfake detection challenge dataset." arXiv preprint arXiv:2006.07397 (2020).
  24. 3.2. Image-based Deepfake Detection
      Two-stream network deepfake detectors
      • A two-stream network in which one branch takes RGB input and the other takes steganalysis features and uses a triplet loss (Zhou et al. 2017).
      • The two-stream F3-Net with the frequency-aware decomposition (FAD) branch and the local frequency statistics (LFS) branch (Qian et al. 2020).
      References:
      - Zhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. "Two-stream neural networks for tampered face detection." CVPRW. IEEE, 2017.
      - Qian, Yuyang, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. "Thinking in frequency: Face forgery detection by mining frequency-aware clues." ECCV. Springer, Cham, 2020.
  25. 3.2. Image-based Deepfake Detection
      Complex network deepfake detectors
      • Capsule-Forensics (Nguyen et al. 2019), a detector based on capsule networks (Sabour et al. 2017), with statistical pooling layers (Rahmouni et al. 2017) used by the primary capsules.
        [Figure: Capsule-Forensics architecture. A feature extractor feeds primary capsules (2D conv, 1D conv, and stats pooling blocks); dynamic routing produces the real and fake output capsules, followed by softmax and mean to give the final output.]
      • A two-stream attention-based deepfake detector in which one branch uses a manipulation appearance model (MAM) and the other uses direct regression (Dang et al. 2020).
      References:
      - Dang, Hao, Feng Liu, Joel Stehouwer, Xiaoming Liu, and Anil K. Jain. "On the detection of digital face manipulation." CVPR, pp. 5781-5790. 2020.
      - Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." NIPS. 2017.
      - Nguyen, Huy H., Junichi Yamagishi, and Isao Echizen. "Capsule-forensics: Using capsule networks to detect forged images and videos." ICASSP. IEEE, 2019.
      - Rahmouni, Nicolas, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. "Distinguishing computer graphics from natural images using convolution neural networks." WIFS. IEEE, 2017.
  26. 3.3. Image-based Deepfake Segmentation
      Multi-task learning & active learning to improve generalization
      • Multi-task learning combining detection, segmentation, and reconstruction (Nguyen et al. 2019).
        [Figure: an encoder produces latent features used for the label in [0, 1] (via activation and selection); a shared decoder feeds a reconstruction branch (reconstructed image) and a segmentation branch.]
      • Locality-aware autoencoder with attention loss and active learning (Du et al. 2020).
      References:
      - Nguyen, Huy H., Fuming Fang, Junichi Yamagishi, and Isao Echizen. "Multi-task learning for detecting and segmenting manipulated facial images and videos." BTAS. IEEE, 2019.
      - Du, Mengnan, Shiva Pentyala, Yuening Li, and Xia Hu. "Towards Generalizable Deepfake Detection with Locality-aware AutoEncoder." International Conference on Information & Knowledge Management. 2020.
  27. 3.3. Image-based Deepfake Segmentation
      • Using a dilated residual network (DRN) to detect photoshopped regions (Wang et al. 2019).
      • Face X-ray, focusing on the blending area instead of the manipulated area (Li et al. 2020).
      • Using a patch classifier to generate a heatmap (Chai et al. 2020).
      References:
      - Wang, Sheng-Yu, Oliver Wang, Andrew Owens, Richard Zhang, and Alexei A. Efros. "Detecting photoshopped faces by scripting photoshop." ICCV. 2019.
      - Li, Lingzhi, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. "Face x-ray for more general face forgery detection." CVPR. 2020.
      - Chai, Lucy, David Bau, Ser-Nam Lim, and Phillip Isola. "What makes fake images detectable? Understanding properties that generalize." ECCV. Springer, Cham, 2020.
  28. 3.4. Video-based Deepfake Detection
      Biologically inspired approaches
      • Detecting eye blinking (Li et al. 2018).
      • Modeling facial expression movements (Agarwal et al. 2019).
      • Using photoplethysmography (PPG) (Ciftci et al. 2020).
      References:
      - Li, Yuezun, Ming-Ching Chang, and Siwei Lyu. "In Ictu Oculi: Exposing AI generated fake face videos by detecting eye blinking." WIFS. 2018.
      - Agarwal, Shruti, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. "Protecting World Leaders Against Deep Fakes." CVPRW. 2019.
      - Ciftci, Umur Aybars, Ilke Demir, and Lijun Yin. "Fakecatcher: Detection of synthetic portrait videos using biological signals." IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).
  29. 3.4. Video-based Deepfake Detection
      Automatic feature extraction approaches
      • Automatic feature extraction in both the spatial and temporal domains (Sabir et al. 2019).
      • Multi-person deepfake detection using multi-temporal-scale instance feature aggregation and bag feature aggregation (Zhou et al. 2021).
      References:
      - Sabir, Ekraam, Jiaxin Cheng, Ayush Jaiswal, Wael AbdAlmageed, Iacopo Masi, and Prem Natarajan. "Recurrent convolutional strategies for face manipulation detection in videos." CVPRW. 2019.
      - Zhou, Tianfei, Wenguan Wang, Zhiyuan Liang, and Jianbing Shen. "Face Forensics in the Wild." CVPR. 2021.
  30. 3.5. Generalizability
      Cross-domain deepfake detection is still challenging!
      • [Figure: accuracy (%) versus inference time (s) for Capsule-Forensics (VGG-19), Capsule-Forensics (ResNet-50), Capsule-Forensics (XceptionNet FT), Feature aggregation (VGG-19), Feature aggregation (ResNet-50), Multi-task learning, XceptionNet, and EfficientNet-B4.] Performance of several detectors (trained on the FaceForensics++ dataset) on the Google DFD dataset. Although they achieve high performance (over 90%) on the FaceForensics++ dataset, they still struggle with the domain mismatch issue.
      • [Figure: correlation between the scores of several detectors on the public and private datasets of the DFDC [1].] Many detectors struggle with the domain mismatch issue.
      [1] Image obtained from https://www.facebook.com/mediaforensics2020/videos/1640779116079742/
  31. 4. Discussion
  32. 4. Discussion
      Does "deepfake" have good applications? → Fast and easy content creation and editing
      • Synthetic Media: https://www.syntheticmedialandscape.com/
      • Synthesia STUDIO: https://www.synthesia.io/
  33. 4. Discussion
      Potential/challenging topics in deepfake detection:
      • Low-quality input deepfake detection
      • Cross-domain deepfake detection
      • Online learning
      • Explainable AI: result explanation, finding/reconstructing original images/videos
      → Deepfake detection in the wild (real-world applications)
  34. 5. References
      Some nice survey papers:
      • Verdoliva, Luisa. "Media forensics and deepfakes: an overview." IEEE Journal of Selected Topics in Signal Processing 14, no. 5 (2020): 910-932.
      • Tolosana, Ruben, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, and Javier Ortega-Garcia. "Deepfakes and beyond: A Survey of face manipulation and fake detection." Information Fusion 64 (2020): 131-148.
  35. Thank you very much! Q&A
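
Slide 22 (section 3.2) mentions a convolutional layer whose coefficients in the highlighted region sum to 1 (Bayar and Stamm 2016). The following is a minimal NumPy sketch of one way to enforce such a constraint on a single kernel. It rests on assumptions not stated on the slide: that the highlighted region is every position except the center, and that the center weight is fixed to -1 so the filter acts as a prediction-error (residual) filter, as described in the cited paper. The function name `project_constrained_kernel` is hypothetical.

```python
import numpy as np

def project_constrained_kernel(kernel):
    """Rescale a square, odd-sized kernel so that the off-center weights
    sum to 1 and the center weight equals -1 (assumed constraint)."""
    k = np.array(kernel, dtype=np.float64)  # work on a copy
    c = k.shape[0] // 2                     # index of the center position
    k[c, c] = 0.0                           # exclude the center from the sum
    total = k.sum()
    if total == 0.0:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    k /= total                              # off-center weights now sum to 1
    k[c, c] = -1.0                          # fix the center weight
    return k

# Example with made-up positive weights (seeded for reproducibility).
rng = np.random.default_rng(0)
kernel = project_constrained_kernel(rng.uniform(0.1, 1.0, size=(5, 5)))
```

In a training loop, a projection like this would typically be reapplied to the first-layer kernels after each weight update, so that the layer keeps responding to local pixel relationships rather than image content.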