Table 2. Summary of digital forensic techniques for deepfake detection.
A summary of the method, modality, features analyzed, validation datasets, and performance metrics, along with the advantages and limitations of each technique.
| Reference | Method | Modality | Features analyzed | Validation dataset | Performance metrics | Advantages | Limitations |
|---|---|---|---|---|---|---|---|
| Xia et al. (2022) | MesoNet with Preprocessing (sketched after the table) | Video (Face Images) | Enhanced MesoNet features, Frame consistency, Color and texture details | FaceForensics++ and DFDC Preview Dataset | Accuracy: 95.6% (FaceForensics++), 93.7% (DFDC) | Preprocessing module enhances the discriminative capability of the network. Robust against various compression levels and deepfake generation techniques. | Performance might vary with deepfake quality. Slight computational overhead due to preprocessing. |
| Guarnera et al. (2020) | Feature-based Forensic Analysis (JPEG blockiness cue sketched after the table) | Image | JPEG artifacts, Quantization tables, Sensor noise patterns | Custom dataset of StarGAN and StyleGAN images | Qualitative analysis | Targets intrinsic features and artifacts, making it robust against typical manipulations. Can be applied to a wide variety of image sources and formats. | Might be less effective against advanced manipulation techniques. Requires high-quality original images for optimal performance. |
| Kumar et al. (2020) | Counter Anti-Forensic Approach | Image | JPEG compression artifacts, Histogram analysis, Noise inconsistencies | Self-created dataset with a variety of JPEG manipulations | Effectiveness in detecting anti-forensic manipulations discussed | Specifically designed to detect and counter anti-forensic techniques. Utilizes multiple feature sets for a comprehensive analysis. | May require calibration based on the specific JPEG anti-forensic technique used. Performance might vary based on the quality and type of manipulations. |
| Raza, Munir & Almutairi (2022) | Convolutional Neural Network (CNN) Approach | Video | Deep features from CNN layers, Temporal dynamics and spatial details | DFDC and DeepFake-TIMIT | Accuracy: 96.4% (DFDC), 95.7% (DeepFake-TIMIT) | Utilizes deep features which capture intricate details often missed by traditional methods. Highly scalable due to the deep learning framework. | Requires a significant amount of labeled data for training. Performance might degrade in scenarios with limited training data or diverse manipulations. |
| Mitra et al. (2020) | Machine Learning-based Forensic Analysis | Video (Face Regions) | Frame-by-frame pixel intensity, Facial expressions and landmarks, Audio-visual synchronization | DFDC Preview Dataset | Accuracy: 94.7%, Precision: 94.5%, Recall: 94.8%, F1 Score: 94.6% | Integrates both visual and auditory features for improved detection. Applicable to a wide range of videos sourced from social media platforms. | Might be sensitive to noisy social media data. Requires substantial computational resources for feature extraction and analysis. |
| Vamsi et al. (2022) | Media Forensic Deepfake Detection | Image and Video | Compression artifacts, Lighting anomalies, Physiological signals (e.g., heartbeat, breath patterns) | Combined dataset from FaceForensics++, DFDC, and DeepFake-TIMIT | Accuracy: 93.5% | Comprehensive approach that combines various media forensic techniques. Targets both superficial and deep features of manipulated content. | May require high-resolution data to detect subtle physiological signals. Computationally intensive due to the amalgamation of multiple forensic methods. |
| Lee et al. (2021) | Temporal Artifacts Reduction (TAR) (temporal cue sketched after the table) | Video (Face Regions) | Temporal artifacts in frame sequences, Lighting and shadow inconsistencies | DeepFake Detection Challenge Dataset (DFDC) | Accuracy: 97.3% | Targets inconsistencies arising from the deepfake generation process. Effective in detecting subtle temporal artifacts. | Might be sensitive to video quality and resolution. Requires a sequence of frames rather than individual images. |
| Li et al. (2020) | Dataset-based Forensics (Celeb-DF) | Video (Face Regions) | Dataset creation and benchmarking | Custom Celeb-DF dataset | No headline metric; existing detectors are benchmarked on the dataset | Provides a large-scale, challenging dataset for deepfake forensics. Contains high-quality deepfakes. | Dataset complexity might challenge traditional forensic techniques. Needs to be combined with other datasets for comprehensive evaluation. |
| Kumar & Sharma (2023) | GAN-Based Forensic Detection | Image, Video | Discriminative features from GAN layers, Texture and color anomalies | DFDC and FaceForensics++ | Accuracy: 96.1% (DFDC), 95.8% (FaceForensics++) | Exploits GAN discriminator features for detection. Capable of detecting intricate manipulations. | Sensitive to the quality of GAN-generated fakes. Requires significant computational resources. |
| Hao et al. (2022) | Multi-modal Fusion (fusion skeleton sketched after the table) | Image and Audio | Image: Differences in pixel intensity, facial landmarks, skin tone inconsistencies. Audio: Spectral features, prosodic features, phonotactic features. | Deepfake Detection Challenge Dataset (DFDC) | Accuracy: 94.2%, Precision: 93.8%, Recall: 94.1%, F1 Score: 94.0% | Uses a fusion of image and audio modalities, which increases robustness. Effective in real-world scenarios where only one modality might be tampered with. | Requires both audio and video data, which might not always be available. Slightly increased computational overhead due to multi-modal processing. |
| Jafar et al. (2020) | Temporal Forensic Analysis | Video | Temporal inconsistencies: Frame-to-frame variations. Compression artifacts: Differences due to video compression. Lighting inconsistencies: Inconsistencies in shadows and light reflections. | FaceForensics++ and DeepFake-TIMIT | Accuracy: 91.5% (FaceForensics++), 89.8% (DeepFake-TIMIT) | Targets inconsistencies that arise due to the video generation process. Robust against various deepfake generation techniques. | Performance might degrade with higher-quality deepfakes. Requires a sequence of frames rather than individual images. |
| Ferreira, Antunes & Correia (2021) | Dataset-based Forensics | Image and Video | Metadata extraction, Image source identification, Manipulation detection | Proprietary dataset introduced in the article | Accuracy and precision (no values reported) | Provides a diverse set of images and videos for forensic analysis. Can be used to benchmark multiple forensic techniques. | Dataset might not cover all possible manipulations and scenarios. Requires periodic updates to remain relevant. |
| Wang et al. (2022a), Wang et al. (2022b) and Wang et al. (2022c) | Reliability-based Forensics | Video, Audio | Frame consistency, Eye blinking patterns, Facial muscle movements, Skin texture, Voice patterns | Celeb-DF Dataset | Accuracy: 92.3%, Precision: 92.1%, Recall: 92.4%, F1 Score: 92.2% | Uses natural physiological signals which are hard for deepfakes to mimic. Applicable to a wide range of videos regardless of content. | Might be sensitive to video quality and resolution. Real-life scenarios with partial occlusions or low lighting might affect performance. |
| Xue et al. (2022) | Combination of F0 information and spectrogram features (input representation sketched after the table) | Audio | Fundamental frequency (F0), real and imaginary spectrogram features | ASVspoof 2019 LA dataset | Equal error rate (EER) of 0.43% | High effectiveness in detecting audio deepfakes, surpassing most existing systems | Limited discussion of applicability to diverse real-world scenarios |
| Müller et al. (2022) | Re-implementation and evaluation of existing architectures | Audio | Various audio spoofing detection features | New dataset of celebrity and politician recordings | Performance degradation on real-world data | Systematizes audio spoofing detection, identifies key successful features | Poor performance on real-world data, suggesting limited generalizability |
| Khalid et al. (2021) | Novel multimodal detection method | Audio-Video | Audio-visual cues across deepfake videos paired with synthesized cloned audio | FakeAVCeleb dataset | Accuracy and precision, reported separately for the audio and video streams | Addresses multimodal deepfake detection and racial bias issues | Dataset might not cover all possible manipulations and scenarios. |
| Fagni et al. (2021) | Dataset introduction and evaluation (baseline sketched after the table) | Text (Tweets) | Tweets from various generative models | TweepFake dataset | Evaluation of 13 detection methods | First dataset of real deepfake tweets, baseline for future research | Benchmarks existing methods rather than proposing a new detection technique |
| Kietzmann et al. (2020) | R.E.A.L. framework | Various (including text) | Deepfake types and technologies | None; illustrated mainly with text-based examples | Qualitative assessment of framework effectiveness | Comprehensive overview, risk management strategy | Lacks empirical validation |
| Pu et al. (2023) | Semantic analysis | Text | Semantic information in text | Text from online services powered by Transformer-based generators | Robustness against adversarial attacks | Improves robustness and generalization | Performance degradation under certain scenarios |
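
The sketches below illustrate, in deliberately simplified form, several of the detection strategies summarized above; none reproduces a cited author's exact pipeline.

First, a minimal PyTorch sketch of a Meso-4-style frame classifier in the spirit of Xia et al. (2022). Layer sizes follow the published Meso-4 design, but the preprocessing module Xia et al. place in front of the network is not reproduced here.

```python
import torch
import torch.nn as nn

class Meso4(nn.Module):
    """Shallow mesoscopic CNN classifying face crops as real or fake."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),   # low-level texture filters
            nn.BatchNorm2d(8), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 8, kernel_size=5, padding=2),
            nn.BatchNorm2d(8), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(4),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(16 * 8 * 8, 16),  # assumes 256x256 input crops
            nn.LeakyReLU(0.1),
            nn.Dropout(0.5),
            nn.Linear(16, 1),           # real-vs-fake logit
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage: scores = torch.sigmoid(Meso4()(face_crops))  # face_crops: (N, 3, 256, 256)
```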
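
The JPEG-compression cues used by Guarnera et al. (2020) and targeted by the counter anti-forensic approach of Kumar et al. (2020) can be illustrated with a simple blockiness ratio over the 8×8 JPEG grid. This hypothetical score is only a stand-in for the papers' richer feature sets (quantization tables, histograms, sensor noise).

```python
import numpy as np

def jpeg_blockiness(gray: np.ndarray) -> float:
    """Ratio of pixel differences across 8x8 block boundaries to those inside
    blocks. Values well above 1 suggest JPEG block artifacts; a suspiciously
    flat ratio on a supposedly JPEG-compressed image can hint at anti-forensic
    smoothing."""
    g = gray.astype(np.float64)
    dh = np.abs(np.diff(g, axis=1))          # horizontal neighbor differences
    cols = np.arange(dh.shape[1])
    boundary = dh[:, cols % 8 == 7].mean()   # differences crossing block edges
    interior = dh[:, cols % 8 != 7].mean()   # differences within blocks
    return float(boundary / (interior + 1e-9))
```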
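
The temporal cues exploited by Lee et al. (2021) and Jafar et al. (2020) reduce, at their simplest, to frame-to-frame inconsistencies. The scorer below (assuming OpenCV is available) z-scores each transition by its mean absolute change, so spikes mark candidate blending or splicing artifacts; the published methods learn such cues rather than thresholding them.

```python
import cv2
import numpy as np

def temporal_anomaly_scores(video_path: str) -> np.ndarray:
    """Per-transition z-scores of mean absolute frame difference."""
    cap = cv2.VideoCapture(video_path)
    prev, scores = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            scores.append(np.abs(gray - prev).mean())  # frame-to-frame change
        prev = gray
    cap.release()
    s = np.asarray(scores)
    return (s - s.mean()) / (s.std() + 1e-9)  # spikes = candidate artifacts
```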
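
For audio, Xue et al. (2022) pair fundamental frequency (F0) information with real and imaginary spectrogram features. A hedged sketch of that kind of input representation using librosa follows; the sample rate, FFT size, and F0 search range are illustrative assumptions, not the authors' configuration.

```python
import librosa
import numpy as np

def f0_and_complex_spec(path: str, sr: int = 16000, n_fft: int = 512):
    """Return (2, F, T) real/imaginary spectrogram channels and a per-frame F0 track."""
    y, _ = librosa.load(path, sr=sr)
    spec = librosa.stft(y, n_fft=n_fft, hop_length=n_fft // 2)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                            fmax=librosa.note_to_hz('C7'), sr=sr,
                            frame_length=n_fft, hop_length=n_fft // 2)
    # Real and imaginary parts become two feature channels; the F0 track is a
    # separate stream a downstream classifier can consume.
    return np.stack([spec.real, spec.imag]), np.nan_to_num(f0)
```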
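
A minimal late-fusion skeleton in the spirit of the multimodal approaches of Hao et al. (2022) and the FakeAVCeleb baselines of Khalid et al. (2021): independent image and audio encoders whose embeddings are concatenated before a joint real-vs-fake head. The encoder choices, feature dimensions, and fusion strategy here are assumptions, not either paper's architecture.

```python
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    """Concatenate per-modality embeddings, then classify jointly."""

    def __init__(self, img_dim: int = 512, aud_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.aud_encoder = nn.Sequential(nn.Linear(aud_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # joint real-vs-fake logit
        )

    def forward(self, img_feats, aud_feats):
        z = torch.cat([self.img_encoder(img_feats),
                       self.aud_encoder(aud_feats)], dim=-1)
        return self.head(z)

# Usage with precomputed per-clip features, e.g. from a face CNN and an audio
# spectrogram network:
# logit = LateFusionDetector()(torch.randn(2, 512), torch.randn(2, 128))
```

Because the two streams are encoded independently, the detector can still respond when only one modality has been tampered with, which is the robustness argument made for multimodal fusion in the table.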
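
Finally, on the text side, TweepFake (Fagni et al., 2021) benchmarks detectors ranging from simple classifiers to Transformer-based models. A generic character n-gram baseline of the simpler kind, not any specific method from the paper, can be set up as follows.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def tweet_detector():
    """TF-IDF character n-grams + logistic regression: a common weak baseline
    for human-vs-machine tweet classification."""
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # subword cues
        LogisticRegression(max_iter=1000),
    )

# clf = tweet_detector()
# clf.fit(train_tweets, train_labels)   # labels: 0 = human, 1 = machine-generated
# preds = clf.predict(test_tweets)
```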