Scientific Reports. 2026 Feb 15;16:9137. doi: 10.1038/s41598-026-40387-9

Source camera attribution using a rule-based explainable convolutional neural network

Tahereh Nayerifard 1, Haleh Amintoosi 1, Abbas Ghaemi Bafghi 1
PMCID: PMC12996404  PMID: 41692832

Abstract

In recent years, there has been a push towards adopting artificial intelligence (AI) models, particularly deep learning (DL) models, in digital forensics (DF). While these models assist DF experts, their lack of transparency raises concerns about reliability. Although eXplainable Artificial Intelligence (XAI) has progressed, current methods remain limited for DF applications. Existing visual XAI techniques do not provide sufficient clarity for challenging image forensics tasks such as Source Camera Identification (SCI), nor do they offer mechanisms to assess whether a model’s decision is correct. Most methods simply highlight influential regions without enabling examiners to validate the decision itself. Rule-based explainability is a promising strategy for increasing transparency, yet deploying it on real-world Convolutional Neural Networks (CNNs) is still challenging. Prior studies remain largely experimental and often require modifying the model to extract rules, which conflicts with the integrity requirements of DF workflows. To address these gaps, this paper introduces a framework that makes CNN models used in the analysis stage of digital forensics explainable. The framework follows three fundamental steps (layers’ trace detection, layers’ majority voting, and rule extraction) and provides structured, transparent visual output together with rule-based textual explanations that are understandable to the user. Using this framework, we introduce the first explainable SCI model; SCI is a DF task that is particularly challenging to make explainable. The explainable output lets the DF examiner reject or confirm the main model’s prediction based on the decisions of its layers, while complying with the principle of integrity. In addition, by identifying 27 of the 37 incorrect predictions made by the base model, the framework improved the model’s precision from 97.33% to 99.2%.

Keywords: Explainable artificial intelligence, XAI, Image forensics, Rule extraction, Convolutional neural network, Source camera attribution

Subject terms: Engineering, Mathematics and computing

Introduction

With the exponential growth and increasing complexity of digital data, the field of digital forensics faces a plethora of new challenges. One such challenge is effectively handling large volumes of data and extracting meaningful insights, which requires the adoption of novel techniques such as those derived from the field of Artificial Intelligence1. Leading forensics software developers such as Cellebrite, Magnet Forensics, and Griffeye Technologies have integrated proprietary AI-driven predictive models into their tools. These models are designed to help identify potentially relevant data for forensic analysis. However, as of the current publication date, these models lack explanatory capabilities, which prevents investigators from understanding the rationale behind their findings. Successful integration of AI in digital forensics requires not only accurate results but also clear justifications and explanations that enable forensic analysts to make well-informed decisions without unnecessary complexity2.

Establishing trust in AI systems is crucial for their successful application in DF. Several factors contribute to the acceptance of AI in DF, including transparency, interpretability, and understandability3. An Explainable AI system can fulfill this role in DF. However, it is important to note that the existing XAI methods themselves also face challenges, such as:

Lack of support for challenging DF tasks: Some tasks in digital forensics are challenging to explain and make transparent using current explainability methods. For example, making AI models explainable through existing visual XAI methods for tasks such as source identification, which is fundamental in image forensics, often fails to provide a clear and understandable explanation of the model’s decision-making process. Since this task primarily involves identifying the source’s fingerprint in the evidence, comprehending this fingerprint (such as camera noise on an image) using visual methods is not easily achievable for a human user.

Lack of model error detection: Current explainability methods do not indicate whether a model’s decision should be accepted or rejected; this judgment is left entirely to human experts. Although this is not a primary feature of XAI methods, it becomes important in digital forensics, where it can help investigators make accurate decisions.

Failure to preserve the principle of integrity: Some XAI methods require adding a component to the model or simplifying and creating an approximation of the model that contradicts the notion of ensuring integrity in digital forensics.

To address these gaps and obtain an XAI applicable to DF, which we term xDFAI4, this paper introduces a framework based on rule extraction using layers’ majority voting. Explainability can be achieved in three stages: pre-modeling, modeling, and post-modeling5. Among the existing model-specific methods belonging to the second stage, rule extraction offers explanations with several strengths, including faithfulness, robustness, comprehensiveness, and consistency6. By combining post-modeling explainability with the identification of traces at each layer, we offer explainability at the modeling stage through rule extraction based on majority voting across layers. This framework aims to improve explainability in CNN-based digital forensics attribution tasks and is empirically validated in this work through a scoped SCI case study. The research seeks to answer the following Research Questions (RQs):

RQ1: How can a CNN model be explained using a rule extraction method, while ensuring adherence to the principle of integrity?

RQ2: What will the proposed rule-based XAI framework bring to digital forensics investigators?

To address RQ1, we propose a framework in the “Proposed xDFAI framework (RQ1)” section that integrates seamlessly into the digital forensics process, enabling the creation of an xDFAI. By performing a challenging DF task and analyzing the results in the “Features of the proposed framework (RQ2)” section, we address RQ2.

Scope of the study and validation setting

xDFAI is a post-hoc explainability framework for auditing convolutional neural networks in digital forensic attribution tasks. The scope of the present study is intentionally limited to Source Camera Identification (SCI), which is used as a canonical case study to validate the framework. No empirical claims are made regarding cross-task, cross-dataset, or cross-architecture generalization. Instead, the objective is to demonstrate how xDFAI operates, what types of internal model behaviors it can expose, and under which conditions its explanations are meaningful within the SCI setting. Extensions to other forensic domains are conceptually possible but are outside the empirical scope of this work.

Contributions

The main contributions of this paper are summarized as follows:

  • We propose xDFAI, a novel explainable framework for CNN-based digital forensics, designed to provide interpretable and reliable predictions.

  • We introduce layer-wise trace analysis to capture discriminative features at each stage of the CNN, providing insights into model behavior.

  • We employ a majority voting mechanism across layers to generate consistent, explainable outputs.

  • We integrate post-hoc rule extraction to produce faithful explanations while preserving the integrity of the original model.

These contributions collectively advance the state-of-the-art in explainable digital forensics by combining trace analysis, layer-wise aggregation, and interpretable rule extraction into a cohesive framework. A detailed discussion of the technical innovations and their novelty is provided in “Distinct contributions and key technical innovations of xDFAI” section.

The rest of the paper is organized as follows. The next section provides an overview of the current state-of-the-art in explainable image forensics solutions. “Proposed xDFAI framework (RQ1)” section details the proposed framework. “Case study implementation: an explainable CNN-based SCI” section presents the experiments conducted and the results obtained, and also investigates the characteristics of the proposed framework. The Discussion outlines the limitations and future prospects. Finally, the main conclusions are drawn.

Related works

Explainable AI-based digital forensics (xDFAI) refers to an AI-based digital forensics method that provides an explicit and intelligible rationale for its functions and inferential reasoning, ensuring transparency and assessability4. Research on explainability in digital forensics remains scarce, and the topic needs considerably more attention and exploration.

W. Hall et al. conducted a study focusing on Explainable Artificial Intelligence (XAI) and its application in Digital Forensics7. They presented a model that utilized LIME to analyze extracted forensics data and generate logical conclusions considering the contextual factors of the case. This approach aimed to improve the efficiency of the investigation. Additionally, the authors aimed to showcase the feasibility of implementing XAI in digital forensics, assess the performance of XAI tools using simulated forensics data, and explore the implications and potential advantages of XAI in this field. In network forensics, W.W. Lo et al. introduce XG-BoT, a deep graph neural network model designed for botnet detection and forensics8. XG-BoT comprises a botnet detector and an explainer module for automatic forensics. To extract meaningful node representations from botnet communication graphs, the model employs a grouped reversible residual connection in combination with a graph isomorphism network. Additionally, the XG-BoT explainer utilizes GNNExplainer and saliency maps to identify and emphasize suspicious network flows and associated botnet nodes, so it is suitable for graph-based models. This approach enhances the transparency and explainability of the detection process, helping to automate network forensics.

S. Henrique et al. present a hierarchical explainable forensics algorithm that combines attention-based deepfake detection and an ensemble of models to enhance generalization9. The algorithm employs a Grad-CAM explanation to evaluate the decisions of the model, with a specific focus on the attention maps. This approach achieves an accuracy of 92.4%. In relation to cybersecurity, W. Ge et al. presented MetaCluster, a versatile framework for explainable classification in cybersecurity10. MetaCluster generates semantic prototypes by embedding representations, acquiring prototypes, and aggregating semantics at different levels of granularity. The framework is evaluated in cybersecurity classification tasks, such as malware family classification, threat behavior analysis, and malicious traffic identification.

The current state-of-the-art methods for explainability in digital forensics have mainly relied on existing XAI methods, which still pose challenges7–9, as discussed in the Introduction section. One significant challenge is the limited scope of these methods, which may impede their applicability to forensics scenarios such as source identification. Although W. Ge et al. introduced a new method, applying it to target models requires modifying them to include explainability features10. Furthermore, these methods mostly provide visual explanations on the image data, which may not give users sufficient understanding of the rationale behind the model’s decisions7–10. In digital forensics, the examiner must determine whether the model’s predictions are correct; however, recent studies leave this assessment of reliability entirely to the digital forensics examiner7–10. These limitations underscore the need for further advances in explainability methods to address these challenges and enhance their practical utility in digital forensics. Accordingly, the comparison in this section is intentionally limited to widely adopted explainability methods that are compatible with convolutional neural networks and operate post hoc without requiring modification of the target model, in line with digital forensics integrity constraints. Table 1 summarizes and compares the related work with the current research.

Table 1.

Summary of state-of-the-art studies in explainability of digital forensics tasks in comparison to our work. Not all existing methods are applicable to challenging tasks such as source camera identification, nor do they have the capability to detect incorrect model decisions. (Vis. denotes Visualization.).

| Ref. | New method | XAI method | Error detection | XAI type | Integrity preservation | Challenging DF tasks |
| 7 |  | LIME |  | Vis. |  |  |
| 8 |  | GNNExplainer, saliency map |  | Vis. |  |  |
| 9 |  | Grad-CAM |  | Vis. |  |  |
| 10 |  | MetaCluster |  | Vis. |  |  |
| Our work |  | Rule extraction based on layers’ majority voting |  | Vis., Rules |  |  |

To address the limitations of current methods, this research presents a framework that enhances the explainability of AI models in the context of digital forensics. The framework incorporates three essential steps: layers’ Trace detection, layers’ majority voting, and rule extraction. The result provides explanations for the validation or rejection of decisions of an AI model while preserving the integrity of the models. Identifying incorrect model predictions allows digital forensics examiners to make well-informed final decisions. Particularly in complex scenarios such as image analysis, where currently only visual explainability is available, this framework significantly enhances the capabilities of XAI in digital forensics research.

Proposed xDFAI framework (RQ1)

To address the problems described in the “Related works” section and to answer RQ1, we introduce a framework designed to make the model’s behavior across its layers transparent, helping digital forensic examiners and analysts make informed decisions. As illustrated in Fig. 1, the framework consists of three main components (Trace Detection, Layers’ Majority Voting, and Rule Extraction), which together yield the final xDFAI. The primary objective of this framework is to identify, for each class in each layer, the truly important features with minimal error; these are referred to as the “Trace.” Voting is then conducted in each layer based on the identified Traces. Subsequently, the decision-making process is structured through logical rules that capture the model’s behavior and arrive at the final decision. The following sections explain each component in detail. A comprehensive list of the symbols used in the paper is given in Table 2. To enhance transparency and reproducibility, we have publicly released the source code of the proposed framework.

Fig. 1.


Overview of the proposed rule-based explainable framework for CNN-driven models in digital forensics. The figure presents the overall workflow of the framework, showing how rule extraction integrates with CNN-based model outputs to enhance explainability within digital forensics tasks.

Table 2.

Description of the symbols used in the proposed framework. The table provides an overview of all notations and parameters employed throughout the proposed Algorithms 1 and 2.

| Symbol | Description | Formula/value |
| y | Number of training samples for a class |  |
| r | Rate of reduction of y used to determine the number of common important features in a layer for identification | 10% * Inline graphic (explanatory) |
| Inline graphic | Important features of layer l and class c |  |
| Inline graphic | Identified Trace of layer l and class c | Algorithm 1 |
| TA | Activation values of Trace | Algorithm 1 |
| NED | Normal Euclidean Distance: the distance between Inline graphic and Inline graphic for each Trace i and training sample j | Inline graphic - Inline graphic |
| SED | Strict Euclidean Distance: the distance between the median value of Inline graphic and Inline graphic for each Trace i | Inline graphic) - Inline graphic |
| threshold1 | Minimum acceptable number of Inline graphic for each class c | 10% * Inline graphic (explanatory) |
| threshold2 | Standard deviation of Inline graphic, where x = Inline graphic and n = the number of training samples for class c | Inline graphic |
| Inline graphic | Minimum score required for the winning class, based on the number of layers in the target model | Inline graphic Number of model layers Inline graphic |
| Inline graphic | Final majority voting decision, obtained by summing layer results | Algorithm 2 |
| Inline graphic | Number of winning class(es) based on Inline graphic | Algorithm 2 |

A theoretical explainability model for CNN-based digital forensics

Explainable Artificial Intelligence (XAI) aims to provide human-understandable and faithful explanations of a model’s decision-making process without compromising the integrity of the underlying system. According to the IEEE 2894 standard6, an AI system is explainable if it can express the factors influencing its decisions in a form that is both interpretable and faithful to the original model. Within this view, explainability is not restricted to visual attribution maps; it may also be achieved through symbolic, rule-based, or behavioral explanations, provided that the explanation reflects the true internal decision logic of the model.

Let $\mathcal{F}$ denote a trained convolutional neural network (CNN) composed of L sequential layers, such that

$$\mathcal{F}(x) = (f_L \circ f_{L-1} \circ \cdots \circ f_1)(x),$$

where x is an input image and $f_\ell$ represents the transformation performed at layer $\ell \in \{1, \ldots, L\}$. For a classification task with C classes, the decision of the network can be viewed as the result of a sequence of intermediate representations that progressively encode task-relevant features.
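To make the layer-wise view concrete, the following minimal Python sketch composes a toy network from hypothetical layer maps and records every intermediate representation during the forward pass, which is the raw material a Trace would later be computed from. It assumes NumPy only; the dense-plus-ReLU layers, their sizes, and the helper names `make_toy_layers` and `forward_with_activations` are illustrative stand-ins for real CNN layers, not part of the released xDFAI code.

```python
import numpy as np


def make_toy_layers(dims, seed=0):
    """Build hypothetical layer maps f_1..f_L (dense + ReLU) standing in for CNN layers."""
    rng = np.random.default_rng(seed)
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        weights = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Bind the current weights as a default argument so each closure keeps its own matrix.
        layers.append(lambda h, w=weights: np.maximum(w @ h, 0.0))
    return layers


def forward_with_activations(layers, x):
    """Compute F(x) = f_L(...f_1(x)...) and keep every intermediate output h_l."""
    activations = []
    h = x
    for f in layers:
        h = f(h)
        activations.append(h)
    return h, activations


layers = make_toy_layers([16, 32, 16, 4])
x = np.random.default_rng(1).standard_normal(16)
output, acts = forward_with_activations(layers, x)
print(len(acts), [a.shape for a in acts])  # 3 [(32,), (16,), (4,)]
```

In a real CNN the same effect is usually obtained by attaching forward hooks (for example, PyTorch’s `register_forward_hook`) to the chosen layers; the weights and architecture stay untouched, which is what the integrity requirement depends on.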

In this work, we define explainability as the ability to expose and reason about the decision behavior of the CNN across layers, rather than relying solely on post-hoc visual saliency. To this end, we introduce the notion of a Trace, which serves as the fundamental explanatory unit of the proposed framework. A Trace represents a layer-wise characterization of how discriminative features evolve within the network and how they support or contradict the final prediction.

Formally, for a given input x and class c, let $T_\ell^c(x)$ denote the Trace extracted at layer $\ell$, capturing the most influential feature activations associated with class c at that layer. The collection of Traces across layers,

$$\mathcal{T}^c(x) = \{T_1^c(x), T_2^c(x), \ldots, T_L^c(x)\},$$

constitutes a structured explanation of the model’s internal behavior. Unlike conventional attribution maps that provide a single, often ambiguous visualization, this representation enables systematic analysis of consistency and agreement across layers.

The explainability mechanism of the proposed framework is grounded in three core principles:

  • Faithfulness: Traces are extracted directly from the internal activations of the trained CNN without introducing surrogate models or approximations, ensuring that explanations accurately reflect the true decision process.

  • Integrity preservation: The original model architecture and parameters remain unchanged. The framework operates as a non-invasive explanatory layer, satisfying a key requirement for forensic and security-sensitive applications.

  • Decision-level interpretability: By analyzing agreement and disagreement among layer-wise decisions, the framework explains why a prediction is reliable or suspicious, rather than merely highlighting where the network attends in the input.

Within this theoretical view, layers’ majority voting is not treated as a heuristic classifier but as an explanatory operator that reveals the stability of the decision-making process. Consistent agreement among Traces across layers indicates a coherent and stable internal reasoning path, whereas persistent disagreement exposes abnormal or unreliable behavior. This interpretation aligns with established perspectives on decision-path stability and model introspection in deep learning.

It is important to emphasize that this formulation differs fundamentally from prior work that applies anomaly detection or post-processing to the outputs of existing XAI methods (e.g.,11). Such approaches operate after explanations are generated, whereas the proposed framework embeds explainability directly into the analysis of the CNN’s internal decision flow. Consequently, the Trace-based representation and cross-layer behavioral analysis introduced in this work constitute a distinct class of model-specific, rule-based XAI tailored to digital forensics tasks, particularly source camera identification.

Distinct contributions and key technical innovations of xDFAI

This work introduces several technical contributions that advance explainable artificial intelligence for digital forensics by integrating interpretability directly into deep model analysis. The key contributions are summarized below.

  • Layer-wise trace-based explanation paradigm: We introduce a layer-wise trace-based explanation paradigm that leverages internal activation dynamics across multiple network layers to support interpretability. Unlike conventional explainable AI approaches that rely primarily on output-level attribution or saliency analysis, the proposed paradigm constructs explanations from hierarchical internal representations, enabling a closer alignment between model behavior and forensic reasoning processes.

  • Cross-layer majority voting for explanation stability: This work formulates cross-layer majority voting as an explanation stabilization mechanism within a single deep model. While majority voting is commonly used for prediction aggregation in ensemble learning, its application as an operator for consolidating explanatory evidence across internal network layers is distinct. This approach enhances the consistency of explanations by emphasizing agreement across multiple abstraction levels.

  • Model-preserving post-hoc rule extraction for forensic AI: We propose a post-hoc rule extraction strategy that preserves the integrity of the original deep model. In contrast to surrogate-based explainability techniques that approximate model behavior using simplified interpretable models, the proposed approach derives symbolic rules directly from internal inference behavior without altering the trained network. This property is particularly relevant for digital forensic applications, where evidentiary validity and reproducibility require that explanatory procedures do not interfere with the original decision process.

Methodological rationale and theoretical justification of xDFAI

This subsection provides the methodological and theoretical foundation of the xDFAI framework. The objective is to formally justify its applicability, robustness, reproducibility, and explanatory behavior in CNN-based digital forensics, while clearly separating structural properties from empirical validation. The framework is designed to provide faithful, layer-wise explanations without modifying the underlying model, addressing key limitations of prior XAI approaches.

  1. Structural Applicability and Scope of Validity: We first distinguish between empirical generalization, which requires experimental validation across multiple tasks, datasets, or architectures, and structural applicability, which follows from the design of the framework and the model interfaces it assumes.

    Proposition 1

    (Structural Applicability) Let $\mathcal{F}$ be a trained CNN composed of L layers and used for a supervised attribution task with C classes. The proposed xDFAI framework operates only on (i) intermediate layer activations, (ii) class-conditioned gradient-based attributions, and (iii) deterministic aggregation and rule extraction over layer-level traces. Therefore, xDFAI is structurally applicable to any CNN-based attribution model that exposes these quantities, independently of the specific forensic task or dataset.

    Proof sketch. The trace construction and aggregation steps require no task-specific priors, no handcrafted forensic features, and no modification of $\mathcal{F}$. They consume generic tensors produced by standard CNN forward/backward passes (activations and class-conditioned gradients) and apply deterministic operators (trace selection, cross-layer aggregation, and rule mapping). Hence, applicability follows from architectural interface assumptions rather than task-specific empirical tuning. This distinction ensures that no empirical generalization beyond the evaluated task is implicitly claimed.

  2. Attribution Noise and Trace Stability: The xDFAI framework relies on feature attribution methods to initialize layer-wise Traces. As with all post-hoc attribution techniques, attribution maps may exhibit variability due to numerical approximation, background selection, or local gradient sensitivity. Rather than assuming perfectly stable attributions, xDFAI is explicitly designed to tolerate bounded attribution noise through aggregation and filtering.

    Assumption 1

    (Bounded Attribution Perturbation:) Let $A_\ell^c(x)$ denote the attribution map at layer $\ell$ for input x and class c, produced by a gradient-based attribution method. Attribution variability is modeled as a bounded perturbation
    $$A_\ell^c(x) = \bar{A}_\ell^c(x) + \varepsilon_\ell^c(x),$$
    where $\bar{A}_\ell^c(x)$ is the unperturbed attribution and $\varepsilon_\ell^c(x)$ is a zero-mean perturbation with bounded magnitude.

    Definition 1

    (Stable Trace Element:) A feature index i at layer $\ell$ is considered trace-stable for class c if it appears among the most influential features for a sufficiently large fraction of training samples of class c, despite attribution perturbations.

    Proposition 2

    (Noise Attenuation by Trace Aggregation:) Under Assumption 1, the probability that an unstable attribution feature persists across Trace extraction decreases exponentially with the number of aggregated samples and layers. Consequently, features that consistently appear in the extracted Traces are invariant to small attribution perturbations and represent stable explanatory elements.

    Proof sketch. Trace construction relies on intersection and frequency filtering over attribution maps computed from multiple samples and layers. Random or sample-specific attribution noise is unlikely to survive repeated intersection and thresholding operations, whereas genuinely discriminative features persist. Majority voting further suppresses isolated layer-level fluctuations by favoring consistent class-level evidence.

  3. Explanatory Hyperparameters and Reproducibility: The framework introduces several user-defined parameters, including the reduction rate r, the thresholds threshold1 and threshold2, and the decision threshold on the minimum score required for the winning class. These parameters do not influence the learned representations or predictive function of the underlying CNN.

    Definition 2

    (Explanatory Hyperparameters:) A parameter is termed an explanatory hyperparameter if it affects only the selection, aggregation, or presentation of explanatory elements, without altering model weights, feature representations, or prediction outcomes.

    Under this definition, all parameters used in xDFAI serve explanatory roles. Specifically, threshold1 and threshold2 regulate trace consistency, r governs the strictness–efficiency trade-off during trace extraction, and the decision threshold enforces a minimum level of cross-layer agreement.

    Remark 1

    (Reproducibility Across Settings:) Because explanatory hyperparameters operate independently of the training process, they can be adapted across datasets or architectures without retraining the CNN, enabling reproducible forensic analysis.

    The computational cost of xDFAI is dominated by Trace extraction, which is performed offline.

    Definition 3

    (One-Time Explanatory Cost:) A computational cost is termed one-time if it is incurred only during the initialization of the explanatory framework and can be amortized over all subsequent inference instances.

    Remark 2

    (Scalability Considerations:) Trace extraction scales linearly with the number of layers and training samples and is independent of the number of test instances, ensuring compatibility with practical forensic workflows.

  4. Layer-wise Majority Voting as an Explanatory Aggregation Operator: Aggregation of layer-level information in xDFAI is performed via majority voting. This operation is not intended as a classifier or ensemble fusion mechanism, but as an explanatory operator that exposes decision stability.

    Definition 4

    (Explanatory Aggregation Operator:) An aggregation operator is termed explanatory if it summarizes internal model behavior without altering the predictive function or decision boundaries of the model.

    Proposition 3

    (Robustness to Localized Layer Noise:) Assuming bounded, independent perturbations in layer-wise votes, majority voting suppresses isolated fluctuations by favoring class hypotheses supported by multiple layers.

    Proof sketch. Localized noise affecting a limited number of layers is outvoted by consistent signals from unaffected layers. As network depth increases, the likelihood of random perturbations dominating the aggregated result decreases.

    Weighted or confidence-based fusion strategies would introduce additional assumptions and parameters without improving interpretability for the explanatory objective pursued here.

  5. Explanatory Components and Validation Context: Within this framework, Traces capture discriminative features at each layer, majority voting aggregates class-level evidence across depth, and post-hoc rule extraction translates internal behavior into symbolic explanations without modifying the model.

    While Proposition 1 establishes structural applicability, empirical validation in this study is intentionally limited to the Source Camera Identification (SCI) task. No claims are made regarding empirical generalization beyond this case study.

    Collectively, these components form a coherent, theoretically grounded explainability framework that supports transparent, integrity-preserving analysis of CNN-based digital forensics models.
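The robustness arguments in items 2 and 4 (frequency filtering of attributions and majority voting over layers) can be illustrated numerically. The sketch below is a hedged illustration under stated assumptions, not the authors’ Algorithm 1: it treats one layer and one class, simulates attributions as a fixed signal on three hypothetical discriminative features plus zero-mean Gaussian noise (in the spirit of Assumption 1), keeps only features that rank in the top-k for at least a fraction `min_fraction` of the samples (in the spirit of Definition 1), and computes the probability that more than half of L layers vote wrongly. That last computation assumes independent layer errors, exactly as Proposition 3 does; real layers are correlated, so the numbers only show the trend.

```python
import numpy as np
from math import comb


def stable_trace(attributions, k, min_fraction):
    """Indices of features ranked in the top-k for at least `min_fraction` of the samples.

    attributions: array of shape (n_samples, n_features) for one layer and one class.
    """
    n_samples, n_features = attributions.shape
    top_k = np.argsort(-attributions, axis=1)[:, :k]  # top-k feature indices per sample
    counts = np.bincount(top_k.ravel(), minlength=n_features)  # how often each feature is top-k
    return np.flatnonzero(counts >= min_fraction * n_samples)


def majority_error_probability(n_layers, p):
    """P(more than half of n_layers independent layers vote wrongly), each wrong with prob. p."""
    return sum(
        comb(n_layers, j) * p**j * (1 - p) ** (n_layers - j)
        for j in range(n_layers // 2 + 1, n_layers + 1)
    )


rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
signal = np.zeros(n_features)
signal[[3, 17, 42]] = 2.0  # hypothetical discriminative features
noisy = signal + rng.normal(0.0, 0.5, size=(n_samples, n_features))  # zero-mean perturbation

print(stable_trace(noisy, k=5, min_fraction=0.8))  # only the three signal features survive
print([round(majority_error_probability(n, 0.3), 4) for n in (1, 5, 15)])
```

Raising `min_fraction` makes the retained set stricter at the cost of keeping fewer features, which mirrors the strictness–efficiency trade-off attributed to r in item 3. With a per-layer error rate of 0.3, the chance of a wrong majority falls from 0.3 for one layer to about 0.163 for five.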

Explainable AI implementation in xDFAI

This section details how explainability is operationalized within the xDFAI framework. The description focuses exclusively on the mechanisms used to extract, aggregate, and represent interpretable information from the model during inference, without addressing novelty or comparative advantages.

  • Layer-wise Trace Extraction: During inference, xDFAI records intermediate activation responses from a predefined set of layers within the deep neural network. These intermediate responses, referred to as layer-wise traces, capture the transformation of input data across successive abstraction levels of the model. Each trace reflects the internal representation formed at a specific stage of the inference pipeline. The extracted traces are stored in a structured form that enables subsequent aggregation and analysis. By preserving activation information across multiple layers, xDFAI enables inspection of internal model behavior beyond the final output layer, forming the basis for later explanation generation.

  • Cross-Layer Majority Voting Mechanism: To aggregate explanatory signals derived from multiple layer-wise traces, xDFAI applies a cross-layer majority voting mechanism. Each selected layer produces an intermediate explanatory outcome based on its corresponding trace. These outcomes are combined using a deterministic voting rule, where the explanation supported by the majority of layers is selected. This aggregation process operates independently of model parameters and does not modify the inference procedure. The voting mechanism serves solely to consolidate explanatory information derived from different depths of the network.

  • Post-hoc Rule Extraction: Following the aggregation of layer-wise explanatory outcomes, xDFAI performs post-hoc rule extraction to generate symbolic representations of the inference process. The extracted rules are derived from observed activation patterns and voting results, translating internal model behavior into human-interpretable logical conditions. The rule extraction process is non-intrusive and is executed after inference is completed. The trained model remains unchanged, and no auxiliary or surrogate models are introduced during explanation generation.
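Non-intrusive recording of layer-wise traces can be demonstrated on a toy model. This pure-Python sketch assumes the model is a plain list of named layer functions. `record_traces` and the toy layers are hypothetical and are not part of the published xDFAI code. In a deep learning framework, the same role is usually filled by forward hooks (e.g., PyTorch's `register_forward_hook`) or by a Keras sub-model whose outputs are the intermediate layers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[str, Callable[[Vector], Vector]]


def record_traces(layers: Sequence[Layer], x: Vector,
                  watch: Sequence[str]) -> Tuple[Vector, Dict[str, Vector]]:
    """Run a sequential model and copy the outputs of the watched layers.

    The forward computation itself is untouched, so the final output is the
    same whether or not any layer is watched.
    """
    recorded: Dict[str, Vector] = {}
    h = x
    for name, fn in layers:
        h = fn(h)
        if name in watch:
            recorded[name] = list(h)  # store a copy, not a reference
    return h, recorded


# Hypothetical toy layers standing in for a real CNN.
def affine(v: Vector) -> Vector:
    return [2.0 * a - 1.0 for a in v]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def dense_sum(v: Vector) -> Vector:
    return [sum(v)]


toy_model: List[Layer] = [("conv1", affine), ("act1", relu), ("fc1", dense_sum)]
output, traces = record_traces(toy_model, [0.2, 0.9, 0.5], watch=["act1", "fc1"])
print(output, traces)
```

Because recording only copies intermediate values, calling the same function with `watch=[]` yields the same prediction, which reflects the integrity requirement stated above.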

Trace detection

In the Trace Detection component, we assume the availability of a trained model to perform a digital forensics task, along with access to the dataset on which the model has been trained. The purpose of this component is to detect significant patterns or footprints within each layer, specifically for each class. This is accomplished using existing feature attribution methods. In this stage, the following main steps are performed for each class:

Step 1- Calculation and identification of important features for each layer: Using existing feature attribution methods, we identify the influential features in each layer for all training data, grouped by class. The appropriate method may vary with the task; here, SHAP GradientExplainer is used as one instantiation of the attribution operator because of its model-agnostic Shapley-value formulation12. Features with positive Shapley values are treated as important.
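Assume the per-sample attributions for a layer have already been computed (for instance with SHAP's `GradientExplainer`) and flattened to one value per feature. Step 1 then reduces to keeping the indices whose Shapley value is positive. The dictionary layout and helper names below are hypothetical.

```python
from typing import Dict, Sequence, Set


def important_features(attributions: Sequence[float]) -> Set[int]:
    """Indices of features whose attribution (Shapley value) is strictly positive."""
    return {i for i, value in enumerate(attributions) if value > 0}


def important_features_per_sample(
    layer_attributions: Dict[str, Sequence[float]],
) -> Dict[str, Set[int]]:
    """Apply the positive-value filter to every training sample of one class in one layer."""
    return {sid: important_features(vals) for sid, vals in layer_attributions.items()}


# Hypothetical flattened attributions for two training images of one class.
attributions = {
    "img_a": [0.12, -0.03, 0.40, 0.0],
    "img_b": [0.05, 0.07, 0.31, -0.20],
}
print(important_features_per_sample(attributions))
```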

Step 2- Identification of common important features (Trace): Because of factors such as model bias or errors in the feature attribution method, the results of Step 1 may vary across samples of the same class. For example, if class C has 1000 training samples, the important features found in Step 1 may not be consistent across all 1000. Since accurate identification of the Trace in each layer is crucial to the final results of this framework, the important features in each layer are filtered, and only those present in all, or a high percentage, of the training samples of that class are retained. If the number of Traces found is below a threshold (threshold1), the number of samples (y) of that class is reduced at a rate r (an explanatory hyperparameter controlling the strictness–efficiency trade-off; see Remark 1), and the search is repeated over the reduced set of y samples. This ensures that the selected features are shared among the revised y samples. The outcome of this step is the set of Traces for each class across all layers, derived from all or a high percentage of the training data. Figure 2a (based on Algorithm 1) shows the process of identifying Traces.
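Step 2 can be sketched as a frequency filter with an adaptive sample count. The text leaves two details open, so both are assumptions here. First, y is reduced by a fraction r of its current value, with a minimum step of one sample. Second, a feature counts as shared among y samples when at least y samples of the class mark it as important. `min_traces` stands in for threshold1, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Set, Tuple


def detect_trace(feature_sets: Sequence[Set[int]], min_traces: int,
                 r: float = 0.05) -> Tuple[Set[int], int]:
    """Trace of one class in one layer: features important in at least y samples.

    y starts at the number of samples (strict intersection). While fewer than
    `min_traces` features qualify, y shrinks by a fraction r of its current
    value, and always by at least one sample. Returns (trace, final_y).
    """
    n = len(feature_sets)
    if n == 0:
        return set(), 0
    # How many samples mark each feature as important.
    counts = Counter(f for s in feature_sets for f in s)
    y = n
    while True:
        trace = {f for f, c in counts.items() if c >= y}
        if len(trace) >= min_traces or y == 1:
            return trace, y
        y = max(1, y - max(1, math.floor(r * y)))


samples = [{0, 1, 2}, {0, 2, 3}, {0, 2}, {0, 2, 5}]
print(detect_trace(samples, min_traces=2))
print(detect_trace(samples, min_traces=3))
```

With a strict threshold, the loop relaxes y until enough shared features appear. The minimum step of one guarantees termination even when r·y rounds down to zero.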

Fig. 2.

Flowcharts of the main components in the proposed framework. (a) The Trace Detection component extracts Traces from all layers and classes (initially, y equals the number of training samples for class c). (b) The output of the Layers’ Majority Voting component is used to activate the extracted rules.

Step 3- Calculation of activation values at the Traces identified in Step 2: In this stage, the activation outputs at the identified Trace positions are computed for each training sample in all layers. These values are required by the next component, Majority Voting and Rule Extraction.
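Step 3 can be sketched under two assumptions. Flattened activations are available per layer and per training sample. Each class's Trace is read on that class's own training samples, which matches the voting description, where each class contributes one Trace value per training instance. The data layout and function name are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical layouts:
#   activations[layer][sample_id] -> flattened activation vector of a training sample
#   labels[sample_id]             -> class of that sample
#   traces[(layer, cls)]          -> Trace (feature indices) found in Step 2
TraceKey = Tuple[str, int]


def trace_activation_table(
    activations: Dict[str, Dict[str, Sequence[float]]],
    labels: Dict[str, int],
    traces: Dict[TraceKey, Set[int]],
) -> Dict[TraceKey, Dict[str, List[float]]]:
    """Activations of every training sample of class c at the Trace of (layer, c)."""
    table: Dict[TraceKey, Dict[str, List[float]]] = {}
    for (layer, cls), trace in traces.items():
        idx = sorted(trace)  # fixed order so vectors are comparable later
        table[(layer, cls)] = {
            sid: [act[i] for i in idx]
            for sid, act in activations[layer].items()
            if labels[sid] == cls
        }
    return table


acts = {"conv4": {"a": [0.1, 0.7, 0.0, 0.3],
                  "b": [0.2, 0.1, 0.9, 0.4],
                  "c": [0.5, 0.6, 0.0, 0.8]}}
labels = {"a": 1, "b": 2, "c": 1}
traces = {("conv4", 1): {1, 3}, ("conv4", 2): {2}}
print(trace_activation_table(acts, labels, traces))
```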

Algorithm 1.

Trace detection

Layers majority voting and rule extraction

The flowchart in Fig. 2b (based on Algorithm 2) illustrates our method for computing the layers’ majority vote for an input sample, using the Traces identified by Algorithm 1. After the Traces have been identified in the preceding phase, their activation values are extracted for the validation data. In every layer, the similarity between the Traces of each validation sample and the corresponding Traces of the training samples is assessed with two distance metrics: Normalized Euclidean Distance (NED) and Strict Euclidean Distance (SED). In the NED calculation, each Trace value of a validation sample is compared with all values of the corresponding Traces of the training samples across all classes, and the smallest distance is selected. Because each class has as many values for a given Trace as it has training instances, a Trace in a layer earns a score if it is similar to any of the corresponding training Traces in that layer.
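The NED branch of the vote can be sketched for a single layer. This section does not give the NED formula, so the sketch assumes Euclidean distance scaled by the square root of the Trace length. It also assumes the layer votes for the class whose nearest training Trace is closest to the sample. The SED branch and any similarity cut-off are omitted, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set


def ned(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed NED: Euclidean distance divided by sqrt(dimension), i.e. the RMS difference."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def layer_vote(sample_act: Sequence[float],
               class_traces: Dict[int, Set[int]],
               train_values: Dict[int, List[List[float]]]) -> Optional[int]:
    """Class whose Trace, read on the sample, is nearest to any training Trace of that class."""
    best_cls: Optional[int] = None
    best_d = math.inf
    for cls, trace in class_traces.items():
        refs = train_values.get(cls, [])
        if not trace or not refs:
            continue  # nothing to compare for this class in this layer
        x = [sample_act[i] for i in sorted(trace)]
        d = min(ned(x, ref) for ref in refs)  # smallest distance over training samples
        if d < best_d:
            best_cls, best_d = cls, d
    return best_cls


sample = [0.1, 0.7, 0.9, 0.3]
traces = {1: {1, 3}, 2: {2}}
train = {1: [[0.7, 0.3], [0.6, 0.8]], 2: [[0.2]]}
print(layer_vote(sample, traces, train))
```

Repeating this for every layer yields one vote per layer, which a plurality count then aggregates into the layers’ majority vote.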

Algorithm 2.

Layer-based majority voting

Two rules (Table 3) are extracted from four quantities: the maximum number of votes and the majority-vote decision, both produced by the Layers’ Majority Voting (Algorithm 2); a threshold set according to the number of layers in the model; and the prediction of the base model (P). The rules produce one of two outcomes: either validation of the model’s decision, or flagging of the model’s behavior as anomalous, which indicates a potential model error. The threshold is adjusted according to the number of layers in the target deep neural network (DNN) model. In this framework, it is defined as an explanatory hyperparameter that scales with model depth to enforce a minimum cross-layer consensus, as specified in Table 2. To ensure the effectiveness of the extracted rules, they are evaluated on a separate test dataset.

Table 3.

Rule-based prediction derived from Layers’ Majority Voting outputs. The table shows how decisions are made using the maximum number of votes, the majority-vote decision, and the model prediction P. The threshold parameter is determined based on the number of layers in the target CNN model.

Rule Conditions Decision
R1 Inline graphic Confirmation
R2 Inline graphic Abnormal behavior
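Table 3 gives the exact preconditions of R1 and R2. The sketch below assumes one plausible form of them. R1 fires when P is among the majority-vote decisions and the maximum vote count reaches the depth-dependent threshold; R2 fires otherwise. Allowing P to appear among several tied decisions is an assumption, motivated by the later observation that some undetected errors had more than one majority decision. The function and constant names are hypothetical.

```python
from typing import Sequence

CONFIRMATION = "Confirmation"
ABNORMAL = "Abnormal behavior"


def apply_rules(decisions: Sequence[int], max_votes: int,
                prediction: int, theta: int) -> str:
    """Assumed form of R1/R2 (not the paper's exact preconditions).

    R1: the base-model prediction P is among the majority-vote decisions and
        the winning vote count reaches the depth-dependent threshold theta.
    R2: otherwise, the prediction is flagged as abnormal behavior.
    """
    if prediction in decisions and max_votes >= theta:
        return CONFIRMATION
    return ABNORMAL


# Hypothetical vote counts; theta = 4 as used for the 7-layer SCI model.
print(apply_rules([4], max_votes=10, prediction=4, theta=4))
print(apply_rules([1], max_votes=5, prediction=14, theta=4))
```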

xDFAI

In this step, the test sample is evaluated against the extracted rules. As in the model’s own inference, the activations of the test sample at the Traces are obtained in each layer. Running Algorithm 2 on the test sample instantiates the variables in the rules’ preconditions, and exactly one rule fires, either confirming the model’s prediction or flagging abnormal behavior. The results can also be displayed visually to the user.

Case study implementation: an explainable CNN-based SCI

To implement the proposed framework, we focus on the Source Camera Identification (SCI) task in image forensics, a crucial task that existing explainability methods cannot address effectively. SCI relies on identifying camera-specific noise patterns such as Photo Response Non-Uniformity (PRNU). PRNU is a pattern inherent to each camera sensor, caused by slight variations in the manufacturing process; it serves as a fingerprint that allows the source camera of an image to be identified. Because these noise patterns are neither visually distinguishable nor meaningful to the human eye, current explainability methods are inadequate for this task and do not reliably assist camera identification. The target model for the SCI task is a 7-layer convolutional neural network13 consisting of four convolutional layers and three fully connected layers. The model achieved an accuracy of 97.26% on a test set of 1350 images captured by 27 camera models from 11 brands (VISION dataset)14.

To apply the proposed xDFAI framework, the SHAP library’s GradientExplainer12 is used to detect Traces in each layer. For each class and layer, we intersect the indices of the features with positive Shapley values across all training samples of that class; these common important features form the class’s Traces in that layer. If the number of Traces discovered is below a threshold (threshold2), set at 10% of the important features, we search for important features shared among a reduced number of training samples (y). In this implementation, the reduction rate r was fixed at 5% to operationalize the trace selection procedure. As discussed in Remark 1, r is an explanatory hyperparameter that controls the strictness–efficiency trade-off and does not affect the predictive function of the base model.

Figure 3 compares the important features identified by Grad-CAM15, Integrated Gradients16, and SHAP GradientExplainer in the fourth convolutional layer with our identified Traces, across four images captured by a Huawei Ascend G6-U10 for which the model predictions were correct. The Traces identified by our framework are remarkably similar across all four images, unlike the outputs of the other three methods. Although SHAP appears to perform well at first glance, closer examination reveals discrepancies across the four images, which make its output difficult for a human user to interpret.

Fig. 3.

Comparison between the proposed framework and state-of-the-art explainability methods. Grad-CAM, Integrated Gradients, and GradientExplainer outputs in the fourth convolutional layer are compared with our identified Traces across four images captured by the Huawei Ascend G6-U10 (VISION dataset) with correct model predictions13. Grad-CAM and Integrated Gradients: lighter colors indicate higher importance. GradientExplainer: red and blue represent feature values that contribute positively and negatively, respectively, to the model’s prediction. Our framework: the identified Traces are displayed as distinct squares, shaded darker for higher activation values. Inline graphic has been utilized in Grad-CAM, Integrated Gradients, and in our output as well.

Additionally, the proposed framework not only provides a clearer visual representation but also informs the examiner of the class to which the Trace in the examined layer belongs. After the Traces are extracted and the layers’ majority votes are computed on the validation dataset, the threshold in Table 3 is set to 4 for the 7-layer CNN-based model. Figure 4 shows two examples of rule-based explanations extracted from the Layers’ Majority Voting. Figure 4a shows a correct model prediction confirmed by the extracted rules. Fig. 4a(1) displays the classes that received the most votes in both the NED and SED scenarios (their intersection). Notably, three of the four identified classes belong to the Apple brand, and all layers identified the trace of this brand. The main result of our Layers’ Majority Voting (with separate NED and SED decisions) is shown in Fig. 4a(2), which identifies the winning class(es). Under NED, class 4 wins with 7 votes; under SED, classes 4 and 5 tie with 3 votes each. The total score for class 4 is therefore 10, so the rules select it, in agreement with the base model’s prediction.
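The score arithmetic in the Fig. 4a example can be checked by summing each class’s NED and SED votes. Treating the combined score as a plain sum is our reading of the example (7 + 3 = 10 for class 4), and `combine_votes` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def combine_votes(ned_votes: Dict[int, int],
                  sed_votes: Dict[int, int]) -> Tuple[List[int], int]:
    """Sum per-class votes from the NED and SED scenarios; return (winners, total)."""
    total = Counter(ned_votes)
    total.update(sed_votes)  # Counter.update adds counts rather than replacing them
    if not total:
        return [], 0
    best = max(total.values())
    return sorted(c for c, v in total.items() if v == best), best


# Vote counts stated for the Fig. 4a example.
print(combine_votes({4: 7}, {4: 3, 5: 3}))
```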

Fig. 4.

Examples of rule-based explanations extracted using Layers’ Majority Voting outputs. (a) The model’s accurate prediction for the iPhone 5c (class 4) has been confirmed. (b) The sample captured by the Samsung Galaxy S III Mini GT-I8190N (class 1) received an incorrect model prediction, which the framework identified.

Figure 4b illustrates an incorrect prediction by the base model that the framework flagged as an anomaly. As Fig. 4b(1) shows, the classes common to NED and SED across the layers vary considerably in brand, with five different brands identified. In Fig. 4b(2), the winning class of our Layers’ Majority Voting (Class 1) does not agree with the decision of the base model. Two points stand out in this experiment: 1) the base model incorrectly predicted Class 14, whereas our framework did not detect the Trace of this class in any layer; and 2) as Fig. 4b(2) shows, our framework correctly detected the trace of Class 1 in the Layers’ Majority Voting decision.

Results

To answer RQ2, we examine the accuracy and precision obtained by applying the proposed framework to the studied model13. In this case study, the CNN-based model was tested on 1350 unseen samples. Applying the proposed rules to the same 1350 test images yielded 1276 true positives (TP), 10 false positives (FP), 27 true negatives (TN), and 37 false negatives (FN). Table 4 compares the performance of the base model and the explained model. The 27 TN cases are samples exhibiting non-systematic behavior, to which no class is assigned. All 27 were misclassified by the base model, and the proposed rules present these model errors to the user as “Abnormal behavior.” Conversely, there were 37 FN cases in which the base model classified the sample correctly but the rules did not assign the final class. Of the 10 model errors that our method did not detect, the layers’ majority voting output contained more than one decision in 6 cases, and in 4 cases the correct camera model was among the decisions.

Table 4.

Performance comparison between the base model and the model enhanced with our explainable framework on the 1,350 test samples. By identifying incorrect decisions, the framework reduces false positives and increases model precision.

Decision type | True positive | False positive | Error detection | Precision (%) | Accuracy (%) | Recall (%) | F1 score (%)
Original model | 1313 | 37 | - | 97.33 | 97.26 | 97.26 | 97.34
Our xDFAI framework | 1276 | 10 | 27/37 | 99.22 | 96.52 | 97.18 | 98.19

In digital forensics, precision matters as much as accuracy. In this experimental study, incorporating the rules increased precision substantially, from 97.33% to 99.2%. Moreover, despite the 37 FN cases, overall accuracy decreased only slightly, from 97.26% to 96.5%. This shows that flagging the 27 base-model errors as abnormal behavior did not substantially reduce model accuracy.
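The xDFAI row of Table 4 follows directly from the confusion counts reported above. This sketch uses the definitions of this section: TN counts base-model errors flagged as abnormal behavior, and FN counts correct predictions that the rules did not confirm. `rule_metrics` is an illustrative helper. It reproduces only the xDFAI row, not the base model’s precision figure.

```python
from typing import Dict


def rule_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, accuracy, recall and F1 from confusion counts, as fractions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "accuracy": accuracy,
            "recall": recall, "f1": f1}


m = rule_metrics(tp=1276, fp=10, tn=27, fn=37)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Rounded to two decimals, the results are 99.22%, 96.52%, 97.18%, and 98.19%, which match the table.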

Features of the proposed framework (RQ2)

In response to RQ2, this section examines the features of the proposed framework in detail and their impact on digital forensics.

Decision confirmation or error discovery in model predictions

By examining the Traces across the model’s layers, the proposed framework can confirm or refute the base model’s decisions and foster trust in its outputs. Applying the layers’ majority voting to the Traces identified in each layer, and then the rules derived from the voting results, helps identify some of the model’s misclassifications. These instances are categorized as “abnormal behavior.” When the model makes an incorrect decision, the majority voting does not reach a consensus on a particular class: different layers identify different classes, or they converge on a class that differs from the base model’s decision and receives few votes. Consequently, the model’s decision cannot be confirmed.

As Table 4 shows, identifying model errors and removing 27 of them from the positive predictions, which reduces false positives (FP) from 37 to 10, raised precision notably, from 97.33% to 99.2%. This improvement is significant for digital forensics because it increases the reliability of decision-making results. Despite the 37 FN cases, the accuracy of the proposed framework decreases only slightly compared to the base model (96.52% versus 97.26%).

Advancing explainability in challenging scenarios

Our framework addresses a limitation of existing explainability methods. Although current XAI methods explain certain types of models and tasks well, they may struggle to offer meaningful insights in complex scenarios, particularly image analysis tasks such as source identification, which require an understanding of the source fingerprint that exceeds the capabilities of current explainability techniques. As illustrated in Fig. 3, the Grad-CAM, Integrated Gradients, and SHAP GradientExplainer outputs are not consistent across images taken with the same camera model: despite some commonalities, they exhibit significant differences that do not effectively help investigators explain camera identification decisions.

No need for model annotation or retraining

Our framework offers a unique advantage by eliminating the need for model annotation and retraining. Although model annotation serves as a practical approach to achieving explainability, it does have certain drawbacks. One significant drawback is the introduction of subjectivity and potential bias by human annotators. Different annotators may interpret the data differently, leading to variations in the generated explanations and undermining the objectivity and reliability of the model’s output. In addition, model annotation can be time-consuming, costly, and resource-intensive, especially when dealing with large datasets or complex data. This can result in longer project timelines and increased expenses17.

Scalability and generalization also pose challenges in model annotation. It may not be practical to annotate every possible scenario or variation, limiting the model’s ability to generalize to unseen data. Ensuring annotation quality and consistency is another concern, often requiring additional resources and quality control measures. Ethical considerations, such as privacy protection and fair representation, must also be carefully addressed during the annotation process. Furthermore, annotations may not fully capture the complexity of the decision-making process of the model, which can limit explainability18.

Extracting rules from deep models without changing the model (preserving integrity principle)

A central concern in digital forensics is preventing any change to evidence. An artificial intelligence model used as an analysis tool is not exempt from this rule and must be protected from any change while it is used in digital crime investigation. By extracting rules from deep models, our proposed framework makes the model explainable without changing the structure of the target model.

Discussion on limitations and future work

This section discusses the constraints and limitations of the proposed framework, which point to areas that may require further refinement, and outlines directions for future research that could improve the framework’s capabilities and extend its applicability to other domains. The following limitations apply to the proposed framework:

  • Access to the training dataset is necessary.

  • Performing Trace identification operations on the training dataset and across all layers is time-consuming, although this task is only executed once during the framework implementation.

  • Since trace initialization relies on attribution operators (e.g., SHAP GradientExplainer), attribution variability may affect individual saliency maps; however, xDFAI is designed to attenuate bounded attribution noise through aggregation and filtering (Assumption 1 and Proposition 2).

The undetected error cases primarily correspond to scenarios in which layer-wise votes exhibit partial consensus or ambiguity, indicating inherent uncertainty in the base model rather than failure of the explanatory mechanism.

To enhance the proposed framework and facilitate the dependable application of deep learning models in the field of digital forensics, we are focusing on the following future endeavors:

  • We intend to improve the Trace identification solution to increase the detection rate for baseline model errors, with the goal of detecting all of them, and to eliminate false negatives (FNs) in our framework.

  • Furthermore, we believe that thoroughly analyzing the Traces identified for individual samples could reveal the reasons behind the baseline model’s incorrect predictions. Such analysis may, in turn, help train a more accurate model.

Conclusion

Gaining acceptance for the explainability of AI models used in digital forensics faces various challenges. Crucial aspects include the comprehensibility and practicality of existing XAI outputs for DF experts and adherence to the governing principles of digital forensics. One potent approach to increasing the transparency of AI models is explainability through rule extraction. Implementing this approach in CNN models has proven challenging, because it requires modifying the model and therefore compromises the principle of integrity in digital forensics6. Moreover, some AI-based image forensics tasks, such as image source identification in child abuse investigations, cannot be adequately explained by existing methods. This study proposed a framework that, through three fundamental steps (layer trace detection, layers majority voting, and rule extraction), provides visual and rule-based explainability while maintaining integrity. Applying the proposed framework to a CNN-based source camera identification model not only makes it possible to confirm or reject the baseline model’s predictions but also increases the model’s precision. This helps examiners better understand the reasoning behind the model’s decisions and increases their confidence in the reliability of the classification process. The proposed framework is structurally applicable to other CNN-based forensic attribution settings; however, empirical validation beyond SCI is outside the scope of this study. Overall, this work contributes to the advancement of explainable and reliable digital forensics practices.

Acknowledgements

Figures in this manuscript were drawn or modified by T.N. using draw.io (https://app.diagrams.net)

Author contributions

T.N. implemented the proposed idea, extracted the results, and prepared the first draft. H.A. revised the paper and supervised the whole process. H.A. and A.GhB. helped formulate the basic idea.

Data availability

The source code is available on GitHub.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Nayerifard, T., Amintoosi, H., Gaemi Bafghi, A. & Dehghantanha, A. Machine learning in digital forensics: A systematic literature review. 10.48550/arXiv.2306.04965 (2023).
  • 2. Hall, S. W., Sakzad, A. & Choo, K. R. Explainable artificial intelligence for digital forensics. WIREs Forensic Sci. 4, 1–11. 10.1002/wfs2.1434 (2022).
  • 3. Fähndrich, J. et al. Digital forensics and strong AI: A structured literature review. Forensic Sci. Int. Digit. Investig. 10.1016/j.fsidi.2023.301617 (2023).
  • 4. Solanke, A. A. Explainable digital forensics AI: Towards mitigating distrust in AI-based digital forensics analysis using interpretable models. Forensic Sci. Int. Digit. Investig. 10.1016/j.fsidi.2022.301403 (2022).
  • 5. Ali, S. et al. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 10.1016/j.inffus.2023.101805 (2023).
  • 6. IEEE Standards Association. Guide for an Architectural Framework for Explainable Artificial Intelligence. IEEE Standard 2894. 10.1109/IEEESTD.2024.10659410 (2024).
  • 7. Hall, S. W., Sakzad, A. & Minagar, S. A proof of concept implementation of explainable artificial intelligence (XAI) in digital forensics. In Network and System Security Conference. 10.1007/978-3-031-23020-2_4 (2022).
  • 8. Lo, W., Kulatilleke, G., Sarhan, M., Layeghy, S. & Portmann, M. XG-BoT: An explainable deep graph neural network for botnet detection and forensics. Internet Things 10.1016/j.iot.2023.100747 (2023).
  • 9. Henrique, S. et al. Deepfake forensics analysis: An explainable hierarchical ensemble of weakly supervised models. Forensic Sci. Int. Synerg. 10.1016/j.fsisyn.2022.100217 (2022).
  • 10. Ge, W. et al. MetaCluster: A universal interpretable classification framework for cybersecurity. IEEE Trans. Inf. Forensics Security 19, 3829–3843. 10.1109/TIFS.2024.3372808 (2024).
  • 11. Floreale, G. et al. Automated processing of eXplainable Artificial Intelligence outputs in deep learning models for fault diagnostics of large infrastructures. Eng. Appl. Artif. Intell. 149, 110518 (2025).
  • 12. Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. Preprint at 10.48550/arXiv.1705.07874 (2017).
  • 13. Nayerifard, T., Amintoosi, H. & Gaemi Bafghi, A. A robust PRNU-based source camera attribution with convolutional neural networks. J. Supercomput. 10.1007/s11227-024-06579-8 (2025).
  • 14. Shullani, D., Fontani, M., Iuliani, M., Al Shaya, O. & Piva, A. VISION: A video and image dataset for source identification. EURASIP J. Inf. Secur. 10.1186/s13635-017-0067-2 (2017).
  • 15. Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359. 10.1109/ICCV.2017.74 (2020).
  • 16. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In 34th International Conference on Machine Learning, ICML 7, 5109–5118. 10.48550/arXiv.1703.01365 (2017).
  • 17. Ghai, B., Liao, Q. V., Zhang, Y., Bellamy, R. & Mueller, K. Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers. Proc. ACM Hum.-Comput. Interact. 4, 1–28. 10.1145/3432934 (2020).
  • 18. Rasmussen, C. B., Kirk, K. & Moeslund, T. B. The challenge of data annotation in deep learning—A case study on whole plant corn silage. Sensors 10.3390/s22041596 (2022).
