Frontiers in Artificial Intelligence. 2026 Feb 13;9:1743495. doi: 10.3389/frai.2026.1743495

Tabular diffusion counterfactual explanations

Wei Zhang 1,*, Brian Barr 2, John Paisley 1

Abstract

Counterfactual explanation methods provide an important tool in the field of interpretable machine learning. Recent advances in this direction have focused on diffusion models to explain a deep classifier. However, these techniques have predominantly focused on problems in computer vision. In this study, we focus on tabular data typical in finance and the social sciences and propose a novel guided reverse process for categorical features based on an approximation to the Gumbel-softmax distribution. Furthermore, we study the effect of the temperature τ and derive a theoretical bound between the Gumbel-softmax distribution and our proposed approximate distribution. We perform experiments on several large-scale credit lending and other tabular datasets, assessing performance in terms of the quantitative measures of interpretability, diversity, instability, and validity. These results indicate that our approach outperforms popular baseline methods, producing robust and realistic counterfactual explanations.

Keywords: controllable diffusion models, counterfactual generation, heterogeneous data, discrete diffusion models, explainable machine learning

1. Introduction

Deep neural networks have revolutionized many fields, perhaps most notably computer vision and natural language processing. Despite their extraordinary performance, deep models often lack explainability, which prevents them from being widely adopted in regulated fields such as Fintech. Practitioners in those fields are interested not only in the decisions given by a black-box model, but also in the reasons behind those decisions. This is necessary for transparency about the factors impacting a decision and for explaining alternatives that may produce different outcomes.

Many methods have been developed to improve the transparency of black-box models. A number of works generate feature importance based on local approximations (Ribeiro et al., 2016), global approximations (Ibrahim et al., 2019), gradient attributions (Shrikumar et al., 2017; Sundararajan et al., 2017), and SHAP values (Lundberg and Lee, 2017). Other methods focus on interventions in causal regimes (Liu et al., 2024, 2018) and the construction of additive models using neural networks (Agarwal et al., 2021; Radenovic et al., 2022; Chang et al., 2022; Zhang et al., 2024).

In this study, we focus on tabular counterfactual explanations (CEs), a post-hoc method that answers the question “What changes can be made to an input so that its output label changes to the target class?” Counterfactual explanations are often preferable to methods such as LIME or gradient-based attributions because they provide actionable and intuitive insights that align with how people naturally reason about decisions. Rather than assigning post-hoc importance scores to features, counterfactuals answer a concrete “what-if” question: what minimal change to the input would lead to a different outcome? This makes them especially suitable for high-stakes settings where explanations must support decision-making. By directly reflecting the model's decision boundary, counterfactual explanations are both faithful to the model and more interpretable for end users.

CEs aim to explain a classifier $f:\mathbb{R}^d \to \{0, 1\}$ by generating a counterfactual sample $\hat{x}$ such that the predicted label is flipped with minimal changes to the input, as measured by a metric $d(\cdot, \cdot)$. This can be formulated as

$$\hat{x}^{\ast} = \arg\min_{\hat{x}} \; d(x, \hat{x}) \quad \text{subject to} \quad f(\hat{x}) = y_{\text{target}}. \tag{1}$$

Wachter et al. (2017) cast this framework into an optimization problem and directly back-propagate the gradients of the classifier and distance constraints into the feature space. This approach treats each feature as a continuous variable and thus does not directly apply to categorical features. Other methods explicitly deal with categorical features and generate counterfactual explanations using graphs (Poyiadzi et al., 2020), prototypes (Van Looveren and Klaise, 2021), multi-objective functions (Dandl et al., 2020), rule-based sets (Guidotti et al., 2018), point processes (Mothilal et al., 2020), and random forests (Fernández et al., 2020).
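To make Equation 1 concrete, the following minimal PyTorch sketch implements a Wachter-style search for continuous features; the penalized objective, the classifier interface `clf` (returning logits), and the hyperparameter values are illustrative assumptions rather than any specific published implementation.

```python
import torch
import torch.nn.functional as F

def wachter_counterfactual(clf, x, target=1, lam=0.1, steps=500, lr=0.01):
    """Gradient-based counterfactual search in feature space.

    Relaxes Equation 1 into a penalized objective,
        lam * ||x - x_cf||^2 + CE(clf(x_cf), target),
    for continuous features only; handling categorical features is
    exactly the gap this paper addresses.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    y = torch.tensor([target])
    for _ in range(steps):
        opt.zero_grad()
        loss = lam * (x_cf - x).pow(2).sum() + F.cross_entropy(clf(x_cf), y)
        loss.backward()
        opt.step()
        if clf(x_cf).argmax(dim=-1).item() == target:
            break  # stop once the predicted label flips
    return x_cf.detach()
```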

Deep generative models, such as variational autoencoders (VAEs) (Kingma, 2013) and generative adversarial networks (GANs) (Goodfellow et al., 2020), also play an important role in counterfactual generation. Compared with optimization-based approaches, generative methods are often preferred due to their ability to produce more plausible counterfactuals and to generate them efficiently at inference time. Joshi et al. (2019) and Antorán et al. (2020) propose methods that search in the latent space of a VAE to generate counterfactuals. Pawelczyk et al. (2020) focus explicitly on tabular data and use conditional VAEs as the generator for a target class. Methods in this line of work build on VAE architectures and rely on an efficient search algorithm. Guo et al. (2023b) and Sumiya and Shouno (2024) leverage the amortized cost at inference time to improve efficiency. Nemirovsky et al. (2022) deploy a GAN-based framework to learn the residual between an input and its counterfactual for vision data. Furthermore, it has been empirically shown that diffusion models can achieve better generative capacity and more stable training in fields such as vision (Ho et al., 2020; Song et al., 2020; Rombach et al., 2022), language (Nie et al., 2025), and tabular data (Panagiotou et al., 2024).

Like Wachter et al. (2017), we instead work directly in the feature space, but approach the problem from the perspective of diffusion modeling (Ho et al., 2020; Song et al., 2020). In the continuous image domain, Dhariwal and Nichol (2021) introduced classifier guidance on the reverse process for continuous features such as image data, while Augustin et al. (2022) built upon this framework with a counterfactual constraint on the reverse process. These methods generate high-fidelity counterfactual images for a vision classifier. Explainable diffusion models for categorical tabular data have received less consideration. While diffusion models have been extensively studied for categorical tabular data, e.g., Hoogeboom et al. (2021); Sun et al. (2022); Dieleman et al. (2022); Kotelnikov et al. (2023); Regol and Coates (2023), this line of work rarely intends to provide explanations for a classifier.

Two notable recent investigations in this area are Gruver et al. (2024) and Schiff et al. (2024). Gruver et al. (2024) focus on controllable discrete diffusion models in protein design by introducing a learnable mapping function that projects a discrete vector onto a continuous representation. The resulting representation is treated as a continuous variable and is diffused through the Gaussian distribution. Schiff et al. (2024) also focus on discrete data by treating a one-hot vector as continuous. Nevertheless, these works mainly target purely discrete data such as language text, and directly adopting them for classifier-based counterfactual explanations is not straightforward.

Despite the growing body of work on counterfactual explanations, existing approaches face several important limitations when applied to tabular data. Optimization-based methods often require solving a separate optimization problem for each input instance, which can be computationally expensive at inference time. Additionally, without strong constraints or priors, the resulting counterfactuals may deviate from the realistic data manifold. VAE-based approaches often require a well-trained latent space and an additional search procedure, which can introduce instability, mode collapse, or misalignment between latent perturbations and interpretable feature-level changes. More recent diffusion-based methods demonstrate strong generative performance for tabular and discrete data, but have largely focused on data synthesis rather than explanation, and do not directly integrate a classifier for tabular counterfactual explanations. As a result, there remains a gap for methods that can generate faithful counterfactual explanations for heterogeneous tabular data while operating directly in the original feature space and remaining tightly coupled to the decision boundary of the target classifier. To this end, we propose a novel tabular diffusion model for counterfactual explanations that leverages the Gumbel-softmax re-parameterization (Jang et al., 2017). Gumbel-softmax re-parameterization is widely used to handle categorical variables in a differentiable manner, enabling gradient-based backpropagation. However, integrating it into classifier-guided diffusion models remains challenging. Our contributions are threefold:

  1. Our method permits gradient backpropagation, and the resulting reverse process resembles the classifier guidance in the Gaussian case. It is easy to implement and efficient for counterfactual generation.

  2. We study the effect of the temperature τ in the Gumbel-softmax distribution on our model and derive a tight bound between the Gumbel-softmax distribution and an introduced approximation. Our proposed approximation matches the base model better as the temperature decreases.

  3. We experiment on four large-scale tabular datasets. The results demonstrate that our method achieves competitive performance on popular metrics used to evaluate counterfactuals within the field.

2. Related work

To situate our method within technological developments, we first review machine learning works related to counterfactual explanations and recent advances in controllable diffusion models, highlighting some of the key differences and shortcomings our method seeks to address for tabular data.

2.1. Counterfactual explanations

Following Equation 1, researchers have leveraged the auto-encoder architecture to construct counterfactual explainers. Joshi et al. (2019) take a learned auto-encoder and aim to find the latent vector of the counterfactual sample by back-propagating gradients from the classifier into the latent space; the distance constraint is applied in the feature space. Antorán et al. (2020) take a similar approach but use Bayesian neural networks to estimate the uncertainty of the generated counterfactual. Pawelczyk et al. (2020) also work in the latent space but explicitly tackle a set of immutable features. The authors use the conditional HVAE (Nazabal et al., 2020) and condition on the immutable features, while the search for the counterfactual is again carried out in the latent space with validity and minimum-change constraints in the feature space. Panagiotou et al. (2024) rely on the VAE framework but leverage a transformer to encode and decode input samples. These methods work in the latent space, and once the latent vector is found, the counterfactual sample is generated from the pretrained decoder. The search phase is often computationally expensive, as we illustrate in the experiments section. In addition, VAE-based methods often generate counterfactual samples through a black-box decoder, which might introduce another layer of uncertainty.

To mitigate the search task, Guo et al. (2023a,b); Zhang et al. (2022) train the classifier and counterfactual generator simultaneously by supervising the latent space. Counterfactual samples can be generated by linear mapping (Zhang et al., 2022) or non-linear mapping (Guo et al., 2023a,b) in the latent space, which effectively reduces the computational cost. Sumiya and Shouno (2024) replace the counterfactual generator with an invertible flow model with validity and proximity constraints. Although efficient, such explainers are model-dependent, and the uncertainty of the decoder remains. In contrast, our approach is model-agnostic, requiring only differentiability, and operates directly in the feature space.

On the other hand, Wachter et al. (2017) work directly in the feature space. Their method back-propagates the gradients that lead to the target class label with minimum changes in the feature space. Through this back-propagation, it becomes easy to handle immutable features by simply masking the corresponding gradients. In a similar vein, Sanderson et al. (2025) enhance the search algorithm by including additional losses such as sparsity, proximity, plausibility, and diversity. Tsiourvas et al. (2024) solve a mixed-integer optimization by searching over live polytopes to reduce the computational cost; however, that study only applies to ReLU-based models, whereas our model has no such constraint. Duell et al. (2024) include only the proximity constraint but introduce an uncertainty minimizer to reduce the randomness of the counterfactual path. Though effective, it is still hard to handle categorical features in this setting. In addition, it has been shown that a single pixel can fool a well-trained classifier (Su et al., 2019). Thus, although the resulting counterfactual sample might be valid (i.e., its label changed), it may not provide meaningful human-actionable information or insight into the learned deep neural network.

Another method called FACE (Poyiadzi et al., 2020) also works in the feature space. Here, a graph is first constructed based on the existing dataset. A graph search algorithm is then performed until it finds the counterfactual sample with the target label and minimum changes. If immutable features are present, a sub-graph is selected from the original graph. Though intuitive, the counterfactual samples are only selected from the existing dataset, which limits the diversity of the generated samples. Depending on the size of the dataset, the proposed method might also suffer from an instability issue. The computational cost is also high for a large dataset.

2.2. Guided diffusion models

Diffusion models have demonstrated considerable generative power for image generation (Song et al., 2020; Ho et al., 2020) and tabular data generation (Kotelnikov et al., 2023). However, counterfactual explanation through diffusion models is still rapidly developing. Guided diffusion models have been extensively studied (Dhariwal and Nichol, 2021) and extended (Augustin et al., 2022; Na and Lee, 2025) for counterfactual generation in the continuous domain. In this study, we employ these developments for tabular data, but challenges remain in extending them to categorical features. Guided diffusion works because it operates in continuous spaces, where gradients can be calculated; this is infeasible in discrete spaces. Madaan and Bedathur (2024) attempted to solve this issue by using a look-up dictionary as the encoder and decoder. Nevertheless, the look-up dictionary introduces additional learnable parameters and requires discretization, since the diffusion model operates in a continuous space. Galwaduge and Samarabandu (2025) involves training a classifier that can classify the noisy samples during the reverse process and primarily focuses on the intrusion detection task. Our method only requires a differentiable classifier that classifies an unperturbed sample.

Recently, Gruver et al. (2024) developed a controllable diffusion pipeline for protein generation, which is purely categorical data, using a continuous mapping function. Schiff et al. (2024) also worked on categorical language data by directly treating the categorical vector as if it were a continuous vector. Both of these recent works have demonstrated their efficacy on their respective tasks. While our study falls in this line of work, it is distinct in two ways: (1) we propose a new approach to handling categorical data in controllable diffusion models that requires minimal modification to existing tabular diffusion frameworks, and (2) we handle both continuous and categorical data simultaneously with the aim of explainable classification, whereas these two studies do not involve a classification problem.

3. Background on diffusion models

3.1. Tabular diffusion models

Diffusion models have been extensively studied recently as powerful generative models for high-fidelity images (Sohl-Dickstein et al., 2015; Ho et al., 2020; Nichol and Dhariwal, 2021; Song et al., 2020). A typical diffusion model consists of a forward and reverse Markov process. The forward process injects Gaussian noise into the input along a sequence of time steps, terminating at a prior, typically an isotropic Gaussian distribution. The Markovian assumption factorizes the forward process as $q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$. The reverse process aims to gradually denoise from the prior $x_T \sim q(x_T)$ to generate a new sample through $p(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p(x_{t-1} \mid x_t)$. Although the Gaussian forward process can be derived in closed form, the reverse process $p(x_{t-1} \mid x_t)$ is intractable and requires a neural network to approximate it. The parameters of the denoising neural network can be learned by maximizing the evidence lower bound,

$$\log q(x_0) \;\geq\; \mathbb{E}_{q}\Big[\underbrace{\log q_\theta(x_0 \mid x_1)}_{L_0} \;-\; \underbrace{\mathrm{KL}\big(q(x_T \mid x_0)\,\|\,q(x_T)\big)}_{L_T} \;-\; \sum_{t=2}^{T}\underbrace{\mathrm{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,q_\theta(x_{t-1} \mid x_t)\big)}_{L_t}\Big]. \tag{2}$$

The key distinction of tabular diffusion models is that there are two independent processes: Gaussian diffusion models for continuous features and Multinomial diffusion models for categorical features (Kotelnikov et al., 2023).

3.1.1. Continuous diffusions

Let $x_t \in \mathbb{R}^D$ and $\alpha_t = 1 - \beta_t$, where $t \in [1, T]$ is the time step. The forward process follows the distribution

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t \mid \sqrt{\alpha_t}\, x_{t-1},\; (1-\alpha_t) I\big). \tag{3}$$

Given $x_0$, the marginal distribution of $x_t$ for any $t$ is $q(x_t \mid x_0) = \mathcal{N}\big(x_t \mid \sqrt{\bar\alpha_t}\, x_0,\; (1-\bar\alpha_t) I\big)$, where $\bar\alpha_t = \prod_{i=1}^{t} \alpha_i$. This allows direct generation of the noisy $x_t$. The reverse process approximates the true posterior $q(x_{t-1} \mid x_t, x_0)$ with $q_\theta(x_{t-1} \mid x_t)$. By Bayes' rule, $q(x_{t-1} \mid x_t, x_0)$ can be computed in closed form and is Gaussian. Therefore, $q_\theta(x_{t-1} \mid x_t)$ is usually chosen to be a neural network-parameterized Gaussian, $q_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1} \mid \mu_\theta(x_t, t),\; \Sigma_\theta(x_t, t)\big)$. Alternatively, Ho et al. (2020) found that, instead of directly producing the mean of the posterior Gaussian distribution, more favorable results can be obtained by predicting the noise at each time step:

$$L_t = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big\|\epsilon_t - \epsilon_\theta(x_t, t)\big\|^2, \tag{4}$$

where ϵθ is a neural network. Once trained, the mean of the posterior can be derived as

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{1-\beta_t}}\Big(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\Big), \tag{5}$$

which gradually denoises $x_t$. Furthermore, Ho et al. (2020) construct the generative process using stochastic Langevin dynamics, which introduces randomness during the sampling process. We use the same dynamics, except for the final step, which produces the actual samples.
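As a minimal illustration of these two building blocks, the sketch below draws $x_t$ from the closed-form marginal and computes the denoising mean of Equation 5; the schedule tensors `beta` and `alpha_bar` and the noise predictor `eps_model` are assumed names for precomputed quantities and the trained U-net.

```python
import torch

def q_sample(x0, t, alpha_bar):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps

def denoising_mean(xt, t, eps_model, beta, alpha_bar):
    """Equation 5: mean of the reverse step from the predicted noise."""
    eps = eps_model(xt, t)
    return (xt - beta[t] / (1 - alpha_bar[t]).sqrt() * eps) / (1 - beta[t]).sqrt()
```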

3.1.2. Categorical diffusions

Multinomial diffusion models adapt the framework to handle categorical data (Hoogeboom et al., 2021). Let $x_t$ be a $K$-dimensional one-hot vector. The forward process now becomes

$$q(x_t \mid x_{t-1}) = \mathrm{Cat}\big(x_t \mid (1-\beta_t)\, x_{t-1} + \beta_t / K\big). \tag{6}$$

When $T$ is large enough, the resulting $x_T \sim \mathrm{Cat}(x_T \mid 1/K)$. Similar to the continuous case, $x_t$ can be computed as $q(x_t \mid x_0) = \mathrm{Cat}\big(x_t \mid \bar\alpha_t x_0 + (1-\bar\alpha_t)/K\big)$. The posterior of the reverse process can be derived using Bayes' rule,

$$q(x_{t-1} \mid x_t, x_0) = \mathrm{Cat}\Big(x_{t-1} \,\Big|\, \pi \Big/ \textstyle\sum_{i=1}^{K}\pi_i\Big), \tag{7}$$

where $\pi = \big[\alpha_t x_t + (1-\alpha_t)/K\big] \odot \big[\bar\alpha_{t-1} x_0 + (1-\bar\alpha_{t-1})/K\big]$, with $\odot$ denoting the element-wise product. The loss $L_t$ in the categorical case is the KL divergence $\mathrm{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)$, where the neural network outputs the predicted $\tilde{x}_0$ directly from the noisy input $x_t$.
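A sketch of the posterior computation in Equation 7, with `alpha` and `alpha_bar` as assumed schedule tensors and `x0_pred` the network's prediction of $\tilde{x}_0$:

```python
import torch

def multinomial_posterior(xt, x0_pred, t, alpha, alpha_bar, K):
    """Equation 7: posterior over a K-way one-hot variable.

    xt and x0_pred are (..., K) probability/one-hot vectors. The two
    bracketed factors of pi are combined element-wise, then normalized.
    """
    pi = (alpha[t] * xt + (1 - alpha[t]) / K) * \
         (alpha_bar[t - 1] * x0_pred + (1 - alpha_bar[t - 1]) / K)
    return pi / pi.sum(dim=-1, keepdim=True)
```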

3.2. Classifier guidance

Controllable reverse processes have been explored to generate class-dependent samples (Nichol and Dhariwal, 2021). In classifier-free guidance, the target class label $y$ is embedded into the denoising neural network, generating class-dependent predicted noise. In that setting there is no classifier to be explained or for which to generate counterfactuals, so these techniques are outside the scope of this study.

In classifier guidance methods, a differentiable classifier $p_\phi(y \mid x)$ is trained on the input space, and the guided reverse process is formulated as

$$p_{\theta,\phi}(x_t \mid x_{t+1}, y) = \frac{1}{Z}\, p_\theta(x_t \mid x_{t+1})\, p_\phi\big(y \mid f_{\mathrm{dn}}(x_t)\big), \tag{8}$$

where fdn reconstructs the noise-free sample. A first-order Taylor expansion around the mean μ gives the approximation

$$\frac{1}{Z}\, p_\theta(x_t \mid x_{t+1})\, p_\phi(y \mid x_t) \approx \mathcal{N}(\mu + \Sigma g,\; \Sigma), \tag{9}$$

where $g = \nabla_{x_t} \log p_\phi\big(y \mid f_{\mathrm{dn}}(x_t)\big)\big|_{x_t = \mu}$. We see that the reverse process uses gradient information from the target class in the generative process.
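A hedged sketch of one guided Gaussian reverse step under Equation 9; `classifier_logp(x, y)` stands in for $\log p_\phi(y \mid f_{\mathrm{dn}}(x))$ and `sigma` for a diagonal covariance, both assumed interfaces:

```python
import torch

def guided_gaussian_step(mu, sigma, y, classifier_logp):
    """Sample from N(mu + Sigma g, Sigma) as in Equation 9.

    The gradient g of the classifier log-probability is evaluated at
    x_t = mu, then used to shift the unguided mean.
    """
    x = mu.detach().requires_grad_(True)
    g, = torch.autograd.grad(classifier_logp(x, y).sum(), x)
    return mu + sigma * g + sigma.sqrt() * torch.randn_like(mu)
```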

However, in the categorical setting, a combinatorial challenge arises when calculating gradients, resulting in $O\big(\prod_i K_i\big)$ forward passes through the classifier, where $K_i$ is the number of options for the $i$-th categorical variable. This is infeasible when the number of categorical variables becomes large. This challenge motivates our use of the Gumbel-softmax re-parameterization below, which results in a reverse process similar to that of the continuous case.

4. Categorical tabular diffusions for counterfactual explanations

We propose a novel method to generate counterfactual explanations for any differentiable classifier, with particular interest in the categorical data scenario. We adopt the Gumbel-softmax re-parameterization (Jang et al., 2017) to provide a continuous representation of discrete data. This allows the model to leverage the gradients from the differentiable classifier for all the categorical variables and produce counterfactual information. The pipeline of our method is shown in Figure 1.

Figure 1.


The pipeline of Tabular Diffusion Counterfactual Explanations (TDCE). The categorical variables in the one-hot vector are first re-parameterized. Then, the q sampling generates the noisy version of the input sample. The denoising module runs T steps with the gradient from the classifier to generate the counterfactual sample.

4.1. Tabular counterfactual generation

We break a data point $x$ into its continuous and categorical portions, $x^{\mathrm{num}}$ and $x^{\mathrm{cat}}$, respectively.

4.1.1. Continuous features

Here, we follow the adaptive parameterization (Augustin et al., 2022) to implement the guided reverse process. The mean transition of Equation 9 now becomes

$$\mu_\theta(x_t^{\mathrm{num}}, t) + \Sigma_\theta(x_t^{\mathrm{num}}, t)\,\big\|\mu_\theta(x_t^{\mathrm{num}}, t)\big\|\, g_{\mathrm{guided}}, \tag{10}$$

where

$$g_{\mathrm{guided}} = \frac{\nabla \log p_\phi\big(y \mid f_{\mathrm{dn}}(x_t)\big)}{\big\|\nabla \log p_\phi\big(y \mid f_{\mathrm{dn}}(x_t)\big)\big\|} - \frac{\nabla d\big(x, f_{\mathrm{dn}}(x_t)\big)}{\big\|\nabla d\big(x, f_{\mathrm{dn}}(x_t)\big)\big\|}$$

comprises the normalized gradient from the classifier and the normalized gradient of the distance constraint.

Intuitively, each original denoising step is guided by the classifier's gradients multiplied by the covariance of the denoising step and the magnitude of the unguided mean vector. This process takes the classifier's impact into account and generates high-quality data not only around a dataset's manifold but also in the cluster of the target class. The proposed counterfactual changes should be minimal compared with the initial sample.
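The guided mean of Equation 10 can be sketched as follows; `f_dn` (the noise-free reconstruction) and `classifier_logp` are assumed interfaces, and the squared L2 distance plays the role of $d(\cdot,\cdot)$:

```python
import torch

def guided_mean_continuous(mu, sigma, x_query, xt, y, f_dn, classifier_logp):
    """Equation 10: shift the unguided mean by normalized gradients.

    Both the classifier gradient and the distance gradient are taken with
    respect to x_t through the reconstruction f_dn, then normalized so the
    covariance and ||mu|| set the step's magnitude.
    """
    x = xt.detach().requires_grad_(True)
    x0_hat = f_dn(x)
    g_cls, = torch.autograd.grad(classifier_logp(x0_hat, y).sum(), x,
                                 retain_graph=True)
    g_dist, = torch.autograd.grad((x_query - x0_hat).pow(2).sum(), x)
    g_guided = g_cls / g_cls.norm() - g_dist / g_dist.norm()
    return mu + sigma * mu.norm() * g_guided
```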

4.1.2. Categorical features

Working with Equation 8,

$$\log p_{\theta,\phi}(x_t \mid x_{t+1}, y) = \log p_\theta(x_t \mid x_{t+1}) + \log p_\phi\big(y \mid f_{\mathrm{dn}}(x_t)\big) - \log Z, \tag{11}$$

we observe that the adaptive parameterization approach cannot be straightforwardly applied because the gradient cannot be back-propagated into the discrete one-hot vector space. To guide the reverse process in discrete data scenarios, all combinations must be exhausted, which is infeasible and has motivated recent developments (Schiff et al., 2024).

In this study, we approach this problem through the Gumbel-softmax re-parameterization; instead of working in the discrete space, we propose to use the Gumbel-softmax vector to softly approximate the discrete data.

4.2. Relaxation of categorical variables

At each time step, a categorical variable is modeled as $x^{\mathrm{cat}} \sim \mathrm{Cat}(x^{\mathrm{cat}} \mid \bar\pi)$, where $\bar\pi \in \Delta^{K-1}$ is a normalized non-negative vector. A one-hot vector can be constructed as $x^{\mathrm{cat}} = \mathrm{onehot}\big(\arg\max_i (g_i + \log \bar\pi_i)\big)$, where $g_i \sim \mathrm{Gumbel}(0, 1)$. Following Jang et al. (2017), we re-parameterize this as

$$\tilde{x}_{i,t}^{\mathrm{cat}} = \frac{\exp\big(\frac{1}{\tau}(g_i + \log \bar\pi_{i,t})\big)}{\sum_{j=1}^{K}\exp\big(\frac{1}{\tau}(g_j + \log \bar\pi_{j,t})\big)} \tag{12}$$

at each time step of the reverse process, where $\tau > 0$ is the temperature. As $\tau \to 0$, $\tilde{x}_t^{\mathrm{cat}}$ reduces to a one-hot vector. Using this continuous transformation, the $\log p_\theta(x_t \mid x_{t+1})$ term in Equation 11 can be modeled with a Gumbel-softmax vector. The density of the Gumbel-softmax (GS) distribution (Jang et al., 2017; Maddison et al., 2016) is

$$p_{\mathrm{GS}}(\tilde{x}_{1:K} \mid \bar\pi, \tau) = \Gamma(K)\, \tau^{K-1} \Big(\sum_{i=1}^{K} \frac{\bar\pi_i}{\tilde{x}_i^{\tau}}\Big)^{-K} \prod_{i=1}^{K} \frac{\bar\pi_i}{\tilde{x}_i^{\tau+1}}. \tag{13}$$
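For concreteness, a minimal sketch of the sampler in Equation 12 (PyTorch also provides an equivalent `torch.nn.functional.gumbel_softmax`; we write it out here to make the re-parameterization explicit):

```python
import torch

def gumbel_softmax_sample(log_pi, tau):
    """Equation 12: continuous relaxation of a K-way categorical draw.

    log_pi: (..., K) log-probabilities; tau > 0 is the temperature.
    As tau -> 0 the output approaches a one-hot vector.
    """
    u = torch.rand_like(log_pi).clamp_min(1e-20)
    g = -torch.log(-torch.log(u))  # Gumbel(0, 1) noise
    return torch.softmax((log_pi + g) / tau, dim=-1)
```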

Using Equation 12, we switch from the discrete one-hot representation to the continuous softmax representation. In the forward and backward process, the transitions are

$$q(\tilde{x}_t \mid \tilde{x}_{t-1}) = \mathrm{GS}\big(\tilde{x}_t \mid \bar\pi = (1-\beta_t)\,\tilde{x}_{t-1} + \beta_t/K\big),$$
$$q(\tilde{x}_{t-1} \mid \tilde{x}_t, \tilde{x}_0) = \mathrm{GS}\Big(\tilde{x}_{t-1} \,\Big|\, \bar\pi = \tilde\pi \Big/ \textstyle\sum_{i=1}^{K}\tilde\pi_i\Big), \tag{14}$$

where $\tilde\pi = \big[\alpha_t \tilde{x}_t + (1-\alpha_t)/K\big] \odot \big[\bar\alpha_{t-1} \tilde{x}_0 + (1-\bar\alpha_{t-1})/K\big]$. The final categorical sample can be obtained by $x^{\mathrm{cat}} = \mathrm{onehot}(\arg\max_i \tilde{x}_i)$. Equation 11 in the Gumbel-softmax space reflects this change straightforwardly,

$$\log p_{\theta,\phi}(\tilde{x}_t \mid \tilde{x}_{t+1}, y) = \log p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1}) + \log p_\phi\big(y \mid f_{\mathrm{dn}}(\tilde{x}_t)\big) + \text{const}. \tag{15}$$

The reverse process $p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1})$ is a parameterized neural network. A challenge arises when solving the guided process with the Gumbel-softmax distribution that is not faced by Gaussian diffusions, because the Gaussian model mathematically accommodates a first-order Taylor approximation of the classifier well. The Gumbel-softmax density in Equation 13 cannot be combined with a Taylor expansion in Equation 11 the way the Gaussian density is in Equation 9, which prevents the direct adoption of the Gumbel-softmax re-parameterization in classifier-guided Gaussian diffusion models. Therefore, for the Gumbel-softmax we approximate the log density $\log p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1})$ as

$$\log p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1}) \approx \tilde{x}_t^{\top} \log \bar\pi_\theta(\tilde{x}_{t+1}) + \text{const}. \tag{16}$$

We evaluate this approximation in Section 4.3 in terms of a KL divergence bound between the Gumbel-softmax distribution and Equation 16.

Next, we take a first-order Taylor expansion of the classifier around $\tilde{x}_{t+1}$, leading to

$$\log p_\phi(y \mid \tilde{x}_t) \approx (\tilde{x}_t - \tilde{x}_{t+1})^{\top} g_{\mathrm{cat}} + \text{const}, \tag{17}$$

where $g_{\mathrm{cat}} = \nabla_{\tilde{x}_t} \log p_\phi(y \mid \tilde{x}_t)\big|_{\tilde{x}_t = \tilde{x}_{t+1}}$. Substituting Equations 16 and 17 into Equation 15, the guided reverse process becomes

$$\log p_{\theta,\phi}(\tilde{x}_t \mid \tilde{x}_{t+1}, y) \approx \tilde{x}_t^{\top}\big(\log \bar\pi_\theta(\tilde{x}_{t+1}) + \lambda\, g_{\mathrm{cat}}\big) + \text{const}, \tag{18}$$

where λ is a regularization hyperparameter. The familiar expression that results has a similar interpretation to the continuous case. We illustrate the reverse process dynamics of our approach in Figure 2.

Figure 2.


A simulation of diffusions for the Gumbel-softmax vector over a single categorical variable with three classes. Each blue dot is a data point. Top: The reverse diffusion process for the Gumbel-softmax vector on a 3D simplex without classifier guidance. Bottom: The reverse diffusion process with classifier guidance.
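Putting Equations 17 and 18 together, one guided categorical reverse step can be sketched as below; `log_pi_theta` denotes the unguided logits $\log\bar\pi_\theta(\tilde{x}_{t+1})$ from the diffusion network, and `classifier_logp` is an assumed classifier interface:

```python
import torch

def guided_categorical_step(log_pi_theta, x_soft, y, lam, tau, classifier_logp):
    """Equation 18: shift the unguided logits by the classifier gradient.

    x_soft is the current relaxed sample x~_{t+1}; g_cat is the gradient
    from Equation 17, evaluated at x~_{t+1}. A new relaxed sample is then
    drawn from the guided Gumbel-softmax.
    """
    x = x_soft.detach().requires_grad_(True)
    g_cat, = torch.autograd.grad(classifier_logp(x, y).sum(), x)
    logits = log_pi_theta + lam * g_cat        # guided logits (Eq. 18)
    u = torch.rand_like(logits).clamp_min(1e-20)
    g = -torch.log(-torch.log(u))              # Gumbel(0, 1) noise
    return torch.softmax((logits + g) / tau, dim=-1)
```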

4.3. Closeness of the approximation

At each reverse time step, the transition $p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1})$ follows the Gumbel-softmax distribution $p_{\mathrm{GS}}(\tilde{x}_t \mid \tilde{x}_{t+1})$. We model its density as

$$p_\theta(\tilde{x}_t \mid \tilde{x}_{t+1}) = \frac{1}{Z(\tilde{x}_{t+1})} \prod_{i=1}^{K} \bar\pi_\theta(\tilde{x}_{t+1})_i^{\,\tilde{x}_{t,i}}, \tag{19}$$

where $Z(\tilde{x}_{t+1})$ is the normalizing constant and $\bar\pi_\theta(\cdot)$ is the probability estimator parameterized by the diffusion network.

Theorem 4.1. Let $\tilde{x}, \pi \in \Delta^{K-1}$ and the temperature $\tau \in \mathbb{R}^{+}$. Define $\tilde{x}_{\min}$ as the minimum value a coordinate of $\tilde{x}$ can take. The KL divergence between $p_{\mathrm{GS}}$ defined in Equation 13 and its approximation $p_\theta$ in Equation 19 is bounded as follows:

$$\mathrm{KL}(p_{\mathrm{GS}} \,\|\, p_\theta) < -K(\tau+1)\log(1-\tilde{x}_{\min}) + (K-1)\log\tau + (K-1)\log(1-\tilde{x}_{\min}) + \log\Gamma(K) + K\log\frac{1-\tilde{x}_{\min}}{(K-1)!},$$
$$\mathrm{KL}(p_{\mathrm{GS}} \,\|\, p_\theta) > K(\tau+1)\log\tilde{x}_{\min} + (K-1)\log\tau + (K-1)\log\tilde{x}_{\min} + \log\Gamma(K) + K\log\frac{1}{(K-1)!}.$$

Proof: See the Supplementary material.

An empirical example of the bound is shown in Figure 3. The benefits of our proposed approximation are:

Figure 3.


KL divergence between the Gumbel-softmax distribution and our approximation on simulated data as a function of the temperature τ. The KL divergence increases as τ grows, as do the bounds.

  1. It allows us to use the first-order Taylor expansion, resulting in a closed-form update at each time step of the reverse process. This update steers the unguided logits with the gradient of the classifier toward the target class, which is intuitive and similar to the Gaussian case.

  2. The Taylor expansion mirrors the one applied in the Gaussian case, so guiding the categorical variables requires no additional step. The gradient can be calculated concurrently with the Gaussian case over any continuous variables in the data, which significantly reduces the computational complexity.

Although a lower temperature leads to a better approximation, it also introduces a larger variance and may result in vanishing gradients. As $\tau \to 0$, the soft representation approaches a one-hot vector, which may prevent the backward flow of gradients through the softmax function. A lower temperature can also introduce significant variance in the estimated gradients. To see this, let $Y_i = \mathrm{softmax}\big((z_i + g_i)/\tau\big)$, where $g_i$ is the Gumbel noise. The partial derivative $\partial Y_i / \partial z_j = \frac{1}{\tau} Y_i(\delta_{ij} - Y_j)$ is bounded above by $\frac{1}{4\tau}$, and so $\mathrm{Var}(\partial Y_i / \partial z_j) \leq \mathbb{E}\big[(\partial Y_i / \partial z_j)^2\big] < \frac{1}{16\tau^2}$. For the lower bound, an integral over the Gumbel noise is required, which is complicated. However, we know that $\partial Y_i / \partial z_j$ scales as $B/\tau$ for some constant $B$, meaning both bounds on the variance indicate that it becomes larger as the temperature decreases. In implementation, we therefore start with a warmer temperature and gradually decrease it to a small value away from zero.
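A simple warm-to-cool schedule consistent with this strategy is sketched below; the endpoint values are illustrative assumptions, since the text only prescribes starting warmer and decaying toward a small nonzero value.

```python
def anneal_tau(step, total_steps, tau_start=1.0, tau_end=0.3):
    """Exponential temperature decay from tau_start to tau_end over the
    reverse steps (endpoint values are assumptions)."""
    frac = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** frac
```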

4.4. Immutable features

Immutable features are those that are predefined as unchangeable by the source of a datum, e.g., a location. When generating counterfactuals, these cannot be changed. One simple approach is to define a binary mask $m$ indicating which features can change, and produce the counterfactual $x_t \odot m + x \odot (1-m)$. The main issue here is the so-called coherence problem (Avrahami et al., 2022): the resulting samples can fall outside the data manifold.

Motivated by the blended diffusions of vision tasks (Avrahami et al., 2022), we combine the noisy version of the immutable features from the input with the guided mutable features according to $x_{t,\mathrm{guided}} \odot m + x_{t,\mathrm{noisy}} \odot (1-m)$, where $x_{t,\mathrm{noisy}}$ is obtained from the forward process. At the final step, the immutable features are replaced by the original input. Our algorithm is shown in Algorithm 1.
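For the continuous features, the blending step can be sketched as follows, with `alpha_bar` again an assumed schedule tensor; a mask value of 1 marks a mutable feature:

```python
import torch

def blend_immutable(x_guided, x0, t, mask, alpha_bar):
    """Blended masking (Section 4.4): keep immutables coherent.

    The immutable block is replaced by a freshly noised copy of the
    original input at the same step t, so the combined sample stays
    consistent with the forward process rather than falling off-manifold.
    """
    noise = torch.randn_like(x0)
    x_noisy = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise
    return x_guided * mask + x_noisy * (1 - mask)
```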

Algorithm 1. Tabular diffusion counterfactual explanations

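Since Algorithm 1 is typeset as a figure, the sketch below outlines the generation loop as we read it from Sections 4.1–4.4; every callable and tensor name is an assumed interface, not the authors' code.

```python
import torch
import torch.nn.functional as F

def tdce_generate(x_num0, x_cat0, mask_num, mask_cat, T, alpha_bar,
                  num_step, cat_step, blend):
    """Guided generation loop for TDCE (illustrative sketch).

    num_step(x, t) / cat_step(x, t): one guided reverse step for the
    continuous block (Equation 10) and the Gumbel-softmax block
    (Equation 18); blend(x, x0, t, mask): immutable-feature blending.
    """
    # q-sampling: diffuse the query input toward the prior at step T (Figure 1).
    x_num = alpha_bar[T - 1].sqrt() * x_num0 \
        + (1 - alpha_bar[T - 1]).sqrt() * torch.randn_like(x_num0)
    x_cat = torch.softmax(torch.randn_like(x_cat0), dim=-1)  # near-uniform relaxed start
    for t in reversed(range(T)):
        x_num = blend(num_step(x_num, t), x_num0, t, mask_num)
        x_cat = blend(cat_step(x_cat, t), x_cat0, t, mask_cat)
    # Final step: restore immutables exactly and discretize the categoricals.
    x_num = x_num * mask_num + x_num0 * (1 - mask_num)
    x_cat = F.one_hot(x_cat.argmax(dim=-1), x_cat0.shape[-1]).float()
    return x_num, x_cat * mask_cat + x_cat0 * (1 - mask_cat)
```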

5. Experiments

We compare our method with other popular methods for generating counterfactual explanations. The classifier $f$ to be explained shares the same architecture as the U-net in the diffusion model, and the last layer outputs two-dimensional logits for binary classification. The U-net classifier is trained independently from our counterfactual generator. Our method is applicable to any differentiable black-box classifier where gradients are readily available with respect to the input. We refer to our method as Tabular Diffusion Counterfactual Explanation (TDCE). We use the same U-net architecture as described in Kotelnikov et al. (2023) for denoising samples and follow the same training procedure. Once trained, we follow Algorithm 1 to generate a counterfactual sample for any differentiable black-box classifier.

5.1. Datasets

We focus on tabular datasets, selecting popular public datasets that consist of both numerical and categorical features. A description of the data is shown in Table 1. Lending Club Dataset (LCD) and Give Me Some Credit (GMC) focus on credit lending decisions. “Adult” predicts binarized annual income based on a set of features. The LAW dataset predicts pass/fail on a law school test. For each dataset, we select 1,000 samples as the test data with an equal number of positive and negative samples. A balanced test dataset is preferred for fair comparisons under metrics such as the interpretability score, where autoencoders are trained on positive and negative samples. In all the experiments, we standardize each continuous feature and convert each categorical feature to a one-hot vector.

Table 1.

Statistics from the tabular data sets we use.

Dataset #Train #Val #Test #Num #Cat
LCD 10,000 1,000 1,000 5 1
GMC 15,000 1,000 1,000 9 1
Adult 47,842 1,000 1,000 9 2
LAW 5,502 1,000 1,000 8 3

5.2. Baselines and evaluation metrics

As baselines, we compare with seven methods for generating counterfactual explanations of a binary classifier. Wachter et al. (2017) present the most straightforward baseline: they generate counterfactuals by following the gradients of a classifier from the input $x$ to the decision boundary. Although it can generate a valid counterfactual sample with minimum distance, as we show in the next section, this simple and intuitive approach struggles to generate realistic counterfactuals. In addition, we note that its L2 distance increases dramatically as it searches for a more realistic counterfactual. We also benchmark against VAE-based methods designed to fix this, including the Counterfactual Conditional Heterogeneous Autoencoder (CCHVAE) (Pawelczyk et al., 2020), Realistic, Explainable, and Interpretable Searched Explanations (REVISE) (Joshi et al., 2019), and Counterfactual Latent Uncertainty Explanations (CLUE) (Antorán et al., 2020), as well as a graph-search method called Feasible and Actionable Counterfactual Explanations (FACE) (Poyiadzi et al., 2020), the neural-network-based method CounterNet (Guo et al., 2023b), and the flow-based method FastDCFlow (Sumiya and Shouno, 2024). We implement these benchmarks using the CARLA library (Pawelczyk et al., 2021) or the source code provided by the authors.

We evaluate these methods using several widely used metrics for counterfactual-based explainability: L2 distance, interpretability, diversity, validity, instability, and JS divergence. Among these, diversity, L2 distance, and instability are restricted to continuous features, while JS divergence applies only to categorical features. There is no single global metric that quantifies counterfactual performance, so the set of metrics described below combines to paint a subjective picture for evaluation.

5.2.1. L2 distance

Counterfactual samples should flip the label to the target class with minimal changes in the feature space. This is the key standard of counterfactual generation described in Equation 1 and can be quantified using the L2 distance,

$$L_2 = \frac{1}{N}\sum_{i=1}^{N}\big\|x_i - x_i^{\mathrm{cf}}\big\|_2^2. \tag{20}$$

Note that this metric only evaluates continuous features. For categorical features, we aim to recover the distribution of the categorical variable in the target class, as described below.

5.2.2. Interpretability

Van Looveren and Klaise (2021) use an autoencoder to evaluate the interpretability of a counterfactual method. Let AEo, AEt, and AE be three autoencoders trained on the original class, target class, and the entire dataset, respectively. The IM1 and IM2 scores are

$$\mathrm{IM1} = \frac{1}{N}\sum_{i=1}^{N} \frac{\big\|x_i^{\mathrm{cf}} - \mathrm{AE}_t(x_i^{\mathrm{cf}})\big\|^2}{\big\|x_i^{\mathrm{cf}} - \mathrm{AE}_o(x_i^{\mathrm{cf}})\big\|^2 + \epsilon},$$
$$\mathrm{IM2} = \frac{1}{N}\sum_{i=1}^{N} \frac{\big\|\mathrm{AE}_t(x_i^{\mathrm{cf}}) - \mathrm{AE}(x_i^{\mathrm{cf}})\big\|^2}{\big\|x_i^{\mathrm{cf}}\big\|_1 + \epsilon}, \tag{21}$$

where $x_i^{\mathrm{cf}}$ is the $i$th of $N$ counterfactuals. A lower value of IM1 indicates that the generated counterfactuals are reconstructed better by the autoencoder trained on the counterfactual class ($\mathrm{AE}_t$) than by the autoencoder trained on the original class. This suggests that the counterfactual is closer to the data manifold of the counterfactual class, and thus more plausible. A similar interpretation holds for IM2. Hence, lower values of IM1 and IM2 are preferred.

5.2.3. Diversity

Diversity provides additional performance information because low IM1 and IM2 may occur with counterfactuals that tend to collapse to a single point; not only should the counterfactual look like the counter-class, it should also preserve its variety. The diversity metric is calculated as

$$\mathrm{Diversity} = \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j=i+1}^{N} d\big(x_i^{\mathrm{cf}}, x_j^{\mathrm{cf}}\big), \tag{22}$$

where d(·, ·) is a predefined distance function. We use the Euclidean distance in this study.

5.2.4. Validity

This metric verifies that the generated counterfactual indeed lies in the counter-class region of the classifier to be explained. It is defined as

$$\mathrm{Validity} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\big(f(x_i^{\mathrm{cf}}) = y'\big), \tag{23}$$

where f(·) is the explained classifier and y′ is the target label. (Not all counterfactual methods generate counterfactuals that are guaranteed to change their label).

5.2.5. Instability

A stable counterfactual explainer should produce similar counterfactual outputs for two similar query inputs. Instability quantifies this as

$$\mathrm{Instability} = \frac{1}{N}\sum_{i=1}^{N} \frac{d\big(x_i^{\mathrm{cf}}, \hat{x}_i^{\mathrm{cf}}\big)}{1 + d(x_i, \hat{x}_i)},$$

where $\hat{x}_i = \arg\min_{x \in X \setminus x_i,\, f(x) = f(x_i)} \|x - x_i\|$ is the point within the dataset closest to $x_i$ that has the same label. A low instability is preferred.

5.2.6. JS divergence

We also evaluate how well the distribution of counterfactual categorical variables aligns with the distribution of the target class. We calculate the average JS divergence across categorical variables,

$$\mathrm{JS} = \frac{1}{N_c}\sum_{i=1}^{N_c} \mathrm{JS}\big(P_{\mathrm{target}}(x_i) \,\|\, P_{\mathrm{CF}}(x_i)\big), \tag{24}$$

where $N_c$ is the number of categorical variables. A lower JS score indicates similarity between the distributions of generated counterfactuals and the target class.
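As a compact reference, minimal implementations of three of these metrics (under the stated definitions; `clf` is the explained classifier) might look as follows:

```python
import torch

def l2_distance(x, x_cf):
    """Equation 20: mean squared L2 distance over continuous features."""
    return (x - x_cf).pow(2).sum(dim=-1).mean()

def validity(clf, x_cf, y_target):
    """Equation 23: fraction of counterfactuals classified as the target."""
    return (clf(x_cf).argmax(dim=-1) == y_target).float().mean()

def diversity(x_cf):
    """Equation 22 with Euclidean distance over unordered pairs."""
    n = x_cf.shape[0]
    pairwise = torch.cdist(x_cf, x_cf)       # (n, n) distances
    return pairwise.triu(diagonal=1).sum() / (n * (n - 1))
```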

5.3. Results

5.3.1. Quantitative evaluation

We show quantitative results in Tables 2, 3. Table 2 is for the no-masking setting, while Table 3 is for masking a preselected feature, prohibiting it from being changed by the counterfactual generator. While no single combination of these metrics determines relative performance, a subjective evaluation indicates the competitive performance of our TDCE method. For example, it achieves the best validity among the methods by significant margins, indicating that nearly all the generated samples have turned to the target class, which is arguably a prerequisite for the other metrics to have meaning. We observe competitive or superior performance on IM1 and IM2 as well, indicating that the generated counterfactuals stay on the data manifold of the target class and have better interpretability.

Table 2.

Counterfactual quantitative evaluation without masking (all features are allowed to change).

Counterfactual evaluations
Model L2 ↓ Diversity ↑ Instability ↓ JS ↓ IM1 ↓ IM2 ↓ Validity ↑
LCD Wach. 0.34±0.02 0.73±0.03 0.11±0.03 0.12 ± 0.03 1.33 ± 0.04 0.16 ± 0.03 0.60 ± 0.03
Wach.IM1 0.76 ± 0.02 0.71 ± 0.03 0.13 ± 0.03 0.12 ± 0.03 0.71 ± 0.04 0.13 ± 0.03 0.97 ± 0.03
CCH. 0.56 ± 0.03 0.19 ± 0.01 0.21 ± 0.02 0.09 ± 0.01 0.57±0.01 0.08±0.01 0.99±0.01
REVI. 0.59 ± 0.01 0.18 ± 0.03 0.22 ± 0.02 0.10 ± 0.01 0.89 ± 0.03 0.09 ± 0.02 0.99±0.01
CLUE 0.70 ± 0.02 0.26 ± 0.03 0.31 ± 0.03 0.11 ± 0.01 0.72 ± 0.04 0.11 ± 0.01 0.83 ± 0.03
FACE 0.69 ± 0.01 0.54 ± 0.05 0.11±0.01 0.06±0.01 0.91 ± 0.07 0.11 ± 0.03 0.85 ± 0.02
CounterNet 0.35±0.01 0.45 ± 0.03 0.25 ± 0.02 0.15 ± 0.02 0.99 ± 0.03 0.69 ± 0.03 0.99±0.01
FastDCFlow 0.62 ± 0.08 0.69±0.06 0.29 ± 0.02 0.17 ± 0.02 0.91 ± 0.06 0.56 ± 0.03 0.99±0.01
TDCE 0.59 ± 0.03 0.73±0.03 0.05±0.01 0.01±0.01 0.63±0.03 0.05±0.01 0.99±0.01
GMC Wach. 0.03±0.02 0.25 ± 0.02 0.09 ± 0.01 0.03±0.01 1.04 ± 0.05 0.07±0.01 0.73 ± 0.03
Wach.IM1 0.27 ± 0.02 0.22 ± 0.02 0.10 ± 0.01 0.03±0.01 1.01±0.05 0.06±0.01 0.93 ± 0.03
CCH. 0.21 ± 0.03 0.21 ± 0.01 0.10 ± 0.01 0.06 ± 0.02 1.14 ± 0.05 0.15 ± 0.02 0.77 ± 0.02
REVI. 0.23 ± 0.02 0.21 ± 0.02 0.13 ± 0.01 0.05±0.01 1.18 ± 0.05 0.07±0.01 0.80 ± 0.02
CLUE 0.18±0.02 0.18 ± 0.02 0.07±0.01 0.08 ± 0.01 1.14 ± 0.04 0.07±0.01 0.81 ± 0.02
FACE 0.21 ± 0.02 0.17 ± 0.02 0.05±0.01 0.07 ± 0.01 1.18 ± 0.01 0.08 ± 0.01 0.86 ± 0.01
CounterNet 0.20 ± 0.01 0.17 ± 0.02 0.10 ± 0.01 0.06 ± 0.01 1.02 ± 0.02 0.11 ± 0.02 0.97±0.01
FastDCFlow 0.25 ± 0.01 0.98±0.04 0.13 ± 0.03 0.07 ± 0.02 1.01±0.04 0.10 ± 0.02 0.96±0.01
TDCE 0.18±0.03 1.08±0.06 0.05±0.01 0.03±0.01 0.96±0.04 0.06±0.02 0.99±0.01
Adult Wach. 0.27±0.03 1.11±0.01 0.09 ± 0.01 0.13 ± 0.01 1.31 ± 0.03 0.05±0.01 0.57 ± 0.02
Wach.IM1 0.97 ± 0.03 0.92 ± 0.01 0.08 ± 0.01 0.13 ± 0.01 1.01 ± 0.03 0.05 ± 0.01 0.87 ± 0.02
CCH. 0.79±0.03 0.19 ± 0.02 0.22 ± 0.02 0.11 ± 0.02 1.89 ± 0.07 0.06 ± 0.02 0.61 ± 0.03
REVI. 0.99 ± 0.02 0.43 ± 0.02 0.10 ± 0.01 0.11 ± 0.01 1.11 ± 0.01 0.07 ± 0.01 0.58 ± 0.02
CLUE 0.81 ± 0.03 0.11 ± 0.01 0.04±0.01 0.17 ± 0.03 1.41 ± 0.05 0.04±0.01 0.62 ± 0.01
FACE 0.89 ± 0.02 0.74 ± 0.04 0.07±0.01 0.06±0.01 0.97 ± 0.02 0.06 ± 0.01 0.75±0.02
CounterNet 0.86 ± 0.02 0.69 ± 0.02 0.07±0.02 0.09 ± 0.01 0.96±0.02 0.06 ± 0.01 0.94±0.01
FastDCFlow 0.96 ± 0.05 0.79±0.06 0.08 ± 0.02 0.10 ± 0.02 0.98±0.04 0.07 ± 0.01 0.92±0.01
TDCE 0.85 ± 0.04 0.80±0.03 0.05±0.01 0.03±0.02 0.90±0.02 0.04±0.01 0.94±0.04
LAW Wach. 0.17±0.04 1.22±0.05 0.13 ± 0.02 0.11 ± 0.02 1.73 ± 0.02 0.12 ± 0.02 0.58 ± 0.01
Wach.IM1 0.87 ± 0.04 1.12 ± 0.05 0.12 ± 0.02 0.11 ± 0.02 1.31 ± 0.02 0.11 ± 0.02 0.88 ± 0.01
CCH. 0.99 ± 0.02 0.20 ± 0.01 0.07 ± 0.01 0.05±0.01 0.95 ± 0.03 0.09±0.02 0.99±0.01
REVI. 0.71±0.03 0.91 ± 0.03 0.06±0.01 0.06 ± 0.01 1.56 ± 0.05 0.11 ± 0.01 0.61 ± 0.01
CLUE 0.79 ± 0.02 0.37 ± 0.01 0.07 ± 0.01 0.05±0.01 1.21 ± 0.02 0.06±0.02 0.99±0.01
FACE 0.81 ± 0.02 0.83 ± 0.02 0.03±0.01 0.04±0.01 1.31 ± 0.06 0.11 ± 0.02 0.81 ± 0.02
CounterNet 0.79 ± 0.01 0.91 ± 0.02 0.07 ± 0.01 0.06 ± 0.01 0.93±0.03 0.08 ± 0.01 0.99±0.01
FastDCFlow 0.88 ± 0.04 0.96±0.05 0.11 ± 0.02 0.07 ± 0.01 0.96±0.04 0.11 ± 0.02 0.98±0.01
TDCE 0.81 ± 0.02 0.97±0.03 0.06±0.02 0.04±0.02 0.89±0.05 0.06±0.01 0.99±0.01

We provide an evaluation according to the metrics described in the text. The arrow beside each metric indicates the preferred value. We select one feature to mask in the masking setting. bold = 1st, underline = 2nd. The classifier in CounterNet requires a different architecture because it is model-dependent. Wach.IM1 is generated by linearly interpolating the generated sample such that the IM1 score is maximized. The mean and standard deviation are calculated based on 5 runs with different random seeds.

Table 3.

Counterfactual quantitative evaluation with masking (one preselected feature is held immutable).

Counterfactual evaluations
Model L2 ↓ Diversity ↑ Instability ↓ IM1 ↓ IM2 ↓ Validity ↑
LCD Wach. 0.34±0.03 0.73±0.03 0.12±0.01 1.04 ± 0.05 0.27 ± 0.03 0.75 ± 0.03
Wach.IM1 0.51 ± 0.02 0.61 ± 0.03 0.11±0.01 0.84 ± 0.05 0.18 ± 0.03 0.96 ± 0.03
CCH. 0.50 ± 0.03 0.36 ± 0.03 0.29 ± 0.02 0.64±0.05 0.16±0.01 0.98±0.01
REVI. 0.52 ± 0.02 0.33 ± 0.03 0.21 ± 0.02 0.82 ± 0.04 0.19 ± 0.02 0.98±0.01
CLUE 0.49 ± 0.02 0.38 ± 0.04 0.24 ± 0.02 0.92 ± 0.02 0.15±0.01 0.81 ± 0.02
FACE 0.69 ± 0.02 0.55 ± 0.03 0.17 ± 0.01 0.79 ± 0.07 0.20 ± 0.01 0.87 ± 0.01
CounterNet 0.35±0.01 0.45 ± 0.03 0.25 ± 0.02 1.09 ± 0.03 0.88 ± 0.03 0.99±0.01
FastDCFlow 0.62 ± 0.08 0.69 ± 0.06 0.29 ± 0.02 0.99 ± 0.06 0.66 ± 0.03 0.99±0.01
TDCE 0.49 ± 0.02 0.77±0.03 0.09±0.02 0.77±0.02 0.06±0.02 0.99±0.01
GMC Wach. 0.04±0.01 0.23 ± 0.02 0.10 ± 0.01 1.13 ± 0.09 0.13±0.02 0.57 ± 0.03
Wach.IM1 0.15 ± 0.01 0.24 ± 0.02 0.11 ± 0.01 1.04 ± 0.09 0.13±0.02 0.77 ± 0.03
CCH. 0.17 ± 0.02 0.21 ± 0.01 0.11 ± 0.01 1.19 ± 0.03 0.15 ± 0.01 0.52 ± 0.02
REVI. 0.16 ± 0.02 0.21 ± 0.02 0.12 ± 0.01 1.10 ± 0.05 0.17 ± 0.01 0.53 ± 0.02
CLUE 0.11 ± 0.02 0.20 ± 0.02 0.08±0.01 1.32 ± 0.05 0.13±0.01 0.57 ± 0.01
FACE 0.09±0.03 0.16 ± 0.02 0.07±0.02 1.03±0.02 0.13±0.02 0.65 ± 0.02
CounterNet 0.20 ± 0.01 0.17 ± 0.02 0.10 ± 0.01 1.09 ± 0.02 0.16 ± 0.02 0.90±0.01
FastDCFlow 0.25 ± 0.01 0.98±0.04 0.13 ± 0.03 1.04±0.04 0.11±0.02 0.95±0.01
TDCE 0.11 ± 0.02 0.83±0.03 0.06±0.01 0.99±0.03 0.05±0.01 0.94±0.02
Adult Wach. 0.28±0.04 1.01±0.03 0.15 ± 0.01 1.00±0.05 0.07 ± 0.01 0.51 ± 0.02
Wach.IM1 0.83 ± 0.04 0.99 ± 0.03 0.16 ± 0.01 0.98±0.04 0.07 ± 0.01 0.71 ± 0.02
CCH. 0.62 ± 0.03 0.72 ± 0.03 0.17 ± 0.02 1.11 ± 0.03 0.11 ± 0.01 0.55 ± 0.03
REVI. 0.78 ± 0.03 0.78 ± 0.02 0.09 ± 0.01 1.11 ± 0.06 0.07 ± 0.01 0.61 ± 0.02
CLUE 0.61±0.03 0.71 ± 0.03 0.07±0.01 1.14 ± 0.03 0.06±0.01 0.55 ± 0.01
FACE 0.85 ± 0.02 0.79 ± 0.02 0.08±0.01 1.02 ± 0.02 0.06±0.01 0.58 ± 0.02
CounterNet 0.86 ± 0.02 0.69 ± 0.02 0.07±0.02 1.03 ± 0.02 0.10 ± 0.01 0.84±0.01
FastDCFlow 0.96 ± 0.05 0.79±0.06 0.08 ± 0.02 1.01±0.04 0.09 ± 0.01 0.84±0.01
TDCE 0.79 ± 0.03 0.82±0.03 0.06±0.02 0.93±0.04 0.05±0.02 0.86±0.04
LAW Wach. 0.26±0.02 1.12±0.02 0.13 ± 0.02 1.54 ± 0.01 0.14 ± 0.01 0.39 ± 0.03
Wach.IM1 0.91 ± 0.02 1.09±0.02 0.15 ± 0.02 1.22 ± 0.01 0.11 ± 0.01 0.81 ± 0.03
CCH. 0.89 ± 0.02 0.75 ± 0.02 0.05±0.01 1.43 ± 0.02 0.13 ± 0.03 0.99±0.01
REVI. 0.80 ± 0.03 1.02±0.02 0.09 ± 0.02 1.37 ± 0.02 0.11 ± 0.01 0.60 ± 0.01
CLUE 0.81 ± 0.02 0.68 ± 0.01 0.07 ± 0.02 0.76±0.01 0.09±0.01 0.99±0.01
FACE 0.92 ± 0.01 0.81 ± 0.03 0.04±0.01 1.63 ± 0.01 0.16 ± 0.01 0.80 ± 0.03
CounterNet 0.79 ± 0.01 0.91 ± 0.02 0.07 ± 0.01 0.96 ± 0.03 0.09±0.01 0.98±0.01
FastDCFlow 0.88 ± 0.04 0.96 ± 0.05 0.11 ± 0.02 0.98±0.04 0.12 ± 0.02 0.98±0.01
TDCE 0.79±0.02 0.95 ± 0.03 0.05±0.01 0.73±0.03 0.07±0.01 0.98±0.02

We provide an evaluation according to the metrics described in the text. The arrow beside each metric indicates the preferred value. We select one feature to mask in the masking setting. bold = 1st, underline = 2nd. Wach.IM1 is generated by linearly interpolating the generated sample such that the IM1 score is maximized. The mean and standard deviation are calculated based on 5 runs with different random seeds.

In the experiments, the baseline Wachter shows volatile performance across datasets. It is able to produce robust samples with fair diversity, but it lacks interpretability, for example on LCD. Although Wachter achieves the best L2 distance, the price is paid in other metrics, including validity. To see this, we linearly interpolate between the sample and the generated counterfactual to search for the sample that maximizes the IM1 score. As is evident, when the highest IM1 (which is still lower than ours) is achieved, the L2 distance is much worse than that of our TDCE. In the masking setting, we observe similar performance. In addition, we see that FastDCFlow achieves a better diversity score because it adds perturbations for diversity purposes while generating counterfactual samples. However, this comes at the price of a high L2 distance, which is the key metric of counterfactual explanations. Furthermore, we note that CCHVAE, REVISE, and CLUE show strong robustness (low instability). However, these results are usually accompanied by a low diversity score, indicating that the algorithms tend to generate similar counterfactuals. The same conclusion can be drawn from their relatively high JS scores, which suggest that the generated categorical variables do not match the distribution in the target class.

To give a concrete example of what we observed, when analyzing the LCD dataset, we found that the Fair, Isaac and Company (FICO) score dominates the classifier's decision, while the categorical loan-term variable is less important. All benchmark methods tend to completely ignore the less significant feature because changing one's FICO score quickly changes the classification. In contrast, our TDCE method pays more attention to each feature, which creates less discrepancy between the distributions of the counterfactual and counter-class data.

We also note that TDCE is faster compared with other searching algorithms. Wachter, REVISE, and FACE require iterative searching for each individual sample, while TDCE's reverse process is a fixed Markov chain learned during training.

5.3.2. Proximity

We evaluate the proximity between the generated counterfactual samples and the original query samples using the L2 distance. In the experiments, we observe that Wachter achieves the lowest L2 distance across all datasets. This is because Wachter stops searching as soon as it finds a sample that changes its label with minimal modification in the feature space. This can produce a sample near the decision boundary that, precisely because of the minimal changes, offers a less meaningful explanation and is out of sample (i.e., has high IM1/IM2). We show that its L2 distance would be much worse if we interpolated between the original sample and the counterfactual to find the sample with the maximum IM1 score. We emphasize that we focus on higher validity and greater interpretability (i.e., low IM1/IM2) and are therefore willing to sacrifice some L2 distance. Nevertheless, the weight TDCE places on the L2 distance is adjustable by simply adding a regularization term to Equation 10.

5.3.3. Efficiency

We show the runtime for generating counterfactual samples in Table 4. We observe that CounterNet and FastDCFlow run faster than the other methods because, instead of searching, they generate the counterfactuals through a single forward pass of a neural network. However, we highlight that our TDCE is model-agnostic (it only requires differentiability), whereas CounterNet is model-dependent, requiring the encoder to be the classifier as well. FastDCFlow requires an invertible architecture as a counterfactual generator, restricting the model's expressiveness. The results also show that while FastDCFlow generally achieves a high diversity score, this comes at the cost of a larger L2 distance, which is the key metric of counterfactual explanations. In addition, TDCE is capable of generating more interpretable samples (i.e., low IM1/IM2) for all datasets we consider. Aside from CounterNet and FastDCFlow, TDCE runs faster than all other search-based algorithms because TDCE can generate the counterfactual samples in a predefined number of reverse steps, while search algorithms rely heavily on the objective function.

Table 4.

Counterfactual generation time in seconds per 100 samples on the same computer setting.

Counterfactual generation time (sec/100 samples)
Dataset\Model Wach. CCH. REVI. CLUE FACE CounterNet FastDCFlow TDCE
LCD 0.9 0.8 0.8 0.9 1.0 0.1 0.1 0.3
GMC 1.1 1.2 1.1 1.3 1.5 0.2 0.2 0.5
Adult 1.1 1.3 1.2 1.3 1.6 0.2 0.1 0.5
LAW 1.0 1.1 1.1 1.2 1.4 0.1 0.2 0.4

5.3.4. Ablation

We show the ablation study of our TDCE in Table 5. In the study, we experiment on the datasets with and without the L2 distance constraint described in Equation 10. The L2 distance constraint aims to generate a counterfactual sample with minimal changes. The results show that without the L2 distance constraint, TDCE can still generate interpretable results with a larger diversity score. However, the L2 distance constraint is necessary by the definition in Equation 1. In the no-masking setting, the JS score is almost unchanged because the categorical module in TDCE is separate from the continuous part and fixed during the ablation study.

Table 5.

Ablation study of TDCE with or without distance constraint in Equation 10.

Ablation study of TDCE with or without distance constraint in Equation 10
Dataset\Metrics L2 Diversity Instability JS IM1 IM2 Validity
LCD without distance constraint 0.88 ± 0.04 0.81 ± 0.03 0.06 ± 0.01 0.02 ± 0.01 0.54 ± 0.03 0.05 ± 0.01 0.99 ± 0.01
with distance constraint 0.59 ± 0.03 0.73 ± 0.03 0.05 ± 0.01 0.01 ± 0.01 0.63 ± 0.03 0.05 ± 0.01 0.99 ± 0.01
without distance constraint 0.58 ± 0.02 0.91 ± 0.03 0.08 ± 0.03 – 0.79 ± 0.02 0.05 ± 0.02 0.99 ± 0.01
with distance constraint 0.49 ± 0.02 0.77 ± 0.03 0.09 ± 0.02 – 0.77 ± 0.02 0.06 ± 0.02 0.99 ± 0.01
GMC without distance constraint 0.31 ± 0.03 1.38 ± 0.06 0.05 ± 0.01 0.03 ± 0.01 0.94 ± 0.04 0.06 ± 0.02 0.99 ± 0.01
with distance constraint 0.18 ± 0.03 1.08 ± 0.06 0.05 ± 0.01 0.03 ± 0.01 0.96 ± 0.04 0.06 ± 0.02 0.99 ± 0.01
without distance constraint 0.20 ± 0.02 0.91 ± 0.03 0.07 ± 0.01 – 0.98 ± 0.03 0.06 ± 0.01 0.94 ± 0.02
with distance constraint 0.11 ± 0.02 0.83 ± 0.03 0.06 ± 0.01 – 0.99 ± 0.03 0.05 ± 0.01 0.94 ± 0.02
Adult without distance constraint 1.01 ± 0.05 0.89 ± 0.03 0.06 ± 0.01 0.03 ± 0.02 0.93 ± 0.04 0.05 ± 0.02 0.93 ± 0.04
with distance constraint 0.85 ± 0.04 0.80 ± 0.03 0.05 ± 0.01 0.03 ± 0.02 0.90 ± 0.02 0.04 ± 0.01 0.94 ± 0.04
without distance constraint 0.88 ± 0.03 0.89 ± 0.04 0.06 ± 0.02 – 0.97 ± 0.04 0.06 ± 0.01 0.83 ± 0.04
with distance constraint 0.79 ± 0.03 0.82 ± 0.03 0.06 ± 0.02 – 0.93 ± 0.04 0.05 ± 0.02 0.86 ± 0.04
LAW without distance constraint 0.99 ± 0.03 1.19 ± 0.04 0.07 ± 0.02 0.04 ± 0.02 0.90 ± 0.05 0.05 ± 0.01 0.99 ± 0.01
with distance constraint 0.81 ± 0.02 0.97 ± 0.03 0.06 ± 0.02 0.04 ± 0.02 0.89 ± 0.05 0.06 ± 0.01 0.99 ± 0.01
without distance constraint 0.96 ± 0.02 1.21 ± 0.03 0.06 ± 0.01 – 0.76 ± 0.04 0.06 ± 0.01 0.96 ± 0.02
with distance constraint 0.79 ± 0.02 0.95 ± 0.03 0.05 ± 0.01 – 0.73 ± 0.03 0.07 ± 0.01 0.98 ± 0.02

For each dataset, the top two rows are for no masking, and the bottom two rows are for masking. The JS column does not apply in the masking rows (marked –) because the categorical feature is held immutable.

5.3.5. Discussion

Through the experiments, each benchmark shows a good ability to generate counterfactuals, yet each is somewhat limited by design issues. Gradient-based methods such as Wachter, which operate directly in the feature space, can often be fooled by spurious changes; the classifier changes its prediction through small, uninterpretable movements. Although such methods can achieve the minimum L2 distance, further analysis shows that their samples often yield a worse interpretability score (IM1/IM2). VAE-based methods such as CLUE, REVISE, and CCHVAE leverage the generative power of VAEs but rely heavily on the black-box latent space in which they work. This may lead to counterfactuals that fall off the data manifold. Furthermore, we observe that these methods occasionally produce unvaried samples, implying that mode collapse might occur. In addition, the black-box decoder might also add uncertainty to the produced counterfactuals. Graph methods such as FACE depend on sample quality and coverage, and search only within the graph itself, producing a sample that already exists in the dataset. The construction of the graph and the search algorithm are inherently computationally expensive. Although CounterNet and FastDCFlow generate counterfactual samples at a faster speed, they often suffer from the worst interpretability score (CounterNet), due to the black-box decoder, or a larger L2 distance (FastDCFlow), due to added noise in the latent space. In contrast, our TDCE uses a diffusion model operating directly in the ambient feature space. This connects it to Wachter while still leveraging the generative power available to deep models such as the VAE. We believe the combination of these desirable aspects accounts for our good relative performance.

5.3.6. Qualitative evaluations

We also provide a qualitative comparison on the LCD dataset in Figure 4. LCD contains five numerical features and one categorical feature. In the no-masking setting, all features are guided by the classifier. Darker pixels show greater discrepancy between the counterfactual sample and the target class. As we can see from the first row, TDCE has fewer dark pixels in general. Importantly, in the middle row, the generated categorical variables from TDCE perfectly match the distribution of the target class. In the masking setting, we fix the categorical variable and only guide the continuous features. In general, the distributional agreement between the target class and the counterfactual class is much greater with TDCE than with the other methods.

Figure 4.


Qualitative comparisons between TDCE and other methods on the LCD dataset. Top: the absolute difference between the correlation of counterfactual samples and that of the target class for the continuous features (debt-to-income ratio, loan amount, interest rate, annual income, FICO score). Middle: Bar plots for the categorical variable (loan term: 36 months or 60 months). Bottom: The same metrics as the top when masking the categorical variable. The absolute difference is the same for CounterNet and FastDCFlow because these methods simply copy the immutable features from the query sample.

5.3.7. Discussion on temperature τ

We evaluate the temperature τ in terms of IM1, IM2, JS, and validity on the LCD dataset in Figure 5. The temperature affects the overall counterfactual performance significantly. As the temperature drops, the Gumbel-softmax sample approaches a one-hot vector. However, this blocks the flow of gradients from the classifier back to the categorical variable, producing the vanishing gradient issue. In this case, the reverse process is mainly governed by the continuous features. This does not automatically prevent the model from generating valid counterfactual samples, because the categorical variables might not be significant in the prediction. However, it does prevent the model from generating realistic counterfactual samples, which diminishes the legitimacy of the counterfactual explanations.

Figure 5.


An analysis of the temperature τ on the LCD dataset. The best JS score is achieved at τ = 0.3, with balanced IM1 and IM2 scores.

As the temperature increases, the counterfactual generator tends to recover the distribution of the categorical variable in the target class. However, according to Theorem 4.1, our reverse process may diverge from the true reverse process as the temperature grows, so the resulting model may not recover the distribution of the categorical variables well. In the experiments, we search over τ ranging from 0.1 to 5 for each dataset and select the value that yields the minimum L2 distance.
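
This selection procedure can be summarized by the following sketch, in which generate_cfs stands for a hypothetical wrapper around the counterfactual generator at a given temperature; the candidate grid shown is illustrative.

```python
import numpy as np

def select_tau(generate_cfs, queries, taus=(0.1, 0.3, 0.5, 1.0, 2.0, 5.0)):
    """Pick the temperature whose counterfactuals stay closest in L2.

    generate_cfs(queries, tau) is assumed to return an array of
    counterfactuals aligned row-for-row with the query samples.
    """
    best_tau, best_l2 = None, np.inf
    for tau in taus:
        cfs = generate_cfs(queries, tau)
        l2 = np.linalg.norm(cfs - queries, axis=1).mean()
        if l2 < best_l2:
            best_tau, best_l2 = tau, l2
    return best_tau, best_l2
```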

6. Conclusion

We proposed a tabular diffusion model that generates counterfactual explanations for a classifier. We leveraged the Gumbel-softmax distribution to re-parameterize one-hot vectors as continuous vectors, which allows the gradients of the classifier to guide the reverse process. We provided theoretical bounds and experimented on four popular tabular datasets. Quantitative results support that our method combines the advantage of working directly in the feature space, as Wachter and graph-based methods do, with the generative power of neural networks enjoyed by VAE-based methods.

Funding Statement

The author(s) declared that financial support was received for this work and/or its publication. This research was supported by funding from Capital One Labs. Capital One Labs was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Footnotes

Edited by: Kesheng Wu, Berkeley Lab (DOE), United States

Reviewed by: Jacob Sanderson, Northumbria University, United Kingdom

Tang Li, University of Delaware, United States

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

WZ: Writing – original draft, Writing – review & editing. BB: Conceptualization, Supervision, Visualization, Writing – review & editing. JP: Methodology, Supervision, Writing – review & editing.

Conflict of interest

BB was employed by Capital One.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.


Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frai.2026.1743495/full#supplementary-material

Data_Sheet_1.pdf (200.9KB, pdf)

References

1. Agarwal R., Melnick L., Frosst N., Zhang X., Lengerich B., Caruana R., et al. (2021). “Neural additive models: interpretable machine learning with neural nets,” in Advances in Neural Information Processing Systems (San Diego, CA).
2. Antorán J., Bhatt U., Adel T., Weller A., Hernández-Lobato J. M. (2020). Getting a CLUE: a method for explaining uncertainty estimates. arXiv [preprint] arXiv:2006.06848. doi: 10.48550/arXiv.2006.06848
3. Augustin M., Boreiko V., Croce F., Hein M. (2022). “Diffusion visual counterfactual explanations,” in Advances in Neural Information Processing Systems (San Diego, CA).
4. Avrahami O., Lischinski D., Fried O. (2022). “Blended diffusion for text-driven editing of natural images,” in IEEE Conference on Computer Vision and Pattern Recognition (Washington, DC).
5. Chang C.-H., Caruana R., Goldenberg A. (2022). “Node-GAM: neural generalized additive model for interpretable deep learning,” in International Conference on Learning Representations (Amherst, MA: OpenReview).
6. Dandl S., Molnar C., Binder M., Bischl B. (2020). “Multi-objective counterfactual explanations,” in International Conference on Parallel Problem Solving from Nature (Cham: Springer).
7. Dhariwal P., Nichol A. (2021). “Diffusion models beat GANs on image synthesis,” in Advances in Neural Information Processing Systems (San Diego, CA).
8. Dieleman S., Sartran L., Roshannai A., Savinov N., Ganin Y., Richemond P. H., et al. (2022). Continuous diffusion for categorical data. arXiv [preprint] arXiv:2211.15089. doi: 10.48550/arXiv.2211.15089
9. Duell J., Seisenberger M., Fu H., Fan X. (2024). “QUCE: the minimisation and quantification of path-based uncertainty for generative counterfactual explanations,” in 2024 IEEE International Conference on Data Mining (ICDM) (IEEE), 693–698.
10. Fernández R. R., De Diego I. M., Aceña V., Fernández-Isabel A., Moguerza J. M. (2020). Random forest explainability using counterfactual sets. Inform. Fusion 63, 196–207. doi: 10.1016/j.inffus.2020.07.001
11. Galwaduge V., Samarabandu J. (2025). Novel actionable counterfactual explanations for intrusion detection using diffusion models. J. Cybersecur. Privacy 5:68. doi: 10.3390/jcp5030068
12. Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., et al. (2020). Generative adversarial networks. Commun. ACM 63, 139–144. doi: 10.1145/3422622
13. Gruver N., Stanton S., Frey N., Rudner T. G., Hotzel I., Lafrance-Vanasse J., et al. (2024). “Protein design with guided discrete diffusion,” in Advances in Neural Information Processing Systems (San Diego, CA).
14. Guidotti R., Monreale A., Ruggieri S., Pedreschi D., Turini F., Giannotti F. (2018). Local rule-based explanations of black box decision systems. arXiv [preprint] arXiv:1805.10820. doi: 10.48550/arXiv.1805.10820
15. Guo H., Jia F., Chen J., Squicciarini A., Yadav A. (2023a). “RoCourseNet: robust training of a prediction aware recourse model,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (New York, NY), 619–628.
16. Guo H., Nguyen T. H., Yadav A. (2023b). “CounterNet: end-to-end training of prediction aware counterfactual explanations,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (New York, NY), 577–589. doi: 10.1145/3580305.3599290
17. Ho J., Jain A., Abbeel P. (2020). “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems (San Diego, CA).
18. Hoogeboom E., Nielsen D., Jaini P., Forré P., Welling M. (2021). “Argmax flows and multinomial diffusion: learning categorical distributions,” in Advances in Neural Information Processing Systems (San Diego, CA).
19. Ibrahim M., Louie M., Modarres C., Paisley J. (2019). “Global explanations of neural networks: mapping the landscape of predictions,” in AAAI/ACM Conference on AI, Ethics, and Society (New York, NY).
20. Jang E., Gu S., Poole B. (2017). “Categorical reparameterization with Gumbel-softmax,” in International Conference on Learning Representations (Amherst, MA: OpenReview).
21. Joshi S., Koyejo O., Vijitbenjaronk W., Kim B., Ghosh J. (2019). Towards realistic individual recourse and actionable explanations in black-box decision making systems. arXiv [preprint] arXiv:1907.09615. doi: 10.48550/arXiv.1907.09615
22. Kingma D. P. (2013). Auto-encoding variational Bayes. arXiv [preprint] arXiv:1312.6114. doi: 10.48550/arXiv.1312.6114
23. Kotelnikov A., Baranchuk D., Rubachev I., Babenko A. (2023). “TabDDPM: modelling tabular data with diffusion models,” in International Conference on Machine Learning (Cambridge, MA).
24. Liu L. T., Barocas S., Kleinberg J., Levy K. (2024). “On the actionability of outcome prediction,” in AAAI Conference on Artificial Intelligence (Washington, DC).
25. Liu L. T., Dean S., Rolf E., Simchowitz M., Hardt M. (2018). “Delayed impact of fair machine learning,” in International Conference on Machine Learning (Cambridge, MA).
26. Lundberg S., Lee S.-I. (2017). “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems (San Diego, CA).
27. Madaan N., Bedathur S. (2024). “Navigating the structured what-if spaces: counterfactual generation via structured diffusion,” in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (Toronto, ON: IEEE), 710–722.
28. Maddison C. J., Mnih A., Teh Y. W. (2016). The concrete distribution: a continuous relaxation of discrete random variables. arXiv [preprint] arXiv:1611.00712. doi: 10.48550/arXiv.1611.00712
29. Mothilal R. K., Sharma A., Tan C. (2020). “Explaining machine learning classifiers through diverse counterfactual explanations,” in Conference on Fairness, Accountability, and Transparency (New York, NY).
30. Na S.-H., Lee S.-W. (2025). Counterfactual explanation through latent adjustment in disentangled space of diffusion model. IEEE Trans. Neural Netw. Learn. Syst. 36, 18355–18368. doi: 10.1109/TNNLS.2025.3580118
31. Nazabal A., Olmos P. M., Ghahramani Z., Valera I. (2020). Handling incomplete heterogeneous data using VAEs. Pattern Recogn. 107:107501. doi: 10.1016/j.patcog.2020.107501
32. Nemirovsky D., Thiebaut N., Xu Y., Gupta A. (2022). “CounteRGAN: generating counterfactuals for real-time recourse and interpretability using residual GANs,” in Uncertainty in Artificial Intelligence (New York: PMLR), 1488–1497.
33. Nichol A. Q., Dhariwal P. (2021). “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning (Amherst, MA: OpenReview).
34. Nie S., Zhu F., You Z., Zhang X., Ou J., Hu J., et al. (2025). Large language diffusion models. arXiv [preprint] arXiv:2502.09992. doi: 10.48550/arXiv.2502.09992
35. Panagiotou E., Heurich M., Landgraf T., Ntoutsi E. (2024). “TABCF: counterfactual explanations for tabular data using a transformer-based VAE,” in Proceedings of the 5th ACM International Conference on AI in Finance (New York, NY), 274–282.
36. Pawelczyk M., Bielawski S., Heuvel J., Richter T., Kasneci G. (2021). “CARLA: a Python library to benchmark algorithmic recourse and counterfactual explanation algorithms,” in Neural Information Processing Systems Track on Datasets and Benchmarks (New York, NY: Association for Computing Machinery).
37. Pawelczyk M., Broelemann K., Kasneci G. (2020). “Learning model-agnostic counterfactual explanations for tabular data,” in The Web Conference (San Diego, CA).
38. Poyiadzi R., Sokol K., Santos-Rodriguez R., De Bie T., Flach P. (2020). “FACE: feasible and actionable counterfactual explanations,” in AAAI/ACM Conference on AI, Ethics, and Society (New York, NY).
39. Radenovic F., Dubey A., Mahajan D. (2022). “Neural basis models for interpretability,” in Advances in Neural Information Processing Systems (San Diego, CA).
40. Regol F., Coates M. (2023). “Diffusing Gaussian mixtures for generating categorical data,” in AAAI Conference on Artificial Intelligence (Washington, DC).
41. Ribeiro M. T., Singh S., Guestrin C. (2016). “‘Why should I trust you?' Explaining the predictions of any classifier,” in International Conference on Knowledge Discovery and Data Mining (New York, NY).
42. Rombach R., Blattmann A., Lorenz D., Esser P., Ommer B. (2022). “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (New Orleans, LA: IEEE), 10684–10695.
43. Sanderson J., Mao H., Woo W. L. (2025). GradCFA: a hybrid gradient-based counterfactual and feature attribution explanation algorithm for local interpretation of neural networks. IEEE Trans. Artif. Intellig. 6, 2575–2587. doi: 10.1109/TAI.2025.3552057
44. Schiff Y., Sahoo S. S., Phung H., Wang G., Boshar S., Dalla-torre H., et al. (2024). Simple guidance mechanisms for discrete diffusion models. arXiv [preprint] arXiv:2412.10193. doi: 10.48550/arXiv.2412.10193
45. Shrikumar A., Greenside P., Kundaje A. (2017). “Learning important features through propagating activation differences,” in International Conference on Machine Learning (Cambridge, MA).
46. Sohl-Dickstein J., Weiss E., Maheswaranathan N., Ganguli S. (2015). “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning (Cambridge, MA).
47. Song J., Meng C., Ermon S. (2020). Denoising diffusion implicit models. arXiv [preprint] arXiv:2010.02502. doi: 10.48550/arXiv.2010.02502
48. Su J., Vargas D. V., Sakurai K. (2019). One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comp. 23, 828–841. doi: 10.1109/TEVC.2019.2890858
49. Sumiya Y., Shouno H. (2024). “Model-based counterfactual explanations incorporating feature space attributes for tabular data,” in 2024 International Joint Conference on Neural Networks (IJCNN) (Yokohama: IEEE), 1–10.
50. Sun H., Yu L., Dai B., Schuurmans D., Dai H. (2022). Score-based continuous-time discrete diffusion models. arXiv [preprint] arXiv:2211.16750. doi: 10.48550/arXiv.2211.16750
51. Sundararajan M., Taly A., Yan Q. (2017). “Axiomatic attribution for deep networks,” in International Conference on Machine Learning (Cambridge, MA).
52. Tsiourvas A., Sun W., Perakis G. (2024). “Manifold-aligned counterfactual explanations for neural networks,” in International Conference on Artificial Intelligence and Statistics (New York: PMLR), 3763–3771.
53. Van Looveren A., Klaise J. (2021). “Interpretable counterfactual explanations guided by prototypes,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Cham: Springer).
54. Wachter S., Mittelstadt B., Russell C. (2017). Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. J. Law & Tech. 31:841. doi: 10.2139/ssrn.3063289
55. Zhang W., Barr B., Paisley J. (2022). “An interpretable deep classifier for counterfactual generation,” in Proceedings of the Third ACM International Conference on AI in Finance (New York, NY), 36–43.
56. Zhang W., Barr B., Paisley J. (2024). “Gaussian process neural additive models,” in AAAI Conference on Artificial Intelligence (Washington, DC).
