Abstract
Understanding the relationship between risk factors, geospatial patterns, and disease outcomes is essential in health geography research. These relationships can inform the implementation of healthcare and public health strategies to improve health outcomes. To accurately uncover such complex relationships, it is necessary to have a predictive model capable of integrating both health variables and spatial information to forecast health outcomes, along with a tool to interpret and reveal the patterns identified by this model. We developed a Spatial Counterfactual Explainable Deep Learning model (SpaCE), comprising a spatially explicit health outcome predictor and a prototype-guided counterfactual explanation. The SpaCE model unifies geospatial and health variables to improve predictions and generates hypothetical examples with minimal changes but opposite outcomes. Using these counterfactuals, SpaCE assesses the impact of each variable in different spatial contexts. We evaluated the model for predicting cardiac arrest survival outcomes. With a 0.682 AUCROC score, SpaCE exceeds baseline models by 10.2%. Further analysis also reveals that the geospatial context significantly influences how various risk factors affect patient survival outcomes. Overall, the SpaCE model improves both predictive accuracy and explainability. It supports targeted interventions at both individual and geographic levels, and the cardiac arrest case study suggests that it can be adapted to other disease scenarios.
Keywords: Counterfactual Explanation, Geographical Explainable Artificial Intelligence, Out-of-Hospital Cardiac Arrest, Survival Status
1. Introduction
The fundamental task in building predictive models in healthcare and public health is to accurately capture and quantify the relationships between risk factors and health outcomes, enabling models to guide effective interventions and policy decisions (Shah et al. 2018). Given the complexity of this task, previous research has often focused on one of two primary goals: developing highly accurate predictive models or creating explainable methods to reveal patterns learned by these models.
In the case of predictive modeling in healthcare and public health, a key observation is that many chronic health conditions such as cardiovascular disease exhibit distinct spatial patterns (Nazia et al. 2022, Djukpen 2012, Mena et al. 2018, Sahar et al. 2021, Son et al. 2023). Consequently, integrating both health-related features and spatial effects is essential to building reliable predictive models. Traditionally, spatial statistical methods, such as Geographically Weighted Regression (GWR) (Brunsdon et al. 1998), have been instrumental in identifying spatial variations and elucidating underlying risk factors across geographic regions (Akindote et al. 2023, ŞENER and Türk 2021). In recent years, machine learning (ML) models, known for their superior predictive capabilities and ability to manage complex data patterns, have introduced a new approach to predicting health outcomes (Santangelo et al. 2023, Wiemken and Kelley 2020, Habehh and Gohel 2021). However, as ML approaches become more prevalent, some studies fail to account for spatial information, overlooking how geospatial variations influence health outcomes (Zhang et al. 2022, Chen et al. 2017, Choi et al. 2016). Admittedly, incorporating spatial effects into traditional ML models is challenging (Mai et al. 2022b). Treating geographical coordinates as continuous variables in a regression model often leads to model overfitting, as the model may memorize all the coordinates in the training dataset. Such a model might perform well on the training data but struggle to generalize to new, unseen data, limiting its practical utility. Thus, a pressing need exists to develop ML-based predictive models that effectively incorporate the spatial distribution of the data.
Once a predictive model is established, the next challenge lies in interpreting the patterns it has identified. Unfortunately, most ML models are highly complex, making direct interpretation difficult. Previous studies (Loh et al. 2022, Minh et al. 2022) have primarily employed post-hoc Explainable Artificial Intelligence (XAI) techniques to address this issue (Lundberg and Lee 2017). Although these methods can generate global or local feature importance scores (Zhang et al. 2022, Chen et al. 2017, Choi et al. 2016), they often fall short of providing actionable guidance, such as specific changes needed in variables to alter outcomes (e.g., transitioning from a high-risk to a low-risk state) or identifying optimal intervention targets. Such insights are crucial for both life-saving interventions and informed policy development.
Recognizing the limitations in current predictive models and XAI methods, we developed a novel Spatial Counterfactual Explainable Deep Learning Method (SpaCE). This method comprises two core components: a Spatially Explicit Health Outcome Predictor (SEP) and a Prototype-Guided Counterfactual Explanation (PCE) algorithm, applied to predict survival outcomes for Out-of-Hospital Cardiac Arrest (OHCA). The SEP integrates spatial and health variables to improve health outcome prediction accuracy, while the PCE explains SEP predictions by identifying minimal adjustments needed to achieve an alternative outcome, generating feasible and actionable counterfactual examples. These examples offer strategic insights for intervention when an individual’s predicted outcome differs from the observed result. We summarize our contribution as follows:
Spatially Explicit Health Outcome Predictor (SEP): We design SEP to integrate heterogeneous spatial and health variables effectively, enhancing predictive accuracy, demonstrated on Out-of-Hospital Cardiac Arrest (OHCA) survival outcome predictions.
Prototype-Guided Counterfactual Explanation (PCE): We develop the PCE algorithm to generate actionable counterfactual explanations by identifying minimal variable changes required to achieve alternative outcomes, thus providing practical intervention insights.
Insights for Public Health Policy and Intervention: SpaCE delivers guidance at both geographic and individual levels, supporting data-driven strategies for reducing cardiac arrest mortality and offering personalized recommendations for intervention.
2. Related Work
2.1. ML predictive models using healthcare-related variables
ML models are widely developed and applied to predict health outcomes using healthcare-related variables owing to their advantages in enhancing diagnostic speed and accuracy (Mubeen et al. 2017, Neefjes et al. 2017, Alaa et al. 2019, Kwon et al. 2019). These ML predictive models are divided into two categories based on whether they account for spatial effects: non-spatial predictive models and spatially explicit predictive models.
Non-spatial predictive ML models (Zhang et al. 2022, Chen et al. 2017, Choi et al. 2016) include Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), and Artificial Neural Networks (ANN), among others. For instance, Zhang et al. (2022) utilized logistic regression, random forest, light gradient boosting, and extreme gradient boosting models to forecast in-hospital mortality among patients admitted for peripheral artery disease in the United States based on variables including patient characteristics, comorbidities, procedures, and hospital-related factors. In another study, Chen et al. (2017) proposed a convolutional neural network-based multi-modal approach for predicting chronic disease outbreaks in disease-frequent communities using structured and unstructured data from hospitals. Through experiments on a dataset of regional chronic diseases such as cerebral infarction, they demonstrated that the prediction accuracy of their algorithm surpassed that of unimodal approaches to disease risk prediction. Choi et al. (2016) developed a recurrent neural network-based temporal model to predict all diagnosis and medication categories for a subsequent visit using longitudinal time-stamped Electronic Health Record (EHR) data for 260K patients. These EHR variables include diagnosis codes, medication codes, and procedure codes. Notably, none of these ML models incorporated spatial elements during development (Zhang et al. 2022, Chen et al. 2017, Choi et al. 2016), despite the evident spatial distribution patterns observed in diseases.
Recently, an increasing number of studies have developed spatially explicit machine learning models (Janowicz et al. 2020, Mai et al. 2022a). These models have been applied to various geospatial prediction tasks such as geographic question answering (Mai et al. 2020a), geospatial shape recognition (Mai et al. 2023a, Siampou et al. 2024), POI type prediction (Yan et al. 2017, Mai et al. 2020b), species fine-grained recognition (Mac Aodha et al. 2019, Wu et al. 2024, Mai et al. 2023c), satellite image classification (Mai et al. 2023b), trajectory generation (Rao et al. 2020, Klemmer et al. 2023), and terrain feature detection (Li et al. 2021). However, few of these spatially explicit ML models offer model explainability. One that does is the Spatial Random Forest (SpatialRF) (Benito 2021, Wright and Ziegler 2015), which is suitable for spatial classification and spatial regression tasks. Another is the Spatial Regression Graph Convolutional Neural Network (SRGCNN) model (Zhu et al. 2022), which is limited to spatial regression tasks. Both methods show potential in healthcare outcome prediction tasks. SpatialRF fits spatial regression models on spatial data using Random Forest. It achieves this by generating spatial predictors that enable the model to capture the spatial structure of the training data. This approach aims to minimize the spatial autocorrelation of model residuals and provide accurate variable importance scores. SRGCNN incorporates both non-spatial and spatial effects by formalizing the spatial weights matrix and cross-sectional data as a fully connected graph within the GCNN framework. Experiments have demonstrated its capability to handle a broad range of geospatial data, and its authors suggest that the method could be extended to the field of public health.
Because SRGCNN is a spatial regression model whereas our task is classification, we use SpatialRF as one baseline in our study.
2.2. Explainable AI for healthcare-related outcome predictive model
The XAI models used for explaining predictive models can also be divided into two categories based on whether they account for spatial effects: non-spatial XAI models and spatially explicit XAI models. Among non-spatial XAI studies for explaining predictive models in healthcare, Tseng et al. (2020) investigated the influence of intraoperative variables on acute kidney injury associated with cardiac surgery. Utilizing SHAP values (Lundberg and Lee 2017), they identified several factors in hemodynamic variables as significant contributors to injury occurrence. Dindorf et al. (2020) applied LIME to an SVC model aimed at classifying post-hip surgery walking patterns in patients. Their analysis illuminated the pivotal role of specific movements as key determinants influencing the SVC’s decision-making process. Similarly, Peng et al. (2021) endeavored to provide a comprehensive interpretation of auxiliary diagnosis in hepatitis cases. They utilized Partial Dependence Plots (PDP) to refine the interpretation of liver disease dynamics.
To the best of our knowledge, the field of spatially explicit XAI (explainable artificial intelligence) models is still in its early stages. Currently, the only model in this category is GeoShapley (Li 2024). GeoShapley is a post-hoc explanation method that treats geographic coordinates as separate feature columns in the predictive model. Its explanation approach leverages the Joint Shapley concept, treating two separate features as a combined feature for interpretation. Although GeoShapley is based on SHAP, it is not a counterfactual-based method. Although it can be used to identify feature importance, it cannot generate feasible and actionable counterfactual examples that offer strategic insights for interventions which are expected for most healthcare prediction problems.
Despite advancements in both non-spatial and spatially explicit XAI methods, these techniques primarily focus on assessing global or local feature importance (Zhang et al. 2022, Chen et al. 2017, Choi et al. 2016). In contrast, counterfactual-based explanation methods not only provide insights into feature importance but also suggest targeted interventions on individual variables. For example, in the case of a patient with an “expired” status, a counterfactual approach might suggest that if the “witness status” changed from “not witnessed” to “witnessed by family member” and “AED use” from “not used” to “used,” the patient might have survived. This approach offers actionable modifications for specific features. When applied within a geographic context, it can further inform region-specific intervention strategies.
2.3. Counterfactual Explanation for healthcare-related outcome predictive model
Counterfactual explanations (Molnar 2020) represent a distinctive area within the XAI (Explainable Artificial Intelligence) domain. In general, counterfactual explanations can be divided into two main categories: one emphasizing causal inference (Prosperi et al. 2020, Smith and Ramamoorthy 2020, Morgan 2015), where counterfactuals are used to infer causal relationships, and the other focusing on counterfactuals without making causal assumptions (Dickerman and Hernán 2020, Mothilal et al. 2020, Goyal et al. 2019, Poyiadzi et al. 2020). Our study aligns with the latter, non-causal approach.
For counterfactual explanations that involve causal inference, the objective is to mitigate confounding bias and estimate treatment effects from generated counterfactuals (Yao et al. 2021). Methods (Zhu et al. 2024, Yao et al. 2021) in this category are often grouped by their approaches to controlling confounders, including (1) re-weighting, (2) stratification, (3) matching, (4) tree-based, and (5) representation-based. In the context of healthcare outcome prediction, several key studies illustrate these approaches. Li and Li (2019) introduced a propensity score weighting framework to estimate causal effects across multiple treatments, applying it to analyze disparities in medical expenditures among racial groups using the 2009 Medical Expenditure Panel Survey (MEPS) data. Linden (2014) employed marginal mean weighting through stratification (MMWS) to measure pre- and post-intervention differences in hospitalizations following a disease management program for congestive heart failure. The results underscore MMWS as a valuable alternative for evaluating healthcare interventions with observational data. Huang et al. (2023) proposed GPMatching, a novel matching method utilizing Gaussian process priors to define matching distance, which they applied to assess the effectiveness of early biological medication for Juvenile Idiopathic Arthritis using electronic medical record data. Similarly, Wang et al. (2016) utilized causal trees to identify patient groups with differing outcomes across healthcare providers, highlighting that patient-provider alignment based on outcome information can lead to improved expectations for patients. Despite the advancements in counterfactual causal inference for healthcare, these methods often rest on strict assumptions.
Because our work focuses on spatially explicit explanations, spatial continuity and correlation may violate the assumption that an individual’s potential outcomes remain unaffected by others’ treatment assignments. We therefore leave causal counterfactual inference for future work and do not explore it in the current study.
For non-causal counterfactual prediction and explanation, counterfactual examples can be generated through either model-agnostic or model-specific approaches. Model-agnostic methods do not consider the model’s internal structure, instead relying solely on the model’s inputs and outputs to generate counterfactual examples. Model-specific methods, on the other hand, leverage the model’s internal structure to create these examples. Our study emphasizes the model-agnostic approach due to its adaptability and potential for transfer across tasks. Despite the demand, few counterfactual explanation methods in this category are tailored for healthcare-specific ML models. One general-purpose approach, DiCE (Diverse Counterfactual Explanations) (Mothilal et al. 2020), can be applied broadly to any ML model, generating counterfactuals agnostic to model structure. However, DiCE does not account for the alignment of generated examples with real-world data, affecting the trustworthiness of the explanations. Developing a healthcare-specific counterfactual explanation method could substantially enhance intervention strategies. In our study, we include DiCE as a baseline for comparison.
3. Methodology
The SpaCE (Spatial Counterfactual Explanation) method (Figure 1) consists of two integral components: a Spatially Explicit Health Outcome Predictor (SEP) to predict health outcomes from both health and spatial variables (longitude and latitude), and a Prototype-guided Counterfactual Explanation algorithm (PCE) to explain the decision-making process of the SEP model.
Figure 1.
The SpaCE Framework. Panel (1): Spatially Explicit Health Outcome Predictor (SEP). Panel (2): Prototype-guided Counterfactual Explanation algorithm (PCE). This PCE algorithm-based pipeline includes three components: ① Simulated Data Generation: generating simulated data with a trained SEP model and combining them with the actual scenario data. The triangle and square symbols symbolize binary outcomes within the dataset. ② Prototype Calculation: employing the Maximum Mean Discrepancy-Critic method to aid in identifying a specific number of prototypes from the merged dataset. ③ Counterfactual Example Generation: generating counterfactual examples guided by prototypes while considering three criteria of feasibility, proximity, and diversity. This is the core of our method.
3.1. Spatially Explicit Health Outcome Predictor
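The health outcome prediction problem can be formulated as follows. Let $X_h$ denote the set of health variables and $X_s$ denote the set of longitude and latitude. The total variable set can be denoted as $X = X_h \cup X_s$. Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ that consists of $N$ samples, each sample $i$ has features $x_i = (x_i^h, x_i^s)$, where $x_i^h$ holds the values of the health variables $X_h$ and $x_i^s$ holds the values of the spatial variables $X_s$. Our goal is to learn a predictive model $f: x_i \mapsto y_i$, where $y_i$ represents the outcome of the health status.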
Making the outcome prediction model spatially explicit (Janowicz et al. 2020, Mai et al. 2022a, Li et al. 2021) is crucial given the geospatial patterns observed in various diseases. A spatially explicit health outcome predictor (SEP) is developed to integrate both spatial and health variables, enhancing the performance of outcome predictions.
The architecture for SEP is shown in Figure 1 Panel 1. First, we encode spatial and health variables separately. For spatial effect representation, or so-called location encoding (Mac Aodha et al. 2019, Mai et al. 2020b, 2022b, 2023b,c, Rußwurm et al. 2023, Wu et al. 2024), we adopt the location encoder from GeoCLIP (Vivanco Cepeda et al. 2024). It is important to note that the GeoCLIP encoder is a point-based encoder rather than a spatial-structure-based encoder like a Graph Neural Network (GNN) (Zhu et al. 2022). While GNNs can quantify spatial effects, they implicitly learn spatial features during training, which requires a carefully designed loss term for effective optimization. Assessing the extent to which these models rely on spatial information for downstream tasks can be challenging due to the complexity of their learning processes. In contrast, point-based encoders like GeoCLIP can capture spatial information in a pre-trained manner, eliminating the need for training from scratch. This allows them to serve as plug-and-play components within various model architectures alongside other types of features, making them highly efficient and easily transferable to new applications. The GeoCLIP encoder, trained on a globally geotagged image dataset, generalizes well across tasks and regions. It projects coordinates with an Equal Earth projection and applies positional encoding with random Fourier features, transforming two-dimensional coordinates into a high-dimensional location embedding. To extract health variables, we use a Multilayer Perceptron (MLP). After obtaining the two embeddings for spatial and health variables, we concatenate them and feed the result into a Variational Autoencoder (VAE) model to learn a fused distribution. The VAE (Kingma et al. 2016) is an advanced version of the autoencoder (AE). Unlike a traditional AE, which encodes input data as a single point, a VAE treats the input as a distribution over potential values in the latent space.
This probabilistic approach not only helps in disregarding outlier data but also in filtering the most significant variables, thereby improving model robustness. Moreover, the VAE is particularly adept at handling heterogeneous data. It learns to normalize different types of variables internally and weigh the importance of each variable based on its relevance to the task, thus effectively fusing two different types of variables.
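The location-encoding and concatenation steps described above can be sketched in a few lines. This is a minimal illustration of random Fourier feature encoding, not the GeoCLIP encoder itself: the frequencies here are drawn at random rather than pretrained, the helper `scale_lonlat` is a hypothetical stand-in for the map projection, and the embedding sizes and coordinates are arbitrary.

```python
import math
import random


def make_rff_encoder(num_features=8, sigma=1.0, seed=0):
    """Build an encoder mapping a 2-D point to a 2*num_features embedding.

    The frequencies are sampled once and then held fixed, which mirrors
    how a frozen location encoder is used (assumption: Gaussian frequencies).
    """
    rng = random.Random(seed)
    freqs = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(num_features)]

    def encode(x, y):
        embedding = []
        for bx, by in freqs:
            phase = 2.0 * math.pi * (bx * x + by * y)
            embedding.extend((math.cos(phase), math.sin(phase)))
        return embedding

    return encode


def scale_lonlat(lon, lat):
    """Hypothetical stand-in for the projection step: rescale degrees to [-1, 1]."""
    return lon / 180.0, lat / 90.0


encode = make_rff_encoder()
location_emb = encode(*scale_lonlat(-100.0, 40.0))  # arbitrary illustrative point
health_emb = [0.2, 0.7, 0.1]             # placeholder for the health MLP output
fused_input = location_emb + health_emb  # concatenation that would feed the VAE
```

Because the frequencies are fixed after construction, two nearby coordinates always receive nearby embeddings, which is what lets the downstream model use location without memorizing raw coordinates.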
The framework of the SEP model is detailed as follows:
Encoder Network: The Encoder Network contains two parts. First, we encode the spatial and health variables into embeddings separately. Second, the VAE encoder encodes the concatenated embedding into a latent variable $z$ that follows a Gaussian distribution, characterized by its mean $\mu$ and standard deviation $\sigma$.
Decoder Network: The decoder network takes the sampled representation as input and predicts the reconstruction of the original data. Through training, the decoder aims to generate output points that closely resemble the input.
Classification Network: The learned embedding is fed into a classification layer (MLP) to predict the health outcome.
The total loss function combines reconstruction loss, Kullback-Leibler (KL) divergence, and cross-entropy loss, with the weights for each component controlled by hyperparameters $\alpha$, $\beta$, and $\gamma$. The reconstruction loss quantifies the dissimilarity between the input data and the reconstructed data, measured by Mean Squared Error. The KL divergence loss measures the disparity between the inferred latent space distribution and a prior standard Gaussian distribution $\mathcal{N}(0, I)$. The cross-entropy loss measures the error of the multiclass health outcome prediction. The loss formula is given by:
$$\mathcal{L} = \alpha \, \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2 \; - \; \beta \, \frac{1}{2} \sum_{j=1}^{J} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right) \; - \; \gamma \, \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{i,c} \log p_{i,c} \tag{1}$$
where $N$ is the number of training samples, $x_i$ represents the features of the $i$-th input, and $\hat{x}_i$ represents the corresponding reconstructed output. $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of the $j$-th dimension of the latent variable $z$, respectively, and $J$ represents the dimensionality of the latent space. $M$ is the number of classes. $y_{i,c}$ is a binary indicator of whether sample $i$ belongs to class $c$ (1 if true, 0 otherwise). $p_{i,c}$ is the predicted probability that sample $i$ belongs to class $c$ according to the model.
During training, we freeze the pretrained location encoder module to preserve its generalized location embeddings.
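As a worked illustration of how the three loss terms combine, the sketch below evaluates them for a single sample in plain Python. The weight names `w_rec`, `w_kl`, and `w_ce`, their default values, and the per-feature averaging in the reconstruction term are assumptions made for illustration; batch averaging and the actual training loop are omitted.

```python
import math


def reconstruction_loss(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_divergence(mu, sigma):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dims."""
    return -0.5 * sum(1.0 + math.log(s ** 2) - m ** 2 - s ** 2
                      for m, s in zip(mu, sigma))


def cross_entropy(y_onehot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_onehot, probs))


def total_loss(x, x_hat, mu, sigma, y_onehot, probs,
               w_rec=1.0, w_kl=1.0, w_ce=1.0):
    """Weighted sum of the three terms (weights are illustrative assumptions)."""
    return (w_rec * reconstruction_loss(x, x_hat)
            + w_kl * kl_divergence(mu, sigma)
            + w_ce * cross_entropy(y_onehot, probs))


# A perfect reconstruction with a latent code matching the prior leaves only
# the classification term.
loss = total_loss(x=[1.0, 2.0], x_hat=[1.0, 2.0],
                  mu=[0.0, 0.0], sigma=[1.0, 1.0],
                  y_onehot=[0, 1], probs=[0.5, 0.5])
```

In the example call the first two terms vanish, so the total equals the cross-entropy of a 50/50 prediction, i.e. approximately $\log 2$.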
3.2. Prototype-guided Counterfactual Explanation
After training the SEP model, our next aim is to understand the rationale behind its predictions. We achieve this by developing a novel XAI algorithm, named the Prototype-guided Counterfactual Explanation (PCE) algorithm, as illustrated in Panel 2 of Figure 1. In utilizing PCE to explain the SEP model, we initially employ the pretrained SEP model to predict outcomes for simulated OHCA patients, integrating the learned distribution into the simulated data. Subsequently, we merge these simulated patients with actual OHCA patients to generate candidates for counterfactual examples. The PCE algorithm then performs counterfactual generation from these candidates.
To illustrate how PCE works, we will first explain its key idea and then outline the steps involved. The simplest way to generate counterfactual examples is to use real-world data with opposite outcomes. However, real-world data are often noisy and contain many outliers, making them less suitable for counterfactuals. Our pre-trained SEP model, which has learned the mapping from x variables to y outcomes, can predict and filter out noise for y outcomes. We first simulate x variable values from the real-world data and use the pre-trained SEP model to predict y outcomes, creating a “noise-free” dataset. We then combine these noise-free data with the original data to form integrated data. In PCE, we first generate prototypes from this combined dataset to further reduce the effect of outliers. Then, for each query instance, we identify the closest prototypes with the opposite outcome as counterfactuals.
3.2.1. Data Simulation
We simulate (augment) the data by uniform sampling from the original dataset, ensuring that all simulated data points fall within the range of the original real-scenario values. The number of simulated samples matches the number of original samples, based on the assumption that the two are equally important. There are three main reasons for using simulated data and uniform sampling: (1) The real-scenario data we work with are sparse. Leveraging a pretrained prediction model for data augmentation helps interpolate the data distribution, potentially smoothing the decision boundary and facilitating the identification of valid counterfactual examples. (2) We aim to address the skewness in the original data. For example, AED use is highly skewed (over 90% of cases do not use AEDs). Directly using these data would generate counterfactuals that predominantly favor not using AEDs, limiting the potential for improving survival rates through AED advocacy. (3) We aim to avoid restricting the generated counterfactual examples to a specific geographic area or region of the data space. Since our collected dataset covers only certain geographic areas, uniform sampling ensures even geographic coverage for generality and equality.
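One way to read the simulation step is sketched below: each continuous variable is drawn uniformly between its observed minimum and maximum, each categorical variable is drawn uniformly over its observed levels, and a trained predictor then labels the simulated rows. The records, column names, and `toy_predict` are invented toy stand-ins, not the study's data or the SEP model.

```python
import random


def simulate_uniform(records, categorical, n_samples, seed=0):
    """Draw rows uniformly within the observed range or levels of each column."""
    rng = random.Random(seed)
    domains = {}
    for col in records[0]:
        values = [r[col] for r in records]
        if col in categorical:
            domains[col] = sorted(set(values))        # observed levels
        else:
            domains[col] = (min(values), max(values))  # observed range
    simulated = []
    for _ in range(n_samples):
        row = {}
        for col, dom in domains.items():
            if col in categorical:
                row[col] = rng.choice(dom)
            else:
                row[col] = rng.uniform(dom[0], dom[1])
        simulated.append(row)
    return simulated


def label_with_model(rows, predict):
    """Attach the predictor's outcome to each simulated row."""
    return [dict(row, outcome=predict(row)) for row in rows]


# Toy records (invented for illustration only).
real_records = [
    {"age": 40.0, "aed_used": "no", "lon": -100.4, "lat": 40.1},
    {"age": 67.0, "aed_used": "no", "lon": -100.1, "lat": 40.3},
    {"age": 81.0, "aed_used": "yes", "lon": -100.2, "lat": 40.0},
    {"age": 55.0, "aed_used": "no", "lon": -100.3, "lat": 40.2},
]


def toy_predict(row):
    """Hypothetical stand-in for the trained SEP model."""
    return "survived" if row["aed_used"] == "yes" else "expired"


simulated = simulate_uniform(real_records, {"aed_used"}, len(real_records))
labeled = label_with_model(simulated, toy_predict)
```

Because every observed level is equally likely under this sampling, the simulated `aed_used` values are balanced in expectation even though the toy records are skewed toward "no", which is the effect reason (2) describes.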
3.2.2. Prototype Selection
We calculate prototypes from both simulated and real-world data to serve as potential candidates for counterfactual examples. Identifying prototypes ensures that the generated counterfactuals are not outliers, providing viable intervention guidance. Here, we use the Maximum Mean Discrepancy-Critic (MMD-Critic) method (Molnar 2020) to identify prototypes. The MMD-Critic procedure consists of two phases:
Determining Prototype Count: As the number of prototypes increases, the Squared MMD decreases, indicating a closer match between the prototype distribution and the overall dataset distribution. To determine the optimal prototype count, we employ elbow analysis (Humaira and Rasyidah 2020). This involves plotting the curve of Squared MMD against the increasing number of prototypes and identifying the elbow point where the rate of decrease sharply changes. We use the number at this point as the final number of prototypes.
Identifying the Prototypes: Once the optimal number is determined, we set it as a parameter to the MMD-Critic model and get the results of prototype points.
The squared MMD is calculated as in Eq. 2:
$$\mathrm{MMD}^2 = \frac{1}{m^2} \sum_{i,j=1}^{m} k(z_i, z_j) \; - \; \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(z_i, x_j) \; + \; \frac{1}{n^2} \sum_{i,j=1}^{n} k(x_i, x_j) \tag{2}$$
In this equation, $k$ is a kernel function that measures the similarity between two data points. The parameters $m$ and $n$ denote the counts of prototypes $z$ and original data points $x$, respectively. The indices $i$ and $j$ range over the samples selected from the prototypes $z$ or the data points $x$. The prototypes are derived from a mix of real data and simulated data labeled by a pre-trained model. This blend ensures that the generated counterfactuals closely mirror real-world examples.
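To make the prototype-count analysis concrete, the sketch below computes the squared MMD between a prototype set and the data, and records how it changes as prototypes are added greedily. The RBF kernel, its bandwidth `gamma`, and the greedy selection rule are assumptions for illustration; the MMD-Critic implementation used in the study may differ.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel (assumed kernel choice)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def squared_mmd(prototypes, data, kernel=rbf_kernel):
    """Squared MMD between the prototype set and the full data set."""
    m, n = len(prototypes), len(data)
    proto_term = sum(kernel(zi, zj) for zi in prototypes for zj in prototypes) / (m * m)
    cross_term = sum(kernel(z, x) for z in prototypes for x in data) / (m * n)
    data_term = sum(kernel(xi, xj) for xi in data for xj in data) / (n * n)
    return proto_term - 2.0 * cross_term + data_term


def greedy_prototypes(data, max_count, kernel=rbf_kernel):
    """Add one prototype at a time, each time choosing the point that yields
    the lowest squared MMD; also return the MMD curve for elbow analysis."""
    chosen, curve = [], []
    remaining = list(range(len(data)))
    for _ in range(min(max_count, len(data))):
        best = min(
            remaining,
            key=lambda i: squared_mmd([data[j] for j in chosen + [i]], data, kernel),
        )
        chosen.append(best)
        remaining.remove(best)
        curve.append(squared_mmd([data[j] for j in chosen], data, kernel))
    return chosen, curve


# Two well-separated toy clusters.
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
chosen, curve = greedy_prototypes(data, max_count=4)
```

Plotting `curve` against the prototype count gives the curve on which the elbow point is read off. On this toy data the second prototype comes from the second cluster, since covering both clusters lowers the discrepancy more than picking a near-duplicate.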
3.2.3. Counterfactual Example Generation
The next step involves generating counterfactual examples for each query instance. The problem is formulated as follows. Let the set of query instances be $Q = \{q_i\}$, $i = 1, \dots, n_q$, where $n_q$ is the number of query instances. The set of prototype instances is $P = \{p_j\}$, $j = 1, \dots, n_p$, where $n_p$ is the number of prototypes. The set of counterfactual examples for all query instances is $C = \{C_i\}$, $i = 1, \dots, n_q$, where $C_i$ represents the counterfactual example set for query instance $q_i$. For each query instance $q_i$, the counterfactual example set is $C_i = \{c_{il}\}$, $l = 1, \dots, n_{cf}$, where $n_{cf}$ is the number of counterfactual examples we wish to generate. The total number of counterfactual examples for the query set is $n_q \times n_{cf}$. Given the set of query instances $Q$ and the set of prototype instances $P$, the goal is to find the set of counterfactual instances $C$ for all query instances such that the outcomes of the counterfactual instances are opposite to the outcomes of the query instances and are realistic.
To ensure the generated counterfactuals are realistic, we follow three principles from Mothilal et al. (2020): feasibility, proximity, and diversity. Feasibility ensures that the counterfactual examples generated are practical and applicable to individual circumstances. When identifying counterfactual examples, we consider certain variables, such as race or gender, to be immutable. A prototype $p$ is retained as a candidate counterfactual example for a query instance $q$ only if it agrees with $q$ on every variable $v$ in the list of immutable variables: if $v$ is a categorical variable, the value of $v$ in $p$ must equal that in $q$; if $v$ is a continuous variable, the value of $v$ in $p$ must fall within the same quantile as that in $q$. Proximity refers to the notion that the generated counterfactual examples should closely resemble real data points. To achieve this, counterfactual examples are searched in a high-dimensional space using a k-d tree (k-dimensional tree) (Ram and Sinha 2019) data structure. A k-d tree is particularly useful for nearest neighbor searches, as it divides the feature space into subspaces to quickly eliminate large portions of the search space. Our k-d tree is constructed from prototypes derived from both real-world scenarios and simulated data, ensuring that the generated counterfactual examples maintain proximity to the original data points. Diversity in this context refers to the flexibility to generate multiple counterfactual examples for each query instance as required. In real-world scenarios, some variables, such as income, are not easy to change for certain people due to various constraints. Therefore, providing multiple counterfactual examples allows for a broader exploration of possibilities to identify actionable interventions.
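The feasibility rule can be sketched as a small predicate. The choice of quartiles as the quantile bins, the helper name `is_feasible`, and the toy values are assumptions for illustration; the text does not state how many quantile bins are used.

```python
import bisect
import statistics


def is_feasible(prototype, query, immutable, categorical, cut_points):
    """Return True if the prototype matches the query on every immutable variable.

    Categorical variables must be equal; continuous variables must fall in the
    same quantile bin, where bins are defined by precomputed cut points.
    """
    for var in immutable:
        if var in categorical:
            if prototype[var] != query[var]:
                return False
        else:
            cuts = cut_points[var]
            proto_bin = bisect.bisect_right(cuts, prototype[var])
            query_bin = bisect.bisect_right(cuts, query[var])
            if proto_bin != query_bin:
                return False
    return True


# Toy values (invented): quartile cut points estimated from observed ages.
observed_ages = [20, 30, 40, 50, 60, 70, 80, 90]
cut_points = {"age": statistics.quantiles(observed_ages, n=4)}

query = {"age": 62, "sex": "F", "aed_used": "no"}
same_group = {"age": 70, "sex": "F", "aed_used": "yes"}
other_sex = {"age": 70, "sex": "M", "aed_used": "yes"}
other_age_bin = {"age": 40, "sex": "F", "aed_used": "yes"}

immutable = ["sex", "age"]
categorical = {"sex"}
```

Only `same_group` passes the check here: it differs from the query in the mutable variable `aed_used` but shares the query's sex and age quartile, so it describes a change the individual could in principle experience.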
The process for finding counterfactual examples for a specific query instance $q$ is as follows. Note that query instances are limited to data points with an expired status, since we aim to observe the change from expired to survived status.
Once we select prototypes from the combined dataset (simulated and real-scenario data), our initial step is to retain only the prototypes that belong to the opposite class of the query instance $q$.
For the filtered prototypes $P'$, we construct a k-d tree instance, denoted as $T$, using both the filtered prototypes $P'$ and the query instance $q$.
Initialize the number of neighbors, denoted as $n_{nn}$, to be equal to the number of counterfactual examples, represented by $n_{cf}$.
- Iterate while $n_{nn} \le |P'|$, where $P'$ is the set of filtered prototypes.
- We query the k-d tree $T$ to find the $n_{nn}$ nearest neighbors of $q$, and denote the preliminary generated counterfactual examples as NN.
- We check whether each example in NN satisfies all variable feasibility constraints and keep the indices of those that do as NN.indices.
- If the number of counterfactual examples in NN.indices is less than $n_{cf}$, we increment the number of neighbors $n_{nn}$ by 1.
- We stop when the number of counterfactual examples in NN.indices reaches $n_{cf}$ or the number of neighbors $n_{nn}$ exceeds the number of filtered prototypes $|P'|$.
Return the examples indexed by NN.indices as the counterfactual example set for $q$.
Details for how to find all counterfactual examples for all query instances can be found in Algorithm 1.
Algorithm 1.
Counterfactual Generation
| Input: query_instance_set, prototype_set | |
| Parameter: num_cfs, variables_not_vary | |
| Output: The set of counterfactual examples | |
| 1: | Initialize empty list lst_CF |
| 2: | Initialize num_cfs |
| 3: | prototypes_select ← prototype_set[prototype.class ≠ query.class] |
| 4: | for each query in query_instance_set do |
| 5: | data ← concat[prototypes_select, query] |
| 6: | k_dTree ← Build_k_d_tree(data) |
| 7: | num_neighbors ← num_cfs |
| 8: | while num_neighbors ≤ len(prototypes_select) do |
| 9: | NN ← k_dTree.search(query, num_neighbors) |
| 10: | NN.indices = constraints_check(NN, query, variables_not_vary) |
| 11: | if len(NN.indices) < num_cfs then |
| 12: | num_neighbors ← num_neighbors + 1 |
| 13: | else |
| 14: | lst_CF ← concat[lst_CF, NN.indices] |
| 15: | break |
| 16: | end if |
| 17: | end while |
| 18: | Cannot find enough counterfactuals |
| 19: | end for |
| 20: | Create counterfactuals using indices from lst_CF |
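The search loop of Algorithm 1 can be sketched for a single query instance as follows. This is an illustrative sketch under stated assumptions: a sorted list of Euclidean distances stands in for the k-d tree query, the query itself is not inserted into the search structure, and the function name and the `is_feasible` callback signature are hypothetical.

```python
import math


def find_counterfactuals(query, prototypes, num_cfs, is_feasible):
    """Return up to num_cfs feasible opposite-class prototypes nearest to the query.

    query: (features, label); prototypes: list of (features, label).
    is_feasible(prototype_features, query_features) -> bool is a
    caller-supplied constraint check (hypothetical signature).
    """
    q_feat, q_label = query
    # Keep only prototypes from the opposite class.
    candidates = [feat for feat, label in prototypes if label != q_label]
    # Stand-in for the k-d tree: rank candidates by Euclidean distance once.
    ranked = sorted(candidates, key=lambda feat: math.dist(feat, q_feat))

    k = num_cfs
    found = []
    while k <= len(ranked):
        nearest = ranked[:k]  # the k nearest neighbours
        found = [feat for feat in nearest if is_feasible(feat, q_feat)]
        if len(found) >= num_cfs:
            return found[:num_cfs]
        k += 1  # not enough feasible neighbours yet: widen the search
    # Result of the widest query tried (empty if there were fewer
    # candidates than num_cfs, in which case the loop never ran).
    return found


# Toy example (invented): the third feature plays the role of an immutable variable.
same_third = lambda p, q: p[2] == q[2]
query = ((0.0, 0.0, 1), 0)
prototypes = [((1.0, 0.0, 0), 1), ((2.0, 0.0, 1), 1),
              ((3.0, 0.0, 1), 1), ((0.5, 0.0, 1), 0)]
result = find_counterfactuals(query, prototypes, 2, same_third)
```

In the toy example the closest opposite-class prototype violates the constraint, so the search widens from two to three neighbours before it finds two feasible counterfactuals.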
3.2.4. Variable Importance and Coefficient
Variable Importance
As we generate counterfactual examples for each query instance, we observe that some variables have changed more times than others. To quantify the effect of different variables in the counterfactual explanation scenario, we define both individual variable importance and global variable importance.
- Individual Importance: The individual variable importance is calculated for all variables with respect to a single example in the original dataset. We define four different individual importance scores based on each variable's type, its value, and its positive or negative impact on the outcome. The calculated individual importance is normalized by dividing by the number of counterfactual examples to ensure they sum up to 1.
- Individual Importance for Categorical Variables: For each categorical variable $v$, the individual importance, denoted as $I_{cat}(v)$, is calculated as follows:

$$I_{cat}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c \neq v_q\right) \tag{3}$$

where $C$ represents the set of all counterfactual examples, $v_q$ is the value of the categorical variable $v$ in the query instance $q$, $v_c$ is the value of $v$ in a counterfactual example $c$, $\mathbb{1}(\cdot)$ is an indicator function that returns 1 if the categorical value $v_c$ does not match $v_q$ and 0 otherwise, and $N$ is the total number of counterfactual examples.
- Individual Importance for Numerical Variables: For each numerical variable $v$, the individual importance, denoted as $I_{num}(v)$, is calculated as follows:

$$I_{num}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(\left|Q(v_q) - Q(v_c)\right| > 1\right) \tag{4}$$

where $C$ is the set of all counterfactual examples, $v_q$ is the value of the numerical variable $v$ in the query instance $q$, $v_c$ is the value of $v$ in a counterfactual example $c$, $Q(v_q)$ and $Q(v_c)$ represent the quantiles of $v$ in the query instance and the counterfactual example respectively, and $\mathbb{1}(\cdot)$ is an indicator function that returns 1 if the absolute difference in quantiles between the query instance and the counterfactual example is greater than one quantile, and 0 otherwise. $N$ is the total number of counterfactual examples.
- Individual Importance per Categorical Variable Class: For each categorical variable $v$, the individual importance of each class $k$, denoted as $I_{class}(v, k)$, is the frequency with which this class appears in the counterfactual examples while differing from the class of the query instance, normalized by the total number of counterfactual examples:

$$I_{class}(v, k) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c = k \wedge v_c \neq v_q\right) \tag{5}$$

Let $C$ be the set of all counterfactual examples, $v_q$ the class of the categorical variable $v$ in the query instance $q$, $v_c$ the class in a counterfactual example $c$, and $k$ a possible class. The indicator function returns 1 if $v_c$ equals $k$ and differs from $v_q$, and 0 otherwise. $N$ denotes the total number of counterfactual examples.
- Divergent Individual Importance for Numerical Variables: The divergent individual importance of a numerical variable $v$, denoted as $I^{+}(v)$ and $I^{-}(v)$, is calculated by comparing the value $v_q$ in the query instance $q$ against the values $v_c$ in the counterfactual examples $c$. For negative outcomes, $I^{+}(v)$ measures the proportion of $c$ where $v_c > v_q$, and for positive outcomes, where $v_c < v_q$. Conversely, $I^{-}(v)$ is measured under the opposite conditions. This approach assesses how variable changes impact outcomes differently. For a query instance with a negative outcome:

$$I^{+}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c > v_q\right), \qquad I^{-}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c < v_q\right) \tag{6}$$

For a query instance with a positive outcome:

$$I^{+}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c < v_q\right), \qquad I^{-}(v) = \frac{1}{N}\sum_{c \in C} \mathbb{1}\left(v_c > v_q\right) \tag{7}$$

Here, $\mathbb{1}(\cdot)$ is an indicator function that returns 1 if the condition is true and 0 otherwise, and $N$ is the total number of counterfactual examples. The conditions defining $I^{+}(v)$ and $I^{-}(v)$ reverse depending on whether the outcome is positive or negative.
- Global Importance (Georgia state level in the case study): Global importance, denoted as $I_{global}(v)$, is determined by aggregating the individual importances of each variable across all examples (query instances) in the original dataset. The individual importances are summed across all query instances, and the sum is divided by the number of query instances.
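The importance scores defined in this section can be illustrated with a short sketch. The function names are hypothetical, the numerical variant assumes that quantile indices have already been computed, and the divergent variant simply reports the shares of counterfactuals with larger and smaller values; which share counts as the positive or negative importance depends on the outcome of the query instance.

```python
def categorical_importance(query_value, cf_values):
    """Share of counterfactuals whose category differs from the query."""
    return sum(v != query_value for v in cf_values) / len(cf_values)


def numerical_importance(query_bin, cf_bins):
    """Share of counterfactuals whose quantile index moved by more than one."""
    return sum(abs(b - query_bin) > 1 for b in cf_bins) / len(cf_bins)


def class_importance(query_value, cf_values, target_class):
    """Share of counterfactuals that switched to target_class."""
    hits = sum(v == target_class and v != query_value for v in cf_values)
    return hits / len(cf_values)


def divergent_importance(query_value, cf_values):
    """(share with larger value, share with smaller value) than the query."""
    n = len(cf_values)
    larger = sum(v > query_value for v in cf_values) / n
    smaller = sum(v < query_value for v in cf_values) / n
    return larger, smaller


def global_importance(individual_scores):
    """Average of one variable's individual importances over all queries."""
    return sum(individual_scores) / len(individual_scores)


# "Who use CPR" row of Table 5: query value 3, five counterfactuals.
who_cpr_query = 3
who_cpr_cfs = [1, 3, 3, 2, 1]  # unchanged entries keep the query value
changed_share = categorical_importance(who_cpr_query, who_cpr_cfs)
family_share = class_importance(who_cpr_query, who_cpr_cfs, target_class=1)
```

For the "Who use CPR" row of Table 5, three of the five counterfactuals change the value and two of them switch to class 1, so the two scores are 3/5 and 2/5.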
Coefficient
The coefficient for an ordinal variable is determined by graphically representing a smoothed linear relationship between the ordinal values of the variable (x-axis) and their corresponding global importance values (y-axis). The slope of this line is used as the coefficient for the ordinal variable. Conversely, the coefficient for a continuous variable is derived by estimating the net effect of the positive and negative individual importances.
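A minimal sketch of the two coefficient calculations follows. It rests on two assumptions that the text does not confirm: the "smoothed linear relationship" is approximated by an ordinary least-squares line, and the "net effect" for a continuous variable is taken as the simple difference between the positive and negative importances. Function names and the toy numbers are illustrative only.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def net_effect(positive_importance, negative_importance):
    """Signed coefficient for a continuous variable (assumed: a difference)."""
    return positive_importance - negative_importance


# Toy ordinal variable: importance falls steadily as the level rises.
levels = [0, 1, 2, 3]
importances = [0.40, 0.30, 0.20, 0.10]
ordinal_coefficient = slope(levels, importances)
continuous_coefficient = net_effect(0.1, 0.5)
```

A negative slope or a negative net effect would then be read as the variable being negatively associated with survival, which matches how the coefficients are interpreted in the results.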
3.2.4.1. Geographical Area-level Variable Importance and Coefficient.
In addition to investigating the impact of spatial variables on health outcomes, this study examines the spatial variation in the relationship between health variables and outcomes. To achieve this, we segment the study area according to specific levels of administrative boundaries. For each distinct zone, we evaluate both the global importance and the coefficient for every variable. In the case study, the geographical-area level is set at the county level.
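The area-level aggregation can be sketched as a group-by-and-average over individual importances. The record layout, the function name, and the county labels below are placeholders rather than the authors' data structures.

```python
from collections import defaultdict


def importance_by_area(records):
    """Average individual importance per (area, variable).

    records: iterable of (area, variable, individual_importance) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for area, variable, score in records:
        sums[(area, variable)] += score
        counts[(area, variable)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy records (invented values) for two placeholder counties.
records = [
    ("County A", "age", 0.2),
    ("County A", "age", 0.6),
    ("County B", "age", 0.8),
    ("County A", "duration", 1.0),
]
area_importance = importance_by_area(records)
```

Each key of the result corresponds to one polygon-variable pair, which is the quantity a county-level importance map would display.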
3.3. Evaluation Metric
In our study, we utilize two sets of evaluation metrics. To assess the performance of SEP, we employ standard metrics, including Precision, Recall, F1 score, Area Under the Receiver Operating Characteristic Curve (AUCROC), and Area Under the Precision-Recall Curve (AUCPR), to compare it against baseline models. To evaluate the quality of the generated counterfactual examples, we adopt proximity-based metrics, as recommended by Lucic et al. (2022), to quantify the deviation between the original dataset and the generated counterfactual examples. Our counterfactual explanation model is compared with DiCE, a recently published model, as the baseline (Mothilal et al. 2020).
Here, we formulate three proximity-based metrics. The first metric, mean distance $d_{mean}$, represents the average distance of the generated counterfactual examples from the original input. It is computed by first measuring the distance between a single query instance and each of its counterfactual examples, then averaging these distances over the counterfactuals of that instance, and finally averaging the per-instance means across all query instances. The second metric, mean relative distance $d_{Rmean}$, is computed by taking, for each query instance, the ratio of the distance obtained with our method to the distance obtained with the baseline method, and then averaging these ratios across all query instances in the dataset. A value less than 1 indicates that our method generates examples that are, on average, closer to the original input than those produced by the baseline. Finally, we quantify how often the distance from a query instance to its counterfactual examples generated by our method is smaller than the corresponding distance under the baseline method, denoted as $\%_{closer}$. A $\%_{closer}$ greater than 0.5 indicates that our model is better than the baseline.
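The three proximity metrics (mean distance, mean relative distance, and the share of query instances for which our method is closer) can be sketched as follows. Function and variable names are illustrative, and Euclidean distance is used as the default only for the example; any distance function with the same call shape could be passed in.

```python
import math


def mean_cf_distance(query, counterfactuals, dist=math.dist):
    """Average distance from one query to its counterfactual examples."""
    return sum(dist(query, cf) for cf in counterfactuals) / len(counterfactuals)


def proximity_metrics(queries, ours, baseline, dist=math.dist):
    """Return (mean distance, mean relative distance, share closer).

    ours[i] and baseline[i] are the counterfactual lists for queries[i].
    """
    d_ours = [mean_cf_distance(q, cfs, dist) for q, cfs in zip(queries, ours)]
    d_base = [mean_cf_distance(q, cfs, dist) for q, cfs in zip(queries, baseline)]
    n = len(queries)
    d_mean = sum(d_ours) / n
    # Ratio per query instance first, then the average of the ratios.
    d_rmean = sum(a / b for a, b in zip(d_ours, d_base)) / n
    share_closer = sum(a < b for a, b in zip(d_ours, d_base)) / n
    return d_mean, d_rmean, share_closer


# Toy example (invented points) with two query instances.
queries = [(0.0, 0.0), (0.0, 0.0)]
ours = [[(1.0, 0.0), (3.0, 0.0)], [(4.0, 0.0)]]
baseline = [[(4.0, 0.0)], [(2.0, 0.0)]]
d_mean, d_rmean, share_closer = proximity_metrics(queries, ours, baseline)
```

Because the relative distance averages per-instance ratios, it generally differs from the ratio of the two overall mean distances.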
4. Experiment Setup
4.1. Data Source
This study utilizes Georgia cardiac arrest incident data (Figure 2) recorded from 2019 to 2021. The data were obtained from the Georgia Department of Public Health, reported by the Emergency Medical Service (EMS), and collected by the National Emergency Medical Services Information System. The data collection process was approved by the university’s Institutional Review Board. A total of 57,223 individual cardiac arrest patients and 24 variables are reported in this dataset (Table 1). The variable Patient Outcome at End of EMS Event is used as the target outcome variable. After data processing, a total of 5,385 cardiac arrest patients with geographic coordinates remain for analysis.
Figure 2.
Out-of-Hospital Cardiac Arrest Distribution in Georgia from 2019 to 2021
Table 1.
Variable Description of Georgia Out-of-Hospital Cardiac Arrest Dataset
| Category | Variable |
|---|---|
| Incident Details | Incident Date, Scene Latitude, Scene Longitude, Incident Location Type |
| Patient Information | Patient Gender, Patient Race, Patient Age |
| Response Details | Response Beginning Vehicle Odometer, Response On Scene Vehicle Odometer, Incident Call Date Time, Incident Unit Arrived On Scene Date Time |
| Initial Assessment | Initial Patient Acuity |
| Cardiac Arrest Details | Cardiac Arrest Etiology, Cardiac Arrest Indications Resuscitation Attempted By EMS, Cardiac Arrest Witnessed By Whom |
| CPR and AED Details | CPR Provided or Not Prior to EMS Arrival, Who Provided CPR Prior to EMS Arrival, AED Used or Not Prior to EMS Arrival, Who Used AED Prior to EMS Arrival, Types of CPR Provided List |
| Patient Outcome | Patient Outcome at End of EMS Event, Medical Device Type of Shock, Outcome Emergency Department Disposition Description, Outcome Hospital Disposition Description |
AED: Automated External Defibrillator, CPR: Cardiopulmonary Resuscitation
4.2. Data Preprocessing
Data processing steps include: (1) The location type of cardiac arrest incidents is classified using the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) provided by the CDC (Centers for Disease Control and Prevention). (2) Racial categorization is refined according to the standards of the Office of Management and Budget. (3) Outliers are identified and removed. (4) Data integrity is maintained by excluding variables with more than 35% missing values; we then remove examples that still have missing values. (5) For data analysis, nominal variables are processed using one-hot encoding, while ordinal variables use ordinal encoding. Additionally, all numerical variables are standardized.
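Steps (4) and (5) can be sketched in plain Python as below. This is an illustrative sketch only: the helper names and the toy rows are invented, missing values are represented as `None`, standardization uses the population standard deviation as an assumption, and ordinal encoding and outlier removal are omitted.

```python
from statistics import mean, pstdev


def drop_sparse_columns(rows, threshold=0.35):
    """Drop variables whose share of missing (None) values exceeds threshold."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= threshold]
    return [{c: r[c] for c in keep} for r in rows]


def drop_incomplete_rows(rows):
    """Remove examples that still contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot(value, categories):
    """One-hot encode a nominal value against a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def standardize(values):
    """Zero-mean, unit-variance scaling (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Toy rows (invented): "shock" is missing in 75% of rows, "age" in 25%.
rows = [
    {"age": 70, "location": "home", "shock": None},
    {"age": 50, "location": "street", "shock": None},
    {"age": None, "location": "home", "shock": "AED"},
    {"age": 60, "location": "home", "shock": None},
]
clean = drop_incomplete_rows(drop_sparse_columns(rows))
ages_scaled = standardize([r["age"] for r in clean])
location_encoded = [one_hot(r["location"], ["home", "street"]) for r in clean]
```

The order matters: sparse variables are dropped first so that a mostly empty column does not cause otherwise complete examples to be discarded.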
Fifteen variables are finally used, including latitude, longitude, etiology, gender, race, age, initial patient acuity, who witnessed the cardiac arrest, CPR provided prior to EMS arrival, who provided CPR prior to EMS, AED use prior to EMS arrival, types of CPR provided, duration from call to EMS arrival, location type, and patient outcome at the end of the EMS event. The patient outcome at the end of the EMS event is the targeted outcome variable.
4.3. SpaCE Model Training and Counterfactual Explanation
4.3.1. Spatially Explicit Health Outcome Predictor Training
After data pre-processing, our analysis includes 5,385 patients with OHCA, each with OHCA coordinates. The survival outcomes for these patients are classified into three categories: expired, ongoing resuscitation, and survived. We train our SEP model using 80% of the data and test it on the remaining 20%. It is important to mention that we use three labels when training the SEP model, but only two labels (expired and survived) are used when conducting the counterfactual explanation. We exclude ongoing resuscitation from our analysis as it represents an intermediate state, which complicates the generation of valid counterfactuals. Instead, we focus on generating examples where individuals transition directly from expired to survived.
4.3.2. Prototype-Guided Counterfactual Explanation Estimation
After obtaining a well-trained SEP model, we used it to generate pseudo data (5,385 examples) and combined it with real-scenario data (5,385 examples) to create a dataset with 10,770 examples. We filter the data and obtain 8,248 final examples that have either expired or survived labels. Using the PCE algorithm, we identify a total of 400 prototypes as counterfactual candidates. The query instances (4,167 examples) are patients who have expired within the combined dataset. We generate five counterfactual examples from prototypes for each query instance. We chose five because it is computationally efficient and provides sufficient diversity in options to effectively guide interventions for changing the survival status of a cardiac arrest patient. Based on the generated counterfactual examples, we calculate the global importance and global coefficient for each variable. Additionally, we create a dependence plot with the x-axis representing the variable class and the y-axis showing the variable class importance. We also estimate the importance and coefficients at the county level to assess the spatial variation in the effect and direction of each variable on health outcomes.
5. Result
5.1. Comparison between SEP and traditional machine learning models
To illustrate the effectiveness of our predictive model, SEP, in capturing spatial effects, we compared its performance with the state-of-the-art spatially explicit model, SpatialRF, in predicting OHCA survival outcomes. SpatialRF (Benito 2021, Wright and Ziegler 2015) is a recently developed machine learning approach that incorporates spatial information by calculating a distance matrix based on spatial coordinates and training data, aiming to minimize spatial autocorrelation in model residuals. Our SEP model achieved an AUCROC score of 0.682, outperforming SpatialRF’s score of 0.619 (Table 2). To further emphasize the importance of spatial variables in outcome prediction, we also evaluated our model against traditional machine learning models that do not include spatial variables. These models included Random Forest (RF), Gradient Boosted Decision Trees (GBDT), LightGBM, XGBoost, and Support Vector Classifier (SVC). Our SEP model achieved the best performance on every metric, as shown in Table 2. Notably, it achieved the highest AUCROC score of 0.682, surpassing the closest traditional model, GBDT, which scored 0.622.
Table 2.
Evaluation result for OHCA survival outcome prediction across multiple models
| Metrics | RandomForest | GBDT | LightGBM | XGBoost | SVC | SpatialRF | SEP |
|---|---|---|---|---|---|---|---|
| Precision | 0.384 | 0.415 | 0.417 | 0.400 | 0.405 | - | 0.440 |
| Recall | 0.501 | 0.583 | 0.579 | 0.570 | 0.576 | - | 0.625 |
| F1 | 0.435 | 0.485 | 0.485 | 0.470 | 0.476 | - | 0.517 |
| AUCROC | 0.571 | 0.622 | 0.616 | 0.599 | 0.609 | 0.619 | 0.682 |
| AUCPR | 0.386 | 0.444 | 0.447 | 0.418 | 0.419 | - | 0.469 |
The bold numbers indicate the highest performance per metric, the numbers underlined represent the second-highest performance. SEP represents the Spatially Explicit Health Outcome Prediction model. The hyphen (−) means the metric calculation in SpatialRF is not provided.
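The relative improvements quoted for SEP can be checked directly from the AUCROC row of Table 2. The helper name below is illustrative; the three scores are copied from the table.

```python
def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100


# AUCROC values from Table 2.
auc = {"SEP": 0.682, "SpatialRF": 0.619, "GBDT": 0.622}
vs_spatialrf = relative_improvement(auc["SEP"], auc["SpatialRF"])
vs_gbdt = relative_improvement(auc["SEP"], auc["GBDT"])
print(f"{vs_spatialrf:.1f}% over SpatialRF, {vs_gbdt:.1f}% over GBDT")
```

With the rounded table values this gives about 10.2% over SpatialRF and roughly 9.6% over GBDT. The latter is close to the 9.7% quoted in Section 6.1; the table values are rounded to three decimals, which may account for the small gap.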
5.2. Comparison between PCE and DiCE
We compare our method, PCE, with the gradient-based DiCE method, a recently published counterfactual explanation framework (Mothilal et al. 2020). This comparison uses the three evaluation metrics stated in Section 3.3 across three types of distance (Euclidean, Manhattan, and Cosine). As indicated in Table 3, the counterfactual examples generated by our method, PCE, are closer to actual OHCA scenarios than those produced by DiCE. Specifically, the mean distance $d_{mean}$ obtained with our method is smaller than that of DiCE for all three distance types. Furthermore, the mean relative distance $d_{Rmean}$, representing the distance from our method relative to that from DiCE, stays below 1 for all distance types, indicating that our method consistently outperforms DiCE on this metric. Additionally, the proportion $\%_{closer}$ exceeds 50% for both the Euclidean and Manhattan distances (0.62 in each case), suggesting that our method surpasses DiCE in these metrics; for the Cosine distance it is 0.42.
Table 3.
Evaluation Comparison between PCE and DiCE
| Metric | Method | Euclidean | Manhattan | Cosine |
|---|---|---|---|---|
| Mean distance | DiCE | 2.48 | 7.72 | 0.26 |
| Mean distance | PCE | 1.24 | 2.80 | 0.17 |
| Mean relative distance | PCE/DiCE | 0.54 | 0.42 | 0.79 |
| Proportion closer | PCE<DiCE | 0.62 | 0.62 | 0.42 |
The bold number shows the best performance. The smaller the distance metric, the better.
Figure 3 shows histograms for all three types of distance whose x-axes represent the calculated mean distance between each query instance and its corresponding counterfactual examples, and y-axes represent the count of query instances within certain distance bins. A closer inspection of this distribution histogram reveals that the distance between the counterfactual examples generated from PCE and the original real-scenario dataset is substantially smaller compared with those from DiCE.
Figure 3.
Distance distribution map between PCE and DiCE
5.3. Results from SpaCE on OHCA Survival Outcome Prediction
5.3.1. Latent Embedding Visualization
To investigate the potential relationship between the learned latent embedding and geographical locations, we employ Uniform Manifold Approximation and Projection (UMAP) (McInnes et al. 2018) to project the high-dimensional embedding into a two-dimensional space, and then compare it with the geographical distribution of the data points. Given the substantial number of data points, we adopt a sampling approach. Specifically, we sample 10 data points from each of the 10 counties with the highest number of points. We then plot the learned latent embeddings of these 100 points, colored by county, and compare this embedding space with the real-scenario spatial space. Figure 4 shows the visualization results. In panel (a), we observe two distinct groups. The larger group on the left shows a clear correspondence: data points that are close in the embedding space are also close in geographic space, and vice versa. The smaller group on the right displays some discrepancies, likely due to the influence of other health-related variables on the embedding space. Overall, there is general alignment between the embedding and the spatial coordinates, indicating effective integration of spatial information into the latent space.
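The sampling step that precedes the projection can be sketched as follows. Only the county-based sampling is shown; the UMAP projection itself is not reproduced. The function name, the seed, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def sample_top_counties(points, top_n=10, per_county=10, seed=0):
    """Pick per_county random points from each of the top_n largest counties.

    points: list of (county, embedding) pairs.
    """
    counts = Counter(county for county, _ in points)
    top = [county for county, _ in counts.most_common(top_n)]
    by_county = defaultdict(list)
    for county, embedding in points:
        by_county[county].append(embedding)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {c: rng.sample(by_county[c], min(per_county, len(by_county[c])))
            for c in top}


# Toy data (invented): 12 counties, county "c{i}" has 10 + i points.
points = [(f"c{i}", (i, j)) for i in range(12) for j in range(10 + i)]
sampled = sample_top_counties(points)
total_sampled = sum(len(v) for v in sampled.values())
```

With ten counties and ten points each, the sample contains the 100 points that are then projected and colored by county.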
Figure 4.
Comparison between learned embedding space and real scenario spatial space.
5.3.2. Generated Counterfactual Examples with Feasibility and Diversity
To assess the distinction between feasible and unfeasible counterfactual examples, we conducted a feasibility scenario simulation; the results are displayed in Table 4. We held certain demographic variables, such as gender and race, constant and modified only the other factors. In the unfeasible scenario, the counterfactual examples erroneously altered immutable characteristics, changing race from 6 (white) to 5 (other race) and 3 (Hispanic or Latino), and gender from 1 (male) to 0 (female). Conversely, the feasible counterfactual examples retained these variables and altered only other variables. To demonstrate the diversity of our method, any reasonable number of generated counterfactual examples can be selected to provide various options for OHCA patients.
Table 4.
Counterfactual examples generated from unfeasible and feasible scenario
| Variable | Query instance | Unfeasible cf1 | Unfeasible cf2 | Feasible cf1 | Feasible cf2 |
|---|---|---|---|---|---|
| latitude | 33.40 | 34.32 | 33.66 | 33.33 | 30.85 |
| longitude | −84.60 | −85.09 | −84.41 | −82.54 | −83.34 |
| Types of CPR | 1.0 | - | 3.0 | 2.0 | 2.0 |
| Etiology | 6.0 | 3.0 | - | - | - |
| Gender | 1.0 | 0.0 | 0.0 | - | - |
| Race | 6.0 | 5.0 | 3.0 | - | - |
| Location type | 5.0 | 3.0 | - | - | 2.0 |
| Patient initial acuity | 2.0 | 0.0 | - | - | - |
| Witnessed by whom | 0.0 | - | 1.0 | 2.0 | - |
| CPR used or not | 1.0 | - | - | - | - |
| Who use CPR | 3.0 | - | 1.0 | 1.0 | - |
| AED used or not | 0.0 | - | - | - | - |
| Age | 0.0 | 3.0 | 1.0 | 2.0 | 1.0 |
| EMS response time (Duration) | 3.0 | - | 0.0 | - | 2.0 |
| Survival outcome | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Gender and Race are the immutable variables that should be kept constant and unchanged. Hyphen (−) means the value in the generated counterfactual example is unchanged compared with the value in the query instance. Survival outcome (0: expired, 1: survived). cf: counterfactual example
5.3.3. Individual Variable Importance and Coefficient
To illustrate an individual-level result, we randomly selected an OHCA patient from our dataset, generated five counterfactual examples (Table 5), and calculated the importance (Figure 5(a)) and coefficient (Figure 5(b)) of each variable for this patient. Because coefficients can only be calculated for ordinal or continuous variables, we excluded the nominal variables (location type and etiology) from Figure 5(b). The importance scores revealed that variables such as geographic location (longitude and latitude), EMS response time (duration), and the person administering CPR (who used CPR) strongly influenced the shift from a non-survival to a survival outcome. From the coefficients depicted in Figure 5(b), we observed that duration is strongly negatively correlated with survival outcomes, indicating that shorter EMS response times are crucial. As Table 5 shows, four of the five counterfactual examples suggested reducing EMS response time from category 3 (more than 13 minutes) to category 2 (less than 13 minutes), underscoring the importance of timely EMS intervention. Furthermore, three of the five counterfactual examples recommended changing the CPR provider (who used CPR) from a professional (category 3) to a family member (category 1) or a community access responder (category 2), likely because these responders can reach OHCA patients faster than EMS personnel in such scenarios. In this instance, the variables "AED usage" and "initial patient acuity" have zero importance because our model tends to maximize the individual’s survival chance by keeping these two variables constant when generating counterfactual examples, highlighting our method’s ability to develop personalized intervention strategies tailored to individual cases.
Table 5.
Generated counterfactual examples for a single OHCA patient
| Variable | Query instance | cf1 | cf2 | cf3 | cf4 | cf5 |
|---|---|---|---|---|---|---|
| latitude | 33.40 | 33.33 | 30.85 | 33.98 | 30.77 | 33.47 |
| longitude | −84.60 | −82.54 | −83.34 | −83.71 | −83.76 | −84.18 |
| Types of CPR | 1.0 | 2.0 | 2.0 | - | - | 4.0 |
| Etiology | 6.0 | - | - | 2.0 | 5.0 | 2.0 |
| Gender | 1.0 | - | - | - | - | - |
| Race | 6.0 | - | - | - | - | - |
| Location type | 5.0 | - | 2.0 | 4.0 | 7.0 | - |
| Patient initial acuity | 2.0 | - | - | - | - | - |
| Witnessed by whom | 0.0 | 2.0 | - | - | 3.0 | - |
| CPR used or not | 1.0 | - | - | - | 0.0 | - |
| Who use CPR | 3.0 | 1.0 | - | - | 2.0 | 1.0 |
| AED used or not | 0.0 | - | - | - | - | - |
| Age | 0.0 | 2.0 | 1.0 | 1.0 | - | - |
| EMS response time (Duration) | 3.0 | - | 2.0 | 2.0 | 2.0 | 2.0 |
| Survival outcome | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Hyphen (−) means the values in generated counterfactual examples are unchanged compared with the value in the query instance. Survival outcome (0: expired, 1: survived). cfs: counterfactual examples
Figure 5.
Individual importance and coefficient for a single OHCA patient in predicting survival outcome. Because coefficients can only be calculated for ordinal or continuous variables, the nominal variables (location type and etiology) are excluded from panel (b).
5.3.4. Global(State-level) Variable Importance and Coefficient
The global variable importance (Figure 6(a)) and coefficient (Figure 6(b)) are calculated over all examples in the state of Georgia. We also draw a dependence plot (Figure 7) based on the calculated global importance for all variables. This plot shows how the survival probability (importance for survival) changes when the variable class value changes. Given that the query instances used in PCE are expired OHCA patients, and our objective is to generate counterfactual examples with a survived status, the variable importance and dependence plot serve as estimates of importance for survival. As Figure 6(a) shows, the geographical variables longitude and latitude are highly important for improving survival probability. Apart from these two spatial variables, the top six health variables are types of CPR used, OHCA witnessed by whom, duration, age, location type of the OHCA incident, and who used CPR. We also examine the coefficients and dependence plots of these six variables, as shown in Figure 6(b) and Figure 7.
Figure 6.
Global (state-level) importances and coefficients for all examples in predicting OHCA survival outcome. Because coefficients can only be calculated for ordinal or continuous variables, the nominal variables (location type and etiology) are excluded from panel (b).
Figure 7.
Global(state-level) dependence plot in predicting OHCA survival outcome. The y-axis represents variable class importance in enhancing survival status. The shaded region represents changing trend. For nominal variables without a specific order, such as etiology and location type, we do not display the trend here.
Variables with positive coefficient: For the variable OHCA witnessed by whom, patients witnessed by professionals or family members are more likely to survive than those not witnessed or witnessed by bystanders. This aligns with our common sense. Regarding emergency response durations, there is a positive correlation between survival rates and the promptness of the response. However, a closer inspection reveals a significant decline in survival rates when the response time exceeds 13 minutes. This suggests that factors other than EMS response time may play a more critical role in determining survival outcomes for instances that are less than 13 minutes. Additionally, incidents occurring in sports areas and service areas.
Variables with negative coefficient: There is a negative correlation between both latitude and longitude and survival outcomes, indicating that survival chances increase the further south and west the location. The impact of location on survival is complex and interacts with socioeconomic factors, social determinants of health, and factors related to OHCA incidents. An exploration of the geospatial distribution of the original data revealed that the northeastern area utilized CPR less frequently than the southwestern area, which may contribute to the negative correlation. Additionally, the number of CPR types employed is negatively correlated with survival rates. The variable types of CPR reflects the frequency and variety of devices and techniques used during resuscitation efforts. For instance, a case in which three types of CPR are used on a patient might include compression with an external plunger-type device, ventilation with a bag valve mask, and ventilation with a pocket mask. These findings suggest that employing simpler, yet effective, CPR techniques may enhance survival outcomes. Age also plays a crucial role: as age increases, survival chances decrease. Moreover, patients who receive CPR from family members or professionals before the arrival of EMS have a higher survival chance compared to those assisted by bystanders or community access responders.
5.3.5. County-level Explanation
To assess the spatial variation in the impact of health-related variables on survival outcomes across different counties in Georgia, we conduct a county-level analysis and visualize the results through maps of variable importance (Figure 8) and coefficient (Figure 9). The variable importance and coefficient values are grouped into five classes on the maps using the quantile method. We organize our discussion of results by spatial orientation, focusing on the northern, southern, and entire-state perspectives. The boundary between North and South Georgia on the map is drawn approximately along the Fall Line.
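Grouping values into five quantile classes, as done for the maps, can be sketched as follows. The function name is illustrative, and the inclusive quantile definition from Python's `statistics` module is an assumption; mapping software may compute quantile breaks slightly differently.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_classes(values, n_classes=5):
    """Assign each value to one of n_classes quantile classes (0 = lowest)."""
    # n_classes - 1 cut points split the values into equally populated groups.
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return [bisect_right(cuts, v) for v in values]


# Toy county-level values (invented).
county_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
classes = quantile_classes(county_values)
```

Each class then receives one color on the map, so every class contains roughly the same number of counties regardless of how the underlying values are spread.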
Figure 8.
Variable Importance at County level across Georgia
Figure 9.
Variable Coefficient at County level across Georgia
In North Georgia, the variable importance map(Figure 8) generally indicates low significance for most variables in the Atlanta metro area. However, this area is critical due to its high incidence of OHCA. Consequently, for the Atlanta metro area, we have shifted our focus to other variables that show greater importance: the duration from EMS dispatch to arrival and patient age. As illustrated in Figure 9, these two variables exhibit a consistent negative correlation with survival outcomes in the Atlanta area. The coefficient map for EMS response time indicates that patients with shorter response times have a higher survival probability in the Atlanta metro area. Similarly, the coefficient map for age reveals that younger patients generally have a better chance of survival.
In South Georgia, the importance map (Figure 8) indicates that most variables hold greater significance compared to those in North Georgia. From the coefficient map (Figure 9), it is evident that certain variables display a strong correlation (indicated by deep color intensity) with survival outcomes. These variables include initial patient acuity, witnessed by whom, who used CPR, and age. For initial patient acuity, a significant negative correlation is observed, suggesting that higher acuity levels correspond to lower survival chances, aligning with intuitive expectations. The variable witnessed by whom displays a positive correlation with survival outcomes, indicating that OHCA patients witnessed by healthcare professionals are more likely to survive than those witnessed by bystanders. Further analysis combining the witnessed by whom coefficient map with the AED use map pinpoints counties such as Telfair, Pulaski, Coffee, and Appling where both witnessing and AED use are positively correlated with survival outcomes. This insight supports the advocacy for targeted AED training initiatives in these specific areas. Additionally, when the witnessed by whom map is combined with the CPR use or not map, a positive correlation between CPR use and witnessed by whom with survival outcomes is also observed in southern counties, including Atkinson, Coffee, Telfair, and Irwin. This suggests that promoting CPR training initiatives in these counties could increase the availability of trained responders and thus improve survival rates. For the variable concerning who performed CPR, almost all counties show a negative correlation with survival rates, indicating that patients assisted by professionals have lower survival chances compared to those helped by family members, community responders, or bystanders. This may be due to the latter groups’ ability to reach patients more promptly than professionals.
Across the entire state of Georgia, the age coefficient map reveals a significant negative correlation with survival outcomes.
6. Discussion
6.1. Advantages of our predictive model over previous models
Our research on OHCA survival outcome prediction demonstrates that the SEP model we developed outperforms the state-of-the-art model, SpatialRF. The SEP model, which integrates spatial and health variables, achieved a 10.2% relative improvement in AUCROC (0.682 versus 0.619, Table 2) compared to SpatialRF, indicating a stronger capability in capturing spatial effects. While SpatialRF can simulate spatial structure using a distance matrix and account for spatial autocorrelation, it may fall short in capturing more complex spatial relationships. In contrast, our model’s location encoders, which leverage Fourier transformations, effectively capture subtle spatial patterns that traditional models might miss. Additionally, SpatialRF, depending on its implementation, can be computationally demanding due to spatial adjustments and distance matrix calculations, particularly when working with large datasets.
Moreover, our SEP model showed a 9.7% improvement in OHCA survival outcome prediction compared to traditional machine learning models such as RF, GBDT, LightGBM, and SVC, which do not consider spatial variables. This indicates that spatial variables play a critical role in survival outcome prediction, and our SEP model successfully integrates spatial effect. Second, the resulting map (Figure 4) showed a high similarity between cardiac arrest cases compared at the county level in a two-dimensional embedding space plot and a geographical space plot. This further confirms that our model effectively learned and incorporated spatial features.
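As a rough illustration of the kind of Fourier-based location encoding discussed in this subsection, the following minimal sketch maps a longitude/latitude pair to sine and cosine features at several wavelengths. It is not the SEP model's encoder: the function name and the parameters `num_scales`, `min_wavelength`, and `max_wavelength` are assumptions chosen for illustration, and in a learned encoder such features would typically be passed through a trainable neural network.

```python
import math


def fourier_location_features(lon, lat, num_scales=4,
                              min_wavelength=0.01, max_wavelength=10.0):
    """Encode (lon, lat) as sin/cos features at geometrically spaced wavelengths.

    Illustrative sketch only; names and defaults are hypothetical and are
    not taken from the SEP implementation.
    """
    if num_scales < 1:
        raise ValueError("num_scales must be at least 1")
    features = []
    for s in range(num_scales):
        if num_scales == 1:
            wavelength = min_wavelength
        else:
            # Interpolate geometrically between the shortest and longest wavelength.
            ratio = s / (num_scales - 1)
            wavelength = min_wavelength * (max_wavelength / min_wavelength) ** ratio
        for coord in (lon, lat):
            angle = 2.0 * math.pi * coord / wavelength
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# Each scale contributes sin and cos for both coordinates: 4 * num_scales values.
example = fourier_location_features(-83.37, 33.95)
print(len(example))
```

Short wavelengths separate nearby locations, while long wavelengths vary slowly over space, which is one reason multi-scale sinusoidal features are often used to represent fine and coarse spatial patterns together.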
6.2. The validity and advantages of our explainability model
By comparing PCE with DiCE in explaining the predictions of the SEP model, we found that the counterfactual examples generated by PCE are more closely aligned with the real scenario data than those generated by DiCE. This indicates that our model has greater potential to provide practical suggestions. Further analysis of the PCE explanation results on the SEP model in OHCA survival outcome prediction, at both individual and geographical levels, supports this conclusion. At the geographical level, the variable importance and coefficient maps indicate that EMS response time needs improvement in North Georgia, particularly around the Atlanta area. These maps also highlight that promoting AED and CPR training initiatives in South Georgia would be beneficial for improving OHCA survival rates. At the individual level, the generated counterfactual examples in the OHCA survival outcome prediction task enable practical intervention suggestions for specific individuals. These suggestions are reasonable and practical because they keep certain self-attribute variables, such as race and gender, unchanged and keep the generated counterfactual examples as close to the real scenario examples as possible. Overall, these results offer targeted plans for improving OHCA survival rates at both geographical and individual levels, demonstrating the effectiveness of our explanations.
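To make the notion of closeness to real scenario data concrete, the sketch below scores a counterfactual by its distance to the nearest observed case. The Gower-style distance used here (range-scaled differences for numeric features, mismatch indicators for categorical ones) is an assumption for illustration and may differ from the metric used in our evaluation; the feature names and values are hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance between two cases given as dicts.

    Numeric features contribute |a - b| scaled by their (low, high) range;
    categorical features contribute 0 if equal and 1 otherwise. The result
    is the average contribution over all features.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            total += abs(a[name] - b[name]) / (high - low)
        else:
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


def distance_to_nearest_real(counterfactual, real_cases, numeric_ranges):
    """Distance from a counterfactual to its closest observed case."""
    return min(mixed_distance(counterfactual, case, numeric_ranges)
               for case in real_cases)


# Hypothetical toy records, not drawn from the OHCA dataset.
real_cases = [
    {"age": 60, "aed_use": "used"},
    {"age": 80, "aed_use": "not used"},
]
ranges = {"age": (0, 100)}
print(distance_to_nearest_real({"age": 62, "aed_use": "used"}, real_cases, ranges))
```

Under this kind of measure, a lower average distance over a set of generated counterfactuals indicates that they lie closer to observed cases.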
While SHAP, LIME, and GeoShapley methods can rank feature importance at both global and local levels, they do not provide actionable guidance on how to adjust specific features to change the outcome for an individual case. In contrast, counterfactual-based methods not only rank feature importance but also suggest specific feature modifications that could lead to a different outcome, aligning directly with the goal of improving health outcomes through targeted interventions. Our counterfactual-based XAI method achieves this by generating counterfactual examples. For instance, for a patient with an “expired” status, our method might indicate that changing the “witness status” from “not witnessed” to “witnessed by family member” and changing “AED use” from “not used” to “used” could have led to survival. This counterfactual-based method provides clear, actionable modifications for specific features. With the successful development and application of our counterfactual explanation model, PCE, in predicting OHCA outcomes, our method holds the potential for adaptation to other geohealth datasets, enabling the development of multi-level intervention strategies accordingly. Furthermore, with adjustments to align with principles of feasibility, diversity, and proximity, our method can be applied to other domains where data exhibit spatial patterns and support counterfactual outcomes.
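The witness status and AED use example can be mimicked with a toy search that looks for the smallest set of feature changes that flips a prediction while leaving immutable attributes untouched. This sketch uses an exhaustive search and a hand-written rule in place of a trained model; it is not the prototype-guided PCE algorithm, and `toy_predict`, `find_counterfactual`, and all feature values are hypothetical.

```python
from itertools import product


def toy_predict(case):
    """Hypothetical stand-in classifier; NOT the SEP model."""
    score = 0
    if case["witness_status"] != "not witnessed":
        score += 1
    if case["aed_use"] == "used":
        score += 1
    return "survived" if score >= 2 else "expired"


def find_counterfactual(query, predict, candidates, immutable=()):
    """Return the fewest feature changes that flip predict(query), or None.

    Features listed in `immutable` (for example gender) are never modified.
    Ties are resolved in favor of the first combination encountered.
    """
    original = predict(query)
    mutable = [name for name in candidates if name not in immutable]
    best = None
    for values in product(*(candidates[name] for name in mutable)):
        trial = dict(query)
        trial.update(zip(mutable, values))
        if predict(trial) == original:
            continue  # outcome unchanged, so this is not a counterfactual
        changes = {n: v for n, v in zip(mutable, values) if query[n] != v}
        if best is None or len(changes) < len(best):
            best = changes
    return best


query = {"witness_status": "not witnessed", "aed_use": "not used", "gender": "female"}
candidates = {
    "witness_status": ["not witnessed", "witnessed by family member",
                       "witnessed by healthcare professional"],
    "aed_use": ["not used", "used"],
    "gender": ["female", "male"],
}
print(find_counterfactual(query, toy_predict, candidates, immutable=("gender",)))
```

Exhaustive search is only practical for a handful of categorical features; guiding the search with prototypes, as PCE does, is one way to keep counterfactuals both tractable to find and close to real cases.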
6.3. Health Policy Recommendation
Our analysis provides valuable insights into the factors influencing OHCA survival outcomes in Georgia, revealing significant regional differences that necessitate tailored health policy interventions. In North Georgia, particularly within the Atlanta metro area, the variable importance map highlights that the duration from EMS dispatch to arrival and patient age are critical factors. The consistent negative correlation between EMS response time and survival probability (shorter response times correspond to higher survival probabilities) underscores the urgency of improving EMS efficiency. This finding suggests that investing in advanced dispatch systems, better traffic management for emergency vehicles, and strategic placement of EMS stations could substantially enhance survival rates in this densely populated area. Additionally, since younger patients tend to have better survival outcomes, health policies could also focus on preventive measures and awareness programs targeting older adults to mitigate their higher risks. In South Georgia, the analysis indicates that multiple variables, including initial patient acuity, who witnessed the event, and who performed CPR, play crucial roles. The significant negative correlation of higher acuity levels with survival suggests the need for early intervention strategies and better acute care management. The positive impact of being witnessed by healthcare professionals and of AED use implies that increasing the presence and readiness of medical personnel, as well as public access to AEDs, could improve outcomes. Training community members in AED use and CPR, especially in identified high-impact counties like Telfair, Pulaski, Coffee, and Appling, could empower bystanders to act swiftly and effectively, potentially bridging the gap before professional help arrives. Additionally, allocating more resources to the older population is essential to improve survival chances across the state.
6.4. Limitations and Future Work
Although our method has shown improved performance and explainability, we recognize that there is still room for further exploration. For example, despite being a complete dataset spanning the years from 2019 to 2012, the data do not cover all geographical areas or the full data space, which limits the quality of the generated counterfactual examples. Additionally, the input variables for our model are limited to those related to OHCA incidents, and individual demographic variables are omitted because of privacy policies, even though these features could affect health outcomes. Evaluating our model on datasets that include such information would provide a more complete picture and help identify significant features of health outcomes.
7. Conclusion
In this study, we propose a Spatial Counterfactual Explanation (SpaCE) method for health outcome prediction and explanation, and demonstrate its effectiveness in the task of OHCA survival outcome prediction. Overall, this study presents three key contributions. First, we develop a spatially explicit health outcome predictor (SEP) model that effectively captures the nuances of both spatial and health variable distributions. Second, we introduce a tailored Prototype-guided Counterfactual Explanation (PCE) algorithm designed to elucidate the decision-making process of this model. By examining counterfactual examples, it provides valuable insights for intervention strategies and policymaking. Finally, we apply the model to predict and analyze the Out-of-Hospital Cardiac Arrest (OHCA) dataset in Georgia, United States. Our model outperforms other state-of-the-art ML models in prediction accuracy and generates counterfactual examples that most closely resemble the query case. Through analysis of the generated counterfactual examples, our method effectively identifies risk factors, quantifies their impact on health outcomes at both individual and county levels, and provides tailored intervention strategies. This illustrative case study highlights the versatility of our method, demonstrating its potential applicability to diverse individual, public, and population health contexts.
Funding
This work was supported by the NIH-NIMHD (National Institute on Minority Health and Health Disparities) under Grant 1R01MD013886-01 and Grant 3R01MD013886-05S1. This work was also supported by University Campus Sustainability Grant.
List of Acronyms
- AED: Automated External Defibrillator
- ANN: Artificial Neural Network
- CPR: Cardiopulmonary Resuscitation
- DiCE: Diverse Counterfactual Explanation
- DL: Deep Learning
- DT: Decision Tree
- EMS: Emergency Medical Services
- LIME: Local Interpretable Model-agnostic Explanations
- ML: Machine Learning
- MMD-Critic: Maximum Mean Discrepancy-Critic
- OHCA: Out-of-Hospital Cardiac Arrest
- PCE: Prototype-guided Counterfactual Explanation Model
- RF: Random Forest
- SEP: Spatially Explicit Health Outcome Predictor
- SHAP: SHapley Additive exPlanations
- SpaCE: Spatial Counterfactual Explainable Deep Learning model
- SVC: Support Vector Classifier
- VAE: Variational Autoencoder
- XAI: Explainable Artificial Intelligence
Footnotes
Disclosure statement
The authors report there are no competing interests to declare.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Georgia (IRB ID: PROJECT00005093, approved 25 March 2022).
Informed Consent Statement
Patient consent was waived because the data used were secondary data, and the waiver was approved by the Institutional Review Board.
Data and Codes Availability Statement
The code that supports the findings of this study is available at this public link (https://doi.org/10.6084/m9.figshare.25719591). The participants of this study did not give written consent for their data to be shared publicly, so, due to the sensitive nature of the research, supporting data are not available.
References
- Akindote OJ, et al., 2023. Comparative review of big data analytics and gis in healthcare decision-making. World Journal of Advanced Research and Reviews, 20 (3), 1293–1302.
- Alaa AM, et al., 2019. Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 uk biobank participants. PloS one, 14 (5), e0213653.
- Benito B, 2021. Spatialrf: easy spatial regression with random forest. R package version, 1 (0).
- Brunsdon C, Fotheringham S, and Charlton M, 1998. Geographically weighted regression. Journal of the Royal Statistical Society: Series D (The Statistician), 47 (3), 431–443.
- Chen M, et al., 2017. Disease prediction by machine learning over big data from healthcare communities. Ieee Access, 5, 8869–8879.
- Choi E, et al., 2016. Doctor ai: Predicting clinical events via recurrent neural networks. In: Machine learning for healthcare conference. PMLR, 301–318.
- Dickerman BA and Hernán MA, 2020. Counterfactual prediction is not only for causal inference. European journal of epidemiology, 35, 615–617.
- Dindorf C, et al., 2020. Interpretability of input representations for gait classification in patients after total hip arthroplasty. Sensors, 20 (16), 4385.
- Djukpen RO, 2012. Mapping the hiv/aids epidemic in nigeria using exploratory spatial data analysis. GeoJournal, 77, 555–569.
- Goyal Y, et al., 2019. Counterfactual visual explanations. In: International Conference on Machine Learning. PMLR, 2376–2384.
- Habehh H and Gohel S, 2021. Machine learning in healthcare. Current genomics, 22 (4), 291.
- Huang B, et al., 2023. Gpmatch: A bayesian causal inference approach using gaussian process covariance function as a matching tool. Frontiers in Applied Mathematics and Statistics, 9, 1122114.
- Humaira H and Rasyidah R, 2020. Determining the appropiate cluster number using elbow method for k-means algorithm. In: Proceedings of the 2nd Workshop on Multidisciplinary and Applications (WMA) 2018, 24–25 January 2018, Padang, Indonesia.
- Janowicz K, et al., 2020. Geoai: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond.
- Kingma DP, et al., 2016. Improved variational inference with inverse autoregressive flow. Advances in neural information processing systems, 29.
- Klemmer K, et al., 2023. Satclip: Global, general-purpose location embeddings with satellite imagery. arXiv preprint arXiv:2311.17179.
- Kwon JM, et al., 2019. Deep-learning-based out-of-hospital cardiac arrest prognostic system to predict clinical outcomes. Resuscitation, 139, 84–91.
- Li F and Li F, 2019. Propensity score weighting for causal inference with multiple treatments.
- Li W, Hsu CY, and Hu M, 2021. Tobler’s first law in geoai: A spatially explicit deep learning model for terrain feature detection under weak supervision. Annals of the American Association of Geographers, 111 (7), 1887–1905.
- Li Z, 2024. Geoshapley: A game theory approach to measuring spatial effects in machine learning models. Annals of the American Association of Geographers, 1–21.
- Linden A, 2014. Combining propensity score-based stratification and weighting to improve causal inference in the evaluation of health care interventions. Journal of evaluation in clinical practice, 20 (6), 1065–1071.
- Loh HW, et al., 2022. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Computer Methods and Programs in Biomedicine, 226, 107161.
- Lucic A, et al., 2022. Focus: Flexible optimizable counterfactual explanations for tree ensembles. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, 5313–5322.
- Lundberg SM and Lee SI, 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30.
- Mac Aodha O, Cole E, and Perona P, 2019. Presence-only geographical priors for fine-grained image classification. In: Proceedings of the IEEE International Conference on Computer Vision. 9596–9606.
- Mai G, et al., 2022a. Symbolic and subsymbolic geoai: Geospatial knowledge graphs and spatially explicit machine learning. Trans. GIS, 26 (8), 3118–3124.
- Mai G, et al., 2020a. Se-kge: A location-aware knowledge graph embedding model for geographic question answering and spatial semantic lifting. Transactions in GIS, 24 (3), 623–655.
- Mai G, et al., 2022b. A review of location encoding for geoai: methods and applications. International Journal of Geographical Information Science, 36 (4), 639–673.
- Mai G, et al., 2020b. Multi-scale representation learning for spatial feature distributions using grid cells. In: ICLR 2020. openreview.
- Mai G, et al., 2023a. Towards general-purpose representation learning of polygonal geometries. GeoInformatica, 27 (2), 289–340.
- Mai G, et al., 2023b. Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. In: International Conference on Machine Learning. PMLR.
- Mai G, et al., 2023c. Sphere2vec: A general-purpose location representation learning over a spherical surface for large-scale geospatial predictions. ISPRS Journal of Photogrammetry and Remote Sensing, 202, 439–462.
- McInnes L, Healy J, and Melville J, 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
- Mena C, et al., 2018. Spatial analysis for the epidemiological study of cardiovascular diseases: A systematic literature search. Geospatial health, 13 (1).
- Minh D, et al., 2022. Explainable artificial intelligence: a comprehensive review. Artificial Intelligence Review, 1–66.
- Molnar C, 2020. Interpretable machine learning. Lulu.com.
- Morgan S, 2015. Counterfactuals and causal inference. Cambridge University Press.
- Mothilal RK, Sharma A, and Tan C, 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. 607–617.
- Mubeen AM, et al., 2017. A six-month longitudinal evaluation significantly improves accuracy of predicting incipient alzheimer’s disease in mild cognitive impairment. Journal of Neuroradiology, 44 (6), 381–387.
- Nazia N, et al., 2022. Methods used in the spatial and spatiotemporal analysis of covid-19 epidemiology: a systematic review. International Journal of Environmental Research and Public Health, 19 (14), 8267.
- Neefjes EC, et al., 2017. Identification of patients with cancer with a high risk to develop delirium. Cancer medicine, 6 (8), 1861–1870.
- Peng J, et al., 2021. An explainable artificial intelligence framework for the deterioration risk prediction of hepatitis patients. Journal of medical systems, 45 (5), 61.
- Poyiadzi R, et al., 2020. Face: feasible and actionable counterfactual explanations. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 344–350.
- Prosperi M, et al., 2020. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nature Machine Intelligence, 2 (7), 369–375.
- Ram P and Sinha K, 2019. Revisiting kd-tree for nearest neighbor search. In: Proceedings of the 25th acm sigkdd international conference on knowledge discovery & data mining. 1378–1388.
- Rao J, et al., 2020. Lstm-trajgan: A deep learning approach to trajectory privacy protection. arXiv preprint arXiv:2006.10521.
- Rußwurm M, et al., 2023. Geographic location encoding with spherical harmonics and sinusoidal representation networks. In: The Twelfth International Conference on Learning Representations.
- Sahar L, et al., 2021. Using geospatial analysis to evaluate access to lung cancer screening in the united states. Chest, 159 (2), 833–844.
- Santangelo OE, et al., 2023. Machine learning and prediction of infectious diseases: a systematic review. Machine Learning and Knowledge Extraction, 5 (1), 175–198.
- Şener R and Türk T, 2021. Spatiotemporal analysis of cardiovascular disease mortality with geographical information systems. Applied Spatial Analysis and Policy, 14 (4), 929–945.
- Shah ND, Steyerberg EW, and Kent DM, 2018. Big data and predictive analytics: recalibrating expectations. Jama, 320 (1), 27–28.
- Siampou MD, et al., 2024. Poly2vec: Polymorphic encoding of geospatial objects for spatial reasoning with deep neural networks. arXiv preprint arXiv:2408.14806.
- Smith SC and Ramamoorthy S, 2020. Counterfactual explanation and causal inference in service of robustness in robot control. In: 2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). IEEE, 1–8.
- Son H, et al., 2023. Social determinants of cardiovascular health: a longitudinal analysis of cardiovascular disease mortality in us counties from 2009 to 2018. Journal of the American Heart Association, 12 (2), e026940.
- Tseng PY, et al., 2020. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Critical care, 24, 1–13.
- Vivanco Cepeda V, Nayak GK, and Shah M, 2024. Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems, 36.
- Wang G, Li J, and Hopp WJ, 2016. A causal tree approach for personalized health care outcome analysis.
- Wiemken TL and Kelley RR, 2020. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health, 41 (1), 21–36.
- Wright MN and Ziegler A, 2015. ranger: A fast implementation of random forests for high dimensional data in c++ and r. arXiv preprint arXiv:1508.04409.
- Wu N, et al., 2024. Torchspatial: A location encoding framework and benchmark for spatial representation learning. arXiv preprint arXiv:2406.15658.
- Yan B, et al., 2017. From itdl to place2vec: Reasoning about place type similarity and relatedness by learning embeddings from augmented spatial contexts. In: Proceedings of the 25th ACM SIGSPATIAL international conference on advances in geographic information systems. 1–10.
- Yao L, et al., 2021. A survey on causal inference. ACM Transactions on Knowledge Discovery from Data (TKDD), 15 (5), 1–46.
- Zhang D, et al., 2022. Machine learning approach to predict in-hospital mortality in patients admitted for peripheral artery disease in the united states. Journal of the American Heart Association, 11 (20), e026987.
- Zhu D, et al., 2022. Spatial regression graph convolutional neural networks: A deep learning paradigm for spatial multivariate distributions. GeoInformatica, 26 (4), 645–676.
- Zhu Y, et al., 2024. Causal inference with latent variables: Recent advances and future prospectives. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6677–6687.