Abstract
Using recent methods for spatial propensity score modeling, we examine differences in hospital stays between non-Hispanic black and non-Hispanic white veterans with type 2 diabetes. We augment a traditional patient-level propensity score model with a spatial random effect to create a matched sample based on the estimated propensity score. We then use a spatial negative binomial hurdle model to estimate differences in both hospital admissions and inpatient days. We demonstrate that in the presence of unmeasured geographic confounding, spatial propensity score matching in addition to the spatial negative binomial hurdle outcome model yields improved performance compared to the outcome model alone. In the motivating application, we construct three estimates of racial differences in hospitalizations: the risk difference in admission, the mean difference in number of inpatient days among those hospitalized, and the mean difference in number of inpatient days across all patients (hospitalized and non-hospitalized). Results indicate that non-Hispanic black veterans with type 2 diabetes have a lower risk of hospital admission and a greater number of inpatient days on average. The latter result is especially important considering that we observed much smaller effect sizes in analyses that did not incorporate spatial matching. These results emphasize the need to address geographic confounding in health disparity studies.
1. Introduction
It is estimated that one in eight Americans has type 2 diabetes, with a heavier burden among non-white racial minorities [1]. Patients with type 2 diabetes experience more common and lengthier inpatient hospital stays and are more likely to die in the hospital than their non-diabetic counterparts [2]. In fact, inpatient stays are the highest medical expenditure among type 2 diabetics, accounting for 43% of the 256 billion dollars spent annually [3]. Promisingly, the number of days Americans spend in the hospital has been decreasing over time. However, for ethnic and racial minorities, this trend may actually represent short but frequent health care system encounters of poor-quality, insufficient care [4]. It is still unclear how observed decreases in inpatient days differentially affect racial minorities even after disease management initiatives to reduce inpatient encounters have been enacted [5].
There are a number of factors that may contribute to disparities in inpatient hospital stays. Barriers to accessing inpatient services may vary at both individual and facility levels. Differences in comorbidity burden or financial obligation in payment for services may account for some of the racial differences that have been reported [6]. In addition, the use of inpatient services has been shown to vary geographically: recent studies have shown that after socioeconomic status and disease burden are controlled for, areas with higher hospital capacity tend to have higher hospitalization rates [7]. Even in a relatively homogeneous patient population such as veterans receiving care within the Veterans Affairs (VA) healthcare system, geographic variation in inpatient utilization persists [8].
Finally, patient-provider relationships and patient advocacy may also influence hospitalizations and the number of days patients spend in the hospital [5]. For example, McCormick et al. [9] found that recent healthcare reforms did not narrow the racial disparity in hospital admissions and that this could be explained, in part, by implicit cultural biases on the part of healthcare providers. This finding aligns with the Institute of Medicine’s position that implicit racial bias may affect clinical communication and care [10]. Recent studies have also found that internal and emergency medicine physicians exhibit implicit bias and adhere to unconscious ethnic and racial stereotypes that can affect medical decision making [11, 12].
In this paper, we are interested in estimating racial differences in the risk of hospitalization and the number of inpatient days among veterans with type 2 diabetes after accounting for potential confounding factors. Propensity score matching offers a principled approach to addressing the issue of confounding and enables estimation of the average treatment effect among the treated (ATT). The ATT is often of interest in health disparities studies as interventions typically target the at-risk group rather than the population as a whole. Because community factors such as accessibility to health care facilities and availability of disease management resources can exacerbate health disparities, it is critical to account for not only patient-level confounding but also geographic confounding, which occurs when confounding factors vary spatially. Matching on a propensity score that incorporates spatial effects yields a sample that is balanced on the distribution of racial groups across geographical regions in addition to important patient-level characteristics. This in turn reduces the bias in estimating the ATT.
Once a matched sample is generated, fitting a model for the outcome can address any residual imbalance and may yield improved effect estimates [13]. The specification of the outcome model is flexible and allows analysts to tailor it to the specific research question at hand. Two-part hurdle models [14] have been utilized in studies examining racial disparities in inpatient stays [5], as they allow researchers to address questions regarding both the risk of hospitalization and the number of inpatient days while accounting for zero-inflation in the count response. Hurdle models are two-part mixture models comprising a binary component that models (in our case) the probability of hospitalization, and a truncated count component that models the number of days in the hospital among those who are hospitalized. The truncated negative binomial distribution is an attractive choice for the count model as it allows for overdispersion relative to the Poisson assumption that the variance is equal to the mean. The negative binomial hurdle model can easily be extended to the spatial setting by incorporating spatial effects into both the binary and count components of the model. The resulting model can be fit within a Bayesian framework using standard software such as R-INLA [15].
In this work, we combine methods in spatial propensity score matching and hierarchical spatial hurdle models to achieve minimally biased estimates of the racial disparity in the risk of hospitalization and the mean number of inpatient days. We conduct a simulation study to assess the performance of spatial propensity score analysis in combination with the spatial hurdle model under unknown geographic confounders. We apply these methods to an analysis of the effect of race on hospitalization and inpatient days among type 2 diabetic veterans receiving care within the VA in the southeastern United States in the 2014 fiscal year. As the VA is concerned with reducing hospitalizations and inpatient days while simultaneously maintaining quality care, it is important to understand differences in health care services between non-Hispanic whites and racial minorities. Furthermore, as patients receiving care within the VA are often considered a “sentinel” population, the results of this study may indicate areas in need of attention in the general population [10].
2. Estimation of Causal Effects
2.1. Spatial Propensity Score Analysis
We begin by reviewing propensity score methods and extensions that incorporate a spatial random effect to address geographic confounding. Let Z denote a race group indicator taking the value 1 if a patient is non-Hispanic black (NHB) and 0 if non-Hispanic white (NHW). In the context of the causal framework outlined by Rubin [16], each individual is assumed to have two potential outcomes (Y(1), Y(0)), where Y(1) and Y(0) denote the (potentially counterfactual) outcomes under Z = 1 and Z = 0, respectively. The observed response, Y, is given by Y = ZY(1)+(1−Z)Y(0), so that Y = Y(1) if Z = 1 and Y = Y(0) otherwise. Because race is an immutable characteristic, we shift our focus from race as an inherent trait to health professionals’ perceptions of race. Here, the counterfactual can be viewed as the outcome that would be observed if implicit biases were eliminated. Since such biases accrue to non-Hispanic blacks and not (presumably) to non-Hispanic whites, the latter can serve as a valid causal comparison group provided the two groups are balanced on other relevant characteristics (e.g., age, gender, comorbidity burden). Thus, as Greiner and Rubin [17] note, perceptions of race can be regarded as a “treatment” assigned to a patient at the moment of patient-clinician interaction. In this way, ancillary characteristics such as age and gender can in turn be viewed as “pre-treatment” variables [17]. Our goal, therefore, is to create a well-balanced comparison group that can be used to examine the effect of perceived race on healthcare outcomes such as hospitalizations.
The population ATT is the average difference between the potential outcomes among treated patients, formally defined as ΔATT = E(Y(1) − Y(0)|Z = 1). The ATT is often desired in program evaluation or when the treatment is not likely to be targeted universally, as is the case in our motivating study. Under the assumption of no unmeasured confounding, propensity score methods can be used to derive unbiased estimators of the ATT in observational studies. The propensity score, e(x) = Pr(Z = 1|X = x), is the conditional probability of exposure given covariates X, where the so-called “overlap” condition, 0 < e(x) < 1, is assumed to hold. Propensity score matching is a technique that forms matched pairs between exposed and unexposed subjects based on the similarity of their estimated propensity scores [18, 19, 20].
Recent work has explored the use of propensity score analysis in the hierarchical setting, where patients are nested within clusters such as health care plans or hospitals [21, 22, 23]. Arpino and Mealli [21] proposed propensity score matching methods for hierarchical data that incorporate random effects into the propensity score model. In the hierarchical spatial setting, we augment the propensity score model with spatial random effects. The spatial effects are then assigned distributions that account for spatial correlation and promote smoothing across spatial units. More concretely, let Zij denote an indicator variable taking the value 1 if the jth patient in the ith county is NHB and 0 if NHW, and let xij represent a set of observed patient-level covariates. The spatial propensity score model is given by
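$$\text{logit}\{\Pr(Z_{ij} = 1 \mid \mathbf{x}_{ij}, \phi_{1i})\} = \mathbf{x}_{ij}^{T}\boldsymbol{\alpha} + \phi_{1i} \tag{1}$$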
where α is a vector of regression coefficients and ϕ1i is the spatial random effect for county i. The spatial effect ϕ1i accounts for unmeasured county-level factors associated with race, and circumvents the need to match within county, which may be infeasible in the case of small cluster sizes [23].
Matching on the spatial propensity score ensures balance across both patient factors and the geographical distribution between NHB and NHW patients. Once the matched sample is constructed, a variety of outcome models can be employed to address any residual imbalance between exposure groups [13].
2.2. Two Part Spatial Hurdle Models
Our motivating study of inpatient hospitalization practices poses a set of unique analytic challenges. First, approximately 71% of the patients in the sample were not hospitalized in 2014, resulting in substantial zero-inflation. Furthermore, among those who were hospitalized, there was a wide range of counts of inpatient days. In order to capture both zero-inflation for patients who do not experience a hospitalization and overdispersion of inpatient days among patients who do experience a hospitalization, we propose a negative binomial hurdle outcome model. A hurdle model [14] is a two-part mixture model consisting of a point mass at zero followed by a zero-truncated count distribution for the positive observations. The choice of count distribution can vary, but the negative binomial distribution is attractive because it accounts for overdispersion in the counts.
Let Y represent the number of inpatient days in a fiscal year. The probability of experiencing a hospitalization (i.e., any positive number of inpatient days) is expressed as Pr(Y > 0) = π, where 0 < π < 1, so that Pr(Y = 0) = 1 − π. For y = 1, 2, …, the probability that Y = y is given by $\Pr(Y = y) = \pi\, p(y; \mu, r) / [1 - p(0; \mu, r)]$, where p(y; μ, r) denotes the probability mass function of a negative binomial distribution with mean μ and overdispersion parameter r, and p(0; μ, r) denotes the negative binomial mass function evaluated at 0. The mean count among hospitalized patients is given by $E(Y \mid Y > 0) = \mu / [1 - p(0; \mu, r)]$, while the overall mean among hospitalized and non-hospitalized patients is $E(Y) = \pi\mu / [1 - p(0; \mu, r)]$. The variance of the negative binomial hurdle model is $\mathrm{Var}(Y) = \pi(\tau^2 + \mu^2) / [1 - p(0; \mu, r)] - \{\pi\mu / [1 - p(0; \mu, r)]\}^2$, where τ² = μ(1 + μ/r) is the variance of the negative binomial distribution.
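The hurdle distribution described above can be checked numerically. The sketch below assumes the standard construction in which a zero occurs with probability 1 − π and positive counts follow a zero-truncated negative binomial with mean parameter μ and size r; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and overdispersion (size) r."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def hurdle_pmf(y, pi, mu, r):
    """Pr(Y = y) under the negative binomial hurdle model."""
    if y == 0:
        return 1.0 - pi
    # Positive counts: truncated negative binomial scaled by Pr(Y > 0) = pi.
    return pi * nb_pmf(y, mu, r) / (1.0 - nb_pmf(0, mu, r))


def hurdle_moments(pi, mu, r):
    """Return (mean among hospitalized, overall mean, overall variance)."""
    p0 = nb_pmf(0, mu, r)
    tau2 = mu * (1.0 + mu / r)          # variance of the untruncated NB
    mean_pos = mu / (1.0 - p0)
    mean_all = pi * mean_pos
    var_all = pi * (tau2 + mu ** 2) / (1.0 - p0) - mean_all ** 2
    return mean_pos, mean_all, var_all


# Hypothetical parameter values, not estimates from the study.
pi, mu, r = 0.29, 3.0, 1.5
total = sum(hurdle_pmf(y, pi, mu, r) for y in range(2000))
mean_pos, mean_all, var_all = hurdle_moments(pi, mu, r)
print(total, mean_pos, mean_all, var_all)
```

Summing the mass function over a long range of counts should return a value very close to 1, and the closed-form mean and variance can be compared against the same numerical sums.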
Turning to our case study, let Yij denote the number of inpatient days for the jth patient in the ith county. In this context, the negative binomial hurdle model is expressed as
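$$\Pr(Y_{ij} = y_{ij}) = (1 - \pi_{ij})^{1(y_{ij} = 0)} \left[\pi_{ij}\, \text{TNegBin}(y_{ij}; \mu_{ij}, r)\right]^{1(y_{ij} > 0)} \tag{2}$$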
where 1(·) denotes the indicator function, πij = Pr(Yij > 0) is the probability of hospitalization, and TNegBin(yij; μij, r) is a truncated negative binomial distribution with parameters μij and r. We model πij and μij as
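$$\text{logit}(\pi_{ij}) = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + \gamma z_{ij} + \phi_{2i}, \qquad \log(\mu_{ij}) = \mathbf{x}_{ij}^{T}\boldsymbol{\delta} + \theta z_{ij} + \phi_{3i} \tag{3}$$
where zij is a binary indicator for race; β, γ, δ and θ are regression coefficients; and, as in Equation 1, xij represents a set of patient-level covariates and ϕ2i and ϕ3i are spatial random effects.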
To encourage maximal spatial smoothing, we assign the random effects ϕ1i, ϕ2i, and ϕ3i independent intrinsic conditional autoregressive (ICAR) priors [24]. Let k = 1 denote the propensity score model, k = 2 denote the outcome model for the risk of hospitalization, and k = 3 denote the outcome model for the mean number of inpatient days. The ICAR prior for ϕki takes the conditional form
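$$\phi_{ki} \mid \boldsymbol{\phi}_{k(-i)}, \sigma_k^2 \sim N\left(\frac{1}{m_i}\sum_{h \sim i} \phi_{kh},\; \frac{\sigma_k^2}{m_i}\right) \tag{4}$$
where h ~ i indicates that county h is a geographic neighbor of county i, mi is the number of neighbors of county i, and, for model k, $\sigma_k^2/m_i$ is the conditional variance of ϕki given the remaining spatial effects, ϕk(−i). Following Brook’s Lemma [25], the joint distribution for ϕk = (ϕk1, … , ϕkn)T is given by
$$\pi(\boldsymbol{\phi}_k \mid \sigma_k^2) \propto \exp\left(-\frac{1}{2\sigma_k^2}\, \boldsymbol{\phi}_k^{T} Q\, \boldsymbol{\phi}_k\right) \tag{5}$$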
where Q = M−A is a spatial structure matrix of rank n−1, with M = diag(m1, … , mn) and A representing an n × n adjacency matrix with aii = 0, aih = 1 if i ~ h, and aih = 0 otherwise. When a fixed intercept is included in the model, a sum-to-zero constraint must be applied to ϕk to ensure an identifiable model.
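As an illustration of the spatial structure matrix, the following sketch builds Q = M − A for a small hypothetical map of four counties and evaluates the conditional mean and variance of one spatial effect given its neighbors. The county labels, adjacency list, and effect values are invented for the example; only the definitions of M, A, and the neighbor-averaging form of the ICAR conditional are taken from the text.

```python
def icar_structure(neighbors):
    """Build Q = M - A from a dict mapping each unit to its set of neighbors."""
    units = sorted(neighbors)
    index = {u: k for k, u in enumerate(units)}
    n = len(units)
    Q = [[0.0] * n for _ in range(n)]
    for u in units:
        i = index[u]
        Q[i][i] = float(len(neighbors[u]))   # m_i on the diagonal
        for v in neighbors[u]:
            Q[i][index[v]] = -1.0            # -a_ih for each neighbor h of i
    return units, Q


def conditional_moments(phi, neighbors, unit, sigma2):
    """Conditional mean and variance of phi[unit] given its neighbors."""
    nbrs = neighbors[unit]
    m = len(nbrs)
    mean = sum(phi[v] for v in nbrs) / m     # average of neighboring effects
    return mean, sigma2 / m


# Hypothetical map: A-B, A-C, B-C, and C-D share borders.
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
units, Q = icar_structure(neighbors)
phi = {"A": 0.4, "B": -0.2, "C": 0.1, "D": -0.3}   # satisfies sum-to-zero
for row in Q:
    print(row)
print(conditional_moments(phi, neighbors, "C", sigma2=1.0))
```

Every row of Q sums to zero, which is why the matrix has rank n − 1 for a connected map and why a sum-to-zero constraint is needed alongside a fixed intercept.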
The ICAR prior is appealing because it imposes spatial smoothing, reflecting the intuition that adjacent spatial units are more similar in terms of access to health care, resources and policies than non-neighbors. Moreover, by promoting localized spatial smoothing and information sharing from surrounding geographic areas, the ICAR prior reduces uncertainty in estimating the propensity scores and, in turn, the ATT.
2.3. Treatment Effect Estimation
Because our outcome analysis involves a two-part model, we can estimate an ATT for each part of the model. Furthermore, it is possible to combine results from both components of the hurdle model in order to estimate an ATT across the entire population of hospitalized and non-hospitalized patients. The three ATTs are more formally defined as
$$\Delta_1 = E\left[1(Y(1) > 0) - 1(Y(0) > 0) \mid Z = 1\right] \tag{6}$$
$$\Delta_2 = E\left[Y(1) \mid Y(1) > 0, Z = 1\right] - E\left[Y(0) \mid Y(0) > 0, Z = 1\right] \tag{7}$$
$$\Delta_3 = E\left[Y(1) - Y(0) \mid Z = 1\right] \tag{8}$$
where 1(Y(z) > 0) is a binary indicator of hospital admission (i.e., at least one inpatient day) under race group z. Equation (6) yields the ATT of the racial disparity in the risk of hospitalization. Equation (7) yields the ATT of racial differences in the mean number of inpatient days among those who were hospitalized. Lastly, equation (8) yields the ATT of the disparity in the mean number of inpatient days among the entire population comprising both hospitalized and non-hospitalized patients. Each of the ATTs can be of practical interest depending on the clinical or public health question at hand.
3. Model Fitting
In this work, we adopted a Bayesian model fitting approach and assigned prior distributions to all model parameters. As a default, we assigned weakly informative N(0, 1e5) priors to fixed effects and, to ensure stable posterior precision estimates, specified Ga(1, 0.5) priors for the spatial precision terms, where Ga(a, b) denotes a gamma distribution with shape parameter a and rate parameter b. We fit the propensity score and outcome models separately, thus avoiding the so-called “feedback” issue that can arise when the models are fit jointly under a fully Bayesian approach [26]. We used approximate Bayesian methods for posterior inference. Specifically, we adopted the efficient integrated nested Laplace approximation (INLA) proposed by Rue et al. [15]. INLA uses nested Laplace approximations to approximate the posterior marginal distributions of the model parameters, yielding improved computational efficiency over standard Markov chain Monte Carlo routines. This method can be readily implemented in the R package INLA (www.r-inla.org), where the “besag” model option is used to specify the ICAR prior. The posterior means of the propensity scores were then used to match individuals.
We matched individuals without replacement on the logit of the estimated propensity score, using a caliper of 0.2 times the standard deviation of the logit as recommended by Austin [18], with the R package Matching [27]. Once a matched sample was constructed, we fit the spatial hurdle model. Because INLA does not have a built-in option for fitting hurdle models, we adapted the work of Quiroz et al. [28] by constructing an N × 2 matrix with column one comprising binary indicators of whether patients were hospitalized, and column two comprising the number of inpatient days for each patient, where N is the total number of observations. We then jointly fit a binomial model to the binary portion (i.e., any inpatient days) and a zero-inflated negative binomial model to the number of inpatient days, where missing values (NAs) are used to represent non-hospitalized patients. The zero-inflated negative binomial with missing values forces INLA to fit a zero-truncated negative binomial which, combined with the binomial model for any inpatient days, yields a negative binomial hurdle model. We constructed an estimate of each of the three ATTs using “standardization”, a technique that allows us to marginalize across the population by estimating the predicted responses under both the observed and the counterfactual racial group [29]. In order to construct a credible interval (CrI) around each estimate, we used the inla.posterior.sample function within R-INLA to obtain 1000 Monte Carlo draws from the approximate posterior distribution. The mean of each of the three estimates across the 1000 samples is reported as the estimated ATT, and the corresponding 95% CrI is derived from the 2.5th and 97.5th percentiles.
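The standardization step can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes a single binary patient covariate, averages over NHB patients only, holds the spatial effects fixed across draws (in practice each posterior draw carries its own), and replaces genuine posterior samples with simulated draws. All names and numeric values are hypothetical.

```python
import math
import random
import statistics


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def truncated_mean(mu, r):
    """Mean of a zero-truncated negative binomial with mean mu and size r."""
    p0 = (r / (r + mu)) ** r            # negative binomial mass at zero
    return mu / (1.0 - p0)


def att_for_draw(draw, patients):
    """Standardized (Delta1, Delta2, Delta3) for one posterior draw."""
    d1 = d2 = d3 = 0.0
    for x, phi2, phi3 in patients:
        pi, pos_mean = {}, {}
        for z in (0, 1):                # predict under both race indicators
            eta_bin = draw["beta0"] + draw["beta1"] * x + draw["gamma"] * z + phi2
            eta_cnt = draw["delta0"] + draw["delta1"] * x + draw["theta"] * z + phi3
            pi[z] = expit(eta_bin)
            pos_mean[z] = truncated_mean(math.exp(eta_cnt), draw["r"])
        d1 += pi[1] - pi[0]
        d2 += pos_mean[1] - pos_mean[0]
        d3 += pi[1] * pos_mean[1] - pi[0] * pos_mean[0]
    n = len(patients)
    return d1 / n, d2 / n, d3 / n


def summarize(values):
    """Mean and 2.5th/97.5th percentiles across draws."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return statistics.fmean(values), cuts[0], cuts[-1]


random.seed(1)
# Hypothetical NHB patients: (covariate x, spatial effect phi2, spatial effect phi3).
patients = [(int(random.random() < 0.05), random.gauss(0, 0.3), random.gauss(0, 0.3))
            for _ in range(200)]
# Stand-in for 1000 posterior draws of the hurdle model parameters.
draws = [{"beta0": random.gauss(-1.0, 0.05), "beta1": random.gauss(0.5, 0.05),
          "gamma": random.gauss(-0.3, 0.02), "delta0": random.gauss(1.0, 0.05),
          "delta1": random.gauss(-0.5, 0.05), "theta": random.gauss(0.3, 0.02),
          "r": 1.5} for _ in range(1000)]
atts = [att_for_draw(d, patients) for d in draws]
for k, name in enumerate(("Delta1", "Delta2", "Delta3")):
    print(name, summarize([a[k] for a in atts]))
```

Computing all three contrasts within each draw, and only then summarizing across draws, keeps the uncertainty in the binary and count components linked when they are combined into the overall mean difference.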
4. Simulation Study
4.1. Data Description
We conducted a simulation study to assess the ability of the proposed spatial propensity score matching and spatial hurdle model to capture the true ATTs compared to fitting only a spatial hurdle outcome model without matching when relevant county-level covariates were left out of the analysis. We generated inpatient hospitalization data for a sample of hypothetical patients residing in three southeastern US states. To emulate the spatial layout of our application, we used the US Census county-level adjacency matrix for South Carolina, Georgia, and Alabama [30]. This matrix contains n = 272 counties and 1528 pairwise adjacencies. We generated 100 datasets; per-county sample size ni was generated uniformly within intervals defined by the quartiles of the sample sizes of our application (mean = 79.8, range: 1 to 1,385). We simulated the data according to the following propensity score (Equation 9) and outcome models (Equations 10 and 11):
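$$\text{logit}\{\Pr(Z_{ij} = 1)\} = \alpha_0 + \alpha_1 x_{ij} + \alpha_2 V_i + \phi_{1i} \tag{9}$$
$$\text{logit}(\pi_{ij}) = \beta_0 + \beta_1 x_{ij} + \beta_2 V_i + \gamma z_{ij} + \phi_{2i} \tag{10}$$
$$\log(\mu_{ij}) = \delta_0 + \delta_1 x_{ij} + \delta_2 V_i + \theta z_{ij} + \phi_{3i} \tag{11}$$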
where i = 1, …, 272, j = 1, …, ni, xij is a patient-level covariate generated from a Bernoulli(0.05) distribution and Vi is a county-level covariate generated from a N(10,3) distribution. The fixed effect coefficients were set at α0 = 0.2, α1 = −1.5, α2 = −0.1, β0 = −1.0, β1 = 0.5, β2 = 0.1, γ = −0.3, δ0 = 1, δ1 = −0.5, δ2 = 0.1 and θ = 0.3; ϕ1i, ϕ2i, and ϕ3i were simulated from ICAR models given in Equation (5) with σ2 = 1 to mimic the spatial variation observed in the case study. We then excluded the cluster-level variable Vi during the analysis to assess the impact of omitted variable bias.
4.2. Results
Table 1 presents results of the simulation study. The results indicate that fitting only an outcome model resulted in poor performance, whereas first performing spatial propensity score matching and then fitting a spatial outcome model yielded lower bias and RMSE and reasonable coverage. For example, when estimating Δ2, the mean difference in inpatient days among those who were hospitalized, fitting a propensity score model in addition to the outcome model resulted in lower bias and 90% coverage compared to 63% coverage when only an outcome model was fit. Misspecification of the model for the mean count is especially detrimental, as both the coefficients and the overdispersion parameter may be affected, potentially leading to extreme counts. The results of this simulation study support the use of spatial propensity score matching prior to fitting the spatial hurdle model, as it appears to capture the true risk difference even when county-level covariates are omitted during model fitting. As it is not uncommon for these covariates to be unavailable to the analyst, spatial matching provides a practical strategy to account for unmeasured geographic confounding.
Table 1:
Results of the simulation study: mean absolute bias, RMSE, and coverage (%) of the 95% credible intervals for the three ATTs (Equations (6) – (8)) across 100 simulated datasets when only an outcome model is fit and when both a propensity score (PS) model and an outcome model are fit, under an omitted spatial covariate scenario
| ATT | Outcome Model Only: Bias | Outcome Model Only: RMSE | Outcome Model Only: Coverage (%) | PS + Outcome Model: Bias | PS + Outcome Model: RMSE | PS + Outcome Model: Coverage (%) |
|---|---|---|---|---|---|---|
| Δ1 | 0.008 | 0.012 | 91 | 0.007 | 0.009 | 96 |
| Δ2 | 0.342 | 0.403 | 63 | 0.211 | 0.267 | 90 |
| Δ3 | 0.120 | 0.168 | 87 | 0.112 | 0.151 | 93 |
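The three performance measures reported in Table 1 can be computed from per-dataset estimates and credible intervals along the following lines. The function name and the toy inputs are hypothetical; the definitions (mean absolute bias, root mean squared error, and the percentage of intervals containing the true value) follow the table caption.

```python
import math


def performance(estimates, intervals, truth):
    """Mean absolute bias, RMSE, and percent interval coverage."""
    n = len(estimates)
    abs_bias = sum(abs(e - truth) for e in estimates) / n
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    covered = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return abs_bias, rmse, 100.0 * covered / n


# Toy example with four simulated datasets and a true ATT of 0.5.
estimates = [0.40, 0.50, 0.45, 0.70]
intervals = [(0.20, 0.60), (0.30, 0.70), (0.25, 0.65), (0.55, 0.85)]
abs_bias, rmse, coverage = performance(estimates, intervals, truth=0.5)
print(abs_bias, rmse, coverage)
```

In this toy example three of the four intervals contain the true value, so the reported coverage is 75%.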
5. Analysis of Racial Disparities in Hospitalization and Inpatient Days
5.1. Data Description
Next, we applied the spatial propensity score analysis to the veteran inpatient data. The data consist of 23,533 veterans (9,695 NHB; 13,838 NHW) with type 2 diabetes living in Georgia, Alabama and South Carolina in 2014. The mean county sample size was n = 86.5 (range: 1 to 1,385). Table 2 displays the patient-level variables that were included in the propensity score and adjusted outcome models. Figure 1 displays the per-county percent of NHB veterans, percent of veterans who experienced a hospitalization, the mean number of inpatient days among those with a hospitalization, and the mean number of inpatient days across all patients. Moran’s I statistic estimates for percent NHB, percent hospitalizations, and mean number of inpatient days were 0.405, 0.056, and 0.120, respectively (all p-values < 0.05), indicating significant spatial clustering. These results suggest spatial variation in racial composition and hospitalization patterns by county. Approximately 29.0% of the patients experienced a hospitalization in 2014 (31.3% for NHBs, 28.3% for NHWs). Among those who experienced a hospitalization, the mean number of inpatient days was 3.9 (4.9 for NHBs, 3.6 for NHWs). Across all patients – i.e., those with and without a hospitalization – the mean number of inpatient days was 1.1 (1.3 for NHBs, 1.0 for NHWs).
Table 2:
Balance of covariates between NHB and NHW veterans in pre-matched and post-matched samples; “Stand. Diff.” denotes the standardized difference
| Variable | Unmatched Sample: NHB | Unmatched Sample: NHW | Unmatched Sample: Stand. Diff. | Matched Sample: NHB | Matched Sample: NHW | Matched Sample: Stand. Diff. |
|---|---|---|---|---|---|---|
| Age | 64.22 | 70.38 | −0.619 | 66.83 | 66.90 | −0.007 |
| Comorbidity Burden | 2.54 | 2.17 | 0.224 | 2.32 | 2.32 | 0.004 |
| Male | 93.05 | 97.59 | −0.216 | 95.96 | 95.96 | −0.004 |
| Service Percent ≥ 50 | 47.14 | 36.36 | 0.219 | 43.92 | 43.96 | −0.001 |
| Married | 54.79 | 68.86 | −0.293 | 61.90 | 61.76 | 0.003 |
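The standardized differences in Table 2 can be illustrated with the commonly used pooled-variance definition. The text does not state which formula was used, so the definition below is an assumption; applied to the unmatched percentages for the binary covariates in Table 2, it gives values that agree with the tabulated entries to about three decimal places. The function names are hypothetical.

```python
import math


def std_diff_proportion(p1, p0):
    """Standardized difference for a binary covariate (proportions in [0, 1])."""
    pooled = (p1 * (1.0 - p1) + p0 * (1.0 - p0)) / 2.0
    return (p1 - p0) / math.sqrt(pooled)


def std_diff_mean(m1, s1, m0, s0):
    """Standardized difference for a continuous covariate (means and SDs)."""
    pooled = (s1 ** 2 + s0 ** 2) / 2.0
    return (m1 - m0) / math.sqrt(pooled)


# Unmatched-sample percentages from Table 2, converted to proportions.
male = std_diff_proportion(0.9305, 0.9759)
married = std_diff_proportion(0.5479, 0.6886)
print(round(male, 3), round(married, 3))
```

The continuous version requires group standard deviations, which are not reported in the table, so it is shown only for completeness.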
Figure 1:

Percent of patients who are NHB (top left), percent hospitalization (top right), mean number of inpatient days among those who were hospitalized (bottom left), and mean number of inpatients days in the entire sample comprising hospitalized and non-hospitalized patients (bottom right)
5.2. Analysis and Results
We first fit a logistic propensity score model that included patient-level covariates for age, sex, marital status, service-connected disability and comorbidity burden as well as a county-level spatial random effect. Accounting for comorbidity burden is critical, as this allows us to compare hospitalization patterns among patients with similar disease profiles. We then matched patients based on the logit of the estimated propensity score and a caliper of 0.2 times the standard deviation of the logit. The caliper discarded n = 2,687 NHB patient observations due to poor matches, thus ensuring a well-balanced sample. NHB patients who could not be matched were, on average, younger, sicker, more likely to have service-connected disabilities, more likely to be female and less likely to be married than their matched NHB counterparts. In the matched sample, 29.71% of patients (NHB: 30.5%, NHW: 28.9%) experienced a hospitalization. Among those hospitalized, the mean number of inpatient days was 4.0; across the entire matched sample, the mean number of inpatient days was 1.2 days (range: 0 to 86 days). The original sample exhibited a number of covariate imbalances. After matching, however, we observed balance across these patient features (Table 2).
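The caliper matching step can be sketched with a greedy nearest-neighbor algorithm. This is an illustrative stand-in, not the algorithm used by the R package Matching: it assumes 1:1 matching without replacement in the order the NHB patients are listed, and takes the caliper as 0.2 times the standard deviation of the logit propensity scores pooled over both groups. The function names and propensity scores are hypothetical.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def caliper_match(ps_treated, ps_control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs. Treated units with no
    available control within the caliper are left unmatched.
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    caliper = caliper_sd * statistics.pstdev(lt + lc)
    available = set(range(len(lc)))
    pairs = []
    for i, score in enumerate(lt):
        if not available:
            break
        # Closest remaining control on the logit scale (ties go to lowest index).
        j = min(sorted(available), key=lambda k: abs(lc[k] - score))
        if abs(lc[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical estimated propensity scores.
ps_nhb = [0.60, 0.55, 0.95]
ps_nhw = [0.58, 0.50, 0.20, 0.57]
pairs = caliper_match(ps_nhb, ps_nhw)
print(pairs)
```

In this toy example the third NHB patient has no NHW patient within the caliper and is discarded, which mirrors how poorly matched NHB observations were dropped from the analysis sample.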
Figure 2 displays the spatial distribution of NHB and NHW veterans in the unmatched and spatially matched samples. The spatial distribution of NHB and NHW veterans varied in the unmatched sample (Figure 2, top row), implying that NHBs and NHWs were concentrated in different areas. While a high percent of both NHB and NHW veterans lived in urban areas such as Atlanta, NHW veterans appeared to be concentrated in northern Georgia where only 0.00% to 0.03% of NHB veterans reside (lightest shade on the map). This spatial imbalance was eliminated once the spatially matched sample was created (Figure 2, bottom row). Table 3 displays the pre- and post-matched distribution of NHBs and NHWs in three selected counties. In the spatially matched sample, the distribution of NHW veterans more closely mimicked the nearly unchanged distribution of NHB veterans, indicating that spatial matching resulted in a sample of geographically matched controls.
Figure 2:

Spatial distribution (percent of racial group living in each county) between NHB and NHW veterans in unmatched (top row), and spatially matched (bottom row) samples
Table 3:
Balance of percentage of all NHB and NHW veterans living in selected counties in pre-matched and post-matched samples; “Stand. Diff.” denotes the standardized difference
| County | Unmatched Sample: NHB | Unmatched Sample: NHW | Unmatched Sample: Stand. Diff. | Matched Sample: NHB | Matched Sample: NHW | Matched Sample: Stand. Diff. |
|---|---|---|---|---|---|---|
| Fulton County, GA | 6.24 | 1.63 | 0.239 | 3.53 | 3.24 | 0.014 |
| Jefferson County, AL | 9.18 | 3.58 | 0.231 | 7.23 | 6.92 | 0.010 |
| Horry County, SC | 0.65 | 2.79 | −0.165 | 0.89 | 0.81 | 0.007 |
Once a well-balanced sample was constructed, we fit the spatial negative binomial hurdle model with the same patient-level covariates and a spatial random effect. We then used the estimated coefficients from the two parts of the model to form a standardized estimate of the difference in the risk of hospitalization (Δ1), the mean number of inpatient days among patients with a hospitalization (Δ2), and the mean number of inpatient days across all patients (Δ3). The reported 95% CrI was constructed using the 2.5 and 97.5 percentiles of the sample distribution of risk and mean differences. Estimates of the three ATTs are reported in Table 4. Negative estimates indicate that NHB veterans have a lower probability of hospitalization or mean number of inpatient days while positive estimates indicate the opposite.
Table 4:
Estimated ATTs in the racial disparity of the risk of hospitalization (Δ1), mean number of inpatient days among those hospitalized (Δ2), and mean number of inpatient days across the entire patient population comprising both hospitalized and non-hospitalized patients (Δ3)
| ATT | Estimate | 95% CrI |
|---|---|---|
| Δ1 | −0.015 | (−0.028, −0.001) |
| Δ2 | 0.431 | (−0.214, 1.138) |
| Δ3 | 0.112 | (−0.219, 0.476) |
The results in Table 4 indicate that the difference in the risk of hospitalization between NHB and NHW patients was 1.5 percentage points, with NHB patients having a lower risk of experiencing a hospitalization (−0.015, 95% CrI = [−0.028, −0.001]). Conversely, NHB patients who were hospitalized spent on average approximately one half day longer in the hospital than NHW patients (0.431 days, 95% CrI = [−0.214,1.138]). While the 95% posterior interval included 0, the posterior probability that Δ2>0 was 0.84, providing modest evidence of an increase in the mean number of inpatient days for NHB veterans. Lastly, we observed a slight increase in the mean number of inpatient days among NHBs compared to NHWs across the entire population comprising those who were and were not hospitalized (0.112 days, 95% CrI = [−0.219, 0.476]), though the 95% credible interval also included 0.
In a similar analysis that excluded spatial random effects, we observed a notable difference in the estimate of the mean number of inpatient days among those who had been hospitalized: the estimated ATT was 0.678 days (95% CrI = [0.187, 1.200]). Thus, ignoring geographic confounding would result in a potentially misleading estimate suggesting that hospitalized NHB veterans have a large, highly significant increase in length of stay compared to their NHW counterparts.
6. Discussion
We have combined recent work in propensity score modeling and spatial hurdle models for hierarchical data to understand racial differences in hospitalization and inpatient days. To conduct our analysis, we incorporated spatial random effects into the propensity score and outcome models to account for spatial variation due to potential unmeasured geographic confounders. The spatial effects were assigned ICAR priors that promote local spatial smoothing to improve small-area estimation. We performed this work within a Bayesian modeling framework using the readily available software package R-INLA.
In simulation, we explored the impact of fitting only the outcome portion of the analysis versus two-stage propensity score and outcome modeling in the presence of unknown geographic confounding. We observed favorable performance of the two-stage modeling approach across the estimation of the three ATTs: the risk difference in hospitalization, the mean number of inpatient days among those hospitalized, and the mean number of inpatient days overall. This result is consistent with recent work that demonstrates the ability of a cluster-level random effect in the propensity score model to balance on an omitted cluster-level covariate [31]. Furthermore, the analysis that included only an outcome model performed poorly, particularly in the estimation of the mean number of inpatient days as the estimation of both the coefficients and the overdispersion parameter was impacted. This outcome-model-only approach lacked the benefit of balance provided by the first stage, and its subsequent performance across measures is in line with literature that demonstrates the detriment of misspecifying the outcome model [22, 32]. In sum, these results indicate that first using spatial propensity score matching to create a balanced sample and then fitting a spatial hurdle model for the outcome is a reasonable approach when true geographic confounders may be unmeasured or unknown.
Our application study explored racial differences in hospitalization and inpatient days among a sample of diabetic veterans residing in the southeastern United States. We included patient-level covariates and a spatial random effect for each patient’s county of residence in the propensity score model. It is important to note that variables such as comorbidity burden may also be susceptible to racial perceptions and implicit bias as they require diagnoses from healthcare providers; however, here we conceptualize comorbidity burden as the aggregate effect of biological conditions on the patient’s physical well-being. We achieved both patient-level covariate balance and spatial balance in the resulting matched sample and proceeded with the spatial hurdle outcome model to estimate the three ATTs of interest. The small but statistically significant estimate for the risk difference in hospitalization between NHBs and NHWs indicated that after accounting for patient characteristics and geographic residence, NHB patients may be less likely to be hospitalized. While the estimate for the mean difference in inpatient days among those who were hospitalized was not statistically significant, potentially related to the small sample size of hospitalized individuals, it did suggest that hospitalized NHB patients on average spend a half day longer in the hospital compared to hospitalized NHW patients after accounting for disease severity. However, ignoring geographic confounding would have led to a potential overestimate of the disparity. Lastly, as race is a nonmanipulable feature, we shift our focus to the perception of race by deciders such as attending physicians [17]. Here, implicit race-related biases function as the “treatment” variable, and the counterfactual outcome is the one that would be observed if such biases were eliminated.
Studies such as ours that investigate the causal association between race and health outcomes can determine whether or not an intervention is necessary and may serve as a prelude to future studies on interventions such as implicit bias training to address racially sensitive clinical decision making [33]. This goal aligns with the VA’s mission to equitably deliver high-quality care and address racial and ethnic disparities within its healthcare system [34].
In ongoing work, we are examining the impact of addressing geographic confounding through spatial matching compared to traditional patient-level matching. As it is also possible to estimate the ATT using weighting methods [35], our approach could be extended to other propensity score techniques to address geographic confounding. Future work might also explore longitudinal trends in hospitalization and inpatient stays between NHB and NHW patients. Additionally, as this work is restricted to hospitalizations within the VA, additional database resources may be explored to understand racial differences among the broader patient population who may be receiving inpatient care, namely emergency or trauma care, at local or specialized hospitals outside of the VA.
Supplementary Material
Acknowledgments
This work was supported in part by Merit Award HX002299-01A2 from the U.S. Department of Veterans Affairs Health Services Research and Development Program. The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government. Dr. Nietert’s time on this project was funded by grants from the NIH (NCATS grant # UL1-TR001450, NIAMS grant # P30-AR072582, and NIGMS grant # U54-GM104941).
References
- [1] Menke A, Casagrande S, Geiss L, Cowie C, Prevalence of and trends in diabetes among adults in the United States, 1988–2012, JAMA 314 (10) (2015) 1021–1029.
- [2] Currie CJ, Morgan CL, Peters JR, The epidemiology and cost of inpatient care for peripheral vascular disease, infection, neuropathy, and ulceration in diabetes, Diabetes Care 21 (1) (1998) 42–48.
- [3] American Diabetes Association, Economic costs of diabetes in the U.S. in 2012, Diabetes Care.
- [4] Kalra AD, Fisher RS, Axelrod P, Decreased length of stay and cumulative hospitalized days despite increased patient admissions and readmissions in an area of urban poverty, Journal of General Internal Medicine 25 (9) (2010) 930–935.
- [5] Kominski GF, Morisky DE, Afifi AA, Kotlerman JB, The effect of disease management on utilization of services by race/ethnicity: Evidence from the Florida Medicaid program, The American Journal of Managed Care 14 (3) (2008) 168–172.
- [6] Schneider E, Zaslavsky A, Epstein A, Racial disparities in the quality of care for enrollees in Medicare managed care, JAMA 287 (10) (2002) 1288–1294.
- [7] Fisher ES, Wennberg JE, Stukel TA, Skinner JS, Sharp SM, Freeman JL, Gittelsohn AM, Associations among hospital capacity, utilization, and mortality of US Medicare beneficiaries, controlling for sociodemographic factors, Health Services Research 34 (6) (2000) 1351–1362.
- [8] Ashton CM, Petersen NJ, Souchek J, Menke TJ, Yu H-J, Pietz K, Eigenbrodt ML, Barbour G, Kizer KW, Wray NP, Geographic variations in utilization rates in Veterans Affairs hospitals and clinics, New England Journal of Medicine 340 (1) (1999) 32–39.
- [9] McCormick D, Hanchate AD, Lasser KE, Manze MG, Lin M, Chu C, Kressin NR, Effect of Massachusetts healthcare reform on racial and ethnic disparities in admissions to hospital for ambulatory care sensitive conditions: retrospective analysis of hospital episode statistics, BMJ 350.
- [10] Institute of Medicine (US), How far have we come in reducing health disparities? Progress since 2000: Workshop summary.
- [11] Green A, Carney D, Pallin D, Ngo L, Raymond K, Iezzoni L, Banaji M, Implicit bias among physicians and its prediction of thrombolysis decisions for black and white patients, Journal of General Internal Medicine 22 (9) (2007) 1231–1238.
- [12] Sabin J, Nosek B, Greenwald A, Rivara F, Physicians’ implicit and explicit attitudes about race by MD race, ethnicity, and gender, Journal of Health Care for the Poor and Underserved 20 (3) (2009) 896–913.
- [13] Stuart EA, Matching methods for causal inference: A review and a look forward, Statistical Science 25 (1) (2010) 1–21.
- [14] Mullahy J, Specification and testing of some modified count data models, Journal of Econometrics 33 (3) (1986) 341–365.
- [15] Rue H, Martino S, Chopin N, Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2) (2009) 319–392.
- [16] Rubin DB, Estimating causal effects of treatments in randomized and non-randomized studies, Journal of Educational Psychology 66 (5) (1974) 688–701.
- [17] Greiner DJ, Rubin DB, Causal effects of perceived immutable characteristics, The Review of Economics and Statistics 93 (3) (2011) 775–785.
- [18] Austin PC, An introduction to propensity score methods for reducing the effects of confounding in observational studies, Multivariate Behavioral Research 46 (3) (2011) 399–424.
- [19] Rosenbaum PR, Rubin DB, The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1) (1983) 41–55.
- [20] Rosenbaum PR, Rubin DB, Constructing a control group using multivariate matched sampling methods that incorporate the propensity score, The American Statistician 39 (1) (1985) 33–38.
- [21] Arpino B, Mealli F, The specification of the propensity score in multilevel observational studies, Computational Statistics and Data Analysis 55 (4) (2011) 1770–1780.
- [22] Li F, Zaslavsky AM, Landrum MB, Propensity score weighting with multilevel data, Statistics in Medicine 32 (19) (2013) 3373–3387.
- [23] Arpino B, Cannas M, Propensity score matching with clustered data. An application to the estimation of the impact of caesarean section on the Apgar score, Statistics in Medicine 35 (20).
- [24] Besag J, York J, Mollié A, Bayesian image restoration, with two applications in spatial statistics, Annals of the Institute of Statistical Mathematics 43 (1) (1991) 1–20.
- [25] Banerjee S, Carlin BP, Gelfand AE, Hierarchical modeling and analysis for spatial data, 2nd Edition, CRC Press, Taylor and Francis Group, Boca Raton, Fla, 2014.
- [26] McCandless LC, Gustafson P, Austin PC, Bayesian propensity score analysis for observational data, Statistics in Medicine 28 (1) (2009) 94–112.
- [27] Sekhon JS, Multivariate and propensity score matching software with automated balance optimization: The Matching package for R, Journal of Statistical Software 42 (7).
- [28] Quiroz ZC, Prates MO, Rue H, A Bayesian approach to estimate the biomass of anchovies off the coast of Peru, Biometrics 71 (1) (2015) 208–217.
- [29] Hernan M, Robins J, Estimating causal effects from epidemiological data, Journal of Epidemiology and Community Health 60 (7) (2006) 578–586.
- [30] U.S. Census Bureau, TIGER/Line Shapefiles (2014).
- [31] Schuler M, Chu W, Coffman D, Propensity score weighting for a continuous exposure with multilevel data, Health Services & Outcomes Research Methodology 16 (4) (2016) 271–292.
- [32] Drake C, Effects of misspecification of the propensity score on estimators of treatment effect, Biometrics 49 (4) (1993) 1231–1236.
- [33] Glass T, Goodman S, Hernan M, Samet J, Causal inference in public health, Annual Review of Public Health 34 (1) (2013) 61–75.
- [34] Saha S, Freeman M, Toure J, Tippens K, Weeks C, Racial and Ethnic Disparities in the VA Healthcare System: A Systematic Review, Executive Summary, Washington DC, 2007.
- [35] Hirano K, Imbens G, Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization, Health Services & Outcomes Research Methodology 2 (2001) 259–278.
