Abstract
Purpose
Despite the growing popularity of Propensity Score (PS) methods in studies of ethnic disparities, many researchers lack a clear understanding of when to use PS methods in place of conventional regression models. One such scenario is presented here: the relationship between ethnicity and primary care utilization is both confounded and modified by socioeconomic status (SES). In this setting, standard regression fails to produce an overall disparity estimate, whereas PS methods can produce one through the choice of a reference sample (RS) to which the effect estimate is generalized.
Methods
Using data from the National Alcohol Surveys, ethnic disparities between whites and Hispanics in access to primary care were estimated using PS methods (PS stratification and weighting), standard logistic regression, and the marginal effects from logistic regression models incorporating effect modification.
Results
Whites, Hispanics, and the combined white/Hispanic sample were each used as the RS. Two PS strategies generated disparity estimates that differed from those of standard logistic regression but were similar to the marginal odds ratios (ORs) from logistic regression models that included ethnicity-by-covariate interactions.
Conclusions
When effect modification is present, PS estimates are comparable to marginal estimates from regression models incorporating effect modification. The estimation process requires a priori hypotheses to guide selection of the RS.
Keywords: Propensity Scores, multivariable regression, marginal effect, confounding, effect modification, reference sample, disparity
INTRODUCTION
Epidemiological studies investigating racial/ethnic disparities in health have grown exponentially over the past few decades (1), in conjunction with the Institute of Medicine’s groundbreaking report on disparities in health care (2) and the U.S. national objectives put forth in Healthy People 2000–2020 (3–5). Until recently, disparities research relied heavily upon conventional regression modeling to document inequalities in health and health care across racial/ethnic groups (1). One common problem with regression, however, is that the relationship between ethnicity and the health outcomes of interest is often confounded with, and modified by, socioeconomic status (SES) (6). This study addresses why and how newer methods based on Propensity Scores (PS) are particularly well suited for disparities research, although they are relevant to any area of epidemiologic research where effect modification is similarly of concern.
There is lively debate over when PS methods offer benefits over the standard multivariable linear or logistic regression typically used in disparities research (7–10). Although there is empirical support for the advantages of PS methods (11, 12), in practice they often generate results quite similar to those from multivariable regression (13, 14). This paper focuses on one common scenario in which PS methods should, in theory, produce more interpretable effect estimates than multivariable regression: when both the distribution of one or more confounding covariates (e.g., SES) and the relationship between these covariates and the health outcome vary across the ethnic groups being compared. Entering the relevant interaction terms addresses the effect modification, but the regression model can then produce disparity estimates only at specific SES levels and fails to generate the single overall disparity estimate that is desired in most practical cases. Without interactions, disparity estimates will, in general, be biased.
With control variables in the model, standard regression generates conditional effect estimates. In contrast, PS methods produce marginal effect estimates that can be interpreted counterfactually. In the language of causal inference, which often focuses on the effect of a treatment or of exposure to a risk factor, PS methods estimate the difference between the outcome for a given population when it is treated/exposed and the outcome for the same population when it is not treated/exposed. PS methods thus estimate the marginal effect by design, largely through specifying a Reference Sample (RS) to which the effect estimate is generalized. The RS most often used is either the “treated” (“risk-exposed”) sample or the combined sample of those who were and were not treated/risk-exposed (15–17). Because regression generates conditional effect estimates and PS methods produce marginal estimates, questions have been raised about the comparability of the two approaches in non-linear models (10, 18). Specifically, for a dichotomous outcome (as in the present study), it has been suggested that marginal Odds Ratios (ORs) should be generated from both PS and logistic regression for the two methods to be comparable (19, 20). Little is known, however, about the comparability of the two approaches when effect modification is present.
In the current study of ethnic disparities, we compare estimates from PS approaches using various RSs with both the conditional and the marginal ORs produced from ordinary logistic regression. We show how marginal ORs can be produced from logistic regression incorporating effect modification and that they are comparable to PS estimates. While some recent work has examined the performance of PS methods under effect modification (21–23), this study compares substantive results obtained from PS and logistic regression models and also shows how varying the specification of the RS can influence the results.
MATERIALS AND METHODS
Dataset and measures
Our empirical analysis focuses on the timely issue of racial/ethnic disparities between white and Hispanic Americans in access to primary care interventions for alcohol problems (24). Data come from the combined 2000 and 2005 U.S. National Alcohol Surveys (NAS), two comparable probability samples of U.S. adults collected using computer-assisted telephone interviews via random digit dialing (25). Included in this analysis are “at-risk” drinkers who meet the NIAAA criteria for at-risk drinking, defined as men/women drinking more than 4/3 drinks on any day or more than 14/7 drinks per week (26). Self-identified ethnicity, based on the current U.S. Census definition, is the key risk-exposure variable in this analysis. Only those self-identifying as white (N=2798) or Hispanic (N=684) were retained for analysis, as the greatest disparities have been found between these two groups (24, 27, 28). The outcome of interest is the subject’s report of whether she or he had one or more primary care visits in the prior year. Any visit with a private doctor, clinic, or non-emergency medical setting during the prior year counted as a primary care visit. Demographic and SES covariates were used as potential confounders and effect modifiers of interest, including gender, age, education, annual household income, and health insurance coverage.
Statistical Analysis
PS Stratification and Weighting
A PS is defined as the estimated probability of being Hispanic versus white and is modeled as a function of gender, age, education, annual household income, and health insurance coverage using logistic regression. Two approaches that use the estimated PS to estimate ethnic disparities are considered in the present study: PS stratification and PS weighting.
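The PS model above is an ordinary logistic regression of ethnicity on the covariates. The sketch below, assuming the covariates have already been dummy-coded into a design matrix whose rows start with 1.0 for the intercept, fits such a model by Newton-Raphson in plain Python and returns each subject's estimated PS ê(X). The function names `fit_logistic` and `propensity_scores` and the toy data are hypothetical; this is an illustration of the technique, not the authors' code, and in practice a statistical package would be used.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(design, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    design: rows of covariate values, each starting with 1.0 (intercept).
    y: 0/1 responses (here 1 = Hispanic, 0 = white).
    """
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for row, yi in zip(design, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    info[a][c] += w * row[a] * row[c]
        step = _solve(info, grad)             # Newton step (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def propensity_scores(design, beta):
    """Estimated PS e(X) = P(Hispanic | X) for each row of the design matrix."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for row in design]


# Toy data: one hypothetical dummy covariate, 10 subjects at each level.
design = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
hispanic = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
beta = fit_logistic(design, hispanic)
ps = propensity_scores(design, beta)
```

With a single dummy covariate the model is saturated, so the fitted PS simply equals the observed share of Hispanics in each covariate cell (0.3 and 0.7 in the toy data); with several main-effect covariates, as in the paper, this is no longer the case.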
PS stratification (15, 16) classifies subjects into five strata (quintiles) based on their estimated PS. This process is repeated three times, using “Hispanics”, “whites”, and the “combined white and Hispanic sample” in turn as the RS. When Hispanics are the RS, the Hispanic sample is divided into five equal-size strata based on the sorted PS distribution within the Hispanic sample. Using these quintile thresholds, the comparison white sample is then divided into five groups, which will generally be of unequal sizes because the PS distribution differs between whites and Hispanics. The procedure is analogous with whites as the RS. When the combined white/Hispanic sample is the RS, the pooled sample is divided into five equal-size strata based on the sorted PS distribution of the total sample. For each choice of RS, the overall response probability is a weighted average of the stratum-specific response probabilities, with each stratum weighted by the share of the RS it contains. Because the RS is divided into five equal-size strata in the current design, the five strata receive equal weights, and the overall response probability is the simple average of the stratum-specific probabilities. The marginal OR estimate is then derived from the overall response probabilities (see the Appendix for details, as well as (19)).
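The stratification steps above can be sketched as follows, assuming the PS values `ps`, a white indicator `white` (1 = white, 0 = Hispanic), and a 0/1 outcome `y` are plain Python lists. The function name, the RS labels, the quantile convention, and the tie rule (a PS equal to a cut point goes to the higher stratum) are illustrative assumptions; statistical packages define sample quantiles in slightly different ways. Each stratum is weighted by its share of the RS, which reduces to the equal 1/5 weights described above when the RS strata have equal sizes.

```python
from bisect import bisect_right


def ps_stratified_marginal_or(ps, white, y, rs, n_strata=5):
    """Marginal OR (whites vs Hispanics) by PS stratification standardized to an RS.

    rs: "hispanic", "white", or "combined" (hypothetical labels for the three RSs).
    Returns (p1, p0, odds_ratio): overall white and Hispanic response probabilities.
    """
    membership = {
        "hispanic": [z == 0 for z in white],
        "white": [z == 1 for z in white],
        "combined": [True] * len(white),
    }
    if rs not in membership:
        raise ValueError(f"unknown reference sample: {rs!r}")
    in_rs = membership[rs]

    # Quantile cut points come from the sorted PS distribution of the RS only.
    rs_ps = sorted(e for e, r in zip(ps, in_rs) if r)
    n_rs = len(rs_ps)
    cuts = [rs_ps[(q * n_rs) // n_strata] for q in range(1, n_strata)]
    # Everyone, RS and non-RS alike, is classified with the same thresholds.
    stratum = [bisect_right(cuts, e) for e in ps]

    p = {1: 0.0, 0: 0.0}
    for s in range(n_strata):
        # Standardization weight: share of the RS falling in stratum s.
        w_s = sum(1 for st, r in zip(stratum, in_rs) if r and st == s) / n_rs
        if w_s == 0:
            continue  # only with tied PS values or a very small RS
        for g in (1, 0):
            ys = [yi for yi, st, z in zip(y, stratum, white) if st == s and z == g]
            if not ys:
                raise ValueError(f"stratum {s + 1} has no subjects with white={g}")
            p[g] += w_s * sum(ys) / len(ys)

    p1, p0 = p[1], p[0]
    return p1, p0, (p1 / (1 - p1)) / (p0 / (1 - p0))


# Toy data: five PS levels, 4 Hispanics per level (1 with a visit) and
# 10, 8, 6, 4, 2 whites per level (half with a visit).
ps, white, y = [], [], []
for s, e in enumerate([0.1, 0.2, 0.3, 0.4, 0.5]):
    for k in range(4):
        ps.append(e)
        white.append(0)
        y.append(1 if k == 0 else 0)
    for k in range(2 * (5 - s)):
        ps.append(e)
        white.append(1)
        y.append(k % 2)

for rs in ("hispanic", "white", "combined"):
    print(rs, ps_stratified_marginal_or(ps, white, y, rs))
```

Because the toy response rates (0.5 for whites, 0.25 for Hispanics) are the same in every stratum, all three RSs give an OR of 3; with stratum-varying rates, as in Table 2, the choice of RS changes the answer.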
For PS weighting (29), subjects in the two ethnic groups are weighted by their estimated PS to construct a ‘pseudo-population’ in which the confounders are no longer associated with ethnicity. When either the white or the Hispanic group is the RS, “Standardized Mortality Ratio” (SMR) weights are used: the group chosen as the RS receives a weight of 1, and the non-RS receives the propensity odds of belonging to the RS, i.e., ê(X)/(1 − ê(X)) for whites when Hispanics are the RS and (1 − ê(X))/ê(X) for Hispanics when whites are the RS, where ê(X) is the estimated PS (the probability of being Hispanic). When the combined sample is the RS, the Inverse-Probability-of-Treatment-Weighted (IPTW) estimator is used, with weights equal to the inverse of the PS, 1/ê(X), for Hispanics and the inverse of 1 minus the PS, 1/(1 − ê(X)), for whites. The marginal ORs are estimated by fitting a logistic regression with the ethnicity indicator as the only predictor and the PS weights used as sampling weights in the model estimation (29).
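A minimal sketch of the three weighting schemes follows, assuming ê(X) is the estimated probability of being Hispanic, so that the SMR weight for the non-RS group is its odds of belonging to the RS. Instead of fitting a weighted logistic regression, it computes the weighted outcome proportions directly: with ethnicity as the only (binary) predictor that model is saturated, so its fitted probabilities equal these weighted proportions and the OR is the same. The function names and toy data are hypothetical; the toy PS values are the true cell proportions, so the weighted covariate means should balance exactly.

```python
def ps_weights(ps, white, rs):
    """PS weights for the three reference samples.

    ps: estimated probability of being Hispanic, e(X); white: 1 = white, 0 = Hispanic.
    """
    weights = []
    for e, z in zip(ps, white):
        if rs == "hispanic":    # SMR weights: Hispanics 1, whites e/(1 - e)
            weights.append(1.0 if z == 0 else e / (1.0 - e))
        elif rs == "white":     # SMR weights: whites 1, Hispanics (1 - e)/e
            weights.append(1.0 if z == 1 else (1.0 - e) / e)
        elif rs == "combined":  # IPTW: Hispanics 1/e, whites 1/(1 - e)
            weights.append(1.0 / e if z == 0 else 1.0 / (1.0 - e))
        else:
            raise ValueError(f"unknown reference sample: {rs!r}")
    return weights


def weighted_mean_by_group(values, white, weights):
    """Weighted mean of `values` for whites (1) and for Hispanics (0)."""
    total = {1: 0.0, 0: 0.0}
    wsum = {1: 0.0, 0: 0.0}
    for v, z, w in zip(values, white, weights):
        total[z] += w * v
        wsum[z] += w
    return total[1] / wsum[1], total[0] / wsum[0]


def weighted_marginal_or(y, white, weights):
    """Marginal OR (whites vs Hispanics) from weighted response proportions."""
    p1, p0 = weighted_mean_by_group(y, white, weights)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Toy data: one binary covariate x; whites are concentrated at x = 0.
x = [0] * 8 + [1] * 8
white = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6
ps = [0.25] * 8 + [0.75] * 8   # P(Hispanic | x) in this toy data
y = x                          # outcome depends on x only, not on ethnicity

for rs in ("hispanic", "white", "combined"):
    w = ps_weights(ps, white, rs)
    print(rs, weighted_mean_by_group(x, white, w), weighted_marginal_or(y, white, w))
```

In each weighted pseudo-population the covariate mean is the same in both groups (0.75, 0.25, and 0.5 for the Hispanic, white, and combined RS, respectively), and since the toy outcome depends only on x, the weighted marginal OR is 1.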
Standardization and Reference Sample
Standardization is the traditional epidemiologic approach for obtaining an overall effect estimate when a confounder modifies the relationship between a risk factor and the outcome. Standardization takes a weighted average of the stratum-specific rates or risks, with strata defined by the effect modifier and weights corresponding to the number of persons in the RS falling into each category of the effect modifier (30). As noted above, disparity estimates from the PS stratification method are weighted averages of stratum-specific estimates, which is essentially a standardization process in which the PS serves as the effect modifier (see the illustration in Appendix 1 of (29)). Sato and Matsuyama (31) also show that PS weighting allows nonparametric standardization using either the exposure group or the total combined sample as the RS.
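To make the weighting arithmetic concrete, the sketch below applies direct standardization to a single covariate using the published Table 1 percentages: the education-specific rates of any primary care among Hispanics are averaged with weights taken from the white education distribution (whites as the RS). This one-covariate calculation is only an illustration of standardization, not one of the paper's analyses, and `standardize` is a hypothetical helper name.

```python
def standardize(rates, rs_distribution):
    """Directly standardized rate: RS-weighted average of stratum-specific rates."""
    total = sum(rs_distribution)  # the rounded white percentages sum to 100.1
    return sum(r * w for r, w in zip(rates, rs_distribution)) / total


# Table 1, education (<HS grad, HS grad, some college, college grad).
white_dist = [5.3, 25.4, 30.4, 39.0]      # % of whites in each category
white_rates = [33.3, 29.6, 35.6, 38.6]    # % of whites with any primary care
hispanic_rates = [8.1, 15.7, 23.9, 30.8]  # % of Hispanics with any primary care

# Hispanic rate had Hispanics the white education distribution (whites as RS),
# and the white rate over the same distribution for comparison.
hisp_std = standardize(hispanic_rates, white_dist)
white_std = standardize(white_rates, white_dist)
print(round(hisp_std, 1), round(white_std, 1))  # → 23.7 35.1
```

The crude Hispanic rate implied by Table 1 is about 18%, so standardizing education alone to the white distribution narrows, but does not close, the gap with whites (about 35%).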
Conditional and Marginal ORs from Logistic Regression
We first fit a standard logistic regression model, which generates the conditional OR as the ethnic disparity estimate. Using the fitted model, two predicted response probabilities are generated for each individual by plugging the actual values of their observed covariates (other than ethnicity) into the regression model. Regardless of an individual’s actual ethnicity, the first predicted response probability is generated by assuming the individual is white, and the second by assuming he/she is Hispanic. The white and Hispanic marginal response probabilities are then estimated as the simple averages of these individual predictions across all respondents (thus using the total combined sample as the RS), with everyone assumed to be white and, separately, Hispanic. Finally, the marginal OR is derived from these two marginal response probabilities. Alternatively, the marginal response probabilities, and the corresponding marginal ORs, can be generated by averaging the individual predicted probabilities across the white or the Hispanic sample only, thus using the white or Hispanic sample as the RS. See the Appendix for details and also (19).
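The averaging of counterfactual predictions described above (sometimes called g-computation or marginal standardization) can be sketched for a main-effects model as follows. The coefficients in the usage example are made-up illustrative values, not the paper's estimates, and `expit` and `marginal_or_standard` are hypothetical helper names.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_or_standard(b0, bz, bx, covariates, rs_mask=None):
    """Marginal OR (whites vs Hispanics) from a main-effects logistic model.

    Model: logit P(Y=1) = b0 + bz*Z + bx'X, with Z = 1 for whites, 0 for Hispanics.
    covariates: list of covariate vectors x_i; rs_mask: optional booleans selecting
    the RS (default: the whole combined sample).
    """
    if rs_mask is None:
        rs_mask = [True] * len(covariates)
    rs = [x for x, keep in zip(covariates, rs_mask) if keep]
    lin = [b0 + sum(b * v for b, v in zip(bx, x)) for x in rs]
    p1 = sum(expit(eta + bz) for eta in lin) / len(rs)  # everyone set to white
    p0 = sum(expit(eta) for eta in lin) / len(rs)       # everyone set to Hispanic
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical model with a conditional OR of 2 and one strong covariate.
marginal = marginal_or_standard(-1.0, math.log(2.0), [2.0], [[0.0], [1.0]])
print(marginal)
```

Even without effect modification, the marginal OR (about 1.73 here) is closer to 1 than the conditional OR of 2, which illustrates the non-collapsibility of ORs; if the covariate had no effect (bx = 0), the two would coincide.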
Marginal ORs from logistic regression incorporating effect modification
Here, interaction terms are explicitly built into the regression model to generate ethnic-specific coefficients. The individual predicted response probabilities for the white and Hispanic effects, the marginal response probabilities, and the corresponding marginal ORs are all derived in the same manner as described above for the standard logistic regression model; more details are given in the Appendix. Unlike the above procedure, however, the presence of effect modification (e.g., of the ethnicity effect by SES) requires the appropriate covariate and coefficient values to be used in constructing the marginal probabilities. For example, the counterfactual Hispanic outcome for a white woman is obtained by plugging her actual SES into the regression but using the estimated Hispanic SES coefficient (rather than the one estimated for whites). Averaged over whites, the marginal response probability thus derived can be interpreted as the predicted outcome for Hispanics had they the same covariate distribution as whites.
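The sketch below extends the previous idea to a model with ethnic-specific intercepts and slopes (no common intercept), which gives the same fitted values as separate logistic models fitted within each ethnic group. Every RS member receives two counterfactual predictions computed from his or her own covariates: one with the white coefficients and one with the Hispanic coefficients. The coefficients and covariate pattern are hypothetical, chosen so that the SES effect is larger for Hispanics and whites are concentrated at higher SES; `marginal_or_interaction` is a hypothetical name.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_or_interaction(coef_white, coef_hisp, covariates, rs_mask=None):
    """Marginal OR (whites vs Hispanics) with ethnic-specific coefficients.

    coef_white = (b10, bx1) and coef_hisp = (b00, bx0): intercept and covariate
    slopes for whites and for Hispanics, respectively.
    """
    if rs_mask is None:
        rs_mask = [True] * len(covariates)
    rs = [x for x, keep in zip(covariates, rs_mask) if keep]

    def predict(coef, x):
        b0, bx = coef
        return expit(b0 + sum(b * v for b, v in zip(bx, x)))

    p1 = sum(predict(coef_white, x) for x in rs) / len(rs)  # white effect
    p0 = sum(predict(coef_hisp, x) for x in rs) / len(rs)   # Hispanic effect
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical SES dummy (1 = higher SES); 10 whites followed by 10 Hispanics.
coef_white = (-0.6, [0.3])
coef_hisp = (-2.0, [1.5])
covariates = [[0.0]] * 3 + [[1.0]] * 7 + [[0.0]] * 7 + [[1.0]] * 3
is_white = [True] * 10 + [False] * 10

or_white_rs = marginal_or_interaction(coef_white, coef_hisp, covariates, is_white)
or_hisp_rs = marginal_or_interaction(
    coef_white, coef_hisp, covariates, [not w for w in is_white])
print(f"whites as RS: {or_white_rs:.2f}, Hispanics as RS: {or_hisp_rs:.2f}")
```

Because the ethnic gap is smaller at higher SES in this toy model, standardizing to the lower-SES Hispanic sample gives a larger marginal OR (about 2.46) than standardizing to whites (about 1.58), the same qualitative pattern as in Table 4.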
RESULTS
We first examined differences between white and Hispanic at-risk drinkers in the distributions of demographic and SES covariates, as well as evidence of effect modification. Table 1 shows pronounced socio-demographic differences between the two ethnic groups: compared with whites, Hispanics were more likely to be male, younger, less educated, lower income, and uninsured. The right-hand panel of Table 1 shows the percentages of white and Hispanic at-risk drinkers receiving any primary care across the levels of each covariate. The ethnic difference in outcome rates is smaller among those with higher income and education than among those with lower levels. The presence of effect modification is also indicated by the large differences in the magnitude of the within-group χ2 statistics between the two ethnic groups.
Table 1.
Distribution of Covariates across White and Hispanic at Risk Drinkers and Their Association with Primary Care Use within Ethnic Groups
| Covariate | Distribution (%), White (N=2798) | Distribution (%), Hispanics (N=684) | Any primary care (%), White (N=2798) | Any primary care (%), Hispanics (N=684) |
|---|---|---|---|---|
| Sex | χ2(1)=26.9*** | | χ2(1)=1.21 | χ2(1)=7.3** |
| Male | 52.4 | 63.5 | 33.1 | 15.0 |
| Female | 47.6 | 36.5 | 37.3 | 23.2 |
| Age | χ2(3)=108.1*** | | χ2(3)=0.71 | χ2(3)=2.3 |
| 18–29 | 26.7 | 45.9 | 32.7 | 15.9 |
| 30–49 | 51.4 | 42.8 | 36.4 | 20.1 |
| 50–64 | 16.5 | 9.5 | 35.5 | 16.9 |
| 65+ | 5.4 | 1.8 | 34.4 | 25.0 |
| Education | χ2(3)=402.8*** | | χ2(3)=3.91 | χ2(3)=30.6*** |
| <HS grad | 5.3 | 28.9 | 33.3 | 8.1 |
| HS grad | 25.4 | 30.7 | 29.6 | 15.7 |
| Some college | 30.4 | 23.2 | 35.6 | 23.9 |
| College grad | 39.0 | 17.1 | 38.6 | 30.8 |
| Income | χ2(6)=233.7*** | | χ2(6)=4.21 | χ2(6)=27.3*** |
| ≤ 10k | 6.8 | 17.0 | 30.9 | 10.3 |
| 10,001–20k | 8.5 | 21.2 | 27.8 | 13.8 |
| 20,001–30k | 11.1 | 14.5 | 34.4 | 12.1 |
| 30,001–40k | 12.1 | 11.3 | 31.1 | 22.5 |
| 40,001–60k | 19.1 | 11.3 | 39.1 | 32.5 |
| >60k | 34.6 | 15.8 | 37.7 | 26.9 |
| Missing | 7.9 | 9.1 | 32.7 | 12.9 |
| Insurance | χ2(4)=304.6*** | | χ2(4)=5.31 | χ2(4)=33.3*** |
| None | 10.0 | 35.7 | 25.7 | 8.2 |
| Private | 74.5 | 48.7 | 36.5 | 26.4 |
| Medicaid | 1.8 | 2.8 | 41.2 | 10.5 |
| Medicare/Fed | 9.7 | 7.0 | 38.2 | 14.6 |
| Other | 3.4 | 5.8 | 22.5 | 15.0 |

In each covariate heading row, the χ2 statistic under the distribution columns tests the difference in the covariate distribution between whites and Hispanics; the χ2 statistics under the primary care columns test the association between the covariate and primary care use within each ethnic group.
χ2 test statistics for whites are derived by normalizing the sample size of whites to that of Hispanics (N=684).
* P<0.05, ** P<0.01, *** P<0.001
We examined the region of common support and evaluated the balancing quality of the two PS approaches. The estimated PS (the probability of being Hispanic) ranged from 0.039 to 0.823 for the Hispanic sample and from 0.024 to 0.823 for whites. The region of non-overlap was very small, and a sensitivity analysis restricted to the region of common support generated similar results. Overall, PS weighting balanced the covariates better than PS stratification. Balance was assessed for the 21 dummy variables derived from the demographic and SES measures, separately with Hispanics, whites, and the combined sample as the RS (63 tests altogether). Using PS weighting, no significant differences between whites and Hispanics were found for any covariate: all comparisons had p>0.10, and only 2 of the 63 had p<0.20. In contrast, when balance under PS stratification was assessed for all covariates separately within each of the 5 strata, about 9–13% of comparisons showed significant differences at p<0.05 for the three RSs, with the 5th stratum (i.e., those with the largest PS) accounting for the majority of the unbalanced covariates.
Table 2 shows the effect estimates, both overall and within each stratum, using the PS stratification method. With Hispanics as the RS (top third of Table 2), the five strata each contained a similar number of Hispanics, as expected. The middle third of Table 2 shows results with whites as the RS, for which the strata contain similar numbers of whites, and the bottom third shows the analogous results with the combined white/Hispanic sample as the RS. In general, larger ORs, and hence larger disparity estimates, were observed for strata with higher PS for all three choices of RS. With Hispanics as the RS, four strata had ORs significantly different from 1.00, whereas with whites as the RS only two strata did. This suggests that the overall ethnic effect would be larger with Hispanics rather than whites as the RS. As shown in the last rows of the three sections of Table 2, the overall marginal ORs were 1.99, 1.44, and 1.57 using Hispanics, whites, and the combined white/Hispanic sample as the RS, respectively.
Table 2.
Estimated Proportions and Odds Ratios (ORs) of any Primary Care Use for whites and Hispanics, for Strata and Combined Overall across Strata for the PS Stratification Method, using Hispanics, Whites and the Combined Sample as Reference Samples
| Stratum | Ethnicity | N | Mean (SE) | Difference (SE) | OR (whites vs Hispanics) |
|---|---|---|---|---|---|
| PS stratification, Hispanics as reference sample | | | | | |
| 1 | White | 1470 | 0.381 (0.013) | 0.022 (0.042) | 1.10 |
| | Hispanics | 142 | 0.359 (0.040) | | |
| 2 | White | 789 | 0.346 (0.017) | 0.128 (0.040)** | 1.90** |
| | Hispanics | 133 | 0.218 (0.036) | | |
| 3 | White | 321 | 0.308 (0.026) | 0.157 (0.040)*** | 2.51** |
| | Hispanics | 139 | 0.151 (0.030) | | |
| 4 | White | 160 | 0.231 (0.033) | 0.134 (0.042)** | 2.80*** |
| | Hispanics | 134 | 0.097 (0.026) | | |
| 5 | White | 58 | 0.241 (0.056) | 0.175 (0.060)** | 4.49** |
| | Hispanics | 136 | 0.066 (0.021) | | |
| Overall | White | 2798 | 0.302 (0.033)¹ | | |
| | Hispanics | 684 | 0.178 (0.031)¹ | 0.123 (0.045)** | 1.99 (1.58, 2.51)² |
| PS stratification, whites as reference sample | | | | | |
| 1 | White | 572 | 0.406 (0.021) | 0.011 (0.082) | 1.05 |
| | Hispanics | 38 | 0.395 (0.079) | | |
| 2 | White | 553 | 0.354 (0.020) | 0.054 (0.063) | 1.28 |
| | Hispanics | 60 | 0.300 (0.059) | | |
| 3 | White | 562 | 0.386 (0.021) | 0.035 (0.059) | 1.16 |
| | Hispanics | 74 | 0.351 (0.055) | | |
| 4 | White | 552 | 0.330 (0.020) | 0.118 (0.046)* | 1.83* |
| | Hispanics | 99 | 0.212 (0.041) | | |
| 5 | White | 559 | 0.279 (0.019) | 0.175 (0.024)*** | 3.33*** |
| | Hispanics | 413 | 0.104 (0.015) | | |
| Overall | White | 2798 | 0.351 (0.020)¹ | | |
| | Hispanics | 684 | 0.272 (0.054)¹ | 0.079 (0.058) | 1.44 (1.12, 1.87)² |
| PS stratification, combined whites/Hispanics as reference sample | | | | | |
| 1 | White | 751 | 0.403 (0.018) | 0.101 (0.072) | 1.56 |
| | Hispanics | 43 | 0.302 (0.070) | | |
| 2 | White | 571 | 0.354 (0.020) | 0.009 (0.066) | 1.04 |
| | Hispanics | 58 | 0.345 (0.062) | | |
| 3 | White | 581 | 0.348 (0.020) | 0.069 (0.050) | 1.38 |
| | Hispanics | 97 | 0.278 (0.043) | | |
| 4 | White | 553 | 0.340 (0.020) | 0.120 (0.040)** | 1.83** |
| | Hispanics | 141 | 0.220 (0.035) | | |
| 5 | White | 342 | 0.257 (0.024) | 0.165 (0.028)*** | 3.39*** |
| | Hispanics | 345 | 0.093 (0.015) | | |
| Overall | White | 2798 | 0.340 (0.020)¹ | | |
| | Hispanics | 684 | 0.248 (0.050)¹ | 0.093 (0.054)† | 1.57 (1.23, 2.00)² |

Strata are numbered from lowest to highest propensity of being Hispanic.
¹ The overall mean is the simple average of the means from the five strata; the pooled SE is calculated based on (16).
² The pooled OR is calculated from the pooled proportions for the white and Hispanic groups; 95% CIs were generated using the bootstrap method.
† p<0.10, * p<0.05, ** p<0.01, *** p<0.001
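As a check on the equal-weight averaging described in Methods, the short sketch below recomputes the overall proportions and marginal ORs in Table 2 from the published stratum-specific means. It uses only numbers from the table; `marginal_or_from_strata` is a hypothetical helper name.

```python
def marginal_or_from_strata(white_means, hispanic_means):
    """Overall OR from stratum-specific proportions with equal (1/5) RS weights."""
    p1 = sum(white_means) / len(white_means)
    p0 = sum(hispanic_means) / len(hispanic_means)
    return p1, p0, (p1 / (1 - p1)) / (p0 / (1 - p0))


# Stratum-specific proportions of any primary care use, Table 2 (strata 1-5).
table2 = {
    "Hispanics as RS": ([0.381, 0.346, 0.308, 0.231, 0.241],
                        [0.359, 0.218, 0.151, 0.097, 0.066]),
    "whites as RS": ([0.406, 0.354, 0.386, 0.330, 0.279],
                     [0.395, 0.300, 0.351, 0.212, 0.104]),
    "combined sample as RS": ([0.403, 0.354, 0.348, 0.340, 0.257],
                              [0.302, 0.345, 0.278, 0.220, 0.093]),
}
results = {rs: marginal_or_from_strata(w, h) for rs, (w, h) in table2.items()}
for rs, (p1, p0, odds_ratio) in results.items():
    print(f"{rs}: p1={p1:.3f}, p0={p0:.3f}, OR={odds_ratio:.2f}")
```

This reproduces the overall ORs of 1.99, 1.44, and 1.57; the recomputed overall proportions match the published ones up to rounding of the stratum means.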
Table 3 shows that, before adjustment, whites had 2.47 times the odds of Hispanics of reporting a primary care visit. After adjustment for demographics and SES using logistic regression, the conditional OR dropped to 1.84. The marginal OR from the same covariate-adjusted logistic regression, derived by averaging across the total combined sample, was 1.81. For illustrative purposes, we also estimated the marginal OR with whites and, separately, Hispanics as the RS: 1.82 and 1.81, respectively. Similar results are expected across RSs here, because effect modification was not incorporated in the model.
Table 3.
Estimated Disparities in Primary Care Use for the PS Weighting Method Standardized to Different Reference Samples, Compared to Raw and Multivariable Logistic Regression, Whites versus Hispanics
| Method | OR, whites vs Hispanics (95% CI) |
|---|---|
| Raw OR | 2.47 (2.00, 3.05) |
| Multivariable logistic regression, conditional OR | 1.84 (1.47, 2.31) |
| Multivariable logistic regression, marginal OR | 1.81 (1.45, 2.27)¹ |
| PS weighting, marginal OR: Hispanics as reference sample | 2.01 (1.58, 2.57) |
| PS weighting, marginal OR: whites as reference sample | 1.41 (1.09, 1.83) |
| PS weighting, marginal OR: combined sample as reference sample | 1.54 (1.19, 1.99) |

¹ The marginal OR was derived for the total combined sample; its 95% CI was generated using the bootstrap method.
For a given choice of RS, PS weighting produced results quite similar to those from PS stratification: the marginal ORs were 2.01, 1.41, and 1.54 using Hispanics, whites, and the combined white/Hispanic sample as the RS, respectively.
Table 4 shows the marginal effects from the logistic regression incorporating effect modification by SES, including the marginal response probabilities and marginal ORs for the three RS choices. Using Hispanics, whites, and the combined white/Hispanic sample as the RS, the marginal ORs were 2.01, 1.48, and 1.56, respectively, each of which is similar to the PS estimate for the corresponding RS.
Table 4.
Estimated Disparities in Primary Care Using Logistic Regression Incorporating Effect Modification
| Reference sample and quantity | Estimate |
|---|---|
| Hispanics as reference sample | |
| Hispanics proportion | 0.180 |
| Counterfactual whites proportion | 0.305 |
| Difference in proportion (whites vs Hispanics) | 0.126 |
| Marginal OR (whites vs Hispanics) | 2.01 (1.54, 2.61)¹ |
| Whites as reference sample | |
| Counterfactual Hispanics proportion | 0.268 |
| Whites proportion | 0.351 |
| Difference in proportion (whites vs Hispanics) | 0.083 |
| Marginal OR (whites vs Hispanics) | 1.48 (1.18, 1.86)¹ |
| Combined whites/Hispanics as reference sample | |
| Counterfactual Hispanics proportion | 0.251 |
| Counterfactual whites proportion | 0.342 |
| Difference in proportion (whites vs Hispanics) | 0.092 |
| Marginal OR (whites vs Hispanics) | 1.56 (1.25, 1.93)¹ |

¹ 95% CIs were generated using the bootstrap method.
DISCUSSION
Our goal was to compare standard logistic regression and Propensity Score (PS) methods for estimating the overall effect of ethnicity on a health-related outcome in the presence of effect modification. This is a common problem in epidemiological research, particularly in disparities research, where ethnicity and SES are typically confounded. Standard regression analyses, controlling for SES and other demographics, estimated that whites had 1.84 times the odds of receiving primary care compared with Hispanics. However, when effect modification is present, this conditional estimate incorrectly assumes that the ethnicity effect is constant across SES levels. In contrast, PS methods produced marginal estimates for a given reference sample (RS), through a standardization process that weights the data to reflect the covariate distribution in the RS. In this study, if the substantive research aim is the ethnic difference that would remain if Hispanics were like whites in terms of measured SES characteristics (i.e., using whites as the RS), the PS methods showed that whites had 1.41 or 1.44 times the odds (using PS weighting or stratification, respectively) of a primary care visit compared with Hispanics, markedly different from the conditional OR of 1.84 from standard logistic regression.
Our findings highlight the importance of careful selection of the RS to generate appropriate marginal effects when potential confounders are also effect modifiers. In disparities research, decisions about the RS should be informed by the substantive research question at hand and should also consider the policy implications of that choice. Findings from disparities research are often politically charged, and seemingly technical choices concerning the selection of the RS can substantially affect the “take-home” message of one’s research. As illustrated above, the ethnic disparity estimate using whites as the RS (OR=1.41 or 1.44 with the two PS approaches) was substantially different from those using Hispanics as the RS (OR=1.99 or 2.01). In this study, as in many other disparities investigations, the comparison of greater interest is the Hispanic-white disparity that would remain if the minority population (Hispanics) had the same demographic and SES characteristics as whites, i.e., with whites, the advantaged group, serving as the RS.
Given the importance of the RS, its selection should be carefully thought through by researchers designing PS analyses. One relevant issue is incomplete matching (32), which occurs when matches cannot be found for every case in the RS. PS methods, particularly PS matching, often produce a matched subsample in order to achieve well-balanced groups. When the number of unmatched individuals is large, there is concern about whether the matched individuals are representative of the population being studied (33). This is particularly problematic when effect modification is present and the covariate distribution in the matched subsample differs from that in the total RS. Recent work has shown inconsistent estimates when different matched subsamples were chosen (21, 23). In this study, to use PS matching with whites as the RS, one would need to find, within a much smaller sample (Hispanics), enough individuals to match all subjects from the much larger sample (whites). Because of this potential problem, PS matching was not implemented here and is left for future research.
PS methods are complex, and researchers implementing them face technical choices that can affect the results. Besides the choice of RS, these include the choice of PS approach (matching, stratification, or weighting), the evaluation of common support, and the assessment of balancing quality (33). In this analysis, we found that the PS stratification approach might not fully balance the covariates because of residual within-stratum confounding. One remedy is to further adjust for the SES covariates within each PS stratum. For example, using the combined sample as the RS, the covariate-adjusted ORs for strata 1–5 are 1.62, 0.96, 1.42, 1.81, and 3.01, compared with 1.56, 1.04, 1.38, 1.83, and 3.39 without covariate adjustment (bottom section of Table 2). Comparing the two sets of results suggests that the overall OR from the adjusted approach might be slightly lower than the one we previously found (1.57). A problem with this approach is that, because ORs are non-collapsible, the overall OR cannot be derived simply by averaging the stratum-specific ORs (which gives 1.76), as was done for the stratum-specific response probabilities, which are collapsible.
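The non-collapsibility point can be illustrated with the unadjusted stratum proportions reported for the combined RS in Table 2 (the covariate-adjusted stratum proportions are not reported, so this minimal check uses the unadjusted ones). Averaging the five stratum-specific ORs gives a different, larger number than forming one OR from the averaged probabilities, which is the overall estimate reported in Table 2.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


# Table 2, combined sample as RS: stratum-specific proportions (strata 1-5).
white = [0.403, 0.354, 0.348, 0.340, 0.257]
hispanic = [0.302, 0.345, 0.278, 0.220, 0.093]

# (a) Average of the stratum-specific ORs: not a valid marginal OR.
stratum_ors = [odds(w) / odds(h) for w, h in zip(white, hispanic)]
mean_of_ors = sum(stratum_ors) / len(stratum_ors)

# (b) Average the collapsible probabilities first, then form a single OR.
p1 = sum(white) / len(white)
p0 = sum(hispanic) / len(hispanic)
or_of_means = odds(p1) / odds(p0)

print(f"mean of stratum ORs = {mean_of_ors:.2f}, OR of mean probabilities = {or_of_means:.2f}")
```

Here the average of the stratum ORs is about 1.84, whereas the OR of the averaged probabilities is 1.57, so averaging covariate-adjusted stratum ORs would likewise not yield a valid overall marginal OR.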
Finally, we found that the marginal effects derived from logistic regression that included interaction terms were quite similar to those produced from the PS methods. We stress that the comparability between the PS estimates and regression estimates holds for marginal effects only. Conceptually, PS methods produce counterfactual outcome estimates for a specific RS and generate a marginal effect. Quantitatively, the marginal and conditional ORs from logistic regression do not always converge, because of the non-collapsibility of ORs (34, 35). Furthermore, when interaction terms are entered into the regression model, marginal ORs for a specific RS can be produced by averaging the individual predicted probabilities over that RS. This generalized regression approach incorporating effect modification has its roots both in the Rubin causal model and in the economics literature. Rubin (36) developed a formula for estimating the treatment effect using linear regression for the combined treated and non-treated samples, which is essentially equivalent to the marginal effects we use here with the combined sample as the RS. Separately, labor economists devised the so-called Oaxaca-Blinder (OB) decomposition (37, 38) to estimate gender and racial disparities due to discrimination. Its generalized form for dichotomous outcomes (39) is equivalent to what we present here using whites or Hispanics as the RS. In addition, while a model specification including interaction terms is more flexible than one with main effects only, all regressions are parametric models that assume an underlying data-generating process, which might not be correct, and they may suffer from untrustworthy extrapolation (7). In this regard, PS methods, as a quasi-parametric approach, may be more robust.
Acknowledgments
Supported by grants from the U.S. National Institute on Alcohol Abuse and Alcoholism R01 AA017197 and P50 AA005595.
List of Abbreviations
- PS: Propensity score
- RS: Reference sample
- SES: Socioeconomic status
- OR: Odds ratio
- NAS: National Alcohol Survey
- SMR: Standardized Mortality Ratio
- IPTW: Inverse-Probability-of-Treatment-Weighted
Appendix
1. The calculation of PS stratification marginal response probability and marginal ORs
For each of the three RS selections (Hispanics, whites, and the combined sample), the overall (marginal) response probabilities for the white and Hispanic effects are estimated by weighting the stratum-specific response probabilities:

$$\hat{p}_{1,\mathrm{Resp}} = \sum_{s=1}^{5} w_s\,\hat{p}_{1s}, \qquad \hat{p}_{0,\mathrm{Resp}} = \sum_{s=1}^{5} w_s\,\hat{p}_{0s} \tag{1}$$

where p̂1,Resp and p̂0,Resp are the overall response probabilities for the white and Hispanic effects, respectively; p̂1s and p̂0s are the stratum-specific response probabilities for whites and Hispanics, respectively, in stratum s (s=1, …, 5); and ws is the proportion of the reference sample (RS) falling in stratum s. Since the RS is divided into five equal-size strata in the current design, ws = 1/5 for every stratum, and the overall response probability is simply the average of the stratum-specific probabilities. The overall (marginal) OR for whites versus Hispanics is defined as:

$$\widehat{OR} = \frac{\hat{p}_{1,\mathrm{Resp}} / (1 - \hat{p}_{1,\mathrm{Resp}})}{\hat{p}_{0,\mathrm{Resp}} / (1 - \hat{p}_{0,\mathrm{Resp}})} \tag{2}$$
2. The calculation of marginal ORs from standard logistic regression
Consider a standard logistic regression model
$$\operatorname{logit} P(Y = 1 \mid Z, X) = \beta_0 + \beta_Z Z + \beta_X^{\top} X \tag{3}$$
where Y is the dichotomous observed response, Z is the ethnicity indicator (coded 1 if white and 0 if Hispanic), and X is the vector of covariates. For this model, the conditional OR is estimated as exp(β̂Z). Using the fitted model (3), two predicted response probabilities are generated for each subject i given the subject’s observed covariates xi: the first by assuming the respondent is white (i.e., Z=1), the second by assuming the respondent is Hispanic (i.e., Z=0), regardless of the respondent’s actual ethnicity. The predicted response probability can be written as
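$$\hat{p}_i^{(l)} = \frac{\exp(\hat\beta_0 + \hat\beta_Z\, l + \hat\beta_X^{\top} x_i)}{1 + \exp(\hat\beta_0 + \hat\beta_Z\, l + \hat\beta_X^{\top} x_i)} \tag{4}$$

where l=1 and l=0 give, for each respondent, the predicted probabilities for the white and Hispanic effects, respectively.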
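The marginal response probability for the total combined sample is the average, across the whole sample, of the individual predicted probabilities for the white and Hispanic effects (l=1 and 0, respectively):

$$\hat{p}_l = \frac{1}{N} \sum_{i=1}^{N} \hat{p}_i^{(l)}, \qquad l = 1, 0 \tag{5}$$

where N is the size of the total combined sample.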
Finally the marginal OR for the total combined sample is as below.
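$$\widehat{OR}_{\mathrm{marg}} = \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_0 / (1 - \hat{p}_0)} \tag{6}$$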
Alternatively, the marginal response probabilities, as well as the corresponding marginal ORs, can also be estimated for the white or Hispanic sample only as the RS (Z=1 or 0, respectively), for both the white and Hispanic effects (l=1 and 0 respectively):
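$$\hat{p}^{(z)}_{l,\mathrm{Resp}} = \frac{1}{n_z}\sum_{i:\,Z_i = z}\hat{p}_{li}, \qquad l = 1, 0; \quad z = 1 \text{ (whites) or } 0 \text{ (Hispanics)}; \quad n_z = \#\{i : Z_i = z\} \qquad (7)$$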
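Choosing whites or Hispanics as the RS, as in (7), changes only which rows of the covariate matrix the counterfactual predictions are averaged over. The sketch below applies the same kind of hypothetical no-interaction model to simulated data in which the covariate distributions differ by ethnicity; the coefficients, the simulation, and the variable names are illustrative assumptions. Because the combined-sample average is the size-weighted mean of the two group averages, the combined-RS marginal probabilities always fall between the white-RS and Hispanic-RS values.

```python
import numpy as np


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def marginal_probs(beta0, beta_z, beta_x, X_rs):
    """Equation (7): average the counterfactual predictions (Z set to 1, then 0)
    over the chosen reference sample X_rs only."""
    eta = beta0 + X_rs @ beta_x
    return expit(eta + beta_z).mean(), expit(eta).mean()


def odds_ratio(p1, p0):
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical data: covariate means shifted by ethnicity (Z = 1 white, 0 Hispanic)
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=2000)
X = rng.normal(size=(2000, 2)) + np.where(z == 1, 0.5, -0.5)[:, None]
beta0, beta_z, beta_x = -0.5, 0.8, np.array([1.5, -1.0])

rs = {"combined": X, "whites": X[z == 1], "Hispanics": X[z == 0]}
results = {name: marginal_probs(beta0, beta_z, beta_x, X_rs) for name, X_rs in rs.items()}
for name, (p1, p0) in results.items():
    print(f"RS = {name}: marginal OR = {odds_ratio(p1, p0):.3f}")
```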
3. The calculation of marginal ORs from standard regression incorporating effect modification
The logistic regression model (3) is generalized to include ethnicity-specific coefficients:
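$$\operatorname{logit}\Pr(Y = 1 \mid Z, X) = (\beta_{01} + \beta_{x1}^{\top} X)\, Z_1 + (\beta_{00} + \beta_{x0}^{\top} X)\, Z_0 \qquad (8)$$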
where Z1 is the indicator for whites, Z0 is the indicator for Hispanics, and the common intercept is removed from the model. As in (4), two predicted response probabilities can be generated for each subject i by assuming, regardless of true ethnicity, that the respondent is white or Hispanic, which gives the white and Hispanic effects (l = 1 and 0, respectively). For example, for a white individual i with covariates xwi, the predicted response probability for the white effect (l = 1) is
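$$\hat{p}_{1i} = \frac{\exp(\hat{\beta}_{01} + \hat{\beta}_{x1}^{\top} x_{wi})}{1 + \exp(\hat{\beta}_{01} + \hat{\beta}_{x1}^{\top} x_{wi})} \qquad (9)$$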
For this same white individual, the predicted response probability for the Hispanic effect (l=0) is
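$$\hat{p}_{0i} = \frac{\exp(\hat{\beta}_{00} + \hat{\beta}_{x0}^{\top} x_{wi})}{1 + \exp(\hat{\beta}_{00} + \hat{\beta}_{x0}^{\top} x_{wi})} \qquad (10)$$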
Note that in (10), although this white individual’s own covariates xwi are used, the Hispanic coefficients (β̂00 and β̂x0) are plugged into the model, yielding the counterfactual predicted outcome had the respondent been Hispanic. Averaged over whites, the predicted response probabilities in (10) can therefore also be interpreted as the predicted outcome for Hispanics if they had the same covariate distribution as whites.
Similarly, the two predicted response probabilities for each Hispanic individual can be estimated for the white and Hispanic effects (l = 1 and 0, respectively). The marginal response probability is then the average of the individual predicted probabilities, as in (5) when the combined sample is the RS or as in (7) when whites or Hispanics are the RS. Finally, the marginal OR for each RS is derived as in (6) from the corresponding marginal response probabilities.
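For the effect-modification model (8), the only change is that each counterfactual prediction uses the coefficient set of the assigned ethnicity, as in (9) and (10). The sketch below hard-codes hypothetical ethnicity-specific coefficients in place of a fitted model (fitting is not shown) and computes the marginal OR for each of the three RS choices on simulated covariates. Because the conditional white-versus-Hispanic OR now varies with the covariates, the marginal OR can depend more strongly on which RS is chosen than in the no-interaction case.

```python
import numpy as np


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical ethnicity-specific coefficients for model (8), no common intercept:
# (beta_01, beta_x1) for whites and (beta_00, beta_x0) for Hispanics.
COEF = {
    1: (0.3, np.array([0.9, -0.4])),   # whites (l = 1)
    0: (-0.4, np.array([1.6, -0.2])),  # Hispanics (l = 0)
}


def predict(X, l):
    """Equations (9)-(10): plug each respondent's own covariates into the
    coefficients of ethnicity l, whatever the respondent's actual ethnicity."""
    b0, bx = COEF[l]
    return expit(b0 + X @ bx)


def marginal_or(X_rs):
    """Average over the chosen RS as in (5) or (7), then form the OR as in (6)."""
    p1, p0 = predict(X_rs, 1).mean(), predict(X_rs, 0).mean()
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Simulated covariates with ethnicity-specific means (not study data)
rng = np.random.default_rng(2)
z = rng.integers(0, 2, size=2000)  # 1 = white, 0 = Hispanic
X = rng.normal(size=(2000, 2)) + np.where(z == 1, 0.5, -0.5)[:, None]

for name, X_rs in {"combined": X, "whites": X[z == 1], "Hispanics": X[z == 0]}.items():
    print(f"RS = {name}: marginal OR = {marginal_or(X_rs):.3f}")
```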
References
1. Adler NE, Rehkopf DH. U.S. disparities in health: descriptions, causes, and mechanisms. Annu Rev Public Health. 2008;29:235–52. doi: 10.1146/annurev.publhealth.29.020907.090852.
2. Smedley BD, Stith AY, Nelson AR, editors. Unequal treatment: confronting racial and ethnic disparities in health care. Washington, DC: Board on Health Sciences Policy, Institute of Medicine, National Academies Press; 2002.
3. U.S. Department of Health and Human Services. Healthy people 2000: National health promotion and disease prevention objectives. 1991.
4. U.S. Department of Health and Human Services. Healthy People 2010: Understanding and improving health. 2nd ed. Washington, DC: U.S. Department of Health and Human Services; 2000.
5. U.S. Department of Health and Human Services. Healthy People 2020: Disparities. Washington, DC: U.S. Department of Health and Human Services Office of Disease Prevention and Health Promotion; [Accessed: 2011-12-05]. Archived by WebCite® at http://www.webcitation.org/63iByEQI3.
6. Williams DR, Collins C. US socioeconomic and racial differences in health: patterns and explanations. Annual Review of Sociology. 1995;21:349–86.
7. Imbens GW, Wooldridge JM. Recent developments in the econometrics of program evaluation. Journal of Economic Literature. 2009;47(1):5–86.
8. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17(19):2265–81. doi: 10.1002/(sici)1097-0258(19981015)17:19<2265::aid-sim918>3.0.co;2-b.
9. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med. 1997;127(8 Pt 2):757–63. doi: 10.7326/0003-4819-127-8_part_2-199710151-00064.
10. Martens EP, Pestman WR, de Boer A, et al. Systematic differences in treatment effect estimates between propensity score methods and logistic regression. Int J Epidemiol. 2008;37(5):1142–7. doi: 10.1093/ije/dyn079.
11. Dehejia RH, Wahba S. Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. J Am Stat Assoc. 1999;94(448):1053–62.
12. Dehejia RH, Wahba S. Propensity score-matching methods for nonexperimental causal studies. The Review of Economics and Statistics. 2002;84(1):151–61.
13. Shah BR, Laupacis A, Hux JE, et al. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol. 2005;58(6):550–9. doi: 10.1016/j.jclinepi.2004.10.016.
14. Stürmer T, Joshi M, Glynn RJ, et al. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol. 2006;59(5):437–47. doi: 10.1016/j.jclinepi.2005.07.004.
15. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
16. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79(387):516–24.
17. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician. 1985;39(1):33–8.
18. Kaufman JS. Marginalia: comparing adjusted effect measures. Epidemiol. 2010;21(4):490–3. doi: 10.1097/EDE.0b013e3181e00730.
19. Stampf S, Graf E, Schmoor C, et al. Estimators and confidence intervals for the marginal odds ratio using logistic regression and propensity score stratification. Stat Med. 2010;29(7–8):760–9. doi: 10.1002/sim.3811.
20. Austin PC. The performance of different propensity score methods for estimating odds ratios. Stat Med. 2007;26(16):3078–94. doi: 10.1002/sim.2781.
21. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol. 2006;163(3):262–70. doi: 10.1093/aje/kwj047.
22. Stürmer T, Rothman KJ, Glynn RJ. Insights into different results from different causal contrasts in the presence of effect-measure modification. Pharmacoepidemiology and Drug Safety. 2006;15(10):698–709. doi: 10.1002/pds.1231.
23. Lunt M, Solomon D, Rothman K, et al. Different methods of balancing covariates leading to different effect estimates in the presence of effect modification. Am J Epidemiol. 2009;169(7):909–17. doi: 10.1093/aje/kwn391.
24. Schmidt LA, Ye Y, Greenfield TK, et al. Ethnic disparities in clinical severity and services for alcohol problems: results from the National Alcohol Survey. Alcohol Clin Exp Res. 2007;31(1):48–56. doi: 10.1111/j.1530-0277.2006.00263.x.
25. Kerr WC, Greenfield TK, Bond J, et al. Age-period-cohort modeling of alcohol volume and heavy drinking days in the US National Alcohol Surveys: divergence in younger and older adult trends. Addiction. 2009;104(1):27–37. doi: 10.1111/j.1360-0443.2008.02391.x.
26. National Institute on Alcohol Abuse and Alcoholism. Helping patients who drink too much: a clinician’s guide. Updated 2005 edition. NIH Publication No. 07–3769. Rockville, MD: National Institute on Alcohol Abuse and Alcoholism; 2005. http://pubs.niaaa.nih.gov/publications/Practitioner/CliniciansGuide2005/guide.pdf [accessed 12/11/09].
27. Chartier K, Caetano R. Ethnicity and health disparities in alcohol research. Alcohol Res Health. 2010;33(1–2):152–60.
28. Mulia N, Schmidt LA, Ye Y, et al. Preventing disparities in alcohol screening and brief intervention: the need to move beyond primary care. Alcohol Clin Exp Res. 2011;35(9):1557–60. doi: 10.1111/j.1530-0277.2011.01501.x.
29. Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiol. 2000;11(5):550–60. doi: 10.1097/00001648-200009000-00011.
30. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven Publishers; 1998.
31. Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiol. 2003;14(6):680–6. doi: 10.1097/01.EDE.0000081989.82616.7d.
32. Rosenbaum PR, Rubin DB. The bias due to incomplete matching. Biometrics. 1985;41(1):103–16.
33. Caliendo M, Kopeinig S. Propensity score matching, implementation, evaluation, sensitivity analysis. Journal of Economic Surveys. 2008;22(1):31–72.
34. Gail MH, Wieand HS, Piantadosi S. Biased estimates of treatment effect in randomized experiments with non-linear regressions and omitted covariates. Biometrika. 1984;71(3):431–44.
35. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14(1):29–46.
36. Rubin DB. Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics. 1977;2(1):1–26.
37. Oaxaca RL. Male-female wage differentials in urban labor markets. International Economic Review. 1973;14(3):693–709.
38. Blinder AS. Wage discrimination: reduced form and structural estimates. J Hum Resour. 1973;8(4):436–55.
39. Bauer TK, Sinning M. An extension of the Blinder-Oaxaca decomposition to nonlinear models. Advances in Statistical Analysis. 2008;92(2):197–206.