Author manuscript; available in PMC: 2022 Nov 1.
Published in final edited form as: Pharmacoepidemiol Drug Saf. 2021 Aug 24;30(11):1471–1485. doi: 10.1002/pds.5338

Core Concepts in Pharmacoepidemiology: Violations of the Positivity Assumption in the Causal Analysis of Observational Data: Consequences and Statistical Approaches

Yaqian Zhu 1,*, Rebecca A Hubbard 1, Jessica Chubak 2, Jason Roy 3, Nandita Mitra 1
PMCID: PMC8492528  NIHMSID: NIHMS1732059  PMID: 34375473

Abstract

In the causal analysis of observational data, the positivity assumption requires that all treatments of interest be observed in every patient subgroup. Violations of this assumption are indicated by nonoverlap in the data in the sense that patients with certain covariate combinations are not observed to receive a treatment of interest, which may arise from contraindications to treatment or small sample size. In this paper, we emphasize the importance and implications of this often-overlooked assumption. Further, we elaborate on the challenges nonoverlap poses to estimation and inference and discuss previously proposed methods. We distinguish between structural and practical violations and provide insight into which methods are appropriate for each. To demonstrate alternative approaches and relevant considerations (including how overlap is defined and the target population to which results may be generalized) when addressing positivity violations, we employ an electronic health record-derived data set to assess the effects of metformin on colon cancer recurrence among diabetic patients.

Keywords: overlap, trimming, weighting, extrapolation, propensity score, generalizability

1. Introduction

Electronic Health Records (EHR), medical claims data, and public health registries are increasingly used to assess the causal effects of treatments, interventions, or other exposures on health outcomes. Valid causal inference relies on careful attention to underlying assumptions. Because treatments are not assigned randomly in observational studies, inference is susceptible to confounding bias. The assumption of no unmeasured confounding is often the primary focus of analysts.1,2 To that end, researchers often control for a large number of pre-treatment covariates using approaches such as regression adjustment or matching. However, controlling for many patient and disease level characteristics threatens another vital causal assumption—positivity.

The positivity assumption states that the conditional probability of receiving a given treatment cannot be 0 or 1 in any patient subgroup as defined by combinations of covariate values. Consider an example in which a subgroup of patients never receives the treatment of interest. In such a subgroup, the treatment effect cannot be estimated directly because outcomes for treated subjects are never observed. In other words, this lack of variability in treatment assignment would threaten the identifiability of causal effects—whether they can be uniquely determined or estimated based on observed variables—in both this subgroup and the overall population that includes this subgroup.

Positivity violations can take two forms. Structural (also called theoretical) violations occur when it is impossible for a subject to receive a certain treatment, e.g., if certain patient characteristics constitute an absolute contraindication for treatment.3 Increasing sample size does not ameliorate this problem. From the perspective of target trial emulation, structural positivity holds if we can think of an individual as being eligible to be randomized based on his or her baseline data.4,5 On the other hand, practical (also termed random) violations of positivity occur when assignment to the treatment of interest is theoretically possible for patients in a given subgroup but is not observed to occur in the data under study. Suppose, in general, individuals over the age of 50 with a history of heart failure have a 10% chance of receiving treatment. In a particular study, however, it is possible that no one in this subpopulation will be observed to be in the treatment group. We can therefore think of this as a small sample problem that can occur due to chance.6 As the number of confounders increases, it becomes less likely to observe both treated and control subjects for all combinations of covariate values, making practical violations more likely.3

Positivity is related to the concept of overlap. Overlap regions are defined by covariate values that are shared by both treatment groups. One way to assess overlap, especially for a large number of confounders, is with the propensity score (PS). The PS, denoted by e(X), is the probability of receiving the treatment of interest conditional on covariates X.1 The PS can be used to assess overlap because it indicates how likely a subject is to be in either treatment group given covariate values (Figure 1). People with characteristics resulting in a PS near 0.5 are expected to be in the overlap region because they are nearly equally likely to be in either treatment group. Although the estimated PS is a common measure of overlap, there is a distinction between PS overlap and overlap of the joint distributions of covariates. Specifically, there may be a subject in the control group that has a very similar estimated propensity score to someone in the treatment group but their covariate combinations may not match and could even be quite dissimilar. If PS approaches are used for achieving balance, we may focus only on PS overlap for evaluating positivity. On the other hand, when the approach involves assessing positivity violations based on covariate values themselves and when there is interest in generalizing to a population with particular characteristics, it is important to note that PS nonoverlap may not account for all nonoverlap in covariate values.
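As a concrete illustration of this kind of check, the sketch below estimates propensity scores with a main-effects logistic model and overlays their distributions by treatment group (as in Figure 1). This is a minimal sketch, not the authors' code: the data frame `df`, the treatment column "A", and the covariate list are placeholders rather than variables from the study.

```python
# Minimal sketch of assessing PS overlap. Assumes a pandas DataFrame `df` with a
# binary treatment column "A" and numeric covariate columns named in `covariates`;
# all names are illustrative.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

def estimate_ps(df, covariates, treatment="A"):
    """Estimate propensity scores with a main-effects logistic regression."""
    model = LogisticRegression(max_iter=1000).fit(df[covariates], df[treatment])
    return model.predict_proba(df[covariates])[:, 1]

def plot_ps_overlap(df, ps, treatment="A"):
    """Overlay PS histograms for treated and control subjects (cf. Figure 1)."""
    plt.hist(ps[df[treatment] == 1], bins=20, alpha=0.5, density=True, label="Treated")
    plt.hist(ps[df[treatment] == 0], bins=20, alpha=0.5, density=True, label="Control")
    plt.xlabel("Estimated propensity score")
    plt.legend()
    plt.show()
```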

Figure 1. Distribution of propensity scores for study populations in which (a) the potential for positivity violations is low and (b) the potential for positivity violations is high, respectively. Subjects with propensity scores near 0 and 1 are more likely to violate the positivity assumption.

To provide further intuition for why positivity violations create a challenge for causal inference, consider estimation using standard methods for the data displayed in Figure 2. Let Y be the outcome and A be the indicator for treatment assignment. The standardized risk among treated subjects involves calculating P(Y = 1 ∣ A = 1, X1, X2), but this conditional probability is not well-defined since P(A = 1 ∣ X1, X2) is 0 for X1 < −1 and X2 < 0 (Figure 2). If we use inverse probability of treatment weighting, treated subjects are given 1/P(A = 1 ∣ X1, X2) as weights. However, for those with X1 < −1 and X2 < 0, this fraction is undefined because the denominator is 0, which results in an infinite weight. An initial thought is to force overlap by constructing broader categories (e.g., making age ranges wider so that both treated and control subjects are present in all age categories). However, residual confounding may then become a concern, especially since broad categories result in a loss of information. Even regression adjustment relies on extrapolation when estimating the potential outcomes under treatment for these covariate values, leading to possibly inaccurate and imprecise estimates if trends in nonoverlap regions are not well captured.7 It is therefore important for researchers to understand the methods that have been proposed to deal with positivity violations and the tradeoffs involved with each.
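The toy calculation below (not from the paper) illustrates the breakdown numerically: in a covariate stratum where the estimated probability of treatment is 0, the inverse probability weight is undefined.

```python
# Toy illustration of how nonoverlap breaks inverse probability of treatment
# weighting: a covariate stratum with P(A = 1 | X) = 0 yields an undefined
# (infinite) weight, so the weighted estimator cannot be computed there.
import numpy as np

p_treat = np.array([0.45, 0.30, 0.0])  # estimated P(A=1|X) in three strata; the last exhibits nonoverlap
with np.errstate(divide="ignore"):
    weights = 1.0 / p_treat
print(weights)  # [2.222... 3.333... inf]
```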

Figure 2. Overlap may be assessed by determining which regions of the covariate space (based on variables X1 and X2) contain both treated and untreated subjects. These regions may be visualized by plotting subjects based on their covariate values. Areas where we only see treated subjects or only untreated subjects would be deemed regions of nonoverlap. Equivalently, the positivity assumption does not hold for these subjects.

We provide an overview of approaches that have been proposed in the literature for estimating causal effects when faced with positivity violations. The most recent review of this topic was by Petersen et al. in 2012.6 Here, we provide a comprehensive update on advances since then and offer insights and practical advice regarding which approaches have superior performance or should be employed given particular study characteristics.

We demonstrate how the definition of the region of overlap may influence the target estimand (the quantity we are interested in estimating) and corresponding population of inference. Specifically, we discuss the suitability of each approach in the context of study objectives and target populations, which have been largely overlooked. Using data from a study on the association between diabetes and colon cancer recurrence, we assess how different models for estimating the PS may affect nonoverlap regions and how different methods approach treatment effect estimation and inference.8

2. Methods

2.1. Potential Outcomes and Average Treatment Effect

Suppose there are n independent observations from a population. Define treatment assignment as Ai = 1 if subject i receives the treatment of interest and Ai = 0 if subject i receives a comparator treatment (commonly, the absence of treatment). For a dichotomous treatment, each subject i has two potential outcomes: Yi(1), the outcome under treatment, and Yi(0), the outcome under the comparator.9 However, since each subject only receives one treatment in a study, the observed outcome is Yi = AiYi(1)+(1 – Ai)Yi(0). Furthermore, we define Xi to be a vector of p pre-treatment variables or covariates for the ith subject.

A common objective of causal inference is to estimate the average treatment effect (ATE), defined as the mean difference in potential outcomes under treatment and comparator, respectively: E[Y(1) – Y(0)]. This represents the average effect had everyone received the treatment of interest versus had everyone received the comparator. Identifiability of this parameter rests on the consistency, stable unit treatment value, ignorability, and positivity assumptions.1,5 In this paper, we focus on the positivity assumption: P(Ai = a ∣ Xi) > 0, for a = 0, 1. If this probability is 0 for some values of X, then there is a structural violation of the positivity assumption. Even with no structural violation, practical violations are possible in finite samples. In a given data set, it is often difficult to distinguish between structural and practical violations although subject matter knowledge may provide information about treatment protocol.
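To make the role of positivity explicit, the following standard identification argument (the g-formula, written here as a generic sketch rather than a derivation specific to this paper) shows where the assumption enters.

```latex
% Sketch of the g-formula identification of E[Y(a)] (standard result, not specific
% to this paper); requires \usepackage{amsmath}.
\begin{align*}
E[Y(a)] &= E_X\big[ E\{Y(a) \mid X\} \big] \\
        &= E_X\big[ E\{Y(a) \mid A = a, X\} \big] && \text{(ignorability)} \\
        &= E_X\big[ E\{Y \mid A = a, X\} \big]    && \text{(consistency)}
\end{align*}
% The inner term E(Y | A = a, X = x) can only be estimated where
% P(A = a | X = x) > 0, i.e., where positivity holds.
```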

2.2. Approaches for Addressing Positivity Violations

We review several methods that have been proposed for dealing with violations of positivity in the setting with two treatment or exposure groups. For a continuous treatment or one involving more than two levels, positivity may be defined in terms of the generalized propensity score (GPS), the conditional density of treatment A given the covariates, r(a, x) = fA∣X(a ∣ x).10,11 Briefly, the positivity assumption requires that for all treatment values a and covariate values x, r(a, x) > 0; that is, the conditional density or probability of receiving each treatment level is positive. Common approaches for inferring causality for a multivalued or continuous treatment rely on regression modeling of the outcome, which may use machine learning methods and doubly robust estimation.12-16 To assess nonoverlap in multiple treatment settings, one may compare the distributions of estimated GPS for each treatment across all treatment groups via boxplots or histograms; quartiles of exposure may be employed in the continuous treatment setting.17 For this paper, we focus on the binary treatment setting because it is the most commonly explored setting in the existing literature and has a well-developed set of methods available for comparison.

2.2.1. Trimming

Trimming involves identifying a subgroup of subjects for whom the positivity assumption appears to be violated, removing them from the data set, and drawing inference about the remaining population.6,18 Often, the PS is used to identify the subjects to discard.

PS-based trimming first estimates PSs from the full cohort and then removes subjects with values that are rare in either the treated or control group. Exclusion of subjects reduces the effective sample size, which may increase the variance. Crump et al. (2009) propose a method that trims those with PSs near 0 and 1 and seeks to obtain a subsample for which the conditional ATE, given covariate values for the particular subsample, has minimum variance.19 This approach searches for the subgroup with PSs in the interval [α, 1 – α] (Figure 3(a)). The optimal value of α is the one that provides the most precise estimate of ATE over the class of semiparametric efficient estimators. Thus, their definition of overlap employs PS bounds; the authors suggest that the range [.1, .9], which corresponds to α = .1, results in good performance generally. Because this set is purported to satisfy positivity, standard inference may be used for these observations. Yang et al. (2016) use estimated GPS to extend this trimming approach to multi-level treatments.20
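A minimal sketch of this symmetric trimming rule is below; it assumes `ps` is an array of estimated propensity scores, and the α = 0.1 default mirrors the rule of thumb cited above rather than the variance-minimizing search.

```python
# Crump-style symmetric trimming: retain subjects whose estimated PS lies in
# [alpha, 1 - alpha]; the optimal alpha would be chosen as described above,
# with alpha = 0.1 as the suggested default.
import numpy as np

def crump_trim_mask(ps, alpha=0.1):
    """Boolean mask of subjects retained after symmetric PS trimming."""
    ps = np.asarray(ps, dtype=float)
    return (ps >= alpha) & (ps <= 1 - alpha)

# usage (illustrative): analytic_sample = df[crump_trim_mask(ps)]
```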

Figure 3. Several proposed methods for addressing positivity violations. (a) Given optimal value of α, subjects with estimated PS less than α and greater than 1 – α are discarded—those to the left of the first dashed line and those to the right of the second. (b) Pairs of treated and control subjects from cardinality matching are connected by lines. Points that are not connected to another represent trimmed subjects. (c) The dashed red line is at M1, the maximum BART posterior standard deviation under treatment among treated subjects. Control subjects corresponding to points above this line are trimmed. (d) Under overlap weighting, treated subjects with PS near 0 and control subjects with PS near 1 receive the most weight because each subject is weighted by the probability of receiving the opposite treatment.

Another approach that addresses subjects in the tails of the PS distribution is Stürmer et al. (2010)’s asymmetrical trimming, which seeks to restrict treatment comparisons to those with a common covariate range.21 Specifically, treated subjects below a certain percentile of the PS (say, 1st, 2.5th, or 5th) and untreated subjects above a certain PS percentile (say, 99th, 97.5th, or 95th) are discarded—essentially those who are given a treatment contrary to what is expected. Common PS approaches, such as direct adjustment, matching, inverse probability weighting, and stratification may be used to estimate treatment effects for the remaining subsample. Trimming in this way has been shown to reduce bias.
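The sketch below illustrates asymmetrical trimming of this kind. The percentile cutoffs are computed within each treatment group, which is one common implementation choice (an assumption here, since the text does not specify the reference distribution); the arrays `ps` and `treated` are assumed inputs.

```python
# Sturmer-style asymmetrical trimming: drop treated subjects in the lower tail
# and untreated subjects in the upper tail of the PS distribution. Percentile
# choices (5th/95th here) follow the examples given in the text.
import numpy as np

def asymmetric_trim_mask(ps, treated, lower_pct=5, upper_pct=95):
    """Boolean mask of retained subjects."""
    ps = np.asarray(ps, dtype=float)
    a = np.asarray(treated).astype(bool)
    low_cut = np.percentile(ps[a], lower_pct)    # lower cutoff from the treated group
    high_cut = np.percentile(ps[~a], upper_pct)  # upper cutoff from the untreated group
    drop = (a & (ps < low_cut)) | (~a & (ps > high_cut))
    return ~drop
```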

Matching, which can be considered a type of trimming, is another approach that can address positivity violations. For each treated subject, one or more control subjects with similar covariate values are selected. Two matching algorithms that have been proposed for limited overlap settings are optimal matching and cardinality matching.22,23 Optimal matching is an approach that seeks to minimize the total covariate or propensity score distances between matched treated and control pairs. Details regarding the algorithm may be found in Rosenbaum (2012).22 Cardinality matching solves an optimization problem in which the objective function seeks to maximize the number of matched sets, while the balancing constraints determine the allowable distance within matched sets based on a pre-specified tolerance (Figure 3(b)). The balance constraints aim to reduce bias by constructing comparable treatment groups while the objective function aims to reduce variance. This procedure accommodates different forms of balance based on a chosen feature of the empirical distributions of observed covariates.23 Original covariates, rather than the PS, may be used in dealing with nonoverlap, allowing covariates to be directly balanced.23 Note that a very large number of variables may require more computation time and give fewer matches for a given degree of balance. Subjects who are not matched are trimmed, and estimation and inference are conducted on the remaining subsample. When matching is carried out with respect to treated subjects and there is a match for each treated subject, the resulting estimand is the average treatment effect on the treated (ATT) rather than the ATE. Some external validity may be lost because the population to which results can be generalized is the one observed to receive the treatment of interest. If there are treated subjects for whom there are no matching untreated subjects, the target population consists of subjects with covariate values found in the matched sample.

Other criteria have been proposed for matching to trim samples.24 For instance, Cochran & Rubin (1973) discuss caliper matching to limit within-match differences based on some threshold.25 Other approaches match on estimated PSs that are similar up to a specified digit or enforce bounds on the range of the PS—i.e., comparator subjects with PSs lower than the minimum PS in the treatment group are discarded.26-29 Because the PS is a summary measure of variation in covariates, it may or may not capture nonoverlap in individual covariates. Several of these trimming methods evaluate the positivity assumption based on PS values alone and do not necessarily ensure that positivity is not violated for individual covariates, which is a possible drawback.

Unlike the previously described methods, which use the subjects' pre-treatment variables for trimming decisions, Hill & Su (2013) address overlap by using outcome information via Bayesian Additive Regression Trees (BART).30 Suppose the expected potential outcomes are modeled by Ê(Yi(0) ∣ Xi = x) = g(0, x) and Ê(Yi(1) ∣ Xi = x) = g(1, x) for each subject i. There tends to be more uncertainty in BART counterfactual predictions for a subject with covariate values in an area of little or no overlap. Defining s_i^g0 = sd(g(0, Xi)) and s_i^g1 = sd(g(1, Xi)) to be the counterfactual standard deviations under the control condition and under treatment, a rule may trim subject i if the standard deviation of his or her counterfactual prediction exceeds the maximum standard deviation among subjects observed to receive that counterfactual treatment; for example, a control subject i is trimmed if s_i^g1 > M1, where M1 = max_j {s_j^g1 : Aj = 1} (Figure 3(c)). Although their approach avoids specifying the PS model, using BART to model the covariates themselves may be unwieldy with high dimensional data. Furthermore, specifying the cutoff in this way may overlook nonoverlap regions that have few data points. This approach has been generalized to the setting of three or more treatments by Hu et al. (2020), who apply analogous discarding rules to each treatment group.31
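A rough sketch of a discard rule in this spirit is below. It assumes matrices of posterior draws `draws0[s, i]` and `draws1[s, i]` for g(0, Xi) and g(1, Xi) are already available from a fitted BART model, and it uses the maximum-SD cutoff described above rather than the authors' exact implementation.

```python
# Hill & Su-style trimming based on counterfactual posterior uncertainty:
# a subject is dropped if the posterior SD of its counterfactual prediction
# exceeds the maximum posterior SD observed among subjects who actually
# received that (counterfactual) treatment.
import numpy as np

def bart_uncertainty_trim_mask(draws0, draws1, treated):
    """Boolean mask of retained subjects."""
    a = np.asarray(treated).astype(bool)
    sd0 = np.asarray(draws0).std(axis=0)  # s_i^g0: SD of the prediction under control
    sd1 = np.asarray(draws1).std(axis=0)  # s_i^g1: SD of the prediction under treatment
    M1 = sd1[a].max()                     # max SD under treatment among treated subjects
    M0 = sd0[~a].max()                    # max SD under control among control subjects
    drop = (~a & (sd1 > M1)) | (a & (sd0 > M0))
    return ~drop
```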

In general, trimming methods discard subjects who may be problematic from a positivity standpoint so that the positivity assumption holds in the resulting subsample. This means that the estimates of ATE may only generalize to the population reflected by that subsample. For matching, although balanced comparisons become possible, the target of inference shifts to the matched population, which consists of characteristics present in the matched sample. Thus, when trimming or matching is employed, the characteristics of the sample on which inference is based should be communicated.32

2.2.2. Weighting

Inverse probability of treatment weighting, though commonly used to control for confounding and selection bias, runs the risk of unstable or infinite weights in situations where there is nonoverlap (e.g., when the denominator of the weight is close to 0). Alternative weighting schemes have been proposed that mitigate this problem.6 Li et al.’s (2018) overlap weights deal with subjects most likely to violate the positivity assumption by giving greater weight to covariate strata that have variability in treatment assignment.33,34 Specifically, the overlap weight is wi = 1 − êi for subject i receiving the treatment of interest and wi = êi for subject i receiving the comparator treatment, where êi is the estimated PS; thus, subjects are weighted by their probability of being in the opposite treatment group. This upweights people who are likely to receive either treatment and downweights treated subjects with PS near 1 and control subjects with PS near 0 (Figure 3(d)). The resulting estimand corresponds to the “overlap” population, which consists of covariate combinations for which the treated and control groups have the most overlap. Individuals with these characteristics have a substantial probability of being in either treatment group and are the ones to whom treatment effect estimates generalize. In clinical studies, there may not be a consensus or clear decision regarding treatment assignment for these patients; that is, they have the most treatment equipoise and may be randomized to either treatment. Li & Li (2019) have generalized the overlap weights to the context of multiple treatments by defining them in terms of estimated GPS.35
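A minimal sketch of an overlap-weighted risk difference is shown below, assuming arrays `y` (binary outcome), `treated`, and estimated propensity scores `ps`; it is an illustration of the weighting scheme just described, not the estimator used later in the data analysis.

```python
# Overlap weighting: treated subjects receive weight 1 - e_hat and controls
# receive e_hat, so the estimand targets the overlap population. The estimate
# here is a simple weighted difference in outcome means.
import numpy as np

def overlap_weighted_risk_difference(y, treated, ps):
    y = np.asarray(y, dtype=float)
    a = np.asarray(treated).astype(bool)
    e = np.asarray(ps, dtype=float)
    w = np.where(a, 1.0 - e, e)  # weight by probability of the opposite treatment group
    mean_treated = np.sum(w[a] * y[a]) / np.sum(w[a])
    mean_control = np.sum(w[~a] * y[~a]) / np.sum(w[~a])
    return mean_treated - mean_control
```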

2.2.3. Extrapolation

Despite maintaining the identifiability of causal estimates, a limitation of the approaches discussed thus far is that they change the target of inference. Estimands correspond to a population that differs from the combined treated and control population because certain subgroups have been excluded from or downweighted in the analysis. To address this, Nethery et al. (2019) propose a two-stage procedure for estimating population causal effects via inference on the entire sample.36 Motivated by environmental health studies that serve to inform policy, the authors emphasize the importance of inference that preserves the original estimand and generalizes to the original population of interest. As we discuss later, this is appropriate for addressing random violations of positivity due to finite sample size. Their definition of overlap is based on two user-specified parameters, u (a number less than 1) and v (a count of subjects), and estimated PSs. For each subject i with estimated PS êi, we take the v + 1 treated subjects with PSs closest to êi and check whether the range of these PSs is less than u and covers êi; we then make the same assessment using control subjects. If both conditions hold, subject i is considered to be in the region of overlap.36 Otherwise, subject i is placed in the region of nonoverlap. This provides flexibility in defining overlap—the required amount of data support may differ for different studies. Nethery et al. (2019) recommend u = 0.1 × range(ê) and v = 10 as the default specification, although altering these values based on sample size may provide better assessments of nonoverlap.
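The sketch below is a plain reading of this definition (not Nethery et al.'s reference implementation): for each subject, the v + 1 nearest propensity scores within each treatment group must span less than u and cover the subject's own score.

```python
# Overlap classification based on user-specified u (span) and v (neighbor count),
# as described above. `ps` and `treated` are assumed arrays; groups with fewer
# than v + 1 subjects are not handled specially in this sketch.
import numpy as np

def overlap_region_mask(ps, treated, u=0.1, v=10):
    """True for subjects classified as being in the region of overlap."""
    ps = np.asarray(ps, dtype=float)
    a = np.asarray(treated).astype(bool)
    in_overlap = np.zeros(ps.shape, dtype=bool)
    for i, e_i in enumerate(ps):
        ok = True
        for group in (a, ~a):
            grp = ps[group]
            nearest = grp[np.argsort(np.abs(grp - e_i))[: v + 1]]  # v + 1 closest PSs in this group
            span_ok = (nearest.max() - nearest.min()) < u
            covers = nearest.min() <= e_i <= nearest.max()
            ok = ok and span_ok and covers
        in_overlap[i] = ok
    return in_overlap
```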

Their approach employs two types of models. In the imputation phase, a BART model is fit to subjects in the overlap region and is used to predict individual causal effects. In the subsequent smoothing stage, a restricted cubic spline (SPL) is fit to the estimated effects obtained from BART. The trends in the region of overlap are extrapolated to subjects in the region of nonoverlap while using their observed variables. An added variance component for those with nonoverlap accounts for the higher uncertainty. However, we note that the authors' specification of this component may lead to excessively conservative standard errors. To estimate the population ATE, a Bayesian bootstrap is performed. The overall approach is called BART+SPL.

If positivity violations are present, standard regression methods extrapolate beyond the overlap region, relying on modeling assumptions to do so. These approaches assume the outcome model is correctly specified for the covariate space represented by data, even in areas where there are no observed data for a particular treatment group. Further, the uncertainty associated with the extrapolation might not be conveyed in the resulting estimates.

2.3. Considerations When Dealing with Positivity Violations

2.3.1. Type of Positivity Violation

The type of positivity violation has implications for the target causal estimand and the population to which it applies. With structural violations, since it is clinically impossible for certain patients to receive a treatment, estimating the effect on these patients is not of interest as they will either always or never receive treatment. Thus, structural violations, as determined by eligibility guidelines, may be dealt with using trimming or weighting approaches that shift the population for which inference is made. Interpretation of the causal effect depends on who remains or is upweighted in later analyses.

When there are practical violations, no subjects with certain covariate values may be observed to receive the treatment, even though people in the population with these covariate values are actually eligible for it. How, then, do we understand the treatment effects for these individuals? Estimators that alter the target of inference do not achieve the intended objective of preserving the original population ATE. Thus, careful consideration should be given to whether each subject is part of the population of interest when deciding on the approach (Figure 4). Further, since many proposed methods for addressing nonoverlap rely on estimated PSs, the decisions involved in estimating these values warrant attention.

Figure 4. Considerations for addressing positivity violations or nonoverlap in data analysis.

2.3.2. Specification of the Propensity Score Model

Defining overlap based on the estimated PS depends on modeling decisions regarding the PS. Two specific decisions involve which variables to include in the PS model and how to model them. Variable selection is either knowledge-driven or data-driven,37 and the effects of including certain types of variables in the PS model on effect estimates have been explored by Brookhart et al. (2006).38 Variables may be broadly described as confounders (causes of both treatment and outcome), instruments (causes of treatment only), and risk factors (variables associated with the outcome only). Ignorability (no unmeasured confounding) holds when all confounders are included in the model; it is also satisfied if instruments and risk factors are included in addition to all confounders. Even though ignorability would hold in either case, the subjects in the overlap and nonoverlap regions might differ; including a strong instrument makes it less likely that positivity is satisfied. Decisions regarding variable inclusion thus have implications for treatment effect estimation, with inclusion of confounders and risk factors tending to provide the most efficient estimates.38

The most common modeling choice for estimating the PS is logistic regression, which requires the analyst to correctly specify the functional form, including any important interaction terms. In contrast, nonparametric methods, such as BART, and machine learning algorithms, such as gradient boosting machines (GBM), provide greater flexibility in modeling PSs.39-41 Super Learner (SL), an ensemble machine learning approach, combines multiple parametric and nonparametric models and uses cross-validation to assess their respective predictive performances.42 The choice of model affects the PS estimates, which in turn can affect the overlap region and ultimately the treatment effect estimates.
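As a sketch of how such modeling choices can be compared in practice (placeholder data frame `df`, covariate list `X_cols`, and treatment column "A"; BART and Super Learner would require additional packages and are omitted here):

```python
# Two PS models discussed above: a main-effects logistic regression and a
# gradient boosting classifier. Comparing their estimated PS distributions
# (e.g., with the histogram helper sketched earlier) shows how the modeling
# choice can change the apparent overlap region.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def estimate_ps_two_ways(df, X_cols, treatment="A"):
    X, a = df[X_cols], df[treatment]
    ps_logistic = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
    ps_gbm = GradientBoostingClassifier(random_state=0).fit(X, a).predict_proba(X)[:, 1]
    return ps_logistic, ps_gbm
```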

3. Addressing Nonoverlap in a Colon Cancer Recurrence Study

We use EHR data for a cohort of colon cancer patients treated in the Kaiser Permanente Washington (KPWA) health care system to illustrate differences across the alternative methods described above. Data for patients with stage I-IIIA colon cancer diagnosed between 1994 and 2014 were derived from the KPWA Data Warehouse and manual chart abstractions. Complete details on the inclusion and exclusion criteria have been previously published.8 In the present analysis, we focus on patients with diabetes and assess the effect of metformin use, the most common medication used to treat diabetes,43,44 compared to that of no metformin medication on colon cancer recurrence (binary outcome) between 90 days after end of incident cancer treatment and the end of follow up, which was the earliest of death, disenrollment, or medical records abstraction. Potential confounders include age, sex, race, Charlson co-morbidity score, primary tumor location, stage of primary cancer diagnosis, whether the cancer was diagnosed following a screening examination, treatment with chemotherapy, treatment with radiation, smoking status, weight, prior non-colon cancer diagnosis, highest hemoglobin A1c (HbA1c) measurement in the period from 1 year prior to cancer diagnosis to 90 days after end of cancer treatment, prior hypertension, prior hypercholesterolemia, and use of insulin and/or sulfonylurea. Our cohort of interest consists of patients with diabetes and available HbA1c data (n=216)—80 treated subjects and 136 comparators.

Using this sample, we assess the association between metformin and the recurrence of colon cancer. Previous studies have shown metformin to lower the risk of cancer development, and metformin has been found to be significantly associated with a smaller risk of primary colon cancer in subjects with diabetes.45-48 Because metformin is already the recommended first-line treatment for diabetes, greater interest lies in whether its use may reduce cancer recurrence, potentially even in patients without diabetes. Prescribing practices underlying data observed in routine clinical care may result in clustering of covariate values and, possibly, positivity violations. For instance, recommended diabetes treatment tends to depend on the range in which the patient's HbA1c level falls.43

3.1. Approach

We first compare different approaches to modeling the PS— logistic regression with only the main effects for all the variables, BART, GBM, and SL (which includes BART, Bayesian logistic regression, classification and regression training (caret), GBM, logistic regression, and random forest)—and their effect on the overlap region. We consider three PS-based definitions of the overlap region for trimming.

  1. (C): Those with PSs outside [.1, .9] are excluded (Crump et al.'s recommendation).19

  2. (S): Stürmer et al.’s asymmetrical trimming of treated subjects with PS below the 5th percentile and control subjects with PS above the 95th percentile.21

  3. (N): Nethery et al.'s36 definition of the region of overlap with u = .1 and v = 7.

The size of the trimmed sample and differences in overlap status are compared across the different PS models. Next, we examine the resulting subsamples provided by the various trimming strategies in terms of their empirical covariate distributions. Further, we provide effect estimates in the context of target estimands and populations. The employed methods are as follows (Table 1).

Table 1.

Methods for addressing positivity violations that we employ in the colon cancer recurrence data analysis.

Approach: PS-based trimming
Methods: (C), (S), and (N) applied to Logistic PS, BART PS, GBM PS, and SL PS
Details: PSs are estimated using the specified model or algorithm, and trimming is performed based on the three definitions of overlap.

Approach: Alternative trimming approaches
Method: Cardinality matching
Details: We implement fine balance (exact balance of categorical variables), which ensures equal counts for nominal covariate categories between the treatment groups.49 Continuous covariates are binned using 10 categories.
Method: Hill & Su
Details: For cut-offs, we define M1 to be the 90th percentile of s_i^g1 for subjects who received metformin and M0 to be the 90th percentile of s_i^g0 for those who did not. These bounds trim subjects whose counterfactual standard deviations are greater than most (90%) of the standard deviations under the observed treatment condition, avoiding the impact of extreme outliers in either treatment group and ensuring more overlap in the trimmed sample.

Approach: Weighting
Method: OW
Details: Estimated PSs from the four types of models are used to compute weights.

Approach: Extrapolation
Method: BART+SPL
Details: We specify u = .1, v = 7.
(C): Crump et al.’s definition of overlap
(S): Stürmer et al.’s definition of overlap
(N): Nethery et al.’s definition of overlap
PS: propensity score
BART: Bayesian additive regression trees
GBM: gradient boosting machines
SL: Super Learner
OW: overlap weights
BART+SPL: extrapolation method for addressing nonoverlap as proposed by Nethery et al.(2019)

We compute the average causal effect as a risk difference from the alternative trimming approaches, overlap weights, and BART+SPL. Specifics regarding estimation for each method are presented in the Appendix.

3.2. Results

3.2.1. Assessments of overlap based on propensity scores

PSs estimated using logistic regression, BART, GBM, and SL are shown in Figure 5. Logistic regression resulted in more subjects with estimated PS very close to 1. The nonparametric (BART) and machine learning approaches (GBM and SL) gave similar PS distributions and showed greater separation of subjects who are less likely to take metformin.

Figure 5. Histograms of propensity scores estimated using (a) logistic regression, (b) Bayesian additive regression trees (BART), (c) gradient boosting machines (GBM), and (d) Super Learner (SL).

Using Crump et al.’s definition, the percentages of subjects remaining in the analytic sample after trimming were 81.5%, 99.1%, 80.1%, and 98.1% when logistic regression, BART, GBM, and SL, respectively, were used to estimate the PS. The analogous percentages from Stürmer et al.’s definition were 92.1%, 94.0%, 95.4%, and 92.1%. Based on Nethery et al.’s definition, the overlap proportion based on logistic PS was 58.8%, while BART, GBM, and SL PS gave overlap proportions of 83.8%, 25.5%, and 45.8%, respectively. For PSs estimated with logistic regression, the proportion of subjects whose overlap statuses agreed—that is, subjects designated as being in the overlap region or in the nonoverlap region by all three definitions—was 66.7%. The analogous percentages for BART, GBM, and SL were 84.3%, 27.8%, and 46.3%, respectively. Even comparing the two approaches that trim at the tails, (C) and (S), the proportions in agreement were 89.4%, 94.0%, 80.1%, and 92.1% for logistic regression, BART, GBM, and SL, respectively. These results suggest that different definitions of overlap can differ substantially in their classification of which subjects satisfy the positivity assumption. Although logistic regression and GBM gave PSs closer to 1, indicating certain subjects are very likely to receive metformin, there was a substantial number of values near the middle. Trimming only at the tails of the PS distributions therefore resulted in a much larger subsample. On the other hand, Nethery et al.’s definition with the specified u and v values resulted in less overlap because of the small sample size and the differences in PS values between those who received metformin and those who did not. A larger u or smaller v would place fewer subjects in the region of nonoverlap, highlighting the impact of these user-specified parameters on assessments of nonoverlap.

Table 2 gives the percentages of subjects who had the same overlap status under different PS models. There is a smaller discrepancy between logistic regression and each of the other models when trimming occurs at the tails of the PS distribution. Because there are fewer subjects with PSs near 0 and 1, most are considered in the overlap region. When the definition of overlap from Nethery et al. is used, the agreement across models tends to be low with 41.7% of subjects having the same overlap status when comparing BART and GBM estimated PSs, for example. By assessing nonoverlap across the PS range, the Nethery et al. definition better captured the amount of nonoverlap and the differences in PS values estimated using different models. Thus, although the overall amount of nonoverlap may appear similar, different models may give different estimates and disagree on the overlap status for a particular subject.

Table 2.

Percentage (%) of subjects with the same overlap status (either trimmed or retained) based on estimated propensity scores obtained from various types of models.

Crump et al.'s recommendation to exclude those with PS outside [.1, .9]
                  Logistic   BART    GBM     Super Learner
  Logistic          100.0
  BART               82.4    100.0
  GBM                81.0     81.0   100.0
  Super Learner      83.3     99.1    81.9   100.0

Stürmer et al.'s trimming at the lower and upper 5th percentiles
                  Logistic   BART    GBM     Super Learner
  Logistic          100.0
  BART               96.3    100.0
  GBM                93.0     95.8   100.0
  Super Learner     100.0     92.6    90.3   100.0

Nethery et al.'s definition of the region of overlap with u = .1, v = 7
                  Logistic   BART    GBM     Super Learner
  Logistic          100.0
  BART               72.2    100.0
  GBM                61.1     41.7   100.0
  Super Learner      73.1     61.1    73.1   100.0

3.2.2. Shifts in target populations and estimators

Not only does trimming reduce sample sizes, but subject characteristics may also change (Table 3). (We include the covariate distribution for the samples from Stürmer et al.’s asymmetrical trimming as Table A1 in the Appendix.) For instance, the subsamples based on logistic, GBM, and SL estimated PSs and Nethery et al.’s definition tend to have a smaller proportion of male subjects and lower Charlson score on average than the original sample. For the distribution of race, trimming based on logistic estimated PS, GBM and SL estimated PS using Nethery et al.’s definition, and cardinality matching gave samples that did not contain all the original racial categories. Thus, for rare characteristics, trimmed samples may exclude some subgroups entirely, suggesting a change in the population of inference. Although Hill & Su’s approach did not discard many subjects, the percentage in the trimmed sample that received chemotherapy (9.3%) is lower than that of the original sample (13.0%); this difference from the original is larger than those observed for the other approaches (even ones that discarded substantially more observations). Thus, the distribution of covariate values in the sample may differ by discarding rules.

Table 3.

Sample size and descriptive statistics at cancer diagnosis for covariates of interest for the original and trimmed samples.

Columns (left to right): Original; (C) Logistic PS; (N) Logistic PS; (N) BART PS; (C) GBM PS; (N) GBM PS; (C) SL PS; (N) SL PS; Cardinality Matching; Hill & Su
n 216 176 127 181 173 55 212 94 88 194
Age 70 (11) 70 (10) 71 (10) 71 (10) 69 (10) 70 (9) 70 (11) 70 (10) 68 (11) 71 (11)
Sex (male) 56.5 55.1 49.6 55.8 57.2 49.1 57.1 48.9 56.8 52.6
Race
WH 83.8 83.0 84.3 84.0 80.9 78.2 83.5 84.0 79.5 85.1
BA 6.0 5.1 4.7 5.0 6.9 9.1 6.1 5.3 6.8 5.2
AS 5.6 6.8 5.5 6.6 6.4 7.3 5.7 8.5 6.8 5.7
IN 1.4 1.7 2.4 1.7 1.7 3.6 1.4 1.1 2.3 1.5
HP 0.5 0 0 0.6 0.6 0 0.5 0 0 0.5
MU 0.9 1.1 0.8 0.6 1.2 0 0.9 0 0 1.0
OT/UN 1.9 2.3 2.4 1.7 2.3 1.8 1.9 1.1 4.5 1.0
Charlson score 2.38 (1.67) 2.15 (1.48) 2.22 (1.54) 2.34 (1.58) 2.21 (1.51) 2.18 (1.39) 2.33 (1.64) 2.14 (1.36) 2.15 (1.43) 2.29 (1.58)
Tumor location
Left 40.3 39.8 38.6 42.0 42.8 40.0 41.0 48.9 40.9 41.8
Transverse 9.7 9.7 10.2 10.5 9.8 12.7 9.9 9.6 15.9 8.2
Right 50.0 50.6 51.2 47.5 47.4 47.3 49.1 41.5 43.2 50.0
Tumor stage
I 42.6 41.5 42.5 42.0 42.8 34.5 42.5 37.2 40.9 46.9
IIA 45.8 46.0 48.0 46.4 44.5 54.5 45.8 53.2 45.5 42.3
IIB 6.9 7.4 4.7 6.1 8.7 7.3 7.1 5.3 9.1 6.7
IIIA 4.6 5.1 4.7 5.5 4.0 3.6 4.7 4.3 4.5 4.1
Screening 26.4 25.6 27.6 27.6 27.2 32.7 26.9 28.7 27.3 27.3
Chemotherapy 13.0 12.5 11.0 11.6 13.9 12.7 13.2 13.8 15.9 9.3
Radiotherapy 2.8 2.8 3.1 3.3 3.5 3.6 2.8 4.3 2.3 2.1
Weight 199.09 (50.41) 199.02 (46.28) 195.01 (42.97) 198.10 (51.18) 201.21 (49.53) 198.36 (47.46) 199.99 (50.40) 191.99 (43.38) 204.18 (42.54) 198.10 (50.68)
Smoking 57.4 55.1 52.0 56.4 53.8 47.3 56.5 50.0 52.3 55.7
Prior non-colon cancer 12.0 11.4 11.0 11.0 11.0 14.5 11.3 13.8 6.8 12.9
HbA1c 8.15 (2.02) 8.18 (1.94) 8.16 (2.00) 8.12 (2.00) 8.38 (1.99) 8.61 (1.84) 8.19 (2.02) 8.54 (2.10) 8.54 (1.94) 7.99 (1.95)
Hypertension 62.0 60.2 59.8 61.3 62.4 65.5 61.3 56.4 63.6 61.3
Hyper-cholesterolemia 33.3 31.8 29.1 32.0 32.4 25.5 33.0 26.6 25.0 30.9
Insulin Use 36.1 33.0 35.4 35.4 35.3 41.8 35.4 41.5 37.5 33.0
Sulfonylurea Use 34.7 35.2 32.3 33.7 34.1 29.1 35.8 34.0 28.4 34.5

The cells for continuous variables present mean (SD) and those for categorical or binary variables give percentages (%).

(C) BART PS is not included because only two subjects were trimmed.

The categories for race are White (WH), Black or African American (BA), Asian (AS), American Indian or Alaska Native (IN), Native Hawaiian or Other Pacific Islander (HP), multiple categories reported (MU), and other/unknown (OT/UN). HbA1c refers to hemoglobin A1c, a measure of average blood sugar levels.

Figure 6 presents effect estimates from the trimming procedures, overlap weighting, and BART+SPL. With the exception of cardinality matching and Hill & Su’s method, point estimates suggest that metformin may have a protective effect against colon cancer recurrence, as a smaller proportion of those given metformin experienced recurrence. However, we would not necessarily expect the estimates to be the same as they are measuring different quantities and reflect differences in how positivity violations were addressed. The estimates may also correspond to different populations as discussed previously. For trimming methods, the target population reflects the subsample remaining after discarding observations. On the other hand, BART+SPL preserves the original population intended for inference. The standard error from the BART+SPL, however, is larger than those obtained for the other methods, suggesting a lack of efficiency.

Figure 6. Effect estimates and 95% intervals for various methods for addressing nonoverlap. For methods that use BART as an outcome model, a 95% credible/posterior interval is given. The outcome was not observed (no recurrence) in the trimmed sample for the (N) GBM PS method.

4. Discussion

In observational studies, ignoring positivity violations may result in unstable or inaccurate estimates. To address nonoverlap, the appropriate approach should be governed by the way overlap is defined, the study’s objectives, and the population of interest. Our data analysis demonstrates that the distribution of subject characteristics may be altered because of trimming since certain subjects are left out of the analysis. Weighting methods also change the covariate distribution by making certain characteristics more prominent. Thus, although trimming and weighting approaches focus analysis on the sample for which positivity holds, they shift the target of inference. To preserve the original target of inference, extrapolation methods (e.g., BART+SPL) are recommended, but the added uncertainty from extrapolating to nonoverlap regions must be accounted for with larger estimates of variability.

The estimates of the ATE of metformin on colon cancer recurrence obtained from the various methods differed but their validity depends on the study’s objectives. In this case, all methods suggested that metformin is not significantly related to colon cancer recurrence, but it is important to note that these results may correspond to different patient populations and different estimands.

To estimate causal effects, the positivity assumption may be evaluated by comparing the treatment and comparator groups in terms of their confounder values. Considering whether each patient subgroup is in the population of interest using treatment eligibility guidelines may help determine whether the violation is structural or practical. Depending on the approach employed, the population to which results may be generalized and specifications of user-defined parameters should be communicated.

Given the increasing emphasis on personalized medicine, it is also important to consider positivity in that context.50 If a subgroup analysis is deemed appropriate for a given study, then it is recommended that the propensity scores be re-estimated within that subgroup. All of the accompanying diagnostics should then be repeated, including checking for positivity violations, and, if needed, the approaches that we have presented (trimming, weighting, or extrapolation) can be applied to the subgroup.

Although we have focused on the dichotomous treatment setting in this paper, we note that there is a need to extend approaches that address positivity violations to settings with multi-level or continuous treatments. Further, the possibility of time-dependent treatments and covariates raises questions regarding how positivity may be assessed in longitudinal studies and which models are appropriate for addressing violations. Future studies that consider covariate nonoverlap in these more complex settings are warranted.

Supplementary Material

Appendix

Key Points.

  • Causal analysis of observational data requires several assumptions, one of which is the positivity assumption that may be assessed by comparing the covariate distributions of the treatment groups and determining whether there is sufficient overlap.

  • Violations of the positivity assumption occur when certain patient subgroups are not observed to receive a treatment of interest and may lead to challenges for standard estimation and inference of treatment effects.

  • Trimming, weighting, and extrapolation are potential approaches to take when data exhibits nonoverlap; however, the suitability of each approach depends on study objectives and the population of interest.

  • Propensity score-based approaches rely on model specifications and decisions that, along with discarding rules, may affect evaluations of overlap and the resulting sample for analysis.

  • Characteristics of the study population should be included when communicating study findings.

Funding

This work was supported in part by grants from the National Cancer Institute (R01CA172973, principal investigator: JC; R21CA227613, principal investigator: RAH).

Footnotes

Conflict of Interest

The authors report no conflict of interest.

References

1. Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika. 1983;70(1):41–55. doi: 10.2307/2335942
2. Roy J, Mitra N. Measured and accounted-for confounding in pharmacoepidemiologic studies: Some thoughts for practitioners. Pharmacoepidemiol Drug Saf. doi: 10.1002/pds.5189
3. D’Amour A, Ding P, Feller A, Lei L, Sekhon J. Overlap in observational studies with high-dimensional covariates. J Econom. 2021;221(2):644–654. doi: 10.1016/j.jeconom.2019.10.014
4. Hernán MA, Robins JM. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. Am J Epidemiol. 2016;183(8):758–764. doi: 10.1093/aje/kwv254
5. Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC; 2020.
6. Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Stat Methods Med Res. 2012;21(1):31–54. doi: 10.1177/0962280210386207
7. King G, Zeng L. The Dangers of Extreme Counterfactuals. Polit Anal. 2006;14(2):131–159. doi: 10.1093/pan/mpj004
8. Chubak J, Yu O, Ziebell RA, et al. Risk of colon cancer recurrence in relation to diabetes. Cancer Causes Control. 2018;29(11):1093–1103. doi: 10.1007/s10552-018-1083-3
9. Rubin DB. Causal Inference Using Potential Outcomes. J Am Stat Assoc. 2005;100(469):322–331. doi: 10.1198/016214504000001880
10. Imbens GW. The Role of the Propensity Score in Estimating Dose-Response Functions. Biometrika. 2000;87(3):706–710.
11. Imbens G, Hirano K. The propensity score with continuous treatments. Published online 2004.
12. Galagate D. Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response functions with applications. Published online 2016. doi: 10.13016/M2Q48K
13. Hill JL. Bayesian Nonparametric Modeling for Causal Inference. J Comput Graph Stat. 2011;20(1):217–240. doi: 10.1198/jcgs.2010.08162
14. Kreif N, Grieve R, Díaz I, Harrison D. Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury. Health Econ. 2015;24(9):1213–1228. doi: 10.1002/hec.3189
15. Linden A, Uysal SD, Ryan A, Adams JL. Estimating causal effects for multivalued treatments: a comparison of approaches. Stat Med. 2016;35(4):534–552. doi: 10.1002/sim.6768
16. Kennedy EH, Ma Z, McHugh MD, Small DS. Nonparametric methods for doubly robust estimation of continuous treatment effects. J R Stat Soc Ser B Stat Methodol. 2017;79(4):1229–1245. doi: 10.1111/rssb.12212
17. McCaffrey DF, Griffin BA, Almirall D, Slaughter ME, Ramchand R, Burgette LF. A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Stat Med. 2013;32(19):3388–3414. doi: 10.1002/sim.5753
18. Ghosh D. Relaxed covariate overlap and margin-based causal effect estimation. Stat Med. 2018;37(28):4252–4265. doi: 10.1002/sim.7919
19. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika. 2009;96(1):187–199. doi: 10.1093/biomet/asn055
20. Yang S, Imbens GW, Cui Z, Faries DE, Kadziola Z. Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics. 2016;72(4):1055–1065. doi: 10.1111/biom.12505
21. Stürmer T, Rothman KJ, Avorn J, Glynn RJ. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study. Am J Epidemiol. 2010;172(7):843–854. doi: 10.1093/aje/kwq198
22. Rosenbaum PR. Optimal Matching of an Optimally Chosen Subset in Observational Studies. J Comput Graph Stat. 2012;21(1):57–71. doi: 10.1198/jcgs.2011.09219
23. Visconti G, Zubizarreta J. Handling limited overlap in observational studies with cardinality matching. Obs Stud. 2018;4:217–249.
24. Ho DE, Imai K, King G, Stuart EA. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Polit Anal. 2007;15(3):199–236. doi: 10.1093/pan/mpl013
25. Cochran WG, Rubin DB. Controlling Bias in Observational Studies: A Review. Sankhyā Indian J Stat. 1973;35(4):417–446.
26. Vincent JL, Baron J-F, Reinhart K, et al. Anemia and blood transfusion in critically ill patients. JAMA. 2002;288(12):1499–1507. doi: 10.1001/jama.288.12.1499
27. Dehejia R, Wahba S. Propensity Score Matching Methods For Non-Experimental Causal Studies. Rev Econ Stat. 2002;84:151–161. doi: 10.1162/003465302317331982
28. Heckman JJ, Ichimura H, Todd PE. Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. Rev Econ Stud. 1997;64(4):605–654. doi: 10.2307/2971733
29. Smith JA, Todd PE. Does matching overcome LaLonde’s critique of nonexperimental estimators? J Econom. 2005;125(1):305–353. doi: 10.1016/j.jeconom.2004.04.011
30. Hill J, Su Y-S. Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children’s cognitive outcomes. Ann Appl Stat. 2013;7(3):1386–1420. doi: 10.1214/13-AOAS630
31. Hu L, Gu C, Lopez M, Ji J, Wisnivesky J. Estimation of causal effects of multiple treatments in observational studies with a binary outcome. Stat Methods Med Res. 2020;29(11):3218–3234. doi: 10.1177/0962280220921909
32. Traskin M, Small DS. Defining the Study Population for an Observational Study to Ensure Sufficient Overlap: A Tree Approach. Stat Biosci. 2011;3(1):94–118. doi: 10.1007/s12561-011-9036-3
33. Li F, Morgan KL, Zaslavsky AM. Balancing Covariates via Propensity Score Weighting. J Am Stat Assoc. 2018;113(521):390–400. doi: 10.1080/01621459.2016.1260466
34. Li F, Thomas LE, Li F. Addressing Extreme Propensity Scores via the Overlap Weights. Am J Epidemiol. 2019;188(1):250–257. doi: 10.1093/aje/kwy201
35. Li F, Li F. Propensity score weighting for causal inference with multiple treatments. Ann Appl Stat. 2019;13:2389–2415. doi: 10.1214/19-AOAS1282
36. Nethery RC, Mealli F, Dominici F. Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality. Ann Appl Stat. 2019;13(2):1242–1267. doi: 10.1214/18-AOAS1231
37. Sauer BC, Brookhart MA, Roy J, VanderWeele T. A review of covariate selection for non-experimental comparative effectiveness research. Pharmacoepidemiol Drug Saf. 2013;22(11):1139–1145. doi: 10.1002/pds.3506
38. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149–1156. doi: 10.1093/aje/kwj149
39. Chipman HA, George EI, McCulloch RE. BART: Bayesian additive regression trees. Ann Appl Stat. 2010;4(1):266–298. doi: 10.1214/09-AOAS285
40. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001;29(5):1189–1232. doi: 10.1214/aos/1013203451
41. Ridgeway G. Generalized Boosted Models: A Guide to the GBM Package. Compute. 2005;1:1–12.
42. van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol. 2007;6(1). doi: 10.2202/1544-6115.1309
43. Cavaiola TS, Pettus JH. Management Of Type 2 Diabetes: Selecting Amongst Available Pharmacological Agents. In: Feingold KR, Anawalt B, Boyce A, et al., eds. Endotext. MDText.com, Inc.; 2000.
44. Krentz AJ, Bailey CJ. Oral antidiabetic agents: current role in type 2 diabetes mellitus. Drugs. 2005;65(3):385–411. doi: 10.2165/00003495-200565030-00005
45. Sehdev A, Shih Y-CT, Vekhter B, Bissonnette MB, Olopade OI, Polite BN. Metformin for primary colorectal cancer prevention in patients with diabetes: A case-control study in a US population. Cancer. 2015;121(7):1071–1078. doi: 10.1002/cncr.29165
46. Dowling RJO, Goodwin PJ, Stambolic V. Understanding the benefit of metformin use in cancer treatment. BMC Med. 2011;9:33. doi: 10.1186/1741-7015-9-33
47. Higurashi T, Nakajima A. Metformin and Colorectal Cancer. Front Endocrinol. 2018;9:622. doi: 10.3389/fendo.2018.00622
48. Zhang Z-J, Zheng Z-J, Kan H, et al. Reduced risk of colorectal cancer with metformin therapy in patients with type 2 diabetes: a meta-analysis. Diabetes Care. 2011;34(10):2323–2328. doi: 10.2337/dc11-0512
49. Rosenbaum PR, Ross RN, Silber JH. Minimum Distance Matched Sampling With Fine Balance in an Observational Study of Treatment for Ovarian Cancer. J Am Stat Assoc. 2007;102(477):75–83.
50. Yang S, Lorenzi E, Papadogeorgou G, Wojdyla DM, Li F, Thomas LE. Propensity score weighting for causal subgroup analysis. Stat Med. 2021. doi: 10.1002/sim.9029
51. Horvitz DG, Thompson DJ. A Generalization of Sampling Without Replacement from a Finite Universe. J Am Stat Assoc. 1952;47(260):663–685. doi: 10.1080/01621459.1952.10483446
