Abstract
An important strategy for identifying principal causal effects (popular estimands in settings with noncompliance) is to invoke the principal ignorability (PI) assumption. As PI is untestable, it is important to gauge how sensitive effect estimates are to its violation. We focus on this task for the common one-sided noncompliance setting where there are two principal strata, compliers and noncompliers. Under PI, compliers and noncompliers share the same outcome-mean-given-covariates function under the control condition. For sensitivity analysis, we allow this function to differ between compliers and noncompliers in several ways, indexed by an odds ratio, a generalized odds ratio, a mean ratio, or a standardized mean difference sensitivity parameter. We tailor sensitivity analysis techniques (with any sensitivity parameter choice) to several types of PI-based main analysis methods, including outcome regression, influence function (IF) based, and weighting methods. We discuss range selection for the sensitivity parameter. We illustrate the sensitivity analyses with several outcome types from the JOBS II study. For simplicity and accessibility, this application estimates nuisance functions parametrically. In addition, we establish rate conditions on nonparametric nuisance estimation under which IF-based estimators are asymptotically normal, with a view to informing nonparametric inference.
Keywords: principal stratification, complier average causal effect, principal ignorability, sensitivity analysis
1 |. INTRODUCTION
The study of causal effects of a treatment is often complicated by noncompliance. The principal stratification framework1 defines types (principal strata) of study participants based on their potential compliance to treatment conditions. In the one-sided noncompliance setting where individuals in the control condition do not have access to the active treatment, there are two principal strata: compliers, who would take the treatment if offered, and noncompliers, who would not. In the two-sided noncompliance setting where all individuals (assigned to either treatment or control) can access the treatment, there are four principal strata, often known as compliers, always-takers, never-takers, and defiers. Principal causal effects are effects of treatment assignment within each stratum, $E[Y_1 - Y_0 \mid C = c]$, where $Y_1$ and $Y_0$ are potential outcomes2 under assignment of active treatment and of control, respectively, and $C$ denotes principal stratum. Of common interest is the complier average causal effect (CACE), but other principal causal effects may also be of interest3,4.
This paper focuses on one-sided noncompliance, which is common in studies where the treatment is designed and implemented by the study and is not otherwise available, e.g., job search training for unemployed workers5, a volunteering program for the elderly6, or weight management for people with mental illness7. We briefly comment on the two-sided noncompliance case in the Discussion section.
The challenge in identifying principal causal effects is that principal stratum membership is only partially observed; with one-sided noncompliance, $C$ is not observed in the control condition. Effect identification thus requires untestable assumptions. One such assumption is the exclusion restriction8 (ER), which posits that treatment assignment does not affect the outcome other than through its effect on treatment received. This means there is no effect on noncompliers, and effects on compliers explain the full effect of treatment assignment. ER is not suitable if treatment receipt is not strictly binary, i.e., when noncompliers are exposed to some active ingredients in the treatment arm9,10. This may happen when an intervention includes several components and only a major one is used to define compliance. It may also arise from dichotomization, e.g., when only people who attend more than a certain number of treatment sessions are classified as compliers6. ER may also fail if there are compensating behaviors or psychological effects due to being assigned to one condition as opposed to the other11.
Another identification strategy does not restrict the noncomplier effect to zero, but instead invokes the principal ignorability (PI) assumption12,11,13. This assumption posits that, conditional on a set of pre-treatment-assignment covariates $X$, the potential outcome under control $Y_0$ is independent (or mean-independent) of principal stratum $C$, i.e., compliers and noncompliers share the same conditional distribution (or mean function) of $Y_0$. PI may be appealing for studies with rich baseline covariate data. As randomized trials and cohort studies tend to collect a lot of covariate data, one might hope that the covariates account for a substantial part of the dependence between $Y_0$ and $C$. On the other hand, most studies are not designed with noncompliance in mind, so little attention is paid to measuring covariates that predict compliance type and would render $Y_0$ and $C$ conditionally independent; PI may therefore be violated.
1.1 |. Our contribution
In this paper we focus on the PI assumption. Specifically, we develop methods to evaluate the robustness of the estimated principal causal effects to violation of PI in the one-sided noncompliance setting. We introduce several sensitivity parameterizations representing how (within levels of $X$) the mean of $Y_0$ differs between compliers and noncompliers. These are indexed by an odds ratio, generalized odds ratio, mean ratio, or standardized mean difference, suitable for use with different outcome types. In addition, we tailor sensitivity analysis techniques for pairing with a range of estimation methods that may be used for the PI-based main analysis, including weighting, outcome regression and influence function based estimation.
We illustrate the proposed sensitivity analysis methods using the JOBS II Intervention Study5, where unemployed workers were randomized to receive either a week-long training program to promote mental health and provide job search skills (treatment) or a booklet with job search tips (control). Just over half of those randomized to treatment actually attended the training, resulting in a setting with compliers and noncompliers. JOBS II has been used by authors investigating different aspects of principal stratification, e.g., identification and estimation under PI12,11, alternative identification assumptions14, bias due to failed assumptions15, and noncompliance combined with outcome missingness16. For our purposes, JOBS II is an interesting example for two reasons: (i) the study paid attention to the issue of noncompliance and collected baseline data on workers’ motivation to participate in a hypothetical training program on job search skills, making this a prime case for invoking PI; and (ii) the study collected outcomes of several types (binary, continuous and bounded) to which the methods we propose are relevant.
1.2 |. Related work
To our knowledge, two methods have been proposed to assess sensitivity of effect estimates to PI violation. The method used in Ding and Lu (2017)13 is the closest to our work and inspired it. In the one-sided noncompliance context, this method allows the mean of $Y_0$ given $X$ to differ between compliers and noncompliers by a ratio that serves as the sensitivity parameter, and estimates effects under each value of the sensitivity parameter by modifying a PI-based weighting estimator. The application was with a binary outcome, flu-related hospitalization. A drawback is that with a binary outcome this mean ratio parameter may yield predictions greater than 1. This motivated our expansion of the range of sensitivity parameterizations to accommodate different outcome types. Also, we consider sensitivity analysis techniques that pair with different types of PI-based estimators, not just the weighting estimator. The second sensitivity analysis method is that of Wang et al. (2023)17 for survival outcomes, which imputes unobserved and under a parametric model containing a hazard ratio sensitivity parameter. This work differs from our approach in that it relies on this parametric model for identification, whereas we make explicit the assumption required for identification and then use modeling only for estimation. We also avoid refitting models for every value of the sensitivity parameter.
There are methods to assess sensitivity of principal causal effect estimates to violation of assumptions other than PI, such as treatment assignment ignorability18,19 and ER20. These are not our current focus.
To discuss sensitivity analysis, we will need to start with a description of PI-based estimation. While PI-based methods have been discussed in the literature, it has been in settings that are somewhat different, e.g., randomized treatment assignment15,11,13 (which we do not require), a qualitatively different assumption14, or two-sided rather than one-sided noncompliance21. The PI-based estimators we list in this paper share certain features (e.g., principal score weighting, multiple robustness) with these earlier works, but are based on results for the current setting.
The paper proceeds as follows. Section 2 presents the setting, the estimands, and identification under PI. Section 3 introduces three types of PI-based estimators to be handled with different sensitivity analysis techniques. Sections 4 and 5 present sensitivity analysis using ratio-type and difference-type sensitivity parameters, respectively, and address each of the three estimator types. Section 6 covers other topics relevant to the sensitivity analyses. Section 7 analyzes JOBS II data. Section 8 closes with a discussion. Proofs are provided in the Web Appendices. Code is provided in the R-package PIsens available at https://github.com/trangnguyen74/PIsens.
2 |. SETTING, ESTIMANDS, AND PI-BASED IDENTIFICATION
2.1 |. Setting, estimands, and standard assumptions
Let $Z$ denote treatment assignment (1 for treatment, 0 for control), $Y$ denote the observed outcome, $Y_z$ the potential outcome had treatment $z$ been assigned, and $X$ denote baseline covariates. Let $S$ be a binary variable indicating whether the person actually receives the treatment ($S = 1$) or not ($S = 0$). (More generally, $S$ can be any post-treatment variable of interest6,22,4.) The principal stratification framework1 defines subpopulations (aka principal strata, denoted by $C$) based on $S_1$ and $S_0$, the potential values of $S$ under assignment to treatment and to control. In the one-sided noncompliance setting, $S_0 = 0$, so only $S_1$ matters. Hence $C$ coincides with $S_1$, and there are two principal strata: compliers ($C = 1$), who would take the treatment if offered it, and noncompliers ($C = 0$), who would not. The “full” data for an individual are $(X, Z, C, Y_0, Y_1)$; the observed data are $O = (X, Z, S, Y)$. Assume that we observe $n$ i.i.d. copies of $O$.
Here we are interested in the complier and noncomplier average causal effects (CACE and NACE). As the PI identification strategy is symmetric with respect to these two effects (and so are the sensitivity assumptions we consider), we focus on the generic estimand
$$\Delta_c := E[Y_1 - Y_0 \mid C = c],$$
where $c = 1$ gives the CACE and $c = 0$ gives the NACE.
Throughout we assume the usual causal inference assumptions:
Under , we write to simplify presentation.
As several expressions appear repeatedly in the paper, we will use the shorthand notation
$$\mu_{zc}(X) := E[Y_z \mid X, C = c], \quad \mu_0(X) := E[Y \mid X, Z = 0], \quad \pi(X) := P(Z = 1 \mid X), \quad e_c(X) := P(C = c \mid X),$$
for $z, c \in \{0, 1\}$. Note the difference between $\mu_{0c}(X)$, which is the conditional mean of a potential outcome within a principal stratum, and $\mu_0(X)$, which concerns the observed outcome in the control condition and does not condition on principal stratum. $\pi(X)$ is the propensity score. $e_c(X)$ is the probability of being in stratum $c$ given covariate values, which we also refer to as the principal score, following the literature12,15,11,13.
Proofs of all results in this section are provided in Web Appendix B.
2.2 |. The identification challenge and the PI assumption
Identification of $\Delta_c$ amounts to identification of $\nu_{1c} := E[Y_1 \mid C = c]$ and $\nu_{0c} := E[Y_0 \mid C = c]$. The challenge is that while A0-A2 identify $\nu_{1c}$, they are not sufficient to identify $\nu_{0c}$. To see this, we start with the identity below.
Lemma 1.
$$E[Y_z \mid C = c] = \frac{E[e_c(X)\,\mu_{zc}(X)]}{E[e_c(X)]} \tag{1}$$
(To simplify presentation, it is left implicit that $\mu_{zc}(X)$ is only defined where $e_c(X) > 0$.)
Lemma 1 says that $E[Y_z \mid C = c]$ is equal to the weighted average of the stratum-specific potential outcome mean $\mu_{zc}(X)$, where the weight is proportional to the principal score $e_c(X)$. This means $\Delta_c$ can be identified via identification of $e_c(X)$ and $\mu_{zc}(X)$, which we address next.
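To make Lemma 1 concrete, the minimal Python sketch below checks it (for $z = 0$) on a small, hypothetical discrete distribution invented for illustration: $X \in \{0, 1, 2\}$, a principal score $e_1(x)$, and a distribution of $Y_0 \in \{0, 1, 2\}$ within each $(x, c)$ cell. It computes $E[Y_0 \mid C = c]$ once directly from the joint law of $(X, C, Y_0)$ and once as the principal-score-weighted average of the stratum-specific means $E[Y_0 \mid X, C = c]$, using exact rational arithmetic so the two agree exactly. All names are ours.

```python
from fractions import Fraction as F

# Hypothetical toy law (not from the paper): X in {0, 1, 2}, C in {1, 0},
# and Y0 in {0, 1, 2} with P(Y0 = y | X = x, C = c) listed per (x, c).
p_x = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}
e1 = {0: F(1, 4), 1: F(1, 2), 2: F(4, 5)}  # e_1(x) = P(C = 1 | X = x)
p_y0 = {
    (0, 1): [F(1, 5), F(3, 5), F(1, 5)], (0, 0): [F(1, 2), F(1, 2), F(0)],
    (1, 1): [F(1, 10), F(2, 5), F(1, 2)], (1, 0): [F(1, 3), F(1, 3), F(1, 3)],
    (2, 1): [F(0), F(1, 4), F(3, 4)], (2, 0): [F(1, 4), F(1, 2), F(1, 4)],
}


def e(c, x):
    """Principal score e_c(x) = P(C = c | X = x)."""
    return e1[x] if c == 1 else 1 - e1[x]


def mu0c(c, x):
    """Stratum-specific conditional mean E[Y0 | X = x, C = c]."""
    return sum(y * p for y, p in enumerate(p_y0[(x, c)]))


def mean_from_joint(c):
    """E[Y0 | C = c] computed directly from the joint law of (X, C, Y0)."""
    num = sum(y * p_x[x] * e(c, x) * p
              for x in p_x for y, p in enumerate(p_y0[(x, c)]))
    return num / sum(p_x[x] * e(c, x) for x in p_x)


def mean_lemma1(c):
    """Right-hand side of Lemma 1: E[e_c(X) mu_0c(X)] / E[e_c(X)]."""
    num = sum(p_x[x] * e(c, x) * mu0c(c, x) for x in p_x)
    return num / sum(p_x[x] * e(c, x) for x in p_x)


for c in (1, 0):
    assert mean_from_joint(c) == mean_lemma1(c)
```

Exact fractions are used so that the check is an equality rather than a floating-point tolerance.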
Proposition 1 (Results without PI).
Under assumptions A0-A2,
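$$e_c(X) = P(S = c \mid X, Z = 1), \qquad \mu_{1c}(X) = E[Y \mid X, Z = 1, S = c], \tag{2}$$
$$\mu_{0c}(X) = E[Y \mid X, Z = 0, C = c], \tag{3}$$
$$\nu_{1c} = \frac{E[e_c(X)\,\mu_{1c}(X)]}{E[e_c(X)]} = \frac{E\left[\frac{Z\,I(S = c)}{\pi(X)}\,Y\right]}{E\left[\frac{Z\,I(S = c)}{\pi(X)}\right]}, \tag{4}$$
$$\mu_0(X) = e_1(X)\,\mu_{01}(X) + e_0(X)\,\mu_{00}(X). \tag{5}$$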
Proposition 1 shows that A0-A2 identify $e_c(X)$ and $\mu_{1c}(X)$, but not $\mu_{0c}(X)$. (The RHS of (3) conditions on $C$, which is not observed for $Z = 0$.) Hence $\nu_{1c}$ is identified, but $\nu_{0c}$ is not, so $\Delta_c$ is not.
The problem here is nonidentifiability of the stratum-specific conditional mean functions $\mu_{0c}(X)$. These two functions, $\mu_{01}(X)$ for compliers (where $c = 1$) and $\mu_{00}(X)$ for noncompliers (where $c = 0$), are tied together as two unknowns in one equation, (5), which we will call the mixture equation. To identify them, some additional assumption is needed.
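The following minimal sketch shows numerically why one equation cannot pin down two unknowns. At a single covariate value, it takes hypothetical identified values of $\mu_0(x)$ and $e_1(x)$, picks several candidate complier means $\mu_{01}(x)$, and solves the mixture equation for the matching noncomplier mean $\mu_{00}(x)$. Every pair reproduces the same observable $\mu_0(x)$, so the data cannot distinguish among them.

```python
def mu00_from_mixture(mu0, e1, mu01):
    """Solve mu0 = e1 * mu01 + (1 - e1) * mu00 for mu00 (requires e1 < 1)."""
    return (mu0 - e1 * mu01) / (1 - e1)


# Hypothetical identified quantities at a single covariate value x.
mu0, e1 = 0.40, 0.55
for mu01 in (0.30, 0.40, 0.50):
    mu00 = mu00_from_mixture(mu0, e1, mu01)
    # Every (mu01, mu00) pair reproduces the same observable mean mu0(x).
    assert abs(e1 * mu01 + (1 - e1) * mu00 - mu0) < 1e-12
    print(f"mu01 = {mu01:.2f}, mu00 = {mu00:.4f}")
```

Only the middle pair, with $\mu_{01}(x) = \mu_{00}(x)$, corresponds to PI; the other pairs are equally compatible with the observed data.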
One such assumption is PI, which we state here as a conditional mean independence:
A3 (PI): $E[Y_0 \mid X, C = 1] = E[Y_0 \mid X, C = 0]$, i.e., $\mu_{01}(X) = \mu_{00}(X)$.
PI is also sometimes stated as $Y_0 \perp C \mid X$ (which implies A3). This version is more intuitive: it is satisfied if $X$ captures all common causes of $Y_0$ and $C$13. Like other authors, we assume that A3 and A1 involve the same set of covariates; this can be relaxed.
A3 combined with (5) solves the identification problem.
Proposition 2 (PI based identification).
Under assumptions A0-A3,
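$$\mu_{0c}(X) = \mu_0(X), \tag{6}$$
$$\nu_{0c} = \frac{E[e_c(X)\,\mu_0(X)]}{E[e_c(X)]} = \frac{E\left[\frac{Z\,I(S = c)}{\pi(X)}\,\mu_0(X)\right]}{E\left[\frac{Z\,I(S = c)}{\pi(X)}\right]} = \frac{E\left[\frac{(1 - Z)\,e_c(X)}{1 - \pi(X)}\,Y\right]}{E\left[\frac{(1 - Z)\,e_c(X)}{1 - \pi(X)}\right]}. \tag{7}$$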
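We will refer to the observed data functionals in Proposition 2 that identify $\mu_{0c}(X)$ and $\nu_{0c}$ as $\mu_{0c}^{\text{PI}}(X)$ and $\nu_{0c}^{\text{PI}}$, and the corresponding result for $\Delta_c$ (i.e., $\nu_{1c} - \nu_{0c}^{\text{PI}}$) as $\Delta_c^{\text{PI}}$.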
Remark 1 (Sufficient PI version).
A3 involves $Y_0$ but not $Y_1$. Feller et al. (2017)11 call this assumption weak PI to differentiate it from a different assumption (strong PI) that involves both potential outcomes, $E[Y_z \mid X, C = 1] = E[Y_z \mid X, C = 0]$ for $z = 0, 1$. While these labels suggest a difference in degree, the two assumptions are qualitatively different. Strong PI implies that, conditional on $X$, the average causal effect is constant across principal strata, which is generally not desired11. As A3 is sufficient (and strong PI is unnecessary), we simply refer to A3 as PI.
PI is untestable. The sensitivity analyses in Sections 4 and 5 will each replace PI with an alternative assumption (sensitivity assumption) indexed by a sensitivity parameter representing deviation from PI. Such an assumption obtains alternative identification results for and . The sensitivity analysis then shows, for a plausible range of the sensitivity parameter, how effect estimates depart from those obtained in a PI-based analysis.
3 |. THREE TYPES OF PI-BASED ESTIMATORS FROM THE LENS OF SENSITIVITY ANALYSIS
It is desirable to develop sensitivity analysis methods that are simple modifications of PI-based methods. With this in mind, in this section we group estimators of into three types (each with a few example estimators), which we anticipate can be adapted for sensitivity analysis using different techniques (in subsequent sections). This grouping may be useful generally, say, where it is desirable to use a different sensitivity assumption not covered in this paper.
With three estimator types, the presentation from here through Section 5 is slightly complex. Readers who are mainly looking to add a sensitivity analysis to an already conducted or planned PI-based analysis only need to focus on the type of their estimator and can ignore the others.
Proofs of results in this section are provided in Web Appendix C.
3.1 |. Type A (≈ outcome regression estimators)
As PI-based analysis relies on the identification result $\mu_{0c}(X) = \mu_0(X)$, an obvious sensitivity analysis technique (applicable to any PI-based method that involves estimating $\mu_0(X)$) is to replace $\mu_0(X)$ with the alternative formula for $\mu_{0c}(X)$ identified under the sensitivity assumption. We aim to use this technique with type A (roughly, outcome regression) estimators.
To be precise, type A estimators involve estimating $\mu_0(X)$ in order to first estimate the principal causal effect conditional on covariates (which under PI is $\mu_{1c}(X) - \mu_0(X)$) or a proxy for it, and then aggregate these conditional effects to estimate the average principal causal effect $\Delta_c$. Examples include the principal-score-weighted outcome-regression estimator (aka the plug-in estimator) (8) and the propensity-score-weighted outcome-regression estimator (9):
| (8) |
| (9) |
where the hat notation indicates an estimated function. These are justified by the formulae in (4) and the first two formulae in (7). Also included in type A is a multiply robust outcome regression estimator, , which we will present after explaining type B estimators.
For each estimator here, the component to be replaced in sensitivity analysis is the estimate of $\mu_0(X)$.
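As a concrete illustration of a type A estimator, the minimal Python sketch below computes a plug-in estimate of $\Delta_c$ under PI. It assumes the plug-in estimator (8) takes the form $\sum_i \hat e_c(X_i)[\hat\mu_{1c}(X_i) - \hat\mu_0(X_i)] / \sum_i \hat e_c(X_i)$, which follows from Lemma 1 and PI. For simplicity it assumes a discrete covariate, so the nuisance functions are estimated by cell proportions and cell means (a saturated choice made here for illustration, not the parametric models used in the application). The data and the function name `plugin_pi` are hypothetical.

```python
from fractions import Fraction as F
from statistics import mean


def plugin_pi(data, c):
    """Plug-in (principal-score-weighted outcome regression) estimate of
    Delta_c under PI, with nuisances estimated by cell means of a discrete X.

    data: list of (x, z, s, y) tuples; every X cell must contain control
    units and treated units with S = c.
    """
    e_hat, mu1_hat, mu0_hat = {}, {}, {}
    for x in {row[0] for row in data}:
        treated = [(s, y) for xx, z, s, y in data if xx == x and z == 1]
        # e_c(x) = P(S = c | X = x, Z = 1)
        e_hat[x] = F(sum(1 for s, _ in treated if s == c), len(treated))
        # mu_1c(x): mean outcome of treated units with S = c
        mu1_hat[x] = mean(F(y) for s, y in treated if s == c)
        # mu_0(x): mean outcome of control units (the PI stand-in for mu_0c)
        mu0_hat[x] = mean(F(y) for xx, z, _, y in data if xx == x and z == 0)
    num = sum(e_hat[x] * (mu1_hat[x] - mu0_hat[x]) for x, _, _, _ in data)
    den = sum(e_hat[x] for x, _, _, _ in data)
    return num / den


# Hypothetical toy sample of (x, z, s, y).
data = [(0, 1, 1, 5), (0, 1, 0, 2), (0, 0, 0, 3), (0, 0, 0, 1),
        (1, 1, 1, 6), (1, 1, 1, 8), (1, 1, 0, 4), (1, 0, 0, 5)]
print(plugin_pi(data, 1))  # 17/7  (CACE estimate)
print(plugin_pi(data, 0))  # -2/5  (NACE estimate)
```

The line computing `mu0_hat` is the component a type A sensitivity analysis would swap out.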
3.2 |. Type B (≈ influence function based estimators)
Type B estimators are a subset of estimators constructed based on the nonparametric influence function (IF) of (hence the rough label IF-based estimators, although not all IF-based estimators belong in type B). To define this type precisely, let
In this notation, and . A type B estimator of is one that can be expressed as a combination of IF-based estimators of and . The sensitivity analysis technique will be to replace the component with an IF-based estimator of under the sensitivity assumption. To obtain these estimators, we derive the relevant IFs.
Proposition 3 (IFs for PI-based analysis).
The IFs of , and are
| (10) |
| (11) |
| (12) |
| (13) |
The estimator that uses the IF of (with estimated nuisances) as the estimating function is a type B estimator. This is because, by (13), this estimator has the form
| (14) |
where (with representing sample average)
are IF-based estimators of .
Another type B estimator is the Hájek-type 23 estimator,
| (15) |
where , , are a modified version of , , , replacing with and with . (We call this modification Hájek-ization.)
and are multiply robust (see Proposition 4 below). is range-preserving.
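The range-preserving property can be seen in a generic form of Hájek normalization. The sketch below assumes the usual meaning of the term: a weighted sum divided by the sum of the weights rather than by the sample size. The exact terms modified in (15) are as defined above; the weights and binary outcomes here are hypothetical.

```python
def ht_mean(weights, ys, n):
    """Horvitz-Thompson-type mean: weighted sum divided by the sample size n."""
    return sum(w * y for w, y in zip(weights, ys)) / n


def hajek_mean(weights, ys):
    """Hajek-type mean: weighted sum divided by the sum of the weights."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical binary outcomes of weighted units in a sample of size n = 4.
ys = [1, 1, 0, 1]
weights = [2.5, 2.0, 0.5, 1.5]
print(ht_mean(weights, ys, 4))   # 1.5, outside [0, 1]
print(hajek_mean(weights, ys))   # about 0.923, inside [0, 1]
```

Because the Hájek version is a convex combination of the outcomes, it can never leave their observed range, whereas dividing by $n$ can when the weights do not average to one in the sample.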
Circling back to type A.
We now present the multiply robust outcome regression estimator mentioned earlier. This is a multi-step estimator (the MS subscript is for “multi-step”) that is based on expressing the IF of as a sum of three terms:
| (16) |
and building steps that zero out the sample means of the terms. The resulting estimator is
| (17) |
Here . and are specific estimators of and is fit to (non)compliers in the treatment arm weighted by , is fit to control units weighted by , and both are mean-recovering models (i.e., on the sample to which the model is fit, the mean of model predictions equals the outcome mean). These models zero out the sample means of and , and the weighted averaging in (17) zeros out the sample mean of . (The estimator (17) can also be Hájek-ized, for another version.)
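A simple example of a mean-recovering model is weighted least squares with an intercept. More generally, a generalized linear model with canonical link and an intercept, fit by weighted maximum likelihood, has this property because the intercept's score equation forces the weighted residuals to sum to zero. The minimal sketch below fits a one-covariate weighted linear regression and checks that the weighted mean of predictions equals the weighted mean of outcomes. The covariate, outcomes and weights are hypothetical stand-ins for the weights used in (17).

```python
def weighted_ols(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def weighted_mean(vals, ws):
    """Weighted mean of vals with weights ws."""
    return sum(w * v for w, v in zip(ws, vals)) / sum(ws)


# Hypothetical covariate, outcome and weights for the units the model is fit to.
xs = [0.2, 1.1, 1.9, 3.0, 4.2]
ys = [1.0, 2.5, 2.0, 4.1, 5.3]
ws = [0.5, 2.0, 1.0, 3.0, 1.5]
a, b = weighted_ols(xs, ys, ws)
preds = [a + b * x for x in xs]
# Mean-recovering: the weighted mean of predictions equals that of outcomes.
assert abs(weighted_mean(preds, ws) - weighted_mean(ys, ws)) < 1e-12
```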
Remark 2.
The tilde notation here refers to this specific method of estimating functions for this estimator. The weighting targets the model to the relevant covariate space where it is used for prediction, and the mean-recovering feature ensures that predictions are on average unbiased (if the weights are correct). This targeted estimation technique can also be used (but is not required) for estimating and for other estimators, and for estimating .
The multi-step estimator (17) shares the multiply robust property of (14) and (15) (see Proposition 4).
Proposition 4 (multiply robust PI-based estimators).
The estimators (14), (15) and (17) are consistent if any one of the following three conditions holds:
the propensity score and principal score models are correctly specified; or
the principal score model and both outcome models, for $\mu_{1c}(X)$ and $\mu_0(X)$, are correctly specified; or
the propensity score model and the model for the outcome under control, $\mu_0(X)$, are correctly specified.
For simplicity, we presume that estimation uses parametric models. We leave IF-based inference using data-adaptive nuisance estimation to future work, except for a small first step of deriving nonparametric rate conditions (see Section 6.3).
3.3 |. Type C (≈ other/weighting estimators)
Type C estimators do not involve estimating $\mu_0(X)$ as a step in the estimation procedure. This type includes the pure weighting estimator
| (18) |
justified by the second formula in (4) and the third formula in (7). Also included in type C is the estimator that employs this same weighting scheme and uses the weighted sample to fit a model regressing outcome on treatment and covariates, say, to improve precision in estimating $\Delta_c$ (in the spirit of24,25). For this type, we do not have a specific sensitivity analysis technique in mind, and will need to see whether the identification result under the sensitivity assumption allows a simple modification.
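The weighting idea can be sketched as follows. This minimal sketch assumes the pure weighting estimator (18) is Hájek-normalized, with treated units having $S = c$ weighted by $1/\hat\pi(X)$ and control units weighted by $\hat e_c(X)/[1 - \hat\pi(X)]$, the weights implied by the weighting formulas for $\nu_{1c}$ and $\nu_{0c}$ under PI. The discrete covariate, cell-proportion nuisance estimates, toy data and helper names are all hypothetical.

```python
from fractions import Fraction as F


def weighting_pi(data, c, pi_hat, ec_hat):
    """Pure weighting estimate of Delta_c under PI, Hajek-normalized.

    data: list of (x, z, s, y); pi_hat(x) estimates P(Z = 1 | X = x) and
    ec_hat(x) estimates the principal score e_c(x).
    """
    # Treated units with S = c, weighted by 1 / pi(X).
    t = [(1 / pi_hat(x), y) for x, z, s, y in data if z == 1 and s == c]
    # Control units, weighted by e_c(X) / (1 - pi(X)).
    u = [(ec_hat(x) / (1 - pi_hat(x)), y) for x, z, s, y in data if z == 0]
    nu1 = sum(w * y for w, y in t) / sum(w for w, _ in t)
    nu0 = sum(w * y for w, y in u) / sum(w for w, _ in u)
    return nu1 - nu0


def cell_share(data, x, keep, event):
    """Share of rows with X = x satisfying `keep` for which `event` holds."""
    rows = [r for r in data if r[0] == x and keep(r)]
    return F(sum(1 for r in rows if event(r)), len(rows))


# Hypothetical toy sample of (x, z, s, y).
data = [(0, 1, 1, 5), (0, 1, 0, 2), (0, 0, 0, 3), (0, 0, 0, 1),
        (1, 1, 1, 6), (1, 1, 1, 8), (1, 1, 0, 4), (1, 0, 0, 5)]


def pi_hat(x):   # P(Z = 1 | X = x) by cell proportion
    return cell_share(data, x, lambda r: True, lambda r: r[1] == 1)


def e1_hat(x):   # e_1(x) = P(S = 1 | X = x, Z = 1) by cell proportion
    return cell_share(data, x, lambda r: r[1] == 1, lambda r: r[2] == 1)


print(weighting_pi(data, 1, pi_hat, e1_hat))  # 17/7
```

With saturated cell-proportion nuisance estimates, as here, this weighting estimate coincides with the principal-score-weighted plug-in estimate (8) on the same sample; with parametric nuisance models the two generally differ.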
To sum up, we have defined three types of PI-based estimators: type A, whose defining feature is involving estimation of $\mu_0(X)$; type B, whose defining feature is having as a component an IF-based estimator of ; and type C, other estimators. We now consider sensitivity analysis.
4 |. SENSITIVITY ANALYSIS BASED ON THREE RATIO-TYPE SENSITIVITY PARAMETERS
Recall that the challenge before invoking PI was that the stratum-specific conditional means $\mu_{01}(X)$ and $\mu_{00}(X)$ are not identified, as they are two unknowns in the mixture equation
$$\mu_0(X) = e_1(X)\,\mu_{01}(X) + e_0(X)\,\mu_{00}(X).$$
PI identifies $\mu_{01}(X)$ and $\mu_{00}(X)$ by equating them to each other. A sensitivity analysis replaces PI with a sensitivity assumption that allows $\mu_{01}(X)$ and $\mu_{00}(X)$ to differ from each other. The assumption is indexed by a sensitivity parameter $\psi$ indicating how and to what degree they differ. To accommodate different outcome types (binary, bounded, unbounded) and different conceptualizations of how $\mu_{01}(X)$ and $\mu_{00}(X)$ may differ, we consider different parameterizations. The following assumptions use an odds ratio (OR), a generalized odds ratio (GOR) and a mean ratio (MR) sensitivity parameter. In all of them, $\psi = 1$ recovers the PI case.
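A4-OR (sensitivity odds ratio): $\dfrac{\mu_{01}(X)/[1 - \mu_{01}(X)]}{\mu_{00}(X)/[1 - \mu_{00}(X)]} = \psi$,
A4-GOR (sensitivity generalized odds ratio): $\dfrac{[\mu_{01}(X) - l]/[h - \mu_{01}(X)]}{[\mu_{00}(X) - l]/[h - \mu_{00}(X)]} = \psi$, where $l$ and $h$ are the lower and upper bounds of $Y_0$,
A4-MR (sensitivity mean ratio): $\mu_{01}(X)/\mu_{00}(X) = \psi$,
for some positive range of $\psi$ that is considered plausible.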
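Each assumption maps a noncomplier mean $\mu_{00}(x)$ to a complier mean $\mu_{01}(x)$. The minimal sketch below writes these maps as functions (the names, and `psi`, `lo`, `hi` for $\psi$, $l$, $h$, are ours). It inverts the generalized odds $g = (m - l)/(h - m)$ as $m = l + (h - l)\,g/(1 + g)$, which is why A4-GOR stays inside $(l, h)$ while A4-MR need not. The numbers are hypothetical and are not those of the example discussed next.

```python
def mu01_gor(mu00, psi, lo, hi):
    """A4-GOR: the generalized odds (m - lo) / (hi - m) of mu01 equal psi
    times those of mu00. Requires lo < mu00 < hi and psi > 0."""
    g = psi * (mu00 - lo) / (hi - mu00)  # generalized odds for compliers
    return lo + (hi - lo) * g / (1 + g)


def mu01_or(mu00, psi):
    """A4-OR: the special case of A4-GOR with lo = 0 and hi = 1."""
    return mu01_gor(mu00, psi, 0.0, 1.0)


def mu01_mr(mu00, psi):
    """A4-MR: mu01 = psi * mu00."""
    return psi * mu00


# Hypothetical outcome bounded on [0, 7] with mu00(x) = 6.5 at some x.
print(mu01_gor(6.5, 1.5, 0.0, 7.0))  # stays below 7
print(mu01_mr(6.5, 1.5))             # 9.75, outside the outcome range
```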
As mentioned in Section 1.2, a challenge with A4-MR is that it may predict out of the outcome range. For example, consider an outcome on a 0 to 7 scale. Suppose that for some covariate value and . Then a sensitivity MR value of 1.69 would imply . A4-MR is thus more suitable if the outcome is single-signed and unbounded. Since most outcomes are practically bounded, if using A4-MR, the parameter range should be carefully selected to avoid predicting extreme values; we will discuss this in Section 6.1.
For binary outcomes, we propose A4-OR, the assumption that within levels of $X$ (i) the odds of the outcome for compliers are $\psi$ times those for noncompliers, or equivalently (because ORs are symmetric), (ii) the odds of being a complier for those with the outcome are $\psi$ times those for those without the outcome. A4-OR predicts $\mu_{0c}(X)$ within [0, 1].
More generally, for outcomes bounded on both ends, we propose A4-GOR, a generalization of A4-OR. (A4-OR is a special case with $l = 0$ and $h = 1$.) Figure 1 shows the connection between $\mu_{01}(X)$ and $\mu_{00}(X)$ for several GOR values. If the outcome range varies with $X$, the bounds can be made $X$-value-specific, i.e., $l(X)$ and $h(X)$. A4-GOR always predicts $\mu_{0c}(X)$ within the specified bounds. For a non-binary outcome, however, A4-GOR may still contradict the observed outcome distribution in ways that are not obvious, e.g., by predicting values far from where the outcome mass is concentrated.
FIGURE 1.

Connection between $\mu_{01}(X)$ and $\mu_{00}(X)$ under A4-GOR for different GOR values
Remark 3 (Exponential tilting connection).
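A4-OR can be equivalently expressed as
$$p(y \mid X, C = 1) = \frac{\exp(y \log\psi)\,p(y \mid X, C = 0)}{E[\exp(Y_0 \log\psi) \mid X, C = 0]}, \tag{19}$$
where $p(y \mid X, C = c)$ denotes the conditional distribution (density or probability mass function) of $Y_0$ in stratum $c$,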
which looks like exponential tilting assumptions used in the context of non-ignorable missingness and unobserved confounding26,27,28. The difference is that in these other problems, the assumption connects an unobserved distribution (e.g., that of missing data) to an observed distribution (that of non-missing data), whereas here the assumption relates two otherwise unidentified distributions whose mixture (and mixing ratio) is identified. Here the tilting-like assumption (19) achieves identification with a binary outcome but not generally. If $Y_0$ is continuous, for example, (19) (combined with the mixing weights $e_c(X)$) is not sufficient to identify the component distributions (or their means) based on the mixture distribution $p(y \mid X, Z = 0)$.
Proofs of all results in this section are provided in Web Appendix D.
4.1 |. Identification
Combining any of the A4 assumptions with (5), we can identify $\mu_{0c}(X)$, which then identifies $\nu_{0c}$ and $\Delta_c$. We present results for A4-GOR (which includes A4-OR as a special case) and A4-MR.
To maintain symmetry, let .
Proposition 5 (GOR- and MR-based identification).
Under assumptions A0-A2 combined with A4-GOR,
| (20) |
and under assumptions A0-A2 combined with A4-MR,
| (21) |
where
Identification of $\nu_{0c}$, and hence of $\Delta_c$, follows from identification of $\mu_{0c}(X)$. We will label the results of these parameters under A4-GOR and A4-MR with superscripts GOR and MR, respectively.
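At a single covariate value, the identification step amounts to solving the mixture equation jointly with the sensitivity assumption. The sketch below does not reproduce the closed form (20). Instead it swaps in numerical bisection: under A4-GOR the left side $e_1\mu_{01}(m) + e_0 m$ is increasing in $m = \mu_{00}$ and runs from $l$ to $h$, so it has a unique root whenever $l < \mu_0 < h$. Under A4-MR, substituting $\mu_{01} = \psi\mu_{00}$ into the mixture equation gives $\mu_{00} = \mu_0/(e_0 + \psi e_1)$ directly (our own short derivation). Function names and inputs are hypothetical.

```python
def mu01_gor(mu00, psi, lo, hi):
    """Complier mean implied by A4-GOR given the noncomplier mean mu00."""
    g = psi * (mu00 - lo) / (hi - mu00)
    return lo + (hi - lo) * g / (1 + g)


def identify_gor(mu0, e1, psi, lo, hi):
    """Return (mu01, mu00) solving the mixture equation together with A4-GOR.

    The left side e1*mu01(m) + (1 - e1)*m is increasing in m = mu00 and runs
    from lo to hi, so bisection finds the unique root when lo < mu0 < hi.
    """
    a, b = lo, hi
    for _ in range(200):
        m = (a + b) / 2
        if not a < m < b:          # bracket exhausted at float precision
            break
        if e1 * mu01_gor(m, psi, lo, hi) + (1 - e1) * m < mu0:
            a = m
        else:
            b = m
    return mu01_gor(a, psi, lo, hi), a


def identify_mr(mu0, e1, psi):
    """Closed form under A4-MR: mu00 = mu0 / (e0 + psi*e1), mu01 = psi*mu00."""
    mu00 = mu0 / ((1 - e1) + psi * e1)
    return psi * mu00, mu00


mu01, mu00 = identify_gor(mu0=3.0, e1=0.6, psi=2.0, lo=0.0, hi=7.0)
print(round(mu01, 4), round(mu00, 4))
print(identify_mr(mu0=4.0, e1=0.5, psi=1.5))
```

A closed form such as (20) is preferable in practice; the bisection version is only meant to make the two defining equations explicit.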
4.2 |. Estimation
Based on the above identification results, we now modify the PI-based estimators. We let each resulting estimator inherit the label of the originating estimator, except replacing the superscript PI with one indicating the sensitivity assumption.
Figure 2 provides a summary of the key techniques presented here and in the next section.
FIGURE 2.

Flowchart summarizing key sensitivity analysis techniques that are applicable given PI-based estimator type and sensitivity parameterization
4.2.1 |. Type A estimators
These estimators are adapted by replacing the estimate of $\mu_0(X)$ with estimates of $\mu_{0c}^{\text{GOR}}(X)$ or $\mu_{0c}^{\text{MR}}(X)$. For example, this turns the principal-score-weighted outcome regression estimator (8) (aka the plug-in estimator) into
| (22) |
where $\hat\mu_{0c}^{\text{GOR}}(X)$ and $\hat\mu_{0c}^{\text{MR}}(X)$ are (20) and (21) evaluated at $\hat e_c(X)$ and $\hat\mu_0(X)$. The other outcome-regression estimators (9) and (17) are adapted similarly.
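The sketch below applies this replacement to a plug-in estimate under A4-MR, using the closed form $\mu_{0c}^{\text{MR}}(x) = \mu_0(x)\,\psi^c/[e_0(x) + \psi e_1(x)]$, which we derive from A4-MR and the mixture equation (for $c = 1$ it gives $\psi\mu_{00}$, for $c = 0$ it gives $\mu_{00}$). As an illustrative assumption, the covariate is discrete and nuisances are cell means; the data and names are hypothetical. Setting $\psi = 1$ returns the PI-based plug-in estimate, and the nuisance models are fit once and reused for every $\psi$.

```python
from fractions import Fraction as F
from statistics import mean


def nuisances(data):
    """Cell-mean estimates for discrete X: e1_hat, mu1_hat[(x, c)], mu0_hat."""
    e1_hat, mu1_hat, mu0_hat = {}, {}, {}
    for x in {r[0] for r in data}:
        treated = [(s, y) for xx, z, s, y in data if xx == x and z == 1]
        e1_hat[x] = F(sum(s for s, _ in treated), len(treated))
        for c in (0, 1):
            mu1_hat[(x, c)] = mean(F(y) for s, y in treated if s == c)
        mu0_hat[x] = mean(F(y) for xx, z, _, y in data if xx == x and z == 0)
    return e1_hat, mu1_hat, mu0_hat


def plugin_mr(data, c, psi):
    """Plug-in estimate of Delta_c with mu_0 replaced by mu_0c under A4-MR."""
    e1_hat, mu1_hat, mu0_hat = nuisances(data)
    psi = F(psi)

    def e(x):        # e_c(x)
        return e1_hat[x] if c == 1 else 1 - e1_hat[x]

    def mu0c_mr(x):  # mu_0(x) * psi^c / (e_0(x) + psi * e_1(x))
        return mu0_hat[x] * psi ** c / ((1 - e1_hat[x]) + psi * e1_hat[x])

    num = sum(e(x) * (mu1_hat[(x, c)] - mu0c_mr(x)) for x, _, _, _ in data)
    den = sum(e(x) for x, _, _, _ in data)
    return num / den


# Hypothetical toy sample of (x, z, s, y).
data = [(0, 1, 1, 5), (0, 1, 0, 2), (0, 0, 0, 3), (0, 0, 0, 1),
        (1, 1, 1, 6), (1, 1, 1, 8), (1, 1, 0, 4), (1, 0, 0, 5)]
for psi in (1, 2):
    print(psi, plugin_mr(data, 1, psi))  # 17/7 at psi = 1, 11/7 at psi = 2
```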
4.2.2 |. Type B estimators
Adaptation is based on the IFs of and , which are provided in Proposition 6.
Proposition 6 (GOR- and MR-based IFs).
The IFs for and are
| (23) |
| (24) |
where
Based on Proposition 6, under A4-GOR and A4-MR, we obtain estimators and by replacing the component of (14) with and , respectively, where
| (25) |
| (26) |
and the functions are estimated by evaluating them at and .
(15) is modified similarly to obtain sensitivity estimators and by replacing with and , the Hájek-ized version of and .
4.2.2.1 |. Partial loss of robustness.
Proposition 4 stated that several PI-based estimators are multiply robust, including type B estimators (14) and (15), and the multi-step type A estimator (17). The adaptation of these estimators for sensitivity analysis results in partial loss of robustness (see Proposition 7). The resulting GOR-based estimators depend on correct specification of the models for $e_c(X)$ and $\mu_0(X)$ (i.e., they are inconsistent if either model is misspecified). The MR-based counterparts depend on correct specification of the model for $e_c(X)$.
Proposition 7 (Partial loss of robustness).
and are consistent for if
both the model for $e_c(X)$ and the model for $\mu_0(X)$ are correctly specified, AND
either the model for $\pi(X)$ or the model for $\mu_{1c}(X)$ is correctly specified.
and are consistent for if
the model for $e_c(X)$ is correctly specified, AND
either the model for $\pi(X)$ or both outcome models, for $\mu_{1c}(X)$ and $\mu_0(X)$, are correctly specified.
Remark 4 (Approximate robustness).
Among these sensitivity estimators, the type B estimators are in a sense more robust than the multi-step type A estimators: they have an approximate robustness property with respect to the model component(s) whose correct specification they require for consistency. Specifically, (i) while all six estimators depend on a correct model for $e_c(X)$, the type B estimators provide a first-order correction of the bias (that would be incurred if simply using the plug-in estimator (22)) due to the deviation of the probability limit of $\hat e_c(X)$ from the true function $e_c(X)$. Also, (ii) while all three GOR-based estimators additionally depend on a correct model for $\mu_0(X)$, the type B estimators provide a first-order correction of the bias due to the deviation of the probability limit of $\hat\mu_0(X)$ from the true function $\mu_0(X)$. (This first-order bias correction feature is also shared by the originating PI-based estimators, and results in the robustness of those estimators.)
We give a quick explanation of (ii) to make this concrete. (For full details concerning Remark 4, see the Web Appendix.) If and are correctly specified but is not, the probability limit of both and is the sum of two terms
| (27) |
(which result from the last and first terms in (25)). These are the first two terms in the Taylor expansion of the true parameter treated as a function of at the point . The first term coincides with the probability limit of the plug-in estimator, which is biased due to . The second term provides a first-order correction of this bias. For this approximate robustness property to be beneficial, however, needs to be close to .
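The effect of a first-order correction can be illustrated generically. In the hypothetical sketch below, the complier control mean under A4-OR at a single covariate value (with $e_1(x) = 0.5$ and $\psi = 3$, chosen for illustration) is treated as a smooth function $\theta(\mu_0)$. The sketch compares the plug-in value $\theta(\mu_0^*)$ at a wrong limit $\mu_0^*$ with the one-term Taylor correction $\theta(\mu_0^*) + \theta'(\mu_0^*)(\mu_0 - \mu_0^*)$. In the actual type B estimators the correction term is estimated from data through the IF; here the true $\mu_0(x)$ is plugged in directly, so the sketch only shows why the remaining error shrinks roughly quadratically in the deviation while the plug-in error shrinks linearly.

```python
def mu01_gor(mu00, psi, lo, hi):
    """Complier mean implied by A4-GOR given the noncomplier mean mu00."""
    g = psi * (mu00 - lo) / (hi - mu00)
    return lo + (hi - lo) * g / (1 + g)


def theta(mu0, e1=0.5, psi=3.0, lo=0.0, hi=1.0):
    """mu01 at one covariate value as a function of mu0, under A4-OR
    (A4-GOR with lo = 0, hi = 1), solved from the mixture equation."""
    a, b = lo, hi
    for _ in range(200):
        m = (a + b) / 2
        if not a < m < b:
            break
        if e1 * mu01_gor(m, psi, lo, hi) + (1 - e1) * m < mu0:
            a = m
        else:
            b = m
    return mu01_gor(a, psi, lo, hi)


def dtheta(mu0, h=1e-6):
    """Central finite-difference derivative of theta."""
    return (theta(mu0 + h) - theta(mu0 - h)) / (2 * h)


true_mu0 = 0.4                       # hypothetical true value of mu0(x)
target = theta(true_mu0)
results = []
for delta in (0.2, 0.1, 0.05):
    limit = true_mu0 + delta         # limit of a misspecified mu0 model
    plug_in = theta(limit)
    corrected = plug_in + dtheta(limit) * (true_mu0 - limit)
    results.append((delta, abs(plug_in - target), abs(corrected - target)))
for row in results:
    print("delta=%.2f  plug-in error=%.5f  corrected error=%.5f" % row)
```

As the closing sentence of the paragraph above notes, the benefit depends on the limit of the misspecified model being close to the truth; for large deviations the neglected higher-order terms need not be small.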
4.2.3 |. Type C estimators
We consider A4-MR and A4-GOR separately. Under A4-MR, the convenient form of (21) allows a simple adaptation of type C estimators: to estimate $\nu_{0c}$, scale the outcome in control units by a factor of $\psi^c/[e_0(X) + \psi e_1(X)]$, then use the PI-based analysis method. For the pure weighting estimator specifically, this adaptation results in the estimator
This outcome scaling technique is justified by the result below, a corollary of Proposition 5.
Corollary 1 (MR-based outcome scaling).
| (28) |
Remark 5.
When specializing to the randomized treatment setting, (28) simplifies, and one expression of the specialized version of (28) is , which appeared in Ding and Lu (2017, proposition 3)13. Based on this expression, this prior work characterizes the MR-based sensitivity analysis as an under/overweighting of the principal score by a factor of . Interestingly, this characterization breaks the interpretation of $\nu_{0c}$ as a weighted average (our starting point in Lemma 1, which we have maintained throughout). Our new insight here is that the appearance of this factor in (28) is due to the fact that under A4-MR the conditional outcome mean $\mu_{0c}(X)$ is identified by $\mu_0(X)$ multiplied by a function of the principal scores. It is thus more natural to characterize the analysis as scaling the outcome by that function. Also, because it leaves the principal score weights alone, this outcome scaling technique applies to type C estimators generally, not just to the pure weighting estimator.
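The outcome scaling technique can be checked on a toy example. The minimal sketch below assumes a Hájek-normalized pure weighting estimator (treated units with $S = c$ weighted by $1/\hat\pi(X)$, control units by $\hat e_c(X)/[1 - \hat\pi(X)]$), a discrete covariate with cell-proportion nuisance estimates, and the scaling factor $\psi^c/[e_0(X) + \psi e_1(X)]$ that we derive from A4-MR and the mixture equation. Only the control outcomes are rescaled; the weights are left exactly as in the PI-based analysis. Data and names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical toy sample of (x, z, s, y) with a discrete covariate x.
data = [(0, 1, 1, 5), (0, 1, 0, 2), (0, 0, 0, 3), (0, 0, 0, 1),
        (1, 1, 1, 6), (1, 1, 1, 8), (1, 1, 0, 4), (1, 0, 0, 5)]


def cell_share(x, keep, event):
    """Share of rows with X = x satisfying `keep` for which `event` holds."""
    rows = [r for r in data if r[0] == x and keep(r)]
    return F(sum(1 for r in rows if event(r)), len(rows))


def pi_hat(x):   # P(Z = 1 | X = x)
    return cell_share(x, lambda r: True, lambda r: r[1] == 1)


def e1_hat(x):   # e_1(x) = P(S = 1 | X = x, Z = 1)
    return cell_share(x, lambda r: r[1] == 1, lambda r: r[2] == 1)


def weighting_mr(c, psi):
    """Pure weighting estimate of Delta_c under A4-MR via outcome scaling."""
    psi = F(psi)

    def ec(x):
        return e1_hat(x) if c == 1 else 1 - e1_hat(x)

    def scale(x):  # psi^c / (e_0(x) + psi * e_1(x)); equals 1 when psi = 1
        return psi ** c / ((1 - e1_hat(x)) + psi * e1_hat(x))

    treated = [(1 / pi_hat(x), y) for x, z, s, y in data if z == 1 and s == c]
    control = [(ec(x) / (1 - pi_hat(x)), scale(x) * y)
               for x, z, s, y in data if z == 0]
    nu1 = sum(w * y for w, y in treated) / sum(w for w, _ in treated)
    nu0 = sum(w * y for w, y in control) / sum(w for w, _ in control)
    return nu1 - nu0


print(weighting_mr(1, 1))  # 17/7, the PI-based weighting estimate
print(weighting_mr(1, 2))  # 11/7
```

With saturated nuisance estimates, the value at $\psi = 2$ matches what one obtains by plugging $\hat\mu_{0c}^{\text{MR}}(X)$ into the plug-in estimator on the same sample, consistent with the outcome-scaling characterization.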
Under A4-GOR, there is no result similar to (28) that separates from functions of ; therefore, no simple modification is available for type C estimators. The pure weighting estimator (18) (but not type C generally) can be adapted by replacing $Y$ in its second term with an estimate of $\mu_{0c}^{\text{GOR}}(X)$ (which requires estimating $\mu_0(X)$). For this estimator to reduce to (18) when $\psi = 1$, $\mu_0(X)$ has to be estimated by the model $\tilde\mu_0(X)$ (defined in (17)). However, with $\mu_0(X)$ estimated, there are other options for estimating $\nu_{0c}^{\text{GOR}}$ that one might prefer to such a modification, e.g., replacing the whole second term of (18) with . This obtains the type A estimator , which inconveniently does not reduce to (18) when $\psi = 1$. Hence this is one place where we break the convention of respecting the primacy of the main analysis and recommend that, if a GOR-based sensitivity analysis is to be conducted, a type A (or type B) estimator be used for the main analysis.
5 |. SENSITIVITY ANALYSIS BASED ON A DIFFERENCE-TYPE SENSITIVITY PARAMETER
A4-OR, A4-GOR and A4-MR all assume that the means of $Y_0$ differ between compliers and noncompliers in some multiplicative manner. If one believes the difference is additive, it is more appropriate to use a sensitivity parameter that involves $\mu_{01}(X) - \mu_{00}(X)$. We propose using a standardized mean difference (SMD). For convenient notation, let $\sigma^2_{0c}(X) := \mathrm{Var}(Y_0 \mid X, C = c)$.
A simple SMD-based assumption is
A4-SMD (sensitivity standardized mean difference): $\dfrac{\mu_{01}(X) - \mu_{00}(X)}{\sqrt{[\sigma^2_{01}(X) + \sigma^2_{00}(X)]/2}} = \eta$.
The denominator here is an “average” standard deviation: the quadratic mean of $\sigma_{01}(X)$ and $\sigma_{00}(X)$ (the within-stratum conditional standard deviations of $Y_0$). This standard deviation scale helps in selecting a range for $\eta$, and users can tap into intuition about SMDs from other contexts (e.g., measuring effect size29 or covariate imbalance30). $\eta = 0$ recovers the PI case; indicates a substantial complier-noncomplier difference in the outcome under control.
Inconveniently, A4-SMD combined with A0-A2 only partially identifies $\mu_{0c}(X)$. For a simple sensitivity analysis, we consider the stronger assumption below, which supplements A4-SMD with an equal variance assumption:
A4-SMDe: $\sigma^2_{01}(X) = \sigma^2_{00}(X)$ and $\mu_{01}(X) - \mu_{00}(X) = \eta\,\sigma_{00}(X)$.
5.1 |. Identification
For symmetry, let and .
Proposition 8 (SMDe-based identification).
Under A0-A2 combined with A4-SMDe,
| (29) |
| (30) |
| (31) |
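The logic of this identification can be sketched at a single covariate value. The derivation below is our own and should be checked against (29) to (31). The observed control-arm variance decomposes by the law of total variance as $\mathrm{Var}(Y \mid X, Z = 0) = \sigma^2 + e_1 e_0 (\mu_{01} - \mu_{00})^2 = \sigma^2(1 + e_1 e_0 \eta^2)$ under A4-SMDe, which gives $\sigma$. The mixture equation then gives $\mu_{01} = \mu_0 + e_0\eta\sigma$ and $\mu_{00} = \mu_0 - e_1\eta\sigma$. The inputs are hypothetical.

```python
import math


def identify_smde(mu0, var0, e1, eta):
    """Return (mu01, mu00, sigma) under A4-SMDe at one covariate value.

    mu0 = E[Y | X, Z=0], var0 = Var(Y | X, Z=0), e1 = e_1(X). Uses
    var0 = sigma^2 + e1 * e0 * (mu01 - mu00)^2 with mu01 - mu00 = eta * sigma.
    """
    e0 = 1 - e1
    sigma = math.sqrt(var0 / (1 + e1 * e0 * eta ** 2))
    return mu0 + e0 * eta * sigma, mu0 - e1 * eta * sigma, sigma


mu01, mu00, sigma = identify_smde(mu0=2.0, var0=1.5, e1=0.6, eta=0.5)
# Check: the implied two-component mixture reproduces the observed moments.
mix_mean = 0.6 * mu01 + 0.4 * mu00
mix_var = sigma ** 2 + 0.6 * 0.4 * (mu01 - mu00) ** 2
assert abs(mix_mean - 2.0) < 1e-12 and abs(mix_var - 1.5) < 1e-12
```

Because $\sigma$ itself depends on $\eta$, the shift from the PI value is not exactly linear in $\eta$, although it is zero at $\eta = 0$.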
If equal variance is not assumed, $\mu_{0c}(X)$ is not point identified, but bounds can be obtained. The bounds can be narrowed if one additionally assumes that $\sigma_{01}(X)$ and $\sigma_{00}(X)$ differ from each other by less than a certain factor (see Proposition 8b in the Web Appendix).
5.2 |. Estimation
This sensitivity analysis requires estimating . For simplicity, in the illustration we use a quasi-likelihood approach assuming the outcome’s conditional variance is proportional to a function of its mean. An alternative is to directly model based on in control units.
With the simple result (31), each estimator of we obtain is an estimator of minus times an estimator of . This is the case regardless of the type of the PI-based estimator.
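The following minimal sketch shows this "PI-based estimate minus a shift" structure using the simplest plug-in averaging. Because P(C = 1 | X) equals the principal score π(X), a complier average of any function g(X) can be estimated by Σπ(X_i)g(X_i) / Σπ(X_i) over the analysis sample, and analogously with 1 − π for noncompliers. This is only one way to estimate the shift and is not the paper's IF-based or weighting estimators. The equal-variance SMD reading, with the SMD defined as the complier mean minus the noncomplier mean, is our assumption, and all names are hypothetical.

```python
import math


def _smde_sigmas(pi_c, v0, delta):
    # Common within-stratum SD per unit, from v0 = sigma^2 * (1 + pi(1 - pi) delta^2).
    return [math.sqrt(v / (1.0 + p * (1.0 - p) * delta ** 2)) for p, v in zip(pi_c, v0)]


def smde_effects(cace_pi, nace_pi, pi_c, v0, delta):
    """Hypothetical plug-in adjustment of PI-based CACE and NACE under SMDe.

    pi_c: estimated principal scores; v0: estimated control-arm conditional
    variances, one per unit in the analysis sample.
    """
    sig = _smde_sigmas(pi_c, v0, delta)
    k = sum(p * (1.0 - p) * s for p, s in zip(pi_c, sig))
    # E[Y0 | complier] rises by delta * k / sum(pi), so the CACE falls by that amount;
    # E[Y0 | noncomplier] falls by delta * k / sum(1 - pi), so the NACE rises by it.
    cace = cace_pi - delta * k / sum(pi_c)
    nace = nace_pi + delta * k / sum(1.0 - p for p in pi_c)
    return cace, nace
```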
5.2.1 |. Simple type A estimators
Adaptation of (8) and (9) by replacing with yields the following estimators:
| (32) |
| (33) |
Rather than applying the same adaptation to the multi-step estimator (17), thanks to the special form of , we can adapt it in the same way we adapt other IF-based estimators.
5.2.2 |. IF-based estimators (including type B and multi-robust type A)
We adapt these estimators using IF-based estimators of . Let and . Then .
Proposition 9 (SMDe-based IF).
The IFs of and are
| (34) |
| (35) |
where
Based on Proposition 9, we have the estimator
| (36) |
where
is the IF-based estimator of (where , and are estimated by plugging in , and ), and is the IF-based estimator of (defined under (14)). In addition, we have the estimator based on Hájek-ized versions of and ,
| (37) |
Then the adapted IF-based estimators are
| (38) |
| (39) |
| (40) |
Remark 6.
and depend on consistent estimation of and (they are inconsistent if either component is inconsistent), but they have an approximate robustness property: (i) if , and are consistent but is not, the estimator provides a first-order correction of the bias of the plug-in estimator due to the deviation of the probability limit of from the true ; and (ii) if , and are consistent but is not, the estimator provides a first-order correction of the bias due to the deviation of the probability limit of from the true . (See details in the Web Appendix.)
5.2.3 |. Other estimators
While any PI-based estimator can be paired with any estimator, to keep things simple it is reasonable to pair non-IF-based estimators with either (32) or (33), which are not IF-based. As outcome modeling is needed to estimate for the sensitivity analysis, however, we recommend switching to a type A or IF-based estimator for the PI-based main analysis.
6 |. OTHER TOPICS
6.1 |. Using data in considering the range of the MR and SMD parameters
We now return to the issue that certain sensitivity parameters may predict extreme values. An example concerns the outcome earnings in our illustrative study. Since earnings span a large range, it may be intuitive to think of earnings as differing in a multiplicative rather than additive manner, so a researcher may choose to use A4-MR for a sensitivity analysis. But earnings are not unbounded, and there is a maximum observed earnings value in the dataset, so we would be right to worry that certain sensitivity MR values may predict some values that are too high. A4-SMDe also has this issue (to a lesser degree): predicted values may be too high or too low. A4-GOR and A4-OR, on the other hand, predict within bounds.
We can use the data to gauge what values of the MR or SMD sensitivity parameter may be extreme, if we are willing to also specify bounds for the stratum-specific conditional means, . With A4-MR (and a non-negative outcome), we fix an upper bound (B) for . With A4-SMDe, we fix a pair of upper and lower bounds. These can be informed by the observed outcome distribution, but are not necessarily bounds on the outcome itself. They are required to satisfy or for all values in the data.
For each covariate value, we can obtain an interval for the MR/SMD sensitivity parameter that does not predict outside of these assumed bounds. (This interval is derived in Web Appendix F; see Propositions 10 and 11.) We estimate such intervals for all covariate values and examine the distributions of their upper and lower ends to judge which ranges of the sensitivity parameter should not be allowed; see the application in the illustrative example in Section 7.
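For A4-MR with an upper bound B, the interval can be sketched in a few lines. This is our own algebra, not a transcription of Propositions 10 and 11. It assumes that the MR is the complier control mean divided by the noncomplier control mean, that the outcome is non-negative, and that 0 < π(X) < 1 and 0 < μ0(X) < B, where μ0(X) is the control-arm conditional mean. Writing the mixture μ0 = π·μ_c + (1 − π)·μ_n with μ_c = MR·μ_n gives μ_n = μ0 / (π·MR + 1 − π), and each bound then becomes a linear inequality in MR. Function names are hypothetical.

```python
import math


def mr_stratum_means(mu0, pi_c, mr):
    """Complier and noncomplier control means under MR = mu_c / mu_n (assumed direction)."""
    mu_n = mu0 / (pi_c * mr + 1.0 - pi_c)
    return mr * mu_n, mu_n


def mr_legal_interval(mu0, pi_c, bound):
    """Hypothetical helper: MR values keeping both stratum means at or below `bound`.

    Returns (lower, upper). A lower end of 0.0 means no restriction beyond MR > 0;
    an upper end of math.inf means no finite upper limit.
    """
    # mu_n <= bound  <=>  mr >= 1 - (1 - mu0 / bound) / pi_c
    lower = max(0.0, 1.0 - (1.0 - mu0 / bound) / pi_c)
    # mu_c <= bound  <=>  mr * (mu0 - bound * pi_c) <= bound * (1 - pi_c)
    slope = mu0 - bound * pi_c
    upper = bound * (1.0 - pi_c) / slope if slope > 0 else math.inf
    return lower, upper
```

For example, with μ0 = 3000, π = 0.5 and B = 5000, the legal interval is [0.2, 5]. At MR = 5 the implied complier mean equals B exactly, and at MR = 0.2 the implied noncomplier mean does.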
Note that while this helps guard against mathematically implausible values, it does not replace careful consideration based on substantive knowledge, which is important for deciding which range is practically plausible and relevant to the specific application.
6.2 |. Confidence interval estimation
The application in this paper estimates nuisance functions (e.g., propensity score, principal score and outcome mean) parametrically, for simplicity. All the estimators in sections 3, 4 and 5 are M-estimators. With parametric nuisance estimation, they are asymptotically normal and analytic standard errors can be derived using M-estimation calculus31, and the bootstrap is also valid. In our illustration below, we bootstrap and construct BCa confidence intervals32.
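As an illustration of the interval construction only, here is a minimal pure-Python sketch of a BCa interval. The bias correction z0 comes from the share of bootstrap replicates below the point estimate, and the acceleration comes from the jackknife. In the paper's setting, the statistic would re-run the whole estimator on each resample, including refitting every nuisance model. Here `stat` is any function of a list of units. Resampling uses a simple nonparametric bootstrap of units, and the function name and defaults are hypothetical.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat, n_boot=2000, level=0.95, seed=1):
    """BCa bootstrap interval for stat(data); returns (estimate, (lower, upper)).

    data: list of units, resampled with replacement;
    stat: function mapping a list of units to a float (hypothetical interface).
    """
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    nd = NormalDist()
    # Bias-correction constant z0 from the share of replicates below the estimate.
    prop = sum(b < theta_hat for b in boots) / n_boot
    prop = min(max(prop, 1.0 / n_boot), 1.0 - 1.0 / n_boot)  # keep inv_cdf finite
    z0 = nd.inv_cdf(prop)
    # Acceleration constant a from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = mean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0
    ends = []
    for alpha in ((1.0 - level) / 2.0, (1.0 + level) / 2.0):
        z = nd.inv_cdf(alpha)
        # Adjusted percentile of the bootstrap distribution.
        adj = nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))
        idx = min(n_boot - 1, max(0, round(adj * (n_boot - 1))))
        ends.append(boots[idx])
    return theta_hat, (ends[0], ends[1])
```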
6.3 |. Rate conditions for nonparametric estimation
With a view to informing nonparametric inference (not the focus of this paper), we derive rate conditions on nonparametric nuisance estimation for IF-based estimators (using sample splitting or cross fitting) to be √n-consistent and asymptotically normal. See Propositions 12 and 13 in Web Appendix G for these results under PI and under the sensitivity assumptions, respectively. To our knowledge, our results are the first on rate conditions for sensitivity analyses for PI violation. They show that while PI-based analysis only requires typical rate conditions on several error products of nuisance functions (e.g., ), the sensitivity analyses require rate conditions on single nuisance functions (due to the presence of squared errors in the remainder bias term). Specifically, we require with all the sensitivity analyses, and additionally with the GOR- and SMDe-based sensitivity analyses, and with the SMDe-based sensitivity analysis. These results immediately connect to the earlier results on the robustness under PI, and (partial) loss of robustness under sensitivity assumptions, of IF-based estimation.
6.4 |. Finite-sample bias
There is no ideal placement for this topic: it is easier to follow after the illustrative analysis in the next section, but we place it here because it is a minor side topic.
Many consistent estimators are biased in finite samples. Methods to reduce such bias33,34 are not often used, perhaps because the bias tends to be small and the correction is complicated. The data example, however, reveals an interesting pattern of bias specific to sensitivity analysis that is worth noting. It is seen with the different outcomes and different estimators. An instance of this pattern is shown in Figure 3; all instances are shown in Web Appendix H.
FIGURE 3.

Point estimate and iterated bootstrap mean estimates. Plots are shown for the outcome work for pay.
In Figure 3, the solid black curve is the point estimate, the dashed red curve is the mean of the bootstrap estimates, and the dashed orange curve is the mean of the estimates from the double bootstrap (bootstrap of bootstrap samples). The pattern shared by all sensitivity analyses is that the bootstrap-mean curve is less steep than the point-estimate curve, and the double-bootstrap-mean curve is even less steep. (The steepness of a curve indicates the degree to which sensitivity analysis estimates depart from the main analysis estimate.) For the outcomes work and depressive symptoms, where the differences between the point estimate and the bootstrap means are minimal in the main analysis, this means that in the sensitivity analysis the bootstrap mean tends to be less extreme than the point estimate, and the double-bootstrap mean tends to be even less extreme. This becomes more pronounced the farther the sensitivity parameter is from its null value.
Finite-sample bias deserves dedicated investigation, which is outside the scope of this paper. This specific pattern, however, raises the question of why it arises. Our intuition is that it may be due to the fact that is a weighted average of where the weights are , and under sensitivity assumptions the quantity being averaged depends on the weight . Specifically, with a fixed , is (i) monotone decreasing in for or , and (ii) monotone increasing in for or (see Proposition 14 in Web Appendix H). This results in a coupling of (a) any deviation (of the finite sample from the population) in the weight with (b) a deviation in the quantity being averaged – in the opposite direction for case (i) and the same direction for case (ii). The resulting finite-sample bias is an attenuation of the difference between the sensitivity analysis and main analysis estimates.
For the data example we use a bootstrap-based bias correction after conducting a focused simulation study (see Web Appendix H). This bias correction is also implemented in our R-package.
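For readers unfamiliar with bootstrap bias correction, the textbook single-bootstrap version estimates the bias of θ̂ as the mean of the bootstrap replicates minus θ̂, then subtracts that estimate, which gives 2θ̂ minus the bootstrap mean. The sketch below implements only that standard form. The correction actually used in the paper was chosen after the simulation study in Web Appendix H and may differ, for example by also using the double bootstrap shown in Figure 3. The function name is hypothetical.

```python
import random
from statistics import mean


def bootstrap_bias_corrected(data, stat, n_boot=1000, seed=1):
    """Standard bootstrap bias correction: returns (2 * theta_hat - mean(theta_star), bias).

    data: list of units resampled with replacement; stat: list -> float.
    """
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boot_mean = mean(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    bias = boot_mean - theta_hat  # bootstrap estimate of the bias of theta_hat
    return theta_hat - bias, bias
```

With a plug-in variance (divisor n), which is biased downward, the estimated bias is negative, and the corrected value moves toward the usual unbiased variance.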
7 |. JOBS II ILLUSTRATION
De-identified JOBS II data were accessed from the Inter-University Consortium for Political and Social Research data archive (www.icpsr.umich.edu). Our analysis focuses on the set of participants who were identified at initial screening as being at high risk for developing depression5. For illustrative purposes, we further subset to participants with complete data (n=465) and treat the resulting dataset as if it were an observational study. (Due to this restriction of the sample, analysis results should be seen as merely illustrative and not taken as substantive findings.) We consider three outcomes: working for pay (binary), monthly earnings (non-negative), and depressive symptoms (a score ranging from 1 to 5) at six months post-treatment. The study has a rich set of baseline covariates including demographics, household characteristics, employment history, motivation, and depressive symptoms. Given these covariates, we assume treatment assignment ignorability. We also assume PI in the main analysis.
Table 1 summarizes the covariate distribution (i) in the full analysis sample; (ii) stratified by compliance type in the treatment group (to give a sense of associations); and (iii) stratified by the binary work-for-pay outcome in the control group (to give a sense of associations). Compared to noncompliers, compliers were more likely to be male, White, older and have a college degree. They were more likely to have ever married and have fewer cohabiting children, and less likely to have low household income. They were more likely to have had a professional job as their last steady job, to have been unemployed for a shorter time, and to report slightly higher job-seeking and program-participation motivation. In the control condition, participants who were younger, White, higher educated, unemployed for a shorter period, a manager at their last steady job, or reported higher motivation were more likely to be employed at six months.
TABLE 1.
Baseline covariates in (1) full analysis sample; (2) propensity-score-weighted treatment group, stratified by compliance type; and (3) propensity-score-weighted control group, stratified by outcome work for pay
| Covariate | Full analysis sample (n=465): mean or % | (SD) or (count) | Treatment group, compliers (n=172; n.wt=256.6): mean or % | (SD) or (count) | Treatment group, noncompliers (n=139; n.wt=208.0): mean or % | (SD) or (count) | Control group, work (n=96; n.wt=303.3): mean or % | (SD) or (count) | Control group, not work (n=58; n.wt=152.1): mean or % | (SD) or (count) |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | 36.5 | (9.9) | 39.0 | (9.7) | 33.5 | (9.8) | 35.2 | (9.4) | 38.6 | (11.3) |
| Sex (female) | 57.6% | (268) | 53.4% | (137) | 62.9% | (130.9) | 59.4% | (180.3) | 56.7% | (86.3) |
| Race (white) | 81.7% | (380) | 85.1% | (218.5) | 78.6% | (163.5) | 87.1% | (264.3) | 73.7% | (112.1) |
| Education | ||||||||||
| less than high school | 10.5% | (49) | 7.3% | (18.8) | 16.5% | (34.3) | 8.5% | (25.9) | 16.3% | (24.8) |
| high school | 29.7% | (138) | 26.3% | (67.5) | 31.7% | (66.0) | 25.6% | (77.5) | 35.6% | (54.2) |
| some college | 38.9% | (181) | 37.2% | (95.5) | 41.7% | (86.7) | 42.6% | (129.0) | 34.7% | (52.8) |
| Bachelor’s degree | 13.1% | (61) | 19.4% | (49.9) | 5.7% | (11.8) | 14.6% | (44.3) | 9.5% | (11.4) |
| graduate studies | 7.7% | (36) | 9.8% | (25.0) | 4.4% | (9.3) | 8.7% | (26.4) | 3.9% | (5.9) |
| Marital status | ||||||||||
| never married | 34.4% | (160) | 31.8% | (81.6) | 38.0% | (79.0) | 35.2% | (106.8) | 35.5% | (54.0) |
| married | 38.7% | (180) | 37.3% | (95.7) | 38.5% | (80.2) | 35.4% | (107.3) | 39.8% | (60.6) |
| divorced/separated/widowed | 26.9% | (125) | 30.9% | (79.2) | 23.5% | (48.8) | 29.4% | (89.2) | 24.6% | (37.5) |
| Kids in household | 0.93 | (1.13) | 0.85 | (1.12) | 0.95 | (1.17) | 0.80 | (1.05) | 0.98 | (1.03) |
| Household income | ||||||||||
| under 15K | 22.8% | (106) | 19.3% | (49.4) | 26.7% | (55.6) | 19.0% | (57.5) | 31.3% | (47.6) |
| 15K to under 25K | 24.9% | (116) | 22.1% | (56.6) | 29.6% | (61.6) | 34.3% | (104.1) | 15.1% | (23.0) |
| 25K to under 40K | 25.8% | (120) | 28.6% | (73.3) | 23.3% | (48.6) | 25.6% | (77.7) | 24.0% | (36.5) |
| 40K to under 50K | 10.8% | (50) | 12.6% | (32.4) | 7.7% | (16.0) | 6.7% | (20.2) | 15.3% | (23.2) |
| 50K or more | 15.7% | (73) | 17.5% | (44.9) | 12.6% | (26.2) | 14.4% | (43.8) | 14.3% | (21.8) |
| Economic hardship | 3.62 | (0.92) | 3.52 | (0.92) | 3.78 | (0.92) | 3.73 | (0.91) | 3.52 | (1.00) |
| Occupation (last steady job) | ||||||||||
| professional | 18.5% | (86) | 26.7% | (68.5) | 9.1% | (18.9) | 17.1% | (51.8) | 18.8% | (28.6) |
| managerial | 17.2% | (80) | 14.7% | (37.6) | 19.5% | (40.6) | 18.4% | (55.9) | 10.8% | (16.4) |
| clerical | 23.4% | (109) | 23.9% | (61.3) | 23.2% | (48.2) | 22.9% | (69.6) | 26.3% | (40.1) |
| sales | 6.5% | (30) | 5.2% | (13.3) | 7.7% | (16.0) | 7.9% | (23.8) | 3.0% | (4.6) |
| crafts/foremen | 12.9% | (60) | 13.6% | (34.8) | 12.2% | (25.4) | 10.9% | (33.0) | 18.8% | (28.6) |
| operative | 9.5% | (44) | 5.2% | (13.3) | 14.6% | (30.4) | 9.5% | (28.9) | 7.3% | (11.1) |
| labor/service | 12.0% | (56) | 10.8% | (27.7) | 13.7% | (28.5) | 13.3% | (40.3) | 15.0% | (22.7) |
| Weeks unemployed | 9.3 | (11.0) | 8.1 | (10.3) | 10.4 | (11.1) | 8.0 | (9.4) | 10.5 | (12.5) |
| Motivation to participate | 5.34 | (0.80) | 5.50 | (0.79) | 5.19 | (0.78) | 5.41 | (0.73) | 5.37 | (0.83) |
| Job-seeking motivation | 82 | (17) | 84 | (15) | 81 | (19) | 85 | (16) | 76 | (17) |
| Job-seeking self-efficacy | 3.59 | (0.83) | 3.48 | (0.84) | 3.70 | (0.82) | 3.66 | (0.76) | 3.44 | (0.84) |
| Assertiveness | 2.99 | (0.82) | 2.90 | (0.82) | 3.07 | (0.82) | 2.97 | (0.81) | 2.94 | (0.79) |
| Depressive symptoms | 2.34 | (0.68) | 2.34 | (0.69) | 2.36 | (0.68) | 2.42 | (0.70) | 2.25 | (0.60) |
n.wt = weighted subsample size. Ranges of continuous/interval variables: age 17 to 77; kids in households 0 to 5 (one observation >5 truncated to 5), economic hardship 1 to 5; weeks unemployed 1 to 52 (12 observations >52 truncated to 52); motivation to participate 1 to 6.5; job-seeking motivation 0 to 100; job-seeking self-efficacy 1 to 5; assertiveness 1 to 5; depressive symptoms 1 to 5.
We aim to illustrate the use of the sensitivity assumptions introduced above with the different outcomes, and show how sensitivity analysis effect estimates depart from PI-based estimates. For this purpose, any type A or type B estimator suffices. We suppose that a researcher has chosen to use the Hájek-type IF-based estimator (15) for the PI-based analysis. We will briefly describe an implementation of this estimator, and then will focus on the sensitivity analyses.
We report bias-corrected point estimates (see Section 6.4) and BCa confidence intervals.
7.1 |. PI-based main analysis
The estimator requires estimating several nuisance functions. We make relatively simple choices, keeping in mind what applied researchers may use in practice. We use logistic regression to fit the propensity score and principal score models. These models include all baseline covariates, plus squares and square roots of continuous covariates; these additional terms are meant to improve the covariate balance obtained from principal score and inverse propensity score weighting. We check balance as suggested by Ding and Lu13. Figure 17 (in Appendix I) shows that covariate balance is improved (i) between the treated and control groups after propensity score weighting, and (ii) between treated (non)compliers and controls after principal score weighting combined with propensity score weighting.
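One common way to summarize balance checks like these is the standardized mean difference of each covariate between two weighted groups, with the pooled standard deviation √((s₁² + s₀²)/2) in the denominator. The sketch below is a generic diagnostic and not necessarily the exact check of reference 13 or the one behind Figure 17. The user supplies the weights (for example, inverse propensity score weights, possibly multiplied by principal scores), and the function names are hypothetical.

```python
import math


def weighted_mean_var(x, w):
    """Weighted mean and weighted (population-style) variance of one covariate."""
    total = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / total
    v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / total
    return m, v


def weighted_smd(x1, w1, x0, w0):
    """Standardized mean difference between two weighted groups (pooled SD)."""
    m1, v1 = weighted_mean_var(x1, w1)
    m0, v0 = weighted_mean_var(x0, w0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)
```

In use, one would compare, for each covariate, the SMD with all weights equal to 1 against the SMD after weighting.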
Next, we estimate the conditional outcome mean functions for treated compliers, treated noncompliers and controls. With the binary outcome work for pay, we use logistic regression. For the outcome earnings, the means are estimated conditional on working using gamma regression with log link, and then multiplied by the probability of working predicted by the work for pay model. (Small detail: since we use a noncanonical link with the gamma model, the predictions are slightly mean-biased; we calibrate them by a multiplicative constant to remove this bias.) For the depressive symptoms outcome, we use a simple linear transformation to the [0, 1] interval, fit a quasi-logistic model to the transformed outcome to estimate the conditional means, and then transform the means back to the original scale. These models include all baseline covariates.
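The earnings model combines a probability of working with a conditional mean among workers. The sketch below assumes that the multiplicative calibration constant is chosen so that the (optionally weighted) average of the gamma-model predictions equals the (optionally weighted) average observed earnings among workers in the fitting sample. This is our reading of the calibration step, and the function names are hypothetical. With a log link, the gamma score equations do not force the fitted values to average to the observed mean, which is why such a constant is needed.

```python
def calibrate_positive_part(y_pos, pred_pos, w=None):
    """Multiplicative constant making weighted predictions match weighted outcomes.

    y_pos, pred_pos: observed earnings and gamma-model predictions for units who
    worked; w: optional weights (e.g., those used when fitting the model).
    """
    if w is None:
        w = [1.0] * len(y_pos)
    num = sum(wi * yi for wi, yi in zip(w, y_pos))
    den = sum(wi * mi for wi, mi in zip(w, pred_pos))
    return num / den


def two_part_mean(p_work, pred_pos, c):
    """E[earnings | X] = P(work | X) * c * E_hat[earnings | X, work], per unit."""
    return [p * c * m for p, m in zip(p_work, pred_pos)]
```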
We use targeted nuisance estimation (see Remark 2). The , and models are fit to data (treated group, treated compliers and treated noncompliers, respectively) weighted by . The model is fit twice, to the control group weighted by and weighted by , for CACE and NACE estimation, respectively.
Results (see Table 2) suggest that assignment to the intervention resulted in increased employment and earnings and decreased depressive symptoms for compliers. For noncompliers, effect estimates are close to null.
TABLE 2.
PI-based analysis results: point estimates (and 95% BCa confidence intervals)
| outcome | compliers: mean, intervention | compliers: mean, control | CACE | noncompliers: mean, intervention | noncompliers: mean, control | NACE |
|---|---|---|---|---|---|---|
| work | 75.4% (69.4, 81.4) | 61.1% (53.1, 68.4) | 14.3 percentage points (4.7, 23.1) | 68.5% (60.7, 75.2) | 64.2% (55.7, 72.2) | 4.3 percentage points (−6.2, 14.2) |
| earnings | $1,279 (1,107, 1,452) | $1,014 (802, 1,221) | $266 (18, 530) | $928 (776, 1,115) | $835 (666, 972) | $92 (−90, 318) |
| depressive symptoms | 1.90 (1.80, 1.99) | 2.07 (1.96, 2.20) | −0.18 (−0.32, −0.04) | 2.05 (1.94, 2.16) | 2.02 (1.88, 2.14) | 0.02 (−0.12, 0.18) |
Variable work is binary. Actual earnings range is $0–5,667. Scale range of depressive symptoms is 1 to 5.
7.2 |. Sensitivity analysis
We now demonstrate sensitivity analyses that are OR-based for work for pay, MR-based for earnings, and GOR- and SMDe-based for depressive symptoms.
7.2.1 |. OR-based sensitivity analysis: work for pay
We noted above that some baseline factors such as socio-economic advantage and motivation are positively associated both with being a complier and with the work for pay outcome under control. One might be concerned whether, within subpopulations homogeneous in the observed covariates, there are other, unobserved advantage-type factors that relate to compliance type and to the outcome under control in a similar way; in that case the PI-based analysis might have overestimated the CACE and underestimated the NACE. On the other hand, one might be concerned that among people with the same covariate values, some may not have needed to participate in the training because they had good prospects of finding a job; in that case the PI-based analysis might have been biased in the opposite direction. We thus consider a range of sensitivity OR values spanning both sides of 1. The results of this sensitivity analysis (Figure 4, top left) suggest that, even if (within levels of the covariates) compliers had double the odds (relative to noncompliers) of getting work without the intervention, the intervention’s effect on having work for compliers would still be positive.
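Under A4-OR as it is used here (the odds of working under control for compliers divided by those for noncompliers, within levels of the covariates), the two stratum-specific probabilities at a covariate value can be recovered from the control-arm probability μ0(X) and the principal score π(X) by solving the mixture equation. The minimal sketch below uses bisection because the mixture is increasing in the noncomplier probability. A closed-form quadratic solution also exists; bisection keeps the sketch short and robust. The function name is hypothetical.

```python
def or_stratum_probs(mu0, pi_c, odds_ratio, tol=1e-12):
    """Hypothetical helper: (P(Y0=1 | X, complier), P(Y0=1 | X, noncomplier)).

    Solves pi_c * p_c + (1 - pi_c) * p_n = mu0 subject to
    odds(p_c) / odds(p_n) = odds_ratio, assuming 0 < mu0 < 1 and 0 < pi_c < 1.
    The left side increases in p_n, so bisection on p_n in (0, 1) finds the root.
    """
    def p_c_from(p_n):
        # Complier probability whose odds are odds_ratio times the odds of p_n.
        return odds_ratio * p_n / (1.0 - p_n + odds_ratio * p_n)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pi_c * p_c_from(mid) + (1.0 - pi_c) * mid < mu0:
            lo = mid
        else:
            hi = mid
    p_n = 0.5 * (lo + hi)
    return p_c_from(p_n), p_n
```

For instance, with μ0 = 0.6, π = 0.5 and OR = 2, the complier probability exceeds the noncomplier probability, while their π-weighted average stays at 0.6.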
FIGURE 4.

Sensitivity analysis results: point estimates and 95% point-wise CIs for CACE, NACE and stratum-specific potential outcome means, for the range of the sensitivity parameter.
7.2.2 |. GOR-based sensitivity analysis: depressive symptoms
A concern may be that even among people with the same baseline covariate values (including baseline depressive symptoms score), compliers may be those who were more robust in some way (e.g., better at getting out of bed in the morning), and therefore may have better outcome (i.e., lower depressive symptoms at six months) under control than noncompliers. Therefore we consider sensitivity GOR values smaller than 1 (Figure 4, bottom left). The CACE estimate is quite sensitive to PI violation. It is negative (indicating a reduction in depressive symptoms) under PI, but as the sensitivity GOR deviates only slightly from 1, it quickly approaches zero.
7.2.3 |. MR-based sensitivity analysis: earnings
With this outcome, we use the MR sensitivity parameter. To illustrate the method as it would typically be used, we treat earnings as a stand-alone outcome, using as the only input, putting aside its connection with the work for pay outcome.
We start with a tentative MR range from 1/3 to 3, which is covered in Figure 4 (top right). As mentioned earlier, it is challenging to choose what range to consider for the sensitivity parameter. Most important to this decision is substantive knowledge, including opinions of experts and study staff (who might know participants better than what is captured in the covariates). Such knowledge should be used, whenever it is available, to help rule in which range of the sensitivity parameter is practical and relevant.
As discussed in Section 6.1, the data can help rule out some implausible ranges. Here we simply use the maximum reported earnings under control ($5,667) as the upper bound B for . After computing covariate-specific “legal” intervals for the sensitivity parameter, we use their end points to make the plot on the left in Figure 5, which shows the proportion of the sample with either or exceeding B under each MR value. We do not restrict the MR range based on this plot, as it suggests limited bound contradiction. (Alternatives include (i) restricting the MR range, or (ii) modifying the assumption to let the MR be for values where is in the legal interval, and otherwise be the legal value closest to .)
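The diagnostic in the left plot of Figure 5 can be sketched as follows. For each MR value on a grid, the code maps every unit's estimated control-arm mean μ0(X) and principal score π(X) to the implied stratum means and counts how many exceed B. As before, the MR direction (complier over noncomplier), the unit-level inputs and the function name are our assumptions. The text computes the same proportion from the end points of the legal intervals, which gives the same answer because a unit violates the bound exactly when the MR value falls outside its legal interval.

```python
def mr_bound_violation(mu0, pi_c, bound, mr_grid):
    """Hypothetical diagnostic: share of units whose implied complier or
    noncomplier control mean exceeds `bound`, for each MR value in `mr_grid`.

    mu0, pi_c: per-unit estimates of the control-arm conditional mean and the
    principal score. MR is taken as complier mean / noncomplier mean.
    """
    n = len(mu0)
    out = []
    for mr in mr_grid:
        bad = 0
        for m, p in zip(mu0, pi_c):
            mu_n = m / (p * mr + 1.0 - p)  # from m = p * mu_c + (1 - p) * mu_n
            mu_c = mr * mu_n
            if mu_c > bound or mu_n > bound:
                bad += 1
        out.append(bad / n)
    return out
```

For example, `mr_bound_violation(mu0_hat, pi_hat, 5667.0, [1/3, 1/2, 1, 2, 3])` would return one proportion per MR value, where `mu0_hat` and `pi_hat` stand for the fitted values.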
FIGURE 5.

Bounds violation diagnostic: proportion contradicting bounds as a function of the sensitivity parameter
Another way to rely on the data is to examine what the MR values imply about the distributions of the stratum-specific conditional means under control (among compliers and among noncompliers, respectively). Figure 18 (in Appendix I) plots these implied distributions for several MR values on [1/3, 3]. To judge whether such distributions are plausible, again, one should rely on substantive knowledge if possible. Also, very large implied mean values (especially those substantially larger than the maximum reported earnings) are suspect. Based on this, one might consider excluding MR values at the low end (1/3) and at the high end (≥ 2).
Another possibility is to supplement the A4-MR with other assumptions based on substantive knowledge. Suppose, for example, that substantive experts think it is unlikely that being assigned to the intervention is harmful to noncompliers (a relaxation of the ER assumption). Based on the results plot in Figure 4, this would narrow attention to the MR range above 1/2.
7.2.4 |. SMDe-based sensitivity analysis: depressive symptoms
Suppose that for the depressive symptoms outcome, an investigator prefers a sensitivity analysis based on A4-SMDe, being more comfortable communicating about mean differences. Here also, we consider a sensitivity SMD range to the left of the null value, where within levels, complier and noncomplier outcome means under control may differ by up to one standard deviation. Results (Figure 4, bottom right) look similar to those from the GOR-based sensitivity analysis, although using a different sensitivity parameter.
We note two details. First, this sensitivity analysis requires estimating the conditional variance . Using the quasi-likelihood approach, we assume that is proportional to . This is equivalent to assuming that the outcome, after being shifted and rescaled to the [0, 1] interval, follows a quasibinomial model conditional on covariates. Recall that in the PI-based analysis, we transformed this outcome to the [0, 1] interval and fit a model with logit link. We now manually extract the dispersion parameter from this model and use it to compute the variance estimate . Second, the plot on the right of Figure 5 shows that for the SMD range considered there is minimal contradiction with the bounds, which here are simply set to the minimum and maximum depressive symptom scores. This is expected, as we consider a modest SMD range.
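The variance step described above can be sketched as follows. For a quasibinomial fit on the rescaled outcome, the dispersion is commonly estimated by the Pearson statistic divided by the residual degrees of freedom; this is what R's `summary.glm` reports for quasi families. The variance on the original 1 to 5 scale is then φ·(μ − 1)(5 − μ). The sketch ignores the fitting weights for simplicity. The function names and the default bounds of 1 and 5 (taken from the scale range) are our assumptions.

```python
def pearson_dispersion(y01, m01, n_params):
    """Pearson estimate of the quasibinomial dispersion on the [0, 1] scale.

    y01: rescaled outcomes; m01: fitted means in (0, 1); n_params: number of
    model coefficients. Unweighted, for simplicity.
    """
    chi2 = sum((y - m) ** 2 / (m * (1.0 - m)) for y, m in zip(y01, m01))
    return chi2 / (len(y01) - n_params)


def variance_original_scale(mean_orig, phi, low=1.0, high=5.0):
    """Conditional variance on the original scale: phi * (mean - low) * (high - mean).

    Equal to phi * (high - low)**2 * m * (1 - m) with m = (mean - low) / (high - low).
    """
    m = (mean_orig - low) / (high - low)
    return phi * (high - low) ** 2 * m * (1.0 - m)
```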
We do not conduct an SMDe-based sensitivity analysis for the outcome earnings, because the equal variance part of A4-SMDe is likely grossly incorrect for that outcome.
8 |. DISCUSSION
This paper substantially expands options for sensitivity analysis for PI violation in the estimation of complier and noncomplier average causal effects in two ways. First, we consider several sensitivity models with different sensitivity parameters (OR, GOR, MR, SMD) suitable for different outcome types and reflecting different ways compliers and noncompliers may differ with respect to outcome under control. Second, rather than proposing one estimator under the sensitivity model, we tailor sensitivity analysis techniques to different types of estimators (outcome regression, IF-based and weighting) that may be used for the PI-based main analysis.
There are several future directions for this line of sensitivity analysis. One is to incorporate data-adaptive nuisance estimation. As noted, the robustness available for PI-based analysis via IF-based estimation is partially lost for sensitivity analysis, making it more important that we estimate nuisance functions well. We provide rate conditions, but otherwise leave this to future work. Also important is how to handle missing data. Missing-at-random cases can be handled by standard techniques, but given the difference in compliance type observability between treatment arms, one may wish to allow certain not-at-random missingness, e.g., outcome missingness that depends on compliance type35. Another extension is to adapt the methods to accommodate two-sided noncompliance and non-binary , which are also common settings.
For the two-sided noncompliance case, extension is conceptually straightforward: wherever a PI assumption is used to disentangle a mixture, it can be replaced with a sensitivity assumption. With binary treatment assignment and binary treatment received, there are four mixtures, so if PI assumptions are invoked to disentangle all four, then replacing those assumptions requires four sensitivity parameters. If one assumes away one principal stratum (say, defiers) to identify stratum prevalences and covariate distributions, then two mixtures remain, which means a PI-based analysis requires two PI assumptions and the sensitivity analysis involves two sensitivity parameters – see 21 for a sensitivity analysis using two MR parameters. While the idea is simple, work needs to be done to consider different (types of) PI-based estimators and pair them with sensitivity analysis techniques.
The methods in this paper belong to a mean-centric approach to sensitivity analysis. Each assumes a connection between the conditional outcome mean functions of compliers and noncompliers. For a binary outcome, the sensitivity analysis based on A4-OR fully respects the observed outcome distribution. For continuous outcomes, however, the sensitivity analyses based on A4-GOR, A4-MR and A4-SMDe alone may conflict with the observed outcome distribution. The MR-based model may predict out of range because it treats the outcome as unbounded. The other two methods use some additional information: the GOR-based model takes in user-specified outcome bounds and respects those bounds; the SMDe-based model is informed about conditional outcome variability and with that information offers a scale-free sensitivity parameter. To mitigate the out-of-range prediction problem that affects the MR-based method and, to a lesser degree, the SMD-based method, we propose a simple technique that requires an additional assumption of bounds on stratum-specific conditional outcome means. There remains, however, the risk of more subtle conflict (e.g., predicting mean outcome in the tail of the distribution). A different approach is to avoid conflicting with the observed data distribution27,26,28 altogether by anchoring on the conditional distribution of the outcome under control rather than just its mean plus bounds/variance. Such a sensitivity analysis (described briefly in the preprint36) will be presented in a separate manuscript.
One last comment:
This paper provides technical solutions for doing sensitivity analysis, but does not address how to choose a relevant range for the sensitivity parameter and how to elicit and use expert opinion for this purpose. This is a topic that should receive more attention.
Supplementary Material
ACKNOWLEDGMENTS
This work is partially supported by grants R03MH128634, R01MH115487 and U24OD023382 from the National Institutes of Health, and N00014-21-1-2820 from the Office of Naval Research. Its depth and clarity benefited from helpful feedback from several anonymous reviewers. TQN thanks Drs. Ilya Shpitser, Bonnie Smith and Razieh Nabi for helpful discussions about influence functions, and Drs. Constantine Frangakis and Scott Zeger for thought-provoking comments at an early presentation of this work. The authors appreciate the participants, staff and investigators of the JOBS II study, and the ICPSR data archive.
Abbreviations:
- CACE
complier average causal effect
- NACE
noncomplier average causal effect
- ER
exclusion restriction
- PI
principal ignorability
- MR
mean ratio
- OR
odds ratio
- GOR
generalized odds ratio
- SMD
standardized mean difference
- IF
influence function
Footnotes
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interest.
DATA AND CODE AVAILABILITY STATEMENT
The de-identified JOBS II data used in this paper can be requested from the Inter-University Consortium for Political and Social Research data archive at www.icpsr.umich.edu. All code to produce the results in this paper and to implement the proposed methods is provided in the R-package PIsens, available at https://github.com/trangnguyen74/PIsens.
SUPPORTING INFORMATION
The Web Appendix may be found in the online version of the article at the publisher’s website, and is also included on the following pages.
References
- 1.Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–29. doi: 10.1111/j.0006-341X.2002.00021.x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies.. Journal of Educational Psychology. 1974;66(5):688–701. doi: 10.1037/h0037350 [DOI] [Google Scholar]
3. Rubin DB. Causal inference through potential outcomes and principal stratification: Application to studies with “censoring” due to death. Statistical Science. 2006;21(3):299–309. doi: 10.1214/088342306000000114
4. Griffin BA, McCaffrey DF, Morral AR. An application of principal stratification to control for institutionalization at follow-up in studies of substance abuse treatment programs. The Annals of Applied Statistics. 2008;2(3):1034–1055. doi: 10.1214/08-AOAS179
5. Vinokur AD, Price RH, Schul Y. Impact of the JOBS intervention on unemployed workers varying in risk for depression. American Journal of Community Psychology. 1995;23(1):39–74. doi: 10.1007/BF02506922
6. Gruenewald TL, Tanner EK, Fried LP, et al. The Baltimore Experience Corps Trial: Enhancing generativity via intergenerational activity engagement in later life. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences. 2016;71(4):661–670. doi: 10.1093/geronb/gbv005
7. Daumit GL, Dickerson FB, Wang NY, et al. A behavioral weight-loss intervention in persons with serious mental illness. New England Journal of Medicine. 2013;368(17):1594–1602. doi: 10.1056/NEJMoa1214530
8. Angrist JD, Imbens GW. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association. 1995;90(430):431–442. doi: 10.1080/01621459.1995.10476535
9. Marshall J. Coarsening bias: How coarse treatment measurement upwardly biases instrumental variable estimates. Political Analysis. 2016;24(2):157–171. doi: 10.1093/pan/mpw007
10. Andresen ME, Huber M. Instrument-based estimation with binarised treatments: issues and tests for the exclusion restriction. The Econometrics Journal. 2021;24(3):536–558. doi: 10.1093/ectj/utab002
11. Feller A, Mealli F, Miratrix L. Principal score methods: Assumptions, extensions, and practical considerations. Journal of Educational and Behavioral Statistics. 2017;42(6):726–758. doi: 10.3102/1076998617719726
12. Jo B, Stuart EA. On the use of propensity scores in principal causal effect estimation. Statistics in Medicine. 2009;28(23):2857–2875. doi: 10.1002/sim.3669
13. Ding P, Lu J. Principal stratification analysis using principal scores. Journal of the Royal Statistical Society. Series B: Statistical Methodology. 2017;79(3):757–777. doi: 10.1111/rssb.12191
14. Jiang Z, Ding P. Identification of causal effects within principal strata using auxiliary variables. Statistical Science. 2021;36(4):1–49. doi: 10.1214/20-STS810
15. Stuart EA, Jo B. Assessing the sensitivity of methods for estimating principal causal effects. Statistical Methods in Medical Research. 2015;24(6):657–674. doi: 10.1177/0962280211421840
16. Jo B, Vinokur AD. Sensitivity analysis and bounding of causal effects with alternative identifying assumptions. Journal of Educational and Behavioral Statistics. 2011;36(4):415–440. doi: 10.3102/1076998610383985
17. Wang C, Zhang Y, Mealli F, Bornkamp B. Sensitivity analyses for the principal ignorability assumption using multiple imputation. Pharmaceutical Statistics. 2023;22(1):64–78. doi: 10.1002/pst.2260
18. Schwartz S, Li F, Reiter JP. Sensitivity analysis for unmeasured confounding in principal stratification settings with binary variables. Statistics in Medicine. 2012;31(10):949–962. doi: 10.1002/sim.4472
19. Mercatanti A, Li F. Do debit cards decrease cash demand?: Causal inference and sensitivity analysis using principal stratification. Journal of the Royal Statistical Society. Series C: Applied Statistics. 2017;66(4):759–776. doi: 10.1111/rssc.12193
20. Baiocchi M, Cheng J, Small DS. Instrumental variable methods for causal inference. Statistics in Medicine. 2014;33(13):2297–2340. doi: 10.1002/sim.6128
21. Jiang Z, Yang S, Ding P. Multiply robust estimation of causal effects under principal ignorability. Journal of the Royal Statistical Society. Series B: Statistical Methodology. 2022;84(4):1423–1445. doi: 10.1111/rssb.12538
22. McConnell S, Stuart EA, Devaney B. The truncation-by-death problem: What to do in an experimental evaluation when the outcome is not always defined. Evaluation Review. 2008;32(2):157–186. doi: 10.1177/0193841X07309115
23. Hájek J. Comment on “An essay on the logical foundations of survey sampling, part one” by Basu, D. In: Toronto: Holt, Rinehart, and Winston, 1971:236.
24. Wang B, Ogburn EL, Rosenblum M. Analysis of covariance in randomized trials: More precision and valid confidence intervals, without model assumptions. Biometrics. 2019;75(4):1391–1400. doi: 10.1111/biom.13062
25. Steingrimsson JA, Hanley DF, Rosenblum M. Improving precision by adjusting for prognostic baseline variables in randomized trials with binary outcomes, without regression model assumptions. Contemporary Clinical Trials. 2017;54:18–24. doi: 10.1016/j.cct.2016.12.026
26. Franks AM, D’Amour A, Feller A. Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association. 2020;115(532):1730–1746. doi: 10.1080/01621459.2019.1604369
27. Scharfstein DO, Nabi R, Kennedy EH, Huang MY, Bonvini M, Smid M. Semiparametric sensitivity analysis: Unmeasured confounding in observational studies. 2021. arXiv:2104.08300.
28. Robins JM, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: New York, NY: Springer New York, 2000:1–94.
29. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Routledge; 1988.
30. Stuart EA. Matching methods for causal inference: A review and a look forward. Statistical Science. 2010;25(1). doi: 10.1214/09-STS313
31. Stefanski LA, Boos DD. The calculus of M-estimation. The American Statistician. 2002;56(1):29–38. doi: 10.1198/000313002753631330
32. Efron B. Better bootstrap confidence intervals. Journal of the American Statistical Association. 1987;82(397):171–185. doi: 10.1080/01621459.1987.10478410
33. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. CRC Press; 1994.
34. Chang J, Hall P. Double-bootstrap methods that use a single double-bootstrap simulation. Biometrika. 2015;102(1):203–214. doi: 10.1093/biomet/asu060
35. Frangakis C, Rubin DB. Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika. 1999;86(2):365–379. doi: 10.1093/biomet/86.2.365
36. Nguyen TQ, Stuart EA, Scharfstein DO, Ogburn EL. Sensitivity analysis for principal ignorability violation in estimating complier and noncomplier average causal effects. 2023. arXiv:2303.05052v1 (preprint version 1).
