On varieties of doubly robust estimators under missingness not at random with a shadow variable

Wang Miao; Eric J Tchetgen Tchetgen

doi:10.1093/biomet/asw016

Biometrika. 2016 May 10;103(2):475–482. doi: 10.1093/biomet/asw016

PMCID: PMC4890127 PMID: 27279671

Abstract

Suppose we are interested in the mean of an outcome variable missing not at random. Suppose however that one has available a fully observed shadow variable, which is associated with the outcome but independent of the missingness process conditional on covariates and the possibly unobserved outcome. Such a variable may be a proxy or a mismeasured version of the outcome and is available for all individuals. We have previously established necessary and sufficient conditions for identification of the full data law in such a setting, and have described semiparametric estimators including a doubly robust estimator of the outcome mean. Here, we propose two alternative estimators, which may be viewed as extensions of analogous methods under missingness at random, but enjoy different properties. We assess the correctness of the required working models via straightforward goodness-of-fit tests.

Keywords: Doubly robust estimation, Missingness not at random, Shadow variable

1. Introduction

Doubly robust methods are designed to mitigate estimation bias due to model misspecification in observational studies and imperfect experiments. Such methods have grown in popularity in recent years for estimation with missing data and other forms of coarsening (Robins et al., 1994; Scharfstein et al., 1999; Van der Laan & Robins, 2003; Bang & Robins, 2005; Tsiatis, 2006). There exist various constructions of doubly robust estimators for the mean of an outcome that is missing at random; see Kang & Schafer (2007). In contrast, for data missing not at random, the difficulty of identification undermines one's ability to obtain accurate inferences, and doubly robust estimation is far more challenging. Identification of a full data model means that the parameters indexing the model are uniquely determined by the data that are actually observed. Statistical inference based on non-identifiable models may be misleading and of limited interest in practice; see Miao et al. (2015). Under missingness at random, the full data law, i.e., the joint distribution of all variables of interest, is nonparametrically identified from the observed data. However, under missingness not at random, identification is not possible without further restrictions on the missingness process. Although no general identification results are available for data missing not at random, one may identify the full data law under specific assumptions. Building on earlier work by D'Haultfoeuille (2010), Wang et al. (2014) and Zhao & Shao (2015), we have previously used a fully observed shadow variable to establish a general identification framework for data missing not at random, in a 2015 technical report available from the authors. 
Such a variable is associated with the outcome conditional on covariates, but independent of the missingness conditional on covariates and the outcome (Kott, 2014); it may be available in many empirical studies, where a fully observed proxy or a mismeasured version of the outcome is available. For example, in a study of the mental health of children in Connecticut (Zahner et al., 1992; Ibrahim et al., 2001), researchers were interested in evaluating the prevalence of students with abnormal psychopathological status based on their teacher's assessment, which was subject to missingness. A separate parent report available for all children in the study is a proxy for the teacher's assessment, but is unlikely to be related to the teacher's response rate conditional on covariates and her assessment of the student; in this case the parental assessment constitutes a valid shadow variable. Other examples can be found in Wang et al. (2014).

Throughout, let $Y$ denote the outcome and $R$ its missingness indicator, with $R = 1$ if $Y$ is observed and $R = 0$ otherwise, and let $X$ denote fully observed covariates. Suppose that one has also fully observed a shadow variable $Z$ that satisfies

Assumption 1. —

(i) $Z \perp\!\!\!\perp R \mid (Y, X)$; (ii) $Z \not\perp\!\!\!\perp Y \mid (R = 1, X)$.

Assumption 1 formalizes the idea that the shadow variable only affects the missingness through its association with the outcome. We provide a directed acyclic graph in the Supplementary Material that can help understand the assumption. The shadow variable introduces additional conditional independence conditions, which impose further restrictions on the missingness process, and thus provides better opportunity for identification under missingness not at random. In the 2015 technical report, we presented a brief review of such problems, and gave necessary and sufficient conditions as well as sufficient conditions for identification with a shadow variable. In particular, if the outcome is binary, the full data law is identifiable with a binary shadow variable. But for a continuous outcome, a binary shadow variable does not impose enough restrictions to identify the full data law; see the Supplementary Material for a counterexample. Identification for a continuous outcome requires at least one continuous shadow variable, but even then, additional conditions are needed. We consider a location-scale model for the density function:

(1)

with unrestricted functions Inline graphic and , and density functions . Under certain regularity conditions summarized in the Appendix, we have previously proved identification of the full data law if either or follows model (1), even if the missingness process is unrestricted. Model (1) includes many commonly-used models, for instance, Gaussian models, and thus demonstrates that lack of identification is not an issue in many familiar situations. Assumption 1 plays a central role for identification of model (1). If Assumption 1 is violated, i.e., the shadow variable is not available, model (1) is not identified even if one assumes a parametric missingness model. For additional discussion about identification under missingness not at random, see Miao et al. (2015) and Wang et al. (2014).
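To make the binary case concrete, the following Python sketch works through a toy population with a binary outcome, a binary shadow variable and no covariates. It is an illustration under stated assumptions, not the authors' procedure: the function names `identify_binary_mean` and `toy_observed_law` and all probabilities are hypothetical. It uses the fact that, when the shadow variable is independent of the missingness given the outcome, pr(z, y, r = 0) = pr(z, y, r = 1) × o(y), with o(y) = pr(r = 0 | y)/pr(r = 1 | y); summing over y gives two linear equations in the two unknown odds.

```python
def identify_binary_mean(p_zy_r1, p_z_r0):
    """Recover pr(Y = 1) from observed-data probabilities (binary Y and Z).

    p_zy_r1[z][y] = pr(Z = z, Y = y, R = 1), observed among complete cases.
    p_z_r0[z]     = pr(Z = z, R = 0), observed among incomplete cases.
    """
    (a, b), (c, d) = p_zy_r1
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Z carries no information about Y among complete cases")
    # Solve a*o0 + b*o1 = pr(Z=0, R=0) and c*o0 + d*o1 = pr(Z=1, R=0),
    # where o_y = pr(R=0 | Y=y) / pr(R=1 | Y=y), by Cramer's rule.
    o0 = (p_z_r0[0] * d - b * p_z_r0[1]) / det
    o1 = (a * p_z_r0[1] - c * p_z_r0[0]) / det
    # pr(Y=1) = pr(Y=1, R=1) + pr(Y=1, R=0) = pr(Y=1, R=1) * (1 + o1).
    p_y1 = (b + d) * (1.0 + o1)
    return o0, o1, p_y1


def toy_observed_law(p_y1=0.4, p_z1_given_y=(0.2, 0.7), p_r1_given_y=(0.8, 0.5)):
    """Observed-data probabilities implied by a hypothetical full-data law."""
    p_zy_r1 = [[0.0, 0.0], [0.0, 0.0]]
    p_z_r0 = [0.0, 0.0]
    for y in (0, 1):
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        for z in (0, 1):
            p_z = p_z1_given_y[y] if z == 1 else 1.0 - p_z1_given_y[y]
            p_zy_r1[z][y] = p_y * p_z * p_r1_given_y[y]
            p_z_r0[z] += p_y * p_z * (1.0 - p_r1_given_y[y])
    return p_zy_r1, p_z_r0


p_zy_r1, p_z_r0 = toy_observed_law()
o0, o1, p_y1 = identify_binary_mean(p_zy_r1, p_z_r0)
# Naive mean among complete cases, pr(Y = 1 | R = 1), for comparison.
complete_case_mean = (p_zy_r1[0][1] + p_zy_r1[1][1]) / sum(map(sum, p_zy_r1))
print(p_y1, complete_case_mean)
```

In this toy law the outcome mean is 0.4, whereas the complete-case mean is roughly 0.29. The determinant check plays the role of the requirement that the shadow variable be associated with the outcome: if it fails, the two equations cannot separate the two odds.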

With models satisfying the corresponding identification conditions, previous authors have developed several non-doubly robust estimators. Among them, inverse probability weighted estimation (Wang et al., 2014) and pseudo-likelihood estimation (Zhao & Shao, 2015) are sensitive to model misspecification, and nonparametric estimation (D'Haultfoeuille, 2010) requires an unrealistically large sample for reasonable performance when the covariate dimension is moderate to large. In contrast, a doubly robust approach remains consistent and asymptotically normal under partial misspecification. In the 2015 technical report, we developed a doubly robust estimator based on a three-part model for the full data: a model for the joint distribution of the outcome and the shadow variable in complete cases; a model for the propensity score evaluated at a reference value of the outcome; and a log odds ratio model encoding the association of the outcome and the missingness process. Under correct specification of the log odds ratio model, the doubly robust estimator is consistent if either of the other two models is correct, but not necessarily both. However, the construction of a doubly robust estimator is not unique. In this paper, we develop two alternative doubly robust estimators of the outcome mean that enjoy different properties, and we compare them both in theory and via simulations reported in the Supplementary Material.

2. Doubly robust estimators

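Under Assumption 1, we factorize the conditional density function of the shadow variable, outcome and missingness indicator, $(Z, Y, R)$, given the covariates $X$ as

$$\mathrm{pr}(z, y, r \mid x) = c(x)\,\mathrm{pr}(r \mid y = 0, x)\,\mathrm{pr}(z, y \mid r = 1, x)\exp\{(1 - r)\,\mathrm{OR}(y \mid x)\}, \tag{2}$$

where $c(x) = \mathrm{pr}(r = 1 \mid x)/\mathrm{pr}(r = 1 \mid y = 0, x)$; $\mathrm{pr}(r = 1 \mid y = 0, x)$ is the response probability evaluated at the reference level $y = 0$, and is referred to as the baseline propensity score; $\mathrm{pr}(z, y \mid r = 1, x)$ is the joint density function of $(Z, Y)$ conditional on $X$ among the complete cases, i.e., the subset with $R = 1$, and is referred to as the baseline outcome density;

$$\mathrm{OR}(y \mid x) = \log \frac{\mathrm{pr}(r = 0 \mid y, x)\,\mathrm{pr}(r = 1 \mid y = 0, x)}{\mathrm{pr}(r = 0 \mid y = 0, x)\,\mathrm{pr}(r = 1 \mid y, x)}$$

is the log of the conditional odds ratio function relating $Y$ and $R$ given $X$, with $\mathrm{OR}(y = 0 \mid x) = 0$ and $E[\exp\{\mathrm{OR}(Y \mid X)\} \mid R = 1, X] < \infty$. For a continuous outcome, we require that $\mathrm{pr}(z, y \mid r = 1, x)$ satisfies model (1) to guarantee identification. For estimation, we specify separate parametric models $\mathrm{pr}(r = 1 \mid y = 0, x; \alpha)$, $\mathrm{pr}(z, y \mid r = 1, x; \beta)$, and $\mathrm{OR}(y \mid x; \gamma)$. We suppose throughout that $\mathrm{OR}(y \mid x; \gamma)$ is correctly specified, which can be achieved by specifying a relatively flexible model, or by following the approach suggested by Higgins et al. (2008) if information on the reasons for missingness is available. From (2), we have

$$\mathrm{pr}(r = 1 \mid y, x) = \frac{\mathrm{pr}(r = 1 \mid y = 0, x)}{\mathrm{pr}(r = 1 \mid y = 0, x) + \exp\{\mathrm{OR}(y \mid x)\}\,\mathrm{pr}(r = 0 \mid y = 0, x)}, \tag{3}$$

$$\mathrm{pr}(z, y \mid r = 0, x) = \frac{\exp\{\mathrm{OR}(y \mid x)\}\,\mathrm{pr}(z, y \mid r = 1, x)}{E[\exp\{\mathrm{OR}(Y \mid X)\} \mid r = 1, x]}, \tag{4}$$

$$E(Y \mid r = 0, z, x) = \frac{E[\exp\{\mathrm{OR}(Y \mid X)\}\,Y \mid r = 1, z, x]}{E[\exp\{\mathrm{OR}(Y \mid X)\} \mid r = 1, z, x]}. \tag{5}$$

The propensity score, and its reciprocal, i.e., the inverse probability weight function $w(x, y) = 1/\mathrm{pr}(r = 1 \mid y, x)$, are determined by the baseline propensity score model and the log odds ratio model as in (3); the conditional outcome mean among the incomplete cases is determined by the baseline outcome model and the log odds ratio model as in (5).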
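As a numerical illustration of how (3) and (5) are used, the sketch below computes a propensity score from a baseline propensity score and a log odds ratio, and a conditional mean among incomplete cases by exponentially tilting a complete-case distribution. It assumes the log odds ratio is defined as the log odds of nonresponse at y relative to the reference level y = 0; the function names and the binary example are hypothetical, and covariates and the shadow variable are omitted for brevity.

```python
import math


def propensity(baseline_ps, log_or):
    """pr(r=1 | y, x) from the baseline propensity score pr(r=1 | y=0, x)
    and the log odds ratio OR(y | x) of nonresponse relative to y = 0."""
    return baseline_ps / (baseline_ps + math.exp(log_or) * (1.0 - baseline_ps))


def incomplete_case_mean(values, complete_case_probs, log_or_fn):
    """E(Y | r=0) for a discrete outcome: tilt the complete-case law by
    exp{OR(y)} and renormalise."""
    tilted = [p * math.exp(log_or_fn(y)) for y, p in zip(values, complete_case_probs)]
    return sum(t * y for t, y in zip(tilted, values)) / sum(tilted)


# Hypothetical binary example: pr(r=1 | y=0) = 0.8 and pr(r=1 | y=1) = 0.5,
# so the odds of nonresponse are 0.25 at y=0 and 1 at y=1, and OR(1) = log 4.
log_or = lambda y: math.log(4.0) * y
ps_y1 = propensity(0.8, log_or(1))
weight_y1 = 1.0 / ps_y1  # inverse probability weight for a complete case with y = 1
# Complete-case law pr(y | r=1) when pr(y=1) = 0.4: (0.48/0.68, 0.20/0.68).
mean_r0 = incomplete_case_mean([0, 1], [0.48 / 0.68, 0.20 / 0.68], log_or)
```

With a log odds ratio of zero the propensity score reduces to the baseline propensity score and the tilted mean reduces to the complete-case mean, which is the missingness at random case.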

Estimation of Inline graphic only involves complete cases. Letting denote the empirical mean, we solve

(6)

with score function Inline graphic . Estimation of and is motivated from a classical estimating equation following the fact that the respective weighted mean of any vector functions and among the complete cases equals their population mean: , where and are user-specified vector functions of dimension equal to that of Inline graphic and , respectively, and is nonsingular for all . For example, if follows a logistic model and thus , we may naturally choose and . Because is missing for , the classical estimating equation is not feasible. However, Assumption 1 allows us to replace with the shadow variable and to replace Inline graphic with . To make the estimating equation doubly robust, we incorporate the baseline outcome model into the estimating equation for . Let , we solve

(7)

with Inline graphic and such that is nonsingular for all . The shadow variable is used as a proxy of , thus, a choice of that is highly correlated with is desirable to maximize efficiency.
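The idea of replacing functions of the partially observed outcome by functions of the shadow variable can be checked on a toy population. The sketch below is a simplified, non-doubly robust version that omits the baseline outcome model, so it is not equation (7) itself; the function name `moment` and all probabilities are hypothetical. It evaluates exactly the mean of {R W(Y) − 1} h(Z), which is zero at the true inverse probability weights when the shadow variable is independent of the missingness given the outcome, and which only needs the outcome for complete cases.

```python
from itertools import product


def moment(weight_fn, h, p_y1=0.4, p_z1_given_y=(0.2, 0.7), p_r1_given_y=(0.8, 0.5)):
    """Exact E[{R * W(Y) - 1} * h(Z)] in a binary toy population in which
    Z is independent of R given Y (no covariates)."""
    total = 0.0
    for y, z, r in product((0, 1), repeat=3):
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        p_z = p_z1_given_y[y] if z == 1 else 1.0 - p_z1_given_y[y]
        p_r = p_r1_given_y[y] if r == 1 else 1.0 - p_r1_given_y[y]
        # W(Y) is evaluated for complete cases only, so Y is never needed when R = 0.
        value = weight_fn(y) - 1.0 if r == 1 else -1.0
        total += p_y * p_z * p_r * value * h(z)
    return total


true_weight = lambda y: 1.0 / (0.8, 0.5)[y]   # 1 / pr(R=1 | Y=y)
constant_weight = lambda y: 1.0 / 0.68        # 1 / pr(R=1), ignores the outcome

at_truth = moment(true_weight, lambda z: z)
constant_intercept = moment(constant_weight, lambda z: 1.0)
constant_shadow = moment(constant_weight, lambda z: z)
```

Here the outcome-free weight satisfies the intercept moment but not the moment involving the shadow variable, which is how the shadow variable carries information about the dependence of the missingness on the outcome. A shadow variable only weakly associated with the outcome would make this second moment nearly uninformative.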

Using Inline graphic obtained from equations (6) and (7), we construct three different estimators for the outcome mean that are consistent if either the baseline outcome model or the baseline propensity score model is correctly specified, together with the log odds ratio model.

A regression estimator with residual bias correction was previously described in our 2015 technical report. We use the weighted residual to correct the bias of the conditional mean among incomplete cases. Letting Inline graphic , the estimator is

A Horvitz–Thompson estimator with extended weights employs an extended baseline propensity score model and an extended weight function. The extended baseline propensity score model with unknown parameter Inline graphic satisfies only at . For example, we can specify

with user-specified scalar function Inline graphic . The extended weight function , and its reciprocal is determined as in (3) with and replaced by and respectively. We estimate by solving

(8)

with previously obtained Inline graphic and . The Horvitz–Thompson estimator with extended weights is

A regression estimator with an extended outcome model involves an extended outcome model Inline graphic with parameter satisfying only at . If for some inverse link and some function , we can specify with a scalar function . We estimate by solving

(9)

with previously obtained Inline graphic . The regression estimator with an extended outcome model is

All three estimators Inline graphic and are doubly robust.

Theorem 1. —

Under Assumption 1, if the log odds ratio model is correct, and the probability limit of equations (6), (7), (8) and (9) has a unique solution, then , and are consistent if either or is correctly specified.

The extended models not only provide double robustness, but also provide a strategy to check whether the working models are correct. We prove in the Appendix that if the baseline propensity score model is correct, Inline graphic converges to zero in probability; and if the baseline outcome model is correct, converges to zero in probability. Therefore, one may use this property to assess whether the working models are correctly specified by checking whether and are within sampling variability of zero, respectively. However, the space of possible departures from the assumed model may be prohibitively large relative to the proposed test so that the resulting goodness-of-fit test will generally have good power against certain alternatives but not in all possible directions away from the specified working model. We explore the power of the proposed goodness-of-fit test via simulation in the Supplementary Material.
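A check of whether an estimated extension parameter is within sampling variability of zero can be sketched as a two-sided Wald-type test. This is a minimal illustration, assuming that a standard error for the estimate is already available, for instance from a sandwich formula or the bootstrap; the source does not prescribe how it is obtained. The function name `wald_test` and the default level of 0.05 are hypothetical.

```python
import math


def wald_test(estimate, std_error, level=0.05):
    """Two-sided Wald test of the null hypothesis that a scalar parameter is zero.

    Returns the z statistic, the normal-approximation p-value, and whether
    the null hypothesis is rejected at the given level.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    # For a standard normal Z, pr(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < level


# Hypothetical estimates of an extension parameter with their standard errors.
z_small, p_small, reject_small = wald_test(0.02, 0.05)
z_large, p_large, reject_large = wald_test(0.30, 0.05)
```

A rejection suggests misspecification of the corresponding baseline model, whereas a failure to reject does not establish that the model is correct, since the test has power only in certain directions.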

All three doubly robust estimators rely on a correct log odds ratio model, since inference about the law of Inline graphic requires an accurate evaluation of the dependence between the missingness process and the outcome, which is captured by the log odds ratio model . To the best of our knowledge, previous doubly robust estimators have assumed that this log odds ratio is known, either to equal the null value of zero under missingness at random (Bang & Robins, 2005; Tsiatis, 2006; Van der Laan & Robins, 2003), or to be of a known functional form with no unknown parameters (Vansteelandt et al., 2007; Robins et al., 2008). We have relaxed these more stringent assumptions.

3. Relation to previous doubly robust estimators and comparisons

Previous doubly robust estimators under missingness at random can be viewed as special cases of our estimators. Under missingness at random, Inline graphic , , the inverse probability weight function does not vary with , and the conditional mean among the population equals that among the incomplete cases . The estimator of Kang & Schafer (2007) is a special case of the regression estimator with residual bias correction; the estimator proposed by Robins et al. (2007), with an extended logistic propensity score model Inline graphic , is a special case of the Horvitz–Thompson estimator with extended weights; the estimator proposed by Robins et al. (2007), with an extended outcome model satisfying and , is a special case of the regression estimator with an extended outcome model.
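For orientation, the missingness at random analogue of the regression estimator with residual bias correction can be sketched in its standard form, the mean of m(x) + r{y − m(x)}/π(x), where m is a working outcome regression and π a working propensity score. The code below is an illustration with hypothetical names and data, in which fitted working models are passed in as plain functions; it is not the shadow-variable estimator itself.

```python
def dr_mean(records, ps_model, outcome_model):
    """Regression estimator with residual bias correction under missingness at random.

    records: iterable of (x, r, y); y may be None when r == 0.
    ps_model(x): working propensity score pr(r=1 | x).
    outcome_model(x): working conditional mean E(Y | x).
    """
    total, n = 0.0, 0
    for x, r, y in records:
        m = outcome_model(x)
        # Inversely weighted observed residuals correct the bias of the regression fit.
        total += m + ((y - m) / ps_model(x) if r == 1 else 0.0)
        n += 1
    return total / n


# Hypothetical data, constructed so that within each x the missing outcomes have
# the same mean as the observed ones (2 for x=0, 5 for x=1): full-data mean 4.
# Response rates are 0.5 for x=0 and 0.25 for x=1.
records = (
    [(0, 1, 1.0), (0, 1, 3.0), (0, 0, None), (0, 0, None)]
    + [(1, 1, 4.0), (1, 1, 6.0)] + [(1, 0, None)] * 6
)
true_ps = lambda x: 0.5 if x == 0 else 0.25
true_mean = lambda x: 2.0 if x == 0 else 5.0
wrong_ps = lambda x: 0.9
wrong_mean = lambda x: 0.0

ps_right = dr_mean(records, true_ps, wrong_mean)        # only the propensity score is right
outcome_right = dr_mean(records, wrong_ps, true_mean)   # only the outcome model is right
both_wrong = dr_mean(records, wrong_ps, wrong_mean)     # neither model is right
```

In this constructed example the estimator returns the full-data mean whenever one of the two working models is correct and is biased when both are wrong, which is the double robustness property in its simplest setting.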

The three proposed doubly robust estimators enjoy some of the properties of their missingness at random analogs. The estimator Inline graphic is a convex combination of the observed outcome values. It satisfies the boundedness property (Robins et al., 2007) that the estimator falls in the parameter space for the outcome mean almost surely. Such estimators are preferred when the inverse probability weights are highly variable, because they rule out estimates outside the sample space. Boundedness is not guaranteed for Inline graphic . If the range of is contained in the sample space of the outcome, also satisfies the boundedness condition, but this does not hold in general. For example, if the outcome is continuous, and , the range of may be outside the sample space of the outcome mean.
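The link between convex combinations and boundedness can be seen in a generic weighted-mean example. The sketch below is not the estimator defined in this paper: it only contrasts an unnormalised inverse probability weighted mean with one whose weights are rescaled to sum to one, using hypothetical weights and outcomes. How the extended weights achieve this normalisation is specific to the estimator and is not reproduced here.

```python
def horvitz_thompson(observed_y, weights, n):
    """Unnormalised estimator: weighted sum of observed outcomes divided by n."""
    return sum(w * y for w, y in zip(weights, observed_y)) / n


def normalised_weighted_mean(observed_y, weights):
    """Convex combination: positive weights are rescaled to sum to one."""
    total = sum(weights)
    return sum(w / total * y for w, y in zip(weights, observed_y))


observed_y = [0.9, 1.0, 0.8]   # outcomes known to lie in [0, 1]
weights = [1.2, 3.0, 1.5]      # hypothetical positive inverse probability weights
n = 4                          # sample size, including one incomplete case

unbounded = horvitz_thompson(observed_y, weights, n)
bounded = normalised_weighted_mean(observed_y, weights)
```

With these numbers the unnormalised estimate exceeds one, outside the range of the outcome, whereas the normalised version necessarily lies between the smallest and largest observed outcome.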

The three proposed estimators offer certain improvements in terms of bias when both models are misspecified. The asymptotic bias of Inline graphic can be written as

and the asymptotic bias of Inline graphic has the same form with replaced by , with probability limits of the corresponding estimators. The asymptotic bias of doubly robust estimators is a product of the bias of and the bias of . When both of these models are nearly correct, doubly robust estimators will generally have smaller bias than non-doubly robust estimators based on one of the two models only; but this is unlikely to hold if both models are seriously misspecified, especially when the weights are highly variable (Robins et al., 2007; Vermeulen & Vansteelandt, 2015). However, if Inline graphic in equation (7) includes a constant function, then , which restricts the amount of variability of the inverse probability weights. Thus, does not explode with large weights.

In simulation studies, we found that the three doubly robust estimators approximate the true outcome mean if either of the baseline models is correct, but they are biased if neither baseline model is correct. For the case with moderately variable weights, the relative magnitude of the bias depends on the specific data generating process, but for the case with highly-variable weights, the Horvitz–Thompson estimator with extended weights has smaller bias. If the baseline outcome model is correct, the parameter of the extended outcome model, Inline graphic is close to zero; and if the baseline propensity score model is correct, the parameter of the extended weight model, is close to zero. We also perform formal tests of the null hypotheses and respectively under level . The results show an empirical Type I error approximating if the required baseline propensity score model or baseline outcome model is correct, respectively, i.e., the true values of Inline graphic and equal zero respectively. Such tests have good power in moderate samples if the required model is incorrect, respectively. We recommend the proposed hypothesis tests to check for severe misspecification of the baseline models in practice.

4. Discussion

Extensions of the doubly robust methods described in this work to other functionals, such as a parameter Inline graphic solving a full data estimating equation , can be achieved by replacing with wherever occurs in the estimating equations and solving the doubly robust estimating equation for the parameter of interest. The methods also have potential application in related areas, such as longitudinal data analysis and causal inference.

Acknowledgments

The work is partially supported by the China Scholarship Council and the National Institutes of Health. The authors are grateful to the referees and the editor for their helpful comments.

Supplementary material

Supplementary material available at Biometrika online includes the proof of Lemma A1, a counterexample to identification with a continuous outcome, a graph model for the shadow variable, and simulation studies.

Appendix

Proof of Theorem 1

We need the following lemma, which we prove in the Supplementary Material.

Lemma A1. —

Under Assumption 1, suppose that the log odds ratio model is correct, and that the probability limit of equations (6) and (7) has a unique solution. For any square integrable vector function , scalar function , and solving equations (6) and (7),

if is correct, then converges to zero in probability;

if is correct, then converges to zero in probability;

if either of the baseline models is correct, then converges to zero in probability.

Proof of Theorem 1. —

Suppose that the log odds ratio model is correctly specified, and that the probability limit of the estimating equations has a unique solution.

Double robustness of . If either of the baseline models is correct, from (iii) of Lemma A1, converges to zero, therefore converges to the true outcome mean.

Double robustness of . From (i) of Lemma A1, if the baseline propensity score model is correct, converges to zero, i.e., is a solution of the probability limit of equation (8). Thus, the solution of equation (8) converges to zero, and , . If the baseline outcome model is correct, converges to zero; converges to the true outcome mean; and converges to zero. By definition of the extended weight function, with . From (ii) of Lemma A1, converges to zero. Thus, must converge to zero, and
converges to the true outcome mean in probability.

Double robustness of . If is correct, from (i) of Lemma A1, converges to zero. Note equation (9), we have that converges to zero. Thus, converges to the true outcome mean. If is correct, then converges to zero. Since with , from (ii) of Lemma A1, converges to zero. That is, is a solution of the probability limit of equation (9). Thus, the solution of equation (9), converges to zero, and .

Regularity conditions for model

The full data law is identifiable if either Inline graphic or follows the location-scale model (1), and the corresponding density function or satisfies the following conditions:

the characteristic function of the density function satisfies for and some constants ;
conditional on , , are continuously differentiable and integrable with respect to ; is continuously differentiable, and is finite;
there exist some linear one-to-one mapping and some value such that either equals zero or infinity for any , with .

Many commonly-used models satisfy conditions (a)–(c), for example, the Gaussian models with Inline graphic the standard normal density function, the inverse Laplace transform, the moment-generating function of a normal density function with mean and variance , and .

References

Bang H. & Robins J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–73.
D'Haultfoeuille X. (2010). A new instrumental method for dealing with endogenous selection. J. Economet. 154, 1–15.
Higgins J. P., White I. R. & Wood A. M. (2008). Imputation methods for missing outcome data in meta-analysis of clinical trials. Clin. Trials 5, 225–39.
Ibrahim J. G., Lipsitz S. R. & Horton N. (2001). Using auxiliary data for parameter estimation with non-ignorably missing outcomes. Appl. Statist. 50, 361–73.
Kang J. D. & Schafer J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci. 22, 523–39.
Kott P. S. (2014). Calibration weighting when model and calibration variables can differ. In Contributions to Sampling Statistics, F. Mecatti, L. P. Conti & G. M. Ranalli, eds. Cham: Springer, pp. 1–18.
Miao W., Ding P. & Geng Z. (2015). Identifiability of normal and normal mixture models with nonignorable missing data. J. Am. Statist. Assoc. doi: 10.1080/01621459.2015.1105808.
Robins J. M., Rotnitzky A. & Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 89, 846–66.
Robins J., Sued M., Lei-Gomez Q. & Rotnitzky A. (2007). Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statist. Sci. 22, 544–59.
Robins J., Li L., Tchetgen Tchetgen E. & van der Vaart A. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and Statistics: Essays in Honor of David A. Freedman, D. Nolan & T. P. Speed, eds., vol. 2. Beachwood, Ohio: Institute of Mathematical Statistics, pp. 335–421.
Scharfstein D. O., Rotnitzky A. & Robins J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Am. Statist. Assoc. 94, 1096–120.
Tsiatis A. (2006). Semiparametric Theory and Missing Data. New York: Springer.
Van der Laan M. J. & Robins J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. New York: Springer.
Vansteelandt S., Rotnitzky A. & Robins J. (2007). Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika 94, 841–60.
Vermeulen K. & Vansteelandt S. (2015). Bias-reduced doubly robust estimation. J. Am. Statist. Assoc. 110, 1024–36.
Wang S., Shao J. & Kim J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statist. Sinica 24, 1097–116.
Zahner G. E., Pawelkiewicz W., DeFrancesco J. J. & Adnopoz J. (1992). Children's mental health service needs and utilization patterns in an urban community: an epidemiological assessment. J. Am. Acad. Child Adolesc. Psychiat. 31, 951–60.
Zhao J. & Shao J. (2015). Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. J. Am. Statist. Assoc. 110, 1577–90.
