Biometrika. 2018 Feb 28;105(2):479–486. doi: 10.1093/biomet/asy007

Optimal pseudolikelihood estimation in the analysis of multivariate missing data with nonignorable nonresponse

Jiwei Zhao, Yanyuan Ma
PMCID: PMC6373018  PMID: 30799873

SUMMARY

Tang et al. (2003) considered a regression model with missing response, where the missingness mechanism depends on the value of the response variable and hence is nonignorable. They proposed three pseudolikelihood estimators, based on different treatments of the probability distribution of the completely observed covariates. The first assumes the distribution of the covariate to be known, the second estimates this distribution parametrically, and the third estimates the distribution nonparametrically. While it is not hard to show that the second estimator is more efficient than the first, Tang et al. (2003) only conjectured that the third estimator is more efficient than the first two. In this paper, we investigate the asymptotic behaviour of the third estimator by deriving a closed-form representation of its asymptotic variance. We then prove that the third estimator is more efficient than the other two. Our result can be straightforwardly applied to missingness mechanisms that are more general than that in Tang et al. (2003).

Keywords: Efficiency, Missing data, Nonignorable nonresponse, Pseudolikelihood estimator

1. Introduction

Tang et al. (2003) considered multivariate regression analysis of a Inline graphic-dimensional response Inline graphic on a Inline graphic-dimensional covariate Inline graphic, with the joint density function of Inline graphic factorized as Inline graphic, where Inline graphic represents the marginal density function of Inline graphic and the estimation of Inline graphic is of main interest.

They considered the situation where Inline graphic is fully observed but Inline graphic has missing values. Let Inline graphic if Inline graphic is completely observed and Inline graphic otherwise. Tang et al. (2003) assumed that the missing data mechanism depends only on the underlying value of the response Inline graphic and hence is nonignorable,

graphic file with name M16.gif (1)

They proposed estimators built on the fact that Inline graphic and Inline graphic are conditionally independent given Inline graphic, so the completely observed subjects form a random sample from the distribution of Inline graphic given Inline graphic.
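For concreteness, assumption (1) and the consequence just described can be sketched as follows. The displayed formulas are not reproduced in this text, so the symbols are assumptions made for illustration: $Y$ is the response, $X$ the covariate, $R$ the response indicator, $p(y \mid x; \theta)$ the regression model and $g(x)$ the marginal density of the covariate. The second line is Bayes' rule; the selection probability depends on $y$ alone and therefore cancels.

```latex
\begin{align*}
\mathrm{pr}(R = 1 \mid Y, X) &= \mathrm{pr}(R = 1 \mid Y), \\
p(x \mid y, R = 1) &= p(x \mid y)
  = \frac{p(y \mid x; \theta)\, g(x)}{\int p(y \mid x'; \theta)\, g(x')\, dx'} .
\end{align*}
```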

The missing data mechanism (1) has been widely adopted in response-biased sampling (Brown, 1990; Liang & Qin, 2000; Chen, 2001) and is relevant in many applications. For example, Chen (2001) studied a case of a univariate response Inline graphic and a multivariate covariate Inline graphic, where the observed data form a nonrandom sample from Inline graphic, with the sampling probability depending only on Inline graphic. Therefore, the observed data can be viewed as a random sample from the distribution of Inline graphic given Inline graphic, instead of from the original regression model Inline graphic (Chen, 2001). Assumption (1) is also sensible in other situations. For example, when evaluating a new biomarker in analytical chemistry, scientists usually encounter a laboratory quality control limit, the so-called detection limit (Navas-Acien et al., 2008; Caldwell et al., 2009; Carter et al., 2016), defined as the lowest concentration of analyte distinguishable from the background noise. Although theoretically available, concentration values below the detection limit are usually not released by laboratories. When the concentration value is the outcome of interest and needs to be regressed against covariates Inline graphic, assumption (1) is satisfied, with Inline graphic where Inline graphic is the detection limit. Eliminating observations with values below the detection limit can lead to severe bias; see Hopke et al. (2001), Moulton et al. (2002), Richardson & Ciampi (2003), Schisterman et al. (2006) and the references therein. Other examples where (1) is valid include survey sampling (Deville & Särndal, 1992; Deville, 2000; Kott, 2014), case-control studies (Chen, 2007), and some survival analysis contexts. Tang et al. (2003) also discussed some extensions of the assumption (1).
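The bias caused by discarding values below a detection limit can be illustrated with a small simulation. This is only a sketch under assumed settings that do not come from the paper: a normal linear model with intercept 1 and slope `beta`, a standard normal covariate, and a hypothetical detection limit `limit`. The function names are likewise illustrative.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def detection_limit_demo(n=20000, beta=2.0, limit=1.0, seed=0):
    """Compare the full-data slope with the slope after dropping y < limit."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + beta * x + rng.gauss(0.0, 1.0) for x in xs]
    # Response indicator depends on y only: a value is released if y >= limit.
    kept = [(x, y) for x, y in zip(xs, ys) if y >= limit]
    full = ols_slope(xs, ys)
    complete_case = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return full, complete_case, len(kept)


full_slope, cc_slope, n_kept = detection_limit_demo()
print(f"full-data slope: {full_slope:.3f}")
print(f"complete-case slope: {cc_slope:.3f} (from {n_kept} released values)")
```

In this design the complete-case least-squares slope falls noticeably below the full-data slope, which is the kind of bias that motivates working with the distribution of the covariate given the response instead.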

The novelty of the idea in Tang et al. (2003) has led to recent developments such as Kim & Yu (2011), Zhao & Shao (2015), Shao & Wang (2016) and Miao & Tchetgen Tchetgen (2016). In this paper, we first present our results under assumption (1) and then show that they can be straightforwardly applied to more general missingness mechanisms.

The estimation of Inline graphic is based on independent and identically distributed observations Inline graphic for Inline graphic and Inline graphic for Inline graphic. Based on (1), we estimate Inline graphic through maximizing the likelihood of the parameters of Inline graphic based on the complete observations,

graphic file with name M39.gif

Equivalently, we estimate Inline graphic by maximizing the complete-case conditional loglikelihood

graphic file with name M41.gif (2)

which contains Inline graphic, the unspecified probability density function of Inline graphic. If the true Inline graphic, Inline graphic, is known, the corresponding estimator of Inline graphic, denoted by Inline graphic, is the maximizer of Inline graphic. If Inline graphic is unknown, with Inline graphic fully observed, any appropriate complete-data technique can be applied to estimate Inline graphic. Tang et al. (2003) considered two situations and obtained two pseudolikelihood estimators of Inline graphic (Gong & Samaniego, 1981; Parke, 1986). The first is when a parametric model Inline graphic is adopted and the full-data maximum likelihood estimator is used to obtain Inline graphic. Then Inline graphic is used to replace Inline graphic in (2), which leads to the estimator Inline graphic, the maximizer of Inline graphic. The second is when Inline graphic is unspecified and its cumulative distribution is estimated by its empirical version Inline graphic. This gives the estimator Inline graphic, the maximizer of

graphic file with name M62.gif (3)
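The two objective functions described above can be sketched in LaTeX as follows. The notation is assumed rather than taken from the paper's displays: subjects $i = 1, \ldots, n_1$ are the complete cases among $n$ subjects, $p(y \mid x; \theta)$ is the regression model, $g$ is the covariate density and $\hat G_n$ is the empirical distribution of all $n$ covariate values. The factor $g(x_i)$ in the numerator is dropped because it does not involve $\theta$, so the expressions may differ from (2) and (3) by additive constants.

```latex
\begin{align*}
\ell(\theta, g) &= \sum_{i=1}^{n_1} \log
  \frac{p(y_i \mid x_i; \theta)}{\int p(y_i \mid x; \theta)\, g(x)\, dx}, \\
\ell(\theta, \hat G_n) &= \sum_{i=1}^{n_1} \log
  \frac{p(y_i \mid x_i; \theta)}{n^{-1} \sum_{j=1}^{n} p(y_i \mid x_j; \theta)} .
\end{align*}
```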

An interesting issue is the relative efficiency of Inline graphic, Inline graphic and Inline graphic. Theorem 2 of Tang et al. (2003) established the asymptotic normality of Inline graphic and showed that Inline graphic is more efficient than Inline graphic. However, the authors did not give an explicit expression for the asymptotic variance of Inline graphic, and so could not provide a theoretical efficiency comparison with Inline graphic. Based on simulation studies, they conjectured that Inline graphic is more efficient than both Inline graphic and Inline graphic.

We derive the asymptotic variance of Inline graphic in closed form and prove that Inline graphic is more efficient than Inline graphic and Inline graphic; thus we establish the correctness of the conjecture in Tang et al. (2003) and provide a clear explanation of their numerical observations. We also show that, in general, no other method of estimating Inline graphic can lead to a more efficient estimator of Inline graphic than Inline graphic, which is recommended for use in practice.
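The three pseudolikelihood estimators can be computed side by side in a toy model. The following is a minimal sketch, not the authors' simulation design. Everything specific in it is an assumption: a one-parameter model in which the response is normal with mean `theta * x` and unit variance, a standard normal covariate, a logistic response probability depending on `y` only, and a coarse grid search in place of a proper optimizer. All function names are hypothetical.

```python
import math
import random


def log_normal_pdf(z, mean, var):
    """Log density of N(mean, var) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - mean) ** 2 / var


def pseudo_loglik(theta, complete, log_marginal):
    """Complete-case conditional loglikelihood:
    sum over complete cases of log p(y|x;theta) - log integral p(y|x;theta) dG(x)."""
    return sum(log_normal_pdf(y, theta * x, 1.0) - log_marginal(y, theta)
               for x, y in complete)


def run_demo(n=400, theta0=0.5, seed=1):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [theta0 * x + rng.gauss(0.0, 1.0) for x in xs]
    # Missingness depends on y only: pr(R = 1 | y) = 1 / (1 + exp(-y)).
    complete = [(x, y) for x, y in zip(xs, ys)
                if rng.random() < 1.0 / (1.0 + math.exp(-y))]

    # Parametric fit of the covariate distribution from all n covariates.
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n

    def m_known(y, th):
        # True covariate distribution N(0, 1): marginal of y is N(0, 1 + th^2).
        return log_normal_pdf(y, 0.0, 1.0 + th * th)

    def m_parametric(y, th):
        # Fitted covariate distribution N(mu_hat, var_hat).
        return log_normal_pdf(y, th * mu_hat, 1.0 + th * th * var_hat)

    def m_empirical(y, th):
        # Empirical covariate distribution: average p(y | x_j; th) over all j.
        avg = sum(math.exp(-0.5 * (y - th * x) ** 2) for x in xs) / n
        return math.log(avg) - 0.5 * math.log(2.0 * math.pi)

    grid = [0.05 * k for k in range(1, 41)]  # candidate values 0.05, ..., 2.00
    marginals = [("known_g", m_known),
                 ("parametric_g", m_parametric),
                 ("empirical_g", m_empirical)]
    return {name: max(grid, key=lambda th, m=m: pseudo_loglik(th, complete, m))
            for name, m in marginals}


estimates = run_demo()
print(estimates)
```

A single run only illustrates how the three objective functions differ in their denominators. Comparing efficiency empirically would require repeating the simulation many times and comparing the sampling variances of the three estimates.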

2. Asymptotic distribution and optimality

We use uppercase letters to denote random variables and lowercase letters to denote their realizations. We let Inline graphic and Inline graphic. Sometimes we also write Inline graphic and Inline graphic. We define Inline graphic, Inline graphic and Inline graphic. We let Inline graphic and Inline graphic. We also write Inline graphic and Inline graphic.

In this section, we establish the asymptotic distribution of Inline graphic and briefly describe the asymptotic distributions of Inline graphic and Inline graphic. The results for Inline graphic and Inline graphic can be found in Theorem 2 of Tang et al. (2003). Recall that Inline graphic is the maximizer of Inline graphic. It is straightforward to show that

graphic file with name M99.gif

so Inline graphic in distribution as Inline graphic. The estimator Inline graphic is the maximizer of Inline graphic. Because the maximum likelihood estimate of Inline graphic satisfies Inline graphic,

graphic file with name M106.gif

Hence Inline graphic in distribution as Inline graphic, where Inline graphic. It is obvious that Inline graphic.

Theorem 1.

Under the conditions of Theorem 3 in Tang et al. (2003), the estimator Inline graphic has the asymptotic representation

[displayed equation not recovered]

so Inline graphic in distribution as Inline graphic, where

[displayed equation not recovered]

This implies that:

Corollary 1.

Inline graphic is more efficient than Inline graphic and hence more efficient than Inline graphic.
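Reading "more efficient" in the usual matrix sense, Corollary 1 together with the comparison preceding Theorem 1 amounts to an ordering of asymptotic variances. The labels $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$ are assumed here for the asymptotic variances of the estimators based on the known, parametrically estimated and empirically estimated covariate distribution, respectively; they may not match the paper's own symbols.

```latex
\[
\Sigma_3 \preceq \Sigma_2 \preceq \Sigma_1,
\qquad \text{i.e.} \qquad
\Sigma_1 - \Sigma_2 \ \text{and} \ \Sigma_2 - \Sigma_3 \ \text{are positive semidefinite.}
\]
```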

Although other nonparametric methods can be used to estimate Inline graphic and thus obtain alternative estimators of Inline graphic, doing so cannot increase the efficiency of Inline graphic. To see this, let Inline graphic denote the empirical estimator of Inline graphic, which results in Inline graphic, and let Inline graphic be an alternative consistent estimator of Inline graphic using data Inline graphic, which gives rise to an alternative estimator of Inline graphic, denoted by Inline graphic. The derivation in Theorem 1 yields

graphic file with name M130.gif

We write Inline graphic as Inline graphic. As regular asymptotically linear estimators of Inline graphic based on Inline graphic, Inline graphic and Inline graphic satisfy

graphic file with name M137.gif

(Huber, 1981) for some influence functions Inline graphic and Inline graphic, which we inspect in order to compare estimation efficiency. Here Inline graphic. Therefore

graphic file with name M141.gif

where in the last step we have used the zero-mean properties of Inline graphic and Inline graphic. This leads to

graphic file with name M144.gif

Using the technique in Corollary 1, we obtain

graphic file with name M145.gif

so Inline graphic is also less efficient than Inline graphic. Therefore the pseudolikelihood estimator Inline graphic is at least as efficient as any other estimator obtained by plugging in a parametric or nonparametric estimate of the covariate distribution.

3. Extension to more general missingness mechanisms

The missing data mechanism in (1) assumes that given Inline graphic, Inline graphic and Inline graphic are conditionally independent. Although reasonable in many situations, this is not always true. For instance, in a randomized clinical trial comparing a treatment with a placebo, the dichotomous treatment indicator may influence the missingness. Consider a very simple scenario where Inline graphic denotes the outcome, Inline graphic the binary treatment indicator and Inline graphic the covariate. We are interested in the unknown parameters in Inline graphic. Compared with (1), it is more cautious to assume that

graphic file with name M156.gif (4)

Under (4), the methods in § 2 still apply. To see this, note that, as in § 1, the unknown parameter Inline graphic can be estimated based on the conditional likelihood

graphic file with name M158.gif

Thus the same reasoning and derivation can be applied, and we can show that the estimator for Inline graphic with Inline graphic estimated by its empirical version under Inline graphic and Inline graphic separately, i.e., the Inline graphic version in § 2, is also optimal among the three estimators.
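A possible written-out form of assumption (4) and the associated conditional likelihood is sketched below. The symbols are assumptions: $T$ denotes the binary treatment indicator, $p(y \mid x, t; \theta)$ the model of interest and $g(x \mid t)$ the density of the covariate within treatment arm $t$, which is the quantity estimated separately under $T = 0$ and $T = 1$.

```latex
\begin{align*}
\mathrm{pr}(R = 1 \mid Y, T, X) &= \mathrm{pr}(R = 1 \mid Y, T), \\
\prod_{i=1}^{n_1} p(x_i \mid y_i, t_i)
  &= \prod_{i=1}^{n_1}
     \frac{p(y_i \mid x_i, t_i; \theta)\, g(x_i \mid t_i)}
          {\int p(y_i \mid x, t_i; \theta)\, g(x \mid t_i)\, dx} .
\end{align*}
```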

We now generalize assumption (1) to the case where the missing data indicator Inline graphic and some components of the covariates Inline graphic, say Inline graphic, are conditionally independent given Inline graphic and the remaining components of Inline graphic, say Inline graphic; that is,

graphic file with name M170.gif (5)

Here, the covariates are represented by Inline graphic, and Inline graphic is called a nonresponse instrument (Zhao & Shao, 2015) or a shadow variable (Miao & Tchetgen Tchetgen, 2016). The objective function becomes

graphic file with name M173.gif (6)

The distribution of Inline graphic conditional on Inline graphic in (6) poses extra challenges. The true Inline graphic, or a parametric or nonparametric estimator of it, can be incorporated into the estimation, resulting in Inline graphic, Inline graphic or Inline graphic. Theory similar to that in § 2 can be developed and leads to the same optimality of Inline graphic.
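Assumption (5) and an objective function of the form (6) can be sketched as follows. The notation is assumed for illustration: $Z$ denotes the nonresponse instrument, $U$ the remaining covariates and $g(z \mid u)$ the conditional density of $Z$ given $U$. As before, the factor $g(z_i \mid u_i)$ in the numerator is omitted because it does not involve $\theta$, so the expression may differ from (6) by an additive constant.

```latex
\begin{align*}
\mathrm{pr}(R = 1 \mid Y, U, Z) &= \mathrm{pr}(R = 1 \mid Y, U), \\
\ell(\theta, g) &= \sum_{i=1}^{n_1} \log
  \frac{p(y_i \mid u_i, z_i; \theta)}
       {\int p(y_i \mid u_i, z; \theta)\, g(z \mid u_i)\, dz} .
\end{align*}
```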

The nonignorable missing data mechanism assumption is usually difficult to specify or verify (d’Haultfoeuille, 2010), but a nonresponse instrument Inline graphic is often available and assumption (5) is often reasonable. For example, in a study of children’s mental health (Zahner et al., 1992), investigators were interested in evaluating the prevalence of children with abnormal psychopathological status based on their teacher’s assessment, Inline graphic, which was subject to missing values. A missing teacher report may be related to the teacher’s assessment of the student even after adjusting for fully observed covariates Inline graphic such as physical health of the child and parental status of the household (Ibrahim et al., 2001). A separate parental report on the psychopathology of the child was also available for all children in the study. Such a report is likely to be highly correlated with that of the teacher, but is unlikely to be correlated with the teacher’s response status conditional on the teacher’s assessment of that student. Therefore, the parental assessment constitutes a valid nonresponse instrument and assumption (5) is reasonable.

Acknowledgement

We thank the editor, associate editor and three referees for their constructive comments, which have led to a significantly improved paper. This work was partially supported by the National Center for Advancing Translational Sciences of the U.S. National Institutes of Health and the U.S. National Science Foundation.

Appendix

Proof of Theorem 1.

If we estimate Inline graphic with its empirical distribution, we approximate Inline graphic by its sample average, i.e., Inline graphic. Also, Inline graphic. To obtain Inline graphic, we maximize (3), which is equivalent to maximizing

[displayed equation not recovered]

Thus, by the mean value theorem, there must exist some Inline graphic lying between Inline graphic and Inline graphic such that

[displayed equation not recovered]

Now

[displayed equation not recovered] (A1)

where, using the decomposition and representation techniques related to V-statistics (Serfling, 1980; Shao, 2003), we have

[displayed equation not recovered]

Substituting this into (A1), we get

[displayed equation not recovered]

Hence Inline graphic in distribution as Inline graphic, where

[displayed equation not recovered]

Proof of Corollary 1.

To prove that Inline graphic is more efficient than Inline graphic, note that

[displayed equation not recovered]

where the second equality comes from the fact that

[displayed equation not recovered]

Therefore, Inline graphic is also more efficient than Inline graphic. □

Supplementary material

Supplementary material available at Biometrika online contains some simulation results.

References

  1. Brown C. H. (1990). Protecting against nonrandomly missing data in longitudinal studies. Biometrics 46, 143–55.
  2. Caldwell K. L., Jones R. L., Verdon C. P., Jarrett J. M., Caudill S. P. & Osterloh J. D. (2009). Levels of urinary total and speciated arsenic in the US population: National Health and Nutrition Examination Survey 2003–2004. J. Expos. Sci. Envir. Epidemiol. 19, 59–68.
  3. Carter R. L., Wrabetz L., Jalal K., Orsini J. J., Barczykowski A. L., Matern D. & Langan T. J. (2016). Can psychosine and galactocerebrosidase activity predict early-infantile Krabbe’s disease presymptomatically? J. Neurosci. Res. 94, 1084–93.
  4. Chen H. Y. (2007). A semiparametric odds ratio model for measuring association. Biometrics 63, 413–21.
  5. Chen K. (2001). Parametric models for response-biased sampling. J. R. Statist. Soc. B 63, 775–89.
  6. Deville J.-C. (2000). Generalized calibration and application to weighting for non-response. In Proceedings in Computational Statistics: 14th Symposium held in Utrecht, The Netherlands, 2000. Heidelberg: Springer, pp. 65–76.
  7. Deville J.-C. & Särndal C. E. (1992). Calibration estimators in survey sampling. J. Am. Statist. Assoc. 87, 376–82.
  8. d’Haultfoeuille X. (2010). A new instrumental method for dealing with endogenous selection. J. Economet. 154, 1–15.
  9. Gong G. & Samaniego F. J. (1981). Pseudo maximum likelihood estimation: Theory and applications. Ann. Statist. 9, 861–9.
  10. Hopke P. K., Liu C. & Rubin D. B. (2001). Multiple imputation for multivariate data with missing and below-threshold measurements: Time-series concentrations of pollutants in the Arctic. Biometrics 57, 22–33.
  11. Huber P. J. (1981). Robust Statistics. New York: Wiley.
  12. Ibrahim J. G., Lipsitz S. R. & Horton N. (2001). Using auxiliary data for parameter estimation with non-ignorably missing outcomes. Appl. Statist. 50, 361–73.
  13. Kim J. K. & Yu C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. J. Am. Statist. Assoc. 106, 157–65.
  14. Kott P. (2014). Calibration weighting when model and calibration variables can differ. In Contributions to Sampling Statistics. Cham: Springer International Publishing, pp. 1–18.
  15. Liang K.-Y. & Qin J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. J. R. Statist. Soc. B 62, 773–86.
  16. Miao W. & Tchetgen Tchetgen E. J. (2016). On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika 103, 475–82.
  17. Moulton L. H., Curriero F. C. & Barroso P. F. (2002). Mixture models for quantitative HIV RNA data. Statist. Meth. Med. Res. 11, 317–25.
  18. Navas-Acien A., Silbergeld E. K., Pastor-Barriuso R. & Guallar E. (2008). Arsenic exposure and prevalence of type 2 diabetes in US adults. J. Am. Med. Assoc. 300, 814–22.
  19. Parke W. R. (1986). Pseudo maximum likelihood estimation: The asymptotic distribution. Ann. Statist. 14, 355–7.
  20. Richardson D. B. & Ciampi A. (2003). Effects of exposure measurement error when an exposure variable is constrained by a lower limit. Am. J. Epidemiol. 157, 355–63.
  21. Schisterman E. F., Vexler A., Whitcomb B. W. & Liu A. (2006). The limitations due to exposure detection limits for regression models. Am. J. Epidemiol. 163, 374–83.
  22. Serfling R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
  23. Shao J. (2003). Mathematical Statistics. New York: Springer, 2nd ed.
  24. Shao J. & Wang L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 103, 175–87.
  25. Tang G., Little R. J. & Raghunathan T. E. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika 90, 747–64.
  26. Zahner G. E., Pawelkiewicz W., DeFrancesco J. J. & Adnopoz J. (1992). Children’s mental health service needs and utilization patterns in an urban community: An epidemiological assessment. J. Am. Acad. Child Adolesc. Psychiat. 31, 951–60.
  27. Zhao J. & Shao J. (2015). Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. J. Am. Statist. Assoc. 110, 1577–90.

Articles from Biometrika are provided here courtesy of Oxford University Press
