Robust Estimation of Area Under ROC Curve Using Auxiliary Variables In the Presence of Missing Biomarker Values

Qi Long; Xiaoxi Zhang; Brent A Johnson

doi:10.1111/j.1541-0420.2010.01487.x

Author manuscript; available in PMC: 2021 May 28.

Published in final edited form as: Biometrics. 2010 Sep 3;67(2):559–567. doi: 10.1111/j.1541-0420.2010.01487.x


PMCID: PMC8162996 NIHMSID: NIHMS1704813 PMID: 20825391

Summary:

In medical research, receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC; the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), i.e., missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), i.e., missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.

Keywords: Area under the curve, Biomarker, Doubly robust estimators, Missing at random, Missing not at random, Receiver operating characteristic curve, Sensitivity analysis

1. Introduction

The receiver operating characteristic (ROC) curve plots the fraction of true positives (sensitivity) against the fraction of false positives (1 − specificity) as the discrimination threshold (e.g., of a biomarker for a disease) is varied, and it is often used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing diseases in the future. It was originally developed for the analysis of signal detection (Green and Swets, 1966) and was first used in medicine for the assessment of imaging devices (Zweig and Campbell, 1993). In medical studies, summary measures of ROC curves are often used and they are particularly powerful when comparing several ROC curves. The most widely used summary measure is the area under the ROC curve (ROC AUC) (Bamber, 1975). The ROC AUC is bounded between 0 and 1, and has the interpretation of the probability of a randomly selected observation from the diseased (non-diseased) population having a higher biomarker value than that from the non-diseased (diseased) population. Therefore, a large AUC value represents good separation in the biomarker values between the diseased and non-diseased populations. In particular, a perfect test would achieve an AUC of 1.0, whereas an uninformative test would have an AUC of 0.5. A wealth of literature has been developed for this type of research (see Pepe, 2003, and references therein).

In practice, the biomarker value may be missing for some subjects, especially in observational studies. Take for example a self-rated mental illness score collected from pregnant women in a psychiatric study, where the disease of interest is the presence (or absence) of a major depressive episode throughout pregnancy (see Section 4 for more details). Since the biomarker score is self-rated, it is possible that some subjects did not complete the self-evaluation and hence the score is missing. In such studies, additional variables including demographic and baseline variables are often available, which are referred to as auxiliary variables. While these variables are not of primary interest themselves, they are potentially predictive of the missingness of the biomarker value or the value itself, and can be incorporated in a data analysis to improve its robustness and/or efficiency. If an auxiliary variable is predictive of missingness but independent of the missing values, then using it in an analysis will not affect the results. Thus, we are interested in auxiliary variables that are predictive of the missing values, especially if they are also predictive of the missingness.

As with the general setting discussed in Little and Rubin (2002) and references therein, a naive analysis that only uses complete observations may lead to bias and loss of efficiency in the estimation of the ROC AUC. First, when the biomarker is missing completely at random (MCAR), i.e., the missingness does not depend on either observed or unobserved data, the naive analysis is valid but is not efficient. Second, when the biomarker is missing at random (MAR), i.e., the missingness is conditionally independent of the missing data given the observed data, the naive analysis may be biased and other methods, e.g., inverse-weighted (IW) methods, can be extended for consistent estimation. IW methods weight each complete case by the inverse of the probability of observing the biomarker value. Despite their conceptual simplicity, IW methods have limitations. Most notably, IW methods are not efficient and are subject to bias if one misspecifies the model for the missingness. Alternatively, one can extend the methods that are doubly robust and more efficient (Robins et al., 1994; Scharfstein et al., 1999) for estimating the ROC AUC. In the case of missing not at random (MNAR), i.e., missingness depends on unobserved biomarker values even after conditioning on the observed data, it is common practice to conduct sensitivity analysis (Zhou, 1994; Rotnitzky and Robins, 1997; Scharfstein et al., 1999; Kosinski and Barnhart, 2003). In all cases, auxiliary variables can be used to potentially reduce bias and improve efficiency when associated with the probability of missing and the value of biomarkers, or simply improve efficiency when only associated with the value of biomarkers.

We confine the scope of this paper to the case where the disease status is always confirmed and a set of auxiliary variables is fully observed but the biomarker values are missing for some subjects, and we are interested in estimating the ROC AUC. Our setting is to be distinguished from the existing research on verification bias (Zhou, 1993, 1998; Rotnitzky et al., 2006; Fluss et al., 2009). In the presence of verification bias, the biomarker values are always observed whereas the true disease status is only verified for a non-random sample of the population of interest, e.g., the selection for testing may depend on the disease status or other variables. In particular, Rotnitzky et al. (2006) extended the doubly robust method developed in Rotnitzky and Robins (1997) to the estimation of the ROC AUC in the presence of verification bias. As a result of different problem setups (i.e., biomarker values missing vs. disease status unconfirmed for a subset of subjects), there are important differences between our work and theirs. In our setting, a working model on biomarker values, which can be continuous or categorical, is utilized, whereas a working model on the presence (or absence) of the disease, a binary variable, was utilized in Rotnitzky et al. (2006); consequently, our methods require modeling of the conditional distribution of biomarker values. Furthermore, we study and compare parametric and nonparametric approaches for estimating this conditional distribution and discuss two types of MAR assumptions, which have different implications on the estimation of AUC.

The remainder of the article is organized as follows. In Section 2, we describe the proposed estimators and their theoretical properties under MAR and propose a sensitivity analysis under MNAR. In Section 3, we evaluate the finite sample performance of the proposed estimators through simulations. In Section 4, we apply the proposed methods to a psychiatric study of maternal depression during pregnancy. We conclude with a discussion in Section 5.

2. Methodology

Suppose that a random sample of n subjects is selected from a population of interest to evaluate the performance of a diagnostic or predictive test using a biomarker. Each subject i, i = 1, … , n, is classified into one of two groups, diseased (D_i = 1) or non-diseased (D_i = 0), based on a gold standard. For each subject i, denote the biomarker value by X_i, which is used to diagnose or predict the disease status (D_i). X_i is not observed in a subset of the subjects, and let δ_i denote the missing indicator for X_i, i.e., δ_i = 1 when X_i is observed and δ_i = 0 if X_i is missing. In addition, p auxiliary variables that may be associated with the value of X_i and/or its missingness (δ_i) are also collected and denoted by $Z_{i} = {(Z_{i}^{(1)}, \dots, Z_{i}^{(p)})}^{T}$ . Then for subject i, the complete data are (D_i, Z_i, δ_i, X_i). When δ_i = 1, the observed data are O_i = (D_i, Z_i, δ_i, X_i) and subject i is called a complete case; when δ_i = 0, the observed data are O_i = (D_i, Z_i, δ_i) and subject i is called an incomplete case. We denote by O the collection of observed data for all subjects. When δ_i is independent of X_i conditional on D_i and Z_i, it is a case of MAR; when δ_i is dependent on X_i conditional on D_i and Z_i, it is a case of MNAR.

We are interested in estimating the ROC AUC, which is equivalent to a U-statistic (Bamber, 1975), θ = Pr(X_i > X_j | D_i = 1, D_j = 0), assuming that the diseased tend to have higher biomarker values. When all data are completely observed, an unbiased estimator of θ is

\hat{θ} = \frac{1}{\sum_{i \neq j} D_{i} (1 - D_{j})} \sum_{i \neq j} D_{i} (1 - D_{j}) I_{i j},

where I_ij = I(X_i > X_j) + 0.5I(X_i = X_j), and I(A) equals 1 if A is true and 0 if A is false. When X is missing for some subjects, a naive extension of the above estimator using only the complete observations (i.e., δ_i = 1) is

{\hat{θ}}_{0} = \frac{1}{\sum_{i \neq j} D_{i} (1 - D_{j}) δ_{i} δ_{j}} \sum_{i \neq j} D_{i} (1 - D_{j}) δ_{i} δ_{j} I_{i j} .

(1)

It is straightforward to verify the following proposition:

Proposition 1: (i) When δ is independent of X given D, ${\hat{θ}}_{0}$ is an unbiased estimator of θ; (ii) when δ is dependent on X given D, then ${\hat{θ}}_{0}$ is subject to potential bias.

We note that (i) includes the case of MCAR and a special case of MAR where δ may depend on D and Z and is independent of X given D and (ii) includes the case of MNAR and a special case of MAR where δ is dependent on X conditional on D but is independent of X conditional on D and Z. We refer to ${\hat{θ}}_{0}$ as the naive estimator throughout this article.
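As a minimal sketch of the full-data estimator and the naive estimator (1), the pairwise indicator I_ij can be computed for all diseased/non-diseased pairs at once. The function names below are illustrative, not from the article, and the only library assumed is NumPy.

```python
import numpy as np

def pairwise_indicator(x_dis, x_non):
    """I_ij = I(X_i > X_j) + 0.5 * I(X_i = X_j) for every (diseased i, non-diseased j) pair."""
    diff = x_dis[:, None] - x_non[None, :]
    return (diff > 0).astype(float) + 0.5 * (diff == 0)

def auc_complete(x, d):
    """Full-data estimator: average of I_ij over all pairs with D_i = 1 and D_j = 0."""
    x, d = np.asarray(x, float), np.asarray(d, int)
    return pairwise_indicator(x[d == 1], x[d == 0]).mean()

def auc_naive(x, d, delta):
    """Naive estimator (1): the same average restricted to complete cases (delta == 1)."""
    x, d, delta = np.asarray(x, float), np.asarray(d, int), np.asarray(delta, int)
    keep = delta == 1
    return auc_complete(x[keep], d[keep])

# Toy data: the last diseased subject has a missing biomarker value.
x = np.array([2.0, 3.0, 1.0, 2.0, np.nan])
d = np.array([1, 1, 0, 0, 1])
delta = np.array([1, 1, 1, 1, 0])
print(auc_naive(x, d, delta))  # 0.875 = (1 + 0.5 + 1 + 1) / 4
```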

2.1. Inverse-Weighted Estimator

In the case of MAR, we first study an inverse-weighted estimator,

{\hat{θ}}_{I W} = \frac{1}{\sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j})} \sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j}) I_{i j},

(2)

where ${\hat{π}}_{i}$ is an estimate of the probability of observing X_i, namely, π_i = Pr(δ_i = 1), conditional on Z_i and D_i under MAR. We denote by $(M 1)$ the working model for π_i given Z_i and D_i with a set of unknown parameters, α, and denote by $A (α; O) = \sum_{i} A_{i} (α; O)$ the estimating equations for computing the estimate of α, namely, $\hat{α}$, based on the observed data. For instance, one can use a logistic regression model for $(M 1)$, i.e., logit(π_i) = W(Z_i, D_i; α) where W(Z_i, D_i; α) is a function of Z_i and D_i and is parameterized by α; $A (α; O)$ can be taken as the likelihood equations for the logistic regression model. $(M 1)$ is also known as the propensity score model (Rosenbaum and Rubin, 1983). It can be readily shown that if the working model $(M 1)$ is correctly specified, ${\hat{θ}}_{I W}$ is a consistent estimator of θ under MAR.
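The inverse-weighted estimator (2) can be sketched as a weighted average of I_ij over complete pairs. In this illustration the fitted probabilities are simply passed in as an array `pi_hat` (for example, fitted values from a logistic regression for (M1)); the function name and argument layout are assumptions made for the example.

```python
import numpy as np

def auc_iw(x, d, delta, pi_hat):
    """Inverse-weighted AUC, Equation (2), given fitted observation probabilities pi_hat."""
    x, pi_hat = np.asarray(x, float), np.asarray(pi_hat, float)
    d, delta = np.asarray(d, int), np.asarray(delta, int)
    w = np.where(delta == 1, 1.0 / pi_hat, 0.0)   # delta_i / pi_hat_i
    xs = np.where(delta == 1, x, 0.0)             # placeholder for missing X; its weight is zero
    dis, non = d == 1, d == 0
    diff = xs[dis][:, None] - xs[non][None, :]
    I = (diff > 0) + 0.5 * (diff == 0)            # I_ij for diseased i, non-diseased j
    W = w[dis][:, None] * w[non][None, :]         # pair weights delta_i delta_j / (pi_i pi_j)
    return (W * I).sum() / W.sum()
```

With constant `pi_hat` the weights cancel and the function returns the naive estimator (1), which is a convenient sanity check.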

2.2. Doubly Robust Estimators

In the case of MAR, we propose a second estimator

{\hat{θ}}_{D R} = \frac{1}{\sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j})} \sum_{i \neq j} D_{i} (1 - D_{j}) \left\{ \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} I_{i j} - \frac{δ_{i} δ_{j} - {\hat{π}}_{i} {\hat{π}}_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} E (I_{i j} ∣ Z_{i}, Z_{j}, D_{i} = 1, D_{j} = 0) \right\}

(3)

where ${\hat{π}}_{i}$ is the same as previously defined and E(I_ij | Z_i, Z_j, D_i = 1, D_j = 0) can be estimated based on the joint conditional distribution of X_i and X_j given the observed data. Specifically, we denote by $(M 2)$ the working model for characterizing the conditional distribution of X given Z and D with a set of unknown parameters, β, and denote by $B (β; O) = \sum_{i} B_{i} (β; O)$ the estimating equations for computing the estimate of β, namely, $\hat{β}$ , based on the observed data. We note that the conditional mean of X given Z and D is only part of $(M 2)$ . It can be readily shown that if either $(M 1)$ or $(M 2)$ is correctly specified, ${\hat{θ}}_{D R}$ is a consistent estimator of θ under MAR.
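To see where the double robustness of the bracketed term in (3) comes from, it can help to rearrange it. The following is only a heuristic sketch, stated with the limiting values of the fitted quantities rather than the estimates; the shorthand m_ij for the working value of E(I_ij | Z_i, Z_j, D_i = 1, D_j = 0) is introduced here purely for illustration.

```latex
\frac{δ_{i} δ_{j}}{π_{i} π_{j}} I_{i j} - \frac{δ_{i} δ_{j} - π_{i} π_{j}}{π_{i} π_{j}} m_{i j}
  = I_{i j} + \left( \frac{δ_{i} δ_{j}}{π_{i} π_{j}} - 1 \right) (I_{i j} - m_{i j})
```

If $(M 1)$ is correct, then under MAR and independence across subjects E(δ_i δ_j | X_i, X_j, Z_i, Z_j, D_i, D_j) = π_i π_j, so the second term on the right has mean zero whatever m_ij is. If $(M 2)$ is correct, then E(I_ij − m_ij | Z_i, Z_j, D_i = 1, D_j = 0, δ_i, δ_j) = 0 under MAR, so the second term again has mean zero whatever the working π_i are. In either case the bracketed term has the same mean as I_ij given D_i = 1 and D_j = 0, namely θ.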

We consider two options for the working model $(M 2)$ . In the first option, X given Z and D is assumed to follow a known parametric distribution with unknown parameters β. One special case is the Gaussian distribution, i.e., $[X_{i} ∣ Z_{i}, D_{i}] ~ N (V (Z_{i}, D_{i}; β_{1}), σ_{1}^{2} D_{i} + σ_{0}^{2} (1 - D_{i}))$ , where V(Z_i, D_i; β₁) is a function of Z_i and D_i parameterized by β₁. Let $β = {(β_{1}^{T}, σ_{1}^{2}, σ_{0}^{2})}^{T}$ denote all parameters of interest, and it follows that

X_{i} - X_{j} ∣ Z_{i}, Z_{j}, D_{i} = 1, D_{j} = 0 ~ N (V (Z_{i}, D_{i} = 1; β_{1}) - V (Z_{j}, D_{j} = 0; β_{1}), σ_{1}^{2} + σ_{0}^{2}),

and hence

E {I_{i j} ∣ Z_{i}, Z_{j}, D_{i} = 1, D_{j} = 0} = Φ (\frac{V (Z_{i}, D_{i} = 1; β_{1}) - V (Z_{j}, D_{j} = 0; β_{1})}{\sqrt{σ_{1}^{2} + σ_{0}^{2}}}),

where Φ(·) is the cumulative distribution function (c.d.f.) of a standard normal random variable. ${\hat{θ}}_{D R}$ can be rewritten as

{\hat{θ}}_{D R} = \frac{1}{\sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j})} \sum_{i \neq j} D_{i} (1 - D_{j}) \left[ \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} I_{i j} - \frac{δ_{i} δ_{j} - {\hat{π}}_{i} {\hat{π}}_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} Φ \{ b_{i j} (\hat{β}) \} \right],

(4)

where $b_{i j} (\hat{β}) = \frac{V (Z_{i}, D_{i} = 1; {\hat{β}}_{1}) - V (Z_{j}, D_{j} = 0; {\hat{β}}_{1})}{\sqrt{{\hat{σ}}_{1}^{2} + {\hat{σ}}_{0}^{2}}}$ and $\hat{β}$ can be obtained through, say, linear regression for $(M 2)$ using the observed data. From here on, let ${\hat{θ}}_{D R}$ denote the doubly robust estimator in Equation (4), which assumes that the conditional distribution of X is Gaussian.
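A minimal sketch of the Gaussian doubly robust estimator (4) follows. It assumes the fitted quantities are already available: `pi_hat` from (M1), `mu_hat` holding V(Z_i, D_i; β̂₁) evaluated at each subject's own disease status, and the two residual variance estimates `var1` and `var0`. These argument names are hypothetical, and the fitting step itself is not shown.

```python
import math
import numpy as np

# Standard normal c.d.f., applied elementwise via the error function.
_phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def auc_dr_gaussian(x, d, delta, pi_hat, mu_hat, var1, var0):
    """Doubly robust AUC, Equation (4), under a Gaussian working model (M2)."""
    x, pi_hat, mu_hat = (np.asarray(a, float) for a in (x, pi_hat, mu_hat))
    d, delta = np.asarray(d, int), np.asarray(delta, int)
    dis, non = d == 1, d == 0
    xs = np.where(delta == 1, x, 0.0)                    # missing X never enters: its delta is 0
    diff = xs[dis][:, None] - xs[non][None, :]
    I = (diff > 0) + 0.5 * (diff == 0)                   # I_ij
    dd = delta[dis][:, None] * delta[non][None, :]       # delta_i * delta_j
    pp = pi_hat[dis][:, None] * pi_hat[non][None, :]     # pi_hat_i * pi_hat_j
    b = (mu_hat[dis][:, None] - mu_hat[non][None, :]) / math.sqrt(var1 + var0)
    numerator = (dd / pp * I - (dd - pp) / pp * _phi(b)).sum()
    denominator = (dd / pp).sum()
    return numerator / denominator
```

When every biomarker value is observed and `pi_hat` is identically 1, the augmentation term vanishes and the function reduces to the full-data estimator.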

In the second option, suppose X_i = V(Z_i, D_i; β₁) + ε_1i D_i + ε_0i (1 − D_i), where {ε_1i, i = 1, …, n₁} and {ε_0i, i = 1, …, n₀} are independent and identically distributed (i.i.d.) random errors in the diseased and non-diseased, respectively, and their respective distributions are unknown. In this case, the conditional distribution of X_i given Z_i and D_i can be estimated nonparametrically. We denote the set of observed residuals by $\{ {\hat{ε}}_{1 k} = X_{k} - V (Z_{k}, D_{k} = 1; {\hat{β}}_{1}), k = 1, \dots, n_{1}^{o} \}$ and $\{ {\hat{ε}}_{0 l} = X_{l} - V (Z_{l}, D_{l} = 0; {\hat{β}}_{1}), l = 1, \dots, n_{0}^{o} \}$ for the diseased and non-diseased, respectively, where $n_{1}^{o}$ and $n_{0}^{o}$ are the number of subjects with observed X in the diseased and non-diseased, respectively. An empirical sample of the estimated conditional distribution of X_i given Z_i and D_i can be constructed as $\{ {\tilde{X}}_{i k}^{1} = V (Z_{i}, D_{i} = 1; {\hat{β}}_{1}) + {\hat{ε}}_{1 k}, k = 1, \dots, n_{1}^{o} \}$ in the diseased and $\{ {\tilde{X}}_{i l}^{0} = V (Z_{i}, D_{i} = 0; {\hat{β}}_{1}) + {\hat{ε}}_{0 l}, l = 1, \dots, n_{0}^{o} \}$ in the non-diseased. E(I_ij | Z_i, Z_j, D_i = 1, D_j = 0) in Equation (3) can then be estimated using $\frac{1}{n_{1}^{o} n_{0}^{o}} \sum_{k = 1}^{n_{1}^{o}} \sum_{l = 1}^{n_{0}^{o}} \{ I ({\tilde{X}}_{i k}^{1} > {\tilde{X}}_{j l}^{0}) + 0.5 I ({\tilde{X}}_{i k}^{1} = {\tilde{X}}_{j l}^{0}) \}$, where i and j go through all subjects including those with missing X, and we denote the resulting nonparametric estimator of θ by ${\hat{θ}}_{D R - N}$. When random errors are not i.i.d., e.g., the variance changes as the mean of X changes, the above procedure needs to be modified accordingly, e.g., performed within strata of the mean of X.
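The residual-based estimate of E(I_ij | Z_i, Z_j, D_i = 1, D_j = 0) for a single pair (i, j) can be sketched as follows. The function takes the two fitted means and the two sets of observed residuals as inputs; the names are illustrative assumptions, and i.i.d. errors within each disease group are assumed as in the text.

```python
import numpy as np

def cond_prob_nonparametric(mu1_i, mu0_j, resid1, resid0):
    """Residual-based estimate of E(I_ij | Z_i, Z_j, D_i = 1, D_j = 0) for one pair.

    mu1_i  : fitted mean V(Z_i, D_i = 1) for diseased subject i
    mu0_j  : fitted mean V(Z_j, D_j = 0) for non-diseased subject j
    resid1 : observed residuals among diseased complete cases
    resid0 : observed residuals among non-diseased complete cases
    """
    x1 = mu1_i + np.asarray(resid1, float)   # pseudo-sample for subject i
    x0 = mu0_j + np.asarray(resid0, float)   # pseudo-sample for subject j
    diff = x1[:, None] - x0[None, :]
    return ((diff > 0) + 0.5 * (diff == 0)).mean()

# Fitted means 1 and 0; pseudo-samples are {0, 1, 2} and {0, 1}.
print(cond_prob_nonparametric(1.0, 0.0, [-1.0, 0.0, 1.0], [0.0, 1.0]))  # 4/6
```

Plugging this quantity into (3) in place of the conditional expectation, for every diseased/non-diseased pair, gives the nonparametric variant.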

When computing ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ , the weights $(\frac{1}{{\hat{π}}_{i}})$ may be large and unstable, and lead to extra noise in estimation, in particular, when computing the bootstrap SE of ${\hat{θ}}_{D R - N}$ . Thus, we consider a simple modification to stabilize the weights, namely, replacing $\frac{1}{{\hat{π}}_{i}}$ with $\frac{1}{{\hat{π}}_{i}} \frac{n}{\sum_{i} δ_{i} / {\hat{π}}_{i}}$ . When $(M 1)$ is correctly specified, it can be readily shown that $\frac{1}{n} \sum_{i} δ_{i} / {\hat{π}}_{i}$ converges to 1 in probability, hence $\frac{1}{{\hat{π}}_{i}} \frac{n}{\sum_{i} δ_{i} / {\hat{π}}_{i}}$ is equivalent to $\frac{1}{{\hat{π}}_{i}}$ asymptotically.
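The weight modification described above amounts to rescaling the inverse-probability weights so that they sum to n. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def stabilized_weights(delta, pi_hat):
    """Return delta_i / pi_hat_i rescaled by n / sum_i(delta_i / pi_hat_i)."""
    delta = np.asarray(delta, float)
    pi_hat = np.asarray(pi_hat, float)
    raw = delta / pi_hat                    # zero for incomplete cases
    return raw * len(delta) / raw.sum()     # modified weights sum to n

w = stabilized_weights([1, 1, 0, 1], [0.5, 0.25, 0.5, 0.5])
print(w)  # [1. 2. 0. 1.]
```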

2.3. Theoretical Properties

Following our previous notation, we further define the following,

U_{i, j} (θ, α) \equiv θ \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) - \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) I_{i j},

V_{i, j} (θ, α, β) \equiv θ \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) - \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) I_{i j} + \frac{δ_{i} δ_{j} - π_{i} π_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) E (I_{i j} ∣ Z_{i}, Z_{j}, D_{i}, D_{j}),

where π_i depends on α and E(I_ij | Z_i, Z_j, D_i, D_j) depends on β. It follows that $U = \sum_{i \neq j} U_{i, j} (θ, \hat{α})$ and $V = \sum_{i \neq j} V_{i, j} (θ, \hat{α}, \hat{β})$ are the set of estimating equations for ${\hat{θ}}_{I W}$ and ${\hat{θ}}_{D R}$ , respectively. Let α₀ and β₀ be the probability limits of $\hat{α}$ and $\hat{β}$ , respectively, which usually exist.

Theorem 1: Under the regularity conditions (A1)–(A3) given in Web Appendix A, if either or both of $(M 1)$ and $(M 2)$ are correctly specified, then $\sqrt{n} ({\hat{θ}}_{D R} - θ) \to N (0, Ω)$ in distribution, where $Ω = Var [\{ E \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) \}^{- 1} Q_{i} (θ, α_{0}, β_{0})]$, and

Q_{i} = - E \{ V_{i, j} (θ, α, β) + V_{j, i} (θ, α, β) ∣ O_{i} \} + [\frac{\partial}{\partial α} E \{ V_{i, j} (θ, α, β) \}] \times {[\frac{\partial}{\partial α} E \{ A_{i} (α) \}]}^{- 1} A_{i} (α) + [\frac{\partial}{\partial β} E \{ V_{i, j} (θ, α, β) \}] \times {[\frac{\partial}{\partial β} E \{ B_{i} (β) \}]}^{- 1} B_{i} (β) .

Ω can be consistently estimated by $\hat{Ω} = \frac{1}{γ^{2} n} \sum_{i = 1}^{n} {\hat{Q}}_{i}^{2}$ with $γ = \frac{1}{n^{2}} \sum_{i, j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j})$ and

{\hat{Q}}_{i} = - \frac{1}{n} [\sum_{j} \{ V_{i, j} ({\hat{θ}}_{D R}, \hat{α}, \hat{β}) + V_{j, i} ({\hat{θ}}_{D R}, \hat{α}, \hat{β}) \}] + \frac{1}{n} [{\sum_{i \neq j} \frac{\partial V_{i, j} ({\hat{θ}}_{D R}, α, \hat{β})}{\partial α} |}_{α = \hat{α}}] {[{\sum_{i} \frac{\partial A_{i} (α)}{\partial α} |}_{α = \hat{α}}]}^{- 1} A_{i} (\hat{α}) + \frac{1}{n} [{\sum_{i \neq j} \frac{\partial V_{i, j} ({\hat{θ}}_{D R}, \hat{α}, β)}{\partial β} |}_{β = \hat{β}}] {[{\sum_{i} \frac{\partial B_{i} (β)}{\partial β} |}_{β = \hat{β}}]}^{- 1} B_{i} (\hat{β}) .

Theorem 2: Under the regularity conditions similar to (A1)–(A3) given in Web Appendix A, if $(M 1)$ is correctly specified, then $\sqrt{n} ({\hat{θ}}_{I W} - θ) \to N (0, Ω)$ in distribution, where $Ω = Var [\{ E \frac{δ_{i} δ_{j}}{π_{i} π_{j}} D_{i} (1 - D_{j}) \}^{- 1} R_{i} (θ, α_{0})]$, and

R_{i} = - E \{ U_{i, j} (θ, α) + U_{j, i} (θ, α) ∣ O_{i} \} + [\frac{\partial}{\partial α} E \{ U_{i, j} (θ, α) \}] \times {[\frac{\partial}{\partial α} E \{ A_{i} (α) \}]}^{- 1} A_{i} (α) .

Ω can be consistently estimated by $\hat{Ω} = \frac{1}{γ^{2} n} \sum_{i = 1}^{n} {\hat{R}}_{i}^{2}$ with $γ = \frac{1}{n^{2}} \sum_{i, j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} D_{i} (1 - D_{j})$ and

{\hat{R}}_{i} = - \frac{1}{n} [\sum_{j} \{ U_{i, j} ({\hat{θ}}_{I W}, \hat{α}) + U_{j, i} ({\hat{θ}}_{I W}, \hat{α}) \}] + \frac{1}{n} [{\sum_{i \neq j} \frac{\partial U_{i, j} ({\hat{θ}}_{I W}, α)}{\partial α} |}_{α = \hat{α}}] {[{\sum_{i} \frac{\partial A_{i} (α)}{\partial α} |}_{α = \hat{α}}]}^{- 1} A_{i} (\hat{α}) .

A sketch of the proof of Theorems 1 and 2 is provided in Web Appendix A; it follows along similar lines to Rotnitzky et al. (2006). The underlying idea is to derive the influence functions for ${\hat{θ}}_{I W}$ or ${\hat{θ}}_{D R}$ by plugging in the influence functions for $\hat{α}$ and $\hat{β}$. The consistency of ${\hat{θ}}_{D R - N}$ is straightforward to show when either $(M 1)$ or $(M 2)$ holds, and its SE can be computed using a bootstrap procedure, which resamples the data with replacement within disease strata.
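The stratified bootstrap mentioned above can be sketched generically. Here `estimator` is a hypothetical user-supplied callable that takes an array of row indices and returns the estimate computed on those rows (refitting the working models inside it if needed); the default of 200 resamples is only an illustrative choice.

```python
import numpy as np

def stratified_bootstrap_se(estimator, d, n_boot=200, seed=0):
    """Bootstrap SE with resampling done separately within D = 1 and D = 0."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d)
    idx1 = np.flatnonzero(d == 1)
    idx0 = np.flatnonzero(d == 0)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement inside each disease stratum, keeping stratum sizes fixed.
        boot = np.concatenate([
            rng.choice(idx1, size=idx1.size, replace=True),
            rng.choice(idx0, size=idx0.size, replace=True),
        ])
        estimates.append(estimator(boot))
    return float(np.std(estimates, ddof=1))
```

Because the stratum sizes are held fixed, every bootstrap dataset has the same numbers of diseased and non-diseased subjects as the original sample.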

A few remarks are in order. First, as stated in Proposition 1, ${\hat{θ}}_{0}$ is unbiased when δ is independent of X given D; but if δ and X are associated with Z, ${\hat{θ}}_{I W}$, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ are potentially more efficient when the working models are correctly specified. Second, when δ is dependent on X given D but independent of X given D and Z, ${\hat{θ}}_{I W}$, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ are still consistent under suitable conditions, while ${\hat{θ}}_{0}$ is subject to potential bias. Third, ${\hat{θ}}_{D R}$ assumes that the residuals are Gaussian in $(M 2)$ and is subject to model misspecification even if the mean model is correctly specified; ${\hat{θ}}_{D R - N}$ does not impose this restriction.

2.4. MNAR: Sensitivity Analysis

We now consider a case of MNAR, where δ is dependent on X conditional on Z and D; thus, a working model $(M 1)$ that only includes Z and D is misspecified. We investigate a sensitivity analysis to assess the impact on ${\hat{θ}}_{I W}$, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$, as the effect of X on δ is varied. To fix the idea, suppose that logit(π_i) = S(Z_i, D_i; α_S) + U(X_i, α_X), where α_S and α_X are two sets of unknown parameters associated with known functions S and U, respectively. α_X represents the effect of biomarker values on the probability of being missing. Since α_S and α_X cannot be jointly estimated using the observed data, we fix α_X at a set of pre-determined values and estimate α_S using the following set of estimating equations,

\sum_{i = 1}^{n} (\frac{δ_{i}}{π_{i}} - 1) W (Z_{i}, D_{i}) = 0,

(5)

where W(Z_i, D_i) is an arbitrary known vector function with the same dimension as α_S. For instance, if S(Z_i, D_i; α_S) = α_S^T W(Z_i, D_i), then W(Z_i, D_i) is the covariate vector for subject i, which may include interaction terms. Compared to the likelihood equations for the logistic regression, one advantage of the estimating equations (5) is that π_i is not needed when X_i is missing. For every pre-determined value of α_X, we can compute ${\hat{α}}_{S}$ using (5) and ${\hat{π}}_{i}$ for subjects with observed X_i; subsequently we can compute ${\hat{θ}}_{I W}$, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$, none of which requires ${\hat{π}}_{i}$ for subjects with missing X_i. This procedure is repeated for a grid of α_X values, and the resulting estimators are compared to assess the impact of α_X and hence the impact of MNAR. U(X_i, α_X) = 0 corresponds to the case of MAR, and U(X_i, α_X) ≠ 0 corresponds to the case of MNAR. In this sensitivity analysis, we do not assume that the estimation of the parameters of $(M 2)$ is not affected by MNAR. To simplify the sensitivity analysis and, in particular, avoid performing sensitivity analysis for two working models, we exploit the doubly robust property, i.e., if $(M 1)$ is correctly specified then the proposed estimators are consistent, and focus on $(M 1)$ only.
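One way to solve the estimating equations (5) for α_S at a fixed α_X is Newton's method, sketched below. The sketch assumes the linear forms S(Z_i, D_i; α_S) = α_S^T W_i and U(X_i, α_X) = α_X X_i; the second is an illustrative choice and is not specified in the text. Because 1/π_i = 1 + exp(−η_i) with η_i = logit(π_i), only complete cases contribute terms involving X_i. The function name, the zero starting value and the stopping rule are assumptions.

```python
import numpy as np

def solve_alpha_s(W, x, delta, alpha_x, n_iter=50, tol=1e-10):
    """Solve sum_i (delta_i / pi_i - 1) W_i = 0 for alpha_S with alpha_X held fixed."""
    W = np.asarray(W, float)
    obs = np.asarray(delta, int) == 1
    Wo = W[obs]                                        # design rows of complete cases
    offset = alpha_x * np.asarray(x, float)[obs]       # assumed U(X_i, alpha_X) = alpha_X * X_i
    alpha = np.zeros(W.shape[1])
    for _ in range(n_iter):
        e = np.exp(-(Wo @ alpha + offset))             # exp(-eta_i) = 1/pi_i - 1 for observed i
        g = Wo.T @ (1.0 + e) - W.sum(axis=0)           # left-hand side of (5)
        J = -(Wo * e[:, None]).T @ Wo                  # Jacobian of g with respect to alpha_S
        step = np.linalg.solve(J, g)
        alpha = alpha - step                           # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha
```

In a sensitivity analysis this would be called once per value on the α_X grid, and the resulting fitted probabilities 1 / (1 + exp(−η_i)) for complete cases passed on to the estimators of θ.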

3. Simulation studies

We conducted simulations to evaluate the finite sample performance of the proposed estimators, first in the case of MAR where δ is independent of X given D and Z, then in the case of MNAR where δ is dependent on X given D and Z. In our simulations, ${\hat{θ}}_{0}$ , ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ were compared. In addition, we considered another estimator, namely, ${\hat{θ}}_{I M P} = \frac{1}{\sum_{i \neq j} D_{i} (1 - D_{j})} \sum_{i \neq j} D_{i} (1 - D_{j}) [δ_{i} δ_{j} I_{i j} - (δ_{i} δ_{j} - 1) Φ {b_{i j} (\hat{β})}]$ , which only relies on $(M 2)$ and is not doubly robust. While it is not of primary interest in this article, ${\hat{θ}}_{I M P}$ under the correctly specified $(M 2)$ can be used as an optimal benchmark for efficiency as suggested by a referee. To benchmark bias and loss of efficiency due to missing data, a so-called gold standard (GS) estimator was also computed, i.e., ${\hat{θ}}_{G S} = \frac{1}{\sum_{i \neq j} D_{i} (1 - D_{j})} \sum_{i = 1}^{n} \sum_{j = 1}^{n} D_{i} (1 - D_{j}) I_{i j}$ , which uses the underlying “true” biomarker values for all subjects and is not applicable in real data analysis. In Tables 1–3, modified weights as described in Section 2.2 were used for ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ ; to compute the standard error for ${\hat{θ}}_{D R - N}$ , we used 200 bootstrap datasets randomly sampled with replacement from the data while stratified on the disease status. In each simulation, we generated a random sample of n = 200 independent subjects with an equal number of diseased and non-diseased subjects. 
For each simulation setting, 500 Monte Carlo datasets were generated and the results were summarized using the following measures: the mean relative bias (RB), mean of the standard error estimates (SE), Monte Carlo standard deviation of the parameter estimates (SD), square root of the mean squared error (SMSE), and coverage rate (CR) of the 95% Wald confidence interval computed using a logistic transform of θ, as suggested in Pepe (2003, Ch. 5).
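The summary measures listed above can be sketched as follows. The interval is built on the logit scale with a delta-method standard error SE / {θ̂(1 − θ̂)} and then mapped back to (0, 1); this is one standard reading of the logistic-transform Wald interval and is an assumption here, as is the use of the n − 1 divisor for SD. Function names are illustrative.

```python
import numpy as np

def logit_wald_ci(theta_hat, se, z=1.96):
    """Wald interval on the logit scale, back-transformed to the (0, 1) scale."""
    theta_hat = np.asarray(theta_hat, float)
    se = np.asarray(se, float)
    centre = np.log(theta_hat / (1.0 - theta_hat))
    half = z * se / (theta_hat * (1.0 - theta_hat))   # delta-method SE of logit(theta_hat)
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    return expit(centre - half), expit(centre + half)

def summarize(theta_hats, ses, theta_true):
    """RB (%), mean SE, Monte Carlo SD, SMSE and CR (%) over Monte Carlo replicates."""
    theta_hats = np.asarray(theta_hats, float)
    ses = np.asarray(ses, float)
    lo, hi = logit_wald_ci(theta_hats, ses)
    err = theta_hats - theta_true
    return {
        "RB": 100.0 * float(np.mean(err / theta_true)),
        "SE": float(np.mean(ses)),
        "SD": float(np.std(theta_hats, ddof=1)),
        "SMSE": float(np.sqrt(np.mean(err ** 2))),
        "CR": 100.0 * float(np.mean((lo <= theta_true) & (theta_true <= hi))),
    }
```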

Table 1.

Results of simulation study under MAR: comparison of ${\hat{θ}}_{0}$ , ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ using modified weights, when Z₁ and Z₂ are identical. ε is Gaussian (i.e., ε ~ N(0, 1)) or non-Gaussian (i.e., ε = 20{η – E(η)} with η ~ Beta(5, 1)). True θ is 0.722 for Gaussian ε and 0.675 for non-Gaussian ε. The details of true models and misspecified working models are provided in Section 3.1.

	Gaussian ε					non-Gaussian ε
	RB (%)	SE	SD	SMSE	CR (%)	RB (%)	SE	SD	SMSE	CR (%)
${\hat{θ}}_{G S}$	−0.2	0.036	0.037	0.037	94.0	0.0	0.038	0.038	0.038	95.8
${\hat{θ}}_{0}$	11.6	0.050	0.054	0.099	70.0	10.8	0.057	0.056	0.092	80.4
	Both mean models correctly specified
${\hat{θ}}_{I M P}$	−0.1	0.040	0.042	0.042	95.0	−0.8	0.054	0.054	0.054	94.8
${\hat{θ}}_{I W}$	0.5	0.048	0.052	0.052	93.0	1.0	0.056	0.058	0.059	95.0
${\hat{θ}}_{D R}$	0.0	0.045	0.043	0.043	96.4	0.5	0.057	0.055	0.055	96.4
${\hat{θ}}_{D R - N}$	0.0	0.042	0.043	0.043	94.4	0.5	0.056	0.056	0.056	96.0
	Mean model for $(M 1)$ misspecified
${\hat{θ}}_{I W}$	8.4	0.050	0.054	0.081	78.6	8.0	0.056	0.058	0.079	84.8
${\hat{θ}}_{D R}$	0.0	0.040	0.043	0.043	94.0	0.6	0.052	0.055	0.055	94.4
${\hat{θ}}_{D R - N}$	0.0	0.041	0.043	0.043	95.4	0.4	0.056	0.055	0.055	95.8
	Mean model for $(M 2)$ misspecified
${\hat{θ}}_{D R}$	0.5	0.054	0.050	0.050	96.2	0.9	0.061	0.058	0.058	96.2
${\hat{θ}}_{D R - N}$	0.5	0.050	0.050	0.050	94.0	0.9	0.059	0.058	0.058	95.6
	Both mean models misspecified
${\hat{θ}}_{D R}$	8.4	0.050	0.053	0.081	78.6	7.9	0.057	0.058	0.079	86.0
${\hat{θ}}_{D R - N}$	8.4	0.051	0.053	0.081	79.0	7.9	0.058	0.058	0.079	86.2


Table 3.

Results of simulation study under MNAR: comparison of ${\hat{θ}}_{0}$ , ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ , ${\hat{θ}}_{D R - N}$ , ${\hat{θ}}_{I W - S}$ , ${\hat{θ}}_{D R - S}$ , and ${\hat{θ}}_{D R - N - S}$ using modified weights, when ε ~ N(0, 1) and Z₁ and Z₂ are identical. True θ is 0.722. The details of true models and misspecified working models are provided in Section 3.2.

	RB (%)	SE	SD	SMSE	CR (%)
${\hat{θ}}_{G S}$	−0.2	0.036	0.037	0.037	94.0
${\hat{θ}}_{0}$	−8.1	0.053	0.055	0.080	78.2
Correct subset of Z₁ and D included in $(M 1)$
${\hat{θ}}_{I W}$	−5.8	0.052	0.055	0.069	85.4
${\hat{θ}}_{I W - S}$	−1.0	0.049	0.052	0.053	93.0
Correct subset of Z and D included in both models
${\hat{θ}}_{D R}$	−0.5	0.039	0.040	0.041	95.0
${\hat{θ}}_{D R - N}$	−0.5	0.039	0.040	0.040	94.4
${\hat{θ}}_{D R - S}$	−0.2	0.038	0.039	0.039	93.8
${\hat{θ}}_{D R - N - S}$	−0.2	0.038	0.039	0.039	93.6
Incorrect subset of Z₁ and D included in $(M 1)$
${\hat{θ}}_{D R}$	−0.6	0.039	0.040	0.041	94.6
${\hat{θ}}_{D R - N}$	−0.5	0.039	0.040	0.040	93.2
${\hat{θ}}_{D R - S}$	−0.2	0.038	0.039	0.039	94.0
${\hat{θ}}_{D R - N - S}$	−0.2	0.038	0.039	0.039	93.4
Incorrect subset of Z₂ and D included in $(M 2)$
${\hat{θ}}_{D R}$	−5.3	0.049	0.053	0.065	85.0
${\hat{θ}}_{D R - N}$	−5.3	0.050	0.053	0.065	86.2
${\hat{θ}}_{D R - S}$	−0.8	0.047	0.049	0.049	92.8
${\hat{θ}}_{D R - N - S}$	−0.8	0.048	0.049	0.049	93.8


3.1. MAR: δ independent of X given D and Z

Under MAR, we considered two settings, namely, δ dependent on X given D and δ independent of X given D. Corresponding to each setting, we generated the auxiliary variables, $Z_{1} = (Z_{1}^{(1)}, Z_{1}^{(2)}, Z_{1}^{(3)})$ , which are associated with δ, and $Z_{2} = (Z_{2}^{(1)}, Z_{2}^{(2)}, Z_{2}^{(3)})$ , which are associated with X. In the first setting, Z₁ = Z₂ and they were generated from a multivariate Gaussian distribution with a mean μ_Z = (3, −2, −1) and a variance matrix Σ_Z = diag(0.25, 0.25, 0.25), which implies that δ is dependent on X given D and hence ${\hat{θ}}_{0}$ is subject to potential bias. In the second setting, Z₁ and Z₂ were generated from two independent multivariate Gaussian distributions with the same mean and variance as in the first setting, which implies that δ is independent of X given D and hence ${\hat{θ}}_{0}$ is unbiased. Next, we generated X as follows, X = β₀ + β₁D + β₂Z₂ + β₃DZ₂ + ε with β₀ = 1, β₁ = 2.5, β₂ = (3, 3, 3), and β₃ = (.5, .5, .5), which is the true underlying model for $(M 2)$ . Two different residual distributions were considered so that we could compare the performance of ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ ; specifically, ε ~ N(0, σ²) or ε = 20{η − E(η)} with η ~ Beta(5, 1). The resulting true θ is 0.722 for Gaussian ε and 0.675 for non-Gaussian ε. Subsequently, we generated the missing indicator δ from a Bernoulli distribution with mean π which satisfies logit(π) = α₀ + α₁D + α₂Z₁ + α₃DZ₁ with α₀ = 0.3, α₁ = 0.3, α₂ = (0.4,0.5,0.3), and α₃ = (−0.7, −0.7, −0.9); this is the underlying true model for $(M 1)$ . The resulting average probability of missing X is 66.4% in the diseased group and 55.8% in the non-diseased group.
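The data-generating mechanism described above can be sketched in a few lines. The sketch takes σ = 1 for the Gaussian errors, matching the Table 1 caption, and treats δ = 1 as "X observed", as defined in Section 2; the function names, the seed and the return layout are assumptions. The helper `true_theta_gaussian` computes the value of θ implied by the stated parameters, using the fact that X_i − X_j is Gaussian when Z and ε are Gaussian.

```python
import math
import numpy as np

MU_Z = np.array([3.0, -2.0, -1.0])                   # mean of Z; variance is 0.25 per component
B0, B1 = 1.0, 2.5
B2 = np.array([3.0, 3.0, 3.0])
B3 = np.array([0.5, 0.5, 0.5])
A0, A1 = 0.3, 0.3
A2 = np.array([0.4, 0.5, 0.3])
A3 = np.array([-0.7, -0.7, -0.9])

def simulate_mar(n=200, identical_z=True, sigma=1.0, seed=0):
    """One dataset under MAR with Gaussian errors; half of the subjects are diseased."""
    rng = np.random.default_rng(seed)
    d = np.repeat([1, 0], n // 2)
    z1 = rng.normal(MU_Z, 0.5, size=(n, 3))          # sd 0.5 corresponds to variance 0.25
    z2 = z1 if identical_z else rng.normal(MU_Z, 0.5, size=(n, 3))
    x = B0 + B1 * d + z2 @ B2 + d * (z2 @ B3) + rng.normal(0.0, sigma, n)
    eta = A0 + A1 * d + z1 @ A2 + d * (z1 @ A3)      # logit of pi = Pr(delta = 1)
    delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    x_obs = np.where(delta == 1, x, np.nan)          # biomarker as seen by the analyst
    return d, z1, z2, x, x_obs, delta

def true_theta_gaussian(sigma=1.0):
    """theta = Pr(X_i > X_j | D_i = 1, D_j = 0) implied by the parameters above."""
    mean_diff = B1 + MU_Z @ B3
    var_diff = 0.25 * np.sum((B2 + B3) ** 2) + 0.25 * np.sum(B2 ** 2) + 2.0 * sigma ** 2
    return 0.5 * (1.0 + math.erf(mean_diff / math.sqrt(2.0 * var_diff)))
```

Under these assumptions `true_theta_gaussian()` evaluates to roughly 0.722, in line with the true θ quoted for Gaussian ε.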

When computing ${\hat{θ}}_{I W}$, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$, we fitted the two working models for δ and X, namely, $(M 1)$ and $(M 2)$, under the following four scenarios: 1) the mean structure is correctly specified for both working models, i.e., Z₁ and D are included in $(M 1)$, and Z₂ and D are included in $(M 2)$; 2) the mean structure is misspecified for $(M 1)$, i.e., only $Z_{1}^{(1)}$ and D are used in $(M 1)$; 3) the mean structure is misspecified for $(M 2)$, i.e., only $Z_{2}^{(1)}$ and D are used in $(M 2)$; and 4) the mean structure is misspecified for both working models, i.e., only $Z_{1}^{(1)}$ and D are included in $(M 1)$ and only $Z_{2}^{(1)}$ and D are included in $(M 2)$. We note that ${\hat{θ}}_{D R}$ assumes that X follows Gaussian distributions. Consequently, if the residuals for X follow a Gaussian distribution, e.g., ε ~ N(0, σ²), then the correct specification of the mean structure in $(M 2)$ also indicates the correct specification of the conditional distribution for X when computing ${\hat{θ}}_{D R}$. However, if the residual distribution is not Gaussian, e.g., ε = 20{η − E(η)} with η ~ Beta(5, 1), the conditional distribution for X is misspecified when computing ${\hat{θ}}_{D R}$, even if the mean structure is correctly specified in $(M 2)$. Since ${\hat{θ}}_{D R - N}$ is robust to the misspecification of distributions of the residuals for X, it should remain consistent in both cases.

3.1.1. The case of δ dependent on X given D.

In this setting, we let Z₁ and Z₂ be identical, hence δ is dependent on X given D. Table 1 presents the results for two different residual distributions for X. We first discuss the case of Gaussian ε. ${\hat{θ}}_{0}$ shows a large RB of 11.6% with a low coverage rate of 70.0%. ${\hat{θ}}_{I W}$ exhibits negligible bias and a CR close to the nominal level when $(M 1)$ is correctly specified; however, its bias becomes substantial and its CR degrades considerably to 78.6% when $(M 1)$ is misspecified. When at least one working model is correctly specified, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ show negligible bias that is comparable to ${\hat{θ}}_{G S}$ and good coverage properties. In particular, as long as $(M 2)$ is correctly specified, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ are more efficient than ${\hat{θ}}_{I W}$ , and are almost as efficient as ${\hat{θ}}_{I M P}$ ; in this case, negligible loss of efficiency is observed even if $(M 1)$ is misspecified. By contrast, when $(M 2)$ is misspecified and $(M 1)$ is correctly specified, the loss of efficiency is considerable for ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ . These observations are consistent with what has been reported in the literature, i.e., the correct specification of $(M 2)$ for X is more important in terms of improving efficiency of ${\hat{θ}}_{D R}$ . When both working models are misspecified, the bias and MSE of ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ are still similar to or less than those of ${\hat{θ}}_{I W}$ or ${\hat{θ}}_{0}$ .

When the residuals are not Gaussian, $(M 2)$ is always misspecified for ${\hat{θ}}_{D R}$ . Our results in Table 1 show that ${\hat{θ}}_{D R}$ is fairly robust to the misspecified distribution of ε as long as the conditional mean of X in $(M 2)$ is correctly specified. In addition, most observations made for Gaussian ε still hold for non-Gaussian ε. In this case, ${\hat{θ}}_{I M P}$ serves as an approximate benchmark for efficiency, since ${\hat{θ}}_{I M P}$ is also fairly robust to a misspecified distribution for ε and it is generally difficult to obtain an exact “imputation” estimator when ε is non-Gaussian. Similar results were observed in our additional simulations with other non-Gaussian distributions for ε, e.g., a χ² distribution.

3.1.2. The case of δ independent of X given D.

In this setting, Z₁ and Z₂ are two separate sets of auxiliary variables, hence δ is independent of X given D. Table 2 presents the results for both Gaussian and non-Gaussian residuals. In all cases, all estimators exhibit negligible bias and satisfactory coverage properties, which is consistent with our discussion in Section 2. Again, as long as $(M 2)$ is (approximately) correctly specified, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ are almost as efficient as ${\hat{θ}}_{I M P}$ ; they perform no worse than ${\hat{θ}}_{I W}$ and ${\hat{θ}}_{0}$ in other settings. As with the case of δ dependent on X given D in Section 3.1.1, the results are very similar for two different types of residual distributions for X.

Table 2.

Results of simulation study under MAR: comparison of ${\hat{θ}}_{0}$ , ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ using modified weights, when Z₁ and Z₂ are independent. ε is Gaussian (i.e., ε ~ N(0, 1)) or non-Gaussian (i.e., ε = 20{η − E(η)} with η ~ Beta(5, 1)). True θ is 0.722 for Gaussian ε and 0.675 for non-Gaussian ε. The details of true models and misspecified working models are provided in Section 3.1.

	Gaussian ε					non-Gaussian ε
	RB (%)	SE	SD	SMSE	CR (%)	RB (%)	SE	SD	SMSE	CR (%)
${\hat{θ}}_{G S}$	0.2	0.036	0.035	0.035	96.0	0.2	0.038	0.036	0.036	96.0
${\hat{θ}}_{0}$	−0.1	0.059	0.057	0.057	95.8	0.4	0.063	0.061	0.061	96.4
	Both mean models correctly specified
${\hat{θ}}_{I M P}$	0.1	0.039	0.040	0.040	94.2	−0.6	0.051	0.050	0.050	95.8
${\hat{θ}}_{I W}$	−0.1	0.059	0.060	0.059	94.8	0.3	0.062	0.065	0.065	94.6
${\hat{θ}}_{D R}$	0.2	0.044	0.040	0.040	96.4	0.1	0.056	0.053	0.053	95.8
${\hat{θ}}_{D R - N}$	0.2	0.041	0.041	0.041	95.4	0.1	0.056	0.053	0.053	95.2
	Mean model for $(M 1)$ misspecified
${\hat{θ}}_{I W}$	−0.2	0.058	0.059	0.059	95.0	0.3	0.062	0.062	0.062	95.2
${\hat{θ}}_{D R}$	0.2	0.041	0.040	0.040	95.0	0.2	0.053	0.051	0.051	95.8
${\hat{θ}}_{D R - N}$	0.2	0.041	0.040	0.040	94.8	0.1	0.054	0.051	0.051	96.2
	Mean model for $(M 2)$ misspecified
${\hat{θ}}_{D R}$	−0.2	0.059	0.055	0.055	96.0	0.4	0.063	0.062	0.062	95.4
${\hat{θ}}_{D R - N}$	−0.2	0.057	0.055	0.055	96.4	0.4	0.063	0.062	0.062	95.2
	Both mean models misspecified
${\hat{θ}}_{D R}$	−0.2	0.054	0.054	0.054	95.0	0.4	0.059	0.059	0.059	95.2
${\hat{θ}}_{D R - N}$	−0.2	0.055	0.054	0.054	95.8	0.4	0.061	0.059	0.059	95.8


We repeated the simulations in Tables 1 and 2 using the original weights (Web Appendix B), and the results are almost the same except that the performance of the bootstrap SE for ${\hat{θ}}_{D R - N}$ deteriorates somewhat.
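
For readers who want to reproduce summaries in the format of Tables 1 and 2, the column headings can be computed from replicate-level output as sketched below. The definitions used here are the conventional ones and are assumptions of this sketch (the paper defines these quantities outside this section): RB as percent relative bias, SE as the average estimated standard error, SD as the empirical standard deviation of the estimates, SMSE as the square root of the mean squared error, and CR as the coverage of a nominal 95% Wald interval. The function name `summarize` is hypothetical.

```python
import numpy as np


def summarize(est, se, theta, z=1.96):
    """Summarize Monte Carlo replicates of an estimator of theta.

    est: point estimates, one per replicate
    se:  estimated standard errors, one per replicate
    The definitions below are conventional choices, assumed for this sketch.
    """
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    covered = (est - z * se <= theta) & (theta <= est + z * se)
    return {
        "RB": 100.0 * (est.mean() - theta) / theta,        # relative bias, %
        "SE": se.mean(),                                    # mean estimated SE
        "SD": est.std(ddof=1),                              # empirical SD
        "SMSE": float(np.sqrt(np.mean((est - theta) ** 2))),
        "CR": 100.0 * covered.mean(),                       # coverage rate, %
    }
```

Under these definitions SMSE² equals the squared bias plus the (population-form) variance of the estimates, which is why SD and SMSE nearly coincide in the tables whenever RB is close to zero.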

3.2. MNAR: δ dependent on X given D and Z

We now consider the case of MNAR where δ is dependent on X conditional on D and Z, i.e., the true model for δ is $logit (π) = α_{0} + α_{Z} Z_{1}^{(3)} + α_{D} D + α_{X} X$ with (α₀, α_Z, α_D, α_X) = (−1, 0.2, 0.5, 0.3). The rest of the simulation setup is identical to that in Section 3.1. The resulting average probability of missing X is 57.4% in the diseased group and 31.5% in the non-diseased group. We focused on the case where Z₁ and Z₂ are identical and ε is Gaussian; in this case, the true θ remains 0.722. Our primary goal is to compare ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ with their corresponding sensitivity estimators as described in Section 2.4, namely, ${\hat{θ}}_{I W - S}$ , ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ , for which the estimating equations (5) were used to estimate α_S = (α₀, α_Z, α_D) with α_X fixed at its true value. The rest of the estimating procedure remains the same for all estimators. As with the case of MAR in Section 3.1, we investigated the impact of a misspecified $(M 1)$ and/or $(M 2)$ ; specifically, we considered a misspecified $(M 1)$ that includes $Z_{1}^{(1)}$ and D and a misspecified $(M 2)$ that includes only $Z_{2}^{(3)}$ and D. We also note that X is included as a covariate in $(M 1)$ for ${\hat{θ}}_{I W - S}$ , ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ , but not for ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ or ${\hat{θ}}_{D R - N}$ . Thus, when D and the correct subset of Z₁ (i.e., $Z_{1}^{(3)}$ ) are included in $(M 1)$ , $(M 1)$ is correctly specified for ${\hat{θ}}_{I W - S}$ , ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ , but is misspecified for ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ .
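
The MNAR mechanism above differs from the MAR design only in the model for δ, which now carries X itself as a covariate. A minimal sketch follows, with the same caveats as any illustration: `n`, `prev`, the seed and the function names are assumptions, σ = 1 is taken from the caption of Table 2, and the estimating equations (5) are not reproduced. The helper `pi_with_offset` only shows the structure exploited by the sensitivity estimators, namely that α_X X enters as a known offset while α_S is the part to be estimated.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def pi_with_offset(alpha_s, alpha_x, w, x):
    """logit(pi) = alpha_s' W + alpha_x * X, with alpha_x held fixed."""
    return expit(w @ alpha_s + alpha_x * x)


def simulate_mnar(n, prev=0.5, seed=0):
    """One data set under the MNAR design (Z1 = Z2 = Z, Gaussian residuals).

    n, prev (disease prevalence) and seed are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    d = rng.binomial(1, prev, size=n)
    z = np.array([3.0, -2.0, -1.0]) + 0.5 * rng.standard_normal((n, 3))
    x = (1.0 + 2.5 * d + z @ np.full(3, 3.0)
         + d * (z @ np.full(3, 0.5)) + rng.standard_normal(n))
    # W = (1, Z^(3), D); (alpha_0, alpha_Z, alpha_D, alpha_X) = (-1, 0.2, 0.5, 0.3)
    w = np.column_stack([np.ones(n), z[:, 2], d])
    pi = pi_with_offset(np.array([-1.0, 0.2, 0.5]), 0.3, w, x)
    delta = rng.binomial(1, pi)
    return d, z, x, delta, pi
```

Because α_X = 0.3 is positive, π increases with X even after conditioning on D and Z, which is what distinguishes this design from the MAR settings. The group means of `delta` can be compared with the percentages reported above.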

Table 3 presents the simulation results. First, ${\hat{θ}}_{0}$ again exhibits substantial bias under MNAR. We now compare ${\hat{θ}}_{I W}$ and ${\hat{θ}}_{I W - S}$ . When $(M 1)$ does not account for the effect of X, ${\hat{θ}}_{I W}$ shows considerable bias even if $(M 1)$ includes D and the correct subset of Z₁. On the other hand, ${\hat{θ}}_{I W - S}$ , which accounts for the effect of X, shows negligible bias. Next, we compare ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ with ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ . When correct subsets of Z and D are included in both working models, $(M 1)$ is still misspecified for ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ . However, since $(M 2)$ is correctly specified, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ exhibit negligible bias and good coverage properties as a result of their double robustness, and their efficiency is comparable to that of ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ . These results still hold when $(M 1)$ includes the incorrect subset of auxiliary variables and $(M 2)$ is correctly specified. When an incorrect subset of Z₂ is included in $(M 2)$ , both working models are misspecified for ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ ; consequently, ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ exhibit considerable bias. In all three settings, ${\hat{θ}}_{D R - S}$ and ${\hat{θ}}_{D R - N - S}$ show negligible bias, but their SDs increase when $(M 2)$ is misspecified, which is consistent with the earlier findings that $(M 2)$ is more important in terms of improving efficiency.

4. Data Analysis

We illustrate our methods using an observational psychiatric study, which was concerned with the impact of maternal depression during pregnancy on infant outcomes. In this study, participants were enrolled no later than week 28 of gestation and evaluated at each trimester across pregnancy. As part of the study, the presence (or absence) of a major depressive episode (disease status, D) was determined at each visit by the Mood Module of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID) (First et al., 2002), which needs to be administered by a trained research professional and is therefore considerably more time-consuming and difficult to obtain in practice than a self-rated scale. At the same time, some subjects also completed the self-rated Edinburgh Postnatal Depression Scale (EPDS) (Cox and Holden, 1987) at each visit.

In female mental health research, several rating scales have been developed for identifying postpartum depression (Fergerson et al., 2002; Perfetti et al., 2004), and in particular the self-rated EPDS, which can be obtained fairly easily in practice, has emerged as a widely used instrument for postpartum depression screening and detection (Austin et al., 2005; Felice et al., 2006). In contrast, there are no validated tools to assess depression during pregnancy. In practice, the EPDS, developed for postpartum use, has been increasingly used to identify depression during pregnancy and to screen for those at risk for developing depression during pregnancy. While not designed for such a purpose, data collected from this study have recently been used to evaluate EPDS as a biomarker for the diagnosis of maternal depression throughout pregnancy. For the purpose of illustration, we focus on the data collected from the second trimester; the subset of the study population who had data in the second trimester was used, giving a sample size of n = 517 in the analysis. The outcome of interest is the presence of a major depressive episode (D) and is confirmed for all subjects, whereas EPDS is the biomarker of interest and is missing in 79% of the subjects. Additional auxiliary variables were also measured in this study, including the mother’s age, race, marital status and education level, and whether or not it was the first pregnancy. In addition, a research interviewer masked to treatment status administered the Structured Interview Guide for the Hamilton Rating Scale for Depression to obtain the 17-item score (HRSD17), which is known to be highly correlated with EPDS. These variables are treated as auxiliary variables (Z) and are used to build $(M 1)$ and $(M 2)$ .

We conducted a sensitivity analysis for ${\hat{θ}}_{0}$ , ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ and ${\hat{θ}}_{D R - N}$ as described in Section 2.4. Specifically, we considered an $(M 1)$ that is similar to what is discussed in Section 2.4, i.e., $logit (π) = α_{S}^{T} W (Z, D) + α_{X} X$ , where W(Z, D) includes the intercept and interaction terms between auxiliary variables Z and D. In fitting this $(M 1)$ , estimating equations (5) were used with α_X fixed at −1, 0 and 1, where α_X = 0 corresponds to the case of MAR, and α_X = −1 or 1 corresponds to the case of MNAR. In our analysis, all continuous variables including X were standardized to have mean 0 and unit standard deviation. Consequently, α_X captures the effect of a one-SD change in X. Table 4 presents the results using modified weights. The impact of different α_X values is moderate on ${\hat{θ}}_{I W}$ , ${\hat{θ}}_{D R}$ , and ${\hat{θ}}_{D R - N}$ , and the estimates using different methods including ${\hat{θ}}_{0}$ are comparable. This indicates that the missingness of X (δ) is likely close to being independent of X given D. Nevertheless, ${\hat{θ}}_{D R}$ , which incorporates information from auxiliary variables, is more efficient than the other estimators. Since the proportion of missing data is very high in the data, the bootstrap SE of ${\hat{θ}}_{D R - N}$ is greater than the SE of ${\hat{θ}}_{D R}$ , but it is still smaller than the SE of ${\hat{θ}}_{I W}$ . We repeated this analysis using the original weights in Web Appendix C; while the main results remain similar, a larger bootstrap SE for ${\hat{θ}}_{D R - N}$ is observed as a result of large and unstable weights.
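
To make the scale of the sensitivity parameter concrete, note that under the logistic form above, with Z and D held fixed, a one-SD increase in the standardized X multiplies the odds π/(1 − π) by exp(α_X). The display below is a worked reading of the three values used in the analysis and follows directly from the model; writing π(x) for π evaluated at X = x is notation introduced here.

$$
\frac{π(x + 1) / \{1 - π(x + 1)\}}{π(x) / \{1 - π(x)\}} = \exp(α_{X}), \qquad \exp(-1) \approx 0.37, \quad \exp(0) = 1, \quad \exp(1) \approx 2.72 .
$$

The values α_X = −1 and α_X = 1 therefore span a range in which the odds change by a factor of roughly 2.7 in either direction per SD of X, with α_X = 0 recovering MAR.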

Table 4.

Sensitivity analysis using the modified weights for estimating the ROC AUC (θ) in the psychiatric study

			Estimate	SE
		${\hat{θ}}_{0}$	0.861	0.038
	α_X = −1		α_X = 0		α_X = 1
	Estimate	SE	Estimate	SE	Estimate	SE
${\hat{θ}}_{I W}$	0.864	0.037	0.851	0.040	0.849	0.042
${\hat{θ}}_{D R}$	0.873	0.028	0.852	0.030	0.841	0.032
${\hat{θ}}_{D R - N}$	0.873	0.035	0.852	0.038	0.841	0.038
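
The efficiency comparison in Table 4 can be made explicit with a short calculation on the reported numbers. The sketch below copies the estimates and SEs from the table; the Wald-type 95% interval and the use of the squared SE ratio as a variance ratio are conventions assumed here for illustration, not quantities reported in the paper, and the names `TABLE4`, `wald_ci` and `variance_ratio` are hypothetical.

```python
# (estimate, SE) pairs copied from Table 4 (modified weights), keyed by alpha_X.
TABLE4 = {
    "IW": {-1: (0.864, 0.037), 0: (0.851, 0.040), 1: (0.849, 0.042)},
    "DR": {-1: (0.873, 0.028), 0: (0.852, 0.030), 1: (0.841, 0.032)},
    "DR-N": {-1: (0.873, 0.035), 0: (0.852, 0.038), 1: (0.841, 0.038)},
}


def wald_ci(est, se, z=1.96):
    """Nominal 95% Wald interval (an assumption; not reported in the paper)."""
    return est - z * se, est + z * se


def variance_ratio(se_other, se_ref):
    """Estimated variance of another estimator relative to a reference one."""
    return (se_other / se_ref) ** 2


for a in (-1, 0, 1):
    est, se = TABLE4["DR"][a]
    lo, hi = wald_ci(est, se)
    ratio = variance_ratio(TABLE4["IW"][a][1], se)
    print(f"alpha_X = {a:+d}: DR interval ({lo:.3f}, {hi:.3f}), "
          f"Var(IW) / Var(DR) = {ratio:.2f}")
```

A ratio above 1 means the inverse-weighted estimator has the larger estimated variance at that value of α_X.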


With ${\hat{θ}}_{D R}$ ranging from 0.841 to 0.873, our results suggest that EPDS has very good discriminative power during the second trimester. However, in this study, only a subset of the study population had depression status confirmed during each perinatal window. As a result, in addition to missing values in the biomarker, verification bias is potentially in play as well. Furthermore, both the rating scale and the presence of a major depressive episode were repeatedly measured throughout the pregnancy. Therefore, it is of substantial interest in future studies to investigate methods that can account for both missing biomarker values and verification bias and that can accommodate repeatedly measured biomarker values and disease status when estimating the ROC AUC.

5. Discussion

We have proposed and contrasted several estimators of the ROC AUC when the biomarker value is missing for some subjects. Our numerical studies show that the doubly robust estimators perform as well as or better than other estimators in all cases, even when both working models are misspecified. ${\hat{θ}}_{D R}$ is also fairly robust to a misspecified residual distribution for the biomarker variable (X). Since only ranks of X are used in estimating θ, a correctly specified conditional mean is more important, and the impact of a misspecified residual distribution may be limited given the correctly specified conditional mean. The bootstrap procedure for obtaining the SE of ${\hat{θ}}_{D R - N}$ is computationally more expensive and also makes it more susceptible to large and unstable weights. Thus, in practice, we recommend the use of ${\hat{θ}}_{D R}$ and stabilized weights such as ours, and emphasize the importance of identifying an (approximately) correct $(M 2)$ . We also note that ${\hat{θ}}_{D R}$ can readily accommodate categorical biomarker values, e.g., a baseline logit model (Agresti, 2002) can be used to model the conditional distribution of a categorical biomarker variable.
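
The remark that only the ranks of X matter can be checked directly on complete data: the empirical AUC in its Mann-Whitney form is unchanged by any strictly increasing transformation of X. The sketch below illustrates this for the complete-data statistic only; the missing-data estimators of Section 2 are not reproduced, and the toy data, seed and function name are assumptions made for the illustration.

```python
import numpy as np


def auc_mann_whitney(x, d):
    """Complete-data empirical P(X_diseased > X_nondiseased); ties count 1/2."""
    x1, x0 = x[d == 1], x[d == 0]
    diff = x1[:, None] - x0[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


rng = np.random.default_rng(0)
d = rng.binomial(1, 0.5, size=400)
x = 1.0 + 2.5 * d + 3.0 * rng.standard_normal(400)  # toy biomarker values

auc_raw = auc_mann_whitney(x, d)
auc_cubed = auc_mann_whitney(x ** 3, d)            # strictly increasing map
ranks = np.argsort(np.argsort(x)).astype(float)    # ranks 0..n-1 (no ties here)
auc_ranked = auc_mann_whitney(ranks, d)

# All three agree exactly because every pairwise comparison is preserved.
assert auc_raw == auc_cubed == auc_ranked
```

This invariance is one way to see why a misspecified residual distribution may matter less than a misspecified conditional mean: the former can leave the ordering of predicted values largely intact, while the latter shifts it.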

More recently, Cao et al. (2009) investigated alternative doubly robust estimators for estimating a population mean; their methods achieve minimum variance under incorrectly specified $(M 2)$ and correctly specified $(M 1)$ , and they do not suffer from large and unstable weights. While their enhanced model for $(M 1)$ can be readily adopted in our methods as an alternative to alleviate the problem of large and unstable weights, it is more involved to extend their approach of minimizing variance under misspecified $(M 2)$ and correctly specified $(M 1)$ to the estimation of the ROC AUC, as complications arise from the use of U-statistics in our methods. Potential future research may also include extending the sensitivity analysis to $(M 2)$ and investigating more complicated missingness patterns, e.g., when auxiliary variables are also missing and missingness is not monotone, for which an imputation approach may be more practical.

Supplementary Material

NIHMS1704813-supplement-Supplementary_Material.pdf (151.6 KB, pdf)

Acknowledgements

We thank Editor Verbeke, an associate editor and two referees for their insightful suggestions which greatly improved an earlier draft of this manuscript.

Footnotes

Supplementary Materials

Web Appendices referenced in Sections 2 and 3 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

References

Agresti A (2002). Categorical Data Analysis, 2nd Edition. John Wiley & Sons.
Austin M, D. H-P, Saint K, and Parker G (2005). Antenatal screening for the prediction of postnatal depression: validation of a psychosocial pregnancy risk questionnaire. Acta Psychiatr Scand. 112, 310–317.
Bamber D (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology. 12, 387–415.
Cao W, Tsiatis A, and Davidian M (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96, 723–734.
Cox J and Holden J (1987). Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 150, 782–786.
Felice E, Saliba J, Grech V, and Cox J (2006). Validation of the Maltese version of the Edinburgh Postnatal Depression Scale. Arch Womens Ment Health. 9, 75–80.
Fergerson S, Jamieson D, and Lindsay M (2002). Diagnosing postpartum depression: can we do better? Am J Obstet Gynecol. 186, 899–902.
First M, Spitzer R, Gibbon M, and Williams J (2002). Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Patient Edition (SCID-IP, 11/2002 Revision). Washington, DC: American Psychiatric Press.
Fluss R, Reiser B, Faraggi D, and Rotnitzky A (2009). Estimation of the ROC curve under verification bias. Biometrical Journal. 51(3), 475–490.
Green D and Swets J (1966). Signal detection theory and psychophysics. Wiley, New York.
Kosinski A and Barnhart H (2003). Accounting for non-ignorable verification bias in assessment of diagnostic test. Biometrics 59, 163–171.
Little R and Rubin D (2002). Statistical Analysis with Missing Data. 2nd Edition. Wiley & Sons.
Pepe M (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
Perfetti J, Clark R, and Fillmore C (2004). Postpartum depression: identification, screening, and treatment. Wis Med J. 103, 56–63.
Robins J, Rotnitzky A, and Zhao L (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 89, 846–866.
Rosenbaum P and Rubin D (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
Rotnitzky A, Faraggi D, and Schisterman E (2006). Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias. Journal of the American Statistical Association. 101, 1276–1288.
Rotnitzky A and Robins J (1997). Analysis of semiparametric regression models with non-ignorable nonresponse. Statistics in Medicine. 16, 81–102.
Scharfstein D, Rotnitzky A, and Robins J (1999). Adjusting for nonignorable dropout using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association 94, 1096–1120.
Zhou X (1993). Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Communications in Statistics - Theory and Methods. 22, 3177–3198.
Zhou X (1994). Effect of verification bias on positive and negative predictive values. Statistics in Medicine. 13, 1737–1745.
Zhou X (1998). Correcting for verification bias in studies of a diagnostic test’s accuracy. Statistical Methods in Medical Research. 7, 337–353.
Zweig M and Campbell G (1993). Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clinical Chemistry. 39, 561–577.

