Abstract
In studies of older adults, researchers often recruit proxy respondents, such as relatives or caregivers, when study participants cannot provide self-reports (e.g., due to illness). Proxies are usually only sought to report on behalf of participants with missing self-reports; thus, either a participant self-report or proxy report, but not both, is available for each participant. Furthermore, the missing-data mechanism for participant self-reports is not identifiable and may be nonignorable. When exposures are binary and participant self-reports are conceptualized as the gold standard, substituting error-prone proxy reports for missing participant self-reports may produce biased estimates of outcome means. Researchers can handle this data structure by treating the problem as one of misclassification within the stratum of participants with missing self-reports. Most methods for addressing exposure misclassification require validation data, replicate data, or an assumption of nondifferential misclassification; other methods may result in an exposure misclassification model that is incompatible with the analysis model. We propose a model that makes none of the aforementioned requirements and still preserves model compatibility. Two user-specified tuning parameters encode the exposure misclassification model. Two proposed approaches estimate outcome means standardized for (potentially) high-dimensional covariates using multiple imputation followed by propensity-score methods. The first method is parametric and uses maximum likelihood to estimate the exposure misclassification model (i.e., the imputation model) and the propensity score model (i.e., the analysis model); the second method is non-parametric and uses boosted classification and regression trees to estimate both models. We apply both methods to a study of elderly hip-fracture patients.
Keywords: Exposure Misclassification, Gerontology, Missing Data, Proxy Respondents
1. Introduction
Studies of older adults often include interview questions on participants’ perceptions, attitudes, and beliefs. This information is used to measure disability, depressive symptoms, and other subjective constructs as predictors of physical decline and mortality. A common problem in assessment is that frail participants are often unable to respond to interview questions due to cognitive impairment or illness. Researchers are concerned about excluding this segment of the population from analysis, because doing so would lead to selection bias from missing data and would produce findings that represent only healthier members of the older adult population.
To prevent selection bias, researchers often attempt to recruit a proxy respondent to report on behalf of a participant who has missing data [1]. However, the proxy report is an error-prone version of the missing participant self-report. When the response options result in a binary exposure variable, the proxy report is likely subject to differential misclassification [2], and substituting the proxy report in place of the missing participant self-report in analysis can lead to misclassification bias either toward or away from the null. Furthermore, since most studies only recruit proxies to report on behalf of participants with missing data, the proxy reports cannot be empirically validated.
Statistical analysis with exposure misclassification is challenging, because the misclassification mechanism is not identified by the data and most methods require validation data, replicate data, or an assumption of nondifferential misclassification (see [3, 4] and references therein). Furthermore, most conventional methods for misclassification were developed to address exposures missing by design, not those missing by happenstance via an unidentifiable and potentially nonignorable mechanism. To overcome these limitations and address the data structure in studies with proxy respondents, we propose a statistical approach that treats the issue as a misclassification problem embedded within a missing-data framework. The proposed approach applies multiple imputation [5] to estimate missing exposures using proxy reports and applies propensity-score methods to calculate standardized outcome means. The exposure misclassification model is indexed by two user-specified tuning parameters that represent an assumed level of agreement between the observed proxy and missing participant responses. A sensitivity analysis can be performed by varying the tuning parameters.
The proposed method builds on our previous work addressing misclassified or mismeasured outcome data from proxy respondents [6, 7]. An additional complication of misclassified exposure data is the need to specify an exposure misclassification model (an imputation model), and ensure that it is compatible with the analysis model, which in this case is the propensity-score model [3, 4]. To handle this complication, we propose both parametric and non-parametric approaches to estimate both the imputation and analysis models, where parametric estimation is carried out using maximum likelihood and non-parametric estimation is carried out using boosted classification and regression trees (CART) [8–10].
We assess the methods’ performance using a simulation study and apply the methods to a study of hip fracture patients to determine whether perceived rapid recovery of mobility is associated with two-year survival [11]. Throughout the paper, we conceptualize participant self-reports as the gold standard and proxy reports as an error-prone version to be consistent with the study design most commonly used in aging research [12].
2. Data and Model
Let Y denote an outcome variable and let X denote the true value of a binary exposure variable. Let Z denote a vector of covariates. Our goal is to estimate
μx = E_Z{E(Y | X = x, Z)},  (1)
the x-specific (x = 0, 1) mean of Y standardized for Z. If Z comprises all confounders of the X-Y relation, then μ1 − μ0 equals the causal contrast E[Y(1) − Y(0)], where Y(x) denotes the potential outcome if exposure X were set to value x. Equation (1) is a special case of the g-formula proposed by Robins [13]. If X were completely observed without error, Equation (1) could be estimated using propensity-score methods [14–16], including inverse-probability weighting [17]. This relation can be seen by re-writing Equation (1) as
μx = E{I(X = x)Y / P(X = x | Z)},  (2)
where P(X = x | Z = z) is the propensity score of having X = x given covariates Z = z. Consider a study with n participants where Xi, Yi, and Zi denote the variables for the ith participant, i = 1, …, n. A complete-data estimator for μx is essentially a Monte Carlo integration over the distribution of Z among those with X = x:
μ̂x = {Σi=1,…,n I(Xi = x)Yi / P̂(Xi = x | Zi)} / {Σi=1,…,n I(Xi = x) / P̂(Xi = x | Zi)},  (3)
where I(·) is the indicator function and P̂(Xi = x | Zi) is an estimator for P(Xi = x | Zi). Thus, consistent estimation of μx can be accomplished by consistent estimation of P(X = x | Z = z). Equation (3) is the inverse-probability weighted estimator; the link between standardization and inverse-probability weighting is further described by Sato and Matsuyama [18].
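As a concrete illustration of Equation (3), the Python sketch below (illustrative only; not the paper's R code) computes the standardized means on simulated data, assuming for simplicity that the true propensity scores are known rather than estimated:

```python
import numpy as np

def standardized_mean(x, y, ps, level):
    """Inverse-probability-weighted estimate of mu_x as in Equation (3).

    ps[i] estimates P(X_i = 1 | Z_i); weights are formed for the requested
    exposure level and normalized by their sum (a hypothetical helper)."""
    ind = (x == level).astype(float)
    p_level = ps if level == 1 else 1.0 - ps
    w = ind / p_level
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
ps_true = 1.0 / (1.0 + np.exp(-0.5 * z))    # true P(X = 1 | Z)
x = rng.binomial(1, ps_true)
y = 1.0 + 2.0 * x + z + rng.normal(size=n)  # true standardized difference is 2

mu1 = standardized_mean(x, y, ps_true, 1)
mu0 = standardized_mean(x, y, ps_true, 0)
```

With the true propensity scores, mu1 − mu0 is close to the causal contrast of 2; in practice the fitted P̂(Xi = x | Zi) is plugged in.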
Analysis is more complex when X is missing for some participants and is measured by proxy reports. In this scenario, unbiased estimation of P(X = x | Z = z) in Equation (2) using proxy reports requires an exposure model that 1) differentiates between data from participant self-reports and proxy reports, and 2) addresses the misclassification mechanism of proxy reports. Section 2.1 below differentiates participant self-reports and proxy reports using a missing-data framework, and Section 2.2 proposes a model for misclassification for the subgroup of participants who require proxy reports.
2.1. Missing Data Framework
We introduce additional notation to differentiate between participant self-reports and proxy reports and to express an exposure model using the missing-data framework. Let X* denote the proxy-reported binary exposure variable. Let R be an indicator for participant response, where R = 1 when X is observed, and R = 0 when X is missing (X* is observed). Gerontology researchers may be concerned that X is missing not at random (MNAR) in the sense of Rubin [19], because older adults who are frailer and sicker may be more reliant on proxy respondents than their healthier counterparts. The implications of incomplete exposure data are somewhat counterintuitive. For example, if exposure is MNAR, but missingness does not depend on Y, then means of Y estimated using complete-case methods will be unbiased. In contrast, if exposure is missing at random (MAR), but missingness depends on Y, then means of Y estimated using complete-case methods will be biased [20, 21]. To minimize assumptions and maximize generality, we will presume that the exposure is MNAR, and that missingness may depend on Y (and covariates Z).
When X is MNAR, then

P(X = x | Z, Y, R = r) ≠ P(X = x | Z, Y)

for x = (0, 1) and r = (0, 1). In this case, researchers must specify the conditional joint distribution of X and R. However, since the joint distribution is not identifiable from the data, its specification depends on investigator-supplied assumptions about missingness. Researchers encode their assumptions by factoring the joint distribution into either a selection model,

P(X = x, R = r | Z, Y) = P(R = r | X = x, Z, Y)P(X = x | Z, Y),

or a pattern-mixture model [22, 23],

P(X = x, R = r | Z, Y) = P(X = x | Z, Y, R = r)P(R = r | Z, Y).
The selection model is often used, because it leads to direct estimation of P(X = x | Z, Y), where integration over Y is all that is needed to obtain propensity scores P(X = x | Z) to plug into Equation (2). However, we will specify a pattern-mixture model, because it can easily accommodate proxy reports, X*. To see this, note that P(X = x | Z, Y) can be calculated from a pattern-mixture model by
P(X = x | Z, Y) = P(X = x | Z, Y, R = 1)P(R = 1 | Z, Y) + P(X = x | Z, Y, R = 0)P(R = 0 | Z, Y).  (4)
Thus, an unbiased estimate of P(X = x | Z, Y) can be obtained by plugging unbiased estimates into the right side of Equation (4). The quantity P(X = x | Z, Y, R = 1) is estimable from observed participant self-reports; P(R = r | Z, Y), where r = (0, 1), is also estimable. In contrast, P(X = x | Z, Y, R = 0) is not estimable; however, P(X* = x* | Z, Y, R = 0), where x* = (0, 1), is estimable. Therefore, researchers can posit assumptions about P(X = x | Z, Y, R = 0) by relating it to P(X* = x* | Z, Y, R = 0).
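Equation (4) is simply a two-component mixture over the response patterns; a toy numeric check with hypothetical probabilities for a single (z, y) cell:

```python
# Hypothetical values for one (z, y) cell; none come from the paper's data.
p_x1_r1 = 0.60  # P(X = 1 | Z, Y, R = 1), estimable from self-reports
p_x1_r0 = 0.40  # P(X = 1 | Z, Y, R = 0), the unidentified quantity
p_r1 = 0.70     # P(R = 1 | Z, Y), estimable from all participants

# Equation (4): average the two patterns, weighted by the response probability
p_x1 = p_x1_r1 * p_r1 + p_x1_r0 * (1.0 - p_r1)
```

Here p_x1 = 0.60 × 0.70 + 0.40 × 0.30 = 0.54; only the second component requires the investigator-supplied misclassification assumptions.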
Relating P(X = x | Z, Y, R = 0) to P(X* = x* | Z, Y, R = 0) in a pattern-mixture model is analogous to previously proposed models for handling proxy-reported outcome data [6, 7], but differs from conventional usage of pattern-mixture models. Conventionally, P(X = x | Z, Y, R = 0) would be specified by relating it to P(X = x | Z, Y, R = 1). For example, setting P(X = 1 | Z, Y, R = 0) = P(X = 1 | Z, Y, R = 1) encodes the MAR assumption [19]. Analogously, a pattern-mixture model setting P(X = 1 | Z, Y, R = 0) = P(X* = 1 | Z, Y, R = 0) encodes the assumption that analysis using RX + (1 − R)X* as the exposure is unbiased.
Although researchers can ignore X* and perform a principled analysis using one of many techniques [24], collecting and analyzing X* may be useful because 1) specifying the relation between X and X* may be more intuitive for subject-matter experts than specifying the relation between X and R due to the many published participant-proxy concordance studies available (see [1, 25] and references therein), and 2) X* may be the strongest predictor of X available in the data set given that measuring X* is an attempt to recover X.
2.2. Model for Misclassification within Stratum R = 0
Expressing the exposure model as a pattern-mixture model in Equation (4) shows that within the stratum R = 0, the problem simplifies to one of misclassification. Namely, estimating Equation (4) using X* requires positing a relation between P(X = 1 | Z, Y, R = 0) and P(X* = 1 | Z, Y, R = 0). To tackle this problem we adapt and apply previously published methods for misclassification error [3]. First, we express P(X* = 1 | Z, Y, R = 0) as
P(X* = 1 | Z, Y, R = 0) = Σx=0,1 P(X* = 1 | X = x, Z, Y, R = 0)P(X = x | Z, Y, R = 0),  (5)
and note that the first term in the summation can be interpreted as sensitivity or 1 − specificity for x = 1 or 0, respectively [26]. The concepts of sensitivity and specificity are often used in the misclassification literature as interpretable, standardized measures of the assumed relation between X and X*. Let Sens(Z, Y) = P(X* = 1 | X = 1, Z, Y, R = 0) and Spec(Z, Y) = P(X* = 0 | X = 0, Z, Y, R = 0) denote sensitivity and specificity, respectively, conditioned on Z and Y in the stratum R = 0. Therefore, a sensitivity analysis on Equation (4) boils down to varying the values of Sens(Z, Y) and Spec(Z, Y) in Equation (5).
Two related complications exist when positing values of Sens(Z, Y) and Spec(Z, Y). First, Sens(Z, Y) and Spec(Z, Y) must satisfy the constraint
1 − Spec(Z, Y) ≤ P(X* = 1 | Z, Y, R = 0) ≤ Sens(Z, Y)  (6)
to ensure that estimated values of P(X = 1 | Z, Y, R = 0) lie between 0 and 1. Second, sensitivity and specificity as (potentially high-dimensional) functions of Z and Y must be posited. To simplify this problem, researchers often assume that misclassification is nondifferential, which implies that Sens(Z, Y) and Spec(Z, Y) are constant in Z and Y:
Sens(Z, Y) = P(X* = 1 | X = 1, R = 0)

and

Spec(Z, Y) = P(X* = 0 | X = 0, R = 0).
The benefit of assuming nondifferential misclassification is that only two unidentified user-specified values are needed: one each for sensitivity and specificity. However, the drawbacks of assuming nondifferential misclassification are two-fold. First, nondifferential misclassification is often implausible, particularly in the context of aging research with proxy respondents [2]. Second, assuming nondifferential misclassification may violate the constraints in Equation (6) for some values of Z and Y. To overcome these limitations while maintaining the advantage of requiring only two user-specified values, we instead encode assumptions about sensitivity and specificity using two tuning parameters and P(X* = 1 | Z, Y, R = 0), where the latter quantity can be empirically estimated. Let qsens and qspec denote two unidentifiable tuning parameters that encode assumptions about sensitivity and specificity, respectively, and let
Sens(Z, Y) = expit{logit[P(X* = 1 | Z, Y, R = 0)] + qsens}  (7)
and
Spec(Z, Y) = expit{logit[P(X* = 0 | Z, Y, R = 0)] + qspec},  (8)

where expit(t) = exp(t)/{1 + exp(t)} and logit(p) = log{p/(1 − p)}.
Using Equations (7) and (8), Equation (6) is satisfied by assuming qsens, qspec > 0. Equations (7) and (8) are examples of ‘exponential tilt models,’ [27] a common approach to modeling nonignorably missing data and performing sensitivity analysis (see [28, 29]).
The quantity exp(qsens) is interpreted as the odds ratio of X* = 1 comparing participants with X = 1 to the whole population conditioned on Z, Y and R = 0. Similarly, exp(qspec) is the odds ratio of X* = 0 comparing participants with X = 0 to the whole population conditioned on Z, Y and R = 0. For example, if P(X* = 1 | Z = z, Y = y, R = 0) = 0.75 for a particular (z, y), and we assume that exp(qsens) = 3 and exp(qspec) = 7, then

Sens(z, y) = expit{logit(0.75) + log(3)} = 0.90 and Spec(z, y) = expit{logit(0.25) + log(7)} = 0.70.

In this example, setting exp(qsens) = 3 implies that the odds of X* = 1 increase from 0.75/0.25 to (0.75/0.25) × 3 after conditioning on X = 1; setting exp(qspec) = 7 implies that the odds of X* = 0 increase from 0.25/0.75 to (0.25/0.75) × 7 after conditioning on X = 0. Presumed sensitivity and specificity increase with increasing qsens and qspec, respectively. Assuming qsens = qspec = ∞ implies Sens(Z, Y) = Spec(Z, Y) = 1 and is equivalent to performing a complete-case analysis using RX + (1 − R)X* as the exposure.
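The numbers in this example follow directly from the exponential tilt models in Equations (7) and (8); a short Python check:

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

p_star = 0.75         # P(X* = 1 | Z = z, Y = y, R = 0)
q_sens = math.log(3)  # exp(q_sens) = 3
q_spec = math.log(7)  # exp(q_spec) = 7

sens = expit(logit(p_star) + q_sens)        # Equation (7): tilt the odds of X* = 1
spec = expit(logit(1.0 - p_star) + q_spec)  # Equation (8): tilt the odds of X* = 0
```

The tilted odds are 3 × 3 = 9 and (1/3) × 7 = 7/3, giving Sens = 0.90 and Spec = 0.70 as in the text.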
Once qsens and qspec are specified, Equation (5) can be used to solve for P(X = 1 | Z, Y, R = 0). The solution is obtained by using the standard “matrix-method” calculations shown in the misclassification literature [30]:
P(X = 1 | Z, Y, R = 0) = [P(X* = 1 | Z, Y, R = 0) − {1 − Spec(Z, Y)}] / [Sens(Z, Y) + Spec(Z, Y) − 1].  (9)
The quantity P(X = 1 | Z, Y, R = 0) is then used to calculate positive predictive value (PPV) and negative predictive value (NPV):
PPV(Z, Y) = P(X = 1 | X* = 1, Z, Y, R = 0) = Sens(Z, Y)P(X = 1 | Z, Y, R = 0) / P(X* = 1 | Z, Y, R = 0)  (10)
NPV(Z, Y) = P(X = 0 | X* = 0, Z, Y, R = 0) = Spec(Z, Y)P(X = 0 | Z, Y, R = 0) / P(X* = 0 | Z, Y, R = 0).  (11)
The quantities PPV(Z, Y) and NPV(Z, Y) form the basis of the exposure-misclassification imputation model. Also, plugging P(X = 1 | Z, Y, R = 0) into Equation (4) calculates P(X = 1 | Z, Y), which will be used to compute propensity scores and hence forms the basis of the analysis model.
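Continuing the numeric example from Section 2.2, the matrix-method correction and the predictive values can be computed in a few lines (Python, illustrative values only):

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

p_star = 0.75                                    # P(X* = 1 | Z, Y, R = 0)
sens = expit(logit(p_star) + math.log(3))        # Equation (7)
spec = expit(logit(1.0 - p_star) + math.log(7))  # Equation (8)

# Equation (9): matrix-method solution for P(X = 1 | Z, Y, R = 0)
p_x1 = (p_star - (1.0 - spec)) / (sens + spec - 1.0)

# Equations (10) and (11): predictive values via Bayes' theorem
ppv = sens * p_x1 / p_star
npv = spec * (1.0 - p_x1) / (1.0 - p_star)

# Consistency check: Equation (5) should recover P(X* = 1 | Z, Y, R = 0)
recovered = sens * p_x1 + (1.0 - spec) * (1.0 - p_x1)
```

With these numbers P(X = 1 | Z, Y, R = 0) = (0.75 − 0.30)/0.60 = 0.75, PPV = 0.90, and NPV = 0.70, and plugging back into Equation (5) returns the original 0.75.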
In the next section, we describe how to estimate PPV(Z, Y), NPV(Z, Y), P(X = 1 | Z, Y), propensity scores, and standardized means μx while ensuring compatibility between the imputation models (PPV(Z, Y), NPV(Z, Y)) and the analysis model (P(X = 1 | Z)).
3. Estimation
We propose a general strategy of using the exposure misclassification model to multiply impute missing X using PPV(Z, Y) and NPV(Z, Y). We then use the multiply imputed data to calculate standardized means μx via propensity-score methods, where the propensity-score model is the analysis model. A challenge of this approach is to specify models for propensity scores, PS(Z) = P(X = 1 | Z), that are compatible with the imputation models, that is, so that there exists a full-data distribution that is consistent with all specified models. To achieve this goal, we first consider specification and estimation of parametric models that preserve model compatibility. We then describe estimation using non-parametric machine-learning methods that can avoid model incompatibility and other sources of model mis-specification by reducing the number of needed models.
Consider a study with n participants where Xi, Xi*, Ri, Yi, and Zi denote the variables for the ith participant, i = 1, …, n. Without loss of generality, let the respective distributions of Xi, Xi*, Ri, and Yi be denoted as
P(Xi = 1 | Zi, Yi, Ri = 1) = πXi|ZiYi1,  (12)

P(Xi* = 1 | Zi, Yi, Ri = 0) = πXi*|ZiYi0,  (13)

P(Ri = 1 | Zi, Yi) = πRi|ZiYi,  (14)

Yi | Zi ~ fYi|Zi(yi),  (15)
where f(· | ·) denotes a probability mass function for categorical outcomes and a probability density function for continuous outcomes. We henceforth may suppress the subscript i in notation if doing so does not cause ambiguity.
3.1. Parametric Estimation
Let the respective distributions for Xi, Xi*, Ri, and Yi be indexed by a finite number of parameters, β = {βX, βX*, βR, βY}, and be denoted as πXi|ZiYi1(βX), πXi*|ZiYi0(βX*), πRi|ZiYi(βR), and fYi|Zi(yi; βY). We estimate β, denoted β̂ = {β̂X, β̂X*, β̂R, β̂Y}, by maximizing the observed-data likelihood L(β) where
L(β) = ∏i=1,…,n {[πXi|ZiYi1(βX)]^Xi [1 − πXi|ZiYi1(βX)]^(1−Xi) πRi|ZiYi(βR)}^Ri × {[πXi*|ZiYi0(βX*)]^Xi* [1 − πXi*|ZiYi0(βX*)]^(1−Xi*) [1 − πRi|ZiYi(βR)]}^(1−Ri) × fYi|Zi(yi; βY),  (16)
Equation (16) can be maximized by performing four separate regressions: 1) Regressing Xi on Zi and Yi using data from participants with Ri = 1 produces β̂X, which can be used to calculate fitted values π̂Xi|ZiYi1(β̂X); 2) regressing Xi* on Zi and Yi using data from participants with Ri = 0 produces β̂X*, which can be used to calculate fitted values π̂Xi*|ZiYi0(β̂X*); 3) regressing Ri on Zi and Yi using data from all participants produces β̂R, which can be used to calculate fitted values π̂Ri|ZiYi(β̂R); and 4) regressing Yi on Zi using data from all participants produces β̂Y, which can be used to calculate estimated densities f̂Yi|Zi(yi; β̂Y). These parameter estimates can be used to calculate propensity scores, PS(Z). If Y is categorical, then fitted values from maximizing L(β) calculate
P̂S(Zi) = Σy P̂(Xi = 1 | Zi, Yi = y; β̂X, β̂X*, β̂R) f̂Yi|Zi(y; β̂Y),  (17)
where, by plugging estimates into Equation (4),

P̂(Xi = 1 | Zi, Yi = y; β̂X, β̂X*, β̂R) = π̂Xi|ZiYi1(β̂X)π̂Ri|ZiYi(β̂R) + P̂(Xi = 1 | Zi, Yi = y, Ri = 0)[1 − π̂Ri|ZiYi(β̂R)],
and Sêns(Z, Y) and Sp̂ec(Z, Y) are estimates of Sens(Z, Y) and Spec(Z, Y), respectively, found by plugging π̂Xi*|ZiYi0(β̂X*) into Equations (7) and (8). If Y is continuous, Monte Carlo integration can calculate P̂S(Zi). The first step is to simulate nysim values from f̂Yi|Zi(β̂Y) for i = 1, …, n, denoted yij^sim, j = 1, …, nysim. Simulation can most efficiently be carried out by drawing one set of nysim values from the uniform distribution and then using the inverse cumulative distribution method for transformation. Then, calculate
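A minimal Python sketch of this inverse-CDF Monte Carlo step for one participant, with a hypothetical fitted outcome model and a hypothetical fitted conditional exposure probability (neither comes from the paper):

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)
nysim = 1000
u = [random.random() for _ in range(nysim)]     # one shared set of uniforms

# Hypothetical fitted outcome model for participant i: Y | Z_i ~ Normal(1.2, 0.8)
outcome_dist = NormalDist(mu=1.2, sigma=0.8)
y_sim = [outcome_dist.inv_cdf(uj) for uj in u]  # inverse-CDF transformation

def p_x1_given_zy(y):
    # Hypothetical fitted P(X_i = 1 | Z_i, Y = y), as produced by Equation (4)
    return 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * y)))

# Monte Carlo average over the simulated outcomes to obtain the propensity score
ps_i = fmean(p_x1_given_zy(y) for y in y_sim)
```

Reusing the same uniforms across participants, as the text suggests, makes the propensity scores smooth functions of the fitted outcome-model parameters.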
P̂S(Zi) = (1/nysim) Σj=1,…,nysim P̂(Xi = 1 | Zi, Yi = yij^sim; β̂X, β̂X*, β̂R).  (18)
Plugging appropriate estimated quantities into Equations (10) and (11) produces estimates of PPV(Z, Y) and NPV(Z, Y), denoted PP̂V(Z, Y) and NP̂V(Z, Y), respectively. A step-by-step workflow for multiply imputing M sets of missing Xi and estimating μx proceeds as follows:
1. Sample participants with Ri = 0 with replacement (bootstrap sample).
2. Obtain β̂X*, the maximum likelihood estimate of βX*, by regressing X* on Z and Y (e.g., logistic regression) using the bootstrapped sample.
3. Calculate π̂Xi*|ZiYi0(β̂X*) for the original sample with R = 0.
4. Plug π̂Xi*|ZiYi0(β̂X*) in place of P(X* = 1 | Z, Y, R = 0) into Equation (9) to obtain P̂(Xi = 1 | Zi, Yi, Ri = 0) for each participant i with Ri = 0. Calculate PP̂V(Zi, Yi) and NP̂V(Zi, Yi) using Equations (10) and (11).
5. For each i with Ri = 0, draw a value, Xicomp, from a Bernoulli distribution with success probability [1 − NP̂V(Zi, Yi)] × (1 − Xi*) + PP̂V(Zi, Yi) × Xi*; for each i with Ri = 1, set Xicomp = Xi.
6. Plug P̂(Xi = 1 | Zi, Yi, Ri = 0), π̂Ri|ZiYi(β̂R), and π̂Xi|ZiYi1(β̂X) into Equation (4) to obtain P̂(Xi = 1 | Zi, Yi = y; β̂X, β̂X*, β̂R).
7. Solve Equation (17) or (18) to obtain estimated propensity scores, P̂S(Zi).
8. Obtain an estimate of μx, x = 0, 1, by plugging the completed exposures Xicomp, the outcomes Yi, and the propensity scores P̂S(Zi) into a complete-data propensity-score method for standardization, such as inverse-probability weighting with weights 1/P̂S(Zi) when Xicomp = 1 and 1/[1 − P̂S(Zi)] when Xicomp = 0.
Steps 1–2 account for the uncertainty of β̂X*, which can alternatively be achieved by simulating from the estimated asymptotic distribution of β̂X* [5, 24]; steps 3–4 compute the imputation model, which is a function of qsens, qspec, and π̂Xi*|ZiYi0(β̂X*); step 5 generates the imputations; steps 6–7 compute propensity scores by averaging over R and Y (this can be simplified by regressing Xcomp on Z, but with the potential for model incompatibility; for example, a presumed linear-logistic model for P(X = x | Z) may contradict the models specified on the right side of Equation (4)); and step 8 is the complete-data analysis. Repeating steps 1–8 M times produces M completed data sets and M estimates of μx with corresponding variance-covariance matrices. The final estimate and variance-covariance matrix are obtained using Rubin’s combining rules [5].
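The pooling step at the end of the workflow uses Rubin's rules; for a scalar estimand such as μx the rules reduce to a few lines (a sketch, not the paper's implementation):

```python
import statistics

def rubin_combine(estimates, variances):
    """Rubin's combining rules for M imputation-specific estimates of a scalar.

    estimates: the M point estimates; variances: the M within-imputation variances."""
    m = len(estimates)
    qbar = statistics.fmean(estimates)      # pooled point estimate
    ubar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total_var = ubar + (1.0 + 1.0 / m) * b  # total variance
    return qbar, total_var

# Toy numbers: M = 3 imputation-specific estimates of mu_x
qbar, total_var = rubin_combine([0.40, 0.50, 0.60], [0.01, 0.01, 0.01])
```

The between-imputation term reflects the sensitivity of μ̂x to the imputed exposures; with the M = 50 imputations used later in the paper, the finite-M correction (1 + 1/M) is small.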
Although computationally simple, a major drawback of the parametric estimation approach is that it requires specification of fYi|Zi(yi; βY), πRi|ZiYi(βR), and πXi|ZiYi1(βX), which are nuisance models that are not of scientific interest, but are needed to obtain estimates of PS(Z) that are compatible with the imputation model. Additionally, even if qsens and qspec are correct, results will not be robust to misspecification of the models in Equations (12)–(15). To overcome this limitation, we consider non-parametric estimation of Equation (13) and of PS(Z), which avoids having to specify and estimate Equations (12), (14), and (15).
3.2. Non-parametric Estimation
We consider machine-learning methods for imputing missing X and for estimating propensity scores by regressing Xcomp on Z. Rather than positing a data-generating model as in the parametric estimation procedure above, machine-learning methods seek to extract the relationships between an outcome and set of predictor variables without a presumed data-generating model. Machine-learning methods are potentially beneficial for the problem of misclassified proxy data owing to the need for π̂Xi*|ZiYi0 to calculate PPV(Z, Y) and NPV(Z, Y) for imputation and PS(Z) for analysis. Specifically, these methods allow us to avoid modeling Equations (12), (14) and (15) altogether by eliminating the need to explicitly solve Equations (4) and (17) or (18) to compute PS(Z) that ensure compatibility with the imputation model.
Multiple machine-learning methods have been studied for estimating propensity scores, including CART and ensembles of CARTs such as bagged CART, boosted CART, and random forests [31]. We will only consider boosted CART here for brevity and because it has empirically demonstrated better performance than other machine-learning methods [31]. Briefly, CART recursively partitions the data into nodes defined by a set of predictors and predictor cut points within which observations have similar outcomes. The result is a decision tree that can accommodate interactions and nonlinear relationships. However, CART is prone to overfitting and is suboptimal at revealing linear (or other smooth) main effects.
To overcome these weaknesses, boosted CART passes through the data multiple times to update a suboptimal prediction model. The update is a regression tree of the residuals of the current model [9, 32], and fit is quantified using the log-likelihood. Consider the objective of estimating πXi*|ZiYi0. Following the notation of McCaffrey et al [32], let g(Zi, Yi) denote the current prediction model for the log-odds that Xi* = 1 given (Zi, Yi, Ri = 0) and let h(Zi, Yi) denote an update to the current model. To estimate h(Zi, Yi), calculate residuals of the estimated current model, ĝ(Zi, Yi), residi = Xi* − expit[ĝ(Zi, Yi)], where expit[·] = exp(·)/(1 + exp(·)), and use CART to regress residi on (Zi, Yi). By modeling the residuals, h(Zi, Yi) is interpreted as the expected score function, thus quantifying the optimal adjustment of g(Zi, Yi) to increase the log-likelihood [32]. By using CART to model the residuals, study participants are partitioned into K regions, T1, …, TK, where the within-region residuals are relatively homogeneous (compared to between-region residuals), and where the estimated within-region mean of residuals is a constant. However, updating ĝ(Zi, Yi) using the within-region mean residuals is not guaranteed to increase the likelihood. Therefore, as proposed by Friedman [10], the within-region likelihood is maximized via a constant update, θk, for k = 1, …, K. That is, for i ∈ Tk, maximize
Σi∈Tk {Xi*[ĝ(Zi, Yi) + θk] − log(1 + exp[ĝ(Zi, Yi) + θk])}.  (19)
Therefore, using a second-order Taylor-series approximation to reduce computation, the estimated update is
ĥk(Zi, Yi) = θ̂k = Σi∈Tk residi / Σi∈Tk expit[ĝ(Zi, Yi)](1 − expit[ĝ(Zi, Yi)]),  (20)
and the estimated updated model is ĝnew(Zi, Yi) = ĝ(Zi, Yi) + ĥk(Zi, Yi) for i ∈ Tk. A shrinkage parameter, α ∈ (0, 1], can be used to reduce the size of the update so that ĝnew(Zi, Yi) = ĝ(Zi, Yi) + α × ĥk(Zi, Yi) for i ∈ Tk. Friedman [33] proposed adding a random-sampling step into the estimation algorithm, where, at each iteration, a random sub-sample is used in CART to estimate the update.
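The update cycle above can be illustrated with a deliberately stripped-down toy: a single median-split stump stands in for the CART fit of the residuals, followed by the per-region Newton-type step of Equation (20) and shrinkage (pure numpy; the twang implementation used later in the paper is far more elaborate):

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def boost_step(g, xstar, feature, alpha=0.1):
    """One boosting pass: score residuals, a two-region stump, constant updates."""
    resid = xstar - expit(g)            # residuals of the current log-odds model
    split = np.median(feature)          # crude stand-in for a CART cut point
    g_new = g.copy()
    for region in (feature <= split, feature > split):
        p = expit(g[region])
        theta = resid[region].sum() / (p * (1.0 - p)).sum()  # Equation (20)
        g_new[region] += alpha * theta  # shrunken update
    return g_new

rng = np.random.default_rng(0)
n = 500
zy = rng.normal(size=n)                           # a single (Z, Y)-type predictor
xstar = rng.binomial(1, expit(1.5 * zy))          # simulated proxy reports

def loglik(g):                                    # Bernoulli log-likelihood
    return float(np.sum(xstar * g - np.log1p(np.exp(g))))

g = np.zeros(n)                                   # start from log-odds 0
ll_start = loglik(g)
for _ in range(50):
    g = boost_step(g, xstar, zy)
ll_end = loglik(g)                                # should exceed ll_start
```

Each shrunken update moves the log-odds in the ascent direction of Equation (19), so the log-likelihood improves over the iterations; a real boosted CART fit re-grows the partition at every pass instead of reusing one fixed split.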
The imputation model is then estimated by computing fitted values, π̂Xi*|ZiYi0, which are used to estimate PPV(Z, Y) and NPV(Z, Y). Boosted CART is similarly used to estimate PS(Zi) by replacing Xi* with Xicomp and (Zi, Yi) with Zi in Equations (19) and (20), and by including all participants irrespective of Ri. The step-by-step workflow to impute M sets of missing Xi and estimate μx is:
1. Sample participants with Ri = 0 with replacement (bootstrap sample).
2. Perform a boosted CART analysis of X* on Z and Y using the bootstrapped sample.
3. Calculate π̂Xi*|ZiYi0 by applying the boosted CART results to the original sample with R = 0.
4. Plug π̂Xi*|ZiYi0 in place of P(X* = 1 | Z, Y, R = 0) into Equation (9) to obtain P̂(Xi = 1 | Zi, Yi, Ri = 0) for each participant i with Ri = 0. Calculate PP̂V(Zi, Yi) and NP̂V(Zi, Yi) using Equations (10) and (11).
5. For each i with Ri = 0, draw a value, Xicomp, from a Bernoulli distribution with success probability [1 − NP̂V(Zi, Yi)] × (1 − Xi*) + PP̂V(Zi, Yi) × Xi*; for each i with Ri = 1, set Xicomp = Xi.
6. Perform boosted CART of Xcomp on Z to obtain estimated propensity scores, P̂S(Zi).
7. Obtain an estimate of μx, x = 0, 1, by plugging Xicomp, Yi, and P̂S(Zi) into a complete-data propensity-score method for standardization, such as inverse-probability weighting with weights 1/P̂S(Zi) when Xicomp = 1 and 1/[1 − P̂S(Zi)] when Xicomp = 0.
Steps 1–2 account for the uncertainty of π̂Xi*|ZiYi0 [5, 24]; steps 3–4 compute the imputation model, a function of qsens, qspec, and π̂Xi*|ZiYi0; step 5 generates the imputations; step 6 computes propensity scores by regressing Xcomp on Z (and avoids model incompatibility without requiring models for Equations (12), (14), and (15) owing to the non-parametric estimation); and step 7 is the complete-data analysis. Repeating steps 1–7 M times produces M completed data sets and M estimates of μx with corresponding variance-covariance matrices. Once again, we use Rubin’s combining rules to obtain the final estimates [5].
3.3. Adaptations and Extensions
The proposed model and estimation procedures provide a foundation for adaptation and extension. For example, specifying non-differential misclassification can be accomplished by estimating P(X* = 1 | Z, R = 0) or P(X* = 1 | R = 0) and then computing sensitivity and specificity by plugging it, rather than P(X* = 1 | Y, Z, R = 0), into Equations (7) and (8). Under this assumption, a model for Y is not needed for parametric estimation of PS(Z).
The model for differential misclassification can be further generalized. As an example, differential misclassification in the special case of binary Y and a single binary Z is often operationalized by specifying four separate values of sensitivity and specificity, one for each combination of Y and Z. The exponential tilt models can accommodate this case by allowing qsens and qspec to be non-negative functions rather than scalars. Setting qsens(Z, Y) = γ0 + γ1Z + γ2Y + γ3ZY and qspec(Z, Y) = ξ0 + ξ1Z + ξ2Y + ξ3ZY by user-specified γ and ξ is equivalent to specifying (up to) four values of sensitivity and specificity. The analysis for general Z proceeds by computing Sens(Z, Y) and Spec(Z, Y) and plugging them into Equation (9). Up to now, we considered the special case where qsens(Z, Y) = γ0 and qspec(Z, Y) = ξ0. The exponential tilt models are beneficial in that they can be generalized to accommodate continuous Y and multiple, possibly continuous, Z. Indexing exponential tilt models with a flexible function is consistent with their use in the missing-data literature [28, 34].
Also, rather than specifying the parameters of qsens(Z, Y) and qspec(Z, Y) to be fixed quantities, the parameters can be treated as random quantities with distributions. For each iteration of multiple imputation, the parameters of qsens(Z, Y) and qspec(Z, Y) can be simulated from a user-specified distribution. Doing so extends the probabilistic sensitivity analysis of misclassification proposed by Fox et al [35] that accounts for uncertainty of assumptions about the misclassification mechanism.
Lastly, the proposed approach was motivated by the scenario where some participants have the gold-standard measure (i.e., R = 1 for some). An additional benefit of the approach is that it can handle, as a special case, the scenario where R = 0 for all. This special-case scenario is the classical case of exposure misclassification that is most often addressed in the misclassification literature [3]. Non-parametric estimation can be directly applied to this case because propensity scores are estimated by regressing Xcomp on Z. Parametric estimation can be adapted, and simplified, by making the following small changes: Equations (12) and (14) need not be estimated, Equation (15) and the propensity scores implicitly condition on R = 0, and hence propensity scores are computed by integrating Y out of P(X = 1 | Z, Y, R = 0).
4. Simulation Study
We performed a simulation study to evaluate the finite-sample properties of both the parametric and non-parametric estimation procedures. We explored the cases of binary Y and continuous Y. Boosted CART was implemented in R software version 2.15.0 [36] using the twang package [37] with a shrinkage parameter of 0.0005, 20000 iterations, and a 50% sub-sampling fraction. We selected these values based on published recommendations [31, 32]. For both types of outcomes and both types of estimation, we assessed the methods’ accuracy by calculating percent relative bias. Empirical standard errors were compared to estimated standard errors, and empirical coverage of the 95% confidence interval was calculated.
For all simulations, we simulated 1000 data sets each of size n = 500. We simulated three covariates Z = (Z1, Z2, Z3) where Z1 followed a Bernoulli(0.5) distribution, and (Z2, Z3) followed a bivariate normal distribution with mean (0, 0.1Z1), variance (1, 1) and covariance 0.5. We took M = 50 imputations for all simulations.
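The covariate-generation step of the simulation design can be written directly in numpy (the seed is arbitrary; this reproduces the stated design, not the paper's R code):

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 500
z1 = rng.binomial(1, 0.5, size=n)  # Z1 ~ Bernoulli(0.5)

# (Z2, Z3): bivariate normal with variances (1, 1) and covariance 0.5;
# the mean (0, 0.1 * Z1) is induced by shifting the second component.
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
z23 = rng.multivariate_normal(np.zeros(2), cov, size=n)
z23[:, 1] += 0.1 * z1
z2, z3 = z23[:, 0], z23[:, 1]
```

The small mean shift of Z3 with Z1 makes the three covariates mildly dependent, so the propensity-score models cannot treat them as independent.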
4.1. Binary Outcome
Y was simulated from a Bernoulli distribution with logit(fY|Z(1)) = 0.6 + 0.1Z1 + 0.1Z2 − 0.1Z3; and R was simulated from a Bernoulli distribution with logit(πR|ZY) = 0.2 − 0.2Z1 + 0.1Z2 − 0.1Z3 + 0.9Y + 0.1YZ2. When R = 1, X was simulated from a Bernoulli distribution with logit(πX|ZY1) = −0.2 − 0.2Z1 + 0.2Z2 − 0.1Z3 + 1.0Y + 0.1YZ2, and when R = 0, X* was simulated from a Bernoulli distribution with logit(πX*|ZY0) = 0.1 + 0.2Z1 − 0.1Z2 + 0.1Z3 − 0.3Y + 0.3YZ2. We set qsens = 2.00 and qspec = 0.75. The median (interquartile range) of sensitivity and specificity under these models was 0.894 (0.876–0.906) and 0.649 (0.619–0.688), respectively. We estimated standardized proportions px ≡ μx and their difference, p1 − p0, using inverse-probability weighting with multiply imputed X and weights WX|Z estimated using both parametric modeling (logistic regression) of Equations (12)–(15) and non-parametric modeling (boosted CART) of Equation (13) and PS(Z). Estimation was performed assuming both the correct values of qsens and qspec and under the incorrect assumption that qsens = qspec = 99, an arbitrarily large value to approximate ∞. We also performed analysis using only data with R = 1 (“participant-only estimation”) and with RX + (1 − R)X* as the exposure (“participant + proxy estimation”) to demonstrate the latter approach’s equivalence with the proposed approach assuming qsens = qspec = ∞.
Table 1 shows that parametric multiple imputation-based estimation with correct qsens and qspec produced proportions and differences of proportions with negligible bias, whereas participant-only estimation produced biased estimates due to Y-dependent MNAR missingness, and participant + proxy estimation produced biased estimates due to misclassification error. Furthermore, the parametric multiple imputation-based estimates with qsens = qspec = 99 were nearly identical to those with participant + proxy estimation, empirically demonstrating the equivalence between these two models. Not surprisingly, multiple imputation over-estimated the standard errors [38, 39], leading to empirical coverage > 0.95 with correct qsens and qspec. Non-parametric multiple imputation produced proportions that were unbiased and differences in proportions that had small bias when qsens and qspec were correctly specified, similar standard errors to those from parametric estimation, and empirical coverage > 0.95.
Table 1.
Simulation Study Results for Binary Outcome.a
| Parameter | Analytic Method | Estimate | % Relative Bias | SE | ESE | Coverageb |
|---|---|---|---|---|---|---|
| p1 | Participant Onlyc | 0.557 | 22 | 0.040 | 0.040 | 27.4 |
| | Participant + Proxyd | 0.419 | −8 | 0.030 | 0.031 | 76.5 |
| | Parametric MI, Correcte | 0.456 | <1 | 0.033 | 0.032 | 96.1 |
| | Parametric MI, Incorrectf | 0.418 | −8 | 0.030 | 0.031 | 76.5 |
| | Non-parametric MI, Correctg | 0.457 | <1 | 0.034 | 0.030 | 97.5 |
| | Non-parametric MI, Incorrecth | 0.419 | −8 | 0.031 | 0.031 | 77.7 |
| p0 | Participant Only | 0.316 | 10 | 0.040 | 0.041 | 89.2 |
| | Participant + Proxy | 0.300 | 5 | 0.030 | 0.030 | 91.3 |
| | Parametric MI, Correct | 0.287 | <1 | 0.027 | 0.026 | 95.5 |
| | Parametric MI, Incorrect | 0.300 | 5 | 0.030 | 0.032 | 91.6 |
| | Non-parametric MI, Correct | 0.285 | <1 | 0.028 | 0.026 | 95.5 |
| | Non-parametric MI, Incorrect | 0.300 | 5 | 0.031 | 0.032 | 92.7 |
| p1 − p0 | Participant Only | 0.242 | 43 | 0.056 | 0.058 | 73.5 |
| | Participant + Proxy | 0.119 | −30 | 0.043 | 0.044 | 77.2 |
| | Parametric MI, Correct | 0.169 | <1 | 0.043 | 0.039 | 97.2 |
| | Parametric MI, Incorrect | 0.119 | −30 | 0.043 | 0.045 | 77.1 |
| | Non-parametric MI, Correct | 0.172 | 2 | 0.044 | 0.037 | 98.3 |
| | Non-parametric MI, Incorrect | 0.120 | −29 | 0.044 | 0.044 | 79.5 |
a 1000 simulations of 500 observations each. True parameters: p0 = 0.287, p1 = 0.455, p1 − p0 = 0.169;
b Percent of 95% confidence intervals covering the true parameter value;
c Participant Only = analysis excluding observations with R = 0;
d Participant + Proxy = analysis substituting missing X with X*;
e Parametric MI, Correct = multiple imputation with imputation model and propensity scores estimated using maximum likelihood, correctly assuming qsens = 2.00 and qspec = 0.75;
f Parametric MI, Incorrect = parametric MI incorrectly assuming qsens = qspec = 99;
g Non-parametric MI, Correct = multiple imputation with imputation model and propensity scores estimated using boosted CART, correctly assuming qsens = 2.00 and qspec = 0.75;
h Non-parametric MI, Incorrect = non-parametric MI incorrectly assuming qsens = qspec = 99.
4.2. Continuous Outcome
Y was simulated from a normal distribution with E[Y | Z] = 0.3 + 0.1Z1 + 0.1Z2 − 0.1Z3 and variance 1. R, X, and X* were simulated under the same distributions as those for binary Y above; we similarly set qsens = 2.00 and qspec = 0.75. The median (interquartile range) of sensitivity and specificity under these models was 0.892 (0.876–0.919), and 0.654 (0.608–0.688), respectively. We estimated μ0, μ1, and μ1 − μ0 using inverse-probability weighting with multiply imputed X. WX|Z was parametrically estimated using logistic regression for Equations (12)–(14) and linear regression for Equation (15); and non-parametrically estimated using boosted CART for Equation (13) and PS(Z). Estimation was performed assuming the correct values of qsens and qspec and under the incorrect assumption that qsens = qspec = 99. We also performed participant-only estimation and participant + proxy estimation.
Table 2 shows results analogous to those for the binary outcome. When qsens and qspec were correctly specified, parametric and non-parametric multiple imputation produced unbiased means and mean differences, but with empirical coverage > 0.95.
Table 2.
Simulation Study Results for Continuous Outcome.a
| Parameter | Analytic Method | Estimate | % Relative Bias | SE | ESE | Coverageb |
|---|---|---|---|---|---|---|
| μ1 | Participant Onlyc | 0.702 | 41 | 0.052 | 0.050 | 2.4 |
| | Participant + Proxyd | 0.423 | −15 | 0.045 | 0.045 | 61.4 |
| | Parametric MI, Correcte | 0.499 | <1 | 0.048 | 0.043 | 96.6 |
| | Parametric MI, Incorrectf | 0.423 | −15 | 0.045 | 0.044 | 61.2 |
| | Non-Parametric MI, Correctg | 0.501 | <1 | 0.049 | 0.044 | 97.2 |
| | Non-Parametric MI, Incorrecth | 0.430 | −14 | 0.046 | 0.046 | 67.7 |
| μ0 | Participant Only | 0.285 | 40 | 0.057 | 0.057 | 69.6 |
| | Participant + Proxy | 0.238 | 17 | 0.044 | 0.044 | 88.9 |
| | Parametric MI, Correct | 0.202 | <1 | 0.041 | 0.038 | 96.9 |
| | Parametric MI, Incorrect | 0.238 | 17 | 0.044 | 0.044 | 88.9 |
| | Non-Parametric MI, Correct | 0.202 | <1 | 0.042 | 0.037 | 97.1 |
| | Non-Parametric MI, Incorrect | 0.238 | 17 | 0.045 | 0.044 | 88.3 |
| μ1 − μ0 | Participant Only | 0.417 | 41 | 0.077 | 0.074 | 65.6 |
| | Participant + Proxy | 0.186 | −37 | 0.063 | 0.063 | 58.0 |
| | Parametric MI, Correct | 0.297 | <1 | 0.063 | 0.051 | 98.6 |
| | Parametric MI, Incorrect | 0.186 | −37 | 0.063 | 0.063 | 58.4 |
| | Non-Parametric MI, Correct | 0.299 | <1 | 0.064 | 0.054 | 98.4 |
| | Non-Parametric MI, Incorrect | 0.192 | −35 | 0.064 | 0.064 | 61.6 |
a 1000 simulations of 500 observations each. True parameters: μ0 = 0.203, μ1 = 0.500, μ1 − μ0 = 0.296;
b Percent of 95% confidence intervals covering the true parameter value;
c Participant Only = analysis excluding observations with R = 0;
d Participant + Proxy = analysis substituting missing X with X*;
e Parametric MI, Correct = multiple imputation with imputation model and propensity scores estimated using maximum likelihood, correctly assuming qsens = 2.00 and qspec = 0.75;
f Parametric MI, Incorrect = parametric MI incorrectly assuming qsens = qspec = 99;
g Non-Parametric MI, Correct = multiple imputation with imputation model and propensity scores estimated using boosted CART, correctly assuming qsens = 2.00 and qspec = 0.75;
h Non-Parametric MI, Incorrect = non-parametric MI incorrectly assuming qsens = qspec = 99.
5. Data Application: The Baltimore Hip Studies
We illustrate our proposed statistical methods using data from the Second Cohort of the Baltimore Hip Studies, a prospective study comprising older adults who experienced a hip fracture [11]. The goal of the present analysis was to determine whether perceived rapid recovery of independent mobility, assessed using self-reported ability to walk 10 feet without human assistance two months post hip fracture, is associated with survival two years after the fracture. We considered both a binary outcome and a continuous outcome. For the binary outcome, we operationalized two-year survival as alive or dead; for the continuous outcome, we operationalized survival as the number of days alive in the two years after hip fracture (maximum, 731 days).
The analysis included 502 participants, of whom 365 provided self-reports; among these, 284 (77.8%) reported independent mobility. Proxies assessed mobility for the remaining 137 participants and rated 61 (44.5%) of them as independently mobile. All estimated proportions and means were standardized for sex (395 women, 107 men), age in years (mean = 80.8, SD = 7.3, range 65–104), and number of comorbid conditions (mean = 3.2, SD = 2.1, range 0–12). Participant-only, participant + proxy, and parametric and non-parametric multiple imputation analyses were all performed using inverse-probability weighting, with M = 50 imputations. Propensity scores for the participant-only and participant + proxy analyses were estimated using conventional logistic regression, to realistically reflect a typical analysis that accounts for neither missingness nor misclassification. Boosted CART was implemented with a shrinkage parameter of 0.0005, 20,000 iterations, and a 50% sub-sampling fraction, per published recommendations [31, 32]; balance diagnostics indicated that these tuning parameters performed well.
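The estimation step, inverse-probability-weighted means within each imputed exposure group followed by Rubin's-rules pooling across the M = 50 imputations, can be sketched as follows; function names and the toy inputs are illustrative, not the authors' code.

```python
import numpy as np

def ipw_means(Y, X, ps):
    """Hajek-style IPW estimates of the standardized means (mu1, mu0)."""
    w1, w0 = X / ps, (1 - X) / (1 - ps)
    return np.sum(w1 * Y) / np.sum(w1), np.sum(w0 * Y) / np.sum(w0)

def rubin_pool(estimates, variances):
    """Pool M point estimates and within-imputation variances (Rubin's rules)."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    M = len(estimates)
    qbar = estimates.mean()
    # total variance = within-imputation + (1 + 1/M) * between-imputation
    total_var = variances.mean() + (1 + 1 / M) * estimates.var(ddof=1)
    return qbar, np.sqrt(total_var)

# Toy check: with a constant propensity score of 0.5, the IPW means reduce
# to the simple group means.
mu1, mu0 = ipw_means(np.array([1., 0., 1., 1.]), np.array([1, 1, 0, 0]), np.full(4, 0.5))
qbar, se = rubin_pool([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

In the paper's workflow, `ipw_means` would be applied once per imputed data set (with the propensity scores re-estimated on that data set) and the M results pooled with `rubin_pool`.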
5.1. Binary Outcome
There were 404 (80.5%) participants alive two years after hip fracture. We performed multiple imputation assuming (qsens, qspec) = (3, 1) and (qsens, qspec) = (1, 3). Using a parametric model, assuming (qsens, qspec) = (3, 1) produced sensitivity and specificity distributions with median (interquartile range) of 0.950 (0.929–0.960) and 0.744 (0.692–0.806), respectively; assuming (qsens, qspec) = (1, 3) produced sensitivity and specificity distributions with median (interquartile range) of 0.718 (0.640–0.767) and 0.955 (0.943–0.968), respectively. Using boosted CART, setting (qsens, qspec) = (3, 1) produced sensitivity and specificity distributions with median (interquartile range) of 0.937 (0.903–0.965) and 0.785 (0.665–0.855), respectively; setting (qsens, qspec) = (1, 3) produced sensitivity and specificity distributions with median (interquartile range) of 0.669 (0.556–0.783) and 0.964 (0.938–0.978), respectively. We also performed analysis assuming (qsens, qspec) = (99, 99).
Table 3 shows that standardized two-year survival was higher for independently mobile participants than for participants who were not independently mobile, but the magnitude of the difference varied by statistical method. Among the methods and assumptions considered, boosted CART with (qsens, qspec) = (99, 99) and with (qsens, qspec) = (1, 3) produced the largest and smallest differences in survival, 0.139 and 0.110, respectively. The participant-only analysis produced the highest estimated proportions of survival in both groups, and estimates from the participant + proxy analysis and parametric multiple imputation with qsens = qspec = 99 were nearly identical.
Table 3.
Results for 502 Patients in the Second Cohort of the Baltimore Hip Studies, Binary Outcome.
p̂IM and p̂NIM denote the standardized proportions alive two years post-fracture among participants assessed as Independently Mobile (IM)a and Not Independently Mobile (NIM)a, respectively.

| Analytic Method | qsens | qspec | p̂IM | SE | p̂NIM | SE | p̂IM − p̂NIM | SE |
|---|---|---|---|---|---|---|---|---|
| Participant Onlyb | | | 0.88 | 0.02 | 0.75 | 0.05 | 0.12 | 0.06 |
| Participant + Proxyc | | | 0.85 | 0.02 | 0.72 | 0.04 | 0.13 | 0.04 |
| Parametric MId | 3 | 1 | 0.85 | 0.02 | 0.72 | 0.03 | 0.13 | 0.04 |
| | 1 | 3 | 0.83 | 0.02 | 0.72 | 0.04 | 0.11 | 0.04 |
| | 99 | 99 | 0.85 | 0.02 | 0.72 | 0.04 | 0.13 | 0.04 |
| Non-Parametric MIe | 3 | 1 | 0.85 | 0.02 | 0.72 | 0.03 | 0.13 | 0.04 |
| | 1 | 3 | 0.83 | 0.02 | 0.72 | 0.04 | 0.11 | 0.05 |
| | 99 | 99 | 0.85 | 0.02 | 0.71 | 0.04 | 0.14 | 0.04 |
a Mobility assessed two months after hip fracture;
b Participant Only = analysis excluding observations with missing participant self-reports;
c Participant + Proxy = analysis substituting missing participant self-reports with proxy reports;
d Parametric MI = multiple imputation using maximum likelihood to estimate the imputation and propensity-score models;
e Non-Parametric MI = multiple imputation using boosted CART to estimate the imputation and propensity-score models.
We additionally performed multiple imputation using 100 combinations of qsens and qspec, with each parameter taking values ranging from 0.01 to 4 as well as 99. Across these broad ranges, the effects of qsens and qspec on the estimated proportions were discernible under both parametric and non-parametric estimation: p̂0 ranged from 0.713 to 0.754 (parametric) and from 0.707 to 0.741 (non-parametric), and p̂1 ranged from 0.812 to 0.874 and from 0.811 to 0.870, respectively. As a result, p̂1 − p̂0 ranged from 0.058 to 0.154 (parametric) and from 0.070 to 0.145 (non-parametric).
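A grid-based sensitivity analysis of this kind is simple to organize. In the sketch below, `sensitivity_grid` and the `analyze` callable are hypothetical stand-ins for one complete MI + IPW analysis at fixed tuning parameters, and the grid values are illustrative (the paper's exact 10 values are not reported here).

```python
import itertools

def sensitivity_grid(analyze, q_values):
    """Run the analysis at every (qsens, qspec) pair and collect results.
    `analyze` stands in for the full MI + IPW pipeline at fixed tuning
    parameters (hypothetical interface)."""
    return {(qs, qp): analyze(qs, qp)
            for qs, qp in itertools.product(q_values, repeat=2)}

# Illustrative 10-point grid spanning 0.01 to 4, plus 99 to approximate
# infinity; the lambda is a transparent placeholder for the real pipeline.
grid_values = [0.01, 0.05, 0.25, 0.5, 1, 1.5, 2, 3, 4, 99]
results = sensitivity_grid(lambda qs, qp: (qs, qp), grid_values)
```

Ten values per parameter give the 10 × 10 = 100 combinations described in the text; the result at (99, 99) reproduces the participant + proxy analysis.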
5.2. Continuous Outcome
Participants lived for an average of 659 (SD = 174) days after hip fracture. To address the large proportion of participants still alive at two years, we calculated propensity scores assuming that Y follows the density I(Y = 731)P(Y = 731 | Z) + I(Y < 731)P(Y < 731 | Z)f(y | Y < 731, Z), where f(·) is assumed to be a normal density. We considered the same sets of values for (qsens, qspec) as for the binary outcome. Using a parametric model, setting (qsens, qspec) = (3, 1) produced sensitivity and specificity distributions with median (interquartile range) of 0.947 (0.922–0.957) and 0.754 (0.708–0.822), respectively; setting (qsens, qspec) = (1, 3) produced 0.706 (0.614–0.753) and 0.958 (0.947–0.972). Using boosted CART, setting (qsens, qspec) = (3, 1) produced 0.943 (0.904–0.966) and 0.767 (0.660–0.852); setting (qsens, qspec) = (1, 3) produced 0.690 (0.564–0.788) and 0.961 (0.936–0.977).
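The assumed mixture density for days alive can be written as a small function. This is our illustrative reading of the formula, in which the normal component is truncated at 731 days so that the conditional part is a proper density; the parameter values in the check are arbitrary.

```python
import math

def _phi(z):    # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def survival_density(y, p_max, mu, sigma, y_max=731.0):
    """Point mass p_max at y_max; below y_max, a normal(mu, sigma) density
    truncated at y_max and scaled by 1 - p_max."""
    if y == y_max:
        return p_max
    if y > y_max:
        return 0.0
    trunc = _Phi((y_max - mu) / sigma)            # P(normal component < y_max)
    return (1.0 - p_max) * _phi((y - mu) / sigma) / (sigma * trunc)

# Sanity check: the point mass plus the continuous part integrates to ~1
# (Riemann sum over a wide range, arbitrary parameters p_max=0.3, mu=600,
# sigma=100).
step = 0.05
total = 0.3 + sum(survival_density(-400.0 + k * step, 0.3, 600.0, 100.0) * step
                  for k in range(int((731.0 + 400.0) / step)))
```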
Results in Table 4 show that standardized two-year life expectancy was higher for independently mobile participants than for participants who were not independently mobile, but the magnitude of the difference varied by statistical method. Among the methods and assumptions considered, parametric analysis with (qsens, qspec) = (99, 99) and participant-only analysis produced the largest and smallest differences in average two-year life expectancy, 74 and 44 days, respectively. The participant-only analysis produced the highest average two-year life expectancy in both groups. Estimates from the participant + proxy analysis and parametric multiple imputation with qsens = qspec = 99 differed, owing to differences in how the propensity score was calculated; however, estimates from participant + proxy analysis were similar to those from boosted CART with qsens = qspec = 99.
Table 4.
Results for 502 Patients in the Second Cohort of the Baltimore Hip Studies, Continuous Outcome.
μ̂IM and μ̂NIM denote the standardized mean days alive through two years (maximum, 731) post-fracture among participants assessed as Independently Mobile (IM)a and Not Independently Mobile (NIM)a, respectively.

| Analytic Method | qsens | qspec | μ̂IM | SE | μ̂NIM | SE | μ̂IM − μ̂NIM | SE |
|---|---|---|---|---|---|---|---|---|
| Participant Onlyb | | | 688 | 8 | 644 | 22 | 44 | 24 |
| Participant + Proxyc | | | 680 | 8 | 618 | 17 | 62 | 19 |
| Parametric MId | 3 | 1 | 683 | 8 | 612 | 17 | 71 | 19 |
| | 1 | 3 | 673 | 8 | 617 | 19 | 56 | 20 |
| | 99 | 99 | 682 | 8 | 608 | 18 | 74 | 20 |
| Non-Parametric MIe | 3 | 1 | 679 | 9 | 624 | 16 | 55 | 18 |
| | 1 | 3 | 671 | 8 | 625 | 18 | 46 | 20 |
| | 99 | 99 | 680 | 8 | 619 | 17 | 61 | 19 |
a Mobility assessed two months after hip fracture;
b Participant Only = analysis excluding observations with missing participant self-reports;
c Participant + Proxy = analysis substituting missing participant self-reports with proxy reports;
d Parametric MI = multiple imputation using maximum likelihood to estimate the imputation and propensity-score models;
e Non-Parametric MI = multiple imputation using boosted CART to estimate the imputation and propensity-score models.
When assessing the broader ranges of qsens and qspec (0.01 to 4, plus 99), we found that μ̂0 ranged from 606 to 645 days (parametric) and from 616 to 639 days (non-parametric), and μ̂1 ranged from 660 to 686 days and from 660 to 684 days, respectively. As a result, μ̂1 − μ̂0 ranged from 14 to 76 days (parametric) and from 20 to 64 days (non-parametric).
6. Discussion
This paper proposed and evaluated statistical methods to address exposures that are missing and assessed using error-prone proxy reports for analysis with both categorical and continuous outcomes. A major innovation of the parametric modeling approach is the use of pattern-mixture models to address missing and differentially misclassified exposure data. Pattern-mixture models have hitherto only been used to address missing or differentially misclassified outcomes [7, 22–24]. This approach was made possible by 1) using a likelihood that did not require a model for Y conditioned on X and X*, and 2) estimating covariate-standardized outcome means. The advantages of this approach are that the models for imputation and analysis are compatible (called ‘congenial’ in the multiple imputation literature [40]) and that the methods can be easily implemented. The disadvantage of the parametric approach is that it requires correctly specifying and estimating models for fYi|Zi (y; βY) (including the distribution), πRi|ZiYi (βR), and πXi|ZiYi1 (βX), which are nuisance parameters. The non-parametric machine-learning approach overcame the limitations of the parametric approach by not requiring specification of the distribution for Y and circumventing estimation of nuisance parameters while preserving model compatibility.
Our proposed approach incorporates features from, and generalizes, other methods developed to handle exposure misclassification in the case where R = 0 for all participants. Lyles and Lin [26] proposed predictive-value weighting for logistic regression with binary outcomes, with jackknife estimation of standard errors. The proposed approach instead uses the predictive values for multiple imputation, which can decrease the computational burden of estimating standard errors (i.e., 50 imputations rather than n jackknife iterations). Fox et al. [35] proposed reconstructing the data that would have been observed had there been no misclassification, using Monte Carlo simulation from predictive values, and then performing logistic regression on the completed data; however, their predictive values did not accommodate covariates, and outcomes were assumed to be binary. Our proposed parametric estimation uses a likelihood decomposition that ensures all specified models are compatible with each other; namely, the propensity score model is compatible with the predictive-value models. Our proposed non-parametric approach ensures model compatibility by requiring fewer models: the same number as the approaches of Lyles and Lin [26] and Fox et al. [35]. A direct adaptation of these earlier methods, in which a linear-logistic model is presumed for P(X* = 1 | Z, Y, R = 0) and a linear-logistic (binary outcome) or linear (continuous outcome) model is presumed for E(Y | X = x, Z = z), could result in model incompatibility, because the presumed links and linear relationships may not be preserved. That is, a full-data distribution consistent with all model assumptions may not exist.
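The predictive-value-based imputation idea common to these methods can be sketched as follows. `predictive_value` applies the standard Bayes identities relating sensitivity, specificity, and prevalence, and `impute_exposure` draws the M imputed exposure vectors; the names and interface are ours, not from [26] or [35].

```python
import numpy as np

def predictive_value(sens, spec, prev):
    """Standard Bayes identities: P(X=1 | X*=1) and P(X=1 | X*=0) given
    sensitivity, specificity, and prevalence of the true exposure."""
    p_pos = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    p_neg = (1 - sens) * prev / ((1 - sens) * prev + spec * (1 - prev))
    return p_pos, p_neg

def impute_exposure(pv, M, rng=None):
    """Draw M imputed exposure vectors, where pv[i] is the predictive value
    P(X_i = 1 | Z_i, Y_i, X*_i, R_i = 0) for participant i."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pv = np.asarray(pv)
    return rng.binomial(1, pv, size=(M, pv.shape[0]))

# Illustration with arbitrary operating characteristics and M = 50
ppv, p1_given_neg = predictive_value(0.9, 0.8, 0.5)
imputations = impute_exposure(np.array([0.0, 1.0, 0.5]), M=50)
```

In the covariate-conditional setting of the paper, the predictive values would come from the fitted misclassification model rather than from marginal sensitivity and specificity.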
In summary, our approach generalizes these earlier proposals by 1) considering the case where some participants have the gold-standard rather than the error-prone exposure measured (i.e., R = 1 for some), 2) ensuring compatibility of all specified models and compatibility of assumptions about sensitivity and specificity with the data, 3) accommodating continuous or binary outcomes, 4) using standardization to facilitate the analyst’s choice of association measure (e.g., Z-adjusted risk difference, risk ratio, or odds ratio for binary outcomes), and 5) using non-parametric estimation with machine-learning methods to further guard against model mis-specification.
The proposed method is similar in spirit to our published method for handling error-prone proxy-reported outcomes [7], but the new method features multiple innovations to help overcome some challenges of misclassified exposures that are not encountered in analysis with misclassified outcomes. In particular, unlike with misclassified outcomes, having misclassified exposures may require modeling both the outcome and the exposure. The propensity-score approach with non-parametric estimation is of benefit because it circumvents the need for an outcome model. Furthermore, when misclassification is differential, the exposure misclassification model (the imputation model) must condition on the outcome, and this model may not be compatible with the analysis model (an outcome model or propensity-score model). Therefore, we proposed a novel likelihood decomposition for parametric estimation and evaluated a non-parametric estimation procedure. A propensity-score method that can handle differential exposure misclassification is an innovation in itself. While methods for covariate measurement error in propensity-score analysis are available [41–43], this is the first method, to our knowledge, that handles exposure misclassification in propensity-score analysis. The proposed method can be adapted to handle the situation where all participants have the error-prone “proxy” exposure measure as a special case (R = 0 for all). A recent simulation study demonstrating the biasing effects of exposure misclassification on propensity-score estimators supports the need for such methods [44].
Both parametric and non-parametric estimation require user-specified values that encode presumed sensitivity and specificity of proxy reports. However, we do not consider this to be a limitation. Rather, this feature accurately represents the realities of exposure misclassification, namely that the sensitivity and specificity of proxy reports are not identifiable from the data, and an assumption is needed for estimation. Thus, a strength of the overall approach is that the assumptions are made explicit. In particular, when using the same model to estimate propensity scores, participant + proxy analysis is equivalent to the proposed method with large qsens and qspec.
The proposed methods were motivated by aging research, where proxy data are routinely collected and have been evaluated in published proxy-participant validation studies (see [1, 25] and references therein). However, the validation studies are imperfect because they only generalize to participants who do not need a proxy respondent. Despite this limitation, researchers may have a better intuition about misclassification than about missingness, thus making proxy data a valuable part of a sensitivity analysis. In future work, we aim to extend our methods to formally handle internal validation data as part of a sensitivity analysis for missing and misclassified exposures.
Acknowledgments
Contract/grant sponsor: National Institutes of Health K25AG034216, R01AG041202
References
- 1. Gruber-Baldini AL, Shardell M, Lloyd K, Magaziner J. Use of proxies and informants. In: Newman AB, Cauley JA, editors. The Epidemiology of Aging. New York: Springer; 2012. pp. 81–90.
- 2. Nelson LM, Longstreth WT Jr, Koepsell TD, Van Belle G. Proxy respondents in epidemiologic research. Epidemiologic Reviews. 1990;12:71–86. doi: 10.1093/oxfordjournals.epirev.a036063.
- 3. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. London: Chapman and Hall; 2006.
- 4. Guolo A. Robust techniques for measurement error correction: a review. Statistical Methods in Medical Research. 2008;17:555–580. doi: 10.1177/0962280207081318.
- 5. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley; 2004.
- 6. Shardell M, Hicks GE, Miller RR, Langenberg P, Magaziner J. Pattern-mixture models for analyzing normal outcome data with proxy respondents. Statistics in Medicine. 2010;29:1522–1538. doi: 10.1002/sim.3902.
- 7. Shardell M, Simonsick E, Hicks GE, Resnick B, Ferrucci L, Magaziner J. Sensitivity analysis for nonignorable missingness and outcome misclassification from proxy reports. Epidemiology. 2013;24:215–223. doi: 10.1097/EDE.0b013e31827f4fa9.
- 8. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Belmont, CA: Wadsworth International; 1984.
- 9. Friedman JH, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion). Annals of Statistics. 2000;28:337–407.
- 10. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001;29:1189–1232.
- 11. Magaziner J, Hawkes W, Hebel JR, Zimmerman SI, Fox KM, Dolan M, Felsenthal G, Kenzora J. Recovery from hip fracture in eight areas of function. Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2000;55:M498–M507. doi: 10.1093/gerona/55.9.
- 12. Snow AL, Cook KF, Lin PS, Morgan RO, Magaziner J. Proxies and other external raters: methodological considerations. Health Services Research. 2005;40:1676–1693. doi: 10.1111/j.1475-6773.2005.00447.x.
- 13. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods-application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512.
- 14. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55. doi: 10.1093/biomet/70.1.41.
- 15. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association. 1984;79:516–524. doi: 10.1080/01621459.1984.10478078.
- 16. D’Agostino RB. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine. 1998;17:2265–2281. doi: 10.1002/(SICI)1097-0258(19981015)17:19<2265::AID-SIM918>3.0.CO;2-B.
- 17. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560. doi: 10.1097/00001648-200009000-00011.
- 18. Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14:680–686. doi: 10.1097/01.EDE.0000081989.82616.7d.
- 19. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592. doi: 10.1093/biomet/63.3.581.
- 20. Daniel RM, Kenward MG, Cousens SN, De Stavola BL. Using causal diagrams to guide analysis in missing data problems. Statistical Methods in Medical Research. 2012;21:243–256. doi: 10.1177/0962280210394469.
- 21. Westreich D. Berkson’s bias, selection bias, and missing data. Epidemiology. 2012;23:159–164. doi: 10.1097/EDE.0b013e31823b6296.
- 22. Little RJ. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134. doi: 10.1080/01621459.1993.10594302.
- 23. Little RJ, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111.
- 24. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. New York: Wiley; 2002.
- 25. Shardell M, Alley DE, Miller RR, Hicks GE, Magaziner J. Comparing reports from hip-fracture patients and their proxies: implications on evaluating sex differences in disability and depressive symptoms. Journal of Aging and Health. 2012;24:367–383. doi: 10.1177/0898264311424208.
- 26. Lyles RH, Lin J. Sensitivity analysis for misclassification in logistic regression via likelihood methods and predictive value weighting. Statistics in Medicine. 2010;29:2297–2309. doi: 10.1002/sim.3971.
- 27. Barndorff-Nielsen OE, Cox DR. Asymptotic Techniques for Use in Statistics. London: Chapman and Hall; 1989.
- 28. Shardell M, El-Kamary SS. Sensitivity analysis of informatively coarsened data using pattern mixture models. Journal of Biopharmaceutical Statistics. 2009;19:1018–1038. doi: 10.1080/10543400903242779.
- 29. Shardell M, Scharfstein DO, Vlahov D, Galai N. Sensitivity analysis using elicited expert information for inference with coarsened data: illustration of censored discrete event times in ALIVE. American Journal of Epidemiology. 2008;168:1460–1469. doi: 10.1093/aje/kwn265.
- 30. Barron BA. The effects of misclassification on the estimation of relative risk. Biometrics. 1977;33:414–418.
- 31. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Statistics in Medicine. 2010;29:337–346. doi: 10.1002/sim.3782.
- 32. McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods. 2004;9:403–425. doi: 10.1037/1082-989X.9.4.403.
- 33. Friedman JH. Stochastic gradient boosting. Computational Statistics and Data Analysis. 2002;38:367–378. doi: 10.1016/S0167-9473(01)00065-2.
- 34. Birmingham J, Rotnitzky A, Fitzmaurice GM. Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. Journal of the Royal Statistical Society, Series B. 2003;65:275–297. doi: 10.1111/1467-9868.00386.
- 35. Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. International Journal of Epidemiology. 2005;34:1370–1376. doi: 10.1093/ije/dyl226.
- 36. Ihaka R, Gentleman R. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5:299–314.
- 37. Ridgeway G, McCaffrey DF, Morral AR. twang: Toolkit for Weighting and Analysis of Non-equivalent Groups. R package version 1.0-1; 2006.
- 38. Wang N, Robins JM. Large-sample theory for parametric multiple imputation procedures. Biometrika. 1998;85:935–948. doi: 10.1093/biomet/85.4.935.
- 39. Robins JM, Wang N. Inference for imputation estimators. Biometrika. 2000;87:113–124. doi: 10.1093/biomet/87.1.113.
- 40. Meng XL. Multiple-imputation inferences with uncongenial sources of input. Statistical Science. 1994;9:538–558. doi: 10.1214/ss/1177010269.
- 41. McCaffrey DF, Lockwood JR, Setodji CM. Inverse probability weighting with error-prone covariates. Biometrika. 2013;100:671–680. doi: 10.1093/biomet/ast022.
- 42. Yi GY, Ma Y, Carroll RJ. A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error. Biometrika. 2012;99:151–165. doi: 10.1093/biomet/asr076.
- 43. D’Agostino R Jr, Rubin DB. Estimating and using propensity scores with partially missing data. Journal of the American Statistical Association. 2000;95:749–759. doi: 10.1080/01621459.2000.10474263.
- 44. Babanezhad M, Vansteelandt S, Goetghebeur E. Comparison of causal effect estimators under exposure misclassification. Journal of Statistical Planning and Inference. 2010;140:1306–1319. doi: 10.1016/j.jspi.2009.11.015.
