Abstract
Joint misclassification of exposure and outcome variables can lead to considerable bias in epidemiological studies of causal exposure-outcome effects. In this paper, we present a new maximum likelihood based estimator for marginal causal effects that simultaneously adjusts for confounding and several forms of joint misclassification of the exposure and outcome variables. The proposed method relies on validation data for the construction of weights that account for both sources of bias. The weighting estimator, which is an extension of the outcome misclassification weighting estimator proposed by Gravel and Platt (Weighted estimation for confounded binary outcomes subject to misclassification. Stat Med 2018; 37: 425–436), is applied to reinfarction data. Simulation studies were carried out to study its finite sample properties and compare it with methods that do not account for confounding or misclassification. The new estimator showed favourable large sample properties in the simulations. Further research is needed to study the sensitivity of the proposed method and that of alternatives to violations of their assumptions. The implementation of the estimator is facilitated by a new R function (ipwm) in an existing R package (mecor).
Keywords: Causal inference, confounding, inverse probability weighting, joint exposure and outcome misclassification, propensity scores, validation data
1 Introduction
In epidemiological research on causal associations between a particular exposure and a certain outcome, erroneous information on either or both of these variables poses a serious methodological obstacle in making valid inferences. In particular, joint misclassification of exposure and outcome can lead to considerable bias of standard causal effect estimators, with direction and magnitude depending on various factors, including the misclassification mechanism and the direction and magnitude of the true effect.1–6
Exposure and outcome misclassification is typically categorised according to two separate properties: whether or not the misclassification is differential and whether or not it is dependent relative to some covariate vector L containing patient characteristics.1,5 Joint misclassification of exposure and outcome is said to be nondifferential if (1) the sensitivity and specificity of exposure classification are constant across all categories of the (true) outcome given L and (2) the sensitivity and specificity of outcome classification are constant across all categories of the (true) exposure given L; otherwise it is differential. Misclassification is said to be independent if the joint probability of any exposure and outcome classification given any true exposure and outcome categories and L can be factored into the product of the corresponding probabilities for exposure and outcome separately; otherwise, it is dependent. In Dawid’s notation,7 that is, if true exposure level A and true outcome Y are (potentially mis)classified as B and Z, respectively, misclassification is nondifferential if and only if B ⊥⊥ Y | (A, L) and Z ⊥⊥ A | (Y, L), and independent if and only if B ⊥⊥ Z | (A, Y, L).
Epidemiological research hampered by joint misclassification of some type is likely voluminous.6 Examples of studies affected by exposure and outcome misclassification can be found, for example, in the literature on the causal effects of drug use, which is largely based on routinely collected data, where exposures are typically operationalised on the basis of prescription records and where outcomes are often self-reported.8–11 In applied epidemiological research, misclassification or some of its potential consequences are often ignored.12,13 The assertion often made in the discussion of study results that observed measures of association are biased toward the null under nondifferentiality, for example, is not generally true unless additional conditions are presupposed.2,6
Methods to adjust for misclassification rely on additional information that can be used to estimate or correct for bias. One potential source of information is validation data obtained through supposedly infallible measurement. Recently, Gravel and Platt proposed an inverse probability weighting (IPW) method to simultaneously address confounding and outcome misclassification by means of internal validation data.14 Other methods likewise suppose that either the exposure or the outcome is subject to misclassification.14–17 In what follows, we propose an extension of Gravel and Platt’s method to allow for confounding adjustment and joint exposure and outcome misclassification. This flexible estimator allows for the misclassifications to be dependent, differential or both. In Section 2, inverse probability weights for confounding and joint misclassification are introduced through a hypothetical study based on the illustrative example of Gravel and Platt. Section 3 details methods for estimation of the various components of the proposed weights based on validation data. In Section 4, we describe a series of Monte Carlo simulations that were used to study properties of the proposed method in finite samples. We conclude with a summary and discussion of our findings in context of the existing literature.
2 Data distribution for illustration and development of weighting method
We first consider the data and setting described by Gravel and Platt and suppose that Table 1 represents a simple random (i.i.d.) sample from (or that its cell counts are proportional to the respective densities in) the population of interest. This illustration is based on a cohort study on the association between post-myocardial infarction statin use (A) and the one-year risk of reinfarction (Y). In what follows, we will refer to this example as the ‘reinfarction example’.
Table 1.
Cross-classification of the reinfarction data for 33,007 individuals as given by Gravel and Platt.
|       | L = 0 |       | L = 1 |       |
|-------|-------|-------|-------|-------|
|       | A = 0 | A = 1 | A = 0 | A = 1 |
| Y = 0 | 11602 | 13116 | 1302  | 5363  |
| Y = 1 | 890   | 589   | 49    | 96    |
Throughout we take the counterfactual framework for causal inference, formal accounts of which are given for example by Neyman et al.18–22 The interest, we suppose, lies in estimating g(E[Y(1)], E[Y(0)]) for some function g, where Y(0) and Y(1) denote the counterfactual outcomes for hypothetical interventions setting A to 0 and 1, respectively. Common choices of g define E[Y(1)] − E[Y(0)] (risk difference), E[Y(1)]/E[Y(0)] (risk ratio), or {E[Y(1)]/(1 − E[Y(1)])}/{E[Y(0)]/(1 − E[Y(0)])} (odds ratio). For our numerical example and simulation studies, we concentrate on the causal marginal odds ratio (OR) in particular, with

OR = {E[Y(1)]/(1 − E[Y(1)])} / {E[Y(0)]/(1 − E[Y(0)])}    (1)

but the results naturally extend to other effect measures.
2.1 No misclassification
Under conditional exchangeability given L (i.e. Y(a) ⊥⊥ A | L for a = 0, 1), consistency (Y(a) = Y if A = a) and positivity (P(A = a | L = l) > 0 for a = 0, 1 and all l in the support of L), the mean counterfactuals E[Y(0)] and E[Y(1)] can be expressed in terms of ‘observables’ (meaning, here, variables that would be observed in the absence of measurement error) as follows

E[Y(a)] = E(WY | A = a), a = 0, 1

where W denotes the inverse probability of the allocated exposure level A given L (i.e. the inverse propensity score if A = 1 and the inverse of the complement of the propensity score if A = 0) multiplied by the prevalence of the allocated exposure level A (i.e. W = P(A)/P(A | L); Supplementary Appendix I). We therefore have

OR = {E(WY | A = 1)/(1 − E(WY | A = 1))} / {E(WY | A = 0)/(1 − E(WY | A = 0))}    (2)
Replacing components of the right-hand side of equation (2) with sample analogues, we obtain the following estimator for the setting where L is binary

ÔR = { Σ_l Ŵ_{1l} n_{11l} / Σ_l Ŵ_{1l} n_{01l} } / { Σ_l Ŵ_{0l} n_{10l} / Σ_l Ŵ_{0l} n_{00l} }    (3)

where n_{yal} denotes the number of subjects with Y = y, A = a, L = l and where Ŵ_{al} is the product of the proportion of subjects in the sample with A = a and the inverse of the proportion of subjects with A = a among those with L = l. For the data in Table 1, we obtain ÔR = 0.573. The corresponding crude odds ratio (i.e. with Ŵ_{al} ≡ 1) is 0.509.
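The calculation in equation (3) can be reproduced directly from the cell counts in Table 1. The following sketch (in Python for illustration; the paper’s own implementation is in R) computes both the crude and the confounding-adjusted odds ratio.

```python
# Cell counts from Table 1 of the reinfarction example; keys are (y, a, l).
n = {
    (0, 0, 0): 11602, (0, 1, 0): 13116, (0, 0, 1): 1302, (0, 1, 1): 5363,
    (1, 0, 0): 890,   (1, 1, 0): 589,   (1, 0, 1): 49,   (1, 1, 1): 96,
}
N = sum(n.values())  # 33007 subjects in total

def n_sub(y=None, a=None, l=None):
    """Count subjects matching the given (partial) cell specification."""
    return sum(c for (yy, aa, ll), c in n.items()
               if (y is None or yy == y)
               and (a is None or aa == a)
               and (l is None or ll == l))

def w_hat(a, l):
    # W-hat_{al}: proportion with A = a, times the inverse of the
    # proportion with A = a among subjects with L = l.
    return (n_sub(a=a) / N) / (n_sub(a=a, l=l) / n_sub(l=l))

def weighted_odds(a):
    num = sum(w_hat(a, l) * n[(1, a, l)] for l in (0, 1))
    den = sum(w_hat(a, l) * n[(0, a, l)] for l in (0, 1))
    return num / den

or_adj = weighted_odds(1) / weighted_odds(0)   # IPW-adjusted OR, eq. (3)

# Crude OR: the same ratio with all weights set to 1.
or_crude = (n_sub(y=1, a=1) / n_sub(y=0, a=1)) / (n_sub(y=1, a=0) / n_sub(y=0, a=0))

print(round(or_crude, 3), round(or_adj, 3))    # 0.509 0.573
```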
2.2 Joint misclassification
Suppose that rather than observing Y and A we observe Z and B, the misclassified versions of Y and A, respectively. The relation between Z and B on the one hand and Y, A and L on the other can be expressed as follows
for z, b = 0, 1 and all possible realisations y, a, l of Y, A, L.
To simulate (dependent differential) misclassification in the reinfarction dataset, we use the true positive and false positive rates given in Table 2. The expected cell counts for these rates are given in Table 3.
Table 2.
True and false positive rates for reinfarction example.
Table 3.
Expected cell counts (rounded to integers) for reinfarction example after misclassification was introduced.
|                     | Z = 0 |       | Z = 1 |       |
|---------------------|-------|-------|-------|-------|
|                     | B = 0 | B = 1 | B = 0 | B = 1 |
| Y = 0, A = 0, L = 0 | 10912 | 109   | 574   | 7     |
| Y = 1, A = 0, L = 0 | 51    | 10    | 678   | 151   |
| Y = 0, A = 1, L = 0 | 1527  | 10850 | 47    | 693   |
| Y = 1, A = 1, L = 0 | 5     | 27    | 48    | 509   |
| Y = 0, A = 0, L = 1 | 1148  | 116   | 23    | 14    |
| Y = 1, A = 0, L = 1 | 7     | 4     | 29    | 9     |
| Y = 0, A = 1, L = 1 | 334   | 4738  | 41    | 249   |
| Y = 1, A = 1, L = 1 | 4     | 11    | 13    | 68    |
Note: Because of rounding, the sum of all cell entries is 33,006 rather than 33,007, the size of the reinfarction dataset.
We redefine the weights in equation (2) as a function of B and L (as per Supplementary Appendix I) such that
(4) |
where p(B) is the prevalence of level B of the potentially misclassified version of the exposure variable, and where the remaining components are defined for all possible realisations a and l of A and L, respectively. In Supplementary Appendix I, it is shown that
OR = {E(WZ | B = 1)/(1 − E(WZ | B = 1))} / {E(WZ | B = 0)/(1 − E(WZ | B = 0))}    (5)
which suggests the plug-in estimator
ÔR = {Ê(ŴZ | B = 1)/(1 − Ê(ŴZ | B = 1))} / {Ê(ŴZ | B = 0)/(1 − Ê(ŴZ | B = 0))}    (6)

where Ê denotes the sample mean operator and Ŵ the sample analogue (i.e. consistent estimator) of W in equation (4). For other effect measures (i.e. other choices of g), the same plug-in strategy can be implemented.
In the absence of exposure misclassification, equation (4) reduces to
(7) |
The first term within the round brackets corrects for confounding and represents the propensity score if A = 1 or its complement if A = 0 divided by the prevalence of exposure level A. The term within square brackets is a factor that corrects for misclassification in the outcome variable. This correction factor is similar to that proposed by Gravel and Platt.14 The only difference is that where in equation (7) it does not depend on the fallible measurement Z of Y, Gravel and Platt define different weights for subjects with Z = 0. Note, however, that the choice of weights for subjects with Z = 0 does not affect the population quantity in equation (5) or the estimator defined by equation (6), because the weights only appear in products with Z, which equal zero if Z = 0.
As for the reinfarction example, the odds ratio estimate for the exposure-outcome effect based on inverse probability weighting that assumes absence of exposure or outcome misclassification is 1.120, while the corresponding misclassification-naive crude odds ratio is 1.031. Estimation of the population weights W from observables using validation data is discussed in the next section. As shown below, weighting using the proposed weights that account for confounding and outcome and exposure misclassification results in an odds ratio of . Inference based on equation (7) rather than equation (4), i.e. using Gravel and Platt’s method and ignoring misclassification in the exposure but correcting for outcome misclassification, yields an odds ratio estimate of 0.934.
2.3 Parameterisation based on positive and negative predictive values
In the foregoing discussion, the proposed weights were expressed in terms of sensitivity and specificity parameters: the sensitivity and specificity of Z with respect to Y, given (B, A, L), and the sensitivity and specificity of B with respect to A, conditional on Y and L.
As discussed below, it may be more convenient to choose a parameterisation that is based on (positive and negative) predictive values, i.e. the probabilities of the true exposure and outcome levels given the observed data. The weights in equation (4) can be rewritten as
(8) |
In the absence of exposure misclassification, these weights simplify analogously to equation (7).
3 Estimation of weights based on validation data
Estimation of the proposed weights can be done using a number of approaches. We here consider a maximum likelihood approach that assumes the availability of internal validation data; that is, some study participants have their exposure, outcome or both measured by an ‘infallible’ or ‘gold standard’ (100% accurate) classifier, while all participants have the misclassified exposure and outcome variables measured.
3.1 Validation subset inclusion mechanism
Let RY be the indicator variable that takes the value of 1 if the outcome is observed (i.e. measured by an infallible classifier) and 0 otherwise. Similarly, define RA to be the indicator variable that takes the value of 1 if the exposure variable is observed and 0 otherwise. RY and RA reflect which subjects have validation data available on Y and A, respectively. The subset of subjects with validation data on Y need not fully overlap with the subset with validation data on A.
The validation subsets can be approached from the missing data framework of Rubin.23 Provided that Z, B and L are free of missing values, Rubin’s missing at random (MAR) condition is met whenever the vector (RY, RA) is conditionally independent of (Y, A) given (Z, B, L).
3.2 Full likelihood approach based on parameterisation in terms of sensitivities and specificities
Simultaneous estimation of the whole vector of δ, ε, λ and π parameters can be done via maximum likelihood estimation as follows. Assuming i.i.d. observations and ignorable missingness in the sense of Rubin23 (MAR and distinctness), for valid likelihood-based inference it is appropriate to maximise the following log-likelihood over the parameter space of θ, the vector of δ, ε, λ and π parameters
where
Evaluating this log-likelihood involves marginalising over unobserved quantities in its last three terms. The log-likelihood equations may become considerably more tractable if we choose a parameterisation of the likelihood that is based on predictive values rather than sensitivities and specificities.
3.3 Full likelihood approach based on parameterisation in terms of predictive values
Inference may alternatively be based on a log-likelihood that is parameterised in terms of the vector of predictive value parameters, i.e.
where
If validation data are available on Y if and only if they are available on A, the complete data log-likelihood ignoring the missing data mechanism can be conveniently expressed as follows
(9) |
with the parameter vector now comprising the predictive value parameters, and where
Now, assuming distinct parameter spaces for the component parameter vectors, the parameter values that maximise the complete data log-likelihood can be found by separately maximising the first two terms in the validation subset with respect to the predictive value parameters, and the last two terms in the entire dataset with respect to the parameters of the observed data distribution. Following Gravel and Platt14 and Tang et al.,24 the sums of the first two and the last two terms are therefore suitably labelled the internal validation and main study log-likelihood, respectively. With this parameterisation, finding the maximum likelihood estimates is readily achieved by taking advantage of standard statistical software.
3.4 Equivalence of likelihood approaches based on different parameterisations
Without restrictions imposed on the parameters other than that they obey the laws of probability, it can be shown that the maximum likelihood estimator based on the internal validation design is invariant to its parameterisation (sensitivities/specificities versus positive and negative predictive values). This is because there exists a bijection σ mapping every value of θ to a unique vector of predictive value parameters and vice versa, so that maximising the log-likelihood with respect to θ is equivalent to maximising it with respect to σ(θ).
If more restrictions are imposed on either parameterisation, e.g. if we assume non-saturated logistic models for its components, this equivalence no longer holds and the resulting weight estimates may differ depending on the parameterisation.
3.5 Application
For the re-infarction data example, we assume validation data are available according to a MAR mechanism characterised by
This mechanism assigns validation data to an individual on either both Y and A (30% of all individuals) or neither, depending on their realisation of B, the misclassified version of the exposure variable A (Table S.1). Tables S.2 and S.3 (see Supplementary online Appendix) give the likelihood contributions for the parameterisation based on predictive values and the closed-form maximum likelihood expressions, respectively. Maximum likelihood estimates can also be found by fitting to the data the saturated logistic regression models of B and Z on L and (B, L), respectively, and to the validation subset the fully saturated logistic regression models of A and Y on (Z, B, L) and , respectively. Estimated weights are then obtained by plugging the maximum likelihood estimates into equation (8). As in the complete data setting where we assumed the weights to be known, evaluating equation (6) then yields an odds ratio of .
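Because a saturated logistic regression fitted by maximum likelihood reproduces the empirical outcome proportions within each covariate cell, the predictive value estimates can be read off by cross-tabulation. The following sketch illustrates this on a toy dataset (in Python for illustration, with an arbitrary invented data generating mechanism standing in for the reinfarction data).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary data standing in for the reinfarction example; the data
# generating mechanism below is arbitrary and for illustration only.
m = 20000
L = rng.binomial(1, 0.3, m)
A = rng.binomial(1, 0.2 + 0.4 * L)                # true exposure
Y = rng.binomial(1, 0.05 + 0.10 * A)              # true outcome
B = np.where(rng.random(m) < 0.90, A, 1 - A)      # misclassified exposure
Z = np.where(rng.random(m) < 0.85, Y, 1 - Y)      # misclassified outcome
R = rng.binomial(1, np.where(B == 1, 0.4, 0.2))   # MAR validation indicator

def cell_props(target, covs, mask):
    """Empirical P(target = 1) within each cell of `covs`, over `mask`.

    Equals the fitted values of a fully saturated ML logistic regression
    of `target` on `covs` in the subset selected by `mask`.
    """
    X = np.column_stack(covs)
    est = {}
    for cell in {tuple(int(v) for v in row) for row in X[mask]}:
        idx = mask & np.all(X == cell, axis=1)
        est[cell] = target[idx].mean()
    return est

pv_A = cell_props(A, (Z, B, L), R == 1)  # estimates of P(A = 1 | Z, B, L)
pv_Y = cell_props(Y, (Z, B, L), R == 1)  # estimates of P(Y = 1 | Z, B, L)
```

With B mostly correct in this toy mechanism, the estimated P(A = 1 | Z, B = 1, L) is high in every cell, as one would expect of a positive predictive value.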
4 Simulations
We performed a series of Monte Carlo simulation experiments to illustrate the implementation of the proposed method, to study its finite sample properties and to compare the method to estimators that ignore the presence of confounding or joint exposure and outcome misclassification. All simulations were conducted using R-3.5.025 on x86_64-pc-linux-gnu platforms of the high performance computer cluster of Leiden University Medical Center.
4.1 Methods
For all 54 simulation experiments, we generated samples of size n according to the data generating mechanisms depicted in the directed acyclic graphs of Figure 1. This multi-step data generating process included generating values on measurement error-free variables, introducing misclassification and allocating validation data to individuals. We applied various estimators to each of the simulation samples to yield, for each scenario, an empirical distribution of each point estimator and corresponding precision estimators. These distributions were then summarised into various performance metrics. These metrics include the empirical bias of the estimator on the log-scale (i.e. the mean estimated log-OR minus the target log-OR across the samples), the empirical standard error (SE) of the estimator on the log-scale (i.e. the square root of the mean squared deviation of the estimated log-OR from the mean log-OR), the empirical mean squared error (MSE) (i.e. the sum of the squared SE and the squared bias), the square root of the mean estimated variance (SSE, sample standard error) and the empirical coverage probability (CP) (i.e. the fraction of simulation runs per scenario where the 95% confidence interval (95% CI) contained the target quantity).
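The performance metrics just defined can be computed from the empirical distribution of estimates in a few lines; the sketch below uses invented numbers (a Python illustration, not output from the actual simulation study).

```python
import numpy as np

# Hypothetical simulation output for one scenario: 2000 estimated log-ORs
# and their estimated standard errors (values invented for illustration).
rng = np.random.default_rng(7)
target = -0.4
est = rng.normal(-0.38, 0.15, 2000)   # estimated log-ORs across samples
se_hat = np.full(2000, 0.15)          # estimated standard errors

bias = est.mean() - target                        # empirical bias (log scale)
se = np.sqrt(((est - est.mean()) ** 2).mean())    # empirical SE
mse = se ** 2 + bias ** 2                         # empirical MSE
sse = np.sqrt((se_hat ** 2).mean())               # SSE: root mean est. variance
lo, hi = est - 1.96 * se_hat, est + 1.96 * se_hat
cp = ((lo <= target) & (target <= hi)).mean()     # empirical coverage
```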
Figure 1.
Data structure for scenarios with misclassification on the outcome only (left) or on both the exposure and outcome (right). Bullet arrowheads represent deterministic relationships.
4.1.1 Distribution of measurement error-free variables
Following Gravel and Platt,14 we consider a setting based on that of “Scenario A” in the work of Setoguchi et al. with slight modifications to the propensity score and outcome models.26 We consider a fully observed covariate vector L = (L1, …, L10) whose distribution coincides with that of h(V), where V = (V1, …, V10) has the multivariate normal distribution with zero means, unit variances and correlations equal to zero except for the correlations between V1 and V5, V2 and V6, V3 and V8, and V4 and V9, which were set to 0.2, 0.9, 0.2, and 0.9, respectively. Function h was defined such that
Thus, sampling from the distribution of L is equivalent to sampling from the multivariate normal distribution with the given parameter values and dichotomising the first, third, fifth, sixth, eighth and ninth elements.
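This sampling step can be sketched as follows (a Python illustration; thresholding the dichotomised elements at zero, their mean, is an assumption, since the exact cut-offs of h are not restated here).

```python
import numpy as np

rng = np.random.default_rng(42)

# Correlation matrix of V: identity except the four stated pairs
# (1-based indices as in the text).
corr = np.eye(10)
for i, j, r in [(1, 5, 0.2), (2, 6, 0.9), (3, 8, 0.2), (4, 9, 0.9)]:
    corr[i - 1, j - 1] = corr[j - 1, i - 1] = r

n = 50000
V = rng.multivariate_normal(np.zeros(10), corr, size=n)

# h: dichotomise elements 1, 3, 5, 6, 8 and 9 (here at zero, the mean).
L = V.copy()
for k in (1, 3, 5, 6, 8, 9):
    L[:, k - 1] = (V[:, k - 1] > 0).astype(float)
```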
Next, let U1 and U2 be binary variables distributed according to the following logistic models:
(10) |
(11) |
The distribution of the binary exposure variable A was defined according to the model
(12) |
Letting U3 be a scalar random variable that is independent of the variables introduced above and uniformly distributed over the interval (0, 1), we defined the counterfactual outcome Y(a), under the intervention setting A to a, as
(13) |
With Y = Y(A), the above implies consistency, conditional exchangeability given L and structural positivity.
4.1.2 Misclassification mechanism
For scenarios with joint misclassification, we defined B = U1 and Z = U2, so that the predictive values take a standard logistic form
(14) |
(15) |
For scenarios without exposure misclassification, we set α11 = 0 and defined B = A and Z = U2, so that
(16) |
(17) |
For simplicity, we removed any marginal dependence of Z on the covariates L and U1 as well as any marginal dependence of U1 on L (cf. equations (10) and (11)). Although models (10) through (15) take a standard logistic form, they do not imply that the corresponding sensitivities and specificities can be written in the same form. We chose the predictive values rather than the sensitivities and specificities to take a standard logistic form so as to ensure correct model specification in the estimation of the weights in the simulation experiments, in which a likelihood approach based on predictive values was adopted (cf. equation (9)).
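A stripped-down sketch of this construction is given below (in Python for illustration): U1 and U2 are generated first and then entered into the exposure and outcome models, so that the predictive values, rather than the sensitivities and specificities, are standard logistic by design. Covariate terms are omitted here for brevity, which is a simplification of the models described above; η0 = α0 = 0 as stated in the text, and the remaining values are taken from a Table 4 scenario with exposure misclassification (scenario 14).

```python
import numpy as np

rng = np.random.default_rng(11)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 10000
eta0, alpha0 = 0.0, 0.0                       # fixed at zero in the text
mu0, alpha11, beta0, beta11, gamma = -2.0, 2.0, -2.0, 2.0, -0.470

U1 = rng.binomial(1, expit(eta0), n)          # model (10), L terms omitted
U2 = rng.binomial(1, expit(mu0), n)           # model (11), L terms omitted
A = rng.binomial(1, expit(alpha0 + alpha11 * U1))             # model (12)
Y = rng.binomial(1, expit(beta0 + gamma * A + beta11 * U2))   # model (13)

B, Z = U1, U2  # misclassified exposure and outcome
# By construction of this sketch, P(A = 1 | B) = expit(alpha0 + alpha11 * B)
# and P(Y = 1 | A, Z) = expit(beta0 + gamma * A + beta11 * Z): the
# predictive values take a standard logistic form.
```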
4.1.3 Missing data mechanism
For these simulations, we stipulated L, B and Z to be observed for all subjects. We consider scenarios where the dataset can be partitioned into a subset with validation data on all misclassified variables (denoted R = 1) and a subset with validation data on neither (R = 0). That is, we simulated data such that subjects have validation data on both A and Y or on neither. Values for the response indicator R were generated according to the following (MAR) model
4.1.4 Scenarios
We initially fixed most parameters of models (12) and (13) at the respective values of “Scenario A” of Setoguchi et al.26 Parameters η0 and α0 were fixed at zero and ξ1, ξ2 and ξ3 at 2, 1 and −1, respectively. The remaining parameters (μ0, α11, β0, β11, γ and ξ0) were allowed to vary across scenarios as per Table 4.
Table 4.
Simulation parameter values used in the Monte Carlo studies.
| Scenarios | Exposure misclassification | μ0 | α11 | β0 | β11 | γ | ξ0 |
|---|---|---|---|---|---|---|---|
| 1a,1b,1c | Absent | −2 | 0 | −3.85 | 2 | −0.431 | −1.5 |
| 2a,2b,2c | Absent | −3 | 0 | −3.85 | 2 | −0.417 | −1.5 |
| 3a,3b,3c | Absent | −2 | 0 | −3.85 | 4 | −0.624 | −1.5 |
| 4a,4b,4c | Absent | −2 | 0 | −3.85 | 2 | −0.431 | −2.5 |
| 5a,5b,5c | Present | −2 | 2 | −3.85 | 2 | −0.431 | −1.5 |
| 6a,6b,6c | Present | −3 | 2 | −3.85 | 2 | −0.417 | −1.5 |
| 7a,7b,7c | Present | −2 | 4 | −3.85 | 2 | −0.431 | −1.5 |
| 8a,8b,8c | Present | −2 | 2 | −3.85 | 4 | −0.624 | −1.5 |
| 9a,9b,9c | Present | −2 | 2 | −3.85 | 2 | −0.431 | −2.5 |
| 10a,10b,10c | Absent | −2 | 0 | −2 | 2 | −0.470 | −1.5 |
| 11a,11b,11c | Absent | −3 | 0 | −2 | 2 | −0.445 | −1.5 |
| 12a,12b,12c | Absent | −2 | 0 | −2 | 4 | −0.641 | −1.5 |
| 13a,13b,13c | Absent | −2 | 0 | −2 | 2 | −0.470 | −2.5 |
| 14a,14b,14c | Present | −2 | 2 | −2 | 2 | −0.470 | −1.5 |
| 15a,15b,15c | Present | −3 | 2 | −2 | 2 | −0.445 | −1.5 |
| 16a,16b,16c | Present | −2 | 4 | −2 | 2 | −0.470 | −1.5 |
| 17a,17b,17c | Present | −2 | 2 | −2 | 4 | −0.641 | −1.5 |
| 18a,18b,18c | Present | −2 | 2 | −2 | 2 | −0.470 | −2.5 |
Note: Scenarios indicated with ‘a’ have n = 10,000, those with ‘b’ have n = 5000 and those with ‘c’ have n = 1000.
Scenarios differ by sample size n, the presence of exposure misclassification, the prevalence of the potentially misclassified outcome (via μ0), the associations between the exposure and outcome on the one hand and their respective misclassified versions on the other (via α11 and β11), outcome model intercept β0, the conditional log-OR γ, or the size of the validation subset (via ξ0). Based on an iterative Monte Carlo integration approach,27 we specified γ so as to keep the target marginal log odds ratio at −0.4.
4.1.5 Estimators
We considered five estimators of the OR for the marginal exposure-outcome effect: a crude estimator (labeled Crude) that ignores both confounding and misclassification of any variable, a misclassification-naive estimator (labeled PS) that addresses confounding through IPW, a complete case analysis (labeled CCA) in which IPW is applied only to the subset of subjects with validation data, the Gravel and Platt estimator (labeled GP) that ignores exposure misclassification, and the method proposed in this article (labeled IPWM). Both GP and IPWM are implemented using the R function mecor::ipwm,28,29 which in the simulation settings considered uses iteratively reweighted least squares via the stats::glm function for maximum likelihood estimation. GP coincides with the approach of Gravel and Platt where it concerns point estimation, but the two differ in the construction of confidence intervals: unlike Gravel and Platt,14 we used a non-parametric rather than a semi-parametric bootstrap procedure for estimating standard errors and constructing confidence intervals, since semi-parametrically generating response indicators would preferably require modelling of (or making additional assumptions about) the missing data mechanism. In particular, to obtain a bootstrap dataset, we defined the record of a unit as their observed data and response indicators, imposed a uniform distribution across all records in the original dataset, and drew independently as many records from this distribution as the total number of records in the original dataset. For all methods and each original dataset, we drew 1000 bootstrap datasets for variance estimation and the construction of percentile confidence intervals.
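The resampling scheme just described can be sketched as follows (a Python illustration; `estimate` is a hypothetical stand-in for any of the point estimators, applied here to arbitrary simulated data).

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate(data):
    # Placeholder point estimator: the mean of the first column.
    return data[:, 0].mean()

def bootstrap_ci(data, n_boot=1000, level=0.95):
    """Non-parametric bootstrap: resample whole records with replacement."""
    n = data.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # uniform draw over records
        stats[b] = estimate(data[idx])
    alpha = (1 - level) / 2
    ci = np.quantile(stats, [alpha, 1 - alpha])  # percentile CI
    return ci, stats.std(ddof=1)                 # CI and bootstrap SE

data = rng.normal(0.0, 1.0, size=(500, 3))       # each row is one record
(lo, hi), se_boot = bootstrap_ci(data)
```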
All estimators are based on a function of the estimated outcome probability P1 in the exposed group and the estimated outcome probability P0 in the unexposed group. However, since P1 and P0 may take a value of 0 or 1, the crude odds ratio need not exist. In contrast to what is often (implicitly) done in simulation studies, i.e. studying the properties of the estimators after conditioning on datasets where the estimator is defined, we first replace P1 and P0 by versions that are shrunk slightly towards 0.5, governed by a large positive number s (here set to 106), and then regard the resulting odds ratio as the estimator of the OR for the exposure-outcome association. This ensures that the estimator is always defined and effectively shrinks the outcome probabilities towards 0.5 and the OR towards 1 (online Supplementary Appendix II).
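One concrete shrinkage mapping with the stated properties is sketched below (an illustration of the idea, not necessarily the authors' exact definition): it keeps probabilities strictly inside (0, 1), fixes 0.5, and for large s leaves non-degenerate estimates essentially unchanged.

```python
# Always-defined odds ratio via shrinkage towards 0.5. The mapping
# p -> (s*p + 1) / (s + 2) is one concrete choice with the properties
# described in the text (an assumption, not necessarily the authors'
# exact definition).
def shrink(p, s=10**6):
    return (s * p + 1) / (s + 2)

def odds(p):
    return p / (1 - p)

def or_shrunk(p1, p0, s=10**6):
    return odds(shrink(p1, s)) / odds(shrink(p0, s))

# Defined even when a group's outcome probability is estimated as 0 or 1:
print(or_shrunk(0.0, 0.1))  # small but finite, rather than exactly 0
```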
For PS and CCA, we used a logistic regression of B and A, respectively, on covariates L1 through L10 as main effects to estimate the propensity scores. Taking the crude OR for the association between B and Z (PS) or A and Y (CCA) over the data weighted by the reciprocal of the propensity scores provided an estimate of the target OR. R code for the methods GP and IPWM is given in online Supplementary Appendix III.
4.2 Results
The treatment assignment mechanism detailed above resulted in average exposure rates ranging from 17% to 51%, whereas average true outcome rates ranged from 3% to 22%. Across all simulation studies, the average rate of the potentially misclassified outcome ranged from 6% to 18%. Across all simulation studies with exposure misclassification, exposure and joint misclassification rates ranged from 16% to 33% and from 2% to 6%, respectively. Approximately 16% to 32% of subjects were allocated validation data.
The results on the performance of the various methods in simulations studies 1–9 are provided in Table 5 (see Supplementary Table S.4 for the results on all scenarios).
Table 5.
Results for simulation studies 1–9b on the performance of different causal estimators in various scenarios of confounding and misclassification in exposure and outcome.
|          | Crude |     |     |    |     |    |
|----------|-------|-----|-----|----|-----|----|
| Scenario | Bias  | BSE | MSE | SE | SSE | CP |
| 1b | 0.004 | 0.169 | 0.119 | 0.118 | 0.122 | |
| 2b | 0.006 | 0.179 | 0.183 | 0.184 | 0.492 | |
| 3b | 0.004 | 0.169 | 0.117 | 0.118 | 0.116 | |
| 4b | 0.004 | 0.174 | 0.117 | 0.118 | 0.102 | |
| 5b | 0.003 | 0.169 | 0.090 | 0.088 | 0.007 | |
| 6b | 0.004 | 0.183 | 0.132 | 0.134 | 0.133 | |
| 7b | 0.003 | 0.164 | 0.086 | 0.088 | 0.009 | |
| 8b | 0.003 | 0.164 | 0.086 | 0.088 | 0.005 | |
| 9b | 0.003 | 0.166 | 0.089 | 0.088 | 0.005 | |
PS: Propensity score method ignoring misclassification; CCA: complete case analysis; GP: Gravel and Platt estimator ignoring exposure misclassification, consistent with the methodology of Gravel and Platt for point (but not for variance) estimation14; IPWM: inverse probability weighting method for confounding and joint exposure and outcome misclassification; BSE: estimated standard error for the bias due to Monte Carlo error; SE: empirical standard error; SSE: sample standard error; CP: empirical coverage probability. In all scenarios, the true marginal log OR (estimand) was −0.4.
As expected, Crude, PS and CCA clearly showed bias with respect to the target log OR of −0.4. The bias associated with restricting the analysis to records with validation data is likely brought on to a large extent by collider stratification, with R acting as the collider here (cf. Figure 1). Both Crude and PS indicated a null effect, as one would anticipate in view of the marginal and L-conditional independence of B and Z implied by the simulation set-up. The empirical coverage probabilities, although low for both estimators, were similar to or substantially larger for PS as compared with Crude. Paralleling this, Crude, whose (implicit) propensity score model is inherently at least as parsimonious, yielded similar or smaller empirical and sample standard errors as compared with PS. With the average fraction of subjects with validation data being as low as 16% (in scenarios with low ξ0) to 32%, it is not surprising that CCA was subject to the largest degree of variability.
The results for the IPWM approach are generally favourable for large samples and in line with its theoretical (large sample) properties. For scenarios with smaller samples (scenarios 1c, 2c and 4c, 6c and 9c in particular), however, we observed considerable bias (see Supplementary Table S.4). Comparing CCA with IPWM, we note a strong linear association between the methods in terms of the absolute within-method differences in estimated bias between scenarios of size 10,000 (scenarios labeled ‘a’) and the respective scenarios of size 1000 (scenarios labeled ‘c’) (Pearson correlation 0.997). Note that the results for GP and IPWM are identical for scenarios labeled 1–4 and 10–13 since the methods are equivalent in terms of point estimation in the absence of exposure misclassification. In all other scenarios, i.e. scenarios for which GP was not developed, GP performed substantially worse than IPWM. The non-zero, albeit relatively small, systematic deviations of the IPWM point estimates from the target −0.4, notably the estimated bias of −0.097 (scenario 2b), may be attributable in part to the outcome being rare (with prevalence ranging from 3% to 8% across scenarios labeled 1–9). This is indicated by the superior performance of IPWM in scenarios where the outcome is more prevalent (cf. scenarios labeled 1–9b versus 10–18b, which have prevalence up to 22%). A similar observation was made by Gravel and Platt.14
The standard errors for GP and IPWM were noticeably higher than those of Crude and PS, which is unsurprising in view of the discrepancies in the number of estimated parameters. As expected, increasing the sample size, the true outcome rate (via β0) or both led to a decrease in the variability of IPWM (cf. Table 4 and Supplementary Table S.4). However, despite the large discrepancies between SSE and SE for some scenarios, the empirical coverage probabilities of IPWM were close to the nominal level of 0.95, except for scenarios 1c, 2c and 4c, where we observed considerable bias.
5 Discussion
The analysis of epidemiologic data is often complicated by the presence of confounding and misclassification of exposure and outcome variables. In this paper, we propose a new estimator of the marginal odds ratio in the presence of confounding and joint misclassification of the exposure and outcome variables. In simulation studies, this weighting estimator showed promising finite sample performance, reducing bias and mean squared error as compared with simpler methods.
The proposed IPWM estimator is an extension of the inverse probability weighting estimator recently proposed by Gravel and Platt (GP), which addresses misclassification of the outcome only.14 IPWM and GP are (mathematically) equivalent when the exposure is (assumed to be) measured without error.
Like the Gravel and Platt approach, IPWM relies on estimates of sensitivity and specificity or of positive and negative predictive values for the misclassified variables. In this paper, we used an internal validation approach, in which a subset of subjects receives error-free (‘gold standard’) measurements of the outcome, the exposure, or both. However, we anticipate that in some settings the likelihood may not be fully identifiable from the data at hand. In these settings, it may be possible to incorporate external rather than internal information on the misclassification rates, possibly through a Bayesian approach with prior assumptions about the misclassification probabilities. When validation data are external, however, it may be necessary to assume that misclassification is independent of the covariates L, because external studies seldom consider the same covariates as the main study.30 External validation approaches also require the assumption that the misclassification parameters targeted in the validation sample are transportable to the main study.
In the absence of internal and external validation data, it is possible to conduct a sensitivity analysis within the weighting framework. Formula (8) for the weights can readily be used in a sensitivity analysis in which the terms describing the distribution of the true exposure and outcome variables given the observed data (the positive and negative predictive values) serve as sensitivity parameters. The models for the predictive values can take complex forms, however, which complicates the analysis and the presentation of results.
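To make this concrete, the following Python sketch shows one way such a sensitivity analysis could be organised: assumed predictive values are used to form, for each record, the probability of the true exposure given its observed classification, and these probabilities are combined with inverse probability of exposure weights for confounding control. The simulated data, the constant predictive values and the weight construction are all stylised assumptions for illustration; this is not the paper’s formula (8).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Stylised observed data: confounder L, misclassified exposure B,
# misclassified outcome Z (the true A and Y are unobserved).
L = rng.binomial(1, 0.5, n)
B = rng.binomial(1, 0.3 + 0.2 * L)
Z = rng.binomial(1, 0.1 + 0.1 * B)

# Sensitivity parameters (assumed, not estimated): predictive values for the
# true exposure given its observed classification. Here they are constant for
# simplicity; in practice they could be modelled as functions of L, B and Z.
ppv_exposure, npv_exposure = 0.95, 0.90  # P(A=1 | B=1), P(A=0 | B=0)

def prob_true_given_observed(obs, ppv, npv):
    """P(true = 1 | observed classification) under assumed predictive values."""
    return np.where(obs == 1, ppv, 1.0 - npv)

p_A1 = prob_true_given_observed(B, ppv_exposure, npv_exposure)

# Crude stratified stand-in for the propensity P(A=1 | L): the average
# reconstructed exposure probability within each level of L.
p_A1_given_L = np.array([p_A1[L == l].mean() for l in (0, 1)])[L]

# Stylised weight: the expected inverse probability of exposure weight,
# averaged over the assumed distribution of the true exposure.
w = p_A1 / p_A1_given_L + (1.0 - p_A1) / (1.0 - p_A1_given_L)

print(w[:5])
```

In an actual sensitivity analysis, the assumed predictive values would be varied over a plausible range and the effect estimate recomputed for each choice, tracing out how conclusions depend on the misclassification assumptions.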
If internal validation data are available, the subjects with validation data need not form a completely random subset. The proposed method, IPWM, was developed under the assumption that validation data are allocated in an “ignorable” fashion.23 In practice, researchers may have limited control over the allocation mechanism. For instance, it is conceivable that individuals with specific indications (e.g. with a particular realisation of L, B or Z) are practically ineligible for a double measurement of the exposure (A and B) and outcome (Y and Z). Further, the estimator allows validation subjects to receive only the double exposure measurement or only the double outcome measurement. We simulated data such that subjects had validation data on both the exposure and outcome variables or on neither. Although this condition may greatly simplify analysis and enhance efficiency, it need not hold in practice. An interesting scenario is that in which subjects have validation data on at most one variable, i.e. on the exposure variable or the outcome variable but not both. In this case, valid estimation would require additional modelling assumptions; for example, the error-free outcome variable cannot then be regressed directly on the error-free exposure variable.
To accommodate settings where validation data allocation is not completely at random, we deviated from the semi-parametric bootstrap procedure for variance estimation proposed by Gravel and Platt. Instead, we used a non-parametric procedure that requires fewer assumptions about the validation subset sampling mechanism and that showed good performance in our simulations.
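The non-parametric procedure resamples whole records, validation indicator included, so the sampling of the validation subset is reflected in the bootstrap replications. A minimal Python sketch on assumed toy data, with a crude log odds ratio standing in for the full IPWM fit:

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 1000

# Toy analysis sample: exposure A, outcome Y, and a validation indicator R
# (1 if the subject received the error-free measurements).
A = rng.binomial(1, 0.4, n)
Y = rng.binomial(1, 0.2 + 0.2 * A)
R = rng.binomial(1, 0.15, n)

def estimator(A, Y, R):
    """Placeholder for the weighting estimator: a crude log odds ratio of Y on A."""
    p1 = Y[A == 1].mean()
    p0 = Y[A == 0].mean()
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))

# Non-parametric bootstrap: resample whole records (including R), so no
# assumption is made about how the validation subset was drawn.
B_reps = 500
boot = np.empty(B_reps)
for b in range(B_reps):
    idx = rng.integers(0, n, n)
    boot[b] = estimator(A[idx], Y[idx], R[idx])

se = boot.std(ddof=1)                      # bootstrap standard error
ci = np.quantile(boot, [0.025, 0.975])     # percentile confidence interval
print(round(se, 3), ci.round(3))
```

With the actual IPWM estimator in place of the placeholder, each replication would refit all weight models on the resampled data.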
Whilst we have discussed the conditions under which the proposed method consistently estimates, or at least identifies, the target quantity, these assumptions may be untenable in particular settings. In particular, an infallible measurement tool for the exposure and outcome that can be applied to a subset of the data need not always exist. Robustness to deviations from infallibility is an interesting and important direction for further research, especially where considerable uncertainty about the tenability of the assumptions is difficult to incorporate in the analysis. An obvious and flexible alternative to IPWM is to multiply impute the missing values, including the absent error-free measurements, before implementing IPW (MI + IPW). Although MI + IPW and IPWM may be comparable in terms of their assumptions, it is as yet unclear how they behave under assumption violations such as misspecification of the outcome model.
An advantageous property of MI + IPW is that it can easily accommodate missing covariate values. Other alternatives that can accommodate missing covariates were recently developed by Shu and Yi.31 Their weighting estimators simultaneously address confounding, misclassification of the outcome (but not of the exposure) and measurement error in the covariates under a classical additive measurement error model. The methods can be implemented using validation data or repeated measurements and use a simple misclassification model (in which the outcome surrogate is independent of the exposure and covariates given the target outcome) that is well suited to sensitivity analyses.
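Under a simple nondifferential outcome misclassification model of this kind, the observed and true outcome prevalences are linked through sensitivity and specificity alone, which is what makes such models convenient for sensitivity analyses. A minimal sketch, with assumed sensitivity and specificity values:

```python
# Under nondifferential outcome misclassification with sensitivity se and
# specificity sp, the observed prevalence satisfies
#   P(Z=1) = se * P(Y=1) + (1 - sp) * (1 - P(Y=1)),
# which inverts (provided se + sp > 1) to the correction below.
def corrected_prevalence(p_z: float, se: float, sp: float) -> float:
    return (p_z - (1.0 - sp)) / (se + sp - 1.0)

# Assumed values for illustration: observed prevalence 0.12, se = 0.90, sp = 0.95.
p_y = corrected_prevalence(0.12, se=0.90, sp=0.95)
print(round(p_y, 3))  # → 0.082
```

Varying `se` and `sp` over plausible ranges then shows how sensitive the corrected quantity is to the assumed misclassification rates.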
Another interesting area for further research concerns settings where researchers do have control over who is referred for further testing with the assumed infallible measurement tool(s). An obvious choice is a completely-at-random strategy (simple random sampling). However, other referral (sampling) strategies exist, and it is not clear which strategy yields the most favourable estimator properties in a given setting.
In summary, we have developed an extension of an existing method that allows for valid estimation of a marginal causal OR in the presence of confounding and a commonly ignored and misunderstood source of bias: joint exposure and outcome misclassification. The R function mecor::ipwm has been made available to facilitate implementation.28,29
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_0962280220960172 for A weighting method for simultaneous adjustment for confounding and joint exposure-outcome misclassifications by Bas BL Penning de Vries, Maarten van Smeden and Rolf HH Groenwold in Statistical Methods in Medical Research
Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: RHHG was funded by the Netherlands Organization for Scientific Research (NWO-Vidi project 917.16.430). The views expressed in this article are those of the authors and not necessarily any funding body.
Supplemental material: Supplemental material for this article is available online.
ORCID iDs
Bas BL Penning de Vries https://orcid.org/0000-0001-9989-7732
Maarten van Smeden https://orcid.org/0000-0002-5529-1541
References
- 1. Kristensen P. Bias from nondifferential but dependent misclassification of exposure and outcome. Epidemiology 1992; 3: 210–215.
- 2. Brenner H, Savitz DA, Gefeller O. The effects of joint misclassification of exposure and disease on epidemiologic measures of association. J Clin Epidemiol 1993; 46: 1195–1202.
- 3. Vogel C, Brenner H, Pfahlberg A, et al. The effects of joint misclassification of exposure and disease on the attributable risk. Stat Med 2005; 24: 1881–1896.
- 4. Jurek AM, Greenland S, Maldonado G. Brief report: how far from non-differential does exposure or disease misclassification have to be to bias measures of association away from the null? Int J Epidemiol 2008; 37: 382–385.
- 5. VanderWeele TJ, Hernán MA. Results on differential and dependent measurement error of the exposure and the outcome using signed directed acyclic graphs. Am J Epidemiol 2012; 175: 1303–1310.
- 6. Brooks DR, Getz KD, Brennan AT, et al. The impact of joint misclassification of exposures and outcomes on the results of epidemiologic research. Curr Epidemiol Rep 2018; 5: 166–174.
- 7. Dawid A. Conditional independence in statistical theory. J R Stat Soc Ser B (Methodol) 1979; 41: 1–31.
- 8. Marcum ZA, Sevick MA, Handler SM. Medication nonadherence: a diagnosable and treatable medical condition. JAMA 2013; 309: 2105–2106.
- 9. Culver AL, Ockene IS, Balasubramanian R, et al. Statin use and risk of diabetes mellitus in postmenopausal women in the Women’s Health Initiative. Arch Intern Med 2012; 172: 144–152.
- 10. Leong A, Dasgupta K, Bernatsky S, et al. Systematic review and meta-analysis of validation studies on a diabetes case definition from health administrative records. PLoS One 2013; 8: e75256.
- 11. Ni J, Leong A, Dasgupta K, et al. Correcting hazard ratio estimates for outcome misclassification using multiple imputation with internal validation data. Pharmacoepidemiol Drug Saf 2017; 26: 925–934.
- 12. Jurek AM, Maldonado G, Greenland S, et al. Exposure-measurement error is frequently ignored when interpreting epidemiologic study results. Eur J Epidemiol 2006; 21: 871–876.
- 13. Brakenhoff TB, Mitroiu M, Keogh RH, et al. Measurement error is often neglected in medical literature: a systematic review. J Clin Epidemiol 2018; 98: 89–97.
- 14. Gravel CA, Platt RW. Weighted estimation for confounded binary outcomes subject to misclassification. Stat Med 2018; 37: 425–436.
- 15. Babanezhad M, Vansteelandt S, Goetghebeur E. Comparison of causal effect estimators under exposure misclassification. J Stat Plan Inference 2010; 140: 1306–1319.
- 16. Braun D, Gorfine M, Parmigiani G, et al. Propensity scores with misclassified treatment assignment: a likelihood-based adjustment. Biostatistics 2017; 18: 695–710.
- 17. Shu D, Yi GY. Causal inference with measurement error in outcomes: bias analysis and estimation methods. Stat Methods Med Res 2019; 28: 2049–2068.
- 18. Neyman J, Iwaszkiewicz K, Kolodziejczyk St. Statistical problems in agricultural experimentation. Suppl J R Stat Soc 1935; 2: 107–180.
- 19. Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974; 66: 688–701.
- 20. Holland P. Statistics and causal inference. J Am Stat Assoc 1986; 81: 945–960.
- 21. Holland P. Causal inference, path analysis, and recursive structural equations models. Sociol Methodol 1988; 18: 449–484.
- 22. Pearl J. Causality: models, reasoning and inference. New York, NY: Cambridge University Press, 2009.
- 23. Rubin D. Inference and missing data. Biometrika 1976; 63: 581–592.
- 24. Tang L, Lyles RH, Ye Y, et al. Extended matrix and inverse matrix methods utilizing internal validation data when both disease and exposure status are misclassified. Epidemiol Methods 2013; 2: 49–66.
- 25. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2018, https://www.R-project.org/ (accessed 20 July 2020).
- 26. Setoguchi S, Schneeweiss S, et al. Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf 2008; 17: 546–555.
- 27. Austin PC, Stafford J. The performance of two data-generation processes for data with specified marginal treatment odds ratios. Commun Stat Simul Comput 2008; 37: 1039–1051.
- 28. Nab L. mecor: measurement error corrections, 2019. R package version 0.1.0. https://github.com/LindaNab/mecor.git (accessed 30 January 2020).
- 29. Nab L, Groenwold RH, Welsing PM, et al. Measurement error in continuous endpoints in randomised trials: problems and solutions. 2018.
- 30. Lyles RH, Tang L, Superak HM, et al. Validation data-based adjustments for outcome misclassification in logistic regression: an illustration. Epidemiology 2011; 22: 589.
- 31. Shu D, Yi GY. Weighted causal inference methods with mismeasured covariates and misclassified outcomes. Stat Med 2018; 38: 1835–1854.