Stat Methods Med Res. 2023 Jan 24;32(4):691–711. doi: 10.1177/09622802221146308

Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders

Grace Y Yi 1,2, Li-Pang Chen 1,3
PMCID: PMC10119903  PMID: 36694932

Abstract

In the framework of causal inference, the inverse probability weighting estimation method and its variants have been commonly employed to estimate the average treatment effect. Such methods, however, are challenged by the presence of irrelevant pre-treatment variables and measurement error. Ignoring these features and naively applying the usual inverse probability weighting estimation procedures may typically yield biased inference results. In this article, we develop an inference method for estimating the average treatment effect with those features taken into account. We establish theoretical properties for the resulting estimator and carry out numerical studies to assess the finite sample performance of the proposed estimator.

Keywords: Causal inference, inverse probability weight, measurement error, propensity score, simulation–extrapolation, variable selection

1. Introduction

Causal inference offers an important paradigm for answering a multitude of questions arising from a broad variety of areas such as healthcare, epidemiological studies, and social sciences. With observational studies, various estimation methods have been developed based on propensity scores, where the propensity score is defined as the conditional probability for an individual to receive a treatment, given pre-treatment confounders. 1 In particular, the inverse probability weighting estimation methods are widely used due to their easy implementation and transparent interpretation (e.g. Rosenbaum and Rubin, 2 Lunceford and Davidian, 3 and Bang and Robins 4 ). They adjust for the effects of measured confounders by re-weighting the data as if the weighted data were collected from randomized controlled trials.

The validity of those methods requires the treatment model to be correctly specified so that the propensity scores can be consistently estimated. To accommodate possible misspecification of the treatment model, Bang and Robins 4 developed a doubly robust estimation method. This protection, nevertheless, does not come for free: it comes at the price of correctly specifying the outcome model to characterize the relationship between the outcome and the confounders.

If correctly specifying the outcome model is not possible, postulating a feasible treatment model becomes necessary, which basically requires no omission of relevant pre-treatment confounders. Furthermore, to ensure that the no-unmeasured-confounders assumption is reasonable, one may tend to include as many variables as possible when building a treatment model. However, naively including all available variables in the treatment model would degrade inference results or even yield erroneous conclusions. In applications, some collected variables are unimportant in explaining the treatment variable, yet it is often unclear which variables should or should not be included in the treatment model, and we often rely on subject-matter knowledge to decide which variables are to be used for building a treatment model (e.g. Westreich et al. 5 ). It is therefore desirable to develop an analytical variable selection procedure for building a suitable treatment model.

Utilizing variable selection techniques in causal inference has recently attracted interest. For example, Shortreed and Ertefaie 6 proposed the outcome-adaptive lasso method to select covariates associated with outcomes. Ertefaie et al. 7 developed a penalized objective function employing both the outcome and treatment models to perform variable selection. Assuming spike-and-slab priors for the covariate coefficients, Koch et al. 8 explored a Bayesian method to estimate causal effects with the outcome and treatment models employed simultaneously. Ghosh et al. 9 proposed the “multiply impute, then select” approach by employing the lasso method. Vansteelandt et al. 10 utilized the focused information criterion to select important confounders.

While those methods are useful for formulating treatment models, their development hinges on an implicit but subtle condition that the variables need to be precisely measured. This assumption, however, is commonly violated in applications. Measurement error arises inevitably and ubiquitously for various reasons, such as the impossibility of measuring a long-term average quantity, inevitable recall bias in answering a questionnaire, the unaffordability of precise measurements, unwillingness to answer sensitive questions, and so on. Causal inference with measurement error has attracted extensive attention (e.g. Imai and Yamamoto 11 and Edwards et al. 12 ). For example, McCaffrey et al. 13 and Shu and Yi 14,15 proposed methods for adjusting for measurement error effects in causal inference. To ameliorate measurement error effects, Kyle et al. 16 explored the simulation–extrapolation (SIMEX) method for marginal structural models with time-varying covariates. These methods, however, do not consider variable selection to exclude unimportant pre-treatment variables.

In this article, we consider inverse probability weighting estimation in the presence of measurement error and unimportant pre-treatment variables. We focus on the circumstance where no prior knowledge is available to guide us to select relevant variables for building the treatment model and we rely on using variable selection techniques to do so. To provide valid estimation results, we propose a simulation-based estimation method for the average treatment effect. Variable selection for building the treatment model and accommodation of measurement error effects are simultaneously conducted in inferential procedures. Theoretical results for the proposed method are established rigorously. Our method has a broad scope of applications. It does not require modeling of the outcome process to specify the relationship of the outcome variable with confounders. The proposed method applies to any parametric models used to describe the treatment model. The implementation procedure is straightforward.

The rest of this article is organized as follows. Section 2 introduces the notation and inference framework. Section 3 reports the proposed method together with theoretical justifications. Simulation studies for the proposed method are included in Section 4 and an application of the proposed method is described in Section 5. Discussions and extensions are presented in the last section.

2. Notation and framework

For an individual, let T be the observed binary treatment variable, with T=1 if treated and T=0 if untreated. Let $Y(1)$ denote the potential outcome that would have been observed had the subject been treated, and let $Y(0)$ represent the potential outcome that would have been observed had the subject been untreated. We are interested in estimating the average treatment effect (ATE), $\tau_0 \equiv E\{Y(1)\} - E\{Y(0)\}$. While T is generically termed a treatment indicator here, its practical meaning varies in applications. For example, T can represent the exposure to a certain condition, as is the case in the example presented in Section 5.

Let Y represent the observed outcome for an individual, which is assumed to be linked with the potential outcomes via $Y = TY(1) + (1-T)Y(0)$. This assumption, called consistency, basically says that the potential outcome under the observed treatment status equals the observed outcome.

In a randomized trial, the treatment indicator T is determined randomly, so the potential outcomes $\{Y(1), Y(0)\}$ and T are independent. Consequently, the sample averages of the response measurements for the treated and untreated groups can be directly used to estimate $\tau_0$ by taking their difference. However, in a non-randomized trial or an observational study, the relationship $E\{Y(j)\} = E\{Y(j) \mid T=j\}$ no longer holds for $j = 0, 1$, due to the presence of confounders that are not controlled between the treated and untreated groups.

Let W be the vector of pre-treatment covariates or confounders. We assume that W contains all confounders associated with both the potential outcomes and the treatment, and that only some components of W are predictive of T. That is, W can be written as $W = (W_{\mathrm{I}}^T, W_{\mathrm{II}}^T)^T$ so that $\{Y(1), Y(0)\}$ and T are conditionally independent, given W; and $P(T=1 \mid W) = P(T=1 \mid W_{\mathrm{I}})$. In other words, $W_{\mathrm{I}}$ includes the variables important for predicting T, whereas $W_{\mathrm{II}}$ contains the variables unimportant for predicting T.

The introduction of W allows us to express $E\{Y(j)\}$ using the observed outcome Y and the treatment indicator T. Specifically, let $\pi \equiv P(T=1 \mid W_{\mathrm{I}})$ denote the propensity score for an individual; then, with the consistency assumption, we obtain that

$$E\left(\frac{TY}{\pi}\right) = E\left[\frac{T\{TY(1)+(1-T)Y(0)\}}{\pi}\right] = E\left\{\frac{TY(1)}{\pi}\right\} = E\left[\frac{E\{TY(1)\mid W\}}{\pi}\right] = E\left[\frac{P(T=1\mid W)\,E\{Y(1)\mid W\}}{\pi}\right] = E\left[\frac{P(T=1\mid W_{\mathrm{I}})\,E\{Y(1)\mid W\}}{\pi}\right] = E[E\{Y(1)\mid W\}] = E\{Y(1)\},$$

where the second step follows from $T^2 = T$ and $T(1-T) = 0$, the fourth step is due to the assumption of no unmeasured confounders, and the fifth step results from the nature of $W_{\mathrm{I}}$.

Similarly, we obtain that $E\{(1-T)Y/(1-\pi)\} = E\{Y(0)\}$, and thus $E\{TY/\pi - (1-T)Y/(1-\pi)\} = \tau_0$. These identities provide a basis for constructing a consistent estimator of $\tau_0$ using the observed data, as described below. Here the positivity assumption $0 < \pi < 1$ is implicitly made so that the denominators are meaningful.

2.1. Consistent estimator

To estimate $\tau_0$, suppose we have a sample $\{(T_i, Y_i, W_i) : i = 1, \ldots, n\}$ of size n, where $T_i$, $Y_i$, and $W_i = (W_{\mathrm{I}i}^T, W_{\mathrm{II}i}^T)^T$ represent the corresponding variables for subject i in the sample, $i = 1, \ldots, n$. To use the confounder information together with the sample response measurements obtained from observational studies, Rosenbaum and Rubin 1 initiated the propensity score method to balance the distribution of the covariates between the treated and untreated groups. For $i = 1, \ldots, n$, let

$$\pi_i = P(T_i = 1 \mid W_{\mathrm{I}i})$$

denote the conditional probability for subject i to receive the treatment, given the predictive covariates WIi ; this probability is called the propensity score of subject i.

If $\pi_i$ can be consistently estimated, say by $\hat{\pi}_i$, for $i = 1, \ldots, n$, then a consistent estimator of $\tau_0$ can be constructed by

$$\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n}\frac{T_i Y_i}{\hat{\pi}_i} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1-T_i)Y_i}{1-\hat{\pi}_i}. \quad (1)$$

To mitigate unstable numerical results caused by extreme π^i close to 0 or 1, Lunceford and Davidian 3 proposed a stable version of (1), which is also consistent:

$$\hat{\tau} = \left(\sum_{i=1}^{n}\frac{T_i}{\hat{\pi}_i}\right)^{-1}\sum_{i=1}^{n}\frac{T_i Y_i}{\hat{\pi}_i} - \left(\sum_{i=1}^{n}\frac{1-T_i}{1-\hat{\pi}_i}\right)^{-1}\sum_{i=1}^{n}\frac{(1-T_i)Y_i}{1-\hat{\pi}_i}. \quad (2)$$

We use (2) for the following development.
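For concreteness, the stabilized estimator (2) can be coded in a few lines of R; the function below is a minimal sketch in which `trt`, `y`, and `pihat` are hypothetical vectors holding the values of $T_i$, $Y_i$, and $\hat{\pi}_i$.

```r
# Stabilized inverse probability weighting estimator of the ATE, as in (2).
# trt: 0/1 treatment indicators; y: observed outcomes; pihat: estimated
# propensity scores; all are vectors of length n (names are illustrative).
ipw_ate <- function(trt, y, pihat) {
  w1 <- trt / pihat              # weights for the treated group
  w0 <- (1 - trt) / (1 - pihat)  # weights for the untreated group
  sum(w1 * y) / sum(w1) - sum(w0 * y) / sum(w0)
}
```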

2.2. Variable selection and measurement error

The consistency of the estimator (2) hinges on the consistent estimation of the propensity score $\pi_i$ for $i = 1, \ldots, n$. While the propensity score $\pi_i$ is assumed to be determined by $W_{\mathrm{I}i}$ but not $W_{\mathrm{II}i}$, in building the treatment model for $\pi_i$ it is usually unclear which components of $W_i$ should be included in $W_{\mathrm{I}i}$, though subject matter knowledge is often utilized to help. Here we consider the circumstance where no prior knowledge is available to guide us in forming $W_{\mathrm{I}i}$, and we rely on variable selection techniques to decide which variables in $W_i$ are to be selected to form $W_{\mathrm{I}i}$. Without loss of generality, the components of $W_i$ are assumed to be standardized to have mean 0 and variance 1.

Suppose that we start with postulating the propensity score πi using a parametric model, denoted g(Wi;γ) , with all the components WIi and WIIi in Wi included:

$$\pi_i = g(W_{\mathrm{I}i}, W_{\mathrm{II}i}; \gamma), \quad (3)$$

where $g(\cdot)$ is a link function, such as the logit, probit, or complementary log–log function; and $\gamma = (\gamma_0, \gamma_{\mathrm{I}}^{*T}, \gamma_{\mathrm{II}}^T)^T$ is the vector of regression parameters, with $\gamma_0$ representing the intercept and $\gamma_{\mathrm{I}}^*$ and $\gamma_{\mathrm{II}}$ standing for the parameters corresponding to $W_{\mathrm{I}i}$ and $W_{\mathrm{II}i}$, respectively. The importance of $W_{\mathrm{I}i}$ and the irrelevance of $W_{\mathrm{II}i}$ with respect to $T_i$ are then reflected by $\gamma_{\mathrm{I}}^* \neq 0$ and $\gamma_{\mathrm{II}} = 0$. Here and elsewhere, we loosely use 0 to denote a zero vector whose dimension is inferred from the context. In other words, to identify salient variables in $W_i$, one may apply variable selection techniques to model (3) and then examine the estimates of the components of $\gamma$ to determine which components of $W_i$ are to be selected to form $W_{\mathrm{I}i}$. For ease of exposition, we write $\gamma_{\mathrm{I}} = (\gamma_0, \gamma_{\mathrm{I}}^{*T})^T$ and $\gamma = (\gamma_{\mathrm{I}}^T, \gamma_{\mathrm{II}}^T)^T$; let p denote the dimension of $(\gamma_{\mathrm{I}}^{*T}, \gamma_{\mathrm{II}}^T)^T$; and we also write $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_p)^T$.

The estimation of γ may proceed as follows. Let Si(γ;Wi) denote the likelihood score function, or more generally, an unbiased estimating function, derived from model (3) to reflect the contribution from subject i in the sample. If the true value of Wi is available, one may solve

$$\sum_{i=1}^{n} S_i(\gamma; W_i) = 0 \quad (4)$$

for γ to obtain a consistent estimator under regularity conditions.

In applications, some variables in $W_i$ are error-prone and their accurate measurements are not available for all subjects in the study. To characterize this feature, we re-write $W_{\mathrm{I}i} = (Z_{\mathrm{I}i}^T, X_{\mathrm{I}i}^T)^T$ and $W_{\mathrm{II}i} = (Z_{\mathrm{II}i}^T, X_{\mathrm{II}i}^T)^T$ so that $Z_i \equiv (Z_{\mathrm{I}i}^T, Z_{\mathrm{II}i}^T)^T$ represents the subvector of error-free covariates in $W_i$ and $X_i \equiv (X_{\mathrm{I}i}^T, X_{\mathrm{II}i}^T)^T$ is the subvector of error-prone covariates in $W_i$. Furthermore, let $X_i^* = (X_{\mathrm{I}i}^{*T}, X_{\mathrm{II}i}^{*T})^T$ denote the observed version of $X_i$, where $X_{\mathrm{I}i}^*$ and $X_{\mathrm{II}i}^*$ are the observed measurements of $X_{\mathrm{I}i}$ and $X_{\mathrm{II}i}$, respectively.

Suppose that Xi* and Xi are linked by the classical additive error model

$$X_i^* = X_i + e_i, \quad (5)$$

where the error term ei is independent of {Ti,Xi,Zi,Y(0)i,Y(1)i} and follows N(0,Σe) with covariance matrix Σe . Here Y(1)i (or Y(0)i ) represents the potential outcome that would have been observed had subject i been treated (or untreated). Model (5) features situations where the observed value fluctuates around the true value with an error term; this model is most commonly used in the literature (e.g. Carroll et al. 17 and Yi 18 ).

While model (5) is useful to describe measurement error problems in applications, its introduction brings in an issue of parameter non-identifiability (e.g. Yi 18 and Yi et al. 19 ). To circumvent this issue and highlight the main ideas, we assume that Σe is known for now and defer the discussions on handling unknown Σe to Section 6.

3. Simulation-based method

In this section, we develop a method of estimating the average treatment effect τ0 by incorporating the features of variable selection and measurement error. We also establish the asymptotic properties of the resulting estimator.

3.1. Implementation procedure

Here we describe a simulation-based method that applies to any parametric form of the treatment model (3). The method is rooted in the SIMEX algorithm developed for the non-causal inference framework by Cook and Stefanski, 20 but it extends the scope of the usual SIMEX algorithm by including extra steps for variable selection and inverse probability treatment weighting estimation. The estimation procedure consists of five steps. The first three steps come from the SIMEX method, which aims to correct for measurement error effects in the estimation of the parameters associated with the treatment model (3); the fourth step performs the selection of active variables for building a suitable treatment model; and the last step outputs a valid estimator of $\tau_0$. The five implementation steps are described as follows, and the programming code is available at GitHub: https://github.com/lchen723/Causal-inference-R-code.git.

Step 1. Simulation:

Let K be a given positive integer (say, $K = 500$) and let $\mathcal{C} = \{\psi_1, \psi_2, \ldots, \psi_M\}$ be a sequence of M (say, $M = 10$) non-negative numbers taken from $[0, \psi_M]$ with a given $\psi_M$ (say, $\psi_M = 1$), where $\psi_1 = 0$. For $k = 1, \ldots, K$, independently generate random variates $e_{ik}$ from $N(0, \Sigma_e)$ and define

$$X_i^*(k, \psi) = X_i^* + \sqrt{\psi}\, e_{ik} \quad \text{for } \psi \in \mathcal{C}.$$

Step 2. Estimation of treatment model parameters:

In (4), we replace Xi with Xi*(k,ψ) and solve

$$\sum_{i=1}^{n} S_i(\gamma; X_i^*(k, \psi)) = 0$$

for γ to obtain an estimator, denoted γ^(k,ψ) , where the dependence of Si(γ;Xi*(k,ψ)) on Zi is suppressed in the notation. Next, calculate

$$\hat{\gamma}(\psi) = K^{-1} \sum_{k=1}^{K} \hat{\gamma}(k, \psi).$$

Step 3. Extrapolation:

For $j = 0, 1, \ldots, p$, let $\hat{\gamma}_j(\psi)$ denote the jth element of $\hat{\gamma}(\psi)$; fit a regression model to $\{(\psi, \hat{\gamma}_j(\psi)) : \psi \in \mathcal{C}\}$ and extrapolate it to $\psi = -1$; and let $\tilde{\gamma}_j$ denote the resulting extrapolated estimator of $\gamma_j$, the jth element of $\gamma$. Write $\tilde{\gamma} = (\tilde{\gamma}_0, \tilde{\gamma}_1, \ldots, \tilde{\gamma}_p)^T$.
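To fix ideas, the following R sketch outlines Steps 1 to 3 for the special case of a logistic treatment model, where the score equation in Step 2 is solved by maximum likelihood via `glm`; the data objects `trt`, `xstar`, `z`, and `Sigma_e`, the use of `MASS::mvrnorm` for generating the $e_{ik}$, and the quadratic working extrapolant are all assumptions of this illustration.

```r
library(MASS)  # mvrnorm() for multivariate normal variates

# Steps 1-3 (sketch): SIMEX estimation of the treatment model parameters.
# trt: 0/1 vector; xstar: n x q matrix of error-prone surrogates;
# z: n x r matrix of error-free covariates; Sigma_e: q x q error covariance.
simex_gamma <- function(trt, xstar, z, Sigma_e, K = 500,
                        psi_grid = seq(0, 2, by = 0.25)) {
  n <- nrow(xstar)
  est <- sapply(psi_grid, function(psi) {
    fits <- replicate(K, {
      e <- mvrnorm(n, mu = rep(0, ncol(xstar)), Sigma = Sigma_e)
      x_psi <- xstar + sqrt(psi) * e                 # Step 1: remeasurement
      coef(glm(trt ~ x_psi + z, family = binomial))  # Step 2: estimation
    })
    rowMeans(fits)                                   # gamma_hat(psi)
  })
  # Step 3: fit a quadratic in psi to each coefficient and extrapolate to
  # psi = -1, the error-free limit.
  apply(est, 1, function(g) {
    fit <- lm(g ~ psi_grid + I(psi_grid^2))
    unname(predict(fit, newdata = data.frame(psi_grid = -1)))
  })
}
```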

Step 4. Variable selection:

Define a quadratic loss function:

$$\ell(\gamma) = \frac{1}{2}(\gamma - \tilde{\gamma})^T V_n (\gamma - \tilde{\gamma}), \quad (6)$$

where $V_n$ is a user-specified positive definite weight matrix. Different specifications of $V_n$ may result in estimators with different efficiency; taking $V_n$ to be the inverse of the covariance matrix of $\tilde{\gamma}$ yields an efficient result (see Theorem 3.1(c) below), while setting $V_n$ to be an identity matrix gives an easy implementation.

Consider the penalized quadratic loss function:

$$\ell_P(\gamma) = \ell(\gamma) + n \sum_{j=1}^{p} p_{\lambda}(|\gamma_j|), \quad (7)$$

where $p_{\lambda}(\cdot)$ is a penalty function with a tuning parameter $\lambda$. Here we consider a weighted $L_1$-penalty 21 :

$$p_{\lambda}(|\gamma_j|) = p'_{\lambda}(|\tilde{\gamma}_j|)\, |\gamma_j|,$$

where $p'_{\lambda}(u)$ is the first-order derivative of $p_{\lambda}(u)$, and $p_{\lambda}(u)$ can be set as a commonly used penalty function such as the least absolute shrinkage and selection operator (LASSO) penalty 22 or the smoothly clipped absolute deviation (SCAD) penalty. 23 With the LASSO penalty, we set

$$p_{\lambda}(|\gamma_j|) = \lambda |\gamma_j|,$$

and for the SCAD penalty, we take

$$p'_{\lambda}(x) = \lambda\left\{ I(x \le \lambda) + \frac{(a\lambda - x)_+}{(a-1)\lambda}\, I(x > \lambda) \right\},$$

where $I(\cdot)$ is the indicator function, $x_+ = \max\{x, 0\}$, and $a = 3.7$.

To achieve a satisfactory performance of the selection procedure, we consider a grid $\Lambda$ of possible values for the tuning parameter $\lambda$, and for $\lambda \in \Lambda$, let $\hat{\gamma}(\lambda) = \operatorname{argmin}_{\gamma} \ell_P(\gamma)$ and let $\mathrm{df}_{\lambda}$ denote the number of non-zero elements of $\hat{\gamma}(\lambda)$. We define

$$\mathrm{BIC}(\lambda) = 2\,\ell(\hat{\gamma}(\lambda)) + 2(\log n)\, \mathrm{df}_{\lambda}.$$

Then the optimal tuning parameter λ* is chosen as the minimizer of BIC(λ) :

$$\lambda^* = \operatorname{argmin}_{\lambda \in \Lambda} \mathrm{BIC}(\lambda)$$

and the resulting estimator γ^(λ*) , denoted γ^ , is taken as the estimator of γ .
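When $V_n$ is taken proportional to the identity matrix, the penalized loss (7) with the weighted $L_1$ penalty separates across coordinates, and each component of the minimizer has a closed soft-thresholding form. The R sketch below illustrates Step 4 under this choice, using the SCAD-derivative weights and the BIC criterion defined above; the scaling $V_n = nI$ (consistent with condition (A) in Section 3.2) and the unpenalized intercept are assumptions of the sketch, not prescriptions of the method.

```r
# First-order derivative of the SCAD penalty with a = 3.7.
scad_deriv <- function(u, lambda, a = 3.7) {
  lambda * (u <= lambda) + (pmax(a * lambda - u, 0) / (a - 1)) * (u > lambda)
}

# Step 4 (sketch): with V_n = n * I, minimizing (7) reduces to coordinate-wise
# soft-thresholding of the SIMEX estimate gamma_tilde; the tuning parameter
# is chosen by the BIC criterion. The intercept (first element) is unpenalized.
select_gamma <- function(gamma_tilde, n,
                         lambda_grid = seq(0.01, 0.5, by = 0.01)) {
  best <- NULL; best_bic <- Inf
  for (lambda in lambda_grid) {
    w <- scad_deriv(abs(gamma_tilde[-1]), lambda)   # adaptive weights
    g <- c(gamma_tilde[1],
           sign(gamma_tilde[-1]) * pmax(abs(gamma_tilde[-1]) - w, 0))
    loss <- 0.5 * n * sum((g - gamma_tilde)^2)      # (6) with V_n = n * I
    bic  <- 2 * loss + 2 * log(n) * sum(g != 0)     # BIC(lambda)
    if (bic < best_bic) { best_bic <- bic; best <- g }
  }
  best
}
```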

Step 5. Estimation of ATE:

Write $\hat{\gamma} = (\hat{\gamma}_{\mathrm{I}}^T, \hat{\gamma}_{\mathrm{II}}^T)^T$ with $\hat{\gamma}_{\mathrm{I}} = (\hat{\gamma}_0, \hat{\gamma}_{x\mathrm{I}}^T, \hat{\gamma}_{z\mathrm{I}}^T)^T$ and $\hat{\gamma}_{\mathrm{II}} = (\hat{\gamma}_{x\mathrm{II}}^T, \hat{\gamma}_{z\mathrm{II}}^T)^T$, respectively, corresponding to the non-zero and zero components in $\hat{\gamma}$. The importance of the covariates $\{X_{\mathrm{I}i}, Z_{\mathrm{I}i}\}$ and the unimportance of the covariates $\{X_{\mathrm{II}i}, Z_{\mathrm{II}i}\}$ are thereby suggested by the estimates of the corresponding coefficients. With the unimportant variables $X_{\mathrm{II}i}$ and $Z_{\mathrm{II}i}$ excluded from the initial model (3), the final treatment model is taken as

$$\pi_i = g(X_{\mathrm{I}i}, Z_{\mathrm{I}i}; \gamma_{\mathrm{I}}), \quad (8)$$

where γI includes the intercept and the model parameters associated with important covariates {XIi,ZIi} .

Let $X_{\mathrm{I}i}^*(k, \psi)$ denote the subvector of $X_i^*(k, \psi)$, generated in Step 1, corresponding to $X_{\mathrm{I}i}^*$. Using the selected treatment model (8) with $X_{\mathrm{I}i}$ replaced by $X_{\mathrm{I}i}^*(k, \psi)$, we calculate the fitted value $\hat{\pi}_i(k, \psi) \equiv g(X_{\mathrm{I}i}^*(k, \psi), Z_{\mathrm{I}i}; \hat{\gamma}_{\mathrm{I}})$. Then we obtain an estimate, say $\hat{\tau}(k, \psi)$, of $\tau_0$ using (2) with $\hat{\pi}_i$ replaced by $\hat{\pi}_i(k, \psi)$, and calculate

$$\hat{\tau}(\psi) = K^{-1} \sum_{k=1}^{K} \hat{\tau}(k, \psi).$$

Finally, we fit a regression model to $\{(\psi, \hat{\tau}(\psi)) : \psi \in \mathcal{C}\}$ and extrapolate it to $\psi = -1$. The resulting value, denoted $\hat{\tau}$, is taken as an estimate of $\tau_0$.
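A companion sketch of Step 5, continuing the hypothetical objects used above (`MASS` loaded and `ipw_ate` defined as earlier): `gamma_hat` is the estimate from Step 4, `keep` indexes its non-zero components within the design matrix, and the quadratic extrapolant is again one working choice.

```r
# Step 5 (sketch): estimate the ATE for each (k, psi) from the selected
# treatment model, average over k, and extrapolate to psi = -1.
simex_ate <- function(trt, y, xstar, z, Sigma_e, gamma_hat, keep,
                      K = 500, psi_grid = seq(0, 2, by = 0.25)) {
  n <- nrow(xstar)
  tau_psi <- sapply(psi_grid, function(psi) {
    mean(replicate(K, {
      e <- mvrnorm(n, rep(0, ncol(xstar)), Sigma_e)
      x_psi  <- xstar + sqrt(psi) * e               # reuse of Step 1
      design <- cbind(1, x_psi, z)[, keep, drop = FALSE]
      pihat  <- plogis(design %*% gamma_hat[keep])  # fitted propensity scores
      ipw_ate(trt, y, pihat)                        # stabilized estimator (2)
    }))
  })
  fit <- lm(tau_psi ~ psi_grid + I(psi_grid^2))
  unname(predict(fit, newdata = data.frame(psi_grid = -1)))
}
```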

We conclude this section with a few remarks. The basic idea of the implementation is rooted in the available estimation method (2), which was developed for the ideal situation where all the pre-treatment covariates are relevant and measured without error. We start with an initial treatment model (3) that uses all pre-treatment variables in $W_i$ to express the propensity scores.

The first three steps are directed at addressing the measurement error effects in the estimation of the propensity scores, and the last step estimates the ATE $\tau_0$ with the measurement error effects accounted for by employing the SIMEX algorithm. The idea of the SIMEX is to first establish the trend of measurement error-induced biases as a function of the variance of the measurement error by artificially creating a sequence of surrogate measurements, and then extrapolate this trend back to the case without measurement error. Specifically, in Step 1, we artificially create a sequence of error-contaminated surrogate measurements by introducing different degrees of measurement error, and then we apply those surrogate measurements in Step 2 to obtain biased estimates by running an estimation method developed for error-free settings. Step 3 traces the pattern of the biased estimates against the varying magnitudes of measurement error and then does the extrapolation. To analytically demonstrate this rationale, one may consider the simple linear regression model with an additive measurement error in the covariate; an intuitive illustration of the idea can be found in Yi 18 (pp. 63–64). In implementing Steps 3 and 5, it is ideal to use the true extrapolation function, as is required in establishing the theoretical results in the next subsection. However, the exact functional form is typically unknown in applications, and one has to invoke a specified regression model to approximate it. As a result, this approximation makes the resulting SIMEX estimators not exactly but only approximately consistent. Further, this suggests that the performance of the SIMEX estimators can be sensitive to the choice of a working extrapolation function form. In our simulation studies reported in Section 4, we compare how different specifications of the extrapolation function in Steps 3 and 5 may affect the performance of the estimation.

Step 4 requires the optimal value for the tuning parameter λ . While different criteria such as generalized cross validation (GCV) and Akaike information criterion (AIC) are commonly used in applications, Wang et al. 24 and Zhang et al. 25 showed that the optimal tuning parameter derived from the AIC and GCV criteria has a nonignorable overfitting effect and that the tuning parameter derived from the Bayesian information criterion (BIC) with the SCAD can identify the true model consistently under linear or partial linear regression models. Here we consider a criterion based on the BIC following Yi et al. 26 and Chen and Yi. 27

3.2. Theoretical results

Now we justify the validity of the algorithm described in Section 3.1 by establishing asymptotic properties for the associated estimators. Following authors such as Fan and Li, 23 Yi et al., 26 and Carroll et al., 28 we consider the following regularity conditions:

  • (A). As $n \to \infty$, $n^{-1} V_n \to_p V$, where V is positive definite.

  • (B). As $n \to \infty$, $\sqrt{n}\,(\tilde{\gamma} - \gamma) \to_d N(0, \Sigma)$, where $\Sigma$ is positive definite.

  • (C). To express the dependence on the sample size, we write the tuning parameter as $\lambda_n$. Define

$$a_n = \max\{p'_{\lambda_n}(|\gamma_j|) : \gamma_j \neq 0,\ j = 1, \ldots, p\} \quad \text{and} \quad b_n = \max\{p''_{\lambda_n}(|\gamma_j|) : \gamma_j \neq 0,\ j = 1, \ldots, p\}.$$

    Assume that as $n \to \infty$,

$$a_n \to 0, \quad b_n \to 0, \quad \text{and} \quad \liminf_{n \to \infty}\ \liminf_{u \to 0^+} \sqrt{n}\, p'_{\lambda_n}(u) = \infty.$$

  • (D). The extrapolation function in Step 5 is correctly specified.

Repeating the proof of Theorem 1 of Fan and Li, 23 we can show that under the regularity conditions, there exists an estimator $\hat{\gamma}$ such that

$$\hat{\gamma} - \gamma = O_p(n^{-1/2} + a_n).$$

This suggests that $\hat{\gamma}$ is a $\sqrt{n}$-consistent estimator of $\gamma$ if $a_n = O_p(n^{-1/2})$.

Next, we discuss the asymptotic properties of $\hat{\gamma}$. Split V into a $2 \times 2$ block matrix:

$$V = \begin{pmatrix} V_{\mathrm{I,I}} & V_{\mathrm{I,II}} \\ V_{\mathrm{II,I}} & V_{\mathrm{II,II}} \end{pmatrix},$$

where $V_{u,v}$ is the submatrix of dimension $p_u \times p_v$ for $u, v = \mathrm{I}, \mathrm{II}$, with $p_{\mathrm{I}}$ representing the dimension of $\gamma_{\mathrm{I}}$ and $p_{\mathrm{II}}$ the dimension of $\gamma_{\mathrm{II}}$. Write $V_{\mathrm{I}} = [V_{\mathrm{I,I}}\ \ V_{\mathrm{I,II}}]$. Using the arguments of Yi et al., 26 we can prove the following results.

Theorem 3.1.

Assume that regularity conditions (A) to (C) hold. Then the following results hold:

  • (a). As $n \to \infty$,

$$\sqrt{n}\,(\hat{\gamma}_{\mathrm{I}} - \gamma_{\mathrm{I}}) \to_d N\big(0,\ V_{\mathrm{I,I}}^{-1} V_{\mathrm{I}}\, \Sigma\, V_{\mathrm{I}}^T V_{\mathrm{I,I}}^{-1}\big).$$
  • (b). 

    $\hat{\gamma}_{z\mathrm{II}} = 0$ and $\hat{\gamma}_{x\mathrm{II}} = 0$.

  • (c). When V is equal to $\Sigma^{-1}$, the covariance matrix of $\hat{\gamma}_{\mathrm{I}}$ is no greater than the covariance matrix of $\tilde{\gamma}_{\mathrm{I}}$ in the Loewner order, where $\tilde{\gamma}_{\mathrm{I}}$ is the subvector of the SIMEX estimator $\tilde{\gamma}$ corresponding to $\gamma_{\mathrm{I}}$.

Theorem 3.1(a) establishes the asymptotic distribution of the estimators of the effects corresponding to the important pre-treatment variables in model (3), or equivalently, of the estimators of the parameters of the selected treatment model (8). Theorem 3.1(b) ensures the oracle property, in the sense of Fan and Li, 23 of the variable selection procedure for building the final treatment model (8). The results in Theorem 3.1(a) and (b) are established along the same lines as Zou and Li 21 and Fan and Li, 23 and basically require regularity condition (C). This condition is satisfied by the SCAD penalty but not the LASSO penalty. Condition (B) is needed in showing Theorem 3.1(a), and its validity is ensured by the result of Carroll et al., 28 which assumes that the extrapolation function in Step 3 is known.

Theorem 3.1(c) suggests that the estimator obtained by conducting an extra step of variable selection (i.e. obtained from Steps 1–4) is more efficient than the usual SIMEX estimator (i.e. obtained from Steps 1–3). This shows the necessity and importance of excluding inactive pre-treatment variables when building the treatment model. Furthermore, we establish the following asymptotic distribution of the estimator $\hat{\tau}$ obtained in Step 5 of Section 3.1; its proof is deferred to the Appendix.

Theorem 3.2.

Assume that regularity conditions (A) to (D) hold. Then the estimator τ^ obtained in Section 3.1 has the asymptotic distribution

$$\sqrt{n}\,(\hat{\tau} - \tau_0) \to_d N(0, v(\tau_0)) \quad \text{as } n \to \infty,$$

where v(τ0) is defined in the Appendix.

4. Simulation studies

In this section, we conduct simulation studies to assess the finite sample performance of the proposed method. As described in Section 3, we first select the variables and estimate the associated parameters for the treatment model, and then estimate the average treatment effect τ0 . For each parameter configuration, we repeat the simulation 500 times, where the sample size is set as n=400 .

4.1. Simulation designs

We consider one of the following two outcome models, which show different types of dependence of Y on the covariates {X,Z} and the treatment indicator variable T:

$$\text{Model 1:} \quad Y = T + \beta_x^T X + \beta_z^T Z + \epsilon$$
$$\text{Model 2:} \quad \operatorname{logit} P(Y = 1 \mid T, X, Z) = T + \beta_x^T X + \beta_z^T Z$$

where $\epsilon$ is independent of $\{X, Z, T\}$ with $\epsilon \sim N(0, 1)$; and $\beta_x$ and $\beta_z$ are the model parameters of dimensions $p_x$ and $p_z$, respectively. Here we take $\beta_x = (1_{d_x}^T, 0_{p_x - d_x}^T)^T$ and $\beta_z = (1_{d_z}^T, 0_{p_z - d_z}^T)^T$ with $d_x \equiv \lceil p_x/2 \rceil$ and $d_z \equiv \lceil p_z/2 \rceil$, where $\lceil a \rceil$ represents the ceiling function of a, that is, the least integer no smaller than a for a real number a; and $1_r$ and $0_r$ represent the $r \times 1$ unit and zero vectors, respectively, for a positive integer r. We set $p_x$ and $p_z$ both to be 15, so that Models 1 and 2 include 16 important covariates, of which 8 are error-prone, and 14 unimportant covariates, of which 7 are error-prone.

The treatment indicator $T_i$ is independently generated from the treatment model (3) for $i = 1, \ldots, n$. We consider three useful model forms for (3), respectively given by

  • logistic regression model:

$$\pi_i = \frac{\exp(\gamma_0 + X_i^T \gamma_x + Z_i^T \gamma_z)}{1 + \exp(\gamma_0 + X_i^T \gamma_x + Z_i^T \gamma_z)} \quad (9)$$

  • probit regression model:

$$\pi_i = \Phi(\gamma_0 + X_i^T \gamma_x + Z_i^T \gamma_z) \quad (10)$$

  • complementary log–log regression model:

$$\pi_i = 1 - \exp\{-\exp(\gamma_0 + X_i^T \gamma_x + Z_i^T \gamma_z)\} \quad (11)$$

where $\gamma = (\gamma_0, \gamma_x^T, \gamma_z^T)^T$ is the vector of parameters, and $\Phi(\cdot)$ is the standard normal cumulative distribution function.

For all these treatment models, we consider the case with $\gamma_x = \gamma_z = (1_{d_x/2}^T, -1_{d_x/2}^T, 0_{p_x - d_x}^T)^T$ and $\gamma_0 = 1$. Thus, the numbers of important error-prone and important error-free variables are both 8, and the numbers of unimportant error-prone and unimportant error-free variables are both 7.

To generate measurements of $Y_i$ and $T_i$, respectively, from an outcome and a treatment model independently for $i = 1, \ldots, n$, we first need to simulate measurements for the covariates $\{X_i, Z_i\}$. For $i = 1, \ldots, n$, we generate covariate measurements independently from the normal distribution $(X_i^T, Z_i^T)^T \sim N(0_p, \Sigma_w)$, where

$$\Sigma_w = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_{zz} \end{pmatrix},$$

with $\Sigma_{xz} = \Sigma_{zx}^T$, and $\Sigma_{xx}$ and $\Sigma_{zz}$, respectively, representing the covariance matrices of $X_i$ and $Z_i$. Let $\sigma_{xz}^{jk}$, $\sigma_x^{jk}$, and $\sigma_z^{jk}$ denote element $(j, k)$ of $\Sigma_{xz}$, $\Sigma_{xx}$, and $\Sigma_{zz}$, respectively. We consider the case with $\sigma_{xz}^{jk} = 0.4^{(2 + |j-k|)}$, $\sigma_x^{jk} = \sigma_x^2 \rho_x^{|j-k|}$, and $\sigma_z^{jk} = \sigma_z^2 \rho_z^{|j-k|}$, setting $\sigma_x^2 = \sigma_z^2 = 1.0$ and $\rho_x = \rho_z = 0.5$.

To generate surrogate measurements $X_i^*$ for $X_i$, we consider the classical additive error model (5), where $\Sigma_e$ is set as a diagonal matrix with common diagonal element $\sigma_e^2$. We set $\sigma_e^2$ to be 0.15, 0.50, or 0.75, yielding the signal-to-noise ratio, measured by $\{\operatorname{var}(X_j)/\operatorname{var}(e_j)\}^2$, to be approximately 44, 4, and 1.8, respectively.
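To make the design concrete, the following R sketch generates one dataset under Model 1 with the logistic treatment model (9); since $p_x = p_z$ here, the same coefficient vectors serve for the X and Z blocks, and the sign pattern of $\gamma_x$ follows the specification above.

```r
library(MASS)
set.seed(1)
n <- 400; px <- 15; pz <- 15; p <- px + pz
dx <- ceiling(px / 2)   # number of important X (and Z) variables

# Covariance of (X, Z): AR(1)-type diagonal blocks with the cross-covariances above.
idx      <- outer(1:px, 1:px, function(j, k) abs(j - k))
Sigma_xx <- 0.5^idx; Sigma_zz <- 0.5^idx
Sigma_xz <- 0.4^(2 + idx)
Sigma_w  <- rbind(cbind(Sigma_xx, Sigma_xz), cbind(t(Sigma_xz), Sigma_zz))

w <- mvrnorm(n, rep(0, p), Sigma_w)
x <- w[, 1:px]; z <- w[, (px + 1):p]

# Treatment model (9): half the important coefficients are 1, half are -1.
gamma_x <- c(rep(1, dx / 2), rep(-1, dx / 2), rep(0, px - dx))
trt <- rbinom(n, 1, plogis(1 + x %*% gamma_x + z %*% gamma_x))

# Outcome Model 1, for which tau_0 = 1.
beta_x <- c(rep(1, dx), rep(0, px - dx))
y <- trt + x %*% beta_x + z %*% beta_x + rnorm(n)

# Classical additive measurement error (5) with Sigma_e = sigma_e2 * I.
sigma_e2 <- 0.50
xstar <- x + mvrnorm(n, rep(0, px), diag(sigma_e2, px))
```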

Finally, we examine the value of the average treatment effect, τ0 , under each of the outcome models. Model 1 shows a scenario with continuous outcomes and yields the true value τ0 to be

$$\tau_0 = E\{Y(1)\} - E\{Y(0)\} = E[E\{Y(1) \mid T, X, Z\}] - E[E\{Y(0) \mid T, X, Z\}] = E(1 + \beta_x^T X + \beta_z^T Z + \epsilon) - E(0 + \beta_x^T X + \beta_z^T Z + \epsilon) = 1.$$

In contrast, Model 2 forms a logistic regression model for binary outcomes, yielding the true value of $\tau_0$ to be

$$\tau_0 = P\{Y(1) = 1\} - P\{Y(0) = 1\} = \int_{\mathbb{R}^{p_x}} \int_{\mathbb{R}^{p_z}} P(Y = 1 \mid T = 1, X = x, Z = z) f(x, z)\, dz\, dx - \int_{\mathbb{R}^{p_x}} \int_{\mathbb{R}^{p_z}} P(Y = 1 \mid T = 0, X = x, Z = z) f(x, z)\, dz\, dx \quad (12)$$

where $f(x, z)$ is the density function of $\{X, Z\}$, specified earlier as a multivariate normal distribution, and $P(Y = 1 \mid T = 1, X = x, Z = z)$ is determined by Model 2. As (12) does not have a closed form, we use the Monte Carlo method to obtain an approximate value of $\tau_0$. First, we generate a sequence of values $\{(x_j, z_j) : j = 1, \ldots, N\}$ from $f(x, z)$ for a sufficiently large N, and then we approximate $\tau_0$ by $N^{-1} \sum_{j=1}^{N} P(Y = 1 \mid T = 1, X = x_j, Z = z_j) - N^{-1} \sum_{j=1}^{N} P(Y = 1 \mid T = 0, X = x_j, Z = z_j)$. In our numerical studies, we set $N = 5000$, yielding the true value of $\tau_0$ to be approximately 0.187.
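A short R sketch of this Monte Carlo approximation, reusing `Sigma_w`, `beta_x`, and the dimensions from the data-generation sketch above:

```r
# Monte Carlo approximation of tau_0 under Model 2, following (12).
N   <- 5000
wN  <- mvrnorm(N, rep(0, p), Sigma_w)
lin <- wN[, 1:px] %*% beta_x + wN[, (px + 1):p] %*% beta_x  # beta_x^T x + beta_z^T z
tau0 <- mean(plogis(1 + lin)) - mean(plogis(0 + lin))
```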

4.2. Simulation results

Our objectives here are to (a) select important variables for the treatment model (3), (b) assess the performance of the proposed estimator of $\tau_0$, and (c) demonstrate the effects of measurement error. When implementing the proposed method described in Section 3.1, we consider the following choices of the relevant quantities. In Steps 3 and 5, we set $\mathcal{C} = \{0, 0.25, 0.50, 0.75, 1.0, 1.25, 1.50, 1.75, 2.0\}$ and $K = 500$; to evaluate how the estimation results for $\tau_0$ may change, we use the quadratic, linear, or rational linear extrapolation function to approximate the true extrapolation function, as discussed by Carroll et al. 17 (Section 5.3.2). In Step 4, we set $V_n$ in (6) to be the identity matrix and take the SCAD penalty for (7); for comparison, we also consider the use of the LASSO penalty, which does not satisfy the requirement “$a_n \to 0$ as $n \to \infty$” in condition (C).

In contrast to variable selection for objective (a), we also examine the performance based on the full model without variable selection. To compare the performance of the methods, we use the $L_1$ and $L_2$ loss functions, respectively given by $\sum_{k=0}^{p} |\hat{\gamma}_k - \gamma_k|$ and $\sum_{k=0}^{p} (\hat{\gamma}_k - \gamma_k)^2$, where $\hat{\gamma}_k$ and $\gamma_k$ represent the kth components of $\hat{\gamma}$ and $\gamma$, respectively. In addition, we report the total number of selected variables and the number of falsely excluded important variables, denoted #S and #FN, respectively. Regarding objective (c) of demonstrating the impact of measurement error, we examine the naive approach, which disregards the difference between $X_i^*$ and $X_i$ in both variable selection and estimation of $\tau_0$.

To save space, here we report only the results for the cases with the treatment model specified as the logistic regression model and defer the results for the other two treatment models to the Supplemental material. Table 1 records the results for variable selection of the treatment model, and Table 2 reports finite sample biases (Bias), empirical standard errors (S.E.), root mean squared errors (RMSE), and coverage rates in percent (CR%) for 95% confidence intervals for τ0 obtained under the response Models 1 and 2.

Table 1.

Simulation results: variable selection for the treatment model postulated by the logistic model.

Proposed: quadratic extrapolation Proposed: linear extrapolation Proposed: rational linear extrapolation Naive
σe² Method L1-loss L2-loss #S #FN L1-loss L2-loss #S #FN L1-loss L2-loss #S #FN L1-loss L2-loss #S #FN
0.15 LASSO 0.322 0.013 18.363 0.000 0.405 0.028 18.557 0.000 0.395 0.023 18.530 0.000 2.528 0.468 22.190 0.000
SCAD 0.313 0.011 17.256 0.000 0.353 0.022 17.477 0.000 0.349 0.020 17.309 0.000 2.489 0.447 22.167 0.000
full 0.553 0.028 0.718 0.062 0.624 0.053 2.843 0.655
0.50 LASSO 0.337 0.015 18.570 0.000 0.423 0.033 18.593 0.000 0.414 0.029 18.554 0.000 2.607 0.474 22.231 0.000
SCAD 0.326 0.014 17.368 0.000 0.388 0.029 17.536 0.000 0.380 0.027 17.388 0.000 2.538 0.456 22.184 0.000
full 0.580 0.034 0.724 0.069 0.642 0.054 2.896 0.660
0.75 LASSO 0.349 0.018 18.615 0.000 0.440 0.037 18.673 0.000 0.423 0.034 18.650 0.000 2.634 0.486 22.325 0.000
SCAD 0.342 0.017 17.381 0.000 0.421 0.034 17.589 0.000 0.411 0.032 17.430 0.000 2.597 0.475 22.249 0.000
full 0.595 0.037 0.731 0.071 0.660 0.056 2.923 0.678
True X LASSO 0.316 0.011 17.231 0.000
SCAD 0.310 0.010 17.156 0.000
full 0.536 0.025

LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. “Proposed” refers to the procedure in Section 3 using the surrogate Xi* together with other measurements; “Naive” represents the estimation procedure in Section 2 with Xi replaced by Xi* ; “True X” denotes the estimation procedure in Section 2 using Xi together with other measurements.

Table 2.

Simulation results: estimation of ATE τ0 with the treatment model postulated by the logistic model.

Proposed: quadratic extrapolation Proposed: linear extrapolation Proposed: rational linear extrapolation Naive
Model σe² Method Bias S.E. RMSE CR% Bias S.E. RMSE CR% Bias S.E. RMSE CR% Bias S.E. RMSE CR%
1 0.15 LASSO 0.031 0.023 0.039 94.7 0.035 0.030 0.046 94.5 0.034 0.026 0.043 94.3 0.146 0.020 0.147 47.3
SCAD 0.026 0.020 0.033 95.2 0.034 0.028 0.044 94.7 0.031 0.023 0.039 94.5 0.137 0.018 0.138 49.5
full 0.058 0.026 0.064 88.4 0.064 0.035 0.073 86.3 0.061 0.031 0.068 87.8 0.159 0.024 0.161 42.6
0.50 LASSO 0.033 0.025 0.041 94.7 0.039 0.033 0.051 94.4 0.035 0.029 0.045 94.1 0.155 0.022 0.157 43.4
SCAD 0.028 0.023 0.036 95.0 0.037 0.030 0.048 94.5 0.034 0.028 0.044 94.3 0.149 0.020 0.150 45.2
full 0.063 0.029 0.069 86.8 0.068 0.042 0.080 84.6 0.066 0.036 0.075 86.1 0.167 0.027 0.169 38.4
0.75 LASSO 0.035 0.028 0.045 94.5 0.042 0.036 0.055 93.8 0.038 0.032 0.050 94.0 0.163 0.025 0.165 40.6
SCAD 0.033 0.026 0.042 94.8 0.041 0.034 0.053 94.0 0.036 0.030 0.047 94.2 0.157 0.023 0.159 42.0
full 0.069 0.032 0.076 83.7 0.071 0.045 0.084 82.4 0.069 0.038 0.079 86.0 0.173 0.029 0.175 34.2
True X LASSO 0.026 0.019 0.032 95.1
SCAD 0.023 0.017 0.029 95.3
full 0.053 0.024 0.058 91.3
2 0.15 LASSO 0.020 0.027 0.034 95.2 0.039 0.032 0.050 94.3 0.036 0.030 0.047 94.4 0.216 0.020 0.217 23.6
SCAD 0.018 0.023 0.029 95.4 0.035 0.028 0.045 94.4 0.032 0.026 0.041 94.6 0.208 0.017 0.209 26.7
full 0.063 0.036 0.073 86.5 0.071 0.042 0.082 85.1 0.068 0.039 0.078 86.7 0.256 0.030 0.258 18.5
0.50 LASSO 0.024 0.029 0.038 95.0 0.042 0.034 0.054 94.0 0.040 0.032 0.051 94.1 0.223 0.023 0.224 20.1
SCAD 0.022 0.025 0.033 95.2 0.039 0.030 0.049 94.1 0.037 0.030 0.048 94.3 0.216 0.019 0.217 22.4
full 0.072 0.039 0.082 84.7 0.076 0.045 0.088 84.7 0.073 0.041 0.084 85.8 0.266 0.034 0.268 16.1
0.75 LASSO 0.027 0.030 0.040 95.0 0.044 0.037 0.057 93.8 0.043 0.034 0.055 94.0 0.250 0.026 0.251 18.6
SCAD 0.026 0.028 0.038 95.0 0.043 0.034 0.055 93.8 0.040 0.033 0.052 94.0 0.238 0.024 0.239 20.1
full 0.079 0.043 0.090 82.6 0.084 0.047 0.096 84.1 0.082 0.044 0.093 84.7 0.307 0.035 0.309 11.4
True X LASSO 0.017 0.023 0.029 95.3
SCAD 0.015 0.020 0.025 95.4
full 0.063 0.030 0.070 91.1

ATE: average treatment effect; S.E.: standard error; RMSE: root mean square error; LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. CR%: coverage rates in percent. “Proposed” refers to the procedure in Section 3 using the surrogate Xi* together with other measurements, “Naive” represents the estimation procedure in Section 2 with Xi replaced by Xi* , and “True X” denotes the estimation procedure in Section 2 using Xi together with other measurements.

First, we examine the performance of the proposed method. Comparing the values of the $L_1$ and $L_2$ loss functions produced by the proposed method and by the likelihood estimation method based on the full model, Table 1 shows the benefits of conducting variable selection for the treatment model (3). With the proposed method, the quadratic extrapolation function tends to perform the best, while the linear extrapolation function seems to perform the least well. The proposed method with the SCAD penalty tends to slightly outperform the proposed method with the LASSO penalty. As expected, as the degree of measurement error increases, the proposed method produces increasing values of the $L_1$ and $L_2$ loss functions. Regarding the number (#S) of selected variables, the LASSO penalty tends to produce larger values than the SCAD penalty, and both seem to be slightly larger than the true value.

Regarding the performance of estimating the ATE τ0 , Table 2 shows that the proposed method outperforms the method without excluding unimportant variables (i.e. under the full model). Estimation of τ0 without incorporating variable selection for building the treatment model (3) yields larger finite sample biases and standard errors than the proposed method, and the resulting coverage rates for 95% confidence intervals of τ0 deviate from the nominal level 95%. This suggests that imposing variable selection in the procedures of estimating τ0 is necessary.

With the three extrapolation functions, the proposed method generally performs well, with small finite sample biases and good coverage rates for 95% confidence intervals of $\tau_0$; the quadratic extrapolation function tends to perform the best. Unsurprisingly, the performance of the proposed method deteriorates as the degree of measurement error increases. In implementing the proposed method, using the SCAD penalty yields slightly better results than using the LASSO penalty.

On the other hand, we inspect the results produced by the naive method, which ignores the measurement error effects. Evidently, as shown in Table 1, the naive method produces noticeably larger biases for the parameters of the treatment model (3) than the proposed method, though the number of falsely excluded important variables by the naive method, #FN, is near zero in all settings. Notably, Table 2 shows that the naive method produces unsatisfactory estimation results for $\tau_0$, with much larger finite sample biases yet smaller standard errors than those yielded by the proposed method, regardless of the outcome model; consequently, it leads to useless coverage rates for 95% confidence intervals of $\tau_0$. Furthermore, the RMSE values yielded by the naive method are larger than those of the proposed method under the different settings, suggesting that the bias–variance trade-off of the proposed method is beneficial in contrast to that of the naive method. Similar patterns are also revealed in the tables included in Section A of the Supplemental material.

In summary, the simulation studies demonstrate the adverse effects of ignoring measurement error in the analysis, as well as the importance of excluding irrelevant covariates when building the treatment model for calculating proper propensity scores. The studies confirm the satisfactory performance of the proposed estimator of $\tau_0$ in finite sample settings; the method with the SCAD penalty tends to slightly outperform the method with the LASSO penalty. As the proposed method is simulation-based, it may be time-consuming. However, its implementation is easy and can be built into an automated program. Further, its scope of applicability is broad in the sense that any parametric or semiparametric model for the treatment model (3) can be applied.

5. Analysis of NHANES I Epidemiologic Follow-up Study data

5.1. Data description

We apply the proposed method to analyze the data arising from the NHANES I Epidemiologic Follow-up Study (NHEFS), a national longitudinal study that was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the Public Health Service. The study was designed to investigate the relationship among clinical, nutritional, and behavioral factors. The NHEFS cohort includes participants of age 25 to 74 years who completed a medical examination at NHANES I in 1971–1975, and the first wave of data collection was conducted for all members from 1982 to 1984. The details can be found at https://wwwn.cdc.gov/nchs/nhanes/nhefs/#dfd.

In our analysis here, we consider a dataset of 1624 subjects, which is available at https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/. We are interested in understanding possible causal effects of smoking behavior on weight change. Using the notation in Section 2, for an individual we let T represent the binary exposure variable for the smoking status (qsmk) (1 for quitting smoking or not smoking between the first questionnaire and 1982, and 0 otherwise). We want to estimate the average effect of T, $\tau_0 = E\{Y(1)\} - E\{Y(0)\}$, where $Y(1)$ represents the weight an individual would have at the entry of the study if this person had been a nonsmoker or had quit smoking, and $Y(0)$ represents the weight an individual would have at the entry of the study if this person had been a smoker. Let Y denote the actual weight in kilograms (wt) for an individual at the entry of the study.

Since the data are collected from an observational study (and hence smoking status cannot be randomized among the study subjects), directly calculating the difference of the sample averages between the smokers and nonsmokers fails to yield a consistent estimator of $\tau_0$. To control for the confounding effects and use (2) to estimate $\tau_0$, we first need to build a suitable treatment model to calculate the propensity scores.

The dataset we consider contains the following covariates: systolic blood pressure (sbp, in millimeters of mercury (mmHg)), serum cholesterol (cholesterol, in mg/100 mL), diastolic blood pressure (dbp, in mmHg), height in centimeters (ht), average tobacco price in the state of residence in 1982 (price82, in US dollars), age in 1971 (age, in years), sex (0 for male and 1 for female), use of nerve medication (nerves, with 0 representing never and 1 otherwise), use of high blood pressure medication (hbpmed, with 0 representing never and 1 otherwise), and race (0 for white and 1 otherwise). Each variable is standardized by subtracting its sample mean and then dividing by its sample standard deviation.

Blood pressure and cholesterol are error-prone, as discussed by Bauldry et al. 29 and Glasziou et al., 30 respectively. As commented by Lebow and Rudd, 31 (p.168) the price of tobacco is subject to measurement error due to respondents’ imprecise recall of their own expenditure or their unwillingness to report expenditures. Let X represent the vector of error-prone covariates, including sbp ($X_1$), dbp ($X_2$), cholesterol ($X_3$), and price82 ($X_4$). Let Z denote the vector of error-free covariates, including ht, age, sex, nerves, hbpmed, and race. Using the notation in Section 4.1, we have $p_x = 4$ and $p_z = 1 + 6$, with 1 indicating the inclusion of the intercept in Z. We employ model (5) to describe the relationship between the observed surrogate measurement $X^*$ and the value of the true covariate X.

5.2. Analysis using external information

To estimate the covariance matrix in model (5), we make use of external information on repeated measurements of sbp, dbp, cholesterol, and price82. Specifically, two repeated measurements, $X_{i1k}^*$ and $X_{i2k}^*$, of the variables sbp ($X_{i1}$) and dbp ($X_{i2}$) for the same subjects are available at https://archive.ics.uci.edu/ml/datasets/Myocardial+infarction+complications, where $i = 1, \ldots, 574$ and $k = 1, \ldots, n_{(1,2)}$ with $n_{(1,2)} = 2$. Three repeated measurements, $X_{i3k}^*$, of the variable cholesterol ($X_{i3}$) for a group of people are posted at https://www.sheffield.ac.uk/mash/statistics/datasets, where $i = 1, \ldots, 18$ and $k = 1, \ldots, n_3$ with $n_3 = 3$. Eleven repeated measurements, $X_{i4k}^*$, of the variable price82 ($X_{i4}$) for the same individuals are available in the R package “Ecdat,” where $i = 1, \ldots, 48$ and $k = 1, \ldots, n_4$ with $n_4 = 11$.

With those repeated measurements, the estimate of the covariance matrix $\Sigma_e$ is given by

$$\hat{\Sigma}_e = \operatorname{diag}(\hat{\Sigma}_{12}^*, \hat{\sigma}_{33}^{*2}, \hat{\sigma}_{44}^{*2}),$$

where

$$\hat{\Sigma}_{12}^* = \begin{pmatrix} 0.499 & 0.434 \\ 0.434 & 0.543 \end{pmatrix},$$

$\hat{\sigma}_{33}^{*2} = 0.006$, and $\hat{\sigma}_{44}^{*2} = 0.150$ are, respectively, obtained from the three different data sources described above using the method of moments estimator

$$\frac{\sum_{i=1}^{n} \sum_{k=1}^{n_j} (X_{ijk}^* - \bar{X}_{ij}^*)(X_{ijk}^* - \bar{X}_{ij}^*)^T}{\sum_{i=1}^{n} (n_j - 1)}$$

for $j = (1, 2), 3, 4$, where $\bar{X}_{ij}^* = n_j^{-1} \sum_{k=1}^{n_j} X_{ijk}^*$ and the independence of the $X_{ijk}^*$ is assumed for $j = 1, 2, 3, 4$.
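For a single error-prone covariate, the moment estimator above amounts to a pooled within-subject variance, as in the following R sketch; `xrep` is a hypothetical matrix of repeated surrogate measurements with one row per subject and one column per replicate. For the bivariate (sbp, dbp) case, the analogous cross-product formula yields the off-diagonal entry.

```r
# Method-of-moments estimate of the measurement error variance from repeated
# surrogate measurements of one covariate: rows of xrep index subjects,
# columns index the n_j replicates.
moment_sigma_e <- function(xrep) {
  nj   <- ncol(xrep)
  xbar <- rowMeans(xrep)   # subject-specific means
  sum((xrep - xbar)^2) / (nrow(xrep) * (nj - 1))
}
```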

To implement the proposed method in Section 3.1, we consider three forms of the treatment model (3): the logistic, probit, and complementary log–log models; and we examine three extrapolation functions in Steps 3 and 5 of Section 3.1: the quadratic, linear, and rational linear extrapolation functions. For variable selection, we use either the LASSO or the SCAD penalty when implementing the proposed method based on (7). For comparison, we also examine the implementation based on the full model without variable selection, as in Section 4, and call this method “full.” In the top two panels of Table 3, we report the estimates of the model parameters of the logistic regression treatment model obtained from the different methods, as well as the estimation results for $\tau_0$, and defer the results for the other two treatment models to Section B of the Supplemental material.

Table 3.

Analysis results of NHEFS data with propensity scores determined by the logistic model: external data are used to characterize the measurement error degree.

Quadratic Linear RL
Covariate full LASSO SCAD full LASSO SCAD full LASSO SCAD
Intercept −1.162 −1.098 −1.159 −1.145 −1.089 −1.150 −1.093 −1.049 −1.099
sbp 0.052 0.065 0.073
dbp 1.199 1.146 1.188 1.141 1.092 1.098 1.280 1.222 1.274
cholesterol −0.005 −0.029 −0.013
price82 −1.268 −0.216 −0.272 −1.080 −0.024 −0.032 −1.120 −0.062 −0.066
ht 0.060 0.090 0.036
age 0.047 0.268 0.273
sex 0.040 0.029 0.028
nerves −0.050 −0.034 −0.070
hbpmed 0.015 0.022 −0.004
race −1.300 −0.236 −0.296 −1.276 −0.220 −0.280 −1.319 −0.274 −0.324
τ^ 1.313 3.256 3.477 1.478 3.345 3.500 1.600 3.587 3.744
S.E.(τ^) 1.013 0.931 0.925 1.024 0.977 0.963 1.130 1.044 1.038
p-value 0.194 <0.001 <0.001 0.148 <0.001 <0.001 0.157 <0.001 <0.001
τ^F 3.554 3.951 3.375 3.245 3.754 3.609
S.E.(τ^F) 1.312 1.293 0.588 0.590 1.048 1.038
p-value 0.007 0.002 <0.001 <0.001 <0.001 <0.001

ATE: average treatment effect; LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. Headings “Quadratic”, “Linear”, and “RL” refer to the extrapolation function approximated by the quadratic, linear, and rational linear functions, respectively. The top panel reports the results of variable selection for the treatment model; the middle panel displays the estimation results of the ATE τ0 ; and the bottom panel shows the estimation results of the ATE τ0 by forcefully including “age” and “sex” to the selected variables to form the final treatment model to estimate τ0 .

Those results show that, regardless of the model assumption for the treatment model and the extrapolation function form, both the proposed LASSO and SCAD methods suggest the same important covariates, dbp and race, for determining the propensity scores. The estimates of $\tau_0$ produced by the proposed LASSO and SCAD methods are closer to each other than to those obtained from the “full” method, and the former estimates are larger than the latter ones. The standard errors produced by the proposed method with the SCAD penalty are the smallest, and they are closer to those produced by the proposed method with the LASSO penalty than to those obtained from the “full” method.

Under all the settings, the estimates produced by the proposed LASSO or SCAD method have p-values <0.001, revealing evidence that the exposure T affects the average weight difference between nonsmokers and smokers. This finding is in line with the results of Kaufman et al. 32 On the contrary, if irrelevant covariates are not excluded when determining the propensity scores, the resulting p-values derived from the “full” method are all >0.05, showing no evidence that the smoking status affects individuals’ weights.

To reflect the application scenario where the assignment of the treatment must depend on certain covariates, a referee suggested forcefully including some covariates in the final treatment model, together with the selected variables obtained from Steps 1 to 4 in Section 3.1. To this end, here we take the covariates age and sex as those that must enter the final treatment model. Let $\{X_{\mathrm{I}i}, Z_{\mathrm{I}i}\}$ denote the covariates selected from Steps 1 to 4 in Section 3.1. We then modify model (8) by replacing $\{X_{\mathrm{I}i}, Z_{\mathrm{I}i}\}$ with $\{X_{\mathrm{I}i}, Z_{\mathrm{I}i}, \text{age}, \text{sex}\}$, and re-run Step 5 in Section 3.1 to obtain an estimate of $\tau_0$; let $\hat{\tau}_F$ and S.E.$(\hat{\tau}_F)$ denote the resulting estimate and the associated standard error, respectively. The results for the logistic treatment model are presented in the bottom panel of Table 3; the results for the probit and complementary log–log treatment models are summarized, respectively, in the bottom panels of Tables B1 and B2 in the Supplemental material.

5.3. Sensitivity analyses

Using external data helps us estimate the covariance matrix in model (5) as illustrated in Section 5.2. The validity of the analysis in Section 5.2 requires the comparability between the measurement error processes of the external data and the data we analyze, also called the transportability condition. 33 As it is not apparent that this condition holds, here we further carry out sensitivity analyses to understand the impact of measurement error on estimation (e.g. Chen and Yi 27 and Chen and Yi 34 ).

Let $\Sigma_X$ and $\Sigma_{X^*}$ denote the covariance matrices of X and $X^*$, respectively, and let $\sigma_{X,ij}$, $\sigma_{X^*,ij}$, and $\sigma_{e,ij}$ denote the $(i, j)$ entries of $\Sigma_X$, $\Sigma_{X^*}$, and $\Sigma_e$, respectively. Measurement error model (5) suggests that $\sigma_{X,ij}$ is smaller than $\sigma_{X^*,ij}$ for all i and j. To consider possible scenarios of measurement error, we calculate the sample estimate $\hat{\Sigma}_{X^*}$ of $\Sigma_{X^*}$ and set $\sigma_{X,ij} = 0.9\, \hat{\sigma}_{X^*,ij}$, where $\hat{\sigma}_{X^*,ij}$ is the $(i, j)$ entry of $\hat{\Sigma}_{X^*}$, and

$$\hat{\Sigma}_{X^*} = \begin{pmatrix} 1.000 & 0.560 & 0.155 & 0.066 \\ 0.560 & 1.000 & 0.052 & 0.026 \\ 0.155 & 0.052 & 1.000 & 0.053 \\ 0.066 & 0.026 & 0.053 & 1.000 \end{pmatrix}.$$

To specify $\sigma_{e,ij}$, we use the reliability ratio $R_{ij} = \sigma_{X,ij}/\sigma_{X^*,ij} = \sigma_{X,ij}/(\sigma_{X,ij} + \sigma_{e,ij})$ as a guide:

$$\sigma_{e,ij} = (R_{ij}^{-1} - 1)\, \sigma_{X,ij}. \quad (13)$$

For ease of exposition, we take $R_{ij}$ to be a constant for all i and j, denoted R. In our analysis, we specifically consider $R = 0.65$, 0.75, or 0.85, and in the top two panels of Tables 4 to 6 we report the estimates of the parameters of the logistic treatment model as well as the estimation results for $\tau_0$. The results for the probit and complementary log–log treatment models are included in Section B of the Supplemental material.
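In R, this sensitivity specification amounts to a couple of lines; `Sigma_xstar_hat` below stands for the sample matrix $\hat{\Sigma}_{X^*}$ displayed above.

```r
# Sensitivity specification of Sigma_e via the reliability ratio R,
# following (13) with sigma_X = 0.9 * sigma_Xstar elementwise.
make_sigma_e <- function(Sigma_xstar_hat, R) {
  Sigma_x <- 0.9 * Sigma_xstar_hat
  (1 / R - 1) * Sigma_x
}
# Example: Sigma_e under R = 0.75.
# make_sigma_e(Sigma_xstar_hat, R = 0.75)
```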

Table 4.

Sensitivity analyses for NHEFS data with propensity scores determined by the logistic model: a quadratic extrapolation function is used.

R=0.65 R=0.75 R=0.85
Covariate full LASSO SCAD full LASSO SCAD full LASSO SCAD
intercept −1.292 −1.501 −1.161 −1.303 −1.191 −1.124 −1.792 −1.895 −1.570
sbp 0.626 0.236 0.152
dbp 2.427 1.915 2.510 3.301 2.575 3.347 3.212 2.758 3.434
cholesterol −0.057 −0.197 −0.250
price82 −1.321 −0.530 −0.538 −1.087 −0.414 −0.406 −1.281 −0.384 −0.384
ht 0.011 0.011 0.011
age 0.020 0.022 0.024
sex 0.078 0.068 0.037
nerves −0.235 −0.179 −0.190
hbpmed 0.023 0.022 0.013
race −1.034 −0.243 −0.251 −1.028 −0.225 −0.217 −1.098 −0.201 −0.201
τ^ 1.012 3.155 3.617 1.497 3.631 3.713 1.424 3.672 3.286
S.E.(τ^) 1.411 1.203 1.022 1.658 1.259 1.104 1.623 1.375 1.259
p-value 0.473 0.008 <0.001 0.366 0.003 <0.001 0.380 0.007 0.009
τ^F 3.886 3.424 3.633 3.915 3.094 3.143
S.E.(τ^F) 1.152 1.474 1.296 0.898 1.379 1.119
p-value 0.001 0.020 0.005 <0.001 0.024 0.005

ATE: average treatment effect; LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. The top panel reports the results of variable selection for the treatment model; the middle panel displays the estimation results of the ATE τ0 ; and the bottom panel shows the estimation results of the ATE τ0 by forcefully including “age” and “sex” to the selected variables to form the final treatment model to estimate τ0 .

Table 6.

Sensitivity analyses for NHEFS data with propensity scores determined by the logistic model: a rational linear extrapolation function is used.

R=0.65 R=0.75 R=0.85
Covariate full LASSO SCAD full LASSO SCAD full LASSO SCAD
intercept −1.163 −1.116 −1.170 −1.145 −1.092 −1.145 −1.149 −1.099 −1.156
sbp 0.069 0.035 0.062
dbp 1.295 1.234 1.288 1.277 1.224 1.277 1.208 1.152 1.187
cholesterol −0.001 −0.008 −0.002
price82 −0.080 −0.033 −0.033 −0.103 −0.050 −0.050 −0.100 −0.050 −0.051
ht 0.100 0.091 0.093
age 0.271 0.260 0.260
sex 0.011 0.011 0.014
nerves −0.070 −0.041 −0.067
hbpmed 0.012 0.016 0.015
race −0.815 −0.268 −0.322 −0.827 −0.275 −0.328 −0.834 −0.254 −0.312
τ^ 1.236 3.188 3.433 0.955 3.209 3.107 1.436 3.139 3.650
S.E.(τ^) 1.520 1.062 1.024 1.255 1.189 1.160 1.198 0.961 0.918
p-value 0.219 <0.001 <0.001 0.447 0.001 0.002 0.231 <0.001 <0.001
τ^F 3.210 3.084 3.166 3.073 3.796 3.445
S.E.(τ^F) 1.025 1.014 0.803 0.737 1.216 1.177
p-value 0.002 0.002 <0.001 <0.001 0.002 0.003

ATE: average treatment effect; LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. The top panel reports the results of variable selection for the treatment model; the middle panel displays the estimation results of the ATE τ0 ; and the bottom panel shows the estimation results of the ATE τ0 by forcefully including “age” and “sex” to the selected variables to form the final treatment model to estimate τ0 .

Table 5.

Sensitivity analyses for NHEFS data with propensity scores determined by the logistic model: a linear extrapolation function is used.

R=0.65 R=0.75 R=0.85
Covariate full LASSO SCAD full LASSO SCAD full LASSO SCAD
intercept −1.478 −1.031 −1.387 −1.956 −1.441 −1.866 −1.128 −1.648 −1.032
sbp 0.108 0.191 0.130
dbp 1.117 0.888 1.123 1.265 0.967 1.202 1.186 0.929 1.172
cholesterol −0.016 −0.019 −0.052
price82 −0.598 −0.151 −0.169 −0.470 −0.145 −0.153 −0.428 −0.133 −0.146
ht 0.008 0.008 0.009
age 0.022 0.023 0.023
sex 0.049 0.072 0.066
nerves −0.171 −0.153 −0.140
hbpmed 0.009 0.005 0.003
race −0.881 −0.434 −0.493 −0.898 −0.383 −0.402 −0.900 −0.420 −0.451
τ^ 1.658 3.713 3.520 1.212 3.915 3.756 1.249 3.844 3.652
S.E.(τ^) 1.155 1.088 1.128 1.612 1.014 0.997 1.374 1.245 1.224
p-value 0.265 0.001 0.002 0.452 <0.001 <0.001 0.363 0.001 0.003
τ^F 3.523 3.501 3.467 3.263 3.055 2.977
S.E.(τ^F) 1.216 1.215 0.883 0.900 1.050 1.048
p-value 0.004 0.004 <0.001 <0.001 0.004 0.003

ATE: average treatment effect; LASSO: least absolute shrinkage and selection operator; SCAD: smoothly clipped absolute deviation. The top panel reports the results of variable selection for the treatment model; the middle panel displays the estimation results of the ATE τ0; and the bottom panel shows the estimation results of the ATE τ0 by forcefully including “age” and “sex” to the selected variables to form the final treatment model to estimate τ0.

The estimation results are consistent in nature with the numerical results reported in Section 5.2. The variables dbp and race are selected to determine the propensity scores regardless of the value of R. For the estimation of $\tau_0$, the proposed method with either the LASSO or the SCAD penalty shows a significant effect of T, whereas the results obtained from the “full” model do not reveal such an effect.

Similar to Section 5.2, here we report the sensitivity analysis results for $\hat{\tau}_F$ and S.E.$(\hat{\tau}_F)$. The bottom panels of Tables 4 to 6 show the results for the logistic regression form of the treatment model. The results under the probit and complementary log–log treatment models are placed in the bottom panels of Tables B3 to B8 in the Supplemental material.

6. Discussions and extensions

The inverse probability weighting estimation method and its variants have proved useful for estimating the average treatment effect in the causal inference framework. Despite the popularity of these methods, their applications are hindered by two critical conditions: their validity relies on the proper determination of the propensity scores and on the precise measurement of the confounders. In this article, we develop a simulation-based method by adapting the inverse probability weighting scheme to accommodate measurement error effects as well as variable selection for calculating the propensity scores.

To highlight the idea, we present the proposed method by assuming the covariance matrix Σe for model (5) to be given. This condition is needed only for Step 1 in Section 3. However, it is restrictive for applications. To circumvent the issues induced by unknown Σe , we often need an additional data source to gain an understanding of the measurement error degree. If a validation sample having measurements for both Xi and Xi* is available, Σe can be estimated by fitting model (5) to the validation data. If a prior study on the same variables is available for the estimation of Σe , one may use the estimated Σe to implement Step 1 in Section 3. When repeated surrogate measurements are available, one may use them to estimate Σe , as done in Section 5.2, or alternatively, one may modify Step 1 in Section 3 to get around the problem of unknown Σe by applying the method of Devanarayan and Stefanski. 35 In circumstances where no additional data sources are available, we often conduct sensitivity analyses to understand the impact of different degrees of measurement error on estimation of τ0 , as demonstrated in Section 5.3. Basically, we first specify a sequence of values for the covariance matrix Σe to reflect possible scenarios of the measurement error process, and then we apply the proposed method to conduct the estimation of τ0 for each specified Σe . Finally, we evaluate the sensitivity of estimation results to different magnitudes of measurement error. Such a study helps us understand how varying amounts of measurement error may affect estimation results.

The proposed method has a major limitation. As discussed in Section 3.1, the extrapolation functions in Steps 3 and 5 are unknown in applications and can only be approximated by user-specified functions. In line with many studies of the SIMEX algorithm, we approximate the underlying true extrapolation functions in our numerical studies by one of three choices: quadratic, linear, and rational linear functions. The simulation studies show that quadratic functions tend to outperform linear and rational linear functions, consistent with the numerical experience reported in the literature (Carroll et al., 17 Section 5.3.2). We stress that the inference results can be sensitive to the choice of the approximating function. It would thus be useful to develop a procedure for constructing data-driven functions that approximate the extrapolation functions well and thereby yield robust inference results. One possible approach is to start with a class of candidate functions, say $\mathcal{F}$, repeat the steps in Section 3.1 with each $f \in \mathcal{F}$ taken to approximate the extrapolation function, and let $\hat{\tau}_f$ and $v(\hat{\tau}_f)$ denote the resulting estimator and its associated variance, respectively. The optimal estimator of $\tau_0$, denoted $\hat{\tau}_{f_0}$, is then the estimator derived from the function $f_0 = \operatorname{argmin}_{f \in \mathcal{F}} v(\hat{\tau}_f)$, which minimizes $v(\hat{\tau}_f)$. It would be interesting to explore this idea carefully through numerical studies and theoretical justifications for various classes $\mathcal{F}$.
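As a concrete illustration of these steps, the sketch below fits a user-specified extrapolant to the SIMEX pairs $\{(\psi_j, \hat{\tau}(\psi_j))\}$, evaluates it at $\psi = -1$, and includes a skeleton of the variance-minimizing selection of $f_0$ just described. The function names, the grid, and the attenuation pattern in the example are hypothetical, and the variance estimates $v(\hat{\tau}_f)$ are assumed to be supplied externally (e.g. by a bootstrap).

```python
import numpy as np

def extrapolate(psi_grid, tau_hat, kind="quadratic"):
    """Fit a user-specified approximate extrapolant f to the SIMEX pairs
    {(psi_j, tau_hat(psi_j))} and evaluate it at psi = -1."""
    psi = np.asarray(psi_grid, dtype=float)
    tau = np.asarray(tau_hat, dtype=float)
    degree = {"linear": 1, "quadratic": 2}[kind]
    coef = np.polyfit(psi, tau, deg=degree)   # least-squares polynomial fit
    return np.polyval(coef, -1.0)             # extrapolation to psi = -1

def select_extrapolant(psi_grid, tau_hat, variance_of):
    """Pick f0 = argmin_{f in F} v(tau_hat_f) over a candidate class F;
    `variance_of(f)` must supply an estimate of v(tau_hat_f), e.g. from
    the sandwich formula or a bootstrap (not shown here)."""
    F = ["linear", "quadratic"]
    f0 = min(F, key=variance_of)
    return f0, extrapolate(psi_grid, tau_hat, kind=f0)

# Hypothetical illustration: tau_hat(psi) computed on a grid of psi values,
# with an attenuation pattern 1/(1 + 0.5 psi) chosen purely for display.
psi_grid = np.linspace(0.0, 2.0, 9)
tau_hat = 1.0 / (1.0 + 0.5 * psi_grid)
print(round(extrapolate(psi_grid, tau_hat, "quadratic"), 3))
```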

This research can be sharpened through several extensions, outlined as follows. The proposed method focuses on the case where the error-prone variables are all continuous. If the error-prone variables are all discrete, one may modify the development here by replacing the SIMEX steps with the MC-SIMEX algorithm of Küchenhoff et al. 36 It is also interesting to generalize the development to accommodate a mix of error-contaminated discrete and continuous variables, for which one may adapt the augmented simulation–extrapolation method developed by Yi et al. 33 and discussed by Chen and Yi 27 and Zhang and Yi. 37 To shed light on this extension, we consider the case with the vector $X_i$ only, where $X_i = (X_{Di}, X_{Ci}^T)^T$, with $X_{Di}$ representing a binary variable and $X_{Ci}$ a subvector of continuous variables. Let $X_{Di}^*$ and $X_{Ci}^*$ denote the observed surrogate measurements of $X_{Di}$ and $X_{Ci}$, respectively. Let $p_i = P(X_{Di}^* = 0 \mid X_{Di} = 1, X_{Ci})$ and $q_i = P(X_{Di}^* = 1 \mid X_{Di} = 0, X_{Ci})$ be the conditional misclassification probabilities, given $X_i$, where the dependence of $p_i$ and $q_i$ on $X_i$ is suppressed in the notation.

One may employ logistic regression models to describe the dependence of the misclassification probabilities on the true variables $X_i$:

$$\operatorname{logit} p_i = \alpha_{01} + \alpha_{x1}^T X_{Ci}; \qquad \operatorname{logit} q_i = \alpha_{00} + \alpha_{x0}^T X_{Ci},$$

where $\alpha = (\alpha_{01}, \alpha_{x1}^T, \alpha_{00}, \alpha_{x0}^T)^T$ is the vector of regression parameters.

For the error-prone continuous variables $X_{Ci}$, we consider a model similar to (5):

$$X_{Ci}^* = X_{Ci} + e_{Ci},$$

where the error term $e_{Ci}$ is independent of $\{T_i, X_i, X_{Di}^*, Y_{(0)i}, Y_{(1)i}\}$ and follows $N(0, \Sigma_{Ce})$ with covariance matrix $\Sigma_{Ce}$.

Now we write $S_i(\gamma; X_i)$ in (4) as $S_i(\gamma; X_{Di}, X_{Ci})$ to show separately the involvement of the discrete and continuous covariates. Define

$$S_i^*(\beta; \gamma, X_{Di}^*, X_{Ci}) = (1 - p_i - q_i)^{-1} \big[ (1 - X_{Di}^*) \{ (1 - p_i) S_i(\beta; \gamma, 0, X_{Ci}) - q_i S_i(\beta; \gamma, 1, X_{Ci}) \} - X_{Di}^* \{ p_i S_i(\beta; \gamma, 0, X_{Ci}) - (1 - q_i) S_i(\beta; \gamma, 1, X_{Ci}) \} \big],$$

where $S_i(\beta; \gamma, r, X_{Ci})$ represents $S_i(\gamma; X_{Di}, X_{Ci})$ with $X_{Di}$ taking the value r, for r = 0, 1.

By Problem 2.10 of Yi 18 (p. 83), $S_i^*(\beta; \gamma, X_{Di}^*, X_{Ci})$ is an unbiased estimating function expressed in terms of the observed binary variable $X_{Di}^*$, together with $X_{Ci}$. To address the error effects induced by replacing $X_{Ci}$ with its surrogate $X_{Ci}^*$, we may then employ the implementation procedure in Section 3.1, with $S_i(\beta; \gamma, X_{Di}, X_{Ci})$ replaced by $S_i^*(\beta; \gamma, X_{Di}^*, X_{Ci})$ in Step 2. The same steps then carry through to yield an estimator of $\tau_0$ with the measurement error effects in both $X_{Di}$ and $X_{Ci}$ accounted for.
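To make the correction concrete, here is a minimal numerical sketch, in our own notation, of the corrected function displayed above for a single binary covariate, together with a check that its conditional expectation given the true $X_{Di}$ recovers the naive score. The score values and the probabilities p and q are hypothetical; in practice $p_i$ and $q_i$ would be obtained from models such as the logistic forms above.

```python
import numpy as np

def s_star(s0, s1, x_star, p, q):
    """Misclassification-corrected estimating function for a binary
    covariate, following the display above: s0 and s1 are the naive
    scores evaluated at X_D = 0 and X_D = 1; x_star is the observed
    surrogate; p = P(X* = 0 | X = 1) and q = P(X* = 1 | X = 0)."""
    return ((1 - x_star) * ((1 - p) * s0 - q * s1)
            - x_star * (p * s0 - (1 - q) * s1)) / (1 - p - q)

# Unbiasedness check: E{S*(X*) | X} should recover the naive score S(X).
p, q = 0.15, 0.10
s0, s1 = 2.0, -1.3          # hypothetical score values at X_D = 0 and 1
# Given X = 1: X* = 0 with probability p, X* = 1 with probability 1 - p.
e_given_1 = p * s_star(s0, s1, 0, p, q) + (1 - p) * s_star(s0, s1, 1, p, q)
# Given X = 0: X* = 1 with probability q, X* = 0 with probability 1 - q.
e_given_0 = q * s_star(s0, s1, 1, p, q) + (1 - q) * s_star(s0, s1, 0, p, q)
print(np.isclose(e_given_1, s1), np.isclose(e_given_0, s0))  # True True
```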

For the more general case with multiple binary or categorical variables subject to mismeasurement, one may use the same ideas to develop an inference procedure by introducing a misclassification matrix and modifying the construction of the function $S_i^*(\beta; \gamma, X_{Di}^*, X_{Ci})$ accordingly, at the cost of more involved notation. A careful study is warranted to work out the technical details.

In Step 4 of the implementation procedure in Section 3.1, a common penalty function $p_\lambda(\cdot)$ is applied to the components of $\gamma$ to construct the penalized quadratic loss function (6). This practice essentially treats all the pre-treatment variables in $W_i$ in model (3) identically. In applications, subject-matter knowledge may dictate that certain pre-treatment variables be included when building the treatment model. In this case, the penalized quadratic loss function (6) can be modified by not imposing a penalty on the parameters corresponding to such variables, as sketched below. Alternatively, one may first carry out the variable selection developed here and then add the must-include variables (if not selected) to the selected variables to build the final treatment model. It would be useful to explore these two strategies in depth.
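The first strategy amounts to attaching a per-coefficient penalty factor to the penalty term and setting that factor to zero for the variables that must stay in the model. The following is a minimal sketch of this device under a simple least-squares loss rather than the actual loss (6); all names and numbers are ours, for illustration only.

```python
import numpy as np

def lasso_with_forced_in(X, y, lam, penalty_factor, n_iter=500):
    """Proximal-gradient LASSO with per-coefficient penalty factors:
    components with penalty_factor[j] = 0 (e.g. variables that
    subject-matter knowledge says must stay in the treatment model)
    are never shrunk. A sketch on a least-squares loss, not (6) itself."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2        # inverse Lipschitz constant
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n       # gradient of (1/2n)||y - Xb||^2
        z = beta - lr * grad
        thr = lr * lam * penalty_factor       # zero threshold => no shrinkage
        beta = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    return beta

# Example: 5 candidate confounders; the first two ("age", "sex") forced in.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.8, -0.5, 0.0, 0.0, 0.3]) + rng.normal(scale=0.5, size=200)
pf = np.array([0.0, 0.0, 1.0, 1.0, 1.0])      # zero = unpenalized
print(np.round(lasso_with_forced_in(X, y, lam=0.1, penalty_factor=pf), 2))
```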

Our development is rooted in the use of the parametric model (3) to describe propensity scores, a choice primarily driven by the attractive features of parametric modeling. Parametric approaches are more effective than nonparametric methods in handling data of large dimension, and they allow us to address measurement error effects using existing methods. Further, asymptotic results for the resulting estimator can be established, enabling rigorous statistical inference. Despite these advantages, the parametric modeling scheme is vulnerable to model misspecification. To ameliorate this, machine learning methods have been employed to characterize propensity scores, including classification and regression trees (e.g. Lee et al. 38 ), neural networks and support vector machines (e.g. Westreich et al. 39 ), and cross-fit estimators (e.g. Zivich and Breskin 40 ). It would be interesting to extend our development by using such machine learning methods to delineate the treatment model, though one would need to carefully investigate the typical issues associated with machine learning methods, such as the lack of transparent interpretation, the "black-box" nature of the algorithms, and the bias–variance trade-off.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802221146308: Supplemental material for this article, published in Statistical Methods in Medical Research.

Acknowledgements

The authors thank the Editor and the review team for their helpful comments on the initial submission. Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs Program.

Appendix: Proof of Theorem 3.2

Let $\gamma_I$ denote the parameters of the selected treatment model (8) and let $\phi(\gamma_I; T_i, X_{Ii}, Z_{Ii})$ denote the score function for $\gamma_I$ derived from model (8). Let $\theta = (\gamma_I^T, \tau)^T$ denote the parameters and $\theta_0 = (\gamma_{I0}^T, \tau_0)^T$ their true values. Let $\hat{\theta} = (\hat{\gamma}_I^T, \hat{\tau})^T$ denote the estimator of $\theta$ referred to in Theorems 3.1 and 3.2.

Define

$$U_i(\theta; Y_i, T_i, X_{Ii}, Z_{Ii}) = \begin{pmatrix} \phi(\gamma_I; T_i, X_{Ii}, Z_{Ii}) \\[4pt] \dfrac{T_i Y_i}{\pi_i} - \dfrac{(1 - T_i) Y_i}{1 - \pi_i} - \tau \end{pmatrix}. \tag{14}$$

It is readily shown that $U_i(\theta; Y_i, T_i, X_{Ii}, Z_{Ii})$ is an unbiased estimating function for $\theta$, i.e.

$$E\{U_i(\theta; Y_i, T_i, X_{Ii}, Z_{Ii})\} = 0,$$

where the expectation is evaluated with respect to the joint distribution of the associated random variables.
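To fix ideas, setting the second component of (14) to zero and solving for $\tau$ yields the inverse probability weighting estimator. Below is a minimal synthetic sketch of this step; the data-generating values are hypothetical, and the propensity scores are treated as known purely for simplicity.

```python
import numpy as np

def tau_ipw(y, t, pi):
    """Solve the tau-component of the estimating function (14):
    sum_i { T_i Y_i / pi_i - (1 - T_i) Y_i / (1 - pi_i) - tau } = 0,
    whose root is the inverse probability weighting estimator."""
    return np.mean(t * y / pi - (1 - t) * y / (1 - pi))

# Synthetic illustration (all quantities hypothetical): one confounder x,
# true ATE = 1, propensity scores taken as known for simplicity.
rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-0.8 * x))      # true propensity scores
t = rng.binomial(1, pi)
y = 1.0 * t + 0.5 * x + rng.normal(size=n)
print(round(tau_ipw(y, t, pi), 3))       # close to the true ATE of 1
```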

Now we derive the asymptotic distribution of the estimator $\hat{\theta}$ of $\theta$ by modifying the arguments of Yi et al. 26 and Carroll et al. 28 For each k and $\psi$, define $U_i^*(\theta; k, \psi) = U_i(\theta; Y_i, T_i, X_{Ii}^*(k, \psi), Z_{Ii})$, where $X_{Ii}^*(k, \psi)$ is the subvector of $X_i^*(k, \psi)$ corresponding to the subvector $X_{Ii}^*$ of $X_i^*$. Since, for a given $\psi$, the $X_{Ii}^*(k, \psi)$ are independent and identically distributed across $k = 1, \ldots, K$, the solutions of $E[U_i^*(\theta; k, \psi)] = 0$ are free of the value of k, where the expectation is evaluated under the distribution of the associated variables. Assuming that this equation has a unique solution, we let $\theta(\psi)$ denote that solution.

Assume further that, for each k and $\psi$,

$$\sum_{i=1}^n U_i^*(\theta; k, \psi) = 0 \tag{15}$$

has a unique solution, and let $\hat{\theta}(k, \psi)$ denote that solution. Applying Theorem 1 of Yi and Reid 41 then gives

$$\hat{\theta}(k, \psi) \xrightarrow{p} \theta(\psi) \quad \text{as } n \to \infty.$$

Applying a Taylor series expansion to (15) gives

$$\sqrt{n}\{\hat{\theta}(k, \psi) - \theta(\psi)\} = n^{-1/2} \sum_{i=1}^n [\Gamma(\psi)]^{-1} U_i^*(\theta(\psi); k, \psi) + o_p(1), \tag{16}$$

where $\Gamma(\psi) = E\{-\partial U_i^*(\theta(\psi); k, \psi)/\partial \theta^T\}$.

Let

$$\hat{\theta}(\psi) = \frac{1}{K} \sum_{k=1}^K \hat{\theta}(k, \psi) \quad \text{and} \quad V_i(\psi) = \frac{1}{K} \sum_{k=1}^K [\Gamma(\psi)]^{-1} U_i^*(\theta(\psi); k, \psi).$$

Thus, summing (16) over k and then dividing by K leads to

$$\sqrt{n}\{\hat{\theta}(\psi) - \theta(\psi)\} = n^{-1/2} \sum_{i=1}^n V_i(\psi) + o_p(1) \quad \text{for } \psi \in \mathcal{C}. \tag{17}$$

Now we examine the extrapolation step used to obtain the SIMEX estimator $\hat{\theta}$. Let d be the dimension of $\theta$. Suppose the exact extrapolation function is known; that is, there is a known $d \times 1$ vector $h(\cdot)$ of functions of the M d-dimensional arguments such that the SIMEX estimator can be written as

$$\hat{\theta} = h(\hat{\theta}(\psi_1), \ldots, \hat{\theta}(\psi_M)),$$

and the true parameter value $\theta_0$ is related to $\{\theta(\psi_1), \ldots, \theta(\psi_M)\}$ by

$$\theta_0 = h(\theta(\psi_1), \ldots, \theta(\psi_M)).$$

For $j = 1, \ldots, M$, let $\dot{h}_j = \partial h(\theta(\psi_1), \ldots, \theta(\psi_M)) / \partial \theta^T(\psi_j)$. Applying a Taylor series expansion to $h(\hat{\theta}(\psi_1), \ldots, \hat{\theta}(\psi_M))$ gives

$$h(\hat{\theta}(\psi_1), \ldots, \hat{\theta}(\psi_M)) = h(\theta(\psi_1), \ldots, \theta(\psi_M)) + \sum_{j=1}^M \dot{h}_j \{\hat{\theta}(\psi_j) - \theta(\psi_j)\} + o_p(n^{-1/2}),$$

and thus

$$\sqrt{n}(\hat{\theta} - \theta_0) = \sum_{j=1}^M \dot{h}_j \sqrt{n}\{\hat{\theta}(\psi_j) - \theta(\psi_j)\} + o_p(1).$$

Define $H_i = \sum_{j=1}^M \dot{h}_j V_i(\psi_j)$. Then, by (17), we have

$$\sqrt{n}(\hat{\theta} - \theta_0) = n^{-1/2} \sum_{i=1}^n H_i + o_p(1). \tag{18}$$

Applying the central limit theorem to (18) gives

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, \Sigma_{\text{SIM}}) \quad \text{as } n \to \infty, \tag{19}$$

where $\Sigma_{\text{SIM}} = \operatorname{var}(H_i)$. Let $v(\tau_0)$ denote the lower-right corner element of $\Sigma_{\text{SIM}}$. Then (19) yields that $\sqrt{n}(\hat{\tau} - \tau_0)$ has an asymptotic normal distribution with mean zero and variance $v(\tau_0)$.
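In practice, $\Sigma_{\text{SIM}}$ can be estimated by the sample covariance of estimated influence contributions $\hat{H}_i$. The following sketch illustrates this final step; the matrix H and its dimensions are hypothetical, and estimating the $H_i$ themselves requires the quantities derived above.

```python
import numpy as np

def simex_variance(H):
    """Given the n x d matrix of (estimated) influence contributions H_i
    from (18), estimate Sigma_SIM = var(H_i) and read off v(tau_0) as the
    lower-right corner element, so that an approximate standard error of
    tau_hat is sqrt(v / n). A sketch; H_i must itself be estimated."""
    n, d = H.shape
    sigma_sim = np.cov(H, rowvar=False)      # d x d sample covariance
    v_tau = sigma_sim[-1, -1]                # lower-right corner element
    return sigma_sim, np.sqrt(v_tau / n)     # (Sigma_SIM, SE of tau_hat)

# Hypothetical illustration with d = 3 (two gamma components plus tau):
rng = np.random.default_rng(11)
H = rng.normal(size=(400, 3)) @ np.diag([1.0, 1.5, 2.0])
print(simex_variance(H)[1])                  # approximately 2/sqrt(400) = 0.1
```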

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Yi’s research was undertaken, in part, by funding from the Canada Research Chairs Program.

Supplemental material: Supplemental material for this article is available online.

References

1. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
2. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984; 79: 516–524.
3. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004; 23: 2937–2960.
4. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics 2005; 61: 962–973.
5. Westreich D, Cole SR, Funk MJ, et al. The role of the c-statistic in variable selection for propensity score models. Pharmacoepidemiol Drug Saf 2011; 20: 317–320.
6. Shortreed SM, Ertefaie A. Outcome-adaptive lasso: variable selection for causal inference. Biometrics 2017; 73: 1111–1122.
7. Ertefaie A, Asgharian M, Stephens DA. Variable selection in causal inference using a simultaneous penalization method. J Causal Inference 2018; 6: 20170010.
8. Koch B, Vock DM, Wolfson J, et al. Variable selection and estimation in causal inference using Bayesian spike and slab priors. Stat Methods Med Res 2020; 29: 2445–2469.
9. Ghosh D, Zhu Y, Coffman DL. Penalized regression procedures for variable selection in the potential outcomes framework. Stat Med 2015; 34: 1645–1658.
10. Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res 2010; 21: 7–30.
11. Imai K, Yamamoto T. Causal inference with differential measurement error: nonparametric identification and sensitivity analysis. Am J Pol Sci 2010; 54: 543–560.
12. Edwards J, Cole SR, Westreich D. All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework. Int J Epidemiol 2015; 44: 1452–1459.
13. McCaffrey DF, Lockwood JR, Setodji CM. Inverse probability weighting with error-prone covariates. Biometrika 2013; 100: 671–680.
14. Shu D, Yi GY. Causal inference with measurement error in outcomes: bias analysis and estimation methods. Stat Methods Med Res 2019; 28: 2049–2068.
15. Shu D, Yi GY. Inverse-probability-of-treatment weighted estimation of causal parameters in the presence of error-contaminated and time-dependent confounders. Biometrical J 2019; 61: 1507–1525.
16. Kyle RP, Moodie EEM, Klein MB, et al. Correcting for measurement error in time-varying covariates in marginal structural models. Am J Epidemiol 2016; 184: 249–258.
17. Carroll RJ, Ruppert D, Stefanski LA, et al. Measurement error in nonlinear models. 2nd ed. Boca Raton, FL: Chapman & Hall, 2006.
18. Yi GY. Statistical analysis with measurement error or misclassification: strategy, method and application. New York: Springer, 2017.
19. Yi GY, Delaigle A, Gustafson P. Handbook of measurement error models. Boca Raton, FL: Chapman & Hall/CRC, 2021.
20. Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc 1994; 89: 1314–1328.
21. Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 2008; 36: 1509–1533.
22. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 1996; 58: 267–288.
23. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 2001; 96: 1348–1360.
24. Wang H, Li R, Tsai C-L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 2007; 94: 553–568.
25. Zhang Y, Li R, Tsai C-L. Regularization parameter selections via generalized information criterion. J Am Stat Assoc 2010; 105: 312–323.
26. Yi GY, Tan X, Li R. Variable selection and inference procedures for marginal analysis of longitudinal data with missing observations and covariate measurement error. Can J Stat 2015; 43: 498–518.
27. Chen L-P, Yi GY. Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics 2021; 77: 956–969.
28. Carroll RJ, Lombard F, Küchenhoff H, et al. Asymptotics for the SIMEX estimator in structural measurement error models. J Am Stat Assoc 1996; 91: 242–250.
29. Bauldry S, Bollen KA, Adair LS. Evaluating measurement error in readings of blood pressure for adolescents and young adults. Blood Press 2015; 24: 96–102.
30. Glasziou PP, Irwig L, Heritier S, et al. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008; 148: 656–661.
31. Lebow DE, Rudd JB. Measurement error in the consumer price index: where do we stand? J Econ Lit 2003; 41: 159–201.
32. Kaufman A, Augustson EM, Patrick H. Unraveling the relationship between smoking and weight: the role of sedentary behavior. J Obes 2012; Article ID 735465.
33. Yi GY, Ma Y, Spiegelman D, Carroll RJ. Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc 2015; 110: 681–696.
34. Chen L-P, Yi GY. Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron J Stat 2020; 14: 4054–4109.
35. Devanarayan V, Stefanski LA. Empirical simulation extrapolation for measurement error models with replicate measurements. Stat Probab Lett 2002; 59: 219–225.
36. Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: the misclassification SIMEX. Biometrics 2006; 62: 85–96.
37. Zhang Q, Yi GY. R package for analysis of data with mixed measurement error and misclassification in covariates: augSIMEX. J Stat Comput Simul 2019; 89: 2293–2315.
38. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med 2010; 29: 337–346.
39. Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol 2010; 63: 826–833.
40. Zivich PN, Breskin A. Machine learning for causal inference: on the use of cross-fit estimators. Epidemiology 2021; 32: 393–401.
41. Yi GY, Reid N. A note on misspecified estimating functions. Stat Sin 2010; 20: 1749–1769.
