Author manuscript; available in PMC: 2018 Oct 1.
Published in final edited form as: J Econom. 2017 Jul 8;200(2):194–206. doi: 10.1016/j.jeconom.2017.06.005

Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models

Tanya P Garcia 1,1, Yanyuan Ma 2
PMCID: PMC5708600  NIHMSID: NIHMS891467  PMID: 29200600

Abstract

We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.

Some Key Words: Influence function, Linear operator, Measurement error, Nuisance tangent space, Restricted moment model

1 Introduction

1.1 Motivating problem

Regression is arguably the most familiar topic in econometrics and statistics and has motivated a vast amount of literature. Many scientific phenomena can be modeled using a general regression model where a univariate response Y is related to covariates X ∈ ℝk and Z ∈ ℝs through

Y=m(X,Z;β)+ε. (1)

Here, m is known up to the parameter β ∈ ℝp, and the model error ε is only required to satisfy E(ε|X,Z) = 0. With the conditional distribution of ε unspecified, this model is also known as a restricted moment model (RMM). A typical challenge with RMMs is that some covariates, say Z, are precisely measured, whereas others, say X, are mismeasured. In place of Xi, i = 1, …, n, one instead observes ℓ surrogate replicates

$$W_{ij} = X_i + U_{ij}, \qquad j = 1, \ldots, \ell, \tag{2}$$

where Uij’s are independent, mean zero random variables with unknown variance-covariance ΩU ∈ ℝk×k. The surrogacy assumption implies that Yi and Wij’s are conditionally independent given (Xi, Zi). Lastly, we suppose the measurement error is classical so that Xi and Uij are independent.

An example of this model is in the nutrition study of Flagg et al. (2000). There, a key interest is properly modeling the relationship between percent calories from fat (Y ), race (Z), and saturated fat intake (X). Saturated fat intake is not known exactly and only an approximate version via two repeated measurements, W·1,W·2, is available from food frequency questionnaires. To handle the measurement error in this example and in any model characterized by (1) and (2), the goal of this paper is to estimate the model parameters β and ΩU under the following general assumptions:

  • Assumption (i): the mean model m(X,Z; β) is any linear or nonlinear function;

  • Assumption (ii): the model error ε may depend on (X,Z) (i.e., heteroskedasticity), and its conditional distribution pε|X,Z(ε|x, z) is unspecified;

  • Assumption (iii): the conditional distribution of X given Z, pX|Z(x|z), and the distribution of Z, pZ(z), exist but are completely unspecified. Thus we have a modern functional measurement error model (Carroll et al., 2006, chap. 7.2);

  • Assumption (iv): the measurement error is classical and Uij, i = 1, …, n; j = 1, …, ℓ, has a general parametric distribution pUij (u; ΩU) with ΩU unknown. This contrasts with the usual normality assumption for measurement error (Carroll et al., 1999, 2004).
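As an illustration of models (1) and (2) under Assumptions (i)–(iv), the following minimal sketch simulates data with heteroskedastic model error and classical measurement error. The linear mean, the Bernoulli Z, the normal X and U, the variance function, and the name `simulate_rmm` are all assumptions chosen for illustration; none of them is prescribed by the paper.

```python
import numpy as np


def simulate_rmm(n, beta, omega_u, n_rep=2, seed=0):
    """Simulate (Y, W, Z, X) from Y = m(X, Z; beta) + eps and W_ij = X_i + U_ij.

    Illustrative assumptions: scalar X, linear mean, normal measurement error.
    """
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n).astype(float)   # precisely measured covariate
    x = rng.normal(loc=0.5 * z, scale=1.0, size=n)   # error-prone covariate (unobserved)
    mean = beta[0] + beta[1] * x + beta[2] * z       # m(X, Z; beta), assumed linear here
    # Heteroskedastic model error: variance depends on X, but E(eps | X, Z) = 0.
    eps = rng.normal(size=n) * np.sqrt(0.5 + 0.5 * x**2)
    y = mean + eps
    # Classical measurement error: U is mean zero and independent of X.
    u = rng.normal(scale=np.sqrt(omega_u), size=(n, n_rep))
    w = x[:, None] + u                               # replicates W_i1, ..., W_il
    return y, w, z, x
```

In practice only (Y, W, Z) would be observed; X is returned here solely so that a simulation can be checked.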

1.2 Estimation challenges

Allowing pε|X,Z, pX|Z, and pZ to be unspecified provides more modeling flexibility and reduces the chance of model misspecification. However, it also raises serious challenges. The unknown distributions cannot be ignored, and arbitrarily adopting models for pε|X,Z or pX|Z may cause bias. Estimating these distributions is also potentially difficult. For example, pX|Z is a model of unobserved variables. Its estimation would involve an inverse operation such as deconvolution (Stefanski and Carroll, 1990), which converges at a very slow rate (Carroll and Hall, 1988; Fan, 1991). The estimation of pε|X,Z(ε|x, z) is equally challenging because residuals are unobtainable in measurement error models even if the model parameters were known. The unavailability of the residuals makes correctly estimating the model error’s variance-covariance difficult. This is especially problematic when the model error is heteroskedastic, and a proper variance-covariance is needed to yield consistent model parameter estimates. Although methods exist to estimate the unknown variance-covariance, they are either approximate (Carroll and Wang, 2008) or complex (Delaigle and Hall, 2011).

1.3 Competing methods and features of our approach

A nonlinear, classical measurement error model with replicates has been treated by several authors. Extensive research has focused on measurement error problems with specific forms of m(X,Z; β), ranging from polynomial regression (Chan and Mak, 1985; Cheng and Schneeweiss, 1998; Cheng et al., 2000; Huang and Huwang, 2001) to generalized linear mixed models (Liang, 2009; Li and Liqun, 2012). For our purposes, we consider m(X,Z; β) to be any linear or nonlinear form, which is a general assumption of many existing works. For example, Li (2002) used Kotlarski’s identification (Rao, 1992, p. 21) to identify and consistently estimate model parameters for a general m(X,Z; β) with two replicates Wi1,Wi2. Tsiatis and Ma (2004) developed a consistent, asymptotically normal estimator when pε|X,Z is known and parametric. Schennach (2004a) used properties of Fourier transforms, where the crux of her work lies in constructing moments of the unobserved X, and then forming estimators that can be written in terms of these moments. Schennach (2004b) developed an unbiased, Nadaraya-Watson based estimator to nonparametrically estimate m(X,Z) in (1). Lastly, Hu and Schennach (2008) and Schennach and Hu (2013) used a sieve maximum likelihood estimator (MLE) which yields consistency; the former also successfully handles heteroskedastic measurement error (i.e., U in (2) depends on X). For an overview on measurement error models, see Fuller (1987) for earlier results in linear models and Carroll et al. (2006) for modern approaches in linear and nonlinear models. The developed methodologies have all positively impacted the literature of regression with classical measurement error. Still, some limitations linger and it is these limitations that motivated this work.

In this paper, we propose to overcome two key limitations of existing methods: the direct estimation or knowledge of pε|X,Z, and the inability to handle model error heteroskedasticity. In this regard, we develop a semiparametric estimator which avoids estimating pX|Z and pε|X,Z. This is possible through deriving the semiparametric efficient score (Bickel et al., 1993; Tsiatis, 2006) which we reveal is robust to misspecification of the unknown distributions. Our approach involves adopting working parametric models for the unknown distributions. We show that if the working models are correct, then the estimator is semiparametric efficient; otherwise, the estimator is still root-n consistent and asymptotically normal. Lastly, our method does not require correctly estimating the model error’s variance-covariance.

Not having to directly estimate pε|X,Z differs from the semiparametric Tsiatis and Ma (2004) method and the sieve MLE (Shen, 1997; Schennach and Hu, 2013). Tsiatis and Ma (2004) assume pε|X,Z is a known, parametric form. Unfortunately, in our own numerical studies (Section 4), we found that such an assumption is sensitive to misspecification of the model error variance. With the sieve MLE, pε|X,Z and pX|Z are represented by increasingly rich parametric representations such as a truncated series of basis functions. The parameters in the truncated series and regression are then jointly estimated via MLE subject to constraints that ensure the estimated pε|X,Z, pX|Z are valid densities and that E(ε|X,Z) = 0. Sieve methods yield consistent estimators and are fairly straight-forward to implement, making the approach widely appreciated in the literature. However, compared to the sieve MLE, our approach bypasses the consistent estimation of pε|X,Z, pX|Z. In doing so, our method eliminates a step in the aim of constructing a consistent estimator and, as described next, flexibly handles potential heteroskedasticity in the model error.

Model error heteroskedasticity is a challenging problem, especially in a measurement error setting where residuals are unavailable to aid the appropriate modeling of variance-covariance structures. In bypassing the correct estimation of pε|X,Z, our method implicitly handles misspecifications of the model error’s variance structure. That is, knowledge of the model error being heteroskedastic or homoskedastic is not needed. In our own explorations of existing estimators to handle model error heteroskedasticity, we found some shortcomings. The estimators of Li (2002) and Schennach and Hu (2013) both assume ε and (X,Z) are independent (i.e., homoskedastic model error). Consequently, applying these estimators naturally results in bias when the model error is truly heteroskedastic; see numerical studies in Section 4 for bias of the sieve estimator from Schennach and Hu (2013). The bias persists even when the number of terms in the sieve representations increases. As an improvement, Hu and Schennach (2008) developed a different sieve estimator that successfully handles heteroskedastic measurement error (i.e., U in (2) depends on X). Unfortunately, when we extended their methodology to handle heteroskedastic model error, we encountered two difficulties. First, for the heteroskedastic sieve of pε|X,Z to be a valid density and have conditional mean zero, we must impose twelve constraints (see Section S.8, Supplementary Material). Second, from our numerical studies (Section 4), we found that the heteroskedastic sieve estimator yielded biased estimates for the RMM models considered here. Given that the sieve approach has been widely successful in various regressions with errors-in-covariates, we were initially surprised by these results. However, we now believe the bias is a consequence of the complex computation that attempts a constrained optimization subject to too many constraints.

Lastly, for a nonparametric regression with classical measurement error, Schennach (2004b) developed an unbiased, Nadaraya-Watson based estimator that can handle heteroskedastic model error. However, our situation is completely different in that we consider a semiparametric regression model (i.e., m(X,Z; β)), not a nonparametric one (i.e., m(X,Z)).

Thus, as far as we are aware, our semiparametric approach provides advantages over existing methods in that it bypasses estimating pε|X,Z and pX|Z, and simultaneously handles unspecified heteroskedastic model error and mismeasured covariates. It is important to note that our method is developed under the specific assumptions in Section 1.1, which include the availability of multiple proxy variables and classical measurement error (Assumption (iv)). Under Assumption (iv), we may easily estimate ΩU in the measurement error distribution (Section 2.1) and thus more easily identify estimating equations for β (Theorem 1). When this assumption no longer holds, the estimation procedure is more difficult: a more general method is needed to simultaneously estimate ΩU and β. Work in this area has been explored; see Hu and Schennach (2008) and Chen et al. (2009) for developments in non-classical measurement error, and Chen et al. (2009) for estimation without available replicates.

The rest of the paper is organized as follows. Section 2 establishes identifiability results for the model parameters. Section 3 describes the main results for the semiparametric estimator, including theoretical properties, robustness to misspecifications of working distributions and its numerical implementation. We show the satisfactory performance of the estimator through a simulation study in Section 4 and a data example in Section 5. Section 6 concludes the paper with a brief discussion. Technical proofs and additional simulation results are provided in the Supplementary Material. All computer code is available upon request.

2 Identification

2.1 Identification of ΩU

The identification of ΩU is facilitated by the observed replicates. If replicates are unavailable, then validation data (Lee and Sepanski, 1995) or instrumental variables (Carroll et al., 2004) can be used.

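To identify ΩU, we use the usual components of variance analysis (Carroll et al., 2006, chap. 4). Define $\bar{W}_i = \sum_{j=1}^{\ell} W_{ij}/\ell$ and $V_i = \sum_{j=1}^{\ell} (W_{ij} - \bar{W}_i)(W_{ij} - \bar{W}_i)^{\rm T}$. Then ΩU = E(Vi)/(ℓ − 1), hence it is identifiable. In practice, we solve

$$\sum_{i=1}^{n} \left\{ \frac{V_i}{\ell - 1} - \Omega_U \right\} = 0 \tag{3}$$

to obtain Ω̂U.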
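The root of (3) has a closed form: the sample average of Vi/(ℓ − 1). A minimal sketch of that computation follows; the array layout (n subjects, ℓ replicates, k covariates) and the function name `estimate_omega_u` are assumptions made for illustration.

```python
import numpy as np


def estimate_omega_u(w):
    """Components-of-variance estimate of Omega_U from replicates.

    w : array of shape (n, ell, k) holding W_ij for subject i and replicate j.
    Returns the k x k matrix that solves estimating equation (3).
    """
    w = np.asarray(w, dtype=float)
    n, ell, k = w.shape
    if ell < 2:
        raise ValueError("at least two replicates per subject are required")
    w_bar = w.mean(axis=1, keepdims=True)          # subject-level averages, (n, 1, k)
    resid = w - w_bar                              # W_ij - W_bar_i
    v = np.einsum("ijk,ijl->ikl", resid, resid)    # V_i for each subject, (n, k, k)
    return v.mean(axis=0) / (ell - 1)              # root of (3)
```

Because X_i cancels in W_ij − W̄_i, the estimate does not depend on the distribution of X, which is what makes ΩU identifiable from replicates alone.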

2.2 Identification of β

We demonstrate identifiability of β by casting the RMM with measurement error into a semiparametric framework. Let η1(x, z) ≡ pX|Z(x|z), η2(ε, x, z) ≡ pε|X,Z(ε|x, z), and η3(z) ≡ pZ(z) denote infinite-dimensional nuisance parameters corresponding to the unknown distributions. Let W denote the average of the observed replicates and pW|X,Z(w|x, z; α) denote its conditional distribution given (X,Z), with α = vech(ΩU) (i.e., the vectorized version of the upper block of ΩU including its diagonal). Then, the probability density function of (Y,W,Z) is

$$p_{Y,W,Z}(y,w,z;\beta,\alpha,\eta_1,\eta_2,\eta_3) = \int \eta_2\{y - m(x,z;\beta), x, z\}\, p_{W|X,Z}(w|x,z;\alpha)\, \eta_1(x,z)\, \eta_3(z)\, d\mu(x), \tag{4}$$

where μ(·) denotes the dominating measure, which is the Lebesgue measure for continuous variables and the counting measure for discrete variables. The density of (Y,W,Z) contains both finite and infinite-dimensional parameters, hence the RMM with measurement error is a semiparametric model.

The identifiability of β in the RMM with measurement error is closely linked to the identifiability of β in the RMM without measurement error. To see this, assume to the contrary that the RMM without measurement error is identifiable, but that β in the RMM with measurement error is not. Then, there exist β0, η1, η2, η3 and $\tilde\beta, \tilde\eta_1, \tilde\eta_2, \tilde\eta_3$ where $\beta_0 \neq \tilde\beta$, but β0, η1, η2, η3 and $\tilde\beta, \tilde\eta_1, \tilde\eta_2, \tilde\eta_3$ yield the same data generation procedure:

$$\begin{aligned} p_{Y,W,Z}(y,w,z;\beta,\alpha,\eta_1,\eta_2,\eta_3) &= \int \eta_2\{y - m(x,z;\beta_0), x, z\}\, \eta_1(x,z)\, \eta_3(z)\, p_U(w - x;\alpha)\, dx \\ &= \int \tilde\eta_2\{y - m(x,z;\tilde\beta), x, z\}\, \tilde\eta_1(x,z)\, \tilde\eta_3(z)\, p_U(w - x;\alpha)\, dx. \end{aligned}$$

Here, pU(u; α) denotes the measurement error distribution. Deconvolution then implies that for all (Y,X,Z), $\eta_2\{y - m(x,z;\beta_0), x, z\}\,\eta_1(x,z)\,\eta_3(z) = \tilde\eta_2\{y - m(x,z;\tilde\beta), x, z\}\,\tilde\eta_1(x,z)\,\tilde\eta_3(z)$. A similar argument applied to

$$p_{W,Z}(w,z;\alpha,\eta_1,\eta_3) = \int p_U(w - x;\alpha)\, \eta_1(x,z)\, \eta_3(z)\, dx = \int p_U(w - x;\alpha)\, \tilde\eta_1(x,z)\, \tilde\eta_3(z)\, dx$$

yields $\eta_1(x,z)\,\eta_3(z) = \tilde\eta_1(x,z)\,\tilde\eta_3(z)$ for all (x, z). Together, these results imply that on the support of the probability density of (x, z),

$$\eta_2\{y - m(x,z;\beta_0), x, z\} = \tilde\eta_2\{y - m(x,z;\tilde\beta), x, z\} \tag{5}$$

for all (Y,X,Z). Hence, (5) implies that the conditional model error distributions under β0 and under $\tilde\beta$ are identical, which makes the RMM without measurement error not identifiable. This contradicts our original assumption. Therefore, we have identifiability as long as we begin with an identifiable RMM without measurement error. Identifiability of the RMM without measurement error depends on the specific form of the mean model and is generally straightforward to establish.

3 Methodology

3.1 Estimation of ΩU and β

Estimation of ΩU, equivalently α = vech(ΩU ), follows directly from the solution to (3). Estimation of β builds upon the semiparametric results for an RMM without measurement error. For this latter case, Tsiatis (2006) demonstrated that consistent estimators are the solutions to the linear estimating equation

$$\sum_{i=1}^{n} A(X_i,Z_i)\{Y_i - m(X_i,Z_i;\beta)\} = 0.$$

Here, A(X,Z) ∈ ℝp is an arbitrary function that does not cause the above estimating equation to degenerate. If $A(X,Z) = \{\partial m(X,Z;\beta)/\partial\beta\}\, E(\varepsilon^2|X,Z)^{-1}$, then the equation is named the optimal generalized estimating equation (optimal GEE; Liang and Zeger, 1986), and it yields the efficient estimator. See Section S.1 (Supplementary Material) for a brief overview of the semiparametric procedure and its application to the RMM without measurement error.
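For the error-free case, the estimating equation with the optimal-GEE weight can be solved by a Newton-type iteration. The sketch below is only an illustration under assumptions: X and Z are observed without error, the conditional variance E(ε²|X,Z) is supplied by the user through the hypothetical argument `var_fn`, and the Newton solver and all function names are choices made here rather than part of the paper.

```python
import numpy as np


def linear_mean(x, z, beta):
    """Example mean model m(x, z; beta) = beta0 + beta1 * x + beta2 * z."""
    return beta[0] + beta[1] * x + beta[2] * z


def linear_grad(x, z, beta):
    """Gradient of the example mean model with respect to beta, shape (n, 3)."""
    return np.column_stack([np.ones_like(x), x, z])


def solve_gee(y, x, z, mean_fn, grad_fn, var_fn, beta_init, n_iter=50, tol=1e-10):
    """Solve sum_i A(X_i, Z_i) {Y_i - m(X_i, Z_i; beta)} = 0 by Newton steps,
    with A = (dm/dbeta) / E(eps^2 | X, Z)."""
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(n_iter):
        resid = y - mean_fn(x, z, beta)           # Y_i - m(X_i, Z_i; beta)
        grad = grad_fn(x, z, beta)                # (n, p) matrix of dm/dbeta
        a = grad / var_fn(x, z)[:, None]          # optimal-GEE weight A(X_i, Z_i)
        score = a.T @ resid                       # left-hand side of the equation
        jac = a.T @ grad                          # minus the derivative of the score
        step = np.linalg.solve(jac, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With a linear mean the iteration converges in one step and reduces to weighted least squares; for nonlinear means a reasonable starting value would be needed.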

Applying the semiparametric procedure to the RMM with measurement error, we establish in Theorem 1 the condition that any consistent estimator for β must satisfy. A detailed derivation is given in Section S.2 (Supplementary Material).

Theorem 1

For the RMM with measurement error, a consistent estimator for β is the solution to $\sum_{i=1}^{n} f(Y_i,W_i,Z_i;\beta) = 0$ where f is a p-dimensional function in

$$\Lambda = \left[ f(Y,W,Z) : E\{f(Y,W,Z)\,|\,Y,X,Z\} = g(X,Z)\,\varepsilon \right].$$

Here, g is an arbitrary function of (X,Z) with finite variance, and

$$E\{f(Y,W,Z)\,|\,Y,X,Z\} = \int f(y,w,z)\, p_{W|X,Z}(w|x,z;\alpha)\, d\mu(w). \tag{6}$$

Theorem 1 states that to determine if a function f(Y,W,Z; β) yields a consistent estimator for β, one must verify that f belongs to Λ. The verification involves computing the integral in (6) and checking that the result is of the form g(X,Z)ε for some function g(X,Z). Note that the integration in (6) does not involve the unknown distributions η1 or η2. Instead, it only involves the distribution pW|X,Z(w|x, z; α) which is completely known once α is estimated from (3). This observation means that even without knowing η1 and η2, one can verify if a function f belongs to Λ, and thus use it to form a consistent estimator for β.
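The membership check described above can be carried out numerically. The sketch below evaluates the integral in (6) by Gauss-Hermite quadrature for a toy case that is assumed here purely for illustration: a scalar linear mean m(x; β) = βx with no Z, and W given X normal with mean X and known variance `sigma_u**2`. The candidate functions and all names are hypothetical.

```python
import numpy as np


def conditional_mean_given_x(f, y, x, sigma_u, beta, n_nodes=40):
    """Approximate E{f(Y, W) | Y = y, X = x} in (6), assuming W | X ~ N(x, sigma_u^2)."""
    # Nodes and weights for the weight function exp(-t^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = x + sigma_u * nodes
    return np.sum(weights * f(y, w, beta, sigma_u)) / np.sqrt(2.0 * np.pi)


def naive_score(y, w, beta, sigma_u):
    """Least-squares score with W substituted for X (not in Lambda)."""
    return w * (y - beta * w)


def corrected_score(y, w, beta, sigma_u):
    """Corrected score for the linear model; its conditional mean is x * eps."""
    return w * (y - beta * w) + beta * sigma_u**2
```

For the corrected score the integral equals x(y − βx), which has the required form g(X)ε with g(X) = X. The naive score leaves an extra term −βσ²_U that does not vanish when ε = 0, so it is not in Λ.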

Unfortunately, finding f that belongs to Λ is not a trivial task. It is equivalent to the challenge of finding a corrected score which is only resolved for generalized linear models (Nakamura, 1990). An approximate corrected score is possible using complex-variable computations and Monte Carlo averaging (Novick and Stefanski, 2002). In this work, we use a careful analytic derivation to construct f in Λ.

Let η1*(x, z) and η2*(ε, x, z) be working models of η1 and η2, respectively. The working models may be completely different from the true models, denoted as η10, η20, but we assume the support is the same. Throughout, let E*(·) denote the expectation computed under η1*, η2*, and E(·) denote the expectation computed under η10, η20. Define conjugate linear operators

$$\mathcal{K}_1\{h(Y,X,Z)\} = E^*\{h(Y,X,Z)\,|\,Y,W,Z\}, \qquad \mathcal{K}_2\{f(Y,W,Z)\} = E\{f(Y,W,Z)\,|\,Y,X,Z\}.$$

It is important to note that 𝒦2 is independent of η1*, η2* as evident from (6); hence its definition is asterisk-free.

Using the projection theorem (Rudin, 1987), we demonstrate in Section S.3 (Supplementary Material) that a function in Λ is 𝒦1{d*(Y,X,Z)} where d*(Y,X,Z) is a p-dimensional function that satisfies

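$$\varepsilon\, E^*(d^*\varepsilon\,|\,X,Z) + \mathcal{K}_2\circ\mathcal{K}_1(d^*)\, E^*(\varepsilon^2\,|\,X,Z) - \varepsilon\, E^*\{\mathcal{K}_2\circ\mathcal{K}_1(d^*)\,\varepsilon\,|\,X,Z\} = m'_\beta(X,Z;\beta)\,\varepsilon. \tag{7}$$

Here, ∘ denotes the composite operation and $m'_\beta(X,Z;\beta)$ is ∂m(X,Z; β)/∂β. To see that 𝒦1{d*(Y,X,Z)} indeed belongs to Λ, we can easily re-arrange (7) to show that E[𝒦1{d*(Y,X,Z)}|Y,X,Z] = g(X,Z)ε with $g(X,Z) = \left(m'_\beta(X,Z;\beta) - E^*[\{d^* - \mathcal{K}_2\circ\mathcal{K}_1(d^*)\}\,\varepsilon\,|\,X,Z]\right) E^*(\varepsilon^2\,|\,X,Z)^{-1}$. It is worth noting that in the terminology of semiparametric theory, 𝒦1{d*(Y,X,Z)} is known as the locally efficient score vector

$$S^*_{\rm eff}(Y,W,Z;\beta,\alpha,\eta_1^*,\eta_2^*) \equiv \mathcal{K}_1\{d^*(Y,X,Z)\}.$$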
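The re-arrangement of (7) mentioned above can be written out step by step. The following is a sketch of that algebra, assuming E*(ε²|X,Z) > 0 and using E[𝒦1(d*)|Y,X,Z] = 𝒦2∘𝒦1(d*), which holds by the definition of 𝒦2.

```latex
\begin{align*}
% Start from (7) and move the two terms proportional to \varepsilon to the right-hand side.
\mathcal{K}_2\circ\mathcal{K}_1(d^*)\, E^*(\varepsilon^2 \mid X,Z)
  &= m'_\beta(X,Z;\beta)\,\varepsilon
     - \varepsilon\, E^*(d^*\varepsilon \mid X,Z)
     + \varepsilon\, E^*\{\mathcal{K}_2\circ\mathcal{K}_1(d^*)\,\varepsilon \mid X,Z\} \\
% Collect the conditional expectations into a single bracket.
  &= \bigl( m'_\beta(X,Z;\beta)
     - E^*[\{d^* - \mathcal{K}_2\circ\mathcal{K}_1(d^*)\}\,\varepsilon \mid X,Z] \bigr)\,\varepsilon . \\
% Divide by the scalar E^*(\varepsilon^2 | X,Z).
E[\mathcal{K}_1\{d^*(Y,X,Z)\} \mid Y,X,Z]
  = \mathcal{K}_2\circ\mathcal{K}_1(d^*)
  &= g(X,Z)\,\varepsilon .
\end{align*}
```

The bracket in the second line depends on (X,Z) only, so dividing by E*(ε²|X,Z) gives exactly the g(X,Z) stated in the text.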

A few remarks are in order. First, equation (7) may admit more than one solution d*. However, by the projection theorem (Rudin, 1987), even if d* is not unique, 𝒦1{d*(Y,X,Z)} is unique; see Section S.3 (Supplementary Material). Hence differences in numerical procedures for obtaining d* will not affect the final estimating equation, which is formed using $S^*_{\rm eff}(Y,W,Z;\beta,\alpha,\eta_1^*,\eta_2^*) \equiv \mathcal{K}_1\{d^*(Y,X,Z)\}$. Second, to ensure that the parameter values are identified from the ensuing estimating equation, we require that $E\{S^*_{\rm eff}(Y_i,W_i,Z_i;\beta,\alpha,\eta_1^*,\eta_2^*)\} = 0$ has a unique root. Third, even if the unique root property holds at the population level, the estimating equation may still have multiple roots at the sample level. As far as we are aware, selecting among the multiple roots in estimating equations is a thorny issue; empirical knowledge for root selection is usually needed in practice. Lastly, because 𝒦1{d*(Y,X,Z)} is constructed to be an element of Λ and all elements of Λ yield consistent estimators for β (Theorem 1), the choice of η1*, η2* in forming 𝒦1{d*(Y,X,Z)} does not affect consistency. See Section 3.3 for a discussion of choosing η1*, η2* in practice. To the best of our knowledge, this is the only existing root-n consistent estimator for the RMM with measurement error that does not require estimating the unknown η1, η2.

3.2 Algorithm for estimating ΩU and β

The algorithm for estimating ΩU and β in model (1) and (2) is as follows.

  1. Recall that $\bar{W}_i = \sum_{j=1}^{\ell} W_{ij}/\ell$ and $V_i = \sum_{j=1}^{\ell} (W_{ij} - \bar{W}_i)(W_{ij} - \bar{W}_i)^{\rm T}$. Solve for Ω̂U as the root of (3) and form α̂n = vech(Ω̂U).

  2. Propose a working density model η1* for η1.

  3. Propose a working density model η2* for η2 that satisfies E*(ε|X,Z) = 0.

  4. Compute 𝒦1, 𝒦2, and E*(·|X,Z) under pW|X,Z(w|x, z; α̂n), η1*, and η2*. Solve for d*(Y,X,Z) from (7). When (7) admits more than one solution, pick one arbitrarily.

  5. Form the score vector $S^*_{\rm eff}(Y,W,Z;\beta,\hat\alpha_n,\eta_1^*,\eta_2^*) = \mathcal{K}_1(d^*)$ by calculating 𝒦1 under η1* and pW|X,Z(w|x, z; α̂n). Even if (7) has multiple solutions, they will yield the same 𝒦1(d*) (Rudin, 1987).

  6. Solve the estimating equation $\sum_{i=1}^{n} S^*_{\rm eff}(Y_i,W_i,Z_i;\beta,\hat\alpha_n,\eta_1^*,\eta_2^*) = 0$ for the estimator β̂n.

In estimating β, we have treated α via a plug-in estimator obtained from Step 1. Alternatively, we can also augment α to β and simultaneously estimate both using the procedure from Step 2 on. That is, we may solve for $\hat\theta_n = (\hat\alpha_n^{\rm T}, \hat\beta_n^{\rm T})^{\rm T}$ as the root of

$$\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\eta_1,\eta_2) = 0. \tag{8}$$

Here, 𝒮 = (ϕT, fT )T with ϕ denoting the estimating equations in (3) corresponding to the α elements, and f ∈ Λ. In our algorithm, we set $f = S^*_{\rm eff}$, η1 = η1*, and η2 = η2*. Solving for α̂n and β̂n simultaneously does not change the analysis.

The numerical implementation of the algorithm is given in Section 3.5. We now give some remarks regarding the algorithm.

3.3 Selection and impact of working models η1*, η2*

One flexibility of our algorithm is the ability to choose possibly incorrect working models η1*, η2* for η1, η2 (Steps 2 and 3 of the algorithm). We now discuss a practical approach for selecting these working models and its impact on the consistency and efficiency of β̂n.

Remark 1

When either or both of η1*, η2* are misspecified and the measurement error distribution is estimated as pW|X,Z(w|x, z; α̂n), the algorithm still provides a consistent estimator.

To prove the consistency claim in Remark 1, we make the following regularity conditions, stated using the general 𝒮 = (ϕT, fT )T and θ = (αT, βT )T notation. We assume θ belongs to a domain of interest Θ which is a compact set.

  • (R1)

    The estimating equation in (8) and its expectation E{𝒮(Y,W, V,Z; θ, η1, η2)} are sufficiently smooth in (θ, η1, η2) in a neighborhood of (θ0, η10, η20). This condition is needed so that the weak law of large numbers is valid.

  • (R2)

    The matrix E{∂𝒮(Y,W, V,Z; θ, η1, η2)/∂θT } is invertible, bounded and smooth in (θ, η1, η2) in a neighborhood of (θ0, η10, η20). This assumption permits the re-arrangement of a Taylor expansion and hence, the applicability of the central limit theorem.

  • (R3)

    For η1 = η1*, η2 = η2*, the equation E{𝒮(Yi,Wi, Vi, Zi; θ, η1, η2)} = 0 has a unique solution and $E\{\sup_{\theta\in\Theta} |\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\eta_1,\eta_2)|\} < \infty$ component wise. The unique solution requirement is commonly needed in semiparametric estimation and in parametric estimation, except when the objective function guarantees the unique root property such as when it is convex. While a globally unique root property is somewhat restrictive, one can instead require a unique root in a region of interest, so long as it is justifiable to consider parameters only in that region.

We show in Section S.4 (Supplementary Material) that under these regularity conditions, θ̂n is a consistent estimator even when the working models η1*, η2* are misspecified.

From Remark 1, in terms of obtaining a consistent estimator, we are free to choose any working models η1* and η2*. Thus, for computational ease, we suggest using Gaussian models due to their simplicity. We also recommend choosing the support of η1*, η2* to be as large as that of the true distributions so as to maintain numerical stability. Of course, the true distributions are unknown, so the latter requirement may be achieved by choosing the support based on the observed data. For example, after centering the observed data (Y,W,Z), one may choose η1* to be a normal distribution with mean zero and variance equal to the sample variance of W. Likewise, one may choose η2* to be a normal distribution with mean zero and variance as estimated from the residual sum of squares after regressing Y on m(W, Z; β).
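The data-driven Gaussian choice just described might be coded as follows. This is a sketch under assumptions: X and Z are scalar, the mean model is linear so that the naive regression of Y on (W, Z) is ordinary least squares, and the names `gaussian_working_models` and `normal_pdf` are hypothetical.

```python
import numpy as np


def normal_pdf(t, var):
    """Density of a mean-zero normal with variance var, evaluated at t."""
    return np.exp(-0.5 * np.square(t) / var) / np.sqrt(2.0 * np.pi * var)


def gaussian_working_models(y, w_bar, z):
    """Build Gaussian working models eta1*, eta2* from centered (Y, W, Z).

    Assumes a linear mean model for the naive regression of Y on (W, Z).
    """
    y_c, w_c, z_c = (v - v.mean() for v in (y, w_bar, z))
    var_eta1 = np.var(w_c, ddof=1)                       # sample variance of W
    design = np.column_stack([w_c, z_c])
    coef, *_ = np.linalg.lstsq(design, y_c, rcond=None)  # naive fit ignoring measurement error
    resid = y_c - design @ coef
    # Residual variance; one extra degree of freedom is removed for the centering.
    var_eta2 = resid @ resid / (len(y_c) - design.shape[1] - 1)

    def eta1(x, z=None):
        return normal_pdf(x, var_eta1)                   # working model for p_{X|Z}

    def eta2(eps, x=None, z=None):
        return normal_pdf(eps, var_eta2)                 # working model for p_{eps|X,Z}

    return eta1, eta2, var_eta1, var_eta2
```

Both working models ignore their conditioning arguments, so η2* is homoskedastic by construction; by Remark 1 this affects efficiency but not consistency.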

Although any working models η1*, η2* maintain consistency, they affect efficiency in theory as we now describe.

Remark 2

The choice of η1*, η2* only affects the estimation of β, so we characterize the efficiency for β only with α fixed at the truth. When the working models are correct, i.e., η1* = η10 and η2* = η20, the algorithm gives the optimal estimator in that its estimation variance achieves the semiparametric efficiency bound (Tsiatis, 2006, chap. 4). Such results follow because, in this case, the resulting estimator solves the true efficient score estimating equation $\sum_{i=1}^{n} S_{\rm eff}(Y_i,W_i,Z_i;\hat\beta_n,\alpha_0,\eta_{10},\eta_{20}) = 0$.

Justification of Remark 2 follows from the principles of semiparametric theory (Tsiatis, 2006, chap. 4). From Remark 2, if the working models η1*, η2* are exactly the true models η10, η20, then the resulting estimator β̂n is most efficient. Of course, knowing the true models η10, η20 is rarely an option. Hence, some efficiency loss is expected since the working models will most likely differ from the truth. The incurred loss depends on the proposed working models, and can be theoretically characterized as follows.

Let $S^*_{\rm eff}$ be as in Step 5 of the algorithm, which is constructed under the possibly misspecified working models η1*, η2*. Let $A^* = E\{\partial S^*_{\rm eff}(Y,W,Z;\beta_0,\alpha_0,\eta_1^*,\eta_2^*)/\partial\beta^{\rm T}\}$ and $B^* = {\rm var}\{S^*_{\rm eff}(Y,W,Z;\beta_0,\alpha_0,\eta_1^*,\eta_2^*)\}$, where β0, α0 denote the true parameter values. The asterisks in A* and B* are used to emphasize that $S^*_{\rm eff}$ depends on the working models. Finally, let A, B and $S_{\rm eff}$ be defined analogously to A*, B*, and $S^*_{\rm eff}$, respectively, except with η1* = η10 and η2* = η20.

In Theorem 2 (see Section 3.4), we demonstrate that under working models η1*, η2*, $n^{1/2}(\hat\beta_n - \beta_0)$ is asymptotically normal with mean zero and variance-covariance $A^{*-1}B^*(A^{*-1})^{\rm T}$. In comparison, under the true η10, η20, the asymptotic variance-covariance is $A^{-1}B(A^{-1})^{\rm T}$. Therefore, the theoretical efficiency loss of the estimator computed under misspecified working models relative to the truth is the difference $A^{*-1}B^*(A^{*-1})^{\rm T} - A^{-1}B(A^{-1})^{\rm T}$. This difference is identical to $E\{(A^{*-1}S^*_{\rm eff} - A^{-1}S_{\rm eff})(A^{*-1}S^*_{\rm eff} - A^{-1}S_{\rm eff})^{\rm T}\}$ (see Section S.5 in Supplementary Material), which means that the efficiency loss is positive semidefinite. The precise efficiency loss can thus be evaluated in each case. In our limited empirical studies (see Section 4.3.2), the loss was generally small, and the estimation variance was quite insensitive to the choice of the working models.

In summary, our procedure allows flexible working models η1*, η2* to construct consistent estimators and achieves local efficiency. This contrasts with existing methods in the literature, including that from Tsiatis and Ma (2004), which are highly sensitive to the variance misspecification of the model error. Moreover, in bypassing the estimation of η1, η2, our algorithm minimizes the unnecessary work in the process of estimating α and β.

3.4 Theoretical properties

We describe the theoretical properties of $\hat\theta_n = (\hat\alpha_n^{\rm T}, \hat\beta_n^{\rm T})^{\rm T}$ under working models η1*(x, z; γ1) and η2*(ε, x, z; γ2) where γ1, γ2 are finite-dimensional parameters. The parameters γ1, γ2 reflect the common practice of using parametric forms for the working models η1*, η2*. The true forms η10, η20 may or may not belong to these working model families.

Let $\gamma = (\gamma_1^{\rm T}, \gamma_2^{\rm T})^{\rm T}$ belong to a compact set 𝒢 and γ̂n be an estimator of γ. We assume γ̂n is root-n consistent in the proposed working models, so $n^{1/2}(\hat\gamma_n - \gamma^*)$ is bounded in probability for some constant γ*. We now demonstrate that under η1*(x, z; γ1) and η2*(ε, x, z; γ2), the estimator θ̂n is asymptotically normal, and its efficiency does not depend on how efficiently we estimate γ.

To establish these results, we further make the following assumptions:

  • (R4)

    The equation E{𝒮(Yi,Wi, Vi, Zi; θ, γ*)} = 0 has a unique solution. In addition, $E\{\sup_{\theta\in\Theta,\,\gamma^*\in\mathcal{G}} |\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma^*)|\} < \infty$ component wise, and the expectation of the squared ℓ2 norm of 𝒮, i.e. $E\{\|\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta_0,\gamma^*)\|^2\}$, is bounded. This condition is similar to condition (R3).

  • (R5)

    $n^{-1}\sum_{i=1}^{n} \partial\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma)/\partial\theta^{\rm T}$ converges in probability to $E\{\partial\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma)/\partial\theta^{\rm T}\}$ uniformly in (θ, γ) in a neighborhood of (θ0, γ*).

  • (R6)

    $n^{-1}\sum_{i=1}^{n} \partial\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma)/\partial\gamma^{\rm T}$ converges in probability to $E\{\partial\mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma)/\partial\gamma^{\rm T}\}$ uniformly in (θ, γ) in a neighborhood of (θ0, γ*).

    The last two conditions are very mild and are generally satisfied following the law of large numbers and equicontinuity conditions.

Our first theoretical result shows that θ̂n is asymptotically normal whether η1*(x, z; γ1) and η2*(ε, x, z; γ2) contain the true η10, η20 or not.

Theorem 2

Let f be an arbitrary p-dimensional function belonging to Λ in Theorem 1. Let η1*(x, z; γ1) and η2*(ε, x, z; γ2) be working parametric models for η1, η2. Let $\gamma = (\gamma_1^{\rm T}, \gamma_2^{\rm T})^{\rm T}$ and γ̂n be its estimate such that for some constant γ*, $n^{1/2}(\hat\gamma_n - \gamma^*)$ is bounded in probability. Finally, let $\hat\theta_n = (\hat\alpha_n^{\rm T}, \hat\beta_n^{\rm T})^{\rm T}$, and let $\theta_0 = (\alpha_0^{\rm T}, \beta_0^{\rm T})^{\rm T}$ denote the truth. Under regularity conditions (R1)–(R6), the root θ̂n of $\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\hat\gamma_n) = 0$ is consistent and

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to {\rm Normal}(0, \mathcal{V}^*)$$

in distribution as n → ∞. Here, $\mathcal{V}^* = \mathcal{A}^{*-1}\mathcal{B}^*(\mathcal{A}^{*-1})^{\rm T}$ with 𝒜* = E{∂𝒮(Y,W, V,Z; θ0, γ*)/∂θT } and ℬ* = diag[var{ϕ(V ; α0)}, var{f(Y,W,Z; θ0, γ*)}], a block diagonal matrix.

In Theorem 2, the arguments in $\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\hat\gamma_n) = 0$ differ from those in (8) in that we have replaced η1*, η2* with $\hat\gamma_n = (\hat\gamma_{n1}^{\rm T}, \hat\gamma_{n2}^{\rm T})^{\rm T}$ to emphasize our use of parametric working models for η1*, η2*. The results in Theorem 2 hold because we can express $\sqrt{n}(\hat\theta_n - \theta_0)$ as a normalized sum of zero-mean random vectors based on Taylor expansion and the properties of Λ. Consequently, by our regularity assumptions and the central limit theorem, this normalized sum will converge in distribution to a multivariate normal with zero mean and variance-covariance 𝒱*; see Section S.6 (Supplementary Material) for complete details. In addition, the result in Theorem 2 is useful for performing inference on θ where 𝒱* in practice is estimated by the sandwich estimator $\hat{\mathcal{A}}^{-1}\hat{\mathcal{B}}(\hat{\mathcal{A}}^{-1})^{\rm T}$. Here,

$$\hat{\mathcal{A}} = n^{-1}\sum_{i=1}^{n} \partial\mathcal{S}(Y_i,W_i,V_i,Z_i;\hat\theta_n,\hat\gamma_n)/\partial\theta^{\rm T}, \qquad \hat{\mathcal{B}} = n^{-1}\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\hat\theta_n,\hat\gamma_n)\,\mathcal{S}^{\rm T}(Y_i,W_i,V_i,Z_i;\hat\theta_n,\hat\gamma_n).$$
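The sandwich estimator can be computed from any routine that returns the per-observation estimating functions. In the sketch below, the interface `score_fn(theta, data)` returning an (n, q) array is a hypothetical convention, and the derivative in 𝒜̂ is approximated by central finite differences, which is a numerical shortcut assumed here rather than anything prescribed in the paper.

```python
import numpy as np


def sandwich_variance(score_fn, theta_hat, data, step=1e-5):
    """Sandwich estimate A^{-1} B (A^{-1})^T of the asymptotic variance of
    sqrt(n) (theta_hat - theta_0), plus the implied covariance of theta_hat.

    score_fn(theta, data) must return an (n, q) array whose i-th row is
    S(Y_i, W_i, V_i, Z_i; theta) with working-model parameters held fixed.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    s = score_fn(theta_hat, data)
    n, q = s.shape
    b_hat = s.T @ s / n                             # B-hat: average outer product
    a_hat = np.empty((q, q))
    for j in range(q):
        e = np.zeros(q)
        e[j] = step
        upper = score_fn(theta_hat + e, data).mean(axis=0)
        lower = score_fn(theta_hat - e, data).mean(axis=0)
        a_hat[:, j] = (upper - lower) / (2.0 * step)  # column j of dS/dtheta^T
    a_inv = np.linalg.inv(a_hat)
    v_hat = a_inv @ b_hat @ a_inv.T
    return v_hat, v_hat / n
```

The second returned matrix, 𝒱̂/n, is the one whose diagonal gives squared standard errors for θ̂n.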

Remark 3

The result in Theorem 2 applies to any function f ∈ Λ. In Section 3.1, we argued that a particular function in Λ is $S^*_{\rm eff} = \mathcal{K}_1(d^*)$. Thus, by Remark 1 and Theorem 2, when $\mathcal{S} = (\phi^{\rm T}, S^{*{\rm T}}_{\rm eff})^{\rm T}$, the resulting estimator θ̂n from our proposed algorithm is consistent and asymptotically normal.

Our second theoretical result demonstrates that the asymptotic efficiency of θ̂n does not depend on how efficiently we estimate γ in the working parametric models. Specifically, consider the case when θ̂n solves the estimating equation $\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\hat\gamma_n) = 0$, and θ̌n solves the estimating equation $\sum_{i=1}^{n} \mathcal{S}(Y_i,W_i,V_i,Z_i;\theta,\gamma^*) = 0$, where 𝒮 = (ϕT, fT )T and f belongs to Λ in Theorem 1. Our previous results from Theorem 2 guarantee that θ̂n and θ̌n are root-n consistent and asymptotically normal. A stronger result, shown below, is that θ̂n and θ̌n also have the same asymptotic efficiency even though the former involves the estimated γ̂n, and the latter only involves the constant γ*. Thus, as long as γ̂n is root-n consistent for γ*, using either γ̂n or γ* in the working parametric models yields the same efficiency for the resulting estimator.

Theorem 3

Let the p-dimensional function f belong to Λ in Theorem 1. Assume γ̂n is such that n1/2(γ̂nγ*) is bounded in probability. Then, under regularity conditions ( R1)–( R6), the efficiency of the estimator θ̂n obtained as the root of i=1nS(Yi,Wi,Vi,Zi;θ,γ^n)=0 is asymptotically equivalent to the efficiency of the estimator θ̌n obtained as the root of i=1nS(Yi,Wi,Vi,Zi;θ,γ)=0. Namely, both n1/2(θ̂nθ0) and n1/2(θ̌nθ0) are asymptotically normal with mean zero and variance-covariance 𝒱* as in Theorem 2.

The proof of Theorem 3 follows analogously to that of Theorem 2 in that $\sqrt{n}(\hat\theta_n-\theta_0)$ and $\sqrt{n}(\check\theta_n-\theta_0)$ can be expressed as the same normalized sum of zero-mean random vectors via Taylor expansion; see Section S.7 (Supplementary Material). Thus, because the first-order expansions of θ̂n and θ̌n are the same, it immediately follows from the regularity conditions and the central limit theorem that both estimators are asymptotically normal with identical variance-covariance 𝒱*.

The results in Theorems 2 and 3 hold whether or not η1(x,z;γ1) and η2(ε,x,z;γ2) contain the true distributions η10, η20. However, when the working parametric models do contain the true distributions, the resulting estimator θ̂n is actually semiparametric efficient as noted below.

Remark 4

A particularly interesting case is when f is the efficient score Seff as in our algorithm. Since Seff ∈ Λ, Theorem 3 tells us that if correct parametric models with parameters γ are used for η1(x, z), η2(ε, x, z), and root-n consistent estimators can be found for the parameters γ, then it is as if η1(x, z), η2(ε, x, z) were known precisely. In this case, we achieve optimal semiparametric efficiency. This is a stronger statement than Remark 2.

In practice, a correct parametric model is certainly not easy to obtain. It requires good knowledge of η1(x, z) and η2(ε, x, z), both of which are “invisible” due to the unobservable X’s. Thus, if reducing estimation variability is important, one can propose a relatively large model for η1 and η2, and proceed with the locally efficient estimator. With richer models of η1, η2, the chance of achieving efficiency is increased.

3.5 Implementation of the algorithm

Steps 1–3 in our algorithm are easily handled by following the guidance in Section 3.3 for selecting η1,η2. We thus focus on the details for executing Steps 4–6.

Step 4 requires solving for d*(Y,X,Z) from the ill-posed problem in (7). Although this ill-posed problem may at first appear challenging, we benefit from two aspects. First, solving for d* is a “good” ill-posed problem in the sense that the ill-posedness is only because more than one solution may satisfy (7). This is beneficial since our objective is to find any one of these solutions. Second, what we really need for estimation and inference is not d* itself, but a smoothed version of d*, namely 𝒦1(d*) = E(d*|Y,W,Z), which is unique and hence no longer an ill-posed problem. We now demonstrate how (7) can be solved analytically in some cases and numerically otherwise.

3.5.1 Analytic d*

For some mean models, d* may be computed analytically such as for the simple, linear RMM with two replicates:

$$Y_i=\beta_1+\beta_2X_i+\varepsilon_i,\qquad W_{ij}=X_i+U_{ij},\qquad E(\varepsilon_i\mid X_i)=0,$$

for i = 1, . . . , n, j = 1, 2. Here, Uij is normally distributed with mean zero and unknown variance $2\sigma_U^2$.

Following our algorithm, solve for $\hat\sigma_U^2$ from (3) and let Wi = (Wi1 + Wi2)/2. With η1 ≡ pX(x) and η2 ≡ pε|X(ε|x), we suppose that (Yi,Wi) are standardized so that it is reasonable to posit η1, η2 as standard normals. Then, under η1, η2, an analytic solution to (7) is $d^*=(d_1^*,d_2^*)^T$ with

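$$d_1^*(Y,X)=Y-\beta_1-\beta_2X\,(1+c_1^{-1}\hat\sigma_U^2),$$
$$d_2^*(Y,X)=c_2^{-1}\beta_2\hat\sigma_U^2\{c_1(1-\beta_1^2)+\hat\sigma_U^2+1-c_1(Y-2\beta_1)Y\}+c_2^{-1}(c_1+\hat\sigma_U^2)X\{(2c_1-1)(Y-\beta_1)-\beta_2(c_1+\hat\sigma_U^2)X\},$$

and $c_1=1+\beta_2^2\hat\sigma_U^2$, $c_2=c_1(1+2\hat\sigma_U^2)-\hat\sigma_U^2$.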
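A quick check of the first component, offered as a sketch rather than as part of the original derivation: under standard normal working models and a replicate average W = X + U with U distributed as Normal(0, $\hat\sigma_U^2$), the conditional distribution of X given (Y, W) is normal, and the conditional mean of $d_1^*(Y,X)=Y-\beta_1-\beta_2X(1+c_1^{-1}\hat\sigma_U^2)$ simplifies because $c_1-\beta_2^2\hat\sigma_U^2=1$.

```latex
\begin{aligned}
E^*(X \mid Y, W) &= \frac{\hat\sigma_U^2\,\beta_2\,(Y-\beta_1) + W}{c_1 + \hat\sigma_U^2}, \\
\mathcal{K}_1(d_1^*) &= Y - \beta_1 - \beta_2\,\frac{c_1+\hat\sigma_U^2}{c_1}\,E^*(X \mid Y, W) \\
 &= \frac{(c_1 - \beta_2^2\hat\sigma_U^2)(Y-\beta_1) - \beta_2 W}{c_1}
  = c_1^{-1}\,(Y - \beta_1 - \beta_2 W).
\end{aligned}
```

A similar but longer calculation for the second component gives $\mathcal{K}_1(d_2^*)=c_2^{-1}\{(Y-\beta_1)W-\beta_2(W^2-\hat\sigma_U^2)\}$, so both components depend on the data only through Y and W.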

Using the analytic d* to form the score vector $S_{\mathrm{eff}}(Y,W;\beta,\hat\sigma_U^2,\eta_1,\eta_2)=\mathcal{K}_1(d^*)$ then yields that β̂n solves

$$0=C\,n^{-1}\sum_{i=1}^{n}\left\{\begin{pmatrix}1 & W_i\\ W_i & W_i^2-\hat\sigma_U^2\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}-\begin{pmatrix}Y_i\\ Y_iW_i\end{pmatrix}\right\},$$

where $C=\mathrm{diag}(c_1^{-1},-c_2^{-1})$. Because C is non-singular, the above estimating equation is exactly the same explicit form previously given in Hall and Ma (2007). In other words, the estimator in Hall and Ma (2007) is a special case of our solution family corresponding to a natural choice of standard normals for the working models η1, η2.
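Since C is non-singular, the estimating equation for the simple linear model reduces to a 2×2 linear system in (β1, β2). The sketch below solves it on simulated data. The function name `corrected_linear_fit` and the simulated design are hypothetical, and the measurement error variance is estimated from replicate differences as an assumed stand-in for equation (3), which is not reproduced in this section.

```python
import random


def corrected_linear_fit(y, w1, w2):
    """Solve sum_i {M_i beta - (Y_i, Y_i W_i)^T} = 0,
    where M_i = [[1, W_i], [W_i, W_i^2 - s2]] and W_i is the replicate average.
    """
    n = len(y)
    w = [(a + b) / 2.0 for a, b in zip(w1, w2)]
    # Var{(W_i1 - W_i2)/2} = sigma_U^2 when Var(U_ij) = 2 sigma_U^2 (assumed estimator).
    s2 = sum(((a - b) / 2.0) ** 2 for a, b in zip(w1, w2)) / n
    m_w = sum(w) / n
    m_ww = sum(v * v for v in w) / n - s2  # bias-corrected second moment
    m_y = sum(y) / n
    m_yw = sum(a * b for a, b in zip(y, w)) / n
    beta2 = (m_yw - m_y * m_w) / (m_ww - m_w * m_w)
    beta1 = m_y - beta2 * m_w
    return beta1, beta2, s2


# Simulated example (assumption): heteroskedastic, non-normal model error.
rng = random.Random(1)
n, b1, b2, s2_true = 50000, 1.0, 2.0, 0.25
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [b1 + b2 * xi + (abs(xi) + 1.0) * rng.uniform(-1.0, 1.0) for xi in x]
sd_u = (2.0 * s2_true) ** 0.5
w1 = [xi + rng.gauss(0.0, sd_u) for xi in x]
w2 = [xi + rng.gauss(0.0, sd_u) for xi in x]

beta1_hat, beta2_hat, s2_hat = corrected_linear_fit(y, w1, w2)

# Naive least-squares slope that treats the average W as if it were X.
w = [(a + b) / 2.0 for a, b in zip(w1, w2)]
mw, my = sum(w) / n, sum(y) / n
naive_slope = (sum((a - mw) * (b - my) for a, b in zip(w, y))
               / sum((a - mw) ** 2 for a in w))
print(beta1_hat, beta2_hat, s2_hat, naive_slope)
```

In this design the naive slope is attenuated toward zero by roughly the factor Var(X)/{Var(X) + σ_U²}, whereas the corrected fit removes that attenuation without modeling the error distribution.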

3.5.2 Numerical d*

For general mean models, d* is computed numerically. The implementation below is provided in software available on the first author’s website. We propose solving for d* by approximating it with a linear combination of basis functions. For ease of presentation, we demonstrate the procedure when X and Z are univariate; however, the method extends to the multivariate case.

In our approach, we approximate d* by

$$d^*(Y,X,Z)=\sum_{j,k=1}^{q}c_{jk}(Z)\,g_k(Y)\,h_j(X),$$

where cjk(Z), j, k = 1, . . . , q, are p-dimensional vectors of unknown coefficients, and gk(·), hj(·) are sets of real-valued basis functions (e.g., Hermite polynomials, Chebyshev polynomials, Fourier series, B-splines, Legendre polynomials). The number of basis functions q is chosen to give an accurate approximation while permitting fast computation; the appropriate value depends on the true d*(Y,X,Z) function and on the type of basis functions. Empirically, we suggest starting from q = 4 and increasing it until the result stabilizes.
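As a minimal sketch of the tensor-product expansion, the code below evaluates one scalar component of the approximation at a fixed Z using physicists' Hermite polynomials for both gk and hj. The function names and the zero-based indexing (so that g1 = h1 = H0) are illustrative assumptions, not the authors' software.

```python
def hermite(k, x):
    """Physicists' Hermite polynomial H_k(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, 2.0 * x * h - 2.0 * j * h_prev
    return h


def d_approx(y, x, coef):
    """Evaluate sum_{j,k} coef[j][k] * g_k(y) * h_j(x) with Hermite bases.

    coef is a q x q list holding one component of c_jk(Z) at a fixed Z;
    index 0 corresponds to the degree-0 basis function.
    """
    q = len(coef)
    return sum(coef[j][k] * hermite(k, y) * hermite(j, x)
               for j in range(q) for k in range(q))


# With coef = identity and q = 2, the expansion is 1 + H_1(y) H_1(x) = 1 + 4xy.
print(d_approx(0.5, 2.0, [[1.0, 0.0], [0.0, 1.0]]))
```

Increasing q only enlarges the coefficient array, which is what makes the "start small and increase until stable" strategy straightforward to apply.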

With d* as above, the goal then is to form (7) and solve for the coefficients cjk(Z), j, k = 1, . . . , q. To this end, (7) becomes

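$$\begin{aligned}
&\sum_{j,k=1}^{q}c_{jk}(Z)\,\varepsilon\,h_j(X)\,E^*\{g_k(Y)\varepsilon\mid X,Z\}
+\sum_{j,k=1}^{q}c_{jk}(Z)\,g_k(Y)\,\mathcal{K}_2\mathcal{K}_1\{h_j(X)\}\,E^*(\varepsilon^2\mid X,Z)\\
&\qquad-\sum_{j,k=1}^{q}c_{jk}(Z)\,\varepsilon\,E^*[g_k(Y)\,\mathcal{K}_2\mathcal{K}_1\{h_j(X)\}\,\varepsilon\mid X,Z]
=m_\beta(X,Z;\beta)\,\varepsilon. 
\end{aligned}\tag{9}$$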

Under the working models η1 and η2, we evaluate the expectations in (9) using discretization and quadrature integration (e.g., Hermite quadrature). Specifically, we discretize η1(x,z) at r points x1, . . . , xr across the support of X with weights given by $\eta_1(x,z)=\sum_{s=1}^{r}p_s(z)I(x=x_s)$ such that $\sum_{s=1}^{r}p_s(z)=1$ for all z in the support of Z. Under this discretization, the terms in (9) are computed using the formulas

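$$\mathcal{K}_1\{f_1(Y,X,Z)\}=\frac{\sum_{s=1}^{r}f_1(Y,x_s,Z)\,p_{W|X,Z}(W\mid x_s,Z;\hat\alpha_n)\,\eta_2\{Y-m(x_s,Z;\beta),x_s,Z\}\,p_s(Z)}{\sum_{s=1}^{r}p_{W|X,Z}(W\mid x_s,Z;\hat\alpha_n)\,\eta_2\{Y-m(x_s,Z;\beta),x_s,Z\}\,p_s(Z)},$$
$$\mathcal{K}_2\{f_2(Y,W,Z)\}=\int f_2(Y,w,Z)\,p(w\mid X,Z;\hat\alpha_n)\,d\mu(w),$$
$$E^*\{f_1(Y,X,Z)\mid X,Z\}=\int f_1(y,X,Z)\,\eta_2\{y-m(X,Z;\beta),X,Z\}\,d\mu(y),$$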

for appropriate functions f1(Y,X,Z), f2(Y,W,Z). Finally, the integrals in 𝒦2 and E*(·|X,Z) are evaluated using quadrature integration (Kress, 1999, chap. 12). It is important to note that our way of discretizing η1(x,z) simplifies the computation of 𝒦1 into a simple summation of functions evaluated at x1, . . . , xr. Doing so avoids the complex task of estimating the unknown distribution pX|Y,W,Z(x|y,w, z). The number of discretization points r controls the integral approximation accuracy. Empirically, we suggest starting with r = 20 and increasing it until the results stabilize.
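The discretized 𝒦1 is a ratio of two weighted sums over the grid x1, . . . , xr. The sketch below implements that ratio and checks it against the closed-form conditional mean available in the linear model with standard normal working models. The function names, the equally spaced grid on [−4, 4], and the parameter values are illustrative assumptions.

```python
import math


def normal_pdf(v, mean, var):
    """Density of Normal(mean, var) evaluated at v."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def k1(f1, y, w, z, grid, weights, p_w_given_x, eta2, m):
    """Discretized K1{f1}(y, w, z): weighted average of f1 over grid points x_s."""
    num = den = 0.0
    for x_s, p_s in zip(grid, weights):
        wt = p_w_given_x(w, x_s, z) * eta2(y - m(x_s, z), x_s, z) * p_s
        num += f1(y, x_s, z) * wt
        den += wt
    return num / den


# Linear mean model with standard normal working models (assumed values).
beta1, beta2, s2 = 1.0, 2.0, 0.25
r = 20
grid = [-4.0 + 8.0 * s / (r - 1) for s in range(r)]
raw = [normal_pdf(x, 0.0, 1.0) for x in grid]
weights = [v / sum(raw) for v in raw]  # p_s sums to one


def p_w(w, x, z):
    return normal_pdf(w, x, s2)          # averaged replicate: W | X ~ Normal(X, s2)


def eta2(e, x, z):
    return normal_pdf(e, 0.0, 1.0)       # working model for the model error


def mean_fn(x, z):
    return beta1 + beta2 * x


y_obs, w_obs = 1.5, 0.4
post_mean = k1(lambda y, x, z: x, y_obs, w_obs, 0, grid, weights, p_w, eta2, mean_fn)
const_one = k1(lambda y, x, z: 1.0, y_obs, w_obs, 0, grid, weights, p_w, eta2, mean_fn)

c1 = 1.0 + beta2 ** 2 * s2
closed_form = (s2 * beta2 * (y_obs - beta1) + w_obs) / (c1 + s2)
print(post_mean, closed_form)
```

No estimate of the conditional distribution of X given (Y, W, Z) is needed; only the working models and the measurement error density enter the weights.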

In the last step of solving for the coefficients of d*, each term in (9) is evaluated at $q^2$ grid points $(y_m,x_\ell,Z)$ for m, ℓ = 1, . . . , q, typically chosen as quadrature points. Doing so leads to p linear systems of size $q^2\times q^2$, from which we may evaluate cjk(Z) at each observed Z. After obtaining the coefficients, we then verify that (7) is really solved by plugging in the coefficients to d*. The verification needs to be done only at the grid points $(y_m,x_\ell,Z)$ because d* was only solved for at these grid points. By having to verify d* only at these grid points rather than at all (Y,X,Z), we essentially bypass the functional nature of solving for d*, which means solving for d* is actually simpler than it appears.

After the coefficients of d* are verified, we then do Step 5 and form

$$S_{\mathrm{eff}}(Y,W,Z;\beta,\hat\alpha_n,\eta_1,\eta_2)=\mathcal{K}_1(d^*)=\sum_{j,k=1}^{q}c_{jk}(Z)\,g_k(Y)\,\mathcal{K}_1\{h_j(X)\}$$

to construct $\sum_{i=1}^{n}S_{\mathrm{eff}}(Y_i,W_i,Z_i;\beta,\hat\alpha_n,\eta_1,\eta_2)=0$. In Step 6, the estimator for β is then the root of the constructed estimating equation.

One possible concern about our proposed implementation is that the different numerical approximations may ultimately affect the efficiency of the proposed estimator. However, this is not the case. If d* is constructed so that (7) is indeed satisfied, then $S_{\mathrm{eff}}=\mathcal{K}_1(d^*)$ belongs to Λ as stated in Section 3.1. Elements in Λ lead to consistent estimators for β (see Theorem 1 and Remark 1) with efficiency affected only by the choice of η1, η2, not the approximation of d* (see Remark 2 and the ensuing discussion). Therefore, a critical step is ensuring that the obtained d* does indeed satisfy (7), which is exactly what we do. In this sense, solving for d* is purely a computational issue since no data are involved in the solution process. To ensure that (7) is properly solved, one may need to choose a rich class of basis functions, for example, combinations of polynomial bases, B-splines, or Fourier series. Methods for solving equations such as (7) are well studied in numerical analysis; see Kress (1999) and references therein.

4 Empirical Studies

We now demonstrate the performance of our method and compare its results to five competing methods.

4.1 Simulation design

We consider the RMM with measurement error

$$Y_i=\beta_2\exp(-\beta_1X_i^2)+\beta_3Z_i+\varepsilon_i,\qquad W_{ij}=X_i+U_{ij},\qquad U_{ij}\sim\mathrm{Normal}(0,2\sigma_U^2),$$

for i = 1, . . . , n and j = 1, 2. The true model parameters are $(\sigma_U^2,\beta_1,\beta_2,\beta_3)^T=(0.05,0.25,0.7,0.5)^T$. Results for other mean models are reported in the Supplementary Material (Section S.9).

The true distribution η10 of X is uniform on [1.1 − 0.9, 1.1 + 0.9], and the true distribution η30 of Z is Bernoulli with parameter 0.5. To evaluate the robustness of our method, we set the model error distribution η20 to be either a uniform or t-distribution with 5 degrees of freedom (i.e., t5 distribution), and its variance either homoskedastic or heteroskedastic. Specifically, we consider

  • Setting 1 : Uniform distribution.

    • Homoskedastic: η20 is uniform on [−1, 1];

    • Heteroskedastic: η20 is (|X| + 1)𝒰 where 𝒰 is a uniform distribution on [−1, 1].

  • Setting 2 : t5-distribution.

    • Homoskedastic: η20 is 0.4t5;

    • Heteroskedastic: η20 is (0.4|X| + 0.5)t5.
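The design above can be sketched as a small data generator. The function name `simulate`, its signature, and the tuple layout are hypothetical; the t5 draws are built from a standard normal divided by the square root of a scaled chi-square variate so that only the standard library is needed.

```python
import math
import random


def simulate(n, setting, heteroskedastic, rng,
             sigma2_u=0.05, beta=(0.25, 0.7, 0.5)):
    """Draw n tuples (Y, W1, W2, Z, X) from the Section 4.1 design.

    setting: 1 for uniform model errors, 2 for t5 model errors.
    """
    b1, b2, b3 = beta
    sd_u = math.sqrt(2.0 * sigma2_u)  # Var(U_ij) = 2 * sigma_U^2
    rows = []
    for _ in range(n):
        x = rng.uniform(1.1 - 0.9, 1.1 + 0.9)
        z = 1 if rng.random() < 0.5 else 0
        if setting == 1:
            base = rng.uniform(-1.0, 1.0)
            scale = abs(x) + 1.0 if heteroskedastic else 1.0
        else:
            # t5 draw: standard normal over sqrt(chi-square_5 / 5)
            chi2 = rng.gammavariate(2.5, 2.0)
            base = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 5.0)
            scale = 0.4 * abs(x) + 0.5 if heteroskedastic else 0.4
        y = b2 * math.exp(-b1 * x * x) + b3 * z + scale * base
        w1 = x + rng.gauss(0.0, sd_u)
        w2 = x + rng.gauss(0.0, sd_u)
        rows.append((y, w1, w2, z, x))
    return rows


data = simulate(500, setting=1, heteroskedastic=False, rng=random.Random(0))
print(data[0])
```

The true X is returned only so that the generator can be checked; an estimator would be given Y, the two replicates, and Z.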

4.2 Methods evaluated

For all settings, we generated 1000 data sets with sample size n = 500. Parameters $\sigma_U^2$, β1, β2, β3 were estimated using six different methods.

4.2.1 Proposed method

We used our proposed method where we set working models η1,η2 different from the true η10, η20 both in terms of distributional form and variance structure. The differences are intended to demonstrate the robustness of our method when η1,η2 differ from the truth.

In Settings 1 and 2, we let the working model η1 be Normal(1.1, 0.9/3.5²). In Setting 1, the working model η2 was Normal(0, 0.9²), and in Setting 2, η2 was Normal(0, 1.7²). While the working models have supports as large as the true distributions, the proposed η2 in no way accounts for the possible heteroskedasticity in η20. Our approach was implemented following the procedure in Section 3.5 where d* was computed numerically with q = 7 Hermite bases and r = 20 discretization points; all integrals were computed using Hermite quadrature.

4.2.2 Homoskedastic and heteroskedastic sieve estimator

The second and third methods are sieve MLEs that assume either homoskedastic or heteroskedastic model error. Specifically, the sieve MLE is the solution to

$$\arg\max_{\beta}\sup_{(f_1,f_2)}\frac{1}{n}\sum_{i=1}^{n}\ln\int f_1\{y_i-m(x,z_i;\beta)\mid x,z_i\}\,p_U(w_i-x;\hat\alpha_n)\,f_2(x)\,d\mu(x),\tag{10}$$

where f1, f2 are truncated series used to estimate the unknown distributions of pε|X,Z(ε|x, z) and pX|Z(x|z), respectively. For our simulations, we have that pX|Z(x|z) = pX(x) since X and Z are generated independently of each other; thus, f2 is set to represent pX(x). Lastly, pU corresponds to the normal distribution for Wi = (Wi1 +Wi2)/2 and α̂n is the vectorized solution to (3).

We consider two different sieves for f1. The first is a homoskedastic sieve where f1 will estimate pε(ε) and thus ignore any dependence between ε and (X,Z). The second is a heteroskedastic sieve where f1 will estimate pε|X,Z(ε|x, z) and thus account for any dependence between ε and (X,Z).

For the homoskedastic sieve, we use the work of Schennach and Hu (2013), and use

$$f_1(\varepsilon)=\sum_{j=0}^{\kappa_\varepsilon}\xi_j^{\varepsilon}\,t_j(\varepsilon),$$

where κε is a smoothing parameter, $t_j(x)=\left(\sqrt{\sqrt{\pi}\,j!\,2^j}\right)^{-1}H_j(x)\exp(-x^2/2)$, and Hj are Hermite polynomials. To ensure that f1(ε) is a valid density and that E(ε) = 0, we require that $\sum_{j=0}^{\kappa_\varepsilon}(\xi_j^{\varepsilon})^2=1$ and $\sum_{j=1}^{\kappa_\varepsilon-1}\sqrt{2(j+1)}\,\xi_j^{\varepsilon}\xi_{j+1}^{\varepsilon}=0$. We expect this homoskedastic f1 to perform well when ε is in fact homoskedastic, but we expect bias when ε is heteroskedastic.

For the heteroskedastic sieve, we extended the work of Hu and Schennach (2008), and use

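$$\begin{aligned}
f_1(\varepsilon\mid x,z)={}&\Big[a_{00}+a_{01}\cos\Big\{\frac{\pi}{\ell_x}m(x,z;\beta)\Big\}+a_{02}\cos\Big\{\frac{2\pi}{\ell_x}m(x,z;\beta)\Big\}\Big]\\
&+\sum_{k=1}^{3}\Big[a_{k0}+a_{k1}\cos\Big\{\frac{\pi}{\ell_x}m(x,z;\beta)\Big\}+a_{k2}\cos\Big\{\frac{2\pi}{\ell_x}m(x,z;\beta)\Big\}\Big]\cos\Big(\frac{k\pi}{\ell_e}\varepsilon\Big)\\
&+\sum_{k=1}^{3}\Big[b_{k0}+b_{k1}\cos\Big\{\frac{\pi}{\ell_x}m(x,z;\beta)\Big\}+b_{k2}\cos\Big\{\frac{2\pi}{\ell_x}m(x,z;\beta)\Big\}\Big]\sin\Big(\frac{k\pi}{\ell_e}\varepsilon\Big).
\end{aligned}$$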

By construction m(x, z; β) ∈ [0, ℓx] and we simulated data such that ε ∈ [−ℓe, ℓe] for an appropriate choice of ℓe, so as to align with the assumptions of Hu and Schennach (2008). Finally, to ensure that f1(ε|x, z) is a valid density and that E(ε|X,Z) = 0, we impose twelve constraints given in Section S.8 (Supplementary Material). It is important to note that our heteroskedastic sieve above differs from that in Hu and Schennach (2008) in two ways. First, we use a sieve to estimate pε|X,Z rather than pU|X,Z as in Hu and Schennach (2008), who considered heteroskedastic measurement error, not heteroskedastic model error. Second, we further require that f1 is always non-negative, while Hu and Schennach (2008) did not impose that in their numerical studies. In terms of performance, we expect that the heteroskedastic f1 will perform well whether ε is homoskedastic or heteroskedastic, since homoskedasticity is a special case of heteroskedasticity (i.e., σε(x) ≡ σε).

Lastly, regardless of the form for f1, we let

$$f_2(x)=\sum_{j=0}^{\kappa_x}\xi_j^{x}\,t_j(x),$$

where κx is a smoothing parameter, and tj(x) is the Hermite representation as defined for the homoskedastic f1. To ensure that f2 is a valid density we require that $\sum_{j=0}^{\kappa_x}(\xi_j^{x})^2=1$.

The homoskedastic and heteroskedastic sieve MLEs are then the solutions to the optimization problem in (10) subject to all constraints stated above: three for the homoskedastic sieve MLE and thirteen for the heteroskedastic sieve MLE. The integral in (10) is evaluated using Hermite quadrature. We set the smoothing parameters κε = 6 and κx = 6 as in Schennach and Hu (2013), but other values were considered and yielded similar results (not reported).

4.2.3 Homoskedastic and heteroskedastic Tsiatis-Ma estimator

The fourth and fifth methods are based on the work of Tsiatis and Ma (2004). The Tsiatis-Ma (TM) estimator also uses a working model η1, but requires η2 to yield a correctly specified variance structure. To demonstrate this sensitivity, we applied the TM estimator assuming homoskedastic model errors (TM-Homoskedastic) and assuming heteroskedastic model errors (TM-Heteroskedastic).

For both TM-Homoskedastic and TM-Heteroskedastic estimators, we set the working model η1 as Normal(1.1, 0.9/3.5²). For the TM-Homoskedastic estimator, we let η2 be Normal(0, 1/3) in Setting 1 and Normal(0, 4/15) in Setting 2. The variances for η2 correspond to the true variances of η20 when η20 is homoskedastic. For the TM-Heteroskedastic estimator, we let η2 be Normal{0, (|x|+1)²/3} in Setting 1 and Normal{0, 5(0.4|x|+0.5)²/3} in Setting 2. The variances for η2 correspond to the true variances of η20 when η20 is heteroskedastic.
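The variance values used for the TM working models follow from two standard facts: a uniform distribution on [a, b] has variance (b − a)²/12, and a t distribution with ν > 2 degrees of freedom has variance ν/(ν − 2). As a short worked check:

```latex
\begin{aligned}
\mathrm{Var}\{\mathcal{U}(-1,1)\} &= \frac{\{1-(-1)\}^2}{12} = \frac{1}{3}, &
\mathrm{Var}\{(|x|+1)\,\mathcal{U}\mid x\} &= \frac{(|x|+1)^2}{3}, \\
\mathrm{Var}(0.4\,t_5) &= 0.4^2\cdot\frac{5}{5-2} = \frac{4}{15}, &
\mathrm{Var}\{(0.4|x|+0.5)\,t_5\mid x\} &= \frac{5\,(0.4|x|+0.5)^2}{3}.
\end{aligned}
```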

4.2.4 Naive estimator

The last method is the naive least squares estimator which is the solution to

$$\arg\min_{\beta}\sum_{i=1}^{n}\{y_i-m(w_i,z_i;\beta)\}^2.$$

The naive estimator ignores measurement error and falsely assumes Xi and Wi = (Wi1 + Wi2)/2 are the same.

4.3 Simulation results

4.3.1 Performance of methods compared

Results in Tables 1 and 2 show the bias, estimated variance, and estimated 95% coverage probabilities for the model parameter estimates based on all six methods. Overall, all estimators consistently estimated the measurement error variance $\sigma_U^2$ and β3 associated with the non-mismeasured covariate Z. Performances differed, however, for parameters β1, β2 which were affected by the mismeasured covariate X.

Table 1.

Bias, empirical sample variances (var), averaged estimated variances ($\widehat{\mathrm{var}}$), and estimated 95% coverage probabilities (CI) for $(\hat\sigma_U^2,\hat\beta^T)^T$ based on our proposed method (Semipar), homoskedastic sieve MLE (Sieve-Hom), heteroskedastic sieve MLE (Sieve-Het), Tsiatis-Ma homoskedastic estimator (TM-Hom), Tsiatis-Ma heteroskedastic estimator (TM-Het), and the naive estimator. Results based on 1000 simulations when m(X,Z; β) = β2 exp(−β1X²) + β3Z, and true parameter values $(\sigma_{U,0}^2,\beta_0^T)^T=(0.05,0.25,0.7,0.5)^T$.

η20: Homoskedastic. Columns marked S1 refer to Setting 1 (η20 ~ Uniform); columns marked S2 refer to Setting 2 (η20 ~ t5).

| Method | Statistic | S1 β̂1 | S1 β̂2 | S1 β̂3 | S1 $\hat\sigma_U^2$ | S2 β̂1 | S2 β̂2 | S2 β̂3 | S2 $\hat\sigma_U^2$ |
|---|---|---|---|---|---|---|---|---|---|
| Semipar | bias | −0.0065 | −0.0080 | 0.0011 | 5.1664×10⁻⁵ | −0.0056 | −0.0059 | 0.0008 | −9.2372×10⁻⁵ |
| Semipar | var | 0.0030 | 0.0031 | 0.0026 | 1.0255×10⁻⁵ | 0.0024 | 0.0025 | 0.0022 | 1.0823×10⁻⁵ |
| Semipar | $\widehat{\mathrm{var}}$ | 0.0030 | 0.0030 | 0.0027 | 1.0255×10⁻⁵ | 0.0024 | 0.0024 | 0.0021 | 9.9839×10⁻⁶ |
| Semipar | CI | 0.9500 | 0.9390 | 0.9520 | 0.9490 | 0.9440 | 0.9370 | 0.9520 | 0.9320 |
| Sieve-Hom* | bias | 0.0066 | 0.0008 | 0.0021 | 5.1664×10⁻⁵ | 0.0371 | 0.0235 | 0.0056 | −9.2372×10⁻⁵ |
| Sieve-Hom* | var | 0.0033 | 0.0030 | 0.0022 | 1.0255×10⁻⁵ | 0.0046 | 0.0035 | 0.0023 | 1.0823×10⁻⁵ |
| Sieve-Hom* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Hom* | CI | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Het* | bias | 0.5022 | 0.8177 | 0.6900 | 9.6823×10⁻⁶ | 0.7261 | −0.1768 | 0.3083 | −5.0634×10⁻⁵ |
| Sieve-Het* | var | 0.0450 | 0.0458 | 0.0795 | 1.0916×10⁻⁵ | 0.0521 | 0.0581 | 0.0497 | 1.0109×10⁻⁵ |
| Sieve-Het* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Het* | CI | NA | NA | NA | NA | NA | NA | NA | NA |
| TM-Hom | bias | 0.0019 | 0.0019 | −0.0000 | −0.0002 | 0.0012 | 0.0004 | −0.0013 | −7.9103×10⁻⁵ |
| TM-Hom | var | 0.0035 | 0.0033 | 0.0027 | 1.0396×10⁻⁵ | 0.0028 | 0.0026 | 0.0021 | 9.6008×10⁻⁶ |
| TM-Hom | $\widehat{\mathrm{var}}$ | 0.0035 | 0.0032 | 0.0027 | 9.8539×10⁻⁶ | 0.0028 | 0.0026 | 0.0022 | 9.941×10⁻⁶ |
| TM-Hom | CI | 0.9460 | 0.9440 | 0.9470 | 0.9490 | 0.9450 | 0.9540 | 0.9610 | 0.9470 |
| TM-Het | bias | −0.0144 | −0.0185 | 0.0001 | −0.0002 | −0.0203 | −0.0234 | −0.0013 | −7.9103×10⁻⁵ |
| TM-Het | var | 0.0038 | 0.0034 | 0.0032 | 1.0396×10⁻⁵ | 0.0026 | 0.0024 | 0.0023 | 9.6008×10⁻⁶ |
| TM-Het | $\widehat{\mathrm{var}}$ | 0.0037 | 0.0032 | 0.0031 | 9.8539×10⁻⁶ | 0.0026 | 0.0023 | 0.0023 | 9.941×10⁻⁶ |
| TM-Het | CI | 0.9210 | 0.9300 | 0.9480 | 0.9490 | 0.9080 | 0.9160 | 0.9530 | 0.9470 |
| Naive | bias | −0.0269 | −0.0230 | 0.0027 | 5.1664×10⁻⁵ | −0.0255 | −0.0206 | 0.0023 | −9.2372×10⁻⁵ |
| Naive | var | 0.0029 | 0.0030 | 0.0026 | 1.0255×10⁻⁵ | 0.0023 | 0.0024 | 0.0022 | 1.0823×10⁻⁵ |
| Naive | $\widehat{\mathrm{var}}$ | 0.0030 | 0.0028 | 0.0027 | 1.0255×10⁻⁵ | 0.0024 | 0.0023 | 0.0022 | 9.9839×10⁻⁶ |
| Naive | CI | 0.8930 | 0.9130 | 0.9570 | 0.9490 | 0.8830 | 0.9190 | 0.9530 | 0.9320 |

* Estimated variances not available. The homoskedastic sieve MLE uses smoothing parameters κε = κx = 6, except for the uniform heteroskedastic setting which uses κε = 5, κx = 6. For the uniform heteroskedastic setting, the constrained optimization could not be solved for larger κε values.

Table 2.

Bias, empirical sample variances (var), averaged estimated variances ($\widehat{\mathrm{var}}$), and estimated 95% coverage probabilities (CI) for $(\hat\sigma_U^2,\hat\beta^T)^T$ based on our proposed method (Semipar), homoskedastic sieve MLE (Sieve-Hom), heteroskedastic sieve MLE (Sieve-Het), Tsiatis-Ma homoskedastic estimator (TM-Hom), Tsiatis-Ma heteroskedastic estimator (TM-Het), and the naive estimator. Results based on 1000 simulations when m(X,Z; β) = β2 exp(−β1X²) + β3Z, and true parameter values $(\sigma_{U,0}^2,\beta_0^T)^T=(0.05,0.25,0.7,0.5)^T$.

η20: Heteroskedastic. Columns marked S1 refer to Setting 1 (η20 ~ Uniform); columns marked S2 refer to Setting 2 (η20 ~ t5).

| Method | Statistic | S1 β̂1 | S1 β̂2 | S1 β̂3 | S1 $\hat\sigma_U^2$ | S2 β̂1 | S2 β̂2 | S2 β̂3 | S2 $\hat\sigma_U^2$ |
|---|---|---|---|---|---|---|---|---|---|
| Semipar | bias | 0.0098 | −0.0055 | 0.0013 | 2.9761×10⁻⁵ | 0.0122 | −0.0009 | 0.0008 | −0.0001 |
| Semipar | var | 0.0188 | 0.0101 | 0.0119 | 1.0161×10⁻⁵ | 0.0160 | 0.0103 | 0.0124 | 1.0843×10⁻⁵ |
| Semipar | $\widehat{\mathrm{var}}$ | 0.0206 | 0.0100 | 0.0125 | 1.0022×10⁻⁵ | 0.0195 | 0.0102 | 0.0124 | 9.9708×10⁻⁵ |
| Semipar | CI | 0.9610 | 0.9480 | 0.9550 | 0.9510 | 0.9510 | 0.9570 | 0.9520 | 0.9320 |
| Sieve-Hom* | bias | 0.1683 | 0.1008 | −0.0757 | −2.4913×10⁻⁵ | 0.2437 | 0.1595 | −0.0139 | −0.0001 |
| Sieve-Hom* | var | 0.1686 | 0.2500 | 0.0518 | 9.9631×10⁻⁶ | 0.0601 | 0.0410 | 0.0247 | 1.0492×10⁻⁵ |
| Sieve-Hom* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Hom* | CI | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Het* | bias | 0.7334 | 0.6731 | 0.6527 | 9.6823×10⁻⁶ | 0.7868 | −0.2997 | 0.5669 | −5.0634×10⁻⁵ |
| Sieve-Het* | var | 0.0423 | 0.0385 | 0.0589 | 1.0916×10⁻⁵ | 0.0443 | 0.0844 | 0.0256 | 1.0109×10⁻⁵ |
| Sieve-Het* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA | NA | NA | NA | NA |
| Sieve-Het* | CI | NA | NA | NA | NA | NA | NA | NA | NA |
| TM-Hom | bias | 0.2931 | 0.2677 | −0.0227 | −0.0002 | 0.5032 | 0.5174 | −0.0349 | −0.0005 |
| TM-Hom | var | 0.2640 | 0.1392 | 0.0123 | 1.037×10⁻⁵ | 1.3260 | 1.6331 | 0.0119 | 8.9453×10⁻⁶ |
| TM-Hom | $\widehat{\mathrm{var}}$ | 0.1065 | 0.0692 | 0.0131 | 9.8513×10⁻⁶ | 0.3768 | 0.2236 | 0.0135 | 9.7925×10⁻⁶ |
| TM-Hom | CI | 0.8110 | 0.6800 | 0.9580 | 0.9480 | 0.8950 | 0.7890 | 0.9600 | 0.9550 |
| TM-Het | bias | 0.0225 | 0.0100 | 0.0005 | −0.0002 | 0.0134 | 0.0011 | −0.0027 | −9.3324×10⁻⁵ |
| TM-Het | var | 0.0218 | 0.0096 | 0.0103 | 1.0409×10⁻⁵ | 0.0195 | 0.0088 | 0.0096 | 9.5564×10⁻⁶ |
| TM-Het | $\widehat{\mathrm{var}}$ | 0.0258 | 0.0098 | 0.0104 | 9.8583×10⁻⁶ | 0.0215 | 0.0089 | 0.0099 | 9.9345×10⁻⁶ |
| TM-Het | CI | 0.9500 | 0.9500 | 0.9540 | 0.9490 | 0.9520 | 0.9500 | 0.9590 | 0.9470 |
| Naive | bias | 0.0692 | −0.0185 | 0.0033 | 2.9761×10⁻⁵ | 0.0149 | −0.0123 | 0.0028 | −0.0001 |
| Naive | var | 3.0261 | 0.0102 | 0.0125 | 1.0161×10⁻⁵ | 0.3106 | 0.0100 | 0.0127 | 1.0843×10⁻⁵ |
| Naive | $\widehat{\mathrm{var}}$ | 1.5445 | 0.0102 | 0.0177 | 1.0022×10⁻⁵ | 0.9245 | 0.0387 | 0.0792 | 9.9708×10⁻⁵ |
| Naive | CI | 0.9390 | 0.9480 | 0.9550 | 0.9510 | 0.9310 | 0.9550 | 0.9550 | 0.9320 |

* Estimated variances not available. The homoskedastic sieve MLE uses smoothing parameters κε = κx = 6, except for the uniform heteroskedastic setting which uses κε = 5, κx = 6. For the uniform heteroskedastic setting, the constrained optimization could not be solved for larger κε values.

In general, compared to the other estimators, our estimator had smaller bias, estimated variances better matching the sample variances, and estimated coverage probabilities closer to the nominal 95% level. This performance was similar regardless of the true model error distribution and its variance structure, thus reflecting the proposed estimator’s flexibility. The proposed estimator can yield valid estimates for an RMM with measurement error regardless of whether the true model error is homoskedastic or heteroskedastic. This is especially beneficial in practice since knowing the correct model error variance structure is almost impossible as residuals are not obtainable in measurement error models.

In comparison, the homoskedastic and heteroskedastic sieve MLE were, in some cases, sensitive to the model error’s variance structure. When the model error was homoskedastic, the homoskedastic sieve MLE performed well and yielded unbiased estimates. Unfortunately, when applied to the heteroskedastic model error, this same estimator yielded biased estimates with bias up to 19 times larger than our proposed estimator. Increasing the smoothing parameters did not change the numerical results (a similar phenomenon was observed in Schennach and Hu (2013)), and the constrained optimization solver failed when they became too large. The observed bias was expected, however, because the homoskedastic sieve MLE is not designed to handle heteroskedasticity. Instead, a more flexible sieve such as the heteroskedastic sieve estimator should be employed. Unfortunately, in our numerical studies, the heteroskedastic sieve MLE yielded biased estimates both when the model error was homoskedastic and when it was heteroskedastic. We suspect the observed bias could be a result of the difficulty in solving a constrained optimization subject to too many constraints. When the model error is truly heteroskedastic, we further suspect that more specialized bases may be needed to properly account for the heteroskedasticity. Doing so, however, may be difficult as it would require estimating the model error’s heteroskedasticity and defining a truncated series that can capture its form. For an RMM with measurement error, correctly determining the model error’s variance-covariance is challenging, and is a step bypassed by our proposed estimator.

The TM-Homoskedastic and TM-Heteroskedastic estimators also heavily relied on the correctness of the model error variance. When the model error variance structure was correctly specified, the TM estimators had little bias and nearly perfect nominal 95% coverage probabilities. In this case, the TM estimator has one less nonparametric term than our proposed method, and thus performed well. In contrast, when the variance structure was incorrect, the TM estimators performed poorly compared to our proposed estimator. The poor performance was most notable when the data was generated with heteroskedastic model errors, and we applied the TM-Homoskedastic estimator. In this case, the TM-Homoskedastic estimator yielded estimates with bias up to 40 times larger than our proposed estimator.

Finally, the naive estimator had large bias and coverage probabilities less than the nominal 95%, indicating that the measurement error was significant enough and could not be ignored.

These results demonstrate that measurement error cannot be ignored and that methods that rely on knowing the model error variance structure will, unfortunately, yield biased estimates when that structure is misspecified. Because our proposed estimator makes no assumptions about the model error’s variance structure, our method offers more flexibility than existing methods, including the sieve MLE and Tsiatis-Ma method. Specifically, our proposed estimator provides consistent estimates even when the model error and covariate distributions are both misspecified. Similar results were observed for other mean models; see Supplementary Material (Section S.9).

4.3.2 Empirical impact of working models in proposed method

In Section 3.3, we discussed the theoretical impact of working models in our proposed method. We now evaluate the numerical impact. Specifically, we generated data as in Section 4.1, except with η10 as Normal(0, 0.5²) and η20 as Normal(0, 0.4²). We then evaluated our proposed method for four different cases of working models η1, η2:

  • Case 1: η1 = η10, η2 = η20.

  • Case 2: η1 ≠ η10, η2 = η20 with η1 a t-distribution with 4 degrees of freedom.

  • Case 3: η1 = η10, η2 ≠ η20 with η2 as Normal{0, (1 + |X|)²/3²}.

  • Case 4: η1 ≠ η10, η2 ≠ η20 with η1 a t-distribution with 4 degrees of freedom, and η2 as Normal{0, (1 + |X|)²/3²}.

Results in Table 3 show that in all cases, the proposed estimator yields consistent estimates. As we progress from Case 2 to Case 4, the efficiency loss only slightly increases; for example, the estimated variance for β̂1 is 0.0065 in Case 4 compared to an estimated variance of 0.0044 in Case 1. Similar results were observed for other regression models; see Supplementary Material (Section S.9). This small loss in efficiency and insensitivity to the choice of the working models was similarly observed in simpler models (see Tsiatis and Ma, 2004, Ma and Carroll, 2006 and Wang et al., 2009). Hence, for flexible choices of working models, our method yields consistent estimates and small efficiency loss when using incorrect working models.

Table 3.

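Evaluation of efficiency loss from proposed method when working models η1, η2 may differ from the true η10, η20. Bias, empirical sample variances (var), averaged estimated variances ($\widehat{\mathrm{var}}$), and estimated 95% coverage probabilities (CI) for $(\hat\sigma_U^2,\hat\beta^T)^T$ with true parameter values $(\sigma_{U,0}^2,\beta_0^T)^T=(0.05,0.25,0.7,0.5)^T$ and m(X,Z; β) = β2 exp(−β1X²) + β3Z. Results based on 1000 simulations.

| Setting | Statistic | β̂1 | β̂2 | β̂3 | $\hat\sigma_U^2$ |
|---|---|---|---|---|---|
| η1 = η10, η2 = η20 | bias | −0.0013 | 0.0009 | −0.0017 | −3.9892×10⁻⁵ |
| η1 = η10, η2 = η20 | var | 0.0043 | 0.0010 | 0.0013 | 1.0103×10⁻⁵ |
| η1 = η10, η2 = η20 | $\widehat{\mathrm{var}}$ | 0.0044 | 0.0010 | 0.0013 | 9.9257×10⁻⁶ |
| η1 = η10, η2 = η20 | CI | 0.9500 | 0.9500 | 0.9430 | 0.9430 |
| η1 ≠ η10, η2 = η20 | bias | −0.0001 | 0.0012 | −0.0017 | −3.9892×10⁻⁵ |
| η1 ≠ η10, η2 = η20 | var | 0.0046 | 0.0010 | 0.0013 | 1.0103×10⁻⁵ |
| η1 ≠ η10, η2 = η20 | $\widehat{\mathrm{var}}$ | 0.0047 | 0.0010 | 0.0013 | 9.9257×10⁻⁶ |
| η1 ≠ η10, η2 = η20 | CI | 0.9480 | 0.9500 | 0.9430 | 0.9430 |
| η1 = η10, η2 ≠ η20 | bias | −0.0104 | 0.0007 | −0.0063 | −3.9892×10⁻⁵ |
| η1 = η10, η2 ≠ η20 | var | 0.0051 | 0.0012 | 0.0020 | 1.0103×10⁻⁵ |
| η1 = η10, η2 ≠ η20 | $\widehat{\mathrm{var}}$ | 0.0052 | 0.0012 | 0.0018 | 9.9257×10⁻⁶ |
| η1 = η10, η2 ≠ η20 | CI | 0.9380 | 0.9490 | 0.9410 | 0.9430 |
| η1 ≠ η10, η2 ≠ η20 | bias | −0.0081 | 0.0011 | −0.0064 | −3.9892×10⁻⁵ |
| η1 ≠ η10, η2 ≠ η20 | var | 0.0064 | 0.0013 | 0.0021 | 1.0103×10⁻⁵ |
| η1 ≠ η10, η2 ≠ η20 | $\widehat{\mathrm{var}}$ | 0.0065 | 0.0013 | 0.0019 | 9.9257×10⁻⁶ |
| η1 ≠ η10, η2 ≠ η20 | CI | 0.9410 | 0.9470 | 0.9430 | 0.9430 |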

5 A case study

Flagg et al. (2000) performed a study to evaluate the validity of a Nutrition Survey conducted by the American Cancer Society in 1992–1993. In the study, n = 317 male participants completed four 24 hour dietary recall interviews given over a one-year period. Interest lies in understanding the impact of saturated fat intake on percent calories from fat for different races (white vs. non-white). Saturated fat intake, however, is not known exactly and only a mismeasured version via two repeated measurements is available.

Let Y denote the percent calories from fat, X denote the log transformation of the true (unobserved) saturated fat intake, and Z denote race (Z = 1 refers to white). We let W1 and W2 be the centered, log-transformed saturated fat measurements. Through a QQ-plot in Figure 1, we find that V = (W1 − W2)/2 is acceptably normally distributed with some unknown variance $\sigma_U^2$. Normality was formally evaluated through a Pearson Chi-squared test where we used 10 to 20 bins for testing and obtained a p-value of at least 0.63, thus supporting the normality assumption.

Figure 1.

Nutrition study: quantile-quantile plots of the measurement error for the original first and third readings of the 24 hour recall surveys (top) and after the logarithm transform (bottom).

Because nutrition models usually assume percent calories from fat is related to saturated fat intake through a linear regression, we use the model

$$Y_i=\beta_1\exp(X_i)+\beta_2+\beta_3Z_i+\varepsilon_i,\qquad W_{ij}=X_i+U_{ij},\qquad U_{ij}\sim\mathrm{Normal}(0,2\sigma_U^2)$$

for i = 1, . . . , n; j = 1,2 and E(ε|X,Z) = 0.

To estimate the model parameters, we used five methods: (i) The proposed method with working models η1 as Normal(0, 0.56²) and η2 as Normal(0, 0.91²). The variance for η1 is the sample variance of W, and the variance of η2 is the residual sum of squares after regressing Y on exp(W) and Z. (ii) The homoskedastic sieve MLE with smoothing parameters κε = κx = 6. (iii) The heteroskedastic sieve MLE with ℓx = ℓe = maxi |Yi|. (iv) The Tsiatis-Ma Homoskedastic estimator with η1, η2 as in our proposed method. Unlike our method, the TM-Homoskedastic estimator assumes the specified η2 is correct. We did not use the TM-Heteroskedastic estimator because it is difficult to specify a heteroskedastic variance structure for an RMM with measurement error. (v) The naive estimator.

Parameter estimates for all methods are in Table 4. All methods yielded similar inference conclusions: among the male population, saturated fat intake is statistically significant in relation to percent calories from fat (e.g., proposed method yielded β̂1 = 1.59, 95% CI: (1.23, 1.95)), whereas race is not (e.g., proposed method yielded β̂3 = −0.14, 95% CI: (−0.35, 0.08)). Though inference conclusions were similar, the methods yielded different magnitudes of the parameter effects. For example, the proposed method indicated that a one unit increase in saturated fat is associated with an estimated increase of 1.59 units in the mean of percent calories. This is more than twice as large as the naive estimate (0.71), about 1.4 times as large as the homoskedastic and heteroskedastic sieve estimates (1.14), and about 1.25 times as large as the TM-Homoskedastic estimate (1.27). The contrast in these results indicates that measurement error cannot be ignored. Moreover, given that the Tsiatis-Ma and sieve MLE estimators exhibited sensitivity to misspecification of the model error variance, we would prefer to rely on the results from the proposed method, which is insensitive to such misspecification. Therefore, our method indicates that saturated fat intake affects a male’s percent calories from fat more than existing methods would indicate.

Table 4.

Results from nutrition study when estimation is based on proposed method (Semipar), homoskedastic sieve MLE (Sieve-Hom), heteroskedastic sieve MLE (Sieve-Het), Tsiatis-Ma homoskedastic estimator (TM-Hom), and naive estimator. Parameter estimate (est), its estimated variance ($\widehat{\mathrm{var}}$), and 95% confidence interval (CI).

| Method | Statistic | β̂1 | β̂2 | β̂3 | $\hat\sigma_U^2$ |
|---|---|---|---|---|---|
| Semipar | est | 1.5926 | −1.6611 | −0.1364 | 0.1097 |
| Semipar | $\widehat{\mathrm{var}}$ | 0.0338 | 0.0544 | 0.0117 | 0.0001 |
| Semipar | CI | (1.2322, 1.9530) | (−2.1182, −1.2040) | (−0.3480, 0.0752) | (0.0923, 0.1271) |
| Sieve-Hom* | est | 1.1397 | −1.2449 | −0.0401 | 0.1097 |
| Sieve-Hom* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA |
| Sieve-Hom* | CI | NA | NA | NA | NA |
| Sieve-Het* | est | 1.1407 | 1.1628 | 0.2398 | 0.1097 |
| Sieve-Het* | $\widehat{\mathrm{var}}$ | NA | NA | NA | NA |
| Sieve-Het* | CI | NA | NA | NA | NA |
| TM-Hom | est | 1.2745 | −1.3604 | −0.0734 | 0.1097 |
| TM-Hom | $\widehat{\mathrm{var}}$ | 0.0190 | 0.0242 | 0.0116 | 0.0001 |
| TM-Hom | CI | (1.0046, 1.5445) | (−1.6655, −1.0553) | (−0.2841, 0.1373) | (0.0923, 0.1271) |
| Naive | est | 0.7110 | −0.8044 | −0.0351 | 0.1097 |
| Naive | $\widehat{\mathrm{var}}$ | 0.0065 | 0.0129 | 0.0104 | 0.0001 |
| Naive | CI | (0.3506, 1.0714) | (−1.2615, −0.3473) | (−0.2467, 0.1766) | (0.0923, 0.1271) |

* Estimated variances not available. The homoskedastic sieve MLE uses smoothing parameters κε = κx = 6.

We did not use the TM-Heteroskedastic estimator because it is difficult to specify a heteroskedastic variance structure for an RMM with measurement error.
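As a small illustration of how the reported estimates and estimated variances map to 95% confidence intervals, the sketch below forms Wald intervals, assuming (as the asymptotic normality results suggest) that the intervals are of the form estimate ± z × standard error. The helper name `wald_ci` is hypothetical, and the inputs are the rounded Semipar values from Table 4, so small rounding differences from the tabulated intervals are expected.

```python
from statistics import NormalDist


def wald_ci(estimate, variance, level=0.95):
    """Return the two-sided Wald interval estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# Semipar row of Table 4 (rounded inputs).
lo1, hi1 = wald_ci(1.5926, 0.0338)    # beta_1: saturated fat intake
lo3, hi3 = wald_ci(-0.1364, 0.0117)   # beta_3: race
print((lo1, hi1), (lo3, hi3))
```

The interval for β1 excludes zero while the interval for β3 contains it, which matches the inference conclusions stated for the proposed method.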

6 Discussion

We have developed root-n consistent estimators and provided inference tools for an RMM with errors in covariates where both the mean model and the measurement error model are in their general form. We showed that our method’s consistency does not require independence between the covariates and the model error, nor require estimating the unobservable covariate distribution and model error distribution. This is advantageous over existing methods including the Tsiatis and Ma (2004) estimator and the sieve MLE which have shown numerical sensitivity to model error heteroskedasticity. The proposed estimator is derived via a semiparametric procedure different from that in Tsiatis and Ma (2004), and, to the best of our knowledge, the resulting root-n consistent estimator is the first known in its generality that is robust to various distribution misspecifications.

To identify and estimate ΩU in Section 2.1, we used the average of repeated measures. An alternative is to directly use the repeated measures to perform estimation and inference. Based on our experience (Ma and Yin, 2008), there is generally not a definitive efficiency gain or loss with this approach relative to the average approach. However, more careful analysis will be needed to determine when one or the other is more efficient.
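
To illustrate how repeated measures carry information about the measurement error variance, here is a minimal sketch. It is not the estimator of Section 2.1. It assumes a scalar covariate, the classical additive model W_ij = X_i + U_ij with mean-zero errors of common variance that are independent of X_i, and the same number of replicates per subject. The function name `replicate_error_variance` and the toy data are hypothetical.

```python
def replicate_error_variance(w):
    """Pooled within-subject variance of replicates W_ij = X_i + U_ij.

    w is a list of n lists, each holding the same number (>= 2) of replicates.
    Returns (variance of a single U_ij, error variance of the replicate average).
    """
    n, n_rep = len(w), len(w[0])
    if n_rep < 2 or any(len(row) != n_rep for row in w):
        raise ValueError("need the same number (>= 2) of replicates per subject")
    ss = 0.0
    for row in w:
        mean_i = sum(row) / n_rep  # X_i cancels in deviations from the subject mean
        ss += sum((w_ij - mean_i) ** 2 for w_ij in row)
    var_u = ss / (n * (n_rep - 1))  # unbiased for var(U_ij) under the stated assumptions
    return var_u, var_u / n_rep     # averaging n_rep replicates divides the variance by n_rep


# Toy data: three subjects, two replicates each (illustrative values only)
w = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
var_u, var_avg = replicate_error_variance(w)
print(var_u, var_avg)
```

The second returned value shows why working with the replicate average is natural: the error in the average has variance equal to the single-replicate error variance divided by the number of replicates.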

We assumed throughout that the measurement error distribution pUij (u; ΩU) is parametric with ΩU unknown. We can relax this assumption to have a nonparametric measurement error distribution. In this case, still assuming Xi and Uij are independent, Kotlarski’s Theorem (Kotlarski, 1967) implies that the measurement error density is identifiable. From the repeated measures, a nonparametric kernel estimator p̂Uij of the measurement error density function can be obtained, and operationally our estimation procedure can proceed with pUij replaced by p̂Uij. For such a plug-in procedure, we provide the following summary. (i) The identifiability of β (Section 2.2) still holds. (ii) Theorem 1 and the estimation procedure for β (Section 3.2) remain valid since they only require a consistent estimator of the measurement error density. (iii) The root-n consistency and asymptotic normality in Theorems 2 and 3 still hold, although the asymptotic variance will change and the proofs will need to be redone to take into account the additional nonparametric estimation. See Hall and Ma (2007) for details on how to incorporate a nonparametrically estimated error distribution in a different model. (iv) The optimal efficiency bound in estimating β will decrease due to the nonparametric estimation of pUij.

Lastly, another extension of our method is to a conditional moment model where

E{m(Y,X,Z;β)|X,Z}=0. (11)

In this case, the proof of identification of β (Section 2.2) still holds since it does not require a particular form of the conditional density of Y given X,Z. Our remaining estimation procedure, asymptotic properties, and implementation (Section 3) also remain intact, except that ε is replaced everywhere by m(Y,X,Z; β) and mβ(X,Z;β) on the right-hand side of equation (7) is changed to −E{∂m(Y,X,Z; β)/∂β|X,Z}. Hence, even for general nonlinear and nonseparable regression models of the form Y = f(X,Z, ε, β0), where the distribution of ε is unknown and may be subject to various restrictions, our general procedure is applicable as long as we can construct moment conditions, i.e., find m(Y,X,Z; β0) such that (11) holds. This extension is particularly useful in empirical economics, where models can take conditional or nonseparable forms.
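
As a consistency check, the restricted moment model (1) can be written as a special case of the conditional moment model (11). The short derivation below is a sketch. To avoid a clash with the moment function m in (11), the mean function of model (1) is renamed μ here, which is a notational assumption made only for this illustration.

```latex
% \mu is the mean function of model (1), renamed to avoid clashing with m in (11);
% \mu_\beta denotes its derivative with respect to \beta.
\begin{align*}
m(Y,X,Z;\beta) &= Y - \mu(X,Z;\beta), \\
E\{m(Y,X,Z;\beta_0)\mid X,Z\} &= E(\varepsilon\mid X,Z) = 0, \\
-E\left\{\frac{\partial m(Y,X,Z;\beta)}{\partial\beta}\,\Big|\,X,Z\right\}
  &= \mu_\beta(X,Z;\beta).
\end{align*}
```

With this choice the moment function equals the model error at the true parameter value, and the replacement term for equation (7) reduces to the derivative of the mean function, so the general recipe recovers the original RMM procedure.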

Acknowledgments

This work was supported by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under Award Number K01NS099343, the Huntington’s Disease Society of America Human Biology Project Fellowship, Texas A&M School of Public Health Research Enhancement and Development Initiative (REDI-23-202059-36000), and the National Science Foundation (DMS-1608540). We thank Yingyao Hu for providing code for the homoskedastic sieve estimator and for advising on the heteroskedastic sieve estimator. We thank Raymond J. Carroll for providing the nutrition data. We also thank the editor and two referees whose comments substantially improved the quality and presentation of the work.

Footnotes

JEL Classification: C1

Contributor Information

Tanya P. Garcia, Department of Epidemiology and Biostatistics, Texas A&M University.

Yanyuan Ma, Department of Statistics, Pennsylvania State University.

References

  1. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: The Johns Hopkins University Press; 1993.
  2. Carroll RJ, Hall P. Optimal rates of convergence for deconvoluting a density. Journal of the American Statistical Association. 1988;83:1184–1186.
  3. Carroll RJ, Maca JD, Ruppert D. Nonparametric regression in the presence of measurement error. Biometrika. 1999;86:541–554.
  4. Carroll RJ, Ruppert D, Crainiceanu CM, Tosteson TD, Karagas MR. Nonlinear and nonparametric regression and instrumental variables. Journal of the American Statistical Association. 2004;99:736–750.
  5. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. London: CRC Press; 2006.
  6. Carroll RJ, Wang Y. Nonparametric variance estimation in the analysis of microarray data: a measurement error approach. Biometrika. 2008;95:437–449. doi: 10.1093/biomet/asn017.
  7. Chan LK, Mak TK. On the polynomial functional relationship. Journal of the Royal Statistical Society, Series B. 1985;47:510–518.
  8. Chen X, Hu Y, Lewbel A. Nonparametric identification and estimation of nonclassical errors-in-variables models without additional information. Statistica Sinica. 2009;19:949–968.
  9. Cheng CL, Schneeweiss H. Polynomial regression with errors in the variables. Journal of the Royal Statistical Society, Series B. 1998;60:189–199.
  10. Cheng CL, Schneeweiss H, Thamerus M. A small sample estimator for a polynomial regression with errors in the variables. Journal of the Royal Statistical Society, Series B. 2000;62:699–709.
  11. Delaigle A, Hall P. Estimation of observation-error variance in errors-in-variables regression. Statistica Sinica. 2011;21:1023–1063.
  12. Fan J. On the optimal rates of convergence for nonparametric deconvolution problems. Annals of Statistics. 1991;19:1257–1272.
  13. Flagg E, Coates R, Calle E, Potischman N, Thun M. Validation of the American Cancer Society Cancer Prevention Study II Nutrition Survey Cohort Food Frequency Questionnaire. Epidemiology. 2000;11:462–468. doi: 10.1097/00001648-200007000-00017.
  14. Fuller WA. Measurement Error Models. New York: Wiley; 1987.
  15. Hall P, Ma Y. Measurement Error Models with Unknown Error Structure. Journal of the Royal Statistical Society, Series B. 2007;69:429–446.
  16. Huang S, Huwang L. On the polynomial structural relationship. The Canadian Journal of Statistics. 2001;29:495–512.
  17. Hu Y, Schennach S. Instrumental variable treatment of nonclassical measurement error models. Econometrica. 2008;76:195–216.
  18. Jennrich RI. Asymptotic Properties of Non-Linear Least Squares Estimators. Annals of Mathematical Statistics. 1969;40:633–643.
  19. Kotlarski II. On characterizing the gamma and normal distribution. Pacific Journal of Mathematics. 1967;20:69–76.
  20. Kress R. Linear integral equations. 2nd ed. Berlin: Springer; 1999.
  21. Lee L, Sepanski J. Estimation of linear and nonlinear errors-in-variables models using validation data. Journal of the American Statistical Association. 1995;90:130–140.
  22. Li H, Liqun W. Consistent estimation in generalized linear mixed models with measurement error. Journal of Biometrics and Biostatistics. 2012;S7:007. doi: 10.4172/2155-6180.S7-007.
  23. Li T. Robust and consistent estimation of nonlinear errors-in-variables models. Journal of Econometrics. 2002;110:1–26.
  24. Liang H. Generalized partially linear mixed effects models incorporating mismeasured covariates. Annals of the Institute of Statistical Mathematics. 2009;61:27–46. doi: 10.1007/s10463-007-0146-0.
  25. Liang K, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  26. Ma Y, Carroll RJ. Locally efficient estimators for semiparametric models with measurement error. Journal of the American Statistical Association. 2006;101:1465–1474.
  27. Ma Y, Yin G. Cure Rate Model with Mismeasured Covariates under Transformation. Journal of the American Statistical Association. 2008;103:743–756.
  28. Nakamura T. Corrected score function for errors-in-variables models: methodology and application to generalized linear models. Biometrika. 1990;77:127–137.
  29. Newey W. Semiparametric efficiency bounds. Journal of Applied Econometrics. 1990;5:99–135.
  30. Novick SJ, Stefanski LA. Corrected score estimation via complex variable simulation extrapolation. Journal of the American Statistical Association. 2002;97:472–481.
  31. Rao CR. Linear Statistical Inference and Its Applications. New York: Wiley; 1973.
  32. Rao P. Identifiability in Stochastic Models. New York: Academic Press; 1992.
  33. Rudin W. Real and complex analysis. McGraw-Hill; 1987. Mathematics series.
  34. Schennach S. Estimation of nonlinear models with measurement error. Econometrica. 2004;72:33–75.
  35. Schennach S. Nonparametric regression in the presence of measurement error. Econometric Theory. 2004b;20:1046–1093.
  36. Schennach S, Hu Y. Nonparametric identification and semiparametric estimation of classical measurement error models without side information. Journal of the American Statistical Association. 2013;108:177–186.
  37. Shen X. On methods of sieves and penalization. Annals of Statistics. 1997;25:2555–2591.
  38. Stefanski L, Carroll RJ. Deconvoluting kernel density estimators. Statistics. 1990;21:169–184.
  39. Tsiatis A. Semiparametric Theory and Missing Data. New York: Springer; 2006.
  40. Tsiatis A, Ma Y. Locally efficient semiparametric estimators for functional measurement error models. Biometrika. 2004;91:835–848.
  41. Wang Y, Ma Y, Carroll RJ. Variance estimation in the analysis of microarray data. Journal of the Royal Statistical Society, Series B. 2009;71:725–745. doi: 10.1111/j.1467-9868.2008.00690.x.
