Abstract
We study semiparametric varying-coefficient partially linear models when some linear covariates are not observed, but ancillary variables are available. Semiparametric profile least-square-based estimation procedures are developed for the parametric and nonparametric components after the error-prone covariates are calibrated. Asymptotic properties of the proposed estimators are established. We also propose a profile least-square-based ratio test and a Wald test to identify significant parametric and nonparametric components. To improve the accuracy of the proposed tests for small or moderate sample sizes, a wild bootstrap version is also proposed to calculate the critical values. Intensive simulation experiments are conducted to illustrate the proposed approaches.
Keywords: Ancillary variables, de-noise linear model, errors-in-variables, profile least-square-based estimator, rational expectation model, validation data, wild bootstrap
1 Introduction
Various efforts have been made to balance the interpretability of linear models and the flexibility of nonparametric models. Important results from these efforts include semiparametric varying-coefficient partially linear models (SVCPLM), in which the response variable Y depends on variables Z, X and U in the form
Y = ΘTZ + αT(U)X + ε, (1.1)
where Θ is a p-dimensional vector of unknown parameters, α(·) is a q-variate vector of unknown coefficient functions, U is the variable on which the coefficient functions depend, which may be multivariate, and the model error ε has mean zero and finite variance. For notational simplicity, we assume that U is scalar. The term αT(U)X is referred to as the nonparametric component since α(U) is nonparametric.
Model (1.1) permits interaction between the covariates U and X in such a way that a different level of the covariate U is associated with a different linear model in ΘTZ, and it allows one to examine the extent to which the covariates X interact with U. This is a general structure that covers many well-studied semiparametric regression models. For example, when Z = 0, (1.1) reduces to varying-coefficient models, which were originally proposed by Hastie and Tibshirani (1993) and studied by Fan and Zhang (1999), Xia and Li (1999), and Cai, Fan and Li (2000). When q = 1 and X = 1, model (1.1) reduces to the well-known partially linear model, in which Y depends on Z in a linear way but is related to another independent variable U in an unspecified form. There is a great deal of literature on partially linear models (e.g., Engle et al., 1986; Robinson, 1988; Speckman, 1988); a survey was given by Härdle, Liang and Gao (2000). The SVCPLM has been investigated by Zhang, Lee and Song (2002) and Fan and Huang (2005), among others. Zhang, Lee and Song (2002) developed procedures for estimating the linear and nonparametric parts of the SVCPLM. Fan and Huang (2005) proposed a profile likelihood technique for estimating the parametric component and established the asymptotic normality of their estimator.
All studies of the SVCPLM are limited to exactly observed data. However, in biomedical research observations are often measured with error. Simply ignoring measurement error, known as the naive method, results in biased estimators. Various attempts have been made to correct for such bias; see Fuller (1987) and Carroll et al. (2006) for extensive discussions and examples of linear and nonlinear models with measurement errors. In this paper, we are concerned with the situation where some components ξ of Z are not observed directly, but auxiliary information is available to calibrate ξ. Let Z = (ξT, WT)T, where ξ is a p1 × 1 vector and W is the vector of the remaining observed components. We assume that ξ is related to observed variables η and V through the relationship ξ = E(η|V). Thus, we study the following model
Y = βTξ + θTW + αT(U)X + ε,   η = ξ + e,   ξ = E(η|V), (1.2)
where E(ε|Z, X, U) = 0, E(ε2|Z, X, U) = σ2(Z, X, U), and e is an error with mean zero and positive finite covariance matrix Σe = E(eeT). The covariates V, W, X and U are generally different, although our structure allows V to overlap with (X, W, U). Model (1.2) is flexible enough to include a variety of models of interest. We give three examples to illustrate its flexibility.
Example 1.1
(Errors-in-variables models with validation data) Z is a p-variate vector that is not observed, and Z̃ is another observed p-variate vector associated with Z. Assume that we have primary observations {Yi, Z̃i, Ui, i = 1, ···, n} and n0 validation observations {Zi, Z̃i, Ui, i = n + 1, ···, n + n0}, which are independent of the primary observations. Let V = (Z̃T, U)T. The partial errors-in-variables model with validation data is written as
Y = βTZ + α(U) + ε. (1.3)
This model has been studied by Sepanski and Lee (1995), Sepanski and Carroll (1993), and Sepanski, Knickerbocker and Carroll (1994). Taking X = 1, θ = 0, η = Z, and ξ = E(Z|V) in model (1.2), we see that model (1.3) is a sub-model of (1.2).
Example 1.2
(De-noise linear model) The relation between the response variable Y and covariates (ξ, W) is described by Y = βTξ + θTW + ε, where β and θ are parameter vectors. The covariate ξ is measured with error: instead of observing ξ directly we observe its surrogate η. This forms a de-noise linear model:
Y = βTξ(t) + θTW + ε,   η(t) = ξ(t) + e(t), (1.4)
where ξ = ξ(t) is subject to measurement error at time t and the measurement errors ε and e are independent of each other at each time t.
Cai, Naik and Tsai (2000) used this model to estimate the relationship between awareness and television rating points of TV commercials for certain products. Cui, He and Zhu (2002) proposed an estimator of the coefficients and established its asymptotic properties. It is easy to see that model (1.2) embeds (1.4).
Example 1.3
(Rational expectation model) Consider the following rational expectation model:
(1.5) |
where ηt − E(ηt|Vt) is the unanticipated part of the payoff for the price variable ηt given the historical information Vt. In this model, all of (Yt, St, ηt, Vt) except E(ηt|Vt) can be observed directly.
Besides estimation of and inference on γ and ζ, the following model is also of interest within the econometric community:
(1.6) |
It is worth noting that model (1.6) is a sub-model of (1.2). An interesting question is to test whether model (1.6) satisfies the rational expectation model (1.5), that is, to test the following hypothesis:
(1.7) |
In the econometric literature, regression with unobserved covariates replaced by estimates is also called regression with generated regressors. This topic has been widely studied. Pagan (1984) gave a comprehensive review of the estimation of parametric models with generated regressors. Ai and McFadden (1997) presented a procedure for analyzing a partially specified nonlinear regression model in which the nuisance parameter is an unrestricted function of a subset of regressors. Ahn and Powell (1993) and Powell (1987) considered the case with generated regressors in the nonparametric part of the model. Li (2002) considered the estimation of a semiparametric partially linear model for dependent data with generated regressors. Their models are special cases of the rational expectation model.
Various procedures similar to generated regressors have been proposed to reduce the bias due to mismeasurement. Regression calibration and simulation extrapolation have been developed for measurement error models (Carroll et al., 2006). Liang, Härdle and Carroll (1999) studied a special case of (1.2), the partially linear errors-in-variables model, and proposed an attenuation-corrected estimator of the parameter based on a semiparametric likelihood estimate. Wang and Pepe (2000) used an expected estimating equation method to estimate the parameter and correct the estimation bias.
In an attempt to develop a unified estimation procedure for model (1.2), we propose a profile-based procedure, which is similar in spirit to the regression calibration method. The procedure consists of two steps. In the first step, we calibrate the error-prone covariate ξ by using the ancillary information and applying nonparametric regression techniques. In the second step, we use the profile least-squares principle to estimate the parametric and nonparametric components. Under mild assumptions, we derive asymptotic representations of the proposed estimators and use them to establish asymptotic normality. We also propose a profile least-square-based ratio test and a Wald test for the parametric part of model (1.2), and a goodness-of-fit test for the varying coefficients in the nonparametric part. The asymptotic distributions of the proposed test statistics are derived. Wild bootstrap versions are introduced to calculate the critical values for these tests.
The paper is organized as follows. In Section 2, we focus on the estimation of the parameters and nonparametric functions, and on the asymptotic properties of the resulting estimators; the error-prone covariates are first calibrated, and a bandwidth selection strategy is also discussed. In Section 3, we develop profile least-square-based ratio tests for the parametric and nonparametric components, with wild bootstrap methods proposed to calculate the critical values. Applications to simulated and real data are reported in Section 4. Section 5 concludes. Regularity assumptions and technical proofs are relegated to the Appendix.
2 Estimation of the parametric and nonparametric components
When ξ is observed, estimators of Θ and α(u) and associated tests have been developed for model (1.2). These estimators and tests cannot be used directly when ξ is unobservable. We first need to calibrate ξ by using the ancillary variables η and V, because a direct replacement of ξ by η would result in bias.
2.1 Covariate calibration
For notational simplicity, we assume V is univariate in the remainder of this paper. Let ηi,k be the kth entry of the vector ηi, and let Lb(·) = L(·/b)/b, where L is a kernel function and b = bk (k = 1, 2, ···, p1) is a bandwidth for the kth component of η. Assume throughout the paper that ξk(υ) has r + 1 derivatives, and approximate ξk(υ) by an rth-order polynomial in a neighborhood of υ0 via Taylor expansion
Denote Wυ = diag{Lb(V1 − υ), ···, Lb(Vn − υ)}. The local polynomial estimator (Fan and Gijbels, 1996) of (a0,k, …, ar,k)T has the standard weighted least-squares form, and ξk(υ) is estimated by its first component, ξ̂k(υ), for k = 1, …, p1, where ζ1 is the (r + 1) × 1 vector with 1 in the first position and 0 elsewhere.
In what follows, we denote A⊗2 = AAT, μj = ∫ ujL(u)du, νj = ∫ ujL2(u)du, Su = (μj+l)0≤j,l≤r, and cp = (μr+1, ···, μ2r+1)T. fυ(υ) is the density function of V.
Under the assumptions given in the Appendix, we can prove (Fan and Gijbels, 1996, pp. 101–103, or Carroll et al., 1997, p. 486) that
(2.1) |
uniformly in υ on the support of V. This fact will be used in proving the main results in the Appendix.
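To fix ideas, a minimal sketch of this calibration step in Python is given below; the function names, the Epanechnikov kernel, and the small ridge term added for numerical stability are our own illustrative choices rather than the paper's specification.

```python
import numpy as np

def local_poly_calibrate(eta_k, V, v_eval, b, r=1):
    """Estimate xi_k(v) = E(eta_k | V = v) by an r-th order local
    polynomial fit; xi_k(v) is the fitted local intercept a_{0,k}."""
    def L(u):                                    # Epanechnikov kernel on [-1, 1]
        return 0.75 * np.clip(1.0 - u**2, 0.0, None)

    xi_hat = np.empty(len(v_eval))
    for j, v in enumerate(v_eval):
        d = V - v
        w = L(d / b) / b                         # weights L_b(V_i - v)
        D = np.vander(d, r + 1, increasing=True) # columns 1, d, ..., d^r
        WD = D * w[:, None]
        # small ridge term guards against singular local designs
        coef = np.linalg.solve(D.T @ WD + 1e-10 * np.eye(r + 1), WD.T @ eta_k)
        xi_hat[j] = coef[0]
    return xi_hat
```

Evaluating this at the observed Vi yields the calibrated covariates ξ̂i used below; in line with Theorem 2, b should be taken smaller than the pointwise-optimal rate (undersmoothing).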
2.2 Estimation of the parametric component
Let (Yi, ηi, Vi, Wi, Xi, Ui), i = 1, 2, ···, n, be the observations from model (1.2). The unknown covariates ξi are substituted by the estimators given in the previous section. We therefore have the following “new” model:
Yi = βTξ̂i + θTWi + αT(Ui)Xi + ε̂i,   i = 1, ···, n, (2.2)
where the ε̂i are still treated as errors. If ξ̂i were an unbiased estimator of ξi, then we would have Eε̂i = 0.
Approximate αj(U) in a neighborhood of u by aj(u) + bj(u)(U − u) for j = 1, …, q. Write Ẑi = (ξ̂iT, WiT)T and Θ = (βT, θT)T. Following the profile likelihood based procedure proposed by Fan and Huang (2005), our profile least-square-based estimator of Θ is defined as
Θ̂n = (Z̃TZ̃)−1Z̃T(I − S)Y, (2.3)
where Z̃ = (I − S)Ẑ, I is the n × n identity matrix,
and Y = (Y1, …, Yn)T, Wu = diag{Kh(U1 − u), …, Kh(Un − u)}, Ẑ = (Ẑ1, ···, Ẑn)T, 0q is the q × 1 vector with all entries zero, K(·) is a kernel function, h is a bandwidth, and Kh(·) = K(·/h)/h.
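A compact numerical sketch of (2.3) follows, assuming S is the local linear profile smoother in the spirit of Fan and Huang (2005); the Gaussian kernel and all names here are illustrative, not the paper's own code.

```python
import numpy as np

def smoother_matrix(X, U, h):
    """n x n matrix S: row i maps a response vector to the local linear
    fit of alpha^T(U_i) X_i, following the profiling idea of Fan and
    Huang (2005).  The Gaussian kernel is an arbitrary choice."""
    n, q = X.shape
    K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    S = np.empty((n, n))
    for i in range(n):
        d = U - U[i]
        w = K(d / h) / h                        # K_h(U_j - U_i)
        D = np.hstack([X, X * d[:, None]])      # local linear design
        WD = D * w[:, None]
        coef_map = np.linalg.solve(D.T @ WD, WD.T)   # maps y -> (a, b)
        S[i] = np.concatenate([X[i], np.zeros(q)]) @ coef_map
    return S

def profile_ls(Y, Z_hat, X, U, h):
    """Profile least-squares estimator (2.3):
    Theta_hat = (Z~' Z~)^{-1} Z~' (I - S) Y with Z~ = (I - S) Z_hat."""
    S = smoother_matrix(X, U, h)
    Z_tilde, Y_tilde = Z_hat - S @ Z_hat, Y - S @ Y
    return np.linalg.solve(Z_tilde.T @ Z_tilde, Z_tilde.T @ Y_tilde)
```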
We now give a representation of Θ̂n. This representation can be used to obtain the asymptotic distribution of Θ̂n, which we give in Theorem 2. This result extends the method of Fan and Huang (2005) to an SVCPLM with generated regressors.
Write Φ(U) = E(XZT|U), Γ(U) = E(XXT|U), B(V) = E[{Z − ΦT(U)Γ−1(U)X}|V], ψ(Z, X, U) = Z − ΦT(U)Γ−1(U)X, and Σ = E(ZZT) − E{ΦT(U) Γ−1(U)Φ(U)}.
Theorem 1
Under Assumptions 1–5 in Appendix, we have
where .
Theorem 2
Let nb2(r+1) → 0. Under Assumptions 1–5 in the Appendix, √n(Θ̂n − Θ) converges to a normal distribution with mean zero and covariance matrix Σ1, where Σ1 = Σ−1DΣ−1,
Furthermore, if e is independent of ε given (Z, X, U, V), and ε is independent of (Z, X, U), the asymptotic covariance can be simplified as Σ−1(σ2Σ + E[(eTβ)2{B(V)}⊗2])Σ−1. If e is also independent of V, the asymptotic covariance can be further simplified as σ2Σ−1 + βTΣeβ · Σ−1E{B(V)⊗2}Σ−1.
The proof of Theorem 2 can be completed by using Theorem 1. We omit the details.
The asymptotic variance has a similar structure to that of Das (2005). The first term of the asymptotic variance can be viewed as the variance from the first-stage estimation without measurement error/missing data, the second is the variance contributed by the second stage, in which the unobserved variable is estimated, and the third is the covariance between the two stages. If e = 0 in model (1.2), i.e., the covariate is exactly observed, the variance of Θ̂n is the same as that of Fan and Huang (2005). To achieve a root-n consistent estimator of Θ, Theorem 2 indicates that undersmoothing is required in estimating ξ(υ), so the optimal bandwidth does not satisfy the condition of Theorem 2.
Example 1.1 (continued). Let β̂n be the estimator of β in model (1.3). Assume n0/n → λ. Checking the conditions of Theorem 2, we conclude that √n(β̂n − β) is asymptotically normal with covariance Σ⋆, where Σ⋆ = Σ−1{σ2 + λβTE[E{Z − E(Z|U)|V}]⊗2β} and Σ = E[{ξ − E(ξ|U)}⊗2].
Example 1.2 (continued). For the de-noise model introduced in Section 1, we apply Theorem 2 to derive the asymptotic distribution of the estimator Θ̂ = (β̂T, θ̂T)T given by Cui, He and Zhu (2002).
The asymptotic covariance of Θ̂n can be consistently estimated by Σ̂n = nΣ̂−1σ̂2 + Σ̂−1Q̂Σ̂−1, where Σ̂−1 = (Z̃TZ̃)−1Z̃T(I − S)T(I − S)Z̃(Z̃TZ̃)−1, B̂(v) = Ẑ − Ê{ΦT(U)Γ−1(U)X|V = v}, and Ê{ΦT(U)Γ−1(U)X|V = v} is a nonparametric regression estimator of ΦT(U)Γ−1(U)X on V. The estimator α̂(·) will be given in the next section.
Generally, Σ̂n is difficult to calculate. However, implementation becomes simpler in some cases. For example, in the errors-in-variables model with validation data, a direct simplification yields B(V) = Z − ΦT(U)Γ−1(U)X, D = {βTE(eeT|V)β}Σ, and the asymptotic covariance matrix equals Σ−1{σ2 + λβTE(eeT|V)β}. This matrix can be estimated by a standard sandwich procedure. A similar simplification applies to the asymptotic covariance matrix, Σ−1{σ2 + βTE(eeT|V)β}, of the de-noise model.
2.3 Estimation of the nonparametric components
After obtaining the estimate Θ̂n, we can estimate aj(u) and bj(u) for j = 1, …, q, and hence αj(u). Write Ψ(u) = {a1(u), …, aq(u), b1(u), …, bq(u)}T. The estimator of the nonparametric components Ψ(u) is defined as
(2.4) |
Correspondingly, α(u) is estimated by the first q components of Ψ̂(u), where Iq is the q × q identity matrix and H = diag(1, h) ⊗ Iq. We have the following asymptotic representation for the resulting estimator.
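Given Θ̂n, this second stage is an ordinary local linear regression of the partial residuals on X. The sketch below mirrors (2.4) under the same illustrative kernel as before; it is our reconstruction, not the authors' code.

```python
import numpy as np

def alpha_hat(Y, Z_hat, X, U, Theta, u_grid, h):
    """Sketch of (2.4): local linear fit of r_i = Y_i - Z_hat_i' Theta
    on {X_i, (U_i - u) X_i}; the first q coefficients estimate alpha(u)."""
    n, q = X.shape
    K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    r = Y - Z_hat @ Theta
    est = np.empty((len(u_grid), q))
    for j, u in enumerate(u_grid):
        d = U - u
        w = K(d / h) / h
        D = np.hstack([X, X * d[:, None]])
        WD = D * w[:, None]
        psi = np.linalg.solve(D.T @ WD, WD.T @ r)   # (a(u), b(u))
        est[j] = psi[:q]                            # keep a(u) only
    return est
```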
Theorem 3
Under Assumptions 1–5 given in Appendix, we have
Based on this representation, we can derive the asymptotic normality of the proposed nonparametric estimators of the varying coefficient functions. The proof is straightforward but tedious. We omit the details.
For notational simplicity, we assume in the remainder of this paper that ε is independent of (Z, X, U) and e is independent of (V, U).
Theorem 4
Under Assumptions 1–5, we have
as n → ∞, where
and
in which Λ(u0) = {E[{E(X|V)}|U = u0]}⊗2 and q0 = μ2/(μ2 − μ1).
Furthermore, if nhb2r+2 → 0, then
where ,
The first term of Σ2 is the asymptotic covariance of the usual profile likelihood estimator (Cai, Fan and Li, 2000) when ξj is observed. The second term is attributed to calibrating the error-prone covariates. In the errors-in-variables model with validation data, if X is independent of V and E(X) = 0, the measurement errors have no impact on the covariance Σ2. Theorem 4 also indicates that if n1/2 max(h5/2, br+1) → 0, the bias of α̂(u) tends to zero and α̂(u) is asymptotically normally distributed with rate (nh)1/2.
After obtaining Θ̂n and α̂(u), one can easily give an estimator of the variance σ2 of the error ε:
In our simulation, a simple version of σ̂2 is used. Note that S depends only on the observations {(Xi, Ui), i = 1, …, n}, and we can derive a “synthetic linear model”, that is, Y − ZΘ = M + ε, where M = αT(U)X. A straightforward derivation yields (I − S)Y = (I − S)ZΘ + (I − S)ε. Standard regression gives the least squares estimate Θ̂ and then M̂ = S(Y − ZΘ̂). Note that Z is not always observed. Replacing Z by its estimate, we obtain a consistent estimator M̂ of M; that is, M̂ = S(Y − ẐΘ̂). A consistent estimator of σ2 may then be defined as σ̂2 = n−1Σi=1n(Yi − ẐiTΘ̂n − M̂i)2, where M̂i is the ith element of M̂.
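In code, this variance estimator reduces to a mean squared residual; a short sketch, reusing alpha_hat from the sketch earlier in this section:

```python
import numpy as np

def sigma2_hat(Y, Z_hat, X, U, Theta, h):
    """sigma2_hat = n^{-1} sum (Y_i - Z_hat_i' Theta - M_hat_i)^2 with
    M_hat_i = alpha_hat(U_i)' X_i; alpha_hat is the sketch above."""
    a = alpha_hat(Y, Z_hat, X, U, Theta, U, h)   # evaluate at each U_i
    M = np.sum(a * X, axis=1)
    return float(np.mean((Y - Z_hat @ Theta - M) ** 2))
```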
2.4 Bandwidth selection
The proposed procedure involves two bandwidths, h and b. To derive the asymptotic distributions of the proposed estimators, we impose theoretical rates of convergence on the bandwidths. It is worth pointing out that undersmoothing is necessary when we estimate ξ, so the optimal rate for b cannot be used.
As mentioned before, the optimal bandwidth for b cannot be used because the nonparametric estimators of the covariates must be undersmoothed; undersmoothing ξ̂ keeps its bias small, which is what precludes the optimal bandwidth for b. The asymptotic variances of the proposed estimators of the constant coefficients depend on neither the bandwidth nor the kernel function. Hence, we can use the method of mixtures of higher-order theoretical expansions proposed by Sepanski, Knickerbocker and Carroll (1994) or the typical-curves approach of Brookmeyer and Liao (1992) to select the bandwidth b. As in Sepanski et al. (1994), a suitable bandwidth is b = Cn−1/3, where C is a constant depending on the unknown function ξ(υ) and its second derivative. In practice, one can use a plug-in rule to estimate the constant C. A useful and simple candidate for C is σ̂V, the sample standard deviation of V. This method is fairly effective and easy to implement. In our simulation example, the bandwidth is b = σ̂υn−1/3. Based on asymptotic analysis and empirical experience for the fixed-time case (i.e., de-noise models), we suggest a simple rule of thumb: choose the smoothing parameter b so that intervals of size 2b contain around 5 points for n up to 100, and between 8−1n1/3 and 4−1n1/3 points for larger n.
We use the “leave-one-sample-out” method to select the bandwidth h. This method has been widely applied in practice, for example by Cai et al. (2000) and Fan and Huang (2005). We define the cross-validation score for h as CV(h) = n−1Σi=1n{Yi − ẐiTΘ̂n,−i − α̂Th,−i(Ui)Xi}2, where Θ̂n,−i is the profile least-square-based estimator defined by (2.3) computed from the data with the ith observation deleted, and α̂h,−i(·) is the estimator defined in (2.4) with Θ̂n replaced by Θ̂n,−i. The cross-validation smoothing parameter hcυ is the minimizer of CV(h); that is, hcυ = arg minh CV(h).
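A sketch of this criterion, taking CV(h) as the average squared leave-one-out prediction error; fit_fn is a hypothetical wrapper (our name) that refits the whole two-step procedure on a subsample:

```python
import numpy as np

def cv_score(h, Y, Z_hat, X, U, fit_fn):
    """CV(h) = n^{-1} sum_i {Y_i - Z_hat_i' Theta_{-i} - alpha_{-i}(U_i)' X_i}^2.
    fit_fn(idx, h) must return (Theta, alpha_fn) fitted on observations idx,
    with alpha_fn(u) returning the q-vector alpha_hat(u)."""
    n = len(Y)
    err = 0.0
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        Theta, alpha_fn = fit_fn(idx, h)
        fitted = Z_hat[i] @ Theta + alpha_fn(U[i]) @ X[i]
        err += (Y[i] - fitted) ** 2
    return err / n

# h_cv = min(h_candidates, key=lambda h: cv_score(h, Y, Z_hat, X, U, fit_fn))
```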
3 Tests for parametric and nonparametric components
3.1 Test for parametric components
An interesting question is to test the following hypothesis:
H0: AΘ = 0   versus   H1: AΘ ≠ 0, (3.1)
where A is a given l × p full-rank matrix.
Let Θ̂0 and α̂0(·) be the estimators of Θ and α(·) under the null hypothesis, and let RSS0 be the corresponding residual sum of squares. RSS0 can be expressed in terms of Θ̂0 = Θ̂ − (Z̃TZ̃)−1AT{A(Z̃TZ̃)−1AT}−1AΘ̂, where Θ̂ = (Z̃TZ̃)−1Z̃TỸ is the estimator of Θ without the restriction, with Z̃ = (I − S)Ẑ and Ỹ = (I − S)Y.
Similarly, let Θ̂1 and α̂1(·) be the estimators of Θ and α(·) under the alternative hypothesis, and let RSS1 be the corresponding residual sum of squares. Following Fan and Huang (2005), we define a profile least-square-based ratio test statistic Tn based on RSS0 and RSS1.
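As a concrete sketch, the restricted estimator above and a GLR-type normalization of Tn can be computed as follows; the scaling of the statistic follows the generalized likelihood ratio form of Fan and Huang (2005) and is one plausible choice, not necessarily the paper's exact definition.

```python
import numpy as np

def restricted_estimator(Theta, Z_tilde, A):
    """Theta_0 = Theta - (Z~'Z~)^{-1} A' {A (Z~'Z~)^{-1} A'}^{-1} A Theta,
    the profile estimator under H0: A Theta = 0 (Section 3.1)."""
    G = np.linalg.inv(Z_tilde.T @ Z_tilde)
    return Theta - G @ A.T @ np.linalg.solve(A @ G @ A.T, A @ Theta)

def profile_ratio_stat(rss0, rss1, n):
    # GLR-type normalization borrowed from Fan and Huang (2005); the
    # exact scaling of T_n in the paper may differ.
    return 0.5 * n * (rss0 - rss1) / rss1
```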
Under their set-up, Fan and Huang (2005) showed that the statistic Tn is the profile likelihood ratio when the errors are normally distributed. In the present situation, because of the effect of measurement error on the variables, no central χ2 limit as in Fan and Huang (2005) is available. However, we can still prove that 2Tn has an asymptotic noncentral χ2 distribution under the alternative hypothesis of (3.1), which we summarize in the following theorem.
Theorem 5
Suppose that Assumptions 1–5 in the Appendix are satisfied and nb2r+2 → 0 as n → ∞. Under the alternative hypothesis of (3.1),
where ωi, 1 ≤ i ≤ l, are the eigenvalues of a matrix determined by Σ and Σ1, and χ21 is the central χ2 distribution with one degree of freedom. Furthermore, let Σ̂1 and Σ̂ be consistent estimators of Σ1 and Σ, respectively. Then 2ϱnTn has an asymptotic noncentral χ2 distribution with l degrees of freedom and noncentrality parameter λ = σ−2ϱ limn→∞ nΘTAT(AΣ−1AT)−1AΘ.
In a similar way, we may construct the Wald test for hypothesis (3.1) as Wn = Θ̂TAT(AΣ̂1AT)−1AΘ̂ and demonstrate that Wn and 2ϱnTn have the same asymptotic distribution under the alternative hypothesis. These properties can therefore be used to calculate the power of the proposed tests.
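A sketch of the Wald statistic; whether a factor of n appears explicitly depends on how Σ̂1 is scaled, which we leave to the user:

```python
import numpy as np

def wald_stat(Theta_hat, A, Sigma1_hat):
    """W_n = (A Theta_hat)' (A Sigma1_hat A')^{-1} (A Theta_hat)."""
    r = A @ Theta_hat
    return float(r @ np.linalg.solve(A @ Sigma1_hat @ A.T, r))
```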
Example 1.3 (continued). Generalize (1.6) to a more flexible model:
Write Θ = (βT, ζT, γT)T. The hypothesis (1.7) is then equivalent to
(3.2) |
where A = (1p1, −1p1, 0) and 1p1 is the p1-variate vector with all entries 1. This is of the form (3.1). As a consequence, the proposed profile least-square-based ratio test and Wald test can be applied to this hypothesis.
For hypothesis (3.2), one may also propose a Wald-type statistic: Wn(h) = Θ̂TAT(AΣ̂hAT)−1AΘ̂, where Σ̂h = Σ̂−1(σ̂2 + β̂TΣ̂eβ̂). It can be proved that 2ϱnTn and Wn have the same asymptotic χ2 distribution.
3.2 Tests for the nonparametric part and a wild bootstrap version
It is also of interest to check whether the varying-coefficient functions α(u) in model (1.2) are of parametric form. Specifically, we consider the following hypothesis:
where γ is an unknown vector and αi(·, ·), i = 1, 2, ···, q, are known functions.
For simplicity of presentation, we test the homogeneity:
H0: αj(u) ≡ αj, j = 1, ···, q, for some unknown constants αj. (3.3)
Let α̃1, ···, α̃q and Θ̃ be the profile estimators under H0, and let the weighted residual sum of squares under H0 be defined with weight functions wi(·) ≥ 0. In general, the weight function w has compact support, designed to reduce boundary effects on the test statistics. When σ2(Z, X, U) = υ(Z, X, U)σ2 for some known function υ(Z, X, U), we may choose wi = υ−1(Zi, Xi, Ui). See Fan, Zhang and Zhang (2001) for a similar argument.
Under the general alternative, in which all the coefficient functions are allowed to vary with the random variable U, we use the local likelihood method to obtain the estimators β̂ and α̂(U). The corresponding weighted residual sum of squares is
In a similar way as in Section 3.1, we propose a generalized likelihood ratio (GLR) statistic:
(3.4) |
Under mild assumptions, one can derive the asymptotic distribution of TGLR, which can be used to obtain the asymptotic level of the test. See Fan, Zhang and Zhang (2001) for a related discussion.
These arguments can be applied to the following partially parametric null hypothesis:
The only difference is the definition of RSS(H0): we use the profile likelihood procedure to estimate the constant coefficients αi, i = 1, 2, ···, r, and Θ, and use the profile local linear procedure to estimate the nonparametric components αi(·), i = r + 1, ···, q, under the null hypothesis.
Although the asymptotic level of TGLR is available, TGLR may not perform well when sample sizes are small. For this reason and for practical purposes, we suggest using a bootstrap procedure. To be specific, let
be the residuals based on the estimators (2.3) and (2.4) of the parametric and nonparametric parts, respectively. We use the wild bootstrap (Wu, 1986; Härdle and Mammen, 1993) to calculate the critical values for the test TGLR. Let τ be a random variable with distribution function F(·) such that Eτ = 0, Eτ2 = 1 and E|τ|3 < ∞. We generate the bootstrap residuals ε*i = τiε̂i, where τi is independent of ε̂i, and define the bootstrap version T*GLR of TGLR based on the bootstrap sample in which the responses are rebuilt from the fitted null model plus ε*i, for i = 1, 2, ···, n. From the distribution of T*GLR we obtain the (1 − α) quantile and reject the parametric hypothesis if TGLR exceeds it.
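Any τ satisfying these moment conditions may be used; one common concrete choice, not dictated by the paper, is Mammen's two-point distribution, sketched below.

```python
import numpy as np

def wild_bootstrap_residuals(resid, rng):
    """One wild-bootstrap draw eps*_i = tau_i * eps_hat_i with tau from
    Mammen's two-point law: E tau = 0, E tau^2 = 1, E|tau|^3 < infinity."""
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))    # P(tau = (1 - sqrt(5))/2)
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    tau = np.where(rng.uniform(size=resid.shape[0]) < p, a, b)
    return tau * resid

# Critical value: rebuild Y* from the fitted null model plus eps*,
# recompute T_GLR on each of B bootstrap samples, and use the empirical
# (1 - alpha) quantile of the bootstrap statistics.
```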
4 Numerical Examples
4.1 Performance of the proposed estimators
We conducted simulation experiments to illustrate the finite-sample performance of the proposed estimators and tests. The simulated data were generated from the following model:
Y = β1ξ(V) + β2W1 + β3W2 + α1(U)X1 + α2(U)X2 + ε, (4.1)
W1 and W2 are bivariate normal with marginal mean zero, marginal variance 1 and a given correlation, while X1 and X2 are independent normal variables with mean zero and variance 0.8. The unobserved covariate ξ is related to the auxiliary variables (η, V) through ξ(V) = 3V − 2 cos(4πV) and η = ξ(V) + e. V is a uniform random variable on [0, 1] and U is a uniform random variable on [0, 3]. The errors ε and e are independent of each other and are normal variables with mean 0 and variances σε2 and σe2, respectively. The varying-coefficient functions are
(4.2) |
(4.3) |
(4.4) |
where ϱ is chosen from the set {0.0, 0.2, 0.5, 0.7, 1.0}.
The sample size was 100. We generated 500 data sets in each case, applying to each simulated sample the bootstrap test proposed for the parametric part based on 500 bootstrap repetitions. The Gaussian kernel was used in this example. The bandwidth h was chosen by the leave-one-out cross-validation method described in Section 2.4, and the bandwidth b was selected as b = συn−1/3, where συ is the sample standard deviation of V.
We consider four scenarios; in the first three, the error variances σε2 and σe2 are held fixed.
(i) β = (0, c − 1, 1)T for c ∈ {0, 0.1, 0.2, 0.25, 0.5, 0.7, 1.0}, and α1(u) and α2(u) are given in (4.2) and (4.4);
(ii) β = (0, −0.8, 1)T, and α1(u) and α2(u) are given in (4.3) and (4.4) with ϱ ∈ {0.0, 0.2, 0.5, 0.7, 1.0};
(iii) β = (0.2, −1, 1)T, and α1(u) and α2(u) are the same as in (ii);
(iv) the setting is the same as in (iii), but the signal-to-noise ratio r varies from 0.3 to 0.8 in steps of 0.1.
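A data-generating sketch for one replication of scenario (iii) is given below; the correlation of (W1, W2), the error standard deviations, and the varying-coefficient functions used here are illustrative placeholders rather than the exact values specified in (4.2)–(4.4).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

rho = 0.5                                    # assumed correlation of (W1, W2)
W = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
X = rng.normal(0.0, np.sqrt(0.8), size=(n, 2))
V = rng.uniform(0.0, 1.0, n)
U = rng.uniform(0.0, 3.0, n)

xi = 3.0 * V - 2.0 * np.cos(4.0 * np.pi * V)
sigma_e = sigma_eps = 0.5                    # assumed error SDs
eta = xi + rng.normal(0.0, sigma_e, n)       # observed surrogate of xi

beta = np.array([0.2, -1.0, 1.0])            # scenario (iii)
alpha1, alpha2 = np.sin(U), np.cos(U)        # stand-ins for (4.3) and (4.4)

Y = (beta[0] * xi + beta[1] * W[:, 0] + beta[2] * W[:, 1]
     + alpha1 * X[:, 0] + alpha2 * X[:, 1]
     + rng.normal(0.0, sigma_eps, n))
```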
The corresponding results are presented in Tables 1–4, in which we display the estimated values and associated standard errors, standard deviations, and coverage probabilities based on the benchmark estimator (i.e., all covariates measured exactly), the proposed estimator, and the naive estimator (with ηi directly used as the covariate). We summarize our findings as follows.
When β1 = 0 (scenarios (i) and (ii)), all estimates are close to the true values regardless of the nonparametric functions α1(u) and α2(u). The differences among the estimates based on the three methods are slight and can be ignored. However, when β1 = 0.2, the estimates of β1 based on the naive method have severe biases and the associated coverage probabilities are substantially smaller than 0.95. These biases were not reduced when the sample size was increased (not listed here). The proposed estimation procedure, in contrast, performs well. On the other hand, the estimates of β2 and β3 are similar for the three methods. From Table 4, we can see that the naive estimator of β1 has zero coverage probability when r = 0.3, while the proposed estimator has fairly reasonable coverage probabilities. As r increases, the coverage probabilities of the proposed estimator become closer to the nominal level, which indicates that the proposed method is promising.
Table 4.
r | Method | β1 Est. | SE | SD | COV | β2 Est. | SE | SD | COV | β3 Est. | SE | SD | COV
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0.30 | B | 0.194 | 0.038 | 0.046 | 0.970 | −0.987 | 0.145 | 0.159 | 0.980 | 1.005 | 0.149 | 0.156 | 0.980
 | P | 0.173 | 0.040 | 0.042 | 0.850 | −0.986 | 0.150 | 0.152 | 0.970 | 1.005 | 0.150 | 0.149 | 0.970
 | N | 0.073 | 0.025 | 0.025 | 0.000 | −0.976 | 0.159 | 0.137 | 0.920 | 0.996 | 0.153 | 0.135 | 0.920
0.40 | B | 0.199 | 0.043 | 0.044 | 0.950 | −1.002 | 0.123 | 0.147 | 0.970 | 1.002 | 0.131 | 0.147 | 0.980
 | P | 0.185 | 0.045 | 0.041 | 0.890 | −1.003 | 0.124 | 0.144 | 0.970 | 0.999 | 0.130 | 0.144 | 0.980
 | N | 0.096 | 0.029 | 0.028 | 0.060 | −1.002 | 0.127 | 0.134 | 0.970 | 1.002 | 0.135 | 0.134 | 0.960
0.50 | B | 0.199 | 0.043 | 0.042 | 0.960 | −0.981 | 0.134 | 0.142 | 0.960 | 1.020 | 0.122 | 0.143 | 0.970
 | P | 0.190 | 0.044 | 0.040 | 0.920 | −0.981 | 0.133 | 0.139 | 0.950 | 1.019 | 0.127 | 0.141 | 0.950
 | N | 0.116 | 0.033 | 0.030 | 0.200 | −0.988 | 0.137 | 0.133 | 0.930 | 1.020 | 0.132 | 0.135 | 0.930
0.60 | B | 0.194 | 0.035 | 0.040 | 0.970 | −0.993 | 0.136 | 0.141 | 0.950 | 1.025 | 0.137 | 0.138 | 0.930
 | P | 0.192 | 0.038 | 0.040 | 0.950 | −0.994 | 0.140 | 0.140 | 0.950 | 1.025 | 0.138 | 0.137 | 0.910
 | N | 0.131 | 0.028 | 0.032 | 0.450 | −0.998 | 0.152 | 0.137 | 0.910 | 1.020 | 0.138 | 0.134 | 0.940
0.70 | B | 0.198 | 0.038 | 0.039 | 0.960 | −1.018 | 0.137 | 0.133 | 0.970 | 1.004 | 0.140 | 0.131 | 0.930
 | P | 0.194 | 0.040 | 0.038 | 0.950 | −1.017 | 0.138 | 0.132 | 0.960 | 1.004 | 0.142 | 0.131 | 0.930
 | N | 0.152 | 0.038 | 0.033 | 0.660 | −1.021 | 0.142 | 0.130 | 0.920 | 1.004 | 0.144 | 0.128 | 0.920
0.80 | B | 0.203 | 0.036 | 0.038 | 0.950 | −1.001 | 0.142 | 0.132 | 0.930 | 1.005 | 0.136 | 0.132 | 0.960
 | P | 0.203 | 0.038 | 0.038 | 0.950 | −1.002 | 0.143 | 0.131 | 0.940 | 1.005 | 0.135 | 0.132 | 0.960
 | N | 0.172 | 0.035 | 0.035 | 0.870 | −1.002 | 0.147 | 0.131 | 0.920 | 1.000 | 0.136 | 0.131 | 0.930
4.2 Performance of the proposed tests
We now explore the numerical performance of the proposed tests. First, we test a hypothesis on the parametric component of the form:
H0: ATβ = 0   versus   H1: ATβ ≠ 0, (4.5)
where A = (1, 1, 1)T, c is a value from the set {0, 0.1, 0.2, …, 0.7, 1}, β = (0.2, c − 1.2, 1)T, and α1(·) and α2(·) are the same as those in scenario (i). The same models and error distributions as in Section 4.1 are used.
The power to detect H1 was calculated by using critical values from the chi-squared approximation and from the wild bootstrap approximation. To compare test performances, the powers of the tests based on the benchmark estimator, the proposed estimator and the naive estimator are presented. In implementing the wild bootstrap method, we generated 500 bootstrap samples from the model
here ε*i = τiε̂i is a wild bootstrap residual, with ε̂i = Yi − {β̂1ξ̂i + β̂2W1i + β̂3W2i + α̂1(Ui)X1i + α̂2(Ui)X2i} and τi drawn from a two-point distribution with mean zero and unit variance. Using this bootstrap sample, we can calculate the bootstrap statistics T*n and W*n and take their 95% percentiles as the critical values for the proposed tests at the significance level 0.05.
The power of Tn for scenario (iii) with β1 = 0.2 is presented in Table 5. Note that the power is actually the empirical level when c = 0. All empirical levels are close to the nominal level 0.05, and the empirical levels based on the wild bootstrap procedure are consistently smaller than those based on the χ2 approximation and closer to the nominal level. The same conclusions hold for β1 = 0 (not listed here). As c increases to 0.7, the powers of the two tests based on the χ2 approximation exceed 0.92. Similar conclusions can be drawn for the Wald test, whose simulation results are also given in Table 5.
Table 5.
c | Approx. | Tn: B | Tn: P | Tn: N | Wald: B | Wald: P | Wald: N
---|---|---|---|---|---|---|---
0 | Asym | 0.060 | 0.070 | 0.080 | 0.050 | 0.050 | 0.080
 | Boot | 0.050 | 0.060 | 0.060 | 0.060 | 0.060 | 0.060
0.10 | Asym | 0.150 | 0.140 | 0.150 | 0.130 | 0.130 | 0.150
 | Boot | 0.130 | 0.100 | 0.080 | 0.130 | 0.120 | 0.080
0.20 | Asym | 0.190 | 0.220 | 0.120 | 0.150 | 0.150 | 0.120
 | Boot | 0.170 | 0.160 | 0.080 | 0.190 | 0.180 | 0.080
0.25 | Asym | 0.350 | 0.340 | 0.240 | 0.320 | 0.310 | 0.240
 | Boot | 0.290 | 0.280 | 0.180 | 0.310 | 0.300 | 0.180
0.50 | Asym | 0.740 | 0.710 | 0.530 | 0.670 | 0.660 | 0.530
 | Boot | 0.700 | 0.630 | 0.500 | 0.720 | 0.630 | 0.500
0.70 | Asym | 0.940 | 0.940 | 0.870 | 0.930 | 0.920 | 0.870
 | Boot | 0.920 | 0.890 | 0.800 | 0.930 | 0.890 | 0.800
1.00 | Asym | 1.000 | 1.000 | 1.000 | 0.990 | 0.990 | 1.000
 | Boot | 0.990 | 0.990 | 0.960 | 0.990 | 0.990 | 0.960
We further study the numerical performance of the test by checking the nonparametric component. We consider the following hypothesis:
(4.6) |
The simulation results obtained by using the wild bootstrap approximation to choose the critical values are shown in Table 6. When ϱ = 0, the results are the empirical levels, which are close to the nominal level. The power is greater than 0.99 when ϱ = 0.5. Table 6 also indicates that the power is monotonically increasing in ϱ.
Table 6.
ϱ | B | P | N |
---|---|---|---
0 | 0.060 | 0.050 | 0.080
0.05 | 0.110 | 0.140 | 0.160
0.10 | 0.240 | 0.260 | 0.250
0.15 | 0.410 | 0.360 | 0.360
0.20 | 0.520 | 0.510 | 0.500
0.50 | 0.990 | 0.990 | 1.000
0.70 | 1.000 | 1.000 | 1.000
4.3 Real data example
To illustrate the proposed estimation method, we consider a dataset from a Duchenne muscular dystrophy (DMD) study; see Andrews and Herzberg (1985) for a detailed discussion of the dataset. The dataset contains 209 observations corresponding to blood samples from 192 patients (17 patients have two samples), collected in a project to develop a screening program for female relatives of boys with DMD. The program's goal was to inform a woman of her chances of being a carrier based on serum markers as well as her family pedigree. Another question of interest is whether age should be taken into account in the analysis. Enzyme levels were measured in known carriers (75 samples) and in a group of non-carriers (134 samples). The serum marker creatine kinase (ck) is inexpensive to obtain, while the marker lactate dehydrogenase (ld) is very expensive. It is of interest to predict the value of ld using the level of ck, carrier status and patient age.
We consider the following model: Y = β0 + β1Z1 + β2Z2 + g(U), where Z1 = ck is measured with error, Z2 = carrier status is exactly measured, U is age and Y denotes the observed level of lactate dehydrogenase. We handle the measurement error in Z1 by calibrating Z1 via regression on U. The estimates and associated standard errors based on the naive and proposed methods are as follows: β̂0,naive = 4.6057(0.113), β̂1,naive = 0.1509(0.027), β̂2,naive = 0.2269(0.055); β̂0,n = 4.4296(0.329), β̂1,n = 0.1775(0.042), and β̂2,n = 0.3702(0.050). The estimated curves of the nonparametric function g(u) are provided in Figure 1. Accounting for measurement error, the estimate of β1 increases by about 17.2%, and the associated standard error increases by about 55%. The estimate of β2 also increases when measurement error is taken into account. The patterns of the nonparametric curves are similar, with only slight differences.
5 Discussion
We developed estimation and inference procedures for the SVCPLM when parts of the parametric components are unobserved. The procedures are derived by incorporating ancillary information to calibrate the mismeasured variables and by applying the profile least-squares principle.
In some cases we may not have an auxiliary variable η, but we can observe two or more independent replicates of V. For instance, when two measurements V1 = ξ + u1 and V2 = ξ + u2, with E(u1|V2) = 0 and E(u2|V1) = 0, are available, we can estimate ξ by
because E(V1|V2 = υ) = E(V2|V1 = υ) = E(ξ|V = υ). The proposed procedure applies to this situation as well, and similar results to those presented in this paper can be obtained for the resulting estimator.
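As a sketch, this replicate-based calibration can average the two cross-regressions; the Nadaraya–Watson estimator below is one convenient choice of nonparametric smoother, not prescribed by the paper.

```python
import numpy as np

def nw(x_eval, x, y, b):
    """Nadaraya-Watson estimate of E(y | x) at the points x_eval."""
    K = lambda t: np.exp(-0.5 * t**2)
    w = K((np.asarray(x_eval)[:, None] - x[None, :]) / b)
    return (w @ y) / w.sum(axis=1)

def xi_from_replicates(V1, V2, b):
    """xi_hat_i = {E_hat(V1 | V2 = V2_i) + E_hat(V2 | V1 = V1_i)} / 2,
    exploiting E(V1 | V2 = v) = E(V2 | V1 = v) = E(xi | V = v)."""
    return 0.5 * (nw(V2, V2, V1, b) + nw(V1, V1, V2, b))
```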
It is of interest to extend the proposed methodology to a more general semiparametric model:
where G(·) is a link function. The study of this model with mismeasured components of Z needs further investigation and is beyond the scope of this paper.
Table 1.
ϱ | Method | β1 Est. | SE | SD | COV | β2 Est. | SE | SD | COV | β3 Est. | SE | SD | COV
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | B | −0.000 | 0.030 | 0.027 | 0.912 | −0.990 | 0.133 | 0.125 | 0.938 | 1.000 | 0.138 | 0.126 | 0.924
 | P | −0.001 | 0.031 | 0.028 | 0.918 | −0.990 | 0.133 | 0.125 | 0.936 | 1.000 | 0.139 | 0.126 | 0.930
 | N | −0.001 | 0.026 | 0.024 | 0.904 | −0.990 | 0.133 | 0.125 | 0.940 | 0.999 | 0.138 | 0.126 | 0.926
0.1 | B | 0.002 | 0.028 | 0.027 | 0.920 | −0.890 | 0.139 | 0.126 | 0.910 | 1.003 | 0.129 | 0.126 | 0.936
 | P | 0.003 | 0.030 | 0.028 | 0.938 | −0.890 | 0.139 | 0.126 | 0.912 | 1.003 | 0.129 | 0.126 | 0.938
 | N | 0.003 | 0.025 | 0.024 | 0.938 | −0.890 | 0.140 | 0.126 | 0.912 | 1.004 | 0.129 | 0.126 | 0.938
0.2 | B | 0.000 | 0.029 | 0.027 | 0.936 | −0.802 | 0.144 | 0.126 | 0.894 | 0.991 | 0.138 | 0.126 | 0.932
 | P | −0.000 | 0.030 | 0.028 | 0.934 | −0.802 | 0.145 | 0.126 | 0.898 | 0.991 | 0.138 | 0.126 | 0.940
 | N | −0.001 | 0.027 | 0.024 | 0.912 | −0.801 | 0.145 | 0.126 | 0.896 | 0.992 | 0.138 | 0.125 | 0.934
0.25 | B | −0.001 | 0.029 | 0.027 | 0.930 | −0.749 | 0.128 | 0.127 | 0.936 | 0.990 | 0.138 | 0.127 | 0.940
 | P | −0.000 | 0.031 | 0.028 | 0.928 | −0.748 | 0.129 | 0.127 | 0.938 | 0.990 | 0.139 | 0.127 | 0.938
 | N | −0.000 | 0.024 | 0.024 | 0.948 | −0.749 | 0.128 | 0.126 | 0.938 | 0.990 | 0.138 | 0.126 | 0.940
0.5 | B | −0.002 | 0.029 | 0.027 | 0.926 | −0.513 | 0.143 | 0.126 | 0.918 | 1.000 | 0.131 | 0.126 | 0.936
 | P | −0.002 | 0.031 | 0.028 | 0.928 | −0.513 | 0.143 | 0.126 | 0.920 | 1.001 | 0.131 | 0.126 | 0.936
 | N | −0.001 | 0.026 | 0.024 | 0.926 | −0.513 | 0.143 | 0.126 | 0.918 | 1.001 | 0.131 | 0.126 | 0.936
0.7 | B | 0.000 | 0.029 | 0.027 | 0.936 | −0.299 | 0.140 | 0.127 | 0.916 | 0.996 | 0.138 | 0.127 | 0.924
 | P | 0.001 | 0.029 | 0.028 | 0.930 | −0.298 | 0.140 | 0.127 | 0.920 | 0.997 | 0.138 | 0.127 | 0.926
 | N | 0.001 | 0.025 | 0.024 | 0.934 | −0.299 | 0.140 | 0.126 | 0.914 | 0.996 | 0.138 | 0.126 | 0.926
1 | B | 0.001 | 0.030 | 0.027 | 0.934 | 0.002 | 0.137 | 0.127 | 0.942 | 1.008 | 0.144 | 0.127 | 0.908
 | P | 0.001 | 0.031 | 0.028 | 0.934 | 0.002 | 0.137 | 0.127 | 0.938 | 1.008 | 0.145 | 0.127 | 0.906
 | N | 0.001 | 0.026 | 0.024 | 0.928 | 0.002 | 0.138 | 0.127 | 0.938 | 1.007 | 0.144 | 0.127 | 0.908
Note: “Est.” is the simulation mean; “SE” is the mean of the estimated standard errors; “SD” is the standard deviation of the estimates; and “COV” is the coverage probability of a nominal 95% confidence interval. The methods are “B” for the benchmark method, “P” for the proposed method, and “N” for the naive method.
Table 2.
ϱ | Method | β1 Est. | SE | SD | COV | β2 Est. | SE | SD | COV | β3 Est. | SE | SD | COV
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | B | 0.000 | 0.034 | 0.033 | 0.920 | −0.795 | 0.154 | 0.153 | 0.948 | 0.995 | 0.160 | 0.154 | 0.946
 | P | 0.001 | 0.036 | 0.035 | 0.922 | −0.795 | 0.154 | 0.154 | 0.950 | 0.995 | 0.159 | 0.154 | 0.950
 | N | 0.000 | 0.030 | 0.029 | 0.928 | −0.794 | 0.154 | 0.153 | 0.948 | 0.994 | 0.160 | 0.154 | 0.950
0.05 | B | 0.002 | 0.028 | 0.027 | 0.920 | −0.790 | 0.139 | 0.125 | 0.910 | 1.003 | 0.129 | 0.126 | 0.936
 | P | 0.003 | 0.030 | 0.028 | 0.938 | −0.790 | 0.139 | 0.125 | 0.908 | 1.004 | 0.129 | 0.126 | 0.938
 | N | 0.003 | 0.025 | 0.024 | 0.938 | −0.790 | 0.140 | 0.125 | 0.908 | 1.004 | 0.129 | 0.126 | 0.938
0.1 | B | 0.000 | 0.029 | 0.027 | 0.936 | −0.802 | 0.144 | 0.126 | 0.894 | 0.991 | 0.138 | 0.125 | 0.928
 | P | −0.000 | 0.030 | 0.028 | 0.936 | −0.802 | 0.144 | 0.126 | 0.898 | 0.991 | 0.138 | 0.126 | 0.938
 | N | −0.001 | 0.027 | 0.024 | 0.916 | −0.801 | 0.145 | 0.125 | 0.896 | 0.992 | 0.138 | 0.125 | 0.932
0.15 | B | −0.001 | 0.029 | 0.027 | 0.932 | −0.799 | 0.128 | 0.126 | 0.936 | 0.990 | 0.138 | 0.126 | 0.938
 | P | −0.000 | 0.031 | 0.028 | 0.930 | −0.798 | 0.128 | 0.126 | 0.938 | 0.990 | 0.138 | 0.126 | 0.938
 | N | −0.000 | 0.024 | 0.024 | 0.950 | −0.799 | 0.128 | 0.126 | 0.938 | 0.990 | 0.138 | 0.126 | 0.936
0.2 | B | −0.002 | 0.029 | 0.027 | 0.926 | −0.813 | 0.143 | 0.126 | 0.918 | 1.001 | 0.131 | 0.126 | 0.934
 | P | −0.002 | 0.031 | 0.028 | 0.932 | −0.813 | 0.143 | 0.126 | 0.918 | 1.001 | 0.131 | 0.126 | 0.934
 | N | −0.001 | 0.026 | 0.024 | 0.924 | −0.813 | 0.143 | 0.126 | 0.916 | 1.001 | 0.131 | 0.126 | 0.934
0.5 | B | 0.000 | 0.029 | 0.027 | 0.936 | −0.799 | 0.140 | 0.126 | 0.916 | 0.996 | 0.138 | 0.126 | 0.924
 | P | 0.001 | 0.029 | 0.028 | 0.930 | −0.798 | 0.140 | 0.126 | 0.922 | 0.997 | 0.138 | 0.127 | 0.926
 | N | 0.001 | 0.025 | 0.024 | 0.934 | −0.799 | 0.140 | 0.126 | 0.914 | 0.996 | 0.138 | 0.126 | 0.926
0.7 | B | 0.001 | 0.030 | 0.027 | 0.932 | −0.798 | 0.137 | 0.127 | 0.942 | 1.008 | 0.144 | 0.127 | 0.906
 | P | 0.001 | 0.031 | 0.028 | 0.934 | −0.798 | 0.137 | 0.127 | 0.938 | 1.008 | 0.145 | 0.127 | 0.906
 | N | 0.001 | 0.026 | 0.024 | 0.930 | −0.798 | 0.138 | 0.126 | 0.938 | 1.007 | 0.144 | 0.126 | 0.908
Table 3.
ϱ | Method | β1 Est. | SE | SD | COV | β2 Est. | SE | SD | COV | β3 Est. | SE | SD | COV
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | B | 0.200 | 0.034 | 0.035 | 0.936 | −0.995 | 0.154 | 0.162 | 0.956 | 0.995 | 0.160 | 0.163 | 0.964
 | P | 0.195 | 0.038 | 0.038 | 0.920 | −0.995 | 0.158 | 0.167 | 0.958 | 0.994 | 0.163 | 0.168 | 0.958
 | N | 0.156 | 0.031 | 0.030 | 0.684 | −0.995 | 0.158 | 0.160 | 0.952 | 0.995 | 0.165 | 0.160 | 0.946
0.05 | B | 0.202 | 0.028 | 0.029 | 0.948 | −0.990 | 0.139 | 0.138 | 0.938 | 1.003 | 0.129 | 0.138 | 0.950
 | P | 0.197 | 0.032 | 0.032 | 0.944 | −0.994 | 0.144 | 0.144 | 0.938 | 1.004 | 0.139 | 0.144 | 0.948
 | N | 0.159 | 0.026 | 0.025 | 0.602 | −0.991 | 0.149 | 0.133 | 0.910 | 1.004 | 0.140 | 0.133 | 0.926
0.1 | B | 0.200 | 0.029 | 0.029 | 0.950 | −1.002 | 0.144 | 0.138 | 0.924 | 0.991 | 0.138 | 0.138 | 0.956
 | P | 0.194 | 0.032 | 0.033 | 0.948 | −1.005 | 0.151 | 0.144 | 0.922 | 0.991 | 0.147 | 0.144 | 0.954
 | N | 0.155 | 0.028 | 0.025 | 0.560 | −1.004 | 0.153 | 0.133 | 0.904 | 0.991 | 0.148 | 0.133 | 0.912
0.15 | B | 0.199 | 0.029 | 0.029 | 0.950 | −0.999 | 0.128 | 0.138 | 0.960 | 0.990 | 0.138 | 0.138 | 0.960
 | P | 0.194 | 0.033 | 0.032 | 0.938 | −0.998 | 0.135 | 0.144 | 0.958 | 0.986 | 0.144 | 0.144 | 0.956
 | N | 0.155 | 0.025 | 0.025 | 0.542 | −0.997 | 0.138 | 0.133 | 0.938 | 0.989 | 0.145 | 0.133 | 0.948
0.2 | B | 0.198 | 0.029 | 0.029 | 0.948 | −1.013 | 0.143 | 0.138 | 0.942 | 1.001 | 0.131 | 0.138 | 0.958
 | P | 0.193 | 0.033 | 0.032 | 0.936 | −1.012 | 0.148 | 0.144 | 0.938 | 0.997 | 0.136 | 0.144 | 0.954
 | N | 0.155 | 0.027 | 0.025 | 0.536 | −1.016 | 0.154 | 0.133 | 0.920 | 0.998 | 0.141 | 0.133 | 0.932
0.5 | B | 0.200 | 0.029 | 0.029 | 0.956 | −0.999 | 0.140 | 0.138 | 0.954 | 0.996 | 0.138 | 0.138 | 0.944
 | P | 0.195 | 0.032 | 0.032 | 0.952 | −1.000 | 0.147 | 0.144 | 0.954 | 0.993 | 0.144 | 0.144 | 0.956
 | N | 0.157 | 0.026 | 0.025 | 0.582 | −1.000 | 0.153 | 0.133 | 0.898 | 0.996 | 0.145 | 0.133 | 0.920
0.7 | B | 0.201 | 0.030 | 0.029 | 0.952 | −0.998 | 0.137 | 0.139 | 0.958 | 1.008 | 0.144 | 0.139 | 0.938
 | P | 0.196 | 0.033 | 0.033 | 0.946 | −0.997 | 0.143 | 0.145 | 0.962 | 1.008 | 0.146 | 0.145 | 0.956
 | N | 0.157 | 0.028 | 0.025 | 0.594 | −1.000 | 0.146 | 0.134 | 0.932 | 1.006 | 0.151 | 0.134 | 0.912
Acknowledgments
Zhou's research was partially supported by the National Natural Science Foundation of China (No. 10471140 and No. 10571169). Liang's research was supported by NIH/NIAID grants AI62247, AI59773 and AI50020. The authors thank the Co-Editors, the former Co-Editor Professor Jianqing Fan, and the referees for constructive comments that substantially improved an earlier version of this paper.
Appendix
In this Appendix, we list assumptions and outline proofs of the main results. The following technical assumptions are imposed.
A.1 Assumptions
1. The random variable U has bounded support. Its density function fu(·) is Lipschitz continuous and bounded away from 0 on its support. The density function fυ(υ) of the random variable V is continuously differentiable and bounded away from 0 and infinity on its finite support. The functions {αi(u), i = 1, 2, ···, q} have continuous second derivatives.
2. The q × q matrix E(XXT|U) is non-singular for each U. All elements of the matrices E(XXT|U), E(XXT|U)−1 and E(ZXT|U) are Lipschitz continuous.
3. The kernel functions K(·) and L(·) are density functions with compact support [−1, 1].
4. There is an s > 2 such that E||Z||2s < ∞ and E||X||2s < ∞, and there is some δ < 2 − s−1 such that n2δ−1h → ∞ and n2δ−1bk → ∞, k = 1, 2, ···, p1, where bk is the bandwidth in the local polynomial estimator ξ̂k(·) of ξk(·).
5. nh8 → 0 and nh2/(log n)2 → ∞.
A.2 Preliminary Lemmas
Write .
Lemma A.1
Suppose that (Zi, Xi, Ui), i = 1, 2, ···, n, are i.i.d. random vectors, E|g(X, Z, U)| < ∞, and E[g(·, ·, u)|U = u] has a continuous second derivative in u. Further assume that E(|g(X, Z, U)|s | Z = z, X = x) < ∞. Let K be a bounded positive function with bounded support satisfying a Lipschitz condition. Given that n2δ−1h → ∞ for some δ < 1 − s−1, we have
Furthermore, assume that E[εi|Zi, Xi, Ui] = 0 and E[|εi|s | Zi, Xi, Ui] < ∞; then
Proof
The first result follows from an argument similar to that of Lemma A.2 of Fan and Huang (2005). The second follows from the first and an argument similar to that of Xia and Li (1999).
Lemma A.2
Suppose that E[g(Z, X, u)|U = u] has a continuous second derivative in u and E|g(X, Z, U)|s < ∞. Under Assumptions 1–5, we have
and
where h(·) is a twice continuous differentiable function.
Proof
Note that can be decomposed as
By Lemma A.1, the first term equals fu(u)E{g(X, Z, u)ξ|U = u}μk + O(cn1) uniformly in u in probability. Recalling the asymptotic expression given in (2.1) and using Lemma A.1, one can show that the second term is O(cn2). This completes the proof of Lemma A.2.
Lemma A.3
Suppose that g(·, ·, u) has a continuous second derivative in u and E|g(X, Z, U)| < ∞. Under Assumptions 1–5, the two quantities concerned are of order O(cn) a.s.
Proof
The proof follows from (2.1) and arguments similar to Lemma A.2.
Lemma A.4
Under Assumptions 1–5, we have
in probability.
Proof
We first prove that
(A.1) |
A direct calculation yields
(A.2) |
On the other hand, Lemma A.3 implies
(A.3) |
A combination of (A.2) and (A.3) implies
(A.4) |
and then
(A.5) |
(A.1) then follows from these arguments.
Note that . The second term, J2, is by Lemma A.3. Write Z̃* = (I − S)Z. We have . It follows from (2.1) that
By an argument similar to that of (A.5), we derive
where ρ̃(Zl, Xl, Ul) can be expressed as
Denote by ρ(Zl, Xl, Ul) the main term of the right hand side of the above formula. Note that E{ρ (Zl, Xl, Ul)|Ul} = 0. By Lemma 3 of Chen, Choi and Zhou (2005) we have
(A.6) |
Furthermore, we can show, in a similar way as that for (A.6), that
These arguments imply that
(A.7) |
We now deal with the term . Note that equals , which can be further decomposed as
This completes the proof of Lemma A.4.
Lemma A.5
Under Assumptions 1–5, we have Z̃T(I − S)(I − S)TZ̃/n → Σ in probability and Σ̂ = n(Z̃TZ̃)−1Z̃T(I − S)(I − S)Z̃(Z̃TZ̃)−1 → Σ.
Proof
The proof of the first result can be finished by arguments similar to those of Lemmas A.2–A.4, while the second one can be proved by arguments similar to Lemma 7.3 of Fan and Huang (2005).
Lemma A.6
Under Assumptions 1–5, we have .
Proof
The proof follows (A.5) and an argument similar to that of Lemma 7.4 of Fan and Huang (2005).
Lemma A.7
g (·) and h(·) are two continuous function vectors. Under Assumptions 1–5, we have and in probability.
Proof
The proof follows from arguments similar to those of Lemma A.2.
Lemma A.8
Under Assumptions 1–5, we have , where ε= (ε1, ···, εn)T.
Proof
By the same arguments as those for (A.3), we have
This formula along with (A.2) yields . A combination of these arguments with Lemma A.7 finishes the proof of Lemma A.8.
A.3 Proof of Theorem 1
Note that Θ̂n can be expressed as (Z̃TZ̃)−1Z̃T(I − S)ZΘ + (Z̃TZ̃)−1Z̃T(I − S)M + (Z̃TZ̃)−1Z̃T(I − S)ε. By Lemma A.8, the third term admits the expansion given there. The first term equals, via Lemma A.4,
By Lemma A.6 and (A.1), it follows that the second term of Θ̂n’s expression is of order in probability. These arguments imply that
This completes the proof of Theorem 1.
A.4 Proof of Theorem 3
By the definition of Ψ̂ (u), we have
where and . It is easy to show that Rn = o(n−1/2) in probability. Note that
It follows from (A.2) that
(A.8) |
and
Furthermore, (A.3) implies that
(A.9) |
We therefore have , where Mu = α(u)TX.
By Taylor expansion and a direct simplification, we have
Hence,
(A.10) |
It follows from (A.8)–(A.10) that can be represented as
By an argument similar to that of Lemma A.8, we have
The proof of Theorem 3 is completed.
A.5 Proof of Theorem 5
The proof is similar to those of Theorems 3.1 and 3.2 of Fan and Huang (2005); we only give a sketch. We first prove that n−1RSS1 = σ2{1 + oP(1)}.
By a procedure similar to that of Theorem 3.2 in Fan and Huang (2005), we can obtain an expression for RSS1 in which M̂i0 is the ith element of M̂0 = S(Y − ZΘ̂). A direct calculation yields
(A.11) |
By (2.1), Theorem 2 and Jensen's inequality, we know that the first term on the right-hand side of (A.11) is bounded by
(A.12) |
which is op(1). A similar argument shows that the second term on the right-hand side of (A.11) is also op(1). We therefore have n−1RSS1 = σ2{1 + oP(1)}.
Furthermore, RSS0 can be decomposed as RSS1 + Q1 + Q2 + Q3, where Q1 = {Z̃(Θ̂ − Θ̂0)}T{Z̃(Θ̂ − Θ̂0)}, Q2 = (Y − M̂ − ẐΘ̂)T{Z̃(Θ̂ − Θ̂0)}, and Q3 = {Z̃(Θ̂ − Θ̂0)}T(Y − M̂ − ẐΘ̂).
Recalling the expression for Θ̂0 and the result given in (A.1), we know that n−1Z̃TZ̃ → Σ in probability and that Q1 converges in distribution. In an analogous way, we can show that Q2 and Q3 are asymptotically negligible in probability. These statements, along with Slutsky's theorem, imply the convergence in distribution of the test statistic. Finally, following the lines of Rao and Scott (1981), we can prove that the limiting distribution can be approximated by the weighted χ2 distribution given in the theorem, and thus complete the proof of Theorem 5.
Contributor Information
Yong Zhou, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China.
Hua Liang, Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642.
References
1. Ahn H, Powell JL. Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econometrics. 1993;58:3–29.
2. Ai C, McFadden D. Estimation of some partially specified nonlinear models. J Econometrics. 1997;76:1–37.
3. Andrews DF, Herzberg AM. Data: A Collection of Problems from Many Fields for the Student and Research Worker. New York: Springer-Verlag; 1985.
4. Brookmeyer R, Liao J. Statistical models for reconstructing infection curves. In: Jewell NP, Dietz K, Farewell VT, editors. AIDS Epidemiology: Methodological Issues. Boston: Birkhäuser; 1992. pp. 39–60.
5. Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. J Amer Statist Assoc. 2000;95:888–902.
6. Cai Z, Naik PA, Tsai CL. De-noised least squares estimators: an application to estimating advertising effectiveness. Statistica Sinica. 2000;10:1231–1241.
7. Carroll RJ, Fan J, Gijbels I, Wand MP. Generalized partially linear single-index models. J Amer Statist Assoc. 1997;92:477–489.
8. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models. 2nd ed. New York: Chapman and Hall; 2006.
9. Chen G, Choi Y, Zhou Y. Nonparametric estimation for structural change points in volatility for time series. J Econometrics. 2005;126:79–114.
10. Cui H, He X, Zhu L. On regression estimators with de-noised variables. Statistica Sinica. 2002;12:1191–1205.
11. Das M. Instrumental variable estimators of nonparametric models with discrete endogenous regressors. J Econometrics. 2005;124:335–361.
12. Engle RF, Granger CWJ, Rice J, Weiss A. Semiparametric estimates of the relation between weather and electricity sales. J Amer Statist Assoc. 1986;81:310–320.
13. Fan JQ, Gijbels I. Local Polynomial Modelling and Its Applications. New York: Chapman and Hall; 1996.
14. Fan JQ, Huang T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli. 2005;11:1031–1057.
15. Fan JQ, Zhang C, Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon. Ann Statist. 2001;29:153–193.
16. Fan JQ, Zhang W. Statistical estimation in varying coefficient models. Ann Statist. 1999;27:1491–1518.
17. Fuller WA. Measurement Error Models. New York: John Wiley and Sons; 1987.
18. Härdle W, Liang H, Gao JT. Partially Linear Models. Heidelberg: Physica-Verlag; 2000.
19. Härdle W, Mammen E. Comparing nonparametric versus parametric regression fits. Ann Statist. 1993;21:1926–1947.
20. Hastie TJ, Tibshirani R. Varying-coefficient models. J Roy Statist Soc Ser B. 1993;55:757–796.
21. Lee LF, Sepanski JH. Estimation of linear and nonlinear errors-in-variables models using validation data. J Amer Statist Assoc. 1995;90:130–140.
22. Li Q. Semiparametric estimation of partially linear models for dependent data with generated regressors. Econometric Theory. 2002;19:625–645.
23. Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-in-variables model. Ann Statist. 1999;27:1519–1535.
24. Masry E. Multivariate local polynomial regression for time series: uniform strong consistency and rates. J Time Ser Anal. 1996;17:571–599.
25. Pagan A. Econometric issues in the analysis of regressions with generated regressors. International Economic Review. 1984;25:221–247.
26. Powell JL. Semiparametric estimation of bivariate latent variable models. Discussion paper, Social Systems Research Institute, University of Wisconsin; 1987.
27. Rao JNK, Scott AJ. The analysis of categorical data from complex sample surveys: chi-squared tests for goodness of fit and independence in two-way tables. J Amer Statist Assoc. 1981;76:221–230.
28. Robinson PM. Root-N-consistent semiparametric regression. Econometrica. 1988;56:931–954.
29. Severini TA, Staniswalis JG. Quasi-likelihood estimation in semiparametric models. J Amer Statist Assoc. 1994;89:501–511.
30. Speckman P. Kernel smoothing in partial linear models. J Roy Statist Soc Ser B. 1988;50:413–436.
31. Sepanski JH, Carroll RJ. Semiparametric quasi-likelihood and variance function estimation in measurement error models. J Econometrics. 1993;58:223–256.
32. Sepanski JH, Lee LF. Semiparametric estimation of nonlinear errors-in-variables models with validation study. J Nonparametric Statist. 1995;4:365–394.
33. Sepanski JH, Knickerbocker R, Carroll RJ. A semiparametric correction for attenuation. J Amer Statist Assoc. 1994;89:1366–1373.
34. Wang CY, Pepe MS. Expected estimating equations to accommodate covariate measurement error. J Roy Statist Soc Ser B. 2000;62:509–524.
35. Wu CFJ. Jackknife, bootstrap and other resampling methods in regression analysis. Ann Statist. 1986;14:1261–1343.
36. Xia Y, Li WK. On the estimation and testing of functional-coefficient linear models. Statistica Sinica. 1999;9:737–757.
37. Zhang W, Lee SY, Song X. Local polynomial fitting in semiparametric varying coefficient models. J Multivariate Anal. 2002;82:166–188.