Abstract
By allowing the regression coefficients to change with certain covariates, the class of varying coefficient models offers a flexible approach to modeling nonlinearity and interactions between covariates. This paper proposes a novel estimation procedure for the varying coefficient models based on local ranks. The new procedure provides a highly efficient and robust alternative to the local linear least squares method, and can be conveniently implemented using an existing R software package. Theoretical analysis and numerical simulations both reveal that the gain of the local rank estimator over the local linear least squares estimator, measured by the asymptotic mean squared error or the asymptotic mean integrated squared error, can be substantial. In the normal error case, the asymptotic relative efficiency for estimating both the coefficient functions and the derivative of the coefficient functions is above 96%; even in the worst case scenarios, the asymptotic relative efficiency has a lower bound of 88.96% for estimating the coefficient functions, and a lower bound of 89.91% for estimating their derivatives. The new estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails due to infinite random error variance. We establish the large sample theory of the proposed procedure by utilizing results from generalized U-statistics, whose kernel function may depend on the sample size. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting, and demonstrate that it can accurately estimate the asymptotic covariance matrix.
Keywords: Asymptotic relative efficiency, Local linear regression, Local rank, Varying coefficient model
1 Introduction
As introduced in Cleveland, Grosse and Shyu (1992) and Hastie and Tibshirani (1993), the varying coefficient model provides a natural and useful extension of the classical linear regression model by allowing the regression coefficients to depend on certain covariates. Due to its flexibility in exploring dynamic features that may exist in the data and its easy interpretation, the varying coefficient model has been widely applied in many scientific areas. It has also experienced rapid developments in both theory and methodology; see Fan and Zhang (2008) for a comprehensive survey. Fan and Zhang (1999) proposed a two-step estimation procedure for the varying coefficient model when the coefficient functions have possibly different degrees of smoothness. Kauermann and Tutz (1999) investigated the use of varying coefficient models for diagnosing the lack-of-fit of regression, regarding the varying coefficient model as an alternative to a parametric null model. Cai, Fan and Li (2000) developed a more efficient estimation procedure for varying coefficient models in the framework of generalized linear models. As special cases of varying coefficient models, time-varying coefficient models are particularly appealing in longitudinal studies, survival analysis and time series analysis, since they allow one to explore the time-varying effects of covariates on the response. Pioneering works on novel applications of time-varying coefficient models to longitudinal data include Brumback and Rice (1998), Hoover et al. (1998), Wu et al. (1998) and Fan and Zhang (2000), among others. For more details, readers are referred to Fan and Li (2006) and the references therein. Time-varying coefficient models are also popular in modeling and predicting nonlinear time series data and survival data; see Fan and Zhang (2008) for related literature.
Estimation procedures in the aforementioned papers are built on either local least squares type or local likelihood type methods. Although these estimators remain asymptotically normal for a large class of random error distributions, their efficiency can deteriorate dramatically when the true error distribution deviates from normality. Furthermore, these estimators are very sensitive to outliers. Even a few outlying data points may introduce undesirable artificial features in the estimated functions. These considerations motivate us to develop a novel local rank estimation procedure that is highly efficient, robust and computationally simple. In particular, the proposed local rank regression estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails to consistently estimate the regression coefficient functions due to infinite random error variance, which occurs for instance when the random error has a Cauchy distribution.
The new approach can substantially improve upon the commonly used local linear least squares procedure for a wide class of error distributions. Theoretical analysis reveals that the asymptotic relative efficiency (ARE), measured by the asymptotic mean squared error (or the asymptotic mean integrated squared error), of the local rank regression estimator in comparison with the local linear least squares estimator has an expression that is closely related to that of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample t-test. However, different from the two-sample test scenario, where the efficiency is completely determined by the asymptotic variance, in the current setting of estimating an infinite-dimensional parameter both bias and variance contribute to the asymptotic efficiency. The value of the ARE is often significantly greater than one. For example, the ARE is 167% for estimating the regression coefficient functions when the random error has a t3 distribution, 240% for the exponential error distribution, and 493% for the lognormal error distribution.
A striking feature of the local rank procedure is that its pronounced efficiency gain comes with only a little loss when the random error actually has a normal distribution, in which case the ARE of the local rank regression estimator relative to the local linear least squares estimator is above 96% for estimating both the coefficient functions and their derivatives. For estimating the regression coefficient functions, the ARE has a sharp lower bound of 88.96%, which implies that the efficiency loss is at most 11.04% in the worst case scenario. For estimating the first derivative of the regression coefficient functions, the ARE possesses a lower bound of 89.91%. Kim (2007) developed a quantile regression procedure for varying coefficient models when the random errors are assumed to have a certain quantile equal to zero. She used the regression splines method and derived the convergence rate, but the lack of an asymptotic normality result precludes a comparison of relative efficiency. On the other hand, one may extend the local quantile regression approach (Yu and Jones, 1998) to varying coefficient models. However, this is expected to yield an estimator that still suffers from loss of efficiency and may have near-zero ARE relative to the local linear least squares estimator in the worst case scenario.
The new estimator proposed in this paper minimizes a convex objective function based on local ranks. The implementation of the minimization can be conveniently carried out using existing functions in the R statistical software package via a simple algorithm (§4.1). The objective function has the form of a generalized U-statistic whose kernel varies with the sample size. Under some mild conditions, we establish the asymptotic representation of the proposed estimator and further prove its asymptotic normality. We derive the formula of the asymptotic relative efficiency of the local rank estimator relative to the local linear least squares estimator, which confirms the efficiency advantage of the local rank approach. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting; and demonstrate that it can accurately estimate the asymptotic covariance matrix.
This paper is organized as follows. Section 2 presents the local rank procedure for estimating the varying coefficient models. Section 3 discusses its large sample properties and proposes a resampling method for estimating the asymptotic covariance matrix. In Section 4, we address issues related to practical implementation and present Monte Carlo simulation results. We further illustrate the proposed procedure by analyzing an environmental data set. Regularity conditions and technical proofs are presented in the Appendix.
2 Local rank estimation procedure
Let Y be a response variable, and U and X be the covariates. The varying coefficient model is defined by
Y = a0(U) + XTa(U) + ε,  (1)
where a0(·) and a(·) are both unknown smooth functions. The random error ε has probability density function g(·) with finite Fisher information, i.e., ∫ {g(x)}^{−1} g′(x)² dx < ∞. In this paper, it is assumed that U is a scalar and X is a p-dimensional vector. The proposed procedures can be extended to the case of multivariate U, with more complicated notation, by following the same ideas as in this paper.
Suppose that {Ui, Xi, Yi}, i = 1, …, n, is a random sample from model (1). Write Xi = (Xi1, …, Xip)T and a(·) = (a1(·), …, ap(·))T. For u in a neighborhood of any given u0, we locally approximate the coefficient function by a Taylor expansion
am(u) ≈ am(u0) + a′m(u0)(u − u0), m = 0, 1, …, p.  (2)
Denote α1 = a0(u0), α2 = a′0(u0), βm = am(u0) and βp+m = a′m(u0), for m = 1, …, p. Based on the above approximation, we obtain the residual for estimating Yi at Ui = u0:

ei = Yi − α1 − α2(Ui − u0) − Σ_{m=1}^{p} {βm + βp+m(Ui − u0)}Xim.  (3)
We define the local rank objective function to be
Qn(β, α2) = [n(n − 1)]^{−1} ΣΣ_{i≠j} |ei − ej| Kh(Ui − u0)Kh(Uj − u0),  (4)
where β = (β1, …, βp, βp+1, …, β2p)T, and for a given kernel function K(·) and a bandwidth h, Kh(t) = h^{−1}K(t/h). Note that Qn(β, α2) does not depend on α1 because α1 cancels in ei − ej. The objective function Qn(β, α2) is a local version of Gini’s mean difference, which is a classical measure of concentration or dispersion (David, 1998). Without the kernel weights, [n(n − 1)]^{−1} ΣΣ_{i≠j} |ei − ej| is the global rank objective function that leads to the classical rank estimator in linear models based on Wilcoxon scores. Rank-based statistical procedures have played a fundamental role in nonparametric analysis of linear models due to their high efficiency and robustness. We refer to the review paper of McKean (2004) for many useful references.
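To make (4) concrete, a brute-force evaluation of the objective takes only a few lines of R. The helper below is our own illustrative sketch, not part of any package; it forms the residuals (3) and the kernel-weighted pairwise sum.

```r
# Sketch: evaluate Q_n(beta, alpha2) in (4) by brute force (O(n^2) pairs).
local_rank_objective <- function(beta, alpha2, Y, X, U, u0, h,
                                 K = function(t) pmax(0.75 * (1 - t^2), 0)) {
  n <- length(Y); p <- ncol(X)
  # residuals e_i in (3); alpha1 is omitted since it cancels in e_i - e_j
  e <- as.vector(Y - alpha2 * (U - u0) - X %*% beta[1:p] -
                   ((U - u0) * X) %*% beta[(p + 1):(2 * p)])
  w <- K((U - u0) / h) / h                   # K_h(U_i - u0)
  D <- abs(outer(e, e, "-")) * outer(w, w)   # |e_i - e_j| K_h K_h
  sum(D) / (n * (n - 1))                     # diagonal terms are zero
}
```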
For any given u0, minimizing Qn(β, α2) yields the local Wilcoxon rank estimator of (a(u0)T, a′(u0)T)T, where a′(u0) = (a′1(u0), …, a′p(u0))T. Denote the minimizer of Qn(β, α2) by (β̂T, α̂2)T. Then for m = 1, …, p, âm(u0) = β̂m and â′m(u0) = β̂p+m.
In the sequel, we also use the vector notation â(u0) = (â1(u0), …, âp(u0))T and â′(u0) = (â′1(u0), …, â′p(u0))T when convenient.
The location parameter a0(u0) needs to be estimated separately. This is analogous to the global rank estimation of the intercept in the linear regression model. To make the intercept identifiable, it is essential to impose an additional location constraint on the random errors. We adopt the commonly used constraint that εi has median zero. Given (β̂T, α̂2)T, we estimate a0(u0) by α̂1, the value of α1 that minimizes
Σ_{i=1}^{n} |Yi − α1 − α̂2(Ui − u0) − Σ_{m=1}^{p} {β̂m + β̂p+m(Ui − u0)}Xim| Kh(Ui − u0),  (5)
which is a local version of a weighted L1-norm objective function.
3 Theoretical Properties
3.1 Large sample distributions
In this subsection, we investigate the asymptotic properties of β̂ and α̂2. The main challenge comes from the non-smoothness of the objective function Qn(β, α2). To overcome this difficulty, we first derive an asymptotic representation of β̂ and α̂2 via a quadratic approximation of Qn(β, α2), which holds uniformly in a local neighborhood of the true parameter values. Aided by this asymptotic representation, we further establish the asymptotic normality of the local rank estimator.
Let us begin with some new notation. Let γn = (nh)^{−1/2}, and define
Let be the value of that minimizes the following reparametrized objective function
(6)
Let H = diag(1, h) ⊗ Ip, where ⊗ denotes the Kronecker product and Ip denotes the p × p identity matrix. Then it can easily be seen that
We next show that the non-smooth objective function can be locally approximated by a quadratic function. Let μi = ∫ t^i K(t) dt, i = 1, 2, and νi = ∫ t^i K²(t) dt, i = 0, 1, 2. In this paper, we assume that the kernel function K(·) is symmetric. This is not restrictive, considering that most commonly used kernel functions, such as the Epanechnikov kernel K(t) = 0.75(1 − t²)I(|t| < 1), are symmetric. We use Sn to denote the gradient function of the reparametrized objective function (6). More specifically,
and
Furthermore, we consider the following quadratic function:
(7)
where
(8)
Here 0 denotes a matrix (or vector) of zeros whose dimension is determined by the context, τ = ∫ g²(t) dt is the Wilcoxon constant, and g(·) is the density function of the random error ε.
Lemma 3.1
Suppose that Conditions (C1) — (C4) in the Appendix hold. Then ∀ ε > 0, ∀ c > 0,
where || · || denotes the Euclidean norm.
Lemma 3.1 implies that the non-smooth objective function can be uniformly approximated by a quadratic function in a neighborhood around 0. In the Appendix, it is also shown that the minimizer of the objective function is asymptotically within an o(1) neighborhood of the minimizer of the quadratic approximation (7). This further allows us to derive the asymptotic distribution.
The local linear Wilcoxon estimator of a(u0) = (a1(u0), …, ap(u0))T is â(u0). The theorem below provides an asymptotic representation of â(u0) and the asymptotic normal distribution. Let Sn(0, 0) = (Sn11(0, 0)T, Sn12(0, 0)T)T, where Sn11(0, 0) and Sn12(0, 0) are both p × 1 vectors.
Theorem 3.2
Suppose that Conditions (C1) — (C4) in the Appendix hold. Then we have the following asymptotic representation
(9)
where f(u) is the density function of U. Furthermore,
(10)
in distribution, where .
Remark
For the estimators of the derivatives of the coefficient functions, we have the following asymptotic representations:
(11)

(12)
Following a proof similar to that of Theorem 3.2 in the Appendix, it can be shown that α̂2 and â′(u0) are both asymptotically normal. The proof of the asymptotic normality of α̂2 and â′(u0) is given in the technical report version of this paper (Wang, Kai and Li, 2009).
3.2 Asymptotic relative efficiency
We now compare the estimation efficiency of the local rank estimator (denoted by âR(u0)) with that of the local linear least squares estimator (denoted by âLS(u0)) for estimating a(u0) in the varying coefficient model. To measure efficiency, we consider both the asymptotic mean squared error (MSE) at a given u0 and the asymptotic mean integrated squared error (MISE). When evaluating both criteria, we plug in the theoretical optimal bandwidth.
Zhang and Lee (2000) give the asymptotic MSE of âLS(u0) for estimating a(u0):
where σ² = var(ε) is assumed to be finite and positive. Thus, the theoretical optimal bandwidth, which minimizes the asymptotic MSE of âLS(u0), is
(13)
From (10), the asymptotic MSE of the local rank estimator âR(u0) is
The theoretical optimal bandwidth for the local rank estimator thus is
(14)
This allows us to calculate the local asymptotic relative efficiency.
Theorem 3.3
The asymptotic relative efficiency of the local rank estimator with respect to the local linear least squares estimator for a(u0) is

ARE(âR(u0), âLS(u0)) = (12σ²τ²)^{4/5}.

This asymptotic relative efficiency has a lower bound 0.8896, which is attained at the random error density g0(x) = (3/(20√5))(5 − x²)I(|x| ≤ √5).
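As a quick numerical sanity check (ours, not part of the paper's development), one can integrate this density in R to verify that σ² = 1, 12σ²τ² = 0.864, and hence the bound 0.864^{4/5} ≈ 0.8896:

```r
# Least favorable density from Theorem 3.3: g0(x) = 3/(20*sqrt(5)) * (5 - x^2)
g0 <- function(x) 3 / (20 * sqrt(5)) * (5 - x^2)
tau    <- integrate(function(x) g0(x)^2, -sqrt(5), sqrt(5))$value     # 0.2683
sigma2 <- integrate(function(x) x^2 * g0(x), -sqrt(5), sqrt(5))$value # 1
(12 * sigma2 * tau^2)^(4/5)                                           # 0.8896
```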
Remark 1
Alternatively, we may consider the asymptotic relative efficiency obtained by comparing the MISE, defined as MISE(h) = ∫ E||â(u) − a(u)||² w(u) du with a weight function w(·). This provides a global measure. Interestingly, it leads to the same relative efficiency. This follows by observing that the theoretical optimal global bandwidths for the local linear least squares estimator and the local rank estimator are
(15)
and
(16)
respectively. Thus, with the theoretical optimal bandwidths,
Define φ = (12σ²τ²)^{4/5}. Then ARE(u0) = ARE = φ.
Note that the above ARE is closely related to the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample t-test. Table 1 reports the value of φ for some commonly used error distributions. It can be seen that the desirable high efficiency of traditional rank methods for estimating a finite-dimensional parameter completely carries over to the local rank method for estimating an infinite-dimensional parameter.
Table 1. Values of φ (for the coefficient functions) and ψ (for their first derivatives) under common error distributions.

| Error | Normal | Laplace | t3 | Exponential | Log N | Cauchy |
|---|---|---|---|---|---|---|
| φ | 0.9638 | 1.3832 | 1.6711 | 2.4082 | 4.9321 | ∞ |
| ψ | 0.9671 | 1.3430 | 1.5949 | 2.2233 | 4.2661 | ∞ |
By a similar calculation, we can show that the asymptotic relative efficiencies of the local rank estimator to the local linear estimator for a′(u0) and a′(·) both equal ψ = (12σ²τ²)^{8/11}, which has a lower bound 0.8991. This value is also reported in Table 1 for some common error distributions.
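For reference, the normal, Laplace and t3 columns of Table 1 can be reproduced numerically in R. The helper below is our own check; each error density is standardized to unit variance, so φ = (12τ²)^{4/5} and ψ = (12τ²)^{8/11}.

```r
# Compute phi and psi of Table 1 from a variance-one error density g
are <- function(g) {
  tau <- integrate(function(x) g(x)^2, -Inf, Inf)$value  # Wilcoxon constant
  c(phi = (12 * tau^2)^(4/5), psi = (12 * tau^2)^(8/11))
}
are(dnorm)                                          # 0.9638 0.9671
are(function(x) exp(-sqrt(2) * abs(x)) / sqrt(2))   # Laplace: 1.3832 1.3430
are(function(x) sqrt(3) * dt(sqrt(3) * x, df = 3))  # t3:      1.6711 1.5949
```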
Remark 2
We may also apply the local median approach (Yu and Jones, 1998) to estimate the coefficient functions and their first derivatives. Similarly, we can prove that such estimators are asymptotically normal. The ARE of the local median estimator versus the local linear least squares estimator is closely related to that of the sign test versus the t-test. It is known that the ARE of the sign test versus the t-test for the normal distribution is 2/π ≈ 0.64. Thus we expect the efficiency loss of the local median procedure to be substantial for normal random errors.
3.3 Asymptotic normality of α̂1
Following (5), α̂1 is the value of α1 that minimizes
Similarly to Lemma 3.1, we can establish the following local quadratic approximation, which holds uniformly in a neighborhood around 0:
(17)
where
(18)
This further leads to an asymptotic representation of α̂1:
(19)
The theorem below gives the asymptotic distribution of α̂1.
Theorem 3.4
Under the conditions of Theorem 3.2, we have
3.4 Estimation of the standard errors
To make statistical inference based on the local rank methodology, one needs to estimate the standard error of the resulting estimator. As indicated by Theorem 3.2, the asymptotic covariance matrix of the local rank estimator is rather complex and involves unknown functions. Here we propose a standard error estimator based on the simple resampling method of Jin, Ying and Wei (2001).
Let V1, …, Vn be independent and identically distributed nonnegative random variables with mean 1/2 and variance 1. We consider a stochastic perturbation of (4):
Q̄n(β, α2) = [n(n − 1)]^{−1} ΣΣ_{i≠j} |ei − ej| Kh(Ui − u0)Kh(Uj − u0)(Vi + Vj),  (20)
where ei is defined in (3). In Q̄n(β, α2), the data {Yi, Ui, Xi} are considered to be fixed, and the randomness comes from the Vi’s. Let (β̄T, ᾱ2)T be the value of (βT, α2)T that minimizes Q̄n(β, α2). It is easy to obtain (β̄T, ᾱ2)T by applying the simple algorithm described in Section 4.1.
Jin, Ying and Wei (2001) established the validity of the resampling method when the objective function has a U-statistic structure. Although their theory covers many important applications, it requires that the U-statistic have a fixed kernel. We extend their result to our setting, where the kernel of the U-statistic varies with the sample size due to nonparametric smoothing. Let ā(u0) be the local rank estimator of a(u0) based on the perturbed objective function (20), i.e., the subvector consisting of the first p components of β̄. Its asymptotic normality is given in the theorem below.
Theorem 3.5
Under the conditions of Lemma 3.1, conditional on almost surely every sequence of data {Yi, Ui, Xi},
in distribution.
This theorem suggests that to estimate the asymptotic covariance matrix of â(u0), one can repeatedly perturb (4) by generating a large number of independent random samples of V1, …, Vn. For each perturbed objective function, one solves for ā(u0). The sample covariance matrix of the ā(u0)’s over a large number of independent perturbations provides a good approximation. The accuracy of the resulting standard error estimate will be examined in the next section.
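In outline, the procedure might be organized as in the sketch below (our notation throughout): fit_local_rank stands for any routine that minimizes the perturbed objective (20), for instance the pseudo-observation algorithm of Section 4.1 with each pair (i, j) additionally weighted by Vi + Vj, and the Gamma(0.25, 2) choice of Vi (mean 1/2, variance 1) follows Section 4.3.

```r
# Sketch: resampling standard errors for a-hat(u0) at a single point u0.
resample_se <- function(Y, X, U, u0, h, fit_local_rank, B = 1000) {
  est <- replicate(B, {
    V <- rgamma(length(Y), shape = 0.25, scale = 2)  # mean 1/2, variance 1
    fit_local_rank(Y, X, U, u0, h, V = V)            # returns the p-vector a-bar(u0)
  })
  apply(est, 1, sd)                                  # componentwise standard errors
}
```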
The perturbed estimator has conditional bias equal to zero. It has been found that the standard bootstrap method, which resamples from the empirical distribution of the data, also estimates the bias as zero when estimating nonparametric curves (Hall and Kang, 2001). It is possible to use more delicate bootstrap techniques to estimate the bias of a nonparametric curve estimator. Although some of these ideas may be adapted to the method of perturbing the objective function, this is beyond the scope of our paper and is not pursued further here.
4 Numerical Studies
4.1 A pseudo-observation algorithm
The local rank estimator can be obtained by applying an efficient and reliable algorithm. Note that the local rank estimator (β̂T, α̂2)T can be computed by fitting a weighted L1 regression, without intercept, on the pseudo-observations (zij, Yi − Yj) with weights wij = K((Ui − u0)/h)K((Uj − u0)/h), where zij = ((Xi − Xj)T, (Ui − u0)XiT − (Uj − u0)XjT, Ui − Uj)T, 1 ≤ i < j ≤ n. Given (β̂T, α̂2)T, the estimator of a0(u0) can be obtained by another weighted L1 regression of the partial residuals Yi* = Yi − α̂2(Ui − u0) − Σ_{m=1}^{p} {β̂m + β̂p+m(Ui − u0)}Xim on the constant 1, with weights wi = K((Ui − u0)/h), 1 ≤ i ≤ n. Many statistical software packages can implement weighted L1 regression. In our numerical studies, we use the function “rq” in the R package quantreg.
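A minimal R implementation of this two-step algorithm might look as follows; the function name is ours, and quantreg::rq with tau = 0.5 carries out both weighted L1 fits.

```r
library(quantreg)

# Sketch: local rank fit at a point u0 via the pseudo-observation algorithm.
local_rank_fit <- function(Y, X, U, u0, h) {
  p <- ncol(X)
  K <- function(t) pmax(0.75 * (1 - t^2), 0)   # Epanechnikov kernel
  w <- K((U - u0) / h)
  idx <- which(w > 0)                          # only locally weighted points matter
  pr <- t(combn(idx, 2)); i <- pr[, 1]; j <- pr[, 2]
  # Step 1: weighted L1 regression (no intercept) on pairwise pseudo-observations
  Z <- cbind(X[i, , drop = FALSE] - X[j, , drop = FALSE],
             (U[i] - u0) * X[i, , drop = FALSE] - (U[j] - u0) * X[j, , drop = FALSE],
             U[i] - U[j])
  theta <- coef(rq(Y[i] - Y[j] ~ Z - 1, tau = 0.5, weights = w[i] * w[j]))
  # Step 2: a0(u0) as the weighted median of the partial residuals, cf. (5)
  r <- as.vector(Y[idx] - cbind(X[idx, , drop = FALSE],
                                (U[idx] - u0) * X[idx, , drop = FALSE],
                                U[idx] - u0) %*% theta)
  a0 <- coef(rq(r ~ 1, tau = 0.5, weights = w[idx]))
  list(a0 = unname(a0), a = theta[1:p], a_deriv = theta[(p + 1):(2 * p)])
}
```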
4.2 Bandwidth selection
Bandwidth selection is an important issue for all statistical models that involve nonparametric smoothing. Although we have derived the theoretical optimal bandwidths for the local rank estimator in (14) and (16), it is difficult to estimate them by the “plug-in” method because they involve many unknown quantities.
We propose below an alternative bandwidth selection method that is practically feasible. This approach is based on the relationship between the optimal bandwidths hR,opt and hLS,opt of the local rank and the local linear least squares estimators. From Section 3.2, we see that

hR,opt = (12σ²τ²)^{−1/5} hLS,opt.  (21)
Thus, we can first use existing bandwidth selectors (e.g., Zhang and Lee, 2000) to estimate hLS,opt. The error variance σ² can be estimated from the residuals; in particular, when robustness is of concern, it can be estimated using the MAD of the residuals. Hettmansperger and McKean (1998, p. 181) discussed in detail how to estimate τ, which can be done with the function “wilcoxontau” in the R software developed by Terpstra and McKean (2005). Finally, we plug these estimates into (21) to obtain the bandwidth for the local rank estimator.
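The resulting rule is easy to code. In the sketch below (ours), tauhat_inv denotes an estimate of (12τ²)^{−1/2}, such as the output of a wilcoxontau-type routine, and h_LS is the bandwidth chosen for the local least squares fit.

```r
# Sketch of the two-step bandwidth rule (21)
bandwidth_rank <- function(h_LS, residuals, tauhat_inv) {
  sigma_hat <- mad(residuals)              # robust estimate of sigma
  tau_hat <- 1 / (sqrt(12) * tauhat_inv)   # invert (12*tau^2)^(-1/2)
  h_LS * (12 * sigma_hat^2 * tau_hat^2)^(-1/5)
}
```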
Alternatively, instead of the above two-step procedure, we may directly use computationally intensive cross-validation to choose the bandwidth for the local rank procedure. Note that under outlier contamination, standard cross-validation can produce badly biased bandwidths because it is adversely influenced by extreme prediction errors. A robust cross-validation method, such as that developed by Leung (2005), is therefore preferred.
4.3 Examples
We conduct Monte Carlo simulations to assess the finite sample performance, and illustrate the proposed methodology on a real environmental data set. Throughout the analysis, we use the Epanechnikov kernel K(u) = 0.75(1 − u²)I(|u| < 1).
Example 1
We generate random data from the model

Y = a0(U) + a1(U)X1 + a2(U)X2 + ε,

where a0(u) = exp(2u − 1), a1(u) = 8u(1 − u) and a2(u) = 2 sin²(2πu). The covariate U follows a uniform distribution on [0, 1] and is independent of (X1, X2), where the covariates X1 and X2 are standard normal random variables with correlation coefficient 2^{−1/2}. The coefficient functions and the mechanism to generate U and (X1, X2) are the same as those in Cai, Fan and Li (2000). We consider six different error distributions: N(0, 1), Laplace, standard Cauchy, t-distribution with 3 degrees of freedom, the mixture of normals 0.9N(0, 1) + 0.1N(0, 10²), and the lognormal distribution. Except for the Cauchy error, all the generated random errors are standardized to have median 0 and variance 1. We consider sample sizes n = 400 and 800, and conduct 400 simulations for each case.
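For concreteness, one replication of this design can be generated in R as follows (our own sketch; shown for the t3 error, rescaled to unit variance).

```r
set.seed(1)
n <- 400
U <- runif(n)
R <- chol(matrix(c(1, 2^(-1/2), 2^(-1/2), 1), 2, 2))  # corr(X1, X2) = 2^(-1/2)
X <- matrix(rnorm(2 * n), n, 2) %*% R                 # standard normal covariates
eps <- rt(n, df = 3) / sqrt(3)                        # t3 error: median 0, variance 1
Y <- exp(2 * U - 1) + 8 * U * (1 - U) * X[, 1] +
     2 * sin(2 * pi * U)^2 * X[, 2] + eps
```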
We compare the performance of the local rank estimate with the local least squares estimate using the square root of average squared errors (RASE), defined by

RASE = { n_grid^{−1} Σ_{k=1}^{n_grid} Σ_{m=0}^{2} [âm(uk) − am(uk)]² }^{1/2},

where {uk: k = 1, …, n_grid} is a set of grid points uniformly placed on [0, 1] with n_grid = 200. The sample mean and standard deviation of the RASEs over 400 simulations are presented in Figure 1 and Figure 2, for sample sizes n = 400 and 800, respectively. The two figures clearly demonstrate that the local rank estimator performs almost as well as the local least squares estimator for normal random error, and has smaller RASE for the other, heavier-tailed error distributions. The efficiency gain can be substantial; see, for example, the mixture normal case, where the observed relative efficiency of the local rank estimator versus the local least squares estimator is above 2 for most choices of bandwidth. For Cauchy random error, the local rank procedure still yields a consistent estimator, whereas the local least squares estimator is inconsistent, which is reflected by the extremely large values of RASE for the local least squares estimator.
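In code, the RASE criterion is a one-liner; here a_hat and a_true are n_grid × 3 matrices holding the estimated and true coefficient functions evaluated on the grid (our own helper).

```r
# RASE for coefficient-function estimates evaluated on a grid
rase <- function(a_hat, a_true) sqrt(sum((a_hat - a_true)^2) / nrow(a_hat))
```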
Figure 3 depicts the estimated coefficient functions for the normal random error and the mixture normal random error for a typical sample, selected so that its RASE value is the median of the 400 RASE values. For this typical sample, we observe that the local rank estimate is almost identical to the local least squares estimate for normal random error, but falls much closer to the truth than the local least squares estimate for mixture normal random error. Figure 4 plots the estimated coefficient functions for all 400 simulations when the random error has the mixture normal distribution. It is clear that the local rank estimator has smaller variance. In these two figures, we set the bandwidth to be the theoretical optimal one hopt, calculated using (15) and (16), for both the local rank estimator and the local least squares estimator.
Finally, we evaluate the resampling method (Section 3.4) for estimating the standard errors. We randomly perturb the objective function 1000 times; each time the random variables Vi in (20) are generated from the Gamma(0.25, 2) distribution (mean 1/2, variance 1). Table 2 summarizes the simulation results at the three points u0 = 0.25, 0.50 and 0.75. In the table, ‘SD’ denotes the standard deviation of the 400 estimates âm(u0) and can be regarded as the true standard error; ‘SE(Std(SE))’ denotes the mean (standard deviation) of the 400 standard errors estimated by the resampling method. Bandwidths are set to the optimal ones. We observe that the resampling method estimates the standard error very accurately.
Table 2. Standard deviations (SD) of the 400 local rank estimates and the resampling-based standard errors, SE(Std(SE)), for â1(u0) and â2(u0).

| Error | u0 | â1(u0): SD | â1(u0): SE(Std(SE)) | â2(u0): SD | â2(u0): SE(Std(SE)) |
|---|---|---|---|---|---|
| Normal | 0.25 | 0.189 | 0.159(0.032) | 0.197 | 0.160(0.032) |
| Normal | 0.50 | 0.183 | 0.159(0.030) | 0.180 | 0.162(0.031) |
| Normal | 0.75 | 0.191 | 0.162(0.033) | 0.195 | 0.163(0.032) |
| Laplace | 0.25 | 0.175 | 0.151(0.037) | 0.174 | 0.151(0.037) |
| Laplace | 0.50 | 0.168 | 0.153(0.039) | 0.173 | 0.154(0.039) |
| Laplace | 0.75 | 0.168 | 0.150(0.037) | 0.177 | 0.150(0.037) |
| Mixture | 0.25 | 0.095 | 0.107(0.051) | 0.092 | 0.107(0.049) |
| Mixture | 0.50 | 0.095 | 0.109(0.057) | 0.091 | 0.109(0.055) |
| Mixture | 0.75 | 0.095 | 0.108(0.061) | 0.093 | 0.109(0.055) |
| t3 | 0.25 | 0.144 | 0.137(0.039) | 0.145 | 0.138(0.036) |
| t3 | 0.50 | 0.148 | 0.133(0.035) | 0.152 | 0.136(0.037) |
| t3 | 0.75 | 0.158 | 0.137(0.039) | 0.155 | 0.139(0.042) |
| Log N | 0.25 | 0.111 | 0.112(0.047) | 0.112 | 0.114(0.049) |
| Log N | 0.50 | 0.106 | 0.114(0.047) | 0.107 | 0.119(0.050) |
| Log N | 0.75 | 0.118 | 0.117(0.058) | 0.118 | 0.120(0.060) |
Example 2
As an illustration, we now apply the local rank procedure to the environmental data set in Fan and Zhang (1999). The goal is to study the relationship between the levels of pollutants and the number of total hospital admissions for circulatory and respiratory problems on every Friday from January 1, 1994 to December 31, 1995. The response variable is the logarithm of the number of total hospital admissions, and the covariates include the level of sulfur dioxide (X1), the level of nitrogen dioxide (X2) and the level of dust (X3). A scatter plot of the response variable over time is given in Figure 5(a). We analyze this data set using the following varying coefficient model

Y = a0(u) + a1(u)X1 + a2(u)X2 + a3(u)X3 + ε,

where u denotes time and is scaled to the interval [0, 1].
We select the bandwidth via the relation (21). More specifically, we first use 20-fold cross-validation to select a bandwidth ĥLS for the local least squares estimator. We then use the function ‘wilcoxontau’ in the R package for rank regression by Terpstra and McKean (2005) to estimate (12τ²)^{−1/2}, and use the MAD of the residuals to robustly estimate σ. This leads to the selected bandwidth for the local rank estimator, ĥR = 0.26.
The estimated coefficient functions are depicted in Figures 5(b), (c) and (d), where the two dashed curves around the solid line are the estimated function plus/minus twice the standard errors estimated by the resampling method. These two dashed lines can be regarded as a pointwise confidence interval with bias ignored. The figures suggest clearly that the coefficient functions vary with time. The fitted curve is shown in Figure 5(a).
Now we demonstrate the robustness of the local rank procedure. To this end, we artificially perturb the data set by moving the response value of the 68th observation from 5.89 to 6.89, and the response value of the 34th observation from 5.07 to 3.07. We refit the data with both the local least squares procedure and the local rank procedure; see Figure 6. We observe that the local least squares estimates change dramatically due to the presence of these two artificial outliers. In contrast, the local rank estimates are nearly unaffected.
Appendix: Proofs
Regularity conditions
(C1). Assume that {Ui, Xi, Yi} are independent and identically distributed, and that the random error ε and the covariates {U, X} are independent. Furthermore, assume that ε has probability density function g(·) with finite Fisher information, i.e., ∫ {g(x)}^{−1} g′(x)² dx < ∞, and that U has probability density function f(·).
(C2). The function am(·), m = 0, 1,…, p, has continuous second-order derivative in a neighborhood of u0.
(C3). Assume that E(Xi|Ui = u0) = 0 and that Σ(u) = E(XXT|U = u) is continuous at u = u0. The matrix Σ(u0) is positive definite.
(C4). The kernel function K(·) is symmetric about the origin and has a bounded support. Assume that h → 0 and nh² → ∞ as n → ∞.
These conditions are used to facilitate the proofs, but may not be the weakest ones. The assumptions on the random errors in (C1) are the same as those for multiple linear rank regression (Hettmansperger and McKean, 1998). (C2) imposes smoothness requirement on the coefficient functions. In (C3), the assumption E(Xi|Ui = u0) = 0 (also adopted by Kim, 2007) makes the presentation simpler but can be relaxed. It can be shown that the asymptotic normality still holds without this assumption. The conditions on the kernel function and bandwidth in (C4) are common for nonparametric kernel smoothing.
In our proofs, we will use some results on generalized U-statistics, where the kernel function is allowed to depend on the sample size n. The generalized U-statistic has the form Un = [n(n − 1)]^{−1} ΣΣ_{i≠j} Hn(Di, Dj), where D1, …, Dn is a random sample and Hn is symmetric in its arguments, i.e., Hn(Di, Dj) = Hn(Dj, Di). In this paper, Di = (Ui, XiT, Yi)T. Define rn(Di) = E[Hn(Di, Dj)|Di], r̄n = E[rn(Di)], and Ûn = r̄n + 2n^{−1} Σ_{i=1}^{n} {rn(Di) − r̄n}. We will repeatedly use the following lemma, taken from Powell, Stock and Stoker (1989).
Lemma A.1
If E[||Hn(Di, Dj)||²] = o(n), then √n(Un − Ûn) = op(1) and Un = r̄n + op(1).
We need the following two lemmas to prove Lemma 3.1. Denote
and define
Lemma A.2
Suppose that Conditions (C1) — (C4) hold, then An → A, where A is defined in (8).
Proof
We can write . Let
Calculating the expectation by first conditioning on Ui and Uj, An11 becomes
Using Condition (C3), straightforward calculation gives . Let
Using Condition (C3) and noticing that K(·) is symmetric, it can be shown that . By symmetry, . Similarly, we have
Thus . Similarly, we can show that , and
Lemma A.3
Under Conditions (C1) — (C4), we have
Proof
Let , where
Let Hn(Di, Dj) = [Wn(Di, Dj) + Wn(Dj, Di)]/2; then Un = [n(n − 1)]^{−1} ΣΣ_{i≠j} Hn(Di, Dj) has the form of a generalized U-statistic. Note that . Furthermore,
Thus Un = E[Hn(Di, Dj)] + op(1) by Lemma A.1. Furthermore,
The proof is completed by using Lemma A.2.
Proof of Lemma 3.1
In view of Lemma A.3, it follows that
The proof follows along the same lines as the proof of Theorem A.3.7 of Hettmansperger and McKean (1998), using a “diagonal subsequencing” argument and convexity.
Proof of Theorem 3.2
By Lemma 3.1, , where uniformly over any bounded set. Note that is minimized by , and Bn(s1, s2) is minimized by . We first establish the asymptotic representation by following an argument similar to that in Hjort and Pollard (1993). For any constant c > 0, define
then as n → ∞. Let be an arbitrary point outside the ball { }, then we can write , where l > c is a positive constant and 1d denotes a unit vector of length d. Write
By the convexity of , we have
Thus,
If , then for all outside the ball. This implies if then the minimizer of must be inside the ball. Thus
where λ is the smallest eigenvalue of A. Therefore, . This in particular implies the asymptotic representations (9), (11) and (12).
We next show the asymptotic normality of â(u0). From (9), we have
(A.1)
where
(A.2)
By (A.2), let us rewrite , where
We next prove that
(A.3)
Note that we can write , where Hn(Di, Dj) = Wn(Di, Dj) + Wn(Dj, Di) with
Similarly to the arguments in the proof of Lemma A.3, it can be shown that E[||Hn(Di, Dj)||2] = o(n). By Lemma A.1, this implies that since it is easy to check that r̄n = 0. We have
Furthermore,
To prove the asymptotic normality of Sna(0, 0), it is sufficient to check the Lindeberg-Feller condition: ∀ ε > 0, . This can be easily verified by applying the dominated convergence theorem. Next we show that
(A.4)
We may write , where with
Note that
By applying Lemma A.1, it can be shown that . It follows by using the same arguments as those in the proof of Lemma A.2 that
This proves (A.4). By combining (A.3) and (A.4) and using the approximation given in (A.1), we obtain (10).
Proof of Theorem 3.3
A result of Hodges and Lehmann (1956) indicates that the ARE has a lower bound 0.864^{4/5} = 0.8896, with this lower bound being attained at the density g0(x) = (3/(20√5))(5 − x²)I(|x| ≤ √5).
Proof of Theorem 3.4
Let
where , ξ1 ∈ ℝ, ξ2 ∈ ℝp and ξ3 ∈ ℝp. The subgradient of with respect to is
We have , which is the same as the Sn0 defined in (18). Let , then
For any positive constants ci, i = 1, 2, 3, and ∀ ξ1, ξ2, ξ3 such that ||ξ1|| ≤ c1h^{−1}γn, ||ξ2|| ≤ c2γn and ||ξ3|| ≤ c3h^{−1}γn, we have
(A.5)
This can be proved by directly checking the mean and variance. More specifically,
And
By (A.5) and similar proof as that for Lemma 3.1, we have
(A.6)
where . Because the function is convex in its arguments, (A.6) can be strengthened to uniform convergence (convexity lemma, see Pollard 1991), i.e.,
where ℂ is a compact set in ℝ. By Theorem 3.2, , â(u0) − a(u0) = Op(γn) and â′ (u0) − a′(u0) = Op(h−1γn), we thus have
Note that , where and Sn0 are defined in Section 3.3. The quadratic function is minimized by . An argument similar to that for Theorem 3.2 shows that . Thus we have (19). We can write , where
By the Lindeberg-Feller central limit theorem, T1n → N (0, f (u0)ν0/3) in distribution. By checking mean and variance, we have
Combining the above results and using (19), the proof is completed.
To prove Theorem 3.5, we first extend Lemma A.1 to almost sure convergence.
Lemma A.4
If E[||Hn(Di, Dj)||²] = O(h^{−2}), then Un − Ûn = o(1) almost surely and Un = r̄n + o(1) almost surely.
Proof
The proof of Powell, Stock and Stoker (1989) for Lemma A.1 shows that E[||Un − Ûn||²] = O(n^{−2}h^{−2}). By Theorem 1.3.5 of Serfling (1980), . This implies that Un − Ûn = o(1) almost surely. The second result follows by applying the strong law of large numbers to Ûn.
Proof of Theorem 3.5
Let β* and be defined the same as before. We introduce the reparametrized objective function . Let denote the gradient function of , defined similarly to that in Section 3.1. We first show that has a local linear approximation similar to that stated in Lemma A.3. To make the proof concise, we prove this for , where
Let , where and
Note that . Conditional on , this is a weighted average of Vi. Note that
By Lemma A.4, it can be shown that almost surely, where A* = 4τf²(u0) diag(1, μ2) ⊗ Σ(u0). It is also easy to check that almost surely. Thus for almost surely every sequence , Un = γnA*β* + op(1), where op(1) is in the probability space generated by . The proofs of Lemma 3.1 and the asymptotic representation in Theorem 3.2 can be similarly carried out to show that for almost surely every sequence ,
(A.7)
where op(1) is in the probability space generated by , and
The approximation (A.1) can be strengthened to almost sure convergence, i.e.,
(A.8)
Combining (17) and (A.7), we have that for almost surely every sequence ,
Note that
And . We have
where
Lemma A.4 can be used to show that W1 = o(1) almost surely, and a minor extension of Lemma A.4 to third-order U-statistics can be used to show that almost surely. The asymptotic normality of follows by showing that the Lindeberg-Feller condition for triangular arrays holds almost surely. We have, for almost surely every sequence ,
in distribution. This completes the proof.
Footnotes
Lan Wang is Assistant Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455. Email: lan@stat.umn.edu. Bo Kai is a graduate student, Department of Statistics, The Pennsylvania State University, University Park, PA 16802. Email: bokai@psu.edu. Runze Li is Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111. Email: rli@stat.psu.edu. Wang’s research is supported by National Science Foundation grant DMS-0706842. Kai’s research is supported as a research assistant by National Science Foundation grant DMS-0348869. Li’s research is supported by NIDA, NIH grants R21 DA024260 and P50 DA10075. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.
References
- 1. Brumback B, Rice JA. Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves (with discussion). Journal of the American Statistical Association. 1998;93:961–994.
- 2. Cai Z, Fan J, Li R. Efficient Estimation and Inferences for Varying-Coefficient Models. Journal of the American Statistical Association. 2000;95:888–902.
- 3. Cleveland WS, Grosse E, Shyu WM. Local Regression Models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. Pacific Grove, California: Wadsworth & Brooks; 1992. pp. 309–376.
- 4. David HA. Early Sample Measures of Variability. Statistical Science. 1998;13:368–377.
- 5. Fan J, Li R. An Overview on Nonparametric and Semiparametric Techniques for Longitudinal Data. In: Fan J, Koul H, editors. Frontiers in Statistics. London: Imperial College Press; 2006. pp. 277–303.
- 6. Fan J, Zhang W. Statistical Estimation in Varying-Coefficient Models. The Annals of Statistics. 1999;27:1491–1518.
- 7. Fan J, Zhang W. Simultaneous Confidence Bands and Hypothesis Testing in Varying-Coefficient Models. Scandinavian Journal of Statistics. 2000;27:715–731.
- 8. Fan J, Zhang W. Statistical Methods with Varying Coefficient Models. Statistics and Its Interface. 2008;1:179–195. doi: 10.4310/sii.2008.v1.n1.a15.
- 9. Hall P, Kang KH. Bootstrapping Nonparametric Density Estimators with Empirically Chosen Bandwidths. The Annals of Statistics. 2001;29:1443–1468.
- 10. Hastie TJ, Tibshirani RJ. Varying-Coefficient Models (with discussion). Journal of the Royal Statistical Society, Series B. 1993;55:757–796.
- 11. Hettmansperger TP, McKean JW. Robust Nonparametric Statistical Methods. London: Arnold; 1998.
- 12. Hjort NL, Pollard D. Asymptotics for Minimisers of Convex Processes. Preprint; 1993. http://citeseer.ist.psu.edu/hjort93asymptotics.html
- 13. Hodges JL, Lehmann EL. The Efficiency of Some Nonparametric Competitors of the t-Test. The Annals of Mathematical Statistics. 1956;27:324–335.
- 14. Hoover DR, Rice JA, Wu CO, Yang L-P. Nonparametric Smoothing Estimates of Time-Varying Coefficient Models with Longitudinal Data. Biometrika. 1998;85:809–822.
- 15. Jin Z, Ying Z, Wei LJ. A Simple Resampling Method by Perturbing the Minimand. Biometrika. 2001;88:381–390.
- 16. Kauermann G, Tutz G. On Model Diagnostics Using Varying Coefficient Models. Biometrika. 1999;86:119–128.
- 17. Kim M-O. Quantile Regression with Varying Coefficients. The Annals of Statistics. 2007;35:92–108.
- 18. Leung DH. Cross-Validation in Nonparametric Regression with Outliers. The Annals of Statistics. 2005;33:2291–2310.
- 19. McKean JW. Robust Analysis of Linear Models. Statistical Science. 2004;19:562–570.
- 20. Pollard D. Asymptotics for Least Absolute Deviation Regression Estimators. Econometric Theory. 1991;7:186–199.
- 21. Powell JL, Stock JH, Stoker TM. Semiparametric Estimation of Index Coefficients. Econometrica. 1989;57:1403–1430.
- 22. Serfling R. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
- 23. Terpstra J, McKean JW. Rank-Based Analysis of Linear Models Using R. Journal of Statistical Software. 2005;14:1–26.
- 24. Wang L, Kai B, Li R. Local Rank Inference for Varying Coefficient Models. Technical Report, The Methodology Center, The Pennsylvania State University; 2009.
- 25. Wu CO, Chiang CT, Hoover DR. Asymptotic Confidence Regions for Kernel Smoothing of a Varying-Coefficient Model with Longitudinal Data. Journal of the American Statistical Association. 1998;93:1388–1402.
- 26. Yu K, Jones MC. Local Linear Quantile Regression. Journal of the American Statistical Association. 1998;93:228–237.
- 27. Zhang W, Lee SY. Variable Bandwidth Selection in Varying-Coefficient Models. Journal of Multivariate Analysis. 2000;74:116–134.