Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression

Bo Kai; Runze Li; Hui Zou

doi:10.1111/j.1467-9868.2009.00725.x

. Author manuscript; available in PMC: 2011 Jan 1.

Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2010 Jan;72(1):49–69. doi: 10.1111/j.1467-9868.2009.00725.x

Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression

Bo Kai ^*, Runze Li ^†, Hui Zou ^‡,^✉

PMCID: PMC2958780 NIHMSID: NIHMS115056 PMID: 20975930

Abstract

Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called local composite-quantile-regression (CQR) smoothing in order to further improve local polynomial regression. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. Asymptotic relative efficiency of the proposed estimate with respect to the local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the proposed estimates. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method.

Key words and phrases: Asymptotic efficiency, CQR estimator, Kernel function, Local polynomial regression, Nonparametric regression

1 Introduction

Consider the general nonparametric regression model

Y = m (T) + σ (T) ϵ,

(1.1)

where Y is the response variable, T is a covariate, m(T) = E(Y|T), which is assumed to be a smooth nonparametric function, and σ(T) is a positive function representing the standard deviation. We assume ϵ has mean 0 and variance 1. Local polynomial regression is a popular and successful method for nonparametric regression, and it has been well studied in the literature (Fan & Gijbels 1996). By locally fitting a linear (or polynomial) regression model via adaptively weighted least squares, local polynomial regression is able to explore the fine features of the regression function and its derivatives. Although the least squares method is a popular and convenient choice in local polynomial fitting, we may consider using different local fitting methods. For example, in the presence of outliers, one may consider local least absolute deviation (LAD) polynomial regression (Fan, Hu & Truong 1994, Welsh 1996). When the error follows a Laplacian distribution, the local LAD polynomial regression is more efficient than the local least squares polynomial regression. Of course, the local LAD polynomial regression can do much worse than the local least squares polynomial regression in other different settings. The aim of this paper is to develop a new local estimation procedure that can significantly improve upon the classical local polynomial regression for a wide class of error distributions, and has comparable efficiency in the worst case scenario.

Our proposal is built upon the composite-quantile-regression (CQR) estimator recently proposed by Zou & Yuan (2008) for estimating the regression coefficients in the classical linear regression model. Zou & Yuan (2008) show that the relative efficiency of the CQR estimator compared to the least squares estimator is greater than 70% regardless the error distribution. Furthermore, the CQR estimator could be much more efficient and sometimes arbitrarily more efficient than the least squares estimator. These nice theoretical properties of CQR in linear regression motivates us to construct the local CQR smoothers as nonparametric estimates of the regression function and its derivatives.

We make several contributions in this paper.

We propose the local linear CQR estimator for estimating the nonparametric regression function. We establish the asymptotic theory of the local linear CQR estimator and show that, compared with the classical local linear least squares estimator, the new method can significantly improve the estimation efficiency of the local linear least squares estimator for commonly used non-normal error distributions.
We propose the local quadratic CQR estimator for estimating the derivative of the regression function. The asymptotic theory shows that the local quadratic CQR estimator can often drastically improve the estimation efficiency of its local least squares counterpart if the error distribution is non-normal, and at the same time, the loss in efficiency is at most 8.01% in the worst case scenario.
The general asymptotic theory of the local p-polynomial CQR estimator is established. Our theory does not require the error distribution to have a finite variance. Therefore, local CQR estimators can work well even when local polynomial regression fails due to the infinite variance in the noise.

It is a well-known fact that the local linear (polynomial) regression is the best linear smoother in terms of efficiency (Fan & Gijbels 1996). There is no contradiction between this fact and our results, because the proposed local CQR estimator is a nonlinear smoother.

The rest of this paper is organized as follows. In section 2, we introduce the local linear CQR for the nonparametric regression and study its asymptotic properties. In section 3, we propose the local quadratic CQR for estimating the derivative of the nonparametric regression, which is able to further reduce the estimation bias by the local linear CQR. Monte Carlo study and a real data example are presented in section 4. In section 5 we present the general theory of the local p-polynomial CQR and technical proofs.

2 Estimation of regression function

Suppose that (t_i,y_i), i = 1, ⋯, n, is an independent and identically distributed random sample. Consider estimating the value of m(T) at t₀. In local linear regression we first approximate m(t) locally by a linear function m(t) ≈ m(t₀) + m′(t₀)(t − t₀) and then fit a linear model locally in a neighborhood of t₀. Let K(·) be a smooth kernel function, the local linear regression estimators of m(t₀) is â, where

(\hat{a}, \hat{b}) = \underset{a, b}{arg min} \sum_{i = 1}^{n} {y_{i} - a - b (t_{i} - t_{0})}^{2} K (\frac{t_{i} - t_{0}}{h}),

(2.1)

where h is the smoothing parameter. Local linear regression enjoys many good theoretical properties, such as its design adaptation property and high minimax efficiency (Fan & Gijbels 1992). However, local least squares regression breaks down when the error distribution does not have finite second moment, for the estimator is no longer consistent. The local least absolute deviation (LAD) polynomial regression (Fan et al. 1994, Welsh 1996) replaces the least squares loss in (2.1) with the L₁ loss. By doing so, the local LAD estimator can deal with the infinite variance case, but for finite variance cases its relative efficiency compared to the local least squares estimator can be arbitrarily small.

We propose the local linear CQR estimator as an efficient alternative to the local linear regression estimator. Let ρ_{τ_k} (r) = τ_kr − rI (r < 0), k = 1, 2, …, q, be q check loss functions at q quantiles positions: $τ_{k} = \frac{k}{q + 1}$ . In the linear regression model the CQR loss is defined as (Zou & Yuan 2008)

\sum_{k = 1}^{q} \sum_{i = 1}^{n} ρ_{τ_{κ}} (y_{i} - a_{k} - b t_{i}) .

The CQR combines the strength across multiple quantile regressions with forcing a single parameter for “slope”. Since the nonparametric function is approximated by a linear model locally, we consider minimizing the locally weighted CQR loss

\sum_{k = 1}^{q} [\sum_{i = 1}^{n} ρ_{τ_{k}} {y_{i} - a_{k} - b (t_{i} - t_{0})} K (\frac{t_{i} - t_{0}}{h})] .

(2.2)

Denote the minimizer of (2.2) by (â₁, ⋯, â_q, b̂). Then we let

\hat{m} (t_{0}) = \frac{1}{q} \sum_{k = 1}^{q} {\hat{a}}_{k}, and \tilde{m}' (t_{0}) = \hat{b} .

(2.3)

We refer m̂(t₀) to as the local linear CQR estimator of m(t₀). As an estimator of m′(t₀), m̃′(t₀) can be further improved by using the local quadratic CQR estimator which is discussed in the next section.

Remark 1

It is worth mentioning here that although the check loss function is typically used to estimate the conditional quantile function of y given T (see Koenker (2005) and references therein), we simultaneously employ several check functions to estimate the regression (mean) function. So the local CQR smoother is conceptually different from nonparametric quantile regression by local fitting which has been studied in Yu & Jones (1998) and chapter 5 of Fan & Gijbels (1996).

Remark 2

In a short note Koenker (1984) studied the Hogg estimator as the minimizer of the weighted sum of check functions in the framework of parametric linear models. The focus there is to argue that the Hogg estimator is a different way to do L-estimation. The CQR loss can be regarded as a weighted sum of check functions with uniform weights and uniform quantiles $(τ_{k} = \frac{k}{q + 1}, k = 1, 2, \dots, q)$ . When q is large, such a choice leads to nice oracle-like estimators in the oracle model selection theoretic framework (Zou & Yuan 2008). Koenker (1984) did not discuss relative efficiency of the Hogg estimator relative to the least squares estimator. In this work we consider minimizing the locally weighted CQR loss and show that the local CQR smoothers have very interesting asymptotic efficiency properties. To our best knowledge, none of these has been studied in the literature.

2.1 Asymptotic properties

To see why local linear CQR is an efficient alternative to local linear regression, we establish the asymptotic properties of the local linear CQR estimator. Some notation is necessary for the discussion. Let F(·) and f(·) denote the density function and cumulative distribution function of the error distribution, respectively. Denote by f_T(·) the marginal density function of the covariate T. We choose the kernel K(·) as a symmetric density function and let

μ_{j} = \int u^{j} K (u) d u and ν_{j} = \int u^{j} K^{2} (u) d u, j = 0, 1, 2, \dots, .

Define

R_{1} (q) = \frac{1}{q^{2}} \sum_{k = 1}^{q} \sum_{k' = 1}^{q} \frac{τ_{k k'}}{f (c_{k}) f ({c k}_{'})},

(2.4)

where c_k = F⁻¹ (τ_k) and τ_kk′ = τ_k ∧ τ_k′ − τ_kτ_k′. In the following theorem, we present the asymptotic bias, variance and normality of m̂(t₀), whose proof is given in section 5. Let T be the σ-field generated by {T₁, ⋯, T_n}.

Theorem 2.1

Suppose that t₀ is an interior point of the support of f_T(·). Under the regularity conditions (A)—(D) in section 5, if h → 0 and nh → ∞, then the asymptotic conditional bias and variance of the local linear CQR estimator m̂(t₀) are given by

Bias (\hat{m} (t_{0}) | T) = \frac{1}{2} m ″ (t_{0}) μ_{2} h^{2} + o_{p} (h^{2}),

(2.5)

Var (\hat{m} (t_{0}) | T) = \frac{1}{n h} \frac{ν_{0} σ^{2} (t_{0})}{f_{T} (t_{0})} R_{1} (q) + o_{p} (\frac{1}{n h}) .

(2.6)

Furthermore, conditioning on T, we have

\sqrt{n h} {\hat{m} (t_{0}) - m (t_{0}) - \frac{1}{2} m ″ (t_{0}) μ_{2} h^{2}} \overset{ℒ}{\to} N (0, \frac{ν_{0} σ^{2} (t_{0})}{f_{T} (t_{0})} R_{1} (q)) .

(2.7)

where $\overset{ℒ}{\to}$ stands for convergence in distribution.

Remark 3

In the proof given in section 5 we assume the error distribution is symmetric. Without such a condition the asymptotic bias will have a non-varnishing term. The asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. In other words, the symmetric error distribution condition is only used to ensure that the quantity to which the local CQR estimator converges is the conditional mean. This is similar to the situation when using the local LAD to estimate the conditional mean function where we need to assume the mean and median of the error distribution coincide.

We see from Theorem 2.1 that the leading term of the asymptotic bias for the local linear CQR estimator is the same as that for the local linear least squares estimator, while their asymptotic variances are different. The mean squared error of m̂(t₀) is

MSE {\hat{m} (t_{0})} = {\frac{1}{2} m ″ (t_{0}) μ_{2}}^{2} h^{4} + \frac{1}{n h} \frac{ν_{0} σ^{2} (t_{0})}{f_{T} (t_{0})} R_{1} (q) + o_{p} (h^{4} + \frac{1}{n h}) .

By straightforward calculations we can see that the optimal variable bandwidth minimizing the asymptotic mean squared error of m̂(t₀) is

h^{opt} (t_{0}) = {[\frac{ν_{0} σ^{2} (t_{0}) R_{1} (q)}{f_{T} (t_{0}) {m ″ (t_{0}) μ_{2}}^{2}}]}^{1 / 5} n^{- 1 / 5} .

In practice, one may select a constant bandwidth by minimizing the mean integrated squared error MISE (m̂) = ∫ MSE{m̂(t₀)}w(t) dt for a weight function w(t). Similarly, the optimal bandwidth minimizing the asymptotic MISE(m̂) is

h^{opt} = {[\frac{ν_{0} R_{1} (q) \int σ^{2} (t) f_{T}^{- 1} (t) w (t) d t}{μ_{2}^{2} \int {m ″ (t)}^{2} w (t) d t}]}^{1 / 5} n^{- 1 / 5} .

The above calculations indicate that the local linear CQR estimator enjoys the optimal rate of convergence n^2/5.

2.2 Asymptotic relative efficiency

In this section, we study the asymptotic relative efficiency of the local linear CQR estimator with respect to the local linear least squares estimator by comparing their mean squared errors. The role of R₁ becomes clear in the relative efficiency study.

The local linear least squares estimator for m(t₀) has the mean squared error

MSE {{\hat{m}}_{LS} (t_{0})} = {\frac{1}{2} m ″ (t_{0}) μ_{2}}^{2} h^{4} + \frac{1}{n h} \frac{ν_{0}}{f_{T} (t_{0})} σ^{2} (t_{0}) + o_{p} (h^{4} + \frac{1}{n h}),

and hence

h_{LS}^{opt} (t_{0}) = {[\frac{ν_{0} σ^{2} (t_{0})}{f_{T} (t_{0}) {m ″ (t_{0}) μ_{2}}^{2}}]}^{1 / 5} n^{- 1 / 5}, h_{LS}^{opt} = {[\frac{ν_{0} \int σ^{2} (t) f_{T}^{- 1} (t) w (t) d t}{μ_{2}^{2} \int {m ″ (t)}^{2} w (t) d t}]}^{1 / 5} n^{- 1 / 5},

where $h_{LS}^{opt} (t_{0})$ is the optimal variable bandwidth minimizing the asymptotic MSE and $h_{LS}^{opt}$ is the optimal bandwidth minimizing the asymptotic MISE. Therefore, we have

h^{opt} (t_{0}) = R_{1} {(q)}^{1 / 5} h_{LS}^{opt} (t_{0}), h^{opt} = R_{1} {(q)}^{1 / 5} h_{LS}^{opt} .

(2.8)

We use MSE_opt and MISE_opt to denote the MSE and MISE evaluated at their optimal bandwidth. Then by straightforward calculations we see that as n approaches ∞,

\frac{{MSE}_{opt} {{\hat{m}}_{LS} (t_{0})}}{{MSE}_{opt} {\hat{m} (t_{0})}} \to R_{1} {(q)}^{- 4 / 5}, \frac{{MISE}_{opt} {{\hat{m}}_{LS}}}{{MISE}_{opt} {\hat{m}}} \to R_{1} {(q)}^{- 4 / 5} .

Thus, it is natural to define ARE(m̂, m̂_LS the asymptotic relative efficiency (ARE) of the local linear CQR estimator with respect to the local linear least squares estimator, as follows

ARE (\hat{m}, {\hat{m}}_{LS}) = R_{1} {(q)}^{- 4 / 5} .

(2.9)

The ARE only depends on the error distribution, although the dependence could be rather complex. However, for many commonly seen error distributions, we can directly compute the value of ARE. Table 1 displays the ARE(m̂, m̂_LS) for some commonly seen error distributions.

Table 1.

Comparisons of ARE m̂, m̂_LS

Error Distribution	ARE (m̂, m̂_LS)
	q = 1	q = 5	q = 9	q = 19	q = 99
N(0, 1)	0.6968	0.9339	0.9659	0.9858	0.9980
Laplace	1.7411	1.2199	1.1548	1.0960	1.0296
t-distribution with df = 3	1.4718	1.5967	1.5241	1.4181	1.2323
t-distribution with df = 4	1.0988	1.2652	1.2377	1.1872	1.0929
.95N(0, 1) + .05N(0, 3²)	0.8639	1.1300	1.1536	1.1540	1.0804
.90N(0, 1) + .10N(0, 3²)	0.9986	1.2712	1.2768	1.2393	1.0506
.95N(0, 1) + .05N(0, 10²)	2.6960	3.4577	3.4783	3.3591	1.3498
.90N(0, 1) + .10N(0, 10²)	4.0505	4.9128	4.7049	3.5444	1.1379

Open in a new tab

Several interesting observations can be made from Table 1. Firstly, when the error distribution is N(0, 1) for which the local linear least squares estimator is expected to have the best performance, the ARE(m̂, m̂_LS) is very close to 1 regardless the choice of q in the local linear CQR estimator. When q = 5 the the local linear CQR only loses at most 7% efficiency, while it performs as well as the local linear least squares estimator when q = 99. Secondly, for all the other non-normal distributions listed in Table 1, the local linear CQR estimator can have higher efficiencies than the local linear least squares estimator when a small q is used. The mixture of two normals is often used to model the so-called contaminated data. For such distributions, the ARE(m̂, m̂_LS) can be as large as 4.9 and even more. Table 1 also indicates that, except for the Laplace error, the local CQR with q = 5 or q = 9 are significantly better than the one with q = 1, which becomes the local LAD for these distributions. Finally, we observe that the ARE values for a variety of distributions are very close to 1 when q is large (q = 99). It turns out that this phenomenon is true in general, as demonstrated in the following theorem.

Theorem 2.2

lim_q→∞ R₁(q) = 1, and thus $lim_{q \to \infty} ARE (\hat{m}, {\hat{m}}_{L S}) = 1$ .

Theorem 2.2 provides us insights into the asymptotic behavior of the local linear CQR estimator and implies that the local linear CQR estimator is a safe competitor against the local linear least squares estimator, for it will not lose efficiency when using a large q. On the other hand, substaintial gain in efficiency could be achieved by using a relatively small q such as q = 9, as shown in Table 1.

3 Estimation of derivative

In many situations we are interested in estimating the derivative of m(t). The local linear CQR also provides an estimator m̃′(t₀) to the derivative of m(t). The asymptotic bias and variance of the estimate m̃′(t₀) in (2.3) are given in (5.8) and (5.9) in section 5. The local linear CQR estimator and the local linear regression estimator have the same leading bias term which depends on the intrinsic part m‴(t₀) and the extra part $m ″ (t_{0}) {f'}_{T} (t_{0}) / f_{T} (t_{0})$ . In Chu & Marron (1991) and Fan (1992), the authors already argued that the bias could be very large in many situations. So it may not be an ideal estimator because of the relatively large bias. The local quadratic regression is often preferred for estimating the derivative function, since it reduces the estimation bias without increasing the estimation variance (Fan & Gijbels 1992). We show here that the same phenomenon is true in local CQR smoothing.

We consider the local quadratic approximation of m(t) in the neighborhood of t₀: $m (t) \approx m (t_{0}) + m' (t_{0}) (t - t_{0}) + \frac{1}{2} m ″ (t_{0}) {(t - t_{0})}^{2}$ . Let a = (a₁, ⋯, a_q) and b = (b₁, b₂). We solve

(â, \hat{b}) = \underset{a,b}{arg min} \sum_{i = 1}^{n} [\sum_{k = 1}^{q} ρ_{τ_{k}} (y_{i} - a_{k} - b_{1} (t_{i} - t_{0}) - \frac{1}{2} b_{2} {(t_{i} - t_{0})}^{2}) K (\frac{t_{i} - t_{0}}{h})] .

(3.1)

Then the local quadratic CQR estimator for m′(t₀) is given by

\hat{m}' (t_{0}) = {\hat{b}}_{1} .

(3.2)

3.1 Asymptotic properties

Denote

R_{2} (q) = (\sum_{k = 1}^{q} \sum_{k' = 1}^{q} τ_{k k'}) / {(\sum_{k = 1}^{q} f (c_{k}))}^{2} .

(3.3)

The asymptotic bias, variance and normality are given in the following theorem.

Theorem 3.1

Suppose that t₀ is an interior point of the support of f_T(·). Under the regularity conditions (A)—(D) in section 5, if h → 0 and nh³ → ∞, then the asymptotic conditional bias and variance of m̃′(t₀), defined in (3.2) is given by

Bias (\hat{m}' (t_{0}) | T) = \frac{1}{6} m ‴ (t_{0}) \frac{μ_{4}}{μ_{2}} h^{2} + o_{p} (h^{2}),

(3.4)

Var (\hat{m}' (t_{0}) | T) = \frac{1}{n h^{3}} \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} R_{2} (q) + o_{p} (\frac{1}{n h^{3}}) .

(3.5)

Furthermore, conditioning on T, we have the following asymptotic normal distribution

\sqrt{n h^{3}} (\hat{m}' (t_{0}) - m' (t_{0}) - \frac{1}{6} m ‴ (t_{0}) \frac{μ_{4}}{μ_{2}} h^{2}) \overset{ℒ}{\to} N (0, \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} R_{2} (q)) .

(3.6)

Remark 4

In Theorem 3.1 the symmetric-error-distribution assumption is used to get the asymptotic bias formula. Without that assumption, the asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. It is also interesting to point out that when the variance function is homoscadastic the symmetric-error-distribution assumption is no longer needed for Theorem 3.1.

Comparing (5.8) and (3.4), we see that the extra part $m ″ (t_{0}) {f'}_{T} (t_{0}) / f_{T} (t_{0})$ is removed in the local quadratic CQR estimator. Comparing the local quadratic CQR and the local quadratic least squares estimators for m′(t₀), we see that they have the same leading bias term, while their asymptotic variances are different.

From Theorem 3.1, the mean squared error of local quadratic CQR estimator m̂′(t₀) is given by

MSE {m' (t_{0})} = {(\frac{1}{6} m ‴ (t_{0}) \frac{μ_{4}}{μ_{2}})}^{2} h^{4} + \frac{1}{n h^{3}} \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} R_{2} (q) + o_{p} (h^{4} + \frac{1}{n h^{3}}) .

Thus, the optimal variable bandwidth minimizing MSE{m̂′(t₀)} is

h^{opt} (t_{0}) = {R_{2} (q)}^{1 / 7} {(\frac{27 ν_{2} σ^{2} (t_{0})}{f_{T} (t_{0}) {m ‴ (t_{0}) μ_{4}}^{2}})}^{1 / 7} n^{- 1 / 7} .

Furthermore, we consider the mean integrated squared error MISEm̂′ = ∫ MSE{m̂′(t)}w(t) dt with a weight function w(t). The optimal constant bandwidth minimizing the mean integrated squared error is given by

h^{opt} = {R_{2} (q)}^{1 / 7} {(\frac{27 ν_{2} \int σ^{2} (t) f_{T}^{- 1} (t) w (t) d t}{\int {m ‴ (t)}^{2} w (t) d t μ_{4}^{2}})}^{1 / 7} n^{- 1 / 7} .

The above calculations indicate that the local quadratic CQR estimator enjoys the optimal rate of convergence n^2/7.

3.2 Asymptotic relative efficiency

In what follows we study the asymptotic relative efficiency of the local quadratic CQR estimator with respect to the local quadratic least squares estimator. Note that the mean squared error of local quadratic least squares estimator ${\hat{m}}_{LS}^{'} (t_{0})$ is given by

MSE {{\hat{m}}_{LS}^{'} (t_{0})} = {(\frac{1}{6} m ‴ (t_{0}) \frac{μ_{4}}{μ_{2}})}^{2} h^{4} + \frac{1}{n h^{3}} \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} + o_{p} (h^{4} + \frac{1}{n h^{3}}),

and the mean integrated squared error (MISE) is $MISE ({\hat{m}}_{LS}^{'}) = \int MSE {{\hat{m}}_{LS}^{'} (t)} w (t) d t$ with a weight function w(t). Thus, by straightforward calculations, we notice that

h^{opt} (t_{0}) = h_{LS}^{opt} (t_{0}) R_{2} {(q)}^{1 / 7}, h^{opt} = h_{LS}^{opt} R_{2} {(q)}^{1 / 7},

(3.7)

where $h_{LS}^{opt} (t_{0})$ and $h_{LS}^{opt}$ are the corresponding optimal bandwidths of local quadratic least squares estimator. With the optimal bandwidths, we have

\frac{{MSE}_{opt} {{\hat{m}}_{LS}^{'} (t_{0})}}{{MSE}_{opt} {\hat{m}' (t_{0})}} \to R_{2} {(q)}^{- 4 / 7}, \frac{{MISE}_{opt} ({\hat{m}}_{LS}^{'})}{{MISE}_{opt} (\hat{m}')} \to R_{2} {(q)}^{- 4 / 7} .

Therefore, the asymptotic relative efficiency (ARE) of the local quadratic CQR estimator (m̂′) with respect to the local quadratic least squares estimator $({\hat{m}}_{LS}^{'})$ is defined to be

ARE (\hat{m}', {\hat{m}}_{LS}^{'}) = R_{2} {(q)}^{- 4 / 7} .

(3.8)

The ARE only depends on the error distribution and it is scale invariant.

To gain insights into the asymptotic relative efficiency, we consider the limit when q is large. Zou & Yuan (2008) showed that

lim_{q \to \infty} R_{2} {(q)}^{- 1} > \frac{6}{e π} = 0.7026 .

Immediately, we know that if using a large q, the ARE is bounded below by 0.7026^4/7 = 0.8173. Having a universal lower bound is very useful because it prohibits severe loss in efficiency when replacing the local quadratic least squares estimator with the local quadratic CQR estimator. One of our contributions in this work is to provide an improved sharper lower bound as shown in the following theorem.

Theorem 3.2

Let ℱ denote the class of error distributions with mean 0 and variance 1, then we have

inf_{f \in ℱ} lim_{q \to \infty} R_{2} {(q)}^{- 1} = 0.864 .

(3.9)

The lower bound is reached if and only if the error follows the rescaled Beta(2,2) distribution with mean zero and variance one. Thus

lim_{q \to \infty} A R E (\hat{m}', {\hat{m}}_{LS}^{'}) \geq 0.9199 .

(3.10)

It is interesting to note that Theorem 3.2 provides us the exact lower bound of $ARE (\hat{m}', {\hat{m}}_{LS}^{'})$ as q → ∞. Theorem 3.2 indicates that if q is large, even in the worst scenario the potential efficiency loss for the local CQR estimator is only 8.01%.

Theorem 3.2 implies that the local quadratic CQR estimator is a safe alternative to the local quadratic least squares estimator. It concerns the worst case scenario. There are many optimistic scenarios as well in which the ARE can be much bigger than 1. We examine the $ARE (\hat{m}', {\hat{m}}_{LS}^{'})$ for the error distributions considered in Table 1. We also list the results in Table 2, where the column labeled q = ∞ shows the theoretical limit of the $ARE (\hat{m}', {\hat{m}}_{LS}^{'})$ . Obviously, these limits are all larger than the lower bound 0.9199. The local quadratic CQR estimator only loses less than 4% efficiency when the error distribution is normal and q = 9. It is interesting to see that for the other non-normal distributions the $ARE (\hat{m}', {\hat{m}}_{LS}^{'})$ is larger than 1 and its value is insensitive to the choice of q. For example, with q = 9, the AREs are already very close to their theoretical limits.

Table 2.

Comparisons of $ARE (\hat{m}', {\hat{m}}_{LS}^{'})$

Error Distribution

ARE (\hat{m}', {\hat{m}}_{LS}^{'})

q = 1

q = 5

q = 9

q = 19

q = 99

q = ∞

N(0, 1)

0.7726

0.9453

0.9625

0.9708

0.9738

0.9740

Laplace

1.4860

1.2812

1.2680

1.2625

1.2608

1.2607

t-distribution with df = 3

1.3179

1.4405

1.4435

1.4430

1.4431

t-distribution with df = 4

1.0696

1.2038

1.2104

1.2123

1.2125

.95N(0, 1) + .05N(0, 3²)

0.9008

1.0867

1.1019

1.1073

1.1077

.90N(0, 1) + .10N(0, 3²)

0.9990

1.1869

1.1982

1.1999

1.1987

.95N(0, 1) + .05N(0, 10²)

2.0308

2.4229

2.4466

2.4482

2.4415

.90N(0, 1) + .10N(0, 10²)

2.7160

3.1453

3.1430

3.1135

3.1094

3.1093

Open in a new tab

4 Numerical comparisons and examples

In this section, we first use Monte Carlo simulation studies to assess the finite sample performance of the proposed estimation procedures and then demonstrate the application of the proposed method by using a real data example. Throughout this section we use the Epanechnikov kernel, i.e., $K (z) = \frac{3}{4} {(1 - z^{2})}_{+}$ . We adopt the MM algorithm proposed by (Hunter & Lange 2000) for solving the local CQR smoothing estimator. All the numerical results are computed using our MATLAB code, which is available upon request.

4.1 Bandwidth selection in practical implementation

Bandwidth selection is an important issue in local smoothing. Here we briefly discuss the bandwidth selection issue in the local CQR smoothing estimator by using existing bandwidth selector for the local polynomial regression. Here we consider two bandwidth selectors.

The “pilot” selector. The idea is to use a pilot bandwidth in local cubic CQR (defined in section 5) to estimate m″(t) and m‴(t). The fitted residuals can be used to estimate R₁(q) and R₂(q). Thus, we can use the optimal bandwidth formula to estimate the optimal bandwidth and then refit the data.
A short-cut strategy. In our numerical studies, we compare the local CQR and local least squares estimators. Note that in (2.8) and (3.7) we obtain very neat relationships between the optimal bandwidths for the local CQR and local least squares estimators. The optimal bandwidth for the local least squares estimators can be selected by existing bandwidth selectors (see Chapter 4 of Fan & Gijbels (1996)). In addition, we are able to infer the factors R₁(q) and R₂(q) from the residuals of the local least squares fit. Sometimes, we even know the exact values of the two factors (e.g., in simulations). Therefore, after fitting the local least squares estimator with the optimal bandwidth, we can estimate the optimal bandwidth for the local CQR estimator.

We used the short-cut strategy in our simulation examples. However, if the error variance is infinite or very large, then the local least squares estimator performs poorly. The “pilot” selector is a better choice than the short-cut strategy.

4.2 Simulation examples

In our simulation studies, we compare the performance of the newly proposed method with the local polynomial least squares estimate. The bandwidth is set to the optimal one in which the $h_{LS}^{opt}$ is selected by a plug-in bandwidth selector (Ruppert, Sheather & Wand 1995). The performance of estimator m̂(·) and m̂′(·) is assessed via the average squared errors (ASE), defined by $ASE (\hat{g}) = \frac{1}{n_{grid}} \sum_{k = 1}^{n_{grid}} {\hat{g} (u_{k}) - g (u_{k})}^{2}$ , with g equals either m(·) or m′(·), where {u_k, k = 1, …, n_grid} are the grid points at which the functions {ĝ(·)} are evaluated. In our simulation, we set n_grid = 200 and grid points are evenly distributed over the interval at which the m(·) and m′(·) are estimated. We summarize our simulation results using the ratio of average squared errors (RASE), $RASE (\hat{g}) = \frac{{ASE}_{({\hat{g}}_{LS})}}{{ASE}_{(\hat{g})}}$ for an estimator ĝ, where ĝ_LS is the local polynomial regression estimator under the least squares loss. We considered two simulation examples.

Example 4.1

We generated 400 data set, each consisting of n = 200 observations, from

Y = sin (2 T) + 2 exp (- 16 T^{2}) + 0.5 ϵ,

(4.1)

where T follows N(0, 1). This model is adopted from Fan & Gijbels (1992). In our simulation, we considered five error distributions for ϵ: N(0, 1), Laplace, t₃ distribution, a mixture of two normals (0.95N(0,1) + 0.05N(0, σ²) with σ = 3,10). For the local polynomial CQR estimator, we consider q = 5, 9 and 19, and estimate m(·) and m′(·) over [−1.5, 1.5]. The mean and standard deviation of RASE over 400 simulations are summarized in Table 3. To see how the proposed estimate behaves at a typical point, Table 3 also depicts the biases and standard deviations of m̂(t) and m̂′(t) at t = 0.75. In Table 3, CQR₅, CQR₉ and CQR₁₉ correspond to the local CQR estimate with q = 5, 9 and 19, respectively.

Table 3.

Simulation Results for Example 4.1

	m̂			m̂′

	RASE Mean(SD)	t = 0.75		RASE Mean(SD)	t = 0.75
	RASE Mean(SD)	Bias	Std	RASE Mean(SD)	Bias	Std
Standard Normal
LS	—	−0.0239	0.1098	—	−0.0539	0.6871
CQR₅	0.9314_(0.1190)	−0.0224	0.1161	0.9518_(0.1087)	−0.0508	0.7257
CQR₉	0.9588_(0.0888)	−0.0236	0.1133	0.9614_(0.1019)	−0.0530	0.7165
CQR₁₉	0.9802_(0.0592)	−0.0228	0.1117	0.9646_(0.0998)	−0.0513	0.7178

Laplace
LS	—	−0.0146	0.1215	—	−0.1108	0.6988
CQR₅	1.1088_(0.1985)	−0.0171	0.1155	1.1014_(0.1679)	−0.0774	0.6916
CQR₉	1.0717_(0.1351)	−0.0154	0.1195	1.1025_(0.1565)	−0.0834	0.6678
CQR₁₉	1.0346_(0.0856)	−0.0141	0.1214	1.1005_(0.1500)	−0.0934	0.6529

t-distribution with df = 3
LS	—	−0.0214	0.1266	—	−0.0701	0.7254
CQR₅	1.2752_(0.5020)	−0.0182	0.1103	1.2104_(0.4584)	−0.0559	0.6635
CQR₉	1.1712_(0.3356)	−0.0158	0.1137	1.2133_(0.4526)	−0.0520	0.6537
CQR₁₉	1.0710_(0.2086)	−0.0186	0.1222	1.2182_(0.4403)	−0.0540	0.6431

.95N(0, 1) + .05N(0,9)
LS	—	−0.0007	0.1256	—	−0.0382	0.8540
CQR₅	1.0685_(0.2275)	−0.0060	0.1202	1.0479_(0.1773)	−0.0182	0.8098
CQR₉	1.0621_(0.1740)	−0.0049	0.1219	1.0531_(0.1727)	−0.0154	0.8085
CQR₁₉	1.0280_(0.1125)	−0.0018	0.1251	1.0532_(0.1687)	−0.0198	0.8062

.95N(0, 1) + .05N(0,100)
LS	—	0.0034	0.1283	—	−0.0456	0.8667
CQR₅	2.1548_(1.5318)	0.0002	0.0888	1.7671_(0.7607)	0.0022	0.5953
CQR₉	1.5240_(0.8360)	−0.0009	0.1181	1.7527_(0.7535)	0.0024	0.6030
CQR₁₉	1.1600_(0.8776)	0.0069	0.1365	1.7560_(0.7382)	0.0044	0.5927

Open in a new tab

Example 4.2

It is of interest to investigate the effect of heteroscedastic errors. To this end, we generated 400 simulation data sets, each consisting of n = 200 observations, from

Y = T sin (2 π T) + σ (T) ϵ,

(4.2)

where T follows U(0, 1), σ(t) = {2 + cos(2πt)}/10, and ϵ is the same as that in Example 4.1. In this example, we estimate m(t) and m′(t) over [0,1]. The mean and standard deviation of RASE over 400 simulations are summarized in Table 4, in which we also show the biases and standard deviations of m̂(t) and m̂′(t) at t = 0.4. The notation of Table 4 is the same as that in Table 3.

Table 4.

Simulation Results for Example 4.2.

	m̂			m̂′

	RASE Mean(SD)	t = 0.4		RASE Mean(SD)	t = 0.4
	RASE Mean(SD)	Bias	Std	RASE Mean(SD)	Bias	Std
Standard Normal
LS	—	−0.0177	0.0263	—	0.0329	0.2753
CQR₅	0.9574_(0.1699)	−0.0166	0.0271	0.9376_(0.3587)	0.0289	0.3019
CQR₉	0.9783_(0.1286)	−0.0165	0.0266	0.9458_(0.3092)	0.0283	0.3013
CQR₁₉	0.9838_(0.0815)	−0.0168	0.0266	0.9491_(0.2952)	0.0278	0.2962

Laplace
LS	—	−0.0175	0.0249	—	0.0236	0.2718
CQR₅	1.1938_(0.3279)	−0.0145	0.0237	1.2063_(0.6794)	0.0106	0.2701
CQR₉	1.1405_(0.2523)	−0.0150	0.0243	1.2046_(0.6413)	0.0079	0.2719
CQR₁₉	1.0857_(0.1584)	−0.0157	0.0248	1.2019_(0.6035)	0.0098	0.2693

t-distribution with d f = 3
LS	—	−0.0167	0.0261	—	0.0025	0.3068
CQR₅	1.5974_(1.0324)	−0.0120	0.0229	1.6099_(1.7558)	0.0004	0.2503
CQR₉	1.4247_(0.8170)	−0.0132	0.0228	1.5975_(1.8047)	−0.0002	0.2560
CQR₁₉	1.2111_(0.4330)	−0.0140	0.0242	1.5948_(1.8291)	0.0006	0.2567

.95N(0, 1) + .05N(0,9)
LS	—	−0.0175	0.0247	—	−0.0130	0.2916
CQR₅	1.1788_(0.6248)	−0.0157	0.0228	1.2268_(2.0608)	−0.0050	0.2778
CQR₉	1.1507_(0.4715)	−0.0157	0.0230	1.2132_(1.8791)	−0.0048	0.2754
CQR₁₉	1.0835_(0.2603)	−0.0159	0.0234	1.2104_(1.8546)	−0.0066	0.2742

.95N(0, 1) + .05N(0,100)
LS	—	−0.0162	0.0260	—	0.0335	0.3728
CQR₅	3.1661_(2.4820)	−0.0077	0.0173	3.0593_(5.6699)	0.0245	0.2420
CQR₉	2.4179_(1.7012)	−0.0080	0.0171	3.0287_(5.3433)	0.0209	0.2533
CQR₁₉	1.3469_(0.5075)	−0.0085	0.0241	3.0146_(5.2728)	0.0234	0.2452

Open in a new tab

Table 3 and Table 4 show very similar messages, although Table 4 indicates that the local CQR has more gains over the local least squares method. When the error follows the normal distribution, the RASEs of the local CQR estimators are slightly less than one. For non-normal distributions, the RASEs of the local CQR estimators can be greater than one, indicating the gain in efficiency. For estimating the regression function, CQR₅ and CQR₉ seem to have better overall performance than CQR₁₉. For estimating the derivative, all three CQR estimators perform very similarly. These findings are consistent with the theoretical analysis of AREs.

4.3 A real data example

As an illustration, we now apply the proposed local CQR methodology to the U.K. Family Expenditure Survey data subset with high net-income, which consists of 363 observations. The scatter plot of data is depicted in the left panel of Figure 1. The data set was collected in the U.K. Family Expenditure Survey in 1973. Of interest is to study the relationship between the food expenditure and the net-income. Thus, we take the response variable Y to be the logarithm of the food expenditure, and the predictor variable T is the net-income.

The left panel is the scatter plot of data, the middle panel is the estimated regression function, and the right panel is the estimated derivative function.

We first estimated the regression function using the local least squares estimator with the plug-in bandwidth selector (Ruppert et al. 1995). We further employed the kernel density estimate to infer the error density f(·) based on the residuals from the local least squares estimator. Based on the estimated density, we estimated both R₁(q) and R₂(q), which were used to compute the bandwidth selector for the CQR estimator. For this example, the estimated ratios are close to 1, so we basically use the same bandwidths for these two methods. The selected bandwidths are 0.24 for regression estimation and 0.4 for derivative estimation. The CQR estimates with q = 5,9 and 19 with the selected bandwidths are evaluated. The CQR estimates with three different q’s are very similar, we only present the CQR estimate with q = 9 in Figure 1.

It is interesting to see from Figure 1 that the overall patten of the local least squares and the local CQR estimate are the same. The difference between the local least squares estimate and the local CQR estimate of the regression function becomes large when the net income around 2.8. From the scatter plot, there are two possible outlier observations: (2.7902,−2.5207) and (2.8063,−2.6105) (circled in the plot). To understand the impact of these two possible outliers, we re-evaluated the local CQR and the local least squares estimates after excluding these two possible outliers. The resulting estimates are depicted in the top panel of Figure 2, from which we can see that the local CQR estimate remains almost the same, while the local least squares estimate changes a lot. We also note that after removing these two possible outliers, the local least squares estimator becomes very close to the local CQR estimator. Furthermore, as a more extreme demonstration, we kept these two possible outliers in the data set and moved them to more extreme cases, i.e, we moved (2.7902,−2.5207) and (2.8063,−2.6105) to (2.7902,(2.7902,−26.5207) and (2.8063,−6.6105), respectively. After distorting the two observations, we re-calculated the local CQR and the local least squares estimate. The resulting estimates are depicted in the bottom panel of Figure 2, which clearly demonstrates that the local least squares estimate changes dramatically, while the local CQR estimate is nearly un-affected by the artificial data distortion.

Plot of estimated regression function and its derivative. The top panel is for the estimate removing the two possible outliers, and bottom panel is for the estimate moving the two possible outliers to more extreme cases. The left panel is for the estimated regression function, and the right panel is the estimated derivative function.

5 Local p-polynomial CQR smoothing and proofs

In this section we establish asymptotic theory of the local p-polynomial CQR estimators. We then treat Theorems 2.1 and 3.1 as two special cases of the general theory. As a generalization of the local linear and local quadratic CQR estimators, the local p-polynomial CQR estimator is constructed by minimizing

\sum_{k = 1}^{q} [\sum_{i = 1}^{n} ρ_{τ_{k}} {y_{i} - a_{k} - \sum_{j = 1}^{p} b_{j} {(t_{i} - t_{0})}^{j}} K (\frac{t_{i} - t_{0}}{h})],

(5.1)

and the local p-polynomial CQR estimators of m(t₀) and m^(r)(t₀) are given by

\hat{m} (t_{0}) = \frac{1}{q} \sum_{k = 1}^{q} {\hat{a}}_{k}, and {\hat{m}}^{(r)} (t_{0}) = r! {\hat{b}}_{r}, r = 1, \dots, p .

(5.2)

For the asymptotic analysis, we need the following regularity conditions:

m(t) has a continuous (p + 2)^th derivative in the neighborhood of t₀.
f_T(·), the marginal density function of T, is differentiable and positive in the neighborhood of t₀.
The conditional variance σ²(t) is continuous in the neighborhood of t₀.
Assume that the error has a symmetric distribution with a positive density f(·).

We choose the kernel function K such that K is a symmetric density function with finite support [−M, M]. The following notation is needed to present the asymptotic properties of the local p-polynomial CQR estimator. Let S₁₁ be a q × q diagonal matrix with diagonal elements f(c_k), k = 1, ⋯, q, S₁₂ be a q × p matrix with (k, j)-element being f(c_k)μ_j, k = 1, ⋯,q and $j = 1, \dots, p, S_{21} = S_{12}^{T}$ , and S₂₂ be a p × p matrix with (j, j′)-element being $\sum_{k = 1}^{q} f (c_{k}) μ_{j + j'}, for j, j' = 1, \dots, p$ . Similarly, Let Σ₁₁ be a q × q matrix with (k, k′)-element ν₀τ_kk′, k, k′ = 1, ⋯, q, Σ₁₂ be a q×p matrix with (k, j)-element being $ν_{j} \sum_{k' = 1}^{q} τ_{k k'}, k = 1, \dots, q$ and $j = 1, \dots, p, Σ_{21} = Σ_{12}^{T}$ , and Σ₂₂ be a p × p matrix with (j, j′)-element being $(\sum_{k, k' = 1}^{q} τ_{k k'}) ν_{j + j'}, for j, j' = 1, \dots, p$ . Define

S = (\begin{matrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{matrix}), and Σ = (\begin{matrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{matrix}) .

Partition S⁻¹ into four submatrices as follows

S^{- 1} = {(\begin{matrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{matrix})}^{- 1} = (\begin{matrix} {(S^{- 1})}_{11} & {(S^{- 1})}_{12} \\ {(S^{- 1})}_{21} & {(S^{- 1})}_{22} \end{matrix}),

where and hereafter, we use (·)₁₁ to denote the left-top q × q submatrix and use (·)₂₂ to denote the right-bottom p × p submatrix.

Furthermore, let $u_{k} = \sqrt{n h} {a_{k} - m (t_{0}) - σ (t_{0}) c_{k}}, v_{j} = h^{j} \sqrt{n h} {j! b_{j} - m^{(j)} (t_{0})} / j!$ . Let x_i = (t_i − t₀)/h, K_i = K(x_i) and $Δ_{i, k} = \frac{u_{k}}{\sqrt{n h}} + \sum_{j = 1}^{p} \frac{v_{j} x_{i}^{j}}{\sqrt{n h}}$ . Write d_i,k = c_k[σ(t_i) − σ(t₀)] + r_i,p with $r_{i, p} = m (t_{i}) - \sum_{j = 0}^{p} m^{(j)} (t_{0}) {(t_{i} - t_{0})}^{j} / j!$ . $η_{i, k}^{*}$ to be $I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})}) - τ_{k}$ . let $W_{n}^{*} = {(w_{11}^{*}, \dots, w_{1 q}^{*}, w_{21}^{*}, \dots, w_{2 p}^{*})}^{T} with w_{1 k}^{*} = \frac{1}{\sqrt{n h}} \sum_{i = 1}^{n} K_{i} η_{i, k}^{*} and w_{2 j}^{*} = \frac{1}{\sqrt{n h}} \sum_{k = 1}^{q} \sum_{i = 1}^{n} K_{i} x_{i}^{j} η_{i, k}^{*}$ .

The asymptotic properties of the local p-polynomial CQR estimator are based on the following theorem.

Theorem 5.1

Denote θ̂_n = (û₁, …, û₁, ν̂₁, …, ν̂_p) be the minimizer of (5.1). Then under the regularity conditions (A)—(C), we have

{\hat{θ}}_{n} + \frac{σ (t_{0})}{f_{T} (t_{0})} S^{- 1} E (W_{n}^{*} | T) \overset{ℒ}{\to} M V N (0, \frac{σ^{2} (t_{0})}{f_{T} (t_{0})} S^{- 1} Σ S^{- 1}) .

To prove theorem 5.1, we first establish Lemmas 5.2—5.3.

Lemma 5.2

Minimizing (5.1) is equivalent to minimizing

\sum_{k = 1}^{q} u_{k} (\sum_{i = 1}^{n} \frac{K_{i} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{j = 1}^{p} v_{j} (\sum_{k = 1}^{q} \sum_{i = 1}^{n} \frac{K_{i} x_{i}^{j} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{k = 1}^{q} B_{n, k} (θ)

with respect to θ = (u₁, ⋯, u_q, ν₁, ⋯, ν_p)^T, where

B_{n, k} (θ) = \sum_{i = 1}^{n} {K_{i} \int_{0}^{Δ_{i, k}} [I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})} + \frac{z}{σ (t_{i})}) - I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})})] d z} .

Proof. To apply the identity (Knight 1998)

ρ_{τ} (x - y) - ρ_{τ} (x) = y (I (x \leq 0) - τ) + \int_{0}^{y} {I (x \leq z) - I (x \leq 0)} d z,

(5.3)

we write $y_{i} - a_{k} - \sum_{j = 1}^{p} b_{i} {(t_{i} - t_{0})}^{j} = σ (t_{i}) (ϵ_{i} - c_{k}) + d_{i, k} - Δ_{i, k}$ . Minimizing (5.1) is equivalent to minimizing

L_{n} (θ) = \sum_{i = 1}^{n} {K_{i} \sum_{k = 1}^{q} [ρ_{τ_{k}} (σ (t_{i}) (ϵ_{i} - c_{k}) + d_{i, k} - Δ_{i, k}) - ρ_{τ_{k}} (σ (t_{i}) (ϵ_{i} - c_{k}) + d_{i, k})]} .

Using the identity (5.3) and with some straightforward calculations, it follows that

L_{n} (θ) = \sum_{k = 1}^{q} u_{k} (\sum_{i = 1}^{n} \frac{K_{i} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{j = 1}^{p} v_{j} (\sum_{k = 1}^{q} \sum_{i = 1}^{n} \frac{K_{i} x_{i}^{j} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{k = 1}^{q} B_{n, k} (θ) .

This completes the proof.

Let S_n,11 be a q × q diagonal matrix with diagonal elements $f (c_{k}) \sum_{i = 1}^{n} \frac{K_{i}}{n h σ (t_{i})}, k = 1, \dots, q$ ; S_n,12 be a q × p matrix with (k, j)-element $f (c_{k}) \sum_{i = 1}^{n} \frac{K_{i} x_{i}^{j}}{n h σ (t_{i})}, j = 1, \dots, p$ ; S_n,22 be a p × p matrix with (j, j′) element $\sum_{k = 1}^{q} f (c_{k}) \sum_{i = 1}^{n} \frac{K_{i} x_{i}^{j + j'}}{n h σ (t_{i})}$ . Denote

S_{n} = (\begin{matrix} S_{n, 11} & S_{n, 12} \\ S_{n, 12}^{T} & S_{n, 22} \end{matrix}) .

Lemma 5.3

Under Conditions (A) – (C), $L_{n} (θ) = \frac{1}{2} θ^{T} S_{n} θ + {(W_{n}^{*})}^{T} θ + o_{p} (1)$ .

Proof. Write L_n(θ) as

L_{n} (θ) = \sum_{k = 1}^{q} u_{k} (\sum_{i = 1}^{n} \frac{K_{i} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{j = 1}^{p} v_{j} (\sum_{k = 1}^{q} \sum_{i = 1}^{n} \frac{K_{i} x_{i}^{j} η_{i, k}^{*}}{\sqrt{n h}}) + \sum_{k = 1}^{q} E_{ϵ} [B_{n, k} (θ) | T] + \sum_{k = 1}^{q} R_{n, k} (θ),

where R_n,k(θ) = B_n,k(θ) − E_ϵ[B_n,k(θ)|T]. Using F(c_k + z) − F(c_k) = zf(c_k) + o(z), then $\sum_{k = 1}^{q} E_{ϵ} [B_{n, k} (θ) | T]$ equals

\begin{matrix} \sum_{k = 1}^{q} \sum_{i = 1}^{n} [K_{i} \int_{0}^{Δ_{i, k}} {\frac{z}{σ (t_{i})} f (c_{k} - \frac{d_{i, k}}{σ (t_{i})}) + o (z)} d z] \\ = \sum_{k = 1}^{q} \sum_{i = 1}^{n} [K_{i} Δ_{i, k}^{2} \frac{f (c_{k} - \frac{d_{i, k}}{σ (t_{i})})}{2 σ (t_{i})}] + o_{p} (1) \\ = \sum_{k = 1}^{q} \sum_{i = 1}^{n} [K_{i} Δ_{i, k}^{2} \frac{f (c_{k})}{2 σ (t_{i})}] + o_{p} (1) = \frac{1}{2} θ^{T} S_{n} θ + o_{p} (1) \end{matrix}

We now prove R_n,k(θ) = o_p(1). It is sufficient to show Var_ϵ[B_n,k(θ)|T] = o_p(1). In fact,

\begin{matrix} {Var}_{ϵ} [B_{n, k} (θ) | T] \\ = \sum_{i = 1}^{n} {Var}_{ϵ} [{K_{i} \int_{0}^{Δ_{i, k}} [I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})} + \frac{z}{σ (t_{i})}) - I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})})] d z} | T] \\ \leq \sum_{i = 1}^{n} E_{ϵ} [{K_{i} \int_{0}^{Δ_{i, k}} [I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})} + \frac{z}{σ (t_{i})}) - I (ϵ_{i} \leq c_{k} - \frac{d_{i, k}}{σ (t_{i})})] d z}^{2} | T] \\ \leq \sum_{i = 1}^{n} K_{i}^{2} \int_{0}^{| Δ_{i, k} |} \int_{0}^{| Δ_{i, k} |} [F (c_{k} - \frac{d_{i, k}}{σ (t_{i})} + \frac{| Δ_{i, k} |}{σ (t_{i})}) - F (c_{k} - \frac{d_{i, k}}{σ (t_{i})})] d z_{1} d z_{2} \\ = o (\sum_{i = 1}^{n} K_{i}^{2} Δ_{i, k}^{2}) = o_{p} (1) . \end{matrix}

Proof of Theorem 5.1

Similar to Parzen (1962), we have $\frac{1}{n h} \sum_{i = 1}^{n} K_{i} x_{i}^{j} \overset{P}{\to} f_{T} (t_{0}) μ_{j}, where \overset{P}{\to}$ stands for convergence in probability. Thus,

S_{n} \overset{P}{\to} \frac{f_{T} (t_{0})}{σ (t_{0})} S = \frac{f_{T} (t_{0})}{σ (t_{0})} (\begin{matrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{matrix}) .

This, together with Lemmas 5.2, 5.3, leads to

L_{n} (θ) = \frac{1}{2} \frac{f_{T} (t_{0})}{σ (t_{0})} θ^{T} S θ + {(W_{n}^{*})}^{T} θ + o_{p} (1) .

Since the convex function $L_{n} (θ) - {(W_{n}^{*})}^{T} θ$ converges in probability to the convex function $\frac{1}{2} \frac{f_{T} (t_{0})}{σ (t_{0})} θ^{T} S θ$ , it follows from the convexity lemma (Pollard 1991) that for any compact set Θ, the quadratic approximation to L_n(θ) holds uniformly for θ in any compact set, which leads to

{\hat{θ}}_{n} = - \frac{σ (t_{0})}{f_{T} (t_{0})} S^{- 1} W_{n}^{*} + o_{p} (1) .

Denote η_i,k = I(ϵ_i ≤ c_k)−τ_k and W_n = (w₁₁, ⋯, w_1q,, w₂₁, ⋯, w_2p)^T with $w_{1 k} = \frac{1}{\sqrt{n h}} \sum_{i = 1}^{n} K_{i} η_{i, k}$ and $w_{2 j} = \frac{1}{\sqrt{n h}} \sum_{k = 1}^{q} \sum_{i = 1}^{n} K_{i} x_{i}^{j} η_{i, k}$ . By the Cramer-Wald theorem, it is easy to see that the CLT for W_n|T holds

\frac{W_{n} | T - E [W_{n} | T]}{\sqrt{Var [W_{n} | T]}} \overset{ℒ}{\to} M V N (0, I_{(p + q) \times (p + q)}) .

(5.4)

Note that Cov(η_i,k, η_i,k′) = τ_kk′,Cov(η_i,k, η_j,k′) = 0, if i ≠ j. Similar to Parzen (1962), we have $\frac{1}{n h} \sum_{i = 1}^{n} K_{i}^{2} x_{i}^{j} \overset{P}{\to} f_{T} (t_{0}) ν_{j}, Therefore, Var [W_{n} | T] \overset{P}{\to} f_{T} (t_{0}) Σ$ . Combined with (5.4), we have $W_{n} | T \overset{ℒ}{\to} M V N (0, f_{T} (t_{0}) Σ)$ . Moreover, we have $Var (w_{1 k}^{*} - w_{1 k} | T) = \frac{1}{n h} \sum_{i = 1}^{n} K_{i}^{2} Var (η_{i, k}^{*} - η_{i, k}) \leq \frac{1}{n h} \sum_{i = 1}^{n} K_{i}^{2} {F (c_{k} + \frac{| d_{i, k} |}{σ (t_{i})}) - F (c_{k})} = o_{p} (1) and also Var (w_{2 j}^{*} - w_{2 j} | T) = \frac{1}{n h} \sum_{i = 1}^{n} K_{i}^{2} x_{i}^{j} Var (\sum_{k = 1}^{q} η_{i, k}^{*} - η_{i, k}) \leq \frac{q^{2}}{n h} \sum_{i = 1}^{n} K_{i}^{2} x_{i}^{j} {max}_{k} {F (c_{k} + \frac{| d_{i, k} |}{σ (t_{i})}) - F (c_{})} = o_{p} (1), thus Var (W_{n}^{*} - W_{n} | T) = o_{p} (1)$ . So by Slutsky’s theorem, conditioning on T, we have $W_{n}^{*} | T - E (W_{n}^{*} | T) \overset{ℒ}{\to} M V N (0, f_{T} (t_{0}) Σ)$ . Therefore,

{\hat{θ}}_{n} + \frac{σ (t_{0})}{f_{T} (t_{0})} S^{- 1} E (W_{n}^{*} | T) \overset{ℒ}{\to} M V N (0, \frac{σ^{2} (t_{0})}{f_{T} (t_{0})} S^{- 1} Σ S^{- 1}) .

(5.5)

This completes the proof.

Proof of Theorem 2.1

The asymptotic normality follows Theorem 5.1 with p = 1. Let us calculate the conditional bias and variance, respectively. Denote by e_q×1 the vector that contains q 1’s. When p = 1, S is a diagonal matrix with diagonal elements $f (c_{1}), \dots, f (c_{q}), μ_{2} \sum_{k = 1}^{q} f (c_{k})$ . So the asymptotic conditional bias of $\hat{m} (t_{0}) = \frac{1}{q} \sum_{k = 1}^{q} {\hat{a}}_{k}$ is

\begin{matrix} Bias (\hat{m} (t_{0}) | T) & = \frac{1}{q} σ (t_{0}) \sum_{k = 1}^{q} c_{k} - \frac{1}{q \cdot \sqrt{n h}} \frac{σ (t_{0})}{f_{T} (t_{0})} e_{q \times 1}^{T} {(S^{- 1})}_{11} E (W_{1 n}^{*} | T) \\ = \frac{1}{q} σ (t_{0}) \sum_{k = 1}^{q} c_{k} - \frac{1}{q \cdot n h} \frac{σ (t_{0})}{f_{T} (t_{0})} \sum_{i = 1}^{n} K_{i} \sum_{k = 1}^{q} \frac{1}{f (c_{k})} {F (c_{k} - \frac{d_{i, k}}{σ (t_{i})}) - F (c_{k})} . \end{matrix}

Note that the error is symmetric, thus $\sum_{k = 1}^{q} c_{k} = 0$ , and furthermore, it is easy to check that $\frac{1}{q} \sum_{k = 1}^{q} \frac{1}{f (c_{k})} {F (c_{k} - \frac{d_{i, k}}{σ (t_{i})}) - F (c_{k})} = - \frac{r_{i, p}}{σ (t_{i})} {1 + o_{p} (1)}$ . Therefore,

Bias (\hat{m} (t_{0}) | T) = \frac{1}{n h} \frac{σ (t_{0})}{f_{T} (t_{0})} \sum_{i = 1}^{n} K_{i} \frac{r_{i, p}}{σ (t_{i})} {1 + o_{p} (1)} .

By using the fact that

\frac{1}{n h} \sum_{i = 1}^{n} K_{i} \frac{r_{i, p}}{σ (t_{i})} = \frac{f_{T} (t_{0}) m ″ (t_{0})}{2 σ (t_{0})} μ_{2} h^{2} {1 + o_{p} (1)},

we obtain

Bias (\hat{m} (t_{0}) | T) = \frac{1}{2} m ″ (t_{0}) μ_{2} h^{2} + o_{p} (h^{2}) .

(5.6)

Furthermore, the conditional variance of m̂(t₀) is

\begin{matrix} Var (\hat{m} (t_{0}) | T) & = \frac{1}{n h} \frac{σ^{2} (t_{0})}{f_{T} (t_{0})} \frac{1}{q^{2}} e_{q \times 1}^{T} {(S^{- 1} Σ S^{- 1})}_{11} e_{q \times 1} + o_{p} (\frac{1}{n h}) \\ = \frac{1}{n h} \frac{ν_{0} σ^{2} (t_{0})}{f_{T} (t_{0})} R_{1} (q) + o_{p} (\frac{1}{n h}) . \end{matrix}

(5.7)

By using Theorem 5.1, we can further derive the asymptotic bias and variance of m̃′(t₀) given in (2.3):

Bias (\tilde{m}' (t_{0}) | T) = \frac{1}{6} (m ‴ (t_{0}) + 3 m ″ (t_{0}) \frac{{f'}_{T} (t_{0})}{f_{T} (t_{0})}) \frac{μ_{4}}{μ_{2}} h^{2} + o_{p} (h^{2}),

(5.8)

Var (\tilde{m}' (t_{0}) | T) = \frac{1}{n h^{3}} \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} R_{2} (q) + o_{p} (\frac{1}{n h^{3}}) .

(5.9)

Proof of Theorem 2.2

Note that

\begin{matrix} lim_{q \to \infty} R_{1} (q) & = \int_{0}^{1} \int_{0}^{1} \frac{s_{1} \land s_{2} - s_{1} s_{2}}{f (F^{- 1} (s_{1})) f (F^{- 1} (s_{2}))} d s_{1} d s_{2} \\ = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} (F (z_{1}) \land F (z_{2}) - F (z_{1}) F (z_{2})) d z_{1} d z_{2} . \end{matrix}

(5.10)

by change of variables. Define two functions below $G (s) = \int_{- \infty}^{s} F (t) d t and H (s) = \int_{- \infty}^{s} G (t) d t$ . It is easy to verify that

G (s) = \int_{- \infty}^{s} (s - x) f (x) d x = s F (s) - k_{1} (s),

(5.11)

where $k_{1} (s) = \int_{- \infty}^{s} x f (x) d x$ . Similarly, we obtain

2 H (s) = \int_{- \infty}^{s} {(s - x)}^{2} f (x) d x = s^{2} F (s) - 2 s k_{1} (s) + k_{2} (s),

(5.12)

where $k_{2} (s) = \int_{- \infty}^{s} x^{2} f (x) d x$ . Let I be the integral in (5.10). We have that I equals

2 \int_{- \infty}^{\infty} (\int_{z_{1}}^{\infty} f (t) d t) G (z_{1}) d z_{1} = 2 \int_{- \infty}^{\infty} f (t) (\int_{- \infty}^{t} G (z_{1}) d z_{1}) d t = \int_{- \infty}^{\infty} 2 f (t) H (t) d t .

(5.13)

By the definition of G and H, we know $\frac{d (2 H (t) F (t) - G^{2} (t))}{d t} = 2 H (t) f (t)$ ; and combining (5.11) and (5.12) yields $2 H (t) F (t) - G^{2} (t) = k_{2} (t) F (t) - k_{1}^{2} (t)$ . Now it is easy to see that I equals 1, by the facts that $\int_{- \infty}^{\infty} x^{2} f (x) d x = E_{F} [ϵ^{2}] = 1 and \int_{- \infty}^{\infty} x f (x) d x = E_{F} [ϵ] = 0$ .

Proof of Theorem 3.1

We apply Theorem 5.1 to get the asymptotic normality. Denote by e_r the p-vector (0, 0, ⋯, 1, 0, ⋯, 0)^T with 1 on the r^th position. When p = 2, S₁₂ has the following forms $S_{12} = (0_{q \times 1}, μ_{2} {(f (c_{k}))}_{q \times 1}), and S_{22} = diag (μ_{2} \sum_{k = 1}^{q} f (c_{k}), μ_{4} \sum_{k = 1}^{q} f (c_{k})) . Since S_{11} = diag (f (c_{1}), \dots, f (c_{q})), {(S^{- 1})}_{22} = {(S_{22} - S_{21} S_{11}^{- 1} S_{12})}^{- 1} = diag ({(μ_{2} \sum_{k = 1}^{q} f (c_{k}))}^{- 1}, {(μ_{4} - μ_{2}^{2}) \sum_{k = 1}^{q} f (c_{k})}^{- 1})$ .

Note that ${(S^{- 1})}_{21} = - {(S^{- 1})}_{22} S_{21} S_{11}^{- 1}$ . Thus, (S⁻¹)₂₁ equals ${(0_{q \times 1}, [μ_{2} / {(μ_{4} - μ_{2}^{2}) \sum_{k = 1}^{q} f (c_{k})}] 1_{q \times 1})}^{T}$ . By Theorem 5.1

\begin{matrix} Bias (\hat{m}' (t_{0}) | T) & = - \frac{σ (t_{0})}{h f_{T} (t_{0})} \frac{1}{\sqrt{n h}} e_{1}^{T} {{(S^{- 1})}_{21} E (W_{1 n}^{*} | T) + {(S^{- 1})}_{22} E (W_{2 n}^{*} | T)} \\ = - \frac{σ (t_{0})}{h f_{T} (t_{0})} \frac{1}{μ_{2} \sum_{k = 1}^{q} f (c_{k})} \frac{1}{\sqrt{n h}} E (w_{21}^{*} | T) . \end{matrix}

Note that $E (w_{2 j}^{*} | T) = \frac{1}{\sqrt{n h}} \sum_{i = 1}^{n} K_{i} x_{i}^{j} \sum_{k = 1}^{q} {F (c_{k} - \frac{d_{i, k}}{σ (t_{i})}) - F (c_{k})}$ . Similarly, under condition (D), we have $\sum_{k = 1}^{q} {F (c_{k} - \frac{d_{i, k}}{σ (t_{i})}) - F (c_{k})} = - \sum_{k = 1}^{q} f (c_{k}) \cdot \frac{r_{i, p}}{σ (t_{i})} {1 + o_{p} (1)}$ . Therefore, Bias(m̂′(t₀|T) is equal to $\frac{1}{n h^{2}} \frac{σ (t_{0})}{f_{T} (t_{0})} \sum_{i = 1}^{n} K_{i} x_{i} \frac{r_{i, p}}{σ (t_{i})} {1 + o_{p} (1)}$ . For p = 2,

\frac{1}{n h} \sum_{i = 1}^{n} K_{i} x_{i} \frac{r_{i, p}}{σ (t_{i})} = \frac{f_{T} (t_{0}) m ‴ (t_{0})}{6 σ (t_{0})} \frac{μ_{4}}{μ_{2}} h^{3} {1 + o_{p} (1)},

we obtain

B i a s (\hat{m}' (t_{0}) | T) = \frac{1}{6} m ‴ (t_{0}) \frac{μ_{4}}{μ_{2}} h^{2} + o_{p} (h^{2}) .

(5.14)

Furthermore, the conditional variance of m̂(t₀) is

\begin{matrix} Var (\hat{m}' (t_{0}) | T) & = \frac{1}{n h^{3}} \frac{σ^{2} (t_{0})}{f_{T} (t_{0})} e_{1}^{T} {(S^{- 1} Σ S^{- 1})}_{22} e_{1} + o_{p} (\frac{1}{n h^{3}}), \\ = \frac{1}{n h^{3}} \frac{ν_{2} σ^{2} (t_{0})}{μ_{2}^{2} f_{T} (t_{0})} R_{2} (q) + o_{p} (\frac{1}{n h^{3}}) . \end{matrix}

(5.15)

which completes the proof.

Proof of Theorem 3.2

From Zou & Yuan (2008), we know that

lim_{q \to \infty} {(\sum_{k = 1}^{q} f (c_{k}))}^{2} / (\sum_{k = 1}^{q} \sum_{k' = 1}^{q} τ_{k k'}) = 12 E_{F}^{2} [f (ϵ)] = 12 {(\int f^{2} (x) d x)}^{2} .

Thus ${lim}_{q \to \infty} \frac{1}{R_{2} (q)} = 12 {(\int f^{2} (x) d x)}^{2}$ . We notice that 12 (∫ f²(x)dx)² is also the asymptotic Pitman efficiency of the Wilcoxon test relative to the t-test Hodges & Lehmann (1956). For the rest of the proof, readers are referred to Hodges & Lehmann (1956).

6 Discussion

In this paper our theoretical analysis deals with the classical setting in which t₀ is an interior point and the error distribution has finite variance. We should point out here that the same arguments hold for estimating boundary points and the proposed methodology is valid even when the error variance is infinite.

Automatic boundary correction. For simplicity, consider t ϵ [0, 1] and t₀ = ch for some constant c. We show that the leading team of the asymptotic bias of the local linear/quadratic CQR estimator is the same as that of the local linear/quadratic LS estimator, which indicates that the local CQR estimator enjoys the property of automatical boundary correction, a nice property of local LS estimator. Furthermore, the asymptotic relative efficiency remains exactly the same as that for interior points.
Infinite error variance. We show that the local CQR estimator still enjoys the optimal rate of convergence and asymptotic normality even when the conditional variance is infinite. This property can be important for real applications, since we have no information on the error distribution in practice.

For detailed theoretical proof of the above claims, we refer interested readers to a supplementary file (Kai, Li & Zou 2009) of this paper, where we also provide additional simulation results to support the theory. We opt not to show these results here due to space limit.

In this paper, we focus on the local CQR estimate for the nonparametric regression model. The proposed methodology and theory may be extended to the settings in presence of multivariate covariates by considering varying coefficient models, additive models or semiparametric models. Such extensions are of great interest, and further research is needed for such extensions.

Finally, we would like to point out that the local CQR procedure is efficiently implemented using the MM algorithm. Our experiences show that for q = 9 and sample size n = 7000, the local CQR fit at a given location can be computed within 0.32 seconds on an AMD 1.9GHz machine. The MM implementation seems to be more efficient than the standard linear programming. We discuss the computing algorithm in details in a separate article.

Supplementary Material

LCQR

NIHMS115056-supplement-LCQR.pdf^{(1.4MB, pdf)}

Acknowledgements

The authors are grateful to the editor, the associate editor and two referees for their helpful and constructive comments, which lead to a substantial improvement of the quality of this paper.

Kai’s research is supported by NIDA, NIH grants R21 DA024260 and P50 DA10075 as a research assistant. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

Li’s research is supported by National Science Foundation grants DMS 0348869 and DMS 0722351.

Zou’s research is supported by National Science Foundation grant DMS 0706733.

References

Chu C-K, Marron JS. Choosing a kernel regression estimator. Statist. Sci. 1991;6(4):404–436. [Google Scholar]
Fan J. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 1992;87(420):998–1004. [Google Scholar]
Fan J, Gijbels I. Variable bandwidth and local linear regression smoothers. Ann. Statist. 1992;20(4):2008–2036. [Google Scholar]
Fan J, Gijbels I. Local polynomial modelling and its applications. London: Chapman & Hall; 1996. [Google Scholar]
Fan J, Hu TC, Truong YK. Robust non-parametric function estimation. Scand. J. Statist. 1994;21(4):433–446. [Google Scholar]
Hodges J, Lehmann E. The efficiency of some nonparametric competitors of the t-test. Ann. Math. Stat. 1956;27(2):324–335. [Google Scholar]
Hunter DR, Lange K. Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics. 2000;9(1):60–77. [Google Scholar]
Kai B, Li R, Zou H. Supplementary materials for “local cqr smoothing: An efficient and safe alternative to local polynomial regression”. 2009 doi: 10.1111/j.1467-9868.2009.00725.x. Technical report, http://www.stat.psu.edu/rli/research/Supplement-of-localCQR.pdf. [DOI] [PMC free article] [PubMed] [Google Scholar]
Knight K. Limiting distributions for L1 regression estimators under general conditions. Ann. Statist. 1998;26(2):755–770. [Google Scholar]
Koenker R. A note on l-estimates for linear models. Stat. and Prob. Letters. 1984;2(6):323–325. [Google Scholar]
Koenker R. Quantile regression. Cambridge: Cambridge University Press; 2005. [Google Scholar]
Parzen E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962;33:1065–1076. [Google Scholar]
Pollard D. Asymptotics for least absolute deviation regression estimators. Econometric Theory. 1991;7(2):186–199. [Google Scholar]
Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995;90(432):1257–1270. [Google Scholar]
Welsh AH. Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica. 1996;6(2):347–366. [Google Scholar]
Yu K, Jones MC. Local linear quantile regression. J. Amer. Statist. Assoc. 1998;93(441):228–237. [Google Scholar]
Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. The Annals of Statistics. 2008;36(3):1108–1126. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

LCQR

NIHMS115056-supplement-LCQR.pdf^{(1.4MB, pdf)}

[R1] Chu C-K, Marron JS. Choosing a kernel regression estimator. Statist. Sci. 1991;6(4):404–436. [Google Scholar]

[R2] Fan J. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 1992;87(420):998–1004. [Google Scholar]

[R3] Fan J, Gijbels I. Variable bandwidth and local linear regression smoothers. Ann. Statist. 1992;20(4):2008–2036. [Google Scholar]

[R4] Fan J, Gijbels I. Local polynomial modelling and its applications. London: Chapman & Hall; 1996. [Google Scholar]

[R5] Fan J, Hu TC, Truong YK. Robust non-parametric function estimation. Scand. J. Statist. 1994;21(4):433–446. [Google Scholar]

[R6] Hodges J, Lehmann E. The efficiency of some nonparametric competitors of the t-test. Ann. Math. Stat. 1956;27(2):324–335. [Google Scholar]

[R7] Hunter DR, Lange K. Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics. 2000;9(1):60–77. [Google Scholar]

[R8] Kai B, Li R, Zou H. Supplementary materials for “local cqr smoothing: An efficient and safe alternative to local polynomial regression”. 2009 doi: 10.1111/j.1467-9868.2009.00725.x. Technical report, http://www.stat.psu.edu/rli/research/Supplement-of-localCQR.pdf. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] Knight K. Limiting distributions for L1 regression estimators under general conditions. Ann. Statist. 1998;26(2):755–770. [Google Scholar]

[R10] Koenker R. A note on l-estimates for linear models. Stat. and Prob. Letters. 1984;2(6):323–325. [Google Scholar]

[R11] Koenker R. Quantile regression. Cambridge: Cambridge University Press; 2005. [Google Scholar]

[R12] Parzen E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962;33:1065–1076. [Google Scholar]

[R13] Pollard D. Asymptotics for least absolute deviation regression estimators. Econometric Theory. 1991;7(2):186–199. [Google Scholar]

[R14] Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995;90(432):1257–1270. [Google Scholar]

[R15] Welsh AH. Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica. 1996;6(2):347–366. [Google Scholar]

[R16] Yu K, Jones MC. Local linear quantile regression. J. Amer. Statist. Assoc. 1998;93(441):228–237. [Google Scholar]

[R17] Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. The Annals of Statistics. 2008;36(3):1108–1126. [Google Scholar]

PERMALINK

Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression

Bo Kai

Runze Li

Hui Zou

Roles

Abstract

1 Introduction

2 Estimation of regression function

Remark 1

Remark 2

2.1 Asymptotic properties

Theorem 2.1

Remark 3

2.2 Asymptotic relative efficiency

Table 1.

Theorem 2.2

3 Estimation of derivative

3.1 Asymptotic properties

Theorem 3.1

Remark 4

3.2 Asymptotic relative efficiency

Theorem 3.2

Table 2.

4 Numerical comparisons and examples

4.1 Bandwidth selection in practical implementation

4.2 Simulation examples

Example 4.1

Table 3.

Example 4.2

Table 4.

4.3 A real data example

Figure 1.

Figure 2.

5 Local p-polynomial CQR smoothing and proofs

Theorem 5.1

Lemma 5.2

Lemma 5.3

Proof of Theorem 5.1

Proof of Theorem 2.1

Proof of Theorem 2.2

Proof of Theorem 3.1

Proof of Theorem 3.2

6 Discussion

Supplementary Material

Acknowledgements

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases