Abstract
We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments demonstrate its superior performance over existing methods.
Keywords: Asymptotic normality, Bahadur representation, Efficiency, Fisher information, Quantile regression, Super-efficiency
1 Introduction
For regression estimation, the most widely used approach is the least squares (LS) method (for finite-dimensional models) or the local LS method (in infinite-dimensional settings). If the data are normally distributed, the LS estimator has a likelihood interpretation and is the most efficient estimator. In the absence of Gaussianity, the LS estimator is usually less efficient than methods that exploit the distributional information, although it may still be consistent under appropriate regularity conditions. Without these regularity conditions, the LS estimator may not even be consistent, for example, when the data have a heavy-tailed distribution such as the Cauchy distribution. Monte Carlo evidence indicates that the LS estimator can be quite sensitive to outliers. In empirical analysis, many applications (such as finance and economics) involve data with heavy-tailed or skewed distributions, and the LS estimators may perform poorly in these cases. It is therefore important to develop robust and efficient estimation procedures for general innovation distributions.
If the underlying distribution were known, the Maximum Likelihood Estimator (MLE) could be constructed. Under regularity conditions, the MLE is asymptotically normal and asymptotically efficient in the sense that the limiting covariance matrix attains the Cramér-Rao lower bound. In practice, the true density function is generally unknown and so the MLE is not feasible. Nevertheless, the MLE and the Cramér-Rao bound serve as a benchmark against which we can measure our estimators.
For this reason, statisticians have devoted a great deal of research effort to the construction of estimation procedures that can extract distributional information from the data and thus deliver more efficient estimators than the conventional LS method. For the location model Y = α + ε, where ε has a symmetric density, adaptive likelihood or score function based estimators of α were constructed in Beran (1974) and Stone (1975). Bickel (1982) further extended the idea to slope estimation in classical linear models. For nonlinear models, adaptive likelihood-based estimation is usually technically challenging.
We believe that the quantile regression technique [Koenker and Bassett (1978); Koenker (2005)] provides a useful approach to efficient statistical estimation. Intuitively, an estimation method that exploits the distributional information can potentially provide more efficient estimators. Since quantile regression provides a way of estimating the whole conditional distribution, appropriately using quantile regressions may improve estimation efficiency. Under regularity assumptions, the least-absolute-deviation (LAD) regression (i.e., quantile regression at the median) can provide better estimators than the LS regression in the presence of heavy-tailed distributions. In addition, for certain distributions, a quantile regression at a non-median quantile may deliver a more efficient estimator than the LAD method. More importantly, additional efficiency gains can be achieved by combining information over multiple quantiles.
Although combining quantile regressions over multiple quantiles can potentially improve estimation efficiency, this is much easier said than done in a satisfactory way. To combine information from quantile regressions, one may consider combining information over different quantiles via the criterion or loss function. For example, Zou and Yuan (2008) and Bradic, Fan and Wang (2011) proposed the composite quantile regression (CQR) for parameter estimation and variable selection in classical linear regression models. For nonparametric regression models, Kai, Li and Zou (2010) proposed a local CQR estimation procedure, which is asymptotically equivalent to the local LS estimator as the number of quantiles increases. Alternatively, one may combine information based on estimators at different quantiles. Along this direction, Portnoy and Koenker (1989) studied asymptotically efficient estimation for the simple linear regression model. Although their estimator is asymptotically efficient, it is not the best estimator for a fixed set of quantiles. Also see Chamberlain (1994), Xiao and Koenker (2009), and Chen, Linton and Jacho-Chavez (2011) for related work on the combination of estimators.
In this paper we consider regression estimation by combining information across k quantiles τj = j/(k + 1), j = 1, …, k. We show that for a wide range of parametric and nonparametric regression models, more efficient estimators can be constructed via optimally combining quantile regressions. We argue that it is essential to combine quantile information appropriately to achieve an efficiency gain. In particular, simply averaging multiple quantile regression estimators is asymptotically equivalent to the LS method. We show that, by optimally combining information across quantiles τ1, …, τk, the efficiency of the proposed optimal weighted quantile average estimator is at most Φk away from the Fisher information, where Φk is defined in (43). As the number of quantiles k → ∞, under appropriate regularity conditions, we have Φk → 0 and the estimator is asymptotically efficient. Interestingly, in the case of non-regular statistical estimation when these regularity conditions do not hold, the proposed estimators may lead to super-efficient estimation.
The proposed methodology provides a generally applicable framework for constructing more efficient estimators under a broad variety of settings. For finite-dimensional parametric estimation, the method can be applied to construct efficient estimators for parameters in both linear and nonlinear regression models with homoscedastic errors, and for parameters in location-scale models with conditional heteroscedasticity. We show that, in the presence of conditional heteroscedasticity, an appropriate preliminary quantile regression is needed to improve the efficiency and to facilitate the quantile combination. Different restrictions (and thus optimal weights) are needed for estimation of the location parameters and the scale parameters. For nonparametric function estimation, the asymptotic bias of the proposed estimator is the same as that of conventional nonparametric estimators (such as the local LS and the local LAD estimators), while the inverse of the asymptotic variance is at most Φk away from the optimal Fisher information. Our extensive simulation studies show that the proposed method significantly outperforms the widely used LS, LAD, and CQR methods [Zou and Yuan (2008); Kai, Li and Zou (2010)] for both parametric and nonparametric models.
The rest of this paper is organized as follows: we provide a general discussion on the framework and assumptions for constructing efficient estimators based on quantile regressions in Section 2. Three leading cases of regressions are then investigated in Sections 3–5. In particular, we study parametric regression models with homoscedasticity in Section 3. In Section 4, we study parametric models with heteroscedasticity. Nonparametric models are investigated in Section 5. We focus on methodology, and our discussions in Sections 3–5 consider the case with finite k. The case with k increasing to infinity is discussed in Section 6. Estimation of the optimal weights is discussed in Section 7. Simulation studies are contained in Section 8, and an application to financial data is given in Section 9 to highlight the proposed method. Proofs are given in the Appendix.
2 Model Setup, Assumptions, and the Weighted Quantile Average Estimator
We consider regression models of the following form:
Y = m(X) + σ(X)ε,  (1)
where (X, Y, ε) is the triplet of covariate, response, and noise, with ε independent of X. Here m(·) and σ(·) are two functions that depend on unknown parameters θ, where θ may be of finite dimension (parametric case) or infinite dimension (nonparametric case).
Denote by Qε(τ) the τ-th quantile of ε for τ ∈ (0, 1). Then the τ-th conditional quantile of Y given X, denoted by QY (τ|X), is given by
QY(τ|X) = m(X) + σ(X)Qε(τ).  (2)
As the inverse of the conditional distribution function, QY(τ|X) fully captures the distributional relationship between Y and X. Intuitively, different distributional information may be obtained from different quantiles, and an appropriate combination of multiple quantiles may be more informative about the distribution than the conditional mean used by the LS method.
Throughout this paper we consider combining information over k equally spaced quantiles τj = j/(k + 1), j = 1, …, k. The discussion of this paper focuses on the case where k is a given finite number. The case where k increases with n is considered in Section 6.3.
We briefly introduce the idea of our proposed estimator. Let θ be a parameter of interest. From the conditional quantile in (2), we can usually identify some perturbed version of θ, denoted by θ(τ). Suppose there exists a class 𝒲 of weights such that
θ = ω1θ(τ1) + ⋯ + ωkθ(τk) for each ω = [ω1, …, ωk]T ∈ 𝒲.  (3)
Given data on (X, Y), we can use a quantile regression based on (2) to obtain some consistent estimate, denoted by θ̂(τ), of θ(τ). In light of (3), we propose the following estimate of θ:
θ̂WQAE(ω) = Σj=1k ωj θ̂(τj), ω ∈ 𝒲.  (4)
We term θ̂WQAE(ω) the weighted quantile average estimator (WQAE) of θ. Since θ̂WQAE(ω) is a consistent estimate of θ for each ω ∈ 𝒲, we propose estimating θ using the ω ∈ 𝒲 that minimizes the limiting variance of θ̂WQAE(ω) in parametric settings, or the asymptotic mean squared error in nonparametric settings. We call the resulting estimator θ̂WQAE(ω*) the optimal WQAE (OWQAE).
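As an illustration, here is a minimal R sketch of the combination step (4); the quantile-regression step producing the θ̂(τj) is assumed to have been run already, and all function and variable names are ours, not the paper's.

```r
## A minimal sketch of the combination step (4): given quantile-regression
## estimates theta_hat(tau_j) and weights omega, form the weighted average.
## Names are illustrative only.
wqae <- function(theta_hat, omega) {
  stopifnot(length(theta_hat) == length(omega))
  sum(omega * theta_hat)   # theta_WQAE(omega) = sum_j omega_j * theta_hat(tau_j)
}

k <- 9
omega <- rep(1 / k, k)                       # uniform weights as a baseline
theta_hat <- rnorm(k, mean = 2, sd = 0.1)    # stand-in for quantile estimates
wqae(theta_hat, omega)
```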
The proposed estimation can be applied to a wide range of regression models. In this paper, we focus on the following three leading cases of regression (1):
Case 1: Parametric regression models with homoscedastic errors.
Case 2: Location-scale models with conditional heteroscedasticity in σ(·).
Case 3: Nonparametric regressions.
We study each of these cases in Sections 3–5, respectively.
Suppose we have samples {(Xt, Yt)}t=1n from (1) with corresponding noises {εt}t=1n. To facilitate the study of asymptotic theory, we impose the following assumptions.
Assumption 1 (i) {(Xt, εt)}t∈ℤ is strictly stationary; for each t, εt is independent of {Xt, Xt−1, …; εt−1, εt−2, …}. (ii) {Xt}t∈ℤ is an ergodic process.
Assumption 2 Denote by fε(·) and Fε(·) the density and distribution functions of ε. fε is positive, twice differentiable, and bounded on {u : 0 < Fε(u) < 1}.
Assumption 1 provides a convenient framework for studying asymptotic theory. First, the strong mixing condition implies ergodicity, and thus Assumption 1(ii) is weaker than the widely used strong mixing conditions in time series analysis. Next, we illustrate two useful properties below.
- (P1) Martingale structure. Let ℱt be the σ-algebra generated by {Xt+1, Xt, …; εt, εt−1, …}. By Assumption 1(i), εt is independent of ℱt−1. For any functions h1(·) and h2(·) such that 𝔼[|h1(Xt)h2(εt)|] < ∞ and 𝔼[h2(εt)] = 0, we have 𝔼[h1(Xt)h2(εt)|ℱt−1] = h1(Xt)𝔼[h2(εt)] = 0. Therefore, {h1(Xt)h2(εt)}t∈ℤ are martingale differences with respect to the filtration {ℱt}t∈ℤ, and consequently the partial sum Σt=1n h1(Xt)h2(εt) is a martingale.
- (P2) Law of large numbers. By the ergodic theorem [Theorem 3.5.7, Stout (1974)], for any function ℓ(·) such that 𝔼[|ℓ(Xt)|] < ∞, the ergodicity in Assumption 1(ii) implies
(1/n) Σt=1n ℓ(Xt) → 𝔼[ℓ(X1)] almost surely.  (5)
In subsequent sections we adopt the following notation. For a random vector Z, we use ‖Z‖ to signify the Euclidean norm of Z, and write Z ∈ ℒq, q > 0, if 𝔼(‖Z‖q) < ∞.
3 Homoscedastic Parametric Regression Models
In this section we study the parametric regression model (Case 1 in Section 2). Corresponding to the general representation (1), let σ(X) ≡ 1 and m(X) = α + m(X, β), where β ∈ ℝp is the vector of unknown parameters and the intercept α is added to ensure the identifiability of β. We then have
Y = α + m(X, β) + ε.  (6)
We are interested in the estimation of β.
By (2), we have the conditional quantile representation
QY(τ|X) = α(τ) + m(X, β), where α(τ) = α + Qε(τ).  (7)
Given samples {(Xt, Yt)}t=1n from (6), let (α̂(τ), β̂(τ)) be an estimator of (α(τ), β) from a quantile regression based on (7):
(α̂(τ), β̂(τ)) = argmin(a,b) Σt=1n ρτ(Yt − a − m(Xt, b)),  (8)
where ρτ(z) = z(τ − 1z≤0) and 1 is the indicator function. Denote by ṁ(x, β) the partial derivative vector of m(x, β) with respect to β. Define Zt = [1, ṁ(Xt, β)T]T and Dn = (1/n) Σt=1n ZtZtT.
Similar to the linear regression, Dn serves as the design matrix and Zt is the equivalent covariate corresponding to observation t for the quantile regression (8). A leading example is the classical linear regression model [see, e.g., Koenker (1984)], corresponding to m(X, β) = XT β. In this case, QY(τ|X) = [α + Qε(τ)] + XT β and ṁ(X, β) = X.
Assumption 3 The quantile regression estimator has the Bahadur representation
[α̂(τ) − α(τ), (β̂(τ) − β)T]T = [fε(Qε(τ))]−1 Dn−1 (1/n) Σt=1n Zt[τ − 1εt≤Qε(τ)] + op(n−1/2),  (9)
uniformly in τ ∈ 𝒯 ≔ [δ, 1 − δ] with some small constant δ > 0.
Assumption 3 is an asymptotic representation of the quantile regression estimator. Under regularity conditions on the regression function m(·), error density, and the parameter space, a Bahadur representation can be obtained over τ on a subset of [0,1]. See, e.g., Portnoy and Koenker (1989), Jurečková and Procházka (1994), and He and Shao (1996) for related study. Also see Section 4 for discussions on the conditional heteroscedastic parametric models.
Since β does not depend on τ, we can use β̂(τ) to estimate β with any choice of τ. By Theorem 1 below (also, see the definition of Σβ there),
√n[β̂(τ) − β] ⇒ N(0, τ(1 − τ)fε(Qε(τ))−2 Σβ−1).  (10)
For example, the case with τ = 0.5 corresponds to the median quantile regression or LAD estimation of β.
As discussed in Section 2, we want to combine information over the k quantiles τj = j/(k + 1), j = 1, …, k, where k is assumed to be a given finite number such that τj ∈ 𝒯. Since β̂(τ) is a consistent estimate of β, from (3)–(4), we consider the WQAE of β:
β̂WQAE(ω) = Σj=1k ωj β̂(τj), where ω = [ω1, …, ωk]T satisfies Σj=1k ωj = 1.  (11)
Theorem 1 Suppose Assumptions 1–3 hold and ṁ(X, β) ∈ ℒ2. Then
√n[β̂WQAE(ω) − β] ⇒ N(0, S(ω)Σβ−1),  (12)
with Σβ = 𝔼[ṁ(X, β)ṁ(X, β)T] − 𝔼[ṁ(X, β)]𝔼 [ṁ(X, β)T] assumed to be non-singular, and
S(ω) = ωTHω, where H = (Hij)1≤i,j≤k with Hij = [min(τi, τj) − τiτj] / [fε(Qε(τi)) fε(Qε(τj))].  (13)
The proposed estimator, the OWQAE, of β is obtained by choosing ω to minimize the asymptotic variance of β̂WQAE(ω).
Theorem 2 Under the assumptions of Theorem 1, the optimal weight is
ω* = H−11k / (1kTH−11k), where 1k = [1, …, 1]T.  (14)
With ω* in (14), the OWQAE of β has the following limiting distribution:
√n[β̂WQAE(ω*) − β] ⇒ N(0, Ωk−1Σβ−1), where Ωk = 1kTH−11k = 1/S(ω*).  (15)
Remark 1: A quick way of combining quantile information is to take a simple average of the quantile regression estimators. This is easy to implement and has been used in the literature [see, e.g., Kai, Li and Zou (2010) for nonparametric estimation] as a method of combining quantile information. If we use ω = [1/k, …, 1/k]T in (11), the resulting unweighted estimator has the asymptotic normality in Theorem 1 with S(ω) replaced by
Rk = (1/k2) Σi=1k Σj=1k [min(τi, τj) − τiτj] / [fε(Qε(τi)) fε(Qε(τj))].  (16)
Clearly, Rk ≥ S(ω*). See Section 6.2 for more discussions on the property of Rk.
We compute ω* for some examples below using k = 9 quantiles 0.1, 0.2, …, 0.9.
Example 1. Let ε be Student-t distributed. For t1 (the Cauchy distribution), the optimal weight is ω* = {−0.03, −0.04, 0.08, 0.29, 0.40, 0.29, 0.08, −0.04, −0.03}: quantiles τ = 0.4, 0.5, 0.6 contribute almost all of the information, whereas quantiles τ = 0.1, 0.2, 0.8, 0.9 receive negative weights, so the unweighted quantile average estimator would not perform well. For N(0,1), by contrast, ω* = {0.13, 0.11, 0.11, 0.10, 0.10, 0.10, 0.11, 0.11, 0.13} is close to the uniform weights, and thus the OWQAE, the unweighted quantile average estimator, and the LS estimator have comparable performance.
Example 2. Let ε have a normal mixture distribution. For Mixture 1: 0.5N(0,1) + 0.5N(0,0.56) (different variances), ω* = {−0.002, −0.102, 0.183, 0.277, 0.287, 0.277, 0.183, −0.102, −0.002}: quantiles 0.3, …, 0.7 contain substantial information, whereas quantiles 0.2 and 0.8 receive negative weights. For Mixture 2: 0.5N(−2,1) + 0.5N(2,1) (different means), ω* = {0.185, 0.156, 0.153, 0.078, −0.144, 0.078, 0.153, 0.156, 0.185}: quantiles τ = 0.1, 0.2, 0.3, 0.7, 0.8, 0.9 contribute comparably, while the median performs the worst.
Example 3. Let ε be a Gamma random variable with shape parameter d > 0. For d = 1 (the exponential distribution), almost all of the weight is placed on the low quantiles: the optimal weights on τ1 = 0.1 and τ2 = 0.2 dominate, and the weights ωi* for i = 3, …, 9 are negligible. Quantiles 0.1 and 0.2 contain almost all information.
As shown in Examples 1–3 above, different quantiles may carry substantially different amounts of information, and utilizing such information inappropriately may result in a significant loss of efficiency. This phenomenon provides strong evidence in favor of our proposed optimally weighted quantile based estimators.
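To illustrate how such weights arise, the following R sketch computes ω* in (14) for a known error distribution using the matrix H from (13); the function names are ours, and the printed weights should roughly match Examples 1–2 up to rounding.

```r
## Sketch: the optimal weights omega* in (14) for a known error distribution,
## with H_{ij} = [min(tau_i, tau_j) - tau_i*tau_j] / [g(tau_i) * g(tau_j)]
## and g(t) = f_eps(Q_eps(t)) as in (13). Function names are ours.
optimal_weights <- function(taus, g) {
  gt <- g(taus)
  H <- (outer(taus, taus, pmin) - outer(taus, taus)) / outer(gt, gt)
  w <- solve(H, rep(1, length(taus)))   # H^{-1} 1_k
  w / sum(w)                            # omega* = H^{-1} 1_k / (1_k' H^{-1} 1_k)
}

taus <- (1:9) / 10
round(optimal_weights(taus, function(t) dt(qt(t, df = 1), df = 1)), 2)  # Cauchy (t_1)
round(optimal_weights(taus, function(t) dnorm(qnorm(t))), 2)            # N(0,1)
```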
In practice, the optimal weight ω* in (14), which depends on the sparsity or quantile-density function fε(Qε(τ)), needs to be estimated. We make the following assumption on the estimate, denoted by f̂ε(Q̂ε(τ)), of fε(Qε(τ)).
Assumption 4 supτ∈𝒯 |f̂ε(Q̂ε(τ)) − fε(Qε(τ))| = op(1) for 𝒯 in Assumption 3.
Plugging the consistent estimate f̂ε(Q̂ε(τ)) of fε(Qε(τ)) into the matrix H in (14), we can obtain the following consistent estimate of the optimal weight ω*:
ω̂* = Ĥ−11k / (1kTĤ−11k), where Ĥ is the matrix H in (13) with fε(Qε(τj)) replaced by f̂ε(Q̂ε(τj)).  (17)
Theorem 3 below asserts that β̂WQAE(ω̂*) with the estimated weight ω̂* achieves the same efficiency as β̂WQAE(ω*).
Theorem 3 Under the assumptions of Theorem 1 and Assumption 4, we have
√n[β̂WQAE(ω̂*) − β] ⇒ N(0, Ωk−1Σβ−1).  (18)
4 The Location-Scale Models
Another class of widely used regression models is the location-scale model (Case 2 in Section 2), which allows for conditional heteroscedasticity. There is a large literature in econometrics and statistics on location-scale models. Koenker and Zhao (1994) studied L-estimation of a location-scale model of the following form:
Yt = XtTβ + (XtTγ)εt,  (19)
under the condition XtTγ > 0. Zhao (2001) studied asymptotically efficient median regression using the k-nearest neighbors method. In this section we study the location-scale models via optimal quantile combination.
In the model (19), the positivity constraint XtTγ > 0 is somewhat restrictive for flexible applications. For example, it is violated for normally distributed covariates X. For this reason, many researchers consider an alternative form of σt that can be expressed as a linear function of absolute values of the regressors and other variables:
Yt = XtTβ + (UtTγ)εt,  (20)
where Ut is a vector of absolute values of the regressors and other covariates [see, e.g., Koenker and Zhao (1996) for studies on related models]. For example, with Xt = (Xt1, …, Xtp)T, one may consider a location-scale model with σt = γ0 + γ1|Xt1| + ⋯ + γp|Xtp|, where Ut = (1, |Xt1|, …, |Xtp|)T, γ = (γ0, γ1, ⋯, γp)T, γ0 > 0, γ1 ≥ 0, ⋯, γp ≥ 0.
In this section we consider the location-scale regression model (20). We are interested in the estimation of β and γ. By (2), we have the conditional quantile representation
QY(τ|X, U) = XTβ + UTγ(τ), where γ(τ) = γQε(τ).  (21)
Given a sample of size n, we may estimate (β, γ(τ)) using a quantile regression similar to (8). However, in the presence of conditional heteroscedasticity, it is more efficient to use a weighted quantile regression with weights reflecting the conditional heteroscedasticity. In addition, the weighted quantile regression estimates have nice properties that help combine quantile information.
Thus, following the idea of Koenker and Zhao (1994), we consider the weighted quantile regression:
(β̂(τ), γ̂(τ)) = argmin(b,g) Σt=1n (1/σ̃t) ρτ(Yt − XtTb − UtTg),  (22)
where σ̃t = UtTγ̃ and γ̃ is a consistent estimate of γ.
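A hedged R sketch of the weighted quantile regression (22) using quantreg::rq follows, for an illustrative scalar-covariate model; the preliminary scale estimate is formed from absolute LAD residuals in the spirit of the LAD method (63) in Section 8.3, and all simulated values and variable names are ours.

```r
## Illustrative sketch of the weighted quantile regression (22) with
## quantreg::rq for the model Y = X*beta + (gamma0 + gamma1*|X|)*eps.
library(quantreg)

set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.6 * x + (0.5 + 1.0 * abs(x)) * rnorm(n)

## Preliminary scale estimate: median regression of absolute LAD residuals
## on (1, |x|); this recovers sigma_t up to a positive scale factor, which
## is all the weighting needs.
fit_lad <- rq(y ~ x, tau = 0.5)
fit_scale <- rq(abs(resid(fit_lad)) ~ abs(x), tau = 0.5)
sigma_tilde <- pmax(fitted(fit_scale), 1e-3)   # keep weights well-defined

## Weighted quantile regression (22): observation weights 1/sigma_tilde.
## "x" estimates beta; the intercept and "abs(x)" terms estimate gamma(tau).
fit_w <- rq(y ~ x + abs(x), tau = 0.3, weights = 1 / sigma_tilde)
coef(fit_w)
```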
Assumption 5 (i) {(Xt, Ut, εt)}t∈ℤ is strictly stationary; for each t, εt is independent of {(Xt, Ut), (Xt−1,Ut−1), …; εt−1, εt−2, …}. (ii) {(Xt, Ut)}t∈ℤ is an ergodic process.
Assumption 6 (i) ‖Xt‖ + ‖Ut‖ ≤ c1 and UtTγ ≥ c2 for some constants c1, c2 > 0. (ii) γ̃ − γ = op(n−1/4). (iii) Let (X, U) be distributed as (Xt, Ut), and define the moment matrices M1, M3, Mβ and Mγ accordingly. These matrices are assumed to be non-singular.
Assumption 5 is a modification of Assumption 1 that allows for more covariates (Xt, Ut). In Assumption 6, (i) is imposed for technical convenience and can be replaced by suitable finite moment conditions; (ii) requires that γ̃ be reasonably close to γ; and (iii) rules out singular design matrices.
Theorem 4 Suppose Assumptions 2, 5, and 6 hold. Then (β̂(τ), γ̂(τ)) is root-n consistent and admits the Bahadur representation (23).
We now construct estimators of β and γ by optimally combining information over quantiles τ1, …, τk.
First, we consider estimation of β. As in Section 3, we consider the WQAE β̂WQAE(ω) given by (11). Using the Bahadur representation (23) from Theorem 4, the same argument as in Theorem 1 shows that β̂WQAE(ω) is asymptotically normal with asymptotic variance determined by S(ω) given in Theorem 1. Therefore, the optimal weight can be constructed in the same way as in Theorem 2, and ω* is given by (14). The resulting OWQAE has a limiting normal distribution whose asymptotic variance is proportional to Ωk−1, with Ωk given in (15). If we use the estimated optimal weight ω̂* in (17), then under the additional Assumption 4, the conclusion in Theorem 3 also holds here.
Next, we consider estimation of the scale parameter γ via quantile combination. As will be clear from the later analysis, the construction of the WQAE and the choice of optimal weights for the scale parameter differ from those for β. For this reason, we denote the weights used in the γ estimation by π = [π1, …, πk]T. From (21)–(22), γ̂(τ) is an estimate of γ(τ) = γQε(τ). Then, for any π satisfying Σj=1k πjQε(τj) = 1, we have Σj=1k πjγ(τj) = γ. Therefore, we propose the following WQAE of γ:
γ̂WQAE(π) = Σj=1k πj γ̂(τj).  (24)
Theorem 5 Under the assumptions in Theorem 4, γ̂WQAE(π) is asymptotically normal with asymptotic variance determined by S(π) = πTHπ, with H defined in (13). Furthermore, the optimal weight is
π* = H−1q / (qTH−1q), where q = [Qε(τ1), …, Qε(τk)]T.  (25)
With π* in (25), the OWQAE of γ has a limiting normal distribution with asymptotic variance proportional to Λk−1, where
Λk = qTH−1q.  (26)
Therefore, the optimal weights for the OWQAE of β and γ are different, and their corresponding OWQAEs have different efficiency. This is due to the structure of the conditional quantile representation (21): β does not depend on the quantile τ whereas γ relies on τ through the coefficient Qε(τ).
Similar to the case of β, the conclusion in Theorem 3 also holds for γ̂WQAE(π̂*) when we use the estimated optimal weight π̂* obtained by plugging consistent estimates of (q, H) into (25).
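For concreteness, a small R sketch computing π* in (25) for a standard normal error; the q vector and H matrix follow (25) and (13), and the names are ours.

```r
## Sketch: the optimal scale weights pi* in (25) for eps ~ N(0,1), using
## q = (Q_eps(tau_1), ..., Q_eps(tau_k))' and H from (13). Names are ours.
taus <- (1:9) / 10
gt <- dnorm(qnorm(taus))                 # g(tau) = f_eps(Q_eps(tau))
H  <- (outer(taus, taus, pmin) - outer(taus, taus)) / outer(gt, gt)
q  <- qnorm(taus)                        # error quantiles Q_eps(tau_j)
pi_star <- solve(H, q) / drop(crossprod(q, solve(H, q)))  # H^{-1}q / (q'H^{-1}q)
round(pi_star, 3)   # unlike omega*, these weights need not sum to one
```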
To implement the weighted quantile regression (22), we need to find a consistent estimate γ̃ of γ. We propose the following procedure:
- For each quantile τ = τ1, …, τk, fit the unweighted quantile regression
(β̃(τ), γ̃(τ)) = argmin(b,g) Σt=1n ρτ(Yt − XtTb − UtTg).  (27)
By the same argument as in Theorem 4, (β̃(τ), γ̃(τ)) = (β, γ(τ)) + Op(n−1/2).
- Combine the fits {(β̃(τj), γ̃(τj))}j=1k across quantiles to obtain the preliminary estimator γ̃; see (53) in Section 7.
Finally, we point out an identifiability issue of the optimal weight π* in (25). Since Qε(τ) is identifiable up to a scale factor, if we multiply Qε(τ) by a constant c, π* and hence γ̂WQAE(π*) in (24) will be multiplied by a factor 1/c. This is due to the non-identifiability of the parameter γ in (20). To ensure identifiability, we may impose some constraint on ε; see Section 7 for more discussions on estimating π*.
5 Nonparametric Regressions
In this section we study the nonparametric regression (Case 3 in Section 2). We assume that both m(·) and σ(·) in (1) are nonparametric functions, and we are interested in the estimation of m(·). Although our theory is also applicable in the multivariate case, to avoid the "curse of dimensionality" we consider the univariate case X ∈ ℝ.
Recall the conditional quantile QY (τ|X) in (2). Without further assumptions, we cannot identify m(X) from QY(τ|X) at a single quantile. To ensure identifiability, we assume that ε has a symmetric density, which is satisfied for many commonly used distributions, such as normal distribution, Student-t distribution, Cauchy distribution, uniform distribution on a symmetric interval, Laplace distribution, symmetric stable distribution, many normal mixture distributions, and their truncated versions on symmetric intervals.
Consider weights ω1, …, ωk satisfying the constraints
Σj=1k ωj = 1 and ωj = ωk+1−j, j = 1, …, k.  (29)
Under the symmetric density assumption above, Qε(τ) + Qε(1 − τ) = 0. Therefore, with quantiles τj = j/(k + 1) and using (2) and (29), we have
m(X) = Σj=1k ωj QY(τj|X).  (30)
This identity suggests estimation of m(·) by plugging in consistent estimation of QY (τj|X).
Given samples {(Xt, Yt)}t=1n, we can estimate QY(τ|x) by the local linear quantile regression [Yu and Jones (1998)]:
Q̂Y(τ|x) = â, where (â, b̂) = argmin(a,b) Σt=1n ρτ(Yt − a − b(Xt − x)) K((Xt − x)/h),  (31)
for a kernel function K(·) and bandwidth h. From (30), we propose the WQAE of m(x):
m̂WQAE(x|ω) = Σj=1k ωj Q̂Y(τj|x).  (32)
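A minimal R sketch of (31)–(32) using quantreg::rq with kernel weights is given below. We use a Gaussian kernel purely for simplicity (Assumption 7 below assumes a bounded-support kernel), and all names are ours.

```r
## Sketch of the local linear quantile regression (31) and the WQAE (32)
## at a point x0, via quantreg::rq with kernel weights.
library(quantreg)

llqr <- function(x, y, x0, tau, h) {
  w <- dnorm((x - x0) / h)                    # kernel weights K((X_t - x0)/h)
  fit <- rq(y ~ I(x - x0), tau = tau, weights = w)
  unname(coef(fit)[1])                        # a_hat = estimated Q_Y(tau | x0)
}

m_wqae <- function(x, y, x0, taus, omega, h) {
  qhat <- sapply(taus, function(tau) llqr(x, y, x0, tau, h))
  sum(omega * qhat)                           # (32): sum_j omega_j Q_hat(tau_j|x0)
}
```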
Assumption 7 (i) fε is symmetric, positive, and twice continuously differentiable on its support; the density function pX(·) > 0 of X is differentiable, m(·) is three times differentiable, and σ(·) > 0 is differentiable, in the neighborhood of x. (ii) nh → ∞ and nh9 → 0. (iii) K(·) integrates to one, is symmetric, and has bounded support. Write μK = ∫ u2K(u)du and φK = ∫ K(u)2du.
Theorem 6 Suppose Assumptions 1 and 7 hold. Let S(ω) be defined in (13). Then
√(nh)[m̂WQAE(x|ω) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)S(ω)/pX(x)).  (33)
Furthermore, ω* in (14) minimizes S(ω) subject to the constraints (29), and
√(nh)[m̂WQAE(x|ω*) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)/[pX(x)Ωk]),  (34)
where Ωk is defined in (15).
For comparison, we briefly review some alternative nonparametric estimation methods. The widely used local linear LS regression estimator, denoted by m̂LS(x), is obtained by replacing the quantile loss ρτ(·) in (31) with the square loss. If 𝔼(εt) = 0 and var(εt) < ∞, under some regularity conditions [Fan and Gijbels (1996)],
√(nh)[m̂LS(x) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)var(ε)/pX(x)).  (35)
When εt’s are Gaussian, the local LS estimation corresponds to the local likelihood criterion. In the absence of Gaussianity, asymptotic results of m̂LS(x) generally still hold but this estimator is less efficient in terms of mean-squared error than estimators that exploit the distributional information. For heavy-tailed data, local quantile regression is a robust estimation method; see, e.g., Yu and Jones (1998). The local median regression estimator, denoted by m̂LAD(x), corresponds to τ = 0.5 in (31). By Theorem 6,
√(nh)[m̂LAD(x) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)/[4fε(0)2pX(x)]).  (36)
Recently, Kai, Li and Zou (2010) proposed a local composite quantile regression (CQR) estimator which takes a simple average of multiple quantile estimations. The local CQR estimator, denoted by m̂CQR(x), has the asymptotic normality
√(nh)[m̂CQR(x) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)Rk/pX(x)),  (37)
where Rk is defined in (16). Intuitively, m̂LS(x) uses information from the local sample average, m̂LAD(x) uses information from the local sample median, m̂CQR(x) uses information from multiple quantiles with uniform weight, and the proposed OWQAE m̂WQAE(x|ω*) combines information from multiple quantiles optimally.
If the error density fε were known, we could replace the quantile loss ρτ(·) in (31) by the log-likelihood log fε(Yt − a − b(Xt − x)) and obtain a likelihood-based estimator, denoted by m̂MLE(x); see, e.g., Fan, Farmen and Gijbels (1998). Under appropriate conditions,
√(nh)[m̂MLE(x) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)/[pX(x)ℐ(fε)]),  (38)
where ℐ(fε) is the Fisher information of fε. Under some regularity conditions, the local likelihood estimator is the most efficient estimator. In practice, fε is unknown and m̂MLE(x) is infeasible. In Section 6.2, it is shown that Ωk → ℐ(fε), and therefore the optimal WQAE m̂WQAE(x|ω*) achieves the same asymptotic efficiency as the infeasible estimator m̂MLE(x).
We now compare the efficiency of m̂WQAE(x|ω*) to m̂LS(x), m̂LAD(x), and m̂CQR(x). From (34)–(37), all four estimators have the asymptotic normality √(nh)[m̂(x) − m(x) − m″(x)μKh2/2] ⇒ N(0, φKσ2(x)s2/pX(x)) with different s2: s2 = Ωk−1 for m̂WQAE(x|ω*), s2 = var(ε) for m̂LS(x), s2 = 1/[4fε(0)2] for m̂LAD(x), and s2 = Rk for m̂CQR(x).
Define the asymptotic mean-squared error (AMSE) as AMSE{m̂(x)|h} = [m″(x)μKh2/2]2 + φKσ2(x)s2/[nhpX(x)]. Minimizing the AMSE, we obtain the optimal bandwidth:
h* = {φKσ2(x)s2 / ([m″(x)μK]2 pX(x) n)}1/5,  (39)
and the associated optimal AMSE evaluated at the optimal bandwidth h*
AMSE{m̂(x)|h*} = (5/44/5) [m″(x)μK/2]2/5 [φKσ2(x)s2/(npX(x))]4/5.  (40)
In Section 6.4, we tabulate s2 for different distributions.
Theorem 7 studies m̂WQAE(x|ω*) when we use the estimated optimal weight ω̂* in (17).
Theorem 7 Under the assumptions of Theorem 1 and Assumption 4, when we use the estimated weight ω̂* in (17), m̂WQAE(x|ω̂*) has the same asymptotic normality as m̂WQAE(x|ω*).
The discussion of the selection of the bandwidth h is deferred to Section 8.4.
6 Efficiency Comparison
6.1 The k-quantile optimal efficiency Ωk and Λk
The parameters in Sections 3–5 can be classified into two types: location-type and scale-type parameters. For β in (6) and (20) and the nonparametric function m(·) in Section 5, these parameters do not directly interact with the error ε, and we call them location-type parameters. For γ in (20), it is directly related to ε, and we call it a scale-type parameter.
Our discussion in the previous sections considers combining information over a fixed number of quantiles. From the results in Sections 3–5, for the location-type parameters mentioned above, the OWQAE has asymptotic variance proportional to Ωk−1, with Ωk defined in (15); for the scale-type parameter γ in (20), the OWQAE has asymptotic variance proportional to Λk−1, with Λk defined in (26). Since the efficiency of an estimator is inversely proportional to its variance, we call Ωk and Λk the k-quantile optimal efficiency of the location-type and scale-type parameters, respectively. The larger Ωk and Λk, the better the performance of the corresponding estimators.
It is well-known that, under appropriate conditions, the variance of any unbiased parameter estimator has the Cramér-Rao lower bound: the inverse of the Fisher information of the underlying distribution. To illustrate the Fisher information for the location-type and scale-type parameters, consider the simple location-scale model Y = β + γε with location parameter β and scale parameter γ. Note that Y has the density fY(y; β, γ) = fε((y − β)/γ)/γ. Under the specification (β, γ) = (0, 1), we can show that, the Fisher information for the location parameter β is
ℐ(fε) = ∫ [fε′(u)]2/fε(u) du,  (41)
and the Fisher information for the scale parameter γ is
𝒥(fε) = ∫ [1 + u fε′(u)/fε(u)]2 fε(u) du.  (42)
We assume ℐ(fε) < ∞ and 𝒥(fε) < ∞. The Fisher information ℐ(fε) and 𝒥(fε) serve as a natural standard when we measure the efficiency of our estimators in the case of regular estimation.
Theorem 8 Suppose Assumption 2 holds. Let Δ = 1/(k + 1), g(t) = fε(Qε(t)), and h(t) = Qε(t)fε(Qε(t)).
(i) For Ωk in (15), we have |Ωk − ℐ(fε)| ≤ Φk, where Φk is defined in (43).
(ii) For Λk in (26), we have |Λk − 𝒥(fε)| ≤ Ψk, where Ψk is defined in (44).
Theorem 8 indicates that, by optimally combining k quantiles τ1, …, τk, the k-quantile optimal efficiency Ωk (resp. Λk) for the OWQAE of the location-type (resp. scale-type) parameters is at most Φk (resp. Ψk) away from the corresponding Fisher information ℐ(fε) (resp. 𝒥(fε)). This result holds for any fixed k.
6.2 Asymptotic behavior of Ωk and Λk
In all previous sections, k is assumed to be a given finite number. In the following few sections, we discuss the behavior of the proposed estimators as k increases with n. In this section, we consider the asymptotic behavior of Ωk and Λk as k → ∞. For regular estimation, it is shown that Ωk and Λk approach the corresponding Cramér-Rao efficiency bound. In Section 6.3, we discuss the OWQAE as k → ∞ and when we use the true optimal weight or the estimated optimal weight. In Section 6.4, we discuss the asymptotic relative efficiency of OWQAE compared to some existing methods. Finally, Section 6.5 briefly considers some non-regular estimation.
The Cramér-Rao efficiency analysis is based on the basic assumption of finite Fisher information, ℐ(fε) < ∞ and 𝒥(fε) < ∞. From (41) and (42), this imposes conditions on the behavior of g(τ) and h(τ) as τ → 0 and τ → 1, where g(·) and h(·) are defined in Theorem 8. Thus, by Theorem 8, we have the following result.
Theorem 9 Suppose Assumption 2 holds. Let g(·) and h(·) be defined in Theorem 8. Then, under the boundary condition (45) on g(·) and h(·), we have limk→∞ Ωk = ℐ(fε) and limk→∞ Λk = 𝒥(fε).
The condition (45) is conventionally imposed in the study of efficient estimation. Basically, it requires that the error density decays sufficiently fast at the boundary (the corresponding estimation is sometimes called regular estimation); otherwise one may estimate the parameters at a faster rate. See, e.g., Akahira and Takeuchi (1995) for a discussion of this issue, and Section 6.5 below for related issues. By Theorem 9, as k → ∞, the k-quantile optimal efficiencies Ωk and Λk attain the corresponding Fisher information.
From Theorem 9, Ωk and Λk have different limits as k → ∞. As discussed in Section 4, this is due to the extra dependence of γQε(τ) on Qε(τ) in the scale-type parameter.
Proposition 1 below presents an alternative sufficient condition for (45).
Proposition 1 Suppose fε has support on ℝ and Assumption 2 holds. Then (45) holds if
(46)
Write x ∝ y if x/y is bounded away from 0 and ∞. If fε(u) ∝ |u|−a as |u| → ∞ for some a > 1, then 1 − Fε(u) ∝ |u|1−a as u → ∞ and Fε(u) ∝ |u|1−a as u → −∞. Thus, by Proposition 1, we have the following result.
Corollary 1 Suppose that there exist a > 1 and b > 0 such that fε(u) ∝ |u|−a and ∂2[log fε(u)]/∂u2 ∝ |u|−b as |u| → ∞. Then (45) is satisfied if b + 3(a − 1)/2 > 1.
Many commonly used distributions with support on ℝ satisfy (45). (i) For the standard normal density fε, ∂2log fε(u)/∂u2 = −1 and 1 − Fε(u) = [1 + o(1)]fε(u)/u as u → ∞, so we can verify (46). (ii) For the Laplace distribution with density fε(u) = 0.5 exp(−|u|), u ∈ ℝ, ∂2[log fε(u)]/∂u2 = 0 for u ≠ 0, so (46) can be easily verified. (iii) For the logistic distribution, g(τ) = c(τ − τ2) for some constant c > 0, so (45) holds. (iv) For the Student-t distribution with d > 0 degrees of freedom, Corollary 1 holds with a = d + 1 and b = 2. (v) For normal mixtures, we can verify (46).
Recall that the unweighted quantile average estimator has asymptotic variance proportional to Rk defined in (16). The following result shows that this simple averaging estimator is asymptotically equivalent to the LS estimator as k → ∞: even as we use more and more quantiles, a simple average over quantiles yields no efficiency gain from combining quantile information. Thus, proper weighting over different quantiles is crucial.
Theorem 10 (i) Rk ≥ 1/ℐ(fε); as k → ∞, the equality holds if and only if ε is normally distributed. (ii) If var(ε) < ∞, then limk→∞ Rk = var(ε).
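A quick numeric check of Theorem 10(ii) in R for ε ~ N(0,1), where var(ε) = 1, computing Rk directly from (16):

```r
## Numeric check of Theorem 10(ii) for eps ~ N(0,1), where var(eps) = 1:
## R_k from (16) with taus = j/(k+1) and g(t) = dnorm(qnorm(t)).
Rk <- function(k) {
  taus <- (1:k) / (k + 1)
  gt <- dnorm(qnorm(taus))
  mean((outer(taus, taus, pmin) - outer(taus, taus)) / outer(gt, gt))  # k^{-2} * sum
}
sapply(c(9, 99, 999), Rk)   # should approach var(eps) = 1 as k grows
```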
6.3 Behavior of the OWQAE as k → ∞
In Sections 3–5, the asymptotic normality of the OWQAE is established for a fixed number k of quantiles. In this section we consider the case where k increases with n. To save space, we consider only β̂WQAE(ω*) for the parametric regression case in Section 3.
Since the uniform Bahadur representation holds on a subinterval of [0,1], we modify Assumption 3 so that the Bahadur representation holds uniformly over expanding subintervals of [0,1] when the number of quantiles increases with n.
Assumption 8 The Bahadur representation (9) holds uniformly over τ ∈ 𝒯n = [δn, 1 − δn] with δn = (log n)−ε for some ε > 0. Let kn denote the number of quantiles.
First, we consider the OWQAE with the theoretical optimal weight ω* in (14).
Corollary 2 Consider β̂WQAE(ω) in (11) and ω* in (14). Suppose Assumptions 1, 2, and 8 and (45) hold. Further assume ṁ(Xt, β) ∈ ℒq for some q > 2. Then
√n[β̂WQAE(ω*) − β] ⇒ N(0, Σβ−1/ℐ(fε)).
Thus, if we use more and more quantiles as n → ∞, the efficiency of the OWQAE with the theoretical optimal weight ω* approaches the Fisher information. The same conclusion also holds for the estimators in Sections 4–5 provided that, as in Assumption 8, appropriate Bahadur representations hold uniformly on 𝒯n.
We next briefly discuss the limiting behavior of the OWQAE with estimated weight when kn is chosen as in Assumption 8. Again, we discuss the behavior of the parametric model in Section 3. As kn → ∞, the asymptotic analysis of the proposed estimator is complicated and depends on the behavior of quantile regression estimators and quantile-density estimators at the extremes.
Let β̂WQAE(ω*) be the OWQAE with the optimal weight ω* in (14), and let β̂WQAE(ω̂*) be the OWQAE with estimated weight ω̂* [see, e.g., (17)]. Then, using Σj=1kn ωj* = 1 and Σj=1kn ω̂j* = 1, we have
β̂WQAE(ω̂*) − β̂WQAE(ω*) = Σj=1kn (ω̂j* − ωj*)[β̂(τj) − β].  (47)
From Corollary 2, in order to prove
√n[β̂WQAE(ω̂*) − β] ⇒ N(0, Σβ−1/ℐ(fε)),  (48)
it suffices to prove
√n[β̂WQAE(ω̂*) − β̂WQAE(ω*)] = op(1).  (49)
For this, we need additional regularity conditions regarding the behavior of the density fε(Qε(τ)) when τ approaches the boundary, and conditions on the density estimators.
Assumption 9 Let kn and 𝒯n be chosen as in Assumption 8. There exist constants c, η > 0 such that infτ∈𝒯n fε(Qε(τ)) ≥ c(log n)−η, together with a uniform rate condition on the weight estimates involving the constant ε in Assumption 8.
Under Assumption 8 and the conditions in Assumption 9, by (71) in the proof of Theorem 1, we have the representation (50) of β̂WQAE(ω̂*) − β̂WQAE(ω*) in terms of
Nj = (1/n) Σt=1n ṁ(Xt, β)[τj − 1εt≤Qε(τj)], j = 1, …, kn.
Assume without loss of generality that ṁ(Xt, β) is scalar-valued. By property (P1) in Section 2, the summands of Nj are martingale differences. By the condition ṁ(Xt, β) ∈ ℒ2 and the orthogonality of martingale differences, 𝔼(Nj2) = O(n−1) uniformly in j, and thus max1≤j≤kn |Nj| = Op(√(kn/n)). Recall kn in Assumption 8. Under Assumption 9, the right-hand side of (50) is op(n−1/2).
Thus, (49) follows from (50), and we conclude that (48) holds.
6.4 Comparison of asymptotic relative efficiency
We now compare the efficiency of the proposed OWQAE to some existing methods.
First, we consider the parametric case in Section 3. Theorem 2 gives √n[β̂WQAE(ω*) − β] ⇒ N(0, S(ω*)Σβ−1) with S(ω*) = Ωk−1. For parameter estimation, the most widely used method is the ordinary LS estimator, denoted by β̂LS, which minimizes the squared errors. Assuming var(ε) < ∞ and other appropriate conditions, we have the asymptotic normality √n[β̂LS − β] ⇒ N(0, var(ε)Σβ−1). For the quantile regression based estimator β̂(τ) with a single quantile τ, the asymptotic normality in (10) holds. All three estimators β̂WQAE(ω*), β̂LS, and β̂(τ) have asymptotic normality of the form √n[β̂ − β] ⇒ N(0, s2Σβ−1), where s2 = var(ε) (assuming finite) for β̂LS, s2 = τ(1 − τ)/fε(Qε(τ))2 for β̂(τ), and s2 = S(ω*) for β̂WQAE(ω*). For comparison, we use β̂WQAE(ω*) as the benchmark and define its asymptotic relative efficiency (ARE) to β̂LS and β̂(τ) as
ARE(β̂LS) = var(ε)/S(ω*), ARE(β̂(τ)) = τ(1 − τ)/[fε(Qε(τ))2 S(ω*)].  (51)
A value of ARE ≥ 1 indicates better performance of β̂WQAE(ω*). Clearly, ARE(β̂(τj)) ≥ 1, j = 1, …, k. Under the conditions in Theorem 9, limk→∞ S(ω*) = 1/ℐ(fε) ≤ var(ε) (the Cramér-Rao inequality) so that limk→∞ ARE(β̂LS) ≥ 1. Intuitively, both β̂LS and β̂(τ) use only partial information: sample average and sample τ-th quantile, respectively. By contrast, β̂WQAE(ω*) combines strength across quantiles and thus can be more efficient.
Using k = 9 quantiles, Table 1 tabulates ARE(β̂LS) and ARE(β̂(τ)), τ = 0.1, …, 0.9, for some commonly used distributions. For all non-normal distributions considered, β̂WQAE(ω*) significantly outperforms β̂LS and β̂(τ). For N(0,1), β̂WQAE(ω*) and β̂LS are comparable, and both are about 50% more efficient than β̂(0.5). For Student-t with one (t1) or two (t2) degrees of freedom, LS is not applicable due to infinite variance; β̂WQAE(ω*) is about 20% more efficient than β̂(0.5) and substantially more efficient than β̂(τ) for other choices of τ. Thus, potentially much improved efficiency and robustness can be achieved by using the proposed estimator β̂WQAE(ω*). For linear models, Zou and Yuan (2008) studied the composite quantile regression (CQR) method, and we include the efficiency of their method for comparison purposes. Clearly, the OWQAE is significantly more efficient than the CQR.
Table 1. Asymptotic relative efficiency of β̂WQAE(ω*) relative to β̂LS, the CQR estimator β̂CQR, and β̂(τ) at τ = 0.1, …, 0.9 (k = 9 quantiles).

ε distribution | β̂LS | β̂CQR | τ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9
---|---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 1.58 | 47.12 | 6.40 | 2.34 | 1.40 | 1.19 | 1.40 | 2.34 | 6.40 | 47.12 |
Student-t2 | NA | 1.12 | 9.01 | 2.85 | 1.65 | 1.27 | 1.17 | 1.27 | 1.65 | 2.85 | 9.01 |
N(0,1) | 0.96 | 1.03 | 2.80 | 1.96 | 1.67 | 1.54 | 1.51 | 1.54 | 1.67 | 1.96 | 2.80 |
Mixture 1 | 10.17 | 2.43 | 91.95 | 32.64 | 3.26 | 1.80 | 1.55 | 1.80 | 3.26 | 32.64 | 91.95 |
Mixture 2 | 3.28 | 2.80 | 3.01 | 2.81 | 3.68 | 7.84 | 56.24 | 7.84 | 3.68 | 2.81 | 3.01 |
Laplace | 2.00 | 1.32 | 9.00 | 4.00 | 2.33 | 1.50 | 1.00 | 1.50 | 2.33 | 4.00 | 9.00 |
Gamma(1) | 9.00 | 3.67 | 1.00 | 2.25 | 3.86 | 6.00 | 9.00 | 13.50 | 21.00 | 36.00 | 81.00 |
Gamma(2) | 2.44 | 1.73 | 1.12 | 1.49 | 1.91 | 2.42 | 3.10 | 4.08 | 5.65 | 8.68 | 17.34 |
Gamma(3) | 1.71 | 1.41 | 1.26 | 1.42 | 1.64 | 1.94 | 2.35 | 2.94 | 3.88 | 5.68 | 10.75 |
Beta(1,1) | 1.67 | 2.04 | 1.80 | 3.20 | 4.20 | 4.80 | 5.00 | 4.80 | 4.20 | 3.20 | 1.80 |
Beta(1,2) | 2.35 | 2.33 | 1.05 | 2.11 | 3.16 | 4.22 | 5.27 | 6.33 | 7.38 | 8.44 | 9.49 |
Beta(1,3) | 3.31 | 2.71 | 1.02 | 2.12 | 3.32 | 4.66 | 6.19 | 8.00 | 10.27 | 13.33 | 19.04 |
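For instance, the N(0,1) entries of Table 1 can be reproduced approximately with a few lines of R, computing S(ω*) = 1/(1kTH−11k) and the AREs in (51); the names are ours.

```r
## Illustrative computation of the AREs in (51) for eps ~ N(0,1) with k = 9;
## the results should be close to the N(0,1) row of Table 1.
taus <- (1:9) / 10
gt <- dnorm(qnorm(taus))
H <- (outer(taus, taus, pmin) - outer(taus, taus)) / outer(gt, gt)
S_star <- 1 / sum(solve(H, rep(1, 9)))          # S(omega*) = 1 / (1' H^{-1} 1)
c(ARE_LS  = 1 / S_star,                         # var(eps) = 1 for N(0,1)
  ARE_LAD = (0.25 / dnorm(0)^2) / S_star)       # tau = 0.5: tau(1-tau)/g(tau)^2
```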
We briefly mention the efficiency comparison of the nonparametric estimator m̂WQAE(x|ω*) relative to the local LS, local LAD, and Kai, Li and Zou (2010)'s local CQR estimators in Section 5. By (40),
the nonparametric relative efficiency = (the parametric relative efficiency)4/5.
Thus, the same efficiency comparison result (up to an exponent 4/5) in Table 1 also holds for the nonparametric estimator m̂WQAE(x|ω*) in Section 5.
6.5 Asymptotic super-efficiency
By Corollary 2, under (45), β̂WQAE(ω*) is an asymptotically efficient estimator of β, with limiting covariance matrix attaining the Fisher information bound. The conditions in (45) are not mathematical trivialities but genuinely restrictive conditions needed for the efficiency results. When these "usual" regularity conditions do not hold, the efficiency result above may fail, and we may obtain results different from likelihood-based estimation. For example, we may have a different rate of convergence, and, in general, asymptotically efficient estimators do not exist. These "unusual" cases are sometimes called "non-regular" statistical estimation. In this section, we briefly discuss the case of non-regular estimation. In this case, Theorem 11 below shows that, by using quantile regression with optimal weighting, super-efficient estimators may be obtained in the sense that the efficiency exceeds the Fisher information ℐ(fε).
Theorem 11 Recall Ωk in (15). Let g(τ) be defined in Theorem 8, and let c ∈ [0, ∞] be the boundary limit of g defined in (52).
(i) If 0 < c < ∞ and ℐ(fε) < ∞, then limk→∞ Ωk = c + ℐ(fε).
(ii) If c = ∞, then limk→∞ Ωk = ∞.
Condition (45) covers the regular case c = 0 in (52). Theorem 11 indicates that, for the non-regular case c > 0 in (52), under appropriate conditions, for large k, the variance of the (standardized) optimally weighted quantile regression based estimator β̂WQAE(ω*) is smaller than the Cramér-Rao bound. In particular, if c = ∞, as the number of quantiles k increases, the asymptotic variance approaches zero. In this sense, the estimator β̂WQAE(ω*) is asymptotically super-efficient.
Corollary 3 below concerns a special case of super-efficiency when the density fε is positive at the boundary.
Corollary 3 Denote the support of fε by 𝒟, then limk→∞ Ωk = ∞ in any of the following three cases: (i) 𝒟 = [D1, D2] with fε(D1) + fε(D2) > 0; (ii) 𝒟 = [D1, ∞) with fε(D1) > 0; or (iii) 𝒟 = (−∞, D2] with fε(D2) > 0.
For the truncated versions of the distributions in Section 6.2, we have limk→∞ Ωk = ∞. For example, for the truncated normal on [−1, 1], Corollary 3(i) applies. For the uniform distribution on [0, 1], we can show Ωk = 2k + 2 → ∞.
Similar results can also be established for Λk. We omit the details.
7 Estimation of the Optimal Weight
To construct the proposed OWQAE β̂WQAE(ω*) in Sections 3–5, we need to estimate the optimal weights ω* in (14) and π* in (25). It suffices to estimate Qε(τ) and fε(Qε(τ)). We can accomplish this through a two-step procedure: first, use a preliminary estimator to obtain residuals; second, estimate Qε(τ) and fε(Qε(τ)) based on the residuals. Here we illustrate the idea using the models in Sections 3–5.
(Case 1: Parametric model in Section 3.) Since fε (Qε (τ)) remains the same if we change ε to c + ε for any c, α in (6) can be absorbed into ε. We propose the procedure:
Use the uniform weight ω = [1/k, …, 1/k]T to obtain the preliminary estimator β̂, and compute the “residuals” (a combination of both α and ε) as ε̂t = Yt − m(Xt, β̂).
- To estimate fε(u), use the nonparametric density estimate f̂ε(u) = (nb)−1 Σt=1n K((u − ε̂t)/b), where, following Silverman (1986), we choose the rule-of-thumb bandwidth
b = 0.9 min{sd(ε̂1, …, ε̂n), IQR(ε̂1, …, ε̂n)/1.34} n−1/5.
Here, "sd" and "IQR" are the sample standard deviation and sample interquartile range. Estimate fε(Qε(τ)) by f̂ε(Q̂ε(τ)), where Q̂ε(τ) is the sample τ-th quantile of ε̂1, …, ε̂n.
Plug f̂ε(Q̂ε(τ)) into (14) to obtain the estimated optimal weight ω̂*.
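A sketch of steps (ii)–(iii) in R, assuming the residual vector from step (i) is available; the Gaussian kernel and the exact Silverman constants are our illustrative choices, not necessarily the paper's exact rule.

```r
## Sketch of steps (ii)-(iii): Gaussian-kernel density estimate of f_eps at
## the residual quantiles, then the plug-in weights (17). The residual
## vector `res` from step (i) is assumed given.
estimate_weights <- function(res, taus) {
  n <- length(res)
  b <- 0.9 * min(sd(res), IQR(res) / 1.34) * n^(-1/5)   # rule-of-thumb bandwidth
  qhat <- quantile(res, taus)                           # Q_hat_eps(tau_j)
  fhat <- sapply(qhat, function(q) mean(dnorm((res - q) / b)) / b)
  H <- (outer(taus, taus, pmin) - outer(taus, taus)) / outer(fhat, fhat)
  w <- solve(H, rep(1, length(taus)))
  w / sum(w)                                            # omega_hat* as in (17)
}
```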
(Case 2: Location-scale model in Section 4.) To ensure identifiability, we assume without loss of generality that ε satisfies a scale normalization; otherwise we can consider the reparametrized model Y = XTβ + (UTγ*)ε* with γ* = cγ and ε* = ε/c for a suitable constant c > 0. Note that this normalization has no effect on the optimal weight ω* since fε(Qε(τ)) is invariant under the transformation cε for any c > 0. For each quantile τ = τ1, …, τk, we fit the quantile regression (27) to obtain (β̃(τ), γ̃(τ)). Define the preliminary estimator
(53)
Then (β̃, γ̃) consistently estimates (β, γ). We use the following procedure to compute ω̂* and π̂*:
- Use β̃ and γ̃ in (53) to compute the errors ε̃t = (Yt − XtTβ̃)/(UtTγ̃), t = 1, …, n. To better mimic the scale normalization on ε, consider the transformed errors obtained by rescaling ε̃1, …, ε̃n with their sample quantiles, where Q̃ε(τ) is the sample τ-quantile of ε̃1, …, ε̃n. Use the same steps (ii)–(iii) in Case 1 above to obtain the estimates f̂ε(Q̂ε(τ)) and Q̂ε(τ).
(Case 3: Nonparametric regression model in Section 5.) As in case 2 above, ω* is invariant under the transformation cε, c > 0. Assume without loss of generality that |ε| has median one. Then the conditional median of |Y − m(X)| given X is σ(X), and we can apply local median quantile regression to estimate σ(·). We propose the procedure:
Use (32) with the uniform weight to obtain the preliminary estimator m̂(·).
- Compute |Yt − m̂(Xt)| and estimate σ(·) by the local linear median quantile regression
σ̂(x) = ĉ, where (ĉ, d̂) = argmin(c,d) Σt=1n ρ0.5(|Yt − m̂(Xt)| − c − d(Xt − x)) K((Xt − x)/ℓ).  (54)
For the bandwidth ℓ, following Yu and Jones (1998), we use ℓ = ℓLS(π/2)1/5, where ℓLS is the plug-in bandwidth [Ruppert, Sheather and Wand (1995)] for local linear LS regression based on the data (Xi, |Yi − m̂(Xi)|2), i = 1, …, n.
- Compute the errors ε̂t = [Yt − m̂(Xt)]/σ̂(Xt) and obtain the estimator f̂ε(Q̂ε(τ)) as in the parametric regression Case 1 above.
Use (14) to obtain ω̂1, …, ω̂k and symmetrize them: ω̂j* = (ω̂j + ω̂k+1−j)/2, j = 1, …, k.
8 Monte Carlo Studies
We conduct Monte Carlo studies to investigate the sampling performance of the proposed procedures in a variety of regression models. In all settings below, we use 1000 realizations to evaluate the performance of various methods.
8.1 Linear models with homoscedastic errors
For linear models, we compare six estimation methods. OLS: ordinary LS estimator; LAD: the median quantile estimator with τ = 0.5 in (8); QAU, QAO, QAE: the WQAE in (11) with the uniform weights, theoretical optimal weight ω* in (14), and estimated optimal weight ω̂* (cf. Section 7), respectively; CQR: Zou and Yuan (2008)’s CQR estimator. For QAU, QAO, QAE, and CQR, we use k = 9 quantiles 0.1, 0.2, …, 0.9. With 1000 realizations, we use QAE as the benchmark to which the other five methods are compared based on the empirical relative efficiency:
RE(Method) = Σj=11000 ‖β̂Method(j) − β‖2 / Σj=11000 ‖β̂QAE(j) − β‖2,  (55)
where “Method” stands for OLS, LAD, QAU, QAO, CQR, and β̂(j) is the estimator of β in the j-th realization. A value of RE ≥ 1 indicates better performance of QAE.
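In R, the empirical relative efficiency (55) can be computed from the stored estimates as follows (the matrix layout and names are illustrative):

```r
## Sketch of the empirical relative efficiency (55). B_method and B_qae are
## 1000 x p matrices of estimates across realizations; beta0 is the true
## parameter. All names are ours.
emp_re <- function(B_method, B_qae, beta0) {
  sse <- function(B) sum(rowSums(sweep(B, 2, beta0)^2))  # sum_j ||b_j - beta0||^2
  sse(B_method) / sse(B_qae)                             # RE >= 1 favors QAE
}
```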
We consider both independent data and time series data:
(56)
(57)
Model 2 is a variant of the threshold autoregressive model with a linear component Yt−1. For the innovation εt, we consider 12 distributions: Normal distribution N(0,1), Student-t distribution with one (t1) and two (t2) degrees of freedom, the two normal mixture distributions in Example 2, Laplace distribution, Beta distributions Beta(1,1), Beta(1,2), Beta(1,3), and Gamma distributions Gamma(1), Gamma(2), Gamma(3).
The results are summarized in Table 2(a) for Model 1 with sample sizes n = 100, 300, and in Table 2(b) for Model 2 with n = 300. For N(0,1), Student-t2, and Laplace distributions, QAE and CQR are comparable; for all other distributions, QAE significantly outperforms CQR. Also, QAE outperforms OLS for all non-normal distributions whereas they are comparable for N(0,1). For n = 300, the superior performance of QAE is even more remarkable, which agrees with our asymptotic theory. For almost all cases considered, QAE substantially outperforms the LAD estimator and the relative efficiency can be as high as almost 2000%. It is worth pointing out that, for Beta and Gamma distributions, the relative efficiencies are much higher than the other distributions considered, owing to the super-efficiency phenomenon in Section 6.5. We also note that QAE with estimated optimal weight has comparable performance to QAO with theoretical optimal weight. We conclude that the proposed OWQAE offers a more efficient alternative to existing methods.
Table 2.
(a) Model 1 in (56); columns 2–6 report n = 100 and columns 7–11 report n = 300.

εt distribution | OLS | LAD | QAU | QAO | CQR | OLS | LAD | QAU | QAO | CQR
---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 0.86 | 2.55 | 0.74 | 1.11 | NA | 1.04 | 3.16 | 0.89 | 1.37 |
Student-t2 | NA | 0.95 | 1.14 | 0.81 | 0.89 | NA | 1.06 | 1.30 | 0.99 | 1.02 |
N(0,1) | 0.86 | 1.33 | 0.90 | 0.90 | 0.91 | 0.89 | 1.43 | 0.95 | 0.95 | 0.97 |
Mixture 1 | 5.10 | 0.89 | 3.70 | 0.77 | 1.45 | 7.79 | 1.27 | 6.12 | 0.95 | 2.00 |
Mixture 2 | 2.22 | 10.85 | 2.31 | 1.18 | 2.03 | 2.65 | 19.88 | 2.84 | 1.09 | 2.30 |
Laplace | 1.24 | 0.85 | 1.06 | 0.85 | 0.89 | 1.48 | 0.87 | 1.23 | 0.87 | 1.01 |
Gamma(1) | 5.69 | 5.46 | 4.81 | 0.67 | 2.51 | 7.09 | 6.98 | 6.12 | 0.76 | 2.91 |
Gamma(2) | 2.08 | 2.60 | 1.96 | 0.88 | 1.48 | 2.23 | 2.85 | 2.14 | 0.94 | 1.61 |
Gamma(3) | 1.49 | 2.02 | 1.48 | 0.92 | 1.25 | 1.64 | 2.21 | 1.61 | 0.99 | 1.35 |
Beta(1,1) | 1.49 | 4.20 | 1.73 | 0.91 | 1.77 | 1.58 | 4.58 | 1.88 | 0.95 | 1.90 |
Beta(1,2) | 1.71 | 3.65 | 1.93 | 1.09 | 1.71 | 2.02 | 4.47 | 2.34 | 1.97 | 1.99 |
Beta(1,3) | 2.39 | 4.30 | 2.59 | 1.36 | 2.01 | 2.78 | 5.12 | 3.06 | 3.03 | 2.25 |
(b) Model 2 in (57), n = 300; columns 2–6 report β1 and columns 7–11 report β2.

εt distribution | OLS | LAD | QAU | QAO | CQR | OLS | LAD | QAU | QAO | CQR
---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 1.00 | 1.85 | 0.89 | 1.28 | NA | 0.85 | 2.74 | 0.80 | 1.07 |
Student-t2 | NA | 1.10 | 1.19 | 1.04 | 0.99 | NA | 1.04 | 1.23 | 1.13 | 1.00 |
N(0,1) | 0.91 | 1.45 | 0.94 | 0.94 | 0.98 | 0.90 | 1.36 | 0.94 | 0.94 | 0.95 |
Mixture 1 | 6.85 | 1.14 | 4.84 | 0.92 | 1.78 | 6.82 | 1.16 | 5.20 | 0.88 | 1.77 |
Mixture 2 | 2.70 | 19.79 | 2.86 | 1.11 | 2.35 | 2.80 | 19.51 | 2.99 | 1.18 | 2.43 |
Laplace | 1.36 | 0.89 | 1.17 | 0.89 | 0.99 | 1.37 | 0.91 | 1.21 | 0.91 | 1.00 |
Gamma(1) | 6.70 | 6.23 | 5.38 | 0.79 | 2.83 | 6.24 | 6.22 | 5.35 | 0.75 | 2.80 |
Gamma(2) | 2.39 | 2.94 | 2.22 | 0.95 | 1.70 | 2.42 | 2.80 | 2.23 | 0.93 | 1.68 |
Gamma(3) | 1.53 | 1.95 | 1.50 | 0.97 | 1.27 | 1.68 | 2.07 | 1.59 | 0.91 | 1.36 |
Beta(1,1) | 1.50 | 4.30 | 1.79 | 0.93 | 1.80 | 1.50 | 4.15 | 1.77 | 0.94 | 1.81 |
Beta(1,2) | 1.92 | 4.24 | 2.21 | 0.86 | 1.95 | 2.23 | 4.89 | 2.52 | 0.85 | 2.21 |
Beta(1,3) | 2.78 | 4.71 | 3.04 | 0.82 | 2.29 | 2.77 | 4.60 | 3.01 | 0.80 | 2.28 |
8.2 Nonlinear models with homoscedastic errors
We consider two nonlinear models (one independent data and the other time series data):
(58)
(59)
Model 4 is Engle (1982)’s ARCH model. Again, we consider the 12 distributions in Section 8.1 for εt. Table 3 summarizes the empirical relative efficiency [cf. (55)] of the proposed OWQAE compared to the other methods OLS, LAD, QAU, and QAO (see Section 8.1). The proposed OWQAE is significantly superior to the OLS, LAD and QAU, and comparable to the QAO with theoretical optimal weight.
Table 3.
Model 3 in (58): β = 0.6 (columns 2–5). Model 4 in (59): β1 = 0.3 (columns 6–9) and β2 = 0.5 (columns 10–13).

εt distribution | OLS | LAD | QAU | QAO | OLS | LAD | QAU | QAO | OLS | LAD | QAU | QAO
---|---|---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 0.48 | 1.61 | 0.55 | NA | 0.75 | 2.06 | 0.74 | NA | 0.73 | 2.38 | 0.75 |
Student-t2 | NA | 1.00 | 1.37 | 1.55 | NA | 1.02 | 1.23 | 0.91 | NA | 0.99 | 1.21 | 1.06 |
N(0,1) | 0.79 | 1.81 | 0.96 | 0.95 | 0.97 | 1.45 | 0.99 | 0.99 | 0.92 | 1.38 | 0.97 | 0.97 |
Mixture 1 | 3.75 | 1.01 | 3.16 | 0.86 | 6.45 | 1.12 | 4.62 | 0.89 | 6.99 | 1.09 | 5.08 | 0.91 |
Mixture 2 | 1.81 | 9.54 | 1.90 | 1.02 | 2.56 | 16.83 | 2.72 | 1.15 | 2.75 | 18.08 | 2.85 | 1.15 |
Laplace | 1.36 | 1.19 | 1.15 | 1.19 | 1.52 | 0.93 | 1.28 | 0.93 | 1.42 | 0.89 | 1.21 | 0.89 |
Gamma(1) | 4.21 | 6.58 | 3.94 | 0.75 | 6.74 | 6.42 | 5.78 | 0.77 | 6.67 | 6.44 | 5.56 | 0.73 |
Gamma(2) | 1.71 | 2.78 | 1.69 | 0.97 | 2.23 | 2.59 | 2.07 | 0.97 | 2.39 | 2.90 | 2.21 | 0.92 |
Gamma(3) | 1.31 | 1.89 | 1.28 | 1.14 | 1.59 | 2.04 | 1.53 | 0.96 | 1.64 | 2.05 | 1.58 | 0.92 |
Beta(1,1) | 1.15 | 4.61 | 1.98 | 0.94 | 1.67 | 4.48 | 1.95 | 0.93 | 1.57 | 4.02 | 1.82 | 0.94 |
Beta(1,2) | 1.08 | 3.72 | 1.97 | 2.50 | 1.98 | 3.98 | 2.24 | 0.89 | 2.06 | 4.21 | 2.30 | 0.85 |
Beta(1,3) | 1.58 | 4.38 | 2.93 | 5.27 | 2.84 | 4.85 | 3.02 | 0.81 | 2.96 | 4.94 | 3.09 | 0.89 |
8.3 Location-scale models with conditional heteroscedasticity
Consider two location-scale models (one independent data and the other time series data):
(60)
(61)
Model 6 is the ARCH model in Koenker and Zhao (1996); it differs from Engle's ARCH model, where the conditional heteroscedasticity takes the form in (59). Due to the conditional heteroscedasticity, the parameters are slightly more difficult to estimate. We use Model 5 to illustrate five estimation methods.
- (LS method) If εt has zero mean and unit variance, the Gaussian-likelihood based estimation method is to minimize the loss function in (62). This is essentially an LS type estimation, and Gaussianity is not necessary for consistency. In general, if εt has variance σ2, then this LS method produces consistent estimators of β and σ(γ0, γ1).
This is essentially an LS type estimation and the Gaussianity is not necessary for the consistency. In general, if εt has variance σ2, then this LS method produces consistent estimators of β and σ(γ0, γ1).(62) - (LAD method) First, apply (27) with τ = 0.5 to obtain the LAD estimator β̂LAD of β. Second, apply median quantile regression based on absolute residuals:
This LAD regression produces estimators of Q|ε|(0.5)(γ0, γ1), where Q|ε|(0.5) is the median quantile of |εt|.(63) (OWQAE with theoretical optimal weights). As in Section 8.1, we denote this method by QAO.
- (OWQAE with estimated optimal weights) As in Section 8.1, we denote this method by QAE. As discussed in Section 7, under the scale normalization on ε, QAE produces consistent estimators of (β, γ0, γ1). Without the latter normalization, QAE produces consistent estimators of β and λ(γ0, γ1) for some constant λ > 0.
- (OWQAE based on the unweighted quantile regression (27)) This method works the same as the OWQAE above; the only difference is that β̃(τ) and γ̃(τ) in (27) are used to form the OWQAE. Denote this method by QAEU. Again, QAEU produces estimators of β and λ(γ0, γ1). We include this method to evaluate the performance of the OWQAE based on the unweighted quantile regression (27).
As discussed above, the five estimation methods produce consistent estimators of β and λ(γ0, γ1) for some constant λ depending on the distribution of εt. To make sensible comparison, we divide the corresponding estimators by λ to obtain consistent estimators of (γ0, γ1). Furthermore, to ensure the consistency of the LS method, we consider the properly centered εt for the 12 distributions in Section 8.1. That is, if εt has finite mean, then we center it so that 𝔼(εt) = 0.
The results are summarized in Table 4(a) for Model 5 and in Table 4(b) for Model 6. In both models, the sample size is n = 300. We make three observations. First, the OWQAE delivers much better overall performance than OLS and LAD. Second, in all cases considered, the OWQAE using the heteroscedasticity-weighted quantile regression (22) clearly outperforms the OWQAE using the unweighted quantile regression (27). Third, the OWQAE with estimated weights is comparable to the QAO with theoretical optimal weights.
Table 4.
(a) Model 5 in (60). Location parameter β = 0.6 (columns 2–5); scale parameters γ0 = 0.5 (columns 6–9) and γ1 = 1.0 (columns 10–13).

εt distribution | OLS | LAD | QAO | QAEU | OLS | LAD | QAO | QAEU | OLS | LAD | QAO | QAEU
---|---|---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 0.96 | 0.85 | 1.17 | NA | 1.17 | 0.88 | 1.20 | NA | 1.22 | 0.92 | 1.25 |
Student-t2 | NA | 1.05 | 0.91 | 1.13 | NA | 1.66 | 0.94 | 1.28 | NA | 1.70 | 0.96 | 1.27 |
N(0,1) | 0.89 | 1.43 | 0.93 | 1.12 | 0.68 | 2.29 | 0.92 | 1.28 | 0.68 | 2.37 | 0.94 | 1.26 |
Mixture 1 | 7.05 | 1.21 | 0.91 | 1.18 | 0.55 | 1.88 | 0.85 | 1.23 | 0.55 | 2.05 | 0.89 | 1.27 |
Mixture 2 | 2.57 | 23.03 | 1.03 | 1.23 | 0.66 | 2.07 | 0.86 | 1.28 | 0.69 | 2.58 | 0.86 | 1.20 |
Laplace | 1.53 | 0.80 | 0.80 | 1.11 | 0.93 | 1.89 | 0.96 | 1.25 | 0.96 | 1.81 | 0.96 | 1.21 |
Gamma(1) | 7.38 | 6.40 | 0.79 | 1.17 | 7.31 | 3.72 | 0.50 | 1.13 | 6.38 | 3.18 | 0.44 | 1.14 |
Gamma(2) | 2.32 | 2.89 | 0.92 | 1.13 | 2.70 | 2.61 | 0.79 | 1.23 | 2.60 | 2.76 | 0.81 | 1.21 |
Gamma(3) | 1.67 | 2.37 | 0.92 | 1.13 | 1.79 | 2.63 | 0.83 | 1.15 | 1.76 | 2.84 | 0.88 | 1.20 |
Beta(1,1) | 1.48 | 4.39 | 0.95 | 1.09 | 0.66 | 3.87 | 0.83 | 1.19 | 0.65 | 3.56 | 0.85 | 1.17 |
Beta(1,2) | 1.95 | 4.48 | 0.87 | 1.12 | 1.04 | 3.41 | 0.71 | 1.10 | 1.07 | 3.75 | 0.73 | 1.21 |
Beta(1,3) | 2.75 | 4.97 | 0.84 | 1.11 | 1.71 | 3.64 | 0.69 | 1.14 | 1.63 | 3.30 | 0.65 | 1.10 |
(b) Model 6 in (61). Location parameter β = 0.4 (columns 2–5); scale parameters γ0 = 0.5 (columns 6–9) and γ1 = 0.5 (columns 10–13).

εt distribution | OLS | LAD | QAO | QAEU | OLS | LAD | QAO | QAEU | OLS | LAD | QAO | QAEU
---|---|---|---|---|---|---|---|---|---|---|---|---
Student-t1 | NA | 0.65 | 0.55 | NA | NA | 16.84 | 0.37 | NA | NA | 3.85 | 0.30 | NA |
Student-t2 | NA | 1.03 | 0.89 | 6.76 | NA | 3.69 | 0.88 | 3.38 | NA | 4.01 | 0.96 | 6.02 |
N(0,1) | 0.89 | 1.45 | 0.97 | 1.12 | 0.62 | 1.94 | 0.92 | 1.21 | 0.67 | 2.10 | 0.97 | 1.19 |
Mixture 1 | 6.47 | 1.12 | 0.85 | 1.22 | 0.51 | 1.76 | 0.77 | 1.07 | 0.60 | 1.92 | 0.99 | 1.10 |
Mixture 2 | 2.40 | 22.77 | 1.00 | 6.59 | 0.70 | 20.72 | 0.91 | 34.50 | 0.64 | 9.51 | 0.84 | 5.82 |
Laplace | 1.48 | 0.80 | 0.81 | 1.53 | 0.92 | 2.31 | 0.92 | 1.50 | 0.91 | 2.35 | 0.97 | 1.44 |
Gamma(1) | 5.71 | 6.16 | 0.78 | 1.10 | 5.13 | 3.65 | 0.40 | 1.12 | 7.79 | 5.61 | 0.62 | 1.21 |
Gamma(2) | 2.13 | 2.86 | 0.93 | 1.22 | 2.63 | 4.37 | 0.69 | 1.58 | 2.73 | 4.64 | 0.80 | 1.53 |
Gamma(3) | 1.44 | 2.04 | 0.95 | 1.57 | 1.89 | 4.55 | 0.84 | 2.12 | 1.77 | 3.85 | 0.84 | 1.76 |
Beta(1,1) | 1.59 | 4.30 | 0.92 | 1.01 | 0.67 | 3.14 | 0.71 | 1.00 | 0.83 | 3.79 | 0.98 | 1.01 |
Beta(1,2) | 1.73 | 4.27 | 0.86 | 0.98 | 1.00 | 2.44 | 0.65 | 0.99 | 1.15 | 3.21 | 0.89 | 0.99 |
Beta(1,3) | 2.48 | 5.17 | 0.81 | 1.01 | 1.50 | 2.45 | 0.54 | 0.98 | 1.48 | 3.79 | 0.87 | 1.00 |
8.4 Nonparametric regression models
In our data analysis, we use the standard Gaussian kernel for K(·). We now address the bandwidth selection issue. By (39), the optimal bandwidth h* is proportional to (s²)^{1/5}, where s² is the asymptotic variance factor of the corresponding estimator. Denote by ĥLS and ĥOWQAE the bandwidths for the least squares estimator and the proposed OWQAE, respectively. Then ĥOWQAE = ĥLS{S(ω*)/var(ε)}^{1/5}, where S(ω*) is defined in (13). In practice, to select ĥLS, we can use the plug-in bandwidth selector of Ruppert, Sheather and Wand (1995), implemented via the command dpill in the R package KernSmooth. We then select ĥOWQAE by plugging in estimates of S(ω*) and var(ε) using the two-step procedure in Section 7; for the purpose of comparison, we shall use their true values in our simulation studies. Similarly, we can choose the optimal bandwidths for the other two estimators. Kai, Li and Zou (2010) adopted the same strategy. For the preliminary estimator m̂(·) in step (i) of Section 7, we use the plug-in bandwidth selector ĥLS.
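To make the bandwidth adjustment concrete, here is a minimal sketch (Python; the function name and arguments are ours, written under the ratio relation stated above):

```python
def owqae_bandwidth(h_ls, S_opt, var_eps):
    """Adjust an LS plug-in bandwidth for the OWQAE.

    h_ls    : plug-in bandwidth for the local LS fit (e.g., obtained
              from KernSmooth::dpill in R).
    S_opt   : S(omega*), the combined-quantile variance factor.
    var_eps : var(epsilon), the error variance.
    """
    # The optimal bandwidth scales as the 1/5 power of the asymptotic
    # variance factor, so the two bandwidths differ by the factor
    # {S(omega*)/var(epsilon)}^(1/5).
    return h_ls * (S_opt / var_eps) ** 0.2
```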
We compare the empirical performance of the four methods (LS, LAD, CQR, OWQAE) in Section 5. Based on 1000 realizations, we use the least squares estimator m̂LS as the benchmark against which the other three methods are compared via the relative efficiency

RE(m̂) = [Σ_{j=1}^{1000} ∫_{ℓ1}^{ℓ2} {m̂LS,j(x) − m(x)}² dx] / [Σ_{j=1}^{1000} ∫_{ℓ1}^{ℓ2} {m̂j(x) − m(x)}² dx],

where m̂j and m̂LS,j are the estimators in the j-th realization, and [ℓ1, ℓ2] is the interval over which m is estimated. A value of RE ≥ 1 indicates that m̂ outperforms the LS estimator. To facilitate computation, each integral is approximated using 20 grid points.
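The relative-efficiency computation itself is short; the sketch below (Python; the helper and its arguments are hypothetical) approximates each integrated squared error on a 20-point grid as described above:

```python
import numpy as np

def relative_efficiency(fits_ls, fits, m_true, l1=-1.5, l2=1.5, n_grid=20):
    """Monte Carlo relative efficiency of an estimator against the LS benchmark.

    fits_ls, fits : lists of fitted functions, one per realization.
    m_true        : the true regression function m.
    Each integral over [l1, l2] is approximated on n_grid points; the
    common grid spacing cancels in the ratio.
    """
    x = np.linspace(l1, l2, n_grid)
    ise_ls = sum(np.mean((f(x) - m_true(x)) ** 2) for f in fits_ls)
    ise = sum(np.mean((f(x) - m_true(x)) ** 2) for f in fits)
    return ise_ls / ise  # RE >= 1: the estimator outperforms LS
```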
Consider n = 200 samples from the model

Y = sin(2X) + 2 exp(−16X²) + 0.5ε.  (64)
The same model was used in Kai, Li and Zou (2010) with the normal design X ~ N(0,1); here we use a uniform design to avoid some computational issues. For ε, we consider nine symmetric distributions: N(0,1); truncated N(0,1) on [−1, 1]; truncated Cauchy on [−10, 10]; truncated Cauchy on [−1, 1]; Student-t with 3 degrees of freedom (t3); standard Laplace; uniform on [−0.5, 0.5]; and two normal mixtures, 0.5N(2,1)+0.5N(−2,1) and 0.95N(0,1)+0.05N(0,9). The first normal mixture can be used to model a two-cluster population, whereas the second can be viewed as a noise-contamination model. Let [ℓ1, ℓ2] = [−1.5, 1.5].
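For concreteness, here is a minimal data-generating sketch for model (64) (Python; the uniform design support is our assumption, since the paper only specifies a uniform design, and the noise scale 0.5 follows the form of (64) above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

def m(x):
    # The regression function of model (64).
    return np.sin(2 * x) + 2 * np.exp(-16 * x ** 2)

X = rng.uniform(-1.5, 1.5, n)  # uniform design (support assumed)

# Two of the nine error distributions:
eps_trunc = stats.truncnorm.rvs(-1, 1, size=n, random_state=rng)  # N(0,1) truncated to [-1, 1]
comp = rng.integers(0, 2, n)                                      # mixture component labels
eps_mix = rng.normal(np.where(comp == 0, -2.0, 2.0), 1.0)         # 0.5N(-2,1) + 0.5N(2,1)

Y = m(X) + 0.5 * eps_trunc
```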
The relative efficiencies of the four methods, with m̂LS(x) as the benchmark, are summarized in Table 5(a). Overall, the OWQAE either significantly outperforms or is comparable to the other three methods. For N(0,1) and 0.95N(0,1)+0.05N(0,9), the OWQAE is comparable to LS; for most of the other distributions it has about a 20% efficiency gain over LS, and more than a 60% gain for 0.5N(2,1)+0.5N(−2,1). Compared with CQR, the OWQAE outperforms it for all but four distributions (N(0,1), Student-t3, Laplace, and 0.95N(0,1)+0.05N(0,9)), for which the two are comparable. While the OWQAE underperforms LAD for the truncated Cauchy on [−10, 10], it has substantial efficiency gains for N(0,1), truncated N(0,1) on [−1, 1], truncated Cauchy on [−1, 1], uniform on [−0.5, 0.5], and 0.5N(2,1)+0.5N(−2,1).
Table 5.
(a): Model 7 in (64). Entries are relative efficiencies with the LS estimator as benchmark.

Distribution of ε | LS | LAD | CQR (k = 9) | CQR (k = 19) | CQR (k = 29) | OWQAE (k = 9) | OWQAE (k = 19) | OWQAE (k = 29)
---|---|---|---|---|---|---|---|---
N(0,1) | 1 | 0.73 | 0.99 | 1.00 | 0.99 | 0.96 | 0.95 | 0.95 |
Truncated N(0,1) on [−1, 1] | 1 | 0.54 | 0.93 | 0.98 | 0.99 | 1.14 | 1.25 | 1.26 |
Truncated Cauchy on [−10, 10] | 1 | 1.67 | 1.16 | 1.07 | 1.05 | 1.40 | 1.27 | 1.21 |
Truncated Cauchy on [−1, 1] | 1 | 0.53 | 0.93 | 0.98 | 0.99 | 1.07 | 1.20 | 1.20 |
Student-t with 3 d.f.’s | 1 | 1.44 | 1.39 | 1.21 | 1.16 | 1.42 | 1.28 | 1.19 |
Standard Laplace | 1 | 1.26 | 1.11 | 1.06 | 1.05 | 1.16 | 1.10 | 1.06 |
Uniform on [−0.5, 0.5] | 1 | 0.51 | 0.94 | 0.98 | 0.99 | 1.23 | 1.24 | 1.21 |
0.5N(−2,1)+0.5N(2,1) | 1 | 0.31 | 0.92 | 0.97 | 0.99 | 1.57 | 1.60 | 1.62 |
0.95N(0,1)+0.05N(0,9) | 1 | 0.88 | 1.13 | 1.08 | 1.06 | 1.05 | 0.98 | 0.95 |
(b): Model 8 in (65). Entries are relative efficiencies with the LS estimator as benchmark.

Distribution of ε | LS | LAD | CQR (k = 9) | CQR (k = 19) | CQR (k = 29) | OWQAE (k = 9) | OWQAE (k = 19) | OWQAE (k = 29)
---|---|---|---|---|---|---|---|---
N(0,1) | 1 | 0.66 | 0.96 | 0.98 | 0.98 | 0.94 | 0.95 | 0.95 |
Truncated N(0,1) on [−1, 1] | 1 | 0.43 | 0.84 | 0.91 | 0.93 | 1.14 | 1.56 | 1.80 |
Truncated Cauchy on [−10, 10] | 1 | 2.39 | 1.35 | 1.17 | 1.12 | 1.97 | 1.78 | 1.61 |
Truncated Cauchy on [−1, 1] | 1 | 0.46 | 0.84 | 0.91 | 0.94 | 1.02 | 1.35 | 1.55 |
Student-t with 3 d.f.’s | 1 | 1.73 | 1.75 | 1.52 | 1.41 | 1.72 | 1.51 | 1.44 |
Standard Laplace | 1 | 1.48 | 1.18 | 1.11 | 1.09 | 1.31 | 1.24 | 1.16 |
Uniform on [−0.5, 0.5] | 1 | 0.36 | 0.84 | 0.91 | 0.94 | 1.46 | 2.21 | 2.70 |
0.5N(−2,1)+0.5N(2,1) | 1 | 0.20 | 0.88 | 0.95 | 0.97 | 2.12 | 2.18 | 2.17 |
0.95N(0,1)+0.05N(0,9) | 1 | 0.86 | 1.18 | 1.15 | 1.12 | 1.10 | 1.02 | 0.95 |
The empirical performance of the proposed method in Table 5(a) is not as impressive as its theoretical performance in (40). For example, for the truncated N(0,1) on [−1, 1], the theoretical AREs according to (40) are 1, 0.48, 0.86, 0.93, 0.95, 1.13, 1.69, 2.17, compared with the empirical values 1, 0.54, 0.93, 0.98, 0.99, 1.14, 1.25, 1.26 in Table 5(a). A plot (not included here) of m(x) = sin(2x) + 2 exp(−16x²) explains this phenomenon: the function exhibits large curvature and sharp changes on [−0.5, 0.5], so a large estimation bias can easily offset the asymptotic efficiency gains, especially at a moderate sample size. To appreciate this, we use the same X and ε as in (64) and consider the model
(65)
Then the bias term h²μKm″(x) vanishes because m″(x) = 0, and the variance plays the dominating role. For all four estimation methods, we use the same bandwidth: the plug-in bandwidth selector for local linear regression. The relative efficiencies are summarized in Table 5(b). The overall pattern of the empirical relative efficiencies is consistent with that of the theoretical ones in (40), and the proposed OWQAE significantly outperforms the other methods for almost all distributions considered. Also, using more quantiles (k = 29) significantly improves the performance of the OWQAE for the truncated N(0,1) on [−1, 1], the truncated Cauchy on [−1, 1], and the uniform on [−0.5, 0.5]; this property is not shared by the CQR method.
In summary, for most non-normal distributions considered, the proposed method can have substantial efficiency improvements over other methods, and the empirical performance is consistent with our asymptotic theory.
9 An Empirical Application
To highlight the proposed approach, we consider a simple application of this method to the widely studied cross-section of stock returns. The Capital Asset Pricing Model [CAPM; see Sharpe (1964) and Black (1972)] has long served as the backbone of both theoretical and empirical finance. It is generally agreed that investors demand a higher expected return for investing in riskier securities. Over the past three decades, a number of studies have empirically examined the performance of the CAPM in the cross-section of returns, and it is well documented that the rate of return to holding common stocks is to some extent predictable over time. A large number of papers have studied the appropriateness of the CAPM in explaining how investors assess risk and determine what risk premium to demand, and several alternative models have been proposed in the literature. However, the empirical evidence is ambiguous, the support for other asset-pricing models is no better, and the theory behind the CAPM has an intuitive appeal that other models lack. For these (and other) reasons, in spite of the controversy in empirical studies, the CAPM is still widely used in financial applications and remains the preferred model in MBA and other managerial finance courses.
The focus of this section is not the choice among alternative models. Rather, we apply the methods discussed in the previous sections to the traditional, widely used CAPM cross-sectional regression (similar to the Fama-MacBeth regressions, which can be used to study the predictability of returns). The cross-sectional regression equation at time t is
Ri,t = λ0 + λ1βim,t−1 + ui,t,  (66)

where Ri,t is the excess return for asset i in month t, λ0 is the intercept term, λ1 is the slope coefficient, βim,t−1 is the conditional beta of the excess return for asset i in month t, and ui,t is an error term. The dating convention indicates that the conditional beta is formed using only information available at time t − 1. This regression provides a decomposition of each excess return in each period into two components: the first component, λ1βim,t−1, represents the part of the return of asset i that is related to the cross-sectional structure of risk, as measured by the betas; the remaining component is uncorrelated with the measures of risk. Thus, the asset pricing model implies that the predictability of returns should be related to risk.
We consider a population of stocks traded on the New York Stock Exchange (NYSE) from January 2009 to December 2010 and study monthly stock returns. These data are available from CRSP (the Center for Research in Security Prices) as well as many other data sources. Following many empirical studies, a stock is included if its returns in the current month and the previous 60 months are available, and we exclude firms with negative book-to-market equity (using information from Compustat). In practice, the cross-sectional regression model (66) is usually estimated by the least squares method. On the other hand, accumulated empirical evidence in finance indicates that stock returns are not normally distributed; in fact, it is well known that return distributions are heavy-tailed. It is therefore important to consider estimation procedures with good properties in the absence of Gaussianity.
We estimate the cross-sectional regression model (66) using four methods: the traditional OLS regression, the LAD estimation, a simple equally weighted quantile averaging estimation (denoted by QAU), and the optimally weighted quantile averaging estimation (denoted by OWQAE). We use k = 9 quantiles, 0.1, 0.2, …, 0.9, for the quantile combination. For the purpose of comparison, we evaluate these estimators through their out-of-sample predictions: we estimate model (66) on the cross-sectional data for each month of 2009, and then use the estimated coefficients λ̂0 and λ̂1 to forecast returns for the corresponding month of 2010. We compare both the mean squared prediction error (MSE) and the mean absolute prediction error (MAD). In particular, for each month we calculate

MSEt = (1/Nt) Σi (Ri,t − R̂i,t)²,  MADt = (1/Nt) Σi |Ri,t − R̂i,t|,

where Nt is the number of stocks in month t, and then average these monthly errors over all months (a computational sketch is given after Table A1). Table A1 below summarizes the results.
Table A1.
 | OLS | LAD | QAU | OWQAE
---|---|---|---|---
MAD | 51.39 | 46.94 | 48.35 | 44.78 |
MSE | 10.50 | 8.89 | 9.45 | 7.95 |
Numbers in this table are multiplied by 500 for convenience.
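A minimal sketch of this out-of-sample evaluation (Python; the array names and helper are ours, not from the paper):

```python
import numpy as np

def prediction_errors(actual_by_month, predicted_by_month):
    """Average the monthly out-of-sample MSE and MAD over all months.

    actual_by_month, predicted_by_month: lists with one array per month
    of 2010, holding the actual and predicted (excess) returns.
    """
    mse_t = [np.mean((a - p) ** 2) for a, p in zip(actual_by_month, predicted_by_month)]
    mad_t = [np.mean(np.abs(a - p)) for a, p in zip(actual_by_month, predicted_by_month)]
    return np.mean(mse_t), np.mean(mad_t)
```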
Model (66) is the basic regression model that characterizes the risk premiums. We next consider an extension of model (66) which adds a conditional heteroscedastic effect of capitalization (the “size” effect). In particular, we consider the following analogue of (20):

Ri,t = λ0 + λ1βim,t−1 + σi,tui,t,  (67)

where σi,t = γCapi,t and Capi,t is the market capitalization of firm i. Again, we estimate the cross-sectional regression model (67) using the four estimation methods mentioned before: the two-stage weighted least squares (WLS), the two-stage weighted least absolute deviation (WLAD), and the QAU and OWQAE based on quantiles 0.1, 0.2, …, 0.9 (a sketch of the WLS weighting step is given after Table A2). Table A2 below reports the mean squared prediction errors (MSE) and mean absolute prediction errors (MAD), calculated in the same way as in Table A1.
Table A2.
 | WLS | WLAD | QAU | OWQAE
---|---|---|---|---
MAD | 51.01 | 46.80 | 48.30 | 44.09 |
MSE | 10.40 | 8.86 | 9.41 | 7.90 |
Numbers in this table are multiplied by 500 for convenience.
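Since σi,t = γCapi,t is known up to the constant γ, weighting by the inverse capitalization restores homoscedasticity; a minimal sketch of this WLS step (Python; hypothetical names, not the authors' code):

```python
import numpy as np

def weighted_ls(y, x, cap):
    """Two-stage WLS for model (67) with sigma_{i,t} = gamma * Cap_{i,t}.

    Dividing the regression through by Cap_{i,t} makes the errors
    homoscedastic (the unknown constant gamma does not affect the
    coefficients), after which ordinary LS applies.
    """
    w = 1.0 / np.asarray(cap)
    X = np.column_stack([np.ones_like(x), x])          # intercept and beta
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef  # (lambda0_hat, lambda1_hat)
```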
The empirical results in Tables A1–A2 indicate that the least squares based estimations are less efficient than the other methods; in particular, the proposed OWQAE delivers the smallest prediction errors among the four methods.
10 Further Discussions
We propose a general method of combining quantile regression information to improve the efficiency of regression estimators. The proposed method is simple, and more efficient regression estimators can be constructed from a relatively small number of quantiles. The method has a wide range of applicability and can potentially be applied to many other models. We briefly discuss a few interesting directions for applying our approach, without giving full details.
The first direction is efficient estimation for the varying-coefficient model:

Y = α(U) + XTβ(U) + ε,  (68)

where α(·) is the functional intercept and β(·) is the p-dimensional column vector of functional coefficients. If ε is independent of (X, U), the conditional τ-th quantile of Y given (X, U) is

QY(τ|X, U) = ατ(U) + XTβ(U), where ατ(U) = α(U) + Qε(τ).
A useful application is the varying-coefficient longitudinal model, where we have longitudinal measurements from multiple subjects. Wang, Zhu and Zhou (2009) studied quantile regression for a partially linear varying-coefficient longitudinal model. In their work, the coefficients depend on the quantile, and they estimated the coefficients at each quantile without combining information across quantiles. We will explore this further in a future paper.
A second direction is volatility estimation in time series. In financial econometrics, volatility plays an important role in asset pricing and risk management. Here we briefly discuss volatility estimation for both parametric and nonparametric ARCH models.
(Nonparametric volatility) Consider the nonparametric ARCH(1) model Xt = σ(Xt−1)εt. Let Qε²(τ) be the τ-th quantile of ε²t, and QX²(τ|Xt−1) the conditional τ-th quantile of X²t given Xt−1. Then QX²(τ|Xt−1) = σ²(Xt−1)Qε²(τ), and hence QX²(τ|x)/QX²(τ|0) = σ²(x)/σ²(0) for all τ. Given estimates of QX²(τj|x), we can construct efficient estimators of σ²(x)/σ²(0) by combining the ratios Q̂X²(τj|x)/Q̂X²(τj|0), j = 1, …, k.
(Parametric volatility) Consider the parametric ARCH(p) model: Xt = σtεt with σ²t = β0 + β1X²t−1 + ⋯ + βpX²t−p. Let ℐt−1 be the information up to time t − 1. Denote by Qε²(τ) the τ-th quantile of ε²t, and by QX²(τ|ℐt−1) the conditional τ-th quantile of X²t given ℐt−1. Then

QX²(τ|ℐt−1) = σ²tQε²(τ) = β0(τ) + β1(τ)X²t−1 + ⋯ + βp(τ)X²t−p, where βj(τ) = βjQε²(τ).

Therefore, we can apply quantile regression at quantile τ to obtain consistent estimates β̂j(τ) of βj(τ). Note that βj(τ)/β0(τ) = βj/β0 for all τ. Therefore, β̂j(τ)/β̂0(τ) consistently estimates βj/β0 for each j = 1, …, p, and we can construct efficient estimators of (β1/β0, …, βp/β0) by combining quantiles τ1, …, τk (a computational sketch is given at the end of this section).
Similar ideas also apply to generalized ARCH models. Since substantial work is needed here, we will explore further in a separate project.
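As a concrete illustration of the parametric strategy above, the following sketch (Python, using the QuantReg class from statsmodels; the function and the equal weighting are ours, with equal weights standing in for the optimal weights) averages the quantile-ratio estimates of βj/β0:

```python
import numpy as np
import statsmodels.api as sm

def arch_ratio_estimates(x, p=1, taus=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Estimate beta_j/beta_0 in an ARCH(p) model by combining quantiles.

    At each quantile tau, regress X_t^2 on its lagged squares; since
    beta_j(tau)/beta_0(tau) = beta_j/beta_0 is free of tau, the ratios
    can be combined across quantiles (equal weights here).
    """
    x2 = np.asarray(x) ** 2
    y = x2[p:]
    lags = np.column_stack([x2[p - j:-j] for j in range(1, p + 1)])
    Z = sm.add_constant(lags)                  # intercept column is beta_0(tau)
    params = [sm.QuantReg(y, Z).fit(q=tau).params for tau in taus]
    ratios = [b[1:] / b[0] for b in params]    # beta_j(tau)/beta_0(tau)
    return np.mean(ratios, axis=0)
```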
Appendix: Proofs
Proof of Theorem 1. By the ergodicity in Assumption 1(ii) and (5),
(69)
Recall the definition of Σβ in Theorem 1. Then it can easily be verified that
(70)
By Assumption 3 and (69)–(70),
(71)
Therefore,
(72)
where
(73)
By the Cramér-Wold device, it suffices to consider the case that ṁ(Xt, β) is scalar-valued. Let ℱt be the σ-algebra generated by {Xt+1, Xt, …; εt, εt−1, …}. By property (P1) in Section 2, {{ṁ(Xt, β) − 𝔼[ṁ(X, β)]}dt}t∈ℤ form martingale differences with respect to {ℱt}t∈ℤ. Using cov(τ − 1{εt ≤ Qε(τ)}, τ′ − 1{εt ≤ Qε(τ′)}) = min(τ, τ′) − ττ′, we have 𝔼(d²t) = S(ω), with S(ω) defined in (13). Since εt is independent of ℱt−1, by (5),
(74)
Since k is fixed, by Assumption 2, dt is bounded. Thus, the assumption ṁ(Xt, β) ∈ ℒ2 ensures the Lindeberg condition. The result then follows from the martingale CLT.
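For completeness, the covariance identity invoked in this proof can be verified directly (a short calculation in our notation, using the continuity of Fε):

```latex
\operatorname{cov}\bigl(\tau - \mathbf{1}\{\varepsilon_t \le Q_\varepsilon(\tau)\},\,
  \tau' - \mathbf{1}\{\varepsilon_t \le Q_\varepsilon(\tau')\}\bigr)
= \mathbb{E}\bigl[\mathbf{1}\{\varepsilon_t \le Q_\varepsilon(\tau)\}\,
  \mathbf{1}\{\varepsilon_t \le Q_\varepsilon(\tau')\}\bigr] - \tau\tau'
= F_\varepsilon\bigl(Q_\varepsilon(\min(\tau, \tau'))\bigr) - \tau\tau'
= \min(\tau, \tau') - \tau\tau'.
```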
Proof of Theorem 2. The optimal weight follows from the Lagrange multiplier method. The asymptotic normality follows from Theorem 1 and .
Proof of Theorem 3. Recall dt in (72)–(73). Since k is fixed, by Slutsky’s theorem, the asymptotic distribution is unchanged if we replace ωj therein by any ω̃j such that ω̃j = ωj + op(1). Thus, the result follows.
Proof of Theorem 4. Define the vectors
Then
Since θ̂(τ) minimizes the criterion function in (22), the reparametrized parameter minimizes the loss function
Suppose we can establish the quadratic approximation
(75)
where
(76)
Then the convexity lemma in Pollard (1991) gives δ̂ = δ* + op(1), where
The desired result then follows by using block matrix inverse of .
It remains to prove (75). In view of L(δ), define
It suffices to prove L̃(δ) = L*(δ) + op(1) and L(δ) = L̃(δ) + op(1).
First, we prove L̃(δ) = L*(δ) + op(1). Using ρτ (cz) = cρτ (z) for c > 0, we can rewrite
Applying Knight (1998)’s identity

ρτ(u − v) − ρτ(u) = −v(τ − 1{u < 0}) + ∫_0^v (1{u ≤ s} − 1{u ≤ 0}) ds,  (77)
we can obtain
(78)
where 𝒢t is the σ-algebra generated by {(Xt+1,Ut+1), (Xt,Ut), …; εt, εt−1, …}, and
By Assumption 5, εt is independent of 𝒢t−1. Thus,
(79)
By Assumption 6(i), there exists some constant c such that
(80)
Thus, from (79) and Taylor’s expansion Fε(s + Qε(τ)) − Fε(Qε(τ)) = sfε(Qε(τ)) + o(s),
(81)
where the convergence follows from the ergodicity and (5). Since {ξt − 𝔼(ξt|𝒢t−1)}t∈ℤ are martingale differences with respect to {𝒢t−1}t∈ℤ, by their orthogonality,
(82)
From (80), we have , which combined with (82) gives . Thus, by (78) and (81), we have L̃(δ) = L*(δ) + op(1).
Next, we prove the approximation L(δ) = L̃(δ) + op(1). Let
Then it is easy to see that
(83)
By the same argument leading to the quadratic approximation L̃(δ) = L*(δ)+op(1) above, we can show that each element of has a quadratic approximation of the order Op(1). Thus, N1 = Op(‖γ̃ − γ‖) = op(1). For N2, by Assumption 6(i)–(ii), and , which give N2 = op(1). Thus, we conclude that L(δ) = L̃(δ) + op(1), completing the proof.
Proof of Theorem 5. The asymptotic normality follows from the Bahadur representation in Theorem 4 and the same martingale CLT argument as in Theorem 1. The optimal weight follows from the Lagrange multiplier method.
Proof of Theorem 6. Write Kt = K{(Xt − x)/h}. For IID data, Kai, Li and Zou (2010) have shown the following asymptotic representation:
Examining their argument and using properties (P1) and (P2) in Section 2, we see that the asymptotic representation also holds under Assumption 1. Therefore, by (30),
where dt is defined in (73). The desired asymptotic normality then follows from the same martingale CLT argument in Theorem 1.
It is easy to see that, under the symmetric density assumption, the optimal weight ω* in (14) automatically satisfies the symmetric weight constraint in (29).
Proof of Theorem 7. This follows from the same argument as in the proof of Theorem 3.
Proof of Theorem 8. Recall τj = j/(k + 1). Define k × k matrices Γ and P:
(84)
Here “diag” denotes a diagonal matrix. By direct matrix multiplication, we can verify
(85)
with 2(k + 1) on the diagonal, −(k + 1) on the super-/sub-diagonals, and 0 elsewhere.
(i) We can rewrite Wk as a sum of integrals over the intervals [τj−1, τj]. For t, s ∈ [τj−1, τj], the relevant increments are uniformly of order 1/(k + 1); thus, applying the Cauchy–Schwarz inequality on each interval, we can obtain the desired bound on Wk.

(ii) Using H = P−1ΓP−1, we can write Λk = qTH−1q = (Pq)TΓ−1(Pq). Note that Pq = [h(τ1), …, h(τk)]T with h(t) = Qε(t)fε(Qε(t)). Using (85) and the argument in (i) above, we can easily obtain the desired result.
Proof of Proposition 1. Let u = Qε(τ) so that τ = Fε(u). Since fε has support ℝ, u → −∞ as τ → 0. Recall g(τ) = fε(Qε(τ)). By the chain rule, we can show g′′(τ) = [∂² log fε(u)/∂u²]/fε(u). Then one can easily show that (46) is equivalent to
(87)
For example, limτ→0 g²(τ)/τ = 0 if and only if , and limτ→0 g²(1 − τ)/τ = 0 if and only if . It remains to show that (87) implies (45).
Let ε > 0 be any given number. By (87), there exists such that and for all . Fix . By Assumption 2, there exists c < ∞ such that |g′′(τ)| < c for . Let .
Then . For τ < τ*, applying , we have
completing the proof.
Proof of Theorem 10. (i) For S(ω) in (13), Rk = S((1/k, …, 1/k)T). By the uniqueness of the minimizer ω* of S(ω) [see (14)], S(ω*) ≤ Rk, with equality if and only if ω* = (1/k, …, 1/k)T. Let g(τ) = fε(Qε(τ)), and for convenience write g(τ0) = g(τk+1) = 0. For the optimal weight ω* in (14), by H−1 = PΓ−1P and (85), we can show
(88)
where Ωk is defined in (15). Note that, for j = ⌊(k + 1)τ⌋ with τ ∈ (0, 1), limk→∞(k + 1)²[2g(τj) − g(τj−1) − g(τj+1)]g(τj) = −g′′(τ)g(τ). Thus, as k → ∞, ω*j = 1/k for all j implies g′′(τ)g(τ) = −c, τ ∈ (0, 1), for some c > 0. Define the transformation u = Qε(τ). By the chain rule, g′′(τ)g(τ) = −c is equivalent to {log fε(u)}′′ = −c. Thus, fε must be a normal density.
(ii) See the proof in Kai, Li and Zou (2010).
Proof of Corollary 2. As in the proof of Theorem 1, we use the martingale CLT and consider scalar-valued ṁ(Xt, β). Similar to dt in (73), with the optimal weight ω*, define
By Theorem 9, S(ω*) = 1/Ωkn → 1/ℐ(fε). Thus, the convergence of the conditional variances follows from the argument in (74). It remains to verify the Lindeberg condition.
By Assumption 2, g(τ) = fε(Qε(τ)) is bounded on τ ∈ (0, 1). Thus, from (88),
for some constant c1. For any given c2 > 0,
(89)
Note that, for any random variable U ∈ ℒq, q > 2, and constant c > 0,
(90)
Applying (90) to (89) and using kn = O[(log n)ε], we have the Lindeberg condition.
Proof of Theorem 11. (i) It follows from (86) and the proof of Theorem 8. (ii) From limτ→0[g²(τ) + g²(1 − τ)]/τ = ∞ and (86), 1/S(ω*) ≥ (k + 1)[g²(τ1) + g²(τk)] → ∞.
Footnotes
We thank Roger Koenker, Steve Portnoy, Victor Chernozhukov, two referees, and seminar participants at the University of Illinois for their very helpful comments. Zhao’s research was partially supported by NIDA grant P50-DA10075-15. Xiao thanks Boston College for research support. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIDA or the NIH.
Note that different forms of the location-scale model can be studied similarly to our analysis in this section, and optimally weighted quantile averaging estimators can be constructed (but the construction of optimal weights will be different, depending on the specific form of the regression model).
Contributor Information
Zhibiao Zhao, Email: zuz13@stat.psu.edu, Department of Statistics, Penn State University, University Park, PA 16802.
Zhijie Xiao, Email: zhijie.xiao@bc.edu, Department of Economics, Boston College, Chestnut Hill, MA 02467.
References
- Akahira M, Takeuchi K. Non-Regular Statistical Estimation. New York: Springer; 1995.
- Beran R. Asymptotically efficient adaptive rank estimates in location models. Annals of Statistics. 1974;2:248–266.
- Bickel P. On adaptive estimation. Annals of Statistics. 1982;10:647–671.
- Black F. Capital market equilibrium with restricted borrowing. Journal of Business. 1972;45:444–454.
- Bradic J, Fan J, Wang W. Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. Journal of the Royal Statistical Society, Series B. 2011;73:325–349.
- Chamberlain G. Quantile regression, censoring and the structure of wages. Advances in Econometrics, Sixth World Congress. 1994:171–210.
- Chen X, Linton O, Jacho-Chavez D. Averaging of moment condition estimators. Working paper. 2011.
- Engle RF. Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica. 1982;50:987–1007.
- Fan J, Farmen M, Gijbels I. Local maximum likelihood estimation and inference. Journal of the Royal Statistical Society, Series B. 1998;60:591–608.
- Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Chapman and Hall; 1996.
- He X, Shao QM. A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. Annals of Statistics. 1996;24:2608–2630.
- Jurečková J, Procházka B. Regression quantiles and trimmed least squares estimator in nonlinear regression model. Journal of Nonparametric Statistics. 1994;3:201–222.
- Kai B, Li R, Zou H. Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society, Series B. 2010;72:49–69.
- Knight K. Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics. 1998;26:755–770.
- Koenker R. A note on L-estimates for linear models. Statistics & Probability Letters. 1984;2:323–325.
- Koenker R. Quantile Regression. New York: Cambridge University Press; 2005.
- Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46:33–49.
- Koenker R, Zhao Q. L-estimation for linear heteroscedastic models. Journal of Nonparametric Statistics. 1994;3:223–235.
- Koenker R, Zhao Q. Conditional quantile estimation and inference for ARCH models. Econometric Theory. 1996;12:793–813.
- Pollard D. Asymptotics for least absolute deviation regression estimators. Econometric Theory. 1991;7:186–199.
- Portnoy S, Koenker R. Adaptive L-estimation of linear models. Annals of Statistics. 1989;17:362–381.
- Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association. 1995;90:1257–1270.
- Sharpe WF. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance. 1964;19:425–442.
- Silverman BW. Density Estimation. London: Chapman and Hall; 1986.
- Stone C. Adaptive maximum likelihood estimation of a location parameter. Annals of Statistics. 1975;3:267–284.
- Stout WF. Almost Sure Convergence. New York: Academic Press; 1974.
- Wang H, Zhu Z, Zhou J. Quantile regression in partially linear varying coefficient models. Annals of Statistics. 2009;37:3841–3866.
- Xiao Z, Koenker R. Conditional quantile estimation and inference for GARCH models. Journal of the American Statistical Association. 2009;104:1696–1712.
- Yu K, Jones MC. Local linear quantile regression. Journal of the American Statistical Association. 1998;93:228–237.
- Zhao Q. Asymptotically efficient median regression in the presence of heteroskedasticity of unknown form. Econometric Theory. 2001;17:765–784.
- Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. Annals of Statistics. 2008;36:1108–1126.