Abstract
In this paper, we study statistical inference in functional quantile regression for scalar response and a functional covariate. Specifically, we consider a functional linear quantile regression model where the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. The objective is to test that the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. An adjusted Wald testing procedure is proposed for this hypothesis of interest, and its chi-square asymptotic null distribution is derived. The testing procedure is investigated numerically in simulations involving sparse and noisy functional covariates and in a capital bike share data application. The proposed approach is easy to implement and the R code is published online at https://github.com/xylimeng/fQR-testing.
Keywords: Composite quantile regression, Functional principal component analysis, Functional quantile regression, Measurement error, Wald test
2020 MSC: Primary 62G08, Secondary 62H15
1. Introduction
Advances in computation and technology have generated an explosion of data with functional characteristics. The need to analyze this type of data has triggered a rapid growth of the functional data analysis (FDA) field; see [9, 35] for two comprehensive treatments. Most research in functional data analysis has focused on mean regression (see, for example, [10, 19, 20, 41, 46]); only a few works accommodate higher-order moment effects [29, 38]. Quantile regression is appealing in many applications, as it allows us to describe the entire conditional distribution of the response at various quantile levels. For example, in our capital bike share data application, it is of interest to study how the bike rental behavior of casual users on the previous day affects the upper quantiles of total bike rentals on the current day.
Quantile regression models for scalar responses and functional covariates were introduced in [2]. Functional quantile regression (fQR) models extend the standard quantile regression framework to accommodate functional covariates: the effect of the covariate on a particular quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the quantile level. Cardot et al. [2] considered a smoothing splines-based approach to represent the functional covariates and derived its convergence rate; Kato [22] studied principal component analysis (PCA)-based estimation and established a sharp convergence rate. In [5] and [4], Crambes et al. discussed nonparametric quantile regression estimation and studied the theoretical properties of a support vector machine-based estimator, a method inspired by [30]. Yao et al. [47] considered a regularized partially functional quantile regression model by additionally incorporating high-dimensional scalar covariates. Shi et al. [37] developed a procedure to test the adequacy of fQR based on functional PCA. Unlike these regularization and basis expansion-based methods, Ferraty et al. [8] and Chen and Müller [3] estimated the conditional quantile function by inverting the corresponding conditional distribution function; they too studied consistency properties of the regression estimator. Nevertheless, there is hitherto no available work on statistical inference for the quantile regression estimator under fQR. Additionally, existing functional quantile regression research often assumes that the functional covariate is observed either completely on the domain or at very dense grids of points, typically with little or no error contamination. In this work, we are interested in formally assessing whether the effect of the true smooth signal of the covariate varies across several quantile levels of interest of the response, when the smooth signal is observed at finite grids and possibly perturbed with error, and a functional linear quantile regression model is assumed. This problem is important in its own right, yielding a more comprehensive description of the relationship between the covariate and the conditional distribution of the response. Furthermore, formally assessing such a hypothesis is critical when one wishes to improve the estimation accuracy of the conditional quantile of the response at some specified level. Specifically, suppose that for several quantile levels around the specified level of interest there is no evidence that the effect of the latent covariate on these quantile levels of the response differs. In that case, one can improve the accuracy in estimating the covariate effect on the response at the specified level of interest by borrowing information across these quantile levels. For example, in the case of standard quantile regression with vector predictors, there is a rich literature on so-called composite quantile regression for aggregating information across quantile levels [21, 24, 49, 52].
In this paper, we assume a linear fQR model that relates the τth quantile of the response to the covariate through the inner product between a bivariate regression coefficient function and the true covariate signal. When the true signal is measured at the same time points across the study, one naive way to test the null hypothesis that the effect of the true covariate signal is constant across several quantile levels of interest is to treat the discretely observed functional covariate as a high-dimensional vector covariate and apply standard testing procedures (the Wald test) from linear quantile regression for vector covariates [25]. As expected, such an approach results in inflated Type I error rates due to the high correlation between the repeated measurements corresponding to the same subject; the situation gets progressively worse when the covariate is contaminated with noise. Another alternative is to consider a single-number summary of the covariate, such as its average or median, and carry out the hypothesis test by employing standard testing methods in quantile regression. Our numerical investigation of this direction shows that while the Type I error rates are preserved well, the power is substantially affected.
We propose to represent the latent smooth covariate and the quantile regression parameter function using the same orthogonal basis system; this reduces the inner product part of the linear fQR model to an infinite sum of products of the basis coefficients of the smooth covariate and of the parameter function. There are various choices for the orthogonal basis: we consider the data-driven basis formed by the leading eigenfunctions of the covariance of the true covariate signal and use the percentage of variance explained criterion to determine a finite truncation for this basis. While using a finite basis system reduces the dimensionality of the problem, an important challenge is handling the variability of the basis coefficients of the smooth latent signal, called functional principal component (fPC) scores. We develop the asymptotic distributions of the quantile estimators based on the estimated fPC scores when the functional covariate is sampled at a fine grid of points (dense design). Finally, we introduce an adjusted Wald test statistic and develop its asymptotic null distribution. The introduced testing procedure shows excellent numerical results even when the functional covariate is sampled at few, irregularly spaced time points across the study (sparse design) and the measurements are contaminated by error.
The theoretical study of the distribution of the quantile estimator based on the estimated fPC scores has important differences from the standard linear quantile regression with vector covariates. First, the predictors, fPC scores, are unknown and require estimation, which in turn introduces uncertainty; by comparison the vector covariates are known in the standard quantile setting counterpart. We show that asymptotically the quantile estimators are still unbiased, but their variances are inflated. This implies that, in this reduced framework, a direct application of the Wald testing procedure for null hypotheses involving regression parameters is not appropriate. Second, dealing with estimated fPC scores in this situation is different from the measurement errors in predictors setting. For the latter, it is typically assumed that the measurement error and the true predictors are mutually independent or that the errors are independent across subjects [42, 44, 45]. However, in the functional data setting the resulting errors, due to the difference between the estimated fPC scores and the true scores, are dependent on the true predictors and are also dependent across subjects. As a result, the theoretical investigation requires more careful quantification in terms of the estimated scores and the use of quantile loss.
This article makes three main contributions. First, we establish the asymptotic distribution of the coefficient estimator for both a single quantile level and multiple quantile levels for densely sampled functional covariates. To the best of our knowledge, this is the first work that studies inference for the quantile estimators; previous research in functional quantile regression focused on consistency and minimax rates (see [3, 22]), and most literature on inference in FDA is limited to the context of functional linear regression [1, 16, 26, 40, 48]. Second, we propose an adjusted Wald test statistic to formally assess whether the quantile regression parameter is constant across specified quantile levels and derive its asymptotic null distribution. Third, we consider cases where the functional covariate is observed sparsely and contaminated with noise and illustrate through a detailed numerical investigation that the testing procedure continues to have excellent performance. Furthermore, we demonstrate the use of composite quantile regression and the corresponding advantage in terms of estimation and prediction accuracy using a capital bike rental data set. Composite quantile regression is well known to improve the efficiency of the quantile estimators at a single quantile level, which becomes especially useful for extreme quantiles [43]; nonetheless, a more formal investigation of functional composite quantile regression is beyond the scope of this article.
The rest of the paper is organized as follows. Section 2 introduces the statistical framework, describes the null and alternative hypotheses, discusses a simpler approximation of the testing problem, and presents the estimation approach. Section 3 develops the asymptotic normality of the proposed estimators, introduces the adjusted Wald test, and derives its asymptotic null distribution. Section 4 presents extensive simulation studies confirming the excellent performance of the proposed test procedure in various scenarios for both dense and sparse designs. Section 5 applies the proposed test to a bike rental data set and illustrates the improvement obtained by combining quantile regressions, relative to single-level quantile regression, once the proposed test supports doing so. Proofs of Theorem 1 and Theorem 2, as well as some additional useful results, are given in Section 6.
2. Methodology
2.1. Statistical framework
Suppose we observe data {Yi, (tij, Wij)} for j ∈ {1, …, mi} and i ∈ {1, …, n}, where Yi is a scalar response variable and Wij is the evaluation of a latent smooth process Xi(·), measured with noise, at the finite grid of points ti1, …, timi lying in a bounded closed interval. It is assumed that the observed functional covariate is perturbed by white noise, i.e., Wij = Xi(tij) + eij, where eij has mean 0 and variance σ². Furthermore, we assume that the true functional signals Xi(·) are square integrable and are independent and identically distributed across i. Our objective is to formally assess whether the smooth covariate signal Xi(·) has a constant effect at specified quantile levels of the response.
Let $Q_{Y_i}(\tau \mid X_i)$ denote the conditional τth quantile of the response Yi given the true covariate signal Xi(·), where τ ∈ (0, 1). We assume the following linear fQR model:
$Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \int_0^1 X_i^c(t)\,\beta(t, \tau)\,dt, \qquad (1)$
where β0(τ) is the quantile-level varying intercept and β(t, τ) is the bivariate regression coefficient function, the main object of interest. It is assumed that, for a fixed quantile level τ, β(t, τ) ∈ L2[0, 1] as a function of t. Here $X_i^c(t)$ is the de-meaned smooth covariate signal, defined as $X_i^c(t) = X_i(t) - E\{X_i(t)\}$. Model (1) is an extension of the standard linear quantile regression model [25] to functional covariates. It was first introduced by [2] and later considered by [3, 22]. For simplicity, in the following it is assumed that the smooth covariate signal has zero mean, i.e., EXi(t) = 0 for all t ∈ [0, 1].
Let {τ1, …, τL} be a set of L quantile levels of interest, where τ1 < ⋯ < τL. Motivated by the reasons mentioned in Section 1, our goal is to test the null hypothesis:
$H_0:\ \beta(\cdot, \tau_1) = \beta(\cdot, \tau_2) = \cdots = \beta(\cdot, \tau_L), \qquad (2)$
against the alternative hypothesis Ha : β(·, τℓ) ≠ β(·, τℓ′), for some ℓ ≠ ℓ′ ∈ {1, …, L}. This null hypothesis involves infinite dimensional objects, which is very different from the common null hypotheses considered in quantile regression.
One approach to simplify the null hypothesis is through a basis function expansion. Specifically, let {ϕk(·)}k≥1 be an orthogonal basis of L2[0, 1] such that $\int_0^1 \phi_k(t)\phi_{k'}(t)\,dt = 0$ if k ≠ k′ and 1 if k = k′. We represent the unknown parameter function β(·, τ) in this orthogonal basis as β(t, τ) = ∑k≥1 βk(τ)ϕk(t), where $\beta_k(\tau) = \int_0^1 \beta(t, \tau)\phi_k(t)\,dt$ are unknown parameter loadings that vary with the quantile level τ. It follows that the equality β(·, τℓ) = β(·, τℓ′) is equivalent to βk(τℓ) = βk(τℓ′) for all k ≥ 1. Thus, the null hypothesis (2) can be written as H0 : βk(τ1) = βk(τ2) = ⋯ = βk(τL) for k ≥ 1. Furthermore, we represent the smooth covariate using the same basis functions as Xi(t) = ∑k≥1 ξikϕk(t), where $\xi_{ik} = \int_0^1 X_i(t)\phi_k(t)\,dt$ are the smooth covariate loadings. Then, the linear fQR model (1) can be equivalently represented as $Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \sum_{k \ge 1} \beta_k(\tau)\,\xi_{ik}$. In practice the infinite summation is typically truncated at some finite level K. As a result the fQR model can be approximated by
$Q_{Y_i}(\tau \mid X_i) \approx \beta_0(\tau) + \sum_{k=1}^{K} \beta_k(\tau)\,\xi_{ik}, \qquad (3)$
and the null hypothesis to be tested can be approximated by a reduced version
$\beta_k(\tau_1) = \beta_k(\tau_2) = \cdots = \beta_k(\tau_L), \quad k \in \{1, \ldots, K\}. \qquad (4)$
Let θτ ≔ (β0(τ), β1(τ), …, βK(τ))T be the (K + 1)-dimensional parameter vector at level τ and $\zeta := (\theta_{\tau_1}^{\mathrm T}, \ldots, \theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ the full quantile regression parameter vector of dimension L(K + 1). Then the reduced null hypothesis (4) can be equivalently re-written as $R\zeta = 0$, where R = R1 ⊗ R2 for an (L − 1) × L contrast matrix R1 encoding the equalities across the L quantile levels and a K × (K + 1) selection matrix R2 that extracts the slope loadings.
Here 0K denotes the K-dimensional vector of zeros and IK is the K × K dimensional identity matrix.
If the loadings ξik’s were known, then model (3) would be exactly the conventional quantile regression model. In that case, a standard Wald testing procedure for $R\zeta = 0$ is typically based on a quadratic form $(R\hat\zeta)^{\mathrm T}\{R\,\widehat{\mathrm{Cov}}(\hat\zeta)\,R^{\mathrm T}\}^{-1}(R\hat\zeta)$, where $\hat\zeta$ is the quantile regression estimator of ζ and $\widehat{\mathrm{Cov}}(\hat\zeta)$ is a consistent estimator of the covariance of $\hat\zeta$ conditional on the true loadings ξik’s; see [25, Chapter 3] for a review of existing methods. However, in practice the loadings of the smooth covariate signal ξik are unknown, and a valid approach has to account for this uncertainty.
Depending on the choice of the orthogonal basis, the approaches used to select the finite truncation K and to develop the theoretical properties of the quantile regression estimators differ. Several choices have been commonly used in the functional data analysis literature: Fourier basis functions [39], wavelet bases [33], or orthogonal B-splines [36, 50]. One important aspect to keep in mind when selecting the basis functions is how to handle the finite truncation K. In this paper we consider the orthogonal basis given by the eigenfunctions of the covariance of the smooth covariate signal Xi(·). Let G(s, t) ≔ Cov{Xi(s), Xi(t)} be the covariance of Xi(·); Mercer’s theorem gives the spectral decomposition $G(s, t) = \sum_{k \ge 1} \lambda_k \phi_k(s)\phi_k(t)$, where {ϕk(·), λk}k are the pairs of eigenfunctions and corresponding eigenvalues. The eigenvalues λk are nonincreasing and nonnegative, and the eigenfunctions ϕk(·) are mutually orthogonal functions in L2[0, 1]. Using the Karhunen–Loève expansion, the zero-mean smooth covariate Xi(·) can be represented as $X_i(t) = \sum_{k \ge 1} \xi_{ik}\phi_k(t)$, where $\xi_{ik} = \int_0^1 X_i(t)\phi_k(t)\,dt$ are commonly known as the functional principal component (fPC) scores of Xi(·); they satisfy E(ξik) = 0 and Var(ξik) = λk and are uncorrelated over k. A popular way to select the finite truncation, or equivalently the number of leading eigenfunctions, is the percentage of variance explained; alternative options for selecting the finite truncation K are considered in [32] and [28].
2.2. Estimation procedure
We discuss estimation for the case when the functional covariate is observed on a fine grid of points, a setting known in the literature as the dense sampling design. Nevertheless, our procedure can be successfully applied to the case when the covariate is observed on an irregular sampling design with few points (sparse sampling design) and contaminated with noise, as illustrated later in the numerical investigation. When the sampling design is dense, and thus mi is very large for each i, a common approach in functional data analysis is “smoothing first, then estimation” [48]. Specifically, we first reconstruct each trajectory $\hat X_i(\cdot)$ from the data {(tij, Wij)}j using penalized regression splines, although any other appropriate smoothing method, such as local polynomial kernel smoothing [6], can also be used. Let $\bar X(\cdot) = n^{-1}\sum_{i=1}^{n}\hat X_i(\cdot)$ be the sample mean of these reconstructed trajectories and denote by $\hat X_i^c(\cdot) = \hat X_i(\cdot) - \bar X(\cdot)$ the centered covariates. Furthermore, let S(·, ·) be the sample covariance of the $\hat X_i^c(\cdot)$’s; the spectral decomposition of S(·, ·) yields the pairs of estimated eigenfunctions and eigenvalues $\{\hat\phi_k(\cdot), \hat\lambda_k\}_k$. The theoretical properties of the estimated eigenfunctions have been well studied in the literature; see [11, 13, 48] among others. As the eigenfunctions ϕk(·) and $\hat\phi_k(\cdot)$ are both defined only up to a sign change, we assume throughout the paper that the sign of $\hat\phi_k(\cdot)$ is chosen such that $\int_0^1 \hat\phi_k(t)\phi_k(t)\,dt \ge 0$. Finally the fPC scores ξik are estimated as $\hat\xi_{ik} = \int_0^1 \hat X_i^c(t)\,\hat\phi_k(t)\,dt$; in practice numerical integration is used to approximate the integral; see also [31].
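To make the “smoothing first, then estimation” pipeline concrete, the following R sketch reconstructs each curve by a penalized regression spline, forms the sample covariance on the grid, and obtains eigenfunctions, eigenvalues, fPC scores, and a PVE-based truncation. It is a minimal illustration, not the authors’ published implementation: the helper name `estimate_fpc_scores`, the use of `mgcv::gam` with REML smoothing, and the simple Riemann-sum numerical integration are our own illustrative choices.

```r
# Minimal sketch of "smoothing first, then estimation" for densely observed curves.
library(mgcv)

estimate_fpc_scores <- function(W, tgrid, pve = 0.95) {
  n <- nrow(W)
  # 1. Reconstruct each trajectory by penalized regression splines (REML smoothing).
  Xhat <- t(apply(W, 1, function(w) {
    fit <- gam(w ~ s(tgrid), method = "REML")
    as.numeric(predict(fit, newdata = data.frame(tgrid = tgrid)))
  }))
  # 2. Center the reconstructed curves and form the sample covariance on the grid.
  Xc <- sweep(Xhat, 2, colMeans(Xhat))
  S  <- crossprod(Xc) / n
  # 3. Spectral decomposition; rescale so eigenfunctions have unit L2 norm.
  dt  <- mean(diff(tgrid))
  eig <- eigen(S * dt, symmetric = TRUE)
  lambda <- pmax(eig$values, 0)
  K   <- which(cumsum(lambda) / sum(lambda) >= pve)[1]   # PVE truncation
  phi <- eig$vectors[, 1:K, drop = FALSE] / sqrt(dt)
  # 4. Estimated fPC scores by numerical integration (Riemann sum).
  scores <- Xc %*% phi * dt
  list(scores = scores, phi = phi, lambda = lambda[1:K], K = K)
}

## toy usage on simulated noisy curves
set.seed(1)
tgrid <- seq(0, 1, length.out = 100)
n <- 50
X <- outer(rnorm(n), sin(2 * pi * tgrid)) + outer(rnorm(n, sd = 0.5), cos(2 * pi * tgrid))
W <- X + matrix(rnorm(n * length(tgrid), sd = 0.3), n)
fpc <- estimate_fpc_scores(W, tgrid, pve = 0.95)
fpc$K
```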
Using the estimated fPC scores $\hat\xi_{ik}$, the quantile regression parameter of the approximated linear fQR model, θτ, is estimated by
$\hat\theta_\tau = \underset{(b_0, b_1, \ldots, b_K)^{\mathrm T}}{\arg\min}\ \sum_{i=1}^{n} \rho_\tau\Big(Y_i - b_0 - \sum_{k=1}^{K} b_k\,\hat\xi_{ik}\Big), \qquad (5)$
where ρτ(x) = x{τ − I(x < 0)} is the quantile loss function and I(x < 0) is the indicator function that equals 1 if x < 0 and 0 otherwise. Although throughout this article we focus on a homogeneous truncation level K to ease presentation, the proposed method easily generalizes to the case in which K varies with τ. We next move on to studying the theoretical properties of the quantile regression estimator in (5).
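Before turning to the theory, note that once the estimated scores are available, the minimization in (5) is ordinary quantile regression with the scores as covariates. A minimal sketch using quantreg::rq follows; the response generated below is only a placeholder so that the example runs end to end, and it reuses the `fpc` object from the earlier sketch.

```r
# Quantile regression of the response on the estimated fPC scores, i.e., the
# check-loss minimization in (5), via quantreg::rq.
library(quantreg)

scores <- fpc$scores                              # from the previous sketch
y <- as.numeric(scores %*% rnorm(ncol(scores)) + rnorm(nrow(scores)))  # placeholder
taus <- c(0.25, 0.50, 0.75)
fits <- lapply(taus, function(tau) rq(y ~ scores, tau = tau))
theta_hat <- sapply(fits, coef)                   # (K + 1) x L matrix of estimates
colnames(theta_hat) <- paste0("tau=", taus)
round(theta_hat, 3)
```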
3. Theoretical properties
3.1. Assumptions
Let Fi(y) = P(Yi < y|Xi(·)), and fi(·) be the corresponding density function. We make the following assumptions:
- A1. {Yi, Xi(·), ei(·)}, i ∈ {1, …, n}, are independent and identically distributed (i.i.d.) as {Y, X(·), e(·)}, and X(·) and e(·) are independent, where E{e(t)} = 0 and Cov{e(t), e(t′)} = σ²I(t = t′) for any t, t′;
- A2. The conditional distribution Fi(·) is twice continuously differentiable and the corresponding density function fi(·) is uniformly bounded away from 0 and ∞ at the conditional quantiles $Q_{Y_i}(\tau_\ell \mid X_i)$, ℓ ∈ {1, …, L};
- A3. The functional covariates X(·) satisfy E{X(t1)X(t2)X(t3)X(t4)} < ∞ uniformly for (t1, t2, t3, t4) ∈ [0, 1]⁴;
- A4. There exists a finite number p0 such that λ1 > λ2 > ⋯ > λp0 > 0 and λk = 0 if k > p0.
A2 and the i.i.d. assumption in A1 are standard in quantile regression with vector covariates; see [25, Ch. 4]. A1 assumes that the functional covariates Xi(·) are observed with independent white noise ei(·), making the model more realistic than the error-free setting of [22]. Assumption A3 holds for Gaussian processes and is common in the FDA literature; for example, see [13] and the discussion therein.
Finally, A4 requires that the functional covariate have a finite number of non-zero eigenvalues, making the approximate model (3) exact with K = p0. This strong assumption has been employed previously in the literature [31, 32]. In numerical studies, we found that A4 is not needed for the testing procedure to show excellent performance in terms of size and power; see, for example, the simulation study in Section 4.4 under the more general model (1) when p0 is divergent. This seems to indicate that A4 is a matter of theoretical convenience. One possible way to relax this assumption is to replace it with a condition on the number of principal components that are relevant in describing the dependence between the functional covariate and the response. Another possibility is to remove it entirely and show that the truncated functional quantile regression approximates the original model with negligible error. Nonetheless, our attempts to prove the main results by relaxing A4 in these directions have not been productive, owing partly to the complicated interplay between the quantile loss function and infinite-dimensional functional data, and partly to our focus on hypothesis testing rather than estimation. Specifically, A4 is critical to ensure a root-n rate for the estimated coefficient functions formulated in Theorem 1 and subsequently to derive the test’s null distribution. As noted in the preceding section, even under A4, inference for fQR based on the estimated fPC scores differs from standard multivariate quantile regression with vector covariates in the key respect that estimation of the fPC scores induces a specific type of measurement error. Unlike the existing measurement-error-in-covariates literature, which relies on certain independence assumptions [42, 44, 45], the measurement errors in the estimated fPC scores are dependent on the true predictors and are also dependent across subjects. This requires a more careful quantification in terms of the estimated scores and the use of the quantile loss. In this article, we focus on addressing the challenge posed by measurement error in quantile regression with the intricate dependence induced by functional covariates, and leave developments to relax A4 to future research.
The following assumptions are commonly used when describing a dense sampling design [31, 48]. For convenient mathematical derivations, we assume that there are the same number of observations per subject, i.e., mi = m for all i.
- B1. The time points tij are drawn independently from a density g(·), for i ∈ {1, …, n} and j ∈ {1, …, m}, where g(·) has bounded support [0, 1] and is continuously differentiable;
- B2. $m \ge C n^{c_m}$, where cm > 5/4 and C is some constant.
For our theoretical development, we require the following condition for the kernel bandwidth hX that is used in smoothing the functional covariates.
- C1.
3.2. Asymptotic distribution
The following theorem gives the asymptotic distribution of the quantile estimator. Kato [22] gave the minimax rate of the coefficient function estimation when there is no measurement error on the discrete functional covariates. The author assumed that the number of eigenvalues is infinite rather than finite as in our Assumption A4. Our established root-n rate crucially depends on A4, which facilitates downstream inference. Should A4 be relaxed, one would need to properly scale the estimator using a slower rate and derive the asymptotic distribution, both for $\hat\theta_\tau$ and for test statistics constructed from it. We denote by D0 the diagonal matrix whose diagonal entries are 1, λ1, …, λp0 and which is positive definite; that is, $D_0 = E(\xi_i\xi_i^{\mathrm T})$, where $\xi_i = (1, \xi_{i1}, \ldots, \xi_{ip_0})^{\mathrm T}$. Similarly, we denote $D_1(\tau) = E\big\{f_i\big(Q_{Y_i}(\tau \mid X_i)\big)\,\xi_i\xi_i^{\mathrm T}\big\}$. When K = p0, the hypothesis in (4) is equivalent to H0 in (2), and the truncated model in (3) incurs no approximation error, as the residual ∑k>K βk(τ)ξik degenerates to zero owing to its zero variance.
Theorem 1. Denote by $\hat\theta_\tau$ the quantile regression estimator defined by (5) with K = p0, where τ ∈ (0, 1). Under Conditions A1–A4, B1–B2, and C1, we have
$\sqrt{n}\,\big(\hat\theta_\tau - \theta_\tau\big) \stackrel{d}{\longrightarrow} N\Big(0,\ \tau(1-\tau)\,D_1(\tau)^{-1} D_0\, D_1(\tau)^{-1} + \Theta_\tau\,\Sigma_0\,\Theta_\tau\Big), \qquad (6)$
where $\Theta_\tau$ is a matrix that depends on the true parameter θτ, and the matrix Σ0, defined in Section 6, does not depend on τ. Moreover, $\hat\zeta = (\hat\theta_{\tau_1}^{\mathrm T}, \ldots, \hat\theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ is asymptotically multivariate normal centered at ζ, and for 1 ≤ ℓ ≠ ℓ′ ≤ L the asymptotic covariance matrix of $\sqrt{n}(\hat\theta_{\tau_\ell} - \theta_{\tau_\ell})$ and $\sqrt{n}(\hat\theta_{\tau_{\ell'}} - \theta_{\tau_{\ell'}})$ is given by
$\big\{\min(\tau_\ell, \tau_{\ell'}) - \tau_\ell\tau_{\ell'}\big\}\, D_1(\tau_\ell)^{-1} D_0\, D_1(\tau_{\ell'})^{-1} + \Theta_{\tau_\ell}\,\Sigma_0\,\Theta_{\tau_{\ell'}}. \qquad (7)$
Remark that the asymptotic covariances in both (6) and (7) contain two components: a Huber [18] sandwich term that is typical in quantile regression theory and a “variance inflation” term. Specifically, if the true scores ξi were observed, then the asymptotic variance of $\hat\theta_\tau$ would be $\tau(1-\tau)\,D_1(\tau)^{-1}D_0D_1(\tau)^{-1}$, and the asymptotic covariance matrix of $\hat\theta_{\tau_\ell}$ and $\hat\theta_{\tau_{\ell'}}$ would be $\{\min(\tau_\ell, \tau_{\ell'}) - \tau_\ell\tau_{\ell'}\}\,D_1(\tau_\ell)^{-1}D_0D_1(\tau_{\ell'})^{-1}$; see [25, 34]. The variance inflation terms, $\Theta_\tau\Sigma_0\Theta_\tau$ in (6) and $\Theta_{\tau_\ell}\Sigma_0\Theta_{\tau_{\ell'}}$ in (7), quantify the effect of the uncertainty in estimating the fPC scores on the quantile regression estimators. Thus, when the covariates are functional data, the asymptotic distribution of $\hat\theta_\tau$ is unbiased but the variance is inflated, where the variance inflation terms depend on the true parameter value θτ.
The proof of Theorem 1 is detailed in Section 6. The reasoning follows two main steps: 1) approximate the estimated fPC scores $\hat\xi_i$ by linear combinations of random vectors of the true fPC scores ξi; and 2) show that the approximation error in the predictors is negligible for the quantile loss function. Step 1 crucially relies on the dense design assumption B2. This allows us to employ various bounds on the estimated eigenfunctions and on the difference between the estimated and true covariance functions, which in turn enables us to derive a fine-grained characterization of the estimated scores (Lemma 1); see the supplementary materials for more detail.
3.3. Adjusted Wald test
Using the asymptotic properties of the quantile regression estimators, we are now ready to develop a Wald-type testing procedure for assessing the general null hypothesis (2), or its finite reduced version (4) represented in vector form by $R\zeta = 0$. Recall that ζ denotes the full quantile regression parameter and $\hat\zeta = (\hat\theta_{\tau_1}^{\mathrm T}, \ldots, \hat\theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ is its estimator.
We define a modified version of the Wald test, called the adjusted Wald test, by ignoring the variance inflation terms in the above asymptotic covariances. Let $\Sigma(\tau_\ell, \tau_{\ell'}) := \sigma(\tau_\ell, \tau_{\ell'})\, D_1(\tau_\ell)^{-1} D_0\, D_1(\tau_{\ell'})^{-1}$, with σ(τℓ, τℓ′) set to τℓ(1 − τℓ) if ℓ = ℓ′, and to {min(τℓ, τℓ′) − τℓτℓ′} otherwise. Then the asymptotic covariance matrix of $\sqrt{n}(\hat\zeta - \zeta)$ without the inflation terms, denoted $\Sigma^a$, in which the superscript a indicates the adjustment by ignoring the inflation terms, is
$\Sigma^a = \big[\,\Sigma(\tau_\ell, \tau_{\ell'})\,\big]_{\ell, \ell' = 1}^{L}, \qquad (8)$ that is, the L(K + 1) × L(K + 1) block matrix whose (ℓ, ℓ′)th block is Σ(τℓ, τℓ′).
Let $\hat\Sigma^a$ be a consistent estimator of $\Sigma^a$, constructed similarly to (8) but with a consistent estimator of Σ(τℓ, τℓ′) in each block. The adjusted Wald test is given by
$T_n = n\,\big(R\hat\zeta\big)^{\mathrm T}\big(R\,\hat\Sigma^a R^{\mathrm T}\big)^{-1}\big(R\hat\zeta\big). \qquad (9)$
This test is not a proper Wald test, as the covariance matrix used is not the valid covariance of $\hat\zeta$. The following result studies the asymptotic null distribution of Tn assuming K = p0.
Theorem 2. Assume the regularity conditions A1–A4, B1–B2 and C1 hold. If the null hypothesis is true, Rζ = 0, then the asymptotic distribution of Tn is $\chi^2_{(L-1)p_0}$, the chi-square distribution with (L − 1)p0 degrees of freedom.
The proof of this result relies on the observation that if Σ denotes the proper asymptotic covariance of $\sqrt{n}(\hat\zeta - \zeta)$ described in Theorem 1, then $R\Sigma R^{\mathrm T} = R\Sigma^a R^{\mathrm T}$ under the null. Intuitively, this is because the inflation terms in (6) and (7) possess a sandwich structure with a constant matrix enclosed by Θτ, which is zeroed out when left-multiplied by R under the null hypothesis Rζ = 0. Thus, although the estimation of the fPC scores inflates the covariance of the regression estimator, its effect on testing the null hypothesis (2) is negligible. Nevertheless, if one is interested in testing a different type of null hypothesis about ζ, such as one involving nonlinear functionals, then this variance inflation term has to be taken into account for a proper testing procedure.
We construct the estimators $\hat\Sigma(\tau_\ell, \tau_{\ell'})$, 1 ≤ ℓ, ℓ′ ≤ L, by plugging in sample-moment estimators of D0 and D1(τ) based on the estimated scores. The consistency of these estimators can be proved by law of large numbers-based arguments together with Lemma 1, which quantifies the closeness between $\hat\xi_i$ and ξi. For the estimation of the conditional densities appearing in D1(τ), we use the difference quotient method proposed by [15] and substitute the estimates $\hat\theta_\tau$ and $\hat\xi_i$. Theorem 2 implies that, for testing the null hypothesis of equal functional covariate effect across various quantile levels, the common Wald test based on the estimated fPC scores provides a valid testing procedure. The adjusted Wald test, which disregards the variance component due to the estimation uncertainty of the fPC scores, has a chi-square asymptotic null distribution.
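Once $\hat\zeta$ and $\hat\Sigma^a$ are available, the statistic in (9) is a simple quadratic form. The R sketch below computes it; the consecutive-difference contrast $R_1$, the intercept-dropping selector $R_2$, and the convention that `Sigma_a_hat` estimates the asymptotic covariance of $\sqrt{n}(\hat\zeta - \zeta)$ are assumptions made for illustration, since the paper’s exact displays for $R_1$ and $R_2$ are not reproduced above.

```r
# Adjusted Wald statistic (9), as a minimal sketch under the stated assumptions.
adjusted_wald <- function(zeta_hat, Sigma_a_hat, n, K, L) {
  R1 <- cbind(diag(L - 1), 0) - cbind(0, diag(L - 1))   # (L-1) x L consecutive contrasts
  R2 <- cbind(rep(0, K), diag(K))                       # K x (K+1): drops the intercept
  R  <- kronecker(R1, R2)                               # R = R1 (x) R2
  Rz <- R %*% zeta_hat
  Tn <- n * drop(t(Rz) %*% solve(R %*% Sigma_a_hat %*% t(R)) %*% Rz)
  df <- nrow(R)                                         # (L - 1) * K constraints
  list(statistic = Tn, df = df,
       p.value = pchisq(Tn, df = df, lower.tail = FALSE))
}
```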
If the number of principal components p0 is replaced by its consistent estimator Kn, then the null distribution of the test statistic Tn is approximately $\chi^2_{(L-1)K_n}$ for large n. In other words, the difference between the respective cumulative distribution functions goes to zero pointwise; this implies that the critical value of Tn is asymptotically the same as that of a $\chi^2_{(L-1)p_0}$ distribution. The functional data analysis literature provides a rich menu of possibilities for selecting p0, such as the percentage of variance explained (PVE) criterion and a Bayesian information criterion (BIC) proposed by [32]. The BIC method is proved to be consistent for both sparse and dense functional data. The PVE criterion is defined as
$\hat K = \min\Big\{K : \sum_{k=1}^{K}\hat\lambda_k \Big/ \sum_{k=1}^{q}\hat\lambda_k \ge \mathrm{PVE}\Big\}$, where q is the number of estimated eigenvalues and PVE is some user-defined threshold that approaches one. The widely used PVE approach also leads to a consistent estimator of p0 provided that the number of estimated eigenvalues is greater than p0 and the eigenvalues are estimated consistently, which holds in many applications and is also suggested by our extensive simulation studies. We use the PVE method in the remaining sections and have found that it leads to accurate estimates of p0 in finite samples.
We would like to point out that the asymptotic power of the adjusted Wald test is obtainable using a non-central chi-square distribution. However, the expression is complicated unless stronger assumptions are imposed on the Xi(·), since the equality $R\Sigma R^{\mathrm T} = R\Sigma^a R^{\mathrm T}$ does not generally hold under the alternative, and thus a Wald-type test then requires an estimate of the matrix Σ0.
4. Simulation
4.1. Settings
The simulated data are of the form {Yi, (tij, Wij), j ∈ {1, …, mi}} for i ∈ {1, …, n}, where Yi is the scalar response and Wij = Xi(tij) + eij is the functional covariate contaminated with measurement error eij, tij ∈ [0, 1], and Xi(·) is the true functional covariate. We generate the data from the following heteroscedastic model: $Y_i = \int_0^1 t\,X_i(t)\,dt + \big\{1 + \gamma \int_0^1 t^2 X_i(t)\,dt\big\}\,\epsilon_i$ with ϵi ~ N(0, 1). This leads to a quantile regression model of the form (1) with $\beta_0(\tau) = \Phi^{-1}(\tau)$ and $\beta(t, \tau) = t + \gamma t^2\,\Phi^{-1}(\tau)$, where Φ is the standard normal distribution function. Note that the functional coefficient β(t, τ) is nonlinear in t when γ ≠ 0. Here the scalar γ controls the heteroscedasticity and determines how the coefficient function varies across τ. Specifically, if γ = 0 then the effect of Xi(·) is constant across different quantile levels of Yi|Xi(·), while if γ ≠ 0 then the effect of Xi(·) varies across quantile levels of Yi|Xi(·).
The true functional covariate Xi(·) is generated from a Gaussian process with zero mean and covariance function cov{Xi(s), Xi(t)} = ∑k≥1 λkϕk(s)ϕk(t), where $\lambda_k = (1/2)^{k-1}$ for k ∈ {1, 2, 3} and λk = 0 for k ≥ 4, and {ϕk(·)}k are orthonormal Legendre polynomials on [0, 1]. The measurement errors are assumed to be eij ~ N(0, σ²). Fig. 1 plots simulated data when n = 200, γ = 1, and σ = 1.
Fig. 1: Simulated data when n = 200 and γ = 1. The left panel plots the functional covariates, with two randomly selected curves highlighted in blue and red; the right panel shows the histogram of the response.
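For readers who wish to reproduce a similar setting, the following R sketch generates data under the location-scale form reconstructed above. It is an illustration, not the authors’ simulation code: the explicit shifted Legendre polynomial formulas are assumed (the paper’s exact expressions are not reproduced in the text), and integrals are approximated by Riemann sums.

```r
# Sketch of the Section 4.1 simulation design, under the stated assumptions.
simulate_fqr_data <- function(n = 200, m = 100, gamma = 1, sigma = 1) {
  tgrid <- seq(0, 1, length.out = m)
  # assumed orthonormal shifted Legendre polynomials on [0, 1]
  phi <- cbind(rep(1, m),
               sqrt(3) * (2 * tgrid - 1),
               sqrt(5) * (6 * tgrid^2 - 6 * tgrid + 1))
  lambda <- (1 / 2)^(0:2)
  xi <- sapply(lambda, function(l) rnorm(n, sd = sqrt(l)))   # true fPC scores
  X  <- xi %*% t(phi)                                        # true smooth curves
  dt <- mean(diff(tgrid))
  int1 <- (X %*% tgrid) * dt               # integral of t * X_i(t)
  int2 <- (X %*% tgrid^2) * dt             # integral of t^2 * X_i(t)
  Y  <- drop(int1 + (1 + gamma * int2) * rnorm(n))
  W  <- X + matrix(rnorm(n * m, sd = sigma), n, m)           # noisy observations
  list(Y = Y, W = W, X = X, tgrid = tgrid)
}
dat <- simulate_fqr_data(n = 200, gamma = 1, sigma = 1)
```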
The objective is to test the null hypothesis H0 : β(·, τℓ) = β(·, τℓ′) for all pairs of quantile levels τℓ, τℓ′ in a given set, that is, that the effect of the true functional covariate on the conditional distribution of the response is the same at all quantile levels in that set. When γ = 0, the coefficient function β(·, τ) does not depend on τ, which means that the null hypothesis is true; when γ ≠ 0, β(·, τ) varies with τ and thus the null hypothesis is false. We consider two sets of quantile levels: one set of one-sided quantile levels and one set of two-sided quantile levels.
We implement the proposed adjusted Wald test with the number of fPCs selected via the PVE criterion with PVE = 95%. We use the R package refund [17] to estimate the fPC scores, where each individual trajectory is reconstructed using penalized regression splines via the function gam and the smoothing parameter is selected by restricted maximum likelihood. We investigate the performance of the proposed test for low and high levels of measurement error in the functional covariate (σ = 0.05 and σ = 1, respectively) and for sample sizes n ranging from 100 to 5000. For the functional covariates, we consider a dense design in Section 4.2, a sparse design in Section 4.3, and a setting where p0 diverges in Section 4.4.
4.2. Dense design
We first consider a dense design for the functional covariates: the grid of points for each i is an equispaced grid of mi = 100 time points in [0, 1]. We are not aware of any existing procedure for testing the null hypothesis of constant effect at various quantile levels when the covariate is functional; however, we can exploit this particular setting, pretend the covariates are vectors, and thus use or directly extend existing testing procedures from quantile regression. In particular, we consider three alternative approaches: (1) treat the observed functional covariate as a vector and use the common Wald test for vector covariates in quantile regression (NaiveQR); (2) summarize the observed functional covariate by a single-number summary and apply the Wald test (SSQR); and (3) treat the observed functional covariate as a vector, reduce the dimensionality using principal component analysis, and then apply the Wald test to the vector of principal component scores (pcaQR). For the pcaQR approach, the number of principal components is selected via PVE at the level PVE = 95%. The Wald test for vector covariates used in these three approaches is described in [25, Chapter 3.2.3].
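A minimal sketch of the SSQR comparator follows, using the curve average as the single-number summary (the paper mentions average or median as options). It assumes that quantreg’s anova method applied to several rq fits at different quantile levels carries out the Wald-type test of equal slope coefficients, which is its documented default behavior; it also reuses `simulate_fqr_data` from the sketch above.

```r
# SSQR comparator sketch under a null setting (gamma = 0), as an illustration only.
library(quantreg)
dat0 <- simulate_fqr_data(n = 500, gamma = 0, sigma = 1)
xbar <- rowMeans(dat0$W)                   # single-number summary of each curve
taus <- c(0.5, 0.6, 0.7)                   # illustrative quantile level set
fits <- lapply(taus, function(tau) rq(dat0$Y ~ xbar, tau = tau))
do.call(anova, fits)                       # assumed: tests equality of slopes across taus
```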
Table 1 summarizes the empirical Type I error rates of the adjusted Wald test when testing H0 at the one-sided as well as the two-sided set of quantile levels, when the functional covariate is observed with large (σ = 1) measurement error. The results are presented for three significance levels, α = 0.01, α = 0.05, and α = 0.10; they indicate that, irrespective of the quantile level set or the magnitude of the measurement error, the Type I error rates are slightly inflated for moderate sample sizes. Nevertheless, the empirical Type I error rates converge to the nominal level. The empirical Type I error rates for the alternative approaches are presented in Table 2. As expected, the NaiveQR approach has very poor performance. The NaiveQR approach performs hypothesis testing with highly correlated covariates, which leads to numerical instability due to near-singularity of the design matrix. Therefore NaiveQR produces many missing values (reported as “–”) in the table and yields inflated empirical Type I error rates at every significance level. Results for σ = 0.05 are similar and omitted here.
Table 1:
Type I error of the adjusted Wald-type test at significance level α ∈ {0.01, 0.05, 0.10} under the dense design. We test H0 at the two sets of quantile levels. Results are based on 5000 simulations.
| Scenario | n | 0.01 | 0.05 | 0.10 | Scenario | n | 0.01 | 0.05 | 0.10 |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.021 | 0.060 | 0.104 | 100 | 0.030 | 0.076 | 0.123 | ||
| σ = 1 | 500 | 0.014 | 0.057 | 0.107 | σ = 1 | 500 | 0.015 | 0.062 | 0.116 |
| 1000 | 0.017 | 0.052 | 0.106 | 1000 | 0.015 | 0.059 | 0.112 | ||
| 2000 | 0.011 | 0.051 | 0.101 | 2000 | 0.010 | 0.053 | 0.103 | ||
| 5000 | 0.010 | 0.054 | 0.105 | 5000 | 0.012 | 0.056 | 0.103 |
Table 2:
Type I error of the alternative approaches at significance level α ∈ {0.01, 0.05, 0.10} under the dense design. We test H0 at the two sets of quantile levels. Results are based on 5000 simulations. When a method returns an error (due to singularity of the design matrix) in more than 20% of the replications, we report the result as “–”.
| NaiveQR | SSQR | pcaQR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Scenario | n | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 |
| 100 | – | – | – | 0.008 | 0.033 | 0.071 | – | – | – | |
| σ = 1 | 500 | – | – | – | 0.008 | 0.036 | 0.080 | – | – | – |
| 1000 | – | – | – | 0.010 | 0.049 | 0.092 | 0.996 | 0.999 | 1.000 | |
| 2000 | 1.000 | 1.000 | 1.000 | 0.009 | 0.048 | 0.097 | 1.000 | 1.000 | 1.000 | |
| 5000 | 1.000 | 1.000 | 1.000 | 0.008 | 0.053 | 0.099 | 0.999 | 1.000 | 1.000 | |
| 100 | – | – | – | 0.009 | 0.040 | 0.077 | – | – | – | |
| σ = 1 | 500 | – | – | – | 0.009 | 0.050 | 0.096 | – | – | – |
| 1000 | 1.000 | 1.000 | 1.000 | 0.009 | 0.046 | 0.095 | 1.000 | 1.000 | 1.000 | |
| 2000 | 1.000 | 1.000 | 1.000 | 0.010 | 0.048 | 0.099 | 1.000 | 1.000 | 1.000 | |
| 5000 | 1.000 | 1.000 | 1.000 | 0.011 | 0.051 | 0.100 | 1.000 | 1.000 | 1.000 | |
The pcaQR approach gives relatively good performance when the magnitude of the error is small (σ = 0.05): the empirical Type I error is close to the nominal level in results not reported here. However, Table 2 shows that as the error variance increases (σ = 1), the empirical rejection probabilities are either excessively inflated, when n ∈ {1000, 2000, 5000}, or there are too many missing values, when n ∈ {100, 500}. The results are not surprising: in the case of large error variance, a direct application of principal component analysis yields a large number of principal components. As a consequence, the application of the classical Wald test for vector covariates leads to numerical instability due to singularity of the design matrix, much as for the NaiveQR approach. The performance of the SSQR approach is very good for all the scenarios considered and across various sample sizes: the empirical Type I error rates are close to the nominal levels. This is expected, as when H0 holds, the functional covariate effect operates through its mean and this effect is invariant over quantile levels.
Next we evaluate the performance in terms of empirical rejection probabilities when the null hypothesis is not true. We focus only on the proposed adjusted Wald testing procedure and SSQR, as they have the correct size. Fig. 2 shows the power curves based on 2000 simulations for the large-noise case with σ = 1; the results are similar in the low-noise case (σ = 0.05) and for brevity are not included. The adjusted Wald procedure is much more powerful than SSQR irrespective of the departure from the null hypothesis, as reflected by the coefficient γ. For example, when γ = 1 the probability of correctly rejecting H0 using the adjusted Wald test is about 100% when the sample size is 500 or more, whereas the counterpart obtained with SSQR is less than 70% even when the sample size increases to 5000. These results are not surprising, as SSQR summarizes the entire functional covariate through a single scalar, while the proposed adjusted Wald test employs the full functional covariate.
Fig. 2: Power curves of the adjusted Wald test and SSQR under the dense design. We test H0 at the two sets of quantile levels. The x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}. Results are based on 2000 simulations.
4.3. Sparse design
Next, we study the performance of the adjusted Wald testing procedure when the functional covariate is observed sparsely and with measurement error. We set an overall grid of 101 equispaced points in [0, 1] and consider two settings: a ‘moderately sparse’ sampling design in which mi = 50 randomly selected time points form the grid for each i, and a ‘highly sparse’ design with mi = 10. Other aspects of the data-generating process follow the dense design described in the previous section. We use the adjusted Wald test in conjunction with sparse fPCA techniques that estimate the fPC scores ξik via conditional expectation, as proposed by [46]. When the sampling design of the functional covariate is sparse, there are no obvious alternative approaches to compare with, so in this section we discuss only the performance of the proposed Wald-type procedure.
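A sketch of the sparse-design score estimation step is given below. It assumes that refund::fpca.sc accepts a curves-by-grid matrix with NAs at unobserved points and returns conditional-expectation (PACE-type) scores in its `$scores` component; both the interface details and the reuse of `simulate_fqr_data` from the earlier sketch are illustrative assumptions rather than the authors’ implementation.

```r
# Sparse-design score estimation sketch, under the stated assumptions about refund.
library(refund)
dat_sp <- simulate_fqr_data(n = 500, gamma = 1, sigma = 1)
W_sparse <- dat_sp$W
m <- length(dat_sp$tgrid)
for (i in seq_len(nrow(W_sparse))) {
  drop_idx <- sample(m, m - 10)            # keep only mi = 10 random time points
  W_sparse[i, drop_idx] <- NA
}
sfpca <- fpca.sc(Y = W_sparse, argvals = dat_sp$tgrid, pve = 0.95)
dim(sfpca$scores)                          # n x (number of selected fPCs)
```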
Table 3 shows the empirical Type I error when the noise level is σ = 1. The results show excellent performance of the adjusted Wald test in maintaining the nominal levels for moderately large sample sizes (n = 1000 or larger) under both the moderately sparse and the highly sparse sampling designs. Fig. 3 shows the power of the adjusted Wald test for the moderately sparse and highly sparse designs with σ = 1. It indicates that, as expected, the sparsity of the functional covariates slightly affects the proposed functional Wald-type procedure. Nevertheless, the adjusted Wald test continues to display excellent performance. The results are similar for the low level of measurement error and are omitted here for brevity.
Table 3:
Type I error of the adjusted Wald test at significance level α ∈ {0.01, 0.05, 0.10} under the sparse designs. We test H0 at the two sets of quantile levels. The missing rate is 50% for moderate sparsity and 90% for high sparsity. Results are based on 5000 simulations.
| missing rate = 50% | missing rate = 90% | ||||||
|---|---|---|---|---|---|---|---|
| Scenario | n | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 |
| 100 | 0.021 | 0.063 | 0.104 | 0.024 | 0.075 | 0.119 | |
| σ = 1 | 500 | 0.014 | 0.055 | 0.104 | 0.011 | 0.058 | 0.110 |
| 1000 | 0.014 | 0.055 | 0.106 | 0.013 | 0.052 | 0.101 | |
| 2000 | 0.011 | 0.055 | 0.106 | 0.013 | 0.053 | 0.103 | |
| 5000 | 0.011 | 0.052 | 0.100 | 0.010 | 0.048 | 0.100 | |
| 100 | 0.026 | 0.075 | 0.120 | 0.034 | 0.092 | 0.143 | |
| σ = 1 | 500 | 0.016 | 0.058 | 0.110 | 0.021 | 0.069 | 0.119 |
| 1000 | 0.013 | 0.057 | 0.106 | 0.014 | 0.063 | 0.114 | |
| 2000 | 0.011 | 0.053 | 0.100 | 0.011 | 0.056 | 0.108 | |
| 5000 | 0.010 | 0.048 | 0.103 | 0.011 | 0.049 | 0.100 | |
Fig. 3: Power curves of the adjusted Wald test for the moderately sparse design with mi = 50 (blue) and the highly sparse design with mi = 10 (red). We test H0 at the two sets of quantile levels. The x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}. Results are based on 2000 simulations.
4.4. Divergent p0
In this section, we study the performance of the proposed adjusted Wald test when Assumption A4 is violated. We follow the same settings as in Section 4.1, but the eigenvalues and eigenfunctions used to generate the functional covariate are given by $\lambda_k = (1/2)^{k-1}$ for k up to a truncation level that diverges with the sample size (defined through the floor function ⌊·⌋) and λk = 0 beyond it; the eigenfunction ϕk is the kth function in the Fourier basis on [0, 1]. We set σ = 1 for the measurement error in the functional covariate.
Table 4 presents the Type I error rates of the adjusted Wald test under various designs. We can see that even when Assumption A4 is violated, the proposed test matches the nominal level for large sample sizes, under both dense and sparse designs. Fig. 4 plots the power curves for γ ∈ {0.5, 1, 1.5}, which indicate performance similar to the case where p0 is a small constant. Therefore, the proposed Wald test appears to retain its desirable performance when Assumption A4 does not hold, at least under these simulation settings. A theoretical justification is an interesting topic for future research.
Table 4:
Type I error of the adjusted Wald test at significance level α ∈ {0.01, 0.05, 0.10} when p0 is divergent, under the dense design (no missing) and the sparse designs. We test H0 at the two sets of quantile levels. The missing rate is 50% for moderate sparsity and 90% for high sparsity. Results are based on 5000 simulations.
| n | no missing | missing rate = 50% | missing rate = 90% | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | ||
| 100 | 0.045 | 0.107 | 0.161 | 0.049 | 0.107 | 0.162 | 0.040 | 0.096 | 0.147 | |
| 500 | 0.025 | 0.085 | 0.146 | 0.022 | 0.081 | 0.141 | 0.026 | 0.084 | 0.137 | |
| 1000 | 0.015 | 0.067 | 0.123 | 0.019 | 0.072 | 0.132 | 0.013 | 0.062 | 0.120 | |
| 2000 | 0.015 | 0.063 | 0.120 | 0.016 | 0.062 | 0.115 | 0.015 | 0.063 | 0.119 | |
| 5000 | 0.010 | 0.052 | 0.107 | 0.010 | 0.059 | 0.117 | 0.012 | 0.055 | 0.101 | |
| 100 | 0.070 | 0.149 | 0.222 | 0.061 | 0.142 | 0.212 | 0.051 | 0.125 | 0.193 | |
| 500 | 0.035 | 0.105 | 0.169 | 0.037 | 0.109 | 0.174 | 0.025 | 0.087 | 0.148 | |
| 1000 | 0.025 | 0.084 | 0.148 | 0.022 | 0.079 | 0.140 | 0.024 | 0.077 | 0.132 | |
| 2000 | 0.020 | 0.072 | 0.127 | 0.016 | 0.067 | 0.123 | 0.015 | 0.063 | 0.123 | |
| 5000 | 0.013 | 0.056 | 0.111 | 0.014 | 0.058 | 0.104 | 0.012 | 0.057 | 0.110 | |
Fig. 4: Power curves of the adjusted Wald test when p0 is divergent. We test H0 at the two sets of quantile levels. In each plot, the x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}, and a larger γ corresponds to more deviation from the null hypothesis. Results are based on 2000 simulations.
5. Application
In this section we consider the capital bike sharing study and discuss the application of the proposed testing procedure to formally assess whether the effect of the previous day’s casual bike rentals on the current day’s total bike rentals varies across several quantile levels. The bike data [7], available at http://capitalbikeshare.com/system-data, are recorded by the Capital Bikeshare System (CBS), Washington D.C., USA. Bike sharing systems, a new generation of bike rental services, automate membership, rental, and return. With currently over 500 bike-share programs around the world [27] and a fast-growing trend, analyzing data from these systems, in particular their effects on public traffic and the environment, has become popular. The bike data include hourly bike rentals by casual users collected from January 1st, 2011 to December 31st, 2012, for a total of 731 days.
Our objective is to formally assess how the previous day’s casual bike rentals, Xi(·), affect the distribution of the current day’s total bike rental count, Yi, where i ∈ {1, …, 730} denotes the ith day starting from January 2nd, 2011. A subsequent interest is to predict the 90% quantile of the total casual bike rentals. Fig. 5 plots the hourly profiles of casual bike rentals (left) and the histogram of the total casual bike rentals (right).
Fig. 5: Bike rental data (casual users). The left panel plots the hourly bike rentals by casual users on the previous day, Xi(t), for t ranging from 0 to 24 hours; the right panel shows the histogram of the total casual bike rentals on the current day, Yi, where i ∈ {1, …, 730}.
We assume the functional quantile regression model (1), $Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \int X_i^c(t)\,\beta(t, \tau)\,dt$, where Yi is the total casual bike rentals for the current day and Xi(·) is the true profile of the casual bike rentals recorded on the previous day. As described earlier, β0(τ) is the quantile-level varying intercept and β(·, τ) is the slope parameter function, which quantifies the effect of the functional covariate on the τth quantile of the distribution of the response.
To address the first objective, we consider a set of quantile levels and use the proposed testing procedure to test the null hypothesis that β(·, τ) is the same at all quantile levels in this set.
The number of fPCs is selected using PVE = 99%; this choice selects three fPCs. We use the adjusted Wald test Tn and its asymptotic null distribution; the resulting p-value is close to zero, indicating overwhelming evidence that small and large numbers of bike rentals are affected differently by the previous day’s hourly rentals.
Next we turn to the problem of predicting the 90% quantile of the total bike rentals for the current day. When the quantile coefficients are constant over a region of quantile levels, we may improve the estimator’s efficiency by borrowing information from neighboring quantiles to estimate the common coefficients, which is especially valuable when the quantile level of interest is high. Here we consider a set of quantile levels around the 0.9 quantile. We apply the proposed method to estimate the coefficient functions at these quantile levels, as shown in Fig. 6. The corresponding adjusted Wald test leads to a p-value of 0.466, which suggests that the quantile coefficients are not significantly different across these quantile levels. We therefore consider combined quantile regression over this set using the quantile average estimator (QAE) and composite regression of quantiles (CRQ) with equal weights; see [24, 43] for technical details. We denote the single-level quantile regression estimator at the 0.9 quantile by RQ.
Fig. 6: Estimated β(·, τ) by the proposed method at various quantile levels for the capital bike sharing study. The x-axis ranges from 0 to 24 hours.
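To illustrate the borrowing-of-information idea behind QAE, the following R sketch fits rq at each level in a set of upper quantiles and averages the slope loadings with equal weights. The function name `qae_slopes`, the placeholder inputs `y` and `scores` (response and estimated fPC scores), and the quantile levels listed (taken from Table 5 for illustration) are our own choices; this is not the exact QAE/CRQ implementation referenced in [24, 43].

```r
# Equal-weight quantile average estimator (QAE) sketch on the fPC scores.
library(quantreg)
qae_slopes <- function(y, scores, taus) {
  slopes <- sapply(taus, function(tau) coef(rq(y ~ scores, tau = tau))[-1])
  rowMeans(slopes)                         # equal-weight average of slope loadings
}
# taus_set <- c(0.800, 0.825, 0.850, 0.875, 0.900)   # illustrative level set
# beta_qae <- qae_slopes(y, scores, taus_set)        # y, scores: placeholders
```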
We use 1000 bootstrap samples to study the efficiency of the three estimators. Fig. 7 plots the bootstrap means and standard errors of the estimates of β(·, 0.9) by QAE, CRQ and RQ. The QAE and CRQ estimators have smaller standard errors uniformly for all t, indicating efficiency gain by combining information across quantile levels. We also observe that the number of fPC is either 3 or 4 in all bootstrap samples, suggesting that the assumption A4 is reasonable in this data application.
Fig. 7: Bootstrap means (left) and standard errors (right) when estimating β(·, 0.9) by QAE, CRQ, and RQ. QAE and CRQ reduce the standard error of RQ. The x-axis ranges from 0 to 24 hours.
Furthermore, we conduct a cross-validation study by randomly selecting 50% of the data as the training set and using the other half as the test set. We use 1000 replications and, for each replication and each quantile level τ in the set, calculate the prediction error
$\mathrm{PE}(\tau) = \sum_{i \in \text{test}} \rho_\tau\Big\{Y_i - \hat\beta_0(\tau) - \sum_{k=1}^{K} \hat\beta_k(\tau)\,\hat\xi_{ik}\Big\},$
where the estimated coefficients are based on the training data and the summation is over the test data. The RQ estimates are obtained separately at each τ, while the QAE and CRQ estimates are shared across the quantile levels. The averaged prediction errors are reported in Table 5. The application of QAE and CRQ improves the prediction significantly at the 0.875 and 0.9 quantiles, while the differences among the three methods are not significant at the lower quantiles. This makes sense, since data sparsity becomes more severe at more extreme quantile levels; hence, incorporating lower quantile levels improves efficiency at higher levels, whereas incorporating more extreme levels may not benefit prediction at the lower quantile levels.
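For completeness, the check-loss prediction error reconstructed above is two lines of R; `q_hat` below stands for the predicted τth conditional quantile on the test set (an illustrative argument name).

```r
# Check-loss prediction error used in the cross-validation comparison.
check_loss <- function(u, tau) u * (tau - as.numeric(u < 0))
pred_error <- function(y_test, q_hat, tau) sum(check_loss(y_test - q_hat, tau))
```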
Table 5:
Prediction errors from different methods averaged over 1000 cross-validations. The maximum standard error of each row is reported in the last column. QAE and CRQ that combine information at various quantile levels tend to yield smaller prediction errors than RQ at more extreme quantile levels.
| τ | QAE | CRQ | RQ | SE |
|---|---|---|---|---|
| 0.8 | 154.163 | 153.073 | 152.396 | 0.277 |
| 0.825 | 146.163 | 145.598 | 145.504 | 0.268 |
| 0.85 | 137.028 | 136.758 | 137.071 | 0.259 |
| 0.875 | 126.138 | 125.949 | 126.819 | 0.252 |
| 0.9 | 112.774 | 112.842 | 113.823 | 0.238 |
6. Proofs
In this section, we prove Theorem 1 and Theorem 2, as well as auxiliary results needed in the proofs, including Lemma 1, Lemma 2, and Lemma 3. We use ∥ · ∥L2 for the L2-norm of a function and ∥ · ∥ for the Euclidean norm of a vector.
6.1. Proofs of Theorem 1 and Theorem 2
Proof of Theorem 1. The proof proceeds in three steps. In step 1, we approximate the estimated scores $\hat\xi_i$ by linear combinations of the ξi. In step 2, we obtain the asymptotic distribution of $\hat\theta_\tau$ at a single quantile level. In step 3, we extend the results in step 2 to multiple quantile levels.
Step 1 (Approximation of the estimated scores). Most of the existing literature has been focused on establishing error bounds for estimated eigenvalues and eigenfunctions; see for example [11, 12] and the discussion therein. The following lemma instead characterizes the accuracy in predicting the fPC scores.
Lemma 1. Under Assumptions A4, B1, B2 and C1, we have
| (10) |
In addition,
$\hat\xi_i = \xi_i + n^{-1/2} B\,\xi_i + O_p(n^{-1}), \qquad (11)$
where B is a (p0 + 1) × (p0 + 1) dimensional matrix with the bottom right p0 × p0 block matrix equal to B+ described next and the rest of the elements equal to zero. Here B+ = (bkk′) is a p0 × p0 random matrix such that bkk = 0 for k ∈ {1, …, p0} and if k ≠ k′.
The result in (11) indicates that the leading term of $\hat\xi_i - \xi_i$ is $n^{-1/2}B\xi_i$, which is a linear combination of ξi with a random weight matrix B that does not depend on i.
Step 2 (Quantile regression on estimated scores). We focus on a single quantile level τ in this step. For any $\delta \in \mathbb{R}^{p_0+1}$, let Zn(δ) denote the recentered quantile objective function obtained by evaluating the criterion in (5) at $\theta_\tau + n^{-1/2}\delta$ and subtracting its value at θτ, where $u_i = Y_i - \xi_i^{\mathrm T}\theta_\tau$. Then Zn(δ) is a convex function of δ that is minimized at $\sqrt{n}(\hat\theta_\tau - \theta_\tau)$. Therefore, the asymptotic distribution of $\sqrt{n}(\hat\theta_\tau - \theta_\tau)$ is determined by the limiting behavior of Zn(δ). Let ψτ(t) = τ − I(t < 0). According to Knight’s identity [23], we can decompose Zn(δ) into two parts, Zn(δ) = Z1n(δ) + Z2n(δ), where
| (12) |
In order to show (6), it is sufficient to prove that
$Z_n(\delta) \stackrel{d}{\longrightarrow} -\,\delta^{\mathrm T} W(\tau) + \tfrac{1}{2}\,\delta^{\mathrm T} D_1(\tau)\,\delta, \qquad (13)$
where W(τ) ~ N{0, τ(1 − τ)D0 + D1(τ)Σ(τ)D1(τ)}, since one can then apply the convexity lemma [34] to the quadratic form of δ in (13).
We next derive the limiting distributions of Z1n(δ) and Z2n(δ). For Z1n(δ), similarly to its definition in (12), we define based on the true scores ξi:
where . By a direct application of the central limit theorem (CLT), we obtain that the asymptotic distribution of is N(0, τ(1 – τ)δTD0δ). However, when the predictors are estimated with errors, the difference is non-negligible. Lemma 2 provides a representation of Z1n(δ) by explicitly formulating this difference.
Lemma 2. Under Assumptions A4, B1, B2 and C1,
where and , k ≥ 1.
Since the ξiψτ(ui) − D1(τ)di are i.i.d., Lemma 2 allows us to directly apply Lindeberg’s CLT to obtain the asymptotic distribution of Z1n(δ). Note that E{ξiψτ(ui)} = 0 and Var{ξiψτ(ui)} = τ(1 − τ)D0. In addition, Edi = 0 because ξik and ξir are uncorrelated and have mean 0 (when r ≠ k). Let the matrix Σ(τ) be the covariance matrix of di, whose first row and first column are all 0 and whose (k + 1, k′ + 1)th element (k, k′ = 1, …, p0) is given by $\theta_\tau^{\mathrm T} A_{k,k'}\,\theta_\tau$ for some (p0 + 1) × (p0 + 1) matrix Ak,k′. The first row and first column of Ak,k′ are all 0, and a simple calculation yields its bottom right block Ak,k′,+ = (σj, j′):
| (14) |
Let , and Σ0 be a (p0 + 1)² × (p0 + 1)² matrix whose (k + 1, k′ + 1)th block is Ak,k′ (k, k′ = 1, …, p0) and whose (k + 1, k′ + 1)th block is for k = 0 or k′ = 0. Then Σ(τ) can be rewritten as . Furthermore, we have
which leads to
Hence, we have $Z_{1n}(\delta) \stackrel{d}{\longrightarrow} -\,\delta^{\mathrm T}W(\tau)$, where W(τ) ~ N{0, τ(1 − τ)D0 + D1(τ)Σ(τ)D1(τ)}. Consequently, the following result for Z2n(δ) concludes the asymptotic distribution in (13).
Lemma 3. Under Assumptions A4, B1, B2 and C1, we have $Z_{2n}(\delta) = \tfrac{1}{2}\,\delta^{\mathrm T} D_1(\tau)\,\delta + o_p(1)$.
Step 3 (Asymptotic distributions across quantile levels). When considering various quantile levels, the same arguments apply via a convex optimization and the limiting distribution of the objective function. The asymptotic covariance in (7) is obtained from the covariance between the limiting variables at levels τℓ and τℓ′, following a calculation similar to that in (14). □
Proof of Theorem 2. We just need to show that . The (ℓ, ℓ′)th block of the matrix is , where 1 ≤ ℓ, ℓ′ ≤ L. Therefore, we have where is a (p0 + 1)L × (p0 + 1) matrix. Noting that for ℓ ∈ {1, …, L}, we have and thus . Therefore, when Rζ = 0, it follows that . This completes the proof. □
6.2. Proofs of lemmas
Proof of Lemma 1. The bound in (10) follows from standard bounds for the estimated eigenfunctions and covariance kernel in the FDA literature. According to Theorem 1 in [11], we have
where sk = minr≤k(λr − λr+1) and . Therefore,
which leads to . For any c > 0, invoking the bound [12, Lemma 3.3] leads to
Thus, for finite p0, we have ; in particular, there holds .
Next we prove the representation in (11). Let be the estimator of the kernel G based on the fully observed covariate Xi(·), and recall that is the estimate based on the discretized Wij with measurement error. Denote and . We use the notation to denote .
Since {ϕk : k ≥ 1} forms a basis of the L2 space on [0, 1], we have , where k ∈ {1, …, p0} and the generalized Fourier coefficients . Furthermore, we have the following expansion for akk′’s:
according to (2.6) and (2.7) in [11]. Therefore, for k ∈ {1, …, p0}, we have
A direct calculation gives that
for k, k′ = 1, …, p0 and k ≠ k′, where . Since , we have . The same approximation holds when using since is uniformly op(n−1/2) as shown by [48]. Consequently,
| (15) |
This approximation will not be affected if we use the reconstructed trajectories instead of the true curves Xi(·), because the difference is negligible uniformly over i (e.g., see Theorem 2 in [48] or Lemma 1 in [51]). Let B+ = (bkk′) be a p0 × p0 random matrix with bkk′ = 0 if k = k′ and with off-diagonal entries as obtained in the calculation above for k ≠ k′. Let B be the (p0 + 1) × (p0 + 1) zero matrix with its bottom right block replaced by B+; then the right-hand side of (15) becomes n−1/2Bξi + Op(n−1). Consequently, we have
This completes the proof by noting that all the stochastic bounds starting from Op(n−1) in akk do not depend on i. □
Proof of Lemma 2. We first decompose the difference between Z1n(δ) and into three parts S1, S2 and S3 as follows:
The proof proceeds in three steps: S2 = op(1) (Step i), S1 = op(1) (Step ii), and (Step iii). Step i and Step ii indicate that the first two terms S1 and S2 are negligible, and it is sufficient to show that and E|S1| = o(1) according to Chebyshev’s inequality. The third term S3 is challenging to analyze since the function of ψτ(·) is not differentiable. In Step iii, we approximate the term S3 mainly using the uniform approximation on ψτ(·).
Step i. First notice that and . Therefore, we have E(S2) = 0, and further
For i = i′,
Since are identically distributed for all i, we have . For i ≠ i′, we have by noting that
Therefore, .
Step ii. For S1, we first introduce the notation
For each i, the random variable Δi satisfies
(16)
(17)
The result in (16) follows by noting that ψτ(ui) has conditional mean 0 given ξi, while (17) holds because .
By Taylor’s theorem, for any a and b, we have
where |R(a, b)| ≤ C0. Therefore,
where . We also have the bound
Therefore, and consequently
Step iii. Define
for any vector t such that ∥t∥ ≤ C for some constant C. Then the uniform approximation result in [14] indicates that
On the other hand,
Therefore,
(18)
Note that and up to a negligible term Op(n−1). Then
where the term op(1) is obtained by the same arguments used in Step ii via conditional expectations and Taylor’s theorem. Substituting t = n−1/2Bθτ into (18) and noting that ∥n−1/2Bθτ∥ = Op(n−1/2), we obtain that Rn(n−1/2Bθτ) = −n1/2D1(τ)Bθτ + O(1) + Op(n1/4 log n), leading to
According to the definition of B in (11), it is easy to verify that , where and for k ≥ 1. Therefore, it follows that , which concludes Step iii. This completes the proof. □
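The D1(τ) term obtained in Step iii arises, broadly speaking, from a first-order expansion of the expected score: if the τ-th quantile of the error u is zero, then E{ψτ(u − δ)} ≈ −fu(0)δ for small δ, where fu is the density of u. The R lines below verify this generic fact in closed form for a normal error, which is purely an illustrative choice and not tied to the model assumptions above.

```r
tau   <- 0.3
q     <- qnorm(tau)                 # u = Z - q, Z ~ N(0, 1), has tau-th quantile 0
f0    <- dnorm(q)                   # density of u at 0
delta <- c(0.1, 0.05, 0.01)
exact <- tau - pnorm(delta + q)     # E{psi_tau(u - delta)} in closed form
cbind(delta, exact, linear = -f0 * delta, gap = exact + f0 * delta)
# The gap between the exact value and the linearization shrinks like delta^2.
```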
Proof of Lemma 3. Recall that , where
First, we have
Therefore, by Taylor’s theorem, we have
where Rni is the remainder, which satisfies . Consequently,
Therefore, the unconditional expectation of Z2ni(δ) is
leading to
The second term is negligible because
(19)
(20)
where the last step is due to the fact that .
We next show that . Note that for i ∈ {1, …, n}, and the ∥ξi∥²’s are i.i.d. with a finite second moment . For any ϵ > 0, we have
by the dominated convergence theorem. This implies that , since uniformly over i. Consequently, , i.e., the conditional variance converges to 0 in probability. Therefore, following the martingale argument in the proof of Theorem 2 of [34], we have Z2n − E(Z2n) = op(1). This completes the proof. □
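The dominated-convergence step above is a uniform-integrability statement: for i.i.d. variables with a finite second moment, the truncated second moment E{∥ξi∥² 1(∥ξi∥² > nϵ)} vanishes as n grows. The R lines below only illustrate this generic fact by Monte Carlo, with chi-square draws standing in for ∥ξi∥².

```r
set.seed(4)
eps <- 0.01
x2  <- rchisq(1e6, df = 3)   # stand-in for ||xi_i||^2, finite second moment
sapply(c(1e2, 1e3, 1e4, 1e5), function(n) mean(x2 * (x2 > n * eps)))
# The truncated second moment decreases toward 0 as n increases.
```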
Acknowledgments
We thank the Editor, the Associate Editor and two anonymous referees for constructive comments that helped to improve the paper. The authors would like to acknowledge the support of NSF DMS 2015569 (Li) and DMS 1454942 (Staicu) and NIH 5P01 CA142538-09 (Staicu).
CRediT authorship contribution statement
Meng Li: Conceptualization, Methodology, Software, Data analysis, Writing – original draft. Kehui Wang: Conceptualization, Methodology, Software, Data analysis, Writing – original draft. Arnab Maity: Conceptualization, Methodology, Writing – review & editing, Validation. Ana-Maria Staicu: Conceptualization, Methodology, Writing – review & editing, Validation.
References
- [1] Cao G, Wang S, Wang L, Estimation and inference for functional linear regression models with partially varying regression coefficients, Stat 9 (2020) e286.
- [2] Cardot H, Crambes C, Sarda P, Quantile regression when the covariates are functions, Nonparametric Statistics 17 (2005) 841–856.
- [3] Chen K, Müller H-G, Conditional quantile analysis when covariates are functions, with application to growth data, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2012) 67–89.
- [4] Crambes C, Gannoun A, Henchiri Y, Weak consistency of the Support Vector Machine Quantile Regression approach when covariates are functions, Statistics and Probability Letters 81 (2011) 1847–1858.
- [5] Crambes C, Gannoun A, Henchiri Y, Support vector machine quantile regression approach for functional data: Simulation and application studies, Journal of Multivariate Analysis 121 (2013) 50–68.
- [6] Fan J, Gijbels I, Local Polynomial Modelling and Its Applications, Monographs on Statistics and Applied Probability, Chapman & Hall, 1996.
- [7] Fanaee-T H, Gama J, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence 2 (2014) 113–127.
- [8] Ferraty F, Rabhi A, Vieu P, Conditional quantiles for dependent functional data with application to the climatic El Niño phenomenon, Sankhyā: The Indian Journal of Statistics 67 (2005) 378–398.
- [9] Ferraty F, Vieu P, Nonparametric Functional Data Analysis, Springer, New York, 2006.
- [10] Gertheiss J, Maity A, Staicu A-M, Variable selection in generalized functional linear models, Stat 2 (2013) 86–101.
- [11] Hall P, Hosseini-Nasab M, On properties of functional principal components analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (2006) 109–126.
- [12] Hall P, Hosseini-Nasab M, Theory for high-order bounds in functional principal components analysis, Mathematical Proceedings of the Cambridge Philosophical Society 146 (2009) 225–256.
- [13] Hall P, Müller H-G, Wang J-L, Properties of principal component methods for functional and longitudinal data analysis, The Annals of Statistics 34 (2006) 1493–1517.
- [14] He X, Shao Q-M, A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs, The Annals of Statistics 24 (1996) 2608–2630.
- [15] Hendricks W, Koenker R, Hierarchical spline models for conditional quantiles and the demand for electricity, Journal of the American Statistical Association 87 (1992) 58–68.
- [16] Horváth L, Kokoszka P, Reimherr M, Two sample inference in functional linear models, The Canadian Journal of Statistics 37 (2009) 571–591.
- [17] Huang L, Scheipl F, Goldsmith J, Gellar J, Harezlak J, McLean MW, Swihart B, Xiao L, Crainiceanu C, Reiss P, refund: Regression with Functional Data, 2015. R package version 0.1-13.
- [18] Huber PJ, The behavior of maximum likelihood estimates under nonstandard conditions, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I: Statistics, University of California Press, Berkeley, CA, USA, 1967, pp. 221–233.
- [19] Ivanescu AE, Staicu A-M, Scheipl F, Greven S, Penalized function-on-function regression, Computational Statistics 30 (2015) 539–568.
- [20] Jiang C-R, Wang J-L, Covariate adjusted functional principal components analysis for longitudinal data, The Annals of Statistics 38 (2010) 1194–1226.
- [21] Jiang L, Bondell HD, Wang H, Interquantile shrinkage and variable selection in quantile regression, Computational Statistics & Data Analysis 69 (2014) 208–219.
- [22] Kato K, Estimation in functional linear quantile regression, The Annals of Statistics 40 (2012) 3108–3136.
- [23] Knight K, Limiting distributions for L1 regression estimators under general conditions, The Annals of Statistics 26 (1998) 755–770.
- [24] Koenker R, A note on L-estimates for linear models, Statistics & Probability Letters 2 (1984) 323–325.
- [25] Koenker R, Quantile Regression, volume 38, Cambridge University Press, 2005.
- [26] Kong D, Staicu A-M, Maity A, Classical testing in functional linear models, Journal of Nonparametric Statistics 28 (2016) 813–838.
- [27] Larsen J, Policy institute: Bike-sharing programs hit the streets in over 500 cities worldwide, http://www.earth-policy.org/plan_b_updates/2013/update112, 2013.
- [28] Lee ER, Noh H, Park BU, Model selection via Bayesian information criterion for quantile regression models, Journal of the American Statistical Association 109 (2014) 216–229.
- [29] Li M, Staicu A-M, Bondell HD, Incorporating covariates in skewed functional data models, Biostatistics 16 (2015) 413–426.
- [30] Li Y, Liu Y, Zhu J, Quantile regression in reproducing kernel Hilbert spaces, Journal of the American Statistical Association 102 (2007) 255–268.
- [31] Li Y, Wang N, Carroll RJ, Generalized functional linear models with semiparametric single-index interactions, Journal of the American Statistical Association 105 (2010) 621–633.
- [32] Li Y, Wang N, Carroll RJ, Selecting the number of principal components in functional data, Journal of the American Statistical Association 108 (2013) 1284–1294.
- [33] Morris JS, Carroll RJ, Wavelet-based functional mixed models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (2006) 179–199.
- [34] Pollard D, Asymptotics for least absolute deviation regression estimators, Econometric Theory 7 (1991) 186–199.
- [35] Ramsay J, Silverman B, Functional Data Analysis, Springer Series in Statistics, Springer, 2005.
- [36] Redd A, A comment on the orthogonalization of B-spline basis functions and their derivatives, Statistics and Computing 22 (2012) 251–257.
- [37] Shi G, Du J, Sun Z, Zhang Z, Checking the adequacy of functional linear quantile regression model, Journal of Statistical Planning and Inference 210 (2021) 64–75.
- [38] Staicu A-M, Crainiceanu CM, Reich DS, Ruppert D, Modeling functional data with spatially heterogeneous shape characteristics, Biometrics 68 (2012) 331–343.
- [39] Staicu A-M, Lahiri SN, Carroll RJ, Significance tests for functional data with complex dependence structure, Journal of Statistical Planning and Inference 156 (2015) 1–13.
- [40] Su Y-R, Di C-Z, Hsu L, Hypothesis testing in functional linear models, Biometrics 73 (2017) 551–561.
- [41] Usset J, Staicu A-M, Maity A, Interaction models for functional regression, Computational Statistics & Data Analysis 94 (2016) 317–330.
- [42] Wang HJ, Stefanski LA, Zhu Z, Corrected-loss estimation for quantile regression with covariate measurement errors, Biometrika 99 (2012) 405–421.
- [43] Wang K, Wang HJ, Optimally combined estimation for tail quantile regression, Statistica Sinica 26 (2016) 295–311.
- [44] Wei Y, Carroll RJ, Quantile regression with measurement error, Journal of the American Statistical Association 104 (2009) 1129–1143.
- [45] Wu Y, Ma Y, Yin G, Smoothed and corrected score approach to censored quantile regression with measurement errors, Journal of the American Statistical Association, in press (2015).
- [46] Yao F, Müller H-G, Wang J-L, Functional data analysis for sparse longitudinal data, Journal of the American Statistical Association 100 (2005) 577–590.
- [47] Yao F, Sue-Chee S, Wang F, Regularized partially functional quantile regression, Journal of Multivariate Analysis 156 (2017) 39–56.
- [48] Zhang J-T, Chen J, Statistical inferences for functional data, The Annals of Statistics 35 (2007) 1052–1079.
- [49] Zhao Z, Xiao Z, Efficient regressions via optimally combining quantile information, Econometric Theory 30 (2014) 1272–1314.
- [50] Zhou L, Huang JZ, Carroll RJ, Joint modelling of paired sparse functional data using principal components, Biometrika 95 (2008) 601–619.
- [51] Zhu H, Yao F, Zhang HH, Structured functional additive regression in reproducing kernel Hilbert spaces, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2014) 581–603.
- [52] Zou H, Yuan M, Composite quantile regression and the oracle model selection theory, The Annals of Statistics 36 (2008) 1108–1126.
