Abstract
In this paper, we study statistical inference in functional quantile regression for scalar response and a functional covariate. Specifically, we consider a functional linear quantile regression model where the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. The objective is to test that the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. An adjusted Wald testing procedure is proposed for this hypothesis of interest, and its chi-square asymptotic null distribution is derived. The testing procedure is investigated numerically in simulations involving sparse and noisy functional covariates and in a capital bike share data application. The proposed approach is easy to implement and the R code is published online at https://github.com/xylimeng/fQR-testing.
Keywords: Composite quantile regression, Functional principal component analysis, Functional quantile regression, Measurement error, Wald test
2020 MSC: Primary 62G08, Secondary 62H15
1. Introduction
Advances in computation and technology have generated an explosion of data with functional characteristics. The need to analyze this type of data has triggered a rapid growth of the functional data analysis (FDA) field; see [9, 35] for two comprehensive treatments. Most research in functional data analysis has focused on mean regression (see, for example, [10, 19, 20, 41, 46]); only a few works accommodate higher-order moment effects [29, 38]. Quantile regression is appealing in many applications, as it allows us to describe the entire conditional distribution of the response at various quantile levels. For example, in our capital bike share data application, it is of interest to study how the bike rental behavior of casual users on the previous day affects the upper quantiles of total bike rentals on the current day.
Quantile regression models for scalar responses and functional covariates were introduced in [2]. Functional quantile regression (fQR) models extend the standard quantile regression framework to accommodate functional covariates: the effect of the covariate on a particular quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the quantile level. Cardot et al. [2] considered a smoothing splines-based approach to represent the functional covariates and derived its convergence rate; Kato [22] studied principal component analysis (PCA)-based estimation and established a sharp convergence rate. In [5] and [4], Crambes et al. discussed nonparametric quantile regression estimation and studied the theoretical properties of a support vector machine-based estimator, a method inspired by [30]. Yao et al. [47] considered a regularized partially functional quantile regression model by additionally incorporating high-dimensional scalar covariates. Shi et al. [37] developed a procedure to test the adequacy of fQR based on functional PCA. Unlike these regularization and basis expansion-based methods, Ferraty et al. [8] and Chen and Müller [3] estimated the conditional quantile function by inverting the corresponding conditional distribution function; they too studied consistency properties of the regression estimator. Nevertheless, there is hitherto no available work on statistical inference for the quantile regression estimator under fQR. Additionally, existing functional quantile regression research often assumes that the functional covariate is observed either completely on the domain or at very dense grids of points, typically with little or no error contamination. In this work, we are interested in formally assessing whether the effect of the true smooth signal of the covariate varies across several quantile levels of interest of the response, when the smooth signal is observed at finite grids and possibly perturbed with error, and a functional linear quantile regression model is assumed. This problem is important in its own right, yielding a more comprehensive description of the relationship between the covariate and the conditional distribution of the response. Furthermore, formally assessing such a hypothesis is critical when one wishes to improve the estimation accuracy of the conditional quantile of the response at some specified level. Specifically, suppose that for several quantile levels around the specified level of interest there is no evidence that the effect of the latent covariate on these quantile levels of the response differs. In that case, one can improve the accuracy in estimating the covariate effect on the response at the specified level of interest by borrowing information across these quantile levels. For example, in the case of standard quantile regression with vector predictors, there is a rich literature on so-called composite quantile regression for aggregating information across quantile levels [21, 24, 49, 52].
In this paper, we assume a linear fQR model that relates the τth quantile of the response to the covariate through the inner product between a bivariate regression coefficient function and the true covariate signal. When the true signal is measured at the same time points across the study, one naive way to test the null hypothesis that the effect of the true covariate signal is constant across several quantile levels of interest is to treat the discretely observed functional covariate as a high-dimensional vector covariate and apply standard testing procedures (the Wald test) from linear quantile regression for vector covariates [25]. As expected, such an approach results in inflated Type I error rates due to the high correlation between the repeated measurements corresponding to the same subject; the situation gets progressively worse when the covariate is contaminated with noise. Another alternative is to consider a single-number summary of the covariate, such as its average or median, and carry out the hypothesis test by employing standard testing methods in quantile regression. Our numerical investigation of this direction shows that while the Type I error rates are preserved well, the power is substantially affected.
We propose to represent the latent smooth covariate and the quantile regression parameter function using the same orthogonal basis system; this reduces the inner product part of the linear fQR model to an infinite sum of products of the basis coefficients of the smooth covariate and of the parameter function. There are various choices for the orthogonal basis: we consider the data-driven basis formed by the leading eigenfunctions of the covariance of the true covariate signal and use the percentage of variance explained criterion to determine a finite truncation for this basis. While using a finite basis system reduces the dimensionality of the problem, an important challenge is handling the variability of the basis coefficients of the smooth latent signal, called functional principal component (fPC) scores. We develop the asymptotic distributions of the quantile estimators based on the estimated fPC scores when the functional covariate is sampled at a fine grid of points (dense design). Finally, we introduce an adjusted Wald test statistic and develop its asymptotic null distribution. The introduced testing procedure shows excellent numerical results even when the functional covariate is sampled at few, irregularly spaced time points across the study (sparse design) and the measurements are contaminated by error.
The theoretical study of the distribution of the quantile estimator based on the estimated fPC scores has important differences from the standard linear quantile regression with vector covariates. First, the predictors, fPC scores, are unknown and require estimation, which in turn introduces uncertainty; by comparison the vector covariates are known in the standard quantile setting counterpart. We show that asymptotically the quantile estimators are still unbiased, but their variances are inflated. This implies that, in this reduced framework, a direct application of the Wald testing procedure for null hypotheses involving regression parameters is not appropriate. Second, dealing with estimated fPC scores in this situation is different from the measurement errors in predictors setting. For the latter, it is typically assumed that the measurement error and the true predictors are mutually independent or that the errors are independent across subjects [42, 44, 45]. However, in the functional data setting the resulting errors, due to the difference between the estimated fPC scores and the true scores, are dependent on the true predictors and are also dependent across subjects. As a result, the theoretical investigation requires more careful quantification in terms of the estimated scores and the use of quantile loss.
This article makes three main contributions. First, we establish the asymptotic distribution of the coefficient estimator for both a single quantile level and multiple quantile levels for densely sampled functional covariates. To the best of our knowledge, this is the first work that studies inference for the quantile estimators; previous research in functional quantile regression focused on consistency and minimax rates (see [3, 22]), and most literature on inference in FDA is limited to the context of functional linear regression [1, 16, 26, 40, 48]. Second, we propose an adjusted Wald test statistic to formally assess whether the quantile regression parameter is constant across specified quantile levels and derive its asymptotic null distribution. Third, we consider cases where the functional covariate is observed sparsely and contaminated with noise and illustrate through a detailed numerical investigation that the testing procedure continues to have excellent performance. Furthermore, we demonstrate the use of composite quantile regression and the corresponding advantage in terms of estimation and prediction accuracy using a capital bike rental data set. Composite quantile regression is well known to improve the efficiency of the quantile estimators at a single quantile level, which becomes especially useful for extreme quantiles [43]; nonetheless, a more formal investigation of functional composite quantile regression is beyond the scope of this article.
The rest of the paper is organized as follows. Section 2 introduces the statistical framework, describes the null and alternative hypotheses, discusses a simpler approximation of the testing problem, and presents the estimation approach. Section 3 develops the asymptotic normality of the proposed estimators, introduces the adjusted Wald test, and derives its asymptotic null distribution. Section 4 presents extensive simulation studies confirming the excellent performance of the proposed test procedure in various scenarios for both dense and sparse designs. Section 5 applies the proposed test to a bike rental data set and illustrates the improvement obtained by combining quantile regressions, relative to single-level quantile regression, once the proposed test supports doing so. Proofs of Theorem 1 and Theorem 2, as well as some additional useful results, are given in Section 6.
2. Methodology
2.1. Statistical framework
Suppose we observe data {Yi, (tij, Wij)} for j ∈ {1, …, mi} and i ∈ {1, …, n}, where Yi is a scalar response variable and Wij is the evaluation of a latent smooth process Xi(·), measured with noise, at the finite grid of points ti1, …, timi lying in a bounded closed interval. It is assumed that the observed functional covariate is perturbed by white noise, i.e., Wij = Xi(tij) + eij, where eij has mean 0 and variance σ². Furthermore, we assume that the true functional signals Xi(·) are square integrable and are independent and identically distributed across i. Our objective is to formally assess whether the smooth covariate signal Xi(·) has a constant effect at specified quantile levels of the response.
Let $Q_{Y_i}(\tau \mid X_i)$ denote the conditional τth quantile of the response Yi given the true covariate signal Xi(·), where τ ∈ (0, 1). We assume the following linear fQR model:
$Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \int_0^1 X_i^c(t)\,\beta(t, \tau)\,dt, \qquad (1)$
where β0(τ) is the quantile-level varying intercept and β(t, τ) is the bivariate regression coefficient function, the main object of interest. It is assumed that, for a fixed quantile level τ, β(t, τ) ∈ L2[0, 1] as a function of t. Here $X_i^c(t)$ is the de-meaned smooth covariate signal, defined as $X_i^c(t) = X_i(t) - E\{X_i(t)\}$. Model (1) is an extension of the standard linear quantile regression model [25] to functional covariates. It was first introduced by [2] and later considered by [3, 22]. For simplicity, in the following it is assumed that the smooth covariate signal has zero mean, i.e., EXi(t) = 0 for all t ∈ [0, 1].
Let {τ1, …, τL} be a set of L quantile levels of interest, where τ1 < ⋯ < τL. Motivated by the reasons mentioned in Section 1, our goal is to test the null hypothesis:
$H_0:\ \beta(\cdot, \tau_1) = \beta(\cdot, \tau_2) = \cdots = \beta(\cdot, \tau_L), \qquad (2)$
against the alternative hypothesis Ha : β(·, τℓ) ≠ β(·, τℓ′), for some ℓ ≠ ℓ′ ∈ {1, …, L}. This null hypothesis involves infinite dimensional objects, which is very different from the common null hypotheses considered in quantile regression.
One approach to simplify the null hypothesis is through a basis function expansion. Specifically, let {ϕk(·)}k≥1 be an orthogonal basis of L2[0, 1] such that $\int_0^1 \phi_k(t)\phi_{k'}(t)\,dt = 0$ if k ≠ k′ and 1 if k = k′. We represent the unknown parameter function β(·, τ) in this orthogonal basis as β(t, τ) = ∑k≥1 βk(τ)ϕk(t), where $\beta_k(\tau) = \int_0^1 \beta(t, \tau)\phi_k(t)\,dt$ are unknown parameter loadings that vary with the quantile level τ. It follows that the equality β(·, τℓ) = β(·, τℓ′) is equivalent to βk(τℓ) = βk(τℓ′) for all k ≥ 1. Thus, the null hypothesis (2) can be written as H0 : βk(τ1) = βk(τ2) = ⋯ = βk(τL) for k ≥ 1. Furthermore, we represent the smooth covariate using the same basis functions as Xi(t) = ∑k≥1 ξikϕk(t), where $\xi_{ik} = \int_0^1 X_i(t)\phi_k(t)\,dt$ are the smooth covariate loadings. Then, the linear fQR model (1) can be equivalently represented as $Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \sum_{k \ge 1} \beta_k(\tau)\,\xi_{ik}$. In practice the infinite summation is typically truncated at some finite level K. As a result the fQR model can be approximated by
$Q_{Y_i}(\tau \mid X_i) \approx \beta_0(\tau) + \sum_{k=1}^{K} \beta_k(\tau)\,\xi_{ik}, \qquad (3)$
and the null hypothesis to be tested can be approximated by a reduced version
$\beta_k(\tau_1) = \beta_k(\tau_2) = \cdots = \beta_k(\tau_L), \quad k \in \{1, \ldots, K\}. \qquad (4)$
Let θτ ≔ (β0(τ), β1(τ), …, βK(τ))T be the (K + 1)-dimensional parameter vector at level τ and $\zeta := (\theta_{\tau_1}^{\mathrm T}, \ldots, \theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ the full quantile regression parameter vector of dimension L(K + 1). Then the reduced null hypothesis (4) can be equivalently re-written as $R\zeta = 0$, where R = R1 ⊗ R2 for an (L − 1) × L contrast matrix R1 encoding the equalities across the L quantile levels and a K × (K + 1) selection matrix R2 that extracts the slope loadings.
Here 0K denotes the K-dimensional vector of zeros and IK is the K × K dimensional identity matrix.
If the loadings ξik’s were known, then model (3) would be exactly the conventional quantile regression model. In that case, a standard Wald testing procedure for $R\zeta = 0$ is typically based on a quadratic form $(R\hat\zeta)^{\mathrm T}\{R\,\widehat{\mathrm{Cov}}(\hat\zeta)\,R^{\mathrm T}\}^{-1}(R\hat\zeta)$, where $\hat\zeta$ is the quantile regression estimator of ζ and $\widehat{\mathrm{Cov}}(\hat\zeta)$ is a consistent estimator of the covariance of $\hat\zeta$ conditional on the true loadings ξik’s; see [25, Chapter 3] for a review of existing methods. However, in practice the loadings of the smooth covariate signal ξik are unknown, and a valid approach has to account for this uncertainty.
Depending on the choice of the orthogonal basis, the approaches used to select the finite truncation K and to develop the theoretical properties of the quantile regression estimators differ. Several choices have been commonly used in the functional data analysis literature: Fourier basis functions [39], wavelet bases [33], or orthogonal B-splines [36, 50]. One important aspect to keep in mind when selecting the basis functions is how to handle the finite truncation K. In this paper we consider the orthogonal basis given by the eigenfunctions of the covariance of the smooth covariate signal Xi(·). Let G(s, t) ≔ Cov{Xi(s), Xi(t)} be the covariance of Xi(·); Mercer’s theorem gives the spectral decomposition $G(s, t) = \sum_{k \ge 1} \lambda_k \phi_k(s)\phi_k(t)$, where {ϕk(·), λk}k are the pairs of eigenfunctions and corresponding eigenvalues. The eigenvalues λk are nonincreasing and nonnegative, and the eigenfunctions ϕk(·) are mutually orthogonal functions in L2[0, 1]. Using the Karhunen–Loève expansion, the zero-mean smooth covariate Xi(·) can be represented as $X_i(t) = \sum_{k \ge 1} \xi_{ik}\phi_k(t)$, where $\xi_{ik} = \int_0^1 X_i(t)\phi_k(t)\,dt$ are commonly known as the functional principal component (fPC) scores of Xi(·); they satisfy E(ξik) = 0 and Var(ξik) = λk and are uncorrelated over k. A popular way to select the finite truncation, or equivalently the number of leading eigenfunctions, is the percentage of variance explained; alternative options for selecting the finite truncation K are considered in [32] and [28].
2.2. Estimation procedure
We discuss estimation for the case when the functional covariate is observed on a fine grid of points, a setting known in the literature as the dense sampling design. Nevertheless, our procedure can be successfully applied to the case when the covariate is observed on an irregular sampling design with few points (sparse sampling design) and contaminated with noise, as illustrated later in the numerical investigation. When the sampling design is dense, and thus mi is very large for each i, a common approach in functional data analysis is “smoothing first, then estimation” [48]. Specifically, we first reconstruct each trajectory $\hat X_i(\cdot)$ from the data {(tij, Wij)}j using penalized regression splines, although any other appropriate smoothing method, such as local polynomial kernel smoothing [6], can also be used. Let $\bar X(\cdot) = n^{-1}\sum_{i=1}^{n}\hat X_i(\cdot)$ be the sample mean of these reconstructed trajectories and denote by $\hat X_i^c(\cdot) = \hat X_i(\cdot) - \bar X(\cdot)$ the centered covariates. Furthermore, let S(·, ·) be the sample covariance of the $\hat X_i^c(\cdot)$’s; the spectral decomposition of S(·, ·) yields the pairs of estimated eigenfunctions and eigenvalues $\{\hat\phi_k(\cdot), \hat\lambda_k\}_k$. The theoretical properties of the estimated eigenfunctions have been well studied in the literature; see [11, 13, 48] among others. As the eigenfunctions ϕk(·) and $\hat\phi_k(\cdot)$ are both defined only up to a sign change, we assume throughout the paper that the sign of $\hat\phi_k(\cdot)$ is chosen such that $\int_0^1 \hat\phi_k(t)\phi_k(t)\,dt \ge 0$. Finally the fPC scores ξik are estimated as $\hat\xi_{ik} = \int_0^1 \hat X_i^c(t)\,\hat\phi_k(t)\,dt$; in practice numerical integration is used to approximate the integral; see also [31].
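To make the “smoothing first, then estimation” pipeline concrete, the following R sketch reconstructs each curve by a penalized regression spline, forms the sample covariance on the grid, and obtains eigenfunctions, eigenvalues, fPC scores, and a PVE-based truncation. It is a minimal illustration, not the authors’ published implementation: the helper name `estimate_fpc_scores`, the use of `mgcv::gam` with REML smoothing, and the simple Riemann-sum numerical integration are our own illustrative choices.

```r
# Minimal sketch of "smoothing first, then estimation" for densely observed curves.
library(mgcv)

estimate_fpc_scores <- function(W, tgrid, pve = 0.95) {
  n <- nrow(W)
  # 1. Reconstruct each trajectory by penalized regression splines (REML smoothing).
  Xhat <- t(apply(W, 1, function(w) {
    fit <- gam(w ~ s(tgrid), method = "REML")
    as.numeric(predict(fit, newdata = data.frame(tgrid = tgrid)))
  }))
  # 2. Center the reconstructed curves and form the sample covariance on the grid.
  Xc <- sweep(Xhat, 2, colMeans(Xhat))
  S  <- crossprod(Xc) / n
  # 3. Spectral decomposition; rescale so eigenfunctions have unit L2 norm.
  dt  <- mean(diff(tgrid))
  eig <- eigen(S * dt, symmetric = TRUE)
  lambda <- pmax(eig$values, 0)
  K   <- which(cumsum(lambda) / sum(lambda) >= pve)[1]   # PVE truncation
  phi <- eig$vectors[, 1:K, drop = FALSE] / sqrt(dt)
  # 4. Estimated fPC scores by numerical integration (Riemann sum).
  scores <- Xc %*% phi * dt
  list(scores = scores, phi = phi, lambda = lambda[1:K], K = K)
}

## toy usage on simulated noisy curves
set.seed(1)
tgrid <- seq(0, 1, length.out = 100)
n <- 50
X <- outer(rnorm(n), sin(2 * pi * tgrid)) + outer(rnorm(n, sd = 0.5), cos(2 * pi * tgrid))
W <- X + matrix(rnorm(n * length(tgrid), sd = 0.3), n)
fpc <- estimate_fpc_scores(W, tgrid, pve = 0.95)
fpc$K
```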
Using the estimated fPC scores $\hat\xi_{ik}$, the quantile regression parameter of the approximated linear fQR model, θτ, is estimated by
$\hat\theta_\tau = \underset{(b_0, b_1, \ldots, b_K)^{\mathrm T}}{\arg\min}\ \sum_{i=1}^{n} \rho_\tau\Big(Y_i - b_0 - \sum_{k=1}^{K} b_k\,\hat\xi_{ik}\Big), \qquad (5)$
where ρτ(x) = x{τ − I(x < 0)} is the quantile loss function and I(x < 0) is the indicator function that equals 1 if x < 0 and 0 otherwise. Although throughout this article we focus on a homogeneous truncation level K to ease presentation, the proposed method easily generalizes to the case in which K varies with τ. We next move on to studying the theoretical properties of the quantile regression estimator in (5).
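Before turning to the theory, note that once the estimated scores are available, the minimization in (5) is ordinary quantile regression with the scores as covariates. A minimal sketch using quantreg::rq follows; the response generated below is only a placeholder so that the example runs end to end, and it reuses the `fpc` object from the earlier sketch.

```r
# Quantile regression of the response on the estimated fPC scores, i.e., the
# check-loss minimization in (5), via quantreg::rq.
library(quantreg)

scores <- fpc$scores                              # from the previous sketch
y <- as.numeric(scores %*% rnorm(ncol(scores)) + rnorm(nrow(scores)))  # placeholder
taus <- c(0.25, 0.50, 0.75)
fits <- lapply(taus, function(tau) rq(y ~ scores, tau = tau))
theta_hat <- sapply(fits, coef)                   # (K + 1) x L matrix of estimates
colnames(theta_hat) <- paste0("tau=", taus)
round(theta_hat, 3)
```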
3. Theoretical properties
3.1. Assumptions
Let Fi(y) = P(Yi < y|Xi(·)), and fi(·) be the corresponding density function. We make the following assumptions:
- A1. {Yi, Xi(·), ei(·)}, i ∈ {1, …, n}, are independent and identically distributed (i.i.d.) as {Y, X(·), e(·)}, and X(·) and e(·) are independent, where E{e(t)} = 0 and Cov{e(t), e(t′)} = σ²I(t = t′) for any t, t′;
- A2. The conditional distribution Fi(·) is twice continuously differentiable and the corresponding density function fi(·) is uniformly bounded away from 0 and ∞ at the conditional quantiles $Q_{Y_i}(\tau_\ell \mid X_i)$, ℓ ∈ {1, …, L};
- A3. The functional covariates X(·) satisfy E{X(t1)X(t2)X(t3)X(t4)} < ∞ uniformly for (t1, t2, t3, t4) ∈ [0, 1]⁴;
- A4. There exists a finite number p0 such that λ1 > λ2 > ⋯ > λp0 > 0 and λk = 0 if k > p0.
A2 and the i.i.d. assumption in A1 are standard in quantile regression with vector covariates; see [25, Ch. 4]. A1 assumes that the functional covariates Xi(·) are observed with independent white noise ei(·), making the model more realistic than the error-free setting of [22]. Assumption A3 holds for Gaussian processes and is common in the FDA literature; for example, see [13] and the discussion therein.
Finally, A4 requires that the functional covariate have a finite number of non-zero eigenvalues, making the approximate model (3) exact with K = p0. This strong assumption has been employed previously in the literature [31, 32]. In numerical studies, we found that A4 is not needed for the testing procedure to show excellent performance in terms of size and power; see, for example, the simulation study in Section 4.4 under the more general model (1) when p0 is divergent. This seems to indicate that A4 is a matter of theoretical convenience. One possible way to relax this assumption is to replace it with a condition on the number of principal components that are relevant in describing the dependence between the functional covariate and the response. Another possibility is to remove it entirely and show that the truncated functional quantile regression approximates the original model with negligible error. Nonetheless, our attempts to prove the main results by relaxing A4 in these directions have not been productive, owing partly to the complicated interplay between the quantile loss function and infinite-dimensional functional data, and partly to our focus on hypothesis testing rather than estimation. Specifically, A4 is critical to ensure a root-n rate for the estimated coefficient functions formulated in Theorem 1 and subsequently to derive the test’s null distribution. As noted in the preceding section, even under A4, inference for fQR based on the estimated fPC scores differs from standard multivariate quantile regression with vector covariates in the key respect that estimation of the fPC scores induces a specific type of measurement error. Unlike the existing measurement-error-in-covariates literature, which relies on certain independence assumptions [42, 44, 45], the measurement errors in the estimated fPC scores are dependent on the true predictors and are also dependent across subjects. This requires a more careful quantification in terms of the estimated scores and the use of the quantile loss. In this article, we focus on addressing the challenge posed by measurement error in quantile regression with the intricate dependence induced by functional covariates, and leave developments to relax A4 to future research.
The following assumptions are commonly used when describing a dense sampling design [31, 48]. For convenient mathematical derivations, we assume that there are the same number of observations per subject, i.e., mi = m for all i.
- B1. The time points tij are drawn independently from a density g(·), for i ∈ {1, …, n} and j ∈ {1, …, m}, where g(·) has bounded support [0, 1] and is continuously differentiable;
- B2. $m \ge C n^{c_m}$, where cm > 5/4 and C is some constant.
For our theoretical development, we require the following condition for the kernel bandwidth hX that is used in smoothing the functional covariates.
- C1.
3.2. Asymptotic distribution
The following theorem gives the asymptotic distribution of the quantile estimator. Kato [22] gave the minimax rate of the coefficient function estimation when there is no measurement error on the discrete functional covariates. The author assumed that the number of eigenvalues is infinite rather than finite as in our Assumption A4. Our established root-n rate crucially depends on A4, which facilitates downstream inference. Should A4 be relaxed, one would need to properly scale the estimator using a slower rate and derive the asymptotic distribution, both for $\hat\theta_\tau$ and for test statistics constructed from it. We denote by D0 the diagonal matrix whose diagonal entries are 1, λ1, …, λp0 and which is positive definite; that is, $D_0 = E(\xi_i\xi_i^{\mathrm T})$, where $\xi_i = (1, \xi_{i1}, \ldots, \xi_{ip_0})^{\mathrm T}$. Similarly, we denote $D_1(\tau) = E\big\{f_i\big(Q_{Y_i}(\tau \mid X_i)\big)\,\xi_i\xi_i^{\mathrm T}\big\}$. When K = p0, the hypothesis in (4) is equivalent to H0 in (2), and the truncated model in (3) incurs no approximation error, as the residual ∑k>K βk(τ)ξik degenerates to zero owing to its zero variance.
Theorem 1. Denote by $\hat\theta_\tau$ the quantile regression estimator defined by (5) with K = p0, where τ ∈ (0, 1). Under Conditions A1–A4, B1–B2, and C1, we have
$\sqrt{n}\,\big(\hat\theta_\tau - \theta_\tau\big) \stackrel{d}{\longrightarrow} N\Big(0,\ \tau(1-\tau)\,D_1(\tau)^{-1} D_0\, D_1(\tau)^{-1} + \Theta_\tau\,\Sigma_0\,\Theta_\tau\Big), \qquad (6)$
where $\Theta_\tau$ is a matrix that depends on the true parameter θτ, and the matrix Σ0, defined in Section 6, does not depend on τ. Moreover, $\hat\zeta = (\hat\theta_{\tau_1}^{\mathrm T}, \ldots, \hat\theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ is asymptotically multivariate normal centered at ζ, and for 1 ≤ ℓ ≠ ℓ′ ≤ L the asymptotic covariance matrix of $\sqrt{n}(\hat\theta_{\tau_\ell} - \theta_{\tau_\ell})$ and $\sqrt{n}(\hat\theta_{\tau_{\ell'}} - \theta_{\tau_{\ell'}})$ is given by
$\big\{\min(\tau_\ell, \tau_{\ell'}) - \tau_\ell\tau_{\ell'}\big\}\, D_1(\tau_\ell)^{-1} D_0\, D_1(\tau_{\ell'})^{-1} + \Theta_{\tau_\ell}\,\Sigma_0\,\Theta_{\tau_{\ell'}}. \qquad (7)$
Remark that the asymptotic covariances in both (6) and (7) contain two components: a Huber [18] sandwich term that is typical in quantile regression theory and a “variance inflation” term. Specifically, if the true scores ξi were observed, then the asymptotic variance of $\hat\theta_\tau$ would be $\tau(1-\tau)\,D_1(\tau)^{-1}D_0D_1(\tau)^{-1}$, and the asymptotic covariance matrix of $\hat\theta_{\tau_\ell}$ and $\hat\theta_{\tau_{\ell'}}$ would be $\{\min(\tau_\ell, \tau_{\ell'}) - \tau_\ell\tau_{\ell'}\}\,D_1(\tau_\ell)^{-1}D_0D_1(\tau_{\ell'})^{-1}$; see [25, 34]. The variance inflation terms, $\Theta_\tau\Sigma_0\Theta_\tau$ in (6) and $\Theta_{\tau_\ell}\Sigma_0\Theta_{\tau_{\ell'}}$ in (7), quantify the effect of the uncertainty in estimating the fPC scores on the quantile regression estimators. Thus, when the covariates are functional data, the asymptotic distribution of $\hat\theta_\tau$ is unbiased but the variance is inflated, where the variance inflation terms depend on the true parameter value θτ.
The proof of Theorem 1 is detailed in Section 6. The reasoning follows two main steps: 1) approximate the estimated fPC scores $\hat\xi_i$ by linear combinations of random vectors of the true fPC scores ξi; and 2) show that the approximation error in the predictors is negligible for the quantile loss function. Step 1 crucially relies on the dense design assumption B2. This allows us to employ various bounds on the estimated eigenfunctions and on the difference between the estimated and true covariance functions, which in turn enables us to derive a fine-grained characterization of the estimated scores (Lemma 1); see the supplementary materials for more detail.
3.3. Adjusted Wald test
Using the asymptotic properties of the quantile regression estimators, we are now ready to develop a Wald-type testing procedure for assessing the general null hypothesis (2), or its finite reduced version (4) represented in vector form by $R\zeta = 0$. Recall that ζ denotes the full quantile regression parameter and $\hat\zeta = (\hat\theta_{\tau_1}^{\mathrm T}, \ldots, \hat\theta_{\tau_L}^{\mathrm T})^{\mathrm T}$ is its estimator.
We define a modified version of the Wald test, called the adjusted Wald test, by ignoring the variance inflation terms in the above asymptotic covariances. Let $\Sigma(\tau_\ell, \tau_{\ell'}) := \sigma(\tau_\ell, \tau_{\ell'})\, D_1(\tau_\ell)^{-1} D_0\, D_1(\tau_{\ell'})^{-1}$, with σ(τℓ, τℓ′) set to τℓ(1 − τℓ) if ℓ = ℓ′, and to {min(τℓ, τℓ′) − τℓτℓ′} otherwise. Then the asymptotic covariance matrix of $\sqrt{n}(\hat\zeta - \zeta)$ without the inflation terms, denoted $\Sigma^a$, in which the superscript a indicates the adjustment by ignoring the inflation terms, is
$\Sigma^a = \big[\,\Sigma(\tau_\ell, \tau_{\ell'})\,\big]_{\ell, \ell' = 1}^{L}, \qquad (8)$ that is, the L(K + 1) × L(K + 1) block matrix whose (ℓ, ℓ′)th block is Σ(τℓ, τℓ′).
Let $\hat\Sigma^a$ be a consistent estimator of $\Sigma^a$, constructed similarly to (8) but with a consistent estimator of Σ(τℓ, τℓ′) in each block. The adjusted Wald test is given by
$T_n = n\,\big(R\hat\zeta\big)^{\mathrm T}\big(R\,\hat\Sigma^a R^{\mathrm T}\big)^{-1}\big(R\hat\zeta\big). \qquad (9)$
This test is not a proper Wald test, as the covariance matrix used is not the valid covariance of $\hat\zeta$. The following result studies the asymptotic null distribution of Tn assuming K = p0.
Theorem 2. Assume the regularity conditions A1–A4, B1–B2 and C1 hold. If the null hypothesis is true, Rζ = 0, then the asymptotic distribution of Tn is $\chi^2_{(L-1)p_0}$, the chi-square distribution with (L − 1)p0 degrees of freedom.
The proof of this result relies on the observation that if Σ denotes the proper asymptotic covariance of $\sqrt{n}(\hat\zeta - \zeta)$ described in Theorem 1, then $R\Sigma R^{\mathrm T} = R\Sigma^a R^{\mathrm T}$ under the null. Intuitively, this is because the inflation terms in (6) and (7) possess a sandwich structure with a constant matrix enclosed by Θτ, which is zeroed out when left-multiplied by R under the null hypothesis Rζ = 0. Thus, although the estimation of the fPC scores inflates the covariance of the regression estimator, its effect on testing the null hypothesis (2) is negligible. Nevertheless, if one is interested in testing a different type of null hypothesis about ζ, such as one involving nonlinear functionals, then this variance inflation term has to be taken into account for a proper testing procedure.
We construct the estimators $\hat\Sigma(\tau_\ell, \tau_{\ell'})$, 1 ≤ ℓ, ℓ′ ≤ L, by plugging in sample-moment estimators of D0 and D1(τ) based on the estimated scores. The consistency of these estimators can be proved by law of large numbers-based arguments together with Lemma 1, which quantifies the closeness between $\hat\xi_i$ and ξi. For the estimation of the conditional densities appearing in D1(τ), we use the difference quotient method proposed by [15] and substitute the estimates $\hat\theta_\tau$ and $\hat\xi_i$. Theorem 2 implies that, for testing the null hypothesis of equal functional covariate effect across various quantile levels, the common Wald test based on the estimated fPC scores provides a valid testing procedure. The adjusted Wald test, which disregards the variance component due to the estimation uncertainty of the fPC scores, has a chi-square asymptotic null distribution.
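Once $\hat\zeta$ and $\hat\Sigma^a$ are available, the statistic in (9) is a simple quadratic form. The R sketch below computes it; the consecutive-difference contrast $R_1$, the intercept-dropping selector $R_2$, and the convention that `Sigma_a_hat` estimates the asymptotic covariance of $\sqrt{n}(\hat\zeta - \zeta)$ are assumptions made for illustration, since the paper’s exact displays for $R_1$ and $R_2$ are not reproduced above.

```r
# Adjusted Wald statistic (9), as a minimal sketch under the stated assumptions.
adjusted_wald <- function(zeta_hat, Sigma_a_hat, n, K, L) {
  R1 <- cbind(diag(L - 1), 0) - cbind(0, diag(L - 1))   # (L-1) x L consecutive contrasts
  R2 <- cbind(rep(0, K), diag(K))                       # K x (K+1): drops the intercept
  R  <- kronecker(R1, R2)                               # R = R1 (x) R2
  Rz <- R %*% zeta_hat
  Tn <- n * drop(t(Rz) %*% solve(R %*% Sigma_a_hat %*% t(R)) %*% Rz)
  df <- nrow(R)                                         # (L - 1) * K constraints
  list(statistic = Tn, df = df,
       p.value = pchisq(Tn, df = df, lower.tail = FALSE))
}
```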
If the number of principal components p0 is replaced by its consistent estimator Kn, then the null distribution of the test statistic Tn is approximately $\chi^2_{(L-1)K_n}$ for large n. In other words, the difference between the respective cumulative distribution functions goes to zero pointwise; this implies that the critical value of Tn is asymptotically the same as that of a $\chi^2_{(L-1)p_0}$ distribution. The functional data analysis literature provides a rich menu of possibilities for selecting p0, such as the percentage of variance explained (PVE) criterion and a Bayesian information criterion (BIC) proposed by [32]. The BIC method is proved to be consistent for both sparse and dense functional data. The PVE criterion is defined as
$\hat K = \min\Big\{K : \sum_{k=1}^{K}\hat\lambda_k \Big/ \sum_{k=1}^{q}\hat\lambda_k \ge \mathrm{PVE}\Big\}$, where q is the number of estimated eigenvalues and PVE is some user-defined threshold that approaches one. The widely used PVE approach also leads to a consistent estimator of p0 provided that the number of estimated eigenvalues is greater than p0 and the eigenvalues are estimated consistently, which holds in many applications and is also suggested by our extensive simulation studies. We use the PVE method in the remaining sections and have found that it leads to accurate estimates of p0 in finite samples.
We would like to point out that the asymptotic power of the adjusted Wald test is obtainable using a non-central chi-square distribution. However, the expression is complicated unless stronger assumptions are imposed on the Xi(·), since the equality $R\Sigma R^{\mathrm T} = R\Sigma^a R^{\mathrm T}$ does not generally hold under the alternative, and thus a Wald-type test then requires an estimate of the matrix Σ0.
4. Simulation
4.1. Settings
The simulated data are of the form {Yi, (tij, Wij), j ∈ {1, …, mi}} for i ∈ {1, …, n}, where Yi is the scalar response and Wij = Xi(tij) + eij is the functional covariate contaminated with measurement error eij, tij ∈ [0, 1], and Xi(·) is the true functional covariate. We generate the data from the following heteroscedastic model: $Y_i = \int_0^1 t\,X_i(t)\,dt + \big\{1 + \gamma \int_0^1 t^2 X_i(t)\,dt\big\}\,\epsilon_i$ with ϵi ~ N(0, 1). This leads to a quantile regression model of the form (1) with $\beta_0(\tau) = \Phi^{-1}(\tau)$ and $\beta(t, \tau) = t + \gamma t^2\,\Phi^{-1}(\tau)$, where Φ is the standard normal distribution function. Note that the functional coefficient β(t, τ) is nonlinear in t when γ ≠ 0. Here the scalar γ controls the heteroscedasticity and determines how the coefficient function varies across τ. Specifically, if γ = 0 then the effect of Xi(·) is constant across different quantile levels of Yi|Xi(·), while if γ ≠ 0 then the effect of Xi(·) varies across quantile levels of Yi|Xi(·).
The true functional covariate Xi(·) is generated from a Gaussian process with zero mean and covariance function cov{Xi(s), Xi(t)} = ∑k≥1 λkϕk(s)ϕk(t), where $\lambda_k = (1/2)^{k-1}$ for k ∈ {1, 2, 3} and λk = 0 for k ≥ 4, and {ϕk(·)}k are orthonormal Legendre polynomials on [0, 1]. The measurement errors are assumed to be eij ~ N(0, σ²). Fig. 1 plots simulated data when n = 200, γ = 1, and σ = 1.
Fig. 1: Simulated data when n = 200 and γ = 1. The left panel plots the functional covariates, with two randomly selected curves highlighted in blue and red; the right panel shows the histogram of the response.
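For readers who wish to reproduce a similar setting, the following R sketch generates data under the location-scale form reconstructed above. It is an illustration, not the authors’ simulation code: the explicit shifted Legendre polynomial formulas are assumed (the paper’s exact expressions are not reproduced in the text), and integrals are approximated by Riemann sums.

```r
# Sketch of the Section 4.1 simulation design, under the stated assumptions.
simulate_fqr_data <- function(n = 200, m = 100, gamma = 1, sigma = 1) {
  tgrid <- seq(0, 1, length.out = m)
  # assumed orthonormal shifted Legendre polynomials on [0, 1]
  phi <- cbind(rep(1, m),
               sqrt(3) * (2 * tgrid - 1),
               sqrt(5) * (6 * tgrid^2 - 6 * tgrid + 1))
  lambda <- (1 / 2)^(0:2)
  xi <- sapply(lambda, function(l) rnorm(n, sd = sqrt(l)))   # true fPC scores
  X  <- xi %*% t(phi)                                        # true smooth curves
  dt <- mean(diff(tgrid))
  int1 <- (X %*% tgrid) * dt               # integral of t * X_i(t)
  int2 <- (X %*% tgrid^2) * dt             # integral of t^2 * X_i(t)
  Y  <- drop(int1 + (1 + gamma * int2) * rnorm(n))
  W  <- X + matrix(rnorm(n * m, sd = sigma), n, m)           # noisy observations
  list(Y = Y, W = W, X = X, tgrid = tgrid)
}
dat <- simulate_fqr_data(n = 200, gamma = 1, sigma = 1)
```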
The objective is to test the null hypothesis H0 : β(·, τℓ) = β(·, τℓ′) for all pairs of quantile levels τℓ, τℓ′ in a given set, that is, that the effect of the true functional covariate on the conditional distribution of the response is the same at all quantile levels in that set. When γ = 0, the coefficient function β(·, τ) does not depend on τ, which means that the null hypothesis is true; when γ ≠ 0, β(·, τ) varies with τ and thus the null hypothesis is false. We consider two sets of quantile levels: one set of one-sided quantile levels and one set of two-sided quantile levels.
We implement the proposed adjusted Wald test with the number of fPCs selected via the PVE criterion with PVE = 95%. We use the R package refund [17] to estimate the fPC scores, where each individual trajectory is reconstructed using penalized regression splines via the function gam and the smoothing parameter is selected by restricted maximum likelihood. We investigate the performance of the proposed test for low and high levels of measurement error in the functional covariate (σ = 0.05 and σ = 1, respectively) and for sample sizes n ranging from 100 to 5000. For the functional covariates, we consider a dense design in Section 4.2, a sparse design in Section 4.3, and a setting where p0 diverges in Section 4.4.
4.2. Dense design
We first consider a dense design for the functional covariates: the grid of points for each i is an equispaced grid of mi = 100 time points in [0, 1]. We are not aware of any existing procedure for testing the null hypothesis of constant effect at various quantile levels when the covariate is functional; however, we can exploit this particular setting, pretend the covariates are vectors, and thus use or directly extend existing testing procedures from quantile regression. In particular, we consider three alternative approaches: (1) treat the observed functional covariate as a vector and use the common Wald test for vector covariates in quantile regression (NaiveQR); (2) summarize the observed functional covariate by a single-number summary and apply the Wald test (SSQR); and (3) treat the observed functional covariate as a vector, reduce the dimensionality using principal component analysis, and then apply the Wald test to the vector of principal component scores (pcaQR). For the pcaQR approach, the number of principal components is selected via PVE at the level PVE = 95%. The Wald test for vector covariates used in these three approaches is described in [25, Chapter 3.2.3].
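A minimal sketch of the SSQR comparator follows, using the curve average as the single-number summary (the paper mentions average or median as options). It assumes that quantreg’s anova method applied to several rq fits at different quantile levels carries out the Wald-type test of equal slope coefficients, which is its documented default behavior; it also reuses `simulate_fqr_data` from the sketch above.

```r
# SSQR comparator sketch under a null setting (gamma = 0), as an illustration only.
library(quantreg)
dat0 <- simulate_fqr_data(n = 500, gamma = 0, sigma = 1)
xbar <- rowMeans(dat0$W)                   # single-number summary of each curve
taus <- c(0.5, 0.6, 0.7)                   # illustrative quantile level set
fits <- lapply(taus, function(tau) rq(dat0$Y ~ xbar, tau = tau))
do.call(anova, fits)                       # assumed: tests equality of slopes across taus
```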
Table 1 summarizes the empirical Type I error rates of the adjusted Wald test when testing H0 at the one-sided as well as the two-sided set of quantile levels, when the functional covariate is observed with large (σ = 1) measurement error. The results are presented for three significance levels, α = 0.01, α = 0.05, and α = 0.10; they indicate that, irrespective of the quantile level set or the magnitude of the measurement error, the Type I error rates are slightly inflated for moderate sample sizes. Nevertheless, the empirical Type I error rates converge to the nominal level. The empirical Type I error rates for the alternative approaches are presented in Table 2. As expected, the NaiveQR approach has very poor performance. The NaiveQR approach performs hypothesis testing with highly correlated covariates, which leads to numerical instability due to near-singularity of the design matrix. Therefore NaiveQR produces many missing values (reported as “–”) in the table and yields inflated empirical Type I error rates at every significance level. Results for σ = 0.05 are similar and omitted here.
Table 1:
Type I error of the adjusted Wald-type test at significance level α ∈ {0.01, 0.05, 0.10} under the dense design. We test H0 at the two sets of quantile levels. Results are based on 5000 simulations.
| Scenario | n | 0.01 | 0.05 | 0.10 | Scenario | n | 0.01 | 0.05 | 0.10 |
|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.021 | 0.060 | 0.104 | 100 | 0.030 | 0.076 | 0.123 | ||
| σ = 1 | 500 | 0.014 | 0.057 | 0.107 | σ = 1 | 500 | 0.015 | 0.062 | 0.116 |
| 1000 | 0.017 | 0.052 | 0.106 | 1000 | 0.015 | 0.059 | 0.112 | ||
| 2000 | 0.011 | 0.051 | 0.101 | 2000 | 0.010 | 0.053 | 0.103 | ||
| 5000 | 0.010 | 0.054 | 0.105 | 5000 | 0.012 | 0.056 | 0.103 |
Table 2:
Type I error of the alternative approaches at significance level α ∈ {0.01, 0.05, 0.10} under the dense design. We test H0 at the two sets of quantile levels. Results are based on 5000 simulations. When a method returns an error (due to singularity of the design matrix) in more than 20% of the replications, we report the result as “–”.
| NaiveQR | SSQR | pcaQR | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Scenario | n | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 |
| 100 | – | – | – | 0.008 | 0.033 | 0.071 | – | – | – | |
| σ = 1 | 500 | – | – | – | 0.008 | 0.036 | 0.080 | – | – | – |
| 1000 | – | – | – | 0.010 | 0.049 | 0.092 | 0.996 | 0.999 | 1.000 | |
| 2000 | 1.000 | 1.000 | 1.000 | 0.009 | 0.048 | 0.097 | 1.000 | 1.000 | 1.000 | |
| 5000 | 1.000 | 1.000 | 1.000 | 0.008 | 0.053 | 0.099 | 0.999 | 1.000 | 1.000 | |
| 100 | – | – | – | 0.009 | 0.040 | 0.077 | – | – | – | |
| σ = 1 | 500 | – | – | – | 0.009 | 0.050 | 0.096 | – | – | – |
| 1000 | 1.000 | 1.000 | 1.000 | 0.009 | 0.046 | 0.095 | 1.000 | 1.000 | 1.000 | |
| 2000 | 1.000 | 1.000 | 1.000 | 0.010 | 0.048 | 0.099 | 1.000 | 1.000 | 1.000 | |
| 5000 | 1.000 | 1.000 | 1.000 | 0.011 | 0.051 | 0.100 | 1.000 | 1.000 | 1.000 | |
The pcaQR approach gives relatively good performance when the magnitude of the error is small (σ = 0.05): the empirical Type I error is close to the nominal level in results not reported here. However, Table 2 shows that as the error variance increases (σ = 1), the empirical rejection probabilities are either excessively inflated, when n ∈ {1000, 2000, 5000}, or there are too many missing values, when n ∈ {100, 500}. The results are not surprising: in the case of large error variance, a direct application of principal component analysis yields a large number of principal components. As a consequence, the application of the classical Wald test for vector covariates leads to numerical instability due to singularity of the design matrix, much as for the NaiveQR approach. The performance of the SSQR approach is very good for all the scenarios considered and across various sample sizes: the empirical Type I error rates are close to the nominal levels. This is expected, as when H0 holds, the functional covariate effect operates through its mean and this effect is invariant over quantile levels.
Next we evaluate the performance in terms of empirical rejection probabilities when the null hypothesis is not true. We focus only on the proposed adjusted Wald testing procedure and SSQR, as they have the correct size. Fig. 2 shows the power curves based on 2000 simulations for the large-noise case with σ = 1; the results are similar in the low-noise case (σ = 0.05) and for brevity are not included. The adjusted Wald procedure is much more powerful than SSQR irrespective of the departure from the null hypothesis, as reflected by the coefficient γ. For example, when γ = 1 the probability of correctly rejecting H0 using the adjusted Wald test is about 100% when the sample size is 500 or more, whereas the counterpart obtained with SSQR is less than 70% even when the sample size increases to 5000. These results are not surprising, as SSQR summarizes the entire functional covariate through a single scalar, while the proposed adjusted Wald test employs the full functional covariate.
Fig. 2: Power curves of the adjusted Wald test and SSQR under the dense design. We test H0 at the two sets of quantile levels. The x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}. Results are based on 2000 simulations.
4.3. Sparse design
Next, we study the performance of the adjusted Wald testing procedure when the functional covariate is observed sparsely and with measurement error. We set an overall grid of 101 equispaced points in [0, 1] and consider two settings: a ‘moderately sparse’ sampling design in which mi = 50 randomly selected time points form the grid for each i, and a ‘highly sparse’ design with mi = 10. Other aspects of the data-generating process follow the dense design described in the previous section. We use the adjusted Wald test in conjunction with sparse fPCA techniques that estimate the fPC scores ξik via conditional expectation, as proposed by [46]. When the sampling design of the functional covariate is sparse, there are no obvious alternative approaches to compare with, so in this section we discuss only the performance of the proposed Wald-type procedure.
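A sketch of the sparse-design score estimation step is given below. It assumes that refund::fpca.sc accepts a curves-by-grid matrix with NAs at unobserved points and returns conditional-expectation (PACE-type) scores in its `$scores` component; both the interface details and the reuse of `simulate_fqr_data` from the earlier sketch are illustrative assumptions rather than the authors’ implementation.

```r
# Sparse-design score estimation sketch, under the stated assumptions about refund.
library(refund)
dat_sp <- simulate_fqr_data(n = 500, gamma = 1, sigma = 1)
W_sparse <- dat_sp$W
m <- length(dat_sp$tgrid)
for (i in seq_len(nrow(W_sparse))) {
  drop_idx <- sample(m, m - 10)            # keep only mi = 10 random time points
  W_sparse[i, drop_idx] <- NA
}
sfpca <- fpca.sc(Y = W_sparse, argvals = dat_sp$tgrid, pve = 0.95)
dim(sfpca$scores)                          # n x (number of selected fPCs)
```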
Table 3 shows the empirical Type I error when the noise level is σ = 1. The results show excellent performance of the adjusted Wald test in maintaining the nominal levels for moderately large sample sizes (n = 1000 or larger) under both the moderately sparse and the highly sparse sampling designs. Fig. 3 shows the power of the adjusted Wald test for the moderately sparse and highly sparse designs with σ = 1. It indicates that, as expected, the sparsity of the functional covariates slightly affects the proposed functional Wald-type procedure. Nevertheless, the adjusted Wald test continues to display excellent performance. The results are similar for the low level of measurement error and are omitted here for brevity.
Table 3:
Type I error of the adjusted Wald test at significance level α ∈ {0.01, 0.05, 0.10} under the sparse designs. We test H0 at the two sets of quantile levels. The missing rate is 50% for moderate sparsity and 90% for high sparsity. Results are based on 5000 simulations.
| missing rate = 50% | missing rate = 90% | ||||||
|---|---|---|---|---|---|---|---|
| Scenario | n | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 |
| 100 | 0.021 | 0.063 | 0.104 | 0.024 | 0.075 | 0.119 | |
| σ = 1 | 500 | 0.014 | 0.055 | 0.104 | 0.011 | 0.058 | 0.110 |
| 1000 | 0.014 | 0.055 | 0.106 | 0.013 | 0.052 | 0.101 | |
| 2000 | 0.011 | 0.055 | 0.106 | 0.013 | 0.053 | 0.103 | |
| 5000 | 0.011 | 0.052 | 0.100 | 0.010 | 0.048 | 0.100 | |
| 100 | 0.026 | 0.075 | 0.120 | 0.034 | 0.092 | 0.143 | |
| σ = 1 | 500 | 0.016 | 0.058 | 0.110 | 0.021 | 0.069 | 0.119 |
| 1000 | 0.013 | 0.057 | 0.106 | 0.014 | 0.063 | 0.114 | |
| 2000 | 0.011 | 0.053 | 0.100 | 0.011 | 0.056 | 0.108 | |
| 5000 | 0.010 | 0.048 | 0.103 | 0.011 | 0.049 | 0.100 | |
Fig. 3: Power curves of the adjusted Wald test for the moderately sparse design with mi = 50 (blue) and the highly sparse design with mi = 10 (red). We test H0 at the two sets of quantile levels. The x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}. Results are based on 2000 simulations.
4.4. Divergent p0
In this section, we study the performance of the proposed adjusted Wald test when Assumption A4 is violated. We follow the same settings as in Section 4.1, but the eigenvalues and eigenfunctions used to generate the functional covariate are given by $\lambda_k = (1/2)^{k-1}$ for k up to a truncation level that diverges with the sample size (defined through the floor function ⌊·⌋) and λk = 0 beyond it; the eigenfunction ϕk is the kth function in the Fourier basis on [0, 1]. We set σ = 1 for the measurement error in the functional covariate.
Table 4 presents the Type I error rates of the adjusted Wald test under various designs. We can see that even when Assumption A4 is violated, the proposed test matches the nominal level for large sample sizes, under both dense and sparse designs. Fig. 4 plots the power curves for γ ∈ {0.5, 1, 1.5}, which indicate performance similar to the case where p0 is a small constant. Therefore, the proposed Wald test appears to retain its desirable performance when Assumption A4 does not hold, at least under these simulation settings. A theoretical justification is an interesting topic for future research.
Table 4:
Type I error of the adjusted Wald test at significance level α ∈ {0.01, 0.05, 0.10} when p0 is divergent, under the dense design (no missing) and the sparse designs. We test H0 at the two sets of quantile levels. The missing rate is 50% for moderate sparsity and 90% for high sparsity. Results are based on 5000 simulations.
| n | no missing | missing rate = 50% | missing rate = 90% | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | 0.01 | 0.05 | 0.10 | ||
| 100 | 0.045 | 0.107 | 0.161 | 0.049 | 0.107 | 0.162 | 0.040 | 0.096 | 0.147 | |
| 500 | 0.025 | 0.085 | 0.146 | 0.022 | 0.081 | 0.141 | 0.026 | 0.084 | 0.137 | |
| 1000 | 0.015 | 0.067 | 0.123 | 0.019 | 0.072 | 0.132 | 0.013 | 0.062 | 0.120 | |
| 2000 | 0.015 | 0.063 | 0.120 | 0.016 | 0.062 | 0.115 | 0.015 | 0.063 | 0.119 | |
| 5000 | 0.010 | 0.052 | 0.107 | 0.010 | 0.059 | 0.117 | 0.012 | 0.055 | 0.101 | |
| 100 | 0.070 | 0.149 | 0.222 | 0.061 | 0.142 | 0.212 | 0.051 | 0.125 | 0.193 | |
| 500 | 0.035 | 0.105 | 0.169 | 0.037 | 0.109 | 0.174 | 0.025 | 0.087 | 0.148 | |
| 1000 | 0.025 | 0.084 | 0.148 | 0.022 | 0.079 | 0.140 | 0.024 | 0.077 | 0.132 | |
| 2000 | 0.020 | 0.072 | 0.127 | 0.016 | 0.067 | 0.123 | 0.015 | 0.063 | 0.123 | |
| 5000 | 0.013 | 0.056 | 0.111 | 0.014 | 0.058 | 0.104 | 0.012 | 0.057 | 0.110 | |
Fig. 4: Power curves of the adjusted Wald test when p0 is divergent. We test H0 at the two sets of quantile levels. In each plot, the x-axis is the sample size n ∈ {100, 500, 1000, 2000, 5000}, and a larger γ corresponds to more deviation from the null hypothesis. Results are based on 2000 simulations.
5. Application
In this section we consider the capital bike sharing study and discuss the application of the proposed testing procedure to formally assess whether the effect of the previous day’s casual bike rentals on the current day’s total bike rentals varies across several quantile levels. The bike data [7], available at http://capitalbikeshare.com/system-data, are recorded by the Capital Bikeshare System (CBS), Washington D.C., USA. Bike sharing systems, a new generation of bike rental services, automate membership, rental, and return. With currently over 500 bike-share programs around the world [27] and a fast-growing trend, analyzing data from these systems, in particular their effects on public traffic and the environment, has become popular. The bike data include hourly bike rentals by casual users collected from January 1st, 2011 to December 31st, 2012, for a total of 731 days.
Our objective is to formally assess how the previous day’s casual bike rentals, Xi(·), affect the distribution of the current day’s total bike rental count, Yi, where i ∈ {1, …, 730} denotes the ith day starting from January 2nd, 2011. A subsequent interest is to predict the 90% quantile of the total casual bike rentals. Fig. 5 plots the hourly profiles of casual bike rentals (left) and the histogram of the total casual bike rentals (right).
Fig. 5: Bike rental data (casual users). The left panel plots the hourly bike rentals by casual users on the previous day, Xi(t), for t ranging from 0 to 24 hours; the right panel shows the histogram of the total casual bike rentals on the current day, Yi, where i ∈ {1, …, 730}.
We assume the functional quantile regression model (1), $Q_{Y_i}(\tau \mid X_i) = \beta_0(\tau) + \int X_i^c(t)\,\beta(t, \tau)\,dt$, where Yi is the total casual bike rentals for the current day and Xi(·) is the true profile of the casual bike rentals recorded on the previous day. As described earlier, β0(τ) is the quantile-level varying intercept and β(·, τ) is the slope parameter function, which quantifies the effect of the functional covariate on the τth quantile of the distribution of the response.
To address the first objective, we consider a set of quantile levels and use the proposed testing procedure to test the null hypothesis that β(·, τ) is the same at all quantile levels in this set.
The number of fPCs is selected using PVE = 99%; this choice selects three fPCs. We use the adjusted Wald test Tn and its asymptotic null distribution; the resulting p-value is close to zero, indicating overwhelming evidence that small and large numbers of bike rentals are affected differently by the previous day’s hourly rentals.
Next we turn to the problem of predicting the 90% quantile of the total bike rentals for the current day. When the quantile coefficients are constant over a region of quantile levels, we may improve the estimator’s efficiency by borrowing information from neighboring quantiles to estimate the common coefficients, which is especially valuable when the quantile level of interest is high. Here we consider a set of quantile levels around the 0.9 quantile. We apply the proposed method to estimate the coefficient functions at these quantile levels, as shown in Fig. 6. The corresponding adjusted Wald test leads to a p-value of 0.466, which suggests that the quantile coefficients are not significantly different across these quantile levels. We therefore consider combined quantile regression over this set using the quantile average estimator (QAE) and composite regression of quantiles (CRQ) with equal weights; see [24, 43] for technical details. We denote the single-level quantile regression estimator at the 0.9 quantile by RQ.
Fig. 6: Estimated β(·, τ) by the proposed method at various quantile levels for the capital bike sharing study. The x-axis ranges from 0 to 24 hours.
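To illustrate the borrowing-of-information idea behind QAE, the following R sketch fits rq at each level in a set of upper quantiles and averages the slope loadings with equal weights. The function name `qae_slopes`, the placeholder inputs `y` and `scores` (response and estimated fPC scores), and the quantile levels listed (taken from Table 5 for illustration) are our own choices; this is not the exact QAE/CRQ implementation referenced in [24, 43].

```r
# Equal-weight quantile average estimator (QAE) sketch on the fPC scores.
library(quantreg)
qae_slopes <- function(y, scores, taus) {
  slopes <- sapply(taus, function(tau) coef(rq(y ~ scores, tau = tau))[-1])
  rowMeans(slopes)                         # equal-weight average of slope loadings
}
# taus_set <- c(0.800, 0.825, 0.850, 0.875, 0.900)   # illustrative level set
# beta_qae <- qae_slopes(y, scores, taus_set)        # y, scores: placeholders
```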
We use 1000 bootstrap samples to study the efficiency of the three estimators. Fig. 7 plots the bootstrap means and standard errors of the estimates of β(·, 0.9) by QAE, CRQ and RQ. The QAE and CRQ estimators have smaller standard errors uniformly for all t, indicating efficiency gain by combining information across quantile levels. We also observe that the number of fPC is either 3 or 4 in all bootstrap samples, suggesting that the assumption A4 is reasonable in this data application.
Fig. 7: Bootstrap means (left) and standard errors (right) when estimating β(·, 0.9) by QAE, CRQ, and RQ. QAE and CRQ reduce the standard error of RQ. The x-axis ranges from 0 to 24 hours.
Furthermore, we conduct a cross-validation study by randomly selecting 50% of the data as the training set and using the other half as the test set. We use 1000 replications and, for each replication and each quantile level τ in the set, calculate the prediction error
$\mathrm{PE}(\tau) = \sum_{i \in \text{test}} \rho_\tau\Big\{Y_i - \hat\beta_0(\tau) - \sum_{k=1}^{K} \hat\beta_k(\tau)\,\hat\xi_{ik}\Big\},$
where the estimated coefficients are based on the training data and the summation is over the test data. The RQ estimates are obtained separately at each τ, while the QAE and CRQ estimates are shared across the quantile levels. The averaged prediction errors are reported in Table 5. The application of QAE and CRQ improves the prediction significantly at the 0.875 and 0.9 quantiles, while the differences among the three methods are not significant at the lower quantiles. This makes sense, since data sparsity becomes more severe at more extreme quantile levels; hence, incorporating lower quantile levels improves efficiency at higher levels, whereas incorporating more extreme levels may not benefit prediction at the lower quantile levels.
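For completeness, the check-loss prediction error reconstructed above is two lines of R; `q_hat` below stands for the predicted τth conditional quantile on the test set (an illustrative argument name).

```r
# Check-loss prediction error used in the cross-validation comparison.
check_loss <- function(u, tau) u * (tau - as.numeric(u < 0))
pred_error <- function(y_test, q_hat, tau) sum(check_loss(y_test - q_hat, tau))
```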
Table 5:
Prediction errors from different methods averaged over 1000 cross-validations. The maximum standard error of each row is reported in the last column. QAE and CRQ that combine information at various quantile levels tend to yield smaller prediction errors than RQ at more extreme quantile levels.
| τ | QAE | CRQ | RQ | SE |
|---|---|---|---|---|
| 0.8 | 154.163 | 153.073 | 152.396 | 0.277 |
| 0.825 | 146.163 | 145.598 | 145.504 | 0.268 |
| 0.85 | 137.028 | 136.758 | 137.071 | 0.259 |
| 0.875 | 126.138 | 125.949 | 126.819 | 0.252 |
| 0.9 | 112.774 | 112.842 | 113.823 | 0.238 |
6. Proofs
In this section, we prove Theorem 1 and Theorem 2, as well as auxiliary results needed in the proofs, including Lemma 1, Lemma 2, and Lemma 3. We use ∥ · ∥L2 for the L2-norm of a function and ∥ · ∥ for the Euclidean norm of a vector.
6.1. Proofs of Theorem 1 and Theorem 2
Proof of Theorem 1. The proof proceeds in three steps. In step 1, we approximate the estimated scores $\hat\xi_i$ by linear combinations of the ξi. In step 2, we obtain the asymptotic distribution of $\hat\theta_\tau$ at a single quantile level. In step 3, we extend the results in step 2 to multiple quantile levels.
Step 1 (Approximation of the estimated scores). Most of the existing literature has been focused on establishing error bounds for estimated eigenvalues and eigenfunctions; see for example [11, 12] and the discussion therein. The following lemma instead characterizes the accuracy in predicting the fPC scores.
Lemma 1. Under Assumptions A4, B1, B2 and C1, we have
| (10) |
In addition,
$\hat\xi_i = \xi_i + n^{-1/2} B\,\xi_i + O_p(n^{-1}), \qquad (11)$
where B is a (p0 + 1) × (p0 + 1) dimensional matrix with the bottom right p0 × p0 block matrix equal to B+ described next and the rest of the elements equal to zero. Here B+ = (bkk′) is a p0 × p0 random matrix such that bkk = 0 for k ∈ {1, …, p0} and if k ≠ k′.
The result in (11) indicates that the leading term of $\hat\xi_i - \xi_i$ is $n^{-1/2}B\xi_i$, which is a linear combination of ξi with a random weight matrix B that does not depend on i.
Step 2 (Quantile regression on estimated scores). We focus on a single quantile level τ in this step. For any $\delta \in \mathbb{R}^{p_0+1}$, let Zn(δ) denote the recentered quantile objective function obtained by evaluating the criterion in (5) at $\theta_\tau + n^{-1/2}\delta$ and subtracting its value at θτ, where $u_i = Y_i - \xi_i^{\mathrm T}\theta_\tau$. Then Zn(δ) is a convex function of δ that is minimized at $\sqrt{n}(\hat\theta_\tau - \theta_\tau)$. Therefore, the asymptotic distribution of $\sqrt{n}(\hat\theta_\tau - \theta_\tau)$ is determined by the limiting behavior of Zn(δ). Let ψτ(t) = τ − I(t < 0). According to Knight’s identity [23], we can decompose Zn(δ) into two parts, Zn(δ) = Z1n(δ) + Z2n(δ), where
| (12) |
In order to show (6), it is sufficient to prove that
$Z_n(\delta) \stackrel{d}{\longrightarrow} -\,\delta^{\mathrm T} W(\tau) + \tfrac{1}{2}\,\delta^{\mathrm T} D_1(\tau)\,\delta, \qquad (13)$
where W(τ) ~ N{0, τ(1 − τ)D0 + D1(τ)Σ(τ)D1(τ)}, since one can then apply the convexity lemma [34] to the quadratic form of δ in (13).
We next derive the limiting distributions of Z1n(δ) and Z2n(δ). For Z1n(δ), similarly to its definition in (12), we define based on the true scores ξi:
where . By a direct application of the central limit theorem (CLT), we obtain that the asymptotic distribution of is N(0, τ(1 – τ)δTD0δ). However, when the predictors are estimated with errors, the difference is non-negligible. Lemma 2 provides a representation of Z1n(δ) by explicitly formulating this difference.
Lemma 2. Under Assumptions A4, B1, B2 and C1,
where and , k ≥ 1.
Since the ξiψτ(ui) − D1(τ)di are i.i.d., Lemma 2 allows us to directly apply Lindeberg’s CLT to obtain the asymptotic distribution of Z1n(δ). Note that E{ξiψτ(ui)} = 0 and Var{ξiψτ(ui)} = τ(1 − τ)D0. In addition, Edi = 0 because ξik and ξir are uncorrelated and have mean 0 (when r ≠ k). Let the matrix Σ(τ) be the covariance matrix of di, whose first row and first column are all 0 and whose (k + 1, k′ + 1)th element (k, k′ = 1, …, p0) is given by $\theta_\tau^{\mathrm T} A_{k,k'}\,\theta_\tau$ for some (p0 + 1) × (p0 + 1) matrix Ak,k′. The first row and first column of Ak,k′ are all 0, and a simple calculation yields its bottom right block Ak,k′,+ = (σj, j′):
| (14) |
Let , and Σ0 be a (p0 + 1)² × (p0 + 1)² matrix whose (k + 1, k′ + 1)th block is Ak,k′ (k, k′ = 1, …, p0) and whose (k + 1, k′ + 1)th block is for k = 0 or k′ = 0. Then Σ(τ) can be rewritten as . Furthermore, we have
which leads to
Hence, we have $Z_{1n}(\delta) \stackrel{d}{\longrightarrow} -\,\delta^{\mathrm T}W(\tau)$, where W(τ) ~ N{0, τ(1 − τ)D0 + D1(τ)Σ(τ)D1(τ)}. Consequently, the following result for Z2n(δ) concludes the asymptotic distribution in (13).
Lemma 3. Under Assumptions A4, B1, B2 and C1, we have $Z_{2n}(\delta) = \tfrac{1}{2}\,\delta^{\mathrm T} D_1(\tau)\,\delta + o_p(1)$.
Step 3 (Asymptotic distributions across quantile levels). When considering various quantile levels, the same arguments apply via a convex optimization and the limiting distribution of the objective function. The asymptotic covariance in (7) is obtained from the covariance between the limiting variables at levels τℓ and τℓ′, following a calculation similar to that in (14). □
Proof of Theorem 2. We just need to show that . The (ℓ, ℓ′)th block of the matrix is , where 1 ≤ ℓ, ℓ′ ≤ L. Therefore, we have where is a (p0 + 1)L × (p0 + 1) matrix. Noting that for ℓ ∈ {1, …, L}, we have and thus . Therefore, when Rζ = 0, it follows that . This completes the proof. □
6.2. Proofs of lemmas
Proof of Lemma 1. The bound in (10) follows from standard bounds for the estimated eigenfunctions and covariance kernel in the FDA literature. According to Theorem 1 in [11], we have
where sk = minr≤k(λr − λr+1) and . Therefore,
which leads to . For any c > 0, invoking the bound [12, Lemma 3.3] leads to
Thus, for finite p0, we have ; in particular, there holds .
Next we prove the representation in (11). Let be the estimator of the kernel G based on the fully observed covariate Xi(·), and recall that is the estimate based on the discretized Wij with measurement error. Denote and . We use the notation to denote .
Since {ϕk : k ≥ 1} forms a basis of the L2 space on [0, 1], we have , where k ∈ {1, …, p0} and the generalized Fourier coefficients . Furthermore, we have the following expansion for akk′’s:
according to (2.6) and (2.7) in [11]. Therefore, for k ∈ {1, …, p0}, we have
A direct calculation gives that
for k, k′ = 1, …, p0 and k ≠ k′, where . Since , we have . The same approximation holds when using since is uniformly op(n−1/2) as shown by [48]. Consequently,
| (15) |
This approximation will not be affected if we use the reconstructed trajectories instead of the true curves Xi(·), because the difference is negligible uniformly over i (e.g., see Theorem 2 in [48] or Lemma 1 in [51]). Let B+ = (bkk′) be a p0 × p0 random matrix with bkk′ = 0 if k = k′ and with off-diagonal entries as obtained in the calculation above for k ≠ k′. Let B be the (p0 + 1) × (p0 + 1) zero matrix with its bottom right block replaced by B+; then the right-hand side of (15) becomes n−1/2Bξi + Op(n−1). Consequently, we have
This completes the proof by noting that all the stochastic bounds starting from Op(n−1) in akk do not depend on i. □
Proof of Lemma 2. We first decompose the difference between Z1n(δ) and into three parts S1, S2 and S3 as follows:
The proof proceeds in three steps: S2 = op(1) (Step i), S1 = op(1) (Step ii), and (Step iii). Step i and Step ii indicate that the first two terms S1 and S2 are negligible, and it is sufficient to show that and E|S1| = o(1) according to Chebyshev’s inequality. The third term S3 is challenging to analyze since the function of ψτ(·) is not differentiable. In Step iii, we approximate the term S3 mainly using the uniform approximation on ψτ(·).
Step i. First notice that and . Therefore, we have E(S2) = 0, and further
For i = i′,
Since are identically distributed for all i, we have . For i ≠ i′, we have by noting that
Therefore, .
Step ii. For S1, we first introduce the notation
For each i, the random variable Δi satisfies
(16)
(17)
The result in (16) follows by noting that ψτ(ui) has conditional mean 0 given ξi, while (17) holds because .
By Taylor’s theorem, for any a and b, we have
where |R(a, b)| ≤ C0. Therefore,
where . We also have the bound
Therefore, and consequently
Step iii. Define
for any vector t such that ∥t∥ ≤ C for some constant C. Then the uniform approximation result in [14] indicates that
On the other hand,
Therefore,
(18)
Note that and up to a negligible term Op(n−1). Then
where the term op(1) is obtained by the same arguments used in Step ii via conditional expectations and Taylor’s theorem. Substituting t = n−1/2Bθτ into (18) and noting that ∥n−1/2Bθτ∥ = Op(n−1/2), we obtain that Rn(n−1/2Bθτ) = −n1/2D1(τ)Bθτ + O(1) + Op(n1/4 log n), leading to
According to the definition of B in (11), it is easy to verify that , where and for k ≥ 1. Therefore, it follows that , which concludes Step iii. This completes the proof. □
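The D1(τ) term obtained in Step iii arises, broadly speaking, from a first-order expansion of the expected score: if the τ-th quantile of the error u is zero, then E{ψτ(u − δ)} ≈ −fu(0)δ for small δ, where fu is the density of u. The R lines below verify this generic fact in closed form for a normal error, which is purely an illustrative choice and not tied to the model assumptions above.

```r
tau   <- 0.3
q     <- qnorm(tau)                 # u = Z - q, Z ~ N(0, 1), has tau-th quantile 0
f0    <- dnorm(q)                   # density of u at 0
delta <- c(0.1, 0.05, 0.01)
exact <- tau - pnorm(delta + q)     # E{psi_tau(u - delta)} in closed form
cbind(delta, exact, linear = -f0 * delta, gap = exact + f0 * delta)
# The gap between the exact value and the linearization shrinks like delta^2.
```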
Proof of Lemma 3. Recall that , where
First, we have
Therefore, by Taylor’s theorem, we have
where Rni is the remainder, which satisfies . Consequently,
Therefore, the unconditional expectation of Z2ni(δ) is
leading to
The second term is negligible because
(19)
(20)
where the last step is due to the fact that .
We next show that . Note that for i ∈ {1, …, n}, and the ∥ξi∥²’s are i.i.d. with a finite second moment . For any ϵ > 0, we have
by the dominated convergence theorem. This implies that , since uniformly over i. Consequently, , i.e., the conditional variance converges to 0 in probability. Therefore, following the martingale argument in the proof of Theorem 2 of [34], we have Z2n − E(Z2n) = op(1). This completes the proof. □
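The dominated-convergence step above is a uniform-integrability statement: for i.i.d. variables with a finite second moment, the truncated second moment E{∥ξi∥² 1(∥ξi∥² > nϵ)} vanishes as n grows. The R lines below only illustrate this generic fact by Monte Carlo, with chi-square draws standing in for ∥ξi∥².

```r
set.seed(4)
eps <- 0.01
x2  <- rchisq(1e6, df = 3)   # stand-in for ||xi_i||^2, finite second moment
sapply(c(1e2, 1e3, 1e4, 1e5), function(n) mean(x2 * (x2 > n * eps)))
# The truncated second moment decreases toward 0 as n increases.
```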
Acknowledgments
We thank the Editor, the Associate Editor and two anonymous referees for constructive comments that helped to improve the paper. The authors would like to acknowledge the support of NSF DMS 2015569 (Li) and DMS 1454942 (Staicu) and NIH 5P01 CA142538-09 (Staicu).
CRediT authorship contribution statement
Meng Li: Conceptualization, Methodology, Software, Data analysis, Writing – original draft. Kehui Wang: Conceptualization, Methodology, Software, Data analysis, Writing – original draft. Arnab Maity: Conceptualization, Methodology, Writing – review & editing, Validation. Ana-Maria Staicu: Conceptualization, Methodology, Writing – review & editing, Validation.
References
- [1] Cao G, Wang S, Wang L, Estimation and inference for functional linear regression models with partially varying regression coefficients, Stat 9 (2020) e286.
- [2] Cardot H, Crambes C, Sarda P, Quantile regression when the covariates are functions, Nonparametric Statistics 17 (2005) 841–856.
- [3] Chen K, Müller H-G, Conditional quantile analysis when covariates are functions, with application to growth data, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2012) 67–89.
- [4] Crambes C, Gannoun A, Henchiri Y, Weak consistency of the Support Vector Machine Quantile Regression approach when covariates are functions, Statistics and Probability Letters 81 (2011) 1847–1858.
- [5] Crambes C, Gannoun A, Henchiri Y, Support vector machine quantile regression approach for functional data: Simulation and application studies, Journal of Multivariate Analysis 121 (2013) 50–68.
- [6] Fan J, Gijbels I, Local Polynomial Modelling and Its Applications, Monographs on Statistics and Applied Probability, Chapman & Hall, 1996.
- [7] Fanaee-T H, Gama J, Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence 2 (2014) 113–127.
- [8] Ferraty F, Rabhi A, Vieu P, Conditional quantiles for dependent functional data with application to the climatic El Niño phenomenon, Sankhyā: The Indian Journal of Statistics 67 (2005) 378–398.
- [9] Ferraty F, Vieu P, Nonparametric Functional Data Analysis, Springer, New York, 2006.
- [10] Gertheiss J, Maity A, Staicu A-M, Variable selection in generalized functional linear models, Stat 2 (2013) 86–101.
- [11] Hall P, Hosseini-Nasab M, On properties of functional principal components analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (2006) 109–126.
- [12] Hall P, Hosseini-Nasab M, Theory for high-order bounds in functional principal components analysis, Mathematical Proceedings of the Cambridge Philosophical Society 146 (2009) 225–256.
- [13] Hall P, Müller H-G, Wang J-L, Properties of principal component methods for functional and longitudinal data analysis, The Annals of Statistics 34 (2006) 1493–1517.
- [14] He X, Shao Q-M, A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs, The Annals of Statistics 24 (1996) 2608–2630.
- [15] Hendricks W, Koenker R, Hierarchical spline models for conditional quantiles and the demand for electricity, Journal of the American Statistical Association 87 (1992) 58–68.
- [16] Horváth L, Kokoszka P, Reimherr M, Two sample inference in functional linear models, The Canadian Journal of Statistics 37 (2009) 571–591.
- [17] Huang L, Scheipl F, Goldsmith J, Gellar J, Harezlak J, McLean MW, Swihart B, Xiao L, Crainiceanu C, Reiss P, refund: Regression with Functional Data, 2015. R package version 0.1-13.
- [18] Huber PJ, The behavior of maximum likelihood estimates under nonstandard conditions, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I: Statistics, University of California Press, Berkeley, CA, USA, 1967, pp. 221–233.
- [19] Ivanescu AE, Staicu A-M, Scheipl F, Greven S, Penalized function-on-function regression, Computational Statistics 30 (2015) 539–568.
- [20] Jiang C-R, Wang J-L, Covariate adjusted functional principal components analysis for longitudinal data, The Annals of Statistics 38 (2010) 1194–1226.
- [21] Jiang L, Bondell HD, Wang H, Interquantile shrinkage and variable selection in quantile regression, Computational Statistics & Data Analysis 69 (2014) 208–219.
- [22] Kato K, Estimation in functional linear quantile regression, The Annals of Statistics 40 (2012) 3108–3136.
- [23] Knight K, Limiting distributions for L1 regression estimators under general conditions, The Annals of Statistics 26 (1998) 755–770.
- [24] Koenker R, A note on L-estimates for linear models, Statistics & Probability Letters 2 (1984) 323–325.
- [25] Koenker R, Quantile Regression, volume 38, Cambridge University Press, 2005.
- [26] Kong D, Staicu A-M, Maity A, Classical testing in functional linear models, Journal of Nonparametric Statistics 28 (2016) 813–838.
- [27] Larsen J, Policy institute: Bike-sharing programs hit the streets in over 500 cities worldwide, http://www.earth-policy.org/plan_b_updates/2013/update112, 2013.
- [28] Lee ER, Noh H, Park BU, Model selection via Bayesian information criterion for quantile regression models, Journal of the American Statistical Association 109 (2014) 216–229.
- [29] Li M, Staicu A-M, Bondell HD, Incorporating covariates in skewed functional data models, Biostatistics 16 (2015) 413–426.
- [30] Li Y, Liu Y, Zhu J, Quantile regression in reproducing kernel Hilbert spaces, Journal of the American Statistical Association 102 (2007) 255–268.
- [31] Li Y, Wang N, Carroll RJ, Generalized functional linear models with semiparametric single-index interactions, Journal of the American Statistical Association 105 (2010) 621–633.
- [32] Li Y, Wang N, Carroll RJ, Selecting the number of principal components in functional data, Journal of the American Statistical Association 108 (2013) 1284–1294.
- [33] Morris JS, Carroll RJ, Wavelet-based functional mixed models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (2006) 179–199.
- [34] Pollard D, Asymptotics for least absolute deviation regression estimators, Econometric Theory 7 (1991) 186–199.
- [35] Ramsay J, Silverman B, Functional Data Analysis, Springer Series in Statistics, Springer, 2005.
- [36] Redd A, A comment on the orthogonalization of B-spline basis functions and their derivatives, Statistics and Computing 22 (2012) 251–257.
- [37] Shi G, Du J, Sun Z, Zhang Z, Checking the adequacy of functional linear quantile regression model, Journal of Statistical Planning and Inference 210 (2021) 64–75.
- [38] Staicu A-M, Crainiceanu CM, Reich DS, Ruppert D, Modeling functional data with spatially heterogeneous shape characteristics, Biometrics 68 (2012) 331–343.
- [39] Staicu A-M, Lahiri SN, Carroll RJ, Significance tests for functional data with complex dependence structure, Journal of Statistical Planning and Inference 156 (2015) 1–13.
- [40] Su Y-R, Di C-Z, Hsu L, Hypothesis testing in functional linear models, Biometrics 73 (2017) 551–561.
- [41] Usset J, Staicu A-M, Maity A, Interaction models for functional regression, Computational Statistics & Data Analysis 94 (2016) 317–330.
- [42] Wang HJ, Stefanski LA, Zhu Z, Corrected-loss estimation for quantile regression with covariate measurement errors, Biometrika 99 (2012) 405–421.
- [43] Wang K, Wang HJ, Optimally combined estimation for tail quantile regression, Statistica Sinica 26 (2016) 295–311.
- [44] Wei Y, Carroll RJ, Quantile regression with measurement error, Journal of the American Statistical Association 104 (2009) 1129–1143.
- [45] Wu Y, Ma Y, Yin G, Smoothed and corrected score approach to censored quantile regression with measurement errors, Journal of the American Statistical Association, in press (2015).
- [46] Yao F, Müller H-G, Wang J-L, Functional data analysis for sparse longitudinal data, Journal of the American Statistical Association 100 (2005) 577–590.
- [47] Yao F, Sue-Chee S, Wang F, Regularized partially functional quantile regression, Journal of Multivariate Analysis 156 (2017) 39–56.
- [48] Zhang J-T, Chen J, Statistical inferences for functional data, The Annals of Statistics 35 (2007) 1052–1079.
- [49] Zhao Z, Xiao Z, Efficient regressions via optimally combining quantile information, Econometric Theory 30 (2014) 1272–1314.
- [50] Zhou L, Huang JZ, Carroll RJ, Joint modelling of paired sparse functional data using principal components, Biometrika 95 (2008) 601–619.
- [51] Zhu H, Yao F, Zhang HH, Structured functional additive regression in reproducing kernel Hilbert spaces, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2014) 581–603.
- [52] Zou H, Yuan M, Composite quantile regression and the oracle model selection theory, The Annals of Statistics 36 (2008) 1108–1126.
