Asymptotics of nonparametric L-1 regression models with dependent data

ZHIBIAO ZHAO; YING WEI; DENNIS KJ LIN

doi:10.3150/13-BEJ532

Author manuscript; available in PMC: 2015 Aug 1.

Published in final edited form as: Bernoulli (Andover). 2014 Aug 1;20(3):1532–1559. doi: 10.3150/13-BEJ532


PMCID: PMC4060752 NIHMSID: NIHMS491280 PMID: 24955016

Abstract

We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration.

Keywords: Bahadur representation, Coupling argument, Least-absolute-deviation estimation, Longitudinal data, Nonparametric estimation, Time series, Weighted empirical process

1. Introduction

There is a vast literature on the nonparametric location-scale model Y = μ(X) + s(X)e, where X, Y, and e are the covariates, response, and error, respectively. Given observations {(X_j, Y_j)}_{j=1,…,m}, this model has been studied under various data structures. In terms of the dependence structure, there are independent data and time series data scenarios; in terms of the design point X, there are random-design and fixed-design X_j = j/m settings. In these settings, we usually assume that either (X_j, Y_j) are independent observations from subjects j = 1, …, m, or {(X_j, Y_j)}_{j=1,…,m} is a sequence of time series observations from the same subject. We refer the reader to Fan and Yao (2003) and Li and Racine (2007) for an extensive exposition of related works.

In this article we are interested in the following nonparametric location-scale model with serially correlated data from multiple subjects:

Y_{i, j} = μ (x_{i, j}) + s (x_{i, j}) e_{i, j}, 1 \leq j \leq m_{i}, 1 \leq i \leq n,

(1.1)

where, for each subject i, {(x_i,j, Y_i,j)}_{j=1,…,m_i} is the sequence of covariates and responses, and {e_i,j}_{j=1,…,m_i} is the corresponding error process. We study (1.1) under a general dependence framework for {e_i,j}_{j∈ℕ} that allows for both longitudinal data and some spatially correlated data. In typical longitudinal studies, x_i,j represents measurement time or covariates at time j, and it is then reasonable to assume that {e_i,j}_{j∈ℤ} is a causal time series, i.e., the current observation depends only on past but not future observations. In other applications, however, measurements may depend on both the left and right neighboring measurements, especially when x_i,j represents measurement location. A good example of this type of data is the vertical density profile data in Walker and Wright (2002); see also Section 2.1 for more details. To accommodate this, we propose a general error dependence structure, which can be viewed as an extension of the one-sided causal structure in Wu (2005) and Dedecker and Prieur (2005) to a two-sided non-causal setting. The proposed dependence framework allows for many linear and nonlinear processes.
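To make the data structure in (1.1) concrete, the following minimal sketch simulates a few independent subjects with serially correlated errors. It is an illustration under assumptions that are not part of the paper: the errors follow a causal Gaussian AR(1) recursion, the measurement times are uniform on [0, 1], and the helper names `simulate_subject` and `simulate_panel` as well as the particular choices of μ and s are hypothetical.

```python
import numpy as np

def simulate_subject(x, mu, s, phi=0.5, burn_in=200, rng=None):
    """Simulate Y_j = mu(x_j) + s(x_j) * e_j for one subject.

    The errors follow a causal AR(1) recursion e_j = phi * e_{j-1} + eps_j,
    an assumed example of a dependent error process.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    total = x.size + burn_in
    eps = rng.standard_normal(total)       # iid innovations
    e = np.zeros(total)
    for j in range(1, total):
        e[j] = phi * e[j - 1] + eps[j]     # depends only on past innovations
    e = e[burn_in:]                        # discard burn-in values
    return mu(x) + s(x) * e

def simulate_panel(m_list, mu, s, seed=0):
    """Simulate independent subjects; subject i has m_list[i] observations."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for m_i in m_list:
        x = np.sort(rng.uniform(0.0, 1.0, size=m_i))   # measurement times
        xs.append(x)
        ys.append(simulate_subject(x, mu, s, rng=rng))
    return xs, ys

# Hypothetical location and scale functions
mu = lambda x: np.sin(2.0 * np.pi * x)
s = lambda x: 0.5 + 0.5 * x

xs, ys = simulate_panel([20, 15, 30], mu, s)
```

Errors are dependent within a subject and independent across subjects, which mirrors the structure described above; the error process here is not rescaled to any particular normalization.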

We are interested in nonparametric estimation of the location function μ(·) and the scale function s(·). Least-squares based nonparametric methods have been extensively studied for both time series data (Fan and Yao, 2003) and longitudinal data (Hoover et al., 1998; Fan and Zhang, 2000; Wu and Zhang, 2002; Yao, Müller and Wang, 2005). While they perform well for Gaussian errors, least-squares based methods are sensitive to extreme outliers, especially when the errors have a heavy-tailed distribution. By contrast, robust estimation methods penalize far-deviated data points less severely than the squared loss does, which reduces the impact of extreme outliers. For example, median quantile regression uses the absolute loss and the resultant estimator is based on the sample local median. Since Koenker and Bassett (1978), quantile regression has become popular in parametric and nonparametric inferences and we refer the reader to Yu, Lu and Stander (1998) and Koenker (2005) for excellent expositions. Recently, He, Fu and Fung (2003), Koenker (2004) and Wang and Fygenson (2009) applied quantile regression techniques to parameter estimation of parametric longitudinal models, He, Zhu and Fung (2002) studied median regression for semiparametric longitudinal models, and Wang, Zhu and Zhou (2009) studied inferences for a partially linear varying-coefficient longitudinal model. Here we focus on quantile regression based estimation for the nonparametric model (1.1).

We aim to study the asymptotic properties, including uniform Bahadur representations and asymptotic normalities, of the least-absolute-deviation or median quantile estimates for model (1.1) under a general dependence structure. Nonparametric quantile regression estimation has been studied mainly under either the iid setting (Bhattacharya and Gangopadhyay, 1990; Chaudhuri, 1991; Yu and Jones, 1998) or the strong mixing setting (Truong and Stone, 1992; Honda, 2000; Cai, 2002). There are relatively scarce results on Bahadur representations of conditional quantile estimates. Bhattacharya and Gangopadhyay (1990) and Chaudhuri (1991) obtained point-wise Bahadur representations for conditional quantile estimation of iid data. For mixing stationary processes, Honda (2000) obtained point-wise and uniform Bahadur representations of conditional quantile estimates. For stationary random fields, Hallin, Lu and Yu (2009) obtained a point-wise Bahadur representation for spatial quantile regression function under spatial mixing conditions. Due to the non-stationarity and dependence structure, it is clearly challenging to establish Bahadur representations in the context of (1.1).

Our contribution here is mainly on the theoretical side. We establish uniform Bahadur representations for the least-absolute-deviation estimates of μ(·) and s(·) in (1.1). To derive the uniform Bahadur representations, the key ingredient is to study the modulus of continuity of certain kernel weighted empirical processes of the non-stationary observations Y_i,j in (1.1). Empirical processes have been extensively studied under various settings, including the iid setting (Shorack and Wellner, 1986), linear processes (Ho and Hsing, 1996), strong mixing setting (Andrews and Pollard, 1994; Shao and Yu, 1996), and general causal stationary processes (Wu, 2008). Using a coupling argument to approximate the dependent process by an m-dependent process with a diverging m, we study the modulus of continuity of weighted empirical processes, and the latter result serves as a key tool in establishing our uniform Bahadur representations. These Bahadur representations provide deep insights into the asymptotic behavior of the estimates, and in particular they provide theoretical justification for the profile control chart methodologies in Wei, Zhao and Lin (2012). These technical treatments are also of interest in other nonparametric problems involving dependent data.

The article is organized as follows. In Section 2 we introduce the error dependence structure with examples. In Section 3 we study weighted empirical process through a coupling argument. Section 4 contains uniform Bahadur representations and asymptotic normality. Section 5 contains an illustration using progesterone data. Possible extensions to spatial setting are discussed in Section 6. Proofs are provided in Section 7.

2. Error dependence structure

First we introduce some notation used throughout this article. For a, b ∈ ℝ, let ⌊a⌋ be the integer part of a, a ∨ b = max(a, b), and a ∧ b = min(a, b). For a random variable Z, write Z ∈ ℒ^q, q > 0, if ||Z||_q = [𝔼(|Z|^q)]^{1/q} < ∞. Let 𝒞^r(𝒮) be the set of functions with bounded derivatives up to order r on a set 𝒮 ⊂ ℝ.

Assume that, for each i, the error process {e_i,j}_{j∈ℕ} in (1.1) is an independent copy of a stationary process {e_j}_{j∈ℕ} which has the representation

e_{j} = G (ε_{j}, ε_{j \pm 1}, ε_{j \pm 2}, \dots),

(2.1)

where ε_j, j ∈ ℤ, are iid random innovations, and G is a measurable function such that e_j is well defined. We can view (2.1) as an input-output system with (ε_j, ε_{j±1}, ε_{j±2}, …), G, and e_j being, respectively, the input, filter, and output. Wu (2005) considered the causal time series case in which e_j depends only on the past innovations ε_j, ε_{j−1}, …. In contrast, (2.1) allows for non-causal models and is particularly useful for applications that do not have a time structure. For example, if x_i,j are locations, then the corresponding measurement Y_i,j depends on both the left and right neighboring measurements.

Condition 2.1

Let $\{ε_{j}^{'}\}_{j \in ℤ}$ be iid copies of {ε_j}_{j∈ℤ}. There exist constants q > 0 and ρ ∈ (0, 1) such that

{‖ e_{0} - e_{0} (k) ‖}_{q} = O (ρ^{k}), where e_{0} (k) = G (ε_{0}, ε_{\pm 1}, \dots, ε_{\pm k}, ε_{\pm (k + 1)}^{'}, ε_{\pm (k + 2)}^{'}, \dots) .

(2.2)

In (2.2), e₀(k) can be viewed as a coupling process of e₀ with {ε_r}_{|r|≥k+1} replaced by the iid copy $\{ε_{r}^{'}\}_{∣ r ∣ \geq k + 1}$ while keeping the nearest 2k + 1 innovations {ε_r}_{|r|≤k}. In particular, if e₀ does not depend on {ε_r}_{|r|≥k+1}, then e₀(k) = e₀. Thus, ||e₀ − e₀(k)||_q quantifies the contribution of {ε_r}_{|r|≥k+1} to e₀, and (2.2) states that the contribution decays exponentially in k. Shao and Wu (2007) and Dedecker and Prieur (2005) [cf. Equation (4.2) therein] considered a one-sided causal version of (2.2) where e₀ depends only on {ε_r}_{r≤0}.

Propositions 2.1–2.2 below indicate that, if {e_j} satisfies (2.2), then its properly transformed process also satisfies (2.2).

Proposition 2.1

For 0 < ς ≤ 1 and υ ≥ 0, define the collection of functions h

H (ς, υ) = {h : ∣ h (x) - h (x^{'}) ∣ \leq c {∣ x - x^{'} ∣}^{ς} {(1 + ∣ x ∣ + ∣ x^{'} ∣)}^{υ}, x, x^{'} \in R},

(2.3)

where c is a constant. Suppose {e_j} satisfies (2.2). Then the transformed process $e_{j}^{*} = h (e_{j})$ satisfies (2.2) with (q, ρ) replaced by q^* = q/(ς + υ) and ρ^* = ρ^ς.

In (2.3), ℋ(ς, 0) is the class of uniformly Hölder-continuous functions with index ς. If h(x) = |x|^b, b > 1, then h ∈ ℋ(1, b − 1). Clearly, all functions in ℋ(ς, 0) are continuous. Interestingly, for non-continuous transformations, the conclusion may still hold; see Proposition 2.2 below, where 1 is the indicator function.

Proposition 2.2

Let e₀ have a bounded density. Suppose {e_j} satisfies (2.2). Then, for any given x, {1_{e_j≤x}} satisfies (2.2) with ρ replaced by ρ^* = ρ^{1/(1+q)}.

Propositions 2.1–2.2 along with the examples below show that the error structure (2.1) and Condition 2.1 are sufficiently general to accommodate many popular linear and nonlinear time series models and their properly transformed processes.

Example 2.1 (m-dependent sequence)

Assume that e_j = G(ε_j, ε_{j±1}, …, ε_{j±m}) for a measurable function G. Then e_j depends only on the nearest 2m + 1 innovations ε_j, ε_{j±1}, …, ε_{j±m}. Clearly, {e_j}_{j∈ℤ} form a (2m+1)-dependent sequence, ||e₀ − e₀(k)||_q = 0 for k ≥ m, and (2.2) trivially holds. If m = 0, then e_j are iid random variables.

Example 2.2 (Non-causal linear processes)

Consider the non-causal linear process $e_{j} = \sum_{r = - \infty}^{\infty} a_{r} ε_{j - r}$. If ε_j ∈ ℒ^q and a_j = O(ρ^{|j|}), then it is easy to see that (2.2) holds.
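The claim in Example 2.2 can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, that a_r = ρ^{|r|} exactly, that the innovations are standard normal, and that q = 2. In that case e₀ − e₀(k) is a sum over |r| > k of a_r times the difference of two independent innovations, so its variance is 2 Σ_{|r|>k} a_r² = 4ρ^{2(k+1)}/(1 − ρ²), which decays geometrically in k. The function names are hypothetical, and the Monte Carlo version truncates the two-sided filter at a finite lag.

```python
import numpy as np

def coupling_l2_closed_form(rho, k, sigma=1.0):
    """||e_0 - e_0(k)||_2 for e_j = sum_r rho**|r| * eps_{j-r}, Var(eps) = sigma**2."""
    return 2.0 * sigma * rho ** (k + 1) / np.sqrt(1.0 - rho ** 2)

def coupling_l2_monte_carlo(rho, k, trunc=40, n_rep=10000, seed=0):
    """Monte Carlo estimate with the filter truncated at |r| <= trunc and N(0, 1) innovations."""
    rng = np.random.default_rng(seed)
    r = np.arange(-trunc, trunc + 1)
    a = rho ** np.abs(r)                              # coefficients a_r
    eps = rng.standard_normal((n_rep, r.size))        # original innovations
    eps_prime = rng.standard_normal((n_rep, r.size))  # independent copies
    far = np.abs(r) > k                               # innovations beyond lag k
    eps_coupled = np.where(far, eps_prime, eps)       # keep the nearest 2k + 1
    e0 = eps @ a
    e0_k = eps_coupled @ a
    return float(np.sqrt(np.mean((e0 - e0_k) ** 2)))

for k in range(4):
    print(k, coupling_l2_closed_form(0.5, k), coupling_l2_monte_carlo(0.5, k))
```

Each increase of k by one multiplies the closed-form distance by ρ, which is the exponential decay required by (2.2).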

Example 2.3 (Iterated random functions)

Consider random variables e_j defined by

e_{j} = R (e_{j - 1}, \dots, e_{j - d}; ε_{j}),

(2.4)

where ε_j, j ∈ ℤ, are iid random innovations, and R is a random map. Many widely used time series models are of the form (2.4), including the threshold autoregressive model e_j = a max(e_{j−1}, 0) + b min(e_{j−1}, 0) + ε_j, the autoregressive conditional heteroscedastic model $e_{j} = ε_{j} {(a^{2} + b^{2} e_{j - 1}^{2})}^{1 / 2}$, the random coefficient model e_j = (a + bε_j)e_{j−1} + ε_j, and the exponential autoregressive model $e_{j} = [a + b \exp (- c e_{j - 1}^{2})] e_{j - 1} + ε_{j}$, among others. Suppose there exists z₀ such that R(z₀; ε₀) ∈ ℒ^q and there exist constants a₁, …, a_d such that

\sum_{j = 1}^{d} a_{j} < 1 and {‖ R (z; ε_{0}) - R (z^{'}; ε_{0}) ‖}_{q}^{1 \land q} \leq \sum_{j = 1}^{d} a_{j} {∣ z_{j} - z_{j}^{'} ∣}^{1 \land q}

holds for all z = (z₁, …, z_d), $z^{'} = (z_{1}^{'}, \dots, z_{d}^{'})$ . By Shao and Wu (2007), (2.2) holds.
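As a concrete instance of the contraction requirement, consider the threshold autoregressive map with d = 1. The map z ↦ a max(z, 0) + b min(z, 0) + ε is piecewise linear with slopes a and b, hence Lipschitz in z with constant max(|a|, |b|). The sketch below, with hypothetical function names and assumed parameter values a = 0.6 and b = −0.4, drives two paths started at different points with the same innovations and records how quickly they merge.

```python
import numpy as np

def tar_step(z, eps, a=0.6, b=-0.4):
    """One step of the threshold autoregressive map R(z; eps)."""
    return a * max(z, 0.0) + b * min(z, 0.0) + eps

def coupled_gaps(z0, z0_alt, n_steps, a=0.6, b=-0.4, seed=0):
    """Absolute gap between two paths that share the same innovations."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_steps)
    z, z_alt = float(z0), float(z0_alt)
    gaps = [abs(z - z_alt)]
    for t in range(n_steps):
        z = tar_step(z, eps[t], a, b)
        z_alt = tar_step(z_alt, eps[t], a, b)
        gaps.append(abs(z - z_alt))
    return np.array(gaps)

gaps = coupled_gaps(5.0, -5.0, n_steps=30)
lipschitz = max(abs(0.6), abs(-0.4))
print(gaps[:5], lipschitz)
```

Because the same innovation enters both paths, the gap after t steps is at most max(|a|, |b|)^t times the initial gap, so the influence of the starting value dies out geometrically whenever max(|a|, |b|) < 1.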

2.1. Some examples

The imposed dependence structure and hence the developed results in Sections 3–4 below are potentially applicable to a wide range of practical data types. We briefly mention some below.

Time series data

In the special case of n = 1, m₁ = m → ∞ and (x_1,j, Y_1,j, e_1,j) = (x_j, Y_j, e_j) for a stationary time series {e_j}, (1.1) becomes the usual nonparametric location-scale model Y_j = μ(x_j) + s(x_j)e_j with time series data. The latter model has been extensively studied under both the random-design case and the fixed-design case x_j = j/m. See Fan and Yao (2003) for an excellent introduction to various local least-squares based methods under mixing settings. Quantile regression based estimations have been studied in Truong and Stone (1992), Honda (2000), and Cai (2002) for mixing processes. Despite the popularity of mixing conditions, they are generally difficult to verify even for linear processes. For example, for the autoregressive model X_i = ρX_{i−1} + ε_i, ρ ∈ (0, 1/2], where ε_i are iid Bernoulli random variables with ℙ(ε_i = 1) = 1 − ℙ(ε_i = 0) = q ∈ (0, 1), the stationary solution is not strong mixing (Andrews, 1984). By contrast, as shown above, the imposed Condition 2.1 is easily verifiable for many linear and nonlinear time series models and their proper transformations.

Longitudinal data

For each subject i, if x_i,j is the j-th measurement time or the covariates at time j, Y_i,j is the corresponding response, and {e_i,j}_{j∈ℕ} is a stationary causal process [for example, e_j = G(ε_j, ε_{j−1}, ε_{j−2}, …) in (2.1) depends only on the past], then (1.1) becomes a typical longitudinal data setting. For example, Section 5.2 re-examines the well-studied progesterone data using the proposed methods. Another popular longitudinal data example is the CD4 cell percentage in HIV infection from the Multicenter AIDS Cohort Study. Based on least-squares methods, this data has been studied previously in Hoover et al. (1998) and Fan and Zhang (2000). We can examine how the response function (CD4 cell percentage) varies with measurement time (age) using the proposed robust estimation method in Section 4.

Spatially correlated data

In the vertical density data of Walker and Wright (2002), manufacturers are concerned about engineered wood boards’ density, which determines fiberboard’s overall quality. For each board, densities are measured at various locations along a designated vertical line. In this example, measurements depend on both the left and right neighboring measurements, and it is reasonable to impose the dependence structure (2.1). See Wei, Zhao and Lin (2012) for a detailed analysis. Also, as will be discussed in Section 6, the two-sided framework (2.1) can be extended to spatial lattice settings. We point out that the structure in (1.1) and (2.1) differs from the usual spatial model setting in the sense that (1.1) allows for observations from multiple independent subjects whereas the latter usually assumes that all observations are spatially correlated [see, e.g., Hallin, Lu and Yu (2009) for quantile regression of spatial data].

3. Weighted empirical process

In this section, we study weighted empirical processes through a coupling argument. Dependence is the main difficulty in extending results developed for independent data to dependent data. For mixing processes, the widely used large-block-small-block technique partitions the data into asymptotically independent blocks. Here, we adopt a coupling argument which copes well with the dependence structure in Section 2.

We now illustrate the basic idea. By (2.1), the error e_i,j in (1.1) has the representation

e_{i, j} = G (ε_{i, j}, ε_{i, j \pm 1}, ε_{i, j \pm 2}, \dots),

for iid innovations ε_i,j, i, j ∈ ℤ. Thus, {e_i,j}_{j∈ℤ} is a dependent series for each fixed i, whereas {e_{i₁,j}}_{j∈ℤ} and {e_{i₂,j}}_{j∈ℤ} are two independent series for i₁ ≠ i₂. Let $ε_{i, j, k}^{'}, i, j, k \in ℤ$, be iid copies of ε_i,j. For k_n ∈ ℕ, define the coupling process of e_i,j as

e_{i, j} (k_{n}) = G (ε_{i, j}, ε_{i, j \pm 1}, \dots, ε_{i, j \pm k_{n}}, ε_{i, j, j \pm (k_{n} + 1)}^{'}, ε_{i, j, j \pm (k_{n} + 2)}^{'}, \dots)

(3.1)

by replacing all but the nearest 2k_n + 1 innovations with iid copies. We call k_n the coupling lag. Clearly, e_i,j(k_n) has the same distribution as e_i,j.

By construction, for each fixed i, {e_i,j(k_n)}_{j∈ℤ} form a (2k_n + 1)-dependent sequence in the sense that e_i,j(k_n) and e_i,j′(k_n) are independent if |j − j′| ≥ 2k_n + 1. Consequently, for each fixed i and s, {e_{i,(j−1)(2k_n+1)+s}(k_n)}_{j∈ℤ} are iid. The latter property helps us reduce the dependent data to an independent case. On the other hand, under (2.2), ||e_i,j − e_i,j(k_n)||_q = O(ρ^{k_n}) is sufficiently small with properly chosen k_n and hence the coupling process is close enough to the original one. Similarly, for Y_i,j in (1.1), define its coupling process:

{\tilde{Y}}_{i, j} = μ (x_{i, j}) + s (x_{i, j}) e_{i, j} (k_{n}) .

(3.2)

First, we present a general result regarding the sum of functions of the coupling process Ỹ_i,j. Let 𝒱_n be any finite set. For real-valued functions g_i,j(y, v), i, j ∈ ℕ, defined on ℝ × 𝒱_n such that 𝔼[g_i,j(Ỹ_i,j, v)] = 0 for all v ∈ 𝒱_n, define

H_{n} (v) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} g_{i, j} ({\tilde{Y}}_{i, j}, v), v \in V_{n} .

Throughout let N_n = m₁ + ··· + m_n be the total number of observations.

Theorem 3.1

Assume that the cardinality |𝒱_n| of 𝒱_n and the coupling lag k_n grow no faster than a polynomial of N_n. Further assume |g_i,j(y, v)| ≤ c for a constant c < ∞, and for some sequence χ_n,

max_{v \in V_{n}} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} E [g_{i, j}^{2} ({\tilde{Y}}_{i, j}, v)] \leq χ_{n} .

(3.3)

(i) If χ_n = O(1), then max_{v∈𝒱_n} |H_n(v)| = O_p(k_n log N_n).
(ii) If sup_n log N_n/χ_n < ∞, then max_{v∈𝒱_n} |H_n(v)| = O_p[k_n(χ_n log N_n)^{1/2}].

By Theorem 3.1, the magnitude of max_{v∈𝒱_n} |H_n(v)| increases with the coupling lag k_n. Intuitively, as k_n increases, there is stronger dependence in the coupling process Ỹ_i,j and consequently a larger bound for H_n(v). Therefore, a small k_n is preferred in order to have a tight bound for H_n(v). On the other hand, a reasonably large k_n is needed in order for the coupling process to be sufficiently close to the original process. Under (2.2), for k_n = O(log N_n), the coupling process converges to the original one at a polynomial rate, and meanwhile the maximum bound in Theorem 3.1 is optimal up to a logarithmic factor. For example, if χ_n = O(1), then max_{v∈𝒱_n} |H_n(v)| = O_p[(log N_n)²]; if sup_n log N_n/χ_n < ∞, then max_{v∈𝒱_n} |H_n(v)| = O_p{[χ_n(log N_n)³]^{1/2}}.

In what follows we consider the special case of weighted empirical process, which plays an essential role in quantile regression. Let ϖ_i,j(x) ≥ 0 be non-random weights that may depend on x. Consider the weighted empirical process

F_{n} (x, y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x) 1_{Y_{i, j} \leq y} .

(3.4)

To study F_n(x, y), recall Ỹ_i,j in (3.2) and define the coupling empirical process

{\tilde{F}}_{n} (x, y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x) 1_{{\tilde{Y}}_{i, j} \leq y} .

(3.5)

Under mild regularity conditions, Theorem 3.2 below states that F_n(x, y) can be uniformly approximated by F̃_n(x, y) with proper choice of the coupling lag k_n.

Condition 3.1

(i) ϖ_i,j(x) ≤ c uniformly for some constant c < ∞. (ii) μ(x_i,j) is uniformly bounded. (iii) s(x_i,j) > 0 is uniformly bounded away from zero and infinity.

Theorem 3.2

Assume that Conditions 2.1 and 3.1 hold. In (3.1), let the coupling lag k_n = ⌊λ log N_n⌋ for some λ > (q + 1)/[q log(1/ρ)], where N_n = m₁ + ··· + m_n. Then

sup_{x, y \in R} ∣ F_{n} (x, y) - {\tilde{F}}_{n} (x, y) ∣ = O_{p} [{(log N_{n})}^{2}] .
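The lag prescribed in Theorem 3.2 is easy to compute once q and ρ from Condition 2.1 are given. The following sketch uses a hypothetical helper `coupling_lag`; the `slack` factor, which pushes λ slightly above the threshold (q + 1)/[q log(1/ρ)], is an assumption made for illustration, since the theorem only requires the strict inequality.

```python
import math

def coupling_lag(N, q, rho, slack=1.05):
    """Return (k_n, lam) with k_n = floor(lam * log N) and lam above the threshold."""
    if not (0.0 < rho < 1.0) or q <= 0 or N < 2 or slack <= 1.0:
        raise ValueError("need 0 < rho < 1, q > 0, N >= 2 and slack > 1")
    lam_min = (q + 1.0) / (q * math.log(1.0 / rho))   # threshold for lambda
    lam = slack * lam_min                             # strictly above the threshold
    return math.floor(lam * math.log(N)), lam

k_n, lam = coupling_lag(N=1000, q=2.0, rho=0.5)
print(k_n, lam)
```

The lag grows only logarithmically in the total sample size, so even large data sets lead to short coupling windows.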

To study asymptotic Bahadur representations of quantile regression estimates, a key step is to study the modulus of continuity of F_n(x, y), defined by

D_{n} (δ, x, y) = {F_{n} (x, y + δ) - E [F_{n} (x, y + δ)]} - {F_{n} (x, y) - E [F_{n} (x, y)]} .

(3.6)

Intuitively, D_n(δ, x, y) measures the oscillation of the centered empirical process F_n(x, y) − 𝔼[F_n(x, y)] in response to a small perturbation δ in y.
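The quantities F_n(x, y) in (3.4) and D_n(δ, x, y) in (3.6) can be evaluated directly when the data-generating model is known, for instance in a simulation. The sketch below takes the weights ϖ_i,j(x) as a pooled array already evaluated at the point x, and assumes standard normal errors so that ℙ(Y_i,j ≤ y) is the normal distribution function evaluated at (y − μ(x_i,j))/s(x_i,j). All function names are hypothetical.

```python
import math
import numpy as np

# Standard normal distribution function (assumed error distribution)
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def weighted_ecdf(w, Y, y):
    """F_n(x, y): sum of weights over observations with Y <= y."""
    return float(np.sum(w * (Y <= y)))

def expected_weighted_ecdf(w, mu_vals, s_vals, y, error_cdf=_norm_cdf):
    """E[F_n(x, y)] when Y = mu + s * e and e has distribution function error_cdf."""
    return float(np.sum(w * error_cdf((y - mu_vals) / s_vals)))

def oscillation(w, Y, mu_vals, s_vals, y, delta, error_cdf=_norm_cdf):
    """D_n(delta, x, y): change of the centered process between y and y + delta."""
    def centered(level):
        return (weighted_ecdf(w, Y, level)
                - expected_weighted_ecdf(w, mu_vals, s_vals, level, error_cdf))
    return centered(y + delta) - centered(y)

w = np.array([1.0, 2.0, 0.5])
Y = np.array([0.1, 0.5, 0.9])
mu_vals = np.zeros(3)
s_vals = np.ones(3)
print(weighted_ecdf(w, Y, 0.5), oscillation(w, Y, mu_vals, s_vals, 0.5, 0.2))
```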

The dependence structure in Section 2 along with the coupling argument provides a convenient framework to study D_n(δ, x, y). Recall F̃_n(x, y) in (3.5). For D_n(δ, x, y) in (3.6), define its coupling process

{\tilde{D}}_{n} (δ, x, y) = {{\tilde{F}}_{n} (x, y + δ) - E [{\tilde{F}}_{n} (x, y + δ)]} - {{\tilde{F}}_{n} (x, y) - E [{\tilde{F}}_{n} (x, y)]} .

(3.7)

Notice that e_i,j(k_n) and e_i,j have the same distribution, so 𝔼[F_n(x, y)] = 𝔼[F̃_n(x, y)]. By Theorem 3.2, it is easy to see that, uniformly over x, y, δ,

∣ D_{n} (δ, x, y) - {\tilde{D}}_{n} (δ, x, y) ∣ \leq 2 sup_{x, y \in R} ∣ F_{n} (x, y) - {\tilde{F}}_{n} (x, y) ∣ = O_{p} [{(log N_{n})}^{2}] .

(3.8)

Therefore, the asymptotic properties of D_n(δ, x, y) are similar to those of D̃_n(δ, x, y), which can be studied through Theorem 3.1.

Condition 3.2

(i) ϖ_i,j(·) = 0 outside a common bounded interval for all i, j. (ii) There exist τ_n and φ_n such that

sup_{x \neq x^{'}} \frac{∣ ϖ_{i, j} (x) - ϖ_{i, j} (x^{'}) ∣}{∣ x - x^{'} ∣} \leq τ_{n} and sup_{x} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j}^{2} (x) \leq φ_{n} .

(3.9)

Theorem 3.3

Assume that Conditions 2.1 and 3.1–3.2 hold. Further assume δ_n → 0, sup_n log N_n/(δ_nφ_n) < ∞, and that 1/δ_n + τ_n grows no faster than a polynomial of N_n. Then

sup_{∣ δ ∣ \leq δ_{n}, x, y \in R} ∣ D_{n} (δ, x, y) ∣ = O_{p} {{[δ_{n} φ_{n} {(log N_{n})}^{3}]}^{1 / 2}} .

(3.10)

4. Quantile regression and Bahadur representations

For a random variable Z, denote by med(Z) = inf{z ∈ ℝ : ℙ(Z ≤ z) ≥ 1/2} the median of Z, and similarly denote by med(·|·) the conditional median operator. To ensure identifiability of μ and s in (1.1), without loss of generality we assume med(e_i,j) = 0 and med(|e_i,j|) = 1.

Note that the conditional median of Y_i,j given x_i,j = x, written med(Y_i,j | x_i,j = x), equals μ(x). Applying a kernel localization technique, we propose the following least-absolute-deviation or median quantile estimate of μ(x):

\hat{μ} (x) = \underset{θ}{argmin} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ∣ Y_{i, j} - θ ∣ K_{b_{n}} (x_{i, j} - x), where K_{b_{n}} (u) = K (u / b_{n})

(4.1)

for a non-negative kernel function K satisfying ∫_ℝ K(u) du = 1, and b_n > 0 is a bandwidth. The estimate μ̂(x) pools together information across all subjects, an appealing property especially when some subjects have sparse observations. By the Bahadur representation in Theorem 4.1 below, the bias term of μ̂(x) − μ(x) is of order $O (b_{n}^{2})$. Following Wu and Zhao (2007), we adopt a jackknife bias-correction technique. In (4.1), denote by μ̂(x|b_n) and $\hat{μ} (x ∣ \sqrt{2} b_{n})$ the estimates of μ(x) using bandwidth b_n and $\sqrt{2} b_{n}$, respectively. The bias-corrected jackknife estimator is

\tilde{μ} (x) = 2 \hat{μ} (x ∣ b_{n}) - \hat{μ} (x ∣ \sqrt{2} b_{n}),

(4.2)

which can remove the second-order bias term $O (b_{n}^{2})$ in μ̂(x).
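The minimizer in (4.1) is a kernel-weighted median of the pooled responses, so it can be computed by sorting rather than by numerical optimization. The sketch below is a minimal illustration under assumptions that are not part of the paper: it uses the Epanechnikov kernel, takes pooled one-dimensional arrays `x_obs` and `y_obs`, and returns the smallest minimizer when the minimizer is not unique. The function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed choice of K)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def weighted_median(values, weights):
    """Smallest theta minimizing sum_k weights[k] * |values[k] - theta|."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = weights > 0
    v, w = values[keep], weights[keep]
    if v.size == 0:
        return float("nan")                      # no data in the kernel window
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, 0.5 * cum[-1])    # first index with cum >= half the total
    return float(v[idx])

def mu_hat(x, x_obs, y_obs, b, kernel=epanechnikov):
    """Local median estimate (4.1) at the point x with bandwidth b."""
    return weighted_median(y_obs, kernel((np.asarray(x_obs) - x) / b))

def mu_tilde(x, x_obs, y_obs, b, kernel=epanechnikov):
    """Jackknife bias-corrected estimate (4.2)."""
    return (2.0 * mu_hat(x, x_obs, y_obs, b, kernel)
            - mu_hat(x, x_obs, y_obs, np.sqrt(2.0) * b, kernel))

x_obs = np.linspace(0.0, 1.0, 11)
y_obs = np.full(11, 4.0)
print(mu_hat(0.5, x_obs, y_obs, 0.2), mu_tilde(0.5, x_obs, y_obs, 0.2))
```

Only observations with |x_i,j − x| within the bandwidth receive positive weight, and a single extreme response inside the window moves the estimate very little, which is the robustness property motivating the absolute loss.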

After estimating μ(·), we estimate s(·) based on residuals. Notice that the normalization that |e_i,j| has median 1 implies that the conditional median of |Y_i,j − μ(x)| given x_i,j = x equals s(x). Therefore, we propose

\hat{s} (x) = \underset{θ}{argmin} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} | ∣ Y_{i, j} - \tilde{μ} (x) ∣ - θ | K_{h_{n}} (x_{i, j} - x),

(4.3)

where h_n > 0 is another bandwidth, and μ̃(x) is the bias-corrected jackknife estimator in (4.2). As in (4.2), we adopt the following bias-corrected jackknife estimator

\tilde{s} (x) = 2 \hat{s} (x ∣ h_{n}) - \hat{s} (x ∣ \sqrt{2} h_{n}) .

(4.4)
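The scale estimates (4.3) and (4.4) are again kernel-weighted medians, now applied to the absolute residuals |Y_i,j − μ̃(x)|. The sketch below is a minimal illustration with hypothetical function names; it assumes the Epanechnikov kernel and takes the location estimate at x as an input `mu_at_x`, which in practice would be the jackknife estimate from (4.2).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed choice of K)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def weighted_median(values, weights):
    """Smallest theta minimizing sum_k weights[k] * |values[k] - theta|."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = weights > 0
    v, w = values[keep], weights[keep]
    if v.size == 0:
        return float("nan")
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum = np.cumsum(w)
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])

def s_hat(x, x_obs, y_obs, mu_at_x, h, kernel=epanechnikov):
    """Scale estimate (4.3): local median of |Y - mu_at_x| with bandwidth h."""
    abs_resid = np.abs(np.asarray(y_obs, dtype=float) - mu_at_x)
    return weighted_median(abs_resid, kernel((np.asarray(x_obs) - x) / h))

def s_tilde(x, x_obs, y_obs, mu_at_x, h, kernel=epanechnikov):
    """Jackknife bias-corrected scale estimate (4.4)."""
    return (2.0 * s_hat(x, x_obs, y_obs, mu_at_x, h, kernel)
            - s_hat(x, x_obs, y_obs, mu_at_x, np.sqrt(2.0) * h, kernel))

x_obs = np.linspace(0.0, 1.0, 21)
y_obs = 2.0 + (-1.0) ** np.arange(21)     # responses alternate between 1 and 3
print(s_hat(0.5, x_obs, y_obs, 2.0, 0.2), s_tilde(0.5, x_obs, y_obs, 2.0, 0.2))
```

Since the location estimate is needed only at the evaluation point x, this version requires one location fit per grid point rather than one per observation.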

Remark 4.1

Since the conditional median of |Y_i,j − μ(x_i,j)| given x_i,j = x equals s(x), an alternative estimator of s(x) is

\bar{s} (x) = \underset{θ}{argmin} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} | ∣ Y_{i, j} - \tilde{μ} (x_{i, j}) ∣ - θ | K_{h_{n}} (x_{i, j} - x) .

(4.5)

The difference between (4.3) and (4.5) is that (4.3) uses μ̃(x) whereas (4.5) uses μ̃(x_i,j). Since K has bounded support, only those x_i,j’s with |x_i,j − x| = O(h_n) contribute to the summation in (4.5). Thus, as h_n → 0 so that x_i,j → x and μ̃(x_i,j) ≈ μ̃(x), the two estimators in (4.3) and (4.5) are expected to be asymptotically close. Our use of (4.3) has some technical and computational advantages. First, the estimation error μ̃(x_i,j)−μ(x_i,j) varies with (i, j), and thus it is technically more challenging to study (4.5). Second, to implement (4.5), we need to compute μ̃(·) at each point x_i,j, which requires solving a large number of optimization problems in (4.1) for a large data set. By contrast, (4.3) only requires estimation of μ̃(·) at those grid points x at which we wish to estimate s(·).

To study asymptotic properties, we need to introduce some regularity conditions. Throughout we write 𝒮_ε([a, b]) = [a + ε, b − ε] for an arbitrarily fixed small ε > 0. Denote by F_e and $f_{e} = F_{e}^{'}$ the distribution and density functions of e₀ in (2.1), respectively. The assumption that e₀ has median 0 and |e₀| has median 1 implies F_e(0) = 1/2 and F_e(1) − F_e(−1) = 1/2.

Condition 4.1

Suppose that all measurement locations x_i,j are within an interval [a, b], and order them as a = x̃₀ < x̃₁ < ··· < x̃_{N_n} < x̃_{N_n+1} = b. Assume that

max_{0 \leq k \leq N_{n}} | {\tilde{x}}_{k + 1} - {\tilde{x}}_{k} - \frac{b - a}{N_{n}} | = O (N_{n}^{- 2}), where N_{n} = m_{1} + \dots + m_{n} .

(4.6)

Condition 4.1 requires that the pooled covariates x_i,j should be approximately uniformly dense on [a, b], which is a natural condition since otherwise it would be impossible to draw inferences for regions with very scarce observations. Pooling all subjects together is an appealing procedure to ensure this uniform denseness even though each single subject may only contain sparse measurements.
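The left-hand side of (4.6) is straightforward to compute for a given pooled design. The sketch below uses a hypothetical helper `design_discrepancy` and, as an illustrative assumption, an exactly equispaced interior design x̃_k = a + k(b − a)/(N_n + 1). For that design every gap equals (b − a)/(N_n + 1), so the discrepancy is (b − a)/[N_n(N_n + 1)], which is of order N_n^{−2} as (4.6) requires.

```python
import numpy as np

def design_discrepancy(x_pooled, a, b):
    """max_k | x_(k+1) - x_(k) - (b - a)/N | with x_(0) = a and x_(N+1) = b."""
    x = np.sort(np.asarray(x_pooled, dtype=float))
    n_obs = x.size
    grid = np.concatenate(([a], x, [b]))      # add the interval end points
    gaps = np.diff(grid)                      # N + 1 consecutive gaps
    return float(np.max(np.abs(gaps - (b - a) / n_obs)))

x_equi = np.arange(1, 10) / 10.0              # N = 9 equispaced interior points
print(design_discrepancy(x_equi, 0.0, 1.0))   # equals 1/90 up to rounding
```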

In nonparametric regression problems, there are two typical settings on the design points: fixed-design and random-design points. For the fixed-design case, it is often assumed that the design points are equally spaced on some interval. For example, for the vertical density profile data of Walker and Wright (2002), the density was measured at equispaced points along a designated vertical line of wood boards. Condition 4.1 can be viewed as a generalization of the fixed-design points to allow for approximately fixed-design points. For the random-design case, the design points are sampled from a distribution. For example, Assumption (a) in Appendix A of Fan and Zhang (2000) imposed the random-design condition. In practice, the two settings have different ranges of applicability. For example, for daily or monthly temperature series, the fixed-design setting may be appropriate; for children’s growth curve studies, it may be more reasonable to use the random-design setting since the measurements are usually taken at irregular time points.

Remark 4.2 (Asymptotic results under the random-design case)

All our subsequent theoretical results are derived under the approximate fixed-design setting in Condition 4.1, but the same argument also applies to the random-design case. Specifically, assume that the design-points {x_i,j} are random samples from a density f_X(·) with support [a, b] and that x is an interior point. Then, for the design-adaptive local linear median quantile regression estimates, the subsequent Theorems 4.1–4.2 and Corollaries 4.1–4.2 still hold with (b − a) therein replaced by 1/f_X(x). In fact, given the iid structure of {x_i,j}, the technical argument becomes much easier. For example, to establish Lemma 7.1 (again, with (b − a) therein replaced by 1/f_X(x)), elementary calculations can easily find the mean and variance for the right hand side of (7.11). All other proofs can be similarly modified and we omit the details.

Conditions 4.2–4.3 below are standard assumptions in nonparametric estimation.

Condition 4.2

K is symmetric and has bounded support and bounded derivative. Write

ϕ_{K} = \int_{R} K^{2} (u) d u and ψ_{K} = \frac{1}{2} \int_{R} u^{2} K (u) d u .
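For a specific kernel the constants ϕ_K and ψ_K are simple one-dimensional integrals. The sketch below evaluates them by the trapezoidal rule for the Epanechnikov kernel, which is an assumed example and not a choice made in the text; for that kernel the exact values are ϕ_K = 3/5 and ψ_K = 1/10. The helper names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (an assumed choice of K)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def trapezoid(f_vals, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(grid)))

def kernel_constants(kernel, lower=-1.0, upper=1.0, n_grid=20001):
    """Return (phi_K, psi_K, mass) for a kernel supported on [lower, upper]."""
    u = np.linspace(lower, upper, n_grid)
    k = kernel(u)
    phi_K = trapezoid(k ** 2, u)              # integral of K^2
    psi_K = 0.5 * trapezoid(u ** 2 * k, u)    # half the second moment of K
    mass = trapezoid(k, u)                    # should be close to 1
    return phi_K, psi_K, mass

phi_K, psi_K, mass = kernel_constants(epanechnikov)
print(phi_K, psi_K, mass)
```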

Condition 4.3

μ, s ∈ Inline graphic ([a, b]), inf_x_∈[_a,b_] s(x) > 0, f_e ∈ (ℝ), f_e(0) > 0, f_e(1) +f_e(−1) > 0.

4.1. Uniform Bahadur representation for μ̂(x)

Theorem 4.1 below provides an asymptotic uniform Bahadur representation for μ̂(x) in (4.1), and its proof in Section 7.4 relies on the arguments and results in Section 3.

Theorem 4.1

Let μ̂(x) be as in (4.1). Assume that Conditions 2.1 and 4.1–4.3 hold. Further assume b_n → 0 and (log N_n)³/(N_nb_n) → 0. Then:

(i) We have the uniform consistency:

$\sup_{x \in S_{ε} ([a, b])} ∣ \hat{μ} (x) - μ (x) ∣ = O_{p} \left\{ b_{n}^{2} + \frac{{(\log N_{n})}^{3 / 2}}{{(N_{n} b_{n})}^{1 / 2}} \right\} .$

(4.7)

(ii) Moreover, the Bahadur representation

$\hat{μ} (x) - μ (x) = ψ_{K} ρ_{μ} (x) b_{n}^{2} + \frac{(b - a) s (x)}{f_{e} (0)} \frac{Q_{b_{n}} (x)}{N_{n} b_{n}} + O_{p} (r_{n})$

(4.8)

holds uniformly over x ∈ 𝒮_ε([a, b]), where

$\begin{array}{l} ρ_{μ} (x) = μ^{″} (x) - \left[ \frac{μ^{'} (x) f_{e}^{'} (0)}{f_{e} (0)} + 2 s^{'} (x) \right] \frac{μ^{'} (x)}{s (x)}, \\ Q_{b_{n}} (x) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} \left\{ 1_{Y_{i, j} \leq μ (x)} - E [1_{Y_{i, j} \leq μ (x)}] \right\} K_{b_{n}} (x_{i, j} - x), \\ r_{n} = b_{n}^{4} + \frac{b_{n}^{1 / 2} {(\log N_{n})}^{3 / 2}}{N_{n}^{1 / 2}} + \frac{{(\log N_{n})}^{9 / 4}}{{(N_{n} b_{n})}^{3 / 4}} . \end{array}$

In the Bahadur representation (4.8), $ψ_{K} ρ_{μ} (x) b_{n}^{2}$ is the bias term, Q_{b_n} (x) determines the asymptotic distribution of μ̂(x) − μ(x), and r_n is the negligible error term. Such a Bahadur representation provides a powerful tool in studying the asymptotic behavior of μ̂(x). Based on Theorem 4.1, we obtain a Central Limit Theorem (CLT) for μ̂ in Corollary 4.1. Clearly, the variance of Q_{b_n} (x) is a linear combination of K_{b_n} (x_i,j₁ − x)K_{b_n} (x_i,j₂ − x). The following regularity condition is needed to ensure the negligibility of the cross-term K_{b_n} (x_i,j₁ − x)K_{b_n} (x_i,j₂ − x) for j₁ ≠ j₂.

Condition 4.4

Assume that, for all given x ∈ 𝒮_ε([a, b]) and k_n = O(log N_n), there exists ι_n such that k_nι_n → 0 and

\sum_{(i, j_{1}, j_{2}) \in I} K_{b_{n}} (x_{i, j_{1}} - x) K_{b_{n}} (x_{i, j_{2}} - x) = O [min (h, M_{n}) n b_{n} k_{n} ι_{n}], M_{n} = max_{1 \leq i \leq n} m_{i},

(4.9)

for all h ≥ (k_n ∨ a), where ℐ = {(i, j₁, j₂) : 1 ≤ i ≤ n, a ≤ j₁ < j₂ ≤ min(a + h − 1, m_i), |j₁ − j₂| ≤ k_n}. Further assume that $\max_{j} \sum_{i = 1}^{n} K_{b_{n}}^{r} (x_{i, j} - x) = O (n b_{n})$, r = 2, 4.

Condition 4.4 is very mild. Intuitively, if we regard the x_i,j as random locations, then $E [K_{b_{n}} (x_{i, j_{1}} - x) K_{b_{n}} (x_{i, j_{2}} - x)] = O (b_{n}^{2})$ for j₁ ≠ j₂. Thus, under the mild condition b_n log N_n → 0, (4.9) holds with ι_n = b_n.

Corollary 4.1

Let the conditions in Theorem 4.1 be fulfilled and Condition 4.4 hold. Further assume that ${(\log N_{n})}^{9} / (N_{n} b_{n}) + N_{n} b_{n}^{9} \to 0$ and nM_n = O(N_n), nb_n → ∞, $\log N_{n} = O (\sqrt{M_{n}})$, where M_n is defined as in (4.9). Then, for any x ∈ 𝒮_ε([a, b]), we have

{(N_{n} b_{n})}^{1 / 2} [\hat{μ} (x) - μ (x) - ψ_{K} ρ_{μ} (x) b_{n}^{2}] \Rightarrow N (0, \frac{ϕ_{K} (b - a) s^{2} (x)}{4 f_{e}^{2} (0)}) .

(4.10)

The proof of Corollary 4.1, given in Section 7.5, uses the coupling argument in Section 3. The condition nM_n = O(N_n) is in line with the classical CLT Lindeberg condition that none of the subjects dominates the others. If b_n is of the order $N_{n}^{- β}$ , then the bandwidth condition in Corollary 4.1 holds if β ∈ (1/9, 1). By Corollary 4.1, the optimal bandwidth minimizing the asymptotic mean squared error is

b_{n} = {[\frac{ϕ_{K} (b - a) s^{2} (x)}{16 ψ_{K}^{2} ρ_{μ}^{2} (x) f_{e}^{2} (0)}]}^{1 / 5} N_{n}^{- 1 / 5} .

(4.11)

For this optimal bandwidth, the bias term is of order $O (N_{n}^{- 2 / 5})$ and contains the derivatives s′, μ′, μ″ and $f_{e}^{'}$ that can be difficult to estimate. Based on the Bahadur representation (4.8), we can correct the bias term $ψ_{K} ρ_{μ} (x) b_{n}^{2}$ via the jackknife estimator μ̃(x) in (4.2). Then the bias term for μ̃(x) becomes $2 ψ_{K} ρ_{μ} (x) b_{n}^{2} - ψ_{K} ρ_{μ} (x) {(\sqrt{2} b_{n})}^{2} = 0$ . By (4.8), following the proof of Corollary 4.1, we have

{(N_{n} b_{n})}^{1 / 2} [\tilde{μ} (x) - μ (x)] \Rightarrow N (0, \frac{ϕ_{K^{*}} (b - a) s^{2} (x)}{4 f_{e}^{2} (0)}),

(4.12)

where $K^{*} (u) = 2 K (u) - 2^{- 1 / 2} K (u / \sqrt{2})$ .
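The bias cancellation behind (4.12) can be checked numerically: the combined kernel K* integrates to one while its second moment vanishes, which is why the b_n² bias term drops out. The sketch below is only an illustration; the Epanechnikov kernel and the helper names `k_star` and `integrate` are assumptions made for this example, not choices taken from the paper.

```python
import math


def epanechnikov(u):
    """Illustrative symmetric kernel on [-1, 1] (an assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def k_star(u, kernel=epanechnikov):
    """Jackknife kernel K*(u) = 2 K(u) - 2^{-1/2} K(u / sqrt(2))."""
    return 2.0 * kernel(u) - kernel(u / math.sqrt(2.0)) / math.sqrt(2.0)


def integrate(func, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))


# K* is supported on [-sqrt(2), sqrt(2)], so [-2, 2] covers its support.
mass = integrate(k_star, -2.0, 2.0)
second_moment = integrate(lambda u: u * u * k_star(u), -2.0, 2.0)

print(f"integral of K*      : {mass:.6f}")
print(f"second moment of K* : {second_moment:.6f}")
```

The same two checks go through for any symmetric kernel with finite second moment, since the rescaled copy K(u/√2)/√2 has unit mass and twice the second moment of K.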

4.2. Uniform Bahadur representation for ŝ(x)

Theorem 4.2 below provides a uniform Bahadur representation for ŝ(x) in (4.3).

Theorem 4.2

Let ŝ(x) be as in (4.3). Assume that the conditions in Theorem 4.1 hold. Further assume h_n + (log N_n)³/(N_nh_n) → 0. Then

We have the uniform consistency:
$sup_{x \in S_{ε} ([a, b])} ∣ \hat{s} (x) - s (x) ∣ = O_{p} {b_{n}^{2} + h_{n}^{2} + \frac{{(log N_{n})}^{3 / 2}}{{(N_{n} b_{n})}^{1 / 2}} + \frac{{(log N_{n})}^{3 / 2}}{{(N_{n} h_{n})}^{1 / 2}}} .$ (4.13)

Moreover, the Bahadur representation

\hat{s} (x) - s (x) = ψ_{K} ρ_{s} (x) h_{n}^{2} + (b - a) s (x) [\frac{W_{h_{n}} (x)}{N_{n} h_{n} κ_{+}} - \frac{κ T_{b_{n}} (x)}{N_{n} b_{n} f_{e} (0)}] + O_{p} ({\tilde{r}}_{n}),

(4.14)

holds uniformly over x ∈ $S_{ε} ([a, b])$, where κ₊ = f_e(−1) + f_e(1), κ = [f_e(1) − f_e(−1)]/κ₊, Q_{b_n}(x) is defined as in Theorem 4.1,

\begin{array}{l} T_{b_{n}} (x) = 2 Q_{b_{n}} (x) - 2^{- 1 / 2} Q_{\sqrt{2} b_{n}} (x), \\ ρ_{s} (x) = s^{″} (x) - \frac{2 s^{'} {(x)}^{2}}{s (x)} + κ [μ^{″} (x) - \frac{2 μ^{'} (x) s^{'} (x)}{s (x)}] - \frac{f_{e}^{'} (1) {[s^{'} (x) + μ^{'} (x)]}^{2} - f_{e}^{'} (- 1) {[s^{'} (x) - μ^{'} (x)]}^{2}}{κ_{+} s (x)}, \\ W_{h_{n}} (x) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} {1_{∣ Y_{i, j} - μ (x) ∣ \leq s (x)} - E [1_{∣ Y_{i, j} - μ (x) ∣ \leq s (x)}]} K_{h_{n}} (x_{i, j} - x), \end{array}

{\tilde{r}}_{n} = b_{n}^{4} + h_{n}^{4} + \frac{h_{n}^{1 / 2} {(log N_{n})}^{3 / 2}}{N_{n}^{1 / 2}} + \frac{{(log N_{n})}^{9 / 4}}{{(N_{n} h_{n})}^{3 / 4}} + \frac{{(log N_{n})}^{9 / 4}}{N_{n}^{3 / 4} b_{n}^{1 / 4} h_{n}^{1 / 2}} + \frac{b_{n} {(log N_{n})}^{3 / 2}}{{(N_{n} h_{n})}^{1 / 2}} .

As in Corollary 4.1, we can use the Bahadur representation (4.14) to obtain a CLT for ŝ(x) − s(x). However, the convergence rate depends on the ratio h_n/b_n. If h_n/b_n → ∞, then the term T_{b_n}(x)/(N_nb_n) dominates and we have (N_nb_n)^{1/2}-convergence; if h_n/b_n → 0, then the term W_{h_n}(x)/(N_nh_n) dominates and we have (N_nh_n)^{1/2}-convergence; if h_n/b_n → c for a constant c ∈ (0, ∞), then both terms contribute.

Corollary 4.2

Let the conditions in Theorem 4.2 be fulfilled and Condition 4.4 and its counterpart version with b_n being replaced by h_n hold. Further assume that

N_{n} {(b_{n} \lor h_{n})}^{9} + \frac{{(log N_{n})}^{9}}{N_{n} (b_{n} \land h_{n})} \to 0,

and nM_n = O(N_n), n(b_n ∧ h_n) → ∞, $log N_{n} = O (\sqrt{M_{n}})$, where M_n is defined as in (4.9). Recall $K^{*} (u) = 2 K (u) - 2^{- 1 / 2} K (u / \sqrt{2})$ in (4.12) and κ, κ₊ in Theorem 4.2. Let x ∈ $S_{ε} ([a, b])$ be a fixed point. Suppose h_n/b_n → c.

If κ ≠ 0 and c = ∞, then
${(N_{n} b_{n})}^{1 / 2} [\hat{s} (x) - s (x) - ψ_{K} ρ_{s} (x) h_{n}^{2}] \Rightarrow N (0, \frac{ϕ_{K^{*}} κ^{2} (b - a) s^{2} (x)}{4 f_{e}^{2} (0)}) .$
If κ ≠ 0 and c ∈ [0, ∞), then
${(N_{n} h_{n})}^{1 / 2} [\hat{s} (x) - s (x) - ψ_{K} ρ_{s} (x) h_{n}^{2}] \Rightarrow N (0, σ_{c}^{2}),$ (4.15)

where
$σ_{c}^{2} = \frac{(b - a) s^{2} (x)}{4} {\frac{ϕ_{K}}{κ_{+}^{2}} + \frac{c^{2} κ^{2} ϕ_{K^{*}}}{f_{e}^{2} (0)} - \frac{2 c κ [1 - 4 F_{e} (- 1)]}{κ_{+} f_{e} (0)} \int_{R} K (u) K^{*} (c u) d u} .$
If κ = 0, then for all c ∈ [0, ∞], (4.15) holds with $σ_{c}^{2} = ϕ_{K} (b - a) s^{2} (x) / (4 κ_{+}^{2})$ .

One can similarly establish CLT results for s̃(x) in (4.4). We omit the details.

5. An illustration using real data

5.1. Bandwidth selection

For least-squares based estimation of longitudinal data, Rice and Silverman (1991) suggested the subject-based cross-validation method. The basic idea is to use all but one subject to do model fitting, validate the fitted model using the left-out subject, and finally choose the optimal bandwidth by minimizing the overall prediction error:

b_{LS}^{*} = \underset{b}{argmin} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} {Y_{i, j} - {\tilde{μ}}^{(- i)} (x_{i, j})}^{2},

(5.1)

where μ̃⁽⁻ⁱ⁾(x) represents the estimator of μ(x) based on data from all but the ith subject. As in Wei, Zhao and Lin (2012), we replace the square loss by the absolute deviation:

b_{LAD}^{*} = \underset{b}{argmin} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ∣ Y_{i, j} - {\tilde{μ}}^{(- i)} (x_{i, j}) ∣ .

(5.2)
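The leave-one-subject-out criterion (5.2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the local fit is taken to be a kernel-weighted median, the jackknife combination 2μ̂_b − μ̂_{√2 b} is inferred from the bias calculation in Section 4.1, and the Epanechnikov kernel, all function names, and the synthetic data are hypothetical.

```python
import math


def epanechnikov(u):
    """Illustrative kernel (an assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def weighted_median(values, weights):
    """A minimizer of sum_k weights[k] * |values[k] - theta| over theta."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def lad_fit(x0, xs, ys, bandwidth):
    """Local-constant LAD fit at x0; None if no design point gets weight."""
    pairs = [(y, epanechnikov((x - x0) / bandwidth)) for x, y in zip(xs, ys)]
    pairs = [(y, w) for y, w in pairs if w > 0.0]
    if not pairs:
        return None
    return weighted_median([y for y, _ in pairs], [w for _, w in pairs])


def jackknife_fit(x0, xs, ys, bandwidth):
    """Assumed bias-corrected fit: 2 * fit(b) - fit(sqrt(2) * b)."""
    narrow = lad_fit(x0, xs, ys, bandwidth)
    wide = lad_fit(x0, xs, ys, math.sqrt(2.0) * bandwidth)
    if narrow is None or wide is None:
        return None
    return 2.0 * narrow - wide


def cv_score(subjects, bandwidth):
    """Criterion (5.2): absolute prediction error, leaving out one subject at a time."""
    total = 0.0
    for left_out, (xs_i, ys_i) in enumerate(subjects):
        rest = [s for k, s in enumerate(subjects) if k != left_out]
        train_x = [x for xs, _ in rest for x in xs]
        train_y = [y for _, ys in rest for y in ys]
        for x, y in zip(xs_i, ys_i):
            fit = jackknife_fit(x, train_x, train_y, bandwidth)
            if fit is None:
                return math.inf
            total += abs(y - fit)
    return total


def select_bandwidth(subjects, grid):
    """Pick the grid value with the smallest cross-validation score."""
    return min(grid, key=lambda b: cv_score(subjects, b))


# Synthetic data: 5 subjects observed on a common design, each with a small vertical offset.
design = [k / 10 for k in range(11)]
subjects = [(design, [math.sin(3 * x) + 0.05 * (i - 2) for x in design]) for i in range(5)]
grid = [0.15, 0.3, 0.6]
best = select_bandwidth(subjects, grid)
print("selected bandwidth:", best)
```

Leaving out a whole subject, rather than a single observation, keeps the within-subject dependence from leaking into the validation step.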

5.2. An illustration using progesterone data

Urinary metabolite progesterone levels are measured daily, around the ovulation day, over 22 conceptive and 69 nonconceptive women’s menstrual cycles so that each curve has about 24 design points; see the left panel of Figure 1 for a plot of the trajectories of the 22 conceptive women. Previous studies based on least-squares (LS) methods include Brumback and Rice (1998), Fan and Zhang (2000), and Wu and Zhang (2002). Here we re-analyze the conceptive group using our least-absolute-deviation (LAD) estimates.

Figure 1. Left: Trajectories of the measurements from 22 conceptive women. Right: Estimates of μ(·) using both the original data and perturbed data. Thin solid, dotted, and dashed curves are the least-squares estimates of μ(·) based on the original data, perturbation scenario I (remove subjects 13 and 14), and perturbation scenario II (shift subjects 13 and 14 down by three units), respectively. Similarly, thick solid, dotted, and dashed curves are least-absolute-deviation estimates.

From the left plot in Figure 1, subject 14 (dashed curve) has two sharp drops in progesterone levels at days −3 and 9. Similarly, subject 13 (dotted curve) has unusually low levels on days −1, 0, 1. While such sharp drops or “outliers” may be caused by incorrect measurements or other unknown reasons, we investigate the impact of such “outliers” on the LS and LAD estimates. In the right plot of Figure 1, the thick solid and thin solid curves are the LAD and LS estimates of μ(·). The two estimates are reasonably close except during the periods [−4, 1] and [8, 15]. Notice that the latter periods contain the “outliers” from subjects 13, 14.

To understand the impact of such possible “outliers”, we consider two scenarios of perturbing the data below.

Scenario I: remove subjects 13 and 14 and estimate μ(·) using the remaining subjects. The thick dotted and thin dotted curves are the corresponding LAD and LS estimates. Clearly, the discrepancy is largely diminished.
Scenario II: make the two outlier subjects 13 and 14 even more extreme by shifting their curves three units down. We see that the discrepancy between the LAD (thick dashed) and LS (thin dashed) estimates becomes even more remarkable.

Compared with the estimate based on the original data, the LS estimates under the two perturbation scenarios differ significantly. By contrast, the LAD estimates under the three cases are similar, indicating their robustness in the presence of outliers. We conclude that, for the progesterone data with several possible outliers, the proposed LAD estimate offers an attractive alternative to the well-studied LS estimates. In practice, we recommend the LAD estimate if the data contain suspicious, unusual observations or extreme outliers.
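The mechanism behind this comparison can be seen at a single time point, where the LS and LAD fits reduce to a mean and a median. The sketch below uses purely synthetic numbers (not the progesterone data); the 22 values, the two "outlying" subject indices, and the shift sizes are assumptions chosen only to mimic Scenarios I and II.

```python
import statistics

# Synthetic measurements for 22 hypothetical subjects at one time point.
baseline = [0.1 * k for k in range(-10, 12)]
outliers = {12, 13}  # indices of the two hypothetical outlying subjects


def shift_down(values, indices, amount):
    """Lower the selected entries by `amount`, leaving the others unchanged."""
    return [v - amount if i in indices else v for i, v in enumerate(values)]


original = shift_down(baseline, outliers, 2.0)                      # data with two low subjects
scenario_1 = [v for i, v in enumerate(original) if i not in outliers]  # remove them
scenario_2 = shift_down(original, outliers, 3.0)                    # push them three units lower

cases = {"original": original, "scenario I": scenario_1, "scenario II": scenario_2}
means = {name: statistics.mean(vals) for name, vals in cases.items()}
medians = {name: statistics.median(vals) for name, vals in cases.items()}

spread_ls = max(means.values()) - min(means.values())
spread_lad = max(medians.values()) - min(medians.values())

for name in cases:
    print(f"{name:12s} mean = {means[name]: .3f}  median = {medians[name]: .3f}")
print(f"spread across cases: mean {spread_ls:.3f}, median {spread_lad:.3f}")
```

In this toy setting the median does not move at all between the original data and Scenario II, because the shifted values were already the two smallest, whereas the mean is pulled down in proportion to the shift.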

6. Conclusion and extension to spatial setting

This paper studies robust estimations of the location and scale functions in a nonparametric regression model with serially dependent data from multiple subjects. Under a general error dependence structure that allows for many linear and nonlinear processes, we study uniform Bahadur representations and asymptotic normality for least-absolute-deviation estimations of a location-scale longitudinal model. In the large literature on nonparametric estimation of longitudinal models, most existing works use least-squares based methods, which are sensitive to extreme observations and may perform poorly in such circumstances. Despite the popularity of quantile regression methods in linear models and nonparametric regression models, little research has been done in quantile regression based estimations for nonparametric longitudinal models, partly due to difficulties in dealing with the dependence. Therefore, our work provides a solid theoretical foundation for quantile regression estimations in longitudinal models.

The study of asymptotic Bahadur representations is a difficult area and has focused mainly on the iid setting or stationary time series setting. For longitudinal data, deriving Bahadur representations is more challenging due to the non-stationarity and dependence. To obtain our Bahadur representations, we develop substantial theory for kernel weighted empirical processes via a coupling argument.

The proposed error dependence structure and coupling argument provide a flexible and powerful framework for asymptotics from dependent data, such as time series data, longitudinal data and spatial data, whereas similar problems have been previously studied mainly for either independent data or stationary time series. In (2.1), e_j depends on the innovations or shocks ε_j, ε_{j±1}, …, indexed by integers on a line. A natural extension is the function of innovations indexed by bivariate integers on a square:

e_{j} = G (ε_{j, j}, ε_{j, j \pm 1}, ε_{j \pm 1, j}, ε_{j \pm 1, j \pm 1}, \dots), j \in ℤ .

The coupling argument still holds by replacing the innovations ε_{j±r, j±s}, r, s ≥ k + 1, outside the k nearest squares with iid copies. As in Condition 2.1, we can assume that the impact of perturbing the distant innovations decays exponentially fast (or polynomially fast with slight modifications of the proof). More generally, the coupling argument holds for functions of innovations indexed by a multivariate spatial lattice, and such a setting may be useful in studying asymptotics for spatial data.

7. Technical proofs

Throughout c, c₁, c₂, …, are generic constants. First, we give an inequality for the indicator function. Let Z, Z′ be two random variables and y ∈ ℝ. For α > 0, we have

1_{Z \leq y < Z^{'}} = 1_{Z \leq y < Z^{'}, ∣ Z - Z^{'} ∣ \geq α} + 1_{Z \leq y < Z^{'}, ∣ Z - Z^{'} ∣ < α} \leq 1_{∣ Z - Z^{'} ∣ \geq α} + 1_{y < Z^{'} < y + α},

Similarly, $1_{Z^{'} \leq y < Z} \leq 1_{∣ Z - Z^{'} ∣ \geq α} + 1_{y - α < Z^{'} \leq y}$. Therefore,

∣ 1_{Z \leq y} - 1_{Z^{'} \leq y} ∣ = 1_{Z \leq y < Z^{'}} + 1_{Z^{'} \leq y < Z} \leq 2 1_{∣ Z - Z^{'} ∣ \geq α} + 1_{y - α < Z^{'} < y + α} .

(7.1)
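Inequality (7.1) is elementary, and a brute-force check over random draws gives a quick sanity test of the case analysis above. The sketch below is illustrative only; the sampling distributions and the names `lhs` and `rhs` are assumptions made for this example.

```python
import random


def lhs(z, z_prime, y):
    """|1{Z <= y} - 1{Z' <= y}|."""
    return abs(int(z <= y) - int(z_prime <= y))


def rhs(z, z_prime, y, alpha):
    """2 * 1{|Z - Z'| >= alpha} + 1{y - alpha < Z' < y + alpha}."""
    return 2 * int(abs(z - z_prime) >= alpha) + int(y - alpha < z_prime < y + alpha)


rng = random.Random(0)
violations = 0
for _ in range(100_000):
    z = rng.uniform(-1.0, 1.0)
    z_prime = z + rng.gauss(0.0, 0.2)   # a perturbed copy of z
    y = rng.uniform(-1.0, 1.0)
    alpha = rng.uniform(0.01, 0.5)
    if lhs(z, z_prime, y) > rhs(z, z_prime, y, alpha):
        violations += 1

print("violations of (7.1):", violations)
```

The point of the bound is that the right-hand side involves the coupling error |Z − Z′| and the position of Z′ only, so Z itself no longer appears inside an indicator at level y.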

7.1. Proof of Propositions 2.1–2.2

Proof of Proposition 2.1

Let q* = q/(ς + υ), p₁ = υ/ς + 1, and p₂ = ς/υ + 1 so that ςq*p₁ = q, υq*p₂ = q, and 1/p₁ + 1/p₂ = 1. For convenience write $e_{0}^{'} = e_{0} (k)$. By assumption, ${‖ e_{0} - e_{0}^{'} ‖}_{q} = O (ρ^{k})$. By (2.3) and the Hölder inequality $E ∣ Z_{1} Z_{2} ∣ \leq {‖ Z_{1} ‖}_{p_{1}} {‖ Z_{2} ‖}_{p_{2}}$,

\begin{array}{l} {‖ h (e_{0}^{'}) - h (e_{0}) ‖}_{q^{*}}^{q^{*}} \leq O (1) E [{∣ e_{0}^{'} - e_{0} ∣}^{ς q^{*}} {(1 + ∣ e_{0} ∣ + ∣ e_{0}^{'} ∣)}^{υ q^{*}}] \\ \leq O (1) {E [{∣ e_{0} - e_{0}^{'} ∣}^{ς q^{*} \cdot p_{1}}]}^{1 / p_{1}} {E [{(1 + ∣ e_{0} ∣ + ∣ e_{0}^{'} ∣)}^{υ q^{*} \cdot p_{2}}]}^{1 / p_{2}} \\ = O (1) {‖ e_{0} - e_{0}^{'} ‖}_{q}^{q / p_{1}} {‖ e_{0} ‖}_{q}^{q / p_{2}} = O (ρ^{k q / p_{1}}) . \end{array}

The above expression gives ${‖ h (e_{0}^{'}) - h (e_{0}) ‖}_{q^{*}} \leq O (1) {[ρ^{q / (p_{1} q^{*})}]}^{k} = O (ρ^{k ς})$ .

Proof of Proposition 2.2

Let α = ρ^{kq/(1+q)}. By (7.1) and the triangle inequality,

\begin{array}{l} {‖ 1_{e_{0} \leq x} - 1_{e_{0} (k) \leq x} ‖}_{q} \leq 2 {‖ 1_{∣ e_{0} - e_{0} (k) ∣ \geq α} ‖}_{q} + {‖ 1_{x - α \leq e_{0} \leq x + α} ‖}_{q} \\ = 2 {[P {∣ e_{0} - e_{0} (k) ∣ \geq α}]}^{1 / q} + {[P {x - α \leq e_{0} \leq x + α}]}^{1 / q} . \end{array}

By the Markov inequality, ℙ{|e₀ − e₀(k)| ≥ α} ≤ E[|e₀ − e₀(k)|^q]/α^q = O(ρ^{kq}/α^q). Since e₀ has a bounded density, ℙ{x − α ≤ e₀ ≤ x + α} = O(α). With the above choice of α, the two bounds are of the same order, and the result then follows.
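The choice of α in this proof balances the Markov bound ρ^{kq}/α^q against the density bound α. A short numerical check makes the balance explicit; the values of ρ, k and q below are arbitrary illustrative inputs, and the function names are hypothetical.

```python
import math


def markov_term(rho, k, q, alpha):
    """Order of P{|e_0 - e_0(k)| >= alpha} from the Markov inequality."""
    return rho ** (k * q) / alpha ** q


def density_term(alpha):
    """Order of P{x - alpha <= e_0 <= x + alpha} for a bounded density."""
    return alpha


def balanced_alpha(rho, k, q):
    """alpha = rho^{kq/(1+q)}, the value that equates the two terms."""
    return rho ** (k * q / (1.0 + q))


rho, k, q = 0.7, 12, 2.5   # illustrative values only
alpha = balanced_alpha(rho, k, q)
print("alpha        :", alpha)
print("Markov term  :", markov_term(rho, k, q, alpha))
print("density term :", density_term(alpha))
```

Any other α makes one of the two terms strictly larger: a smaller α inflates the Markov term, while a larger α inflates the density term.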

7.2. Proof of Theorems 3.1–3.3

Proof of Theorem 3.1

For r = 1, 2, …, 2k_n + 1, let

I_{r} = {(i, j) : 1 \leq i \leq n, 1 \leq j \leq ⌊ (m_{i} - r) / (2 k_{n} + 1) ⌋ + 1} .

(7.2)

Using the identity $\sum_{j = 1}^{m} a_{j} = \sum_{r = 1}^{k} \sum_{j = 1}^{⌊ (m - r) / k ⌋ + 1} a_{(j - 1) k + r}$ for all k, m ∈ ℕ, a₁, …, a_m ∈ ℝ, we can rewrite H_n(v) as

H_{n} (v) = \sum_{r = 1}^{2 k_{n} + 1} \sum_{(i, j) \in I_{r}} g_{i, (j - 1) (2 k_{n} + 1) + r} ({\tilde{Y}}_{i, (j - 1) (2 k_{n} + 1) + r}, v) : = \sum_{r = 1}^{2 k_{n} + 1} H_{n} (v, r) .

(7.3)
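The re-indexing identity used to obtain (7.3) splits the indices 1, …, m into k residue classes, with consecutive indices inside a class exactly k apart. The sketch below checks this for small m and k; the function name `split_into_classes` is a hypothetical helper, and in the proof the role of k is played by 2k_n + 1.

```python
def split_into_classes(m, k):
    """For r = 1..k, list the indices (j - 1) * k + r, j = 1..floor((m - r) / k) + 1."""
    return [
        [(j - 1) * k + r for j in range(1, (m - r) // k + 2)]
        for r in range(1, k + 1)
    ]


def sum_by_classes(a, k):
    """Evaluate the double sum on the right-hand side of the identity (a is 0-based)."""
    return sum(a[idx - 1] for block in split_into_classes(len(a), k) for idx in block)


values = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0]
classes = split_into_classes(len(values), 3)
print("classes for m = 10, k = 3:", classes)
print("direct sum :", sum(values))
print("class sum  :", sum_by_classes(values, 3))
```

Python's floor division rounds towards minus infinity, so a class with r > m comes out empty, which matches the convention that an empty sum is zero.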

Now we consider H_n(v, r). By the discussion in Section 3, the summands in H_n(v, r) are independent. By (3.3),

\begin{array}{l} Var [H_{n} (v, r)] = \sum_{(i, j) \in I_{r}} E [g_{i, (j - 1) (2 k_{n} + 1) + r}^{2} ({\tilde{Y}}_{i, (j - 1) (2 k_{n} + 1) + r}, v)] \\ \leq \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} E [g_{i, j}^{2} ({\tilde{Y}}_{i, j}, v)] \leq χ_{n}, \end{array}

(7.4)

uniformly over v, r.

(i) Consider the case χ_n = O(1). Recall the condition |g_{i,j}(y, v)| ≤ c. By Bernstein's exponential inequality (Bennett, 1962) for bounded and independent random variables, for any given c₁ > 0, when N_n is sufficiently large,
$P {∣ H_{n} (v, r) ∣ \geq c_{1} log N_{n}} \leq 2 exp {- \frac{{(c_{1} log N_{n})}^{2}}{2 Var [H_{n} (v, r)] + c c_{1} log N_{n}}} \leq 2 N_{n}^{- c_{1} / (3 c)},$ (7.5)

uniformly over v and r. Here the second inequality follows from Var[H_n(v, r)] ≤ χ_n = O(1) ≤ cc₁ log N_n for large enough N_n. Thus,
$\begin{array}{l} P {max_{v \in V_{n}, 1 \leq r \leq 2 k_{n} + 1} ∣ H_{n} (v, r) ∣ \geq c_{1} log N_{n}} \leq \sum_{v \in V_{n}, 1 \leq r \leq 2 k_{n} + 1} P {∣ H_{n} (v, r) ∣ \geq c_{1} log N_{n}} \\ \leq 2 ∣ V_{n} ∣ (2 k_{n} + 1) N_{n}^{- c_{1} / (3 c)} . \end{array}$

By the assumption that both |V_n| and k_n grow no faster than a polynomial of N_n, we can make the above probability go to zero by choosing a large enough c₁. Therefore, max_{v ∈ V_n, 1 ≤ r ≤ 2k_n+1} |H_n(v, r)| = O_p(log N_n). By (7.3), the desired result follows from
$max_{v \in V_{n}} ∣ H_{n} (v) ∣ \leq (2 k_{n} + 1) max_{v \in V_{n}, 1 \leq r \leq 2 k_{n} + 1} ∣ H_{n} (v, r) ∣ .$
(ii) Consider the case sup_n log N_n/χ_n < ∞. As in (7.5),
$P {∣ H_{n} (v, r) ∣ \geq c_{1} \sqrt{χ_{n} log N_{n}}} \leq 2 exp {- \frac{{(c_{1} \sqrt{χ_{n} log N_{n}})}^{2}}{2 χ_{n} + c c_{1} \sqrt{χ_{n} log N_{n}}}} = O [N_{n}^{- c_{1}^{2} / (2 + c c_{1} c_{2})}],$

uniformly over v and r, where c₂ = sup_n[log N_n/χ_n]^{1/2} < ∞. The rest of the proof follows from the same argument as in case (i) by choosing a sufficiently large c₁.

Proof of Theorem 3.2

Let α = 1/N_n. Since ϖ_i,j(x) ≤ c, applying (7.1), we obtain

\begin{array}{l} ∣ F_{n} (x, y) - {\tilde{F}}_{n} (x, y) ∣ \leq \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x) ∣ 1_{Y_{i, j} \leq y} - 1_{{\tilde{Y}}_{i, j} \leq y} ∣ \\ \leq 2 c [\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} 1_{∣ Y_{i, j} - {\tilde{Y}}_{i, j} ∣ \geq α} + \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} 1_{y - α < {\tilde{Y}}_{i, j} < y + α}] \\ : = 2 c [Ω_{n} + Λ_{n} (y)] . \end{array}

(7.6)

Notice that, |Y_i,j − Ỹ_i,j| = O(1)|e_i,j − e_i,j(k_n)|. By (2.2) and the Markov inequality,

E (1_{∣ Y_{i, j} - {\tilde{Y}}_{i, j} ∣ \geq α}) \leq \frac{{‖ Y_{i, j} - {\tilde{Y}}_{i, j} ‖}_{q}^{q}}{α^{q}} = O (1) \frac{{‖ e_{i, j} - e_{i, j} (k_{n}) ‖}_{q}^{q}}{α^{q}} = O (N_{n}^{q} ρ^{q k_{n}}) .

Thus, $Ω_{n} = O_{p} (N_{n}^{1 + q} ρ^{q k_{n}}) = O_{p} [N_{n}^{1 + q} ρ^{q λ log (N_{n})}] = o_{p} (1)$ for λ > (q + 1)/[q log(1/ρ)].

For Λ_n(y) over y ∈ ℝ, consider two cases: $∣ y ∣ > N_{n}^{1 / q}$ and $∣ y ∣ \leq N_{n}^{1 / q}$. For $∣ y ∣ > N_{n}^{1 / q}$, since α = 1/N_n → 0 and μ(x_{i,j}) and s(x_{i,j}) are bounded, $\{y - α < {\tilde{Y}}_{i, j} < y + α\} \subset \{∣ e_{i, j} (k_{n}) ∣ \geq c_{1} N_{n}^{1 / q}\}$ for some constant c₁ > 0. Therefore, by e_{i,j}(k_n) ∈ $\mathcal{L}^{q}$ and the Markov inequality,

E [sup_{∣ y ∣ > N_{n}^{1 / q}} Λ_{n} (y)] \leq E [\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} 1_{∣ e_{i, j} (k_{n}) ∣ > c_{1} N_{n}^{1 / q}}] \leq \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} \frac{{‖ e_{i, j} (k_{n}) ‖}_{q}^{q}}{{(c_{1} N_{n}^{1 / q})}^{q}} = O (1) .

(7.7)

We conclude that ${sup}_{∣ y ∣ > N_{n}^{1 / q}} Λ_{n} (y) = O_{p} (1)$ .

In what follows we use a chain argument to prove ${sup}_{y \in [- N_{n}^{1 / q}, N_{n}^{1 / q}]} Λ_{n} (y) = O_{p} [{(log N_{n})}^{2}]$. Without loss of generality, consider $y \in [0, N_{n}^{1 / q}]$. Write $ℓ_{n} = [N_{n}^{1 + 1 / q}]$ and let $V_{n} = \{y_{v} = v N_{n}^{1 / q} / ℓ_{n}, v = 0, 1, \dots, ℓ_{n}\}$ be the set of ℓ_n + 1 grid points uniformly spaced over $[0, N_{n}^{1 / q}]$. Partition $[0, N_{n}^{1 / q}]$ into intervals $I_{v} = [y_{v - 1}, y_{v}]$, v = 1, …, ℓ_n. For any y ∈ I_v, we have $1_{y - α < {\tilde{Y}}_{i, j} < y + α} \leq 1_{y_{v - 1} - α < {\tilde{Y}}_{i, j} < y_{v} + α}$. Since s(x_{i,j}) is bounded away from zero, sup_u f_e(u) < ∞, and |y_v − y_{v−1}| = O(1/N_n), we have $E (1_{y_{v - 1} - α < {\tilde{Y}}_{i, j} < y_{v} + α}) \leq c_{2} / N_{n}$ uniformly for some constant c₂ < ∞. Consequently, for any y ∈ I_v, we have

Λ_{n} (y) \leq \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} [{1_{y_{v - 1} - α < {\tilde{Y}}_{i, j} < y_{v} + α} - E (1_{y_{v - 1} - α < {\tilde{Y}}_{i, j} < y_{v} + α})} + c_{2} / N_{n}] = Λ_{n}^{*} (v) + c_{2} .

We apply Theorem 3.1 to $Λ_{n}^{*} (v)$. For χ_n in (3.3), using $E (1_{y_{v - 1} - α < {\tilde{Y}}_{i, j} < y_{v} + α}) \leq c_{2} / N_{n}$, we have χ_n = O(1) and thus ${max}_{v \in V_{n}} ∣ Λ_{n}^{*} (v) ∣ = O_{p} [{(log N_{n})}^{2}]$, completing the proof.

Proof of Theorem 3.3

Recall the coupling process D̃_n(δ, x, y) in (3.7). Under the assumption sup_n log N_n/(δ_nφ_n) < ∞, (log N_n)² = O{[δ_nφ_n(log N_n)³]^{1/2}}. Thus, by (3.8), it suffices to show sup_{|δ|≤δ_n, x,y∈ℝ} |D̃_n(δ, x, y)| = O_p{[δ_nφ_n(log N_n)³]^{1/2}}.

Without loss of generality assume δ ∈ [0, δ_n]. Recall Ỹ_i,j in (3.5). Rewrite

{\tilde{D}}_{n} (δ, x, y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x) {{\tilde{ξ}}_{i, j} (δ, y) - E [{\tilde{ξ}}_{i, j} (δ, y)]}, {\tilde{ξ}}_{i, j} (δ, y) = 1_{y < {\tilde{Y}}_{i, j} \leq y + δ} .

As in the proof of Theorem 3.2, consider $∣ y ∣ > N_{n}^{1 / q}$ and $∣ y ∣ \leq N_{n}^{1 / q}$ .

For $∣ y ∣ > N_{n}^{1 / q}$ , since μ(x_i,j) and s(x_i,j) are bounded and |δ| ≤ δ_n → 0, ${y < {\tilde{Y}}_{i, j} \leq y + δ} \subset {∣ e_{i, j} (k_{n}) ∣ \geq c_{1} N_{n}^{1 / q}}$ for some c₁ > 0. Therefore, by the boundedness of ϖ_i,j (·), the same argument in (7.7) shows D̃_n(δ, x, y) = O_p(1) uniformly over x ∈ ℝ, $∣ y ∣ > N_{n}^{1 / q}$ , |δ| ≤ δ_n.

Next we consider $∣ y ∣ \leq N_{n}^{1 / q}$ . Since ϖ_i,j(x) vanishes for x outside a bounded interval, without loss of generality we only consider x ∈ [0, b] for some b > 0, $y \in [0, N_{n}^{1 / q}]$ , and δ ∈ [0, δ_n]. As in the proof of Theorem 3.2, we use the chain argument. Let $ℓ_{n} = ⌊ N_{n}^{1 / q} / δ_{n} + N_{n} τ_{n} + N_{n}^{1 + 1 / q} ⌋$ , and

V_{n} = \{(x_{v_{1}}, y_{v_{2}}, t_{v_{3}}) : x_{v_{1}} = \frac{v_{1} b}{ℓ_{n}}, y_{v_{2}} = \frac{v_{2} N_{n}^{1 / q}}{ℓ_{n}}, t_{v_{3}} = \frac{v_{3} δ_{n}}{ℓ_{n}}, v_{1}, v_{2}, v_{3} = 0, 1, \dots, ℓ_{n}\}

be uniformly spaced grid points. Partition $[0, b] \times [0, N_{n}^{1 / q}] \times [0, δ_{n}]$ into cells $I_{v_{1}, v_{2}, v_{3}} = [x_{v_{1} - 1}, x_{v_{1}}] \times [y_{v_{2} - 1}, y_{v_{2}}] \times [t_{v_{3} - 1}, t_{v_{3}}]$, v₁, v₂, v₃ = 1, …, ℓ_n. Let

{\underline{ξ}}_{i, j} (v_{2}, v_{3}) = 1_{y_{v_{2}} < {\tilde{Y}}_{i, j} \leq y_{v_{2} - 1} + t_{v_{3} - 1}} and {\bar{ξ}}_{i, j} (v_{2}, v_{3}) = 1_{y_{v_{2} - 1} < {\tilde{Y}}_{i, j} \leq y_{v_{2}} + t_{v_{3}}} .

Clearly, for any (x, y, δ) ∈ I_{v₁,v₂,v₃}, we have ${\underline{ξ}}_{i, j} (v_{2}, v_{3}) \leq {\tilde{ξ}}_{i, j} (δ, y) \leq {\bar{ξ}}_{i, j} (v_{2}, v_{3})$. Since N_n → ∞ and δ_n → 0, there exists a constant c₂ < ∞ such that $0 \leq E [{\bar{ξ}}_{i, j} (v_{2}, v_{3})] - E [{\underline{ξ}}_{i, j} (v_{2}, v_{3})] \leq c_{2} N_{n}^{1 / q} / ℓ_{n}$. Additionally, for x ∈ [x_{v₁−1}, x_{v₁}], by Condition 3.2, |ϖ_{i,j}(x) − ϖ_{i,j}(x_{v₁})| ≤ τ_n|x − x_{v₁}| ≤ τ_nb/ℓ_n. Thus, there exists a constant c₃ < ∞ such that

\begin{array}{l} ϖ_{i, j} (x) {{\tilde{ξ}}_{i, j} (δ, y) - E [{\tilde{ξ}}_{i, j} (δ, y)]} \leq ϖ_{i, j} (x_{v_{1}}) {{\bar{ξ}}_{i, j} (v_{2}, v_{3}) - E [{\underline{ξ}}_{i, j} (v_{2}, v_{3})]} + τ_{n} b / ℓ_{n} \\ \leq ϖ_{i, j} (x_{v_{1}}) {{\bar{ξ}}_{i, j} (v_{2}, v_{3}) - E [{\bar{ξ}}_{i, j} (v_{2}, v_{3})]} + c_{3} (τ_{n} + N_{n}^{1 / q}) / ℓ_{n}, \end{array}

(7.8)

uniformly over i, j, and (x, y, δ) ∈ I_{v₁,v₂,v₃}. Similarly,

ϖ_{i, j} (x) {{\tilde{ξ}}_{i, j} (δ, y) - E [{\tilde{ξ}}_{i, j} (δ, y)]} \geq ϖ_{i, j} (x_{v_{1}}) {{\underline{ξ}}_{i, j} (v_{2}, v_{3}) - E [{\underline{ξ}}_{i, j} (v_{2}, v_{3})]} - c_{3} (τ_{n} + N_{n}^{1 / q}) / ℓ_{n} .

(7.9)

Combining (7.8) and (7.9) and using $N_{n} (τ_{n} + N_{n}^{1 / q}) / ℓ_{n} = O (1)$ , we have

sup_{x, y, δ} ∣ {\tilde{D}}_{n} (δ, x, y) ∣ \leq max_{v \in V_{n}} {∣ {\underline{Δ}}_{n} (v) ∣ + ∣ {\bar{Δ}}_{n} (v) ∣} + O (1),

(7.10)

where v = (v₁, v₂, v₃),

\begin{array}{l} {\underline{Δ}}_{n} (v) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x_{v_{1}}) {{\underline{ξ}}_{i, j} (v_{2}, v_{3}) - E [{\underline{ξ}}_{i, j} (v_{2}, v_{3})]}, \\ {\bar{Δ}}_{n} (v) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} ϖ_{i, j} (x_{v_{1}}) {{\bar{ξ}}_{i, j} (v_{2}, v_{3}) - E [{\bar{ξ}}_{i, j} (v_{2}, v_{3})]} . \end{array}

We now apply Theorem 3.1 to ${\underline{Δ}}_{n} (v)$ and ${\bar{Δ}}_{n} (v)$. For χ_n in (3.3), with φ_n in (3.9) and $E [{\bar{ξ}}_{i, j} (v_{2}, v_{3})] = O (δ_{n} + N_{n}^{1 / q} / ℓ_{n}) = O (δ_{n})$, we can take χ_n = O(δ_nφ_n). By Theorem 3.1 (ii), $max_{v \in V_{n}} ∣ {\bar{Δ}}_{n} (v) ∣ = O_{p} \{{[δ_{n} φ_{n} {(log N_{n})}^{3}]}^{1 / 2}\}$. The latter bound also holds for $max_{v \in V_{n}} ∣ {\underline{Δ}}_{n} (v) ∣$. The desired result then follows from (7.10).

7.3. Asymptotic expansions

Throughout the proofs, we use the following notations:

\begin{array}{l} L_{μ} (δ_{1}, x) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{b_{n}} (x_{i, j} - x) 1_{Y_{i, j} \leq μ (x) + δ_{1}}, \\ L_{μ} (x) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{b_{n}} (x_{i, j} - x), \\ J_{μ} (δ_{1}, x) = E [L_{μ} (δ_{1}, x)], \\ L_{s} (δ_{1}, δ_{2}, x) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{h_{n}} (x_{i, j} - x) 1_{∣ Y_{i, j} - μ (x) - δ_{1} ∣ \leq s (x) + δ_{2}}, \\ L_{s} (x) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{h_{n}} (x_{i, j} - x), \\ J_{s} (δ_{1}, δ_{2}, x) = E [L_{s} (δ_{1}, δ_{2}, x)] . \end{array}

Lemma 7.1

Assume that Conditions 4.1–4.2 hold. Then, we have

(i) Uniformly over x ∈ [a, b],
$\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} {(\frac{x_{i, j} - x}{b_{n}})}^{r} K (\frac{x_{i, j} - x}{b_{n}}) = \frac{N_{n} b_{n}}{b - a} \int_{R} u^{r} K (u) d u + O (1) .$ (7.11)
(ii) Let g(x, v) be a measurable bivariate function on [a, b]². Define
$G_{g} (x) = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} g (x, x_{i, j}) K_{b_{n}} (x_{i, j} - x) .$ (7.12)

Further assume that sup_{x ∈ [a, b]} |∂^s g(x, v)/∂v^s| < ∞, s = 0, 1, …, r, for some r ∈ ℕ. Then uniformly over x ∈ [a, b],
$G_{g} (x) = \sum_{s = 0}^{r - 1} {\frac{\partial^{s} g (x, v)}{\partial v^{s}} |}_{v = x} \frac{N_{n} b_{n}^{s + 1}}{(b - a) s!} \int_{R} u^{s} K (u) d u + O (1 + N_{n} b_{n}^{r + 1}) .$ (7.13)

Proof

Recall the ordered locations x̃_k in Condition 4.1. Define
$S_{n} (x) = \sum_{k = 1}^{N_{n}} {(\frac{{\tilde{x}}_{k} - x}{b_{n}})}^{r} K (\frac{{\tilde{x}}_{k} - x}{b_{n}}),$ (7.14)

$I_{n} (x) = \sum_{k = 0}^{N_{n}} ({\tilde{x}}_{k + 1} - {\tilde{x}}_{k}) {(\frac{{\tilde{x}}_{k} - x}{b_{n}})}^{r} K (\frac{{\tilde{x}}_{k} - x}{b_{n}}),$ (7.15)

$ϱ_{n} = max_{0 \leq k \leq N_{n}} | {\tilde{x}}_{k + 1} - {\tilde{x}}_{k} - (b - a) / N_{n} | = O (N_{n}^{- 2}),$ (7.16)

$I (x) = {1 \leq k \leq N_{n} : {\tilde{x}}_{k} - x \in [- b_{n} - (b - a) / N_{n} - ϱ_{n}, b_{n}]} .$ (7.17)

Assume without loss of generality that K has support [−1, 1]. Condition (4.6) implies that sup_{x ∈ [a, b]} |I(x)| = O(N_nb_n), where and hereafter |I| is the cardinality of a set I. Because K has support [−1, 1], K_{b_n}(x̃_k − x) = 0 for k ∉ I(x). Additionally, for k ∈ I(x), the summands in S_n(x) are uniformly bounded. Thus,
$S_{n} (x) = \sum_{k \in I (x)} {(\frac{{\tilde{x}}_{k} - x}{b_{n}})}^{r} K (\frac{{\tilde{x}}_{k} - x}{b_{n}}) = O [∣ I (x) ∣] = O (N_{n} b_{n}),$ (7.18)

uniformly over x ∈ [a, b].

By (4.6), elementary calculation shows that, uniformly over x ∈ [a, b],
$\begin{array}{l} \frac{b - a}{N_{n}} S_{n} (x) - I_{n} (x) = - \sum_{k = 1}^{N_{n}} ({\tilde{x}}_{k + 1} - {\tilde{x}}_{k} - \frac{b - a}{N_{n}}) {(\frac{{\tilde{x}}_{k} - x}{b_{n}})}^{r} K (\frac{{\tilde{x}}_{k} - x}{b_{n}}) \\ = O (ϱ_{n}) sup_{x \in [a, b]} ∣ S_{n} (x) ∣ = O (b_{n} / N_{n}) . \end{array}$ (7.19)

Write u_k = (x̃_k − x)/b_n. Observe that $I_{n} (x) = \sum_{k = 0}^{N_{n}} \int_{{\tilde{x}}_{k}}^{{\tilde{x}}_{k + 1}} u_{k}^{r} K (u_{k}) d v$ . Thus, by the triangle inequality, we have
$\begin{array}{l} | I_{n} (x) - \int_{{\tilde{x}}_{0}}^{{\tilde{x}}_{N_{n} + 1}} {(\frac{v - x}{b_{n}})}^{r} K (\frac{v - x}{b_{n}}) d v | \leq \sum_{k = 0}^{N_{n}} V_{k}, \\ where V_{k} = \int_{{\tilde{x}}_{k}}^{{\tilde{x}}_{k + 1}} | u_{k}^{r} K (u_{k}) - {(\frac{v - x}{b_{n}})}^{r} K (\frac{v - x}{b_{n}}) | d v . \end{array}$ (7.20)

Since K has bounded derivative, |y^rK(y) − z^rK(z)| = O(|y − z|) for y, z ∈ [−1, 1]. Also, |u_k − (v − x)/b_n| = |v − x̃_k|/b_n. Thus, under Condition 4.1,
$∣ V_{k} ∣ = O (1) \int_{{\tilde{x}}_{k}}^{{\tilde{x}}_{k + 1}} \frac{v - {\tilde{x}}_{k}}{b_{n}} d v = \frac{O [{({\tilde{x}}_{k + 1} - {\tilde{x}}_{k})}^{2}]}{b_{n}} = \frac{O (1)}{N_{n}^{2} b_{n}} .$ (7.21)

Furthermore, it is easily seen that, for k ∉ I(x), min(|x̃_k − x|, |x̃_{k+1} − x|) > b_n, which implies K(u_k) = 0, K{(v − x)/b_n} = 0 for v ∈ [x̃_k, x̃_{k+1}], and consequently V_k = 0. Thus, by (7.20) and (7.21),
$| I_{n} (x) - \int_{{\tilde{x}}_{0}}^{{\tilde{x}}_{N_{n} + 1}} {(\frac{v - x}{b_{n}})}^{r} K (\frac{v - x}{b_{n}}) d v | \leq \sum_{k \in I (x)} V_{k} = O (1 / N_{n}),$ (7.22)

uniformly over x ∈ [a, b].

Notice that $\sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} {[(x_{i, j} - x) / b_{n}]}^{r} K_{b_{n}} (x_{i, j} - x) = S_{n} (x)$ . Recall that x̃₀ = a and x̃_{N_n+1} = b. The desired result then follows from (7.19) and (7.22) in view of
$\int_{{\tilde{x}}_{0}}^{{\tilde{x}}_{N_{n} + 1}} {(\frac{v - x}{b_{n}})}^{r} K (\frac{v - x}{b_{n}}) d v = b_{n} \int_{(a - x) / b_{n}}^{(b - x) / b_{n}} u^{r} K (u) d u = b_{n} \int_{- 1}^{1} u^{r} K (u) d u,$

for all x ∈ [a, b] and large enough n.
The expression (7.13) easily follows from (i) in view of the Taylor expansion $g (x, x_{i, j}) = \sum_{s = 0}^{r - 1} \partial^{s} g (x, v) / \partial v^{s} ∣_{v = x} {(x_{i, j} - x)}^{s} / s! + O (b_{n}^{r})$ for |x_i,j − x| ≤ b_n.
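Part (i) of Lemma 7.1 is a Riemann-sum approximation, and it can be checked numerically on an equally spaced design, which satisfies the spacing bound in (7.16). The sketch below is an illustration under assumptions: K_{b_n}(u) is read as K(u/b_n), the kernel is Epanechnikov (so that ∫u²K(u)du = 1/5), and the design, the evaluation point, and the function names are hypothetical.

```python
def epanechnikov(u):
    """Illustrative kernel with support [-1, 1] (an assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_moment_sum(design, x0, bandwidth, r):
    """Left-hand side of (7.11): sum of u^r K(u) with u = (x_k - x0) / bandwidth."""
    total = 0.0
    for x in design:
        u = (x - x0) / bandwidth
        total += (u ** r) * epanechnikov(u)
    return total


a, b = 0.0, 1.0          # design interval [a, b]
n_points = 2000          # plays the role of N_n
bandwidth = 0.1          # plays the role of b_n
design = [a + k * (b - a) / (n_points + 1) for k in range(1, n_points + 1)]

x0, r = 0.5, 2
second_moment = 0.2      # exact value of the integral of u^2 K(u) for this kernel
lhs = kernel_moment_sum(design, x0, bandwidth, r)
leading = n_points * bandwidth / (b - a) * second_moment

print("kernel sum    :", lhs)
print("leading term  :", leading)
print("difference    :", lhs - leading)
```

The difference stays bounded as n_points grows with the bandwidth held fixed, in line with the O(1) remainder in (7.11), while the leading term grows linearly in N_n b_n.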

Lemma 7.2

Assume that Conditions 4.1–4.2 hold. Let ρ_μ(x), ρ_s(x), κ, κ₊ be as in Theorems 4.1–4.2. Then, for δ₁ → 0, δ₂ → 0, we have uniformly over x ∈ $S_{ε} ([a, b])$,

\begin{array}{l} J_{μ} (0, x) = L_{μ} (x) / 2 - N_{n} b_{n}^{3} ρ_{μ} (x) f_{e} (0) ψ_{K} / [(b - a) s (x)] + O (1 + N_{n} b_{n}^{5}), \\ J_{μ} (δ_{1}, x) = J_{μ} (0, x) + N_{n} b_{n} δ_{1} {f_{e} (0) / [(b - a) s (x)] + O [{(N_{n} b_{n})}^{- 1} + b_{n}^{2} + δ_{1}]}, \\ J_{s} (δ_{1}, 0, x) = L_{s} (x) / 2 - N_{n} h_{n} κ_{+} {[h_{n}^{2} ψ_{K} ρ_{s} (x) - δ_{1} κ] / [(b - a) s (x)] + O (h_{n}^{4} + δ_{1}^{2})}, \\ J_{s} (δ_{1}, δ_{2}, x) = J_{s} (δ_{1}, 0, x) + N_{n} h_{n} δ_{2} {κ_{+} / [(b - a) s (x)] + O (h_{n}^{2} + δ_{1} + δ_{2})} . \end{array}

Proof

Recall that F_e and f_e are the distribution and density functions of e_{i,j}. The assumption that e_{i,j} has median zero implies that F_e(0) = 1/2. Notice that

\begin{array}{l} J_{μ} (0, x) - L_{μ} (x) / 2 = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{b_{n}} (x_{i, j} - x) [P {Y_{i, j} \leq μ (x)} - 1 / 2] \\ = \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} K_{b_{n}} (x_{i, j} - x) g (x, x_{i, j}), \end{array}

where g(x, v) = F_e{[μ(x)−μ(v)]/s(v)}−F_e(0). The symmetry of K entails ∫ u^sK(u)du = 0, s = 1, 3. The first expression then follows from Lemma 7.1 (ii) with r = 4.

Similarly, we can show $J_{μ}^{'} (0, x) : = \partial J_{μ} (δ_{1}, x) / \partial δ_{1} ∣_{δ_{1} = 0} = N_{n} b_{n} f_{e} (0) / [(b - a) s (x)] + O (1 + N_{n} b_{n}^{3})$ and $J_{μ}^{″} (δ_{1}, x) : = \partial^{2} J_{μ} (δ_{1}, x) / \partial δ_{1}^{2} = O (N_{n} b_{n})$ uniformly over δ₁, x. So, the second expression follows from the Taylor expansion $J_{μ} (δ_{1}, x) - J_{μ} (0, x) = δ_{1} J_{μ}^{'} (0, x) + O (N_{n} b_{n} δ_{1}^{2})$. The other two expressions can be similarly treated. We omit the details.

7.4. Proof of Theorems 4.1–4.2

Let L_μ(x), L_μ(δ₁, x), J_μ(δ₁, x), L_s(x), L_s(δ₁, δ₂, x) and J_s(δ₁, δ₂, x) be as in Section 7.3.

Proof of Theorem 4.1

Let $δ_{n} = {[{(log N_{n})}^{3} / (N_{n} b_{n})]}^{1 / 2} + b_{n}^{2} \to 0$. Let l_n ↑ ∞ be a positive sequence satisfying δ_nl_n → 0. First, we show Δ̂_μ(x) := μ̂(x) − μ(x) = O_p(l_nδ_n) uniformly over x ∈ $S_{ε} ([a, b])$. Since μ̂(x) is a solution to (4.1), by Koenker (2005, pp. 32–33),

∣ L_{μ} ({\hat{Δ}}_{μ} (x), x) - L_{μ} (x) / 2 ∣ \leq \sum_{i, j} K_{b_{n}} (x_{i, j} - x) 1_{Y_{i, j} = \hat{μ} (x)} = O_{p} (1),

(7.23)

uniformly over x. Let

Ω_{n} (x) = [L_{μ} (l_{n} δ_{n}, x) - J_{μ} (l_{n} δ_{n}, x)] - [L_{μ} (0, x) - J_{μ} (0, x)] .

We can apply Theorem 3.3 with ϖ_{i,j}(x) = K_{b_n}(x_{i,j} − x) to Ω_n(x). For τ_n and φ_n in Condition 3.2, τ_n = O(1/b_n) and φ_n = O(N_nb_n) (see Lemma 7.1). By Theorem 3.3, sup_{x ∈ [a, b]} |Ω_n(x)| = O_p{[N_nb_nl_nδ_n(log N_n)³]^{1/2}}. By the same argument, we can show

sup_{x \in [a, b]} ∣ L_{μ} (0, x) - J_{μ} (0, x) ∣ = O_{p} {{[N_{n} b_{n} {(log N_{n})}^{3}]}^{1 / 2}} .

(7.24)

Hence, by (7.24) and Lemma 7.2, uniformly over x ∈ $S_{ε} ([a, b])$,

\begin{array}{l} L_{μ} (l_{n} δ_{n}, x) - L_{μ} (x) / 2 = [J_{μ} (l_{n} δ_{n}, x) - J_{μ} (0, x)] + [J_{μ} (0, x) - L_{μ} (x) / 2] + [L_{μ} (0, x) - J_{μ} (0, x)] + Ω_{n} (x) \\ = N_{n} b_{n} l_{n} δ_{n} f_{e} (0) / [(b - a) s (x)] [1 + o (1)] + O_{p} (ν_{n}), \end{array}

(7.25)

where $ν_{n} = N_{n} b_{n}^{3} + 1 + {[N_{n} b_{n} {(log N_{n})}^{3}]}^{1 / 2} + {[N_{n} b_{n} l_{n} δ_{n} {(log N_{n})}^{3}]}^{1 / 2}$. Because l_n → ∞ and l_nδ_n → 0, it is easy to see that ν_n = o(N_nb_nl_nδ_n) and N_nb_nl_nδ_n → ∞, which implies L_μ(l_nδ_n, x) − L_μ(x)/2 → ∞ uniformly over x ∈ $S_{ε} ([a, b])$ in view of sup_x s(x) < ∞. Since L_μ(δ₁, x) is non-decreasing in δ₁, (7.23) and (7.25) entail ℙ{sup_x Δ̂_μ(x) ≤ l_nδ_n} → 1. Similarly, ℙ{inf_x Δ̂_μ(x) ≥ −l_nδ_n} → 1. So, sup_x |Δ̂_μ(x)| = O_p(l_nδ_n). Since the rate of l_n → ∞ can be arbitrarily slow, sup_x |Δ̂_μ(x)| = O_p(δ_n).

Again, by (7.23) and Lemma 7.2, uniformly over x ∈ $S_{ε} ([a, b])$,

\begin{array}{l} L_{μ} (0, x) - J_{μ} (0, x) = L_{μ} ({\hat{Δ}}_{μ} (x), x) - J_{μ} ({\hat{Δ}}_{μ} (x), x) + O_{p} [\sqrt{N_{n} b_{n} δ_{n} {(log N_{n})}^{3}}] \\ = [L_{μ} ({\hat{Δ}}_{μ} (x), x) - L_{μ} (x) / 2] + [L_{μ} (x) / 2 - J_{μ} (0, x)] - [J_{μ} ({\hat{Δ}}_{μ} (x), x) - J_{μ} (0, x)] + O_{p} [\sqrt{N_{n} b_{n} δ_{n} {(log N_{n})}^{3}}] \\ = O_{p} (1) + N_{n} b_{n}^{3} ρ_{μ} (x) f_{e} (0) ψ_{K} / [(b - a) s (x)] + O (1 + N_{n} b_{n}^{5}) - N_{n} b_{n} {\hat{Δ}}_{μ} (x) {f_{e} (0) / [(b - a) s (x)] + O (δ_{n})} + O_{p} [\sqrt{N_{n} b_{n} δ_{n} {(log N_{n})}^{3}}] . \end{array}

The representation (4.8) then follows by solving Δ̂_μ(x) from the above equation.

Proof of Theorem 4.2

We use the argument in Theorem 4.1 and only sketch the outline. Let

D_{s} (δ_{1}, δ_{2}, x) = [L_{s} (δ_{1}, δ_{2}, x) - J_{s} (δ_{1}, δ_{2}, x)] - [L_{s} (0, 0, x) - J_{s} (0, 0, x)] .

Using Theorem 3.3, we can show that

\sup_{x \in [a, b]} | L_{s} (0, 0, x) - J_{s} (0, 0, x) | = O_{p} \{ [N_{n} h_{n} (\log N_{n})^{3}]^{1 / 2} \},

(7.26)

\sup_{| δ_{1} | + | δ_{2} | \leq δ_{n}, x \in [a, b]} | D_{s} (δ_{1}, δ_{2}, x) | = O_{p} \{ [N_{n} h_{n} (\log N_{n})^{3}]^{1 / 2} \},

(7.27)

hold for all b_n → 0, h_n → 0 and δ_n → 0 satisfying sup_n log N_n/[N_n min(b_n, h_n)δ_n] < ∞.

Let $δ_{n} = b_{n}^{2} + h_{n}^{2} + {[{(log N_{n})}^{3} / (N_{n} b_{n})]}^{1 / 2} + {[{(log N_{n})}^{3} / (N_{n} h_{n})]}^{1 / 2}$ and l_n → ∞ be a sequence such that l_nδ_n → 0. By Theorem 4.1, Δ̃_μ(x):= μ̃(x) − μ(x) = O_p(δ_n). Using (7.27) and Lemma 7.2, we can derive the following counterpart of (7.25)

\begin{array}{l} L_{s} ({\tilde{Δ}}_{μ} (x), l_{n} δ_{n}, x) - L_{s} (x) / 2 = [J_{s} ({\tilde{Δ}}_{μ} (x), l_{n} δ_{n}, x) - J_{s} ({\tilde{Δ}}_{μ} (x), 0, x)] + [J_{s} ({\tilde{Δ}}_{μ} (x), 0, x) - L_{s} (x) / 2] + L_{s} (0, 0, x) - J_{s} (0, 0, x) + O_{p} {{[N_{n} h_{n} l_{n} δ_{n} {(log N_{n})}^{3}]}^{1 / 2}} \\ = N_{n} h_{n} l_{n} δ_{n} κ_{+} / [(b - a) s (x)] [1 + o_{p} (1)] \to \infty . \end{array}

Let Δ̂_s(x) = ŝ(x) − s(x). By the same argument as in (7.23), sup_x |L_s(Δ̃_μ(x), Δ̂_s(x), x) − L_s(x)/2| = O_p(1). Notice that L_s(Δ̃_μ(x), δ₂, x) is non-decreasing in δ₂. Thus, ℙ{sup_x Δ̂_s(x) ≤ l_nδ_n} → 1. Similarly, ℙ{inf_x Δ̂_s(x) ≥ −l_nδ_n} → 1. Then sup_x |Δ̂_s(x)| = O_p(l_nδ_n) and, since l_n → ∞ can be arbitrarily slow, sup_x |Δ̂_s(x)| = O_p(δ_n).

Write ϖ_n = [N_nh_nδ_n(log N_n)³]^{1/2}. To derive the Bahadur representation (4.14), we use (7.27) and Lemma 7.2 to obtain

\begin{array}{l} L_{s} (0, 0, x) - J_{s} (0, 0, x) = [L_{s} ({\tilde{Δ}}_{μ} (x), {\hat{Δ}}_{s} (x), x) - L_{s} (x) / 2] + [L_{s} (x) / 2 - J_{s} ({\tilde{Δ}}_{μ} (x), 0, x)] - [J_{s} ({\tilde{Δ}}_{μ} (x), {\hat{Δ}}_{s} (x), x) - J_{s} ({\tilde{Δ}}_{μ} (x), 0, x)] + O_{p} (ϖ_{n}) \\ = O_{p} (1) + N_{n} h_{n} κ_{+} \{ [h_{n}^{2} ψ_{K} ρ_{s} (x) - κ {\tilde{Δ}}_{μ} (x)] / [(b - a) s (x)] + O (h_{n}^{4} + δ_{n}^{2}) \} - N_{n} h_{n} {\hat{Δ}}_{s} (x) \{ κ_{+} / [(b - a) s (x)] + O (δ_{n}) \} + O_{p} (ϖ_{n}) . \end{array}

Solving Δ̂_s(x) from the above equation, we obtain the Bahadur representation (4.14).

7.5. Proof of Corollaries 4.1–4.2

Again we use the coupling argument to reduce the dependent data to the m-dependent case. Theorem 7.1 below presents a CLT for m-dependent sequences with unbounded m.

Theorem 7.1 (Romano and Wolf (2000))

Let Z_n,j, 1 ≤ j ≤ d_n, be a triangular array of mean zero k_n-dependent random variables. Define

S_{n} = \sum_{j = 1}^{d_{n}} Z_{n, j}, B_{n}^{2} = Var (S_{n}), S_{n, h, a} = \sum_{j = a}^{a + h - 1} Z_{n, j}, B_{n, h, a}^{2} = Var (S_{n, h, a}) .

Assume that there exist some δ > 0, −1 ≤ γ < 1, and C_{n,1}, C_{n,2}, C_{n,3} > 0 such that

(a) $E ({| Z_{n, j} |}^{2 + δ}) = O (C_{n, 1})$ ;
(b) $B_{n, h, a}^{2} / h^{1 + γ} = O (C_{n, 2})$ for all a and all h ≥ k_n;
(c) $B_{n}^{2} / (d_{n} C_{n, 2}^{γ}) \geq C_{n, 3}$ ;
(d) $C_{n, 2} / C_{n, 3} = O (1)$ ;
(e) $C_{n, 1} / C_{n, 3}^{(2 + δ) / 2} = O (1)$ ;
(f) $k_{n}^{1 + (1 - γ) (1 + 2 / δ)} / d_{n} \to 0$ .

Then S_n/B_n ⇒ N(0, 1).
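A small simulation can make the statement concrete. The sketch below is not part of the paper: it builds a hypothetical k_n-dependent array as a normalized moving average of i.i.d. centred uniform innovations, lets the lag grow like 2 log d_n (an arbitrary choice), computes B_n² exactly, and checks that S_n/B_n looks approximately standard normal.

```python
import numpy as np

def standardized_sums(d_n, k_n, n_rep, rng):
    """Return n_rep draws of S_n / B_n for a k_n-dependent moving-average array."""
    # Innovations: i.i.d. uniform on [-sqrt(3), sqrt(3)], so mean 0 and variance 1.
    eps = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_rep, d_n + k_n))
    # Z_j = (eps_j + ... + eps_{j + k_n}) / sqrt(k_n + 1), j = 1, ..., d_n.
    # Z_j and Z_l share no innovation once |j - l| > k_n, so each row is k_n-dependent.
    csum = np.concatenate([np.zeros((n_rep, 1)), np.cumsum(eps, axis=1)], axis=1)
    z = (csum[:, k_n + 1:] - csum[:, :d_n]) / np.sqrt(k_n + 1.0)
    s_n = z.sum(axis=1)
    # Exact B_n^2 = Var(S_n): innovation t enters S_n with weight counts[t] / sqrt(k_n + 1),
    # where counts[t] is the number of moving-average windows that cover t.
    counts = np.convolve(np.ones(d_n), np.ones(k_n + 1))
    b_n_sq = np.sum(counts**2) / (k_n + 1.0)
    return s_n / np.sqrt(b_n_sq)

rng = np.random.default_rng(2014)
d_n = 1000
k_n = int(2 * np.log(d_n))     # hypothetical slowly growing dependence lag
t = standardized_sums(d_n, k_n, n_rep=2000, rng=rng)

coverage = float(np.mean(np.abs(t) <= 1.96))
print(f"mean={t.mean():.3f}  var={t.var():.3f}  P(|T|<=1.96)={coverage:.3f}")
```

Standardizing by the exact B_n rather than by the square root of d_n matters here, because neighbouring terms are positively correlated and Var(S_n) is much larger than d_n.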

Proof of Corollaries 4.1–4.2

We only prove Corollary 4.1 since Corollary 4.2 can be treated similarly. By the Bahadur representation (4.8), under the specified condition, $r_{n} \sqrt{N_{n} b_{n}} \to 0$ . Thus, it suffices to show (N_nb_n)^{−1/2}Q_{b_n}(x) ⇒ N(0, ϕ_K/[4(b − a)]). Recall e_i,j(k_n) and Ỹ_i,j in (3.1) and (3.5). Define the coupling process

{\tilde{Q}}_{b_{n}} (x) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} \{ 1_{{\tilde{Y}}_{i, j} \leq μ (x)} - E [1_{{\tilde{Y}}_{i, j} \leq μ (x)}] \} K_{b_{n}} (x_{i, j} - x) .

Let the coupling lag k_n = ⌊c log N_n⌋ be chosen as in Theorem 3.2. By Theorem 3.2, Q_{b_n}(x) − Q̃_{b_n}(x) = O_p[(log N_n)²] = o_p[(N_nb_n)^{1/2}]. It remains to show (N_nb_n)^{−1/2}Q̃_{b_n}(x) ⇒ N(0, ϕ_K/[4(b − a)]). Recall M_n = max_{1≤i≤n} m_i. Set Ỹ_i,j = 0 for m_i < j ≤ M_n. Define

Z_{n, j} = \sum_{i = 1}^{n} ζ_{i, j}, \quad \text{where} \quad ζ_{i, j} = {(N_{n} b_{n})}^{- 1 / 2} \{ 1_{{\tilde{Y}}_{i, j} \leq μ (x)} - E [1_{{\tilde{Y}}_{i, j} \leq μ (x)}] \} K_{b_{n}} (x_{i, j} - x) .

Then we can write $- {(N_{n} b_{n})}^{- 1 / 2} {\tilde{Q}}_{b_{n}} (x) = \sum_{j = 1}^{M_{n}} Z_{n, j}$ . Notice that Z_n,j, j = 1, 2, …, are (2k_n + 1)-dependent, and ζ_i,j, i = 1, 2, …, are independent for each fixed j.

Let S_n, $B_{n}^{2}$ , S_n,h,a and $B_{n, h, a}^{2}$ be defined as in Theorem 7.1. We shall verify the conditions in Theorem 7.1. By the independence of the summands ζ_i,j in Z_n,j,

\begin{array}{l} E ({| Z_{n, j} |}^{4}) = \sum_{i = 1}^{n} E ({| ζ_{i, j} |}^{4}) + 6 \sum_{i_{1} < i_{2}} E ({| ζ_{i_{1}, j} |}^{2}) E ({| ζ_{i_{2}, j} |}^{2}) \\ = \frac{O (1)}{{(N_{n} b_{n})}^{2}} \{ \sum_{i = 1}^{n} K_{b_{n}}^{4} (x_{i, j} - x) + {[\sum_{i = 1}^{n} K_{b_{n}}^{2} (x_{i, j} - x)]}^{2} \} = O (1 / M_{n}^{2}), \end{array}

in view of nM_n = O(N_n). Since Ỹ_i,j and Y_i,j have the same distribution, we have $g (x, x_{i, j}) : = Var (1_{{\tilde{Y}}_{i, j} \leq μ (x)}) = F_{e} \{ [μ (x) - μ (x_{i, j})] / s (x_{i, j}) \} - F_{e}^{2} \{ [μ (x) - μ (x_{i, j})] / s (x_{i, j}) \}$ . Recall F_e(0) = 1/2. Then g(x, x) = 1/4. Thus, by (4.9) and the (2k_n + 1)-dependence of Ỹ_i,j in j, applying Lemma 7.2 (ii) with r = 1 produces

\begin{array}{l} B_{n}^{2} = \frac{1}{N_{n} b_{n}} \sum_{i = 1}^{n} \sum_{j = 1}^{m_{i}} Var (1_{{\tilde{Y}}_{i, j} \leq μ (x)}) K_{b_{n}}^{2} (x_{i, j} - x) + \frac{O (1)}{N_{n} b_{n}} \sum_{i = 1}^{n} \sum_{1 \leq j_{1} < j_{2} \leq m_{i}, | j_{1} - j_{2} | \leq 2 k_{n}} K_{b_{n}} (x_{i, j_{1}} - x) K_{b_{n}} (x_{i, j_{2}} - x) \\ = \frac{1}{N_{n} b_{n}} [\frac{N_{n} b_{n} ϕ_{K}}{4 (b - a)} + O (N_{n} b_{n}^{2})] + \frac{O (n M_{n} k_{n} b_{n} ι_{n})}{N_{n} b_{n}} \to \frac{ϕ_{K}}{4 (b - a)}, \end{array}

in view of nM_n = O(N_n) and k_nι_n → 0. Similarly, we can show $B_{n, h, a}^{2} = O (n h / N_{n}) = O (h / M_{n})$ . Therefore, it is easy to see that the conditions in Theorem 7.1 hold with δ = 2, γ = 0, and straightforward choices of C_{n,1}, C_{n,2}, C_{n,3}, completing the proof.
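One way to make these choices explicit is sketched below. The constant c_0 is hypothetical notation. Item (f) additionally assumes k_n^3/M_n → 0, which we take to be covered by the conditions of Corollary 4.1 rather than something shown here. Theorem 7.1 is applied with d_n = M_n and dependence lag 2k_n + 1.

```latex
% Choices (ours): \delta = 2, \gamma = 0, d_n = M_n, lag 2 k_n + 1, and a fixed
% constant 0 < c_0 < \phi_K / [4 (b - a)].
\begin{align*}
C_{n,1} &= M_n^{-2}, \qquad C_{n,2} = M_n^{-1}, \qquad C_{n,3} = c_0 M_n^{-1}. \\
\text{(a)}\quad & E(|Z_{n,j}|^{4}) = O(M_n^{-2}) = O(C_{n,1}); \\
\text{(b)}\quad & B_{n,h,a}^{2} / h = O(M_n^{-1}) = O(C_{n,2}); \\
\text{(c)}\quad & B_n^{2} / M_n \ge c_0 / M_n = C_{n,3} \ \text{for all large } n,
                 \ \text{since } B_n^{2} \to \phi_K / [4 (b - a)] > c_0; \\
\text{(d)}\quad & C_{n,2} / C_{n,3} = c_0^{-1} = O(1); \\
\text{(e)}\quad & C_{n,1} / C_{n,3}^{(2 + \delta)/2} = M_n^{-2} / (c_0 M_n^{-1})^{2} = c_0^{-2} = O(1); \\
\text{(f)}\quad & (2 k_n + 1)^{1 + (1 - \gamma)(1 + 2/\delta)} / M_n = (2 k_n + 1)^{3} / M_n \to 0.
\end{align*}
```

With γ = 0 the factor raised to the power γ in condition (c) equals one, so that condition reduces to a lower bound on B_n²/M_n.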

Acknowledgments

We are grateful to an associate editor and three anonymous referees for their insightful comments. Wei’s research was supported by the National Science Foundation (DMS-0906568) and a career award from NIEHS Center for Environmental Health in Northern Manhattan (ES-009089). Zhao’s research was supported by a NIDA grant P50-DA10075-15. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

References

Andrews DWK. Non-strong mixing autoregressive processes. J Appl Probab. 1984;21:930–934.
Andrews DWK, Pollard D. An introduction to functional central limit theorems for dependent stochastic processes. Int Stat Rev. 1994;62:119–132.
Bennett G. Probability inequalities for the sum of independent random variables. J Amer Statist Assoc. 1962;57:33–45.
Bhattacharya PK, Gangopadhyay AK. Kernel and nearest-neighbor estimation of a conditional quantile. Ann Statist. 1990;18:1400–1415.
Brumback B, Rice KA. Smoothing spline models for the analysis of nested and crossed samples of curve. J Amer Statist Assoc. 1998;93:961–994.
Cai ZW. Regression quantiles for time series. Econometric Theory. 2002;18:169–192.
Chaudhuri P. Nonparametric estimates of regression quantiles and their local Bahadur representation. Ann Statist. 1991;19:760–777.
Dedecker J, Prieur C. New dependence coefficients. Examples and applications to statistics. Probab Theor Relat Field. 2005;132:203–236.
Fan J, Yao Q. Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer-Verlag; 2003.
Fan J, Zhang JT. Two-step estimation of functional linear models with applications to longitudinal data. J Roy Statist Soc Ser B. 2000;62:303–322.
Hallin M, Lu Z, Yu K. Local linear spatial quantile regression. Bernoulli. 2009;15:659–686.
He X, Fu B, Fung WK. Median regression for longitudinal data. Stat Med. 2003;22:3655–3669. doi: 10.1002/sim.1581.
He X, Zhu ZY, Fung WK. Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika. 2002;89:579–590.
Ho HC, Hsing T. On the asymptotic expansion of the empirical process of long-memory moving averages. Ann Statist. 1996;24:992–1024.
Honda T. Nonparametric estimation of a conditional quantile for α-mixing processes. Ann Inst Statist Math. 2000;52:459–470.
Hoover D, Rice J, Wu C, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
Koenker R, Bassett G Jr. Regression quantiles. Econometrica. 1978;46:33–50.
Koenker R. Quantile regression for longitudinal data. J Multivariate Anal. 2004;91:74–89.
Koenker R. Quantile Regression. New York: Cambridge University Press; 2005.
Li Q, Racine J. Nonparametric Econometrics. New Jersey: Princeton University Press; 2007.
Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. J Roy Statist Soc Ser B. 1991;53:233–243.
Romano JP, Wolf M. A more general Central Limit Theorem for m-dependent random variables with unbounded m. Statist Probab Lett. 2000;47:115–124.
Shao QM, Yu H. Weak convergence for weighted empirical processes of dependent sequences. Ann Probab. 1996;24:2098–2127.
Shao X, Wu WB. Asymptotic spectral theory for nonlinear time series. Ann Statist. 2007;35:1773–1801.
Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. New York: Wiley; 1986.
Truong YK, Stone CJ. Nonparametric function estimation involving time series. Ann Statist. 1992;20:77–97.
Walker E, Wright SP. Comparing curves using additive models. J Qual Technol. 2002;34:118–129.
Wang H, Fygenson W. Inference for censored quantile regression models in longitudinal studies. Ann Statist. 2009;37:756–781.
Wang HJ, Zhu Z, Zhou J. Quantile regression in partially linear varying coefficient models. Ann Statist. 2009;37:3841–3866.
Wei Y, Zhao Z, Lin D. Profile control charts based on nonparametric L-1 regression methods. Ann Appl Statist. 2012;6:409–427. doi: 10.1214/11-AOAS501.
Wu WB. Nonlinear system theory: Another look at dependence. P Natl Acad Sci USA. 2005;102:14150–14154. doi: 10.1073/pnas.0506715102.
Wu WB. Empirical processes of stationary sequences. Statist Sinica. 2008;18:313–333.
Wu WB, Zhao Z. Inference of trends in time series. J Roy Statist Soc Ser B. 2007;69:391–410.
Wu H, Zhang JT. Local polynomial mixed-effects models for longitudinal data. J Amer Statist Assoc. 2002;97:883–897.
Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. Ann Statist. 2005;33:2873–2903.
Yu K, Lu Z, Stander J. Quantile regression: applications and current research areas. The Statistician. 2003;52:331–350.
Yu K, Jones MC. Local linear quantile regression. J Amer Statist Assoc. 1998;93:228–237.
