Single-index Varying Coefficient Model for Functional Responses

Xinchao Luo; Lixing Zhu; Hongtu Zhu

doi:10.1111/biom.12526

. Author manuscript; available in PMC: 2017 Jun 1.

Published in final edited form as: Biometrics. 2016 Apr 8;72(4):1275–1284. doi: 10.1111/biom.12526

Single-index Varying Coefficient Model for Functional Responses

Xinchao Luo ^1,², Lixing Zhu ³, Hongtu Zhu ^2,^✉

PMCID: PMC5055851 NIHMSID: NIHMS778174 PMID: 27061414

Summary

Recently, massive functional data have been widely collected over space across a set of grid points in various imaging studies. It is interesting to correlate functional data with various clinical variables, such as age and gender, in order to address scientific questions of interest. The aim of this paper is to develop a single-index varying coefficient (SIVC) model for establishing a varying association between functional responses (e.g., image) and a set of covariates. It enjoys several unique features of both varying-coefficient and single-index models. An estimation procedure is developed to estimate varying coefficient functions, the index function, and the covariance function of individual functions. The optimal integration of information across different grid points are systematically delineated and the asymptotic properties (e.g., consistency and convergence rate) of all estimators are examined. Simulation studies are conducted to assess the finite-sample performance of the proposed estimation procedure. Furthermore, our real data analysis of a white matter tract dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study confirms the advantage and accuracy of SIVC model over the popular varying coefficient model.

Keywords: Functional response, Image analysis, Single index, Uniform convergence, Varying coefficient

1. Introduction

As a semiparametric regression modelling strategy, single-index modelling has attracted much attention in the literature due to its balance between exibility and fidelity. A classical single-index model is often written as

Y = g (X^{T} β) + ε,

(1)

where Y is a response variable, g(·) is an unknown index function, X is a covariate vector, and ε is an error term such that E(ε|X) = 0. See Horowitz (2009) for a comprehensive review of various estimation methods for single-index models and references therein (Li, 1991; Cook and Weisberg, 1991; Zhu et al., 2010; Xia et al., 2002; Xia, 2007; Ma and Zhu, 2012, 2013). For instance, dimension reduction approaches, such as likelihood-based methods (Cook and Forzani, 2009), are also commonly used for estimation. However, the existing literature primarily considers univariate response observed from cross-sectional studies.

This paper is motivated by the analysis of a real diffusion weighted imaging (DWI) data set with n = 213 subjects collected from NIH Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. For each subject, we calculated a Fractional Anisotropy (FA) curve at all the 83 grid points along the skeleton of the midsagittal corpus callosum as shown in Figure 1. We are interested in establishing an association between FA curves and several covariates of interest, such as age and gender. To establish such association, standard grid-wise methods are to fit a linear model to functional observations at each grid point as responses and clinical variables, such as age and gender, as covariates, and to generate a statistical map of test statistics or p-values across all grid points (Lazar, 2008;Worsley et al., 2004). These grid-wise methods have several major limitations. First, compared with model (1), the classical linear model used in the neuroimaging literature is often restrictive, since it assumes that the index function g(·) is an identity function. When g(·) is truly nonlinear, directly fitting a classical linear model can cause substantial efficiency loss and reduce prediction accuracy. Second, since the grid-wise methods treat all grid points as independent units, they ignore two key functional features of functional data including spatial smoothness and spatial correlation.

ADNI data analysis: raw FA curves measured at 83 grid points (upper-left panel), the estimated index function with the broken red lines representing 95% simultaneous Confidence bands (upper-right panel), the estimated accumulative proportion of estimated eigenvalues (lower-left panel) and estimated eigenfunctions (lower-right panel) corresponding to the five largest eigenvalues.

Some advanced methods have been developed to specifically incorporate these features by using function-on-scalar regression under the functional data analysis (FDA) framework (Zhu et al., 2012; Ramsay and Silverman, 2005; Staicu et al., 2010; Zhang and Chen, 2007; Reiss et al., 2010). Some important estimation methods for FDA include adaptive smoothing methods (Polzehl and Spokoiny, 2006; Li et al., 2011), the integration of FDA and adaptive smoothing methods (Zhu et al., 2014), and spatial priors within the Bayesian framework (Gossl et al., 2001; Penny et al., 2005; Bowman et al., 2008; Smith and Fahrmeir, 2007; Yue et al., 2010; Miranda et al., 2013), among others. See Morris (2015) and Wang et al. (2015) for a comprehensive review of various FDA models for functional responses. However, to the best of our knowledge, none of the references cited above address the two functional features and estimate the nonparametric index function simultaneously.

The aim of this paper is to develop a single-index varying coefficient (SIVC) model to establish a varying association between functional responses and a set of covariates. Specifically, we extend the single-index model (1) to SIVC for functional responses as follows:

Y (s) = g (X^{T} β (s)) + ε (s),

(2)

where {Y (s) : s ∈ 𝒮} is an observed stochastic process on a compact set 𝒮 and ε(s) is a random function characterizing the within-subject correlations and measurement errors at different grid points such that E{ε(s)|X} = 0 for all s ∈ 𝒮. The β(s) allows us to characterize the dynamic association between covariate X and functional response, whereas g(·) is a nonparametric function, while being fixed across all s ∈ 𝒮. Model (2) differs from the functional single index model in Jiang and Wang (2011), in which g(·) varies across s, but β(s) is assumed to be stationary. When g(x) = x, model (2) reduces to standard voxel-wise methods based on linear model and the most popular FDA model considered in (Zhang and Chen, 2007; Ramsay and Silverman, 2005; Zhu et al., 2014). For notational simplicity, we set 𝒮 = [0, 1]. The results can be readily extended to more general cases with compact subset 𝒮 of the Euclidean space.

Compared with the existing literature, we make several unique contributions. (i) We develop a new estimation procedure to estimate various parametric and non-parametric components of SIVC. (ii) Theoretically, we delineate the integration of information across all grid points by using an optimal weight function and then establish the asymptotic properties of various estimates for SIVC. (iii) Our analysis of the ADNI data confirms the advantage and accuracy of SIVC model over the popular varying coefficient model (Zhang and Chen, 2007; Ramsay and Silverman, 2005; Zhu et al., 2014).

The rest of this paper is organized as follows. Section 2 introduces the estimation procedure to estimate varying coefficient functions, index function, and the covariance function of individual functions. Section 3 systematically investigates the asymptotic properties of all estimators. A simulation study and a real data analysis of Alzheimer’s Disease Neuroimaging Initiative (ADNI) study are presented in Section 4 to demonstrate the finite sample performance of SIVC. Section 5 concludes with some discussions.

2. Methods

2.1 Single-index Varying Coefficient Model

We formally introduce the single-index varying coefficient model as follows. Consider {({Y_i(s) : s ∈ 𝒮}, X_i) : i = 1, …, n} from n independent subjects, where 𝒮 is a compact set that characterizes the range of all possible grid points. Our single-index varying coefficient (SIVC) model is written as

Y_{i} (s) = g (X_{i}^{T} β (s)) + η_{i} (s) + ε_{i} (s) for all s \in 𝒮 and i = 1, \dots, n,

(3)

where X_i is a p × 1 covariate vector, β(s) = (β₁(s), ⋯, β_p(s))^T is a p × 1 vector of varying coefficient functions, g(·) is an unknown index function, η_i(s) characterizes individual curve variations, and ε_i(s) is a random function of measurement errors. The process {η_i(s) : s ∈ 𝒮} is assumed to be a Gaussian process with zero mean and a covariance function Σ_η(s, t) = Cov{η(s), η(t)}. The error terms ε_i(s) follow a Gaussian process with zero mean and a diagonal covariance function Cov{ε(s), ε(t)}. That is, ε_i(s) and ε_i(t) are assumed to be independent for s ≠ t, and Cov{ε(s), ε(t)} takes the form of $σ_{ε}^{2} (s) 1 (s = t)$ , where 1(·) is an indicator function. Moreover, Y_i(s) are usually measured at the same set of locations for all subjects and exhibit both the within-curve and between-curve dependence structures. Thus, without loss of generality, it is assumed that Y_i(s) are observed on M grid points 𝒮_M = {s_m : 0 = s₁ ≤ ⋯ ≤ s_M = 1} for all subjects.

For single index model, β(s) is not identifiable since g(X^T β(s)) and g(β₀(s) + δX^T β(s)) are not distinguishable, where δ is any nonzero scalar. A simple solution in the literature (Zhu et al., 2010; Xia et al., 2002) is to impose some constraints on β₀(s) and β(s), such as β₀(s) = 0 and β(s)^T β(s) = 1. Therefore, throughout the paper, it is assumed that X does not contain the intercept, β(s)^T β(s) = 1 holds for all s ∈ 𝒮 and the first entry of β(s) is positive at each s.

2.2 Estimation Procedure

Our estimation procedure consists of three steps for estimating the varying coefficient functions β(·), the index function g(·), and the covariance function Σ_η(s, s′).

Step I. Estimating varying coefficient β(s)

We first consider the estimation of β(s_m) at each given grid point s_m. Model (3) reduces to a classical single-index model given by

Y_{i} (s_{m}) = g (X_{i}^{T} β (s_{m})) + ε_{i}^{*} (s_{m}),

(4)

where $ε_{i}^{*} (s_{m}) = η_{i} (s_{m}) + ε_{i} (s_{m})$ such that $E {ε_{i}^{*} (s_{m}) | X_{i}} = 0$ and $Var {ε^{*} (s_{m}) | X_{i}} = σ_{ε *}^{2} (s_{m}) = Σ_{η} (s_{m}, s_{m}) + σ_{ε}^{2} (s_{m})$ . The likelihood function of a random observation (X, Y (s_m)) in model (4) is given by

f_{1} (X) \times f_{2} (Y (s_{m}) - g (X^{T} β (s_{m})); X),

where f₁ is the probability density function of X, and f₂ is the conditional probability density function of ε*(s_m) = Y (s_m) − g(X^T β(s_m)) given X.

Based on the Gaussian assumption of $ε_{i}^{*} (s)$ , f₂(·) is the normal density with mean zero and variance $σ_{ε *}^{2} (s_{m})$ . Thus, the score function for β(s_m), denoted as S(β(s_m)), is given by

S (β (s_{m})) = - X ε^{*} (s_{m}) ġ (X^{T} β (s_{m})) / σ_{ε *}^{2} (s_{m}) = (I) + (I I) = - E {X | X^{T} β (s_{m})} ε^{*} (s_{m}) ġ (X^{T} β (s_{m})) / σ_{ε *}^{2} (s_{m}) - [X - E {X | X^{T} β (s_{m})}] ε^{*} (s_{m}) ġ (X^{T} β (s_{m})) / σ_{ε *}^{2} (s_{m}),

(5)

where ġ(t) = dg(t)/dt. Following the reasoning in Ma and Zhu (2014), (I) and (II) belong to the tangent space of model (4) with respect to β(s_m), denoted by Λ_g(s_m), and its orthogonal component, denoted by Λ_g(s_m)^⊥, respectively. For any s ∈ 𝒮, Λ_g(s) and Λ_g(s)^⊥ are, respectively, given by

Λ_{g} (s) = {- ε^{*} (s) h (X^{T} β (s)) / σ_{ε *}^{2} (s) : \forall h (\cdot)},

Λ_{g} {(s)}^{⊥} = {- ε^{*} (s) [α (X) - E {α (X) | X^{T} β (s)}] : \forall α (\cdot)} .

Therefore, the efficient score function for β(s_m) is given by

S_{eff} (β (s_{m}); X, Y (s_{m})) = ε^{*} (s_{m}) [X - E {X | X^{T} β (s_{m})}] ġ (X^{T} β (s_{m})) .

(6)

To calculate an efficient estimator of β(s_m), denoted as β̃(s_m), we can solve

\sum_{i = 1}^{n} S_{eff} (\tilde{β} (s_{m}); X_{i}, Y_{i} (s_{m})) = 0 .

(7)

Since S_eff depends on three unknown quantities E{X|X^T β(s_m)}, g(X^T β(s_m)) and ġ(X^T β (s_m)), we construct their nonparametric estimators as follows (Ma and Zhu, 2013). The Nadaraya-Watson kernel estimator of E{X|X^T β(s_m)} is given by

Ê {X | X^{T} β (s_{m})} = \frac{\sum_{i = 1}^{n} X_{i} K_{h_{x}} (X_{i}^{T} β (s_{m}) - X^{T} β (s_{m}))}{\sum_{i = 1}^{n} K_{h_{x}} (X_{i}^{T} β (s_{m}) - X^{T} β (s_{m}))},

(8)

where K_h(·) = K(·/h)/h is a kernel function and h_x is a given bandwidth. We can calculate the estimates of g(X^T β(s_m)) and ġ(X^T β(s_m)) at X^T β(s_m) = X^T β₀(s_m), denoted by ĝ(X^T β₀(s_m)) and ġ̂(X^T β₀(s_m)), by minimizing

\sum_{i = 1}^{n} {[Y_{i} (s_{m}) - g (X^{T} β_{0} (s_{m})) - ġ (X^{T} β_{0} (s_{m})) (X_{i}^{T} - X^{T}) β_{0} (s_{m})]}^{2} K_{h_{y}} ((X_{i}^{T} - X^{T}) β_{0} (s_{m})) .

(9)

As suggested by Ma and Zhu (2014), we set h_x = cn^−1/3 and h_y = cn^−1/5, where c is the average standard deviation of X. By plugging Ê{X|X^T β(s_m)}, ĝ(X^T β₀(s_m)) and ġ̂(X^T β₀(s_m)) into (7), we get an estimate of S_eff(β(s_m); X_i, Y_i(s_m)), denoted by Ŝ_eff(β(s_m); X_i, Y_i(s_m)) and then calculate β̃(s_m) by solving

\sum_{i = 1}^{n} Ŝ_{eff} (\tilde{β} (s_{m}); X_{i}, Y_{i} (s_{m})) = 0 .

(10)

To estimate β(s) at any s ∈ 𝒮, we need to construct a weighted estimating equation to borrow information across all grid points by using the functional features of imaging data (Wang et al., 2004; Zhu et al., 2012; Polzehl and Spokoiny, 2006; Li et al., 2011). Specifically, the weighted estimating equation is given by

Ŝ_{n M} (β (s); w) = \sum_{i = 1}^{n} \sum_{m = 1}^{M} w (s_{m}, s) Ŝ_{eff} (β (s); X_{i}, Y_{i} (s_{m})),

(11)

where w(s_m, s) is a weight function of (s_m, s) and may depend a few parameters, such as bandwidth. Then, we calculate an estimate of β(s), denoted by β̂(s), by solving the following equation:

Ŝ_{n M} (\hat{β} (s); w) = 0 .

(12)

A critical issue in (12) is how to select an optimal weight function w(s_m, s). Theoretically, one may choose w(s_m, s) by minimizing the mean integrated squares error (MISE) of β̂(·) for all s ∈ 𝒮, but it can be challenging due to the lack of precise information about β(s). Without such information, a simple solution is to set w(s_m, s) = K((s_m − s)/h)/h and then select the bandwidth h by using some criteria, such as cross-validation method, based on MISE. Furthermore, when β(s) is a piecewise constant function, we will show how to optimally select w(s_m, s) in Section 3.

Step II. Estimating the unknown index function g(·)

We use the local linear technique to estimate g(X^T β(s)). We define $Z_{i, m} (s) = {(1, {X_{i}^{T} β (s_{m}) - X^{T} β (s)} / h_{1})}^{T}$ and

Σ (X^{T} β (s), h_{1}) = \sum_{i = 1}^{n} \sum_{m = 1}^{M} K_{h_{1}} (X_{i}^{T} β (s_{m}) - X^{T} β (s)) Z_{i, m} {(s)}^{\otimes 2},

where Z^⊗2 = ZZ^T for any vector Z. By replacing β(s_m) by β̂(s_m), we get Ẑ_i,m(s) and Σ̂(X^T β(s), h₁). Denote G(X^T β (s)) = (g(X^T β (s)), h₁ġ(X^T β (s)))^T, we directly minimize a weighted least square function given by

\sum_{i = 1}^{n} \sum_{m = 1}^{M} {Y_{i} (s_{m}) - Ẑ_{i, m} {(s)}^{T} G (X^{T} β (s))}^{2} K_{h_{1}} (X_{i}^{T} \hat{β} (s_{m}) - X^{T} β (s)) .

(13)

Thus, we have ĝ(X^T β(s)) = [1 0]Ĝ (X^T β (s)), where Ĝ (X^T β(s)) is given by

\hat{Σ} {(X^{T} β (s), h_{1})}^{- 1} \sum_{i = 1}^{n} \sum_{m = 1}^{M} K_{h_{1}} (X_{i}^{T} \hat{β} (s_{m}) - X^{T} β (s)) Ẑ_{i, m} (s) Y_{i} (s_{m}) .

The bandwidth h₁ is chosen by using the cross-validation method.

Step III. Estimating the covariance function Σ_η(s, t)

Let d_i(s) = (η_i(s), h₂η̇_i(s))^T, W_m,s = (1, (s_m−s)/h₂)^T, and $Y_{i}^{*} (s_{m}) = Y_{i} (s_{m}) - ĝ (X_{i}^{T} \hat{β} (s_{m}))$ . We minimize the following function

\sum_{m = 1}^{M} {Y_{i}^{*} (s_{m}) - W_{m, s}^{T} d_{i} (s)}^{2} K_{h_{2}} (s_{m} - s)

(14)

to obtain

{\hat{d}}_{i} (s) = {(\sum_{m = 1}^{M} K_{h_{2}} (s_{m} - s) W_{m, s}^{\otimes 2})}^{- 1} \sum_{m = 1}^{M} K_{h_{2}} (s_{m} - s) W_{m, s} Y_{i}^{*} (s_{m}) .

Then, η(s) can be estimated by

{\hat{η}}_{i} (s) = [10] {\hat{d}}_{i} (s) = \sum_{m = 1}^{M} {\tilde{K}}_{h_{2}} (s_{m} - s) Y_{i}^{*} (s_{m}),

(15)

where ${\tilde{K}}_{h_{2}} (s_{m} - s) = [1 0] {(\sum_{m = 1}^{M} K_{h_{2}} (s_{m} - s) W_{m, s}^{\otimes 2})}^{- 1} K_{h_{2}} (s_{m} - s) W_{m, s}$ , which is the empirical equivalent kernel. The bandwidth h₂ is also chosen by using the cross-validation method.

We consider the spectral decomposition of Σ_η(s, t) and its approximation. Suppose Σ_η(s, t) is continuous on 𝒮², then according to Mercer’s theorem, it can be decomposed as

Σ_{η} (s, t) = \sum_{k = 1}^{\infty} λ_{k} ψ_{k} (s) ψ_{k} (t),

where λ₁ ≥ λ₂ ≥ ⋯ ≥ 0 are ordered eigenvalues and ψ_k(s) are the corresponding orthonormal eigenfunctions. Furthermore, the eigenfunctions form an orthonormal system on the space of square-intergrable function on 𝒮, and η_i(s) admits the Karhunen-Loeve expansion as $η_{i} (s) = \sum_{k = 1}^{\infty} ξ_{i k} ψ_{k} (s)$ , where $ξ_{i k} = \int_{0}^{1} η_{i} (s) ψ_{k} (s) d s$ is the k-th functional principal component scores of the i-th subject. For a fixed i, ξ_ik are uncorrelated random variables with mean zero and variance λ_k.

By following Rice and Silverman (1991), the covariance matrix Σ_η(s, t) and its spectral decomposition can be estimated by

{\hat{Σ}}_{η} (s, t) = {(n - p)}^{- 1} \sum_{i = 1}^{n} {\hat{η}}_{i} (s) {\hat{η}}_{i} (t) = \sum_{k = 1}^{\infty} {\hat{λ}}_{k} {\hat{ψ}}_{k} (s) {\hat{ψ}}_{k} (t),

(16)

where λ̂₁ ≥ λ̂₂ ≥ ⋯ ≥ 0 are estimated eigenvalues and ψ̂_k(s) are the corresponding estimated eigenfunctions. Moreover, the k-th functional principal component scores can be computed by ${\hat{ξ}}_{i k} = \sum_{m = 1}^{M} {\hat{η}}_{i} (s_{m}) {\hat{ψ}}_{k} (s_{m}) (s_{m} - s_{m - 1})$ for i = 1, …, n.

2.3 Simultaneous Confidence Bands

Given a Confidence level α, we construct a simultaneous Confidence band for each β_l(s) such that $P ({\hat{β}}_{l}^{L, α} (s) < β_{l} (s) < {\hat{β}}_{l}^{U, α} (s) for all s \in 𝒮) = 1 - α$ where ${\hat{β}}_{l}^{L, α} (s)$ and ${\hat{β}}_{l}^{U, α} (s)$ are the lower and upper limits of simultaneous Confidence band, respectively. Specifically, we set

{\hat{β}}_{l}^{L, α} (s) = {\hat{β}}_{l} (s) - bias ({\hat{β}}_{l} (s)) - C_{β_{l}} (α) and {\hat{β}}_{l}^{U, α} (s) = {\hat{β}}_{l} (s) - bias ({\hat{β}}_{l} (s)) + C_{β_{l}} (α),

(17)

where bias(β̂_l(s)) is the bias of β̂_l(s) at s ∈ 𝒮 and C_{β_l}(α) is a scalar. By following the arguments in Zhu et al. (2012), we use the local polynomial technique to estimate bias(β̂_l(s)) for each l and then approximate C_{β_l}(α) by using the wild bootstrap as follows:

Step 1: We calculate ${\hat{r}}_{i} (s_{m}) = Y_{i} (s_{m}) - ĝ (X_{i}^{T} \hat{β} (s_{m}))$ for all i and m.
Step 2: For q = 1, …, Q, we independently generate { $τ_{i}^{(q)}$ : i = 1, ⋯, n} from N(0, 1) and construct $Ŷ_{i} {(s_{m})}^{(q)} = ĝ (X_{i}^{T} \hat{β} (s_{m})) + τ_{i}^{(q)} {\hat{r}}_{i} (s_{m})$ . Then, based on {Y_i(s_m)^(q)}, we recalculate β̂(s)^(q), and obtain a stochastic process G_{β_l}(s)^(q) = | β̂_l(s)(q) − β̂_l(s)^(q)| for each l.
Step 3: For all q, we calculate the 1 − α empirical percentile of G_{β_l}(s)^(q), denoted by C_{β_l}(s, α), at each s ∈ 𝒮. Finally, an estimate of C_{β_l}(α) is sup_s |C_{β_l}(s, α)|.

Similarly, for a given α, we construct a simultaneous Confidence band for g(·) as follows:

P (ĝ^{L, α} (u) < g (u) < ĝ^{U, α} (u) for all u \in 𝒰) = 1 - α,

where 𝒰 is a compact set in R and ĝ^L,α(u) and ĝ^U,α(u) are the lower and upper limits of simultaneous Confidence band, respectively. Specifically, we set

ĝ^{L, α} (u) = ĝ (u) - bias (ĝ (u)) - C_{g} (α) and ĝ^{U, α} (u) = ĝ (u) - bias (ĝ (u)) + C_{g} (α) .

(18)

Subsequently, we use the local polynomial technique to estimate bias(ĝ(u)) and then use the wild bootstrap to approximate C_g(α) as follows:

Step 1: We calculate ${\hat{r}}_{i} (s_{m}) = Y_{i} (s_{m}) - ĝ (X_{i}^{T} \hat{β} (s_{m}))$ for all i and m.
Step 2: For q = 1, …, Q, we independently generate { $τ_{i}^{(q)}$ : i = 1, ⋯, n} from N(0, 1) and construct $Ŷ_{i} {(s_{m})}^{(q)} = ĝ (X_{i}^{T} \hat{β} (s_{m})) + τ_{i}^{(q)} {\hat{r}}_{i} (s_{m})$ . Then, based on {Y_i(s_m)^(q)}, we recalculate ĝ(·)^(q), and obtain a stochastic process G_g(u)^(q) = |ĝ(u) − ĝ(u)^(q)|.
Step 3: For all ℓ, we calculate the 1 − α empirical percentile of G_g(u)^(q) denoted by C_g(u, α) at every u ∈ 𝒰 and then approximate C_g(α) by using sup_u∈𝒰 |C_g(u, α)|.

3. Theoretical Results

3.1 Optimal weight Functions

We consider a challenging issue of optimally selecting the weight function w(s_m, s) in order to gain efficiency, since Ŝ_nM (β(s);w) for a given weight function w(·, ·) may not be an efficient estimating equation. Specifically, Λ_g(s) and Λ_g(s′) may interact with each other for s ≠ s′ by noting that

E [ε^{*} (s) h (X^{T} β (s)) ε^{*} (s') h' (X^{T} β (s')) / {σ_{ε *}^{2} (s) σ_{ε *}^{2} (s')}] = Σ_{η} (s, s') E {h (X^{T} β (s)) h' (X^{T} β (s'))} / {σ_{ε *}^{2} (s) σ_{ε *}^{2} (s')} .

(19)

Therefore, Λ_g(s) and Λ_g(s′) are orthogonal with each other for s ≠ s′ if and only if Σ_η(s, s′) = 0 for s ≠ s′. It also holds for {Λ_g(s)^⊥ : s ∈ 𝒮} due to Λ_g(s) ⊥ Λ_g(s)^⊥.

First, we consider how to choose the weight function when β(s) = β₀ does not vary across s ∈ 𝒮. With some calculations, we can show that the covariance matrix of $\sqrt{n} \hat{β} (s)$ can be approximated by

Σ_{1} (w (s)) = D (w (s)) E {{[X - E {X | X^{T} β_{0}}]}^{\otimes 2} ġ {(X^{T} β_{0})}^{2}}^{- 1},

(20)

where w(s) = (w(s₁, s), ⋯, w(s_M, s))^T and D(w(s)) is given by

\frac{\sum_{m, m' = 1}^{M} w (s_{m}, s) w (s_{m'}, s) {Σ_{η} (s_{m}, s_{m'}) + σ_{ε}^{2} (s_{m}) 1 (s_{m} = s_{m'})}}{{\sum_{m = 1}^{M} w (s_{m}, s)}^{2}} .

(21)

We can obtain an optimal weight vector, denoted by w_*, by minimizing Σ₁(w(s)) such that w_* = argmin_w(s)Σ₁(w(s)). As shown in Theorem 1 (i) below, w_* is associated with the eigenvalue-eigenvector pairs of $Λ_{ε *, M}^{- 1 / 2} Σ_{η *, M} Λ_{ε *, M}^{- 1 / 2}$ , denoted by {(λ_m*,M, ψ_m*,M) : m = 1, …, M}, where Σ_η*,M = (Σ_η(s_m, s_m′)) and $Λ_{ε *, M} = (σ_{ε}^{2} (s_{m}) 1 (s_{m} = s_{m'}))$ are two M × M matrices.

Second, we set ω(s_m, s) as K_h(s_m − s), which is a kernel function of (s_m − s)/h, when β(s) may vary across s ∈ 𝒮. If h → 0, then it can be shown that the covariance matrix of $\sqrt{n} \hat{β} (s)$ can be approximated by Σ₁(w(s)) in (20). We will show in Theorem 1 (ii) that the use of the kernel function can lead to substantial efficiency gain even under this general scenario.

Theorem 1

We have the following results.

Suppose that β(s) = β₀ does not vary across s ∈ 𝒮. The optimal w_* is given by
$w_{*} = \sum_{ε *, M}^{- 1} 1_{M} / {‖ \sum_{ε *, M}^{- 1} 1_{M} ‖}_{2},$ (22)
where ‖·‖₂ is the Euclidean norm of a vector, Σ_ε*,M = Σ_η*,M+Λ_ε*,M is an M×M matrix, and 1_M is an M × 1 vector of ones. Thus, the optimal D(w) is given by $D (w_{*}) = {(1_{M}^{T} \sum_{ε *, M}^{- 1} 1_{M})}^{- 1}$ and is independent of s. The $Σ_{ε *, M}^{- 1} 1_{M}$ can be written as
${I_{M} - Λ_{ε *, M}^{- 1 / 2} \sum_{m = 1}^{M} \frac{λ_{m *, M}}{1 + λ_{m *, M}} ψ_{m *, M} ψ_{m *, M}^{T} Λ_{ε *, M}^{- 1 / 2}} Λ_{ε *, M}^{- 1 / 2} 1_{M} .$ (23)
Suppose that β(s) may vary across s ∈ 𝒮. Under Assumptions (C6) and (C7), if w(s_m, s) = K_h(s_m − s), h → 0, and Mh → ∞, then D(w(s)) can be approximated by Σ_η(s, s).

Theorem 1 has several interesting implications. If Σ_η(s, s′) = 0 for any s ≠ s′, then w_* is proportional to $Λ_{ε *, M}^{- 1} 1_{M} = {(σ_{ε}^{- 2} (s_{1}), \dots, σ_{ε}^{- 2} (s_{M}))}^{T}$ . In this case, we can set $w (s_{m}, s) = σ_{ε}^{- 2} (s_{m})$ for all m, and then the optimal weighted estimating equation is given by

Ŝ_{n M} (β (s); w_{*}) = \sum_{m = 1}^{M} \sum_{i = 1}^{n} ε_{i}^{*} (s_{m}) [X_{i} - E {X_{i} | X_{i}^{T} β (s_{m})}] ġ (X_{i}^{T} β (s_{m})) / σ_{ε *}^{2} (s_{m}) .

Theorem 1 (i) also implies that in general cases, w(s_m, s) is given by

σ_{ε}^{- 2} (s_{m}) - σ_{ε} {(s_{m})}^{- 1} \sum_{m' = 1}^{M} \frac{λ_{m' *, M}}{1 + λ_{m' *, M}} e_{m}^{T} ψ_{m' *, M} ψ_{m' *, M}^{T} Λ_{ε *, M}^{- 1 / 2} 1_{M},

where e_m is an M × 1 vector with the m-th element one and zero otherwise. For functional data, it is common to assume that λ_m*,M = 0 for all m > K, where K is a positive integer. Under some additional conditions, it can be shown that λ_m*,M and ψ_m*,M converge to the m–th eigenvalue and its corresponding eigenfunction of the covariance function Σ_η(s, s′)/{σ_ε(s)σ_ε(s′)}, respectively.

Another implication of Theorem 1 (i) is the lower bound of D(w_*). Specifically, D(w_*) is given by

{(1_{M}^{T} Λ_{ε *, M}^{- 1} 1_{M} - 1_{M}^{T} Λ_{ε *, M}^{- 1 / 2} \sum_{m = 1}^{M} \frac{λ_{m *, M}}{1 + λ_{m *, M}} ψ_{m *, M} ψ_{m *, M}^{T} Λ_{ε *, M}^{- 1 / 2} 1_{M})}^{- 1},

(24)

which is greater than ${(1_{M}^{T} Λ_{ε *, M}^{- 1} 1_{M})}^{- 1} = 1 / {\sum_{m = 1}^{M} σ_{ε}^{- 2} (s_{m})}$ . In general, the presence of spatial correlation increases the covariance matrix of β̂(s). Such lower bound is achievable only when Σ_η(s, s′) = 0 holds for any s ≠ s′. Furthermore, such lower bound is asymptotically achievable when λ_1*,M = o_p(1), since x/(1 + x) is a monotone function of x.

Theorem 1 (ii) implies that the use of w(s_m, s) = K_h(s_m − s) can lead to substantial efficiency gain if there are substantial measurement errors. Specifically, if w(s_m) = e_m, then we consider information at the m–th grid point. In this case, we have $D (e (s_{m})) = Σ_{η} (s_{m}, s_{m}) + σ_{ε}^{2} (s_{m})$ , which can be much larger than Σ_η(s_m, s_m) when $σ_{ε}^{2} (m)$ is much larger than Σ_η(s_m, s_m). The ratio of Σ₁(e(s_m)) over Σ₁((K_h(s_m−s))) is equal to $1 + σ_{ε}^{2} (s_{m}) / Σ_{η} (s_{m}, s_{m})$ . Therefore, the efficiency gain of using w(s_m, s) = K_h(s_m − s) can be substantial if the value of $σ_{ε}^{2} (s_{m}) / Σ_{η} (s_{m}, s_{m})$ is large.

3.2 Asymptotic Properties

Second, we investigate the asymptotic properties of β̂(·), ĝ(X^T β̂(s)) and Σ̂_η(s, t) when we set w(s_m, s) = K_h(s_m − s). For any smooth function f(s) and g(s, t), define ḟ(s) = df(s)/ds, f̈(s) = d² f(s)/ds² and g^(a,b)(s, t) = ∂^a+bg(s, t)/∂s^a∂t^b, where a and b are any nonnegative integers. We state the following theorems regarding the weak convergence of {β̂(s) : s ∈ 𝒮} and ĝ(X^T β̂(s)), whose detailed assumptions and proofs can be found in Web Appendix. Moreover, we also include additional asymptotic properties and their proofs in the same Web Appendix.

Theorem 2

Under Assumptions (C1)–(C11), as n, M → ∞, we have the following results.

$\sqrt{n} (\hat{β} (s) - β_{0} (s) - 0.5 h^{2} A_{n} {(s)}^{- 1} {\frac{1}{n} \sum_{i = 1}^{n} B_{i} (s) [\ddot{g} (X_{i}^{T} β (s)) + 2 Ġ (X_{i}^{T} β (s)) \dot{π} (s) / π (s)] μ_{2} (K)})$ converges weakly to a Gaussian process with mean zero and covariance function, which is the limiting function of $A_{n} {(s)}^{- 1} [n^{- 1} \sum_{i = 1}^{n} B_{i} (s) Σ_{η} (s, t) B_{i} {(t)}^{T}] A_{n} {(t)}^{- 1}$ , where A_n(·) and B_i(·) are defined in Web Appendix.
$\sqrt{n} [ĝ (X^{T} \hat{β} (s)) - g (X^{T} β (s)) - 0.5 h_{1}^{2} μ_{2} (K) \ddot{g} (K^{T} β (s))]$ converges weakly to a Gaussian process with mean zero and covariance function Σ_η(s, t).

Theorem 2 ensures that we can make formal statistical inference on β(·) and g(·). Based on Theorem 2 (i) and (ii), we develop a wild bootstrap method to construct the Confidence bands of β̂(·) and ĝ(·) and include it in Section B of Web Appendix.

4. Numerical Studies

4.1 Simulation Results

We generated Y (s_m) according to model (3) with ε(s_m) ~ N(0, σ² = 0.3²) and X_i ~ N(0_p, Σ), where Σ is a 4 × 4 matrix with elements ρ^|j′−j| for j, j′ = 1, …, 4. Moreover, the varying coefficients β_j(s) are, respectively, given by

β_{1} (s) = 1 + s^{2}, β_{2} (s) = {(1 - s)}^{2},

β_{3} (s) = 4 s (1 - s), β_{4} (s) = - 1 + 4 {(s - 0.5)}^{2},

and then they were scaled as β(s)/‖β(s)‖₂. We set s_m as M = 50 equidistant grid points in [0,1] with s₁ = 0 and s_M = 1. We set $η_{i} (s) = \sum_{k = 1}^{2} ξ_{i k} ψ_{k} (s)$ with $ξ_{i 1} ~ N (0, λ_{1}^{2} = 1), ξ_{i 2} ~ N (0, λ_{2}^{2} = {0.5}^{2}), ψ_{1} (s) = \sqrt{2} sin (2 π s)$ , and $ψ_{2} (s) = \sqrt{2} cos (2 π s)$ . We consider two index functions including $g_{1} (X_{i}^{T} β (s_{m})) = sin (2 X_{i}^{T} β (s_{m})) + 2 cos (2 + X_{i}^{T} β (s_{m}))$ and $g_{2} (X_{i}^{T} β (s_{m})) = X_{i}^{T} β (s_{m})$ .

We conducted extensive simulation studies under different settings, but we only report some representative results for the sake of space. First, we set n = 40 and 200 and simulated data sets from model (3) for the first index function $g_{1} (X_{i}^{T} β (s_{m}))$ as described above. We fitted SIVC to each simulated data set and calculated all unknown quantities. Table 1 summarizes the mean absolute error (MAE) and root mean square error (RMSE) of all parameter estimates and the mean integrated absolute error (MIAE) and mean integrated squared error (MISE) of all estimated functions based on 200 simulations. The results in Table 1 indicate satisfactory performance of our estimators since all MAE, MSE, MIAE and MISE values are quite small. Typical estimated functions with mean performance are displayed in Figure 2 and Figure 1 in Web Appendix. The estimated curves (broken lines) closely resemble the corresponding true functions (solid lines) in these figures. As expected, all the errors increase as sample size decreases. Moreover, Table 2 includes the coverage probabilities of the simultaneous Confidence bands of β_l(s)s and g₁(u) for n = 40 and 200 based on 500 simulations. These coverage probabilities get close to the specified Confidence level 95% as sample size increases, while the results for g(·) are slightly worse than those of β_l(s).

Table 1.

Estimation results from 200 simulated data sets corresponding to the index function g₁(X^T β(s)) with n = 200 and 40. MAE is mean absolute error, RMSE is root mean square error, MIAE is mean integrated absolute error, and MISE is mean integrated square error

	n = 200				n = 40
Parameters	λ₁	λ₂	σ²		λ₁	λ₂	σ²
MAE	0.1551	0.0621	0.0202		0.1773	0.1026	0.0594
RMSE	0.1724	0.0726	0.0267		0.2108	0.1212	0.0667

n = 200
Functions	β₁(s)	β₂(s)	β₃(s)	β₄(s)	ψ₁(s)	ψ₂(s)	g(·)
MIAE	0.0288	0.0440	0.0403	0.0119	0.0137	0.0138	0.1063
MISE	0.0042	0.0077	0.0093	0.0039	0.0033	0.0034	0.0412

n = 40
Functions	β₁(s)	β₂(s)	β₃(s)	β₄(s)	ψ₁(s)	ψ₂(s)	g(·)
MIAE	0.0488	0.0751	0.0686	0.0597	0.0291	0.0299	0.2536
MISE	0.0121	0.0249	0.0251	0.0241	0.0061	0.0060	0.1530

Open in a new tab

Simulation results for model (3) with the first index function and n = 200: the true and estimated varying coefficient functions and the true and estimated index functions. In each panel, the solid line represents the true function, the broken line represents the estimated function, and the red broken lines are the corresponding 95% simultaneous Confidence bands.

Table 2.

Coverage probabilities of simultaneous confidence bands for n = 40 and 200 based on 500 simulated data sets. The confidence level is 95%.

n	β₁(s)	β₂(s)	β₃(s)	β₄(s)	g(·)
40	0.936	0.932	0.944	0.934	0.924
200	0.946	0.954	0.948	0.946	0.942

Open in a new tab

Second, we illustrate the superiority of SIVC over multivariate varying coefficient model (MVCM) in Zhu et al. (2012) in terms of prediction accuracy. We set n = 200 and then simulated data sets from model (3) for both $g_{1} (X_{i}^{T} β (s_{m}))$ and $g_{2} (X_{i}^{T} β (s_{m}))$ . For each simulated dataset, we randomly split it into a training set and a test set according to proportions π and 1 − π, respectively. We used the training set to estimate all unknown parameters, and then predicted the responses in the test set. Finally, we calculated the prediction error for each simulated dataset. We consider three values of π including 0.3, 0.5, and 0.7. For each case, we repeated 200 times. Table 3 reports the MAE and RMSE of prediction errors for SIVC and MVCM. For the index function $g_{1} (X_{i}^{T} β (s_{m}))$ , SIVC significantly outperforms MVCM with smaller MAE and RMSE. Even for $g_{2} (X_{i}^{T} β (s_{m}))$ , SIVC is slightly better than MVCM. It may indicate that SIVC is a useful tool for modeling functional data.

Table 3.

Prediction results corresponding to both index functions g₁(X^T β(s)) and g₂(X^T β(s)) with n = 200. MAE is mean absolute error and RMSE is root mean square error.

	g₁(X^T β(s))		g₂(X^T β(s))
	MAE	RMSE	MAE	RMSE
	π = 0.3
SIVC	1.013 (0.040)	1.282 (0.052)	0.987 (0.042)	1.248 (0.053)
MVCM	1.215 (0.040)	1.530 (0.053)	1.031 (0.045)	1.299 (0.060)

	π = 0.5
SIVC	1.002 (0.049)	1.267 (0.063)	0.978 (0.048)	1.236 (0.060)
MVCM	1.207 (0.048)	1.520 (0.065)	1.014 (0.050)	1.278 (0.062)

	π = 0.7
SIVC	0.977 (0.068)	1.236 (0.081)	0.966 (0.065)	1.222 (0.082)
MVCM	1.188 (0.063)	1.494 (0.082)	1.006 (0.068)	1.268 (0.084)

Open in a new tab

4.2 Real Data Analysis

We applied model (3) to the DWI data set described in Section 1. One goal of NIH ADNI is to test whether genetic, structural and functional neuroimaging, and clinical data can be integrated to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). We downloaded the structural brain MRI data and corresponding clinical and genetic data from baseline and follow-up from the ADNI publicly available database (http://adni.loni.usc.edu/).

The DWI data were processed by using a FSL TBSS pipeline (Smith et al., 2006) to register DTIs from multiple subjects to create a mean image and a mean skeleton. We used FMRIB software library to compute maps of fractional anisotropy (FA) for all subjects from the DTI after eddy current correction and automatic brain extraction. Then, we fed FA maps into the TBSS tool, which is also part of FSL. In the TBSS analysis, we aligned the FA data of all the subjects into a common space by using non-linear registration and created and thinned the mean FA image to obtain a mean FA skeleton, which represents the centers of all white matter tracts common to the group. Subsequently, we projected each subject’s aligned FA data onto this skeleton. Finally, we obtained the FA template curve measured at all the 83 grid points along the skeleton of the midsagittal corpus callosum as shown in Figure 2 in Appendix for all the subjects.

We are interested in establishing an association between FA and seven covariates including the gender variable (123 male and 91 female, coded by a dummy variable indicating for male), the age of the subject (years, ranges from 48.4 (years) to 90.4, mean 73.20), an indicator for handiness (193 right-hand and 20 left-hand, coded by a dummy variable indicating for left-hand), the education level (years, ranges from 9 to 20 (years), mean 15.91), an indicator for Alzheimer’s disease (AD) status (19.6%), an indicator for mild cognitive impairment (MCI) status (55.1%) and Mini-Mental State Exam (MMSE) of ADNI. We standardized all variables to have mean zero and variance one. We fitted model (3) and applied the estimation procedure in Section 2 to the data set. Figure 3 presents the estimated varying coefficients corresponding to age, education, AD status, and MCI status. The results reveal that MMSE, age, education, MCI, and AD are the most important factors. Moreover, gender and handiness have little effects on FA.

ADNI data analysis: the four estimated varying coefficients for Age, Education, AD, and MCI. The black solid lines are estimated coefficients and the red broken lines are their corresponding 95% simultaneous Confidence bands.

We estimated the single index function g(·) and the covariance function Σ_η(s, s′) and its associated eigenvalues and eigenfunctions. See Figure 1 for details. The single index function shows a pattern with small values at the left and right ending points of the range $X_{i}^{T} β (s)$ and two modes in the middle of the range. The top five non-zero eigenvalues of Σ̂_η(s, s′) are 0.8628, 0.2771, 0.0284, 0.0142, and 0.0102, respectively. The first two eigenvalues account for 94.8% of the total variability, while the remaining eigenvalues rapidly drop to zero. The first eigenfunction, with a dominant eigenvalue accounting for 71.8% of the total variation, is simple in structure and resembles a single cycle of a sine wave. The remaining eigenfunctions are also quite simple and roughly sinusoidal representing additional functional structure that cannot be captured by the mean structure of model (3).

Finally, we compared the prediction accuracy of SIVC with that of MVCM in Zhu et al. (2012). We randomly split the 213 subjects into a training set and a test set with corresponding proportions π and 1 − π, respectively. We used the training set to estimate all the parameters, and then predicted the responses of the testing set. The 100 replications were used to calculate the prediction errors corresponding π = 0.3, 0.5 and 0.7. Table 4 reports the MAE and RMSE of the prediction errors and indicates that SIVC significantly outperforms MVCM in terms of both MAE and RMSE.

Table 4.

Prediction results for the ADNI data analysis. EST is estimate, SE is standard error, MAE is mean absolute error, and RMSE is root mean square error.

		MAE	RMSE	MAE	RMSE	MAE	RMSE
		π = 0.3		π = 0.5		π = 0.7
SIVC	EST	0.114	0.1374	0.113	0.136	0.112	0.135
	SE	0.001	0.001	0.002	0.002	0.002	0.002
MVCM	EST	0.599	0.650	0.599	0.627	0.600	0.628
	SE	0.012	0.028	0.014	0.022	0.013	0.016

Open in a new tab

5. Conclusion

In this paper we have developed a single-index varying coefficient model for establishing a varying association between functional responses (e.g., image) and a set of covariates. We have developed an estimation procedure to estimate varying coefficient functions, link function, and the covariance function of individual functions. We have investigated a strategy to integrate the information across all grid points. We have used simulations and real data analysis to demonstrate that SIVC is a useful tool for modeling functional data.

Many important issues need to be addressed in future research. First, the computational burden associated with SIVC can be quite heavy making it infeasible for large-scale imaging data at this moment. We will develop more computationally efficient algorithms to address such challenge. Second, we need to develop an effective procedure to carry out statistical inference, such as hypothesis test. Third, it is scientifically interesting to extend SIVC to carry out regression analysis of longitudinal functional data.

Supplementary Material

NIHMS778174-supplement-su.pdf^{(424.2KB, pdf)}

Acknowledgments

The research of Dr. Zhu was supported by NIH grants 1UL1TR001111, MH086633, and EB005149-01 and NSF grants SES-1357666 and DMS-1407655. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Footnotes

Supplementary Materials

All Web Appendices, Tables, and Figures referenced in Sections 3.2 and 4.2 are available with this paper at the Biometrics website on Wiley Online Library. In addition, we will develop a companion software for SIVC and release it to the public through http://www.nitrc.org/ and https://www.nitrc.org/projects/fadtts/.

References

Bowman FD, Caffo B, Bassett SS, Kilts C. A bayesian hierarchical framework for spatial modeling of fmri data. NeuroImage. 2008;39:146–156. doi: 10.1016/j.neuroimage.2007.08.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cook RD, Forzani L. Likelihood-based sufficient dimension reduction. Journal of the American Statistical Association. 2009;104:197–208. [Google Scholar]
Cook RD, Weisberg S. Discussion of “sliced inverse regression for dimension reduction”. Journal of the American Statistical Association. 1991;86:328–332. [Google Scholar]
Gossl C, Auer DP, Fahrmeir L. Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics. 2001;57:554–562. doi: 10.1111/j.0006-341x.2001.00554.x. [DOI] [PubMed] [Google Scholar]
Horowitz JL. Semiparametric and Nonparametric Methods in Econometrics. Springer; 2009. [Google Scholar]
Jiang CR, Wang JL. Functional single index models for longitudinal data. Annals of Statistics. 2011;39:362–388. [Google Scholar]
Lazar N. The Statistical Analysis of Functional MRI Data. New York: Springer; 2008. [Google Scholar]
Li K-C. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 1991;86:316–327. [Google Scholar]
Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG. Multiscale adaptive regression models for neuroimaging data. Journal of the Royal Statistical Society: Series B. 2011;73:559–578. doi: 10.1111/j.1467-9868.2010.00767.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ma Y, Zhu L. A semiparametric approach to dimension reduction. Journal of the American Statistical Association. 2012;107:168–179. doi: 10.1080/01621459.2011.646925. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ma Y, Zhu L. Efficient estimation in sufficient dimension reduction. Annals of statistics. 2013;41:250–268. doi: 10.1214/12-AOS1072SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ma Y, Zhu L. On estimation efficiency of the central mean subspace. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014;76:885–901. [Google Scholar]
Miranda MF, Zhu H, Ibrahim JG. Bayesian spatial transformation models with applications in neuroimaging data. Biometrics. 2013;69:1074–1083. doi: 10.1111/biom.12085. [DOI] [PMC free article] [PubMed] [Google Scholar]
Morris JS. Functional regression. Annual Review of Statistics and Its Application. 2015;2:321–359. [Google Scholar]
Penny WD, Trujillo-Barreto NJ, Friston KJ. Bayesian fmri time series analysis with spatial priors. NeuroImage. 2005;24:350–362. doi: 10.1016/j.neuroimage.2004.08.034. [DOI] [PubMed] [Google Scholar]
Polzehl J, Spokoiny VG. Propagation-separation approach for local likelihood estimation. Probab. Theory Relat. Fields. 2006;135:335–362. [Google Scholar]
Ramsay JO, Silverman BW. Functional Data Analysis. second. New York: Springer; 2005. [Google Scholar]
Reiss PT, Huang L, Mennes M. Fast function-on-scalar regression with penalized basis expansions. The International Journal of Biostatistics. 2010;6 doi: 10.2202/1557-4679.1246. [DOI] [PubMed] [Google Scholar]
Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B. 1991;53:233–243. [Google Scholar]
Smith M, Fahrmeir L. Spatial bayesian variable selection with application to functional magnetic resonance imaging. Journal of the American Statistical Association. 2007;102:417–431. [Google Scholar]
Smith S, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols T, Mackay C, Watkins K, Ciccarelli O, Cader M, Matthews P, Behrens T. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage. 2006;31:1487–1505. doi: 10.1016/j.neuroimage.2006.02.024. [DOI] [PubMed] [Google Scholar]
Staicu A, Crainiceanu C, Carroll R. Fast analysis of spatially correlated multilevel functional data. Biostatistics. 2010;11:177–194. doi: 10.1093/biostatistics/kxp058. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang J-L, Chiou J-M, Mueller H-G. Review of functional data analysis. 2015 [Google Scholar]
Wang X, van Eeden C, Zidek JV. Asymptotic properties of maximum weighted likelihood estimators. Journal of Statistical Planning and Inference. 2004;119:37–54. [Google Scholar]
Worsley KJ, Taylor JE, Tomaiuolo F, Lerch J. Unified univariate and multivariate random field theory. NeuroImage. 2004;23:189–195. doi: 10.1016/j.neuroimage.2004.07.026. [DOI] [PubMed] [Google Scholar]
Xia Y. A constructive approach to the estimation of dimension reduction directions. The Annals of Statistics. 2007;35:2654–2690. [Google Scholar]
Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B. 2002;64:363–410. [Google Scholar]
Yue Y, Loh JM, Lindquist MA. Adaptive spatial smoothing of fMRI images. Statistics and its Interface. 2010;3:3–14. [Google Scholar]
Zhang J, Chen J. Statistical inferences for functional data. The Annals of Statistics. 2007;35:1052–1079. [Google Scholar]
Zhu H, Fan J, Kong L. Spatially varying coefficient model for neuroimaging data with jump discontinuities. Journal of the American Statistical Association. 2014;109:1084–1098. doi: 10.1080/01621459.2014.881742. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhu H, Li R, Kong L. Multivariate varying coefficient model for functional responses. Annals of Statistics. 2012;40:2634–2666. doi: 10.1214/12-AOS1045SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhu L, Wang T, Zhu L, Ferré L. Sufficient dimension reduction through discretization-expectation estimation. Biometrika. 2010;97:295–304. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS778174-supplement-su.pdf^{(424.2KB, pdf)}

[R1] Bowman FD, Caffo B, Bassett SS, Kilts C. A bayesian hierarchical framework for spatial modeling of fmri data. NeuroImage. 2008;39:146–156. doi: 10.1016/j.neuroimage.2007.08.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Cook RD, Forzani L. Likelihood-based sufficient dimension reduction. Journal of the American Statistical Association. 2009;104:197–208. [Google Scholar]

[R3] Cook RD, Weisberg S. Discussion of “sliced inverse regression for dimension reduction”. Journal of the American Statistical Association. 1991;86:328–332. [Google Scholar]

[R4] Gossl C, Auer DP, Fahrmeir L. Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics. 2001;57:554–562. doi: 10.1111/j.0006-341x.2001.00554.x. [DOI] [PubMed] [Google Scholar]

[R5] Horowitz JL. Semiparametric and Nonparametric Methods in Econometrics. Springer; 2009. [Google Scholar]

[R6] Jiang CR, Wang JL. Functional single index models for longitudinal data. Annals of Statistics. 2011;39:362–388. [Google Scholar]

[R7] Lazar N. The Statistical Analysis of Functional MRI Data. New York: Springer; 2008. [Google Scholar]

[R8] Li K-C. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 1991;86:316–327. [Google Scholar]

[R9] Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG. Multiscale adaptive regression models for neuroimaging data. Journal of the Royal Statistical Society: Series B. 2011;73:559–578. doi: 10.1111/j.1467-9868.2010.00767.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Ma Y, Zhu L. A semiparametric approach to dimension reduction. Journal of the American Statistical Association. 2012;107:168–179. doi: 10.1080/01621459.2011.646925. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Ma Y, Zhu L. Efficient estimation in sufficient dimension reduction. Annals of statistics. 2013;41:250–268. doi: 10.1214/12-AOS1072SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Ma Y, Zhu L. On estimation efficiency of the central mean subspace. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014;76:885–901. [Google Scholar]

[R13] Miranda MF, Zhu H, Ibrahim JG. Bayesian spatial transformation models with applications in neuroimaging data. Biometrics. 2013;69:1074–1083. doi: 10.1111/biom.12085. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Morris JS. Functional regression. Annual Review of Statistics and Its Application. 2015;2:321–359. [Google Scholar]

[R15] Penny WD, Trujillo-Barreto NJ, Friston KJ. Bayesian fmri time series analysis with spatial priors. NeuroImage. 2005;24:350–362. doi: 10.1016/j.neuroimage.2004.08.034. [DOI] [PubMed] [Google Scholar]

[R16] Polzehl J, Spokoiny VG. Propagation-separation approach for local likelihood estimation. Probab. Theory Relat. Fields. 2006;135:335–362. [Google Scholar]

[R17] Ramsay JO, Silverman BW. Functional Data Analysis. second. New York: Springer; 2005. [Google Scholar]

[R18] Reiss PT, Huang L, Mennes M. Fast function-on-scalar regression with penalized basis expansions. The International Journal of Biostatistics. 2010;6 doi: 10.2202/1557-4679.1246. [DOI] [PubMed] [Google Scholar]

[R19] Rice JA, Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B. 1991;53:233–243. [Google Scholar]

[R20] Smith M, Fahrmeir L. Spatial bayesian variable selection with application to functional magnetic resonance imaging. Journal of the American Statistical Association. 2007;102:417–431. [Google Scholar]

[R21] Smith S, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols T, Mackay C, Watkins K, Ciccarelli O, Cader M, Matthews P, Behrens T. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage. 2006;31:1487–1505. doi: 10.1016/j.neuroimage.2006.02.024. [DOI] [PubMed] [Google Scholar]

[R22] Staicu A, Crainiceanu C, Carroll R. Fast analysis of spatially correlated multilevel functional data. Biostatistics. 2010;11:177–194. doi: 10.1093/biostatistics/kxp058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] Wang J-L, Chiou J-M, Mueller H-G. Review of functional data analysis. 2015 [Google Scholar]

[R24] Wang X, van Eeden C, Zidek JV. Asymptotic properties of maximum weighted likelihood estimators. Journal of Statistical Planning and Inference. 2004;119:37–54. [Google Scholar]

[R25] Worsley KJ, Taylor JE, Tomaiuolo F, Lerch J. Unified univariate and multivariate random field theory. NeuroImage. 2004;23:189–195. doi: 10.1016/j.neuroimage.2004.07.026. [DOI] [PubMed] [Google Scholar]

[R26] Xia Y. A constructive approach to the estimation of dimension reduction directions. The Annals of Statistics. 2007;35:2654–2690. [Google Scholar]

[R27] Xia Y, Tong H, Li W, Zhu L-X. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B. 2002;64:363–410. [Google Scholar]

[R28] Yue Y, Loh JM, Lindquist MA. Adaptive spatial smoothing of fMRI images. Statistics and its Interface. 2010;3:3–14. [Google Scholar]

[R29] Zhang J, Chen J. Statistical inferences for functional data. The Annals of Statistics. 2007;35:1052–1079. [Google Scholar]

[R30] Zhu H, Fan J, Kong L. Spatially varying coefficient model for neuroimaging data with jump discontinuities. Journal of the American Statistical Association. 2014;109:1084–1098. doi: 10.1080/01621459.2014.881742. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] Zhu H, Li R, Kong L. Multivariate varying coefficient model for functional responses. Annals of Statistics. 2012;40:2634–2666. doi: 10.1214/12-AOS1045SUPP. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] Zhu L, Wang T, Zhu L, Ferré L. Sufficient dimension reduction through discretization-expectation estimation. Biometrika. 2010;97:295–304. [Google Scholar]

PERMALINK

Single-index Varying Coefficient Model for Functional Responses

Xinchao Luo

Lixing Zhu

Hongtu Zhu

Summary

1. Introduction

Figure 1.

2. Methods

2.1 Single-index Varying Coefficient Model

2.2 Estimation Procedure

Step I. Estimating varying coefficient β(s)

Step II. Estimating the unknown index function g(·)

Step III. Estimating the covariance function Ση(s, t)

2.3 Simultaneous Confidence Bands

3. Theoretical Results

3.1 Optimal weight Functions

Theorem 1

3.2 Asymptotic Properties

Theorem 2

4. Numerical Studies

4.1 Simulation Results

Table 1.

Figure 2.

Table 2.

Table 3.

4.2 Real Data Analysis

Figure 3.

Table 4.

5. Conclusion

Supplementary Material

Acknowledgments

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Step III. Estimating the covariance function Σ_η(s, t)