Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities

Hongtu Zhu; Jianqing Fan; Linglong Kong

doi:10.1080/01621459.2014.881742

. Author manuscript; available in PMC: 2015 Jul 1.

Published in final edited form as: J Am Stat Assoc. 2014 Jul;109(507):1084–1098. doi: 10.1080/01621459.2014.881742

Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities

Hongtu Zhu ¹, Jianqing Fan ², Linglong Kong ³

PMCID: PMC4244662 NIHMSID: NIHMS588529 PMID: 25435598

Abstract

Motivated by recent work on studying massive imaging data in various neuroimaging studies, we propose a novel spatially varying coefficient model (SVCM) to capture the varying association between imaging measures in a three-dimensional (3D) volume (or 2D surface) with a set of covariates. Two stylized features of neuorimaging data are the presence of multiple piecewise smooth regions with unknown edges and jumps and substantial spatial correlations. To specifically account for these two features, SVCM includes a measurement model with multiple varying coefficient functions, a jumping surface model for each varying coefficient function, and a functional principal component model. We develop a three-stage estimation procedure to simultaneously estimate the varying coefficient functions and the spatial correlations. The estimation procedure includes a fast multiscale adaptive estimation and testing procedure to independently estimate each varying coefficient function, while preserving its edges among different piecewise-smooth regions. We systematically investigate the asymptotic properties (e.g., consistency and asymptotic normality) of the multiscale adaptive parameter estimates. We also establish the uniform convergence rate of the estimated spatial covariance function and its associated eigenvalues and eigenfunctions. Our Monte Carlo simulation and real data analysis have confirmed the excellent performance of SVCM.

Keywords: Asymptotic normality, Functional principal component analysis, Jumping surface model, Kernel, Spatial varying coefficient model, Wald test

1 Introduction

The aims of this paper are to develop a spatially varying coefficient model (SVCM) to delineate association between massive imaging data and a set of covariates of interest, such as age, and to characterize the spatial variability of the imaging data. Examples of such imaging data include T1 weighted magnetic resonance imaging (MRI), functional MRI, and diffusion tensor imaging, among many others (Friston, 2007; Thompson and Toga, 2002; Mori, 2002; Lazar, 2008). In neuroimaging studies, following spatial normalization, imaging data usually consists of data points from different subjects (or scans) at a large number of locations (called voxels) in a common 3D volume (without loss of generality), which is called a template. We assume that all imaging data have been registered to a template throughout the paper.

To analyze such massive imaging data, researchers face at least two main challenges. The first one is to characterize varying association between imaging data and covariates, while preserving important features, such as edges and jumps, and the shape and spatial extent of effect images. Due to the physical and biological reasons, imaging data are usually expected to contain spatially contiguous regions or effect regions with relatively sharp edges (Chumbley et al., 2009; Chan and Shen, 2005; Tabelow et al., 2008a, b). For instance, normal brain tissue can generally be classified into three broad tissue types including white matter, gray matter, and cerebrospinal fluid. These three tissues can be roughly separated by using MRI due to their imaging intensity differences and relatively intensity homogeneity within each tissue. The second challenge is to characterize spatial correlations among a large number of voxels, usually in the tens thousands to millions, for imaging data. Such spatial correlation structure and variability are important for achieving better prediction accuracy, for increasing the sensitivity of signal detection, and for characterizing the random variability of imaging data across subjects (Cressie and Wikle, 2011; Spence et al., 2007).

There are two major statistical methods including voxel-wise methods and multiscale adaptive methods for addressing the first challenge. Conventional voxel-wise approaches involve in Gaussian smoothing imaging data, independently fitting a statistical model to imaging data at each voxel, and generating statistical maps of test statistics and p-values (Lazar, 2008; Worsley et al., 2004). As shown in Chumbley et al. (2009) and Li et al. (2011), voxel-wise methods are generally not optimal in power since it ignores the spatial information of imaging data. Moreover, the use of Gaussian smoothing can blur the image data near the edges of the spatially contiguous regions and thus introduce substantial bias in statistical results (Yue et al., 2010).

There is a great interest in the development of multiscale adaptive methods to adaptively smooth neuroimaging data, which is often characterized by a high noise level and a low signal-to-noise ratio (Tabelow et al., 2008a, b; Polzehl et al., 2010; Li et al., 2011; Qiu, 2005, 2007). Such multiscale adaptive methods not only increase signal-to-noise ratio, but also preserve important features (e.g., edge) of imaging data. For instance, in Polzehl and Spokoiny (2000, 2006), a novel propagation-separation approach was developed to adaptively and spatially smooth a single image without explicitly detecting edges. Recently, there are a few attempts to extend those adaptive smoothing methods to smoothing multiple images from a single subject (Tabelow et al., 2008a, b; Polzehl et al., 2010). In Li et al. (2011), a multiscale adaptive regression model, which integrates the propagation-separation approach and voxel-wise approach, was developed for a large class of parametric models.

There are two major statistical models, including Markov random fields and low rank models, for addressing the second challenge. The Markov random field models explicitly use the Markov property of an undirected graph to characterize spatial dependence among spatially connected voxels (Besag, 1986; Li, 2009). However, it can be restrictive to assume a specific type of spatial correlation structure, such as Markov random fields, for very large spatial data sets besides its computational complexity (Cressie and Wikle, 2011). In spatial statistics, low rank models, also called spatial random effects models, use a linear combination of ‘known’ spatial basis functions to approximate spatial dependence structure in a single spatial map (Cressie and Wikle, 2011). The low rank models have a close connection with the functional principal component analysis model for characterizing spatial correlation structure in multiple images, in which spatial basis functions are directly estimated (Zipunnikov et al., 2011; Ramsay and Silverman, 2005; Hall et al., 2006).

The goal of this article is to develop SVCM and its estimation procedure to simultaneously address the two challenges discussed above. SVCM has three features: piecewise smooth, spatially correlated, and spatially adaptive, while its estimation procedure is fast, accurate and individually updated. Major contributions of the paper are as follows.

Compared with the existing multiscale adaptive methods, SVCM first integrates a jumping surface model to delineate the piecewise smooth feature of raw and effect images and the functional principal component model to explicitly incorporate the spatial correlation structure of raw imaging data.
A comprehensive three-stage estimation procedure is developed to adaptively and spatially improve estimation accuracy and capture spatial correlations.
Compared with the existing methods, we use a fast and accurate estimation method to independently smooth each of effect images, while consistently estimating their standard deviation images.
We systematically establish consistency and asymptotic distribution of the adaptive parameter estimators under two different scenarios including piecewise-smooth and piecewise-constant varying coefficient functions. In particular, we introduce several adaptive boundary conditions to delineate the relationship between the amount of jumps and the sample size. Our conditions and theoretical results differ substantially from those for the propagation-separation type methods (Polzehl and Spokoiny, 2000, 2006; Li et al., 2011).

The rest of this paper is organized as follows. In Section 2, we describe SVCM and its three-stage estimation procedure and establish the theoretical properties. In Section 3, we present a set of simulation studies with the known ground truth to examine the finite sample performance of the three-stage estimation procedure for SVCM. In Section 4, we apply the proposed methods in a real imaging dataset on attention deficit hyper-activity disorder (ADHD). In Section 5, we conclude the paper with some discussions. Technical conditions are given in Section 6. Proofs and additional results are given in a supplementary document.

2 Spatial Varying Coefficient Model with Jumping Discontinuities

2.1 Model Setup

We consider imaging measurements in a template and clinical variables (e.g., age, gender, and height) from n subjects. Let Inline graphic represent a 3D volume and d and d₀, respectively, denote a point and the center of a voxel in . Let be the union of all centers d₀ in and N_D equal the number of voxels in . Without loss of generality, is assumed to be a compact set in R³. For the i-th subject, we observe an m × 1 vector of imaging measures y_i(d₀) at d₀ ∈ Inline graphic , which leads to an mN_D × 1 vector of measurements across , denoted by = {y_i(d₀) : d₀ ∈ }. For notational simplicity, we set m = 1 and consider a 3D volume throughout the paper.

The proposed spatial varying coefficient model (SVCM) consists of three components: a measurement model, a jumping surface model, and a functional component analysis model. The measurement model characterizes the association between imaging measures and covariates and is given by

y_{i} (d) = x_{i}^{T} β (d) + η_{i} (d) + ε_{i} (d) for all i = 1, \dots, n and d \in D,

(1)

where x_i = (x_i₁, …, x_ip)^T is a p × 1 vector of covariates, β(d) = (β₁(d), …, β_p(d))^T is a p × 1 vector of coefficient functions of d, η_i(d) characterizes individual image variations from $x_{i}^{T} β (d)$ , and ε_i(d) are measurement errors. Moreover, {η_i(d) : d ∈ Inline graphic } is a stochastic process indexed by d ∈ that captures the within-image dependence. We assume that they are mutually independent and η_i(d) and ε_i(d) are independent and identical copies of SP(0, Σ_η) and SP(0, Σ_ε), respectively, where SP(μ, Σ) denotes a stochastic process vector with mean function μ(d) and covariance function Σ(d, d′). Moreover, ε_i(d) and ε_i(d′) are independent for d ≠ d′ and thus Σ_ε(d, d′) = 0 for d ≠ d′. Therefore, the covariance function of {y_i(d) : d ∈ Inline graphic }, conditioned on x_i, is given by

\sum_{y} (d, d^{'}) = Cov (y_{i} (d), y_{i} (d^{'})) = \sum_{η} (d, d^{'}) + \sum_{ε} (d, d) 1 (d = d^{'}) .

(2)

The second component of the SVCM is a jumping surface model for each of {β_j(d) : d ∈ Inline graphic }_j_≤_p. Imaging data {y_i(d₀) : d₀ ∈ } can usually be regarded as a noisy version of a piecewise-smooth function of d ∈ with jumps or edges. In many neuroimaging data, those jumps or edges often reflect the functional and/or structural changes, such as white matter and gray matter, across the brain. Therefore, the varying function {β_j(d) : d ∈ Inline graphic } in model (1) may inherit the piecewise-smooth feature from imaging data for j = 1, …, p, but allows to have different jumps and edges. Specially, we make the following assumptions.

(i) (Disjoint Partition) There is a finite and disjoint partition { : l = 1, ···, L_j} of such that each is a connected region of and its interior, denoted by $D_{j, l}^{o}$ , is nonempty, where L_j is a fixed, but unknown integer. See Figure 1 (a), (b), and (d) for an illustration.
(ii) (Piecewise Smoothness) β_j(d) is a smooth function of d within each $D_{j, l}^{o}$ for l = 1, …, L_j, but β_j(d) is discontinuous on $\partial D^{(j)} = D \ [\cup_{l = 1}^{L_{j}} D_{j, l}^{o}]$ , which is the union of the boundaries of all . See Figure 1 (b) for an illustration.
(iii) (Local Patch) For any d₀ ∈ and h > 0, let B(d₀, h) be an open ball of d₀ with radius h and P_j(d₀, h) a maximal path-connected set in B(d₀, h), in which β_j(d) is a smooth function of d. Assume that P_j(d₀, h), which will be called a local patch, contains an open set. See Figure 1 for a graphical illustration.

Illustration of a jumping surface model for β₁(d) and boundary sets over a two-dimensional region D: (a) , , a disjoint partition of as the union of four disjoint regions with white, yellow, blue green, and red representing , , , and , a representative voxel d₀ ∈ , an open ball of d₀, B(d₀, h), a maximal path-connected set P₁(d₀, h), and P₁(d₀, h)^c; (b) three-dimensional shaded surface of true {β₁(d): d ∈ } map; (c) three-dimensional shaded surface of estimated {β̂₁(d₀): d₀ ∈ } map; and (d) , , a disjoint partition of = ∪ , ∂D⁽¹⁾(h₀) ⊂ ∂D⁽¹⁾(*h_s*), two representative voxels d₀ and $d_{0}^{'}$ in , two open balls of $d_{0}^{'} \in D_{1, 1}$ , an open ball of d₀ ∈ ∂D⁽¹⁾(*h_s*) ∩ , B(d₀, *h_s*), and P₁(d₀, *h_s*)^c.

Inline graphic — Illustration of a jumping surface model for β₁(d) and boundary sets over a two-dimensional region D: (a) , , a disjoint partition of as the union of four disjoint regions with white, yellow, blue green, and red representing , , , and , a representative voxel d₀ ∈ , an open ball of d₀, B(d₀, h), a maximal path-connected set P₁(d₀, h), and P₁(d₀, h)^c; (b) three-dimensional shaded surface of true {β₁(d): d ∈ } map; (c) three-dimensional shaded surface of estimated {β̂₁(d₀): d₀ ∈ } map; and (d) , , a disjoint partition of = ∪ , ∂D⁽¹⁾(h₀) ⊂ ∂D⁽¹⁾(*h_s*), two representative voxels d₀ and $d_{0}^{'}$ in , two open balls of $d_{0}^{'} \in D_{1, 1}$ , an open ball of d₀ ∈ ∂D⁽¹⁾(*h_s*) ∩ , B(d₀, *h_s*), and P₁(d₀, *h_s*)^c.

The jumping surface model can be regarded as a generalization of various models for delineating changes at unknown location (or time). See, for example, Khodadadi and Asgharian (2008) for an annotated bibliography of change point problem and regression. The disjoint partition and piecewise smoothness assumptions characterize the shape and smoothness of β_j(d) in Inline graphic , whereas the local patch assumption primarily characterizes the local shape of β_j(d) at each voxel d₀ ∈ across different scales (or radii). For $d_{0} \in [\cup_{l = 1}^{L_{j}} D_{j, l}^{o}] \cap D_{0}$ , there exists a radius h(d₀) such that $B (d_{0}, h (d_{0})) \subset \cup_{l = 1}^{L_{j}} D_{j, l}^{o}$ . In this case, for h ≤ h(d₀), we have P_j(d₀, h) = B(d₀, h) and P_j(d₀, h)^c = ∅, whereas P_j(d₀, h)^c may not equal the empty set for large h since B(d₀, h) may cross different $D_{j, l}^{o} s$ . For d₀ ∈ ∂ Inline graphic ∩ , P_j(d₀, h)^c ≠ ∅ for all h > 0. Since P_j(d₀, h) contains an open set for any h > 0, it eliminates the case of d₀ being an isolated point. See Figure 1 (a) and (d) for an illustration.

The last component of the SVCM is a functional principal component analysis model for η_i(d). Let λ₁ ≥ λ₂ ≥ …≥ 0 be ordered values of the eigenvalues of the linear operator determined by Σ_η with $\sum_{l = 1}^{\infty} λ_{l} < \infty$ and the ψ_l(d)s’ be the corresponding orthonormal eigenfunctions (or principal components) (Li and Hsing, 2010; Hall et al., 2006). Then, Σ_η admits the spectral decomposition:

\sum_{η} (d, d^{'}) = \sum_{l = 1}^{\infty} λ_{l} ψ_{l} (d) ψ_{l} (d^{'}) .

(3)

The eigenfunctions ψ_l(d) form an orthonormal basis on the space of square-integrable functions on Inline graphic , and η_i(d) admits the Karhunen-Loeve expansion as follows:

η_{i} (d) = \sum_{l = 1}^{\infty} ξ_{i, l} ψ_{l} (d),

(4)

where ξ_i_,_l = Inline graphic η_i(s)ψ_l(s)d (s) is referred to as the l-th functional principal component score of the ith subject, in which d (s) denotes the Lebesgue measure. The ξ_i_,_l are uncorrelated random variables with E(ξ_i_,_l) = 0 and E(ξ_i_,_lξ_i_,_k) = λ_l1(l = k). If λ_l ≈ 0 for l ≥ L_S + 1, then model (1) can be approximated by

y_{i} (d) \approx x_{i}^{T} β (d) + \sum_{l = 1}^{L_{S}} ξ_{i, l} ψ_{l} (d) + ε_{i} (d) .

(5)

In (5), since ξ_i_,_l are random variables and ψ_l(d) are ‘unknown’ but fixed basis functions, it can be regarded as a varying coefficient spatial mixed effects model. Therefore, model (5) is a mixed effects representation of model (1).

Model (5) differs significantly from other models in the existing literature. Most varying coefficient models assume some degrees of smoothness on varying coefficient functions, while they do not model the within-curve dependence (Wu et al., 1998). See Fan and Zhang (2008) for a comprehensive review of varying coefficient models. Most spatial mixed effects models in spatial statistics assume that spatial basis functions are known and regression coefficients do not vary across d (Cressie and Wikle, 2011). Most functional principal component analysis models focus on characterizing spatial correlation among multiple observed functions when Inline graphic ∈ R¹ (Zipunnikov et al., 2011; Ramsay and Silverman, 2005; Hall et al., 2006).

2.2 Three-stage Estimation Procedure

We develop a three-stage estimation procedure as follows. See Figure 2 for a schematic overview of SVCM.

A schematic overview of the three stages of SVCM: Stage (I) is the initialization step, Stage (II) is the Multiscale Adaptive and Sequential Smoothing (MASS) method, and Stage (III) is the hypothesis test.

Stage (I): Calculate the least squares estimate of β(d₀), denoted by β̂(d₀), across all voxels in , and estimate {Σ_ε(d₀, d₀) : d₀ ∈ }, {Σ_η(d, d′) : (d, d′) ∈ } and its eigenvalues and eigenfunctions.
Stage (II): Use the propagation-seperation method to adaptively and spatially smooth each component of β̂(d₀) across all d₀ ∈ .
Stage (III): Approximate the asymptotic covariance matrix of the final estimate of β(d₀) and calculate test statistics across all voxels d₀ ∈ .

This is more refined idea than the two-stage procedure proposed in Fan and Zhang (1999, 2002).

2.2.1 Stage (I)

Stage (I) consists of four steps.

Step (I.1) is to calculate the least squares estimate of β(d₀), which equals $\hat{β} (d_{0}) = Ω_{X, n}^{- 1} \sum_{i = 1}^{n} x_{i} y_{i} (d_{0})$ across all voxels d₀ ∈ Inline graphic , where $Ω_{X, n} = \sum_{i = 1}^{n} x_{i}^{\otimes 2}$ , in which a^⊗2 = aa^T for any vector a. See Figure 1 (c) for a graphical illustration of {β̂(d₀) : d₀ ∈ }.

Step (I.2) is to estimate η_i(d) for all d ∈ Inline graphic . We employ the local linear regression technique to estimate all individual functions η_i(d). Let ∂_dη_i(d) = ∂η_i(d)/∂d, C_i(d) = (η_i(d), h∂_dη_i(d)^T)^T, and z_h(d_m − d) = (1, (d_m_,1 − d₁)/h, (d_m_,2 − d₂)/h, (d_m_,3 − d₃)/h)^T, where d = (d₁, d₂, d₃)^T and d_m = (d_m_,1, d_m_,2, d_m_,3)^T ∈ Inline graphic . We use Taylor series expansion to expand η_i(d_m) at d leading to

η_{i} (d_{m}) = C_{i} {(d)}^{T} z_{h} (d_{m} - d) .

We develop an algorithm to estimate C_i(d) as follows. Let K_loc(·) be a univariate kernel function and $K_{h} (d_{m} - d) = h^{- 3} \prod_{k = 1}^{3} K_{loc} ((d_{m, k} - d_{k}) / h)$ be the rescaled kernel function with a bandwidth h. For each i, we estimate C_i(d) by minimizing the weighted least squares function given by

{\hat{C}}_{i} (d) = {argmin}_{C_{i} (d)} \sum_{d_{m} \in D_{0}} {r_{i} (d_{m}) - C_{i} {(d)}^{T} z_{h} (d_{m} - d)}^{2} K_{h} (d_{m} - d) .

where $r_{i} (d_{m}) = y_{i} (d_{m}) - x_{i}^{T} \hat{β} (d_{m})$ . It can be shown that

{\hat{C}}_{i} (d) = {\sum_{d_{m} \in D_{0}} K_{h} (d_{m} - d) z_{h} {(d_{m} - d)}^{\otimes 2}}^{- 1} \sum_{d_{m} \in D_{0}} K_{h} (d_{m} - d) z_{h} (d_{m} - d) r_{i} (d_{m}),

(6)

Let R̂_i = (r_i(d₀) : d₀ ∈ Inline graphic ) be an N_D × 1 vector of estimated residuals and notice that η̂_i(d) is the first component of C_i(d). Then, we have

{\hat{η}}_{i} = ({\hat{η}}_{i} (d_{0}) : d_{0} \in D_{0}) = S_{i} {\hat{R}}_{i} and {\hat{η}}_{i} (d) = (1, 0, 0, 0) {\hat{C}}_{i} (d),

(7)

where S_i is an N_D × N_D smoothing matrix (Fan and Gijbels, 1996). We pool the data from all n subjects and select the optimal bandwidth h, denoted by h̃, by minimizing the generalized cross-validation (GCV) score given by

GCV (h) = \sum_{i = 1}^{n} \frac{{\hat{R}}_{i}^{T} {(I_{D} - S_{i})}^{T} (I_{D} - S_{i}) {\hat{R}}_{i}}{{[1 - N_{D}^{- 1} tr (S_{i})]}^{2}},

(8)

where I_D is an N_D × N_D identity matrix. Based on h̃, we can use (7) to estimate η_i(d) for all i.

Step (I.3) is to estimate Σ_η(d, d′) and Σ_ε(d₀, d₀). Let ${\hat{ε}}_{i} (d_{0}) = y_{i} (d_{0}) - x_{i}^{T} \hat{β} (d_{0}) - {\hat{η}}_{i} (d_{0})$ be estimated residuals for i = 1, …, n and d₀ ∈ Inline graphic . We estimate Σ_ε(d₀, d₀) by

{\sum^{^}}_{ε} (d_{0}, d_{0}) = n^{- 1} \sum_{i = 1}^{n} {\hat{ε}}_{i} {(d_{0})}^{2}

(9)

and Σ_η(d, d′) by the sample covariance matrix:

{\sum^{^}}_{η} (d, d^{'}) = {(n - p)}^{- 1} \sum_{i = 1}^{n} {\hat{η}}_{i} (d) {\hat{η}}_{i} (d^{'}) .

(10)

Step (I.4) is to estimate the eigenvalue-eigenfunction pairs of Σ_η by using the singular value decomposition. Let V = [η̂₁, ···, η̂_n] be an N_D × n matrix. Since n is much smaller than N_D, we can easily calculate the eigenvalue-eigenvector pairs of the n × n matrix V^TV, denoted by {(λ̂ _i, ξ̂_i) : i = 1, ···, n}. It can be shown that {(λ̂_i, Vξ̂_i) : i = 1, ···, n} are the eigenvalue-eigenvector pairs of the N_D × N_D matrix VV^T. In applications, one usually considers large λ̂_l values, while dropping small λ̂_ls. It is common to choose a value of L_S so that the cumulative eigenvalue $\sum_{l = 1}^{L_{S}} {\hat{λ}}_{l} / \sum_{l = 1}^{n} {\hat{λ}}_{l}$ is above a prefixed threshold, say 80% (Zipunnikov et al., 2011; Li and Hsing, 2010; Hall et al., 2006). Furthermore, the lth SPCA scores can be computed using

{\hat{ξ}}_{i, l} = \sum_{m = 1}^{N_{D}} {\hat{η}}_{i} (d_{m}) {\hat{ψ}}_{l} (d_{m}) V (d_{m})

(11)

for l = 1, …, L_S, where Inline graphic (d_m) is the volume of voxel d_m.

2.2.2 Stage (II)

Stage (II) is a multiscale adaptive and sequential smoothing (MASS) method. The key idea of MASS is to use the propagation-separation method (Polzehl and Spokoiny, 2000, 2006) to individually smooth each least squares estimate image {β̂_j(d₀) : d₀ ∈ Inline graphic } for j = 1, …, p. MASS starts with building a sequence of nested spheres with increasing bandwidths 0 = h₀ < h₁ < ··· < h_S = r₀ ranging from the smallest bandwidth h₁ to the largest bandwidth h_S = r₀ for each d₀ ∈ . At bandwidth h₁, based on the information contained in {β̂(d₀) : d₀ ∈ Inline graphic }, we sequentially calculate adaptive weights $ω_{j} (d_{0}, d_{0}^{'}; h_{1})$ between voxels d₀ and $d_{0}^{'}$ , which depends on the distance ||d₀ −d₀|| and spacial similarity |β̂_j(d₀) − β̂_j(d₀)|, and update β̂_j(d₀; h₁) for all d₀ ∈ for j = 1, ···, p. At bandwidth h₂, we repeat the same process using {β̂(d₀; h₁) : d₀ ∈ Inline graphic } to compute spatial similarities. In this way, we can sequentially determine $ω_{j} (d_{0}, d_{0}^{'}; h_{s})$ and β̂_j(d₀; h_s) for each component of β(d₀) as the bandwidth ranges from h₁ to h_S = r₀. Moreover, as shown below, we have found a simple way of calculating the standard deviation of β̂_j(d₀; h_s).

MASS consists of three steps including (II.1) an initialization step, (II.2) a sequentially adaptive estimation step, and (II.3) a stop checking step, each of which involves in the specification of several parameters. Since propagation-separation and the choice of their associated parameters have been discussed in details in Polzehl et al. (2010) and Li et al. (2011), we briefly mention them here for the completeness. In the initialization step (II.1), we take a geometric series { $h_{s} = c_{h}^{s}$ : s = 1, …, S} of radii with h₀ = 0, where c_h > 1, say c_h = 1.10. We suggest relatively small c_h to prevent incorporating too many neighboring voxels.

In the sequentially adaptive estimation step (II.2), starting from s = 1 and h₁ = c_h, at step s, we compute spatial adaptive locally weighted average estimate β̂_j(d₀; h_s) based on {β̂_j(d₀) : d₀ ∈ Inline graphic } and {β̂_j(d₀; h_s₋₁) : d ∈ }, where β̂_j(d₀; h₀) = β̂_j(d₀). Specifically, for each j, we construct a weighted quadratic function

ℓ_{n} (β_{j} (d_{0}); h_{s}) = \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} {{\hat{β}}_{j} (d_{m}) - β_{j} (d_{0})}^{2} ω_{j} (d_{0}, d_{m}; h_{s}),

(12)

where ω_j(d₀, d_m; h_s), which will be defined below, characterizes the similarity between β̂_j(d_m; h_s₋₁) and β̂_j(d₀; h_s₋₁). We then calculate

{\hat{β}}_{j} (d_{0}; h_{s}) = {argmin}_{β_{j} (d_{0})} ℓ_{n} (β_{j} (d_{0}); h_{s}) = \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j} (d_{0}, d_{m}; h_{s}) {\hat{β}}_{j} (d_{m}),

(13)

where ω̃_j(d₀, d_m; h_s) = ω_j(d₀, d_m; h_s)/ Inline graphic ω_j(d₀, d_m_′ ; h_s).

Let Σ_n(β̂_j(d₀; h_s)) be the asymptotic variance of β̂_j(d₀; h_s). For β_j(d₀), we compute the similarity between voxels d₀ and $d_{0}^{'}$ , denoted by $D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1})$ , and the adaptive weight $ω_{j} (d_{0}, d_{0}^{'}; h_{s})$ , which are, respectively, defined as

\begin{matrix} D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1}) = {{\hat{β}}_{j} (d_{0}; h_{s - 1}) - {\hat{β}}_{j} (d_{0}^{'}; h_{s - 1})}^{2} / \sum_{n} ({\hat{β}}_{j} (d_{0}; h_{s - 1})), \\ ω_{j} (d_{0}, d_{0}^{'}; h_{s}) = K_{loc} ({| | d_{0} - d_{0}^{'} | |}_{2} / h_{s}) K_{s t} (D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1}) / C_{n}), \end{matrix}

(14)

where K_st(u) is a nonnegative kernel function with compact support, C_n is a tuning parameter depending on n, and || · ||₂ denotes the Euclidean norm of a vector.

The weights $K_{loc} ({| | d_{0} - d_{0}^{'} | |}_{2} / h_{s})$ give less weight to the voxel $d_{0}^{'}$ that is far from the voxel d₀. The weights K_st(u) downweight the voxels $d_{0}^{'}$ with large $D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1})$ , which indicates a large difference between ${\hat{β}}_{j} (d_{0}^{'}; h_{s - 1})$ and β̂_j(d₀; h_s₋₁). In practice, we set K_loc(u) = (1 − u)₊. Although different choices of K_st(·) have been suggested in the propagation-separation method (Polzehl and Spokoiny, 2000, 2006; Polzehl et al., 2010; Li et al., 2011), we have tested these kernel functions and found that K_st(u) = exp(−u) performs reasonably well. Another good choice of K_st(u) is min(1, 2(1 − u))₊. Moreover, theoretically, as shown in Scott (1992) and Fan (1993), they have examined the efficiency of different kernels for weighted least squares estimators, but extending their results to the propagation-separation method needs some further investigation.

The scale C_n is used to penalize the similarity between any two voxels d₀ and $d_{0}^{'}$ in a similar manner to bandwidth, and an appropriate choice of C_n is crucial for the behavior of the propagation-separation method. As discussed in (Polzehl and Spokoiny, 2000, 2006), a propagation condition independent of the observations at hand can be used to specify C_n. The basic idea of the propagation condition is that the impact of the statistical penalty in $K_{s t} (D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1}) / C_{n})$ should be negligible under a homogeneous model β_j(d) ≡ constant yielding almost free smoothing within homogeneous regions. However, we take an alternative approach to choose C_n here. Specifically, a good choice of C_n should balance between the sensitivity and specificity of MASS. Theoretically, as shown in Section 2.3, C_n should satisfy C_n/n = o(1) and $C_{n}^{- 1} log (N_{D}) = o (1)$ . We choose $C_{n} = n^{0.4} χ_{1}^{2} (0.8)$ based on our experiments, where $χ_{1}^{2} (a)$ is the upper a-percentile of the $χ_{1}^{2}$ -distribution.

We now calculate Σ_n(β̂_j(d₀; h_s)). By treating the weights ω̃_j(d₀, d_m; h_s) as ‘fixed’ constants, we can approximate Σ_n(β̂_j(d₀; h_s)) by

\sum_{d_{m}, d_{m^{'}} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j} (d_{0}, d_{m}; h_{s}) {\tilde{ω}}_{j} (d_{0}, d_{m^{'}}; h_{s}) Cov ({\hat{β}}_{j} (d_{m}), {\hat{β}}_{j} (d_{m^{'}})),

(15)

where Cov(β̂_j(d_m), β̂_j(d_m_′)) can be estimated by

e_{j, p}^{T} Ω_{X, n}^{- 1} e_{j, p} [{\sum^{^}}_{η} (d_{m}, d_{m^{'}}) + {\sum^{^}}_{ε} (d_{m}, d_{m}) 1 (d_{m} = d_{m^{'}})],

(16)

in which e_j_,_p is a p × 1 vector with the j-th element 1 and others 0. We will examine the consistency of approximation (15) later.

In the stop checking step (II.3), after the first iteration, we start to calculate a stopping criterion based on a normalized distance between β̂_j(d₀) and β̂_j(d₀; h_s) given by

D ({\hat{β}}_{j} (d_{0}), {\hat{β}}_{j} (d_{0}; h_{s})) = {{\hat{β}}_{j} (d_{0}) - {\hat{β}}_{j} (d_{0}; h_{s})}^{2} / \sum_{n} ({\hat{β}}_{j} (d_{0})) .

(17)

Then, we check whether β̂_j(d₀; h_s) is in a confidence ellipsoid of β̂_j(d₀) given by {β_j(d₀) : D(β̂_j(d₀), β_j(d₀)) ≤ C_s}, where C_s is taken as $C_{s} = χ_{1}^{2} (0.80 / s)$ in our implementation. If D(β̂_j(d₀), β̂_j(d₀; h_s)) is greater than C_s, then we set β̂_j(d₀, h_S) = β̂_j(d₀, h_s₋₁) and s = S for the j-th component and voxel d₀. If s = S for all components in all voxels, we stop. If D(β̂_j(d₀), β̂_j(d₀; h_s)) ≤ C_s, then we set h_s₊₁ = c_hh_s, increase s by 1 and continue with the step (II.1). It should be noted that different components of β̂(d₀; h) may stop at different bandwidths.

We usually set the maximal step S to be relatively small, say between 10 and 20, and thus each B(d₀, h_S) only contains a relatively small number of voxels. As S increases, the number of neighboring voxels in B(d₀, h_S) increases exponentially. It increases the chance of oversmoothing β_j(d₀) when d₀ is near the edge of distinct regions. Moreover, in order to prevent oversmoothing β_j(d₀), we compare β̂_j(d₀; h_s) with the least squares estimate β̂_j(d₀) and gradually decrease C_s with the number of iteration.

2.2.3 Stage (III)

Based on β̂(d₀; h_S), we can further construct test statistics to examine scientific questions associated with β(d₀). For instance, such questions may compare brain structure across different groups (normal controls versus patients) or detect change in brain structure across time. These questions can be formulated as the linear hypotheses about β(d₀) given by

H_{0} (d_{0}) : R_{1} β (d_{0}) = b_{0} vs . H_{1} (d_{0}) : R_{1} β (d_{0}) \neq b_{0},

(18)

where R₁ is an r × k matrix of full row rank and b₀ is an r × 1 specified vector. We use the Wald test statistic

W_{β} (d_{0}; h) = {R_{1} \hat{β} (d_{0}; h_{S}) - b_{0}}^{T} {R_{1} \sum_{n} (\hat{β} (d_{0}; h_{S})) R_{1}^{T}}^{- 1} {R_{1} \hat{β} (d_{0}; h_{S}) - b_{0}}

(19)

for problem (18), where Σ_n(β̂(d₀; h_S)) is the covariance matrix of β̂(d₀; h_S).

We propose an approximation of Σ_n(β̂(d₀; h_S)). According to (13), we know that

\hat{β} (d_{0}; h_{S}) = \sum_{d_{m} \in B (d_{0}, h_{S})} \tilde{ω} (d_{0}, d_{m}; h_{S}) \circ \hat{β} (d_{m})

where a ∘ b denotes the Hadamard product of matrices a and b and ω̃(d₀, d_m; h) is a p × 1 vector determined by the weights ω̃_j(d₀, d_m; h) in Stage II. Let J_p be the p² × p selection matrix (Liu, 1999). Therefore, Σ_n(β̂(d₀; h_S)) can be approximated by

\begin{matrix} \sum_{d_{m}, d_{m}^{'} \in B (d_{0}, h_{S})} Cov (\tilde{ω} (d_{0}, d_{m}; h_{S}) \circ \hat{β} (d_{m}), \tilde{ω} (d_{0}, d_{m}^{'}; h_{S}) \circ \hat{β} (d_{m}^{'})) \\ \approx \sum_{d_{m}, d_{m}^{'} \in B (d_{0}, h_{S})} {\sum^{^}}_{y} (d_{m}, d_{m}^{'}) J_{p}^{T} {[\tilde{ω} (d_{0}, d_{m}; h_{S}) \tilde{ω} {(d_{0}, d_{m}^{'}; h_{S})}^{T}] \otimes Ω_{X, n}^{- 1}} J_{p} . \end{matrix}

2.3 Theoretical Results

We systematically investigate the asymptotic properties of all estimators obtained from the three-stage estimation procedure. Throughout the paper, we only consider a finite number of iterations and bounded r₀ for MASS, since a brain volume is always bounded. Without otherwise stated, we assume that o_p(1) and O_p(1) hold uniformly across all d in either Inline graphic or throughout the paper. Moreover, the sample size n and the number of voxels N_D are allowed to diverge to infinity. We state the following theorems, whose detailed assumptions and proofs can be found in Section 6 and a supplementary document.

Let β_*(d₀) = (β_1*(d₀), …, β_p_*(d₀))^T be the true value of β(d₀) at voxel d₀. We first establish the uniform convergence rate of {β̂(d₀) : d₀ ∈ Inline graphic }.

Theorem 1

Under assumptions (C1)–(C4) in Section 6, as n → ∞, we have

(i) $\sqrt{n} [\hat{β} (d_{0}) - β_{*} (d_{0})] \to^{L} N (0, Ω_{X}^{- 1} \sum_{y} (d_{0}, d_{0}))$ for any d₀ ∈ , where →^L denotes convergence in distribution;
(ii) ${sup}_{d_{0} \in D_{0}} {| | \hat{β} (d_{0}) - β_{*} (d_{0}) | |}_{2} = O_{p} (\sqrt{n^{- 1} log (1 + N_{D})})$

Remark 1

Theorem 1 (i) just restates a standard asymptotic normality of the least squares estimate of β(d₀) at any given voxel d₀ ∈ Inline graphic . Theorem 1 (ii) states that the maximum of ||β̂(d₀) − β_*(d₀)||₂ across all d₀ ∈ is at the order of $\sqrt{n^{- 1} log (1 + N_{D})}$ . If log(1 + N_D) is relatively small compared with n, then the estimation errors converge uniformly to zero in probability. In practice, N_D is determined by imaging resolution and its value can be much larger than the sample size. For instance, in most applications, N_D can be as large as 100³ and log(1 + N_D) is around 15. In a study with several hundreds subjects, n⁻¹ log(1 + N_D) can be relatively small.

We next study the uniform convergence rate of Σ̂_η and its associated eigenvalues and eigenfunctions. We also establish the uniform convergence of Σ̂_ε(d₀, d₀).

Theorem 2

Under assumptions (C1)–(C8) in Section 6, we have the following results:

$sup_{(d, d^{'}) \in D^{2}} ∣ {\sum^{^}}_{η} (d, d^{'}) - \sum_{η} (d, d^{'}) ∣ = O_{p} (1)$ ;
∫[ψ̂_l(d) − ψ̂_l(d)]² d (d) = o_P(1) and |λ̂_l − λ_l| = o_p(1) for l = 1, …, E;
$sup_{d_{0} \in D_{0}} ∣ {\sum^{^}}_{ε} (d_{0}, d_{0}) - \sum_{ε} (d_{0}, d_{0}) ∣ = o_{p} (1)$ ;

where E will be described in assumption (C8) and ψ̂_l(d) is the estimated eigenvector, computed from ψ̂_l = Vξ_l.

Remark 2

Theorem 2 (i) and (ii) characterize the uniform weak convergence of Σ̂_η(·, ·) and the convergence of ψ̂_l (·) and λ̂_l. These results can be regarded as an extension of Theorems 3.3–3.6 in Li and Hsing (2010), which established the uniform strong convergence rates of these estimates under a simple model. Specifically, in Li and Hsing (2010), they considered y_i(d) = μ(d) + η_i(d) + ε_i(d) and assumed that μ(d) is twice differentiable. Another key difference is that in Li and Hsing (2010), they employed all cross products y_i(d)y_i(d′) for d ≠ d′ and then used the local polynomial kernel to estimate Σ_η(d, d′). In contrast, our approach is computationally simple and Σ̂_η(d, d′) is positive definite. Theorem 2 (iii) characterizes the uniform weak convergence of Σ̂_ε(d₀, d₀) across all voxels d₀ ∈ Inline graphic .

To investigate the asymptotic properties of β̂_j(d₀; h_s), we need to characterize points close to and far from the boundary set ∂ Inline graphic . For a given bandwidth h_s, we first define h_s-boundary sets:

\partial D^{(j)} (h_{s}) = {d \in D : B (d, h_{s}) \cap \partial D^{(j)} \neq \emptyset} and \partial D_{0}^{(j)} (h_{s}) = \partial D^{(j)} (h_{s}) \cap D_{0} .

(20)

Thus, ∂ Inline graphic (h_s) can be regarded as a band with radius h_s covering the boundary set ∂ , while $\partial D_{0}^{(j)} (h_{s})$ contains all grid points within such band. It is easy to show that for a sequence of bandwidths h₀ = 0 < h₁ < ··· < h_S, we have

\partial D^{(j)} (h_{0}) = \partial D^{(j)} \subset \dots \subset \partial D^{(j)} (h_{S}) and \partial D_{0}^{(j)} (h_{0}) \subset \dots \subset \partial D_{0}^{(j)} (h_{S}) .

(21)

Therefore, for a fixed bandwidth h_s, any point d₀ ∈ Inline graphic belongs to either \ ∂ (h_s) or ∂ (h_s). For each d₀ ∈ \ ∂ (h_s), there exists one and only one such that

B (d_{0}, h_{0}) \subset \dots \subset B (d_{0}, h_{s}) \subset D_{j, l}^{o} .

(22)

See Figure 1 (d) for an illustration.

We first investigate the asymptotic behavior of β̂_j(d₀; h_s) when β_j_*(d) is piecewise constant. That is, β_j_*(d) is a constant in $D_{j, l}^{o}$ and for any d′ ∈ ∂ Inline graphic , there exists a $d \in \cup_{l = 1}^{L_{j}} D_{j, l}^{o}$ such that β_j_*(d) = β_j_*(d′). Let β̃_j_*(d₀; h_s) = ω̃_j(d₀, d_m; h_s)β_j_*(d_m) be the pseudo-true value of β_j(d₀) at scale h_s in voxel d₀. For all d₀ ∈ \ ∂ (h_S), we have β̃_j_*(d₀; h_s) = β_j_*(d₀) for all s ≤ S due to (22). In contrast, for d₀ ∈ ∂ Inline graphic (h_S), β̃_j_*(d₀; h_s) may vary from h₀ to h_S. In this case, we are able to establish several important theoretical results to characterize the asymptotic behavior of β̂(d₀; h_s) even when h_S does not converge to zero. We need additional notation as follows:

\begin{array}{l} {\hat{Δ}}_{j} (d_{0}) = {\hat{β}}_{j} (d_{0}) - β_{j *} (d_{0}) and Δ_{j *} (d_{0}, d_{0}^{'}) = β_{j *} (d_{0}) - β_{j *} (d_{0}^{'}), \\ ω_{j}^{(0)} (d_{0}, d_{0}^{'}; h_{s}) = K_{loc} ({| | d_{0} - d_{0}^{'} | |}_{2} / h_{s}) K_{s t} (0) 1 (β_{j *} (d_{0}) = β_{j *} (d_{0}^{'})), \\ {\tilde{ω}}_{j}^{(0)} (d_{0}, d_{0}^{'}; h_{s}) = ω_{j}^{(0)} (d_{0}, d_{0}^{'}; h_{s}) / \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} ω_{j}^{(0)} (d_{0}, d_{m}; h_{s}), \\ \sum_{j}^{(0)} (d_{0}; h_{s}) = e_{j, p}^{T} Ω_{X}^{- 1} e_{j, p} \sum_{d_{m}, d_{m}^{'} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j}^{(0)} (d_{0}, d_{m}; h_{s}) {\tilde{ω}}_{j}^{(0)} (d_{0}, d_{m}^{'}; h_{s}) \sum_{y} (d_{m}, d_{m}^{'}) . \end{array}

(23)

Theorem 3

Under assumptions (C1)–(C10) in Section 6 for piecewise constant {β_j_*(d) : d ∈ Inline graphic }, we have the following results for all 0 ≤ s ≤ S:

${sup}_{d_{0} \in D_{0}} ∣ {\tilde{β}}_{j *} (d_{0}; h_{s}) - β_{j *} (d_{0}) ∣ = o_{p} (\sqrt{log (1 + N_{D}) / n})$ ;
${\hat{β}}_{j} (d_{0}; h_{s}) - β_{j *} (d_{0}) = \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j}^{(0)} (d_{0}, d_{m}; h_{s}) {\hat{Δ}}_{j} (d_{m}) [1 + o_{p} (1)]$ ;
${sup}_{d_{0} \in D_{0}} ∣ \sum^{^} (\sqrt{n} {\tilde{β}}_{j *} (d_{0}; h_{s})) - \sum_{j}^{(0)} (d_{0}; h_{s}) ∣ = o_{p} (1)$ ;
$\sqrt{n} [{\hat{β}}_{j} (d_{0}; h_{s}) - β_{j *} (d_{0})]$ converges in distribution to a normal distribution with mean zero and variance $\sum_{j}^{(0)} (d_{0}; h_{s})$ as n → ∞.

Remark 3

Theorem 3 shows that MASS has several important features for a piecewise constant function β_j_*(d). For instance, Theorem 3 (i) quantifies the maximum absolute difference (or bias) between the true value β_j_*(d₀) and the pseudo true value β̃_j_*(d₀; h_s) across all d₀ ∈ Inline graphic for any s. Since β̃_j_*(d₀; h_s) − β_j_*(d₀) = 0 for d₀ ∈ \ ∂ (h_s), this result delineates the potential bias for voxels d₀ in ∂ (h_s).

Theorem 3 (iv) ensures that $\sqrt{n} [{\hat{β}}_{j} (d_{0}; h_{s}) - β_{j *} (d_{0})]$ is asymptotically normally distributed. Moreover, as shown in the supplementary document, $\sum_{j}^{(0)} (d_{0}; h_{s})$ is smaller than the asymptotic variance of the raw estimate β̂_j(d₀). As a result, MASS increases statistical power of testing H₀(d₀).

We now consider a much complex scenario when β_j_*(d) is piecewise smooth. In this case, β̃_j_*(d₀; h_s) may vary from h₀ to h_S for all voxels d₀ ∈ Inline graphic regardless whether d₀ belongs to ∂ (h_s) or not. We can establish important theoretical results to characterize the asymptotic behavior of β̂(d₀; h_s) only when $h_{s} = O (\sqrt{log (1 + N_{D}) / n}) = o (1)$ holds. We need some additional notation as follows:

\begin{array}{l} ω_{j}^{(1)} (d_{0}, d_{0}^{'}; h_{s}) = K_{loc} ({| | d_{0} - d_{0}^{'} | |}_{2} / h_{s}) K_{s t} (0) 1 (∣ β_{j *} (d_{0}) - β_{j *} (d_{0}^{'}) ∣ \leq O (h_{s})), \\ {\tilde{ω}}_{j}^{(1)} (d_{0}, d_{0}^{'}; h_{s}) = ω_{j}^{(1)} (d_{0}, d_{0}^{'}; h_{s}) / \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} ω_{j}^{(1)} (d_{0}, d_{m}; h_{s}), \\ \sum_{j}^{(1)} (d_{0}; h_{s}) = e_{j, p}^{T} Ω_{X}^{- 1} e_{j, p} \sum_{d_{m}, d_{m}^{'} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j}^{(1)} (d_{0}, d_{m}; h_{s}) {\tilde{ω}}_{j}^{(0)} (d_{0}, d_{m}^{'}; h_{s}) \sum_{y} (d_{m}, d_{m}^{'}) . \end{array}

(24)

Theorem 4

Suppose assumptions (C1)–(C9) and (C11) in Section 6 hold for piecewise continuous {β_j_*(d) : d ∈ Inline graphic }. For all 0 ≤ s ≤ S, we have the following results:

|β̃_j_*(d₀; h_s) − β_j_*(d₀)| = O_p(h_s);
${\hat{β}}_{j} (d_{0}; h_{s}) - {\tilde{β}}_{j *} (d_{0}; h_{s}) = \sum_{d_{m} \in B (d_{0}, h_{s}) \cap D_{0}} {\tilde{ω}}_{j}^{(1)} (d_{0}, d_{m}; h_{s}) {\hat{Δ}}_{j} (d_{m}) [1 + o_{p} (1)]$ ;
${sup}_{d_{0} \in D_{0}} ∣ \sum^{^} (\sqrt{n} {\tilde{β}}_{j *} (d_{0}; h_{s})) - \sum_{j}^{(1)} (d_{0}; h_{s}) ∣ = o_{p} (1)$ .
$\sqrt{n} [{\hat{β}}_{j} (d_{0}; h_{s}) - {\tilde{β}}_{j *} (d_{0}; h_{s})]$ converges in distribution to a normal distribution with mean zero and variance $\sum_{j}^{(1)} (d_{0}; h_{s})$ as n → ∞.

Remark 4

Theorem 4 characterizes several key features of MASS for a piecewise continuous function β_j_*(d). These results differ significantly from those for the piecewise constant case, but under weaker assumptions. For instance, Theorem 4 (i) quantifies the bias of the pseudo true value β̃_j_*(d₀; h_s) relative to the true value β_j_*(d₀) across all d₀ ∈ Inline graphic for a fixed s. Even for voxels inside the smooth areas of β_j_*(d), the bias O_p(h_s) is still much higher than the standard bias at the rate of $h_{s}^{2}$ due to the presence of $K_{s t} (D_{β_{j}} (d_{0}, d_{0}^{'}; h_{s - 1}) / C_{n})$ (Fan and Gijbels, 1996; Wand and Jones, 1995). If we set K_st(u) = 1(u ∈ [0, 1]) and β_j_*(d) is twice differentiable, then the bias of β̃_j_*(d₀; h_s) relative to β_j_*(d₀) may be reduced to $O_{p} (h_{s}^{2})$ . Theorem 4 (iv) ensures that $\sqrt{n} [{\hat{β}}_{j} (d_{0}; h_{s}) - {\tilde{β}}_{j *} (d_{0}; h_{s})]$ is asymptotically normally distributed. Moreover, as shown in the supplementary document, $\sum_{j}^{(1)} (d_{0}; h_{s})$ is smaller than the asymptotic variance of the raw estimate β̂_j(d₀), and thus MASS can increase statistical power in testing H₀(d₀) even for the piecewise continuous case.

3 Simulation Studies

In this section, we conducted a set of Monte Carlo simulations to compare MASS with voxel-wise methods from three different aspects. Firstly, we examine the finite sample performance of β̂(d₀; h_s) at different signal-to-noise ratios. Secondly, we examine the accuracy of the estimated eigenfunctions of Σ_η(d, d′). Thirdly, we assess both Type I and II error rates of the Wald test statistic. For the sake of space, we only present some selected results below and put additional simulation results in the supplementary document.

We simulated data at all 32,768 voxels on the 64 × 64 × 8 phantom image for n = 60 (or 80) subjects. At each d₀ = (d_0,1, d_0,2, d_0,3)^T in Inline graphic , Y_i(d₀) was simulated according to

y_{i} (d_{0}) = x_{i}^{T} β (d_{0}) + η_{i} (d_{0}) + ε_{i} (d_{0}) for i = 1, \dots, n,

(25)

where x_i = (x_i₁, x_i₂, x_i₃)^T, β(d₀) = (β₁(d₀), β₂(d₀), β₃(d₀))^T, and ε(d₀) ~ N(0, 1) or χ(3)² − 3, in which χ²(3) − 3 is a very skewed distribution. Furthermore, we set $η_{i} (d_{0}) = \sum_{l = 1}^{3} ξ_{i l} ψ_{l} (d_{0})$ , where ξ_il are independently generated according to ξ_i₁ ~ N(0, 0.6), ξ_i₂ ~ N(0, 0.3), and ξ_i₃ ~ N(0, 0.1), ψ₁(d₀) = 0.5 sin(2πd_0,1/64), ψ₂(d₀) = 0.5 cos(2πd_0,2/64), and $ψ_{3} (d_{0}) = \sqrt{1 / 2.625} (9 / 8 - d_{0, 3} / 4)$ . The first eigenfunction ψ₁(d₀) changes only along d_0,1 direction, while it keeps constant in the other two directions. The other two eigenfunctions, ψ₂(d₀) and ψ₃(d₀), were chosen in a similar way (Figure 3). We set x_i₁ = 1 and generated x_i₂ independently from a Bernoulli distribution with success rate 0.5 and x_i₃ independently from the uniform distribution on [1, 2]. The covariates x_i₂ and x_i₃ were chosen to represent group identity and scaled age, respectively.

Simulation results: a selected slice of (a) true ψ₁(d); (b) true ψ₂(d); (c) true ψ₃(d); (d) ψ̂₁(d); (e) ψ̂₂(d); and (f) ψ̂₃(d).

We chose different pattens for different β_j(d) images in order to examine the finite sample performance of our estimation method under different scenarios. We set all the 8 slices along the coronal axis to be identical for each of β_j(d) images. As shown in Figure 4, each slice of the three different β_j(d) images has four different blocks and 5 different regions of interest (ROIs) with varying patterns and shape. The true values of β_j(d) were varied from 0 to 0.8, respectively, and were displayed for all ROIs with navy blue, blue, green, orange and brown colors representing 0, 0.2, 0.4, 0.6, and 0.8, respectively.

Simulation results: a selected slice of (a) true β₁(d); (b) true β₂(d); (c) true β₃(d); (d) β̂₁(d₀); (e) β̂₂(d₀); (f) β̂₃(d₀); (g) β̂₁(d₀; h₁₀); (h) β̂₂(d₀; h₁₀); and (i) β̂₃(d₀; h₁₀).

We fitted the SVCM model (1) with the same set of covariates to a simulated data set, and then applied the three-stage estimation procedure described in Section 2.2 to calculate adaptive parameter estimates across all pixels at 11 different scales. In MASS, we set h_s = 1.1^s for s = 0, …, S = 10. Figure 4 shows some selected slices of β̂(d₀; h_s) at s = 0 (middle panels) and s = 10 (lower panels). Inspecting Figure 4 reveals that all β̂_j(d₀; h₁₀) outperform their corresponding β̂_j(d₀) in terms of variance and detected ROI patterns. Following the method described in Section 2.2, we estimated η_i(d) based on the residuals $y_{i} (d_{0}) - x_{i}^{T} \hat{β} (d_{0})$ by using the local linear smoothing method and then calculate η̂_i(d). Figure 3 shows some selected slices of the first three estimated eigenfunctions. Inspecting Figure 3 reveals that η̂_i(d) are relatively close to the true eigenfunctions and can capture the main feature in the true eigenfunctions, which vary in one direction and are constant in the other two directions. However, we do observe some minor block effects, which may be caused by using the block smoothing method to estimate η_i(d).

Furthermore, for β̂(d₀; h_s), we calculated the bias, the empirical standard error (RMS), the mean of the estimated standard errors (SD), and the ratio of RMS over SD (RE) at each voxel of the five ROIs based on the results obtained from the 200 simulated data sets. For the sake of space, we only presented some selected results based on β̂₃(d₀) and β̂₃(d₀; h₁₀) obtained from N(0, 1) distributed data with n = 60 in Table 1. The biases are slightly increased from h₀ to h₁₀ (Table 1), whereas RMS and SD at h₅ and h₁₀ are much smaller than those at h₀ (Table 1). In addition, the RMS and its corresponding SD are relatively close to each other at all scales for both the normal and Chi-square distributed data (Table 1). Moreover, SDs in these voxels of ROIs with β₃(d₀) > 0 are larger than SDs in those voxels of ROI with β₃(d₀) = 0, since the interior of ROI with β₃(d₀) = 0 contains more pixels (Figure 4 (c)). Moreover, the SDs at steps h₀ and h₁₀ show clear spatial patterns caused by spatial correlations. The RMSs also show some evidence of spatial patterns. The biases, SDs, and RMSs of β₃(d₀) are smaller in the normal distributed data than in the chi-square distributed data (Table 1), because the signal-to-noise ratios (SNRs) in the normal distributed data are bigger than those SNRs in the chi-square distributed data. Increasing sample size and signal-to-noise ratio decreases the bias, RMS and SD of parameter estimates (Table 1).

Table 1.

Simulation results: Average Bias (×10⁻²), RMS, SD, and RE of β₂(d₀) parameters in the five ROIs at 3 different scales (h₀, h₅, h₁₀), N (0, 1) and χ(3)² − 3 distributed noisy data, and 2 different sample sizes (n = 60, 80). BIAS denotes the bias of the mean of estimates; RMS denotes the root-mean-square error; SD denotes the mean of the standard deviation estimates; RE denotes the ratio of RMS over SD. For each case, 200 simulated data sets were used.

		χ²(3) − 3						N(0, 1)
		n = 60			n = 80			n = 60			n = 80

β₂(d₀)		h₀	h₅	h₁₀	h₀	h₅	h₁₀	h₀	h₅	h₁₀	h₀	h₅	h₁₀
0.0	BIAS	−0.03	0.36	0.61	0.00	0.34	0.56	−0.01	0.17	0.22	0.01	0.16	0.20
	RMS	0.18	0.13	0.13	0.15	0.10	0.10	0.14	0.07	0.07	0.12	0.06	0.06
	SD	0.18	0.13	0.12	0.15	0.11	0.11	0.14	0.07	0.07	0.12	0.06	0.06
	RE	1.03	1.00	1.04	1.00	0.94	0.98	0.99	0.94	1.03	1.00	0.95	1.04

0.2	BIAS	0.72	0.37	0.38	0.15	−0.35	−0.39	−0.04	−0.55	−0.66	0.10	−0.48	−0.61
	RMS	0.19	0.14	0.13	0.16	0.11	0.11	0.14	0.07	0.07	0.12	0.06	0.06
	SD	0.18	0.14	0.13	0.16	0.12	0.11	0.14	0.08	0.07	0.12	0.07	0.06
	RE	1.02	0.99	1.03	1.00	0.96	0.99	0.99	0.96	1.04	1.00	0.97	1.06

0.4	BIAS	−0.40	−0.55	−0.68	−0.10	−0.15	−0.24	0.04	0.12	0.13	−0.10	0.05	0.08
	RMS	0.19	0.14	0.14	0.16	0.12	0.12	0.14	0.07	0.07	0.12	0.07	0.07
	SD	0.18	0.14	0.13	0.16	0.12	0.12	0.14	0.08	0.07	0.12	0.07	0.06
	RE	1.02	1.00	1.03	1.00	0.96	1.00	0.99	0.96	1.04	1.00	0.97	1.06

0.6	BIAS	0.42	−1.14	−1.93	0.05	−1.20	−1.89	0.03	−0.55	−0.69	−0.01	−0.43	−0.54
	RMS	0.18	0.13	0.13	0.15	0.11	0.11	0.14	0.07	0.07	0.12	0.06	0.06
	SD	0.18	0.13	0.13	0.15	0.11	0.11	0.14	0.08	0.07	0.12	0.07	0.06
	RE	1.02	1.00	1.04	1.00	0.95	0.99	0.99	0.97	1.05	1.00	0.97	1.05

0.8	BIAS	−1.04	−2.95	−4.09	−0.13	−1.71	−2.70	−0.11	−0.82	−1.03	−0.03	−0.59	−0.77
	RMS	0.19	0.15	0.15	0.16	0.12	0.12	0.14	0.08	0.07	0.12	0.07	0.07
	SD	0.19	0.15	0.14	0.16	0.13	0.12	0.14	0.08	0.07	0.12	0.07	0.06
	RE	1.02	1.00	1.03	1.00	0.96	0.99	0.99	0.94	1.01	1.00	0.95	1.02

Open in a new tab

To assess both Type I and II error rates at the voxel level, we tested the hypotheses H₀(d₀): β_j(d₀) = 0 versus H₁(d₀): β_j(d₀) ≠ 0 for j = 1, 2, 3 across all d₀ ∈ Inline graphic . We applied the same MASS procedure at scales h₀ and h₁₀. The −log₁₀(p) values on some selected slices are shown in the supplementary document. The 200 replications were used to calculate the estimates (ES) and standard errors (SE) of rejection rates at α = 5% significance level. Due to space limit, we only report the results of testing β₂(d₀) = 0. The other two tests have similar results and are omitted here. For W_β(d₀; h), the Type I rejection rates in ROI with β₂(d₀) = 0 are relatively accurate for all scenarios, while the statistical power for rejecting the null hypothesis in ROIs with β₂(d₀) ≠ 0 significantly increases with radius h_s and signal-to-noise ratio (Table 2). As expected, increasing n improves the statistical power for detecting β₂(d₀) ≠ 0.

Table 2.

Simulation Study for W_β(d₀; h): estimates (ES) and standard errors (SE) of rejection rates for pixels inside the five ROIs were reported at 2 different scales (h₀, h₁₀), N(0, 1) and χ²(3) − 3 distributed data, and 2 different sample sizes (n = 60, 80) at α = 5%. For each case, 200 simulated data sets were used.

		χ²(3) − 3				N(0, 1)
		n = 60		n = 80		n = 60		n = 80

β₂(d₀)	s	ES	SE	ES	SE	ES	SE	ES	SE
0.0	h₀	0.056	0.016	0.049	0.015	0.048	0.015	0.050	0.016
0.0	h₁₀	0.055	0.016	0.042	0.015	0.036	0.016	0.040	0.019

0.2	h₀	0.210	0.043	0.245	0.039	0.282	0.033	0.370	0.035
0.2	h₁₀	0.358	0.126	0.413	0.139	0.777	0.107	0.870	0.081

0.4	h₀	0.556	0.072	0.692	0.054	0.794	0.030	0.895	0.024
0.4	h₁₀	0.792	0.129	0.894	0.078	0.994	0.006	0.998	0.003

0.6	h₀	0.907	0.040	0.966	0.022	0.988	0.008	0.998	0.003
0.6	h₁₀	0.986	0.023	0.997	0.009	1.000	0.001	1.000	0.000

0.8	h₀	0.978	0.016	0.997	0.004	1.000	0.001	1.000	0.000
0.8	h₁₀	0.997	0.006	1.000	0.001	1.000	0.000	1.000	0.000

Open in a new tab

4 Real Data Analysis

We applied SVCM to the Attention Deficit Hyperactivity Disorder (ADHD) data from the New York University (NYU) site as a part of the ADHD-200 Sample Initiative (http://fcon1000.projects.nitrc.org/indi/adhd200/). ADHD-200 Global Competition is a grassroots initiative event to accelerate the scientific community’s understanding of the neural basis of ADHD through the implementation of open data-sharing and discovery-based science. Attention deficit hyperactivity disorder (ADHD) is one of the most common childhood disorders and can continue through adolescence and adulthood (Polanczyk et al., 2007). Symptoms include difficulty staying focused and paying attention, difficulty controlling behavior, and hyperactivity (over-activity). It affects about 3 to 5 percent of children globally and diagnosed in about 2 to 16 percent of school aged children (Polanczyk et al., 2007). ADHD has three subtypes, namely, predominantly hyperactive-impulsive type, predominantly inattentive type, and combined type.

The NYU data set consists of 174 subjects (99 Normal Controls (NC) and 75 ADHD subjects with combined hyperactive-impulsive). Among them, there are 112 males whose mean age is 11.4 years with standard deviation 7.4 years and 62 females whose mean age is 11.9 years with standard deviation 10 years. Resting-state functional MRIs and T1-weighted MRIs were acquired for each subject. We only use the T1-weighted MRIs here. We processed the T1-weighted MRIs by using a standard image processing pipeline detailed in the supplementary document. Such pipeline consists of AC (anterior commissure) and -PC (posterior commissure) correction, bias field correction, skull-stripping, intensity inhomogeneity correction, cerebellum removal, segmentation, and nonlinear registration. We segmented each brain into three different tissues including grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). We used the RAVENS maps to quantify the local volumetric group differences for the whole brain and each of the segmented tissue type (GM, WM, and CSF) respectively, using the deformation field that we obtained during registration (Davatzikos et al., 2001). RAVENS methodology is based on a volume-preserving spatial transformation, which ensures that no volumetric information is lost during the process of spatial normalization, since this process changes an individuals brain morphology to conform it to the morphology of the Jacob template (Kabani et al., 1998).

We fitted model (1) to the RAVEN images calculated from the NYU data set. Specifically, we set β(d₀) = (β₁(d₀), …, β₈(d₀))^T and x_i = (1, G_i, A_i, D_i, WBV_i, A_i × D_i, G_i × D_i, A_i × G_i)^T, where G_i, A_i, D_i, and WBV_i, respectively, represent gender, age, diagnosis (1 for NC and 0 for ADHD), and whole brain volume. We applied the three-stage estimation procedure described in Section 2.2. In MASS, we set h_s = 1.1^s for s = 1, …, 10. We are interested in assessing the age and diagnosis interaction and the gender and diagnosis interaction. Specifically, we tested H₀(d₀): β₆(d₀) = 0 against H₁(d₀): β₆(d₀) ≠ 0 for the age × diagnosis interaction across all voxels. Moreover, we also tested H₀(d₀): β₇(d₀) = 0 against H₁(d₀): β₇(d₀) ≠ 0 for the gender × diagnosis interaction, but we present the associated results in the supplementary document. Furthermore, as shown in the supplementary document, the largest estimated eigenvalue is much larger than all other estimated eigenvalues, which decrease very slowly to zero, and explains 22% of variation in data after accounting for x_i. Inspecting Figure 5 reveals that the estimated eigenfunction corresponding to the largest estimated eigenvalue captures the dominant morphometric variation.

Results from the ADHD 200 data: five selected slices of the four estimated eigenfunctions corresponding to the first four largest eigenvalues of Σ̂_η(·, ·): (a) ψ̂₁(d); (b) ψ̂₂(d); (c) ψ̂₃(d); and (d) ψ̂₄(d).

As s increases from 0 to 10, MASS shows an advantage in smoothing effective signals within relatively homogeneous ROIs, while preserving the edges of these ROIs (Fig. 6 (a)–(d)). Inspecting Figure 6 (c) and (d) reveals that it is much easier to identify significant ROIs in the −log₁₀(p) images at scale h₁₀, which are much smoother than those at scale h₀. To formally detect significant ROIs, we used a cluster-form of threshold of 5% with a minimum voxel clustering value of 50 voxels. We were able to detect 26 significant clusters across the brain. Then, we overlapped these clusters with the 96 predefined ROIs in the Jacob template and were able to detect several predefined ROIs for each cluster. As shown in the supplementary document, we were able to detect several major ROIs, such as the frontal lobes and the right parietal lobe. The anatomical disturbance in the frontal lobes and the right parietal lobe has been consistently revealed in the literature and may produce difficulties with inhibiting prepotent responses and decreased brain activity during inhibitory tasks in children with ADHD (Bush, 2011). These ROIs comprise the main components of the cingulo-frontal-parietal cognitive-attention network. These areas, along with striatum, premotor areas, thalamus and cerebellum have been identified as nodes within parallel networks of attention and cognition (Bush, 2011).

Results from the ADHD 200 data: five selected slices of (a) β̂₆(d₀), (b) β̂₆(d₀; h₁₀), the −log₁₀(p) images for testing H₀: β₆(d₀) = 0 (c) at scale h₀ and (d) at scale h₁₀, where β₆(d₀) is the regression coefficient associated with the age×diagnostic interaction.

To evaluate the prediction accuracy of SVCM, we randomly selected one subject with ADHD from the NYU data set and predicted his/her RAVENS image by using both model (1) and a standard linear model with normal noise. In both models, we used the same set of covariates, but different covariance structures. Specifically, in the standard linear model, an independent correlation structure was used and the least squares estimates of β(d₀) were calculated. For SVCM, the functional principal component analysis model was used and β̂(d₀; h₁₀) were calculated. After fitting both models to all subjects except the selected one, we used the fitted models to predict the RAVEN image of the selected subject and then calculated the prediction error based on the difference between the true and predicted RAVEN images. We repeated the prediction procedure 50 times and calculated the mean and standard deviation images of these prediction error images (Figure 7). Inspecting Figure 7 reveals the advantage and accuracy of model (1) over the standard linear model for the ADHD data.

Results from the ADHD 200 data: The raw RAVENS image for a selected subject with ADHD (a), mean ((b) GLM and (d) SVCM) and standard error ((c) GLM and (e) SVCM) of the errors to predict the RAVENS image in (a), where GLM denotes general linear model.

5 Discussion

This article studies the idea of using SVCM for the spatial and adaptive analysis of neuroimaging data with jump discontinuities, while explicitly modeling spatial dependence in neuroimaging data. We have developed a three-stage estimation procedure to carry out statistical inference under SVCM. MASS integrates three methods including propagation-separation, functional principal component analysis, and jumping surface model for neuroimaging data from multiple subjects. We have developed a fast and accurate estimation method for independently updating each of effect images, while consistently estimating their standard deviation images. Moreover, we have derived the asymptotic properties of the estimated eigenvalues and eigenfunctions and the parameter estimates.

Many issues still merit further research. The basic setup of SVCM can be extended to more complex data structures (e.g., longitudinal, twin and family) and other parametric and semiparametric models. For instance, we may develop a spatial varying coefficient mixed effects model for longitudinal neuroimaging data. It is also feasible to include nonparametric components in SVCM. More research is needed for weakening regularity assumptions and for developing adaptive-neighborhood methods to determine multiscale neighborhoods that adapt to the pattern of imaging data at each voxel. It is also interesting to examine the efficiency of our adaptive estimators obtained from MASS for different kernel functions and coefficient functions. An important issue is that SVCM and other voxel-wise methods do not account for the errors caused by registration method. We may need to explicitly model the measurement errors caused by the registration method, and integrate them with smoothing method and SVCM into a unified framework.

6 Technical Conditions

6.1 Assumptions

Throughout the paper, the following assumptions are needed to facilitate the technical details, although they may not be the weakest conditions. We do not distinguish the differentiation and continuation at the boundary points from those in the interior of Inline graphic .

Assumption C1. The number of parameters p is finite. Both N_D and n increase to infinity such that ${lim}_{n \to \infty} C_{n} / n = {lim}_{n \to \infty} C_{n}^{- 1} log (N_{D}) = {lim}_{n \to \infty} C_{n}^{- 1} = 0$ .
Assumption C2. ε_i(d) are identical and independent copies of SP(0, Σ_ε) and ε_i(d) and ε_i(d′) are independent for d ≠ d′ ∈ . Moreover, ε_i(d) are, uniformly in d, sub-Gaussian such that $K_{ε}^{2} [E exp ({∣ ε_{i} (d) ∣}^{2} / K_{ε}) - 1] \leq C_{ε}$ for all d ∈ and some positive constants K_ε and C_ε.
Assumption C3. The covariate vectors x_is are independently and identically distributed with Ex_i = μ_x and ||x_i||_∞ < ∞. Moreover, $E (x_{i}^{\otimes 2}) = Ω_{X}$ is invertible. The x_i, ε_i(d), and η_i(d) are mutually independent of each other.
Assumption C4. Each component of {η(d): d ∈ }, {η(d)η(d′)^T: (d, d′) ∈ } and {xη^T(d): d ∈ } are Donsker classes. Moreover, Σ_η(d, d) > 0 and $E [{sup}_{d \in D} {| | η (d) | |}_{2}^{2 r_{1}}] < \infty$ for some r₁ ∈ (2, ∞), where || · ||₂ is the Euclidean norm. All components of Σ_η(d, d′) have continuous second-order partial derivatives with respect to (d, d′) ∈ .
Assumption C5. The grid points = {d_m, m = 1, …, N_D} are independently and identically distributed with density function π(d), which has the bounded support . Moreover, π(d) > 0 for all d ∈ and π(d) has continuous second-order derivative.
Assumption C6. The kernel functions K_loc(t) and K_st(t) are Lipschitz continuous and symmetric density functions, while K_loc(t) has a compact support [−1, 1]. Moreover, they are continuously decreasing functions of t ≥ 0 such that K_st(0) = K_loc(0) > 0 and lim_t→_∞ K_st(t) = 0.
Assumption C7. h converges to zero such that

$h \geq c {(log N_{D} / N_{D})}^{1 - 2 / q_{1}} and h^{- 12} {(log n / n)}^{1 - 1 / q_{2}} = o (1),$

where c > 0 is a fixed constant and min(q₁, q₂) > 2.
Assumption C8. There is a positive integer E < ∞ such that λ₁ > … > λ_E ≥ 0.
Assumption C9. For each j, the three assumptions of the jumping surface model hold, each $D_{j, l}^{o}$ is path-connected, and β_j_*(d) is a Lipschitz function of d with a common Lipschitz constant K_j > 0 in each $D_{j, l}^{o}$ such that |β_j_*(d) − β_j_*(d′)| ≤ K_j||d − d′||₂ for any d, $d^{'} \in D_{j, l}^{o}$ . Moreover, |β_j_*(d)| < ∞, and max(K_j, L_j) < ∞.
Assumption C10. For piecewise constant β_j_*(d), $o (u^{(j)} (h_{s})) = \sqrt{log (1 + N_{D}) / n}$ and $N_{D} h_{s}^{3} K_{s t} (C_{n}^{- 1} n u^{(j)} {(h_{s})}^{2} / (3 S_{y})) = o (\sqrt{log (1 + N_{D}) / n})$ holds uniformly for h₀ = 0 < ··· < h_S, where S_y = Σ_y(d₀, d₀) and u⁽^j⁾(h_s) is the smallest absolute value of all possible jumps at scale h_s and given by
$u^{(j)} (h_{s}) = min {∣ β_{j *} (d_{0}) - β_{j *} (d_{0}^{'}) ∣ : (d_{0}, d_{0}^{'}) \in D_{0}^{2}, β_{j *} (d_{0}) \neq β_{j *} (d_{0}^{'}), d_{0}^{'} \in B (d_{0}, h_{s})} .$
Assumption C11. For piecewise continuous β_j_*(d), [P_j(d₀, h_S)^c ∩ I_j(d₀, δ_L, δ_U)] is an empty set and h₀ = 0 < h₁ < ··· < h_S is a sequence of bandwidths such that $δ_{L} = O (\sqrt{log (1 + N_{D}) / n}) = o (1), δ_{U} = \sqrt{C_{n} / n} M_{n} = o (1)$ , in which lim_n_→∞ M_n = ∞, $h_{S} = O (\sqrt{log (1 + N_{D}) / n})$ and $N_{D} h_{S}^{3} K_{s t} (M_{n}^{2} / (3 S_{y})) = o (\sqrt{log (1 + N_{D}) / n})$ .

Remark 5

Assumption (C2) is needed to invoke Hoeffding inequality (Buhlmann and van de Geer, 2011; van der Vaar and Wellner, 1996) in order to establish the uniform bound for β̂(d₀; h_s). In practice, since most neuroimaging data are often bounded, the sub-Gaussian assumption is reasonable. The bound assumption on ||x||_∞ in Assumption (C3) is not essential and can be removed if we put a restriction on the tail of the distribution x. Moreover, with some additional efforts, all results are valid even for the case with fixed design predictors. Assumption (C4) avoids smoothness conditions on the sample path η(d), which are commonly assumed in the literature (Hall et al., 2006). The assumption on the moment of ${sup}_{d \in D} {| | η (d) | |}_{2}^{2 r_{2}}$ is similar to the conditions used in (Li and Hsing, 2010). Assumption (C5) on the stochastic grid points is not essential and can be modified to accommodate the case for fixed grid points with some additional complexities.

Remark 6

The bounded support restriction on K_loc(·) in Assumption (C6) can be weaken to a restriction on the tails of K_loc(·). Assumption (C9) requires smoothness and shape conditions on the image of β_j_*(d) for each j. For piecewise constant β_j_*(d), assumption (C10) requires conditions on the amount of changes at jumping points relative to n, N_D, and h_S. If K_st(t) has a compact support, then K_st(u⁽^j⁾²/C) = 0 for relatively large u⁽^j⁾². In this case, h_S can be very large. However, for piecewise continuous β_j_*(d), assumption (C11) requires the convergence rate of h_S and the amount of changes at jumping points.

Supplementary Material

NIHMS588529-supplement-Supplementary_Material.pdf^{(5.2MB, pdf)}

Contributor Information

Hongtu Zhu, Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, hapel Hill, NC 27599, USA.

Jianqing Fan, Department of Oper Res and Fin. Eng, Princeton University, Princeton, NJ 08540.

Linglong Kong, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB Canada T6G 2G1.

References

Besag JE. On the statistical analysis of dirty pictures (with discussion) Journal of the Royal Statistical Society, Ser B. 1986;48:259–302. [Google Scholar]
Buhlmann P, van de Geer S. Statistics for High-Dimensional Data: Methods, Theory and Applications. New York, N.Y: Springer; 2011. [Google Scholar]
Bush G. Cingulate, frontal and parietal cortical dysfunction in attention-deficit/hyperactivity disorder. Bio Psychiatry. 2011;69:1160–1167. doi: 10.1016/j.biopsych.2011.01.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chan TF, Shen J. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Philadelphia: SIAM; 2005. [Google Scholar]
Chumbley J, Worsley KJ, Flandin G, Friston KJ. False discovery rate revisited: FDR and topological inference using Gaussian random fields. Neuroimage. 2009;44:62–70. doi: 10.1016/j.neuroimage.2008.05.021. [DOI] [PubMed] [Google Scholar]
Cressie N, Wikle C. Statistics for Spatio-Temporal Data. Hoboken, NJ: Wiley; 2011. [Google Scholar]
Davatzikos C, Genc A, Xu D, Resnick S. Voxel-based morphome-try using the RAVENS maps: methods and validation using simulated longitudinal atrophy. NeuroImage. 2001;14:1361–1369. doi: 10.1006/nimg.2001.0937. [DOI] [PubMed] [Google Scholar]
Fan J. Local linear regression smoothers and their minimax efficiencies. Ann Statist. 1993;21:196–216. [Google Scholar]
Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Chapman and Hall; 1996. [Google Scholar]
Fan J, Zhang J. Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;62:303–322. [Google Scholar]
Fan J, Zhang W. Statistical estimation in varying coefficient models. The Annals of Statistics. 1999;27:1491–1518. [Google Scholar]
Fan J, Zhang W. Statistical methods with varying coefficient models. Stat Interface. 2008;1:179–195. doi: 10.4310/sii.2008.v1.n1.a15. [DOI] [PMC free article] [PubMed] [Google Scholar]
Friston KJ. Statistical Parametric Mapping: the Analysis of Functional Brain Images. London: Academic Press; 2007. [Google Scholar]
Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Ann Statist. 2006;34:1493–1517. [Google Scholar]
Kabani N, MacDonald D, Holmes C, Evans A. A 3D atlas of the human brain. Neuroimage. 1998;7:S717. [Google Scholar]
Khodadadi A, Asgharian M. Tech rep. McGill University; 2008. Change point problem and regression: an annotated bibliography. http://biostats.bepress.com/cobra/art44. [Google Scholar]
Lazar NA. The Statistical Analysis of Functional MRI Data. New York: Springer; 2008. [Google Scholar]
Li SZ. Markov Random Field Modeling in Image Analysis. New York, NY: Springer; 2009. [Google Scholar]
Li Y, Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. The Annals of Statistics. 2010;38:3321–3351. [Google Scholar]
Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG. Multiscale adaptive regression models for neuroimaging data. Journal of the Royal Statistical Society: Series B. 2011;73:559–578. doi: 10.1111/j.1467-9868.2010.00767.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu S. Matrix results on the Khatri-Rao and Tracy-Singh products. Linear Algebra Appl. 1999;289:267–277. [Google Scholar]
Mori S. Principles, methods, and applications of diffusion tensor imaging. In: Toga AW, Mazziotta JC, editors. Brain Mapping: The Methods. 2. Elsevier Science; 2002. pp. 379–397. [Google Scholar]
Polanczyk G, de Lima M, Horta B, Biederman J, Rohde L. The worldwide prevalence of ADHD: a systematic review and metaregression analysis. The American Journal of Psychiatry. 2007;164:942–948. doi: 10.1176/ajp.2007.164.6.942. [DOI] [PubMed] [Google Scholar]
Polzehl J, Spokoiny VG. Adaptive weights smoothing with applications to image restoration. J R Statist Soc B. 2000;62:335–354. [Google Scholar]
Polzehl J, Spokoiny VG. Propagation-separation approach for local likelihood estimation. Probab Theory Relat Fields. 2006;135:335–362. [Google Scholar]
Polzehl J, Voss HU, Tabelow K. Structural adaptive segmentation for statistical parametric mapping. NeuroImage. 2010;52:515–523. doi: 10.1016/j.neuroimage.2010.04.241. [DOI] [PubMed] [Google Scholar]
Qiu P. Image Processing and Jump Regression Analysis. New York: John Wileym & Sons; 2005. [Google Scholar]
Qiu P. Jump surface estimation, edge detection, and image restoration. Journal of American Statistical Association. 2007;102:745–756. [Google Scholar]
Ramsay JO, Silverman BW. Functional Data Analysis. New York: Springer-Verlag; 2005. [Google Scholar]
Scott D. Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John Wiley; 1992. [Google Scholar]
Spence J, Carmack P, Gunst R, Schucany W, Woodward W, Haley R. Accounting for spatial dependence in the analysis of SPECT brain imaging data. Journal of the American Statistical Association. 2007;102:464–473. [Google Scholar]
Tabelow K, Polzehl J, Spokoiny V, Voss HU. Diffusion tensor imaging: structural adaptive smoothing. NeuroImage. 2008a;39:1763–1773. doi: 10.1016/j.neuroimage.2007.10.024. [DOI] [PubMed] [Google Scholar]
Tabelow K, Polzehl J, Ulug AM, Dyke JP, Watts R, Heier LA, Voss HU. Accurate localization of brain activity in presurgical fMRI by structure adaptive smoothing. IEEE Trans Med Imaging. 2008b;27:531–537. doi: 10.1109/TMI.2007.908684. [DOI] [PubMed] [Google Scholar]
Thompson P, Toga A. A framework for computational anatomy. Computing and Visualization in Science. 2002;5:13–34. [Google Scholar]
van der Vaar AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag Inc; 1996. [Google Scholar]
Wand MP, Jones MC. Kernel Smoothing. London: Chapman and Hall; 1995. [Google Scholar]
Worsley KJ, Taylor JE, Tomaiuolo F, Lerch J. Unified univariate and multivariate random field theory. NeuroImage. 2004;23:189–195. doi: 10.1016/j.neuroimage.2004.07.026. [DOI] [PubMed] [Google Scholar]
Wu CO, Chiang CT, Hoover DR. Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. J Amer Statist Assoc. 1998;93:1388–1402. [Google Scholar]
Yue Y, Loh JM, Lindquist MA. Adaptive spatial smoothing of fMRI images. Statistics and its Interface. 2010;3:3–14. [Google Scholar]
Zipunnikov V, Caffo B, Yousem DM, Davatzikos C, Schwartz BS, Crainiceanu C. Functional principal component model for high-dimensional brain imaging. NeuroImage. 2011;58:772–784. doi: 10.1016/j.neuroimage.2011.05.085. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material

NIHMS588529-supplement-Supplementary_Material.pdf^{(5.2MB, pdf)}

[R1] Besag JE. On the statistical analysis of dirty pictures (with discussion) Journal of the Royal Statistical Society, Ser B. 1986;48:259–302. [Google Scholar]

[R2] Buhlmann P, van de Geer S. Statistics for High-Dimensional Data: Methods, Theory and Applications. New York, N.Y: Springer; 2011. [Google Scholar]

[R3] Bush G. Cingulate, frontal and parietal cortical dysfunction in attention-deficit/hyperactivity disorder. Bio Psychiatry. 2011;69:1160–1167. doi: 10.1016/j.biopsych.2011.01.022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Chan TF, Shen J. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Philadelphia: SIAM; 2005. [Google Scholar]

[R5] Chumbley J, Worsley KJ, Flandin G, Friston KJ. False discovery rate revisited: FDR and topological inference using Gaussian random fields. Neuroimage. 2009;44:62–70. doi: 10.1016/j.neuroimage.2008.05.021. [DOI] [PubMed] [Google Scholar]

[R6] Cressie N, Wikle C. Statistics for Spatio-Temporal Data. Hoboken, NJ: Wiley; 2011. [Google Scholar]

[R7] Davatzikos C, Genc A, Xu D, Resnick S. Voxel-based morphome-try using the RAVENS maps: methods and validation using simulated longitudinal atrophy. NeuroImage. 2001;14:1361–1369. doi: 10.1006/nimg.2001.0937. [DOI] [PubMed] [Google Scholar]

[R8] Fan J. Local linear regression smoothers and their minimax efficiencies. Ann Statist. 1993;21:196–216. [Google Scholar]

[R9] Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. London: Chapman and Hall; 1996. [Google Scholar]

[R10] Fan J, Zhang J. Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;62:303–322. [Google Scholar]

[R11] Fan J, Zhang W. Statistical estimation in varying coefficient models. The Annals of Statistics. 1999;27:1491–1518. [Google Scholar]

[R12] Fan J, Zhang W. Statistical methods with varying coefficient models. Stat Interface. 2008;1:179–195. doi: 10.4310/sii.2008.v1.n1.a15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] Friston KJ. Statistical Parametric Mapping: the Analysis of Functional Brain Images. London: Academic Press; 2007. [Google Scholar]

[R14] Hall P, Müller HG, Wang JL. Properties of principal component methods for functional and longitudinal data analysis. Ann Statist. 2006;34:1493–1517. [Google Scholar]

[R15] Kabani N, MacDonald D, Holmes C, Evans A. A 3D atlas of the human brain. Neuroimage. 1998;7:S717. [Google Scholar]

[R16] Khodadadi A, Asgharian M. Tech rep. McGill University; 2008. Change point problem and regression: an annotated bibliography. http://biostats.bepress.com/cobra/art44. [Google Scholar]

[R17] Lazar NA. The Statistical Analysis of Functional MRI Data. New York: Springer; 2008. [Google Scholar]

[R18] Li SZ. Markov Random Field Modeling in Image Analysis. New York, NY: Springer; 2009. [Google Scholar]

[R19] Li Y, Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. The Annals of Statistics. 2010;38:3321–3351. [Google Scholar]

[R20] Li Y, Zhu H, Shen D, Lin W, Gilmore JH, Ibrahim JG. Multiscale adaptive regression models for neuroimaging data. Journal of the Royal Statistical Society: Series B. 2011;73:559–578. doi: 10.1111/j.1467-9868.2010.00767.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Liu S. Matrix results on the Khatri-Rao and Tracy-Singh products. Linear Algebra Appl. 1999;289:267–277. [Google Scholar]

[R22] Mori S. Principles, methods, and applications of diffusion tensor imaging. In: Toga AW, Mazziotta JC, editors. Brain Mapping: The Methods. 2. Elsevier Science; 2002. pp. 379–397. [Google Scholar]

[R23] Polanczyk G, de Lima M, Horta B, Biederman J, Rohde L. The worldwide prevalence of ADHD: a systematic review and metaregression analysis. The American Journal of Psychiatry. 2007;164:942–948. doi: 10.1176/ajp.2007.164.6.942. [DOI] [PubMed] [Google Scholar]

[R24] Polzehl J, Spokoiny VG. Adaptive weights smoothing with applications to image restoration. J R Statist Soc B. 2000;62:335–354. [Google Scholar]

[R25] Polzehl J, Spokoiny VG. Propagation-separation approach for local likelihood estimation. Probab Theory Relat Fields. 2006;135:335–362. [Google Scholar]

[R26] Polzehl J, Voss HU, Tabelow K. Structural adaptive segmentation for statistical parametric mapping. NeuroImage. 2010;52:515–523. doi: 10.1016/j.neuroimage.2010.04.241. [DOI] [PubMed] [Google Scholar]

[R27] Qiu P. Image Processing and Jump Regression Analysis. New York: John Wileym & Sons; 2005. [Google Scholar]

[R28] Qiu P. Jump surface estimation, edge detection, and image restoration. Journal of American Statistical Association. 2007;102:745–756. [Google Scholar]

[R29] Ramsay JO, Silverman BW. Functional Data Analysis. New York: Springer-Verlag; 2005. [Google Scholar]

[R30] Scott D. Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John Wiley; 1992. [Google Scholar]

[R31] Spence J, Carmack P, Gunst R, Schucany W, Woodward W, Haley R. Accounting for spatial dependence in the analysis of SPECT brain imaging data. Journal of the American Statistical Association. 2007;102:464–473. [Google Scholar]

[R32] Tabelow K, Polzehl J, Spokoiny V, Voss HU. Diffusion tensor imaging: structural adaptive smoothing. NeuroImage. 2008a;39:1763–1773. doi: 10.1016/j.neuroimage.2007.10.024. [DOI] [PubMed] [Google Scholar]

[R33] Tabelow K, Polzehl J, Ulug AM, Dyke JP, Watts R, Heier LA, Voss HU. Accurate localization of brain activity in presurgical fMRI by structure adaptive smoothing. IEEE Trans Med Imaging. 2008b;27:531–537. doi: 10.1109/TMI.2007.908684. [DOI] [PubMed] [Google Scholar]

[R34] Thompson P, Toga A. A framework for computational anatomy. Computing and Visualization in Science. 2002;5:13–34. [Google Scholar]

[R35] van der Vaar AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag Inc; 1996. [Google Scholar]

[R36] Wand MP, Jones MC. Kernel Smoothing. London: Chapman and Hall; 1995. [Google Scholar]

[R37] Worsley KJ, Taylor JE, Tomaiuolo F, Lerch J. Unified univariate and multivariate random field theory. NeuroImage. 2004;23:189–195. doi: 10.1016/j.neuroimage.2004.07.026. [DOI] [PubMed] [Google Scholar]

[R38] Wu CO, Chiang CT, Hoover DR. Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. J Amer Statist Assoc. 1998;93:1388–1402. [Google Scholar]

[R39] Yue Y, Loh JM, Lindquist MA. Adaptive spatial smoothing of fMRI images. Statistics and its Interface. 2010;3:3–14. [Google Scholar]

[R40] Zipunnikov V, Caffo B, Yousem DM, Davatzikos C, Schwartz BS, Crainiceanu C. Functional principal component model for high-dimensional brain imaging. NeuroImage. 2011;58:772–784. doi: 10.1016/j.neuroimage.2011.05.085. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities

Hongtu Zhu

Jianqing Fan

Linglong Kong

Abstract

1 Introduction

2 Spatial Varying Coefficient Model with Jumping Discontinuities

2.1 Model Setup

Figure 1.

2.2 Three-stage Estimation Procedure

Figure 2.

2.2.1 Stage (I)

2.2.2 Stage (II)

2.2.3 Stage (III)

2.3 Theoretical Results

Theorem 1

Remark 1

Theorem 2

Remark 2

Theorem 3

Remark 3

Theorem 4

Remark 4

3 Simulation Studies

Figure 3.

Figure 4.

Table 1.

Table 2.

4 Real Data Analysis

Figure 5.

Figure 6.

Figure 7.

5 Discussion

6 Technical Conditions

6.1 Assumptions

Remark 5

Remark 6

Supplementary Material

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases