Regression Models on Riemannian Symmetric Spaces

Emil Cornea; Hongtu Zhu; Peter Kim; Joseph G Ibrahim

doi:10.1111/rssb.12169

Author manuscript; available in PMC: 2017 Sep 1.

Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2016 Mar 20;79(2):463–482. doi: 10.1111/rssb.12169

Emil Cornea, Hongtu Zhu, Peter Kim, Joseph G Ibrahim, for the Alzheimer's Disease Neuroimaging Initiative

PMCID: PMC5433528 NIHMSID: NIHMS778184 PMID: 28529445

Summary

The aim of this paper is to develop a general regression framework for the analysis of manifold-valued response in a Riemannian symmetric space (RSS) and its association with multiple covariates of interest, such as age or gender, in Euclidean space. Such RSS-valued data arises frequently in medical imaging, surface modeling, and computer vision, among many others. We develop an intrinsic regression model solely based on an intrinsic conditional moment assumption, avoiding specifying any parametric distribution in RSS. We propose various link functions to map from the Euclidean space of multiple covariates to the RSS of responses. We develop a two-stage procedure to calculate the parameter estimates and determine their asymptotic distributions. We construct the Wald and geodesic test statistics to test hypotheses of unknown parameters. We systematically investigate the geometric invariant property of these estimates and test statistics. Simulation studies and a real data analysis are used to evaluate the finite sample properties of our methods.

Keywords: Generalized method of moment, Group action, Geodesic, Lie group, Link function, Regression, RS space

1. Introduction

Manifold-valued responses in curved spaces frequently arise in many disciplines including medical imaging, computational biology, and computer vision, among many others. For instance, in medical and molecular imaging, it is interesting to delineate the changes in the shape and anatomy of a molecule. See Figure 1 for four different examples of manifold-valued data. Regression analysis is a fundamental statistical tool for relating a response variable to covariates, such as age. In particular, when both the response and the covariate(s) are in Euclidean space, the classical linear regression model and its variants have been widely used in various fields (McCullagh and Nelder, 1989). However, when the response is in a Riemannian symmetric space (RSS) and the covariates are in Euclidean space, developing regression models for this type of data raises both computational and theoretical challenges. The aim of this paper is to develop a general regression framework to address these challenges.

Fig. 1 — Examples of manifold-valued data: (a) diffusion tensors along white matter fiber bundles and their ellipsoid representations; (b) principal direction map of a selected slice for a randomly selected subject and the directional representation of some randomly selected principal directions on S²; (c) median representations and median atoms of a hippocampus from a randomly selected subject; and (d) an extracted contour and landmarks along the contour of the midsagittal section of the corpus callosum (CC) from a randomly selected subject.

Little has been done on the regression analyses of manifold-valued response data. The existing statistical methods for general manifold-valued data are primarily developed to characterize the population ‘mean’ and ‘variation’ across groups (Bhattacharya and Patrangenaru, 2003, 2005; Fletcher et al., 2004; Dryden and Mardia, 1998; Huckemann et al., 2010). In contrast, even for the ‘simplest’ directional data, there is a sparse literature on regression modeling of a single directional response and multiple covariates (Mardia and Jupp, 2000). In addition, these regression models of directional data are primarily based on a specific parametric distribution, such as the von Mises-Fisher distribution (Mardia and Jupp, 2000; Kent, 1982). However, it can be very challenging to assume useful parametric distributions for general manifold-valued data, and thus it is difficult to generalize these regression models of directional data to general manifold-valued data except for some specific manifolds (Shi et al., 2012; Fletcher, 2013; Kim et al., 2014; Shi et al., 2009; Zhu et al., 2009). There is also a great interest in developing nonparametric regression models for manifold-valued response data and multiple covariates (Bhattacharya and Dunson, 2010, 2012; Samir et al., 2012; Su et al., 2012; Muralidharan and Fletcher, 2012; Machado and Leite, 2006; Machado et al., 2010; Yuan et al., 2012).

An intriguing question is whether there is a general regression framework for manifold-valued response in a RSS and covariates in a multidimensional Euclidean space. The aim of this paper is to give an affirmative answer to such a question. The theoretical development is challenging but of great interest for carrying out statistical inferences on regression coefficients. We make five major contributions in this paper as follows: (i) We propose an intrinsic regression model solely based on an intrinsic conditional moment for the response in a RSS, thus avoiding specifying any parametric distributions in a general RSS - the model can handle multiple covariates in Euclidean space. (ii) We develop several ‘efficient’ estimation methods for estimating the regression coefficients in this intrinsic model. (iii) We develop several test statistics for testing linear hypotheses of the regression coefficients. (iv) We develop a general asymptotic framework for the estimates of the regression coefficients and test statistics. (v) We systematically investigate the geometrical properties (e.g., chart invariance) of these parameter estimates and test statistics.

The paper is organized as follows. In Section 2, we review the basic notion and concepts of Riemannian geometry. In Section 3, we propose the intrinsic regression models and propose various link functions for several specific RSS’s. In Section 4, we develop estimation and test procedures for the intrinsic regression models. In Section 5, we carry out a detailed data analysis on the shape of Corpus Callosum (CC) contours obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Finally, we conclude with some discussions in Section 6. Technical conditions, simulation studies, theoretical examples, and proofs are deferred to the Supplementary document. Our code and data are available from http://www.bios.unc.edu/research/bias.

2. Differential Geometry Preliminaries

We briefly review some basic facts about the theory of Riemannian geometry and present more technical details in the Supplementary Report. The reader can refer to (Helgason, 1978; Spivak, 1979; Lang, 1999) for more details.

Let ℳ be a smooth manifold and d_ℳ be its dimension. A tangent vector of ℳ at p ∈ ℳ is defined as the derivative of a smooth curve γ(t) with respect to t evaluated at t = 0, denoted as γ̇(0), where γ(0) = p. The tangent space of ℳ at p is denoted as T_pℳ and is the set of all tangent vectors at p.

A Riemannian manifold (ℳ, m) is a smooth manifold together with a family of inner products, m = {m_p}, on the tangent spaces T_pℳ that vary smoothly with p ∈ ℳ; m is called a Riemannian metric. This metric induces a so-called geodesic distance dist_ℳ on ℳ. The geodesics are, by definition, the locally distance-minimizing paths. If the metric space (ℳ, dist_ℳ) is complete, the exponential map at p is defined on the tangent space T_pℳ by $\mathrm{Exp}_{p}^{\mathcal{M}}(V) = γ(1; p, V)$, where t ↦ γ(t; p, V) is the geodesic with γ(0; p, V) = p and γ̇(0; p, V) = V. $\mathrm{Exp}_{p}^{\mathcal{M}}$ is well-defined near 0 and is a diffeomorphism from an open neighborhood 𝒱 of the origin in T_pℳ onto an open set 𝒰 ⊂ ℳ, where 𝒱 is such that tV ∈ 𝒱 for 0 ≤ t ≤ 1 and V ∈ 𝒱. The inverse map is the logarithmic map at p, denoted by $\mathrm{Log}_{p}^{\mathcal{M}}$. Then, for q ∈ 𝒰, $\mathrm{dist}_{\mathcal{M}}(p, q) = ‖\mathrm{Log}_{p}^{\mathcal{M}}(q)‖_{p}$. The radius of injectivity of ℳ at p, denoted by ρ^*(ℳ, p), is the largest r > 0 such that $\mathrm{Exp}_{p}^{\mathcal{M}}$ is a diffeomorphism from the open ball B_{m_p}(0, r) ⊂ T_pℳ onto an open set in ℳ near p. Any basis in the tangent space T_pℳ induces an isomorphism from T_pℳ to R^{d_ℳ}, and then the logarithmic map Log_p provides a local chart near p. If T_pℳ is endowed with an orthonormal basis, such a chart is called a normal chart and the coordinates are called normal coordinates.

A Lie group G is a group together with a smooth manifold structure such that the operations of multiplication and inversion are smooth maps. Common geometric transformations of Euclidean spaces that form Lie groups include rotations, translations, dilations, and affine transformations on R^d. In general, Lie groups can be used to describe transformations of smooth manifolds.

An RSS is a connected Riemannian manifold ℳ with the property that at each point, the mapping that reverses geodesics through that point is an isometry. Examples of RSS’s include Euclidean spaces, R^k, spheres, S^k, projective spaces, PR^k, and hyperbolic spaces, H^k, each with their standard Riemannian metrics. Symmetric spaces arise naturally from Lie group actions on manifolds, see Helgason (1978). Given a smooth manifold ℳ and a Lie group G, a smooth group action of G on ℳ is a smooth mapping G × ℳ → ℳ, (a, p) ↦ a · p such that e · p = p and (aa′) · p = a · (a′ · p) for all a, a′ ∈ G and all p ∈ ℳ. The group action should be interpreted as a group of transformations of the manifold ℳ, namely, {L_a}_{a∈G}, where L_a(p) = a · p for p ∈ ℳ. Each L_a is a smooth transformation on ℳ and its inverse is L_{a⁻¹}. The orbit of a point p ∈ ℳ is defined as G(p) = {a · p | a ∈ G}. The orbits form a partition of ℳ. If ℳ consists of a single orbit, the group action is transitive or G acts transitively on ℳ, and we call ℳ a homogeneous space. The isotropy subgroup of a point p ∈ ℳ is defined as G_p = {a ∈ G | a · p = p}. When G is a connected group of isometries of the RSS ℳ, ℳ can always be viewed as a homogeneous space, ℳ ≅ G/G_p, and the isotropy subgroup G_p is compact.

From now on, we will assume that the manifold ℳ is an RSS and ℳ = G/G_p with G being a Lie group of isometries acting transitively on ℳ. Geodesics on ℳ are computed through the action of G on ℳ. Due to the transitive action of the group G of isometries on ℳ, it suffices to consider only the geodesic starting at the base point p. Geodesics on ℳ starting from p are the images of the action of a 1-parameter subgroup of G acting on the base point p. That is, for any geodesic γ on ℳ, γ(·) : R → ℳ, starting from p, there exists a 1-parameter subgroup c(·) : R → G such that γ(t) = c(t) · p for all t ∈ R.

3. Intrinsic Regression Model

Let (ℳ, m) be a (C^∞) RSS of dimension d_ℳ and geodesically complete with an inner product m_p and let G be a Lie group of isometries acting smoothly and transitively on ℳ with the identity element e.

3.1. Formulation

Consider n independent observations (y₁, x₁), …, (y_n, x_n), where y_i is the ℳ-valued response variable and x_i = (x_i₁, ···, x_{id_x})^T is a d_x × 1 vector of multiple covariates. Our objective is to introduce an intrinsic regression model for RSS responses and multiple covariates of interest from n subjects.

The specification of the intrinsic regression model involves three key steps including (i) a link function mapping from the space of covariates to ℳ, (ii) the definition of a residual, and (iii) the action of transporting all residuals to a common space. First, we explicitly formalize the link function. From now on, all covariates have been centered to have mean zero. We consider a single-center link function given by

μ (x, q, β) : R^{d_{x}} \times M \times R^{d_{β}} \to M,

(1)

where μ(x, q, β) is a known link function, q ∈ ℳ can be regarded as the intercept or center, and β = (β₁, …, β_{d_β})^T is a d_β ×1 vector of regression coefficients. Moreover, it is assumed that μ(x, q, β) satisfies a single-center property as follows:

μ (0, q, β) = μ (x, q, 0) = q.

(2)

When the regression coefficient vector β equals 0, the link function is independent of the covariates and thus, it reduces to the single center (or ‘mean’) q ∈ ℳ. When all the covariates are equal to zero, the link function is independent of the regression coefficients and reduces to the center q ∈ ℳ. An example of the single-center link function is the geodesic link function in (Kim et al., 2014; Fletcher, 2013), which is given by

μ(x_{i}, q, β) = \mathrm{Exp}_{q}^{\mathcal{M}} (\sum_{k=1}^{d_{x}} x_{ik} V_{k}),

(3)

where V_k’s are tangent vectors in T_qℳ and β includes all unknown parameters associated with the tangent vectors.

More generally, we will consider a multicenter link function to account for the presence of multiple discrete covariates and even a general link function defined as μ(x, θ) : R^{d_x} × Θ → ℳ, where θ is a vector of unknown parameters in a parameter space Θ. For the multicenter link function, θ contains all unknown intercepts, denoted as q(x_D), corresponding to each discrete covariate class and all regression parameters β corresponding to continuous covariates and their potential interactions with the discrete variables. However, the details on these link functions are presented in the Supplementary document, and here, for notational simplicity, we focus on (1) from now on.

Second, we introduce a definition of “residual” to ensure that μ(x_i, q, β) is the proper “conditional mean” of y_i given x_i, which is the key concept of many regression models (McCullagh and Nelder, 1989). For instance, in the classical linear regression model, the response can be written as the sum of the regression function and a residual term, and the regression function is the conditional mean of the response only when the conditional mean of the residual is equal to zero. Given the points y_i and μ(x_i, q, β) on an RSS ℳ, we need to define the residual as “a difference” between y_i and μ(x_i, q, β). Assume that y_i and μ(x_i, q, β) are “close enough” to each other in the sense that there is an open ball B(0, ρ) ⊂ T_{μ(x_i,q,β)} ℳ such that for all i = 1, …, n,

y_{i} \in \mathrm{Exp}_{μ(x_{i}, q, β)} (B(0, ρ)) \quad or \quad \mathrm{Log}_{μ(x_{i}, q, β)} (y_{i}) \in B(0, ρ).

(4)

However, according to a result in Le and Barden (2014), Log_{μ(x_i,q,β)}(y_i) is well defined under some very mild conditions, which require that ∫_ℳ dist_ℳ (p, y_i)²dp(y_i) be finite and achieve a local minimum at μ(x_i, q, β), where p(y_i) is any finite measure of y_i on ℳ. Thus, Log_{μ(x_i,q,β)}(y_i) is a good candidate to play the role of a ‘residual’. These residuals, however, lie in different tangent spaces to ℳ, so it is difficult to carry out a multivariate analysis of these residuals.

Third, since ℳ is a RSS, this enables us to “transport” all the residuals, separately, to a common space, say T_pℳ, by exploiting the fact that the parallel transport along the geodesics can be expressed in terms of the action of G on ℳ. Indeed, since ℳ is a symmetric space, the base point p and the point μ(x_i, q, β) can be joined in ℳ by a geodesic, which can be seen as the action of a one-parameter subgroup c(t; x_i, q, β) of G such that c(1; x_i, q, β) · p = μ(x_i, q, β).

We define the rotated residual ℰ (y_i, x_i; q, β) of y_i ∈ ℳ with respect to μ(x_i, q, β) as the parallel transport of the actual residual, Log_{μ(x_i,q,β)}(y_i), along the geodesic from the conditional mean, μ(x_i, q, β), to the base point p. That is,

\mathcal{E}(y_{i}, x_{i}; q, β) = \mathcal{E}_{i}(q, β) := \mathrm{Log}_{p} (c(1; x_{i}, q, β)^{-1} \cdot y_{i}) \in T_{p} \mathcal{M}

(5)

for i = 1, …, n, where T_pℳ is identified with R^{d_ℳ}. The intrinsic regression model on ℳ is defined by

E[\mathcal{E}(y_{i}, x_{i}; q_{*}, β_{*}) ∣ x_{i}] = 0,

(6)

where (q_*, β_*) denotes the true value of (q, β) and the expectation is taken with respect to the conditional distribution of y_i given x_i. Model (6) is equivalent to E[Log_{μ(x_i, q_*, β_*)} (y_i)|x_i] = 0 for i = 1, …, n, since the tangent map of the action of c(1;x_i,q_*,β_*)⁻¹ on ℳ is an isomorphism of linear spaces (invariant under the metric m) between the fibers of the tangent bundle T ℳ. This model does not assume any parametric distribution for y_i given x_i, and thus it allows for a large class of distributions. The model is essentially semi-parametric, since the joint distribution of (y, x) is not restricted except by the zero conditional moment requirement in (6).

3.2. A Theoretical Example: The Unit Sphere S^k

We investigate the intrinsic regression model for S^k-valued responses and include several other examples in the supplementary document. We review some basic facts about the geometric structure of ℳ = S^k = {x ∈ R^{k+1} : ||x||₂ = 1} (Shi et al., 2012; Mardia and Jupp, 2000; Healy and Kim, 1996; Huckemann et al., 2010). For q ∈ S^k, T_qS^k is given by T_qS^k = {v ∈ R^{k+1} : v^T q = 0}. The canonical Riemannian metric on S^k is that induced by the canonical inner product on R^{k+1}. Under this metric, the geodesic distance between any two points q and q′ is equal to ψ_{q,q′} = arccos(q^T q′). If the points are not antipodal (i.e. q′ ≠ −q), then there is a unique geodesic path that joins them. Therefore, the radius of injectivity is ρ(S^k) = π. For v ∈ T_qS^k, the Riemannian exponential map is given by Exp_q(v) = cos(||v||)q + sin(||v||)v/||v||. (Here, sin(0)/0 = 1.) If q and q′ are not antipodal, the Riemannian logarithmic map is given by Log_q(q′) = arccos(q^T q′)v/||v||, where v = q′ − (q^T q′)q ≠ 0.
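The exponential and logarithmic maps on S^k can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function names `sphere_exp`, `sphere_log`, and `sphere_dist` and the tolerance 1e-12 are our own choices and do not come from the authors' code.

```python
import numpy as np

def sphere_exp(q, v):
    """Exp_q(v) on the unit sphere, for v tangent at q (v @ q == 0)."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                      # convention sin(0)/0 = 1, so Exp_q(0) = q
        return np.array(q, dtype=float)
    return np.cos(norm_v) * q + np.sin(norm_v) * v / norm_v

def sphere_log(q, q_prime):
    """Log_q(q') on the unit sphere, for non-antipodal q and q'."""
    inner = float(np.clip(q @ q_prime, -1.0, 1.0))   # guard arccos against round-off
    if inner < -1.0 + 1e-12:
        raise ValueError("Log_q(q') is not unique for antipodal points")
    v = q_prime - inner * q                 # part of q' orthogonal to q
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                      # q' coincides with q
        return np.zeros_like(q, dtype=float)
    return np.arccos(inner) * v / norm_v

def sphere_dist(q, q_prime):
    """Geodesic distance psi_{q,q'} = arccos(q^T q')."""
    return float(np.arccos(np.clip(q @ q_prime, -1.0, 1.0)))
```

For a tangent vector v with ||v|| < π, `sphere_log(q, sphere_exp(q, v))` should recover v up to round-off, and `sphere_dist` should return ||v||.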

The special orthogonal group G = SO(k + 1) is a group of isometries on S^k and acts transitively and simply on S^k via the left matrix multiplication. Specifically, the rotation matrix R_{q,q′}, which rotates q to q′ ∈ S^k, is given by

R_{q, q'} = I_{k+1} + \sin(ψ_{q, q'}) \{q' \tilde{q}^{T} - \tilde{q} q'^{T}\} + \{\cos(ψ_{q, q'}) - 1\} \{q' q'^{T} + \tilde{q} \tilde{q}^{T}\},

where $\tilde{q} = \{q - (q^{T} q') q'\} / \sqrt{1 - (q^{T} q')^{2}}$. Thus, q′ = R_{q,q′}q and R_{q,q′}v ∈ T_{q′}S^k, for any v ∈ T_qS^k. Moreover, (−π, π) ∋ t ↦ c_{q′,q}(t) · q′ is the unique geodesic curve in S^k joining q′ with q, where c_{q′,q}(t) takes the form

c_{q', q}(t) = I_{k+1} + \sin(t) \{\tilde{q} q'^{T} - q' \tilde{q}^{T}\} + \{\cos(t) - 1\} \{q' q'^{T} + \tilde{q} \tilde{q}^{T}\}.
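The two matrix formulas above can be checked numerically. The sketch below assumes NumPy and that q ≠ ±q′ (so that q̃ is defined); the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def q_tilde(q, q_prime):
    """Unit vector orthogonal to q' in the plane spanned by q and q' (needs q != +-q')."""
    inner = q @ q_prime
    return (q - inner * q_prime) / np.sqrt(1.0 - inner ** 2)

def rotation_q_to_qprime(q, q_prime):
    """R_{q,q'}: rotation in the (q, q') plane that sends q to q'."""
    n = q.shape[0]
    psi = np.arccos(np.clip(q @ q_prime, -1.0, 1.0))
    qt = q_tilde(q, q_prime)
    return (np.eye(n)
            + np.sin(psi) * (np.outer(q_prime, qt) - np.outer(qt, q_prime))
            + (np.cos(psi) - 1.0) * (np.outer(q_prime, q_prime) + np.outer(qt, qt)))

def one_param_subgroup(q_prime, q, t):
    """c_{q',q}(t): t -> c(t) q' traces the geodesic from q' towards q."""
    n = q.shape[0]
    qt = q_tilde(q, q_prime)
    return (np.eye(n)
            + np.sin(t) * (np.outer(qt, q_prime) - np.outer(q_prime, qt))
            + (np.cos(t) - 1.0) * (np.outer(q_prime, q_prime) + np.outer(qt, qt)))
```

With ψ = arccos(q^T q′), `rotation_q_to_qprime(q, q_prime) @ q` should return q′ and `one_param_subgroup(q_prime, q, psi) @ q_prime` should return q, both up to round-off.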

Suppose that we observe {(y_i, x_i) : i = 1, …, n}, where y_i ∈ S^k for all i. We introduce the three key components of our intrinsic regression model for S^k–valued responses. First, we consider several examples of the general link function μ(x_i, θ). Specifically, without loss of generality, we fix the “north pole” p = (0, …, 0, 1)^T ∈ R^{k+1} as a base point. Let e_j be the (k + 1) × 1 vector with a 1 at the j-th component and 0 elsewhere, for j = 1, …, k. Let q ∈ S^k be the ‘center’, and we consider two link functions as follows:

\begin{array}{l} μ(x_{i}, q, β) = \mathrm{Exp}_{q} (\sum_{j=1}^{k} f(x_{i}, β)_{j} \, c_{p, q}(\arccos(p^{T} q)) \, e_{j}), \\ μ(x_{i}, q, β) = c_{-p, q}(\arccos(-p^{T} q)) \, T_{st, -p}^{-1} ((f(x_{i}, β)^{T}, -1)^{T}), \end{array}

(7)

where T_{st,−p} is the stereographic projection mapping from S^k \ {p} onto the k-dimensional hyperplane R^k × {−1} and f(x, β) is a function mapping from R^{d_x} × R^{d_β} to R^k with f(0, ·) = f(·, 0) = 0. A simple example of f(·, ·) is f(x_i, β) = Bx_i, where B is a k × d_x matrix of regression coefficients and β includes all components of B.
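As an illustration of the first link function in (7) with f(x, β) = Bx, the following sketch builds the tangent vector Σ_j f_j e_j at the north pole p, moves it to T_qS^k with c_{p,q}(arccos(p^T q)), and applies Exp_q. It assumes NumPy and q ≠ −p; the names `transport_from_pole` and `geodesic_link` are hypothetical.

```python
import numpy as np

def sphere_exp(q, v):
    """Exp_q(v) on the unit sphere for a tangent vector v at q."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.array(q, dtype=float)
    return np.cos(norm_v) * q + np.sin(norm_v) * v / norm_v

def transport_from_pole(q):
    """c_{p,q}(arccos(p^T q)): rotation taking the north pole p to q (q != -p)."""
    n = q.shape[0]
    p = np.zeros(n)
    p[-1] = 1.0
    inner = float(np.clip(p @ q, -1.0, 1.0))
    if 1.0 - inner < 1e-12:                 # q equals p, nothing to rotate
        return np.eye(n)
    qt = (q - inner * p) / np.sqrt(1.0 - inner ** 2)
    t = np.arccos(inner)
    return (np.eye(n)
            + np.sin(t) * (np.outer(qt, p) - np.outer(p, qt))
            + (np.cos(t) - 1.0) * (np.outer(p, p) + np.outer(qt, qt)))

def geodesic_link(x, q, B):
    """First link in (7) with f(x, beta) = B x, where B has shape (k, d_x)."""
    f = B @ x                               # coefficients on e_1, ..., e_k
    v_at_p = np.concatenate([f, [0.0]])     # sum_j f_j e_j, a tangent vector at p
    v_at_q = transport_from_pole(q) @ v_at_p
    return sphere_exp(q, v_at_q)
```

Because f(0, ·) = f(·, 0) = 0 for this choice, the sketch reproduces the single-center property (2): both `geodesic_link(0, q, B)` and `geodesic_link(x, q, 0)` return q.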

Second, we define residuals for our intrinsic regression model. The residual in (4) requires that y_i is not antipodal to μ(x_i, θ). In this case, the residual for the i-th subject is given by Log_{μ(x_i,θ)}(y_i) = arccos(μ(x_i, θ)^T y_i)v_i/||v_i||, where v_i = y_i − (μ(x_i, θ)^T y_i)μ(x_i, θ). However, when y_i = −μ(x_i, θ) holds, there is an infinite number of geodesics connecting μ(x_i, θ) and y_i. In this case, Log_{μ(x_i, θ)}(y_i) is not uniquely defined, whereas their geodesic distance is unique. However, according to the results in Le and Barden (2014), Log_{μ(x_i,q,β)}(y_i) is well defined almost surely.

Third, we transport all the residuals, separately, to a common space, say T_pS^k. The rotated residual is given by

\mathcal{E}(y_{i}, x_{i}; θ) = \mathrm{Log}_{p} (c_{p, μ(x_{i}, θ)}(-\arccos(p^{T} μ(x_{i}, θ))) \cdot y_{i}).

(8)

Our intrinsic regression model is defined by the zero conditional mean assumption in (6) on the above rotated residuals (8). A graphic illustration of the stereographic link functions in (7), rotated residual, and parallel transport is given in Figure 2.
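The rotated residual in (8) can be sketched as follows: rotate y_i by c_{p,μ}(−arccos(p^T μ)), which sends μ back to the north pole p, and then apply Log_p. The code assumes NumPy, μ ≠ −p, and y not antipodal to μ; the function names are hypothetical.

```python
import numpy as np

def sphere_log(q, q_prime):
    """Log_q(q') on the unit sphere for non-antipodal q and q'."""
    inner = float(np.clip(q @ q_prime, -1.0, 1.0))
    v = q_prime - inner * q
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.zeros_like(q, dtype=float)
    return np.arccos(inner) * v / norm_v

def rotated_residual(y, mu):
    """E(y, x; theta) = Log_p(c_{p,mu}(-arccos(p^T mu)) y), a vector in T_p S^k."""
    n = y.shape[0]
    p = np.zeros(n)
    p[-1] = 1.0
    inner = float(np.clip(p @ mu, -1.0, 1.0))
    if 1.0 - inner < 1e-12:                 # mu equals p, no rotation needed
        rot = np.eye(n)
    else:
        mt = (mu - inner * p) / np.sqrt(1.0 - inner ** 2)
        t = -np.arccos(inner)               # negative angle: rotate mu back to p
        rot = (np.eye(n)
               + np.sin(t) * (np.outer(mt, p) - np.outer(p, mt))
               + (np.cos(t) - 1.0) * (np.outer(p, p) + np.outer(mt, mt)))
    return sphere_log(p, rot @ y)
```

Since the rotation is an isometry, the norm of the rotated residual equals the geodesic distance between y and μ, and its last coordinate is zero because it lies in T_pS^k.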

Fig. 2 — Graphical illustration of (a) stereographic projection and (b) rotated residual and parallel transport. In panel (a), N and O denote the north pole (0, 0, 1) and the origin (0, 0, 0), respectively. In panel (b), y, μ = μ(x, q, β), *ε_D* = Log_μ(y), ℰ(y, x; q, β), and p, respectively, represent an observation, the conditional mean, the residual, the rotated residual, and the base point.

Alternatively, we may consider some parametric spherical regression models for spherical responses. As an illustration, we consider the von Mises-Fisher (vMF) regression model. Specifically, it is assumed that y_i|x_i ~ vMF(μ(x_i, θ), κ) or, equivalently, R_{μ(x_i, θ),p}y_i|x_i ~ vMF(p, κ), for i = 1, …, n, where κ is a positive concentration parameter and is assumed to be known for simplicity. Calculating the maximum likelihood estimate of θ is equivalent to solving a score equation given by $\sum_{i = 1}^{n} y_{i}^{T} \partial_{θ} μ (x_{i}, θ) = 0$ , where ∂_θ = ∂/∂θ. Since ||μ(x_i, θ)|| = 1 and {∂_θμ(x_i, θ)}^T μ(x_i, θ) = 0 for all subcomponents of θ, we have

y_{i}^{T} \partial_{θ} μ(x_{i}, θ) = \sqrt{2} \, \frac{\sqrt{1 - (μ(x_{i}, θ)^{T} y_{i})^{2}}}{\arccos(μ(x_{i}, θ)^{T} y_{i})} \{R_{μ(x_{i}, θ), p} \, \partial_{θ} μ(x_{i}, θ)\}^{T} \mathcal{E}(y_{i}, x_{i}; θ),

(9)

which is a linear combination of the rotated residual.

4. Estimation and Test Procedures

4.1. Generalized Method of Moment Estimators

We consider the generalized method of moment estimator (GMM estimator) to estimate the unknown parameters in model (6) (Hansen, 1982; Newey, 1993; Korsholm, 1999). We may view the T_pℳ-valued function ℰ as a function with values in R^{d_ℳ}. Let h(x; q, β) be an s × d_ℳ matrix of functions of (x, q, β) with s ≥ d_ℳ + d_β and W_n be a random sequence of positive definite s × s weight matrices. It follows from (6) that

E\{h(x_{i}; q_{*}, β_{*}) \, E[\mathcal{E}(y_{i}, x_{i}; q_{*}, β_{*}) ∣ x_{i}]\} = 0.

(10)

We define 𝒬_n(q, β) = [ℙ_n(h(x; q, β)ℰ(y, x; q, β))]^T W_n [ℙ_n(h(x; q, β)ℰ(y, x; q, β))], where $ℙ_{n} f(y, x) = n^{-1} \sum_{i=1}^{n} f(y_{i}, x_{i})$ for a real-vector valued function f(y, x). The GMM estimator (q̂_G, β̂_G), or simply (q̂, β̂), of (q, β) associated with (h(·, ·, ·), W_n) is defined as

({\hat{q}}_{G}, {\hat{β}}_{G}) = \underset{(q, β) \in M \times R^{d_{β}}}{argmin} Q_{n} (q, β) .

(11)
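For fixed (q, β), the objective 𝒬_n is a quadratic form in the averaged moment vector ℙ_n(hℰ). A minimal sketch, assuming NumPy and that the residuals ℰ_i and the matrices h(x_i; q, β) have already been evaluated and stacked into arrays (the array layout and function names are our own assumptions):

```python
import numpy as np

def sample_moment(residuals, instruments):
    """P_n(h E): average of h_i @ E_i over observations.

    residuals:   array of shape (n, d_M), rows are the rotated residuals E_i
    instruments: array of shape (n, s, d_M), slices are h(x_i; q, beta)
    """
    n = residuals.shape[0]
    return np.einsum("isd,id->s", instruments, residuals) / n

def gmm_objective(residuals, instruments, W):
    """Q_n = g^T W g with g = P_n(h E) and W an (s, s) weight matrix."""
    g = sample_moment(residuals, instruments)
    return float(g @ W @ g)
```

In practice this function would be evaluated inside a numerical optimizer over (q, β), with the residuals and h recomputed at each candidate value; that outer loop is omitted here.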

Under some conditions detailed below, we can show the first order asymptotic properties of (q̂_G, β̂_G) including consistency and asymptotic normality of GMM estimators. We introduce some notation. Let ||·|| denote the Euclidean norm of a vector or a matrix; $\partial^{l} f(t, β) / \partial (t, β)^{l} = \partial_{(t, β)}^{l} f(t, β)$ for l = 1, … ; a^⊗2 = aa^T for any matrix or vector a; V = Var[h(x; q_*, β_*)ℰ(y, x; q_*, β_*)]; I_d is the identity matrix; $\overset{d}{\to}$ and $\overset{p}{\to}$ , respectively, denote convergence in distribution and in probability. We obtain the following results, whose detailed proofs can be found in the supplementary document.

Theorem 4.1

Assume that (y_i, x_i), i = 1,…, n, are iid random variables in ℳ × R^{d_x}. Let (q_*, β_*) be the exact value of the parameters satisfying (6). Let {W_n}_n be a random sequence of s × s symmetric positive semi-definite matrices with s ≥ d_ℳ + d_β.

(a) Under assumptions (C1)–(C5) in the Supplementary document, (q̂_G, β̂_G) in (11) is consistent in probability as n → ∞.
(b) Under assumptions (C1)–(C4) and (C6)–(C10) in the Supplementary document, for any local chart (U, ϕ) on ℳ near q_* as n → ∞, we have
$\sqrt{n} [(ϕ(\hat{q}_{G})^{T}, \hat{β}_{G}^{T})^{T} - (ϕ(q_{*})^{T}, β_{*}^{T})^{T}] \overset{d}{\to} N_{d_{\mathcal{M}} + d_{β}} (0, \Sigma_{ϕ}),$ (12)

where $\Sigma_{ϕ} = (G_{ϕ}^{T} W G_{ϕ})^{-1} G_{ϕ}^{T} W V W G_{ϕ} (G_{ϕ}^{T} W G_{ϕ})^{-1}$ , in which G_ϕ is defined in (C9). Moreover, for any other chart (U, ϕ′) near q_*, we have
$\Sigma_{ϕ'} = diag(J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}, I_{d_{β}}) \, \Sigma_{ϕ} \, diag(J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}, I_{d_{β}})^{T},$ (13)

where J(·)_t denotes the Jacobian matrix evaluated at t.

Theorem 4.1 establishes the first-order asymptotic properties of (q̂_G, β̂_G) for the intrinsic regression model (6). Theorem 4.1 (a) establishes the consistency of (q̂_G, β̂_G). The consistency result does not depend on the local chart. Theorem 4.1 (b) establishes the asymptotic normality of (ϕ(q̂_G), β̂_G) for a specific chart (U, ϕ) and the relationship between the asymptotic covariances Σ_{ϕ′} and Σ_ϕ for two different charts. It follows from the lower-right d_β × d_β submatrix of Σ_{ϕ′} that the asymptotic covariance matrix of β̂ does not depend on the chart. However, the asymptotic normality of q̂_G does depend on a specific chart. A consistent estimator of the asymptotic covariance matrix Σ_ϕ is given by $(\hat{G}_{ϕ}^{T} W_{n} \hat{G}_{ϕ})^{-1} \hat{G}_{ϕ}^{T} W_{n} \hat{V} W_{n} \hat{G}_{ϕ} (\hat{G}_{ϕ}^{T} W_{n} \hat{G}_{ϕ})^{-1}$ with $\hat{G}_{ϕ} = n^{-1} \sum_{i=1}^{n} [h(x_{i}; \hat{q}, \hat{β}) \frac{\partial}{\partial (t, β)} \mathcal{E}(y_{i}, x_{i}; ϕ^{-1}(t), \hat{β}) ∣_{t = ϕ(\hat{q})}]$ and

\hat{V} = n^{-1} \sum_{i=1}^{n} [h(x_{i}; \hat{q}, \hat{β}) \, \mathcal{E}(y_{i}, x_{i}; \hat{q}, \hat{β})]^{\otimes 2}.

This estimator is also compatible with the manifold structure of ℳ.
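The plug-in covariance estimator above has the usual sandwich form. The sketch below assumes NumPy and that Ĝ_ϕ, W_n, the residuals, and the h matrices are already available as arrays; the function names and array layout are hypothetical.

```python
import numpy as np

def moment_variance(residuals, instruments):
    """V_hat = n^{-1} sum_i (h_i E_i)(h_i E_i)^T.

    residuals:   (n, d_M) rotated residuals evaluated at the estimate
    instruments: (n, s, d_M) matrices h(x_i; q_hat, beta_hat)
    """
    m = np.einsum("isd,id->is", instruments, residuals)   # rows are h_i @ E_i
    return m.T @ m / m.shape[0]

def sandwich_covariance(G_hat, W, V_hat):
    """(G^T W G)^{-1} G^T W V W G (G^T W G)^{-1} for G of shape (s, d)."""
    bread = np.linalg.inv(G_hat.T @ W @ G_hat)
    meat = G_hat.T @ W @ V_hat @ W @ G_hat
    return bread @ meat @ bread
```

When W is taken to be the inverse of V̂, the sandwich collapses to (Ĝ^T V̂⁻¹ Ĝ)⁻¹, which provides a quick numerical sanity check.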

We consider the relationship between the GMM estimator and the intrinsic least squares estimator (ILSE) of (q, β), denoted by (q̂_I, β̂_I). The (q̂_I, β̂_I) minimizes the total residual sum of squares 𝒢_I,n(q, β) as follows:

({\hat{q}}_{I}, {\hat{β}}_{I}) = \underset{(q, β) \in M \times R^{d_{β}}}{argmin} G_{I, n} (q, β) = \underset{(q, β) \in M \times R^{d_{β}}}{argmin} \sum_{i = 1}^{n} {dist}_{M} {(y_{i}, μ (x_{i}, q, β))}^{2} .

(14)

According to (2), the ILSE is closely related to the intrinsic mean q̂_IM of y₁,⋯, y_n ∈ ℳ, which is defined as

{\hat{q}}_{I M} = \underset{q \in M}{argmin} \sum_{i = 1}^{n} {dist}_{M} {(y_{i}, q)}^{2} = \underset{q \in M}{argmin} \sum_{i = 1}^{n} {dist}_{M} {(y_{i}, μ (0, q, β))}^{2} .

Recall that μ(0, q, β) is independent of β.
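For S^k-valued data, the intrinsic mean can be approximated by a simple fixed-point iteration q ← Exp_q(n⁻¹ Σ_i Log_q(y_i)). This is a common gradient-type scheme rather than a procedure described in this paper; the sketch assumes NumPy, data concentrated well within a hemisphere, and hypothetical function names.

```python
import numpy as np

def sphere_exp(q, v):
    """Exp_q(v) on the unit sphere for a tangent vector v at q."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.array(q, dtype=float)
    return np.cos(norm_v) * q + np.sin(norm_v) * v / norm_v

def sphere_log(q, y):
    """Log_q(y) on the unit sphere for non-antipodal q and y."""
    inner = float(np.clip(q @ y, -1.0, 1.0))
    v = y - inner * q
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.zeros_like(q, dtype=float)
    return np.arccos(inner) * v / norm_v

def intrinsic_mean(Y, max_iter=100, tol=1e-12):
    """Approximate minimiser of sum_i dist(y_i, q)^2 for rows y_i of Y."""
    q = Y[0].astype(float)                  # start from the first observation
    for _ in range(max_iter):
        step = np.mean([sphere_log(q, y) for y in Y], axis=0)
        if np.linalg.norm(step) < tol:      # average residual is (numerically) zero
            break
        q = sphere_exp(q, step)
        q /= np.linalg.norm(q)              # remove round-off drift from the sphere
    return q
```

At convergence the average of Log_q(y_i) vanishes, which is the sample analogue of the zero conditional moment in (6) when there are no covariates.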

The (q̂_I, β̂_I) can be regarded as a special case of the GMM estimator when we set W_n = I_{d_ℳ+d_β} and take the rows of h to be h_j(x, q, β) = (L_{c(1;x,q,β)⁻¹.*}(∂_{t_j} μ(x, ϕ⁻¹(t), β)|_{t=ϕ(q)}))^T for j = 1,…, d_ℳ, and h_{d_ℳ+j}(x, q, β) = (L_{c(1;x,q,β)⁻¹.*}(∂_{β_j} μ(x, q, β)))^T for j = 1,…, d_β, where (U, ϕ) is a chart on ℳ and each row of h(x, q, β) is in R^{1×d_ℳ} via the identification T_pℳ ≅ R^{d_ℳ} corresponding to (U, ϕ). It follows from Theorem 4.1 that under model (6), (q̂_I, β̂_I) enjoys the first-order asymptotic properties as well.

4.2. Efficient GMM Estimator

We investigate the most efficient estimator in the class of GMM estimators. For a fixed h(·; ·, ·), the optimal choice of W is W^opt = V⁻¹, and the use of W_n = W^opt leads to the most efficient estimator in the class of all GMM estimators obtained using the same h(·) function (Hansen, 1982). Its asymptotic covariance is given by (G_ϕ^T V⁻¹ G_ϕ)⁻¹, which follows from the expression for Σ_ϕ in Theorem 4.1 with W = V⁻¹. An interesting question is what the optimal choice of h^opt(·) is.

We first introduce some notation. For a chart (U, ϕ) on ℳ near q_*, let

\begin{array}{lcl} D_{ϕ}(x) & = & \{E[\partial_{(t, β)} \mathcal{E}(y, x; ϕ^{-1}(t), β_{*}) ∣_{t = ϕ(q_{*})} ∣ x]\}^{T}, \quad h_{ϕ}^{*}(x) = D_{ϕ}(x) Ω(x)^{-1}, \\ W_{ϕ}^{*} & = & \{E[D_{ϕ}(x) Ω(x)^{-1} D_{ϕ}(x)^{T}]\}^{-1}, \quad Ω(x) = Var(\mathcal{E}(y, x; q_{*}, β_{*}) ∣ x). \end{array}

Let (q̂^*, β̂^*) be the GMM estimator of (q, β) based on $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ . We obtain the following optimality result for h^opt(·), which generalizes an existing result for Euclidean-valued responses and covariates (Newey, 1993).

Theorem 4.2

Suppose that (C2)–(C8) and (C10)–(C12) in the Supplementary document hold for $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ . We have the following results:

(a) (q̂^*, β̂^*) is asymptotically normally distributed with mean 0 and covariance $W_{ϕ}^{*}$ ;
(b) (q̂^*, β̂^*) is optimal among all GMM estimators for model (6);
(c) (q̂^*, β̂^*) is independent of the chart.

Theorem 4.2 characterizes the optimality of $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ among regular GMM estimators for model (6). Geometrically, (q̂^*, β̂^*) is independent of the chart. Specifically, for any other chart (U, ϕ′) near q_*, we have

\begin{array}{lcl} D_{ϕ'}(x) & = & diag([J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}]^{-1}, I_{d_{β}})^{T} D_{ϕ}(x), \\ h_{ϕ'}^{*}(x) & = & diag([J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}]^{-1}, I_{d_{β}})^{T} h_{ϕ}^{*}(x), \\ W_{ϕ'}^{*} & = & diag(J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}, I_{d_{β}}) \, W_{ϕ}^{*} \, diag(J(ϕ' \circ ϕ^{-1})_{ϕ(q_{*})}, I_{d_{β}})^{T}. \end{array}

Thus, the quadratic form in (11) associated with $h_{ϕ^{'}}^{*} (x)$ and $W_{ϕ^{'}}^{*}$ is the same as that which is associated with $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ . It indicates that the GMM estimator (q̂^*, β̂^*)_ϕ based on $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ is independent of the chart (U, ϕ).

The next challenging issue is the estimation of D_ϕ(x) and Ω(x). We may proceed in two steps. The first step is to calculate a $\sqrt{n}$ -consistent estimator (q̂_I, β̂_I) of (q, β), such as the ILSE. The second step is to plug (q̂_I, β̂_I) into the functions ℰ_i(q̂_I, β̂_I) and ∂_{(t,β)}ℰ(y_i, x_i; ϕ⁻¹(t), β̂_I)|_{t=ϕ(q̂_I)} for all i and then use them to construct the nonparametric estimates of D_ϕ(x) and Ω(x) (Newey, 1993). Specifically, let K(·) be a d_x-dimensional kernel function of the l₀-th order satisfying ∫K(u₁,…, u_{d_x})du₁ … du_{d_x} = 1, $\int u_{s}^{l} K (u_{1}, \dots, u_{d_{x}}) d u_{1} \dots d u_{d_{x}} = 0$ for any s = 1,…, d_x and 1 ≤ l < l₀, and $\int u_{s}^{l_{0}} K (u_{1}, \dots, u_{d_{x}}) d u_{1} \dots d u_{d_{x}} \neq 0$ . Let K_τ(u) = τ⁻¹K(u/τ), where τ > 0 is a bandwidth. Then, a nonparametric estimator of D_ϕ(x) can be written as

\hat{D}_{ϕ}(x)^{T} = \sum_{i=1}^{n} ω_{i}(x; τ) \, \partial_{(t, β)} \mathcal{E}(y_{i}, x_{i}; ϕ^{-1}(t), \hat{β}_{I}) ∣_{t = ϕ(\hat{q}_{I})},

(15)

where $ω_{i} (x; τ) = K_{τ} (x - x_{i}) / {\sum_{k = 1}^{n} K_{τ} (x - x_{k})}$ . Although we may construct a nonparametric estimator of Ω(x) similar to (15), we have found that even for moderate d_x, such an estimator is numerically unstable. Instead, we approximate Ω(x_i) = Var(ℰ(y, x; q_*, β_*)|x = x_i) by its mean V_{E_*} = Var(ℰ(y, x; q_*, β_*)). In this case, $h_{ϕ}^{*} (x)$ and $W_{ϕ}^{*}$ , respectively, reduce to

h_{E, ϕ}^{*}(x) = D_{ϕ}(x) V_{E_*}^{-1} \quad and \quad W_{E, ϕ}^{*} = \{E[D_{ϕ}(x) V_{E_*}^{-1} Ω(x) V_{E_*}^{-1} D_{ϕ}(x)^{T}]\}^{-1}.

(16)
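The kernel weights ω_i(x; τ) in (15) and the resulting smoother can be sketched as below. We assume NumPy and, purely for illustration, a Gaussian product kernel (a second-order kernel, which the paper does not prescribe); the function names are hypothetical. The normalising constant of K_τ cancels in the ratio defining ω_i, so it is omitted.

```python
import numpy as np

def kernel_weights(x, X, tau):
    """omega_i(x; tau) = K_tau(x - x_i) / sum_k K_tau(x - x_k).

    x: (d_x,) evaluation point; X: (n, d_x) observed covariates; tau > 0 bandwidth.
    A Gaussian kernel is assumed here for illustration only.
    """
    u = (x - X) / tau
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return k / k.sum()

def smooth_derivative(x, X, derivs, tau):
    """Weighted average in (15): sum_i omega_i(x; tau) * derivs[i].

    derivs: (n, d_M, d_M + d_beta) array whose i-th slice is the derivative of
    the rotated residual for observation i, evaluated at the first-stage estimate.
    """
    w = kernel_weights(x, X, tau)
    return np.einsum("i,ijk->jk", w, derivs)
```

The weights are nonnegative and sum to one, so when every per-observation derivative is the same matrix the smoother returns that matrix unchanged.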

For any local chart (U, ϕ) with q̂_I ∈ U, we construct the estimators of $h_{E, ϕ}^{*}$ and $W_{E, ϕ}^{*}$ as follows. Letting V̂(q, β) = ℙ_nℰ(y, x; q, β)^⊗2, we have

\hat{h}_{E, ϕ}(x_{i}) = \hat{D}_{ϕ}(x_{i}) \hat{V}(\hat{q}_{I}, \hat{β}_{I})^{-1}, \quad \hat{W}_{E, ϕ} = \{ℙ_{n} [\hat{h}_{E, ϕ}(x) \mathcal{E}(y, x; \hat{q}_{I}, \hat{β}_{I})]^{\otimes 2}\}^{-1}.

(17)

Then, we substitute ĥ_{E, ϕ} and Ŵ_{E, ϕ} into (11) and then calculate the GMM estimator of (q, β), denoted by (q̂_E, β̂_E). Similar to (q̂^*, β̂^*), it can be shown that (q̂_E, β̂_E) is independent of the chart (U, ϕ) on ℳ near q_* with q̂_I ∈ U. For sufficiently large n, dist_ℳ(q̂_I, q_*) can be made sufficiently small and any maximal normal chart on ℳ centered at q̂_I contains the true value q_* with probability approaching one.

We calculate a one-step linearized estimator of (q, β), denoted by (q̃_E, β̃_E), to approximate (q̂_E, β̂_E). Computationally, the linearized estimator does not require iteration, whereas, theoretically, it shares the first-order asymptotic properties of (q̂_E, β̂_E), as shown below. Specifically, in the chart (U, ϕ) near q̂_I, we have

{({\tilde{t}}_{E, ϕ}^{T}, {\tilde{β}}_{E, ϕ}^{T})}^{T} - {(ϕ {({\hat{q}}_{I})}^{T}, {\hat{β}}_{I}^{T})}^{T} = {- ℙ_{n} [{\hat{h}}_{E, ϕ} (x) \partial_{(t, β)} E (y, x; ϕ^{- 1} (t), {\hat{β}}_{I}) ∣_{t = ϕ ({\hat{q}}_{I})}]}^{- 1} ℙ_{n} [{\hat{h}}_{E, ϕ} (x) E (y, x; {\hat{q}}_{I}, {\hat{β}}_{I})] .

(18)
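The update in (18) is a single Newton-type step from the first-stage estimate. A minimal sketch in a flat (Euclidean) toy problem is given below; the linear residual ℰ_i = y_i − x_i^T β, the instrument h(x_i) = x_i, and all function and variable names are assumptions chosen so that the step can be checked against ordinary least squares. It does not implement the manifold-valued model.

```python
import numpy as np

def one_step_update(theta0, h, dE, E):
    """One linearized step: theta0 - {P_n[h dE]}^{-1} P_n[h E].

    theta0 : (p,)       first-stage estimate
    h      : (n, p, m)  instrument matrices h(x_i)
    dE     : (n, m, p)  derivatives of E_i at theta0
    E      : (n, m)     residual vectors E_i at theta0
    """
    n = len(E)
    A = np.einsum("ipm,imq->pq", h, dE) / n   # P_n[h dE]
    b = np.einsum("ipm,im->p", h, E) / n      # P_n[h E]
    return theta0 - np.linalg.solve(A, b)

# Toy linear model with scalar residuals (m = 1) and p = 3 parameters.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

theta0 = np.zeros(p)              # deliberately crude starting value
E = (y - X @ theta0)[:, None]     # residuals, shape (n, 1)
dE = -X[:, None, :]               # d E_i / d theta = -x_i^T, shape (n, 1, p)
h = X[:, :, None]                 # instruments x_i, shape (n, p, 1)

theta1 = one_step_update(theta0, h, dE, E)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the residual is linear in the parameters here, one step already lands on the least squares solution. In the nonlinear setting of the paper the step is only a first-order approximation to the GMM estimator.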

Furthermore, if (U′, ϕ ′) is another chart on ℳ near q̂_I, then we have

{({\tilde{t}}_{E, ϕ^{'}}^{T}, {\tilde{β}}_{E, ϕ^{'}}^{T})}^{T} - {(ϕ^{'} {({\hat{q}}_{I})}^{T}, {\hat{β}}_{I}^{T})}^{T} = (\begin{matrix} J_{ϕ ({\hat{q}}_{I})} (ϕ^{'} \circ ϕ^{- 1}) & 0 \\ 0 & I_{d_{β}} \end{matrix}) [{({\tilde{t}}_{E, ϕ}^{T}, {\tilde{β}}_{E, ϕ}^{T})}^{T} - {(ϕ {({\hat{q}}_{I})}^{T}, {\hat{β}}_{I}^{T})}^{T}] .

Thus, β̃_{E,ϕ} is independent of the chart ϕ and {t̃_{E,ϕ} − ϕ(q̂_I) | ϕ is a chart on ℳ} defines a unique tangent vector to ℳ at q̂_I. Moreover, if ϕ and ϕ′ are maximal normal charts centered at q̂_I, then γ_ϕ(τ) = ϕ⁻¹(τt̃_{E,ϕ}) and γ_ϕ′(τ) = ϕ′⁻¹(τt̃_{E,ϕ′}) are two geodesic curves on ℳ starting from the same point q̂_I with the same initial velocity vector, and thus these two geodesics coincide. Therefore, ϕ⁻¹(t̃_{E,ϕ}) is independent of the normal chart ϕ centered at q̂_I. Finally, we can establish the first-order asymptotic properties of (q̃_E, β̃_E) as follows.

Theorem 4.3

Assume that (C2)–(C11) and (C13)–(C18) are valid. As n → ∞, we have the following results:

\sqrt{n} [{(ϕ {({\tilde{q}}_{E})}^{T}, {\tilde{β}}_{E}^{T})}^{T} - {(ϕ {(q_{*})}^{T}, β_{*}^{T})}^{T}] \overset{d}{\to} N_{d_{M} + d_{β}} (0, \Sigma_{E, ϕ}),

(19)

where $\Sigma_{E, ϕ} = {(G_{ϕ, h_{E, ϕ}^{*}} W_{E, ϕ}^{*} G_{ϕ, h_{E, ϕ}^{*}})}^{- 1}$. In addition, Σ_{E,ϕ} is invariant under the change of coordinates in ℳ and the asymptotic distribution of β̃_E does not depend on the chart (U, ϕ). Also, if we set

\begin{array}{l} {\hat{\Sigma}}_{E, ϕ} = n^{- 1} {ℙ_{n} [{\hat{D}}_{ϕ} (x) \hat{V} {({\hat{q}}_{I}, {\hat{β}}_{I})}^{- 1} {\hat{D}}_{ϕ} {(x)}^{T}]}^{- 1} \\ \times {ℙ_{n} [{\hat{D}}_{ϕ} (x) \hat{V} {({\hat{q}}_{I}, {\hat{β}}_{I})}^{- 1} E {(y, x; {\hat{q}}_{I}, {\hat{β}}_{I})}^{\otimes 2} \hat{V} {({\hat{q}}_{I}, {\hat{β}}_{I})}^{- 1} {\hat{D}}_{ϕ} {(x)}^{T}]} \\ \times {ℙ_{n} [{\hat{D}}_{ϕ} (x) \hat{V} {({\hat{q}}_{I}, {\hat{β}}_{I})}^{- 1} {\hat{D}}_{ϕ} {(x)}^{T}]}^{- 1}, \end{array}

(20)

then nΣ̂_{E,ϕ} is a consistent estimator of Σ_{E,ϕ}, i.e. $n {\hat{\Sigma}}_{E, ϕ} \overset{p}{\to} \Sigma_{E, ϕ}$. This estimator is also compatible with the manifold structure of ℳ.
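The estimator in (20) has the usual sandwich form, with the "bread" ℙ_n[D̂ V̂⁻¹ D̂^T] on both sides of the "meat" built from the residual outer products. A minimal sketch of that matrix algebra follows; the array shapes, the function name, and the simulated inputs are assumptions for illustration, and the kernel estimates D̂_ϕ(x_i) are taken as given.

```python
import numpy as np

def sandwich_covariance(D, V, E):
    """Sandwich form n^{-1} A^{-1} B A^{-1}.

    D : (n, p, m)  matrices D_hat(x_i)
    V : (m, m)     pooled second-moment matrix V_hat
    E : (n, m)     residual vectors E_i at the first-stage estimate
    A = P_n[D V^{-1} D^T],  B = P_n[D V^{-1} E E^T V^{-1} D^T].
    """
    n = len(E)
    H = D @ np.linalg.inv(V)                     # h_hat(x_i) = D_hat(x_i) V^{-1}
    A = np.einsum("ipm,iqm->pq", H, D) / n       # bread
    HE = np.einsum("ipm,im->ip", H, E)           # h_hat(x_i) E_i
    B = HE.T @ HE / n                            # meat
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv / n

# Hypothetical inputs: n = 300 observations, p = 4 parameters, m = 2 moments.
rng = np.random.default_rng(0)
n, p, m = 300, 4, 2
D = rng.normal(size=(n, p, m))
E = rng.normal(size=(n, m))
V = E.T @ E / n                                  # V_hat = P_n E^{(x)2}
Sigma_hat = sandwich_covariance(D, V, E)
```

When D̂(x_i) does not vary with i, the meat equals the bread and the expression collapses to n⁻¹A⁻¹, which gives a quick sanity check on an implementation.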

Theorem 4.3 establishes the first-order asymptotic properties of (q̃_E, β̃_E). If Ω(x) = Ω for a constant matrix Ω, then it follows from Theorems 4.2 and 4.3 that (q̃_E, β̃_E) is optimal. If Ω(x) does not vary dramatically as a function of x, then (q̃_E, β̃_E) is nearly optimal. If Ω(x) varies dramatically as a function of x, one can replace V̂(q̂_I, β̂_I) in (17) by Ω̂(x_i), where Ω̂(x_i) is a consistent estimator of Ω(x_i) for all i, to obtain ĥ_{E,ϕ}(x) = D̂_ϕ(x)Ω̂(x)⁻¹ and Ŵ_{E,ϕ} = {ℙ_n[ĥ_{E,ϕ}(x)ℰ(y, x; q̂_I, β̂_I)]^⊗2}⁻¹. In this case, the optimality of (q̃_E, β̃_E) still holds, as stated in the following theorem.

Theorem 4.4

Assume that (C2)–(C17) and (C19) are valid. Then, as n → ∞, we have

\sqrt{n} [{(ϕ {({\tilde{q}}_{E})}^{T}, {\tilde{β}}_{E}^{T})}^{T} - {(ϕ {(q_{*})}^{T}, β_{*}^{T})}^{T}] \overset{d}{\to} N_{d_{M} + d_{β}} (0, \Sigma_{ϕ}^{*}),

(21)

in which $\Sigma_{ϕ}^{*}$ is given in Theorem 4.2. If we set Σ̂_{E,ϕ} = n⁻¹{ℙ_n[D̂_ϕ(x)Ω̂(x)⁻¹D̂_ϕ(x)^T]}⁻¹, then nΣ̂_{E,ϕ} is a consistent estimator of $\Sigma_{ϕ}^{*}$.

4.3. Computational Algorithm

Computationally, an annealing evolutionary stochastic approximation Monte Carlo (SAMC) algorithm (Liang et al., 2010) is developed to compute (q̂_I, β̂_I) and (q̂_E, β̂_E). See the supplementary document for details. Although some gradient-based optimization methods, such as the quasi-Newton method, have been used to optimize 𝒬_n(q, β) (Kim et al., 2014; Fletcher, 2013), we have found that these methods strongly depend on the starting value of (q, β). Specifically, when ℰ(y, x; q, β) takes a relatively complicated form, 𝒬_n(q, β) is generally not convex, and gradient-based methods can easily converge to local minima. Moreover, we have found that statistical inference carried out at those local minima, such as inference based on the estimated standard errors of (q̂_E, β̂_E), can be misleading. The annealing evolutionary SAMC algorithm converges fast and differs from many gradient-based algorithms in that its moves are self-adjustable, so it is unlikely to get trapped in local energy minima. The algorithm (Liang et al., 2010) further improves stochastic approximation Monte Carlo for optimization problems by incorporating features of simulated annealing and the genetic algorithm into its search process.

4.4. Hypotheses Testing

Many scientific questions involve the comparison of ℳ-valued data across groups and subjects and the detection of changes in ℳ-valued data over time. Such questions can usually be formulated as testing hypotheses about q and β. We consider two types of hypotheses as follows:

H_{0}^{(1)} : C_{0} β = b_{0} vs. H_{1}^{(1)} : C_{0} β \neq b_{0},

(22)

H_{0}^{(2)} : q = q_{0} vs. H_{1}^{(2)} : q \neq q_{0},

(23)

where C₀ is an r × d_β matrix of full row rank and q₀ and b₀ are specified in ℳ and R^r, respectively. Further extensions of these hypotheses are definitely interesting and possible. For instance, for the multicenter link function, we may be interested in testing whether all intercepts are independent of the discrete covariate class.

We develop several test statistics for testing the hypotheses given in (22) and (23). First, we consider the Wald test statistic for testing $H_{0}^{(1)}$ against $H_{1}^{(1)}$ in (22), which is given by

W_{n, ϕ}^{(1)} = {(C_{0} {\tilde{β}}_{E} - b_{0})}^{T} {[C_{0} {\hat{\Sigma}}_{E, ϕ; 22} C_{0}^{T}]}^{- 1} (C_{0} {\tilde{β}}_{E} - b_{0}),

where Σ̂_{E,ϕ} is given in Theorem 4.3 or Theorem 4.4, and Σ̂_{E,ϕ;22} is its lower-right d_β × d_β submatrix. Since β̃_E and its asymptotic covariance matrix are independent of the chart on ℳ, the test statistic $W_{n, ϕ}^{(1)}$ is independent of the chart.
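For concreteness, the quadratic form defining $W_{n, ϕ}^{(1)}$ can be evaluated as in the sketch below. The function name and the small numerical inputs are hypothetical. The covariance block passed in is assumed to already carry the n⁻¹ factor, as Σ̂_{E,ϕ} does in Theorem 4.3.

```python
import numpy as np

def wald_statistic(beta, cov_beta, C0, b0):
    """(C0 beta - b0)^T [C0 cov_beta C0^T]^{-1} (C0 beta - b0).

    beta     : (d_beta,)          estimate of the regression coefficients
    cov_beta : (d_beta, d_beta)   estimated covariance block of beta
    C0       : (r, d_beta)        contrast matrix of full row rank
    b0       : (r,)               hypothesized value of C0 beta
    """
    resid = C0 @ beta - b0
    return float(resid @ np.linalg.solve(C0 @ cov_beta @ C0.T, resid))

# Hypothetical example: test that the first two of three coefficients are zero.
beta_tilde = np.array([1.0, 2.0, 3.0])
cov_beta = np.eye(3)
C0 = np.eye(3)[:2]
b0 = np.zeros(2)
W1 = wald_statistic(beta_tilde, cov_beta, C0, b0)
```

Solving the linear system rather than forming the explicit inverse is a numerical convenience and does not change the statistic.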

Second, we consider the Wald test statistic for testing the hypotheses given in (23) when there is a local chart (U, ϕ) on ℳ containing both q̃_E and q₀. Specifically, the Wald test statistic for testing (23) is defined by

W_{n, ϕ}^{(2)} = {(ϕ ({\tilde{q}}_{E}) - ϕ (q_{0}))}^{T} {[(I_{d_{M}} 0) {\hat{\Sigma}}_{E, ϕ} {(I_{d_{M}} 0)}^{T}]}^{- 1} (ϕ ({\tilde{q}}_{E}) - ϕ (q_{0})) .

Third, we develop an intrinsic Wald test statistic that is independent of the chart for testing the hypotheses given in (23). We consider the asymptotic covariance estimator Σ̂_{E,ϕ} based on q̃_E and its upper-left d_ℳ × d_ℳ submatrix Σ̂_{E,ϕ;11}. Since both are compatible with the manifold structure of ℳ, Σ̂_{E,ϕ;11} defines a unique non-degenerate linear map Σ̂_{E;11}(·) from the tangent space T_{q̃_E}ℳ of ℳ at q̃_E onto itself, which is independent of the chart (U, ϕ). In any maximal normal chart centered at q̃_E, the Wald test statistic for testing (23) is given by

W_{M, n}^{(2)} = m_{{\tilde{q}}_{E}} ({({\hat{\Sigma}}_{E; 11})}^{- 1} ({Log}_{{\tilde{q}}_{E}} q_{0}), {Log}_{{\tilde{q}}_{E}} q_{0}) .

We obtain the asymptotic null distributions of $W_{n, ϕ}^{(1)}, W_{n, ϕ}^{(2)}$ , and $W_{M, n}^{(2)}$ as follows.

Theorem 4.5

Let (U, ϕ) be a local chart on ℳ so that q̃_E, q_* ∈ U. Assume that all conditions in Theorem 4.3 hold. Under the corresponding null hypothesis, we have the following results:

(i) $W_{n, ϕ}^{(1)}$ and $W_{n, ϕ}^{(2)}$ are asymptotically distributed as $χ_{r}^{2}$ and $χ_{d_{M}}^{2}$, respectively;
(ii) $W_{n, ϕ}^{(1)}$ is independent of the chart (U, ϕ);
(iii) $W_{n, ϕ^{'}}^{(2)} = W_{n, ϕ}^{(2)} + o_{p} (1)$, for any other local chart (U, ϕ′) with q̃_E and q₀ in U;
(iv) $W_{n, ϕ}^{(2)} = W_{M, n}^{(2)}$, for any normal chart (U, ϕ) centered at q̃_E.

Theorem 4.5 has several important implications. Theorem 4.5 (i) characterizes the asymptotic null distributions of $W_{n, ϕ}^{(1)}$ and $W_{n, ϕ}^{(2)}$. Theorem 4.5 (ii) shows that $W_{n, ϕ}^{(1)}$ does not depend on the choice of the chart (U, ϕ) on ℳ. Theorem 4.5 (iii) shows that $W_{n, ϕ^{'}}^{(2)}$ and $W_{n, ϕ}^{(2)}$ are asymptotically equivalent for any two local charts. Theorem 4.5 (iv) shows that $W_{n, ϕ}^{(2)}$, computed in a normal chart centered at q̃_E, coincides with $W_{M, n}^{(2)}$ and can therefore be used to construct an intrinsic test statistic.

We consider a local alternative framework for (22) and (23) as follows:

H_{0}^{(1)} : C_{0} β = b_{0} vs. H_{1, n}^{(1)} : C_{0} β = b_{0} + δ / \sqrt{n} + o (1 / \sqrt{n}),

(24)

H_{0}^{(2)} : q = q_{0} vs. H_{1, n}^{(2)} : q = {Exp}_{q_{0}} (v / \sqrt{n} + o (1 / \sqrt{n})),

(25)

where δ and v are specified (and fixed) in R^r and T_q₀ℳ, respectively, and we establish the asymptotic distributions of $W_{n, ϕ}^{(1)}, W_{n, ϕ}^{(2)}$ and $W_{M, n}^{(2)}$ under these local alternatives.

Theorem 4.6

Let (U, ϕ) be a local chart on ℳ so that q̃_E, q_* ∈ U. Assume that all conditions in Theorem 4.3 hold. Under the local alternatives (24) and (25), we have the following results:

(i) Under $H_{1, n}^{(1)}$, $W_{n, ϕ}^{(1)}$ is asymptotically distributed as noncentral $χ_{r}^{2}$ with noncentrality parameter $δ^{T} {[C_{0} {\hat{\Sigma}}_{E, ϕ; 22} C_{0}^{T}]}^{- 1} δ$.
(ii) Under $H_{1, n}^{(2)}$, $W_{n, ϕ}^{(2)}$ is asymptotically distributed as noncentral $χ_{d_{M}}^{2}$, with noncentrality parameter J(ϕ ∘ Exp_q₀)₀(v)^T [Σ̂_{E,ϕ;11}]⁻¹ J(ϕ ∘ Exp_q₀)₀(v). The noncentrality parameter does not depend on the choice of the coordinate system at q₀. Here, J(f)_a denotes the Jacobian matrix of the map f at a.
(iii) Under $H_{1, n}^{(2)}$, $W_{M, n}^{(2)}$ is asymptotically distributed as noncentral $χ_{d_{M}}^{2}$, with noncentrality parameter m_{q̃_E}((Σ̂_{E;11})⁻¹(J(Log_{q̃_E})_q₀(v)), (J(Log_{q̃_E})_q₀(v))). The noncentrality parameter does not depend on the choice of the coordinate systems at q̃_E and q₀, respectively.

We now consider the scenario in which no local chart on ℳ contains both q̃_E and q₀. In this case, we restate the hypotheses $H_{0}^{(2)}$ and $H_{1}^{(2)}$ as follows:

H_{0}^{(2)} : {dist}_{M} (q, q_{0}) = 0 vs. H_{1}^{(2)} : {dist}_{M} (q, q_{0}) \neq 0.

(26)

We propose a geodesic test statistic given by

W_{dist} = {dist}_{M} {({\tilde{q}}_{E}, q_{0})}^{2},

(27)

which is independent of the chart (U, ϕ). Theoretically, we can establish the asymptotic distribution of W_dist under both the null and alternative hypotheses as follows.

Theorem 4.7

Assume that all conditions in Theorem 4.5 hold.

(i) Under $H_{0}^{(2)}$, nW_dist is asymptotically weighted chi-square χ²(λ₁, …, λ_{d_ℳ}) distributed, where the weights λ₁, …, λ_{d_ℳ} are the eigenvalues of the matrix Σ_{E,Log_q₀,11}, which is the upper-left d_ℳ × d_ℳ submatrix of the asymptotic covariance matrix Σ_{E,Log_q₀} of q̃_E in a normal chart centered at q₀. Moreover, the weights are independent, up to a permutation, of the choice of the normal chart centered at q_*.
(ii) Under the alternative hypothesis, W_dist is asymptotically normally distributed and we have
$\sqrt{n} (W_{dist} - {dist}_{M} {(q_{*}, q_{0})}^{2}) \overset{d}{\to} N_{d_{M}} (0, D_{dist}^{T} \Sigma_{E, {Log}_{q_{*}}, 11} D_{dist}),$

where D_dist is the column vector representation of grad_{q_*}(dist(·, q₀)²) with respect to the orthonormal basis of T_{q_*}ℳ associated with the normal chart used to represent the asymptotic covariance of q̃_E as the matrix Σ_{E,Log_{q_*}}. In particular, when q₀ is close to q_*, then
$\sqrt{n} (W_{dist} - {dist}_{M} {(q_{*}, q_{0})}^{2}) \overset{d}{\to} N_{d_{M}} (0, 4 {[{Log}_{q_{*}} q_{0}]}^{T} \Sigma_{E, {Log}_{q_{*}}, 11} [{Log}_{q_{*}} q_{0}]) .$

Theorem 4.7 establishes the asymptotic distribution of W_dist when q̃_E and q₀ do not belong to the same chart of ℳ. In practice, the covariance matrix Σ_{E,Log_{q_*},11} is not available, since Σ_{E,Log_{q_*}} is not known; it also depends on the unknown true value β_*, so we may use the estimate Σ̂_{E,Log_{q_*}} as defined in Theorems 4.3 and 4.4. Therefore, under the null hypothesis, the asymptotic distribution of W_dist can be approximated by the weighted chi-square distribution χ²(λ̂₁, …, λ̂_{d_ℳ}), in which the weights λ̂₁, …, λ̂_{d_ℳ} are the eigenvalues of the covariance matrix (Σ̂_{E,Log_q₀})₁₁/n.
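To make the geodesic test concrete, the sketch below evaluates W_dist on the unit sphere S², used here only as a simple example of a Riemannian symmetric space, and approximates a weighted chi-square tail probability by Monte Carlo. The point estimate, the covariance block, the function names, and the number of draws are all hypothetical. The weights are assumed to be on the same scale as W_dist, that is, to already include the 1/n factor.

```python
import numpy as np

def sphere_dist(p, q):
    """Geodesic (great-circle) distance between unit vectors p and q."""
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

def weighted_chi2_pvalue(stat, weights, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(sum_j lambda_j Z_j^2 >= stat), Z_j iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    z2 = rng.chisquare(1, size=(n_draws, len(weights)))
    draws = z2 @ np.asarray(weights, dtype=float)
    return float(np.mean(draws >= stat))

# Hypothetical estimate q_tilde and null value q0 on the unit sphere.
q0 = np.array([0.0, 0.0, 1.0])
q_tilde = np.array([np.sin(0.1), 0.0, np.cos(0.1)])
W_dist = sphere_dist(q_tilde, q0) ** 2

# Hypothetical 2 x 2 covariance block of q_tilde in a normal chart at q0.
Sigma11_hat = np.diag([0.004, 0.002])
lam_hat = np.linalg.eigvalsh(Sigma11_hat)
p_value = weighted_chi2_pvalue(W_dist, lam_hat)
```

Simulation is used here for simplicity. Any other method of evaluating the distribution of a weighted sum of independent χ²₁ variables could be substituted.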

Finally, we develop a score test statistic for testing $H_{0}^{(2)}$ against $H_{1}^{(2)}$ . An advantage of using the score test statistic is that it avoids the calculation of an estimator under the alternative hypothesis $H_{1}^{(2)}$ . For notational simplicity, we only consider the ILSE estimator of (q, β), denoted by (q₀, β̃_I), under the null hypothesis $H_{0}^{(2)}$ . For any chart (U, ϕ) on ℳ with q₀ ∈ U, we define

\begin{array}{l} F_{ϕ i} = {(F_{ϕ i, 1}^{⊤}, F_{ϕ i, 2}^{⊤})}^{⊤} = \partial_{(t, β)} {dist}_{M} {(f (x_{i}, ϕ^{- 1} (t), β), y_{i})}^{2} ∣_{t = ϕ (q_{0}), {\tilde{β}}_{I}}, \\ U_{ϕ} = (\begin{matrix} U_{tt} & U_{t β} \\ U_{β t} & U_{β β} \end{matrix}) = \sum_{i = 1}^{n} \partial_{(t, β)}^{2} {dist}_{M} {(f (x_{i}, ϕ^{- 1} (t), β), y_{i})}^{2} ∣_{t = ϕ (q_{0}), {\tilde{β}}_{I}}, \end{array}

where the subcomponents F_{ϕi,1} and F_{ϕi,2} correspond to t and β, respectively. It can be shown that the score test W_{SC,ϕ} reduces to

W_{S C, ϕ} = {(\sum_{i = 1}^{n} F_{ϕ i, 1})}^{⊤} {\tilde{\Sigma}}_{ϕ, q}^{- 1} (\sum_{i = 1}^{n} F_{ϕ i, 1}),

(28)

where ${\tilde{\Sigma}}_{ϕ, q} = (I_{d_{M}}, - U_{t β} U_{ββ}^{- 1}) [\sum_{i = 1}^{n} {(F_{ϕ i} - \bar{F_{ϕ}})}^{\otimes 2}] {(I_{d_{M}}, - U_{t β} U_{ββ}^{- 1})}^{⊤}$, in which $\bar{F_{ϕ}} = n^{- 1} \sum_{i = 1}^{n} F_{ϕ i}$. Theoretically, we can establish the asymptotic distribution of W_{SC,ϕ} under the null hypothesis.

Theorem 4.8

Assume that all conditions in Theorem 4.5 hold. We have the following results:

(i) For any suitable local chart (U, ϕ), the score test statistic W_{SC,ϕ} is asymptotically distributed as $χ_{d_{M}}^{2}$ under the null hypothesis $H_{0}^{(2)}$.
(ii) Under $H_{0}^{(2)}$, for any other local chart (U, ϕ′) with q₀ ∈ U, we have
$W_{S C, ϕ^{'}} = W_{S C, ϕ} .$

5. Real Data Example

5.1. ADNI Corpus Callosum Shape Data

Alzheimer disease (AD) is a disorder of cognitive and behavioral impairment that markedly interferes with social and occupational functioning. It is an irreversible, progressive brain disease that slowly destroys memory and thinking skills, and eventually even the ability to carry out the simplest tasks. AD affects almost 50% of those over the age of 85 and is the sixth leading cause of death in the United States.

The corpus callosum (CC), as the largest white matter structure in the brain, connects the left and right cerebral hemispheres and facilitates homotopic and heterotopic interhemispheric communication. It has been a structure of high interest in many neuroimaging studies of neuro-developmental pathology. Individual differences in CC and their possible implications regarding interhemispheric connectivity have been investigated over the last several decades (Paul et al., 2007).

We consider the CC contour data obtained from the ADNI study. For each subject in the ADNI dataset, the segmentation of the T1-weighted MRI and the calculation of the intracranial volume were done in the FreeSurfer package^‡ (Dale et al., 1999), while the midsagittal CC area was calculated in the CCSeg package, using the subdivisions of Witelson (1989), which are motivated by neuro-histological studies. Finally, each T1-weighted MRI image and its tissue segmentation were used as input files to the CCSeg package to extract the planar CC shape data.

5.2. Intrinsic Regression Models

We are interested in characterizing the change of the CC contour shape as a function of three covariates: gender, age, and AD diagnosis. We focused on n = 409 subjects with 223 healthy controls (HCs) and 186 AD patients at baseline of the ADNI1 database. We observed a CC planar contour Y_i with 32 landmarks and three clinical variables including gender x_{i,1} (0-female, 1-male), age x_{i,2}, and diagnosis x_{i,3} (0-control, 1-AD) for i = 1, …, 409. The demographic information is presented in Table 1.

Table 1.

Demographic information for the processed ADNI CC shape dataset including disease status, age, and gender.

Disease status	No.	Range of age in years (mean)	Gender (female/male)
Healthy Control	223	62–90 (76.25)	107/116
AD	186	55–92 (75.42)	88/98


We treat the CC planar contour Y_i as an RSS-valued response in Kendall’s planar shape space $\Sigma_{2}^{32}$. The geometric structure of $\Sigma_{2}^{k}$ for k > 2 is included in the supplementary document. Each Y_i is specified as a 32 × 2 real matrix, whose rows represent the planar coordinates of 32 landmarks on y_i. Moreover, Y_i = (Y_{i,1} Y_{i,2}) can be represented as a complex vector z_i = Y_{i,1} + jY_{i,2} in C³², where $j = \sqrt{- 1}$ and C is the standard complex space. After removing the translations and normalizing to the unit 2-norm, each contour Y_i can be viewed as an element $z_{i} \in D^{32} = \{z = {(z^{1}, \dots, z^{32})}^{T} \in C^{32} ∣ \sum_{m = 1}^{32} z^{m} = 0 \text{ and } {‖ z ‖}_{2} = 1\}$. Then, after removing the 2-dimensional rotations, we obtain an element y_i = [z_i] in Kendall’s planar shape space, $\Sigma_{2}^{32} = D^{32} / S^{1}$, which has dimension 30 and is identified with the complex projective space CP³⁰.
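The preprocessing described above (complex representation, centering, scaling to unit norm) can be sketched as follows. The function names and the simulated landmarks are assumptions for illustration. The distance formula arccos|z̄₁^T z₂| is the standard geodesic distance on Kendall's planar shape space (see, e.g., Dryden and Mardia, 1998) and is included only to check that rotations have no effect.

```python
import numpy as np

def to_preshape(Y):
    """Map a (k, 2) landmark matrix to a centered, unit-norm vector in C^k."""
    z = Y[:, 0] + 1j * Y[:, 1]      # complex representation of the landmarks
    z = z - z.mean()                # remove translation
    return z / np.linalg.norm(z)    # normalize to unit 2-norm

def shape_distance(z1, z2):
    """Geodesic distance between the shapes [z1] and [z2]: arccos |z1^H z2|."""
    return float(np.arccos(np.clip(abs(np.vdot(z1, z2)), 0.0, 1.0)))

# Hypothetical contour with 32 landmarks, plus a rotated, scaled, shifted copy.
rng = np.random.default_rng(0)
Y = rng.normal(size=(32, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y2 = 2.5 * Y @ R.T + np.array([3.0, -1.0])

z = to_preshape(Y)
z2 = to_preshape(Y2)
d = shape_distance(z, z2)   # numerically zero: both contours have the same shape
```

Taking the modulus of the Hermitian inner product removes the rotation, which is why the two contours are identified as one point of the shape space.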

In order to use our intrinsic regression model, we determined the base point p and an orthonormal basis {Z₁, …, Z₃₀} for $T_{p} \Sigma_{2}^{32}$ as follows. We initially set p₀ = [z₀] with $z_{0} = {(1, - 1, 0, \dots, 0)}^{T} / \sqrt{2}$ and an orthonormal basis {Z̃₁, …, Z̃₃₀} in $T_{p_{0}} \Sigma_{2}^{32}$, where ${\tilde{Z}}_{l} = {(1, \dots, 1, - (l + 1), 0, \dots, 0)}^{T} / \sqrt{(l + 1) (l + 2)}$. Then, we projected all y_i’s onto $T_{p_{0}} \Sigma_{2}^{32}$ and calculated Log_p₀(y_i) for all i. Finally, we set the base point p as ${Exp}_{p_{0}} (n^{- 1} \sum_{i = 1}^{n} {Log}_{p_{0}} (y_{i}))$ and then used the parallel transport to rotate the initial basis {Z̃₁, …, Z̃₃₀} to obtain a new orthonormal basis {Z₁, …, Z₃₀} at p.
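The base point construction p = Exp_p₀(n⁻¹ Σ Log_p₀(y_i)) can be sketched with the standard Log and Exp maps of Kendall's planar shape space, written on preshape representatives. The formulas below are the usual ones from the shape analysis literature and are an assumption of this sketch, since the paper defers the geometry to its supplementary document. The function names and simulated contours are hypothetical, and the parallel transport of the basis is not shown.

```python
import numpy as np

def to_preshape(Y):
    """Centered, unit-norm complex representation of a (k, 2) landmark matrix."""
    z = Y[:, 0] + 1j * Y[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def align(z_ref, z):
    """Rotate z so that its Hermitian inner product with z_ref is real and >= 0."""
    inner = np.vdot(z, z_ref)                 # conj(z)^T z_ref
    if abs(inner) == 0.0:
        return z
    return z * (inner / abs(inner))

def log_map(p, y):
    """Log_p(y): horizontal tangent vector at p pointing toward the shape of y."""
    y_star = align(p, y)
    c = float(np.clip(np.vdot(p, y_star).real, -1.0, 1.0))   # cos(rho)
    rho = np.arccos(c)
    if rho < 1e-12:
        return np.zeros_like(p)
    return rho * (y_star - c * p) / np.sin(rho)

def exp_map(p, v):
    """Exp_p(v) for a horizontal tangent vector v at the preshape p."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

# Initial point z0 = (1, -1, 0, ..., 0)^T / sqrt(2), as in the text.
k, n = 32, 20
z0 = np.zeros(k, dtype=complex)
z0[0], z0[1] = 1.0, -1.0
z0 /= np.sqrt(2.0)

# Hypothetical sample: noisy copies of one random template contour.
rng = np.random.default_rng(1)
template = rng.normal(size=(k, 2))
shapes = [to_preshape(template + 0.05 * rng.normal(size=(k, 2))) for _ in range(n)]

logs = np.array([log_map(z0, y) for y in shapes])
p = exp_map(z0, logs.mean(axis=0))   # base point: Exp of the mean tangent vector
```

The resulting p is again a centered unit vector, so it is a valid preshape representative of the base point.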

We consider an intrinsic regression model with $y_{i} \in \Sigma_{2}^{32}$ as a response vector and a vector of four covariates including gender, age, diagnosis, and the interaction age*diagnosis, that is, x_i = (x_{i,1}, x_{i,2}, x_{i,3}, x_{i,4})^T with x_{i,4} = x_{i,2}x_{i,3}. We used a single-center link function with model parameters $(q, β) \in \Sigma_{2}^{32} \times R^{240}$ as follows. The intercept q is specified by $q = ϕ_{p}^{- 1} (t) = {Exp}_{p} (\sum_{ℓ = 1}^{30} (t_{2 ℓ - 1} + j t_{2 ℓ}) Z_{ℓ})$, where t = (t₁, …, t₆₀)^T ∈ R⁶⁰. The regression coefficient vector β consists of four 60 × 1 subvectors, β^(g), β^(a), β^(d), and β^(ad), which correspond to x_{i,1}, x_{i,2}, x_{i,3}, and x_{i,4}, respectively. Therefore, there are 300 unknown parameters in (t^T, β^T)^T. We define a 30 × 4 complex matrix as B = B_o + jB_e, with $B_{o} = (\begin{matrix} β_{o}^{(g)} & β_{o}^{(a)} & β_{o}^{(d)} & β_{o}^{(a d)} \end{matrix}), B_{e} = (\begin{matrix} β_{e}^{(g)} & β_{e}^{(a)} & β_{e}^{(d)} & β_{e}^{(a d)} \end{matrix}) \in R^{30 \times 4}$, where $β_{o}^{(\cdot)}$ and $β_{e}^{(\cdot)}$ are the subvectors of β^(·) formed by the odd-indexed and even-indexed components, respectively, and a link function by $μ (x_{i}, q, β) = {Exp}_{q} ([U_{p, q} Z_{1}, \dots, U_{p, q} Z_{30}] B x_{i}) \in \Sigma_{2}^{32}$, where $U_{q_{1}, q_{2}} v = U_{z_{q_{1}}, z_{q_{2}}^{*}} v$, with q₁ = [z_q₁], q₂ = [z_q₂], $z_{q_{2}}^{*} = e^{j θ^{*}} z_{q_{2}}$ the optimal rotational alignment of z_q₂ to z_q₁, given by ${\bar{z}}_{q_{2}}^{T} z_{q_{1}} = e^{j θ^{*}} ∣ {\bar{z}}_{q_{2}}^{T} z_{q_{1}} ∣$, and U_{z₁,z₂}v for any v ∈ C^k takes the form of

v - ({\bar{z_{1}}}^{T} v) z_{1} - ({\bar{{\tilde{z}}_{2}}}^{T} v) \tilde{z_{2}} + {({\bar{z_{1}}}^{T} z_{2}) ({\bar{z_{1}}}^{T} v) - \sqrt{1 - {∣ {\bar{z_{1}}}^{T} z_{2} ∣}^{2}} ({\bar{{\tilde{z}}_{2}}}^{T} v)} z_{1} + {\sqrt{1 - {∣ {\bar{z_{1}}}^{T} z_{2} ∣}^{2}} ({\bar{z_{1}}}^{T} v) + \bar{({\bar{z_{1}}}^{T} z_{2})} ({\bar{\tilde{z_{2}}}}^{T} v)} \tilde{z_{2}},

in which $\tilde{z_{2}} = {z_{2} - ({\bar{z_{1}}}^{T} z_{2}) z_{1}} / \sqrt{1 - {∣ {\bar{z_{1}}}^{T} z_{2} ∣}^{2}}$ , for z₁, z₂ ∈ 𝒟³². Finally, our intrinsic model is defined by $E [{Log}_{p} ({\bar{U}}_{p, μ (x_{i}, q, β)}^{T} y_{i}) ∣ x_{i}] = 0$ , for i = 1, …, 409.

5.3. Results

We first calculated $({\hat{q}}_{I}, {\hat{β}}_{I}) = (ϕ_{p}^{- 1} ({\hat{t}}_{I}), {\hat{β}}_{I})$ in (14) and $({\tilde{q}}_{E}, {\tilde{β}}_{E}) = (ϕ_{p}^{- 1} ({\tilde{t}}_{E}), {\tilde{β}}_{E})$ in (18). The intercept estimates q̂_I and q̂_E are very close to each other with ${dist}_{\Sigma_{2}^{32}} ({\hat{q}}_{I}, {\hat{q}}_{E}) < 0.0005$. Second, we compared the efficiency gain in the estimates of β. The estimates β̂_I and β̂_E of the regression coefficients and their standard deviations are displayed in Figure 3(a) and (b). The efficiency gain in Stage II is measured by the relative reduction in the variances of β̂_E relative to those of β̂_I, which is shown in Figure 3(c). The average relative variance reduction is about 16.77% across all parameters in β, about 12.25% for the parameters in β^(ad), and about 19.98% for the parameters in β^(g).

Fig. 3 — Regression coefficient estimates (a) and their standard deviations (b) from Stages I and II; and (c) the relative reduction in the variances of β̂_E relative to those of β̂_I. There is an average relative decrease of 16.77% in the variances across all parameters in β, corresponding to 12.25% for β^(ad) and 19.98% for β^(g).

Third, we assessed whether there is an age×diagnosis interaction effect on the shape of the CC contour. We tested H₀ : β^(ad) = 0₆₀ versus H₁ : β^(ad) ≠ 0₆₀. The Wald test statistic equals $W_{n, ϕ}^{(1)} = 98.20$ with a p-value of about 0.001. Thus, the data contain enough evidence to reject H₀, indicating that there is a strong age-dependent diagnosis effect on the shape of the CC contours. The mean age-dependent CC trajectories for HCs and ADs within each gender group are shown in Figure 4. It can be observed that there is a difference in shape along the inner side of the posterior splenium and isthmus subregions in both male and female groups. The splenium seems to be less rounded and the isthmus is thinner in subjects with AD than in HCs.

Fig. 4 — Age-trajectories of the intrinsic mean shapes by diagnosis within each gender group.

Fourth, we assessed whether there is a gender effect on the shape of the CC contour. We tested H₀ : β^(g) = 0₆₀ versus H₁ : β^(g) ≠ 0₆₀. The Wald test statistic is $W_{n, ϕ}^{(1)} = 73.34$ with a p-value of 0.116, which is not significant at the 0.05 level. Thus, the data do not provide evidence of a gender effect on the shape of the CC contours. The mean age-dependent CC trajectories for the female and male groups within each diagnosis group are shown in Figure 5. We observed similar shapes of CC contours in males and females.

Fig. 5 — Age-trajectories of the intrinsic mean shapes by gender within each diagnosis group.
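As a rough check on the two reported p-values, the Wald statistics can be referred to a χ² distribution with 60 degrees of freedom, since each null hypothesis constrains a 60-dimensional subvector of β. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square tail so that only the Python standard library is needed. This approximation is a choice made for the example and is not the computation used in the paper.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi2_df >= x) with the Wilson-Hilferty cube-root transform.

    (X / df)^(1/3) is treated as approximately normal with mean
    1 - 2 / (9 df) and variance 2 / (9 df).
    """
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of N(0, 1)

# Wald statistics reported in Section 5.3, each with 60 degrees of freedom.
p_interaction = chi2_sf_wilson_hilferty(98.20, 60)
p_gender = chi2_sf_wilson_hilferty(73.34, 60)
```

Under this approximation the two tail probabilities should come out close to the reported values of about 0.001 and 0.116.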

6. Discussion

We have developed a general statistical framework for intrinsic regression models of responses valued in a Riemannian symmetric space in general, and Lie groups in particular, and their association with a set of covariates in a Euclidean space. The intrinsic regression models are based on the generalized method of moments estimator and therefore avoid any parametric assumptions regarding the distribution of the manifold-valued responses. We also proposed a large class of link functions to map Euclidean covariates to the manifold of responses. Essentially, the covariates are first mapped to the tangent bundle of the Riemannian manifold, and from there further mapped, via the manifold exponential map, to the manifold itself. We have adapted an annealing evolutionary stochastic algorithm to search for the ILSE, (q̂_I, β̂_I), of (q, β) in Stage I of the estimation process, and a one-step procedure to search for the efficient estimator (q̃_E, β̃_E) in Stage II. Our simulation study and real data analysis demonstrate that the relative efficiency of the Stage II estimator improves as the sample size increases.

Supplementary Material

NIHMS778184-supplement-sup.pdf (448.9KB, pdf)

Acknowledgments

We thank the Editor, an Associate Editor, two referees, and Professor Huiling Le for valuable suggestions, which helped to improve our presentation greatly. We also thank Dr. Chao Huang and Mr. Yuai Hua for processing the ADNI CC shape data set. This work was supported in part by National Science Foundation grants and National Institute of Health grants.

Footnotes

^‡

http://surfer.nmr.mgh.harvard.edu/

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this paper. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

References

Bhattacharya A, Dunson D. Nonparametric Bayes classification and hypothesis testing on manifolds. Journal of Multivariate Analysis. 2012;111:1–19. doi: 10.1016/j.jmva.2012.02.020.
Bhattacharya A, Dunson DB. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. Biometrika. 2010;97(4):851–865. doi: 10.1093/biomet/asq044.
Bhattacharya R, Patrangenaru V. Large sample theory of intrinsic and extrinsic sample means on manifolds. I. Ann Statist. 2003;31(1):1–29.
Bhattacharya R, Patrangenaru V. Large sample theory of intrinsic and extrinsic sample means on manifolds. II. Ann Statist. 2005;33(3):1225–1259.
Dale AM, Fischl B, Sereno MI. Cortical surface-based analysis: I. segmentation and surface reconstruction. Neuroimage. 1999;9(2):179–194. doi: 10.1006/nimg.1998.0395.
Dryden IL, Mardia KV. Statistical Shape Analysis. Chichester: John Wiley & Sons Ltd; 1998.
Fletcher PT. Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision. 2013;105(2):171–185.
Fletcher PT, Lu C, Pizer S, Joshi S. Principal geodesic analysis for the study of nonlinear statistics of shape. Medical Imaging, IEEE Transactions on. 2004;23(8):995–1005. doi: 10.1109/TMI.2004.831793.
Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–1054.
Healy DMJ, Kim PT. An empirical Bayes approach to directional data and efficient computation on the sphere. Ann Statist. 1996;24(1):232–254.
Helgason S. Differential Geometry, Lie Groups, and Symmetric Spaces, Volume 80 of Pure and Applied Mathematics. New York, NY: Academic Press Inc; 1978.
Huckemann S, Hotz T, Munk A. Intrinsic manova for Riemannian manifolds with an application to Kendall’s space of planar shapes. IEEE Trans Patt Anal Mach Intell. 2010;32:593–603. doi: 10.1109/TPAMI.2009.117.
Kent JT. The Fisher-Bingham distribution on the sphere. J Roy Statist Soc Ser B. 1982;44(1):71–80.
Kim HJ, Adluru N, Collins MD, Chung MK, Bendlin BB, Johnson SC, Davidson RJ, Singh V. Multivariate general linear models (mglm) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. IEEE Annual Conference on Computer Vision and Pattern Recognition; 2014. pp. 2705–2712.
Korsholm L. The GMM estimator versus the semiparametric efficient score estimator under conditional moment restrictions. University of Aarhus, Department of Economics, Building 350; 1999. Working paper series.
Lang S. Fundamentals of Differential Geometry, Volume 191 of Graduate Texts in Mathematics. New York: Springer-Verlag; 1999.
Le H, Barden D. On the measure of the cut locus of a Fréchet mean. Bull Lond Math Soc. 2014;46:698–708.
Liang F, Liu C, Carroll RJ. Advanced Markov Chain Monte Carlo: Learning from Past Samples. New York: Wiley; 2010.
Machado L, Leite FS. Fitting smooth paths on Riemannian manifolds. Int J Appl Math Stat. 2006;4:25–53.
Machado L, Silva Leite F, Krakowski K. Higher-order smoothing splines versus least squares problems on Riemannian manifolds. J Dyn Control Syst. 2010;16(1):121–148.
Mardia KV, Jupp PE. Directional Statistics. Chichester: John Wiley & Sons Ltd; 2000.
McCullagh P, Nelder JA. Generalized Linear Models. 2. London: Chapman and Hall; 1989.
Muralidharan P, Fletcher P. Sasaki metrics for analysis of longitudinal data on manifolds. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on; 2012. pp. 1027–1034.
Newey WK. Econometrics, Volume 11 of Handbook of Statist. Amsterdam: North-Holland; 1993. Efficient estimation of models with conditional moment restrictions; pp. 419–454.
Paul LK, Brown WS, Adolphs R, Tyszka JM, Richards LJ, Mukherjee P, Sherr EH. Agenesis of the corpus callosum: genetic, developmental and functional aspects of connectivity. Nature Reviews Neuroscience. 2007;8:287–299. doi: 10.1038/nrn2107.
Samir C, Absil PA, Srivastava A, Klassen E. A gradient-descent method for curve fitting on Riemannian manifolds. Foundations of Computational Mathematics. 2012;12(1):49–73.
Shi X, Styner M, LJ, Ibrahim JG, Lin W, Zhu H. Intrinsic regression models for manifold-value data. International Conference on Medical Imaging Computing and Computer Assisted Intervention (MICCAI); 2009. pp. 192–199.
Shi X, Zhu H, Ibrahim JG, Liang F, Liberman J, Styner M. Intrinsic regression models for median representation of subcortical structures. Journal of American Statistical Association. 2012;107:12–23. doi: 10.1080/01621459.2011.643710.
Spivak M. A Comprehensive Introduction to Differential Geometry. 2. I. Wilmington, Del: Publish or Perish Inc; 1979.
Su J, Dryden I, Klassen E, Le H, Srivastava A. Fitting smoothing splines to time-indexed, noisy points on nonlinear manifolds. Image and Vision Computing. 2012;30(6–7):428–442.
Witelson SF. Hand and sex differences in isthmus and genu of the human corpus callosum: a postmortem morphological study. Brain. 1989;112:799–835. doi: 10.1093/brain/112.3.799.
Yuan Y, Zhu H, Lin W, Marron JS. Local polynomial regression for symmetric positive definite matrices. Journal of Royal Statistical Society B. 2012;74:697–719. doi: 10.1111/j.1467-9868.2011.01022.x.
Zhu H, Chen Y, Ibrahim JG, Li Y, Hall C, Lin W. Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging. J Amer Statist Assoc. 2009;104(487):1203–1212. doi: 10.1198/jasa.2009.tm08096.

Associated Data

Supplementary Materials

NIHMS778184-supplement-sup.pdf (448.9 KB, PDF)

