Author manuscript; available in PMC: 2016 Apr 22.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2015 Nov 18;9349:719–727. doi: 10.1007/978-3-319-24553-9_88

Nonlinear regression on Riemannian manifolds and its applications to Neuro-image analysis

Monami Banerjee 1, Rudrasis Chakraborty 1, Edward Ofori 2, David Vaillancourt 2, Baba C. Vemuri 1
PMCID: PMC4840251  NIHMSID: NIHMS771122  PMID: 27110601

Abstract

Regression in its most common form, where the independent and dependent variables are in ℝn, is a ubiquitous tool in science and engineering. Recent advances in medical imaging have led to widespread availability of manifold-valued data, giving rise to problems where the independent variables are manifold-valued and the dependent variables are real-valued, or vice versa. The most common method of regression on a manifold is geodesic regression, the counterpart of linear regression in Euclidean space. Often, however, the relation between the variables is highly complex, and geodesic regression can prove inaccurate, making a nonlinear regression model necessary. In this work we present a novel kernel-based nonlinear regression method for the case where the mapping to be estimated is either from ℳ → ℝn or from ℝn → ℳ, where ℳ is a Riemannian manifold. A key advantage of this approach is that the manifold-valued data are not required to inherit an ordering from the data in ℝn. We present several synthetic and real data experiments, along with comparisons to the state-of-the-art geodesic regression method in the literature, that validate the effectiveness of the proposed algorithm.

1 Introduction

Regression is an essential tool of quantitative analysis for finding the relation between independent and dependent variables. Given a training set of both variables, we seek a relation between them. When both variables are in Euclidean space and there is a linear relation between them, i.e., yi = axi + b for a set of {xi, yi}, a common way to solve for the unknowns a and b is the linear least-squares estimator, i.e., minimizing the sum of squared residuals over the training set. In many real applications, however, the relation is seldom linear, and a nonlinear least-squares estimator or a more sophisticated regression tool such as Support Vector Regression [4] must be used.

Often, either the independent or the dependent variables are manifold-valued, i.e., they lie on a smooth Riemannian manifold. In such instances, embedding the manifold-valued variables in Euclidean space (using the Whitney embedding [1]) might result in a poor estimate of the underlying model. Moreover, since a general manifold globally lacks a vector space structure, a linear combination of points on the manifold need not lie on the manifold. For example, if the data points lie in Kendall's shape space [14], an arbitrary linear combination of the shapes will not, in general, yield a point in the shape space. These problems motivate the development of novel regression methods for manifold-valued data. We now briefly review earlier work that addresses this problem.

Related Work

Curve fitting on Riemannian manifolds, where some notion of ordering is imposed on the manifold-valued data, has become quite common in the literature [2, 18, 7, 13, 5, 16]. We present a brief review within the limited space. Samir et al. [18] developed a gradient descent algorithm for time-ordered manifold-valued data using a variational formulation, where the cost function comprises a data fidelity term and a regularization constraint on the curve being sought. This formulation is common to finding smooth approximations of both real-valued and manifold-valued data; what differs between methods is the kind of metric used, and at times the data fidelity terms, each of which can affect the efficiency and/or accuracy of the solution sought.

In the recent past, several researchers have proposed geodesic regression on manifolds [7, 13], as well as non-parametric regression models [2]. The geodesic regression models correspond to linear regression in ℝn. Most recently, a variational spline regression for the manifold of diffeomorphisms was presented in the large deformation diffeomorphic metric mapping (LDDMM) setting [19]. Fletcher [7] proposed geodesic regression to regress manifold-valued data against real-valued variables. Taking cues from [7], the authors of [5] developed a regression technique for points that lie on the unit Hilbert sphere. In [2], the authors estimate the correlation between shape and age using manifold regression. The aforementioned methods deal mostly with a scalar independent variable. A multivariate general linear model was proposed in [16], where, given a dataset, the authors model a functional relation from ℝn to a manifold ℳ. In [15], the authors extend Canonical Correlation Analysis (CCA) to Riemannian manifolds, where both variables are manifold-valued. Hong et al. [12] proposed a shooting spline formulation to regress points on the Grassmann manifold against reals. In [9], Hinkle et al. proposed a polynomial regression method formulated as a variational minimization problem on the manifold using covariant derivatives; the minimization leads to covariant differential equations.

In this paper, we present a nonlinear kernel regression technique to handle both of the following commonly encountered cases: ℝn → ℳ and ℳ → ℝn. We dub our proposed kernel-based regression from ℝn → ℳ the Manifold-Valued Kernel Regression (MVKR). A key advantage of this approach is that the manifold-valued data are not required to inherit an ordering from the multivariate data in ℝn, a necessary requirement in most existing methods. The example in Fig. 1 depicts the gain in accuracy from using the nonlinear regression over the geodesic regression model.

Fig. 1. Examples of nonlinear & geodesic regression.

2 Methodology

Regression is ubiquitous in scientific analysis: given a set of tuples $\{x_i, y_i\}_{i=1}^N \subset X \times Y$, the goal is to find a functional relation between $\{x_i\}_{i=1}^N$ and $\{y_i\}_{i=1}^N$. Here, one variable is the observed data (independent variable) and the other is the response (dependent variable). We propose a kernel interpolation to find the relation between observed data and responses when one of them lies in Euclidean space and the other lies on a Riemannian manifold. Given $\{x_i\}_{i=1}^N \subset \mathbb{R}^n$ and $\{y_i\}_{i=1}^N \subset \mathcal{M}$, we pose the two cases as the following interpolation problems:

  • Manifold-valued independent variable: find a function f : ℳ → ℝn such that xi = f(yi), ∀i.

  • Manifold-valued dependent variable: find a function h : ℝn → ℳ such that yi = h(xi), ∀i.

In both cases, ℳ is a Riemannian manifold equipped with a Riemannian metric g. We address these two problems separately in the following subsections.

2.1 Manifold-valued independent variable

Given $\{x_i, y_i\}_{i=1}^N$ as before, we model the function $\hat{f} : \mathcal{M} \to \mathbb{R}^n$ by minimizing the error function $E = \frac{1}{N}\sum_{i=1}^{N}\|\hat{x}_i - x_i\|^2$, where $\hat{x}_i = \hat{f}(y_i) = \sum_{j=1}^{k}\mathcal{K}(c_j, y_i)\, t_j$. Here $\{c_j\}_{j=1}^k \subset \mathcal{M}$ and $\{t_j\}_{j=1}^k \subset \mathbb{R}^n$ are the representatives on ℳ and ℝn, respectively, and 𝒦 : ℳ × ℳ → ℝ is the kernel function. Thus $\hat{x}_i$, the approximation of $x_i$, is the weighted mean of the $t_j$'s, with weights computed using a suitable kernel function and the representatives $\{c_j\}_{j=1}^k$ on the manifold ℳ. We take $\{c_j\}_{j=1}^k$ to be cluster representatives and learn $\{t_j\}_{j=1}^k$ by minimizing the error function E via steepest descent. The gradient of the objective function with respect to $t_j$ is given by $\nabla_{t_j} E = \frac{2}{N}\sum_{i=1}^{N}(\hat{x}_i - x_i)\,\mathcal{K}(c_j, y_i)$.

Note that since the objective function E is convex in $t_j$, the global minimum can be reached by steepest descent. In a similar fashion, we can initialize $c_j$ to the cluster representatives and update them along the gradient direction. The gradient of the objective function with respect to $c_j$ is given by $\nabla_{c_j} E = \frac{2}{N}\sum_{i=1}^{N}\langle t_j,\, \hat{x}_i - x_i\rangle\, \nabla_{c_j}\mathcal{K}(c_j, y_i)$.

Since any kernel function depends on the underlying metric, if the manifold ℳ has a closed-form expression for the geodesic distance, then so does $\nabla_{c_j}\mathcal{K}(c_j, y_i)$. In this work, we use the kernel $\mathcal{K}(c, y) = \exp\left\{-\frac{b}{2\sigma^2}\, d(c, y)^2\right\}$, where b and σ2 are the kernel parameters and d(·, ·) is the geodesic distance on ℳ. Then $\nabla_{c_j}\mathcal{K}(c_j, y_i) = \frac{b}{\sigma^2}\,\mathcal{K}(c_j, y_i)\,\mathrm{Log}_{c_j} y_i$, where $\mathrm{Log}_{c_j} y_i$ is the Riemannian inverse exponential map. Note that the value of b is tuned to the structure of the dataset: drawing an analogy with the Gaussian kernel on ℝn, we chose a small b for well-clustered data and a large b otherwise. The parameter σ2 is taken to be the variance over the training data.
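As a concrete illustration, here is a minimal sketch of this fitting step on the unit sphere, where the geodesic distance is d(c, y) = arccos⟨c, y⟩. The learning rate, iteration count, and array layout are our hypothetical choices, and the representatives c_j are held fixed at their cluster centers:

```python
import numpy as np

def sphere_dist(c, y):
    """Geodesic distance on the unit sphere: arccos of the inner product."""
    return np.arccos(np.clip(c @ y, -1.0, 1.0))

def kernel(c, y, b=1.0, sigma2=1.0):
    """K(c, y) = exp(-b/(2*sigma^2) * d(c, y)^2), as in Section 2.1."""
    return np.exp(-b / (2.0 * sigma2) * sphere_dist(c, y) ** 2)

def fit_t(Y, X, C, b=1.0, sigma2=1.0, lr=0.1, iters=500):
    """Steepest descent on E = (1/N) * sum_i ||xhat_i - x_i||^2 with
    xhat_i = sum_j K(c_j, y_i) t_j.  E is convex in the t_j, so descent
    with a small enough step reaches the global minimum."""
    N = X.shape[0]
    K = np.array([[kernel(c, y, b, sigma2) for c in C] for y in Y])  # (N, k)
    T = np.zeros((C.shape[0], X.shape[1]))                           # the t_j, stacked
    for _ in range(iters):
        grad = (2.0 / N) * K.T @ (K @ T - X)   # grad of E wrt every t_j at once
        T -= lr * grad
    return T
```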

2.2 Manifold-valued dependent variable

Given $\{x_i, y_i\}_{i=1}^N$ as above, we now model the function ĥ : ℝn → ℳ such that $y_i \approx \hat{h}(x_i)$. As before, let ℳ be equipped with a Riemannian metric g, and let d : ℳ × ℳ → ℝ be the geodesic distance on ℳ, defined by $d(y_i, y_j)^2 = g_{y_i}(\mathrm{Log}_{y_i} y_j,\, \mathrm{Log}_{y_i} y_j)$, where $\mathrm{Log}_{y_i} y_j$ is the inverse exponential map. We can now estimate ĥ by minimizing the error function $E = \frac{1}{N}\sum_{i=1}^{N} d(\hat{y}_i, y_i)^2$, where

$\hat{y}_i = \hat{h}(x_i) = \arg\min_{\mu \in \mathcal{M}} \sum_{j=1}^{k} \mathcal{K}_{Euc}(t_j, x_i)\; d(c_j, \mu)^2$    (1)

Analogous to the manifold-valued independent variable case, here cj ∈ ℳ and tj ∈ ℝn, ∀j, and 𝒦Euc : ℝn × ℝn → ℝ is the kernel function on the Euclidean space. Thus, $\hat{y}_i$ is estimated as the weighted Fréchet mean (FM) [8] of the representatives $\{c_j\}_{j=1}^k$, with weights given by the kernel function, yielding the MVKR. We use $\{t_j\}_{j=1}^k$ as the cluster representatives and estimate $\{c_j\}_{j=1}^k$ using steepest descent on the objective function. The gradient of E with respect to cj is given by

$\nabla_{c_j} E = -\frac{2}{N}\sum_{i=1}^{N} \mathrm{Log}_{\hat{y}_i} y_i \;\nabla_{c_j}\hat{y}_i.$    (2)

Since both cj and ŷi lie on ℳ, we use charts to compute $\nabla_{c_j}\hat{y}_i$. Let ℳ be an m-dimensional manifold, and consider two charts (U, Φ) and (V, Ψ) containing cj and ŷi, respectively. Fixing xi, we can view ŷi as a function F of the cj's. Then $\nabla_{c_j}\hat{y}_i$ can be defined as $\nabla_{c_j}\hat{y}_i := \nabla_{\Phi(c_j)} G$, where $G = \Psi \circ F \circ \Phi^{-1} : \mathbb{R}^m \to \mathbb{R}^m$; this gradient is the Jacobian of G. Note that $\nabla_{c_j} E \in T_{c_j}\mathcal{M}$, so in order for the RHS of Equation (2) to lie in $T_{c_j}\mathcal{M}$, we parallel transport $\mathrm{Log}_{\hat{y}_i} y_i$ from ŷi to cj. For a general Riemannian manifold ℳ, we can approximate this parallel transport as $\Lambda_{c_j}\mathrm{Log}_{\hat{y}_i} y_i \approx \mathrm{Log}_{c_j} y_i - \mathrm{Log}_{c_j}\hat{y}_i$.

Since there is no closed-form solution for the weighted FM of more than two samples on a general Riemannian manifold, the computation of $\nabla_{c_j}\hat{y}_i$, i.e., the Jacobian of G, is not straightforward. Hence, in the spirit of [16, 11], we approximate Equation (1) by $\hat{y}_i \approx \mathrm{Exp}_p\left(\sum_{j=1}^{k}\mathcal{K}_{Euc}(t_j, x_i)\,\mathrm{Log}_p\, c_j\right)$, where p ∈ ℳ is an arbitrary point on ℳ and Exp is the Riemannian exponential map. Without this approximation the problem would become analytically intractable, as jointly estimating the control points and the FM is nontrivial. With this simplification, $\nabla_{c_j}\hat{y}_i = \mathcal{K}_{Euc}(t_j, x_i) \times I_m$, where $I_m$ is the identity matrix of size m. For the case of P(n), we resort to the efficient recursive FM estimator in [10], and we use a similar estimator for $S^n$.
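Below is a minimal sketch of the resulting MVKR prediction step on the unit sphere, using the sphere's Exp/Log maps. The base point p, the Nadaraya-Watson normalization of the kernel weights, and all names are our illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sphere_log(p, q):
    """Inverse exponential (log) map on the unit sphere: tangent at p toward q."""
    inner = np.clip(p @ q, -1.0, 1.0)
    u = q - inner * p                      # component of q orthogonal to p
    nu = np.linalg.norm(u)
    return np.zeros_like(p) if nu < 1e-12 else np.arccos(inner) * u / nu

def sphere_exp(p, v):
    """Exponential map on the unit sphere: walk from p along tangent vector v."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * (v / nv)

def mvkr_predict(x, T, C, p, sigma2=1.0):
    """Approximate weighted FM: yhat = Exp_p(sum_j w_j Log_p(c_j)), with
    normalized Gaussian (Nadaraya-Watson) weights from the Euclidean kernel."""
    w = np.exp(-np.sum((T - x) ** 2, axis=1) / (2.0 * sigma2))
    w /= w.sum()
    v = np.sum([wj * sphere_log(p, cj) for wj, cj in zip(w, C)], axis=0)
    return sphere_exp(p, v)
```

Under the same simplification, a steepest descent step on c_j would move it along (Log_{c_j} y_i − Log_{c_j} ŷ_i) scaled by 𝒦Euc(t_j, x_i), per Equation (2) and the transport approximation above.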

3 Experimental Results

We now evaluate the performance of the proposed regression method on both synthetic and real datasets. In the following two subsections, we experimentally show the effectiveness of our method in (1) regressing real vector-valued dependent variables against manifold-valued independent variables and (2) regressing manifold-valued dependent variables against real vector-valued independent variables. To quantify the performance of our ℝn-to-manifold regression, we use the R² statistical measure and the p-value. The R² measure on a manifold is defined in [7] and repeated here for convenience. Let $\{y_i\}_{i=1}^N$ be the manifold-valued data with corresponding predicted values $\{\hat{y}_i\}_{i=1}^N$, and let the unexplained variance be $\sum_{i=1}^{N} d(y_i, \hat{y}_i)^2$. Then the R² statistic is defined as $R^2 = 1 - \frac{\text{unexplained variance}}{\text{data variance}}$, where the data variance is the sum of squared geodesic distances of the $y_i$ from their Fréchet mean. The R² statistic is at most one; a value close to one in general denotes better regression performance, while a negative value is possible and indicates a fit worse than the trivial Fréchet-mean predictor (see Section 3.2). We use a t-test over 30 independent runs to reject the null hypothesis H0 (the mean of the unexplained variance is not less than the mean of the data variance) at a significance level of 0.001. For the manifold-to-ℝn regression, we present an application to classification on a Parkinson's dataset and report the average classification accuracy over 30 runs.
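For clarity, here is a small sketch of this manifold R² computation, written against generic dist/log/exp callables (e.g., the sphere maps sketched in Section 2.2). The fixed-point FM iteration is one simple choice, not necessarily the recursive estimator of [10]:

```python
import numpy as np

def frechet_mean(Y, log, exp, iters=50):
    """Fixed-point iteration: repeatedly step to the mean of the log vectors."""
    mu = Y[0]
    for _ in range(iters):
        v = np.mean([log(mu, y) for y in Y], axis=0)
        mu = exp(mu, v)
    return mu

def r2_statistic(Y, Yhat, dist, log, exp):
    """Manifold R^2 = 1 - (sum_i d(y_i, yhat_i)^2) / (sum_i d(y_i, mu)^2)."""
    mu = frechet_mean(Y, log, exp)
    unexplained = sum(dist(y, yh) ** 2 for y, yh in zip(Y, Yhat))
    data_var = sum(dist(y, mu) ** 2 for y in Y)
    return 1.0 - unexplained / data_var
```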

3.1 Manifold-valued independent variable

In this section, we present results of our regression scheme applied to the classification of MR T2 brain scans obtained from (1) controls (CON), (2) patients with essential tremor (ET), and (3) patients with Parkinson's disease (PD). We aim to automatically discriminate between these three classes using features derived from the data.

In [20], the authors used a DTI-based analysis, specifically scalar-valued features, to address the problem of movement disorder classification. In this section, we use the shape of the Substantia Nigra across the input population as our key discriminatory feature. Sample Substantia Nigra shapes for the three classes are shown in Fig. 2. The shapes of interest are first segmented and then converted into a probability density function. Using the square-root density parameterization, each shape can then be represented as a point on the unit Hilbert sphere via the Schrödinger Distance Transform (SDT) [3].

Fig. 2. Examples of Substantia Nigra.

The key feature used in our classification of the aforementioned disease classes is the shape of the Substantia Nigra. The Substantia Nigra was hand-segmented from all rigidly pre-aligned datasets, consisting of 25 control, 15 ET, and 24 PD images. The T2 brain scans were acquired on a 3T Philips MR scanner with the following parameters: TR = 774 ms, TE = 86 ms, and voxel size = 2 × 2 × 2 mm³.

We first collected random point samples on the boundary of each 3-D Substantia Nigra shape and applied the SDT to represent each shape as a point on the unit hypersphere. The ROI for the 3-D shape of interest was of size 28 × 28 × 15 mm³, resulting in 11760-dimensional unit vectors from the SDT (28 × 28 × 15 = 11760). The samples therefore lie on the manifold $S^{11759}$.

We randomly selected 10 control, 10 PD, and 5 ET images as the test set and used the rest of the data for training. The details of our classification method are as follows. First, we regress the dependent variable against the independent variable on $S^{11759}$. To make the dependent variable lie in [0, 1], we apply the logistic function ℒ to the regressed value f(y). We then classify a point y as belonging to class-1 if ℒ(f(y)) < 0.5, and to class-2 otherwise. The classification task is repeated 30 times using various randomly chosen training sets, and the average accuracy is reported. The results are shown in Table 1, where we compare our method with standard PCA and Principal Geodesic Analysis (PGA) [6].
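A minimal sketch of this decision rule, assuming the regressed scalar f(y) is given (e.g., from the fit sketched in Section 2.1):

```python
import numpy as np

def logistic(z):
    """Squashes the regressed scalar into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(f_of_y, threshold=0.5):
    """Class-1 if L(f(y)) < threshold, class-2 otherwise, as described above."""
    return 1 if logistic(f_of_y) < threshold else 2
```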

Table 1. Results based on Substantia Nigra shape (classification accuracy, %)

              Control vs. ET             Control vs. PD             PD vs. ET
            Proposed  PGA    PCA      Proposed  PGA    PCA      Proposed  PGA    PCA
Accuracy     100.00   90.14  75.69     95.26    92.95  67.32     85.71    87.58  64.60

The results show that our proposed method outperforms the other two in classifying controls versus ET and controls versus PD. For the PD vs. ET classification, our method gives slightly lower accuracy than PGA.

3.2 Manifold-valued dependent variable

In this section, we apply our MVKR method to synthetic and real datasets. In all of these experiments, we compare with the recently proposed MGLM method [16] and the manifold kernel regression estimator (MKRE) [2]. Since MVKR and MKRE both use the same Nadaraya-Watson kernel, we used the same choice of parameters for both methods.

Synthetic Data Experiment

For this experiment, we synthesized a dataset $\{x_i, y_i\}_{i=1}^{500} \subset \mathbb{R}^2 \times S^2$ by defining a function h : xi = [θi, ϕi] ↦ yi as follows: h([θi, ϕi]) := (cos(θi) cos(ϕi), cos(θi) sin(ϕi), sin(θi)), where θi ∈ [0, π/2), ϕi ∈ [0, 2π], ∀i. Thus all the yi lie on the northern hemisphere of the 2-sphere, so the FM is uniquely defined. We partitioned this dataset into 90% for training and 10% for testing. The p-value and the average R² statistic over 30 runs are reported in Table 2. From these results, we can clearly see that our MVKR method performs better than MGLM [16] and gives comparable performance to MKRE [2].

Table 2. Synthetic data results

              MVKR      MGLM      MKRE
Train Error   0.00      0.60      0.07
Test Error    0.00      0.61      0.07
R² Stat.      1.00      0.29      0.92
p-value      < 0.001   < 0.001   < 0.001
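For concreteness, a sketch of generating this synthetic dataset; the random seed and the uniform sampling of (θ, ϕ) are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)           # hypothetical seed
N = 500
theta = rng.uniform(0.0, np.pi / 2, N)   # theta in [0, pi/2): northern hemisphere
phi = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.stack([theta, phi], axis=1)       # independent variables in R^2
Y = np.stack([np.cos(theta) * np.cos(phi),
              np.cos(theta) * np.sin(phi),
              np.sin(theta)], axis=1)    # h([theta, phi]): unit vectors on S^2
idx = rng.permutation(N)                 # 90%/10% train/test split, as in the text
train_idx, test_idx = idx[:450], idx[450:]
```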

OASIS dataset [17]

We used the publicly available OASIS data [17] to regress manifold-valued data against reals. This dataset consists of T1 MR brain scans of subjects aged 18 to 96, including individuals with early-stage Alzheimer's disease (AD).

We randomly chose 4 brain scans from each decade in the 18–96 age range, totalling 36 brain images, of which 32 were randomly chosen for training and the rest were used as the test set. Corpus callosum (CC) shapes of individuals of varying ages are shown in Fig. 3. We seek to model the relationship between age and the shape of the CC, captured using three different data representations constructed from each of the 36 brain images as follows. (1) We segmented the CC from the brain images, took the boundary of the CC, and mapped it to $S^{24575}$ using the SDT [3]. (2) After segmenting the CC, we placed a set of landmark points on the boundary and mapped each point set into Kendall's shape space [14], which is a complex projective space. (3) We took the whole brain image, computed the normalized intensity histogram, and used the square root of the normalized histogram to map each image onto $S^{255}$.
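Representation (3) is easy to sketch: the normalized histogram sums to one, so its element-wise square root is a unit vector, i.e., a point on $S^{255}$ for 256 bins (the bin count and intensity range below are our assumptions):

```python
import numpy as np

def sqrt_histogram(image, bins=256, value_range=(0.0, 1.0)):
    """Map an image to S^{bins-1}: normalized histogram, then element-wise
    square root, so the squared entries sum to one (a unit vector)."""
    h, _ = np.histogram(image.ravel(), bins=bins, range=value_range)
    h = h.astype(float) / h.sum()
    return np.sqrt(h)
```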

Fig. 3. Corpus callosum shapes.

The average R² statistics over 30 runs on each of these three representations of the chosen OASIS data are given in Table 3. From the table, it is evident that the performance of MVKR is significantly better than that of the MGLM method. It should be noted that the R² statistic achieved by MVKR is not very high (not close to 1); however, since we are regressing the manifold-valued data against age alone, the underlying relation is highly nonlinear.

Table 3. Results on the OASIS dataset

            Dataset using SDT          Kendall's shape space      Dataset using histogram
           MVKR    MGLM    MKRE       MVKR    MGLM     MKRE      MVKR    MGLM    MKRE
R² Stat.   0.49    0.05    0.46       0.35    −0.27    0.33      0.48    −0.18   0.40
p-value   < 0.001 < 0.001 < 0.001    < 0.001  > 0.001 < 0.001   < 0.001 > 0.001 < 0.001

Hence, it is not possible to fully capture the relation based on an individual's age alone. Moreover, the brain images were chosen randomly without considering gender, educational background, or symptoms of AD, all of which makes the relation between age and the shape of the CC very complex. Given these confounding parameters that could influence the structure, the R² statistics for MVKR indicate significantly good performance. Note that for the second and third variants of this dataset, MGLM yields a negative R² statistic. From the definition of the R² statistic, a negative value indicates that the regressor performed worse than the most trivial predictor, which outputs the FM of the dataset for every test point x (i.e., it ignores the independent variable).

Thus, MGLM's unsatisfactory performance on these datasets indicates that a linear regressor is inadequate for this problem and motivates the use of a nonlinear regression technique such as the one presented here. The p-values reported in Table 3 indicate the high statistical significance, and hence the superior performance, of our MVKR method. In comparison to MKRE, the performance of MVKR is consistently better, though not by a significant amount.

In summary, since in most real cases the manifold-valued data do not lie close to a geodesic, the performance of MGLM is not comparable to that of MVKR. This is because MGLM assumes that the data lie close to a geodesic, whereas MVKR requires no such assumption. When the data do lie on or close to a geodesic, MVKR and MGLM perform comparably, as the following toy example shows. In this example, we used the synthetic data on P(3), the space of 3 × 3 symmetric positive definite matrices, from [16]; the R² statistic values for MGLM and MVKR are 0.98 and 0.99, respectively. We would also like to point out that although MVKR does not significantly outperform MKRE, MVKR is applicable to ℝn-to-ℳ regression and vice versa, whereas the method in [2] is applicable only to ℝ-to-ℳ regression.
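For readers who wish to reproduce the P(3) toy example, below is a sketch of the standard affine-invariant Exp/Log maps and geodesic distance on P(n) that MVKR requires; this is our formulation of the standard maps, and the data generation of [16] is not reproduced here:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm

def spd_log(A, B):
    """Affine-invariant log map on P(n): Log_A(B) = A^{1/2} logm(A^{-1/2} B A^{-1/2}) A^{1/2}."""
    As = np.real(sqrtm(A))             # real part guards against tiny imaginary noise
    Ais = np.linalg.inv(As)
    return As @ np.real(logm(Ais @ B @ Ais)) @ As

def spd_exp(A, V):
    """Affine-invariant exp map on P(n): Exp_A(V) = A^{1/2} expm(A^{-1/2} V A^{-1/2}) A^{1/2}."""
    As = np.real(sqrtm(A))
    Ais = np.linalg.inv(As)
    return As @ np.real(expm(Ais @ V @ Ais)) @ As

def spd_dist(A, B):
    """Geodesic distance on P(n): Frobenius norm of logm(A^{-1/2} B A^{-1/2})."""
    Ais = np.linalg.inv(np.real(sqrtm(A)))
    return np.linalg.norm(np.real(logm(Ais @ B @ Ais)), 'fro')
```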

4 Conclusions

In this paper, we presented a novel nonlinear regression technique for estimating the functional relationship between manifold-valued independent variables and ℝn-valued dependent variables, and vice versa. Earlier work in this area relied on geodesic regression, which is ill-suited to many situations involving complex relationships between the aforementioned independent and dependent variables. Our method is kernel-based, and we presented several experiments demonstrating its performance in comparison to the state-of-the-art MGLM method on a variety of datasets. The results show that our method yields superior performance in both applications, namely the classification of movement disorders and the discovery of a correlation between age and CC shape in subjects from the OASIS database.

Footnotes

This research was funded in part by the NIH grant NS066340 to BCV.

Contributor Information

Monami Banerjee, Email: monami@cise.ufl.edu.

Rudrasis Chakraborty, Email: rudrasis@cise.ufl.edu.

Edward Ofori, Email: eofori@ufl.edu.

David Vaillancourt, Email: vcourt@ufl.edu.

Baba C. Vemuri, Email: vemuri@cise.ufl.edu.

References

1. Adachi M, Hudson K. Embeddings and Immersions. American Mathematical Society; 2012.
2. Davis BC, Fletcher PT, et al. Population shape regression from random design data. IEEE ICCV. 2007:1–7.
3. Deng Y, Rangarajan A, et al. A Riemannian framework for matching point clouds represented by the Schrödinger distance transform. IEEE CVPR. 2014:3756–3761. doi:10.1109/CVPR.2014.486.
4. Drucker H, Burges CJ, et al. Support vector regression machines. NIPS. 1997:155–161.
5. Du J, Goh A, et al. Geodesic regression on orientation distribution functions with its application to an aging study. NeuroImage. 2014:416–426. doi:10.1016/j.neuroimage.2013.06.081.
6. Fletcher PT, Lu C, et al. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE TMI. 2004:995–1005. doi:10.1109/TMI.2004.831793.
7. Fletcher PT. Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision. 2013:171–185.
8. Fréchet M. Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'institut Henri Poincaré. 1948:215–310.
9. Hinkle J, Muralidharan P, et al. Polynomial regression on Riemannian manifolds. ECCV. 2012:1–14.
10. Ho J, Cheng G, et al. Recursive Karcher expectation estimators and geometric law of large numbers. AISTATS. 2013:325–332.
11. Ho J, Xie Y, et al. On a nonlinear generalization of sparse coding and dictionary learning. ICML. 2013:1480–1488.
12. Hong Y, Kwitt R, et al. Geodesic regression on the Grassmannian. ECCV. 2014:632–646.
13. Hong Y, Singh N, et al. Time-warped geodesic regression. MICCAI. 2014:105–112. doi:10.1007/978-3-319-10470-6_14.
14. Kendall D. A survey of the statistical theory of shape. Statistical Science. 1989:87–99.
15. Kim HJ, Adluru N, et al. Canonical correlation analysis on Riemannian manifolds and its applications. ECCV. 2014:251–267. doi:10.1007/978-3-319-10605-2_17.
16. Kim HJ, Bendlin BB, et al. MGLM on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. IEEE CVPR. 2014:2705–2712. doi:10.1109/CVPR.2014.352.
17. Marcus DS, Wang TH, et al. OASIS: Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience. 2007:1498–1507. doi:10.1162/jocn.2007.19.9.1498.
18. Samir C, Absil PA, et al. A gradient-descent method for curve fitting on Riemannian manifolds. Foundations of Computational Mathematics. 2012:49–73.
19. Singh N, Niethammer M. Splines for diffeomorphic image regression. MICCAI. 2014:121–129. doi:10.1007/978-3-319-10470-6_16.
20. Vaillancourt D, Spraker M, et al. High-resolution diffusion tensor imaging in the substantia nigra of de novo Parkinson disease. Neurology. 2009:1378–1384. doi:10.1212/01.wnl.0000340982.01727.6e.
