Nested Grassmannians for Dimensionality Reduction with Applications

Chun-Hao Yang; Baba C Vemuri

Author manuscript; available in PMC: 2023 Feb 18.

Published in final edited form as: J Mach Learn Biomed Imaging. 2022 Mar;2022:002.


PMCID: PMC9938729 NIHMSID: NIHMS1871482 PMID: 36818740

Abstract

In the recent past, nested structures in Riemannian manifolds, for example the principal nested spheres, have been studied in the context of dimensionality reduction as an alternative to the popular principal geodesic analysis (PGA) technique. In this paper, we propose a novel framework for constructing a nested sequence of homogeneous Riemannian manifolds. Common examples of homogeneous Riemannian manifolds include the n-sphere, the Stiefel manifold, the Grassmann manifold and many others. In particular, we focus on applying the proposed framework to the Grassmann manifold, giving rise to the nested Grassmannians (NG). An important application in which Grassmann manifolds are encountered is planar shape analysis. Specifically, each planar (2D) shape can be represented as a point in the complex projective space, which is a complex Grassmann manifold. Some salient features of our framework are: (i) it explicitly exploits the geometry of the homogeneous Riemannian manifolds and (ii) the nested lower-dimensional submanifolds need not be geodesic. With the proposed NG structure, we develop algorithms for the supervised and unsupervised dimensionality reduction problems. The proposed algorithms are compared with PGA via simulation studies and real data experiments and are shown to achieve a higher ratio of expressed variance than PGA.

Keywords: Grassmann Manifolds, Dimensionality Reduction, Shape Analysis, Homogeneous Riemannian Manifolds

1. Introduction

Riemannian manifolds are often used to model the sample space in which features derived from the raw data encountered in many medical imaging applications live. Common examples include the diffusion tensors (DTs) in diffusion tensor imaging (DTI) (Basser et al., 1994) and the ensemble average propagator (EAP) (Callaghan, 1993). Both DTs and EAP are used to capture the diffusional properties of water molecules in the central nervous system by non-invasively imaging the tissue via diffusion weighted magnetic resonance imaging. In DTI, diffusion at a voxel is captured by a DT, which is a 3 × 3 symmetric positive-definite matrix, whereas EAP is a probability distribution characterizing the local diffusion at a voxel, which can be parametrized as a point on the Hilbert sphere. Another example is the shape space used to represent shapes in shape analysis. There are many ways to represent a shape, and the simplest one is to use landmarks. For the landmark-based representation, the shape space is called Kendall’s shape space (Kendall, 1984). Kendall’s shape space is in general a stratified space (Goresky and MacPherson, 1988; Feragen et al., 2014), but for the special case of planar shapes, the shape space is the complex projective space, which is a complex Grassmann manifold. The examples mentioned above are often high-dimensional: a DTI scan usually contains half a million DTs; the shape of the Corpus Callosum (which is used in our experiments) is represented by several hundred boundary points in $ℝ^{2}$. Thus, in these cases, dimension reduction techniques, if applied appropriately, can benefit the subsequent statistical analysis.

For data on Riemannian manifolds, the most widely used dimensionality reduction method is the principal geodesic analysis (PGA) (Fletcher et al., 2004), which generalizes the principal component analysis (PCA) to manifold-valued data. In fact, there are many variants of PGA. Fletcher et al. (2004) proposed to find the geodesic submanifold of a certain dimension that maximizes the projected variance; computationally, this was achieved via a linear approximation, i.e., applying PCA on the tangent space at the intrinsic mean. This is sometimes referred to as the tangent PCA. Note that this approximation requires the data to be clustered around the intrinsic mean, otherwise the tangent space approximation to the manifold leads to inaccuracies. Later on, Sommer et al. (2010) proposed the Exact PGA (EPGA), which does not use any linear approximation. However, EPGA is computationally expensive as it requires two non-linear optimization steps per iteration (projection to the geodesic submanifold and finding the new geodesic direction such that the loss of information is minimized). Chakraborty et al. (2016) partially solved this problem for manifolds with constant sectional curvature (spheres and hyperbolic spaces) by deriving closed form formulae for the projection. Other variants of PGA include but are not limited to sparse exact PGA (Banerjee et al., 2017), geodesic PCA (Huckemann et al., 2010), and probabilistic PGA (Zhang and Fletcher, 2013). All these methods focus on projecting data to a geodesic submanifold as in PCA where one projects data to a vector subspace. Instead, one can also project data to a submanifold that minimizes the reconstruction error without any further restrictions, e.g. being geodesic. This is the generalization of the principal curve (Hastie and Stuetzle, 1989) to Riemannian manifolds presented in Hauberg (2016).

Another feature of PCA is that it produces a sequence of nested vector subspaces. From this observation, Jung et al. (2012) proposed the principal nested spheres (PNS) by embedding an (n − 1)-sphere into an n-sphere, and the embedding is not necessarily isometric. Hence PNS is more general than PGA in that PNS is not required to be geodesic. Similarly, for the manifold P_n of n × n SPD matrices, Harandi et al. (2018) proposed a geometry-aware dimension reduction by projecting data in P_n to P_m for some m ≪ n. They also applied the nested P_n to the supervised dimensionality reduction problem. Damon and Marron (2014) considered a nested sequence of relations which determine a nested sequence of submanifolds that are not necessarily geodesic. They showed various examples, including Euclidean space and the n-sphere, depicting how the nested relations generalized PCA and PNS. However, for an arbitrary Riemannian manifold, it is not clear how to construct a nested submanifold. Another generalization of PGA was proposed by Pennec et al. (2018), called the exponential barycentric subspace (EBS). A k-dimensional EBS is defined as the locus of weighted exponential barycenters of (k + 1) affinely independent reference points. The EBSs are naturally nested by removing or adding reference points.

Unlike PGA, which can be applied to any Riemannian manifold, the construction of the nested manifolds relies heavily on the geometry of the specific manifold, and there is no general principle for such a construction. All the examples described above (spheres and P_n) and several others such as the Grassmannian, Stiefel etc. are homogeneous Riemannian manifolds (Helgason, 1979), which intuitively means that all points on the manifold ‘look’ the same. In this work, we propose a general framework for constructing a nested sequence of homogeneous Riemannian manifolds, and, via some simple algebraic computation, show that the nested sphere and the nested P_n can indeed be derived within this framework. We will then apply this framework to the Grassmann manifolds, yielding the nested Grassmann manifolds (NG). The Grassmann manifold $Gr (p, V)$ is the manifold of all p-dimensional subspaces of the vector space $V$ where $1 \leq p \leq dim V$. Usually $V = ℝ^{n}$ or $V = ℂ^{n}$. An important example is Kendall’s shape space of 2D shapes. The space of all shapes determined by k landmarks in $ℝ^{2}$ is denoted by $Σ_{2}^{k}$, and Kendall (1984) showed that it is isomorphic to the complex projective space $ℂ P^{k - 2} ≅ Gr (1, ℂ^{k - 1})$. In many applications, the number k of landmarks is large, and so is the dimension of $Gr (1, ℂ^{k - 1})$. The core of the proposed dimensionality reduction involves projecting data on $Gr (p, V)$ to $Gr (p, \tilde{V})$ with $dim \tilde{V} ≪ dim V$. The main contributions of this work are as follows: (i) We propose a general framework for constructing a nested sequence of homogeneous Riemannian manifolds unifying the recently proposed nested spheres (Jung et al., 2012) and nested SPD manifolds (Harandi et al., 2018). (ii) We present novel dimensionality reduction techniques based on the concept of NG in both supervised and unsupervised settings. (iii) We demonstrate via several simulation studies and real data experiments that the proposed NG can achieve a higher ratio of expressed variance compared to PGA.

The rest of the paper is organized as follows. In Section 2, we briefly review the definition of homogeneous Riemannian manifolds and present the recipe for the construction of nested homogeneous Riemannian manifolds. In Section 3, we first review the geometry of the Grassmannian. By applying the procedure developed in Section 2, we present the nested Grassmann manifolds representation and discuss some of its properties in detail. Then we describe algorithms for our unsupervised and supervised dimensionality reduction techniques, called the Principal Nested Grassmanns (PNG), in Section 4. In Section 5, we present several simulation studies and experimental results showing how the PNG technique performs compared to PGA under different settings. Finally, we draw conclusions in Section 6.

2. Nested Homogeneous Spaces

In this section, we introduce the structure of nested homogeneous Riemannian manifolds. A Riemannian manifold (M, τ) is homogeneous if the group of isometries G = I(M) admitted by the manifold acts transitively on M (Helgason, 1979), i.e., for x, y ∈ M, there exists g ∈ G such that g(x) = y. In this case, M can be identified with G/H where H is an isotropy subgroup of G at some point p ∈ M i.e. H = {g ∈ G : g(p) = p}. Examples of homogeneous Riemannian manifolds include but are not limited to, the n-spheres Sⁿ⁻¹ = SO(n)/SO(n – 1), the SPD manifolds P_n = GL(n)/O(n), the Stiefel manifolds St(m, n) = SO(n)/SO(n − m), and the Grassmann manifolds Gr(p, n) = SO(n)/S(O(p) × O(n − p)).

In this paper, we focus on the case where G is either a real or a complex matrix Lie group, i.e. G is a subgroup of $GL (n, ℝ)$ or $GL (n, ℂ)$ . The main idea behind the construction of nested homogeneous spaces is simple: augmenting the matrix in G in an appropriate way. With an embedding of the isometry group G, the embedding of the homogeneous space G/H follows naturally from the quotient structure.

Let G and $\tilde{G}$ be two connected Lie groups such that $dim G < dim \tilde{G}$ and $\tilde{ι} : G \to \tilde{G}$ be an embedding. For a closed connected subgroup H of G, let $\tilde{H} = \tilde{ι} (H)$ . Since $\tilde{ι}$ is an embedding, $\tilde{H}$ is also a closed subgroup of $\tilde{G}$ . Now the canonical embedding of G/H in $\tilde{G} / \tilde{H}$ is defined by $ι (g H) = \tilde{ι} (g) \tilde{H}$ for g ∈ G. It is easy to see that ι is well-defined. Let g₁, g₂ ∈ G be such that g₁ = g₂h for some h ∈ H. Then

ι (g_{1} H) = \tilde{ι} (g_{1}) \tilde{H} = \tilde{ι} (g_{2} h) \tilde{H} = \tilde{ι} (g_{2}) \tilde{ι} (h) \tilde{H} = \tilde{ι} (g_{2}) \tilde{H} = ι (g_{2} H) .

Now for the homogeneous Riemannian manifolds (M = G/H, τ₁) and $(\tilde{M} = \tilde{G} / \tilde{H}, τ_{2})$ , denote the left-G-invariant, right-H-invariant metric on G (resp. left- $\tilde{G}$ -invariant, right- $\tilde{H}$ -invariant metric on $\tilde{G}$ ) by ${\bar{τ}}_{1}$ and ${\bar{τ}}_{2}$ , respectively (see Cheeger and Ebin (1975, Prop. 3.16(4))).

Proposition 1 If $\tilde{ι} : G \to \tilde{G}$ is isometric, then so is $ι : G / H \to \tilde{G} / \tilde{H}$ .

Proof Denote the Riemannian submersions by π : G → G/H and $\tilde{π} : \tilde{G} \to \tilde{G} / \tilde{H}$. Let X and Y be vector fields on G/H and $\bar{X}$ and $\bar{Y}$ be their horizontal lifts respectively, i.e. $d π (\bar{X}) = X$ and $d π (\bar{Y}) = Y$. By the definition of Riemannian submersions, dπ is isometric on the horizontal spaces, i.e. ${\bar{τ}}_{1} (\bar{X}, \bar{Y}) = τ_{1} (d π (\bar{X}), d π (\bar{Y})) = τ_{1} (X, Y)$. Since $\tilde{ι}$ is isometric, we have ${\bar{τ}}_{1} (\bar{X}, \bar{Y}) = {\bar{τ}}_{2} (d \tilde{ι} (\bar{X}), d \tilde{ι} (\bar{Y}))$. By the definition of ι, we also have $ι \circ π = \tilde{π} \circ \tilde{ι}$, which implies $d ι \circ d π = d \tilde{π} \circ d \tilde{ι}$. Hence,

τ_{1} (X, Y) = {\bar{τ}}_{1} (\bar{X}, \bar{Y}) = {\bar{τ}}_{2} (d \tilde{ι} (\bar{X}), d \tilde{ι} (\bar{Y})) = τ_{2} ((d \tilde{π} \circ d \tilde{ι}) (\bar{X}), (d \tilde{π} \circ d \tilde{ι}) (\bar{Y})) = τ_{2} ((d ι \circ d π) (\bar{X}), (d ι \circ d π) (\bar{Y})) = τ_{2} (d ι (X), d ι (Y))

where the third equality follows from the isometry of $d \tilde{π}$ . ■

Proposition 1 simply says that if the isometry group is isometrically embedded, then the associated homogeneous Riemannian manifolds will also be isometrically embedded. Conversely, if we have a Riemannian submersion $\tilde{f} : \tilde{G} \to G$, it can easily be shown that the induced map $f : \tilde{G} / \tilde{H} \to G / H$ would also be a Riemannian submersion where $H = \tilde{f} (\tilde{H})$. The construction above can be applied to a sequence of homogeneous spaces $\{M_{m}\}_{m = 1}^{\infty}$, i.e. the embedding ι_m : M_m → M_{m+1} can be induced from the embedding of the isometry groups ${\tilde{ι}}_{m} : G_{m} \to G_{m + 1}$ where G_m = I(M_m), provided dim G_i < dim G_j for i < j. See Figure 1 for the structure of nested homogeneous spaces.

Figure 1: Commutative diagram of the induced embedding for homogeneous spaces.

3. Nested Grassmann Manifolds

In this section, we will apply the theory of nested homogeneous spaces from the previous section to the Grassmann manifolds. First, we briefly review the geometry of the Grassmann manifolds in Section 3.1. With the theory in Section 2, we derive the nested Grassmann manifolds in Section 3.2, and the derivations for nested spheres and nested SPD manifolds are carried out in Section 3.3.

3.1. The Riemannian Geometry of Grassmann Manifolds

To simplify the notation, we assume $V = ℝ^{n}$ and write $Gr (p, n) ≔ Gr (p, ℝ^{n})$. All the results presented in this section can be easily extended to the case of $V = ℂ^{n}$ by replacing transposition with conjugate transposition and orthogonal groups with unitary groups. The Grassmann manifold Gr(p, n) is the manifold of all p-dimensional subspaces of $ℝ^{n}$, and for a subspace 𝒳 ∈ Gr(p, n), we write 𝒳 = span(X) where the columns of X form an orthonormal basis for 𝒳. The space of all n × p matrices X such that X^T X = I_p is called the Stiefel manifold, denoted by St(p, n). Special cases of Stiefel manifolds are the Lie group of all orthogonal matrices, O(n) = St(n, n), and the n-sphere, Sⁿ⁻¹ = St(1, n). The Stiefel manifold with the induced Euclidean metric (i.e. for U, V ∈ T_X St(p, n), 〈U, V〉_X = tr(U^T V)) is a homogeneous Riemannian manifold, St(p, n) = SO(n)/SO(n − p). A canonical Riemannian metric on the Grassmann manifold can be inherited from the metric on St(p, n) since it is invariant to the left multiplication by elements of O(n) (Absil et al., 2004; Edelman et al., 1998). The Grassmann manifold with this metric is also homogeneous, Gr(p, n) = SO(n)/S(O(p) × O(n − p)).

With this canonical metric on the Grassmann manifolds, the geodesic can be expressed in closed form. Let 𝒳 = span(X) ∈ Gr(p, n) where X ∈ St(p, n) and H be an n × p matrix. Then the geodesic γ(t) with γ(0) = 𝒳 and γ′(0) = H is given by γ_𝒳,H(t) = span(XV cos Σt + U sin Σt) where H = UΣV^T is the compact singular value decomposition (Edelman et al., 1998, Theorem 2.3). The exponential map at 𝒳 is a map from T_𝒳 Gr(p, n) to Gr(p, n) defined by Exp_𝒳 H = γ_𝒳,H(1) = span(XV cos Σ + U sin Σ). If X^T Y is invertible, the geodesic distance between 𝒳 = span(X) and 𝒴 = span(Y) is given by $d_{g}^{2} (𝒳, 𝒴) = tr Θ^{2} = \sum_{i = 1}^{p} θ_{i}^{2}$ where (I − XX^T)Y(X^T Y)⁻¹ = UΣV^T, U ∈ St(p, n), V ∈ O(p), and Θ = tan⁻¹ Σ. The diagonal entries θ₁, …, θ_p of Θ are known as the principal angles between 𝒳 and 𝒴.
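The closed-form expressions above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, dimensions and random seed are illustrative assumptions, the tangent matrix H is assumed to satisfy X^T H = 0, and the principal angles are computed from the singular values of X^T Y (a standard equivalent of the tan⁻¹ formula in the text).

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y) for orthonormal X, Y (n x p)."""
    cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def geodesic_distance(X, Y):
    """d_g = sqrt(sum_i theta_i^2)."""
    return np.linalg.norm(principal_angles(X, Y))

def grassmann_exp(X, H):
    """Basis of Exp_X(H) = span(X V cos(S) + U sin(S)), with H = U S V^T (compact SVD)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return X @ Vt.T @ np.diag(np.cos(s)) + U @ np.diag(np.sin(s))

rng = np.random.default_rng(0)
n, p = 6, 2                                                # illustrative sizes
X, _ = np.linalg.qr(rng.standard_normal((n, p)))           # orthonormal basis of a point
H = (np.eye(n) - X @ X.T) @ rng.standard_normal((n, p))    # enforce X^T H = 0
H *= 0.5 / np.linalg.norm(H, 2)                            # largest singular value is 0.5
Y = grassmann_exp(X, H)

# For singular values below pi/2, the distance travelled equals ||H||_F.
print(geodesic_distance(X, Y), np.linalg.norm(H))
```

Rescaling H keeps its singular values below π/2, so they coincide with the principal angles between span(X) and span(Y).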

3.2. Embedding of Gr(p, m) in Gr(p, n)

Let 𝒳 = span(X) ∈ Gr(p, m), X ∈ St(p, m). The map ι : Gr(p, m) → Gr(p, n), for m < n, defined by

ι (𝒳) = span ([\begin{matrix} X \\ 0_{(n - m) \times p} \end{matrix}])

is an embedding and it is easy to check that this embedding is isometric (Ye and Lim, 2016, Eq. (8)). However, for the dimensionality reduction problem, the above embedding is insufficient as it is not flexible enough to encompass other possible embeddings. To design flexible embeddings, we apply the framework proposed in Section 2. We consider M_m = Gr(p, m) for which the isometry groups are G_m = SO(m) and H_m = S(O(p) × O(m − p)).

In this paper, we consider the embedding ${\tilde{ι}}_{m} : SO (m) \to SO (m + 1)$ given by,

{\tilde{ι}}_{m} (O) = GS (R [\begin{matrix} O & a \\ b^{T} & c \end{matrix}]) \qquad (1)

where O ∈ SO(m), R ∈ SO(m + 1), a, $b \in ℝ^{m}$ , $c \in ℝ$ , c ≠ b^TO⁻¹a, and GS(·) is the Gram-Schmidt process. Since the Riemannian submersion π : SO(m) → Gr(p, m) is defined by π (O) = span(O_p) where O ∈ SO(m) and O_p is the m × p matrix containing the first p columns of O, the induced embedding ι_m : Gr(p, m) → Gr(p, m + 1) is given by,

ι_{m} (𝒳) = span (R [\begin{matrix} X \\ b^{T} \end{matrix}]) = span (\tilde{R} X + v b^{T}),

where $b \in ℝ^{p}$, R ∈ SO(m + 1), $\tilde{R}$ contains the first m columns of R (which means $\tilde{R} \in St (m, m + 1)$), v is the last column of R, and 𝒳 = span(X) ∈ Gr(p, m). It is easy to see that for R = I and b = 0, this gives the natural embedding described in Ye and Lim (2016) and at the beginning of this section.

Proposition 2 If b = 0, then ι_m is an isometric embedding.

Proof With Proposition 1, it suffices to show that ${\tilde{ι}}_{m}$ is isometric when b = 0. Note that as ι_m is independent of a and c in the definition of ${\tilde{ι}}_{m}$, we can assume a = 0 and c = 1 without loss of generality. Then ${\tilde{ι}}_{m}$ simplifies to

{\tilde{ι}}_{m} (O) = R [\begin{matrix} O & 0 \\ 0 & 1 \end{matrix}]

where R ∈ SO(m + 1). The Riemannian distance on SO(n) given the induced Euclidean metric is $d_{SO} (O_{1}, O_{2}) = \frac{1}{\sqrt{2}} {‖ log O_{1}^{T} O_{2} ‖}_{F}$ . Then for O₁, O₂ ∈ SO(m),

d_{SO} ({\tilde{ι}}_{m} (O_{1}), {\tilde{ι}}_{m} (O_{2})) = \frac{1}{\sqrt{2}} {‖ log ([\begin{matrix} O_{1}^{T} O_{2} & 0 \\ 0 & 1 \end{matrix}]) ‖}_{F} = d_{SO} (O_{1}, O_{2}) .

Therefore ${\tilde{ι}}_{m}$ is an isometric embedding, and so is ι_m by Proposition 1. ■
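The distance computation in this proof can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the paper: it evaluates (1/√2)‖log Q‖_F through the rotation angles of Q (the arguments of its eigenvalues), which agrees with the principal matrix logarithm when no eigenvalue equals −1, and the helper names `random_so`, `d_so` and `embed_so` are hypothetical.

```python
import numpy as np

def random_so(k, rng):
    """Random rotation in SO(k) from a QR factorization (sign-fixed determinant)."""
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def d_so(O1, O2):
    """(1/sqrt(2)) * ||log(O1^T O2)||_F, computed from the eigenvalue angles."""
    angles = np.angle(np.linalg.eigvals(O1.T @ O2))
    return np.sqrt(np.sum(angles ** 2) / 2.0)

def embed_so(O, R):
    """Embedding O -> R * blockdiag(O, 1), i.e. the case a = 0, b = 0, c = 1."""
    m = O.shape[0]
    block = np.eye(m + 1)
    block[:m, :m] = O
    return R @ block

rng = np.random.default_rng(0)
m = 4                                           # illustrative size
O1, O2, R = random_so(m, rng), random_so(m, rng), random_so(m + 1, rng)
print(d_so(O1, O2), d_so(embed_so(O1, R), embed_so(O2, R)))   # should agree
```

The extra diagonal entry 1 contributes a zero rotation angle, which is why the two distances coincide.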

With the embedding ι_m, we can construct the corresponding projection π_m : Gr(p, m + 1) → Gr(p, m) using the following proposition.

Proposition 3 The projection π_m : Gr(p, m+1) → Gr(p, m) corresponding to $ι_{m} (𝒳) = span (\tilde{R} X + v b^{T})$ is given by $π_{m} (𝒳) = span ({\tilde{R}}^{T} X)$ .

Proof First, let 𝒴 = span(Y) ∈ Gr(p, m) and 𝒳 = span(X) ∈ Gr(p, m + 1) be such that $𝒳 = span (\tilde{R} Y + v b^{T})$. Then $X L = \tilde{R} Y + v b^{T}$ for some L ∈ GL(p). Since ${\tilde{R}}^{T} \tilde{R} = I_{m}$ and ${\tilde{R}}^{T} v = 0$, we get $Y = {\tilde{R}}^{T} (X L - v b^{T}) = {\tilde{R}}^{T} X L$ and $𝒴 = span (Y) = span ({\tilde{R}}^{T} X L) = span ({\tilde{R}}^{T} X)$. Hence, the projection is given by $π_{m} (𝒳) = span ({\tilde{R}}^{T} X)$. This completes the proof. ■
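As a sanity check of Proposition 3, the following NumPy sketch builds one embedding step and verifies that the projection recovers the original subspace. The sizes, seed and helper `same_span` are illustrative assumptions; the sketch also checks the identity R̃R̃^T = I − vv^T, which holds because [R̃ v] is orthogonal.

```python
import numpy as np

def same_span(X, Y, tol=1e-10):
    """True if the column spans coincide (compares orthogonal projectors)."""
    return np.linalg.norm(X @ np.linalg.pinv(X) - Y @ np.linalg.pinv(Y)) < tol

rng = np.random.default_rng(0)
p, m = 2, 4                                        # illustrative sizes

R, _ = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))
if np.linalg.det(R) < 0:                           # move to SO(m + 1)
    R[:, 0] = -R[:, 0]
R_tilde, v = R[:, :m], R[:, m:]                    # first m columns, last column

X, _ = np.linalg.qr(rng.standard_normal((m, p)))   # basis of a point in Gr(p, m)
b = rng.standard_normal((p, 1))                    # offset parameter of the embedding

embedded = R_tilde @ X + v @ b.T                   # basis of iota_m(span(X))
recovered = R_tilde.T @ embedded                   # pi_m applied to that point

print(same_span(recovered, X))
```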

Note that for 𝒳 = span(X) ∈ Gr(p, m + 1), $ι_{m} (π_{m} (𝒳)) = span (\tilde{R} {\tilde{R}}^{T} X + v b^{T}) = span ((I - v v^{T}) X + v b^{T})$ where $v \in ℝ^{m + 1}$ and ∥v∥ = 1. The nested relation can be extended inductively and we refer to this construction as the nested Grassmann structure:

Gr (p, m) \overset{ι_{m}}{↪} Gr (p, m + 1) \overset{ι_{m + 1}}{↪} \dots \overset{ι_{n - 2}}{↪} Gr (p, n - 1) \overset{ι_{n - 1}}{↪} Gr (p, n) .

Thus the embedding from Gr(p, m) into Gr(p, n) can be constructed inductively by ι ≔ ι_{n−1} ∘ … ∘ ι_{m+1} ∘ ι_m and similarly for the corresponding projection. The explicit forms of the embedding and the projection are given in the following proposition.

Proposition 4 The embedding of Gr(p, m) into Gr(p, n) for m < n is given by ι_{A,B}(𝒳) = span(AX + B) where A ∈ St(m, n) and $B \in ℝ^{n \times p}$ such that A^T B = 0. The corresponding projection from Gr(p, n) to Gr(p, m) is given by π_A(𝒳) = span(A^T X).

Proof By the definition, ι ≔ ι_{n−1} ∘ … ∘ ι_{m+1} ∘ ι_m and thus the embedding ι : Gr(p, m) → Gr(p, n) can be simplified as

ι_{A, B} (𝒳) = span ((\prod_{i = m}^{n - 1} R_{i}) X + \sum_{i = m}^{n - 1} (\prod_{j = i + 1}^{n - 1} R_{j}) v_{i} b_{i}^{T}) = span (A X + B)

where R_i ∈ St(i, i + 1), v_i is such that [R_i v_i] ∈ O(i + 1), $b_{i} \in ℝ^{p}$, A = R_{n−1}R_{n−2} ⋯ R_m ∈ St(m, n), and $B = \sum_{i = m}^{n - 1} (\prod_{j = i + 1}^{n - 1} R_{j}) v_{i} b_{i}^{T}$ is an n × p matrix. Similar to Proposition 3, the projection π_A : Gr(p, n) → Gr(p, m) is then given by π_A(𝒳) = span(A^T X). This completes the proof. ■
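The algebra behind Proposition 4 can be traced step by step in code. The sketch below is an illustration under stated assumptions: the R_i, v_i and b_i are drawn at random, the basis representative is pushed through each step without re-orthonormalization, and all variable names are hypothetical. It accumulates A and B alongside the running basis and checks A ∈ St(m, n), A^T B = 0 and span(AX + B) against the composed map.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 2, 3, 6                                   # illustrative sizes

X, _ = np.linalg.qr(rng.standard_normal((m, p)))    # representative of a point in Gr(p, m)
Y = X.copy()              # running basis pushed through iota_m, ..., iota_{n-1}
A = np.eye(m)             # accumulates R_{n-1} ... R_m
B = np.zeros((m, p))      # accumulates the offset term

for i in range(m, n):
    Q, _ = np.linalg.qr(rng.standard_normal((i + 1, i + 1)))
    R_i, v_i = Q[:, :i], Q[:, i:]       # R_i in St(i, i+1), v_i orthogonal to its columns
    b_i = rng.standard_normal((p, 1))
    Y = R_i @ Y + v_i @ b_i.T           # one embedding step applied to the basis
    A = R_i @ A
    B = R_i @ B + v_i @ b_i.T

projected = A.T @ Y                     # pi_A applied to the embedded point

print(np.allclose(Y, A @ X + B), np.allclose(projected, X))
```

The orthogonality A^T B = 0 follows because each R_i^T v_i = 0, so every term of B is annihilated by the matching factor of A^T.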

From Proposition 2, if B = 0 then ι_A is an isometric embedding. Hence, our nested Grassmann structure is more flexible than PGA as it allows one to project the data onto a non-geodesic submanifold. An illustration is shown in Figure 2. The results discussed in this subsection can be generalized to any homogeneous space in principle. For a given homogeneous space, once the embedding of the groups of isometries (e.g., Eq. (1)) is determined, the induced embedding and the corresponding projection can be derived akin to the case of Grassmann manifolds.

Figure 2: Illustration of the embedding of Gr(p, m) in Gr(p, n) parametrized by A ∈ St(m, n) and $B \in ℝ^{n \times p}$ such that A^T B = 0.

3.3. Connections to Other Nested Structures

The nested homogeneous spaces proposed in this work (see Figure 1) provide a unified framework within which the nested spheres (Jung et al., 2012) and the nested SPD manifolds (Harandi et al., 2018) are special cases.

The n-Sphere Example:

Since the n-sphere can be identified with a homogeneous space Sⁿ⁻¹ ≅ O(n)/O(n − 1), with the embedding (1), the induced embedding of Sⁿ⁻¹ into Sⁿ is

ι (x) = GS (R [\begin{matrix} x \\ b \end{matrix}]) = \frac{1}{\sqrt{1 + b^{2}}} R [\begin{matrix} x \\ b \end{matrix}] = R [\begin{matrix} sin (r) x \\ cos (r) \end{matrix}]

where x ∈ Sⁿ⁻¹, $b \in ℝ$ , and $r = {cos}^{- 1} (\frac{b}{\sqrt{1 + b^{2}}})$ . This is precisely the nested sphere proposed in Jung et al. (2012, Eq. (2)).
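The equality of the normalized form and the small-sphere form can be confirmed numerically. This is a minimal sketch with an arbitrary offset b, dimension and seed chosen for illustration; for a single column, the Gram-Schmidt process reduces to normalization.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4                                          # illustrative dimension
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                         # a point on the unit sphere in R^n
b = 0.7                                        # arbitrary offset parameter
R, _ = np.linalg.qr(rng.standard_normal((n + 1, n + 1)))   # orthogonal matrix

# Gram-Schmidt of one column is normalization: ||R [x; b]|| = sqrt(1 + b^2).
normalized = R @ np.append(x, b) / np.sqrt(1.0 + b ** 2)

r = np.arccos(b / np.sqrt(1.0 + b ** 2))
small_sphere = R @ np.append(np.sin(r) * x, np.cos(r))

print(np.allclose(normalized, small_sphere))
```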

The SPD Manifold Example:

For the m-dimensional SPD manifold denoted by P_m, G_m = GL(m) and H_m = O(m). Consider the embedding ${\tilde{ι}}_{m} : GL (m) \to GL (m + 1)$ given by

\tilde{A} = {\tilde{ι}}_{m} (A) = R [\begin{matrix} A & 0 \\ 0 & 1 \end{matrix}] R^{T},

where A ∈ GL(m), R ∈ O(m + 1) and the corresponding projection ${\tilde{π}}_{m} : GL (m + 1) \to GL (m)$ is

{\tilde{π}}_{m} (\tilde{A}) = W^{T} \tilde{A} W

where W contains the first m columns of R = [W v] ∈ O(m + 1) (i.e., W ∈ St(m, m + 1) and W^T v = 0). The submersion ψ ∘ f : GL(m) → P_m is given by ψ ∘ f(A) = A^T A. Hence the induced embedding ι_m : P_m → P_{m+1} and projection π_m : P_{m+1} → P_m are

ι_{m} (X) = W X W^{T} + v v^{T} \quad \text{and} \quad π_{m} (X) = W^{T} X W

which is the projection map used in Harandi et al. (2018, Eq. (13)). However, Harandi et al. (2018) did not perform any embedding or construct a nested family of SPD manifolds. Recently, it came to our attention that Jaquier and Rozo (2020) derived a similar nested family of SPD manifolds based on the projection maps in Harandi et al. (2018) described above.
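A short numerical check of the SPD embedding and projection follows. It is only a sketch: the SPD test matrix, size and seed are arbitrary choices, and the variable names are not taken from Harandi et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3                                           # illustrative size
R, _ = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))
W, v = R[:, :m], R[:, m:]                       # W in St(m, m+1), W^T v = 0

M = rng.standard_normal((m, m))
X = M @ M.T + m * np.eye(m)                     # a random SPD matrix in P_m

embedded = W @ X @ W.T + v @ v.T                # iota_m(X), an (m+1) x (m+1) matrix
projected = W.T @ embedded @ W                  # pi_m(iota_m(X))

print(np.allclose(projected, X), np.linalg.eigvalsh(embedded).min() > 0)
```

The embedded matrix has the eigenvalues of X together with the eigenvalue 1 in the direction v, so it stays positive definite.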

4. Dimensionality Reduction with Nested Grassmanns

In this section, we discuss how to apply the nested Grassmann structure to the problem of dimension reduction. In Sections 4.1 and 4.2, we describe the unsupervised and supervised dimension reduction based on the nested Grassmann manifolds. In Section 4.3, we discuss the choice of distance metrics required by the dimensionality reduction algorithm and present some technical details regarding the implementation. The analysis of principal nested Grassmanns (PNG) and the associated principal scores are introduced and discussed in Sections 4.4 and 4.5.

4.1. Unsupervised Dimensionality Reduction

We can now apply the nested Grassmann structure to the problem of unsupervised dimensionality reduction. Suppose that we are given the points 𝒳₁, …, 𝒳_N ∈ Gr(p, n). We would like to have lower dimensional representations in Gr(p, m) for 𝒳₁, …, 𝒳_N with m ≪ n. The desired projection map π_A that we seek is obtained by minimizing the reconstruction error, i.e.

L_{u} (A, B) = \frac{1}{N} \sum_{i = 1}^{N} d^{2} (𝒳_{i}, {\hat{𝒳}}_{i}) = \frac{1}{N} \sum_{i = 1}^{N} d^{2} (span (X_{i}), span (A A^{T} X_{i} + B))

where d(·, ·) is a distance metric on Gr(p, n). It is clear that L_u has an O(m)-symmetry in the first argument, i.e. L_u(AO, B) = L_u(A, B) for O ∈ O(m). Hence, the optimization is performed over the space St(m, n)/O(m) ≅ Gr(m, n) when optimizing with respect to this particular loss function. Now we can apply the Riemannian gradient descent algorithm (Edelman et al., 1998) to obtain A and B by optimizing L_u(A, B) over span(A) ∈ Gr(m, n) and $B \in ℝ^{n \times p}$ such that A^T B = 0. Note that the restriction A^T B = 0 simply means that the columns of B are in the null space of A^T, denoted N(A^T). Hence in practice this restriction can be handled as follows. For an arbitrary $\tilde{B} \in ℝ^{n \times p}$, project $\tilde{B}$ onto N(A^T), i.e. $B = P_{N (A^{T})} \tilde{B}$ where $P_{N (A^{T})} = I - A A^{T}$ is the projection from $ℝ^{n}$ to N(A^T). Thus, dropping the tilde, the loss function can be written as

L_{u} (A, B) = \frac{1}{N} \sum_{i = 1}^{N} d^{2} (span (X_{i}), span (A A^{T} X_{i} + (I - A A^{T}) B))

and it is optimized over $Gr (m, n) \times ℝ^{n \times p}$ . Note that since we represent a subspace by its orthonormal basis, when m > n/2, we can use the isomorphism Gr(m, n) ≅ Gr(n − m, n) to further reduce the computational burden. This will be particularly useful when m = n − 1 as in Section 4.4. Under this isomorphism Gr(m, n) ≅ Gr(n − m, n), the corresponding subspace of span (A) ∈ Gr(m, n) is span(A_⊥) ∈ Gr(n − m, n) where A_⊥ is an n × (n − m) matrix such that [A A_⊥] is an orthogonal matrix. Hence the loss function L_u can alternatively be expressed as

L_{u} (A, B) = \frac{1}{N} \sum_{i = 1}^{N} d^{2} (span (X_{i}), span ((I - A_{⊥} A_{⊥}^{T}) X_{i} + A_{⊥} A_{⊥}^{T} B)) .
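The reconstruction loss with the null-space projection can be written in a few lines. The sketch below is an illustration, not the pymanopt-based implementation used in the experiments: it uses the projection distance, assumes every reconstructed matrix has full column rank, and the function names are hypothetical.

```python
import numpy as np

def projector(M):
    """Orthogonal projector onto the column span of M (assumed full column rank)."""
    return M @ np.linalg.pinv(M)

def proj_dist_sq(X, Y):
    """Squared projection distance: 0.5 * ||P_X - P_Y||_F^2."""
    return 0.5 * np.linalg.norm(projector(X) - projector(Y)) ** 2

def unsupervised_loss(A, B_tilde, Xs):
    """L_u(A, B) with A^T B = 0 enforced through B = (I - A A^T) B_tilde."""
    P_A = A @ A.T
    B = (np.eye(A.shape[0]) - P_A) @ B_tilde
    return np.mean([proj_dist_sq(X, P_A @ X + B) for X in Xs])

# Example: data lying exactly in the image of the embedding with B = 0.
rng = np.random.default_rng(4)
n, m, p = 6, 3, 2                                   # illustrative sizes
A = np.linalg.qr(rng.standard_normal((n, m)))[0]
Xs_exact = [A @ np.linalg.qr(rng.standard_normal((m, p)))[0] for _ in range(5)]
print(unsupervised_loss(A, np.zeros((n, p)), Xs_exact))   # numerically zero
```

Because the loss depends on A only through AA^T, replacing A by AO with O ∈ O(m) leaves it unchanged, which is the symmetry used to optimize over Gr(m, n).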

Remark 1 The reduction of the space of all possible projections from St(m, n) to Gr(m, n) is a consequence of the choice of the loss function L_u. With a different loss function, one might have a different space of possible projections.

4.2. Supervised Dimensionality Reduction

If in addition to 𝒳₁, …, 𝒳_N ∈ Gr(p, n), we are given the associated labels y₁, …, y_N ∈ {1, …, k}, then we would like to use this extra information to sharpen the result of dimensionality reduction. Specifically, we expect that after reducing the dimension, points from the same class preserve their proximity while points from different classes are well separated. We use an affinity function $a : Gr (p, n) \times Gr (p, n) \to ℝ$ to encode the structure of the data as suggested by Harandi et al. (2018, Sec 3.1, Eq. (14)-(16)). For completeness, we repeat the definition of the affinity function here. The affinity function is defined as a(𝒳_i, 𝒳_j) = g_w(𝒳_i, 𝒳_j) − g_b(𝒳_i, 𝒳_j) where

g_{w} (𝒳_{i}, 𝒳_{j}) = \begin{cases} 1 & \text{if } 𝒳_{i} \in N_{w} (𝒳_{j}) \text{ or } 𝒳_{j} \in N_{w} (𝒳_{i}) \\ 0 & \text{otherwise} \end{cases}

g_{b} (𝒳_{i}, 𝒳_{j}) = \begin{cases} 1 & \text{if } 𝒳_{i} \in N_{b} (𝒳_{j}) \text{ or } 𝒳_{j} \in N_{b} (𝒳_{i}) \\ 0 & \text{otherwise} \end{cases} .

The set N_w(𝒳_i) consists of ν_w nearest neighbors of 𝒳_i that have the same labels as y_i, and the set N_b(𝒳_i) consists of ν_b nearest neighbors of 𝒳_i that have different labels from y_i. The nearest neighbors can be computed using the geodesic distance.

The desired projection map π_A that we seek is obtained by minimizing the following loss function

L_{s} (A) = \frac{1}{N^{2}} \sum_{i, j = 1}^{N} a (𝒳_{i}, 𝒳_{j}) d^{2} (span (A^{T} X_{i}), span (A^{T} X_{j}))

where d is a distance metric on Gr(p, m). Note that if the distance metric d has O(m)-symmetry, e.g. the geodesic distance, so does L_s. In this case the optimization can be done on St(m, n)/O(m) ≅ Gr(m, n). Otherwise it is on St(m, n). This supervised dimensionality reduction is termed the supervised nested Grassmann (sNG).
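The affinity function and the supervised loss can be sketched as follows. This is an illustrative reading of the definitions above, with several assumptions: a point is excluded from its own neighbor set, neighbors are ranked from a user-supplied pairwise distance matrix D (the text suggests the geodesic distance), the projection distance is used inside the loss, and all function names are hypothetical.

```python
import numpy as np

def projector(M):
    """Orthogonal projector onto the column span of M (assumed full column rank)."""
    return M @ np.linalg.pinv(M)

def proj_dist(X, Y):
    """Projection distance ||P_X - P_Y||_F / sqrt(2)."""
    return np.linalg.norm(projector(X) - projector(Y)) / np.sqrt(2.0)

def affinity_matrix(D, labels, nu_w, nu_b):
    """a = g_w - g_b built from pairwise distances D and class labels."""
    N = len(labels)
    g_w = np.zeros((N, N))
    g_b = np.zeros((N, N))
    for i in range(N):
        same = [j for j in range(N) if j != i and labels[j] == labels[i]]
        diff = [j for j in range(N) if labels[j] != labels[i]]
        for j in sorted(same, key=lambda j: D[i, j])[:nu_w]:
            g_w[i, j] = g_w[j, i] = 1.0     # the "or" in the definition symmetrizes
        for j in sorted(diff, key=lambda j: D[i, j])[:nu_b]:
            g_b[i, j] = g_b[j, i] = 1.0
    return g_w - g_b

def supervised_loss(A, Xs, a):
    """L_s(A) = (1/N^2) * sum_ij a_ij * d^2(span(A^T X_i), span(A^T X_j))."""
    N = len(Xs)
    Z = [A.T @ X for X in Xs]               # representatives in Gr(p, m)
    total = sum(a[i, j] * proj_dist(Z[i], Z[j]) ** 2
                for i in range(N) for j in range(N))
    return total / N ** 2
```

Minimizing this quantity pulls same-class neighbors together (positive weights) and pushes different-class neighbors apart (negative weights) in the reduced space.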

4.3. Choice of the distance function

The loss functions L_u and L_s depend on the choice of the distance $d : Gr (p, n) \times Gr (p, n) \to ℝ_{\geq 0}$. Besides the geodesic distance, there are many widely used distances on the Grassmann manifold, see, for example, Edelman et al. (1998, p. 337) and Ye and Lim (2016, Table 2). In this work, we use two different distance metrics: (1) the geodesic distance d_g and (2) the projection distance, which is also called the chordal distance in Ye and Lim (2016) and the projection F-norm in Edelman et al. (1998). The geodesic distance was defined in Section 3.1 and the projection distance is defined as follows. For 𝒳, 𝒴 ∈ Gr(p, n), denote the projection matrices onto 𝒳 and 𝒴 by P_𝒳 and P_𝒴 respectively. Then, the distance between 𝒳 and 𝒴 is given by $d_{p} (𝒳, 𝒴) = {‖ P_{𝒳} - P_{𝒴} ‖}_{F} / \sqrt{2} = {(\sum_{i = 1}^{p} {sin}^{2} θ_{i})}^{1 / 2}$ where θ₁, …, θ_p are the principal angles of 𝒳 and 𝒴. If 𝒳 = span(X) then P_𝒳 = X(X^T X)⁻¹ X^T. It is also easy to see that the projection distance has O(n)-symmetry. We choose the projection distance mainly for its computational efficiency as it involves only matrix multiplication which has a time complexity O(n²) whereas the geodesic distance requires an SVD which has a time complexity of O(n³).
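The two expressions for the projection distance can be compared numerically. The sketch below is illustrative only; the function names are assumptions, and the principal angles are obtained from the singular values of X^T Y for orthonormal bases X and Y.

```python
import numpy as np

def proj_dist_projectors(X, Y):
    """||P_X - P_Y||_F / sqrt(2), with P_X = X (X^T X)^{-1} X^T."""
    P_X = X @ np.linalg.solve(X.T @ X, X.T)
    P_Y = Y @ np.linalg.solve(Y.T @ Y, Y.T)
    return np.linalg.norm(P_X - P_Y) / np.sqrt(2.0)

def proj_dist_angles(X, Y):
    """sqrt(sum_i sin^2(theta_i)) for orthonormal bases X, Y."""
    cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.sqrt(np.sum(1.0 - np.clip(cosines, 0.0, 1.0) ** 2))

rng = np.random.default_rng(6)
n, p = 7, 3                                         # illustrative sizes
X = np.linalg.qr(rng.standard_normal((n, p)))[0]
Y = np.linalg.qr(rng.standard_normal((n, p)))[0]
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]    # an element of O(n)

print(proj_dist_projectors(X, Y), proj_dist_angles(X, Y))   # should agree
print(proj_dist_projectors(Q @ X, Q @ Y))                    # O(n)-symmetry
```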

4.4. Analysis of Principal Nested Grassmannians

To determine the dimension of the nested submanifold that fits the data well enough, we can choose p < m₁ < … < m_k < n and estimate the projection onto these nested Grassmann manifolds. The ratio of expressed variance for each projection is the ratio of the variance of the projected data and the variance of the original data. With these ratios, we can choose the desired dimension according to the pre-specified percentage of expressed variance as one would do for choosing the number of principal components in PCA.

Alternatively, one can have a full analysis of principal nested Grassmanns (PNG) as follows. Starting from Gr(p, n), one can reduce the dimension down to Gr(p, p + 1). Using the diffeomorphism between Gr(p, n) and Gr(n − p, n), we have Gr(p, p + 1) ≅ Gr(1, p + 1), and hence we can continue reducing the dimension down to Gr(1, 2). The resulting sequence will be

Gr (p, n) \to Gr (p, n - 1) \to \dots \to Gr (p, p + 1) = Gr (1, p + 1) \to Gr (1, p) \to \dots \to Gr (1, 2) .

Furthermore, we can reduce the points on Gr(1, 2), which is a 1-dimensional manifold, to a 0-dimensional manifold, which is simply a point, by computing the Fréchet mean (FM). We call this FM the nested Grassmannian mean (NGM) of 𝒳₁, …, 𝒳_N ∈ Gr(p, n). The NGM is unique since $Gr (1, 2) ≅ ℝ P^{1}$ can be identified as the half circle in $ℝ^{2}$ and the FM is unique in this case. Note that in general, the NGM will not be the same as the FM of 𝒳₁, …, 𝒳_N since the embedding/projection need not be isometric. The supervised PNG (sPNG) can be obtained similarly by replacing each projection with its supervised counterpart.

4.5. Principal Scores

Whenever we apply a projection π_m : Gr(p, m + 1) → Gr(p, m) to the data, we might lose some information contained in the data. More specifically, since we project data on a p(m + 1 − p)-dimensional manifold to a p(m − p)-dimensional manifold, we need to describe this p(m + 1 − p) − p(m − p) = p dimensional information loss during the projection. In PCA, this is done by computing the scores of each principal component, which are the transformed coordinates of each sample in the eigenspace of the covariance matrix. We can generalize the notion of principal scores to the nested Grassmanns as follows: For each 𝒳 ∈ Gr(p, m + 1), denote by $M_{π_{m} (𝒳)}$ the fiber of π_m(𝒳), i.e. $M_{π_{m} (𝒳)} = π_{m}^{- 1} (π_{m} (𝒳)) = \{𝒴 \in Gr (p, m + 1) : π_{m} (𝒴) = π_{m} (𝒳)\}$ which is a p-dimensional submanifold of Gr(p, m + 1). An illustration of this fiber is given in Figure 3a. Let $\tilde{𝒳} = ι_{m} (π_{m} (𝒳))$ and let the unit tangent vector $V \in T_{\tilde{𝒳}} M_{π_{m} (𝒳)}$ be the geodesic direction from $\tilde{𝒳}$ to 𝒳. Given a suitable basis on $T_{\tilde{𝒳}} M_{π_{m} (𝒳)}$, V can be realized as a p-dimensional vector, and this will be the score vector of 𝒳 associated with the projection π_m.

Figure 3: — Illustrations of submanifolds induced by π_m.

By the definition of $M_{π_{m} (𝒳)}$ we have the following decomposition of the tangent space of Gr(p, m + 1) at $\tilde{𝒳}$ into the horizontal space and the vertical space induced by π_m,

T_{\tilde{𝒳}} Gr (p, m + 1) = T_{\tilde{𝒳}} M_{π_{m} (𝒳)} \oplus {(d ι_{m})}_{π_{m} (𝒳)} (T_{π_{m} (𝒳)} Gr (p, m)) .

An illustration of this decomposition is given in Figure 3b. A tangent vector of $M_{π_{m} (𝒳)}$ at $\tilde{𝒳}$ is of the form Δ = A_⊥b^T, where A_⊥ is any (m + 1)-dimensional vector such that [A A_⊥] is orthogonal and $b \in ℝ^{p}$ . It is easy to check that π_m(span(AA^TX + A_⊥b^T)) = π_m(span(X)) = span(A^TX). Hence a natural coordinate for the tangent vector $Δ = A_{⊥} b^{T}$ is $b \in ℝ^{p}$ , and the geodesic direction from $\tilde{𝒳}$ to 𝒳 would be V = X^TA_⊥. It is easy to see that ∥V∥_F = 1 since X has orthonormal columns. To reflect the distance between $\tilde{𝒳}$ and 𝒳, i.e. the reconstruction error, we define $d (\tilde{𝒳}, 𝒳) V$ as the score vector for 𝒳 associated with π_m. In the case of Gr(1, 2) → NGM, we use the signed distance to the NGM as the score. For complex nested Grassmanns, however, the principal score associated with each projection is a p-dimensional complex vector. For the sake of visualization, we transform this p-dimensional complex vector to a 2p-dimensional real vector. The procedure for computing the PNG and the principal scores is summarized in Algorithm 1.
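The fiber identity and the score vector can be checked numerically. The sketch below makes several assumptions that are not taken from the paper: the offset B of the projection is set to zero, A, A_⊥, and X are drawn at random, the geodesic distance is computed from principal angles, and the direction X^TA_⊥ is normalised explicitly before being scaled by the distance.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 2, 4                      # project Gr(p, m + 1) -> Gr(p, m)

# Orthogonal (m+1) x (m+1) matrix split as [A  A_perp].
Q, _ = np.linalg.qr(rng.standard_normal((m + 1, m + 1)))
A, A_perp = Q[:, :m], Q[:, m:]   # A: (m+1) x m, A_perp: (m+1) x 1

# Orthonormal basis X of a random point on Gr(p, m + 1).
X, _ = np.linalg.qr(rng.standard_normal((m + 1, p)))


def project(Y):
    """pi_m with B = 0 (assumption): returns the basis A^T Y of the projected span."""
    return A.T @ Y


# Adding a fiber direction A_perp b^T does not change the projection.
b = rng.standard_normal((p, 1))
Y = A @ A.T @ X + A_perp @ b.T
fiber_ok = np.allclose(project(Y), project(X))

# Coordinates of the direction from X_tilde to X, normalised explicitly here.
V = (X.T @ A_perp).ravel()
V_unit = V / np.linalg.norm(V)


def geodesic_distance(Y1, Y2):
    """Geodesic distance from principal angles; columns of Y1, Y2 span the subspaces."""
    Q1, _ = np.linalg.qr(Y1)
    Q2, _ = np.linalg.qr(Y2)
    s = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))


X_tilde = A @ A.T @ X            # basis (not orthonormal) of iota_m(pi_m(X))
score = geodesic_distance(X_tilde, X) * V_unit
print(fiber_ok, score)
```

The score is a p-dimensional vector whose length equals the reconstruction error d(𝒳̃, 𝒳), so samples that are well represented by the lower-dimensional Grassmannian receive scores near the origin.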

Remark 2 Note that this definition of principal score is not intrinsic as it depends on the choice of basis. Indeed, it is impossible to choose a p-dimensional vector for the projection π_m in an intrinsic way, since the only property of a map that is independent of bases is the rank of the map. A reasonable choice of basis is made by viewing the Grassmann Gr(p, m) as a quotient manifold of St(p, m), which is a submanifold in $ℝ^{m \times p}$ . This is how we define the principal score for nested Grassmanns.

5. Experiments

In this section, we demonstrate the performance of the proposed dimensionality reduction techniques, i.e. PNG and sPNG, via experiments on synthetic and real data. The implementation¹ is based on the Python library pymanopt (Townsend et al., 2016) and we use the steepest descent algorithm for the optimization (with default parameters in pymanopt). The optimization was performed on a desktop with 3.6GHz Intel i7 processors and took about 30 seconds to converge.

5.1. Synthetic Data

In this subsection, we compare the performance of the projection and the geodesic distances respectively. The questions we will answer are the following. (1) From Section 4.3, we see that using the projection distance is more efficient than using the geodesic distance. But how do they perform compared to each other under varying dimension n and variance level σ²? (2) Is our method of dimensionality reduction “better” than PGA? Under what conditions does our method outperform PGA?

5.1.1. Projection and Geodesic Distance Comparisons

The procedure we used to generate random points on Gr(p, n) for the synthetic data experiments is as follows: First, we generate N points X₁, …, X_N from the uniform distribution on St(p, m) (Chikuse, 2003, Ch. 2.5), generate A from the uniform distribution on St(m, n), and generate B as an n × p matrix with i.i.d. entries from N(0, 0.1). Then we compute ${\tilde{𝒳}}_{i} = span (A X_{i} + (I - A A^{T}) B) \in Gr (p, n)$ . Finally, to include some perturbation, we compute $𝒳_{i} = {Exp}_{{\tilde{𝒳}}_{i}} (σ U_{i})$ , where $U_{i} = {\tilde{U}}_{i} / ‖ {\tilde{U}}_{i} ‖$ and ${\tilde{U}}_{i} \in T_{{\tilde{𝒳}}_{i}} Gr (p, n)$ .
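The generation procedure can be sketched in NumPy as follows. This is an illustrative reconstruction rather than the authors' code, and it rests on several assumptions: uniform Stiefel samples are drawn by QR decomposition of a Gaussian matrix, N(0, 0.1) is read as variance 0.1, the tangent direction is a Gaussian matrix projected onto the horizontal space, and the exponential map is the SVD-based formula of Edelman et al. (1998). The function names are hypothetical.

```python
import numpy as np


def random_stiefel(rows, cols, rng):
    """Uniform sample of a rows x cols orthonormal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((rows, cols)))
    return Q * np.sign(np.diag(R))   # fix column signs so the law is uniform


def grassmann_exp(Y, H):
    """Exponential map at span(Y) along a horizontal tangent H (Y^T H = 0)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return Y @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt


def generate_data(N, n, m, p, sigma, rng):
    A = random_stiefel(n, m, rng)                    # A in St(m, n), an n x m matrix
    B = np.sqrt(0.1) * rng.standard_normal((n, p))   # assumption: 0.1 is the variance
    P_perp = np.eye(n) - A @ A.T
    centers, points = [], []
    for _ in range(N):
        X = random_stiefel(m, p, rng)                # X_i in St(p, m), an m x p matrix
        Y, _ = np.linalg.qr(A @ X + P_perp @ B)      # orthonormal basis of X_tilde_i
        G = rng.standard_normal((n, p))
        U = G - Y @ (Y.T @ G)                        # project onto the horizontal space
        U /= np.linalg.norm(U)                       # unit Frobenius norm
        centers.append(Y)
        points.append(grassmann_exp(Y, sigma * U))   # perturb by geodesic distance sigma
    return np.stack(centers), np.stack(points)


centers, points = generate_data(N=50, n=10, m=5, p=2, sigma=0.5,
                                rng=np.random.default_rng(0))
print(points.shape)
```

Each perturbed point lies at geodesic distance σ from its unperturbed counterpart as long as σ is below π/2, which holds for the values of σ used in the experiments.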

This experiment compares the performance of the NG representation in terms of the explained variance under different levels of data variance. In this experiment, we set N = 50, n = 10, m = 3, and p = 1, and σ ranges from 0.5 to 1. The results are averaged over 100 repetitions and are shown in Figure 4. From these results, we can see that the explained variances for the projection distance and the geodesic distance are indistinguishable, but using the projection distance leads to much faster convergence than using the geodesic distance. The reason is that when two points on the Grassmann manifold are close, the geodesic distance can be well-approximated by the projection distance. When the algorithm converges, the original point 𝒳_i and the reconstructed point ${\hat{𝒳}}_{i}$ should be close and the geodesic distance can thus be well-approximated by the projection distance. Therefore, for all the experiments in the next section, we use the projection distance for the sake of efficiency.
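The approximation argument can be illustrated numerically. The sketch below assumes the usual definitions in terms of principal angles θ_i: the geodesic distance is (Σθ_i²)^{1/2} and the projection distance is (Σ sin²θ_i)^{1/2}, which equals ∥XX^T − YY^T∥_F/√2 for orthonormal bases X and Y. The perturbation scheme is chosen only for illustration.

```python
import numpy as np


def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y); X, Y have orthonormal columns."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))


def geodesic_distance(X, Y):
    return np.linalg.norm(principal_angles(X, Y))


def projection_distance(X, Y):
    # Equals ||X X^T - Y Y^T||_F / sqrt(2) for orthonormal X, Y.
    return np.linalg.norm(np.sin(principal_angles(X, Y)))


rng = np.random.default_rng(0)
n, p = 10, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
for eps in (1e-2, 1e-1, 1.0):
    Y, _ = np.linalg.qr(X + eps * rng.standard_normal((n, p)))
    dg, dp = geodesic_distance(X, Y), projection_distance(X, Y)
    print(f"eps={eps:g}  geodesic={dg:.4f}  projection={dp:.4f}  ratio={dp / dg:.4f}")
```

Since sin θ ≤ θ on [0, π/2], the projection distance never exceeds the geodesic distance, and the ratio of the two tends to 1 as the subspaces approach each other.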

Figure 4: — Comparison of the NG representations based on the projection and geodesic distances using the expressed variance.

5.1.2. Comparison of PNG and PGA

Now we compare the proposed PNG to PGA. In order to have a fair comparison between PNG and PGA, we define the principal components of PNG as the principal components of the scores obtained as in Section 4.5. Similar to the previous experiment, we set N = 50, n = 10, m = 5, p = 2, and σ = 0.01, 0.05, 0.1, 0.5 and apply the same procedure to generate synthetic data. The results are averaged over 100 repetitions and are shown in Figure 5.

From Figure 5, we can see that our method outperforms PGA by virtue of the fact that it is able to capture a larger amount of variance contained in the data. When the variance is small, the improvement of PNG over PGA is less significant, whereas our method is significantly better in the large data variance case (e.g. compare σ = 0.5 with σ = 0.01). Note that when the variance in the data is small, i.e. the data are tightly clustered around the FM, PGA captures the essence of the data well. However, the requirement in PGA that the geodesic submanifold pass through the anchor point, namely the FM, is not meaningful for data with large variance, as explained through the following simple example. Consider a few data points spread out on the equator of a sphere. The FM in this case is likely to be the north pole of the sphere if we restrict ourselves to the upper hemisphere. Thus, the geodesic submanifold computed by PGA will pass through this FM. However, what is more meaningful is a submanifold corresponding to the equator, which is what a nested spheres representation (Jung et al., 2012) yields in this case. In a similar vein, for data with large variance on a Grassmann manifold, our NG representation will yield a more meaningful representation than PGA.
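The sphere example can be made concrete with a small numerical sketch. It does not run PGA itself; as a stand-in, it searches by brute force over all great circles through the north pole and compares the best one with the equator, using the sum of squared geodesic distances from the data to the circle as the residual. The data and grid sizes are arbitrary choices for illustration.

```python
import numpy as np

# Data: points spread along the equator of the unit sphere S^2.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
data = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])


def sum_sq_dist_to_great_circle(points, normal):
    """Sum of squared geodesic distances to the great circle with the given unit normal."""
    return np.sum(np.arcsin(np.clip(np.abs(points @ normal), 0.0, 1.0)) ** 2)


# Great circles through the north pole have a normal lying in the equatorial plane.
phis = np.linspace(0.0, np.pi, 181)
through_pole = min(
    sum_sq_dist_to_great_circle(data, np.array([-np.sin(phi), np.cos(phi), 0.0]))
    for phi in phis
)

# The equator is the great circle whose normal is the north pole.
equator = sum_sq_dist_to_great_circle(data, np.array([0.0, 0.0, 1.0]))

print(f"best great circle through the pole: residual = {through_pole:.3f}")
print(f"equator:                            residual = {equator:.3f}")
```

Every great circle constrained to pass through the pole leaves a large residual, whereas the equator, which does not pass through the pole, fits the data exactly.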

5.2. Application to Planar Shape Analysis

We now apply our method to planar (2-dimensional) shape analysis. A planar shape σ can be represented as an ordered set of k > 2 points in $ℝ^{2}$ , called a k-ad or a configuration. Here we assume that these k points are not all identical. Denote the configuration by X, which is a k × 2 matrix. Let H be the (k − 1) × k Helmert submatrix (Dryden and Mardia, 2016, Ch. 2.5). Then Z = HX/∥HX∥_F is called the pre-shape of X, from which the information about translation and scaling is removed. The space of all pre-shapes is called the pre-shape space, denoted by $𝒮_{2}^{k}$ . By definition, the pre-shape space is a (2k − 3)-dimensional sphere. The shape is obtained by removing the rotation from the pre-shape, and thus the shape space is $Σ_{2}^{k} = 𝒮_{2}^{k} / SO (2)$ . It was shown by Kendall (1984) that $Σ_{2}^{k}$ is a smooth manifold and, when equipped with the quotient metric, is isometric to the complex projective space $ℂ P^{k - 2}$ equipped with the Fubini-Study metric (up to a scale factor), which is a special case of the complex Grassmannians, i.e. $ℂ P^{k - 2} ≅ Gr (1, ℂ^{k - 1})$ . Hence, we can apply the proposed PNG to planar shapes. For planar shapes, we also compare with the recently proposed principal nested shape spaces (PNSS) (Dryden et al., 2019), which is an application of PNS on the pre-shape space. We will now demonstrate how the PNG performs compared to PGA and PNSS using some simple examples of planar shapes and the OASIS dataset.
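The construction of the pre-shape and its complex representation can be sketched as follows. The function names are illustrative. Identifying a row (x, y) of the pre-shape with the complex number x + iy is the standard convention, under which a rotation of the configuration multiplies the complex pre-shape by a unit complex number and therefore leaves its complex line, i.e. the point of Gr(1, ℂ^{k−1}), unchanged.

```python
import numpy as np


def helmert_submatrix(k):
    """(k - 1) x k Helmert submatrix: orthonormal rows, each summing to zero."""
    H = np.zeros((k - 1, k))
    for j in range(1, k):
        h = -1.0 / np.sqrt(j * (j + 1))
        H[j - 1, :j] = h          # first j entries equal h_j
        H[j - 1, j] = -j * h      # (j + 1)-th entry balances the row sum
    return H


def preshape(X):
    """Pre-shape Z = HX / ||HX||_F of a k x 2 configuration X."""
    HX = helmert_submatrix(X.shape[0]) @ X
    return HX / np.linalg.norm(HX)


def complex_preshape(X):
    """Pre-shape as a unit vector in C^{k-1}; its complex line is the shape."""
    Z = preshape(X)
    return Z[:, 0] + 1j * Z[:, 1]


# A configuration and a translated, scaled, and rotated copy of it.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
X_moved = 3.0 * X @ R + np.array([1.0, -2.0])

z, z_moved = complex_preshape(X), complex_preshape(X_moved)
# The rank-one projectors z z^* agree, so both configurations have the same shape.
same_shape = np.allclose(np.outer(z, z.conj()), np.outer(z_moved, z_moved.conj()))
print(same_shape)
```

Comparing the projectors zz^* rather than the vectors z is one simple way to test equality of points on the complex projective space, since z is only determined up to a unit complex factor.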

Examples of Planar Shapes

We take three datasets: digit3, gorf, and gorm, from the R package shapes (Dryden, 2021). The digit3 dataset consists of 30 shapes of the digit 3, each of which is represented by 13 points in $ℝ^{2}$ ; the gorf dataset consists of 30 shapes of female gorilla skulls, each of which is represented by 8 points in $ℝ^{2}$ ; the gorm dataset consists of 29 shapes of male gorilla skulls, each of which is represented by 8 points in $ℝ^{2}$ . Example shapes from these three datasets are shown in Figure 6. The cumulative ratios of variance explained by the first 5 principal components² of PNG, PGA, and PNSS are shown in Figure 7. It can be seen from Figure 7 that the proposed PNG achieves higher explained variance than both PGA and PNSS in most cases.

Figure 6: — Example shapes from the three datasets.

Figure 7: — Cumulative explained variance by the first 5 principal components of PNG, PGA, and PNSS.

OASIS Corpus Callosum Data Experiment

The OASIS database (Marcus et al., 2007) is a publicly available database that contains T1-MR brain scans of subjects of age ranging from 18 to 96. In particular, it includes subjects that are clinically diagnosed with mild to moderate Alzheimer’s disease. We further classify the subjects into three groups: young (aged between 10 and 40), middle-aged (aged between 40 and 70), and old (aged above 70). For demonstration, we randomly choose 4 brain scans within each decade, totalling 36 brain scans. From each scan, the Corpus Callosum (CC) region is segmented and 250 points are taken on the boundary of the CC region. See Figure 8 for samples of the segmented corpus callosi. In this case, k = 250 and the shape space is $Σ_{2}^{250} ≅ ℂ P^{248} ≅ Gr (1, ℂ^{249})$ . Results of application of the three methods to this data are shown in Figure 9.

Figure 8: — Example Corpus Callosi shapes from three distinct age groups, each depicted using the boundary point sets.

Figure 9: — Cumulative explained variance captured by the first 10 principal components of PNG, PGA, and PNSS respectively.

Since the data are divided into three groups (young, middle-aged, and old), we can apply the sPNG described in Section 4.2 to reduce the dimension. The purpose of this experiment is not to demonstrate state-of-the-art classification accuracy for this dataset. Instead, our goal here is to demonstrate that the proposed nested Grassmann representation in a supervised setting is much more discriminative than the competition, namely the supervised PGA. Hence, we choose a simple classifier, the support vector machine (SVM) (Vapnik, 1995), to highlight the aforementioned discriminative power of the nested Grassmann over PGA.

For comparison, the PGA can be easily extended to supervised PGA (sPGA) by first diffeomorphically mapping all the data to the tangent space anchored at the FM and then performing supervised PCA (Bair et al., 2006; Barshan et al., 2011) on the tangent space. However, generalizing PNSS to the supervised case is nontrivial and is beyond the scope of this paper. Therefore, we limit our comparison to the unsupervised PNSS. In this demonstration, we apply an SVM on the scores obtained from different dimension reduction algorithms, and we choose only the first three principal scores to show that even with the 3-dimensional representation of the original shapes, we can still achieve good classification results. The results are shown in Table 1. These results are in accordance with our expectation since in sPNG, we seek a projection that minimizes the within-group variance while maximizing the between-group variance. However, as we observed earlier, the constraint of requiring the geodesic submanifold to pass through the FM is not well suited for this dataset, which has a large variance across the data. This accounts for why the sPNG exhibits far superior performance compared to sPGA in accuracy.

Table 1:

Classification accuracies of an SVM applied to the first three principal scores obtained from sPNG, PNG, sPGA, PGA, and PNSS.

Method	Accuracy
sPNG	83.33%
PNG	75%
sPGA	66.67%
PGA	63.89%
PNSS	80.56%

6. Conclusion

In this work, we proposed a novel nested geometric structure for homogeneous spaces and used this structure to achieve dimensionality reduction for data residing in Grassmann manifolds. We also discussed how this nested geometric structure serves as a natural generalization of other existing nested geometric structures in the literature, namely, spheres and the manifold of SPD matrices. Specifically, we showed that a lower dimensional Grassmann manifold can be embedded into a higher dimensional Grassmann manifold and via this embedding we constructed a sequence of nested Grassmann manifolds. Compared to the PGA, which is designed for general Riemannian manifolds, the proposed method can capture a higher percentage of data variance after reducing the dimensionality. This is primarily because our method, unlike the PGA, does not require the submanifold to be a geodesic submanifold and to pass through the Fréchet mean of the data. Succinctly, the nested Grassmann structure allows us to fit the data to a larger class of submanifolds than PGA. We also proposed a supervised dimensionality reduction technique which simultaneously differentiates data classes while reducing dimensionality. Efficacy of our method was demonstrated on the OASIS Corpus Callosi data for dimensionality reduction and classification. We showed that our method outperforms the widely used PGA and the recently proposed PNSS by a large margin.

Acknowledgments

This research was in part funded by the NSF grant IIS-1724174, the NIH NINDS and NIA via R01NS121099 to Vemuri and the MOST grant 110–2118-M-002–005-MY3 to Yang.

Footnotes

Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.

Conflicts of Interest

We declare we don’t have conflicts of interest.

¹ Our code is available at https://github.com/cvgmi/NestedGrassmann.

² Here the principal components in PNG and PGA are complex whereas the principal components in PNSS are real. Hence, we choose 5 principal components in PNG and PGA and 10 principal components in PNSS, so that the reduced (real) dimension is 10 in all three cases.

Contributor Information

Chun-Hao Yang, Institute of Applied Mathematical Science, National Taiwan University, Taipei, Taiwan.

Baba C. Vemuri, Department of CISE, University of Florida, Gainesville, FL, USA

References

Absil P-A, Mahony Robert, and Sepulchre Rodolphe. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica, 80(2):199–220, 2004.
Bair Eric, Hastie Trevor, Paul Debashis, and Tibshirani Robert. Prediction by supervised principal components. Journal of the American Statistical Association, 101(473):119–137, 2006.
Banerjee Monami, Chakraborty Rudrasis, and Vemuri Baba C. Sparse exact PGA on Riemannian manifolds. In Proceedings of the IEEE International Conference on Computer Vision, pages 5010–5018, 2017.
Barshan Elnaz, Ghodsi Ali, Azimifar Zohreh, and Jahromi Mansoor Zolghadri. Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recognition, 44(7):1357–1371, 2011.
Basser Peter J, Mattiello James, and LeBihan Denis. MR diffusion tensor spectroscopy and imaging. Biophysical Journal, 66(1):259–267, 1994.
Callaghan Paul T. Principles of Nuclear Magnetic Resonance Microscopy. Oxford University Press, 1993.
Chakraborty Rudrasis, Seo Dohyung, and Vemuri Baba C. An efficient exact-PGA algorithm for constant curvature manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3976–3984, 2016.
Cheeger Jeff and Ebin David Gregory. Comparison theorems in Riemannian geometry, volume 9. North-Holland, Amsterdam, 1975.
Chikuse Yasuko. Statistics on Special Manifolds, volume 174. Springer Science & Business Media, 2003.
Damon James and Marron JS. Backwards principal component analysis and principal nested relations. Journal of Mathematical Imaging and Vision, 50(1–2):107–114, 2014.
Dryden IL. shapes package. R Foundation for Statistical Computing, Vienna, Austria, 2021. URL http://www.R-project.org. Contributed package, Version 1.2.6.
Dryden Ian L and Mardia Kanti V. Statistical shape analysis: with applications in R, volume 995. John Wiley & Sons, 2016.
Dryden Ian L, Kim Kwang-Rae, Laughton Charles A, Le Huiling, et al. Principal nested shape space analysis of molecular dynamics data. Annals of Applied Statistics, 13(4):2213–2234, 2019.
Edelman Alan, Arias Tomás A, and Smith Steven T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
Feragen Aasa, Nielsen Mads, Vedel Jensen Eva Bjørn, du Plessis Andrew, and Lauze François. Geometry and statistics: Manifolds and stratified spaces. Journal of Mathematical Imaging and Vision, 50(1):1–4, 2014.
Fletcher P Thomas, Lu Conglin, Pizer Stephen M, and Joshi Sarang. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004.
Goresky Mark and MacPherson Robert. Stratified Morse Theory. Springer, 1988.
Harandi Mehrtash, Salzmann Mathieu, and Hartley Richard. Dimensionality reduction on SPD manifolds: The emergence of geometry-aware methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(1):48–62, 2018.
Hastie Trevor and Stuetzle Werner. Principal curves. Journal of the American Statistical Association, 84(406):502–516, 1989.
Hauberg Søren. Principal curves on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1915–1921, 2016.
Helgason Sigurdur. Differential geometry, Lie groups, and symmetric spaces. Academic Press, 1979.
Huckemann Stephan, Hotz Thomas, and Munk Axel. Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica, pages 1–58, 2010.
Jaquier Noémie and Rozo Leonel. High-dimensional Bayesian optimization via nested Riemannian manifolds. Advances in Neural Information Processing Systems, 33:20939–20951, 2020.
Jung Sungkyu, Dryden Ian L, and Marron JS. Analysis of principal nested spheres. Biometrika, 99(3):551–568, 2012.
Kendall David G. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.
Marcus Daniel S, Wang Tracy H, Parker Jamie, Csernansky John G, Morris John C, and Buckner Randy L. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience, 19(9):1498–1507, 2007.
Pennec Xavier et al. Barycentric subspace analysis on manifolds. The Annals of Statistics, 46(6A):2711–2746, 2018.
Sommer Stefan, Lauze François, Hauberg Søren, and Nielsen Mads. Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. In European Conference on Computer Vision, pages 43–56. Springer, 2010.
Townsend James, Koep Niklas, and Weichwald Sebastian. Pymanopt: A Python toolbox for optimization on manifolds using automatic differentiation. Journal of Machine Learning Research, 17(137):1–5, 2016. URL http://jmlr.org/papers/v17/16-177.html.
Vapnik Vladimir. The Nature of Statistical Learning Theory. Springer Science & Business Media, 1995.
Ye Ke and Lim Lek-Heng. Schubert varieties and distances between subspaces of different dimensions. SIAM Journal on Matrix Analysis and Applications, 37(3):1176–1197, 2016.
Zhang Miaomiao and Fletcher Tom. Probabilistic principal geodesic analysis. In Advances in Neural Information Processing Systems, pages 1178–1186, 2013.
