Author manuscript; available in PMC: 2025 Aug 21.
Published in final edited form as: IEEE Trans Signal Process. 2024 Sep 10;72:4428–4443. doi: 10.1109/tsp.2024.3457529

Principal Component Analysis in Space Forms

Puoya Tabaghi 1, Michael Khanzadeh 2, Yusu Wang 3, Siavash Mirarab 4
PMCID: PMC12366648  NIHMSID: NIHMS2096070  PMID: 40843461

Abstract

Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conform to Euclidean geometry, for specific data types, such as hierarchical and cyclic data structures, other spaces are more appropriate. We study PCA in space forms, that is, Riemannian manifolds with constant curvature. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. We show that it outperforms alternative methods in estimating true subspaces (on simulated data) with respect to convergence speed or accuracy, often both.

Index Terms: Principal component analysis, Riemannian manifolds, hyperbolic and spherical spaces

I. Introduction

GIVEN a set of multivariate points, principal component analysis (PCA) finds orthogonal basis vectors so that different components of the data, in the new coordinates, become uncorrelated and the leading bases carry the largest projected variance of the points. PCA is related to factor analysis [1], the Karhunen-Loève expansion, and singular value decomposition [2] — with a history going back to the 18th century [3]. The modern formalism of PCA goes back to the work of Hotelling [4]. Owing to its interpretability and flexibility, PCA has been an indispensable tool in data science applications [5]. The PCA formulation has been studied extensively in the literature. Tipping and Bishop [6] established a connection between factor analysis and PCA in a probabilistic framework. Other extensions have been proposed [7], e.g., Gaussian process [8] and sensible [9], Bayesian [10], sparse [11], [12], [13], [14], and robust PCA [15].

PCA's main features are its linearity and nested optimality of subspaces with different dimensions. PCA uses a linear transformation to extract features. Thus, applying PCA to non-Euclidean data ignores their geometry, produces points that may not belong to the original space, and breaks downstream applications relying on this geometry [16], [17], [18], [19].

We focus on space forms: complete, simply connected Riemannian manifolds of dimension d ≥ 2 and constant curvature — spherical, Euclidean, or hyperbolic spaces (positive, zero, and negative curvatures) [20]. Space forms have gained attention in the machine learning community due to their ability to represent many forms of data. Hyperbolic spaces are suitable for hierarchical structures [18], [21], [22], [23], [24], biological data [25], [26], and phylogenetic trees [17]. Spherical spaces find application in text embeddings [27], longitudinal data [28], and cycle structures in graphs [29].

To address the shortcomings of Euclidean PCA for non-Euclidean data, several authors propose Riemannian PCA methods [16], [30], [31], [32], [33], [34], [35], [36], [37], [38]. Riemannian manifolds generally lack a vector space structure [39], posing challenges for defining principal components. A common approach for dimensionality reduction of manifold-valued data relies on tangent spaces. Fletcher et al. [33] propose a cost to quantify the quality of a Riemannian affine subspace but use a heuristic approach, principal geodesics analysis (PGA), to optimize it: (1) the base point (intrinsic mean) is the solution to a fixed-point problem, (2) Euclidean PCA in tangent space estimates the low-rank tangent vectors. Even more principled approaches do not readily yield tractable solutions necessary for analyzing large-scale data [34], as seen in spherical and hyperbolic PCAs [28], [40], [41].

Despite recent progress, PCA in space forms remains inadequately explored. In general, cost-based Riemannian (e.g., spherical and hyperbolic) PCAs rely on finding the optimal Riemannian affine subspace by minimizing a nonconvex function. The cost function, proxy, or methodology is usually inspired by the ℓ₂² cost, with no definite justification for it [16], [28], [33], [35], [40], [41]. These algorithms rely on iterative methods to estimate the Riemannian affine subspaces, e.g., gradient descent, fixed-point iterations, and proximal alternating minimization; they are slow to converge and require parameter tuning. There is also no guarantee that the estimated Riemannian affine subspaces form a total order under inclusion (i.e., optimal higher-dimensional subspaces include lower-dimensional ones) unless cost minimization is performed in a greedy (suboptimal) fashion by building high-dimensional subspaces on top of previously estimated low-dimensional ones. Notably, Chakraborty et al. propose a greedy PGA for space forms by estimating one principal geodesic at a time [42]. They derive an analytic formula for projecting a point onto a parameterized geodesic, which simplifies the projection step of PGA. However, we still have to solve a nonconvex optimization problem (with no theoretical guarantees) to estimate the principal geodesic at each iteration.

We address PCA limitations in space forms by proposing a closed-form, theoretically optimal, and computationally efficient method to derive all principal geodesics at once. We begin with a differential geometric view of Euclidean PCA (Section II), followed by a generic description of Riemannian PCA (Section III). In this view, a proper PCA cost function must (1) naturally define a centroid for manifold-valued points and (2) yield theoretically optimal affine subspaces forming a total order under inclusion. We introduce proper costs for spherical (Section IV) and hyperbolic (Section V) PCA problems. Minimizing each cost function leads to an eigenequation, which can be effectively solved. For hyperbolic PCA, the optimal affine subspace solves an eigenequation in Lorentzian space which is equipped with an indefinite inner product. These results give us efficient algorithms to derive hyperbolic principal components. We delegate all proofs to the Appendix.

A. Preliminaries and Notations

Let (ℳ, g) be a Riemannian manifold. The tangent space T_pℳ is the collection of all tangent vectors at p ∈ ℳ. The Riemannian metric g_p : T_pℳ × T_pℳ → R is given by a positive-definite inner product and depends smoothly on p. We use g_p to define notions such as subspaces, norms, and angles, similar to inner product spaces. For any subspace H ⊆ T_pℳ, we define its orthogonal complement as follows:

H^⊥ = {h′ ∈ T_pℳ : g_p(h, h′) = 0, ∀ h ∈ H} ⊆ T_pℳ. (1)

The norm of v ∈ T_pℳ is ‖v‖ := √(g_p(v, v)). We denote the length of a smooth curve γ : [0, 1] → ℳ as L[γ] = ∫₀¹ ‖γ′(t)‖ dt. A geodesic γ_{p₁,p₂} is the shortest-length path between p₁ and p₂, that is, γ_{p₁,p₂} = argmin_γ {L[γ] : γ(0) = p₁, γ(1) = p₂}. Interpreting the parameter t as time, if a geodesic γ(t) starts at γ(0) = p with initial velocity γ′(0) = v ∈ T_pℳ, the exponential map exp_p(v) gives its position at t = 1. For p, x ∈ ℳ, the logarithmic map log_p(x) gives the initial velocity needed to move (with constant speed) along the geodesic from p to x in one time step. A Riemannian manifold is geodesically complete if the exponential and logarithmic maps are well-defined operators at every point p [43]. A submanifold ℳ′ of a Riemannian manifold (ℳ, g) is geodesic if any geodesic on ℳ′ with its induced metric is also a geodesic on (ℳ, g). For N ∈ N, we let [N] := {1, …, N} and [N]₀ := [N] ∪ {0}. The variable x₁ can be an element of the vector x = (x₀, …, x_{D−1})^⊤ ∈ R^D, or an indexed vector, e.g., x₁, …, x_N ∈ R^D; the distinction will be clear from the context. We use E_N[·] to denote the empirical mean of its inputs with indices in [N].

II. Principal Component Analysis — Revisited

Similar to the notion by Pearson [44], PCA finds the optimal low-dimensional affine space to represent data. Let p ∈ R^D and let the column span of H ∈ R^{D×K} be a subspace. For the affine subspace p + H, PCA assumes the following cost:

cost_{p+H}(𝒳) = E_N[f(d(x_n, 𝒫_{p+H}(x_n)))],

where 𝒳 = {x_n ∈ R^D : n ∈ [N]}, 𝒫_{p+H}(x_n) = argmin_{x ∈ p+H} d(x, x_n), d(·, ·) computes the ℓ₂ distance, and the distortion function is f(x) = x². This formalism relies on (1) an affine subspace p + H, (2) the projection operator 𝒫_{p+H}, and (3) the distortion function f. To generalize affine subspaces to Riemannian manifolds, consider parametric lines:

γ_{p,x}(t) = (1 − t)p + tx, and γ_h(t) = p + th, (2)

where h ∈ H^⊥, the orthogonal complement of the subspace H; see Fig. 1(a) and 1(b). We reformulate affine subspaces as:

p + H = {x ∈ R^D : ⟨x − p, h⟩ = 0, for all h ∈ H^⊥} = {x ∈ R^D : ⟨γ′_{p,x}(0), γ′_h(0)⟩ = 0, for all h ∈ H^⊥},

where ⟨·, ·⟩ is the dot product and γ′(t₀) := (d/dt)γ(t)|_{t=t₀}.

Fig. 1.

(a,b) One- (a) and two-dimensional (b) affine subspaces in R³. We show the subspaces (H^⊥) at the point p instead of the origin. We may define the same Riemannian affine subspace using other base points, e.g., p′. (c) A two-dimensional affine subspace in a hyperbolic space (Poincaré model), where h ∈ T_p𝕀³ = R³.

Definition 1: An affine subspace is a set of points x for which there exists a p ∈ R^D such that the tangent of the line γ_{p,x} (i.e., γ′_{p,x}(0)) is normal to a set of tangent vectors at p.

Definition 1 requires dim(H^⊥) parameters to describe p + H. For example, in R³, we need two orthonormal vectors to represent a one-dimensional affine subspace; see Fig. 1(a). We use Definition 1 since it describes affine subspaces in terms of lines and tangent vectors, not a global coordinate system.

III. Riemannian Principal Component Analysis

We next introduce Riemannian affine subspaces and then propose a generic framework for Riemannian PCA.

A. Riemannian Affine Subspaces

The notion of affine subspaces can be extended to Riemannian manifolds using tangent subspaces of the Riemannian manifold [33]. The Riemannian affine subspace is the image of a subspace of Tp under the exponential map, i.e.,

ℳ_H = exp_p(H) := {exp_p(h) : h ∈ H}, (3)

where H is a subspace of T_pℳ and p ∈ ℳ. Equivalently, the Riemannian affine subspace is defined as follows.

Definition 2: Let (ℳ, g) be a geodesically complete Riemannian manifold, p ∈ ℳ, and H ⊆ T_pℳ a subspace. We let ℳ_H = {x ∈ ℳ : g_p(log_p(x), h) = 0, ∀ h ∈ H^⊥}, where H^⊥ is the orthogonal complement of H; see equation (1).

The set ℳ_H is a collection of points on geodesics originating at p such that their initial velocities are normal to the vectors in H^⊥ ⊆ T_pℳ, or equivalently belong to the subspace H ⊆ T_pℳ; cf. Definition 1.

Example 1: When H is a one-dimensional subspace, ℳ_H contains the geodesics that pass through p with initial velocity in H, i.e., ℳ_H = {γ(t) : geodesic γ where γ(0) = p, γ′(0) ∈ H, t ∈ R}. Thus, with equation (3), geodesics are one-dimensional Riemannian affine subspaces.

Example 2: The Euclidean exponential map is exp_p(h) = p + h; see Table I. Therefore, equation (3) recovers Euclidean affine subspaces, i.e., exp_p(H) = {p + h : h ∈ H} := p + H.

TABLE I.

Summary of Relevant Operators in Euclidean, Spherical, and Hyperbolic Spaces

ℳ | T_pℳ | g_p(u, v) | log_p(x), θ := √|C|·d(x, p) | exp_p(v) | d(x, p)
R^D | R^D | ⟨u, v⟩ | x − p | p + v | ‖x − p‖₂
S^D | p^⊥ | ⟨u, v⟩ | (θ/sin θ)(x − cos(θ)p) | cos(√C‖v‖)p + sin(√C‖v‖)·v/(√C‖v‖) | (1/√C)·arccos(C⟨x, p⟩)
H^D | p^⊥ | [u, v] | (θ/sinh θ)(x − cosh(θ)p) | cosh(√(−C)‖v‖)p + sinh(√(−C)‖v‖)·v/(√(−C)‖v‖) | (1/√(−C))·acosh(C[x, p])
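The spherical row of Table I can be sanity-checked numerically. Below is a minimal sketch (with C = 1 for simplicity; the helper names sph_exp and sph_log are ours, not from the paper) verifying that the logarithmic map inverts the exponential map:

```python
import numpy as np

def sph_exp(p, v, C=1.0):
    """Exponential map on S^D (Table I): follow the geodesic from p with velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    t = np.sqrt(C) * nv
    return np.cos(t) * p + np.sin(t) * v / (np.sqrt(C) * nv)

def sph_log(p, x, C=1.0):
    """Logarithmic map on S^D (Table I): initial velocity of the geodesic from p to x."""
    theta = np.arccos(np.clip(C * np.dot(x, p), -1.0, 1.0))  # equals sqrt(C) * d(x, p)
    if theta < 1e-12:
        return np.zeros_like(p)
    return (theta / np.sin(theta)) * (x - np.cos(theta) * p)

rng = np.random.default_rng(0)
p = rng.standard_normal(4); p /= np.linalg.norm(p)   # a point on S^3 (C = 1)
v = rng.standard_normal(4); v -= (v @ p) * p         # a tangent vector at p
v *= 0.3 / np.linalg.norm(v)                         # small step along the geodesic
x = sph_exp(p, v)
assert np.allclose(sph_log(p, x), v, atol=1e-8)      # log inverts exp
```

The same pattern applies to the hyperbolic row with cosh/sinh and the Lorentzian inner product.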

Recall that a nonempty set V ⊆ R^D is a Euclidean affine subspace if and only if there exists p ∈ V such that α(v₁ − p) + p ∈ V and (v₁ − p) + (v₂ − p) + p ∈ V for all α ∈ R and v₁, v₂ ∈ V. We have a similar definition for Riemannian affine subspaces.

Definition 3: Let (ℳ, g) be a geodesically complete Riemannian manifold. Then, a nonempty V ⊆ ℳ is an affine set if and only if there exists p ∈ V such that exp_p(α log_p(v₁)) ∈ V and exp_p(log_p(v₁) + log_p(v₂)) ∈ V for all α ∈ R and v₁, v₂ ∈ V.

B. Proper Cost for Riemannian PCA

Similar to Euclidean PCA, Riemannian PCA aims to find a (Riemannian) affine subspace with minimum average distortion between points and their projections.

Definition 4: Let (ℳ, g) be a geodesically complete Riemannian manifold equipped with the distance function d, let p ∈ ℳ, and let H ⊆ T_pℳ be a subspace. A geodesic projection of x ∈ ℳ onto ℳ_H is 𝒫_{ℳ_H}(x) ∈ argmin_{y ∈ ℳ_H} d(x, y). If argmin_{y ∈ ℳ_H} d(x, y) ≠ ∅, then min_{y ∈ ℳ_H} d(x, y) = ‖log_{𝒫_{ℳ_H}(x)}(x)‖ for any geodesic projection 𝒫_{ℳ_H}(x).

Remark 1: Projecting a manifold-valued point onto a Riemannian affine subspace is not a trivial task, often requiring the solution of a nonconvex optimization problem over a submanifold, i.e., solving argmin_{y ∈ ℳ_H} d(x, y). Definition 4 states that if a solution exists (which is not always guaranteed), the projection distance must equal ‖log_{𝒫_{ℳ_H}(x)}(x)‖. This also requires computing the logarithmic map, which may not be available for all Riemannian manifolds.

In Euclidean PCA, minimizing the ℓ₂² cost is equivalent to maximizing the variance of the projected data. To avoid confusing arguments regarding the notions of variance and centroid, we formalize the cost (parameterized by ℳ_H) in terms of the projection distance, viz.,

cost_{ℳ_H}(𝒳) = E_N[f(‖log_{𝒫_{ℳ_H}(x_n)}(x_n)‖)], (4)

where 𝒳 = {x_n ∈ ℳ : n ∈ [N]} and f : R₊ → R is a monotonically increasing distortion function. The projection point 𝒫_{ℳ_H}(x) may not be unique. The minimizer of the cost, if it exists, is the best affine subspace to represent the manifold-valued points.

Definition 5: Riemannian PCA aims to minimize the cost in equation (4) — for a specific choice of distortion function f.

Choice of f. The closed-form solution for the optimal Euclidean affine subspace is due to letting f(x)=x2. This is a proper cost function with the following properties:

  1. Consistent Centroid. The optimal zero-dimensional affine subspace (a point) is the centroid of the data points, i.e., p* = argmin_{y ∈ R^D} E_N[f(d(x_n, y))] = E_N[x_n].

  2. Nested Optimality. The optimal affine subspaces form a nested set, i.e., p* ∈ p + H₁* ⊆ p + H₂* ⊆ ⋯, where p + H_d* is the optimal d-dimensional affine subspace.
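Both properties can be verified directly for Euclidean PCA with an SVD of the centered data; a small numerical sketch (dimensions are arbitrary):

```python
import numpy as np

# Numerical check of the two properties for Euclidean PCA (f(x) = x^2):
# the optimal base point is the sample mean, and the optimal subspaces of
# increasing dimension are nested (each basis extends the previous one).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))           # N = 100 points in R^5
p_star = X.mean(axis=0)                     # consistent centroid
_, _, Vt = np.linalg.svd(X - p_star, full_matrices=False)
H1 = Vt[:1].T                               # optimal 1-D subspace basis
H2 = Vt[:2].T                               # optimal 2-D subspace basis
# H1 lies inside span(H2): projecting H1 onto span(H2) leaves it unchanged.
assert np.allclose(H2 @ (H2.T @ H1), H1)
```

Nestedness holds here because all optimal subspaces are read off one eigendecomposition, which is exactly the behavior the proposed space-form costs are designed to preserve.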

Definition 6: For Riemannian PCA, we call cost_{ℳ_H}(𝒳) a proper cost function if its minimizers satisfy the consistent centroid and nested optimality conditions.

Deriving the logarithm operator is not a trivial task for general Riemannian manifolds, e.g., the manifold of rank-deficient positive semidefinite matrices [45]. Focusing on constant-curvature Riemannian manifolds, we propose distortion functions that, unlike existing methods, arrive at proper cost functions with closed-form optimal solutions.

IV. Spherical PCA

Consider the spherical manifold (S^D, g^S) with curvature C > 0, where S^D = {x ∈ R^{D+1} : ⟨x, x⟩ = C^{−1}} is a sphere with radius C^{−1/2}, and g_p^S(u, v) = ⟨u, v⟩ computes the dot product of u, v ∈ T_pS^D = {x ∈ R^{D+1} : ⟨x, p⟩ = 0} := p^⊥.

A. Spherical Affine Subspace and the Projection Operator

Let p ∈ S^D and let H ⊆ T_pS^D = p^⊥ be a subspace. Following Definition 2 and Table I, the spherical affine subspace is:

S_H^D = {x ∈ S^D : ⟨x, h⟩ = 0, ∀ h ∈ H^⊥} = S^D ∩ (p ⊕ H),

where ⊕ is the direct sum operator, i.e., p ⊕ H = {αp + h : h ∈ H, α ∈ R}, and H^⊥ is the orthogonal complement of H; see equation (1). This matches Pennec's notion of the metric completion of the exponential barycentric spherical subspace [37].

Claim 1: S_H^D is a geodesic submanifold.

There are orthogonal tangent vectors h₁, …, h_{D−K} that form a complete basis for H^⊥, i.e., ⟨h_i, h_j⟩ = Cδ_{i,j}, where δ_{i,j} = 0 if i ≠ j and δ_{i,j} = 1 if i = j. Using these basis vectors, we derive a simple expression for the projection distance.

Proposition 1: For any S_H^D and x ∈ S^D, we have

min_{y ∈ S_H^D} d(x, y) = C^{−1/2} arccos(√(1 − Σ_{k∈[D−K]} ⟨x, h_k⟩²)),

where {h_k}_{k∈[D−K]} are the complete orthogonal basis vectors of H^⊥. Both S_H^D and S^D have the same fixed curvature C > 0.

For points at distance C^{−1/2}·π/2 from S_H^D, there is no unique projection onto the affine subspace. Nevertheless, Proposition 1 provides a closed-form expression for the projection distance in terms of basis vectors for H^⊥. This distance monotonically increases with the length of the residual of x on H^⊥, i.e., Σ_{k∈[D−K]} ⟨x, h_k⟩². Since dim(H) ≪ D is common, switching to the basis of H helps us represent the projection distance with fewer parameters.

Proposition 2: For any S_H^D and x ∈ S^D, we have

min_{y ∈ S_H^D} d(x, y) = C^{−1/2} arccos(√(C²⟨x, p⟩² + Σ_{k∈[K]} ⟨x, h_k⟩²)),

where {h_k}_{k∈[K]} are the complete orthogonal basis vectors of H.
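As a sanity check, the two propositions must agree, since for a unit-norm x (C = 1) the squared components along p, H, and H^⊥ sum to one. A quick numerical sketch with an arbitrary orthonormal frame:

```python
import numpy as np

# Sanity check (C = 1): the projection distance from Proposition 1 (basis of
# H-perp) matches Proposition 2 (base point p plus basis of H). The dimensions
# below are illustrative.
rng = np.random.default_rng(2)
D, K = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((D + 1, D + 1)))  # orthonormal frame of R^{D+1}
p, H, Hperp = Q[:, 0], Q[:, 1:K + 1], Q[:, K + 1:]        # p, basis of H, basis of H-perp
x = rng.standard_normal(D + 1); x /= np.linalg.norm(x)    # a point on S^D

d1 = np.arccos(np.sqrt(1 - np.sum((Hperp.T @ x) ** 2)))               # Proposition 1
d2 = np.arccos(np.sqrt((p @ x) ** 2 + np.sum((H.T @ x) ** 2)))        # Proposition 2
assert np.isclose(d1, d2)
```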

Next, we derive an isometry between S_H^D and S^{dim(H)} — where both have the same fixed curvature C > 0.

Theorem 2: The isometry 𝒬 : S_H^D → S^K and its inverse are

𝒬(x) = C^{−1/2}(C⟨x, p⟩, ⟨x, h₁⟩, …, ⟨x, h_K⟩)^⊤, 𝒬^{−1}(y) = C^{−1/2}(y₀Cp + Σ_{k=1}^{K} y_k h_k),

where {h_k}_{k∈[K]} are the complete orthogonal basis vectors of H.

Corollary 1: The dimension of S_H^D is dim(H).

Finally, we can provide an alternative view of spherical affine subspaces based on sliced unitary matrices {G ∈ R^{(D+1)×(K+1)} : G^⊤G = I_{K+1}}.

Claim 3: For any S_H^D, there is a sliced-unitary operator G : S^{dim(H)} → S_H^D and vice versa.

B. Minimum Distortion Spherical Subspaces

To define principal components, we need a specific choice of distortion function f; see equation (4). Before presenting our choice, let us discuss previously studied cost functions.

1). Review of Existing Work:

Dai and Müller consider an intrinsic PCA for smooth functional data 𝒳 on a Riemannian manifold [28] with the distortion function f(x) = x², i.e.,

cost^{Dai}_{S_H^D}(𝒳) = E_N[f(min_{y ∈ S_H^D} d(x_n, y))]. (5)

Their algorithm, Riemannian functional principal component analysis (RFPCA), first solves for the base point — the optimal zero-dimensional affine subspace, i.e., the Fréchet mean p* = argmin_{p ∈ S^D} E_N[d(x_n, p)²]. Then, they project each point to T_{p*}S^D using the logarithmic map. Next, they perform PCA on the resulting tangent vectors to obtain the K-dimensional tangent subspace. Finally, they map the projected tangent vectors back to the spherical space using the exponential map. Despite its simplicity, this approach suffers from four shortcomings. (1) There is no closed-form expression for the Fréchet mean of spherical data. (2) A theoretical analysis of the computational complexity of estimating the Fréchet mean is not yet known, and its computation involves an argmin operation that oftentimes cannot be easily differentiated [46]. (3) Even for an accurately estimated Fréchet mean, there is no guarantee that the Fréchet mean is the optimal base point for problem (5). Huckemann and Ziezold [47] show that the Fréchet mean may not belong to the optimal one-dimensional affine spherical subspace. (4) Even if the Fréchet mean happens to be the optimal base point, performing PCA in the tangent space does not solve problem (5).

Liu et al. propose a spherical matrix factorization problem:

min_{G ∈ R^{(D+1)×(K+1)}, y_n ∈ S^K, n∈[N]} E_N[‖x_n − Gy_n‖₂²] subject to G^⊤G = I_{K+1}, (6)

where 𝒳 = {x_n ∈ R^{D+1} : n ∈ [N]} is the measurement set and y₁, …, y_N ∈ S^K are features in a spherical space with C = 1 [40]. They propose a proximal algorithm to solve for the affine subspace and features. This formalism is not a spherical PCA because the measurements do not belong to a spherical space. The objective in equation (6) aims to best project Euclidean points onto a low-dimensional spherical affine subspace with respect to the squared Euclidean distance — refer to Claim 2. Nevertheless, if we change their formalism and let the input space be a spherical space, we arrive at:

cost^{Liu}_{S_H^D}(𝒳) = E_N[−cos(min_{y_n ∈ S^K} d(x_n, Gy_n))] = E_N[f(d(x_n, 𝒫_H(x_n)))], (7)

where H is a tangent subspace that corresponds to G (see Claim 2) and 𝒫_H is the spherical projection operator. This formalism uses the distortion function f(x) = −cos(x).

Nested spheres [48] by Jung et al. is an alternative procedure for fitting principal nested spheres to iteratively reduce the dimensionality of data. It finds the optimal (D−1)-dimensional subsphere 𝒰_{D−1} by minimizing the following cost,

cost^{Jung}_{𝒰_{D−1}}(𝒳) = E_N[(d(x_n, p) − r)²],

where 𝒰_{D−1} = {x ∈ S^D : d(x, p) = r} — over p ∈ S^D and r ∈ R₊. This is a constrained nonlinear optimization problem without closed-form solutions. Once they estimate the optimal 𝒰_{D−1}, they map each point to the lower-dimensional spherical space S^{D−1} — and repeat this process until they reach the target dimension. The subspheres are not necessarily great spheres, making this decomposition nongeodesic.

2). A Proper Cost Function for Spherical PCA:

In contrast to the distortions f(x) = −cos(x) and f(x) = x² used by Liu et al. [40] and Dai and Müller [28], we choose f(x) = sin²(√C x). Using Proposition 1, we arrive at:

cost_{S_H^D}(𝒳) = E_N[Σ_{k∈[D−K]} ⟨x_n, h_k⟩²], (8)

i.e., the average ℓ₂² norm of the projections of the points in the directions of the vectors h₁, …, h_{D−K} ∈ T_pS^D. The expression (8) leads to a tractable constrained optimization problem.

Claim 4: Let x₁, …, x_N ∈ S^D. Spherical PCA with the cost in equation (8) aims to find p ∈ S^D and orthogonal h₁, …, h_{D−K} ∈ T_pS^D that minimize Σ_{k∈[D−K]} h_k^⊤ C_x h_k, where C_x = E_N[x_n x_n^⊤].

The solution to the problem in Claim 4 is the minimizer of the cost in equation (8): a set of orthogonal vectors h₁, …, h_{D−K} ∈ p^⊥ that capture, in quantum physics terminology, the least possible energy of C_x.

Theorem 5: Let x₁, …, x_N ∈ S^D. Then, an optimal solution for p is the (scaled) leading eigenvector of C_x, and h₁, …, h_{D−K} (basis vectors of H^⊥) are the eigenvectors that correspond to the smallest D − K eigenvalues of the second-moment matrix C_x.

Corollary 2: The optimal subspace H is spanned by the K leading eigenvectors of C_x, discarding the first one.
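The resulting algorithm is a single eigendecomposition. A minimal sketch for C = 1 (the helper name spherical_pca is ours; this is a sketch of Theorem 5 and Corollary 2, not the authors' reference implementation):

```python
import numpy as np

def spherical_pca(X, K):
    """X: (N, D+1) array of unit-norm points on S^D (C = 1); returns (p, H)."""
    Cx = X.T @ X / X.shape[0]              # second-moment matrix E_N[x x^T]
    _, vecs = np.linalg.eigh(Cx)           # eigh returns ascending eigenvalues
    vecs = vecs[:, ::-1]                   # reorder to descending
    p = vecs[:, 0]                         # base point: leading eigenvector
    H = vecs[:, 1:K + 1]                   # subspace H: next K eigenvectors
    return p, H

# Points clustered around e_1 on S^2 should recover a base point aligned with e_1.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3)) * 0.2 + np.array([1.0, 0.0, 0.0])
X /= np.linalg.norm(X, axis=1, keepdims=True)
p, H = spherical_pca(X, K=1)
assert abs(p[0]) > 0.99                    # aligned (up to sign) with e_1
assert np.allclose(H.T @ p, 0, atol=1e-8)  # H is a subspace of T_p S^D
```

Note that p is recovered only up to sign, consistent with the non-uniqueness of the spherical mean discussed below.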

Claim 6: The cost function in equation (8) is proper.

The distortion in equation (8) implies a closed-form definition for the centroid of spherical data points, i.e., a zero-dimensional affine subspace that best represents the data.

Definition 7: A spherical mean μ(𝒳) for the point set 𝒳 is any point such that E_N[f(d(x_n, μ(𝒳)))] = min_{p ∈ S^D} E_N[f(d(x_n, p))]. The solution is a scaled leading eigenvector of C_x.

Interpreting the optimal base point as the spherical mean in Definition 7 shows our spherical PCA has consistent centroid and nested optimality: optimal spherical affine subspaces of different dimensions form a chain under inclusion. However, μ(𝒳) is not unique and only identifies the direction of the main component of points; see Fig. 2.

Fig. 2.

(a) A set of data points in S^D, where D = 2. (b) The best estimate for the base point p and the tangent subspace H = span{h₁} ⊆ T_pS^D — the spherical affine subspace S_H^D = (p ⊕ H) ∩ S^D. (c) The projection of the points onto S_H^D. (d) The low-dimensional features in S^K, where K = dim(S_H^D) = 1.

Remark 2: PGA [33] and RFPCA [28] involve the intensive task of Fréchet mean estimation. This involves iterative techniques like gradient descent or fixed-point iterations on nonlinear optimization objectives. There has been work on the numerical analysis of Fréchet mean computations [46]. On the other hand, SPCA [40] uses alternating linearized minimization to estimate the optimal subspace. In contrast, our method (SFPCA) requires computing the second-moment matrix C_x with a complexity of O(D²N) and involves an eigendecomposition with a worst-case complexity of O(D³).

V. Hyperbolic PCA

Let us first introduce Lorentzian spaces.

Definition 8: The Lorentzian (D+1)-space, denoted by R^{1,D}, is a vector space equipped with the Lorentzian inner product

[x, y] = x^⊤ J_D y for all x, y ∈ R^{1,D}, where J_D = diag(−1, I_D)

and I_D is the D × D identity matrix.

The D-dimensional hyperbolic manifold (H^D, g^H) has curvature C < 0, where H^D = {x ∈ R^{D+1} : [x, x] = C^{−1} and x₀ > 0} and the metric is g_p^H(u, v) = [u, v] for u, v ∈ T_pH^D = {x ∈ R^{D+1} : [x, p] = 0} := p^⊥; see Table I.

A. Eigenequation in Lorentzian spaces

Like inner product spaces, we define operators in R^{1,D}.

Definition 9: Let A ∈ R^{(D+1)×(D+1)} be a matrix (operator) in R^{1,D}. We let A^{[∗]} be the J_D-adjoint of A if and only if A^{[∗]} = J_D A^⊤ J_D. A^{[−1]} is the J_D-inverse of A if and only if (iff) A^{[−1]} J_D A = A J_D A^{[−1]} = J_D. An invertible matrix A is called J_D-unitary iff A^⊤ J_D A = J_D; see [49] for more detail.

The Lorentzian space R^{1,D} is equipped with an indefinite inner product, i.e., ∃ x ∈ R^{1,D} : [x, x] < 0. Therefore, it requires a form of eigenequation defined by its indefinite inner product. For completeness, we propose the following definition of an eigenequation in the complex Lorentzian space C^{1,D}.

Definition 10: For A ∈ C^{(D+1)×(D+1)}, v ∈ C^{1,D} is its J_D-eigenvector and λ is the corresponding J_D-eigenvalue if

A J_D v = sgn([v*, v]) λ v, where λ ∈ C, (9)

and v* is the complex conjugate of v. The sign of the norm, sgn([v*, v]), defines positive and negative J_D-eigenvectors.

Definition 10 is subtly different from hyperbolic eigenequation [50] — a special case of (A,J) eigenvalue decomposition. We prefer Definition 10 as it carries over familiar results from Euclidean to the Lorentzian space.

Proposition 3: If A = A^{[∗]}, then its J_D-eigenvalues are real.

Let v be a J_D-eigenvector of A with [v*, v] = 1. Then, [v*, A J_D v] = sgn([v*, v])λ[v*, v] = λ, the J_D-eigenvalue of A; see Definition 10. There is a connection between Euclidean and Lorentzian eigenequations. Namely, suppose A J_D ∈ C^{(D+1)×(D+1)} has an eigenvector v ∈ C^{D+1} with eigenvalue λ and [v*, v] ≠ 0. Then, (|[v*, v]|^{−1/2} v, sgn([v*, v])λ) is a J_D-eigenpair of A.

Claim 7: J_D-eigenvectors of A are parallel to eigenvectors of A J_D.
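This connection suggests a direct recipe for computing J_D-eigenpairs from a standard eigendecomposition. A sketch (the helper jd_eigen is ours; we test it on the second moment of points on H^3 with C = −1, for which C_x is positive definite and C_x J_D is similar to a symmetric matrix, hence has real eigenvalues):

```python
import numpy as np

def jd_eigen(A):
    """J_D-eigenpairs of A via the Euclidean eigendecomposition of A @ J_D (Claim 7)."""
    D1 = A.shape[0]
    J = np.diag(np.r_[-1.0, np.ones(D1 - 1)])
    lams, V = np.linalg.eig(A @ J)
    lams, V = np.real(lams), np.real(V)     # real for the matrices considered here
    pairs = []
    for lam, v in zip(lams, V.T):
        q = v @ J @ v                       # Lorentzian norm [v, v]
        s = np.sign(q)
        # J_D-eigenpair: (sgn([v, v]) * lam, v / sqrt(|[v, v]|)), plus the sign.
        pairs.append((s * lam, v / np.sqrt(abs(q)), s))
    return pairs

# Second moment of points on H^3 (C = -1): x_0 = sqrt(1 + ||xbar||^2).
rng = np.random.default_rng(4)
Xbar = 0.5 * rng.standard_normal((50, 3))
X = np.column_stack([np.sqrt(1 + np.sum(Xbar**2, axis=1)), Xbar])
Cx = X.T @ X / X.shape[0]
J = np.diag(np.r_[-1.0, np.ones(3)])
for lam, v, s in jd_eigen(Cx):
    # Defining relation (9): Cx J_D v = sgn([v, v]) * lambda * v.
    assert np.allclose(Cx @ J @ v, s * lam * v, atol=1e-6)
```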

Proposition 4 shows that the normalization factor is well-defined for full-rank matrices.

Proposition 4: If A is full-rank, then {v ∈ R^{D+1} : A J_D v = λv and [v*, v] = 0} = ∅.

Our algorithm uses the connection between Euclidean and JD-eigenequations. We extend the notion of diagonalizability to derive the optimal affine subspace; see Proposition 7.

Definition 11: A ∈ C^{(D+1)×(D+1)} is J_D-diagonalizable if and only if there is a J_D-invertible V ∈ C^{(D+1)×(D+1)} such that A J_D V = V J_D Λ, where Λ is a diagonal matrix.

B. Hyperbolic Affine Subspace and the Projection Operator

Let p ∈ H^D and let H be a K-dimensional subspace of T_pH^D = p^⊥. Following Definition 2 and Table I, we arrive at the following definition for the hyperbolic affine subspace:

H_H^D = {x ∈ H^D : [x, h] = 0, ∀ h ∈ H^⊥}, (10)

where H^⊥ is the orthogonal complement of H, i.e., H_H^D = H^D ∩ (p ⊕ H). This also coincides with the metric completion of the exponential barycentric hyperbolic subspace [37].

Claim 8: The hyperbolic subspace is a geodesic submanifold.

Lemma 1 shows that there is a complete set of orthogonal tangents h₁, …, h_{D−K} with [h_i, h_j] = −Cδ_{i,j} that span H^⊥. In Proposition 5, we provide a closed-form expression for the projection distance onto H_H^D in terms of the basis of H^⊥.

Proposition 5: For any H_H^D and x ∈ H^D, we have

min_{y ∈ H_H^D} d(x, y) = |C|^{−1/2} acosh(√(1 + Σ_{k∈[D−K]} [x, h_k]²)),

where {h_k}_{k∈[D−K]} is a complete orthogonal basis of H^⊥.

The projection distance monotonically increases with the norm of the residual of x on H^⊥, i.e., Σ_{k∈[D−K]} [x, h_k]². Proposition 5 asks for the orthogonal basis of H^⊥ — commonly, a high-dimensional space. We can instead use the basis of H to compute the projection distance.

Proposition 6: For any H_H^D and x ∈ H^D, we have

min_{y ∈ H_H^D} d(x, y) = |C|^{−1/2} acosh(√(C²[x, p]² − Σ_{k∈[K]} [x, h_k]²)),

where {h_k}_{k∈[K]} is a complete orthogonal basis of H.
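As in the spherical case, the two expressions must agree. A quick numerical check with C = −1 and the canonical frame p = e₀, H = span{e₁, …, e_K} (chosen for simplicity; any J_D-orthogonal frame works):

```python
import numpy as np

# Sanity check (C = -1): Proposition 5 (basis of H-perp) and Proposition 6
# (base point p plus basis of H) give the same projection distance.
D, K = 5, 2
J = np.diag(np.r_[-1.0, np.ones(D)])
p = np.eye(D + 1)[:, 0]                     # base point e_0 on H^D
H = np.eye(D + 1)[:, 1:K + 1]               # tangent basis of H
Hperp = np.eye(D + 1)[:, K + 1:]            # tangent basis of H-perp

rng = np.random.default_rng(5)
xbar = rng.standard_normal(D)
x = np.r_[np.sqrt(1 + xbar @ xbar), xbar]   # a point on H^D ( [x, x] = -1, x_0 > 0 )

d5 = np.arccosh(np.sqrt(1 + np.sum((Hperp.T @ J @ x) ** 2)))            # Proposition 5
d6 = np.arccosh(np.sqrt((p @ J @ x) ** 2 - np.sum((H.T @ J @ x) ** 2))) # Proposition 6
assert np.isclose(d5, d6)
```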

We represent points in H_H^D as a linear combination of the base point and tangent vectors. Given these K + 1 vectors, we can find a low-dimensional representation for points in H_H^D — reducing the dimensionality of hyperbolic data points.

Theorem 9: The isometry 𝒬 : H_H^D → H^K and its inverse are

𝒬(x) = α(C[x, p], [x, h₁], …, [x, h_K])^⊤, 𝒬^{−1}(y) = α(−y₀Cp + Σ_{k=1}^{K} y_k h_k),

where α = |C|^{−1/2} and H has complete orthogonal basis vectors {h_k}_{k∈[K]}. Both H_H^D and H^K have curvature C < 0.

Corollary 3: The affine dimension of H_H^D is dim(H).

Similar to the spherical case, we can characterize hyperbolic affine subspaces in terms of sliced J_D-unitary matrices — paving the way for constrained optimization methods over sliced J_D-unitary matrices to solve hyperbolic PCA problems.

Claim 10: For any H_H^D, there is a sliced J_D-unitary operator G : H^{dim(H)} → H_H^D, i.e., G^⊤J_D G = J_{dim(H)}, and vice versa.

C. Minimum Distortion Hyperbolic Subspaces

1). Review of Existing Work:

Chami et al. propose HoroPCA [41]. They define GH(p, q₁, …, q_K) as the geodesic hull of γ₁, …, γ_K, where γ_k is a geodesic such that γ_k(0) = p ∈ H^D and lim_{t→+∞} γ_k(t) = q_k ∈ ∂H^D for all k ∈ [K]. The geodesic hull of γ₁, …, γ_K contains the straight lines between γ_k(t) and γ_{k′}(t′) for all t, t′ ∈ R and k, k′ ∈ [K].

Claim 11: GH(p, q₁, …, q_K) is a hyperbolic affine subspace.

Their goal is to maximize a proxy for the projected variance:

cost^{Chami}_{H_H^D}(𝒳) = −E_N[d(𝒫̂_H(x_n), 𝒫̂_H(x_{n′}))²], (11)

where 𝒫^H is the horospherical projection operator — which is not a geodesic (distance-minimizing) projection. They propose a sequential algorithm to minimize the cost in equation (11), using a gradient descent method, as follows: (1) the base point is computed as the Fréchet mean via gradient descent; and (2) a higher-dimensional affine subspace is estimated based on the optimal affine subspace of lower dimension. One may formulate the hyperbolic PCA problem as follows:

min_{G ∈ R^{(D+1)×(K+1)}, y_n ∈ H^K, n∈[N]} E_N[‖x_n − Gy_n‖₂²] subject to G^⊤J_D G = J_K, (12)

where x₁, …, x_N ∈ R^{D+1} are the measurements and y₁, …, y_N ∈ H^K are low-dimensional hyperbolic features. The formulation in equation (12) leads to the decomposition of a Euclidean matrix in terms of a sliced J_D-unitary matrix and a hyperbolic matrix — a topic for future studies.

2). A Proper Cost Function for Hyperbolic PCA:

We choose f(x) = sinh²(√|C| x) to arrive at the following cost:

cost_{H_H^D}(𝒳) = E_N[Σ_{k∈[D−K]} [x_n, h_k]²]; (13)

see Proposition 5. We interpret cost_{H_H^D}(𝒳) as the aggregate dissipated energy of the points in the directions of the normal tangent vectors. If x ∈ H^D has no components in the directions of the normal tangents — i.e., [x, h_k] = 0, where h₁, …, h_{D−K} are orthogonal basis vectors for H^⊥ — then Σ_{k∈[D−K]} [x, h_k]² = 0. Our distortion function in equation (13) leads to the formulation of hyperbolic PCA as a constrained optimization problem:

Problem 12: Let x₁, …, x_N ∈ H^D and C_x = E_N[x_n x_n^⊤]. Hyperbolic PCA aims to find a point p ∈ H^D and a set of orthogonal vectors h₁, …, h_{D−K} ∈ T_pH^D = p^⊥ that minimize Σ_{k∈[D−K]} h_k^⊤ J_D C_x J_D h_k.

Problem 12 minimizes the cost in equation (13), i.e.,

cost_{H_H^D}(𝒳) = Σ_{k∈[D−K]} h_k^⊤ J_D C_x J_D h_k,

where p ∈ H^D, h₁, …, h_{D−K} ∈ p^⊥ with [h_i, h_j] = −Cδ_{i,j}, and C_x = E_N[x_n x_n^⊤] — over p ∈ H^D, H ⊆ p^⊥, and dim(H) = K. Problem 12 asks for J_D-orthogonal vectors h₁, …, h_{D−K} in an appropriate tangent space T_pH^D that capture the least possible energy of C_x with respect to the Lorentzian scalar product.

Remark 3: The spectrum of a matrix is its set of eigenvalues. Discarding an eigenvalue from the matrix's eigenvalue decomposition approximates the matrix with an error proportional to the magnitude of the discarded eigenvalue, that is, the discarded energy. Similarly, one can define the J_D-spectrum of the second-moment matrix C_x as the set of its J_D-eigenvalues. As we demonstrate in the numerical experiments, the J_D-spectrum can identify the existence of outlier hyperbolic points.

This is akin to the Euclidean PCA: the subspace is spanned by the leading eigenvectors of the covariance matrix. However, to prove the hyperbolic PCA theorem, we need a technical result on JD-diagonalizability of the second-moment matrix.

Proposition 7: If A ∈ R^{(D+1)×(D+1)} is a symmetric and J_D-diagonalizable matrix, i.e., A J_D V = V J_D Λ, with distinct (in absolute value) diagonal elements of Λ, then A = VΛV^⊤, where V is a J_D-unitary matrix.

From Proposition 7, any symmetric, JD-diagonalizable matrix with distinct (absolute) JD-eigenvalues has D positive and one negative JD-eigenvectors — all orthogonal to each other.

Theorem 13: Let x₁, …, x_N ∈ H^D and let C_x = E_N[x_n x_n^⊤] be a J_D-diagonalizable matrix. Then, the optimal solution for the point p is the scaled negative J_D-eigenvector of C_x, and the optimal h₁, …, h_{D−K} are the scaled positive J_D-eigenvectors that correspond to the smallest D − K J_D-eigenvalues of C_x. And H is spanned by the K scaled positive J_D-eigenvectors that correspond to the leading J_D-eigenvalues of C_x.

The J_D-diagonalizability condition requires C_x to be similar to a diagonal matrix. Proposition 7 provides a sufficient condition for its J_D-diagonalizability; in fact, we conjecture that symmetry is sufficient even if C_x has repeated J_D-eigenvalues.
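Under these conditions, Theorem 13 yields a direct algorithm. A minimal sketch for C = −1 (the helper name hyperbolic_pca is ours; it returns the base point and a basis of H^⊥, and assumes C_x J_D has real eigenvalues, which holds whenever C_x is positive definite):

```python
import numpy as np

def hyperbolic_pca(X, K):
    """X: (N, D+1) points on H^D (C = -1); returns base point p and a basis of H-perp."""
    N, D1 = X.shape
    J = np.diag(np.r_[-1.0, np.ones(D1 - 1)])
    Cx = X.T @ X / N                               # second-moment matrix
    lams, V = np.linalg.eig(Cx @ J)                # Euclidean proxy (Claim 7)
    lams, V = np.real(lams), np.real(V)
    signs = np.array([v @ J @ v for v in V.T])     # Lorentzian norms [v, v]
    i_neg = int(np.argmax(signs < 0))              # the unique negative direction
    p = V[:, i_neg] / np.sqrt(-signs[i_neg])       # scale so that [p, p] = -1
    if p[0] < 0:
        p = -p                                     # pick the upper sheet (x_0 > 0)
    pos = sorted((i for i in range(D1) if signs[i] > 0), key=lambda i: lams[i])
    return p, V[:, pos[:D1 - 1 - K]]               # smallest D - K positive ones

# Points spread mostly along the first spatial axis of H^2; K = 1.
rng = np.random.default_rng(6)
xbar = rng.standard_normal((200, 2)) * np.array([1.0, 0.05])
X = np.column_stack([np.sqrt(1 + np.sum(xbar**2, axis=1)), xbar])
p, Hperp = hyperbolic_pca(X, K=1)
J = np.diag([-1.0, 1.0, 1.0])
assert np.isclose(p @ J @ p, -1.0) and p[0] > 0    # p lies on H^2
assert np.allclose(Hperp.T @ J @ p, 0, atol=1e-6)  # H-perp is J_D-orthogonal to p
```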

Claim 14: The cost function in equation (13) is proper.

The proper cost in equation (13) implies the following closed-form definition for the hyperbolic centroid.

Definition 12: A hyperbolic mean of 𝒳 is μ(𝒳) if E_N[f(d(x_n, μ(𝒳)))] = min_{p ∈ H^D} E_N[f(d(x_n, p))]. The solution is the scaled negative J_D-eigenvector of C_x.

Remark 4: The formalisms of space form (Euclidean, spherical, and hyperbolic) PCAs show similarities through the use of (in)definite eigenequations. This arises from the introduction of proper cost functions, which result in quadratic cost functions with respect to the base points and tangent vectors. However, this approach is not necessarily generalizable to other Riemannian manifolds. This limitation is due to the absence of (1) a simple Riemannian metric, (2) a closed-form distance function, and (3) closed-form exponential and logarithmic maps in general Riemannian manifolds, e.g., the manifold of rank-deficient positive semidefinite matrices [45].

VI. Numerical Results

We compare our space form PCA algorithm (SFPCA) to other leading algorithms in terms of accuracy and speed.

A. Synthetic Data and Experimental Setup

We generate random, noise-contaminated points on known (but random) spherical and hyperbolic affine subspaces. We then apply PCA methods to estimate the affine subspace and recover the projected points. We conduct experiments examining the impact of the number of points N, the ambient dimension D, the dimension of the affine subspace K, and the noise level σ on the performance of the algorithms.

1). Random Affine Subspace:

For fixed ambient and subspace dimensions, D and K, we sample from a normal distribution and normalize the sample to get the spherical (or hyperbolic) point p. We then generate random vectors from the standard normal distribution and use the Gram–Schmidt process to construct tangent vectors: we project the first random vector onto the orthogonal complement of p and normalize it to obtain h_1 ∈ T_p S^D, where S ∈ {S, H}. We then project the second random vector onto the orthogonal complement of span{p} ⊕ H, where H = span{h_1}, and normalize it to obtain h_2 ∈ T_p S^D. We repeat this process until we form a K-dimensional affine subspace in T_p S^D.
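For the spherical case, this Gram–Schmidt construction is a few lines of NumPy; the sketch below uses our own illustrative names:

```python
import numpy as np

def random_spherical_subspace(D, K, rng):
    """Sample a random K-dim spherical affine subspace of S^D (unit sphere in R^{D+1}).

    Returns a base point p and a matrix H of K orthonormal tangent vectors,
    built by Gram-Schmidt against p and the previously accepted tangents.
    Names are illustrative, not taken from the paper's code release.
    """
    p = rng.standard_normal(D + 1)
    p /= np.linalg.norm(p)               # base point on the sphere
    basis = [p]                          # directions to orthogonalize against
    H = []
    while len(H) < K:
        v = rng.standard_normal(D + 1)
        for b in basis:                  # project out p and earlier tangents
            v -= (v @ b) * b
        v /= np.linalg.norm(v)
        basis.append(v)
        H.append(v)
    return p, np.column_stack(H)
```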

2). Noise-Contaminated Points:

Let H ∈ R^{(D+1)×K} be a basis for the subspace of T_p S^D. For n ∈ [N], we generate c_n ∼ 𝒩(0, α_S I_K) and let v_n = H c_n. To add noise, we project ν_n ∼ 𝒩(0, α_S σ² I_{D+1}) onto T_p S^D, i.e., P_{p^⊥} ν_n. We then let x_n = exp_p(v_n + P_{p^⊥} ν_n) be the noise-contaminated point. Finally, α_S = π/4 if S = S and α_S = 1 if S = H.
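A sketch of this generator for the unit sphere (curvature C = 1, α_S = π/4), with our own names; the exponential map used is exp_p(v) = cos(‖v‖) p + sin(‖v‖) v/‖v‖:

```python
import numpy as np

def sample_noisy_points(p, H, N, sigma, alpha=np.pi / 4, rng=None):
    """Generate noise-contaminated points around a spherical affine subspace.

    Illustrative sketch of the paper's setup for the unit sphere:
    v_n = H c_n with c_n ~ N(0, alpha I_K), noise nu_n projected onto T_p S^D,
    and x_n = exp_p(v_n + P_{p_perp} nu_n).
    """
    rng = rng or np.random.default_rng()
    D1, K = H.shape
    def exp_map(v):                       # exponential map on the unit sphere
        t = np.linalg.norm(v)
        return p if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t
    X = np.empty((N, D1))
    for n in range(N):
        v = H @ rng.normal(0.0, np.sqrt(alpha), size=K)       # in-subspace part
        nu = rng.normal(0.0, np.sqrt(alpha) * sigma, size=D1)
        nu -= (nu @ p) * p                # project the noise onto T_p S^D
        X[n] = exp_map(v + nu)
    return X
```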

3). PCA on Noisy Data:

We use each algorithm to estimate an affine subspace S_Ĥ^D, where Ĥ ⊆ T_p̂ S^D and p̂ are the estimated parameters. We let n_i ≝ E_N[d(x_n, 𝒫_H(x_n))] be the empirical mean of the distance between the measurements and the true subspace, and n_o ≝ E_N[d(𝒫_Ĥ(x_n), 𝒫_H(𝒫_Ĥ(x_n)))] be the average distance between the denoised points {𝒫_Ĥ(x_n)}_{n∈[N]} and the true affine subspace. If S_Ĥ^D is a good approximation to S_H^D, then n_o is small. We evaluate the performance of the algorithms using the normalized output error n_o/n_i.

Remark 5: The ratio of n_o over n_i quantifies how much farther the denoised points are from the true subspace compared to the original noise-contaminated points. This is a normalized quantity, i.e., it is invariant with respect to the scale of the data points, which makes it ideal for comparing results as D, K, σ, and N vary. A reasonable upper bound for this ratio is 1, as PCA is expected to denoise the point sets by finding the optimal low-dimensional affine subspace for them.
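For the unit sphere, both n_i and n_o can be computed with the closed-form projection (Euclidean projection onto the span of p and the tangent vectors, followed by renormalization). A minimal sketch with our own names:

```python
import numpy as np

def proj(X, p, H):
    """Project rows of X onto the spherical affine subspace spanned by p and H,
    for curvature C = 1: Euclidean projection, then renormalization (sketch)."""
    B = np.column_stack([p, H])          # orthonormal basis of span{p} + H
    Y = X @ B @ B.T
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def sph_dist(X, Y):
    return np.arccos(np.clip(np.sum(X * Y, axis=1), -1.0, 1.0))

def normalized_output_error(X, p_true, H_true, p_hat, H_hat):
    """n_o / n_i: distance of denoised points to the true subspace, normalized
    by the distance of the raw points to the true subspace."""
    n_i = sph_dist(X, proj(X, p_true, H_true)).mean()
    Xd = proj(X, p_hat, H_hat)           # denoised points on estimated subspace
    n_o = sph_dist(Xd, proj(Xd, p_true, H_true)).mean()
    return n_o / n_i
```

A perfect estimate (p̂, Ĥ) = (p, H) yields a ratio of zero, since the denoised points land exactly on the true subspace.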

4). Randomized Experiments and Algorithms:

For each random affine subspace and noise-contaminated point set, we report the normalized error and the running time for each algorithm. Then, we repeat each random trial 100 times. We use our implementation of principal geodesic analysis (PGA) [33]. We also implement Riemannian functional principal component analysis (RFPCA) for spherical PCA [28] and spherical principal component analysis (SPCA) [40]. Since SPCA is computationally expensive, we first run our SFPCA to provide it with good initial parameters. For hyperbolic experiments, we use HoroPCA [41] and barycentric subspace analysis (BSA) [37], implemented by Chami et al. [41].

B. Spherical PCA

1). Experiment SK1 :

For fixed D = 10² and N = 10⁴, increasing the subspace dimension K worsens the normalized output errors of all algorithms; see Fig. 3(a). RFPCA is unreliable, while the other methods are similar in their error-reduction patterns. When K is close to D, SFPCA has a marginal but consistent advantage over PGA. SFPCA is faster than the rest, and K has a minor impact on running times.

Fig. 3.

For each spherical experiment, on the y-axes, we report running time and normalized output error. We show the median (solid line) and the first and third quartiles (shaded transparent area) over all random trials. Figures (a,b,c) show the results for SK1,SD1,SN1, respectively. All axes are in logarithmic scale.

2). Experiments SD1 and SD2 :

In SD1 — fixed K = 1, N = 10⁴, and varying D — PGA, SFPCA, and SPCA exhibit similar denoising performance, not impacted by D; see Fig. 3(b). RFPCA has higher output error levels than the other methods. To further compare SFPCA and its close competitor PGA, we design the challenging experiment SD2 with K = 10 and N = 10³. In this setting, SFPCA exhibits a clear advantage over PGA in error reduction; see Fig. 4. In both settings, SFPCA continues to be the fastest in almost all conditions, despite using a warm start for PGA.

Fig. 4.

Spherical experiment SD2. The y-axes show running time and normalized output error. All axes are in logarithmic scale.

3). Experiment SN1:

For fixed K = 1 and D = 10², when we vary N and σ, our SFPCA has the fastest running time and is tied with SPCA and PGA for the lowest normalized output error; see Fig. 3(c). As expected, increasing N generally makes all methods slower, partially because computing C_x has O(N) complexity. Computing a base point p using iterative computations on all N points becomes time-consuming as N grows, whereas our SFPCA has a worst-case complexity of O(D³) once C_x is formed. SFPCA provides error reductions similar to the rest, likely because each algorithm is given an excessive number of points. SPCA fails in some cases, as evident from the erratic behavior of its normalized output error. This may be due to the algorithm's failure to converge within the allocated maximum running time. SPCA takes about 15 minutes on 10⁴ points in each trial, while our SFPCA takes less than a second. PGA is the closest competitor in normalized error but is about three times slower.

C. Hyperbolic PCA

1). Experiments HK1 and HK2:

On small datasets in HK1 (D = 50, N = 51), for each trial, HoroPCA and BSA take close to an hour whereas SFPCA and PGA take milliseconds; see Fig. 5(a). Increasing K only increases the running time of BSA and HoroPCA but does not change SFPCA's and PGA's. This is expected, as BSA and HoroPCA estimate an affine subspace greedily, one dimension at a time. Regarding error reduction, as expected, all methods become less effective as K grows. For small σ, all methods achieve similar normalized output error levels, with only a slight advantage for PGA and SFPCA. As σ increases, PGA and HoroPCA become less effective compared to BSA and SFPCA. For large σ, SFPCA exhibits a clear advantage over all other methods. In the larger HK2 experiments (D = 10², N = 10⁴), we compare the two fastest methods, SFPCA and PGA; see Fig. 6(a). When σ is small, both methods have similar denoising performance for small K; SFPCA performs better only for larger K. As σ increases, SFPCA outperforms PGA irrespective of K.

Fig. 5.

For each scaled-down hyperbolic experiment, on the y-axes, we report running time and normalized output error in logarithmic scale. We report the median (solid line) and the first and third quartiles (shaded transparent area) over all random trials. Figures in rows (a), (b), and (c) are HK1,HD1, and HN1.

Fig. 6.

For each full-scale hyperbolic experiment, on the y-axes, we report running time and normalized output error in logarithmic scale. We report the median (solid line) and the first and third quartiles (shaded transparent area). Figures in rows (a),(b), and (c) are HK2,HD2, and HN2. All axes are in logarithmic scale.

2). Experiments HD1 and HD2 :

In HD1, we fix K = 1, N = 101, and in HD2, we let K = 1, N = 10⁴. Changing D impacts each method differently; see Fig. 5(b). Both SFPCA and PGA take successively more time as D increases, but they remain significantly faster than the other two, with average running times below 0.1 seconds. The running time of HoroPCA is (almost) agnostic to D since its cost function (projected variance) is free of the parameter D. Neither HoroPCA nor BSA outperforms SFPCA in error reduction. All methods improve in their error-reduction performance as D increases. For large σ, SFPCA provides the best error reduction among all algorithms. Comparing the fastest methods, SFPCA and PGA, we observe consistent patterns in HD1 and HD2: (1) SFPCA is faster regardless of D, and the gap between the two methods can be as high as a factor of 10. (2) When σ < 0.1, PGA slightly outperforms SFPCA in reducing error; with the lowest noise (σ = 0.01), PGA gives 17% better accuracy, on average over all values of D. However, as σ increases, SFPCA becomes more effective; at the highest end (σ = 0.5), SFPCA outperforms PGA by 40%, on average over D; see Fig. 6(b).

3). Experiments HN1 and HN2:

In HN1 (K = 1, D = 10), increasing N impacts the running time of SFPCA and PGA due to the computation of C_x; see Fig. 5(c). Nevertheless, both are orders of magnitude faster than HoroPCA and BSA. All methods provide improved error reduction as N increases. Comparing the fast methods SFPCA and PGA on the larger datasets of HN2 (K = 1, D = 10²) shows that SFPCA is always faster, has a slight disadvantage in output error for small σ, and offers substantial improvements for large σ; see Fig. 6(c).

D. Real Data: Spherical Spaces

We evaluate the performance of PCA methods using the following datasets: (1) Intestinal Microbiome: Lahti et al. [51] analyzed the gut microbiota of N = 1,006 adults covering D = 130 bacterial groups. The study explored the effects of age, nationality, BMI, and DNA extraction methods on the microbiome. They assessed variations in microbiome composition across young (18 to 40), middle-aged (41 to 60), and older (61 to 77) age groups (a ternary classification problem). (2) Throat Microbiome: Charlson et al. [52] investigated the impact of cigarette smoking on the airway microbiota in 29 smokers and 33 non-smokers (a binary classification problem) using culture-independent 454 pyrosequencing of 16S rRNA. (3) Newsgroups: Using Python's scikit-learn package, the 20 Newsgroups dataset was streamlined to a binary classification problem by retaining N = 400 samples from two distinct classes. Feature reduction was performed using TF-IDF, narrowing the data down to D = 3000 features to improve computational efficiency. Each dataset has undergone standard preprocessing, e.g., normalization and square-root transformation, to ensure the data points are spherically distributed. For a fixed subspace dimension, we estimate spherical affine subspaces. Then, we compute the projected spherical data points and denoise the original compositional data.

Distortion Analysis.

For compositional data, we calculate distance matrices using the Aitchison (AI), Jensen–Shannon (JS), and total variation (TV) metrics. We also compute the spherical distance matrix (S). For each embedding dimension K, we compute the projected point sets. We then compute the normalized errors; an example of a normalized error is ‖D_TV − D̂_TV‖_F / ‖D_TV‖_F, where D_TV and D̂_TV are the total variation distance matrices for the original and estimated data. For each algorithm, we then divide these normalized errors by their average across all algorithms, providing relative measures; that is, if the resulting relative error is greater than 1, the algorithm performs worse than average. We then report the mean and standard deviation of relative errors across different dimensions; see Table II. On all datasets, SFPCA outperforms the rest. The Newsgroups experiments are limited to SFPCA and PGA due to the significant computational complexity of SPCA and RFPCA.
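The normalization and the cross-method relative errors can be sketched as follows (illustrative helper names):

```python
import numpy as np

def normalized_error(D_true, D_est):
    """||D - D_hat||_F / ||D||_F for a pair of pairwise-distance matrices."""
    return np.linalg.norm(D_true - D_est) / np.linalg.norm(D_true)

def relative_errors(errors_by_method):
    """Divide each method's normalized error by the mean across methods, so
    values > 1 mean worse than average. `errors_by_method` maps method name
    to its normalized error (an assumed, illustrative structure)."""
    mean = sum(errors_by_method.values()) / len(errors_by_method)
    return {m: e / mean for m, e in errors_by_method.items()}
```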

TABLE II.

The Mean and Standard Deviation of Normalized Distance Errors Are Divided by Their Average Across Methods. Classification Accuracies Are Percentage Deviations From 100% — Representing the Average Accuracy Across Methods. Boldface and Red Indicate SFPCA and the Top-Performing Method. Lower Distortions (↓) and Higher Accuracies (↑) Are Better

Metric (Method) Throat Microbiome Intestinal Microbiome Newsgroups
SFPCA RFPCA SPCA PGA SFPCA RFPCA SPCA PGA SFPCA PGA
S (↓) 0.88 ± 0.99 1.13 ± 1.28 0.9 ± 0.99 1.1 ± 1.18 0.75 ± 1.64 0.93 ± 2.09 1.47 ± 2.07 0.85 ± 1.79 0.77 ± 1.03 1.23 ± 1.4
AI (↓) 0.98 ± 0.48 1.03 ± 0.57 0.99 ± 0.46 1.01 ± 0.52 0.8 ± 1.06 0.81 ± 1.03 1.55 ± 1.04 0.84 ± 1.14 0.999 ± 0.4 1.001 ± 0.5
JS (↓) 0.91 ± 0.84 1.08 ± 1.04 0.95 ± 0.84 1.06 ± 0.97 0.75 ± 1.55 0.93 ± 1.96 1.47 ± 1.93 0.85 ± 1.7 0.9 ± 0.81 1.1 ± 0.98
TV (↓) 0.91 ± 1.0 1.08 ± 1.21 0.94 ± 1.0 1.07 ± 1.14 0.76 ± 1.65 0.92 ± 2.09 1.49 ± 2.07 0.83 ± 1.75 0.87 ± 0.88 1.13 ± 1.12
NN (↑) −0.06 ± 3.03 0.15 ± 3.15 −0.26 ± 3.7 0.17 ± 3.0 0.04 ± 0.36 −0.05 ± 0.9 −0.02 ± 0.6 0.03 ± 0.3 0.04 ± 1.6 −0.04 ± 1.6
RF (↑) 0.59 ± 9.5 1.2 ± 9.6 −1.9 ± 9.3 0.11 ± 9.8 0.4 ± 2.3 0.11 ± 2.7 −0.74 ± 2.8 0.2 ± 2.3 0.3 ± 3.5 −0.3 ± 4.4

Classification Performance.

For each K, using the denoised compositional data, we train two classifiers: a five-layer neural network (NN) and a random forest model (RF). We normalize the classification accuracies by the average accuracy of all methods and report their mean and standard deviation. From Table II, SFPCA outperforms competing methods on Intestinal Microbiome and Newsgroups, though the accuracy differences are mostly less than one percent. In Section I, we further compare the performance of the two classifiers on Newsgroups as it relates to PCA analysis.

E. Real Data: Hyperbolic Spaces

We use a biological dataset of 103 plant and algal transcriptomes [53]. The authors inferred phylogenetic trees from genome-wide genes. Tree leaves are present-day species, internal nodes are ancestral species, and branch weights represent evolutionary distances. The dataset includes an “unfiltered” version with 852 trees and a “filtered” version with 844 trees after removing error-prone genes and filtering problematic sequences. Errors appear as outliers, with more expected in the unfiltered dataset a priori. Other studies [54], [55] have used this dataset to evaluate outlier detection methods. We preprocess each tree by rescaling branch lengths to a diameter of 10, compute the distance matrix between leaves, and embed it into a D-dimensional (D=20) hyperbolic space using a semidefinite program [18]. We use two metrics to evaluate PCA results, and then apply them for outlier detection.

Distortion Analysis.

For a fixed dimension K, we estimate hyperbolic affine subspaces, compute the projected hyperbolic points, and compute their hyperbolic distance matrix D̂_H. The normalized distance error ‖D_H − D̂_H‖_F / ‖D_H‖_F is calculated, where D_H is the original distance matrix. These errors are averaged over K ∈ [D] for each algorithm and then divided by the average normalized errors across all algorithms — providing relative errors. If the relative error is greater than 1, the algorithm performs worse than average. For each algorithm, we report the mean and standard deviation of these relative errors across all gene trees. Distortion is not a perfect measure of PCA accuracy, as highly noise-contaminated data should experience high distortion during the projection (denoising) step. In all experiments, PGA outperforms the others in terms of distortion (Table III). SFPCA provides average distance-preserving performance, contrary to the synthetic experiments. We conjecture this may be due to the trees being relatively small (see the scaled-down hyperbolic experiments in Fig. 5), to high noise levels making distortion an inappropriate accuracy metric, or to the discordance between our choice of distortion function f = cosh (which overemphasizes large distances) and the distance distortion metric.

TABLE III.

Normalized Distance Errors (H) Mean and Standard Deviation Are Divided by Their Average Across Methods. Quartet Scores (Q) Are Percentage Deviations From 100% (the Average). Lower Distortions (↓) and Higher Scores (↑) Are Better

Method Filtered Unfiltered
Q (↑) H (↓) Q (↑) H (↓)
SFPCA 1.50 ± 1.91 0.98 ± 0.23 1.10 ± 1.74 1.01 ± 0.25
PGA 1.48 ± 1.47 0.55 ± 0.09 1.80 ± 1.45 0.53 ± 0.08
BSA 1.48 ± 1.61 1.45 ± 0.24 1.67 ± 1.91 1.45 ± 0.31
HoroPCA −4.48 ± 2.79 1.01 ± 0.21 −4.57 ± 2.58 1.00 ± 0.20

Quartet Scores.

To use a biologically motivated accuracy measure, we use the quartet score [56]. For a target dimension K, each algorithm is applied to an embedded hyperbolic point set to compute the projected (denoised) points. For each set of four points, we find the optimal tree topology with minimum distance distortion using the four-point condition [57]. For 10⁵ randomly chosen (but fixed) sets of four projected points, we estimate their topology and compare it with the true topology from the gene trees. For each dimension K ∈ [D], we compute the percentage of correctly estimated topologies, then average this over all dimensions. We normalize the quartet scores by the average score of all methods and report their mean and standard deviation, as detailed in Table III. In these experiments, PGA and SFPCA exhibit the best performance compared to the alternatives. This is particularly informative, as the quartet score measures tree topology accuracy, not distance.
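The four-point-condition step can be sketched as follows: among the three possible pairings of four leaves, the split whose across-pair distance sum is smallest is selected (an illustrative sketch; the paper's pipeline may differ in details):

```python
def quartet_topology(d):
    """Pick the quartet topology via the four-point condition: for leaves
    {0, 1, 2, 3} with pairwise distances d[i][j], the pairing whose within-pair
    distance sum is smallest separates the quartet (illustrative sketch)."""
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    sums = [d[a][b] + d[c][e] for (a, b), (c, e) in pairings]
    return pairings[sums.index(min(sums))]
```

For an additive tree metric with split 01|23, d(0,1) + d(2,3) is strictly smaller than the two alternative sums, so the minimum identifies the correct topology.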

1). Outlier Detection With Hyperbolic Spectrum:

We showcase the practical utility of the J_D-eigenequation (Definition 10) in species tree estimation. As proved in Theorem 4, the principal axes align with the leading J_D-eigenvectors of C_x. Thus, the optimal SFPCA cost corresponds to the sum of its neglected J_D-eigenvalues. We conjecture that a tree with outliers has more outlier J_D-eigenvalues; see Fig. 7(a)–7(b). If a tree has an outlier set of species (likely from incorrect sequences), its second leading J_D-eigenvalue (λ_2)¹ is significantly larger than the rest. We quantify this by plotting its normalized retained energy Σ_{k=2}^{K} λ_k / Σ_{d=2}^{D} λ_d (or cumulative spectrum) versus the normalized embedding dimension (or number of J_D-eigenvalues) x = K/D and finding its knee point. This lets us sort gene trees by their hyperbolic spectrum.
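A sketch of the knee-point computation, following the definition in Fig. 7 (the crossing of the retained-energy curve with the y = 1 − x line); the names and the argmin-based crossing search are ours:

```python
import numpy as np

def knee_point(eigvals):
    """Knee of the normalized retained-energy curve: the point where the
    cumulative spectrum (excluding lambda_1, which corresponds to the base
    point) crosses the y = 1 - x line. Illustrative sketch."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1][1:]  # drop lambda_1
    x = np.arange(1, lam.size + 1) / lam.size                  # normalized dim
    energy = np.cumsum(lam) / lam.sum()                        # retained energy
    i = int(np.argmin(np.abs(energy - (1.0 - x))))             # crossing index
    return x[i], energy[i]
```

A spectrum dominated by one large λ_2 (an outlier tree) yields a knee near (0, 1), whereas a flat spectrum yields a knee farther to the right.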

Fig. 7.

(a) and (b): Normalized retained energy versus normalized number of eigenvalues for two gene trees. The knee point (the intersection with the y = 1 − x line) for trees with outliers approaches (0, 1). (c) The quartet score for species trees constructed using the top N trees (by knee value) versus random orders.

After sorting, we use the top N trees with the smallest knee values (least prone to outliers) to construct a species topology using ASTRAL [58]. ASTRAL outputs the quartet score between the estimated species tree and the input gene trees, where a higher score indicates more congruence among input trees. Thus, a higher score after filtering means outlier gene trees, likely inferred from problematic sequences, have been removed. Our results (Fig. 7(c)) show that hyperbolic spectrum-based sorting — offered only by our SFPCA — effectively identifies the worst trees most dissimilar to others, without explicitly comparing tree topologies. In contrast, random sorting keeps the quartet score fixed. Filtered trees have a higher score than unfiltered trees and benefit less from further filtering. It is remarkable that using eigenvalues alone, we can effectively find genes with discordant evolutionary histories.

Acknowledgment

The authors would like to thank the National Institutes of Health (NIH) for their financial support.

This research was supported in part by the NIH under Grant 1R35GM142725. The associate editor coordinating the review of this article and approving it for publication was Dr. Marco Felipe Duarte.

Appendix

2) Claim 1: Let x, y ∈ S_H^D ⊆ S^D and let γ_{x,y}(t) be the geodesic with γ_{x,y}(0) = x and γ_{x,y}(1) = y. This geodesic lies at the intersection of span{x, y} and S^D. Since x, y ∈ S_H^D = (span{p} ⊕ H) ∩ S^D and span{p} ⊕ H is a subspace, we have span{x, y} ⊆ span{p} ⊕ H. Therefore, we have γ_{x,y}(t) ∈ span{x, y} ∩ S^D ⊆ (span{p} ⊕ H) ∩ S^D = S_H^D for all t ∈ [0, 1].

3) Claim 2: For a spherical affine subspace S_H^D, we define the sliced unitary matrix G = [C^{1/2} p, C^{-1/2} h_1, …, C^{-1/2} h_K], where p is the base point and h_1, …, h_K are an orthogonal basis for H. Then, we have G y ∈ S_H^D for all y ∈ S^K. Conversely, for any sliced unitary matrix G = [g_0, g_1, …, g_K], we let p = C^{-1/2} g_0 and h_k = C^{1/2} g_k for k ∈ [K]. Since the h_k's and p are orthogonal, we can define the spherical affine subspace.

4) Claim 3: We simplify its PCA cost as cost_{S_H^D}(𝒳) = 1 − E_N[‖P_H(x_n)‖₂²] = (a) E_N[Σ_{k∈[K^⊥]} ⟨x_n, h_k^⊥⟩²] = (b) Σ_{k∈[K^⊥]} h_k^⊥⊤ C_x h_k^⊥, where (a) follows from Proposition 1, h_1^⊥, …, h_{K^⊥}^⊥ are an orthogonal basis for the orthogonal complement of H in T_p S^D, and (b) follows from the cyclic property of the trace.

5) Claim 4: From Corollary 2 and Definition 7, an optimal zero-dimensional affine subspace (a point) is a subset of any other optimal spherical affine subspace. In general, S_{H_1}^D ⊆ S_{H_2}^D if and only if dim(S_{H_1}^D) ≤ dim(S_{H_2}^D).

6) Claim 5: Let v ∈ C^{D+1} be such that A J_D v = sgn(⟨v*, v⟩) λ v, where |⟨v*, v⟩| = 1. Then ‖v‖₂^{-1} v is an eigenvector of A J_D with eigenvalue sgn(⟨v*, v⟩) λ.

7) Claim 6: Let x, y ∈ H_H^D and let γ_{x,y} be the geodesic with γ_{x,y}(0) = x and γ_{x,y}(1) = y. This geodesic belongs to span{x, y} ∩ H^D. We have span{x, y} ⊆ span{p} ⊕ H since x, y ∈ H_H^D = (span{p} ⊕ H) ∩ H^D and span{p} ⊕ H is a subspace. Thus, we have γ_{x,y}(t) ∈ span{x, y} ∩ H^D ⊆ H_H^D for all t ∈ [0, 1].

8) Claim 7: For a hyperbolic affine subspace H_H^D, we define G = [|C|^{1/2} p, |C|^{-1/2} h_1, …, |C|^{-1/2} h_K], where p is the base point and h_1, …, h_K are an orthogonal basis for H. We have G^⊤ J_D G = J_K and G y ∈ H_H^D for all y ∈ H^K. Conversely, for any sliced J-unitary matrix G = [g_0, g_1, …, g_K], we let p = |C|^{-1/2} g_0 and h_k = |C|^{1/2} g_k for k ∈ [K]. Since the h_k's and p are orthogonal, we can define the hyperbolic affine subspace.

9) Claim 8: Let p ∈ H^D, q_1, …, q_K ∈ H^D, and let γ_1, …, γ_K be the aforementioned geodesics. A point x ∈ S ≝ GH(p, q_1, …, q_K) belongs to a geodesic whose endpoints are γ_k(t) and γ_{k′}(t′) for t, t′ ∈ R and k, k′ ∈ [K]. Let us show x ∈ H_H^D for a subspace H ⊆ T_p H^D. From Claim 6, H_H^D is a geodesic submanifold. It suffices to show that γ_1, …, γ_K belong to H_H^D. Let h_k = γ̇_k(0) ∈ T_p H^D for all k ∈ [K], and let H = span{h_1, …, h_K}. This proves S ⊆ H_H^D. Conversely, let x ∈ H_H^D — the hyperbolic affine subspace constructed as before. Since H_H^D is a geodesic submanifold, x belongs to a geodesic whose endpoints are γ_k(t) and γ_{k′}(t′) for t, t′ ∈ R and k, k′ ∈ [K], constructed as before. From x ∈ conv hull(γ_1, …, γ_K), we have H_H^D ⊆ S.

10) Claim 9: From Theorem 4 and Definition 12, an optimal zero-dimensional affine subspace is a subset of any other optimal affine subspace. For optimal affine subspaces H_{H_i}^D, we have H_{H_1}^D ⊆ H_{H_2}^D if and only if dim(H_{H_1}^D) ≤ dim(H_{H_2}^D).

11) Proposition 1: Consider the following Lagrangian:

ℒ(y, γ, Λ) = ⟨x, y⟩ + γ(⟨y, y⟩ − C^{-1}) + Σ_{k∈[K^⊥]} λ_k ⟨y, h_k^⊥⟩, (14)

where Λ ≝ {λ_k : k ∈ [K^⊥]}, ⟨y, y⟩ = C^{-1}, and ⟨y, h_k^⊥⟩ = 0 for k ∈ [K^⊥]. The solution to equation (14) takes the form 𝒫_H(x) = Σ_{i∈[K^⊥]} α_i h_i^⊥ + β x, for scalars {α_i}_{i∈[K^⊥]} and β. The subspace conditions — ⟨𝒫_H(x), h_k^⊥⟩ = 0, k ∈ [K^⊥] — give 𝒫_H(x) = β P_H(x), where P_H(x) = x − C^{-1} Σ_{k∈[K^⊥]} ⟨x, h_k^⊥⟩ h_k^⊥. Enforcing the norm condition, we arrive at 𝒫_H(x) = C^{-1/2} ‖P_H(x)‖₂^{-1} P_H(x), where ‖P_H(x)‖₂² = C^{-1}(1 − Σ_{k∈[K^⊥]} ⟨x, h_k^⊥⟩²). Then, we have d(x, 𝒫_H(x)) = C^{-1/2} acos(C ⟨x, C^{-1/2} ‖P_H(x)‖₂^{-1} P_H(x)⟩) = C^{-1/2} acos(C^{1/2} ‖P_H(x)‖₂). If P_H(x) = 0 — i.e., x ∈ span{h_1^⊥, …, h_{K^⊥}^⊥} — then 𝒫_H(x) ∈ S_H^D is nonunique, but the projection distance is well-defined as d(x, y) = C^{-1/2} π/2 for all y ∈ S_H^D.

12) Proposition 2: Let A = [C p, h_1, …, h_K, h_1^⊥, …, h_{K^⊥}^⊥] ∈ R^{(D+1)×(D+1)}, where K + K^⊥ = D. Distinct columns of A are orthogonal, i.e., A^⊤ A = C I_{D+1}. Hence, p, h_1, …, h_K, h_1^⊥, …, h_{K^⊥}^⊥ are linearly independent. Therefore, we have P_H(x) = Σ_{k∈[K]} α_k h_k + Σ_{k∈[K^⊥]} β_k h_k^⊥ + γ p = (a) x − C^{-1} Σ_{k∈[K^⊥]} ⟨x, h_k^⊥⟩ h_k^⊥, where {α_k}_{k∈[K]}, {β_k}_{k∈[K^⊥]}, and γ are scalars, and (a) is due to Proposition 1. We hence have β_k = C^{-1} ⟨P_H(x), h_k^⊥⟩ = 0, α_k = C^{-1} ⟨P_H(x), h_k⟩ = C^{-1} ⟨x, h_k⟩, and γ = C ⟨P_H(x), p⟩ = C ⟨x, p⟩. We can accordingly compute ‖P_H(x)‖₂ and prove the proposition.
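As a numerical sanity check of this closed form for C = 1, the projection (Euclidean projection onto span{p, h_1, …, h_K} followed by renormalization) should be at least as close to x as any densely sampled point of the subspace. The script below is an illustrative check with our own names:

```python
import numpy as np

# Closed-form projection onto a spherical affine subspace for C = 1
# (Propositions 1-2, illustrative sketch): Euclidean projection onto
# span{p, h_1, ..., h_K}, then renormalization to the unit sphere.
def project(x, B):
    y = B @ (B.T @ x)                    # B has orthonormal columns [p, h_1, ...]
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
B = np.linalg.qr(rng.standard_normal((5, 2)))[0]  # random base point + 1 tangent
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
px = project(x, B)
# Brute force: sample the one-dimensional subspace (a great circle) densely.
ts = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
circle = np.cos(ts)[:, None] * B[:, 0] + np.sin(ts)[:, None] * B[:, 1]
best = circle[np.argmin(np.arccos(np.clip(circle @ x, -1, 1)))]
assert np.arccos(np.clip(px @ x, -1, 1)) <= np.arccos(np.clip(best @ x, -1, 1)) + 1e-4
```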

13) Proposition 3: Let A be a real matrix such that A = A^[*], that is, A = J_D A^* J_D. Let (v, λ) be a J_D-eigenvector–eigenvalue pair of A, i.e., A J_D v = sgn(⟨v*, v⟩) λ v; then A J_D v* = sgn(⟨v*, v⟩) λ* v*. We have λ* = λ since

λ ⟨v*, v⟩ = (v*)^⊤ J_D A J_D v · sgn(⟨v*, v⟩) = (A J_D v*)^⊤ J_D v · sgn(⟨v*, v⟩) = sgn(⟨v*, v⟩) λ* ⟨v*, v⟩ · sgn(⟨v*, v⟩) = λ* ⟨v*, v⟩.

14) Proposition 4: Let A be a full-rank matrix and v ∈ C^{D+1} be such that A J_D v = λ v and ⟨v*, v⟩ = 0. Then, we have v^H J_D A J_D v = v^H J_D λ v = λ ⟨v*, v⟩ = 0. This contradicts the assumption that A is full-rank.

15) Proposition 5: Consider the following Lagrangian:

ℒ(y, γ, Λ) = [x, y] + γ([y, y] − C^{-1}) + Σ_{k∈[K^⊥]} λ_k [y, h_k^⊥], (15)

where Λ = {λ_k}_{k∈[K^⊥]}, admits the solution 𝒫_H(x) = Σ_{i∈[K^⊥]} α_i h_i^⊥ + β x, for scalars {α_i}_{i∈[K^⊥]} and β. The subspace conditions, [𝒫_H(x), h_k^⊥] = 0 for k ∈ [K^⊥], give 𝒫_H(x) = β P_H(x), where P_H(x) = x + C^{-1} Σ_{k∈[K^⊥]} [x, h_k^⊥] h_k^⊥. Enforcing the norm condition, we get 𝒫_H(x) = |C|^{-1/2} ‖P_H(x)‖_*^{-1} P_H(x), where ‖P_H(x)‖_* ≝ √(−[P_H(x), P_H(x)]) = |C|^{-1/2} √(1 + Σ_{k∈[K^⊥]} [x, h_k^⊥]²). We have d(x, 𝒫_H(x)) = |C|^{-1/2} acosh(C [x, 𝒫_H(x)]) = |C|^{-1/2} acosh(|C|^{1/2} ‖P_H(x)‖_*).

16) Proposition 6: Let A = [C p, h_1, …, h_K, h_1^⊥, …, h_{K^⊥}^⊥] ∈ R^{(D+1)×(D+1)}, where K + K^⊥ = D. The columns of A are J-orthogonal, that is, A^⊤ J_D A = |C| J_D. Since we have J_D A^⊤ J_D A = |C| I_{D+1}, the vectors p, h_1, …, h_K, h_1^⊥, …, h_{K^⊥}^⊥ are linearly independent, and we have P_H(x) = Σ_{k∈[K]} α_k h_k + Σ_{k∈[K^⊥]} β_k h_k^⊥ + γ p = (a) x + C^{-1} Σ_{k∈[K^⊥]} [x, h_k^⊥] h_k^⊥, where {α_k}_{k∈[K]}, {β_k}_{k∈[K^⊥]}, and γ are scalars, and (a) is due to Proposition 5. So we have β_k = −C^{-1} [P_H(x), h_k^⊥] = 0, α_k = −C^{-1} [x, h_k], and γ = C [P_H(x), p] = C [x, p], i.e., P_H(x) = C [x, p] p − C^{-1} Σ_{k∈[K]} [x, h_k] h_k. Then, we can compute ‖P_H(x)‖_* and prove the proposition.

17) Proposition 7: Let A = V Λ V^⊤, where V is J_D-unitary. Since Λ is a diagonal matrix, we can write A = V J_D Λ J_D V^⊤. We have A J_D V = V J_D Λ J_D V^⊤ J_D V = V J_D Λ, i.e., A is J_D-diagonalizable.

Conversely, let A ∈ R^{(D+1)×(D+1)} be symmetric and A J_D V = V J_D Λ for an invertible V and Λ = diag(λ_d : d ∈ [D+1]) with elements distinct in absolute value. Then, we have

A J_D v_d = −λ_d v_d if d = 1, and A J_D v_d = λ_d v_d if d ≠ 1, (16)

where v_d is the d-th column of V. In the eigenequation (16), the negative (positive) sign is designated for the eigenvectors with negative (positive) norms. For distinct i and j, we have

λ_i ⟨v_i, v_j⟩ = (a) ⟨A J_D v_i, v_j⟩ = v_i^⊤ J_D A J_D v_j = ⟨v_i, A J_D v_j⟩ = λ_j ⟨v_i, v_j⟩,

where (a) is due to the eigenequation (16). Since λ_i ≠ λ_j, we must have ⟨v_i, v_j⟩ = 0. Without loss of generality, the J_D-eigenvectors are scaled such that |⟨v_d, v_d⟩| = 1 for d ∈ [D+1]. Lemma 1 shows that ⟨v_1, v_1⟩ = −1 and ⟨v_d, v_d⟩ = 1 for d > 1.

Lemma 1: V^⊤ J_D V = J_D.

Proof: Let A satisfy A J_D V = V J_D Λ. Then V diagonalizes B ≝ J_D A J_D A J_D, viz., V^⊤ B V = (a) V^⊤ J_D V J_D Λ J_D Λ = (b) Δ Λ², where (a) follows from applying A J_D V = V J_D Λ twice, and (b) from ⟨v_i, v_j⟩ = 0 for i ≠ j, which makes Δ ≝ V^⊤ J_D V a diagonal matrix with entries ⟨v_d, v_d⟩ = ±1. However, B = (A J_D)^⊤ J_D (A J_D) is congruent to J_D and is therefore a symmetric matrix with only one negative eigenvalue [18]. Therefore, without loss of generality, the first diagonal element of Δ is negative, i.e., ⟨v_1, v_1⟩ = −1 and V^⊤ J_D V = J_D. □

From Lemma 1, we have V^{-1} = J_D V^⊤ J_D and A = A J_D J_D = V J_D Λ V^{-1} J_D = V J_D Λ J_D V^⊤ J_D J_D = V J_D Λ J_D V^⊤ = V Λ V^⊤.

18) Theorem 1: Consider S_H^D with orthogonal tangents h_1, …, h_K. For x ∈ S_H^D, we have P_H(x) = x, ‖P_H(x)‖₂² = C^{-1}, and ‖𝒬(x)‖₂² = C^{-1}, i.e., 𝒬(x) ∈ S^K and 𝒬 is a map from S_H^D to S^K; see the proof of Proposition 2. We also have x = P_H(x) = C ⟨x, p⟩ p + C^{-1} Σ_{k∈[K]} ⟨x, h_k⟩ h_k = 𝒬^{-1}(𝒬(x)) for all x ∈ S_H^D. Hence, 𝒬^{-1} is the inverse map of 𝒬 — a bijection. Finally, 𝒬 is an isometry between S_H^D and S^K since d(x_1, x_2) = d(𝒬(x_1), 𝒬(x_2)) for all x_1, x_2 ∈ S_H^D.

19) Theorem 2: We have Σ_{k∈[K^⊥]} h_k^⊥⊤ C_x h_k^⊥ ≥ Σ_{k∈[K^⊥]} C λ_{D+1−k}(C_x), where λ_d(C_x) is the d-th largest eigenvalue of C_x. We achieve the lower bound if we let h_k^⊥ = C^{1/2} v_{D+1−k}(C_x) for k ∈ [K^⊥]. The optimal base point is any vector orthogonal to span{h_1^⊥, …, h_{K^⊥}^⊥} with norm C^{-1/2}; in particular, p = C^{-1/2} v_1(C_x) allows for nested affine subspaces.
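Theorem 2 makes the spherical algorithm a single symmetric eigendecomposition. An illustrative sketch for the unit sphere (C = 1), with our own names:

```python
import numpy as np

def spherical_pca(X, K):
    """Sketch of spherical SFPCA (Theorem 2) for the unit sphere (C = 1).

    The optimal affine subspace is read off one eigendecomposition of the
    second-moment matrix: the leading eigenvector gives the base point p, and
    the next K eigenvectors span the tangent subspace H. Illustrative names.
    """
    Cx = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(Cx)      # ascending eigenvalues
    vecs = vecs[:, ::-1]                 # reorder to descending
    p = vecs[:, 0]                       # base point direction v_1(C_x)
    H = vecs[:, 1:K + 1]                 # tangent vectors v_2, ..., v_{K+1}
    return p, H
```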

20) Theorem 3: Consider H_H^D with orthogonal tangents h_1, …, h_K. For x ∈ H_H^D, we have P_H(x) = x, [P_H(x), P_H(x)] = C^{-1}, and [𝒬(x), 𝒬(x)] = C^{-1}, i.e., 𝒬(x) ∈ H^K and 𝒬 is a map from H_H^D to H^K; see the proof of Proposition 6. We also have x = P_H(x) = C [x, p] p − C^{-1} Σ_{k∈[K]} [x, h_k] h_k = 𝒬^{-1}(𝒬(x)) for all x ∈ H_H^D. Hence, 𝒬^{-1} is the inverse map of 𝒬 — a bijection. Finally, 𝒬 is an isometry between H_H^D and H^K since d(x_1, x_2) = d(𝒬(x_1), 𝒬(x_2)) for all x_1, x_2 ∈ H_H^D.

21) Theorem 4: WLOG, we scale {h_k}_k and define H ≝ [h_1, …, h_K] ∈ R^{(D+1)×K} such that ⟨h_i, h_j⟩ = δ_{i,j}, i.e., H^⊤ J_D H = I_K. The cost is:

cost_{H_H^D}(𝒳) = Σ_{k∈[K]} h_k^⊤ J_D C_x J_D h_k = Tr(H^⊤ J_D C_x J_D H) = Tr(H^⊤ J_D V Λ V^⊤ J_D H) = Tr(W^⊤ Λ W) = (a) Tr(W W^⊤ J_D Λ J_D) = Tr(𝒲 Λ J_D),

where C_x = V Λ V^⊤ is J_D-diagonalizable, W = V^⊤ J_D H ∈ R^{(D+1)×K}, (a) follows from Λ being a diagonal matrix, i.e., Λ = J_D Λ J_D, and 𝒲 ≝ W W^⊤ J_D.

Lemma 2: W^⊤ J_D W = I_K.

Proof: W^⊤ J_D W = H^⊤ J_D V J_D V^⊤ J_D H = (a) H^⊤ J_D H = I_K, where (a) follows from V J_D V^⊤ = J_D. This is the case since, by definition, we have V^⊤ J_D V = J_D, or J_D V^⊤ J_D V = I_{D+1}, that is, J_D V^⊤ J_D = V^{-1}. Hence, we have V J_D V^⊤ = J_D. Finally, H^⊤ J_D H = I_K is a direct result of the orthogonality of the basis vectors h_1, …, h_K. □

We write the cost function as follows:

cost_{H_H^D}(𝒳) = Tr(𝒲 Λ J_D) = −𝒲_{1,1} λ_1 + Σ_{d=2}^{D+1} 𝒲_{d,d} λ_d,

where Σ_{d=1}^{D+1} 𝒲_{d,d} = Tr(W^⊤ J_D W) = Tr(I_K) = K, i.e., 𝒲_{1,1} = −Σ_{d=2}^{D+1} 𝒲_{d,d} + K. Let W ∈ R^{(D+1)×K} be as follows:

W = [√(‖w_1‖₂² − 1), …, √(‖w_K‖₂² − 1); w_1, …, w_K], (17)

for vectors w_1, …, w_K ∈ R^D with ℓ₂ norms greater than or equal to 1 — notice that W^⊤ J_D W = I_K, consistent with Lemma 2. From 𝒲 = W W^⊤ J_D and equation (17), we have:

𝒲_{1,1} = −Σ_{k∈[K]} (‖w_k‖₂² − 1) ≤ 0, since ‖w_k‖₂ ≥ 1 for all k ∈ [K].

For d ≥ 2, 𝒲_{d,d} is the squared norm of the d-th row of W. Let W_c ∈ R^{D×K} be the matrix whose d-th row equals the (d+1)-th row of W. Therefore, we have

Σ_{d=2}^{D+1} 𝒲_{d,d} = Tr(W_c^⊤ W_c) = Σ_{k=1}^{K} ‖w_k‖₂². (18)

Let us now simplify the cost function as follows:

cost_{H_H^D} = Σ_{d=2}^{D+1} 𝒲_{d,d} (λ_1 + λ_d) − K λ_1. (19)

Fig. 8.

Classification accuracy (%) of PCA to dimension K on reconstructed compositional data with (a) random forest and (b) neural net.

Lemma 3: For all d ≥ 2, we have λ_d + λ_1 ≥ 0.

Proof: Let v_d be the d-th J_D-eigenvector of C_x. We have λ_d = v_d^⊤ J_D (sgn(⟨v_d, v_d⟩) λ_d v_d) = v_d^⊤ J_D C_x J_D v_d = N^{-1} Σ_{n∈[N]} ⟨x_n, v_d⟩² ≥ 0, for all d ≥ 1. □

From Lemma 3, equation (18), and ‖w_k‖₂ ≥ 1 for all k ∈ [K], the minimum of the cost function in equation (19) is attained only if Σ_{d=2}^{D+1} 𝒲_{d,d} = Σ_{k=1}^{K} ‖w_k‖₂² = K, i.e., ‖w_1‖₂ = ⋯ = ‖w_K‖₂ = 1. Therefore, we have W = V^⊤ J_D H = [0, …, 0; w_1, …, w_K]. The first row of V^⊤ J_D H corresponds to the Lorentzian products of h_1, …, h_K with the first column of V, i.e., v_1. Hence, we have h_1, …, h_K ⊥ v_1. Since v_1 is the only negative J_D-eigenvector of C_x, we have p = v_1. From the unit-norm constraints on w_1, …, w_K, we have W^⊤ W = I_K, i.e., W is a sliced unitary matrix and W W^⊤ has zero–one eigenvalues. The cost function cost_{H_H^D}(𝒳) = Tr(W^⊤ Λ W) = Tr(Λ W W^⊤) achieves its minimum if and only if the nonzero eigenvalues of W W^⊤ are aligned with the K smallest diagonal values of Λ — by von Neumann's trace inequality [59]. Let λ_2 ≤ λ_3 ≤ ⋯ ≤ λ_{D+1}. If we choose w_1, …, w_K to be the standard basis vectors supported on coordinates 2, …, K+1 of W, then we achieve the minimum of the cost function; that is, h_1, …, h_K are the K positive J_D-eigenvectors paired with the smallest J_D-eigenvalues of C_x.

A. Additional Experiments

We present experiments on the Newsgroups dataset to demonstrate the impact of spherical PCA on classification accuracy. Using random forest and five-layer neural network classifiers with a 90% training and 10% test split, we report the average accuracy over 20 random splits. As shown in Fig. 8, SFPCA outperforms PGA in average accuracy over the K (target dimension) experiments. Both methods significantly improve random forest performance and modestly improve the neural network's. This may be due to the neural network's own denoising ability. For large K, random forest accuracy declines, unlike the neural network's. This may be due to the Newsgroups dataset's high sparsity: discarding small eigenvalues of the second-moment matrix significantly alters the data's sparsity, adversely affecting the random forest's accuracy.

Footnotes

1

The J_D-eigenvalue λ_1 corresponds to the base point.

Contributor Information

Puoya Tabaghi, Halicioğlu Data Science Institute, University of California San Diego, San Diego, CA 92093 USA.

Michael Khanzadeh, Computer Science and Engineering Department, University of California San Diego, San Diego, CA 92093 USA. He is now with the Department of Computer Science, Columbia University, New York, NY 10027 USA.

Yusu Wang, Halicioğlu Data Science Institute, University of California San Diego, San Diego, CA 92093 USA.

Siavash Mirarab, Electrical and Computer Engineering Department, University of California San Diego, San Diego, CA 92093 USA.

REFERENCES

  • [1].Thurstone LL, “Multiple factor analysis.” Psychol. Rev, vol. 38, no. 5, 1931, Art. no. 406. [Google Scholar]
  • [2].Wold S, Esbensen K, and Geladi P, “Principal component analysis,” Chemometrics Intell. Lab. Syst, vol. 2, nos. 1–3, pp. 37–52, 1987. [Google Scholar]
  • [3].Stewart GW, “On the early history of the singular value decomposition,” SIAM Rev., vol. 35, no. 4, pp. 551–566, 1993. [Google Scholar]
  • [4].Hotelling H, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol, vol. 24, no. 6, 1933, Art. no. 417. [Google Scholar]
  • [5].Jolliffe IT, Principal Component Analysis. New York, NY, USA: Springer, 2002. [Google Scholar]
  • [6].Tipping ME and Bishop CM, “Probabilistic principal component analysis,” J. Roy. Statist. Soc.: Ser. B (Statist. Methodol.), vol. 61, no. 3, pp. 611–622, 1999. [Google Scholar]
  • [7].Vidal R, Ma Y, and Sastry S, “Generalized principal component analysis (GPCA),” IEEE Trans. Pattern Anal. Mach. Intell, vol. 27, no. 12, pp. 1945–1959, Dec. 2005. [DOI] [PubMed] [Google Scholar]
  • [8].Lawrence N, “Gaussian process latent variable models for visualisation of high dimensional data,” in Proc. Adv. Neural Inf. Process. Syst, vol. 16, 2003, pp. 329–336. [Google Scholar]
  • [9].Roweis S, “EM algorithms for PCA and SPCA,” in Proc. Adv. Neural Inf. Process. Syst, vol. 10, 1997, pp. 626–632. [Google Scholar]
  • [10].Bishop C, “Bayesian PCA,” in Proc. Adv. Neural Inf. Process. Syst, vol. 11, 1998, pp. 382–388. [Google Scholar]
  • [11].Jolliffe IT, Trendafilov NT, and Uddin M, “A modified principal component technique based on the LASSO,” J. Comput. Graphical Statist, vol. 12, no. 3, pp. 531–547, 2003. [Google Scholar]
  • [12].Zou H, Hastie T, and Tibshirani R, “Sparse principal component analysis,” J. Comput. Graphical Statist, vol. 15, no. 2, pp. 265–286, 2006. [Google Scholar]
  • [13].Cai TT, Ma Z, and Wu Y, “Sparse PCA: Optimal rates and adaptive estimation,” Ann. Statist, vol. 41, no. 6, pp. 3074–3110, 2013. [Google Scholar]
  • [14].Guan Y and Dy J, “Sparse probabilistic principal component analysis,” in Proc. Artif. Intell. Statist, PMLR, 2009, pp. 185–192. [Google Scholar]
  • [15].Xu H, Caramanis C, and Sanghavi S, “Robust PCA via outlier pursuit,” in Proc. Adv. Neural Inf. Process. Syst, vol. 23, 2010, pp. 2496–2504. [Google Scholar]
  • [16].Fletcher PT, Lu C, Pizer SM, and Joshi S, “Principal geodesic analysis for the study of nonlinear statistics of shape,” IEEE Trans. Med. Imag, vol. 23, no. 8, pp. 995–1005, Aug. 2004. [DOI] [PubMed] [Google Scholar]
  • [17].Jiang Y, Tabaghi P, and Mirarab S, “Learning hyperbolic embedding for phylogenetic tree placement and updates,” Biology, vol. 11, no. 9, 2022, Art. no. 1256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [18].Tabaghi P and Dokmanić I, “Hyperbolic distance matrices,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery & Data Mining, 2020, pp. 1728–1738. [Google Scholar]
  • [19].Fletcher PT and Joshi S, “Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors,” in Proc. Comput. Vis. Math. Methods Med. Biomed. Image Anal, Heidelberg, Germany: Springer, 2004, pp. 87–98. [Google Scholar]
  • [20].Lee JM, Riemannian Manifolds: An Introduction to Curvature, vol. 176. New York, NY, USA: Springer Science & Business Media, 2006. [Google Scholar]
  • [21].Sonthalia R and Gilbert A, “Tree! I am no tree! I am a low dimensional hyperbolic embedding,” in Proc. Adv. Neural Inf. Process. Syst, vol. 33, 2020, pp. 845–856. [Google Scholar]
  • [22].Tabaghi P, Peng J, Milenkovic O, and Dokmanić I, “Geometry of similarity comparisons,” 2020, arXiv:2006.09858.
  • [23].Chien E, Pan C, Tabaghi P, and Milenkovic O, “Highly scalable and provably accurate classification in Poincaré balls,” in Proc. IEEE Int. Conf. Data Mining, Piscataway, NJ, USA: IEEE Press, 2021, pp. 61–70. [Google Scholar]
  • [24].Chien E, Tabaghi P, and Milenkovic O, “HyperAid: Denoising in hyperbolic spaces for tree-fitting and hierarchical clustering,” in Proc. 28th ACM SIGKDD Conf. Knowl. Discovery Data Mining, 2022, pp. 201–211. [Google Scholar]
  • [25].Klimovskaia A, Lopez-Paz D, Bottou L, and Nickel M, “Poincaré maps for analyzing complex hierarchies in single-cell data,” Nature Commun., vol. 11, no. 1, pp. 1–9, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Zhou Y and Sharpee TO, “Hyperbolic geometry of gene expression,” Iscience, vol. 24, no. 3, 2021, Art. no. 102225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Meng Y et al. , “Spherical text embedding,” in Proc. Adv. Neural Inf. Process. Syst, vol. 32, 2019, pp. 8208–8217. [Google Scholar]
  • [28].Dai X and Müller H-G, “Principal component analysis for functional data on Riemannian manifolds and spheres,” Ann. Statist, vol. 46, no. 6B, pp. 3334–3361, 2018. [Google Scholar]
  • [29].Gu A, Sala F, Gunel B, and Ré C, “Learning mixed-curvature representations in product spaces,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 2898–2918. [Google Scholar]
  • [30].Rahman IU, Drori I, Stodden VC, Donoho DL, and Schröder P, “Multiscale representations for manifold-valued data,” Multiscale Model. & Simul, vol. 4, no. 4, pp. 1201–1232, 2005. [Google Scholar]
  • [31].Tournier M, Wu X, Courty N, Arnaud E, and Reveret L, “Motion compression using principal geodesics analysis,” in Computer Graphics Forum, vol. 28, no. 2. Oxford, U.K.: Wiley Online Library, 2009, pp. 355–364. [Google Scholar]
  • [32].Anirudh R, Turaga P, Su J, and Srivastava A, “Elastic functional coding of human actions: From vector-fields to latent variables,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit, 2015, pp. 3147–3155. [Google Scholar]
  • [33].Fletcher PT, Lu C, and Joshi S, “Statistics of shape via principal geodesic analysis on Lie groups,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit, vol. 1. Piscataway, NJ, USA: IEEE Press, 2003, pp. I–I. [Google Scholar]
  • [34].Sommer S, Lauze F, and Nielsen M, “Optimization over geodesics for exact principal geodesic analysis,” Adv. Comput. Math, vol. 40, no. 2, pp. 283–313, 2014. [Google Scholar]
  • [35].Huckemann S, Hotz T, and Munk A, “Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric lie group actions,” Statistica Sinica, vol. 20, no. 1, pp. 1–58, 2010. [Google Scholar]
  • [36].Lazar D and Lin L, “Scale and curvature effects in principal geodesic analysis,” J. Multivariate Anal, vol. 153, pp. 64–82, Jan. 2017. [Google Scholar]
  • [37].Pennec X, “Barycentric subspace analysis on manifolds,” Ann. Statist, vol. 46, no. 6A, pp. 2711–2746, 2018. [Google Scholar]
  • [38].Sommer S, Lauze F, Hauberg S, and Nielsen M, “Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations,” in Comput. Vis. (ECCV): 11th Eur. Conf. Comput. Vis, Heraklion, Crete, Greece. Springer, 2010, pp. 43–56. [Google Scholar]
  • [39].Tabaghi P, Chien E, Pan C, Peng J, and Milenković O, “Linear classifiers in product space forms,” 2021, arXiv:2102.10204.
  • [40].Liu K, Li Q, Wang H, and Tang G, “Spherical principal component analysis,” in Proc. SIAM Int. Conf. Data Mining, Philadelphia, PA, USA: SIAM, 2019, pp. 387–395. [Google Scholar]
  • [41].Chami I, Gu A, Nguyen DP, and Ré C, “HoroPCA: Hyperbolic dimensionality reduction via horospherical projections,” in Proc. Int. Conf. Mach. Learn. PMLR, 2021, pp. 1419–1429. [Google Scholar]
  • [42].Chakraborty R, Seo D, and Vemuri BC, “An efficient exact-PGA algorithm for constant curvature manifolds,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit, 2016, pp. 3976–3984. [Google Scholar]
  • [43].Gallier J and Quaintance J, Differential Geometry and Lie Groups: A Computational Perspective. Springer Nature, 2020. [Google Scholar]
  • [44].Pearson K, “LIII. On lines and planes of closest fit to systems of points in space,” London, Edinburgh, Dublin Philos. Mag. J. Sci, vol. 2, no. 11, pp. 559–572, 1901. [Google Scholar]
  • [45].Lahav A and Talmon R, “Procrustes analysis on the manifold of SPSD matrices for data sets alignment,” IEEE Trans. Signal Process, vol. 71, pp. 1907–1921, 2023. [Google Scholar]
  • [46].Lou A, Katsman I, Jiang Q, Belongie S, Lim S-N, and De Sa C, “Differentiating through the Fréchet mean," in Proc. Int. Conf. Mach. Learn, PMLR, 2020, pp. 6393–6403. [Google Scholar]
  • [47].Huckemann S and Ziezold H, “Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces,” Adv. Appl. Probability, vol. 38, no. 2, pp. 299–319, 2006. [Google Scholar]
  • [48].Jung S, Dryden IL, and Marron JS, “Analysis of principal nested spheres,” Biometrika, vol. 99, no. 3, pp. 551–568, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [49].Higham NJ, “J-orthogonal matrices: Properties and generation,” SIAM Rev., vol. 45, no. 3, pp. 504–519, 2003. [Google Scholar]
  • [50].Slapničar I and Veselić K, “A bound for the condition of a hyperbolic eigenvector matrix,” Linear Algebra Appl., vol. 290, nos. 1–3, pp. 247–255, 1999. [Google Scholar]
  • [51].Lahti L, Salojärvi J, Salonen A, Scheffer M, and De Vos WM, “Tipping elements in the human intestinal ecosystem,” Nature Commun., vol. 5, no. 1, 2014, Art. no. 4344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Charlson ES et al. , “Disordered microbial communities in the upper respiratory tract of cigarette smokers,” PLoS One, vol. 5, no. 12, 2010, Art. no. e15216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Wickett NJ et al., “Phylotranscriptomic analysis of the origin and early diversification of land plants,” Proc. Nat. Acad. Sci, vol. 111, no. 45, pp. 4859–4868, Oct. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Mai U and Mirarab S, “TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees,” BMC Genomics, vol. 19, no. S5, May 2018, Art. no. 272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [55].Comte A et al. , “PhylteR: Efficient identification of outlier sequences in phylogenomic datasets,” Mol. Biol. Evol, vol. 40, no. 11, Nov. 2023, Art. no. msad234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].Estabrook GF, McMorris FR, and Meacham CA, “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units,” Systematic Biol., vol. 34, no. 2, pp. 193–200, Jun. 1985. [Google Scholar]
  • [57].Warnow T, Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation. Cambridge, U.K.: Cambridge Univ. Press, 2017. [Google Scholar]
  • [58].Zhang C, Rabiee M, Sayyari E, and Mirarab S, “ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees,” BMC Bioinf., vol. 19, no. S6, May 2018, Art. no. 153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [59].Mirsky L, “A trace inequality of John von Neumann,” Monatshefte für Mathematik, vol. 79, no. 4, pp. 303–306, 1975. [Google Scholar]