Abstract
Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conform to Euclidean geometry, other spaces are more appropriate for specific data types, such as hierarchical and cyclic data structures. We study PCA in space forms, that is, Riemannian manifolds of constant curvature. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which mostly rely on iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces, and show that it outperforms alternative methods in estimating true subspaces (on simulated data) in convergence speed or accuracy, and often both.
Index Terms—Principal component analysis, Riemannian manifolds, hyperbolic and spherical spaces
I. Introduction
GIVEN a set of multivariate points, principal component analysis (PCA) finds orthogonal basis vectors so that different components of the data, in the new coordinates, become uncorrelated and the leading bases carry the largest projected variance of the points. PCA is related to factor analysis [1], the Karhunen-Loève expansion, and singular value decomposition [2] — with a history going back to the 18th century [3]. The modern formalism of PCA goes back to the work of Hotelling [4]. Owing to its interpretability and flexibility, PCA has been an indispensable tool in data science applications [5]. The PCA formulation has been studied extensively in the literature. Tipping and Bishop [6] established a connection between factor analysis and PCA in a probabilistic framework. Other extensions have been proposed [7], e.g., Gaussian process [8], sensible [9], Bayesian [10], sparse [11], [12], [13], [14], and robust PCA [15].
PCA's main features are its linearity and nested optimality of subspaces with different dimensions. PCA uses a linear transformation to extract features. Thus, applying PCA to non-Euclidean data ignores their geometry, produces points that may not belong to the original space, and breaks downstream applications relying on this geometry [16], [17], [18], [19].
We focus on space forms: complete, simply connected Riemannian manifolds of constant curvature — spherical, Euclidean, or hyperbolic spaces (positive, zero, and negative curvature, respectively) [20]. Space forms have gained attention in the machine learning community due to their ability to represent many forms of data. Hyperbolic spaces are suitable for hierarchical structures [18], [21], [22], [23], [24], biological data [25], [26], and phylogenetic trees [17]. Spherical spaces find application in text embeddings [27], longitudinal data [28], and cycle structures in graphs [29].
To address the shortcomings of Euclidean PCA for non-Euclidean data, several authors propose Riemannian PCA methods [16], [30], [31], [32], [33], [34], [35], [36], [37], [38]. Riemannian manifolds generally lack a vector space structure [39], posing challenges for defining principal components. A common approach for dimensionality reduction of manifold-valued data relies on tangent spaces. Fletcher et al. [33] propose a cost to quantify the quality of a Riemannian affine subspace but use a heuristic approach, principal geodesic analysis (PGA), to optimize it: (1) the base point (intrinsic mean) is the solution to a fixed-point problem, and (2) Euclidean PCA in the tangent space estimates the low-rank tangent vectors. Even more principled approaches do not readily yield tractable solutions necessary for analyzing large-scale data [34], as seen in spherical and hyperbolic PCAs [28], [40], [41].
Despite recent progress, PCA in space forms remains inadequately explored. In general, cost-based Riemannian (e.g., spherical and hyperbolic) PCAs rely on finding the optimal Riemannian affine subspace by minimizing a nonconvex function. The cost function, proxy, or methodology is usually inspired by the Euclidean cost, with no definitive justification for it [16], [28], [33], [35], [40], [41]. These algorithms rely on iterative methods to estimate the Riemannian affine subspaces, e.g., gradient descent, fixed-point iterations, and proximal alternating minimization; they are slow to converge and require parameter tuning. There is also no guarantee that the estimated Riemannian affine subspaces form a total order under inclusion (i.e., that optimal higher-dimensional subspaces include lower-dimensional ones) unless the cost is minimized in a greedy (suboptimal) fashion, building high-dimensional subspaces on top of previously estimated low-dimensional ones. Notably, Chakraborty et al. propose a greedy PGA for space forms that estimates one principal geodesic at a time [42]. They derive an analytic formula for projecting a point onto a parameterized geodesic, which simplifies the projection step of PGA. However, one still has to solve a nonconvex optimization problem (with no theoretical guarantees) to estimate the principal geodesic at each iteration.
We address PCA limitations in space forms by proposing a closed-form, theoretically optimal, and computationally efficient method to derive all principal geodesics at once. We begin with a differential geometric view of Euclidean PCA (Section II), followed by a generic description of Riemannian PCA (Section III). In this view, a proper PCA cost function must (1) naturally define a centroid for manifold-valued points and (2) yield theoretically optimal affine subspaces forming a total order under inclusion. We introduce proper costs for spherical (Section IV) and hyperbolic (Section V) PCA problems. Minimizing each cost function leads to an eigenequation, which can be effectively solved. For hyperbolic PCA, the optimal affine subspace solves an eigenequation in Lorentzian space which is equipped with an indefinite inner product. These results give us efficient algorithms to derive hyperbolic principal components. We delegate all proofs to the Appendix.
A. Preliminaries and Notations
Let be a Riemannian manifold. The tangent space is the collection of all tangent vectors at . The Riemannian metric is given by a positive-definite inner product and depends smoothly on . We use to define notions such as subspace, norms, and angles, similar to inner product spaces. For any subspace , we define its orthogonal complement as follows:
| (1) |
The norm of is . We denote the length of a smooth curve as . A geodesic is the shortest-length path between and , that is, . Interpreting the parameter as time, if a geodesic starts at with initial velocity , the exponential map gives its position at . For and , the logarithmic map gives the initial velocity to move (with constant speed) along the geodesic from to in one time step. A Riemannian manifold is geodesically complete if the exponential and logarithmic maps, at every point , are well-defined operators [43]. A submanifold of a Riemannian manifold () is geodesic if any geodesic on with its induced metric is also a geodesic on . For , we let and . The variable is an element of the vector . It can also be an indexed vector, e.g., . This distinction will be clarified in the context. We use to denote the empirical mean of its inputs with indices in .
II. Principal Component Analysis — Revisited
Following Pearson's original notion [44], PCA finds the optimal low-dimensional affine space to represent data. Let and let the column span of be a subspace. For the affine subspace , PCA assumes the following cost:
where , computes the distance, and the distortion function . This formalism relies on (1) an affine subspace , (2) the projection operator , and (3) the distortion function . To generalize affine subspaces to Riemannian manifolds, consider parametric lines:
| (2) |
where , the orthogonal complement of the subspace ; see Fig. 1(a) and 1(b). We reformulate affine subspaces as:
where is the dot product and .
Fig. 1.

(a,b) One- (a) and two-dimensional (b) affine subspaces in . We show subspaces () at point instead of the origin. We may define the same Riemannian affine subspace using other base points, e.g., . (c) Two-dimensional affine subspace in a hyperbolic space (Poincaré) where .
Definition 1: An affine subspace is a set of points, e.g., , where there exists such that (tangent of) the line (i.e., ) is normal to a set of tangent vectors at .
Definition 1 requires parameters to describe . For example, in , we need two orthonormal vectors to represent a one-dimensional affine subspace; see Fig. 1(a). We use Definition 1 since it describes affine subspaces in terms of lines and tangent vectors, not a global coordinate system.
III. Riemannian Principal Component Analysis
We next introduce Riemannian affine subspaces and then propose a generic framework for Riemannian PCA.
A. Riemannian Affine Subspaces
The notion of affine subspaces can be extended to Riemannian manifolds using tangent subspaces of the Riemannian manifold [33]. The Riemannian affine subspace is the image of a subspace of under the exponential map, i.e.,
| (3) |
where is a subspace of and . Equivalently, the Riemannian affine subspace is defined as follows.
Definition 2: Let be a geodesically complete Riemannian manifold, , and subspace . We let where is the orthogonal complement of ; see equation (1).
The set is a collection of points on geodesics originating at such that their initial velocities are normal to vectors in , or form a subspace ; cf. Definition 1.
Example 1: When is a one-dimensional subspace, then contains geodesics that pass through with the initial velocity in , i.e., . Thus, with equation (3), geodesics are one-dimensional Riemannian affine subspaces.
Example 2: The Euclidean exponential map is ; see Table I. Therefore, equation (3) recovers the affine subspaces, i.e., .
TABLE I.
Summary of Relevant Operators in Euclidean, Spherical, and Hyperbolic Spaces
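As a concrete companion to Table I, the following is a minimal sketch of the spherical and hyperbolic exponential and logarithmic maps for unit-magnitude curvature (the radius-1 sphere and the curvature -1 hyperboloid model), under the assumption that the Lorentzian inner product is [x, y] = -x_0 y_0 + x_1 y_1 + ... + x_d y_d; the function names are illustrative, not the paper's.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p with velocity v."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

def sphere_log(p, x):
    """Logarithmic map on the unit sphere: initial velocity of the geodesic from p to x."""
    theta = np.arccos(np.clip(p @ x, -1.0, 1.0))
    u = x - np.cos(theta) * p
    n = np.linalg.norm(u)
    return theta * u / n if n > 1e-12 else np.zeros_like(p)

def lorentz(x, y):
    """Assumed Lorentzian inner product [x, y] = -x_0 y_0 + sum_i x_i y_i."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def hyper_exp(p, v):
    """Exponential map on the curvature -1 hyperboloid; tangent v has positive Lorentz norm."""
    t = np.sqrt(max(lorentz(v, v), 0.0))
    if t < 1e-12:
        return p
    return np.cosh(t) * p + np.sinh(t) * v / t

def hyper_log(p, x):
    """Logarithmic map on the curvature -1 hyperboloid."""
    theta = np.arccosh(max(-lorentz(p, x), 1.0))
    u = x - np.cosh(theta) * p
    n = np.sqrt(max(lorentz(u, u), 0.0))
    return theta * u / n if n > 1e-12 else np.zeros_like(p)
```

In both geometries, composing the two maps round-trips a tangent vector, which is a quick sanity check on any implementation.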
Recall that a nonempty set is a Euclidean affine subspace if and only if there exists such that and for all and . We have a similar definition for Riemannian affine subspaces.
Definition 3: Let be a geodesically complete Riemannian manifold. Then, nonempty is an affine set if and only if there exists such that and for all and .
B. Proper Cost for Riemannian PCA
Similar to Euclidean PCA, Riemannian PCA aims to find a (Riemannian) affine subspace with minimum average distortion between points and their projections.
Definition 4: Let be a geodesically complete Riemannian manifold that is equipped with distance function , and subspace . A geodesic projection of onto is . If , then for any geodesic projection .
Remark 1: Projecting a manifold-valued point onto a Riemannian affine subspace is not a trivial task, often requiring the solution of a nonconvex optimization problem over a submanifold, i.e., solving . Definition 4 states that if a solution exists (which is not always guaranteed), the projection distance must be equal to . This also requires computing the logarithmic map, which may not be available for all Riemannian manifolds.
In Euclidean PCA, minimizing the cost is equivalent to maximizing the variance of the projected data. To avoid ambiguity regarding the notions of variance and centroid, we formalize the cost (parameterized by ) in terms of the projection distance, viz.,
| (4) |
where and is a monotonically increasing distortion function. The projection point may not be unique. The minimizer of the cost, if it exists, is the best affine subspace to represent manifold-valued points.
Definition 5: Riemannian PCA aims to minimize the cost in equation (4) — for a specific choice of distortion function .
Choice of f. The closed-form solution for the optimal Euclidean affine subspace is due to letting . This is a proper cost function with the following properties:
Consistent Centroid. The optimal 0-dimensional affine subspace (a point) is the centroid of data points, i.e., .
Nested Optimality. The optimal affine subspaces form a nested set, i.e., where is the optimal -dimensional affine subspace.
Definition 6: For Riemannian PCA, we call a proper cost function if its minimizers satisfy the consistent centroid and nested optimality conditions.
Deriving the logarithm operator is not a trivial task for general Riemannian manifolds, e.g., the manifold of rank-deficient positive semidefinite matrices [45]. Focusing on constant-curvature Riemannian manifolds, we propose distortion functions that, unlike existing methods, arrive at proper cost functions with closed-form optimal solutions.
IV. Spherical PCA
Consider the spherical manifold () with curvature , where , a sphere with radius , and computes the dot product of .
A. Spherical Affine Subspace and the Projection Operator
Let and the subspace . Following Definition 2 and Table I, the spherical affine subspace is:
where is the direct sum operator, i.e., , and is the orthogonal complement of ; see equation (1). This matches Pennec's notion of the metric completion of the exponential barycentric spherical subspace [37].
Claim 1: is a geodesic submanifold.
There are orthogonal tangent vectors that form a complete basis for , i.e., , where if and if . Using these basis vectors, we derive a simple expression for the projection distance.
Proposition 1: For any and , we have
where are the complete orthogonal basis vectors of . Both and have a fixed curvature .
For points at distance from , there is no unique projection onto the affine subspace. Nevertheless, Proposition 1 provides a closed-form expression for the projection distance in terms of basis vectors for . This distance monotonically increases with the length of the residual of onto , i.e., . Since is common, switching to the basis of helps us represent the projection distance with fewer parameters.
Proposition 2: For any and , we have
where are complete orthogonal basis vectors of .
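Since a spherical affine subspace is a geodesic submanifold (Claim 1), i.e., a great subsphere, geodesic projection onto it reduces to a renormalized linear projection. The following is a minimal sketch under that assumption, for the unit sphere, with B an orthonormal Euclidean basis of the span of the base point and the tangent subspace; the function name and interface are ours.

```python
import numpy as np

def project_to_subsphere(x, B, radius=1.0):
    """Geodesic projection of x onto the great subsphere cut out by span(B).

    B: (d+1, k+1) matrix with orthonormal columns spanning the base point
    and tangent directions. Sketch: project linearly, then renormalize."""
    y = B @ (B.T @ x)                  # Euclidean projection onto span(B)
    return radius * y / np.linalg.norm(y)
```

For points whose linear projection vanishes (e.g., points orthogonal to span(B)), the projection is not unique, matching the remark above about points at maximal distance.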
Next, we derive an isometry between and — where both have the fixed curvature .
Theorem 2: The isometry and its inverse are
where are complete orthogonal basis vectors of .
Corollary 1: The dimension of is .
Finally, we can provide an alternative view of spherical affine subspaces based on sliced unitary matrices .
Claim 3: For any , there is a sliced-unitary operator and vice versa.
B. Minimum Distortion Spherical Subspaces
To define principal components, we need a specific choice of distortion function ; see equation (4). Before presenting our choice, let us discuss previously studied cost functions.
1). Review of Existing Work:
Dai and Müller consider an intrinsic PCA for smooth functional data on a Riemannian manifold [28] with the distortion function , i.e.,
| (5) |
Their algorithm, Riemannian functional principal component analysis (RFPCA), first solves for the base point — the optimal zero-dimensional affine subspace, i.e., the Fréchet mean . Then, they project each point to using the logarithmic map. Next, they perform PCA on the resulting tangent vectors to obtain the -dimensional tangent subspace. Finally, they map the projected tangent vectors back to (spherical space) using the exponential map. Despite its simplicity, this approach suffers from four shortcomings. (1) There is no closed-form expression for the Fréchet mean of spherical data. (2) The computational complexity of estimating a Fréchet mean lacks theoretical analysis, and its computation involves an argmin operation that often cannot be easily differentiated [46]. (3) Even with an accurately estimated Fréchet mean, there is no guarantee that it is the optimal base point for problem (5): Huckemann and Ziezold [47] show that the Fréchet mean may not belong to the optimal one-dimensional affine spherical subspace. (4) Even if the Fréchet mean happens to be the optimal base point, performing PCA in the tangent space does not solve problem (5).
Liu et al. propose a spherical matrix factorization problem:
| (6) |
where is the measurement set and are features in a spherical space with [40]. They propose a proximal algorithm to solve for the affine subspace and features. This formalism is not a spherical PCA because the measurements do not belong to a spherical space. The objective in equation (6) aims to best project Euclidean points to a low-dimensional spherical affine subspace with respect to the squared Euclidean distance — refer to Claim 2. Nevertheless, if we change their formalism and let the input space be a spherical space, we arrive at:
| (7) |
where is a tangent subspace that corresponds to (see Claim 2) and is the spherical projection operator. This formalism uses distortion function .
Principal nested spheres, proposed by Jung et al. [48], is an alternative procedure that iteratively reduces the dimensionality of spherical data. It finds the optimal ()-dimensional subsphere by minimizing the following cost,
where — over and . This is a constrained nonlinear optimization problem without closed-form solutions. Once they estimate the optimal , they map each point to the lower-dimensional spherical space — and repeat this process until they reach the target dimension. The subspheres are not necessarily great spheres, making this decomposition nongeodesic.
2). A Proper Cost Function for Spherical PCA:
In contrast to distortions and used by Liu et al. [40] and Dai and Müller [28], we choose . Using Proposition 1, we arrive at:
| (8) |
i.e., the average norm of the projected points in the directions of vectors . The expression (8) leads to a tractable constrained optimization problem.
Claim 4: Let . The spherical PCA equation (8) aims to find and orthogonal that minimize .
The solution to the problem in Claim 4 is the minimizer of the cost in equation (8): a set of orthogonal vectors that capture, in quantum physics terminology, the least possible energy of .
Theorem 5: Let . Then, an optimal solution for is the leading eigenvector of , and (basis vectors of ) are the eigenvectors that correspond to the smallest eigenvalues of the second-moment matrix .
Corollary 2: The optimal subspace is spanned by leading eigenvectors of , discarding the first one.
Claim 6: The cost function in equation (8) is proper.
The distortion in equation (8) implies a closed-form definition for the centroid of spherical data points, i.e., a zero-dimensional affine subspace that best represents the data.
Definition 7: A spherical mean for point set is any point such that . The solution is a scaled leading eigenvector of .
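Theorem 5 and Definition 7 together give a closed-form recipe: eigendecompose the second-moment matrix, take the leading eigenvector as the (scaled) base point, and the next k eigenvectors as the tangent subspace. A minimal sketch of that recipe follows; the function name and interface are ours, not the paper's.

```python
import numpy as np

def spherical_pca(X, k, radius=1.0):
    """Sketch of the eigen-solution in Theorem 5 / Definition 7.

    X: (n, d+1) array of points on the sphere of the given radius.
    Returns a spherical mean p (scaled leading eigenvector of the
    second-moment matrix) and k tangent directions spanning the subspace."""
    C = X.T @ X / len(X)          # second-moment matrix
    w, V = np.linalg.eigh(C)      # eigenvalues in ascending order
    V = V[:, ::-1]                # reorder to descending eigenvalues
    p = radius * V[:, 0]          # base point: leading eigenvector
    H = V[:, 1:k + 1]             # tangent subspace: next k eigenvectors
    return p, H
```

Because the eigenvectors of a symmetric matrix are orthogonal, the returned tangent directions are automatically orthogonal to the base point, and subspaces of different dimensions are nested by construction.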
Interpreting the optimal base point as the spherical mean in Definition 7 shows our spherical PCA has consistent centroid and nested optimality: optimal spherical affine subspaces of different dimensions form a chain under inclusion. However, is not unique and only identifies the direction of the main component of points; see Fig. 2.
Fig. 2.

(a) A set of data points in , where . (b) The best estimate for the base point and the tangent subspace — the spherical affine subspace . (c) The projection of points onto . (d) The low-dimensional features in , where .
Remark 2: PGA [33] and RFPCA [28] involve the intensive task of Fréchet mean estimation. This involves iterative techniques like gradient descent or fixed-point iterations on nonlinear optimization objectives. There has been work on the numerical analysis of Fréchet mean computations [46]. On the other hand, SPCA [40] uses alternating linearized minimization to estimate the optimal subspace. In contrast, our method (SFPCA) requires computing the second-moment matrix with a complexity of and involves eigendecomposition with a worst-case complexity of .
V. Hyperbolic PCA
Let us first introduce Lorentzian spaces.
Definition 8: The Lorentzian ()-space, denoted by , is a vector space equipped with the Lorentzian inner product
where is the identity matrix.
The -dimensional hyperbolic manifold with curvature , where and metric for ; see Table I.
A. Eigenequation in Lorentzian spaces
Like inner product spaces, we define operators in .
Definition 9: Let be a matrix (operator) in . We let be the -adjoint of iff . We let be the -inverse of iff . An invertible matrix is called -unitary iff ; see [49] for more detail.
The Lorentzian space is equipped with an indefinite inner product, i.e., . Therefore, it requires a form of eigenequation defined by its indefinite inner product. For completeness, we propose the following definition of eigenequation in the complex Lorentzian space .
Definition 10: For is its eigenvector and is the corresponding -eigenvalue if
| (9) |
and is the complex conjugate of . The sign of the norm, , defines positive and negative -eigenvectors.
Definition 10 is subtly different from the hyperbolic eigenequation [50] — a special case of eigenvalue decomposition. We prefer Definition 10 as it carries familiar results over from Euclidean space to the Lorentzian space.
Proposition 3: If , then its -eigenvalues are real.
Let be a -eigenvector of where . Then, the eigenvalue of ; see Definition 10. There is a connection between Euclidean and Lorentzian eigenequations. Namely, has eigenvector and . Then, is an eigenpair of .
Claim 7: -eigenvectors of are parallel to eigenvectors of .
Proposition 4 shows that the normalization factor is well-defined for full-rank matrices.
Proposition 4: If is full-rank, then .
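The connection between Euclidean and J-eigenequations can be sketched numerically as follows, assuming the J-eigenequation takes the generalized form H v = λ J v with J = diag(-1, 1, ..., 1); since J is its own inverse, this reduces to the standard eigenproblem (J H) v = λ v. The function name and interface are illustrative assumptions.

```python
import numpy as np

def j_eigenpairs(H):
    """Sketch of a J-eigensolver, assuming the form H v = lam * J v.

    Returns eigenvalues, eigenvectors, and the sign of v^T J v for each
    eigenvector (positive vs. negative J-eigenvectors, cf. Definition 10)."""
    d = H.shape[0]
    J = np.diag(np.r_[-1.0, np.ones(d - 1)])
    lam, V = np.linalg.eig(J @ H)        # H v = lam J v  <=>  (J H) v = lam v
    lam, V = lam.real, V.real            # take real parts; Proposition 3 gives
                                         # conditions for real J-eigenvalues
    norms = np.sum(V * (J @ V), axis=0)  # v^T J v classifies each eigenvector
    return lam, V, np.sign(norms)
```

Note that for a general symmetric H the eigenvalues of J H may be complex; the real-part step above is safe only under the conditions of Proposition 3.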
Our algorithm uses the connection between Euclidean and -eigenequations. We extend the notion of diagonalizability to derive the optimal affine subspace; see Proposition 7.
Definition 11: is -diagonalizable if and only if there is a -invertible such that , where is a diagonal matrix.
B. Hyperbolic Affine Subspace and the Projection Operator
Let and be a -dimensional subspace of . Following Definition 2 and Table I, we arrive at the following definition for the hyperbolic affine subspace:
| (10) |
where is the orthogonal complement of , i.e., . This also coincides with the metric completion of exponential barycentric hyperbolic subspace [37].
Claim 8: The hyperbolic subspace is a geodesic submanifold.
Lemma 1 shows that there is a complete set of orthogonal tangents where and span . In Proposition 5, we provide a closed-form expression for the projection distance onto in terms of the basis of .
Proposition 5: For any and , we have
where are complete orthogonal basis of .
The projection distance monotonically increases with the norm of its residual of onto , i.e., . Proposition 5 asks for the orthogonal basis of —commonly, a high-dimensional space. We can use the basis of to compute the projection distance.
Proposition 6: For any and , we have
where are complete orthonormal basis of .
We represent points in as a linear combination of the base point and tangent vectors. Given these vectors, we can find a low-dimensional representation for points in —reducing the dimensionality of hyperbolic data points.
Theorem 9: The isometry and its inverse are
where and has complete orthogonal basis vectors . Both and have curvature .
Corollary 3: The affine dimension of is .
Similar to the spherical case, we can characterize hyperbolic affine subspaces in terms of sliced -unitary matrices—paving the way for constrained optimization methods over sliced -unitary matrices to solve hyperbolic PCA problems.
Claim 10: For any , there is a sliced -unitary operator , i.e., , and vice versa.
C. Minimum Distortion Hyperbolic Subspaces
1). Review of Existing Work:
Chami et al. propose HoroPCA [41]. They define as the geodesic hull , where is a geodesic such that and for all . The geodesic hull of contains straight lines between and for all and .
Claim 11: is a hyperbolic affine subspace.
Their goal is to maximize a proxy for the projected variance:
| (11) |
where is the horospherical projection operator — which is not a geodesic (distance-minimizing) projection. They propose a sequential algorithm to minimize the cost in equation (11), using a gradient descent method, as follows: (1) the base point is computed as the Fréchet mean via gradient descent; and (2) a higher-dimensional affine subspace is estimated based on the optimal affine subspace of lower dimension. One may formulate the hyperbolic PCA problem as follows:
| (12) |
where are the measurements and are low-dimensional hyperbolic features. The formulation in equation (12) leads to the decomposition of a Euclidean matrix in terms of a sliced -unitary matrix and a hyperbolic matrix — a topic for future studies.
2). A Proper Cost Function for Hyperbolic PCA:
We choose to arrive at the following cost:
| (13) |
see Proposition 5. We interpret as the aggregate dissipated energy of the points in directions of normal tangent vectors. If has no components in the direction of normal tangents — i.e., where are orthogonal basis vectors for — then . Our distortion function in equation (13) leads to the formulation of hyperbolic PCA as a constrained optimization problem:
Problem 12: Let and . The hyperbolic PCA aims to find a point and a set of orthogonal vectors that minimize the function .
Problem 12 aims to minimize the cost in equation (13), i.e.,
where , , and , over . Problem 12 asks for -orthogonal vectors in an appropriate tangent space that capture the least possible energy of with respect to the Lorentzian scalar product.
Remark 3: The spectrum of a matrix is its set of eigenvalues. Discarding an eigenvalue from the matrix's eigenvalue decomposition approximates the matrix with an error proportional to the magnitude of the discarded eigenvalue, that is, the discarded energy. Similarly, one can define the -spectrum of the second-moment matrix as the set of its -eigenvalues. As we demonstrate in numerical experiments, the -spectrum can be used to identify outlier hyperbolic points.
This is akin to the Euclidean PCA: the subspace is spanned by the leading eigenvectors of the covariance matrix. However, to prove the hyperbolic PCA theorem, we need a technical result on -diagonalizability of the second-moment matrix.
Proposition 7: If is a symmetric and -diagonalizable matrix, i.e., , that has distinct (in absolute values) diagonal elements of , then where is a -unitary matrix.
From Proposition 7, any symmetric, -diagonalizable matrix with distinct (absolute) -eigenvalues has positive and one negative -eigenvectors — all orthogonal to each other.
Theorem 13: Let and be a -diagonalizable matrix. Then, the optimal solution for point is the scaled negative -eigenvector of and the optimal are the scaled positive -eigenvectors that correspond to the smallest -eigenvalues of . And is spanned by scaled positive -eigenvectors that correspond to the leading -eigenvalues of .
The -diagonalizability condition requires to be similar to a diagonal matrix. Proposition 7 provides a sufficient condition for its -diagonalizability; in fact, we conjecture that symmetry is sufficient even if it has repeated -eigenvalues.
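Under these assumptions, Theorem 13's recipe can be sketched end to end: solve the J-eigenequation of the second-moment matrix, take the (rescaled) negative J-eigenvector as the base point, and the leading positive J-eigenvectors as the tangent subspace. The form of the J-eigenequation (C v = λ J v with J = diag(-1, 1, ..., 1)), the hyperboloid normalization, and the function interface are our assumptions.

```python
import numpy as np

def hyperbolic_pca(X, k):
    """Sketch of Theorem 13 for points X (n, d+1) on the curvature -1
    hyperboloid (so x^T J x = -1), assuming C v = lam * J v with
    J = diag(-1, 1, ..., 1)."""
    n, D = X.shape
    J = np.diag(np.r_[-1.0, np.ones(D - 1)])
    C = X.T @ X / n                        # second-moment matrix
    lam, V = np.linalg.eig(J @ C)          # C v = lam J v  <=>  (J C) v = lam v
    lam, V = lam.real, V.real
    norms = np.sum(V * (J @ V), axis=0)    # v^T J v: sign classifies eigenvectors
    neg = np.where(norms < 0)[0]
    pos = np.where(norms > 0)[0]
    # base point: negative J-eigenvector, rescaled onto the hyperboloid
    p = V[:, neg[0]] / np.sqrt(-norms[neg[0]])
    if p[0] < 0:
        p = -p                             # keep the upper sheet
    # tangent subspace: k positive J-eigenvectors with largest J-eigenvalues
    order = pos[np.argsort(-lam[pos])][:k]
    H = V[:, order] / np.sqrt(norms[order])
    return p, H
```

On points sampled along a single geodesic, this sketch recovers the geodesic's base point and direction, mirroring the nested-optimality property claimed for the proper cost.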
Claim 14: The cost function in equation (13) is proper.
The proper cost in equation (13) implies the following closed-form definition for the hyperbolic centroid.
Definition 12: A hyperbolic mean of is if . The solution is the scaled negative -eigenvector of .
Remark 4: The formalisms of space form (Euclidean, spherical, and hyperbolic) PCAs show similarities through the use of (in)definite eigenequations. This arises from the introduction of proper cost functions, which result in costs that are quadratic in the base points and tangent vectors. However, this approach is not necessarily generalizable to other Riemannian manifolds. This limitation is due to the absence of (1) a simple Riemannian metric, (2) a closed-form distance function, and (3) closed-form exponential and logarithmic maps in general Riemannian manifolds, e.g., the manifold of rank-deficient positive semidefinite matrices [45].
VI. Numerical Results
We compare our space form PCA algorithm (SFPCA) to other leading algorithms in terms of accuracy and speed.
A. Synthetic Data and Experimental Setup
We generate random, noise-contaminated points on known (but random) spherical and hyperbolic affine subspaces. We then apply PCA methods to recover the projected points after estimating the affine subspace. We conduct experiments examining the impact of the number of points , the ambient dimension , the dimension of the affine subspace , and the noise level on the performance of the algorithms.
1). Random Affine Subspace:
For fixed ambient and subspace dimensions, and , we sample from a normal distribution and normalize it to get the spherical (or hyperbolic) point . We then generate random vectors from the standard normal distribution and use the Gram-Schmidt process to construct tangent vectors: project the first random vector onto and normalize it to , where . We then project the second random vector onto , where , and normalize it to . We repeat this until we form a -dimensional affine subspace in .
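The Gram-Schmidt construction above can be sketched as follows, for the unit-curvature spherical setting; the function name and interface are illustrative assumptions.

```python
import numpy as np

def random_affine_subspace(d, k, rng):
    """Sample a random base point p on the unit sphere in R^{d+1}, then build
    k tangent vectors by Gram-Schmidt so that each is unit-norm and
    orthogonal to p and to all previously constructed tangents."""
    p = rng.standard_normal(d + 1)
    p /= np.linalg.norm(p)
    fixed = [p]
    H = []
    for _ in range(k):
        v = rng.standard_normal(d + 1)
        for b in fixed:
            v -= (v @ b) * b      # remove components along p and earlier tangents
        v /= np.linalg.norm(v)
        fixed.append(v)
        H.append(v)
    return p, np.stack(H, axis=1)
```

The returned pair (p, H) then parameterizes a random k-dimensional affine subspace on which noise-contaminated points can be generated.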
2). Noise-Contaminated Points:
Let be the subspace of . For , we generate and let . To add noise, we project onto , i.e., . We then let be the noise-contaminated point. Finally, if and if .
3). PCA on Noisy Data:
We use each algorithm to estimate an affine subspace where and are the estimated parameters. We let be the empirical mean of the distance between measurements and the true subspace, and be the average distance between denoised points and the true affine subspace. If is a good approximation to , then is small. We evaluate the performance of algorithms using the normalized output error, .
Remark 5: The ratio of over quantifies how much the estimated points are farther from the true subspace compared to the original noise-contaminated points. This is a normalized quantity, i.e., it is invariant with respect to the scale of data points, which makes it ideal for comparing results as , , and vary. A reasonable upper bound for this ratio is 1 — as PCA is expected to denoise the point sets by finding the optimal low-dimensional affine subspace for them.
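The metric in Remark 5 is a simple ratio of mean distances; a sketch follows, where the point-to-subspace distance is supplied as a callable (the interface is our assumption).

```python
import numpy as np

def normalized_output_error(noisy, denoised, dist_to_true):
    """Ratio of the mean distance of denoised points to the true subspace
    over the mean distance of the noisy inputs to it; values below 1 mean
    the method denoised the point set."""
    d_in = np.mean([dist_to_true(x) for x in noisy])
    d_out = np.mean([dist_to_true(x) for x in denoised])
    return d_out / d_in
```

As noted above, the ratio is scale-invariant, so results remain comparable as the experimental parameters vary.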
4). Randomized Experiments and Algorithms:
For each random affine subspace and noise-contaminated point set, we report the normalized error and the running time for each algorithm. Then, we repeat each random trial 100 times. We use our implementation of principal geodesic analysis (PGA) [33]. We also implement Riemannian functional principal component analysis (RFPCA) for spherical PCA [28] and spherical principal component analysis (SPCA) [40]. Since SPCA is computationally expensive, we first run our SFPCA to provide it with good initial parameters. For hyperbolic experiments, we use HoroPCA [41] and barycentric subspace analysis (BSA) [37], implemented by Chami et al. [41].
B. Spherical PCA
1). Experiment :
For a fixed , increasing the subspace dimension worsens the normalized output errors for all algorithms; see Fig. 3(a). RFPCA is unreliable, while other methods are similar in their error reduction pattern. When is close to , SFPCA has a marginal but consistent advantage over PGA. SFPCA is faster than the rest, and has a minor impact on running times.
Fig. 3.

For each spherical experiment, on the y-axes, we report running time and normalized output error. We show the median (solid line) and the first and third quartiles (shaded transparent area) over all random trials. Figures (a,b,c) show the results for , respectively. All axes are in logarithmic scale.
2). Experiments and :
In — fixed and varying — PGA, SFPCA, and SPCA exhibit a similar denoising performance, not impacted by ; see Fig. 3 (b). RFPCA has higher output error levels than other methods. To further compare SFPCA and its close competitor PGA, we design the challenging experiment with and . In this setting, SFPCA exhibits a clear advantage over PGA in error reduction; see Fig. 4. In both settings, SFPCA continues to be the fastest in almost all conditions despite using a warm start for PGA.
Fig. 4.

Spherical experiment . The y-axes show running time and normalized output error. All axes are in logarithmic scale.
3). Experiment :
For fixed and , when we change and , our SFPCA has the fastest running time and is tied with SPCA and PGA for the lowest normalized output error; see Fig. 3(c). As expected, increasing generally makes all methods slower, partially because the computation of has complexity. Computing a base point using iterative computations on all points is time-consuming with , whereas our SFPCA has a worst-case complexity of . SFPCA provides error reductions similar to the rest, likely because each algorithm is given an excessive number of points. SPCA fails in some cases, as is evident from the erratic behavior of its normalized output error; this may be due to a failure to converge within the allocated maximum running time. SPCA takes about 15 minutes on 10^4 points in each trial, while our SFPCA takes less than a second. PGA is the closest competitor in normalized error but is about three times slower.
C. Hyperbolic PCA
1). Experiments and :
On small datasets in , for each trial, HoroPCA and BSA take close to an hour whereas SFPCA and PGA take milliseconds; see Fig. 5(a). Increasing only increases the running time of BSA and HoroPCA but does not change SFPCA's and PGA's. This is expected, as BSA and HoroPCA estimate an affine subspace greedily, one dimension at a time. Regarding error reduction, as expected, all methods become less effective as grows. For small , all methods achieve similar normalized output error levels, with only a slight advantage for PGA and SFPCA. As increases, PGA and HoroPCA become less effective compared to BSA and SFPCA. For large , SFPCA exhibits a clear advantage over all other methods. In the larger experiments (), we compare the two fastest methods, SFPCA and PGA; see Fig. 6(a). When is small, both methods have similar denoising performance for small ; SFPCA performs better only for larger . As increases, SFPCA outperforms PGA irrespective of .
Fig. 5.

For each scaled-down hyperbolic experiment, on the y-axes, we report running time and normalized output error in logarithmic scale. We report the median (solid line) and the first and third quartiles (shaded transparent area) over all random trials. Figures in rows (a), (b), and (c) are , and .
Fig. 6.

For each full-scale hyperbolic experiment, on the y-axes, we report running time and normalized output error. We report the median (solid line) and the first and third quartiles (shaded transparent area). Figures in rows (a), (b), and (c) are , and . All axes are in logarithmic scale.
2). Experiments and :
In , we fix and in , we let . Changing impacts each method differently; see Fig. 5(b). Both SFPCA and PGA take successively more time as increases, but they remain significantly faster than the other two, with average running times below 0.1 seconds. The running time of HoroPCA is (almost) agnostic to since its cost function (projected variance) is free of the parameter . Neither HoroPCA nor BSA outperforms SFPCA in error reduction. All methods improve in their error reduction performance as increases. For large , SFPCA provides the best error reduction performance among all algorithms. Comparing the fastest methods SFPCA and PGA, we observe consistent patterns in and : (1) SFPCA is faster regardless of , and the gap between the two methods can be as high as a factor of 10. (2) When , PGA slightly outperforms SFPCA in reducing error; with the lowest noise (), PGA gives 17% better accuracy on average over all values of . However, as increases, SFPCA becomes more effective; at the highest end , SFPCA outperforms PGA by 40% on average over ; see Fig. 6(b).
3). Experiments and :
In , (), increasing impacts the running time of SFPCA and PGA due to computing ; see Fig. 5(c). Nevertheless, both are orders of magnitude faster than HoroPCA and BSA. All methods provide improved error reduction as increases. Comparing the fast methods SFPCA and PGA on larger datasets shows that SFPCA is always faster, has a slight disadvantage in output error on small , and substantial improvements on large ; see Fig. 6(c).
D. Real Data: Spherical Spaces
We evaluate the performance of PCA methods using the following datasets: (1) Intestinal Microbiome: Lahti et al. [51] analyzed the gut microbiota of adults covering bacterial groups. The study explored the effects of age, nationality, BMI, and DNA extraction methods on the microbiome. They assessed variations in microbiome compositions across young (18 to 40), middle-aged (41 to 60), and older (61 to 77) age groups (a ternary classification problem). (2) Throat Microbiome: Charlson et al. [52] investigated the impact of cigarette smoking on the airway microbiota in 29 smokers and 33 non-smokers (a binary classification problem) using culture-independent 454 pyrosequencing of 16S rRNA. (3) Newsgroups: Using Python's scikit-learn package, the 20 newsgroups dataset was streamlined to a binary classification problem by retaining samples from two distinct classes. Feature reduction was performed using TF-IDF, narrowing it down to features to improve computational efficiency. Each dataset underwent standard preprocessing, e.g., normalization and square-root transformation, to ensure the data points are spherically distributed. For a fixed subspace dimension, we estimate spherical affine subspaces. Then, we compute the projected spherical data points and denoise the original compositional data.
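The square-root transform mentioned above is what places compositional data on the sphere: if the components of a sample sum to 1, then the vector of their square roots has unit Euclidean norm. A minimal sketch:

```python
import numpy as np

def compositions_to_sphere(counts):
    """Normalize count data to compositions (rows sum to 1), then apply the
    square-root transform so every sample lies on the unit sphere:
    sum_i (sqrt(p_i))^2 = sum_i p_i = 1."""
    counts = np.asarray(counts, dtype=float)
    comps = counts / counts.sum(axis=1, keepdims=True)
    return np.sqrt(comps)
```

After this preprocessing, spherical PCA methods can treat each sample as a point on the unit sphere.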
Distortion Analysis.
For compositional data, we calculate distance matrices using Aitchison , Jensen–Shannon , and total variation () metrics. We also compute the spherical distance matrix (). For each embedding dimension , we compute projected point sets. We then compute the normalized errors; an example of normalized error is where and are total variation distance matrices for the original and estimated data. For each algorithm, we then divide these normalized errors by their average across all algorithms, providing relative measures, that is, if the resulting relative error is greater than 1, the algorithm performs worse than average. We then report the mean and standard deviation of relative errors across different dimensions; see Table II. On all datasets, SFPCA outperforms the rest. Newsgroups experiments are limited to SFPCA and PGA due to the significant computational complexity of SPCA and RFPCA.
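The relative-error computation above can be sketched for one of the metrics; here we assume the normalized error is the Frobenius norm of the distance-matrix difference divided by the Frobenius norm of the original distance matrix (consistent with the example in the text), using the total variation metric:

```python
import numpy as np

def total_variation_matrix(P):
    """Pairwise total-variation distances between rows of a composition matrix."""
    P = np.asarray(P, dtype=float)
    return 0.5 * np.abs(P[:, None, :] - P[None, :, :]).sum(axis=-1)

def relative_errors(orig, est_by_method):
    """Normalized error ||D - D_hat||_F / ||D||_F per method, divided by the
    average across methods; values above 1 mean worse than average."""
    D = total_variation_matrix(orig)
    errs = {m: np.linalg.norm(D - total_variation_matrix(E), "fro")
               / np.linalg.norm(D, "fro")
            for m, E in est_by_method.items()}
    avg = np.mean(list(errs.values()))
    return {m: e / avg for m, e in errs.items()}
```

Dividing by the cross-method average makes the scores comparable across datasets and embedding dimensions, as reported in Table II.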
TABLE II.
The Mean and Standard Deviation of Normalized Distance Errors Are Divided by Their Average Across Methods. Classification Accuracies Are Percentage Deviations From 100% — Representing the Average Accuracy Across Methods. Boldface and Red Indicate SFPCA and the Top-Performing Method. Lower Distortions (↓) and Higher Accuracies (↑) Are Better
| Metric (Method) | Throat Microbiome | Intestinal Microbiome | Newsgroups | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| SFPCA | RFPCA | SPCA | PGA | SFPCA | RFPCA | SPCA | PGA | SFPCA | PGA | |
| 0.88 ± 0.99 | 1.13 ± 1.28 | 0.9 ± 0.99 | 1.1 ± 1.18 | 0.75 ± 1.64 | 0.93 ± 2.09 | 1.47 ± 2.07 | 0.85 ± 1.79 | 0.77 ± 1.03 | 1.23 ± 1.4 | |
| 0.98 ± 0.48 | 1.03 ± 0.57 | 0.99 ± 0.46 | 1.01 ± 0.52 | 0.8 ± 1.06 | 0.81 ± 1.03 | 1.55 ± 1.04 | 0.84 ± 1.14 | 0.999 ± 0.4 | 1.001 ± 0.5 | |
| 0.91 ± 0.84 | 1.08 ± 1.04 | 0.95 ± 0.84 | 1.06 ± 0.97 | 0.75 ± 1.55 | 0.93 ± 1.96 | 1.47 ± 1.93 | 0.85 ± 1.7 | 0.9 ± 0.81 | 1.1 ± 0.98 | |
| 0.91 ± 1.0 | 1.08 ± 1.21 | 0.94 ± 1.0 | 1.07 ± 1.14 | 0.76 ± 1.65 | 0.92 ± 2.09 | 1.49 ± 2.07 | 0.83 ± 1.75 | 0.87 ± 0.88 | 1.13 ± 1.12 | |
| 0.59 ± 9.5 | 1.2 ± 9.6 | −1.9 ± 9.3 | 0.11 ± 9.8 | 0.4 ± 2.3 | 0.11 ± 2.7 | −0.74 ± 2.8 | 0.2 ± 2.3 | 0.3 ± 3.5 | −0.3 ± 4.4 | |
Classification Performance.
For each , using the denoised compositional data, we train two classifiers: a five-layer neural network () and a random forest model (). We normalize the classification accuracies by the average accuracy of all methods and report their mean and standard deviation. From Table II, SFPCA outperforms competing methods on Intestinal Microbiome and Newsgroups, though the accuracy differences are mostly less than one percent. In the Appendix, we further compare the performance of the two classifiers on Newsgroups as it relates to PCA analysis.
E. Real Data: Hyperbolic Spaces
We use a biological dataset of 103 plant and algal transcriptomes [53]. The authors inferred phylogenetic trees from genome-wide genes. Tree leaves are present-day species, internal nodes are ancestral species, and branch weights represent evolutionary distances. The dataset includes an “unfiltered” version with 852 trees and a “filtered” version with 844 trees after removing error-prone genes and filtering problematic sequences. Errors appear as outliers, with more expected in the unfiltered dataset a priori. Other studies [54], [55] have used this dataset to evaluate outlier detection methods. We preprocess each tree by rescaling branch lengths to a diameter of 10, compute the distance matrix between leaves, and embed it into a -dimensional () hyperbolic space using a semidefinite program [18]. We use two metrics to evaluate PCA results, and then apply them for outlier detection.
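The rescaling step in this preprocessing pipeline (branch lengths scaled so the tree's diameter, i.e., the largest leaf-to-leaf distance, equals 10) can be applied directly to the patristic distance matrix; the subsequent hyperbolic embedding via the semidefinite program of [18] is a separate step not shown here:

```python
import numpy as np

def rescale_to_diameter(D, diameter=10.0):
    """Rescale a patristic (leaf-to-leaf) distance matrix so its largest
    entry equals the target diameter, mirroring the preprocessing step."""
    D = np.asarray(D, dtype=float)
    return D * (diameter / D.max())
```

Rescaling all trees to a common diameter removes scale differences between gene trees before embedding.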
Distortion Analysis.
For a fixed dimension , we estimate hyperbolic affine subspaces, compute the projected hyperbolic points, and their hyperbolic distance matrix . The normalized distance error is calculated, where is the original distance matrix. These errors are averaged over for each algorithm and then divided by the average normalized errors across all algorithms — providing relative errors. If the relative error is greater than 1, the algorithm performs worse than average. For each algorithm, we report the mean and standard deviation of these relative errors across all gene trees. Distortion is not a perfect measure of PCA accuracy as highly noise-contaminated data should experience high distortions during the projection (denoising) step. In all experiments, PGA outperforms others in terms of distortion (Table III). SFPCA provides an average distance-preserving performance, contrary to synthetic experiments. We conjecture this may be due to the trees being relatively small (see scaled-down hyperbolic experiments in Fig. 5), or to high noise levels making distortion an inappropriate accuracy metric, or the discordance between our choice of distortion function (which overemphasizes large distances) and the distance distortion metric.
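For concreteness, the hyperbolic distance matrix of the projected points can be computed in the hyperboloid ('Loid) model; the sign convention for the Lorentzian product below is one common choice, and the normalized-error formula is an assumption consistent with the spherical experiments:

```python
import numpy as np

def lorentz_distance_matrix(X):
    """Pairwise distances in the hyperboloid model of hyperbolic space.
    Rows of X are points x with Lorentzian product [x, y] = -x_0 y_0 + <x_r, y_r>
    and [x, x] = -1; then d(x, y) = arccosh(-[x, y])."""
    X = np.asarray(X, dtype=float)
    J = np.eye(X.shape[1])
    J[0, 0] = -1.0                          # Lorentzian metric signature
    G = X @ J @ X.T                         # Gram matrix of Lorentzian products
    return np.arccosh(np.maximum(-G, 1.0))  # clamp for numerical safety

def normalized_distance_error(D, D_hat):
    """||D - D_hat||_F / ||D||_F, the distortion measure (assumed norm)."""
    return np.linalg.norm(D - D_hat, "fro") / np.linalg.norm(D, "fro")
```

The clamp guards against round-off producing Lorentzian products slightly above -1, which would make `arccosh` undefined.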
TABLE III.
Normalized Distance Errors () Mean and Standard Deviation Are Divided by Their Average Across Methods. Quartet Scores () Are Percentage Deviations From 100% (the Average). Lower Distortions (↓) and Higher Scores (↑) Are Better
| Method | Filtered | Unfiltered | ||
|---|---|---|---|---|
| SFPCA | 1.50 ± 1.91 | 0.98 ± 0.23 | 1.10 ± 1.74 | 1.01 ± 0.25 |
| PGA | 1.48 ± 1.47 | 0.55 ± 0.09 | 1.80 ± 1.45 | 0.53 ± 0.08 |
| BSA | 1.48 ± 1.61 | 1.45 ± 0.24 | 1.67 ± 1.91 | 1.45 ± 0.31 |
| HoroPCA | −4.48 ± 2.79 | 1.01 ± 0.21 | −4.57 ± 2.58 | 1.00 ± 0.20 |
Quartet Scores.
To use a biologically motivated accuracy measure, we use the quartet score [56]. For a target dimension , each algorithm is applied to an embedded hyperbolic point set to compute the projected (denoised) points. For each set of four points, we find the optimal tree topology with minimum distance distortion using the four-point condition [57]. For 10^5 randomly chosen (but fixed) sets of four projected points, we estimate their topology and compare it with the true topology from the gene trees. For each dimension , we compute the percentage of correctly estimated topologies, then average this over all dimensions. We normalize and report the mean and standard deviation of quartet scores by the average score of all methods, as detailed in Table III. In these experiments, PGA and SFPCA exhibit the best performance compared to the alternatives. This is particularly informative as the quartet score measures tree topology accuracy, not distance.
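The four-point condition underlying the quartet score can be sketched as follows: for leaves i, j, k, l of an additive tree, among the three pairwise-distance sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k), the smallest identifies the quartet split, and the other two are equal. A minimal sketch (function names are ours):

```python
import numpy as np

def quartet_topology(D, quartet):
    """Infer the quartet topology ij|kl from a distance matrix via the
    four-point condition: the pairing with the smallest sum is the split."""
    i, j, k, l = quartet
    sums = {
        ((i, j), (k, l)): D[i, j] + D[k, l],
        ((i, k), (j, l)): D[i, k] + D[j, l],
        ((i, l), (j, k)): D[i, l] + D[j, k],
    }
    return min(sums, key=sums.get)

def quartet_score(D_true, D_est, quartets):
    """Fraction of sampled quartets whose estimated topology matches the truth."""
    hits = sum(quartet_topology(D_true, q) == quartet_topology(D_est, q)
               for q in quartets)
    return hits / len(quartets)
```

Sampling a fixed random subset of quartets, as in the experiments, keeps the cost manageable compared to enumerating all four-leaf subsets.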
1). Outlier Detection With Hyperbolic Spectrum:
We showcase the practical utility of the -eigenequation (Definition 10) in species tree estimation. As proved in Theorem 4, the principal axes align with the leading -eigenvectors of . Thus, the optimal SFPCA cost corresponds to the sum of its neglected -eigenvalues. We conjecture that a tree with outliers has more outlier -eigenvalues; see Fig. 7(a)–7(b). If a tree has an outlier set of species (likely from incorrect sequences), its second leading -eigenvalue ()1 is significantly larger than the rest. We quantify this by plotting its normalized retained energy (or cumulative spectrum) versus the normalized embedding dimension (or number of -eigenvalues) and finding its knee point. This lets us sort gene trees by their hyperbolic spectrum.
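The knee of the retained-energy curve can be located in several ways; a common heuristic, assumed here for illustration, is the point of the cumulative-spectrum curve farthest above the diagonal of the unit square:

```python
import numpy as np

def knee_point(eigvals):
    """Knee of the normalized retained-energy curve: plot cumulative energy
    against the normalized number of eigenvalues and take the point farthest
    above the diagonal (one common knee heuristic among several)."""
    e = np.sort(np.abs(np.asarray(eigvals, dtype=float)))[::-1]
    energy = np.cumsum(e) / e.sum()         # normalized retained energy
    x = np.arange(1, len(e) + 1) / len(e)   # normalized eigenvalue count
    k = int(np.argmax(energy - x))          # farthest above the diagonal
    return x[k], energy[k]
```

A spectrum dominated by a few outlier eigenvalues yields a knee near (0, 1), matching the behavior of outlier-contaminated trees in Fig. 7.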
Fig. 7.

(a) and (b): Normalized retained energy versus normalized number of eigenvalues for two gene trees. The knee point (intersection with line) for trees with outliers approaches (0, 1). (c) The quartet score for species trees constructed using the top trees (knee values) versus random orders.
After sorting, we use the top trees with the smallest knee values (least prone to outliers) to construct a species topology using ASTRAL [58]. ASTRAL outputs the quartet score between the estimated species tree and the input gene trees, where a higher score indicates more congruence among input trees. Thus, a higher score after filtering means outlier gene trees, likely inferred from problematic sequences, have been removed. Our results (Fig. 7(c)) show that hyperbolic spectrum-based sorting — offered only by our SFPCA — effectively identifies the worst trees most dissimilar to others, without explicitly comparing tree topologies. In contrast, random sorting keeps the quartet score fixed. Filtered trees have a higher score than unfiltered trees and benefit less from further filtering. It is remarkable that using eigenvalues alone, we can effectively find genes with discordant evolutionary histories.
Acknowledgment
The authors would like to thank the National Institutes of Health (NIH) for their financial support.
This research was supported in part by the NIH under Grant 1R35GM142725. The associate editor coordinating the review of this article and approving it for publication was Dr. Marco Felipe Duarte.
Appendix
2) Claim 1: Let and be the geodesic where and . This geodesic lies at the intersection of and . Since and is a subspace, we have . Therefore, we have for all .
3) Claim 2: For a spherical affine subspace , we define the sliced unitary matrix where is the base point and are orthogonal basis for . Then, we have for all . Conversely, for any sliced unitary matrix , we let and for . Since 's and are orthogonal, we can define the spherical affine subspace.
4) Claim 3: We simplify its PCA cost as where (a) follows from Proposition 1, form an orthogonal basis for , and (b) follows from the cyclic property of the trace.
5) Claim 4: From Corollary 2 and Definition 7, an optimal zero-dimensional affine subspace (a point) is a subset of any other spherical affine subspace. In general, if and only if .
6) Claim 5: Let be such that where , then is an eigenvector of with eigenvalue of .
7) Claim 6: Let and be the geodesic where . This geodesic belongs to . We have since and is a subspace. Thus, we have for all .
8) Claim 7: For a hyperbolic affine subspace , we define where is the base point and form an orthogonal basis for . We have and for all . Conversely, for any sliced unitary matrix , we let and for . Since the 's and are orthogonal, we can define the hyperbolic affine subspace.
9) Claim 8: Let , and be the aforementioned geodesics. A point belongs to a geodesic whose end points are and for and . Let us show for a subspace . From Claim 6, is a geodesic submanifold. It suffices to show that belong to . Let , for all , and . This proves . Conversely, let — the hyperbolic affine subspace constructed as before. Since is a geodesic submanifold, belongs to a geodesic whose end points are and for and , constructed as before. From , we have .
10) Claim 9: From Theorem 4 and Definition 12, an optimal zero-dimensional affine subspace is a subset of any other affine subspace. For optimal affine subspaces , we have if and only if .
11) Proposition 1: Consider the following Lagrangian:
| (14) |
where and . The solution to equation (14) takes the form , for scalars and . The subspace conditions, , , give where . Enforcing the norm condition, we arrive at where . Then, we have . If , i.e., , then is nonunique, but the projection distance is well-defined as , .
12) Proposition 2: Let where . Distinct columns of are orthogonal — i.e., . Hence, are linearly independent. Therefore, we have where are scalars, and (a) is due to Proposition 1. We hence have and . We can accordingly compute , and prove the proposition.
13) Proposition 3: Let be a real matrix such that , that is, . Let be an eigenvector-eigenvalue pair of . Then, we have and . We have since
14) Proposition 4: Let be a full-rank matrix and such that and . Then, we have . This contradicts the assumption that is full-rank.
15) Proposition 5: Consider the following Lagrangian:
| (15) |
where , admits the solution , for scalars and . The subspace conditions, give where . Enforcing the norm condition, we get , . We have .
16) Proposition 6: Let where . Columns of are -orthogonal, that is, . Since , the are linearly independent; we have where are scalars, and (a) is due to Proposition 5. So we have , and , i.e., . Then, we can compute and prove the proposition.
17) Proposition 7: Let where is -unitary. Since is a diagonal matrix, we let . We have , i.e., is -diagonalizable.
Let be symmetric and for a -invertible and with distinct (in absolute value) elements. Then, we have
| (16) |
where is the -th column of . In the eigenequation (16), the negative (positive) signs are designated for the eigenvectors with negative (positive) norms. For distinct , we have
where (a) is due to the eigenequation (16). Since , we must have . Without loss of generality, the -eigenvectors are scaled such that for . Lemma 1 shows that and for .
Lemma 1: .
Proof: Let where and -diagonalizes , viz., , where (a) follows from , (b) from for , and is a diagonal matrix. This shows that diagonalizes , i.e., is a diagonal matrix. However, is a symmetric matrix with only one negative eigenvalue [18]. Therefore, without loss of generality, the first diagonal element of is negative. □
From Lemma 1, we have and .
18) Theorem 1: Consider with orthogonal tangents . For , we have , and , i.e., and is a map from to ; see proof of Proposition 2. We also have for all . Hence, is the inverse map of , a bijection. Finally, is an isometry between and since for all .
19) Theorem 2: We have where is the -th largest eigenvalue of . We achieve the lower bound if we let for . The optimal base point is any vector in with norm , i.e., allows for nested affine subspaces.
20) Theorem 3: Consider with orthogonal tangents . For , we have , and , i.e., and is a map from to ; see proof of Proposition 6. We also have for all . Hence, is the inverse map of , a bijection. Finally, is an isometry between and since for all .
21) Theorem 4: Without loss of generality, we scale and such that , i.e., . The cost is:
where is -diagonalizable, , (a) follows from being a diagonal matrix, i.e., , and .
Lemma 2: .
Proof: , where (a) follows from . This is the case, since by definition, we have , or , that is, . Hence, we have . Finally, is the direct result of the orthogonality of basis vectors .□
We write the cost function as follows:
where , i.e., . Let be as follows:
| (17) |
for vectors with norms greater than or equal to 1; notice and Lemma 2. From and equation (17), we have:
For , is the squared norm of the -th row of . Let where its -th row is equal to the ()-th row of . Therefore, we have
| (18) |
Let us now simplify the cost function as follows:
| (19) |
Fig. 8.

Classification accuracy (%) of PCA to dimension on reconstructed compositional data with (a) random forest and (b) neural net.
Lemma 3: For all , we have .
Proof: Let be the -th -eigenvector of . We have , for all . □
From Lemma 3, equation (18), and for all , the minimum of the cost function in equation (19) occurs only if , i.e., . Therefore, we have . The first row of corresponds to the Lorentzian product of and the first column of , i.e., . Hence, we have . Since is the only negative -eigenvector of , we have . From the unit norm constraints for , we have , i.e., is a sliced unitary matrix and has zero-one eigenvalues. The cost function achieves its minimum if and only if the nonzero singular values of are aligned with the smallest diagonal values of , by von Neumann's trace inequality [59]. Let . If for , then we achieve the minimum of the cost function, that is, are negative -eigenvectors paired to the smallest -eigenvalues of .
A. Additional Experiments
We present experiments on the Newsgroups dataset to demonstrate the impact of spherical PCA on classification accuracy. Using random forest and five-layer neural network classifiers with a 90% training and 10% test split, we average accuracy over 20 random splits. As shown in Fig. 8, SFPCA outperforms PGA in average accuracy over (target dimension) experiments. Both methods significantly improve random forest performance and modestly improve the neural network's; this may be due to the neural network's own denoising ability. For large , random forest accuracy declines, unlike the neural network's. This may be due to the Newsgroups dataset's high sparsity: discarding small eigenvalues of the second-moment matrix significantly alters the data's sparsity, adversely affecting the random forest's accuracy.
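The repeated-split evaluation protocol above can be sketched with a simple numpy harness; the `classify` callable abstracts the classifier (the experiments use a random forest and a five-layer neural network), and the nearest-centroid stand-in below is only an illustrative placeholder:

```python
import numpy as np

def mean_split_accuracy(X, y, classify, n_splits=20, test_frac=0.1, seed=0):
    """Average test accuracy over repeated random 90/10 splits, mirroring the
    evaluation protocol; the classifier choice is abstracted away."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    accs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        y_hat = classify(X[train], y[train], X[test])
        accs.append(np.mean(y_hat == y[test]))
    return float(np.mean(accs))

def nearest_centroid(X_train, y_train, X_test):
    """Minimal stand-in classifier: assign each test point to the class
    whose training centroid is closest."""
    classes = np.unique(y_train)
    cents = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - cents[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]
```

Averaging over 20 random splits reduces the variance introduced by any single train/test partition.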
Footnotes
The -eigenvalue corresponds to the base point.
Contributor Information
Puoya Tabaghi, Halicioğlu Data Science Institute, University of California San Diego, San Diego, CA 92093 USA.
Michael Khanzadeh, Computer Science and Engineering Department, University of California San Diego, San Diego, CA 92093 USA. He is now with the Department of Computer Science, Columbia University, New York, NY 10027 USA.
Yusu Wang, Halicioğlu Data Science Institute, University of California San Diego, San Diego, CA 92093 USA.
Siavash Mirarab, Electrical and Computer Engineering Department, University of California San Diego, San Diego, CA 92093 USA.
REFERENCES
- [1] Thurstone LL, “Multiple factor analysis,” Psychol. Rev., vol. 38, no. 5, 1931, Art. no. 406.
- [2] Wold S, Esbensen K, and Geladi P, “Principal component analysis,” Chemometrics Intell. Lab. Syst., vol. 2, nos. 1–3, pp. 37–52, 1987.
- [3] Stewart GW, “On the early history of the singular value decomposition,” SIAM Rev., vol. 35, no. 4, pp. 551–566, 1993.
- [4] Hotelling H, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol., vol. 24, no. 6, 1933, Art. no. 417.
- [5] Jolliffe IT, Principal Component Analysis. New York, NY, USA: Springer, 2002.
- [6] Tipping ME and Bishop CM, “Probabilistic principal component analysis,” J. Roy. Statist. Soc.: Ser. B (Statist. Methodol.), vol. 61, no. 3, pp. 611–622, 1999.
- [7] Vidal R, Ma Y, and Sastry S, “Generalized principal component analysis (GPCA),” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 12, pp. 1945–1959, Dec. 2005.
- [8] Lawrence N, “Gaussian process latent variable models for visualisation of high dimensional data,” in Proc. Adv. Neural Inf. Process. Syst., vol. 16, 2003, pp. 329–336.
- [9] Roweis S, “EM algorithms for PCA and SPCA,” in Proc. Adv. Neural Inf. Process. Syst., vol. 10, 1997, pp. 626–632.
- [10] Bishop C, “Bayesian PCA,” in Proc. Adv. Neural Inf. Process. Syst., vol. 11, 1998, pp. 382–388.
- [11] Jolliffe IT, Trendafilov NT, and Uddin M, “A modified principal component technique based on the LASSO,” J. Comput. Graphical Statist., vol. 12, no. 3, pp. 531–547, 2003.
- [12] Zou H, Hastie T, and Tibshirani R, “Sparse principal component analysis,” J. Comput. Graphical Statist., vol. 15, no. 2, pp. 265–286, 2006.
- [13] Cai TT, Ma Z, and Wu Y, “Sparse PCA: Optimal rates and adaptive estimation,” Ann. Statist., vol. 41, no. 6, pp. 3074–3110, 2013.
- [14] Guan Y and Dy J, “Sparse probabilistic principal component analysis,” in Proc. Artif. Intell. Statist., PMLR, 2009, pp. 185–192.
- [15] Xu H, Caramanis C, and Sanghavi S, “Robust PCA via outlier pursuit,” in Proc. Adv. Neural Inf. Process. Syst., vol. 23, 2010, pp. 2496–2504.
- [16] Fletcher PT, Lu C, Pizer SM, and Joshi S, “Principal geodesic analysis for the study of nonlinear statistics of shape,” IEEE Trans. Med. Imag., vol. 23, no. 8, pp. 995–1005, Aug. 2004.
- [17] Jiang Y, Tabaghi P, and Mirarab S, “Learning hyperbolic embedding for phylogenetic tree placement and updates,” Biology, vol. 11, no. 9, 2022, Art. no. 1256.
- [18] Tabaghi P and Dokmanić I, “Hyperbolic distance matrices,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery & Data Mining, 2020, pp. 1728–1738.
- [19] Fletcher PT and Joshi S, “Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors,” in Proc. Comput. Vis. Math. Methods Med. Biomed. Image Anal., Heidelberg, Germany: Springer, 2004, pp. 87–98.
- [20] Lee JM, Riemannian Manifolds: An Introduction to Curvature, vol. 176. New York, NY, USA: Springer Science & Business Media, 2006.
- [21] Sonthalia R and Gilbert A, “Tree! I am no tree! I am a low dimensional hyperbolic embedding,” in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 845–856.
- [22] Tabaghi P, Peng J, Milenkovic O, and Dokmanić I, “Geometry of similarity comparisons,” 2020, arXiv:2006.09858.
- [23] Chien E, Pan C, Tabaghi P, and Milenkovic O, “Highly scalable and provably accurate classification in Poincaré balls,” in Proc. IEEE Int. Conf. Data Mining, Piscataway, NJ, USA: IEEE Press, 2021, pp. 61–70.
- [24] Chien E, Tabaghi P, and Milenkovic O, “HyperAid: Denoising in hyperbolic spaces for tree-fitting and hierarchical clustering,” in Proc. 28th ACM SIGKDD Conf. Knowl. Discovery Data Mining, 2022, pp. 201–211.
- [25] Klimovskaia A, Lopez-Paz D, Bottou L, and Nickel M, “Poincaré maps for analyzing complex hierarchies in single-cell data,” Nature Commun., vol. 11, no. 1, pp. 1–9, 2020.
- [26] Zhou Y and Sharpee TO, “Hyperbolic geometry of gene expression,” iScience, vol. 24, no. 3, 2021, Art. no. 102225.
- [27] Meng Y et al., “Spherical text embedding,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 8208–8217.
- [28] Dai X and Müller H-G, “Principal component analysis for functional data on Riemannian manifolds and spheres,” Ann. Statist., vol. 46, no. 6B, pp. 3334–3361, 2018.
- [29] Gu A, Sala F, Gunel B, and Ré C, “Learning mixed-curvature representations in product spaces,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 2898–2918.
- [30] Rahman IU, Drori I, Stodden VC, Donoho DL, and Schröder P, “Multiscale representations for manifold-valued data,” Multiscale Model. Simul., vol. 4, no. 4, pp. 1201–1232, 2005.
- [31] Tournier M, Wu X, Courty N, Arnaud E, and Reveret L, “Motion compression using principal geodesics analysis,” in Computer Graphics Forum, vol. 28, no. 2. Oxford, U.K.: Wiley, 2009, pp. 355–364.
- [32] Anirudh R, Turaga P, Su J, and Srivastava A, “Elastic functional coding of human actions: From vector-fields to latent variables,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3147–3155.
- [33] Fletcher PT, Lu C, and Joshi S, “Statistics of shape via principal geodesic analysis on Lie groups,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Piscataway, NJ, USA: IEEE Press, 2003, pp. I–I.
- [34] Sommer S, Lauze F, and Nielsen M, “Optimization over geodesics for exact principal geodesic analysis,” Adv. Comput. Math., vol. 40, no. 2, pp. 283–313, 2014.
- [35] Huckemann S, Hotz T, and Munk A, “Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions,” Statistica Sinica, vol. 20, no. 1, pp. 1–58, 2010.
- [36] Lazar D and Lin L, “Scale and curvature effects in principal geodesic analysis,” J. Multivariate Anal., vol. 153, pp. 64–82, Jan. 2017.
- [37] Pennec X, “Barycentric subspace analysis on manifolds,” Ann. Statist., vol. 46, no. 6A, pp. 2711–2746, 2018.
- [38] Sommer S, Lauze F, Hauberg S, and Nielsen M, “Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations,” in Proc. 11th Eur. Conf. Comput. Vis. (ECCV), Heraklion, Crete, Greece: Springer, 2010, pp. 43–56.
- [39] Tabaghi P, Chien E, Pan C, Peng J, and Milenković O, “Linear classifiers in product space forms,” 2021, arXiv:2102.10204.
- [40] Liu K, Li Q, Wang H, and Tang G, “Spherical principal component analysis,” in Proc. SIAM Int. Conf. Data Mining, Philadelphia, PA, USA: SIAM, 2019, pp. 387–395.
- [41] Chami I, Gu A, Nguyen DP, and Ré C, “HoroPCA: Hyperbolic dimensionality reduction via horospherical projections,” in Proc. Int. Conf. Mach. Learn., PMLR, 2021, pp. 1419–1429.
- [42] Chakraborty R, Seo D, and Vemuri BC, “An efficient exact-PGA algorithm for constant curvature manifolds,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 3976–3984.
- [43] Gallier J and Quaintance J, Differential Geometry and Lie Groups: A Computational Perspective. Springer Nature, 2020.
- [44] Pearson K, “LIII. On lines and planes of closest fit to systems of points in space,” London, Edinburgh, Dublin Philos. Mag. J. Sci., vol. 2, no. 11, pp. 559–572, 1901.
- [45] Lahav A and Talmon R, “Procrustes analysis on the manifold of SPSD matrices for data sets alignment,” IEEE Trans. Signal Process., vol. 71, pp. 1907–1921, 2023.
- [46] Lou A, Katsman I, Jiang Q, Belongie S, Lim S-N, and De Sa C, “Differentiating through the Fréchet mean,” in Proc. Int. Conf. Mach. Learn., PMLR, 2020, pp. 6393–6403.
- [47] Huckemann S and Ziezold H, “Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces,” Adv. Appl. Probability, vol. 38, no. 2, pp. 299–319, 2006.
- [48] Jung S, Dryden IL, and Marron JS, “Analysis of principal nested spheres,” Biometrika, vol. 99, no. 3, pp. 551–568, 2012.
- [49] Higham NJ, “J-orthogonal matrices: Properties and generation,” SIAM Rev., vol. 45, no. 3, pp. 504–519, 2003.
- [50] Slapničar I and Veselić K, “A bound for the condition of a hyperbolic eigenvector matrix,” Linear Algebra Appl., vol. 290, nos. 1–3, pp. 247–255, 1999.
- [51] Lahti L, Salojärvi J, Salonen A, Scheffer M, and De Vos WM, “Tipping elements in the human intestinal ecosystem,” Nature Commun., vol. 5, no. 1, 2014, Art. no. 4344.
- [52] Charlson ES et al., “Disordered microbial communities in the upper respiratory tract of cigarette smokers,” PLoS One, vol. 5, no. 12, 2010, Art. no. e15216.
- [53] Wickett NJ et al., “Phylotranscriptomic analysis of the origin and early diversification of land plants,” Proc. Nat. Acad. Sci., vol. 111, no. 45, pp. 4859–4868, Oct. 2014.
- [54] Mai U and Mirarab S, “TreeShrink: Fast and accurate detection of outlier long branches in collections of phylogenetic trees,” BMC Genomics, vol. 19, no. S5, May 2018, Art. no. 272.
- [55] Comte A et al., “PhylteR: Efficient identification of outlier sequences in phylogenomic datasets,” Mol. Biol. Evol., vol. 40, no. 11, Nov. 2023, Art. no. msad234.
- [56] Estabrook GF, McMorris FR, and Meacham CA, “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units,” Systematic Biol., vol. 34, no. 2, pp. 193–200, Jun. 1985.
- [57] Warnow T, Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation. Cambridge, U.K.: Cambridge Univ. Press, 2017.
- [58] Zhang C, Rabiee M, Sayyari E, and Mirarab S, “ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees,” BMC Bioinf., vol. 19, no. S6, May 2018, Art. no. 153.
- [59] Mirsky L, “A trace inequality of John von Neumann,” Monatshefte für Mathematik, vol. 79, no. 4, pp. 303–306, 1975.
