Oxford University Press
. 2022 Apr 9;12(1):210–311. doi: 10.1093/imaiai/iaac005

Linear convergence of the subspace constrained mean shift algorithm: from Euclidean to directional data

Yikun Zhang 1,, Yen-Chi Chen 2
PMCID: PMC9893762  PMID: 36761435

Abstract

This paper studies the linear convergence of the subspace constrained mean shift (SCMS) algorithm, a well-known algorithm for identifying a density ridge defined by a kernel density estimator. By arguing that the SCMS algorithm is a special variant of a subspace constrained gradient ascent (SCGA) algorithm with an adaptive step size, we derive the linear convergence of such an SCGA algorithm. While the existing research focuses mainly on density ridges in the Euclidean space, we generalize density ridges and the SCMS algorithm to directional data. In particular, we establish the stability theorem of density ridges with directional data and prove the linear convergence of our proposed directional SCMS algorithm.

Keywords: ridges, subspace constrained mean shift, directional data, optimization on a manifold

MSC: 62G05, 49Q12, 62H11

1. Introduction

Identifying meaningful lower dimensional structures from a point cloud has long been a popular research topic in Statistics and Machine Learning [60, 111]. One reliable characterization of such a low-dimensional structure is the density ridge, which can be feasibly estimated by a kernel density estimator (KDE) from point cloud data [39, 45]. Loosely speaking, an estimated density ridge signifies a high-density curve or surface in a point cloud; see the left panel of Fig. 1. Let Inline graphic be the underlying probability density function that generates the data in the Euclidean space Inline graphic. Its order-Inline graphic density ridge Inline graphic with Inline graphic is the set of points defined as

graphic file with name DmEquation1.gif (1.1)

where Inline graphic are the eigenvalues of Hessian Inline graphic and Inline graphic has its columns as the last Inline graphic orthonormal eigenvectors. The notion of density ridges has appeared in various scientific fields, such as medical imaging [114], seismology [95] and astronomy [26, 101]. To locate an estimated density ridge defined by (Euclidean) KDE, [83] proposed a practical method called subspace constrained mean shift (SCMS) algorithm.
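To make definition (1.1) concrete, ridge membership can be checked numerically at a given point: estimate the gradient and Hessian, take the eigenvectors associated with the smallest D − d eigenvalues, and test whether the projected gradient vanishes while the (d + 1)-th eigenvalue is negative. A minimal sketch using a hypothetical anisotropic Gaussian-shaped density on the plane with finite-difference derivatives (the density and all function names here are illustrative, not from the paper):

```python
import numpy as np

def p(x):
    """A toy (unnormalized) density on R^2; its order-1 ridge lies along the x1-axis."""
    return np.exp(-0.5 * (x[0] ** 2 / 4.0 + x[1] ** 2))

def grad(x, eps=1e-5):
    """Central-difference gradient of p."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        g[i] = (p(x + e) - p(x - e)) / (2 * eps)
    return g

def hess(x, eps=1e-4):
    """Central-difference Hessian of p."""
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei = np.zeros(2); ei[i] = eps
            ej = np.zeros(2); ej[j] = eps
            H[i, j] = (p(x + ei + ej) - p(x + ei - ej)
                       - p(x - ei + ej) + p(x - ei - ej)) / (4 * eps ** 2)
    return (H + H.T) / 2

def ridge_conditions(x, d=1):
    """Return (norm of the projected gradient, (d+1)-th largest eigenvalue),
    the two quantities appearing in the ridge definition (1.1)."""
    eigval, eigvec = np.linalg.eigh(hess(x))   # eigenvalues in ascending order
    D = x.size
    V = eigvec[:, :D - d]                      # eigenvectors of the smallest D - d eigenvalues
    proj_grad = V @ (V.T @ grad(x))
    return np.linalg.norm(proj_grad), eigval[D - d - 1]
```

At a point on the x1-axis, the projected gradient is (numerically) zero and the eigenvalue condition holds, whereas off-axis points violate the projected-gradient condition.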

Fig. 1.

Density ridges estimated by Euclidean and directional SCMS algorithms on two synthetic datasets (drawn as black points) with hidden circular manifold structures (indicated by blue curves) on Inline graphic and the unit sphere Inline graphic, respectively. Left: The orange points indicate the estimated ridge obtained by the Euclidean SCMS algorithm from the dataset on Inline graphic. Right: The red points represent the estimated directional ridge identified by our directional SCMS algorithm, while the orange points indicate the estimated ridge obtained by the Euclidean SCMS algorithm from the dataset on Inline graphic. This panel is presented under the Hammer projection; see Appendix B for more details.

While the statistical estimation and asymptotic theories of density ridges in Inline graphic have been well-studied [22, 24, 45, 88, 89], the literature falls short of addressing the algorithmic properties of the ridge-finding method, i.e. the SCMS algorithm. To the best of our knowledge, [46, 47] were the only available works to investigate the SCMS algorithm and its modified version from an algorithmic perspective. However, they only proved a non-decreasing property of density estimates and the validity of two stopping criteria for the SCMS algorithm. The algorithmic convergence of the SCMS algorithm remains an open question. There are two challenges to answering this question. First, because every iteration of the SCMS algorithm involves a projection matrix defined by the (estimated) Hessian, it is no longer a conventional first-order method in optimization. Second, estimating a density ridge in practice is a nonconvex/nonconcave optimization problem. Thus, the first objective of this paper is to provide a theoretical study on the algorithmic convergence and its associated (linear) rate of convergence for the SCMS algorithm.

In stark contrast to abundant research papers about density ridges in the Euclidean space, little work has been done to examine the statistical properties of density ridges on the unit hypersphere Inline graphic or any practical algorithm for estimating them. Nevertheless, data on Inline graphic are ubiquitous in many scientific fields of study, such as seismology (e.g. longitudes and latitudes of the epicenters of earthquakes) and astronomy (e.g. right ascensions and declinations of astronomical objects). Such data are generally known as directional data in the statistical literature [70, 74]. Hence, the second objective of this paper is to generalize density ridges and the SCMS algorithm to directional data.

More importantly, identifying an estimated density ridge from directional data on Inline graphic by the Euclidean SCMS algorithm suffers from high bias near the two poles of Inline graphic. Consider a synthetic dataset with independently and identically distributed (i.i.d.) observations Inline graphic from a great circle connecting the North and South Poles of Inline graphic with additive noise. We apply both the Euclidean and directional SCMS algorithms to this simulated dataset. While the estimated ridges by the Euclidean SCMS algorithm fail to recover the desired great circle in high-latitude regions, the ridges identified by our proposed directional SCMS algorithm align well with the underlying circular structure; see the right panel in Fig. 1 for a preview and Appendix B for a more detailed discussion.

Main Results. The main contributions of this paper are summarized as follows:

  • We present the convergence analysis of the SCMS and the general SCGA algorithms and prove their linear convergence properties with Euclidean data (Theorem 3.1, Corollary 3.2, and related discussion in Section 3.3):
    graphic file with name DmEquation2.gif
    where Inline graphic is a sequence of points generated by the SCGA or SCMS algorithm in Inline graphic, Inline graphic is the limit point of the sequence, and Inline graphic is a constant.
  • We generalize density ridges and the SCMS algorithm to directional data on Inline graphic (Section 4).

  • We prove the statistical convergence rate of a ridge estimator on the sphere Inline graphic defined by the directional KDE (Theorem 4.1):
    graphic file with name DmEquation3.gif
    where Inline graphic and Inline graphic are the population and estimated directional density ridges, respectively, Inline graphic is the Hausdorff distance, and Inline graphic is the dimension of Inline graphic.
  • We establish the convergence of the SCMS and the general SCGA algorithms with directional data and derive their linear convergence results (Theorem 4.2, Corollary 4.2 and related expositions in Section 4.3):
    graphic file with name DmEquation4.gif
    where Inline graphic is the sequence of points generated by the directional SCGA or SCMS algorithm, Inline graphic is the convergence point, Inline graphic is a constant and Inline graphic is the geodesic distance on Inline graphic.

Other Related Literature. The problem of density ridge estimation has its unique standing in both the computer science and statistics literature; see [33, 39, 52, 53] and references therein. Among various definitions of density ridges [79, 84], our definition follows from [22, 39, 45], because its statistical estimation theory has been well established and it can be directly generalized to directional densities. Practically, the SCMS algorithm for identifying an estimated density ridge first appeared in the field of computer vision [94] before its introduction to the statistical community by [83]. More recently, [90] proposed alternative methods to the SCMS algorithm for finding density ridges, which are based on a gradient descent of the ridgeness and have connections to solution manifolds [28]. They presented the convergence analysis on continuous versions of their proposed methods and discretized them via Euler’s method. Our directional SCMS algorithm is extended from the directional mean shift algorithm [62, 65, 80, 113, 117, 118]. As we cast the (directional) SCMS algorithms into subspace constrained gradient ascent (SCGA) algorithms (on a hypersphere), it is worth mentioning that one should not confuse the SCGA algorithm here with the projected gradient ascent/descent method for a constrained problem in the standard optimization theory; see Section 3.2 in [17] for references on the latter. The SCGA algorithm discussed in this paper is a gradient ascent algorithm but with a subspace constrained gradient. When the subspace coincides with alternating one-dimensional coordinate spaces, the SCGA algorithm reduces to the well-known coordinate ascent/descent method [112]. Some linear convergence results of the coordinate descent algorithms were previously established by [11, 73].
Other related work includes [66, 67], though, in their problem setups, the projection matrix onto the subspace is random and has its expectation equal to the identity matrix. The SCGA algorithm of interest in this paper always has a deterministic constrained subspace defined by the eigenspace associated with the last several eigenvalues of the Hessian of the density Inline graphic.

Outline and Notation. Section 2 introduces the definitions of Euclidean and directional KDEs and reviews some preliminary concepts of differential geometry on Inline graphic. We discuss the assumptions on the Euclidean density ridges and establish the (linear) convergence results of the SCGA and SCMS algorithms in Section 3. In Section 4, we generalize the definition of density ridges to the directional data scenario and prove the (linear) convergence properties of the SCGA and SCMS algorithms on Inline graphic. Some simulation studies and real-world applications of Euclidean and directional SCMS algorithms are presented in Section 5, with code available at https://github.com/zhangyk8/EuDirSCMS. We conclude the paper and discuss some potential impacts in Section 6.

Throughout the paper we use Inline graphic as the intrinsic dimension of density ridges, whose ambient spaces are Inline graphic in the Euclidean data case and Inline graphic in the directional data case. Notice that a quantity under the directional data setting that has its counterpart in the Euclidean data case will be denoted by the same notation with an extra underline. For instance, Inline graphic is a ridge of the density Inline graphic in the Euclidean space Inline graphic, while Inline graphic refers to a ridge of the directional density Inline graphic on the sphere Inline graphic.

Let Inline graphic be a smooth function and Inline graphic be a multi-index (that is, Inline graphic are nonnegative integers and Inline graphic). Define Inline graphic as the Inline graphic-th order partial derivative operator, where Inline graphic is often written as Inline graphic. For Inline graphic, we define the functional norms

graphic file with name DmEquation5.gif

When Inline graphic, this becomes the infinity norm of Inline graphic; for Inline graphic, the above norms are in fact semi-norms. We also define Inline graphic.

The (total) gradient and Hessian of Inline graphic are defined as Inline graphic and Inline graphic. Inductively, the third derivative of Inline graphic is a Inline graphic array given by Inline graphic. When Inline graphic is a directional density supported on Inline graphic, the preceding functional norms are defined via the Riemannian gradient, Hessian and high-order derivatives of Inline graphic within the tangent space Inline graphic at Inline graphic, and the supremum will be taken over Inline graphic instead of Inline graphic. They are equivalent to the derivatives of Inline graphic with respect to the local coordinate chart on Inline graphic; see Section 2.3 for a review.

Let Inline graphic denote the Inline graphic entry of a matrix Inline graphic. Then, the Frobenius norm is Inline graphic, where Inline graphic is the trace of the square matrix Inline graphic, and the operator norm is Inline graphic. In most cases, we consider the Inline graphic (operator) norm Inline graphic. We define Inline graphic. The inequality relationships between the above matrix norms are Inline graphic, Inline graphic and Inline graphic.

We use the big-O notation Inline graphic if the absolute value of Inline graphic is upper bounded by a positive constant multiple of Inline graphic for all sufficiently large Inline graphic. In contrast, Inline graphic when Inline graphic. For random vectors, the notation Inline graphic is short for a sequence of random vectors that converges to zero in probability. The expression Inline graphic denotes the sequence that is bounded in probability; see Section 2.2 of [107] for details.

2. Preliminaries

In this section, we review the KDE with Euclidean and directional data as well as some differential geometry concepts on Inline graphic.

2.1 Kernel Density Estimation with Euclidean Data

Let Inline graphic be a random sample from a distribution Inline graphic with density Inline graphic supported on the Euclidean space Inline graphic. We call such a random sample Inline graphic Euclidean data in the sequel. The (Euclidean) KDE at point Inline graphic with a kernel function Inline graphic and bandwidth parameter Inline graphic is written as [27, 96, 110]:

graphic file with name DmEquation6.gif (2.1)

The kernel Inline graphic is generally a unimodal function satisfying the following properties:

  • (K1) Inline graphic.

  • (K2) Inline graphic is (radially) symmetric, i.e. Inline graphic.

  • (K3) Inline graphic and Inline graphic, where Inline graphic is the usual Inline graphic norm in Inline graphic.

One possible approach to construct a multivariate kernel Inline graphic with the above properties is to derive it from a kernel profile as follows:

graphic file with name DmEquation7.gif (2.2)

where Inline graphic is the normalizing constant such that Inline graphic satisfies (K1) and the function Inline graphic is called the profile of the kernel. This kernel form is generally used in deriving (subspace constrained) mean shift algorithms; see Section 3.2. An important example of the profile function is Inline graphic for Inline graphic, leading to the multivariate Gaussian kernel Inline graphic.
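As a concrete illustration of the KDE (2.1) with the multivariate Gaussian kernel, the estimator can be written in a few lines (a minimal sketch; the function name is ours):

```python
import numpy as np

def kde_gaussian(x, data, h):
    """Euclidean KDE (2.1) at x with the multivariate Gaussian kernel
    K(u) = (2*pi)^(-D/2) * exp(-||u||^2 / 2), for data of shape (n, D)."""
    n, D = data.shape
    u = (x - data) / h                                    # (n, D) scaled differences
    K = np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi) ** (D / 2)
    return K.sum() / (n * h ** D)
```

With a single observation at the origin and h = 1, the estimate at the origin equals the Gaussian kernel's value at zero, i.e. (2π)^(−D/2).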

Another approach of designing a multivariate kernel function is to leverage the product kernel technique as Inline graphic, where Inline graphic are kernels function defined on Inline graphic satisfying the properties (K1-3). This leads to a multivariate KDE as

graphic file with name DmEquation8.gif (2.3)

In fact, the multivariate Gaussian kernel Inline graphic can be obtained by defining its kernel profile as Inline graphic for Inline graphic or taking Inline graphic. In practice, the multivariate KDE (2.1) with Gaussian kernel is the most popular nonparametric density estimator with Euclidean data.

The most crucial part in applying the KDE is to select the bandwidth parameter Inline graphic. Common methods in the literature aim at minimizing the mean integrated square error (MISE):

graphic file with name DmEquation9.gif

or its asymptotic part through the rule of thumb [99], cross validation [16, 50, 91, 102] and plug-in methods [98]. As choosing the bandwidth is not the main focus of this paper, we refer the interested reader to [61, 97] and Chapter 6.5 of [96] for comprehensive reviews.
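For instance, a normal-reference rule of thumb in the spirit of [99] for univariate data can be sketched as follows (the constants below follow the common Silverman formula and are our assumption, not taken from this paper):

```python
import numpy as np

def rule_of_thumb_bandwidth(data):
    """Silverman-style rule-of-thumb bandwidth for a univariate KDE:
    h = 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    n = data.size
    sigma = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    return 0.9 * min(sigma, (q75 - q25) / 1.34) * n ** (-0.2)
```

The n^(−1/5) decay balances the squared bias and variance contributions to the MISE for a second-order kernel, so the selected bandwidth shrinks as the sample size grows.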

2.2 Kernel Density Estimation with Directional Data

The Euclidean KDE (2.1) exhibits some notable drawbacks in dealing with directional data; see Appendix B for a detailed exposition. Fortunately, the theory of kernel density estimation with directional data has been well studied since the late 1970s [7, 12, 43, 51, 86, 120]. Let Inline graphic be a random sample generated from an underlying directional density function Inline graphic on Inline graphic with Inline graphic, where Inline graphic is the Lebesgue measure on Inline graphic. The directional KDE is given by

graphic file with name DmEquation10.gif (2.4)

where Inline graphic is a directional kernel (i.e. a rapidly decaying function with nonnegative values and defined on Inline graphic for some constant Inline graphic), Inline graphic is the bandwidth parameter and Inline graphic is a normalizing constant satisfying Inline graphic.
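On the unit circle (the case q = 1) with the von Mises kernel L(r) = exp(−r) discussed below, the directional KDE (2.4) reduces to an average of von Mises densities with concentration 1/h²; a minimal sketch under this assumption (the function name is ours, and np.i0 is NumPy's order-0 modified Bessel function):

```python
import numpy as np

def dir_kde_circle(x, data, h):
    """Directional KDE (2.4) on S^1 with the von Mises kernel L(r) = exp(-r).
    Equivalent to averaging vMF(X_i, nu) densities with nu = 1 / h^2;
    data has shape (n, 2) with unit-norm rows."""
    nu = 1.0 / h ** 2
    C = 1.0 / (2 * np.pi * np.i0(nu))      # vMF normalizing constant on the circle
    return C * np.mean(np.exp(nu * data @ x))
```

The normalizing constant makes the estimate integrate to one over the circle, matching the constraint on the directional KDE stated above.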

Remark 2.1.

The distance metric used by the directional KDE (2.4) on Inline graphic is identical to the standard Euclidean metric in the ambient space Inline graphic. This is because the standard Euclidean metric Inline graphic of Inline graphic is topologically equivalent (but not strongly equivalent) to the geodesic distance Inline graphic on Inline graphic due to the following equality:


‖x − y‖ = 2 sin(d_g(x, y)/2). (2.5)

See Section C.1.5 in [81] for the definition of equivalence of metrics. Hence, the distance metric in (2.4) is indeed intrinsic on Inline graphic and adaptive to its geometry.
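The equality behind Remark 2.1 is the chord-arc relation on the sphere, ‖x − y‖ = 2 sin(d_g(x, y)/2) with d_g(x, y) = arccos(xᵀy), which shows that small chordal and geodesic distances control each other; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3); x /= np.linalg.norm(x)     # two random points on S^2
y = rng.normal(size=3); y /= np.linalg.norm(y)

d_chord = np.linalg.norm(x - y)                    # Euclidean metric of the ambient R^3
d_geo = np.arccos(np.clip(x @ y, -1.0, 1.0))       # geodesic distance on S^2

# chord-arc identity underlying the (topological) equivalence of the two metrics
assert abs(d_chord - 2.0 * np.sin(d_geo / 2.0)) < 1e-12
```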

As in the applications of Euclidean KDEs, the bandwidth selection is a critical part in determining the performance of directional KDEs [7, 43, 51, 75, 82, 93, 106]. In contrast, the choice of the kernel is less crucial; see, e.g. Page 72 of [110] and Section 6.3.2 in [96] for the reasoning. A popular candidate is the so-called von Mises kernel Inline graphic, which serves as a counterpart of the Gaussian kernel for directional KDEs. Its name originates from the famous Inline graphic-von Mises–Fisher distribution on Inline graphic, which is denoted by Inline graphic and has the density:

graphic file with name DmEquation12.gif (2.6)

where Inline graphic is the directional mean, Inline graphic is the concentration parameter and Inline graphic is the modified Bessel function of the first kind at order Inline graphic. For more details on statistical properties of the von Mises–Fisher distribution and directional KDE, we refer the interested reader to [9, 44, 74].
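On the sphere S² (the case q = 2), the relevant order-1/2 Bessel function has the closed form I_{1/2}(ν) = sqrt(2/(πν)) sinh(ν), so the vMF density (2.6) simplifies to ν exp(ν μᵀx) / (4π sinh ν). A sketch assuming this standard simplification (the function name is ours):

```python
import numpy as np

def vmf_density_s2(x, mu, nu):
    """vMF(mu, nu) density (2.6) on S^2, using the closed-form
    normalizing constant nu / (4 * pi * sinh(nu))."""
    return nu / (4.0 * np.pi * np.sinh(nu)) * np.exp(nu * mu @ x)
```

As ν → 0 the density tends to the uniform density 1/(4π) on the sphere, while large ν concentrates the mass around the directional mean μ.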

2.3 Riemannian Gradient, Hessian and Exponential Map on Inline graphic

Given that the unit hypersphere Inline graphic is a nonlinear manifold, the Riemannian gradient and Hessian of a smooth function Inline graphic on Inline graphic are defined within its tangent spaces. They are different from but also interconnected with the total gradient and Hessian of Inline graphic in the ambient Euclidean space Inline graphic.

Inline graphic  Riemannian Gradient on Inline graphic. Let Inline graphic be the tangent space of Inline graphic at point Inline graphic, which consists of all the vectors starting from Inline graphic and tangent to Inline graphic. Given a smooth function Inline graphic, its Riemannian gradient  Inline graphic is defined as

graphic file with name DmEquation13.gif (2.7)

for any (unit) vector Inline graphic, where Inline graphic is the inner product (or Riemannian metric) in Inline graphic and Inline graphic is the differential operator of Inline graphic at Inline graphic; see, e.g. Section 3.1 in [10] for more details. Note that the Riemannian metric on Inline graphic coincides with the standard inner product Inline graphic in the ambient space Inline graphic; see Section 3.6.1 in [1]. If Inline graphic is smooth in an open neighborhood containing Inline graphic and we consider Inline graphic as vectors in Inline graphic, then the inner product in Inline graphic reduces to the usual one in Inline graphic and the Riemannian gradient Inline graphic can be expressed in terms of the total gradient Inline graphic as

graphic file with name DmEquation14.gif (2.8)

where Inline graphic is the identity matrix. The left-hand side of (2.8) is the projection of the total gradient Inline graphic onto the tangent space Inline graphic at Inline graphic.
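Formula (2.8) is straightforward to implement: project the total (ambient) gradient onto the tangent space at x. A minimal sketch (the function name is ours):

```python
import numpy as np

def riemannian_grad(total_grad, x):
    """Riemannian gradient (2.8) on the sphere: apply (I - x x^T) to the
    total gradient of f at a unit vector x."""
    g = total_grad(x)
    return g - x * (x @ g)
```

For the linear test function f(x) = aᵀx, whose maximizer on the sphere is x = a, the Riemannian gradient is tangent everywhere and vanishes at a.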

Inline graphic  Riemannian Hessian on Inline graphic. The Riemannian Hessian  Inline graphic at point Inline graphic is a symmetric bilinear map from the tangent space Inline graphic into itself defined as

graphic file with name DmEquation15.gif (2.9)

for any Inline graphic, where Inline graphic is the Riemannian connection on Inline graphic. Similar to Inline graphic, the Riemannian Hessian Inline graphic has the following explicit formula when viewed in the ambient Euclidean space Inline graphic:

graphic file with name DmEquation16.gif (2.10)

where Inline graphic and Inline graphic are the total gradient and Hessian of Inline graphic in Inline graphic. This formula can be derived via the Riemannian connection and Weingarten map on Inline graphic (see [2] and Section 5.5 in [1]) or via geodesics on Inline graphic [118].
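In ambient coordinates, a common explicit expression consistent with the Weingarten-map derivation cited above is Hess f(x) = (I − xxᵀ)(∇∇f(x) − (xᵀ∇f(x)) I)(I − xxᵀ); we take this form as our assumption in the sketch below (the function name is ours):

```python
import numpy as np

def riemannian_hess(total_grad, total_hess, x):
    """Riemannian Hessian on the sphere in ambient coordinates, assuming
    the form (I - x x^T) (Hess f(x) - (x^T grad f(x)) I) (I - x x^T)."""
    D = x.size
    P = np.eye(D) - np.outer(x, x)            # projection onto the tangent space at x
    return P @ (total_hess(x) - (x @ total_grad(x)) * np.eye(D)) @ P
```

At the spherical maximizer of f(x) = aᵀx (namely x = a with ‖a‖ = 1), this expression reduces to −P, which is symmetric, annihilates the normal direction x and is negative semidefinite, as one expects at a mode on the sphere.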

Inline graphic  Exponential Map. An exponential map Inline graphic at Inline graphic is a mapping that takes a vector Inline graphic to a point Inline graphic along the curve Inline graphic with Inline graphic and Inline graphic. Here, Inline graphic is a curve of minimum length between Inline graphic and Inline graphic (i.e. the so-called geodesic on Inline graphic). An intuitive way of thinking of the exponential map Inline graphic evaluated at Inline graphic on Inline graphic is that starting at point Inline graphic, we identify another point Inline graphic on Inline graphic along the geodesic (or great circle) in the direction of Inline graphic so that the geodesic distance between Inline graphic and Inline graphic is Inline graphic. As Inline graphic is a compact Riemannian manifold, the exponential map Inline graphic is a diffeomorphism (a smooth bijection with a smooth inverse) from a neighborhood of Inline graphic to its image on Inline graphic; see Lemma 6.16 in [69]. The inverse of an exponential map (or logarithmic map) is defined within a neighborhood Inline graphic around Inline graphic as a mapping Inline graphic such that Inline graphic represents the vector in Inline graphic starting at Inline graphic, pointing to Inline graphic, and with its length equal to the geodesic distance between Inline graphic and Inline graphic.
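On the sphere both maps have closed forms: the exponential map follows the great circle through x in direction v/‖v‖ for arc length ‖v‖, and the logarithmic map inverts it within a geodesic ball of radius π. A sketch (function names are ours):

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the sphere: follow the great circle from x with
    initial velocity v (a tangent vector at x) for arc length ||v||."""
    t = np.linalg.norm(v)
    return x.copy() if t < 1e-15 else np.cos(t) * x + np.sin(t) * (v / t)

def sphere_log(x, y):
    """Logarithmic (inverse exponential) map: the tangent vector at x that
    points to y and has length equal to the geodesic distance between x and y."""
    w = y - x * (x @ y)                        # tangential component of y at x
    t = np.arccos(np.clip(x @ y, -1.0, 1.0))
    nw = np.linalg.norm(w)
    return np.zeros_like(x) if nw < 1e-15 else (t / nw) * w
```

The two maps are mutually inverse for tangent vectors shorter than π, reflecting that the exponential map is a diffeomorphism only on a neighborhood of x.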

3. Linear Convergence of the SCMS Algorithm With Euclidean Data

Given the definition of an order-Inline graphic ridge Inline graphic in (1.1) of the (smooth) density Inline graphic on the Euclidean space Inline graphic, we introduce, in this section, some commonly assumed conditions to regularize Inline graphic and its stability theorem. After revisiting the frameworks of the Euclidean mean shift and SCMS algorithms as well as deriving the SCMS algorithm as the SCGA algorithm with an adaptive step size, we present our (linear) convergence analysis on the SCGA and SCMS algorithms.

3.1 Assumptions and Stability of Euclidean Density Ridges

Under the spectral decomposition of the Hessian Inline graphic as Inline graphic, we know that Inline graphic is a real orthogonal matrix with the eigenvectors of Inline graphic as its columns and Inline graphic is a diagonal matrix with Inline graphic. Given that Inline graphic, we let Inline graphic be the projection matrix onto the column space of Inline graphic and Inline graphic be the projection matrix onto the complement space, where Inline graphic and Inline graphic is the identity matrix in Inline graphic. Then, the order-Inline graphic principal gradient Inline graphic (or projected gradient in [22, 45]) is defined as

graphic file with name DmEquation17.gif (3.1)

and Inline graphic will be called the residual gradient. The order-Inline graphic density ridge can be equivalently defined as

graphic file with name DmEquation18.gif (3.2)

It follows that the 0-ridge Inline graphic is the set of local modes of Inline graphic, whose statistical properties and practical estimation algorithm have been well studied in [6, 25]. Thus, we only consider the case when Inline graphic in the sequel. We define the projection from point Inline graphic onto a ridge Inline graphic by Inline graphic and the distance from point Inline graphic to Inline graphic by Inline graphic. Note that the projection from point Inline graphic to Inline graphic may not be unique. To guarantee the uniqueness of the projection, we introduce a concept called the reach [32, 42]:

graphic file with name DmEquation19.gif (3.3)

where Inline graphic and Inline graphic is a Inline graphic-dimensional ball of radius Inline graphic centered at Inline graphic. To obtain a well-behaved ridge Inline graphic, some assumptions need to be imposed on the underlying density Inline graphic in a small neighborhood of Inline graphic.

  • (A1) (Differentiability) We assume that Inline graphic is bounded and at least four times differentiable with bounded partial derivatives up to the fourth order for every Inline graphic.

  • (A2) (Eigengap) We assume that there exist constants Inline graphic and Inline graphic such that Inline graphic and Inline graphic for any Inline graphic.

  • (A3) (Path Smoothness) Under the same Inline graphic in (A2), we assume that there exists another constant Inline graphic such that
    graphic file with name DmEquation20.gif
    for all Inline graphic and Inline graphic.

Condition (A1) is a natural differentiability assumption in the context of ridge estimation. Condition (A2) is a curvature assumption on the true density Inline graphic, ensuring that Inline graphic is ‘strongly concave’ around Inline graphic inside the Inline graphic-dimensional linear space spanned by the columns of Inline graphic. We call this property ‘subspace constrained strong concavity’. It is one of the most important components in establishing the linear convergence of the SCGA and SCMS algorithms; see Remark 3.3 for the reasoning. Condition (A3) keeps the gradient and third-order derivatives of Inline graphic from being too steep around the ridge Inline graphic. Such conditions are also imposed by [45] for characterizing a quadratic behavior of Inline graphic around Inline graphic and ensuring the stability of Inline graphic, as well as by [22] to avoid the degenerate normal spaces of Inline graphic. Consequently, Inline graphic is a Inline graphic-dimensional manifold that contains neither intersections nor endpoints; see also Lemma C.1 in the Appendix. Notice that the inequality assumptions in (A3) depend on both the ambient dimension Inline graphic and the intrinsic dimension Inline graphic of the ridge Inline graphic. The larger the dimensions Inline graphic and Inline graphic are, the harder it is for the assumptions to hold. This phenomenon, in some sense, reflects the curse of dimensionality in nonparametric ridge estimation.

Given conditions (A1–3), the ridge Inline graphic will be stable under small perturbations of the underlying density Inline graphic and its derivatives, which is summarized in the following lemma. The stability of Inline graphic is generally measured by the Hausdorff distance defined as

graphic file with name DmEquation21.gif (3.4)

where Inline graphic are two sets in Inline graphic.
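For two finite point sets (e.g. discretized ridges returned by an SCMS run), the Hausdorff distance (3.4) can be computed directly from the pairwise distance matrix; a small sketch (the function name is ours):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance (3.4) between finite point sets A and B in R^D,
    each given as an (n, D) array."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(dists.min(axis=1).max(), dists.min(axis=0).max())
```

It is the larger of the two directed distances, hence symmetric in its arguments and zero only when the two sets coincide.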

Lemma 3.1. (Theorem 4 in [45]).

Assume conditions (A1–3) for two densities Inline graphic. When Inline graphic is sufficiently small, we have


graphic file with name DmEquation22.gif

where Inline graphic and Inline graphic are the Inline graphic-ridges of Inline graphic and Inline graphic, respectively.

When the true density Inline graphic that generates the Euclidean data Inline graphic is replaced by the Euclidean KDE Inline graphic in the definition (1.1) of density ridges, we obtain a natural (plug-in) estimator of the true ridge Inline graphic as

graphic file with name DmEquation23.gif

To regularize the statistical behavior of the estimated ridge Inline graphic, we make the following assumptions on a kernel of the form (2.2):

  • (E1) We assume that the kernel profile Inline graphic is non-increasing and at least three times continuously differentiable with bounded fourth-order partial derivatives as well as
    graphic file with name DmEquation24.gif
    with Inline graphic.
  • (E2) Let
    graphic file with name DmEquation25.gif
    We assume that Inline graphic is a bounded VC (subgraph) class of measurable functions on Inline graphic; that is, there exist constants Inline graphic such that for any Inline graphic,
    graphic file with name DmEquation26.gif
    where Inline graphic is the Inline graphic-covering number of the normed space Inline graphic, Inline graphic is any probability measure on Inline graphic and Inline graphic is an envelope function of Inline graphic. Here, the norm Inline graphic is defined as Inline graphic.

Remark 3.1.

Recall that the Inline graphic-covering number  Inline graphic is defined as the minimal number of Inline graphic-balls Inline graphic of radius Inline graphic needed to cover the (function) class Inline graphic. One popular concept for controlling the uniform covering number Inline graphic is the notion of Vapnik–Červonenkis (subgraph) classes, or simply VC classes. Starting from collections of sets, we say that a collection Inline graphic of subsets of the sample space Inline graphic  picks out a certain subset of the finite set Inline graphic if it can be written as Inline graphic for some Inline graphic. The collection is said to shatter  Inline graphic if Inline graphic picks out each of its Inline graphic subsets. The VC-index  Inline graphic of Inline graphic is the smallest Inline graphic for which no set of size Inline graphic is shattered by Inline graphic. A collection Inline graphic of measurable sets is called a VC class if its index Inline graphic is finite. To generalize this concept to a class Inline graphic of real-valued and measurable functions defined on Inline graphic, we say that Inline graphic is a VC subgraph class if the collection of all subgraphs of the functions in Inline graphic forms a VC class of sets in Inline graphic. An important property of VC (subgraph) classes is that their Inline graphic-covering numbers grow polynomially in Inline graphic, as stated in condition (E2); see Theorem 2.6.4 in [108]. More in-depth discussion on VC classes can be found in Chapter 2.6 of the same book.

Condition (E1) can be relaxed such that the kernel profile Inline graphic is three times continuously differentiable except for a finite number of points on Inline graphic. Such a relaxation allows us to include the Epanechnikov and other compactly supported kernels. The integrability assumption on Inline graphic in condition (E1) is similar to the conditions (K1) and (K3) in Section 2.1 for the purpose of bounding the expectations and variances of the KDE Inline graphic and its (partial) derivatives. Condition (E2) regularizes the complexity of the kernel and its (partial) derivatives, which is essential in establishing the uniform consistency of Inline graphic and its derivatives to the corresponding quantities of Inline graphic as in equation (3.5).

Given conditions (E1) and (E2), the techniques in [20, 40, 48] can be utilized to show the uniform consistency of the Euclidean KDE Inline graphic and its derivatives as

graphic file with name DmEquation27.gif (3.5)

3.2 Mean Shift and SCMS Algorithms with Euclidean Data

We begin with a quick review of the Euclidean mean shift algorithm, as the SCMS algorithm is built on top of this formulation. Given condition (E1) and the Euclidean KDE Inline graphic with kernel (2.2), its gradient estimator takes the form

graphic file with name DmEquation28.gif (3.6)

where the first term is a variant of KDEs and the second term is the mean shift vector

graphic file with name DmEquation29.gif (3.7)

This factorization suggests that the mean shift vector aligns with the direction of maximum increase in Inline graphic. Thus, moving a point along its mean shift vector successively yields an ascending path to a local mode [29, 31, 71]. Let Inline graphic be the mean shift sequence with the Euclidean KDE Inline graphic. Then, one step iteration of the mean shift algorithm is written as

graphic file with name DmEquation30.gif (3.8)

showing that the mean shift algorithm is a gradient ascent method with an adaptive step size

graphic file with name DmEquation31.gif (3.9)

Here, we denote by Inline graphic the denominator of the adaptive step size Inline graphic. Lemma 3.2 shows that, under condition (E1) and the differentiability assumption on Inline graphic, Inline graphic tends to a fixed constant with probability tending to 1 for any Inline graphic as Inline graphic and Inline graphic. Therefore, the step size Inline graphic has asymptotic rate Inline graphic and tends to zero as Inline graphic and Inline graphic. The proof of Lemma 3.2 can be found in Appendix D.
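To make the iterations (3.8)–(3.9) concrete, here is a minimal sketch of the mean shift update with a Gaussian kernel; the sample, starting point and bandwidth are our own illustrative choices, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))   # point cloud with a single density mode

def mean_shift_step(x, X, h):
    """One mean shift iteration (3.8) with a Gaussian kernel: move x to the
    kernel-weighted sample mean, i.e. x plus its mean shift vector (3.7)."""
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * X).sum(axis=0) / w.sum()

x = np.array([2.0, -2.0])
for _ in range(500):
    x = mean_shift_step(x, X, h=0.5)
# x is now (numerically) a fixed point, i.e. an estimated local mode of the KDE
```

Successive updates trace the ascending path to a local mode described above.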

Lemma 3.2.

Assume conditions (A1) and (E1). The convergence rate of Inline graphic is



for any Inline graphic as Inline graphic and Inline graphic.

As the mean shift algorithm is not the main focus of this paper, we will abuse notation and denote by Inline graphic the sequence produced by the SCMS or SCGA algorithm in the sequel. Compared with the mean shift iteration (3.8), the SCMS algorithm updates the sequence Inline graphic through the SCMS vector Inline graphic as

graphic file with name DmEquation33.gif (3.10)

See Algorithm 1 in Appendix A for the entire procedure. This also implies that the SCMS algorithm can be viewed as a sample-based SCGA method as

graphic file with name DmEquation34.gif (3.11)

with the same adaptive step size Inline graphic as the Euclidean mean shift algorithm in (3.8). The formulation (3.11) sheds light on some (linear) convergence properties of the SCMS algorithm as we will demonstrate in the next subsection.

3.3 Linear Convergence of Population and Sample-Based SCGA Algorithms

We have shown in (3.11) that the (usual/Euclidean) SCMS algorithm is a variant of the sample-based SCGA algorithm in Inline graphic with an adaptive step size Inline graphic. To establish the (linear) convergence results of the SCMS algorithm with Euclidean KDE Inline graphic, it suffices to study the (linear) convergence of the sample-based SCGA algorithm with objective function Inline graphic. To this end, we begin by studying the convergence of the population SCGA algorithm whose objective function is the underlying density Inline graphic.

Let Inline graphic be the sequence defined by the population SCGA algorithm and Inline graphic be the sequence defined by the sample-based SCGA algorithm. The population SCGA algorithm is defined by its iterative formula as

graphic file with name DmEquation35.gif (3.12)

where Inline graphic is a (fixed) step size. The sample-based SCGA algorithm has its iterative formula as (3.11), except that the standard sample-based SCGA algorithm normally embraces a constant step size Inline graphic.
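For intuition about the population iteration (3.12), it can be run in closed form on a toy density whose ridge is known exactly: for the bivariate Gaussian N(0, diag(4, 1)), the order-1 ridge is the major (x1-) axis, and the SCGA update drives the x2-coordinate to zero. A sketch under these assumptions (the density, step size and starting point are our own choices):

```python
import numpy as np

# Density, gradient and Hessian of the bivariate Gaussian N(0, diag(4, 1)).
Sigma_inv = np.diag([0.25, 1.0])
c = 1.0 / (4.0 * np.pi)

def density(x):
    return c * np.exp(-0.5 * x @ Sigma_inv @ x)

def gradient(x):
    return -density(x) * (Sigma_inv @ x)

def hessian(x):
    g = -Sigma_inv @ x
    return density(x) * (np.outer(g, g) - Sigma_inv)

def scga_step(x, eta, r=1):
    """One population SCGA iteration (3.12): x + eta * V V^T grad f(x), where V
    spans the eigenvectors of the d - r smallest Hessian eigenvalues."""
    vecs = np.linalg.eigh(hessian(x))[1]   # eigenvalues in ascending order
    V = vecs[:, : x.size - r]
    return x + eta * (V @ V.T @ gradient(x))

x = np.array([1.0, 0.5])
for _ in range(2000):
    x = scga_step(x, eta=1.0)
# The order-1 ridge of this Gaussian is its major (x1-) axis, so x2 -> 0,
# while x1 barely moves once the iterate is near the ridge.
```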

Remark 3.2.

In (3.10) and (3.11), we consider the SCMS algorithm as a sample-based SCGA iteration with an adaptive step size Inline graphic. Our Lemma 3.2 suggests that Inline graphic tends to zero at a rate Inline graphic as Inline graphic and Inline graphic. However, once the sample size Inline graphic is fixed and the bandwidth Inline graphic is chosen, the step size Inline graphic is not only upper bounded but also uniformly bounded away from zero with respect to the iteration number Inline graphic by the differentiability condition (E1), provided that the current iterative point Inline graphic lies within the compact neighborhood Inline graphic. Note that Inline graphic is compact because Inline graphic is a finite union of connected and compact manifolds; see (d) of Lemma C.1. More importantly, these upper and lower bounds of Inline graphic when Inline graphic are independent of the iteration number Inline graphic. Therefore, provided that the sample size Inline graphic is sufficiently large, one can always select a small bandwidth Inline graphic such that the adaptive step size Inline graphic of the SCMS algorithm is sufficiently small but not equal to zero.

As revealed by the following proposition, our imposed conditions (A1–3) in Section 3.1 ensure that as long as the step size Inline graphic is small, the objective function Inline graphic along any population SCGA sequence Inline graphic is non-decreasing and the sequence by itself converges to Inline graphic when it is initialized within a small neighborhood of Inline graphic.

Proposition 3.1. (Convergence of the SCGA Algorithm.)

For any SCGA sequence Inline graphic defined by (3.12) with Inline graphic, the following properties hold.

  • (a) Under condition (A1), the objective function sequence Inline graphic is non-decreasing and converges.

  • (b) Under condition (A1), Inline graphic.

  • (c) Under conditions (A1–3), Inline graphic whenever Inline graphic with the convergence radius Inline graphic satisfying
    graphic file with name DmEquation36.gif
    where Inline graphic is a constant defined in (h) of Lemma C.1 while Inline graphic is a quantity depending on both the dimension Inline graphic and functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic.

The proof of Proposition 3.1 can be found in Appendix D. We make two comments on the choice of the convergence radius Inline graphic in (c) of Proposition 3.1. The first two quantities in the upper bound of Inline graphic ensure that Inline graphic and therefore, the projection of Inline graphic onto Inline graphic is well defined. The last quantity in the upper bound of Inline graphic is critical to guarantee that the distances Inline graphic from the SCGA sequence Inline graphic to the ridge Inline graphic can be controlled by the norms Inline graphic of order-Inline graphic principal gradients for Inline graphic.

Corollary 3.1. (Convergence of the SCMS Algorithm.)

When the fixed sample size Inline graphic is sufficiently large and the fixed bandwidth Inline graphic is chosen to be sufficiently small, the following properties hold for the SCMS sequence Inline graphic with high probability under conditions (A1–3) and (E1–2).

  • (a) The Euclidean KDE sequence Inline graphic is non-decreasing and thus converges.

  • (b) Inline graphic.

  • (c) Inline graphic whenever Inline graphic with the convergence radius Inline graphic defined in (c) of Proposition 3.1.

Corollary 3.1 is the sample-based version of Proposition 3.1. On the one hand, when Inline graphic is sufficiently large and Inline graphic is small enough, the estimated ridge Inline graphic also satisfies conditions (A1–3) with high probability; see Lemma 3.1 and the uniform bounds (3.5) of Inline graphic. On the other hand, the adaptive step size Inline graphic of the SCMS algorithm can always be made smaller than the threshold Inline graphic when the sample size Inline graphic is sufficiently large and Inline graphic is small; see Remark 3.2. Consequently, our arguments in Proposition 3.1 can be applied to establish the (local) convergence of the SCMS sequence here. In addition, we point out that Proposition 2 in [46] also proved results (a–b) of Corollary 3.1 under condition (E1) and a convexity assumption on the kernel profile Inline graphic. The difference is that our arguments hold when Inline graphic is large and Inline graphic is small, while the extra convexity assumption in [46] enables the authors to prove results (a–b) universally for any choice of the bandwidth Inline graphic.

By Proposition 3.1 and Corollary 3.1, it is now reasonable to denote the limiting points of the population and sample-based SCGA sequences Inline graphic and Inline graphic by Inline graphic and Inline graphic, respectively. Before stating our main linear convergence results, we introduce the concepts of Q-linear and R-linear convergence from optimization literature; see, e.g. Appendix A2 in [78].

Definition 3.2. (Linear Rate of Convergence.)

We say that the convergence of the sequence Inline graphic to Inline graphic is Q-linear if there exists a constant Inline graphic such that



We say that the convergence is R-linear if there is a sequence of nonnegative scalars Inline graphic such that


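A toy numerical illustration of these definitions (our own example, not from the paper): the sequence x_k = 2^{-k} converges Q-linearly to 0 with constant 1/2, while a sequence dominated by it may fail to be Q-linear yet still converge R-linearly.

```python
# Q-linear: |x_{k+1} - x*| <= gamma * |x_k - x*| for some gamma < 1.
x_star = 0.0
xs = [0.5 ** k for k in range(12)]          # x_k = 2^{-k} -> 0
ratios = [abs(xs[k + 1] - x_star) / abs(xs[k] - x_star) for k in range(11)]
# every ratio equals exactly 0.5 < 1, so the convergence is Q-linear.

# R-linear: |y_k - x*| <= nu_k for nonnegative scalars nu_k converging Q-linearly.
ys = [0.5 ** k * (1.0 if k % 2 == 0 else 0.1) for k in range(12)]
# the ratios |y_{k+1}|/|y_k| oscillate between 0.05 and 5.0, so ys is NOT
# Q-linear, yet |y_k| <= 0.5**k = nu_k, so ys still converges R-linearly.
```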

The linear convergence of the SCGA sequence Inline graphic will be established under the following local condition.

  • (A4) (Quadratic Behaviors of Residual Vectors) We assume that the SCGA sequence Inline graphic with step size Inline graphic and Inline graphic as its limiting point satisfies
    graphic file with name DmEquation39.gif
    for some constant Inline graphic, where Inline graphic is the constant defined in condition (A2).

Condition (A4) imposes a direct assumption on the SCGA sequence Inline graphic, under which the residual vector Inline graphic and its inner product with the residual gradient Inline graphic are upper bounded by a quadratic term Inline graphic. This condition guarantees that Inline graphic is ‘subspace constrained strongly concave’ around Inline graphic; see also Remark 3.3. Our proof of Theorem 3.1 suggests that the residual vector Inline graphic is only required to be smaller than the first-order term Inline graphic; for simplicity, we require it to be quadratic. When condition (A4) fails to hold, the associated SCGA sequence can only converge sublinearly to Inline graphic. It is therefore an essential element of the linear convergence of the SCGA algorithm, and we discuss some potentially weaker assumptions that imply condition (A4) in Appendix E. Intuitively, the SCGA path converges to Inline graphic following the direction of the principal gradient Inline graphic. To gain further insight into condition (A4), we consider a special density function

graphic file with name DmEquation40.gif (3.13)

on Inline graphic, whose one-dimensional ridge is Inline graphic by the definition (1.1). Some careful calculations suggest that its principal gradient Inline graphic points toward the ridge Inline graphic in the direction Inline graphic when Inline graphic and in the direction Inline graphic when Inline graphic; see Fig. 2 for a graphical illustration. Furthermore, the smallest eigenvalue of Inline graphic is negative whenever Inline graphic. Hence, the residual gradient Inline graphic is perpendicular to the SCGA direction, and condition (A4) naturally holds.

Fig. 2. Contour lines of the density function (3.13) and its principal gradient flows.

We now present our linear convergence results for the population and sample-based SCGA algorithms.

Theorem 3.3. (Linear Convergence of the SCGA Algorithm.)

Assume conditions (A1–4) throughout the theorem.

  • (a) Q-Linear convergence of Inline graphic: Consider a convergence radius Inline graphic satisfying
    graphic file with name DmEquation41.gif
    where Inline graphic is the constant defined in (h) of Lemma C.1 and Inline graphic is a quantity defined in (c) of Proposition 3.1 that depends on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order derivative of Inline graphic. Whenever Inline graphic and the initial point Inline graphic with Inline graphic, we have that
    graphic file with name DmEquation42.gif
  • (b) R-Linear convergence of Inline graphic: Under the same radius Inline graphic in (a), we have that whenever Inline graphic and the initial point Inline graphic with Inline graphic,
    graphic file with name DmEquation43.gif

We further assume conditions (E1-2) in the rest of statements. If Inline graphic and Inline graphic,

  • (c) Q-Linear convergence of Inline graphic: under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation44.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.
  • (d) R-Linear convergence of Inline graphic: under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation45.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.

The detailed proof of Theorem 3.3 can be found in Appendix D. Note that, as in (c) of Proposition 3.1, we elucidate a threshold value for the convergence radius Inline graphic in (a), under which the population SCGA algorithm converges linearly to Inline graphic. The first three quantities in the threshold value are directly adopted from the upper bound of the convergence radius Inline graphic in (c) of Proposition 3.1, while the last term controls the ‘subspace constrained strong concavity’ (3.15) of Inline graphic within Inline graphic.

Remark 3.3.

Notice that the standard strong concavity assumption on the objective function (or density function) Inline graphic is not sufficient to establish the linear convergence of the population SCGA algorithm (3.12). This is because, under the (quasi-)strong concavity assumption [76], the objective function Inline graphic would satisfy


(3.14)

for some constant Inline graphic, and those standard proofs of the linear convergence of gradient ascent methods rely on this inequality; see Section 3.4 in [17]. However, as indicated in our proof of Theorem 3.3, the linear convergence of the SCGA algorithm requires the following inequality instead:


(3.15)

for some constant Inline graphic, where Inline graphic is generally chosen to be Inline graphic. We say that a function Inline graphic satisfying (3.15) is ‘subspace constrained strongly concave’. Since



the strong concavity assumption (3.14) will not imply the key inequality (3.15) for the linear convergence of the population SCGA algorithm unless the residual gradient term Inline graphic can be upper bounded by the second-order error term Inline graphic. The imposed eigengap condition (A2) as well as condition (A4) with its related discussion in Appendix E fill in this gap, ensuring that such a quadratic upper bound holds on the residual gradients along the SCGA sequence.

Corollary 3.2. (Linear Convergence of the SCMS Algorithm.)

Assume conditions (A1–4) and (E1–2). When the fixed sample size Inline graphic is sufficiently large and the bandwidth Inline graphic is chosen to be sufficiently small, there exists a convergence radius Inline graphic such that the SCMS sequence Inline graphic satisfies the following property with high probability:



whenever Inline graphic and the initial point Inline graphic.

Corollary 3.2 should also be regarded as the linear convergence of the sample-based SCGA algorithm to the estimated ridge Inline graphic defined by the Euclidean KDE Inline graphic. Based on conditions (E1–2) and the uniform bounds (3.5), Inline graphic together with its ridge Inline graphic and sample-based SCGA sequence Inline graphic satisfy conditions (A1–4) with probability tending to 1 as Inline graphic and Inline graphic. As a result, one can follow our argument in (a) of Theorem 3.1 to establish the linear convergence of the sample-based SCGA algorithm with a fixed step size Inline graphic satisfying Inline graphic. Furthermore, when the fixed sample size Inline graphic is sufficiently large and the bandwidth Inline graphic is chosen to be small, the adaptive step size Inline graphic of the SCMS algorithm always falls below the threshold Inline graphic for linear convergence but is also uniformly bounded away from zero with respect to the iteration number Inline graphic; see our Remark 3.2. By taking the infimum of the adaptive step size Inline graphic with respect to Inline graphic, one can thus establish the linear convergence of the SCMS algorithm with its rate of convergence as Inline graphic and Inline graphic.

4. The SCMS Algorithm With Directional Data and Its Linear Convergence

In this section, we generalize the definition (1.1) of density ridges to directional densities on Inline graphic and propose our directional SCMS algorithm to identify directional density ridges. In addition, we prove the linear convergence of our directional SCMS algorithm by adjusting the arguments in Section 3.3. Throughout this section, Inline graphic denotes a random sample from a directional distribution with density Inline graphic supported on the unit hypersphere Inline graphic that is embedded in the ambient Euclidean space Inline graphic.

4.1 Definitions, Assumptions and Stability of Directional Density Ridges

To apply the matrix forms of the Riemannian gradient Inline graphic and Hessian Inline graphic of a directional density Inline graphic in the ambient space Inline graphic, we first extend Inline graphic from its support Inline graphic to Inline graphic by defining

graphic file with name DmEquation50.gif (4.1)

Now, given the expressions of Inline graphic and Inline graphic defined in (2.8) and (2.10), we perform the spectral decomposition on Inline graphic as Inline graphic, where Inline graphic is a real orthogonal matrix with columns Inline graphic as the eigenvectors of Inline graphic that are associated with the eigenvalues Inline graphic and lie within the tangent space Inline graphic at Inline graphic, and Inline graphic. Note that the Riemannian Hessian Inline graphic has a unit eigenvector Inline graphic that is orthogonal to Inline graphic and corresponds to eigenvalue 0.

Let Inline graphic be the last Inline graphic columns of Inline graphic, i.e. the unit eigenvectors inside the tangent space Inline graphic corresponding to the Inline graphic smallest eigenvalues of Inline graphic. Let Inline graphic be the projection matrix onto the linear space spanned by the columns of Inline graphic, and Inline graphic. We define the order-Inline graphic principal Riemannian gradient Inline graphic by

graphic file with name DmEquation51.gif (4.2)

where the last equality follows from the fact that the columns of Inline graphic are orthogonal to the unit vector Inline graphic. The order-Inline graphic density ridge on Inline graphic (or directional density ridge) is the set of points defined as

graphic file with name DmEquation52.gif (4.3)

Our definition of density ridges on Inline graphic can arguably be generalized to any smooth function Inline graphic supported on an arbitrary Riemannian manifold. It also follows that the 0-ridge Inline graphic is the set of local modes of Inline graphic on Inline graphic, whose statistical properties and practical estimation algorithms are discussed in [118]. Therefore, we focus only on the case when Inline graphic in this paper. To regularize the directional density ridge Inline graphic, we modify our assumptions on the Euclidean density ridge Inline graphic in Section 3.1 as follows:

  • (A1) (Differentiability) Under the extension (4.1) of the directional density Inline graphic, we assume that the total gradient Inline graphic, total Hessian matrix Inline graphic and third-order derivative tensor Inline graphic in Inline graphic exist, and are continuous on Inline graphic and square integrable on Inline graphic. We also assume that Inline graphic has bounded fourth-order derivatives on Inline graphic.

  • (A2) (Eigengap) We assume that there exist constants Inline graphic and Inline graphic such that Inline graphic and Inline graphic for any Inline graphic.

  • (A3) (Path Smoothness) Under the same Inline graphic in (A2), we assume that there exists another constant Inline graphic such that
    graphic file with name DmEquation53.gif
    for all Inline graphic and Inline graphic.

Recall that Inline graphic is a Inline graphic-neighborhood of the directional ridge Inline graphic in the ambient space Inline graphic. The discussions of conditions (A1–3) in Section 3.1 apply to their directional counterparts (A1–3), except that the eigengap condition (A2) is imposed on the eigenvalues Inline graphic within the tangent space Inline graphic at Inline graphic. However, since the only eigenvalue of Inline graphic associated with the eigenvector outside the tangent space Inline graphic is 0, the eigengap condition (A2) is also valid for the entire spectrum of Inline graphic in the ambient space Inline graphic. The extension of Inline graphic in (4.1) has also been used by [43, 44, 120]. Because the directional density Inline graphic remains unchanged along every radial direction of Inline graphic under the extension (4.1), the radial component of its total gradient is Inline graphic for all Inline graphic, and the Riemannian gradient (2.8) of Inline graphic on Inline graphic becomes

graphic file with name DmEquation54.gif (4.4)

Similarly, the Riemannian Hessian (2.10) of Inline graphic on Inline graphic reduces to

graphic file with name DmEquation55.gif (4.5)

Both the Riemannian gradient and Hessian of Inline graphic on Inline graphic are invariant under this extension.

Remark 4.1. (Connection to Solution Manifolds.)

Example 4 in [28] showed that any Euclidean density ridge Inline graphic defined in (1.1) is a concrete example of a solution manifold Inline graphic with Inline graphic being a vector-valued function. It is not difficult to verify that our defined directional density ridge Inline graphic in (4.3) also belongs to the general form of the solution manifold Inline graphic, where we may rewrite Inline graphic with Inline graphic defined by


recalling that Inline graphic are the last Inline graphic eigenvectors of the Riemannian Hessian Inline graphic of the directional density Inline graphic. More importantly, our imposed conditions (A1–3) in the Euclidean ridge case and (A1–3) in the directional ridge case imply all the required assumptions in [28], i.e. the differentiability of Inline graphic and non-degeneracy of the normal space of Inline graphic; see (d) of Lemmas C.1 and G.1 in the Appendix. Therefore, the discussion about statistical properties and (normal) gradient flows of a generic solution manifold Inline graphic apply to the (directional) density ridge Inline graphic or Inline graphic here.

Similar to Euclidean density ridges, we establish the following stability theorem for directional density ridges. To measure the distance between two directional ridges Inline graphic defined by the directional densities Inline graphic and Inline graphic, we adopt the definition (3.4) of the Hausdorff distance between two sets in the ambient Euclidean space Inline graphic. Note that the Euclidean norm used in definition (3.4) is upper bounded by the geodesic distance when the sets of interest lie on Inline graphic; see also (2.5). We will leverage this property in our proof of Theorem 4.1; see Appendix H for details.
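Both the Hausdorff distance in (3.4) and the chord-versus-geodesic remark can be checked numerically on finite point sets; below is a small sketch with our own helper name and toy sets on the unit circle.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (rows of A and B),
    measured with the Euclidean norm of the ambient space as in (3.4)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Two point sets on the unit circle S^1 in R^2, differing by a 0.1-radian
# rotation of the first point only.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[np.cos(0.1), np.sin(0.1)], [0.0, 1.0]])
d = hausdorff(A, B)
# d = 2*sin(0.05) ~ 0.09996: the Euclidean (chord) distance is bounded above
# by the geodesic distance 0.1 between the rotated points
```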

Theorem 4.1.

Suppose that conditions (A1–3) hold for the directional density Inline graphic and that condition (A1) holds for Inline graphic. When Inline graphic is sufficiently small,

  • (a) conditions (A2–3) hold for Inline graphic.

  • (b) Inline graphic.

  • (c) Inline graphic for a constant Inline graphic.

One natural estimator of the directional density ridge Inline graphic can be obtained by plugging the directional KDE Inline graphic into the definition (4.3) as

graphic file with name DmEquation57.gif

To regularize the statistical behavior of the estimated directional ridge Inline graphic, we consider the following assumptions that are generalized from conditions (E1–2):

  • (D1) Assume that Inline graphic is a bounded and three times continuously differentiable function with a bounded fourth order derivative on Inline graphic for some constant Inline graphic such that
    graphic file with name DmEquation58.gif
  • (D2) Let
    graphic file with name DmEquation59.gif
    We assume that Inline graphic is a bounded VC (subgraph) class of measurable functions on Inline graphic; that is, there exist constants Inline graphic such that for any Inline graphic,
    graphic file with name DmEquation60.gif
    where Inline graphic is the Inline graphic-covering number of the normed space Inline graphic, Inline graphic is any probability measure on Inline graphic and Inline graphic is an envelope function of Inline graphic. Here, the norm Inline graphic is defined as Inline graphic.

The differentiability assumption in condition (D1) can be relaxed so that Inline graphic is (three times) continuously differentiable except on a set of points with Lebesgue measure Inline graphic on Inline graphic. Conditions (D1) and (A1) are generally required for establishing the pointwise convergence rates of the directional KDE and its derivatives [43, 44, 51, 64, 120]. Under these two conditions, Inline graphic, which appears in the step sizes Inline graphic or Inline graphic of the directional mean shift or SCMS algorithm, can also be shown to diverge at the order Inline graphic as Inline graphic and Inline graphic; see Section 4.2 for details. Condition (D2) regularizes the complexity of the kernel Inline graphic and its derivatives, as in condition (E2), in order to obtain the uniform convergence rates of the directional KDE and its derivatives; see (4.6) below. One can justify via integration by parts that the von Mises kernel Inline graphic and many compactly supported kernels satisfy conditions (D1–2).

Given conditions (D1–2), the techniques in [7, 43, 44, 51, 118, 120] can be utilized to demonstrate that

graphic file with name DmEquation61.gif (4.6)

where Inline graphic is the Riemannian connection on Inline graphic with Inline graphic so that Inline graphic, Inline graphic and Inline graphic; see Section 5.3 in [1] and Chapter 4 in [69].

4.2 Mean Shift and SCMS Algorithm with Directional Data

Before deriving our directional SCMS algorithm, we first review the mean shift algorithm with directional data Inline graphic [62, 80, 113]. The formal derivation can be found in Section 3 of [118]. Given the directional KDE Inline graphic in (2.4), the directional mean shift vector can be defined as

graphic file with name DmEquation62.gif (4.7)

Similar to the Euclidean mean shift vector (3.7), Inline graphic also points toward the direction of maximum increase in Inline graphic after being projected onto the tangent space Inline graphic. Thus, the directional mean shift iteration translates a point Inline graphic as Inline graphic with an extra projection Inline graphic to draw the shifted point back to Inline graphic.

Let Inline graphic denote the sequence defined by the above directional mean shift procedure. By an abuse of notation, we will later use the same symbol to denote the directional SCGA/SCMS sequence with Inline graphic. As Inline graphic, some simple algebra shows that the directional mean shift algorithm can be written as the following fixed-point iteration:

graphic file with name DmEquation63.gif (4.8)

From (4.8), it is also possible to write the directional mean shift algorithm as a gradient ascent method on Inline graphic with the iteration formula [116]:

graphic file with name DmEquation64.gif (4.9)

where the adaptive step size Inline graphic is given by

graphic file with name DmEquation65.gif (4.10)

Here, we denote the angle between Inline graphic and Inline graphic (or equivalently, the angle between Inline graphic and Inline graphic) by Inline graphic; see Section 5.2 in [118] for detailed derivations. Within some small neighborhoods around local modes of Inline graphic, Inline graphic and the adaptive step size Inline graphic will be dominated by Inline graphic. The following lemma characterizes the asymptotic behaviors of Inline graphic on Inline graphic and consequently, Inline graphic.

Lemma 4.1. (Lemma 10 in [118]).

Assume conditions (D1) and (A1). For any fixed Inline graphic, we have



as Inline graphic and Inline graphic, where Inline graphic is a constant depending only on kernel Inline graphic and dimension Inline graphic.

Lemma 4.1 indicates that Inline graphic with probability tending to 1 as Inline graphic and Inline graphic for any Inline graphic. The conclusion may seem counterintuitive at first glance, but one should be aware that the consistency of Inline graphic holds only for its tangent component; see (4.6). The radial component of Inline graphic that is perpendicular to Inline graphic diverges, despite the fact that the true directional density Inline graphic does not have any radial component. Using Lemma 4.1, one can argue that the adaptive step size Inline graphic in (4.10) of the directional mean shift algorithm, viewed as a gradient ascent method on Inline graphic, tends to zero at the rate Inline graphic as Inline graphic and Inline graphic.
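As a concrete sketch of the fixed-point iteration (4.8), the code below runs the directional mean shift with the von Mises kernel, for which each update reduces to normalizing a kernel-weighted sample mean back onto the sphere; the sample, bandwidth and starting point are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
# Directional sample on S^1: angles concentrated around theta = 0.
theta = rng.normal(0.0, 0.3, size=300)
X = np.c_[np.cos(theta), np.sin(theta)]

def directional_ms_step(y, X, h):
    """One fixed-point iteration (4.8) with the von Mises kernel: normalize
    the kernel-weighted sample mean back onto the unit sphere."""
    w = np.exp(X @ y / h ** 2)          # von Mises weights exp(x_i^T y / h^2)
    m = (w[:, None] * X).sum(axis=0)
    return m / np.linalg.norm(m)

y = np.array([np.cos(1.5), np.sin(1.5)])   # start away from the mode
for _ in range(50):
    y = directional_ms_step(y, X, h=0.5)
# y stays on S^1 at every step and converges to the directional KDE mode,
# which lies near (1, 0) for this sample
```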

In the sequel, we denote by Inline graphic the iterative sequence generated by our directional SCMS algorithm. There are two different ways of defining a directional SCMS iteration, and we will demonstrate that one of them is superior.

Inline graphic  Method 1: As in the Euclidean SCMS algorithm, one can define the directional SCMS sequence by the directional mean shift vector (4.7) as

graphic file with name DmEquation67.gif (4.11)

where Inline graphic, Inline graphic and Inline graphic is the estimated version of Inline graphic defined by the directional KDE Inline graphic. Here, we plug in (4.7) and leverage the orthogonality between the columns of Inline graphic and Inline graphic in (*).

Unlike the Euclidean SCMS algorithm, we need an extra standardization step Inline graphic to project the updated point back to Inline graphic, which leads to the following fixed-point iteration:

graphic file with name DmEquation68.gif (4.12)

where the components Inline graphic and Inline graphic are always orthogonal for any Inline graphic; see Fig. 3 for a graphical illustration.

Fig. 3. An illustration of one-step iterations under the two candidate directional SCMS algorithms.

Inline graphic  Method 2: The fixed-point iteration formula (4.8) of the directional mean shift algorithm suggests a more efficient formulation of the directional SCMS algorithm as

graphic file with name DmEquation69.gif (4.13)

where we replace the directional mean shift vector Inline graphic with the standardized total gradient estimator Inline graphic in (4.11). This directional SCMS is again a fixed-point iteration as

graphic file with name DmEquation70.gif (4.14)

A direct computation demonstrates that, by the non-increasing property of kernel Inline graphic and the fact that Inline graphic for Inline graphic,

graphic file with name DmEquation71.gif (4.15)

Because the radial components Inline graphic and Inline graphic in the directional SCMS iterative formulae (4.12) and (4.14), respectively, make no contribution to the iteration of the point Inline graphic on Inline graphic, the inequality (4.15) indicates that the directional SCMS algorithm with iterative formula (4.14) takes a larger step in moving the SCMS sequence Inline graphic on Inline graphic. This helps accelerate the movement of points that are far away from the ridge Inline graphic or lie in regions where Inline graphic has low density values. In this sense, the directional SCMS algorithm with iterative formula (4.14) is superior to (4.12); see Fig. 3 for a graphical demonstration. We thus choose Method 2 as our directional SCMS algorithm. Algorithm 2 in Appendix A provides the detailed steps for implementing Method 2 in practice.

Inspired by Proposition 2 in [46] for the Euclidean SCMS algorithm, we derive the ascending property of our directional SCMS algorithm (4.13) and two convergence results for stopping the algorithm in the following proposition. The proof is deferred to Appendix I, in which our argument is similar to but logically different from the proof of Proposition 2 in [46].

Proposition 4.2.

Assume that the directional kernel Inline graphic is non-increasing, twice continuously differentiable and convex with Inline graphic. Given the directional KDE Inline graphic and the directional SCMS sequence Inline graphic defined by (4.13) or (4.14), the following properties hold:

  • (a) The estimated density sequence Inline graphic is non-decreasing and thus converges.

  • (b) Inline graphic.

  • (c) If the kernel Inline graphic is also strictly decreasing on Inline graphic, then Inline graphic.

Remark 4.2.

Our results (b) and (c) in Proposition 4.2 demonstrate that the stopping criterion of our directional SCMS algorithm can follow either the norm of the principal Riemannian gradient estimator Inline graphic or the (Euclidean) distance Inline graphic between two consecutive iterative points, where the latter requires a strictly decreasing kernel, such as the von Mises kernel Inline graphic.
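As a concrete illustration of the stopping rule in Remark 4.2, the sketch below runs a generic fixed-point loop that terminates once two consecutive iterates are within a tolerance. This is not the paper's implementation: the update map `toy_update` is a hypothetical contraction on the unit circle standing in for an SCMS update, and the tolerance is an illustrative choice.

```python
import numpy as np

def fixed_point_iterate(update, x0, tol=1e-7, max_iter=1000):
    """Generic fixed-point loop; stop when consecutive iterates are within
    `tol`, the criterion valid for strictly decreasing kernels (Remark 4.2)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = update(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical stand-in for an SCMS update: pull the angle halfway toward 0
# and map back onto the unit circle.
def toy_update(x):
    theta = np.arctan2(x[1], x[0]) / 2.0
    return np.array([np.cos(theta), np.sin(theta)])

x_star = fixed_point_iterate(toy_update, np.array([0.0, 1.0]))
```

In practice one would check the principal-gradient-norm criterion (result (b)) in the same loop; the consecutive-distance check shown here is the one that additionally needs a strictly decreasing kernel.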

Motivated by the iterative formula (4.9) for the gradient ascent algorithm on Inline graphic, we consider writing our directional SCMS algorithm as a variant of the SCGA algorithm on Inline graphic with an iterative formula:

graphic file with name DmEquation72.gif (4.16)

where Inline graphic is the exponential map at Inline graphic and Inline graphic is the adaptive step size. Analogous to the Euclidean SCMS algorithm and its SCGA representation (3.11), the formulation (4.16) will reveal the (linear) convergence properties of our directional SCMS algorithm in the upcoming Section 4.3. To derive an explicit formula for Inline graphic, we recall the fixed-point equation (4.14) of our directional SCMS algorithm and compute the geodesic distance between Inline graphic and Inline graphic (one-step directional SCMS update) as

graphic file with name DmEquation73.gif

where, in the second equality, we equate the geodesic distance between Inline graphic and Inline graphic to the norm of the tangent vector inside the exponential map in (4.16). This suggests that our directional SCMS algorithm is a sample-based SCGA algorithm on Inline graphic with adaptive step size

graphic file with name DmEquation74.gif (4.17)

for Inline graphic, where Inline graphic’ denotes the angle between Inline graphic and Inline graphic. Note that the above derivation is based on the orthogonality between Inline graphic and the order-Inline graphic principal Riemannian gradient estimator

graphic file with name DmEquation75.gif

see Fig. 3 for a graphical illustration. When our directional SCMS algorithm approaches the estimated ridge Inline graphic, Inline graphic' tends to 0 and Inline graphic is approximately equal to 1. Thus, the step size Inline graphic is also controlled by Inline graphic as in the directional mean shift scenario; see Equation (4.10). Therefore, Lemma 4.1 still applies and shows that the step size Inline graphic converges to 0 with probability tending to 1 as Inline graphic and Inline graphic.

4.3 Linear Convergence of Population and Sample-Based SCGA Algorithms on Inline graphic

As shown in (4.16), our proposed directional SCMS algorithm is an example of the sample-based SCGA method with the directional KDE Inline graphic on Inline graphic and an adaptive step size Inline graphic, so our main focus in this subsection is the (linear) convergence of such an SCGA algorithm on Inline graphic. We first consider the population SCGA algorithm on Inline graphic defined by its iterative formula as

graphic file with name DmEquation76.gif (4.18)

with a suitable choice of the step size Inline graphic. The sample-based version substitutes the subspace constrained Riemannian gradient Inline graphic with its estimator Inline graphic and generally has a constant step size Inline graphic; see (4.16). In the sequel, we denote the sequence defined by the population SCGA algorithm with objective function Inline graphic on Inline graphic by Inline graphic and the sequence defined by the sample-based SCGA algorithm with objective function Inline graphic on Inline graphic by Inline graphic.

Remark 4.3.

Note that the definition (4.18) of the SCGA algorithm applies to any Riemannian manifold Inline graphic, not just the unit hypersphere Inline graphic. The only requirement on Inline graphic for (4.18) to be valid is that the exponential map Inline graphic be well defined within a small neighborhood of Inline graphic on the tangent space Inline graphic for each Inline graphic. More importantly, our assumptions (A1–3) and condition (A4) generalize to any smooth function Inline graphic supported on Inline graphic, and our (linear) convergence results apply to the SCGA algorithm (4.18) on any Inline graphic whose sectional curvature is lower bounded by a real number; see one of the key lemmas in our proofs (Lemma I.1).

Similar to the SCGA algorithm in the Euclidean space Inline graphic, the following proposition demonstrates that the SCGA algorithm (4.18) on Inline graphic yields a non-decreasing sequence of values of the objective function Inline graphic supported on Inline graphic and an SCGA sequence that converges to the directional ridge Inline graphic, as long as the step size Inline graphic is sufficiently small.

Proposition 4.3. (Convergence of the SCGA Algorithm on Inline graphic.)

For any SCGA sequence Inline graphic defined by (4.18) with Inline graphic, the following properties hold:

  • (a) Under condition (A1), the objective function sequence Inline graphic is non-decreasing and thus converges.

  • (b) Under condition (A1), Inline graphic.

  • (c) Under conditions (A1–3), Inline graphic whenever Inline graphic with the convergence radius Inline graphic satisfying
    graphic file with name DmEquation77.gif
    where Inline graphic is a constant defined in (h) of Lemma G.1 while Inline graphic is a quantity depending on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic.

The proof of Proposition 4.3 can be found in Appendix I. The upper bound for the convergence radius Inline graphic has the same meaning as in Proposition 3.1 for the Euclidean SCGA algorithm, ensuring that Inline graphic and the distances from the SCGA sequence Inline graphic on Inline graphic to the directional ridge Inline graphic can be upper bounded by the norms Inline graphic of order-Inline graphic principal Riemannian gradients for all Inline graphic.

Corollary 4.1. (Convergence of the Directional SCMS Algorithm.)

When the fixed sample size Inline graphic is sufficiently large and the bandwidth Inline graphic is chosen to be correspondingly small, the following properties hold for the directional SCMS sequence Inline graphic with high probability under conditions (A1–3) and (D1–2):

  • (a) The directional KDE sequence Inline graphic is non-decreasing and thus converges.

  • (b) Inline graphic.

  • (c) Inline graphic whenever Inline graphic with the convergence radius Inline graphic defined in (c) of Proposition 4.3.

Corollary 4.1 can also be viewed as the convergence result for the sample-based SCGA algorithm on Inline graphic. To justify it, we know from Theorem 4.1 that conditions (A1–3) hold with high probability for the directional KDE Inline graphic and its estimated directional ridge Inline graphic when Inline graphic is sufficiently large and Inline graphic is small enough. Further, by Lemma 4.1, the adaptive step size Inline graphic of our directional SCMS algorithm falls below the threshold value Inline graphic in Proposition 4.3 while remaining universally bounded away from zero with respect to the iteration number Inline graphic, given a sufficiently large but fixed sample size Inline graphic and a sufficiently small bandwidth Inline graphic; recall Remark 3.2. Corollary 4.1 thus follows from Proposition 4.3. Notice that the statements in Proposition 4.2 are essentially the same as results (a–b) in Corollary 4.1. However, like Proposition 2 in [46] for the Euclidean SCMS algorithm, Proposition 4.2 for the directional SCMS algorithm is established under a convexity assumption on the directional kernel Inline graphic and holds for any sample size Inline graphic and bandwidth Inline graphic. In contrast, results (a–b) in Corollary 4.1 are asymptotic and probabilistic, requiring Inline graphic and Inline graphic.

According to Proposition 4.3 and Corollary 4.1, we can denote the limiting points of the population and sample-based SCGA algorithms on Inline graphic by Inline graphic and Inline graphic, respectively. The definition of the linear convergence of any converging sequence on Inline graphic (or an arbitrary Riemannian manifold) is similar to the one in the flat Euclidean space Inline graphic (see Definition 3.2), except that the Euclidean distance is replaced with the geodesic distance on Inline graphic in the definition; see Section 4.5 in [1].

Using the notation in [116], we let Inline graphic. Given that the sectional curvature is Inline graphic on Inline graphic, we have Inline graphic. One can show by differentiating Inline graphic that Inline graphic is strictly increasing with respect to Inline graphic and Inline graphic for any Inline graphic. Analogous to the Euclidean SCGA algorithms, we will establish the linear convergence of the SCGA sequence Inline graphic on Inline graphic (or any Riemannian manifold whose sectional curvature is lower bounded by a real number) as well as its sample-based version under the following local condition.

  • (A4) (Quadratic Behaviors of Residual Vectors) We assume that the SCGA sequence Inline graphic on Inline graphic with step size Inline graphic and Inline graphic as its limiting point satisfies
    graphic file with name DmEquation78.gif
    for some constant Inline graphic, where Inline graphic is the constant defined in condition (A2) and Inline graphic is the logarithmic map.

Condition (A4) serves as a generalization of its Euclidean counterpart condition (A4) to Inline graphic, which again requires a quadratic behavior of the residual vector Inline graphic within the tangent space Inline graphic. Under this condition, the objective (density) function Inline graphic is ‘subspace constrained geodesically strongly concave’ around the directional ridge Inline graphic; see also Remark 4.4. The discussion in Appendix E of potentially weaker assumptions that imply condition (A4) also applies, with some modifications, in the manifold setting; see Remark E.1. One intuitive example in which condition (A4) holds is presented in the second row of Fig. 5, where the directional SCMS/SCGA iterative vector Inline graphic is always orthogonal to the residual space Inline graphic for all Inline graphic around the (estimated) ridge on Inline graphic.

Fig. 5.



Density ridges estimated by the directional SCMS algorithm performed on the two simulated datasets and their (linear) convergence plots. Horizontally, the first row displays the results on the simulated vMF mixture dataset, while the second row presents the results on the circular simulated dataset on Inline graphic. Vertically, the first column includes plots with directional KDE, estimated ridges and trajectories of directional SCMS sequences from two (randomly) chosen initial points on Inline graphic. The second and third columns present the convergence plots for the log-distances of points in the highlighted sequences (indicated by hollow cyan points) to their limiting points or the estimated ridges on Inline graphic.

Theorem 4.4. (Linear Convergence of the SCGA Algorithm on Inline graphic.)

Assume conditions (A1–4) throughout the theorem.

  • (a) Q-Linear convergence of Inline graphic: Consider a convergence radius Inline graphic satisfying
    graphic file with name DmEquation79.gif
    where Inline graphic is the constant defined in (h) of Lemma G.1 and Inline graphic is a quantity defined in (c) of Proposition 4.3 that depends on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic. Whenever Inline graphic and the initial point Inline graphic with Inline graphic, we have that
    graphic file with name DmEquation80.gif
  • (b) R-Linear convergence of Inline graphic: Under the same radius Inline graphic in (a), we have that whenever Inline graphic and the initial point Inline graphic with Inline graphic,
    graphic file with name DmEquation81.gif

We further assume (D1–2) in the remaining statements. Suppose that Inline graphic and Inline graphic.

  • (c) Q-Linear convergence of Inline graphic: Under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation82.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.
  • (d) R-Linear convergence of Inline graphic: Under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation83.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.

The detailed proof of Theorem 4.4 is in Appendix I. The theorem illuminates both the step size requirement and the convergence radius Inline graphic for the linear convergence of SCGA algorithms on Inline graphic. Similar to Euclidean SCGA algorithms in Theorem 3.3, the upper bound of the convergence radius Inline graphic consists of the three quantities adopted from Proposition 4.3 and a quantity controlling the ‘subspace constrained geodesically strong concavity’ around the directional ridge Inline graphic.

Remark 4.4.

Similar to Euclidean SCGA algorithms, the geodesically strong concavity assumption [116] on the objective function Inline graphic is not sufficient to prove the linear convergence of the SCGA algorithm (4.18) on Inline graphic. We instead establish the following ‘subspace constrained geodesically strong concavity’ under some mild conditions (A1–4):


graphic file (4.19)

for some constant Inline graphic, where Inline graphic is generally chosen to be Inline graphic. In fact, the most critical factors for establishing this property are the eigengap condition (A2) and the quadratic behavior of residual vectors stated in condition (A4).

Corollary 4.2. (Linear Convergence of the Directional SCMS Algorithm.)

Assume conditions (A1–4) and (D1–2). When the fixed sample size Inline graphic is sufficiently large and the fixed bandwidth is chosen to be sufficiently small, there exists a convergence radius Inline graphic such that the directional SCMS sequence Inline graphic satisfies


graphic file (linear convergence bound)

with high probability whenever Inline graphic and the initial point Inline graphic.

We also identify Corollary 4.2 as the linear convergence of the sample-based SCGA algorithm on Inline graphic to the estimated directional ridge Inline graphic defined by the directional KDE Inline graphic. The corollary can be justified by noticing that, under conditions (D1–2) and the uniform bounds (4.6), Inline graphic satisfies conditions (A1–3) with probability tending to 1 as Inline graphic and Inline graphic; see Theorem 4.1. With this fact, one can leverage our argument in (a) of Theorem 4.4 to prove the linear convergence of the sample-based SCGA algorithm on Inline graphic with a fixed step size Inline graphic satisfying Inline graphic. Additionally, when the fixed sample size Inline graphic is sufficiently large and the bandwidth is chosen to be correspondingly small, the adaptive step size Inline graphic of our directional SCMS algorithm in (4.17) always falls below the threshold value Inline graphic for linear convergence by Lemma 4.1 while remaining bounded away from zero; recall Remark 3.2. Taking the infimum of Inline graphic with respect to the iteration number Inline graphic under a fixed Inline graphic and Inline graphic yields our results in Corollary 4.2.

5. Experiments

In this section, we first validate our linear convergence results of both Euclidean and directional SCMS algorithms on some simulated datasets. Then, we apply these two algorithms to a real-world earthquake dataset so as to identify its density ridges and compare the estimated ridges with boundaries of tectonic plates and fault lines, on which earthquakes are known to happen frequently.

We leverage the Gaussian kernel profile Inline graphic in the Euclidean SCMS algorithm and the von Mises kernel Inline graphic in the directional SCMS algorithm. In addition, the logarithms of the estimated densities are used in our actual implementations (Step 2 in Algorithms 1 and 2 in Appendix A) of the Euclidean and directional SCMS algorithms, for two reasons. First, using the log-density in the Euclidean SCMS algorithm leads to faster convergence [46]; see our empirical illustration in Fig. A7. Second, estimating a hidden manifold with a density ridge defined by the log-density stabilizes the valid region for a well-defined ridge, compared with the corresponding ridge defined by the original density; see Theorem 7 (Surrogate theorem) in [45].

Unless stated otherwise, we set the default bandwidth parameter of the Euclidean SCMS algorithm to the normal reference rule in [20, 25], which is

graphic file with name DmEquation86.gif (5.1)

where Inline graphic is the sample standard deviation along the Inline graphic-th coordinate and Inline graphic is the (Euclidean) dimension of the data in Inline graphic. As mentioned in [25], applying the normal reference rule (5.1) has two advantages in our context. First, the KDE Inline graphic under Inline graphic tends to oversmooth the data [97], because this bandwidth minimizes the asymptotic MISE for estimating the first-order derivatives of a multivariate Gaussian distribution with covariance matrix Inline graphic; see Corollary 4 in [20]. More importantly, the Euclidean SCMS algorithm with an oversmoothed KDE Inline graphic does not produce too many spurious ridges. Second, compared with cross-validation methods, Inline graphic is easy to compute in practice, especially when the dimension of the data is high. The default bandwidth parameter of the directional SCMS algorithm is selected via the rule of thumb in Proposition 2 of [43], which optimizes the asymptotic MISE for a Inline graphic distribution. The concentration parameter Inline graphic is estimated by Equation (4.4) in [9]. That is,

graphic file with name DmEquation87.gif (5.2)

where Inline graphic given the directional dataset Inline graphic, and we recall that Inline graphic is the modified Bessel function of the first kind of order Inline graphic. As the Inline graphic-von Mises–Fisher distribution behaves like the Gaussian distribution on Inline graphic, choosing the bandwidth via (5.2) also helps smooth out the resulting directional KDE. The tolerance level is always set to Inline graphic for any SCMS algorithm.
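To illustrate the moment-based estimation of the concentration parameter, here is a minimal sketch using the closed-form approximation of Banerjee et al. [9], which avoids inverting the Bessel-function ratio; the exact parametrization used in the paper's Equation (5.2) may differ, and the demo data below are our own.

```python
import numpy as np

def estimate_kappa(X):
    """Moment-based vMF concentration estimate in the spirit of
    Banerjee et al. (Eq. 4.4): kappa ~ R(p - R^2) / (1 - R^2),
    where R is the mean resultant length of the unit vectors in X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R_bar = np.linalg.norm(X.mean(axis=0))  # mean resultant length
    return R_bar * (p - R_bar ** 2) / (1.0 - R_bar ** 2)

# Demo on a tightly concentrated sample on the unit circle (hypothetical).
angles = np.array([0.1, -0.1])
X_demo = np.column_stack([np.cos(angles), np.sin(angles)])
kappa_hat = estimate_kappa(X_demo)
```

Tightly concentrated samples (as above) yield a large estimate, while near-uniform samples drive the mean resultant length, and hence the estimate, toward zero.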

5.1 Simulation Study on the Euclidean SCMS Algorithm

To evaluate the algorithmic rate of convergence of the Euclidean SCMS algorithm (Algorithm 1), we generate the first simulated dataset by randomly drawing 1000 data points from a Gaussian mixture model with density Inline graphic, where Inline graphic, Inline graphic and Inline graphic. Another simulated dataset consists of 1000 data points randomly generated from an upper half circle with radius 2 and i.i.d. Gaussian noise Inline graphic. When applying Algorithm 1 with the estimated log-density to each of these two simulated datasets, we choose the set of initial mesh points as the simulated dataset itself and, to obtain a cleaner ridge structure, remove those initial points whose density values fall below 25% of the maximum density.
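The two simulated datasets can be generated along the following lines. This is a sketch: the mixture means, weights, covariance and noise level below are illustrative stand-ins, since the paper's exact parameters appear in the (unrendered) formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian_mixture(n, means, weights, cov):
    """Draw n points from a Gaussian mixture with shared covariance."""
    comps = rng.choice(len(means), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[c], cov) for c in comps])

def sample_half_circle(n, radius=2.0, noise_sd=0.1):
    """Upper half circle of the given radius plus i.i.d. Gaussian noise."""
    theta = rng.uniform(0.0, np.pi, size=n)
    pts = radius * np.column_stack([np.cos(theta), np.sin(theta)])
    return pts + rng.normal(scale=noise_sd, size=pts.shape)

mix = sample_gaussian_mixture(1000, [[0.0, 0.0], [3.0, 3.0]],
                              [0.5, 0.5], np.eye(2))
arc = sample_half_circle(1000)
```

Each dataset would then serve both as the sample defining the KDE and as the initial mesh for the SCMS iterations, as described above.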

Figure 4 presents the Euclidean KDE plots, estimated density ridges from the Euclidean SCMS algorithm and their (linear) convergence plots on the two simulated datasets. The linear trends of those plots in the second and third columns of Fig. 4 empirically demonstrate the correctness of our Theorem 3.1 and Corollary 3.2 about the linear convergence of the Euclidean SCMS algorithm.

Fig. 4.



Density ridges estimated by the Euclidean SCMS algorithm on the two simulated datasets and their (linear) convergence plots. Horizontally, the first row displays the results of the simulated Gaussian mixture dataset, while the second row presents the results of the half circle simulated dataset. Vertically, the first column includes plots with Euclidean KDE, estimated ridges, and trajectories of SCMS sequences from two (randomly) chosen initial points. The second and third columns present the (linear) convergence plots for the log-distances of points in the highlighted sequences (indicated by hollow cyan points) to their limiting points or the estimated ridges.

5.2 Simulation Study on the Directional SCMS Algorithm

Analogous to our simulation study for the linear convergence of the Euclidean SCMS algorithm, we verify the linear convergence of our directional SCMS algorithm (Algorithm 2) on two different simulated datasets. One of them comprises 1000 data points randomly generated from a vMF mixture model Inline graphic with Inline graphic, Inline graphic and Inline graphic. The other simulated dataset is identical to the example in the right panel of Fig. 1 and the underlying dataset in Fig. B9; it consists of 1000 points randomly sampled from a circle connecting the two poles of Inline graphic, with i.i.d. additive Gaussian noise Inline graphic on their Cartesian coordinates followed by Inline graphic normalization back onto Inline graphic. In our implementation of Algorithm 2 with the directional log-density on the two simulated datasets, we also set each initial mesh to the dataset itself and remove those points whose density values fall below 10% of the maximal density value.
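The second directional dataset (noisy great circle through the poles) can be sketched as follows; the noise level is an illustrative choice, since the paper's exact value sits in an unrendered formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_noisy_great_circle(n, noise_sd=0.1):
    """Sample a great circle through the poles of S^2, add i.i.d. Gaussian
    noise to the Cartesian coordinates, and L2-normalize back onto the
    sphere, mirroring the construction described in the text."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # A circle in the x-z plane passes through both poles (0, 0, +/-1).
    pts = np.column_stack([np.cos(theta), np.zeros(n), np.sin(theta)])
    pts += rng.normal(scale=noise_sd, size=pts.shape)
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

data = sample_noisy_great_circle(1000)
```

The renormalization step is what makes the sample genuinely directional: every observation lies exactly on the unit sphere, while the noise perturbs it away from the underlying circle.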

Figure 5 shows the directional KDE plots, estimated density ridges on Inline graphic from the directional SCMS algorithm and their (linear) convergence plots on the aforementioned simulated datasets. The linear decreasing trends in the convergence plots, possibly after several pilot iterations, illustrate the local linear convergence of the directional SCMS algorithm proved in Theorem 4.4 and Corollary 4.2. The minor perturbations at the tails of some convergence plots in Fig. 5 are due to floating-point precision errors.

5.3 Density Ridges on Earthquake Data

It is well known that earthquakes on Earth tend to strike more frequently along the boundaries of tectonic plates and fault lines (i.e. sections where a plate, or two adjacent plates, move in different directions); see [54, 103] for more details. We analyze earthquakes with magnitudes of 2.5+ occurring between 2020-10-01 00:00:00 UTC and 2021-03-31 23:59:59 UTC, which can be obtained from the Earthquake Catalog (https://earthquake.usgs.gov/earthquakes/search/) of the United States Geological Survey. The dataset Inline graphic contains 15,049 earthquakes worldwide over this half-year period.

The normal reference rule (5.1) leads to the bandwidth parameter Inline graphic and the rule of thumb (5.2) yields Inline graphic on the earthquake dataset Inline graphic. However, as these bandwidths produce oversmoothed density estimates, we decrease the bandwidths for the Euclidean and directional SCMS algorithms to Inline graphic and Inline graphic, respectively, in order to detect more ridge structures. We generate 5000 points uniformly on the sphere Inline graphic as the initial mesh points.

To compare the earthquake ridges obtained by the Euclidean and directional SCMS algorithms with the boundaries of tectonic plates, we download the boundary geometry file of the 56 tectonic plates from https://www.kaggle.com/cwthompson/tectonic-plate-boundaries according to the models of [5, 13] and overlay them with the estimated ridges in Fig. 6. The results suggest that the ridges identified by the Euclidean and directional SCMS algorithms on the earthquake dataset coincide with the boundaries of tectonic plates to a large extent. Note that the Euclidean and directional ridges on the earthquake dataset Inline graphic differ little, because most of the observed earthquakes lie in the low latitude region (Inline graphic) where most human beings live. Yet, the ridges estimated by our proposed directional SCMS algorithm align better with the boundary of the Eurasian Plate near the North Pole than those estimated by the Euclidean SCMS algorithm, which confirms the superiority of our directional SCMS algorithm in the high latitude region; see also Appendix B for a more in-depth analysis.

Fig. 6.



Comparisons between density ridges obtained by the Euclidean SCMS algorithm on angular coordinates and the directional SCMS algorithm on Cartesian coordinates from the earthquake dataset. On each panel, the ground-truth boundaries of tectonic plates are plotted as blue curves.

We further quantify the performances of earthquake ridges Inline graphic and Inline graphic estimated by the Euclidean and directional SCMS algorithms from two different perspectives. First, given the fact that an estimated ridge should lie on the region where earthquakes happen more intensively, we compute the mean geodesic distances from each point in the earthquake dataset Inline graphic to the ridges Inline graphic and Inline graphic, respectively, as

graphic file with name DmEquation88.gif

where Inline graphic is the number of earthquakes in the dataset. The ridge Inline graphic estimated by our directional SCMS algorithm is around 4% closer to the earthquakes in Inline graphic on average. Second, we assess the estimation errors of Inline graphic and Inline graphic with respect to the boundaries of tectonic plates. To this end, we view the surface of the Earth as a unit sphere Inline graphic and define a manifold-recovering error measure [119] between the set of boundary points Inline graphic and an estimated ridge Inline graphic as

graphic file with name DmEquation89.gif (5.3)

where Inline graphic and Inline graphic are the cardinalities of Inline graphic and Inline graphic, respectively. Note that although the density ridge Inline graphic and the boundaries of tectonic plates Inline graphic are continuous structures in theory, they are generally represented by sets of discrete points in practice. That is why we can calculate their cardinalities without computing complicated integrals. Moreover, the manifold-recovering error measure is the average of the mean geodesic distances from each point in Inline graphic to Inline graphic and from each point in Inline graphic to Inline graphic. We define such a balanced error measure to avoid bias toward an estimated ridge Inline graphic that only approximates a small portion of Inline graphic with high accuracy but fails to cover other parts of Inline graphic; see Fig. 4 in [119] for an illustrative example. The manifold-recovering error measures of the ridges Inline graphic and Inline graphic estimated by the Euclidean and directional SCMS algorithms with respect to the boundaries of tectonic plates Inline graphic are

graphic file with name DmEquation90.gif

Our directional SCMS algorithm again reduces the estimation error by around 3.9%. In summary, the earthquake ridges yielded by our directional SCMS algorithm are not only closer to the earthquakes on average than the ones identified by the Euclidean SCMS algorithm but also have a lower error in approximating the boundaries of tectonic plates.
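The balanced error measure described above can be computed as follows, using the great-circle distance arccos(x·y) between unit vectors on the sphere. This is a sketch with our own function names; the paper's exact Equation (5.3) is in an unrendered formula, so the 1/2 averaging of the two directed mean distances is taken from the surrounding description.

```python
import numpy as np

def geodesic_dist_matrix(A, B):
    """Pairwise great-circle distances between rows of A and B (unit vectors)."""
    cos = np.clip(np.asarray(A) @ np.asarray(B).T, -1.0, 1.0)
    return np.arccos(cos)

def manifold_recovering_error(E, R):
    """Average of (i) the mean geodesic distance from each boundary point in E
    to its nearest ridge point in R, and (ii) the same with roles swapped."""
    D = geodesic_dist_matrix(E, R)
    return 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())

E_demo = np.array([[1.0, 0.0, 0.0]])
R_demo = np.array([[0.0, 1.0, 0.0]])
err_same = manifold_recovering_error(E_demo, E_demo)
err_orth = manifold_recovering_error(E_demo, R_demo)
```

The symmetrization penalizes a ridge that hugs one stretch of the boundary closely while ignoring the rest, which a one-sided mean distance would not.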

6. Discussions

In this paper, we have provided a rigorous proof for the linear convergence of the well-known SCMS algorithm by viewing it as an example of the SCGA algorithm. We have also generalized the definition of density ridges from the usual densities supported on compact sets in Inline graphic to the directional densities supported on Inline graphic with nonzero curvature. The stability theorem of directional density ridges has been established, and the linear convergence of our proposed directional SCMS algorithm has been proved. Table 1 summarizes the frameworks of considering the (directional) mean shift/SCMS algorithms as gradient ascent/SCGA methods (on Inline graphic) and our results of asymptotic convergence rates of their corresponding step sizes.

Table 1.

Comparisons between the Euclidean and directional mean shift (MS) or SCMS algorithms and summary of the asymptotic convergence rates of their adaptive sizes when viewed as GA/SCGA algorithms in Inline graphic or on Inline graphic.

Algorithms Recast forms as GA/SCGA (in Inline graphic or on Inline graphic) Asymptotic step sizes
MS / SCMS in Inline graphic Inline graphic Inline graphic
(See Lemma 3.2)
MS / SCMS on Inline graphic Inline graphic Inline graphic
Inline graphic
(See Lemma 4.1)

Our theoretical analyses of the SCGA algorithm in the Euclidean space Inline graphic and on the unit hypersphere Inline graphic have potential implications beyond proving the linear convergence of SCMS algorithms. In the optimization literature [1, 77, 78, 116], it is well known that a standard gradient ascent method (on a smooth manifold) converges linearly under an appropriate step size when the objective function is smooth and (geodesically) strongly concave. However, as we discussed in Remarks 3.3 and 4.4, the smoothness and (geodesically) strong concavity assumptions are not sufficient for the linear convergence of SCGA algorithms. Therefore, identifying density ridges with SCGA algorithms is not only a nonconvex optimization problem but also fundamentally more complex than standard gradient ascent. The assumptions and proof arguments developed in this paper may give some insights into the linear convergence of SCGA algorithms with other forms of subspace constrained gradients.

There are still many open problems related to the SCMS algorithm. First, a central factor determining the performance of an SCMS algorithm is bandwidth selection. A variety of bandwidth selection mechanisms are available for the Euclidean KDE and its derivatives in the literature [20, 96], but it is unclear how they can be applied to the SCMS algorithm. We plan to specialize or generalize such techniques to the SCMS algorithm for both Euclidean and directional data. Second, our definition of density ridges generalizes to any density supported on an arbitrary Riemannian manifold. As [56] has formulated the principal curve on a Riemannian manifold based on its classical definition in [55], it would be interesting to propose a new definition of principal curves from the perspective of density ridges on Riemannian manifolds and derive a more general SCMS algorithm, possibly based on existing nonlinear mean shift methods on manifolds [104, 105].

Data Availability Statement

The data and code underlying this paper are available at https://github.com/zhangyk8/EuDirSCMS. Specifically, the earthquake data in Section 5.3 were obtained from the Earthquake Catalog (https://earthquake.usgs.gov/earthquakes/search/) of the United States Geological Survey.

Supplementary Material

EuDirSCMS-main_iaac005

Acknowledgment

We thank the anonymous reviewers for their helpful comments that improved the quality of this paper.

Funding

Y.C. is supported by the National Science Foundation [DMS-1952781 and DMS-2112907] and CAREER award [DMS-2141808], and the National Institutes of Health [U24-AG072122].

A. Algorithmic Summaries of Euclidean and Directional SCMS Algorithms

In this section, we provide algorithmic summaries of the Euclidean and directional SCMS algorithms for practical reference. Algorithm 1 describes each step of the Euclidean SCMS algorithm in detail. In our actual implementation of the algorithm, we replace the density estimator Inline graphic with Inline graphic. To demonstrate that the (directional) SCMS algorithms under the log-density implementation give rise to a faster convergence process, we repeat our experiments in Sections 5.1 and 5.2 (i.e. Figs 4 and 5) 20 times for each simulated dataset with the (directional) SCMS algorithms under the original (estimated) density and the (estimated) log-density, respectively. The comparisons between their running times are shown in Fig. A7, in which the (directional) SCMS algorithms under the log-density implementation clearly outperform their counterparts with the original density in terms of the average elapsed time until convergence.

Fig. A7.

Running time comparisons between the (directional) SCMS algorithms with the original density and the log-density applied to our simulated datasets in Figs 4 and 5.

Additionally, when the observational data in practice are noisy, it is common to incorporate an extra denoising step before Step 2 of Algorithm 1 to remove observations in low-density areas and stabilize the (Euclidean) SCMS algorithm; see [23, 45] for comparative studies that demonstrate the significance of denoising.

We summarize the directional SCMS algorithm in Algorithm 2. Note that in Step 2-1 of Algorithm 2, we compute the scaled versions Inline graphic and Inline graphic for Inline graphic because the estimated principal Riemannian gradient Inline graphic and Hessian Inline graphic are often very small. The scaling stabilizes the numerical computation. The spectral decomposition is thus performed on the scaled Hessian estimator Inline graphic, and the scaled principal Riemannian gradient estimator is calculated as

graphic file with name DmEquation93.gif

where Inline graphic has its columns equal to the orthonormal eigenvectors associated with the Inline graphic smallest eigenvalues of the scaled Hessian estimator Inline graphic (or equivalently, Inline graphic) inside the tangent space Inline graphic.

graphic file with name iaac005fx1.jpg

B. Limitations of Euclidean KDE in Handling Directional Data

In this section, we demonstrate with examples and simulation studies that it is inadequate to analyze angular or directional data with Euclidean KDE (2.1) and SCMS algorithm (Algorithm 1). Consider a directional data sample Inline graphic generated from a directional density Inline graphic on Inline graphic. In real-world applications, the random observations Inline graphic on Inline graphic are commonly represented by their angular coordinates Inline graphic with Inline graphic or equivalently, Inline graphic for Inline graphic, where Inline graphic are longitudes and Inline graphic are latitudes.
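For concreteness, the conversion between the angular and Cartesian representations of a point on the unit sphere can be sketched as follows (a minimal illustration under one common longitude/latitude convention; the paper's exact parametrization may differ in sign or range):

```python
import numpy as np

def ang_to_cart(lon, lat):
    """(longitude, latitude) in radians -> unit vector in R^3 (one common convention)."""
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def cart_to_ang(x):
    """Unit vector in R^3 -> (longitude, latitude) in radians."""
    return np.arctan2(x[1], x[0]), np.arcsin(np.clip(x[2], -1.0, 1.0))

p = ang_to_cart(0.3, 1.2)       # a point at longitude 0.3, latitude 1.2
lon, lat = cart_to_ang(p)       # recovers the angular coordinates
```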

graphic file with name iaac005fx2.jpg

B.1 Case I: Density Estimation

As the angular coordinates Inline graphic of the directional dataset Inline graphic have their ranges in a subset Inline graphic of the flat Euclidean space Inline graphic, it is tempting to apply the Euclidean KDE on Inline graphic to construct a density estimator as

graphic file with name DmEquation94.gif (B1)

where Inline graphic uses a radial symmetric kernel with profile Inline graphic, and Inline graphic leverages a product kernel. However, the Euclidean KDEs in (B1) (both Inline graphic and Inline graphic) exhibit two potential drawbacks in dealing with directional data.

Inline graphic First, Inline graphic in (B1) is an estimator of the directional density Inline graphic under its angular representation Inline graphic. Here, Inline graphic is Inline graphic-periodic in its first coordinate and Inline graphic-periodic in its second coordinate. Then, the bias of Inline graphic in estimating Inline graphic is

graphic file with name DmEquation95.gif

where Inline graphic and Inline graphic is the Laplacian of Inline graphic; see [27] for details. However, the second-order partial derivative Inline graphic along the lines of constant latitude (or parallels) tends to infinity as we approach the north and south poles, given that the first-order partial derivative Inline graphic is bounded. One way to justify this claim is to note that the curvatures of these parallels, which equal the reciprocals of their radii, tend to infinity as the radii shrink; recall also that the curvature of a function Inline graphic is defined as Inline graphic. Therefore, applying (B1) to estimate the angular representation Inline graphic of the directional density Inline graphic will produce high bias as the estimator Inline graphic approaches the high-latitude regions (around the north and south poles); see also Panel (c) of Fig. B9.

Inline graphic Second, the Euclidean KDE Inline graphic leverages the Euclidean distances between any query point Inline graphic and observations Inline graphic under their angular coordinates to construct the density estimates, instead of using the (intrinsic) geodesic distances. Note that the Euclidean distance in the angular coordinate system is not equivalent to the Euclidean distance in the ambient Euclidean space Inline graphic containing the directional data on Inline graphic. As a result, some observations that have dramatically different geodesic distances to density query points can have the same density contributions in Inline graphic, as illustrated in Example B.1.

Example B.1.

Suppose that we want to estimate the density values at Inline graphic and Inline graphic, where Inline graphic is a small value. Consider a random sample consisting of only two observations Inline graphic and Inline graphic. If we use the Euclidean distance, the distance between Inline graphic and the distance between Inline graphic are the same. Therefore, when we use the Euclidean KDE Inline graphic to estimate the underlying density, the contribution of Inline graphic to Inline graphic will be the same as the contribution of Inline graphic to Inline graphic. Nevertheless, their geodesic distances are very different, because Inline graphic while Inline graphic is a quantity close to zero; see Fig. B8 for a graphical illustration. This explains, from a different angle, why the Euclidean KDE Inline graphic will have a large bias in estimating the underlying density when the query point Inline graphic is within the high-latitude region.
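The discrepancy behind Example B.1 can be checked numerically: two points separated by a fixed longitude gap have the same Euclidean distance in angular coordinates at any latitude, while their geodesic distance on the sphere shrinks toward zero near the poles. A minimal sketch (our own illustration; the specific coordinates below are hypothetical, not those of Example B.1):

```python
import numpy as np

def ang_to_cart(lon, lat):
    """(longitude, latitude) in radians -> unit vector in R^3."""
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def geodesic(x, y):
    """Great-circle (geodesic) distance between two unit vectors."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

dlon = 1.0  # fixed longitude gap; Euclidean distance in angular coordinates is 1.0 here

low = geodesic(ang_to_cart(0.0, 0.0), ang_to_cart(dlon, 0.0))    # near the equator
high = geodesic(ang_to_cart(0.0, 1.5), ang_to_cart(dlon, 1.5))   # near the pole
# low equals the longitude gap, while high is far smaller: points at high
# latitude that look equally far apart in angular coordinates are in fact close.
```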

Fig. B8.

Graphical illustration of geodesic distances between Inline graphic and Inline graphic as well as Inline graphic and Inline graphic.

B.2 Case II: Ridge-Finding Problem

Consider the following simulated example of identifying a density ridge via the Euclidean SCMS algorithm (Algorithm 1) and our proposed directional SCMS algorithm (Algorithm 2). We generate 1000 data points Inline graphic uniformly from a great circle connecting the North and South Poles of Inline graphic with some i.i.d. additive Gaussian noise Inline graphic on their Cartesian coordinates. All the simulated points are then standardized back to Inline graphic via Inline graphic normalization. The angular coordinates of these simulated points are denoted by Inline graphic accordingly. Figure B9 presents the result of applying both the Euclidean SCMS algorithm (with the Gaussian kernel) to the angular coordinates and the directional SCMS algorithm (with the von Mises kernel) to the Cartesian coordinates of our simulated dataset. As shown in panel (b) of Fig. B9, the Euclidean SCMS algorithm exhibits high bias in estimating the true circular structure near the two poles of Inline graphic, while our directional SCMS algorithm is able to recover the true circular structure with negligible error. The density plot in panel (c) of Fig. B9 exhibits two non-smooth peaks at the North Pole due to the unbounded Hessian entries of the underlying density in its angular coordinates; recall our discussion in Section B.1. This also explains the chaotic behavior of the Euclidean KDE in high-latitude regions.
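The sampling scheme above can be sketched as follows (a minimal illustration; the noise level `sigma` and the choice of great circle in the x–z plane are hypothetical placeholders, since the exact noise variance is not restated here):

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 1000, 0.1                       # sigma: hypothetical noise level

# Uniform points on a great circle through the poles (here, the x-z plane).
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
circle = np.stack([np.cos(theta), np.zeros(n), np.sin(theta)], axis=1)

# Additive Gaussian noise on the Cartesian coordinates, then L2 normalization
# to standardize the simulated points back onto the unit sphere.
noisy = circle + sigma * rng.normal(size=(n, 3))
sample = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
```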

Fig. B9.

Euclidean and directional SCMS algorithms performed on the simulated dataset. Panels (a)–(c): Outcomes of the Euclidean SCMS algorithm with the contour plot for the Euclidean KDE. Panels (d)–(f): Outcomes of our directional SCMS algorithm with the contour plot for the directional KDE. Panels (a)–(b) and (d)–(e) are shown in the view of Hammer projections (page 160 in [100]), while Panels (c) and (f) are presented under the orthographic projections.

At this point, some readers may have a natural concern: why not directly apply the Euclidean SCMS algorithm to the Cartesian coordinates Inline graphic of the available data points? We discuss the potential downsides of this approach from two different aspects.

  1. The Euclidean SCMS algorithm is not intrinsically designed for handling the directional data Inline graphic. Directly applying the algorithm to these Cartesian coordinates leads to an estimated ridge not lying on Inline graphic. While the Inline graphic normalization is able to standardize the ridge points back to Inline graphic, this standardization process will inevitably introduce extra bias.

  2. When estimating the underlying density of Inline graphic, we know from (3.5) and some KDE literature [20, 27, 96] that the (uniform) rates of convergence of the Euclidean KDE and its derivatives depend on the dimension Inline graphic of the ambient space instead of the intrinsic dimension Inline graphic of directional data. This dimensionality effect also appears in the (linear) convergence of the downstream SCMS algorithm, which, for instance, shrinks the upper bounds of the (linear) convergence radius and step size threshold in Theorem 3.1. Thus, analyzing directional data Inline graphic with the Euclidean KDE and SCMS algorithm will slow down the statistical and algorithmic rates of convergence of the density estimators as well as lower the accuracy of the resulting ridge in recovering the underlying structure inside the dataset.

To support our above explanations, we extend our simulation study in Fig. B9 as follows. We vary the maximum latitude attained by the underlying (intrinsic) circular structure on Inline graphic from Inline graphic to Inline graphic while keeping the circle parallel to the original great circle connecting the North and South Poles of Inline graphic; see panel (a) in Fig. B10 for an illustration. For each of these underlying circles, we follow the same sampling scheme as in Fig. B9, i.e. sampling 1000 points uniformly on the circle with some i.i.d. additive Gaussian noise Inline graphic on their Cartesian coordinates and Inline graphic normalization back to Inline graphic. The Cartesian coordinates of the simulated points from each circular structure are denoted by Inline graphic, while their angular coordinates are represented by Inline graphic. Then, we apply our directional SCMS algorithm to Inline graphic from each of these simulated datasets. Moreover, the Euclidean SCMS algorithm is applied to both the angular coordinates Inline graphic and the Cartesian coordinates Inline graphic from each of these simulated datasets, where we consider Inline graphic as a dataset in the ambient space Inline graphic in the latter case. Here, the sets of initial points for the Euclidean and directional SCMS algorithms are the simulated datasets themselves. Finally, we compute the average geodesic distance errors on Inline graphic from the resulting ridges to the corresponding true circular structures. To reduce the randomness of our simulation studies, we repeat the above sampling and experimental procedures 20 times for each true circular structure.

Fig. B10.

Euclidean and directional SCMS algorithms applied to the simulated datasets whose true structures are circles on Inline graphic attaining their maximum latitudes from Inline graphic to Inline graphic, respectively. The dots on each line plot in the panels (b–d) are the means of the associated statistics for the repeated experiments, while the error bars indicate their corresponding standard deviations.

We present our comparisons of the Euclidean and directional SCMS algorithms based on three metrics in Fig. B10: (i) the average geodesic distance errors between the estimated ridges and the true circular structures, (ii) the number of iteration steps and (iii) the running time. Notice that, as the latitudes of the underlying circular structures increase, the distance errors of the (Euclidean) ridges based on the Euclidean SCMS algorithm applied to the angular coordinates Inline graphic rise. Conversely, the distance errors of the directional ridges and the ridges based on the Euclidean SCMS algorithm in Inline graphic decrease as the true circular structures climb up Inline graphic; see panel (b) of Fig. B10. While the performances of our directional SCMS algorithm and the Euclidean SCMS algorithm in Inline graphic are almost indistinguishable in terms of the average geodesic distance errors, our directional SCMS algorithm significantly outperforms the Euclidean SCMS algorithm with regard to time efficiency; see panels (c–d) of Fig. B10. Note that the Euclidean SCMS algorithm exhibits high variance in the number of iteration steps across the repeated experiments, because each simulated dataset may contain some outliers that are far away from the true circular structure on Inline graphic, and the Euclidean SCMS algorithm requires exceptionally many iterations to converge when initialized from these outliers. Our directional SCMS algorithm, however, is more stable in its number of iterations because it is adaptive to the geometry of Inline graphic.

Other potential issues of analyzing directional data with Euclidean methods and ignoring the curvature of Inline graphic can be found in [30]. In summary, it is inadequate and inefficient to handle directional data with the Euclidean KDE and SCMS algorithm, which calls for the directional KDE (2.4) and our purpose-built SCMS algorithm for analyzing directional data (Algorithm 2).

C. Normal Space of the Euclidean Density Ridge

As we will refer to conditions (A1–3) frequently in the next two sections, we restate them here:

  • (A1) (Differentiability) We assume that Inline graphic is bounded and at least four times differentiable with bounded partial derivatives up to the fourth order for every Inline graphic.

  • (A2) (Eigengap) We assume that there exist constants Inline graphic and Inline graphic such that Inline graphic and Inline graphic for any Inline graphic.

  • (A3) (Path Smoothness) Under the same Inline graphic in (A2), we assume that there exists another constant Inline graphic such that
    graphic file with name DmEquation96.gif
    for all Inline graphic and Inline graphic.

Given a matrix-valued function Inline graphic, its gradient Inline graphic will be an Inline graphic array defined as Inline graphic. The derivative of Inline graphic in the direction of a vector Inline graphic is defined as

graphic file with name DmEquation97.gif

When the matrix Inline graphic, we will use the notation Inline graphic interchangeably to denote its directional derivative along Inline graphic.

Recall that an order-Inline graphic ridge of the density Inline graphic in Inline graphic is the collection of points defined as

graphic file with name DmEquation98.gif

Lemma C.1 below shows that under conditions (A1–3), the Jacobian matrix Inline graphic has rank Inline graphic at every point of Inline graphic, and Inline graphic is a Inline graphic-dimensional manifold by the implicit function theorem [92]. Consequently, the row space of Inline graphic spans the normal space to Inline graphic.

If we define Inline graphic, the derivation in pages 60–63 of [39] shows that

graphic file with name DmEquation99.gif (C1)

for Inline graphic, and the column space of Inline graphic spans the normal space to Inline graphic. Let

graphic file with name DmEquation100.gif

for Inline graphic. Then,

graphic file with name DmEquation101.gif (C2)

However, the columns of Inline graphic are not orthonormal. Thus, we leverage the orthonormalization in [22] to construct Inline graphic whose columns are orthonormal and span the same column space as Inline graphic in the following steps. Under the condition that Inline graphic has full rank Inline graphic at every point Inline graphic (see Lemma C.1), Inline graphic is positive definite, and we perform the Cholesky decomposition on it, that is,

graphic file with name DmEquation102.gif (C3)

where Inline graphic is a lower triangular matrix whose diagonal elements are positive. We then define

graphic file with name DmEquation103.gif (C4)

Notice that Inline graphic intrinsically depend on the dimension Inline graphic of the ridge Inline graphic, but we do not explicate these dependencies in their notations. As discussed in [22], Inline graphic might not be unique because the eigenvalues of Inline graphic can have multiplicities greater than 1. Any collection of linearly independent unit eigenvectors of Inline graphic fits into the above construction for Inline graphic. However, as will be shown later, this non-uniqueness of Inline graphic will not affect our results, as we only require the smoothness of Inline graphic to develop a lower bound of Inline graphic.
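Numerically, the construction in (C3)–(C4) amounts to the following: given a full-column-rank matrix L, factor L^T L = R R^T with R lower triangular (Cholesky) and set N = L (R^T)^{-1}, whose columns are orthonormal and span the same column space as L. A minimal sketch with a random stand-in for the ridge normal-space basis (the matrix L below is an arbitrary illustration, not the paper's specific matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(5, 2))                 # stand-in full-column-rank matrix

R = np.linalg.cholesky(L.T @ L)             # L^T L = R R^T, R lower triangular
N = L @ np.linalg.inv(R.T)                  # columns of N are orthonormal:
                                            # N^T N = R^{-1} (L^T L) R^{-T} = I

# N spans the same column space as L: the projection matrices coincide.
P_N = N @ N.T
P_L = L @ np.linalg.inv(L.T @ L) @ L.T
```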

Lemma C.1.

Assume conditions (A1–3). Given that Inline graphic and Inline graphic are defined in (C2) and (C4), we have the following properties:

  • (a) Inline graphic and Inline graphic have the same column space. In addition,
    graphic file with name DmEquation104.gif
    That is, Inline graphic is the projection matrix onto the columns of Inline graphic.
  • (b) The columns of Inline graphic are orthonormal to each other.

  • (c) For Inline graphic, the column space of Inline graphic is normal to the (tangent) direction of Inline graphic at Inline graphic.

  • (d) For all Inline graphic, Inline graphic. Moreover, Inline graphic is a Inline graphic-dimensional manifold that contains neither intersections nor endpoints. Namely, Inline graphic is a finite union of connected and compact manifolds.

  • (e) For Inline graphic, all the Inline graphic nonzero singular values of Inline graphic are greater than Inline graphic and therefore,
    graphic file with name DmEquation105.gif
  • (f) When Inline graphic is sufficiently small and Inline graphic,
    graphic file with name DmEquation106.gif
    for some constant Inline graphic.
  • (g) Assume that another density function Inline graphic also satisfies conditions (A1–3) and Inline graphic is sufficiently small. Then
    graphic file with name DmEquation107.gif
    for some constant Inline graphic and any Inline graphic, where Inline graphic is the matrix defined in (C4) with the underlying density Inline graphic.
  • (h) The reach of Inline graphic satisfies
    graphic file with name DmEquation108.gif
    for some constant Inline graphic.

Lemma C.1 is extended from Lemma 2 in [22] to handle the density ridge Inline graphic with Inline graphic. As our conditions (A1–3) imply the imposed conditions of Lemma 2 in [22], our proof of Lemma C.1 essentially follows from their arguments with some minor modifications.

Proof. Proof of Lemma C.1

We adopt and generalize parts of the proof of Lemma 2 in [22].

(a) This property is a natural corollary of the Cholesky decomposition as

graphic file with name DmEquation109.gif

(b) Some direct calculations show that

graphic file with name DmEquation110.gif

(c) It can be proved by the argument of Lemma 1 in [22]. Alternatively, we define an arbitrary parametrized curve Inline graphic lying within Inline graphic for some Inline graphic. Then Inline graphic aligns with the tangent direction at Inline graphic. Since Inline graphic, taking the derivative with respect to Inline graphic gives us that

graphic file with name DmEquation111.gif

with Inline graphic. Hence, by the arbitrariness of Inline graphic, the columns of Inline graphic are normal to the tangent direction of Inline graphic at Inline graphic.

(d) We prove that the Inline graphic nonzero singular values of Inline graphic are bounded away from 0. Recall that

graphic file with name DmEquation112.gif

with

graphic file with name DmEquation113.gif

for Inline graphic. Under conditions (A2-3),

graphic file with name DmEquation114.gif

It shows that all the singular values of Inline graphic are less than Inline graphic. Moreover, under condition (A2) again, all the Inline graphic nonzero singular values of Inline graphic are greater than Inline graphic. By Theorem 3.3.16 in [57], we know that all the Inline graphic nonzero singular values of Inline graphic are greater than Inline graphic. Therefore, Inline graphic. The rest of the proof follows directly from Claim 4 in [22].

(e) By the proof of (d), we already know that all the Inline graphic nonzero singular values of Inline graphic are greater than Inline graphic. Thus, Inline graphic, and

graphic file with name DmEquation115.gif

where Inline graphic is the smallest singular value of matrix Inline graphic.

Finally, the proofs of properties (f), (g) and (h) are essentially the same as those of the corresponding claims in [22]. We thus omit them.

As we have discussed in Remark 4.1, property (d) of Lemma C.1 demonstrates that our imposed assumptions (A1–3) for the ridge Inline graphic are sufficient to imply the critical full-rank condition on its normal space in [28] in order for Inline graphic to be a well-defined solution manifold.

D. Proofs of Lemma 3.2, Proposition 3.1, and Theorem 3.1

Lemma D.1.

Assume conditions (A1) and (E1). The convergence rate of Inline graphic is

graphic file with name DmEquation116.gif

for any Inline graphic as Inline graphic and Inline graphic.

Another interpretation of Lemma 3.2 is that Inline graphic diverges to infinity at the rate

graphic file with name DmEquation117.gif

if we select the bandwidth Inline graphic to minimize the asymptotic MISE [27], where ‘Inline graphic’ stands for the asymptotic equivalence.

Proof. Proof of Lemma 3.2

Note that

graphic file with name DmEquation118.gif (D1)

Given the differentiability of Inline graphic guaranteed by condition (A1), the expectation of Inline graphic is given by

graphic file with name DmEquation119.gif

By condition (E1), the dominating constant Inline graphic is finite and therefore,

graphic file with name DmEquation120.gif (D2)

In addition, we calculate the variance of Inline graphic as

graphic file with name DmEquation121.gif

Again, by condition (E1), the dominating constant Inline graphic is finite. Thus, by the central limit theorem,

graphic file with name DmEquation122.gif (D3)

where Inline graphic. Combining (D1), (D2) and (D3), we conclude that

graphic file with name DmEquation123.gif

for any Inline graphic as Inline graphic and Inline graphic.

Remark D.1.

Some previous research papers on the mean shift algorithm [3, 6, 19, 71] have already justified that the algorithm converges to a local mode of the KDE Inline graphic when its local modes are isolated and the algorithm starts within some small neighborhoods of these estimated local modes. Lemma 3.2 here provides a (probabilistic) perspective on the linear convergence of the mean shift algorithm. It is well known that the set of the true local modes of Inline graphic can be approximated by the set of estimated modes defined by Inline graphic [25]. Moreover, around the true local modes of the density Inline graphic, one can argue that Inline graphic is strongly convex and has a Lipschitz gradient with probability tending to 1 by the uniform consistency of Inline graphic and Inline graphic as Inline graphic and Inline graphic; see the uniform bounds (3.5). Hence, by some standard results in optimization theory (e.g. Chapter 3 in [17]), a sample-based gradient ascent algorithm with objective function Inline graphic converges linearly to (estimated) local modes around their neighborhoods as long as its step size is below some threshold value. Finally, recall that (i) the mean shift algorithm is a special variant of the sample-based gradient ascent method with an adaptive step size Inline graphic by (3.8) and (ii) Inline graphic can be sufficiently small but bounded away from 0 when Inline graphic is large and Inline graphic is small by Lemma 3.2; see also Remark 3.2. Therefore, the mean shift algorithm converges linearly with high probability within some small neighborhood of the local modes of Inline graphic when the sample size Inline graphic is sufficiently large and Inline graphic is chosen to be small.
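As a small numerical companion to this remark, the Gaussian mean shift update (the kernel-weighted sample mean) indeed converges to a nearby KDE mode when started within its basin of attraction; a minimal one-dimensional sketch (the data, bandwidth and iteration count are illustrative choices, not those used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.3, size=(400, 1))  # one well-separated mode near 2
h = 0.3                                               # illustrative bandwidth

def mean_shift_step(x, data, h):
    """One Gaussian mean shift update: the kernel-weighted mean of the sample."""
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * h ** 2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

x = np.array([1.0])                                   # start within the mode's basin
for _ in range(100):
    x = mean_shift_step(x, data, h)
# x is now (approximately) the KDE mode near the sample mean.
```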

Proposition D.1. (Convergence of the SCGA Algorithm.)

For any SCGA sequence Inline graphic defined by (3.12) with Inline graphic, the following properties hold.

  • (a) Under condition (A1), the objective function sequence Inline graphic is non-decreasing and converges.

  • (b) Under condition (A1), Inline graphic.

  • (c) Under conditions (A1-3), Inline graphic whenever Inline graphic with the convergence radius Inline graphic satisfying
    graphic file with name DmEquation124.gif
    where Inline graphic is a constant defined in (h) of Lemma C.1 and Inline graphic is a quantity depending on both the dimension Inline graphic and functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic.

Proof. Proof of Proposition 3.1

(a) We first derive the following fact about the objective function Inline graphic.

Inline graphic  Fact 1. Given (A1), Inline graphic is Inline graphic-smooth, that is, Inline graphic is Inline graphic-Lipschitz.

This fact follows from the differentiability of Inline graphic ensured by condition (A1) and Taylor’s theorem that

graphic file with name DmEquation125.gif

for any Inline graphic, where Inline graphic is within a Inline graphic-neighborhood of Inline graphic. Moreover,

graphic file with name DmEquation126.gif (D4)

When Inline graphic, we have that

graphic file with name DmEquation127.gif (D5)

showing that the objective function Inline graphic is non-decreasing along Inline graphic. Given the boundedness of Inline graphic guaranteed by condition (A1), we know that the sequence Inline graphic is bounded. Thus, Inline graphic also converges.

(b) From (a), we know that when Inline graphic,

graphic file with name DmEquation128.gif

Since the sequence Inline graphic converges as Inline graphic, it follows that

graphic file with name DmEquation129.gif

(c) Given condition (A2) and the fact that Inline graphic, we know that

graphic file with name DmEquation130.gif

Let Inline graphic denote the projection of Inline graphic in the SCGA sequence onto the ridge Inline graphic. Since Inline graphic by (h) of Lemma C.1, Inline graphic is well defined when Inline graphic. Given the definition of Inline graphic in (C2), we know that

graphic file with name DmEquation131.gif (D6)

when Inline graphic, where we leverage (e) of Lemma C.1 to obtain the inequality (i). More specifically, Inline graphic is a full row rank matrix by (d) of Lemma C.1 and Inline graphic lies within the row space of Inline graphic because Inline graphic is normal to Inline graphic at Inline graphic. Since the nonzero singular values of Inline graphic are lower bounded by Inline graphic, it follows that Inline graphic. From the above derivation, we also know that Inline graphic is indeed the supremum norm of Inline graphic over the line segment connecting Inline graphic and Inline graphic, which depends on the uniform functional norm Inline graphic of the partial derivatives of Inline graphic up to the fourth order. The result follows from (b).

The following Davis–Kahan theorem [36] is one of the most notable theorems in matrix perturbation theory. We present the theorem in a modified version from [45, 109]. Other useful variants of the Davis–Kahan theorem can be found in [115].

Lemma D.2. (Davis–Kahan.)

Let Inline graphic and Inline graphic be two symmetric matrices in Inline graphic, whose spectra (Definition 1.1.4 in [58]) are Inline graphic and Inline graphic, and let Inline graphic be an interval. Denote by Inline graphic the set of eigenvalues of Inline graphic that are contained in Inline graphic, and by Inline graphic the matrix whose columns are the corresponding (unit) eigenvectors to Inline graphic (more formally, Inline graphic is the image of the spectral projection induced by Inline graphic). Denote by Inline graphic and Inline graphic the analogous quantities for Inline graphic. If

graphic file with name DmEquation132.gif

then the distance Inline graphic between the two subspaces is bounded by

graphic file with name DmEquation133.gif

for any orthogonally invariant norm Inline graphic, such as the Frobenius norm Inline graphic and the Inline graphic-operator norm Inline graphic, where Inline graphic is a diagonal matrix with the ascending principal angles between the column spaces of Inline graphic and Inline graphic on the diagonal.

Note that when we take the Frobenius norm in Lemma D.2, Inline graphic by some simple algebra. Consequently, we will utilize the following inequality from the Davis–Kahan theorem in our subsequent proofs as

graphic file with name DmEquation134.gif (D7)
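A quick numerical sanity check of the Davis–Kahan-type control in (D7): the sin-Theta distance between the leading eigenspaces of a symmetric matrix and a small perturbation of it is on the order of the perturbation norm divided by the eigengap. The sketch below only verifies this behavior qualitatively and does not reproduce the exact constant in (D7); the matrices are arbitrary illustrative choices:

```python
import numpy as np

def subspace_dist(U, V):
    """Frobenius sin-Theta distance between the column spaces of U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)       # singular values = cos(angles)
    return np.sqrt(np.sum(1.0 - np.clip(s, 0.0, 1.0) ** 2))

A = np.diag([5.0, 1.0, 0.5])                           # eigengap 5 - 1 = 4
rng = np.random.default_rng(3)
E = rng.normal(size=(3, 3))
E = 1e-2 * (E + E.T) / 2                               # small symmetric perturbation

_, U = np.linalg.eigh(A)                               # eigh sorts eigenvalues ascending
_, V = np.linalg.eigh(A + E)
U1, V1 = U[:, [2]], V[:, [2]]                          # leading eigenvectors

gap = 5.0 - 1.0
bound = np.linalg.norm(E) / gap                        # first-order Davis-Kahan-type scale
dist = subspace_dist(U1, V1)                           # tiny, of order ||E|| / gap
```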

Theorem D.1. (Linear Convergence of the SCGA Algorithm.)

Assume conditions (A1–4) throughout the theorem.

  • (a) Q-Linear convergence of Inline graphic: Consider a convergence radius Inline graphic satisfying
    graphic file with name DmEquation135.gif
    where Inline graphic is the constant defined in (h) of Lemma C.1 and Inline graphic is a quantity defined in (c) of Proposition 3.1 that depends on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order derivative of Inline graphic. Whenever Inline graphic and the initial point Inline graphic with Inline graphic, we have that
    graphic file with name DmEquation136.gif
  • (b) R-Linear convergence of Inline graphic: Under the same radius Inline graphic in (a), we have that whenever Inline graphic and the initial point Inline graphic with Inline graphic,
    graphic file with name DmEquation137.gif

We further assume conditions (E1–2) in the rest of statements. If Inline graphic and Inline graphic,

  • (c) Q-Linear convergence of Inline graphic: under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation138.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.
  • (d) R-Linear convergence of Inline graphic: under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation139.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.

Proof. Proof of Theorem 3.1

The entire proof is inspired by some standard results in optimization theory. However, the objective function Inline graphic is no longer strongly concave, and we focus on the SCGA iteration instead of the standard gradient ascent method. We first recall the following two facts.

Inline graphic  Fact 1. Given condition (A1), Inline graphic is Inline graphic-smooth, that is, Inline graphic is Inline graphic-Lipschitz.

Inline graphic  Fact 2. Given conditions (A1–3), we know that Inline graphic for any Inline graphic and

graphic file with name DmEquation140.gif

for any Inline graphic.

Fact 1 has been proved in Proposition 3.1, implying that the objective function sequence Inline graphic is non-decreasing when Inline graphic. Fact 2 is a natural corollary by Proposition 3.1, because Inline graphic and Inline graphic is the objective function value after one-step SCGA iteration with step size Inline graphic. The iteration will move Inline graphic toward the ridge Inline graphic. With the help of these two facts, we start the proofs of (a–d).

(a) We first show the following claim: for all Inline graphic and the initial point Inline graphic,

graphic file with name DmEquation141.gif (D8)

where Inline graphic. By the differentiability of Inline graphic guaranteed by condition (A1) and Taylor’s theorem, we have that

graphic file with name DmEquation142.gif

where we use the equality Inline graphic in (i) and (iii), leverage conditions (A2) and (A4) to obtain that Inline graphic and Inline graphic in (ii), apply the quadratic bound on Inline graphic from condition (A4) to obtain (iv), and use the fact that Inline graphic in (v). We also use the fact that Inline graphic and Inline graphic in (v). Claim (D8) thus follows.

Given Fact 2 and any Inline graphic,

graphic file with name DmEquation143.gif

where we apply (D4) to obtain the inequality. It implies that

graphic file with name DmEquation144.gif (D9)

Therefore,

graphic file with name DmEquation145.gif

whenever Inline graphic, where we apply (D8) and (D9) in (i) and use the choice of Inline graphic to argue that

graphic file with name DmEquation146.gif

in (ii). By telescoping, we conclude that when Inline graphic and Inline graphic,

graphic file with name DmEquation147.gif

The result follows.

(b) The result follows easily from (a) and the inequality Inline graphic for all Inline graphic.

(c) The proof here is partially inspired by the proof of Theorem 2 in [8]. We write the spectral decompositions of Inline graphic and Inline graphic as

graphic file with name DmEquation148.gif

where Inline graphic and Inline graphic. By Weyl’s theorem (Theorem 4.3.1 in [58]) and uniform bounds (3.5), we know that for any Inline graphic,

graphic file with name DmEquation149.gif

Thus, Inline graphic satisfies the first two inequalities in condition (A2) when Inline graphic is sufficiently small and Inline graphic is sufficiently large. According to the Davis–Kahan theorem (Lemma D.2 here) and uniform bounds (3.5),

graphic file with name DmEquation150.gif

for any Inline graphic, where we use (D7) and the fact that Inline graphic to obtain (i). Hence, when Inline graphic and Inline graphic,

graphic file with name DmEquation151.gif (D10)

with probability tending to 1.
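The Davis–Kahan step above bounds the change of an eigenprojection by the size of the perturbation divided by the eigengap. A minimal numerical illustration (with matrices of our own choosing; the leading constant varies with the version of the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unperturbed symmetric matrix with eigengap 3 between the 2nd and 3rd eigenvalues.
A = np.diag([5.0, 4.0, 1.0, 0.5])
gap = 4.0 - 1.0

# Small symmetric perturbation, playing the role of the estimation error.
E = 0.01 * rng.standard_normal((4, 4))
E = (E + E.T) / 2
B = A + E

def top2_projection(M):
    """Projection matrix onto the span of the two leading eigenvectors."""
    _, U = np.linalg.eigh(M)         # eigenvalues in ascending order
    V = U[:, -2:]
    return V @ V.T

diff = np.linalg.norm(top2_projection(A) - top2_projection(B), 2)
# A Davis–Kahan-type bound: perturbation size over the eigengap,
# with a conservative constant 2 absorbing version-dependent factors.
bound = 2 * np.linalg.norm(E, 2) / gap
```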

We now claim that Inline graphic and

graphic file with name DmEquation152.gif (D11)

for all Inline graphic. We will prove this claim by induction on the iteration number. Note that when Inline graphic, we derive from the triangle inequality that

graphic file with name DmEquation153.gif

where we apply the result in (a) to obtain the last inequality. Moreover, by the choice of Inline graphic and (D10), we are guaranteed that Inline graphic. In the induction step from Inline graphic, we suppose that Inline graphic and that claim (D11) holds at iteration Inline graphic. The same argument then implies that claim (D11) holds at iteration Inline graphic and that Inline graphic. Claim (D11) is thus proved.

Now, given that Inline graphic, we iterate the claim (D11) to show that

graphic file with name DmEquation154.gif

where the fourth inequality follows by summing the geometric series, and the last equality is due to our notation Inline graphic. This completes the proof.

(d) The result follows easily from (c) and the inequality Inline graphic for all Inline graphic.

E. Discussion on Condition (A4)

In this section, we explore several avenues to derive condition (A4) based on some potentially weaker assumptions. Recall from Section 3.3 that condition (A4) requires the following:

  • (A4) (Quadratic Behaviors of Residual Vectors) We assume that the SCGA sequence Inline graphic with step size Inline graphic and Inline graphic as its limiting point satisfies that
    graphic file with name DmEquation155.gif
    for some constant Inline graphic, where Inline graphic is the constant defined in condition (A2).

E.1 Self-Contractedness Assumption

One important assumption that connects condition (A4) with the existing conditions (A1–3) in Section 3.1 is the so-called self-contracted property [34, 35, 49]:

  • (A5) (Self-Contractedness) We assume that the SCGA sequence Inline graphic satisfies that
    graphic file with name DmEquation156.gif

Condition (A5) requires the SCGA sequence to move toward the ridge Inline graphic along a relatively straight and shrinking path. Since Proposition 3.1 shows that the SCGA sequence Inline graphic converges to Inline graphic when it is initialized near Inline graphic with a small step size, condition (A5) is indeed a mild assumption as long as the sequence Inline graphic does not move erratically around Inline graphic. More importantly, we demonstrate by Proposition E.1 below that condition (A5) can be implied by a subspace constrained version of the concavity assumption on the objective (density) function Inline graphic.
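A discrete sequence can be tested for self-contractedness directly from the definition: the distance from any iterate to every later iterate must be non-increasing. A small checker (our own sketch, with toy example sequences):

```python
import numpy as np

def is_self_contracted(path, tol=1e-12):
    """Check discrete self-contractedness of a sequence:
    for all i <= j <= k, ||x_j - x_k|| <= ||x_i - x_k||,
    i.e. the distance to every later point never increases."""
    path = np.asarray(path, dtype=float)
    n = len(path)
    for k in range(n):
        for j in range(k + 1):
            for i in range(j + 1):
                if (np.linalg.norm(path[j] - path[k])
                        > np.linalg.norm(path[i] - path[k]) + tol):
                    return False
    return True

# A monotone geometric decay toward the origin is self-contracted;
# an overshooting zigzag is not.
monotone = [[0.8 * 0.6 ** t, 0.0] for t in range(20)]
zigzag = [[0.0, 0.0], [1.0, 0.0], [0.2, 0.0]]
```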

Proposition E.1.

Assume condition (A1) and the following assumption on the objective function Inline graphic:

  • (A6) (Subspace Constrained Concavity) For any Inline graphic with Inline graphic being a constant radius, it holds that
    graphic file with name DmEquation157.gif

Then, the SCGA sequence Inline graphic defined in (3.12) with step size Inline graphic and initial point Inline graphic is self-contracted.

Notice that the density function (3.13) satisfies the ‘subspace constrained concavity’ condition (A6) in a small neighborhood of its ridge Inline graphic. Moreover, it is straightforward to verify that condition (A6) is a weaker assumption than our established ‘subspace constrained strong concavity’ in Theorem 3.1; see also Remark 3.3.

Proof of Proposition E.1.

The proof is inspired by Lemma 14 in [49]. We show the self-contractedness for Inline graphic as follows, where Inline graphic is arbitrary. For all Inline graphic and Inline graphic with Inline graphic, we calculate that

graphic file with name DmEquation158.gif

where we apply condition (A6) in inequality (i), use the ascending property of Inline graphic from (a) of Proposition 3.1 to argue that Inline graphic in inequality (ii), and leverage the inequality (D5) guaranteed by condition (A1) to obtain (iii). The self-contractedness of the SCGA sequence Inline graphic thus follows.

Under the self-contractedness condition (A5), we argue by the following lemma that the existing conditions (A1–3) in the literature [22, 45] are nearly sufficient to imply the quadratic behavior of the residual vector Inline graphic along the SCGA sequence Inline graphic. In other words, condition (A4) and the linear convergence of the SCGA algorithm hold without any extra assumption.

Lemma E.1.

Assume condition (A5) throughout the lemma.

  • (a) The total length of the SCGA trajectory is of the linear order, i.e.
    graphic file with name DmEquation159.gif
  • (b) We further assume conditions (A1–2). Then,
    graphic file with name DmEquation160.gif
    for any Inline graphic with some radius Inline graphic, where we recall that Inline graphic is the effective radius in condition (A2) under which the underlying density Inline graphic has an eigengap Inline graphic between the Inline graphic-th and Inline graphic-th eigenvalues of its Hessian matrix Inline graphic.

Proof of Lemma E.1.

(a) This result follows directly from Theorem 15 of [49]. Note that although their results are stated for the standard gradient descent path, the associated proof only utilizes the self-contractedness property of the iterative path. Thus, their proofs are applicable to our SCGA setting under condition (A5).
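As a sanity check of claim (a), the trajectory length of a monotone (hence self-contracted) SCGA-like path can be compared with the distance from the initial point to the limit; on the toy density f(x) = exp(-x_2^2/2) of our own choosing, the two agree up to a modest constant:

```python
import numpy as np

# SCGA-like iterates on the toy density f(x) = exp(-x2^2/2); only the x2
# coordinate moves (the projected gradient equals the full gradient here),
# shrinking by the factor (1 - eta * f) at each step.
eta = 0.5
x = np.array([0.3, 0.8])
path = [x.copy()]
for _ in range(200):
    x = x + eta * np.array([0.0, -x[1] * np.exp(-x[1] ** 2 / 2)])
    path.append(x.copy())
path = np.asarray(path)

# Total trajectory length versus distance from the start to the (numerical) limit.
total_length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
dist_to_limit = np.linalg.norm(path[0] - path[-1])   # limit is near (0.3, 0)
```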

(b) We first decompose the vector Inline graphic into an infinite sum of SCGA iterations Inline graphic for Inline graphic and obtain that

graphic file with name DmEquation161.gif (E1)

for any Inline graphic, where we leverage the orthogonality between Inline graphic and Inline graphic and the idempotence of Inline graphic for all Inline graphic in (ii). See also Fig. E11 for a graphical illustration of the decomposition. By Davis–Kahan theorem (Lemma D.2 and (D7) here) and conditions (A1–2), we deduce that for all Inline graphic,

graphic file with name DmEquation162.gif

where we use Taylor’s theorem in (i), and apply the self-contractedness condition (A5), possibly shrinking the radius Inline graphic so that Inline graphic, in (ii). Hence, by (E1) and the fact that Inline graphic, we obtain that

graphic file with name DmEquation163.gif

 

graphic file with name DmEquation163a.gif

implying the second bound in condition (A4) with Inline graphic. In addition,

graphic file with name DmEquation164.gif

The results follow.

Fig. E11.



Decomposition of the vector Inline graphic into the summation Inline graphic of subspace constrained gradient ascent iterative vectors.

According to (b) of Lemma E.1, condition (A4) will hold with Inline graphic whenever

graphic file with name DmEquation165.gif (E2)

The choice of Inline graphic is a valid constant under the differentiability condition (A1). More importantly, (E2) is essentially the same assumption as the first inequality of condition (A3). Compared with the corresponding condition in (A3), the upper bound in (E2) for Inline graphic around the ridge Inline graphic is only shrunk by a dimension-dependent factor Inline graphic. Since conditions (A3) and (E2) are local, this adjustment does not impose much additional restriction on the underlying density Inline graphic.

E.2 Subspace Constrained Polyak-Łojasiewicz Inequality Assumption

We have demonstrated in Appendix E.1 that the crucial condition (A4) is valid under the self-contractedness assumption on the SCGA sequence Inline graphic. Consequently, the linear convergence of the SCGA algorithm can be established by slightly modifying the common assumptions (A1–3) in ridge estimation. Nevertheless, the self-contractedness property of the SCGA sequence Inline graphic does not always hold in practice, and it may only be implied by the subspace constrained concavity condition (A6) as proved in Proposition E.1.

Given the fact that the underlying density function Inline graphic or its estimator Inline graphic may not satisfy the subspace constrained concavity assumption in many practical applications of SCGA and SCMS algorithms, we present another approach to deduce condition (A4) based on the well-known Polyak–Łojasiewicz inequality [72, 87]. Given any SCGA sequence Inline graphic with limiting point Inline graphic and step size Inline graphic, we consider the following condition:

  • (A7) (Subspace Constrained Polyak–Łojasiewicz Inequality) For all Inline graphic, there exists a constant Inline graphic such that
    graphic file with name DmEquation166.gif
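Condition (A7) can be verified numerically for simple densities. On the toy density f(y) = exp(-y^2/2) (our own example, with ridge value f* = 1 and the projected gradient equal to the full gradient), the constant mu = 0.5 works on |y| <= 0.8:

```python
import numpy as np

# Toy density f(y) = exp(-y^2/2) with maximum value f* = 1 on its ridge
# {y = 0}; the subspace constrained (projected) gradient coincides with
# the full gradient -y * f(y) in this one-active-coordinate example.
def f(y):
    return np.exp(-y ** 2 / 2)

def proj_grad_norm_sq(y):
    return (y * f(y)) ** 2

mu = 0.5   # a PL constant that happens to be valid on |y| <= 0.8 (checked below)
ys = np.linspace(-0.8, 0.8, 401)
# Subspace constrained PL inequality: (1/2) ||projected grad||^2 >= mu * (f* - f).
ok = all(0.5 * proj_grad_norm_sq(y) >= mu * (1.0 - f(y)) - 1e-12 for y in ys)
```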

Similar to the standard Polyak–Łojasiewicz inequality, there exist objective functions that satisfy the subspace constrained Polyak–Łojasiewicz inequality but fail to be concave in the subspace constrained sense of condition (A6); see [21, 41] and Equation (36) in [28]. From this aspect, condition (A7) covers additional SCGA sequences that satisfy condition (A4) and converge linearly to the ridge Inline graphic. However, as the subspace constrained Polyak–Łojasiewicz inequality implies neither condition (A5) nor condition (A6), it should not be regarded as a more general condition. Furthermore, unlike the case of the standard gradient ascent/descent method (Theorem 2 in [63]), the error bound condition (i.e. Equation (D6) here) does not imply the subspace constrained Polyak–Łojasiewicz inequality, indicating a challenge in validating condition (A7) in practice.

Despite these disadvantages, the subspace constrained Polyak–Łojasiewicz inequality condition does give rise to a concise proof for the linear convergence of the objective function value Inline graphic along the SCGA sequence Inline graphic.

Proposition E.2.

Assume conditions (A1) and (A7). Then, for any SCGA sequence Inline graphic with step size Inline graphic, we have that



Proof of Proposition E.2.

The proof is inspired by Theorem 1 in [63]. From (D5) and condition (A7), we know that

graphic file with name DmEquation168.gif

for all Inline graphic when Inline graphic. By some rearrangements, we conclude that

graphic file with name DmEquation169.gif

The final display follows from telescoping.
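The telescoped conclusion — a geometric decay of the objective gap with contraction factor at most 1 - eta*mu — can be observed on a toy density (our own example, with eta = mu = 1/2 on |y| <= 0.8):

```python
import numpy as np

def f(y):
    return np.exp(-y ** 2 / 2)

eta, mu = 0.5, 0.5      # step size and PL constant for the toy density on |y| <= 0.8
y, fstar = 0.8, 1.0
gaps = [fstar - f(y)]
for _ in range(15):
    y = y + eta * (-y * f(y))    # projected-gradient ascent step
    gaps.append(fstar - f(y))

# Each gap should shrink by a factor of at most (1 - eta * mu) = 0.75.
ratios = [gaps[k + 1] / gaps[k] for k in range(len(gaps) - 1)]
```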

More importantly, the subspace constrained Polyak–Łojasiewicz inequality controls the total length of the SCGA path to be of the linear order and implies the quadratic behavior of residual vectors required by condition (A4).

Lemma E.2.

Assume conditions (A1) and (A7) throughout the lemma.

  • (a) The total length of the SCGA trajectory is of the linear order, i.e.
    graphic file with name DmEquation170.gif
  • (b) We further assume condition (A2). Then,
    graphic file with name DmEquation171.gif
    for any Inline graphic with some radius Inline graphic, where we recall that Inline graphic is the effective radius in condition (A2) under which the underlying density Inline graphic has an eigengap Inline graphic between the Inline graphic-th and Inline graphic-th eigenvalues of its Hessian matrix Inline graphic.

Proof of Lemma E.2.

(a) This part of the proof is inspired by the arguments in Theorem 9 of [49]. Based on the proof of (a) in Proposition 3.1 under condition (A1), we know from (D5) that

graphic file with name DmEquation172.gif

when Inline graphic. Using this inequality and condition (A7), we derive that

graphic file with name DmEquation173.gif

where we use the inequality Inline graphic to obtain (i) and apply condition (A7) in inequality (ii). Since Inline graphic, some rearrangement of the above inequality suggests that

graphic file with name DmEquation174.gif

Therefore,

graphic file with name DmEquation175.gif

where we leverage condition (A7) again in (i). In addition, to obtain inequalities (ii) and (iii), we recall from the proof of (d) in Lemma C.1 that Inline graphic, in which the singular values of Inline graphic are bounded by Inline graphic and the singular values of Inline graphic are bounded by Inline graphic. The result thus follows.

(b) This part of the proof is analogous to our arguments in (b) of Lemma E.1, except that the SCGA sequence Inline graphic is no longer self-contracted. For completeness, we repeat some arguments and highlight the differences here. By the Davis–Kahan theorem (Lemma D.2 and (D7) here) and conditions (A1–2), we have that for all Inline graphic,

graphic file with name DmEquation176.gif

where we possibly shrink the radius Inline graphic so that Inline graphic to obtain inequality (ii). Notice also that, since Inline graphic may not hold without the self-contractedness property, we use a looser bound

graphic file with name DmEquation177.gif

from (a) to derive inequality (i). Therefore, by (E1) and the fact that Inline graphic, we obtain that

graphic file with name DmEquation178.gif

which implies the second bound in condition (A4) with Inline graphic. Finally,

graphic file with name DmEquation179.gif

The results follow.

The results in (b) of Lemma E.2 also imply condition (A4) with Inline graphic whenever

graphic file with name DmEquation180.gif (E3)

Once again, the choice of Inline graphic is feasible under condition (A1), and the upper bound (E3) can be viewed as a variant of the first inequality in condition (A3). From this perspective, the subspace constrained Polyak–Łojasiewicz inequality (A7) also leads to an alternative set of assumptions for condition (A4) and the linear convergence of the SCGA algorithm.

Remark E.1.

Note that the results in Proposition E.2 can be generalized to the directional or arbitrary manifold cases under conditions (A1–3). First, the subspace constrained Polyak–Łojasiewicz inequality for the SCGA sequence Inline graphic on Inline graphic or an arbitrary manifold can be modified as



where Inline graphic is the objective (density) function. Based on the proof of (a) in Proposition 4.2 and our arguments in (a) of Lemma E.2, it follows that the total length of the SCGA trajectory on Inline graphic or an arbitrary manifold is of the linear order, i.e.



Second, to establish the quadratic bounds for Inline graphic and Inline graphic, one can follow the arguments in the proof of (b) in Lemma E.2 and leverage the two facts:

1. The tangent vector Inline graphic can be decomposed into an infinite sum of SCGA updates (4.18) on Inline graphic or an arbitrary manifold as



See Fig. E12 for a graphical illustration. This equation is valid because parallel transports preserve inner products and are linear.

2. Under conditions (A1–2), we know that



for some constant Inline graphic, where we leverage the fact that the vector field



with Inline graphic and Inline graphic has its variation Inline graphic bounded by



according to the Davis–Kahan theorem for any Inline graphic. However, we are not sure whether the self-contractedness condition can also be adapted to the directional or general manifold cases, given that the arguments in Theorem 15 of [49] are based on Euclidean geometry.

Fig. E12.



Decomposition of the vector Inline graphic within the tangent space Inline graphic into the summation Inline graphic of parallel transported SCGA iterative vectors. Here, the blue curves on Inline graphic are iterative paths of the SCGA algorithm, while the green vectors are tangent vectors Inline graphic after being parallel transported to Inline graphic.

F. Other Technical Concepts of Differential Geometry on Inline graphic

  • Taylor’s Theorem on Inline graphic. Given a smooth function Inline graphic on Inline graphic, its Taylor expansion is often written as [85]:

graphic file with name DmEquation187.gif (F1)

for any Inline graphic, where Inline graphic is the exponential map at Inline graphic. One may replace the exponential map with a more general concept called the retractions on an arbitrary manifold; see Section 4.1 and Proposition 5.5.5 in [1].
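On the unit sphere, the exponential map has the closed form exp_x(v) = cos(||v||) x + sin(||v||) v/||v||, which makes the first-order term of (F1) easy to check numerically; a sketch with a test function f(x) = x_3 of our own choosing:

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the great circle
    from x with initial velocity v (v assumed tangent at x)."""
    t = np.linalg.norm(v)
    if t < 1e-15:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (v / t)

# First-order Taylor check for f(x) = x[2], whose Riemannian gradient
# is the tangential projection (I - x x^T) e3 of the Euclidean gradient.
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.0, 1.0])        # unit tangent vector at x
rgrad = (np.eye(3) - np.outer(x, x)) @ np.array([0.0, 0.0, 1.0])

t = 1e-2
lhs = sphere_exp(x, t * v)[2]        # f(exp_x(t v)) = sin(t)
rhs = x[2] + t * rgrad @ v           # zeroth- plus first-order terms = t
```

The discrepancy |lhs - rhs| is of order t^3, consistent with the second-order remainder in (F1).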

  • Parallel Transport. When comparing vectors in two different tangent spaces Inline graphic on Inline graphic, we leverage the notion of parallel transport  Inline graphic to transport vectors from one tangent space to another along a geodesic. In addition, Inline graphic is a tangent vector in Inline graphic after being parallel transported from Inline graphic along a geodesic (or great circle) on Inline graphic. The parallel transport mapping Inline graphic is a linear isometry along any smooth curve on Inline graphic, i.e. Inline graphic for any Inline graphic; see Proposition 5.5 in [69] or Proposition 1 in Section 4-4 of [37].
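One standard closed form of this parallel transport along the minimizing great circle from p to q (valid when p != -q) is v -> v - (<q, v>/(1 + <p, q>)) (p + q); the sketch below checks that it maps T_p to T_q isometrically:

```python
import numpy as np

def parallel_transport(p, q, v):
    """Parallel transport of a tangent vector v from T_p to T_q on the
    unit sphere, along the minimizing great circle (requires p != -q)."""
    c = p @ q
    return v - ((q @ v) / (1.0 + c)) * (p + q)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
v = np.array([0.0, 0.7, 0.3])        # tangent at p, since v . p = 0
w = parallel_transport(p, q, v)
```

The transported vector w is tangent at q and has the same norm as v, matching the linear-isometry property cited above.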

  • Sectional Curvature. Sectional curvature is the Gaussian curvature of a two-dimensional submanifold formed as the image of a two-dimensional subspace of a tangent space after exponential mapping; see Section 3-2 in [37] for detailed discussions about the Gaussian curvature. It is known that a two-dimensional submanifold with positive, zero or negative sectional curvature is locally isometric to a two-dimensional sphere, a Euclidean plane or a hyperbolic plane with the same Gaussian curvature [116].

  • Geodesically Strong Concavity. A function Inline graphic is said to be geodesically concave if for any Inline graphic, it holds that
    graphic file with name DmEquation188.gif
    for any Inline graphic, where Inline graphic is a geodesic with Inline graphic and Inline graphic. When Inline graphic is differentiable, an equivalent statement of the geodesic concavity is that (Theorem 11.17 in [15])
    graphic file with name DmEquation189.gif
    A function Inline graphic is said to be geodesically Inline graphic-strongly concave if for any Inline graphic, it holds that
    graphic file with name DmEquation190.gif

G. Normal Space of Directional Density Ridge

Recall that we extend the directional density Inline graphic from its support Inline graphic to Inline graphic by defining Inline graphic for all Inline graphic. As we will refer to conditions (A1–3) frequently in the next three sections, we restate them here:

  • (A1) (Differentiability) Under the extension (4.1) of the directional density Inline graphic, we assume that the total gradient Inline graphic, total Hessian matrix Inline graphic and third-order derivative tensor Inline graphic in Inline graphic exist, and are continuous on Inline graphic and square integrable on Inline graphic. We also assume that Inline graphic has bounded fourth-order derivatives on Inline graphic.

  • (A2) (Eigengap) We assume that there exist constants Inline graphic and Inline graphic such that Inline graphic and Inline graphic for any Inline graphic.

  • (A3) (Path Smoothness) Under the same Inline graphic in (A2), we assume that there exists another constant Inline graphic such that
    graphic file with name DmEquation191.gif
    for all Inline graphic and Inline graphic.

Recall that an order-Inline graphic density ridge of a directional density Inline graphic on Inline graphic is the set of points defined as

graphic file with name DmEquation192.gif (G1)

Lemma G.1 below shows that under conditions (A1–3), the Jacobian matrices Inline graphic and Inline graphic (i.e. projecting the columns of Inline graphic onto the tangent space Inline graphic) both have rank Inline graphic at every point on Inline graphic, and Inline graphic will be a Inline graphic-dimensional submanifold on Inline graphic by the implicit function theorem [68, 92]. Analogous to the discussion about the normal space of a Euclidean density ridge in Appendix C, we define

graphic file with name DmEquation193.gif

In contrast to the Euclidean density ridge case, it is the column space of

graphic file with name DmEquation194.gif (G2)

that spans the normal space of Inline graphic within the ambient space Inline graphic. It can be seen from our Remark 4.1 that the rows of

graphic file with name DmEquation195.gif

span the normal space of the solution manifold Inline graphic; see also Lemma 1 in [28]. Consequently, the column space of Inline graphic spans the normal space of Inline graphic within the tangent space Inline graphic at each Inline graphic. The technique on pages 60–63 of [39] remains valid to argue that

graphic file with name DmEquation196.gif (G3)

for Inline graphic, where we use the fact that Inline graphic on Inline graphic under the extension of Inline graphic as in (A1). Let

graphic file with name DmEquation197.gif

for Inline graphic. Then,

graphic file with name DmEquation198.gif (G4)

As in the Euclidean data case, the columns of Inline graphic are not orthonormal, and we again leverage the orthonormalization technique in [22] to construct Inline graphic that shares the same column space with Inline graphic but has orthonormal columns. That is, under the condition that Inline graphic has full column rank Inline graphic at every point Inline graphic (see Lemma G.1),

graphic file with name DmEquation199.gif (G5)

with the Cholesky decomposition Inline graphic, where Inline graphic is a lower triangular matrix whose diagonal elements are positive. Finally, the non-uniqueness of Inline graphic will not affect our subsequent discussions about the properties of directional density ridges.
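The Cholesky-based orthonormalization described above can be sketched directly: with the Gram matrix factored as L L^T, the matrix V = Vtilde L^{-T} has orthonormal columns and the same column space (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# A full-column-rank matrix playing the role of Vtilde: its columns span
# the normal space but are not orthonormal.
Vtilde = rng.standard_normal((6, 3))

# Orthonormalize while preserving the column space via the Cholesky factor
# of the Gram matrix: Vtilde^T Vtilde = L L^T, then V = Vtilde L^{-T}.
L = np.linalg.cholesky(Vtilde.T @ Vtilde)
V = np.linalg.solve(L, Vtilde.T).T      # equals Vtilde @ inv(L).T

# Both matrices project onto the same subspace.
P1 = Vtilde @ np.linalg.pinv(Vtilde)    # projection onto col(Vtilde)
P2 = V @ V.T                            # projection onto col(V)
```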

Lemma G.1.

Assume conditions (A1–3). Given that Inline graphic, Inline graphic and Inline graphic are defined in (G4) and (G5), we have the following properties:

  • (a) Inline graphic and Inline graphic have the same column space. In addition,
    graphic file with name DmEquation200.gif
    That is, Inline graphic is the projection matrix onto the columns of Inline graphic.
  • (b) The columns of Inline graphic are orthonormal to each other.

  • (c) For Inline graphic, the column space of Inline graphic is normal to the (tangent) direction of Inline graphic at Inline graphic.

  • (d) For Inline graphic, the smallest eigenvalue Inline graphic, and
    graphic file with name DmEquation201.gif
    Moreover, all the nonzero singular values of Inline graphic are greater than Inline graphic, and
    graphic file with name DmEquation202.gif
    Therefore, Inline graphic is a Inline graphic-dimensional submanifold that contains neither intersections nor endpoints on Inline graphic. Namely, Inline graphic is a finite union of connected and compact submanifolds on Inline graphic.
  • (e) For all Inline graphic,
    graphic file with name DmEquation203.gif
  • (f) When Inline graphic is sufficiently small and Inline graphic,
    graphic file with name DmEquation204.gif
    for some constant Inline graphic.
  • (g) Assume that another directional density function Inline graphic also satisfies conditions (A1–3) after the extension Inline graphic in Inline graphic, and Inline graphic is sufficiently small. Then,
    graphic file with name DmEquation205.gif
    for some constant Inline graphic and any Inline graphic, where Inline graphic is the matrix defined in (G5) with directional density Inline graphic.
  • (h) The reach of Inline graphic satisfies
    graphic file with name DmEquation206.gif
    for some constant Inline graphic.

This lemma is a direct extension of Lemma C.1 to the directional data scenario; thus, its proof is similar to the proof of Lemma C.1.

Proof of Lemma G.1.

The proofs of properties (a), (b) and (c) can be inherited from the corresponding ones in Lemma C.1 with mild modifications, and we thus omit them.

(d) We will prove that the Inline graphic nonzero singular values of Inline graphic and Inline graphic are bounded away from 0. Recall that

graphic file with name DmEquation207.gif

with

graphic file with name DmEquation208.gif

for Inline graphic. Under condition (A2),

graphic file with name DmEquation209.gif

This shows that all the singular values of Inline graphic, or simply Inline graphic, are less than Inline graphic. Moreover, under condition (A2) again, all the Inline graphic singular values of

graphic file with name DmEquation210.gif

are greater than Inline graphic.

By Theorem 3.3.16 in [57], we know that all the singular values of Inline graphic and Inline graphic are greater than

graphic file with name DmEquation211.gif

where Inline graphic denotes the singular values of a matrix Inline graphic in descending order. Therefore, the minimum eigenvalue of Inline graphic satisfies

graphic file with name DmEquation212.gif (G6)
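The multiplicative lower bound on the singular values of a product, used to reach (G6), can be illustrated numerically for square matrices, where sigma_min(AB) >= sigma_min(A) sigma_min(B) follows from ||ABx|| >= sigma_min(A) ||Bx||:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigma_min(M):
    """Smallest singular value of M."""
    return np.linalg.svd(M, compute_uv=False)[-1]

# For square A and B: ||ABx|| >= sigma_min(A) ||Bx|| >= sigma_min(A) sigma_min(B) ||x||,
# so the smallest singular value of the product is bounded below by the
# product of the smallest singular values.
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
ok = sigma_min(A @ B) >= sigma_min(A) * sigma_min(B) - 1e-12
```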

Now, given Inline graphic and Inline graphic, we know that

graphic file with name DmEquation213.gif

If we denote the orthonormal eigenvectors of Inline graphic by Inline graphic, then

graphic file with name DmEquation214.gif

are the orthonormal eigenvectors of Inline graphic, whose eigenvalues are thus lower bounded by Inline graphic due to (G6). Hence, Inline graphic.

By the implicit function theorem and the extra constraint Inline graphic, Inline graphic is a Inline graphic-dimensional submanifold on Inline graphic. It also implies that Inline graphic cannot have intersections, because otherwise the intersected points would violate the rank condition. Finally, we argue by contradiction that Inline graphic has no endpoints. Assume, on the contrary, that Inline graphic has an endpoint Inline graphic. Our preceding argument has shown that Inline graphic, the derivative of Inline graphic, is bounded. In addition, Inline graphic. However, this contradicts the implicit function theorem, which indicates that Inline graphic is a Inline graphic-dimensional submanifold on Inline graphic, because at the endpoint Inline graphic there exists no local coordinate chart for Inline graphic defined on an open set in Inline graphic. The results follow.

(e) By the proof of (d), we already know that all the Inline graphic nonzero singular values of Inline graphic and Inline graphic are greater than Inline graphic. Also, all the Inline graphic nonzero singular values of Inline graphic are greater than Inline graphic. Thus, the results follow easily from the argument of (e) in Lemma C.1.

Finally, the proofs of properties (f), (g) and (h) are essentially the same as those of the corresponding claims in [22], and we thus omit them. For (h), the reader should be aware that we have extended the directional density Inline graphic from Inline graphic to Inline graphic. In addition, it is the columns of Inline graphic that span the normal space of Inline graphic in the ambient space, whose nonzero singular values are lower bounded by Inline graphic. The proof of (h) can also be found in Theorem 3 of [28].

H. Stability of Directional Density Ridge

H.1 Subspace Constrained Gradient Flows

This subsection is modified from Section 4 in [45] for directional densities and their ridges on Inline graphic. A map Inline graphic is a subspace constrained gradient flow with the principal Riemannian gradient Inline graphic if Inline graphic and

graphic file with name DmEquation215.gif (H1)

where the last equality follows from (4.4). Given the definition of the directional density ridge Inline graphic in (G1), it consists of the destinations of the subspace constrained gradient flow Inline graphic, i.e. Inline graphic if Inline graphic for some Inline graphic satisfying (H1). It will be convenient to parametrize the SCGA path with Inline graphic by arc length. Let Inline graphic be the arc length from Inline graphic to Inline graphic:

graphic file with name DmEquation216.gif
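For a discretized flow, the arc length s(t) can be approximated by accumulating great-circle distances between consecutive points; a sketch on a sampled equatorial arc (our own example):

```python
import numpy as np

# Discrete points along the equator of the unit sphere, between angles 0 and 1 radian.
thetas = np.linspace(0.0, 1.0, 201)
pts = np.stack([np.cos(thetas), np.sin(thetas), np.zeros_like(thetas)], axis=1)

# Cumulative geodesic arc length s: sum of great-circle distances
# arccos(<x_j, x_{j+1}>) between consecutive points.
dots = np.clip(np.sum(pts[:-1] * pts[1:], axis=1), -1.0, 1.0)
s = np.concatenate([[0.0], np.cumsum(np.arccos(dots))])
```

By construction s is non-decreasing, and its total value recovers the true arc length (1 radian) of the sampled curve.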

Denote the inverse of Inline graphic by Inline graphic. Note that

graphic file with name DmEquation217.gif

With Inline graphic, we have that

graphic file with name DmEquation218.gif (H2)

which is a reparametrization of (H1) by arc length. Note that Inline graphic always lies on Inline graphic because its velocity is within the tangent space Inline graphic for every Inline graphic. Lemma 2 in [45] justifies the uniqueness of Inline graphic passing through any particular point Inline graphic under conditions (A1–3). The (reversed) subspace constrained gradient flow Inline graphic can be lifted onto the directional function Inline graphic, as we may define

graphic file with name DmEquation219.gif (H3)

Sometimes, we may add the subscript Inline graphic to the curves Inline graphic if we want to emphasize that Inline graphic start from or pass through the specific point Inline graphic.

To analyze the behavior of the subspace constrained gradient flow Inline graphic lifted on Inline graphic, we need the derivative of the projection matrix Inline graphic along the path Inline graphic. Recall that Inline graphic. The collection Inline graphic defines a matrix field: there is a matrix Inline graphic attached to each point Inline graphic. As mentioned earlier, there is a unique path Inline graphic and unique Inline graphic such that Inline graphic for any Inline graphic. Define

graphic file with name DmEquation220.gif (H4)

where Inline graphic with Inline graphic being the Riemannian connection on Inline graphic. Under conditions (A1–3), Inline graphic has a quadratic-like behavior near the directional ridge Inline graphic, analogous to Lemma 3 in [45].

Lemma H.1.

Assume that conditions (A1–3) hold. For all Inline graphic, we have the following properties:

  • (a) Inline graphic, Inline graphic, and Inline graphic. Thus, Inline graphic is non-decreasing in Inline graphic.

  • (b) The second derivative of Inline graphic satisfies Inline graphic.

  • (c) Inline graphic.

Proof of Lemma H.1.

The proof is adapted from Lemma 3 in [45].

(a) The first property Inline graphic is obvious from the definition (H3). Then,

graphic file with name DmEquation221.gif

since Inline graphic for all Inline graphic. By the definition of Inline graphic in (G1), Inline graphic when Inline graphic. Thus, Inline graphic and Inline graphic is non-decreasing in Inline graphic.

(b) Note that

graphic file with name DmEquation222.gif

Differentiating both sides of the equation, we have that

graphic file with name DmEquation223.gif

Since Inline graphic (idempotent), we have that Inline graphic, and hence the second term on the right-hand side of the above equation becomes

graphic file with name DmEquation224.gif

Thus,

graphic file with name DmEquation225.gif

By (a) and (H2), we conclude that

graphic file with name DmEquation226.gif (H5)

Now, we will bound the two terms in (H5), respectively. As for the first term Inline graphic, we notice that Inline graphic is in the column space of Inline graphic. Hence,

graphic file with name DmEquation227.gif

where Inline graphic. Therefore, from condition (A2),

graphic file with name DmEquation228.gif

and consequently,

graphic file with name DmEquation229.gif

As for the second term Inline graphic, we notice that Inline graphic, where Inline graphic, and Inline graphic. Then,

graphic file with name DmEquation230.gif

However, Inline graphic. To see this, note that Inline graphic and it implies that

graphic file with name DmEquation231.gif

showing that Inline graphic. To bound Inline graphic, we proceed as follows. As before, we let Inline graphic. Then, by the Davis–Kahan theorem (Lemma D.2 here),

graphic file with name DmEquation232.gif

Note that Inline graphic, because Inline graphic. Thus, from condition (A3),

graphic file with name DmEquation233.gif

Therefore, Inline graphic.

(c) For some Inline graphic,

graphic file with name DmEquation234.gif

by (a) and (b). As Inline graphic is parametrized by arc length, we conclude that

graphic file with name DmEquation235.gif

The result follows.

The statement (c) in Lemma H.1 is known as the quadratic growth condition in the optimization literature [4, 38]. Under conditions (A1–3), such quadratic growth of the subspace constrained gradient flow Inline graphic lifted onto the directional density Inline graphic enables us to quantify the stability of directional ridges under small perturbations of the directional density and to establish the linear convergence of the (directional) SCGA algorithms on Inline graphic.
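The quadratic growth condition can be visualized in a toy Euclidean example (not the directional setting of the lemma): for the density-like function f(x, y) = exp(−y²), whose order-1 ridge is the x-axis, the density gap between a point and its ridge projection dominates a constant multiple of the squared distance to the ridge. A minimal numerical sketch, with the constant 1/2 specific to this toy function:

```python
import numpy as np

# Toy density-like function on R^2 whose order-1 ridge is the x-axis {y = 0}.
def f(x, y):
    return np.exp(-y ** 2)

# Quadratic growth: the density gap between a point and its ridge projection
# dominates a constant times the squared distance to the ridge.
ys = np.linspace(-0.8, 0.8, 161)
gaps = f(0.0, 0.0) - f(0.0, ys)   # f(projection) - f(point), projection = (x, 0)
dist2 = ys ** 2                   # squared Euclidean distance to the ridge
ok = bool(np.all(gaps >= 0.5 * dist2 - 1e-12))
```

Here 1 − exp(−y²) ≥ y²/2 holds on the chosen window, mirroring the lower bound in statement (c).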

H.2 Proof of Theorem 4.1

We now show that if two directional densities Inline graphic and Inline graphic are close, then their corresponding ridges Inline graphic and Inline graphic are also close. We will use, for instance, Inline graphic and Inline graphic to refer to the principal (Riemannian) gradient and the projection matrix whose columns are the eigenvectors corresponding to the smallest Inline graphic eigenvalues of the (Riemannian) Hessian Inline graphic, with the tangent space of Inline graphic defined by Inline graphic.

Theorem H.1.

Suppose that conditions (A1–3) hold for the directional density Inline graphic and that condition (A1) holds for Inline graphic. When Inline graphic is sufficiently small,

  • (a) conditions (A2–3) hold for Inline graphic.

  • (b) Inline graphic.

  • (c) Inline graphic for a constant Inline graphic.

Proof of Theorem 4.1.

Our arguments are modified from the proof of Theorem 4 in [45] as well as Proposition 4 and Theorem 5 in [28].

(a) We write the spectral decompositions of Inline graphic and Inline graphic as

graphic file with name DmEquation236.gif

By Weyl’s Theorem (Theorem 4.3.1 in [58]), we know that

graphic file with name DmEquation237.gif

where we recall that there are at most Inline graphic nonzero eigenvalues of the Riemannian Hessian Inline graphic on Inline graphic. Thus, Inline graphic satisfies condition (A2). Moreover, since condition (A3) depends only on the first- and third-order derivatives of Inline graphic, it holds for Inline graphic when Inline graphic is small enough.
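Weyl's theorem, as used above, states that each eigenvalue of a symmetric matrix moves by at most the spectral norm of a symmetric perturbation. A quick NumPy check on arbitrary symmetric matrices (stand-ins for the Riemannian Hessians in the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); H = (A + A.T) / 2        # symmetric "Hessian"
B = rng.standard_normal((5, 5)); E = 0.1 * (B + B.T) / 2  # symmetric perturbation

lam = np.sort(np.linalg.eigvalsh(H))
lamp = np.sort(np.linalg.eigvalsh(H + E))

# Weyl: |lambda_i(H + E) - lambda_i(H)| <= ||E||_2 for every i.
weyl_gap = np.max(np.abs(lamp - lam))
op_norm = np.linalg.norm(E, 2)
```

The bound holds index by index, which is exactly how it controls the eigenvalues of the estimated Hessian above.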

(b) We present two methods, based on two different flows, to prove this statement and comment on their pros and cons in Remark H.1. Method A: By the Davis–Kahan theorem (Lemma D.2 and (D7)),

graphic file with name DmEquation238.gif

for any Inline graphic. Then, given that Inline graphic,

graphic file with name DmEquation239.gif

Therefore, by the differentiability of Inline graphic from (A1) and the compactness of Inline graphic, we obtain from the above calculations that

graphic file with name DmEquation240.gif

for some constant Inline graphic that only depends on the dimension Inline graphic.

Now, let Inline graphic. Then, Inline graphic, and Inline graphic. Let Inline graphic be the SCGA flow through Inline graphic as defined in Section H.1 so that Inline graphic for some Inline graphic. Note that Inline graphic. From property (a) of Lemma H.1, we have that Inline graphic. Moreover, by Taylor’s theorem,

graphic file with name DmEquation241.gif

for some Inline graphic between Inline graphic and Inline graphic. Since Inline graphic, from property (b) of Lemma H.1,

graphic file with name DmEquation242.gif

and consequently, Inline graphic, where Inline graphic denotes the geodesic distance between Inline graphic and Inline graphic on Inline graphic. Therefore,

graphic file with name DmEquation243.gif

Now let Inline graphic. The same argument shows that Inline graphic for some constant Inline graphic because conditions (A1–3) hold for Inline graphic.

As a result, Inline graphic.

Method B: Since we are only required to bound the maximum Euclidean distance between Inline graphic and Inline graphic, i.e. Inline graphic, we may view Inline graphic and Inline graphic as solution manifolds in Inline graphic and tentatively ignore the manifold constraint Inline graphic. Define Inline graphic. Given that Inline graphic, the gradient of Inline graphic,

graphic file with name DmEquation244.gif (H6)

is a vector in Inline graphic. Let Inline graphic. We define a flow Inline graphic such that

graphic file with name DmEquation245.gif

It can be argued by Theorem 7 in [28] that Inline graphic when Inline graphic for some small Inline graphic. In addition, we can always choose Inline graphic to be small enough so that Inline graphic. By Theorem 3.39 in [59], Inline graphic is uniquely defined because the gradient Inline graphic is well defined for all Inline graphic. We can also reparametrize Inline graphic by arc length as

graphic file with name DmEquation246.gif

Let Inline graphic be the terminal time/arc-length point and Inline graphic be the destination of Inline graphic on Inline graphic. The above argument also demonstrates that the flows Inline graphic or Inline graphic converge to the manifold Inline graphic from the normal direction of Inline graphic, because we can write

graphic file with name DmEquation247.gif

and the column space of Inline graphic spans the normal space of Inline graphic at Inline graphic. The goal now is to bound Inline graphic because its length must be greater than or equal to Inline graphic. We then define Inline graphic. Differentiating Inline graphic with respect to Inline graphic leads to

graphic file with name DmEquation248.gif (H7)

by (d) in Lemma G.1. (Note that Inline graphic because Inline graphic and by the continuity of Inline graphic, we can always choose Inline graphic such that Inline graphic for all Inline graphic.) As Inline graphic, by the proof of Method A, we know that

graphic file with name DmEquation249.gif

where Inline graphic is some value between Inline graphic and Inline graphic. Hence, Inline graphic, which is independent of Inline graphic. This implies that

graphic file with name DmEquation250.gif

We can exchange the role of Inline graphic and Inline graphic and apply the same argument to show that

graphic file with name DmEquation251.gif

In total, this leads to the conclusion that Inline graphic.

(c) By (h) in Lemma G.1, the reach of Inline graphic has a lower bound Inline graphic. Note that Inline graphic and Inline graphic depend on the derivatives of Inline graphic up to the third order. Thus, the lower bound for the reach of Inline graphic will be identical to the one for Inline graphic with an error rate Inline graphic.

Note that for the stability of directional ridges, one can relax condition (A1) by requiring Inline graphic to be Inline graphic-Hölder with Inline graphic.
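The flavor of this stability statement is easiest to see in one dimension, where an order-0 ridge is just a mode: perturbing a density by a small smooth term moves its maximizer by an amount of the same order. A toy illustration via grid search (the function and the linear perturbation are hypothetical stand-ins, not the directional estimator):

```python
import numpy as np

eps = 0.01
xs = np.linspace(-1.0, 1.0, 200001)

f = np.exp(-xs ** 2 / 2)        # unperturbed density: unique mode at 0
fp = f + eps * xs               # perturbed density; sup-norm gap <= eps on [-1, 1]

mode_f = xs[np.argmax(f)]
mode_fp = xs[np.argmax(fp)]
shift = abs(mode_fp - mode_f)   # stays O(eps), mirroring part (b) of the theorem
```

Solving the perturbed first-order condition gives a maximizer near eps, so the mode moves by roughly the size of the perturbation.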

Remark H.1.

We apply two different methods to establish the stability theorem of directional density ridges. Method A utilizes the subspace constrained gradient flow constructed in Section H.1 and its quadratic behavior (Lemma H.1), while Method B defines a normal flow to the ridge Inline graphic induced by the column space of Inline graphic. Each of these two flows has its pros and cons. The subspace constrained gradient flow aligns more coherently with our directional SCMS algorithm (Algorithm 2) to identify the (estimated) directional ridge from data, because it relies only on the first- and second-order derivatives of the (estimated) density Inline graphic. Nevertheless, the subspace constrained gradient flow does not necessarily converge to Inline graphic in the optimal direction, that is, the normal direction to Inline graphic. This can be seen from the explicit formula (G4) of Inline graphic, which spans the normal space of Inline graphic. The normal flow defined in Method B, however, converges to Inline graphic in its normal direction by construction. In general, the normal flow tends to the ridge Inline graphic faster than the subspace constrained gradient flow, but it may be complicated to compute in any practical ridge-finding task due to its involvement with third-order derivatives of the (estimated) density Inline graphic. Recently, [90] presented explicit formulae for finding density ridges via such a normal flow and its discrete gradient descent approximation. Additionally, they defined a smoothed version of the ridgeness function that also circumvents the computations of third-order derivatives of Inline graphic.

I. Proofs of Proposition 4.1, Proposition 4.2 and Theorem 4.2

Proposition I.1.

Assume that the directional kernel Inline graphic is non-increasing, twice continuously differentiable and convex with Inline graphic. Given the directional KDE Inline graphic and the directional SCMS sequence Inline graphic defined by (4.13) or (4.14), the following properties hold:

  • (a) The estimated density sequence Inline graphic is non-decreasing and thus converges.

  • (b) Inline graphic.

  • (c) If the kernel Inline graphic is also strictly decreasing on Inline graphic, then Inline graphic.

Proof of Proposition 4.1.

(a) The sequence Inline graphic is bounded if the kernel Inline graphic is non-increasing with Inline graphic. Hence, it suffices to show that it is non-decreasing. The convexity and differentiability of kernel Inline graphic imply that

graphic file with name DmEquation253.gif (I1)

for all Inline graphic. Then, with Inline graphic and the iterative formula (4.14) in the main paper, we derive that

graphic file with name DmEquation254.gif

where we use the orthogonality between Inline graphic and Inline graphic in (i), multiply Inline graphic into both the numerators and denominators of the two summands to obtain (ii), leverage the fact that Inline graphic in (iii), and use the inequality Inline graphic in (iv). This completes the proof of (a).

(b) Our derivation in (a) already shows that

graphic file with name DmEquation255.gif

Notice that, on the one hand, the differentiability of kernel Inline graphic and the compactness of Inline graphic imply that Inline graphic for all Inline graphic, where Inline graphic only depends on the bandwidth Inline graphic and kernel Inline graphic. On the other hand, our argument in (a) already proves the convergence of Inline graphic. Therefore,

graphic file with name DmEquation256.gif

as Inline graphic. The result follows.

(c) Given the iterative formula (4.14) in the main paper, we deduce that

graphic file with name DmEquation257.gif

where we leverage the orthogonality between Inline graphic and Inline graphic to obtain (i) and (ii). Under the assumption that the kernel Inline graphic is strictly decreasing and (twice) continuously differentiable, we know that Inline graphic is bounded away from 0 on Inline graphic. Therefore, with the result in (b), the above calculation indicates that

graphic file with name DmEquation258.gif

as Inline graphic. The result follows.

Remark I.1.

The conditions imposed on the kernel Inline graphic in Proposition 4.1 are satisfied by some commonly used kernels, such as the von Mises kernel Inline graphic. However, they can be further relaxed. On the one hand, it is sufficient to assume that the kernel Inline graphic is twice continuously differentiable except for finitely many points on Inline graphic. On the other hand, as long as the kernel Inline graphic satisfies Inline graphic and the true directional density Inline graphic is positive almost everywhere on Inline graphic, Lemma 4.1 demonstrates that Inline graphic with probability tending to 1 when Inline graphic and Inline graphic. Therefore, our upper bound on Inline graphic in the proof of (c) will be asymptotically valid for all Inline graphic, even without the strictly decreasing property of the kernel Inline graphic. Under such relaxation, our conclusions in Proposition 4.1 are applicable to directional SCMS algorithms with other kernels that have bounded supports on Inline graphic.
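As a concrete illustration of property (a), a bare-bones directional mean shift ascent with the von Mises kernel (the mode-seeking core of Algorithm 2, omitting the subspace-constraint projection of the full SCMS update (4.14)) can be simulated on toy data; the sample points and bandwidth below are illustrative only:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Illustrative unit vectors on S^2 clustered around the north pole.
raw = np.array([[0.1, 0.0, 1.0], [0.0, 0.2, 1.0],
                [-0.1, 0.1, 1.0], [0.05, -0.15, 1.0]])
X = np.array([normalize(v) for v in raw])
h = 0.5

def kde(x):
    # Directional KDE with the von Mises kernel, up to a normalizing constant.
    return np.mean(np.exp(X @ x / h ** 2))

def mean_shift_step(x):
    # Kernel-weighted average of the data, re-projected onto the sphere.
    w = np.exp(X @ x / h ** 2)
    return normalize(w @ X)

x = normalize(np.array([0.5, 0.5, 0.7]))
densities = [kde(x)]
for _ in range(50):
    x = mean_shift_step(x)
    densities.append(kde(x))

# Property (a): the estimated density is non-decreasing along the iterates.
monotone = all(b >= a - 1e-12 for a, b in zip(densities, densities[1:]))
```

On this toy sample, the iterates ascend the estimated density monotonically and settle near the common mode at the north pole.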

Proposition I.2. Convergence of the SCGA Algorithm on Inline graphic.

For any SCGA sequence Inline graphic defined by (4.18) with Inline graphic, the following properties hold:

  • (a) Under condition (A1), the objective function sequence Inline graphic is non-decreasing and thus converges.

  • (b) Under condition (A1), Inline graphic.

  • (c) Under conditions (A1–3), Inline graphic whenever Inline graphic with the convergence radius Inline graphic satisfying
    graphic file with name DmEquation259.gif
    where Inline graphic is a constant defined in (h) of Lemma G.1 and Inline graphic is a quantity depending on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic.

Proof of Proposition 4.2.

The proof is similar to our arguments in Proposition 3.1. For completeness, we still delineate the detailed steps, because the proof requires some nontrivial techniques, such as parallel transports and line integrals, on general Riemannian manifolds.

(a) We first derive the following property of the objective function Inline graphic supported on Inline graphic, which is a counterpart of Fact 1 in the proof of Proposition 3.1.

Inline graphic  Property 1. Given (A1), the function Inline graphic is Inline graphic-smooth on Inline graphic, that is, Inline graphic is Inline graphic-Lipschitz. This property follows easily from the differentiability of Inline graphic guaranteed by condition (A1) and Theorem 4.34 in [69] that

graphic file with name DmEquation260.gif (I2)

for any Inline graphic, where Inline graphic lies on the geodesic curve Inline graphic with Inline graphic and Inline graphic. Then,

graphic file with name DmEquation261.gif (I3)

where the equality (i) follows from the fundamental theorem for line integrals (Theorem 11.39 in [68]), equality (ii) utilizes the isometric property of parallel transports and inequality (iv) follows from (I2). Moreover, since the velocity of the geodesic Inline graphic is always constant, we deduce that Inline graphic and the equality (iii) follows. We will make use of the following direction of the inequality (I3):

graphic file with name DmEquation262.gif (I4)

Moreover, when Inline graphic,

graphic file with name DmEquation263.gif

showing that the objective function Inline graphic is non-decreasing along the SCGA path Inline graphic on Inline graphic. Given the compactness of Inline graphic and the differentiability of Inline graphic, we know that the sequence Inline graphic is bounded. Thus, it converges.

(b) From (a), we know that when Inline graphic,

graphic file with name DmEquation264.gif

Since the sequence Inline graphic converges, it follows that

graphic file with name DmEquation265.gif

Recall from (2.5) that Inline graphic, so Inline graphic as well.

(c) Given condition (A2) and the fact that Inline graphic, we know that

graphic file with name DmEquation266.gif

Let Inline graphic be the projection of Inline graphic in the SCGA sequence onto the directional ridge Inline graphic. Since Inline graphic by (h) of Lemma G.1, Inline graphic is well defined when Inline graphic. Recall from (G3) that the column space of

graphic file with name DmEquation267.gif

coincides with the normal space of Inline graphic within the tangent space Inline graphic. We define a geodesic Inline graphic with Inline graphic and calculate that

graphic file with name DmEquation268.gif

where we utilize the isometric properties of parallel transports in (i), note that the velocity of geodesic is constant, i.e. Inline graphic for any Inline graphic to obtain (ii), leverage (d) of Lemma G.1 to deduce (iii) and use the fact that Inline graphic when Inline graphic in the inequality (iv). In particular for the inequality (iii), Inline graphic is a full column rank matrix and Inline graphic lies within the column space of Inline graphic. Since the nonzero singular values of Inline graphic are lower bounded by Inline graphic, it follows that

graphic file with name DmEquation269.gif

In addition, we also know that Inline graphic comes from the supremum norm of Inline graphic over the geodesic connecting Inline graphic and Inline graphic with Inline graphic being the Riemannian connection, which in turn depends on the uniform functional norm Inline graphic of the partial derivatives of Inline graphic up to the fourth order. By (b), we deduce that

graphic file with name DmEquation270.gif

The results follow.

The nonzero curvature of the unit hypersphere Inline graphic, on which the objective function (or density) Inline graphic is defined, induces an extra challenge in establishing the linear convergence of the population and sample-based SCGA algorithms. Some useful techniques for analyzing the non-asymptotic convergence of first-order methods in Inline graphic, such as the law of cosines and linearizations of the objective function, fail on Inline graphic [116]. Therefore, we first introduce a practical trigonometric distance bound for Alexandrov spaces [18] with sectional curvature bounded from below.

Lemma I.1. Lemma 5 in [116]; see also [14].

If $a$, $b$, $c$ are the sides (i.e. side lengths) of a geodesic triangle in an Alexandrov space with sectional curvature lower bounded by $\kappa$, and $A$ is the angle between sides $b$ and $c$, then


$$a^2 \;\le\; \frac{\sqrt{|\kappa|}\, c}{\tanh\big(\sqrt{|\kappa|}\, c\big)}\, b^2 \;+\; c^2 \;-\; 2bc\cos(A). \qquad \text{(I5)}$$

A proof sketch of Lemma I.1 can be found in Lemma 5 of [116]. Note that the sectional curvature Inline graphic on Inline graphic. We adopt the notation of [116] and denote Inline graphic by Inline graphic, the curvature-dependent quantity in the inequality (I5). By differentiating Inline graphic with respect to Inline graphic, one can show that Inline graphic is strictly increasing and greater than 1 for any Inline graphic and fixed Inline graphic. With Lemma I.1 in hand, we can state a straightforward corollary establishing an important relation between two consecutive points in the SCGA sequence Inline graphic on Inline graphic defined by (4.18):

graphic file with name DmEquation272.gif (I6)

Corollary I.1.

For any point Inline graphic in a geodesically convex set on Inline graphic, the update in (I6) satisfies



where Inline graphic is the geodesic distance between Inline graphic and Inline graphic on Inline graphic.

Proof of Corollary I.1.

Recall that the (population) SCGA iterative formula on Inline graphic is given by Inline graphic. Note that for the geodesic triangle Inline graphic with Inline graphic, we have that

graphic file with name DmEquation274.gif

and

graphic file with name DmEquation275.gif

By letting Inline graphic and Inline graphic in Lemma I.1, we obtain that

graphic file with name DmEquation276.gif

Some rearrangements will yield the final display.

Note that Inline graphic in our conditions (A2–3) is a geodesically convex set: the minimal geodesic between any two points in the set Inline graphic always lies within the set. Hence, Corollary I.1 is applicable to the SCGA algorithm of interest when initialized within Inline graphic.
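Both the stated properties of the curvature-dependent quantity ζ(κ, c) = √|κ| c / tanh(√|κ| c) (the closed form follows [116]; the symbol names here are our own) and the comparison inequality (I5) itself can be spot-checked numerically on the unit sphere, whose sectional curvature equals 1:

```python
import numpy as np

def zeta(kappa, c):
    # Curvature-dependent quantity from [116]; zeta -> 1 as c -> 0.
    s = np.sqrt(abs(kappa)) * c
    return s / np.tanh(s)

# zeta(1, c) is strictly increasing in c and greater than 1.
cs = np.linspace(0.01, 3.0, 300)
vals = zeta(1.0, cs)
increasing = bool(np.all(np.diff(vals) > 0))
above_one = bool(np.all(vals > 1.0))

def geodesic(u, v):
    return np.arccos(np.clip(u @ v, -1.0, 1.0))

def angle_at(p, q, r):
    # Angle at vertex p of the geodesic triangle (p, q, r) on the unit sphere.
    tq = q - (p @ q) * p
    tr = r - (p @ r) * p
    cosang = tq @ tr / (np.linalg.norm(tq) * np.linalg.norm(tr))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

rng = np.random.default_rng(1)
comparison_holds = True
for _ in range(200):
    p, q, r = (v / np.linalg.norm(v) for v in rng.standard_normal((3, 3)))
    a, b, c = geodesic(q, r), geodesic(p, r), geodesic(p, q)
    A = angle_at(p, q, r)
    # Trigonometric distance bound (I5) with curvature lower bound kappa = 1.
    if a ** 2 > zeta(1.0, c) * b ** 2 + c ** 2 - 2 * b * c * np.cos(A) + 1e-9:
        comparison_holds = False
```

Since the sphere is nonnegatively curved, the Euclidean law-of-cosines comparison already holds, and ζ ≥ 1 only loosens the right-hand side, so the bound is comfortably satisfied on random geodesic triangles.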

Theorem I.1. Linear Convergence of the SCGA Algorithm on Inline graphic.

Assume conditions (A1–4) throughout the theorem.

  • (a) Q-Linear convergence of Inline graphic: Consider a convergence radius Inline graphic satisfying
    graphic file with name DmEquation277.gif
    where Inline graphic is the constant defined in (h) of Lemma G.1 and Inline graphic is a quantity defined in (c) of Proposition 4.2 that depends on both the dimension Inline graphic and the functional norm Inline graphic up to the fourth-order (partial) derivatives of Inline graphic. Whenever Inline graphic and the initial point Inline graphic with Inline graphic, we have that
    graphic file with name DmEquation278.gif
  • (b) R-Linear convergence of Inline graphic: Under the same radius Inline graphic in (a), we have that whenever Inline graphic and the initial point Inline graphic with Inline graphic,
    graphic file with name DmEquation279.gif

We further assume (D1–2) in the remaining statements. Suppose that Inline graphic and Inline graphic.

  • (c) Q-Linear convergence of Inline graphic: Under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation280.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.
  • (d) R-Linear convergence of Inline graphic: Under the same radius Inline graphic and Inline graphic in (a), we have that
    graphic file with name DmEquation281.gif
    with probability tending to 1 whenever Inline graphic and the initial point Inline graphic with Inline graphic.

Proof of Theorem 4.2.

The proof is similar to our argument in Theorem 3.1, except that the objective function Inline graphic is supported on the nonlinear manifold Inline graphic here. The key arguments rely on Corollary I.1. We first recall the following two properties.

Inline graphic  Property 1. Given (A1), the function Inline graphic is Inline graphic-smooth on Inline graphic, that is, Inline graphic is Inline graphic-Lipschitz.

Inline graphic  Property 2. Given conditions (A1–3), we know that Inline graphic and

graphic file with name DmEquation282.gif

for any Inline graphic with Inline graphic.

Property 1 has been established in the proof of Proposition 4.2, indicating that the objective function sequence Inline graphic is non-decreasing when Inline graphic. Property 2 is a natural corollary of Proposition 4.2, because Inline graphic and

graphic file with name DmEquation283.gif

is the objective function value after one-step SCGA iteration on Inline graphic with step size Inline graphic. The iteration will move Inline graphic closer to the directional ridge Inline graphic. With the help of these two properties, we start the proofs of (a–d).

(a) We first prove the following claim using Lemma E.1: for all Inline graphic and Inline graphic,

graphic file with name DmEquation284.gif (I7)

where Inline graphic. By the differentiability of Inline graphic ensured by condition (A1) and Taylor’s theorem on Inline graphic, we deduce that

graphic file with name iaac005fx3.jpg

where we leverage the equality Inline graphic in (i) and (iii), use conditions (A2) and (A4) that Inline graphic and Inline graphic in (ii), apply the quadratic bound for Inline graphic in condition (A4) to obtain (iv), and leverage the facts that Inline graphic and Inline graphic when Inline graphic in (v); recall (2.5). Our claim (I7) is thus proved.

In addition, given Property 2 and any Inline graphic, we derive that

graphic file with name DmEquation286.gif

where we apply (I3) to obtain the inequality. This indicates that

graphic file with name DmEquation287.gif (I8)

for any Inline graphic. Therefore, by Corollary I.1, we obtain that

graphic file with name DmEquation288.gif

 

graphic file with name DmEquation288a.gif

whenever Inline graphic, where we utilize Corollary I.1 and the monotonicity of Inline graphic with respect to Inline graphic in (i), apply (I7) and (I8) to obtain (ii), and use the choice of Inline graphic to argue that

graphic file with name DmEquation289.gif

in (iii). By telescoping, we conclude that when Inline graphic and Inline graphic,

graphic file with name DmEquation290.gif

The result follows.

(b) The result follows directly from (a) and the fact that Inline graphic for all Inline graphic.

(c) The proof is logically similar to the proof of (c) in Theorem 3.1. We write the spectral decompositions of Inline graphic and Inline graphic as

graphic file with name DmEquation291.gif

By Weyl’s theorem (Theorem 4.3.1 in [58]) and uniform bounds (4.6),

graphic file with name DmEquation292.gif

Thus, Inline graphic will satisfy condition (A2) with high probability when Inline graphic is sufficiently small and Inline graphic is sufficiently large. According to the Davis–Kahan theorem (Lemma D.2 here), the uniform bounds (4.6) and the continuity of exponential maps, we have that

graphic file with name DmEquation293.gif

for any Inline graphic, where we utilize the Davis–Kahan theorem and Inline graphic in (i). Hence, when Inline graphic and Inline graphic,

graphic file with name DmEquation294.gif (I9)

with probability tending to 1.

We now claim that Inline graphic and

graphic file with name DmEquation295.gif (I10)

for all Inline graphic. We again prove this claim by induction on the iteration number. Note that when Inline graphic, we derive that

graphic file with name DmEquation296.gif

where we apply the triangle inequality in (i) and leverage the result in (a) and (I9) to obtain (ii). The triangle inequality is valid in this context because the geodesic distance measures the minimal distance between two points on Inline graphic. Moreover, by the choice of Inline graphic and (I9), we are sure that Inline graphic. For the induction step from Inline graphic, we suppose that Inline graphic and that the claim (I10) holds at iteration Inline graphic. The same argument then implies that the claim (I10) holds at iteration Inline graphic and that Inline graphic. The claim (I10) is thus verified.

Now, given that Inline graphic, we iterate the claim (I10) to show that

graphic file with name DmEquation297.gif

where the fourth inequality follows by summing the geometric series, and the last equality is due to our notation Inline graphic. This completes the proof.

(d) The result follows directly from (c) and the inequality Inline graphic for all Inline graphic.
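The Q-linear contraction in (a) can be reproduced in miniature by plain Riemannian gradient ascent (without the subspace constraint) for the toy objective f(x) = μᵀx on the unit sphere, whose unique maximizer is μ; the step size, starting point and contraction factor 0.8 below are illustrative choices, not the quantities of the theorem:

```python
import numpy as np

mu = np.array([0.0, 0.0, 1.0])  # maximizer of f(x) = mu @ x on the sphere

def riem_grad(x):
    # Riemannian gradient of f(x) = mu @ x: project mu onto the tangent space at x.
    return mu - (mu @ x) * x

def step(x, eta=0.5):
    # Gradient ascent step with the projection retraction back onto the sphere.
    y = x + eta * riem_grad(x)
    return y / np.linalg.norm(y)

def geodesic_dist(u, v):
    return np.arccos(np.clip(u @ v, -1.0, 1.0))

x = np.array([np.sin(0.3), 0.0, np.cos(0.3)])  # start at geodesic distance 0.3
dists = [geodesic_dist(x, mu)]
for _ in range(20):
    x = step(x)
    dists.append(geodesic_dist(x, mu))

# Each iteration contracts the geodesic distance to the maximizer by a fixed
# factor, the signature of Q-linear convergence.
ratios = [b / a for a, b in zip(dists, dists[1:]) if a > 1e-12]
q_linear = all(r <= 0.8 for r in ratios)
```

For this objective the per-step contraction factor approaches 1 − η near the maximizer, so the distances shrink geometrically, matching the telescoping argument in the proof of (a).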

Contributor Information

Yikun Zhang, Department of Statistics, University of Washington, Seattle, WA 98195, USA.

Yen-Chi Chen, Department of Statistics, University of Washington, Seattle, WA 98195, USA.

References

  • 1. Absil, P.-A., Mahony, R. & Sepulchre, R. (2008) Optimization Algorithms on Matrix Manifolds. Princeton, NJ: Princeton University Press. [Google Scholar]
  • 2. Absil, P. A., Mahony, R. & Trumpf, J. (2013) An extrinsic look at the riemannian hessian. Geometric Science of Information. ( F.  Nielsen & F.  Barbaresco eds). Berlin Heidelberg: Springer, pp. 361–368. [Google Scholar]
  • 3. Aliyari Ghassabeh, Y. (2015) A sufficient condition for the convergence of the mean shift algorithm with gaussian kernel. J. Multivariate Anal., 135, 1–10. [Google Scholar]
  • 4. Anitescu, M. (2000) Degenerate nonlinear programming with a quadratic growth condition. SIAM J. Optim., 10, 1116–1135. [Google Scholar]
  • 5. Argus, D. F., Gordon, R. G. & DeMets, C. (2011) Geologically current motion of 56 plates relative to the no-net-rotation reference frame. Geochemistry, Geophysics, Geosystems, 12. [Google Scholar]
  • 6. Arias-Castro, E., Mason, D. & Pelletier, B. (2016) On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. J. Mach. Learn. Res., 17, 1–28. [Google Scholar]
  • 7. Bai, Z., Rao, C. & Zhao, L. (1988) Kernel estimators of density function of directional data. J. Multivariate Anal., 27, 24–39. [Google Scholar]
  • 8. Balakrishnan, S., Wainwright, M. J. & Yu, B. (2017) Statistical guarantees for the em algorithm: From population to sample-based analysis. Ann. Statist., 45, 77–120. [Google Scholar]
  • 9. Banerjee, A., Dhillon, I. S., Ghosh, J. & Sra, S. (2005) Clustering on the unit hypersphere using von mises-fisher distributions. J. Mach. Learn. Res., 6, 1345–1382. [Google Scholar]
  • 10. Banyaga, A. & Hurtubise, D. (2004) Lectures on Morse Homology. Texts in the Mathematical Sciences. Netherlands: Springer. [Google Scholar]
  • 11. Beck, A. & Tetruashvili, L. (2013) On the convergence of block coordinate descent type methods. SIAM J. Optim., 23, 2037–2060. [Google Scholar]
  • 12. Beran, R. (1979) Exponential models for directional data. Ann. Statist., 7, 1162–1178. [Google Scholar]
  • 13. Bird, P. (2003) An updated digital model of plate boundaries. Geochemistry, Geophysics, Geosystems, 4. [Google Scholar]
  • 14. Bonnabel, S. (2013) Stochastic gradient descent on riemannian manifolds. IEEE Trans. Automat. Control, 58, 2217–2229. [Google Scholar]
  • 15. Boumal, N. (2020) An introduction to optimization on smooth manifolds. Available online, Aug.. [Google Scholar]
  • 16. Bowman, A. W. (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353–360. [Google Scholar]
  • 17. Bubeck, S. (2015) Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn., 8, 231–357. [Google Scholar]
  • 18. Burago, Y., Gromov, M. & Perel’man, G. (1992) A.d. alexandrov spaces with curvature bounded below. Russian Math. Surveys, 47, 1–58. [Google Scholar]
  • 19. Carreira-Perpiñán, M. Á. (2007) Gaussian mean-shift is an em algorithm. IEEE Trans. Pattern Anal. Mach. Intell., 29, 767–776. [DOI] [PubMed] [Google Scholar]
  • 20. Chacón, E. J., Duong, T. & Wand, P. M. (2011) Asymptotics for general multivariate kernel density derivative estimators. Statist. Sinica, 21, 807. [Google Scholar]
  • 21. Charles, Z. & Papailiopoulos, D. (2018) Stability and generalization of learning algorithms that converge to global optima. International Conference on Machine Learning. PMLR, PMLR, pp. 745–754.
  • 22. Chen, Y.-C., Genovese, C. R. & Wasserman, L. (2015a) Asymptotic theory for density ridges. Ann. Statist., 43, 1896–1928. [Google Scholar]
  • 23. Chen, Y.-C., Ho, S., Freeman, P. E., Genovese, C. R. & Wasserman, L. (2015b) Cosmic web reconstruction through density ridges: method and algorithm. Monthly Notices of the Royal Astronomical Society, 454, 1140–1156. [Google Scholar]
  • 24. Chen, Y.-C., Genovese, C. R., Ho, S. & Wasserman, L. (2015c) Optimal ridge detection using coverage risk. Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc.
  • 25. Chen, Y.-C., Genovese, C. R. & Wasserman, L. (2016a) A comprehensive approach to mode clustering. Electron. J. Stat., 10, 210–241.
  • 26. Chen, Y.-C., Ho, S., Brinkmann, J., Freeman, P. E., Genovese, C. R., Schneider, D. P. & Wasserman, L. (2016b) Cosmic web reconstruction through density ridges: catalogue. Monthly Notices of the Royal Astronomical Society, 461, 3896–3909.
  • 27. Chen, Y.-C. (2017) A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology, 1, 161–187.
  • 28. Chen, Y.-C. (2022) Solution manifold and its statistical applications. Electron. J. Stat., 16, 408–450.
  • 29. Cheng, Y. (1995) Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell., 17, 790–799.
  • 30. Chrisman, N. R. (2017) Calculating on a round planet. International Journal of Geographical Information Science, 31, 637–657.
  • 31. Comaniciu, D. & Meer, P. (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24, 603–619.
  • 32. Cuevas, A. (2009) Set estimation: another bridge between statistics and geometry. Bol. Estad. Investig. Oper., 25, 71–85.
  • 33. Damon, J. (1999) Properties of ridges and cores for two-dimensional images. J. Math. Imaging Vis., 10, 163–174.
  • 34. Daniilidis, A., Ley, O. & Sabourau, S. (2010) Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. J. Math. Pures Appl., 94, 183–199.
  • 35. Daniilidis, A., David, G., Durand-Cartagena, E. & Lemenant, A. (2015) Rectifiability of self-contracted curves in the Euclidean space and applications. J. Geom. Anal., 25, 1211–1239.
  • 36. Davis, C. & Kahan, W. M. (1970) The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7, 1–46.
  • 37. do Carmo, M. (2016) Differential Geometry of Curves and Surfaces: Revised and Updated Second Edition. Dover Books on Mathematics. Dover Publications.
  • 38. Drusvyatskiy, D. & Lewis, A. S. (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res., 43, 919–948.
  • 39. Eberly, D. (1996) Ridges in Image and Data Analysis. Computational Imaging and Vision. Springer Netherlands.
  • 40. Einmahl, U. & Mason, D. M. (2005) Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist., 33, 1380–1403.
  • 41. Fazel, M., Ge, R., Kakade, S. & Mesbahi, M. (2018) Global convergence of policy gradient methods for the linear quadratic regulator. International Conference on Machine Learning. PMLR, pp. 1467–1476.
  • 42. Federer, H. (1959) Curvature measures. Trans. Amer. Math. Soc., 93, 418–491.
  • 43. García-Portugués, E. (2013) Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electron. J. Stat., 7, 1655–1685.
  • 44. García-Portugués, E., Crujeiras, R. M. & González-Manteiga, W. (2013) Kernel density estimation for directional-linear data. J. Multivariate Anal., 121, 152–175.
  • 45. Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. & Wasserman, L. (2014) Nonparametric ridge estimation. Ann. Statist., 42, 1511–1545.
  • 46. Ghassabeh, Y. A., Linder, T. & Takahara, G. (2013) On some convergence properties of the subspace constrained mean shift. Pattern Recognition, 46, 3140–3147.
  • 47. Ghassabeh, Y. A. & Rudzicz, F. (2020) Modified subspace constrained mean shift algorithm. J. Classification, 1–17.
  • 48. Giné, E. & Guillou, A. (2002) Rates of strong uniform consistency for multivariate kernel density estimators. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 38, 907–921.
  • 49. Gupta, C., Balakrishnan, S. & Ramdas, A. (2021) Path length bounds for gradient descent and flow. J. Mach. Learn. Res., 22, 1–63.
  • 50. Hall, P. (1983) Large sample optimality of least squares cross-validation in density estimation. Ann. Statist., 1156–1174.
  • 51. Hall, P., Watson, G. S. & Cabrera, J. (1987) Kernel density estimation with spherical data. Biometrika, 74, 751–762.
  • 52. Hall, P., Qian, W. & Titterington, D. M. (1992) Ridge finding from noisy data. J. Comput. Graph. Statist., 1, 197–211.
  • 53. Hall, P., Peng, L. & Rau, C. (2001) Local likelihood tracking of fault lines and boundaries. J. R. Stat. Soc. Ser. B Stat. Methodol., 63, 569–582.
  • 54. Harris, R. A. (2017) Large earthquakes and creeping faults. Reviews of Geophysics, 55, 169–198.
  • 55. Hastie, T. & Stuetzle, W. (1989) Principal curves. J. Amer. Statist. Assoc., 84, 502–516.
  • 56. Hauberg, S. (2015) Principal curves on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell., 38, 1915–1921.
  • 57. Horn, R. A. & Johnson, C. R. (1991) Topics in Matrix Analysis. Cambridge Univ. Press.
  • 58. Horn, R. A. & Johnson, C. R. (2012) Matrix Analysis, 2nd edn. Cambridge Univ. Press.
  • 59. Irwin, M. C. (2001) Smooth Dynamical Systems, vol. 17. World Scientific.
  • 60. Izenman, A. J. (2012) Introduction to manifold learning. Wiley Interdiscip. Rev. Comput. Stat., 4, 439–446.
  • 61. Jones, M. C., Marron, J. S. & Sheather, S. J. (1996) A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc., 91, 401–407.
  • 62. Kafai, M., Miao, Y. & Okada, K. (2010) Directional mean shift and its application for topology classification of local 3D structures. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, pp. 170–177.
  • 63. Karimi, H., Nutini, J. & Schmidt, M. (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. Machine Learning and Knowledge Discovery in Databases. Cham: Springer International Publishing, pp. 795–811.
  • 64. Klemelä, J. (2000) Estimation of densities and derivatives of densities with directional data. J. Multivariate Anal., 73, 18–40.
  • 65. Kobayashi, T. & Otsu, N. (2010) Von Mises–Fisher mean shift for clustering on a hypersphere. 20th International Conference on Pattern Recognition. IEEE, pp. 2130–2133.
  • 66. Kozak, D., Becker, S., Doostan, A. & Tenorio, L. (2019) Stochastic subspace descent. arXiv preprint arXiv:1904.01145.
  • 67. Kozak, D., Becker, S., Doostan, A. & Tenorio, L. (2020) A stochastic subspace approach to gradient-free optimization in high dimensions. arXiv preprint arXiv:2003.02684.
  • 68. Lee, J. (2012) Introduction to Smooth Manifolds. Graduate Texts in Mathematics, 2nd edn. Springer.
  • 69. Lee, J. M. (2018) Introduction to Riemannian Manifolds. Springer.
  • 70. Ley, C. & Verdebout, T. (2017) Modern Directional Statistics. CRC Press.
  • 71. Li, X., Hu, Z. & Wu, F. (2007) A note on the convergence of the mean shift. Pattern Recognition, 40, 1756–1762.
  • 72. Łojasiewicz, S. (1963) A topological property of real analytic subsets. Coll. du CNRS. Les équations aux dérivées partielles, 117, 87–89.
  • 73. Luo, Z.-Q. & Tseng, P. (1992) On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7–35.
  • 74. Mardia, K. & Jupp, P. (2000) Directional Statistics. Wiley Series in Probability and Statistics. Wiley.
  • 75. Marzio, M. D., Panzera, A. & Taylor, C. C. (2011) Kernel density estimation on the torus. J. Statist. Plann. Inference, 141, 2156–2173.
  • 76. Necoara, I., Nesterov, Y. & Glineur, F. (2019) Linear convergence of first order methods for non-strongly convex optimization. Math. Programming, 175, 69–107.
  • 77. Nesterov, Y., et al. (2018) Lectures on Convex Optimization, vol. 137. Springer.
  • 78. Nocedal, J. & Wright, S. J. (2006) Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. New York: Springer.
  • 79. Norgard, G. & Bremer, P.-T. (2012) Second derivative ridges are straight lines and the implications for computing Lagrangian coherent structures. Phys. D, 241, 1475–1476.
  • 80. Oba, S., Kato, K. & Ishii, S. (2005) Multi-scale clustering for gene expression profiling data. Proceedings of Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05). IEEE, pp. 210–217.
  • 81. Ok, E. A. (2007) Real Analysis with Economic Applications, vol. 10. Princeton University Press.
  • 82. Oliveira, M., Crujeiras, R. M. & Rodríguez-Casal, A. (2012) A plug-in rule for bandwidth selection in circular density estimation. Comput. Stat. Data Anal., 56, 3898–3908.
  • 83. Ozertem, U. & Erdogmus, D. (2011) Locally defined principal curves and surfaces. J. Mach. Learn. Res., 12, 1249–1286.
  • 84. Peikert, R., Günther, D. & Weinkauf, T. (2013) Comment on "Second derivative ridges are straight lines and the implications for computing Lagrangian coherent structures, Physica D 2012.05.006". Phys. D, 242, 65–66.
  • 85. Pennec, X. (2006) Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. J. Math. Imaging Vision, 25, 127–154.
  • 86. Pewsey, A. & García-Portugués, E. (2021) Recent advances in directional statistics. Test, 1–58.
  • 87. Polyak, B. (1963) Gradient methods for the minimisation of functionals. Comput. Math. Math. Phys., 3, 864–878.
  • 88. Qiao, W. (2021) Asymptotic confidence regions for density ridges. Bernoulli, 27, 946–975.
  • 89. Qiao, W. & Polonik, W. (2016) Theoretical analysis of nonparametric filament estimation. Ann. Statist., 44, 1269–1297.
  • 90. Qiao, W. & Polonik, W. (2021) Algorithms for ridge estimation with convergence guarantees. arXiv preprint arXiv:2104.12314.
  • 91. Rudemo, M. (1982) Empirical choice of histograms and kernel density estimators. Scand. J. Statist., 65–78.
  • 92. Rudin, W. (1976) Principles of Mathematical Analysis, 3rd edn. McGraw-Hill New York.
  • 93. Saavedra-Nieves, P. & Crujeiras, R. M. (2020) Nonparametric estimation of directional highest density regions. arXiv preprint arXiv:2009.08915.
  • 94. Saragih, J. M., Lucey, S. & Cohn, J. F. (2009) Face alignment through subspace constrained mean-shifts. Proceedings of the IEEE 12th International Conference on Computer Vision. IEEE, pp. 1034–1041.
  • 95. Sasaki, H., Kanamori, T. & Sugiyama, M. (2017) Estimating density ridges by direct estimation of density-derivative-ratios. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54 (A. Singh & J. Zhu eds). Fort Lauderdale, FL, USA: PMLR, pp. 204–212.
  • 96. Scott, D. (2015) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley Series in Probability and Statistics. Wiley.
  • 97. Sheather, S. J. (2004) Density estimation. Statist. Sci., 19, 588–597.
  • 98. Sheather, S. J. & Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B Stat. Methodol., 53, 683–690.
  • 99. Silverman, B. W. (1986) Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
  • 100. Snyder, J., Voxland, P. & Geological Survey (U.S.) (1989) An Album of Map Projections. U.S. Government Printing Office.
  • 101. Sousbie, T., Pichon, C., Courtois, H., Colombi, S. & Novikov, D. (2007) The three-dimensional skeleton of the SDSS. The Astrophysical Journal, 672, L1–L4.
  • 102. Stone, C. J. (1984) An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist., 1285–1297.
  • 103. Subarya, C., Chlieh, M., Prawirodirdjo, L., Avouac, J.-P., Bock, Y., Sieh, K., Meltzner, A. J., Natawidjaja, D. H. & McCaffrey, R. (2006) Plate-boundary deformation associated with the great Sumatra–Andaman earthquake. Nature, 440, 46–51.
  • 104. Subbarao, R. & Meer, P. (2006) Nonlinear mean shift for clustering over analytic manifolds. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol. 1. IEEE, pp. 1168–1175.
  • 105. Subbarao, R. & Meer, P. (2009) Nonlinear mean shift over Riemannian manifolds. Int. J. Comput. Vis., 84, 1.
  • 106. Taylor, C. C. (2008) Automatic bandwidth selection for circular density estimation. Comput. Statist. Data Anal., 52, 3493–3500.
  • 107. van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge Univ. Press.
  • 108. van der Vaart, A. W. & Wellner, J. A. (1996) Weak Convergence and Empirical Processes: with Applications to Statistics. Springer Science & Business Media.
  • 109. von Luxburg, U. (2007) A tutorial on spectral clustering. Statist. Comput., 17, 395–416.
  • 110. Wasserman, L. (2006) All of Nonparametric Statistics (Springer Texts in Statistics). Berlin, Heidelberg: Springer-Verlag.
  • 111. Wasserman, L. (2018) Topological data analysis. Annu. Rev. Stat. Appl., 5, 501–532.
  • 112. Wright, S. J. (2015) Coordinate descent algorithms. Math. Programming, 151, 3–34.
  • 113. Yang, M.-S., Chang-Chien, S.-J. & Kuo, H.-C. (2014) On mean shift clustering for directional data on a hypersphere. Proceedings of Artificial Intelligence and Soft Computing. Cham: Springer International Publishing, pp. 809–818.
  • 114. You, S., Bas, E., Erdogmus, D. & Kalpathy-Cramer, J. (2011) Principal curve based retinal vessel segmentation towards diagnosis of retinal diseases. Proceedings of the IEEE First International Conference on Healthcare Informatics, Imaging and Systems Biology. IEEE, pp. 331–337.
  • 115. Yu, Y., Wang, T. & Samworth, R. J. (2014) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102, 315–323.
  • 116. Zhang, H. & Sra, S. (2016) First-order methods for geodesically convex optimization. Proceedings of the 29th Annual Conference on Learning Theory (V. Feldman, A. Rakhlin & O. Shamir eds). Proceedings of Machine Learning Research, vol. 49. New York, NY, USA: PMLR, pp. 1617–1638.
  • 117. Zhang, Y. & Chen, Y.-C. (2021a) The EM perspective of directional mean shift algorithm. arXiv preprint arXiv:2101.10058.
  • 118. Zhang, Y. & Chen, Y.-C. (2021b) Kernel smoothing, mean shift, and their learning theory with directional data. J. Mach. Learn. Res., 22, 1–92.
  • 119. Zhang, Y. & Chen, Y.-C. (2021c) Mode and ridge estimation in Euclidean and directional product spaces: a mean shift approach. arXiv preprint arXiv:2110.08505.
  • 120. Zhao, L. & Wu, C. (2001) Central limit theorem for integrated squared error of kernel estimators of spherical density. Sci. China Ser. A Math., 44, 474–483.

Associated Data


Supplementary Materials

EuDirSCMS-main_iaac005

Data Availability Statement

The data and code underlying this paper are available at https://github.com/zhangyk8/EuDirSCMS. Specifically, the earthquake data in Section 5.3 were obtained from the Earthquake Catalog (https://earthquake.usgs.gov/earthquakes/search/) of the United States Geological Survey.


Articles from Information and Inference are provided here courtesy of Oxford University Press
