2021 May 15;5(3):425–458. doi: 10.1007/s41468-021-00072-4

Improved approximate rips filtrations with shifted integer lattices and cubical complexes

Aruni Choudhary 1, Michael Kerber 2, Sharath Raghvendra 3
PMCID: PMC8549989  PMID: 34722862

Abstract

Rips complexes are important structures for analyzing topological features of metric spaces. Unfortunately, generating these complexes is expensive because of a combinatorial explosion in the complex size. For $n$ points in $\mathbb{R}^d$, we present a scheme to construct a 2-approximation of the filtration of the Rips complex in the $L_\infty$-norm, which extends to a $2d^{0.25}$-approximation in the Euclidean case. The $k$-skeleton of the resulting approximation has a total size of $n2^{O(d\log k+d)}$. The scheme is based on the integer lattice and simplicial complexes based on the barycentric subdivision of the $d$-cube. We extend our result to use cubical complexes in place of simplicial complexes by introducing cubical maps between complexes. We get the same approximation guarantee as the simplicial case, while reducing the total size of the approximation to only $n2^{O(d)}$ (cubical) cells. There are two novel techniques that we use in this paper. The first is the use of acyclic carriers for proving our approximation result. In our application, these are maps which relate the Rips complex and the approximation in a relatively simple manner and greatly reduce the complexity of showing the approximation guarantee. The second technique is what we refer to as scale balancing, which is a simple trick to improve the approximation ratio under certain conditions.

Keywords: Persistent homology, Rips filtrations, Approximation algorithms, Topological data analysis

Introduction

Context. Persistent homology (Carlsson 2009; Edelsbrunner and Harer 2010; Edelsbrunner et al. 2002) is a technique to analyze data sets using topological invariants. The idea is to build a multi-scale representation of data sets and to track its homological changes across the scales.

A standard construction for the important case of point clouds in Euclidean space is the Vietoris-Rips complex (usually abbreviated as simply the Rips complex): for a scale parameter $\alpha\ge 0$, it is the collection of all subsets of points with diameter at most $\alpha$. When $\alpha$ increases from 0 to $\infty$, the Rips complexes form a filtration, an increasing sequence of nested simplicial complexes whose homological changes can be computed and represented in terms of a barcode.

The computational drawback of Rips complexes is their sheer size: the $k$-skeleton of a Rips complex (that is, where only subsets of size at most $k+1$ are considered) for $n$ points consists of $\Theta(n^{k+1})$ simplices because every $(k+1)$-subset joins the complex for a sufficiently large scale parameter. This size bound makes barcode computations for large point clouds infeasible even for low-dimensional homological features (see footnote 1). This difficulty motivates the question of what we can say about the barcode of the Rips filtration without explicitly constructing all of its simplices.

We address this question using approximation techniques. The space of barcodes forms a metric space: two barcodes are close if similar homological features occur on roughly the same range of scales. More precisely, the bottleneck distance is used as a distance metric between barcodes. The first approximation scheme by Sheehy (2013) constructs a $(1+\varepsilon)$-approximation of the $k$-skeleton of the Rips filtration using only $n\left(\frac{1}{\varepsilon}\right)^{O(\lambda k)}$ simplices for arbitrary finite metric spaces, where $\lambda$ is the doubling dimension of the metric. Further approximation techniques for Rips complexes (Dey et al. 2014) and the closely related Čech complexes (Botnan and Spreemann 2015; Cavanna et al. 2015; Kerber and Sharathkumar 2013) have been derived subsequently, all with comparable size bounds. More recently, we constructed an approximation scheme (Choudhary et al. 2019) for the Čech filtrations of $n$ points in $\mathbb{R}^d$ that had size $n\left(\frac{1}{\varepsilon}\right)^{O(d)}2^{O(d\log d+dk)}$ for the $k$-skeleton, improving the size bound from previous work.

In Choudhary et al. (2017b), we constructed an approximation scheme for Rips filtration in Euclidean space that yields a worse approximation factor of $O(d)$, but uses only $n2^{O(d\log k+d)}$ simplices. In Choudhary et al. (2017b), we also show a lower bound result on the size of approximations: for any $\varepsilon<1/\log^{1+c}n$ with some constant $c\in(0,1)$, any $\varepsilon$-approximate filtration has size $n^{\Omega(\log\log n)}$.

There has also been work on using cubical complexes to compute persistent homology, such as in Wagner et al. (2012). Cubical complexes are typically smaller than their simplicial counterparts, simply because they avoid triangulations. However, to our knowledge, there has been no attempt to utilize them in computing approximations of filtrations. Also, while there are efficient methods to compute persistence for simplicial complexes connected with simplicial maps (Dey et al. 2014; Kerber and Schreiber 2017), we are not aware of such counterparts for cubical complexes.

Our contributions. For the Rips filtration of $n$ points in $\mathbb{R}^d$ with distances taken in the $L_\infty$-norm, we present a 2-approximation whose $k$-skeleton has size at most $n6^{d-1}(2k+4)(k+3)!\left\{{d \atop k+2}\right\}=n2^{O(d\log k+d)}$, where $\left\{{a \atop b}\right\}$ denotes Stirling numbers of the second kind. This translates to a $2d^{0.25}$-approximation of the Rips filtration in the Euclidean metric and hence improves the asymptotic approximation quality of our previous approach (Choudhary et al. 2017b) with the same size bound. Our scheme gives the best size guarantee over all previous approaches.

On a high level, our approach follows a straightforward approximation scheme: given a scaled and appropriately shifted integer grid on $\mathbb{R}^d$, we identify those grid points that are close to the input points and build an approximation complex using these grid points. The challenge lies in how to connect these grid points to a simplicial complex such that close-by grid points are connected, while avoiding too many connections to keep the size small. Our approach first selects a set of active faces in the cubical complex defined over the grid, and defines the approximation complex using the barycentric subdivision of this cubical complex.

We also describe an output-sensitive algorithm to compute our approximation. By randomizing the aforementioned shifts of the grids, we obtain a worst-case running time of $n2^{O(d)}\log\Delta+2^{O(d)}M$ in expectation, where $\Delta$ is the spread of the point set (that is, the ratio of the diameter to the closest distance of two points) and $M$ is the size of the approximation.

Additionally, this paper makes the following technical contributions:

  • We follow the standard approach of defining a sequence of approximation complexes and establishing an interleaving between the Rips filtration and the approximation. We realize our interleaving using chain maps connecting a Rips complex at scale $\alpha$ to an approximation complex at scale $c\alpha$, and vice versa, with $c\ge 1$ being the approximation factor. Previous approaches (Choudhary et al. 2017b; Dey et al. 2014; Sheehy 2013) used simplicial maps for the interleaving, which induce an elementary form of chain maps and are therefore more restrictive.

    The explicit construction of such maps can be a non-trivial task. The novelty of our approach is that we avoid this construction by the usage of acyclic carriers (Munkres 1984). In short, carriers are maps that assign subcomplexes to subcomplexes under some mild extra conditions. While they are more flexible, they still certify the existence of suitable chain maps, as we exemplify in Sect. 2. We believe that this technique is of general interest for the construction of approximations of cell complexes.

  • We exploit a simple trick that we call scale balancing to improve the quality of approximation schemes. In short, if the aforementioned interleaving maps from and to the Rips filtration do not increase the scale parameter by the same amount, one can simply multiply the scale parameter of the approximation by a constant. Concretely, given maps
    $\phi_\alpha:\mathcal{R}_\alpha\to\mathcal{X}_\alpha,\qquad \psi_\alpha:\mathcal{X}_\alpha\to\mathcal{R}_{c\alpha}$
    interleaving the Rips complex $\mathcal{R}_\alpha$ and the approximation complex $\mathcal{X}_\alpha$, we can define $\mathcal{X}'_\alpha:=\mathcal{X}_{\alpha/\sqrt{c}}$ and obtain maps
    $\phi'_\alpha:\mathcal{R}_\alpha\to\mathcal{X}'_{\sqrt{c}\alpha},\qquad \psi'_\alpha:\mathcal{X}'_\alpha\to\mathcal{R}_{\sqrt{c}\alpha}$
    which improves the interleaving from $c$ to $\sqrt{c}$. While it has been observed that the same trick can be used for improving the worst-case distance between Rips and Čech filtrations (see footnote 2), our work seems to be the first to make use of it in the context of approximations.
  • We extend our approximation scheme to use cubical complexes instead of simplicial complexes, thereby achieving a marked reduction in size complexity. To connect the cubical complexes at different scales, we introduce the notion of cubical maps, which is a simple extension of simplicial maps to the cubical case. While we do not know of an algorithm that can compute persistence for the case of cubical complexes with cubical maps, we believe that this is a first step towards advocating the use of cubical complexes as approximating structures.

Our technique can be combined with dimension reduction techniques in the same way as in Choudhary et al. (2017b) (see Theorems 19, 21, and 22 therein), with improved logarithmic factors. We state the main results in the paper, while omitting the technical details.

Updates from the conference version. An earlier version of this paper appeared at the 25th European Symposium on Algorithms (Choudhary et al. 2017a). In that version, we achieved a $3\sqrt{2}$-approximation of the $L_\infty$ Rips filtration and correspondingly, a $3\sqrt{2}d^{0.25}$-approximation of the $L_2$ case. In this version, we improve the weak interleaving of Choudhary et al. (2017a) to a strong interleaving to get improved approximation factors. We expand upon the details of scale balancing, among other proofs that were missing from the conference version. We add the case of cubical complexes in this version.

There is a subtle yet important distinction between the approximation complexes used in the conference version and the current result. In the conference version, our simplicial complex was built using only active faces, while the current version uses both active and secondary faces (please see Sect. 4 for definitions). This makes it easier to relate the simplicial and the cubical complexes in the current version. On the other hand the complexes are different, hence the associated proofs have been adapted accordingly.

Outline. We start by explaining the relevant topological concepts in Sect. 2. We give details of the integer grids that we use in Sect. 3. In Sect. 4 we present our approximation scheme that uses the barycentric subdivision, and present the computational aspects in Sect. 5. The extension to cubical complexes is presented in Sect. 6. We discuss practical aspects of our scheme and conclude in Sect. 7. Some details of the strong interleaving from Sect. 4 are deferred to Appendix A.

Preliminaries

We briefly review the essential topological concepts needed. More details are available in standard references (see Bubenik et al. 2015; Chazal et al. 2009; Edelsbrunner and Harer 2010; Hatcher 2002; Munkres 1984).

Simplicial complexes. A simplicial complex $K$ on a finite set of elements $S$ is a collection of subsets $\{\sigma\subseteq S\}$ called simplices such that each subset $\tau\subseteq\sigma$ is also in $K$. The dimension of a simplex $\sigma\in K$ is $k:=|\sigma|-1$, in which case $\sigma$ is called a $k$-simplex. A simplex $\tau$ is a sub-simplex of $\sigma$ if $\tau\subseteq\sigma$. We remark that a sub-simplex is commonly called a “face” of a simplex, but we reserve the word “face” for a different structure. For the same reason, we do not introduce the common notation of “vertices” and “edges” of simplicial complexes, but rather refer to 0- and 1-simplices throughout. The $k$-skeleton of $K$ consists of all simplices of $K$ whose dimension is at most $k$. For instance, the 1-skeleton of $K$ is a graph defined by its 0-simplices and 1-simplices.

Given a point set $P\subset\mathbb{R}^d$ and a real number $\alpha\ge 0$, the (Vietoris-)Rips complex on $P$ at scale $\alpha$ consists of all simplices $\sigma=(p_0,\ldots,p_k)\subseteq P$ such that $\mathrm{diam}(\sigma)\le\alpha$, where diam denotes the diameter. In this work, we write $\mathcal{R}_\alpha$ for the Rips complex at scale $2\alpha$ with the Euclidean metric, and $\mathcal{R}^\infty_\alpha$ when using the metric of the $L_\infty$-norm. In either way, a Rips complex is an example of a flag complex, which means that whenever a set $\{p_0,\ldots,p_k\}\subseteq P$ has the property that every 1-simplex $\{p_i,p_j\}$ is in the complex, then the $k$-simplex $\{p_0,\ldots,p_k\}$ is also in the complex.
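The definition can be made concrete with a brute-force enumeration. The following is a minimal sketch, not the construction studied in this paper: the function names `linf_dist` and `rips_skeleton` are hypothetical, the enumeration takes time exponential in $k$, and the `scale` argument is the diameter bound itself (so the complex written $\mathcal{R}^\infty_\alpha$ above would correspond to `scale = 2 * alpha`).

```python
from itertools import combinations

def linf_dist(p, q):
    """L-infinity distance between two points given as coordinate tuples."""
    return max(abs(a - b) for a, b in zip(p, q))

def rips_skeleton(points, scale, k, dist=linf_dist):
    """Return all simplices (as index tuples) of dimension <= k whose
    diameter under `dist` is at most `scale`."""
    n = len(points)
    simplices = []
    for size in range(1, k + 2):  # a j-simplex has j + 1 points
        for sigma in combinations(range(n), size):
            # Flag property: checking all pairs is enough.
            if all(dist(points[i], points[j]) <= scale
                   for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

points = [(0, 0), (1, 0), (0, 1), (3, 3)]
skeleton = rips_skeleton(points, scale=1, k=2)
```

In this small example the first three points are pairwise within $L_\infty$-distance 1, so they contribute three 1-simplices and one 2-simplex, while the fourth point stays isolated.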

A related complex is the Čech complex of P at scale α, which consists of simplices of P for which the radius of the minimum enclosing ball is at most α. We do not study Čech complexes in this paper, but we mention them briefly while showing a connection with the Rips complex later in this section.

A simplicial complex $K'$ is a subcomplex of $K$ if $K'\subseteq K$. For instance, $\mathcal{R}_\alpha$ is a subcomplex of $\mathcal{R}_{\alpha'}$ for $0\le\alpha\le\alpha'$. Let $L$ be a simplicial complex. Let $\hat\varphi$ be a map which assigns a vertex of $L$ to each vertex of $K$. A simplicial map is a map $\varphi:K\to L$ induced by a vertex map $\hat\varphi$, such that for every simplex $\{p_0,\ldots,p_k\}$ in $K$, the set $\{\hat\varphi(p_0),\ldots,\hat\varphi(p_k)\}$ is a simplex of $L$. For $K'$ a subcomplex of $K$, the inclusion map $\mathrm{inc}:K'\to K$ is an example of a simplicial map. A simplicial map is completely determined by its action on the 0-simplices of $K$.

Chain complexes. A chain complex $C_*=(C_p,\partial_p)$ with $p\in\mathbb{Z}$ is a collection of abelian groups $C_p$ and homomorphisms $\partial_p:C_p\to C_{p-1}$ such that $\partial_{p-1}\circ\partial_p=0$. A simplicial complex $K$ gives rise to a chain complex $C_*(K)$ for a fixed base field $\mathbb{F}$: define $C_p$ for $p\ge 0$ as the set of formal linear combinations of $p$-simplices in $K$ over $\mathbb{F}$, and $C_{-1}:=\mathbb{F}$. The boundary of a $k$-simplex with $k\ge 1$ is the (signed) sum of its sub-simplices of co-dimension one (see footnote 3); the boundary of a 0-simplex is simply set to 1. The homomorphisms $\partial_p$ are then defined as the linear extensions of this boundary operator. Note that $C_*(K)$ is sometimes called the augmented chain complex of $K$, where the augmentation refers to the addition of the non-trivial group $C_{-1}$.

A chain map $\phi:C_*\to D_*$ between chain complexes $C_*=(C_p,\partial_p)$ and $D_*=(D_p,\partial'_p)$ is a collection of group homomorphisms $\phi_p:C_p\to D_p$ such that $\phi_{p-1}\circ\partial_p=\partial'_p\circ\phi_p$. For simplicial complexes $K$ and $L$, we call a chain map $\phi:C_*(K)\to C_*(L)$ augmentation-preserving if $\phi_{-1}$ is the identity. A simplicial map $\varphi:K\to L$ between simplicial complexes induces an augmentation-preserving chain map $\bar\varphi:C_*(K)\to C_*(L)$ between the corresponding chain complexes. This construction is functorial, meaning that for $\varphi$ the identity function on a simplicial complex $K$, $\bar\varphi$ is the identity function on $C_*(K)$, and for composable simplicial maps $\varphi,\varphi'$, we have that $\overline{\varphi\circ\varphi'}=\bar\varphi\circ\bar\varphi'$.

Homology. The $p$-th homology group $H_p(C_*)$ of a chain complex is defined as $\ker\partial_p/\mathrm{im}\,\partial_{p+1}$. The $p$-th homology group of a simplicial complex $K$, $H_p(K)$, is the $p$-th homology group of its induced chain complex $C_*(K)$. Note that this definition is commonly referred to as reduced homology, but we ignore this distinction and consider reduced homology throughout. $H_p(C_*)$ is an $\mathbb{F}$-vector space because we have chosen our base ring $\mathbb{F}$ as a field. Intuitively, when the chain complex is generated from a simplicial complex, the dimension of the $p$-th homology group counts the number of $p$-dimensional holes in the complex. We write $H(C_*)$ for the direct sum of all $H_p(C_*)$ for $p\ge 0$.

A chain map $\phi:C_*\to D_*$ induces a linear map $\phi^*:H(C_*)\to H(D_*)$ between the homology groups. Again, this construction is functorial, meaning that it maps identity maps to identity maps, and it is compatible with compositions.

Acyclic carriers. We call a simplicial complex $K$ acyclic, if $K$ is connected and all homology groups $H_p(K)$ are trivial. For simplicial complexes $K$ and $L$, an acyclic carrier $\Phi$ is a map that assigns to each simplex $\sigma$ in $K$ a non-empty acyclic subcomplex $\Phi(\sigma)\subseteq L$, such that whenever $\tau$ is a sub-simplex of $\sigma$, then $\Phi(\tau)\subseteq\Phi(\sigma)$. We say that a chain $c\in C_p(K)$ is carried by a subcomplex $K'$, if $c$ takes value 0 except for $p$-simplices in $K'$. A chain map $\phi:C_*(K)\to C_*(L)$ is carried by $\Phi$, if for each simplex $\sigma\in K$, $\phi(\sigma)$ is carried by $\Phi(\sigma)$. We state the acyclic carrier theorem (Munkres 1984, Thm 13.3), adapted to our notation:

Theorem 1

Let $\Phi:K\to L$ be an acyclic carrier. Then,

  • There exists an augmentation-preserving chain map $\phi:C_*(K)\to C_*(L)$ carried by $\Phi$.

  • If two augmentation-preserving chain maps $\phi_1,\phi_2:C_*(K)\to C_*(L)$ are both carried by $\Phi$, then $\phi_1^*=\phi_2^*$ (see footnote 4).

We remark that “augmentation-preserving” is crucial in the statement: without it, the trivial chain map (that maps everything to 0) turns the first statement trivial and easily leads to a counter-example for the second claim.

Filtrations and towers. Let $I\subseteq\mathbb{R}$ be a set of real values which we refer to as scales. A filtration is a collection of simplicial complexes $(K_\alpha)_{\alpha\in I}$ such that $K_\alpha\subseteq K_{\alpha'}$ for all $\alpha\le\alpha'\in I$. For instance, $(\mathcal{R}_\alpha)_{\alpha\ge 0}$ is a filtration which we call the Rips filtration. A (simplicial) tower is a sequence $(K_\alpha)_{\alpha\in J}$ of simplicial complexes with $J$ being a discrete set (for instance $J=\{2^k\mid k\in\mathbb{Z}\}$), together with simplicial maps $\varphi_\alpha:K_\alpha\to K_{\alpha'}$ between complexes at consecutive scales. For instance, the Rips filtration can be turned into a tower by restricting to a discrete range of scales, and using the inclusion maps as $\varphi$. The approximation constructed in this paper will be another example of a tower.

We say that a simplex $\sigma$ is included in the tower at scale $\alpha$, if $\sigma$ is not in the image of the map $\varphi_{\alpha'}:K_{\alpha'}\to K_\alpha$, where $\alpha'$ is the scale preceding $\alpha$ in the tower. The size of a tower is the number of simplices included over all scales. If a tower arises from a filtration, its size is simply the size of the largest complex in the filtration (or infinite, if no such complex exists). However, this is not true in general for simplicial towers, because simplices can collapse in the tower and the size of the complex at a given scale may not take into account the collapsed simplices which were included at earlier scales in the tower.

Barcodes and Interleavings. A collection of vector spaces $(V_\alpha)_{\alpha\in I}$ connected with linear maps $\lambda_{\alpha_1,\alpha_2}:V_{\alpha_1}\to V_{\alpha_2}$ is called a persistence module, if $\lambda_{\alpha,\alpha}$ is the identity on $V_\alpha$ and $\lambda_{\alpha_2,\alpha_3}\circ\lambda_{\alpha_1,\alpha_2}=\lambda_{\alpha_1,\alpha_3}$ for all $\alpha_1\le\alpha_2\le\alpha_3\in I$ for the index set $I$.

We generate persistence modules using the previous concepts. Given a simplicial tower $(K_\alpha)_{\alpha\in I}$, we generate a sequence of chain complexes $(C_*(K_\alpha))_{\alpha\in I}$. By functoriality, the simplicial maps $\varphi$ of the tower give rise to chain maps $\bar\varphi$ between these chain complexes. Using functoriality of homology, we obtain a sequence $(H(K_\alpha))_{\alpha\in I}$ of vector spaces with linear maps $\bar\varphi^*$, forming a persistence module. The same construction applies to filtrations as a special case.

Persistence modules admit a decomposition into a collection of intervals of the form $[\alpha,\beta]$ (with $\alpha,\beta\in I$), called the barcode, subject to certain tameness conditions. The barcode of a persistence module characterizes the module uniquely up to isomorphism. If the persistence module is generated by a simplicial complex, an interval $[\alpha,\beta]$ in the barcode corresponds to a homological feature (a “hole”) that comes into existence at complex $K_\alpha$ and persists until it disappears at $K_\beta$.

Two persistence modules $(V_\alpha)_{\alpha\in I}$ and $(W_\alpha)_{\alpha\in I}$ with linear maps $\phi_{\cdot,\cdot}$ and $\psi_{\cdot,\cdot}$ are said to be weakly (multiplicatively) $c$-interleaved with $c\ge 1$, if there exist linear maps $\gamma_\alpha:V_\alpha\to W_{c\alpha}$ and $\delta_\alpha:W_\alpha\to V_{c\alpha}$, called interleaving maps, such that the diagram

[Diagram (1): commutative diagram of the weak interleaving; image not available]

commutes, that is, $\psi=\gamma\circ\delta$ and $\phi=\delta\circ\gamma$ for all $\{\ldots,\alpha/c^2,\alpha/c,\alpha,c\alpha,\ldots\}\subseteq I$ (we have skipped the subscripts of the maps for readability). In such a case, the barcodes of the two modules are $3c$-approximations of each other in the sense of Chazal et al. (2009). We say that two towers are $c$-approximations of each other if their persistence modules are $c$-approximations.

Under the more stringent conditions of strong interleaving, the approximation ratio can be improved. Two persistence modules $(V_\alpha)_{\alpha\ge 0}$ and $(W_\alpha)_{\alpha\ge 0}$ with respective linear maps $\phi_{\cdot,\cdot}$ and $\psi_{\cdot,\cdot}$ are said to be (multiplicatively) strongly $c$-interleaved if there exist a pair of families of linear maps $\gamma_\alpha:V_\alpha\to W_{c\alpha}$ and $\delta_\alpha:W_\alpha\to V_{c\alpha}$ for $c>0$, such that Diagram (2) commutes for all $0\le\alpha\le\alpha'$ (the subscripts of the maps are excluded for readability). In such a case, the persistence barcodes of the two modules are said to be $c$-approximations of each other in the sense of Chazal et al. (2009).

[Diagram (2): commutative diagrams of the strong interleaving; image not available]

Finally, we mention a special case that relates equivalent persistence modules (Carlsson and Zomorodian 2005; Goodman et al. 2017). Two persistence modules $\mathbb{V}=(V_\alpha)_{\alpha\in I}$ and $\mathbb{W}=(W_\alpha)_{\alpha\in I}$ that are connected through linear maps $\phi,\psi$ respectively are isomorphic if there exists an isomorphism $f_\alpha:V_\alpha\to W_\alpha$ for each $\alpha\in I$ for which the following diagram commutes for all $\alpha\le\beta\in I$:

[Diagram (3): commutative square relating $V_\alpha$, $V_\beta$, $W_\alpha$ and $W_\beta$; image not available]

Isomorphic persistence modules have identical persistence barcodes.

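Scale balancing. Let $\mathbb{V}=(V_\alpha)_{\alpha\in I}$ and $\mathbb{W}=(W_\alpha)_{\alpha\in I}$ be two persistence modules with linear maps $f_v,f_w$, respectively. Let there be linear maps $\phi:V_{\alpha/\varepsilon_1}\to W_\alpha$ and $\psi:W_\alpha\to V_{\alpha\varepsilon_2}$ for $1\le\varepsilon_1,\varepsilon_2$ such that all $\alpha,\alpha/\varepsilon_1,\alpha\varepsilon_2\in I$. Suppose that the following diagram commutes, for all $\alpha\in I$.

[Diagram (4): commutative diagram of the weak interleaving with factors $\varepsilon_1$ and $\varepsilon_2$; image not available]

Let $\varepsilon:=\max(\varepsilon_1,\varepsilon_2)$. Then, by replacing $\varepsilon_1,\varepsilon_2$ by $\varepsilon$ in Diagram (4), the diagram still commutes, so $\mathbb{V}$ is a $3\varepsilon$-approximation of $\mathbb{W}$.

We define a new vector space $V'_{c\alpha}:=V_\alpha$, where $c=\sqrt{\varepsilon_1/\varepsilon_2}$ and $c\alpha\in I$. This gives rise to a new persistence module, $\mathbb{V}'=(V'_{c\alpha})_{\alpha\in I}$. The maps $\phi$ and $\psi$ can then be interpreted as $\phi:V'_{\alpha/\sqrt{\varepsilon_1\varepsilon_2}}\to W_\alpha$, or $\phi:V'_\alpha\to W_{\alpha\sqrt{\varepsilon_1\varepsilon_2}}$, and $\psi:W_\alpha\to V'_{\alpha\sqrt{\varepsilon_1\varepsilon_2}}$. Then, Diagram (4) can be re-interpreted as

[Diagram (5): Diagram (4) with $\mathbb{V}$ replaced by $\mathbb{V}'$ and both factors equal to $\sqrt{\varepsilon_1\varepsilon_2}$; image not available]

which still commutes. Therefore, $\mathbb{V}'$ is a $3\sqrt{\varepsilon_1\varepsilon_2}$-approximation of $\mathbb{W}$, which is an improvement over $\mathbb{V}$, since $\sqrt{\varepsilon_1\varepsilon_2}\le\max(\varepsilon_1,\varepsilon_2)$. $\mathbb{V}'$ and $\mathbb{V}$ have the same barcode up to a scaling factor.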

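This scaling trick also works when $\mathbb{V}$ and $\mathbb{W}$ are strongly interleaved. If we have the following commutative diagrams (where we have skipped the maps for readability):

[Diagram (6): commutative diagrams of the strong interleaving with factors $\varepsilon_1$ and $\varepsilon_2$; image not available]

then $\mathbb{V}$ and $\mathbb{W}$ are $\max(\varepsilon_1,\varepsilon_2)$-approximations of each other. By defining $\mathbb{V}'$ as before, the following diagrams

[Diagram (7): Diagram (6) with $\mathbb{V}$ replaced by $\mathbb{V}'$; image not available]

commute for $d=c\varepsilon_2=\sqrt{\varepsilon_1\varepsilon_2}$, so we can improve a $\max(\varepsilon_1,\varepsilon_2)$-approximation to a $\sqrt{\varepsilon_1\varepsilon_2}$-approximation.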

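We end the section by discussing a basic but important relation between Čech and Rips filtrations. It is well-known that for any $\alpha\ge 0$, $\mathcal{C}_\alpha\subseteq\mathcal{R}_\alpha\subseteq\mathcal{C}_{\sqrt{2}\alpha}$ (Edelsbrunner and Harer 2010). This gives a strong interleaving between the towers $(\mathcal{C}_\alpha)_{\alpha\ge 0}$ and $(\mathcal{R}_\alpha)_{\alpha\ge 0}$ with $\varepsilon_1=1$ and $\varepsilon_2=\sqrt{2}$. Applying the scale balancing technique, we get: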

Lemma 1

The scaled Čech persistence module $(H(\mathcal{C}_{\sqrt[4]{2}\alpha}))_{\alpha\ge 0}$ and the Rips persistence module $(H(\mathcal{R}_\alpha))_{\alpha\ge 0}$ are $\sqrt[4]{2}$-approximations of each other.
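The constants in Lemma 1 can be traced through the scale balancing recipe. The following worked computation is a sketch under the assumption that the Čech module plays the role of $\mathbb{V}$ and the Rips module the role of $\mathbb{W}$, with the rescaling constant $c=\sqrt{\varepsilon_1/\varepsilon_2}$:

```latex
\begin{align*}
\varepsilon_1 &= 1, \qquad \varepsilon_2 = \sqrt{2}, \qquad
c = \sqrt{\varepsilon_1/\varepsilon_2} = 2^{-1/4},\\
V'_{c\alpha} &:= V_\alpha = H(\mathcal{C}_\alpha)
\;\Longrightarrow\;
V'_{\alpha} = H(\mathcal{C}_{\alpha/c}) = H(\mathcal{C}_{\sqrt[4]{2}\,\alpha}),\\
\sqrt{\varepsilon_1\varepsilon_2} &= \sqrt{\sqrt{2}} = \sqrt[4]{2} \approx 1.189
\;<\; \max(\varepsilon_1,\varepsilon_2) = \sqrt{2} \approx 1.414 .
\end{align*}
```

The rescaled module is thus indexed exactly as in the statement of the lemma, and the interleaving factor drops from $\sqrt{2}$ to $\sqrt[4]{2}$.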

Shifted integer lattices

In this section, we take a look at simple modifications of the integer lattice.

We denote by $I:=\{\alpha_s:=\lambda 2^s\mid s\in\mathbb{Z}\}$ with $\lambda>0$, a discrete set of scales. For each scale in $I$, we define grids which are scaled and translated (shifted) versions of the integer lattice.

Definition 1

(Scaled and shifted grids) For each scale $\alpha_s\in I$, we define the scaled and shifted grid $G_{\alpha_s}$ inductively as:

  • For $s=0$, $G_{\alpha_s}$ is simply the scaled integer grid $\lambda\mathbb{Z}^d$, where each basis vector has been scaled by $\lambda$.

  • For $s\ge 0$, we choose an arbitrary point $O_{\alpha_s}\in G_{\alpha_s}$ and define
    $G_{\alpha_{s+1}}=2(G_{\alpha_s}-O_{\alpha_s})+O_{\alpha_s}+\frac{\alpha_s}{2}(\pm 1,\ldots,\pm 1)$, (8)
    where the signs of the components of the last vector are chosen independently and uniformly at random (and the choice is independent for each $s$).
  • For $s\le 0$, we define
    $G_{\alpha_{s-1}}=\frac{1}{2}(G_{\alpha_s}-O_{\alpha_s})+O_{\alpha_s}+\frac{\alpha_{s-1}}{2}(\pm 1,\ldots,\pm 1)$, (9)
    where the last vector is chosen as in the case of $s\ge 0$.

Equations (8) and (9) are consistent at $s=0$. A simple example of the above construction is the sequence of grids with $G_{\alpha_s}:=\alpha_s\mathbb{Z}^d$ for even $s$, and $G_{\alpha_s}:=\alpha_s\mathbb{Z}^d+\frac{\alpha_{s-1}}{2}(1,\ldots,1)$ for odd $s$.
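The upward recursion of Eq. (8) can be sketched in a few lines. This is only an illustration under stated assumptions: a grid is stored as a pair (offset, spacing) meaning offset + spacing · $\mathbb{Z}^d$, the arbitrary point $O_{\alpha_s}$ is taken to be the stored offset, and the names `next_grid` and `build_grids` are hypothetical. The downward recursion of Eq. (9) is not covered.

```python
import random
from fractions import Fraction

def next_grid(offset, spacing, rng):
    """One step of Eq. (8) for the grid G = offset + spacing * Z^d.

    Assumption: the arbitrary grid point O is taken to be `offset` itself.
    Returns (offset', spacing') describing the next coarser grid."""
    signs = [rng.choice((-1, 1)) for _ in offset]
    half = Fraction(spacing) / 2
    new_offset = tuple(o + half * s for o, s in zip(offset, signs))
    return new_offset, 2 * spacing

def build_grids(lam, d, levels, seed=0):
    """Grids for scales alpha_0, ..., alpha_levels, starting from lam * Z^d."""
    rng = random.Random(seed)
    grids = [((Fraction(0),) * d, Fraction(lam))]
    for _ in range(levels):
        grids.append(next_grid(*grids[-1], rng))
    return grids

grids = build_grids(lam=1, d=3, levels=4)
```

Exact rational arithmetic (`Fraction`) is used so that the half-spacing shifts accumulate without rounding error; each level doubles the spacing and moves the offset by half the previous spacing in every coordinate.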

Next, we motivate the shifting of the grids. Let $\mathrm{Vor}_{G_{\alpha_s}}(x)$ denote the Voronoi cell of any point $x\in G_{\alpha_s}$ with respect to the point set $G_{\alpha_s}$. It is clear that the Voronoi cell is a cube of side length $\alpha_s$ centered at $x$. The shifting of the grids ensures that each $x\in G_{\alpha_s}$ lies in the Voronoi region of a unique $y\in G_{\alpha_{s+1}}$. Using an elementary calculation, we show a stronger statement:

Lemma 2

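Let $x\in G_{\alpha_s}$, $y\in G_{\alpha_{s+1}}$ be such that $x\in\mathrm{Vor}_{G_{\alpha_{s+1}}}(y)$. Then,

$\mathrm{Vor}_{G_{\alpha_s}}(x)\subset\mathrm{Vor}_{G_{\alpha_{s+1}}}(y)$.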

Proof

Without loss of generality, we can assume that $\alpha_s=2$ and $x$ is the origin, using an appropriate translation and scaling. Also, we assume for the sake of simplicity that $G_{\alpha_{s+1}}=2G_{\alpha_s}+(1,\ldots,1)$; the proof is analogous for any other translation vector. In that case, it is clear that $y=(1,\ldots,1)$. Since $G_{\alpha_s}=2\mathbb{Z}^d$, the Voronoi region of $x$ is the set $[-1,1]^d$. Since $G_{\alpha_{s+1}}$ is a translated version of $4\mathbb{Z}^d$, the Voronoi region of $y$ is the cube $[-1,3]^d$, which covers $[-1,1]^d$. The claim follows. See Fig. 1 for an example.

Fig. 1. $G_{\alpha_s}$ is represented by small disks (yellow), while $G_{\alpha_{s+1}}$ is represented by larger disks (green). Possible locations of $x$ are indicated with their Voronoi regions. The Voronoi regions of the larger grid contain those of $x$

Cubical complex of $\mathbb{Z}^d$

The integer grid $\mathbb{Z}^d$ naturally defines a cubical complex, where each element is an axis-aligned, $k$-dimensional cube with $0\le k\le d$. To define it formally, let $\Box$ denote the set of all integer translates of faces of the unit cube $[0,1]^d$, considered as a convex polytope in $\mathbb{R}^d$. We call the elements of $\Box$ faces of $\mathbb{Z}^d$.

Each face has a dimension $k$; the 0-faces, or vertices, are exactly the points in $\mathbb{Z}^d$. The facets of a $k$-face $E$ are the $(k-1)$-faces contained in $E$. We call a pair of facets of $E$ opposite facets, if they are disjoint. Naturally, these concepts carry over to scaled and shifted versions of $\mathbb{Z}^d$, so we define $\Box_{\alpha_s}$ as the cubical complex defined by $G_{\alpha_s}$.

We define a map $g_{\alpha_s}:\Box_{\alpha_s}\to\Box_{\alpha_{s+1}}$ as follows: for vertices of $\Box_{\alpha_s}$, we assign to $x\in G_{\alpha_s}$ the (unique) vertex $y\in G_{\alpha_{s+1}}$ such that $x\in\mathrm{Vor}_{G_{\alpha_{s+1}}}(y)$ (see Lemma 2). For a $k$-face $f$ of $\Box_{\alpha_s}$ with vertices $(p_1,\ldots,p_{2^k})$ in $G_{\alpha_s}$, we set $g_{\alpha_s}(f)$ to be the convex hull of $\{g_{\alpha_s}(p_1),\ldots,g_{\alpha_s}(p_{2^k})\}$; the next lemma shows that this is a well-defined map. In this paper, we sometimes call $g_{\alpha_s}$ a cubical map, since it is a counterpart of simplicial maps for cubical complexes.
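The action of the map on vertices amounts to rounding to the coarser grid. The following is a minimal sketch using the two grids from the proof of Lemma 2 (fine grid $2\mathbb{Z}^2$, coarse grid $4\mathbb{Z}^2+(1,1)$); the names `g_vertex` and `g_face` are hypothetical, and a face is represented simply by its list of vertices, with its image given as the set of image vertices rather than their convex hull.

```python
from itertools import product

def g_vertex(x, coarse_offset, coarse_spacing):
    """Map a fine grid point x to the coarse grid point whose Voronoi cube
    contains it. Fine grid points never lie on a coarse cell boundary,
    so rounding is unambiguous."""
    return tuple(o + coarse_spacing * round((xi - o) / coarse_spacing)
                 for xi, o in zip(x, coarse_offset))

def g_face(vertices, coarse_offset, coarse_spacing):
    """Vertex set of the image of a face given by its list of vertices."""
    return {g_vertex(v, coarse_offset, coarse_spacing) for v in vertices}

# A 2-face of the fine grid that collapses to a single coarse vertex ...
square_a = list(product((0, 2), repeat=2))
image_a = g_face(square_a, (1, 1), 4)

# ... and a neighbouring 2-face whose image is a coarse 1-face.
square_b = [(2, 0), (4, 0), (2, 2), (4, 2)]
image_b = g_face(square_b, (1, 1), 4)
```

Both outcomes are instances of the first claim of the lemma that follows: the image vertices always form a face of the coarser complex, possibly of lower dimension than the original face.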

Lemma 3

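Let $f$ be a $k$-face of $\Box_{\alpha_s}$ with vertices $\{p_1,\ldots,p_{2^k}\}\subset G_{\alpha_s}$. Then

  • the set of vertices $\{g_{\alpha_s}(p_1),\ldots,g_{\alpha_s}(p_{2^k})\}$ forms a face $e$ of $\Box_{\alpha_{s+1}}$.

  • for every face $e_1\subseteq e$, there is a face $f_1\subseteq f$ such that $g_{\alpha_s}(f_1)=e_1$.

  • if $e_1,e_2$ are any two opposite facets of $e$, then there exists a pair of opposite facets $f_1,f_2$ of $f$ such that $g_{\alpha_s}(f_1)=e_1$ and $g_{\alpha_s}(f_2)=e_2$.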

Proof

First claim: We prove the first claim by induction on the dimension of faces of $G_{\alpha_s}$. Base case: for vertices, the claim is trivial using Lemma 2. Induction case: let the claim hold true for all $(k-1)$-faces of $G_{\alpha_s}$. We show that the claim holds true for all $k$-faces of $G_{\alpha_s}$.

Let $f$ be a $k$-face of $G_{\alpha_s}$. Let $f_1$ and $f_2$ be opposite facets of $f$, along the $m$-th coordinate. Let us denote the vertices of $f_1$ by $(p_1,\ldots,p_{2^{k-1}})$ and those of $f_2$ by $(p_{2^{k-1}+1},\ldots,p_{2^k})$ taken in the same order, that is, $p_j$ and $p_{2^{k-1}+j}$ differ in only the $m$-th coordinate for all $1\le j\le 2^{k-1}$. By definition, all vertices of $f_1$ share the $m$-th coordinate, and we denote this coordinate by $z$. Then, the $m$-th coordinate of all vertices of $f_2$ equals $z+\alpha_s$. Then $g_{\alpha_s}(p_j)$ and $g_{\alpha_s}(p_{2^{k-1}+j})$ have the same coordinates, except possibly the $m$-th coordinate. By induction hypothesis, $e_1=g_{\alpha_s}(f_1)$ and $e_2=g_{\alpha_s}(f_2)$ are two faces of $G_{\alpha_{s+1}}$. This implies that $e_2$ is a translate of $e_1$ along the $m$-th coordinate.

There are two cases: if $e_1$ and $e_2$ share the $m$-th coordinate, then $e_1=e_2$ and therefore $g_{\alpha_s}(f)=e_1=e_2=e$, so the claim follows. On the other hand, if $e_1$ and $e_2$ do not share the $m$-th coordinate, then they are two faces of $\Box_{\alpha_{s+1}}$ which differ in only one coordinate by $\alpha_{s+1}$. So they are opposite facets (of co-dimension one) of a face $e$ of $G_{\alpha_{s+1}}$. Using induction, the claim follows.

Second claim: We prove the claim by induction over the dimension of $e_1$. Base case: $e_1$ is a vertex. The vertices of $f$ in the Voronoi region of $e_1$ form $f_1$. Since $f$ is an axis-parallel face and the Voronoi region is also axis-parallel, it is immediate that $f_1$ is a face of $f$. Assume that the claim is true up to dimension $i$. For $e_1$ a face of dimension $i+1$, consider opposite facets $e_a$ and $e_b$ of $e_1$. By the induction claim, there exist faces $f_a,f_b\subseteq f$ that satisfy $g_{\alpha_s}(f_a)=e_a$, $g_{\alpha_s}(f_b)=e_b$. $f_a$ and $f_b$ are disjoint since otherwise $g_{\alpha_s}(f_a\cap f_b)$ would be common to both $e_a$ and $e_b$, a contradiction. If $e_a$ is a translate of $e_b$ along the $m$-th coordinate, then $f_a$ is also a translate of $f_b$ along the same coordinate. Therefore $f_a$ and $f_b$ are opposite facets of a face $f_1$ and $g_{\alpha_s}(f_1)=e_1$.

Third claim: Without loss of generality, assume that $x_1$ is the direction in which $e_2$ is a translate of $e_1$. Using the second claim, let $h$ denote the maximal face of $f$ such that $g_{\alpha_s}(h)=e_1$. Clearly, $h\ne f$, since $h=f$ would imply $g_{\alpha_s}(f)=e_1\ne e$, which is a contradiction.

Suppose $h$ has dimension less than $k-1$. Let $h'$ be the facet of $f$ that contains $h$ and has the same $x_1$ coordinates for all vertices. Then $g_{\alpha_s}(h')=e_1$, which contradicts the maximality of $h$.

Therefore, the only possibility is that $h$ is a facet $f_1$ of $f$ such that $g_{\alpha_s}(f_1)=e_1$. Let $f_2$ be the opposite facet of $f_1$. From the proof of the first claim, it is easy to see that $g_{\alpha_s}(f_2)=e_2$. The claim follows.

Barycentric subdivision

We discuss a special triangulation of $\Box_{\alpha_s}$. A flag in $\Box_{\alpha_s}$ is a set of faces $\{f_0,\ldots,f_k\}$ of $\Box_{\alpha_s}$ such that

$f_0\subseteq\ldots\subseteq f_k$.

The barycentric subdivision of $\Box_{\alpha_s}$, denoted by $\mathrm{sd}\,\Box_{\alpha_s}$, is the (infinite) simplicial complex whose simplices are the flags of $\Box_{\alpha_s}$ (Munkres 1984).

In particular, the 0-simplices of $\mathrm{sd}\,\Box_{\alpha_s}$ are the faces of $\Box_{\alpha_s}$. An equivalent geometric description of $\mathrm{sd}\,\Box_{\alpha_s}$ can be obtained by defining the 0-simplices as the barycenters of the faces in $\Box_{\alpha_s}$, and introducing a $k$-simplex between $(k+1)$ barycenters if the corresponding faces form a flag. For a simple example, see Figs. 2 and 3. It is easy to see that $\mathrm{sd}\,\Box_{\alpha_s}$ is a flag complex. Given a face $f$ in $\Box_{\alpha_s}$, we write $\mathrm{sd}(f)$ for the subcomplex of $\mathrm{sd}\,\Box_{\alpha_s}$ consisting of all flags that are formed only by faces contained in $f$.
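For a single cube, the flags can be enumerated directly. The sketch below is an illustration for the unit cube $[0,1]^d$ only; the encoding of a face as a tuple over 0, 1 and `'*'` (a free coordinate) and the function names are assumptions made for this example.

```python
from itertools import combinations, product

def cube_faces(d):
    """Faces of [0,1]^d: each coordinate is fixed to 0 or 1, or free ('*')."""
    return list(product((0, 1, '*'), repeat=d))

def dim(face):
    """Dimension of a face = number of free coordinates."""
    return sum(c == '*' for c in face)

def is_subface(f, g):
    """True if the face f is contained in the face g."""
    return all(b == '*' or a == b for a, b in zip(f, g))

def flags(d, k):
    """All flags f_0 < ... < f_k of distinct nested faces, that is,
    the k-simplices of the barycentric subdivision of the d-cube."""
    faces = sorted(cube_faces(d), key=dim)  # nested faces appear in order
    return [c for c in combinations(faces, k + 1)
            if all(is_subface(c[i], c[i + 1]) for i in range(k))]

counts = [len(flags(2, k)) for k in range(3)]
```

For the square ($d=2$) this gives 9 barycenters, 16 edges and 8 triangles in the subdivision, with Euler characteristic $9-16+8=1$ as expected for a disk.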

Fig. 2. A portion of the grid in two dimensions. The dots are the grid points which form the 0-faces of the cubical complex

Fig. 3. The barycentric subdivision of the grid. The tiny squares are barycenters of the 1-faces and 2-faces of the cubical complex

Approximation scheme with simplicial complexes

We define our approximation complex for a finite set of points in $\mathbb{R}^d$. Recall from Definition 1 that we can define a collection of scaled and shifted integer grids $G_{\alpha_s}$ over a collection of scales $I:=\{\alpha_s=2^s\mid s\in\mathbb{Z}\}$ in $\mathbb{R}^d$. To make the exposition simple, we define our complex in a slightly generalized form.

Barycentric spans

Fix some $s\in\mathbb{Z}$ and let $V$ denote any non-empty subset of $G_{\alpha_s}$.

Vertex span. We say that a face $f\in\Box_{\alpha_s}$ is spanned by $V$, if the set of vertices $V(f):=f\cap V$

  • is non-empty, and

  • not contained in any facet of f.

Trivially, the vertices of $\Box_{\alpha_s}$ which are spanned by $V$ are precisely the points in $V$. Any face of $\Box_{\alpha_s}$ which is not a vertex must contain at least two vertices of $V$ in order to be spanned. We point out that the set of spanned faces of $\Box_{\alpha_s}$ is not closed under taking sub-faces. For instance, if $V$ consists of two antipodal points of a $d$-cube, the only faces spanned by $V$ are the $d$-cube and the two vertices; all other faces of the $d$-cube contain at most one vertex and hence are not spanned.

It is simple to test whether any given $k$-face $f\in\Box_{\alpha_s}$ is spanned by the set of points $V(f)$. Let $T\subseteq[1,\ldots,d]$ be the set of common coordinates of the points in $V(f)$. $V(f)$ spans $f$ if and only if the standard basis vectors of $\mathbb{R}^d$ corresponding to $T$ span $f$. $T$ can be computed in $|V(f)|\cdot O(d)=O(2^k d)$ time by a linear scan of the coordinates. The coordinate directions spanned by $f$ can also be found and compared with $T$ within the same time bound.
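A spanning test can also be written straight from the definition of the vertex span: $V(f)$ must be non-empty and must not lie in any facet of $f$. The sketch below makes the following assumptions, none of which come from the text: a face is encoded by its lower corner, the set of coordinate directions along which it extends, and its side length, and the function name `is_spanned` is hypothetical. It uses the observation that $V(f)$ lies in a facet exactly when all of its points agree on one of the directions along which $f$ extends.

```python
def is_spanned(lower, free_dims, side, V):
    """Decide whether the face with the given lower corner, set of free
    coordinate directions and side length is spanned by the point set V."""
    def in_face(v):
        # Free coordinates may sit at either end of the face; the others are fixed.
        return all((v[i] - lower[i]) in ((0, side) if i in free_dims else (0,))
                   for i in range(len(lower)))

    Vf = [v for v in V if in_face(v)]  # V(f) = f ∩ V
    if not Vf:
        return False
    # V(f) lies in a facet iff all its points share some free coordinate.
    return all(len({v[m] for v in Vf}) > 1 for m in free_dims)

# Two antipodal corners of the unit square.
V = {(0, 0), (1, 1)}
square_spanned = is_spanned((0, 0), {0, 1}, 1, V)
edge_spanned = is_spanned((0, 0), {0}, 1, V)
vertex_spanned = is_spanned((0, 0), set(), 1, V)
```

This reproduces the antipodal example from the text in dimension two: the square and the two chosen vertices are spanned, while the edges are not.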

Barycentric span. The barycentric span of $V$ is the subcomplex of $\mathrm{sd}\,\Box_{\alpha_s}$ obtained by taking the union of the complete barycentric subdivisions of the maximal faces of $\Box_{\alpha_s}$ that are spanned by $V$. The barycentric span of $V$ is indeed a simplicial complex by definition. Moreover, the barycentric span is a flag complex. Then for any face $f\in\Box_{\alpha_s}$, the barycentric span of $V(f)$ is either empty or acyclic.

Furthermore, for any non-empty subset $W\subseteq V$, the faces of $\Box_{\alpha_s}$ that are spanned by $W$ are also spanned by $V$. Consequently, the barycentric span of $W$ is a subcomplex of the barycentric span of $V$.

Approximation complex

We denote by $P \subset \mathbb{R}^d$ a finite set of points. We define two maps:

  • $a_{\alpha_s}: P \to G_{\alpha_s}$: for each point $p \in P$, we let $a_{\alpha_s}(p)$ denote the grid point in $G_{\alpha_s}$ that is closest to $p$, that is, $p \in \mathrm{Vor}_{G_{\alpha_s}}(a_{\alpha_s}(p))$. We assume for simplicity that this closest point is unique, which can be ensured using well-known methods (Edelsbrunner and Mücke 1990). We define the active vertices of $G_{\alpha_s}$ as
    $V_{\alpha_s} := \mathrm{im}(a_{\alpha_s}) = a_{\alpha_s}(P) \subseteq G_{\alpha_s}$,
    that is, the set of grid points that have at least one point of $P$ in their Voronoi cells.
  • $b_{\alpha_s}: V_{\alpha_s} \to P$: the map $b_{\alpha_s}$ takes an active vertex of $G_{\alpha_s}$ to its closest point in $P$. By taking an arbitrary total order on $P$ to resolve multiple assignments, we ensure that this assignment is unique.

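Naturally, $b_{\alpha_s}(v)$ is a point inside $\mathrm{Vor}_{G_{\alpha_s}}(v)$ for any $v \in V_{\alpha_s}$. It follows that the map $b_{\alpha_s}$ is a section of $a_{\alpha_s}$, that is, $a_{\alpha_s} \circ b_{\alpha_s}: V_{\alpha_s} \to V_{\alpha_s}$ is the identity on $V_{\alpha_s}$. However, this is not true for $b_{\alpha_s} \circ a_{\alpha_s}$ in general.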
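The two maps can be sketched as follows. This is only an illustration under our own assumptions: the grid is taken to be `shift + alpha * Z^d`, grid points are identified by their integer index vectors, distances are measured in the $L_\infty$-norm, ties for $b$ are broken by the smaller point index, and the names `nearest_grid_point` and `active_maps` are hypothetical.

```python
import math

def nearest_grid_point(p, alpha, shift):
    """a(p): index vector z of the grid point shift + alpha*z closest to p."""
    return tuple(math.floor((x - s) / alpha + 0.5) for x, s in zip(p, shift))

def active_maps(P, alpha, shift):
    """Return (a, b): a maps point index -> grid index, b maps grid index -> point index."""
    a = {i: nearest_grid_point(p, alpha, shift) for i, p in enumerate(P)}
    best = {}
    for i, p in enumerate(P):
        z = a[i]
        g = tuple(s + alpha * zi for s, zi in zip(shift, z))
        dist = max(abs(x - y) for x, y in zip(p, g))  # L-infinity distance
        # Keep the closest point of the cell; ties go to the smaller index.
        if z not in best or dist < best[z][0]:
            best[z] = (dist, i)
    b = {z: i for z, (dist, i) in best.items()}
    return a, b
```

In this sketch the keys of `b` are exactly the active vertices, `a[b[z]] == z` holds for each of them, and `b[a[i]]` may differ from `i` when several points share a Voronoi cell.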

Recall that the map $g_{\alpha_s}: \square_{\alpha_s} \to \square_{\alpha_{s+1}}$ takes grid points of $G_{\alpha_s}$ to grid points of $G_{\alpha_{s+1}}$. Using Lemma 2, it follows at once that:

Lemma 4

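For all $\alpha_s \in I$ and each $x \in V_{\alpha_s}$, $g_{\alpha_s}(x) = (a_{\alpha_{s+1}} \circ b_{\alpha_s})(x)$.

Recall that $R^{\infty}_{\alpha}$ denotes the Rips complex at scale $\alpha$ for the $L_\infty$-norm. The next statement is a direct application of the triangle inequality; let $\mathrm{diam}_\infty(\cdot)$ denote the diameter in the $L_\infty$-norm.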

Lemma 5

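Let $Q \subseteq P$ be a non-empty subset such that $\mathrm{diam}_\infty(Q) \le \alpha_s$. Then, the set of grid points $a_{\alpha_s}(Q)$ is contained in a face of $\square_{\alpha_s}$.

Equivalently, for any simplex $\sigma = (p_0, \ldots, p_k) \in R^{\infty}_{\alpha_s/2}$ on $P$, the set of active vertices $\{a_{\alpha_s}(p_0), \ldots, a_{\alpha_s}(p_k)\}$ is contained in a face of $\square_{\alpha_s}$.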

Proof

We prove the claim by contradiction. Suppose that the set of active vertices $a_{\alpha_s}(Q)$ is not contained in a face of $\square_{\alpha_s}$. Then, there exists at least one pair of points $x, y \in Q$ such that $a_{\alpha_s}(x)$, $a_{\alpha_s}(y)$ are not in a common face of $\square_{\alpha_s}$. By the definition of the grid $G_{\alpha_s}$, the grid points $a_{\alpha_s}(x)$, $a_{\alpha_s}(y)$ therefore have $L_\infty$-distance at least $2\alpha_s$. Moreover, $x$ has $L_\infty$-distance less than $\alpha_s/2$ from $a_{\alpha_s}(x)$, and the same is true for $y$ and $a_{\alpha_s}(y)$. By the triangle inequality, the $L_\infty$-distance of $x$ and $y$ is more than $\alpha_s$, which is a contradiction to the fact that $\mathrm{diam}_\infty(Q) \le \alpha_s$.
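The statement of Lemma 5 can also be checked numerically. The sketch below is a sanity check under our own assumptions, not a proof: it uses the unshifted grid $\alpha\mathbb{Z}^d$ with integer point coordinates so that the arithmetic is exact, breaks rounding ties upwards, and the helper names are hypothetical.

```python
import random

def nearest_index(x, alpha):
    """Index of the point of alpha*Z nearest to the integer x (ties round up)."""
    return (2 * x + alpha) // (2 * alpha)

def in_common_face(indices):
    """Grid indices lie in a common face iff every coordinate spans at most one step."""
    return all(max(c) - min(c) <= 1 for c in zip(*indices))

def check_lemma5(trials=1000, d=3, alpha=10, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        base = [rng.randint(-100, 100) for _ in range(d)]
        # Each coordinate of each point lies in [base, base + alpha],
        # so the L-infinity diameter of Q is at most alpha.
        Q = [[b + rng.randint(0, alpha) for b in base] for _ in range(5)]
        images = [tuple(nearest_index(x, alpha) for x in q) for q in Q]
        if not in_common_face(images):
            return False
    return True
```

Under these assumptions `check_lemma5()` finds no violating point set, whereas two grid points that are two steps apart in some coordinate fail `in_common_face`.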

We now define our approximation tower. For any scale $\alpha_s$, we define $X_{\alpha_s}$ as the barycentric span of the active vertices $V_{\alpha_s} \subseteq G_{\alpha_s}$. See Figs. 4, 5 and 6 for a simple illustration.

Fig. 4.

A two-dimensional grid, shown along with its cubical complex. The green points (small dots) denote the points in P and the red vertices (encircled) are the active vertices (color figure online)

Fig. 5.

The active faces are shaded. The closure of the active faces forms the cubical complex

Fig. 6.

The generated approximation complex, whose vertices consist of those of the cubical complex and the blue vertices (small dots), which are the barycenters of active and secondary faces

To simplify notation, we denote

  • the faces of $\square_{\alpha_s}$ spanned by $V_{\alpha_s}$ as active faces, and

  • the faces of active faces that are not spanned by $V_{\alpha_s}$ as secondary faces.

To complete the description of the approximation tower, we need to define simplicial maps of the form $\tilde g_{\alpha_s}: X_{\alpha_s} \to X_{\alpha_{s+1}}$, which connect the simplicial complexes at consecutive scales. We show that such maps are induced by $g_{\alpha_s}$.

Lemma 6

Let $f$ be any active face of $\square_{\alpha_s}$. Then, $g_{\alpha_s}(f)$ is an active face of $\square_{\alpha_{s+1}}$.

Proof

Using Lemma 3, $e := g_{\alpha_s}(f)$ is a face of $\square_{\alpha_{s+1}}$. If $e$ is a vertex, then it is active, because $f$ contains at least one active vertex $v$, and $g_{\alpha_s}(v) = e$ in this case. If $e$ is not a vertex, we assume for a contradiction that it is not active. Then, it contains a facet $e_1$ that contains all active vertices in $e$. Let $e_2$ denote the opposite facet of $e_1$ in $e$. By Lemma 3, $f$ contains opposite facets $f_1$, $f_2$ such that $g_{\alpha_s}(f_1) = e_1$ and $g_{\alpha_s}(f_2) = e_2$. Since $f$ is active, both $f_1$ and $f_2$ contain active vertices; in particular, $f_2$ contains an active vertex $v$. But then the active vertex $g_{\alpha_s}(v)$ must lie in $e_2$, contradicting the fact that $e_1$ contains all active vertices of $e$.

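As a result, $g$ is well defined for each face $e \in \square_{\alpha_s}$ that appears in $X_{\alpha_s}$, since there exists some active face $e' \in \square_{\alpha_s}$ with $e \subseteq e'$, and $g(e) \subseteq g(e')$. By definition, a simplex $\sigma \in X_{\alpha_s}$ is a flag $(f_0 \subseteq \ldots \subseteq f_k)$ of faces in $\square_{\alpha_s}$. We set

$$\tilde g_{\alpha_s}(\sigma) := \left(g_{\alpha_s}(f_0), \ldots, g_{\alpha_s}(f_k)\right),$$

where $(g_{\alpha_s}(f_0) \subseteq \ldots \subseteq g_{\alpha_s}(f_k))$ is a flag of faces in $\square_{\alpha_{s+1}}$ by Lemma 6, and hence is a simplex in $X_{\alpha_{s+1}}$. It follows that $\tilde g_{\alpha_s}: X_{\alpha_s} \to X_{\alpha_{s+1}}$ is a simplicial map. This completes the description of the simplicial tower

$$\left(X_{\alpha_s}\right)_{s \in \mathbb{Z}}.$$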

Interleaving with the Rips module

First, we show that our tower is a constant-factor approximation of the $L_\infty$-Rips filtration of $P$. We then show the relation between our approximation tower and the Euclidean Rips filtration of $P$.

We start by defining two acyclic carriers. First, we set $\lambda = 1$ and abbreviate $\alpha := \alpha_s = 2^s$ to simplify notation.

  • $C_1^{\alpha}: R^{\infty}_{\alpha/2} \to X_{\alpha}$: for any simplex $\sigma = (p_0, \ldots, p_k)$ in $R^{\infty}_{\alpha/2}$, we set $C_1^{\alpha}(\sigma)$ as the barycentric span of $U := \{a_{\alpha}(p_0), \ldots, a_{\alpha}(p_k)\}$, which is a subcomplex of $X_{\alpha}$. Using Lemma 5, $U$ lies in a maximal active face $f$ of $\square_{\alpha}$, so that $C_1^{\alpha}(\sigma)$ is acyclic. The barycentric span of any subset of $U$ is a subcomplex of the barycentric span of $U$, so $C_1^{\alpha}$ is a carrier. Therefore, $C_1^{\alpha}$ is an acyclic carrier.

  • $C_2^{\alpha}: X_{\alpha} \to R^{\infty}_{\alpha}$: let $\sigma$ be any flag of $X_{\alpha}$ and let $E$ be the smallest active face of $\square_{\alpha}$ that contains $\sigma$ (we break ties by making use of an arbitrary global order on $P$). We collect all the points of $P$ that map to vertices of $E$ under the map $a_{\alpha}$ and set $C_2^{\alpha}(\sigma)$ as the simplex on this set of points. By an application of the triangle inequality, we see that the $L_\infty$-diameter of $C_2^{\alpha}(\sigma)$ is at most $2\alpha$, so $C_2^{\alpha}(\sigma) \in R^{\infty}_{\alpha}$ and is acyclic. It is also clear that $C_2^{\alpha}(\tau) \subseteq C_2^{\alpha}(\sigma)$ for each $\tau \subseteq \sigma$, so $C_2^{\alpha}$ is an acyclic carrier.

Using the acyclic carrier theorem (Theorem 1), there exist augmentation-preserving chain maps

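$$c_1^{\alpha}: C_*(R^{\infty}_{\alpha/2}) \to C_*(X_{\alpha}) \quad \text{and} \quad c_2^{\alpha}: C_*(X_{\alpha}) \to C_*(R^{\infty}_{\alpha}),$$

between the chain complexes, which are carried by $C_1^{\alpha}$ and $C_2^{\alpha}$ respectively, for each $\alpha \in I$. We obtain the following diagram of augmentation-preserving chain maps:

[Diagram (10): the chain complexes of the Rips complexes and of the approximation complexes, connected by the chain maps $c_1$, $c_2$, $\mathrm{inc}$ and $\tilde g$]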

where inc corresponds to the chain map for inclusion maps, and g~ denotes the chain map for the corresponding simplicial map g (we removed indices of the maps for readability).

The chain complexes give rise to a diagram of the corresponding homology groups, connected by the induced linear maps $c_1^*, c_2^*, \mathrm{inc}^*, \tilde g^*$:

[Diagram (11): the homology groups of the complexes in Diagram (10), connected by the induced linear maps]

Lemma 7

For all $\alpha \in I$, the linear maps in the lower triangle of Diagram (11) commute, that is,

$$\tilde g^* = c_1^* \circ c_2^*.$$

Proof

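We look at the corresponding triangle in Diagram (10). We show that the (augmentation-preserving) chain maps $\tilde g$ and $c_1 \circ c_2$ are both carried by an acyclic carrier $D: X_{\alpha} \to X_{2\alpha}$. The claim then follows from the acyclic carrier theorem.

Let $\sigma \in X_{\alpha}$ be any flag and let $E \in \square_{\alpha}$ denote the minimal active face containing $\sigma$. Let $\{q_1, \ldots, q_k\}$ be the active vertices of $E$. Let $\{p_1, \ldots, p_m\}$ be the set of points of $P$ that map to $\{q_1, \ldots, q_k\}$ under the map $a_{\alpha}$. Since the $L_\infty$-diameter of $\{p_1, \ldots, p_m\}$ is at most $2\alpha$, using Lemma 5 we see that $\{a_{2\alpha}(p_1), \ldots, a_{2\alpha}(p_m)\}$ lies in a face of $\square_{2\alpha}$. We set $D(\sigma)$ as the barycentric span of $\{a_{2\alpha}(p_1), \ldots, a_{2\alpha}(p_m)\}$. It follows that $D$ is an acyclic carrier.

Further, $\{a_{2\alpha}(p_1), \ldots, a_{2\alpha}(p_m)\} = \{g_{2\alpha}(q_1), \ldots, g_{2\alpha}(q_k)\}$ from Lemma 2, so $D(\sigma)$ is the barycentric subdivision of $g_{2\alpha}(E)$. As a result $D = C_1 \circ C_2$, so that it carries $c_1 \circ c_2$. We show that $D$ also carries the map $\tilde g$.

By definition, for each face $e \subseteq E$, $g(e) \subseteq g(E)$ and $\tilde g(\mathrm{sd}(e)) \subseteq \tilde g(\mathrm{sd}(E))$. This means that $\tilde g(\sigma)$ is contained in $g(E)$. This shows that $\tilde g(\sigma) \in C_1 \circ C_2(\sigma)$, implying that $\tilde g$ is carried by $C_1 \circ C_2$, as required.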

Lemma 8

For all $\alpha \in I$, the linear maps in the upper triangle of Diagram (11) commute, that is,

$$\mathrm{inc}^* = c_2^* \circ c_1^*.$$

Proof

The proof technique is analogous to the proof of Lemma 7. We define an acyclic carrier $D: R^{\infty}_{\alpha} \to R^{\infty}_{2\alpha}$ which carries $\mathrm{inc}$ and $c_2 \circ c_1$, both of which are augmentation-preserving.

Let $\sigma = (p_0, \ldots, p_k) \in R^{\infty}_{\alpha}$ be any simplex. The set of active vertices

$$U := \{a_{2\alpha}(p_0), \ldots, a_{2\alpha}(p_k)\} \subset G_{2\alpha}$$

lies in a face $f$ of $G_{2\alpha}$, using Lemma 5. We can assume that $f$ is active, as otherwise, we argue about a facet of $f$ that contains $U$. We set $D(\sigma)$ as the simplex on the subset of points in $P$ whose closest grid point in $G_{2\alpha}$ is any vertex of $f$. Using the triangle inequality we see that $D(\sigma) \in R^{\infty}_{2\alpha}$, so $D$ is an acyclic carrier. The vertices of $\sigma$ are a subset of $D(\sigma)$, so $D$ carries the map $\mathrm{inc}$. Showing that $D$ carries $c_2 \circ c_1$ requires further explanation.

Let $\delta$ be any simplex in $X_{2\alpha}$ for which the chain $c_1(\sigma)$ takes a non-zero value. Since $c_1(\sigma)$ is carried by $C_1(\sigma)$, we have that $\delta \in C_1(\sigma)$, which is the barycentric span of $U$. Furthermore, for any $\tau \in C_1(\sigma)$, $C_2(\tau)$ is a simplex on the set of vertices $\{p \in P \mid a_{2\alpha}(p) \in V(f)\}$. It follows that $C_2(\tau) \subseteq D(\sigma)$. In particular, since $c_2$ is carried by $C_2$, $c_2(c_1(\sigma)) \subseteq D(\sigma)$ as well.

Using Lemmas 7 and 8, we see that the two persistence modules $(H(X_{\alpha_s}))_{s \in \mathbb{Z}}$ and $(H(R^{\infty}_{\alpha}))_{\alpha \ge 0}$ are weakly 2-interleaved.

With elementary modifications in the definition of $X$ and $\tilde g$, we can get a tower of the form $(X_{\alpha})_{\alpha \ge 0}$. Furthermore, with minor changes in the interleaving arguments, we show that the corresponding persistence module is strongly 4-interleaved with the $L_\infty$-Rips module. Using scale balancing, this result improves to a strong 2-interleaving (see Lemma 16). Since the techniques used in the proof are very similar to the concepts used in this section, for the sake of brevity we defer all further details to Appendix A.

Using the strong stability theorem for persistence modules and taking scale balancing into account, we immediately get that:

Theorem 2

The scaled persistence module $(H(X_{2\alpha}))_{\alpha \ge 0}$ and the $L_\infty$-Rips persistence module $(H(R^{\infty}_{\alpha}))_{\alpha \ge 0}$ are 2-approximations of each other.

For any pair of points $p, p' \in \mathbb{R}^d$, it holds that

$$\|p - p'\|_\infty \le \|p - p'\|_2 \le \sqrt{d}\, \|p - p'\|_\infty.$$

This in turn shows that the $L_2$- and the $L_\infty$-Rips filtrations are strongly $\sqrt{d}$-interleaved. Using the scale balancing technique for strongly interleaved persistence modules, we get:

Lemma 9

The scaled persistence module (H(Rα/d0.25))α0 and (H(Rα))α0 are strongly d0.25-interleaved.

Using Theorem 2, Lemma 9 and the fact that interleavings satisfy the triangle inequality (Bubenik and Scott 2014, Theorem 3.3), we see that the module $(H(X_{2\alpha}))_{\alpha \ge 0}$ is strongly $2d^{0.25}$-interleaved with the scaled Rips persistence module $(H(R_{\alpha/d^{0.25}}))_{\alpha \ge 0}$. We can remove the scaling in the Rips filtration simply by multiplying the scales on both sides with $d^{0.25}$ and obtain our final approximation result:

Theorem 3

The module $(H(X_{2\sqrt[4]{d}\,\alpha}))_{\alpha \ge 0}$ and the Euclidean Rips persistence module $(H(R_{\alpha}))_{\alpha \ge 0}$ are $2d^{0.25}$-approximations of each other.

Computational complexity

In this section, we discuss the computational aspects of constructing the approximation tower. In Sect. 5.1 we discuss the size complexity of the tower. An algorithm to compute the tower efficiently is presented in Sect. 5.2.

Range of relevant scales. Set $n := |P|$ and let $CP(P)$ denote the closest pair distance of $P$. At scale $\alpha_0 := \frac{CP(P)}{3d}$ and lower, no two active vertices lie in the same face of the grid, so the approximation complex consists of $n$ isolated 0-simplices. At scale $\alpha_m := \mathrm{diam}(P)$ and higher, points of $P$ map to active vertices of a common face (by Lemma 5), so the generated complex is acyclic. We inspect the range of scales $[\alpha_0, \alpha_m]$ to construct the tower, since the barcode is explicitly known for scales outside this range. For this, we set $\lambda = \alpha_0$ in the definition of the scales. The total number of scales is

$$\left\lceil \log_2 (\alpha_m / \alpha_0) \right\rceil = \left\lceil \log_2 \frac{\mathrm{diam}(P)\, 3d}{CP(P)} \right\rceil = \left\lceil \log_2 \Delta + \log_2 3d \right\rceil = O(\log \Delta + \log d),$$

where $\Delta = \frac{\mathrm{diam}(P)}{CP(P)}$ is the spread of the point set.
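As a small worked example of this count, the following sketch evaluates the lowest scale and the number of scales from given values of $\mathrm{diam}(P)$, $CP(P)$ and $d$. The function names are hypothetical, and rounding the logarithm up to an integer is our own assumption.

```python
import math

def lowest_scale(cp, d):
    """alpha_0 = CP(P) / (3d)."""
    return cp / (3 * d)

def num_scales(diam, cp, d):
    """Number of doublings needed to get from alpha_0 to alpha_m = diam(P)."""
    return math.ceil(math.log2(diam / lowest_scale(cp, d)))
```

For instance, with $\mathrm{diam}(P) = 8$, $CP(P) = 1$ and $d = 1$ the ratio $\alpha_m/\alpha_0$ is 24, so five doublings cover the range.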

Size of the tower

The size of a tower is the number of simplices that do not have a preimage, that is, the number of simplex inclusions in the tower. We start by counting the number of active faces used in the tower.

Lemma 10

The number of active faces without pre-image in the tower is at most $n3^d$.

Proof

At scale $\alpha_0$, there are $n$ inclusions of 0-simplices in the tower, due to $n$ active vertices. Using Lemma 2, $g$ is surjective on the active vertices (for any scale). Hence, no further active vertices are added to the tower.

It remains to count the maximal active faces of dimension $\ge 1$ without preimage. We will use a charging argument, charging the existence of such an active face to one of the points in $P$. We show that each point of $P$ is charged at most $3^d - 1$ times, which proves the claim. For that, we first fix an arbitrary total order on $P$. Each active vertex on any scale has a non-empty subset of $P$ in its Voronoi region; we call the maximal such point with respect to the order the representative of the active vertex.

For each active face $f$ of dimension at least one, we define the signature of $f$ as the set of representatives of the active vertices of $f$. If for any set of active vertices $u_1, \ldots, u_k$ we have that $v = g(u_1) = \ldots = g(u_k)$, then the representative of $v$ is one of the representatives of $u_1, \ldots, u_k$, using Lemma 2. Therefore, the signatures of the active faces that are images of $f$ under $g$ are subsets of the signature of $f$. This implies that each maximal active face that is included has a unique maximal signature. We bound the number of maximal signatures to get a bound on the number of maximal active face inclusions. We charge the addition of each maximal signature to the lowest ordered point according to the total order.

Each signature contains representatives of active vertices from a face of $\square_{\alpha}$. Since each active vertex $v$ has $3^d - 1$ neighboring vertices in the grid that lie in a common face, the representative $p$ of $v$ can be charged $3^d - 1$ times. There is a canonical isomorphism between the neighboring vertices of $v$ at each scale. Then, for $p$ to be charged more times, the image of $v$ and some neighboring vertex $u$ must be identical under $g$ at some scale. But then, the representative of $g(v) = g(u)$ is not $p$ anymore, since $p$ was the lowest ranked point in its neighborhood, hence the representative changes when the Voronoi regions are combined. So, $p$ could not have been charged in such a case. Therefore, each point $p \in P$ is indeed charged at most $3^d - 1$ times.

There are $n$ active faces of dimension 0 and at most $n(3^d - 1)$ active faces of higher dimension. The upper bound is $n + n(3^d - 1) = n3^d$, as claimed.

Theorem 4

The $k$-skeleton of the tower has size at most

$$n 6^{d-1} (2k+4)(k+3)! \left\{ {d \atop k+2} \right\} = n 2^{O(d \log k + d)},$$

where $\left\{ {a \atop b} \right\}$ denotes the Stirling number of the second kind.

Proof

Each k-simplex that is included in the tower at any given scale α is a part of the barycentric subdivision of an active face that is also included at α. Therefore, we can account for the inclusion of this simplex by including the barycentric subdivision of its parent active face.

From Lemma 10, at most $n3^d$ active faces are included in the tower over all dimensions. We bound the number of $k$-simplices in the barycentric subdivision of a $d$-cube. Multiplying with $n3^d$ gives the required bound.

Let $c$ be any $d$-cube of $\square_{\alpha}$. To count the number of flags of length $(m+1)$ contained in $c$ that start with some vertex and end with $c$, we use similar ideas as in Edelsbrunner and Kerber (2012): first, we fix any vertex $v$ of $c$ and count the flags of the form $v \subseteq \ldots \subseteq c$. Every $\ell$-face in $c$ incident to $v$ corresponds to a subset of coordinate indices, in the sense that the coordinates not chosen are fixed to the coordinates of $v$ for the face. With this correspondence, a flag from $v$ to $c$ of length $(m+1)$ corresponds to an ordered $m$-partition of $\{1, \ldots, d\}$. The number of such partitions is known as $m!$ times the quantity $\left\{ {d \atop m} \right\}$, which is the Stirling number of second kind (Rennie and Dobson 1969), and is upper bounded by $2^{O(d \log m)}$. Since $c$ has $2^d$ vertices, the total number of flags $v \subseteq \ldots \subseteq c$ of length $(m+1)$ with any vertex $v$ is hence $2^d\, m! \left\{ {d \atop m} \right\}$.
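The correspondence between flags and ordered partitions can be checked for small $d$ with the following sketch. It counts strictly growing chains of coordinate subsets from the empty set to $\{1, \ldots, d\}$ and compares the count with $m!$ times the Stirling number; all function names are hypothetical and the code is only a sanity check of the counting argument.

```python
from math import comb, factorial

def stirling2(n, m):
    """Stirling number of the second kind via the standard recurrence."""
    if n == m:
        return 1
    if m == 0 or m > n:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def count_chains(d, m):
    """Count chains {} < S_1 < ... < S_m = {1..d} of strictly growing subsets."""
    def extend(size, steps):
        # Ways to reach the full index set from a subset of the given size
        # in exactly `steps` strict enlargements.
        if steps == 0:
            return 1 if size == d else 0
        return sum(comb(d - size, add) * extend(size + add, steps - 1)
                   for add in range(1, d - size + 1))
    return extend(0, m)

def flags_from_vertices(d, m):
    """Flags of length m + 1 from any vertex of a d-cube up to the cube itself."""
    return 2 ** d * count_chains(d, m)
```

For $d = 3$ and $m = 2$, each of the 8 vertices lies in 6 proper faces of positive dimension, which gives 48 flags and agrees with $2^3 \cdot 2! \cdot 3$.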

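We now count the number of flags of length $k+1$. Each such flag is a $(k+1)$-subset of some flag of length $m = k+3$ that starts with a vertex and ends with $c$. There are $2^d (k+2)! \left\{ {d \atop k+2} \right\}$ such flags and each of them has $\binom{k+3}{k+1} = (k+3)(k+2)/2$ subsets of size $(k+1)$. The number of $(k+1)$-flags is upper bounded by $2^d (k+2)! \left\{ {d \atop k+2} \right\} \frac{(k+3)(k+2)}{2} = 2^{d-1}(k+2)(k+3)! \left\{ {d \atop k+2} \right\}$. The $k$-skeleton has size at most

$$n 3^d\, 2^{d-1} (k+2)(k+3)! \left\{ {d \atop k+2} \right\} = n 6^{d-1} (2k+4)(k+3)! \left\{ {d \atop k+2} \right\}.$$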

Computing the tower

From Sect. 3, we know that $G_{\alpha_{s+1}}$ is built from $G_{\alpha_s}$ by making use of an arbitrary translation vector $(\pm 1, \ldots, \pm 1) \in \mathbb{Z}^d$. In our algorithm, we pick the components of this translation vector uniformly at random from $\{+1, -1\}$, and independently for each scale. The reason for choosing this vector randomly becomes clear in the next lemma.

From the definition, the cubical maps $g_{\alpha_s}: \square_{\alpha_s} \to \square_{\alpha_{s+1}}$ can be composed for multiple scales. For a fixed $\alpha_s$, we denote by $g^{(j)}: \square_{\alpha_s} \to \square_{\alpha_{s+j}}$ the $j$-fold composition of $g$, that is,

$$g^{(j)} = g_{\alpha_{s+j-1}} \circ g_{\alpha_{s+j-2}} \circ \ldots \circ g_{\alpha_{s+1}} \circ g_{\alpha_s},$$

for $j \ge 1$.

Lemma 11

For any $k$-face $f \in \square_{\alpha_s}$ with $1 \le k \le d$, let $Y$ denote the minimal integer $j$ such that $g^{(j)}(f)$ is a vertex, for a given choice of the randomly chosen translation vectors. Then, the expected value of $Y$ satisfies

$$E[Y] \le 3 \log k,$$

which implies that no face of $\square_{\alpha_s}$ survives more than $3 \log d$ scales in expectation.

Proof

Without loss of generality, assume that the grid under consideration is $\mathbb{Z}^d$ and $f$ is the $k$-face spanned by the vertices $\{\underbrace{\{0,1\}, \ldots, \{0,1\}}_{k}, 0, \ldots, 0\}$, so that the origin is a vertex of $f$. The proof for the general case is analogous.

Let $y_1 \in \{-1, 1\}$ denote the randomly chosen first coordinate of the translation vector, so that the corresponding shift is one of $\{-1/2, 1/2\}$.

  • If $y_1 = 1$, then the grid $G$ on the next scale has some grid point with $x_1$-coordinate $1/2$. Clearly, the closest grid point in $G$ to the origin is of the form $(+1/2, \pm 1/2, \ldots, \pm 1/2)$, and thus, this point is also closest to $(1, 0, 0, \ldots, 0)$. The same is true for any point $(0, *, \ldots, *)$ and its corresponding point $(1, *, \ldots, *)$ on the opposite facet of $f$. Hence, for $y_1 = 1$, $g(f)$ is a face where all points have the same $x_1$-coordinate.

  • On the other hand, if $y_1 = -1$, the origin is mapped to some point which has the form $(-1/2, \pm 1/2, \ldots, \pm 1/2)$ and $(1, 0, \ldots, 0)$ is mapped to $(3/2, \pm 1/2, \ldots, \pm 1/2)$, as one can directly verify. Hence, in this case, in $g(f)$, points do not all have the same $x_1$-coordinate.

We say that the $x_1$-coordinate collapses in the first case and survives in the second. Both events occur with the same probability $1/2$. Because the shift is chosen uniformly at random for each scale, the probability that $x_1$ did not collapse after $j$ iterations is $1/2^j$.

The face $f$ spans $k$ coordinate directions, so it must collapse along each such direction to contract to a vertex. Once a coordinate collapses, it stays collapsed at all higher scales. As the random shift is independent for each coordinate direction, the probability of a collapse is the same along all coordinate directions that $f$ spans. Using the union bound, the probability that $g^{(j)}(f)$ has not collapsed to a vertex is at most $k/2^j$. With $Y$ as in the statement of the lemma, it follows that

$$P(Y \ge j) \le k/2^j.$$

Hence,

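$$\begin{aligned}
E[Y] &= \sum_{j=1}^{\infty} j\, P(Y = j) = \sum_{j=1}^{\infty} P(Y \ge j) \\
&\le \log k + \sum_{c=1}^{\infty} \sum_{j = c \log k}^{(c+1) \log k} P(Y \ge j) \\
&\le \log k + \sum_{c=1}^{\infty} \sum_{j = c \log k}^{(c+1) \log k} P(Y \ge c \log k) \\
&\le \log k + \sum_{c=1}^{\infty} \log k \cdot \frac{k}{2^{c \log k}} \\
&\le \log k + \log k \sum_{c=1}^{\infty} \frac{1}{k^{c-1}} \\
&\le \log k + 2 \log k \le 3 \log k.
\end{aligned}$$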

As a consequence of the lemma, the expected “lifetime” of $k$-simplices in our tower with $k > 0$ is rather short: given a flag $e_0 \subseteq \ldots \subseteq e_\ell$, the face $e_\ell$ will be mapped to a vertex after $O(\log d)$ steps, and so will be all its sub-faces, turning the flag into a vertex. It follows that summing up the total number of $k$-simplices with $k > 0$ over $X_{\alpha}$ for all $\alpha \ge 0$ yields an upper bound of $n 2^{O(d \log k + d)}$ as well.
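The bound of Lemma 11 can be compared with the exact expectation under the probabilistic model used in the proof. The sketch below assumes, as in the argument above, that each of the $k$ coordinate directions collapses independently with probability $1/2$ at every scale, so that $Y$ is the maximum of $k$ independent geometric variables; the function name and the truncation of the series are our own choices, and logarithms are taken to base 2.

```python
import math

def expected_collapse_time(k, terms=200):
    """E[Y] for Y = max of k independent Geometric(1/2) variables.

    Uses E[Y] = sum_{j >= 0} P(Y > j) with P(Y > j) = 1 - (1 - 2**-j)**k,
    truncated after `terms` summands.
    """
    return sum(1.0 - (1.0 - 2.0 ** -j) ** k for j in range(terms))
```

For $k = 2$ the series sums to $8/3$, which is below $3 \log 2 = 3$; for larger $k$ the expectation grows roughly like $\log k$ plus a constant, so the bound is not tight.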

Algorithm description

Recall that a simplicial map can be written as a composition of simplex inclusions and contractions of vertices (Dey et al. 2014; Kerber and Schreiber 2017). That means, given the complex $X_{\alpha_s}$, to describe the complex at the next scale $\alpha_{s+1}$, it suffices to specify

  • which pairs of vertices in $X_{\alpha_s}$ map to the same image under $\tilde g$, and

  • which simplices in $X_{\alpha_{s+1}}$ are included at scale $\alpha_{s+1}$.

The input is a set of $n$ points $P \subset \mathbb{R}^d$. The output is a list of events, where each event is of one of the three following types:

  • A scale event defines a real value α and signals that all upcoming events happen at scale α (until the next scale event).

  • An inclusion event introduces a new simplex, specified by the list of vertices on its boundary (we assume that every vertex is identified by a unique integer).

  • A contraction event is a pair of vertices $(i, j)$ from the previous scale, and signifies that $i$ and $j$ are identified as the same from that scale.
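One possible encoding of the three event types is sketched below. The class and field names are hypothetical and chosen only for illustration; the description above prescribes only the content of each event.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class ScaleEvent:
    alpha: float                 # all following events happen at this scale

@dataclass(frozen=True)
class InclusionEvent:
    vertices: Tuple[int, ...]    # integer ids of the vertices of the new simplex

@dataclass(frozen=True)
class ContractionEvent:
    i: int                       # vertex ids from the previous scale
    j: int                       # that are identified from now on

Event = Union[ScaleEvent, InclusionEvent, ContractionEvent]

def tower_size(events):
    """Number of inclusion events, i.e. simplices without preimage."""
    return sum(isinstance(e, InclusionEvent) for e in events)

example = [
    ScaleEvent(1.0),
    InclusionEvent((0,)),
    InclusionEvent((1,)),
    ScaleEvent(2.0),
    InclusionEvent((0, 1)),
    ScaleEvent(4.0),
    ContractionEvent(0, 1),
]
```

In this toy list two vertices appear at the first scale, an edge joins them at the second, and the two vertices are identified at the third.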

In a first step, we estimate the range of scales that we are interested in. We compute a 2-approximation of $\mathrm{diam}(P)$ by taking any point $p \in P$ and calculating $\max_{q \in P} \|p - q\|$. Then we compute $CP(P)$ using a randomized algorithm in $n 2^{O(d)}$ expected time (Khuller and Matias 1995).
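The diameter estimate is a one-liner; the sketch below uses the Euclidean norm as an assumption (the argument via the triangle inequality works for any norm), and `approx_diameter` is a hypothetical name.

```python
import math

def approx_diameter(P, p_index=0):
    """Return r = max_q ||p - q|| for a fixed p; then r <= diam(P) <= 2r."""
    p = P[p_index]
    return max(math.dist(p, q) for q in P)
```

The lower bound holds because $r$ is itself a distance between two points of $P$, and the upper bound because any two points are within distance $r$ of $p$.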

Next, we proceed scale-by-scale and construct the list of events accordingly. On the lowest scale, we simply compute the active vertices by point location for P in a cubical grid, and enlist n inclusion events (this is the only step where the input points are considered in the algorithm).

For the data structure, we use an auxiliary container $S$ and maintain the invariant that whenever a new scale is considered, $S$ consists of all simplices of the previous scale, sorted by dimension. In $S$, for each vertex, we store an id and a coordinate representation of the active face to which it corresponds. Every $\ell$-simplex with $\ell > 0$ is stored just as a list of integers, denoting its boundary vertices. We initialize $S$ with the $n$ active vertices at the lowest scale.

Let $\alpha < \alpha'$ be any two consecutive scales with $\square, \square'$ the respective cubical complexes and $X, X'$ the approximation complexes, with $\tilde g: X \to X'$ being the simplicial map connecting them. Suppose we have already constructed all events at scale $\alpha$.

  • First, we enlist the scale event for $\alpha'$.

  • Then, we enlist the contraction events. For that, we iterate through the vertices of $X$ and compute their value under $g$, using point location in a cubical grid. We store the results in a list $S'$ (which contains the simplices of $X'$). If for a vertex $j$, $g(j)$ is found to be equal to $g(i)$ for a previously considered vertex $i$, we choose the minimal such $i$ and enlist a contraction event for $(i, j)$.

  • We turn to the inclusion events:
    • We start with the case of vertices. Every vertex of $X'$ is either an active face or a secondary face of $\square'$. Each active face must contain an active vertex, which is also a vertex of $X'$. We iterate through the elements in $S'$. For each active vertex $v$ encountered, we go over all faces of the cubical complex $\square'$ that contain $v$ as a vertex, and check whether they are active. For every active face $E$ encountered that is not in $S'$ yet, we add it to $S'$ and enlist an inclusion event of a new 0-simplex. Additionally, we go over each face of $E$, add it to $S'$ and enlist a vertex inclusion event, thereby enumerating the secondary faces that are in $E$. At termination, all vertices of $X'$ have been detected.
    • Next, we iterate over the simplices of $S$ of dimension $\ge 1$, and compute their image under $\tilde g$ using the pre-computed vertex map; we store the result in $S'$.
    • To find the simplices of dimension $\ge 1$ included at $X'$, we exploit our previous insight that they contain at least one vertex that is included at the same scale (see the proof of Theorem 4). Hence, we iterate over the vertices included in $X'$ and find the included simplices inductively in dimension. Let $v$ be the current vertex under consideration; assume that we have found all $(p-1)$-simplices in $X'$ that contain $v$. Each such $(p-1)$-simplex $\sigma$ is a flag of length $p$ in $\square'$. We iterate over all faces $e$ that extend $\sigma$ to a flag of length $p+1$. If $e$ is active, we have found a $p$-simplex in $X'$ incident to $v$. If this simplex is not in $S'$ yet, we add it and enlist an inclusion event for it. We also enqueue the simplex in our inductive procedure, to look for $(p+1)$-simplices in the next round. At the end of the procedure, we have detected all simplices in $X'$ without preimage, and $S'$ contains all simplices of $X'$. We set $S \leftarrow S'$ and proceed to the next scale.

This ends the description of the algorithm.
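The contraction step of the algorithm can be sketched in a few lines. The sketch assumes that the image of every vertex under $g$ has already been computed and is given as a dictionary `image` with hashable values; the function name is hypothetical, and point location, which produces `image`, is not shown.

```python
def contraction_events(vertex_ids, image):
    """Return the pairs (i, j) with g(i) == g(j), where i is the minimal
    vertex id with that image and j is any later vertex with the same image."""
    first = {}      # image under g -> minimal vertex id seen so far
    events = []
    for j in sorted(vertex_ids):
        key = image[j]
        if key in first:
            events.append((first[key], j))
        else:
            first[key] = j
    return events
```

Because the vertices are visited in increasing order of id, the first vertex recorded for an image is the minimal one, as required in the description above.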

Theorem 5

To compute the $k$-skeleton, the algorithm takes

$$n 2^{O(d)} \log \Delta + 2^{O(d)} M$$

time in expectation and $M$ space, where $M$ denotes the size of the tower. In particular, the expected time is bounded by

$$n 2^{O(d)} \log \Delta + n 2^{O(d \log k + d)}$$

and the space is bounded by $n 2^{O(d \log k + d)}$.

Proof

In the analysis, we ignore the costs of point locations in grids, checking whether a face is active, and searches in data structures S, since all these steps have negligible costs when appropriate data structures are chosen.

Computing the image of a vertex of $X$ costs $O(2^d)$ time. Moreover, there are at most $n 2^{O(d)}$ vertices altogether in the tower in expectation (using Lemma 10), so this bound in particular holds on each scale. Hence, the contraction events on a fixed scale can be computed in $n 2^{O(d)}$ time. Finding new active vertices requires iterating over the cofaces of a vertex in a cubical complex. There are $3^d$ such cofaces for each vertex. This has to be done for a subset of the vertices in $X'$, so the running time is also $n 2^{O(d)}$. Further, for each new active face, we go over its $2^{O(d)}$ faces to enlist the secondary faces, so this step also consumes $n 2^{O(d)}$ time. Since there are $O(\log \Delta + \log d)$ scales considered, these steps require $n 2^{O(d)} \log \Delta$ over all scales.

Computing the image of $\tilde g$ for a fixed scale costs at most $O(2^d |X|)$. $M$ is the size of the tower, that is, the simplices without preimage, and $I$ is the set of scales considered. The expected value of $\sum_{\alpha \in I} |X_{\alpha}|$ is $O(\log d \cdot M)$, because every simplex has an expected lifetime of at most $3 \log d$ by Lemma 11. Hence, the cost of these steps is bounded by $2^{O(d)} M$.

In the last step of the algorithm, we find the simplices of $X'$ included at $\alpha'$. We consider a subset of simplices of $X'$, and for each, we iterate over a collection of faces in the cubical complex of size at most $2^{O(d)}$. Hence, this step is also bounded by $2^{O(d)} |X'|$ per scale, and hence bounded by $2^{O(d)} M$ as well.

For the space complexity, the auxiliary data structure $S$ gets as large as $X$, which is clearly bounded by $M$. For the output complexity, the number of contraction events is at most the number of inclusion events, because every contraction removes a vertex that has been included before. The number of inclusion events is the size of the tower. The number of scale events as described is $O(\log \Delta + \log d)$. However, it is simple to get rid of this factor by only including scale events in the case that at least one inclusion or contraction takes place at that scale. The space complexity bound follows.

Dimension reduction

When the ambient dimension $d$ is large, our approximation scheme can be combined with dimension reduction techniques to reduce the final complexity, very similar to the application in Choudhary et al. (2017b). For a set of $n$ points $P \subset \mathbb{R}^d$, we apply the dimension reduction schemes of Johnson-Lindenstrauss (JL) (Johnson et al. 1986), Matoušek (MT) (Matoušek 1990), and Bourgain’s embedding (BG) (Bourgain 1985). We then compute the approximation on the lower-dimensional point set. We only state the main results in Table 1, leaving out the proofs since they are very similar to those from Choudhary et al. (2017b).

Table 1.

Comparison of dimension reduction techniques: here the approximation ratio is for the Rips persistence module, and the size refers to the size of the k-skeleton of the approximation

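Technique | Approximation ratio | Size | Runtime
JL | $O(\log^{0.25} n)$ | $n^{O(\log k)}$ | $n^{O(1)} \log \Delta + n^{O(\log k)}$
MT | $O((\log n)^{0.75} (\log \log n)^{0.25})$ | $n^{O(1)}$ | $n^{O(1)} \log \Delta$
BG + MT | $O((\log n)^{1.75} (\log \log n)^{0.25})$ | $n^{O(1)}$ | $n^{O(1)} \log \Delta$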

Approximation scheme with cubical complexes

We extend our approximation scheme to use cubical complexes in place of simplicial complexes. We start by detailing a few aspects of cubical complexes.

Cubical complexes

We now briefly describe the concept of cubical complexes, essentially expanding upon the contents of Sect. 3.1. For a detailed overview of cubical homology, we refer to Kaczynski et al. (2004).

Definition

We define cubical complexes over the grids $G_{\alpha_s}$. For any fixed $\alpha_s$, the grid $G_{\alpha_s}$ defines a natural collection of cubes. An elementary cube $\gamma$ is a product of intervals $\gamma = I_1 \times I_2 \times \ldots \times I_d$, where each interval is of the form $I_j = (x_j, x_j + m_j)$, such that the vertex $(x_1, \ldots, x_d) \in G_{\alpha_s}$ and each $m_j$ is either 0 or $\alpha_s$. That means, an (elementary) cube is simply a face of a $d$-cube of the grid. An interval $I_j$ is said to be degenerate if $m_j = 0$. The dimension of $\gamma$ is the number of non-degenerate intervals that define it. We define the boundary of any interval as the two degenerate intervals that form its endpoints and denote this by $\partial(I_j) = (x_j, x_j) + (x_j + m_j, x_j + m_j)$. Taking the boundary of any fixed subset of the intervals defining $\gamma$ consecutively gives a sum of faces of $\gamma$. A cubical complex of $G_{\alpha_s}$ is a finite collection of cubes of $G_{\alpha_s}$.

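We define chain complexes for the cubical case in the same way as in simplicial complexes. The chain complexes are connected by boundary homomorphisms, where the boundary of a cube is defined as:

$$\partial(I_1 \times \ldots \times I_d) = \partial(I_1) \times I_2 \times \ldots \times I_d + \ldots + I_1 \times \ldots \times I_{d-1} \times \partial(I_d),$$

where $I_1 \times \ldots \times \partial(I_j) \times \ldots \times I_d$ denotes the sum

$$I_1 \times \ldots \times (x_j, x_j) \times \ldots \times I_d + I_1 \times \ldots \times (x_j + m_j, x_j + m_j) \times \ldots \times I_d.$$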

It can be quickly verified that for each cube $\gamma$, $\partial \circ \partial(\gamma) = 0$ since each term appears twice in the expression and the addition is over $\mathbb{Z}_2$.
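The cancellation can be observed on small examples with the following sketch of the boundary operator over $\mathbb{Z}_2$. The encoding is our own assumption: the grid has unit spacing, a cube is a tuple of pairs $(x_j, m_j)$ with $m_j \in \{0, 1\}$, and a chain is a Python set of cubes, so that addition over $\mathbb{Z}_2$ is the symmetric difference.

```python
def boundary(cube):
    """Boundary of one elementary cube as a Z_2 chain (a set of cubes)."""
    chain = set()
    for j, (x, m) in enumerate(cube):
        if m == 0:
            continue  # degenerate interval: its two endpoint terms cancel
        for end in (x, x + m):
            face = cube[:j] + ((end, 0),) + cube[j + 1:]
            chain ^= {face}
    return chain

def boundary_of_chain(chain):
    """Extend the boundary linearly over Z_2 to a set of cubes."""
    out = set()
    for cube in chain:
        out ^= boundary(cube)
    return out
```

For the unit square the boundary consists of four edges, and applying the operator again gives the empty chain because every corner is produced exactly twice.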

Cubical maps and induced homology

Let $T_{\alpha_s}$ and $T_{\alpha_t}$ denote the cubical complexes defined by the grids $G_{\alpha_s}$ and $G_{\alpha_t}$, respectively, for $s \le t$. We use the vertex map $g: G_{\alpha_s} \to G_{\alpha_t}$ to define a map between the cubical complexes. Note that if $(a, b)$ are vertices of a cube of $T_{\alpha_s}$ that differ in one coordinate, then $(g(a), g(b))$ are vertices of a cube of $T_{\alpha_t}$ that differ in at most one coordinate. A cubical map is a map $f: T_{\alpha_s} \to T_{\alpha_t}$ defined using $g$, such that for each cube $\gamma = [a_1, b_1] \times \ldots \times [a_d, b_d]$ of $T_{\alpha_s}$, $f(\gamma) := [g(a_1), g(b_1)] \times \ldots \times [g(a_d), g(b_d)]$ spans a cube of $T_{\alpha_t}$. The cubical map can also be restricted to sub-complexes of $T_{\alpha_s}$ and $T_{\alpha_t}$, provided that the image $f(\gamma)$ is well-defined.

Each cubical map also defines a corresponding continuous map between the underlying spaces of the respective complexes. Let $x \in |\gamma|$ be a point in $\gamma$. Then, the coordinates of $x$ can be uniquely written as $x = [\lambda_1 a_1 + (1 - \lambda_1) b_1, \ldots, \lambda_d a_d + (1 - \lambda_d) b_d]$ where each $\lambda_i \in [0, 1]$. The image of $x$ under the continuous extension of $f$ is the point $[\lambda_1 g(a_1) + (1 - \lambda_1) g(b_1), \ldots, \lambda_d g(a_d) + (1 - \lambda_d) g(b_d)]$ in the cube $g(\gamma)$.

The cubical map $f$ gives rise to a chain map $f_\#: C_p(T_{\alpha_s}) \to C_p(T_{\alpha_t})$ between the $p$-th chain groups of the complexes, for each $p \in [0, \ldots, d]$. For each cube $\gamma$, $f_\#(\gamma) = f(\gamma)$ if $\dim(\gamma) = \dim(f(\gamma))$ and 0 otherwise. For any chain $c = \sum_i \gamma_i$, the chain map is defined linearly: $f_\#(c) = \sum_i f_\#(\gamma_i)$. It is simple to verify that $\partial \circ f_\# = f_\# \circ \partial$, so this gives a homomorphism between the chain groups.
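The commutation of $f_\#$ with the boundary can be tested on small cubes. In the sketch below, cubes are tuples of closed intervals $(a_j, b_j)$ with $b_j \in \{a_j, a_j + 1\}$, chains are sets (addition over $\mathbb{Z}_2$), and the vertex map is a hypothetical stand-in that halves each coordinate after an integer shift; it is not the map $g$ of the construction, but it shares the property that neighboring grid values are mapped to equal or neighboring values.

```python
def boundary(cube):
    """Z_2 boundary of an elementary cube given as a tuple of (a, b) intervals."""
    chain = set()
    for j, (a, b) in enumerate(cube):
        if a == b:
            continue  # degenerate interval: its two endpoint terms cancel
        for end in (a, b):
            chain ^= {cube[:j] + ((end, end),) + cube[j + 1:]}
    return chain

def boundary_of_chain(chain):
    out = set()
    for cube in chain:
        out ^= boundary(cube)
    return out

def dim(cube):
    return sum(1 for a, b in cube if a != b)

def make_g(shift):
    """Hypothetical per-coordinate vertex map x -> floor((x + shift_i) / 2)."""
    return lambda i, x: (x + shift[i]) // 2

def f_sharp(chain, g):
    """Push a Z_2 chain forward; cubes whose dimension drops are sent to 0."""
    out = set()
    for cube in chain:
        image = tuple((g(i, a), g(i, b)) for i, (a, b) in enumerate(cube))
        if dim(image) == dim(cube):
            out ^= {image}
    return out
```

When a square is collapsed along one direction, its image is an edge and $f_\#$ sends the square to 0; the two boundary edges parallel to the surviving direction have the same image and cancel over $\mathbb{Z}_2$, so both sides of $\partial \circ f_\# = f_\# \circ \partial$ vanish.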

Moving to the homology level, we get the respective homology groups $H(T_{\alpha_s})$ and $H(T_{\alpha_t})$ and the chain map from above induces a linear map between them. The concept of reduced homology and augmentation maps is also applicable to the cubical chain complexes. For a sequence of cubical complexes connected with cubical maps, this generates a persistence module.

Cubical filtrations and towers are defined in a similar manner to the simplicial case. A cubical filtration is a collection of cubical complexes $(T_{\alpha})_{\alpha \in I}$ such that $T_{\alpha} \subseteq T_{\alpha'}$ for all $\alpha \le \alpha' \in I$. A (cubical) tower is a sequence $(T_{\alpha})_{\alpha \in J}$ of cubical complexes with $J$ being an index set together with cubical maps between complexes at consecutive scales. A cubical tower can be written as a sequence of inclusions and contractions, where an inclusion refers to the addition of a cube and a contraction refers to collapsing a cube along a coordinate direction to either of the endpoints of the interval.

Description

We choose the simplest possible cubical complex to define our approximation cubical tower: for each scale αs, we define the cubical complex Uαs as the set of active faces and secondary faces spanned by Vαs. Hence the cubical complex is closed under taking faces and is well-defined. See Fig. 5 for a simple example.

Recall from Sect. 4 that for each sZ, Uαs and Uαs+1 are related by a cubical map gαs, which gives rise to the cubical tower

UαssZ.

We extend this to a tower (Uα)α0 by using techniques from Appendix A. In Sect. 4 we saw that the tower (Xα)α0 gives an approximation to the Rips filtration. The relation between the simplicial and cubical towers is trivial: Xαs is simply a triangulation of |Uαs|. Hence Xαs and Uαs have the same homology (Munkres 1984). Moreover, the simplicial map is derived from an application of the cubical map. In particular, the continuous versions of both maps are the same. For any 0αβ, let

  • $f_1: H(U_\alpha)\to H(U_\beta)$ denote the homomorphism induced by the cubical map,

  • $f_2: H(X_\alpha)\to H(X_\beta)$ denote the homomorphism induced by the simplicial map, and

  • $f_0: H(|X_\alpha|=|U_\alpha|)\to H(|X_\beta|=|U_\beta|)$ denote the homomorphism induced by the common continuous map.

It is well-established that $f_1=f_0$ (Kaczynski et al. 2004, Chapter 6) and $f_2=f_0$ (Munkres 1984, Chapter 2). Therefore, we conclude that the persistence modules $(H(U_\alpha))_{\alpha\ge 0}$ and $(H(X_\alpha))_{\alpha\ge 0}$ are persistence-equivalent. Combining this observation with the result of Theorem 3, we get

Theorem 6

The scaled persistence modules

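  • $(H(U_{2\alpha}))_{\alpha\ge 0}$ and the $L_\infty$-Rips module $(H(R_\alpha))_{\alpha\ge 0}$ are 2-approximations of each other, and

  • $(H(U_{2\sqrt[4]{d}\,\alpha}))_{\alpha\ge 0}$ and the Rips module $(H(R_\alpha))_{\alpha\ge 0}$ $2d^{0.25}$-approximate each other.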

To compute the cubical tower, we simply re-use the algorithm for the simplicial case, with small changes:

  • In the simplicial case, we used a container S to hold the simplices from the previous scale. We alter S to store the cubes from the previous scale. For each interval, we store an id and its coordinates. Each cube is stored as the set of ids of the intervals that define it.

  • At each scale, we enumerate the image of the cubical map by computing the image of each interval, and then use this pre-computed map to compute the images of the cubes that these intervals define.

  • For the inclusions, we find all the active and secondary faces but do not compute the simplices. The inclusions in the cubical tower correspond exactly to the inclusions of active and secondary faces in the simplicial tower, so this enumerates all inclusions correctly.
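The storage scheme described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the class name `CubeContainer`, the encoding of an interval as `(axis, lo, hi)`, and the vertex map `x -> x // 2` are hypothetical choices and do not reproduce the map $g_{\alpha_s}$ of the paper.

```python
class CubeContainer:
    """Holds the cubes of one scale; each cube is a frozenset of interval ids."""

    def __init__(self):
        self.interval_ids = {}  # (axis, lo, hi) -> id
        self.intervals = []     # id -> (axis, lo, hi)
        self.cubes = set()      # frozensets of interval ids

    def interval_id(self, axis, lo, hi):
        """Return the id of an interval, registering it on first use."""
        key = (axis, lo, hi)
        if key not in self.interval_ids:
            self.interval_ids[key] = len(self.intervals)
            self.intervals.append(key)
        return self.interval_ids[key]

    def add_cube(self, cube):
        """Store a cube given as one (lo, hi) interval per coordinate axis."""
        ids = frozenset(self.interval_id(axis, lo, hi)
                        for axis, (lo, hi) in enumerate(cube))
        self.cubes.add(ids)
        return ids


def decode(container, ids):
    """Turn a set of interval ids back into a tuple of (lo, hi) per axis."""
    return tuple((lo, hi) for axis, lo, hi in sorted(container.intervals[i] for i in ids))


def map_cubes(source, target, vertex_map):
    """Map every cube of `source` into `target` via pre-computed interval images."""
    # Step 1: compute the image of each interval once.
    interval_image = {
        old_id: target.interval_id(axis, vertex_map(lo), vertex_map(hi))
        for old_id, (axis, lo, hi) in enumerate(source.intervals)
    }
    # Step 2: the image of a cube is a lookup per defining interval.
    images = {cube: frozenset(interval_image[i] for i in cube) for cube in source.cubes}
    target.cubes.update(images.values())
    return images


previous, current = CubeContainer(), CubeContainer()
square = previous.add_cube(((0, 1), (2, 3)))
edge = previous.add_cube(((1, 2), (2, 2)))
images = map_cubes(previous, current, lambda x: x // 2)
```

Because the image of every interval is computed once and then reused, the cost of mapping a cube is proportional to the number of intervals that define it, independently of how many cubes share those intervals.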

From Lemma 10, at most $n3^d$ active faces are added to the tower. Counting their secondary faces as well, at most $n6^d$ active and secondary faces are added to the tower. Computing the tower takes time as in Theorem 5, with $M$ replaced by this size bound. We conclude that:

Theorem 7

The cubical tower has size at most $n6^d$ and takes at most $n6^d\log\Delta$ time in expectation to compute, where $\Delta$ is the spread of the point set.

Discussion

Practicality

We now touch upon the practical aspects of our constructions. An implementation of our approximation scheme would be a tool that computes the (approximate) persistence barcode for any input data set. For any scheme to be useful in practice, it should be able to compute sufficiently close approximations using a reasonable amount of resources.

Our cubical tower consists of cubical complexes connected via cubical maps. To our knowledge, there are no algorithms to compute barcodes in this setting where the cubical maps are more than just trivial inclusions. As such, although the size bound of our cubical scheme is exponentially smaller than that of the simplicial tower, we cannot hope to test it in practice unless the appropriate primitives are available. It could be an interesting research direction to develop this primitive and, in particular, to investigate whether the techniques used in computing persistence barcodes for a simplicial tower allow a generalization to the cubical case.

It makes more sense to inspect the simplicial tower. We saw in Theorem 4 that the size of the tower is $n6^{d-1}(2k+4)(k+3)!\,d^{k+2}$. Unfortunately, this bound is so large that the storage requirement of the algorithm (Theorem 7) explodes exponentially. Let us assume a conservative bound of 1 byte of memory per simplex. For a point set in $d=8$ dimensions and $k=4$, the complexity bound is already at least 4000 Terabytes, before factoring in $n$. For a point set in $d=10$ dimensions and $k=5$, this explodes to $10^{20}$ Terabytes. While these are upper bounds, in practice the complexity would still need to be many orders of magnitude smaller to be feasible, which is unlikely. Even with conservative estimates, our storage requirement is impractical.

Therefore, we are not very hopeful that implementing the scheme in its current state will provide any useful insight for high-dimensional approximations. Making it implementation-worthy demands more optimizations and tools at the algorithmic level. This is worth another algorithm engineering project in its own right, and we plan to pursue this line of research in the future. Since our focus in this paper is on the theoretical aspects of approximations, we exclude experimental results from the current work. We hope that a more careful implementation-focussed approach may prove more practical.

On the other hand, the upper bound for the cubical case is simply $n6^d$. Even for $d=10$, the storage requirement would be less than 100 Megabytes before factoring in $n$. This is far more attractive than the simplicial case. As such, it may make more sense to invest time and effort in developing tools to compute barcodes in the cubical setup.
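The storage estimates above can be checked with a few lines of arithmetic. The sketch below assumes 1 byte per cell, decimal units (1 Megabyte = $10^6$ bytes, 1 Terabyte = $10^{12}$ bytes), and that the per-point factor of the simplicial bound from Theorem 4 is $6^{d-1}(2k+4)(k+3)!\,d^{k+2}$; the function names are illustrative, and the reading of the simplicial formula is an assumption.

```python
from math import factorial

MEGABYTE = 10 ** 6
TERABYTE = 10 ** 12


def simplicial_cells_per_point(d, k):
    """Per-point factor of the simplicial size bound (assumed form, see lead-in)."""
    return 6 ** (d - 1) * (2 * k + 4) * factorial(k + 3) * d ** (k + 2)


def cubical_cells_per_point(d):
    """Per-point factor of the cubical size bound n * 6^d."""
    return 6 ** d


# Storage at 1 byte per cell, before multiplying by the number of points n.
simplicial_tb = simplicial_cells_per_point(8, 4) / TERABYTE
cubical_mb = cubical_cells_per_point(10) / MEGABYTE

print(f"simplicial, d=8, k=4: about {simplicial_tb:.0f} TB per input point")
print(f"cubical, d=10: about {cubical_mb:.1f} MB per input point")
```

Under these assumptions the simplicial figure for $d=8$, $k=4$ lands above 4000 Terabytes, while the cubical figure for $d=10$ is $6^{10}=60{,}466{,}176$ bytes, which stays below 100 Megabytes.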

Summary

We presented an approximation scheme for the Rips filtration that improves on previous approaches in approximation ratio, size, and computational complexity for the case of high-dimensional point clouds. In particular, we are able to achieve a marked reduction in the size of the approximation by using cubical complexes in place of simplicial complexes. This is in contrast to all previous approaches, which used simplicial complexes as approximating structures.

An important technique that we used in our scheme is the application of acyclic carriers to prove interleaving results. An alternative would be to explicitly construct chain maps between the Rips and the approximation towers; unfortunately, this makes the interleaving analysis significantly more complex. While the proof of the interleaving in Sect. 4.3 is still technically challenging, it is greatly simplified by the use of acyclic carriers. There is also no benefit in knowing the interleaving maps, because they are only required for the analysis of the interleaving and not for the actual computation of the approximation tower. We believe that this technique is of general interest for the construction of approximations of cell complexes.

Our simplicial tower is connected by simplicial maps; there are (implemented) algorithms to compute the barcode of such towers (Dey et al. 2014; Kerber and Schreiber 2017). It is also quite easy to adapt our tower construction to a streaming setting (Kerber and Schreiber 2017), where the output list of events is passed to an output stream instead of being stored in memory.

Acknowledgements

We would like to thank the reviewers and editors for their feedback, which was very helpful in improving the presentation.

Strong interleaving for barycentric scheme

Recall that we build the approximation tower over the set of scales $I:=\{\alpha_s=2^s\mid s\in\mathbb{Z}\}$. The tower $(X_\alpha)_{\alpha\in I}$ connected with the simplicial map $\tilde{g}$ can be extended to the set of scales $\{\alpha\ge 0\}$ with simple modifications:

  • for $\alpha\in I$, we define $X_\alpha$ in the usual manner. The map $\tilde{g}$ stays the same as before for complexes at such scales.

  • for all $\alpha\in[\alpha_s,\alpha_{s+1})$, we set $X_\alpha=X_{\alpha_s}$, for any $\alpha_s\in I$. That means the complex stays the same in the interval between any two scales of $I$, so we define $\tilde{g}$ as the identity within this interval.

These give rise to the tower $(X_\alpha)_{\alpha\ge 0}$, which is connected with the simplicial map $\tilde{g}$. This modification helps in improving the interleaving with the Rips persistence module.
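The extension to continuous scales amounts to a lookup rule, sketched below under the assumption that the complexes at the dyadic scales are indexed by the integer exponent $s$. The function names are hypothetical, and the sketch only handles $\alpha>0$, since no dyadic scale $2^s$ lies at or below $0$.

```python
import math


def scale_exponent(alpha):
    """Return the integer s with 2**s <= alpha < 2**(s + 1)."""
    if alpha <= 0:
        raise ValueError("the dyadic lookup is only defined for alpha > 0")
    # frexp writes alpha = m * 2**e with 0.5 <= m < 1, hence s = e - 1 exactly.
    _, exponent = math.frexp(alpha)
    return exponent - 1


def maps_between(alpha, beta):
    """Exponents s whose map from scale 2**s to 2**(s + 1) is composed to go from alpha to beta.

    An empty list means both scales fall into the same dyadic interval,
    so the connecting map is the identity.
    """
    if alpha > beta:
        raise ValueError("expected alpha <= beta")
    return list(range(scale_exponent(alpha), scale_exponent(beta)))


def complex_at(alpha, complexes_by_exponent):
    """Look up X_alpha = X_{alpha_s} in a dict keyed by the exponent s."""
    return complexes_by_exponent[scale_exponent(alpha)]
```

For example, `maps_between(1.5, 1.9)` is empty because both scales lie in $[1,2)$, whereas going from $1.5$ to $5$ composes the maps leaving the scales $2^0$ and $2^1$.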

First, we extend the acyclic carriers $C_1$ and $C_2$ from before to the new case:

  • $C_1^\alpha: R_\alpha\to X_{4\alpha},\ \alpha>0$: we define $C_1$ as before, simply changing the scales in the definition. It is straightforward to see that $C_1$ is still a well-defined acyclic carrier.

  • $C_2^\alpha: X_\alpha\to R_\alpha,\ \alpha\ge 0$: this stays the same as before. It is simple to check that $C_2$ is still a well-defined acyclic carrier.

These give rise to augmentation-preserving chain maps between the chain complexes:

$c_1^\alpha: C_*(R_\alpha)\to C_*(X_{4\alpha})$ and $c_2^\alpha: C_*(X_\alpha)\to C_*(R_\alpha)$,

using the acyclic carrier theorem as before (Theorem 1).

Lemma 12

The diagram

[Diagram (12)]

commutes on the homology level, for all $0\le\alpha\le\alpha'$.

Proof

Consider the acyclic carrier $C_1\circ\mathrm{inc}: R_\alpha\to X_{4\alpha'}$. It is simple to verify that this carrier carries both $c_1\circ\mathrm{inc}$ and $\tilde{g}\circ c_1$, so the induced diagram on the homology groups commutes, from Theorem 1.

Lemma 13

The diagram

[Diagram (13)]

commutes on the homology level, for all $0\le\alpha\le\alpha'$.

Proof

We construct an acyclic carrier $D: X_\alpha\to R_{\alpha'}$ which carries $\mathrm{inc}\circ c_2$ and $c_2\circ\tilde{g}$, thereby proving the claim (Theorem 1).

Consider any simplex $\sigma\in X_\alpha$ and let $E$ be the minimal active face at scale $\alpha$ containing $\sigma$. We set $D(\sigma)$ as the simplex on the set of input points of $P$ that lie in the Voronoi regions of the vertices of $g(E)$. By the triangle inequality, $D(\sigma)$ is a simplex of $R_{\alpha'}$, so that $D$ is a well-defined acyclic carrier. It is straightforward to verify that $D$ carries both $c_2\circ\tilde{g}$ and $\mathrm{inc}\circ c_2$.

Lemma 14

The diagram

[Diagram (14)]

commutes on the homology level, for all $0\le\alpha\le\alpha'$.

Proof

The diagram is essentially the same as the lower triangle of Diagram 10, with a change in the scales. As a result, the proof of Lemma 7 also applies for our claim directly.

Lemma 15

The diagram

[Diagram (15)]

commutes on the homology level, for all $0\le\alpha\le\alpha'$.

Proof

The diagram can be re-interpreted as:

[Diagram (16)]

The modified diagram is essentially the same as the upper triangle of Diagram 10, with a change in the scales and a replacement of $c_1$ with $\tilde{g}\circ c_1$, which is equivalent to the chain map at the scale $\alpha$. Hence, the proof of Lemma 8 also applies for our claim directly.

Using Lemmas 12, 13, 14, 15, and the scale balancing technique for strongly interleaved persistence modules, it follows that

Lemma 16

The persistence modules $(H(X_{2\alpha}))_{\alpha\ge 0}$ and $(H(R_\alpha))_{\alpha\ge 0}$ are strongly 2-interleaved.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Footnotes

1

Exceptions are point clouds in $\mathbb{R}^2$ and $\mathbb{R}^3$, for which alpha complexes (Edelsbrunner and Harer 2010) are an efficient alternative.

2

Ulrich Bauer, private communication.

3

To avoid thinking about orientations, it is often assumed that $F=\mathbb{Z}_2$ is the field with two elements.

4

In the language of Munkres (1984), this result is stated as the existence of a chain homotopy between $\phi_1$ and $\phi_2$. As evident from Munkres (1984), Theorem 12.4, this implies that the induced linear maps are the same.

5

We define an order between the active faces at scale $\alpha$ as follows: for each active face $F$, there are at least two points of $P$ whose images under $g_\alpha$ are vertices of $F$; say $\{q_1, q_2, \ldots, q_m\}\subseteq P$ are the points that map to $F$. We assign to $F$ a string of length $n$: [inline formula]. Each active face has a unique string associated with it. A total order on the faces is obtained by taking the lexicographic order of the strings of the active faces.

Aruni Choudhary is supported in part by European Research Council StG 757609. Michael Kerber is supported by Austrian Science Fund (FWF) grant number P 29984-N35. Sharath Raghvendra acknowledges support of NSF CRII grant CCF-1464276.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Aruni Choudhary, Email: arunich@inf.fu-berlin.de.

Michael Kerber, Email: kerber@tugraz.at.

Sharath Raghvendra, Email: sharathr@vt.edu.

References

  1. Botnan M, Spreemann G. Approximating persistent homology in Euclidean space through collapses. Appl. Algebra Eng. Commun. Comput. 2015;26(1–2):73–101. doi: 10.1007/s00200-014-0247-y.
  2. Bourgain J. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel J. Math. 1985;52(1–2):46–52. doi: 10.1007/BF02776078.
  3. Bubenik P, Scott JA. Categorification of persistent homology. Discrete Comput. Geom. 2014;51(3):600–627. doi: 10.1007/s00454-014-9573-x.
  4. Bubenik P, de Silva V, Scott J. Metrics for generalized persistence modules. Found. Comput. Math. 2015;15(6):1501–1531. doi: 10.1007/s10208-014-9229-5.
  5. Carlsson G. Topology and data. Bull. Am. Math. Soc. 2009;46:255–308. doi: 10.1090/S0273-0979-09-01249-X.
  6. Carlsson G, Zomorodian A. Computing persistent homology. Discrete Comput. Geom. 2005;33(2):249–274. doi: 10.1007/s00454-004-1146-y.
  7. Cavanna N, Jahanseir M, Sheehy D. A geometric perspective on sparse filtrations. In: Proceedings of the 27th Canadian Conference on Computational Geometry (CCCG); 2015. pp. 116–121.
  8. Chazal F, Cohen-Steiner D, Glisse M, Guibas L, Oudot S. Proximity of persistence modules and their diagrams. In: ACM Symposium on Computational Geometry (SoCG); 2009. pp. 237–246.
  9. Choudhary A, Kerber M, Raghvendra S. Improved approximate Rips filtrations with shifted integer lattices. In: Proceedings of the 25th Annual European Symposium on Algorithms (ESA); 2017. pp. 28:1–28:13.
  10. Choudhary A, Kerber M, Raghvendra S. Polynomial-sized topological approximations using the permutahedron (extended version). Discrete Comput. Geom. 2017.
  11. Choudhary A, Kerber M, Raghvendra S. Improved topological approximations by digitization. In: Proceedings of the Symposium on Discrete Algorithms (SODA); 2019. pp. 448:1–448:14.
  12. Dey TK, Fan F, Wang Y. Computing topological persistence for simplicial maps. In: Proceedings of the 30th Annual Symposium on Computational Geometry (SoCG); 2014. pp. 345–354.
  13. Edelsbrunner H, Harer J. Computational Topology: An Introduction. New York: American Mathematical Society; 2010.
  14. Edelsbrunner H, Kerber M. Dual complexes of cubical subdivisions of $\mathbb{R}^n$. Discrete Comput. Geom. 2012;47(2):393–414. doi: 10.1007/s00454-011-9382-4.
  15. Edelsbrunner H, Mücke EP. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Trans. Gr. 1990:66–104.
  16. Edelsbrunner H, Letscher D, Zomorodian A. Topological persistence and simplification. Discrete Comput. Geom. 2002;28(4):511–533. doi: 10.1007/s00454-002-2885-2.
  17. Goodman JE, O’Rourke J, Tóth CD, editors. Handbook of Computational Geometry. Boca Raton: CRC Press; 2017.
  18. Hatcher A. Algebraic Topology. Cambridge: Cambridge University Press; 2002.
  19. Johnson WB, Lindenstrauss J, Schechtman G. Extensions of Lipschitz maps into Banach spaces. Israel J. Math. 1986;54(2):129–138. doi: 10.1007/BF02764938.
  20. Kaczynski T, Mischaikow K, Mrozek M. Computational Homology. New York: Springer; 2004.
  21. Kerber M, Schreiber H. Barcodes of towers and a streaming algorithm for persistent homology. In: Proceedings of 33rd International Symposium on Computational Geometry (SoCG); 2017. pp. 57:1–57:15.
  22. Kerber M, Sharathkumar R. Approximate Čech complex in low and high dimensions. In: Algorithms and Computation, 24th International Symposium (ISAAC); 2013. pp. 666–676.
  23. Khuller S, Matias Y. A simple randomized sieve algorithm for the closest-pair problem. Inf. Comput. 1995;118(1):34–37. doi: 10.1006/inco.1995.1049.
  24. Matoušek J. Bi-Lipschitz embeddings into low-dimensional Euclidean spaces. Commentationes Mathematicae Universitatis Carolinae. 1990.
  25. Munkres JR. Elements of Algebraic Topology. Milton Park: Westview Press; 1984.
  26. Rennie BC, Dobson AJ. On Stirling numbers of the second kind. J. Comb. Theory. 1969;7(2):116–121. doi: 10.1016/S0021-9800(69)80045-1.
  27. Sheehy D. Linear-size approximations to the Vietoris–Rips filtration. Discrete Comput. Geom. 2013;49(4):778–796. doi: 10.1007/s00454-013-9513-1.
  28. Wagner H, Chen C, Vuçini E. Efficient Computation of Persistent Homology for Cubical Data. Berlin: Springer; 2012. pp. 91–106.

Articles from Journal of Applied and Computational Topology are provided here courtesy of Springer
