Author manuscript; available in PMC: 2021 Dec 1.
Published in final edited form as: J Appl Comput Topol. 2020 Jul 29;4(4):481–507. doi: 10.1007/s41468-020-00057-9

Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis

Zixuan Cang 1, Elizabeth Munch 2,3, Guo-Wei Wei 4,5,6
PMCID: PMC8223814  NIHMSID: NIHMS1616503  PMID: 34179350

Abstract

While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.

Keywords: Evolutionary homology, Local property, Dynamical systems, Protein network

1. Introduction

Homology, a tool from traditional algebraic topology, provides an algebraic structure which encodes topological structures of different dimensions in a given space, such as connected components, closed loops, and other higher dimensional analogues [48]. To study topological invariants in a discrete data set, one uses the structure of the data set, such as pairwise distance information, to build a simplicial complex, which can be loosely thought of as a generalization of a graph, and then compute the homology of the complex. However, conventional homology is blind to scale, and thus retains too little geometric or physical information to be practically useful. Persistent homology, a new branch of algebraic topology, embeds multiscale information into topological invariants to achieve an interplay between geometry and topology [18,37,45,50,46,60].

Given a continuum of topological spaces, called a filtration, persistence encodes the changing homology as a proxy for the shape and size of the data set by keeping track of when homological features appear and disappear over the course of the filtration. This flexibility means that the choice of filtration allows the use of persistent homology to be tailored to the data set given and the question asked. As a result, it has been utilized for analysis of data sets arising from many different domains. For biology related areas [63], persistence has been used in bioinformatics [52,72,15,13], neuroscience [82,32,33,31], and protein folding [92,90,93,43].

The 0-dimensional version of persistent homology was originally introduced under the name “size function” [40,41,77–79]. The generalized persistent homology theory and a practical algorithm were formulated by Edelsbrunner et al. [38]; the algebraic foundation was subsequently established by Zomorodian and Carlsson [96]. Recently, there have been significant developments and generalizations of persistent homology methodology [8,28,30,25,22,20,81,19,94,11,69,36], further understanding of metrics and stability [27,24,29,12,34], and computational algorithms [67,35,59,84,62,7,6]. Persistent homology is often visualized by barcodes [23,45], where horizontal line segments called bars represent homology generators that survive over different filtration scales. The persistence diagram [37] is an equivalent representation that plots the births and deaths of the generators in a 2D plane.

Persistent homology is a versatile tool for data analysis. However, the difficulties inherent in the interpretation of the topological space of persistence barcodes [57,86,61] mean that the greatest success in combining these topological signatures with machine learning methods has come from turning persistence barcodes into features in a well-behaved space suitable for machine learning. Options for this procedure are quickly growing, and include persistence landscapes [10], algebraic constructions [2,21,51], persistence images [1,93,13], kernel methods [76,92], and tent functions [74]. In 2015, Cang et al. constructed one of the first topology-based machine learning algorithms and applied it to protein classification involving tens of thousands of proteins and hundreds of tasks [14]. This approach has been generalized to the prediction of protein-ligand binding affinity [16] and mutation-induced protein stability change [15], and further combined with convolutional neural networks and multi-task learning algorithms [17].

Although most persistent homology formulations are based on spatial data, such as point clouds, the use of homology for the analysis of dynamical systems and time series analysis predates and intertwines with the beginnings of persistent homology [50,58,44,4,78,77,79]. More recently, there has been increased interest in the combination of persistent homology with time series analysis [80]. Some common methods include computing the persistent homology of the Takens embedding [73,72,71,54,53,55], studying the sublevelset homology of movies [56,85], and working with the additional structure afforded by persistent cohomology [81,9,87]. Wang and Wei have defined temporal topological persistence via the solution of a time-dependent partial differential equation derived from differential geometry [88]. This method encodes spatial connectivity into temporal persistence in the Laplace-Beltrami flow, and offers accurate quantitative prediction of fullerene isomer stability in terms of total curvature energy for over 500 fullerene molecules. Stolz et al. have recently constructed persistent homology from time-dependent functional networks associated with coupled time series [83]. This work uses weight rank clique filtration over a defined parameter reflecting similarities between trajectories to characterize coupled dynamical systems.

All the aforementioned methods concern the global topology of a given data set, such as the topology of the point cloud of a protein. Topology is inherently a global concept and describes a data set as a whole. Such a global topology is useful for the global properties of the object under description, e.g., the band gap of a solid material, the binding affinity of an entire protein-ligand complex, and the solubility of a molecule. Relative homology has been applied to extract the global topology of a localized region [39] and has been used to compare maps [3]. However, in science, engineering, and other fields, it is often desirable to understand the local property of an individual component of an object, such as the topological property of a given atom in a molecule, an impurity in a solid, or a node in a network.

The objective of the present work is to introduce a new type of topological method, called evolutionary homology (EH). The proposed EH describes the topological properties of an individual component, as determined by the given individual and its adjacency in a data set. To this end, we embed the data into a dynamical system and systematically perturb each individual element (oscillator) of the dynamical system to generate a topological response, which is recorded as temporal persistence. Specifically, we build a simplicial complex filtration on the trajectories of a set of chaotic oscillators coupled via the interactions of a physical system to obtain temporal topological persistence. We are particularly interested in the encoding of the topological connectivity of a real physical system into the chaotic dynamical system and the decoding of physical properties from the EH. To this end, we regulate the dynamical system by a generalized graph Laplacian matrix defined on the physical system with a distinct geometric structure. As such, the regulation encodes the structural information into the time evolution of the dynamical system. We use two well-studied dynamical systems, the Lorenz system and the Rössler system, to facilitate the control and synchronization of chaotic oscillators by weighted graph Laplacian matrices. These dynamical systems are chosen for their simplicity, rich dynamics and well-known chaotic behaviors. We create machine learning features from the EH barcodes by using the Wasserstein and bottleneck metrics. The resulting outputs in various topological dimensions are directly correlated with the physical properties of the data set.

To demonstrate the quantitative analysis power of EH, we apply the present method to the prediction of protein thermal fluctuations characterized by experimental B-factors of Cα atoms. In this application, protein residues are represented by individual dynamical systems connected by a coupling matrix derived from given pairwise interactions of the residues. The protein flexibility is characterized by analyzing how the perturbations introduced to the systems are propagated and relaxed among oscillators, which create EH. We show that these coupled nonlinear dynamical systems provide more information compared to other methods. It is found that the present EH provides some of the most accurate B-factor predictions for a benchmark set of 364 proteins.

2. Methods

This section is devoted to the methods and algorithms. In Sec. 2.2, we give a brief discussion of coupled dynamical systems and their stability control via a correlation (coupling) matrix which embeds topological connectivity of a physical system into the dynamical system. We review persistent homology and persistence barcodes in Sec. 2.3. We formulate local topology or evolutionary homology on coupled dynamical systems in Sec. 2.4. Finally, we discuss the treatment of barcodes, the associated metrics, and the methods for learning in Sec. 2.5.

2.1. Overview

We aim to extract topological information from a coupled dynamical system for the prediction of its physical properties. In the coupled dynamical system, all objects are represented by the same set of mathematical rules. We assume that a measurement of pairwise interactions is given a priori. This pairwise interaction induces couplings among the individual objects such as atoms on a protein which leads to the synchronization of the system if the coupling is sufficiently strong. Then a perturbation is applied to an individual object which will be propagated through the coupled system and finally relax to the synchronous state. We define simplicial complexes and algebraic groups on the dynamical motion or trajectories of the coupled system. The time evolution plays the role of filtration and allows us to further define evolutionary homology. The resulting topological persistence over time enables us to predict the physical properties of the embedded system, such as protein flexibility, protein-protein binding interactions, and the affinity of protein-drug binding.

Protein flexibility analysis is considered as a specific application to illustrate and validate our approach. Protein flexibility is an important property that strongly correlates with many protein functions, such as reactivity, allosteric signaling, DNA binding specificity, Alzheimer’s disease, etc. In our formulation, every protein residue is represented by a nonlinear oscillator. The pairwise interaction among protein residues is characterized by a spatial distance valued graph Laplacian function. The method introduced in this work describes the formation and change of high order topological invariants and how they quantify protein residue flexibility. This approach has been shown to provide more accurate flexibility prediction than current state-of-the-art methods.

2.2. Coupled dynamical systems

The time evolution of complex phenomena is often described by dynamical systems, i.e., mathematical models built on differential equations for continuous dynamical systems or on difference equations for discrete dynamical systems. Most dynamical systems have their origins in Newtonian mechanics. However, these mathematical models typically only admit highly reduced descriptions of the original complex physical systems, and thus their continuous forms do not have to satisfy the Euler-Lagrange equation of the least action principle. Although a low-dimensional dynamical system is not expected to describe the full dynamics of a complex physical system, its long-term behavior, such as the existence of steady states (i.e., fixed points) and/or chaotic states, offers a qualitative understanding of the underlying system. Focused on ergodic systems, dynamic mappings, bifurcation theory, and chaos theory, the study of dynamical systems is a mathematical subject in its own right, drawing on analysis, geometry, and topology. Dynamical systems are motivated by real-world applications, having a wide range of applications to physics, chemistry, biology, medicine, engineering, economics, and finance.

The dynamical systems employed in this work are well-known chaotic oscillators, namely the Lorenz system and the Rössler system. These systems are selected for the following reasons. 1) They have well-known chaotic behavior. For certain parameter regions and initial conditions, these systems admit chaotic behavior resembling real world phenomena. This information is used to encode the interactions of the physical system. 2) The chaoticity of the Lorenz system and the Rössler system can be easily controlled via a coupling strategy [66,49,89,91] which enables us to appropriately design the proposed EH method. 3) Although the dynamics of the Lorenz system and the Rössler system are quite rich, they are easy to compute. Therefore, interactions of physical systems, such as protein residue-residue interactions and protein-ligand interactions, can be easily encoded to regulate their chaotic dynamics. The resulting dynamics are used in the computation of persistence.

2.2.1. Systems configuration

A brief review is given to establish notation and facilitate our topological formulation, largely following the work of Hu et al. [49] and Xia and Wei [91]. We consider a system with N objects, such as N atoms in a molecule or N neurons in a brain. We regard each object as an n-dimensional dynamical system, i.e., an n-dimensional oscillator. As such, the internal dynamics of N objects is governed by

$$\frac{du_i}{dt} = g(u_i), \quad i = 1, 2, \cdots, N,$$

where $u_i = \{u_{i,1}, u_{i,2}, \cdots, u_{i,n}\}^T$ is a column vector of size $n$.

In reality, objects are interacting with each other. As a result, there are external couplings among objects. The coupling of the objects can be very general. We consider an N ×N graph Laplacian matrix A defined for pairwise interactions

$$A_{ij} = \begin{cases} I(i,j), & i \neq j, \\ -\sum_{l \neq i} A_{il}, & i = j, \end{cases}$$

where I(i, j) is a value describing the strength of influence on the ith object induced by the jth object. We assume undirected graph edges, so I(i, j) = I(j, i).

For specific application to protein flexibility, we consider a set of $N$ atoms at positions $\{r_i \in \mathbb{R}^3\}_{i=1}^N$. Then, $I(i, j)$ represents non-covalent interactions between the $i$th atom and the $j$th atom and can be well-approximated by a radial basis function defined via the Euclidean distance between them [91].
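As a minimal sketch of the definition of $A$ above, the following Python snippet fills the off-diagonal entries from a distance-based kernel and sets each diagonal entry so that the row sums to zero. The function names, the sample points, and the generalized exponential kernel with parameters `mu` and `kappa` are illustrative assumptions; the text only states that a radial basis function of the Euclidean distance is used.

```python
import math
from typing import Callable, List, Sequence

def coupling_matrix(points: Sequence[Sequence[float]],
                    kernel: Callable[[float], float]) -> List[List[float]]:
    """Build A with A[i][j] = I(i, j) off the diagonal and zero row sums."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # I(i, j) depends only on the Euclidean distance, so A is symmetric.
                a[i][j] = kernel(math.dist(points[i], points[j]))
        # Diagonal entry: minus the sum of the other entries in the row.
        a[i][i] = -sum(a[i][l] for l in range(n) if l != i)
    return a

def exp_kernel(d: float, mu: float = 8.0, kappa: float = 2.0) -> float:
    """Assumed radial basis function exp(-(d / mu) ** kappa); not taken from the text."""
    return math.exp(-((d / mu) ** kappa))

# Hypothetical coordinates for three atoms.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 8.0)]
A = coupling_matrix(points, exp_kernel)
```

Any other symmetric, distance-based kernel could be passed in place of `exp_kernel` without changing the construction.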

Let $u = \{u_1, u_2, \cdots, u_N\}^T$ be a column vector (of size $Nn$) with $u_i = \{u_{i,1}, u_{i,2}, \cdots, u_{i,n}\}^T$. The coupled system is an $(N \times n)$-dimensional dynamical system

$$\frac{du}{dt} = G(u) + \epsilon (A \otimes \Gamma) u, \tag{1}$$

where $G(u) = \{g(u_1), g(u_2), \cdots, g(u_N)\}^T$, $\epsilon$ is a parameter, and $\Gamma$ is an $n \times n$ predefined linking matrix. Weights are used so that the interaction strength between the objects represented by the oscillators can be quantitatively described. The term $(A_i \otimes \Gamma)u$, with $A_i$ the $i$th row of $A$, describes the difference between oscillator $i$ and the other oscillators. Since the rows of $A$ add up to 0, the coupling term vanishes on the synchronized state, and the oscillators will reach this state given enough coupling strength.
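To make the Kronecker-product term in Eq. (1) concrete, here is a small pure-Python sketch of the right-hand side. It uses the fact that the $i$th block of $(A \otimes \Gamma)u$ equals $\sum_j A_{ij} \Gamma u_j$. The function name `coupled_rhs` and the toy two-oscillator setup are assumptions made for illustration only.

```python
from typing import Callable, List

Vector = List[float]

def coupled_rhs(u: List[Vector], g: Callable[[Vector], Vector],
                A: List[List[float]], Gamma: List[List[float]],
                eps: float) -> List[Vector]:
    """Evaluate G(u) + eps * (A kron Gamma) u, one block per oscillator."""
    N, n = len(u), len(u[0])
    # Gamma applied to every oscillator state.
    Gu = [[sum(Gamma[r][c] * u_j[c] for c in range(n)) for r in range(n)]
          for u_j in u]
    out = []
    for i in range(N):
        gi = g(u[i])
        # i-th block of (A kron Gamma) u is sum_j A[i][j] * (Gamma u_j).
        coupling = [sum(A[i][j] * Gu[j][r] for j in range(N)) for r in range(n)]
        out.append([gi[r] + eps * coupling[r] for r in range(n)])
    return out

# Toy setup: two oscillators, no internal dynamics, identity linking matrix.
A = [[-1.0, 1.0], [1.0, -1.0]]
Gamma = [[1.0, 0.0], [0.0, 1.0]]
no_dynamics = lambda x: [0.0, 0.0]
rhs = coupled_rhs([[1.0, 0.0], [3.0, 0.0]], no_dynamics, A, Gamma, eps=0.5)
# rhs == [[1.0, 0.0], [-1.0, 0.0]]: the coupling pulls the two states together.
```

When all oscillators share the same state, the zero row sums of `A` make the coupling term vanish and each block reduces to `g(u_i)`.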

The Lorenz attractor is described by

$$g_l(u_i) = \begin{bmatrix} \delta(u_{i,2} - u_{i,1}) \\ u_{i,1}(\gamma - u_{i,3}) - u_{i,2} \\ u_{i,1}u_{i,2} - \beta u_{i,3} \end{bmatrix} \tag{2}$$

where δ, γ, and β are parameters determining the state of the Lorenz oscillator. The Rössler attractor is governed by

$$g_r(u_i) = \begin{bmatrix} -u_{i,2} - u_{i,3} \\ u_{i,1} + a u_{i,2} \\ b + u_{i,3}(u_{i,1} - c) \end{bmatrix} \tag{3}$$

where a, b, c are model parameters. Both the Lorenz attractor and the Rössler attractor have three components and three parameters, but they have different phase space structures and chaotic behaviors.
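The two vector fields in Eqs. (2) and (3) translate directly into code. The sketch below also includes a generic fourth-order Runge-Kutta step for integrating a single uncoupled oscillator. The Lorenz defaults follow the parameter values quoted for Fig. 1; the Rössler defaults (a = b = 0.2, c = 5.7) are commonly used textbook values and are an assumption here, as are all function names.

```python
import math
from typing import Callable, List

Vector = List[float]

def lorenz(u: Vector, delta: float = 10.0, gamma: float = 60.0,
           beta: float = 8.0 / 3.0) -> Vector:
    """Right-hand side of the Lorenz system, Eq. (2)."""
    x, y, z = u
    return [delta * (y - x), x * (gamma - z) - y, x * y - beta * z]

def rossler(u: Vector, a: float = 0.2, b: float = 0.2, c: float = 5.7) -> Vector:
    """Right-hand side of the Rossler system, Eq. (3); defaults are assumed values."""
    x, y, z = u
    return [-y - z, x + a * y, b + z * (x - c)]

def rk4_step(g: Callable[[Vector], Vector], u: Vector, h: float) -> Vector:
    """Advance du/dt = g(u) by one classical fourth-order Runge-Kutta step."""
    k1 = g(u)
    k2 = g([ui + 0.5 * h * ki for ui, ki in zip(u, k1)])
    k3 = g([ui + 0.5 * h * ki for ui, ki in zip(u, k2)])
    k4 = g([ui + h * ki for ui, ki in zip(u, k3)])
    return [ui + h / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            for ui, a1, a2, a3, a4 in zip(u, k1, k2, k3, k4)]

# Integrate one uncoupled Lorenz oscillator for a few steps (arbitrary start).
state = [1.0, 1.0, 1.0]
for _ in range(100):
    state = rk4_step(lorenz, state, 0.001)
```

A quick sanity check on `lorenz` is that it vanishes at the nontrivial fixed points $(\pm r, \pm r, \gamma - 1)$ with $r = \sqrt{\beta(\gamma - 1)}$.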

2.2.2. Stability and controllability

Let s(t) satisfy ds/dt = g(s). We say the coupled systems are in a synchronous state if

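$$u_1(t) = u_2(t) = \cdots = u_N(t) = s(t).$$

The stability can be analyzed using $v = \{u_1 - s, u_2 - s, \cdots, u_N - s\}^T$ with the following equation obtained by linearizing Eq. (1)

$$\frac{dv}{dt} = [I_N \otimes Dg(s) + \epsilon (A \otimes \Gamma)] v, \tag{4}$$

where $I_N$ is the $N \times N$ identity matrix and $Dg(s)$ is the Jacobian of $g$ evaluated at $s$.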

The stability of the synchronous state in Eq. (4) can be studied by eigenvalue analysis of graph Laplacian A. Since the graph Laplacian A for undirected graph is symmetric, it only admits real eigenvalues. After diagonalizing A as

$$A\phi_j = \lambda_j \phi_j, \quad j = 1, 2, \cdots, N,$$

where $\lambda_j$ is the $j$th eigenvalue and $\phi_j$ is the $j$th eigenvector, $v$ can be represented by

$$v = \sum_{j=1}^{N} \phi_j \otimes w_j(t).$$

Then, the original problem on the coupled systems of dimension N ×n can be studied independently on the n-dimensional systems

$$\frac{dw_j}{dt} = (Dg(s) + \epsilon \lambda_j \Gamma) w_j, \quad j = 1, 2, \cdots, N. \tag{5}$$

Let $L_{max}$ be the largest Lyapunov characteristic exponent of the $j$th system governed by Eq. (5). It can be decomposed as $L_{max} = L_g + L_c$, where $L_g$ is the largest Lyapunov exponent of the system $ds/dt = g(s)$ and $L_c$ depends on $\lambda_j$ and $\Gamma$. In many numerical experiments carried out in this work, we set $\Gamma = I_n$, the $n \times n$ identity matrix, for which $L_c = \epsilon \lambda_j$. Since the largest eigenvalue $\lambda_1 = 0$ corresponds to motion along the synchronous state itself, the stability of the coupled systems is then determined by the second largest eigenvalue $\lambda_2$. The critical coupling strength $\epsilon_0$ can, therefore, be derived as $\epsilon_0 = L_g/(-\lambda_2)$. A requirement for the coupled systems to synchronize is that $\epsilon > \epsilon_0$, while $\epsilon \le \epsilon_0$ causes instability.
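The critical coupling strength can be illustrated on a graph whose spectrum is known in closed form. The sketch below assumes $\Gamma = I_n$ and a ring of $N$ oscillators with unit weights, for which the eigenvalues of $A$ are $-(2 - 2\cos(2\pi k/N))$. The value used for the Lyapunov exponent is a hypothetical placeholder, not a value reported in this work, and the function names are illustrative.

```python
import math
from typing import Iterable, List

def ring_eigenvalues(n: int) -> List[float]:
    """Eigenvalues of A for a unit-weight ring graph (zero row sums, so all are <= 0)."""
    return [-(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)) for k in range(n)]

def critical_coupling(eigenvalues: Iterable[float], lyap_g: float) -> float:
    """Return eps_0 = L_g / (-lambda_2), valid under the assumption Gamma = I_n."""
    ordered = sorted(eigenvalues, reverse=True)   # lambda_1 = 0 comes first
    lam2 = ordered[1]
    if lam2 >= -1e-12:
        raise ValueError("lambda_2 must be negative; is the graph connected?")
    return lyap_g / (-lam2)

LYAP_G = 0.9   # hypothetical largest Lyapunov exponent of ds/dt = g(s)
eps0 = critical_coupling(ring_eigenvalues(4), LYAP_G)
# For the 4-ring, lambda_2 = -2, so eps0 = 0.45 with the placeholder above.
```

Better connected graphs have a more negative $\lambda_2$ and therefore synchronize at a smaller coupling strength.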

An example of chaos controlled by coupling is shown in Fig. 1. In this example, each alpha carbon atom (Cα) of protein PDB:1E68 is associated with a Lorenz oscillator and the underlying locations of the oscillators are used to construct the coupling matrix. The specific coupling matrix A = Ageo + Aseq used in this example is a sum of a graph Laplacian matrix defined using the geometric coupling,

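$$A^{geo}_{ij} = \begin{cases} 1, & \text{if } i \neq j \text{ and } d^{org}_{ij} < \epsilon_d, \\ -\sum_{l \neq i} A^{geo}_{il}, & i = j, \end{cases}$$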

and another which takes the amino acid sequence into account,

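$$A^{seq}_{ij} = \begin{cases} \epsilon_{seq}, & \text{if } (i + 1 + N) \bmod N = j, \\ \epsilon_{seq}, & \text{if } (i - 1 + N) \bmod N = j, \\ 0, & \text{otherwise.} \end{cases}$$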

Here, $d^{org}$ is the distance function in the original space; that is, the Euclidean distance between atoms in this example. The mod operator is used because the protein in this example is circular. The parameters used for the example of Fig. 1 are $\epsilon_{seq} = 0.7$ for sequence coupling, $\epsilon_d = 4$ Å for the spatial cutoff, and $\delta = 10$, $\gamma = 60$, and $\beta = 8/3$ for the Lorenz system. The parameters in Eq. (1) are $\epsilon = 10$ and

$$\Gamma = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Initial values for all oscillators are randomly chosen.
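The coupling matrix of this example can be assembled as sketched below. The code follows the two formulas as written, with 0-based indices; in particular, the sequence term contributes only off-diagonal entries. The function names and the four-point test geometry are hypothetical and stand in for the actual Cα coordinates of the protein.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def geometric_coupling(points: Sequence[Sequence[float]], eps_d: float) -> Matrix:
    """A^geo: 1 for distinct points closer than eps_d, zero row sums via the diagonal."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < eps_d:
                a[i][j] = 1.0
        a[i][i] = -sum(a[i][l] for l in range(n) if l != i)
    return a

def sequence_coupling(n: int, eps_seq: float) -> Matrix:
    """A^seq: eps_seq between cyclic sequence neighbours, 0 elsewhere."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1 + n) % n] = eps_seq
        a[i][(i - 1 + n) % n] = eps_seq
    return a

def total_coupling(points: Sequence[Sequence[float]], eps_d: float,
                   eps_seq: float) -> Matrix:
    """A = A^geo + A^seq, added entry by entry."""
    geo = geometric_coupling(points, eps_d)
    seq = sequence_coupling(len(points), eps_seq)
    return [[g + s for g, s in zip(row_g, row_s)] for row_g, row_s in zip(geo, seq)]

# Hypothetical circular chain: corners of a square with 3 Angstrom sides.
square = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0), (0.0, 3.0, 0.0)]
A = total_coupling(square, eps_d=4.0, eps_seq=0.7)
```

In this toy geometry the diagonal of the square is longer than the cutoff, so only sequence neighbours are coupled.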

Fig. 1.

(a) Chaotic trajectory of one oscillator without coupling. (b) The 70 synchronized oscillators associated with the carbon Cα atoms of protein PDB:1E68 are plotted together.

2.3. Homology analysis preliminary

In this section, we review the TDA background that is essential for us to establish notations and facilitate our formulations. The interested reader can find further specifics in, e.g., Carlsson [18], or Edelsbrunner and Harer [37].

2.3.1. Simplicial complex and homology

Topological spaces can be approximated, represented, and discretized by simplicial complexes. An (abstract) simplicial complex is a (finite) collection of sets $K = \{\sigma_i\}_i$ where each $\sigma_i$ is a subset of a (finite) set $K_0$ called the vertex set. We require that this collection satisfies the following condition: if $\sigma_i \in K$ and $\tau$ is a face of $\sigma_i$ (that is, if $\tau \subseteq \sigma_i$, commonly denoted $\tau \le \sigma_i$), then $\tau \in K$. If $\sigma_i$ has $k + 1$ vertices, $\{v_0, v_1, \cdots, v_k\}$, where every pair of vertices is nonequivalent, $\sigma_i$ is called a $k$-simplex. The $k$-skeleton of a simplicial complex $K$ is the subcomplex of $K$ consisting of simplices of dimension $k$ and below. See Fig. 2 for an example.

Fig. 2.

Examples of simplices of different dimensions (left), and a simplicial complex with a function given on the vertices and edges (middle). The barcode for the given function is drawn at right.

The homology group for a fixed simplicial complex gives a topological characterization which encodes holes of different dimensions. Homology groups are built using linear transformations called boundary operators. A $k$-chain of the simplicial complex $K$ is a finite formal sum of the $k$-simplices in $K$, $\alpha = \sum a_i \sigma_i$ with coefficients $a_i \in \mathbb{Z}_2$. The group of all $k$-chains with addition given by the addition of the coefficients is called the $k$th chain group and is denoted by $C_k(K)$ or simply $C_k$ when the choice of complex is obvious. Note that because $\mathbb{Z}_2$ is a field, $C_k(K)$ is, in fact, a vector space.

The boundary operator $\partial_k : C_k \to C_{k-1}$ is the linear transformation generated by mapping any $k$-simplex to the sum of its codim-1 faces; namely,

$$\partial_k(\{v_0, v_1, \cdots, v_k\}) = \sum_{i=0}^{k} \{v_0, \cdots, \hat{v}_i, \cdots, v_k\},$$

where $\hat{v}_i$ means that $v_i$ is absent. The $k$th cycle group, $Z_k(K)$, is the kernel of the boundary operator $\partial_k$ with elements called $k$-cycles. The $k$th boundary group, $B_k(K)$, is the image of the boundary operator $\partial_{k+1}$ and its elements are called $k$-boundaries. Since $\partial_k \partial_{k+1} = 0$, $B_k(K)$ is a subgroup of $Z_k(K)$. Thus we can define the $k$th homology group, $H_k(K)$, to be the quotient group $Z_k(K)/B_k(K)$. Each equivalence class in $H_k(K)$ can be thought of as corresponding to a $k$-dimensional “loop” in $K$ going around a $(k + 1)$-dimensional “hole”: 1-dimensional classes give information about loops going around 2D voids, 2-dimensional classes give information about enclosures of 3D voids, etc. While the analogy is not as nice, 0-dimensional classes give information about connected components of the space.
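Because the chain groups are vector spaces over $\mathbb{Z}_2$, the dimension of $H_k(K)$ follows from the ranks of the boundary operators: $\dim H_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. The following sketch computes these Betti numbers for a small complex, storing each boundary column as an integer bitmask. It assumes simplices are given as sorted tuples and that the input is closed under taking faces; the function names are illustrative.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Simplex = Tuple[int, ...]

def rank_gf2(rows: Iterable[int]) -> int:
    """Rank over Z_2 of a matrix whose rows are integer bitmasks."""
    pending, rank = list(rows), 0
    while pending:
        pivot = pending.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot          # lowest set bit of the pivot row
            pending = [r ^ pivot if r & low else r for r in pending]
    return rank

def betti_numbers(simplices: Iterable[Simplex], max_dim: int) -> List[int]:
    """Betti numbers b_0..b_max_dim of a face-closed complex, coefficients in Z_2."""
    simplices = list(simplices)
    by_dim = {k: sorted(s for s in simplices if len(s) == k + 1)
              for k in range(max_dim + 2)}
    index = {k: {s: i for i, s in enumerate(by_dim[k])} for k in by_dim}

    def boundary_rank(k: int) -> int:
        if k == 0 or not by_dim[k]:
            return 0
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):   # codim-1 faces of a k-simplex
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        return rank_gf2(rows)

    return [len(by_dim[k]) - boundary_rank(k) - boundary_rank(k + 1)
            for k in range(max_dim + 1)]

hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled = hollow + [(0, 1, 2)]
```

The hollow triangle has one component and one loop, while filling in the 2-simplex removes the loop.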

2.3.2. Filtration of a simplicial complex and persistent homology

We now turn to the case where we have a changing simplicial complex and want to understand something about its structure. Consider a finite simplicial complex $K$ and let $f$ be a real-valued function on the simplices of $K$ which satisfies the following: $f(\tau) \le f(\sigma)$ for all simplices $\tau \le \sigma$ in $K$. We will refer to this function as the filtration function. For any $x \in \mathbb{R}$, the sublevelset of $K$ associated to $x$ is defined as

$$K(x) = \{\sigma \in K \mid f(\sigma) \le x\}.$$

Note first that because of our assumptions on $f$, $K(x)$ is always a simplicial complex, and second that $K(x) \subseteq K(y)$ for any $x \le y$. Further, as $x$ varies, $K(x)$ only changes at the function values defined on the simplices. Since $K$ is assumed to be finite, let $\{x_1 < x_2 < \cdots < x_\ell\}$ be the sorted range of $f$. The filtration of $K$ with respect to $f$ is the ordered sequence of its subcomplexes,

$$K(x_1) \subseteq K(x_2) \subseteq \cdots \subseteq K(x_\ell) = K. \tag{6}$$

The filtration of a simplicial complex sets the stage for a thorough topological examination of the space under multiple scales of the filtration parameter which is the output value of the filtration function f. Our choice of the filtration function f for coupled dynamical systems will be given in Sec. 2.4.2.

We are interested in studying the structure of a filtration like that of Eq. (6). Functoriality of homology means that such a sequence of inclusions induces linear transformations on the sequence of vector spaces

$$H_k(K(x_1)) \to H_k(K(x_2)) \to \cdots \to H_k(K(x_\ell)). \tag{7}$$

Persistent homology not only characterizes each frame in the filtration $\{K(x_i)\}_i$, but also tracks the appearance and disappearance (commonly referred to as births and deaths) of nontrivial homology classes as the filtration progresses. A collection of vector spaces $\{V_i\}$ and linear transformations $f_i : V_i \to V_{i+1}$ is called a persistence module, of which Eq. (7) is an example. It is a special case of a much more general theorem of Gabriel [42] that sufficiently nice persistence modules can be decomposed uniquely into a finite collection of interval modules [26,68]. An interval module $I_{[b,d)}$ is a persistence module for which $V_i = \mathbb{Z}_2$ if $i \in [b, d)$ and 0 otherwise; and $f_i$ is the identity when possible, and 0 otherwise.

Therefore, given the persistence module of Eq. (7), we can decompose it as $\bigoplus_{[b,d) \in B_k} I_{[b,d)}$, and thus fully represent the algebraic information by the discrete collection $B_k$. These intervals exactly encode when homology classes appear and disappear in the persistence module. The collection of such intervals can be visualized by plotting points in the 2D half plane $\{(x, y) \mid y \ge x\}$, which is known as a persistence diagram; or by stacking the horizontal intervals, which is known as a barcode. In this paper, for no reason other than convenience, we represent our information using barcodes. We call the barcode resulting from a sequence of trivial homology groups the empty barcode and denote it by ∅. Thus, for every interval $[b, d) \in B_k$, we call $b$ the birth time and $d$ the death time.
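For small complexes, the barcode can be computed with the standard column reduction of the boundary matrix over $\mathbb{Z}_2$. The sketch below is a textbook version of that reduction, not the optimized implementation used later in this work. It assumes the filtration function is supplied as a dictionary from simplices (sorted tuples) to values and is monotone with respect to faces; all names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Simplex = Tuple[int, ...]
Bar = Tuple[float, float]

def persistence_barcodes(f: Dict[Simplex, float]) -> Dict[int, List[Bar]]:
    """Barcodes by homology dimension for a filtered, face-closed complex."""
    # Faces must precede cofaces: sort by value, then by dimension.
    order = sorted(f, key=lambda s: (f[s], len(s), s))
    pos = {s: i for i, s in enumerate(order)}
    columns: List[set] = []
    pivot_owner: Dict[int, int] = {}      # lowest row index -> reduced column
    paired = set()
    bars: Dict[int, List[Bar]] = {}
    for j, s in enumerate(order):
        col = set()
        if len(s) > 1:
            col = {pos[face] for face in combinations(s, len(s) - 1)}
        # Add earlier reduced columns (mod 2) until the lowest entry is new.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        columns.append(col)
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth, death = f[order[i]], f[s]
            if death > birth:             # skip zero-length bars
                bars.setdefault(len(order[i]) - 1, []).append((birth, death))
    for j, s in enumerate(order):
        if j not in paired:               # classes that never die
            bars.setdefault(len(s) - 1, []).append((f[s], float("inf")))
    return bars

# Hypothetical filtration of a triangle: the loop is born at 3 and filled at 5.
f = {(0,): 0.0, (1,): 0.0, (2,): 0.0,
     (0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.0, (0, 1, 2): 5.0}
bars = persistence_barcodes(f)
```

In this toy example the 0-dimensional barcode records two components merging at times 1 and 2 plus one component that persists, and the 1-dimensional barcode contains the single bar [3, 5).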

2.4. Evolutionary homology and its barcode representation

2.4.1. Kinematics

Consider a system of $N$ not yet synchronized oscillators $\{u_1, \cdots, u_N\}$ associated to a collection of $N$ embedded points, $\{r_1, \cdots, r_N\} \subset \mathbb{R}^d$. We assume the global synchronized state is a periodic orbit denoted $s(t)$ for $t \in [t_0, t_1]$ where $s(t_0) = s(t_1)$. For flexibility and generality, we work on post-processed trajectories obtained by applying a transformation function on the original trajectories, $\hat{u}_i(t) := T(u_i(t))$. The choice of function $T$ is flexible and should fit the applications; in this work, we choose

$$T(u_i(t)) = \min_{t' \in [t_0, t_1]} \| u_i(t) - s(t') \|_2, \tag{8}$$

which gives 1-dimensional trajectories for simplicity. Further, in our specific example, $\hat{s}(t) := T(s(t)) = 0$, but, again, this is not necessary in general.

We wish to study the effects on the synchronized system of $N$ oscillators (an $(N \times 3)$-dimensional system) after perturbing one oscillator of interest. To this end, we set the initial values of all the oscillators except that of the $i$th oscillator to $s(\bar{t})$ for a fixed $\bar{t} \in [t_0, t_1]$. The initial value of the $i$th oscillator is set to $\rho(s(\bar{t}))$ where $\rho$ is a predefined function playing the role of introducing disturbance to the system. After the system starts running, some oscillators will be dragged away from and then go back to the periodic orbit as the disturbance is propagated and relaxed through the system. Let $\hat{u}^i_j(t)$ denote the modified trajectory of the $j$th oscillator after perturbing the $i$th oscillator at $t = 0$. We focus on the subset of nodes that are affected by the perturbation,

$$V_i = \left\{ n_j \;\middle|\; \max_{t > 0} \left\{ \min_{t' \in [t_0, t_1]} \| \hat{u}^i_j(t) - \hat{s}(t') \|_2 \right\} \ge \epsilon_p \right\}$$

for some fixed ϵp determining how much deviation from synchronization constitutes “being affected”.
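On sampled data, the transformation of Eq. (8) and the selection of affected nodes can be sketched as follows. The code assumes the periodic orbit is available as a finite list of sample points and uses the simplification from the text that the transformed synchronized state is zero, so a node is affected when its transformed trajectory reaches the threshold. The unit-circle orbit, the sample trajectories, and the function names are made up for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def transform(trajectory: Sequence[Point], orbit: Sequence[Point]) -> List[float]:
    """Eq. (8) on samples: distance from each trajectory point to the sampled orbit."""
    return [min(math.dist(u, s) for s in orbit) for u in trajectory]

def affected_nodes(uhat: Sequence[Sequence[float]], eps_p: float) -> List[int]:
    """Indices j whose transformed trajectory reaches at least eps_p at some time."""
    return [j for j, series in enumerate(uhat) if max(series) >= eps_p]

# Hypothetical periodic orbit: the unit circle sampled at 360 points.
orbit = [(math.cos(2.0 * math.pi * k / 360), math.sin(2.0 * math.pi * k / 360))
         for k in range(360)]
# Two made-up trajectories relaxing back to the orbit.
trajectories = [
    [(2.0, 0.0), (1.5, 0.0), (1.0, 0.0)],   # strongly perturbed
    [(1.1, 0.0), (1.0, 0.0), (1.0, 0.0)],   # barely perturbed
]
uhat = [transform(traj, orbit) for traj in trajectories]
V = affected_nodes(uhat, eps_p=0.5)
```

Only the first trajectory deviates from the orbit by at least the threshold, so only node 0 is retained.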

2.4.2. Filtration function defined for coupled dynamical systems

Assuming we have perturbed the oscillator for node $n_i$, let $M = |V_i|$. We will now construct a function $f_i$ on the complete simplicial complex with $M$ vertices, denoted by $K$ or $K_M$. Here, we abuse notation and write $V_i = \{n_1, \cdots, n_M\}$. The filtration function $f_i : K_M \to \mathbb{R}$ is built to take into account the temporal pattern of the propagation of the perturbation through the coupled systems and the relaxation (going back to synchronization) of the coupled systems. It requires the advance choice of three parameters:

  • ϵp ≥ 0, mentioned above, which determines when a trajectory is far enough from the global synchronized state, s(t) to be considered unsynchronized,

  • ϵsync ≥ 0 which controls when two trajectories are close enough to be considered synchronized with each other, and

  • ϵd ≥ 0 which is a distance parameter in the space where the points ri are embedded, giving control on when the objects represented by the oscillators are far enough apart to be considered insignificant to each other.

We will define the function fi by giving its value on simplices in the order of increasing dimension. Define

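$$t^i_{sync} = \min\left\{ t \;\middle|\; \int_t^\infty \| \hat{u}^i_j(t') - \hat{u}^i_k(t') \|_2 \, dt' \le \epsilon_{sync}/2, \ \forall j, k \right\}.$$

That is, $t^i_{sync}$ is the first time at which all oscillators have returned to the global synchronized state after perturbing the $i$th oscillator. The value of the filtration function for the vertex $n_j$ is defined as

$$f_i(n_j) = \min\left\{ \left\{ t \;\middle|\; \min_{t' \in [t_0, t_1]} \| \hat{u}^i_j(t) - \hat{s}(t') \|_2 \ge \epsilon_p \right\} \cup \left\{ t^i_{sync} \right\} \right\}. \tag{9}$$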

Next, we give the function value $f_i$ for the edges of $K$. To avoid the involvement of any insignificant interaction between oscillators, an edge between $n_j$ and $n_k$, denoted by $e_{jk}$, is allowed in the earlier stage of the filtration only if $d^{org}_{jk} \le \epsilon_d$, where $d^{org}_{jk}$ is the distance between $r_j$ and $r_k$ in $\mathbb{R}^d$. Specifically, the value of the filtration function for the edge $e_{jk}$ is defined as

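$$f_i(e_{jk}) = \begin{cases} \max\left\{ \min\left\{ t \;\middle|\; \int_t^\infty \| \hat{u}^i_j(t') - \hat{u}^i_k(t') \|_2 \, dt' \le \epsilon_{sync} \right\}, f_i(n_j), f_i(n_k) \right\}, & \text{if } d^{org}_{jk} \le \epsilon_d, \\ t^i_{sync}, & \text{if } d^{org}_{jk} > \epsilon_d. \end{cases} \tag{10}$$

It should be noted that to this point, $f_i$ defines a filtration function because when $d^{org}_{jk} \le \epsilon_d$, $f_i(n_j) \le f_i(e_{jk})$ according to the definition given in Eq. (10). The property also holds when $d^{org}_{jk} > \epsilon_d$ because $f_i(n_j) \le t^i_{sync}$ according to the definition in Eq. (9) and $f_i(e_{jk})$ equals $t^i_{sync}$ in this case.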

We extend the function to the higher dimensional simplices using the definition on the 1-skeleton. A simplex σ of dimension higher than one is included in K(x) if all of its 1-dimensional faces are already included; that is, its filtration value is defined iteratively by dimension as

$$f_i(\sigma) = \max_{\tau \le \sigma} f_i(\tau),$$

where the max is taken over all codim-1 faces of $\sigma$. Taking the filtration of $K$ using this function (cf. Eq. (6)) means that topological changes only occur at the collection of function values $\{f_i(n_j)\}_j \cup \{f_i(e_{jk})\}_{j \neq k}$. Fig. 3 shows the filtration constructed for an example consisting of three trajectories.
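The construction of this subsection can be sketched on uniformly sampled 1-dimensional trajectories. The code below approximates the tail integrals by Riemann sums that stop at the end of the simulated window, which assumes the simulation runs long enough for the trajectories to re-synchronize. It also uses the threshold $\epsilon_{sync}/2$ for the global synchronization time, and treats the transformed synchronized state as zero. The function names and the three sample trajectories are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Simplex = Tuple[int, ...]

def tail_areas(a: Sequence[float], b: Sequence[float], h: float) -> List[float]:
    """tail[m] approximates the area between a and b from sample m onward."""
    tail = [0.0] * (len(a) + 1)           # final entry: no area after the window
    for m in range(len(a) - 1, -1, -1):
        tail[m] = tail[m + 1] + abs(a[m] - b[m]) * h
    return tail

def first_below(tail: Sequence[float], bound: float) -> int:
    """First sample index at which the remaining area is at most `bound`."""
    return next(m for m, area in enumerate(tail) if area <= bound)

def eh_filtration(uhat: Sequence[Sequence[float]], dist: Sequence[Sequence[float]],
                  h: float, eps_p: float, eps_sync: float, eps_d: float,
                  max_dim: int = 2) -> Tuple[Dict[Simplex, float], float]:
    """Filtration values on the complete complex over the given trajectories."""
    M = len(uhat)
    tails = {(j, k): tail_areas(uhat[j], uhat[k], h)
             for j, k in combinations(range(M), 2)}
    t_sync = h * max((first_below(t, eps_sync / 2.0) for t in tails.values()),
                     default=0)
    f: Dict[Simplex, float] = {}
    for j in range(M):                    # vertices, Eq. (9)
        hit = next((m for m, v in enumerate(uhat[j]) if v >= eps_p), None)
        f[(j,)] = t_sync if hit is None else min(h * hit, t_sync)
    for (j, k), tail in tails.items():    # edges, Eq. (10)
        if dist[j][k] <= eps_d:
            f[(j, k)] = max(h * first_below(tail, eps_sync), f[(j,)], f[(k,)])
        else:
            f[(j, k)] = t_sync
    for dim in range(2, max_dim + 1):     # higher simplices: max over codim-1 faces
        for s in combinations(range(M), dim + 1):
            f[s] = max(f[face] for face in combinations(s, dim))
    return f, t_sync

# Three made-up transformed trajectories sampled with step h = 1.
uhat = [[2, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
f, t_sync = eh_filtration(uhat, dist, h=1.0, eps_p=1.0, eps_sync=1.0, eps_d=1.5)
```

In this toy run, nodes 0 and 2 are farther apart than the cutoff, so their edge (and hence the triangle) only enters at the global synchronization time.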

Fig. 3.

The filtration of the simplicial complex associated to three 1-dimensional trajectories (T(u)) as defined in Sec. 2.4.2. Here, each vertex corresponds to the trajectory with the same color. A vertex is added when its trajectory value exceeds the parameter ϵp; an edge is added when its two associated trajectories become close enough together that the area between the curves after that time is below the parameter ϵsync. Triangles and higher dimensional simplices are added when all necessary edges have been included in the filtration.

2.4.3. Computation of evolutionary homology

The previous section gives a function $f_i : K_{|V_i|} \to \mathbb{R}$ defined on the complete simplicial complex with $|V_i|$ vertices for each $i = 1, \cdots, N$. From the filtration defined by $f_i$, we then compute the persistence barcode for homology dimension $k$, which we call the $k$th EH barcode, denoted $B^i_k$. The persistent homology computation for dimension ≥ 1 on the filtered simplicial complex is done using the software package Ripser [6], using the fact that $k$-dimensional homology only requires knowledge of $k$- and $(k + 1)$-dimensional simplices. The 0-dimensional homology is computed with a modification of the union-find algorithm.
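One way such a union-find computation might look is sketched below. Unlike a Vietoris-Rips filtration, vertices here can be born at different times, so each component keeps the birth time of its oldest vertex, and the younger component dies when an edge merges two components. This is an illustrative sketch under those assumptions rather than the exact modification used in this work; the function name and input format are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Bar = Tuple[float, float]

def h0_barcode(vertex_births: Dict[Hashable, float],
               edges: Iterable[Tuple[float, Hashable, Hashable]]) -> List[Bar]:
    """0-dimensional barcode from vertex birth times and (value, u, v) edges."""
    parent = {v: v for v in vertex_births}
    birth = dict(vertex_births)           # birth time of a component, read at its root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars: List[Bar] = []
    for value, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                       # edge closes a loop, no merge
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru                # ru is now the older component
        if value > birth[rv]:
            bars.append((birth[rv], value))  # the younger component dies here
        parent[rv] = ru
    roots = {find(v) for v in parent}
    bars.extend((birth[r], float("inf")) for r in roots)
    return sorted(bars)

# Made-up example: three vertices born at different times, three edges.
births = {0: 0.0, 1: 0.5, 2: 1.0}
edges = [(2.0, 0, 1), (3.0, 1, 2), (4.0, 0, 2)]
bars0 = h0_barcode(births, edges)
```

The oldest component survives every merge and yields the single infinite bar.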

Fig. 4 gives an example of the geometric configurations of two sets of points associated to Lorenz oscillators and their resulting EH barcodes. The EH barcodes effectively examine the local properties of significant cycles in the original space which is important when the data is intrinsically discrete instead of a discrete sampling of a continuous space. As a result, the point clouds with different geometry but similar barcodes using traditional persistence methods1 may be distinguished by EH barcodes.

Fig. 4.

An example of the construction of the EH barcode. The geometry of two embedded systems is shown in Figs. (a) and (b). Specifically, (b) consists of the six vertices of a regular hexagon with side length $e_1$, and (a) consists of the vertices in (b) with the addition of the vertices of hexagons with side length $e_2 \ll e_1$ centered at each of the previous vertices; here, $e_1 = 8$ and $e_2 = 1$. Figs. (c) and (d) are the EH barcodes corresponding to Figs. (a) and (b), respectively. A collection of coupled Lorenz systems is used with parameters $\delta = 1$, $\gamma = 12$, $\beta = 8/3$, $\mu = 8$, $k = 2$, $\Gamma = I_3$, and $\epsilon = 12$; see Eqs. (2), (11) and (1). In the model for the $i$th residue, marked in red, the system is perturbed from the synchronized state by setting $u_{i,3} = 2s_3$, with $s_3$ being the value of the third variable of the dynamical system at the synchronized state, and is simulated with step size $h = 0.01$ from $t = 0$ using the fourth-order Runge-Kutta method. The calculation of persistent homology using the Vietoris-Rips filtration with Euclidean distance on the point clouds delivers similar bars corresponding to the 1-dimensional holes in (a) and (b), which are $[e_1 - e_2, 2(e_1 - e_2))$ and $[e_1, 2e_1)$.

2.5. Topological learning

2.5.1. Metrics on the space of barcodes

The similarity between persistence barcodes can be quantified by barcode space distances. The most commonly used metrics are the bottleneck distance [27] and the p-Wasserstein distances [29]. The definitions of the two distances are summarized as follows.

The $\ell^\infty$ distance between two persistence bars $I_1 = [b_1, d_1)$ and $I_2 = [b_2, d_2)$ is defined to be

$$\Delta(I_1, I_2) = \max\{|b_2 - b_1|, |d_2 - d_1|\}.$$

The distance between a bar $I = [b, d)$ and null is analogously measured as

$$\lambda(I) := (d - b)/2 = \min_{x} \Delta(I, [x, x)).$$

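For two finite barcodes of dimension k, $B_1^k = \{I_\alpha^1\}_{\alpha \in A_k}$ and $B_2^k = \{I_\beta^2\}_{\beta \in B_k}$, a partial bijection is defined to be a bijection $\theta : A'_k \to B'_k$, where $A'_k \subseteq A_k$ and $B'_k \subseteq B_k$. In order to define the p-Wasserstein distance, we have the following penalty for θ:

$$P(\theta) = \Big( \sum_{\alpha \in A'_k} \Delta(I_\alpha^1, I_{\theta(\alpha)}^2)^p + \sum_{\alpha \in A_k \setminus A'_k} \lambda(I_\alpha^1)^p + \sum_{\beta \in B_k \setminus B'_k} \lambda(I_\beta^2)^p \Big)^{1/p}.$$

Then the p-Wasserstein distance is defined as

$$d_{W,p}(B_1^k, B_2^k) = \min_{\theta \in \Theta} P(\theta),$$

where Θ is the set of all possible partial bijections from $A_k$ to $B_k$. Intuitively, a partial bijection θ is penalized most heavily for connecting two bars that differ greatly, as measured by Δ(·), and for connecting long bars to degenerate bars (the diagonal of the persistence diagram), as measured by λ(·).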

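The bottleneck distance is an $L^\infty$ analogue of the p-Wasserstein distance. The bottleneck penalty of a partial matching θ is defined as

$$P(\theta) = \max\Big\{ \max_{\alpha \in A'_k} \Delta(I_\alpha^1, I_{\theta(\alpha)}^2),\ \max_{\alpha \in A_k \setminus A'_k} \lambda(I_\alpha^1),\ \max_{\beta \in B_k \setminus B'_k} \lambda(I_\beta^2) \Big\}.$$

The bottleneck distance is defined as

$$d_{W,\infty}(B_1^k, B_2^k) = \min_{\theta \in \Theta} P(\theta).$$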
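The definitions above can be checked on very small barcodes with a brute-force sketch. It pads each barcode with "diagonal" slots so that every partial bijection becomes a full matching, then enumerates all matchings; the cost grows factorially, so this is only meant to illustrate the definition (practical codes use an assignment solver instead). The names `delta`, `lam` and `barcode_distance` are hypothetical, and bars are assumed to be finite (birth, death) pairs.

```python
from itertools import permutations
from math import inf


def delta(bar1, bar2):
    """l-infinity distance between two bars [b1, d1) and [b2, d2)."""
    return max(abs(bar2[0] - bar1[0]), abs(bar2[1] - bar1[1]))


def lam(bar):
    """Distance from a bar [b, d) to the nearest degenerate bar [x, x)."""
    return (bar[1] - bar[0]) / 2.0


def barcode_distance(bars1, bars2, p=1.0):
    """p-Wasserstein distance (bottleneck distance when p is inf), by enumeration."""
    n, m = len(bars1), len(bars2)
    # None stands for an unmatched slot, i.e. matching a bar to the diagonal.
    left = list(bars1) + [None] * m
    right = list(bars2) + [None] * n

    def cost(a, b):
        if a is None and b is None:
            return 0.0
        if a is None:
            return lam(b)
        if b is None:
            return lam(a)
        return delta(a, b)

    best = inf
    for perm in permutations(range(n + m)):
        costs = [cost(left[i], right[j]) for i, j in enumerate(perm)]
        if p == inf:
            penalty = max(costs, default=0.0)
        else:
            penalty = sum(c ** p for c in costs) ** (1.0 / p)
        best = min(best, penalty)
    return best


# Matching the two long bars costs 1; the short bar goes to the diagonal at cost 0.5.
d1 = barcode_distance([(0.0, 10.0), (0.0, 1.0)], [(0.0, 9.0)], p=1.0)
d_inf = barcode_distance([(0.0, 10.0), (0.0, 1.0)], [(0.0, 9.0)], p=inf)
```

When the second barcode is empty, the only partial bijection is the empty one, so the distance reduces to a norm of the half-lengths of the bars.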

2.5.2. Learning with barcodes

Evolutionary homology provides a relatively abstract characterization of the objects of interest. It is potentially powerful in many applications, but may be difficult to use out of the box for machine learning or quantitative data analysis techniques. In regression analysis or the training part of supervised learning, with $B_i$ being the collection of sets of barcodes corresponding to the ith entry in the training data, the problem can be cast into the following minimization problem,

$$\min_{\theta_b \in \Theta_b,\ \theta_m \in \Theta_m} \sum_{i \in I} L(y_i, F(B_i; \theta_b); \theta_m),$$

where L is a scalar loss function, $y_i$ denotes the target values in the training set, F is a function that maps barcodes to suitable input for the learning models, and $\theta_b$ and $\theta_m$ are the parameters to be optimized within the search domains $\Theta_b$ and $\Theta_m$, respectively. The form of the loss function also depends on the choice of metric and machine learning/regression model.

A function F which translates barcodes to structured representation (tensors with fixed dimension) can be used with popular machine learning models including random forest, gradient boosting trees and deep neural networks. Another popular class of models are the kernel based models that depend on an abstract measurement of the similarity or distance between the entries.

Our choices for F, defined in Eq. (12) of Sec. 3.1, will arise from looking at the distance from the specified barcode to the empty barcode and there is no tuning of θb. In Sec. 3.3 where we quantitatively analyze protein residue flexibility, we evaluate our method by checking the correlation between each topological feature defined in Eq. (12) and the experimental value (blind prediction) as well as the correlation between the output of a linear regression with multiple topological features and the experimental value (regression). In the former case, there is no parameter to be optimized, while in the latter case, the specific minimization problem can be written as

$$\min_{\theta_m \in \mathbb{R}^{n+1}} \sum_{i \in I} \big( y_i - [\mathrm{EH}_i^{p_1,1}, \cdots, \mathrm{EH}_i^{p_n,n}, 1] \cdot \theta_m \big)^2,$$

where $\mathrm{EH}_i^{p_k,k}$ is the topological feature obtained by computing the $p_k$-Wasserstein distance from the empty set to the kth barcode associated with the EH computation of the ith protein residue (node), I is the set of indexes of all residues in the protein, and $y_i$ is the experimental B-factor of the ith protein residue, which quantitatively reflects flexibility.
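The least-squares problem above is an ordinary linear regression with an intercept. The following sketch solves it through the normal equations with a small Gauss-Jordan elimination so that it needs no external libraries; the function name `fit_linear` and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
def fit_linear(features, targets):
    """Minimize sum_i (y_i - [x_i, 1] . theta)^2 and return theta.

    features: list of per-residue feature vectors (one list per residue).
    targets:  list of target values (for example, B-factors).
    The last entry of theta is the intercept.
    """
    rows = [list(x) + [1.0] for x in features]  # append the constant column
    m = len(rows[0])
    # Augmented normal-equation system [X^T X | X^T y].
    M = [
        [sum(r[a] * r[b] for r in rows) for b in range(m)]
        + [sum(r[a] * y for r, y in zip(rows, targets))]
        for a in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[a][m] / M[a][a] for a in range(m)]


# Toy data generated from y = 3*x1 - 2*x2 + 0.5 (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.5, -1.5, 1.5, 4.5]
theta = fit_linear(X, y)
```

For the handful of features used per protein here, the normal equations are adequate; with many correlated features a QR- or SVD-based solver would be the more stable choice.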

3. Results

This section starts with protein flexibility analysis in Sec. 3.1. The analysis of ordered and disordered proteins is given in Sec. 3.2. Finally, the quantitative prediction of protein B-factors is described in Sec. 3.3.

3.1. Protein residue flexibility analysis

Proteins have many functions in life forms. They consist of one or multiple chains of amino acid residues and often fold into specific 3D structures. The amino acid residues share the same basic structure, and different types of residues possess different side chains (often referred to as functional groups). The carbon atom connected to the side chain is called the alpha carbon; the alpha carbons trace the backbone of a protein and depict the protein structure at the residue level. For many functioning proteins, such as enzymes, certain levels of flexibility at designated locations are required to function correctly. The ability to predict protein flexibility is important in tasks including drug design, protein design, and protein stability analysis. In this section, we combine all the methods to formulate protein residue flexibility analysis using the EH barcodes. Consider a protein with N residues and let $r_i$ denote the position of the alpha carbon (Cα) atom of the ith residue. The coupled systems defined in Eq. (1) are used to study protein flexibility, with each protein residue represented by an oscillator (the Lorenz system or the Rössler system in this application). Define the distance for the atoms in the original space as the Euclidean distance between the Cα atoms, $d_{org}(r_i, r_j) = \|r_i - r_j\|_2$. A weighted graph Laplacian matrix is constructed based on the distance function $d_{org}$ to prescribe the coupling strength between the oscillators and is defined as

$$A_{ij} = \begin{cases} e^{-(d_{org}(r_i, r_j)/\mu)^{\kappa}}, & i \neq j, \\ -\sum_{l \neq i} A_{il}, & i = j, \end{cases} \qquad (11)$$

where μ and κ are tunable parameters. The matrix Γ is set to the identity matrix I.
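A minimal sketch of the coupling matrix follows. It assumes the reading of Eq. (11) in which the off-diagonal entries decay as exp(−(d/μ)^κ) with the Cα distance d and each diagonal entry is the negative sum of the other entries in its row, so that every row sums to zero. The function name `coupling_matrix` and the coordinates are hypothetical.

```python
from math import dist, exp


def coupling_matrix(coords, mu, kappa):
    """Weighted graph Laplacian-type coupling matrix built from C-alpha coordinates.

    coords: list of (x, y, z) positions, one per residue.
    mu, kappa: the tunable scale and exponent of the kernel.
    """
    n = len(coords)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = dist(coords[i], coords[j])  # Euclidean distance
                A[i][j] = exp(-((d / mu) ** kappa))
        # Diagonal entry: negative sum of the off-diagonal entries in row i.
        A[i][i] = -sum(A[i][j] for j in range(n) if j != i)
    return A


# Three illustrative positions (in angstroms); mu = 8 and kappa = 2 as in Fig. 4.
A = coupling_matrix([(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 8.0, 0.0)], mu=8.0, kappa=2.0)
```

Because each row sums to zero, the coupling term vanishes whenever all oscillators share the same state, which is what allows a synchronized solution to exist.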

To quantitatively study the flexibility of a protein, one needs to extract topological information for each residue. To this end, we go through the process given in the previous sections once for each residue. When addressing the ith residue, we perturb the ith oscillator at a time point in a synchronized system and take this state as the initial condition for the coupled systems. See Fig. 5 for an example of this procedure when perturbing the oscillator attached to a residue for a given embedding of one particular protein.
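The perturb-and-simulate step can be sketched as below. Eqs. (1) and (2) are not reproduced in this section, so the sketch assumes the standard Lorenz form with parameters (δ, γ, β) and a coupled system of the form du_i/dt = g(u_i) + ϵ Σ_j A_ij Γ u_j with Γ equal to the identity. The perturbation doubles the third variable of the chosen oscillator, as described in the caption of Fig. 4. All function names are hypothetical, and the parameter values must be supplied by the caller.

```python
def lorenz(u, delta, gamma, beta):
    """Right-hand side of the Lorenz system (standard form, assumed here)."""
    x, y, z = u
    return (delta * (y - x), x * (gamma - z) - y, x * y - beta * z)


def coupled_rhs(state, A, eps, params):
    """g(u_i) + eps * sum_j A[i][j] * u_j for every oscillator i (Gamma = identity)."""
    n = len(state)
    out = []
    for i in range(n):
        g = lorenz(state[i], *params)
        coupling = [sum(A[i][j] * state[j][c] for j in range(n)) for c in range(3)]
        out.append(tuple(g[c] + eps * coupling[c] for c in range(3)))
    return out


def shifted(state, k, h):
    """Return state + h * k, component by component."""
    return [tuple(s + h * d for s, d in zip(si, ki)) for si, ki in zip(state, k)]


def rk4_step(state, h, A, eps, params):
    """One classical fourth-order Runge-Kutta step for the coupled system."""
    k1 = coupled_rhs(state, A, eps, params)
    k2 = coupled_rhs(shifted(state, k1, h / 2), A, eps, params)
    k3 = coupled_rhs(shifted(state, k2, h / 2), A, eps, params)
    k4 = coupled_rhs(shifted(state, k3, h), A, eps, params)
    return [
        tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
              for s, a, b, c, d in zip(si, ai, bi, ci, di))
        for si, ai, bi, ci, di in zip(state, k1, k2, k3, k4)
    ]


def perturb_and_simulate(A, sync_state, i, steps, h, eps, params):
    """Start all oscillators at sync_state, double u_{i,3}, and integrate."""
    state = [tuple(sync_state) for _ in A]
    x, y, z = state[i]
    state[i] = (x, y, 2.0 * z)  # perturbation of the ith oscillator
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, h, A, eps, params)
        trajectory.append(state)
    return trajectory


# Two oscillators with an illustrative coupling matrix whose rows sum to zero.
A_demo = [[-0.5, 0.5], [0.5, -0.5]]
traj = perturb_and_simulate(A_demo, (1.0, 2.0, 3.0), i=0, steps=50, h=0.01,
                            eps=0.12, params=(1.0, 12.0, 8.0 / 3.0))
```

Repeating the call once per residue index yields one set of trajectories per residue, from which the modified trajectories and the filtration of the earlier sections would then be built.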

Fig. 5.

The result of perturbing residue 31 in protein (PDB:1ABA). (a) The modified trajectories as defined in Eq. (8) are plotted for each residue after the perturbation at t = 0 as a heatmap. The residues are ordered by the (geometric) distance to the perturbed site, from the closest to the farthest. (b) The modified trajectories as defined in Eq. (8) are plotted for each residue after the perturbation at t = 0 as line plots. The darker lines are closer to the perturbed site. The heatmap shows the filtration values for the edges as defined in Eq. (10), and the order of residues is the same as in (a). The parameters for the coupled Lorenz system and the perturbation method are the same as those of Fig. 4.

A collection of modified trajectories $\{\hat{u}_i(t)\}_{i=1}^{N}$ is obtained with the transformation function defined in Eq. (8). The persistence over time for $\{\hat{u}_i(t)\}_{i=1}^{N}$ is computed following the filtration procedure defined in Sec. 2.4.2. Let $B_i^k$ be the kth EH barcode obtained from the experiment of perturbing the oscillator corresponding to residue i. We introduce the following topological features to relate to protein flexibility:

$$\mathrm{EH}_i^{p,k} = d_{W,p}(B_i^k, \emptyset), \qquad (12)$$

where $d_{W,p}$ for 1 ≤ p < ∞ is the p-Wasserstein distance and $d_{W,\infty}$ (the case p = ∞) is the bottleneck distance. We will show that these features characterize the behavior of this particular collection of barcodes, which, in turn, captures the topological pattern of the coupled dynamical systems arising from the underlying protein structure.
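Because the second barcode in Eq. (12) is empty, the only partial bijection is the empty one, and the feature reduces to a norm of the half-lengths λ(I) = (d − b)/2 of the bars. The sketch below computes it directly; the function name `eh_feature` is hypothetical, and every bar is assumed to be finite.

```python
def eh_feature(barcode, p):
    """Distance from a barcode to the empty barcode.

    barcode: list of finite (birth, death) pairs.
    p: a number >= 1 for the p-Wasserstein distance, or float('inf')
       for the bottleneck distance.
    """
    half_lengths = [(death - birth) / 2.0 for birth, death in barcode]
    if not half_lengths:
        return 0.0
    if p == float("inf"):
        return max(half_lengths)
    return sum(h ** p for h in half_lengths) ** (1.0 / p)


# Two bars with half-lengths 1 and 2 (illustrative values).
bars = [(0.0, 2.0), (1.0, 5.0)]
features = {p: eh_feature(bars, p) for p in (1, 2, float("inf"))}
```

The bottleneck feature depends only on the longest bar, whereas the 1- and 2-Wasserstein features also accumulate the contribution of the shorter bars.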

The interactions among residues are a major contribution to protein stability and flexibility. Here each protein residue is represented by a dynamical system, and the interactions are modeled by coupling these dynamical systems. When the coupled system reaches a synchronized state, a perturbation of one of the dynamical systems is introduced, which serves as a probe of the flexibility of the corresponding protein residue. Specifically, the flexibility of any given residue is reflected by how the perturbation-induced stress is propagated and relaxed through the interactions in the system. Such a relaxation process induces changes in the states of the nearby oscillators. Therefore, the record of the time evolution of this subset of coupled oscillators, expressed in terms of topological invariants, can be used to analyze and predict protein flexibility.

The difference in results of the procedure can be seen in the example of Fig. 6 where the control of chaotic oscillators attached to a partially disordered protein (PDB:2RVQ) and a well-folded protein (PDB:1UBQ) is demonstrated. Clearly, the folded part of protein 2RVQ has strong correlations or interactions among residues from residue 25 to residue 110, which leads to the synchronization of the associated chaotic oscillators. In contrast, the random coil part of protein 2RVQ does not have much coupling or interaction among residues. Consequently, the associated chaotic oscillators remain in chaotic dynamics during the time evolution. For folded protein 1UBQ, the associated chaotic oscillators become synchronized within a few steps of simulation, except for a small flexible tail. This behavior underpins the use of coupled dynamical systems for protein flexibility analysis.

Fig. 6.

Left: partially disordered protein, model 1 of PDB:2RVQ. Right: well-folded protein, PDB:1UBQ. The $u_{i,1}$ value of each dynamical system is plotted as a heatmap. The Lorenz system defined in Eq. (2) is used with the parameters δ = 10, γ = 28, β = 8/3. The coupling matrix A defined in Eq. (11) has parameters μ = 14, κ = 2. The coupled system defined in Eq. (1) has parameters Γ = I3 and ϵ = 12. The system is initialized with a random value between 0 and 1 and is simulated from t = 0 to t = 200 with step size h = 0.01. The system is numerically solved using the fourth-order Runge-Kutta method. It can be seen from the heatmaps that the oscillators corresponding to the disordered regions behave asynchronously.

3.2. Discovery of disordered and flexible protein regions

To illustrate the correlation between protein residue flexibility and the topological features defined in Eq. (12), we study several proteins with intrinsically disordered regions. Intrinsically disordered proteins lack stable 3-dimensional molecular structures. One such example is the Tau protein, which stabilizes microtubules and whose malfunction is related to Alzheimer’s disease. Partially disordered proteins are intrinsically disordered proteins that contain both stable structures and flexible regions. In nature, the disordered regions may play important roles in biological processes that require flexibility.

In this section, we use the coupled Lorenz system parameters, the perturbation method for the ith residue, and the simulation described in Fig. 4 (δ = 1, γ = 12, β = 8/3, μ = 8, κ = 2, Γ = I3, ϵ = 0.12). The simulation is stopped when all oscillators return to the synchronized state. This process is repeated for each residue. Two NMR structures of partially disordered proteins, PDB:2ME9 and PDB:2MT6, are studied. Reconstructing 3D structures from NMR data often leads to multiple structure models that are all compatible with the NMR data. We compute the topological features for each model of the structures and take an average over the models. The results are plotted in Fig. 7. The disordered regions clearly correlate with the peaks of EH∞,0 and the valleys of EH∞,1, EH1,0, and EH1,1. The topological features are also able to distinguish between relatively stable coils (the coils that are consistent among the NMR models) and the disordered parts (the parts that differ among the NMR models).

Fig. 7.

(a) Models 1–3 of PDB:2ME9 with the disordered region colored in blue, red, and yellow for the three models. (b) Similar plot as (a) for PDB:2MT6. (c) Topological features for PDB:2ME9 whose large disordered region is from residue 28 to residue 85. (d) Topological features for PDB:2MT6 whose large disordered region is from residue 118 to residue 151.

3.3. Protein B-factor prediction

The B-factor describes how much an atom fluctuates around its mean position in crystal structures. Protein B-factors quantitatively measure the relative thermal motion of each atom and reflect atomic flexibility and dynamics. Though the B-factor is also affected by factors such as the refinement method, it is still a relatively robust measurement of atomic flexibility in proteins. In fact, high correlation (a correlation coefficient of about 0.8) of B-factors among homologous proteins has been reported [75]. The X-ray crystal structures deposited in the Protein Data Bank contain experimentally derived B-factors, which can be used to validate the proposed method [70,64]. To analyze flexible regions of proteins, B-factor prediction is needed for protein structures built from computational models and for some experimentally solved structures obtained with NMR or cryo-EM techniques. Normal mode analysis (NMA) is one of the first methods proposed for B-factor prediction [47]. The Gaussian network model (GNM) [5] was known for its better accuracy and efficiency compared to a variety of earlier methods [95]. The multiscale flexibility-rigidity index (FRI), which is about 20% more accurate than GNM, has been established as the state of the art in B-factor prediction [65].

In this section, we compute the correlation between the topological features and the experimentally derived protein B-factors. Two oscillators are considered: the Lorenz system and the Rössler system. When the Lorenz system is used, the parameters are the same as in Sec. 3.2 (δ = 1, γ = 12, β = 8/3, μ = 8, κ = 2, Γ = I3, ϵ = 0.12). When the Rössler system is used, the same coupling parameters are used (a = 0.1, b = 0.1, c = 4, μ = 8, κ = 2, Γ = I3, ϵ = 0.12). We further test the proposed topological features by building a simple linear regression model with a least-squares penalty against the experimental B-factors. A collection of 364 diverse proteins reported in the literature is chosen as the validation data (the set of 365 proteins [64], excluding PDB:1AGN because of an issue in its reported B-factors [65]). The size of the proteins ranges from tens to thousands of amino acid residues. The topological features in the model follow the setup given in Sec. 3.2. Examples of the resulting persistence barcodes for a relatively rigid residue and a relatively flexible residue are shown in Fig. 8.
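The evaluation metric used throughout this section is the Pearson correlation coefficient between a per-residue feature (or regression output) and the experimental B-factors, averaged over proteins. A minimal sketch is given below; the function names `pearson` and `mean_pearson` and the numbers in the example are assumptions for illustration, not data from the benchmark.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_pearson(per_protein):
    """Average the coefficient over proteins.

    per_protein: list of (predicted_values, experimental_b_factors) pairs,
    one pair per protein.
    """
    coefficients = [pearson(pred, expt) for pred, expt in per_protein]
    return sum(coefficients) / len(coefficients)


# Hypothetical feature values and B-factors for two small "proteins".
dataset = [
    ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]),
    ([0.5, 1.5, 2.5], [10.0, 20.0, 30.0]),
]
average_rp = mean_pearson(dataset)
```

Since the coefficient is invariant under positive affine rescaling, a feature can correlate well with B-factors without matching their units or magnitude, which is why blind features can be compared directly in this way.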

Fig. 8.

Barcode plots for two residues. (a) Residue 6 of PDB:2NUH with a B-factor of 12.13 Å². (b) Residue 49 of PDB:2NUH with a B-factor of 33.4 Å².

The computed topological features are plotted together with the experimental B-factors for a relatively small protein and a relatively large protein in Fig. 9. The 0-dimensional topological features, specifically EH∞,0, provide a reasonable approximation to the experimental B-factors. The regression using all topological information, EH, offers a very good approximation to the experimental B-factors. A summary of the results and a comparison to other methods is shown in Table 1 for the set of 364 proteins. The present evolutionary topology based prediction outperforms the other methods compared in Table 1. A possible reason for this performance is that the proposed method gives a more detailed description of residue interactions in terms of three different topological dimensions and two distance metrics. This example indicates that the proposed EH has great potential for other important biophysical applications, including the prediction of protein-ligand binding affinities, mutation-induced protein stability changes, and protein-protein interactions.

Fig. 9.

B-factors and the computed topological features. EH shows the linear regression with EH∞,0, EH∞,1, EH1,0, EH1,1, EH2,0, and EH2,1 within each protein. The y-axes of the panels have different scales to show the correlation between the variances. (a) PDB:3PSM with 94 residues. (b) PDB:3SZH with 697 residues.

Table 1.

The averaged Pearson correlation coefficients (RP) between the computed values (blind prediction for the topological features and regression for the rest of the models) and the experimental B-factors for a set of 364 proteins [65] and three sets of proteins of different sizes [70]. Top: Prediction RPs based on EH barcodes. Bottom: A comparison of the RPs of predictions from different methods on the full set of 364 proteins. Here, EH is the linear regression using EH∞,0, EH∞,1, EH1,0, EH1,1, EH2,0, and EH2,1 within each protein. For a few large, multi-chain proteins, we compute the EH barcodes on separate (protein) chains to reduce the computation time; this serves as a good approximation. The proteins analyzed chain by chain include 1F8R, 1H6V, 1KMM, 2D5W, 3HHP, 1QKI, and 2Q52 for both attractors and, additionally, 1GCO, 3LG3, 3W4Q, 2AH1, 3SZH, and 4G6C for the Rössler attractor. Note that there is an estimated upper limit (correlation coefficient of about 0.8) for B-factor prediction [75].

| Method | All (364), Lorenz | All (364), Rössler | Small (33), Lorenz | Small (33), Rössler | Medium (36), Lorenz | Medium (36), Rössler | Large (35), Lorenz | Large (35), Rössler |
|---|---|---|---|---|---|---|---|---|
| EH∞,0 | 0.586 | 0.469 | 0.476 | 0.504 | 0.569 | 0.531 | 0.565 | 0.500 |
| EH∞,1 | −0.039 | 0.119 | −0.001 | −0.010 | −0.059 | 0.158 | −0.062 | 0.105 |
| EH∞,2 | −0.097 | 0.003 | −0.010 | 0.0 | −0.099 | 0.0 | −0.065 | 0.0 |
| EH1,0 | −0.477 | 0.486 | −0.092 | 0.486 | −0.521 | 0.542 | −0.516 | 0.487 |
| EH1,1 | −0.381 | 0.204 | −0.077 | 0.032 | −0.384 | 0.276 | −0.401 | 0.210 |
| EH1,2 | −0.104 | 0.002 | −0.013 | 0.0 | −0.105 | 0.0 | −0.071 | 0.0 |
| EH2,0 | 0.188 | 0.486 | 0.171 | 0.502 | 0.154 | 0.552 | 0.185 | 0.507 |
| EH2,1 | −0.258 | 0.015 | −0.033 | −0.022 | −0.233 | 0.074 | −0.276 | −0.035 |
| EH2,2 | −0.100 | 0.002 | −0.010 | 0.0 | −0.102 | 0.0 | −0.067 | 0.0 |
| EH | 0.691 | 0.698 | 0.746 | 0.773 | 0.701 | 0.729 | 0.663 | 0.665 |

| Method | RP | Description |
|---|---|---|
| EH (Rössler) | 0.698 | Topological metrics |
| EH (Lorenz) | 0.691 | Topological metrics |
| mFRI | 0.670 | Multiscale FRI [65] |
| pfFRI | 0.626 | Parameter free FRI [64] |
| GNM | 0.565 | Gaussian network model [64] |

For both dynamical systems, it was observed that the lowest topological dimension (EH∗,0) generally has the strongest correlation to B-factors. The higher dimensional parameters (EH∗,1 and EH∗,2) still carry unique and valuable information which, in a fitting model, boosts the overall performance when paired with EH∗,0 information. Moreover, the higher dimensional parameters are especially useful in the prediction of larger proteins (medium and large proteins in Table 1) indicating that high dimensions can potentially play important roles in the analysis of very complex systems. Despite the unstable performance in small proteins, all parameters show robust and superior performance in medium and large proteins. This observation further demonstrates the usefulness of the present method in handling datasets with very complex structures.

4. Conclusion

Most topological tools are constructed for the global topology of an object under study. The direct use of dynamical systems for the construction of topological persistence is scarce in general. In this work, we utilize dynamical systems as a means to study the topology of an individual component of an object. We embed the internal interactions of a complex physical object into a set of chaotic dynamical systems to couple the chaotic oscillators together, which leads to the eventual synchronization of the dynamics. Simplices, simplicial complexes, and homology groups are subsequently defined based on the trajectories of individual chaotic dynamical systems. The resulting topological tool, called evolutionary homology (EH), is able to analyze the topological invariants, and their persistence over time, of each individual component of a physical object. The resulting barcode representation of the topological persistence is able to unveil the quantitative local topology-local function relationship of individual subsystems of a physical object.

We choose the well-known Lorenz system and Rössler system as examples to illustrate our EH formulation. An important biophysical problem, protein flexibility analysis, is employed to demonstrate the proposed methods. Specifically, we construct weighted graph Laplacian matrices from protein residue networks to regulate the Lorenz or Rössler system, which leads to the synchronization of the chaotic oscillators associated with protein residue network nodes. The synchronization process for each individual oscillator reflects the corresponding Cα’s interaction pattern and is translated into topological invariants of various dimensions and their persistence over time. The Wasserstein and bottleneck metrics are used to quantitatively discriminate EH barcodes of various dimensions from different protein residues, unveiling their thermal fluctuations. The EH model is found to outperform other state-of-the-art methods, namely both geometric graph and spectral graph theory based approaches, in the protein B-factor predictions of a commonly used benchmark set of 364 proteins.

Finally, the proposed EH has the potential to be a powerful tool for studying the local properties of other physical systems, such as the impurities of solid materials and partially disordered proteins. By appropriate reorganization and combination of EH barcodes, the proposed EH method can also be applied to the study of the global properties of a physical object, such as the binding affinities of protein-drug, protein-protein, protein-metal and protein-nucleic acid interactions and the protein stability change upon mutation.

Acknowledgments

This work was supported in part by NSF Grants DMS-1721024, DMS-1761320, and IIS-1900473, NIH grant GM126189, Pfizer and Bristol-Myers Squibb. The work of EM was supported in part by NSF grants DMS-1800446 and CMMI-1800466.

Footnotes

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

¹ Here, traditional means the Vietoris-Rips filtration on the point cloud induced by the embedding.

Publisher's Disclaimer: This Author Accepted Manuscript is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication but has not been copyedited or corrected. The official version of record that is published in the journal is kept up to date and so may therefore differ from this version.

Contributor Information

Zixuan Cang, Department of Mathematics, Michigan State University.

Elizabeth Munch, Department of Computational Mathematics, Science and Engineering, Michigan State University; Department of Mathematics, Michigan State University.

Guo-Wei Wei, Department of Mathematics, Michigan State University; Department of Biochemistry and Molecular Biology, Michigan State University; Department of Electrical and Computer Engineering, Michigan State University.

References

  • 1. Adams H, Emerson T, Kirby M, Neville R, Peterson C, Shipman P, Chepushtanova S, Hanson E, Motta F, Ziegelmeier L: Persistence Images: A Stable Vector Representation of Persistent Homology. Journal of Machine Learning Research 18(8), 1–35 (2017). URL http://jmlr.org/papers/v18/16-337.html
  • 2. Adcock A, Carlsson E, Carlsson G: The ring of algebraic functions on persistence bar codes. Homology, Homotopy and Applications 18(1), 381–402 (2016). DOI 10.4310/HHA.2016.v18.n1.a21
  • 3. Ahmed M, Fasy BT, Wenk C: Local persistent homology based distance between maps. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems, pp. 43–52. ACM (2014)
  • 4. Arai M, Brandt V, Dabaghian Y: The effects of theta precession on spatial learning and simplicial complex dynamics in a topological model of the hippocampal spatial map. PLoS Computational Biology 10(6), e1003651 (2014)
  • 5. Bahar I, Atilgan AR, Erman B: Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding and Design 2(3), 173–181 (1997)
  • 6. Bauer U: Ripser: a lean C++ code for the computation of Vietoris-Rips persistence barcodes. Software available at https://github.com/Ripser/ripser
  • 7. Bauer U, Kerber M, Reininghaus J: Distributed computation of persistent homology. In: 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 31–38. SIAM (2014)
  • 8. Bendich P, Harer J: Persistent intersection homology. Foundations of Computational Mathematics 11(3), 305–336 (2011)
  • 9. Berwald JJ, Gidea M, Vejdemo-Johansson M: Automatic recognition and tagging of topologically different regimes in dynamical systems. Discontinuity, Nonlinearity, and Complexity 3(4), 413–426 (2014)
  • 10. Bubenik P: Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research 16(1), 77–102 (2015)
  • 11. Bubenik P, Scott JA: Categorification of persistent homology. Discrete & Computational Geometry 51(3), 600–627 (2014). DOI 10.1007/s00454-014-9573-x
  • 12. Bubenik P, de Silva V, Scott J: Metrics for Generalized Persistence Modules. Foundations of Computational Mathematics 15(6), 1501–1531 (2015)
  • 13. Cang Z, Mu L, Wei GW: Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Computational Biology 14(1), e1005929 (2018). DOI 10.1371/journal.pcbi.1005929
  • 14. Cang Z, Mu L, Wu K, Opron K, Xia K, Wei GW: A topological approach for protein classification. Molecular Based Mathematical Biology 3, 140–162 (2015)
  • 15. Cang Z, Wei GW: Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology. Bioinformatics 33, 3549–3557 (2017)
  • 16. Cang Z, Wei GW: Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. International Journal for Numerical Methods in Biomedical Engineering 34(2), e2914 (2017)
  • 17. Cang Z, Wei GW: TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Computational Biology 13(7), e1005690 (2017). DOI 10.1371/journal.pcbi.1005690
  • 18. Carlsson G: Topology and data. Bulletin of the American Mathematical Society 46(2), 255–308 (2009). DOI 10.1090/S0273-0979-09-01249-X. URL http://www.ams.org/journal-getitem?pii=S0273-0979-09-01249-X
  • 19. Carlsson G, De Silva V: Zigzag persistence. Foundations of Computational Mathematics 10(4), 367–405 (2010)
  • 20. Carlsson G, de Silva V, Morozov D: Zigzag persistent homology and real-valued functions. In: Proc. 25th Annu. ACM Sympos. Comput. Geom., pp. 247–256 (2009)
  • 21. Carlsson G, Verovšek SK: Symmetric and r-symmetric tropical polynomials and rational functions. Journal of Pure and Applied Algebra 220(11), 3610–3627 (2016)
  • 22. Carlsson G, Zomorodian A: The theory of multidimensional persistence. Discrete & Computational Geometry 42(1), 71–93 (2009)
  • 23. Carlsson G, Zomorodian A, Collins A, Guibas LJ: Persistence barcodes for shapes. International Journal of Shape Modeling 11(02), 149–187 (2005)
  • 24. Chazal F, Cohen-Steiner D, Glisse M, Guibas LJ, Oudot SY: Proximity of persistence modules and their diagrams. In: Proc. 25th ACM Sympos. on Comput. Geom., pp. 237–246 (2009)
  • 25. Chazal F, Guibas LJ, Oudot SY, Skraba P: Persistence-based clustering in Riemannian manifolds. Journal of the ACM (JACM) 60(6), 41 (2013)
  • 26. Chazal F, de Silva V, Glisse M, Oudot S: The Structure and Stability of Persistence Modules. Springer International Publishing (2016). DOI 10.1007/978-3-319-42545-0
  • 27. Cohen-Steiner D, Edelsbrunner H, Harer J: Stability of persistence diagrams. Discrete & Computational Geometry 37(1), 103–120 (2007)
  • 28. Cohen-Steiner D, Edelsbrunner H, Harer J: Extending persistence using Poincaré and Lefschetz duality. Foundations of Computational Mathematics 9(1), 79–103 (2009)
  • 29. Cohen-Steiner D, Edelsbrunner H, Harer J, Mileyko Y: Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics 10(2), 127–139 (2010)
  • 30. Cohen-Steiner D, Edelsbrunner H, Harer J, Morozov D: Persistent homology for kernels, images, and cokernels. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 09, pp. 1011–1020 (2009)
  • 31. Curto C: What can topology tell us about the neural code? Bulletin of the American Mathematical Society 54(1), 63–78 (2017)
  • 32. Curto C, Itskov V: Cell groups reveal structure of stimulus space. PLoS Computational Biology 4(10), e1000205 (2008). DOI 10.1371/journal.pcbi.1000205
  • 33. Dabaghian Y, Mémoli F, Frank L, Carlsson G: A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8(8), e1002581 (2012)
  • 34. de Silva V, Munch E, Stefanou A: Theory of interleavings on categories with a flow. Theory and Applications of Categories 33(21), 583–607 (2018). URL http://www.tac.mta.ca/tac/volumes/33/21/33-21.pdf
  • 35. Dey TK, Fan F, Wang Y: Computing topological persistence for simplicial maps. In: Proceedings of the thirtieth annual symposium on Computational geometry, pp. 345–354 (2014)
  • 36. Di Fabio B, Landi C: A Mayer-Vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions. Foundations of Computational Mathematics 11(5), 499–527 (2011)
  • 37. Edelsbrunner H, Harer J: Computational Topology: An Introduction. American Mathematical Society (2010)
  • 38. Edelsbrunner H, Letscher D, Zomorodian A: Topological persistence and simplification. Discrete & Computational Geometry 28, 511–533 (2002)
  • 39. Fasy BT, Wang B: Exploring persistent local homology in topological data analysis. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6430–6434. IEEE (2016)
  • 40. Frosini P: A distance for similarity classes of submanifolds of a Euclidean space. Bulletin of the Australian Mathematical Society 42(3), 407–416 (1990)
  • 41. Frosini P, Landi C: Size theory as a topological tool for computer vision. Pattern Recognition and Image Analysis 9(4), 596–603 (1999)
  • 42. Gabriel P: Unzerlegbare Darstellungen I. Manuscripta Mathematica 6(1), 71–103 (1972). DOI 10.1007/BF01298413
  • 43. Gameiro M, Hiraoka Y, Izumi S, Kramar M, Mischaikow K, Nanda V: A topological measurement of protein compressibility. Japan Journal of Industrial and Applied Mathematics 32(1), 1–17 (2015)
  • 44. Gameiro M, Mischaikow K, Kalies W: Topological characterization of spatial-temporal chaos. Physical Review E 70(3), 035203 (2004)
  • 45. Ghrist R: Barcodes: The persistent topology of data. Bull. Amer. Math. Soc 45, 61–75 (2008)
  • 46.Ghrist R: Elementary Applied Topology. Createspace Seattle (2014) [Google Scholar]
  • 47.Go N, Noguti T, Nishikawa T: Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci 80, 3696–3700 (1983) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Hatcher A: Algebraic Topology. Cambridge University Press (2002) [Google Scholar]
  • 49.Hu G, Yang J, Liu W: Instability and controllability of linearly coupled oscillators: Eigenvalue analysis. Phys. Rev. E 58, 4440–4453 (1998) [Google Scholar]
  • 50.Kaczynski T, Mischaikow K, Mrozek M: Computational Homology, Applied Mathematical Sciences, vol. 157. Springer-Verlag, New York (2004) [Google Scholar]
  • 51.Kališnik S: Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics (2018). DOI 10.1007/s10208-018-9379-y [DOI] [Google Scholar]
  • 52.Kasson PM, Zomorodian A, Park S, Singhal N, Guibas LJ, Pande VS: Persistent voids: a new structural metric for membrane fusion. Bioinformatics 23, 1753–1759 (2007) [DOI] [PubMed] [Google Scholar]
  • 53.Khasawneh FA, Munch E: Exploring equilibria in stochastic delay differential equations using persistent homology. In: Proceedings of the ASME 2014 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, August 17–20, 2014, Buffalo, NY, USA (2014). Paper no. DETC2014/VIB-35655. [Google Scholar]
  • 54.Khasawneh FA, Munch E: Chatter detection in turning using persistent homology. Mechanical Systems and Signal Processing 70–71, 527–541 (2016). DOI 10.1016/j.ymssp.2015.09.046. [DOI] [Google Scholar]
  • 55.Khasawneh FA, Munch E: Utilizing Topological Data Analysis for Studying Signals of Time-Delay Systems, pp. 93–106. Springer International Publishing, Cham (2017). DOI 10.1007/978-3-319-53426-8_7 [DOI] [Google Scholar]
  • 56.Kramár M, Levanger R, Tithof J, Suri B, Xu M, Paul M, Schatz MF, Mischaikow K: Analysis of Kolmogorov flow and Rayleigh–Bénard convection using persistent homology. Physica D: Nonlinear Phenomena 334, 82–98 (2016) [Google Scholar]
  • 57.Mileyko Y, Mukherjee S, Harer J: Probability measures on the space of persistence diagrams. Inverse Problems 27(12), 124007 (2011). URL http://stacks.iop.org/0266-5611/27/i=12/a=124007 [Google Scholar]
  • 58.Mischaikow K, Mrozek M, Reiss J, Szymczak A: Construction of symbolic dynamics from experimental time series. Physical Review Letters 82(6), 1144 (1999) [Google Scholar]
  • 59.Mischaikow K, Nanda V: Morse theory for filtrations and efficient computation of persistent homology. Discrete & Computational Geometry 50(2), 330–353 (2013). DOI 10.1007/s00454-013-9529-6 [DOI] [Google Scholar]
  • 60.Munch E: A user’s guide to topological data analysis. Journal of Learning Analytics 4(2), 47–61 (2017). DOI 10.18608/jla.2017.42.6. URL http://www.learning-analytics.info/journals/index.php/JLA/article/view/5196 [DOI] [Google Scholar]
  • 61.Munch E, Turner K, Bendich P, Mukherjee S, Mattingly J, Harer J, et al.: Probabilistic Fréchet means for time varying persistence diagrams. Electronic Journal of Statistics 9(1), 1173–1204 (2015) [Google Scholar]
  • 62.Nanda V: Perseus: the persistent homology software. Software available at http://www.sas.upenn.edu/vnanda/perseus
  • 63.Nanda V, Sazdanović R: Simplicial Models and Topological Inference in Biological Systems, pp. 109–141. Springer Berlin Heidelberg, Berlin, Heidelberg (2014). DOI 10.1007/978-3-642-40193-0_6 [DOI] [Google Scholar]
  • 64.Opron K, Xia K, Wei GW: Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis. Journal of Chemical Physics 140, 234105 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Opron K, Xia K, Wei GW: Communication: Capturing protein multiscale thermal fluctuations. Journal of Chemical Physics 142, 211101 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Ott E, Grebogi C, Yorke JA: Controlling chaos. Physical Review Letters 64(11), 1196 (1990) [DOI] [PubMed] [Google Scholar]
  • 67.Otter N, Porter MA, Tillmann U, Grindrod P, Harrington HA: A roadmap for the computation of persistent homology. EPJ Data Science 6(1), 17 (2017). DOI 10.1140/epjds/s13688-017-0109-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Oudot SY: Persistence Theory: From Quiver Representations to Data Analysis (Mathematical Surveys and Monographs). American Mathematical Society (2017) [Google Scholar]
  • 69.Oudot SY, Sheehy DR: Zigzag zoology: Rips zigzags for homology inference. Foundations of Computational Mathematics 15(5), 1151–1186 (2015) [Google Scholar]
  • 70.Park JK, Jernigan R, Wu Z: Coarse grained normal mode analysis vs. refined gaussian network model for protein residue-level structural fluctuations. Bulletin of Mathematical Biology 75(1), 124–160 (2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Perea JA: Persistent homology of toroidal sliding window embeddings. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2016). DOI 10.1109/icassp.2016.7472916 [Google Scholar]
  • 72.Perea JA, Deckard A, Haase SB, Harer J: Sw1pers: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics 16(1), 257 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Perea JA, Harer J: Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis. Foundations of Computational Mathematics 15(3), 799–838 (2015) [Google Scholar]
  • 74.Perea JA, Munch E, Khasawneh FA: Approximating continuous functions on persistence diagrams using template functions. arXiv:1902.07190 (2019) [Google Scholar]
  • 75.Radivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD, Dunker AK: Protein flexibility and intrinsic disorder. Protein Science 13(1), 71–80 (2004) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Reininghaus J, Huber S, Bauer U, Kwitt R: A stable multi-scale kernel for topological machine learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4741–4748 (2015) [Google Scholar]
  • 77.Robins V: Towards computing homology from finite approximations. In: Topology Proceedings, vol. 24, pp. 503–532 (1999) [Google Scholar]
  • 78.Robins V, Meiss JD, Bradley E: Computing connectedness: An exercise in computational topology. Nonlinearity 11(4), 913 (1998). URL http://stacks.iop.org/0951-7715/11/i=4/a=009 [Google Scholar]
  • 79.Robins V, Meiss JD, Bradley E: Computing connectedness: disconnectedness and discreteness. Physica D: Nonlinear Phenomena 139(3–4), 276–300 (2000). DOI 10.1016/S0167-2789(99)00228-6. [DOI] [Google Scholar]
  • 80.Robinson M: Topological Signal Processing. Springer (2014) [Google Scholar]
  • 81.de Silva V, Morozov D, Vejdemo-Johansson M: Persistent cohomology and circular coordinates. Discrete & Computational Geometry 45, 737–759 (2011) [Google Scholar]
  • 82.Singh G, Mémoli F, Ishkhanov T, Sapiro G, Carlsson G, Ringach DL: Topological analysis of population activity in visual cortex. Journal of Vision 8(8), 11 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Stolz BJ, Harrington HA, Porter MA: Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos: An Interdisciplinary Journal of Nonlinear Science 27(4), 047410 (2017) [DOI] [PubMed] [Google Scholar]
  • 84.Tausz A, Vejdemo-Johansson M, Adams H: JavaPlex: A research software package for persistent (co)homology. Software available at http://code.google.com/p/javaplex (2011) [Google Scholar]
  • 85.Tralie CJ, Perea JA: (Quasi) periodicity quantification in video data, using topology. SIAM Journal on Imaging Sciences 11(2), 1049–1077 (2018) [Google Scholar]
  • 86.Turner K, Mileyko Y, Mukherjee S, Harer J: Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry 52(1), 44–70 (2014). DOI 10.1007/s00454-014-9604-7 [DOI] [Google Scholar]
  • 87.Vejdemo-Johansson M, Pokorny FT, Skraba P, Kragic D: Cohomological learning of periodic motion. Applicable Algebra in Engineering, Communication and Computing 26(1–2), 5–26 (2015) [Google Scholar]
  • 88.Wang B, Wei GW: Object-oriented persistent homology. Journal of Computational Physics 305, 276–299 (2016) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Wei GW, Zhan M, Lai CH: Tailoring wavelets for chaos control. Phys. Rev. Lett 89, 284103 (2002) [DOI] [PubMed] [Google Scholar]
  • 90.Xia K, Feng X, Tong Y, Wei GW: Persistent homology for the quantitative prediction of fullerene stability. Journal of Computational Chemistry 36(6), 408–422 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Xia K, Wei GW: Molecular nonlinear dynamics and protein thermal uncertainty quantification. Chaos: An Interdisciplinary Journal of Nonlinear Science 24, 013103 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Xia K, Wei GW: Persistent homology analysis of protein structure, flexibility and folding. International Journal for Numerical Methods in Biomedical Engineering 30, 814–844 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Xia K, Wei GW: Multidimensional persistence in biomolecular data. Journal of Computational Chemistry 36(20), 1502–1520 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Xia K, Zhao Z, Wei GW: Multiresolution topological simplification. Journal of Computational Biology 22(9), 887–891 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Yang LW, Chng CP: Coarse-grained models reveal functional dynamics–I. elastic network models–theories, comparisons and perspectives. Bioinformatics and Biology Insights 2, 25–45 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Zomorodian A, Carlsson G: Computing Persistent Homology. Discrete & Computational Geometry 33(2), 249–274 (2005) [Google Scholar]
