Scientific Reports. 2021 Aug 12;11:16414. doi: 10.1038/s41598-021-94847-5

Topological analysis of interaction patterns in cancer-specific gene regulatory network: persistent homology approach

Hosein Masoomy 1,#, Behrouz Askari 1,#, Samin Tajik 2,#, Abbas K Rizi 3, G Reza Jafari 1,4
PMCID: PMC8361050  PMID: 34385492

Abstract

In this study, we investigated cancer cellular networks in the context of gene interactions and their associated patterns in order to recognize the structural features underlying this disease. We propose that the quest to understand cancer takes us beyond pairwise interactions between genes to higher-order constructions. We characterize the most prominent deviations in gene interaction patterns between cancer and normal samples that contribute to the complexity of this disease, in the hope that understanding these interaction patterns will reveal a deeper structure in the cancer network. This study uncovers significant deviations of the topological features of cancerous cells from those of healthy ones: the last stage of filtration confirms the importance of one-dimensional holes (topological loops) in cancerous cells and of two-dimensional holes (topological voids) in healthy cells. In the small-threshold region, the drop in the number of connected components of the cancer network and the rise in the number of loops and voids all occur at smaller weight values than in the normal case, which reveals the tendency of the cancerous network toward certain pathways.

Subject terms: Biological physics, Complex networks

Introduction

Cancer is one of the most common human genetic diseases and is characterized by cellular over-proliferation [1-3]. Through the gene expression process, the genetic code modulates biological functions and the associated molecular pathways. The resulting cellular phenotype is modulated by a dynamic network of interactions among genes, and perturbations in these interactions affect the overall manifestation of genetically driven diseases such as cancer. Genes and their dynamic interactions can be modeled by complex networks represented by nodes and links [4]. In network systems, each node is considered a dynamic entity that evolves under the influence of the others [5-9]. Systems of interacting units consist of links with positive, negative, or zero weight, which together form a weighted signed network called a Gene Regulatory Network (GRN) [10-14]. GRNs can be constructed with maximum entropy models and analyzed with balance theory approaches [15] and topological methods [16]. Moreover, responses to driving forces on the structure formation of these networks cause new features to develop and subsequently lead to the identification of unique patterns in the observational data. These patterns can arise from non-trivial connections that go beyond classical pairwise interactions, leading to a higher-order construction [16]. Such constructions can be described by simplices of different dimensions and hence can be studied in the framework of Balance Theory and Topological Data Analysis (TDA). From TDA, we employ Persistent Homology (PH), an analysis tool based on algebraic topology that has been applied to problems in a variety of fields such as network science, physics, chemistry, biology, and medicine [17-31]. PH has previously been used to study protein-protein interaction networks to inform cancer therapy by determining the correlation between Betti numbers and the survival of cancer patients [32].

To study the states of balanced and imbalanced cancer networks, we previously modeled GRNs by groups of three interacting genes forming triangles (triads) of interactions [15]. The resulting analysis of the signed weighted network in the context of Balance Theory showed significant differences between the cancer and healthy GRNs in a number of characteristics, such as the energy, number, and distribution of imbalanced triangles. This paper aims to study the higher-order representation of gene regulatory interaction networks derived from cancer and normal samples. Using PH, we address theoretical concepts using empirical data and report network features of cancer samples compared to their normal counterparts. Finally, we propose PH as an unsupervised network analysis method for studying human diseases such as cancer.

Network construction from real data and the result of balance theory analysis of the interaction network

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, leading to the production of protein as the final functional product. Cells use a wide range of mechanisms, known as gene regulation, to increase or decrease the production of specific gene products. Gene expression data are large-scale measurements of the degrees of freedom of a biological system such as a cell. In the language of statistical physics, these describe the micro-states of a cell. A gene regulatory network is a complex network [33] whose nodes represent the genes and whose links represent the interactive couplings between genes, which can be used to predict the global properties of the cells.

We used mRNA (expression level) data for 20532 genes in breast cancer (BRCA: Breast invasive carcinoma) from The Cancer Genome Atlas (TCGA) [34,35]. We used RPKM (Reads Per Kilobase transcript per Million reads) normalized data to find the correlations between the expression levels of the genes, because RPKM combines normalization by sample and by gene: it normalizes for both the library size (the sum of each column) and the gene length. For computational reasons, we kept only the 483 most variable genes for all analyses: for each gene we calculated the variance of its expression level over the samples and retained the 483 genes with the highest variance, since these genes show the most varied activity patterns. The cohort consisted of two sets of 114 healthy and 764 cancer samples.
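The variance-based gene selection described above can be sketched in a few lines. This is a minimal illustration on made-up data, not the authors' pipeline: the function name `top_variable_genes`, the toy gene names, and the use of the population variance are all assumptions (with the same number of samples per gene, population and sample variance give the same ranking).

```python
from statistics import pvariance

def top_variable_genes(expression, n_top):
    """Return the n_top gene names with the largest expression variance.

    expression: dict mapping gene name -> list of expression values,
    one value per sample (hypothetical input layout).
    """
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_top]

# Toy data: 4 hypothetical genes measured in 5 samples (values are made up).
toy_expression = {
    "geneA": [1.0, 1.1, 0.9, 1.0, 1.0],
    "geneB": [0.0, 5.0, 10.0, 2.0, 8.0],
    "geneC": [3.0, 3.5, 2.5, 3.0, 3.2],
    "geneD": [7.0, 1.0, 7.0, 1.0, 4.0],
}
selected = top_variable_genes(toy_expression, n_top=2)
print(selected)
```

In the study the same idea is applied with 20532 genes and a cutoff of 483.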

We constructed a pairwise correlation matrix [36,37] based on the pairwise gene expressions in the obtained data-set. To find the regulatory connections between genes, we needed a statistical description of the data in terms of suitable observables from which to infer [38,39] the underlying regulatory connections. Therefore, we restricted ourselves to an undirected pairwise maximum-entropy probability model with terms up to second order [40-42], which we derive for continuous, real-valued variables. This can be considered a problem in inverse statistical physics [43,44], where we want to infer the parameters of a model from observations instead of calculating observables from model parameters. We applied the following model with pairwise couplings

$$P(\{S_i\})=\frac{1}{Z}\exp\left(\sum_{i<j}J_{ij}S_iS_j\right) \tag{1}$$

where $S_i$ represents the expression level of gene $i$ as a continuous real-valued variable, and the interaction matrix $J_{ij}$ describes the strength of the net interaction between two genes. $Z$ is the so-called partition function, which normalizes the model. The corresponding Hamiltonian (energy function) for this Boltzmann distribution is then $H=-\sum_{i<j}J_{ij}S_iS_j$.

Model parameters can be found with Lagrange multipliers by imposing two conditions: (i) the model should give the same first and second moments as those measured from the data, and (ii) it must maximize the Gibbs-Shannon entropy, defined as $S[P]=-\sum P(\{S_i\})\ln(P(\{S_i\}))$. The resulting model is a multivariate Gaussian distribution of the form:

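$$P(S;\langle S\rangle,C)=\frac{\exp\left(-\frac{1}{2}(S-\langle S\rangle)^{T}C^{-1}(S-\langle S\rangle)\right)}{(2\pi)^{L/2}\det(C)^{1/2}}, \tag{2}$$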

where $L$ is the number of genes in the distribution, and the couplings can be inferred simply by inverting the matrix of variances and covariances of the expression levels, $J_{ij}=-C^{-1}_{ij}$. This approach is also linked to the concept of partial correlations in statistics [45,46]: the inverse of the covariance matrix, $C^{-1}$, also known as the precision matrix, carries information about the partial correlations of the variables.
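The step from the Gaussian form to the couplings can be made explicit. The following is a short sketch, assuming centered variables ($\langle S\rangle=0$) and using the symmetry of $C^{-1}$; it compares the exponent of Eq. 2 with the exponent of Eq. 1.

```latex
\begin{align}
-\tfrac{1}{2}\, S^{T} C^{-1} S
  &= -\tfrac{1}{2}\sum_{i,j} (C^{-1})_{ij}\, S_i S_j \\
  &= -\sum_{i<j} (C^{-1})_{ij}\, S_i S_j
     \;-\; \tfrac{1}{2}\sum_{i} (C^{-1})_{ii}\, S_i^{2}.
\end{align}
% Matching the off-diagonal part with \sum_{i<j} J_{ij} S_i S_j in Eq. 1:
\begin{equation}
J_{ij} = -(C^{-1})_{ij}, \qquad i \neq j .
\end{equation}
```

The diagonal terms involve single genes only and have no counterpart among the pairwise couplings written in Eq. 1.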

By assuming a maximum entropy pairwise model, we were looking for the interaction matrix $J$, whose every element $J_{ij}$ is the strength of the net interaction between gene $i$ and gene $j$. In other words, the strength and the sign of the interaction represent the mutual influence of a pair of genes’ expression levels on one another. In real data samples, we considered genes that are either expressed or not expressed together, and defined them as correlated when they are mutually expressed (or not expressed). From these, one can construct correlation matrices. Constructing the interaction matrix, however, requires a model Hamiltonian that produces the coefficients. Hence, from the experimental data we reconstruct the gene-gene interactions computationally on the basis of a model, following the practice that collective behaviors in such systems are described quantitatively by models that capture the observed pairwise correlations. The elements of the proposed interaction matrix $J$ represent pairwise interactions between genes, where the weight of the link i-j, $J_{ij}$, denotes the strength of the interaction between gene $i$ and gene $j$. Furthermore, the genetic interaction (GI) between two genes can be inferred from the sign of their interaction, which indicates how they may affect each other’s functions. Positive and negative interactions in the constructed network imply gene expression modulation within the network. We expect $J$ to be a sparse matrix, since each gene interacts with only a few other genes. Numerically inverting a large covariance matrix, however, yields a matrix in which almost no element is zero. To avoid this, the inverse of the covariance matrix was obtained with the Graphical Lasso (GLasso) algorithm [47].
GLasso is a sparse penalized maximum likelihood estimator for the concentration matrix (the inverse of the covariance matrix) of a multivariate elliptical distribution. When dealing with a multivariate Gaussian distribution with limited observations (too few samples) [48,49], GLasso yields a sparse network ($-C^{-1}$) while preserving the global features of the network [50]. In network analysis, simple thresholding methods can be misleading, because removing weak ties may fragment the network: a pair of genes may be weakly connected while that tie plays a significant role in the structure of the network. Conversely, removing a strong connection between an insignificant or isolated pair of nodes may not destroy the global features of the network. GLasso takes such issues into account.

According to structural balance theory, dyadic links carrying positive and negative interactions yield four different types of triads (triangles of interactions) in the network [51-55]. The balanced and imbalanced states of triangles are determined by the sign of the product of their links: a triangle is balanced when the product is positive ($J_{ij}J_{jk}J_{ki}>0$) and imbalanced, or frustrated, otherwise ($J_{ij}J_{jk}J_{ki}<0$). The corresponding energy of a triangle, defined as $E_{ijk}=J_{ij}J_{jk}J_{ki}$, constructs an “energy landscape” for the network. The stability of imbalanced triangles in the GRN has been studied in previous reports, and the complex structures and collective behavior of genes have been examined. Previous results confirmed that cancerous cells possess fewer imbalanced triangles than normal samples. In addition, imbalanced triangles in the healthy network appear to be more isolated from the main part of the network. It was shown how the distribution of triangles in the network and the absolute value of their corresponding energy can be used to compare normal and cancer networks [15].
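The sign rule for triads can be illustrated with a small sketch. The network below is a made-up toy example, and the function name `classify_triads` and the dictionary layout are assumptions introduced for illustration; the energy follows the product $J_{ij}J_{jk}J_{ki}$ as written in the text.

```python
from itertools import combinations

def classify_triads(J):
    """List every closed triad with its energy and balance label.

    J: dict mapping frozenset({i, j}) -> signed link weight.
    Pairs absent from J are treated as having no link (hypothetical layout).
    """
    nodes = sorted({n for pair in J for n in pair})
    triads = []
    for i, j, k in combinations(nodes, 3):
        edges = [frozenset((i, j)), frozenset((j, k)), frozenset((k, i))]
        if all(e in J for e in edges):
            # Product of the three link weights, as in the text.
            energy = J[edges[0]] * J[edges[1]] * J[edges[2]]
            label = "balanced" if energy > 0 else "imbalanced"
            triads.append(((i, j, k), energy, label))
    return triads

# Toy signed network with two closed triads (weights are made up).
J = {
    frozenset(("a", "b")): 0.5,
    frozenset(("b", "c")): -0.4,
    frozenset(("a", "c")): -0.2,
    frozenset(("b", "d")): 0.3,
    frozenset(("a", "d")): -0.1,
}
triads = classify_triads(J)
for nodes_in_triad, energy, label in triads:
    print(nodes_in_triad, round(energy, 3), label)
```

Here the triad (a, b, c) has two negative links and is balanced, while (a, b, d) has a single negative link and is imbalanced.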

Stability, in terms of Balance Theory, corresponds to a lower energy level according to the proposed energy equation [56-59]. It implies a lower likelihood of changing the configuration of the triangles and therefore less change in gene regulation within the network. The energy landscape of networks was previously proposed as a means to examine the state of balance [60]. The energy distributions of the different types of triangles differed significantly between cancer samples and their normal counterparts. In addition, it was found that the cancer network has less tendency to change its state, owing to its lower energy level compared to the normal network [15].

Examining the distribution of the triangles suggested that the correlations between such triangles also differ between the two networks [15]. Based on this observation, we asked how triads with different energies are connected to one another and how the distributions of frustrated triangles in the normal and cancer networks differ. To address this, one can extend the range of interaction from triplet interactions to higher-order ones, namely quartic interactions or energy-energy correlations between triangles, which allows one to study the influence of units of four entities on the final degree of balance [61]. Considering a simple pairwise interaction term between triads with a common edge, as in previous reports, the model Hamiltonian for the states of balanced and imbalanced triads is defined as

$$H=-\sum_{i<j<k<l}\Delta_{ijk}\Delta_{ijl}=-s(G), \tag{3}$$

where $\Delta_{ijk}$ represents a triad formed by the nodes $i$, $j$, $k$. In quartic balance theory the number of squares, $s(G)$, is an essential parameter for a specific graph configuration, and according to structural balance the corresponding energies can be compared. This formalism examines the probability distribution of the energy levels of the jammed states, assuming that for the triads the shift from balanced to imbalanced can be determined from all triads that share a common link [61]. As discussed, constructing third- and fourth-order interaction networks and examining their corresponding energies for normal and cancer networks provides practical insights and can be used to compare the stability, energy, and tendency to change state of cancer and normal samples. This motivated us to study higher-order interactions further, so as to gain a thorough perspective on these interactions and their patterns and to address open questions in this matter. In this paper, as an alternative to studying higher-dimensional simplices, we employed a topological scheme to examine the interaction patterns of the two networks. This method studies cancer and normal gene networks through the behaviour of k-dimensional holes, as a general approach to their higher-order interactions.

Method

By analogy, studying and comparing patterns of interactions in the networks, as an alternative to going beyond triads or quartic order, can be thought of as describing a building by its floors, bedrooms, and hallways rather than by its building blocks. To study higher-order interactions in cancer networks, formed not only by nodes and links but also by triangles and cliques of higher dimensions, we employed an algebraic topology strategy suited to analyses that must capture higher dimensionality rather than simple pairwise interactions. We suggest that these representations complement our previously employed network techniques for distinguishing the features of cancerous and normal networks. Here we review some fundamentals of algebraic topology and of the homology theory used in topological data analysis [62-65]. A simplicial complex is a finite collection of k-dimensional simplices (k-simplices) $\sigma_k=[v_0,v_1,\dots,v_k]$. Figure 1 shows the configuration of low-dimensional simplices, their network representation, the construction of the associated clique simplicial complex from an unweighted network of nodes and links, and their topological features. As can be seen from the figure, a 0-simplex $\sigma_0$ is a vertex (node), a 1-simplex $\sigma_1$ is an edge (link), a 2-simplex $\sigma_2$ is a triangle, a 3-simplex $\sigma_3$ is a tetrahedron, and so on; see Fig. 1a. For a given simplicial complex $\psi$, one can define a k-dimensional chain (k-chain) as a linear combination of the k-simplices of $\psi$ as follows:

$$c_k=\sum_i a_i\sigma_k^{(i)}, \tag{4}$$

where the coefficients $a_i\in\mathbb{Z}_2$ and the sum runs over all k-simplices $\sigma_k$ in $\psi$. The k-simplices span an abstract vector space $C_k$, the so-called k-dimensional chain group (k-chain group), whose dimension is the number of k-simplices of the complex. To measure the topological features and study the homology of the complex, a k-dimensional boundary operator is defined for simplices of any dimension k as:

$$\partial_k(\sigma_k)=\sum_{i=0}^{k}(-1)^{i}[v_0,\dots,v_{i-1},v_{i+1},\dots,v_k] \tag{5}$$

So $\partial_k$ is an operator mapping $\sigma_k$ to its boundary, and consequently the k-dimensional chain group $C_k$ to the (k-1)-dimensional chain group $C_{k-1}$:

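$$\cdots\xrightarrow{\partial_{k+2}}C_{k+1}\xrightarrow{\partial_{k+1}}C_k\xrightarrow{\partial_k}C_{k-1}\xrightarrow{\partial_{k-1}}\cdots\rightarrow C_2\xrightarrow{\partial_2}C_1\xrightarrow{\partial_1}C_0\xrightarrow{\partial_0}\emptyset$$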
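The action of the boundary operator on chains with $\mathbb{Z}_2$ coefficients can be sketched as follows. This is an illustrative toy implementation, not code from the study: the function name `boundary` and the representation of a chain as a set of sorted vertex tuples are assumptions. Over $\mathbb{Z}_2$ the sign $(-1)^i$ of Eq. 5 plays no role, so a face simply cancels when it appears an even number of times.

```python
from collections import Counter

def boundary(chain):
    """Boundary of a Z2 chain.

    chain: set of simplices, each a sorted tuple of vertices.
    Returns the set of faces that appear an odd number of times.
    """
    faces = Counter()
    for simplex in chain:
        if len(simplex) == 1:
            continue  # a single vertex has an empty boundary
        for i in range(len(simplex)):
            # Drop the i-th vertex to obtain one face of the simplex.
            faces[simplex[:i] + simplex[i + 1:]] += 1
    # Over Z2, faces counted an even number of times cancel out.
    return {face for face, count in faces.items() if count % 2 == 1}

triangle = {(0, 1, 2)}
edges = boundary(triangle)      # the three edges of the triangle
print(sorted(edges))
print(boundary(edges))          # empty: applying the boundary twice gives nothing

tetrahedron = {(0, 1, 2, 3)}
print(len(boundary(tetrahedron)))
```

Applying `boundary` twice always returns the empty set, which is the property that makes the sequence of chain groups above a chain complex.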

Figure 1.

(a) 0-, 1-, 2-, and 3-simplex from left to right (top row), and their network representation (bottom row). (b) Examples of topological spaces with their associated Betti numbers (left column), and the equivalent spaces and their network representation (right column). (c) An example of constructing the associated clique simplicial complex (middle column) from an unweighted network of nodes and links (left column), and its network representation with its topological features (right column). In the network representation, orange and blue subnetworks correspond to 1-holes (loops) and 2-holes (voids), respectively. The network has the Betti vector β=(1,2,1).

One can define a k-dimensional cycle (k-cycle) $z_k$ as a k-chain $c_k$ that is mapped to the empty set by the boundary operator, $\partial_k(c_k)=\emptyset$. The k-cycles form a subspace $Z_k$ of the vector space $C_k$, the so-called k-dimensional cycle group (k-cycle group). On the other hand, a k-chain $c_k$ that is the boundary of a (k+1)-chain $c_{k+1}$ is defined as a k-dimensional boundary (k-boundary) $b_k$, and the k-boundaries form the k-dimensional boundary group (k-boundary group) $B_k$ as a subspace of $C_k$. Since “boundaries have no boundary”, one can easily write $B_k\subseteq Z_k\subseteq C_k$. The idea of homology theory is to discard k-cycles that are also k-boundaries. To this end, we put an equivalence relation on $Z_k$ as follows: two k-cycles $z_k^{(i)}$ and $z_k^{(j)}$ are homologous (equivalent), $z_k^{(i)}\sim z_k^{(j)}$, if $z_k^{(i)}-z_k^{(j)}\in B_k$. The equivalence relation partitions the subspace $Z_k$ into a union of disjoint subsets, called homology classes. The k-homology group of the complex $\psi$ is defined as $H_k\equiv\{[z_k]\,|\,z_k\in Z_k\}$, where $[z_k]$ is the homology class of $z_k\in Z_k$.

$$H_k=Z_k/B_k \tag{6}$$

The kth Betti number of the complex $\psi$, denoted by $\beta_k(\psi)$, is a topological invariant of the complex defined as the dimension of its k-homology group. Intuitively, $\beta_k(\psi)$ counts the k-dimensional topological holes (k-holes) of $\psi$. Thus $\beta_0$ counts the connected components of $\psi$, $\beta_1$ counts its 1-holes (loops), $\beta_2$ counts its 2-holes (voids), and so on; see Fig. 1b. In the graph representation of Fig. 1, the one-dimensional and two-dimensional topological holes are drawn with orange and blue lines, respectively, as a starting point for defining and identifying these topological features schematically. The Betti numbers thus serve as an algebraic tool to classify topological spaces and to study the homology of the complex [66]. To study complex networks in terms of homology theory, we use the persistent homology (PH) technique, the main part of topological data analysis (TDA), as a modern mathematical tool in data science. Following the persistent homology strategy, rather than working only with the set of nodes (0-simplices) and links (1-simplices) and the statistical properties of the network defined in network science [67], we consider higher-order connections as high-dimensional simplices to map the network. In fact, a clique simplicial complex of a network is a simplicial complex in which any k-simplex $\sigma_k$ corresponds to a (k+1)-clique (a complete sub-network of order k+1); see Fig. 1c. To analyze the impact of weight on the structure of a complex network, PH takes the weight as the filtering parameter (threshold), so that a filtration, an increasing sequence of complexes, can be created in which all 1-simplices (links) with weights higher than the threshold are removed from the weighted complex (network).
As the threshold changes, various topological features such as 1-dimensional holes (loops) and 2-dimensional holes (voids) appear (birth), and they may later disappear (death) at higher values. During a filtration, by varying the interaction threshold $w$, a topological feature $h_k$ may appear at $w_b(h_k)$ or disappear at $w_d(h_k)$; the persistence (lifetime) $l(h_k)\equiv w_d(h_k)-w_b(h_k)$ of these homological features can be used to analyze global features of the data-set, which in our case means examining the differences between the two data-sets [29,68]. The persistence barcode (PB), or equivalently the persistence diagram (PD), for each dimension is a representation of PH that summarizes the topological information of the data-set. For instance, in the PD of the kth dimension for a weighted complex $\psi(w)$, any topological feature $h_k$ is represented by a point $p(h_k)=(w_b(h_k),w_d(h_k))$, the persistence pair, in a 2-dimensional Euclidean space. Figure 2 illustrates the filtration process and the evolution of k-dimensional topological holes and their persistence as the threshold increases, by means of the persistence diagram and barcode of the filtration. With this approach, one can capture global features of the network at any threshold (weight) and monitor the persistence and robustness of the topological features. Hence, adopting a simplicial model, a gene is defined as a 0-simplex $\sigma_0$, an interaction between two genes as a 1-simplex $\sigma_1$, and so on. By varying the scale over which the connections between vertices are made, we aim to distinguish the behavior of these simplices in the two networks. In a network of interactions, where the genes are vertices and the interactions between two genes are edges, we use PH to map the network to a weighted clique simplicial complex, use the strength of interaction as a varying threshold, and obtain a family of complexes (subcomplexes) as a function of the weight.
We thereby establish a family of unweighted graphs whose topological features can be examined and whose topological evolution can be studied as a function of the interaction threshold. This approach can be taken as an alternative to assigning a Hamiltonian to a weighted interaction network, comparing the two networks topologically rather than quantitatively in terms of their energy landscape.
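The construction above (threshold the weighted network, build the clique complex, read off the Betti numbers) can be sketched on a toy network. This is an illustrative brute-force implementation and not the software used in the study: all function names are hypothetical, the use of the absolute weight in the threshold test is an assumption, and the clique enumeration is only practical for very small graphs. Betti numbers are obtained as $\beta_k=\dim C_k-\mathrm{rank}\,\partial_k-\mathrm{rank}\,\partial_{k+1}$ with ranks taken over $\mathbb{Z}_2$.

```python
from itertools import combinations

def cliques(nodes, edges, size):
    """All cliques with `size` vertices, as sorted tuples (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    return [c for c in combinations(sorted(nodes), size)
            if all(frozenset(p) in edge_set for p in combinations(c, 2))]

def rank_gf2(rows):
    """Rank over Z2 of a matrix whose rows are ints used as bitmasks."""
    rank, rows = 0, list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def boundary_rank(simplices, faces):
    """Rank over Z2 of the boundary map from `simplices` to `faces`."""
    index = {f: n for n, f in enumerate(faces)}
    rows = []
    for s in simplices:
        mask = 0
        for i in range(len(s)):
            mask |= 1 << index[s[:i] + s[i + 1:]]
        rows.append(mask)
    return rank_gf2(rows)

def betti_numbers(nodes, weighted_edges, threshold):
    """(beta0, beta1, beta2) of the clique complex of links with |w| <= threshold."""
    edges = [(u, v) for u, v, w in weighted_edges if abs(w) <= threshold]
    simplices = [cliques(nodes, edges, size) for size in (1, 2, 3, 4)]
    ranks = [0] + [boundary_rank(simplices[k], simplices[k - 1]) for k in (1, 2, 3)]
    return tuple(len(simplices[k]) - ranks[k] - ranks[k + 1] for k in (0, 1, 2))

# Toy network: the octahedron graph, built up in three weight levels.
nodes = [0, 1, 2, 3, 4, 5]
weighted_edges = [(2, 4, 0.1), (3, 4, -0.1), (3, 5, 0.1), (2, 5, -0.1)]
weighted_edges += [(0, n, 0.2) for n in (2, 3, 4, 5)]
weighted_edges += [(1, n, -0.3) for n in (2, 3, 4, 5)]

for w in (0.05, 0.1, 0.2, 0.3):
    print(w, betti_numbers(nodes, weighted_edges, w))
```

In this toy filtration a loop is born when the four lightest links close a cycle, it dies when the next four links fill it with triangles, and a void is born once all twelve links are present. For networks of realistic size, dedicated persistent homology software is the usual choice.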

Figure 2.

(a) An example of the adjacency matrix for a weighted network. The shade of each pixel corresponds to the weight of the link between the associated nodes. (b) A filtration for the weighted clique simplicial complex constructed from the weighted network. (c) Persistence barcode representing the topological evolution of k-dimensional topological holes. The evolution (birth-weight and death-weight) of any k-hole in the filtration is represented by a horizontal bar (k=0,1,2: black, orange, and blue bar, respectively), starting at its birth-weight and ending at its death-weight. The arrows indicate the holes that survive. (d) Persistence diagram for the filtration. Any k-hole in the filtration is shown by a point (k=0,1,2: black circle, orange triangle, and blue square, respectively), called a persistence pair, in 2-dimensional Euclidean space, known as birth-death space. The first and second elements of the persistence pair equal the birth-weight and death-weight, respectively. The surviving holes lie on the horizontal red dashed line.
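For dimension 0, persistence pairs of the kind shown in panels (c) and (d) can be computed with a union-find structure. The sketch below is a toy illustration under stated assumptions: every node is taken to be born at threshold 0, links enter the filtration in order of increasing absolute weight, and the function name `zero_dim_persistence` is hypothetical.

```python
def zero_dim_persistence(nodes, weighted_edges):
    """Persistence pairs (birth, death) of connected components.

    Each time a link merges two components, one component dies at that
    link's absolute weight. Components that never merge get death = inf.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    pairs = []
    for u, v, w in sorted(weighted_edges, key=lambda e: abs(e[2])):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            pairs.append((0.0, abs(w)))
    survivors = len(nodes) - len(pairs)
    pairs.extend((0.0, float("inf")) for _ in range(survivors))
    return pairs

# Toy weighted network (weights are made up).
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b", 0.1), ("b", "c", -0.3), ("a", "c", 0.2), ("c", "d", 0.5)]
pairs = zero_dim_persistence(toy_nodes, toy_edges)
lifetimes = [death - birth for birth, death in pairs]
print(pairs)
print(lifetimes)
```

The link b-c does not merge anything because a, b, and c are already connected when it enters; it closes a loop instead, which is a dimension-1 feature not tracked by this sketch. The single pair with infinite death corresponds to a surviving component, drawn as an arrow in the barcode.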

Result and discussion

By analyzing the interaction networks from the topological point of view, we aim to uncover insights into cellular gene interaction patterns. To this end, applying the PH technique to the weighted complex networks of the normal and cancerous data sets, we analyze the evolution of the dimension of the k-homology group of the topological space ($\beta_k$); these Betti numbers give the number of k-dimensional topological holes. As previously noted, a k-hole of the space is, depending on its dimension, a subspace that has no boundary and is not the boundary of any space. From the complex network perspective, the k-holes indicate a lack of higher-order connections (links, triangles, ...) between the nodes (agents) of the network: an increase in the number of 0-holes $\beta_0$ (connected components) indicates a lack of links (1-simplices) joining the connected components, whereas an increase in the number of 1-holes $\beta_1$ (topological loops) implies a lack of triangles (2-simplices) connecting the nodes (agents) of a sub-network. By extracting the homological features as a set of evolving Betti numbers of dimension 0 to 2, we compare the interaction patterns of the two gene regulatory networks topologically. Measuring the number of independent holes of dimension k, and plotting their persistence barcodes, persistence diagrams, and evolution as a function of weight, is the key to our analysis of the topological features of these two data sets. Figure 3 shows the evolution of the number of connected components (0-dimensional holes), its topological barcode, and the persistence diagram for both networks as a function of the threshold. As the absolute value of the threshold was increased from 0, the number of components decreased suddenly in both networks. For the cancer network, this sudden drop occurred at a smaller value of interaction.

Figure 3.

(a) Persistence diagram of the 0-homology group (PD0) for the normal (blue circles) and the cancer (orange triangles) gene interaction networks. The cancerous network includes two small persistent clusters (orange triangles between the dashed lines). Inset: Corresponding persistence barcode (PB0) for the normal (blue bars) and cancerous (orange bars) networks. The number of surviving connected components (arrows) indicates that both networks are path-connected, and the two long orange bars correspond to the small persistent clusters in the cancerous network. (b) The number of connected components as a function of threshold (β0-curve) for the normal (blue circles) and the cancer (orange triangles) networks. These curves indicate that the cancer network has more global accessibility than the normal network. The number of connected components in the cancer data-set dropped at a smaller value.

We then asked whether this apparent separation was due to the variation in the strength and distribution of links in the two networks, where the range of the weights seems to be narrower in the cancerous data and more spread out in the normal one (Fig. 4). We found that the faster decline in the number of components of gene expression interactions in the cancer network is driven by links with smaller weights. Gene interactions with higher weight values play a crucial role in the normal case; conversely, links with lower interaction values become dominant in the cancerous network. It should also be pointed out that the two orange triangles between the two dashed lines of the PD0 plot (equivalently, the two long orange bars in PB0) account for two small persistent clusters in the cancerous network. The β0-curve, and correspondingly the number of arrows in the PB0 plot, confirms that both networks are path-connected at high weights. We further tested the contribution of gene interaction patterns to cellular networks by comparing the number of 1-dimensional holes (loops) in both networks, for which the curves of the cancerous and healthy samples deviate significantly. Figure 5 shows the number of loops as a function of threshold. The PD1 and PB1 plots illustrate that the cancerous network contains more persistent loops (persistence pairs between the dashed lines in PD1 and long bars in PB1) than the normal one. In the bottom panel of Fig. 5, the β1-curve reveals that the two networks reach the loopful regime at distinct threshold values. According to the PB1 plot and the β1-curve, several loops survive (arrows in PB1 and the tail of the β1-curve) in the cancerous network, while the normal network is almost loopless at higher thresholds. By increasing the weight of the interactions to its highest value, we noticed that the number of loops in the cancer samples does not reach zero.
Our results suggest that studying the pattern of surviving 1-dimensional holes can shed light on the role of these persistent topological spaces in cancer networks.

Figure 4.

Distribution of link weights for the normal (blue) and cancerous (orange) networks. The distribution of the cancerous sample is narrower than that of the normal one, which indicates that the normal network has more high-weighted links (tail of the distribution) than the cancerous network.

Figure 5.

(a) Persistence diagram of the 1-homology group (PD1) for the normal (blue circles) and cancerous (orange triangles) interaction networks. There are more persistent loops (persistence pairs between the dashed lines) in the cancerous network than in the normal network. Inset: Corresponding persistence barcode (PB1) for the normal (blue bars) and cancerous (orange bars) networks. The long bars correspond to the persistent loops in the normal and cancerous networks, and the number of surviving loops (arrows) is larger in the cancerous network than in the normal network. (b) The number of topological loops as a function of threshold (β1-curve) for the normal (blue circles) and cancerous (orange triangles) networks. The networks become loopful at different thresholds (0.02 and 0.03, respectively), whereas they include almost the same number of loops at 0.02 and 0.10. More importantly, the tails of the curves show that the cancerous network is loopful, whereas the normal network is almost loopless.

Figure 6 compares the number of two-dimensional holes (voids) in these networks. The persistence pairs between the dashed lines in PD2, along with the long bars of PB2, suggest that the normal network includes more persistent voids than the cancerous one. As can be seen from this figure, the number of two-dimensional holes in both networks starts increasing at small threshold values. The separation of the β2-curves, however, illustrates that the saturation of the void statistics is distinct in the two data sets: the β2-curve of the cancerous network saturates at smaller values of interaction. This separation is evident at higher threshold values, where the last stage of filtration shows a prominent deviation in the number of voids between the two states. According to the number of arrows in the PB2 plot and the tail of the β2-curve, the number of voids in the normal network is significantly higher. We conclude that, unlike loops, voids are more dominant in the normal network in the high-threshold region. Our analysis of the weight distribution implies that the links of the cancerous network have, overall, weaker interactions than those of the normal case. The sharper weight distribution of the cancerous network around smaller absolute values reveals how this network goes through its topological evolution more promptly, as its weights are more restricted to smaller values. The drop in the number of connected components, the rise in the number of loops and voids, and the saturation all happened at smaller thresholds than in the normal case. One biological interpretation of this result could be that genes in the cancerous cell are highly dependent on specific pathways, causing them to start interacting at smaller thresholds and to find their isolated pathways at smaller values.
In this study, according to our results, we propose that TDA can be employed to associate cancer cell proliferation with the number and evolution of topological features, and thus to study this disease from the viewpoint of gene interaction patterns. This can help establish how local topological modifications contribute to global features. We propose examining the patterns of interactions as a general, global picture, as an alternative to studying single genes and their pairwise interactions.

Figure 6.

(a) Persistence diagram of the 2-homology group (PD2) for the normal (blue circles) and cancerous (orange triangles) interaction networks. The normal network contains more persistent voids (persistence pairs between the dashed lines) than the cancerous one. Inset: corresponding persistence barcode (PB2) for the normal (blue bars) and cancerous (orange bars) networks. The long bars correspond to the persistent voids; likewise, more voids survive (arrows) in the normal network than in the cancerous network. (b) β2-curve for the normal (blue circles) and cancerous (orange triangles) networks. The curves show that the numbers of voids in the two networks saturate at different threshold values: the β2-curve for the cancerous network saturates earlier (at approximately 0.1) than that of the normal network (approximately 0.5), and at a lower value (approximately 5000) than in the normal case (approximately 8000).
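The relation between a persistence barcode and the corresponding Betti curve can be made explicit with a short sketch. It assumes each bar is a half-open interval [birth, death), with an infinite death value for features that survive the last stage of filtration (the arrows in the barcodes). The bar values and helper names below are invented for illustration and are not taken from the data.

```python
import math


def betti_curve(bars, thresholds):
    """Number of bars [birth, death) alive at each threshold."""
    return [sum(1 for b, d in bars if b <= t < d) for t in thresholds]


def persistent_bars(bars, min_persistence):
    """Bars whose lifetime (death - birth) is at least min_persistence."""
    return [(b, d) for b, d in bars if d - b >= min_persistence]


def surviving_bars(bars):
    """Bars that never die, i.e. features present at the end of filtration."""
    return [(b, d) for b, d in bars if math.isinf(d)]


# Hypothetical barcode: one short-lived bar, one long bar, one surviving bar.
toy_bars = [(0.02, 0.05), (0.03, 0.40), (0.10, math.inf)]

print(betti_curve(toy_bars, [0.01, 0.04, 0.2, 0.5]))  # counts per threshold
print(len(persistent_bars(toy_bars, 0.3)))            # long-lived features
print(len(surviving_bars(toy_bars)))                  # the "arrows"
```

In this picture the tail of a Betti curve equals the number of surviving bars, which is why the arrows in a barcode and the last point of the corresponding curve carry the same information.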

Conclusion

Adopting a novel computational approach, we propose that topological data analysis methods, such as Persistent Homology, can be used to study cancer sample data to gain a better perspective on the complexity of this disease at the network level. Cancer is one of the most common human genetic diseases, generated by certain modifications to the genes that control the way our cells function. Genes interact with each other; their highly correlated expression and their interactions within a regulatory framework lead to the emergence of complex structures in cells, which has led researchers to investigate the Gene Regulatory Network (GRN) of cells in the framework of graph theory. In this study, we found that network structures are distinctive for normal and cancer samples in both the number and persistence of topological features. Biologically, it is possible that the patterns of Betti curves in cancer samples are a manifestation of oncogene addiction at the network level. This phenomenon is defined based on experimental observations that cancer cells appear to be highly dependent on a specific oncogenic pathway69. It is plausible that the persistent topological spaces in cancer samples are sets of tightly related genes that modulate a specific oncogenic pathway critical for cellular survival and proliferation. Referring back to our building analogy, with its floors and its building plan, our question now is whether there are established patterns in cancerous networks upon which genes interact, and how these patterns, which deviate significantly from those of the healthy network, develop within the networks.

Author contributions

All authors contributed equally.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Hosein Masoomy, Behrouz Askari and Samin Tajik.

References

  • 1. Chow AY. Cell cycle control by oncogenes and tumor suppressors: Driving the transformation of normal cells into cancerous cells. Nature Education. 2010;3:7035–7040.
  • 2. Hassanpour SH, Dehghani M. Review of cancer from perspective of molecular. J. Cancer Res. Pract. 2017;4:127–129. doi: 10.1016/j.jcrpr.2017.07.001.
  • 3. Weir HK, Thompson TD, Soman A, Møller B, Leadbetter S. The past, present, and future of cancer incidence in the United States: 1975 through 2020. Cancer. 2015;121:1827–1837. doi: 10.1002/cncr.29258.
  • 4. Newman ME. The structure and function of complex networks. SIAM Rev. 2003;45:167–256. doi: 10.1137/S003614450342480.
  • 5. Barabasi A-L, Oltvai ZN. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004;5:101–113. doi: 10.1038/nrg1272.
  • 6. Hecker M, Lambeck S, Toepfer S, Van Someren E, Guthke R. Gene regulatory network inference: data integration in dynamic models–a review. Biosystems. 2009;96:86–103. doi: 10.1016/j.biosystems.2008.12.004.
  • 7. Walhout AJ. Gene-centered regulatory network mapping. Methods Cell Biol. 2011;106:271–88. doi: 10.1016/B978-0-12-544172-8.00010-4.
  • 8. Peter IS, Davidson EH. Genomic Control Process: Development and Evolution. Academic Press; 2015.
  • 9. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U. Complex networks: Structure and dynamics. Phys. Rep. 2006;424:175–308. doi: 10.1016/j.physrep.2005.10.009.
  • 10. Costanzo M, Vander Sluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, Pelechano V, Styles EB, Billmann M, van Leeuwen J, van Dyk N, Lin ZY, Kuzmin E, Nelson J, Piotrowski JS, Srikumar T, Bahr S, Chen Y, Deshpande R, Kurat CF, Li SC, Li Z, Usaj MM, Okada H, Pascoe N, San Luis BJ, Sharifpoor S, Shuteriqi E, Simpkins SW, Snider J, Suresh HG, Tan Y, Zhu H, Malod-Dognin N, Janjic V, Przulj N, Troyanskaya OG, Stagljar I, Xia T, Ohya Y, Gingras AC, Raught B, Boutros M, Steinmetz LM, Moore CL, Rosebrock AP, Caudy AA, Myers CL, Andrews B, Boone C. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016;353(6306):aaf1420. doi: 10.1126/science.aaf1420.
  • 11. Liesecke F, et al. Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks. Sci. Rep. 2018;8:1–16. doi: 10.1038/s41598-018-29077-3.
  • 12. Ghorbani M, Jonckheere EA, Bogdan P. Gene expression is not random: Scaling, long-range cross-dependence, and fractal characteristics of gene regulatory networks. Front. Physiol. 2018;9:1446. doi: 10.3389/fphys.2018.01446.
  • 13. Huynh-Thu VA, Sanguinetti G. Gene regulatory network inference: An introductory survey. Methods Mol. Biol. 2018;1883:1–23. doi: 10.1007/978-1-4939-8882-2_1.
  • 14. Tieri P, Farina L, Petti M, Astolfi L, Paci P, Castiglione F. Network inference and reconstruction in bioinformatics. Encyclop. Bioinformat. Comput. Biol. 2019;2:805–813. doi: 10.1016/B978-0-12-809633-8.20290-2.
  • 15. Rizi KA, Zamani M, Shirazi A, Jafari GR, Kertész J. Stability of imbalanced triangles in gene regulatory networks of cancerous and normal cells. Front. Physiol. 2021;11:1792. doi: 10.3389/fphys.2020.573732.
  • 16. Battiston F. Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 2020;874:1–92. doi: 10.1016/j.physrep.2020.05.004.
  • 17. Tadić B, Andjelković M, Boshkoska BM, Levnajić Z. Algebraic topology of multi-brain connectivity networks reveals dissimilarity in functional patterns during spoken communications. PLoS One. 2016;11:e0166787. doi: 10.1371/journal.pone.0166787.
  • 18. Andjelković M, Tadić B, Mitrović Dankulov M, Rajković M, Melnik R. Topology of innovation spaces in the knowledge networks emerging through questions-and-answers. PLoS One. 2016;11:e0154655. doi: 10.1371/journal.pone.0154655.
  • 19. Andjelković M, Tadić B, Melnik R. The topology of higher-order complexes associated with brain hubs in human connectomes. Sci. Rep. 2020;10:1–10. doi: 10.1038/s41598-020-74392-3.
  • 20. Sizemore AE, Phillips-Cremins JE, Ghrist R, Bassett DS. The importance of the whole: Topological data analysis for the network neuroscientist. Netw. Neurosci. 2019;3:656–673. doi: 10.1162/netn_a_00073.
  • 21. Kartun-Giles AP, Bianconi G. Beyond the clustering coefficient: A topological analysis of node neighbourhoods in complex networks. Chaos Solit. Fract. X. 2019;1:100004. doi: 10.1016/j.csfx.2019.100004.
  • 22. Horak D, Maletić S, Rajković M. Persistent homology of complex networks. J. Stat. Mech. Theory Exp. 2009;2009:P03034. doi: 10.1088/1742-5468/2009/03/P03034.
  • 23. DeWoskin D, et al. Applications of computational homology to the analysis of treatment response in breast cancer patients. Topol. Appl. 2010;157:157–164. doi: 10.1016/j.topol.2009.04.036.
  • 24. Qaiser T, et al. Persistent homology for fast tumor segmentation in whole slide histology images. Proc. Comput. Sci. 2016;90:119–124. doi: 10.1016/j.procs.2016.07.033.
  • 25. Hiraoka Y, et al. Hierarchical structures of amorphous solids characterized by persistent homology. Proc. Natl. Acad. Sci. 2016;113:7035–7040. doi: 10.1073/pnas.1520877113.
  • 26. Ichinomiya T, Obayashi I, Hiraoka Y. Persistent homology analysis of craze formation. Phys. Rev. E. 2017;95:012504. doi: 10.1103/PhysRevE.95.012504.
  • 27. Nguyen M, Aktas M, Akbas E. Bot detection on social networks using persistent homology. Math. Comput. Appl. 2020;25:58.
  • 28. Hernández Serrano D, Sánchez Gómez D. Centrality measures in simplicial complexes: Applications of topological data analysis to network science. Appl. Math. Comput. 2020;382:125331.
  • 29. Aktas ME, Akbas E, El Fatmaoui A. Persistence homology of networks: Methods and applications. Appl. Netw. Sci. 2019;4:61. doi: 10.1007/s41109-019-0179-3.
  • 30. Olejniczak M, Severo Pereira Gomes A, Tierny J. A topological data analysis perspective on noncovalent interactions in relativistic calculations. Int. J. Quantum Chem. 2020;120:e26133. doi: 10.1002/qua.26133.
  • 31. Masoomy H, Askari B, Najafi M, Movahed S. Persistent homology of weighted visibility graph from fractional Gaussian noise. arXiv:2101.03328. 2021.
  • 32. Benzekry S, Tuszynski JA, Rietman EA, Klement GL. Design principles for cancer therapy guided by changes in complexity of protein-protein interaction networks. Biol. Dir. 2015;10:32. doi: 10.1186/s13062-015-0058-5.
  • 33. Newman M. Networks. Oxford University Press; 2018.
  • 34. Weinstein JN, et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 2013;45:1113. doi: 10.1038/ng.2764.
  • 35. https://www.cancer.gov/tcga.
  • 36. Lee JA, Dobbin KK, Ahn J. Covariance adjustment for batch effect in gene expression data. Stat. Med. 2014;33:2681–2695. doi: 10.1002/sim.6157.
  • 37. Schneidman E, Berry MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440:1007–1012. doi: 10.1038/nature04701.
  • 38. MacKay DJ. Information Theory, Inference and Learning Algorithms. Cambridge University Press; 2003.
  • 39. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.
  • 40. Stein RR, Marks DS, Sander C. Inferring pairwise interactions from biological data using maximum-entropy probability models. PLoS Comput. Biol. 2015;11:e1004182. doi: 10.1371/journal.pcbi.1004182.
  • 41. Moradimanesh Z, Khosrowabadi R, Gordji ME, Jafari G. Altered structural balance of resting-state networks in autism. Sci. Rep. 2021;11:1–16. doi: 10.1038/s41598-020-80330-0.
  • 42. Lezon TR, Banavar JR, Cieplak M, Maritan A, Fedoroff NV. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Proc. Natl. Acad. Sci. 2006;103:19033–19038. doi: 10.1073/pnas.0609152103.
  • 43. Nguyen HC, Zecchina R, Berg J. Inverse statistical problems: from the inverse Ising problem to data science. Adv. Phys. 2017;66:197–261. doi: 10.1080/00018732.2017.1341604.
  • 44. Castellana M, Bialek W. Inverse spin glass and related maximum entropy problems. Phys. Rev. Lett. 2014;113:117204. doi: 10.1103/PhysRevLett.113.117204.
  • 45. Krumsiek J, Suhre K, Illig T, Adamski J, Theis F. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Syst. Biol. 2011;5:21. doi: 10.1186/1752-0509-5-21.
  • 46. Baba K, Shibata R, Sibuya M. Partial correlation and conditional correlation as measure of conditional independence. Aust. N. Z. J. Stat. 2004;46:657–664. doi: 10.1111/j.1467-842X.2004.00360.x.
  • 47. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–441. doi: 10.1093/biostatistics/kxm045.
  • 48. Dempster AP. Covariance selection. Biometrics. 1972;28:157–175. doi: 10.2307/2528966.
  • 49. Banerjee O, d’Aspremont A, Ghaoui L. Sparse covariance selection via robust maximum likelihood estimation. https://arxiv.org/pdf/cs/0506023.pdf. 2005.
  • 50. Borgatti SP. Centrality and network flow. Soc. Netw. 2005;27:55–71. doi: 10.1016/j.socnet.2004.11.008.
  • 51. Heider F. Attitudes and cognitive organization. J. Psychol. 1946;21:107–112. doi: 10.1080/00223980.1946.9917275.
  • 52. Cartwright D, Harary F. Structural balance: A generalization of Heider’s theory. Psychol. Rev. 1956;63:277. doi: 10.1037/h0046049.
  • 53. Kirkley A, Cantwell GT, Newman MEJ. Balance in signed networks. Phys. Rev. E. 2019;99:012320. doi: 10.1103/PhysRevE.99.012320.
  • 54. Antal T, Krapivsky PL, Redner S. Dynamics of social balance on networks. Phys. Rev. E. 2005;72:036121. doi: 10.1103/PhysRevE.72.036121.
  • 55. Singh P, Sreenivasan S, Szymanski BK, Korniss G. Competing effects of social balance and influence. Phys. Rev. E. 2016;93:042306. doi: 10.1103/PhysRevE.93.042306.
  • 56. Saeedian M, Azimi-Tafreshi N, Jafari GR, Kertesz J. Epidemic spreading on evolving signed networks. Phys. Rev. E. 2017;95:022314. doi: 10.1103/PhysRevE.95.022314.
  • 57. Rabbani F, Shirazi AH, Jafari GR. Mean-field solution of structural balance dynamics in nonzero temperature. Phys. Rev. E. 2019;99:062302. doi: 10.1103/PhysRevE.99.062302.
  • 58. Hedayatifar L, Hassanibesheli F, Shirazi A, Farahani SV, Jafari G. Pseudo paths towards minimum energy states in network dynamics. Phys. A Stat. Mech. Appl. 2017;483:109–116. doi: 10.1016/j.physa.2017.04.132.
  • 59. Sheykhali S, Darooneh AH, Jafari GR. Partial balance in social networks with stubborn links. Phys. A Stat. Mech. Appl. 2020;548:123882. doi: 10.1016/j.physa.2019.123882.
  • 60. Marvel SA, Strogatz SH, Kleinberg JM. Energy landscape of social balance. Phys. Rev. Lett. 2009;103:198701. doi: 10.1103/PhysRevLett.103.198701.
  • 61. Kargaran A, Ebrahimi M, Riazi M, Hosseiny A, Jafari G. Quartic balance theory: Global minimum with imbalanced triangles. Phys. Rev. E. 2020;102:012310. doi: 10.1103/PhysRevE.102.012310.
  • 62. Wasserman L. Topological data analysis. Annu. Rev. Stat. Appl. 2018;5:501–532. doi: 10.1146/annurev-statistics-031017-100045.
  • 63. Zomorodian A. Topological data analysis. Adv. Appl. Comput. Topol. 2012;70:1–39. doi: 10.1090/psapm/070/587.
  • 64. Munch E. A user’s guide to topological data analysis. J. Learn. Anal. 2017;4:47–61.
  • 65. Epstein C, Carlsson G, Edelsbrunner H. Topological data analysis. Inverse Problems. 2011;27:120201. doi: 10.1088/0266-5611/27/12/120201.
  • 66. Topaz CM, Ziegelmeier L, Halverson T. Topological data analysis of biological aggregation models. PLoS One. 2015;10:e0126383. doi: 10.1371/journal.pone.0126383.
  • 67. Albert R, Barabási A-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 2002;74:47. doi: 10.1103/RevModPhys.74.47.
  • 68. Roy I, Vijayaraghavan S, Ramaia SJ, Samal A. Forman-Ricci curvature and persistent homology of unweighted complex networks. Chaos Solit. Fract. 2020;140:110260. doi: 10.1016/j.chaos.2020.110260.
  • 69. Sharma SV, Settleman J. Oncogene addiction: Setting the stage for molecularly targeted cancer therapy. Genes Dev. 2007;21:3214–3231. doi: 10.1101/gad.1609907.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
