Abstract
Dimension is a fundamental property of objects and the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. To take into account locality, finiteness and discreteness, dynamical processes can be used to probe the space geometry and define its dimension. Here we show that each point in space can be assigned a relative dimension with respect to the source of a diffusive process, a concept that provides a scale-dependent definition for local and global dimension also applicable to networks. To showcase its application to physical systems, we demonstrate that the local dimension of structural protein graphs correlates with structural flexibility, and the relative dimension with respect to the active site uncovers regions involved in allosteric communication. In simple models of epidemics on networks, the relative dimension is predictive of the spreading capability of nodes, and identifies scales at which the graph structure is predictive of infectivity. We further apply our dimension measures to neuronal networks, economic trade, social networks, ocean flows, and to the comparison of random graphs.
Subject terms: Mathematics and computing, Applied mathematics, Physics, Biological physics, Complex networks
Defining the dimension in bounded, inhomogeneous or discrete physical systems may be challenging. The authors introduce here a dynamics-based notion of dimension by analysing diffusive processes in space, relevant for non-ideal physical systems and networks.
Introduction
One of the first forays into graph dimensionality originated with Erdős, when he explored the embedding of graphs into a minimum finite-dimensional Euclidean space1. This line of study helped realise the algorithmic importance of geometric interpretations of graphs2, but the resulting dimension was unfortunately no more than a by-product of the graph embedding process, yielding little actionable information3. Later, by characterising the fractal properties of complex networks, a measure of network dimension was defined in terms of the scaling property of a network topological volume4–6. Whilst the fractal approach showed that dimension plays an important role in characterising network topology and governing dynamical processes such as percolation7, it was initially limited to global descriptions of network dimension. Extensions that considered the local scaling properties of the volume at different topological distances from a node were introduced in ref. 8 and have been used to define a node-centric dimension that can identify influential nodes9,10 or vital spreaders in infection models11.
However, methodologies based on fractal approaches assume that the topological volume follows a power-law distribution, a strong assumption that is not necessarily accurate in real-world networks exhibiting heterogeneities5. Similarly, in classic papers such as ref. 12, where the dimension of a node is defined using the decay rate of diffusion, or in ref. 13, where a random walk is used to create node embeddings, the same assumptions of homogeneity are required and an intermediate scale of dynamics must be chosen. As an example, with a diffusive source located at the junction of a 1-d and a 2-d space, measuring the decay rate immediately ignores the heterogeneity of the space and simply yields a dimension somewhere between 1 and 2. In this paper, we posit that the dimension at a node can, and should, be defined relative to another node. Using the solution of diffusion at other nodes relative to the source, we are able to define a relative dimension.
Results
Graph dimension from diffusion dynamics
We start with the Green’s function of the diffusion equation in d dimensions
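$$G_t(x) = (4\pi t)^{-d/2} \exp\left(-\frac{\|x\|^2}{4t}\right), \tag{1}$$
which, together with an initial condition given by a delta function at some position x0, provides the solution of the diffusion equation as p(x, t) = Gt(x − x0). From here on, we refer to the time evolution of p(x, t) as the transient response. As already considered in our previous works14,15, these solutions have a maximum in their transient response at any other location x, at a time $\hat{t}$ and with an amplitude $\hat{p}$ given as
$$\hat{t}(x) = \frac{\|x\|^2}{2d}, \qquad \hat{p}(x) = \left(4\pi \hat{t}(x)\right)^{-d/2} e^{-d/2}, \tag{2}$$
where, without loss of generality, x0 = 0. Taking the logarithm of the peak amplitude and solving for d, the dimension at any point x relative to x0 can be evaluated to yield the definition of the relative dimension
$$d(x \mid x_0) = \frac{-2 \ln \hat{p}(x)}{\ln\left(4\pi \hat{t}(x)\right) + 1}. \tag{3}$$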
Clearly, on the Euclidean space $\mathbb{R}^d$, the relative dimension is always equal to d, independently of x and x0. However, if we instead consider a compact subspace of $\mathbb{R}^d$, the diffusion dynamics will deviate from those prescribed in Equation (1) due to the presence of boundaries relative to x and x0.
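As a numerical sanity check of Equations (1)–(3), the following minimal sketch evaluates the Euclidean Green's function on a time grid, locates the peak of the transient response at a fixed distance and recovers d from the peak time and amplitude. It assumes the standard form of the heat kernel with unit diffusion coefficient; the function names, the distance r = 0.5 and the time grid are illustrative choices and are not taken from the published code.

```python
import numpy as np

def heat_kernel(r, t, d):
    """Green's function of the d-dimensional diffusion equation at distance r."""
    return (4.0 * np.pi * t) ** (-d / 2.0) * np.exp(-r**2 / (4.0 * t))

def relative_dimension(p_hat, t_hat):
    """Dimension recovered from the peak amplitude and peak time."""
    return -2.0 * np.log(p_hat) / (np.log(4.0 * np.pi * t_hat) + 1.0)

def numerical_peak(r, d, times):
    """Locate the maximum of the transient response on a discrete time grid."""
    p = heat_kernel(r, times, d)
    k = int(np.argmax(p))
    return times[k], p[k]

# Illustrative (assumed) settings: distance 0.5 from the source, log-spaced times.
times = np.logspace(-4, 2, 200001)
recovered = {}
for d in (1, 2, 3):
    t_hat, p_hat = numerical_peak(r=0.5, d=d, times=times)
    recovered[d] = relative_dimension(p_hat, t_hat)
```

In unbounded Euclidean space the recovered values agree with d up to the resolution of the time grid, whatever distance is chosen, which is the property that the graph-based definition relies on.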
The key property of Equation (3) that allows us to generalise it to graphs is that the positions x0 and x are not explicit in the right-hand side but only used as labels to initialise the diffusion dynamics and measure the transient response. Consequently, the relative dimension can be seen as intrinsic as it does not rely on any Euclidean embedding, but only on the existence of a diffusion dynamics on the original space. In particular, on graphs we can use the standard diffusion process
$$\partial_t p(t) = -L\,p(t), \tag{4}$$
for a time-dependent node vector p(t), with L the normalised graph Laplacian $L = K^{-1}(K - A)$ (corresponding to Euclidean diffusion in the continuous limit16), where K is the diagonal matrix of node degrees and A is the adjacency matrix. Using a delta function at node i with mass mi, p(0) = (0, 0, …, mi, …, 0), as our initial condition, the j-th coordinate of the solution of Equation (4) (the so-called transient response of j) is given by the heat kernel
$$p_j(t \mid i) = \left[\exp(-tL)\,p(0)\right]_j = m_i \left[\exp(-tL)\right]_{ji}. \tag{5}$$
By numerically evaluating Equation (5), we can measure the time $\hat{t}_{ij}$ and amplitude $\hat{p}_{ij}$ at which a maximum appears in the transient response (time evolution) of node j given a delta function initial condition at node i. In analogy to Equation (3), we can then compute the full N × N matrix of relative dimensions with elements
$$d_{ij} = \frac{-2 \ln \hat{p}_{ij}}{\ln\left(4\pi \hat{t}_{ij}\right) + 1}. \tag{6}$$
To illustrate the notion of relative dimension, we used a line graph (Fig. 1a, b) as a discrete representation of the continuous 1-D interval. We observe that, due to the boundaries, a large fraction of nodes do not have a peak in their transient response; however, for nodes near the source, where the boundary has no influence, the relative dimension is close to the expected d = 1. We emphasise that the dimension is not derived from a fit to the data, as is common in measures of fractal dimensions4–6, but is instead directly observed from the transient response relative to a source node.
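The line-graph experiment can be sketched with NumPy alone, as below. This is a minimal illustration under stated assumptions rather than the published implementation: the heat kernel is evaluated through the eigendecomposition of the symmetric normalised Laplacian (which is similar to L), physical rather than rescaled time is used in the dimension formula, the source mass is set to the mean degree divided by the source degree, and a node is treated as having no peak when its response is maximal at either end of the time grid. All function names, the graph size and the time grid are hypothetical.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of an unweighted path ('line') graph with n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def transient_responses(A, source, times):
    """Solve dp/dt = -L p with L = K^{-1}(K - A) for a delta at `source`.

    Uses exp(-tL) = K^{-1/2} exp(-t L_sym) K^{1/2}, where
    L_sym = I - K^{-1/2} A K^{-1/2} is symmetric.
    Returns an array of shape (n_nodes, n_times).
    """
    k = A.sum(axis=1)
    sqrt_k = np.sqrt(k)
    L_sym = np.eye(len(k)) - A / np.outer(sqrt_k, sqrt_k)
    lam, U = np.linalg.eigh(L_sym)
    mass = k.mean() / k[source]              # assumed choice of initial mass
    coeff = U[source] * sqrt_k[source] * mass  # U^T K^{1/2} p(0)
    decay = np.exp(-np.outer(lam, times))
    return ((U * coeff) @ decay) / sqrt_k[:, None]

def relative_dimensions(A, source, times):
    """Relative dimension of every node with respect to `source` (NaN if no peak)."""
    p = transient_responses(A, source, times)
    peak_idx = p.argmax(axis=1)
    # A maximum at either end of the time grid is not counted as a peak.
    has_peak = (peak_idx > 0) & (peak_idx < len(times) - 1)
    t_hat = times[peak_idx]
    p_hat = p[np.arange(len(p)), peak_idx]
    d = np.full(len(p), np.nan)
    d[has_peak] = -2.0 * np.log(p_hat[has_peak]) / (
        np.log(4.0 * np.pi * t_hat[has_peak]) + 1.0
    )
    return d

A = path_adjacency(201)
times = np.logspace(-2, 4, 3000)
d_rel = relative_dimensions(A, source=100, times=times)
n_without_peak = int(np.isnan(d_rel).sum())
```

With the source at the centre of the line, nodes a few steps away return values somewhat below 1 in this unscaled sketch, whereas the source itself and the nodes at the two ends return no value.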
It is then natural to define the local dimension of a node i by averaging the relative dimension of the nodes displaying a peak in their transient responses relative to i before a given time τ as
$$D_i(\tau) = \frac{\sum_{j} d_{ij}\, \mathbb{1}_{\hat{t}_{ij} \le \tau}}{\sum_{j} \mathbb{1}_{\hat{t}_{ij} \le \tau}}, \tag{7}$$
where $\hat{t}_{ij}$ is the peak time of the transient response of node j for a source at node i and $\mathbb{1}$ is the indicator function. Whilst the local dimension can be likened to a measure of centrality, it also directly captures the dimension of the local embedding space. In Fig. 1c we observe the increasing effect of the boundaries on the local dimension as we increase the scale. Near the centre of the line, and when considering nearby nodes (at short scales), one can expect to estimate a dimension near 1, or equivalently 2 for the grid shown in Fig. 1d. We observe in Fig. 1c a central region with local dimension close to 1 that becomes increasingly smaller as the scale τ increases; at short scales, the central region is insensitive to the boundaries since the diffusion has not yet reached them. This ‘boundary insensitive central region’ collapses at τ = 1 (corresponding to the spectral gap of the graph), when all nodes have aggregated information about the boundaries of the line graph.
Finally, we can define a graph measure of dimension by averaging the local dimensions across all nodes to obtain the global dimension
$$D(\tau) = \frac{1}{N} \sum_{i} D_i(\tau), \tag{8}$$
where $D_i(\tau)$ is the local dimension of node i at scale τ, so that the global dimension is still dependent on τ. In Fig. 1e we display the global dimension (as a ratio to the expected Euclidean dimension) for the line and grid graphs and their periodic equivalents (the circle and sphere graphs respectively).
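The two averaging steps can be sketched as follows, given a matrix of relative dimensions and the matching matrix of peak times, with NaN marking pairs without a peak. This is an illustrative sketch: the function names and the toy 3-node input are hypothetical, and nodes whose local dimension is undefined at a given scale are skipped in the global average, which is an assumption of this example.

```python
import numpy as np

def local_dimension(rel_dim, peak_times, tau):
    """Average relative dimension over the targets that peak before time tau."""
    mask = np.isfinite(rel_dim) & (peak_times <= tau)
    counts = mask.sum(axis=1)
    sums = np.where(mask, rel_dim, 0.0).sum(axis=1)
    out = np.full(rel_dim.shape[0], np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)  # NaN where no target peaks
    return out

def global_dimension(rel_dim, peak_times, tau):
    """Average of the local dimensions over nodes where they are defined."""
    return float(np.nanmean(local_dimension(rel_dim, peak_times, tau)))

# Toy input (hypothetical values): row i holds the quantities for source i.
nan = np.nan
rel_dim = np.array([[nan, 1.0, 2.0],
                    [1.0, nan, nan],
                    [2.0, nan, nan]])
peak_times = np.array([[nan, 0.1, 0.5],
                       [0.1, nan, nan],
                       [0.5, nan, nan]])

local_short = local_dimension(rel_dim, peak_times, tau=0.2)
local_long = local_dimension(rel_dim, peak_times, tau=1.0)
global_long = global_dimension(rel_dim, peak_times, tau=1.0)
```

At the short scale only the pair that peaks at time 0.1 contributes, so the third node has no local dimension; at the long scale every listed peak is included.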
Whilst the periodic equivalents do not contain boundaries, they are still constrained to a compact space that will introduce topological effects, e.g., on a periodic graph the diffusion will interact with itself at the opposite side to the initial condition. We first notice that the non-periodic graphs display a maximum in global dimension, likely when the effect of the boundaries is lowest. In contrast, the periodic graphs do not exhibit a peak of the same magnitude suggesting that the topological effect of a compact space has less impact on the global dimension than the presence of a boundary.
In the context of graphs as discrete Euclidean spaces, the maximum of the global dimension curve (Fig. 1e) can be seen as an approximation of the Euclidean dimension, whereas the global dimension at largest scale characterises the effect of the boundary or topology of the graph. It should be noted that for a non grid-like graph, what is a boundary or a topological effect is not clear. By increasing the graph size, and thus reducing the effects of the boundaries, the global dimension converges towards the expected Euclidean dimension (Fig. 1f). For the grid, the surface of the boundary increases with respect to the volume of the space and results in a slower convergence, whereas the global dimension of the periodic grid is only affected by the topology, and thus converges faster.
Delaunay meshes and inhomogeneities
To develop more intuition for our measure of relative dimension, we consider a simple constructive example using Delaunay meshes in Fig. 2. Given a source node located at the left boundary of a homogeneous Delaunay mesh, the relative dimension displays an inhomogeneous distribution radially from the source until nodes no longer have a transient response peak (Fig. 2(a)). Adding nodes near the centre of the Delaunay grid graph creates local inhomogeneities that modify the underlying space, with a clear analogy to the theory of gravitation and gravitational lensing17. In particular, the added mass acts as a gravitational lens for the diffusion process, whereby nodes directly behind the point mass that were previously ‘unreachable’ can be ‘reached by the diffusion’ if the mass is sufficiently large. Small masses are reminiscent of weak lensing (Fig. 2(b)), whereas larger masses are closer to strong lensing (Fig. 2(c))18. The behaviour of the relative dimension in the presence of inhomogeneities suggests that diffusion effectively occurs on a curved geometry induced by the presence of the mass. Moving the mass towards one boundary (Fig. 2(d)) shows some coupling between the lensing effect and the presence of the boundary. All three possible effects (boundaries, topology and inhomogeneities) are thus important in the notion of dimension, but may not be distinguishable in more complex networks. Nevertheless, our notion of relative dimension is able to capture them all in one graph-theoretical measure.
Dimensions in protein structure: rigidity and allostery
We then apply the relative dimension to a real-world example, allostery in proteins, a phenomenon whereby a subset of a protein (active site) can be modulated (activated or inactivated) through binding of a ligand at another subset of the protein (allosteric site). We examine three well-studied allosteric proteins: HRas GTPase, Lac repressor and PDK1 in Fig. 3 (for more details on these proteins, see Methods). In HRas, we find a low relative dimension at the active site given the allosteric site as the source (Fig. 3a(i)), but in reverse the allosteric site does not see a transient peak from the diffusion started in the active site (Fig. 3a(ii)). Even if an exact statement of the allosteric mechanism is not our purpose here, it is interesting to note that a low relative dimension suggests a more ‘direct’ or ‘funneled’ communication from the allosteric site to the active site. Moreover, the asymmetry of this communication may relate to different functions for each half of the protein.
The lac repressor protein is constructed from two separate monomers and it is generally understood that binding of both NPF molecules (one on each monomer) is required to activate the lac repressor via a cooperative allosteric effect acting on the hinge region19. Given that the allosteric mechanism is cooperative, we do not expect a direct communication to the active site from the allosteric site, and instead we examined the change in relative dimension upon using a single allosteric site as a source (Fig. 3b(i)) vs. both allosteric sites as sources simultaneously (Fig. 3b(ii)). We find that when binding NPF to just one monomer, the relative dimension across the entire protein is lower than when using both allosteric sites as sources of diffusion.
Finally, binding at the PDK1 interacting fragment (PIF) on PDK1 triggers a signal to start the phosphorylation of the activation loop of the substrates at the ATP pocket, or active site20, and thus we would expect direct communication between the active and allosteric sites. Using the allosteric site as the source of our diffusion (Fig. 3c), we find that a large region of PDK1 does not return a relative dimension (grey region in Fig. 3c). We remind the reader that to calculate relative dimension we must observe a peak in the transient response. Of those residues for which relative dimension was computed, the activation loop displays the lowest relative dimension to the allosteric site. We hypothesise that a lower dimension pathway from the allosteric to active site will improve the efficiency of communication transfer since it becomes more direct.
Whilst the relative dimension provides insights into allostery, we can leverage the local and global dimension to examine protein dynamics. In Fig. 4(a), we show a strong correlation between the local dimension and of residues for Fig. 4a(i) an unglycosylated antibody CH2 domain and Fig. 4a(ii) an Oestrogen Related Receptor g protein. The results here suggest that a residue with a larger local dimension is associated with a lower flexibility and thus lower degrees of freedom.
To examine this further, we plotted the Pearson correlation between local dimension and for 12 randomly chosen proteins in Fig. 4(b). We see that at middling to long time scales of diffusion the correlation plateau with an average at about σ = 0.55 suggesting that the relationship between local dimension and protein flexibility is robust. Calculating the global dimension for the same set of proteins in Fig. 4(c), we find a correlation (Pearson σ = 0.73) between global dimension and the of a protein. The global values of dimension sit between 1.36 and 1.5 for the 12 proteins. These results agree with studies that show spectral dimension is generally < 2 and decreases with an increase in flexibility12,21.
We now take a deeper look at Aquifex Adenylate Kinase (ADK), a dynamical protein with three subdomains: the lid, AMP and core domains. We find that the closed conformation displays a higher local dimension due to the presence of stabilising interactions, not present in the open conformation, that create a more compact structure (Fig. 4d). The AMP and lid domains are known to open and close around the substrate. We find that both have a lower local dimension relative to the core domain (Fig. 4e) and that the AMP domain has a lower average local dimension than the lid domain in both conformations. We validated the latter using experimental fluorescence correlation spectroscopy, which shows that the AMP domain opens and closes at a faster rate (16.2 μs) than the lid domain (46.6 μs)22,23.
Local dimension as a means to differentiate node roles
To further explore our measure of dimension in the context of identifying roles of nodes within the network, we present two examples of real-world complex networks in Fig. 5 where nodes have pre-assigned roles. The first example explores the world trade network (consisting of 80 nodes) of metal manufacturing in 199424, where nodes correspond to countries and directed incoming edges represent the amount of weighted imports from another country. A well-established concept in economic theory partitions countries based on their positioning (1. core, 2. semi-peripheral, 3. peripheral) within the world economy25. For the largest scale, we find significant differences between the distributions of the local dimension for each of the world partitions (Fig. 5b). There is almost no overlap in local dimension between the two extreme partitions, core and periphery, but the distribution of local dimension for semi-peripheral nodes is wider, suggesting that this class of countries is more diverse.
Our second example is the undirected connectome (N = 377) of the nematode Caenorhabditis elegans (Fig. 5b(i)) with the inclusion of muscles, important for examining control26 (https://www.wormatlas.org/neuronalwiring.html), and where scales have previously been shown to be important27. We compare the dimension of the three different neuronal types (inter-neurons, sensory neurons, motor neurons) and muscles at long scales in Fig. 5b(ii), and find significant differences in their local dimensions. Inter-neurons are central nodes of neural circuits that enable communication between sensory and motor neurons, thus we would expect them to sit in a higher-dimensional space, whereas muscles are peripheral and display the lowest local dimension, likely aiding the direct propagation of signals. In addition, we find that the highest-dimensional nodes are the important control motor neurons AVA/AVB (both left and right), whose ablation results in uncontrolled motion26 (see Supplementary Table 1 for the top 40 local dimension neurons).
Local dimension as scale-dependent measure of centrality
Measures of centrality are some of the most fundamental tools in network theory. Here, we show that the local dimension can also be utilised as a scale-dependent centrality measure, such as those derived in refs. 15,28. To illustrate the use of the local dimension as a centrality measure for complex networks, we analysed two datasets where the importance of nodes changes substantially with scale.
First we look at the global network of ocean surface currents derived from the Global Drifters programme (http://www.aoml.noaa.gov/phod/gdp/index.php) and constructed in ref. 29 (https://github.com/maurofaccin/ocean_surface_dataset). Each node is associated with a small region of the ocean, and an edge between two nodes counts the number of drifters passing from one region to another in a given time interval T. For short times, such as T = 16 days, the graph connectivity remains local with respect to the spatial embedding of the nodes on the Earth's surface, but with larger times (T = 208 days) the connectivity becomes long range and complex (see also the degree distribution in Supplementary Fig. 1). We can examine both time intervals at short and long scales of our local dimension (Fig. 6a); the small- and large-scale local dimensions provide different perspectives on regions of high dimension, related to regions where the ocean flow has more complex dynamics. At small time intervals and short scales (Fig. 6a(i) top), we identify locally high dimension regions such as the Gulf stream or the Pacific garbage patch, where drifters remain trapped and circulate quickly. If we look at long time intervals (Fig. 6a(ii)), we notice bands of high dimension which represent the boundaries between main gyres, such as that along the equator. At short scales, the drifters have lower dimensional dynamics while they follow these currents. However, at longer scales the drifters can drift north or south of the equator and be further transported to widely different regions throughout the world, and thus the dimension of the boundaries between major ocean currents is larger. We also note a visual similarity between the small time interval at long scale (Fig. 6a(i) bottom) and the long time interval at short scale (Fig. 6a(ii) top), whereby the drifter movements are generally split between north and south.
Our results provide further evidence that a notion of scale in the analysis of ocean flow is crucial to exploit and interpret the dynamics29.
Finally, we examine a complex social network of scientific collaborations between New Zealand institutions (Fig. 6b). Each node represents an institution which falls into one of the following categories: higher education, Government, Private not for profit, or Business Enterprise. Edges are weighted by the number of collaborations between two institutions in the time period 2010–2015, measured by co-authored publications on Scopus30. We compute the local dimension as a function of scale on this network and identify three main scales (short, medium and long; Fig. 6b(ii)). On average and across scales, the higher education institutions displayed the highest local dimension and business enterprises the lowest. However, if we instead look only at the 5 nodes with the highest local dimension, we find that at short scales, businesses and government institutions comprised the top 5 local dimension nodes, highlighting their high dimension relative to a small neighbourhood. For a wide range of medium time scales, we find that the universities display the largest local dimension, reflecting their hub-like role in the network (Fig. 6b(i)). At long time scales (in the limit close to stationarity), we find that a mixture of nodes from all institution types appears in the top 5 nodes. A previous study used betweenness and eigenvector centrality to show that the most central institutions were not solely universities, but also included other institution types30. Here, we show that the precise role of each node depends on the choice of scale, as already discussed in ref. 15.
Dimension in epidemic spreading
What about dynamical processes on networks? In Fig. 7a, we use an SIR model on Watts-Strogatz small-world networks31 and, by scanning the infection probability β, we show that the local dimension of a node strongly predicts its infectiousness. Below the critical regime of large infectiousness, we find that the infection probability is positively correlated with the scale, i.e. the size of the local neighbourhood that should be considered grows with the infection probability. However, near criticality βcrit (a threshold infection probability), we observe a behaviour similar to a phase transition, whereby the time scale at which the local dimension correlates best with node infectiousness diverges towards values near unity, corresponding to the largest scale of the local dimension.
We further computed the local dimension and SIR dynamics for small-world graphs whilst varying the rewiring probability p, to interpolate from near-regular graphs to Erdős-Rényi random graphs. In Fig. 7b we observe that the relationship between the optimal scale to determine local dimension and the infectiousness of a node disappears with the randomness of the network. At low β, node infectiousness is dominated by the distance from high degree nodes in a small-world graph and, as β increases, the spreading dynamics accelerates and nodes further away can be infected. A local dimension at longer time scales τ is therefore necessary to obtain a better prediction of node infectiousness. However, in Erdős-Rényi random networks all nodes are on average at equal distance from high degree nodes and no meaningful scale exists.
We find similar linear relationships between β and scale in a Delaunay grid graph (Fig. 7c) and the European power grid (Fig. 7d). The decrease in the scale at which the local dimension is a good predictor beyond βcrit for both graphs echoes the results for high rewiring probability in small-world graphs, suggesting that the global graph structure becomes less important if the infection probability is sufficiently high.
Graph classification from distributions of local dimensions
Random graphs, such as the Watts-Strogatz graph used above, sit at the intersection of graph theory and probability theory, and are often used to investigate the properties of ‘typical’ graphs. Various models of random graphs exist to cover the diversity of complex networks encountered in the real world, but the most commonly discussed are Erdős-Rényi, Watts-Strogatz, and Barabási-Albert graphs. To understand whether the distribution of local dimension differs across these three types of random graphs, we generated a large dataset with various choices of parameters to generate random graphs of each type with similar sizes (see “Methods”). We then computed the local dimension of each node of each graph, extracted three features from the distribution of local dimension (mean, standard deviation and skewness) and used a Random Forest model to classify between the random graph types. The classification model achieved 0.95 ± 0.014 accuracy with a stratified 10-fold split, suggesting that different random graph types display inherently different dimensional properties. A SHAP feature importance analysis revealed that the skewness and standard deviation of the distributions were most informative in differentiating the random graph types (Fig. 8(a)). The skewness and standard deviation of Barabási-Albert graphs were larger, reflecting their extremely broad and non-homogeneous degree distribution. As expected, an overlap in the distributions of Erdős-Rényi and Watts-Strogatz graphs is observed (Fig. 8(c)), owing to the fact that Watts-Strogatz graphs were designed specifically to interpolate between lattices and fully disordered states (similar to, but not exactly, Erdős-Rényi32) via a rewiring of edges. Despite their overlap, Erdős-Rényi graphs display a smaller standard deviation, likely resulting from a more homogeneous degree distribution.
Discussion
In this paper we have introduced a new framework to define notions of dimensions not only on graphs, but on any space where a dynamical process (from which the Euclidean dimension can be inferred) can be defined. Our measure of dimension is defined using consensus dynamics on graphs, which is most similar to Euclidean diffusion, and naturally links with the dimension in the d-dimension diffusion equation. In this sense, our measure is intrinsically defined through the diffusive process taking place on a discrete system and recovers the intuitive definition of dimension as the system loses its discreteness. In doing so, we are also able to give a geometric meaning (through the notion of dimension) to the effect of boundaries and density inhomogeneities. We have shown the relevance of this approach to examine real-world systems such as protein dynamics, neuronal or social networks, ocean currents or epidemic spreading by examining the underlying graph structure.
Through various detailed studies with the relative dimension, probing local dimensions at various scales, or characterising entire graphs with the global dimension, we have provided evidence for the wide applicability of our dimension measures to both non-complex and complex networks (see SI for characterisation of degree distributions of graphs used in this paper). There is a variety of practical applications where probing network geometry is of great utility33 and which are within the scope of these dimension measures. For example, spatially modulated neurons (such as place cells or grid cells), whose network architecture plays a fundamental role in the representation of space and spatial memory, could be studied with our measures to understand the local and global lattice arrangement of firing fields34. Alternatively, our measures could be used to provide insights into the manifestation of material properties. For example, the angle at which two stacked layers of graphene are oriented relative to each other dictates the presence of superconductivity and fragile topology35. Further analysis of graph classification problems using the distribution of dimension measures (relative or local) is also promising in view of our preliminary results using random generative networks.
Methods
Graph diffusion
A network (or a graph) G is a tuple $(V, E)$, consisting of the set of nodes (vertices) $V$ and the set of edges $E$ connecting them. The network can be described by its N × N adjacency matrix A, which indicates the existence and the weight of a connection (edge) between each pair of nodes. On a graph, there are several non-equivalent definitions of diffusion, which are defined by different forms of the graph Laplacian. However, only one form corresponds to the Euclidean diffusion, described by the normalised Laplacian $L = K^{-1}(K - A)$, where K is the diagonal matrix of weighted degrees and A the weighted adjacency matrix16. Using the definition of the Laplacian, we can state the diffusion equation for an N × 1 time-dependent node vector p(t) as in Equation (4), which is also known as consensus dynamics36. For an initial condition with a delta function of mass m at node i, the j-th coordinate of the solution of Equation (4) is given by Equation (5). For comparability across different graphs, we normalise the times of diffusion by the second smallest eigenvalue of the graph Laplacian, λ2 (the spectral gap), thus τ = 1 is the time scale for the diffusion to reach stationarity.
From our choice of Laplacian, the relative dimension matrix d (that we introduce in the next section) is symmetric if the initial masses m are chosen inversely proportional to the weighted node degrees.
In addition, to ensure that the stationary state of the diffusion sums to unity, we take $m_i = \bar{k}/(n k_i)$, where $\bar{k}$ is the mean weighted degree, $k_i$ is the weighted degree of node i and n is the number of nodes in the source. This is used in the protein example, where the initial mass is distributed over all the atoms of the allosteric or active site.
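Both properties of this choice of initial mass can be checked numerically. The sketch below is a hedged illustration for a single source node (n = 1) on a small random weighted graph: it assumes masses equal to the mean weighted degree divided by the node degree, and verifies that the matrix of transient responses is then symmetric and that each stationary state sums to one. The helper names and the random graph are hypothetical and are not part of the published package.

```python
import numpy as np

def heat_kernel_matrix(A, t):
    """exp(-t L) for L = K^{-1}(K - A), via the symmetric normalised Laplacian."""
    k = A.sum(axis=1)
    s = np.sqrt(k)
    lam, U = np.linalg.eigh(np.eye(len(k)) - A / np.outer(s, s))
    S = (U * np.exp(-t * lam)) @ U.T          # exp(-t L_sym)
    return S * np.outer(1.0 / s, s)           # K^{-1/2} S K^{1/2}

def transient_matrix(A, t):
    """Entry [j, i] is the response of node j at time t to a source at node i."""
    k = A.sum(axis=1)
    masses = k.mean() / k                     # assumed: mean degree / node degree
    return heat_kernel_matrix(A, t) * masses[None, :]

# Small dense weighted graph with random symmetric weights (illustrative only).
rng = np.random.default_rng(0)
W = rng.random((6, 6))
A = np.triu(W, 1)
A = A + A.T

P_mid = transient_matrix(A, t=0.7)
P_inf = transient_matrix(A, t=200.0)

symmetric_with_masses = np.allclose(P_mid, P_mid.T)
H_mid = heat_kernel_matrix(A, t=0.7)          # unit masses at every source
symmetric_with_unit_mass = np.allclose(H_mid, H_mid.T)
stationary_sums = P_inf.sum(axis=0)
```

Because the responses are symmetric at every time, the peak times and amplitudes, and hence the relative dimensions, are symmetric as well; with unit masses this symmetry is lost whenever the degrees differ.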
Comparison with fractal dimension
Looking more closely at our definition of relative dimension of Equation (6), it is proportional to the ratio of natural logarithms of peak amplitude and time, which displays similarities to the fractal based approaches where an approximate dimension can be derived from the ratio of natural logarithms of mass at a radius r,
$$d_f \approx \frac{\ln M(r)}{\ln r}, \tag{9}$$
where the mass M is simply the number of nodes within some link distance r7.
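For comparison, the mass-radius estimate can be sketched on a square lattice with a breadth-first search, as below. This is only an illustration of the fractal-type approach, not a method used in the paper: the lattice size, the source position and the radii are arbitrary choices, and the two-radius slope is included as one common variant of the estimate.

```python
import math
from collections import deque

def grid_neighbours(node, side):
    """4-neighbourhood of a node on a side x side square lattice."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < side and 0 <= ny < side:
            yield (nx, ny)

def mass_within_radius(source, side, r_max):
    """Return M(r), the number of nodes within link distance r, for r = 0..r_max."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == r_max:
            continue                      # do not expand beyond the largest radius
        for nb in grid_neighbours(node, side):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    mass = [0] * (r_max + 1)
    for d in dist.values():
        mass[d] += 1                      # nodes at exactly distance d
    for r in range(1, r_max + 1):
        mass[r] += mass[r - 1]            # cumulative count within distance r
    return mass

mass = mass_within_radius(source=(30, 30), side=61, r_max=20)
ratio_estimate = math.log(mass[20]) / math.log(20)
slope_estimate = math.log(mass[20] / mass[10]) / math.log(20 / 10)
```

On this lattice the ball of radius r contains 2r² + 2r + 1 nodes, so the ratio of logarithms approaches 2 only slowly with r, while the slope between two radii is closer to 2; either way, the estimate depends on choosing a range of radii over which the scaling holds.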
Computational aspects
Python code to compute the relative, local and global dimensions is available at https://github.com/barahona-research-group/DynGDim, based on the package NetworkX and numpy/scipy standard libraries.
Delaunay mesh with mass
We apply Delaunay triangulation to a 40 by 40 grid to return a weighted planar graph for which no point is inside the circumcircle of any triangle. The grid has a side length of one in the distance units of the code. We define the weight of each edge as the inverse Euclidean length between its end points and thus obtain a discretisation of the plane. To simulate the gravitational lensing effect, we added nodes sampled from a Gaussian distribution with variance 0.05 in the unit square, varying the position of the Gaussian and the number of added nodes.
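A construction of this kind can be sketched with `scipy.spatial.Delaunay`, as below. This is a minimal sketch under stated assumptions, not the code used for Fig. 2: the centre of the Gaussian, the number of added nodes, the random seed and the decision to discard samples falling outside the unit square are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_points(side=40, n_extra=50, centre=(0.5, 0.5), variance=0.05, seed=0):
    """Regular side x side grid on the unit square plus Gaussian-sampled nodes."""
    axis = np.linspace(0.0, 1.0, side)
    grid = np.array([(x, y) for x in axis for y in axis])
    rng = np.random.default_rng(seed)
    extra = rng.normal(loc=centre, scale=np.sqrt(variance), size=(n_extra, 2))
    # Assumed handling of samples outside the unit square: discard them.
    inside = np.all((extra >= 0.0) & (extra <= 1.0), axis=1)
    return np.vstack([grid, extra[inside]])

def delaunay_graph(points):
    """Weighted adjacency matrix with inverse Euclidean edge lengths as weights."""
    tri = Delaunay(points)
    n = len(points)
    A = np.zeros((n, n))
    for simplex in tri.simplices:
        for a in range(3):
            i, j = simplex[a], simplex[(a + 1) % 3]
            A[i, j] = A[j, i] = 1.0 / np.linalg.norm(points[i] - points[j])
    return A

points = mesh_points()
A = delaunay_graph(points)
```

The resulting adjacency matrix can be passed directly to a diffusion solver; nodes clustered around the Gaussian centre have shorter edges and therefore larger weights, which is what creates the local inhomogeneity.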
Protein graph construction
The graph representations of the proteins used in this work are computed using the method of ref. 37, an extension of ref. 38. In short, from a PDB file, each atom is represented by a node, and each bond between atoms by an edge weighted by the energy of the bond. The choice of bonds is key to creating a meaningful graph representation and is explained in refs. 37,38; see ref. 39 to access the code.
Root-mean-square fluctuation calculations
Enzymatic proteins are inherently flexible and known to exhibit motions across a wide range of temporal and spatial scales. Using simulations, each atom can be assigned a root-mean-square fluctuation (RMSF). We calculate the RMSF using the CABS-flex 2.0 webserver which simulates protein dynamics using a coarse-grained protein model40.
Protein dataset
We present here more details on the main set of proteins we used in this work.
HRas
HRas plays an important role in signal transduction during cell-cycle regulation41. Previous studies have shown that calcium acetate acts as an allosteric activator and its mechanism of allostery is mediated by a network of hydrogen bonds, involving structural water molecules, that link the allosteric site to the catalytic residue Q6142. We treat the allosteric and active sites, that are located at opposite ends of the protein (PDB ID: 3K8Y), as the source or target nodes in our relative dimension (since multiple atoms compose the allosteric and active sites, we use all nodes as the source of the diffusive process with a uniform distribution on them).
Lactose repressor (lac)
As a second example, we examine the well-studied lactose repressor (lac) (PDB ID: 1EFA) in Fig. 3b. It is present in E. coli and binds to the lac operon, a section of DNA, to inhibit the expression of proteins for the metabolism of lactose when no lactose is present43,44. In its complete form, it consists of four monomers, with two binding sites to a single DNA strand, inhibiting the genes located between them. Two monomers co-operate to form one of the two binding sites (orange region in Fig. 3b). On each monomer there is an allosteric site for the binding of NPF molecules that activate the lac repressor.
PDK1
PDK1 is a well-known protein kinase (PDB ID: 3ORX) that is implicated in the progression of melanomas45. The allosteric site of PDK1 is a sequence of amino acids, called the PDK1 interacting fragment (PIF), that binds to a phosphate on the catalytic domain. This binding triggers a signal to start the phosphorylation of the activation loop of the substrates at the ATP pocket, or active site20. The crystallographic structure (PDB ID: 3ORX) used for our analysis has the molecule BI4 bound at the active site45 via three hydrogen bonds to a region of high relative dimension, and interacting through hydrophobic forces with a region of low relative dimension.
Fluorescence correlation microscopy experiments
Protein plasmids of Aquifex Adenylate Kinase (ID:18092 Plasmid:peT3a-AqAdk/MVGDH) were purchased from AddGene as deposited by ’Dorothee Kern Lab Plasmids’. The plasmids were already encoded with two cysteine mutations for maleimide conjugation. ADK was expressed in a 1 litre culture of BL21 (DE3) cells via induction with 1 mM IPTG. BugBuster was used for cell lysis, and TCEP and protease inhibitor were added to the lysate. ADK was purified via HIS-tag with a gravi-trap (GE-healthcare), and a PD-10 column was used to remove imidazole and exchange into protein buffer (20 mM TRIS, 50 mM NaCl). TCEP and protease inhibitor were added throughout the purification process.
Alexa 488-labelled ADK was prepared overnight using 20 μM protein with a molar ratio of 1:10 of protein:Alexa 488. Excess dye was removed using HIS-tag purification and a PD-10 column. A Typhoon was used to examine the gel of the purified, labelled ADK product and showed no excess fluorophore. The label sites for the FRET experiment were Tyr 52 (AMPbd domain) changed to Cys and Val 145 changed to Cys (lid domain)46. Samples were diluted to 200 pM in pH 7.5 FRET buffer (20 mM TRIS, 50 mM NaCl) with 0.3 mg/ml BSA to prevent surface adsorption. Measurements were taken at thermal equilibrium, such that all processes under analysis are statistical fluctuations around the equilibrium.
Freely diffusing single molecules were detected using a home-built dual-channel confocal fluorescence microscope. A tunable wavelength argon ion laser (model 35LAP321-230, Melles Griot, Carlsbad, CA) was set to 514.5 nm to excite Alexa 488. The beam was focused into the sample solution to a diffraction-limited spot with a high numerical aperture oil-immersion objective (Nikon Plan Apo TIRF 60x, NA 1.45). Because the refractive indices of oil and glass are closer than those of water and glass, oil immersion reduces light reflection and is therefore preferable. Type FF immersion oil (Cargille, USA) was used because of its negligible fluorescence.
The obtained fluctuations of fluorescence intensity are autocorrelated. We fit the autocorrelation curves with a global model that includes components for triplet excitation, conformational dynamics and diffusion, with the assumption that their time scales differed by a factor of 1.6 so that the components could be distinguished,
where τc, τm and τD are the dynamical time scales of the protein conformational dynamics, mean triplet relaxation and the protein diffusion respectively. F1 is the fraction of molecules entering the triplet state and F2 is the fraction of molecules conformationally fluctuating.
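The autocorrelation step that precedes the fit can be illustrated as follows. This is a sketch under stated assumptions, not the acquisition or analysis code of the experiment: it uses the usual normalised form G(τ) = ⟨δI(t) δI(t+τ)⟩ / ⟨I⟩² on a uniformly sampled intensity trace, and the function name `intensity_autocorrelation` is hypothetical.

```python
import numpy as np


def intensity_autocorrelation(intensity, max_lag):
    """Normalised autocorrelation of intensity fluctuations for lags 1..max_lag.

    Assumes a uniformly sampled trace with non-zero mean intensity.
    """
    trace = np.asarray(intensity, dtype=float)
    mean = trace.mean()
    delta = trace - mean                      # fluctuations around the mean
    lags = np.arange(1, max_lag + 1)
    # Average the product of fluctuations separated by k samples.
    g = np.array([np.mean(delta[:-k] * delta[k:]) for k in lags]) / mean**2
    return lags, g
```

The resulting curve is what a multi-component model with triplet, conformational and diffusion time scales would then be fitted to.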
Root-mean-square fluctuation analysis
We use the CABS-flex 2.0 server, which generates fast simulations of near-native dynamics. The simulation uses Monte Carlo dynamics with an asymmetric Metropolis scheme. CABS is a well-established coarse-grained (i.e. atoms are combined into larger units) protein modelling tool. It uses a force field derived from statistical regularities seen in known protein structures, which includes side-chain to side-chain mean-field potentials, coarse-grained models of main-chain hydrogen bonds, and local peptide-chain geometric preferences. The solvent effect is accounted for implicitly through the protein structure statistics used in the derivation of the CABS force field. The dynamics of CABS-based coarse-grained proteins is simulated by a random series of local conformational transitions (controlled by a Monte Carlo method). The results show strong similarities with fully atomistic MD simulations (a description is available at http://biocomp.chem.uw.edu.pl/sites/default/files/publications/ct300854w.pdf). The resulting simulation trajectory is analysed and clustered into a representative ensemble of protein models that reflects the flexibility of the input structure. In short, the simulation (like other MD simulations) examines the dynamic evolution of interacting units (atoms or coarse-grained units). The trajectories are determined by solving Newton's equations of motion, where the forces between units are determined by the proposed force field. Therefore, one can inherently study the thermodynamic properties of a system via an MD simulation.
SIR model
For the example with SIR dynamics, we simulated the standard SIR model on networks using the fast approximation of47, with open-source code available at https://github.com/springer-math/Mathematics-of-Epidemics-on-Networks, and estimated the infectiousness of each node as the average number of removed nodes when the spread started from this node, over 500 realisations of the dynamics. To estimate the critical value for the infectiousness β, we computed the average infectability across all nodes for each β and estimated βcrit as the value for which half of the nodes are infected.
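The procedure above can be sketched with a plain discrete-time SIR simulation. Note that this is a deliberately simple stand-in, not the fast event-driven algorithm of47 used in the paper; the function names, the recovery probability `gamma`, and the reading of βcrit as the smallest tested β at which the mean outbreak size reaches half of the nodes are all assumptions made for illustration.

```python
import random


def sir_final_size(adjacency, source, beta, gamma, rng):
    """Run one discrete-time SIR outbreak and return the number of removed nodes."""
    infected, removed = {source}, set()
    while infected:
        newly_infected = set()
        for node in infected:
            for neighbour in adjacency[node]:
                susceptible = neighbour not in infected and neighbour not in removed
                if susceptible and rng.random() < beta:
                    newly_infected.add(neighbour)
        # Each infected node recovers with probability gamma per step.
        recovered = {node for node in infected if rng.random() < gamma}
        removed |= recovered
        infected = (infected - recovered) | newly_infected
    return len(removed)


def infectiousness(adjacency, beta, gamma=1.0, n_runs=500, seed=0):
    """Average number of removed nodes when the outbreak starts from each node."""
    rng = random.Random(seed)
    return {
        node: sum(sir_final_size(adjacency, node, beta, gamma, rng)
                  for _ in range(n_runs)) / n_runs
        for node in adjacency
    }


def critical_beta(adjacency, betas, **kwargs):
    """Smallest beta in `betas` whose mean outbreak size reaches half of the nodes."""
    n_nodes = len(adjacency)
    for beta in sorted(betas):
        scores = infectiousness(adjacency, beta, **kwargs)
        if sum(scores.values()) / n_nodes >= 0.5 * n_nodes:
            return beta
    return None
```

Here `adjacency` is a dictionary mapping each node to an iterable of its neighbours, for example `{0: [1], 1: [0, 2], 2: [1]}` for a path of three nodes.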
Graph classification dataset
We generated 600 graphs for each of the three classes: Erdős–Rényi (ER), Barabási–Albert (BA) and small-world (SW). We sampled the number of nodes in 10 bins from 100 to 1000, and repeated this 3 times with different random seeds. In each case, we created 20 networks of each type with the following ranges of parameters: ER with edge probabilities from 0.03 to 0.1, BA with number of edges per node from 1 to 20, and SW with probability from 0.1 to 0.7 and number of neighbours from 5 to 10. Improvements to the random graph classification results can be made using other graph-theoretic features48.
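A dataset of this shape could be generated with NetworkX along the following lines. This is a sketch, not the authors' script: the function name `random_graph_dataset` is hypothetical, and we assume that the 20 parameter values per setting are evenly spaced over the stated ranges, that the small-world graphs follow the Watts–Strogatz construction, and that the SW probability is the rewiring probability.

```python
import networkx as nx
import numpy as np


def random_graph_dataset(sizes, n_seeds=3, n_per_setting=20):
    """Generate labelled ER, BA and SW graphs over the stated parameter ranges."""
    # Assumption: parameters are evenly spaced across each range.
    er_probs = np.linspace(0.03, 0.1, n_per_setting)
    ba_edges = np.rint(np.linspace(1, 20, n_per_setting)).astype(int)
    sw_probs = np.linspace(0.1, 0.7, n_per_setting)
    sw_neighbours = np.rint(np.linspace(5, 10, n_per_setting)).astype(int)
    graphs, labels = [], []
    for n in sizes:
        n = int(n)
        for seed in range(n_seeds):
            for i in range(n_per_setting):
                graphs.append(nx.erdos_renyi_graph(n, float(er_probs[i]), seed=seed))
                labels.append("ER")
                graphs.append(nx.barabasi_albert_graph(n, int(ba_edges[i]), seed=seed))
                labels.append("BA")
                graphs.append(nx.watts_strogatz_graph(
                    n, int(sw_neighbours[i]), float(sw_probs[i]), seed=seed))
                labels.append("SW")
    return graphs, labels
```

With `sizes = np.linspace(100, 1000, 10).astype(int)` and the default arguments, this yields 10 × 3 × 20 = 600 graphs per class, matching the count given above.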
Acknowledgements
We thank David Infield, Thomas Higginson, Francesca Vianello, Florian Song, Paul Expert, Asher Mullokandov and Sophia Yaliraki for valuable discussions. We acknowledge funding through EPSRC award EP/N014529/1 supporting the EPSRC Centre for Mathematics of Precision Healthcare at Imperial. R.P. acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Project-ID 424778381-TRR 295. A.A. was supported by funding to the Blue Brain Project, a research centre of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government’s ETH Board of the Swiss Federal Institutes of Technology.
Author contributions
R.P., A.A. and M.B. contributed to conceiving the original idea. R.P. and A.A. contributed to writing the code. R.P. and A.A. contributed to the main analyses. R.P., A.A. and M.B. contributed to writing the manuscript.
Peer review
Peer review information
Nature Communications thanks Filippo Radicchi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its Supplemental Information files.
Code availability
The code is shared under the GNU General Public License v3.0. It can be found at https://github.com/barahona-research-group/DynGDim and 10.5281/zenodo.6496778 (ref. 49).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Robert Peach, Alexis Arnaudon.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-30705-w.
References
- 1. Erdös P, Harary F, Tutte WT. On the dimension of a graph. Mathematika. 1965;12:118–122. doi: 10.1112/S0025579300005222.
- 2. Lovász, L. Graphs and Geometry, vol. 65 (American Mathematical Soc., 2019).
- 3. Linial N, London E, Rabinovich Y. The geometry of graphs and some of its algorithmic applications. Combinatorica. 1995;15:215–245. doi: 10.1007/BF01200757.
- 4. Csányi G, Szendrői B. Fractal–small-world dichotomy in real-world networks. Phys. Rev. E. 2004;70:016122. doi: 10.1103/PhysRevE.70.016122.
- 5. Gastner MT, Newman ME. The spatial structure of networks. Eur. Phys. J. B Condens. Matter Complex Syst. 2006;49:247–252. doi: 10.1140/epjb/e2006-00046-8.
- 6. Shanker O. Defining dimension of a complex network. Mod. Phys. Lett. B. 2007;21:321–326. doi: 10.1142/S0217984907012773.
- 7. Daqing L, Kosmidis K, Bunde A, Havlin S. Dimension of spatially embedded networks. Nat. Phys. 2011;7:481–484. doi: 10.1038/nphys1932.
- 8. Silva, F. N. & Costa, L. d. F. Local dimension of complex networks. Preprint at https://arxiv.org/abs/1209.2476 (2012).
- 9. Pu J, Chen X, Wei D, Liu Q, Deng Y. Identifying influential nodes based on local dimension. EPL (Europhys. Lett.) 2014;107:10010. doi: 10.1209/0295-5075/107/10010.
- 10. Bian T, Deng Y. Identifying influential nodes in complex networks: a node information dimension approach. Chaos. 2018;28:043109. doi: 10.1063/1.5030894.
- 11. Wen, T., Pelusi, D. & Deng, Y. Vital spreaders identification in complex networks with multi-local dimension. Knowl. Based Syst. 195, 105717 (2020).
- 12. Reuveni S, Granek R, Klafter J. Anomalies in the vibrational dynamics of proteins are a consequence of fractal-like structure. Proc. Natl Acad. Sci. USA. 2010;107:13696–13700. doi: 10.1073/pnas.1002018107.
- 13. Lacasa L, Gómez-Gardenes J. Correlation dimension of complex networks. Phys. Rev. Lett. 2013;110:168703. doi: 10.1103/PhysRevLett.110.168703.
- 14. Peach RL, Arnaudon A, Barahona M. Semi-supervised classification on graphs using explicit diffusion dynamics. Found. Data Sci. 2020;2:19. doi: 10.3934/fods.2020002.
- 15. Arnaudon A, Peach RL, Barahona M. Scale-dependent measure of network centrality from diffusion dynamics. Phys. Rev. Res. 2020;2:033104. doi: 10.1103/PhysRevResearch.2.033104.
- 16. Singer A. From graph to manifold laplacian: the convergence rate. Appl. Comput. Harmonic Anal. 2006;21:128–134. doi: 10.1016/j.acha.2006.03.004.
- 17. Einstein A. Lens-like action of a star by the deviation of light in the gravitational field. Science. 1936;84:506–507. doi: 10.1126/science.84.2188.506.
- 18. Misner, C. W., Thorne, K. S. & Wheeler, J. A. Gravitation (Macmillan, 1973).
- 19. Müller-Hill, B. & Oehler, S. The Lac Operon (Walter de Gruyter New York, 1996).
- 20. Biondi RM, Kieloch A, Currie RA, Deak M, Alessi DR. The pif-binding pocket in pdk1 is essential for activation of s6k and sgk, but not pkb. EMBO J. 2001;20:4380–4390. doi: 10.1093/emboj/20.16.4380.
- 21. Reuveni S, Granek R, Klafter J. Proteins: coexistence of stability and flexibility. Phys. Rev. Lett. 2008;100:208101. doi: 10.1103/PhysRevLett.100.208101.
- 22. Peach, R. Exploring Protein Dynamics Using Graph Theory and Single-molecule Spectroscopy. Imperial College London, Ph.D. thesis (2017).
- 23. Peach, R. L. et al. Unsupervised graph-based learning predicts mutations that alter protein dynamics. bioRxiv Preprint at 10.1101/847426 (2019).
- 24. De Nooy, W., Mrvar, A. & Batagelj, V. Exploratory Social Network Analysis with Pajek: Revised and Expanded Edition for Updated Software, Vol. 46 (Cambridge University Press, 2018).
- 25. Smith DA, White DR. Structure and dynamics of the global economy: network analysis of international trade 1965–1980. Soc Forces. 1992;70:857–893. doi: 10.2307/2580193.
- 26. Yan G, et al. Network control principles predict neuron function in the caenorhabditis elegans connectome. Nature. 2017;550:519–523. doi: 10.1038/nature24056.
- 27. Bacik, K. A., Schaub, M. T., Beguerisse-Díaz, M., Billeh, Y. N. & Barahona, M. Flow-based network analysis of the Caenorhabditis elegans connectome. PLoS Comp. Biol. 12, 1511.00673 (2016).
- 28. Estrada E, Hatano N. Communicability in complex networks. Phys. Rev. E. 2008;77:036111. doi: 10.1103/PhysRevE.77.036111.
- 29. Faccin M, Schaub MT, Delvenne J-C. State aggregations in Markov chains and block models of networks. Phys. Rev. Lett. 2021;127:078301. doi: 10.1103/PhysRevLett.127.078301.
- 30. Aref, S., Friggens, D. & Hendy, S. Analysing scientific collaborations of New Zealand institutions using scopus bibliometric data. In Proc of the Australasian Computer Science Week Multiconference, Association for Computing Machinery, 1–10 (2018).
- 31. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
- 32. Maier BF. Generalization of the small-world effect on a model approaching the erdős–rényi random graph. Sci. Rep. 2019;9:1–9. doi: 10.1038/s41598-019-45576-3.
- 33. Boguna M, et al. Network geometry. Nat. Rev. Phys. 2021;3:114–135. doi: 10.1038/s42254-020-00264-4.
- 34. Ginosar, G. et al. Locally ordered representation of 3d space in the entorhinal cortex. Nature 596, 1–6 (2021).
- 35. Cao Y, et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature. 2018;556:43–50. doi: 10.1038/nature26160.
- 36. Masuda N, Porter MA, Lambiotte R. Random walks and diffusion on networks. Phys. Rep. 2017;716:1–58. doi: 10.1016/j.physrep.2017.07.007.
- 37. Song, F., Yaliraki, S. N. & Barahona, M. Bagpype: A python package for the construction of atomistic, energy-weighted graphs from biomolecular structures. figshare preprint figshare:10.6084 (2021).
- 38. Amor BR, Schaub MT, Yaliraki SN, Barahona M. Prediction of allosteric sites and mediating interactions through bond-to-bond propensities. Nat. Commun. 2016;7:12477. doi: 10.1038/ncomms12477.
- 39. Mersmann, S. et al. ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules. Nucleic Acids Res. 10.5281/zenodo.6496778 (2021).
- 40. Kuriata A, et al. Cabs-flex 2.0: a web server for fast simulations of flexibility of protein structures. Nucleic Acids Res. 2018;46:W338–W343. doi: 10.1093/nar/gky356.
- 41. McCormick F. Ras-related proteins in signal transduction and growth control. Mol. Reprod. Dev. 1995;42:500–506. doi: 10.1002/mrd.1080420419.
- 42. Buhrman G, Holzapfel G, Fetics S, Mattos C. Allosteric modulation of ras positions q61 for a direct role in catalysis. Proc. Natl Acad. Sci. USA. 2010;107:4931–4936. doi: 10.1073/pnas.0912226107.
- 43. Becker NA, Greiner AM, Peters JP, Maher III LJ. Bacterial promoter repression by dna looping without protein–protein binding competition. Nucleic Acids Res. 2014;42:5495–5504. doi: 10.1093/nar/gku180.
- 44. Wilson C, Zhan H, Swint-Kruse L, Matthews K. The lactose repressor system: paradigms for regulation, allosteric behavior and protein folding. Cell. Mol. Life Sci. 2007;64:3–16. doi: 10.1007/s00018-006-6296-z.
- 45. Sadowsky JD, et al. Turning a protein kinase on or off from a single allosteric site via disulfide trapping. Proc. Natl Acad. Sci. USA. 2011;108:6056–6061. doi: 10.1073/pnas.1102376108.
- 46. Henzler-Wildman KA, et al. Intrinsic motions along an enzymatic reaction trajectory. Nature. 2007;450:838–844. doi: 10.1038/nature06410.
- 47. Kiss, I. Z., Miller, J. C., Simon, P. L. et al. Mathematics of Epidemics on Networks 598 (Springer, 2017).
- 48. Peach RL, et al. hcga: Highly comparative graph analysis for network phenotyping. Patterns. 2021;2:100227. doi: 10.1016/j.patter.2021.100227.
- 49. Peach, R., Arnaudon, A. & Barahona, M. Relative, local and global dimension in complex networks: code. 10.5281/zenodo.6496779 (2022).