Abstract
The evolution of diverse phenotypes both involves and is constrained by molecular interaction networks. When these networks influence patterns of expression, we refer to them as gene regulatory networks (GRNs). Here, we develop a model of GRN evolution analogous to work from quasi-species theory, which is itself essentially the mutation–selection balance model from classical population genetics extended to multiple loci. With this GRN model, we prove that, across a broad spectrum of selection pressures, the dynamics converge to a stationary distribution over GRNs. Next, we show from first principles how the frequency of GRNs at equilibrium is related to the topology of the genotype network, in particular via a specific network centrality measure termed the eigenvector centrality. Finally, we determine the structural characteristics of GRNs that are favoured in response to a range of selective environments and mutational constraints. Our work connects GRN evolution to quasi-species theory, and thus to classical population genetics. It provides a mechanistic explanation for the observed distribution of GRNs evolving in response to various evolutionary forces, and shows how complex fitness landscapes can emerge from simple evolutionary rules.
Keywords: gene regulatory networks, quasi-species theory, mutation–selection balance, neutral network
1. Introduction
Molecular networks influence both macro- and micro-evolutionary processes [1–5]. But how might they themselves evolve? A recent comparative study of regulatory networks found that their structures often exist at the edge of criticality, straddling the border between chaotic and ordered states [6]. The idea that biological regulatory networks should exhibit the kind of dynamic stability associated with near-critical networks has been theorized as adaptive from two perspectives. One is maintaining functionality under mutational perturbation, i.e. their robustness [7,8]; the other is their ability to process information effectively [9]. However, there is also substantial empirical and theoretical evidence for the importance of evolutionary change in these networks, i.e. their so-called evolvability [8,10]. Indeed, a trade-off between robustness and evolvability has been hypothesized to explain the common ‘small-world’ property (also a feature of near-critical networks) seen in biological networks [11]. Nevertheless, foundational work on self-organized criticality and 1/f noise demonstrated that dynamical systems embedded in a spatial dimension, e.g. biological regulatory networks, might naturally evolve to near-critical states [12,13].
Over the past two decades, several models of gene regulatory network (GRN) evolution have been proposed [14–19], which have influenced our understanding of diverse phenomena including canalization [15,20], allopatric speciation [17,18,21], expression noise [22] and the structural properties of GRNs themselves [16,23,24]. Empirical studies of transcription factors [25,26], mRNA profiles [27] and comparative genomics [28] suggest that gene duplication/loss and modularity both play an essential role in generating the observed patterns of gene regulation [29–31]. Building from these studies, Force et al. [30] showed computationally that the subfunctionalization of ancestral genes following duplication events can lead to modular GRNs. Similarly, Espinosa-Soto & Wagner [24] demonstrated that sequential adaptation via newly specialized gene activity patterns can increase the modularity of GRNs.
More recently, studies integrating empirical data with computational models have hypothesized how an expanded set of evolutionary forces may shape the structure of GRNs [32–35]. The results of this work are supported by several mathematical models of GRN evolution introduced to study regulatory structures influenced by duplication events [36], selection on functional dynamics [37], horizontal gene transfer [38], correlated mutations [39] and non-genetic inheritance [40]. Conversely, GRNs are hypothesized to evolve largely as a by-product of the progression towards some optimal state, through a combination of negative-feedback regulation [41], the rate of molecular evolution [42], trade-offs between robustness and evolvability [6] and self-organization of functional activity [43].
When genotypes are expanded to more complicated sets of genes, the long-term stationary solution for a population genetic model describing their evolution can be studied with quasi-species theory [44,45]. Using this approach, prior work has derived exact solutions for the steady-state distribution of higher dimensional genotypes and criteria for global convergence [46–48]. However, results obtained in these studies relied on an assumption of mutational accessibility among genotypes with non-zero fitness, i.e. irreducible and primitive transition matrices, and often on particular functional forms for selection [49–53]. Demonstrating analytically that mutations are accessible, and that the steady-state distribution of GRNs can be studied under arbitrary forms of selection, would advance both our theoretical understanding of how these structures evolve and our capacity to evaluate theory with empirical data.
Here, we derive analytical conclusions for models of GRN evolution using an approach similar to those taken in population genetics and quasi-species theory [45]. However, instead of assuming mutational accessibility, we prove its existence mathematically. Specifically, we study the dynamics of GRN evolution in an infinitely large population with non-overlapping generations at mutation/selection balance. Using this model, we mechanistically recover empirical observations such as GRN modularity and prove that the dynamics always converge to a stationary distribution over GRNs. Then, assuming binary viability, identical ability to reproduce and rare mutation, we analytically show that the frequency of GRNs at mutation–selection balance is proportional to each GRN’s eigenvector centrality in a sub-graph of the genotype network [54–57]. Finally, we determine which structural motifs associated with GRNs are favoured in response to a wide variety of selective regimes and regulatory constraints. We discuss how analysing GRN evolution using a network-science approach can provide a mechanistic explanation for the way evolution shapes and is constrained by higher-order interactions [58].
2. Models
2.1. Quasi-species model with selection, reproduction and mutation
We begin with a quasi-species model that incorporates selection, asexual reproduction and mutation. The viable individuals in the current generation reproduce and generate their offspring, which experience mutation and undergo selection to form the next generation (see figure 1 for a simple sketch). This phenomenological modelling scheme is quite common for studying both deterministic and stochastic Markovian dynamics [44,46–48,59]. Yet, as we show shortly, basic probability theory allows us to construct a model from the bottom up and provides a probabilistic interpretation of the various parameters. Additionally, we impose a few key assumptions: (i) an infinitely large population size, (ii) non-overlapping generations, (iii) asexual reproduction (i.e. neither random assortment nor recombination), (iv) a fixed selective environment, and (v) that any single-locus mutation has a non-zero chance to occur per generation.
Figure 1.
Illustrative cartoon of different stages in the proposed quasi-species model.
More formally, suppose that $I_t$ represents an individual randomly sampled from the population at generation $t$. Let $g(I_t)$ be its genotype and $s(I_t)$ the random variable indicating whether $I_t$ is viable. Here $s(I_t)=1$ represents the event that $I_t$ survives, and $s(I_t)=0$ otherwise. We further denote by $I_{t-1} \to I_t$ the event that the randomly sampled individual at generation $t-1$, namely $I_{t-1}$, reproduced and generated the randomly sampled individual at generation $t$, namely $I_t$. We use $\mathcal{G}$ to represent the set of all plausible genotypes.
For any genotype $g \in \mathcal{G}$, we are interested in its prevalence in the population at a given generation $t$ after selection. In other words, we would like to know the probability that we observe a randomly sampled individual at generation $t$ with genotype $g$, given that the sampled individual is viable. Applying Bayes’ theorem, this focal conditional probability becomes
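$$P\big(g(I_t)=g \mid s(I_t)=1\big) = \frac{P\big(s(I_t)=1 \mid g(I_t)=g\big)\,P\big(g(I_t)=g\big)}{P\big(s(I_t)=1\big)} = \frac{\nu_g}{\bar{\nu}_t}\,P\big(g(I_t)=g\big). \tag{2.1}$$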
For simplicity, we adopt the abbreviations $\nu_g = P(s(I_t)=1 \mid g(I_t)=g)$ and $\bar{\nu}_t = P(s(I_t)=1)$. These are, respectively, the survival probability (or viability) of genotype $g$ and the average viability at generation $t$.
What remains in equation (2.1) is $P(g(I_t)=g)$, the probability that a randomly sampled individual has genotype $g$ before selection. The derivation of $P(g(I_t)=g)$ relies on two observations. First, the genotype of individual $I_t$ depends on the unique genotype of its parent and any subsequent mutation. Second, this parent individual must be viable. The event $g(I_t)=g$ is hence partitioned¹ by the joint events $\{g(I_{t-1})=g',\ s(I_{t-1})=1,\ I_{t-1}\to I_t\}$ for $g' \in \mathcal{G}$. So we have
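$$P\big(g(I_t)=g\big) = \sum_{g'\in\mathcal{G}} P\big(g(I_t)=g \mid g(I_{t-1})=g',\, I_{t-1}\to I_t\big)\, P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1,\, I_{t-1}\to I_t\big) = \sum_{g'\in\mathcal{G}} \mu_{g'g}\, P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1,\, I_{t-1}\to I_t\big). \tag{2.2}$$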
We abbreviate $\mu_{g'g} = P(g(I_t)=g \mid g(I_{t-1})=g',\, I_{t-1}\to I_t)$, the mutational probability from genotype $g'$ to genotype $g$.
Finally, $P(g(I_{t-1})=g' \mid s(I_{t-1})=1,\, I_{t-1}\to I_t)$ in equation (2.2) is the probability that the parent of a randomly sampled individual at generation $t$ has genotype $g'$. Applying Bayes’ theorem once more, this probability becomes
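$$P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1,\, I_{t-1}\to I_t\big) = \frac{P\big(I_{t-1}\to I_t \mid g(I_{t-1})=g',\, s(I_{t-1})=1\big)\,P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1\big)}{P\big(I_{t-1}\to I_t \mid s(I_{t-1})=1\big)} = \frac{\rho_{g'}}{\bar{\rho}_{t-1}}\,P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1\big), \tag{2.3}$$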
where $\rho_{g'} = P(I_{t-1}\to I_t \mid g(I_{t-1})=g',\, s(I_{t-1})=1)$ is the reproductivity of genotype $g'$, and $\bar{\rho}_{t-1} = P(I_{t-1}\to I_t \mid s(I_{t-1})=1)$ is the average reproductivity at generation $t-1$. Note that the probabilistic formulation does not define reproductivity as the number of offspring an individual has. Instead, it describes reproductivity as how likely one is, when sampling from the infinitely large next generation, to observe an offspring of the focal genotype.
More importantly, we see that equation (2.3) leads us back to the conditional probability that, at generation t − 1, a randomly sampled individual has genotype g′ given that it is viable. Combining (2.1) to (2.3), we obtain the master equation for a quasi-species model that integrates selection, reproduction and mutation,
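$$P\big(g(I_t)=g \mid s(I_t)=1\big) = \frac{\nu_g}{\bar{\nu}_t\,\bar{\rho}_{t-1}} \sum_{g'\in\mathcal{G}} \rho_{g'}\,\mu_{g'g}\, P\big(g(I_{t-1})=g' \mid s(I_{t-1})=1\big). \tag{2.4}$$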
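To make the bookkeeping in (2.4) concrete, the following minimal sketch performs one generation of the recursion for a finite set of genotypes indexed 0, …, n−1. It assumes NumPy. The function name `quasispecies_step` and the array layout (`mu[g_prev, g]` holding $\mu_{g'g}$, with each row summing to 1) are illustrative choices of ours, not part of the model's original presentation.

```python
import numpy as np

def quasispecies_step(p_prev, nu, rho, mu):
    """One generation of the master equation (2.4).

    p_prev[g'] : P(g(I_{t-1}) = g' | I_{t-1} viable)
    nu[g]      : viability of genotype g
    rho[g']    : reproductivity of genotype g'
    mu[g', g]  : probability that an offspring of g' carries genotype g
    """
    parents = rho * p_prev            # reproduction weights rho_{g'} P(g' | viable)
    offspring = parents @ mu          # mutation: sum_{g'} rho_{g'} mu_{g'g} P(g' | viable)
    survivors = nu * offspring        # viability selection
    # The normaliser equals (average reproductivity at t-1) * (average viability at t).
    return survivors / survivors.sum()

# Usage: genotype 2 is inviable, and genotype 1 reproduces twice as well as genotype 0.
p0 = np.array([0.5, 0.5, 0.0])
nu = np.array([1.0, 1.0, 0.0])
rho = np.array([1.0, 2.0, 1.0])
mu = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
p1 = quasispecies_step(p0, nu, rho, mu)
```

Because each row of `mu` sums to one, `survivors.sum()` is exactly $\bar{\nu}_t\bar{\rho}_{t-1}$, so the two averages never need to be computed separately.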
2.2. Pathway framework of gene regulatory networks: representing genotypes by expression behaviour
In the existing literature on quasi-species theory, the model parameters are usually arbitrarily tunable or follow an assumed distribution for simplicity. Hypothetically, these parameters depend on the resultant phenotypes of the genotypes, and any genotype–phenotype mapping reflects constraints and provides information on the model parameters. In previous work, we proposed a modelling approach, termed the pathway framework, to describe how the structure of GRNs varies due to genetic changes and how they respond to a given selective pressure [18,19] (which we summarize below; see its formal mathematical formulation in electronic supplementary material, appendix A). In the current work, we apply the pathway framework of GRNs as a genotype–phenotype mapping for parametrizing a quasi-species model (2.4).
The pathway framework conceptualizes alleles of genes as ‘black boxes’ that encapsulate their expression behaviour. Regulation between two genes naturally arises when one gene’s protein product is involved in the activation of the other’s expression (see figure 2). The pathway framework, therefore, represents the genotype as the input–output relation of each gene’s expression behaviour, and the corresponding GRN can be constructed by connecting genes based on regulation. The collective state of each network is the resulting genotype. As a consequence, these input–output relations of gene expression serve as the ‘inherited’ reactions, in which external environmental stimuli trigger an expression cascade that activates proteins. We consider the final state of the network following such a cascade to be the GRN’s phenotype.
Figure 2.
The pathway framework interprets a GRN as an abstraction of the expression behaviour of the genotype. In this framework, a GRN consists of edges indicating the input–output pair of a gene’s expression, from which transcriptional regulation between genes can be recovered, and it is arguably a more compact representation than the conventional notion of GRNs.
We note that the pathway framework rests on a handful of assumptions and thus may be limited to specific GRNs. Nevertheless, in the pathway framework GRNs play the role of an informative genotype–phenotype mapping, which yields a mechanistically interpretable parametrization for models in quasi-species theory. In later sections we show that this genotype–phenotype mapping, though somewhat naive and specific, still gives a fairly broad demonstration of the key assumption in quasi-species theory, namely mutational accessibility. This demonstration arises when proving the global convergence to a stationary solution (see §4).
In this work, we focus on a minimal pathway framework of GRNs, which relies on a few additional assumptions. First, we assume there is a constant collection of proteins that can possibly appear in the organisms, and that the state of a protein is binary, indicating whether the protein is present or absent in an organism. Second, we assume that any gene’s expression is activated by a single protein and produces a single protein product, so the allele of the gene becomes the ordered pair of protein activator/product. If the protein activator is in the present state, the allele of the gene turns the state of the protein product to present. Third, there is a fixed collection of genes in the organisms, and the allele of each gene can be any pair of activator/product in the constant collection of proteins. Fourth, the external environmental stimuli, if any, specify some activator proteins in the constant collection and set their state to present.
Under these assumptions, a GRN can be transformed from its conventional notion, where nodes represent genes and edges show regulation among them, into a more compact format. In this format the nodes are exactly the constant collection of proteins, and the directed edges describe the expression behaviour of alleles of genes (see figure 2). Hereafter, we use the term GRN to describe networks represented in this compact format under the pathway framework. However, we note that the two constructions are merely different representations of the expression behaviour of the same underlying genotype. The set of all possible genotypes is denoted by $\mathcal{G}$ in §2.1. Because the pathway framework represents each gene by its expression behaviour and constructs the GRN through chains of expression, we also use $\mathcal{G}$ for the corresponding set of GRNs, and we write $g \in \mathcal{G}$ to refer to a possible genotype/GRN. Given the constant collection of proteins $\mathcal{P}$ and genes $\Gamma$, the set of all possible GRNs is determined as follows: each possible GRN is a network on $\mathcal{P}$ with $|\Gamma|$ directed edges, each of which is labelled by a gene in $\Gamma$ and points from any protein activator to any protein product in $\mathcal{P}$.
The pathway framework provides an approach to model various evolutionary forces, such as random mutation and natural selection, through graphical operations and structural characteristics on the GRNs. Mutation at a gene changes its allele stochastically, which is essentially a random process over all possible pairs of protein activators/products in the constant collection excluding the original allele. In the corresponding GRN, mutating a gene is equivalent to rewiring the directed edge labelled by the focal gene. On the other hand, selection is usually characterized as some phenotypic response evaluated against an environment. In this case, because a phenotype is developed through a cascade of internal protein expression starting from the external stimuli, the binary state of a protein in the resulting phenotype corresponds to its reachability from the stimulated proteins in the GRN. The viability and the reproductivity of a genotype can, therefore, be modelled as functions of node reachability in the GRN. For example, in the case study in §3.2, we will consider a simple scenario where the mutation at each gene is independent and the outcome is uniform among all possible alleles, and that the viability is 1 if some phenotypic constraint is satisfied or 0 otherwise. We explore more complex scenarios in later sections.
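The reachability-based phenotype and the edge-rewiring view of mutation can be sketched in a few lines of Python. This is an illustrative sketch only. A GRN is encoded (by assumption) as a tuple of `(activator, product)` alleles, one per gene, and the helper names `phenotype`, `rewire` and `binary_viability` are hypothetical.

```python
from collections import deque

def phenotype(grn, stimuli):
    """Proteins present after the expression cascade (reachability from the stimuli).

    grn     -- tuple of (activator, product) alleles, one entry per gene
    stimuli -- proteins set to present by the environment
    """
    present = set(stimuli)
    queue = deque(stimuli)
    while queue:
        protein = queue.popleft()
        for activator, product in grn:
            # A gene fires when its activator is present and switches its product on.
            if activator == protein and product not in present:
                present.add(product)
                queue.append(product)
    return present

def rewire(grn, gene_index, new_allele):
    """Mutating one gene is equivalent to rewiring the edge labelled by that gene."""
    return grn[:gene_index] + (new_allele,) + grn[gene_index + 1:]

def binary_viability(grn, stimuli, fatal):
    """Hypothetical example constraint: viable (1) iff no fatal protein is reachable."""
    return 0 if phenotype(grn, stimuli) & set(fatal) else 1

# Usage: rewiring gene 1 from the redundant self-pair (2, 2) to (2, 3) exposes protein 3.
wild_type = ((1, 2), (2, 2))
mutant = rewire(wild_type, 1, (2, 3))
```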
2.3. Genotype networks: a space of mutational relationship between gene regulatory networks
Previous work has developed the concept of the genotype network, which captures how various genotypes transition from one to another through mutations (not necessarily just point mutations) and/or recombination [54,60,61]. Here, we adopt the genotype network to describe the mutational connection between GRNs. The genotype network of GRNs is an undirected network of networks. Every possible GRN becomes a mega-node, and two mega-nodes are connected if the two corresponding GRNs differ only by an allele at a single locus. In other words, an edge between two mega-nodes in the genotype network represents a single-locus mutation between GRNs (figure 3). Instead of concentrating on all possible GRNs, we often focus on the mutational relationship between a subset of GRNs. One of the most common phenotypic constraints is to consider only GRNs with equal viability and reproductivity. The resulting induced sub-graph of the genotype network is known as the neutral network [56,61], and it captures mutational transitions between GRNs that are selectively neutral in a single generation.
Figure 3.
(a) Genotype network of GRNs, where, under the pathway framework, two mega-nodes (GRNs) are connected if and only if they differ by one edge rewiring. (b) Neutral network of GRNs, where inviable mega-nodes are removed from the genotype network. In this illustrative example, inviability is modelled as a regulatory pathway from the stimulus to the protein product with a fatal effect.
We emphasize two important properties of a genotype network of GRNs and its induced sub-graphs under the pathway framework. First, because the underlying collections of proteins and genes are fixed, and a mutation at any gene can lead to a mutant allele that points from any protein activator to any protein product, each GRN has the same number of mutational neighbours. As a result, the genotype network of GRNs is in fact a regular graph. Second, for any phenotypic constraint, we show that the resulting induced sub-graph of the genotype network is connected. In other words, there always exists a sequence of single-locus mutations between two GRNs such that the involved GRNs all satisfy the arbitrarily given constraint on their phenotype. The guaranteed connectedness also applies to the neutral network of GRNs, where the phenotypic constraint corresponds to protein states leading to the same viability and reproductivity.
We leave to electronic supplementary material, appendix C the detailed proof for the connectedness of a sub-graph of the genotype network induced from arbitrary phenotypic constraint, but provide a brief outline here. The proof is based on a few observations of the pathway framework of GRNs. Under the assumption of binary protein states, there naturally exist some protein activator/product pairs that are ‘redundant’ in terms of the resulting phenotype. Such redundancy manifests when the product is simply the activator itself, or when multiple genes share the same activator/product pair. Furthermore, given a phenotypic constraint, one can come up with a family of ‘naive’ GRNs that satisfy the constraint. Specifically, such a ‘naive’ GRN is constructed by (i) for each required-present protein, assigning it as the product of a gene with an external stimulus as the activator and (ii) assigning the rest of genes with redundant activator/product pairs. Our proof in electronic supplementary material, appendix C systematically finds a mutational trajectory between two GRNs gs and gt satisfying the phenotypic constraint. This trajectory consists of three segments—between gs and a naive GRN g′, between g′ and another naive GRN g″, and finally between g″ and gt—and all GRNs traversed by the mutational trajectory also satisfy the given phenotypic constraint.
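The sketch below gives a small numerical illustration of both properties discussed in §2.3: regularity of the genotype network and connectedness of the neutral network. It enumerates a hypothetical toy system that is not taken from the paper. The system has two genes, protein 1 as a stimulated input, protein 3 as an output with a fatal effect, and protein 2 as an intermediate. Inputs may not be products and outputs may not be activators, which leaves four possible alleles per gene.

```python
import itertools

# Hypothetical toy set-up (an assumption, not the paper's parameters).
ACTIVATORS = (1, 2)                                    # output protein 3 cannot activate
PRODUCTS = (2, 3)                                      # input protein 1 cannot be produced
PAIRS = list(itertools.product(ACTIVATORS, PRODUCTS))  # the four possible alleles
N_GENES = 2

def reachable(grn, stimuli):
    present, frontier = set(stimuli), list(stimuli)
    while frontier:
        p = frontier.pop()
        for a, b in grn:
            if a == p and b not in present:
                present.add(b)
                frontier.append(b)
    return present

def viable(grn):
    # Binary viability: the fatal output protein 3 must stay absent.
    return 3 not in reachable(grn, {1})

GRNS = list(itertools.product(PAIRS, repeat=N_GENES))

def neighbours(grn):
    """GRNs differing from `grn` by rewiring exactly one gene."""
    for i, allele in enumerate(grn):
        for new in PAIRS:
            if new != allele:
                yield grn[:i] + (new,) + grn[i + 1:]

def is_connected(nodes):
    """Graph search restricted to `nodes`, i.e. the induced sub-graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, frontier = {start}, [start]
    while frontier:
        g = frontier.pop()
        for h in neighbours(g):
            if h in nodes and h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen == nodes

degrees = {len(list(neighbours(g))) for g in GRNS}
neutral = [g for g in GRNS if viable(g)]
print(degrees, len(GRNS), len(neutral), is_connected(neutral))  # expected: {6} 16 7 True
```

Each gene can be rewired to 3 alternative alleles, so every GRN has 2 × 3 = 6 mutational neighbours. The 7 viable GRNs form a single connected component. A toy check of this kind illustrates, but does not replace, the general proof in appendix C.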
3. Analyses
3.1. Convergence to a stationary distribution of gene regulatory networks
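Our goal now is to prove the convergence of the quasi-species model (2.4) under the pathway framework and to derive the stationary distribution over possible GRNs at mutation–selection balance. We begin by focusing on groups of GRNs whose probability of being observed is relatively straightforward to model. First, for any GRN $g$ with zero viability, i.e. $\nu_g = 0$, the probability of observing $g$ in a randomly sampled individual that has survived selection is also zero. Formally, denote the GRNs with non-zero viability by $\mathcal{G}_\nu = \{g\in\mathcal{G} : \nu_g > 0\}$. Then $P(g(I_t)=g \mid s(I_t)=1) = 0$ for each $g \in \mathcal{G}\setminus\mathcal{G}_\nu$ and at any time $t$. Second, we denote the viable GRNs with zero reproductivity by $\mathcal{G}_0 = \{g\in\mathcal{G}_\nu : \rho_g = 0\}$. Since GRNs in $\mathcal{G}_0$ do not contribute offspring, their probability of being observed depends solely on the other GRNs, $\mathcal{G}_\nu\setminus\mathcal{G}_0$. In particular, for each $g\in\mathcal{G}_0$ and at any time $t$, we have $P(g(I_t)=g \mid s(I_t)=1) = \frac{\nu_g}{\bar{\nu}_t\bar{\rho}_{t-1}}\sum_{g'\in\mathcal{G}_\nu\setminus\mathcal{G}_0}\rho_{g'}\mu_{g'g}\,P(g(I_{t-1})=g' \mid s(I_{t-1})=1)$.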
Next, we collect the focal conditional probabilities for every $g \in \mathcal{G}_\nu\setminus\mathcal{G}_0$ at generation $t$ into a column vector $\mathbf{p}(t)$. Its $i_g$th entry corresponds to GRN $g$ and equals $P(g(I_t)=g \mid s(I_t)=1)$. With this column-vector formulation, the master equation (2.4) can be rewritten in matrix form. Specifically, we denote by $T$ the transition matrix among GRNs in $\mathcal{G}_\nu\setminus\mathcal{G}_0$. Its entry at the $i_g$th row and the $i_{g'}$th column is $\rho_{g'}\mu_{g'g}\nu_g$ for $g, g' \in \mathcal{G}_\nu\setminus\mathcal{G}_0$, i.e. the (unnormalized) weight with which $g'$ gives rise to $g$ in a single generation under the joint effects of reproduction, mutation and viability selection. In addition, we define a second matrix $R$ that captures the GRNs arising via these transitions that have zero reproductive success. More formally, $R$ encodes the transition from $\mathcal{G}_\nu\setminus\mathcal{G}_0$ to $\mathcal{G}_0$, and its entry at the $i_g$th row and the $i_{g'}$th column is again $\rho_{g'}\mu_{g'g}\nu_g$, now for $g\in\mathcal{G}_0$ and $g'\in\mathcal{G}_\nu\setminus\mathcal{G}_0$. Together, $T$ and $R$ represent the two components of selection in our model, viability and reproduction. With these matrix notations, we expand the denominator of equation (2.4) as $\bar{\nu}_t\bar{\rho}_{t-1} = \mathbf{1}T\mathbf{p}(t-1) + \mathbf{1}R\mathbf{p}(t-1)$. The master equation (2.4) then becomes
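$$\mathbf{p}(t) = \frac{T\,\mathbf{p}(t-1)}{\mathbf{1}T\,\mathbf{p}(t-1) + \mathbf{1}R\,\mathbf{p}(t-1)}, \tag{3.1}$$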
with $\mathbf{1}$ denoting a row vector of ones of the appropriate length. GRN fitness arises from the time evolution of this master equation.
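Under illustrative assumptions, the sketch below shows how T and R can be assembled from genotype-level arrays. It then checks that the matrix recursion (3.1) reproduces a direct evaluation of (2.4). NumPy is assumed. `build_T_R` and the array conventions (`nu[g]`, `rho[g]`, `mu[g_prev, g]`) are hypothetical names introduced here.

```python
import numpy as np

def build_T_R(nu, rho, mu):
    """Assemble T (viable, reproducing -> viable, reproducing) and
    R (viable, reproducing -> viable, zero reproductivity)."""
    viable = nu > 0
    repro = viable & (rho > 0)        # index set of T
    zero = viable & (rho == 0)        # index set of the rows of R
    # full[g, g'] = nu_g * mu_{g'g} * rho_{g'}; column g' is the parent.
    full = (nu[:, None] * mu.T) * rho[None, :]
    T = full[np.ix_(repro, repro)]
    R = full[np.ix_(zero, repro)]
    return T, R, repro, zero

# Usage: genotype 2 is inviable and genotype 3 is viable but never reproduces.
nu = np.array([1.0, 0.5, 0.0, 1.0])
rho = np.array([1.0, 2.0, 1.0, 0.0])
mu = np.full((4, 4), 0.1) + 0.6 * np.eye(4)   # rows sum to 1
p = np.array([0.3, 0.5, 0.0, 0.2])            # P(genotype | viable) at t-1
T, R, repro, zero = build_T_R(nu, rho, mu)
```

Only the parent columns in $\mathcal{G}_\nu\setminus\mathcal{G}_0$ are kept, because parents with $\rho_{g'} = 0$ contribute nothing to the next generation.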
The matrix T plays a key role in the master equation (3.1), and it has the useful property that all its entries are positive. Since T corresponds to transitions among GRNs in $\mathcal{G}_\nu\setminus\mathcal{G}_0$, the reproductivity $\rho_{g'}$ and the viability $\nu_g$ are both positive (GRNs with zero reproductivity are handled separately through R). Next, we must show that the mutation probability $\mu_{g'g}$ is positive as well. Recall that, when constructed through the pathway framework of GRNs, the sub-graph of the genotype network induced by any phenotypic constraint is connected (see §2.3 and electronic supplementary material, appendix C). More formally, the connectedness among GRNs constrained by non-zero viability and reproductivity implies the following: for any $g, g'\in\mathcal{G}_\nu\setminus\mathcal{G}_0$, there exists a sequence of single-locus mutations which transforms $g'$ into $g$ through GRNs in $\mathcal{G}_\nu\setminus\mathcal{G}_0$. Since we assume that any single-locus mutation can occur with a non-zero probability (recall §2.1), there is a non-zero chance for $g'$ to mutate to $g$ within one generation,² i.e. $\mu_{g'g} > 0$. As a result, T is a positive matrix.
For ease of presentation, we show the convergence of equation (3.1) when the matrix T is symmetric and provide the proof for a non-symmetric T in electronic supplementary material, appendix D. In this case, the eigenvectors of the symmetric matrix T can be chosen to be linearly independent (indeed orthonormal) and to form a basis of $n$-dimensional vectors, where $n = |\mathcal{G}_\nu\setminus\mathcal{G}_0|$. We order the eigenvectors such that the magnitudes of their corresponding eigenvalues are non-increasing. The initial distribution $\mathbf{p}(0)$ can then be written as a linear combination of the eigenvectors of T,
$$\mathbf{p}(0) = \sum_{i=1}^{n} c_i\,\mathbf{v}_i. \tag{3.2}$$
In addition, because $\mathbf{p}(t)$ is proportional to $T\mathbf{p}(t-1)$ for $t > 0$, we have $\mathbf{p}(t-1) \propto T^{t-1}\mathbf{p}(0)$ and consequently
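$$\mathbf{p}(t) \propto T^{t}\mathbf{p}(0) = \sum_{i=1}^{n} c_i\,\lambda_i^{t}\,\mathbf{v}_i = \lambda_1^{t}\Big(c_1\mathbf{v}_1 + \sum_{i=2}^{n} c_i\Big(\frac{\lambda_i}{\lambda_1}\Big)^{t}\mathbf{v}_i\Big), \tag{3.3}$$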
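where $\mathbf{v}_1$ and $\lambda_1$ are the leading eigenvector and the leading eigenvalue of T, respectively. Since T is a positive matrix, the Perron–Frobenius theorem gives $|\lambda_1| > |\lambda_i|$ for every $i > 1$. It also guarantees that $\mathbf{v}_1$ can be chosen with strictly positive entries, so that $c_1 > 0$ for any non-negative, non-zero initial distribution $\mathbf{p}(0)$. Together, these properties guarantee the convergence of equation (3.1),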
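$$\lim_{t\to\infty}\mathbf{p}(t) = \frac{\lambda_1\,\mathbf{v}_1}{\lambda_1\mathbf{1}\mathbf{v}_1 + \mathbf{1}R\,\mathbf{v}_1} \propto \mathbf{v}_1. \tag{3.4}$$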
For a general and potentially non-symmetric matrix T, we can first factor T by its generalized eigenvectors and its Jordan normal form and then an analogous derivation follows (see electronic supplementary material, appendix D). Therefore, under the pathway framework of GRNs, the master equation (3.1) converges to a stationary distribution that is proportional to the leading eigenvector of T. Combined with the GRNs having zero viability/reproductivity, whose probability to be observed under the limit t → ∞ can be easily computed given (3.4), the stationary distribution of GRNs describes the balanced scenario between selection, mutation and reproduction.
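The following sketch gives a quick numerical check of the convergence argument under stated assumptions. It iterates (3.1) for a small, arbitrarily chosen positive (and deliberately non-symmetric) T and a non-negative R, then compares the result with the fixed point implied by the Perron vector. Substituting $\mathbf{p} = c\,\mathbf{v}_1$ into (3.1) gives $c = \lambda_1/(\lambda_1\mathbf{1}\mathbf{v}_1 + \mathbf{1}R\mathbf{v}_1)$. The matrices and function names are illustrative only, and NumPy is assumed.

```python
import numpy as np

def iterate_master_equation(T, R, p0, n_steps=500):
    """Iterate eq. (3.1): p(t) = T p(t-1) / (1 T p(t-1) + 1 R p(t-1))."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        weights = T @ p
        p = weights / (weights.sum() + (R @ p).sum())   # R acts on p(t-1)
    return p

def predicted_limit(T, R):
    """Fixed point lambda_1 v_1 / (lambda_1 1 v_1 + 1 R v_1) built from the Perron pair."""
    vals, vecs = np.linalg.eig(T)
    k = np.argmax(vals.real)            # Perron root: real, positive, largest
    lam = vals[k].real
    v = np.abs(vecs[:, k].real)         # Perron vector has entries of a single sign
    return lam * v / (lam * v.sum() + (R @ v).sum()), lam

# Usage with illustrative matrices (not derived from any GRN).
T = np.array([[0.5, 0.2, 0.1],
              [0.1, 0.4, 0.3],
              [0.2, 0.1, 0.5]])
R = np.array([[0.05, 0.1, 0.02]])
p_inf = iterate_master_equation(T, R, np.array([1.0, 0.0, 0.0]))
p_star, lam = predicted_limit(T, R)
```

At the fixed point the normaliser in (3.1) equals $\lambda_1$, so the mass on zero-reproductivity GRNs is $R\mathbf{p}^*/\lambda_1$, and the two parts sum to one.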
3.2. Case study: binary viability, identical reproductivity and independent mutation
We next turn to a simplified case study to demonstrate the validity of our predicted stationary distribution via derivation and simulation. First, a GRN g either always survives the selection stage or becomes inviable, i.e. it has binary viability νg ∈ {0, 1}. This assumption also implies that for any GRN with a non-zero viability, we have νg = 1.
Second, we assume that each viable GRN produces the same number of offspring. Equivalently, the probability that an individual randomly sampled from an infinitely large offspring population had a viable parent with a specific GRN $g$ is the same constant for every viable GRN $g\in\mathcal{G}_\nu$. We denote this uniform reproductivity by $\rho_g = \rho$, which is non-zero. Consequently, no viable GRN has zero reproductivity, and the matrix R in (3.1) vanishes.
Third, given the underlying collection of proteins $\mathcal{P}$ and genes $\Gamma$, the per-generation occurrence of mutation at every gene $\gamma\in\Gamma$ is assumed to be an independent, identically distributed Bernoulli random variable with a constant success probability $\mu$. Moreover, if it occurs, a mutation at $\gamma$ randomly changes $\gamma$’s expression behaviour to any other pair of protein activator/product encoded in $\mathcal{P}$ with equal probability. Under this assumption of independent and uniform mutation, the per-generation probability that a GRN $g'$ mutates to $g$ becomes
$$\mu_{g'g} = (1-\mu)^{|\Gamma| - d(g',g)}\left(\frac{\mu}{|\mathcal{E}|-1}\right)^{d(g',g)}, \tag{3.5}$$
where we denote by $\mathcal{E}$ the set of possible pairs of protein activator/product in $\mathcal{P}$, and $d(g', g)$ is the number of genes with different expression behaviour between $g'$ and $g$.
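Equation (3.5) translates directly into code. This sketch assumes that GRNs are equal-length tuples of alleles and that `n_pairs` stands for $|\mathcal{E}|$; the function name is hypothetical. Summing the probabilities over all possible offspring GRNs returns 1, which is a useful sanity check of the formula.

```python
import itertools

def mutation_probability(g_from, g_to, mu, n_pairs):
    """Eq. (3.5): each gene mutates independently with probability mu,
    uniformly to one of the n_pairs - 1 other activator/product pairs."""
    d = sum(a != b for a, b in zip(g_from, g_to))      # d(g', g)
    n_genes = len(g_from)
    return (1 - mu) ** (n_genes - d) * (mu / (n_pairs - 1)) ** d

# Usage on a hypothetical toy allele set with four activator/product pairs and two genes.
pairs = list(itertools.product((1, 2), (2, 3)))
grns = list(itertools.product(pairs, repeat=2))
total = sum(mutation_probability(grns[0], g, 0.1, len(pairs)) for g in grns)
```

The sum telescopes to $((1-\mu) + \mu)^{|\Gamma|} = 1$, because there are $\binom{|\Gamma|}{k}(|\mathcal{E}|-1)^k$ targets at distance $k$.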
For this more specific model, we can rewrite the transition matrix T into a series
$$T = \sum_{k=0}^{|\Gamma|} T_k, \tag{3.6}$$
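where the entry at the $i_g$th row and the $i_{g'}$th column of matrix $T_k$ is $\rho\,\mu_{g'g}$ if $d(g',g)=k$ and 0 otherwise. Observe that $T_0$ is proportional to the identity matrix $I$ (of the appropriate size), and $T_1$ is proportional to the adjacency matrix³ of the neutral network of GRNs (see §2.3), which we denote by $A$. We write $A_k$ for the matrix whose entry at the $i_g$th row and the $i_{g'}$th column is 1 if $d(g',g)=k$ and 0 otherwise, so that $A_0 = I$, $A_1 = A$ and $T_k = \rho(1-\mu)^{|\Gamma|}\varepsilon^{k}A_k$ with $\varepsilon = \mu/[(1-\mu)(|\mathcal{E}|-1)]$. The entries of $A_k$ are finite even for a zero per-generation, per-locus mutation probability $\mu$. With this notation, equation (3.6) becomes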
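$$T = \rho(1-\mu)^{|\Gamma|}\Big(I + \varepsilon A + \sum_{k=2}^{|\Gamma|}\varepsilon^{k}A_k\Big), \qquad \varepsilon = \frac{\mu}{(1-\mu)(|\mathcal{E}|-1)}. \tag{3.7}$$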
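The rearrangement into powers of $\varepsilon = \mu/[(1-\mu)(|\mathcal{E}|-1)]$ can be verified numerically. The sketch below assumes NumPy and uses hypothetical helper names. It builds T in two ways: entrywise from (3.5), and as $\rho(1-\mu)^{|\Gamma|}\sum_k \varepsilon^k A_k$ with $A_k$ the 0/1 indicator of $d(g',g) = k$. It then checks that the two agree.

```python
import itertools
import numpy as np

def distance_matrix(grns):
    """d(g', g): number of genes whose allele differs."""
    return np.array([[sum(a != b for a, b in zip(g, h)) for h in grns] for g in grns])

def T_entrywise(D, rho, mu, n_genes, n_pairs):
    # Entry [g, g'] = rho * mu_{g'g} from eq. (3.5); D is symmetric.
    return rho * (1 - mu) ** (n_genes - D) * (mu / (n_pairs - 1)) ** D

def T_series(D, rho, mu, n_genes, n_pairs):
    # rho (1 - mu)^|Gamma| * sum_k eps^k A_k, with A_k the indicator of d(g', g) == k.
    eps = mu / ((1 - mu) * (n_pairs - 1))
    return rho * (1 - mu) ** n_genes * sum(eps ** k * (D == k) for k in range(n_genes + 1))

# Usage on a hypothetical toy allele set (four pairs, two genes).
pairs = list(itertools.product((1, 2), (2, 3)))
grns = list(itertools.product(pairs, repeat=2))
D = distance_matrix(grns)
```

Because $A_k$ is a 0/1 indicator, it stays well defined at $\mu = 0$, where the entrywise ratio $T_k/\mu^k$ would not be.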
We further consider the scenario in which mutations are rare events, specifically the limit $\mu \to 0$. The eigenvectors of $I + cA$ are exactly the eigenvectors of $A$ for any scalar $c$, and the matrices $A_k$ are symmetric because $d(g',g) = d(g,g')$. The theory of eigenvalue perturbation [62,63] therefore ensures that the leading eigenvector of $T$ converges to the leading eigenvector⁴ of $A$,
$$\lim_{\mu\to 0}\mathbf{v}_1(T) = \mathbf{v}_1(A). \tag{3.8}$$
From equation (3.4), we have
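$$\lim_{\mu\to 0}\,\lim_{t\to\infty}\mathbf{p}(t) = \frac{\mathbf{v}_1(A)}{\mathbf{1}\,\mathbf{v}_1(A)}. \tag{3.9}$$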
In network science, entries of the leading eigenvector of the adjacency matrix of a connected, undirected graph are known as the eigenvector centrality [64,65] of the nodes. As a result, under the assumptions of binary viability, identical reproductivity and rare uniform mutation, the probability distribution of viable GRNs converges to a stationary distribution that is proportional to their eigenvector centrality in the neutral network.
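To illustrate (3.8) and (3.9), the sketch below uses a self-contained, hypothetical two-gene toy system that is not from the paper: protein 1 is stimulated and protein 3 is fatal. It computes the eigenvector centrality of the toy's neutral network and compares it with the normalized leading eigenvector of T for decreasing μ. NumPy's `eigh` is used because both matrices are symmetric, and the helper names are our own.

```python
import itertools
import numpy as np

# Hypothetical toy system (an assumption): alleles (activator, product) with
# activators in {1, 2} and products in {2, 3}; protein 1 stimulated, protein 3 fatal.
PAIRS = list(itertools.product((1, 2), (2, 3)))
N_GENES = 2

def reachable(grn, stimuli=(1,)):
    present, frontier = set(stimuli), list(stimuli)
    while frontier:
        p = frontier.pop()
        for a, b in grn:
            if a == p and b not in present:
                present.add(b)
                frontier.append(b)
    return present

viable = [g for g in itertools.product(PAIRS, repeat=N_GENES) if 3 not in reachable(g)]
D = np.array([[sum(x != y for x, y in zip(g, h)) for h in viable] for g in viable])
A = (D == 1).astype(float)              # adjacency matrix of the neutral network

def leading_eigenvector(M):
    vals, vecs = np.linalg.eigh(M)      # symmetric input; eigenvalues in ascending order
    v = np.abs(vecs[:, -1])             # the Perron vector can be taken non-negative
    return v / v.sum()

centrality = leading_eigenvector(A)     # eigenvector centrality, normalized as in (3.9)

def stationary(mu, rho=1.0):
    # T from eq. (3.5) restricted to viable GRNs (binary viability, uniform rho).
    T = rho * (1 - mu) ** (N_GENES - D) * (mu / (len(PAIRS) - 1)) ** D
    return leading_eigenvector(T)

for mu in (0.1, 0.01, 0.001):
    print(mu, np.abs(stationary(mu) - centrality).max())
```

In this toy network the hub GRN, in which both genes carry the redundant self-pair (2, 2), has the highest centrality. The gap between the two vectors should shrink roughly in proportion to μ, as the perturbation argument suggests.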
To validate the predicted probability distribution of GRNs under mutation–selection balance, we simulate the evolution of $10^7$ parallel populations. The simulations are parametrized with the constant sets of genes $\Gamma$ and proteins $\mathcal{P}$. We assume that at least one protein cannot be the product of any gene and is provided externally; we refer to these externally provided stimuli as input proteins. We also assume that at least one protein only has direct physiological effects and cannot activate any of the genes in the network; we call these output proteins. Under this minimal set-up, there are a total of $|\mathcal{E}|$ potential activator/product pairs (i.e. alleles of each gene), which leads to $|\mathcal{E}|^{|\Gamma|}$ plausible GRNs. We evolve the populations under an environmental condition in which one of the input proteins is externally provided and one of the output proteins has a fatal effect (i.e. it must be absent for an individual to have non-zero viability), resulting in $|\mathcal{G}_\nu|$ possible viable GRNs. We kept the simulated GRNs small to ensure adequate sampling of the empirical distribution, whose support grows super-exponentially in the number of genes and proteins. We leave the simulation of larger GRNs to future work.
We simulate the evolution of parallel populations using a Wright–Fisher model [66]. Specifically, we run simulations with 16 individuals in a population.⁵ Given the current generation, the next generation is formed by randomly choosing viable GRNs from the current generation with replacement, followed by potential mutations with a per-locus mutation probability μ = 0.1. We begin with 10 000 different initial populations, in which the GRN of every individual is chosen uniformly at random from all possible GRNs $\mathcal{G}$, and 1000 lineages are evolved from each initial population. The $10^7$ parallel populations are evolved for a constant number of generations, and we randomly sample one viable GRN from each of them to form the simulated distribution of GRNs. This fixed length of evolution is determined through a temporal lower bound, chosen so that the resulting GRN distribution is theoretically within a given error tolerance of the stationary distribution (detailed in electronic supplementary material, appendix E). In electronic supplementary material, figure S7, we show that our approximation holds for population sizes of 1600. In electronic supplementary material, figure S8, we show that it also holds for larger population sizes and much lower mutation rates (μ = 0.001).
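The following is a minimal, illustrative Wright–Fisher sketch in the spirit of the procedure above. It runs on a hypothetical two-gene toy system rather than the paper's parameters. Viable parents are drawn with replacement, each offspring gene mutates independently with probability μ to a uniformly chosen different allele, and one viable GRN is sampled from the final population. The function names, and the handling of extinct populations (returning `None`), are our own assumptions.

```python
import itertools
import random

PAIRS = list(itertools.product((1, 2), (2, 3)))   # hypothetical toy alleles
N_GENES = 2

def is_viable(grn, stimuli=(1,), fatal=3):
    """Binary viability: the fatal protein must not be reachable from the stimuli."""
    present, frontier = set(stimuli), list(stimuli)
    while frontier:
        p = frontier.pop()
        for a, b in grn:
            if a == p and b not in present:
                present.add(b)
                frontier.append(b)
    return fatal not in present

def mutate(grn, mu, rng):
    """Each gene mutates independently with probability mu to a different allele."""
    new = list(grn)
    for i, allele in enumerate(new):
        if rng.random() < mu:
            new[i] = rng.choice([p for p in PAIRS if p != allele])
    return tuple(new)

def wright_fisher(n_individuals, n_generations, mu, rng):
    # Initial population: GRNs drawn uniformly at random from all possible GRNs.
    pop = [tuple(rng.choice(PAIRS) for _ in range(N_GENES)) for _ in range(n_individuals)]
    for _ in range(n_generations):
        parents = [g for g in pop if is_viable(g)]      # viability selection
        if not parents:
            return None                                 # population went extinct
        pop = [mutate(rng.choice(parents), mu, rng) for _ in range(n_individuals)]
    survivors = [g for g in pop if is_viable(g)]
    return rng.choice(survivors) if survivors else None

# Usage: one sampled viable GRN per independent population.
rng = random.Random(0)
samples = [wright_fisher(16, 50, 0.1, rng) for _ in range(200)]
```

Tallying the returned GRNs over many independent runs gives an empirical distribution that can be compared with (3.9). This sketch does not attempt to reproduce the paper's far larger number of lineages or its generation count, which is derived from an error bound.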
Moreover, to account for the uncertainty of finite-sized sampling in the simulated distribution, we also draw the same number ($10^7$) of independent samples from the predicted distribution (3.9) to form an empirical distribution. Repeating this sampling procedure 1000 times, we obtain an ensemble of empirical distributions that captures the effect of finite-sized sampling on the predicted probabilities of observing each GRN. We further use the average variation distance between these empirical distributions and the predicted distribution as the error tolerance. From this tolerance we calculate the number of generations to be simulated, such that convergence of the model is theoretically guaranteed (electronic supplementary material, appendix E).
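The finite-sample baseline can be sketched as follows. We read "variation distance" as the total variation distance, $\tfrac12\sum_g |p_g - q_g|$. That reading, the NumPy dependency and the function names are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()

def sampling_tolerance(predicted, n_samples, n_repeats, rng):
    """Average distance between `predicted` and empirical distributions built
    from n_samples independent draws, over n_repeats repetitions."""
    predicted = np.asarray(predicted, dtype=float)
    distances = []
    for _ in range(n_repeats):
        counts = rng.multinomial(n_samples, predicted)
        distances.append(total_variation(counts / n_samples, predicted))
    return float(np.mean(distances))

# Usage with an illustrative three-GRN predicted distribution.
rng = np.random.default_rng(0)
predicted = [0.2, 0.3, 0.5]
tol_small = sampling_tolerance(predicted, 100, 20, rng)
tol_large = sampling_tolerance(predicted, 10**6, 20, rng)
```

The tolerance shrinks roughly like $1/\sqrt{n}$ in the number of samples. This is why a very large number of parallel populations permits a tight error tolerance.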
In figure 4a, we compare the exact, properly normalized leading eigenvector of the transition matrix T (3.7) with the predicted stationary distribution of viable GRNs under the rare-mutation approximation (3.9). Observe that even a moderate per-locus mutation probability μ leads to a GRN distribution well aligned with the predicted one, especially relative to the uncertainty arising from finite-sized sampling in the simulations. Moreover, figure 4b shows the simulated distribution of viable GRNs after long-term evolution. We see that, despite some overdispersion, the simulated distribution agrees with the derived stationary distribution of GRNs. Direct comparison between the simulated distribution and the exact solution, i.e. the leading eigenvector of the transition matrix (3.7), also shows no significant difference (see electronic supplementary material, figure S6). Combined, our simulations provide computational evidence for the following conclusion. When viability is binary and mutations are rare, the topology of the neutral network, specifically the eigenvector centrality of its mega-nodes, serves as an informative predictor of the distribution of GRNs under mutation–selection balance.
Figure 4.
Validation that the evolutionary dynamics of GRNs converge to the derived stationary distribution (3.9). We compare the predicted stationary distribution of viable GRNs under the rare-mutation approximation with (a) the exact leading eigenvector of the transition matrix (3.7) for various per-locus mutation probabilities μ (coloured by a red-purple gradient from large to small) and (b) the distribution of GRNs sampled from their simulated evolutionary dynamics with μ = 0.1 (blue). The predicted distribution in the limit μ → 0 is coloured grey, and the shaded area shows its 95% confidence band, which accounts for the uncertainty of finite-sized sampling in the simulations. In both panels, the viable GRNs are ordered by increasing predicted probability of being observed.
3.3. Prevalent gene regulatory networks under mutation–selection balance
We now apply our model to a case study of binary viability, identical reproductivity and rare mutation to investigate which GRN structures are more likely than others to be observed under different selective regimes. We again consider GRNs with a constant collection of six proteins and four genes. For ease of presentation, we label the genes by upper-case letters and the proteins by numerals, where proteins 1 and 2 are the input proteins and proteins 5 and 6 are the output proteins (see §3.2). Under the pathway framework of GRNs, an environment is jointly described by: (i) a set of stimuli, i.e. proteins that are externally provided, (ii) a set of essential proteins whose absence leads to inviability, and (iii) a set of fatal proteins whose presence causes inviability. We focus on the seven environments listed in table 1, which cover single versus multiple stimulated, essential and fatal proteins and their combinations.
Table 1.
Different environmental conditions specified by the sets of stimulated, essential, fatal proteins.
| | Env. 1 | Env. 2 | Env. 3 | Env. 4 | Env. 5 | Env. 6 | Env. 7 |
|---|---|---|---|---|---|---|---|
| stimuli | {1} | {1} | {1, 2} | {1} | {1} | {1, 2} | {1} |
| essentials | | | | {6} | {5, 6} | {6} | {5} |
| fatals | {6} | {5, 6} | {6} | | | | {6} |
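The way an environment in table 1 determines viability can be sketched under a simplified reading of the pathway framework, which is our assumption and may differ in detail from the model defined in §2: each gene is a hypothetical (activator protein, product protein) pair, a gene is expressed whenever its activator is present, a protein is present if it is a stimulus or is produced by an expressed gene, and there is no repression. The `ENVIRONMENTS` dictionary transcribes table 1; the other names are ours.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

# Hypothetical encoding of one gene: (activator protein, product protein).
Gene = Tuple[int, int]

# Table 1 as data: environment -> (stimuli, essentials, fatals).
ENVIRONMENTS: Dict[int, Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]] = {
    1: (frozenset({1}), frozenset(), frozenset({6})),
    2: (frozenset({1}), frozenset(), frozenset({5, 6})),
    3: (frozenset({1, 2}), frozenset(), frozenset({6})),
    4: (frozenset({1}), frozenset({6}), frozenset()),
    5: (frozenset({1}), frozenset({5, 6}), frozenset()),
    6: (frozenset({1, 2}), frozenset({6}), frozenset()),
    7: (frozenset({1}), frozenset({5}), frozenset({6})),
}


def present_proteins(grn: Sequence[Gene], stimuli: Iterable[int]) -> Set[int]:
    """Propagate presence from the stimuli until no further gene switches on."""
    present = set(stimuli)
    changed = True
    while changed:
        changed = False
        for activator, product in grn:
            if activator in present and product not in present:
                present.add(product)
                changed = True
    return present


def is_viable(grn: Sequence[Gene], environment: int) -> bool:
    """Viable iff every essential protein is present and no fatal protein is."""
    stimuli, essentials, fatals = ENVIRONMENTS[environment]
    present = present_proteins(grn, stimuli)
    return essentials <= present and not (fatals & present)
```

For example, the GRN `[(1, 3), (3, 4), (4, 5), (2, 6)]` is viable in environments 1 and 7 but not in environment 3, where the second stimulus switches on the gene producing fatal protein 6.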
For each focal environmental condition, we examine the prevalent regulatory structures among several groups of GRNs. Each group consists of GRNs satisfying different constraints on their structural properties, corresponding to artificially enforced scenarios of interest. The focal topological constraints originate from patterns observed in the most prevalent GRNs, so progressively adding constraints gives a rough ranking of regulatory patterns by prevalence. We define groups of GRNs using the following four constraints. First, we exclude GRNs with a 'spare' gene, i.e. a gene with negligible effect on the resulting phenotype. Spare genes include self-regulating genes (because of the binary-state assumption), genes activated by an input protein that is not externally stimulated, and genes producing an output protein that has no essential or fatal effect in the given environment. Second, we exclude GRNs with multiple genes that share the same, redundant expression behaviour. Third, we consider only GRNs in which all genes are functionally activated; this constraint mimics the scenario in which genes with active expression are more likely to be observed empirically than inactive ones. Fourth, to enforce selection on regulation rather than on individual genes, we exclude GRNs in which a gene is directly activated by a stimulus and produces an essential protein. Combinations of these four constraints define eight distinct groups within which we investigate the prevalent GRNs (see table 2).
Table 2.
Groups of GRNs by imposing constraints on their structural properties.
| | no spare genes | no redundant genes | all genes activated | no direct selection |
|---|---|---|---|---|
| group (i) | | | | |
| group (ii) | ✓ | | | |
| group (iii) | ✓ | ✓ | | |
| group (iv) | ✓ | | ✓ | |
| group (v) | ✓ | ✓ | ✓ | |
| group (vi) | | ✓ | | |
| group (vii) | | | | ✓ |
| group (viii) | | ✓ | | ✓ |
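The four constraints and the groups in table 2 can be written as predicates, as in the sketch below. It reuses a hypothetical (activator, product) encoding of genes and a simplified presence-propagation rule of our own, and it assumes that "redundant" means two genes with an identical (activator, product) pair; these are readings of the text, not the authors' code.

```python
from collections import Counter
from typing import AbstractSet, Callable, Dict, Iterable, Sequence, Set, Tuple

Gene = Tuple[int, int]  # hypothetical encoding: (activator protein, product protein)
INPUT_PROTEINS = frozenset({1, 2})
OUTPUT_PROTEINS = frozenset({5, 6})


def present_proteins(grn: Sequence[Gene], stimuli: Iterable[int]) -> Set[int]:
    """Fixed point of: a gene whose activator is present adds its product."""
    present = set(stimuli)
    changed = True
    while changed:
        changed = False
        for activator, product in grn:
            if activator in present and product not in present:
                present.add(product)
                changed = True
    return present


def no_spare(grn, stimuli, essentials, fatals) -> bool:
    relevant_outputs = set(essentials) | set(fatals)
    for activator, product in grn:
        if activator == product:  # self-regulation cannot change a binary state
            return False
        if activator in INPUT_PROTEINS and activator not in stimuli:
            return False  # activated by an input protein that is never supplied
        if product in OUTPUT_PROTEINS and product not in relevant_outputs:
            return False  # produces an output protein with no effect on viability
    return True


def no_redundant(grn, stimuli, essentials, fatals) -> bool:
    # Assumption: "redundant" means two genes with the identical (activator, product) pair.
    return all(count == 1 for count in Counter(grn).values())


def all_activated(grn, stimuli, essentials, fatals) -> bool:
    present = present_proteins(grn, stimuli)
    return all(activator in present for activator, _ in grn)


def no_direct_selection(grn, stimuli, essentials, fatals) -> bool:
    return not any(a in stimuli and p in essentials for a, p in grn)


# Table 2 as data: the constraints each group imposes.
GROUPS: Dict[str, Tuple[Callable[..., bool], ...]] = {
    "i": (),
    "ii": (no_spare,),
    "iii": (no_spare, no_redundant),
    "iv": (no_spare, all_activated),
    "v": (no_spare, no_redundant, all_activated),
    "vi": (no_redundant,),
    "vii": (no_direct_selection,),
    "viii": (no_redundant, no_direct_selection),
}


def in_group(grn: Sequence[Gene], group: str, stimuli: AbstractSet[int],
             essentials: AbstractSet[int] = frozenset(),
             fatals: AbstractSet[int] = frozenset()) -> bool:
    """True if the GRN satisfies every constraint imposed on the given group."""
    return all(check(grn, stimuli, essentials, fatals) for check in GROUPS[group])
```

In environment 4, for instance, four copies of the gene `(1, 6)` belong to group (i) but not to groups (vi) or (vii), whereas the two-pathway GRN `[(1, 3), (3, 6), (1, 4), (4, 6)]` passes the constraints of groups (v) and (viii).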
In figure 5, we plot the GRNs with the largest predicted probability of being observed for each group and environment, i.e. the GRNs with the greatest eigenvector centrality in the neutral network under each scenario. Such GRNs need not be unique: multiple similar GRNs can be obtained through transformations that preserve their roles in the neutral network, e.g. exchanging the expression behaviour of two genes A and B. These GRNs all share common structural features, so we show one randomly sampled GRN among those with the same, maximal predicted probability. Figure 5 shows each prevalent GRN in two representations: the pathway framework, which displays the expression activator and product of each gene (labelled arrows between circles), and the conventional representation, which shows the regulation between genes (unlabelled arrows between rectangles).
Figure 5.
GRNs with the largest eigenvector centrality in the neutral network for different environmental conditions (table 1) and different constrained groups of GRNs (table 2). For each prevalent GRN, its pathway framework representation is drawn with circles and labelled arrows, while its conventional representation is drawn with rectangles and unlabelled arrows. A node is coloured orange if the protein/gene is present/activated and blue otherwise.
Several intriguing observations arise from the prevalent regulatory structures in figure 5. For environmental conditions where protein products can only be fatal (environments 1, 2 and 3), GRNs with many spare genes tend to have the largest predicted probability under mutation–selection balance (group (i)). Once spare genes are excluded (group (ii)), the prevalent GRNs show substantial redundancy, with genes that share the same expression behaviour or are completely isolated. If we further exclude redundant genes or require all genes to be activated (groups (iii) and (iv), respectively), the prevalent GRNs appear to avoid, as far as possible, genes that are activated by the stimulated protein or that produce the fatal proteins. Interestingly, in the environment with multiple stimuli, when spare and redundant genes are both excluded (environment 3 and group (iii)), all genes become functionally active without this being imposed.
For the environmental conditions where protein products must be produced for survival (environments 4, 5 and 6), the most prevalent GRNs are the ones where several redundant genes are directly activated by an external stimulus and produce the essential protein (group (i)). When redundant genes are excluded (group (vi)), the prevalent GRNs have multiple parallel paths between the external stimuli and the essential proteins. When constrained such that genes activated by the external stimuli cannot produce the essential protein (group (vii)), the prevalent GRNs similarly show multiple pathways that share the same intermediate genes. Jointly imposing the two constraints mentioned above (group (viii)) results in prevalent GRN structures that maintain multiple, distinct regulatory pathways within the same GRN.
Finally, when both essential and fatal proteins exist (environment 7) and redundant genes are permitted, the most probable networks contain only redundant genes that are directly activated by the stimulus and synthesize the essential protein (groups (i), (ii) and (iv)). When GRNs cannot have redundant genes, the prevalent regulatory structures typically have only a single pair of connected genes: one activated by the external stimulus, whose protein product activates a second gene that produces the essential protein (groups (iii) and (vi)). If we require the activation of all genes or exclude direct selection on individual genes (groups (v), (vii) and (viii)), multiple pathways again emerge in the prevalent GRNs.
4. Discussion
In this work, we analyse the evolutionary dynamics of GRNs under a quasi-species model with selection, mutation and asexual reproduction. Integrating this model with the pathway framework of GRNs, we show analytically that the population dynamics always converge to a stationary distribution of GRNs for any viability function. This stationary distribution characterizes the ensemble of regulatory networks under mutation–selection balance and reveals the structural features of GRNs predicted to be favoured under long-term evolution. We then investigate a case study assuming binary viability, identical reproductivity and rare mutation, and find that the stationary distribution of GRNs can be derived from the topology of the genotype network alone. Specifically, the probability of observing a GRN under mutation–selection balance is proportional to its eigenvector centrality in the neutral network, which, in this simplified model, is the sub-graph of the genotype network consisting of all viable GRNs. Using this approximation, we identify the network structures most likely to evolve in response to various selective regimes and regulatory constraints.
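The rare-mutation step behind this result can be sketched as a short perturbation argument. The sketch rests on assumptions of ours that may differ in detail from equations (3.7) and (3.9): each of $L$ loci carries one of $K$ alleles, each locus mutates independently with probability $\mu$ to one of the other $K-1$ alleles chosen uniformly, $V$ is the set of viable GRNs, $r$ is the common reproductivity, $d(i,j)$ is the number of loci at which GRNs $i$ and $j$ differ, $A_G$ is the adjacency matrix of the genotype network and $A_N$ is its restriction to $V$ (the neutral network).

$$
M_{ij} = (1-\mu)^{L-d(i,j)}\left(\frac{\mu}{K-1}\right)^{d(i,j)}
\;\Longrightarrow\;
M = (1-\mu)^{L}\left[I + \varepsilon A_G\right] + O(\mu^{2}),
\qquad
\varepsilon = \frac{\mu}{(1-\mu)(K-1)},
$$

$$
T = r\,M\big|_{V\times V} = r(1-\mu)^{L}\left[I + \varepsilon A_N\right] + O(\mu^{2}).
$$

Because $I + \varepsilon A_N$ has the same eigenvectors as $A_N$, with eigenvalues $1 + \varepsilon\lambda_k(A_N)$, and the $O(\mu^{2})$ remainder moves the leading eigenvector by only $O(\mu^{2})/\big(\varepsilon\,[\lambda_1(A_N) - \lambda_2(A_N)]\big) = O(\mu)$,

$$
\lim_{\mu \to 0} v_1(T) = v_1(A_N), \qquad \pi_i \propto \left[v_1(A_N)\right]_i, \quad i \in V,
$$

provided $A_N$ is connected, so that $\lambda_1(A_N)$ is simple and $v_1(A_N)$ is positive by the Perron–Frobenius theorem.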
Our primary contribution to the broader field of quasi-species theory (and hence population genetics) is a mechanistic explanation for the key assumption of irreducible transition matrices [46–48]. As we mentioned in §1, Moran relates this assumption, which is required for global convergence in quasi-species models, to the scenario in which viable genotypes are mutually accessible through mutations [59]. Similarly, a recent review by Aguirre et al. [61] concluded that mutational accessibility was present in population genetic models of genotype network evolution where all networks have non-zero fitness. Our work takes an alternative route and assumes that the mechanisms of gene regulation encode the relationship between genotype and phenotype [18]. Despite the simplicity of our model, mutational accessibility between genotypes with non-zero fitness emerges naturally from the high dimensionality of GRNs. This result implies that quasi-species models still exhibit global convergence even with extreme fitness values, such as those found in holey adaptive landscapes [67].
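Irreducibility can be checked mechanically using a standard fact: a non-negative square matrix is irreducible exactly when its associated directed graph (an edge i → j wherever entry (i, j) is positive) is strongly connected. For a transition matrix restricted to viable GRNs, that is the mutual-accessibility condition discussed above. The sketch below is ours; the function names are hypothetical.

```python
from collections import deque
from typing import Sequence, Set


def reachable(matrix: Sequence[Sequence[float]], start: int, reverse: bool = False) -> Set[int]:
    """Nodes reachable from `start` along positive entries (edge i -> j when matrix[i][j] > 0).

    With reverse=True, edges are followed backwards, i.e. along the transpose.
    """
    n = len(matrix)
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in range(n):
            weight = matrix[j][i] if reverse else matrix[i][j]
            if weight > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen


def is_irreducible(matrix: Sequence[Sequence[float]]) -> bool:
    """Strong connectivity test (for n >= 2): every node reaches node 0 and vice versa."""
    n = len(matrix)
    return (n > 0
            and len(reachable(matrix, 0)) == n
            and len(reachable(matrix, 0, reverse=True)) == n)
```

A neutral network that splits into two disconnected components gives a reducible matrix, and the stationary distribution then depends on where the population starts.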
Our result that a GRN’s eigenvector centrality in the neutral network is proportional to its fitness sheds light on how we may interpret the prevalence of GRNs under rare mutation and strong selection. When first introduced by Bonacich [64], the eigenvector centrality was designed to capture an individual’s global ‘importance’ as measured by their social ties in a communication network. In particular, the eigenvector centrality is computed from the idea that a node’s importance is proportional to the sum of its neighbours’ importance scores. This interpretation translates naturally to the neutral network of GRNs: under mutation–selection balance, the probability of observing a GRN is proportional to the total probability of observing its viable mutational neighbours. As a consequence, the eigenvector centrality contains information on both robustness [68] and evolvability. Additionally, even in simple models, a GRN’s fitness is a function of both its own viability/reproductivity and the viability/reproductivity of its neighbours.
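The "sum of neighbours' importance" idea translates directly into power iteration. The sketch below, with names of our choosing, iterates on A + I rather than A: the shift leaves the leading eigenvector unchanged but prevents oscillation on bipartite graphs, which Hamming-distance neutral networks often are. Scores are normalized to sum to 1 so that they can be read directly as predicted probabilities.

```python
from typing import List, Sequence


def eigenvector_centrality(adjacency: Sequence[Sequence[float]],
                           tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Leading eigenvector of a connected graph's adjacency matrix, summing to 1."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # New score = own score + sum of neighbours' scores, i.e. one step of (A + I) x.
        updated = [scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
                   for i in range(n)]
        total = sum(updated)
        updated = [s / total for s in updated]
        if max(abs(a - b) for a, b in zip(scores, updated)) < tol:
            return updated
        scores = updated
    raise RuntimeError("power iteration did not converge")
```

On a path of three nodes, the middle node ends up with √2 times the score of each end node, because each end node's score is carried entirely by its single neighbour.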
The prevalent GRN structures observed in our analyses also provide an alternative explanation for evolutionary robustness. We find inductively that prevalent GRNs are those with a minimal number of inviable mutational neighbours (see §3.3). Because the genotype network is a regular graph under the pathway framework of GRNs (recall §2.3), i.e. every GRN has the same number of mutational neighbours, GRNs with equal viability and reproductivity can still differ in fitness if they have different numbers of inviable neighbours. In other words, the GRNs observed under various environmental conditions reflect a balance between survival, reproduction and the number of viable neighbours in the neutral network. We emphasize that this notion of robustness emerges naturally from our mechanistic, quasi-species model of GRN evolution rather than from an a priori assumption about important properties of GRNs.
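The regularity argument can be made concrete with a small sketch. Assuming, hypothetically, one allele index per locus with `n_alleles` alleles, every genotype has exactly L(K − 1) single-locus mutational neighbours, so a GRN's degree in the neutral network is that constant minus its number of inviable neighbours. The viability rule in the example below is a toy of ours.

```python
from typing import Callable, Iterator, Tuple

Genotype = Tuple[int, ...]  # hypothetical: one allele index per locus


def mutational_neighbours(genotype: Genotype, n_alleles: int) -> Iterator[Genotype]:
    """All genotypes differing at exactly one locus: L * (K - 1) for every genotype."""
    for locus, allele in enumerate(genotype):
        for other in range(n_alleles):
            if other != allele:
                yield genotype[:locus] + (other,) + genotype[locus + 1:]


def neighbour_counts(genotype: Genotype, n_alleles: int,
                     is_viable: Callable[[Genotype], bool]) -> Tuple[int, int]:
    """(viable, inviable) mutational neighbours; their sum is the same for every genotype."""
    viable = inviable = 0
    for neighbour in mutational_neighbours(genotype, n_alleles):
        if is_viable(neighbour):
            viable += 1
        else:
            inviable += 1
    return viable, inviable
```

With the toy rule "viable iff at least two of four binary loci carry allele 1", `(1, 1, 0, 0)` has two viable and two inviable neighbours while `(1, 1, 1, 1)` has four viable ones, so the latter has the higher degree in the neutral network.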
Previous work has also found that the topological features of genotype networks can be related to evolutionary processes of interest. For example, evolvability has been approximated by the size of the genotype network of a given phenotype [69], as well as the number of ‘neighbouring’ phenotypes inferred from the genotype network [70]. Robustness has been modelled as the node degree in the genotype network [70], and Dall’Olio et al. [71] adopted the average path length in the genotype network as a proxy for genetic heterogeneity. To the best of our knowledge, Van Nimwegen et al. [68] were the first to connect the asymptotic abundance of different genotypes under a population genetic model with their eigenvector centrality in the neutral network. Our study extends Van Nimwegen et al.’s [68] conclusion and distinguishes a quasi-species perspective from a model lacking genetic variation. For instance, if a population is fixed for a single genotype at all times and its evolution is modelled as a random walk on the neutral network, results from network science guarantee that the long-run probability of the population being fixed at a given genotype is proportional to that genotype's degree, rather than its eigenvector centrality, in the neutral network [65].
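The contrast between degree and eigenvector centrality can be seen on a tiny graph. The sketch below models the fixed-genotype population as a lazy simple random walk (stay put with probability 1/2, otherwise jump to a uniformly chosen neighbour). The laziness is our assumption: it avoids periodicity on bipartite networks without changing the stationary distribution, which is proportional to degree on any connected undirected graph. The names are ours.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def _iterate(step: Callable[[List[float]], List[float]], n: int,
             tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Apply `step` from the uniform vector until successive iterates agree."""
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = step(x)
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("iteration did not converge")


def random_walk_stationary(adjacency: Matrix) -> List[float]:
    """Stationary distribution of a lazy simple random walk (connected graph, no isolated nodes)."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]

    def step(pi: List[float]) -> List[float]:
        return [0.5 * pi[j] + 0.5 * sum(pi[i] * adjacency[i][j] / degree[i] for i in range(n))
                for j in range(n)]

    return _iterate(step, n)


def eigenvector_centrality(adjacency: Matrix) -> List[float]:
    """Leading eigenvector via power iteration on A + I, normalized to sum to 1."""
    n = len(adjacency)

    def step(x: List[float]) -> List[float]:
        y = [x[i] + sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        return [v / total for v in y]

    return _iterate(step, n)
```

On a three-node path, the random walk spends half its time on the middle node (degree 2 out of a total of 4), whereas the eigenvector centrality assigns it √2/(2 + √2) ≈ 0.414, so the two perspectives rank genotypes the same way here but weight them differently.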
The current scope of this work has limitations. First, we assume a constant, static environment in which the population evolves, whereas populations certainly experience shifting or alternating environmental conditions [72–74]. Consequently, our predicted regulatory structures under mutation–selection balance may not capture all the features present in empirical GRNs [75,76]. Second, our model focuses on the joint forces of selection and mutation. Although it can be extended to include more complicated mechanisms such as recombination [77,78], gene duplication [79,80] and demographics [81], we leave such extensions, along with their possible implications, to future work. Third, when environmental change is much faster than the evolutionary dynamics (see electronic supplementary material, appendix E), the transient composition of GRNs in a population becomes more relevant than their stationary distribution at mutation–selection balance [82–84]. Put simply, it remains an open question whether real-world populations should ever be conceptualized as at equilibrium (even a dynamic one), as opposed to existing in some far-from-equilibrium state [85]. Fourth, despite the agreement between the derived stationary distribution of GRNs in an infinitely large population and our long-term numerical simulations, we also find that a finite population size moderately influences the transient evolutionary dynamics. Developing a richer understanding of the role genetic drift plays in structuring the evolution of GRNs is an important extension of our work. Finally, although our model is constrained by the assumptions listed in §2.2, it can be extended to more diverse gene regulatory mechanisms. For instance, gene regulation can be modelled as different logic gates connecting multiple expression activators/suppressors/products, and the chemical states of proteins can be generalized to continuous dosage. Importantly, the connectivity of the neutral network and the convergence of the dynamics to a stationary distribution still hold under these more complicated models, provided that mutations between GRNs are not prohibited.
Across a broad range of models, our work demonstrates that the neutral network of GRNs is always connected (in agreement with existing computational work [86]) and that the relative frequency at equilibrium of various GRNs can be predicted from first principles. Therefore, our work shows how different evolutionary forces can favour different GRN structures. We believe future work that progressively integrates a broader set of evolutionary mechanisms will result in models capable of being compared with empirical data, and that analytical predictions building from our work will complement existing studies based purely on computational approaches [24,30,36]. Nevertheless, the current work establishes a null expectation for how GRNs are shaped by mutation and selection. Critically, this null expectation appears to recapitulate many of the topological features of molecular interaction networks currently associated with more complex properties like evolvability and robustness. Said differently, our work shows that complex fitness landscapes can emerge from simple evolutionary rules [19].
Acknowledgements
We are grateful to Professors Rafael F. Guerrero and C. Brandon Ogbunugafor for insightful discussions around the evolution of gene networks and higher-order interactions in biology.
Endnotes
1. A set $A$ is said to be partitioned by $\{A_i\}_{i \in I}$ if $A = \bigcup_{i \in I} A_i$ and $A_i \cap A_j = \emptyset$ for any two distinct $i, j \in I$.
2. To be more precise, this argument is valid only when the joint probability of any combination of multiple single-locus mutations within one generation is non-zero. Otherwise, we can modify the master equation (3.1) by extending the time scale from 1 to $\Delta t$, where $\Delta t$ is the diameter of the sub-graph of the genotype network restricted to GRNs with non-zero viability and reproductivity. The modified transition matrix is then proportional to $T^{\Delta t}$, which is a positive matrix because mutation events in different generations are independent. Replacing $T$ by $T^{\Delta t}$ yields an analogous derivation proving the convergence of the master equation.
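The positivity claim in the endnote above can be checked numerically on a toy example. We assume a hypothetical transition matrix in which only single-locus mutations occur: a positive diagonal (no mutation) and positive entries exactly between mutational neighbours. For a connected graph with a positive diagonal, entry (i, j) of T^k is positive precisely when i and j are at most k steps apart, so T^k first becomes positive at k equal to the diameter. The function names are ours.

```python
from collections import deque
from typing import List, Sequence

Matrix = List[List[float]]


def diameter(adjacency: Sequence[Sequence[int]]) -> int:
    """Longest shortest-path length, via breadth-first search from every node."""
    n = len(adjacency)
    longest = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adjacency[i][j] and j not in dist:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        if len(dist) < n:
            raise ValueError("graph is disconnected, so its diameter is infinite")
        longest = max(longest, max(dist.values()))
    return longest


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(m: Matrix, k: int) -> Matrix:
    result = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    for _ in range(k):
        result = matmul(result, m)
    return result


def is_positive(m: Matrix) -> bool:
    return all(x > 0 for row in m for x in row)


def single_mutation_transitions(adjacency: Sequence[Sequence[int]], mu: float) -> Matrix:
    """Hypothetical T: weight 1 - mu to stay, mu to each neighbour, no multi-locus jumps."""
    n = len(adjacency)
    return [[(1 - mu) if i == j else (mu if adjacency[i][j] else 0.0) for j in range(n)]
            for i in range(n)]
```

For a four-node path, the diameter is 3: `T` squared still has a zero entry between the two end nodes, while `T` cubed is strictly positive.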
3. In graph theory, the adjacency matrix $A$ represents the structure of a graph, where $A_{ij} = 1$ if nodes $i$ and $j$ are connected and $A_{ij} = 0$ otherwise.
4. Here we use the notation $v_1(A)$ and $\lambda_1(A)$ for the leading eigenvector and the leading eigenvalue of $A$, respectively.
5. Because our analysis assumes no genetic drift, i.e. an infinite population size, assessing the model fit with a small number of individuals is conservative. A small population also lets us explore more thoroughly the effect of initial conditions and sampling on the resulting distribution of GRNs. This population size of 16 is also comparable to the number of possible expression pairs per locus, which promotes diversity in the initial populations. In electronic supplementary material, figure S7, we show that our approximation holds for a population size of 1600, and in electronic supplementary material, figure S8, that it holds for larger population sizes with much lower mutation rates.
Data accessibility
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures and tables, and the code of the analyses is available at https://github.com/chiahungyang/GenoNet. The data are provided in the electronic supplementary material [87].
Authors' contributions
C.-H.Y.: conceptualization, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft; S.V.S.: conceptualization, funding acquisition, project administration, supervision, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
This research was supported by start-up funds from Northeastern University to S.V.S. The funders played no role in the study design nor interpretation of the findings.
References
1. Hoekstra HE, Coyne JA. 2007. The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995-1016. (doi:10.1111/j.1558-5646.2007.00105.x)
2. Carroll SB. 2008. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25-36. (doi:10.1016/j.cell.2008.06.030)
3. Yeaman S, Whitlock MC. 2011. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897-1911. (doi:10.1111/evo.2011.65.issue-7)
4. Cohen AA, Martin LB, Wingfield JC, McWilliams SR, Dunne JA. 2012. Physiological regulatory networks: ecological roles and evolutionary constraints. Trends Ecol. Evol. 27, 428-435. (doi:10.1016/j.tree.2012.04.008)
5. Nelson TC, Jones MR, Velotta JP, Dhawanjewar AS, Schweizer RM. 2019. Unveiling connections between genotype, phenotype, and fitness in natural populations. Mol. Ecol. 28, 1866-1876. (doi:10.1111/mec.2019.28.issue-8)
6. Daniels BC, Kim H, Moore D, Zhou S, Smith HB, Karas B, Kauffman SA, Walker SI. 2018. Criticality distinguishes the ensemble of biological regulatory networks. Phys. Rev. Lett. 121, 138102. (doi:10.1103/PhysRevLett.121.138102)
7. Barkai N, Leibler S. 1997. Robustness in simple biochemical networks. Nature 387, 913-917. (doi:10.1038/43199)
8. Masel J, Trotter MV. 2010. Robustness and evolvability. Trends Genet. 26, 406-414. (doi:10.1016/j.tig.2010.06.002)
9. Cheong R, Rhee A, Wang CJ, Nemenman I, Levchenko A. 2011. Information transduction capacity of noisy biochemical signaling networks. Science 334, 354-358. (doi:10.1126/science.1204553)
10. Lenski RE, Barrick JE, Ofria C. 2006. Balancing robustness and evolvability. PLoS Biol. 4, e428. (doi:10.1371/journal.pbio.0040428)
11. Wagner A, Fell DA. 2001. The small world inside large metabolic networks. Proc. R. Soc. Lond. B 268, 1803-1810. (doi:10.1098/rspb.2001.1711)
12. Bak P, Tang C, Wiesenfeld K. 1987. Self-organized criticality: an explanation of 1/f noise. Phys. Rev. Lett. 59, 381-384. (doi:10.1103/PhysRevLett.59.381)
13. Dorogovtsev SN, Mendes JFF. 2002. Evolution of networks. Adv. Phys. 51, 1079-1187. (doi:10.1080/00018730110112519)
14. Wagner A. 1994. Evolution of gene networks by gene duplications: a mathematical model and its implications on genome organization. Proc. Natl Acad. Sci. USA 91, 4387-4391. (doi:10.1073/pnas.91.10.4387)
15. Siegal ML, Bergman A. 2002. Waddington’s canalization revisited: developmental stability and evolution. Proc. Natl Acad. Sci. USA 99, 10 528-10 532. (doi:10.1073/pnas.102303999)
16. Carneiro MO, Taubes CH, Hartl DL. 2011. Model transcriptional networks with continuously varying expression levels. BMC Evol. Biol. 11, 363. (doi:10.1186/1471-2148-11-363)
17. Schiffman JS, Ralph PL. 2018. System drift and speciation. BioRxiv, 231209. (doi:10.1101/231209)
18. Yang C-H, Scarpino SV. 2020. Reproductive barriers as a byproduct of gene network evolution. BioRxiv. (doi:10.1101/2020.06.12.147322)
19. Yang C-H, Scarpino SV. 2022. A family of fitness landscapes modeled through gene regulatory networks. Entropy 24, 622. (doi:10.3390/e24050622)
20. Wagner A. 1996. Does evolutionary plasticity evolve? Evolution 50, 1008-1023. (doi:10.1111/j.1558-5646.1996.tb02342.x)
21. Palmer ME, Feldman MW. 2009. Dynamics of hybrid incompatibility in gene networks in a constant environment. Evolution 63, 418-431. (doi:10.1111/evo.2009.63.issue-2)
22. Puzovic N. 2020. Effect of gene network topology on the evolution of gene-specific expression noise. PhD thesis, Christian-Albrechts-Universität Kiel, Germany.
23. Burda Z, Krzywicki A, Martin OC, Zagorski M. 2010. Distribution of essential interactions in model gene regulatory networks under mutation-selection balance. Phys. Rev. E 82, 011908. (doi:10.1103/PhysRevE.82.011908)
24. Espinosa-Soto C, Wagner A. 2010. Specialization can drive the evolution of modularity. PLoS Comput. Biol. 6, e1000719. (doi:10.1371/journal.pcbi.1000719)
25. Madan Babu M, Teichmann SA. 2003. Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res. 31, 1234-1244. (doi:10.1093/nar/gkg210)
26. Teichmann SA, Madan Babu M. 2004. Gene regulatory network growth by duplication. Nat. Genet. 36, 492-496. (doi:10.1038/ng1340)
27. Thompson DA, et al. 2013. Evolutionary principles of modular gene regulation in yeasts. Elife 2, e00603. (doi:10.7554/eLife.00603)
28. Martí-Solans J, Belyaeva OV, Torres-Aguila NP, Kedishvili NY, Albalat R, Cañestro C. 2016. Coelimination and survival in gene network evolution: dismantling the RA-signaling in a chordate. Mol. Biol. Evol. 33, 2401-2416. (doi:10.1093/molbev/msw118)
29. Schlosser G, Wagner GP. 2004. Modularity in development and evolution. Chicago, IL: University of Chicago Press.
30. Force A, Cresko WA, Pickett FB, Proulx SR, Amemiya C, Lynch M. 2005. The origin of subfunctions and modular gene regulation. Genetics 170, 433-446. (doi:10.1534/genetics.104.027607)
31. Aguirre J, Buldú JM, Manrubia SC. 2009. Evolutionary dynamics on networks of selectively neutral genotypes: effects of topology and sequence stability. Phys. Rev. E 80, 066112. (doi:10.1103/PhysRevE.80.066112)
32. Des Marais DL, Guerrero RF, Lasky JR, Scarpino SV. 2017. Topological features of a gene co-expression network predict patterns of natural diversity in environmental response. Proc. R. Soc. B 284, 20170914. (doi:10.1098/rspb.2017.0914)
33. Lotterhos KE, Yeaman S, Degner J, Aitken S, Hodgins KA. 2018. Modularity of genes involved in local adaptation to climate despite physical linkage. Genome Biol. 19, 157. (doi:10.1186/s13059-018-1545-7)
34. Blanco-Pastor JL, et al. 2020. Canonical correlations reveal adaptive loci and phenotypic responses to climate in perennial ryegrass. Mol. Ecol. Res. 21, 849-870. (doi:10.1111/men.v21.3)
35. Lopez-Arboleda WA, Reinert S, Nordborg M, Korte A. 2021. Global genetic heterogeneity in adaptive traits. BioRxiv.
36. Louzoun Y, Muchnik L, Solomon S. 2006. Copying nodes versus editing links: the source of the difference between genetic regulatory networks and the www. Bioinformatics 22, 581-588. (doi:10.1093/bioinformatics/btk030)
37. Kwon Y-K, Cho K-H. 2007. Analysis of feedback loops and robustness in network evolution based on Boolean models. BMC Bioinf. 8, 430. (doi:10.1186/1471-2105-8-430)
38. Maslov S, Krishna S, Pang TY, Sneppen K. 2009. Toolbox model of evolution of prokaryotic metabolic networks and their regulation. Proc. Natl Acad. Sci. USA 106, 9743-9748. (doi:10.1073/pnas.0903206106)
39. Cheng F, Liu C, Lin C-C, Zhao J, Jia P, Li W-H, Zhao Z. 2015. A gene gravity model for the evolution of cancer genomes: a study of 3000 cancer genomes across 9 cancer types. PLoS Comput. Biol. 11, e1004497. (doi:10.1371/journal.pcbi.1004497)
40. Odorico A, Rünneburger E, Le Rouzic A. 2018. Modelling the influence of parental effects on gene-network evolution. J. Evol. Biol. 31, 687-700. (doi:10.1111/jeb.2018.31.issue-5)
41. Peng W, Liu P, Xue Y, Acar M. 2015. Evolution of gene network activity by tuning the strength of negative-feedback regulation. Nat. Commun. 6, 1-9.
42. Masalia RR, Bewick AJ, Burke JM. 2017. Connectivity in gene coexpression networks negatively correlates with rates of molecular evolution in flowering plants. PLoS ONE 12, e0182289. (doi:10.1371/journal.pone.0182289)
43. Siebert BA, Hall CL, Gleeson JP, Asllani M. 2020. Role of modularity in self-organization dynamics in biological networks. Phys. Rev. E 102, 052306. (doi:10.1103/PhysRevE.102.052306)
44. Eigen M, Schuster P. 1977. A principle of natural self-organization. Naturwissenschaften 64, 541-565. (doi:10.1007/BF00450633)
45. Wilke CO. 2005. Quasispecies theory in the context of population genetics. BMC Evol. Biol. 5, 1-8. (doi:10.1186/1471-2148-5-44)
46. Thompson CJ, McBride JL. 1974. On Eigen’s theory of the self-organization of matter and the evolution of biological macromolecules. Math. Biosci. 21, 127-142. (doi:10.1016/0025-5564(74)90110-2)
47. Jones BL, Enns RH, Rangnekar SS. 1976. On the theory of selection of coupled macromolecular systems. Bull. Math. Biol. 38, 15-28. (doi:10.1016/S0092-8240(76)80040-7)
48. Demetrius L. 1983. Selection and evolution in macromolecular systems. J. Theor. Biol. 103, 619-643. (doi:10.1016/0022-5193(83)90286-2)
49. Bürger R. 1998. Mathematical properties of mutation-selection models. Genetica 102, 279-298. (doi:10.1023/A:1017043111100)
50. Johnson T. 1999. The approach to mutation–selection balance in an infinite asexual population, and the evolution of mutation rates. Proc. R. Soc. Lond. B 266, 2389-2397. (doi:10.1098/rspb.1999.0936)
51. Steinsaltz D, Evans SN, Wachter KW. 2005. A generalized model of mutation–selection balance with applications to aging. Adv. Appl. Math. 35, 16-33. (doi:10.1016/j.aam.2004.09.003)
52. Goyal S, Balick DJ, Jerison ER, Neher RA, Shraiman BI, Desai MM. 2012. Dynamic mutation–selection balance as an evolutionary attractor. Genetics 191, 1309-1319. (doi:10.1534/genetics.112.141291)
53. Smerlak M, Youssef A. 2017. Limiting fitness distributions in evolutionary dynamics. J. Theor. Biol. 416, 68-80. (doi:10.1016/j.jtbi.2017.01.005)
- 54.Wagner A. 2011. The origins of evolutionary innovations: a theory of transformative change in living systems. Oxford, UK: Oxford University Press. [Google Scholar]
- 55.Smith JM. 1970. Natural selection and the concept of a protein space. Nature 225, 563-564. ( 10.1038/225563a0) [DOI] [PubMed] [Google Scholar]
- 56. Schuster P, Fontana W, Stadler PF, Hofacker IL. 1994. From sequences to shapes and back: a case study in RNA secondary structures. Proc. R. Soc. Lond. B 255, 279-284. (doi:10.1098/rspb.1994.0040)
- 57. Cowperthwaite MC, Meyers LA. 2007. How mutational networks shape evolution: lessons from RNA models. Annu. Rev. Ecol. Evol. Syst. 38, 203-230. (doi:10.1146/ecolsys.2007.38.issue-1)
- 58. Ogbunugafor CB, Scarpino SV. 2022. Higher-order interactions in biology: the curious case of epistasis. In Higher-order systems (eds F Battiston, G Petri), pp. 417–433. Berlin, Germany: Springer.
- 59. Moran PAP. 1976. Global stability of genetic systems governed by mutation and selection. In Mathematical Proceedings of the Cambridge Philosophical Society, vol. 80, pp. 331–336. Cambridge, UK: Cambridge University Press.
- 60. Ciliberti S, Martin OC, Wagner A. 2007. Innovation and robustness in complex regulatory gene networks. Proc. Natl Acad. Sci. USA 104, 13 591-13 596. (doi:10.1073/pnas.0705396104)
- 61. Aguirre J, Catalán P, Cuesta JA, Manrubia S. 2018. On the networked architecture of genotype spaces and its critical effects on molecular evolution. Open Biol. 8, 180069. (doi:10.1098/rsob.180069)
- 62. Andrew AL, Eric Chu K-W, Lancaster P. 1993. Derivatives of eigenvalues and eigenvectors of matrix functions. SIAM J. Matrix Anal. Appl. 14, 903-926. (doi:10.1137/0614061)
- 63. Greenbaum A, Li R-C, Overton ML. 2020. First-order perturbation theory for eigenvalues and eigenvectors. SIAM Rev. 62, 463-482. (doi:10.1137/19M124784X)
- 64. Bonacich P. 1987. Power and centrality: a family of measures. Am. J. Sociol. 92, 1170-1182. (doi:10.1086/228631)
- 65. Newman M. 2018. Networks. Oxford, UK: Oxford University Press.
- 66. Hudson RR. 2002. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18, 337-338. (doi:10.1093/bioinformatics/18.2.337)
- 67. Gavrilets S. 1997. Evolution and speciation on holey adaptive landscapes. Trends Ecol. Evol. 12, 307-312. (doi:10.1016/S0169-5347(97)01098-7)
- 68. Van Nimwegen E, Crutchfield JP, Huynen M. 1999. Neutral evolution of mutational robustness. Proc. Natl Acad. Sci. USA 96, 9716-9720. (doi:10.1073/pnas.96.17.9716)
- 69. Cowperthwaite MC, Economo EP, Harcombe WR, Miller EL, Meyers LA. 2008. The ascent of the abundant: how mutational networks constrain evolution. PLoS Comput. Biol. 4, e1000110. (doi:10.1371/journal.pcbi.1000110)
- 70. Wagner A. 2007. Robustness and evolvability: a paradox resolved. Proc. R. Soc. B 275, 91-100. (doi:10.1098/rspb.2007.1137)
- 71. Dall'Olio GM, Bertranpetit J, Wagner A, Laayouni H. 2014. Human genome variation and the concept of genotype networks. PLoS ONE 9, e99424. (doi:10.1371/journal.pone.0099424)
- 72. Hietpas RT, Bank C, Jensen JD, Bolon DNA. 2013. Shifting fitness landscapes in response to altered environments. Evolution 67, 3512-3522. (doi:10.1111/evo.12207)
- 73. Szücs M, Vahsen ML, Melbourne BA, Hoover C, Weiss-Lehman C, Hufbauer RA. 2017. Rapid adaptive evolution in novel environments acts as an architect of population range expansion. Proc. Natl Acad. Sci. USA 114, 13 501-13 506. (doi:10.1073/pnas.1712934114)
- 74. French RK, Holmes EC. 2020. An ecosystems perspective on virus evolution and emergence. Trends Microbiol. 28, 165-175. (doi:10.1016/j.tim.2019.10.010)
- 75. Shen-Orr SS, Milo R, Mangan S, Alon U. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64-68. (doi:10.1038/ng881)
- 76. Guelzim N, Bottani S, Bourgine P, Képès F. 2002. Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 31, 60-63. (doi:10.1038/ng873)
- 77. Feldman MW, Otto SP, Christiansen FB. 1996. Population genetic perspectives on the evolution of recombination. Annu. Rev. Genet. 30, 261-295. (doi:10.1146/genet.1996.30.issue-1)
- 78. Ortiz-Barrientos D, Engelstädter J, Rieseberg LH. 2016. Recombination rate evolution and the origin of species. Trends Ecol. Evol. 31, 226-236. (doi:10.1016/j.tree.2015.12.016)
- 79. Zhang J. 2003. Evolution by gene duplication: an update. Trends Ecol. Evol. 18, 292-298. (doi:10.1016/S0169-5347(03)00033-8)
- 80. Konrad A, Teufel AI, Grahnen JA, Liberles DA. 2011. Toward a general model for the evolutionary dynamics of gene duplicates. Genome Biol. Evol. 3, 1197-1209. (doi:10.1093/gbe/evr093)
- 81. Li J, Li H, Jakobsson M, Li S, Sjödin P, Lascoux M. 2012. Joint analysis of demography and selection in population genetics: where do we stand and where could we go? Mol. Ecol. 21, 28-44. (doi:10.1111/mec.2011.21.issue-1)
- 82. Maruyama T, Fuerst PA. 1985. Population bottlenecks and nonequilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 111, 675-689. (doi:10.1093/genetics/111.3.675)
- 83. Gamelon M, Gimenez O, Baubet E, Coulson T, Tuljapurkar S, Gaillard JM. 2014. Influence of life-history tactics on transient dynamics: a comparative analysis across mammalian populations. Am. Nat. 184, 673-683. (doi:10.1086/677929)
- 84. Burie J-B, Djidjou-Demasse R, Ducrot A. 2020. Asymptotic and transient behaviour for a nonlocal problem arising in population genetics. Eur. J. Appl. Math. 31, 84-110. (doi:10.1017/S0956792518000487)
- 85. Goldenfeld N, Woese C. 2011. Life is physics: evolution as a collective phenomenon far from equilibrium. Annu. Rev. Condens. Matter Phys. 2, 375-399. (doi:10.1146/conmatphys.2011.2.issue-1)
- 86. Ciliberti S, Martin OC, Wagner A. 2007. Robustness can evolve gradually in complex regulatory gene networks with varying topology. PLoS Comput. Biol. 3, e15. (doi:10.1371/journal.pcbi.0030015)
- 87. Yang C-H, Scarpino SV. 2023. The ensemble of gene regulatory networks at mutation–selection balance. Figshare. (doi:10.6084/m9.figshare.c.6350487)
Data Availability Statement
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures and tables. The code for the analyses is available at https://github.com/chiahungyang/GenoNet, and the data are provided in the electronic supplementary material [87].





