Abstract
We model the evolution of eukaryotic protein-protein interaction (PPI) networks. In our model, PPI networks evolve by two known biological mechanisms: (1) Gene duplication, which is followed by rapid diversification of duplicate interactions. (2) Neofunctionalization, in which a mutation leads to a new interaction with some other protein. Since many interactions are due to simple surface compatibility, we hypothesize there is an increased likelihood of interacting with other proteins in the target protein’s neighborhood. We find good agreement of the model on 10 different network properties compared to high-confidence experimental PPI networks in yeast, fruit flies, and humans. Key findings are: (1) PPI networks evolve modular structures, with no need to invoke particular selection pressures. (2) Proteins in cells have on average about 6 degrees of separation, similar to some social networks, such as human-communication and actor networks. (3) Unlike social networks, which have a shrinking diameter (degree of maximum separation) over time, PPI networks are predicted to grow in diameter. (4) The model indicates that evolutionarily old proteins should have higher connectivities and be more centrally embedded in their networks. This suggests a way in which present-day proteomics data could provide insights into biological evolution.
Introduction
We are interested in the evolution of protein-protein interaction (PPI) networks. PPI network evolution accompanies cellular evolution, and may be important for processes such as the emergence of antibiotic resistance in bacteria [1], [2], the growth of cancer cells [3], and biological speciation [4]–[6]. In recent years, increasingly large volumes of experimental PPI data have become available [7]–[10], and a variety of computational techniques have been created to process and analyze these data [11]–[18]. Although these techniques are diverse, and the experimental data are noisy [19], a general picture emerging from these studies is that the evolutionary pressures shaping protein networks are deeply interlinked with the networks’ topology [20]. Our aim here is to construct a minimal model of PPI network evolution which accurately captures a broad panel of topological properties.
In this work, we describe an evolutionary model for eukaryotic PPI networks. In our model, protein networks evolve by two known biological mechanisms: (1) a gene can duplicate, putting one copy under new selective pressures that allow it to establish new relationships to other proteins in the cell, and (2) a protein undergoes a mutation that causes it to develop new binding or new functional relationships with existing proteins. In addition, we allow for the possibility that once a mutated protein develops a new relationship with another protein (called the target), the mutant protein can also more readily establish relationships with other proteins in the target’s neighborhood. One goal is to see if random changes based on these mechanisms could generate networks with the properties of present-day PPI networks. Another goal is then to draw inferences about the evolutionary histories of PPI networks.
Results
We represent a PPI network as a graph. Each node on the graph represents one protein. A link (edge) between two nodes represents a physical interaction between the two corresponding proteins. The links are undirected and unweighted. To model the evolution of the PPI graph, we simulate a series of steps in time. At time $t$, one protein in the network is subjected to either a gene duplication or a neofunctionalizing mutation, leading to an altered network by time $t + \Delta t$. We refer to this model as the DUNE (DUplication & NEofunctionalization) model.
Gene Duplication
One mechanism by which PPI networks change is gene duplication (DU) [21]–[23]. In DU, an existing gene is copied, creating a new, identical gene. In our model, duplications occur at a rate $d$, which is assumed to be constant for each organism. All genes are accessible to duplication, with equal likelihood. For simplicity, we assume that one gene codes for one protein. One of the copies continues to perform the same biological function and remains under the same selective pressures as before. The other copy is superfluous, since it is no longer essential for the functioning of the cell [24].
The superfluous copy of a protein/gene is under less selective pressure; it is free to lose its previous function and to develop some other function within the cell. Due to this reduced selective pressure, further mutations to the superfluous protein are more readily accepted, including those that would otherwise have been harmful to the organism [25], [26]. Hence, a superfluous protein diverges rapidly after its DU event [27], [28]. This well-known process is referred to as the post-duplication divergence. Following [29], we assume that each link of such a superfluous protein/gene to its former neighbors is deleted with probability $\phi$. The post-duplication divergence tends to be fast; for simplicity, we assume the divergence occurs within the same time step as the DU. The divergence is asymmetric [30], [31]: one of the proteins diversifies rapidly, while the other protein retains its prior activity. We delete links from the original or the duplicate with equal probability because the proteins are identical. As discussed in the supporting information (SI), this is closely related to the idea of subfunctionalization, where divergence freely occurs until redundancy is eliminated (see SI text in File S1). In our model, $\phi$ is an adjustable parameter.
In many cases, the post-duplication divergence results in a protein which has lost all its links. These ‘orphan’ proteins correspond to silenced or deleted genes in our model. As discussed below, our model predicts that the gene loss rate should be slightly higher than the duplication rate in yeast, and slightly lower in flies and humans.
We simulate a gene duplication event at time $t$ as follows:
1a. Duplicate a randomly-chosen gene, with probability set by the duplication rate $d$.
2a. Choose either the original (50%) or the duplicate (50%), and delete each of its links with probability $\phi$.
3a. Move on to the next time interval, time $t + \Delta t$.
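Steps 1a and 2a can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the adjacency-dictionary representation, the function name `duplication_step`, integer protein ids, and the absence of self-links are all assumptions made for the example, and the decision of whether a duplication happens in a given time step is left to the caller.

```python
import random

def duplication_step(adj, phi, rng):
    """Apply one duplication-plus-divergence event in place.

    adj maps each integer protein id to the set of its neighbors
    (undirected, so links are stored in both directions; no self-links
    are assumed). Returns the id of the new copy.
    """
    original = rng.choice(sorted(adj))      # every gene is equally likely
    duplicate = max(adj) + 1                # hypothetical id scheme
    # The copy starts with exactly the links of the original.
    adj[duplicate] = set(adj[original])
    for nb in adj[duplicate]:
        adj[nb].add(duplicate)
    # Asymmetric divergence: one of the two (50/50) loses each of its
    # links independently with probability phi.
    diverging = rng.choice([original, duplicate])
    for nb in list(adj[diverging]):
        if rng.random() < phi:
            adj[diverging].discard(nb)
            adj[nb].discard(diverging)
    return duplicate
```

With `phi = 0` the copy keeps every inherited link; with `phi = 1` one of the two proteins becomes an orphan, which is how gene loss appears in this representation.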
Neofunctionalization
Our model also takes into account that DNA can be changed by random mutations. Most such mutations do not lead to changes in the PPI network structure. However, some protein mutations lead to new interactions with some other protein (which we call the target protein). The formation of a novel interaction is called a neofunctionalization (NE) event. NE refers to the creation of new interactions, not to the disappearance of old ones. Functional deletions tend to be deleterious to organisms [32]. We do not account for loss-of-function mutations (link deletions) except during post-duplication divergence because damaged alleles will, in general, be eliminated by purifying selection. In our model, NE mutations occur at a rate $\mu$, which is assumed to be constant. All proteins are equally likely to be mutated.
How does the mutated protein choose a target protein to which it links? We define a probability that any protein in the network is selected for receiving the new link from the mutant protein. To account for the possibility of homodimerization, the mutated protein may also link to itself [24], [33]. Random choice dictates that this probability is the same for every protein, $1/N$ in a network of $N$ proteins (see SI).
Many PPI’s are driven by a simple geometric compatibility between the surfaces of the proteins [34]. The simplest example is the case of PPI’s between flat, hydrophobic surfaces [35], a type of interaction which is very common [36]. These PPI’s have a simple planar interface, and the binding sites on the individual proteins are geometrically quite similar to one another. One consequence of these similar-surface interactions is that if protein A can bind to proteins B and C, then there is a greater-than-random chance that B and C will interact with each other. We refer to this property as transitivity: if A binds B, and A binds C, then B binds C. The number of triangles in the PPI network should correlate roughly with transitivity. As discussed below, the number of triangles (as quantified by the global clustering coefficient) is about 45 times higher in real PPI networks than in an equally-dense random graph. This suggests that transitivity is quite common in PPI networks. Another source of transitivity is gene duplication. If A binds B, then A is copied to create a duplicate protein A’, then A’ will (initially) also bind B. If A interacts with A’, then a triangle exists. However, duplication is unlikely to be the primary source of transitivity; recent evidence shows that, due to the post-duplication divergence, duplicates tend to participate in fewer triangles than other proteins [37].
A concrete example of transitivity is provided by the evolution of the retinoic acid receptor (RAR), an example of neofunctionalization which has been characterized in detail [38]. Three paralogs of RAR exist in vertebrates (RARα, RARβ, and RARγ), as a result of an ancient duplication. The interaction profiles of these proteins are quite different. Previous work indicates that RARβ retained the role of the ancestral RAR [38], while RARα and RARγ evolved new functionality. RARα has several interactions not found in RARβ. RARα has novel interactions with a histone deacetylase (HDAC3) as well as seven of HDAC3’s nearest-neighbors (HDAC4, MBD1, Q15959, NRIP1, Q59FP9, NR2E3, GATA2). None of these interactions are found in RARβ. The probability that all of these novel interactions were created independently is very low. RARα has 65 known PPIs and HDAC3 has 83, and the present-day size of the human PPI network is a little over 3000 proteins. Therefore, the chance of RARα randomly evolving novel interactions with 7 of HDAC3’s neighbors is less than 1 in a billion. This strongly suggests that when a protein evolves an interaction to a target, it has a greater-than-random chance of also linking to other, neighboring proteins.
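The order of magnitude of this estimate can be checked with a short calculation. The sketch below assumes one simple reading of the argument, which is not spelled out in the text: each of the seven novel links independently lands on one of HDAC3's 83 neighbors, out of roughly 3000 proteins, purely by chance. The function name is hypothetical.

```python
def chance_all_links_hit(n_links, n_neighbors, n_proteins):
    """Probability that n_links independent, uniformly random links
    all land inside a fixed set of n_neighbors proteins."""
    return (n_neighbors / n_proteins) ** n_links

# Seven novel links, 83 HDAC3 neighbors, about 3000 proteins in the network.
p = chance_all_links_hit(7, 83, 3000)
print(f"{p:.1e}")
```

Under these assumptions the result is on the order of 10^-11, comfortably below one in a billion.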
How do similar-surface interactions affect the evolution of PPI networks? First, consider how an interaction triangle would form. Suppose proteins A and B bind due to physically similar binding sites. Protein X mutates and evolves the capacity to bind A. There is a reasonable chance that X has a surface which is similar to both A and B. If so, protein X is likely to also bind to B, forming a triangle. Denote the probability that two proteins interact due to a simple binding site similarity by . The probability that A binds B (and X binds A) in this manner is . Assuming these probabilities are identical and independent, the probability that X binds B is .
So far, we have discussed transitivity as it affects the PPIs in which protein A is directly involved (A’s first-neighbors). We now introduce a third protein to the above example, resulting in a chain of interactions: protein A binds B, B binds C, but C does not bind A. Protein X mutates and gains an interaction with A (with probability ). What is the probability that X will also bind C? The probability that B binds C due to surface similarity is . Thus, X will bind C (A’s second-neighbor) with probability . In general, the probability that X will bind one of A’s neighbors is . We refer to this process as assimilation, and the ‘assimilation parameter’ $a$ is a constant which varies between species. As discussed in SI, assimilation primarily affects multiple-partner proteins that bind to their partners at different times and/or locations; consequently, at most one link is created by assimilation at the first-neighbor level, at most one at the second-neighbor level, and so on. Assimilation is assumed to act on a much shorter time scale than duplication and neofunctionalization; in our model, it is instantaneous.
Our hypothesized assimilation mechanism makes several predictions that could be tested experimentally: (1) the probability of a protein assimilating into a new pathway should be $a$ (at the first-neighbor level), $a^2$ (at the second-neighbor level), and so on, where $a$ is a constant which varies between species; (2) weak, nonspecific binding and planar interfaces should be overrepresented in interaction triangles (and longer cycles) between non-duplicate proteins; (3) competitive inhibitors should be overrepresented in interaction triangles; and (4) domain shuffling should be associated with assimilation. (See SI for discussion of (3) and (4).)
We simulate a neofunctionalization event at time $t$ as follows:
1b. Mutate a randomly-chosen gene, with probability set by the NE rate $\mu$.
2b. Link to a randomly-chosen target protein.
3b. Add a second link to one of the target’s first-neighbor proteins, chosen randomly, with probability $a$.
4b. Add a link to one of the target’s second-neighbor proteins, with probability $a^2$, etc.
5b. Move on to the next time interval, time $t + \Delta t$.
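Steps 2b to 4b can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the adjacency-dictionary representation and the function names are assumptions, the neighbor shells are measured before the new mutant-target link is added (one possible interpretation), and the level-j assimilation probability is taken to be `a ** j`, following the first- and second-neighbor pattern above.

```python
import random

def distance_shells(adj, source):
    """Return [first-neighbors, second-neighbors, ...] of source via BFS."""
    seen, frontier, shells = {source}, {source}, []
    while True:
        frontier = {nb for u in frontier for nb in adj[u]} - seen
        if not frontier:
            return shells
        seen |= frontier
        shells.append(frontier)

def add_link(adj, u, v):
    """Store an undirected link in both directions (u == v is a homodimer)."""
    adj[u].add(v)
    adj[v].add(u)

def neofunctionalization_step(adj, a, rng):
    """Apply one NE event with assimilation in place; return (mutant, target)."""
    proteins = sorted(adj)
    mutant = rng.choice(proteins)    # all proteins equally likely to mutate
    target = rng.choice(proteins)    # may equal mutant (self-link allowed)
    shells = distance_shells(adj, target)   # measured before the new link
    add_link(adj, mutant, target)
    # At most one assimilation link per neighbor level, with probability a**level.
    for level, shell in enumerate(shells, start=1):
        candidates = sorted(shell - {mutant})
        if candidates and rng.random() < a ** level:
            add_link(adj, mutant, rng.choice(candidates))
    return mutant, target
```

Setting `a = 0` reduces the step to plain neofunctionalization (a single new link), which is a convenient way to isolate the effect of assimilation.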
Model Simulation and Parameters
A flowchart of how PPI networks evolve in our model is shown in Figure 1. To simulate the network’s evolution, one of the two mechanisms above is used at each time step, using [39]. We call each possible time series a trajectory. We begin each trajectory starting from two proteins sharing a link (the simplest configuration that is still technically a network). Each simulated trajectory ends when the model network has grown to have the same total number of links, $K$, as found in the experimental data, $K_{\mathrm{data}}$. Here, we perform sets of simulations for three different organisms: yeast (Saccharomyces cerevisiae), fruit flies (Drosophila melanogaster), and humans (Homo sapiens). Because evolution is stochastic, there are different possible trajectories, even for identical starting conditions and parameters. We simulated 50 trajectories for each organism. Our figures show the median values of each feature as a heavy line, and individual trajectories as light lines.
For a given data set, the number of links ($K_{\mathrm{data}}$) is known. We estimate the duplication rate $d$ from literature values. There have been several empirical estimates of duplication rates, mostly falling within an order of magnitude of each other [27], [40]–[42], [42]–[45]. We averaged together the literature values to estimate $d$ for each species (Table 1).
Table 1. Network sizes and model parameters.
Organism | $N_{\mathrm{data}}$ | $K_{\mathrm{data}}$ | $d$ | $\mu$ | $\phi$ | $a$
Yeast | 2170 | 3819 | 0.01 | $7.86\times10^{-4}$ | 0.555 | 0.690
Fly | 878 | 1140 | 0.0014 | $5.89\times10^{-4}$ | 0.866 | 0.546
Human | 3165 | 5547 | 0.0037 | $7.62\times10^{-4}$ | 0.652 | 0.727
$N_{\mathrm{data}}$ and $K_{\mathrm{data}}$ are the numbers of proteins and links, respectively. ($K_{\mathrm{data}}$ is used to stop the simulation. $N_{\mathrm{data}}$ is not used as a constraint.) $d$ and $\mu$ have units of per gene per million years (Myr). $\phi$ and $a$ are probabilities (unitless). $K_{\mathrm{data}}$ and $d$ are constraints from the data, while $\mu$, $\phi$, and $a$ are adjustable parameters. We used Monte Carlo simulations to optimize the parameter values, by minimizing the total symmetric mean absolute percentage error of the simulated versus the experimental data (see SI). Our values of $\mu$ are substantially lower than $d$ because $\mu$ is the rate of mutations leading to the creation of a new PPI (rather than being a simple mutation rate, which would be much higher).
The quantity $\mu$ is not as well known. Its value relative to $d$ has been the topic of considerable debate [24], [46]–[48]. Although, in principle, $\mu$ is a measurable quantity, it has proven difficult to obtain an accurate value, in part because the fixation rate of neofunctionalized alleles varies with population size [49], [50]. In the absence of a consensus order-of-magnitude estimate, we treat $\mu$ as a fitting parameter in our model. Consistent with the findings of [51] and [46], our best-fit values of $\mu$ are within an order of magnitude of each other for yeast, fruit fly, and human networks. Best-fit parameter values are given in Table 1.
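As a rough illustration of the fitting objective, the sketch below computes a symmetric mean absolute percentage error (SMAPE) between simulated and experimental feature values. Several SMAPE conventions exist; the one used here (absolute difference divided by the mean of the two magnitudes, averaged and expressed in percent) is an assumption, and the exact form used for the fit is described in the SI.

```python
def smape(simulated, observed):
    """Symmetric mean absolute percentage error, in percent.

    Uses |s - o| / ((|s| + |o|) / 2) per feature; a pair of zeros
    contributes no error. This convention is an assumption.
    """
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, o in zip(simulated, observed):
        scale = (abs(s) + abs(o)) / 2
        if scale:
            total += abs(s - o) / scale
    return 100 * total / len(simulated)

# Example inputs: yeast data vs. DUNE medians for Q, D, f_1, <C>, <k> (Table 2).
error = smape([0.74, 17, 0.8, 0.041, 4.0], [0.75, 15, 0.89, 0.09, 3.65])
```

A parameter search would then propose values of the adjustable parameters, run the simulation, and keep the set with the lowest total error.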
Present-day Network Topology
One test of an evolutionary model is its predictions for present-day PPI network topologies. Current large-scale PPI data sets have a high level of noise, resulting in significant problems with false positives and negatives [19], [52]. To mitigate this, we compare only to ‘high-confidence’ experimental PPI network data gathered in small-scale experiments (see Methods). We computed 10 topological features, quantifying various static and dynamic aspects of the networks’ global and local structures: degree, closeness, eigenvalues, betweenness, modularity, diameter, error tolerance, largest component size, clustering coefficients, and assortativity. Eight of these properties are described below (see SI for the others).
The degree of a node is the number of links connected to it. For protein networks, a protein’s degree is the number of proteins with which it has direct interactions. Some proteins interact with few other proteins, while other proteins (called ‘hubs’) interact with many other proteins. Previous work indicates that hubs have structural and functional characteristics that distinguish them from non-hubs, such as an increased proportion of disordered surface residues and repetitive domain structures [53]. The high degree of a protein hub could indicate that the protein has unusual biological significance [54]. The network’s overall link density is described by its mean degree, $\langle k\rangle$ (Table 2). The degree distribution is the probability that a protein will have $k$ links. PPI networks have a few hub proteins and many relatively isolated proteins. The heavy tail of the degree distribution shows that PPI networks have significantly more hubs than random networks have. Simulated and experimental degree distributions are compared in Figure 2. (For quantitative comparisons, see SI.)
Table 2. Comparison of network features.
Network | $Q$ | $D$ | $f_1$ | $\langle C\rangle$ | $\langle k\rangle$
Yeast data | 0.75 | 15 | 0.89 | 0.09 | 3.65
DUNE | 0.74(7) | 17(6) | 0.8(1) | 0.041(9) | 4.0(8)
Vázquez | 0.80(4) | 21(5) | 0.2(1) | 0.045(5) | 2.6(4)
Berg | 0.518(4) | 12.0(7) | 0.990(3) | 0.0027(9) | 4.10(3)
RG | 0.910(3) | 36(3) | 0.987(6) | 0.475(8) | 5.31(8)
MpK | 0.58(6) | 24(5) | 1.000(2) | 0.08(3) | 4.4(6)
ER | 0.588(8) | 13.0(9) | 0.995(2) | 0.002(1) | 3.5(6)
Fly data | 0.86 | 23 | 0.73 | 0.10 | 2.93
DUNE | 0.82(2) | 20(2) | 0.81(3) | 0.09(1) | 2.36(9)
Human data | 0.75 | 15 | 0.88 | 0.08 | 3.69
DUNE | 0.74(6) | 17(2) | 0.88(4) | 0.09(1) | 3.7(4)
Columns are the modularity $Q$, diameter $D$, fraction of nodes in the largest component $f_1$, global clustering coefficient $\langle C\rangle$, and average degree $\langle k\rangle$ of proteins in the largest component. ‘Data’ is the empirical data, ‘DUNE’ is the model described here, ‘Vázquez’ is the duplication-only model of [29], ‘Berg’ is the link dynamics model [85], ‘RG’ is random geometric [89], ‘MpK’ is the physical desolvation model presented in [52], and ‘ER’ is an Erdős-Rényi random graph [90]. Simulated values are the median ($\pm$ standard deviation) over 50 simulations. (See SI for details of each model’s setup and optimization.)
Component refers to a set of reachable proteins. If any protein is reachable from any other protein (by hopping from neighbor to neighbor), then the network has only one component. If there is no path leading from protein A to B, then A and B are in different components. The fraction of nodes in the largest component ($f_1$) is a measure of network fragmentation (Table 2 and Figure S3). Note that, although silent genes (proteins with no links) exist in real systems, these genes do not appear in data sets consisting only of PPIs. Therefore, calculations of $f_1$ for all models exclude orphan proteins (proteins with $k = 0$).
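The largest-component fraction can be computed with a breadth-first search over each component. The following is a minimal sketch assuming an adjacency-dictionary graph (a representation chosen for illustration); orphan proteins are excluded from both the numerator and the denominator, as described above.

```python
from collections import deque

def largest_component_fraction(adj):
    """Fraction of non-orphan proteins that lie in the largest component.

    adj maps each protein to the set of its neighbors (undirected).
    """
    linked = [u for u in adj if adj[u]]      # drop orphans (degree zero)
    if not linked:
        return 0.0
    seen, best = set(), 0
    for start in linked:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(linked)
```

For example, a three-protein chain plus a separate interacting pair and one orphan gives 3/5 = 0.6.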
Gene loss, the silencing or deletion of genes, is known to play an important role in evolution. The loss of a functioning gene will damage an organism, making the gene loss unlikely to be passed on. The exception is if the gene is redundant. Consistent with this reasoning, evidence suggests that many gene loss events are losses of one copy of a duplicated gene [30], [55]. Although empirical estimates of the gene loss rate varied considerably, a consistent finding across several studies is that the rates of gene duplication and loss are of the same order-of-magnitude [27], [41], [44]. This broad picture is in good agreement with our model. In our model, a gene is considered lost when it has degree zero. Our model predicts that the ratio of orphan to non-orphan proteins is in yeast, in flies, and in humans. The gene loss rate has been previously estimated to be about half the duplication rate in both flies and humans [27], [44], consistent with our model’s prediction.
The distance between nodes $i$ and $j$ is defined as the number of node-to-node steps along the shortest path from node $i$ to node $j$. The closeness centrality of a node $i$ is the inverse of the average distance from node $i$ to all other nodes in the same component. The diameter, $D$, of a network is the longest such distance in the network. Simulated closeness distributions are compared to experiments in Figure 3. Interestingly, proteins have about ‘six degrees of separation’, similar to social networks [56], [57]. The closeness distributions have peaks around .
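Both quantities follow directly from shortest-path distances. The sketch below is one way to compute them with breadth-first search, assuming an adjacency-dictionary graph; the function names are illustrative, and isolated nodes are assigned a closeness of zero by convention.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_and_diameter(adj):
    """Per-node closeness (within its own component) and network diameter."""
    closeness, diameter = {}, 0
    for u in adj:
        dist = bfs_distances(adj, u)
        others = [d for v, d in dist.items() if v != u]
        # Inverse of the average distance to the other reachable nodes.
        closeness[u] = len(others) / sum(others) if others else 0.0
        diameter = max(diameter, max(dist.values()))
    return closeness, diameter
```

On a three-node chain, the middle node has closeness 1, each end has closeness 2/3, and the diameter is 2.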
Another property of a network is its modularity [58]. Networks are modular if they have high densities of links (defining regions called modules), connected by lower densities of links (between modules). One way to quantify the extent of modular organization in a network is to compute the modularity index, $Q$ [59], [60]:
$$Q = \frac{1}{2K} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2K} \right] \delta(c_i, c_j) \qquad (1)$$
where $K$ is the total number of links, $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, $c_i$ and $c_j$ denote the modules to which nodes $i$ and $j$ belong, $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and $0$ otherwise, and $A_{ij} = 1$ if nodes $i$ and $j$ share a link, and $0$ otherwise. $Q$ quantifies the difference between the actual within-module link density and the expected link density in a randomly connected network. $Q$ ranges between $-1$ and $1$; positive values of $Q$ indicate that the number of links within modules is greater than random. The numerical value of $Q$ required for a network to be considered ‘modular’ depends on the number of nodes and links and the method of computation. To calibrate baseline values given our particular network data, we used the null model described in [61]. Our non-modular baseline values are for the human PPI net, for yeast, and for flies (see SI). As shown in Table 2, PPI networks are highly modular, and our simulated values are in good agreement with those of experimental data.
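A modularity index of this kind can be evaluated directly once a partition into modules is given. The sketch below assumes the standard Newman form, Q = (1/2K) Σ_ij [A_ij − k_i k_j / 2K] δ(c_i, c_j), an adjacency-dictionary graph without self-links, and a precomputed `module_of` mapping; how the modules themselves are found is a separate step not covered here.

```python
def modularity(adj, module_of):
    """Newman modularity Q of a given partition.

    adj maps each node to the set of its neighbors (undirected, no
    self-links assumed); module_of maps each node to a module label.
    """
    two_k = sum(len(nbs) for nbs in adj.values())   # 2K: each link counted twice
    if two_k == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if module_of[i] != module_of[j]:
                continue                            # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_k
    return q / two_k
```

The double loop is quadratic in the number of nodes, which is acceptable for networks of a few thousand proteins; two disconnected links placed in separate modules give Q = 0.5, while putting everything in one module gives Q = 0.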
The clustering coefficient, $C_i$, for a protein $i$, is a measure of mutual connectivity of the neighbors of protein $i$. $C_i$ is defined as the ratio of the actual number of links between neighbors of protein $i$, $e_i$, to the maximum possible number of links between them,
$$C_i = \frac{2 e_i}{k_i (k_i - 1)} \qquad (2)$$
In a PPI network, clustering is thought to reflect the high likelihood that proteins of similar function are mutually connected [62]. The average (or global) clustering coefficient, $\langle C\rangle$, quantifies the extent of clustering in the network as a whole. As shown in Table 2, PPI networks have large global clustering coefficient values; the yeast PPI network, for example, has a value of $\langle C\rangle = 0.09$, which is 45 times higher than that of a random graph of equivalent link density. In flies and humans, our simulated networks have $\langle C\rangle$ values in excellent agreement with the data; in yeast, our predicted value is slightly low.
A network is said to be ‘hierarchically clustered’ if the clustering coefficient and degree obey a power-law relation [63] (Figure S1), indicating that nodes are organized into small-scale modules, and the small-scale modules are in turn organized into larger-scale modules following the same pattern [64]. By plotting each node’s clustering coefficient against its degree, we observed a trend consistent with hierarchical clustering, although data in the tail are very limited.
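The per-protein and network-average clustering coefficients can be sketched as below, assuming an adjacency-dictionary graph. Two conventions here are assumptions rather than details taken from the text: proteins with fewer than two neighbors are assigned a coefficient of zero, and the network value is the plain mean over all proteins.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Links among the neighbors of i, divided by the maximum possible."""
    nbs = adj[i] - {i}                      # ignore a self-link, if any
    k = len(nbs)
    if k < 2:
        return 0.0                          # convention: undefined -> 0
    links = sum(1 for u, v in combinations(nbs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all proteins."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)
```

For a triangle with one extra protein hanging off a corner, the two untouched corners score 1, the corner with the pendant scores 1/3, and the pendant scores 0, for an average of 7/12.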
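The betweenness of a node measures the extent to which it ‘bridges’ between different modules. Betweenness centrality, $B_i$, is defined as:
$$B_i = \sum_{s < t;\ s, t \neq i} \frac{\sigma_{st}(i)}{\sigma_{st}} \qquad (3)$$
where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$.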
Betweenness has been proposed as a uniquely functionally-relevant metric for PPI networks because it relates local and global topology. It has been argued that knocking out a protein that has high betweenness may be more lethal to an organism than knocking out a protein of high degree [65]. Betweenness distributions are shown in Figure 4.
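Betweenness can be computed efficiently with Brandes' algorithm, which accumulates shortest-path counts from one breadth-first search per source node. The sketch below is a generic unnormalized implementation for undirected, unweighted graphs stored as adjacency dictionaries; it counts each unordered pair of endpoints once and is not necessarily the normalization used for the figures.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair {s, t} was visited from both ends.
    return {v: b / 2 for v, b in score.items()}
```

On a three-node chain the middle node scores 1 (it lies on the only path between the two ends), and the hub of a three-leaf star scores 3.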
If a network’s well-connected nodes are mostly attached to poorly-connected nodes, the network is called disassortative. A simple way to quantify disassortativity is by determining the median degree of a protein’s neighbors as a function of its degree ($k$). Previous work has found that yeast networks are disassortative [61]. It has been argued that disassortativity is an essential feature of PPI network evolution, and recent modeling efforts have heavily emphasized this feature [66], [67]. However, it was noted by [68] that disassortativity may simply be an artifact of the yeast two-hybrid technique, and [69] pointed out that this trend is quite different among different yeast datasets, and in some cases is completely reversed, resulting in assortative mixing, where high-degree proteins prefer to link to other high-degree proteins. As shown in Figure 5 and Table S1, the empirical data show no evidence of disassortativity in flies or humans, and even the trend in yeast is quite weak. This conclusion is based solely on analysis of the empirical data, and casts further doubt on the role of disassortative mixing in PPI network evolution.
Comparisons of simulated and experimental eigenvalue spectra and error tolerance curves are shown in SI (Figures S7 and S8). As discussed in SI, the various per-node network properties we have analyzed are largely uncorrelated (Figure S9).
Evolutionary Trajectories
We now consider the question of how PPI networks evolve in time. The present-day networks show a rich-get-richer structure: PPI networks tend to have both more well-connected nodes and more poorly connected nodes than random networks have. In our model, the rich-get-richer property has two bases: duplication and assimilation. The equal duplication chance per protein means that the probability for a protein with $k$ links to acquire a new link via duplication of one of its interaction partners is proportional to $k$. Likewise, the probability that a protein receives a link through first-neighbor assimilation is proportional to its degree $k$. ‘Rich’ proteins get richer because the probability of acquiring new links rises with the number of existing links.
First, we discuss two dynamical quantities for which experimental evidence exists: the rate of gene loss, and the relation between a protein’s age and its centrality. Gene losses in our model correspond to ‘orphan’ proteins which have no interactions with other proteins. As shown in Figure S3, the fraction of orphan proteins grows quickly at first, then levels off. This is consistent with the findings of [44]: in humans, while the overall duplication rate is higher than the loss rate, when only data from the past 200 Myr are considered, the loss rate is slightly higher than the duplication rate. In our model, after the initial rapid expansion, the rate of gene loss stabilizes relative to the duplication rate.
We define the ‘age’ of a protein in our simulation according to the order in which proteins were added to the network. Our model shows that a protein’s age correlates with certain network properties. Consistent with earlier work [70]–[73], we find that older proteins tend to be more highly connected. We plotted the ‘age index’ of a protein (the time step at which the protein was introduced) versus its centrality scores. As shown in Figure S2, the age index negatively correlates with degree, betweenness, and closeness centralities: older proteins tend to be more central than younger proteins. We confirmed this prediction by following the evolutionary trajectories of individual proteins (Figure S4). These results are consistent with the eigenvalue-based aging method described in [73] (Figure S5). Phylogenetic protein age estimates indicate that older proteins tend to have a higher degree [70], [73], which our model correctly predicts. Interestingly, the eigenvalue-based scores are only modestly correlated with other centrality scores (0.36 degree, 0.47 betweenness, and 0.10 closeness correlations). Using the eigenvalue method in tandem with our centrality-based method could provide stronger age-discriminating power for PPI networks than either method alone.
The correlation between centrality and age suggests that static properties of present-day networks may be used to estimate relative protein ages. Suppose each normalized centrality score (, , ) represents a coordinate in a 3-D ‘centrality space’. We can then define a composite centrality score () as .
Do older proteins typically have different functions than newer proteins? We classified S. cerevisiae proteins using the GO-slim gene ontology system in the Saccharomyces Genome Database. As shown in Figure S6, GO-slim enrichment profiles were somewhat different between the oldest and youngest proteins (as measured by their composite centrality scores). Several categories which were more enriched for the oldest proteins were the cell cycle, stress response, cytoskeletal and cell membrane organization, whereas younger proteins were overrepresented in several metabolic processes. Overall, the differences were not dramatic, suggesting that cellular processes generally require both central and non-central proteins to function. Consistent with this, ancient proteins tend to be centrally located within modules, as their betweenness values gradually decline over time (Figure S4). The roughly linear relation between degree and betweenness also suggests that ancient proteins do not occupy structurally ‘special’ positions within the network, such as stitching together separate modules (Table S1 and Figure S10). This may indicate that modules tend to accumulate around the most ancient proteins, which act as a sort of nucleus. Thus, ancient proteins are involved in all kinds of pathways, because they have each nucleated their own pathway.
In contrast to the two dynamical quantities discussed so far, most structural properties of PPI networks have only been measured for the present-day network. Although our model accurately reproduces the present-day values of these quantities, there is no direct evidence that the simulated trajectories are correct; rather, these are predictions of our model. Figure 6 shows that both modularity and diameter increase with time. These predictions cannot yet be tested for biological systems, since no time-resolved data are available for PPI evolution. Time-resolved data are currently available only for various social networks (links to websites, co-authorship networks, etc.). Interestingly, the diameters of social networks are found to shrink over time [74]. Our model predicts that PPI networks differ from these social networks in that their diameters grow over time. In addition to $Q$ and $D$, we tracked the evolutionary trajectories of several other quantities: the evolution of the global clustering coefficient, the rate of signal propagation, the size of the largest connected component (Figure S3), as well as betweenness and degree values for individual nodes (Figure S4). See SI for details.
Discussion
The relevance of selection to PPI network evolution has been a topic of considerable debate [75], particularly in the context of higher-order network features, such as modularity. A number of authors have argued that specific selection programs are required to generate modular networks, such as oscillation between different evolutionary goals [76]–[81]. However, previous work has shown that gene duplication by itself, in the absence of both natural selection and neofunctionalization, can generate modular networks [82], [83]. Consistent with the findings of [82], [83], modularity in our model is primarily generated by gene duplications (Figure S11; see SI for sensitivity analysis). Unfortunately, duplication-only models err in their predictions of other network properties (Tables 2 and S2; Figure S12). A well-known problem with duplication models is that they generate excessively fragmented networks, with only about 20% of the proteins in the largest component. This is in sharp contrast to real PPI networks, which have 73% to 89% of their proteins in the largest component. Neofunctionalization-only models have most of their proteins in the largest component, but are significantly less modular than real networks. As shown in Table 2, by modeling duplication and neofunctionalization simultaneously, the DUNE model generates networks which have the modularity found in duplication-only models, while retaining most proteins in the largest component. This lends support to the idea that gene duplication contributes to the modularity found in real biological networks, and that protein modules can arise under neutral evolution, without requiring complicated assumptions about selective pressures. This is consistent with recent experimental work characterizing a real-world fitness landscape, showing that it is primarily shaped by neutral evolution [84].
Previous estimates of NE rates in eukaryotes have varied widely, generally falling in the range of 100 to 1000 changes/genome/Myr [24], [46], [85], or on the order of 0.1 change/gene/Myr. However, more recent empirical work has identified several problems with the methods used to obtain these estimates, suggesting that de novo link creation is much less common than previously thought [48]. This is consistent with our model: the best-fit values of our NE rate (Table 1) are, in all three organisms, considerably lower than the duplication rates.
Biologically, many of the interactions created by our neofunctionalization mechanism are expected to initially be weak, non-functional interactions. The results of [86] suggest that strong functional interactions are correlated with hydrophobicity, which in turn is correlated with promiscuity. We posit that initially weak, non-functional interactions are an essential feature of PPI evolution, as they provide the ‘raw material’ for the subsequent evolution of functional interactions. If this reasoning is correct, one consequence should be that hub proteins are, on average, more important to the cell than non-hub proteins. This has been found to be true: both degree [54] and betweenness centrality [65] have positive correlations with essentiality, indicating that hub proteins are often critical to the cell’s survival.
We have described here a model for how eukaryotic protein networks evolve. The model, called DUNE, implements two biological mechanisms: (1) gene duplication, which produces a superfluous copy of a protein that can change rapidly under new selective pressures, giving new relationships with other proteins; and (2) neofunctionalization, in which random mutations lead to the de novo creation of new relationships with other proteins. Neofunctionalization can lead to assimilation, the formation of additional novel interactions with the other proteins in the target’s neighborhood. Biological evidence suggests that this type of mechanism exists. Our specific implementation is based on a simple geometric surface-compatibility argument for the observed transitivity in PPI networks. This is, of course, a heavily simplified model of PPI network evolution, and there are many biological factors which have not been included. However, our relatively simple model shows good agreement with 10 topological properties in yeast, fruit flies, and humans. One finding is that PPI networks can evolve modular structures from these random forces alone, in the absence of specific selection pressures. We also find that the most central proteins tend to be the oldest. This suggests that looking at the structures of present-day protein networks can give insight into their evolutionary history.
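The two mechanisms summarized above can be illustrated with a deliberately simplified toy simulation in Python. This sketch is not the DUNE implementation: it uses discrete steps rather than event rates, only the new copy loses links after duplication, self-interactions are excluded, and the parameters p_dup, p_keep, and p_assim are hypothetical placeholders rather than fitted values.

```python
import random


def duplicate(adj, rng, p_keep):
    """Copy a random protein; the copy keeps each inherited link with probability p_keep."""
    parent = rng.choice(sorted(adj))
    child = max(adj) + 1          # nodes are labeled with consecutive integers
    adj[child] = set()
    for neighbor in sorted(adj[parent]):
        if rng.random() < p_keep:
            adj[child].add(neighbor)
            adj[neighbor].add(child)
    return child


def neofunctionalize(adj, rng, p_assim):
    """Link a random protein to a random target, then possibly to the target's neighbors."""
    source, target = rng.sample(sorted(adj), 2)
    # Snapshot the target's neighborhood before the new link is added.
    neighborhood = sorted(adj[target] - {source})
    adj[source].add(target)
    adj[target].add(source)
    for neighbor in neighborhood:
        # Assimilation: extra links into the target's neighborhood.
        if rng.random() < p_assim:
            adj[source].add(neighbor)
            adj[neighbor].add(source)


def evolve(n_steps, p_dup=0.7, p_keep=0.4, p_assim=0.3, seed=0):
    """Grow a toy network from a single interacting pair (all parameters hypothetical)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for _ in range(n_steps):
        if rng.random() < p_dup:
            duplicate(adj, rng, p_keep)
        else:
            neofunctionalize(adj, rng, p_assim)
    return adj


network = evolve(200)
```

Only duplication adds nodes, while neofunctionalization and assimilation add links among existing nodes. Setting p_dup to 1 or 0 gives duplication-only or neofunctionalization-only variants of this toy, which is one way to explore qualitatively the contrast drawn in the Discussion.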
Methods
Genome-wide PPI screens have a high level of noise [19], and specific interactions correlate poorly between data sets [52]. We found that several large-scale features differed substantially between types of high-throughput experiments (see SI). Due to concerns about the accuracy and precision of data obtained through high-throughput screens, we chose to work with ‘high-confidence’ data sets consisting only of pairwise interactions confirmed in small-scale experiments, which we downloaded from the public HitPredict database [87]. We found sufficient high-confidence data in yeast (S. cerevisiae), fruit flies (D. melanogaster), and humans (H. sapiens).
All simulations and network feature calculations were carried out in Matlab. Our scripts are freely available for download at http://ppi.tinybike.net. We computed betweenness centralities, clustering coefficients, shortest paths, and component sizes using the MatlabBGL package. Modularity values were calculated with the algorithm of [88]. All comparisons (except the degree distribution) are between the largest connected components of the simulated and experimental data.
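As an illustration of one of these network features, the Python sketch below computes a global clustering coefficient in its common transitivity form: the number of closed connected triples divided by the number of all connected triples. This is a generic textbook definition written for this example; it is an assumption that it matches the exact definition used in our Matlab scripts, and it does not use or reproduce MatlabBGL.

```python
from itertools import combinations


def global_clustering(adj):
    """Transitivity of an undirected graph given as {node: set_of_neighbors}.

    Counts every pair of neighbors of each node as a connected triple and
    checks whether that pair is itself linked. Assumes no self-loops.
    """
    closed = 0
    triples = 0
    for center, neighbors in adj.items():
        for u, v in combinations(sorted(neighbors), 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network: a triangle (a, b, c) with one pendant node d on c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_clustering = global_clustering(toy)
```

In the toy network there are five connected triples, three of which are closed by the triangle, so the coefficient is 0.6. Each triangle is counted once per center node, which is why no explicit factor of three appears in the code.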
Due to the human network’s somewhat larger size, most dynamical features were calculated once per 50 time steps for the human network, but were updated at every time step in the yeast and fly networks. For dynamical plots, the y coordinates of the trend line are medians-of-medians. The amount of time elapsed per time step (the x coordinate) varies between simulations. We binned the time coordinates to the nearest 10 million years for yeast and fly, and 25 million years for human. When multiple values from the same simulation fell within the same bin, we used the median value. We then calculated the median value between simulations. Scatter plot trend lines are calculated in a similar way. Within a single simulation, we took the median value of the response variable over all nodes of a given degree. The y coordinate of the trend line is therefore the median (across 50 simulations) of these median response variables. This median-of-medians includes all simulations that have nodes of a given degree.
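The binning and median-of-medians procedure for dynamical plots can be sketched in Python as follows. This is an illustrative reimplementation, not our Matlab code: the data layout (each simulation as a list of (time in Myr, value) pairs), the function name, and the choice to round exact half-bin ties upward are assumptions made for the example.

```python
import math
from collections import defaultdict
from statistics import median


def median_of_medians_trend(simulations, bin_width):
    """Trend line as the median across simulations of per-simulation bin medians.

    simulations: list of runs, each a list of (time_myr, value) pairs.
    bin_width:   bin size in Myr (e.g. 10 for yeast and fly, 25 for human).
    Returns {binned_time: median_of_medians}, ordered by time.
    """
    per_bin = defaultdict(list)
    for run in simulations:
        within_run = defaultdict(list)
        for time_myr, value in run:
            # Snap each time to the nearest multiple of bin_width (ties round up).
            binned = bin_width * math.floor(time_myr / bin_width + 0.5)
            within_run[binned].append(value)
        # One median per bin per simulation.
        for binned, values in within_run.items():
            per_bin[binned].append(median(values))
    # Median across the simulations that contributed to each bin.
    return {b: median(v) for b, v in sorted(per_bin.items())}


# Two hypothetical simulations with irregular time steps.
runs = [
    [(3, 1.0), (4, 3.0), (12, 5.0)],
    [(2, 2.0), (11, 7.0), (14, 9.0)],
]
trend = median_of_medians_trend(runs, bin_width=10)
```

In this example the first run contributes medians 2.0 and 5.0 to the bins at 0 and 10 Myr, and the second contributes 2.0 and 8.0, so the trend line passes through 2.0 and 6.5. A bin draws only on the simulations that have data in it. The same pattern applies to the scatter plot trend lines if node degree is used in place of binned time.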
Supporting Information
Acknowledgments
We would like to thank Tom MacCarthy and Anna Panchenko for valuable discussions.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: GJP was funded by a National Defense Science and Engineering Graduate Fellowship from the Department of Defense, SP was funded by a Sandler Blue Sky Award, KSP was funded by a Long Term Ecological Research grant from the National Science Foundation, and KAD was funded by the National Science Foundation and National Institutes of Health GM 34993. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Hughes D. Microbial genetics: Exploiting genomics, genetics and chemistry to combat antibiotic resistance. Nature Reviews Genetics. 2003;4:432–441. doi: 10.1038/nrg1084.
- 2. Cirz R, Chin J, Andes D, de Crécy-Lagard V, Craig W, et al. Inhibition of mutation and combating the evolution of antibiotic resistance. PLoS Biology. 2005;3:e176. doi: 10.1371/journal.pbio.0030176.
- 3. Taylor I, Linding R, Warde-Farley D, Liu Y, Pesquita C, et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nature Biotechnology. 2009;27:199–204. doi: 10.1038/nbt.1522.
- 4. Lynch M, O’Hely M, Walsh B, Force A. The probability of preservation of a newly arisen gene duplicate. Genetics. 2001;159:1789–1804. doi: 10.1093/genetics/159.4.1789.
- 5. Ting CT, Tsaur SC, Sun S, Browne W, Chen YC, et al. Gene duplication and speciation in Drosophila: evidence from the Odysseus locus. Proc Natl Acad Sci USA. 2004;101:12232–12235. doi: 10.1073/pnas.0401975101.
- 6. Dutkowski J, Tiuryn J. Phylogeny-guided interaction mapping in seven eukaryotes. BMC Bioinformatics. 2009;10:393. doi: 10.1186/1471-2105-10-393.
- 7. Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, et al. Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA. 2000;97:1143–1147. doi: 10.1073/pnas.97.3.1143.
- 8. Uetz P, Giot L, Cagney G, Mansfield T, Judson R, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. doi: 10.1038/35001009.
- 9. Krogan N, Cagney G, Yu H, Zhong G, Guo X, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643. doi: 10.1038/nature04670.
- 10. Yu H, Braun P, Yildirim M, Lemmens I, Venkatesan K, et al. High-quality binary protein interaction map of the yeast interactome network. Science. 2008;322:104–110. doi: 10.1126/science.1158684.
- 11. Marcotte E, Pellegrini M, Ng H, Rice D, Yeates T, et al. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999;285:751–753. doi: 10.1126/science.285.5428.751.
- 12. Pellegrini M, Marcotte E, Thompson M, Eisenberg D, Yeates T. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA. 1999;96:4285–4288. doi: 10.1073/pnas.96.8.4285.
- 13. Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Current Opinion in Structural Biology. 2002;12:368–373. doi: 10.1016/s0959-440x(02)00333-0.
- 14. Gomez S, Noble W, Rzhetsky A. Learning to predict protein-protein interactions from protein sequences. Bioinformatics. 2003;19:1875–1881. doi: 10.1093/bioinformatics/btg352.
- 15. Jothi R, Kann M, Przytycka T. Predicting protein-protein interaction by searching evolutionary tree automorphism space. Bioinformatics. 2005;21:241–250. doi: 10.1093/bioinformatics/bti1009.
- 16. Liu Y, Liu N, Zhao H. Inferring protein-protein interactions through high-throughput interaction data from diverse organisms. Bioinformatics. 2005;21:3279–3285. doi: 10.1093/bioinformatics/bti492.
- 17. Shoemaker B, Panchenko A. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Computational Biology. 2007;3:e43. doi: 10.1371/journal.pcbi.0030043.
- 18. Burger L, van Nimwegen E. Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol Syst Biol. 2008;4:165. doi: 10.1038/msb4100203.
- 19. Deane C, Salwiński L, Xenarios I, Eisenberg D. Protein interactions: two methods for assessment of the reliability of high throughput observations. Molecular & Cellular Proteomics. 2002;1:349–356. doi: 10.1074/mcp.m100037-mcp200.
- 20. Yamada T, Bork P. Evolution of biomolecular networks – lessons from metabolic and protein interactions. Nature Reviews Molecular Cell Biology. 2009;10:791–803. doi: 10.1038/nrm2787.
- 21. Ohno S. Evolution by Gene Duplication. Springer-Verlag, New York. 1970.
- 22. Zhang J. Evolution by gene duplication: an update. Trends in Ecology & Evolution. 2003;18:292–298.
- 23. Xiao H, Jiang N, Schaffner E, Stockinger E, van der Knaap E. A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit. Science. 2008;319:1527–1530. doi: 10.1126/science.1153040.
- 24. Wagner A. How the global structure of protein interaction networks evolves. Proc R Soc Lond B. 2003;270:457–466. doi: 10.1098/rspb.2002.2269.
- 25. Koch A. Enzyme evolution. I. The importance of untranslatable intermediates. Genetics. 1972;72:297–316. doi: 10.1093/genetics/72.2.297.
- 26. Taylor J, Raes J. Duplication and divergence: the evolution of new genes and old ideas. Annual Review of Genetics. 2004;38:615–643. doi: 10.1146/annurev.genet.38.072902.092831.
- 27. Lynch M, Conery J. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. doi: 10.1126/science.290.5494.1151.
- 28. Maslov S, Sneppen K, Eriksen KA, Yan KK. Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evolutionary Biology. 2004;4:9. doi: 10.1186/1471-2148-4-9.
- 29. Vázquez A, Flammini A, Maritan A, Vespignani A. Modeling of protein interaction networks. ComPlexUs. 2003;1:38–44.
- 30. Kellis M, Birren B, Lander E. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428:617–624. doi: 10.1038/nature02424.
- 31. Gu X, Zhang Z, Huang W. Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc Natl Acad Sci USA. 2005;102:707–712. doi: 10.1073/pnas.0409186102.
- 32. Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA. 1998.
- 33. Ispolatov I, Yuryev A, Mazo I, Maslov S. Binding properties and evolution of homodimers in protein-protein interaction networks. Nucleic Acids Res. 2005;33:3629–3635. doi: 10.1093/nar/gki678.
- 34. Jones S, Thornton J. Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996;93:13–20. doi: 10.1073/pnas.93.1.13.
- 35. Tovchigrechko A, Vakser I. How common is the funnel-like energy landscape in protein-protein interactions? Protein Science. 2001;10:1572–1583. doi: 10.1110/ps.8701.
- 36. Wu F, Towfic F, Dobbs D, Honavar V. Analysis of protein-protein dimeric interfaces. In: IEEE International Conference on Bioinformatics and Biomedicine. Fremont, CA. 2007.
- 37. Vinogradov A, Anatskaya O. Loss of protein interactions and regulatory divergence in yeast whole-genome duplicates. Genomics. 2009;93:534–542. doi: 10.1016/j.ygeno.2009.02.004.
- 38. Escriva H, Bertrand S, Germain P, Robinson-Rechavi M, Umbhauer M, et al. Neofunctionalization in vertebrates: The example of retinoic acid receptors. PLoS Genetics. 2006;2:e102. doi: 10.1371/journal.pgen.0020102.
- 39. Gillespie D. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry. 1977;81:2340–2361.
- 40. Gu Z, Cavalcanti A, Chen F, Bouman P, Li W. Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Molecular Biology and Evolution. 2002;19:256–262. doi: 10.1093/oxfordjournals.molbev.a004079.
- 41. Gao L, Innan H. Very low gene duplication rate in the yeast genome. Science. 2004;306:1367–1370. doi: 10.1126/science.1102033.
- 42. Lynch M, Conery J. The evolutionary demography of duplicate genes. Journal of Structural and Functional Genomics. 2003;3:35–44.
- 43. Osada N, Innan H. Duplication and gene conversion in the Drosophila melanogaster genome. PLoS Genetics. 2008;4:e1000305. doi: 10.1371/journal.pgen.1000305.
- 44. Cotton J, Page R. Rates and patterns of gene duplication and loss in the human genome. Proceedings of the Royal Society B 272. 2005.
- 45. Pan D, Zhang L. Quantifying the major mechanisms of recent gene duplications in the human and mouse genomes: a novel strategy to estimate gene duplication rates. Genome Biol. 2007;8:R158. doi: 10.1186/gb-2007-8-8-r158.
- 46. Beltrao P, Serrano L. Specificity and evolvability in eukaryotic protein interaction networks. PLoS Computational Biology 3. 2007.
- 47. Lynch M, Sung W, Morris K, Coffey N, Landry C, et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci USA. 2008;105:9272–9277. doi: 10.1073/pnas.0803466105.
- 48. Gibson T, Goldberg D. Questioning the ubiquity of neofunctionalization. PLoS Computational Biology. 2009;5:e1000252. doi: 10.1371/journal.pcbi.1000252.
- 49. Kimura M. Some problems of stochastic processes in genetics. Ann Math Stat. 1957;28:882–901.
- 50. Walsh J. How often do duplicated genes evolve new functions? Genetics. 1995;139:421–428. doi: 10.1093/genetics/139.1.421.
- 51. He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005;169:1157–1164. doi: 10.1534/genetics.104.037051.
- 52. Deeds E, Ashenberg O, Shakhnovich E. A simple physical model for scaling in protein-protein interaction networks. Proc Natl Acad Sci USA. 2006;103:311–316. doi: 10.1073/pnas.0509715102.
- 53. Patil A, Kinoshita K, Nakamura H. Domain distribution and intrinsic disorder in hubs in the human protein-protein interaction network. Protein Science. 2010;19:1461–1468. doi: 10.1002/pro.425.
- 54. Jeong H, Mason S, Barabási A, Oltvai Z. Lethality and centrality in protein networks. Nature. 2001;411:41–42. doi: 10.1038/35075138.
- 55. Ku H, Vision T, Liu J, Tanksley S. Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci USA. 2000;97:9121–9126. doi: 10.1073/pnas.160271297.
- 56. Travers J, Milgram S. An experimental study of the small world problem. Sociometry. 1969;32:425–443.
- 57. Leskovec J, Horvitz E. Proceedings of the 17th international conference on World Wide Web. ACM, New York. 2008.
- 58. Yook S, Oltvai Z, Barabási A. Functional and topological characterization of protein interaction networks. Proteomics. 2004;4:928–942. doi: 10.1002/pmic.200300636.
- 59. Newman M, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. doi: 10.1103/PhysRevE.69.026113.
- 60. Newman M. Modularity and community structure in networks. Proc Natl Acad Sci USA. 2006;103:8577–8582. doi: 10.1073/pnas.0601602103.
- 61. Maslov S, Sneppen K. Specificity and Stability in Topology of Protein Networks. Science. 2002;296:910–913. doi: 10.1126/science.1065103.
- 62. von Mering C, Krause R, Snel B, Cornell M, Oliver S, et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417:399–403. doi: 10.1038/nature750.
- 63. Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási A. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–1555. doi: 10.1126/science.1073374.
- 64. Barabási A, Oltvai Z. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics. 2004;5:101–113. doi: 10.1038/nrg1272.
- 65. Joy M, Brock A, Ingber D, Huang S. High-betweenness proteins in the yeast protein interaction network. Journal of Biomedicine and Biotechnology. 2005;2:96–103. doi: 10.1155/JBB.2005.96.
- 66. Zhao D, Liu Z, Wang J. Duplication: a mechanism producing disassortative mixing networks in biology. Chin Phys Lett. 2007;24:2766.
- 67. Wan X, Cai S, Zhou J, Liu Z. Emergence of modularity and disassortativity in protein-protein interaction networks. Chaos. 2010;20:045113. doi: 10.1063/1.3517107.
- 68. Aloy P, Russell R. Potential artefacts in protein-interaction networks. FEBS Lett. 2002;530:253–254. doi: 10.1016/s0014-5793(02)03427-0.
- 69. Hakes L, Pinney J, Robertson D, Lovell S. Protein-protein interaction networks and biology – what’s the connection? Nature Biotechnol. 2008;26:69–72. doi: 10.1038/nbt0108-69.
- 70. Woese C. Bacterial evolution. Microbiol Rev. 1987;51:221–271. doi: 10.1128/mr.51.2.221-271.1987.
- 71. Krapivsky P, Redner S. Organization of growing random networks. Physical Review E. 2001;63:066123. doi: 10.1103/PhysRevE.63.066123.
- 72. Qin H, Lu H, Wu W, Li W. Evolution of the yeast protein interaction network. Proc Natl Acad Sci USA. 2003;100:12820–12824. doi: 10.1073/pnas.2235584100.
- 73. Zhu G, Yang H, Yang R, Ren J, Li B, et al. Uncovering evolutionary ages of nodes in complex networks. Eur Phys Jour B. 2012;85:106.
- 74. Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. Knowledge Discovery in Databases: PKDD 2005. 2005.
- 75. Lynch M. The evolution of genetic networks by non-adaptive processes. Nat Rev Genet. 2007;8:803–813. doi: 10.1038/nrg2192.
- 76. Lipson H, Pollack J, Suh N. On the origin of modular variation. Evolution. 2002;56:1549–1556. doi: 10.1111/j.0014-3820.2002.tb01466.x.
- 77. Kashtan N, Alon U. Spontaneous evolution of modularity and network motifs. Proc Natl Acad Sci USA. 2005;102:13773–13778. doi: 10.1073/pnas.0503610102.
- 78. Komurov K, White M. Revealing static and dynamic modular architecture of the eukaryotic protein interaction network. Molecular Systems Biology. 2007;3:110. doi: 10.1038/msb4100149.
- 79. Wagner G, Pavlicev M, Cheverud J. The road to modularity. Nature Reviews Genetics. 2007;8:921–931. doi: 10.1038/nrg2267.
- 80. Espinosa-Soto C, Wagner A. Specialization can drive the evolution of modularity. PLoS Computational Biology. 2010;6:e1000719. doi: 10.1371/journal.pcbi.1000719.
- 81. Soyer O. Fate of a duplicate in a network context. In: Dittmar K, Liberles D, editors, Evolution After Gene Duplication, Wiley-Blackwell. 2010. pp. 215–228.
- 82. Hallinan J. Gene duplication and hierarchical modularity in intracellular interaction networks. BioSystems. 2004;74:51–62. doi: 10.1016/j.biosystems.2004.02.004.
- 83. Solé R, Valverde S. Spontaneous emergence of modularity in cellular networks. J R Soc Interface. 2008;5:129–133. doi: 10.1098/rsif.2007.1108.
- 84. Hietpas R, Jensen J, Bolon D. Experimental evolution of a fitness landscape. Proc Natl Acad Sci USA. 2011;108:7896–7901. doi: 10.1073/pnas.1016024108.
- 85. Berg J, Lassig M, Wagner A. Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evolutionary Biology. 2004;4:51–63. doi: 10.1186/1471-2148-4-51.
- 86. Heo M, Maslov S, Shakhnovich E. Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc Natl Acad Sci USA. 2011;108:4258–4263. doi: 10.1073/pnas.1009392108.
- 87. Patil A, Nakai K, Nakamura H. Hitpredict: a database of quality assessed protein-protein interactions in nine species. Nucleic Acids Research. 2011;39:D744–D749. doi: 10.1093/nar/gkq897.
- 88. Blondel V, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008: P10008+ 2008.
- 89. Pržulj N, Corneil D, Jurisica I. Modeling interactome: scale-free or geometric? Bioinformatics. 2004;20:3508–3515. doi: 10.1093/bioinformatics/bth436.
- 90. Erdős P, Rényi A. On the evolution of random graphs. Publ Math Inst Hung Acad Sci. 1960;5:17–61.
- 91. Clauset A, Shalizi C, Newman M. Power-law distributions in empirical data. SIAM Review. 2009;51:661–703.