Proceedings of the National Academy of Sciences of the United States of America. 2010 Dec 13;108(1):343–348. doi: 10.1073/pnas.1009775108

Lateral acquisition of genes is affected by the friendliness of their products

Uri Gophna a,1, Yanay Ofran b,1
PMCID: PMC3017175  PMID: 21149709

Abstract

A major factor in the evolution of microbial genomes is the lateral acquisition of genes that evolved under the functional constraints of other species. Integration of foreign genes into a genome that has different components and circuits poses an evolutionary challenge. Moreover, genes belonging to complex modules in the pretransfer species are unlikely to maintain their functionality when transferred alone to new species. Thus, it is widely accepted that lateral gene transfer favors proteins with only a few protein–protein interactions. The propensity of proteins to participate in protein–protein interactions can be assessed using computational methods that identify putative interaction sites on the protein. Here we report that laterally acquired proteins contain significantly more putative interaction sites than native proteins. Thus, genes encoding proteins with multiple protein–protein interactions may in fact be more prone to transfer than genes with fewer interactions. We suggest that these proteins have a greater chance of forming new interactions in new species, thus integrating into existing modules. These results reveal basic principles for the incorporation of novel genes into existing systems.

Keywords: network evolution, genomics, horizontal gene transfer, systems biology


The horizontal transfer of DNA, known as lateral gene transfer (LGT), is considered a major mechanism for the acquisition of novel biological functions in bacteria (1). Transfer of a single gene introduces an evolutionary puzzle; biological processes are typically carried out by the interaction of several proteins, not by a single gene product. Genes that belong to complex modules might not be functional when transferred alone to a new species. Thus, it is widely accepted that genes with fewer interactions are more likely to be transferred to, and retained in, a new species. This postulate is known as the complexity hypothesis (2). Subsequent studies that elaborated on the importance of protein–protein interactions (PPIs) in this context provide some support for the principles laid out in the complexity hypothesis (3–5); for example, it has been shown that proteins encoded by genes that have been more frequently transferred in evolution tend to have fewer interactions than other proteins (5). However, supporting evidence for the complexity hypothesis comes from analysis of the PPIs of transferred genes within their contemporary, posttransfer species, not from the number of PPIs in their pretransfer host. Indeed, fewer contemporary PPIs might imply a small number of PPIs in the pretransfer host as well, thus corroborating the assumption that highly interacting proteins are less likely to be transferred and retained. Alternatively, fewer contemporary PPIs might merely imply that horizontally acquired genes had less time to develop interactions with proteins in their new host compared with vertically acquired genes. Thus, such a posttransfer analysis does not provide conclusive evidence regarding the relationship between PPI and LGT. Unfortunately, there is no way to systematically assess the pretransfer PPI network, because in most cases, there is no clear record of the original host, let alone comprehensive PPI data obtained for its proteins. 
Therefore, it is currently impossible to directly analyze the PPI of transferred genes in their pretransfer context. To circumvent the lack of pretransfer PPI data and still be able to assess the importance of PPI for the acquisition and retention of genes through LGT, we assessed the protein interaction potential of transferred gene products. This potential is based on the number of putative protein-binding sites on a protein's surface. Proteins with more interaction sites tend to interact with more proteins (i.e., have a higher node degree in PPI networks) (6). It has been shown that residues that are part of putative protein-binding sites can be identified a priori from structure and even from sequence (7–9); thus, it is possible to assess a protein's interaction potential, or its stickiness, directly from its amino acid sequence. In the present work, we used interaction potentials in an attempt to uncover the actual interaction capacity, or “friendliness,” of laterally acquired genes.

Results and Discussion

It has been shown that proteins with more interaction sites tend to have a higher degree in PPI networks (6). It also has been shown that interaction sites can be predicted a priori from structure and even from sequence (7–9). Fig. 1 shows the relationship between the predicted relative interaction potential (RIP) of a protein and the number of proteins with which that protein interacts. The RIP of a protein, which represents the percentage of its residues that are predicted to be interaction sites, is defined as Eq. 1:

$$\mathrm{RIP} = \frac{N_{pp}}{L} \qquad [1]$$

where Npp is the number of residues predicted to be part of a PPI site and L is the length of the protein. Numerous tools have demonstrated the statistical possibility of identifying a priori residues that are part of PPI sites (see refs. 7, 9, and 10 for a review). We used ISIS (11), a tool that annotates the residues that putatively bind other proteins for each protein sequence. ISIS is a machine learning–based suite. In brief, for each residue, ISIS compares the physicochemical characteristics of its environment, its predicted structural features (e.g., solvent accessibility, secondary structure), and its evolutionary profile with those of residues in known PPI sites. Thus, ISIS determines whether a given residue is a putative PPI site. Large-scale analysis of ISIS has confirmed its statistical robustness and a lack of bias toward proteins from a specific species, kingdom, or subcellular localization (8, 11, 12).
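As a minimal sketch of Eq. 1, the RIP can be computed from a list of per-residue flags. The input format (one boolean per residue) and the function name are assumptions made for illustration; they do not reflect the actual output format of ISIS.

```python
def relative_interaction_potential(site_flags):
    """Return RIP = Npp / L for one protein.

    site_flags holds one boolean per residue (True = predicted PPI site).
    This input format is a hypothetical stand-in for a predictor's output.
    """
    length = len(site_flags)  # L: protein length in residues
    if length == 0:
        raise ValueError("protein must have at least one residue")
    n_pp = sum(1 for flag in site_flags if flag)  # Npp: predicted site residues
    # Fraction of residues in predicted sites; multiply by 100 for a percentage.
    return n_pp / length
```

Because the value is divided by the protein length, a long protein with many predicted sites and a short protein with proportionally as many receive the same score.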

Fig. 1.

Predicted interaction sites are more abundant in proteins that are involved in many PPIs in E. coli. Thus, predicted interaction sites can be viewed as an interaction potential or as a measure of protein friendliness. The number of proteins in each category (13) is given in parentheses. A similar trend was observed for another PPI dataset (15) (Fig. S1).

Using ISIS, we calculated the RIP for all Escherichia coli proteins. We then divided the E. coli proteins into groups based on their degree in the PPI network, that is, by the number of experimentally observed pairwise PPIs that they formed (Materials and Methods). As expected, proteins with a higher degree had a higher RIP. We repeated the analysis with three large PPI datasets (13–15) that used different stringency cutoffs, and found no significant difference in this trend in any of these datasets.

The first dataset that we used relied on a combination of pull-down results and additional data from individual studies obtained from the database of interacting proteins (DIP) (13, 14). The second dataset used a different experimental method, tandem affinity tagging (15). Despite the differences in detection methods and stringency criteria between the two datasets, the overall trends were similar. Nevertheless, the analysis of Hu et al. (15) identified more interacting proteins and more overall interactions than the other studies. This allowed for a more detailed comparison that revealed a strong correlation (r = 0.50; P < 10^−10) between the number of interacting proteins and the fraction of sticky residues predicted by ISIS (Fig. S1). Interestingly, when we separated the recently transferred gene products from the entire set and assessed them on their own, the correlation disappeared (r = −0.23; P = 0.24). The correlation of the remaining proteins was marginally stronger (r = 0.51) when the recently acquired gene products were excluded.

These findings indicate that the ISIS-predicted RIP of a protein can be viewed as a measure of the protein's stickiness, or friendliness. This measure is higher for proteins with a greater number of interaction partners. To evaluate the relationship between friendliness and LGT, we compared the RIP of proteins that are less affected by gene transfer, according to the phylogenetic discordance sequence (PDS) metric (5, 16), with that of proteins that are subject to frequent LGT events. This metric estimates the discordance of a protein's phylogenetic signal in relation to the signals of most proteins in the genome, assigning values from 0 to 1, with 1 indicating a totally concordant sequence and 0 indicating a highly discordant protein. We found that proteins that are less affected by gene transfer tend to have a higher RIP (P < 1.9E−6, Kruskal–Wallis test) (Fig. 2A). This finding appears to agree with the general notion of the complexity hypothesis. But because this analysis is based on assessing the interaction potential of the horizontally acquired protein in its new host, it represents the posttransfer state of the protein rather than its pretransfer state. Once such a gene is acquired by a new host, its interaction potential might change. In particular, regions that once mediated interactions in the pretransfer host that do not occur in the new organism might degenerate due to accumulation of mutations. This is expected to be a relatively slow process, however. Therefore, the fact that proteins that are less affected by transfer tend to have a higher RIP does not necessarily indicate that highly interacting proteins are rarely transferred. The lower RIP of transferred genes may merely be the result of the gradual loss of protein interaction sites in transferred genes.

Fig. 2.

Interaction potential of E. coli genes by PDS. Genes with a higher PDS score have a more concordant evolutionary signal. PDS values were obtained from the EMU Web server (http://emu.imb.uq.edu.au), as described previously (5). Error bars indicate jackknife error estimates. The number of proteins in each category is given in parentheses.

To test these evolutionary scenarios, we compared recently acquired genes, for which there has been relatively little time to accumulate mutations, to more anciently transferred genes that have experienced longer periods of selection and are expected to be more highly integrated into their new genome (17). For this purpose, we used a dataset of gene gains (i.e., events in which a member of an orthology group appears in a lineage) and examined three datasets of protein-coding genes in E. coli for which transfer has been dated: recent transfers, defined as those introduced into E. coli K-12 after the split from Salmonella, about 100 million years ago (18); early transfers, defined as those ORFs introduced before the Escherichia–Salmonella split; and nontransferred genes. Dating of the transfer was done with a parsimony-based approach (17). This method does not allow the inference of the origin of genes present only in one genome in the dataset and thus likely underestimates transfer, given that genes with very few or no homologs within a bacterial class are usually the product of LGT (e.g., from bacteriophages). Transferred proteins are on average shorter than native proteins. To account for any difference in the number of interaction sites that could be due to dissimilar protein sizes, we compared populations of proteins with the same length distributions. We sampled each population of proteins (ancient, recent, and nontransferred) to yield three populations with the same length distributions, and then compared the RIP in these populations. We excluded proteins shorter than 70 aa from the analysis. It is important to note that RIP, our measure of friendliness, is normalized for length. This normalization is required because it is well established that many factors, including protein size, affect the likelihood of successful transfer. Longer proteins are less likely to be transferred, but are more likely to have more putative interaction sites. 
Through normalization, we can test the effect of stickiness in isolation, independent of protein length. Comparing the three sets of proteins (recent transfers, ancient transfers, and nontransferred) revealed a clear evolutionary trend of the highest RIP values in the recently acquired genes (Fig. 3), followed by the more ancient transferred genes and then the nontransferred genes (P = 0.002, Kruskal–Wallis test).

Fig. 3.

Interaction potential of genes with varying residence time in the E. coli genome. Interaction potential values were calculated by ISIS (Materials and Methods), and gene antiquity data were obtained from Lercher and Pál (17). Recent transfers correspond to categories 1–4 in that dataset, whereas ancient transfers correspond to categories 5 and 6 in that dataset. Error bars indicate jackknife error estimates. The number of proteins in each category is given in parentheses. Note that genes that have no reciprocal best Blast hit in any other species in Lercher and Pál's dataset are also likely to be recent transfers and to have no LGT inference, and thus are not shown in this figure.

We also performed an additional analysis based on a different and independent method for dating transferred genes through the assessment of irregular G + C composition (19) (see also refs. 5 and 20). Unlike in the dataset of Lercher and Pál, these genes are not necessarily the sole representatives of their respective gene families in the genome. Using this method, we divided the E. coli genes into two sets, recent transfers and ancient or not transferred, and compared the average RIP of these sets. The results of this comparison further support our previous findings. As shown in Fig. 4, the RIP is higher in the recently acquired genes than in the remainder of the genome (P = 0.001, Mann–Whitney U test).

Fig. 4.

Interaction potential of recently acquired genes in E. coli. Recently acquired genes were obtained from Ragan (20). The method of determining gene transfer age differs from that used in Fig. 3; thus, different genes may be classified differently. By this metric, recent transfers are stickier than ancient ones. Error bars indicate jackknife error estimates. The number of proteins in each category is in parentheses.

Both of our analyses suggest that transferred genes have more interaction sites than nontransferred genes, but that these sites degenerate over time. Moreover, as reported above, the highly significant correlation between the RIP and the experimentally observed degree in the PPI network does not exist in recently transferred genes. This implies that for the initial transfer and retention to succeed, the more PPIs in the pretransfer host, the better.

To test the scenario in which interaction interfaces deteriorate after transfer, we attempted to infer whether transferred genes of intermediate age in a genome have a lower interaction potential than native homologs in other genomes. These genes should have been acquired fairly recently to ensure their identification as laterally acquired, while providing sufficient time for interface decay. We selected all genes that satisfied the following three criteria: (i) were acquired before the Escherichia–Salmonella split but after the split from Yersinia species [rank 5 in the Lercher and Pál dataset (17)]; (ii) had an unusual composition, indicating that they were not yet ameliorated, further supporting a relatively recent acquisition [i.e., included in the Lawrence dataset (19, 20)]; and (iii) belonged to gene families that experienced relatively little transfer (PDS >0.98), so that homologs in other genomes were more likely to be vertically inherited in those genomes than were the products of LGT themselves. For each of these 13 proteins, we retrieved the best BLAST matches from the National Center for Biotechnology Information's NR database, ignoring duplicate hits to different strains from the same species. We analyzed the 10 proteins that had at least nine BLAST matches outside the γ-proteobacteria (Table S1) and compared their interaction potential with the average of their 10 closest homologs. Of note, 9 out of 10 homolog sets had a higher average normalized stickiness than their respective E. coli homologs (P < 10^−5, reshuffle simulation) (Fig. 5). This finding demonstrates that a protein's stickiness is substantially higher in a pretransfer organism (represented by the homologs, where it is probably vertically inherited) than in the posttransfer organism (i.e., in E. coli). A closer look at these proteins indeed reveals that in most cases, the sticky sites were in conserved positions along the sequence. In E. coli, these sites are typically composed of fewer residues. In many cases, those positions are not predicted to be sticky at all in E. coli (Fig. 6 and Fig. S2). This implies that whereas proteins with higher RIP are more likely to be transferred, sticky residues that no longer mediate interactions gradually degenerate due to an accumulation of mutations, resulting in the reduced interaction potential seen in older laterally acquired genes.

Fig. 5.

Interaction potential of laterally acquired genes in E. coli versus their homologs outside γ-proteobacteria. Black squares represent E. coli proteins, and red triangles represent homologs in other genomes. In 9 of 10 protein families, the predicted stickiness of the E. coli homolog was below the average stickiness of the family (P < 10^−5 in reshuffle simulation).

Fig. 6.

Multiple sequence alignment highlighting predicted sticky positions in the sequences of homologous transcriptional activators from different species (GI numbers in parentheses). A multiple sequence alignment was performed on all sequences using the MAFFT server (33), after which the amino acid single-letter representations were replaced by predictions from ISIS. A “p” represents a position in the alignment predicted to be sticky in that species, “−” represents a position not predicted to be sticky in that species, and “*” represents a gap in the multiple sequence alignment. The sequence from E. coli is in red. Note that the predicted sticky residues tend to form clusters along the sequence and to be conserved in most species; however, in most cases the sites in E. coli either are smaller than in the other species or are eliminated altogether.

The complexity hypothesis is based on the fundamental insight that systems-wide properties affect the likelihood of successful LGT. This by no means applies to all gene transfer events. It has been known for more than a decade that laterally acquired genes are often clustered together on the genome (19, 21), potentially forming “selfish operons” (22). Thus, functional complexes (even very large ones) can be easily coacquired if encoded on a single cluster (5, 23). Furthermore, many proteins, ranging from β-lactamases to most aminoacyl-tRNA synthetases (24, 25), require no interaction partner for activity. In addition, many LGT events involve gene fragments rather than whole genes (26, 27), and thus cannot be investigated by the methods described above.

Nevertheless, many genes are transferred without their interaction partners. The extent of this transfer can be assessed by checking how often gene products interact with their neighbors on the genome (Materials and Methods). Not surprisingly, we found significant enrichment for interactions with products of gene neighbors. In particular, interactions of proteins that are encoded between one and five genes apart occurred 615% more often than expected (Materials and Methods); however, only 2% of interactions were between proteins encoded by genes exhibiting this high proximity. Moreover, 96.9% of the interactions occurred between proteins more than 21 genes apart, a distance unlikely to allow for cotransfer, except on a very large plasmid. This finding is supported by the similar average distance between the genes encoding interaction partners of recently transferred genes and ancient transfers (1,243 vs. 1,232). Thus, it is safe to assume that in most cases, genes are transferred without their interaction partners, which are critical for the execution of their function. Consequently, it has been assumed that highly connected proteins are less likely to be retained (2). Our analysis shows that there is indeed a strong relationship between the degree of a protein in PPI networks and the likelihood of that protein's transfer. However, counterintuitively, this relationship may actually be the opposite of that assumed by the complexity hypothesis, with the more highly connected proteins more likely to be retained, possibly because of their greater ability to interact with native proteins in their new host and to be integrated into existing processes to create new functions. Indeed, Davids and Zhang (3) demonstrated that in E. coli, laterally acquired genes interact preferentially with core proteins (those conserved in all E. coli strains whose genomes have been sequenced) rather than with less-conserved proteins or other transferred genes. Similarly, Ochman et al. (4) reported that in PPI networks, laterally acquired proteins tend to interact with hubs rather than with the proteins in the periphery of the network, again demonstrating the integrative potential of these genes. Furthermore, acquired genes that degenerated to pseudogenes in Shigella have functional relatives with a lower-than-average degree in E. coli, further supporting the importance of the interaction potential of the acquired genes (4).

Alternatively, proteins may be retained not because of their interaction sites, but rather because the new host has genes that encode proteins homologous to the original interaction partners of the transferred gene product. However, analysis of the conservation of PPIs shows that even minimal sequence divergence (e.g., sequence identity of 90% between two proteins) can change most of the interactions (28). This is consistent with previous analyses that found that interface residues are only marginally more conserved than other surface residues (11), despite the fact that the location of the interface on homologous proteins may be conserved (29). It also has been shown that across-species interactions are even less conserved than within-species interactions (28). These analyses suggest that unless two homologs have >90% sequence identity, they are unlikely to interact with each other's partners without mutations that adjust their respective interfaces. This kind of similarity is very rare and cannot account for the wide effect that we observed. Our finding that successfully transferred genes are inherently “friendlier” helps explain these genes’ preferential attachment to core proteins and underscores the importance of interactions in the evolution of bacterial genomes.

Materials and Methods

PDS Determination.

The PDS metric (16) measures the extent to which genes within a genome share the same evolutionary history with the remainder of the genome in terms of similarity to orthologs. If the evolutionary history is indeed concordant for all genes, then these best hits in other genomes should be ranked similarly for each coding ORF. An ORF presenting with a pattern that conflicts with this common ranking is considered discordant (PDS <1). PDS values for gene families present in the E. coli genome were obtained from Wellner et al. (5).

Horizontally Transferred Genes.

Datasets of ancestral genes versus ancient and recent transfers (17) were kindly provided by Martin J. Lercher (Heinrich-Heine-University, Düsseldorf, Germany). From these datasets, we used data based on the DELTRAN algorithm, with relative penalties for horizontal gene transfer and loss of 2:1. We divided the data into three categories: nontransferred, recent transfers (categories 1–4 in the dataset), and ancient transfers (categories 5 and 6 in the dataset). We conducted an additional comparison on the dataset of recently acquired genes detected by compositional biases by Lawrence and Ochman (19).

PPI.

PPI data were obtained from previous studies (13–15), and the number of interactions per protein (i.e., degree) was computed following Wellner et al. (5).

RIP Analysis Using ISIS.

The E. coli K-12 sequences were obtained from the National Center for Biotechnology Information's FTP site. For each protein in the set, we predicted interaction sites using ISIS (8). ISIS uses PSI-BLAST (30) to generate an HSSP file (31), then uses PROF (32) to predict secondary structure and solvent accessibility. The MSA from HSSP is used to compute the conservation and the evolutionary profile of each residue. The results were fed into ISIS using the following parameters: gap = 8, stretch = 5, and crowd = 7.

To account for the different distributions of protein length in transferred and nontransferred genes, we compared protein populations of the same length distribution. We first eliminated any protein shorter than 70 aa, then grouped proteins according to length (in aa) into bins in increments of 10 aa; that is, for each PDS category, all proteins with 70 ≤ length < 80 aa were grouped in the first bin, all proteins with 80 ≤ length < 90 aa were grouped in the next bin, and so on. To compare proteins from different PDS categories, we sampled three populations with equal representation for each bin. We ended up with three sets of proteins with the same proportion of proteins from each bin. Note that even though RIP is already normalized by length, this normalization is aimed at creating populations with similar length distributions, to account for any bias that might result from very different length distributions in the protein populations. Running the same analysis on the unnormalized set of proteins did not affect the results.
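The length-matching step can be sketched as follows. The 70-aa cutoff and 10-aa bins come from the text; the sampling rule is not specified there, so this sketch assumes one simple option: in each bin, draw the same number of proteins from every group, namely the smallest count available in that bin. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict

MIN_LENGTH = 70  # proteins shorter than this are excluded (from the text)
BIN_WIDTH = 10   # bin width in amino acids (from the text)


def bin_index(length):
    """Bin 0 covers 70 <= length < 80, bin 1 covers 80 <= length < 90, ..."""
    return (length - MIN_LENGTH) // BIN_WIDTH


def length_matched_samples(groups, seed=0):
    """groups maps a group name to a list of (protein_id, length) pairs.

    Returns a dict mapping each group name to a list of sampled protein ids,
    with identical numbers of proteins per length bin in every group.
    """
    rng = random.Random(seed)
    binned = {}
    for name, proteins in groups.items():
        bins = defaultdict(list)
        for protein_id, length in proteins:
            if length >= MIN_LENGTH:
                bins[bin_index(length)].append(protein_id)
        binned[name] = bins

    matched = {name: [] for name in groups}
    all_bins = set().union(*(bins.keys() for bins in binned.values()))
    for b in sorted(all_bins):
        # Assumed rule: the scarcest group in this bin sets the quota.
        quota = min(len(binned[name].get(b, [])) for name in groups)
        for name in groups:
            matched[name].extend(rng.sample(binned[name].get(b, []), quota))
    return matched
```

Under this rule a bin that is empty in any group contributes nothing, which trades sample size for exactly matched length distributions.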

Degradation Analysis.

We analyzed the degradation of sticky sites (Fig. 6 and Fig. S2) as follows. From proteins that were acquired before the Escherichia–Salmonella split but after the split from Yersinia species [rank 5 in the Lercher and Pál dataset (17)] and had an unusual composition, indicating that they were newly acquired [i.e., were included in the Lawrence dataset (19, 20)], we selected those originating from families that experienced relatively little transfer (PDS >0.98). This should guarantee that at least some homologs in other genomes were more likely to be vertically inherited in those genomes than products of LGT themselves. We chose the 10 best BLASTP matches outside γ-proteobacteria for detailed analysis, and eliminated cases in which fewer than 10 homologs existed. Homologous sequences were aligned using MAFFT (33). For every sequence, the single-letter amino acid code in each position was replaced by annotation of that residue from ISIS (“p” for sticky residues, “−” otherwise). Gaps in the alignment were replaced with an asterisk.
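The relabeling of an aligned sequence can be illustrated with a short sketch. It assumes that each aligned row uses "-" for gaps and that the per-residue predictions are available as a list of booleans for the ungapped sequence; both are assumptions about the data format, and the function name is hypothetical. An ASCII hyphen stands in for the "−" symbol used in the figure.

```python
def annotate_alignment_row(aligned_seq, sticky_flags, gap_char="-"):
    """Replace each residue with 'p' (sticky) or '-' (not sticky), gaps with '*'.

    aligned_seq: one row of a multiple sequence alignment.
    sticky_flags: one boolean per residue of the ungapped sequence.
    """
    n_residues = sum(1 for ch in aligned_seq if ch != gap_char)
    if n_residues != len(sticky_flags):
        raise ValueError("prediction length does not match ungapped sequence")

    out = []
    residue_index = 0  # position in the ungapped sequence
    for ch in aligned_seq:
        if ch == gap_char:
            out.append("*")
        else:
            out.append("p" if sticky_flags[residue_index] else "-")
            residue_index += 1
    return "".join(out)
```

Applying this to every row keeps the alignment columns intact, so conserved sticky positions line up vertically across species.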

Assessing Genome Vicinity of Interacting Proteins.

We used the PPI data of Hu et al. (15). For each pair of interacting proteins, we determined their respective separation on the genome; immediate neighbors had a distance of 1, genes separated by one other gene had a distance of 2, and so on. As a background, for each pair of interacting proteins, we randomly selected two proteins and recorded their distance. Finally, we compared the distribution of distances in real PPIs versus random pairs, and compared the average distance of recent transfers with that of ancient transfers.
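A rough sketch of this distance comparison is given below. It assumes that each gene has an integer rank along the chromosome and treats the chromosome as linear, ignoring wrap-around on a circular genome; the text does not say how this was handled. All names are hypothetical.

```python
import random


def gene_distance(pos_a, pos_b):
    """Immediate neighbours have distance 1, one intervening gene gives 2, ..."""
    return abs(pos_a - pos_b)


def distances_for_pairs(pairs, gene_order):
    """pairs: iterable of (gene_a, gene_b); gene_order: gene -> rank on genome."""
    return [gene_distance(gene_order[a], gene_order[b]) for a, b in pairs]


def random_background(n_pairs, gene_order, seed=0):
    """Distances for n_pairs randomly chosen pairs of distinct genes."""
    rng = random.Random(seed)
    genes = list(gene_order)
    distances = []
    for _ in range(n_pairs):
        a, b = rng.sample(genes, 2)
        distances.append(gene_distance(gene_order[a], gene_order[b]))
    return distances


def fraction_within(distances, max_distance):
    """Fraction of pairs whose genes lie 1..max_distance genes apart."""
    if not distances:
        return 0.0
    close = sum(1 for d in distances if 1 <= d <= max_distance)
    return close / len(distances)
```

Comparing `fraction_within(real, 5)` with `fraction_within(background, 5)` gives an enrichment ratio of the kind reported for genes one to five positions apart.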

Statistical Analysis.

We used SPSS version 12 (SPSS) to perform the Kruskal–Wallis and Mann–Whitney tests. Error bars represent jackknife-derived confidence thresholds (34). The resample size, m(n), was set to 0.3 of the data.

We obtained P values for the comparison of stickiness in E. coli and homologs in other species using a reshuffle simulation, as follows. In each family of proteins, one member was randomly marked as “E. coli,” and its predicted stickiness was compared with the average stickiness of the remaining proteins in this family. We did this for each family, and recorded the number of families in which the E. coli homolog was less sticky than the average. We repeated this 100,000 times. In none of these repetitions did we find a case in which more than seven families had a less-sticky random E. coli homolog compared with the average stickiness of the rest of the family. In 66% of these cases there were five or fewer of these families, and in only 0.05% (453 cases) were there 7 out of 10 such families.
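The reshuffle simulation described above can be sketched as follows. Each family is represented as a list of predicted stickiness values for its members; this input format, the function names, and the choice to count repetitions with at least as many "less sticky" families as observed are assumptions made for illustration.

```python
import random


def count_less_sticky(families, marked_indices):
    """Count families whose marked member is below the mean of the others."""
    count = 0
    for values, idx in zip(families, marked_indices):
        rest = values[:idx] + values[idx + 1:]
        if values[idx] < sum(rest) / len(rest):
            count += 1
    return count


def reshuffle_p_value(families, observed_count, n_iter=100_000, seed=0):
    """Empirical P value: fraction of random markings at least as extreme.

    families: list of lists of stickiness values, each with >= 2 members.
    observed_count: number of families in which the real E. coli protein
    was less sticky than the average of its homologs.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Mark one random member per family as the stand-in "E. coli" protein.
        marked = [rng.randrange(len(values)) for values in families]
        if count_less_sticky(families, marked) >= observed_count:
            hits += 1
    return hits / n_iter
```

If no repetition reaches the observed count, the function returns 0, which should be read as P below 1/n_iter rather than as exactly zero.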

Acknowledgments

We thank Martin J. Lercher for the dataset of laterally acquired genes, Nir Yosef for PPI data, and Martin Kupiec, Eytan Ruppin, Oren Harman, and Rotem Sorek for a critical reading of the manuscript. U.G. is supported by grants from the Israeli Ministry of Health and the McDonnell Foundation.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1009775108/-/DCSupplemental.

References

  • 1.Doolittle WF, et al. How big is the iceberg of which organellar genes in nuclear genomes are but the tip? Philos Trans R Soc Lond B Biol Sci. 2003;358:39–57. doi: 10.1098/rstb.2002.1185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA. 1999;96:3801–3806. doi: 10.1073/pnas.96.7.3801. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Davids W, Zhang Z. The impact of horizontal gene transfer in shaping operons and protein interaction networks—direct evidence of preferential attachment. BMC Evol Biol. 2008;8:23. doi: 10.1186/1471-2148-8-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Ochman H, Liu R, Rocha EP. Erosion of interaction networks in reduced and degraded genomes. J Exp Zoolog B Mol Dev Evol. 2007;308:97–103. doi: 10.1002/jez.b.21147. [DOI] [PubMed] [Google Scholar]
  • 5.Wellner A, Lurie MN, Gophna U. Complexity, connectivity, and duplicability as barriers to lateral gene transfer. Genome Biol. 2007;8:R156. doi: 10.1186/gb-2007-8-8-r156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006;314:1938–1941. doi: 10.1126/science.1136174. [DOI] [PubMed] [Google Scholar]
  • 7.de Vries SJ, Bonvin AM. How proteins get in touch: Interface prediction in the study of biomolecular complexes. Curr Protein Pept Sci. 2008;9:394–406. doi: 10.2174/138920308785132712. [DOI] [PubMed] [Google Scholar]
  • 8.Ofran Y, Rost B. ISIS: Interaction sites identified from sequence. Bioinformatics. 2007;23:e13–e16. doi: 10.1093/bioinformatics/btl303. [DOI] [PubMed] [Google Scholar]
  • 9.Ofran Y. Prediction of Protein Interaction Sites in Computational Protein–Protein Interactions. In: Nussinov R, Schreiber G, editors. Boca Raton, FL: CRC Press; 2009. pp. 167–184. [Google Scholar]
  • 10. Zhou HX, Qin S. Interaction site prediction for protein complexes: A critical assessment. Bioinformatics. 2007;23:2203–2209. doi: 10.1093/bioinformatics/btm323.
  • 11. Ofran Y, Rost B. Protein–protein interaction hotspots carved into sequences. PLoS Comput Biol. 2007;3:e119. doi: 10.1371/journal.pcbi.0030119.
  • 12. Ofran Y, Rost B. Predicted protein–protein interaction sites from local sequence information. FEBS Lett. 2003;544:236–239. doi: 10.1016/s0014-5793(03)00456-3.
  • 13. Arifuzzaman M, et al. Large-scale identification of protein–protein interaction of Escherichia coli K-12. Genome Res. 2006;16:686–691. doi: 10.1101/gr.4527806.
  • 14. Yosef N, Kupiec M, Ruppin E, Sharan R. A complex-centric view of protein network evolution. Nucleic Acids Res. 2009;37:e88. doi: 10.1093/nar/gkp414.
  • 15. Hu P, et al. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol. 2009;7:e96. doi: 10.1371/journal.pbio.1000096.
  • 16. Clarke GD, Beiko RG, Ragan MA, Charlebois RL. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. J Bacteriol. 2002;184:2072–2080. doi: 10.1128/JB.184.8.2072-2080.2002.
  • 17. Lercher MJ, Pál C. Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol Biol Evol. 2008;25:559–567. doi: 10.1093/molbev/msm283.
  • 18. Ochman H, Wilson AC. Evolution in bacteria: Evidence for a universal substitution rate in cellular genomes. J Mol Evol. 1987;26:74–86. doi: 10.1007/BF02111283.
  • 19. Lawrence JG, Ochman H. Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci USA. 1998;95:9413–9417. doi: 10.1073/pnas.95.16.9413.
  • 20. Ragan MA. On surrogate methods for detecting lateral gene transfer. FEMS Microbiol Lett. 2001;201:187–191. doi: 10.1111/j.1574-6968.2001.tb10755.x.
  • 21. Hacker J, Carniel E. Ecological fitness, genomic islands and bacterial pathogenicity: A Darwinian view of the evolution of microbes. EMBO Rep. 2001;2:376–381. doi: 10.1093/embo-reports/kve097.
  • 22. Lawrence JG, Roth JR. Selfish operons: Horizontal transfer may drive the evolution of gene clusters. Genetics. 1996;143:1843–1860. doi: 10.1093/genetics/143.4.1843.
  • 23. Gophna U, Ron EZ, Graur D. Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events. Gene. 2003;312:151–163. doi: 10.1016/s0378-1119(03)00612-7.
  • 24. Woese CR, Olsen GJ, Ibba M, Söll D. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev. 2000;64:202–236. doi: 10.1128/mmbr.64.1.202-236.2000.
  • 25. Andam CP, Williams D, Gogarten JP. Biased gene transfer mimics patterns created through shared ancestry. Proc Natl Acad Sci USA. 2010;107:10679–10684. doi: 10.1073/pnas.1001418107.
  • 26. Chan CX, Darling AE, Beiko RG, Ragan MA. Are protein domains modules of lateral genetic transfer? PLoS ONE. 2009;4:e4524. doi: 10.1371/journal.pone.0004524.
  • 27. Chan CX, Beiko RG, Darling AE, Ragan MA. Lateral transfer of genes and gene fragments in prokaryotes. Genome Biol Evol. 2009;1:429–438. doi: 10.1093/gbe/evp044.
  • 28. Mika S, Rost B. Protein–protein interactions more conserved within species than across species. PLoS Comput Biol. 2006;2:e79. doi: 10.1371/journal.pcbi.0020079.
  • 29. Zhang QC, Petrey D, Norel R, Honig BH. Protein interface conservation across structure space. Proc Natl Acad Sci USA. 2010;107:10896–10901. doi: 10.1073/pnas.1005894107.
  • 30. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  • 31. Dodge C, Schneider R, Sander C. The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res. 1998;26:313–315. doi: 10.1093/nar/26.1.313.
  • 32. Rost B, Yachdav G, Liu J. The PredictProtein server. Nucleic Acids Res. 2004;32:W321–W326. doi: 10.1093/nar/gkh377.
  • 33. Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008;9:286–298. doi: 10.1093/bib/bbn013.
  • 34. Efron B. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika. 1981;68:589.

Associated Data


Supplementary Materials

Supporting Information
1009775108_sfig01.pdf (86.1KB, pdf)
1009775108_sfig02.pdf (322.9KB, pdf)

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
