Abstract
The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms: ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions1,2,3,4. To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologs from 1,847 reference bacterial genomes. Archaea-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than 5-fold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.
Genome evolution in prokaryotes entails both tree-like components generated by vertical descent and network-like components generated by lateral gene transfer (LGT)5,6. Both processes operate in the formation of prokaryotic species1,2,3,4,5,6. While it is clear that LGT within prokaryotic groups such as cyanobacteria7, proteobacteria8 or halophiles9 is important in genome evolution, the contribution of LGT to the formation of novel prokaryotic groups at higher taxonomic levels is unknown. Prokaryotic higher taxa are recognized and defined by rRNA phylogenetics10, and their existence is supported by phylogenomic studies of informational genes11 that are universal to all genomes, or nearly so12. Such core genes encode about 30-40 proteins for ribosome biogenesis and information processing functions, but they comprise only about 1% of an average genome. While core phylogenomics studies provide useful prokaryotic classifications13, they give little insight into the remaining 99% of the genome, because of LGT14. The core does not predict gene content across a given prokaryotic group, especially in groups with large pangenomes or broad ecological diversity1,4, nor does the core itself reveal which gene innovations underlie the origin of major groups.
To examine the relationship between gene distributions and the origins of higher taxa among archaea, we clustered all 267,568 proteins encoded in 134 archaeal chromosomes using the Markov Cluster Algorithm (MCL)15 at a ≥25% global amino acid identity threshold, thereby generating 25,762 archaeal protein families having ≥2 members. Clusters below that sequence identity threshold were not considered further. Among the 25,762 archaeal clusters, two thirds (16,983) are archaea-specific; they detect no homologs among 1,847 bacterial genomes. The presence of these archaea-specific genes in each of the 134 archaeal genomes is plotted in Fig. 1 against an unrooted reference tree (left panel) constructed from a concatenated alignment of the 70 single-copy genes universal to the archaea sampled. The gene distributions strongly correspond to the 13 recognized archaeal higher taxa present in our sample, with 14,416 families (85%) occurring in members of only one of the 13 groups indicated and 1,545 (9%) occurring in members of two groups only (Fig. 1). Another 6% of archaea-specific clusters are present in more than two groups, and 0.3% are present in all genomes sampled (Fig. 1).
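The tally of group-restricted families described above can be illustrated with a minimal Python sketch. The toy family names, genome names, the `genome_to_group` mapping and both function names are hypothetical and are not taken from the study; the sketch only shows how presence/absence data could be binned by the number of higher taxa in which a family occurs.

```python
from collections import Counter


def count_groups_per_family(families, genome_to_group):
    """Map each family to the number of distinct higher taxa it occurs in."""
    return {
        family: len({genome_to_group[genome] for genome in genomes})
        for family, genomes in families.items()
    }


def bin_by_group_count(groups_per_family):
    """Histogram: number of groups -> number of families with that spread."""
    return Counter(groups_per_family.values())


# Toy data (hypothetical): five genomes assigned to three higher taxa.
genome_to_group = {
    "g1": "Haloarchaea",
    "g2": "Haloarchaea",
    "g3": "Sulfolobales",
    "g4": "Sulfolobales",
    "g5": "Thermococcales",
}

# Each family is the set of genomes in which it has at least one member.
families = {
    "famA": {"g1", "g2"},        # restricted to one group
    "famB": {"g1", "g3"},        # shared by two groups
    "famC": {"g3", "g4"},        # restricted to one group
    "famD": {"g1", "g3", "g5"},  # present in three groups
}

per_family = count_groups_per_family(families, genome_to_group)
histogram = bin_by_group_count(per_family)
```

Dividing each histogram entry by the total number of families gives fractions of the kind reported in the text for one-group and two-group families.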
The remaining one third of the archaeal families (8,779 families) have homologs that are present in anywhere from one to 1,495 bacterial genomes. The number of genes that each archaeal genome shares with 1,847 bacterial genomes and which bacterial genomes harbor those homologs is shown in the gene sharing matrix (Extended Data Fig. 1), which reveals major differences in the per-genome frequency of bacterial gene occurrences across archaeal lineages. We generated alignments and maximum likelihood trees for those 8,471 archaeal families having bacterial counterparts and containing ≥4 taxa. In 4,397 trees the archaeal sequences were monophyletic (Fig. 2), while in the remaining 4,074 trees the archaea were not monophyletic, interleaving with bacterial sequences. For all trees, we plotted the distribution of gene presence or absence data across archaeal taxa onto the reference tree.
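The monophyly criterion used to sort these trees can be sketched as follows. This is an illustrative check under stated assumptions, not the authors' pipeline: trees are represented as nested Python tuples with leaf names as strings, and because the gene trees are treated here as unrooted, the archaea count as monophyletic when a single branch separates all archaeal leaves from all bacterial leaves. The function names and leaf labels are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, including the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def archaea_monophyletic(tree, archaea):
    """True if one branch splits the archaeal leaves from all other leaves."""
    all_leaves = leaves(tree)
    arch = frozenset(archaea) & all_leaves
    if not arch:
        return False
    bact = all_leaves - arch
    # In an unrooted tree the split may appear as an archaeal clade or,
    # depending on where the tuple happens to be rooted, a bacterial clade.
    return any(clade == arch or clade == bact for clade in clades(tree))


archaeal_leaves = {"A1", "A2", "A3"}

# Archaea form a clade: scored as monophyletic.
tree_mono = ((("A1", "A2"), "A3"), (("B1", "B2"), "B3"))

# Archaeal and bacterial sequences interleave: scored as non-monophyletic.
tree_mixed = (("A1", "B1"), ("A2", ("B2", ("B3", "A3"))))

# Same split as tree_mono, but the tuple is rooted inside the archaea.
tree_rerooted = ("A1", ("A2", ("A3", (("B1", "B2"), "B3"))))
```

Checking both the archaeal and the bacterial leaf set makes the result independent of where the tuple representation happens to be rooted.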
Among the 4,397 cases of archaeal monophyly, 1,082 trees contained sequences from only one bacterial genome or bacterial phylum (Extended Data Fig. 2), a distribution indicating gene export from archaea to bacteria. In the remaining 3,315 trees (Supplementary Table 3), the monophyletic archaea were nested within a broad bacterial gene distribution spanning many phyla. For 2,264 of those trees, the genes occur specifically in only one higher archaeal taxon (left portion of Fig. 2), but at the same time they are very widespread among diverse bacteria (lower panel of Fig. 2), clearly indicating that they are archaeal acquisitions from bacteria, or imports. Among the 2,264 imports, genes involved in metabolism (39%) are the most frequent (Supplementary Table 2).
Like the archaea-specific genes in Fig. 1, the imports in Fig. 2 correspond to the 13 archaeal groups. Does the origin of these groups coincide with the acquisition of the imports? If the imports were acquired at the origin of each group, their set of phylogenies should be similar to the set of phylogenies for the archaea-specific, or recipient, genes (Fig. 1) from the same group. As an alternative to a single origin to account for monophyly, the imports might have been acquired in one lineage and then spread through the group, in which case the recipient and import tree sets should differ. Using a Kolmogorov-Smirnov test adapted to non-identical leaf sets, we could not reject the null hypothesis H0 that the import and recipient tree sets were drawn from the same distribution for six of the 13 higher taxa: Thermoproteales (P = 0.32), Desulfurococcales (P = 0.3), Methanobacteriales (P = 0.96), Methanococcales (P = 0.19), Methanosarcinales (P = 0.16), and Haloarchaea (P = 0.22). By contrast, the slightest possible perturbation of the import set, one random prune and graft LGT event per tree, did reject H0 at P < 0.002 in those six cases, and very strongly (P < 10⁻⁴²) for the Haloarchaea, where the largest tree sample is available (Extended Data Fig. 3, Extended Data Table 1). For these six archaeal higher taxa, the origin of their group-specific bacterial genes and the origin of the group are indistinguishable.
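The logic of the comparison can be illustrated with a plain two-sample Kolmogorov-Smirnov test. The sketch below is not the adapted test used in the study, whose handling of non-identical leaf sets is not described in this text; it assumes that each tree has already been reduced to a single number (a hypothetical per-tree score), and it uses the standard asymptotic approximation for the P value, which can be inaccurate for small samples. All names and values are illustrative.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for value in xs + ys:
        fx = bisect_right(xs, value) / n  # empirical CDF of x at value
        fy = bisect_right(ys, value) / m  # empirical CDF of y at value
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided P value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The alternating series converges poorly near zero,
        # where the P value is indistinguishable from 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical per-tree scores for recipient and import gene trees.
recipient_scores = [0.10, 0.15, 0.20, 0.25, 0.30]
import_scores = [0.12, 0.18, 0.22, 0.27, 0.33]

d = ks_statistic(recipient_scores, import_scores)
p = ks_pvalue(d, len(recipient_scores), len(import_scores))
```

A large P value means the two score distributions cannot be distinguished, which corresponds to failing to reject H0 in the text; a perturbed tree set that shifts the scores would drive the P value down.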
In 4,074 trees, the archaea were not monophyletic (Extended Data Fig. 4; Supplementary Tables 4 and 5). Transfers in these phylogenies are not readily polarized and were scored neither as imports nor exports. Importantly, if we plot the gene distributions sorted for bacterial groups, rather than for archaeal groups, we do not find patterns similar to those defining the 13 archaeal groups. That is, we do not detect patterns that would correspond to the acquisition of archaeal genes at the origin of bacterial groups (Extended Data Fig. 5), indicating that gene transfers from archaea to bacteria, though they clearly do occur, do not correspond to the origin of major bacterial groups sampled here.
In archaeal systematics, Haloarchaea, Archaeoglobales, and Thermoplasmatales branch within the methanogens13,16, as in our reference tree (Fig. 2). All three groups hence derive from methanogenic ancestors. Previous studies have identified a large influx of bacterial genes into the halophile common ancestor17, and gene fluxes between archaea at the origin of these major clades16. Fig. 2 shows that the acquisition of bacterial genes corresponds to the origin of these three groups from methanogenic ancestors, all of which have relinquished methanogenesis and harbour organotrophic forms18,19. Among the 2,264 bacteria-to-archaea transfers, 1,881 (83%) have been acquired by methanogens or ancestrally methanogenic lineages, which comprise 55% of the present archaeal sample.
Neither the archaea-specific genes nor the bacterial acquisitions showed evidence for any pattern of higher order archaeal relationships or hierarchical clustering20 among the 13 higher taxa, with the exception of the crenarchaeote-euryarchaeote split (Extended Data Fig. 6). While 16,680 gene families (14,416 archaea-specific and 2,264 acquisitions) recover the groups themselves, only 4% as many genes (601: 491 archaea-specific and 110 acquisitions) recover any branch in the reference phylogeny linking those groups (Extended Data Fig. 7).
For 7,379 families present in 2-12 groups, we examined all 6,081,075 possible trees that preserve the crenarchaeote-euryarchaeote split by coding each group as an OTU (operational taxonomic unit) and scoring gene presence in one member of a group as present in the group. A random tree can account for 569 (8%) of the families, the best tree can account for 1,180 families (16%), while the reference tree accounts for 849 (11%) of the families (Extended Data Fig. 8). Thus, the gene distributions conflict with all trees and do not support a hierarchical relationship among groups.
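The text does not state how the figure of 6,081,075 trees is derived. One decomposition that reproduces it, offered here as an assumption rather than as the authors' stated procedure, treats the crenarchaeote-euryarchaeote split as the root and counts rooted binary topologies separately for 3 crenarchaeote OTUs and 9 euryarchaeote OTUs, using the standard result that k labelled leaves admit (2k − 3)!! rooted binary trees. The function names and the 3/9 partition are hypothetical.

```python
def odd_double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 for odd n."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def rooted_binary_trees(k):
    """Number of rooted binary topologies on k labelled leaves: (2k - 3)!!."""
    if k < 2:
        return 1
    return odd_double_factorial(2 * k - 3)


# Assumed partition of the OTUs on either side of the split.
n_crenarchaeote_otus = 3
n_euryarchaeote_otus = 9

# Topologies on each side combine independently across the fixed split.
total_trees = (
    rooted_binary_trees(n_crenarchaeote_otus)
    * rooted_binary_trees(n_euryarchaeote_otus)
)
```

Under this assumed partition the product is 3 × 2,027,025 = 6,081,075, which matches the number of trees examined.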
Figure 3 shows the phylogenetic structure (gray branches) that is recovered by the individual phylogenies of the 70 genes that were used to make the reference tree. It reveals a tree of tips21 in that, for deeper branches, no individual gene tree manifests the deeper branches of the concatenation tree. Even the crenarchaeote-euryarchaeote split is not recovered because of the inconsistent position of Thaumarchaea and Nanoarchaea. Projected upon the tree of tips are the bacterial acquisitions that correspond to the origin of the 13 archaeal groups studied here.
The direction of transfers between the two prokaryotic domains is highly asymmetric. The 2,264 imports plotted in Fig. 3 are transfers from bacteria to archaea, occurring only in one archaeal group (Extended Data Table 2, Supplementary Table 6). Yet only 391 converse transfers, exports from archaea to bacteria, were observed (Extended Data Table 2), the bacterial genomes most frequently receiving archaeal genes occurring in Thermotogae (Supplementary Table 7). Transfers from bacteria to archaea are thus >5-fold more frequent than vice versa; when scaled for an equal number of bacterial and archaeal genomes, transfers from bacteria to archaea are 10.7-fold more frequent (see Supplementary Information). The bacteria-to-archaea transfers comprise predominantly metabolic functions, with amino acid transport and metabolism (208 genes), energy production and conversion (175 genes), carbohydrate transport and metabolism (139 genes) and inorganic ion transport and metabolism (123 genes) being the four most frequent functional classifications (Extended Data Table 2).
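The unscaled asymmetry and the functional breakdown can be recomputed directly from the import and export counts in Extended Data Table 2, as in the following short Python sketch. The variable names are illustrative; the counts are those of the metabolism rows and the totals of the table. The sample-scaled 10.7-fold figure is not reproduced because its calculation is given only in the Supplementary Information.

```python
# Import counts for the metabolism categories (Extended Data Table 2).
metabolic_imports = {
    "Secondary metabolites": 30,
    "Nucleotide transport and metabolism": 41,
    "Lipid transport and metabolism": 72,
    "Coenzyme transport and metabolism": 97,
    "Inorganic ion transport and metabolism": 123,
    "Carbohydrate transport and metabolism": 139,
    "Energy production and conversion": 175,
    "Amino acid transport and metabolism": 208,
}

total_imports = 2264  # bacteria-to-archaea transfers
total_exports = 391   # archaea-to-bacteria transfers

# Unscaled ratio of imports to exports.
import_export_ratio = total_imports / total_exports

# Share of imports assigned to metabolic categories.
metabolic_total = sum(metabolic_imports.values())
metabolic_share = metabolic_total / total_imports

# Four most frequent metabolic categories among imports.
top_four = sorted(metabolic_imports, key=metabolic_imports.get, reverse=True)[:4]
```

The ratio comes to about 5.8, consistent with the ">5-fold" statement, and the metabolic categories sum to 885 of 2,264 imports, or about 39%.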
The extreme asymmetry in interdomain gene transfers likely relates to the specialized lifestyle of methanogens, which served as recipients for 83% of the polarized gene transfers observed (Supplementary Table 8). Hydrogen-dependent methanogens are specialized chemolithoautotrophs; the route to more generalist organotrophic lifestyles that are not H2-CO2 dependent entails either gene invention or gene acquisition. For Haloarchaea, Archaeoglobales and Thermoplasmatales, gene acquisition from bacteria provided the key innovations that transformed methanogenic ancestors into founders of novel higher taxa with access to new niches, although several methanogen lineages have acquired numerous bacterial genes22 but have retained the methanogenic lifestyle.
Gene transfers from bacteria to archaea not only underpin the origin of major archaeal groups, they also underpin the origin of eukaryotes, because the host that acquired the mitochondrion was, phylogenetically, an archaeon23,24. Our current findings support the theory of rapid expansion and slow reduction currently emerging from studies of genome evolution25. Subsequent to genome expansion via acquisition, lineage-specific gene loss predominates, as evident in Figs. 1 and 2. In principle, the bacterial genes that correspond to the origin of major archaeal groups could have been acquired by independent LGT events9,14, via unique combinations in founder lineage pangenomes3,4, or via mass transfers involving symbiotic associations, similar to the origin of eukaryotes23,24. For lineages in which the origin of bacterial genes and the origin of the higher archaeal taxon are indistinguishable, the latter two mechanisms seem more likely.
Extended Data
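Extended Data Table 1. Comparison of sets of trees for single-copy genes in 11 archaeal groups. Entries in the four comparison columns are P values from the Kolmogorov-Smirnov test described in the text.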
Archaeal groups | Number of taxa | Number of genes | Recipients vs. Imports | Recipients vs. 1 LGT | Recipients vs. 2 LGT | Recipients vs. Random |
---|---|---|---|---|---|---|
Thermoproteales | 13 | 29 | 0.32 | 6.80E-07 | 4.20E-05 | 1.50E-07 |
Desulfurococcales | 13 | 21 | 0.3 | 3.10E-06 | 1.60E-05 | 5.50E-07 |
Sulfolobales | 17 | 77 | 0.062 | 0.2 | 1.50E-03 | 3.70E-15 |
Thermococcales | 14 | 65 | 0.00081 | 4.30E-11 | 1.60E-09 | 1.40E-16 |
Methanobacteriales | 8 | 34 | 0.96 | 5.50E-09 | 2.60E-08 | 2.60E-08 |
Methanococcales | 15 | 54 | 0.19 | 0.0017 | 9.90E-06 | 3.10E-10 |
Thermoplasmatales | 4 | 5 | 1 | 0.036 | 0.7 | 0.036 |
Archaeoglobales | 4 | 9 | 0.6 | 0.6 | 0.6 | 1 |
Methanococcales | 15 | 54 | 0.19 | 0.0017 | 9.90E-06 | 3.10E-10 |
Methanosarcinales | 10 | 70 | 0.16 | 6.90E-12 | 8.00E-11 | 8.80E-15 |
Halobacteriales | 23 | 594 | 0.22 | 8.40E-43 | 1.00E-71 | 1.10E-146 |
Extended Data Table 2. Functional annotations for archaeal genes according to gene family distribution and phylogeny.
Function | COG category | Archaea-specific | Monophyletic (M) | Non-monophyletic (NM) | Exports (Exp) | Imports (Imp) |
---|---|---|---|---|---|---|
Information | Chromatin structure and dynamics | 14 | 1 | 5 | 1 | 1 |
Information | Translation, ribosome biogenesis | 263 | 84 | 50 | 9 | 27 |
Information | Replication, recombination and repair | 375 | 126 | 185 | 17 | 69 |
Information | Transcription | 524 | 124 | 113 | 10 | 81 |
Cellular | Defense mechanisms | 48 | 62 | 116 | 4 | 45 |
Cellular | Cell cycle, division, chromosome partitioning | 79 | 22 | 15 | 2 | 13 |
Cellular | Trafficking, secretion, vesicular transport | 97 | 17 | 6 | 3 | 6 |
Cellular | Cell motility | 146 | 40 | 29 | 8 | 33 |
Cellular | Cell wall/membrane/envelope biogenesis | 197 | 143 | 203 | 10 | 91 |
Cellular | Protein turnover, chaperones | 236 | 85 | 137 | 18 | 61 |
Cellular | Signal transduction mechanisms | 308 | 120 | 129 | 16 | 101 |
Metabolism | Secondary metabolites | 10 | 46 | 35 | 0 | 30 |
Metabolism | Nucleotide transport and metabolism | 44 | 53 | 105 | 7 | 41 |
Metabolism | Lipid transport and metabolism | 62 | 113 | 117 | 6 | 72 |
Metabolism | Coenzyme transport and metabolism | 168 | 143 | 219 | 11 | 97 |
Metabolism | Inorganic ion transport and metabolism | 232 | 176 | 265 | 16 | 123 |
Metabolism | Carbohydrate transport and metabolism | 118 | 205 | 227 | 14 | 139 |
Metabolism | Energy production and conversion | 334 | 254 | 403 | 25 | 175 |
Metabolism | Amino acid transport and metabolism | 177 | 278 | 440 | 26 | 208 |
No annotation | General function prediction only | 949 | 434 | 560 | 49 | 297 |
No annotation | Function unknown | 12602 | 789 | 715 | 139 | 554 |
Total | | 16983 | 3315 | 4074 | 391 | 2264 |
Acknowledgements
We gratefully acknowledge funding from the European Research Council (ERC 232975 to W.F.M.), the graduate school E-Norm of the University of Düsseldorf (W.F.M.), the DFG (Scho 316/11-1 to P.S.; SI 642/10-1 to B.S.), and BMBF (0316188A to B.S.). G.L. is supported by an ERC grant (281357 to Tal Dagan). D.B. thanks the Alexander von Humboldt Foundation for a Fellowship. Computational support of the Zentrum für Informations- und Medientechnologie (ZIM) at the University of Düsseldorf is gratefully acknowledged.
Footnotes
Author Information: Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests.
References
- 1. Doolittle WF, Papke RT. Genomics and the bacterial species problem. Genome Biol. 2006;7:116. doi: 10.1186/gb-2006-7-9-116.
- 2. Retchless AC, Lawrence JG. Temporal fragmentation of speciation in Bacteria. Science. 2007;317:1093–1096. doi: 10.1126/science.1144876.
- 3. Achtman M, Wagner M. Microbial diversity and the genetic nature of microbial species. Nat. Rev. Microbiol. 2008;6:431–440. doi: 10.1038/nrmicro1872.
- 4. Fraser C, Alm EJ, Polz MF, Spratt BG, Hanage WP. The bacterial species challenge: making sense of genetic and ecological diversity. Science. 2009;323:741–746. doi: 10.1126/science.1159388.
- 5. Puigbo P, Wolf YI, Koonin EV. The tree and net components of prokaryote genome evolution. Genome Biol. Evol. 2010;2:745–756. doi: 10.1093/gbe/evq062.
- 6. Dagan T. Phylogenomic networks. Trends Microbiol. 2011;19:483–491. doi: 10.1016/j.tim.2011.07.001.
- 7. Hess WR. Genome analysis of marine photosynthetic microbes and their global role. Curr. Opin. Biotechnol. 2004;15:191–198. doi: 10.1016/j.copbio.2004.03.007.
- 8. Kloesges T, et al. Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths. Mol. Biol. Evol. 2011;28:1057–1074. doi: 10.1093/molbev/msq297.
- 9. Williams D, Gogarten JP, Papke RT. Quantifying homologous replacement of loci between haloarchaeal species. Genome Biol. Evol. 2012;4:1223–1244. doi: 10.1093/gbe/evs098.
- 10. Woese CR. Bacterial evolution. Microbiol. Rev. 1987;51:221–271. doi: 10.1128/mr.51.2.221-271.1987.
- 11. Rivera MC, Jain R, Moore JE, Lake JA. Genomic evidence for two functionally distinct gene classes. Proc. Natl. Acad. Sci. USA. 1998;95:6239–6244. doi: 10.1073/pnas.95.11.6239.
- 12. Puigbo P, Wolf YI, Koonin EV. Search for a tree of life in the thicket of the phylogenetic forest. J. Biol. 2009;8:59. doi: 10.1186/jbiol159.
- 13. Brochier-Armanet C, Forterre P, Gribaldo S. Phylogeny and evolution of the Archaea: One hundred genomes later. Curr. Opin. Microbiol. 2011;14:274–281. doi: 10.1016/j.mib.2011.04.015.
- 14. Lake JA, Rivera MC. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol. Biol. Evol. 2004;21:681–690. doi: 10.1093/molbev/msh061.
- 15. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584. doi: 10.1093/nar/30.7.1575.
- 16. Wolf YI, Makarova KS, Yutin N, Koonin EV. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol. Direct. 2012;7:46. doi: 10.1186/1745-6150-7-46.
- 17. Nelson-Sathi S, et al. Acquisitions of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. USA. 2012;109:20537–20542. doi: 10.1073/pnas.1209119109.
- 18. Bräsen C, Esser D, Rauch B, Siebers B. Carbohydrate metabolism in Archaea: Current insights into unusual enzymes and pathways and their regulation. Microbiol. Mol. Biol. Rev. 2014;78:89–175. doi: 10.1128/MMBR.00041-13.
- 19. Siebers B, Schönheit P. Unusual pathways and enzymes of central carbohydrate metabolism in Archaea. Curr. Opin. Microbiol. 2005;8:695–705. doi: 10.1016/j.mib.2005.10.014.
- 20. Doolittle WF, Bapteste E. Pattern pluralism and the tree of life hypothesis. Proc. Natl. Acad. Sci. USA. 2007;104:2043–2049. doi: 10.1073/pnas.0610699104.
- 21. Creevey CJ, et al. Does a tree-like phylogeny only exist at the tips in the tree of prokaryotes? Proc. R. Soc. B. 2004;271:2551–2558. doi: 10.1098/rspb.2004.2864.
- 22. Deppenmeier U, et al. The genome of Methanosarcina mazei: Evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 2002;4:453–461.
- 23. Williams TA, Foster PG, Cox CJ, Embley TM. An archaeal origin of eukaryotes supports only two primary domains of life. Nature. 2013;504:231–236. doi: 10.1038/nature12779.
- 24. McInerney JO, O’Connell MJ, Pisani D. The hybrid nature of eukaryota and a consilient view of life on Earth. Nat. Rev. Microbiol. 2014;12:449–455. doi: 10.1038/nrmicro3271.
- 25. Wolf YI, Koonin EV. Genome reduction as the dominant mode of evolution. BioEssays. 2013;35:829–837. doi: 10.1002/bies.201300037.
- 26. Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- 27. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637. doi: 10.1126/science.278.5338.631.
- 28. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277. doi: 10.1016/s0168-9525(00)02024-2.
- 29. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 2003;52:696–704. doi: 10.1080/10635150390235520.
- 30. Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005;21:456–463. doi: 10.1093/bioinformatics/bti191.