Abstract
The archaeal ancestor scenario (AAS) for the origin of eukaryotes implies the emergence of a new kind of organism from the fusion of ancestral archaeal and bacterial cells. Equipped with this “chimeric” molecular arsenal, the resulting cell would gradually accumulate unique genes and develop the complex molecular machineries and cellular compartments that are hallmarks of modern eukaryotes. In this regard, proteins related to phagocytosis and cell movement should be present in the archaeal ancestor, thus identifying the recently described candidate archaeal phylum “Lokiarchaeota” as resembling a possible candidate ancestor of eukaryotes. Despite its appeal, AAS seems incompatible with the genomic, molecular, and biochemical differences that exist between Archaea and Eukarya. In particular, the distribution of conserved protein domain structures in the proteomes of cellular organisms and viruses appears hard to reconcile with the AAS. In addition, concerns related to taxon and character sampling, presupposing bacterial outgroups in phylogenies, and nonuniform effects of protein domain structure rearrangement and gain/loss in concatenated alignments of protein sequences cast further doubt on AAS-supporting phylogenies. Here, we evaluate AAS against the traditional “three-domain” world of cellular organisms and propose that the discovery of Lokiarchaeota could be better reconciled under the latter view, especially in light of several additional biological and technical considerations.
1. Introduction
The discovery of the novel candidate archaeal phylum “Lokiarchaeota” from metagenomic samples taken from sites near Loki's Castle hydrothermal vents of the Arctic Ocean was recently reported [1]. There are two interesting aspects to this discovery: (i) several eukaryotic signature proteins (ESPs) related to membrane remodeling, cell division, and the cytoskeleton, previously thought to be either absent or rare in akaryotes (Archaea and Bacteria; sensu [2]), were detected in the composite Lokiarchaeota genomes (Loki 1, Loki 2, and Loki 3), and (ii) phylogenomic analyses of a concatenated alignment of 36 conserved proteins revealed that eukaryotes and Lokiarchaeota grouped together within Archaea, suggesting an archaeal ancestor scenario (AAS) for the origin of eukaryotes [3]. The AAS thus favors a two-domain (2D) view of the tree of life (ToL) where eukaryotes emerge from within Archaea, specifically as a sister group to the proposed TACKL (including Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota, and Lokiarchaeota) superphylum [4, 5], after a likely merger of archaeal microbes (resembling Lokiarchaeota) and the mitochondrial ancestors [6].
AAS is fast becoming an accepted scenario to explain deep evolutionary history (e.g., [7–9]) and the origin of eukaryotic cells [10, 11]. Except for some dissenting opinions [12], Lokiarchaeota is now commonly viewed as the “missing link” in the transition from “simple” to “complex” life [1]. However, several key differences in the membrane biology, biochemistry, and virospheres of Archaea and Eukarya seem at odds with AAS (see [13] for a recent review). Simultaneous ToL reconstructions from concatenated ribosomal proteins and the small-subunit ribosomal RNA (SSU rRNA) gene produced conflicting topologies, with the former supporting the AAS and the latter recovering the “Woesian” three-domain (3D) ToL [14] of cellular diversification into domains Archaea, Bacteria, and Eukarya [15]. Because protein sequences are generally more conserved than nucleic acid sequences, SSU rRNA genes possess a relatively lower number of informative sites and a higher rate of evolution compared to concatenated ribosomal protein sets. SSU rRNA genes are therefore likely more sensitive to known issues such as the notorious long-branch-attraction (LBA) artifact [16]. In turn, ribosomal proteins exhibit strong compositional biases among the cellular domains of life that need to be better understood [15]. While the study provided an “updated” view of the ToL incorporating hundreds of uncultivated representatives of archaeal and bacterial genera (the so-called “microbial dark matter” [17]) into ToL reconstructions, the authors remained indecisive in picking either the 2D (from concatenated ribosomal proteins) or the 3D (from SSU rRNA) ToL to explain the origin of eukaryotes beyond any doubt [15].
The AAS is also in conflict with several historical phylogenetic and phylogenomic frameworks such as phylogenies built from SSU rRNA sequences [14], single-gene alignments of ancient paralogous genes [18, 19], gene content and order [20, 21], concatenated gene [22] and protein domain [23, 24] sets, and abundance combination and architecture of protein structural domains in modern genomes [23, 25, 26] that have consistently supported the 3D ToL despite disagreements on the location of the root of the ToL and the fact that most generated trees are unrooted [27–30].
It has been argued, however, that the use of “advanced” models of sequence evolution with relaxed assumptions of homogeneous amino acid compositions of gene products across sites and branches is necessary to recover the origin of Eukarya from within Archaea (see [31] for a recent review). However, the presence of distant outgroups (e.g., bacterial ribosomal proteins that are quite divergent from archaeal-eukaryotic counterparts but are used to root the ToLs) and fast-evolving species (e.g., Nanoarchaeota [32] and Methanopyrus kandleri [33]) in datasets can make even these sophisticated methods prone to LBA, as shown by recent simulations [29] (see also [34]). Moreover, a concatenated (i.e., supermatrix) approach to phylogenetics, as applied by Spang et al. [1] to support AAS, could be problematic, especially when member genes have independent evolutionary histories. Simulations have shown that concatenated gene sets can produce aberrant trees with high bootstrap (BS) support [35]. The approach is also susceptible to heterotachy (i.e., unequal evolutionary rates among genes in a concatenated set) [35, 36], which can complicate inferring deep evolutionary relationships and can introduce distortions to interdomain calculations, among other issues (see Section 5). In light of these considerations, here we examine the evidence supporting the 2D scenario for the diversification of cellular life, perform taxa and character manipulations to reanalyze the dataset of Spang et al. [1] that supported the Lokiarchaeota-Eukarya sisterhood, and consider several biological and technical issues that weaken the 2D in favor of the 3D ToL.
2. Eukaryotic Genomes Are More Complex Than Mere Archaea-Bacteria Genomic Chimeras
AAS remains popular due to the purported chimeric nature of eukaryotic genomes [5, 37]. For example, Guy et al. (2014) wrote, “The apparent genomic chimerism in eukaryotic genomes is currently best explained by invoking a cellular fusion at the root of the eukaryotes that involves one archaeal and one or more bacterial components” [3]. Indeed, eukaryotic genomes include many genes that have homologs in Archaea and Bacteria. Genes exhibiting bacterial affinity generally perform metabolic functions while those with archaeal affinity perform informational roles (i.e., DNA replication, transcription, and translation) [37], though exceptions to this “rule” exist (see [38] for a recent review). The proponents of AAS claim that chimerism in eukaryotic genomes is best explained by invoking the transformation of an archaeon (host cell) into a eukaryote by the engulfment of the bacterial ancestor of mitochondria [1]. Thus, a new kind of cell would originate from fusion between two different kinds of cells, a scenario contested to be biologically implausible (see [13] for a recent review).
A coarse-grained examination of eukaryotic genomes also indicates that chimerism is apparently an oversimplification. For example, in addition to Archaea-like and Bacteria-like genes, eukaryotic genomes house a significant number of viral genes and viral-like retrotransposable genetic elements that are likely remnants of ancient viral infections [39, 40]. This viral-like genetic material should therefore imply a “third” partner contributing towards genomic chimerism in eukaryotes. Under AAS, this new partner must invade the eukaryotic genome (or originate de novo) after the proposed fusion event because eukaryotic RNA and retrotranscribing virus families have hitherto not been described in Archaea (see Figure 1 in [41]). This poses a conceptual problem because modern RNA viruses are likely relics of ancient RNA viruses that played significant roles in evolutionary history, perhaps even contributing to the discovery of DNA [42]. Moreover, a substantial number of eukaryotic core genes lack any homologs in akaryotes and were believed to be present in the last common eukaryotic ancestor (up to 40% according to [43]). Remarkably, Eukarya-specific and viral-like genes quantitatively exceed Archaea/Bacteria-like genes in eukaryotic genomes and not all Bacteria-like genes descended from the mitochondrial ancestor (Section 3). At first glance, these observations suggest that the Archaea-Bacteria chimerism is not an a priori requirement to explain eukaryogenesis. Instead, it rather underestimates the distinctive and global nature of eukaryotic genomes.
3. AAS Is Not Supported by Protein Structure Data
A dissection of the proteomic makeup of 383 completely sequenced eukaryal proteomes reveals the global nature of eukaryotic proteomes (Figure 1). A total of 1,661 protein domain fold superfamilies (FSFs) coded by eukaryotic proteomes can be divided into eight mutually exclusive groups: ABEV (universal), ABE (universal in cells), BEV (all except Archaea), AEV (all except Bacteria), AE (only in Archaea and Eukarya), BE (only in Bacteria and Eukarya), EV (only in Eukarya and viruses), and E (unique to eukaryotes) (Figure 1). FSFs, as defined by the Structural Classification of Proteins (SCOP) database [52, 53], are collections of distantly related protein domains that share recognizable structural and biochemical similarities indicative of divergence from ancestral domain structures. FSFs are thus highly conserved molecular characters that are useful tools to examine deep evolutionary relationships, especially because protein structure is more refractory to change compared to gene and protein sequences that are prone to mutational saturation over long evolutionary distances [54–56].
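The partition into eight mutually exclusive groups can be illustrated with a minimal Python sketch. It assumes that each FSF has already been assigned to the set of supergroups in which it was detected; the FSF identifiers, the toy data, and the function names are hypothetical and are not taken from the original analysis.

```python
from collections import Counter

# Hypothetical occurrence data: FSF identifier -> supergroups in which the FSF
# was detected (A = Archaea, B = Bacteria, E = Eukarya, V = viruses).
# These identifiers are illustrative placeholders, not real annotations.
fsf_occurrence = {
    "fsf_1": {"A", "B", "E", "V"},
    "fsf_2": {"A", "B", "E"},
    "fsf_3": {"B", "E"},
    "fsf_4": {"A", "E"},
    "fsf_5": {"E", "V"},
    "fsf_6": {"E"},
    "fsf_7": {"A", "B"},  # absent from Eukarya, so outside the eight groups
}


def eukaryotic_group(supergroups):
    """Return the group label (e.g. 'ABEV', 'BE', 'E'), or None if not in Eukarya."""
    if "E" not in supergroups:
        return None
    # Fixed letter order yields the labels used in the text (ABEV, AEV, BEV, ...).
    return "".join(g for g in "ABEV" if g in supergroups)


def count_groups(occurrence):
    """Count how many eukaryotic FSFs fall into each mutually exclusive group."""
    labels = (eukaryotic_group(s) for s in occurrence.values())
    return Counter(label for label in labels if label is not None)


group_counts = count_groups(fsf_occurrence)
```

Because Eukarya is always present and each of the other three supergroups is either present or absent, exactly 2 × 2 × 2 = 8 labels are possible, which is why the groups are mutually exclusive and exhaustive for eukaryotic FSFs.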
The AE, BE, and EV groups are of particular interest to this discussion as they imply sharing of homologous FSFs in only two sets of proteomes. The numbers alone are interesting as there is an 8-fold difference in the number of eukaryotic FSFs shared only with Bacteria compared with those shared only with Archaea (283 BE versus 34 AE). This bias challenges both the AAS [1] and the traditionally accepted Archaea/Eukarya sisterhood [14], as one should expect greater sharing between Archaea and Eukarya under these models. Moreover, the EV group even outnumbers the AE FSFs (40 versus 34). While it has been argued that viruses frequently pickpocket cellular genes [57], this historical “belief” has been challenged by several large-scale bioinformatics explorations that suggest gene flow from viruses to cells in fact exceeds gene transfer in the opposite direction [46, 58, 59]. Viruses can also create new genes during intracellular replication using host cell machinery (e.g., ~70–80% of viral genes lack cellular homologs; see Figure 1 in [46]) and some of these genes can later be coopted by cellular genomes (refer to the “virocell” concept [60]). Indeed, 16 out of 38 (42%) EV FSFs perform Other functions, a functional category that includes proteins with either unknown or viral functions, suggesting they did not originate in Eukarya (Figure 2). Eukaryotic proteomes also encode a substantial number of unique FSFs (281, ~17% of total eukaryotic FSFs) that confirm that eukaryotic genomes are not mere chimeras of genes mixed from different sources but are more complex than anticipated under the AAS model. In fact, the Lokiarchaeum genome (Loki 1) adds only 10 new FSFs to the archaeal repertoire [12] suggesting that the “bridge” between Archaea and Eukarya remains wide, especially when inferring homology at protein structure level.
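The proportions quoted in this paragraph can be checked with simple arithmetic. The short Python sketch below uses only the counts stated in the text; the variable names are ours.

```python
# Counts quoted in the text for FSFs encoded by eukaryotic proteomes.
be, ae, ev = 283, 34, 40           # FSFs shared only with Bacteria, Archaea, viruses
unique_e, total = 281, 1661        # Eukarya-specific FSFs and all eukaryotic FSFs

be_to_ae = be / ae                 # fold difference between BE and AE
unique_share = unique_e / total    # share of Eukarya-specific FSFs
other_share = 16 / 38              # EV FSFs in the "Other" functional category

print(f"BE/AE fold difference: {be_to_ae:.1f}")       # 8.3
print(f"Eukarya-specific share: {unique_share:.0%}")  # 17%
print(f"EV 'Other' share: {other_share:.0%}")         # 42%
```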
It can, however, be argued that the presence of the same FSF in two different sets of proteomes could be due to horizontal gene transfer (HGT) or convergent evolution. However, similar concerns are also applicable to BLAST-based inferences of homology, especially because top BLAST hits are not necessarily orthologous [61]. Importantly, convergent evolution of protein folds is extremely rare [62] because the protein backbone is formed by unique “fingerprint” designs achieved through interactions between amino acid side chains. Due to the direct evolutionary constraint to maintain the overall biochemical function of proteins, disruptions in the protein structural backbone are generally resisted for longer periods of evolutionary time [55, 56, 63]. Moreover, the odds of originating convergent “fingerprints” are very small [62] and there is no reason to suggest that protein structure is relatively more influenced by nonvertical evolution than gene sequences (see [54] and the references therein). In fact, the recent expansion in the availability of deposited protein structures in structure databases (123,273 structural entries in RCSB Protein Data Bank [64] as of October 5, 2016) offers the unique opportunity to revise life history using an alternative and likely more reliable set of molecular characters.
4. Protein Domain Fold Superfamilies (FSFs) Shared Only by Bacteria and Eukarya (BE) Are Not Restricted to Metabolic Roles
The endosymbiosis of the mitochondrial ancestor likely contributed many metabolic genes to modern eukaryotic genomes [65, 66] and could therefore influence the large size of the BE group (Figure 1). This prompted us to inspect the functional makeup of the AE, BE, and EV groups (Figure 2). Interestingly, BE was not restricted solely to metabolic FSFs but included an ensemble of informational, general, and other FSFs involved in intracellular and extracellular processes (Figure 2). In fact, metabolic FSFs constituted only 31% of BE FSFs (72 out of 233) highlighting the partial contribution of metabolism-inspired gene transfer and enzymatic recruitment to the composition of the BE group. Moreover, eukaryotes shared more informational FSFs with Bacteria than Archaea (29 versus 10). The data therefore suggest that mitochondrial endosymbiosis does not fully account for the large numerical difference in the sizes of BE and AE FSFs. Instead, Bacteria-like eukaryotic genes can alternatively be explained by a combination of (i) endosymbiosis in a protoeukaryotic ancestor (i.e., not an archaeon), (ii) recent HGTs between bacterial and eukaryotic species, and/or (iii) Bacteria-Eukarya sisterhood in an alternative topology of the 3D ToL [28, 67, 68], without the need to invoke the AAS. It is important to note that, despite several concerns and the use of methods that do not root ToLs (reviewed in [29]), the early origin of Bacteria is taken by default or as a fact under AAS and corresponding phylogenetic trees are rooted using bacterial outgroup sequences. This rooting is ad hoc and could be problematic because it ignores a large body of work challenging the “traditional” bacterial rooting of the ToL [28, 30]. In other words, Bacteria and Eukarya share a wide range of molecular (283 FSFs) and biochemical features (e.g., similar lipid membranes) indicating perhaps a more complex evolutionary history than that explained by chimerism or nonvertical evolution [28].
Similarly, Archaea-like genes in eukaryotes can be explained under the Woesian 3D scenario by invoking a sister group relationship between Archaea and Eukarya, a view historically supported by phylogenies rooted with many paralogous gene sequences [18, 19]. Notably, this topology also accounts for the presence of several ESPs that are scattered in various members of Archaea [1]. Other alternatives involve the origin of the three cellular domains from a complex ancestor of life [69, 70] followed by selective loss of Archaea-like eukaryotic genes in Bacteria and loss of Bacteria-like genes in Archaea (e.g., [71]). For example, the distribution of FSFs in Archaea, Bacteria, Eukarya, and viruses revealed the existence of a shared “universal” core comprising 54% of total FSFs (903 ABE and ABEV FSFs out of a total of 1,661) (Figure 1). The large size of the universal core favors the view that the last common ancestor of cells (and viruses) was already more complex than anticipated (see also [72, 73]). Hence, the differential loss of genes can also account for their absence in one of the three cellular domains of life, especially because many akaryotic species are believed to evolve via genome reduction [74–76]. In summary, even ignoring evidence from FSF distributions, alternative explanations can account for the purported chimerism that is at the root of AAS models suggesting that chimerism could be an oversimplified interpretation of eukaryotic genomes.
5. Technical Issues Related to Taxon and Character Sampling Question AAS
Next, we focus on the more technical aspects of the AAS. It is true that simple genomic comparisons, such as those of FSF distributions, are no substitutes for formal phylogenetic studies (though they have been supported by comparative and phylogenomic exercises [28]). As a case study, we evaluated the technical design of the study of Spang et al. [1]. The authors recovered a clade of Lokiarchaeota and Eukarya from trees reconstructed from a concatenated alignment of 36 “universal” proteins in 104 taxa (84 Archaea, 10 Bacteria, and 10 Eukarya, hereafter the 84-10-10 dataset). We focus our discussion on two aspects of their tree reconstruction: (i) taxon sampling and (ii) the use of concatenated alignments (i.e., character sampling and assembly).
Taxon sampling is extremely important for the success of phylogenomic reconstructions as biased and uneven sampling can easily mislead evolutionary interpretations. As Delsuc et al. (2005) wrote, “garbage in, garbage out” [77], implying that even the best algorithms can produce false results when taxa/characters do not sufficiently represent extant biodiversity or are known to be problematic. First, overrepresentation of archaeal taxa and sparse selection of bacterial and eukaryal species (i.e., 84-10-10 in [1]) could be problematic, especially because the dataset includes several archaeal species that are sole members of their phylum (e.g., Candidatus Korarchaeum cryptofilum), have unknown taxonomic affiliations (e.g., Nanoarchaeota [32, 78]), and/or are fast-evolving (Nanoarchaeota [32], M. kandleri [33]). Ideally, taxa should be sampled randomly, equally, and densely from each major group of organisms, their number should be increased for reliable tree reconstruction [79, 80], and fast-evolving members should be excluded [34, 81]. This is showcased by the basal positions of M. kandleri and Thermotoga maritima within the archaeal and bacterial subtrees in Spang et al.'s (2015) trees (Figure 2 in [1]). M. kandleri is a fast-evolving archaeon and its basal position in most phylogenetic trees is now considered a technical artifact [33, 82]. Similarly, the examination of slow-evolving sites in rRNA sequences has revised the phylogenetic placement of T. maritima [83] (see also [84]). To dissect these issues, we produced an unrooted distance-based phylogenomic network from the 84-10-10 Archaea-Bacteria-Eukarya concatenated sequence dataset [1]. Interestingly, the network did not group Eukarya within Archaea, recovering instead the 3D view of life (Figure 3(a)).
Separately, we reconstructed distance networks from the occurrence (i.e., presence or absence) of universal FSFs (ABE) and FSFs shared by Archaea and Eukarya (34 AE) in 102 taxa sampled randomly and equally from the three cellular domains (i.e., 34 taxa each). Again, and despite the AE FSFs biasing reconstructions towards the AAS model, eukaryotes retained their unique identity and did not form a group within the archaeal subtree (Figure 3(b)).
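As an illustration of how presence/absence data of this kind can be turned into input for a distance network, the following Python sketch computes pairwise distances between taxa from their FSF occurrence profiles. The text does not state which distance measure was used, so the Jaccard distance here is an assumption, and the taxon names and FSF labels are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical presence/absence profiles: taxon -> set of FSFs detected.
profiles = {
    "archaeon_1": {"f1", "f2", "f3"},
    "archaeon_2": {"f1", "f2", "f4"},
    "eukaryote_1": {"f1", "f5", "f6", "f7"},
    "bacterium_1": {"f1", "f8"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Return {(taxon_i, taxon_j): distance} for every unordered pair of taxa."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


distances = distance_matrix(profiles)
```

A matrix of this form is the kind of input that distance-based network methods operate on; the choice of measure can affect the resulting network and would need to match the one actually used.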
While distance-based methods are not good substitutes for the sophisticated maximum likelihood (ML) and Bayesian analyses (used by Spang et al. [1]) that are less sensitive to LBA and account for relaxed assumptions of amino acid substitutions across sites and branches, they can be useful indicators of underlying conflicts between data and trees and can reveal the existence of reticulations [85]. Importantly, robust retrodictions should provide congruent reconstructions from parametric, nonparametric, and distance methods. Nevertheless, to test the impact of archaeal sampling on the robustness of tree topology, we repeated the phylogenetic analyses by producing 10 new datasets from the 84-10-10 dataset, sampling each time all 10 bacterial and eukaryal species but randomly extracting 10 archaea roughly representative of the known archaeal diversity (i.e., 3 Crenarchaeota, 3 Euryarchaeota, 1 Korarchaeota, 1 Aigarchaeota, 1 Thaumarchaeota, and 1 Lokiarchaeota; Figures S1–S10). Lokiarchaeum (Loki 1) was chosen as the Lokiarchaeota representative for these reconstructions. Despite using the same concatenated alignment of Spang et al. [1], balancing the number of taxa from each domain (i.e., the 10-10-10 datasets) had an immediate effect on the recovered phylogenies. In fact, 7 out of 10 reconstructed ML trees yielded monophyletic Archaea without any mixing of eukaryotic taxa (Figures S2–S8). For the remaining 3 trees that supported paraphyletic Archaea (Figures S1, S9, and S10), we observed that M. kandleri (a fast-evolving archaeon) was part of two reconstructions (Figures S9 and S10), indicating that this organism could distort tree topology. For the third tree that recovered paraphyletic Archaea (but in the absence of M. kandleri, Figure S1), we observed that group I euryarchaeotes (e.g., Thermococcales and Methanogens group I) were missing among the sampled archaeal taxa. Notably, the dataset of Figure S5, which included M. kandleri but did not produce paraphyletic Archaea, included both group I (i.e., Methanococcus maripaludis, Methanococcales) and group II (Ferroplasma acidiphilum, Thermoplasmatales) euryarchaeotes, confirming our initial observation that taxon sampling should be broad and inclusive of all groups with careful exclusion of fast-evolving species. Therefore, we produced 3 new phylogenies for the problematic datasets (i.e., Figures S1, S9, and S10) by replacing M. kandleri and Candidatus K. cryptofilum (the unique member of the putative phylum Korarchaeota, Figures S9 and S10) and Cand. K. cryptofilum and Picrophilus torridus (a group II euryarchaeote, Figure S1) by two sequences from group I Euryarchaeota (see trees in Figure 4). These revised datasets recovered the monophyly of Archaea (BS > 80%) and produced 3D ToLs (Figure 4). Our experimentation therefore hinted that the AAS (or 2D ToL) could perhaps be an outcome of including fast-evolving species and/or incomplete/unbalanced taxon sampling in phylogenetic datasets that could bias even the latest and most sophisticated methods of tree reconstruction. Indeed, recent simulations have revealed that even Bayesian inferences could be prone to LBA when outgroups are too distant [29], a case, for example, when bacterial proteins are used to root ToLs. In line with this, separate ML and Bayesian reconstructions of DNA-dependent RNA polymerase (a universally conserved large protein and a reliable molecular marker [86]) performed after selecting 39 taxa each from Archaea, Bacteria, and Eukarya and after careful exclusion of fast-evolving archaeal species (Nanoarchaea and M. kandleri) recovered the 3D ToL and a sister relationship between Euryarchaeota and Lokiarchaeota (and its closest evolutionary relative Thorarchaeota [87]), indicating that the result obtained by Spang et al. [1] likely suffered from problematic experimental design (Da Cunha et al. ms. submitted).
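The balanced subsampling described above (3 Crenarchaeota, 3 Euryarchaeota, 1 Korarchaeota, 1 Aigarchaeota, 1 Thaumarchaeota, and 1 Lokiarchaeota per replicate) can be sketched in Python as follows. This is only an illustration under stated assumptions: the taxon labels and the number of candidate taxa per phylum are hypothetical, the seeds are arbitrary, and the function is not the procedure actually used to generate Figures S1–S10.

```python
import random

# Quota of archaeal taxa per phylum, as described in the text.
QUOTA = {
    "Crenarchaeota": 3,
    "Euryarchaeota": 3,
    "Korarchaeota": 1,
    "Aigarchaeota": 1,
    "Thaumarchaeota": 1,
    "Lokiarchaeota": 1,
}


def subsample_archaea(taxa_by_phylum, quota=QUOTA, seed=None):
    """Draw a random subsample of taxa that satisfies the per-phylum quota."""
    rng = random.Random(seed)
    sample = []
    for phylum, n in quota.items():
        candidates = taxa_by_phylum[phylum]
        if len(candidates) < n:
            raise ValueError(f"not enough taxa available in {phylum}")
        sample.extend(rng.sample(candidates, n))
    return sample


# Hypothetical taxon labels; the per-phylum counts are placeholders and do not
# reproduce the composition of the 84 archaeal taxa in the original dataset.
taxa_by_phylum = {
    "Crenarchaeota": [f"cren_{i}" for i in range(20)],
    "Euryarchaeota": [f"eury_{i}" for i in range(40)],
    "Korarchaeota": ["kor_1"],
    "Aigarchaeota": ["aig_1"],
    "Thaumarchaeota": [f"thaum_{i}" for i in range(5)],
    "Lokiarchaeota": ["Loki_1"],  # Loki 1 is the fixed representative in the text
}

# Ten replicate subsamples, one per seed, each with 10 archaeal taxa.
replicates = [subsample_archaea(taxa_by_phylum, seed=i) for i in range(10)]
```

Each replicate would then be combined with all 10 bacterial and all 10 eukaryal taxa to form a 10-10-10 dataset.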
In summary, both distance-based and probabilistic methods of tree reconstruction and parsimonious inferences drawn from FSF distributions in eukaryotic proteomes challenge the phylogenetic reconstructions of Spang et al. [1] and the AAS model.
The second issue relates to the concatenated or supermatrix approach towards resolving deep evolutionary relationships. Spang et al. (2015) produced a concatenated alignment of 36 conserved genes in 104 taxa. This alignment was trimmed to remove sites with >50% gaps to filter out ambiguous regions. There could be two major problems with this approach: First, trimming using a 50% threshold (partial deletion) is highly dependent on the composition of inclusive taxa. Since the archaeal species dominated the dataset (i.e., 84 out of 104), a minimum of 32 archaeal species must possess the same indel present in all bacterial and eukaryal taxa to trim out ambiguous sites. The obvious problem with this approach is that one could trim out different regions when working with different datasets, as these vary in composition of Archaea, Bacteria, and Eukarya. While taxa deletion experiments of Spang et al. [1] claim to minimize the consequences of this issue, balancing the number of organisms sampled from each major group of organisms seems a logical modus operandi. Second, concatenated alignments are generally preferred because they yield greater resolution than single-gene markers and are relatively less susceptible to LBA (discussed in [77]). However, their use can be significantly compromised when the genes involved have different evolutionary histories [88], as Spang et al. (2015) themselves noted that the topologies of single-gene phylogenies (which were not shown) were “often inconclusive with low support values at critical nodes” [1]. In fact, only 5/36 genes in the concatenated alignment [1] supported the Lokiarchaeota/Eukarya affiliation. Thus, it becomes crucial to reconcile concatenated phylogenies against phylogenies of individual genes (that were included in concatenation) or to perhaps produce alignment-independent phylogenies to avoid these issues [54]. 
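The dependence of gap-based trimming on taxon composition can be made concrete with a small Python sketch. The trimming function is a minimal stand-in for the partial-deletion step (it is not the tool used by Spang et al.), and the toy alignment is hypothetical. The arithmetic at the end reproduces the count of 32 archaeal taxa given in the text, which corresponds to reaching the 50% mark; a strictly greater-than-50% cutoff would require one additional taxon.

```python
import math


def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    """
    names = list(alignment)
    n_taxa = len(names)
    length = len(alignment[names[0]])
    keep = [
        col
        for col in range(length)
        if sum(alignment[name][col] == gap for name in names) / n_taxa
        <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][col] for col in keep) for name in names}


# Toy alignment (hypothetical): column 1 has 3/4 gaps, column 3 has 2/4 gaps.
toy = {"t1": "A-C-", "t2": "A-CG", "t3": "AGCG", "t4": "A-C-"}
trimmed = trim_gappy_columns(toy)  # column 1 is removed, column 3 is kept

# Arithmetic behind the 84-10-10 example in the text.
n_archaea, n_bacteria, n_eukarya = 84, 10, 10
n_total = n_archaea + n_bacteria + n_eukarya       # 104 taxa
half = math.ceil(0.5 * n_total)                    # 52 taxa reach the 50% mark
archaea_needed = half - (n_bacteria + n_eukarya)   # 32 archaeal taxa
```

In a balanced 10-10-10 dataset the same calculation gives 15 taxa at the 50% mark, so an indel shared by all bacterial and eukaryal taxa already meets it without any archaeal contribution, which illustrates why different regions can be trimmed when the domain composition changes.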
Indeed, several conflicts between concatenated gene sets and single-gene phylogenies specifically aimed towards resolving the phylogenetic relationship between Archaea and Eukarya have historically been reported (reviewed in [13]). To quote Forterre on this topic, “One should be cautious in the interpretation of trees obtained from the concatenation of protein sequences that produce such contradictory individual trees” [13]. It can also be a conceptual challenge to visualize the effects of protein domain gain, loss, inversions, and rearrangements in concatenation of several genes. These are well-known evolutionary processes influencing the history of molecular sequences [89] and could pose serious issues especially when primary sequence identity between proteins is very low, as could be the case when comparing distantly related taxa over long evolutionary timespans. Simulations have also shown that concatenated gene sets can lead to inconsistencies and produce misleading trees with high BS values [35], in addition to known issues of heterotachy [36].
Spang et al.'s [1] definition of “universal” proteins is also confusing since some bacterial and eukaryal taxa did not encode one or more of the 36 selected proteins. For example, 7 out of 10 eukaryal taxa did not include the Zn-dependent protease (arCOG04064) [1]. This shows that relatively little phylogenetic information (in terms of both taxa and character sampling) was contributed by bacterial and eukaryal sequences in their study. Moreover, because the dataset included a large number of ribosomal proteins (21 out of 36) that are quite divergent between Bacteria and Archaea/Eukarya, we suspect that the archaeal affiliation of eukaryotes was artificially enhanced under such experimental design (this would be true especially because trees were rooted using bacterial outgroup sequences). Finally, the authors detected several ESPs in the Lokiarchaeota genomes, claiming them to be features unique to Lokiarchaeota and Eukarya. However, comparing FSF distributions across the three cellular domains of life and viruses indicates widespread presence of ESPs, especially in viruses (e.g., the Gelsolin-like domain superfamily [12]), suggesting perhaps that archaeal metagenomes were contaminated with eukaryoviruses. The authors also acknowledged the presence of “mimivirus” [90] in the metagenomic sample, raising the possibility that its eukaryotic host could also be present. Even if the ESPs genuinely belong to Lokiarchaeota, they can still be explained by the Woesian 3D cellular world by considering a complex archaeal ancestor and subsequent gene loss in modern Archaea [38].
6. AAS Is at Odds with Biochemical and Virosphere Differences between Archaea and Eukarya
To quote Forterre, “Generally speaking, it is very difficult to resolve ancient relationships by molecular phylogenetic methods for both practical and theoretical reasons, essentially because the informative signal is completely erased at long evolutionary distances”… “One possibility to bypass this phylogenetic impasse is to focus on biological plausibility” [13]. AAS is especially weakened in this regard when one considers differences in the membrane biology and virospheres of Archaea and Eukarya. These issues have been raised before (e.g., [13, 38, 91–94]) but never satisfactorily addressed by the proponents of AAS. For example, transformation of one kind of cell into another has never been observed in nature even after known cases of HGT across domains (e.g., transfer of about 1,000 genes between Archaea and Bacteria [95]) and endosymbiosis events that are more “intimate” associations between cells but do not produce a new domain of life (e.g., plants remain eukaryotes despite acquiring about one-fifth of their genes from cyanobacteria [96]). Moreover, transformation of an archaeon into a eukaryote would imply transforming archaeal membrane lipids (ether-linked) into Bacteria/Eukarya-like membrane lipids (ester-linked), for which there is no evolutionary rationale. Instead, the difference between the membrane biologies of Archaea and Bacteria/Eukarya could be taken as a powerful synapomorphy supporting the archaeal rooting of the 3D ToL [28]. Moreover, the complex makeup of eukaryotic cells differs greatly from the streamlined makeup of both Bacteria and especially Archaea (note the substantial number of E FSFs in Figure 1). This gap is only marginally reduced by addition of the Lokiarchaeum genome, which adds only 10 new FSFs to Archaea [12]. The scenario also seems logically incompatible because of little or no overlap in the genetics and morphology of archaeoviruses and eukaryoviruses (discussed elsewhere [97]).
Specifically, many families of RNA viruses that infect eukaryotes seemingly cannot carry out a productive infection cycle in Archaea (though the archaeal virosphere remains largely unexplored [98]). Based on current data, under AAS, one should therefore postulate the late origin of eukaryotic RNA viruses after the transformation had taken place, as claimed by [99]. But this goes against several lines of evidence suggesting that RNA viruses originated very early in evolution and likely led the transition to a DNA world via retrotranscription [42], including a global phylogenomic study of cellular and viral proteomes [46]. The recent discovery of possibly multicellular eukaryotic fossils in 2.1-billion-year-old sediments pushes back in time the last common eukaryotic ancestor [100], further weakening the argument enforcing eukaryotic origins from within Archaea (reviewed in [13]). In short, AAS seems biologically implausible in light of several biological considerations.
7. Conclusions
Metagenomic explorations, development of single-cell sequencing technologies, and improvements in in silico reconstruction of (meta)genomes are yielding novel insights into our understanding of the evolutionary history of cellular organisms. The recent sequencing of Lokiarchaeota composite genomes and resulting phylogenetic analysis suggested an archaeal origin for the eukaryotic cell. The discovery has been widely publicized and the debate surrounding the origin of eukaryotes is now considered by many to be settled. However, history inferred from protein structure data reveals a more global picture of the genetic composition of eukaryotic proteomes. Specifically, it takes into account the shared genes with Archaea, Bacteria, and viruses and challenges the purported eukaryotic genomic chimerism that is at the root of AAS models. While some interpret genomic chimerism in eukaryotes by invoking a fusion event at the root of eukaryote evolution, inferences redrawn from phylogenomic analyses performed after balanced taxon and character sampling, removal of fast-evolving species, and comparative analysis of protein structure distribution contradict that interpretation. Moreover, several biological and technical considerations are at odds with the proposed Lokiarchaeota-Eukarya phylogenetic affiliation and suggest that the 3D ToL may still be the more reasonable evolutionary scenario considering biological plausibility and support from molecular data.
Acknowledgments
Research was supported by grants from the National Science Foundation (OISE-1132791) and the National Institute of Food and Agriculture (ILLU-802-909 and ILLU-483-625) to Gustavo Caetano-Anollés, from the Marine Biotechnology Program (PJT200620, Genome Analysis of Marine Organisms and Development of Functional Applications) funded by the Ministry of Oceans and Fisheries, Korea, to KyungMo Kim, and from the Higher Education Commission, Start-up Research Grant Program (Project no. 21-519/SRGP/R&D/HEC/2014), Pakistan, to Arshan Nasir. Violette Da Cunha is supported by the European Research Council (ERC) grant from the European Union's Seventh Framework Program (FP/2007-2013)/Project EVOMOBIL-ERC Grant Agreement no. 340440 to Patrick Forterre.
Additional Points
The concatenated trimmed alignments for 10-10-10 subsample trees can be downloaded at http://clustomcloud.kopri.re.kr/archaea/Trimmed_alignments_10_10_10.zip.
Competing Interests
The authors declare that there are no competing interests regarding the publication of this paper.
References
- 1. Spang A., Saw J. H., Jørgensen S. L., et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature. 2015;521(7551):173–179. doi: 10.1038/nature14447.
- 2. Forterre P. Neutral terms. Nature. 1992;355(6358):p. 305.
- 3. Guy L., Saw J. H., Ettema T. J. G. The archaeal legacy of eukaryotes: a phylogenomic perspective. Cold Spring Harbor Perspectives in Biology. 2014;6(10) doi: 10.1101/cshperspect.a016022.
- 4. Guy L., Ettema T. J. G. The archaeal ‘TACK’ superphylum and the origin of eukaryotes. Trends in Microbiology. 2011;19(12):580–587. doi: 10.1016/j.tim.2011.09.002.
- 5. McInerney J. O., O'Connell M. J., Pisani D. The hybrid nature of the Eukaryota and a consilient view of life on Earth. Nature Reviews Microbiology. 2014;12(6):449–455. doi: 10.1038/nrmicro3271.
- 6. Martijn J., Ettema T. J. G. From archaeon to eukaryote: the evolutionary dark ages of the eukaryotic cell. Biochemical Society Transactions. 2013;41(1):451–457. doi: 10.1042/bst20120292.
- 7. Williams T. A., Embley T. M. Changing ideas about eukaryotic origins. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370(1678) doi: 10.1098/rstb.2014.0318.
- 8. Embley T. M., Williams T. A. Evolution: steps on the road to eukaryotes. Nature. 2015;521(7551):169–170. doi: 10.1038/nature14522.
- 9. Koonin E. V. Archaeal ancestors of eukaryotes: not so elusive any more. BMC Biology. 2015;13(1, article 84) doi: 10.1186/s12915-015-0194-5.
- 10. Spang A., Ettema T. J. G. Microbial diversity: the tree of life comes of age. Nature Microbiology. 2016;1(5, article 16056) doi: 10.1038/nmicrobiol.2016.56.
- 11. Koonin E. V. Origin of eukaryotes from within archaea, archaeal eukaryome and bursts of gene gain: eukaryogenesis just made easier? Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2015;370(1678, article 20140333) doi: 10.1098/rstb.2014.0333.
- 12. Nasir A., Kim K. M., Caetano-Anollés G. Lokiarchaeota: eukaryote-like missing links from microbial dark matter? Trends in Microbiology. 2015;23(8):448–450. doi: 10.1016/j.tim.2015.06.001.
- 13. Forterre P. The universal tree of life: an update. Frontiers in Microbiology. 2015;6, article 717 doi: 10.3389/fmicb.2015.00717.
- 14. Woese C. R., Kandler O., Wheelis M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences of the United States of America. 1990;87(12):4576–4579. doi: 10.1073/pnas.87.12.4576.
- 15. Hug L. A., Baker B. J., Anantharaman K., et al. A new view of the tree of life. Nature Microbiology. 2016;1(5, article 16048) doi: 10.1038/nmicrobiol.2016.48.
- 16. Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology. 1978;27(4):p. 401. doi: 10.2307/2412923.
- 17. Marcy Y., Ouverney C., Bik E. M., et al. Dissecting biological “dark matter” with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(29):11889–11894. doi: 10.1073/pnas.0704662104.
- 18. Iwabe N., Kuma K., Hasegawa M., Osawa S., Miyata T. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proceedings of the National Academy of Sciences of the United States of America. 1989;86(23):9355–9359. doi: 10.1073/pnas.86.23.9355.
- 19. Gogarten J. P., Kibak H., Dittrich P., et al. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proceedings of the National Academy of Sciences of the United States of America. 1989;86(17):6661–6665. doi: 10.1073/pnas.86.17.6661.
- 20. Korbel J. O., Snel B., Huynen M. A., Bork P. SHOT: a web server for the construction of genome phylogenies. Trends in Genetics. 2002;18(3):158–162. doi: 10.1016/s0168-9525(01)02597-5.
- 21. Snel B., Bork P., Huynen M. A. Genome phylogeny based on gene content. Nature Genetics. 1999;21(1):108–110. doi: 10.1038/5052.
- 22. Ciccarelli F. D., Doerks T., von Mering C., Creevey C. J., Snel B., Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science. 2006;311(5765):1283–1287. doi: 10.1126/science.1123061.
- 23. Yang S., Doolittle R. F., Bourne P. E. Phylogeny determined by protein domain content. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(2):373–378. doi: 10.1073/pnas.0408810102.
- 24. Lin J., Gerstein M. Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels. Genome Research. 2000;10(6):808–818. doi: 10.1101/gr.10.6.808.
- 25. Wang M., Caetano-Anollés G. Global phylogeny determined by the combination of protein domains in proteomes. Molecular Biology and Evolution. 2006;23(12):2444–2454. doi: 10.1093/molbev/msl117.
- 26. Kim K. M., Caetano-Anollés G. The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms. BMC Evolutionary Biology. 2012;12(1, article 13) doi: 10.1186/1471-2148-12-13.
- 27. Forterre P., Philippe H. Where is the root of the universal tree of life? BioEssays. 1999;21(10):871–879. doi: 10.1002/(sici)1521-1878(199910)21:10<871::aid-bies10>3.0.co;2-q.
- 28. Caetano-Anollés G., Nasir A., Zhou K., et al. Archaea: the first domain of diversified life. Archaea. 2014;2014:26. doi: 10.1155/2014/590214.
- 29. Gouy R., Baurain D., Philippe H. Rooting the tree of life: the phylogenetic jury is still out. Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370(1678) doi: 10.1098/rstb.2014.0329.
- 30. Brinkmann H., Philippe H. Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Molecular Biology and Evolution. 1999;16(6):817–825. doi: 10.1093/oxfordjournals.molbev.a026166.
- 31. Williams T. A., Foster P. G., Cox C. J., Embley T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature. 2013;504(7479):231–236. doi: 10.1038/nature12779.
- 32. Brochier C., Gribaldo S., Zivanovic Y., Confalonieri F., Forterre P. Nanoarchaea: representatives of a novel archaeal phylum or a fast-evolving euryarchaeal lineage related to Thermococcales? Genome Biology. 2005;6(5, article R42) doi: 10.1186/gb-2005-6-5-r42.
- 33. Brochier C., Forterre P., Gribaldo S. Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biology. 2004;5(3):p. R17. doi: 10.1186/gb-2004-5-3-r17.
- 34. Philippe H., Brinkmann H., Lavrov D. V., et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biology. 2011;9(3, article e1000602) doi: 10.1371/journal.pbio.1000602.
- 35. Kubatko L. S., Degnan J. H. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Systematic Biology. 2007;56(1):17–24. doi: 10.1080/10635150601146041.
- 36. Kolaczkowski B., Thornton J. W. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 2004;431(7011):980–984. doi: 10.1038/nature02917.
- 37. Rivera M. C., Jain R., Moore J. E., Lake J. A. Genomic evidence for two functionally distinct gene classes. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(11):6239–6244. doi: 10.1073/pnas.95.11.6239.
- 38. Forterre P. The common ancestor of archaea and eukarya was not an archaeon. Archaea. 2013;2013:18. doi: 10.1155/2013/372396.
- 39. Katzourakis A., Gifford R. J. Endogenous viral elements in animal genomes. PLoS Genetics. 2010;6(11, article e1001191) doi: 10.1371/journal.pgen.1001191.
- 40. Holmes E. C. The evolution of endogenous viral elements. Cell Host and Microbe. 2011;10(4):368–377. doi: 10.1016/j.chom.2011.09.002.
- 41. Nasir A., Forterre P., Kim K. M., Caetano-Anollés G. The distribution and impact of viral lineages in domains of life. Frontiers in Microbiology. 2014;5, article no. 194 doi: 10.3389/fmicb.2014.00194.
- 42. Forterre P. The two ages of the RNA world, and the transition to the DNA world: a story of viruses and cells. Biochimie. 2005;87(9-10):793–803. doi: 10.1016/j.biochi.2005.03.015.
- 43. Fritz-Laylin L. K., Prochnik S. E., Ginger M. L., et al. The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell. 2010;140(5):631–642. doi: 10.1016/j.cell.2010.01.032.
- 44. Gough J., Chothia C. Superfamily: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Research. 2002;30(1):268–272. doi: 10.1093/nar/30.1.268.
- 45. Gough J., Karplus K., Hughey R., Chothia C. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology. 2001;313(4):903–919. doi: 10.1006/jmbi.2001.5080.
- 46. Nasir A., Caetano-Anollés G. A phylogenomic data-driven exploration of viral origins and evolution. Science Advances. 2015;1(8, article e1500527) doi: 10.1126/sciadv.1500527.
- 47. Vogel C., Berzuini C., Bashton M., Gough J., Teichmann S. A. Supra-domains: evolutionary units larger than single protein domains. Journal of Molecular Biology. 2004;336(3):809–823. doi: 10.1016/j.jmb.2003.12.026.
- 48. Vogel C., Teichmann S. A., Pereira-Leal J. The relationship between domain duplication and recombination. Journal of Molecular Biology. 2005;346(1):355–365. doi: 10.1016/j.jmb.2004.11.050.
- 49. Vogel C., Chothia C. Protein family expansions and biological complexity. PLoS Computational Biology. 2006;2(5, article e48) doi: 10.1371/journal.pcbi.0020048.
- 50. Huson D. H. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 1998;14(1):68–73. doi: 10.1093/bioinformatics/14.1.68.
- 51. Guindon S., Lethiec F., Duroux P., Gascuel O. PHYML Online—a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Research. 2005;33(2):W557–W559. doi: 10.1093/nar/gki352.
- 52. Fox N. K., Brenner S. E., Chandonia J.-M. SCOPe: structural classification of proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Research. 2014;42(1):D304–D309. doi: 10.1093/nar/gkt1240.
- 53. Andreeva A., Howorth D., Chandonia J.-M., et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research. 2008;36(1):D419–D425. doi: 10.1093/nar/gkm993.
- 54. Caetano-Anollés G., Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Frontiers in Genetics. 2012;3, article 172 doi: 10.3389/fgene.2012.00172.
- 55. Illergård K., Ardell D. H., Elofsson A. Structure is three to ten times more conserved than sequence—a study of structural response in protein cores. Proteins: Structure, Function and Bioinformatics. 2009;77(3):499–508. doi: 10.1002/prot.22458.
- 56. Lundin D., Poole A. M., Sjöberg B.-M., Högbom M. Use of structural phylogenetic networks for classification of the ferritin-like superfamily. Journal of Biological Chemistry. 2012;287(24):20565–20575. doi: 10.1074/jbc.M112.367458.
- 57. Moreira D., López-García P. Ten reasons to exclude viruses from the tree of life. Nature Reviews Microbiology. 2009;7(4):306–311. doi: 10.1038/nrmicro2108.
- 58. Cortez D., Forterre P., Gribaldo S. A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biology. 2009;10(6, article no. R65) doi: 10.1186/gb-2009-10-6-r65.
- 59. Daubin V., Lerat E., Perrière G. The source of laterally transferred genes in bacterial genomes. Genome Biology. 2003;4(9):p. R57. doi: 10.1186/gb-2003-4-9-r57.
- 60. Forterre P. Manipulation of cellular syntheses and the nature of viruses: the virocell concept. Comptes Rendus Chimie. 2011;14(4):392–399. doi: 10.1016/j.crci.2010.06.007.
- 61. Koski L. B., Golding G. B. The closest BLAST hit is often not the nearest neighbor. Journal of Molecular Evolution. 2001;52(6):540–542. doi: 10.1007/s002390010184.
- 62. Gough J. Convergent evolution of domain architectures (is rare). Bioinformatics. 2005;21(8):1464–1471. doi: 10.1093/bioinformatics/bti204.
- 63. Balaji S., Srinivasan N. Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins. Protein Engineering. 2001;14(4):219–226. doi: 10.1093/protein/14.4.219.
- 64. Rose P. W., Prlić A., Bi C., et al. The RCSB protein data bank: views of structural biology for basic and applied research and education. Nucleic Acids Research. 2015;43(1):D345–D356. doi: 10.1093/nar/gku1214.
- 65. Sagan L. On the origin of mitosing cells. Journal of Theoretical Biology. 1967;14(3):225–274. doi: 10.1016/0022-5193(67)90079-3.
- 66. Schwartz R. M., Dayhoff M. O. Origins of prokaryotes, eukaryotes, mitochondria, and chloroplasts. Science. 1978;199(4327):395–403. doi: 10.1126/science.202030.
- 67. Wong J. T.-F. Emergence of life: from functional RNA selection to natural selection and beyond. Frontiers in Bioscience—Landmark. 2014;19:1117–1150. doi: 10.2741/4271.
- 68. Xue H., Tong K.-L., Marck C., Grosjean H., Wong J. T.-F. Transfer RNA paralogs: evidence for genetic code-amino acid biosynthesis coevolution and an archaeal root of life. Gene. 2003;310(1-2):59–66. doi: 10.1016/s0378-1119(03)00552-3.
- 69. Penny D., Poole A. The nature of the last universal common ancestor. Current Opinion in Genetics and Development. 1999;9(6):672–677. doi: 10.1016/S0959-437X(99)00020-9.
- 70. Penny D., Collins L. J., Daly T. K., Cox S. J. The relative ages of eukaryotes and akaryotes. Journal of Molecular Evolution. 2014;79(5-6):228–239. doi: 10.1007/s00239-014-9643-y.
- 71. Nasir A., Caetano-Anollés G. Comparative analysis of proteomes and functionomes provides insights into origins of cellular diversification. Archaea. 2013;2013:13. doi: 10.1155/2013/648746.
- 72. Kim K. M., Caetano-Anollés G. The proteomic complexity and rise of the primordial ancestor of diversified life. BMC Evolutionary Biology. 2011;11(1, article 140) doi: 10.1186/1471-2148-11-140.
- 73. Kim K. M., Nasir A., Caetano-Anollés G. The importance of using realistic evolutionary models for retrodicting proteomes. Biochimie. 2014;99(1):129–137. doi: 10.1016/j.biochi.2013.11.019.
- 74. Wang M., Yafremava L. S., Caetano-Anollés D., Mittenthal J. E., Caetano-Anollés G. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world. Genome Research. 2007;17(11):1572–1585. doi: 10.1101/gr.6454307.
- 75. Wang M., Kurland C. G., Caetano-Anollés G. Reductive evolution of proteomes and protein structures. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(29):11954–11958. doi: 10.1073/pnas.1017361108.
- 76. Csurös M., Miklós I. Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model. Molecular Biology and Evolution. 2009;26(9):2087–2095. doi: 10.1093/molbev/msp123.
- 77. Delsuc F., Brinkmann H., Philippe H. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics. 2005;6(5):361–375. doi: 10.1038/nrg1603.
- 78. Waters E., Hohn M. J., Ahel I., et al. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(22):12984–12988. doi: 10.1073/pnas.1735403100.
- 79. Zwickl D. J., Hillis D. M. Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology. 2002;51(4):588–598. doi: 10.1080/10635150290102339.
- 80. Heath T. A., Hedtke S. M., Hillis D. M. Taxon sampling and the accuracy of phylogenetic analyses. Journal of Systematics and Evolution. 2008;46(3):239–257. doi: 10.3724/SP.J.1002.2008.08016.
- 81. Rodríguez-Ezpeleta N., Brinkmann H., Burger G., et al. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Current Biology. 2007;17(16):1420–1425. doi: 10.1016/j.cub.2007.07.036.
- 82. Petitjean C., Deschamps P., López-García P., Moreira D., Brochier-Armanet C. Extending the conserved phylogenetic core of archaea disentangles the evolution of the third domain of life. Molecular Biology and Evolution. 2015;32(5):1242–1254. doi: 10.1093/molbev/msv015.
- 83. Brochier C., Philippe H. Phylogeny: a non-hyperthermophilic ancestor for bacteria. Nature. 2002;417(6886):p. 244. doi: 10.1038/417244a.
- 84. Zhaxybayeva O., Swithers K. S., Lapierre P., et al. On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(14):5865–5870. doi: 10.1073/pnas.0901260106.
- 85. Bryant D., Moulton V. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution. 2004;21(2):255–265. doi: 10.1093/molbev/msh018.
- 86. Brochier C., Forterre P., Gribaldo S. An emerging phylogenetic core of Archaea: phylogenies of transcription and translation machineries converge following addition of new genome sequences. BMC Evolutionary Biology. 2005;5, article 36 doi: 10.1186/1471-2148-5-36.
- 87. Seitz K. W., Lazar C. S., Hinrichs K.-U., Teske A. P., Baker B. J. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. ISME Journal. 2016:1696–1705. doi: 10.1038/ismej.2015.233.
- 88. Lartillot N., Brinkmann H., Philippe H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology. 2007;7(1, article S4) doi: 10.1186/1471-2148-7-s1-s4.
- 89. Nasir A., Kim K. M., Caetano-Anollés G. Global patterns of protein domain gain and loss in superkingdoms. PLoS Computational Biology. 2014;10(1) doi: 10.1371/journal.pcbi.1003452.
- 90. La Scola B., Audic S., Robert C., et al. A giant virus in amoebae. Science. 2003;299(5615):p. 2033. doi: 10.1126/science.1081867.
- 91. Woese C. R. Interpreting the universal phylogenetic tree. Proceedings of the National Academy of Sciences of the United States of America. 2000;97(15):8392–8396. doi: 10.1073/pnas.97.15.8392.
- 92. Cavalier-Smith T. Origin of the cell nucleus, mitosis and sex: roles of intracellular coevolution. Biology Direct. 2010;5(1, article 7) doi: 10.1186/1745-6150-5-7.
- 93. Forterre P. A new fusion hypothesis for the origin of Eukarya: better than previous ones, but probably also wrong. Research in Microbiology. 2011;162(1):77–91. doi: 10.1016/j.resmic.2010.10.005.
- 94. de Duve C. The origin of eukaryotes: a reappraisal. Nature Reviews Genetics. 2007;8(5):395–403. doi: 10.1038/nrg2071.
- 95. Nelson-Sathi S., Dagan T., Landan G., et al. Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(50):20537–20542. doi: 10.1073/pnas.1209119109.
- 96. Martin W., Rujan T., Richly E., et al. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(19):12246–12251. doi: 10.1073/pnas.182432999.
- 97. Nasir A., Sun F.-J., Kim K. M., Caetano-Anollés G. Untangling the origin of viruses and their impact on cellular evolution. Annals of the New York Academy of Sciences. 2015;1341(1):61–74. doi: 10.1111/nyas.12735.
- 98. Bolduc B., Shaughnessy D. P., Wolf Y. I., Koonin E. V., Roberto F. F., Young M. Identification of novel positive-strand RNA viruses by metagenomic analysis of archaea-dominated Yellowstone hot springs. Journal of Virology. 2012;86(10):5562–5573. doi: 10.1128/jvi.07196-11.
- 99. Koonin E. V., Dolja V. V., Krupovic M. Origins and evolution of viruses of eukaryotes: the ultimate modularity. Virology. 2015;479-480:2–25. doi: 10.1016/j.virol.2015.02.039.
- 100. Albani A. E., Bengtson S., Canfield D. E., et al. Large colonial organisms with coordinated growth in oxygenated environments 2.1 Gyr ago. Nature. 2010;466(7302):100–104. doi: 10.1038/nature09166.