Science Advances. 2020 Aug 26;6(35):eabb7258. doi: 10.1126/sciadv.abb7258

Chlamydial contribution to anaerobic metabolism during eukaryotic evolution

Courtney W Stairs 1,*, Jennah E Dharamshi 1,*, Daniel Tamarit 1,2,3, Laura Eme 1,4, Steffen L Jørgensen 5, Anja Spang 1,6, Thijs J G Ettema 1,2
PMCID: PMC7449678  PMID: 32923644

Genomic analysis of newly identified Chlamydiae reveals that their ancestors contributed to the evolution of anaerobic eukaryotes.

Abstract

The origin of eukaryotes is a major open question in evolutionary biology. Multiple hypotheses posit that eukaryotes likely evolved from a syntrophic relationship between an archaeon and an alphaproteobacterium based on H2 exchange. However, there are no strong indications that modern eukaryotic H2 metabolism originated from archaea or alphaproteobacteria. Here, we present evidence for the origin of H2 metabolism genes in eukaryotes from an ancestor of the Anoxychlamydiales—a group of anaerobic chlamydiae, newly described here, from marine sediments. Among Chlamydiae, these bacteria uniquely encode genes for H2 metabolism and other anaerobiosis-associated pathways. Phylogenetic analyses of several components of H2 metabolism reveal that Anoxychlamydiales homologs are the closest relatives to eukaryotic sequences. We propose that an ancestor of the Anoxychlamydiales contributed these key genes during the evolution of eukaryotes, supporting a mosaic evolutionary origin of eukaryotic metabolism.

INTRODUCTION

Hydrogen transfer is a central aspect of many syntrophic relationships across the tree of life and represents the basis for various hypotheses on the origin of eukaryotes (1–5). The current prevailing theory for the origin of eukaryotes involves the symbiosis of an archaeon and an alphaproteobacterium. Although there are multiple proposals for the nature of the symbiosis, many hypotheses suggest that eukaryotic life evolved in an oxic-anoxic transition zone (OATZ) and that syntrophic interactions may have played a key role (1–5). By studying syntrophies in present-day OATZs, such as those found in marine sediments, we may be able to reconstruct interactions that were critical for eukaryogenesis.

Marine sediments are typically characterized by stable conditions and a well-defined OATZ (6, 7). The anoxic layers immediately below this transition zone often host representatives of the Asgard archaea, members of a superphylum of diverse lineages that are thought to represent the closest living relatives of the archaeal ancestor of eukaryotes (8, 9). Asgard archaea are predicted to have distinct metabolic features, with several members likely dependent on syntrophic interactions for growth based on H2 exchange (3, 4). H2 metabolism is widespread in prokaryotes, including Asgard archaea (3, 10), and H2 production and/or utilization is featured in various eukaryogenesis models (1–5). However, the absence of genes encoding eukaryotic-type H2 production [i.e., [FeFe]-hydrogenases (HYDA)] in most modern archaea (including Asgard archaea) or alphaproteobacteria has made it difficult to pinpoint the origin of this metabolism in eukaryotes.

The ability to produce H2 has been demonstrated in numerous eukaryotes known to experience transient or permanent anoxia (11, 12). H2-producing eukaryotes often have drastically different mitochondria compared to aerobic eukaryotes. These organelles are collectively referred to as “mitochondrion-related organelles” (MROs) and include H2-producing mitochondria, hydrogenosomes, and mitosomes. These organelles can vary in their complement of respiratory complexes (complexes CI to CV) ranging from complete retention in the H2-producing mitochondria of Acanthamoeba castellanii to complete loss in the hydrogenosomes of Trichomonas vaginalis [Fig. 1; see (11, 12)]. In eukaryotes and some anaerobic bacteria, pyruvate oxidation is coupled to H2 production by the concerted action of pyruvate:ferredoxin oxidoreductase (PFO) and HYDA, ultimately generating H2 and acetyl–coenzyme A (CoA). The proper assembly of the Fe-S cluster of HYDA relies on three maturase proteins (HYDE, HYDF, and HYDG). In some bacteria, HYDA functions with two other proteins, HYDB and HYDC—homologs of the NUOF and NUOE subunits of CI of the respiratory chain, respectively—in a trimeric confurcating hydrogenase complex. This complex couples the favorable and unfavorable oxidations of ferredoxin (Fd; E0′ ≈ −450 mV) and NADH (reduced form of nicotinamide adenine dinucleotide) (E0′ ≈ −320 mV), respectively, to reduce protons (E0′ ≈ −420 mV) (13). In some anaerobic and microaerophilic microbial eukaryotes, PFO-mediated acetyl-CoA production and HYDA-mediated H2 production occur in the cytoplasm (14), plastid (15), and/or mitochondrial compartment (16, 17). While many of these eukaryotes have lost all genes that encode the respiratory chain (i.e., CI to CV), some have retained two nuclear-encoded genes of MRO-localized CI subunits (i.e., nuoE and nuoF genes). 
These two subunits can oxidize either NADH or Fd (18), suggesting that, within the MRO, they could interact with HYDA to form a trimeric electron-confurcating hydrogenase, analogous to the bacterial complex discussed above (13).
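The "favorable" and "unfavorable" labels attached to the two electron donors can be checked with a short calculation. The sketch below applies the standard relation ΔG°′ = −nFΔE°′ to the midpoint potentials quoted above. It assumes two electrons per transfer and standard conditions only. The function and constant names are illustrative, and the sketch does not model cellular concentrations or H2 partial pressure, which influence the energetics in vivo.

```python
# Illustrative sketch: standard free energy of electron transfer to protons,
# using only the midpoint potentials quoted in the text.
FARADAY_KJ_PER_V_MOL = 96.485  # Faraday constant, kJ per volt per mole of electrons

E_FD = -0.450      # ferredoxin, volts
E_NADH = -0.320    # NADH, volts
E_PROTON = -0.420  # proton reduction to H2, volts


def delta_g_standard(e_donor, e_acceptor, n_electrons=2):
    """Return the standard free energy change (kJ/mol) for moving
    n_electrons from a donor couple to an acceptor couple."""
    return -n_electrons * FARADAY_KJ_PER_V_MOL * (e_acceptor - e_donor)


dg_fd = delta_g_standard(E_FD, E_PROTON)      # negative: favorable
dg_nadh = delta_g_standard(E_NADH, E_PROTON)  # positive: unfavorable

print(f"Fd   -> H+: {dg_fd:+.2f} kJ/mol")
print(f"NADH -> H+: {dg_nadh:+.2f} kJ/mol")
```

Under these assumptions, Fd oxidation is exergonic and NADH oxidation is endergonic, which is the asymmetry that the confurcating complex is described as coupling.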

Fig. 1. Anoxychlamydiales and anaerobic eukaryotes use similar anoxic metabolic strategies.

The presence of genes encoding components of the indicated energy metabolism pathways (left), encoded in the genomes of Chlamydiae (gray), Anoxychlamydiales (orange), and select anaerobic eukaryotes (blue), is indicated by a filled circle. For eukaryotic representatives, the subcellular location of each gene product or pathway is indicated by a filled blue circle (cytoplasm) or a filled circle with an outline (organelles). Outlined blue, green, and purple filled circles indicate the mitochondrion or MRO, the plastid (green), and the vacuole (purple), respectively (see legend). Eukaryotic representatives A. castellanii and Blastocystis sp. have hydrogen-producing mitochondria, Pygsuia biforma and T. vaginalis have hydrogenosomes, and Chlamydomonas reinhardtii has anaerobically functioning mitochondria with plastidal hydrogen production. The number of genomes obtained from the indicated environmental sources for each Chlamydiae clade is shown in the top right. The boxplot on the lower right shows the normalized coverage of contigs from the corresponding Anoxychlamydiales MAGs obtained from Loki’s Castle marine sediments (data S1). HYDA-G, hydrogenase subunits A to G. ADI, arginine deiminase; OTC, ornithine transcarbamoylase; CK, carbamate kinase; PFO, pyruvate:ferredoxin oxidoreductase; PDC, pyruvate dehydrogenase complex; FEO, ferrous iron transport; ACK, acetate kinase; PTA, phosphotransacetylase; NaH, Na+:H+ antiporter. The Chlamydiae species tree is a representation of species relationships derived from Dharamshi et al. (28).

Unlike in bacteria, genes encoding proteins for H2-producing metabolism (i.e., HYDA, HYDE-G, and PFO) are sparsely distributed across the tree of eukaryotes (11), and the origins of this pathway remain hotly debated: Was HYDA-mediated H2 production present in the alphaproteobacterial endosymbiont that gave rise to the mitochondrion (1, 19), or was it acquired by different eukaryotic lineages via horizontal gene transfer (HGT) (20)? One obstacle for inferring the evolutionary origin of these genes is the overall lack of resolution of individual protein phylogenies (21–23). Moreover, while the origins of most proteins related to aerobic metabolism in mitochondria [e.g., respiratory chain, pyruvate dehydrogenase, and some components of the tricarboxylic acid (TCA) cycle] display clear affinity to genes from extant relatives of the endosymbiont (i.e., modern alphaproteobacteria) (24, 25), the origins of all anaerobiosis-associated proteins are unknown. Previous phylogenetic analyses of these proteins failed to recover a clear consistent prokaryotic group as the sister lineage of the eukaryotic sequences (21–23). Here, we provide the first evidence for a specific bacterial donor of eukaryotic H2 metabolism from an order, “Candidatus Anoxychlamydiales,” newly described here, that represents the first identified anaerobic members of the phylum Chlamydiae.

RESULTS AND DISCUSSION

Chlamydiae are described as obligate intracellular bacteria of eukaryotes that display varying levels of auxotrophy for cofactors, amino acids, and nucleotides, and often import these compounds from their hosts (26). Although chlamydiae can acquire nucleotide triphosphates directly from their hosts using adenosine triphosphate (ATP)/adenosine diphosphate (ADP) transporters (27), they also conserve energy by glycolysis and aerobic respiration (26). With the retrieval of diverse and abundant chlamydial metagenome-assembled genomes (MAGs) from anoxic marine sediments (28) sampled near Loki’s Castle hydrothermal vent field (29), we have recently challenged the view that aerobiosis and an obligate eukaryotic intracellular lifestyle are unifying features of the phylum Chlamydiae (29).

Anoxychlamydiales are the first Chlamydiae shown to encode anaerobic metabolism

Using phylogenetic analyses of conserved single-copy marker proteins (28), we observed that 11 of the 24 Loki’s Castle chlamydial MAGs and two MAGs from estuary sediment and groundwater (30, 31) consistently form a distinct order hereafter referred to as Anoxychlamydiales (Fig. 1). In our marine sediment samples, Anoxychlamydiales are dominant community members as determined by read coverage of both the conserved ribosomal protein operon and each MAG (Fig. 1, fig. S1, and data S1). Because these genomes appear to derive from cells that are actively replicating (28), and because all cultured chlamydiae can only replicate intracellularly inside eukaryotic hosts (26), we expected to find eukaryotes in these samples. However, the low number of eukaryote-derived sequences in these samples could not explain the observed high relative abundance of Anoxychlamydiales (Fig. 1 and fig. S1) (28). We therefore suspect that Anoxychlamydiales are not obligate intracellular symbionts of eukaryotes.

When examining the predicted metabolic potential of the Anoxychlamydiales, we observed notable differences in their gene content compared to other chlamydiae (Fig. 1, fig. S2, and data S1). Reconstruction of Anoxychlamydiales metabolism strongly supports the hypothesis that this group is composed of anaerobes, in contrast to previously identified members of the phylum Chlamydiae (figs. S2 to S4). Phylogenetic analyses revealed that many anaerobiosis-associated proteins found in Anoxychlamydiales are closely related to homologs from diverse groups of anaerobic prokaryotes (fig. S3 and data S2). Like other facultative and obligate anaerobic fermentative bacteria, the Anoxychlamydiales are predicted to produce ATP by substrate-level phosphorylation via glycolysis, the arginine deimination pathway, and the concerted action of acetate kinase (ACK) and phosphate acetyltransferase (PTA), resulting in the concomitant production of acetate (Fig. 1, figs. S3 and S4, and data S1). Some chlamydiae can generate a Na+ gradient using CI (i.e., a Na+-transporting NADH dehydrogenase), allowing for Na+-driven synthesis of ATP via a V-type adenosine triphosphatase (ATPase) (Fig. 1) (32). While Anoxychlamydiales do not encode a Na+-transporting CI, they can likely still generate a Na+ gradient using a Na+:H+ antiporter (NaH), the gene of which appears to have been acquired by HGT from anaerobic deltaproteobacteria (fig. S3F). We therefore suspect that these bacteria also have Na+-driven ATP synthesis via a V-type ATPase (Fig. 1 and figs. S3 and S4). We also identified a number of distinguishing anaerobiosis-associated characteristics of Anoxychlamydiales compared to other chlamydiae, e.g., ferrous iron import and metabolism (FeoB), and the absence of a complete respiratory chain and TCA cycle (Fig. 1).

The Anoxychlamydiales MAGs also encode a collection of metabolic modules that are common in anaerobic eukaryotes (Fig. 1, figs. S3 and S4, and data S1). Most unexpectedly, Anoxychlamydiales have the capacity for H2 production, a feature not previously assigned to representatives of this phylum. In addition to encoding a pfo gene, most of these MAGs have a gene cluster encoding the [FeFe]-hydrogenase maturases (i.e., hydE, hydF, and hydG) and a second cluster encoding a trimeric [FeFe]-hydrogenase (i.e., hydA, nuoE/hydC, and nuoF/hydB) (fig. S6). The domain architecture of both the HYDA and NUOF/HYDB proteins (figs. S5 and S6) resembles those of other trimeric H2-producing hydrogenase systems in bacteria (10). As [FeFe]-hydrogenases are bidirectional enzymes, it remains to be assessed whether these chlamydial proteins produce or consume H2. However, considering that we could not identify genes typically involved in H2-consuming pathways (e.g., terminal electron acceptors or carbon fixation pathways; data S1), we find it likely that Anoxychlamydiales produce H2 during acetogenic fermentation (Fig. 1, fig. S4, and data S1). Collectively, these features indicate that the Anoxychlamydiales might be H2 and acetate producers that live syntrophically within a microbial consortium with H2 and/or acetate consumers (e.g., hydrogenotrophic methanogens and homoacetogens). Given their high relative abundance in anoxic marine sediments (Fig. 1 and fig. S1) and the importance of H2 and acetate in the carbon cycle (33), these Anoxychlamydiales may play a previously unrecognized role in global biogeochemical cycling in anoxic environments. The punctate distribution of anaerobiosis-associated metabolism in other unclassified chlamydial lineages (data S1) suggests that further exploration of Chlamydiae in anoxic environments will reveal anaerobiosis to be more widespread within this phylum than is currently recognized.

Anoxychlamydiales encode eukaryotic-like hydrogen metabolism

To investigate the evolutionary history of the proteins mediating H2 metabolism in Anoxychlamydiales and their relationship to eukaryotic homologs, we carried out phylogenetic analyses of each of the seven proteins involved in the PFO-HYDA system. In these phylogenies, the Anoxychlamydiales formed a sister clade to eukaryotes (HYDE, HYDF, and HYDG) or branched close to them (PFO), with high support (Fig. 2). The relationships between eukaryotes within the eukaryotic clades display patterns of both vertical (e.g., plastid-bearing lineages and chytrid fungi; see data S2) and horizontal inheritance (data S2). Furthermore, the topologies within the Anoxychlamydiales clades (data S2) are largely congruent with known organismal relationships (Fig. 1), suggesting vertical inheritance of these genes in this chlamydial order. This finding was unexpected because until now, there was no clear evidence for a consistent prokaryotic lineage branching sister to eukaryotes in those four protein phylogenies. For example, if we exclude the Anoxychlamydiales, the closest prokaryotic taxa to the eukaryotic sequences are non-Asgard archaea (PFO), firmicutes and aquificae (HYDE), or bacteria of mixed taxonomic assignment (HYDF and HYDG). The observation of the Anoxychlamydiales sequences representing a sister clade to (and not emerging from within) eukaryotes strongly suggests that Anoxychlamydiales did not receive these genes by HGT from eukaryotes but that, on the contrary, an Anoxychlamydiales ancestor might have contributed these genes to eukaryotes.

Fig. 2. Anoxychlamydiales genes are the closest prokaryotic relatives of eukaryotic genes encoding anaerobic metabolism.

(A to D) Maximum likelihood (ML) phylogenies for H2 metabolism proteins. Circles [nonparametric (NP)] and squares [ultrafast (UF)] summarize bootstrap support values (BP, bootstrap percentage) for each bipartition mapped onto the best-scoring ML phylogeny. BPUF, BPNP, and transfer bootstrap expectation (TBE) for monophyly of eukaryotes (α) and eukaryotes with Anoxychlamydiales (β) or Anoxychlamydiales and other taxa (β′) are indicated. Eukaryotes and Anoxychlamydiales are shaded blue and orange, respectively (see data S1 for model parameters and alignment features and data S2 for full phylogenies). (E) The origin of mitochondrial metabolism in eukaryotes is an evolutionary mosaic. In aerobic mitochondria (solid lines), pyruvate is oxidized by PDC to acetyl-CoA, which is fed into the TCA cycle to produce reducing equivalents for the ETC that fuels ATP synthesis by oxidative phosphorylation. In MROs (dashed lines), pyruvate is oxidized by PFO (1) to acetyl-CoA, which is used for ATP synthesis by substrate-level phosphorylation (12). Hydrogen is produced by a trimeric confurcating hydrogenase (5 to 7) using electrons from Fd and NADH. The 4Fe-4S cluster of HYDA (5) is assembled by maturases (2 to 4). In this hypothetical cell, colors represent the proposed origin of each component [data S2 and figs. S5 and S6; (21–23)]. 1 to 5 function in the plastids of some algae (15). (F and G) Scenarios for the timing of acquisition of hydrogen metabolism relative to major events in eukaryotic evolution. Relative timings of gene acquisitions from an Anoxychlamydiales ancestor (orange arrows) (F) before or immediately after the emergence of last eukaryotic common ancestor (LECA) or (G) after the radiation of eukaryotes, mediated by HGT into and between eukaryotes (blue arrows). Uncertainty regarding the timing of mitochondrial integration is depicted with a purple triangle. Gene losses are shown with an “X.” Ancestral Asgard archaeon (As; gray) and the alphaproteobacterial ancestor of the mitochondrion (α; purple).

We next assessed the relationships between trimeric H2-producing hydrogenase components and related proteins of CI from Anoxychlamydiales, other chlamydiae, and eukaryotes. In phylogenetic analyses of both NUOE/HYDC and NUOF/HYDB, eukaryotic sequences form a clade with alphaproteobacterial sequences, a topology expected given their mitochondrial ancestry (25). The NUOE/HYDC and NUOF/HYDB Anoxychlamydiales sequences branch distantly from CI subunits of eukaryotes and other chlamydiae (fig. S6). In the HYDA phylogeny, there are two groups of eukaryotic sequences, both of which branch distantly from Anoxychlamydiales homologs (fig. S5). Contrary to other components of H2 metabolism mentioned above (e.g., HYDE-G and PFO), HYDA appears to have different evolutionary origins in Anoxychlamydiales and eukaryotes. In HYDA, NUOE/HYDC, and NUOF/HYDB phylogenies, the closest sister taxa to the Anoxychlamydiales sequences are a collection of diverse and distantly related bacteria isolated from anoxic environments (34–36). The congruence of these phylogenies is mirrored by the conserved operon organization and domain structure of the bacterial sequences that branch closest to the Anoxychlamydiales (figs. S5 and S6). Collectively, this implies that the Anoxychlamydiales NUOE/HYDC and NUOF/HYDB subunits have a distinct evolutionary history compared to the homologous CI subunits from other chlamydiae. These findings also suggest that the nuoE/hydC, nuoF/hydB, and hydA genes have been acquired independently from the hydrogenase maturases and pfo (Fig. 2).

Anoxychlamydiales ancestor contributed to the mosaic origins of eukaryotic metabolism

To date, the origin of the proteins mediating H2 metabolism in eukaryotes has been elusive, and it was unclear whether any of these proteins were present in the last eukaryotic common ancestor (LECA). Two prevailing hypotheses for the origin of this pathway suggest that modern H2 metabolism derives from HGT events into and between eukaryotes or that this pathway was present in the alphaproteobacterial endosymbiont that gave rise to the mitochondrion (1, 20). This work identifies an ancestor of the Anoxychlamydiales as the likely donor of genes coding for components of H2 metabolism in eukaryotes, as opposed to this metabolism having been derived from the alphaproteobacterial ancestor of the mitochondria (1, 37). The timing of the acquisition of these genes from an ancestor of Anoxychlamydiales relative to major events in eukaryotic evolution remains unclear (Fig. 2, F and G). However, the lack of evidence for genes encoding eukaryotic-type H2 metabolism in modern representatives of archaea (including the Asgard archaea) and alphaproteobacteria strongly suggests that this metabolism was acquired after the emergence of the first eukaryotic common ancestor (FECA). The punctate distribution of H2 metabolism across the tree of eukaryotes suggests that vertical inheritance (potentially dating back to LECA), differential loss, and HGT between eukaryotes have played a role in the evolution of these genes in eukaryotes. These findings highlight the contribution of genes originating from outside the archaeal host-ancestor and alphaproteobacterial mitochondrial-ancestor during eukaryotic evolution, a possibility that has been explicitly part of several eukaryogenesis scenarios (2, 5, 38). 
Future investigations might reveal additional chlamydial homologs, or those from other prokaryotes, that are more closely related to the eukaryotic proteins than those from Anoxychlamydiales, and additional data of this sort could allow for the refinement of evolutionary scenarios underpinning the origin and early evolution of eukaryotes.

Chlamydiae have previously been invoked in other hypotheses detailing major evolutionary transitions in eukaryotes, such as the emergence of plastids (39, 40). In addition, Pittis and Gabaldon (38) found evidence for the late acquisition of genes encoding mitochondrial components (i.e., of alphaproteobacterial origin) in the eukaryotic lineage before LECA, alongside a similarly timed acquisition of genes from Chlamydiae and the related phylum Verrucomicrobia. We therefore propose that a chlamydial ancestor contributed key components of H2 metabolism during eukaryogenesis and potentially before the diversification of eukaryotes. The acquisition of an [FeFe]-hydrogenase system and its integration into the metabolic networks of an early proto-eukaryote may have allowed for the loss of the presumed archaeal and/or alphaproteobacterial H2 metabolism of the host cell. While H2 exchange is featured in many eukaryogenesis models, our work provides the first evidence for a hypothesis whereby the archaeal-type H2 production was replaced by a non-alphaproteobacterial H2 system during eukaryogenesis.

Conclusion

Extant chlamydiae (e.g., Anoxychlamydiales) and the closest prokaryotic relatives of eukaryotes (i.e., Asgard archaea) inhabit similar anoxic environments, in close vicinity to OATZs where eukaryogenesis is thought to have occurred. Although we do not know whether Anoxychlamydiales and Asgard archaea interact, their co-occurrence within the same microbial communities suggests that HGT between their ancestors could have facilitated the transfer of genes related to H2 metabolism. However, questions regarding the origins of the remaining components of H2 production and other anaerobic strategies in eukaryotes remain open. The polyphyletic relationship of eukaryotic HYDA implies multiple distinct origins and suggests that, if HYDA was in LECA, it has been replaced numerous times in the ancestors of modern HYDA-containing eukaryotes. In contrast, eukaryotic HYDA maturases and PFO have a single evolutionary origin. The exact timing of these contributions and their importance for the origin of eukaryotes remain as avenues for future exploration. The mosaic origin of various mitochondrial components, including respiration and H2 production, strongly suggests that prokaryotes from at least three major phyla of life—Alphaproteobacteria, Chlamydiae, and Asgard archaea—contributed to the emergence of the first eukaryotic cell.

MATERIALS AND METHODS

Metagenomic relative abundance

To investigate the relative abundance of Anoxychlamydiales among microbial community members in the marine sediment samples, we surveyed the four metagenomes for “RP15 contigs,” i.e., contigs encoding at least 5 of 15 ribosomal proteins found in an operon conserved across prokaryotes. These were identified in the metagenomic assemblies (GS08_GC12_126, GS10_PC15_940, GS10_PC15_1000, and GS10_PC15_1060) using a previously described workflow (41). Contig coverage was estimated by mapping sequence reads from each metagenome to corresponding assembled contigs using Bowtie2 (42). Contigs longer than 20 kb were split into 10-kb fragments before coverage estimation. Averaged coverages of RP15 contigs from each marine sediment metagenome were then compared (fig. S1 and data S1).
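The contig selection and fragmentation rules described above can be sketched as follows. This is a minimal illustration rather than the workflow actually used (which is described in ref. 41). The function names and input representation are hypothetical, and the handling of a trailing remainder shorter than 10 kb is an assumption, since the text does not specify it.

```python
# Illustrative sketch of the RP15 contig filter and the contig
# fragmentation step. All names are hypothetical.

def is_rp15_contig(ribosomal_proteins_found, min_proteins=5):
    """Return True if a contig encodes at least `min_proteins` distinct
    ribosomal proteins from the conserved 15-protein operon."""
    return len(set(ribosomal_proteins_found)) >= min_proteins


def split_contig(length, max_length=20_000, fragment_size=10_000):
    """Return half-open (start, end) coordinates of the pieces used for
    coverage estimation. Contigs longer than `max_length` are cut into
    `fragment_size` pieces; the last piece may be shorter (assumption)."""
    if length <= max_length:
        return [(0, length)]
    return [
        (start, min(start + fragment_size, length))
        for start in range(0, length, fragment_size)
    ]


def mean_coverage(coverages):
    """Average the coverage values of a set of contigs or fragments."""
    return sum(coverages) / len(coverages)
```

For example, under these assumptions a 25-kb contig is split into three pieces, whereas a contig of exactly 20 kb is left intact because only contigs longer than 20 kb are split.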

Maximum likelihood (ML) phylogenies of concatenated proteins extracted from RP15 contigs were inferred for each marine sediment sample (GS08_GC12_126 in fig. S1, all in data S2), alongside a set of phylogenetically diverse prokaryotic reference taxa (41, 43) using RAxML version 8.2.4 (44) under the PROTCATLG model of evolution with 100 rapid bootstraps.

We normalized the coverage of contigs (and contig fragments) from each Loki’s Castle Anoxychlamydiales MAG to the average metagenome coverage (data S1). This allowed us to compare MAG abundance between samples and indicated the magnitude of over- or underrepresentation of each MAG in the corresponding metagenome (Fig. 1).
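As a minimal sketch of this normalization, each contig (or fragment) coverage can be divided by the average coverage of its metagenome, so that values above 1 indicate overrepresentation and values below 1 indicate underrepresentation. The function name and the example numbers are hypothetical and are not taken from data S1.

```python
# Illustrative sketch: normalize contig coverage to the average
# metagenome coverage. Names and numbers are hypothetical.

def normalized_coverage(contig_coverages, metagenome_mean_coverage):
    """Divide each contig coverage by the mean metagenome coverage."""
    if metagenome_mean_coverage <= 0:
        raise ValueError("metagenome mean coverage must be positive")
    return [c / metagenome_mean_coverage for c in contig_coverages]


# A contig at 40x in a metagenome averaging 20x is twofold overrepresented.
example = normalized_coverage([10.0, 40.0], 20.0)
print(example)
```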

Annotation and orthology clustering

Protein sequences were annotated according to the strategy laid out by Dharamshi et al. (28). In short, annotations include: top hits against National Center for Biotechnology Information nonredundant (nr) protein database using DIAMOND aligner v0.9.19.120 (45) blastp (“--more-sensitive”) alongside taxonomic classification of sequences [lowest common ancestor (LCA) algorithm, “-f 102”], assignment of protein domain annotations using InterProScan (46) version 5.22-61.0 [e.g., Protein Families (Pfam) (47) and InterPro (IPR) (48) domains], and assignment to both root-level “-d NOG” and bacterial-level “-d BACT” nonsupervised orthologous groups (NOGs) using eggNOG-mapper (49) with the eggNOG database (50).

The presence of KEGG (Kyoto Encyclopedia of Genes and Genomes) (51, 52) modules connected to various metabolic pathways related to energy generation and electron transfer (Fig. 1 and fig. S4) was assessed across Chlamydiae species representatives (data S1) through the assignment of KEGG orthology (KO) numbers by GhostKOALA (53). Pfam domains (47) allowed the identification of several additional proteins of interest (data S1), including CIII of the electron transport chain (PF00033). The presence of specific proteins across Chlamydiae included in phylogenetic trees (see below) is also indicated in data S1.

The presence of specific pathways related to energy generation found across Chlamydiae was also assessed in several anaerobic eukaryote representatives (e.g., A. castellanii, GCF_000313135.1; T. vaginalis, GCF_000002825.2; Chlamydomonas reinhardtii, GCF_000002595.1; Blastocystis sp. ST 1, GCA_001651215.1; and Pygsuia biforma, GCRY00000000.1) using GhostKOALA (53) or manual inspection (see data S1).

Gene content unique to or enriched in Anoxychlamydiales relative to other chlamydiae was assessed by mapping protein sequences to NOGs. eggNOG-mapper (49) version 4.5 was used to map all protein sequences from Chlamydiae species representatives (data S1) to root-level NOGs (“-d NOG”). NOGs unique to Anoxychlamydiales in comparison to other chlamydiae and found across the group (in at least five members) are outlined in fig. S2 (data S1) and are ordered by Clusters of Orthologous Genes (COG) category (54). Those enriched in Anoxychlamydiales, in at least five members and up to three other chlamydiae, can be found in data S1.
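The two membership criteria above can be expressed as simple set operations. The sketch below is illustrative only: the function name and input format (a mapping from each NOG to the set of genomes encoding it) are hypothetical. Treating "unique" and "enriched" as mutually exclusive categories is also an assumption, as the text does not state whether enriched NOGs include the unique ones.

```python
# Illustrative sketch: classify NOGs as unique to or enriched in
# Anoxychlamydiales. Names and data layout are hypothetical.

def classify_nogs(nog_to_genomes, anoxy_genomes, min_anoxy=5, max_other=3):
    """Return (unique, enriched) sets of NOG identifiers.

    unique:   in >= min_anoxy Anoxychlamydiales and no other chlamydiae
    enriched: in >= min_anoxy Anoxychlamydiales and 1..max_other others
    """
    unique, enriched = set(), set()
    for nog, genomes in nog_to_genomes.items():
        n_anoxy = len(genomes & anoxy_genomes)
        n_other = len(genomes - anoxy_genomes)
        if n_anoxy < min_anoxy:
            continue
        if n_other == 0:
            unique.add(nog)
        elif n_other <= max_other:
            enriched.add(nog)
    return unique, enriched
```

A NOG found in five Anoxychlamydiales genomes and in four other chlamydiae would fall into neither category under these thresholds.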

Phylogenetic analysis of genes encoding anaerobiosis-associated proteins

We selected orthologous groups related to anaerobiosis that were found in the Anoxychlamydiales MAGs and typically absent from other chlamydiae. To generate the phylogenetic datasets, we used each of the Anoxychlamydiales sequences as a query against GenBank nr (January 2019), the Marine Microbial Eukaryote Transcriptome Sequencing Project [MMETSP; (55)], and other sequencing projects (22, 56) using BLAST (57) to retrieve the best 2000 hits with an e value lower than 1 × 10⁻⁵. For HYDA phylogenies, we retrieved 2000 additional hits using the large HYDA domain of T. vaginalis (XP_001322682.1), Nyctotherus ovalis (CAA76373.1), and Thalassiosira pseudonana (XP_002295160.1). All HYDA sequences were classified with HYDDB (58), and only group 1A HYDAs were retained. HMMer version h3.1b2 (www.hmmer.org) (59) and the hidden Markov model (HMM) for the large HYDA domain (PF02906) were used to identify and extract only the large HYDA domain from each sequence for further analysis. For all datasets, taxonomy was assigned to each sequence using ETE version 3.0.0 (60), and CD-HIT version 4.6 (61) was used to reduce the dataset at the 80% amino acid sequence identity level.
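The hit-retrieval criterion described above (an e-value cutoff followed by retention of the best 2000 hits) could be applied to parsed BLAST results roughly as sketched below. This is not the authors' script. The tuple layout and function name are hypothetical, and ranking "best" hits by bit score is an assumption, because the text does not state the ranking criterion.

```python
# Illustrative sketch: keep the best hits below an e-value cutoff.
# Each hit is a (subject_id, evalue, bitscore) tuple (hypothetical layout).

def select_best_hits(hits, max_evalue=1e-5, max_hits=2000):
    """Keep hits with e-value strictly below `max_evalue`, ranked by
    descending bit score (assumption), truncated to `max_hits`."""
    passing = [hit for hit in hits if hit[1] < max_evalue]
    passing.sort(key=lambda hit: hit[2], reverse=True)
    return passing[:max_hits]


# Toy example with made-up identifiers and scores.
toy_hits = [
    ("seqA", 1e-50, 200.0),
    ("seqB", 1e-3, 35.0),   # fails the e-value cutoff
    ("seqC", 1e-20, 95.0),
    ("seqD", 1e-80, 310.0),
]
print(select_best_hits(toy_hits, max_hits=2))
```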

MAFFT v7 (a multiple alignment program for amino acid or nucleotide sequences) (62) using the “--auto” flag was used to build initial alignments, and BMGE (Block Mapping and Gathering with Entropy) version 1.12 (63) was used to mask ambiguously aligned residues (i.e., entropy scores greater than 0.6) using the BLOSUM30 matrix. Initial trees were generated using FastTree version 2.1.9 SSE3 (64), distantly related sequences were removed, and clades of closely related organisms (e.g., same class) were trimmed to contain only representative sequences. With the final datasets, T-COFFEE v11 using M-Coffee mode (65, 66) was used to generate a consensus alignment from the eight default alignment programs. Alignments were manually refined when necessary. Ambiguously aligned residues were removed using BMGE version 1.12 (63) with the indicated thresholds (data S1).

ModelFinder (67) implemented in IQ-TREE v 1.6.10 (68) was used to select the most appropriate evolutionary model [including the C-series mixture models (69)] for each protein (data S1). A total of 1000 ultrafast (70) and SH-like approximate likelihood ratio test (SH-aLRT) (71) bootstraps were calculated using the best-scoring evolutionary model for each protein. The resulting ML tree was used as the guide tree for rapid approximation of posterior mean site frequency (PMSF) of the C-series of mixture models (72) and 100 nonparametric bootstraps. For HYDE, HYDF, HYDG, and PFO phylogenies, in addition to the nonparametric bootstraps discussed above, we calculated the transfer bootstrap expectation (73) implemented in IQ-TREE v 2.0 (74). Rogue taxa were selected with RogueNaRok (75) using the nonparametric bootstrap trees and ML tree from these analyses under the default settings. Taxa identified as rogue were removed from the original dataset, and the alignments and phylogenies were recomputed as described above.

In initial phylogenetic trees for HYDF (see “HYDF family” in data S2), a clade composed of eukaryotes, Anoxychlamydiales, and additional mixed prokaryotic taxa (HYDF clade I) resolved on a long branch apart from other prokaryotic sequences (HYDF clade II), indicating two paralogs. We therefore also analyzed HYDF clade I as described above (Fig. 2 and data S2).

Topologies constraining eukaryotic sequences with various prokaryotic clades were tested and are outlined in data S1. In all cases, the branching of Anoxychlamydiales sister to eukaryotes could not be rejected. Topology tests were performed for HYDE, HYDF, HYDG, and PFO (data S1) using IQ-TREE v 2.0 (74). Briefly, ML trees of the constrained topologies were calculated using the “-g” option under the model of evolution previously selected by ModelFinder (67). The approximately unbiased (AU) test (76) was performed on the ML constrained trees and the 100 bootstrap trees (“-au,” “-z,” “-n 0,” “-zb 10000,” and “-zw”) and with model parameters estimated using the ML tree (“-te”).

Summarized and full phylogenies can be found in Fig. 2; figs. S3, S5, and S6; and data S2. For each gene, the sequence dataset (*.fasta), sequence alignment (*.tcoffee.fasta), BMGE masked alignment (*.bmge.*.fasta), PMSF phylogeny (*.pmsf.tre), and ultrafast bootstrap phylogeny (*.uf.tre) are provided in the following data repository (Figshare DOI: 10.6084/m9.figshare.12387980). Additional datasets with the TBE values (*.tbe.tree) and rogue taxa removed (*.RR_removed.*) for HYDE, HYDF, HYDG, and PFO are also provided. Accession numbers including “CAMPEP” or “CAMNT” derive from the MMETSP database (55).

Data visualization

In R v.3.2.2 (R Development Core Team, 2008), the package ggplot2 (77) was used for plots in Fig. 1 and fig. S1, while the package genoPlotR (78) was used to generate a synteny plot for visualization of the genome organization of HYDA, NUOE, and NUOF (fig. S6). iTOL (79) was used to visualize protein domains and to map them to the corresponding sequences in phylogenetic trees (figs. S5 and S6). FigTree v1.4.2 (80), iTOL (79), and Adobe Illustrator were used to visualize and edit phylogenetic trees.

Acknowledgments

Funding: This work was supported by grants from the Swedish Research Council (VR grant 2015-04959), the European Research Council (ERC starting and consolidator grants 310039 and 817834, respectively), and the Swedish Foundation for Strategic Research (SSF-FFL5) to T.J.G.E. C.W.S. was supported by the European Molecular Biology Organization long-term fellowship (ALTF-997-2015) and the Natural Sciences and Engineering Research Council of Canada (PDF 487174-2016). L.E. was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 704263 and by funding from the European Research Council (ERC starting grant no. 803151). D.T. was supported by the Swedish Research Council International Postdoc grant (2018-06609). A.S. was supported by the Swedish Research Council (VR starting grant 2016-03559) and the NWO-I Foundation of the Netherlands Organization for Scientific Research (WISE fellowship). S.L.J. was supported by the Trond Mohn starting grant BFS2017REK03.

Author contributions: Conceptualization: T.J.G.E., J.E.D., C.W.S., and A.S. Data curation: J.E.D. and C.W.S. Formal analysis: J.E.D., C.W.S., and T.J.G.E. Investigation: J.E.D., C.W.S., L.E., D.T., and T.J.G.E. Validation: all authors. Resources: S.L.J. and T.J.G.E. Supervision: T.J.G.E. Visualization: J.E.D. and C.W.S. Writing (original draft): J.E.D., C.W.S., and T.J.G.E. Writing (reviewing and editing): all authors. The contributions of D.T. and L.E. should be regarded as equal.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: In addition to data available in the Supplementary Materials, files containing sequence datasets, alignments, and phylogenetic trees in Newick format are archived at the digital repository Figshare (DOI: 10.6084/m9.figshare.12387980).
Whole Genome Shotgun projects for metagenome assemblies GS08_GC12_126, GS10_PC15_940, GS10_PC15_1000, and GS10_PC15_1060 have been deposited at DDBJ/ENA/GenBank under the accessions SDBU00000000, SDBV00000000, SDBS00000000, and SDBT00000000, respectively. The versions described in this paper are versions SDBU01000000, SDBV01000000, SDBS01000000, and SDBT01000000, with MAGs generated from each linked to BioProject PRJNA504765. Accessions for genomes analyzed in this study can be found in data S1.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/35/eabb7258/DC1

References and Notes

  • 1. Martin W., Müller M., The hydrogen hypothesis for the first eukaryote. Nature 392, 37–41 (1998).
  • 2. Moreira D., Lopez-Garcia P., Symbiosis between methanogenic archaea and delta-proteobacteria as the origin of eukaryotes: The syntrophic hypothesis. J. Mol. Evol. 47, 517–530 (1998).
  • 3. Spang A., Stairs C. W., Dombrowski N., Eme L., Lombard J., Caceres E. F., Greening C., Baker B. J., Ettema T. J. G., Proposal of the reverse flow model for the origin of the eukaryotic cell based on comparative analyses of Asgard archaeal metabolism. Nat. Microbiol. 4, 1138–1148 (2019).
  • 4. Imachi H., Nobu M. K., Nakahara N., Morono Y., Ogawara M., Takaki Y., Takano Y., Uematsu K., Ikuta T., Ito M., Matsui Y., Miyazaki M., Murata K., Saito Y., Sakai S., Song C., Tasumi E., Yamanaka Y., Yamaguchi T., Kamagata Y., Tamaki H., Takai K., Isolation of an archaeon at the prokaryote-eukaryote interface. Nature 577, 519–525 (2020).
  • 5. López-García P., Moreira D., The Syntrophy hypothesis for the origin of eukaryotes revisited. Nat. Microbiol. 5, 655–667 (2020).
  • 6. Froelich P. N., Klinkhammer G. P., Bender M. L., Luedtke N. A., Heath G. R., Cullen D., Dauphin P., Hammond D., Hartman B., Maynard V., Early oxidation of organic-matter in pelagic sediments of the eastern equatorial Atlantic - suboxic diagenesis. Geochim. Cosmochim. Acta 43, 1075–1090 (1979).
  • 7. Jorgensen S. L., Hannisdal B., Lanzen A., Baumberger T., Flesland K., Fonseca R., Ovreas L., Steen I. H., Thorseth I. H., Pedersen R. B., Schleper C., Correlating microbial community profiles with geochemical data in highly stratified sediments from the Arctic Mid-Ocean Ridge. Proc. Natl. Acad. Sci. U.S.A. 109, E2846–E2855 (2012).
  • 8. Spang A., Saw J. H., Jørgensen S. L., Zaremba-Niedzwiedzka K., Martijn J., Lind A. E., van Eijk R., Schleper C., Guy L., Ettema T. J. G., Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).
  • 9. Zaremba-Niedzwiedzka K., Caceres E. F., Saw J. H., Bäckström D., Juzokaite L., Vancaester E., Seitz K. W., Anantharaman K., Starnawski P., Kjeldsen K. U., Stott M. B., Nunoura T., Banfield J. F., Schramm A., Baker B. J., Spang A., Ettema T. J. G., Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
  • 10. Greening C., Biswas A., Carere C. R., Jackson C. J., Taylor M. C., Stott M. B., Cook G. M., Morales S. E., Genomic and metagenomic surveys of hydrogenase distribution indicate H2 is a widely utilised energy source for microbial growth and survival. ISME J. 10, 761–777 (2016).
  • 11. Stairs C. W., Leger M. M., Roger A. J., Diversity and origins of anaerobic metabolism in mitochondria and related organelles. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, 20140326 (2015).
  • 12. Muller M., Mentel M., van Hellemond J. J., Henze K., Woehle C., Gould S. B., Yu R. Y., van der Giezen M., Tielens A. G. M., Martin W. F., Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol. Mol. Biol. Rev. 76, 444–495 (2012).
  • 13. Schut G. J., Adams M. W., The iron-hydrogenase of Thermotoga maritima utilizes ferredoxin and NADH synergistically: A new perspective on anaerobic hydrogen production. J. Bacteriol. 191, 4451–4457 (2009).
  • 14. Lloyd D., Ralphs J. R., Harris J. C., Giardia intestinalis, a eukaryote without hydrogenosomes, produces hydrogen. Microbiology 148, 727–733 (2002).
  • 15. Happe T., Naber J. D., Isolation, characterization and N-terminal amino acid sequence of hydrogenase from the green alga Chlamydomonas reinhardtii. Eur. J. Biochem. 214, 475–481 (1993).
  • 16. Lindmark D. G., Eckenrode B. L., Halberg L. A., Dinbergs I. D., Carbohydrate, energy and hydrogenosomal metabolism of Tritrichomonas foetus and Trichomonas vaginalis. J. Protozool. 36, 214–216 (1989).
  • 17. Nyvltova E., Sutak R., Harant K., Sedinova M., Hrdy I., Paces J., Vlcek C., Tachezy J., NIF-type iron-sulfur cluster assembly system is duplicated and distributed in the mitochondria and cytosol of Mastigamoeba balamuthi. Proc. Natl. Acad. Sci. U.S.A. 110, 7371–7376 (2013).
  • 18. Hrdy I., Hirt R. P., Dolezal P., Bardonová L., Foster P. G., Tachezy J., Martin Embley T., Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I. Nature 432, 618–622 (2004).
  • 19. Martin W. F., Too much eukaryote LGT. Bioessays 39, (2017).
  • 20. Leger M. M., Eme L., Stairs C. W., Roger A. J., Demystifying eukaryote lateral gene transfer. Bioessays 40, e1700242 (2018).
  • 21. Hug L. A., Stechmann A., Roger A. J., Phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes. Mol. Biol. Evol. 27, 311–324 (2010).
  • 22. Stairs C. W., Eme L., Brown M. W., Mutsaers C., Susko E., Dellaire G., Soanes D. M., van der Giezen M., Roger A. J., A SUF Fe-S cluster biogenesis system in the mitochondrion-related organelles of the anaerobic protist Pygsuia. Curr. Biol. 24, 1176–1186 (2014).
  • 23. Leger M. M., Eme L., Hug L. A., Roger A. J., Novel hydrogenosomes in the microaerophilic jakobid Stygiella incarcerata. Mol. Biol. Evol. 33, 2318–2336 (2016).
  • 24. Schnarrenberger C., Martin W., Evolution of the enzymes of the citric acid cycle and the glyoxylate cycle of higher plants. A case study of endosymbiotic gene transfer. Eur. J. Biochem. 269, 868–883 (2002).
  • 25. Emelyanov V. V., Common evolutionary origin of mitochondrial and rickettsial respiratory chains. Arch. Biochem. Biophys. 420, 130–141 (2003).
  • 26. Omsland A., Sixt B. S., Horn M., Hackstadt T., Chlamydial metabolism revisited: Interspecies metabolic variability and developmental stage-specific physiologic activities. FEMS Microbiol. Rev. 38, 779–801 (2014).
  • 27. Schmitz-Esser S., Linka N., Collingro A., Beier C. L., Neuhaus H. E., Wagner M., Horn M., ATP/ADP translocases: A common feature of obligate intracellular amoebal symbionts related to Chlamydiae and Rickettsiae. J. Bacteriol. 186, 683–691 (2004).
  • 28. Dharamshi J. E., Tamarit D., Eme L., Stairs C. W., Martijn J., Homa F., Jørgensen S. L., Spang A., Ettema T. J. G., Marine sediments illuminate Chlamydiae diversity and evolution. Curr. Biol. 30, 1032–1048.e7 (2020).
  • 29. Pedersen R. B., Rapp H. T., Thorseth I. H., Lilley M. D., Barriga F. J. A. S., Baumberger T., Flesland K., Fonseca R., Früh-Green G. L., Jorgensen S. L., Discovery of a black smoker vent field and vent fauna at the Arctic Mid-Ocean Ridge. Nat. Commun. 1, 126 (2010).
  • 30. Baker B. J., Lazar C. S., Teske A. P., Dick G. J., Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome 3, 14 (2015).
  • 31. Anantharaman K., Brown C. T., Hug L. A., Sharon I., Castelle C. J., Probst A. J., Thomas B. C., Singh A., Wilkins M. J., Karaoz U., Brodie E. L., Williams K. H., Hubbard S. S., Banfield J. F., Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
  • 32. Liang P., Rosas-Lemus M., Patel D., Fang X., Tuz K., Juárez O., Dynamic energy dependency of Chlamydia trachomatis on host cell metabolism during intracellular growth: Role of sodium-based energetics in chlamydial ATP generation. J. Biol. Chem. 293, 510–522 (2018).
  • 33. Orsi W. D., Ecology and evolution of seafloor and subseafloor microbial communities. Nat. Rev. Microbiol. 16, 671–683 (2018).
  • 34. Liesack W., Bak F., Kreft J. U., Stackebrandt E., Holophaga foetida gen. nov., sp. nov., a new, homoacetogenic bacterium degrading methoxylated aromatic compounds. Arch. Microbiol. 162, 85–90 (1994).
  • 35. Slobodkina G. B., Kovaleva O. L., Miroshnichenko M. L., Slobodkin A. I., Kolganova T. V., Novikov A. A., van Heerden E., Bonch-Osmolovskaya E. A., Thermogutta terrifontis gen. nov., sp. nov. and Thermogutta hypogea sp. nov., thermophilic anaerobic representatives of the phylum Planctomycetes. Int. J. Syst. Evol. Microbiol. 65, 760–765 (2015).
  • 36. Zheng H., Brune A., Complete genome sequence of Endomicrobium proavitum, a free-living relative of the intracellular symbionts of termite gut flagellates (Phylum Elusimicrobia). Genome Announc. 3, e00679-15 (2015).
  • 37. Degli Esposti M., Cortez D., Lozano L., Rasmussen S., Nielsen H. B., Martinez Romero E., Alpha proteobacterial ancestry of the [Fe-Fe]-hydrogenases in anaerobic eukaryotes. Biol. Direct 11, 34 (2016).
  • 38. Pittis A. A., Gabaldon T., Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature 531, 101–104 (2016).
  • 39. Moustafa A., Reyes-Prieto A., Bhattacharya D., Chlamydiae has contributed at least 55 genes to Plantae with predominantly plastid functions. PLOS ONE 3, e2205 (2008).
  • 40. Ball S., Colleoni C., Cenci U., Raj J. N., Tirtiaux C., The evolution of glycogen and starch metabolism in eukaryotes gives molecular clues to understand the establishment of plastid endosymbiosis. J. Exp. Bot. 62, 1775–1801 (2011).
  • 41. Martijn J., Vosseberg J., Guy L., Offre P., Ettema T. J. G., Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature 557, 101–105 (2018).
  • 42. Langmead B., Salzberg S. L., Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  • 43. Raymann K., Brochier-Armanet C., Gribaldo S., The two-domain tree of life is linked to a new root for the Archaea. Proc. Natl. Acad. Sci. U.S.A. 112, 6670–6675 (2015).
  • 44. Stamatakis A., RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  • 45. Buchfink B., Xie C., Huson D. H., Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
  • 46. Jones P., Binns D., Chang H. Y., Fraser M., Li W., McAnulla C., McWilliam H., Maslen J., Mitchell A., Nuka G., Pesseat S., Quinn A. F., Sangrador-Vegas A., Scheremetjew M., Yong S. Y., Lopez R., Hunter S., InterProScan 5: Genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
  • 47. Finn R. D., Bateman A., Clements J., Coggill P., Eberhardt R. Y., Eddy S. R., Heger A., Hetherington K., Holm L., Mistry J., Sonnhammer E. L. L., Tate J., Punta M., Pfam: The protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
  • 48. Finn R. D., Attwood T. K., Babbitt P. C., Bateman A., Bork P., Bridge A. J., Chang H. Y., Dosztányi Z., el-Gebali S., Fraser M., Gough J., Haft D., Holliday G. L., Huang H., Huang X., Letunic I., Lopez R., Lu S., Marchler-Bauer A., Mi H., Mistry J., Natale D. A., Necci M., Nuka G., Orengo C. A., Park Y., Pesseat S., Piovesan D., Potter S. C., Rawlings N. D., Redaschi N., Richardson L., Rivoire C., Sangrador-Vegas A., Sigrist C., Sillitoe I., Smithers B., Squizzato S., Sutton G., Thanki N., Thomas P. D., Tosatto S. C. E., Wu C. H., Xenarios I., Yeh L. S., Young S. Y., Mitchell A. L., InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
  • 49. Huerta-Cepas J., Forslund K., Coelho L. P., Szklarczyk D., Jensen L. J., von Mering C., Bork P., Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
  • 50. Huerta-Cepas J., Szklarczyk D., Forslund K., Cook H., Heller D., Walter M. C., Rattei T., Mende D. R., Sunagawa S., Kuhn M., Jensen L. J., von Mering C., Bork P., eggNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
  • 51. Kanehisa M., Goto S., KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
  • 52. Kanehisa M., Sato Y., Kawashima M., Furumichi M., Tanabe M., KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
  • 53. Kanehisa M., Sato Y., Morishima K., BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
  • 54. Tatusov R. L., Fedorova N. D., Jackson J. D., Jacobs A. R., Kiryutin B., Koonin E. V., Krylov D. M., Mazumder R., Mekhedov S. L., Nikolskaya A. N., Sridhar Rao B., Smirnov S., Sverdlov A. V., Vasudevan S., Wolf Y. I., Yin J. J., Natale D. A., The COG database: An updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
  • 55. Keeling P. J., Burki F., Wilcox H. M., Allam B., Allen E. E., Amaral-Zettler L. A., Armbrust E. V., Archibald J. M., Bharti A. K., Bell C. J., Beszteri B., Bidle K. D., Cameron C. T., Campbell L., Caron D. A., Cattolico R. A., Collier J. L., Coyne K., Davy S. K., Deschamps P., Dyhrman S. T., Edvardsen B., Gates R. D., Gobler C. J., Greenwood S. J., Guida S. M., Jacobi J. L., Jakobsen K. S., James E. R., Jenkins B., John U., Johnson M. D., Juhl A. R., Kamp A., Katz L. A., Kiene R., Kudryavtsev A., Leander B. S., Lin S., Lovejoy C., Lynn D., Marchetti A., McManus G., Nedelcu A. M., Menden-Deuer S., Miceli C., Mock T., Montresor M., Moran M. A., Murray S., Nadathur G., Nagai S., Ngam P. B., Palenik B., Pawlowski J., Petroni G., Piganeau G., Posewitz M. C., Rengefors K., Romano G., Rumpho M. E., Rynearson T., Schilling K. B., Schroeder D. C., Simpson A. G. B., Slamovits C. H., Smith D. R., Smith G. J., Smith S. R., Sosik H. M., Stief P., Theriot E., Twary S. N., Umale P. E., Vaulot D., Wawrik B., Wheeler G. L., Wilson W. H., Xu Y., Zingone A., Worden A. Z., The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).
  • 56. Leger M. M., Kolisko M., Kamikawa R., Stairs C. W., Kume K., Čepička I., Silberman J. D., Andersson J. O., Xu F., Yabuki A., Eme L., Zhang Q., Takishita K., Inagaki Y., Simpson A. G. B., Hashimoto T., Roger A. J., Organelles that illuminate the origins of Trichomonas hydrogenosomes and Giardia mitosomes. Nat. Ecol. Evol. 1, 0092 (2017).
  • 57. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T. L., BLAST+: Architecture and applications. BMC Bioinformatics 10, 421 (2009).
  • 58. Sondergaard D., Pedersen C. N. S., Greening C., HydDB: A web tool for hydrogenase classification and analysis. Sci. Rep. 6, 34212 (2016).
  • 59. Eddy S. R., Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
  • 60. Huerta-Cepas J., Serra F., Bork P., ETE 3: Reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
  • 61. Huang Y., Niu B., Gao Y., Fu L., Li W., CD-HIT Suite: A web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682 (2010).
  • 62. Katoh K., Misawa K., Kuma K., Miyata T., MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
  • 63. Criscuolo A., Gribaldo S., BMGE (Block Mapping and Gathering with Entropy): A new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
  • 64. Price M. N., Dehal P. S., Arkin A. P., FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLOS ONE 5, e9490 (2010).
  • 65. Notredame C., Higgins D. G., Heringa J., T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
  • 66. Wallace I. M., O’Sullivan O., Higgins D. G., Notredame C., M-Coffee: Combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699 (2006).
  • 67. Kalyaanamoorthy S., Minh B. Q., Wong T. K. F., von Haeseler A., Jermiin L. S., ModelFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
  • 68. Nguyen L. T., Schmidt H. A., von Haeseler A., Minh B. Q., IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
  • 69. Quang S., Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24, 2317–2323 (2008).
  • 70. Hoang D. T., Chernomor O., von Haeseler A., Minh B. Q., Vinh L. S., UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
  • 71. Guindon S., Dufayard J. F., Lefort V., Anisimova M., Hordijk W., Gascuel O., New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
  • 72. Wang H. C., Minh B. Q., Susko E., Roger A. J., Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 67, 216–235 (2018).
  • 73. Lemoine F., Domelevo Entfellner J. B., Wilkinson E., Correia D., Dávila Felipe M., de Oliveira T., Gascuel O., Renewing Felsenstein’s phylogenetic bootstrap in the era of big data. Nature 556, 452–456 (2018).
  • 74. Minh B. Q., Schmidt H. A., Chernomor O., Schrempf D., Woodhams M. D., von Haeseler A., Lanfear R., IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
  • 75. Aberer A. J., Krompass D., Stamatakis A., Pruning rogue taxa improves phylogenetic accuracy: An efficient algorithm and webservice. Syst. Biol. 62, 162–166 (2013).
  • 76. Shimodaira H., An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508 (2002).
  • 77. H. Wickham, ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, New York, 2009).
  • 78. Guy L., Roat Kultima J., Andersson S. G. E., genoPlotR: Comparative gene and genome visualization in R. Bioinformatics 26, 2334–2335 (2010).
  • 79. Letunic I., Bork P., Interactive tree of life (iTOL) v3: An online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
  • 80. A. Rambaut, FigTree v1.3.1 (Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, 2010).

Articles from Science Advances are provided here courtesy of American Association for the Advancement of Science
