Significance
Oxidoreductases mediate the biological production of chemical energy and regulate the flow of essential elements in all organisms and ecosystems, yet their evolutionary history is poorly understood. Here we present a network analysis of all known metal-containing oxidoreductases across the tree of life. Members of this network seem to have driven microbial metabolism in the Archean oceans. Our analysis reveals that oxidoreductases are polyphyletic and derived from a minimum of 10 different ancient protein families with distantly related domains. However, we find substantial evidence that two apparently distinct and ubiquitous iron-containing families of oxidoreductases containing Fe2S2 and hemes arose from a single common ancestor.
Keywords: iron–sulfur, Great Oxidation/Oxygenation Event, biogeochemical cycles, core pathways
Abstract
Oxidoreductases mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately facilitate the biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth. The core enzymes responsible for these reactions are ancient, often small in size, and highly diverse in amino acid sequence, and many require specific transition metals in their active sites. Here we reconstruct the evolution of metal-binding domains in extant oxidoreductases using a flexible network approach and permissive profile alignments based on available microbial genome data. Our results suggest there were at least 10 independent origins of redox domain families. However, we also identified multiple ancient connections between Fe2S2- (adrenodoxin-like) and heme- (cytochrome c) binding domains. Our results suggest that these two iron-containing redox families had a single common ancestor that underwent duplication and divergence. The iron-containing protein family constitutes ∼50% of all metal-containing oxidoreductases and potentially catalyzed redox reactions in the Archean oceans. Heme-binding domains seem to be derived via modular evolutionary processes that ultimately form the backbone of redox reactions in both anaerobic and aerobic respiration and photosynthesis. The empirically discovered network allows us to peer into the ancient history of microbial metabolism on our planet.
Oxidoreductases are anciently derived enzymes that mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately came to facilitate biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth (1). It has been suggested that an ancestral pool of peptide modules may have given rise to the first protein folds that were dispersed into different superfamilies (2–4). Some of these peptide modules are part of the limited set of building blocks (i.e., the “redox enzyme construction kit”) that gave rise to many oxidoreductases (5). Given this Darwinian model of “descent with modification” for amino acid sequences in the active sites of enzymes, we analyzed a core set of oxidoreductase catalytic domains to elucidate the origin(s) and evolutionary patterns of biological electron transfer reactions.
Previous analysis suggests that the catalytic domains evolved in microbes long before the Great Oxidation Event (GOE) ca. 2.4 billion y ago (6). However, owing to their ancient provenance, often small size, and high divergence, the evolutionary history of these domains is challenging to reconstruct. For example, a recent attempt to reconstruct the phylogeny of oxidoreductase domains based on structural data (7) was limited to pairwise distance analysis (without an underlying model of structure evolution) and implied a monophyletic origin of all metal-binding domains. To address the evolutionary history of oxidoreductases, we used hidden Markov model (HMM) (8–10) profile-to-profile alignments and protein similarity networks (11, 12) to study the metal-binding domains. HMMs (10) are a class of probabilistic models generally applicable to linear sequences (13). Because profile HMMs capture family-specific information, including functionally and structurally important residues, they are more sensitive and accurate than sequence alignments alone when searching for deeply diverged family members (2, 14, 15). The evolutionary relationships of protein families identified using profile HMMs may be reconstructed with similarity networks. These networks are composed of vertices (nodes), which represent protein sequences, connected by edges, representing similarity above a specified cutoff. Protein similarity networks offer an appealing alternative to phylogenetic approaches that rely on simultaneous multiple sequence alignments to reconstruct strictly bifurcating trees. By incorporating different metrics (e.g., pairwise alignment of profiles or sequence-to-profile alignments), network analysis provides a flexible approach to access the composition of domains in ancient protein families. 
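The network definition above (vertices joined by an edge when similarity exceeds a cutoff) can be illustrated with a minimal sketch. The function name, the toy domain labels, the scores, and the cutoff value are all hypothetical and are not taken from the study; the actual scores came from profile alignments, as described in Methods.

```python
def build_similarity_network(pairwise_scores, cutoff):
    """Return an adjacency map with an edge for every pair scoring above the cutoff.

    pairwise_scores maps (vertex_a, vertex_b) to a similarity score.
    Every vertex seen in the input is kept, even if it ends up with no edges.
    """
    network = {}
    for (a, b), score in pairwise_scores.items():
        network.setdefault(a, set())
        network.setdefault(b, set())
        if a != b and score > cutoff:
            # Similarity is treated as symmetric, so the edge is undirected.
            network[a].add(b)
            network[b].add(a)
    return network


# Toy scores (hypothetical values, for illustration only).
scores = {
    ("dom_A", "dom_B"): 0.62,
    ("dom_B", "dom_C"): 0.41,
    ("dom_A", "dom_C"): 0.05,
    ("dom_C", "dom_D"): 0.10,
}
network = build_similarity_network(scores, cutoff=0.30)
```

In this toy case dom_D remains in the network as a vertex without edges, which corresponds to what the text calls an isolated domain.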
In this study we applied a flexible network approach (11, 12) and permissive profile alignments (2, 14, 15) on microbial genome data to reconstruct the evolutionary history of oxidoreductase metal-binding domains. Our results suggest that, whereas there were at least 10 independent origins of redox domain families, one core family of iron-containing oxidoreductases came to dominate the electron fluxes across the planet before the evolution of oxygen. This family continues to represent the core of biologically catalyzed electron transfer reactions on Earth.
Results and Discussion
Profile Alignments Reveal at Least 10 Origins of Transition-Metal Redox Domains.
By aligning sequence profiles of metal-binding redox domains (16) (102 HMM domain profiles with five or more sequences; Methods) we constructed a network of vertices, with edges between them indicating domain similarity. This approach revealed 10 distinct (disconnected; Methods) subnetworks (grouping 71 domains) and 31 isolated domains (Fig. 1, Fig. S1, and Table S1). Domains binding different ligands (e.g., Fe4S4 with Fe2S2) were connected to each other only in the largest subnetwork (Fig. 1A); however, this subnetwork did not contain domains binding molybdenum, tungsten, manganese, or copper. These results strongly imply that oxidoreductases (assigned to EC class 1) are polyphyletic (Fig. 1).
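The split into disconnected subnetworks and isolated domains amounts to finding the connected components of the network. The sketch below shows one way to do this with a depth-first traversal; the function names and the toy graph are hypothetical, and the study's own tooling is described in Methods and SI Methods. As a consistency check on the reported numbers, 71 grouped domains plus 31 isolated domains account for all 102 profiles.

```python
def connected_components(adjacency):
    """Return a list of vertex sets, one per connected component."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            vertex = stack.pop()
            component.add(vertex)
            for neighbor in adjacency[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def split_subnetworks(adjacency):
    """Separate multi-vertex subnetworks from isolated (single-vertex) domains."""
    components = connected_components(adjacency)
    subnetworks = [c for c in components if len(c) > 1]
    isolated = sorted(v for c in components if len(c) == 1 for v in c)
    return subnetworks, isolated


# Toy network (hypothetical): two subnetworks and one isolated domain.
toy = {
    "d1": {"d2"}, "d2": {"d1", "d3"}, "d3": {"d2"},
    "d4": {"d5"}, "d5": {"d4"},
    "d6": set(),
}
subnetworks, isolated = split_subnetworks(toy)
```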
Fig. 1.
Multiple subnetworks of transition-metal redox domains found using profile alignments. Vertices indicate domain families and edge lengths indicate profile alignment scores (number of aligned residues per shortest profile length). Solid edge connections are verified both by PSI-BLAST and profile alignments. Six vertices were collapsed for the purpose of presentation (Fig. S1). Isolated vertices are not shown (Table S1). Note that the distance between disconnected network components does not indicate the level of similarity. (A) Domain ligands (indicated by vertex color). Note that a ligand may be coordinated by different types of domains (e.g., heme in three different types of domains). (B) Vertex pie charts show the proportion of sequences with different oxygen requirements for each domain (aerobes are blue, facultative aerobes are yellow, and anaerobes are red). Domains that have <2% annotated sequences are shown in light blue with a darker circle. Yellow asterisks indicate a domain with a high proportion (≥70%) of sequences from thermophiles. Proportions were calculated with normalized organism counts, that is, by dividing the count of domain-sequences of a given category (e.g., anaerobes) by the number of organisms in this category in the overall set.
One Diverged Subnetwork Implies a Common Ancestor for Iron-Binding Domains.
The largest subnetwork contains edges that connect different ligand-binding protein domains. Both profile-vs.-profile and position-specific iterative (PSI)-BLAST alignments connect the following binding domains: Fe4S4 to Fe2S2, iron–sulfur (Fe4S4, Fe2S2) to four cysteine iron domains (FeS4), and iron–sulfur (Fe4S4, Fe2S2) to hemes (Fig. S2). In contrast, alignments of Fe4S4-binding domains with nitrogenase (FeFe, MoFe, VFe, or 8Fe-7S) are supported only by the more sensitive profile alignments (Fig. 1A), implying a more distant relationship. This analysis suggests a monophyletic origin of iron-binding domains that we postulate to have evolved from FexSx to hemes (discussed below). This core set of domains is distantly related to domains of nitrogenase and those containing other metals, including vanadium and nickel (i.e., NiFe in hydrogenase).
Ancient Domains Arose in Thermophilic Anaerobic Prokaryotes and Subsequently Diverged Following the Rise of Oxygen.
To test the hypothesis that oxidoreductase transition metal-binding domains that share a common ancestor may have diversified via environmental selection, we studied their distribution with respect to tolerance of oxygen and elevated temperatures. Domain families dominated by sequences from anaerobes (>50%) and thermophiles (>70%) are found in the largest subnetwork (Fig. 1B and SI Methods) and are presumably relics of the oldest core motifs. In contrast, one-half of the domain families (14 of 28) in other subnetworks are dominated by sequences from aerobic prokaryotes. Ten of these domains are present in enzymes whose functions require molecular oxygen [e.g., oxygenases (monooxygenase and dioxygenase) and cytochrome c oxidase]. Our results suggest that some domains found in the largest subnetwork arose early in a thermophilic anaerobic prokaryotic ancestor (17, 18) and subsequently diverged as the oxidation state of the surface of the Earth increased and temperature presumably decreased. In contrast, the domains present in other isolated, mostly aerobic “islands” (subnetworks) presumably evolved independently after the rise of oxygen.
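The legend of Fig. 1 states that proportions were calculated by dividing the count of domain sequences in a category by the number of organisms in that category. The sketch below follows that description and then rescales the normalized values so they sum to 1; the rescaling step, the function name, and the toy counts are assumptions made for illustration, not details confirmed by the source.

```python
def normalized_proportions(domain_counts, organism_totals):
    """Normalize per-category sequence counts by organism counts, then rescale to sum to 1.

    Assumes every entry of organism_totals is a positive number.
    """
    normalized = {
        category: domain_counts.get(category, 0) / organism_totals[category]
        for category in organism_totals
    }
    total = sum(normalized.values())
    if total == 0:
        return {category: 0.0 for category in normalized}
    return {category: value / total for category, value in normalized.items()}


# Toy numbers (hypothetical): raw counts favor aerobes, but anaerobes are
# under-sampled, so the normalized proportion is dominated by anaerobes.
counts = {"anaerobe": 30, "facultative": 20, "aerobe": 40}
totals = {"anaerobe": 100, "facultative": 200, "aerobe": 400}
props = normalized_proportions(counts, totals)
dominated_by_anaerobes = props["anaerobe"] > 0.50
```

The example is chosen to show why normalization matters: without it, the category with the most sequenced organisms would appear to dominate a domain family simply because it is sampled more heavily.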
The assumption of independent innovation is supported by a study of the evolution of protein folds. This analysis suggests that enzymes recruited new (and preexisting) folds for oxygen-dependent metabolism in the several-hundred-million-year interval between the emergence of oxygenic photosynthesis and the GOE (19). The absence of connections between domain islands may reflect limitations of the sequence similarity-based bioinformatic approach used in this study. However, it is plausible that (i) selective forces imposed by an increase in oxygen following the GOE served as a driving force for the evolution of novel domains and (ii) specific, environmentally constrained niches preserved this novelty. One such example is superoxide dismutase (SOD; Table S1), which is found in high frequency in facultative anaerobes. The presumably toxic effect of even low concentrations of oxygen imposed strong selective forces that resulted in the innovation of several independent SOD analogs of oxygen scavengers, two of which (Cu/Zn and Fe/Mn binding) (20) appear as isolated domains in our network.
Ancient Origin of Fe4S4-Binding Domains Is Implied by Its High Abundance in Anaerobic, Thermophilic Environments and High Network Centrality.
To study the possible antiquity of Fe4S4-binding domains, we grouped domains that bind similar ligands (Table S1) and studied their network centrality (SI Methods). Transition metal-binding domains can be divided into three categories based on differences in their relative abundance between obligate anaerobes, facultative aerobes, and obligate aerobes (Fig. S3). The Fe4S4-binding domains reveal a pattern of descending abundance as oxygen requirement increases (Fig. S3A) and temperature requirement decreases (from thermophilic to meso/psychrophilic; Fig. S3D and SI Methods). Although temperature and oxygen requirements are treated separately, their influence on the distribution of microbes is often interconnected and trends in the occurrence of a specific domain may derive from one or a combination of both factors. The high centrality of Fe4S4-binding domains further supports a putative origin in ancient anaerobic environments (Fig. S4). This hypothesis is supported by a phylogenomic analysis that addressed the evolution of metal-binding domain structures and found an early origin of iron–sulfur-binding domains (6). In addition, it has been suggested that ferredoxin and related proteins evolved early in the history of biological catalysis of redox reactions (21, 22). This idea is partially based on the short, repeat sequence in modern ferredoxins (21), all of which contain a CXXCXXC….C motif and are chiral (22, 23). We propose that the Fe4S4-binding domain served as a template for the emergence of other domains found across the oxidoreductase landscape. The adjustment to changes in metal availability (6, 24) and the need to generate domains with different redox potential and electron transfer chains seem to have been important driving forces in the evolution of new domains.
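The centrality measure used in the study is defined in SI Methods and is not restated here. As an illustration only, the sketch below uses degree centrality (the fraction of other vertices a vertex is connected to) and averages it over domains that bind the same ligand; the choice of degree centrality, the function names, and the toy network are assumptions.

```python
def degree_centrality(adjacency):
    """Degree of each vertex divided by the number of other vertices."""
    n = len(adjacency)
    if n < 2:
        return {vertex: 0.0 for vertex in adjacency}
    return {vertex: len(neighbors) / (n - 1) for vertex, neighbors in adjacency.items()}


def mean_centrality_by_ligand(adjacency, ligand_of):
    """Average the degree centrality over vertices that share a ligand label."""
    grouped = {}
    for vertex, value in degree_centrality(adjacency).items():
        grouped.setdefault(ligand_of[vertex], []).append(value)
    return {ligand: sum(values) / len(values) for ligand, values in grouped.items()}


# Toy star-shaped network (hypothetical) with one Fe4S4 domain at the hub.
toy = {
    "fe4s4_a": {"fe4s4_b", "fe2s2_a", "heme_a", "heme_b"},
    "fe4s4_b": {"fe4s4_a"},
    "fe2s2_a": {"fe4s4_a"},
    "heme_a": {"fe4s4_a"},
    "heme_b": {"fe4s4_a"},
}
ligands = {
    "fe4s4_a": "Fe4S4", "fe4s4_b": "Fe4S4",
    "fe2s2_a": "Fe2S2", "heme_a": "heme", "heme_b": "heme",
}
centrality = mean_centrality_by_ligand(toy, ligands)
```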
Analysis of Networks Based Solely on Domain Sequences Is Consistent with a Common Origin of Fe2S2- and Heme-Binding Domains.
Alignment of domain profiles (Fig. 1) revealed a potential evolutionary connection between Fe2S2- and heme-binding domains. To investigate this result we constructed a more stringent network from domain sequences of the largest subnetwork by using a higher sequence identity cutoff and larger alignment coverage required to define an edge (Methods). Within this stringent network (Fig. 2A), subnetwork 8 contains a central region of 83 Fe2S2-binding domain sequences (an adrenodoxin-family domain that also includes prokaryotic putidaredoxin and terpredoxin) surrounded by 282 heme-binding domain sequences (cytochrome c). Although “promiscuous” domains (i.e., aligning to three or more domain sequences binding a different ligand) exist both for Fe2S2- (n = 34) and heme- (n = 39) binding domains, the promiscuous heme domains have sixfold fewer connections with other heme domain sequences than do the promiscuous Fe2S2 domains (Fig. 2B). Nearly one-half (19) of the promiscuous heme domains do not align with any other heme domain sequence. That is, the central (promiscuous) sequences of the heme domains are more similar to Fe2S2 domains than to other heme domains. These results suggest that the Fe2S2 and heme, which constitute ∼50% of all metal-containing oxidoreductases (7), are derived from a single common ancestor. To investigate the robustness of this result, we randomized the connections between the Fe2S2- and heme-binding domain sequences using three different approaches (Methods). These randomized networks (Fig. S5 A and B) were markedly different from those shown in Fig. 2A.
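The definition of a promiscuous domain sequence (one that aligns to three or more domain sequences binding a different ligand) can be written as a simple count over network neighbors. The sketch below is a hedged illustration: the function names, the toy edges, and the sequence labels are hypothetical, and only the threshold of three comes from the text.

```python
def neighbor_counts(adjacency, ligand_of):
    """For each vertex return (same-ligand neighbors, different-ligand neighbors)."""
    counts = {}
    for vertex, neighbors in adjacency.items():
        nonself = sum(1 for n in neighbors if ligand_of[n] != ligand_of[vertex])
        counts[vertex] = (len(neighbors) - nonself, nonself)
    return counts


def promiscuous_vertices(adjacency, ligand_of, min_nonself=3):
    """Vertices connected to at least min_nonself sequences that bind a different ligand."""
    counts = neighbor_counts(adjacency, ligand_of)
    return sorted(v for v, (_, nonself) in counts.items() if nonself >= min_nonself)


# Toy network (hypothetical): h* bind heme, f* bind Fe2S2.
edges = [("h1", "f1"), ("h1", "f2"), ("h1", "f3"),
         ("f1", "h2"), ("f1", "h3"), ("f1", "f2"), ("h2", "h3")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
ligand_of = {v: ("heme" if v.startswith("h") else "Fe2S2") for v in adjacency}

counts = neighbor_counts(adjacency, ligand_of)
promiscuous = promiscuous_vertices(adjacency, ligand_of)
```

In the toy network, h1 is a promiscuous heme sequence with no heme neighbors at all, the situation the text reports for 19 of the 39 promiscuous heme domains.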
Fig. 2.
Evolutionary relationship between heme-binding cytochrome c and the adrenodoxin-like (Fe2S2-binding) domain sequences. (A) The entire subnetwork 8 of the domain-sequence network constructed from domain sequences of the largest subnetwork (Methods) that contains heme- and Fe2S2-binding sequences. Vertices indicate adrenodoxin-family domain sequences (gray circles; InterPro ID IPR018298) and cytochrome c domain sequences (red circles; general and multiheme cytochrome with InterPro IDs IPR009056 and IPR011031, respectively). Edges indicate PSI-BLAST alignments. (B) Promiscuous sequences of heme (red bars) are more similar to Fe2S2 (gray bars) domain sequences than to other heme domain sequences. The level of promiscuity (x axis) is defined by the minimal number of connections to nonself domain sequences (i.e., heme connection with Fe2S2). (C) Network representing all paths of the degenerative alignment chains (Methods) starting from the 34 promiscuous adrenodoxin-like (Fe2S2-binding) domain sequences and ending with cytochrome c (heme-binding) domain sequences. Vertices indicate sequences and edges indicate BLAST alignments (Methods). Note that some edges are used in more than one path. In most steps, only one substitution occurred (observed by one nonaligned amino acid). The double dash marks an edge that was shifted left and extended solely for the purpose of compact presentation.
Approximately 35% of Transition-Metal Redox Domain-Containing Proteins Are Not Classified as Enzymes.
To understand the origin of transition-metal redox domains we examined their distribution in other enzyme classes in the UniProt database. Approximately 44% of the transition-metal redox domain-containing proteins are not classified as enzymes and a small proportion (1.8%) is assigned to nonoxidoreductase enzyme activities (Table S2). In a previous study aimed at detecting oxidoreductases based on the presence of catalytic sites it was suggested that ∼20% of nonenzyme proteins in the UniProt database are misclassified enzymes (25). However, even taking this level of misannotation into account, as much as 35% of transition-metal redox domain-containing proteins in our set have no assigned enzymatic activity. An explanation for this result may be that the transition-metal redox domains evolved from short sequences (i.e., modules) that were able to bind transition metals and were later recruited into oxidoreductases, where they assumed a new role in mediating electron transfer reactions. This hypothesis is supported by the observation that an ancestral pool of peptide modules gave rise to the ancient folds (2). To further test our inference, we asked whether these types of transitions are found in available metagenome data.
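One reading of how the 44% and 35% figures relate is sketched below. It assumes that the ∼20% misclassification rate from the earlier study (25) applies uniformly to the nonenzyme proteins in this set; that assumption, and the function name, are ours rather than a calculation stated in the source.

```python
def nonenzyme_fraction_after_correction(fraction_nonenzyme, misclassified_rate):
    """Fraction still lacking enzymatic activity after discounting misclassified enzymes."""
    return fraction_nonenzyme * (1.0 - misclassified_rate)


# 44% not classified as enzymes, of which ~20% are assumed to be misclassified enzymes.
corrected = nonenzyme_fraction_after_correction(0.44, 0.20)
print(f"{corrected:.1%}")  # prints 35.2%
```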
Evidence for Naturally Occurring Intermediates Between Fe2S2-Binding (Adrenodoxin-like) and Heme-Binding (Cytochrome c) Domain Sequences.
We searched the UniProt domain sequences and the Global Ocean Sampling (GOS) data (26) for all possible sequence intermediates between the core regions of Fe2S2–heme-binding domains (Methods). Using a “degenerative alignment chain” approach starting from Fe2S2-binding (adrenodoxin-like) domain sequences, we retrieved 21 heme-binding (cytochrome c) domain sequences (Fig. 2C). The majority (90%) of the degenerative alignment paths ended with a heme-binding domain, whereas the remainder ended with FeS- (9%) and copper- (1%) binding domains. Paths that ended with heme-binding domain sequences contained an average of 13.2 steps and 1.5 indels per path (Tables S3 and S4). It seems that protein sequences found in nature are constrained to a very limited portion (∼10^12) of all available protein space (10^390) (2). For the modules identified here, only a small fraction (10^8) of the available protein space was accessed (SI Methods). Thus, it is reasonable to assume that it is virtually impossible for the protein space found in nature to contain all of the intermediates between any given pair of sequences.
To compensate for this lack of diversity, specifically in the GOS database, we performed a degenerative alignment chain search against a “joint database” that included a larger sequence space from noneukaryotic organisms in UniProt (SI Methods). For the sample of 11 (adrenodoxin-like) queries (SI Methods) this approach did not retrieve any unexpected domain sequences (i.e., those that bind ligands other than Fe4S4, heme, FeS, and copper). This result suggests that expansion of sequence space to apparently unrelated modules is not an artifact of the degenerative alignment analysis in the current protein space. Furthermore, we propose that naturally occurring protein intermediates, which may not be capable of binding Fe2S2 or heme, exist transiently between Fe2S2 and the core of heme-binding domains.
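The degenerative alignment chain can be sketched as an iterative search in which each hit becomes the next query, and a chain ends as a path once it reaches a sequence that binds a ligand other than Fe2S2 (Methods). In the sketch below, `toy_search` is a hypothetical stand-in for a BLAST search that accepts equal-length sequences differing at no more than one position; the four-letter sequences, the ligand labels, and the step limit are invented for illustration and the whole hit is reused as the query, whereas the study used the aligned region of the hit.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def toy_search(query, database, max_mismatches=1):
    """Hypothetical stand-in for BLAST: near-identical sequences of the same length."""
    return [s for s in database
            if s != query and len(s) == len(query)
            and hamming(s, query) <= max_mismatches]


def alignment_chains(start, database, start_ligand="Fe2S2", max_steps=20):
    """Enumerate chains from start that end at the first sequence binding another ligand."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) - 1 >= max_steps:
            continue
        for hit in toy_search(path[-1], database):
            if hit in path:
                continue  # do not revisit a sequence within the same chain
            if database[hit] != start_ligand:
                paths.append(path + [hit])  # chain terminates: this is a "path"
            else:
                stack.append(path + [hit])  # hit becomes the next query
    return paths


# Toy database (hypothetical sequences and ligand labels).
database = {
    "CAGC": "Fe2S2",
    "CAGH": "Fe2S2",
    "CKGH": "Fe2S2",
    "CKCH": "heme",
    "WWWW": "copper",
}
paths = alignment_chains("CAGC", database)
steps = [len(p) - 1 for p in paths]
```

Here the only path found runs CAGC, CAGH, CKGH, CKCH, with one substitution per step, which mirrors the observation in Fig. 2C that most steps involve a single nonaligned amino acid.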
Alignment of Complete Protein Sequence Profiles of Fe2S2-Binding (Adrenodoxin Family) and Cytochrome c-Specific Families Demonstrates Conservation Beyond the Short Fe2S2 Domain Sequences.
To search for conservation beyond the short Fe2S2 domain sequence and its corresponding core in the cytochrome c domain, we generated profiles from complete sequences of proteins containing the Fe2S2 (adrenodoxin family)-binding domains and the cytochrome c-specific domains (SI Methods). The adrenodoxin-like domain-containing protein aligns (E-value 0.0025) to cytochrome c4 (IPR024167) with six aligned positions outside the short domain core; two of these residues (Leu and Gly) are conserved in >40% of the 544 adrenodoxin-like and the 557 cytochrome c4 domain sequences (Fig. S5). Thus, if an evolutionary path exists from the adrenodoxin family to cytochrome c, it extended beyond the core binding residues. Based on the redox potential of cytochrome c and the adrenodoxin family proteins, we assume that the newly derived porphyrin, using the cytochrome c-like domain, was probably more oxidizing than the adrenodoxin-like Fe2S2 domain (27–29).
It has been suggested that the abiotic origins of life precursors used transition metal-based ligands (e.g., iron–sulfur clusters) to fix carbon (30), and the basic reaction was appropriated by the first free-living organisms (31). This hypothesis highlights the critical importance of transition metal-binding domains in the origin of metabolism. However, our analysis suggests there was no single transition metal domain that is the progenitor of all extant oxidoreductases. Rather, the network analyses we present strongly indicate a polyphyletic evolutionary history. The multiple functions encoded by highly diverged protein families hinder testing the hypothesis of polyphyly via the application of a standard phylogenetic analysis with an associated protein evolution model (32). Furthermore, our analysis supports the hypothesis of extraordinary genetic innovation during the Archaean eon (19, 33). One of these studies highlights genetic innovations during an “Archaean genetic expansion” that was associated with an expansion in microbial respiratory and electron transport capabilities (33). The other suggests that enzymes recruited new folds for oxygen-dependent metabolism (19). This “research and development” stage in the first ca. 2.5 billion y of Earth’s history seems to have facilitated the radiation of microbes under the strong selective conditions ultimately imposed by the presence of oxygen. Metabolic innovation ended with the evolution of eukaryotes (1).
We cannot completely rule out the possibility that sequence convergence explains the shared patterns we identified for short amino acid sequences in, for example, Fe2S2-binding domains. However, the evolutionary connection between heme- and iron–sulfur Fe2S2-binding domains is supported by phylogenetic analysis of the remotely related (34, 35) signal sensor PAS and GAF domain families, which shows the partition of a heme-binding PAS, Fe4S4-binding PAS, and Fe2S2-binding GAF into three different clades (36, 37). Networks do not address the direction of evolution; however, the postulated existence of iron–sulfur-based ancient life precursors (30, 31) suggests that the Fe2S2-binding domain appeared before the heme-binding structure. This assumption is also supported by evolutionary analysis of transition-metal redox domains, which demonstrates that Fe2S2-binding domains are highly evolvable, loop-rich structures containing few hydrogen bonds (7). These hypotheses suggest a derived evolutionary position for heme-binding domains (specifically cytochrome c), which are found across the tree of life. These protein domains ultimately form modules of oxidoreductase evolution that support both anaerobic and aerobic respiration and photosynthesis (38).
Does the Largest Subnetwork Contain Components of Metabolic Pathways Present in the First Cells on Our Planet?
Inspection of the central network that contains both iron–sulfur and hemes suggests the component oxidoreductases represent an ancient core of redox metal-dependent pathways in anaerobes and thermophiles (Fig. 3). In Archean oceans, H2 was undoubtedly a major source of reducing power (17, 18, 39, 40). Metabolic strategies among ancient microbial consortia (Fig. 3) probably included utilization of redox-coupled reactions with the available donor, H2, and acceptors, CO2 and elemental sulfur, to generate an H+-based thermodynamic disequilibrium that was subsequently used to form chemical bonds (29, 40). Electrons from H2 would have been extracted by a membrane-associated hydrogenase and transferred across the membrane to two major pathways: (i) CO2 reduction in methanogens and (ii) elemental sulfur reduction (38). The latter could have been mediated by a sulfur-reducing hydrogenase ancestor (41) or by an ancient sulfur-reducing domain (42). Interestingly, both of these functions are represented in the largest subnetwork (Fig. 1).
Fig. 3.
Core energy conversion pathways in the ancient microbiome captured by the largest protein domain subnetwork (Fig. 1, largest subnetwork). Ancient consortia of microbes probably exploited redox-coupled reactions with the available donor (H2) and acceptors (CO2 and elemental sulfur) to generate H+ disequilibrium (i.e., proton motive force), which was in turn harvested to form chemical bonds (ATP) [modified from Madigan (38)]. Electrons from H2 were harvested by a membrane-associated hydrogenase¹ (InterPro domain IDs: IPR018194, IPR004108, and IPR014406) and were transferred across the membrane (A) via iron-based domains² to a heterodisulfide reductase³ (InterPro domain ID IPR017680) that regenerated coenzymes M and B, mediating the last step in methanogenesis (CO2 reduction), or (B) to sulfur reductase⁴ (InterPro domain IDs: IPR018194, IPR006066, and IPR006067; Results and Discussion) (38). Finally, harvesting energy from H+ disequilibrium into chemical bonds is supposedly an ancient mechanism that is abundant throughout the tree of life. The resulting H2S could have been recycled to H2 by abiotic reaction with iron sulfide minerals (ΔG°′ = −42 kJ) (38).
Our network also contains domains that may have participated in an ancient energy conservation proton reduction pathway, acetyl-CoA synthesis, nitrogen fixation, and regulation of anoxygenic photosynthesis and electron transfer (Table 1). Together these could support sustainable core pathways that formed the foundation for the evolution of primordial light-driven anoxygenic photosynthesis, followed by oxygenic photosynthesis. The latter drove the GOE and the evolution of novel domains that rely on oxygen.
Table 1.
Ancient pathways represented in the largest profile-based redox domain network (Fig. 1)
| Pathway | Related domain names (IDs*) | Comments |
| --- | --- | --- |
| Energy conservation, basic proton reduction | Hydrogenase (18194, 04108, 14406) | Found in the deeply branching thermophilic archaeon Pyrococcus furiosus (37) |
| | Ferredoxin (06058, 09051, 17900) | |
| Acetyl-CoA (central metabolite) synthesis | Oxygen-sensitive pyruvate oxidoreductase (11898) | Could have gained its substrate (pyruvate) from abiotic synthesis in conditions that prevailed in hydrothermal vents (44) |
| Nitrogen fixation | Nitrogenase component 1 (00510) | Either a molybdenum–iron, vanadium–iron, or iron–iron protein |
| Regulation of photosynthesis and electron transfer | Ferredoxin thioredoxin reductase (04209) | Evolved in subsequent phases of evolution of photosynthesis and electron transfer chains |
| | Rieske (05805) | |
| | Cytochrome c (03088, 08168, 11031) | |
| | Ferredoxin (06058, 09051, 17900) | |
*Domain IDs show the InterPro ID without the preceding “IPR0” characters (e.g., 18506 instead of IPR018506; Table S1).
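For readers who want to look up the abbreviated IDs in Table 1, the conversion described in the footnote can be sketched as follows. The helper names are hypothetical; the only rule taken from the source is that the “IPR0” prefix is dropped.

```python
PREFIX = "IPR0"


def abbreviate_interpro_id(full_id):
    """Drop the leading 'IPR0' from a full InterPro ID, as done in Table 1."""
    if not full_id.startswith(PREFIX):
        raise ValueError(f"expected an ID starting with {PREFIX!r}: {full_id!r}")
    return full_id[len(PREFIX):]


def expand_interpro_id(short_id):
    """Restore the full InterPro ID from the abbreviated form used in Table 1."""
    return PREFIX + short_id


hydrogenase_short = abbreviate_interpro_id("IPR018194")
rieske_full = expand_interpro_id("05805")
```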
Methods
All protein sequences with their annotations were extracted from UniProt (December 2011) (43) and the UniProt Metagenomic and Environmental Sequences databases (downloaded November 2011). UniProt sequences with transition metal-using redox domains (25) were extracted from InterPro (44). Profiles were generated for each domain using scripts from the HHblits suite (16). All-vs.-all alignment of domain profiles was made with the HHblits HHalign program. Note that although it takes all publicly available genomic data into account, our approach and related results (i.e., isolated subnetworks) are inevitably limited by the availability of sequences and resolution power of alignments. We also generated all-vs.-all domain sequence alignments using PSI-BLAST. Metadata from organisms (i.e., oxygen and temperature requirements; Fig. 1B and Fig. S3) were extracted from the National Center for Biotechnology Information and from the Integrated Microbial Genomes project database of the Joint Genome Institute (June 2012). It should be noted that to minimize phylogenetic bias in our profile networks resulting from oversampling particular lineages, we analyzed most of the publicly available prokaryotic genomes (Fig. 1B and Fig. S3). Centrality (Fig. S4 B and C) and connection between Fe2S2- and heme-binding domains (Fig. 2A) were computed for the more stringent network (protein sequence identity >30%, hit length of >70% of the smallest homolog) constructed from domain sequences of the largest subnetwork. To search for possible sequence intermediates between Fe2S2- (adrenodoxin-family) and heme- (cytochrome c) binding domains, we constructed a set of BLAST alignments (Fig. 2) starting from Fe2S2 domain sequences in which the aligned region of the hit was used as a query sequence in the next step. When the chain of degenerative alignments retrieved a domain sequence that bound a ligand other than Fe2S2 it was terminated and considered as a “path” (Tables S3 and S4). 
For all methods, all parameter and procedural details are reported in SI Methods.
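The edge criterion for the more stringent network (sequence identity >30% and hit length >70% of the smallest homolog) can be expressed as a small predicate. The function and parameter names below are hypothetical, and how identity and hit length were extracted from the PSI-BLAST output is described in SI Methods rather than here.

```python
def passes_stringent_cutoff(identity_pct, hit_length, len_a, len_b,
                            min_identity=30.0, min_coverage=0.70):
    """True if an alignment meets both thresholds of the stringent network.

    identity_pct: percent sequence identity of the alignment.
    hit_length:   number of aligned residues.
    len_a, len_b: lengths of the two sequences; coverage is measured
                  against the shorter one (the "smallest homolog").
    """
    shortest = min(len_a, len_b)
    return identity_pct > min_identity and hit_length > min_coverage * shortest


# Toy alignments (hypothetical values).
kept = passes_stringent_cutoff(45.0, 80, 100, 120)       # 80 > 70 residues, 45% > 30%
too_short = passes_stringent_cutoff(45.0, 60, 100, 120)  # fails the coverage threshold
too_diverged = passes_stringent_cutoff(25.0, 95, 100, 100)  # fails the identity threshold
```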
Acknowledgments
We thank Vikas Nanda, Stefan Senn, John Kim, Shu Cheng, Chengesheng Zhu, Ben Jelen (all Rutgers University), and Eric Bapteste (Université Pierre et Marie Curie) for helpful discussions; Johannes Söding (Ludwig Maximilian University of Munich) for helpful advice and technical support in the profiles alignment analysis; J. Clark Lagarias (University of California, Davis) for discussions; and Huan Qiu, Udi Zelzion, and other D.B. laboratory members for helpful insights into the work. This research was funded by Gordon and Betty Moore Foundation Grant GBMF2807 (to P.G.F.).
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1403676111/-/DCSupplemental.
References
- 1. Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213.
- 2. Alva V, Remmert M, Biegert A, Lupas AN, Söding J. A galaxy of folds. Protein Sci. 2010;19(1):124–130. doi: 10.1002/pro.297.
- 3. Bukhari SA, Caetano-Anollés G. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLOS Comput Biol. 2013;9(3):e1003009. doi: 10.1371/journal.pcbi.1003009.
- 4. Furnham N, et al. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLOS Comput Biol. 2012;8(3):e1002403. doi: 10.1371/journal.pcbi.1002403.
- 5. Baymann F, et al. The redox protein construction kit: Pre-last universal common ancestor evolution of energy-conserving enzymes. Philos Trans R Soc Lond B Biol Sci. 2003;358(1429):267–274. doi: 10.1098/rstb.2002.1184.
- 6. Dupont CL, Butcher A, Valas RE, Bourne PE, Caetano-Anollés G. History of biological metal utilization inferred through phylogenomic analysis of protein structures. Proc Natl Acad Sci USA. 2010;107(23):10567–10572. doi: 10.1073/pnas.0912491107.
- 7. Kim JD, Senn S, Harel A, Jelen BI, Falkowski PG. Discovering the electronic circuit diagram of life: Structural relationships among transition metal binding sites in oxidoreductases. Philos Trans R Soc B. 2013;368:20120257. doi: 10.1098/rstb.2012.0257.
- 8.Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
- 9.Gribskov M, McLachlan AD, Eisenberg D. Profile analysis: Detection of distantly related proteins. Proc Natl Acad Sci USA. 1987;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
- 10.Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994;235(5):1501–1531. doi: 10.1006/jmbi.1994.1104.
- 11.Bapteste E, Bouchard F, Burian RM. Philosophy and evolution: Minding the gap between evolutionary patterns and tree-like patterns. Methods Mol Biol. 2012;856:81–110. doi: 10.1007/978-1-61779-585-5_4.
- 12.Bapteste E, et al. Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc Natl Acad Sci USA. 2012;109(45):18266–18272. doi: 10.1073/pnas.1206541109.
- 13.Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755–763. doi: 10.1093/bioinformatics/14.9.755.
- 14.Sadreyev RI, Baker D, Grishin NV. Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci. 2003;12(10):2262–2272. doi: 10.1110/ps.03197403.
- 15.Söding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21(7):951–960. doi: 10.1093/bioinformatics/bti125.
- 16.Remmert M, Biegert A, Hauser A, Söding J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012;9(2):173–175. doi: 10.1038/nmeth.1818.
- 17.Baross JA, Hoffman SE. Submarine hydrothermal vents and associated gradient environments as sites for the origin and evolution of life. Orig Life Evol Biosph. 1985;15(4):327–345.
- 18.Nisbet EG, Sleep NH. The habitat and nature of early life. Nature. 2001;409(6823):1083–1091. doi: 10.1038/35059210.
- 19.Wang M, et al. A universal molecular clock of protein folds and its power in tracing the early history of aerobic metabolism and planet oxygenation. Mol Biol Evol. 2011;28(1):567–582. doi: 10.1093/molbev/msq232.
- 20.Miller AF. Superoxide dismutases: Active sites that save, but a protein that kills. Curr Opin Chem Biol. 2004;8(2):162–168. doi: 10.1016/j.cbpa.2004.02.011.
- 21.Eck RV, Dayhoff MO. Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science. 1966;152(3720):363–366. doi: 10.1126/science.152.3720.363.
- 22.Kim JD, Rodriguez-Granillo A, Case DA, Nanda V, Falkowski PG. Energetic selection of topology in ferredoxins. PLOS Comput Biol. 2012;8(4):e1002463. doi: 10.1371/journal.pcbi.1002463.
- 23.Mulholland SE, Gibney BR, Rabanal F, Dutton PL. Determination of nonligand amino acids critical to [4Fe-4S]2+/+ assembly in ferredoxin maquettes. Biochemistry. 1999;38(32):10442–10448. doi: 10.1021/bi9908742.
- 24.Williams RJP. Natural selection of the chemical elements. Proc R Soc Lond B Biol Sci. 1981;213:361–397.
- 25.Harel A, Falkowski P, Bromberg Y. TrAnsFuSE refines the search for protein function: Oxidoreductases. Integr Biol (Camb). 2012;4(7):765–777. doi: 10.1039/c2ib00131d.
- 26.Yooseph S, et al. The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol. 2007;5(3):e16. doi: 10.1371/journal.pbio.0050016.
- 27.Huang YY, Kimura T. Reduction potential and thermodynamic parameters of adrenodoxin by the use of an anaerobic thin-layer electrode. Anal Biochem. 1983;133(2):385–393. doi: 10.1016/0003-2697(83)90099-4.
- 28.Lehninger AL, Nelson DL, Cox MM. Lehninger Principles of Biochemistry. 5th Ed. New York: Freeman; 2008. p. 511.
- 29.Williams RJP, Frausto da Silva JRR. The Natural Selection of the Chemical Elements: The Environment and Life’s Chemistry. New York: Clarendon; 1996. pp. 186, 306, 588.
- 30.Wächtershäuser G. Pyrite formation, the first energy source for life: A hypothesis. Syst Appl Microbiol. 1988;10:207–210.
- 31.Martin W, Russell MJ. On the origins of cells: A hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos Trans R Soc Lond B Biol Sci. 2003;358(1429):59–83, discussion 83–85. doi: 10.1098/rstb.2002.1183.
- 32.Theobald DL. A formal test of the theory of universal common ancestry. Nature. 2010;465(7295):219–222. doi: 10.1038/nature09014.
- 33.David LA, Alm EJ. Rapid evolutionary innovation during an Archaean genetic expansion. Nature. 2011;469(7328):93–96. doi: 10.1038/nature09649.
- 34.Anantharaman V, Koonin EV, Aravind L. Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J Mol Biol. 2001;307(5):1271–1292. doi: 10.1006/jmbi.2001.4508.
- 35.Ho YS, Burden LM, Hurley JH. Structure of the GAF domain, a ubiquitous signaling motif and a new class of cyclic GMP receptor. EMBO J. 2000;19(20):5288–5299. doi: 10.1093/emboj/19.20.5288.
- 36.Unden G, Nilkens S, Singenstreu M. Bacterial sensor kinases using Fe-S cluster binding PAS or GAF domains for O2 sensing. Dalton Trans. 2013;42(9):3082–3087. doi: 10.1039/c2dt32089d.
- 37.Müllner M, et al. A PAS domain with an oxygen labile [4Fe-4S](2+) cluster in the oxygen sensor kinase NreB of Staphylococcus carnosus. Biochemistry. 2008;47(52):13921–13932. doi: 10.1021/bi8014086.
- 38.Madigan MT. Brock Biology of Microorganisms. 13th Ed. San Francisco: Benjamin Cummings; 2012. pp. 354–359, 385–387, 393–394, 402, 451.
- 39.Kim JD, Yee N, Nanda V, Falkowski PG. Anoxic photochemical oxidation of siderite generates molecular hydrogen and iron oxides. Proc Natl Acad Sci USA. 2013;110(25):10073–10077. doi: 10.1073/pnas.1308958110.
- 40.Reysenbach AL, Shock E. Merging genomes with geochemistry in hydrothermal ecosystems. Science. 2002;296(5570):1077–1082. doi: 10.1126/science.1072483.
- 41.Ma K, Schicho RN, Kelly RM, Adams MW. Hydrogenase of the hyperthermophile Pyrococcus furiosus is an elemental sulfur reductase or sulfhydrogenase: Evidence for a sulfur-reducing hydrogenase ancestor. Proc Natl Acad Sci USA. 1993;90(11):5341–5344. doi: 10.1073/pnas.90.11.5341.
- 42.Pedroni P, et al. Characterization of the locus encoding the [Ni-Fe] sulfhydrogenase from the archaeon Pyrococcus furiosus: Evidence for a relationship to bacterial sulfite reductases. Microbiology. 1995;141(Pt 2):449–458. doi: 10.1099/13500872-141-2-449.
- 43.UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012;40(Database issue):D71–D75. doi: 10.1093/nar/gkr981.
- 44.Hunter S, et al. InterPro: The integrative protein signature database. Nucleic Acids Res. 2009;37(Database issue):D211–D215. doi: 10.1093/nar/gkn785.



