Abstract
Cyanobacteria forged two major evolutionary transitions with the invention of oxygenic photosynthesis and the bestowal of photosynthetic lifestyle upon eukaryotes through endosymbiosis. Information germane to understanding those transitions is imprinted in cyanobacterial genomes, but deciphering it is complicated by lateral gene transfer (LGT). Here, we report genome sequences for the morphologically most complex true-branching cyanobacteria, and for Scytonema hofmanni PCC 7110, which with 12,356 proteins is the most gene-rich prokaryote currently known. We investigated components of cyanobacterial evolution that have been vertically inherited, horizontally transferred, and donated to eukaryotes at plastid origin. The vertical component indicates a freshwater origin for water-splitting photosynthesis. Networks of the horizontal component reveal that 66% of cyanobacterial protein families have been affected by LGT. Plant nuclear genes acquired from cyanobacteria define a lower bound frequency of 611 multigene families that, in turn, specify diazotrophic cyanobacterial lineages as having a gene collection most similar to that possessed by the plastid ancestor.
Keywords: plastid evolution, endosymbiosis, phylogenomics, true-branching cyanobacteria, nitrogen fixation
Introduction
Cyanobacteria are crucial players in Earth and life history because they generated the oxygen that has been present in the Earth’s atmosphere for the last 2.4 billion years (Bekker et al. 2004) and because one uniquely fateful cyanobacterium became, via endosymbiosis, the ancestor of all plastids among photosynthetic eukaryotes (Gould et al. 2008). Though they continue to impact global geochemical cycles through N2 fixation (Moisander et al. 2010) and the sequestering of trace metals (Morel and Price 2003) as well as phosphorus (van Mooy et al. 2009), their main ecological significance is the oxygen-producing photosynthetic apparatus that fuels most contemporary food chains. Their main evolutionary significance is that they mediated two pivotal innovations in life’s history: water-splitting photosynthesis and the origin of primary plastids. Clues to both of those major evolutionary transitions should, in principle, be imprinted in cyanobacterial genomes. But reconstructing those events is not straightforward, because lateral gene transfer (LGT) redistributes genes among prokaryote genomes (Ochman et al. 2000), and among cyanobacterial genomes in particular (Raymond et al. 2002; Mulkidjanian et al. 2006; Dufresne et al. 2008; Shi and Falkowski 2008), over geological time.
By necessity, and perhaps more so than for any other prokaryotic group, LGT has always been hard-wired into the bigger picture of cyanobacterial evolution. To explain the origin of cyanobacterial water-splitting photosynthesis, both of the main competing theories require LGT to account for the distribution of photosystems across prokaryotic groups (Xiong and Bauer 2002; Hohmann-Marriot and Blankenship 2011). This is because the reaction centers of photosystems I and II clearly share common ancestry (Baymann et al. 2001; Hohmann-Marriot and Blankenship 2011), but that shared ancestry does not in itself specify how the two photosystems entered the genome of the cyanobacterial ancestor. One theory posits that the two photosystems evolved in independent lineages and became merged in the founder cyanobacterium via LGT (Baymann et al. 2001), while the alternative has it that the photosystems diverged within a photosynthetic (protocyanobacterial) ancestor and were subsequently exported via LGT to some anoxygenic photosynthetic lineages (Xiong and Bauer 2002; Mulkidjanian et al. 2006; Sharon et al. 2009). Compatible with a role for LGT in photosystem evolution is the finding that the genes for both photosystems I and II are mobile in marine phage metagenomes (Lindell et al. 2004; Sharon et al. 2009).
LGT also figures into the origin of plastids, because many genes were transferred from endosymbiont to host. Chloroplasts were once free-living cyanobacteria and contain approximately 2,000 proteins (Richly and Leister 2004), a number comparable with that of a cyanobacterium, yet the genomes of modern plastids contain only 5–10% as many genes as those of their free-living cousins. This suggests that hundreds or thousands of the plastid ancestor’s genes were either lost or relocated to the host nucleus during the course of plant evolution via endosymbiotic gene transfer (EGT) (Gould et al. 2008). Furthermore, the phylogenetic identity of the plastid ancestor remains debated because of LGT. Different phylogenetic trees trace the plastid ancestor near the base of cyanobacterial diversification (Criscuolo and Gribaldo 2011), near coccoid cyanobacteria within the Synechococcus–Prochlorococcus (SynPro) clade (Reyes-Prieto et al. 2010), near the nitrogen-fixing Cyanothece clade (Deschamps et al. 2008), or near filamentous, heterocyst-forming cyanobacterial lineages (Deusch et al. 2008). The simplest explanation for such findings, in an evolutionary context that incorporates LGT, is that the plastid ancestor donated one (chimeric) genome’s worth of genes to the host, and that LGT has been reassorting the homologs of these genes among free-living cyanobacterial and other prokaryote genomes ever since (Deusch et al. 2008). Because of LGT over time, the question of which “lineage” of cyanobacteria gave rise to the plastid loses meaning (Doolittle and Bapteste 2007), because the genomes and nature of the “lineages” have changed since the time of plastid origin over 1.2 billion years ago (Deusch et al. 2008; Gross et al. 2008). However, comparison of plant genes acquired from the plastid ancestor with cyanobacterial homologs can reveal which modern cyanobacteria harbor a collection of genes most similar to that of the plastid ancestor.
So far, sequences from the group designated as subsection V (Rippka et al. 1979) have been missing from genomic studies of cyanobacterial evolution. Subsection V cyanobacteria grow as filaments that differentiate heterocysts (specialized N2-fixing cells), they produce cyst-like resting cells (akinetes) as well as differentiated motile trichomes (hormogonia), and most exhibit true branching. The developmental and morphological variety of subsection V cyanobacteria places them among the most complex of prokaryotes, for which reason they were long thought to be the direct ancestors of all eukaryotes, though only in the days before the endosymbiotic origin of plastids had been postulated (Mereschkowsky 1905) and eventually gained compelling support (Doolittle 1980). To better understand the role of subsection V species in cyanobacterial evolution and their possible relationship to the plastid ancestor, we have sequenced five genomes sampling a broad spectrum of filamentous, true-branching architecture (fig. 1A and B) and diverse geographical locations, including rice fields in India (Fischerella muscicola PCC 73103 and Chlorogloeopsis fritschii PCC 6912) and hot springs in New Zealand (F. muscicola PCC 7414), Wyoming, USA (F. thermalis PCC 7521), and Spain (C. fritschii PCC 9212) (Rippka et al. 1979). In addition, Scytonema hofmanni PCC 7110, a Nostocales representative (subsection IV) isolated from a limestone cave (Crystal cave, Bermuda) (Rippka et al. 1979), whose filaments form false branches (fig. 1C) and exhibit aerial growth, was included for comparison.
Fig. 1.—
Genomes of Stigonematales and Scytonema. (A) Fischerella muscicola PCC 7414, forming true lateral branches. (B) Chlorogloeopsis fritschii PCC 6912, undergoing cell divisions in more than one plane but never producing lateral branches. Heterocysts and hormogonia, differentiated by members of both genera, are marked by red and cyan arrows, respectively. (C) Scytonema hofmanni PCC 7110 showing false branching filaments (black arrow) and heterocysts (red arrow). (D) Genomic features of the six newly sequenced genomes. Genomes have been deposited in NCBI under accessions PRJNA104961, PRJNA104963, PRJNA104969, PRJNA104967, PRJNA104965, and PRJNA157363. Fully annotated versions are available at www.molevol.de/resources. (E) Frequency distribution of protein coding genes in the new genomes, and comparison with other cyanobacterial genomes examined.
Materials and Methods
Cyanobacterial Cultures and DNA Isolation
Stock cultures were maintained at 37°C on slants (or plates) in BG11o medium (Rippka and Herdman 2002), supplemented with 5 mM NaHCO3 and solidified with 0.9% (w/v) washed agar (Sigma, A 8678). For DNA isolation, cultures were grown at 37°C in BG11 medium (Rippka and Herdman 2002), with orbital shaking (100 rpm) in an Infors Incubator, at a PPFD of 30 µmol quanta m−2s−1. Cultures were harvested after 3–6 weeks of incubation, depending on density of the inoculum and the growth rates of the strains. DNA isolation from strains of Chlorogloeopsis was performed as described (Franche and Damerval 1988), with the addition of 1% Sarkosyl during lysozyme treatment to remove polysaccharides and a final RNA digestion step. Polysaccharide-free high molecular weight genomic DNA (gDNA) from strains of Fischerella was obtained by following a protocol for polysaccharide-rich plants (Sharma et al. 2002).
Genome Sequencing and Annotation
Prior to genome sequencing, the identity of the gDNA was verified by sequencing of the 16S rDNA with primers 101F (ACTGGCGGACGGGTGAGTAA) and 1047R (GACGACAGCCATGCAGCACC), and comparison against cyanobacterial sequences available in NCBI. Genome sequencing was performed on the Genome Sequencer FLX using Titanium chemistry (Roche Applied Science, Penzberg, Germany), yielding 10- to 32-fold coverage. Genome scaffolding was achieved by 3 kbp paired-end standard runs. The sequencing libraries were prepared from 4 μg of gDNA for whole-genome shotgun sequencing and 5 μg of gDNA for paired-end sequencing, according to the supplier’s instructions. Additionally, a fosmid library was constructed with the Copy Control Fosmid Library Production Kit (Biozym Scientific, Hess. Oldendorf, Germany). Terminal DNA sequences of cloned genomic inserts were determined with an ABI 3730xl DNA Analyzer (Life Technologies, Darmstadt, Germany). Furthermore, Sanger reads were generated from fosmid clones to cover the gaps between contigs for each of the five genomes. Sequence data were assembled with the GS De Novo Assembler Software (ver. 2.0.01.14, 2.3, and 2.5.3). For each genome, large (>500 bp) and small (<500 bp) contigs were obtained, including numerous repetitive elements and insertion segments. For finishing purposes, all DNA sequences were uploaded into the Consed program (Gordon et al. 1998). The final annotation of the genome sequences, including COGs (Tatusov et al. 2001), was accomplished with the GenDB software (Meyer et al. 2003). Gene prediction was performed by combining the results of the software tools GLIMMER (Delcher et al. 1999), CRITICA (Badger and Olson 1999), and GISMO (Krause et al. 2007).
Phylogenetic Analysis of Cyanobacterial Genomes
Fully sequenced cyanobacterial proteomes were downloaded from NCBI version March/2011. For the reconstruction of cyanobacterial gene families, we conducted an all-against-all BLAST search (Ver. 2.2.17) (Altschul et al. 1997) using the protein sequences. Reciprocal best BLAST hits (rBBH) were identified using thresholds of E value ≤ 10−10 and amino acid identity ≥30%. For the clustering analysis, the overall protein sequence similarity between rBBH proteins, calculated as the percent of identical amino acids, was multiplied by the length ratio of the two proteins. Clusters of gene families were inferred from the rBBH similarity matrix using the MCL ver. 1.008 clustering procedure (Enright et al. 2002), with the inflation parameter (I) set to 2.0. For the reconstruction of a consensus tree phylogeny, 324 gene families present as single copies in all cyanobacterial genomes analyzed were aligned with MAFFT (Katoh et al. 2002) ver. 6.717b. Phylogenetic trees were reconstructed using the Neighbor-Joining (NJ) approach (Saitou and Nei 1987). Protein sequence distances were calculated with PROTDIST (Felsenstein 1993), applying the JTT substitution model (Jones et al. 1992). Phylogenetic trees were reconstructed with NEIGHBOR (Felsenstein 1993). The consensus phylogeny was reconstructed with CONSENSE (Felsenstein 1993). A concatenated alignment was constructed from the aligned protein sequences, and all genes were weighted equally (supplementary fig. S1, Supplementary Material online). A phylogenetic tree was reconstructed from the concatenated alignment using the NJ approach and the software described earlier. A phylogenetic network was reconstructed with SplitsTree Ver. 4.10 using the default parameters (Huson and Bryant 2006). A minimal lateral network (MLN) was reconstructed using the consensus phylogeny as the reference tree and the gene families described earlier, according to the approach described in Dagan et al. (2008).
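The rBBH filtering and similarity weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the hit dictionaries, the function names, and the reading of "length ratio" as shorter length divided by longer length are all assumptions made for the example.

```python
# Minimal sketch of reciprocal best BLAST hit (rBBH) selection and edge weighting.
# Assumption: hits are given as {(query, subject): (e_value, percent_identity)}.

def best_hits(hits):
    """For each query, keep the subject with the lowest E value."""
    best = {}
    for (query, subject), (e_value, identity) in hits.items():
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value, identity)
    return best

def reciprocal_best_hits(a_vs_b, b_vs_a, max_e=1e-10, min_identity=30.0):
    """Return sorted (a, b, identity) pairs that are each other's best hit
    and pass the E value and identity thresholds quoted in the text."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, e_value, identity) in best_ab.items():
        back = best_ba.get(b)
        if back is None or back[0] != a:
            continue  # not reciprocal
        if e_value <= max_e and back[1] <= max_e and identity >= min_identity:
            pairs.append((a, b, identity))
    return sorted(pairs)

def edge_weight(identity, len_a, len_b):
    """Percent identity scaled by the length ratio.
    Assumption: ratio = shorter length / longer length."""
    return identity * min(len_a, len_b) / max(len_a, len_b)

# Hypothetical toy data, not taken from the study.
a_vs_b = {("a1", "b1"): (1e-50, 62.0), ("a1", "b2"): (1e-20, 40.0),
          ("a2", "b2"): (1e-5, 35.0)}
b_vs_a = {("b1", "a1"): (1e-48, 62.0), ("b2", "a1"): (1e-20, 40.0)}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input only the pair a1/b1 survives, because b2's best hit is a1 rather than a2. The resulting weights would then form the similarity matrix passed to a clustering tool such as MCL.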
Maximum likelihood phylogeny was reconstructed using PhyML (Guindon et al. 2010) with LG model + I (estimation of invariant sites) + G (gamma distribution with 4 rate categories). Tree topology (SPR), branch length, and rate parameters were optimized.
Phylogenetic Analysis of the Plastid Ancestor
Sequences of nuclear-encoded proteins from the whole genomes of Arabidopsis thaliana, Oryza sativa subsp. japonica, Physcomitrella patens, Chlamydomonas reinhardtii, Entamoeba histolytica, Dictyostelium discoideum, Filobasidiella neoformans, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, Ciona intestinalis, Danio rerio, Gallus gallus, Canis lupus familiaris, and Homo sapiens were obtained from RefSeq database release November 2009 (Pruitt et al. 2007). Nuclear proteomes of Cyanidioschyzon merolae version February 2005 (Matsuzaki et al. 2004), Ostreococcus tauri version 2.0 (Palenik et al. 2007), and Xenopus tropicalis release 4.1, August 2005 (Bowes et al. 2008), were downloaded from the respective genome project websites. Additionally, 650 fully sequenced genomes of prokaryotes, including those of 46 cyanobacterial representatives, were downloaded from NCBI RefSeq database release November 2009 (Pruitt et al. 2007). To avoid clustering artifacts of distantly related eukaryotic and prokaryotic sequences, the sequences of cyanobacteria and photosynthetic eukaryotes were first clustered into separate sets of protein families. Matrices of algal/plant and cyanobacterial sequences were constructed from reciprocal best BLAST hits using an all-against-all BLAST, and thresholds of E-value ≤ 10−10 and amino acid sequence identities ≥25%. Clusters of homologous protein sequences were reconstructed from each of the matrices using MCL (Enright et al. 2002) Ver. 08-312, 1.008, with scheme = 7 and I = 2.0. Protein sequences of noncyanobacterial prokaryotes and nonphotosynthetic eukaryotes were added to the plant/algal clusters of proteins, depending on their sequence homologies using the above threshold, and a limit of three sequences per phylum. Overlapping plant/algal and cyanobacterial clusters were joined. The sequences of protein families were aligned using MAFFT (Katoh et al. 2002) Ver. 6.717b (2009/12/03). 
Multiple sequence alignment quality was assessed using the HoT method (Landan and Graur 2007). Plant/algal protein sequences with a Sum of Pairs Score <80% were excluded from the cluster. Phylogenetic trees were reconstructed using the maximum likelihood approach with PhyML (Guindon et al. 2010) and the best-fit model as inferred with ProtTest (Abascal et al. 2005). The search for a best-fit model using ProtTest was restricted to nuclear gene substitution models, including the JTT (Jones et al. 1992) and WAG (Whelan and Goldman 2001) matrices. These were tested with all combinations of +I (estimation of invariant sites), +G (gamma distribution with 4 rate categories), and +F (using amino acid frequencies from the alignment) parameters. Branch lengths, model, and topology were optimized. Among 35,862 trees in total, the WAG model was found to be the best fit in 89% of the trees, with WAG + I + G as the most prevalent choice (34%). Genes of endosymbiotic origin in algal and plant genomes were inferred from the phylogenetic trees by searching for sisterhood between cyanobacterial protein sequences and their counterparts encoded by the nuclear genes of the photosynthetic eukaryotes (Martin et al. 2002). Protein families in the latter phototrophs were counted as having resulted from EGT(s) if at least one of their members had a cyanobacterial sequence as the nearest neighbor. Concatenated alignments were analyzed and used for tree construction by the same methods as described earlier.
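The sisterhood criterion can be illustrated with a small Python sketch. It assumes a rooted tree written as nested tuples and leaf labels of the hypothetical form "group|name"; it flags a tree when some node separates a plant-only clade from cyanobacteria-only sister clades. This is one simplified reading of the nearest-neighbor criterion, not the procedure of Martin et al. (2002).

```python
# Minimal sketch: detect plant/cyanobacteria sisterhood in a rooted tree.
# Assumptions: trees are nested tuples, leaves are strings "group|name",
# and groups are "plant", "cyano", or "other" (all labels hypothetical).

def leaves(node):
    """Return all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found

def groups_below(node):
    """Return the set of group prefixes found below a node."""
    return {leaf.split("|")[0] for leaf in leaves(node)}

def has_plant_cyano_sisterhood(node):
    """True if any internal node has only plant-only and cyano-only
    children, with at least one child of each kind."""
    if isinstance(node, str):
        return False
    child_groups = [groups_below(child) for child in node]
    pure = all(g in ({"plant"}, {"cyano"}) for g in child_groups)
    if pure and {"plant"} in child_groups and {"cyano"} in child_groups:
        return True
    return any(has_plant_cyano_sisterhood(child) for child in node)

# Hypothetical toy trees, not taken from the study.
egt_like = ((("plant|At", "cyano|Nostoc"), "cyano|Syn"),
            ("other|Ecoli", "other|Bsub"))
not_egt = (("plant|At", "other|Ecoli"), ("cyano|Nostoc", "cyano|Syn"))
plant_clade = (("plant|At", "plant|Os"), "cyano|Nostoc")
```

A family would then be counted as a putative EGT if at least one of its trees satisfies the test. Real analyses work on unrooted trees with branch support, which this sketch ignores.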
Results
Genomes of Subsection V (Stigonematales) and Scytonema
The genome size distribution of the five Stigonematales strains (5.9 ± 2 Mb; fig. 1D) is similar to that of subsection IV members (Nostocales) (Larsson et al. 2011). With only 5,340 CDSs, F. thermalis PCC 7521 has the smallest genome among the subsection V members, whereas the genome of S. hofmanni PCC 7110 (subsection IV) has 12,356 predicted ORFs, making it the most gene-rich prokaryote sequenced to date (fig. 1D). Clustering of all 223,941 CDSs encoded in 51 cyanobacterial genomes by protein sequence similarity resulted in 18,185 cyanobacterial protein families and 47,174 singletons. Protein families with metabolic or cellular functions have significantly more duplicates in strains of subsection V than in those of subsection IV (P < 2.2 × 10–16, paired t test). Subsection V and IV strains do not differ in gene copy number for information processing protein families (P = 0.11, paired t test). The genome of strain PCC 7521 contains fewer duplicates (P < 2.2 × 10–16, paired t test) than the other two representatives of Fischerella. The frequency of genes shared with other filamentous cyanobacteria and the distribution of gene function are similar (fig. 1E and supplementary fig. S2, Supplementary Material online) among the three phenotypically similar Fischerella strains (Rippka et al. 1979).
Patterns of gene presence and absence might identify genes related to cyanobacterial morphological diversity (Stucken et al. 2010; Larsson et al. 2011). A subset of 22 protein families is unique and common to all filamentous cyanobacteria in our sample (supplementary table S1, Supplementary Material online), only a few of which have a known function. Subsection V members share 7 ± 1% of their proteome with those of subsection IV, and 73 protein families are specific to heterocyst-forming strains (supplementary table S2, Supplementary Material online). Most of the remaining subsection IV- and V-specific genes fall into cell wall, membrane, and envelope biogenesis COGs, such as glycosyltransferases, exopolysaccharide synthesis, and secretion. Some of the subsection V-specific protein families might be involved in the multiseriate filament phenotype and formation of true branches. On average, only 2% of the proteins encoded in subsection V genomes are specific to true-branching forms. Only 46 gene families are uniquely shared among subsection V genomes (supplementary table S3, Supplementary Material online). Although their functions are as yet unknown, their classifications point mostly to cell wall, membrane, envelope biogenesis, and signal transduction functions. The relative paucity of proteins comprising the core set of the true-branching cyanobacteria suggests that this phenotype hinges upon very few expressed proteins, which may mainly affect regulation of cell division genes and/or localization of their products.
Vertical and Lateral Components of Cyanobacterial Genome Evolution
To reconstruct a cyanobacterial backbone phylogeny, we identified all 324 single-copy protein families common to all 51 cyanobacteria in our sample and reconstructed their phylogenetic trees. The consensus tree (fig. 2), rooted with Gloeobacter violaceus, indicates a single origin for the filamentous architecture, and the concatenated alignment (564,408 sites) yielded an identical topology with NJ (supplementary fig. S3, Supplementary Material online), where all branches are supported by 100% bootstrap replicates. Maximum likelihood reconstruction yielded a phylogeny in which filamentous cyanobacteria are polyphyletic (supplementary fig. S3, Supplementary Material online), the difference to NJ being the position of Microcoleus chthonoplastes PCC 7420, a filamentous strain isolated from salt marshes (Rippka et al. 1979). Current whole-genome cyanobacterial phylogenies group Microcoleus with subsection I (Criscuolo and Gribaldo 2011), yielding paraphyly for filamentous forms. Although 55 of our 324 single copy gene trees support that position for Microcoleus, 111 recover filamentous monophyly, discrepancies that might reflect the workings of LGT (Raymond et al. 2002; Mulkidjanian et al. 2006; Shi and Falkowski 2008; Dufresne et al. 2008). To test the consistency of the backbone (consensus) phylogeny, we reconstructed a phylogenetic network using SplitsTree (Huson and Bryant 2006). The resulting network reveals a paucity of conflicting splits in the data (supplementary fig. S4, Supplementary Material online). A total of 92 out of 212 splits are compatible with the NJ tree topology and their sum of split weight amounts to 96% of the total network; and thus, the NJ tree explains most of the split variability in the data.
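The comparison of network splits with the tree topology rests on the standard notion of split compatibility: two splits of the same taxon set are compatible when at least one of the four pairwise intersections of their sides is empty. The sketch below shows how the number of tree-compatible splits and their share of the total split weight could be computed. The function names and the toy data are assumptions for illustration; the study itself used SplitsTree.

```python
# Minimal sketch: fraction of split weight in a network that is compatible
# with a reference tree. A split is represented by one of its two sides.

def compatible(side_a, side_b, taxa):
    """Two splits are compatible if one of the four intersections
    A&B, A&B', A'&B, A'&B' is empty."""
    a, b = frozenset(side_a), frozenset(side_b)
    a_rest, b_rest = taxa - a, taxa - b
    return any(not (x & y) for x in (a, a_rest) for y in (b, b_rest))

def compatible_weight_share(network_splits, tree_splits, taxa):
    """network_splits: list of (side, weight); tree_splits: list of sides.
    Returns (number of compatible splits, their share of total weight)."""
    taxa = frozenset(taxa)
    total = sum(weight for _, weight in network_splits)
    kept = [(side, weight) for side, weight in network_splits
            if all(compatible(side, t, taxa) for t in tree_splits)]
    return len(kept), sum(weight for _, weight in kept) / total

# Hypothetical toy data, not taken from the study.
taxa = {"A", "B", "C", "D", "E"}
tree_splits = [{"A", "B"}, {"D", "E"}]
network_splits = [({"A", "B"}, 5.0), ({"D", "E"}, 4.0), ({"B", "C"}, 1.0)]
n_compatible, share = compatible_weight_share(network_splits, tree_splits, taxa)
```

In the toy example the split {B, C} conflicts with the tree split {A, B}, so two of the three splits are compatible and they carry 90% of the total weight.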
Fig. 2.—
Vertical and lateral gene evolution in cyanobacterial genomes. NJ consensus (or backbone) tree, inferred from 324 single-copy protein families common to all 51 cyanobacteria in our sample, and rooted with Gloeobacter violaceus PCC 7421. Branches indicating vertical gene evolution are indicated in black. The MLN is indicated by edges that do not map onto the vertical component, with number of genes per edge indicated by a color gradient from cyan (1 gene) to orange (736 genes). The phylogenetic position of the eukaryotic clade reconstructed using 23 core genes is marked by “a.” The SynPro clade is marked by an arrow.
To estimate the degree and distribution of LGT in cyanobacterial evolution, we reconstructed an MLN, which infers LGT frequencies by allowing increasing amounts of LGT per protein family across a given backbone phylogeny (here the consensus tree), and identifying for all gene families the LGT frequency at which the distributions of modern genome sizes and inferred ancestral genome sizes agree best (Dagan et al. 2008). The MLN analysis conservatively assumes that all gene trees for all protein families are compatible (Dagan et al. 2008) and entails no gene tree comparisons. It revealed that 6,068 (34%) of the cyanobacterial protein families require no LGT to account for their gene distributions, whereas 12,116 (66%) protein families have undergone at least one LGT event. Because the method does not tally conflicting gene trees for homologous sequences, these are conservative lower bound estimates, in contrast to other recent studies (Raymond et al. 2002; Mulkidjanian et al. 2006; Shi and Falkowski 2008; Dufresne et al. 2008). Our estimate agrees with an earlier quantification of LGT frequency among cyanobacteria using an embedded quartets approach (Zhaxybayeva et al. 2006).
The MLN is presented in figure 2, and shows vertical components of cyanobacterial evolution and a network of 1,183 edges indicating laterally shared genes. Within the network, 358 edges (32%) represent a single laterally shared gene, whereas most edges (55%) carry ≤3 genes. Only 91 (7%) of edges carry >20 genes. Thus, bulk transfers of tens of genes or more are rare. The clade of marine Prochlorococcus and Synechococcus (SynPro) strains, which are recognized as being closely related environmental specialists of reduced genome size (Rocap et al. 2003; Dufresne et al. 2008), appears to have the lowest LGT frequency. The intertwined phylogenies within this clade (Zhaxybayeva et al. 2009) go undetected because the MLN is reconstructed from gene presence/absence data that are uninformative for the reconstruction of recombination events at the intraspecies level (Dagan et al. 2008). The most highly connected nodes implicate the four contemporary strains Acaryochloris marina MBIC 11017, Cyanothece PCC 7425, M. chthonoplastes PCC 7420, and S. hofmanni PCC 7110 (fig. 2). Two of these strains, A. marina, an atypical marine unicellular cyanobacterium producing chlorophyll d as the primary photosynthetic pigment (Swingley et al. 2008), and M. chthonoplastes, a marine mat former, have the largest genomes (8.36 and 8.65 Mb, respectively) known for members of subsections I and III, and show an expansion of protein families (Larsson et al. 2011). The MLN pinpoints large genomes as harboring gene pools that are frequently transferred among cyanobacteria, and identifies subsection V strains as being more highly connected with strains of subsections IV and III (1.4 edges/node) than with unicellular strains (0.3 edges/node), also when strains of the SynPro clade are excluded (0.7 edges/node). This may suggest the existence of an LGT barrier between unicellular (mostly marine) and filamentous (mostly terrestrial) cyanobacteria.
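Summary statistics of this kind follow directly from the list of gene counts per lateral edge. A minimal sketch, assuming the network is available as a plain list of per-edge gene counts (the list below is hypothetical toy data, not the 1,183 edges of the MLN):

```python
# Minimal sketch: summarize how many genes each lateral edge carries.
# Assumption: `gene_counts` holds one positive integer per lateral edge.

def summarize_edges(gene_counts):
    """Return the fraction of edges in each weight class."""
    n = len(gene_counts)
    return {
        "single_gene": sum(1 for g in gene_counts if g == 1) / n,
        "at_most_3": sum(1 for g in gene_counts if g <= 3) / n,
        "more_than_20": sum(1 for g in gene_counts if g > 20) / n,
    }

# Hypothetical toy data, not taken from the study.
toy_counts = [1, 1, 2, 3, 5, 25, 40, 1, 7, 2]
summary = summarize_edges(toy_counts)
```

The three classes mirror the thresholds quoted in the text (one gene, at most three genes, more than 20 genes); note that the first two classes overlap by construction.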
The Nature of the Plastid Ancestor
To identify plant nuclear genes of cyanobacterial origin, we reconstructed 35,862 phylogenetic trees containing both eukaryotic and prokaryotic homologs and looked for trees in which plants and cyanobacteria branch together. In the present sample, considering all trees, between 8.7% and 11.5% of all nuclear genes in the photosynthetic eukaryotes sampled branch with cyanobacterial homologs (table 1). For the most reliable alignments (CS ≥ 80%), where false negatives are less likely, the proportion of genes acquired from plastids ranges from about 13% of the genes in the Arabidopsis genome to about 20% of the genes in the smaller genomes of Ostreococcus and Cyanidioschyzon (table 1; fig. 3), with energy metabolism and carbohydrate metabolism (99 genes) being the most frequent functional categories (supplementary fig. S5, Supplementary Material online). Clearly, the quantitative contribution of cyanobacteria to plant genomes was great, and the backbone of plant metabolism was acquired from them: plants are, biochemically, cyanobacteria wrapped in a bigger box.
Table 1.
Proportion of Plant Genes of Endosymbiotic Origin
| Species | No. Proteins | Total Tree Set: No. Trees | Total Tree Set: No. Putative EGT | Total Tree Set: EGT Bootstrap Support | CS ≥ 80%: No. Trees | CS ≥ 80%: No. Putative EGT | ≤ 3 Homologues: No. Protein Families | ≤ 3 Homologues: No. Putative EGT |
|---|---|---|---|---|---|---|---|---|
| Arabidopsis thaliana | 30,897 | 9,025 | 801 (8.9%) | 87.89 ± 20.10 | 3,306 | 424 (12.8%) | 2,091 | 136 (6.5%) |
| Oryza sativa | 26,712 | 7,292 | 637 (8.7%) | 84.82 ± 21.41 | 2,596 | 347 (13.4%) | 1,623 | 95 (5.9%) |
| Physcomitrella patens | 35,468 | 8,847 | 903 (10.2%) | 84.74 ± 22.11 | 3,425 | 542 (15.8%) | 1,402 | 78 (5.6%) |
| Ostreococcus tauri | 7,715 | 3,495 | 403 (11.5%) | 84.64 ± 21.20 | 1,232 | 247 (20.1%) | 324 | 26 (8.0%) |
| Cyanidioschyzon merolae | 4,761 | 2,688 | 307 (11.4%) | 83.92 ± 20.05 | 844 | 167 (19.8%) | 223 | 15 (6.7%) |
| Chlamydomonas reinhardtii | 14,262 | 4,515 | 478 (10.6%) | 83.81 ± 21.04 | 1,646 | 283 (17.2%) | 599 | 41 (6.8%) |
| Total | 119,815 | 35,862 | 3,529 (9.8%) | 84.97 ± 20.98 | 13,049 | 2,010 (15.4%) | 6,262 | 391 (6.2%) |
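The percentages in table 1 are simple ratios of putative EGT counts to tree counts, and the totals are column sums. The short Python sketch below recomputes them for the total tree set from the counts given in the table; the variable and function names are chosen here for illustration only.

```python
# Minimal sketch: recompute the "total tree set" percentages of table 1.
# Counts are copied from the table: species -> (no. trees, no. putative EGT).
TOTAL_TREE_SET = {
    "Arabidopsis thaliana": (9025, 801),
    "Oryza sativa": (7292, 637),
    "Physcomitrella patens": (8847, 903),
    "Ostreococcus tauri": (3495, 403),
    "Cyanidioschyzon merolae": (2688, 307),
    "Chlamydomonas reinhardtii": (4515, 478),
}

def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100.0 * part / whole, 1)

def summarize(rows):
    """Return per-species percentages plus the pooled totals."""
    per_species = {name: percent(egt, trees)
                   for name, (trees, egt) in rows.items()}
    total_trees = sum(trees for trees, _ in rows.values())
    total_egt = sum(egt for _, egt in rows.values())
    return per_species, total_trees, total_egt, percent(total_egt, total_trees)

per_species, total_trees, total_egt, total_pct = summarize(TOTAL_TREE_SET)
```

The lowest and highest per-species values obtained this way correspond to the 8.7% to 11.5% range quoted in the text, and the pooled counts reproduce the Total row.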
Fig. 3.—

Phylogenetic characteristics of EGT inference. The frequency of EGT as inferred from alignments of varying degrees of reliability. The distribution of alignment reliability as estimated by column score (CS) is presented in bars, colored according to the respective eukaryotes. The CS measure is calculated as the proportion of alignment sites whose reconstruction is independent of the direction in which the sequences are fed to the alignment algorithm (Landan and Graur 2007). The frequency of genes inferred as EGT is plotted above as strongly colored lines, one per eukaryote, with the proportions inferred from trees reconstructed by maximum likelihood and NJ approaches in solid and dashed lines, respectively.
To trace the nature of the plastid ancestor, we first assembled a dataset of 23 nuclear genes of plastid origin present in all plant and cyanobacterial genomes sampled. The tree of concatenated alignments, rooted by G. violaceus PCC 7421, shows a deep branch placing plastids basal among cyanobacteria (designated with an “a” in fig. 2). Expanding the data set to include 200 universal cyanobacterial gene families with a single, composite plant OTU (genes acquired from cyanobacteria and present in at least one plant) yielded the same long, deep branch. Long basal branches are characteristic of long-branch attraction (LBA), a well-known phylogenetic artifact. Compositional heterogeneity such as AT bias and heterotachy can cause LBA (Lockhart et al. 2006), and a basal position due to LBA often involves the grouping of strains in which strain-specific character states are abundant (Stiller and Hall 1999). The sequences of the 23 universally distributed proteins in the six photosynthetic eukaryotes were found to contain significantly more unique substitutions than their cyanobacterial homologues (P = 7 × 10–66, one-tailed Kolmogorov–Smirnov test, supplementary fig. S6, Supplementary Material online), and an examination of the larger set of 200 phylogenetic trees reconstructed for genes of endosymbiotic origin shows that the eukaryotic clade branch length is on average 10-fold larger than that of the cyanobacterial branches. The basal position of plastids among cyanobacteria in the concatenated alignment tree (fig. 2 and supplementary fig. S6, Supplementary Material online) is attributable to LBA. Worse, given that LGT is frequent among cyanobacteria (Raymond et al. 2002; Mulkidjanian et al. 2006; Shi and Falkowski 2008; Dufresne et al. 2008), there is no reason to expect that any “core” gene phylogeny will be a faithful proxy for the rest of the genome (Doolittle and Bapteste 2007).
Therefore, we turned our attention to the larger set of nuclear genes of cyanobacterial origin whose homologs are not universally distributed among cyanobacteria. For 611 plant nuclear gene families identified as plastid acquisitions, we scored gene presence and absence, and protein sequence identity among cyanobacterial genomes (fig. 4). The SynPro clade lacks a substantial portion of these plastid ancestor gene families. A total of 245 (40%) protein families possessed by plants are absent in all Prochlorococcus strains, and 137 (22%) are absent in all Synechococcus strains (fig. 4). The similarity map also shows that overall protein sequence similarity of plant nuclear genes is highest to homologs in members of subsections IV and V. For 225 (37%) protein families, the average amino acid identity between the cyanobacterial genes and their plant homologs is significantly higher for subsection V genomes (α = 0.05, Kolmogorov–Smirnov test and FDR) than for subsection I genomes. When subsection IV and V genomes are combined and compared with those of subsection I, the value increases to 270 (44%) (α = 0.05, Kolmogorov–Smirnov test and FDR). Thus, subsection IV and V genomes harbor more homologs of genes that plants acquired from cyanobacteria, and those homologs have higher sequence similarity to their plant counterparts than do those in genomes of subsection I. Similar amino acid usage in different organisms may sometimes lead to an overestimation of species relatedness (Rodríguez-Ezpeleta and Embley 2012). Here, we tested for such possible bias using a principal component analysis (PCA) of the amino acid frequencies encoded by the 611 genes of endosymbiotic origin. The transformation of amino acid usage into two principal components explains in total 89% of the variability observed (supplementary fig. S7, Supplementary Material online).
Furthermore, the PCA reveals that the eukaryotic species do not group with the filamentous cyanobacteria; hence, the protein sequence similarity observed between those two groups is not a result of biased amino acid usage. Consequently, we can conclude that in the present sample, the collection of genes possessed by the ancestor of plastids was most similar to that in filamentous, heterocyst-forming cyanobacteria (fig. 2).
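Testing hundreds of protein families at α = 0.05 requires a correction for multiple comparisons, which the text refers to as FDR. The sketch below shows one common way to do this, the Benjamini–Hochberg step-up procedure, applied to a list of per-family P values. The choice of Benjamini–Hochberg is an assumption, since the text does not name the specific FDR method, and the P values shown are hypothetical rather than results of the Kolmogorov–Smirnov tests in the study.

```python
# Minimal sketch: Benjamini-Hochberg FDR control over per-family P values.
# Assumption: the study's "FDR" step is approximated by this procedure.

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected

# Hypothetical P values for five protein families.
toy_p = [0.001, 0.045, 0.025, 0.20, 0.012]
significant = benjamini_hochberg(toy_p)
n_significant = sum(significant)
```

Counting the rejected hypotheses gives the number of families with a significant difference, analogous to the 225 and 270 families reported above.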
Fig. 4.—
Presence/absence and sequence similarity patterns of cyanobacterial protein families by comparison with their homologs of endosymbiotic origin in six photosynthetic eukaryotes. Amino acid sequence similarity between the cyanobacterial proteins (x axis) and their counterparts in the eukaryotic plastid-derived set of protein families (y axis), as deduced for the genomes in the data set. Cell shades in the matrix correspond to the similarity ranking for each protein family (i.e., line) according to a color gradient from red (high similarity) to blue (low similarity). White cells correspond to genes lacking in the respective genomes. Protein families are ordered according to their distribution pattern into (A) nearly universal, (B) sparse representation or (C) highly frequent in the oceanic species, and (D) generally sparse representation. Cyanobacterial strains are ordered according to the MLN in fig. 2.
Discussion
Possible Initial Benefits of Plastids
Today plastids supply fixed carbon to plant cells, but they also have a myriad of other functions in amino acid, lipid, and cofactor biosynthesis as well as nitrogen metabolism. What was the biochemical or physiological context of the symbiosis that gave rise to plastids—what initially associated the founder endosymbiont to its host in the first place? Traditional reasoning on the selective advantage that was crucial to the establishment of the plastid has it that the production of carbohydrates by the cyanobacterial endosymbiont was the key, a view that was clearly expressed by Mereschkowsky (1905, p. 605) in his initial formulation of endosymbiotic theory: “Plant cells receive with no effort whatsoever large amounts of preformed organic substrates (carbohydrates), which their chromatophores willingly supply.”
An alternative suggestion is that the initial advantage of plastids may have simply been their uniquely useful metabolic end product, O2, as a boost to respiration in early mitochondria (Martin and Müller 1998). The chemical benefit of O2 could, of course, have only been of value if the initial endosymbiosis had taken place at a time in Earth’s history, or in an environment, where O2 was not freely available in sufficient amounts. Fossil evidence supports the notion that the primary plastid endosymbiosis occurred at least 1.2 billion years ago (Butterfield 2000) and molecular estimates suggest that plastids might have arisen by approximately 1.5 billion years ago (Parfrey et al. 2011). Geochemists have found over the last decade that an approximately 2 billion year span of protracted ocean anoxia ended only about 580 Ma (Anbar and Knoll 2002; Johnston et al. 2009; Lyons et al. 2009; Lyons and Reinhard 2009; Sahoo et al. 2012). The six major eukaryotic assemblages or “supergroups” currently recognized, including plants, arose and diversified during that time (Parfrey et al. 2011), that is, while the oceans were still anoxic (Müller et al. 2012). Such geological context (ocean anoxia during most of the Proterozoic) would be compatible with a possible role for O2 as an initial benefit in plastid evolution. Indeed, for Stanier (1970), the production of O2 was a reason to suggest that plastids arose before mitochondria did. Of course, Proterozoic ocean anoxia was likely less pronounced in the photic zone than below it (Johnston et al. 2009). A freshwater origin of plastids is also a possibility to consider; the present data, which link plastids phylogenomically more closely with freshwater cyanobacteria than with marine forms (fig. 2), would be compatible with that view.
Another suggestion is that the key to establishment of the plastid was the origin of carbon translocators in the plastid inner membrane and that the incorporation of a metabolite antiporter like the triose phosphate translocator in the ancestral plastid membrane was the essential step for establishing the primary endosymbiosis by allowing the plant ancestor to profit from cyanobacterial carbon fixation (Weber et al. 2006). In the same vein, it was furthermore argued that the key to establishment of the plastid entailed the insertion of additional host-controlled metabolite exchange proteins into plastid membranes fulfilling a similar export role (Gross and Bhattacharya 2009). A problem with theories that focus on carbon exporters as the key innovation at plastid origin is that cyanobacteria are well known to produce copious amounts of exopolysaccharides (De Philippis and Vincenzini 1998), such that there would be no need to evolve or insert transporters for provision of carbohydrates to be realized by the host.
The theory for the initial benefit of plastids that is currently best founded in direct observation, we would argue, is that nitrogen fixation was a key to the establishment of the symbiosis (Kneip et al. 2007). This view is supported by the circumstance that in modern symbioses involving cyanobacteria, nitrogen (not reduced carbon) is usually the key nutrient underlying the success of the partnership (Rai et al. 2000; Raven 2002). Accordingly, the cyanobacterial endosymbionts are nitrogen-fixing forms and combined nitrogen (ammonium) is the nutrient provided by the cyanobacterium. This is true for diatoms with N2-fixing cyanobacterial endosymbionts (Prechtl et al. 2004; Kneip et al. 2008), prymnesiophytes with associated N2-fixing cyanobacteria that might be ectosymbionts (Thompson et al. 2012), cyanobionts in lichens (Rikkinen et al. 2002), coralloid roots of cycads (Costa et al. 2004), the angiosperm Gunnera (Chiu et al. 2005), and the water-fern Azolla (Ran et al. 2010). In the case of Azolla and Rhopalodia, the N2-fixing cyanobacteria live as intracellular endosymbionts (Kneip et al. 2008; Ran et al. 2010).
Recent studies have suggested that a filamentous phenotype and heterocyst differentiation may have been hallmark phenotypic characteristics of the plastid ancestor (Deusch et al. 2008; Ran et al. 2010; Larsson et al. 2011). Indeed, in modern cyanobacterial symbioses, fixed nitrogen is the main currency of benefit that the cyanobacterial symbiont provides to its host (Kayley et al. 2007). The early physiological association of the plastid ancestors with their host might thus have been similar to that of the unicellular nitrogen-fixing endosymbiont and its diatom host Rhopalodia (Kneip et al. 2008), or the highly reduced Nostoc azollae, an obligate cyanobiont of water-ferns, whose genome has been drastically reduced, with a large portion of the remaining genes specifically dedicated to heterocyst differentiation and nitrogen fixation (Ran et al. 2010). A potential problem with this view is that nitrogen fixation has not been retained by any modern plant (Allen and Raven 1996). Why not? One possible reason concerns the circumstance that cyanobacterial O2 production led to an oxidation state of the environment in which nitrate became abundant (Falkowski et al. 2008); in a world of abundant nitrate, nitrogenase is less necessary, hence less likely to be retained, although one should recall that modern cyanobacterial symbionts do fix nitrogen for their hosts. Perhaps more importantly, in oxic environments cyanobacteria that express nitrogenase must exhibit either temporal separation of photosynthesis and nitrogen fixation (N2-fixation occurring mainly in the dark; Mitsui et al. 1986), or other means of protecting the notoriously O2-sensitive enzyme from inactivation such as diazocyte differentiation in Trichodesmium (Sandh et al. 2012), or heterocyst formation in subsections IV and V (Kumar et al. 2010). It is possible that such nitrogenase-protecting strategies, although readily accessible to genetically autonomous prokaryotes, are not within the realm of possibilities that plastids, which relinquished most of their genetic autonomy, can developmentally attain.
Many Endosymbionts, or Only One with Many Genes?
Gene transfer following plastid origin readily explains plant nuclear genes that branch with cyanobacteria. However, many plant-specific genes branch with other prokaryotes (fig. 5A). Plant genes that branch with chlamydial homologs have led to the inference that a chlamydial endosymbiont accompanied the origin of plastids (Brinkman et al. 2002; Huang and Gogarten 2007; Price et al. 2012). This theory postulates that the plant ancestor consumed cyanobacteria as food and was parasitized by environmental chlamydias (Huang and Gogarten 2007; Moustafa et al. 2008), whereby the chlamydias were key to establishing the plastid because chlamydia-like bacteria donated genes that allowed export of photosynthate from the cyanobacterial plastid ancestor and its polymerization into storage polysaccharide in the cytosol (Price et al. 2012). The flaw with this theory is that it is based on the uncritical interpretation of computational results of genome comparisons that, as has long been known (Rujan and Martin 2001; Martin et al. 2002; Esser et al. 2007; Dagan et al. 2008), would implicate many other groups of prokaryotes far more strongly than they would implicate chlamydias as active bystanders at the origin of plastids. The focus on chlamydia as opposed to, say, spirochaetes or proteobacteria, is arbitrary and to some extent ad hoc. If one were to take the chlamydia theory seriously, or think it through in full, the transiently symbiotic and gene-dealing “chlamydioplast” would have to take a number and wait in line next to the actinobacterioplast, the clostridioplast, the bacilloplast, the bacteriodetoplast, the spirochaetoplast, and so forth (fig. 5A). Beyond the cyanobacterial signal, which corresponds to a tangible double membrane-bounded and DNA-containing organelle, the other putative phylogenetic signals in the data, especially that involving chlamydia, are better explained in terms of known phenomena, such as LGT among free-living prokaryotes (Dagan et al. 2008) and phylogeny reconstruction errors (White et al. 2007; Stiller 2011) (fig. 5B), both of which are known to exist, than in terms of gene-dealing endosymbionts whose existence is inferred from a few gene trees. The null hypothesis for endosymbiotic theory in the age of genomes should be: The ancestors of plastids underwent LGT, just like modern cyanobacteria, whose genomes are chimeras of genes from many sources (Mulkidjanian et al. 2006), and the plastid ancestor genome was probably no different (Richards and Archibald 2011). LGT among prokaryotes accounts for the diverse sequence affinities of genes acquired from the single ancestor of plastids with far fewer corollaries than a one-symbiont-per-gene theory. We merely need to incorporate the effect that LGT among prokaryotes will have over geological time on the endosymbiotic origins of organelles.
Fig. 5.—

Taxon distribution of nearest neighbors to plant genes. (A) Tree samples are distributed as follows: Arabidopsis: 2,324; Oryza: 1,792; Physcomitrella: 2,511; Ostreococcus: 968; Chlamydomonas: 1,218; and Cyanidioschyzon: 693. Microbial taxonomic groups having a low frequency of nearest neighbors were grouped into the “Others” bar. Those include Aquificae, Dictyoglomi, Elusimicrobia, Fibrobacteres, Fusobacteria, Gemmatimonadetes, Korarchaeota, Nanoarchaeota, Nitrospirae, Tenericutes, Thaumarchaeota, and Thermotogae. (B) A comparison of alignment quality (CS) between trees of Arabidopsis genes having a cyanobacterial nearest neighbor (black) and trees where a nearest neighbor from a different prokaryotic group was inferred (colored according to the taxa). In all groups but the Euryarchaeota, the alignment quality of trees where a noncyanobacterial nearest neighbor was inferred is significantly lower than that of trees having cyanobacteria as the nearest neighbor (Wilcoxon test, α = 0.05). These results suggest that the inference of noncyanobacterial nearest neighbors to plant genes is less reliable than the inference of cyanobacterial nearest neighbors.
Clues to the Origin of Two Photosystems
One notable aspect of cyanobacterial phylogenomics presented in this study is that the marine cyanobacteria are not basal in the trees (fig. 2 and supplementary fig. S3, Supplementary Material online). These small unicellular cyanobacteria (diameter 1 µm or less) share reduced genome sizes (<3 Mb) as a common trait, and seem to have arisen from ancestors with larger genomes (Larsson et al. 2011) that, as inferred from the phylogeny, lived in terrestrial, brackish, or perhaps freshwater environments (Sánchez-Baracaldo et al. 2005). This led Blank and Sánchez-Baracaldo (2010) to suggest that oxygenic photosynthesis arose in a freshwater environment. Our results support that view, and this conclusion has implications for the origin of water-splitting photosynthesis. Among many possibilities (Xiong and Bauer 2002; Hohmann-Marriott and Blankenship 2011; Williamson et al. 2011), it has been suggested that the progenitor of the cyanobacteria had genes for both type I (RCI) and type II (RCII) photosynthetic reaction centers (via gene duplication) but expressed either set of genes depending on the reducing conditions in the environment (Allen 2005): type RCI in the presence of H2S for noncyclic electron flow, as in Chlorobium (or the facultative anaerobic cyanobacterium Oscillatoria limnetica); and type RCII in the absence of H2S, for cyclic electron flow, as in Rhodobacter (Allen 2005). Were regulation to fail such that both type I and type II reaction centers became expressed in the absence of H2S, the protocyanobacterium would oxidatively perish, unless it could extract electrons from an environmentally available donor.
Such an electron donor could have been aqueous MnII/III, which has the useful property of being photo-oxidized by ultraviolet light (Allen and Martin 2007), an abundant component of solar radiation incident on the Earth’s surface prior to accumulation of atmospheric oxygen. Attaining suitably high concentrations of MnII/III as an environmentally available electron donor in the ocean would be problematic, but not in a freshwater setting. Allen et al. (2012) have recently shown that an engineered, Mn-binding type II reaction center of Rhodobacter sphaeroides will produce O2 from superoxide (O2−) in the presence of Mn in a light-dependent reaction in which photo-damage is impeded in comparison with that in a wild-type, Mn-free reaction center. Their observation (Allen et al. 2012) is likely an important clue to the origin of oxygenic photosynthesis, at which time a protocyanobacterial type II reaction center acquired, via natural selection, the ability to (photo-)oxidize MnII/III (itself ultimately rereduced by water) and then to reduce a newly constitutive type I reaction center. Transition from environmental (substrate) MnII/III ions to the catalytic Mn4Ca center of cyanobacterial RCII would then have permitted light-dependent CO2 and/or nitrogen fixation, in the absence of electron donors other than water.
What Makes a Branching Cyanobacterium?
The morphological diversity of cyanobacteria poses an intriguing question in the biology and evolution of cell differentiation. Transposon mutagenesis of Synechococcus elongatus PCC 7942 (subsection I) revealed that the loss of several genes involved in cell division leads to filament formation (Miyagishima et al. 2005). However, our analysis revealed that all recognized cyanobacterial cell division genes are present in the genomes of filamentous cyanobacteria, including those of subsection V. This suggests that the filamentous phenotype in cyanobacteria of subsections III, IV, and V is not due to loss of genes for cell division, though it is currently unknown whether those that are present are all expressed. Genes common to both unicellular and filamentous cyanobacteria may also be important for determining trichome structure in members of subsections III–V. This is suggested by a recent study on the filamentous heterocystous strain N. punctiforme ATCC 29133 (Lehner et al. 2011), which showed that mutations of the amiC2 gene, encoding an amidase involved in septum formation, lead to a morphology similar to that of colonial unicellular cyanobacteria and prevent heterocyst differentiation. Furthermore, filament formation in S. elongatus PCC 7942 can be induced by overexpression of the gene encoding FtsZ, which is known as a cell division protein (Mori and Johnson 2001). Thus, the lack of clear candidate genes whose distribution across cyanobacterial genomes correlates with cellular morphology, together with the experimental evidence linking the expression level (rather than presence/absence) of cell division proteins to filament formation, suggests that a filamentous phenotype may result from modifications of the gene regulatory network and cell division program.
Supplementary Material
Supplementary figures S1–S7 and tables S1–S3 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
Acknowledgments
The work in the authors’ laboratories is supported by SFB-TR1 to T.D. and W.F.M., the European Research Council (grant no. 232975 to W.F.M.; grant no. 281357 to T.D.), and a Leverhulme Trust Research Grant (no. F07 476AQ to J.F.A.). The support by the Institut Pasteur and the Centre National de la Recherche Scientifique (URA 2172) is acknowledged by M.G., R.R., and N.T.M. The authors are grateful to T. Coursin and T. Laurent for technical assistance in maintaining the Pasteur Culture Collection of Cyanobacteria at the Institut Pasteur. Additional computational support and infrastructure were provided by “Zentrum fuer Informations- und Medientechnologie” (ZIM) at Heinrich-Heine-University, Duesseldorf, Germany.
Literature Cited
- Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21:2104–2105. doi: 10.1093/bioinformatics/bti263.
- Allen JF. A redox switch hypothesis for the origin of two light reactions in photosynthesis. FEBS Lett. 2005;579:963–968. doi: 10.1016/j.febslet.2005.01.015.
- Allen JF, Martin W. Evolutionary biology: out of thin air. Nature. 2007;445:610–612. doi: 10.1038/445610a.
- Allen JF, Raven JA. Free-radical-induced mutation vs redox regulation: costs and benefits of genes in organelles. J Mol Evol. 1996;42:482–492. doi: 10.1007/BF02352278.
- Allen JP, et al. Light-driven oxygen production from superoxide by Mn-binding bacterial reaction centers. Proc Natl Acad Sci U S A. 2012;109:2314–2318. doi: 10.1073/pnas.1115364109.
- Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- Anbar AD, Knoll AH. Proterozoic ocean chemistry and evolution: a bioinorganic bridge. Science. 2002;297:1137–1142. doi: 10.1126/science.1069651.
- Badger JH, Olsen GJ. CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol. 1999;16:512–524. doi: 10.1093/oxfordjournals.molbev.a026133.
- Baymann F, Brugna M, Mühlenhoff U, Nitschke W. Daddy, where did (PS)I come from? Biochim Biophys Acta. 2001;1507:291–310. doi: 10.1016/s0005-2728(01)00209-2.
- Bekker A, et al. Dating the rise of atmospheric oxygen. Nature. 2004;427:117–120. doi: 10.1038/nature02260.
- Blank CE, Sánchez-Baracaldo P. Timing of morphological and ecological innovations in the cyanobacteria: a key to understanding the rise in atmospheric oxygen. Geobiology. 2010;8:1–23. doi: 10.1111/j.1472-4669.2009.00220.x.
- Bowes JB, et al. Xenbase: a Xenopus biology and genomics resource. Nucleic Acids Res. 2008;36:D761–D767. doi: 10.1093/nar/gkm826.
- Brinkman FS, et al. Evidence that plant-like genes in Chlamydia species reflect an ancestral relationship between Chlamydiaceae, cyanobacteria, and the chloroplast. Genome Res. 2002;12:1159–1167. doi: 10.1101/gr.341802.
- Butterfield NJ. Bangiomorpha pubescens n. gen., n. sp.: implications for the evolution of sex, multicellularity, and the mesoproterozoic/neoproterozoic radiation of eukaryotes. Paleobiology. 2000;26:386–404.
- Chiu WL, et al. Nitrogen deprivation stimulates symbiotic gland development in Gunnera manicata. Plant Physiol. 2005;139:224–230. doi: 10.1104/pp.105.064931.
- Costa JL, Romero EM, Lindblad P. Sequence based data supports a single Nostoc strain in individual coralloid roots of cycads. FEMS Microbiol Ecol. 2004;49:481–487. doi: 10.1016/j.femsec.2004.05.001.
- Criscuolo A, Gribaldo S. Large-scale phylogenomic analyses indicate a deep origin of primary plastids within cyanobacteria. Mol Biol Evol. 2011;28:3019–3032. doi: 10.1093/molbev/msr108.
- Dagan T, Artzy-Randrup Y, Martin W. Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proc Natl Acad Sci U S A. 2008;105:10039–10044. doi: 10.1073/pnas.0800679105.
- De Philippis R, Vincenzini M. Exocellular polysaccharides from cyanobacteria and their possible applications. FEMS Microbiol Rev. 1998;22:151–175.
- Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999;27:4636–4641. doi: 10.1093/nar/27.23.4636.
- Deschamps P, et al. Metabolic symbiosis and the birth of the plant kingdom. Mol Biol Evol. 2008;25:536–548. doi: 10.1093/molbev/msm280.
- Deusch O, et al. Genes of cyanobacterial origin in plant nuclear genomes point to a heterocyst-forming plastid ancestor. Mol Biol Evol. 2008;25:748–761. doi: 10.1093/molbev/msn022.
- Doolittle WF. Revolutionary concepts in evolutionary biology. Trends Biochem Sci. 1980;5:146–149.
- Doolittle WF, Bapteste E. Pattern pluralism and the tree of life hypothesis. Proc Natl Acad Sci U S A. 2007;104:2043–2049. doi: 10.1073/pnas.0610699104.
- Dufresne A, et al. Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol. 2008;9:R90. doi: 10.1186/gb-2008-9-5-r90.
- Enright AJ, van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584. doi: 10.1093/nar/30.7.1575.
- Esser C, Martin W, Dagan T. The origin of mitochondria in light of a fluid prokaryotic chromosome model. Biol Lett. 2007;3:180–184. doi: 10.1098/rsbl.2006.0582.
- Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320:1034–1039. doi: 10.1126/science.1153213.
- Felsenstein J. Seattle (WA): University of Washington; 1993. PHYLIP (phylogeny inference package). Version 3.5c.
- Franche C, Damerval T. Test on nif probes and DNA hybridizations. Methods Enzymol. 1988;167:803–808.
- Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. 1998;8:195–202. doi: 10.1101/gr.8.3.195.
- Gould SB, Waller RF, McFadden GI. Plastid evolution. Annu Rev Plant Biol. 2008;59:491–517. doi: 10.1146/annurev.arplant.59.032607.092915.
- Gross J, Bhattacharya D. Opinion: Mitochondrial and plastid evolution in eukaryotes: an outsiders’ perspective. Nat Rev Genet. 2009;10:495–505. doi: 10.1038/nrg2610.
- Gross J, Meurer J, Bhattacharya D. Evidence of a chimeric genome in the cyanobacterial ancestor of plastids. BMC Evol Biol. 2008;8:117. doi: 10.1186/1471-2148-8-117.
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–321. doi: 10.1093/sysbio/syq010.
- Hohmann-Marriott MF, Blankenship RE. Evolution of photosynthesis. Annu Rev Plant Biol. 2011;62:515–548. doi: 10.1146/annurev-arplant-042110-103811.
- Huang J, Gogarten JP. Did an ancient chlamydial endosymbiosis facilitate the establishment of primary plastids? Genome Biol. 2007;8:R99. doi: 10.1186/gb-2007-8-6-r99.
- Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006;23:254–267. doi: 10.1093/molbev/msj030.
- Johnston DT, Wolfe-Simon F, Pearson A, Knoll AH. Anoxygenic photosynthesis modulated Proterozoic oxygen and sustained Earth’s middle age. Proc Natl Acad Sci U S A. 2009;106:16925–16929. doi: 10.1073/pnas.0909248106.
- Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation rate matrices from protein sequences. Comput Appl Biosci. 1992;8:275–282. doi: 10.1093/bioinformatics/8.3.275.
- Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–3066. doi: 10.1093/nar/gkf436.
- Kayley MU, Bergman B, Raven JA. Exploring cyanobacterial mutualism. Annu Rev Ecol Evol Syst. 2007;38:255–273.
- Kneip C, Lockhart P, Voss C, Maier UG. Nitrogen fixation in eukaryotes: new models for symbiosis. BMC Evol Biol. 2007;7:55. doi: 10.1186/1471-2148-7-55.
- Kneip C, Voss C, Lockhart PJ, Maier UG. The cyanobacterial endosymbiont of the unicellular algae Rhopalodia gibba shows reductive genome evolution. BMC Evol Biol. 2008;8:30. doi: 10.1186/1471-2148-8-30.
- Krause L, et al. GISMO-gene identification using a support vector machine for ORF identification. Nucleic Acids Res. 2007;35:540–549. doi: 10.1093/nar/gkl1083.
- Kumar K, Mella-Herrera RA, Golden JW. Cyanobacterial heterocysts. Cold Spring Harb Perspect Biol. 2010;2:a000315. doi: 10.1101/cshperspect.a000315.
- Landan G, Graur D. Heads or tails: a simple reliability check for multiple sequence alignments. Mol Biol Evol. 2007;24:1380–1383. doi: 10.1093/molbev/msm060.
- Larsson J, Nylander JA, Bergman B. Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits. BMC Evol Biol. 2011;11:187. doi: 10.1186/1471-2148-11-187.
- Lehner J, et al. The morphogene AmiC2 is pivotal for multicellular development in the cyanobacterium Nostoc punctiforme. Mol Microbiol. 2011;79:1655–1669. doi: 10.1111/j.1365-2958.2011.07554.x.
- Lindell D, et al. Transfer of photosynthesis genes to and from Prochlorococcus viruses. Proc Natl Acad Sci U S A. 2004;101:11013–11018. doi: 10.1073/pnas.0401526101.
- Lockhart P, et al. Heterotachy and tree building: a case study with plastids and eubacteria. Mol Biol Evol. 2006;23:40–45. doi: 10.1093/molbev/msj005.
- Lyons TW, Anbar AD, Severmann S, Scott C, Gill BC. Tracking euxinia in the ancient ocean: a multiproxy perspective and Proterozoic case study. Annu Rev Earth Planet Sci. 2009;37:507–534.
- Lyons TW, Reinhard CT. An early productive ocean unfit for aerobics. Proc Natl Acad Sci U S A. 2009;106:18045–18046. doi: 10.1073/pnas.0910345106.
- Martin W, Müller M. The hydrogen hypothesis for the first eukaryote. Nature. 1998;392:37–41. doi: 10.1038/32096.
- Martin W, et al. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci U S A. 2002;99:12246–12251. doi: 10.1073/pnas.182432999.
- Matsuzaki M, et al. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature. 2004;428:653–657. doi: 10.1038/nature02398.
- Mereschkowsky C. Über Natur und Ursprung der Chromatophoren im Pflanzenreiche. Biol Centralbl. 1905;25:593–604. [English translation in Eur J Phycol. 1999;34:287–295.]
- Meyer F, et al. GenDB: an open source genome annotation system for prokaryote genomes. Nucleic Acids Res. 2003;31:2187–2195. doi: 10.1093/nar/gkg312.
- Mitsui AS, et al. Strategy by which nitrogen-fixing unicellular cyanobacteria grow photoautotrophically. Nature. 1986;323:720–722.
- Miyagishima SY, Wolk CP, Osteryoung KW. Identification of cyanobacterial cell division genes by comparative and mutational analyses. Mol Microbiol. 2005;56:126–143. doi: 10.1111/j.1365-2958.2005.04548.x.
- Moisander PH, et al. Unicellular cyanobacterial distributions broaden the oceanic N2 fixation domain. Science. 2010;327:1512–1514. doi: 10.1126/science.1185468.
- Morel FM, Price NM. The biogeochemical cycles of trace metals in the oceans. Science. 2003;300:944–947. doi: 10.1126/science.1083545.
- Mori T, Johnson CH. Independence of circadian timing from cell division in cyanobacteria. J Bacteriol. 2001;183:2439–2444. doi: 10.1128/JB.183.8.2439-2444.2001.
- Moustafa A, Reyes-Prieto A, Bhattacharya D. Chlamydiae has contributed at least 55 genes to plantae with predominantly plastid functions. PLoS One. 2008;3:e2205. doi: 10.1371/journal.pone.0002205.
- Mulkidjanian AY, et al. The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci U S A. 2006;103:13126–13131. doi: 10.1073/pnas.0605709103.
- Müller M, et al. Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol Mol Biol Rev. 2012;76:444–495. doi: 10.1128/MMBR.05024-11.
- Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405:299–304. doi: 10.1038/35012500.
- Palenik B, et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A. 2007;104:7705–7710. doi: 10.1073/pnas.0611046104.
- Parfrey LW, Lahr DJG, Knoll AH, Katz LA. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc Natl Acad Sci U S A. 2011;108:13624–13629. doi: 10.1073/pnas.1110633108.
- Prechtl J, Kneip C, Lockhart P, Wenderoth K, Maier UG. Intracellular spheroid bodies of Rhopalodia gibba have nitrogen-fixing apparatus of cyanobacterial origin. Mol Biol Evol. 2004;21:1477–1481. doi: 10.1093/molbev/msh086.
- Price DC, et al. Cyanophora paradoxa genome elucidates origin of photosynthesis in algae and plants. Science. 2012;335:843–847. doi: 10.1126/science.1213561.
- Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65. doi: 10.1093/nar/gkl842.
- Rai AN, Söderbäck E, Bergman B. Cyanobacterium-plant symbioses. New Phytol. 2000;147:449–481. doi: 10.1046/j.1469-8137.2000.00720.x.
- Ran L, et al. Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular cyanobacterium. PLoS One. 2010;5:e11486. doi: 10.1371/journal.pone.0011486.
- Raven JA. Evolution of cyanobacterial symbioses. In: Rai AN, Bergman B, Rasmussen U, editors. Cyanobacteria in symbiosis. Dordrecht (The Netherlands): Kluwer Academic Publishers; 2002. pp. 329–346.
- Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE. Whole-genome analysis of photosynthetic prokaryotes. Science. 2002;298:1616–1620. doi: 10.1126/science.1075558.
- Reyes-Prieto A, et al. Differential gene retention in plastids of common recent origin. Mol Biol Evol. 2010;27:1530–1537. doi: 10.1093/molbev/msq032.
- Richards TA, Archibald JM. Cell evolution: gene transfer agents and the origin of mitochondria. Curr Biol. 2011;21:R112. doi: 10.1016/j.cub.2010.12.036.
- Richly E, Leister D. An improved prediction of chloroplast proteins reveals diversities and commonalities in the chloroplast proteomes of Arabidopsis and rice. Gene. 2004;329:11–16. doi: 10.1016/j.gene.2004.01.008.
- Rikkinen J, Oksanen I, Lohtander K. Lichen guilds share related cyanobacterial endosymbionts. Science. 2002;297:357. doi: 10.1126/science.1072961.
- Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. Generic assignments, strain histories and properties of pure cultures of cyanobacteria. J Gen Microbiol. 1979;111:1–61.
- Rippka R, Herdman M. Pasteur culture collection of Cyanobacteria: catalogue and taxonomic handbook. I. Catalogue of strains. Paris: Institut Pasteur; 2002.
- Rocap G, et al. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003;424:1042–1047. doi: 10.1038/nature01947.
- Rodríguez-Ezpeleta N, Embley TM. The SAR11 group of alpha-proteobacteria is not related to the origin of mitochondria. PLoS One. 2012;7:e30520. doi: 10.1371/journal.pone.0030520.
- Rujan T, Martin W. How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet. 2001;17:113–120. doi: 10.1016/s0168-9525(00)02209-5.
- Sahoo SK, et al. Ocean oxygenation in the wake of the Marinoan glaciation. Nature. 2012;489:546–549. doi: 10.1038/nature11445.
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
- Sánchez-Baracaldo P, Hayes PK, Blank CE. Morphological and habitat evolution in the Cyanobacteria using a compartmentalization approach. Geobiology. 2005;3:145–165.
- Sandh G, Xu L, Bergman B. Diazocyte development in the marine diazotrophic cyanobacterium Trichodesmium. Microbiology. 2012;158:345–352. doi: 10.1099/mic.0.051268-0.
- Sharma AD, Gill PK, Singh P. DNA isolation from dry and fresh samples of polysaccharide-rich plants. Plant Mol Biol Rep. 2002;20:415a–415f.
- Sharon I, et al. Photosystem I gene cassettes are present in marine virus genomes. Nature. 2009;461:258–262. doi: 10.1038/nature08284.
- Shi T, Falkowski PG. Genome evolution in cyanobacteria: the stable core and the variable shell. Proc Natl Acad Sci U S A. 2008;105:2510–2515. doi: 10.1073/pnas.0711165105.
- Stanier RY. Some aspects of the biology of cells and their possible evolutionary significance. Symp Soc Gen Microbiol. 1970;20:1–38.
- Stiller JW. Experimental design and statistical rigor in phylogenomics of horizontal and endosymbiotic gene transfer. BMC Evol Biol. 2011;11:259. doi: 10.1186/1471-2148-11-259.
- Stiller JW, Hall BD. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol Biol Evol. 1999;16:1270–1279. doi: 10.1093/oxfordjournals.molbev.a026217.
- Stucken K, et al. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS One. 2010;5:e9235. doi: 10.1371/journal.pone.0009235.
- Swingley WD, et al. Niche adaptation and genome expansion in the chlorophyll d-producing cyanobacterium Acaryochloris marina. Proc Natl Acad Sci U S A. 2008;105:2005–2010. doi: 10.1073/pnas.0709772105.
- Tatusov RL, et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001;29:22–28. doi: 10.1093/nar/29.1.22.
- Thompson AW, et al. Unicellular cyanobacterium symbiotic with a single-celled eukaryotic alga. Science. 2012;337:1546–1550. doi: 10.1126/science.1222700.
- van Mooy BA, et al. Phytoplankton in the ocean use non-phosphorus lipids in response to phosphorus scarcity. Nature. 2009;458:69–72. doi: 10.1038/nature07659.
- Weber AP, Linka M, Bhattacharya D. Single, ancient origin of a plastid metabolite translocator family in Plantae from an endomembrane-derived ancestor. Eukaryot Cell. 2006;5:609–612. doi: 10.1128/EC.5.3.609-612.2006.
- Whelan S, Goldman N. A general model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001;18:691–699. doi: 10.1093/oxfordjournals.molbev.a003851.
- White WT, Hills SF, Gaddam R, Holland BR, Penny D. Treeness triangles: visualizing the loss of phylogenetic signal. Mol Biol Evol. 2007;24:2029–2039. doi: 10.1093/molbev/msm139.
- Williamson A, Conlan B, Hillier W, Wydrzynski T. The evolution of photosystem II: insights into the past and future. Photosynth Res. 2011;107:71–86. doi: 10.1007/s11120-010-9559-3.
- Xiong J, Bauer CE. Complex evolution of photosynthesis. Annu Rev Plant Biol. 2002;53:503–521. doi: 10.1146/annurev.arplant.53.100301.135212.
- Zhaxybayeva O, Doolittle WF, Papke RT, Gogarten JP. Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol Evol. 2009;1:325–339. doi: 10.1093/gbe/evp032.
- Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res. 2006;16:1099–1108. doi: 10.1101/gr.5322306.