Significance
Eukaryotic photosynthetic organelles (plastids) originated >1 billion y ago via the endosymbiosis of a β-cyanobacterium. The resulting proliferation of primary producers fundamentally changed our planet’s history, allowing for the establishment of human populations. Early stages of plastid integration, however, remain poorly understood, including the role of horizontal gene transfer from nonendosymbiotic bacteria. Rules governing organellogenesis are difficult, if not impossible, to evaluate using the highly derived algal and plant systems. Insights into this issue are provided by the amoeba Paulinella chromatophora, which contains more recently established photosynthetic organelles of α-cyanobacterial origin. Here we show that the impact of Muller’s ratchet that leads to endosymbiont genome reduction seems to drive the fixation of horizontally acquired “compensatory” bacterial genes in the host nuclear genome.
Keywords: endosymbiosis, genome evolution, organellogenesis, horizontal gene transfer, coevolution
Abstract
Plastids, the photosynthetic organelles, originated >1 billion y ago via the endosymbiosis of a cyanobacterium. The resulting proliferation of primary producers fundamentally changed global ecology. Endosymbiotic gene transfer (EGT) from the intracellular cyanobacterium to the nucleus is widely recognized as a critical factor in the evolution of photosynthetic eukaryotes. The contribution of horizontal gene transfers (HGTs) from other bacteria to plastid establishment remains more controversial. A novel perspective on this issue is provided by the amoeba Paulinella chromatophora, which contains photosynthetic organelles (chromatophores) that are only 60–200 million years old. Chromatophore genome reduction entailed the loss of many biosynthetic pathways including those for numerous amino acids and cofactors. How the host cell compensates for these losses remains unknown, because the presence of bacteria in all available P. chromatophora cultures excluded elucidation of the full metabolic capacity and occurrence of HGT in this species. Here we generated a high-quality transcriptome and draft genome assembly from the first bacteria-free P. chromatophora culture to deduce rules that govern organelle integration into cellular metabolism. Our analyses revealed that nuclear and chromatophore gene inventories provide highly complementary functions. At least 229 nuclear genes were acquired via HGT from various bacteria, of which only 25% putatively arose through EGT from the chromatophore genome. Many HGT-derived bacterial genes encode proteins that fill gaps in critical chromatophore pathways/processes. Our results demonstrate a dominant role for HGT in compensating for organelle genome reduction and suggest that phagotrophy may be a major driver of HGT.
Plastids are photosynthetic organelles in algae and plants that originated >1 billion y ago in the protistan ancestor of the Archaeplastida (red, glaucophyte, and green algae plus plants) via the primary endosymbiosis of a β-cyanobacterium (1, 2). Subsequently, plastids spread through eukaryote–eukaryote (i.e., secondary and tertiary) endosymbioses to other algal groups (3). The resulting proliferation of primary producers fundamentally changed our planet’s history, allowing for the establishment of human populations. Plastid evolution was accompanied by a massive size reduction of the endosymbiont genome and the transfer of thousands of endosymbiont genes into the host nuclear genome, a process known as endosymbiotic gene transfer (EGT) (4). Proteins encoded by the transferred genes are synthesized in the cytoplasm and many are posttranslationally translocated into the plastid through the TIC/TOC protein import complex (5). EGT is widely recognized as a major contributor to the evolution of eukaryotes, and in particular the transformation of an endosymbiont into an organelle. More recently, it was proposed that horizontal gene transfers (HGTs) from cooccurring intracellular bacteria also supplied genes that facilitated plastid establishment (6). However, the extent and sources of HGTs and their importance to organelle evolution remain controversial topics (7, 8).
The chromatophore of the cercozoan amoeba Paulinella chromatophora (Rhizaria) represents the only known case of acquisition of a photosynthetic organelle other than the primary endosymbiosis that gave rise to the Archaeplastida (9). The chromatophore originated much more recently than plastids (∼60–200 Ma) via the uptake of an α-cyanobacterial endosymbiont related to Synechococcus/Cyanobium spp. (9, 10). In contrast to heterotrophic Paulinella species that feed on bacteria, their phototrophic sister, P. chromatophora, lost its phagotrophic ability and relies primarily on photosynthetic carbon fixation for survival (11, 12). The chromatophore genome is reduced to 1 Mbp, approximately one-third the size of the ancestral cyanobacterial genome. Genome reduction was accompanied by the complete loss of many biosynthetic pathways, including those for various amino acids and cofactors. In other pathways, genes for single metabolic enzymes were lost (13). How the host compensates for the loss of metabolic functions from the chromatophore remains unknown. Previous studies identified >30 nuclear genes of α-cyanobacterial origin that were likely acquired via EGT from the chromatophore (14–16). However, most of these genes encode functions related to photosynthesis and light adaptation and do not seem to complement gaps in chromatophore-encoded metabolic pathways. The products of three EGT-derived genes, the photosystem I (PSI) subunits PsaE, PsaK1, and PsaK2, were shown to be synthesized on cytoplasmic ribosomes and to traffic (likely via the Golgi) into the chromatophore, where they assemble with chromatophore-encoded PSI subunits (17). Even though details of the protein translocation mechanism remain to be elucidated, these findings demonstrate that cytoplasmically synthesized proteins can be imported into chromatophores. Owing to the large number of bacteria associated with P. chromatophora in all available laboratory cultures, the full metabolic capacity of P. chromatophora is unknown and the occurrence of HGTs remains uncertain because genes from contaminating bacteria cannot be distinguished from true HGT.
Results and Discussion
Transcriptome and Genome Datasets from Axenic P. chromatophora.
To deduce the rules that govern organelle integration into cellular metabolism, we focused on exploring the extent of HGT in P. chromatophora and the putative functions of proteins derived from HGT. For this purpose, we established a bacteria-free (i.e., axenic) culture of P. chromatophora. These cells were used to generate the transcriptome and genome data discussed here. The P. chromatophora transcriptome dataset comprises 49.5 Mbp of assembled sequence with a contig N50 of 1.1 kbp. These contigs encode homologs of 442/458 (97%) of the core eukaryotic proteins in the Core Eukaryotic Genes Mapping Approach (CEGMA) database (16). Preliminary analyses indicate that the nuclear genome has a surprisingly large estimated size of ∼9.6 Gbp (Fig. S1 and Materials and Methods). Thus, despite generating 147.4 Gbp of data from paired-end and mate-pair libraries (Materials and Methods), our initial assembly remained highly fragmented (N50 of 711 bp). All contigs >15 kbp in size were chromatophore- or mitochondrion-derived sequences. A potentially circular contig of 47.4 kbp with an average read coverage of 12,903× (0.82% of total genomic mapped reads) was identified as the complete, or nearly complete, P. chromatophora mitochondrial genome (Fig. S2). This contig contains 22 protein-coding genes, 27 tRNAs, and two (large + small) ribosomal RNA subunits.
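The N50 values quoted above (1.1 kbp for the transcriptome contigs, 711 bp for the draft genome assembly) summarize assembly contiguity. As a minimal sketch of how such a statistic is conventionally computed, and not a description of the authors' pipeline, the following Python function returns the contig length at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths were given")


# Hypothetical toy assembly, for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # 400
```

A low N50 relative to typical gene length, as in the draft genome assembly here, means that most genes are split across contigs, which is why the transcriptome is the primary dataset for gene-level analyses.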
Fig. S1.
Histogram depicting the probability of observing frequent 31-mers (observed two or more times) in a 10% random subsample of P. chromatophora Illumina HiSeq reads. Reads were truncated to 150 bp from 250 bp to exclude low-quality bases from the analysis. The exponential distribution of k-mer frequencies, as opposed to an expected normal distribution, indicates that sequence coverage derives predominantly from unique or nonoverlapping DNA amplicons and is evidence of low-coverage sequencing of an extremely large genome.
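The k-mer frequency histogram described in this legend can be illustrated with a short sketch. The Python code below is an assumption-laden toy version and not the tool used in the study: it truncates each read, counts every k-mer, and reports how many distinct k-mers occur at each multiplicity, keeping only those seen at least `min_count` times. The function name, parameters, and example reads are hypothetical, and a dedicated k-mer counter would be needed for real data of this size.

```python
from collections import Counter


def kmer_spectrum(reads, k=31, max_len=150, min_count=2):
    """Map k-mer multiplicity -> number of distinct k-mers with that multiplicity."""
    counts = Counter()
    for read in reads:
        read = read[:max_len]                     # truncate to drop low-quality 3' bases
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    spectrum = Counter(counts.values())           # histogram of multiplicities
    return {m: n for m, n in spectrum.items() if m >= min_count}


# Hypothetical toy reads with a small k so the result is easy to follow.
print(kmer_spectrum(["ACGTACG", "CGTA"], k=3))    # {2: 3}
```

In a genome sequenced at adequate depth the spectrum shows a peak near the mean coverage; a monotonically decaying spectrum, as reported here, is what one expects when most k-mers are sampled only once or twice.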
Fig. S2.
Forty-seven-kbp genomic contig that encodes all or most of the P. chromatophora mitochondrial genome.
Chromatophore and Host Genomes Encode Complementary Functions.
Metabolic reconstruction of the amoeba gene inventory revealed the presence of genes for many metabolic pathways on the nuclear genome that were originally also present on, but then lost from, the chromatophore genome (e.g., Met, Ser, Gly, and purine biosynthesis; Fig. 1A and Figs. S3 and S4). In other instances, gaps in chromatophore-encoded pathways are filled by proteins encoded on the nuclear genome (e.g., Arg, His, and aromatic amino acid biosynthesis; Fig. 1B and Fig. S3). Interestingly, chromatophore genome reduction also involved the loss of genes essential for bacteria-specific functions that cannot be replaced by eukaryotic genes. One such “lost” gene encodes UDP-N-acetylmuramoyl-tripeptide:d-Ala-d-Ala ligase (MurF), which ligates the dipeptide d-Ala-d-Ala to the growing peptide side chain of peptidoglycan monomers (Fig. 1C). All remaining steps of peptidoglycan biosynthesis are encoded on the chromatophore genome. Intriguingly, analysis of the P. chromatophora transcriptome dataset revealed the presence of a nuclear-encoded MurF of β-proteobacterial origin (Figs. 1C and 2A).
Fig. 1.
Metabolic pathways and DNA replication in P. chromatophora. The distribution of chromatophore-encoded (within green rectangles) and nuclear-encoded genes is shown, although the subcellular localization of the gene products is unknown. Numbers associated with chromatophore-encoded enzymes are locus tags for the respective genes (e.g., 1234 represents PCC_1234). Pale lettering/arrows indicate that the gene is missing from the chromatophore genome or absent in nuclear transcriptome data. Circles and rectangles adjacent to the enzymes indicate their phylogenetic origin and targeting prediction (TargetP prediction; mTP and SP predictions with a reliability class <3 are shown), respectively; they are defined immediately below the figure. Multiples of the individual symbols represent the presence of multiple protein versions encoded by the transcript dataset. 3-PGA, 3-phosphoglycerate; PII, the PII nitrogen-sensing protein (see text); PEP, phosphoenolpyruvate; SSB, single-strand binding protein. The pathways shown are for the synthesis of serine and methionine (Ser, Met, A), arginine (Arg, B), peptidoglycan (C) and the precursor of aromatic amino acids (chorismate) and cysteine (Cys, E) as well as for DNA replication (D).
Fig. S3.
Amino acid biosynthetic pathways in P. chromatophora. The distribution of chromatophore-encoded (within the green rectangle) and nuclear-encoded genes is shown, although the subcellular localization of their gene products is unknown. Numbers associated with chromatophore-encoded enzymes are the locus tags of the respective gene (e.g., 1234 stands for PCC_1234). Represented by dark blue, green, or violet arrows are enzymes involved in biosynthesis of neutral, negatively charged, or basic amino acids, respectively. Pale lettering/arrows indicate that the gene is missing from the chromatophore genome or in the nuclear transcriptome data. The circles and rectangles adjacent to the enzymes indicate their phylogenetic origin and targeting prediction, respectively. Targeting predictions for full-length proteins were obtained using TargetP and mTP and SP predictions with a reliability class (RC) <3 are shown. Multiples of the individual symbols represent the presence of multiple protein versions encoded by the transcript dataset. Question marks indicate uncertainties in enzymatic activity. 3-PGA, 3-phosphoglycerate; PEP, phosphoenolpyruvate; PRPP, 5-phosphoribosyl 1-pyrophosphate; THF, tetrahydrofolate; unspec. TA, unspecific transaminase.
Fig. S4.
Nucleotide biosynthetic pathways in P. chromatophora. The distribution of chromatophore-encoded (within the green rectangle) and nuclear-encoded genes is shown, although the subcellular localization of their gene products is unknown. Numbers associated with chromatophore-encoded enzymes are the locus tags of the respective gene (e.g., 1234 stands for PCC_1234). Pale lettering/arrows indicate that the gene is missing from the chromatophore genome or in the nuclear transcriptome data. The circles and rectangles adjacent to the enzymes indicate their phylogenetic origin and targeting prediction, respectively. Targeting predictions for full-length proteins were obtained using TargetP and mTP and SP predictions with a RC <3 are shown. Multiples of the individual symbols represent the presence of multiple protein versions encoded by the transcript dataset.
Fig. 2.
Phylogeny of HGT-derived genes in P. chromatophora. Maximum likelihood phylogenetic trees from amino acid alignments of (A) MurF, (B) PolA, (C) LigA, and (D) AroE. Numbers at the branches represent bootstrap values. Color code: purple, α-; black, β-; and gray, γ-proteobacteria; blue, α-; and green, β-cyanobacteria; orange, thermodesulfobacteria; pink, Eukarya; and red, P. chromatophora. (E) Portion of amino acid alignment of nuclear and chromatophore-encoded copies of P. chromatophora AroE with proteobacterial and cyanobacterial sequences. The tree (left) represents “species” phylogeny based on the ribosomal operon. The lineages are marked as follows: green, S. elongatus; pink, marine Synechococcus clade; blue, Prochlorococcus clade; orange, Cyanobium clade; red, P. chromatophora (nuclear and chromatophore genes); black, β-; and gray, γ-proteobacteria.
Predominance of HGT in the Evolution of P. chromatophora.
The finding that a β-proteobacterial MurF was encoded on the P. chromatophora nuclear genome prompted us to search for additional bacterial genes on this genome. Based on phylogenetic analysis of proteins encoded by P. chromatophora nuclear transcripts, there are at least 150 independent bacterial gene acquisitions that are often followed by gene family expansions, resulting in at least 229 bacterium-derived genes (Materials and Methods and Dataset S1). Only 58 (or 25%) of these genes are of α-cyanobacterial origin, and thus potentially chromatophore-derived, although we cannot exclude the possibility that some may also have arisen via HGT from related cyanobacterial lineages. Most of the remaining 171 HGTs are affiliated with other bacteria, with 64 being confidently assigned to a specific donor bacterial lineage and two for which an HGT or EGT origin could not be unambiguously determined (Fig. S5 A and B and Dataset S1). For 52 other genes there was not sufficient bootstrap support (i.e., ≥80%) to establish affiliation with a particular bacterial clade, or the sequences originated at the base of a particular lineage, indicating a likely donor group, but with lower confidence. The remaining 53 bacterial genes could not be assigned to a specific clade due to frequent HGTs among these taxa. Nonetheless, these latter genes likely arose via HGT because similar genes are absent in other eukaryotes or α-cyanobacteria (Fig. S5 A and B and Dataset S1). Therefore, our results suggest a predominance of HGT over EGT in the evolution of the P. chromatophora photosynthetic lineage. We hypothesize that this result is explained by the fact that P. chromatophora has a phagotrophic ancestry that facilitated the HGT ratchet. Analysis of a partial nuclear genome sequence from wild-caught cells of Paulinella ovalis, a phagotrophic sister lineage of P. chromatophora, revealed the presence of various bacterial DNA sequences that were likely derived from food vacuoles (18). 
This partial genome dataset also revealed nuclear genes of α-cyanobacterial origin (e.g., a diaminopimelate epimerase gene), suggesting that, in addition to EGT, phagotrophy can lead to HGT in the Paulinella lineage, as previously hypothesized (19). These results could also be the consequence of the uptake of DNA from the environment by transformation or by viral transduction.
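The gene counts reported in this section can be checked with simple arithmetic. The sketch below only restates numbers given in the text (229 bacterium-derived genes, 58 of α-cyanobacterial origin, and the four categories of the remaining genes); the variable names and category labels are ours.

```python
total_bacterial_genes = 229      # bacterium-derived nuclear genes
alpha_cyanobacterial = 58        # potentially chromatophore-derived (EGT)

non_alpha = total_bacterial_genes - alpha_cyanobacterial   # 171

# Breakdown of the remaining genes as described in the text.
breakdown = {
    "confidently assigned to a donor lineage": 64,
    "HGT or EGT origin ambiguous": 2,
    "donor group indicated with lower confidence": 52,
    "no specific clade assignable": 53,
}

assert sum(breakdown.values()) == non_alpha

egt_share = round(100 * alpha_cyanobacterial / total_bacterial_genes, 1)
print(non_alpha, egt_share)      # 171 25.3
```

The four categories sum to the 171 genes of non-α-cyanobacterial affiliation, and the α-cyanobacterial share of about 25% matches the value quoted in the text.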
Fig. S5.
Origin of HGT genes (A and B) and functional categories of HGT (C) and EGT genes (D) in the P. chromatophora nuclear genome. (A and B) Total number of bacterial genes identified in this study in the P. chromatophora nuclear transcriptome broken down by their presumptive sources (for explanation of classification of phylogenetic support see Dataset S1). Because gene acquisition was often followed by gene family expansion, the total number of genes (A) is higher than the number of presumed individual transfers (B). (C and D) Numbers of acquired genes broken down by their presumed cellular functions (see also Dataset S1). Because HGT was often followed by gene family expansion, the total number of genes (light gray) is higher than the number of presumed single transfers (dark gray).
Spliced Leader Sequences and Introns Confirm Nuclear Origin of HGT Genes.
Validation of the nuclear origin of P. chromatophora HGT candidates is provided by the presence of a conserved 20-nt transspliced leader (SL) sequence on many of these transcripts. The biological function of SLs is not well understood. They are found at the 5′ terminus of mature mRNAs in a phylogenetically diverse group of organisms including euglenozoans, cnidarians, chordates, nematodes, and dinoflagellates (20), but to our knowledge they have not previously been reported from Rhizaria. Of the 17,801 unique nuclear transcripts with an assigned function, 4,649 (26.1%) contained the SL sequence CGGATAWTCCKGCTTTTCTG or a 5′-truncated version of this sequence (but at least CTTTTCTG) within the first 40 nt and usually at the 5′ terminus (Fig. S6). Because RNA sequencing generally results in poor assembly at the 5′ ends of transcripts, we expect that the actual fraction of transcripts carrying an SL at their 5′ end is much higher. As expected, SLs were absent from all chromatophore- and mitochondrion-derived transcripts. Of the presumed HGT-derived cDNA contigs, 32% contained an SL (Dataset S1). For the other presumed HGT-derived transcripts, we searched for spliceosomal introns in the corresponding genomic contigs. Using both of these approaches we were able to confirm the nuclear origin for 162 of the 171 genes derived via HGT (Dataset S1).
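The SL search criterion described above can be expressed as a pattern match. The following Python sketch is one possible reading of that criterion and is not the authors' implementation: the degenerate IUPAC codes W (A or T) and K (G or T) are expanded into character classes, every 5′-truncated form that still contains CTTTTCTG is accepted, and the match must lie within the first 40 nt. The function names and the example sequences are hypothetical.

```python
import re

SL = "CGGATAWTCCKGCTTTTCTG"      # SL consensus from the text
MIN_CORE = "CTTTTCTG"            # shortest accepted 5'-truncated form
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "K": "[GT]"}


def sl_pattern(sl=SL, min_core=MIN_CORE):
    """Regex matching the SL or any 5'-truncated version that keeps min_core."""
    suffixes = [sl[i:] for i in range(len(sl) - len(min_core) + 1)]
    alternatives = ["".join(IUPAC[base] for base in s) for s in suffixes]
    return re.compile("|".join(alternatives))   # longest alternatives first


PATTERN = sl_pattern()


def has_sl(transcript, window=40):
    """True if an SL (or accepted truncation) lies within the first `window` nt."""
    return PATTERN.search(transcript.upper()[:window]) is not None


# Hypothetical example sequences.
print(has_sl("CGGATAATCCGGCTTTTCTGATGAAA"))   # True  (full SL, W=A, K=G)
print(has_sl("CTTTTCTGATGCCC"))               # True  (minimal truncated form)
print(has_sl("ATGAAACCCGGGTTTAAA"))           # False (no SL)
```

Because the minimal form is only 8 nt long, chance matches are possible in a large transcript set, so a real screen would probably also weigh the position of the match and the length of the retained SL.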
Fig. S6.
P. chromatophora spliced leader sequences. (A) Thirty typical transspliced transcripts aligned by their SL sequence. (B) Transspliced transcripts aligned with their encoding genes; SL sequences are represented in pale colors.
Chromatophore-Related Functions of HGT Genes.
Adaptive HGTs from bacteria have been reported from diverse eukaryotic lineages (e.g., refs. 21–25). Thus, it is likely that some HGT candidates represent ancient transfers to the nuclear genome that are not related to chromatophore function. However, none of the P. chromatophora HGTs was present in the partial P. ovalis dataset, and many encode proteins that fill specific gaps in chromatophore-encoded metabolic pathways [e.g., d-Ala-d-Ala ligase MurF (Figs. 1C and 2A), a DNA polymerase I (PolA) responsible for removal of RNA primers and filling in the resulting gaps during DNA replication, a DNA ligase (LigA) that seals DNA nicks (Figs. 1D and 2 B and C), and a serine O-acetyltransferase CysE (Fig. 1E)]. Eight bacterial genes of non-α-cyanobacterial provenance function in bacterial cell wall biosynthesis or division, whereas 25 are associated with the processing of genetic information. Twelve HGTs encode transporters that might facilitate metabolite or ion exchange between the chromatophore and the P. chromatophora cytoplasm (Fig. S5 C and D and Dataset S1). For example, a gene encoding a putative Gly/Ala Na+ symporter may be involved in shuttling cytoplasmically synthesized Gly and Ala into the chromatophore, which lacks genes encoding the pathways for Gly and Ala biosynthesis (Fig. S3).
A lack of biochemical data makes it impossible to predict the subcellular localization of nuclear-encoded, and in particular, HGT-derived proteins. However, the functional complementarity of nuclear and chromatophore-encoded proteins provides a reasonable basis for our speculation that, similar to the case of the EGT-derived photosynthesis polypeptides PsaE and PsaK, nuclear-encoded HGT-derived proteins are imported into the chromatophore to rescue lost gene functions. In this context it is interesting that a highly conserved glnB gene is present on the chromatophore genome (Fig. 1B and Fig. S7). This gene encodes the PII nitrogen-sensing protein that regulates arginine biosynthesis through interactions with the N-acetyl glutamate kinase (ArgB) (26), which is encoded on the nuclear genome and derived via HGT from a planctomycete donor. For transcripts that included the full-length N terminus of the encoded protein, as indicated by either the presence of an SL sequence or an in-frame stop codon upstream of the presumable start methionine, the occurrence of potential N-terminal targeting sequences was analyzed using TargetP 1.1 in nonplant mode (27) (Fig. 1 and Figs. S3 and S4). For the enzymes that catalyze the first steps in the arginine biosynthetic pathway (ArgJ, ArgB, and ArgC), as for PsaE and PsaK (17), no N-terminal presequences were predicted. For the last two enzymes of the arginine biosynthetic pathway, ArgG and ArgH, a mitochondrial targeting peptide (mTP) was predicted. TargetP predictions of mTPs and signal peptides (SPs) seem accurate for P. chromatophora based on the finding that most enzymes of the TCA cycle and typical ER proteins yield high confidence mTP or SP predictions, respectively (Table S1). Thus, it is likely that some HGT-derived proteins not predicted to contain an mTP or SP are targeted to the chromatophores where they replace lost functions, or play a role in host/chromatophore metabolic integration. 
However, in other cases the proteins for a given metabolic pathway may be partitioned between the cytoplasm and chromatophore and the connectivity of the pathway established by metabolite exchange between the two compartments.
Fig. S7.
ClustalX alignment of cyanobacterial and P. chromatophora PII proteins. Accession numbers are as follows: Cyanobium sp. PCC 7001 (WP 006909858.1); Gloeobacter violaceus (WP 011140260.1); Leptolyngbya sp. PCC 6406 (WP 008313271.1); Microcystis aeruginosa TAIHU98 (ELP55889.1); P. chromatophora CCAC0185 (YP 002048850.1); Prochlorococcus marinus (WP 011133091.1); Prochlorococcus sp. MIT 0601 (WP 036900543.1); Synechococcus sp. WH 5701 (WP 006171995.1); Synechococcus sp. WH 8102 (WP 011127336.1); and Synechocystis sp. PCC 6803 (WP 010873156.1).
Table S1.
TargetP-predicted subcellular localizations of P. chromatophora TCA-cycle enzymes and typical endoplasmic reticulum (ER) proteins
| KO | Annotation | Transcript | FL | Pred. | RC | TPlen |
| Presumptive mitochondrial proteins | ||||||
| K01610 | Phosphoenolpyruvate carboxykinase | Scaffold1715-size2867 | SL | — | 4 | — |
| K01610 | Phosphoenolpyruvate carboxykinase | Scaffold13900-size987 | SL | M | 2 | 27 |
| K01647 | Citrate synthase | Scaffold5561-size1790 | SL | M | 1 | 32 |
| K01647 | Citrate synthase | Scaffold4763-size1911 | SL | — | 1 | — |
| K00031 | Isocitrate dehydrogenase | Scaffold1668-size2892 | SL | M | 1 | 36 |
| K00031 | Isocitrate dehydrogenase | Scaffold5171-size1844 | SL | M | 1 | 54 |
| K00031 | Isocitrate dehydrogenase | Scaffold7869-size1494 | SL | M | 2 | 30 |
| K00164 | 2-oxoglutarate mitochondrial-like | Scaffold768-size3730 | SL | M | 1 | 45 |
| K00658 | Dihydrolipoamide succinyltransferase | Scaffold5357-size1818 | SL | M | 1 | 62 |
| K00658 | Dihydrolipoamide succinyltransferase | Scaffold9773-size1298 | SL | M | 1 | 19 |
| K01899 | Succinyl-coa ligase alpha subunit | Scaffold10104-size1271 | SL | M | 1 | 35 |
| K01900 | Succinyl-coa ligase beta subunit | Scaffold5858-size1745 | SL | M | 2 | 23 |
| K01900 | Succinyl-coa ligase beta subunit | Scaffold6040-size1720 | SL | M | 1 | 45 |
| K00235 | Succinate dehydrogenase subunit 2 | Scaffold11955-size1120 | SL | M | 3 | 23 |
| K01676 | Fumarate hydratase | Scaffold3270-size2225 | SL | M | 1 | 54 |
| K00025 | Malate dehydrogenase | Scaffold9817-size1295 | SL | — | 2 | — |
| K00026 | Malate dehydrogenase | Scaffold12090-size1110 | SL | M | 1 | 35 |
| Presumptive ER proteins | ||||||
| K03846 | Alpha- -mannosyltransferase | Scaffold3701-size2116 | SL | M | 4 | 86 |
| K03849 | Probable dolichyl pyrophosphate glc1 man9 c2 alpha- -glucosyltransferase | Scaffold7771-size1508 | ST | SP | 1 | 26 |
| K00729 | Dolichyl-phosphate beta-glucosyltransferase | Scaffold13333-size1021 | SL | SP | 1 | 27 |
| K07151 | Oligosaccharyl transferase | Scaffold2187-size2597 | SL | SP | 4 | 25 |
| K05546 | Glycoside hydrolase family 31 | Scaffold642-size3957 | SL | SP | 5 | 12 |
| K05546 | Alpha-glucosidase ii | Scaffold1009-size3417 | ST | SP | 1 | 18 |
| K01230 | Glycoside hydrolase family 47 protein | Scaffold2558-size2445 | SL | SP | 2 | 29 |
| K01230 | Mannosyl-oligosaccharide -alpha-mannosidase ia-like | Scaffold1523-size3001 | SL | SP | 1 | 24 |
| K01230 | ER mannosyl-oligosaccharide -alpha-mannosidase | Scaffold6185-size1702 | SL | — | 5 | — |
| K09490 | Heat shock protein 70 family member | Scaffold19995-size701 | SL | SP | 1 | 16 |
| K09580 | Protein disulfide isomerase A1 | Scaffold9082-size1363 | SL | SP | 2 | 16 |
| K09580 | Protein disulfide-isomerase A1 | Scaffold5515-size1797 | SL | SP | 1 | 28 |
| K09582 | Protein disulfide-isomerase A4-like | Scaffold3643-size2130 | SL | SP | 3 | 14 |
| K09583 | Protein disulfide-isomerase A5-like | Scaffold7220-size1567 | SL | SP | 1 | 18 |
| K09584 | Protein disulfide-isomerase A6 | Scaffold2938-size2323 | SL | SP | 1 | 19 |
| K09586 | ER resident protein 29 | Scaffold12514-size1077 | SL | SP | 2 | 16 |
Predictions were only made for protein sequences for which the full-length N terminus was obtained. The table lists the Kyoto Encyclopedia for Genes and Genomes orthology number (KO); annotation; transcript name; presence of a spliced leader sequence (SL), or a stop codon upstream of the start methionine (ST) at the 5′ end of the transcript as evidence for a full-length protein sequence; final TargetP prediction (M, mitochondrion and SP, secretory pathway); TargetP reliability class (RC) of prediction from 1 to 5 (with 1 indicating the strongest prediction); and length of the predicted presequence (TPlen).
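The reliability-class cutoff used in the figures (mTP and SP predictions with RC <3) amounts to a simple filter over rows such as those in Table S1. The sketch below applies that filter to five rows copied from the table; the data structure and function name are hypothetical, and `None` stands for the dash shown where no mTP or SP was predicted.

```python
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    annotation: str
    transcript: str
    pred: Optional[str]   # "M", "SP", or None (dash in Table S1)
    rc: int               # TargetP reliability class, 1 = strongest


# Five rows taken from Table S1.
ROWS = [
    Prediction("Phosphoenolpyruvate carboxykinase", "Scaffold1715-size2867", None, 4),
    Prediction("Citrate synthase", "Scaffold5561-size1790", "M", 1),
    Prediction("Succinate dehydrogenase subunit 2", "Scaffold11955-size1120", "M", 3),
    Prediction("Oligosaccharyl transferase", "Scaffold2187-size2597", "SP", 4),
    Prediction("Heat shock protein 70 family member", "Scaffold19995-size701", "SP", 1),
]


def confident_targeting(rows, max_rc=2):
    """Keep mTP/SP predictions with reliability class below 3."""
    return [r for r in rows if r.pred in ("M", "SP") and r.rc <= max_rc]


for row in confident_targeting(ROWS):
    print(row.annotation, row.pred, row.rc)
```

With these rows, only citrate synthase (M, RC 1) and the heat shock protein 70 family member (SP, RC 1) pass the cutoff.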
Presymbiotic Interbacterial HGT vs. Postsymbiotic Bacteria to Eukaryote HGT.
Eukaryotic genomes are widely known to contain many genes of bacterial origin (28) that are usually attributed to mitochondrion and plastid endosymbiosis. The diverse phylogenetic origins of these bacterial genes are explained by the fluid genome composition of prokaryotes (8) that resulted in chimeric (presymbiotic) genomes in the donor lineages (29, 30). The observed bursts in HGT frequency that coincide with organelle acquisition in eukaryotes support this interpretation (8). Is this the case in P. chromatophora? Do the many bacterial gene transfers we observed have their origins in a highly chimeric α-cyanobacterial genome of the endosymbiont? Alternatively, did these foreign genes arise via EGT from the existing mitochondrial endosymbiont? The second explanation can be largely excluded on two counts: (i) The P. chromatophora HGT candidates are not found in other eukaryotes, all of which share the same mitochondrion, and (ii) many of these HGTs seem to specifically fill gaps in chromatophore pathways. Therefore, it is not reasonable to assume that a mitochondrion-derived murF gene was maintained over hundreds of millions of years even though there was no need for peptidoglycan synthesis.
To evaluate the first, more intriguing, scenario we used phylogenomics to determine how many of the 867 protein-coding genes still retained on the chromatophore genome had an HGT (i.e., non-α-cyanobacterial) origin. This analysis demonstrated that 848/867 (97.8%) of the chromatophore-encoded genes are placed unambiguously as sister to, or nested within, the α-cyanobacteria group and therefore are not the result of interphylum HGT. There is a single gene (PCC_0175, a YGGT family membrane protein) for which a noncyanobacterial origin is supported by a bootstrap value ≥80%. This implies that if lineage-specific HGT-derived genes were present in the chromatophore ancestor, they primarily had nonessential functions that did not survive endosymbiosis. Consistent with these findings is the observation that although cyanobacterial genomes are well known to undergo frequent HGTs (29, 31) (i) HGT rates in the Prochlorococcus/Synechococcus clade are the lowest among cyanobacteria (29) and (ii) a detailed comparative genomic study of Prochlorococcus spp. and marine Synechococcus spp. revealed a core set of 1,273 genes present in 12 Prochlorococcus species (32). Genes in the core genome encode essential functions including enzymes involved in central carbon metabolism and amino acid and chlorophyll biosynthesis. The larger, less widely distributed component of this pan-genome encodes functions that may relate to niche specificity and that are nonessential under optimal growth conditions. Genes such as argB, murF, polA, cysE, and aroE (discussed here; Fig. 1) are part of the core genome and are present in all 12 Prochlorococcus and 4 Synechococcus strains analyzed (32). In addition, the first β-cyanobacterium branching outside of the α-cyanobacteria, Synechococcus elongatus, contains the cyanobacterial version of these genes (Fig. 2E and Fig. S8), suggesting that the ancestor of the chromatophore also encoded cyanobacterial homologs of these genes. 
These results support our hypothesis that the P. chromatophora host cell acquired the many bacterial genes that we identified primarily through postendosymbiotic HGT, and not EGT from a highly chimeric endosymbiont genome. Finally, we note that insects such as mealybugs that harbor nutritional, bacterial endosymbionts with highly reduced genomes have also gained bacterial genes through HGT. Similar to the situation observed for P. chromatophora, the insect HGT-derived genes seem to complement functions lost from the symbiont genome (33).
Fig. S8.
(A–D) Synapomorphies in cyanobacterial and in P. chromatophora/bacterial proteins. Portion of amino acid alignments of various cyanobacterial genes with HGT-derived P. chromatophora genes and closely related bacterial gene versions. The tree (left) represents “species” phylogeny based on the ribosomal operon. The lineages are marked as follows: green, S. elongatus; pink, the marine Synechococcus clade; blue, the Prochlorococcus clade; orange, the Cyanobium clade; and red, P. chromatophora. Sequences are arranged according to their position in the rDNA phylogeny. Note the numerous synapomorphies across cyanobacterial gene versions on the one hand and the P. chromatophora gene with bacterial sequences on the other hand.
Intermediates in the Replacement Process.
To further test the hypothesis that HGT into the nuclear genome can replace chromatophore genes, we searched for potential intermediates in the replacement process and found full-length chromatophore-encoded genes with bacterial homologs present in the nuclear genome (both copies transcribed). Examples are the shikimate dehydrogenase AroE (Figs. 1E and 2 D and E), an inositol monophosphatase, and the elongation factor leader peptidase A (LepA) (Dataset S1). This potential intermediate replacement state was also identified for EGT-derived genes (15). Once the introduced nuclear gene attains targeting capabilities, the copy of the gene fixed or lost in each case of “gene duplication” via HGT or EGT cannot be predicted with confidence. However, the gene transfer ratchet model described by Doolittle (19) (and our data) predicts that over evolutionary time an increasing number of organelle genes will be lost in favor of nuclear copies. Additional genome data from phagotrophic Paulinella species will provide insights into which HGT-derived genes predate endosymbiosis and which may be associated with organelle evolution.
Conclusion
Whereas most eukaryotic genes are vertically inherited, evidence is accumulating for widespread HGT in eukaryotes that is tied to adaptation (28). The uptake of a bacterial endosymbiont represents a profound change in lifestyle that requires recalibration of the host genetic repertoire, a need that can be partially met via HGT. In addition, the impact of Muller’s ratchet that leads to endosymbiont genome reduction seems to drive the fixation of horizontally acquired “compensatory” bacterial genes in the host genome. Thus, similar to EGT, HGT-derived genes may facilitate integration of the endosymbiont by providing the host with transcriptional/translational control over chromatophore metabolic functions, metabolite fluxes between the cytoplasm and chromatophore, and the processing of genetic information. Therefore, like EGT, HGT establishes key connections that enable the host to coordinate host–chromatophore metabolism, growth, and proliferation. We hypothesize that in P. chromatophora phagotrophy was initially maintained during chromatophore integration (Fig. 3), with the mixotrophic lifestyle setting the stage for a gene transfer ratchet that facilitated organelle integration by enabling replacement of chromatophore genes with genes derived from either EGT or HGT, in addition to the repurposing of host-derived genes. This is consistent with the observed bursts in HGT frequencies coincident with plastid and mitochondrion acquisition (8).
Fig. 3.
Evolution of phototrophy from a phagotrophic ancestor in the Paulinella clade. In step 1a a mixotrophic cell evolved by maintaining an α-cyanobacterial endosymbiont and exploiting its photosynthetic ability. Over time, the host targeted proteins to the symbiont and inserted membrane transporters to gain control over symbiont growth and division, leading to vertical inheritance of the nascent organelle. Step 1b indicates heterotrophic Paulinella species that did not acquire permanent endosymbionts. In step 2 efficient metabolite exchange led to loss of phagotrophy and relaxed functional constraint on many chromatophore genes, leading to massive chromatophore genome reduction. Colored sections represent HGT (multicolor) and EGT (green) components of the nuclear genome; arrow thickness represents prevalence of the particular gene transfer type during different evolutionary stages.
Materials and Methods
Cultivation of P. chromatophora and Generation of Axenic Culture.
P. chromatophora CCAC0185 was grown as described previously (17). To generate an axenic culture P. chromatophora cells were sprayed onto nutrient agar plates. The axenic culture was obtained from a single bacteria-free P. chromatophora cell recovered from these plates (for details see SI Materials and Methods).
Generation of Sequencing Libraries and Assemblies.
Genomic DNA (gDNA) and cDNA derived from axenic P. chromatophora cultures were subjected to Illumina and Nextera library generation and sequencing, resulting in 147.4 Gbp gDNA and 4.9 Gbp cDNA raw sequence data. Genome and transcriptome assemblies were generated as detailed in SI Materials and Methods. All sequence and assembly data generated in this project can be accessed via NCBI BioProject PRJNA311736.
Estimation of Genome Size.
Mapping the Illumina HiSeq data against gDNA contigs that encode a complete or partial CEGMA core eukaryote protein (n = 78) resulted in an average coverage of 10.05×. With a total amount of 96.2 Gbp of HiSeq data, we arrived at a genome size estimation of 9.57 Gbp. Mapping the MiSeq data separately (1.81× average coverage of contigs; 17.4 Gbp of data) yields an extremely close estimate of 9.61 Gbp. Estimation of the genome size through k-mer counts arrived at a similar result (11.45 Gbp). A histogram of k-mer frequency (number of times a particular 31-mer was observed) vs. probability approaches an exponential distribution (Fig. S1), as opposed to an expected normal distribution. This pattern indicates that despite generating 113.6 Gbp of genome data, our sequence coverage was derived predominantly from unique or nonoverlapping DNA amplicons and evidences low-coverage sequencing of an extremely large genome. For more detail see SI Materials and Methods.
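The coverage-based estimate described above amounts to dividing the total amount of sequence data by the mean read depth over single-copy core genes. A minimal Python sketch of that arithmetic, using only the figures reported in this section; the function name is illustrative and is not part of the original analysis:

```python
def estimate_genome_size_gbp(total_data_gbp, mean_coverage):
    """Genome size (Gbp) = total sequenced bases (Gbp) / mean depth of coverage."""
    if mean_coverage <= 0:
        raise ValueError("mean coverage must be positive")
    return total_data_gbp / mean_coverage


# Figures reported in the text: HiSeq (96.2 Gbp at 10.05x), MiSeq (17.4 Gbp at 1.81x).
hiseq_estimate = estimate_genome_size_gbp(96.2, 10.05)
miseq_estimate = estimate_genome_size_gbp(17.4, 1.81)
print(f"HiSeq: {hiseq_estimate:.2f} Gbp; MiSeq: {miseq_estimate:.2f} Gbp")
```

The two platforms give nearly identical values because each estimate depends only on the ratio of data volume to observed depth, not on the platform itself.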
Phylogenomic Pipeline and Screening for HGT from Bacteria.
In brief, an initial phylogenomic analysis was performed as follows. Predicted proteins were queried via BLASTp (e-value ≤ 1 × 10⁻⁵) (34) against a local protein database. From the results, a maximum of 12 species from each taxonomic phylum was selected in descending order of BLAST bit score, to a maximum of 150 total species, and the respective sequences were aligned. Maximum-likelihood phylogenies were generated using RAxML v. 8.2 (35) with 100 bootstrap replicates under the LG+G model. The resulting trees were screened for P. chromatophora + prokaryote monophyly with bootstrap support of ≥70% or for trees containing, besides the P. chromatophora sequence, sequences solely from prokaryotes. After this initial screening, contigs of potential bacterial origin were manually curated. Curated alignments were then subjected to a second phylogenetic analysis using IQTREE (36) with 2,000 ultrafast bootstrap replicates and automatic model selection. Phylogenetic trees and protein alignments are available at cyanophora.rutgers.edu/paulinella. For more detail see SI Materials and Methods.
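The taxon-sampling rule used before alignment (at most 12 species per phylum, taken in descending bit score order, up to 150 species in total) can be sketched in Python as follows. This is an illustration under stated assumptions, not the in-house code: the function name and the tuple layout of the hits are hypothetical, and the sketch assumes that only the best-scoring hit per species is retained.

```python
def select_hits(hits, max_per_phylum=12, max_species=150):
    """Pick BLASTp hits for alignment.

    hits: iterable of (species, phylum, bitscore) tuples (hypothetical layout).
    Returns the selected tuples in descending bit score order.
    """
    chosen = []
    seen_species = set()
    per_phylum = {}
    for species, phylum, bitscore in sorted(hits, key=lambda h: h[2], reverse=True):
        if species in seen_species:
            continue  # assumption: keep only the best-scoring hit per species
        if per_phylum.get(phylum, 0) >= max_per_phylum:
            continue  # this phylum has already reached its quota
        chosen.append((species, phylum, bitscore))
        seen_species.add(species)
        per_phylum[phylum] = per_phylum.get(phylum, 0) + 1
        if len(chosen) >= max_species:
            break  # overall cap on the number of species
    return chosen
```

Capping each phylum keeps a heavily sequenced group from crowding out the rest of the tree, which matters when the goal is to tell bacterial from eukaryotic affinities.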
SI Materials and Methods
Generation of Axenic Culture.
To generate a bacteria-free (axenic) culture, late log-phase cells were washed four times in sterile Waris-H culture medium. Single cells were then sprayed onto the surface of solid Waris-H medium containing 1% bacterial standard medium (0.8 g bactopeptone, 0.1 g glucose, 0.1 g meat extract, and 0.1 g yeast extract per 100 mL) solidified with 1% agar. Although P. chromatophora cannot grow on solid medium, cells survive for several days on the nutrient agar plates. After 3–5 d on the solid medium at 17 °C and a light intensity of 5 µE⋅m⁻²⋅s⁻¹, bacterial growth was visualized by the formation of whitish halos surrounding nearly all of the P. chromatophora cells. Bacteria-free single cells (halo absent) were picked from the solid medium with a microcapillary and each cell was transferred into 0.75 mL of Waris-H medium supplemented with 1.5 mM Na₂SiO₃ and a threefold concentration of vitamins and soil extract. The axenic state of cultures was regularly checked microscopically as well as by bacterial growth assays (streaking an aliquot of culture on nutrient agar plates as described above and turbidity tests in 7 mL liquid growth medium supplemented with bacterial standard medium at a 1:10 and 1:100 dilution). Furthermore, total gDNA was extracted from the culture and a fragment of the 16S rDNA (1.5 kbp in length) was amplified by PCR using universal bacterial primers [SG1_baci and SG2_baci (37)]. This PCR product was ligated into the pJET1.2 vector (Thermo Fisher Scientific) and cloned in Escherichia coli, and 20 clones were Sanger-sequenced using the pJET1.2 forward sequencing primer (Thermo Fisher Scientific). All 20 sequences recovered were identical to the P. chromatophora chromatophore 16S rDNA nucleotide sequence.
Generation of DNA Libraries and Illumina Sequencing.
Cells from 1- to 3-wk-old axenic P. chromatophora cultures were pooled, harvested by centrifugation, and washed four times with sterile medium at 4 °C. Pellets were snap-frozen in liquid nitrogen for nucleic acid extraction. TRIzol-extracted total RNA was used to generate a cDNA sequencing library using the TruSeq RNA Sample Preparation Kit (Illumina) with adaptor AR019. This library was sequenced in a single MiSeq run resulting in (2×) 16.3 M 150 bp paired-end reads totaling 4.9 Gbp [sequence read archive (SRA) accession no. SRX1624577].
Chloroform-extracted total gDNA was used to generate 400- to 500-bp and 500- to 600-bp insert size sequencing libraries using the TruSeq DNA Sample Preparation Kit (Illumina), as well as 5- to 6-kbp and 7- to 10-kbp Mate-Pair sequencing libraries using the Gel-Plus Protocol of the Nextera Mate-Pair Sample Preparation Kit (Illumina). The different gDNA libraries were sequenced as follows: (i) 400–500 bp insert library, sequenced in a single MiSeq run resulting in (2×) 16.7 M 250 bp paired-end reads, totaling 8.4 Gbp (SRA accession no. SRX1624478); (ii) 500–600 bp insert library, sequenced in a single MiSeq v2 run resulting in (2×) 19.7 M 325 bp paired-end reads totaling 12.8 Gbp (SRA accession no. SRX1624478); (iii) Mate-Pair 7–10 kbp library, sequenced in a single MiSeq run resulting in (2×) 4.3 M 75 bp paired-end reads totaling 0.32 Gbp; (iv) Mate-Pair 5–6 kbp sequenced in a MiSeq run, resulting in (2×) 11.7 M 75 bp paired-end reads totaling 0.88 Gbp; and (v) 400–500 bp insert library, sequenced in two HiSeq2000 lanes resulting in (2×) 250 M 250 bp paired-end reads totaling 125 Gbp (SRA accession no. SRX1624515).
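As a quick consistency check, the per-library totals listed above can be summed and compared with the 147.4 Gbp of raw gDNA data stated in Materials and Methods. A short Python sketch; the dictionary keys are descriptive labels chosen here, not identifiers from the original work:

```python
# Raw data volume per gDNA library, in Gbp, as reported in the text.
gdna_yield_gbp = {
    "400-500 bp insert, MiSeq": 8.4,
    "500-600 bp insert, MiSeq v2": 12.8,
    "Mate-Pair 7-10 kbp, MiSeq": 0.32,
    "Mate-Pair 5-6 kbp, MiSeq": 0.88,
    "400-500 bp insert, HiSeq2000": 125.0,
}

total_gbp = sum(gdna_yield_gbp.values())
print(f"Total raw gDNA data: {total_gbp:.1f} Gbp")
```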
Genome and Transcriptome Assembly and Annotation.
Raw genome sequencing reads were preprocessed with TreQ-CG (38), a utility that decreases volume and complexity of sequencing data by first clustering reads and then generating a consensus sequence that represents an elongated, error-corrected consensus read. The consensus reads were then adapter-trimmed with cutadapt (39) and assembled with ABySS v1.5 (40). The resulting assembly consisted of 1,177,125 contigs >100 bp with an N50 of 711 bp totaling 741 Mbp. The extremely fragmented nature of this assembly is explained by the large genome size of the amoeba (discussed below).
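The N50 quoted for this assembly is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch of the standard calculation; this is not taken from ABySS or from the authors' scripts, and the function name is illustrative:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length  # first contig at which half of all bases is reached
    raise ValueError("no contigs supplied")


# Toy example: 20 bp in total, half is 10 bp; 6 + 5 = 11 >= 10, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

An N50 of 711 bp across more than a million contigs therefore means that half of the 741 Mbp assembly sits in contigs no longer than a typical paired-end fragment, which is what one would expect from low-coverage sequencing of a very large genome.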
Transcriptome sequence data were quality- and adapter-trimmed using the CLC Genomics Workbench (Qiagen; quality trimmer heuristic limit cutoff = 0.05). The forward and reverse read sets from the paired-end sequencing were assembled independently of one another using the CLC Genomics Workbench de novo assembler, and these two assemblies were then merged and clustered/subassembled using the TGICL (41) pipeline and CAP3 assembler (42). The resulting assembly was placed onto scaffolds using SSPACE (43) and the paired-end read information. The final EST assembly comprised 78,140 contigs with an N50 of 1.1 kbp and summed to 49.5 Mbp. Of the 29.5 million paired reads that were mapped by SSPACE, 26.9 million mapped as pairs to the same EST. Proteins were predicted from the cDNA contigs using TransDecoder (https://transdecoder.github.io/) and the putative top-hit annotations/functions were determined using BLAST tools. The extensive database (and knowledge) of conserved prokaryotic gene functions allowed us to confidently annotate many of the HGTs described in our work. All sequence and assembly data generated in this project can be accessed via NCBI BioProject PRJNA311736. The mitochondrial genome was annotated using a combination of MITOS (44) and ORF identification/BLASTp against the NCBI nr database.
Estimation of Genome Size.
Genome size was estimated using a combination of Illumina sequence coverage depth of core single-copy eukaryote genes and k-mer counts. First, we isolated genomic DNA contigs that were found to encode a complete or partial CEGMA core eukaryote protein (n = 78) and mapped the Illumina HiSeq data to these contigs using the CLC Genomics Workbench (alignment stringency of 95% similarity). Using the average coverage of this mapping (10.05×) and the total amount of HiSeq data generated (96.2 Gbp), we arrived at a genome size estimation of 9.57 Gbp. Mapping the MiSeq data separately (1.81× average coverage of contigs; 17.4 Gbp of data) yields an extremely close estimate of 9.61 Gbp. Additionally, we counted the number of frequent (i.e., occurring more than once and thus not a product of sequencing error) k-mers in our HiSeq data using Turtle (45). The number of frequent 31-mers identified in our Paulinella data (n = 2.38 billion) corresponds to 85% (by number) of those derived from a 3.3 Gbp human genome library (n = 2.8 billion) at 42.2× coverage as in ref. 45. Our results would then be roughly commensurate with ca. 34.9× (at 85%) coverage of the human genome, or alternatively, 10.05× coverage (using the HiSeq coverage estimate above) of an 11.45-Gbp genome. A histogram of k-mer frequency (number of times a particular 31-mer was observed) vs. probability approaches an exponential distribution (Fig. S1), as opposed to an expected normal distribution. This pattern indicates that despite generating 113.6 Gbp of genome data, our sequence coverage was derived predominantly from unique or nonoverlapping DNA amplicons and evidences low-coverage sequencing of an extremely large genome. Our informatic assessments thus estimate the P. chromatophora strain CCAC0185 nuclear genome to be between 9.57–11.45 Gbp.
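The k-mer part of this analysis rests on two quantities: the set of "frequent" k-mers (those seen more than once) and the histogram of how many distinct k-mers occur once, twice, and so on. The Python sketch below shows both on toy data. It is a naive in-memory counter, not the cache-efficient algorithm implemented in Turtle, and it ignores reverse complements; the function names are illustrative.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(counts, min_count=2):
    """k-mers seen at least min_count times (i.e., unlikely to be sequencing errors)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


def frequency_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


# Toy example with k = 3 (the analysis in the text used 31-mers).
counts = count_kmers(["ACGTA", "CGTTT"], k=3)
print(frequent_kmers(counts))
print(sorted(frequency_histogram(counts).items()))
```

With deep coverage the histogram shows a peak near the sequencing depth; when most k-mers are seen only once or twice, as reported here, the histogram instead decays monotonically, which is the pattern interpreted above as low-coverage sequencing of a very large genome.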
Testing the Level of Bacterial Contamination in the Sequence Data.
To test the level of bacterial contamination in our transcriptome assembly (introduced during sample preparation and/or sequencing steps), we blasted the E. coli 16S rDNA (BLASTn, e-value cutoff of 1 × 10⁻⁵) against our transcriptome dataset. We recovered 43 cDNA contigs that were not chromatophore-derived. The quality- and adapter-trimmed RNAseq data were then mapped (85% nt similarity over the entire read length) to a set of contigs composed of (i) the 536,242 bacterial + archaeal 16S rDNA sequences contained in the SILVA rRNA database release 123.1 (46), (ii) the 43 P. chromatophora cDNA contigs obtained from BLASTn using the E. coli 16S rDNA sequence as a query, and (iii) the chromatophore nucleotide genome sequence (13). A total of 66,674 of the 32.3 M reads (0.2%) mapped to contigs other than the chromatophore genome; these reads were then assembled using the CLC Genomics Workbench de novo assembler to 49 contigs, of which 27 were composed of two or more reads. One such contig in particular was 1,353 bp in length, contained 42,979 of the 66,674 reads, and had a top BLASTn (NCBI nr database) hit to the P. chromatophora chromatophore (it is likely that these reads had nonspecific or “equally good” hits to other bacterial rDNAs during the read mapping step and thus were not excluded as chromatophore-derived). Of the remaining 26 contigs, 3 had additional hits to P. chromatophora “16S-like” sequence (as denoted in NCBI) and the remaining 23 had hits to various bacteria, predominantly Pseudomonas (8 contigs). Thus, we estimate that <23,695 reads of the 32.3 M reads (<0.07%) are likely to be derived from bacterial 16S rDNA sources.
The same approach was used to determine bacterial contamination in the P. chromatophora nuclear genome libraries. The 38.8 M trimmed reads from gDNA library #2 (MiSeq, discussed above) were mapped to the reference contigs as above (85% nt similarity over entire read length). A total of 7,001 of the 38.7 M reads had an alignment (∼0.02%) to contigs other than the chromatophore genome and assembled to 10 contigs, of which 9 were composed of two or more reads. BLASTn analysis identified three contigs as P. chromatophora 16S rDNA and the remaining six contigs were of eukaryotic provenance (predominantly Schizosaccharomyces 18S, 40S, and 26S rDNAs). Finally, 2,752 of the 102.9 M trimmed reads from run1 of gDNA library #5 (HiSeq, discussed above) mapped to the SILVA prokaryote 16S database (85% nt similarity over entire read). These assembled to four contigs, two of which were composed of two or more reads. Both of those contigs were P. chromatophora 16S rDNA. Thus, we do not find evidence of bacterial contamination in the genome data.
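The percentages quoted in the two paragraphs above follow directly from the read counts. A small Python sketch of the bookkeeping, using only numbers given in the text; the variable and function names are illustrative:

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Transcriptome: reads mapping to contigs other than the chromatophore genome.
rna_total_reads = 32_300_000
rna_non_chromatophore = 66_674
rna_chromatophore_like = 42_979  # reads in the 1,353-bp contig with a chromatophore top hit

# Upper bound on reads that could derive from bacterial 16S rDNA.
rna_residual = rna_non_chromatophore - rna_chromatophore_like

print(f"{percent(rna_non_chromatophore, rna_total_reads):.1f}% mapped off-chromatophore")
print(f"{rna_residual} reads remain, {percent(rna_residual, rna_total_reads):.2f}% of the total")

# gDNA library #2 (MiSeq): 7,001 of 38.7 M reads aligned off-chromatophore.
print(f"{percent(7_001, 38_700_000):.2f}% of gDNA library #2 reads")
```

The residual count is an upper bound because three of the remaining transcriptome contigs also matched P. chromatophora "16S-like" sequence rather than other bacteria.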
Phylogenomic Pipeline and Screening for HGT from Bacteria.
An initial phylogenomic analysis was performed as follows. Predicted proteins were queried via BLASTp (e-value ≤ 1 × 10⁻⁵) (34) against a local protein database composed of NCBI RefSeq release 59 with the addition of sequenced eukaryote (e.g., fungal, metazoan, Viridiplantae, and stramenopile) genomes from the Joint Genome Institute (jgi.doe.gov/), TBestDB (47), and six-frame translated eukaryote EST sequences obtained from the NCBI dbEST (48). From the results, a maximum of 12 species from each taxonomic phylum was selected in descending order of BLAST bit score (to a maximum of 150 total species), and the respective protein sequences were aligned using MAFFT v. 7.2 (49; linsi option). Maximum-likelihood phylogenies were generated using RAxML v. 8.2 (35) with 100 bootstrap replicates under the LG+G model. Using in-house Perl scripts, the resulting trees were screened for P. chromatophora + prokaryote monophyly with bootstrap support of ≥70% or for trees containing, besides the P. chromatophora sequence, solely prokaryotes.
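The two screening criteria in the last sentence can be expressed compactly. The Python sketch below is a simplified stand-in for the in-house Perl scripts, which are not reproduced here. It assumes that each tree has already been reduced to a list of clades (leaf sets with bootstrap values) taken from a rooted representation; the function name, argument layout, and leaf labels are all hypothetical.

```python
def is_hgt_candidate(clades, all_leaves, query, is_prokaryote, min_support=70):
    """Return True if a tree passes either screening criterion.

    clades: list of (set_of_leaf_names, bootstrap_support) for internal branches.
    all_leaves: set of every leaf name in the tree, including the query.
    is_prokaryote: dict mapping each non-query leaf name to True or False.
    """
    others = all_leaves - {query}
    # Criterion 2: apart from the query, the tree contains only prokaryotes.
    if others and all(is_prokaryote[leaf] for leaf in others):
        return True
    # Criterion 1: a well-supported clade uniting the query with prokaryotes only.
    for leaves, support in clades:
        partners = leaves - {query}
        if (query in leaves and partners and leaves != all_leaves
                and support >= min_support
                and all(is_prokaryote[leaf] for leaf in partners)):
            return True
    return False


leaves = {"Pchr", "bactA", "bactB", "euk1", "euk2"}
domains = {"bactA": True, "bactB": True, "euk1": False, "euk2": False}
clades = [({"Pchr", "bactA", "bactB"}, 85), ({"euk1", "euk2"}, 90)]
print(is_hgt_candidate(clades, leaves, "Pchr", domains))
```

Trees that pass such an automated filter are only candidates; as described next, each one was curated manually before the second round of phylogenetic analysis.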
After this initial screening, contigs of potential bacterial origin were manually curated. Contigs encoding proteins (i) with a very low overall level of sequence conservation, (ii) with similarities dominated by a protein domain rather than overall protein coverage (different domains often aligning with different protein sequences), and (iii) representing very short segments of a protein were eliminated, unless additional contigs representing pieces of the same gene were identified. To identify contigs that represent pieces of the same gene or multiple copies of the same gene that have diverged in sequence, the closest bacterial sequence for a given contig was blasted back to the P. chromatophora transcriptome (tBLASTn). Contigs identified were integrated in the respective alignment unless they were of eukaryotic or chromatophore origin. Curated alignments were then subjected to a second phylogenetic analysis using IQTREE (36) with 2,000 ultrafast bootstrap replicates and automatic model selection. Phylogenetic trees and protein alignments are available at cyanophora.rutgers.edu/paulinella.
Acknowledgments
We thank John Coller and Ji Xuhuai (Stanford Functional Genomics Facility), Jane Grimwood and Jeremy Schmutz (HudsonAlpha) for the generation of Illumina sequence data, Karl Forchhammer for advice on the Arg biosynthetic pathway, and Rajat Roy for advice regarding genome size estimation. This study was supported by National Science Foundation Grants MCB-10370 (to A.R.G.) and EF 08-27023 and OCE 11-29203 (to D.B.) and Deutsche Forschungsgemeinschaft Grant NO 9010/1-1 (to E.C.M.N.).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. J.M.A. is a Guest Editor invited by the Editorial Board.
Data deposition: All sequence and assembly data generated in this project can be accessed via NCBI BioProject PRJNA311736. Sequence raw data reported in this paper have been deposited in the NCBI Sequence Read Archive [accession nos. SRX1624577 (cDNA reads) and SRX1624478, SRX1624478, and SRX1624515 (gDNA reads)].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1608016113/-/DCSupplemental.
References
- 1. Falkowski PG, et al. The evolution of modern eukaryotic phytoplankton. Science. 2004;305(5682):354–360. doi: 10.1126/science.1095964.
- 2. Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol. 2004;21(5):809–818. doi: 10.1093/molbev/msh075.
- 3. Gould SB, Waller RF, McFadden GI. Plastid evolution. Annu Rev Plant Biol. 2008;59:491–517. doi: 10.1146/annurev.arplant.59.032607.092915.
- 4. Martin W, Herrmann RG. Gene transfer from organelles to the nucleus: How much, what happens, and Why? Plant Physiol. 1998;118(1):9–17. doi: 10.1104/pp.118.1.9.
- 5. Schleiff E, Becker T. Common ground for protein translocation: Access control for mitochondria and chloroplasts. Nat Rev Mol Cell Biol. 2011;12(1):48–59. doi: 10.1038/nrm3027.
- 6. Ball SG, et al. Metabolic effectors secreted by bacterial pathogens: Essential facilitators of plastid endosymbiosis? Plant Cell. 2013;25(1):7–21. doi: 10.1105/tpc.112.101329.
- 7. Archibald JM. Evolution: Gene transfer in complex cells. Nature. 2015;524(7566):423–424. doi: 10.1038/nature15205.
- 8. Ku C, et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature. 2015;524(7566):427–432. doi: 10.1038/nature14963.
- 9. Marin B, Nowack ECM, Melkonian M. A plastid in the making: Evidence for a second primary endosymbiosis. Protist. 2005;156(4):425–432. doi: 10.1016/j.protis.2005.09.001.
- 10. Nowack ECM. Paulinella chromatophora – Rethinking the transition from endosymbiont to organelle. Acta Soc Bot Pol. 2014;83(4):387–397.
- 11. Kies L. [Electron microscopical investigations on Paulinella chromatophora Lauterborn, a thecamoeba containing blue-green endosymbionts (Cyanelles) (author’s transl)]. Protoplasma. 1974;80(1):69–89. doi: 10.1007/BF01666352.
- 12. Kies L, Kremer BP. Function of cyanelles in the thecamoeba Paulinella chromatophora. Naturwissenschaften. 1979;66(11):578–579.
- 13. Nowack ECM, Melkonian M, Glöckner G. Chromatophore genome sequence of Paulinella sheds light on acquisition of photosynthesis by eukaryotes. Curr Biol. 2008;18(6):410–418. doi: 10.1016/j.cub.2008.02.051.
- 14. Nakayama T, Ishida K. Another acquisition of a primary photosynthetic organelle is underway in Paulinella chromatophora. Curr Biol. 2009;19(7):R284–R285. doi: 10.1016/j.cub.2009.02.043.
- 15. Nowack ECM, et al. Endosymbiotic gene transfer and transcriptional regulation of transferred genes in Paulinella chromatophora. Mol Biol Evol. 2011;28(1):407–422. doi: 10.1093/molbev/msq209.
- 16. Reyes-Prieto A, et al. Differential gene retention in plastids of common recent origin. Mol Biol Evol. 2010;27(7):1530–1537. doi: 10.1093/molbev/msq032.
- 17. Nowack ECM, Grossman AR. Trafficking of protein into the recently established photosynthetic organelles of Paulinella chromatophora. Proc Natl Acad Sci USA. 2012;109(14):5340–5345. doi: 10.1073/pnas.1118800109.
- 18. Bhattacharya D, et al. Single cell genome analysis supports a link between phagotrophy and primary plastid endosymbiosis. Sci Rep. 2012;2:356. doi: 10.1038/srep00356.
- 19. Doolittle WF. You are what you eat: A gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. 1998;14(8):307–311. doi: 10.1016/s0168-9525(98)01494-2.
- 20. Bitar M, Boroni M, Macedo AM, Machado CR, Franco GR. The spliced leader trans-splicing mechanism in different organisms: Molecular details and possible biological roles. Front Genet. 2013;4:199.
- 21. Acuña R, et al. Adaptive horizontal transfer of a bacterial gene to an invasive insect pest of coffee. Proc Natl Acad Sci USA. 2012;109(11):4197–4202. doi: 10.1073/pnas.1121190109.
- 22. Boto L. Horizontal gene transfer in the acquisition of novel traits by metazoans. Proc R Soc B. 2014;281(1777):20132450. doi: 10.1098/rspb.2013.2450.
- 23. de Koning AP, Brinkman FSL, Jones SJM, Keeling PJ. Lateral gene transfer and metabolic adaptation in the human parasite Trichomonas vaginalis. Mol Biol Evol. 2000;17(11):1769–1773. doi: 10.1093/oxfordjournals.molbev.a026275.
- 24. Ropars J, et al. Adaptive horizontal gene transfers between multiple cheese-associated fungi. Curr Biol. 2015;25(19):2562–2569. doi: 10.1016/j.cub.2015.08.025.
- 25. Schönknecht G, et al. Gene transfer from bacteria and archaea facilitated evolution of an extremophilic eukaryote. Science. 2013;339(6124):1207–1210. doi: 10.1126/science.1231707.
- 26. Burillo S, Luque I, Fuentes I, Contreras A. Interactions between the nitrogen signal transduction protein PII and N-acetyl glutamate kinase in organisms that perform oxygenic photosynthesis. J Bacteriol. 2004;186(11):3346–3354. doi: 10.1128/JB.186.11.3346-3354.2004.
- 27. Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2(4):953–971. doi: 10.1038/nprot.2007.131.
- 28. Huang J. Horizontal gene transfer in eukaryotes: The weak-link model. BioEssays. 2013;35(10):868–875. doi: 10.1002/bies.201300007.
- 29. Dagan T, et al. Genomes of Stigonematalean cyanobacteria (subsection V) and the evolution of oxygenic photosynthesis from prokaryotes to plastids. Genome Biol Evol. 2013;5(1):31–44. doi: 10.1093/gbe/evs117.
- 30. Soucy SM, Huang J, Gogarten JP. Horizontal gene transfer: Building the web of life. Nat Rev Genet. 2015;16(8):472–482. doi: 10.1038/nrg3962.
- 31. Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT. Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events. Genome Res. 2006;16(9):1099–1108. doi: 10.1101/gr.5322306.
- 32. Kettler GC, et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 2007;3(12):e231. doi: 10.1371/journal.pgen.0030231.
- 33. Husnik F, et al. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell. 2013;153(7):1567–1578. doi: 10.1016/j.cell.2013.05.040.
- 34. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 35. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. doi: 10.1093/bioinformatics/btu033.
- 36. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–274. doi: 10.1093/molbev/msu300.
- 37. Hess S, Suthaus A, Melkonian M. “Candidatus Finniella” (Rickettsiales, Alphaproteobacteria), novel endosymbionts of viridiraptorid amoeboflagellates (Cercozoa, Rhizaria). Appl Environ Microbiol. 2015;82(2):659–670. doi: 10.1128/AEM.02680-15.
- 38. Mahmud P. Reduced representations for efficient analysis of genomic data. PhD thesis. Rutgers, The State University of New Jersey, New Brunswick, NJ; 2014.
- 39. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12.
- 40. Simpson JT, et al. ABySS: A parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–1123. doi: 10.1101/gr.089532.108.
- 41. Pertea G, et al. TIGR Gene Indices clustering tools (TGICL): A software system for fast clustering of large EST datasets. Bioinformatics. 2003;19(5):651–652. doi: 10.1093/bioinformatics/btg034.
- 42. Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res. 1999;9(9):868–877. doi: 10.1101/gr.9.9.868.
- 43. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27(4):578–579. doi: 10.1093/bioinformatics/btq683.
- 44. Bernt M, et al. MITOS: Improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. 2013;69(2):313–319. doi: 10.1016/j.ympev.2012.08.023.
- 45. Roy RS, Bhattacharya D, Schliep A. Turtle: Identifying frequent k-mers with cache-efficient algorithms. Bioinformatics. 2014;30(14):1950–1957. doi: 10.1093/bioinformatics/btu132.
- 46. Quast C, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2013;41(Database issue):D590–D596. doi: 10.1093/nar/gks1219.
- 47. O’Brien EA, et al. TBestDB: A taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res. 2007;35(Database issue):D445–D451. doi: 10.1093/nar/gkl770.
- 48. Boguski MS, Lowe TMJ, Tolstoshev CM. dbEST--Database for “expressed sequence tags”. Nat Genet. 1993;4(4):332–333. doi: 10.1038/ng0893-332.
- 49. Katoh K, Asimenos G, Toh H. Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol. 2009;537:39–64. doi: 10.1007/978-1-59745-251-9_3.