Significance
Mealybugs are plant sap-sucking insects with a nested symbiotic arrangement, where one bacterium lives inside another bacterium, which together live inside insect cells. These two bacteria, along with genes transferred from other bacteria to the insect genome, allow the insect to survive on its nutrient-poor diet. Here, we show that the innermost bacterium in this nested symbiosis was replaced several times over evolutionary history. These results show that highly integrated and interdependent symbiotic systems can experience symbiont replacement and suggest that similar dynamics could have occurred in building the mosaic metabolic pathways seen in mitochondria and plastids.
Keywords: Sodalis, organelle, horizontal gene transfer, scale insect
Abstract
Stable endosymbiosis of a bacterium into a host cell promotes cellular and genomic complexity. The mealybug Planococcus citri has two bacterial endosymbionts with an unusual nested arrangement: the γ-proteobacterium Moranella endobia lives in the cytoplasm of the β-proteobacterium Tremblaya princeps. These two bacteria, along with genes horizontally transferred from other bacteria to the P. citri genome, encode gene sets that form an interdependent metabolic patchwork. Here, we test the stability of this three-way symbiosis by sequencing host and symbiont genomes for five diverse mealybug species and find marked fluidity over evolutionary time. Although Tremblaya is the result of a single infection in the ancestor of mealybugs, the γ-proteobacterial symbionts result from multiple replacements of inferred different ages from related but distinct bacterial lineages. Our data show that symbiont replacement can happen even in the most intricate symbiotic arrangements and that preexisting horizontally transferred genes can remain stable on genomes in the face of extensive symbiont turnover.
Many organisms require intracellular bacteria for survival. The oldest and most famous example is the eukaryotic cell, which depends on mitochondria (and in photosynthetic eukaryotes, the chloroplasts or plastids) for the generation of biochemical energy (1–4). However, several more evolutionarily recent examples exist, where intracellular bacteria are involved in nutrient production from unbalanced host diets. For example, deep sea tube worms, some protists, and many sap-feeding insects are completely dependent on intracellular bacteria for essential nutrient provisioning (5–7). Some of these symbioses can form highly integrated organismal and genetic mosaics that, in many ways, resemble organelles (8–11). Like organelles, these endosymbionts have genomes encoding few genes (12, 13), rely on gene products of bacterial origin that are encoded on the host genome (9–11, 14, 15), and in some cases, import protein products encoded by these horizontally transferred genes back into the symbiont (16, 17). The names given to these bacteria—endosymbiont, protoorganelle, or bona fide organelle—are a matter of debate (18–21). What is not in doubt is that long-term interactions between hosts and essential bacteria generate highly integrated and complex symbioses.
Establishment of a nutritional endosymbiosis is beneficial for a host by allowing access to previously inaccessible food sources. However, strict dependence on intracellular bacteria can come with a cost: endosymbionts that stably associate with and provide essential functions to hosts often experience degenerative evolution (22–25). This degenerative process is thought to be driven by long-term reductions in effective population size (Ne) caused by the combined effects of asexuality [loss of most recombination and lack of new DNA through horizontal gene transfer (HGT)] and host restriction (e.g., frequent population bottlenecks at transmission in vertically transmitted bacteria) (26). The outcomes of these processes are clearly reflected in the genomes of long-term endosymbionts. These genomes are the smallest of any bacterium that is not an organelle, have among the fastest rates of evolution measured for any bacterium (12, 13), and are predicted to encode proteins and RNAs with decreased structural stability (26, 27). In symbioses where the endosymbiont is required for normal host function, such as in the bacterial endosymbionts of sap-feeding insects, this degenerative process can trap the host in a symbiotic “rabbit hole,” where it depends completely on a symbiont which is slowly degenerating (28).
Unimpeded, the natural outcome of this degenerative process would seem to be extinction of the entire symbiosis. However, extinction, if it does happen, is difficult to observe, and surely is not the only solution to dependency on a degenerating symbiont. For example, organelles are bacterial endosymbionts that have managed to survive for billions of years (2). Despite the reduced Ne of organelle genomes relative to nuclear genomes, eukaryotes are able to purge deleterious mutations that arise on organelle genomes, perhaps through a combination of host-level selection and the strong negative selective effects of substitutions on gene-dense organelle genomes (29, 30). Extant organelle genomes also encode few genes relative to most bacteria, and it is also likely that a long history of moving genes to the nuclear genome has helped slow or stop organelle degeneration (21, 31). Some of the most degenerate insect endosymbionts also seem to have adopted a gene transfer strategy, although the number of transferred genes is far smaller compared with organelles. In aphids, mealybugs, psyllids, and whiteflies, some genes related to endosymbiont function are encoded on the nuclear genome, although in most cases, these genes have been transferred from other bacteria and not the symbionts themselves (9–11, 14). Another solution to avoid host extinction is to replace the degenerating symbiont with a fresh one or supplement it with a new partner. Examples of symbiont replacement and supplementation are replete in insects, occurring in at least the sap-feeding Auchenorrhyncha (23, 32–34), psyllids (22, 35), aphids (25, 36, 37), lice (38), and weevils (39, 40). When viewed over evolutionary time, it becomes clear that endosymbioses can be dynamic—both genes and organisms come and go. It follows that any view of a symbiotic system established from just one or a few host lineages might provide only a snapshot of the complexity that built the observed relationship.
Mealybugs (Hemiptera: Coccoidea: Pseudococcidae) are a group of phloem sap-sucking insects that contain most of the symbiotic complexity described above. All of these insects depend on bacterial endosymbionts to provide them with essential amino acids missing from their diets, but nutrient provisioning is accomplished in dramatically different ways in different mealybug lineages. One subfamily, the Phenacoccinae, has a single β-proteobacterial endosymbiont called Tremblaya phenacola, which provides essential amino acids and vitamins to the host insect (9, 41). In the other subfamily of mealybugs, the Pseudococcinae, Tremblaya has been supplemented with a second bacterial endosymbiont, a γ-proteobacterium named Moranella endobia in the mealybug Planococcus citri (PCIT). Although symbiont supplementation is not uncommon, what makes this symbiosis unique is its structure: Moranella stably resides in the cytoplasm of its partner bacterial symbiont, Tremblaya princeps (42–45).
The organisms in the nested three-way P. citri symbiosis are intimately tied together at the metabolic level. T. princeps PCIT has one of the smallest bacterial genomes ever reported, totaling 139 kb in length, encoding only 120 protein-coding genes, and lacking many translation-related genes commonly found in the most extremely reduced endosymbiont genomes (42). Many metabolic genes missing in Tremblaya are present on the M. endobia PCIT genome. Together with their host insect, these two symbionts are thought to work as a “metabolic patchwork” to produce nutrients needed by all members of the consortium (42). The symbiosis in P. citri is further supported by numerous HGTs from several different bacterial donors to the insect genome, but not from Tremblaya or Moranella. These genes are up-regulated in the insect's symbiotic tissue (the bacteriome) and fill in many of the remaining metabolic gaps inferred from the bacterial endosymbiont genomes (9).
Other data suggest additional complexity in the mealybug symbiosis. Phylogenetic analyses of the intra-Tremblaya endosymbionts show that, although different lineages of mealybugs in the Pseudococcinae all possess γ-proteobacterial endosymbionts related to Sodalis, these bacteria do not show the coevolutionary patterns typical of many long-term endosymbionts (43, 44, 46). Developmental studies suggest that Tremblaya and its resident γ-proteobacteria can be differentially regulated by the host (44, 47). These data raise the possibility that the innermost bacterium of this symbiosis is labile and may have resulted from separate acquisitions, or that the original intra-Tremblaya symbiont has been replaced in different mealybug lineages. What is not clear is when these acquisitions may have occurred and what effect they have had on the symbiosis. Here, we use host and symbiont genome sequencing from seven mealybug species (five generated for this study) to better understand how complex interdependent symbioses may develop over time in the context of gene and organism acquisition and loss.
Results
Overview of Our Sequencing Efforts.
We generated genome data for five diverse Pseudococcinae mealybug species, in total closing nine symbiont genomes into single circular-mapping molecules (five genomes from Tremblaya and four from the Sodalis-allied γ-proteobacterial symbionts) (Table 1). Unexpectedly, we detected γ-proteobacterial symbionts in Maconellicoccus hirsutus (MHIR), which was not previously reported to harbor intrabacterial symbionts inside Tremblaya cells (Figs. 1–3 and Fig. S1). We also found that Pseudococcus longispinus (PLON) harbored two γ-proteobacterial symbionts, each with a complex genome larger than 4 Mbp; these genomes were left as a combined draft assembly of 231 contigs with a total size of 8,191,698 bp and an N50 of 82.6 kbp (Table 1).
Table 1.
Genome statistics for mealybug endosymbionts and draft mealybug genomes
| Mealybug species | P. avenae | M. hirsutus | F. virgata | P. citri | P. longispinus | T. perrisii | P. marginatus |
| Mealybug abbreviation | PAVE | MHIR | FVIR | PCIT | PLON | TPER | PMAR |
| Total assembly size (bp) | NA | 163,044,544 | 304,570,832 | 377,829,872 | 284,990,201 | 237,582,518 | 191,208,351 |
| Total no. of scaffolds | NA | 12,889 | 32,723 | 167,514 | 66,857 | 80,386 | 60,102 |
| N50 / N75 | NA | 47,025 / 22,300 | 25,562 / 12,551 | 7,078 / 3,639 | 10,126 / 4,908 | 4,681 / 2,689 | 6,799 / 3,788 |
| BUSCOs Arthropoda (n=2,675) | NA | 76% | 76% | 71% | 70% | 66% | 72% |
| BUSCOs Eukaryota (n=429) | NA | 85% | 84% | 80% | 78% | 77% | 82% |
| CEGMA (n=248; including partial) | NA | 99.19% | 97.98% | 98.79% | 98.39% | 99.6% | 98.79% |
| Tremblaya symbiont | T. phenacola | T. princeps | T. princeps | T. princeps | T. princeps | T. princeps | T. princeps |
| Genome size (plasmid size if present) | 170,756 bp (744 bp) | 138,415 bp | 141,620 bp | 138,927 bp | 144,042 bp | 143,340 bp | 140,306 bp |
| Average fragment coverage | NA (454 data) | 795 | 663 | 374 | 1,326 | 2,364 | 787 |
| G + C (%) | 42.2 | 61.8 | 58.3 | 58.8 | 58.9 | 57.8 | 58.3 |
| CDS (pseudogenes) | 178 (3) | 136 (7) | 132 (13) | 125 (16) | 134 (15) | 116 (31) | 124 (17) |
| CDS coding density (%) | 86.3 | 77.2 | 69.3 | 66.0 | 70.7 | 59.2 | 67.0 |
| rRNAs / tRNAs / ncRNAs | 4 / 31 / 3 | 6 / 14 / 3 | 6 / 14 / 3 | 6 / 10 / 3 | 6 / 16 / 3 | 6 / 12 / 3 | 6 / 17 / 3 |
| γ-Proteobacterial symbiont | Not present | D. endobia | G. endobia | Mo. endobia | PLON1 and PLON2 | H. endobia | Mi. endobia |
| Genome size (plasmid size) | NA | 834,723 bp (11,828 bp) | 938,041 bp | 538,294 bp | 8,190,816* | 628,221 bp (8,492 bp) | 352,837 bp |
| Average fragment coverage | NA | 121 (38) | 372 | 827 | 30 | 559 (312; 1,750) | 620 |
| G + C (%) | NA | 44.2 | 28.9 | 43.5 | 53.9 | 42.8 | 30.6 |
| CDS (pseudogenes) | NA | 564 (99) | 461 (30) | 419 (24) | NA (NA) | 510 (16) | 273 (8) |
| CDS coding density (%) | NA | 59.8 | 48.1 | 77.4 | NA | 80.4 | 75.5 |
| rRNAs / tRNAs / ncRNAs | NA | 3 / 40 / 14 | 3 / 39 / 8 | 5 / 41 / 9 | NA | 3 / 41 / 10 | 3 / 41 / 5 |
| Reference | 9 | This study | This study | 42 | This study | This study | This study |
H. endobia carries two plasmids of 3,244 and 5,248 bp. Extended assembly metrics for draft mealybug genomes are available as Table S1.
*Combined assembly size for both γ-proteobacterial symbionts in PLON. CDS, protein-coding DNA sequence; NA, not applicable; ncRNA, noncoding RNA; PAVE, Phenacoccus avenae.
Fig. 1.
Genome size and structure of the mealybug endosymbionts. Linear genome alignments of (Upper) seven Tremblaya genomes (blue) are contrasted with linear genome alignments of (Lower) five genomes of their respective γ-proteobacterial symbionts (red). The T. princeps genomes are perfectly collinear and similar in size, whereas the γ-proteobacterial genomes are highly rearranged and different in size. Alignments are ordered based on a schematic mealybug/Tremblaya phylogeny (original phylogenies are in Fig. S1) and accompanied by basic genome statistics (detailed genome statistics are in Table 1). Gene boxes are colored according to their category: proteins in blue, pseudogenes in gray, rRNAs in green, noncoding RNAs in yellow, and tRNAs in red.
Fig. 3.
A complex history of gene retention, loss, and acquisition in the mealybug symbiosis. Retention of selected biosynthetic pathways, such as amino acids, B vitamins (B-vit.), peptidoglycan (Peptido.), translation-related genes [various initiation, elongation, and termination factors (Tr. factors)], and HGTs. For each of seven mealybug species, boxes in row 1 represent Tremblaya genes (blue), row 2 represents its γ-proteobacterial symbionts (red), and row 3 represents the host genome (insect genes in green and HGTs in yellow). Missing genes are shown in gray, and recognizable pseudogenes are shown with black radial gradient. Raw data used here (including gene names) are available in Dataset S2B. Bio., biotin; Rib., riboflavin.
Fig. S1.
Supplementary phylogenetic trees. Values at nodes represent support from ML bootstrap pseudoreplicates. (A) Multigene ML phylogeny of Tremblaya within β-proteobacteria inferred from 49 concatenated protein sequences. (B) Zoomed-in Tremblaya ML phylogeny inferred from the 16S–23S rRNA alignment. (C) Multigene mealybug ML phylogeny inferred from 419 concatenated CEGMA protein sequences. (D) ML phylogeny of γ-proteobacterial symbionts inferred from the 16S–23S rRNA alignment. Clade labels A–G were adopted from the work by Thao et al. (43).
We also assembled five mealybug draft genomes (Table 1). Because our assemblies were generated only from short-insert paired-end data, the insect draft genomes consisted primarily of numerous short scaffolds (Fig. S2 and Table S1).
Fig. S2.
Schematic diagrams of insect scaffolds containing HGTs involved in amino acid and B vitamin metabolism. Insect exons (predicted by GeneMark ES) are color-coded as green rectangles and when in close proximity to HGTs, annotated by their putative functions. Genes of bacterial origin are highlighted in yellow. (A) Genome localization of bioABD, ribAD, lysA, dapF, and tms HGTs confirming that they are present on insect scaffolds. Only the longest scaffold for each HGT is shown, because the scaffolds from different mealybug species share gene order. (B) Alignments of M. hirsutus, P. marginatus, and F. virgata scaffolds showing cysK acquisition after divergence of the Maconellicoccus clade and cysK duplication in F. virgata (also present in P. citri and P. longispinus) and riboflavin transporter duplication in P. marginatus.
Table S1.
Extended assembly metrics for draft mealybug genomes
| Assembly metric | MHIR | FVIR | PCIT (reassembly) | PLON | TPER | PMAR |
| Total assembly size (bp) | 163,044,544 | 304,570,832 | 377,829,872 | 284,990,201 | 237,582,518 | 191,208,351 |
| Total no. of scaffolds | 12,889 | 32,723 | 167,514 | 66,857 | 80,386 | 60,102 |
| No. of scaffolds ≥1,000 bp | 8,043 | 21,984 | 64,930 | 40,284 | 58,090 | 33,617 |
| Largest scaffold (bp) | 393,850 | 322,873 | 82,122 | 182,788 | 54,847 | 76,575 |
| N50 / N75 | 47,025 / 22,300 | 25,562 / 12,551 | 7,078 / 3,639 | 10,126 / 4,908 | 4,681 / 2,689 | 6,799 / 3,788 |
| G + C (%) | 35.3 | 34.2 | 34.3 | 33.7 | 31.5 | 36.1 |
| No. of Ns per 100 kbp | 97.8 | 20.7 | 152.6 | 26.2 | 8.8 | 34.1 |
| CEGMA complete (of 248) | 239 (96.37%) | 239 (96.37%) | 236 (95.16%) | 229 (92.34%) | 236 (95.16%) | 242 (97.58%) |
| CEGMA complete plus partial | 246 (99.19%) | 243 (97.98%) | 245 (98.79%) | 244 (98.39%) | 247 (99.60%) | 245 (98.79%) |
| BUSCOs Eukaryota (n=429) | C:85% [D:7.4%], F:3.0%, M:11% | C:84% [D:5.1%], F:3.9%, M:11% | C:80% [D:6.9%], F:7.2%, M:11% | C:78% [D:3.4%], F:9.0%, M:12% | C:77% [D:4.1%], F:10%, M:12% | C:82% [D:5.8%], F:5.5%, M:11% |
| BUSCOs Arthropoda (n=2,675) | C:76% [D:3.5%], F:14%, M:9.4% | C:76% [D:3.3%], F:13%, M:9.9% | C:71% [D:4.8%], F:16%, M:12% | C:70% [D:2.3%], F:16%, M:13% | C:66% [D:2.3%], F:16%, M:16% | C:72% [D:3.0%], F:15%, M:12% |
All values were calculated without endosymbiont and low-coverage contamination contigs. BUSCOs Arthropoda assessments for Acyrthosiphon pisum genome assembly as a reference: C:72% [D:6.1%], F:15%, M:12%. C, complete; D, duplicated; F, fragmented; M, missing.
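The N50 and N75 values in Table 1 and Table S1 summarize scaffold contiguity. The text does not state how they were computed, so the following is a minimal sketch that assumes the conventional definition: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly. The function name `nx` and the toy scaffold lengths are illustrative only and are not taken from the study.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of scaffold lengths.

    Assumes the conventional definition: walk scaffolds from longest to
    shortest and return the length at which the running total first
    reaches `fraction` of the total assembly size.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy assembly (hypothetical lengths in bp, total 300 bp).
toy_scaffolds = [100, 80, 60, 40, 20]
n50 = nx(toy_scaffolds, 0.50)  # 100 + 80 = 180 >= 150
n75 = nx(toy_scaffolds, 0.75)  # 100 + 80 + 60 = 240 >= 225
print(n50, n75)
```

Under this definition, N75 can never exceed N50, which is consistent with every N50 / N75 pair reported in the tables.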
Verifying the Intra-Tremblaya Location for the γ-Proteobacterial Endosymbionts.
The intra-Tremblaya location of the γ-proteobacterial symbionts has been established for mealybugs in the genera Planococcus (44, 45), Pseudococcus (44, 48), Crisicoccus (49), Antonina, Antoniella, Rhodania, Trionymus, and Ferrisia (50). However, to our knowledge, the organization of Tremblaya and its partner γ-proteobacteria has never been investigated in Maconellicoccus or Paracoccus. We therefore used FISH microscopy to verify that both M. hirsutus and Paracoccus marginatus (PMAR) had the expected structure, with γ-proteobacteria residing inside Tremblaya cells (Fig. S3).
Fig. S3.
FISH confirming that intrabacterial symbionts reside inside Tremblaya cells in (A) M. hirsutus and (B) P. marginatus mealybugs. Tremblaya cells are in green, and γ-proteobacterial symbionts (DEMHIR and MEPMAR) are in red. (Scale bar: 10 μm.)
Tremblaya Genomes Are Stable in Size and Structure; the γ-Proteobacterial Genomes Are Not.
Genomes from all five T. princeps species (those that have a γ-proteobacterial symbiont) are completely syntenic and similar in size, ranging from 138 to 143 kb (Fig. 1). The gene contents are also similar, with 107 protein-coding genes shared in all five Tremblaya genomes. All differences in gene content come from gene loss or nonfunctionalization in different lineages (Fig. 1). Four pseudogenes (argS, mnmG, lpd, and rsmH) are shared in all five T. princeps genomes, indicating that some pseudogenes can be retained in Tremblaya for long periods of time. Pseudogene numbers were notably higher and coding densities were lower in T. princeps genomes from P. marginatus and Trionymus perrisii (TPER) (Fig. 1 and Table 1).
In contrast to the genomic stability observed in Tremblaya, the genomes of the γ-proteobacterial symbionts vary dramatically in size, coding density, and gene order (Figs. 1 and 3 and Table 1). These genomes range in size from 353 to ∼4,000 kb (P. longispinus contains two ∼4,000-kb genomes from different γ-proteobacteria) and are all notably different from the 539-kb Moranella genome of P. citri (42).
Phylogenetic Analyses Confirm the Intra-Tremblaya γ-Proteobacterial Symbionts Result from Multiple Infections.
The lack of conservation in γ-proteobacterial genome size and structure, combined with data showing that their phylogeny does not mirror that of their mealybug or Tremblaya hosts (43, 44) (Fig. S1), supports early hypotheses that the γ-proteobacterial symbionts of diverse mealybug lineages result from multiple unrelated infections (43, 44). Although the Sodalis-allied clade is extremely hard to resolve because of low taxon sampling of facultative and free-living relatives, nucleotide bias, and rapid evolution in obligate symbionts, none of our analyses indicate a monophyletic group of mealybug symbionts congruent with the host and Tremblaya trees (Fig. 2 and Fig. S1).
Fig. 2.
The intra-Tremblaya mealybug symbionts are members of the Sodalis clade of γ-proteobacteria. A multigene phylogeny of Sodalis-allied insect endosymbionts and closely related Enterobacteriaceae (γ-proteobacteria) was inferred from 80 concatenated proteins under the LG + G evolutionary model in RAxML v8.2.4. Mealybug endosymbionts are highlighted in red. Values at nodes represent bootstrap pseudoreplicates from the maximum likelihood (ML) analysis, posterior probabilities from Bayesian inference (BI) topology inferred under the LG + I + G model, and posterior probabilities from BI topology inferred from the Dayhoff6 recoded dataset under the CAT + GTR + G model in PhyloBayes, respectively.
Draft Insect Genomes Reveal the Timing of Mealybug HGTs.
Gene annotation of low-quality draft genome assemblies is known to be problematic (51). We therefore verified that our mealybug assemblies were sufficient for our purpose of establishing gene presence or absence by comparing our gene sets with databases containing core eukaryotic [Core Eukaryotic Genes Mapping Approach (CEGMA)] and arthropod [Benchmarking Universal Single-Copy Orthologs (BUSCO)] gene sets. CEGMA scores (complete plus partial matches) are approximately 98% or higher in all of our assemblies, and BUSCO Arthropoda scores range from 66 to 76% (Table S1). We note that the low scores against the BUSCO database likely reflect the hemipteran origin of mealybugs rather than our fragmented assemblies; the high-quality pea aphid genome (52) scores 72% using identical settings. We conclude that our mealybug draft assemblies are sufficient for determining the presence or absence of bacterial HGTs.
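The CEGMA percentages quoted here can be recomputed from the raw counts in Table S1, which reports hits out of 248 core genes. The short sketch below does that arithmetic; the counts are copied from Table S1, whereas the dictionary and function names are illustrative choices rather than part of the original analysis.

```python
CEGMA_TOTAL = 248  # number of core eukaryotic genes in the CEGMA set (Table S1)

# Counts copied from Table S1.
complete = {"MHIR": 239, "FVIR": 239, "PCIT": 236,
            "PLON": 229, "TPER": 236, "PMAR": 242}
complete_plus_partial = {"MHIR": 246, "FVIR": 243, "PCIT": 245,
                         "PLON": 244, "TPER": 247, "PMAR": 245}


def percent(found, total=CEGMA_TOTAL):
    """Percentage of CEGMA genes recovered, rounded to two decimals."""
    return round(100 * found / total, 2)


for species in complete:
    print(species,
          percent(complete[species]),
          percent(complete_plus_partial[species]))
```

The lowest complete-plus-partial score is 243 of 248 genes for FVIR (97.98%), and the highest is 247 of 248 for TPER (99.60%).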
We first sought to confirm that the HGTs found previously in the P. citri genome (9) were present in other mealybug species (Tables S2 and S3) and establish the timing of these transfers. [Consistent with our previous findings (9), there were no well-supported HGTs of Tremblaya origin detected in any of our mealybug assemblies.] Our data show that the acquisition of some HGTs [bioABD, ribAD, dapF, lysA, tryptophan 2-monooxygenase oxidoreductase (tms), and ATPases associated with diverse cellular activities (AAA-ATPases)] predated the Phenacoccinae/Pseudococcinae divergence and thus the acquisition of any γ-proteobacterial endosymbiont (Fig. 3). These old HGTs mostly involve amino acid and B vitamin metabolism, are usually found on longer insect scaffolds that contain several essential insect genes, and are syntenic across mealybug species (Fig. 4). In each of these cases, no other bacterial genes or pseudogenes were found within the scaffolds (Tables S2 and S3), suggesting that these HGTs resulted from the transfer of small DNA fragments or that flanking bacterial DNA from larger fragments was lost after the transfer was established. The origin of some of these transfers [7,8-diaminopelargonic acid synthase and biotin synthase (bioAB)] likely predates the entire mealybug lineage, because they are found in the genome of the whitefly Bemisia tabaci (11).
Table S2.
Insect scaffolds containing horizontally transferred genes
| Gene category and HGT | Scaffold name, length, and k-mer coverage (merged k-mers) | | | | | |
| | MHIR | FVIR | PCIT | PLON | TPER | PMAR |
| B vitamin metabolism | ||||||
| bioA | NODE_1095_43437_30.5427_ID_2189* | NODE_2692_29264_45.8233_ID_5383 | NODE_1158_22396_21.4066_ID_2315 | NODE_13454_6321_92.0554_ID_26907 | NODE_5755_7749_42.2835_ID_11509 | NODE_15638_3963_48.9007_ID_31275 |
| NODE_3702_14332_30.9377_ID_7403 | ||||||
| bioB | NODE_206_103677_26.441_ID_411* | NODE_1537_39445_37.3168_ID_3073 | NODE_1118_22642_22.6156_ID_223 | NODE_11460_7325_111.564_ID_22919 | NODE_386_19325_14.7471_ID_771 | NODE_1524_15917_46.9302_ID_3047 |
| bioD | NODE_407_76514_32.3402_ID_813* | NODE_10823_7722_46.6698_ID_21645 | NODE_17050_6003_28.5577_ID_34099 | NODE_6031_11780_41.4639_ID_12061 | NODE_6741_7177_24.074_ID_13481 | NODE_21598_2996_39.4689_ID_43195 |
| ribA | NODE_36_178330_31.6542_ID_71* | NODE_854_51798_32.6572_ID_1707 | NODE_12118_7709_9.0878_ID_24235 | NODE_10187_8129_45.4534_ID_20373 | NODE_22346_3461_14.5056_ID_44691 | NODE_1442_16334_42.4795_ID_2883 |
| ribD | NODE_3471_11646_37.1715_ID_6941 | NODE_4692_19496_33.1948_ID_9383* | NODE_22359_4879_37.4948_ID_44717 | NODE_4881_13443_38.1156_ID_9761 | NODE_10832_5498_46.801_ID_21663 | NODE_9906_5436_52.0394_ID_19811 |
| panC | NA | NODE_1895_35506_42.1294_ID_3789* | NA | NA | NA | NA |
| Amino acid metabolism | ||||||
| cysK | NA | NODE_1251_43541_36.8307_ID_2501* | NODE_5169_12355_8.40561_ID_10337 | NODE_6319_11425_96.4325_ID_12637 | NODE_5086_8195_13.8618_ID_10171 | NODE_317_27801_43.9358_ID_633 |
| NODE_1576_20002_20.754_ID_3151 | NODE_28193_2829_70.8861_ID_56385 | |||||
| NODE_3332_15001_36.8971_ID_6663 | ||||||
| dapF | NODE_2062_24285_26.5533_ID_4123* | NODE_5954_15883_38.1428_ID_11907 | NODE_962_23955_17.9039_ID_1923 | NODE_20465_4268_36.1901_ID_40929 | NODE_6454_7335_15.475_ID_12907 | NODE_28986_1694_175.113_ID_57971 |
| lysA | NODE_59_148786_27.0847_ID_117 | NODE_4_297799_35.7395_ID_7* | NODE_30394_3749_14.3249_ID_60787 | NODE_8644_9224_44.8211_ID_17287 | NODE_7424_6818_19.9689_ID_14847 | NODE_1012_18919_46.8622_ID_2023 |
| tms | NODE_1166_41617_26.8228_ID_2331* | NA | NODE_6634_10945_16.722_ID_13267 | NODE_13050_6499_55.789_ID_26099 | NODE_34438_2417_29.2671_ID_68875 | NODE_8338_6146_45.9414_ID_16675 |
| NODE_5474_3263_6.97353_ID_10947 | NODE_7749_10066_15.9657_ID_15497 | NODE_25746_3290_140.852_ID_51491 | NODE_6297_7425_23.0654_ID_12593 | NODE_4174_9644_62.5393_ID_8347 | ||
| NODE_11115_8160_6.39149_ID_22229 | NODE_5435_12567_33.0441_ID_10869 | NODE_19614_3786_320.495_ID_39227 | NODE_12227_4696_36.585_ID_24453 | |||
| NODE_3006_17561_42.3735_ID_6011 | NODE_34895_2381_28.3895_ID_69789 | |||||
| Peptidoglycan metabolism | ||||||
| murA | NA | NODE_460_66309_43.1341_ID_919* | NODE_11354_8054_20.7208_ID_22707 | NODE_115_61230_35.6962_ID_229 | NA | NA |
| murB | NA | NODE_12758_5717_33.0994_ID_25515 (possible pseudogene) | NODE_369_31461_35.5337_ID_737* | NODE_20534_4254_49.7111_ID_41067 | NA | NA |
| murC | NA | NA | NODE_1601_19897_15.255_ID_3201* | NODE_22793_3812_49.6747_ID_45585 | NA | NA |
| murD | NA | NA | NODE_13782_7024_6.68302_ID_27563* | NODE_24019_3587_40.034_ID_48037 | NA | NA |
| murE | NA | NA | NODE_6492_11057_8.87711_ID_12983* | NODE_17363_4962_30.2712_ID_34725 | NA | NA |
| murF | NA | NA | NODE_594_27680_14.487_ID_1187* | NODE_4718_13704_42.3641_ID_9435 | NA | NA |
| amiD | NODE_127_124060_24.8687_ID_253 | NA | NODE_37192_2984_10.5592_ID_74383 | NA | NA | NA |
| mltB | NA | NA | NODE_19703_5383_17.8893_ID_39405 | NA | NA | NA |
| b-Lactamase | NA | NODE_5744_16411_32.7118_ID_11487 | NODE_41741_2494_87.3403_ID_83481 | NODE_4491_14129_34.1056_ID_8981 | NODE_27744_2940_11.9002_ID_55487 | NODE_1286_17279_50.7695_ID_2571 |
| NODE_15550_3679_32.2136_ID_31099 | NODE_9718_8869_20.8995_ID_19435 | NODE_16462_5218_33.5475_ID_32923 | NODE_14178_4665_12.505_ID_28355 | |||
| NODE_19161_5497_22.6318_ID_38321 | NODE_27195_3022_166.648_ID_54389 | NODE_28052_2913_15.796_ID_56103 | ||||
| NODE_24154_3566_51.4341_ID_48307 | NODE_6508_7297_15.3206_ID_13015 | |||||
| NODE_2155_20606_37.4645_ID_4309* | ||||||
| ddlB | NA | NODE_52_125214_31.955_ID_103* | NODE_2593_16610_22.7297_ID_5185 | NODE_7871_9825_39.2015_ID_15741 | NODE_29017_2831_20.6286_ID_58033 | NA |
| Other | ||||||
| DUR1,2 | NA | NA | NODE_1398_20965_17.3176_ID_2795* (both urea carboxylase and allophanate hydrolase) | NODE_2264_20156_44.516_ID_4527 (only allophanate hydrolase) | NA | NA |
| gshA | NA | NA | NODE_33435_3399_30.6965_ID_66869 | NA | NA | NA |
| Type III effector | NA | NODE_4508_20239_32.5829_ID_9015 | NODE_2326_17345_10.1749_ID_4651 (+ more than 10 other copies) | NODE_935_28982_40.2817_ID_1869 (+ more than 10 other copies) | NODE_174_23133_18.8075_ID_347 (+ more than 10 other copies) | NODE_1751_14972_78.5305_ID_3501 |
| NODE_932_49868_38.3909_ID_1863 | NODE_31_48312_40.329_ID_61 | |||||
| NODE_955_49231_37.1248_ID_1909 | NODE_4166_9650_48.8279_ID_8331 | |||||
| NODE_444_67489_35.7914_ID_887* | NODE_6268_7513_43.5062_ID_12535 | |||||
| NODE_1448_40490_35.3609_ID_2895 | ||||||
| chitinase | NA | NA | NA | NA | NODE_1934_11960_15.4634_ID_3867 | NODE_378_26435_39.2884_ID_755* |
| rlmI | NA | NA | NODE_8054_9863_19.3512_ID_16107 | NA | NA | NA |
| AAA-ATPases | NODE_36_178330_31.6542_ID_71 (+ numerous other hits) | NODE_854_51798_32.6572_ID_1707 (+ numerous other hits) | NODE_3869_14076_36.1782_ID_7737 (+ numerous other hits) | NODE_3376_16544_40.2929_ID_6751 (+ numerous other hits) | NODE_4822_8396_19.8446_ID_9643 (+ numerous other hits) | NODE_1442_16334_42.4795_ID_2883 (+ numerous other hits) |
| Ankyrin repeat protein (likely opposite HGT direction; i.e., from insects to Wolbachia) | NA | NODE_942_49564_41.7943_ID_1883 (+ numerous other hits to ankyrin proteins) | NODE_1287_21600_20.3177_ID_2573 (+ numerous other hits to ankyrin proteins) | NODE_1876_21872_38.6857_ID_3751 (+ numerous other hits to ankyrin proteins) | NODE_2986_10256_23.7463_ID_5971 (+ numerous other hits to ankyrin proteins) | NODE_1130_18092_47.7632_ID_2259 (+ numerous other hits to ankyrin proteins) |
NA, not applicable.
*Longest scaffold for each HGT candidate.
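Each scaffold identifier in Table S2 appears to pack several values into one string. The sketch below assumes, based on the column header ("Scaffold name, length, and k-mer coverage"), that the fields are `NODE_<index>_<length in bp>_<k-mer coverage>_ID_<assembler ID>`; this layout is an inference from the table, not something the text states explicitly. The `Scaffold` type and `parse_scaffold` function are hypothetical helpers, and the parser handles only bare names with an optional trailing asterisk, not cells that carry extra annotations in parentheses.

```python
import re
from typing import NamedTuple


class Scaffold(NamedTuple):
    node: int              # assembler node index (assumed meaning)
    length_bp: int         # scaffold length in bp (assumed meaning)
    kmer_coverage: float   # k-mer coverage (assumed meaning)
    assembler_id: int      # trailing ID field


# Assumed layout: NODE_<index>_<length>_<coverage>_ID_<id>
PATTERN = re.compile(r"^NODE_(\d+)_(\d+)_(\d+(?:\.\d+)?)_ID_(\d+)$")


def parse_scaffold(name):
    """Split a Table S2 scaffold name into its assumed component fields."""
    cleaned = name.strip().rstrip("*")  # drop the longest-scaffold marker
    match = PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized scaffold name: {name!r}")
    node, length, coverage, ident = match.groups()
    return Scaffold(int(node), int(length), float(coverage), int(ident))


# bioA scaffold from MHIR, as listed in Table S2.
bioA_mhir = parse_scaffold("NODE_1095_43437_30.5427_ID_2189*")
print(bioA_mhir.length_bp, bioA_mhir.kmer_coverage)
```

Parsing the names this way makes it straightforward to compare scaffold lengths across species, for example to see that the MHIR and FVIR scaffolds carrying a given HGT tend to be longer than those from the more fragmented assemblies.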
Table S3.
Overview of evidence that the HGTs are encoded on the insect genomes
| HGT | Phylogenetic origin (does not necessarily mean donor) | Present in several mealybug species and forms a single clade | Other bacterial genes on the scaffold | Insect genes on the insect scaffolds | Overall HGT evidence |
| bioA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| bioB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| bioD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| ribA | γ-Proteobacteria: Enterobacteriales | Yes | AAA-ATPase HGT | Yes | Strong support |
| ribD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| panC | β-Proteobacteria | No, only FVIR | No | Yes | Moderate support |
| cysK | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support |
| dapF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| lysA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| tms | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support |
| murA | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support |
| murB | Bacteroidetes | Yes | No | Yes | Strong support |
| murC | Bacteroidetes (PCIT); α-Proteobacteria: Rickettsiales (PLON) | Yes but different origin | No | Yes | Moderate support |
| murD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| murE | α-Proteobacteria: Rickettsiales | Yes | No | No | Moderate support |
| murF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| amiD | γ-Proteobacteria: Enterobacteriales (PCIT); α-Proteobacteria: Rickettsiales (MHIR) | Yes but different origin | No | Yes | Moderate support |
| mltB | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support |
| b-Lactamase | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support |
| ddlB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support |
| DUR1,2 | γ-Proteobacteria: Enterobacteriales | Yes but different origin | No | No | Moderate support |
| gshA | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support |
| Type III effector | γ-Proteobacteria or β-Proteobacteria | Yes | No | Yes | Strong support |
| chitinase | γ-Proteobacteria or β-Proteobacteria | Yes | No | Yes | Strong support |
| rlmI | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support |
| AAA-ATPases | α-Proteobacteria: Rickettsiales | NA | ribA HGT | Yes | Moderate support |
| Ankyrin repeat proteins | α-Proteobacteria: Rickettsiales | NA | No | Yes | Moderate support |
Fig. 4.
HGTs detected in individual mealybug species. Retention of HGT candidates detected across all mealybug species (blue, possible pseudogene; gray, gene not detected; red, different phylogenetic origin; yellow, gene present).
We find that several HGTs were likely acquired after the divergence of the Maconellicoccus clade [cysteine synthase A (cysK), beta-lactamase (b-lact), type III effector (T3ef), and d-alanine-d-alanine ligase B (ddlB)]. One of these genes, cysK, clusters with sequences from other Sodalis-allied bacteria, consistent with a possible origin from an early γ-proteobacterial intrabacterial symbiont (Dataset S1F). We note that cysK has undergone tandem duplication in P. longispinus, Ferrisia virgata (FVIR), and P. citri (Fig. S2A and Tables S2 and S3), which was also observed for several other HGTs (tms, b-lact, T3ef, chiA, ankyrin repeat proteins, and AAA-ATPases). Most of the HGTs found in only one or two mealybug species are related to peptidoglycan metabolism and were assembled on shorter scaffolds with few insect genes on them. Possible HGT losses of tms in FVIR and ddlB in P. marginatus were detected based on our assemblies. Except in three cases (amiD, murC, and DUR1), HGT candidates detected from several mealybug species shared a significant amount of sequence similarity and clustered as a single clade in our phylogenies (Dataset S1), suggesting that these transfers resulted from single events.
Evolution of the Metabolic Patchwork.
We previously found complementary patterns of gene loss and retention between Tremblaya, Moranella, and the mealybug host in the P. citri symbiosis (9, 42). Our comparative genomic data allow us to see how genes are retained or lost in different genomes in multiple lineages that have γ-proteobacterial symbionts of different inferred ages (Fig. 3). These data also allow us to observe how new symbionts evolve in response to the presence of both preexisting symbionts and horizontally transferred genes.
Overall, our data point to an extremely complex pattern of gene loss and retention in the mealybug symbiosis (Fig. 3). Some pathways, such as those for the production of lysine, phenylalanine, and methionine, show a relatively similar patchwork pattern in all mealybugs, with gene retention interspersed between Tremblaya, its γ-proteobacterial endosymbiont, and/or the host. Gene retention patterns from many other pathways, however, show much less predictable patterns. The isoleucine, valine, leucine, threonine, and histidine pathways show a tendency toward Tremblaya-dominated biosynthesis in M. hirsutus, F. virgata, and P. citri (that is, gene retention in Tremblaya and gene loss in the γ-proteobacterial symbiont) but with a clear shift toward γ-proteobacterial–dominated biosynthesis in P. marginatus and T. perrisii. Other pathways, such as tryptophan, show γ-proteobacterial dominance in all mealybug symbioses but with reliance on at least one Tremblaya gene in P. citri, P. marginatus, and T. perrisii. In the arginine pathway, gene retention is dominated by Tremblaya in M. hirsutus but by the γ-proteobacterial endosymbiont in all other lineages, with sporadic loss of Tremblaya genes in different lineages. Overall, M. hirsutus encodes the most Tremblaya genes and the fewest γ-proteobacterial genes, whereas TPER shows the opposite pattern.
Gene Retention Patterns for Translation-Related Genes in Tremblaya.
In contrast to metabolic genes involved in nutrient production, the retention patterns for genes involved in translation vary little between mealybug species (Fig. 3). As first shown in Tremblaya PCIT (42), none of the additional Tremblaya genomes that we report here encode any functional aminoacyl tRNA synthetase, with the exception of one likely functional gene (cysS) in T. princeps PLON, which is present as a pseudogene in several other lineages of Tremblaya. Furthermore, all Tremblaya genomes have lost key translational control proteins that are typically retained even in the smallest endosymbiont genomes, such as ribosome recycling factor, l-methionyl-tRNAfMet N-formyltransferase, and peptide deformylase. The translational release factors RF-1 and RF-2 (prfAB) and elongation factor (EF) EF-Ts (tsf) are present only in the gene-rich T. princeps MHIR genome and absent or pseudogenized in all other T. princeps genomes. Initiation factors (IFs) IF-1, IF-2, and IF-3 (infABC) and EFs EF-Tu and EF-G (tufA and fusA) are retained in all Tremblaya genomes, as are most ribosomal proteins (Dataset S2A).
Taxonomy of Mealybug Endosymbionts.
The naming convention in the field of insect endosymbiosis has been to keep the species names constant for lineages of endosymbiotic bacteria resulting from single infections, even if they exist in different species of host insects. The host is denoted by appending a specific abbreviation to the end of the endosymbiont name (e.g., T. princeps PCIT for T. princeps from P. citri). However, our data show that the intra-Tremblaya γ-proteobacterial symbionts are not from the same infection; they result from independent endosymbiotic events from clearly discrete lineages within the Sodalis clade (Fig. 2). Following convention, we have chosen to give these γ-proteobacteria different genus names but unite them by retaining the “endobia” species denomination for each one (such as in Moranella endobia).
We propose the following Candidatus status names for four lineages of intra-Tremblaya γ-proteobacterial symbionts of mealybugs for which we have completed a genome. First, Candidatus Doolittlea endobia MHIR is for the endosymbiont from M. hirsutus. This name honors the American evolutionary biologist W. Ford Doolittle (1941–) for his contributions to our understanding of HGT and endosymbiosis. Second, Candidatus Gullanella endobia FVIR is for the endosymbiont from F. virgata. This name honors the Australian entomologist Penny J. Gullan (1952–) for her contributions to numerous aspects of mealybug biology and taxonomy. Third, Candidatus Mikella endobia PMAR is for the endosymbiont from P. marginatus. This name honors the Canadian biochemist Michael W. Gray (1943–) for his contributions to our understanding of organelle evolution. Fourth, Candidatus Hoaglandella endobia TPER is for the endosymbiont from T. perrisii. This name honors the American biochemist Mahlon B. Hoagland (1921–2009) for his contributions to our understanding of the genetic code, including the codiscovery of tRNA. All of the names that we propose could be extended to related mealybug species (e.g., G. endobia for other members of the Ferrisia clade) if future phylogenetic analyses show that these symbionts result from the same infection. For simplicity, we use all endosymbiont names without the Candidatus denomination.
Discussion
Diversity of Intra-Tremblaya Symbiont Genomes Suggests Multiple Replacements.
Phylogenetic analyses based on rRNA and protein-coding genes from the γ-proteobacterial endosymbionts of mealybugs first indicated their origins from multiple unrelated bacteria (43, 44). What was unclear from these data was the order and timing of the γ-proteobacterial infections and how these infections affected the other members of the symbiosis. We imagine three possible scenarios that could explain these phylogenetic and genomic data (Fig. 5). The first is that there was a single γ-proteobacterial acquisition in the ancestor of the Pseudococcinae that has evolved idiosyncratically as mealybugs diversified over time, leading to seemingly unrelated genome structures and coding capacities (the “idiosyncratic” scenario) (Fig. 5A). The second is that the γ-proteobacterial infections occurred independently, each establishing symbioses inside Tremblaya in completely unrelated and separate events (the “independent” scenario) (Fig. 5B). The third is that there was a single γ-proteobacterial acquisition in the Pseudococcinae ancestor that has been replaced in some mealybug lineages over time (the “replacement” scenario) (Fig. 5C). The idiosyncratic scenario is easy to disregard, because although acquisition of a symbiont followed by rapid diversification of the host might result in different patterns of genome evolution in different lineages, it should result in monophyletic clustering in phylogenetic trees. Previous phylogenetic work as well as our phylogenomic data (Fig. 2) show that the γ-proteobacteria that have infected different mealybugs have originated from clearly distinct (and well-supported) bacterial lineages.
Fig. 5.
Three possible scenarios that built the mealybug symbiosis. Independent γ-proteobacterial acquisitions are shown as arrows, and replacements are noted with Rs above the arrow. Colors represent the different γ-proteobacterial genomes shown in Fig. 1. (A) The idiosyncratic scenario, where a single γ-proteobacterial acquisition evolved differently as mealybugs diverged, leading to different genome sizes and structures in extant mealybugs. (B) The independent scenario, where the different sizes and structures of the γ-proteobacterial genomes shown in Fig. 1 result from completely independent acquisitions. (C) The replacement scenario, where the different sizes and structures of the γ-proteobacterial genomes shown in Fig. 1 result from several replacements of an ancestral γ-proteobacterial symbiont.
The independent and replacement scenarios are more difficult to tell apart with our data, and the true history of the symbiosis may have involved both. However, we favor symbiont replacement as the main mechanism that generated the complexity that we see in mealybugs, primarily because of the large differences in size observed in the γ-proteobacterial genomes (Fig. 1 and Table 1). Genome size is strongly correlated to endosymbiotic age in bacteria, especially at the onset of symbiosis, when genome reduction can be rapid (53–57). Most relevant to our argument here is the speed with which genome reduction has been shown to take place in Sodalis-allied bacteria closely related to the γ-proteobacterial symbionts of mealybugs (34, 58, 59). It has been estimated that as much as 55% of an ancestral Sodalis genome was lost on the transition to endosymbiosis in a mere ∼28,000 y, barely enough time for 1% sequence divergence to accumulate between the new symbiont and a free-living relative (58). Our general assumption is, therefore, that recently established endosymbionts should have larger genomes than older symbionts. However, we note that genome reduction is not a deterministic process related to time, especially as the symbiosis ages. It is clear that, in some insects housing pairs of ancient symbionts with highly reduced genomes, the older endosymbiont can have a larger genome than the newer symbiont (60).
The evidence for recent replacement is most obvious in P. longispinus (Fig. 3 and Table 1). This symbiosis harbors two related γ-proteobacterial symbionts (61), each with a rod-like cell shape, although it is currently unclear if both bacteria reside within Tremblaya (48). Both of these genomes are about 4 Mb in size (Table 1), approximately the same size as the recently acquired Sodalis symbionts from tsetse fly (4.3 Mb) (62) and rice weevil (4.5 Mb) (59). These morphological and genomic features as well as their relatively short branches in Fig. 2 all suggest that the γ-proteobacterial symbionts are recent acquisitions in the P. longispinus symbiosis. The P. longispinus replacement seems so recent that the stereotypical complementary patterns of gene loss and retention have not had time to accumulate between the γ-proteobacteria and Tremblaya (Fig. 3). However, Tremblaya PLON is missing the same translation-related genes (aside from cysS) as all other Tremblaya, indicating that it has long ago adapted to the presence of a (now eliminated) bacterium living in its cytoplasm. Comprehensive analyses of the two γ-proteobacterial genomes from P. longispinus are ongoing and will be published elsewhere.
We hypothesize that the larger, gene-rich γ-proteobacterial genomes that we describe here are the result of symbiont replacements of an ancestral γ-proteobacterial endosymbiont rather than completely independent infections in different mealybug lineages. We suspect that the massive loss in key translation-related genes (Fig. 3) in Tremblaya occurred in response to the first γ-proteobacterial infection, which then required all subsequent replacement events to also reside within the Tremblaya cytoplasm. It is tempting to speculate that the 353-kb Mikella PMAR genome is the ancestral intra-Tremblaya symbiont lineage that has not been replaced or at least has not been recently replaced. However, because the relevant clades split right after the Phenacoccinae/Pseudococcinae divergence (that is, right at the acquisition of the first γ-proteobacterial symbiont), much richer taxon sampling would be needed to test the hypothesis that this was, in fact, the original symbiont lineage (Fig. 2). We also note that, in at least one other case, bacteria from the Sodalis group have established multiple repeated infections in a replacement-like pattern (38).
How Did the Bacteria Within a Bacterium Structure Start, and Why Does It Persist?
In extreme cases of endosymbiotic genome reduction, genes required for the generation of a cell envelope, along with other fundamental processes, are lost (12, 13). This phenomenon is seen in Tremblaya, where even the largest genome (from Phenacoccus avenae, which lacks a γ-proteobacterial symbiont) encodes no genes for the production of fatty acids or peptidoglycan (9). We assume that the envelope that defines the Tremblaya cytoplasm is made by the host, because it cannot be made by Tremblaya. These data suggest that when the first γ-proteobacterial endosymbiont established residence in Tremblaya, it invaded a membrane system that was perhaps more eukaryotic than bacterial in nature (even if it ultimately ended up in a “bacterial” cytoplasm). Bacteria in the Sodalis group are very good at establishing intracellular infections in insect cells (38, 63, 64), and we suggest that their propensity to infect Tremblaya might simply reflect this ability. The cytoplasm vs. envelope distinction is important, because the mealybug symbiosis has been held up by many—including us—as a rare example of a stable bacteria within a bacterium symbiosis. Although this description might be apt if one considers the Tremblaya cytoplasm bacterial in nature, it may not be if one considers the types of membranes that the innermost bacteria had to cross to get there.
But why did the first γ-proteobacterial endosymbiont end up inside Tremblaya? We can think of two related possibilities. The first is that it was easier to use the established transport system between the insect cell and Tremblaya (65) than to evolve a new one. The second is that the insect immune system likely does not target Tremblaya cells, and so the Tremblaya cytoplasm is an ideal hiding place for a newly arrived symbiont. After the loss of critical translation-related genes in Tremblaya, the symbiosis would persist with a bacteria within a bacterium structure because no other structure is possible. We note that Sodalis- and Arsenophonus-allied symbionts were recently suggested to sometimes reside within Sulcia cells in the leafhoppers Cicadella viridis and Macrosteles laevis (66, 67). Although these studies were based only on EM imaging and not confirmed by specific probes (e.g., with FISH), it is possible that symbioses formed by bacteria taking up residence inside of degenerate symbionts with host-derived cell envelopes are not uncommon.
Evolution of Organelles and the Timing of HGT.
It is widely accepted that the mitochondria found across eukaryotes are related back to a single common α-proteobacterial ancestor (68) and that the plastids resulted from a single cyanobacterial infection (69). What is less clear is what happened before these endosymbiont lineages were fixed into organelles. The textbook concept is that a bacterium was taken up by a host cell, transferred most of its genes, and became the mitochondrion or plastid (70). This idea becomes more complicated when the taxonomic affiliation of bacterial genes on eukaryotic genomes is examined (71–74). For example, only about 20% of mitochondria-related horizontally transferred genes have strong α-proteobacterial phylogenetic affinities (72). The signals for the remaining 80% are either too weak to confidently place the gene or show clear affiliation with other bacterial groups (71, 72). Hypotheses that explain these data fall roughly into two camps. Some imagine a gradual process where multiple taxonomically diverse endosymbioses may have occurred—and transferred genes—before the final α-proteobacterial symbiont was fixed. That is, the mitochondria arrived rather late in the evolution of a eukaryotic-like cell that already contained many bacterial genes resulting from HGT of previous symbionts (75, 76). Others favor a more abrupt “mitochondria early” scenario, where an endosymbiont with a taxonomically diverse mosaic genome made the transition to becoming the mitochondrion in a single endosymbiotic event, transferring its genes during the process. In this scenario, the mosaic nature of the extant eukaryotic genomes resulted from the “inherited chimerism” of the lone mitochondria bacterial ancestor because of the propensity of bacteria to participate in HGT with distantly related groups (73, 77, 78).
We suggest that the data reported here indirectly support the gradualist or mitochondria late view of organelle evolution. We find that the majority of nutrient-related HGTs occurred before the divergence of the Phenacoccinae and Pseudococcinae (Figs. 3 and 4) and therefore before the establishment of any γ-proteobacterial symbiont. In particular, HGTs in the riboflavin and lysine pathways were retained on the insect genomes as the first γ-proteobacterial symbiont was established and new γ-proteobacterial symbionts replaced old ones (Figs. 2 and 3). Our results make it clear that HGTs can remain stable on host genomes for millions of years, even after the addition or replacement of symbionts that share pathways with these genes, and directly show how mosaic metabolic pathways can be built gene by gene as symbionts come and go over time. We note that the “shopping bag” hypothesis (79), which argues that establishment of an endosymbiosis should be regarded as a continuous process involving a number of partners rather than a single event involving two partners, fits our data remarkably well. Of course, our data do not rule out inherited chimerism as a contributor to the taxonomic diversity of genes that support organelle function, because many bacterial genomes are taxonomically mosaic because of HGT (73). As with most solutions to endosymbiotic problems, the true answer is likely a complicated mixture of both processes.
Using Symbiont Supplementation and Replacement to Claw Out of the Rabbit Hole.
At the onset of a nutritional symbiosis, a new organism comes on board and allows access to a previously inaccessible food source. Rapid adaptation and diversification can occur—the new symbiont adapts to the host, the host adapts to the symbiont, and the entire symbiosis expands in the newly available ecological niche. However, cases where a bacterial symbiont takes up stable residence in a host cell also seem to lead to irreversible degeneration and codependence between host and symbiont (26, 28, 80, 81). What HGT, symbiont supplementation, and symbiont replacement may offer is a way out—at least temporarily, but perhaps permanently—of this degenerative ratchet.
However, new symbionts may also provide ecological opportunity in addition to evolutionary reinvigoration. We note that the mealybug with one of the broadest host ranges is also the species with the most recent γ-proteobacterial replacement, P. longispinus. P. longispinus is an important agricultural pest and known to feed on plants from 82 families (scalenet.info/catalogue/pseudococcus%20longispinus/). It seems possible that fresh symbionts with large genomes could provide novel functions unavailable in more degenerate symbionts, again propelling the symbioses into new niches.
Materials and Methods
Samples of the mealybug species M. hirsutus (pink hibiscus mealybug; MHIR; collection locality: Helwan, Egypt), F. virgata (striped mealybug; FVIR; collection locality: Helwan, Egypt), and P. marginatus (papaya mealybug; PMAR; collection locality: Mayotte, Comoro Islands) were identified and provided by Thibaut Malausa, Institut National de la Recherche Agronomique, Sophia, France. T. perrisii (TPER; collection locality: Poland) samples were provided by Małgorzata Kalandyk-Kołodziejczyk, University of Silesia, Katowice, Poland. P. longispinus samples (long-tailed mealybug; PLON) were collected by F.H. in a winter garden of the Faculty of Science, University of South Bohemia. DNA vouchers and insect vouchers of adult females for slide mounting are available from F.H. DNA was isolated from three to eight whole insects of all species by the Qiagen QIAamp DNA Micro Kit, and each library was multiplexed on two-thirds of an Illumina HiSeq 2000 Lane and sequenced as 100-bp paired end reads. The M. hirsutus sample was sequenced on an entire MiSeq lane with v3 chemistry and 300-bp paired end mode. Both approaches generated sufficient coverage for both symbiont genomes and draft insect genomes. Adapter clipping and quality filtering were carried out in the Trimmomatic package (82) using default settings. Read error correction (BayesHammer), de novo assembly (k-mers K21, K33, K55, and K77 for 100-bp data and K99 and K127 for 300-bp data), and mismatch/short-indel correction were performed by the SPAdes assembler, v3.5.0 (83). Additional endosymbiont-targeted long k-mer (91 and 241 bp) assemblies generated by the Ray v2.3.1 (84) and PRICE v1.2 (85) assemblers were used to improve assemblies of complex endosymbiont regions.
Additional information on the computational and microscopy methods can be found in SI Materials and Methods. General Tremblaya primers are shown in Table S4.
Table S4.
Tremblaya primer
| Genome region | Forward primer | Reverse primer(s) |
| leuA_fwd ↔ rpsO_rRNA_fwd_rev | CTAAGGGCTGAGGACGTTGG | CCCCTACGCAGCCTGTTTAT |
| rpsO_rRNA_fwd_rev ↔ prs_rev | CCCCTACGCAGCCTGTTTAT | GGGTAGCTCAGCGGTAAGAG |
| tRNA_Gly_fwd ↔ rsmH_rev | GCCTAGTGCAGGGATAGAAGG | CACTGAGGCTCTGAGTTGGC |
| tRNA_Gly_fwd ↔ 23S_rRNA_rev1 | GCCTAGTGCAGGGATAGAAGG | CGTTGATAGGCTGGGTGTGT |
| tRNA_Gly_fwd ↔ 23S_rRNA_rev2 | GCCTAGTGCAGGGATAGAAGG | AAGTTCCGACCTGCACGAAT |
| argG_fwd ↔ rib_pseudo_rev | CCCTGGCCTATGCTTCTGAC | GGAGGTCAGATTCGAGGCAG |
| ilvD_fwd ↔ hypothetical_protein_rev | ATAAGGAGGAGGGTGCCTGT | GTGATGGTGTTAGGTTGCGG |
These primers were used for duplicated rRNA operons and one more region-breaking assembly of five T. princeps genomes.
SI Materials and Methods
Symbiont Genome Assembly, Annotation, and Analyses.
Endosymbiont genomes were closed into circular mapping molecules by the combination of PCR and Sanger sequencing. General Tremblaya primers for closing of problematic regions, such as the duplicated rRNA operon, were designed to be applicable to most Tremblaya princeps species (Table S4). Given unclear GC skew in some of the species, the origin of replication was set to the same region as in already published Tremblaya and Moranella genomes to standardize comparative genomic analyses. Pilon v1.12 (86) and REAPR v1.0.17 (87) were used to diagnose and improve potential misassemblies, collapsed repeats, and polymorphisms. Genome annotations and reannotations [abbreviations combine Tremblaya princeps (TP) with species abbreviations such as PCIT; i.e., for TPPCIT, MEPCIT, and Tremblaya phenacola from Phenacoccus avenae (TPPAVE)] were carried out by the Prokka v1.10 pipeline (88) with disabled default discarding of ORFs overlapping tRNAs. Our comparative data allowed us to reannotate many genes and pseudogenes previously annotated as hypothetical proteins and uncover pseudogene remnants (Dataset S2A). Tremblaya panproteome was curated manually with an extensive use of MetaPathways v2.0 (89), PathwayTools v17.0 (90), and InterProscan v5.10 (91) and then, used in Prokka as trusted proteins for annotation. This approach was used to obtain identical gene names for all seven Tremblaya genomes (TPPAVE, TPPCIT, TPMHIR, TPFVIR, TPPLON, TPPMAR, and TPTPER). tRNA and tmRNA regions were reannotated using tFind.pl wrapper (bioinformatics.sandia.gov/software). Tremblaya pseudogenes were reannotated in the Artemis browser (92) based on genome alignment of all Tremblaya genomes.
Genomes of γ-proteobacterial symbionts were annotated as described for Tremblaya genomes, except that several approaches were used to assist in pseudogene annotation. Proteins split into two or more ORFs were joined into a single pseudogene feature. All proteins were then searched against the National Center for Biotechnology Information (NCBI) nonredundant protein (NR) database, and their lengths were compared with those of their hits. If the endosymbiont protein was shorter than 60% of its 10 top hits, it was called a pseudogene unless it was known to be a bifunctional protein and at least one of its domains was intact. All intergenic regions were then screened by BlastX (e value 1e−4) against NR to reveal pseudogene remnants.
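The length-based pseudogene rule described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the function name and parameters are hypothetical, and the text does not say how the lengths of the 10 top hits were summarized, so the sketch assumes their median is used as the reference length.

```python
from statistics import median


def is_length_pseudogene(query_len, top_hit_lens, threshold=0.60,
                         bifunctional_domain_intact=False):
    """Flag a protein as a putative pseudogene by relative length.

    query_len: length of the endosymbiont protein (amino acids).
    top_hit_lens: lengths of its best database hits, best first.
    threshold: fraction of the reference length below which the
        protein is flagged (0.60 follows the 60% cutoff in the text).
    bifunctional_domain_intact: True for a known bifunctional protein
        with at least one intact domain (the stated exception).
    """
    if not top_hit_lens:
        return False  # nothing to compare against
    # ASSUMPTION: the median of up to 10 top hits is the reference length.
    reference = median(top_hit_lens[:10])
    if query_len >= threshold * reference:
        return False
    return not bifunctional_domain_intact


# A 150-aa protein whose top hits are all ~300 aa falls below 60%.
print(is_length_pseudogene(150, [300] * 10))
```

Using a median rather than a mean keeps one unusually long or short hit from shifting the reference length; other summaries would be equally compatible with the wording of the text.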
Multigene matrices of conserved orthologous genes for β-proteobacteria (49 genes) and Enterobacteriaceae (80 genes) were generated by the PhyloPhlAN package (93). Sequences of genes for 16S and 23S rRNA were downloaded from the NCBI nucleotide database and used for Tremblaya- and Sodalis-allied, species-rich phylogenies. All matrices were aligned by the MAFFT v6 L-INS-i algorithm (94). Ambiguously aligned positions were excluded by trimAL v1.2 (95) with the -automated1 flag set for likelihood-based phylogenetic methods. Maximum likelihood (ML) and Bayesian inference (BI) phylogenetic methods were applied to the single-gene and concatenated amino acid alignments. ML trees were inferred using RAxML 8.2.4 (96) under the LG + G model with subtree pruning and regrafting tree search algorithm and 1,000 bootstrap pseudoreplicates. BI analyses were conducted in MrBayes 3.2.2 (97) under the LG + I + G model with 5 million generations [prset aamodel = fixed(lg), lset rates = invgamma ngammacat = 4, mcmcp checkpoint = yes ngen = 5,000,000]. Concatenated 16S–23S rRNA gene phylogenies for mealybug endosymbionts were inferred as above, except that the GTR + I + G model was used. For BI analyses, a proportion of invariable sites (I) was estimated from the data, and heterogeneity of evolutionary rates was modeled by four substitution rate categories of the γ- (G) distribution with the γ-shape parameter (α) estimated from the data. Exploration of Markov chain Monte Carlo convergence and burn-in determination were performed in AWTY (ceb.csit.fsu.edu/awty) and Tracer v1.5 (evolve.zoo.ox.ac.uk). Additionally, concatenated protein and Dayhoff6 recoded datasets were analyzed under the CAT + GTR + G model in PhyloBayes MPI 1.5a (98). Posterior distributions obtained under four independent PhyloBayes runs were compared using tracecomp and bpcomp programs, and runs were considered converged at maximum discrepancy value <0.1 and minimum effective size >100.
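For readers unfamiliar with Dayhoff6 recoding, the following Python sketch shows what the transformation does to an amino acid sequence: the 20 amino acids are collapsed into six groups of chemically similar residues, which reduces compositional heterogeneity at the cost of some signal. The grouping is the standard Dayhoff six-state scheme; the digit labels, the function name, and the handling of gaps and unknown characters are assumptions made for illustration and do not describe how the study's software encodes the states.

```python
# Standard Dayhoff six-state groups; the digit labels are arbitrary.
DAYHOFF6_GROUPS = {
    "0": "AGPST",  # small
    "1": "DENQ",   # acidic and amide
    "2": "HKR",    # basic
    "3": "ILMV",   # aliphatic
    "4": "FWY",    # aromatic
    "5": "C",      # cysteine
}
DAYHOFF6 = {aa: code for code, aas in DAYHOFF6_GROUPS.items() for aa in aas}


def recode_dayhoff6(seq):
    """Recode an aligned protein sequence into six Dayhoff states.

    Gaps ('-') are kept; any other unrecognized character becomes '?'.
    """
    recoded = []
    for residue in seq.upper():
        if residue in DAYHOFF6:
            recoded.append(DAYHOFF6[residue])
        elif residue == "-":
            recoded.append("-")
        else:
            recoded.append("?")
    return "".join(recoded)


print(recode_dayhoff6("MKC-X"))  # prints "325-?"
```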
Tremblaya genomes were aligned using progressiveMauve v2.3.1 (99). Clusters of orthologous genes were generated using OrthoMCL v1.4 (100). Orthologs missed because of low homology (BLAST e value 1e−5) were curated with the help of identical gene order and annotations. All genomes were visualized as linear with links connecting positions of orthologous genes in Processing3 (https://processing.org/). Additional figures were drawn or curated in Inkscape (https://inkscape.org/en/).
Contamination Screening and Filtering of Draft Mealybug Genomes.
Additional species in the genome data, such as facultative symbionts, environmental bacteria, and contaminants, were visualized with Taxon-Annotated GC Coverage (TAGC; drl.github.io/blobtools/) plots (101, 102), and the tool was also used to extract contigs of two γ-proteobacterial symbionts from the Pseudococcus longispinus mealybug and Wolbachia sp. from the Maconellicoccus hirsutus mealybug. We confirmed that there were no other organisms present in our data at high coverage, except the expected endosymbionts. Although there are now reliable methodologies to remove the majority of contamination from data sequenced using several independent libraries (102, 103), recognizing low-coverage contamination (in our case, mostly of bacterial, human, and plant origin) from single-library sequencing data can be problematic. Using the TAGC Tool, we were able to recognize low-coverage Propionibacterium spp. and human contamination in several of the samples (megablast e value 1e−25) and plant contamination in the P. longispinus sample. These short sequences were filtered out, and all nonsymbiont contigs or scaffolds shorter than 200 bp and/or with coverage lower than 3× were also excluded from the total assemblies.
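The final length and coverage filter can be expressed as a small Python sketch. The record type, field names, and function names are hypothetical and chosen only for illustration; the 200-bp and 3× cutoffs come from the text, and symbiont contigs are assumed to be exempt, as the wording "nonsymbiont" suggests.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    name: str
    length: int          # bp
    coverage: float      # mean read depth
    is_symbiont: bool = False


def keep_contig(contig, min_length=200, min_coverage=3.0):
    """Return True if the contig survives the length/coverage filter.

    Symbiont contigs are always kept; all others must be at least
    min_length bp long AND have at least min_coverage read depth.
    """
    if contig.is_symbiont:
        return True
    return contig.length >= min_length and contig.coverage >= min_coverage


def filter_contigs(contigs, min_length=200, min_coverage=3.0):
    """Filter an iterable of Contig records, preserving input order."""
    return [c for c in contigs if keep_contig(c, min_length, min_coverage)]


example = [
    Contig("scaffold_1", 15000, 42.0),
    Contig("scaffold_2", 150, 40.0),        # too short
    Contig("scaffold_3", 900, 1.5),         # coverage too low
    Contig("symbiont_1", 180, 2.0, True),   # symbiont, exempt
]
print([c.name for c in filter_contigs(example)])
# prints ['scaffold_1', 'symbiont_1']
```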
Draft Insect Genomes and HGTs.
Endosymbiont contigs and PhiX contigs (from the spike in of Illumina libraries) were excluded from assemblies, and insect genome assemblies were evaluated by Quast v.2.3 (104) for basic assembly statistics and by CEGMA v2.5 (105) and BUSCO v1.1 (106) with the Arthropoda dataset for gene completeness (Table S2). Lacking RNA Sequencing data to properly annotate the draft genomes, only preliminary gene predictions were carried out by unsupervised GeneMark-ES (107) runs to get exon structures for scaffolds with HGTs.
Horizontally transferred genes previously identified in the Planococcus citri genome were used as queries for BlastN, tBlastN, and tBlastX searches against custom databases made of scaffolds from individual species. Additionally, two approaches were used to minimize false negative results possibly caused by highly diverged and/or fragmented HGTs undetected by BLAST searches. First, nucleotide alignments of individual HGTs (see above) were used as Hidden Markov Model profiles in nhmmer (108) searches against scaffolds of individual assemblies. Second, BLAST databases were made out of all raw fastq reads and searched by tBlastN using protein HGTs from P. citri as queries.
Lineage-specific candidates of HGT were detected as reported previously (9) using the NR database (downloaded March 17, 2015). We used stringent screening criteria: only genes present on long scaffolds containing insect genes or present in several mealybug genomes were considered as strongly supported HGT candidates here (Table S3). Moreover, all scaffolds of HGT candidates presented here were confirmed by mapping raw read data and manually examined for low-coverage regions and potential misassemblies created by the joining of low-coverage contigs of bacterial contaminants with bona fide insect contigs.
A multigene mealybug phylogeny was inferred as above using 419 concatenated protein sequences of the core eukaryotic proteins identified from six mealybug genomes by the CEGMA package. Phylogenetic trees for individual HGTs were inferred as reported previously (9), except that the workflow was implemented using the ETE3 Python Toolkit (109).
Microscopy.
Whole-mealybug individuals stored in absolute ethanol were postfixed with 4% (vol/vol) paraformaldehyde in PBS for 1 h; dehydrated by 1-h incubations in 80%, 90%, and 100% (vol/vol) ethanol; cleared in xylene two times for 1 h each; and paraffin embedded overnight. Paraffin blocks were sectioned to 5–7 μm sections, deparaffinized in xylene two times for 5 min each, and then, hydrated through a 100%, 85%, and 70% (vol/vol) ethanol series. Hybridization was done according to the work by van Leuven et al. (110). No probe and RNase A controls were used to assess insect tissue autofluorescence. The following fluorochrome-labeled oligonucleotide probes targeting 16S rRNA were used for endosymbiont in situ hybridization of M. hirsutus [TPMHIR: 5′-Cy3-ATGCCACCCTTCCTCCCGAA-3′; Doolittlea endobia MHIR (DEMHIR): 5′-Cy5-CTTTCATTTTCTTCCCCGTT-3′] and Paracoccus marginatus [TPPMAR: ACGCCCYCCTTCATCCCGAA; Mikella endobia PMAR (MEPMAR): 5′-Cy5-TAATAACTTTCTTCCTTGCT-3′]. An Olympus FV 1000 IX Inverted Laser Scanning Confocal Microscope was used for imaging with 60× and 100× oil immersion lenses. Image postprocessing was done in Fiji v1.51a (111).
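Because each probe hybridizes to a complementary stretch of 16S rRNA, its target site can be located in a 16S rRNA gene sequence by searching for the probe's reverse complement. The Python sketch below derives those target strings in the DNA alphabet, including the degenerate base Y (C or T) in the TPPMAR probe. The helper name and the dictionary of probes are illustrative conveniences, not part of the published protocol.

```python
# IUPAC nucleotide complements, including degenerate codes.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


# Probe sequences as listed in the text, written 5' to 3'.
PROBES = {
    "TPMHIR": "ATGCCACCCTTCCTCCCGAA",
    "DEMHIR": "CTTTCATTTTCTTCCCCGTT",
    "TPPMAR": "ACGCCCYCCTTCATCCCGAA",
    "MEPMAR": "TAATAACTTTCTTCCTTGCT",
}

for name, probe in PROBES.items():
    # The target site in the 16S rRNA gene (sense strand, 5' to 3').
    print(name, reverse_complement(probe))
```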
Acknowledgments
We thank the Genomics Core Facility at the University of Montana, the DNA Sequencing Facility at the University of Utah, and the European Molecular Biology Laboratory Genomics Core Facility in Heidelberg for sequencing services. F.H. was funded by the Fulbright Commission and Grant Agency of the University of South Bohemia Grant 04-001/2014/P. J.P.M. was funded by National Science Foundation (NSF) Grants IOS-1256680 and IOS-1553529, National Aeronautics and Space Administration Astrobiology Institute Award NNA15BB04A, and NSF-Experimental Program to Stimulate Competitive Research Award NSF-IIA-1443108 (to the Montana Institute on Ecosystems).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The nine complete endosymbiont genomes, five draft assemblies of insect genomes, and raw data have been deposited into the European Nucleotide Archive (ENA; accession nos.: Maconellicoccus hirsutus: PRJEB12066; Ferrisia virgata: PRJEB12067; Pseudococcus longispinus: PRJEB12068; Paracoccus marginatus: PRJEB12069; and Trionymus perrisii: PRJEB12071). Unannotated draft genomes of two Enterobacteriaceae symbionts from P. longispinus mealybugs and a B-supergroup Wolbachia strain sequenced from M. hirsutus mealybugs were deposited in Figshare (accession nos. 10.6084/m9.figshare.2010393 and 10.6084/m9.figshare.2010390).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603910113/-/DCSupplemental.
References
- 1. Gray MW, Doolittle WF. Has the endosymbiont hypothesis been proven? Microbiol Rev. 1982;46(1):1–42. doi: 10.1128/mr.46.1.1-42.1982.
- 2. Palmer JD. Organelle genomes: Going, going, gone! Science. 1997;275(5301):790–791. doi: 10.1126/science.275.5301.790.
- 3. Martin W, Müller M. The hydrogen hypothesis for the first eukaryote. Nature. 1998;392(6671):37–41. doi: 10.1038/32096.
- 4. Embley TM, Martin W. Eukaryotic evolution, changes and challenges. Nature. 2006;440(7084):623–630. doi: 10.1038/nature04546.
- 5. Douglas AE. Mycetocyte symbiosis in insects. Biol Rev Camb Philos Soc. 1989;64(4):409–434. doi: 10.1111/j.1469-185x.1989.tb00682.x.
- 6. Nowack ECM, Melkonian M. Endosymbiotic associations within protists. Philos Trans R Soc Lond B Biol Sci. 2010;365(1541):699–712. doi: 10.1098/rstb.2009.0188.
- 7. Stewart FJ, Newton ILG, Cavanaugh CM. Chemosynthetic endosymbioses: Adaptations to oxic-anoxic interfaces. Trends Microbiol. 2005;13(9):439–448. doi: 10.1016/j.tim.2005.07.007.
- 8. Nakayama T, Ishida K. Another acquisition of a primary photosynthetic organelle is underway in Paulinella chromatophora. Curr Biol. 2009;19(7):R284–R285. doi: 10.1016/j.cub.2009.02.043.
- 9. Husnik F, et al. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell. 2013;153(7):1567–1578. doi: 10.1016/j.cell.2013.05.040.
- 10. Sloan DB, et al. Parallel histories of horizontal gene transfer facilitated extreme reduction of endosymbiont genomes in sap-feeding insects. Mol Biol Evol. 2014;31(4):857–871. doi: 10.1093/molbev/msu004.
- 11. Luan J-B, et al. Metabolic coevolution in the bacterial symbiosis of whiteflies and related plant sap-feeding insects. Genome Biol Evol. 2015;7(9):2635–2647. doi: 10.1093/gbe/evv170.
- 12. McCutcheon JP, Moran NA. Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol. 2011;10(1):13–26. doi: 10.1038/nrmicro2670.
- 13. Moran NA, Bennett GM. The tiniest tiny genomes. Annu Rev Microbiol. 2014;68:195–215. doi: 10.1146/annurev-micro-091213-112901.
- 14. Nikoh N, et al. Bacterial genes in the aphid genome: Absence of functional gene transfer from Buchnera to its host. PLoS Genet. 2010;6(2):e1000827. doi: 10.1371/journal.pgen.1000827.
- 15. Nowack ECM, et al. Endosymbiotic gene transfer and transcriptional regulation of transferred genes in Paulinella chromatophora. Mol Biol Evol. 2011;28(1):407–422. doi: 10.1093/molbev/msq209.
- 16. Nowack ECM, Grossman AR. Trafficking of protein into the recently established photosynthetic organelles of Paulinella chromatophora. Proc Natl Acad Sci USA. 2012;109(14):5340–5345. doi: 10.1073/pnas.1118800109.
- 17. Nakabachi A, Ishida K, Hongoh Y, Ohkuma M, Miyagishima SY. Aphid gene of bacterial origin encodes a protein transported to an obligate endosymbiont. Curr Biol. 2014;24(14):R640–R641. doi: 10.1016/j.cub.2014.06.038.
- 18. Theissen U, Martin W. The difference between organelles and endosymbionts. Curr Biol. 2006;16(24):R1016–R1017. doi: 10.1016/j.cub.2006.11.020.
- 19. Keeling PJ, Archibald JM. Organelle evolution: What’s in a name? Curr Biol. 2008;18(8):R345–R347. doi: 10.1016/j.cub.2008.02.065.
- 20. McCutcheon JP, Keeling PJ. Endosymbiosis: Protein targeting further erodes the organelle/symbiont distinction. Curr Biol. 2014;24(14):R654–R655. doi: 10.1016/j.cub.2014.05.073.
- 21. Keeling PJ, McCutcheon JP, Doolittle WF. Symbiosis becoming permanent: Survival of the luckiest. Proc Natl Acad Sci USA. 2015;112(33):10101–10103. doi: 10.1073/pnas.1513346112.
- 22. Sloan DB, Moran NA. Genome reduction and co-evolution between the primary and secondary bacterial symbionts of psyllids. Mol Biol Evol. 2012;29(12):3781–3792. doi: 10.1093/molbev/mss180.
- 23. Bennett GM, Moran NA. Small, smaller, smallest: The origins and evolution of ancient dual symbioses in a phloem-feeding insect. Genome Biol Evol. 2013;5(9):1675–1688. doi: 10.1093/gbe/evt118.
- 24. Nakabachi A, et al. Defensive bacteriome symbiont with a drastically reduced genome. Curr Biol. 2013;23(15):1478–1484. doi: 10.1016/j.cub.2013.06.027.
- 25. Manzano-Marín A, Latorre A. Settling down: The genome of Serratia symbiotica from the aphid Cinara tujafilina zooms in on the process of accommodation to a cooperative intracellular life. Genome Biol Evol. 2014;6(7):1683–1698. doi: 10.1093/gbe/evu133.
- 26. Moran NA. Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc Natl Acad Sci USA. 1996;93(7):2873–2878. doi: 10.1073/pnas.93.7.2873.
- 27. Fares MA, Barrio E, Sabater-Muñoz B, Moya A. The evolution of the heat-shock protein GroEL from Buchnera, the primary endosymbiont of aphids, is governed by positive selection. Mol Biol Evol. 2002;19(7):1162–1170. doi: 10.1093/oxfordjournals.molbev.a004174.
- 28. Bennett GM, Moran NA. Heritable symbiosis: The advantages and perils of an evolutionary rabbit hole. Proc Natl Acad Sci USA. 2015;112(33):10169–10176. doi: 10.1073/pnas.1421388112.
- 29. Popadin KY, Nikolaev SI, Junier T, Baranova M, Antonarakis SE. Purifying selection in mammalian mitochondrial protein-coding genes is highly effective and congruent with evolution of nuclear genes. Mol Biol Evol. 2013;30(2):347–355. doi: 10.1093/molbev/mss219.
- 30. Cooper BS, Burrus CR, Ji C, Hahn MW, Montooth KL. Similar efficacies of selection shape mitochondrial and nuclear genes in both Drosophila melanogaster and Homo sapiens. G3 (Bethesda). 2015;5(10):2165–2176. doi: 10.1534/g3.114.016493.
- 31. Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: Reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci USA. 2015;112(33):10177–10184. doi: 10.1073/pnas.1422049112.
- 32. McCutcheon JP, Moran NA. Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. Proc Natl Acad Sci USA. 2007;104(49):19392–19397. doi: 10.1073/pnas.0708855104.
- 33. Koga R, Bennett GM, Cryan JR, Moran NA. Evolutionary replacement of obligate symbionts in an ancient and diverse insect lineage. Environ Microbiol. 2013;15(7):2073–2081. doi: 10.1111/1462-2920.12121.
- 34. Koga R, Moran NA. Swapping symbionts in spittlebugs: Evolutionary replacement of a reduced genome symbiont. ISME J. 2014;8(6):1237–1246. doi: 10.1038/ismej.2013.235.
- 35. Thao ML, et al. Secondary endosymbionts of psyllids have been acquired multiple times. Curr Microbiol. 2000;41(4):300–304. doi: 10.1007/s002840010138.
- 36. Lamelas A, et al. Serratia symbiotica from the aphid Cinara cedri: A missing link from facultative to obligate insect endosymbiont. PLoS Genet. 2011;7(11):e1002357. doi: 10.1371/journal.pgen.1002357.
- 37. Vogel KJ, Moran NA. Functional and evolutionary analysis of the genome of an obligate fungal symbiont. Genome Biol Evol. 2013;5(5):891–904. doi: 10.1093/gbe/evt054.
- 38. Smith WA, et al. Phylogenetic analysis of symbionts in feather-feeding lice of the genus Columbicola: Evidence for repeated symbiont replacements. BMC Evol Biol. 2013;13(1):109. doi: 10.1186/1471-2148-13-109.
- 39. Lefèvre C, et al. Endosymbiont phylogenesis in the Dryophthoridae weevils: Evidence for bacterial replacement. Mol Biol Evol. 2004;21(6):965–973. doi: 10.1093/molbev/msh063.
- 40. Toju H, Tanabe AS, Notsu Y, Sota T, Fukatsu T. Diversification of endosymbiosis: Replacements, co-speciation and promiscuity of bacteriocyte symbionts in weevils. ISME J. 2013;7(7):1378–1390. doi: 10.1038/ismej.2013.27.
- 41. Gruwell ME, Hardy NB, Gullan PJ, Dittmar K. Evolutionary relationships among primary endosymbionts of the mealybug subfamily Phenacoccinae (Hemiptera: Coccoidea: Pseudococcidae). Appl Environ Microbiol. 2010;76(22):7521–7525. doi: 10.1128/AEM.01354-10.
- 42. McCutcheon JP, von Dohlen CD. An interdependent metabolic patchwork in the nested symbiosis of mealybugs. Curr Biol. 2011;21(16):1366–1372. doi: 10.1016/j.cub.2011.06.051.
- 43. Thao ML, Gullan PJ, Baumann P. Secondary (gamma-Proteobacteria) endosymbionts infect the primary (beta-Proteobacteria) endosymbionts of mealybugs multiple times and coevolve with their hosts. Appl Environ Microbiol. 2002;68(7):3190–3197. doi: 10.1128/AEM.68.7.3190-3197.2002.
- 44. Kono M, Koga R, Shimada M, Fukatsu T. Infection dynamics of coexisting beta- and gammaproteobacteria in the nested endosymbiotic system of mealybugs. Appl Environ Microbiol. 2008;74(13):4175–4184. doi: 10.1128/AEM.00250-08.
- 45. von Dohlen CD, Kohler S, Alsop ST, McManus WR. Mealybug β-proteobacterial endosymbionts contain γ-proteobacterial symbionts. Nature. 2001;412(6845):433–436. doi: 10.1038/35086563.
- 46. López-Madrigal S, et al. Molecular evidence for ongoing complementarity and horizontal gene transfer in endosymbiotic systems of mealybugs. Front Microbiol. 2014;5:449. doi: 10.3389/fmicb.2014.00449.
- 47. Parkinson JF, Gobin B, Hughes WOH. Heritability of symbiont density reveals distinct regulatory mechanisms in a tripartite symbiosis. Ecol Evol. 2016;6(7):2053–2060. doi: 10.1002/ece3.2005.
- 48. Gatehouse LN, Sutherland P, Forgie SA, Kaji R, Christeller JT. Molecular and histological characterization of primary (betaproteobacteria) and secondary (gammaproteobacteria) endosymbionts of three mealybug species. Appl Environ Microbiol. 2012;78(4):1187–1197. doi: 10.1128/AEM.06340-11.
- 49. Koga R, Nikoh N, Matsuura Y, Meng XY, Fukatsu T. Mealybugs with distinct endosymbiotic systems living on the same host plant. FEMS Microbiol Ecol. 2013;83(1):93–100. doi: 10.1111/j.1574-6941.2012.01450.x.
- 50. Buchner P. Endosymbiosis of Animals with Plant Microorganisms. Interscience Publishers; New York: 1965. p. 909.
- 51. Denton JF, et al. Extensive error in the number of genes inferred from draft genome assemblies. PLOS Comput Biol. 2014;10(12):e1003998. doi: 10.1371/journal.pcbi.1003998.
- 52. International Aphid Genomics Consortium. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. 2010;8(2):e1000313. doi: 10.1371/journal.pbio.1000313.
- 53. Moran NA, Mira A. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2001;2(12):H0054. doi: 10.1186/gb-2001-2-12-research0054.
- 54. Frank AC, Amiri H, Andersson SG. Genome deterioration: Loss of repeated sequences and accumulation of junk DNA. Genetica. 2002;115(1):1–12. doi: 10.1023/a:1016064511533.
- 55. Moran NA. Microbial minimalism: Genome reduction in bacterial pathogens. Cell. 2002;108(5):583–586. doi: 10.1016/s0092-8674(02)00665-7.
- 56. Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet. 2008;42:165–190. doi: 10.1146/annurev.genet.41.110306.130119.
- 57. Moya A, Peretó J, Gil R, Latorre A. Learning how to live together: Genomic insights into prokaryote-animal symbioses. Nat Rev Genet. 2008;9(3):218–229. doi: 10.1038/nrg2319.
- 58. Clayton AL, et al. A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect-bacterial symbioses. PLoS Genet. 2012;8(11):e1002990. doi: 10.1371/journal.pgen.1002990.
- 59. Oakeson KF, et al. Genome degeneration and adaptation in a nascent stage of symbiosis. Genome Biol Evol. 2014;6(1):76–93. doi: 10.1093/gbe/evt210.
- 60. McCutcheon JP, Moran NA. Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution. Genome Biol Evol. 2010;2:708–718. doi: 10.1093/gbe/evq055.
- 61. Rosenblueth M, Sayavedra L, Sámano-Sánchez H, Roth A, Martínez-Romero E. Evolutionary relationships of flavobacterial and enterobacterial endosymbionts with their scale insect hosts (Hemiptera: Coccoidea). J Evol Biol. 2012;25(11):2357–2368. doi: 10.1111/j.1420-9101.2012.02611.x.
- 62. Toh H, et al. Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res. 2006;16(2):149–156. doi: 10.1101/gr.4106106.
- 63. Hosokawa T, Kaiwa N, Matsuura Y, Kikuchi Y, Fukatsu T. Infection prevalence of Sodalis symbionts among stinkbugs. Zoological Lett. 2015;1(1):5. doi: 10.1186/s40851-014-0009-5.
- 64. Dale C, Young SA, Haydon DT, Welburn SC. The insect endosymbiont Sodalis glossinidius utilizes a type III secretion system for cell invasion. Proc Natl Acad Sci USA. 2001;98(4):1883–1888. doi: 10.1073/pnas.021450998.
- 65. Duncan RP, et al. Dynamic recruitment of amino acid transporters to the insect/symbiont interface. Mol Ecol. 2014;23(6):1608–1623. doi: 10.1111/mec.12627.
- 66. Michalik A, Jankowska W, Kot M, Gołas A, Szklarzewicz T. Symbiosis in the green leafhopper, Cicadella viridis (Hemiptera, Cicadellidae). Association in statu nascendi? Arthropod Struct Dev. 2014;43(6):579–587. doi: 10.1016/j.asd.2014.07.005.
- 67. Kobiałka M, Michalik A, Walczak M, Junkiert Ł, Szklarzewicz T. Sulcia symbiont of the leafhopper Macrosteles laevis (Ribaut, 1927) (Insecta, Hemiptera, Cicadellidae: Deltocephalinae) harbors Arsenophonus bacteria. Protoplasma. 2016;253(3):903–912. doi: 10.1007/s00709-015-0854-x.
- 68. Wang Z, Wu M. Phylogenomic reconstruction indicates mitochondrial ancestor was an energy parasite. PLoS One. 2014;9(10):e110685. doi: 10.1371/journal.pone.0110685.
- 69. Ochoa de Alda JAG, Esteban R, Diago ML, Houmard J. The plastid ancestor originated among one of the major cyanobacterial lineages. Nat Commun. 2014;5:4937. doi: 10.1038/ncomms5937.
- 70. Booth A, Doolittle WF. Eukaryogenesis, how special really? Proc Natl Acad Sci USA. 2015;112(33):10278–10285. doi: 10.1073/pnas.1421376112.
- 71. Kurland CG, Andersson SG. Origin and evolution of the mitochondrial proteome. Microbiol Mol Biol Rev. 2000;64(4):786–820. doi: 10.1128/mmbr.64.4.786-820.2000.
- 72. Gray MW. Mosaic nature of the mitochondrial proteome: Implications for the origin and evolution of mitochondria. Proc Natl Acad Sci USA. 2015;112(33):10133–10138. doi: 10.1073/pnas.1421379112.
- 73. Ku C, et al. Endosymbiotic gene transfer from prokaryotic pangenomes: Inherited chimerism in eukaryotes. Proc Natl Acad Sci USA. 2015;112(33):10139–10146. doi: 10.1073/pnas.1421385112.
- 74. Zimorski V, Ku C, Martin WF, Gould SB. Endosymbiotic theory for organelle origins. Curr Opin Microbiol. 2014;22:38–48. doi: 10.1016/j.mib.2014.09.008.
- 75. Ettema TJG. Evolution: Mitochondria in the second act. Nature. 2016;531(7592):39–40. doi: 10.1038/nature16876.
- 76. Pittis AA, Gabaldón T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature. 2016;531(7592):101–104. doi: 10.1038/nature16941.
- 77. Ku C, et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature. 2015;524(7566):427–432. doi: 10.1038/nature14963.
- 78. Koonin EV. Archaeal ancestors of eukaryotes: Not so elusive any more. BMC Biol. 2015;13(1):84. doi: 10.1186/s12915-015-0194-5.
- 79. Larkum AWD, Lockhart PJ, Howe CJ. Shopping for plastids. Trends Plant Sci. 2007;12(5):189–195. doi: 10.1016/j.tplants.2007.03.011.
- 80. Fares MA, Ruiz-González MX, Moya A, Elena SF, Barrio E. Endosymbiotic bacteria: groEL buffers against deleterious mutations. Nature. 2002;417(6887):398. doi: 10.1038/417398a.
- 81. Andersson JO, Andersson SG. Insights into the evolutionary process of genome degradation. Curr Opin Genet Dev. 1999;9(6):664–671. doi: 10.1016/s0959-437x(99)00024-6.
- 82. Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170.
- 83. Bankevich A, et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–477. doi: 10.1089/cmb.2012.0021.
- 84. Boisvert S, Laviolette F, Corbeil J. Ray: Simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol. 2010;17(11):1519–1533. doi: 10.1089/cmb.2009.0238.
- 85. Ruby JG, Bellare P, Derisi JL. PRICE: Software for the targeted assembly of components of (Meta) genomic sequence data. G3 (Bethesda). 2013;3(5):865–880. doi: 10.1534/g3.113.005967.
- 86. Walker BJ, et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. doi: 10.1371/journal.pone.0112963.
- 87. Hunt M, et al. REAPR: A universal tool for genome assembly evaluation. Genome Biol. 2013;14(5):R47. doi: 10.1186/gb-2013-14-5-r47.
- 88. Seemann T. Prokka: Rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069. doi: 10.1093/bioinformatics/btu153.
- 89. Konwar KM, Hanson NW, Pagé AP, Hallam SJ. MetaPathways: A modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics. 2013;14(1):202. doi: 10.1186/1471-2105-14-202.
- 90. Karp PD, et al. Pathway Tools version 13.0: Integrated software for pathway/genome informatics and systems biology. Brief Bioinform. 2010;11(1):40–79. doi: 10.1093/bib/bbp043.
- 91. Jones P, et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–1240. doi: 10.1093/bioinformatics/btu031.
- 92. Rutherford K, et al. Artemis: Sequence visualization and annotation. Bioinformatics. 2000;16(10):944–945. doi: 10.1093/bioinformatics/16.10.944.
- 93. Segata N, Börnigen D, Morgan XC, Huttenhower C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat Commun. 2013;4:2304. doi: 10.1038/ncomms3304.
- 94. Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008;9(4):286–298. doi: 10.1093/bib/bbn013.
- 95. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–1973. doi: 10.1093/bioinformatics/btp348.
- 96. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. doi: 10.1093/bioinformatics/btu033.
- 97. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19(12):1572–1574. doi: 10.1093/bioinformatics/btg180.
- 98. Lartillot N, Rodrigue N, Stubbs D, Richer J. PhyloBayes MPI: Phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst Biol. 2013;62(4):611–615. doi: 10.1093/sysbio/syt022.
- 99. Darling AE, Mau B, Perna NT. progressiveMauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 2010;5(6):e11147. doi: 10.1371/journal.pone.0011147.
- 100. Li L, Stoeckert CJ, Jr, Roos DS. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–2189. doi: 10.1101/gr.1224503.
- 101. Kumar S, Jones M, Koutsovoulos G, Clarke M, Blaxter M. Blobology: Exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front Genet. 2013;4:237. doi: 10.3389/fgene.2013.00237.
- 102. Koutsovoulos G, et al. No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proc Natl Acad Sci USA. 2016;113(18):5053–5058. doi: 10.1073/pnas.1600338113.
- 103. Delmont TO, Eren AM. Identifying contamination with advanced visualization and analysis practices: Metagenomic approaches for eukaryotic genome assemblies. PeerJ. 2016;4:e1839. doi: 10.7717/peerj.1839.
- 104. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi: 10.1093/bioinformatics/btt086.
- 105. Parra G, Bradnam K, Korf I. CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–1067. doi: 10.1093/bioinformatics/btm071.
- 106. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
- 107. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–6506. doi: 10.1093/nar/gki937.
- 108. Wheeler TJ, Eddy SR. nhmmer: DNA homology search with profile HMMs. Bioinformatics. 2013;29(19):2487–2489. doi: 10.1093/bioinformatics/btt403.
- 109. Huerta-Cepas J, Dopazo J, Gabaldón T. ETE: A python environment for tree exploration. BMC Bioinformatics. 2010;11(1):24. doi: 10.1186/1471-2105-11-24.
- 110. Van Leuven JT, Meister RC, Simon C, McCutcheon JP. Sympatric speciation in a bacterial endosymbiont results in two genomes with the functionality of one. Cell. 2014;158(6):1270–1280. doi: 10.1016/j.cell.2014.07.047.
- 111. Schindelin J, et al. Fiji: An open-source platform for biological-image analysis. Nat Methods. 2012;9(7):676–682. doi: 10.1038/nmeth.2019.