Genome Biology and Evolution. 2014 Aug 21;6(9):2350–2360. doi: 10.1093/gbe/evu179

Phylogenomic Study Indicates Widespread Lateral Gene Transfer in Entamoeba and Suggests a Past Intimate Relationship with Parabasalids

Jessica R Grant 1, Laura A Katz 1,2,*
PMCID: PMC4217692  PMID: 25146649

Abstract

Lateral gene transfer (LGT) has impacted the evolutionary history of eukaryotes, though to a lesser extent than in bacteria and archaea. Detecting LGT and distinguishing it from single gene tree artifacts is difficult, particularly when considering very ancient events (i.e., over hundreds of millions of years). Here, we use two independent lines of evidence (a taxon-rich phylogenetic approach and an assessment of the patterns of gene presence/absence) to evaluate the extent of LGT in the parasitic amoebozoan genus Entamoeba. Previous work has suggested that a number of genes in the genome of Entamoeba spp. were acquired by LGT. Our approach, using an automated phylogenomic pipeline to build taxon-rich gene trees, suggests that LGT is more extensive than previously thought. Our analyses reveal that genes have frequently entered the Entamoeba genome via nonvertical events, including at least 116 genes acquired directly from bacteria or archaea, plus an additional 22 genes in which Entamoeba plus one other eukaryote are nested among bacteria and/or archaea. These genes may make good candidates for novel therapeutics, as drugs targeting these genes are less likely to impact the human host. Although we recognize the challenges of inferring intradomain transfers given systematic errors in gene trees, we find 109 genes supporting LGT from a eukaryote to Entamoeba spp., and 178 genes unique to Entamoeba spp. and one other eukaryotic taxon (i.e., presence/absence data). Inspection of these intradomain LGTs provides evidence of a common sister relationship between genes of Entamoeba (Amoebozoa) and parabasalids (Excavata). We speculate that this indicates a past close relationship (e.g., symbiosis) between ancestors of these extant lineages.

Keywords: microbial eukaryotes, parasites, Trichomonas vaginalis, horizontal gene transfer, LGT, HGT

Introduction

Entamoeba histolytica is a human parasite that has a significant impact on health worldwide (Stanley 2003). Although initial phylogenetic analyses placed Entamoeba as an early diverging eukaryote, more recent studies based on greater numbers of genes and more sophisticated methods have shown that Entamoeba is a highly derived member of the Amoebozoa (see Embley 2006), one of the major clades of eukaryotes (Lühe 1913; Cavalier-Smith 1998). Previous work suggests that some genes in Entamoeba are of bacterial or archaeal origin (Yang et al. 1994; Rosenthal et al. 1997; Ali, Shigeta, et al. 2004), and the original annotation of the E. histolytica genome revealed examples of lateral gene transfer (LGT; Loftus and Hall 2005; Loftus et al. 2005; Clark et al. 2007). The original estimate of 96 genes involved in LGT was lowered to 68 when the genome was reassessed in 2007 (Clark et al. 2007) and again in 2010 (Lorenzi et al. 2010). Most of these 68 genes appear to be transfers from bacteria, but others do not have a closely related bacterial donor (at least not one with available sequence data for comparison) and may have been transferred from another eukaryote (Loftus et al. 2005). The genomes of the human parasite E. histolytica, and the closely related E. dispar and E. invadens, have been sequenced and there are expressed sequence tag (EST) data for E. nuttalli and Mastigamoeba balamuthi, a free-living relative. We used these data in a taxon-rich phylogenomic pipeline (Grant and Katz 2014) to assess the impact of LGT on the Entamoeba genome.

Despite barriers to integrating foreign DNA into a genome (Andersson 2005; Baltrus 2013), it is apparent that LGT has impacted the evolution of eukaryotes as well as bacteria and archaea (Katz 2002; Andersson et al. 2003; Keeling and Palmer 2008). Many of the genes affected by LGT are involved in metabolism (Nixon et al. 2002; Ali, Hashimoto, et al. 2004; Ali, Shigeta, et al. 2004; Anderson and Loftus 2005), and it seems likely that LGT has influenced the independent evolution of an anaerobic lifestyle in many eukaryotic lineages (Ginger et al. 2010; Hug et al. 2010).

Supported discordance among gene trees provides evidence for LGT, as a gene with a history of LGT will cluster with the donor lineage in phylogenetic trees long after it has been transferred to another genome. Yet single gene trees are notoriously error prone, and errors in the inference of LGT can be made when donor lineages are not represented on the tree (Beiko and Ragan 2009). Several inferences of LGTs, based on phylogenetic relationships, have been falsified by trees with improved taxon sampling and more sophisticated methods (Richards et al. 2003; Andersson 2005). Hence, a taxon-rich approach is needed to assess cases of LGT. Further, there is greater power in detecting LGT between distantly related species (i.e., interdomain events). For example, a gene of bacterial ancestry is quite distinct and often easy to recognize after it has been transferred to a eukaryotic genome.

The pattern of presence or absence of a gene can also be a strong indicator of LGT, especially for genes found only among members of distantly related lineages, though the impact of gene loss cannot be discounted (Zmasek and Godzik 2011; Wolf and Koonin 2013). For example, a gene found only among diverse bacteria and one clade of eukaryotes could be explained by assuming the gene was present in the last common ancestor of eukaryotes and that it was lost in all lineages except one. However, LGT from a bacterium to the ancestor of the clade that shares the gene is a more parsimonious explanation (Ragan 2001). Thus, searching for unusual patterns of taxa represented in gene alignments can be used to detect LGT (Lake and Rivera 2004; Cohen and Pupko 2010; Cohen et al. 2011; Le et al. 2012).
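
The parsimony argument above can be made concrete with a small counting exercise. The sketch below is illustrative only: the rooted eukaryote tree, the taxon names, and the equal weighting of one loss against one transfer are all assumptions made for the example, not part of the analyses reported here. It counts how many independent losses are needed if the gene was present in the last common ancestor, and compares that with a single transfer.

```python
# Toy comparison of two explanations for a patchy gene distribution.
# The tree and taxon names are hypothetical placeholders.

def leaves(node):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result

def losses_if_ancestral(node, has_gene):
    """Count the losses needed if the gene was present at the root.

    Each maximal subtree with no gene-bearing leaf costs one loss.
    """
    if not (leaves(node) & has_gene):
        return 1
    if isinstance(node, str):
        return 0
    return sum(losses_if_ancestral(child, has_gene) for child in node)

# Hypothetical rooted eukaryote tree with eight lineages.
euk_tree = ((("Entamoeba", "Dictyostelium"), ("animals", "fungi")),
            (("plants", "ciliates"), ("Trichomonas", "Giardia")))
has_gene = {"Entamoeba"}  # the only eukaryote carrying the gene

n_losses = losses_if_ancestral(euk_tree, has_gene)
n_transfers = 1  # one bacterium-to-Entamoeba transfer explains the same pattern
print(n_losses, n_transfers)
```

On this toy tree the ancestral-presence scenario needs several independent losses where the LGT scenario needs one event, and the gap widens as more gene-lacking lineages are sampled.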

Here, we take two approaches to investigate the impact of LGT on the Entamoeba genome: 1) analyze taxon-rich phylogenetic gene trees and assess evidence for LGT in Entamoeba from both bacterial and archaeal lineages, and to a lesser extent from other eukaryotes; and 2) catalog examples of patterns of gene presence/absence in Entamoeba plus bacteria/archaea as further evidence of potential LGTs. Both approaches suggest a greater number of LGT events in the genome of Entamoeba than previously documented. To our surprise, both also provide evidence of a relationship between ancestors of Entamoeba and parabasalids such as Trichomonas vaginalis, another human parasite that is phylogenetically distant from Entamoeba on the eukaryotic tree of life.

Materials and Methods

Initial Pipeline

The starting point for these analyses is a set of orthologous groups from OrthoMCL (Li et al. 2003), a database of clustered orthologous groups that includes taxa from 105 whole genomes including E. histolytica, E. dispar, and E. invadens. We chose the 6,107 genes in OrthoMCL that contained E. histolytica to seed our phylogenomic pipeline (Grant and Katz 2014). Another species of Entamoeba, E. nuttalli, and a free-living relative of Entamoeba, Mastigamoeba balamuthi, were included in the data added by the pipeline along with 237 eukaryotes, 485 bacteria, and 59 archaea (supplementary table S1, Supplementary Material online). The output of this pipeline includes, for each orthologous group, a robust single-gene alignment and a most likely tree built in RAxML 7.2.8 (Stamatakis 2006; Stamatakis et al. 2008) with model setting PROTGAMMALG.

Of the 6,107 starting genes, 4,000 were excluded, leaving 2,107 for further analysis. Some genes were not recovered at the end of the pipeline because they had fewer than two taxa, because no characters remained after masking positions with more than 50% missing data in the alignments, or because the Entamoeba sequences were removed (180 genes). Removal of Entamoeba spp. can occur when the original cluster of sequences in OrthoMCL is too divergent to satisfy the stringent criteria of our phylogenomic pipeline (Grant and Katz 2014). An additional 3,664 genes were uninformative because the group included only Entamoeba and no other taxa. Finally, we chose to discard those genes that had only one Entamoeba sequence (57 genes), or Entamoeba plus only one other sequence (99 genes), to eliminate potential cases of contamination, though we understand cases of recent LGT may have been missed here. Together, these exclusions account for the 4,000 genes (180 + 3,664 + 57 + 99).

Inferences from Pipeline

Single-gene alignments for the remaining 2,107 genes from the pipeline output were analyzed in FastTree (Price et al. 2009, 2010) as a first assessment. One thousand bootstrap (BS) replicates were built under the WAG model. A consensus tree was built from these replicates, and nodes with <70% BS support were collapsed using custom Python scripts that implement the tree-walking methods in p4 (Foster 2004). Trees collapsed to nodes with >70% BS were examined by script and by eye to determine the supported relationships between the Entamoeba and other taxa on the tree. A total of 993 trees were not considered in our assessment of inheritance, as their BS consensus did not provide support for relationships between Entamoeba spp. and any other taxon. This left 1,114 genes to be assessed for evidence of LGT or vertical inheritance.
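
As an illustration of the collapsing step, the following minimal sketch removes weakly supported internal nodes from a tree and attaches their children to the parent, producing a polytomy. It is not the custom p4-based script used in this study; the dictionary-based tree representation, the function name `collapse`, and the example taxon labels are assumptions made for the example.

```python
def collapse(node, threshold=70.0):
    """Return a copy of the tree with weakly supported internal nodes removed.

    Leaves are strings; internal nodes are dicts with 'support' and 'children'.
    Children of a collapsed node are attached directly to its parent,
    which turns the weak bifurcation into a polytomy.
    """
    if isinstance(node, str):
        return node
    new_children = []
    for child in node["children"]:
        child = collapse(child, threshold)
        if isinstance(child, dict) and child["support"] < threshold:
            # Weak node: splice its children into the current node.
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    return {"support": node["support"], "children": new_children}

# Hypothetical consensus tree; the root is given nominal support of 100.
tree = {"support": 100.0, "children": [
    {"support": 95.0, "children": ["Entamoeba_histolytica", "Entamoeba_dispar"]},
    {"support": 42.0, "children": [
        "bacterium_A",
        {"support": 88.0, "children": ["bacterium_B", "bacterium_C"]},
    ]},
]}

collapsed = collapse(tree)
print(len(collapsed["children"]))
```

In this example the 42% node is dissolved, so the well-supported Entamoeba clade, `bacterium_A`, and the 88% bacterial clade all descend directly from the root.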

For the genes that suggested LGT but included other eukaryotes (Entamoeba spp. nested in bacteria and/or archaea in a gene that contains other eukaryotes [81 genes] or Entamoeba spp. sister to nonamoebozoa eukaryotes [520 genes]), we further refined our inference of LGT with the approximately unbiased (AU) test, as implemented in Consel (Shimodaira and Hasegawa 2001), testing the monophyly of Amoebozoa. In these categories, 63 and 109 genes, respectively, rejected the monophyly of Amoebozoa and were retained.

The remaining 678 genes (1,114 minus the 436 genes that did not pass the AU test) were categorized based on the topology of nodes with >70% BS support in FastTree as follows: Vertical inheritance: 253 genes; Entamoeba spp. in a tree with only bacteria and/or archaea: 53 genes; Entamoeba spp. nested in bacteria and/or archaea in a gene that contains other eukaryotes: 63 genes; Entamoeba spp. plus one other eukaryote nested in bacteria and/or archaea or as the only eukaryotes in a gene with bacteria and/or archaea: 22 genes; Entamoeba spp. sister to nonamoebozoa eukaryotes: 109 genes; Entamoeba spp. plus one other major clade of eukaryotes: 178 genes (fig. 1).
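
The gene counts reported in this section can be reconciled with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names and category labels are shorthand chosen for the example.

```python
# Filtering cascade, using the counts reported in the text.
starting_genes = 6107
excluded_before_trees = 4000
after_pipeline = starting_genes - excluded_before_trees

no_supported_relationship = 993
assessed = after_pipeline - no_supported_relationship

# Final categories (number of genes in each).
categories = {
    "vertical inheritance": 253,
    "Entamoeba with only bacteria/archaea": 53,
    "Entamoeba nested in bacteria/archaea, other eukaryotes present": 63,
    "Entamoeba plus one eukaryote nested in bacteria/archaea": 22,
    "Entamoeba sister to nonamoebozoan eukaryotes": 109,
    "Entamoeba plus one other major eukaryotic clade": 178,
}
retained = sum(categories.values())
failed_au_test = assessed - retained

print(after_pipeline, assessed, retained, failed_au_test)
```

The difference between the genes assessed and the genes retained corresponds to those for which the AU test did not reject the monophyly of Amoebozoa.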

Fig. 1.—

Number of trees supporting LGT in Entamoeba ranked from strongest to weakest support. Cartoon trees exemplifying the patterns consistent with LGT. (A) Putative interdomain LGT: Entamoeba species in a tree that otherwise includes only bacteria and/or archaea. (B) Putative interdomain LGT: Entamoeba species nested within bacteria or archaea in trees with other eukaryotic taxa; monophyletic amoebozoa rejected by AU test, or no other amoebozoa in gene. (C) Putative interdomain LGT followed by intradomain LGT: Entamoeba species with a eukaryotic sister taxon nested within bacteria and/or archaea. Relationship with other eukaryote supported with >80% bootstrap support. (D) Putative intradomain LGT: Entamoeba species in a eukaryotic clade that is distinct from other amoebozoan taxa; monophyletic Amoebozoa rejected by AU test.

Sister Relationship—Two Approaches

We addressed the issue of sister taxa in two ways. First, we investigated the genes from our initial phylogenetic inferences with Entamoeba spp. in a relationship with nonamoebozoan eukaryotes: the 22 genes with Entamoeba spp. plus nonamoebozoan eukaryotes nested within bacteria and/or archaea and 109 genes with Entamoeba spp. in a gene with other Amoebozoa but with the monophyly of Amoebozoa rejected by the AU test (fig. 1C and D). FastTree has been compared favorably to RAxML (Liu et al. 2011), but we bootstrapped a number of alignments using RAxML and found that the BS values from FastTree were inflated as compared with the values from RAxML. To be more confident in our sister relationship inferences, we rebootstrapped the 109 trees with RAxML version 7.2.8 using rapid bootstrapping with model PROTGAMMALG, determining the proper number of independent bootstrap replicates with the bootstopping criterion autoMRE (Stamatakis et al. 2005, 2008; Stamatakis 2006). After bootstrapping, we retained only those 22 genes that had BS support >80% in RAxML to a sister clade that contained only one clade of eukaryotes (e.g., trees with sister relationships to a plant and a fungus were rejected even if BS support was high).
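
The retention rule described above (BS support >80% and a sister clade drawn from a single eukaryotic clade) can be sketched as a simple filter. This is a hedged illustration rather than the script used in the study: the label format with a major-clade prefix (e.g., `Ex_` for Excavata, following the abbreviations in the fig. 4 legend), the function names, and the explicit exclusion of Amoebozoa are assumptions made for the example.

```python
def major_clade(label):
    """Return the major-clade prefix of a label such as 'Ex_Trichomonas_vaginalis'."""
    return label.split("_", 1)[0]

def keep_sister(sister_labels, bootstrap, min_support=80.0):
    """Retain a tree only if support exceeds the cutoff and the sister
    clade comes from a single nonamoebozoan major clade."""
    clades = {major_clade(label) for label in sister_labels}
    single_clade = len(clades) == 1 and "Am" not in clades
    return bootstrap > min_support and single_clade

# Hypothetical sister clades and bootstrap values.
print(keep_sister(["Ex_Trichomonas_vaginalis"], 93.0))
print(keep_sister(["Pl_Arabidopsis_thaliana", "Op_Saccharomyces_cerevisiae"], 99.0))
print(keep_sister(["Ex_Trichomonas_vaginalis"], 75.0))
```

The second call mirrors the plant-plus-fungus case mentioned in the text, which is rejected regardless of how high the support is.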

Second, in order to investigate sister relationships of Entamoeba spp. independent of our phylogenetic pipeline, we analyzed all orthologous groups from OrthoMCL made up of only Entamoeba spp. and one other eukaryotic species. To align genes and assess the robustness of the OrthoMCL groupings, fasta files downloaded from the OrthoMCL database were passed through Guidance (Penn et al. 2010), a program that builds and bootstraps multisequence alignments and scores both taxa and characters. For our phylogenomic pipeline, we used Guidance with relaxed score cutoffs (Penn et al. 2010) because the default parameters are too stringent for phylogenomic analyses given the diversity seen with our broad taxon sampling. For this presence/absence analysis, however, we wanted to have greater confidence in our call of orthology, so we used the more conservative default parameters. Most of the groups (225 of 372) were not recovered after Guidance because there were too few sequences in the alignment, either in the original OrthoMCL group (81 orthologous groups) or after removal of poorly aligned sequences (85 orthologous groups), or because Guidance removed all of the sequences from one of the two taxa (59 orthologous groups). Relationships that were retained after this screen are reported.
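
The presence/absence screen described above amounts to selecting orthologous groups whose membership is Entamoeba spp. plus exactly one other species, then tallying the partner species. The sketch below illustrates that selection under stated assumptions: the group identifiers and memberships are invented for the example, and the function name `partner_species` is hypothetical. It does not reproduce the Guidance alignment-scoring step.

```python
from collections import Counter

ENTAMOEBA = {"E. histolytica", "E. dispar", "E. invadens"}

def partner_species(group_species):
    """Return the single non-Entamoeba species in a group, or None.

    A group qualifies only if it contains at least one Entamoeba species
    and exactly one other species.
    """
    species = set(group_species)
    others = species - ENTAMOEBA
    if species & ENTAMOEBA and len(others) == 1:
        return next(iter(others))
    return None

# Hypothetical orthologous groups (IDs and memberships are made up).
groups = {
    "group_1": ["E. histolytica", "E. dispar", "T. vaginalis"],
    "group_2": ["E. histolytica", "D. discoideum"],
    "group_3": ["E. histolytica", "T. vaginalis", "A. thaliana"],
    "group_4": ["T. vaginalis", "A. thaliana"],
    "group_5": ["E. histolytica", "E. invadens"],
}

partners = {gid: partner_species(members) for gid, members in groups.items()}
partner_counts = Counter(p for p in partners.values() if p is not None)
print(partner_counts.most_common())
```

Groups with two or more partner species, with no Entamoeba, or with Entamoeba alone are skipped, so only the first two example groups contribute to the tally.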

Functional Comparisons

To assess the function of the genes categorized as having either vertical or lateral descent, we used Blast2GO (Conesa et al. 2005) with default parameters to assign gene ontologies to the E. histolytica sequences. Entamoeba histolytica sequences from the 253 genes with strong support for vertical inheritance and the 116 genes with strong support for interdomain LGT (table 1 and supplementary table S2, Supplementary Material online) were used. We compared the Level 2 Biological Processes assigned by Blast2GO between these two groups of genes.

Table 1.

Number of Genes Supporting Vertical versus Lateral Inheritance

| Inheritance Type | Total Number | Placement of Entamoeba in Trees | Number |
| --- | --- | --- | --- |
| Vertical inheritance | 253 | Tree topology: Entamoeba with other Amoebozoa | 197 |
| | | Present only in Entamoeba and other Amoebozoa | 56 |
| Interdomain LGT | 116 | Present only in Entamoeba plus bacteria and/or archaea | 53 |
| | | Tree topology: Entamoeba within bacteria and/or archaea | 63 |
| Interdomain + intradomain LGT | 22 | Tree topology: Entamoeba sister to a nonamoebozoan eukaryote nested within bacteria and/or archaea | 22 |
| Intradomain LGT | 287 | Tree topology: Entamoeba with eukaryotic sister, monophyly of Amoebozoa rejected by AU test | 109 |
| | | Present only in Entamoeba plus one other nonamoebozoan eukaryote | 178 |
| Total | 678 | | |

Note.—Patterns of inheritance interpreted from tree topologies generated by the phylogenomic pipeline and from patterns of gene presence/absence. Inheritance types are broken into subgroups depending on the topological evidence for the type, and the number of trees in each category is given. Additional details as in Materials and Methods.

Results

To investigate the impact of LGT on the parasitic genus Entamoeba phylogenetically, we used aligned sequences and gene trees produced from our phylogenomic pipeline (Grant and Katz 2014). After preliminary screening, we retained 1,114 genes that met our initial criteria of sufficient taxon sampling and BS support for a relationship for Entamoeba spp. (see Materials and Methods). For those genes present in other amoebozoan taxa, we used the AU test (Shimodaira and Hasegawa 2001) to evaluate the alternative hypothesis that Entamoeba spp. inherited the gene vertically and the tree topology is spurious. We removed 436 genes for which the AU test did not reject the monophyly of Amoebozoa, leaving 678 genes in our analyses.

From our sample of 678 genes, 253 have patterns consistent with vertical inheritance from amoebozoan ancestors: 197 trees with topological evidence (i.e., Entamoeba spp. in a clade with other amoebozoan taxa) and 56 trees with presence/absence support (i.e., alignments containing only Entamoeba spp. plus other Amoebozoa; table 1 and supplementary table S2, Supplementary Material online).

We also looked for evidence of interdomain LGT to explain the presence of genes in Entamoeba spp. We identified 116 genes consistent with a single interdomain LGT event giving rise to the genes in Entamoeba spp. (table 1): 53 with gene presence/absence evidence of LGT (i.e., genes with only Entamoeba spp. and bacteria and/or archaea; fig. 1A and table 1) and 63 trees with phylogenetic evidence of LGT (i.e., Entamoeba spp. nested within bacterial or archaeal clades with >80% BS support; fig. 1B and table 1).

A smaller number of genes suggest a history of interdomain transfer followed by intradomain transfer. In 22 gene trees, Entamoeba is found sister to a single nonamoebozoan eukaryotic taxon nested within clades of bacteria and/or archaea—a topology consistent with LGT from bacteria or archaea into one of the eukaryotes followed by a second LGT event into the eukaryotic sister (fig. 1C and table 1).

We also looked at patterns of intradomain transfer, though we recognize that eukaryote-to-eukaryote LGT is more difficult to assess than interdomain LGT. In total, 287 genes from our pipeline analysis have support for intradomain LGT. In 109 genes, tree topologies show a sister relationship with nonamoebozoan taxa, and the monophyly of Amoebozoa (vertical descent) is rejected by the AU test (fig. 1D and table 1), evidence for putative intradomain LGT. In addition, 178 trees contained only Entamoeba spp. and taxa from one other nonamoebozoan clade, a pattern of gene presence/absence suggesting possible gene sharing between diverse eukaryotes (table 1 and supplementary table S2, Supplementary Material online). In these cases, gene loss in all other taxa is another possible explanation.

Gene Function

The distribution of gene function is different in vertically inherited genes compared with genes putatively impacted by LGT in the E. histolytica genome. Using Blast2GO (Conesa et al. 2005), we assigned functional categories to genes from our phylogenomic pipeline that had phylogenetic signatures of interdomain LGT (116 genes; table 1) and those with compelling phylogenetic evidence of vertical descent (197 genes; table 1). The genes identified as candidate interdomain LGT genes are more likely to be involved in metabolic processes than those identified as vertically inherited genes, while vertically inherited genes are more evenly distributed among processes (fig. 2).

Fig. 2.—

Functional categories for vertically inherited genes and putatively laterally transferred genes show that LGT genes are more likely to be involved in metabolism. We compared the function of vertically inherited genes (blue) and putative laterally transferred genes (red). Categories are level two biological processes, as inferred by Blast2GO (Conesa et al. 2005).

Sister Relationships across All Trees

The topologies of the single gene trees from our pipeline show a striking relationship between Entamoeba and the parabasalid taxa T. vaginalis, Histomonas meleagridis, and Pentatrichomonas hominis. To investigate further, we took a dual approach to assessing sister relationships to Entamoeba spp.—one relying on the output of our phylogenomic pipeline and another independent of our pipeline, relying only on estimates of homology in OrthoMCL (Li et al. 2003; Chen et al. 2006; see Materials and Methods).

We examined sister relationships in two types of trees from our phylogenomic pipeline: the 22 trees with Entamoeba spp. sister to a single eukaryotic taxon, which were nested within bacteria or archaea (interdomain LGT followed by intradomain LGT; fig. 1C and table 1) and the 109 trees with Entamoeba and a nonamoebozoan sister in trees where the monophyly of Amoebozoa is rejected by the AU test (eukaryote-to-eukaryote LGT; fig. 1D and table 1). To be conservative in our estimation of sister relationships, we bootstrapped the 109 trees in RAxML, and kept only the 22 trees with a BS support of >80% for a sister relationship between Entamoeba spp. and one eukaryotic taxon. Together with the 22 trees from the first category, this approach identified 44 gene trees with a sister relationship that can be identified with confidence. Among these, the most common sister taxon was T. vaginalis (22 trees; fig. 3A and supplementary data S2 and table S3, Supplementary Material online). Other taxa had supported sister relationships with Entamoeba in many fewer trees, including kinetoplastids (four trees), Giardia spp. (three trees), apicomplexa (three trees), mixed Excavata (three trees), and microsporidia (two trees; supplementary table S3, Supplementary Material online). These rarer occurrences may be due to aberrant LGT events or convergence, or may arise from biases in methods.

Fig. 3.—

Parabasalids, including Trichomonas vaginalis, are the most common sister taxon of Entamoeba and the most likely nonamoebozoan taxon to share similar genes with Entamoeba. (A) Most common sister taxa of Entamoeba in trees that reject monophyletic amoebozoa and have >80% bootstrap support for sister relationships and (B) most common taxa found with Entamoeba in genes found only in Entamoeba and one other eukaryote. In both analyses, parabasalids (Excavata) are the most common nonamoebozoan partner of Entamoeba. Data for all species are in supplementary tables S3 and S4, Supplementary Material online.

We also examined sister relationships among genes present only in Entamoeba spp. and one other eukaryote, based on clusters of orthologs determined by OrthoMCL, as this approach is independent of the parameters of our pipeline. Here, we found the same association between Entamoeba spp. and T. vaginalis. Of the 147 genes that passed the stringent requirements (see Materials and Methods), the largest number (42 groups) is consistent with vertical inheritance, as they include only Entamoeba spp. and Dictyostelium discoideum, the only other amoebozoan taxon represented in OrthoMCL. The second most common association was between Entamoeba spp. and T. vaginalis, which was found in 29 genes (fig. 3B). The remaining genes contained Entamoeba spp. plus taxa found in only a few groups (e.g., at most nine groups for Arabidopsis thaliana), with many taxa being represented in only one orthologous group (fig. 3B and supplementary table S4, Supplementary Material online).

Discussion

Phylogenetic Trees and Patterns of Gene Presence/Absence Suggest Widespread LGT in the Genome of Entamoeba

Our dual approach of assessing taxon-rich tree topologies and gene presence/absence patterns reveals the impact of LGT in the genomes of Entamoeba spp. As interpretation of past LGT events is challenging given problems inherent in analyzing evolutionary history of single genes over long periods of time, we rank our findings roughly in order of greatest to least confidence. We identified 116 candidate interdomain LGT events, 22 putative instances of interdomain LGT followed by intradomain LGT, plus 287 possible intradomain LGT events (table 1 and fig. 1). We recognize that our attempts to be conservative may have eliminated a number of “true” LGT events and that additional examples may be identified as taxon sampling improves.

The impact of LGT on Entamoeba spp. has been previously recognized (Yang et al. 1994; Rosenthal et al. 1997; Loftus et al. 2005; Clark et al. 2007), though our approach yields a longer list of candidate genes. Comparisons across studies are challenging because of concurrent changes in methodologies for genome assembly and LGT detection; nevertheless, we recover the 68 genes identified by Clark et al. (2007) (supplementary table S2, Supplementary Material online) along with additional candidates. The differences emerge, in part, because we use a dual approach of assessing tree topologies and identifying cases where genes are only found in Entamoeba spp. and bacteria and/or archaea. The genes we retain include nine candidate LGTs originally identified by Loftus et al. (2005) that were removed from consideration by Clark et al. (2007) as increased taxon sampling had revealed eukaryotic sisters. We retained genes if Entamoeba spp. plus one other eukaryotic group was nested among bacteria and/or archaea (i.e., interdomain followed by intradomain LGTs) and found that six of the nine genes rejected by Clark et al. (2007) showed Entamoeba spp. sister to T. vaginalis (see below).

Compared with vertically inherited genes, the genes identified by our approach as candidate LGTs are more likely to be involved in metabolism (fig. 2), a trend noted in several recent studies of LGT into other microbial eukaryotes (Embley 2006; Ginger 2006; Tsaousis et al. 2012; Imanian and Keeling 2014). These genes may make effective targets for drug discovery efforts, as drugs targeting genes with bacterial origins are less likely to have an impact on their human host (Umejiego et al. 2008; Keeling 2009; Alsmark et al. 2013).

Phylogenetic Trees and Orthology Estimates Support a Specific Ancestral Relationship with Parabasalids

Both phylogenetic trees and presence/absence data show a specific relationship between genes found in Entamoeba spp. and those in T. vaginalis and sometimes other parabasalids. (Although the entire genome has been sequenced from T. vaginalis, there are only limited EST data from P. hominis and H. meleagridis (supplementary table S1, Supplementary Material online) making assessment of the relationship with parabasalids as a whole more difficult.) We used two independent approaches here, first assessing the sister relationships in phylogenetic trees and second analyzing patterns of gene presence/absence in all orthologous groups from OrthoMCL. Both analyses show T. vaginalis is more highly represented than any other nonamoebozoan taxon (fig. 3 and supplementary table S4, Supplementary Material online).

Inspection of individual trees from our phylogenomic pipeline yields some intriguing patterns. For example, phylogenetic analyses of a lipase-containing protein, OG5_129115 (abbreviation refers to orthologous group as determined in OrthoMCL; Li et al. 2003) place the T. vaginalis sequence sister to E. dispar and E. invadens and nested among other amoebozoan taxa (fig. 4). Analyses of OG5_127586 (a hypothetical conserved protein) show three Entamoeba paralogs, each sister to T. vaginalis paralogs, suggesting acquisition of a recently expanded gene family (fig. 5). In this case, one of these clades also contains Blastocystis hominis, another distantly related mucosal parasite. This topology suggests the possibility of parallel gene transfers into organisms within environmental niches, as has previously been hypothesized (Ricard et al. 2006; Alsmark et al. 2013; Clarke et al. 2013).

Fig. 4.—

Exemplar maximum-likelihood tree of lipase-containing protein (OrthoMCL cluster OG5_129115) showing putative LGT from Entamoeba to Trichomonas vaginalis. Maximum-likelihood tree with monophyletic Amoebozoa interrupted by a T. vaginalis sequence sister to E. dispar and E. invadens suggests LGT from an ancestor of Entamoeba to T. vaginalis. Arrow points to clades of interest. Sequences are labeled with their major clade: Op, Opisthokont; Pl, Archaeplastida; Sr, SAR; Ex, Excavata; Am, Amoebozoa. Branches within the clades of interest are labeled with bootstrap support; branches outside the clades of interest with bootstrap support >70% are bold. Scale bar represents number of changes.

Fig. 5.—

Maximum-likelihood tree of hypothetical conserved protein (OrthoMCL cluster OG5_127586) showing putative LGT of multiple paralogs. Maximum-likelihood tree with multiple sister groups of Entamoeba and Trichomonas vaginalis suggests a relationship between Entamoeba and T. vaginalis more recently than the gene duplication event that created the paralogs. Sequences are labeled by their major clade (see fig. 4 legend for abbreviations). Branches within the clades of interest are labeled with bootstrap support; branches outside the clades of interest with bootstrap support >70% are bold. Scale bar represents number of changes.

Given our taxon sampling, there is no clear directionality in the observed LGT between the ancestors of Entamoeba and Trichomonas: Of the 22 trees analyzed, two are consistent with transfer from Entamoeba to Trichomonas (i.e., T. vaginalis is nested among Amoebozoa), three appear to be the opposite, and in the remaining 17 trees there is poor support at deeper nodes (supplementary tables S2 and S3, Supplementary Material online). We anticipate that directionality can be determined with additional sampling of whole genomes from diverse amoebozoan and parabasalid lineages.

An association between E. histolytica and T. vaginalis has been mentioned in several studies, both as organisms that have been impacted by LGT and as taxa between which some putative gene sharing has been noted (Stanley 2005; Keeling and Palmer 2008; Alsmark et al. 2009, 2013). Borrowing genes, particularly metabolic genes (fig. 2), may be one way to adapt to a new, extremely different, environment. Entamoeba emerges as a parasite from a mostly nonparasitic clade (Amoebozoa), and the transition to parasitism may have occurred in a similar environment (i.e., epithelial cells) as the evolution of parasitism in parabasalids. Thus, it is possible that genes shared between these taxa are due to borrowing genes from a pool of donor lineages available in the niche environment that they share (Alsmark et al. 2013; Clarke et al. 2013). However, the sister relationship between these two taxa in our analyses suggests more than a shared tendency to independently pick up genes from their environment. The frequent sister relationship, along with the patterns of shared paralogs (e.g., fig. 5), suggests a past relationship between the ancestors of the two taxa. While this is speculative, endosymbiosis among eukaryotes does occur; for example, Excavata symbionts are found within the macronucleus of some ciliates (e.g., Fokin et al. 2008, 2014; Gomaa et al. 2014). Moreover, Tanifuji et al. (2011) document an example of an amoebozoan, Neoparamoeba pemaquidensis, that hosts a kinetoplastid (an Excavata, like the parabasalids) endosymbiont, a relationship similar to the one we suggest here.

Caveats

Single gene trees are prone to error; any one gene tree that can be explained by LGT might also be explained by misidentified orthologs, gene loss, limited taxon sampling or any of a number of other causes (see Kurland et al. 2003; Martin 2005). Although some of the discordance we see is no doubt due to this sort of error, the large number of discordant genes and the independent evidence of bias in both gene function and sister taxon relationships point to something beyond error in our analyses. Nevertheless, it is important to consider alternative explanations. One possible explanation is convergent evolution: Eukaryotic parasites living in a similar niche experience similar selective pressures, including host defenses and a microaerophilic environment. However, although convergent evolution is often seen in protein functional domains (Bork and Doolittle 1992; Gandbhir et al. 1995; Tomii et al. 2012), convergence at the sequence level across the length of a gene is unlikely (Doolittle 1994; Oren 1995; Gogarten and Olendzenski 1999). Another possibility is that we still have inadequate taxon sampling to depict vertical donor lineages accurately, though this is less likely to impact cases of interdomain transfers.

Although these caveats mean that we may be mistaken in our interpretation of individual gene trees, we believe that our conservative approach—relying on strong phylogenetic support for taxon-rich gene trees (i.e., high BS support and AU tests) and identifying genes present in only Entamoeba spp. plus bacteria/archaea (i.e., presence/absence data)—may well have led to an underestimate of the number of interdomain LGTs. Moreover, our hypothesis of past relationships between the ancestors of Entamoeba spp. and parabasalids is testable as additional taxa are sampled from close relatives.
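The presence/absence criterion described above can be illustrated with a minimal sketch. This is not the pipeline used in this study; the function name, the data layout (each gene family mapped to a set of domain and lineage labels), and the example families are all hypothetical and serve only to show the logic of keeping families whose sole eukaryotic member is Entamoeba and that also occur in bacteria and/or archaea.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (domain, lineage), e.g. ("eukaryote", "Entamoeba").
Taxon = Tuple[str, str]


def interdomain_candidates(families: Dict[str, Set[Taxon]],
                           focal: str = "Entamoeba") -> List[str]:
    """Return gene families found only in the focal eukaryote plus bacteria/archaea."""
    hits = []
    for name, taxa in families.items():
        # Lineages of eukaryotic members of this family.
        euk = {lineage for domain, lineage in taxa if domain == "eukaryote"}
        # Lineages of bacterial or archaeal members of this family.
        prok = {lineage for domain, lineage in taxa
                if domain in ("bacteria", "archaea")}
        # Keep the family only if the focal lineage is the sole eukaryote
        # and at least one bacterial or archaeal member is present.
        if euk == {focal} and prok:
            hits.append(name)
    return sorted(hits)


# Made-up example data, for illustration only.
families = {
    "famA": {("eukaryote", "Entamoeba"), ("bacteria", "Bacteroides")},
    "famB": {("eukaryote", "Entamoeba"), ("eukaryote", "Dictyostelium"),
             ("bacteria", "Escherichia")},
    "famC": {("eukaryote", "Entamoeba")},
    "famD": {("eukaryote", "Entamoeba"), ("archaea", "Methanosarcina")},
}

print(interdomain_candidates(families))  # ['famA', 'famD']
```

In this toy example, famB is excluded because a second eukaryote is present, and famC is excluded because no bacterial or archaeal member is present. A filter of this kind is conservative by construction, which is consistent with the expectation that the number of interdomain LGTs may be underestimated.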

Supplementary Material

Supplementary data and tables S1–S4 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

Acknowledgments

This work was supported by a U.S. National Institutes of Health award 1R15GM097722-01 and U.S. National Science Foundation award DEB-1208741. The authors thank Rob Dorit (Smith College) and J. Gordon Burleigh (Univ. Florida) for helpful conversations.

Literature Cited

  1. Ali V, Hashimoto T, Shigeta Y, Nozaki T. Molecular and biochemical characterization of D-phosphoglycerate dehydrogenase from Entamoeba histolytica. A unique enteric protozoan parasite that possesses both phosphorylated and nonphosphorylated serine metabolic pathways. Eur J Biochem. 2004;271:2670–2681. doi: 10.1111/j.1432-1033.2004.04195.x.
  2. Ali V, Shigeta Y, Tokumoto U, Takahashi Y, Nozaki T. An intestinal parasitic protist, Entamoeba histolytica, possesses a non-redundant nitrogen fixation-like system for iron-sulfur cluster assembly under anaerobic conditions. J Biol Chem. 2004;279:16863–16874. doi: 10.1074/jbc.M313314200.
  3. Alsmark C, et al. Patterns of prokaryotic lateral gene transfers affecting parasitic microbial eukaryotes. Genome Biol. 2013;14:R19. doi: 10.1186/gb-2013-14-2-r19.
  4. Alsmark UC, et al. Horizontal gene transfer in eukaryotic parasites: a case study of Entamoeba histolytica and Trichomonas vaginalis. Methods Mol Biol. 2009;532:489–500. doi: 10.1007/978-1-60327-853-9_28.
  5. Anderson IJ, Loftus BJ. Entamoeba histolytica: observations on metabolism based on the genome sequence. Exp Parasitol. 2005;110:173–177. doi: 10.1016/j.exppara.2005.03.010.
  6. Andersson JO. Lateral gene transfer in eukaryotes. Cell Mol Life Sci. 2005;62:1182–1197. doi: 10.1007/s00018-005-4539-z.
  7. Andersson JO, Sjogren AM, Davis LAM, Embley TM, Roger AJ. Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol. 2003;13:94–104. doi: 10.1016/s0960-9822(03)00003-4.
  8. Baltrus DA. Exploring the costs of horizontal gene transfer. Trends Ecol Evol. 2013;28:489–495. doi: 10.1016/j.tree.2013.04.002.
  9. Beiko RG, Ragan MA. Untangling hybrid phylogenetic signals: horizontal gene transfer and artifacts of phylogenetic reconstruction. Methods Mol Biol. 2009;532:241–256. doi: 10.1007/978-1-60327-853-9_14.
  10. Bork PD, Doolittle RF. Proposed acquisition of an animal protein domain by bacteria. Proc Natl Acad Sci U S A. 1992;89:8990–8994. doi: 10.1073/pnas.89.19.8990.
  11. Cavalier-Smith T. A revised six-kingdom system of life. Biol Rev Camb Philos Soc. 1998;73:203–266. doi: 10.1017/s0006323198005167.
  12. Chen F, Mackey AJ, Stoeckert CJ, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006;34:D363–D368. doi: 10.1093/nar/gkj123.
  13. Clark CG, et al. Structure and content of the Entamoeba histolytica genome. Adv Parasitol. 2007;65:51–190. doi: 10.1016/S0065-308X(07)65002-7.
  14. Clarke M, et al. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 2013;14:R11. doi: 10.1186/gb-2013-14-2-r11.
  15. Cohen O, Gophna U, Pupko T. The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Mol Biol Evol. 2011;28:1481–1489. doi: 10.1093/molbev/msq333.
  16. Cohen O, Pupko T. Inference and characterization of horizontally transferred gene families using stochastic mapping. Mol Biol Evol. 2010;27:703–713. doi: 10.1093/molbev/msp240.
  17. Conesa A, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–3676. doi: 10.1093/bioinformatics/bti610.
  18. Doolittle RF. Convergent evolution: the need to be explicit. Trends Biochem Sci. 1994;19:15–18. doi: 10.1016/0968-0004(94)90167-8.
  19. Embley TM. Multiple secondary origins of the anaerobic lifestyle in eukaryotes. Philos Trans R Soc B Biol Sci. 2006;361:1055–1067. doi: 10.1098/rstb.2006.1844.
  20. Fokin SI, Di Giuseppe G, Erra F, Dini F. Euplotespora binucleata n. gen., n. sp (protozoa: microsporidia), a parasite infecting the hypotrichous ciliate Euplotes woodruffi, with observations on microsporidian infections in Ciliophora. J Eukaryot Microbiol. 2008;55:214–228. doi: 10.1111/j.1550-7408.2008.00322.x.
  21. Fokin SI, Schrallhammer M, Chiellini C, Verni F, Petroni G. Free-living ciliates as potential reservoirs for eukaryotic parasites: occurrence of a trypanosomatid in the macronucleus of Euplotes encysticus. Parasit Vectors. 2014;7:203. doi: 10.1186/1756-3305-7-203.
  22. Foster PG. Modeling compositional heterogeneity. Syst Biol. 2004;53:485–495. doi: 10.1080/10635150490445779.
  23. Gandbhir M, Rasched I, Marliere P, Mutzel R. Convergent evolution of amino-acid usage in archaebacterial and eubacterial lineages adapted to high-salt. Res Microbiol. 1995;146:113–120. doi: 10.1016/0923-2508(96)80889-8.
  24. Ginger ML. Niche metabolism in parasitic protozoa. Philos Trans R Soc B Biol Sci. 2006;361:101–118. doi: 10.1098/rstb.2005.1756.
  25. Ginger ML, Fritz-Laylin LK, Fulton C, Cande WZ, Dawson SC. Intermediary metabolism in protists: a sequence-based view of facultative anaerobic metabolism in evolutionarily diverse eukaryotes. Protist. 2010;161:642–671. doi: 10.1016/j.protis.2010.09.001.
  26. Gogarten JP, Olendzenski L. Orthologs, paralogs and genome comparisons. Curr Opin Genet Dev. 1999;9:630–636. doi: 10.1016/s0959-437x(99)00029-5.
  27. Gomaa F, et al. One alga to rule them all: unrelated mixotrophic testate amoebae (amoebozoa, rhizaria and stramenopiles) share the same symbiont (trebouxiophyceae). Protist. 2014;165:161–176. doi: 10.1016/j.protis.2014.01.002.
  28. Grant JR, Katz LA. Building a phylogenomic pipeline for the eukaryotic tree of life - addressing deep phylogenies with genome-scale data. PLoS Curr. 2014. Advance Access published April 2, 2014. doi: 10.1371/currents.tol.c24b6054aebf3602748ac042ccc8f2e9.
  29. Hug LA, Stechmann A, Roger AJ. Phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes. Mol Biol Evol. 2010;27:311–324. doi: 10.1093/molbev/msp237.
  30. Imanian B, Keeling PJ. Horizontal gene transfer and redundancy of tryptophan biosynthetic enzymes in dinotoms. Genome Biol Evol. 2014;6:333–343. doi: 10.1093/gbe/evu014.
  31. Katz LA. Lateral gene transfers and the evolution of eukaryotes: theories and data. Int J Syst Evol Microbiol. 2002;52:1893–1900. doi: 10.1099/00207713-52-5-1893.
  32. Keeling PJ. Functional and ecological impacts of horizontal gene transfer in eukaryotes. Curr Opin Genet Dev. 2009;19:613–619. doi: 10.1016/j.gde.2009.10.001.
  33. Keeling PJ, Palmer JD. Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet. 2008;9:605–618. doi: 10.1038/nrg2386.
  34. Kurland CG, Canback B, Berg OG. Horizontal gene transfer: a critical view. Proc Natl Acad Sci U S A. 2003;100:9658–9662. doi: 10.1073/pnas.1632870100.
  35. Lake JA, Rivera MC. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol Biol Evol. 2004;21:681–690. doi: 10.1093/molbev/msh061.
  36. Le PT, et al. An automated approach for the identification of horizontal gene transfers from complete genomes reveals the rhizome of Rickettsiales. BMC Evol Biol. 2012;12:243. doi: 10.1186/1471-2148-12-243.
  37. Li L, Stoeckert CJ, Roos DS. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503.
  38. Liu K, Linder CR, Warnow T. RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One. 2011;6:e27731. doi: 10.1371/journal.pone.0027731.
  39. Loftus B, et al. The genome of the protist parasite Entamoeba histolytica. Nature. 2005;433:865–868. doi: 10.1038/nature03291.
  40. Loftus BJ, Hall N. Entamoeba: still more to be learned from the genome. Trends Parasitol. 2005;21:453–453. doi: 10.1016/j.pt.2005.08.007.
  41. Lorenzi HA, et al. New assembly, reannotation and analysis of the Entamoeba histolytica genome reveal new genomic features and protein content information. PLoS Neglect Trop Dis. 2010;4:e716. doi: 10.1371/journal.pntd.0000716.
  42. Lühe M. Erstes urreich der tiere. In: Lang A, editor. 1913. Handbuch der Morphologie der Wirbellosen Tiere. Jena: G. Fischer.
  43. Martin W. Molecular evolution: lateral gene transfer and other possibilities. Heredity. 2005;94:565–566. doi: 10.1038/sj.hdy.6800659.
  44. Nixon JEJ, et al. Evidence for lateral transfer of genes encoding ferredoxins, nitroreductases, NADH oxidase, and alcohol dehydrogenase 3 from anaerobic prokaryotes to Giardia lamblia and Entamoeba histolytica. Eukaryot Cell. 2002;1:181–190. doi: 10.1128/EC.1.2.181-190.2002.
  45. Oren A. Convergent evolution of amino acid usage in archaebacterial and eubacterial lineages adapted to high salt - comment. Res Microbiol. 1995;146:805–806. doi: 10.1016/0923-2508(96)80889-8.
  46. Penn O, et al. GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res. 2010;38:W23–W28. doi: 10.1093/nar/gkq443.
  47. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26:1641–1650. doi: 10.1093/molbev/msp077.
  48. Price MN, Dehal PS, Arkin AP. FastTree 2-approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. doi: 10.1371/journal.pone.0009490.
  49. Ragan MA. On surrogate methods for detecting lateral gene transfer. FEMS Microbiol Lett. 2001;201:187–191. doi: 10.1111/j.1574-6968.2001.tb10755.x.
  50. Ricard G, et al. Horizontal gene transfer from bacteria to rumen ciliates indicates adaptation to their anaerobic, carbohydrates-rich environment. BMC Genomics. 2006;7:22. doi: 10.1186/1471-2164-7-22.
  51. Richards TA, Hirt RP, Williams BAP, Embley TM. Horizontal gene transfer and the evolution of parasitic protozoa. Protist. 2003;154:17–32. doi: 10.1078/143446103764928468.
  52. Rosenthal B, et al. Evidence for the bacterial origin of genes encoding fermentation enzymes of the amitochondriate protozoan parasite Entamoeba histolytica. J Bacteriol. 1997;179:3736–3745. doi: 10.1128/jb.179.11.3736-3745.1997.
  53. Shimodaira H, Hasegawa M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 2001;17:1246–1247. doi: 10.1093/bioinformatics/17.12.1246.
  54. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
  55. Stamatakis A, Hoover P, Rougemont J. A rapid bootstrap algorithm for the RAxML web servers. Syst Biol. 2008;57:758–771. doi: 10.1080/10635150802429642.
  56. Stamatakis A, Ludwig T, Meier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 2005;21:456–463. doi: 10.1093/bioinformatics/bti191.
  57. Stanley SL. Amoebiasis. Lancet. 2003;361:1025–1034. doi: 10.1016/S0140-6736(03)12830-9.
  58. Stanley SL Jr. The Entamoeba histolytica genome: something old, something new, something borrowed and sex too? Trends Parasitol. 2005;21:451–453. doi: 10.1016/j.pt.2005.08.006.
  59. Tanifuji G, et al. Genomic characterization of Neoparamoeba pemaquidensis (amoebozoa) and its kinetoplastid endosymbiont. Eukaryot Cell. 2011;10:1143–1146. doi: 10.1128/EC.05027-11.
  60. Tomii K, Sawada Y, Honda S. Convergent evolution in structural elements of proteins investigated using cross profile analysis. BMC Bioinformatics. 2012;13:11. doi: 10.1186/1471-2105-13-11.
  61. Tsaousis AD, et al. Evolution of Fe/S cluster biogenesis in the anaerobic parasite Blastocystis. Proc Natl Acad Sci U S A. 2012;109:10426–10431. doi: 10.1073/pnas.1116067109.
  62. Umejiego NN, et al. Targeting a prokaryotic protein in a eukaryotic pathogen: identification of lead compounds against cryptosporidiosis (vol 15, pg 70, 2008). Chem Biol. 2008;15:200–200. doi: 10.1016/j.chembiol.2007.12.010.
  63. Wolf YI, Koonin EV. Genome reduction as the dominant mode of evolution. Bioessays. 2013;35:829–837. doi: 10.1002/bies.201300037.
  64. Yang W, Li E, Kairong T, Stanley SL Jr. Entamoeba histolytica has an alcohol dehydrogenase homologous to the multifunctional adhE gene product of Escherichia coli. Mol Biochem Parasitol. 1994;64:253–260. doi: 10.1016/0166-6851(93)00020-a.
  65. Zmasek CM, Godzik A. Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol. 2011;12:13. doi: 10.1186/gb-2011-12-1-r4.

Supplementary Materials

Supplementary Data
supp_evu179_TableS1.xlsx (274.1KB, xlsx)
supp_evu179_TableS2.xlsx (39.3KB, xlsx)
supp_evu179_TableS3.xlsx (55.9KB, xlsx)

Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press
