Genome Biology and Evolution. 2016 Aug 26;8(9):2964–2978. doi: 10.1093/gbe/evw208

Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements

Amir Szitenberg 1,2,*, Soyeon Cha 3, Charles H Opperman 3, David M Bird 3, Mark L Blaxter 4, David H Lunt 1
PMCID: PMC5635653  PMID: 27566762

Abstract

Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host’s genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states and RNAi pathways, and evaluates whether these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes.

Keywords: nematoda, transposable elements evolution, RNA interference, mating system, parasitism

Introduction

Transposable elements (TEs) are mobile genetic entities found in the genomes of organisms across diverse branches of life, and which are a major source of genetic variation (Charlesworth et al. 1994; Kidwell and Lisch 1997; Bennett et al. 2004). TEs comprise approximately two-thirds of the human genome (de Koning et al. 2011), and in other plants and animals may account for up to 85% of all DNA (Marracci et al. 1996; Schnable et al. 2009). In stark contrast, other eukaryotic genomes contain only 1–3% TE-derived sequence within their typically much smaller genomes (Ibarra-Laclette et al. 2013; Burke et al. 2015). The mechanisms that create this variability are not fully understood.

TE insertions are a significant source of deleterious mutation causing gene disruption (Biémont et al. 1997; Kidwell and Lisch 1997), double-strand breaks (Gasior et al. 2006; Hedges and Deininger 2007), ectopic recombination (Charlesworth et al. 1997), gene expression change (Lerman et al. 2003), and other types of mutagenesis (Kidwell and Lisch 2001). In humans, deleterious TE activity contributes to ∼0.3% of genetic disease (Callinan and Batzer 2006; Cordaux and Batzer 2009). Some TE insertions, however, have only weak deleterious effects, increasing their likelihood of survival and expansion (Zou et al. 1996; Kim et al. 1998; Gao et al. 2008; Leem et al. 2008; Pritham 2009; Hellen and Brookfield 2013). Given sufficient time, a small proportion of these may be co-opted for protein-coding or regulatory functions by the host genome, and thus become very important components of organismal evolution (Lerman et al. 2003; Keren et al. 2010; Kojima and Jurka 2011). Although TEs are key players in organismal evolution, the evolutionary forces determining TE composition in genomes are far from clear. We have selected the phylum Nematoda, for its phylogenetic diversity of available genomes, as a system in which to investigate TE variation in a phylogenetically controlled design. While other studies have examined the correspondence between life history or other traits and TE evolution (Cutter et al. 2008; Campos et al. 2012, 2014; Hess et al. 2014; Fierst et al. 2015), these often muster relatively few phylogenetically independent contrasts and cover a relatively recent evolutionary time scale. Examining evolutionary events across the entire phylum Nematoda gives a broad perspective where the balance of evolutionary forces will have had time to work.

Substantial efforts to characterize the forces and processes shaping genome evolution have given rise to explanations for the divergence in TE loads among species, including the effects of mating system and recombination, life history, genome GC content, and transposition suppression systems such as RNAi. These factors influence TEs both directly, by affecting their possibility for spread or removal, and indirectly, by modifying the effective population size and probability of fixation (Charlesworth and Charlesworth 1983). The effects of mating system and recombination have been much discussed, with conflicting predictions for either an increase (Montgomery et al. 1987; Wright and Schoen 2000) or decrease (Bestor 1999; Nordborg 2000; Wright and Finnegan 2001; Boutin et al. 2012; Arunkumar et al. 2015) in TE loads in selfing species. Duret et al. (2000) found that nonrecombining genomic regions are less TE rich than recombining regions in Caenorhabditis elegans, when considering DNA TEs. Also in Caenorhabditis, Cutter et al. (2008) predicted lower TE loads in selfing compared with outcrossing species. In contrast, TE spread was positively associated with recombination in Drosophila (Campos et al. 2012, 2014), although this was not recovered in a subsequent study (Bast et al. 2015). A mating system effect on genome size (and thus likely TE load), was reported in plants (Govindaraju and Cullis 1991; Albach and Greilhuber 2004; Wright et al. 2008), but subsequent studies accounting for phylogenetic associations in the data did not recover these effects (Whitney et al. 2010; Ågren et al. 2015; Fierst et al. 2015). Analysis of the evolution of TE loads in the Nematoda, where several independent shifts in mating system have occurred (fig. 1), may aid in better understanding the evolutionary forces and genomic processes operating.

Fig. 1.—

Transposable element loads in Nematoda by class. SSU-rRNA phylogenetic tree of Nematoda with TE load information by class. The columns represent (left to right) DNA, LTR, LINE, and SINE element loads (numerical values are given in supplementary table S2, Supplementary Material online), the phylogenetic clade sensu (Blaxter et al. 1998), presence or absence of RNAi pathway proteins (RRF1, RRF3 and PIWI), parasitism (animal parasite, plant parasite, or free living), and mating system (parthenogenic, gonochoristic, hermaphroditic or apomictic). Black scales at the bottom of each bar-chart represent 2500 TEs. Sources for life history information are in supplementary table S1, Supplementary Material online.

Adoption of a parasitic lifestyle can reduce the effective population size, and thus the effectiveness of recombination and natural selection. Parasites may be subdivided into infrapopulations within hosts, and this population subdivision reduces the effective population size compared with free-living species (Criscione and Blouin 2005). Increased TE counts were found in ectoparasitic Amanita fungi compared with free living Amanita species (Hess et al. 2014), where the authors suggested the effective population size effects of parasitism as a cause for the difference. As Nematoda contain several independent transitions to parasitism, this hypothesis can also be further tested (fig. 1).

Genome nucleotide bias (GC content) has been shown to influence a wide variety of cellular processes, and especially the rates and patterns of molecular evolution. These effects include tRNA abundance and codon usage (Ikemura 1981, 1985; Muto and Osawa 1987; Knight et al. 2001), mutational patterns (Lobry 1996; Sueoka 1999), gene expression (Gouy and Gautier 1982; Holm 1986; Sharp et al. 1986; Sharp and Devine 1989; Andersson and Kurland 1990; Stenico et al. 1994), protein and RNA structure and composition (Gambari et al. 1989; Zama 1989, 1996; D’Onofrio et al. 1991; Huynen et al. 1992; Collins and Jukes 1993; Gupta et al. 2000), and translational efficiency (Berg and Kurland 1997). The tight integration of TEs with cellular processes will mean that they will also be affected by differential nucleotide biases, as has been examined by Hellen and Brookfield (2013), who demonstrated the accumulation and persistence of human Alu elements was favoured in GC-rich regions. Again, diversity in GC content across nematode genomes offers power to detect the effects of GC on TE load evolution.

The host genome is engaged in defending itself against TE insertions, with RNA interference (RNAi) pathways a key cellular process silencing TEs in eukaryotes (Tabara et al. 1999; Aravin et al. 2001; Sijen and Plasterk 2003; Chung et al. 2008; Czech et al. 2008; Ghildiyal et al. 2008; Kawamura et al. 2008; Slotkin et al. 2009). Variation in RNAi pathways is thus suggested to be key to TE evolution (Matzke et al. 2000; Bossdorf et al. 2008; Richards 2008; Obbard et al. 2009; Rebollo et al. 2012). In nematodes, a variety of mechanisms of TE silencing have been characterized at the molecular level (Aravin et al. 2007; Das et al. 2008; Bagijn et al. 2012; Sarkies et al. 2015), with different pathways operating in different clades (fig. 1). This variation permits examination of the role of alternate TE silencing pathways in explaining genome-wide TE loads.

The importance of nondeterministic processes in shaping TE evolution has been long recognized by population geneticists (Charlesworth and Charlesworth 1983; Lynch and Conery 2003; Le Rouzic et al. 2007; Whitney et al. 2010), with the efficiency of selection and TE silencing likely to be greatly influenced by the effective population size. If differences in TEs between lineages are not determined by processes such as mating system or life history, then a null model of genome evolution, one shaped by nondeterministic processes such as mutation and drift, is the appropriate explanation (Lynch 2007). Here we conduct correlation and ANOVA tests of deterministic forces previously proposed to affect TE evolution, with phylogenetically independent contrasts of TE counts in species from across the phylogenetic diversity of Nematoda (Blaxter et al. 1998) as the dependent variable. We find no evidence for a deterministic effect of life history, GC content or RNAi pathway variation on TE load variation. Furthermore, our data strongly suggest that stochastic changes are the major genome-wide determinant of TE diversity.

Results

TE Loads in Nematoda

To test the effect of mating system, parasitic lifestyle, GC content, RNAi and transposition mechanism on TE evolution, TEs were identified and classified in 43 genome assemblies representing the five major nematode lineages and the tardigrade Hypsibius dujardini (fig. 1; supplementary table S1 and methods S1, sections 1–7, Supplementary Material online). Three quantifiers of TE loads, namely TE counts, coverage of the genome assembly by TEs, and the proportion of genome assembly covered by TEs, were strongly correlated with one another (0.72 < r < 0.9, P value < 0.005). None of these measures correlated with genome assembly quality (represented by N50 values), suggesting that TE prediction is robust to assembly quality differences, although the TE counts and the total length of TEs did correlate with assembly length (supplementary methods S1, section 12 and fig. S1, Supplementary Material online). The correlation with assembly length was lost for almost all TE superfamilies once the phylogenetic relationships among the nematode species were taken into account (supplementary methods S1, section 13 and results S1, section 4, Supplementary Material online). Since the different measures of TE content were shown to be strong proxies of one another, we focused our analyses on TE counts. We expect TE counts to represent TE-related evolutionary rates (i.e., rate of change in TE content) more linearly than their assembly coverage or its proportion of the genome assembly, because of the differences in sequence length among TEs from different TE groups.

High TE loads have a patchy distribution among species in Nematoda, with hotspots observed in the Dorylaimia (Clade I of Blaxter et al. 1998) and Enoplia (Clade II), in Rhabditina (Clade V), and in the Tylenchomorpha genera Meloidogyne and Globodera (part of Clade IV) (fig. 1). DNA elements were usually the most abundant, followed by LTR elements, whereas LINE and SINE elements were quite scarce (fig. 1). When classes were broken down into families (fig. 2; supplementary table S2 and fig. S2, Supplementary Material online), a large proportion of the variation among species, for “cut and paste” DNA elements, was contributed by variation in loads of TcMar element families, which are scarce in Dorylaimida (Clade I) and abundant in Rhabditina (Clade V). hAT families followed a similar pattern, but with less extreme differences among species. Onchocerca volvulus (Spirurina; Clade III) had high loads of Helitron elements (5,372 copies), and hardly any other TEs, a very different pattern from its relatives in Clade III. Among LTR superfamilies, Gypsy elements predominated, with Copia and Pao elements also prevalent, though a large proportion of the elements were unclassified. The predominant LINE elements were Penelope and RTE. SINE elements, although more abundant in a few Rhabditina (Clade V) species than in others, were generally scarce (< 500 in most species, supplementary table S2, Supplementary Material online). The composition of the consensus TE library is described in supplementary results S1, Supplementary Material online.

Fig. 2.—

Transposable element loads in Nematoda by superfamily. SSU-rRNA phylogenetic tree of Nematoda with TE loads information by superfamily. The columns represent (left to right) the presence or absence of RRF1, RRF3 and PIWI RNAi pathway proteins, parasitism (A-animal parasite, P-plant parasite, or F-free living), and mating system (P-parthenogen, G-gonochoric, H-hermaphroditic or A-apomictic), the phylogenetic clade sensu (Blaxter et al. 1998) and the proportions of DNA, LTR, LINE and SINE element superfamilies within each of the classes (numerical values in supplementary table S2 and color key in supplementary fig. S1, Supplementary Material online).

Phylogenetic Signal in TE Load

According to our null hypothesis, TE loads evolve neutrally and change (in rates and patterns) is expected to be congruent with the topology and branch lengths of the species tree. This can be assessed via phylogenetic transformations of observed TE loads (Pagel 1994). To account for phylogenetic uncertainty while computing such transformations, we generated a Bayesian posterior distribution of SSU-rRNA phylogenetic species trees. Tree transformation values of the TE counts were computed with each of the trees in the posterior distribution, and for each of the TE classes (DNA, LTR, LINE and SINE; fig. 3A). Transformation value distributions were also computed for each superfamily (supplementary fig. S3, Supplementary Material online), within each class of TEs, and the median value of each superfamily was recorded across the superfamilies in a given class (fig. 3B). We did not include SINE element superfamily medians, as SINE elements were too sparse to compute a meaningful distribution (fig. 3B).

Fig. 3.—

Phylogenetic transformations of TE loads. The δ, κ, and λ transformations of TE loads, representing the fit between the TE loads and the tree’s topology (λ), branch-lengths (κ) and root-tip distance (δ). (A) The distribution of transformation values across the posterior distribution of most likely phylogenetic trees for each element class (DNA, LTR, LINE, and SINE). (B) The distribution of median transformation values of each superfamily of elements within each of the classes. Only superfamilies where the distance between the first and third quartiles was <0.2 for λ and <0.5 for κ and δ are included (i.e., superfamilies with an unresolved transformation value are excluded). SINE elements are not shown because the distributions cover the whole range of values. Per-superfamily distribution of the λ, κ and δ transformations across the posterior distribution of trees is shown in supplementary figure S3, Supplementary Material online.

The λ transformation (Pagel 1994) provides an estimate of the degree to which traits are predicted by the phylogenetic relationships, with λ = 1 indicating a strong fit. At the class level, DNA, LTR and LINE element load variations are strongly correlated with the species phylogenetic relationships (λ > 0.5; fig. 3A). For many superfamilies the median λ was >0.5, indicating that high fit to the phylogeny is a general characteristic of TEs, and not only a feature of a few large superfamilies (fig. 3B). For SINE elements, in part due to their low abundance and phylogenetic uncertainty, this correlation was not recovered. The strong fit with the phylogenetic tree demonstrates that intraspecific variation in TE loads is not an important source of noise in our results. A second phylogenetic transformation, κ, provides an estimate of the correspondence between the branch lengths and the rate of change of a trait (Pagel 1994). κ > 1 indicates a higher rate of change in longer branches, κ = 1 indicates that the rate of change of the trait conforms with the general evolutionary rate, and κ < 1 indicates that the trait is more conserved than expected from neutrality. The κ value distribution for nematode DNA TE loads showed that DNA TE evolution depends less on the organismal evolutionary rate than other TE classes, at the class level (κ < 1; fig. 3A). The pattern persisted for most superfamilies when considering κ median values at the DNA element superfamily level (fig. 3B). Lastly, the δ transformation estimates the tree depth at which nonneutral evolutionary events occurred, where δ < 1 suggests ancient events and δ > 1 indicates that the trait diversified recently. For DNA elements, δ was >1, indicating that recent events explain their current TE load patterns, whereas for LTR elements, δ was <1, suggesting that ancient events explain their load patterns. δ was not determined for LINE and SINE elements due to phylogenetic uncertainty. Only for LTR elements did these patterns persist when the median δ values of individual superfamilies were considered (fig. 3B), where all of the LTR superfamilies underwent important early events (median δ < 0.3).
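As an illustration of what the three transformations do to a tree, the following minimal sketch applies them to the phylogenetic variance-covariance matrix of a hypothetical three-tip tree. The tree, the function names, and the matrix formulation are assumptions made for illustration; this is the textbook description of Pagel's transformations and not the BayesTraits implementation used in the analysis.

```python
import numpy as np

# Hypothetical toy tree ((A:1,B:1):2,C:3).
# Each edge is (branch length, set of tips descending from it).
EDGES = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}), (3.0, {"C"})]
TIPS = ["A", "B", "C"]

def vcv(edges, tips, kappa=1.0):
    """Shared-path-length matrix; kappa raises every branch length to a power."""
    c = np.zeros((len(tips), len(tips)))
    for length, below in edges:
        idx = [i for i, t in enumerate(tips) if t in below]
        # An edge contributes to every pair of tips that descend from it
        c[np.ix_(idx, idx)] += length ** kappa
    return c

def apply_lambda(c, lam):
    """Lambda scales the shared (off-diagonal) covariances, keeping tip variances."""
    out = c * lam
    np.fill_diagonal(out, np.diag(c))
    return out

def apply_delta(c, delta):
    """Delta raises shared path lengths (node heights) to a power."""
    return c ** delta

c = vcv(EDGES, TIPS)
print(apply_lambda(c, 0.5))   # weaker phylogenetic signal than the raw tree implies
print(vcv(EDGES, TIPS, 0.0))  # kappa = 0: every branch counts equally
print(apply_delta(c, 2.0))    # delta > 1: deeper shared history is up-weighted less than tip paths
```

With λ = 0 the matrix becomes diagonal, which corresponds to a trait with no phylogenetic signal; λ = 1 leaves the tree untouched.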

The Effect of Life Cycle, RNAi Pathway, and Genome GC Content Variation on TE Evolution

Primary literature was surveyed in order to determine the mating system of each species and to identify parasites of plants and animals (supplementary table S1, Supplementary Material online). Key proteins involved in RNA silencing of transposons (RRF1, RRF3 and PIWI) were identified in the genome assembly data using reference sequences (from Sarkies et al. 2015; supplementary results S1, Supplementary Data and Supplementary Data, Supplementary Material online), and genome assembly N50, span and GC content were calculated (supplementary methods S1, Supplementary Data, Supplementary Material online). The reproductive mode, parasitic status and RNAi pathway for each nematode species are summarized in figure 1 and supplementary table S1, Supplementary Material online. The presence and absence of RNAi pathway proteins for the most part conformed with the predictions made by Sarkies et al. (2015), with a few exceptions. Syphacia muris (Oxyuridomorpha, Spirurina in Clade III) lacks the expected RdRP RRF3 protein that is found in other Spirurina species. Since the genome assembly has a high N50 value (60,730 bp) and much supporting transcriptome data (supplementary methods S1, Supplementary Data, Supplementary Material online), it is highly likely that this species lacks RRF3 (or possesses a very divergent RRF3 orthologue). The Heterorhabditis bacteriophora (Rhabditomorpha; Clade V) genome lacked an RRF3 locus although RRF3 is expected in Rhabditomorpha species (Sarkies et al. 2015). Given the relatively high quality of the H. bacteriophora genome assembly (N50 of 33,765 bp), RRF3 is again likely absent (or very divergent) in this species. No RRF3 was found in any of the 9 Tylenchomorpha species (Clade IV), regardless of their N50 values (3,348–121,687 bp), in keeping with expectations (Sarkies et al. 2015).
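For readers unfamiliar with the assembly statistics mentioned above, the following minimal sketch computes span, N50 and GC content from a list of contig sequences. The function name is hypothetical, and the choice to compute GC content over unambiguous bases only is an assumption; the authors' exact procedure is given in their supplementary methods.

```python
def assembly_stats(contigs):
    """Return (span, N50, GC fraction) for a list of contig sequences."""
    lengths = sorted((len(s) for s in contigs), reverse=True)
    span = sum(lengths)

    # N50: length of the contig at which the running total, taken from the
    # longest contig downwards, first reaches half of the assembly span.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if 2 * running >= span:
            n50 = length
            break

    # GC fraction over A, C, G and T only (assumption: ambiguous bases ignored).
    upper = [s.upper() for s in contigs]
    gc = sum(s.count("G") + s.count("C") for s in upper)
    acgt = sum(s.count(b) for s in upper for b in "ACGT")
    gc_fraction = gc / acgt if acgt else float("nan")
    return span, n50, gc_fraction

# Toy assembly with contigs of 40, 30, 20 and 10 bases
print(assembly_stats(["A" * 40, "G" * 30, "C" * 20, "N" * 10]))
```

A higher N50 indicates a more contiguous assembly, which is why it is used here as a proxy for assembly quality when judging whether an absent RRF3 locus is likely to be a true absence.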

For each TE class and superfamily, we tested the effect of mating system, parasitic lifestyle, and variation of RNAi pathways on TE loads at terminal nodes using an ANOVA of phylogenetically independent contrasts (supplementary results S1, Supplementary Data, Supplementary Material online). No significant effect was detected following Holm–Bonferroni correction (Holm 1979). In the absence of any P value correction for multiple tests, the loads of only two superfamilies were significantly affected by mating system variation, but this is an expected rate of type I error, or an extreme minority of cases if these are true positives (Holm 1979). We also explored the correlation between genome assembly GC contents and TE loads (supplementary methods S1, Supplementary Data and Supplementary Data, Supplementary Material online) and found no significant results following a Holm–Bonferroni correction (Holm 1979). Prior to this correction, a weak correlation was found in only two superfamilies. On the whole, neither the ANOVA tests nor the correlation tests revealed an effect of any of the tested factors. It is unlikely that our results are biased by our taxonomic sampling, as such bias would usually cause false positive results, and none were observed.
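The Holm–Bonferroni step-down procedure referred to above can be sketched as follows. The function name and the example P values are hypothetical and serve only to show how tests that are nominally significant can fail to survive correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with alpha / (m - k + 1)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first test that is not rejected
    return reject

# Two tests that are each below 0.05 before correction
print(holm_bonferroni([0.04, 0.03]))
# Four tests, of which only the two smallest survive
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

In the first call neither test is rejected, because the smallest P value (0.03) exceeds 0.05 / 2.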

Changes of TE Loads at Ancestral Nodes

To understand long-term processes in TE evolution, we reconstructed the TE loads for each element superfamily at each node in the Nematoda phylogeny, and derived the median change in TE loads at each node compared with its ancestor (fig. 4). For all four TE classes (DNA, LTR, LINE, and SINE) the evolutionary process was characterized by a trend towards contraction of TE loads, with only very few events of stable expansions, except for shallow nodes where the nature of change was less predictable. Contraction in deep nodes appeared to have been more constant for LTR elements than other classes (fig. 4B), in agreement with the δ value in this class (fig. 3), and LTR elements were also the most dynamic in shallow nodes, with shallow expansion hotspots within Onchocercidae (Clade III), Strongylida and Caenorhabditis (Clade V), and Strongyloides, Globodera and Meloidogyne (Clade IV). Other classes (fig. 4A, C, and D) also showed recent expansions, but only in a subset of these taxa.

Fig. 4.—

Median TE load change at ancestral and terminal nodes. The median load change of DNA (A), LTR (B), LINE (C), and SINE (D) superfamilies. Ancestral states were reconstructed for each superfamily. Then, the proportion of change, compared with the ancestral node, was computed for each superfamily, at each node. The median change proportions are presented for each class. Green nodes represent an increase compared with the most recent ancestor, with larger nodes representing a greater increase. Black nodes represent a decrease compared with the most recent ancestor, with smaller nodes representing a greater decrease. Where no bullet is visible, there has been a large decrease in TE counts. Long branches (0.06 or longer) along which at least 50% change in TE loads has occurred are green, gray or black to indicate an increase, stability or decrease of TE median loads along the branch. Because no such increase was inferred along any long branch, no green branches appear.
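The summary plotted at each node can be sketched as below, assuming that the reconstructed loads are available as one dictionary per node, keyed by superfamily. The function name, the data layout, and the example values are hypothetical; the treatment of superfamilies with a reconstructed ancestral load of zero is also an assumption.

```python
from statistics import median

def median_change(node_loads, parent_loads):
    """Median proportional change across superfamilies at one node."""
    changes = []
    for superfamily, load in node_loads.items():
        ancestral = parent_loads[superfamily]
        if ancestral > 0:  # proportion is undefined when the ancestor has no copies
            changes.append((load - ancestral) / ancestral)
    return median(changes)

# Hypothetical reconstructed loads for three superfamilies of one class
parent = {"Gypsy": 100.0, "Copia": 100.0, "Pao": 100.0}
node = {"Gypsy": 50.0, "Copia": 150.0, "Pao": 20.0}
print(median_change(node, parent))  # negative values indicate contraction
```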

Detection of Adaptive Processes and Convergent Evolution

To identify adaptive processes in the evolution of TEs, we tested the fit of the TE loads with the Ornstein–Uhlenbeck (OU) model, using BayesTraits 2 (Pagel 1997). Since a strong phylogenetic background provides power to detect selection as a consistent deviation from it, and given the high λ values characterizing TE evolution in Nematoda, we predicted high power to detect α, the selection strength parameter in the OU process, as illustrated in figure 5A. This figure demonstrates that given the Nematoda phylogenetic tree and the phylogenetic pattern of the TE loads, low α values can be detected. Assuming no stochastic interference, α values significantly >1 were detected in 14, 7, 3, and 1 superfamilies from the classes of DNA, LINE, LTR and SINE elements, respectively (supplementary fig. S4 and Supplementary Data, Supplementary Material online). These families were analyzed for convergent evolution, fitting the most likely extended model, also allowing shifts in the selective optimum (θ) as well as stochastic change (σ²). Since transposition can increase the TE loads even when it is not adaptive, α and θ may also represent neutral or slightly deleterious transposition events rather than positive selection, although the balance between α and σ² can help to distinguish between stochastic and deterministic trajectories, regardless of the true nature of α. Convergent evolution (i.e., polyphyletic lineages possessing the same selective optimum θ) was detected for all these elements. However, shifts in θ were only identified in terminal or otherwise shallow nodes, with the exception of a θ increase for two LTR element superfamilies at the base of the Rhabditina (Clade V), and never coincided with shifts in mating system, parasitism, or RNAi pathways (supplementary fig. S5, Supplementary Material online). Moreover, the σ² (drift) values were overwhelmingly higher than α (selection) for most of the superfamilies, as in the example shown in figure 5B, which illustrates the stochasticity of the evolutionary trajectories of TE loads. Therefore, this analysis reveals stochastic evolutionary trajectories with no deterministic effect of the tested factors.
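To make the interplay between α, θ and σ² concrete, the sketch below simulates replicate trajectories of a standard OU process, dX = α(θ − X)dt + σ dW, using a simple Euler–Maruyama scheme. The function name and all parameter values are illustrative assumptions; they are not the values fitted to the nematode data, and the parameterization used by BayesTraits may differ.

```python
import numpy as np

def simulate_ou(x0, theta, alpha, sigma2, n_steps, dt=1.0, n_rep=50, seed=0):
    """Simulate n_rep OU trajectories; returns an array of shape (n_steps + 1, n_rep)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_rep, float(x0))
    path = np.empty((n_steps + 1, n_rep))
    path[0] = x
    for t in range(1, n_steps + 1):
        # Stochastic (drift-like) component with variance sigma2 * dt
        noise = rng.normal(0.0, np.sqrt(sigma2 * dt), size=n_rep)
        # Deterministic pull towards the optimum theta with strength alpha
        x = x + alpha * (theta - x) * dt + noise
        path[t] = x
    return path

# Weak pull, strong noise: replicates scatter widely around the optimum
noisy = simulate_ou(x0=0.0, theta=10.0, alpha=0.01, sigma2=25.0, n_steps=500)
# Strong pull, weak noise: replicates converge on the optimum
pulled = simulate_ou(x0=0.0, theta=10.0, alpha=0.2, sigma2=0.01, n_steps=500)
print(noisy[-1].std(), pulled[-1].std())
```

When σ² dominates α, replicate trajectories started from the same state end far apart, which is the pattern described for most superfamilies.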

Fig. 5.—

Ornstein–Uhlenbeck model fitting to detect selection. Power to detect the selection strength parameter alpha (A), under a gamma transformation value of 0.5 (black), 0.8 (red) and 1 (blue), and simulations of the evolutionary trajectory of the DNA/TcMar-Tc2 TE superfamily loads (B) under the OU model fitted to this superfamily (σ² = 1×10⁹, α = 4×10⁵ for 10⁵ generations and 50 replications).

Discussion

The common ancestor of Nematoda dates back to the Cambrian radiation (Vanfleteren et al. 1994), 550 million years ago, and thus the genome sequences of nematodes that have become available in the last decade provide a unique opportunity for comparative genomics analyses of the long-term forces shaping evolution. This contrasts with most previous studies, which were only able to analyze recent periods (Duret et al. 2000; Wright et al. 2001; Albach and Greilhuber 2004; Cutter et al. 2008; Campos et al. 2012, 2014; Ågren et al. 2014, 2015; Hess et al. 2014; Fierst et al. 2015; but see, Whitney et al. 2010). We present analyses of the long-term evolution of TEs, exploring the roles and importance of multiple deterministic forces in a phylogenetic design. Our results establish that diversification in TE loads is recent and independent of GC content, life history, and RNAi, and is best understood as a stochastic process. We also find a consistent reduction in TE loads at ancestral nodes across the Nematoda tree, most notably at the base of clade III + IV + V. It would thus seem that the consequences of genetic drift and purifying selection, known to shape TE loads at the population level (Lynch and Conery 2003; Hua-Van et al. 2011), endure to shape TE load variation at the phylum level, with little effect of other potentially deterministic forces. A caveat is that the constant reduction in TE loads in ancestral nodes may reflect the long-term reduction in genome size as a whole, rather than a specific reduction in TE load.

Long-Term GC Content Variation Does Not Determine TE Loads

Genome GC content can change gradually along the phylogenetic tree. We therefore used an analytical procedure that accounts for the ancestral character states of both TE load and total GC content traits and tested for correlation between the two through the evolutionary history of nematodes. In humans, purifying selection against TE loss in gene rich regions of the genome is the main driver of variability in Alu element loads between GC-rich and -poor genomic regions (Brookfield 2001; Hellen and Brookfield 2013). However, although local GC content variation in the genome may explain the distribution of TEs inside that genome, the total GC content of the genome does not predict TE load differences between species as we did not find substantial correlation between the TE loads and the total GC content of the nematode genome assemblies. While the local GC content may indeed influence the number of insertions fixed in a given locus, it is not a limiting factor on TE loads in the genome as a whole.
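One standard way to test for a correlation between two traits while accounting for shared ancestry is Felsenstein's phylogenetically independent contrasts. The sketch below computes standardized contrasts on a hypothetical four-tip tree and correlates them through the origin. The tree, the trait values, and the function names are assumptions for illustration, and the sketch is not claimed to reproduce the exact procedure described in the supplementary methods.

```python
import math

def contrasts(node, trait):
    """Return (ancestral value, extra branch length, standardized contrasts).

    A node is either a tip name (str) or a pair ((child, length), (child, length)).
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, bl_l), (right, bl_r) = node
    x_l, extra_l, c_l = contrasts(left, trait)
    x_r, extra_r, c_r = contrasts(right, trait)
    v_l, v_r = bl_l + extra_l, bl_r + extra_r
    standardized = (x_l - x_r) / math.sqrt(v_l + v_r)
    # Ancestral value is a weighted mean; its uncertainty lengthens the parent branch
    x_anc = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return x_anc, extra, c_l + c_r + [standardized]

def origin_correlation(cx, cy):
    """Correlation of two sets of contrasts, forced through the origin."""
    num = sum(a * b for a, b in zip(cx, cy))
    return num / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) and made-up trait values
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ((("C", 1.0), ("D", 1.0)), 1.0))
gc = {"A": 0.30, "B": 0.32, "C": 0.41, "D": 0.45}
te = {"A": 1200.0, "B": 900.0, "C": 4000.0, "D": 300.0}
r = origin_correlation(contrasts(tree, gc)[2], contrasts(tree, te)[2])
print(r)
```

A tree with n tips yields n − 1 contrasts, each of which is independent of the others under a Brownian motion model, which is what makes them suitable for standard correlation and ANOVA tests.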

Recent Variation in RNAi Pathways and Life History Is Not a Predictor of TE Evolution

TE load variation is independent of recent shifts in the species’ life history or RNAi pathways involved in TE silencing. Less than one percent of the ANOVA tests examining the effect of RNAi, parasitism, and mating system on TE loads suggested significant associations between traits and TE loads, a proportion that does not exceed an acceptable type I error rate. Since we cannot determine historic character states for RNAi pathways and life history, it is impossible to rule out completely that they would explain TE loads, but this limitation is shared by studies that do suggest a significant effect of these factors. In addition, the OU models fitted to the data were stochastic, and not directional. Even though a selection component is directly assumed in this model, and therefore always found, it was never strong enough to counteract stochasticity in our simulations. Deterministic models of TE load evolution thus have little or no support, and instead, the variation of TE loads among the extant species is consistent with a stochastic model. Exclusion of this wide range of direct possible deterministic explanations for TE load variation means that complex interactions among such forces must be postulated to retain a strong effect for these proposed mechanisms. This is not to say that such deterministic effects are absent, only that they are short lived due to the genetic drift that often counteracts them. That we find genetic drift to be a dominant evolutionary force for TEs is not unexpected, as drift has been suggested to play a key role in the evolution of multicellular organisms due to their long-term and ancient reduction in effective population size (Lynch and Conery 2003).

TE Load Contraction Has Prevailed in TE Evolution

Ancestral state reconstructions reveal a consistent contraction in TE loads through time on our tree. Furthermore, long branches in the phylogeny, including terminal ones, often coincide with a reduction of at least 50% in TE loads (fig. 4) and almost never with an increase. We thus suggest that a component of long-term purifying selection, acting at the population level (Hua-Van et al. 2011), in addition to the more recent effect of drift discussed earlier, should be included in a realistic model at this scale. In such a model purifying selection prevails in the long run over genetic drift that might increase or preserve TE loads temporarily. If this is true, the co-occurrence of increased purifying selection on LTR elements with their increased expansions in terminal nodes suggests that, on average, LTR element loads have a tendency to increase faster than other elements and are therefore more deleterious and exposed to stronger purifying selection, in accordance with previous predictions (Brookfield 1995; Kidwell and Lisch 2001).

An increased strength of purifying selection that might be experienced by LTR elements could result from either their possible indiscriminate targeting of genic regions (Finnegan 1992; Pritham 2009) or from their suggested role in induction of increased ectopic recombination (Montgomery et al. 1987). It may be that LTR elements have not been able to evolve to efficiently target nongenic regions of the genome (Zuker et al. 1984; McDonald et al. 1997). One signature of increased ectopic recombination as a driver of purifying selection on LTR TEs would be an inverse correlation between the median sequence length of TE families and their loads in a given species (Petrov et al. 2003), but we did not detect such an inverse correlation (supplementary methods S1, Supplementary Data, Supplementary Material online). Still, additional sampling is required to pin down the cause of TE contraction in ancestral nodes.

Conclusions

A wide body of literature has sought biological explanations for the observed patterns of variation of TE loads in eukaryotic genomes, invoking explanatory variables such as purifying selection, mating system, parasitic lifestyle, genome-wide GC content and RNAi pathways for silencing TEs. Our analysis of the evolution of TEs on a long time scale, across the entire phylum Nematoda, shows high statistical power to detect directional selection, yet reveals that these variables do not, in fact, explain TE load variation among species, with the possible exception of purifying selection, given time. Instead, variation in TE loads is largely stochastic, explained by genetic drift, with little or no consistent effect of life history or genomic explanatory variables. We acknowledge that other characteristics, such as horizontal gene transfer, or recurrent activation and deactivation of TEs might also be stochastic. However, the strong congruence of the TE counts with the phylogenetic tree suggests that the variability in TE loads within species is smaller than between species, and thus the observed counts are close to fixation by drift. We also emphasize here that our results do not reject the importance of these or other factors, for an individual or a population, over relatively short time scales. Our inference is that in the long run they will not determine the evolutionary trajectories of TE loads, due to strong stochastic effects, and ultimately purifying selection. We also stress that although genetic drift and selection are processes that occur in populations, their signature can additionally be observed in speciation and over phylogenetic scales. We suggest that only studies that examine TE load across a large number of life history transitions and over large timescales will be able to provide power to reliably distinguish between stochastic and deterministic forces, and quantify the balance of evolutionary processes shaping this major component of eukaryotic genomes.

Methods

Genome Assemblies

Genome assemblies of species from phylum Nematoda, representing the five major clades (Blaxter et al. 1998), were obtained from different sources (supplementary table S3, Supplementary Material online). The assemblies included 4 species from Dorylaimia (Clade I), 1 from Enoplia (Clade II), 9 from Spirurina (Clade III), 15 from Tylenchina (Clade IV) and 13 from Rhabditina (Clade V). For Dorylaimia and Enoplia (Clades I and II), we analyzed all the available genome assemblies. The genome of the tardigrade Hypsibius dujardini was used as an outgroup. To compare the completeness of the genome assemblies, the N50 metric (supplementary methods S1, Supplementary Data, Supplementary Material online) was calculated for each (supplementary table S3, Supplementary Material online). The GC content of each genome assembly was calculated (supplementary methods S1, Supplementary Data, Supplementary Material online). All the scripts, the input files and the output files used in the analyses presented here are available on GitHub at https://github.com/HullUni-bioinformatics/Nematoda-TE-Evolution.
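
The two assembly summary statistics mentioned above can be sketched in a few lines of Python. This is only an illustration, not the script from supplementary methods S1: the function names are hypothetical, N50 is computed with the common definition (the contig length at which the cumulative length of contigs sorted from longest to shortest first reaches half the assembly), and GC content is assumed to be computed over unambiguous bases only.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def gc_content(sequences):
    """Return the G+C fraction, counting only unambiguous bases (A, C, G, T).
    Ignoring N and other ambiguity codes is an assumption of this sketch."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases supplied")
    return gc / (gc + at)
```

For example, contigs of lengths 2, 3, 4, 5 and 6 have a total length of 20, and the two longest contigs already cover 11 bases, so the N50 is 5.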

TE Identification

We conducted TE searches in the genome assemblies rather than in sequence read data, which are not publicly available for many of the target species. To mitigate the biases associated with this approach, we have also utilized complementary methods of TE searches. One of the approaches was homology based searches using reference DNA sequences of elements in a de novo constructed library, representing a wide taxonomic range within phylum Nematoda. RepeatModeler 1.0.4 (Smit and Hubley 2010b) was used to identify repeat sequences in each genome assembly using RECON (Bao and Eddy 2002), RepeatScout (Price et al. 2005) and TRF (Benson 1999). RepeatModeler uses RepeatMasker (Smit and Hubley 2010a) to classify the consensus sequences of the recovered repetitive sequence clusters. The identification stage employed RMBLAST (Camacho et al. 2009) and the Eukaryota TE library from Repbase Update (Jurka et al. 2005). The consensus sequences from all the species were pooled, and the uclust algorithm in USEARCH (Edgar 2010) was used to make a nonredundant library, picking one representative sequence for each 80% identical cluster. Additional classification of the consensus sequences was performed with the online version of Censor (Jurka et al. 1996). Classifications supported by matches with a score value >300 and 80% identity were retained. The script used to construct this library is in supplementary methods S1, Supplementary Data, Supplementary Material online.

RepeatMasker (Smit and Hubley 2010a) was used to search for repeat sequences in the Nematoda genome assemblies and that of H. dujardini, using this de novo Nematoda library (supplementary methods S1, Supplementary Data, Supplementary Material online). To eliminate redundancies in RepeatMasker output, we used One Code to Find Them All (Bailly-Bechet et al. 2014), which assembled overlapping matches with similar classifications, and retained only the highest scoring match of any remaining group of overlapping matches (supplementary methods S1, Supplementary Data, Supplementary Material online). Alternative approaches to identify TEs were also employed. TransposonPSI B (http://transposonpsi.sourceforge.net/), which searches for protein sequence matches in a protein database, thus allowing accurate identification of shorter fragments, and LTRharvest (Ellinghaus et al. 2008), which identifies secondary structures (supplementary methods S1, Supplementary Data, Supplementary Material online), were used to screen the target genomes. For TransposonPSI searches, only chains with a combined score >80 were retained, whereas for LTRharvest searches we retained only matches that were at least 2,000 bp long and 80% similar to the query. Where matches from the three approaches overlapped, we retained only the longest match (supplementary methods S1, Supplementary Data, Supplementary Material online).
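
The final merging step, retaining only the longest of any overlapping matches, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure in supplementary methods S1: the `Match` record and its field names are hypothetical, coordinates are treated as closed intervals, and chains of overlaps are resolved greedily from the longest match downwards, which the text does not specify.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
Match = namedtuple("Match", "contig start end source")


def overlaps(a, b):
    """True if two matches lie on the same contig and share at least one base."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def keep_longest(matches):
    """Visit matches from longest to shortest and keep a match only if it
    does not overlap any match that has already been kept."""
    by_length = sorted(matches, key=lambda m: m.end - m.start, reverse=True)
    kept = []
    for m in by_length:
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Report the survivors in genomic order.
    return sorted(kept, key=lambda m: (m.contig, m.start))
```

With this greedy rule a short match can be discarded because it overlaps a longer one that is itself kept; a different tie-breaking or chaining rule would give different results for complex overlap groups.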

Characterization of RNAi Pathways

Three key proteins, distinguishing the three RNAi pathways discussed in Sarkies et al. (2015), were searched for in the genome assemblies (supplementary methods S1, Supplementary Data, Supplementary Material online), using the program Exonerate (Slater and Birney 2005). Sequences from the supplementary files data S1 and S2 from Sarkies et al. (2015) were used as queries to identify homologues of PIWI, an Argonaute (AGO) subtype, and RNA-dependent RNA polymerase (RdRP; specifically subtypes RRF1 and RRF3), respectively. Only matches at least 100 amino acids (aa) long and at least 60% similar to the query were retained. In addition, only the best-scoring match out of several overlapping matches was used (supplementary methods S1, Supplementary Data, Supplementary Material online). The matches and the queries were used to build two phylogenetic trees, one of PIWI (and other AGO) sequences and the other of RdRP sequences, to verify the identity of the matches (supplementary methods S1, Supplementary Data, Supplementary Material online). Each of the data sets was aligned using the L-INS-i algorithm in MAFFT 7 (Katoh and Standley 2013), and cleared of positions with a missing data proportion of over 0.3 using trimAl 1 (Capella-Gutiérrez et al. 2009). In the resulting alignment, only sequences longer than 60 aa were retained. Maximum Likelihood (ML) trees were reconstructed using RAxML 8 (Stamatakis 2014) with SH-like branch supports. Species that occurred at least once in any of the three clades (PIWI in the first tree and RRF1 and RRF3 in the second) were scored as possessing that gene (fig. 1). Where a species did not have a representative sequence in one of the clades, a directed search for the specific protein was conducted in the sequences that did not pass the filter (i.e., the best match had lower score and length than the set cutoff). The identity of sequences retrieved in this way was examined in a second pass of phylogenetic reconstruction. This step did not yield additional phylogenetically validated matches and confirmed the validity of the cutoff set in the filtering step.
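
The alignment cleaning described above (dropping positions with more than 0.3 missing data, then dropping short sequences) can be illustrated with a small pure-Python function. It is a sketch of the idea only and does not reproduce trimAl: the function name and parameters are hypothetical, and treating only `-` and `?` as missing data, as well as counting sequence length in non-missing residues after trimming, are assumptions.

```python
def trim_alignment(alignment, max_missing=0.3, min_residues=60, missing="-?"):
    """alignment maps sequence names to aligned sequences of equal length.

    Columns whose proportion of missing characters exceeds max_missing are
    removed; sequences are then kept only if they retain more than
    min_residues non-missing characters."""
    names = list(alignment)
    if not names:
        return {}
    n_cols = len(alignment[names[0]])
    if any(len(alignment[n]) != n_cols for n in names):
        raise ValueError("sequences must be aligned to equal length")
    keep = []
    for col in range(n_cols):
        n_missing = sum(alignment[n][col] in missing for n in names)
        if n_missing / len(names) <= max_missing:
            keep.append(col)
    trimmed = {}
    for n in names:
        seq = "".join(alignment[n][c] for c in keep)
        n_residues = sum(ch not in missing for ch in seq)
        if n_residues > min_residues:
            trimmed[n] = seq
    return trimmed
```

The defaults mirror the thresholds quoted in the text (0.3 and 60 aa); toy alignments need a smaller `min_residues` to retain anything.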

Phylogenetic Reconstruction of the Nematoda Using Small Subunit Ribosomal RNA (SSU-rRNA) Sequences

To control for phylogenetic relationships within the TE counts data set, we inferred a species phylogenetic tree using the SSU-rRNA gene. This locus is considered to be reliable for the reconstruction of the phylogeny of Nematoda, and produces trees that tend to agree with previous analyses (Blaxter et al. 1998; Holterman et al. 2006; Meldal et al. 2007; van Megen et al. 2009). First, we identified SSU-rRNA genes with BLAST+ 2.2.28 (Camacho et al. 2009), in each of the genome assemblies, where for each species the query was an SSU-rRNA sequence of the same or closely related species, taken from the Silva 122 database (Quast et al. 2013). Matches shorter than 1,400 bp were not selected and the query sequence was retained instead, provided that it was identical to the match. Species for which the SSU-rRNA sequence could not be recovered and was not available online were excluded from further analysis. Since unbalanced taxon sampling may reduce the accuracy of the phylogenetic reconstruction (Heath et al. 2008), we also included additional sequences from Silva (Quast et al. 2013), representing the diversity of Nematoda. ReproPhylo 0.1 (Szitenberg et al. 2015) was used to ensure the reproducibility of the phylogenetic workflow (supplementary results S2, Supplementary Material online). A secondary-structure-aware sequence alignment was conducted using SINA 1.2 (Pruesse et al. 2012), and the alignment was then trimmed with trimAl 1 (Capella-Gutiérrez et al. 2009) to exclude positions with missing data levels that lie above a heuristically determined cutoff. An ML search was conducted with RAxML 8 (Stamatakis 2014) under the GTR-GAMMA model and starting with 50 randomized maximum parsimony trees. Branch support values were calculated from 100 thorough bootstrap tree replications. After tree reconstruction, nodes that did not represent a genome assembly (either the BLAST match or the Silva sequence substitute) were removed from the tree programmatically using ETE2 (Huerta-Cepas et al. 2010). To characterize the phylogenetic uncertainty, we generated a posterior distribution of trees using Phylobayes 3 (Lartillot et al. 2009). Two chains were computed, using the trimmed ML tree as a starting tree and the GTR–CAT model (sensu Lartillot and Philippe 2004). The analysis was continued until the termination criteria were met (specifically, maxdiff and rel_diff < 0.1, and effsize > 100), with a burn-in fraction of 0.2 and by sampling each 10th tree. The same subsample of trees was used to generate a consensus tree. The reconstruction of the SSU rRNA tree is detailed in supplementary methods S1, section 1, Supplementary Material online.
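
The subsampling of the posterior tree sample (discarding a burn-in fraction, then keeping every 10th tree) amounts to a simple slice. The helper below is a hypothetical sketch of that step only; it is not part of Phylobayes, and it makes no attempt to evaluate the convergence criteria quoted above.

```python
def subsample_chain(samples, burnin_fraction=0.2, every=10):
    """Discard the first burnin_fraction of the samples, then keep every
    `every`-th remaining sample, starting with the first one after burn-in."""
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    if every < 1:
        raise ValueError("every must be a positive integer")
    start = int(len(samples) * burnin_fraction)
    return samples[start::every]
```

Here `samples` would be a list of trees (for example Newick strings read from a chain file) in the order in which they were sampled.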

The Effect of Life Cycle, RNAi and Percent GC Variation on TE Loads

Primary literature was surveyed to determine the mating system of each species and to identify parasites of plants and animals (supplementary table S1, Supplementary Material online). The effect of these factors on the TE loads was tested with an ANOVA of phylogenetically independent contrasts, using the R package Phytools (Revell et al. 2008). Species were classified into four mating systems: dioecy, androdioecy, facultative parthenogenesis (including both species that fuse sister gametes and species that duplicate the genome in the gametes) and strict apomixis. Species that had both hermaphroditic and gonochoric life cycle stages were classified as gonochoric (e.g., Heterorhabditis bacteriophora; Poinar 1975). We conducted three tests: in the first, the four levels were tested; in the second, the parthenogenetic and androdioecious species were pooled; and in the third, species were divided into dioecious and nondioecious.

To test the effect of parasitism, free living species, plant parasites and animal parasites were first tested as three separate groups, and then plant and animal parasites were pooled into a single group for a second test. The necromenic lifestyle of Pristionchus pacificus was classified as free living because this species is not reported to depend on any host function, only on the organisms that build up on its carcass (Dieterich et al. 2008).

ANOVA of phylogenetically independent contrasts was also used to test the effect of the variation in RNAi pathways on the TE loads. Six groups of species were determined based on the presence or absence of PIWI, RRF1 and RRF3 proteins. In addition, for each of the three proteins, the effect of their presence was tested independently of the other proteins. Finally, dependency between GC content of genome assemblies and their TE loads was tested by a regression of the squared contrasts of TE counts and the estimates of GC contents in ancestral nodes (Revell et al. 2008). The execution of ANOVA and correlation tests is detailed in supplementary methods S1, sections 10.23–10.25, Supplementary Material online.
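
For readers unfamiliar with phylogenetically independent contrasts (Felsenstein 1985), the core recursion can be sketched in pure Python. This is an illustration of the standard algorithm on a strictly bifurcating tree, not the phytools code used in the analysis, and it does not perform the ANOVA or regression steps. The nested-tuple tree encoding and the function name are assumptions made for the example, and all branch lengths leading to tips must be positive.

```python
import math


def contrasts(node, traits):
    """Compute standardized independent contrasts by post-order traversal.

    A tip is ("name", branch_length); an internal node is
    ((left, right), branch_length). Returns a tuple of
    (estimated node value, adjusted branch length, list of contrasts)."""
    label, length = node
    if isinstance(label, str):
        return traits[label], length, []
    left, right = label
    x1, v1, c1 = contrasts(left, traits)
    x2, v2, c2 = contrasts(right, traits)
    # Standardized contrast between the two daughter lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Node value: average of the daughters weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to reflect estimation uncertainty.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]
```

A tree with n tips yields n - 1 contrasts, and the value returned for the root is the weighted estimate of the root state.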

Phylogenetic Signal in the TE Data

The phylogenetic transformations λ, κ and δ (Pagel 1994) were calculated with BayesTraits (Pagel 1997) over the subsample of trees produced with Phylobayes (see above), to account for phylogenetic uncertainty (supplementary methods S1, sections 10.4–10.10, Supplementary Material online). They were estimated for the pooled classes of DNA, LTR, LINE and SINE elements, as well as for individual superfamilies that occurred in at least 15 nematode species. The proportion of individual TEs that were included in this analysis is depicted in supplementary figure S2, Supplementary Material online (bottom).
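
As a reminder of what two of these transformations do, the sketch below applies λ to a phylogenetic variance-covariance matrix and κ to a list of branch lengths. It follows the usual textbook definitions (λ rescales the covariance that tips share through common ancestry; κ raises each branch length to a power) and is not BayesTraits code; the function names and the list-of-lists matrix format are assumptions. The δ transformation, which rescales node depths, is not shown.

```python
def lambda_transform(vcv, lam):
    """Multiply the off-diagonal (shared-history) entries of a square
    variance-covariance matrix by lam, leaving the diagonal unchanged.
    lam = 1 keeps the tree as is; lam = 0 removes all phylogenetic signal."""
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


def kappa_transform(branch_lengths, kappa):
    """Raise every branch length to the power kappa. kappa = 1 keeps the
    original lengths; kappa = 0 sets every branch to length 1."""
    return [length ** kappa for length in branch_lengths]
```

Values of λ close to 1 indicate that trait similarity among species follows the tree closely, which is the sense in which the text speaks of phylogenetic signal in the TE data.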

Detection of Selection and Convergent Evolution of TE Loads

The Ornstein-Uhlenbeck (OU) process (Gardiner 1985) was originally suggested as an approach to model the evolution of continuous traits based on phylogenies (Felsenstein 1985). Building upon this process, Hansen (1997) has developed a method to study changes in selection regimes, on the macroevolutionary scale, neglecting stochastic effects on the process. In the OU process, a change in character state depends on the strength of selection (α) and its distance and direction from the current selection optimum (θ). Goodwin later (Goodwin et al. 2003) added a Brownian Motion (BM) component to the model (σ²), recognizing the confounding effect that stochastic events related to demography might have on selection. The R package PMC (Boettiger et al. 2012) was used to assess the power of our data to detect OU processes in the evolution of TEs. The OU parameter α was estimated with BayesTraits (Pagel 1997), neglecting stochastic effects, in superfamilies occurring in at least 15 nematode species. Where a significant α was detected (P value < 0.05, in the posterior distribution of trees), indicating selection, we examined the possibility of convergent evolution between species with similar life cycle or RNAi status with the R package SURFACE (Ingram and Mahler 2013). In these analyses, selection optima shifts are detected in the trees’ branches through a heuristic search, which uses AIC test results as the optimization criterion. Then, further improvement to the fit of the model is attempted by unifying optimum shifts. Where the unification of two or more optimum shifts improved the AIC score of the model, convergent evolution is inferred. SURFACE uses the R package OUCH (Butler and King 2004) to fit OU models, and unlike BayesTraits, includes a stochastic component (σ²), expressed by BM, in the OU model. The steps described in this paragraph are detailed in supplementary methods S1, sections 10.12–10.22, Supplementary Material online.
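
The roles of α, θ and σ² can be made concrete with a short simulation of a single lineage. The sketch below uses the standard closed-form mean and variance of an OU process with a constant optimum; it is not the model-fitting code of BayesTraits, OUCH or SURFACE, and the function names are hypothetical. Setting α to 0 reduces the process to Brownian motion.

```python
import math
import random


def ou_mean(x0, theta, alpha, t):
    """Expected trait value after time t when starting from x0."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma2, alpha, t):
    """Variance of the trait value after time t."""
    if alpha == 0:
        return sigma2 * t  # no pull toward the optimum: Brownian motion
    return sigma2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))


def simulate_ou(x0, theta, alpha, sigma2, t, steps, rng):
    """Simulate one trajectory with `steps` exact transitions over time t.
    rng is a random.Random instance, so runs can be made reproducible."""
    dt = t / steps
    x = x0
    path = [x]
    for _ in range(steps):
        sd = math.sqrt(ou_variance(sigma2, alpha, dt))
        x = ou_mean(x, theta, alpha, dt) + sd * rng.gauss(0, 1)
        path.append(x)
    return path
```

For example, `simulate_ou(10.0, 2.0, 1.0, 0.5, 5.0, 100, random.Random(1))` returns a 101-point trajectory that is pulled from 10 toward the optimum of 2 while fluctuating with a long-run variance of σ²/(2α) = 0.25.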

Magnitude of Change at Ancestral Nodes

To identify nodes in the species tree that were hotspots of change in TE loads, we reconstructed the ancestral character states for a subset of elements using an ML analysis (Revell et al. 2008). Since the phylogenetically independent contrast of a root node is also the maximum likelihood estimate of its character state (Felsenstein 1985), this analysis sequentially treats each node as the root, in order to compute the TE load at this node. The element subset included only classified elements from superfamilies that occurred in at least 15 species. Within each of the groups of “cut and paste”, LTR, LINE, and SINE elements, we calculated the median change magnitude across the superfamilies in the group, and for each node. The magnitude of change was expressed as the proportion of the load of a given element superfamily at node X out of the load of the same superfamily at the parent of node X. The steps described here are detailed in supplementary methods S1, sections 10.23 and 10.26–10.27, Supplementary Material online.
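
The change magnitude defined above (load at node X divided by load at its parent, summarized as the median across superfamilies) can be sketched as follows. The data structures and the function name are hypothetical, the ancestral loads are assumed to have been reconstructed already, and skipping superfamilies with a zero load at the parent is an assumption of this sketch rather than a rule stated in the text.

```python
from statistics import median


def change_magnitudes(parent_of, loads):
    """parent_of maps each node to its parent (None for the root).
    loads maps each superfamily to a dict of node -> TE load.
    Returns node -> median, across superfamilies, of load(node) / load(parent)."""
    result = {}
    for node, parent in parent_of.items():
        if parent is None:
            continue  # the root has no parent to compare with
        ratios = [
            by_node[node] / by_node[parent]
            for by_node in loads.values()
            if by_node[parent] > 0  # ratio undefined for a zero parental load
        ]
        if ratios:
            result[node] = median(ratios)
    return result
```

Under this definition a value below 0.5 at a node corresponds to the reductions of at least 50% discussed earlier, and a value above 1 corresponds to an expansion.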

Supplementary Material

Supplementary Data

Acknowledgments

We thank Dr Beth Hellen for her valuable comments, and Dr Peter Sarkies, Dr Arvid Ågren and Prof Carl Boettiger for assistance with the analysis. The following funding sources supported this study: The Science of the Environment Council grant (http://www.nerc.ac.uk/) NE/J011355/1 was awarded to D.H.L. and M.L.B. The Science of the Environment Council grant (http://www.nerc.ac.uk/) R8/H10/56 was awarded to GenPool, University of Edinburgh. The Medical Research Council grant (http://www.mrc.ac.uk/) G0900740 was awarded to GenPool, University of Edinburgh. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.

Literature Cited

  1. Ågren JA, et al. 2014. Mating system shifts and transposable element evolution in the plant genus Capsella. BMC Genomics 15:602. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Ågren JA, Greiner S, Johnson MTJ, Wright SI. 2015. No evidence that sex and transposable elements drive genome size variation in evening primroses. Evolution 69:1053–1062. [DOI] [PubMed] [Google Scholar]
  3. Albach DC, Greilhuber J. 2004. Genome size variation and evolution in Veronica. Ann Bot. 94:897–911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Andersson SG, Kurland CG. 1990. Codon preferences in free-living microorganisms. Microbiol Rev. 54:198–210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Aravin AA, et al. 2001. Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in the D. melanogaster germline. Curr Biol. 11:1017–1027. [DOI] [PubMed] [Google Scholar]
  6. Aravin AA, Hannon GJ, Brennecke J. 2007. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318:761–764. [DOI] [PubMed] [Google Scholar]
  7. Arunkumar R, Ness RW, Wright SI, Barrett SCH. 2015. The evolution of selfing is accompanied by reduced efficacy of selection and purging of deleterious mutations. Genetics 199:817–829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Bagijn MP, et al. 2012. Function, targets, and evolution of Caenorhabditis elegans piRNAs. Science 337:574–578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bailly-Bechet M, Haudry A, Lerat E. 2014. ‘One code to find them all’: a perl tool to conveniently parse RepeatMasker output files. Mob DNA 5:13. [Google Scholar]
  10. Bao Z, Eddy SR. 2002. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269–1276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bast J, et al. 2015. No accumulation of transposable elements in asexual arthropods. Mol Biol Evol. 33(3):697–706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Bennett EA, Coleman LE, Tsui C, Pittard WS, Devine SE. 2004. Natural genetic variation caused by transposable elements in humans. Genetics 168:933–951. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Berg OG, Kurland CG. 1997. Growth rate-optimised tRNA abundance and codon usage. J Mol Biol. 270:544–550. [DOI] [PubMed] [Google Scholar]
  15. Bestor TH. 1999. Sex brings transposons and genomes into conflict. Genetica 107:289–295. [PubMed] [Google Scholar]
  16. Biémont C, Tsitrone A, Vieira C, Hoogland C. 1997. Transposable element distribution in Drosophila. Genetics 147:1997–1999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Blaxter ML, et al. 1998. A molecular evolutionary framework for the phylum Nematoda. Nature 392:71–75. [DOI] [PubMed] [Google Scholar]
  18. Boettiger C, Coop G, Ralph P. 2012. Is your phylogeny informative? Measuring the power of comparative methods. Evolution 66:2240–2251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Bossdorf O, Richards CL, Pigliucci M. 2008. Epigenetics for ecologists. Ecol Lett. 11:106–115. [DOI] [PubMed] [Google Scholar]
  20. Boutin TS, Le Rouzic A, Capy P. 2012. How does selfing affect the dynamics of selfish transposable elements. Mob DNA 3:5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Brookfield J. 1995. Transposable elements as selfish DNA. Mobile genetic elements. New York (NY: ): Oxford University Press; p. 130–153. [Google Scholar]
  22. Brookfield JFY. 2001. Selection on Alu sequences?. Curr Biol. 11:R900–R901. [DOI] [PubMed] [Google Scholar]
  23. Burke M, et al. 2015. The plant parasite Pratylenchus coffeae carries a minimal nematode genome. Nematology 17:621–637. [Google Scholar]
  24. Butler MA, King AA. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am Nat. 164:683–695. [DOI] [PubMed] [Google Scholar]
  25. Callinan PA, Batzer MA. 2006. Retrotransposable elements and human disease. Genome Dyn. 1:104–115. [DOI] [PubMed] [Google Scholar]
  26. Camacho C, et al. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Campos JL, Charlesworth B, Haddrill PR. 2012. Molecular evolution in nonrecombining regions of the Drosophila melanogaster genome. Genome Biol Evol. 4:278–288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Campos JL, Halligan DL, Haddrill PR, Charlesworth B. 2014. The relation between recombination rate and patterns of molecular evolution and variation in Drosophila melanogaster. Mol Biol Evol. 31:1010–1028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Charlesworth B, Charlesworth D. 1983. The population dynamics of transposable elements. Genet Res. 42:1–27. [Google Scholar]
  31. Charlesworth B, Langley CH, Sniegowski PD. 1997. Transposable element distributions in Drosophila. Genetics 147:1993–1995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Charlesworth B, Sniegowski P, Stephan W. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220. [DOI] [PubMed] [Google Scholar]
  33. Chung W-J, Okamura K, Martin R, Lai EC. 2008. Endogenous RNA interference provides a somatic defense against Drosophila transposons. Curr Biol. 18:795–802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Collins DW, Jukes TH. 1993. Relationship between G+ C in silent sites of codons and amino acid composition of human proteins. J Mol Evol. 36:201–213. [DOI] [PubMed] [Google Scholar]
  35. Cordaux R, Batzer MA. 2009. The impact of retrotransposons on human genome evolution. Nat Rev Genet. 10:691–703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Criscione CD, Blouin MS. 2005. Effective sizes of macroparasite populations: a conceptual model. Trends Parasitol. 21:212–217. [DOI] [PubMed] [Google Scholar]
  37. Cutter AD, Wasmuth JD, Washington NL. 2008. Patterns of molecular evolution in Caenorhabditis preclude ancient origins of selfing. Genetics 178:2093–2104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Czech B, et al. 2008. An endogenous small interfering RNA pathway in Drosophila. Nature 453:798–802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. D’Onofrio G, Mouchiroud D, Aïssani B, Gautier C, Bernardi G. 1991. Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J Mol Evol. 32:504–510. [DOI] [PubMed] [Google Scholar]
  40. Das PP, et al. 2008. Piwi and piRNAs act upstream of an endogenous siRNA pathway to suppress Tc3 transposon mobility in the Caenorhabditis elegans germline. Mol Cell 31:79–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. de Koning APJ, Gu W, Castoe TA, Batzer MA, Pollock DD. 2011. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 7:e1002384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Dieterich C, et al. 2008. The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism. Nat Genet. 40:1193–1198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Duret L, Marais G, Biémont C. 2000. Transposons but not retrotransposons are located preferentially in regions of high recombination rate in Caenorhabditis elegans. Genetics 156:1661–1669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. [DOI] [PubMed] [Google Scholar]
  45. Ellinghaus D, Kurtz S, Willhoeft U. 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9:18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Felsenstein J. 1985. Phylogenies and the comparative method. Am Nat. 125:1–15. [Google Scholar]
  47. Fierst JL, et al. 2015. Reproductive mode and the evolution of genome size and structure in Caenorhabditis nematodes. PLoS Genet. 11:e1005323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Finnegan DJ. 1992. Transposable elements. Curr Opin Genet Dev. 2:861–867. [DOI] [PubMed] [Google Scholar]
  49. Gambari R, Nastruzzi C, Barbieri R. 1989. Codon usage and secondary structure of the rabbit alpha-globin mRNA: a hypothesis. Biomed Biochim Acta. 49:S88–S93. [PubMed] [Google Scholar]
  50. Gao X, Hou Y, Ebina H, Levin HL, Voytas DF. 2008. Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res. 18:359–369. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Gardiner CW. 1985. Stochastic methods. Berlin–Heidelberg–New York–Tokyo: Springer-Verlag. [Google Scholar]
  52. Gasior SL, Wakeman TP, Xu B, Deininger PL. 2006. The human LINE-1 retrotransposon creates DNA double-strand breaks. J Mol Biol. 357:1383–1393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Ghildiyal M, et al. 2008. Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 320:1077–1081. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Goodwin TJD, Butler MI, Poulter RTM. 2003. Cryptons: a group of tyrosine-recombinase-encoding DNA transposons from pathogenic fungi. Microbiology-SGM 149:3099–3109. [DOI] [PubMed] [Google Scholar]
  55. Gouy M, Gautier C. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055–7074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Govindaraju DR, Cullis CA. 1991. Modulation of genome size in plants: the influence of breeding systems and neighbourhood size. Evol Trends Plants. 5:43–51. [Google Scholar]
  57. Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC. 2000. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun. 269:692–696. [DOI] [PubMed] [Google Scholar]
  58. Hansen TF. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351. [DOI] [PubMed] [Google Scholar]
  59. Heath TA, Hedtke SM, Hillis DM. 2008. Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol. 46:239–257. [Google Scholar]
  60. Hedges DJ, Deininger PL. 2007. Inviting instability: transposable elements, double-strand breaks, and the maintenance of genome integrity. Mutat Res. 616:46–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Hellen EHB, Brookfield JFY. 2013. Alu elements in primates are preferentially lost from areas of high GC content. Peer J. 1:e78. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Hess J, et al. 2014. Transposable element dynamics among asymbiotic and ectomycorrhizal Amanita fungi. Genome Biol Evol. 6:1564–1578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Holm L. 1986. Codon usage and gene expression. Nucleic Acids Res. 14:3075–3087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Holm S. 1979. A simple sequentially rejective multiple test procedure. Scand Stat Theory Appl. 6:65–70. [Google Scholar]
  65. Holterman M, et al. 2006. Phylum-wide analysis of SSU rDNA reveals deep phylogenetic relationships among nematodes and accelerated evolution toward crown Clades. Mol Biol Evol. 23:1792–1800. [DOI] [PubMed] [Google Scholar]
  66. Hua-Van A, Le Rouzic A, Boutin TS, Filee J, Capy P. 2011. The struggle for life of the genome’s selfish architects. Biol Direct. 6:19.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Huerta-Cepas J, Dopazo J, Gabaldón T. 2010. ETE: a python environment for tree exploration. BMC Bioinformatics 11:24.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Huynen MA, Konings DAM, Hogeweg P. 1992. Equal G and C contents in histone genes indicate selection pressures on mRNA secondary structure. J Mol Evol. 34:280–291. [DOI] [PubMed] [Google Scholar]
  69. Ibarra-Laclette E, et al. 2013. Architecture and evolution of a minute plant genome. Nature 498:94–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Ikemura T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol. 151:389–409. [DOI] [PubMed] [Google Scholar]
  71. Ikemura T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 2:13–34. [DOI] [PubMed] [Google Scholar]
  72. Ingram T, Mahler DL. 2013. SURFACE: detecting convergent evolution from comparative data by fitting Ornstein-Uhlenbeck models with stepwise Akaike Information Criterion. Methods Ecol Evol. 4:416–425. [Google Scholar]
  73. Jurka J, et al. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462–467. [DOI] [PubMed] [Google Scholar]
  74. Jurka J, Klonowski P, Dagman V, Pelton P. 1996. CENSOR–a program for identification and elimination of repetitive elements from DNA sequences. Comput Chem. 20:119–121. [DOI] [PubMed] [Google Scholar]
  75. Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30:772–780. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Kawamura Y, et al. 2008. Drosophila endogenous small RNAs bind to Argonaute 2 in somatic cells. Nature 453:793–797. [DOI] [PubMed] [Google Scholar]
  77. Keren H, Lev-Maor G, Ast G. 2010. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 11:345–355. [DOI] [PubMed] [Google Scholar]
  78. Kidwell MG, Lisch D. 1997. Transposable elements as sources of variation in animals and plants. Proc Natl Acad Sci U S A. 94:7704–7711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Kidwell MG, Lisch DR. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution 55:1–24. [DOI] [PubMed] [Google Scholar]
  80. Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8:464–478. [DOI] [PubMed] [Google Scholar]
  81. Knight RD, Freeland SJ, Landweber LF. 2001. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2:research0010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Kojima KK, Jurka J. 2011. Crypton transposons: identification of new diverse families and ancient domestication events. Mob DNA 2(1):12. doi: 10.1186/1759-8753-2-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288. [DOI] [PubMed] [Google Scholar]
  84. Lartillot N, Philippe H. 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 21:1095–1109. [DOI] [PubMed] [Google Scholar]
  85. Le Rouzic A, Boutin TS, Capy P. 2007. Long-term evolution of transposable elements. Proc Natl Acad Sci U S A. 104:19375–19380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Leem Y-E, et al. 2008. Retrotransposon Tf1 is targeted to Pol II promoters by transcription activators. Mol Cell 30:98–107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Lerman DN, Michalak P, Helin AB, Bettencourt BR, Feder ME. 2003. Modification of heat-shock gene expression in Drosophila melanogaster populations via transposable elements. Mol Biol Evol. 20:135–144. [DOI] [PubMed] [Google Scholar]
  88. Lobry JR. 1996. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 13:660–665. [DOI] [PubMed] [Google Scholar]
  89. Lynch M. 2007. The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci U S A. 104(Suppl 1):8597–8604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Lynch M, Conery JS. 2003. The origins of genome complexity. Science 302:1401–1404. [DOI] [PubMed] [Google Scholar]
  91. Marracci S, Batistoni R, Pesole G, Citti L, Nardi I. 1996. Gypsy/Ty3-like elements in the genome of the terrestrial salamander Hydromantes (Amphibia, Urodela). J Mol Evol. 43:584–593. [DOI] [PubMed] [Google Scholar]
  92. Matzke MA, Mette MF, Matzke AJ. 2000. Transgene silencing by the host genome defense: implications for the evolution of epigenetic control mechanisms in plants and vertebrates. Plant Mol Biol. 43:401–415. [DOI] [PubMed] [Google Scholar]
  93. McDonald JF, et al. 1997. LTR retrotransposons and the evolution of eukaryotic enhancers. In: Evolution and Impact of Transposable Elements. Contemporary Issues in Genetics and Evolution. Springer Netherlands. p. 3–13. [PubMed]
  94. Meldal BHM, et al. 2007. An improved molecular phylogeny of the Nematoda with special emphasis on marine taxa. Mol Phylogenet Evol. 42:622–636. [DOI] [PubMed] [Google Scholar]
  95. Montgomery E, Charlesworth B, Langley CH. 1987. A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet Res. 49:31–41. [DOI] [PubMed] [Google Scholar]
  96. Muto A, Osawa S. 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci U S A. 84:166–169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Nordborg M. 2000. Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with partial self-fertilization. Genetics 154:923–929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Obbard DJ, Gordon KHJ, Buck AH, Jiggins FM. 2009. The evolution of RNAi as a defence against viruses and transposable elements. Philos Trans R Soc B Biol Sci. 364:99–115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Pagel M. 1994. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc R Soc B. 255:37–45. [Google Scholar]
  100. Pagel M. 1997. Inferring evolutionary processes from phylogenies. Zool Scr. 26:331–348. [Google Scholar]
  101. Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. 2003. Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol Biol Evol. 20:880–892. [DOI] [PubMed] [Google Scholar]
  102. Poinar GO. 1975. Description and biology of a new insect parasitic rhabditoid, Heterorhabditis bacteriophora n. gen., n. sp. (Rhabditida; Heterorhabditidae n. fam.). Nematologica 21:463–470. [Google Scholar]
  103. Price AL, Jones NC, Pevzner PA. 2005. De novo identification of repeat families in large genomes. Bioinformatics 21:i351–i358. [DOI] [PubMed] [Google Scholar]
  104. Pritham EJ. 2009. Transposable elements and factors influencing their success in eukaryotes. J Hered. 100:648–655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Pruesse E, Peplies J, Glöckner FO. 2012. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28:1823–1829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Quast C, et al. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Rebollo R, Romanish MT, Mager DL. 2012. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu Rev Genet. 46:21–42. [DOI] [PubMed] [Google Scholar]
  108. Revell LJ, Harmon LJ, Collar DC. 2008. Phylogenetic signal, evolutionary process, and rate. Syst Biol. 57:591–601. [DOI] [PubMed] [Google Scholar]
  109. Richards EJ. 2008. Population epigenetics. Curr Opin Genet Dev. 18:221–226. [DOI] [PubMed] [Google Scholar]
  110. Sarkies P, et al. 2015. Ancient and novel small RNA pathways compensate for the loss of piRNAs in multiple independent nematode lineages. PLoS Biol. 13:e1002061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Schnable PS, et al. 2009. The B73 maize genome: complexity, diversity, and dynamics. Science 326:1112–1115. [DOI] [PubMed] [Google Scholar]
  112. Sharp PM, Devine KM. 1989. Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do ‘prefer’ optimal codons. Nucleic Acids Res. 17:5029–5040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Sharp PM, Tuohy TMF, Mosurski KR. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14:5125–5143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Sijen T, Plasterk RHA. 2003. Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426:310–314. [DOI] [PubMed] [Google Scholar]
  115. Slater GSC, Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Slotkin RK, et al. 2009. Epigenetic reprogramming and small RNA silencing of transposable elements in pollen. Cell 136:461–472. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Smit A, Hubley R. 2010a. RepeatMasker Open-1.0. 1996-2010. Institute for Systems Biology, Seattle.
  118. Smit A, Hubley R. 2010b. RepeatModeler Open-1.0. 2008-2010. Institute for Systems Biology, Seattle.
  119. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Stenico M, Lloyd AT, Sharp PM. 1994. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22:2437–2446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Sueoka N. 1999. Two aspects of DNA base composition: G+ C content and translation-coupled deviation from intra-strand rule of A= T and G= C. J Mol Evol. 49:49–62. [DOI] [PubMed] [Google Scholar]
  122. Szitenberg A, John M, Blaxter ML, Lunt DH. 2015. ReproPhylo: an environment for reproducible phylogenomics. PLoS Comput Biol. 11:e1004447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Tabara H, et al. 1999. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99:123–132. [DOI] [PubMed] [Google Scholar]
  124. van Megen H, et al. 2009. A phylogenetic tree of nematodes based on about 1200 full-length small subunit ribosomal DNA sequences. Nematology 11:927–S27. [Google Scholar]
  125. Vanfleteren JR, et al. 1994. Molecular genealogy of some nematode taxa as based on cytochrome c and globin amino acid sequences. Mol Phylogenet Evol. 3:92–101. [DOI] [PubMed] [Google Scholar]
  126. Whitney KD, et al. 2010. A role for nonadaptive processes in plant genome size evolution? Evolution 64:2097–2109. [DOI] [PubMed] [Google Scholar]
  127. Wright S, Finnegan D. 2001. Genome evolution: sex and the transposable element. Curr Biol. 11:R296–R299. [DOI] [PubMed] [Google Scholar]
  128. Wright SI, Le QH, Schoen DJ, Bureau TE. 2001. Population dynamics of an Ac-like transposable element in self-and cross-pollinating Arabidopsis. Genetics 158:1279–1288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Wright SI, Ness RW, Foxe JP, Barrett SCH. 2008. Genomic consequences of outcrossing and selfing in plants. Int J Plant Sci. 169:105–118. [Google Scholar]
  130. Wright SI, Schoen DJ. 2000. Transposon dynamics and the breeding system. In: Transposable elements and genome evolution. Georgia Genetics Review 1. Springer Netherlands. p. 139–148.
  131. Zama M. 1989. Codon usage and secondary structure of mRNA. In: Nucleic Acids Symposium Series. p. 93–94. [PubMed]
  132. Zama M. 1996. Translational pauses during the synthesis of proteins and mRNA structure. In: Nucleic Acids Symposium Series. p. 179–180. [PubMed]
  133. Zou S, Ke N, Kim JM, Voytas DF. 1996. The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev. 10:634–645. [DOI] [PubMed] [Google Scholar]
  134. Zuker C, Cappello J, Lodish HF, George P, Chung S. 1984. Dictyostelium transposable element DIRS-1 has 350-base-pair inverted terminal repeats that contain a heat shock promoter. Proc Natl Acad Sci U S A. 81:2660–2664. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Genome Biology and Evolution are provided here courtesy of Oxford University Press