Proceedings of the National Academy of Sciences of the United States of America. 2008 Sep 4;105(37):13965–13970. doi: 10.1073/pnas.0804671105

Demography and weak selection drive patterns of transposable element diversity in natural populations of Arabidopsis lyrata

Steven Lockton 1, Jeffrey Ross-Ibarra 1, Brandon S Gaut 1,*
PMCID: PMC2544562  PMID: 18772373

Abstract

Transposable elements (TEs) are the major component of most plant genomes, and characterizing their population dynamics is key to understanding plant genome complexity. Yet there have been few studies of TE population genetics in plant systems. To study the roles of selection, transposition, and demography in shaping TE population diversity, we generated a polymorphism dataset for six TE families in four populations of the flowering plant Arabidopsis lyrata. The TE data indicated significant differentiation among populations, and maximum likelihood procedures suggested weak selection. For strongly bottlenecked populations, the observed TE band-frequency spectra fit data simulated under neutral demographic models constructed from nucleotide polymorphism data. Overall, we propose that TEs are subjected to weak selection, the efficacy of which varies as a function of demographic factors. Thus, demographic effects could be a major factor driving distributions of TEs among plant lineages.

Keywords: bottleneck, genetics, TE, display


Transposable elements (TEs) are a major component of plant genomes, comprising >50% of all large (>2,000 Mb) angiosperm genomes studied to date (1). In the 2,500-Mb maize genome, for example, TE amplification is the source of 60%–80% of the genomic sequence (2, 3). TEs are also abundant in the compact genomes of rice (430 Mb) and Arabidopsis thaliana (119 Mb), contributing ≈29% and ≈10% of their genomes, respectively (2, 4). Because the mean haploid angiosperm genome size is ≈6,400 Mb (5), it is no exaggeration to state that most of the DNA contained within the nuclei of flowering plants is, in fact, TE DNA. If one is to understand the dynamics and evolution of plant genomes, a comprehensive understanding of TE evolutionary dynamics is therefore necessary.

Population genetics is a powerful tool with which to study the evolutionary dynamics of TEs. TE population genetics has a particularly rich empirical history in Drosophila melanogaster, in which the surprisingly few occupied TE sites are found at low population frequencies (6, 7). What limits the number and population frequencies of TEs? Most models of TE population dynamics have focused on the maintenance of TE copy number via an equilibrium between transposition, which increases the abundance of TEs in a host genome, and natural selection, which removes deleterious TE insertions (8, 9). However, the number and distribution of TEs in genomes are unlikely to be determined by selection and transposition alone (10); factors such as the population and life history of the host may also play significant roles (11). Several recent studies of nucleotide polymorphism highlight the difficulty of identifying the signature of natural selection without first understanding the impact of demographic history (12–16). Yet the role of population structure in shaping TE distributions has been largely unexplored at empirical and theoretical levels (17). Without data on TE abundance within and among natural plant populations, our understanding of the evolutionary forces shaping TE distributions will remain incomplete.

Here we study the population genetics of TEs in A. lyrata (18). Much is known about TEs in the genus Arabidopsis, because the approximately 6,000 TEs within the A. thaliana genome have been well characterized (4, 19). A. lyrata diverged from A. thaliana ≈5 million years ago (20) and has become a model system for plant molecular population genetics (21). A. lyrata is a predominantly self-incompatible, perennial species distributed across northern and central Europe, Asia, and North America. A. lyrata consists of large, stable populations, particularly in Central Europe where populations are hypothesized to have served as Pleistocene refugia (21–23). Importantly, Ross-Ibarra et al. (24) modeled the demographic history of six natural A. lyrata populations based on single-nucleotide polymorphism (SNP) data from 77 nuclear genes. They compared a putatively refugial German population to five populations from Sweden, Iceland, Russia, the United States, and Canada, estimating divergence times and population size differences between the German and non-German populations. The non-German populations had from ≈7- to 18-fold smaller estimated effective population sizes (Ne) than German populations, consistent with bottlenecks during colonization from Central European refugia (25).

Although much is known about the molecular population genetics of A. lyrata, very little is known about the population genetics of TEs in this or any other plant. Sampling five of the same populations studied by Ross-Ibarra et al. (24), we use transposon-insertion display (hereafter referred to as TE display) (26, 27) to generate TE polymorphism datasets for members of six TE families. With this large dataset, we exploit the inferred demographic history of A. lyrata to characterize the evolutionary forces that act on TEs at the population level.

Results

TE-Display Data and Population Frequencies.

We performed TE display in members of six TE families: Gypsy-like (Gypsy) class I LTR-retrotransposons; SINE-like I (SINE) and LINE-like (LINE) class I non-LTR-retroelements; and Ac-like III (Ac), CACTA-like (CACTA), and Tourist-like MITE (MITE) class II DNA elements. TE display was applied to samples of 9 to 12 individuals from each of four natural populations of A. lyrata: the putatively refugial German population, the colonized Swedish and Russian populations, and a combined North American sample from Canadian and U.S. populations (Table 1).

Table 1.

Descriptive statistics for TE bands

TE family     Mean polymorphic band frequency   var(f)*   S†    Sx‡    Sf§
Germany (n = 11)
    Gypsy     0.36                              0.11      17    9.0    3
    LINE      0.36                              0.07      20    10.7   0
    SINE      0.35                              0.08      30    11.7   2
    Ac        0.35                              0.06      25    8.3    2
    CACTA     0.26                              0.05      17    12.7   0
    MITE      0.23                              0.03      48    29.3   1
North America (n = 12)
    Gypsy     0.48                              0.12      13    5      5
    LINE      0.31                              0.07      20    8      4
    SINE      0.51                              0.10      25    4      5
    Ac        0.32                              0.07      37    19     1
    CACTA     0.35                              0.10      12    7      0
    MITE      0.27                              0.04      22    8      4
Russia (n = 12)
    Gypsy     0.47                              0.09      12    3      6
    LINE      0.25                              0.04      25    15     4
    SINE      0.41                              0.10      19    1      4
    Ac        0.45                              0.07      22    9      2
    CACTA     0.35                              0.08      10    7      0
    MITE      0.47                              0.11      30    11     1
Sweden (n = 9)
    Gypsy     0.49                              0.11      11    4      2
    LINE      0.43                              0.07      14    5      1
    SINE      0.43                              0.06      21    4      7
    Ac        0.46                              0.07      30    13     2
    CACTA     0.30                              0.04      12    7      1
    MITE      0.44                              0.10      33    10     5

*Variance of polymorphic band frequencies.

†Number of observed TE bands, ignoring species-wide fixed bands.

‡Number of unique TE bands in a pairwise comparison between Germany and each other population. For Germany, a mean from each pairwise comparison is shown.

§Numbers of within-population fixed TE bands, ignoring species-wide fixed bands.

Fig. 1 graphically represents the Ac diversity data; analogous figures for the other five TE families are available [supporting information (SI) Fig. S1]. The TE band data yield three initial observations. First, an appreciable proportion of TE bands are fixed within population samples, but fixed bands make up a smaller proportion of total diversity in the German population (Table 1). Second, each polymorphic TE band is found, on average, in multiple individuals. For example, estimates of the mean within-population band frequency (f̄w) for Ac range from 0.32 in the North American sample to 0.46 in the Swedish population, with similar ranges of f̄w for the other five element families (Table 1). Note also that f̄w and the variance of fw are often lowest in the German population (Table 1). Finally, although most bands are found in multiple populations, each TE family yields unique bands in every population. For example, 16 of 54 (30%) observed Ac bands are unique to one of the four population samples, with just two of these specific to the German sample (Fig. 1).

Fig. 1.


A. lyrata Ac-like diversity data. (A) A plot of TE-display data. A colored cell represents the presence of a TE; a white cell is a lack of TE detection. Each column represents a TE band, and each row represents an individual. Colors show population of origin. (B) TE BFS for observed data (circles) and simulated data (bars and vertical black lines). The bars represent the 95% credible intervals, the white horizontal lines in the bars are the medians, and the vertical black lines show the full ranges of the simulations.

Population Differentiation.

To investigate the extent of population differentiation, we applied an analysis of molecular variance (AMOVA) (28) to data from each TE family. Permutation tests revealed significant population differentiation (as measured by the ΦPT statistic) for each pairwise population comparison for all TE families (Fig. S2) (ΦPT > 0 at P < 0.01). Comparisons that included the German sample had lower ΦPT values on average (Fig. S2). Additionally, using the Structure program, we performed analyses using all of the TE-display data as a single dataset and assuming the TE bands were unlinked (29). A model of K = 4 clusters yielded the highest likelihood, with clear separation of individuals by geographic origin (Fig. S2).

Maximum Likelihood Estimation of Selection.

To infer the strength of selective forces acting on TE insertions, we applied a modification of the diffusion-approximation approach of Petrov et al. (6), correcting for our ascertainment scheme and assuming Hardy–Weinberg equilibrium (HWE) (see SI Methods). We obtained the maximum likelihood estimate (MLE) of s for each TE family in each population and calculated the estimated Nes (Neŝ) for each (Table S1). Eight of twenty-four Neŝ values have an absolute value <1, and two-thirds of the point estimates are positive. However, only 3 of 24 have confidence intervals that do not overlap zero, and only one of these is negative (MITEs in Germany). Importantly, the sign and magnitude of Neŝ vary by population: When data from all elements are combined, only the German sample yields a negative Nes estimate [Neŝ = −0.612; 95% C.I. (−1.360, 0.289)], whereas the other three bottlenecked populations yield positive values [North America = 0.558 (−0.491, 2.850); Russia = 0.720 (−0.417, 3.612); and Sweden = 1.662 (0.072, >6.0)]. These observations raise the possibility that Neŝ values reflect properties of populations as much as properties of selection on TEs.

Population Bottleneck TE Dynamics.

To assess the relative contributions of transposition, selection, and demography on patterns of TE diversity, for each TE family we compared the German population (as a reference) to each of the other populations in turn. We focused on three summary statistics of the data (Fig. 2): the total number of bands in the two populations (Stot), the number of unique bands in the bottlenecked population (Sxb), and the total number of bands in the bottlenecked population (Sb). Of particular note is the fact that Sxb was higher in some non-German populations than one might intuitively expect in bottlenecked populations; for example, Sxb = 19 for Ac elements in the bottlenecked North American population compared with, on average, approximately eight Ac bands unique to Germany in pairwise comparisons (Table 1).

Fig. 2.


Hypothetical genealogy to illustrate the simulation of TE distributions under a neutral bottleneck model. Black circles represent transposition events.

We compared Sb and Sxb to demographic expectations by using simulations of models that were inferred from silent nucleotide polymorphisms (24) (Fig. 2) (see Methods). These simulations were conditioned on Stot and assumed a constant (but unknown) transposition rate. We found significantly higher pairwise Sxb than expected in nearly a third (5 of 18) of our comparisons (North America: Ac, CACTA, LINE; Russia: LINE; Sweden: Ac; P < 0.05) (data not shown). Three scenarios could explain this observation. First, the demographic model used may be incorrect. This possibility seems unlikely, as the same model fits SNP data from the same populations well (24). Second, transposition could lead to higher Sxb than demographic expectations. However, the number of total and unique TE bands in Germany fit simulations well for every pairwise comparison (data not shown). Thus, this explanation only makes sense if transposition rates are substantially higher in the bottlenecked populations relative to Germany. Finally, an excess of Sxb could be explained by purifying selection removing TEs in the German population, thus increasing the number of TEs appearing as unique to other populations.

To investigate this last possibility, we performed simulations across a grid of θ values for the German population (θ = 4Neμ, where μ is the population transposition rate). We simulated data from models with θ values decreasing from 100% to 10% of the original value for Germany (24) (Fig. 3). In this context, decreasing θ serves as a proxy for weak selection (30). For 14 of 18 combinations (i.e., six TE families × three comparison populations), the observed data better fit a model with decreased θ (Fig. 3). Data from Ac, LINE, and CACTA elements were particularly compelling, with at least 10-fold estimated decreases of θ in Germany. These results suggest that purifying selection acts on TEs in the German population relative to a null demographic model fitted with presumably neutral SNP polymorphisms.

Fig. 3.


Relative probabilities of the observed data for different values of θ in Germany, scaled to the original demographic model (24).

Previous studies have noted that bottleneck events decrease Ne, thereby decreasing the efficacy of selection and slowing the rate of TE loss (31). If purifying selection is weakened due to bottleneck-related reductions in Ne then we predict that the frequency spectra of TE bands in bottlenecked populations should be consistent with neutrality under the inferred demographic history. To test this hypothesis, we again used the demographic models of Ross-Ibarra et al. (24), simulating data under the most likely value of θ (Fig. 3) to generate a posterior probability distribution of band-frequency spectra (BFS) for neutral multilocus TE-display datasets (see Methods). The observed BFS from our TE-display data fit the neutral expectations from these simulations quite well (Fig. 1, Fig. S1, and Table S2). Across all families and all populations, just 3.0% (8 of 265) of observed frequencies fell outside the 95% confidence intervals, and none remained significant after a per-TE family Bonferroni correction (Fig. 1, Fig. S1, and Table S2).
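The comparison of observed band counts with simulated intervals can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, each simulated BFS is assumed to be a list of band counts per frequency class, and the equal-tailed nearest-rank interval is an assumed convention that the text does not specify.

```python
import math

def credible_interval(values, level=0.95):
    """Equal-tailed interval from simulated values (nearest-rank rule; an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper

def classes_outside_interval(observed_bfs, simulated_bfs, level=0.95):
    """Return the 1-based frequency classes whose observed count lies outside the interval."""
    flagged = []
    for k, observed in enumerate(observed_bfs):
        lower, upper = credible_interval([sim[k] for sim in simulated_bfs], level)
        if observed < lower or observed > upper:
            flagged.append(k + 1)
    return flagged

if __name__ == "__main__":
    # Toy input: 101 simulated spectra with two frequency classes each.
    simulated = [[i, 5] for i in range(101)]
    print(classes_outside_interval([150, 5], simulated))
```

A Bonferroni-style correction for m frequency classes could be approximated in this sketch by passing level = 1 - 0.05 / m.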

In an effort to bolster statistical power, we pooled data across TE families and compared the observed (pooled) BFS to its expectation based on simulation. To make these comparisons, for each population we first calculated the Mann–Whitney U between the observed pooled BFS and 1,000 simulated BFS, retaining the mean of these 1,000 U statistics. We then repeated this procedure for 1,000 simulated BFS, comparing each with 1,000 simulated data sets and retaining the mean U statistic. These 1,000 mean values represent variation in the pooled BFS expected under the demographic model and form the basis for evaluating the fit of the observed BFS to demographic expectations (Fig. 4). The pooled BFS did not differ from demographic expectations for the three non-German populations (Fig. 4) (North America: P = 0.346; Russia: P = 0.530; Sweden: P = 0.708). However, the pooled BFS in Germany was different from simulated data (Fig. 4) (P = 0.050), with an excess of observed low frequency variants. This last observation is consistent with weak purifying selection skewing the distribution of TEs toward rare variants in the German population.
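The pooled comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each dataset is assumed to be a list of per-band frequencies, the helper names are hypothetical, and the two-sided empirical P value is an assumed convention because the text does not state which tail was used.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y, counting ties as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mean_u(sample, simulated_sets):
    """Mean U between one sample and each simulated dataset."""
    return sum(mann_whitney_u(sample, s) for s in simulated_sets) / len(simulated_sets)

def null_mean_us(simulated_sets):
    """Mean U for each simulated dataset against all the others (the null distribution)."""
    means = []
    for i, sample in enumerate(simulated_sets):
        others = simulated_sets[:i] + simulated_sets[i + 1:]
        means.append(mean_u(sample, others))
    return means

def empirical_p(observed_mean_u, null_values):
    """Two-sided empirical P: share of null values at least as far from the null mean."""
    center = sum(null_values) / len(null_values)
    deviation = abs(observed_mean_u - center)
    return sum(abs(u - center) >= deviation for u in null_values) / len(null_values)

if __name__ == "__main__":
    simulated = [[0.1, 0.2, 0.5], [0.1, 0.3, 0.4], [0.2, 0.2, 0.6]]
    observed = [0.1, 0.1, 0.2]
    print(empirical_p(mean_u(observed, simulated), null_mean_us(simulated)))
```

In the study each mean is taken over 1,000 simulated BFS; the three toy datasets here only demonstrate the mechanics.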

Fig. 4.


Mann–Whitney U statistics of pooled TE-band frequency spectra. Histograms depict the distribution of 1,000 mean U statistics for data simulated under the neutral demographic model for each population, and arrows point to the mean U value obtained by comparing observed data with the model. The observed value in Germany is P = 0.05.

Direct Comparison of TEs and Silent SNPs.

If purifying selection against TEs is relaxed in the bottlenecked populations, comparisons of the ratio of TE diversity to diversity at neutral SNPs should reveal differences among the populations, with higher ratios in populations with relaxed purifying selection. Indeed, this is exactly the pattern observed: In Germany, the ratio of counts of polymorphic TEs to polymorphic derived silent SNPs is 0.28, but the three other populations have ratios of 0.86, 0.75, and 0.94 (Table S3). The difference is significant for all pairwise comparisons with Germany (Fisher's Exact Test, P < 0.001). A similar comparison of the mean number of TE bands per individual to the mean number of polymorphic SNPs per diploid genome produces identical results, in that Germany has a lower ratio (0.15) than North America (0.45), Russia (0.45), and Sweden (0.45) (Fisher's Exact Test, P < 0.001 for all pairwise comparisons to Germany) (Table S3). These observations suggest that patterns of TE polymorphisms differ across populations relative to a neutral marker (silent SNPs).

Discussion

TEs represent the majority of plant genomic DNAs and undoubtedly contribute to genomic flux. Molecular evolutionary analyses suggest, for example, that many plant TEs have proliferated within the recent past (32–35) and that proliferation is counteracted by TE deletion (36, 37). Nonetheless, our understanding of TE evolution in plant genomes is woefully incomplete, in part because there have been few population genetic studies of plant TEs. Without population genetic information, one cannot infer the relative roles of transposition, natural selection, and genetic drift in TE accumulation.

Most population genetic analyses of TEs have assumed that TE population frequencies are governed by an equilibrium between selection and transposition (38). By using this assumption, negative selection has been found against TEs in both Drosophila (6, 39) and humans (40). However, recent simulation work strongly suggests that equilibrium is very unlikely under realistic conditions and that factors such as population-size variation can strongly affect TE dynamics (10). In plants, although several studies have used TE-display bands as genetic markers (37, 41, 42) or for phylogeographic analysis (43), the population genetic ramifications of TEs have largely been ignored. One notable exception inferred negative selection against TEs in A. lyrata under equilibrium conditions (44). Another recent study used A. thaliana TE polymorphism data to conclude that longer Helitrons are less likely to persist in the genome (32). Given the rarity of plant TE population genetic data, our population dataset of 6 TE families based on 44 individuals from 4 natural populations is to our knowledge unprecedented.

Individual TE bands are found in intermediate-to-high population frequencies in A. lyrata. This observation superficially suggests that the TEs in our sample have not been subjected to strong purifying selection. Assuming TE insertions are at HWE, the mean TE allele frequency was 0.24 across TE families and populations, and frequencies ranged from 0.13 to 0.35. Our allele frequency estimates match well with previous work on the Ac family of elements in A. lyrata (44), but are somewhat higher than those estimated in Drosophila. Mean allele frequencies of non-LTR elements in Drosophila are as high as 16% (6) (compared with 22% for SINEs and LINEs in our data), and even lower for LTR elements (compared with 26% for Gypsy in our data) (39). Mean frequencies of polymorphic TEs are higher in other systems, however, including Ta1 in humans [36% (40)], class I TEs in pufferfish [43% (45)], and nonautonomous Helitrons in A. thaliana [60% (32)].
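The HWE conversion from band frequency to allele frequency can be made concrete with a short Python sketch. TE-display bands are dominant markers, so a band is absent only in individuals homozygous for the empty site; under HWE the band frequency f and the insertion allele frequency q then satisfy 1 − f = (1 − q)². The function name is hypothetical, and applying it to a family-level mean (0.36, the Gypsy value for Germany in Table 1) is for illustration only, because the mean of per-band allele frequencies is not the same as the allele frequency implied by the mean band frequency.

```python
import math

def allele_freq_from_band_freq(band_freq):
    """Insertion allele frequency implied by a dominant band frequency under HWE."""
    if not 0.0 <= band_freq <= 1.0:
        raise ValueError("band frequency must lie in [0, 1]")
    # A band is absent only in empty-site homozygotes: 1 - f = (1 - q)^2.
    return 1.0 - math.sqrt(1.0 - band_freq)

if __name__ == "__main__":
    # A band seen in 36% of individuals implies an allele frequency near 0.2.
    print(round(allele_freq_from_band_freq(0.36), 3))
```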

Evolutionary Forces Governing TE Polymorphism.

Given the demographic history of A. lyrata, what forces govern TE diversity and polymorphism? Transposition may have occurred during the history of our sample, based on two lines of evidence. The first is the simple observation that every population sample has unique bands relative to the four other population samples. Given pairwise divergence time estimates (24) and assuming unique bands represent transposition events, we can use the average number of unique bands per individual (pooled across TE families) to calculate a per locus estimate of the transposition rate for each population. These estimates yield a mean rate of 2 × 10⁻⁵ bands per generation per locus, which is similar to estimates in ref. 46 or less than those in refs. 47 and 48. Second, transposition is biologically plausible, because the elements studied show evidence of activity in A. thaliana. For example, CACTA TEs are active in methylation-deficient mutants of A. thaliana (48), some families of Ac-like elements show evidence of recent activity (47, 49), and both SINE-like and Gypsy-like elements have been inferred to be active within A. thaliana's recent past (35, 50). In addition, SINE, Ac-like, CACTA, and MITE TEs are presumed to have been active recently because they contain ORFs (19) or vary in location among A. thaliana ecotypes (51).

Interpreting the selective forces acting on TEs is more difficult. Estimates of Nes were not large; 88% (21 of 24) had confidence intervals encompassing zero, suggesting the TE bands in our sample are subjected to at most weak selection. Nonetheless, point estimates of Nes tended to be positive, with two values significantly >0 (Table S1). Taken at face value, positive Neŝ values suggest positive selection on TEs. We believe such a conclusion would be in error, however, because the diffusion models make demographic assumptions, such as large, constant population sizes, which may not apply to our A. lyrata populations. Supporting this view is the fact that bottlenecked populations (North America, Sweden, and Russia) yield positive overall Neŝ values, whereas Germany, which most closely represents a neutral equilibrium population (24), yields the only overall negative Nes estimate (−0.612). We conclude, then, that the Neŝ values are generally consistent with weak selection (i.e., |Nes| < 1), but caution that our estimates of selection, and perhaps those of previous studies (6, 40, 45), should be viewed with healthy skepticism because they do not incorporate demographic complexities.

To further investigate the possibility of selection against TEs while recognizing demographic history, we compared the observed TE BFS to simulated BFS based on demographic models fitted to DNA sequence diversity data (24). We reasoned that purifying selection against TE insertions should lead to an overabundance of observed low-frequency TE bands relative to the expectation based on the neutral demographic models. By pooling data, we were able to show that the German population has an excess of low-frequency TE bands, as might be expected under weak purifying selection, an observation consistent with low but negative estimates of Nes for this population and estimated values of θ (Fig. 3). In contrast, we were unable to clearly reject the hypothesis that the TE distributions result largely from demographic instead of selective processes in the three strongly bottlenecked populations.

Implications for Understanding the Forces Acting on TE Diversity.

Our data provide evidence for the geographic structure of A. lyrata populations and TE activity within and among populations. However, unlike other studies (6, 44, 51–54), we did not uncover clear evidence for selection against TEs in the non-German populations. Given the lack of obvious evidence for selection, can we discount selection entirely? The answer is no for three reasons. First, although we are unaware of any other empirical study that has explicitly modeled demographic history in TE population genetics, our statistical power to infer weak selection against a demographic background may be low. Second, previous studies have demonstrated convincingly that there is selection against TEs. For example, a recent study of rice TEs found that insertions into gene regions are lost rapidly because of strong selection against the interruption of gene function (55). These highly deleterious events are not expected to rise to appreciable population frequencies and are thus unlikely to have been included in our sample.

Third, there is the intriguing possibility that demography interacts with selection to shape the frequency and distribution of TEs. The pooled TEs in the German population have a negative Neŝ, but Neŝ values are slightly positive in the bottlenecked populations. To the extent that these Neŝ values are reasonable, they suggest that TE insertions are, on average, subject to nearly neutral population dynamics (56). The efficacy of selection is a function of Ne; if Ne changes such that | Nes| ≪ 1.0, drift can overcome selection. Our Nes estimates (Table S1), along with our observation that the BFS of TEs in bottlenecked populations are consistent with neutral demographic processes, are consistent with reduced efficacy of selection in bottlenecked populations. Ratios of polymorphic TEs and silent SNPs further suggest that purifying selection on TEs is relaxed in the bottlenecked populations relative to the German population.

If this conjecture is true, it has a profound impact on our understanding of the evolution of plant genomes. It suggests that genomic flux in TEs occurs at a rate that is influenced by demographic history. All other things being equal, plant species with small population sizes should purge TE insertions less efficiently and hence accrue DNA more rapidly. The idea that genomic complexity is related to population size is not new (57) and has been cited as the cause of the accumulation of repetitive element insertions in the human genome (31). Thus far, however, there has been no compelling evidence for this effect within and between plant populations or between plant evolutionary lineages. Yet, given the wide range of differences in Ne among plants because of breeding system and life history, and also given evidence for strong demographic effects during processes like domestication (58, 59), our results raise the possibility that the differential expansion of TEs among plant lineages could be fundamentally a function of demographic history.

Materials and Methods

Sampling and Plant Growth.

Five populations of A. lyrata were sampled for this study: Plech, Germany (sampled by M. Clauss, Max Planck Institute of Chemical Ecology, Jena, Germany); Karhumäki, Russia (courtesy of O. Savolainen, University of Oulu, Oulu, Finland); Stubbsand, Sweden (also courtesy of O. Savolainen); Indiana Dunes, U.S. and Ontario, Canada (both provided by B. Mable, University of Glasgow, Glasgow, UK). Plants were grown at 22°C with a 16-h day for 8 weeks. DNA was extracted by using Qiagen's DNeasy Plant Mini kit.

TE Display.

We followed Le and Bureau (27) in our choice of TE-display adapters and adapter primers. TE-specific primers for the Ac and CACTA families were from previous studies (27, 44). Additional nested TE-specific primers were chosen by (i) designing a large number of primers for known TE sequences, (ii) performing virtual TE display in the A. thaliana genome, and (iii) screening primers in both A. thaliana ecotype Columbia and A. lyrata. Digestion, ligation, and PCR followed the methods of Le and Bureau (27) with slight modifications (SI Methods). Bands were sized with fragment analysis by using the software GeneMapper 4.0 on an ABI 3100 (Applied Biosystems) using a ROX-labeled MapMarker 1000 sizing standard (BioVentures) to score bands between 60- and 1,000-bp long. Preselective and selective PCR was repeated three times for each individual. Data were scored manually; a peak was scored as a TE band if it was the same base-pair size in two or more replicates. We examined the specificity and repeatability of TE display by first assessing error rates in three biological replicates of A. thaliana Col-0. We estimated a mean error rate of the PCR and fragment analysis at <4% across all TE families and all TE-display bands. A sample of 16 bands was cloned by using a pGEM T Easy vector (Promega), sequenced on an ABI 3130 to confirm their identity by using BLASTn (58), and submitted to GenBank. Fifteen (94%) of the bands had homology (at an e-value <1E-5) to the correct TE family; the remaining sequence matched an unannotated A. thaliana centromeric region.

Population Structure.

AMOVA (28) was performed on TE-display band data with GenAlEx 6 (60). For the program Structure (29), the data were treated as a single population of unlinked loci. We performed Structure on the band data with 10,000 burn-in runs followed by 100,000 steps, without using population source information and assuming the possibility of admixture. Results were visualized with the program DISTRUCT (61).

Simulation of the Neutral Demographic Model.

We used the demographic models inferred by Ross-Ibarra et al. (24), combining models for U.S. and Canada for the North American simulations. We simulated TE population genetic data with the program ms (62), drawing parameter values from the posterior distributions of the inferred models. We conditioned simulations on the total number of occupied bands Stot observed in the two populations being compared; such conditioning requires only that the (unknown) rate of transposition remain constant across the genealogy. We assumed TE sites are unlinked and for each site simulated 2n alleles, where n is the number of individuals in the sample, combining alleles into n dominant genotypes for comparison with TE-display data.
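The step that turns 2n simulated alleles into n dominant genotypes can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, haplotypes are assumed to be lists of 0/1 values per TE site, and pairing consecutive haplotypes into individuals is an assumption that treats simulated chromosomes within a population as exchangeable.

```python
from collections import Counter

def dominant_genotypes(haplotypes):
    """Pair consecutive haploid rows into diploid presence/absence rows (band if either copy)."""
    if len(haplotypes) % 2:
        raise ValueError("expected an even number (2n) of haplotypes")
    return [
        [int(a or b) for a, b in zip(haplotypes[i], haplotypes[i + 1])]
        for i in range(0, len(haplotypes), 2)
    ]

def band_frequency_spectrum(genotypes):
    """Number of sites whose band is present in exactly k individuals, for k = 1..n."""
    n = len(genotypes)
    counts = Counter(sum(column) for column in zip(*genotypes))
    # Sites with no band in the sample (k = 0) are dropped; fixed bands (k = n) are kept.
    return [counts.get(k, 0) for k in range(1, n + 1)]

if __name__ == "__main__":
    haplotypes = [[1, 0, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0]]
    genotypes = dominant_genotypes(haplotypes)
    print(genotypes)
    print(band_frequency_spectrum(genotypes))
```

Keeping the k = n class mirrors the use of a BFS that includes fixed bands.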

We performed two sets of simulations; for both, data were simulated for all six TE families in each of three pairwise population comparisons (contrasting Germany to Russia, Sweden, or North America, respectively). In the first set, we conditioned on Stot and performed 100,000 multilocus simulations, comparing Sxb in each population with the simulated value. For the second set of simulations, we varied θ in the German and ancestral populations across a grid of θ values decreasing from 100% to 10% compared with values specified in the original model. We accepted simulations that matched the observed Sxb, recording acceptance rates and continuing until reaching 5,000 acceptances. The relative probability of each point on the grid was estimated from the acceptance rates, and the most probable value was chosen for further use. For each of the 5,000 simulations from the most probable model, we then calculated the unfolded BFS, including fixed bands, thus generating a posterior distribution and 95% credible intervals of the BFS for each TE/population combination. The German BFS were generated by pooling the Germany simulated data from each of the three pairwise comparisons.
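The acceptance-rate logic of the second set of simulations can be sketched in Python as follows. This is an illustration only: the coalescent simulations in the study were run with ms, whereas `toy_simulator` below is a made-up stand-in, and all function names, the `max_tries` cap, and the choice of the 100% grid point as the reference are assumptions.

```python
import random

def acceptance_rate(simulate_sxb, observed_sxb, target_accepts, rng, max_tries=1_000_000):
    """Simulate until target_accepts draws match observed_sxb; return accepts / tries."""
    accepts = tries = 0
    while accepts < target_accepts and tries < max_tries:
        tries += 1
        if simulate_sxb(rng) == observed_sxb:
            accepts += 1
    return accepts / tries if tries else 0.0

def relative_probabilities(rates, reference):
    """Scale each grid point's acceptance rate by the rate at the reference grid point."""
    if rates[reference] == 0:
        raise ValueError("reference grid point was never accepted")
    return {point: rate / rates[reference] for point, rate in rates.items()}

def toy_simulator(theta_scale):
    """Hypothetical stand-in for a coalescent simulation: count of unique bands."""
    def simulate(rng):
        # Arbitrary toy rule: lower theta in the reference population means more unique bands.
        return sum(rng.random() < 0.5 / (1.0 + theta_scale) for _ in range(20))
    return simulate

if __name__ == "__main__":
    rng = random.Random(1)
    grid = [1.0, 0.5, 0.1]
    rates = {t: acceptance_rate(toy_simulator(t), 6, 200, rng) for t in grid}
    print(relative_probabilities(rates, reference=1.0))
```

The grid point with the highest relative probability would then supply the simulations used to build the BFS.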

SNP-TE Comparisons.

For comparisons to numbers of polymorphic TEs in each population, we counted SNPs in 77 sequenced A. lyrata loci (24) and determined their ancestral state with an A. thaliana outgroup.

SI.

A schematic representation of the bottleneck model used for parameter estimation is available in Fig. S3, and a complete list of oligonucleotide sequences used in this work can be found in Table S4.


Acknowledgments.

We thank N. Komarova, S. Wright, J. Hollister, and K. Thornton for assistance and discussion; R. Gaut for technical assistance; L. DeRose-Wilson for discussion; E. Thorhallsdottir, M. Clauss, O. Savolainen, and B. Mable for seed material; and three anonymous reviewers for constructive comments. This work was supported by National Science Foundation Grant DEB-0426166 (to B.S.G.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. EU558519–EU558534).

This article contains supporting information online at www.pnas.org/cgi/content/full/0804671105/DCSupplemental.

References

  • 1. Bennetzen JL. Transposable elements, gene creation and genome rearrangement in flowering plants. Curr Opin Genet Dev. 2005;15:621–627. doi: 10.1016/j.gde.2005.09.010.
  • 2. Messing J, et al. Sequence composition and genome organization of maize. Proc Natl Acad Sci USA. 2004;101:14349–14354. doi: 10.1073/pnas.0406163101.
  • 3. SanMiguel P, et al. Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996;274:765–768. doi: 10.1126/science.274.5288.765.
  • 4. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815. doi: 10.1038/35048692.
  • 5. Zonneveld BJ, Leitch IJ, Bennett MD. First nuclear DNA amounts in more than 300 angiosperms. Ann Bot. 2005;96:229–244. doi: 10.1093/aob/mci170.
  • 6. Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE. Size matters: Non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol Biol Evol. 2003;20:880–892. doi: 10.1093/molbev/msg102.
  • 7. Montgomery E, Charlesworth B, Langley CH. A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet Res. 1987;49:31–41. doi: 10.1017/s0016672300026707.
  • 8. Le Rouzic A, Deceliere G. Models of the population genetics of transposable elements. Genet Res. 2005;85:171–181. doi: 10.1017/S0016672305007585.
  • 9. Charlesworth B, Charlesworth D. The population dynamics of transposable elements. Genet Res. 1983;42:1–27.
  • 10. Le Rouzic A, Boutin TS, Capy P. Long-term evolution of transposable elements. Proc Natl Acad Sci USA. 2007;104:19375–19380. doi: 10.1073/pnas.0705238104.
  • 11. Picot S, et al. The mariner transposable element in natural populations of Drosophila simulans. Heredity. 2008;101:53–59. doi: 10.1038/hdy.2008.27.
  • 12. Tenaillon MI, U'Ren J, Tenaillon O, Gaut BS. Selection versus demography: A multilocus investigation of the domestication process in maize. Mol Biol Evol. 2004;21:1214–1225. doi: 10.1093/molbev/msh102.
  • 13. Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 2005;15:790–799. doi: 10.1101/gr.3541005.
  • 14. Thornton K, Andolfatto P. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics. 2006;172:1607–1619. doi: 10.1534/genetics.105.048223.
  • 15. Voight BF, et al. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci USA. 2005;102:18508–18513. doi: 10.1073/pnas.0507325102.
  • 16. Wright SI, et al. The effects of artificial selection on the maize genome. Science. 2005;308:1310–1314. doi: 10.1126/science.1107891.
  • 17. Deceliere G, Charles S, Biemont C. The dynamics of transposable elements in structured populations. Genetics. 2005;169:467–474. doi: 10.1534/genetics.104.032243.
  • 18. Savolainen O, Langley CH, Lazzaro BP, Fréville H. Contrasting patterns of nucleotide polymorphism at the alcohol dehydrogenase locus in the outcrossing Arabidopsis lyrata and the selfing Arabidopsis thaliana. Mol Biol Evol. 2000;17:645–655. doi: 10.1093/oxfordjournals.molbev.a026343.
  • 19. Le QH, Wright S, Yu ZH, Bureau T. Transposon diversity in Arabidopsis thaliana. Proc Natl Acad Sci USA. 2000;97:7376–7381. doi: 10.1073/pnas.97.13.7376.
  • 20. Koch MA, Haubold B, Mitchell-Olds T. Comparative evolutionary analysis of the chalcone synthase and alcohol dehydrogenase loci among different lineages of Arabidopsis, Arabis and related genera (Brassicaceae). Mol Biol Evol. 2000;17:1483–1498. doi: 10.1093/oxfordjournals.molbev.a026248.
  • 21. Mitchell-Olds T. Arabidopsis thaliana and its wild relatives: A model system for ecology and evolution. Trends Ecol Evol. 2001;16:693–700.
  • 22. Clauss MJ, Mitchell-Olds T. Population genetic structure of Arabidopsis lyrata in Europe. Mol Ecol. 2006;15:2753–2766. doi: 10.1111/j.1365-294X.2006.02973.x.
  • 23. Koch MA, Matschinger M. Evolution and genetic differentiation among relatives of Arabidopsis thaliana. Proc Natl Acad Sci USA. 2007;104:6272–6277. doi: 10.1073/pnas.0701338104.
  • 24. Ross-Ibarra J, et al. Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata. PLoS ONE. 2008;3:e2411. doi: 10.1371/journal.pone.0002411.
  • 25. Muller MH, Leppala J, Savolainen O. Genome-wide effects of postglacial colonization in Arabidopsis lyrata. Heredity. 2008;100:47–58. doi: 10.1038/sj.hdy.6801057.
  • 26. Korswagen HC, Durbin RM, Smits MT, Plasterk RH. Transposon Tc1-derived, sequence-tagged sites in Caenorhabditis elegans as markers for gene mapping. Proc Natl Acad Sci USA. 1996;93:14680–14685. doi: 10.1073/pnas.93.25.14680.
  • 27. Le QH, Bureau T. Prediction and quality assessment of transposon insertion display data. Biotechniques. 2004;36:222–228. doi: 10.2144/04362BM04.
  • 28. Excoffier L, Smouse P, Quattro J. Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics. 1992;131:479–491. doi: 10.1093/genetics/131.2.479.
  • 29. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
  • 30. Charlesworth D, Charlesworth B, Morgan MT. The pattern of neutral molecular variation under the background selection model. Genetics. 1995;141:1619–1632. doi: 10.1093/genetics/141.4.1619.
  • 31. Gherman A, et al. Population bottlenecks as a potential major shaping force of human genome architecture. PLoS Genet. 2007;3:e119. doi: 10.1371/journal.pgen.0030119.
  • 32. Hollister JD, Gaut BS. Population and evolutionary dynamics of helitron transposable elements in Arabidopsis thaliana. Mol Biol Evol. 2007;24:2515–2524. doi: 10.1093/molbev/msm197.
  • 33. Ma J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004;14:860–869. doi: 10.1101/gr.1466204.
  • 34. SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–45. doi: 10.1038/1695.
  • 35. Pereira V. Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol. 2004;5:R79. doi: 10.1186/gb-2004-5-10-r79.
  • 36. Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12:1075–1079. doi: 10.1101/gr.132102.
  • 37. Vitte C, Panaud O, Quesneville H. LTR retrotransposons in rice (Oryza sativa, L.): Recent burst amplifications followed by rapid DNA loss. BMC Genomics. 2007;8:218. doi: 10.1186/1471-2164-8-218.
  • 38. Bergman CM, Bensasson D. Recent LTR retrotransposon insertion contrasts with waves of non-LTR insertion since speciation in Drosophila melanogaster. Proc Natl Acad Sci USA. 2007;104:11340–11345. doi: 10.1073/pnas.0702552104.
  • 39. Charlesworth B, Langley CH. The population genetics of Drosophila transposable elements. Ann Rev Genet. 1989;23:251–287. doi: 10.1146/annurev.ge.23.120189.001343.
  • 40. Boissinot S, Davis J, Entezam A, Petrov D, Furano AV. Fitness cost of LINE-1 (L1) activity in humans. Proc Natl Acad Sci USA. 2006;103:9590–9594. doi: 10.1073/pnas.0603334103.
  • 41. De Keukeleire P, et al. Analysis by transposon display of the behavior of the dTph1 element family during ontogeny and inbreeding of Petunia hybrida. Mol Genet Genomics. 2001;265:72–81. doi: 10.1007/s004380000390.
  • 42. Kwon SJ, Park KC, Kim JH, Lee JK, Kim NS. Rim 2/Hipa CACTA transposon display: A new genetic marker technique in Oryza species. BMC Genet. 2005;6:15. doi: 10.1186/1471-2156-6-15.
  • 43. Cornman RS, Arnold ML. Phylogeography of Iris missouriensis (Iridaceae) based on nuclear and chloroplast markers. Mol Ecol. 2007;16:4585–4598. doi: 10.1111/j.1365-294X.2007.03525.x.
  • 44. Wright SI, Quang HL, Schoen DJ, Bureau TE. Population dynamics of an Ac-like transposable element in self- and cross-pollinating Arabidopsis. Genetics. 2001;158:1279–1288. doi: 10.1093/genetics/158.3.1279.
  • 45. Neafsey DE, Blumenstiel JP, Hartl DL. Different regulatory mechanisms underlie similar transposable element profiles in pufferfish and fruitflies. Mol Biol Evol. 2004;21:2310–2318. doi: 10.1093/molbev/msh243.
  • 46. Nuzhdin SV. Sure facts, speculations, and open questions about the evolution of transposable element copy number. Genetica. 1999;107:129–137.
  • 47. Frank MJ, Liu D, Tsay YF, Ustach C, Crawford NM. Tag1 is an autonomous transposable element that shows somatic excision in both Arabidopsis and tobacco. Plant Cell. 1997;9:1745–1756. doi: 10.1105/tpc.9.10.1745.
  • 48. Miura A, et al. Mobilization of transposons by a mutation abolishing full DNA methylation in Arabidopsis. Nature. 2001;411:212–214. doi: 10.1038/35075612.
  • 49. Tsay YF, Frank MJ, Page T, Dean C, Crawford NM. Identification of a mobile endogenous transposon in Arabidopsis thaliana. Science. 1993;260:342–344. doi: 10.1126/science.8385803.
  • 50. Lenoir A, et al. The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol. 2001;18:2315–2322. doi: 10.1093/oxfordjournals.molbev.a003778.
  • 51. Casacuberta E, Casacuberta JM, Puigdomenech P, Monfort A. Presence of miniature inverted-repeat transposable elements (MITEs) in the genome of Arabidopsis thaliana: Characterisation of the emigrant family of elements. Plant J. 1998;16:79–85. doi: 10.1046/j.1365-313x.1998.00267.x.
  • 52. Rizzon C, Marais G, Gouy M, Biemont C. Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res. 2002;12:400–407. doi: 10.1101/gr.210802.
  • 53. Hoogland C, Biemont C. Chromosomal distribution of transposable elements in Drosophila melanogaster: Test of the ectopic recombination model for maintenance of insertion site number. Genetics. 1996;144:197–204. doi: 10.1093/genetics/144.1.197.
  • 54. Wright SI, Agrawal N, Bureau TE. Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res. 2003;13:1897–1903. doi: 10.1101/gr.1281503.
  • 55. Naito K, et al. Dramatic amplification of a rice transposable element during recent domestication. Proc Natl Acad Sci USA. 2006;103:17620–17625. doi: 10.1073/pnas.0605421103.
  • 56. Ohta T. The nearly neutral theory of molecular evolution. Ann Rev Syst Ecol. 1992;23:263–286.
  • 57. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302:1401–1404. doi: 10.1126/science.1089370.
  • 58. Doebley J. In: Isozymes in Plant Biology. Soltis D, Soltis P, editors. Portland, Oregon: Dioscorides Press; 1989. pp. 165–191.
  • 59. Eyre-Walker A, Gaut RL, Hilton H, Feldman DL, Gaut BS. Investigation of the bottleneck leading to the domestication of maize. Proc Natl Acad Sci USA. 1998;95:4441–4446. doi: 10.1073/pnas.95.8.4441.
  • 60. Peakall R, Smouse PE. GENALEX 6: Genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes. 2006;6:288–295. doi: 10.1111/j.1471-8286.2005.01155.x.
  • 61. Rosenberg NA. DISTRUCT: A program for the graphical display of population structure. Mol Ecol Notes. 2004;4:137–138.
  • 62. Hudson RR. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics. 2002;18:337–338. doi: 10.1093/bioinformatics/18.2.337.
