Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution

Ágnes Jónás; Thomas Taus; Carolin Kosiol; Christian Schlötterer; Andreas Futschik

doi:10.1534/genetics.116.191197

2016 Aug 19;204(2):723–735.


PMCID: PMC5068858 PMID: 27542959

Abstract

The effective population size ($N_{e}$) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term $N_{e}$. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to $N_{e}$. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: in addition to the sampling of individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of $N_{e}$, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate $N_{e}$ estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide $N_{e}$ estimates, we extend our method using a recursive partitioning approach to estimate $N_{e}$ locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their $N_{e}$ estimates. We present an application to Pool-seq data from experimental evolution with Drosophila and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.

Keywords: effective population size, genetic drift, Pool-seq, experimental evolution

During experimental evolution studies, populations are maintained under specific laboratory conditions (Kawecki et al. 2012; Long et al. 2015; Schlötterer et al. 2015). In sexually reproducing organisms, the census population size is typically kept fixed at fairly low numbers, rarely exceeding 2000 individuals. With such small population sizes, genetic drift causes stochastic fluctuations in allele frequencies. Under neutrality, the level of random frequency changes is determined by the effective population size ($N_{e}$) (Wright 1931). Furthermore, the efficacy of selection is influenced by $N_{e}$. For weakly selected alleles, the probability of fixation is directly proportional to the product of $N_{e}$ and the intensity of selection (Fisher 1930; Kimura 1964). As changes in allele frequency are greatly affected by the population size, it is fundamental to estimate $N_{e}$ accurately to understand molecular variation in experimental evolution studies.

Krimbas and Tsakas (1971) estimated $N_{e}$ using the standardized variance of allele frequency (F, see also Falconer and Mackay 1996) from longitudinal samples in natural populations of olive flies. As F was calculated from these samples, they accounted for the sampling variance that also contributed to the true allele frequency variance. This approach was further improved and used by several authors (Nei and Tajima 1981; Pollak 1983; Waples 1989; Jorde and Ryman 2007).

With the widespread availability of powerful computers, maximum-likelihood-based methods also became popular (Williamson and Slatkin 1999; Anderson et al. 2000; Wang 2001; Hui and Burt 2015) alongside the moment-based approaches discussed above. Although these methods show less bias than the moment-based approaches (Wang 2001), they are still computationally demanding, in particular for the large numbers of markers typically obtained with novel sequencing technologies (Foll et al. 2015).

Estimating $N_{e}$ with temporal methods requires samples collected at least at two time points. Alternative methods that use only a single time point are based on linkage disequilibrium (LD) (Hill 1981; Waples and Do 2008, 2010; Waples and England 2011), heterozygote excess (Pudovkin et al. 1996), molecular coancestry (Nomura 2008), sibship frequencies (Wang 2009, 2013), or combinations of summary statistics using approximate Bayesian computation (Tallmon et al. 2008). LD-based methods are widespread but require haplotype or unphased diploid genotype information, which limits their applicability.

Although the cost for sequencing has dropped considerably, the separate sequencing of thousands of individuals in replicate populations in experimental evolution studies is still out of reach. Sequencing samples in pools (Pool-seq) can provide a cost-effective alternative (Schlötterer et al. 2014). Pool-seq has also been shown to outperform individual-based sequencing in estimating allele frequencies and inferring population genetic parameters under several conditions (Futschik and Schlötterer 2010; Zhu et al. 2012; Gautier et al. 2013). For these reasons, Pool-seq has become the basis of many experimental evolution “evolve and resequence” (E&R) studies (Turner et al. 2011; Schlötterer et al. 2015). Following the emergence of E&R, many population genetic estimators have been adjusted to handle the properties of Pool-seq data (Futschik and Schlötterer 2010; Kofler et al. 2011a,b; Kolaczkowski et al. 2011; Boitard et al. 2013; Ferretti et al. 2013). To the best of our knowledge, no $N_{e}$ estimators have been developed so far that properly deal with Pool-seq data.

In this article, we present a novel temporal method to estimate $N_{e}$ from pooled samples. We show that previously proposed estimators can lead to substantial bias, as they neglect the variance component due to pooled sequencing. We introduce a new model accounting for the two-step sampling process associated with Pool-seq data. In the first sampling step, individuals are drawn from the population to create pooled DNA samples. In the second step, pooled sequencing is modeled as binomial sampling of reads out of the DNA pool. We show on simulated data that our method outperforms standard temporal $N_{e}$ estimators. For real data, we also suggest using a segmentation algorithm to partition the genome-wide sequence data into stretches of DNA with significantly different $N_{e}$ estimates. Finally, we present an application to a genome-wide experimental evolution data set from Drosophila melanogaster (Franssen et al. 2015).

Materials and Methods

Sampling schemes

Nei and Tajima (1981) investigated the sampling properties of temporal $N_{e}$ estimators and proposed two different sampling schemes. Under the first scheme (plan I), individuals are either sampled after reproduction or returned to the population after their genotypes have been examined. In contrast, under the second scheme (plan II) sampling takes place before reproduction and the sampled individuals are permanently removed from the population and their genotypes do not contribute to the next generation. By assuming different sampling distributions, they derived separate $N_{e}$ estimators under sampling plans I and II.

Waples (1989) unified the calculations under the two plans by assuming binomial sampling out of an infinitely large parental gamete pool for both sampling schemes. He concluded that the measure of variance under the two sampling plans differs only in a covariance term. For plan I, there is a positive correlation between allele frequencies sampled t generations apart because they are both derived from the same population at generation 0. In contrast, for plan II, the initial sample and individuals contributing to the next generation can be considered as independent binomial samples; thus sample frequencies at generations 0 and t are uncorrelated.

For a typical E&R study, outbred experimental populations are created by mixing a large number of inbred lines (e.g., Turner and Miller 2012; Huang et al. 2014; Franssen et al. 2015). The populations are then propagated under the desired experimental conditions while keeping the census size of the population controlled through time (Figure 1). However, the experimenter has no direct influence on the effective population size, which is in general lower than the census size. In E&R studies with Drosophila, the census size rarely exceeds some hundreds of individuals, and sampling usually takes place after reproduction according to plan I. For organisms maintained at larger sizes, such as yeast, the sample for genetic analysis is not returned to the population (Burke et al. 2014). Plan II applies to such cases.

Figure 1. Two-step sampling in experimental evolution with *Drosophila*. In E&R studies, populations are propagated at a census size N defined by the experimenter, which is in general larger than the effective population size $N_{e}$. Using temporal methods, $N_{e}$ can be estimated from the variance in allele frequency between samples taken t generations apart. To get an accurate representation of allele frequencies in population genetic studies, a large number of individuals $S_{j}$ ($j \in \{0, t\}$) are sampled and pooled. Sampling can take place according to sampling plan I or II based on the mode of reproduction. Pooled samples are then subjected to high-throughput sequencing. Sequenced reads are subsequently aligned to the reference genome (shown at the bottom). We represent pool sequencing by an additional sampling step (called sampling step 2). We correct for both sampling steps when estimating $N_{e}$ in pooled samples. Additionally, we take into account variable coverage levels across the genome (coverage $R_{i j}$ for site i at $T = j$, $j \in \{0, t\}$) when correcting for the variance coming from sequencing.

In E&R studies, sampled individuals are often pooled together for DNA extraction (Schlötterer et al. 2014). The size of the pool can be as large as the whole population. Depending on the experimental design, it is also possible that only a fraction of the population is sequenced, for instance, only females (Tobler et al. 2014; Franssen et al. 2015). Pooled individuals are used to create DNA libraries, which are, in turn, subjected to high-throughput sequencing.

We consider two separate sampling steps when estimating $N_{e}$ from Pool-seq samples (Figure 1). In the first step, we model the sampling of individuals out of the population. This can take place according to either plan I or plan II. In the second step, we model the sequencing of a DNA pool by drawing reads at random with replacement from the first-step sample. The allele frequency variance inferred from the sample is corrected for the additional variance coming from the two-step sampling and used for estimating $N_{e} .$

Notation

We assume that the experimental population is propagated at a constant census size N and that $N \geq N_{e}$. We use genome-wide single-nucleotide polymorphism (SNP) data sampled t generations apart to estimate $N_{e}$ (Figure 1) and denote the estimated effective population size by ${\hat{N}}_{e}$. Multiallelic sites in populations with low mutation rates, such as Drosophila, exist but are rare and likely to be sequencing errors (Burke et al. 2010). Therefore we consider only biallelic SNPs at n polymorphic sites. At each site i ($i = 1, \dots, n$) the true population allele frequency is denoted by $p_{i j}$ at time $T = j$, where $j \in \{0, t\}$. To obtain allele frequency estimates for an unknown $p_{i j}$, the population is subjected to sampling. We consider two sampling steps (Figure 1). At $T = j$, we first sample $S_{j}$ individuals out of the population to create a pooled DNA library for sequencing. Note that the number of sampled individuals is constant over the n sites, and therefore the index i is omitted here. Sampling individuals can take place according to either plan I or plan II, as described above (also shown in Supplemental Material, Figure S1). As the second sampling step, we model Pool-seq by drawing $R_{i j}$ reads out of the pooled DNA sample at each site i ($i = 1, \dots, n$). This allows for variation in sequence coverage. Below we derive the variance in allele frequency for a given site. To keep notation simple, we again omit the index i and denote the unknown sample allele frequency among the $S_{0}$ individuals at the first sampling time point ($T = 0$) by x and the subsequent allele frequency estimate obtained via pool sequencing from $R_{0}$ reads by $\hat{x}$. Similarly, at some $T = t$, the respective frequencies are denoted by y and $\hat{y}$. Note that under pool sequencing only $\hat{x}$ and $\hat{y}$ are observed.

Estimating $N_{e}$ from temporal allele frequency changes

Under neutral Wright–Fisher evolution the variance in allele frequency ($\sigma_{p}^{2}$) generated by drift after t generations at a single locus in a diploid population is well described by the expression

$$\sigma_{p}^{2} = p (1 - p) \left[1 - \left(1 - \frac{1}{2 N_{e}}\right)^{t}\right], \tag{1}$$

where p is the starting allele frequency (Falconer and Mackay 1996). Wright (1931) denoted the standardized variance by $F = \sigma_{p}^{2} / [p (1 - p)]$, which leads to a convenient closed-form expression for $N_{e}$. Furthermore, if $N_{e}$ is large enough, $F \approx 1 - e^{- t / (2 N_{e})}$ and $N_{e}$ can be calculated as

$$N_{e} \approx \frac{- t}{2 \ln (1 - F)}. \tag{2}$$
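As a quick numerical check of Equations 1 and 2, the following sketch computes the standardized variance F expected under drift and then inverts it. The function names and the example values ($N_{e} = 100$, $t = 20$) are illustrative assumptions and are not taken from the Nest package.

```python
import math

def drift_f(ne, t):
    """Standardized drift variance F after t generations (Equation 1 divided by p(1-p))."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t

def ne_from_f(f, t):
    """Equation 2: N_e is approximately -t / (2 ln(1 - F))."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    return -t / (2.0 * math.log(1.0 - f))

# Hypothetical example values: N_e = 100 diploid individuals, 20 generations.
f_expected = drift_f(ne=100, t=20)
ne_recovered = ne_from_f(f_expected, t=20)
print(f"F = {f_expected:.4f}, recovered N_e = {ne_recovered:.2f}")
```

The recovered value is slightly below 100 because Equation 2 relies on the exponential approximation of Equation 1.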

The relation between $N_{e}$ and allele frequency changes described in Equation 1 was first used by Krimbas and Tsakas (1971) in natural populations of olive flies. They estimated the variance using

$$F = F_{a} := \frac{1}{a} \sum_{k = 1}^{a} \frac{(x_{k} - y_{k})^{2}}{x_{k} (1 - x_{k})}, \tag{3}$$

where $x_{k}$ and $y_{k}$ ( $k = 1, \dots, a$ ) are the observed allele frequencies in the samples collected t generations apart and a is the number of alleles at a specific locus. To eliminate the contribution of sampling errors to the variance, the total variance $F_{a}$ was corrected for the random sampling noise by simply subtracting the corresponding variance. This approach was further investigated and developed by a number of authors (Pamilo and Varvio-Aho 1980; Nei and Tajima 1981; Pollak 1983; Waples 1989).

Possible sources of bias in $N_{e}$ estimators were later investigated by Jorde and Ryman (2007). The authors pointed out that the expectation over F is typically approximated by taking the expected values separately for the numerator and the denominator (Turner et al. 2001). They suggested a different weighting scheme of alleles leading to an alternative less-biased estimator to measure temporal frequency change.

Correction for two-step sampling

We consider a random-mating population with discrete generations. Neutral evolution is assumed with no selection, migration, and mutation. Samples are drawn from the population at generations $T = 0$ and t. Throughout the derivation we consider diploid populations, and therefore a sample of $S_{j}$ individuals leads to $2 S_{j}$ sequences at times $T = j \in \{0, t\}$. Sampling is assumed to be binomial with parameters $2 S_{j}$ and $p_{j}$ (Waples 1989). In the second sampling step at time $T = j$, sequencing a random pool $R_{j}$ of reads is also modeled as binomial sampling.

In analogy to Jorde and Ryman (2007), we use the following expression as our measure of the temporal change in allele frequency for biallelic sites,

$$F_{c} = \frac{(\hat{x} - \hat{y})^{2}}{\hat{z} - \hat{x} \hat{y}}, \tag{4}$$

where $\hat{z} = (\hat{x} + \hat{y}) / 2 .$

The expectation of $F_{c}$ for a single biallelic locus is approximated by

$$E (F_{c}) \approx \frac{E \left[(\hat{x} - \hat{y})^{2}\right]}{E (\hat{z} - \hat{x} \hat{y})} = \frac{Var (\hat{x}) + Var (\hat{y}) - 2 Cov (\hat{x}, \hat{y})}{E (\hat{z} - \hat{x} \hat{y})}. \tag{5}$$

For both plans, we derive expressions for the numerator and denominator in Equation 5 separately under the two-step sampling procedure described above. Here we summarize our main conclusions; details on the derivation are provided in File S1. With $C_{j} := \frac{1}{2 S_{j}} + \frac{1}{R_{j}} - \frac{1}{2 S_{j} R_{j}}$ for $j \in \{0, t\}$, and p denoting the true population allele frequency in the gamete pool at generation 0, we obtain

$$Var (\hat{x}) = p (1 - p) C_{0}, \tag{6}$$

and

$$Var (\hat{y}) = p (1 - p) \left[1 - (1 - C_{t}) \left(1 - \frac{1}{2 N_{e}}\right)^{t}\right]. \tag{7}$$

Note that Equations 6 and 7 differ only in the correction term $C_{j}$ from that in Waples (1989).

Waples (1989) previously showed that the denominator in Equation 5 reduces to

$$E (\hat{z} - \hat{x} \hat{y}) = p (1 - p) - Cov (\hat{x}, \hat{y}). \tag{8}$$

For plan II, $Cov (\hat{x}, \hat{y}) = 0$ (Waples 1989), and $F_{c}$ corrected for the noise coming from the two-step sampling for a single locus is given by

$${F^{'}}_{c} = \frac{F_{c} - C_{0} - C_{t}}{1 - C_{t}}. \tag{9}$$
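The step from Equations 5–8 to Equation 9 can be retraced as follows. This is a sketch of the algebra under the stated plan II assumption $Cov (\hat{x}, \hat{y}) = 0$, writing $F = 1 - (1 - \frac{1}{2 N_{e}})^{t}$ for the drift component; the full derivation is the one given in File S1.

$$E (F_{c}) \approx \frac{p (1 - p) C_{0} + p (1 - p) \left[1 - (1 - C_{t}) (1 - F)\right]}{p (1 - p)} = C_{0} + C_{t} + (1 - C_{t}) F$$

$$\Rightarrow \quad F \approx \frac{E (F_{c}) - C_{0} - C_{t}}{1 - C_{t}}$$

Replacing $E (F_{c})$ by the observed $F_{c}$ gives the corrected quantity in Equation 9.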

For plan I, on the other hand, the sample allele frequency at generation 0 is positively correlated to the sample allele frequency at t because both are derived from the same population at generation 0. This requires us to calculate the sample covariance $Cov (\hat{x}, \hat{y})$ in Equation 5. It turns out (see File S1 for details) that the covariance of $\hat{x}$ and $\hat{y}$ is equal to

$$Cov (\hat{x}, \hat{y}) = \frac{p (1 - p)}{2 N}, \tag{10}$$

where N is the census size of the population at generation 0. Equation 10 is in agreement with the corresponding term of the standard methods (Waples 1989). Substituting the inferred covariance into Equation 5 leads to the following corrected variance estimate ${F^{'}}_{c}$ for plan I:

$${F^{'}}_{c} = \frac{F_{c} \left(1 - \frac{1}{2 N}\right) - C_{0} - C_{t} + \frac{1}{N}}{1 - C_{t}}. \tag{11}$$

We provide the corresponding formulas of ${F^{'}}_{c}$ in haploid populations in File S1.
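The single-locus corrections in Equations 4, 9, and 11 can be written out directly. The sketch below is for diploid populations only; the function names and the example frequencies, pool size, coverage, and census size are hypothetical and chosen purely for illustration.

```python
def correction_term(pool_size, coverage):
    """C_j = 1/(2S) + 1/R - 1/(2SR) for a pool of S diploid individuals and R reads."""
    return (1.0 / (2.0 * pool_size) + 1.0 / coverage
            - 1.0 / (2.0 * pool_size * coverage))

def f_c(x_hat, y_hat):
    """Equation 4: temporal change in allele frequency at one biallelic site."""
    z_hat = (x_hat + y_hat) / 2.0
    return (x_hat - y_hat) ** 2 / (z_hat - x_hat * y_hat)

def f_c_plan2(fc, c0, ct):
    """Equation 9: correction for two-step sampling under plan II."""
    return (fc - c0 - ct) / (1.0 - ct)

def f_c_plan1(fc, c0, ct, census):
    """Equation 11: correction under plan I, which also needs the census size N."""
    return (fc * (1.0 - 1.0 / (2.0 * census)) - c0 - ct + 1.0 / census) / (1.0 - ct)

# Hypothetical example: S = 100 individuals and R = 50 reads at both time points.
c0 = ct = correction_term(pool_size=100, coverage=50)
fc = f_c(x_hat=0.3, y_hat=0.5)
print(f_c_plan2(fc, c0, ct), f_c_plan1(fc, c0, ct, census=500))
```

As the census size grows, the plan I value approaches the plan II value, in line with the covariance term in Equation 10 vanishing for large N.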

With Pool-seq data, randomness in sequencing and local structures in the genome can lead to different coverage across marker sites, which we denote by $R_{i j}$ for site i ($i = 1, \dots, n$) and time j ($j \in \{0, t\}$). In the genome-wide data set, we calculate ${F^{'}}_{c}$ across n SNPs by summing over all loci in the numerator and denominator separately before carrying out the division in Equation 9, leading to the following weighting scheme for plan II:

$$\bar{{F^{'}}_{c}} = \frac{\sum_{i = 1}^{n} \left[({\hat{x}}_{i} - {\hat{y}}_{i})^{2} - ({\hat{z}}_{i} - {\hat{x}}_{i} {\hat{y}}_{i}) (C_{i 0} + C_{i t})\right]}{\sum_{i = 1}^{n} ({\hat{z}}_{i} - {\hat{x}}_{i} {\hat{y}}_{i}) (1 - C_{i t})}. \tag{12}$$

Similarly, $\bar{{F^{'}}_{c}}$ can be calculated for plan I using Equations 4 and 11. Analogous to the single-locus case, our proposed estimators ${\hat{N}}_{e} (P)$ for a diploid population are obtained by plugging $\bar{{F^{'}}_{c}}$ into Equation 2.
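A minimal sketch of the genome-wide calculation follows. The plan II branch implements Equation 12 as written. The plan I branch is an assumption: it applies the same ratio-of-sums weighting to Equation 11 and is not copied from the Nest package. All function and argument names are hypothetical, and returning infinity when the corrected variance is not positive is an illustrative choice rather than documented behavior.

```python
import math

def correction_term(pool_size, coverage):
    """C_ij = 1/(2S) + 1/R - 1/(2SR) for a diploid pool of S individuals, R reads."""
    return (1.0 / (2.0 * pool_size) + 1.0 / coverage
            - 1.0 / (2.0 * pool_size * coverage))

def pooled_f(x_hat, y_hat, cov0, covt, pool0, poolt, census=None):
    """Corrected F summed over sites; plan II if census is None, else plan I."""
    num = den = 0.0
    for x, y, r0, rt in zip(x_hat, y_hat, cov0, covt):
        c0 = correction_term(pool0, r0)
        ct = correction_term(poolt, rt)
        het = (x + y) / 2.0 - x * y          # z_hat - x_hat * y_hat
        sq = (x - y) ** 2
        if census is None:                   # Equation 12 (plan II)
            num += sq - het * (c0 + ct)
        else:                                # assumed plan I analog of Equation 11
            num += sq * (1.0 - 1.0 / (2.0 * census)) - het * (c0 + ct - 1.0 / census)
        den += het * (1.0 - ct)
    return num / den

def ne_pool(f_bar, t):
    """Plug the corrected F into Equation 2."""
    if f_bar <= 0.0:
        return math.inf  # sampling noise swamps drift; no finite estimate (assumption)
    return -t / (2.0 * math.log(1.0 - f_bar))

# Hypothetical two-site example with per-site coverage 50 and pools of 100 individuals.
f_bar = pooled_f([0.3, 0.6], [0.5, 0.4], [50, 50], [50, 50], pool0=100, poolt=100)
print(ne_pool(f_bar, t=10))
```

Summing numerator and denominator before dividing means that sites with low heterozygosity contribute little, which avoids unstable single-site ratios.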

Long time series have recently become available for some E&R experiments (Barrick et al. 2009; Burke et al. 2010, 2014). Standard $N_{e}$ estimators (Krimbas and Tsakas 1971; Nei and Tajima 1981; Waples 1989) assume a small number of generations $(t)$ and approximate $N_{e}$ using $2 N_{e} \approx t / F .$ If, however, $t / N_{e}$ is larger, using this approximation can lead to severe bias (Figure S2). To avoid such a bias, we use Equation 2 to estimate $N_{e} .$
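To see how the linear approximation degrades as $t / N_{e}$ grows, the sketch below generates the expected F from Equation 1 and compares $t / (2F)$ with Equation 2. The function names and the chosen values ($N_{e} = 100$, $t = 5, 60, 200$) are illustrative assumptions.

```python
import math

def drift_f(ne, t):
    """Expected standardized variance after t generations (Equation 1 / p(1-p))."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t

def ne_linear(f, t):
    """Short-time approximation 2 N_e = t / F."""
    return t / (2.0 * f)

def ne_log(f, t):
    """Equation 2."""
    return -t / (2.0 * math.log(1.0 - f))

true_ne = 100
for t in (5, 60, 200):
    f = drift_f(true_ne, t)
    print(f"t={t:3d}  linear={ne_linear(f, t):7.1f}  Equation 2={ne_log(f, t):7.1f}")
```

With these values the linear form stays close to 100 for $t = 5$ but overshoots by more than 50% at $t = 200$, whereas Equation 2 stays within 1% of the true value throughout.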

Simulations

We evaluate the performance of our estimator on data simulated under the neutral Wright–Fisher model. With a given population size of $2 N_{e},$ we simulate the frequency trajectory of n independent SNPs. As we focus on biallelic SNPs, we assume two possible nucleotides (alleles) to be present in the population with given frequencies at the start. To create a new generation, nucleotides are drawn independently at random with a probability given by their respective allele frequencies in the old generation. The population is propagated at a constant size of $2 N_{e}$ for t nonoverlapping generations. The effective population size is then estimated from allele frequencies inferred from Pool-seq samples taken from the population at the start and after t generations. The sampling of individuals to create the pooled DNA library is simulated by using sampling without replacement. To model the uneven coverage of genome-wide sequence data, we simulate a random coverage for each site, using a Poisson distribution with parameter equal to the given mean coverage. For every position, reads are then generated by binomial sampling from the library with sample size equal to the local coverage.
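The simulation scheme for a single SNP can be sketched as below. This is a simplified illustration, not the authors' simulation code: it assumes $N = N_{e}$, samples the pool without removing individuals from the population, uses only the standard library, and guards against zero coverage by enforcing at least one read. All names are hypothetical.

```python
import math
import random

def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (adequate for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def simulate_site(ne, t, pool_size, mean_cov, p0, rng):
    """Return (x_hat, y_hat, R_0, R_t) for one SNP under neutral Wright-Fisher drift."""
    if pool_size > ne:
        raise ValueError("pool cannot exceed the population (N = N_e assumed)")
    two_n = 2 * ne

    def pool_seq(copies):
        # Step 1: draw 2S gene copies without replacement from the 2N in the population.
        drawn = rng.sample(range(two_n), 2 * pool_size)
        pooled = sum(i < copies for i in drawn)
        # Step 2: binomial sampling of reads at Poisson-distributed coverage.
        cov = max(1, poisson(mean_cov, rng))
        reads = binomial(cov, pooled / (2 * pool_size), rng)
        return reads / cov, cov

    copies = round(p0 * two_n)
    x_hat, r0 = pool_seq(copies)
    for _ in range(t):                       # t nonoverlapping generations of drift
        copies = binomial(two_n, copies / two_n, rng)
    y_hat, rt = pool_seq(copies)
    return x_hat, y_hat, r0, rt

rng = random.Random(42)
sites = [simulate_site(ne=100, t=10, pool_size=50, mean_cov=50, p0=0.3, rng=rng)
         for _ in range(5)]
print(sites)
```

The returned frequencies and coverages are the inputs the corrected estimator needs; simulating many independent sites in this way reproduces the unlinked-SNP setting described above.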

We assess the performance of our estimator for various combinations of $N_{e}$, pool size, coverage, number of SNPs, and distribution of starting allele frequencies. In addition to these parameters, we also test how the ratio between census and effective population size ($r = N / N_{e}$) affects the accuracy of the proposed estimator. For this purpose, we increment the population size to a desired level of N in each generation while keeping the allele frequencies unchanged to avoid introducing additional sampling variance. We simulated each scenario 100 times.

Linkage disequilibrium between loci can reduce the number of independent SNPs, thereby increasing the variance of the estimate. The impact of dependence between SNPs is investigated based on 10 replicated whole-genome forward simulations with recombination, using the software tool MimicrEE (Kofler and Schlötterer 2014). As a starting population for the forward simulations, we sampled 2000 haploid genomes out of 8000 genomes simulated with fastsimcoal v.1.1.2 (Excoffier and Foll 2011; Bastide et al. 2013). The parameters used to generate the genomes mimic a wild population of D. melanogaster from Vienna (Fiston-Lavier et al. 2010; Bastide et al. 2013; Kofler and Schlötterer 2014). Allele counts are subjected to binomial sampling to mimic Pool-seq with a given sequence coverage. $N_{e}$ is estimated in nonoverlapping windows, each containing a fixed number of SNPs.

Estimating $N_{e}$ on simulated data

We denote our estimator corrected for the additional sampling step, i.e., pooling, by $N_{e} (P) .$ We compare $N_{e} (P)$ to the standard estimators $N_{e} (W)$ and $N_{e} (J R)$ proposed by Waples (1989) and Jorde and Ryman (2007) that correct only for a single sampling step.

We illustrate experimental sampling procedures by considering two major scenarios: (i) The full population is sequenced as one large pool and (ii) only a subset of the population is used to create pooled samples. Under scenario (i) we simulate only a single binomial sampling step to represent sampling reads out of the DNA pool. The pool size is set to be equal to the census size of the population ( $S_{j} = N$ ), and the number of sampled reads ( $R_{i j}$ ) represents the per-site coverage. For estimators that correct only for a single sampling step, we use the coverage ( $R_{i j}$ ) as the sample size. For scenario (ii), the sampled individuals ( $S_{j}$ ) and the read number ( $R_{i j}$ ) represent the pool size and coverage for $N_{e} (P) .$ The coverage ( $R_{i j}$ ) is taken as the sample size for the $N_{e} (W)$ and $N_{e} (J R)$ estimators, as these methods consider only one sampling step.

Change point inference for genome-wide estimates

The effect of genetic drift on the variance in allele frequency specified in Equation 1 holds only under the assumptions of Wright–Fisher neutral evolution. Deviations from the Wright–Fisher model, such as the presence of selection or demography, may cause systematically different changes in allele frequency, affecting the variance and causing locally variable patterns in genetic diversity. Furthermore, the effect of selection on one site of the genome may cause changes in the behavior of variants at nearby sites (Maynard Smith and Haigh 1974; Barton 2000; Comeron et al. 2008). As a result, the estimates of $N_{e}$ at different locations of the genome may deviate from the true number of breeding individuals in the population (Kimura and Crow 1963; Charlesworth 2009). For example, regions under background selection are associated with reduced ${\hat{N}}_{e}$ values that extend to linked sites due to the Hill–Robertson effect (Charlesworth 1996, 2012a; Comeron et al. 2008). Similarly, selectively favorable alleles can also drag nearby neutral sites to high frequency (Maynard Smith and Haigh 1974), causing a local reduction in the estimated $N_{e}$ (Liu and Mittler 2008). Such an event is also known as a selective sweep (Berry et al. 1991). On the other hand, we expect the opposite pattern, i.e., a local elevation of ${\hat{N}}_{e}$ for types of selection such as balancing selection that maintain variation in the genome (Baysal et al. 2007; Charlesworth 2009).

To detect such patterns in ${\hat{N}}_{e},$ we apply a segmentation algorithm to partition the genome into locally homogeneous ${\hat{N}}_{e}$ stretches. We use a method related to an approach suggested by Futschik et al. (2014) for partitioning DNA sequences with respect to GC content. It is based on a statistical multiscale criterion and provides statistical error control, in the sense that the estimated number of windows will not exceed the true one except for a small error probability α to be specified by the user. With our $N_{e}$ estimates, we use a criterion proposed by Frick et al. (2014) for normally distributed responses. It is implemented as part of the R package stepR (Frick et al. 2014). By using simulations with selection we also illustrate that this method is able to capture the signal of locally variable ${\hat{N}}_{e}$ along the chromosome.

Data availability

We estimated $N_{e}$ in an E&R study with D. melanogaster, published in Orozco-terWengel et al. (2012) and Franssen et al. (2015). Pool-seq read libraries from these studies are available at the European Sequence Read Archive at http://www.ebi.ac.uk/ena/ under accession nos. ERP001290 and ERS460611–ERS460613.

Results and Discussion

Two-step correction is vital to avoid large bias in ${\hat{N}}_{e}$ with Pool-seq data

Methods that do not correct for the additional sampling step caused by pooling can lead to substantial bias in ${\hat{N}}_{e}$, as illustrated in Figure 2. Using simulated data, we compare our proposed estimator $N_{e} (P)$ to two commonly used estimators, $N_{e} (W)$ (Waples 1989) and $N_{e} (J R)$ (Jorde and Ryman 2007), that provide highly accurate estimates when only a single sampling event is simulated (Figure S3). Figure 2 shows that the additional correction substantially decreases the bias for almost all scenarios (see also Figure S4, Figure S5, and Figure S6). Under plan I, $N_{e} (P)$ is nearly unbiased. The plan II version of the estimator has a slight upward bias when applied to data simulated under plan I, if the samples are taken at very close time points.

Figure 2. Effective population size estimated with different methods. Sixty generations of Wright–Fisher neutral evolution with $N_{e} = 100$ diploid individuals were simulated for n = 2000 unlinked loci (SNPs). Prior to sampling, the population was increased to a census size of $N = 500$ individuals at each generation. At the starting population and at each indicated time point a sample was taken to create a pool of $S = 100$ individuals. The pool was sequenced to an average coverage of $R = 50$ and $N_{e}$ was estimated on the resulting data set by separately contrasting allele frequencies at generation 0 to each of the evolved generations denoted on the x-axis, using $N_{e} (P)$, $N_{e} (W)$ (Waples 1989), and $N_{e} (J R)$ (Jorde and Ryman 2007). Each box represents results from 100 simulations with identical parameters. The dashed gray line shows the true value of $N_{e}$. Data are simulated under plan I assumptions and the results of plan I and II estimators are shown in the left and right panels, respectively.

As an alternative approach, we also estimated $N_{e}$ separately for each locus, using ${F^{'}}_{c}$ in Equations 9 and 11. We then calculated the effective population size across the n loci as the harmonic mean over the single-locus ${\hat{N}}_{e}$ estimates ( ${\hat{N}}_{e}^{*} (P)$ ) (Peel et al. 2013). In our simulations, the harmonic mean estimator shows an accuracy similar to that of the original ${\hat{N}}_{e} (P)$ (Figure S7). However, for t lying in the midrange of the simulated interval ( $t =$ 15–40), ${\hat{N}}_{e}^{*} (P)$ is slightly more biased under plan I.
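The harmonic mean over single-locus estimates can be sketched as follows. The helper name and the input values are hypothetical. The sketch works on reciprocals directly, so it does not reject negative single-locus estimates; how such loci are treated in the original analysis is not stated here and is left open.

```python
def harmonic_mean_ne(single_locus_ne):
    """Harmonic mean n / sum(1 / N_e,i) over per-locus estimates."""
    reciprocals = [1.0 / ne for ne in single_locus_ne]
    return len(reciprocals) / sum(reciprocals)

# Hypothetical per-locus estimates.
print(harmonic_mean_ne([50.0, 100.0, 200.0]))
```

Because Equation 2 makes $1 / N_{e}$ proportional to $- \ln (1 - F)$, averaging reciprocals weights each locus by its estimated drift signal rather than by its $N_{e}$ value, so loci with very large estimates have little influence.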

Because of the additional sampling variance, both $N_{e} (W)$ and $N_{e} (J R)$ have a downward bias in particular for small t. Furthermore, $N_{e} (W)$ is upwardly biased for larger values of t, probably reflecting that alleles closer to fixation or loss are contributing less to the variance (Waples 1989). The drift variance accumulates with an increasing number of generations, while the sampling variance stays constant, making the initial bias of $N_{e} (J R)$ less pronounced for larger t. When samples are collected only a few generations apart, the variance of $N_{e} (P)$ estimators tends to be larger than that of $N_{e} (W)$ and $N_{e} (J R)$ under both plans.

Plan I and II estimators differ by a factor resulting from the covariance between the sample frequencies at generations 0 and t (Equation 10), which is inversely proportional to the census population size. Consequently, the difference between plans I and II becomes smaller for increasing N. Waples (1989) investigated how the ratio between census and effective population size ( $r = N / N_{e}$ ) affects the accuracy of the estimators and concluded that the ratio of $r \geq 2$ is sufficient to reach similar estimates for both sampling schemes. We tested the performance of $N_{e} (P)$ on simulated data with $N_{e} = 100$ and $N : N_{e}$ ratios of $r = 1, 2, 5$ with different coverages and pool sizes (Figure S4, Figure S5, and Figure S6). When $N = N_{e},$ the $N_{e} (P)$ plan I method achieves highly accurate estimates for all time points in contrast to the other methods (Figure S4). If, however, the $N_{e} (P)$ plan II estimator is applied to data simulated under plan I, we observe an upward bias for small t, which improves with an increasing number of generations. This pattern is not unexpected since the missing covariance term becomes less influential in view of the increasing drift variance after several generations. When the entire population is sequenced as a single pool ( $S = 100$ ), the plan II estimators of Waples (1989) and Jorde and Ryman (2007) perform similarly to the $N_{e} (P)$ plan I estimator because the correction for pooling in $N_{e} (P)$ cancels out the additional covariance term when $S = N,$ making the term used as F approximately identical to that of $N_{e} (J R) .$ This is a general pattern irrespective of r.

For $r \geq 2,$ $N_{e} (P)$ plan I remains highly accurate (Figure S5 and Figure S6). Furthermore, when increasing the census size under a constant $N_{e}$ (equivalent to increasing r), the covariance between sample allele frequencies decreases, making the difference between plans I and II almost negligible (Waples 1989). The sampling variance becomes proportionally smaller compared to the drift variance with an increasing number of generations between the samples. This improves our ability to accurately estimate $N_{e} .$

Correcting for the additional variance inherent to Pool-seq leads to an improved performance of $N_{e} (P)$ compared to the standard methods for both plans. In general, with Pool-seq data the extent of the bias of the $N_{e} (W)$ and $N_{e} (J R)$ estimates depends on the ratio between N and S, with smaller sample sizes (S) leading to a larger bias. As we accounted for the sequencing step with these estimators (see the section Estimating $N_{e}$ on simulated data), decreasing the coverage at a given pool size does not change the bias much but rather increases the variance of the estimators.

In most of the experimental studies the investigator has control over the census population size; thus requiring the knowledge of N for $N_{e} (P)$ plan I does not restrict the analysis. We illustrate the performance of $N_{e} (P)$ plan I only when $N_{e} = N$ in the main text, but according to Figure S5 and Figure S6, $N_{e} (P)$ plan I is also highly accurate when $r \geq 2.$

We show the coefficient of variation (CV) of the $N_{e} (P)$ plan I estimator in Figure 3. The CV is defined as the ratio between the standard deviation and the mean ( $CV = \hat{\sigma} / \hat{\mu}$, where both $\hat{\sigma}$ and $\hat{\mu}$ are estimated from the sample). It measures the relative dispersion of the distribution of the estimated values. $N_{e} (P)$ estimators are highly precise in nearly all cases, except when the drift variance is negligible compared to the sampling variance (Figure 3; see also Figure S9 and Figure S11 where $N_{e} = 1000, t < 30, S \leq 100,$ and $R = 50$ ). In these cases the bias stems from a few outlier estimates, and the median is more robust (Figure S13). The plan II estimators behave similarly (Figure S8, Figure S10, Figure S12, and Figure S14); note that the simulations underlying these figures were performed under plan I.
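As a minimal illustration of how a single outlier inflates the CV and the mean while leaving the median almost untouched, consider the following sketch. The estimates are invented for illustration and are not taken from the simulations.

```python
import statistics


def coefficient_of_variation(estimates):
    """CV = sample standard deviation divided by sample mean."""
    return statistics.stdev(estimates) / statistics.mean(estimates)


# Hypothetical Ne estimates from replicate runs (true Ne = 1000), with one
# extreme outlier of the kind that appears when drift variance is small.
estimates = [950.0, 1020.0, 980.0, 1100.0, 905.0, 1045.0, 12000.0]

print("CV:", coefficient_of_variation(estimates))
print("mean:", statistics.mean(estimates))
print("median:", statistics.median(estimates))
```

With these made-up values the CV exceeds one and the mean is more than twice the true value, while the median stays close to 1000.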

Coefficient of variation of $N_{e} (P)$ under plan I for various parameter values. Neutral Wright–Fisher simulations were performed with various combinations of the parameters: effective population size ( $N_{e} = 100, 500, 1000$ diploid individuals), pool size ( $S = 100, 50$ ), and coverage ( $R = 150, 100, 50$ ). $N_{e}$ was estimated with $N_{e} (P)$ under plan I, using $n = 2000$ SNPs. $S = N$ indicates scenarios when the whole population is sequenced as a single pool. For all simulations, we assumed $N = N_{e} .$ Each value is calculated over 100 simulations. When the coefficient of variation exceeds one, the inset shows the actual value.

Increasing the number of SNPs reduces the variance of $N_{e} (P)$

We test how the number of loci (n) used to infer $N_{e}$ affects the accuracy and the precision of the estimates by gradually increasing the number of independent SNPs from 100 to 10,000 (Figure 4). We observe a larger variance and a slight downward bias for a small number of SNPs (100 SNPs). Both the bias and the variance become smaller with a larger number of SNPs. Some further improvement is obtained when >10,000 SNPs are used (not shown), but the benefit of additional independent SNPs levels off. We conclude that $n = 2000$ SNPs usually provide sufficient precision and accuracy. However, when linkage disequilibrium is present in a genome-wide data set, the number of truly independent SNPs per window is reduced and a larger number of loci is recommended.

Effect of the number of SNPs used for estimating $N_{e} .$ The effective population size is estimated using $N_{e} (P)$ plan I on simulated data with $N_{e} = N = 100.$ A total number of $S = 100$ individuals are pooled and sequenced at a mean coverage of $R = 50.$ Based on 100 simulation runs, $N_{e}$ is estimated using different numbers of SNPs at multiple time points.

A skewed starting allele frequency distribution only moderately increases the variance of $N_{e} (P)$

In natural populations, the neutral site frequency spectrum is skewed toward allele frequencies close to the boundaries. $N_{e} (P)$ uses a weighting scheme that is not very sensitive to this skew (see also Jorde and Ryman 2007). This makes it robust with respect to the shape of the starting allele frequency distribution. We illustrate this with a simulated data set having a starting allele frequency distribution that is skewed toward low- and high-frequency variants (Beta(0.2, 0.2)) as expected under neutrality. The estimates of $N_{e}$ from such data sets are compared to simulated data with matching parameters but uniform starting allele frequency distribution (Figure 5). We observe a very slight upward bias with neutral starting allele frequencies compared to uniform and a moderate increase in the variance given $t \leq 15$. With an increasing number of generations, the difference becomes negligible.
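To give a feeling for how strongly a Beta(0.2, 0.2) distribution concentrates starting frequencies near the boundaries, the following sketch draws starting frequencies from both distributions and compares the share of SNPs within 0.1 of fixation or loss. The cutoff, the seed, and the helper name are arbitrary choices made for this illustration.

```python
import random


def tail_fraction(freqs, cutoff=0.1):
    """Fraction of allele frequencies within `cutoff` of 0 or 1."""
    return sum(f < cutoff or f > 1 - cutoff for f in freqs) / len(freqs)


rng = random.Random(1)  # fixed seed, only to make the sketch reproducible
n_snps = 10_000
uniform_start = [rng.random() for _ in range(n_snps)]
skewed_start = [rng.betavariate(0.2, 0.2) for _ in range(n_snps)]

print("uniform:", tail_fraction(uniform_start))
print("Beta(0.2, 0.2):", tail_fraction(skewed_start))
```

Under the uniform distribution roughly one-fifth of the SNPs fall into these boundary classes, whereas under Beta(0.2, 0.2) the majority do, so far fewer loci start at intermediate frequencies.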

Influence of the starting allele frequency distribution on $N_{e} (P)$ under plan I. A comparison between uniform and Beta(0.2, 0.2)-distributed (neutral) starting allele frequencies is shown. The simulation parameters match those of the genome-wide simulations in Figure 6.

The presence of linkage disequilibrium does not have a large effect on the precision of $N_{e} (P)$

We investigated the sensitivity of our estimator to linkage disequilibrium between loci, using genome-wide neutral simulations with recombination (Kofler and Schlötterer 2014). We simulated data with three different rates of recombination: high, normal, and no recombination. For the first case, the recombination rate is set to mimic the behavior of almost independent SNPs. In the normal recombination rate scenario, we use D. melanogaster recombination rates (Fiston-Lavier et al. 2010). The effective population size was estimated in nonoverlapping windows with a fixed number of $n = 10{,}000$ SNPs (Figure 6). Different levels of linkage disequilibrium affect the number of independent loci per window. Nevertheless, we observe only a slight increase in the precision of the $N_{e}$ estimates with increasing recombination rate (Figure 6).

Effect of linkage disequilibrium on ${\hat{N}}_{e} .$ The effect of linkage disequilibrium on our estimator was evaluated based on a whole-genome forward simulation with recombination using the software MimicrEE (Kofler and Schlötterer 2014). Three sets of simulations were performed with different rates of recombination: high, normal, and no recombination. For each parameter setup, a genome-wide simulation is replicated 10 times. The effective population size was estimated with $N_{e} (P)$ (plan I) in nonoverlapping windows of n = 10,000 SNPs for each replicate. The box plots show the distribution of $N_{e}$ estimates across replicates and windows.

Heterogeneous ${\hat{N}}_{e}$ along the genome in an E&R study with D. melanogaster

We estimated $N_{e}$ in a recent E&R study with D. melanogaster (Orozco-terWengel et al. 2012; Franssen et al. 2015). In this experiment replicate populations of 1000 individuals were subjected to a fluctuating hot environment for 59 generations. Allele frequency estimates were obtained for founder and evolved populations, using Pool-seq. $N_{e}$ was estimated based on the allele frequency changes between founder and latest evolved populations in nonoverlapping windows containing 10,000 SNPs, using $N_{e} (P)$ under plan I. To determine the number of DNA stretches with different ${\hat{N}}_{e}$ along the genome, we use a segmentation algorithm provided in the software tool by Frick et al. (2014). This method requires homogeneity of variances. Since the variance of estimates increases with $N_{e},$ the estimates were log-transformed before applying the partitioning procedure. The obtained step function was back-transformed to the original scale and is shown for three biological replicates (Figure 7).
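The transform and back-transform step can be sketched as follows. This is not the multiscale change point method of Frick et al. (2014): the breakpoints are supplied by hand and the window estimates are invented, so the example only shows how a piecewise-constant level fitted on the log scale maps back to the original scale (as a geometric mean per segment). Estimates must be positive for the logarithm to be defined.

```python
import math


def backtransformed_segment_levels(estimates, breakpoints):
    """Fit a piecewise-constant level on the log scale and map it back.

    `breakpoints` are window indices at which a new segment starts. They are
    given by hand here; in practice a segmentation tool would infer them.
    """
    edges = [0, *breakpoints, len(estimates)]
    levels = []
    for start, end in zip(edges[:-1], edges[1:]):
        segment = estimates[start:end]
        log_mean = sum(math.log(x) for x in segment) / len(segment)
        levels.append(math.exp(log_mean))  # back-transform to the Ne scale
    return levels


# Hypothetical window-wise Ne estimates with a drop in the middle of an arm.
windows = [300.0, 280.0, 320.0, 100.0, 90.0, 110.0, 290.0, 310.0]
print(backtransformed_segment_levels(windows, breakpoints=[3, 6]))
```

Working on the log scale means that a deviation of a fixed proportion is treated alike in high and low $N_{e}$ regions, which is what the homogeneity-of-variance requirement calls for when the spread of the estimates grows with their level.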

Genome-wide ${\hat{N}}_{e}$ from an E&R study with D. melanogaster. $N_{e}$ is estimated based on the allele frequency changes between founder and evolved populations at generation 59 (Franssen et al. 2015). In the top panel, we show genome-wide estimates calculated with $N_{e} (P)$ (plan I), using $N = 1000$ as census size and $S = 500$ as pool size (Orozco-terWengel et al. 2012) and nonoverlapping windows of 10,000 SNPs. Chromosome-wide mean estimates across replicates are shown by the dashed lines and also calculated separately for each replicate in Table 1. DNA stretches with significantly different ${\hat{N}}_{e}$ are determined using the stepR software package (Frick et al. 2014) (bottom panel). Lower and upper $1 - α$ confidence bands are shown as shaded areas. α controls the error, i.e., the probability for overestimating the number of change points, and is calculated automatically as described in Frick et al. (2014). The colors indicate different biological replicates.

The mean and the median estimates for each chromosome arm as well as across the genome are stable across replicates (see Table 1 and Table S1). As experimental evolution studies often aim to find signals that are consistent across replicates, this can be an important check of the experimental setup. On the other hand, we see differences between chromosome arms. For example, the mean is clearly lower for 3R, emphasizing the added value of spatial analysis compared to genome-wide estimates.

Table 1. Genome-wide mean ${\hat{N}}_{e}$ from an E&R study with D. melanogaster.

	Mean
Replicate	X	2L	2R	3L	3R	Genome-wide
R1	257.9675	231.6854	257.0828	193.4339	131.7072	199.4463
R2	328.8878	297.9832	274.8529	193.3237	194.9571	239.3618
R3	263.4829	246.5448	211.8995	157.6411	133.9459	187.1573


The effective population size is estimated with $N_{e} (P)$ plan I in windows of 10,000 SNPs (Figure 7). The mean estimates across windows are shown for the major chromosome arms. The genome-wide mean is taken over the autosomes, excluding chromosome 4.

In D. melanogaster ${\hat{N}}_{e}$ ranges between ∼100 and 400. Around the centromere of chromosome 2, the estimated $N_{e}$ decreases by two-thirds in replicates 1 and 3, which is in agreement with the expectation of low diversity and, as a consequence, low $N_{e}$ in regions with reduced recombination (Begun and Aquadro 1992; Presgraves 2005; Haddrill et al. 2007; Campos et al. 2012). Furthermore, ${\hat{N}}_{e}$ is low on the entire chromosome arm 3R and also on parts of 3L. Overall, these patterns can be attributed to strong LD, caused either by low recombination rates around the centromeres (Chan et al. 2012) or by segregating inversions (Kapun et al. 2014) in combination with selection potentially on rare variants. The reduction in ${\hat{N}}_{e}$ is also well captured by the segmentation algorithm (Figure 7), which shows a similar pattern when applied on simulated data with selection (Figure S15). These results are consistent with those of Tobler et al. (2014), who observed a massive amount of outlier SNPs around the centromere of chromosome 2 and on 3R. Interestingly, certain regions of the genome show extensive differences in ${\hat{N}}_{e}$ between the replicates, which might be reflecting different selection histories or differences in demography, such as replicate-specific bottlenecks.

${\hat{N}}_{e}$ may also vary as a result of differences in the modes of transmission of different components of the genome. For example, on the X chromosome, $N_{e}$ is expected to equal three-quarters of the autosomal effective population size (Vicoso and Charlesworth 2006, 2009). Interestingly, our estimates in the E&R experiment do not reflect this expected reduction. Instead, ${\hat{N}}_{e}$ on the X is at least as high as on the autosomes. An unequal sex ratio can be a source of such a pattern (Charlesworth 2009); however, an unbalanced sex ratio has not been reported in this experiment. Another possible explanation for increased ${\hat{N}}_{e}$ on the X is background selection, as suggested by Charlesworth (2012b), who argues that because of the lack of recombination in male Drosophila, background selection is more effective on the autosomes than on the X chromosome. Orozco-terWengel et al. (2012) reported differences in the number of putatively selected sites between the X and autosomes and found that candidate SNPs were underrepresented on the X. Their selection scan identifies signatures of deviation from neutral expectation, which is also reflected in the reduction in ${\hat{N}}_{e}$ on the autosomes, indicating higher selection pressure.
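The departure from the three-quarters expectation can be read directly off Table 1. The short calculation below divides the mean ${\hat{N}}_{e}$ on the X by the genome-wide autosomal mean for each replicate; the values are copied from Table 1, and the variable names are chosen for this illustration.

```python
# Mean Ne estimates from Table 1: (X chromosome, genome-wide autosomal mean).
table1 = {
    "R1": (257.9675, 199.4463),
    "R2": (328.8878, 239.3618),
    "R3": (263.4829, 187.1573),
}
EXPECTED_X_TO_A = 0.75  # neutral expectation with an equal sex ratio

ratios = {rep: x_ne / a_ne for rep, (x_ne, a_ne) in table1.items()}
for rep, ratio in ratios.items():
    print(rep, round(ratio, 2), "above expectation:", ratio > EXPECTED_X_TO_A)
```

In all three replicates the ratio is above one, so the X chromosome estimate is not merely higher than three-quarters of the autosomal estimate but exceeds the autosomal mean itself.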

Recommendations for genome-wide data sets

Most of the methods proposed previously are not designed for genome-wide high-density SNP data sets. However, the method of Jorde and Ryman (2007) was successfully used for genome-wide data by Foll et al. (2014). Reed et al. (2014) also used a similar approach to estimate $N_{e}$ for whole-genome data, using sliding windows. We estimated $N_{e}$ in windows with a fixed number of SNPs. Using windows of fixed length in base pairs would affect the variance of the estimator (Figure 4) but would not distort the mean. None of these approaches, however, accounts for the ruggedness of the recombination landscape, so windows can differ in their level of linkage disequilibrium. To overcome this problem, windows could be defined based on recombination distance. Unfortunately, the lack of haplotype information in Pool-seq data makes it difficult to infer linkage disequilibrium. One way to infer linkage information from pooled sequence data is provided by the software LDx (Feder et al. 2012). For model organisms such as Drosophila, readily available recombination maps can also be used as a proxy (Przeworski et al. 2001; Kulathinal et al. 2008; Fiston-Lavier et al. 2010). If only a single genome-wide $N_{e}$ estimate is required, one can alternatively use a set of SNPs randomly distributed over the genome to obtain ${\hat{N}}_{e}$.
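Defining windows by SNP count rather than by physical length is straightforward. The following sketch, with a hypothetical helper name and invented coordinates, splits sorted SNP positions into nonoverlapping windows that each contain the same number of SNPs and reports the resulting window boundaries in base pairs.

```python
def fixed_snp_windows(positions, snps_per_window):
    """Split sorted SNP positions into nonoverlapping windows of equal SNP count.

    Returns (start_bp, end_bp, n_snps) per window. A trailing incomplete
    window is dropped so that every estimate rests on the same number of loci.
    """
    windows = []
    for i in range(0, len(positions) - snps_per_window + 1, snps_per_window):
        chunk = positions[i:i + snps_per_window]
        windows.append((chunk[0], chunk[-1], len(chunk)))
    return windows


# Hypothetical SNP coordinates: dense at the start of the arm, sparse later.
positions = [10, 25, 40, 55, 70, 85, 500, 1500, 3000, 6000, 9000, 15000, 20000]
for window in fixed_snp_windows(positions, snps_per_window=3):
    print(window)
```

Windows in SNP-poor regions span much longer stretches of sequence, which keeps the number of loci per estimate constant at the cost of variable physical resolution.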

Temporal methods make a number of assumptions, which, if violated, can lead to biased $N_{e}$ estimates. For example, in our simulations, we considered only effective population sizes that are constant over time. Fluctuating $N_{e}$ is a frequent phenomenon in natural populations and can be an important component of an experimental design. For example, in repeatedly bottlenecked populations, the smallest population size dominates ${\hat{N}}_{e}$ (Luikart et al. 1999; Charlesworth 2009). But even in strictly controlled populations the experimental regime can induce changes in $N_{e} .$ When the population changes in size, the estimated $N_{e}$ is generally interpreted as the harmonic mean of the effective population sizes over the generations (Wright 1938; Nei and Tajima 1981; Waples 1989). However, if time series allele frequency data are available, such changes can be detected by estimating $N_{e}$ from pairwise comparisons between consecutive time points.
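The dominance of the smallest population size follows directly from the harmonic mean. As a worked example with an invented trajectory, a single bottleneck generation of 10 individuals in an otherwise constant population of 1000 pulls the harmonic mean far below the arithmetic mean:

```python
def harmonic_mean_ne(ne_per_generation):
    """Harmonic mean of per-generation effective population sizes."""
    return len(ne_per_generation) / sum(1.0 / ne for ne in ne_per_generation)


# Hypothetical regime: Ne = 1000 except for a one-generation bottleneck of 10.
trajectory = [1000, 1000, 10, 1000, 1000]
arithmetic_mean = sum(trajectory) / len(trajectory)

print(round(harmonic_mean_ne(trajectory), 1))  # 48.1
print(arithmetic_mean)                         # 802.0
```

Estimating $N_{e}$ separately for each pair of consecutive time points would, in such a case, localize the drop to the interval containing the bottleneck instead of averaging it over the whole experiment.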

All evolutionary forces (selection, demography, etc.) that lead to deviations from the neutral expectation will also affect our estimate. Nevertheless, systematic forces that result in locally different values of ${\hat{N}}_{e}$ can be detected with a sliding-window approach, as illustrated with simulations under selection (Figure S15). The D. melanogaster data set also illustrates this point; i.e., the hypothesized regions under selection coincide with regions of reduced $N_{e}$ (Orozco-terWengel et al. 2012; Tobler et al. 2014; Franssen et al. 2015). For this to be detected, however, most of the allele frequency change has to occur over the sampled time span.

In the E&R study with D. melanogaster, shown above, the criterion of nonoverlapping generations, assumed by temporal methods, is met (see Orozco-terWengel et al. 2012 for details on experimental design). However, for samples from an age-structured population, the resulting ${\hat{N}}_{e}$ can be biased (Waples and Yokota 2007). In these cases, as suggested by Waples and Yokota (2007), larger spacing between samples maximizes the drift signal compared to sampling biases associated with age structure.

Using a small number of generations can lead to outlier estimates

In general, $N_{e} (P)$ has a lower bias but larger variance, especially when t is small. As pointed out by Jorde and Ryman (2007), our weighting scheme leads to an increased variance but a smaller bias compared to other schemes. We observe outlier estimates among replicates at early generations (generation 5, Figure 2, Figure S4, Figure S5, and Figure S6) for $N_{e} (P)$. When the sampling variance is large compared to the drift variance ( $N_{e} = 1000,$ $S \leq 100,$ and $R = 50,$ Figure S11 and Figure S12), the deviation of the outlier estimates from the true $N_{e}$ is particularly large. In a few cases, we even observe large negative estimates. Negative estimates, in general, can be interpreted as $N_{e}$ being infinite, that is, as no evidence of genetic drift (Peel et al. 2013). In our simulations this is plausible when $N_{e}$ is large and t is small, such that drift has not yet had a large effect on the population allele frequencies. Note that the harmonic mean estimator ( ${\hat{N}}_{e}^{*} (P)$ ) has smaller variance for large $N_{e}$ (Figure S16). This estimator, however, is less accurate than $N_{e} (P)$ for small $N_{e}$, as shown in Figure S17.

To eliminate potential outliers and an inflated variance we recommend increasing the signal-to-noise ratio by pooling a sufficient number of individuals. Using later generations or increasing the number of SNPs in the analysis also helps to avoid outlier estimates. When none of these strategies can be applied, we suggest using the genome-wide median $N_{e} (P)$ estimates or the harmonic mean estimator, as these are more robust to extreme outliers.

Conclusions

Effective population size is an important parameter for describing evolutionary dynamics, making its accurate estimation essential for population genetic studies. Several methods have been designed to estimate $N_{e},$ and their performance was comprehensively evaluated on simulated as well as real data (Barker 2011; Serbezov et al. 2012; Baalsrud et al. 2014; Holleley et al. 2014; Gilbert and Whitlock 2015). These studies mainly focused on genetic data collected from natural populations, which usually differ from experimental studies in terms of the census population size and sampling scheme. We designed a method that accurately infers the effective population size in genome-wide data from experimental populations sequenced in pools. Our approach improves temporal methods by explicitly correcting for two stages of sampling introduced by pooling and sequencing. Our results on simulated data confirm that methods that fail to properly account for the two stages of sampling inherent to Pool-seq can lead to severely biased $N_{e}$ estimates.

Pool-seq data are often considered to be overdispersed, i.e., displaying more variability than is predicted by the binomial sampling model (Yang et al. 2012). However, Zhu et al. (2012) and Futschik and Schlötterer (2010) validated that the error in allele frequency estimates is reasonably well approximated by binomial sampling given that a large enough number of individuals are pooled. Nevertheless, if overdispersion is present in the data, that will lead to additional variance, which is not modeled in our framework and will result in a downward bias of the estimated $N_{e} .$ If the level of overdispersion can be inferred for the data (see, e.g., Gautier et al. 2013; Illingworth 2015), it is possible to introduce a parameter that accounts for the additional between-pool variation (see File S1, Equation S8).
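The direction of this bias can be seen from a back-of-the-envelope calculation. The sketch below assumes the linear approximation $F \approx t / (2 N_{e})$ for the drift component and assumes that sampling and sequencing variance have been removed exactly, so that any unmodeled extra variance is attributed to drift. The function name and the magnitudes of the extra variance are hypothetical.

```python
def ne_ignoring_overdispersion(ne_true, t, extra_variance):
    """Ne inferred when an unmodeled variance term is mistaken for drift.

    Uses the linear approximation F_drift ~ t / (2 Ne) and assumes that the
    sampling and sequencing variance have already been subtracted exactly.
    """
    f_drift = t / (2 * ne_true)
    return t / (2 * (f_drift + extra_variance))


for extra in (0.0, 0.005, 0.01):
    print(extra, round(ne_ignoring_overdispersion(100, 10, extra), 1))
```

With a true $N_{e}$ of 100 and 10 generations, an unmodeled standardized variance of 0.01 already lowers the estimate to about 83; the same absolute excess has less impact when more generations separate the samples because the drift term is then larger.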

We also illustrate the applicability of our method for estimating $N_{e}$ from experimental data of D. melanogaster and show that in combination with a recursive partitioning method we can infer patterns of local variation in $N_{e}$ along the genome. Additionally, it is possible to calculate confidence intervals based on the $χ^{2}$ distribution (Waples 1989) or alternatively apply a nonparametric bootstrap approach.
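A nonparametric bootstrap interval can be sketched in a few lines. The example below is not the interface of the Nest package: it assumes that a bias-corrected drift F value is available for each SNP, uses the simple relation $N_{e} = t / (2 \bar{F})$, and resamples SNPs with replacement as if they were independent. The per-SNP values are simulated placeholders, and all names are hypothetical.

```python
import random


def ne_from_f(f_values, t):
    """Ne from the mean of per-SNP drift F values (assumed bias-corrected)."""
    return t / (2 * (sum(f_values) / len(f_values)))


def bootstrap_ci(f_values, t, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Ne, resampling SNPs with replacement."""
    rng = random.Random(seed)
    n = len(f_values)
    boots = sorted(
        ne_from_f([f_values[rng.randrange(n)] for _ in range(n)], t)
        for _ in range(n_boot)
    )
    lower = boots[round((alpha / 2) * n_boot)]
    upper = boots[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Placeholder per-SNP F values for t = 10 generations (not real data).
data_rng = random.Random(42)
f_values = [data_rng.uniform(0.0, 0.1) for _ in range(500)]

print(ne_from_f(f_values, 10))
print(bootstrap_ci(f_values, 10))
```

Because linked SNPs are not independent, resampling whole windows or blocks of neighboring SNPs instead of single SNPs would be the more cautious choice for genome-wide data.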

Software availability

Our proposed estimators along with standard methods from the literature are implemented within the R package Nest. The package is currently available at https://github.com/ThomasTaus/Nest.

Acknowledgments

We thank Mads Fristrup Schou, Susanne U. Franssen, and Neda Barghi for helpful comments on the Nest software package and Robin S. Waples and an anonymous reviewer for their constructive comments that greatly improved the manuscript. A.J. and T.T. are members of the Vienna Graduate School of Population Genetics, which is funded by the Austrian Science Fund (FWF, W1225). C.S. is also supported by the European Research Council grant "ArchAdapt," and T.T. is a recipient of a Doctoral Fellowship (DOC) of the Austrian Academy of Sciences.

Footnotes

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.191197/-/DC1.

Communicating editor: M. A. Beaumont

Literature Cited

Anderson E. C., Williamson E. G., Thompson E. A., 2000. Monte Carlo evaluation of the likelihood for N(e) from temporally spaced samples. Genetics 156(4): 2109–2118.
Baalsrud H. T., Saether B.-E., Hagen I. J., Myhre A. M., Ringsby T. H., et al., 2014. Effects of population characteristics and structure on estimates of effective population size in a house sparrow metapopulation. Mol. Ecol. 23(11): 2653–2668.
Barker J. S. F., 2011. Effective population size of natural populations of Drosophila buzzatii, with a comparative evaluation of nine methods of estimation. Mol. Ecol. 20(21): 4452–4471.
Barrick J. E., Yu D. S., Yoon S. H., Jeong H., Oh T. K., et al., 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461(7268): 1243–1247.
Barton N. H., 2000. Genetic hitchhiking. Philos. Trans. R. Soc. Lond. B Biol. Sci. 355(1403): 1553–1562.
Bastide H., Betancourt A., Nolte V., Tobler R., Stöbe P., et al., 2013. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9(6): e1003534.
Baysal B. E., Lawrence E. C., Ferrell R. E., 2007. Sequence variation in human succinate dehydrogenase genes: evidence for long-term balancing selection on SDHA. BMC Biol. 5: 12.
Begun D. J., Aquadro C. F., 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356(6369): 519–520.
Berry A. J., Ajioka J. W., Kreitman M., 1991. Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129: 1111–1117.
Boitard S., Kofler R., Françoise P., Robelin D., Schlötterer C., et al., 2013. Pool-hmm: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples. Mol. Ecol. Resour. 13(2): 337–340.
Burke M. K., Dunham J. P., Shahrestani P., Thornton K. R., Rose M. R., et al., 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467(7315): 587–590.
Burke M. K., Liti G., Long A. D., 2014. Standing genetic variation drives repeatable experimental evolution in outcrossing populations of Saccharomyces cerevisiae. Mol. Biol. Evol. 31(12): 3228–3239.
Campos J. L., Charlesworth B., Haddrill P. R., 2012. Molecular evolution in nonrecombining regions of the Drosophila melanogaster genome. Genome Biol. Evol. 4(3): 278–288.
Chan A. H., Jenkins P. A., Song Y. S., 2012. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet. 8(12): e1003090.
Charlesworth B., 1996. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68(2): 131–149.
Charlesworth B., 2009. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10(3): 195–205.
Charlesworth B., 2012a. The effects of deleterious mutations on evolution at linked sites. Genetics 190(1): 5–22.
Charlesworth B., 2012b. The role of background selection in shaping patterns of molecular evolution and variation: evidence from variability on the Drosophila X chromosome. Genetics 191: 233–246.
Comeron J. M., Williford A., Kliman R. M., 2008. The Hill-Robertson effect: evolutionary consequences of weak selection and linkage in finite populations. Heredity 100(1): 19–31.
Excoffier L., Foll M., 2011. Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27(9): 1332–1334.
Falconer D. S., Mackay T. F. C., 1996. Introduction to Quantitative Genetics. Benjamin-Cummings, Menlo Park, CA.
Feder A. F., Petrov D. A., Bergland A. O., 2012. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS One 7(11): e48588.
Ferretti L., Ramos-Onsins S. E., Pérez-Enciso M., 2013. Population genomics from pool sequencing. Mol. Ecol. 22(22): 5561–5576.
Fisher R., 1930. The Genetical Theory of Natural Selection. Oxford University Press, Oxford.
Fiston-Lavier A.-S., Singh N. D., Lipatov M., Petrov D. A., 2010. Drosophila melanogaster recombination rate calculator. Gene 463(1–2): 18–20.
Foll M., Poh Y.-P., Renzette N., Ferrer-Admetlla A., Bank C., et al., 2014. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10(2): e1004185.
Foll M., Shim H., Jensen J. D., 2015. WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol. Ecol. Resour. 15(1): 87–98.
Franssen S. U., Nolte V., Tobler R., Schlötterer C., 2015. Patterns of linkage disequilibrium and long range hitchhiking in evolving experimental Drosophila melanogaster populations. Mol. Biol. Evol. 32(2): 495–509.
Frick K., Munk A., Sieling H., 2014. Multiscale change point inference. J. R. Stat. Soc. Ser. B Stat. Methodol. 76(3): 495–580.
Futschik A., Schlötterer C., 2010. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186(1): 207–218.
Futschik A., Hotz T., Munk A., Sieling H., 2014. Multiscale DNA partitioning: statistical evidence for segments. Bioinformatics 30(16): 2255–2262.
Gautier M., Foucaud J., Gharbi K., Cézard T., Galan M., et al., 2013. Estimation of population allele frequencies from next-generation sequencing data: pool-vs. individual-based genotyping. Mol. Ecol. 22(14): 3766–3779.
Gilbert K. J., Whitlock M. C., 2015. Evaluating methods for estimating local effective population size with and without migration. Evolution 69(8): 2154–2166.
Haddrill P. R., Halligan D. L., Tomaras D., Charlesworth B., 2007. Reduced efficacy of selection in regions of the Drosophila genome that lack crossing over. Genome Biol. 8(2): R18.
Hill W. G., 1981. Estimation of effective population size from data on linkage disequilibrium. Genet. Res. 38: 209–216.
Holleley C. E., Nichols R. A., Whitehead M. R., Adamack A. T., Gunn M. R., et al., 2014. Testing single-sample estimators of effective population size in genetically structured populations. Conserv. Genet. 15: 23–35.
Huang Y., Wright S. I., Agrawal A. F., 2014. Genome-wide patterns of genetic variation within and among alternative selective regimes. PLoS Genet. 10(8): e1004527.
Hui T.-Y. J., Burt A., 2015. Estimating effective population size from temporally spaced samples with a novel, efficient maximum-likelihood algorithm. Genetics 200: 285–293.
Illingworth C. J. R., 2015. Fitness inference from short-read data: within-host evolution of a reassortant H5N1 influenza virus. Mol. Biol. Evol. 32(11): 3012–3026.
Jorde P. E., Ryman N., 2007. Unbiased estimator for genetic drift and effective population size. Genetics 177: 927–935.
Kapun M., van Schalkwyk H., McAllister B., Flatt T., Schlötterer C., 2014. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster. Mol. Ecol. 23(7): 1813–1827.
Kawecki T. J., Lenski R. E., Ebert D., Hollis B., Olivieri I., et al., 2012. Experimental evolution. Trends Ecol. Evol. 27(10): 547–560.
Kimura M., 1964. Diffusion model in population genetics. J. Appl. Probab. 1: 177–223.
Kimura M., Crow J. F., 1963. The measurement of effective population number. Evolution 17(3): 279–288.
Kofler R., Schlötterer C., 2014. A guide for the design of evolve and resequencing studies. Mol. Biol. Evol. 31(2): 474–483.
Kofler R., Orozco-terWengel P., De Maio N., Pandey R. V., Nolte V., et al., 2011a. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6(1): e15925.
Kofler R., Pandey R. V., Schlötterer C., 2011b. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27(24): 3435–3436.
Kolaczkowski B., Kern A. D., Holloway A. K., Begun D. J., 2011. Genomic differentiation between temperate and tropical Australian populations of Drosophila melanogaster. Genetics 187: 245–260.
Krimbas C. B., Tsakas S., 1971. The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control – Selection or drift? Evolution 25: 454–460.
Kulathinal R. J., Bennett S. M., Fitzpatrick C. L., Noor M. A. F., 2008. Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc. Natl. Acad. Sci. USA 105(29): 10051–10056.
Liu Y., Mittler J. E., 2008. Selection dramatically reduces effective population size in HIV-1 infection. BMC Evol. Biol. 8: 133.
Long A., Liti G., Luptak A., Tenaillon O., 2015. Elucidating the molecular architecture of adaptation via evolve and resequence experiments. Nat. Rev. Genet. 16(10): 567–582.
Luikart G., Cornuet J. M., Allendorf F. W., 1999. Temporal changes in allele frequencies provide estimates of population bottleneck size. Conserv. Biol. 13(3): 523–530. [Google Scholar]
Maynard Smith J., Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35. [PubMed] [Google Scholar]
Nei M., Tajima F., 1981. Genetic drift and estimation of effective population size. Genetics 98: 625–640. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nomura T., 2008. Estimation of effective number of breeders from molecular coancestry of single cohort sample. Evol. Appl. 1(3): 462–474. [DOI] [PMC free article] [PubMed] [Google Scholar]
Orozco-terWengel P., Kapun M., Nolte V., Kofler R., Flatt T., et al. , 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21(20): 4931–4941. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pamilo P., Varvio-Aho S. L., 1980. On the estimation of population size from allele frequency changes. Genetics 95: 1055–1057.
Peel D., Waples R. S., Macbeth G. M., Do C., Ovenden J. R., 2013. Accounting for missing data in the estimation of contemporary genetic effective population size ($N_{e}$). Mol. Ecol. Resour. 13(2): 243–253.
Pollak E., 1983. A new method for estimating the effective population size from allele frequency changes. Genetics 104: 531–548.
Presgraves D. C., 2005. Recombination enhances protein adaptation in Drosophila melanogaster. Curr. Biol. 15(18): 1651–1656.
Przeworski M., Wall J. D., Andolfatto P., 2001. Recombination and the frequency spectrum in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18(3): 291–298.
Pudovkin A. I., Zaykin D. V., Hedgecock D., 1996. On the potential for estimating the effective number of breeders from heterozygote-excess in progeny. Genetics 144: 383–387.
Reed L. K., Lee K., Zhang Z., Rashid L., Poe A., et al., 2014. Systems genomics of metabolic phenotypes in wild-type Drosophila melanogaster. Genetics 197: 781–793.
Schlötterer C., Tobler R., Kofler R., Nolte V., 2014. Sequencing pools of individuals - mining genome-wide polymorphism data without big funding. Nat. Rev. Genet. 15(11): 749–763.
Schlötterer C., Kofler R., Versace E., Tobler R., Franssen S. U., 2015. Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation. Heredity 114(5): 431–440.
Serbezov D., Jorde P. E., Bernatchez L., Olsen E. M., Vøllestad L. A., 2012. Short-term genetic changes: evaluating effective population size estimates in a comprehensively described brown trout (Salmo trutta) population. Genetics 191: 579–592.
Tallmon D. A., Koyuk A., Luikart G., Beaumont M. A., 2008. Computer Programs: onesamp: a program to estimate effective population size using approximate Bayesian computation. Mol. Ecol. Resour. 8(2): 299–301.
Tobler R., Franssen S. U., Kofler R., Orozco-terWengel P., Nolte V., et al., 2014. Massive habitat-specific genomic response in D. melanogaster populations during experimental evolution in hot and cold environments. Mol. Biol. Evol. 31(2): 364–375.
Turner T. L., Miller P. M., 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191: 633–642.
Turner T. F., Salter L. A., Gold J. R., 2001. Temporal-method estimates of $N_{e}$ from highly polymorphic loci. Conserv. Genet. 2: 297–308.
Turner T. L., Stewart A. D., Fields A. T., Rice W. R., Tarone A. M., 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7(3): e1001336.
Vicoso B., Charlesworth B., 2006. Evolution on the X chromosome: unusual patterns and processes. Nat. Rev. Genet. 7: 645–653.
Vicoso B., Charlesworth B., 2009. Effective population size and the faster-X effect: an extended model. Evolution 63(9): 2413–2426.
Wang J., 2001. A pseudo-likelihood method for estimating effective population size from temporally spaced samples. Genet. Res. 78(3): 243–257.
Wang J., 2009. A new method for estimating effective population sizes from a single sample of multilocus genotypes. Mol. Ecol. 18(10): 2148–2164.
Wang J., 2013. A simulation module in the computer program COLONY for sibship and parentage analysis. Mol. Ecol. Resour. 13(4): 734–739.
Waples R. S., 1989. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121: 379–391.
Waples R. S., Do C., 2008. LDNe: a program for estimating effective population size from data on linkage disequilibrium. Mol. Ecol. Resour. 8(4): 753–756.
Waples R. S., Do C., 2010. Linkage disequilibrium estimates of contemporary $N_{e}$ using highly variable genetic markers: a largely untapped resource for applied conservation and evolution. Evol. Appl. 3(3): 244–262.
Waples R. S., England P. R., 2011. Estimating contemporary effective population size on the basis of linkage disequilibrium in the face of migration. Genetics 189: 633–644.
Waples R. S., Yokota M., 2007. Temporal estimates of effective population size in species with overlapping generations. Genetics 175: 219–233.
Williamson E. G., Slatkin M., 1999. Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761.
Wright S., 1931. Evolution in Mendelian populations. Genetics 16: 97–159.
Wright S., 1938. Size of population and breeding structure in relation to evolution. Science 87: 430–431.
Yang X., Todd J. A., Clayton D., Wallace C., 2012. Extra-binomial variation approach for analysis of pooled DNA sequencing data. Bioinformatics 28(22): 2898–2904.
Zhu Y., Bergland A. O., González J., Petrov D. A., 2012. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS One 7(7): e41901.
