Abstract
New mutations provide the raw material for evolution and adaptation. The distribution of fitness effects (DFE) describes the spectrum of effects of new mutations that can occur along a genome, and is, therefore, of vital interest in evolutionary biology. Recent work has uncovered striking similarities in the DFE between closely related species, prompting us to ask whether there is variation in the DFE among populations of the same species, or among species with different degrees of divergence, that is, whether there is variation in the DFE at different levels of evolution. Using exome capture data from six tree species sampled across Europe, we characterized the DFE for multiple species, and for each species, multiple populations, and investigated the factors potentially influencing the DFE, such as demography, population divergence, and genetic background. We find statistical support for the presence of variation in the DFE at the species level, even among relatively closely related species. However, we find very little difference at the population level, suggesting that differences in the DFE are primarily driven by deep features of species biology, and that evolutionarily recent events, such as demographic changes and local adaptation, have little impact.
Keywords: DFE, deleterious mutations, population structure, forest trees
Introduction
The distribution of fitness effects (DFE) of new mutations, that is, the proportion of new mutations that are expected to be adaptive, neutral, slightly deleterious, or strongly deleterious, is at the heart of any evolutionary model, yet, in spite of recent progress (for a review, see Johri et al. 2022), it is still hard to estimate and is poorly understood. While there is variation in the DFE across distantly related species with dissimilar biological features (Huber et al. 2017), on shorter evolutionary timescales it is not clear how the DFE might come to differ among species or populations, although we can make some predictions from the Nearly Neutral Theory (Ohta 1973). In particular, the strength of selection acting on new mutations is expected to scale with effective population size, Ne, and, therefore, to be affected by demographic processes. We also expect that the fraction of mutations inferred to be nearly neutral, that is, slightly deleterious, will be related to proxies of Ne; specifically, the ratio of slightly deleterious to neutral diversity should be smaller in high-Ne populations (Welch et al. 2008).
Despite these predictions, empirical evidence has been mixed. Major evolutionary transitions do affect the DFE. For instance, a shift in mating systems from outcrossing to selfing leads to a lower Ne and a significant increase in the fraction of slightly deleterious mutations (e.g. Douglas et al. 2015), as predicted under the Nearly Neutral Theory. However, a number of studies have found that across closely related species, the DFE and related summary statistics, such as the ratio of nonsynonymous to synonymous nucleotide diversity, πN/πS, are remarkably stable (Grivet et al. 2017; Castellano et al. 2019; Liu et al. 2022), even when comparing domesticated species and their wild relatives (Chen et al. 2017). In the latter, domestication has a very strong effect on synonymous nucleotide diversity but the ratio of nonsynonymous to synonymous nucleotide variation, a good proxy of the slightly deleterious class of mutations for populations at demographic equilibrium (Ohta 1973), was barely affected. Additionally, while some studies have found associations between parameters associated with the DFE and demographic processes such as range expansion (González-Martínez et al. 2017; Willi et al. 2020), others have not (Takou et al. 2021).
These contrasting results may reflect real biological and demographic differences across species and populations. Species may also experience different environmental conditions across their ranges, which could result in changes in the parameters of the DFE. For example, Martin and Lenormand (2006) found evidence to support a scenario in which mutations have more variable fitness effects when an organism exists in an environment to which it is less well adapted. They interpreted this result in terms of a simple fitness landscape model. A recent study in Arabidopsis thaliana (Weng et al. 2021) also found that mutational variance was greater in populations growing in stressful environments in which their fitness was low. However, not all of the results in Weng et al. agree with the predictions of a simple fitness landscape model. For example, the authors found that beneficial mutations were more common in populations in less stressful environments. Additionally, a review of the impact of environment on the effects of new mutations found that environmental stress can both decrease and increase the mean strength of selection acting on new mutations, as well as its variance (Agrawal and Whitlock 2010). Population differentiation may also be important, with more differentiated populations appearing to have less similar strengths of selection acting on shared mutations than less differentiated populations (Huang et al. 2021). Whether this could lead to differences in the DFE between populations given enough evolutionary time has not yet been systematically investigated.
However, contrasting results across species and populations might also be due to differences in the metrics used to characterize patterns of deleterious and neutral diversity. It has been argued that while summary statistics such as the ratio of nonsynonymous to synonymous nucleotide diversity provide a good measure of the efficiency of selection, they are poor measures of the deleterious genetic load experienced by a population, owing to the effects of demography and nonequilibrium dynamics. For instance, after a demographic event, slightly deleterious nonsynonymous mutations will reach their equilibrium frequency spectra more rapidly than synonymous mutations, simply because the equilibrium frequencies of slightly deleterious mutations are lower (Simons et al. 2014; Simons and Sella 2016). Counts of nonsynonymous derived alleles are more robust to nonequilibrium dynamics and give a good measure of load if mutations are deleterious and their effects are additive. Therefore, metrics such as Rxy, which were specifically developed to estimate asymmetries in counts of derived mutations between populations, provide a better proxy of genetic load (Do et al. 2015). A combination of such metrics, in addition to those based on the site frequency spectrum, may allow for a greater understanding of how the effects of new mutations on molecular evolution differ among populations and species.
In the present study, we investigated variation in the DFE at both the species and population levels by leveraging exome capture data collected from range-wide populations of six forest tree species, comprising four angiosperms and two conifers, at different degrees of phylogenetic distance. These trees are keystones of European forests and span a range of life history traits. All species are widely distributed, but there are marked differences in levels of population differentiation within species (see supplementary table S1, Supplementary Material online for details). By using orthologous genomic regions, we were able to compare the DFE among species while controlling for gene content. Additionally, all species have been sampled broadly across their natural ranges, following the same sampling scheme, providing us with an ideal dataset to assess the constancy of the DFE at the within-species level. Finally, we also explored variation in patterns of genetic load between populations.
Methods
Samples
The data consist of six wind-pollinated forest tree species: two conifers (Picea abies and Pinus pinaster) and four angiosperms (Betula pendula, Fagus sylvatica, Populus nigra, and Quercus petraea), distributed across Eurasia from the boreal to the Mediterranean region, with animal-, wind-, or water-dispersed seeds. The species vary in both life history and population structure (Milesi et al. 2023; see supplementary table S1, Supplementary Material online for details).
Sequencing and SNP Calling
Sequencing and single nucleotide polymorphism (SNP) calling were as described in Milesi et al. (2023). Briefly, the data are the result of targeted nuclear DNA sequencing (∼10,000 species-specific probes that covered ∼3 Mb of sequence) on a total of 3,407 adult trees collected from 19 to 26 locations per species (∼25 samples each) across their distribution range. The targeted regions primarily consisted of orthologous regions among species, in addition to regions that had previously been identified as targets of selection. Site-based annotation (4-fold degenerate and 0-fold degenerate sites) of detected SNPs was generated using the Python script NewAnnotateRef.py available at https://github.com/fabbyrob/science/blob/master/pileup_analyzers/NewAnnotateRef.py (Williamson et al. 2014). Detected SNPs were functionally annotated to predict their effects on protein sequences using ANNOVAR (Wang et al. 2010). SNPs were classified as “noncoding”, “coding 4-fold degenerate synonymous”, “coding 0-fold degenerate nonsynonymous”, or “nonsense” (causing a premature STOP codon or a STOP loss). Filtering steps were applied to remove incorrectly assigned or clearly hybrid samples. Full documentation of the bioinformatics pipelines used to generate these VCF files is available at https://github.com/GenTree-h2020-eu/GenTree. The VCF files used in the present study correspond to version 5.3.2, available at https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/DV2X0M. To be included in our analyses, both polymorphic and monomorphic sites had to have a call depth >8 or genotype quality >20. Loci with >50% missing calls were also removed. SNPs and monomorphic sites were further restricted to those that are either 4-fold or 0-fold degenerate. An additional subset of our SNP dataset was created, which included only those SNPs that occur in orthologous genomic regions found in all six tree species.
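To make the 0-fold/4-fold distinction concrete, the sketch below classifies a position within a codon by counting how many of the three alternative bases leave the encoded amino acid unchanged. It is not NewAnnotateRef.py or ANNOVAR; it assumes the standard nuclear genetic code and, as a simplification, treats stop-to-stop changes as synonymous. The function names are hypothetical.

```python
from itertools import product

# Standard nuclear genetic code. Codons are enumerated with the first base
# varying slowest over "TCAG" (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous alternatives (0-3) -> conventional fold-degeneracy label.
FOLD_FROM_SYNONYMOUS = {0: 0, 1: 2, 2: 3, 3: 4}


def fold_degeneracy(codon: str, position: int) -> int:
    """Return 0, 2, 3 or 4: the degeneracy of `position` (0-based) in `codon`.

    Stop-to-stop changes count as synonymous here (a simplifying assumption).
    """
    codon = codon.upper()
    original = CODON_TABLE[codon]
    synonymous = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            synonymous += 1
    return FOLD_FROM_SYNONYMOUS[synonymous]


def site_class(codon: str, position: int):
    """Keep only the two classes used in the analysis; 2- and 3-fold sites -> None."""
    return {0: "0-fold", 4: "4-fold"}.get(fold_degeneracy(codon, position))


print(site_class("GCT", 2), site_class("GCT", 0), site_class("ATT", 2))
# prints: 4-fold 0-fold None
```

Every change at the third position of an alanine codon (GCN) is synonymous, whereas every change at its first position alters the amino acid, which is why these two positions fall cleanly into the 4-fold and 0-fold classes.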
SNP Polarization
To increase the power of our DFE estimation methods, we inferred the ancestral state at each SNP. This was achieved by considering the state of the site in either a single outgroup species (two species in our dataset) or two outgroup species (four species; see supplementary table S2, Supplementary Material online for details). For each species, we, therefore, mapped the genome of one or more outgroup species to the same reference genome used for SNP calling for that species using the bwa software package (Li and Durbin 2009); for further details and commands used, see supplementary table S2, Supplementary Material online. We also retained SNP sites that could not be matched to a site in an outgroup species, for example, because the site was missing from the outgroup genome. We used the maximum likelihood method implemented in Est-SFS (Keightley and Jackson 2018) to assign the ancestral allele at polymorphic sites, assuming the Kimura 2-parameter substitution model. To conduct this step, we first down-sampled to a maximum of 100 haplotypes per species by sampling randomly from a hypergeometric distribution, both to account for missing data and to not exceed the maximum permissible number of haplotypes for Est-SFS. We then used the probability associated with the state of each SNP to assign likely ancestral states, removing SNPs for which the probability of the major allele being the ancestral state was between 0.4 and 0.6, as these could not be polarized with confidence. SNPs for which no outgroup information was available could thus still be assigned an ancestral state based on their minor allele frequency; however, these represent a small fraction of SNPs, and all downstream analyses account for errors in ancestral state identification.
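The two per-site steps described above, hypergeometric down-sampling and confidence-based polarization, can be sketched as follows. This is a minimal illustration, not the pipeline used here: it assumes allele counts and the probability that the major allele is ancestral are already available as numbers (no Est-SFS file parsing is shown), and treating the 0.4 to 0.6 exclusion window as inclusive is an assumption.

```python
import random
from typing import Optional, Tuple

MAX_HAPLOTYPES = 100  # maximum number of haplotypes passed to Est-SFS (per the text)


def downsample_counts(count_a: int, count_b: int, n: int = MAX_HAPLOTYPES,
                      rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Randomly keep n of the called haplotypes at a site, without replacement.

    Drawing without replacement makes the retained count of each allele a
    hypergeometric random variable. Sites with <= n called haplotypes are
    returned unchanged.
    """
    rng = rng or random.Random()
    if count_a + count_b <= n:
        return count_a, count_b
    pool = [0] * count_a + [1] * count_b  # 0 = allele A, 1 = allele B
    kept_b = sum(rng.sample(pool, n))
    return n - kept_b, kept_b


def assign_ancestral(major: str, minor: str, p_major_ancestral: float,
                     lower: float = 0.4, upper: float = 0.6) -> Optional[str]:
    """Return the inferred ancestral allele, or None if the site is too uncertain.

    `p_major_ancestral` is the probability that the major allele is ancestral,
    as produced by a tool such as Est-SFS.
    """
    if lower <= p_major_ancestral <= upper:
        return None  # cannot be polarized with confidence: drop the SNP
    return major if p_major_ancestral > upper else minor


rng = random.Random(1)
print(downsample_counts(130, 40, rng=rng))  # random draw summing to 100
print(assign_ancestral("A", "G", 0.93), assign_ancestral("A", "G", 0.52))
```

Sampling without replacement from the called haplotypes is the simplest way to obtain a hypergeometric draw using only the standard library; the second line prints `A None`.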
We used a model averaging procedure to assess the effect of accounting for error in ancestral state identification on DFE inference (see DFE Inference); additionally, we assessed how restricting our dataset to GC-conservative mutations, which are less likely to be affected by polarization error due to the exclusion of CpG hypermutable sites, affects our results.
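GC-conservative mutations are changes that stay within the weak (A/T) or strong (C/G) class. CpG C→T transitions are strong-to-weak changes and are therefore excluded, and the classification does not depend on which allele is ancestral. A minimal sketch of the filter, with hypothetical function names:

```python
WEAK = frozenset("AT")    # W: A/T
STRONG = frozenset("CG")  # S: C/G


def mutation_class(ancestral: str, derived: str) -> str:
    """Classify a polarized single-base change as GC-conservative, W->S or S->W."""
    a, d = ancestral.upper(), derived.upper()
    if a not in WEAK | STRONG or d not in WEAK | STRONG or a == d:
        raise ValueError(f"not a single-base change: {ancestral}->{derived}")
    if (a in WEAK) == (d in WEAK):
        return "GC-conservative"  # A<->T or C<->G
    return "W->S" if a in WEAK else "S->W"


def keep_gc_conservative(snps):
    """Filter (ancestral, derived) pairs down to GC-conservative changes."""
    return [snp for snp in snps if mutation_class(*snp) == "GC-conservative"]


print(keep_gc_conservative([("C", "T"), ("A", "T"), ("G", "C"), ("A", "G")]))
# prints [('A', 'T'), ('G', 'C')]
```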
Grouping Samples
For downstream analyses, we were interested in investigating variation in the DFE across a species range. The power of DFE inference depends on the number of sequenced individuals and the number of available SNPs; we, therefore, pooled individuals into groups based on sampling location (see supplementary fig. S1, Supplementary Material online for the map of sampling locations). We first pooled all individuals per country; subsequently, if multiple distinct admixture groups, as identified in Milesi et al. (2023), were present in a country pool, this pool was further subdivided by admixture group. Pools containing fewer than 20 individuals were excluded from our analysis to maintain sufficient power for accurate inference. We refer to these pools as “populations”; full details can be found in supplementary table S3, Supplementary Material online. We also calculated the mean latitude and longitude of the sampling locations in each population.
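The pooling rule can be written as a short function. This is a sketch only: the record layout `(individual_id, country, admixture_group)`, the pool-naming scheme (country code plus admixture-group label) and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_POOL_SIZE = 20  # pools smaller than this were excluded (per the text)


def pool_individuals(samples: Iterable[Tuple[str, str, str]],
                     min_size: int = MIN_POOL_SIZE) -> Dict[str, List[str]]:
    """Group (individual_id, country, admixture_group) records into populations.

    Individuals are pooled by country; a country pool containing more than one
    admixture group is split by group. Pools with fewer than `min_size`
    individuals are dropped.
    """
    by_country: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for individual, country, group in samples:
        by_country[country].append((individual, group))

    pools: Dict[str, List[str]] = {}
    for country, members in by_country.items():
        groups = {group for _, group in members}
        candidates: Dict[str, List[str]] = defaultdict(list)
        for individual, group in members:
            name = country if len(groups) == 1 else f"{country}-{group}"
            candidates[name].append(individual)
        for name, individuals in candidates.items():
            if len(individuals) >= min_size:
                pools[name] = individuals
    return pools


# Hypothetical records: FR splits into two admixture groups, GB is too small.
records = (
    [(f"fr{i}", "FR", "North") for i in range(25)]
    + [(f"fs{i}", "FR", "South") for i in range(22)]
    + [(f"gb{i}", "GB", "K1") for i in range(12)]
    + [(f"de{i}", "DE", "K2") for i in range(30)]
)
print(sorted(pool_individuals(records)))  # prints ['DE', 'FR-North', 'FR-South']
```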
Summary Statistics
We inferred a number of standard population genetic summary statistics, including Wright's fixation index, FST (calculated over 4-fold sites), and pairwise nucleotide diversity at 0- and 4-fold degenerate sites, π0 and π4, respectively, after first projecting our data down to an SFS of 40 haplotypes, that is, 20 individuals per species or population, to account for sites with missing data. Projection takes the average across every possible resampling of the data, as implemented in Python using functions in the δaδi package (Gutenkunst et al. 2010). These pairwise nucleotide diversity estimates were then used to calculate the ratio π0/π4. For each population, pairwise FST was calculated with Python scripts, as implemented in δaδi (Gutenkunst et al. 2010). For each species, we identified the median longitude and latitude of sampling locations, and chose as a reference the pooled population sampled closest to this location, which represents a “central” population of the species range. These “central” populations were DE for B. pendula, CH for F. sylvatica, LT for P. abies, FR-North for P. pinaster, IT-North for P. nigra, and CH for Q. petraea (for location details, see supplementary table S3, Supplementary Material online).
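Projection to a fixed number of haplotypes replaces each site by its hypergeometric distribution of sample counts, that is, the average over every way of choosing m of its called haplotypes. The sketch below illustrates this together with π and π0/π4. It is not δaδi code. It assumes that sites with fewer than m called haplotypes are dropped and that per-site counts (monomorphic sites included) are available; the toy data are hypothetical.

```python
from math import comb
from typing import Iterable, List, Tuple


def project_site(derived: int, n_called: int, m: int) -> List[float]:
    """Hypergeometric probabilities of observing j = 0..m derived copies when
    m of the n_called haplotypes at a site are sampled without replacement."""
    if n_called < m:
        raise ValueError("site has fewer called haplotypes than the projection size")
    total = comb(n_called, m)
    return [comb(derived, j) * comb(n_called - derived, m - j) / total
            for j in range(m + 1)]


def projected_sfs(sites: Iterable[Tuple[int, int]], m: int = 40) -> List[float]:
    """Sum projected per-site spectra; `sites` holds (derived_count, n_called)
    for every site, monomorphic ones included. Sites with < m called
    haplotypes are skipped (an assumption of this sketch)."""
    sfs = [0.0] * (m + 1)
    for derived, n_called in sites:
        if n_called < m:
            continue
        for j, p in enumerate(project_site(derived, n_called, m)):
            sfs[j] += p
    return sfs


def pairwise_diversity(sfs: List[float]) -> float:
    """Per-site pi: probability that two haplotypes drawn without replacement
    differ, averaged over all sites in the spectrum."""
    m = len(sfs) - 1
    het = sum(2 * j * (m - j) / (m * (m - 1)) * count for j, count in enumerate(sfs))
    return het / sum(sfs)


# Hypothetical toy data: (derived_count, n_called) per site, projected to m = 4.
zero_fold = [(0, 6), (1, 6), (0, 5), (0, 6)]
four_fold = [(2, 6), (3, 5), (0, 6), (1, 4)]
pi0 = pairwise_diversity(projected_sfs(zero_fold, m=4))
pi4 = pairwise_diversity(projected_sfs(four_fold, m=4))
print(round(pi0 / pi4, 3))
```

A useful property of projection is that the expected π of a site is unchanged by it: a site with 2 derived copies among 4 haplotypes has π = 2/3 whether computed directly or after projection to 2 haplotypes.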
Finally, we also inferred Rxy, an estimator of the differences in genetic load between populations, as defined in Do et al. (2015). Briefly, Rxy measures the relative accumulation of derived mutations in two populations at all sites for which the ancestral state is known. One counts the number of derived mutations in genome x that are not present in genome y and vice versa, and Rxy is defined as the ratio of these two counts. If selection has been equally effective and mutation rates have been the same since the populations diverged, Rxy is expected to equal 1. This statistic was shown to be monotonically related to the difference in mutation load between the two populations. We followed Do et al. (2015) in calculating confidence intervals on this estimate using a weighted block jackknife procedure, whereby SNP data were divided into 100 consecutive blocks and Rxy was recalculated with one block removed per run. Each VCF was first split into chunks of 2 Mb, based on SNP position in the assembled genome, and these chunks were then combined into 100 groups of similar length. Grouping was done such that consecutive parts of the genome were kept together, although small scaffolds meant that different scaffolds were occasionally combined into a single group. As before, we used our estimated “central” population per species as the reference when presenting results, but results are very similar when other populations are used as the reference. We also calculated R′xy for 0-fold degenerate nonsynonymous sites, a measure normalized by putatively neutral 4-fold degenerate synonymous sites, obtained by dividing Rxy for 0-fold sites by Rxy for 4-fold sites. The custom scripts used to calculate all summary statistics are available at: https://github.com/j-e-james/TreeDFEScripts.
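A minimal sketch of Rxy, R′xy and a weighted delete-one block jackknife is given below. It assumes the frequency-based form L(x, y) = Σ f_x(1 − f_y) summed over polarized sites, with Rxy = L(x, y)/L(y, x), and the weighted jackknife of Busing et al. (1999) with block weights equal to the number of sites per block. Both are assumptions about the details, not the authors' scripts, and the toy blocks are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Tuple[Sequence[float], Sequence[float]]  # (f_x, f_y) derived frequencies


def l_stat(fx: Sequence[float], fy: Sequence[float]) -> float:
    """Sum over sites of f_x * (1 - f_y): the chance that an allele sampled from
    population x is derived while one sampled from y is ancestral."""
    return sum(a * (1.0 - b) for a, b in zip(fx, fy))


def rxy(fx: Sequence[float], fy: Sequence[float]) -> float:
    return l_stat(fx, fy) / l_stat(fy, fx)


def r_prime_xy(fx0, fy0, fx4, fy4) -> float:
    """0-fold Rxy normalized by 4-fold Rxy (R'xy in the text)."""
    return rxy(fx0, fy0) / rxy(fx4, fy4)


def weighted_block_jackknife(blocks: List[Block]) -> Tuple[float, float]:
    """Delete-one weighted block jackknife for Rxy; needs at least 2 blocks.

    Returns (bias-corrected estimate, standard error).
    """
    num = [l_stat(fx, fy) for fx, fy in blocks]
    den = [l_stat(fy, fx) for fx, fy in blocks]
    sizes = [len(fx) for fx, _ in blocks]
    n, g = sum(sizes), len(blocks)
    full = sum(num) / sum(den)
    # Rxy is a ratio of sums, so each leave-one-out value drops one block's terms.
    loo = [(sum(num) - a) / (sum(den) - b) for a, b in zip(num, den)]
    estimate = g * full - sum((1 - m / n) * t for m, t in zip(sizes, loo))
    variance = 0.0
    for m, t in zip(sizes, loo):
        h = n / m
        pseudo = h * full - (h - 1) * t  # pseudovalue for this block
        variance += (pseudo - estimate) ** 2 / (h - 1)
    return estimate, (variance / g) ** 0.5


# Hypothetical derived-allele frequencies in three blocks.
blocks = [([0.1, 0.3, 0.0], [0.2, 0.1, 0.05]), ([0.4, 0.0], [0.3, 0.1]),
          ([0.2, 0.2, 0.6, 0.1], [0.1, 0.3, 0.5, 0.0])]
est, se = weighted_block_jackknife(blocks)
print(f"Rxy = {est:.3f} (95% CI {est - 1.96 * se:.3f} to {est + 1.96 * se:.3f})")
```

Because Rxy is a ratio of genome-wide sums, the leave-one-out values are cheap to obtain by subtracting each block's contribution rather than recomputing from scratch.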
DFE Inference
The DFE was primarily inferred using polyDFE (Tataru et al. 2017; Tataru and Bataillon 2019). PolyDFE implements a likelihood-based approach that infers the DFE while accounting for other distorters of the SFS, such as demography and errors in SNP polarization, through nuisance parameters (Eyre-Walker et al. 2006) that are inferred for each category of the SFS. This method requires the specification of a class of neutral sites and a class of non-neutral sites, for which we used 4- and 0-fold degenerate sites, respectively, thereby avoiding site-counting issues that arise with 2- and 3-fold degenerate sites. Because no change at a 4-fold degenerate site alters the encoded amino acid, these sites are the best proxy for neutrally evolving sites in coding DNA, although there may be selection on synonymous codon usage in these species (Duret 2002). We then inferred the neutral and non-neutral site frequency spectra across species and populations, after first projecting our data down to the same number of individuals to account for any sites with missing data. Our analyses were run on data projected down to 40 haplotypes, that is, 20 individuals. All scripts required for the processing of data, and the polyDFE input files used in this analysis, are available at https://github.com/j-e-james/TreeDFEScripts. PolyDFE allows for the fitting of both deleterious-only DFEs and mixed DFEs, which account for the possible effects of beneficial mutations on DFE inference. From the DFE fitted for beneficial mutations, polyDFE can also estimate α, henceforth referred to as αDFE, the proportion of nonsynonymous substitutions that are adaptive. In polyDFE, this is calculated from the full DFE for beneficial mutations; however, this may inflate estimates of αDFE due to the inclusion of beneficial mutations with very small selective effects. We, therefore, followed Galtier et al. (2016) by incorporating a lower bound of 5 for the population selection coefficients of positive mutations to be used in the calculation of αDFE, as implemented in polyDFE v. 2.0 (Tataru et al. 2017). This lower bound is arbitrary, and changing it will have an impact on the estimated value of αDFE. Finally, we note that we only use polymorphism data when running polyDFE, to avoid having to assume that the DFE is invariant between the ingroup and outgroup species. PolyDFE is able to estimate the deleterious DFE accurately without divergence information, and the inclusion of divergence data provides little or no improvement to estimates of the beneficial DFE (Tataru et al. 2017; Booker 2020).
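To illustrate why the lower bound matters, the sketch below computes α from a discretized DFE. It assumes that the substitution rate of a mutation with population-scaled effect S, relative to a neutral mutation, is S/(1 − e^(−S)), and that mutations with 0 < S < s_min still contribute substitutions but are not counted as adaptive. Whether this matches polyDFE's internal calculation has not been checked, and the toy DFE is hypothetical.

```python
import math
from typing import Sequence, Tuple


def relative_fixation(s: float) -> float:
    """Substitution rate of a new mutation with scaled effect S relative to a
    neutral one, S / (1 - exp(-S)) (Kimura's diffusion result)."""
    if s == 0:
        return 1.0
    if s < -700:  # exp(-S) would overflow; the rate is negligible anyway
        return 0.0
    return s / -math.expm1(-s)


def alpha_dfe(dfe: Sequence[Tuple[float, float]], s_min: float = 5.0) -> float:
    """Fraction of substitutions contributed by beneficial mutations with S >= s_min.

    `dfe` is a discretized DFE: (S, probability) pairs summing to 1.
    """
    rates = [(s, p * relative_fixation(s)) for s, p in dfe]
    total = sum(r for _, r in rates)
    adaptive = sum(r for s, r in rates if s > 0 and s >= s_min)
    return adaptive / total


# Hypothetical discretized DFE, for illustration only.
toy_dfe = [(-1000.0, 0.55), (-10.0, 0.2), (-0.5, 0.15), (0.0, 0.05),
           (2.0, 0.03), (20.0, 0.02)]
for bound in (0.0, 5.0, 10.0):
    print(bound, round(alpha_dfe(toy_dfe, s_min=bound), 3))
```

Raising the bound can only remove mutation classes from the numerator, so α estimated this way never increases with s_min, which is consistent with the statement that the choice of bound affects αDFE.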
To ensure that, as far as possible, our polyDFE runs explored the full range of parameter space when estimating the DFE, we ran polyDFE a minimum of five times per species, using different starting parameters for each run (see supplementary table S4, Supplementary Material online for details). Runs in which parameters were close to the edges of their permitted ranges were removed; we then assessed whether our runs reliably returned similar estimated DFE parameters and had a small gradient of the likelihood. As DFE estimation entails considerable uncertainty, we then ran polyDFE a further four times per species, initializing runs using both analytically estimated parameters and those parameters previously found to return the smallest gradient of the likelihood, fitting a different model per run: in model 1 we fit a full (deleterious and advantageous mutations) DFE, including an estimation of the rate of misidentification of the ancestral allele, ɛanc; in model 2 we fit a full DFE, without including the estimation of ɛanc; in model 3 we fit a deleterious mutation-only DFE; in model 4 we again fit a deleterious mutation-only DFE, but without including an estimation of ɛanc. All models include an estimation of nuisance parameters, which account for the effects of demography. We then performed model averaging over the four models, as described in Muyle et al. (2021), such that models are weighted by their AICs to account for uncertainty in parameter estimation. We calculated AIC weights and generated bootstrap datasets using the R functions provided in polyDFE (Tataru et al. 2017), available at https://github.com/paula-tataru/polyDFE. All other scripts used to conduct these analyses are available at https://github.com/j-e-james/TreeDFEScripts.
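The AIC weighting step can be sketched as follows. This is a Python illustration of standard Akaike weights, not the R functions distributed with polyDFE. It assumes every model provides an estimate of the parameter being averaged (here Sd), and all numbers are hypothetical.

```python
import math
from typing import List


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics: List[float]) -> List[float]:
    """Relative support exp(-delta/2), delta = AIC - min(AIC), normalized to 1."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates: List[float], weights: List[float]) -> float:
    """AIC-weighted average of one parameter estimated under every model."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical values for models 1-4: log-likelihoods, free-parameter counts,
# and each model's estimate of the deleterious DFE mean Sd.
log_likelihoods = [-1520.3, -1522.9, -1526.0, -1527.1]
n_params = [44, 43, 42, 41]
sd_estimates = [-900.0, -850.0, -2400.0, -2600.0]

weights = akaike_weights([aic(l, k) for l, k in zip(log_likelihoods, n_params)])
print([round(w, 3) for w in weights], round(model_average(sd_estimates, weights), 1))
```

A model whose AIC is 2 units worse than the best receives e^(−1) ≈ 0.37 times the weight of the best model, so poorly supported models contribute little but are not discarded outright.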
PolyDFE v2.0 (Tataru and Bataillon 2019) enables the simultaneous fitting of DFEs to multiple datasets, allowing for model comparisons to assess whether models in which DFE parameters differ between datasets provide a significantly better fit than models in which the DFE parameters are shared between datasets. In situations in which we were interested in comparing models (e.g. comparing populations), we inferred a full DFE, including nuisance parameters and ɛanc, allowing these parameters to vary between datasets, as recommended by Tataru and Bataillon (2019), to account for differences in ancestral identification error and demographic processes between comparisons.
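Comparisons between shared and dataset-specific DFE parameters are nested-model likelihood ratio tests: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the number of parameters that are constrained in the simpler model. A self-contained sketch, with a closed-form chi-square tail for integer degrees of freedom and hypothetical log-likelihoods:

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    using the closed forms for even and odd degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Q = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: Q = erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= half / (k - 0.5)
        total += math.exp(-half) * term
    return total


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, df: int):
    """LRT for nested models; df = number of extra free parameters."""
    statistic = max(2.0 * (lnl_complex - lnl_simple), 0.0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: b and Sd shared between two datasets versus
# fitted independently (two extra free parameters).
stat, p = likelihood_ratio_test(lnl_simple=-3051.2, lnl_complex=-3048.1, df=2)
print(round(stat, 2), round(p, 4))
```

Sharing both b and Sd between two datasets removes two free parameters (df = 2), whereas sharing only b removes one (df = 1).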
Statistical Analyses
We considered correlations between our inferred DFE parameters, life history traits, and population genetics summary statistics. All statistical analyses and plotting were conducted in R, using scripts available at https://github.com/j-e-james/TreeDFEScripts.
Results
Summary Statistics
Our dataset comprises polarized SNPs from approximately 3 Mb of targeted genome sequencing for six European tree species which were sampled broadly across their range, with approximately 25 individuals sequenced per location. For all populations across species (see “population” definition above), we calculated population genetic summary statistics to investigate the efficiency of selection across species and among populations within species.
Species vary broadly in π0/π4, an inverse proxy for the efficiency of purifying selection (Fig. 1A), with selection appearing to be comparatively inefficient in P. pinaster and P. abies relative to broad-leaved species such as B. pendula. This may reflect differences among species in effective population size. However, π0/π4 does not clearly relate to levels of panmixia, despite the species exhibiting different degrees of genetic differentiation across their ranges (Fig. 1B). B. pendula exhibits very little population differentiation and has the most efficient selection of the six species. For F. sylvatica, Q. petraea, B. pendula, and P. abies, despite their broad geographic ranges, the efficiency of purifying selection was similar among populations within a species. By contrast, P. nigra and P. pinaster have the highest levels of population genetic structure, with strongly differentiated and isolated Moroccan populations, and there is a relationship between latitude and π0/π4 for both species (P. nigra R2 = 0.91, P = 0.0019 and P. pinaster R2 = 0.55, P = 0.04), with π0/π4 being lowest in populations at lower latitudes. However, these species have intermediate values of π0/π4 when compared with the other species. P. abies, F. sylvatica, and Q. petraea show moderate levels of structure, with population FST increasing with latitude, and while F. sylvatica and Q. petraea have intermediate values of π0/π4, P. abies has the least efficient selection of any of the species studied.
It has been argued that metrics measuring the ratio of nonsynonymous to synonymous (or 0- to 4-fold degenerate) diversity are poor measures of genetic load (Do et al. 2015). We therefore also estimated the statistic Rxy, which compares the frequency of derived alleles between a focal (X) and reference (Y) population. The neutral expectation is that the number of derived alleles is the same in the focal population as in the reference, while values of Rxy above 1 indicate that the focal population has an excess of derived alleles.
Comparing focal populations to a single reference population (for which we used a population that was approximately central among the sampling locations of each species; Fig. 1C), the most striking results are for P. nigra populations MA and GB, which show a deficit of derived alleles relative to the reference population. We also note a slight tendency for low-latitude populations of P. abies to show a relative deficit of derived alleles, which agrees with our 0- to 4-fold diversity results. However, in the vast majority of populations, we find no deviation from the neutral expectation that the number of derived alleles at 0-fold degenerate sites is the same in the focal as in the reference population. If we use 4-fold degenerate synonymous sites to normalize Rxy (R′xy, Fig. 1D), which has been suggested to account for the effects of population structure (Do et al. 2015; Grossen et al. 2020), we find that no population has a significant deficit relative to the reference population. Therefore, although population structure has resulted in a deficit or excess of derived mutations in some populations, there is little evidence that populations differ in their genetic load.
Species DFE
We inferred the full DFE for all species, incorporating a gamma-distributed deleterious DFE and an exponentially distributed beneficial DFE, using only polymorphism data (Tataru et al. 2017). The gamma distribution is a flexible and commonly used distribution, and is parameterized here by two values: the shape parameter, b, which is inversely related to the coefficient of variation of the strengths of selection acting on new mutations, and Sd, the mean scaled strength of selection (Nes) acting on new mutations, which sets the scale of the distribution. We also inferred the purely deleterious DFE for all six species to assess whether incorporating beneficial mutations improves our DFE model inference (Fig. 2). We fitted the full DFE and the deleterious-only DFE models both with and without an estimate of the error rate in the inference of the ancestral state of alleles, ɛanc, and conducted a model averaging procedure (see Methods; see supplementary fig. S2, Supplementary Material online for the fit of each model, and supplementary table S5, Supplementary Material online for all model-averaged inferred parameters for the deleterious and beneficial DFEs), such that the estimated DFE parameters presented here incorporate the degree of model uncertainty (Fig. 3; Table 1). In three of the species in our dataset, incorporating the rate of ancestral misidentification did not significantly improve the fit of the DFE model, whereas in F. sylvatica, Q. petraea, and P. abies we see a small model improvement (P-values of likelihood ratio tests comparing models are 0.018, 0.016, and 0.0039, respectively). These species have high Sd values and a high proportion of adaptive substitutions, which is the selective regime in which we expect the rate of ancestral error to be inflated in polyDFE analyses (Tataru et al. 2017). Generally, the species in our dataset have similar values of b but vary considerably in Sd.
However, Sd values should be interpreted with caution, because Sd is not related to the distribution of selection coefficients of segregating mutations in a straightforward way (see supplementary fig. S3, Supplementary Material online for details).
Table 1.
Species | Sd | b | Fraction of mutations −1 < Nes < 0 | π0/π4 | Model |
---|---|---|---|---|---|
Fagus sylvatica | −25,000 | 0.36 | 0.20 | 0.23 | − ɛanc |
Quercus petraea | −9500 | 0.41 | 0.23 | 0.27 | +− ɛanc |
Betula pendula | −190 | 1.59 | 0.17 | 0.20 | +− |
Populus nigra | −571 | 0.39 | 0.24 | 0.26 | +− |
Picea abies | −47,000 | 0.097 | 0.30 | 0.35 | − ɛanc |
Pinus pinaster | −64 | 0.73 | 0.22 | 0.28 | +− |
Sd: the mean scaled strength of deleterious selection acting on new mutations (rounded to two significant figures), which sets the scale of the gamma-shaped deleterious DFE; b: the shape parameter of the gamma-shaped deleterious DFE, which is inversely related to the coefficient of variation in the fitness effects of new deleterious mutations; fraction of mutations −1 < Nes < 0: the inferred nearly neutral fraction of slightly deleterious mutations; π0/π4: the ratio of 0- to 4-fold pairwise nucleotide diversity. DFE parameters shown are model-averaged, such that estimates are weighted by model AIC. The last column indicates the best model, as ascertained using likelihood ratio tests: whether a deleterious-only DFE (−) or a full DFE including beneficial mutations (+−) was fitted, and whether including the rate of error in the inference of the ancestral state improves the model fit (ɛanc).
In five of the six species, a model incorporating beneficial mutations into estimates of the DFE was the most highly weighted, although only in four species was this model a significantly better fit to the data. Ignoring the contribution of beneficial mutations to the DFE in these species leads to a reduction in the inferred value of b and to an increase in the inferred value of Sd (Figs. 2 and 3). We used polyDFE to estimate αDFE, the proportion of nonsynonymous substitutions that are adaptive, incorporating a lower bound on the scaled selection coefficient of beneficial mutations counted as adaptive (see Methods). We show the effects of different bounds on the estimate of αDFE in supplementary fig. S4, Supplementary Material online. We find that αDFE is fairly high in some of the tree species, particularly B. pendula and P. nigra, suggesting that adaptive substitutions are common in these forest tree species (Fig. 3).
P. abies and P. pinaster, the two conifer species included in this study, are an interesting pair to compare. P. abies has remarkably low values of αDFE, which is particularly surprising given the fairly high inferred αDFE in the other conifer in the dataset, P. pinaster. These differences may arise because P. pinaster has relatively differentiated populations, which could facilitate local adaptation owing to the limited influx of alleles from other populations, whereas P. abies has less population structure, that is, greater levels of admixture, and uniformly high levels of purifying selection across its range (Figs. 1 and 3). It is also interesting to note that P. pinaster has quite a distinct discretized DFE compared with the other species in the dataset, with a high inferred b, a low inferred absolute Sd, and a relatively small estimated fraction of mutations falling into the most strongly deleterious category (Nes < −100). P. pinaster has fewer SNPs than the other species in the dataset, so we have less confidence in these results; however, these differences could reflect the greater phylogenetic distance between P. pinaster and the other forest trees in the dataset. The most closely related species in this dataset are F. sylvatica and Q. petraea, which do have similar DFEs (Figs. 2 and 3). However, we find that for these two species, DFE models fitted independently per species have significantly better log-likelihoods than models in which either both b and Sd are shared between species (P = 0.05), or models in which only b is shared between species (P = 0.03), as might be the case if the two species shared a DFE but had different effective population sizes.
Drivers of Differences in the DFE at the Species Level
GC-biased Gene Conversion
It has been demonstrated that GC-biased gene conversion can bias inference of the DFE (Bolívar et al. 2018). In addition, CpG sites are highly mutable and prone to polarization error. We therefore repeated our analyses restricting our dataset to GC-conservative mutations (for details, see supplementary table S6 and fig. S5, Supplementary Material online). Fitting the DFE parameters independently for GC-conservative mutations does not provide a better model fit than allowing DFE parameters to be shared between GC-conservative mutations and the full SNP dataset, and the inferred DFEs are similar across datasets (see supplementary figs. S6 and S7, Supplementary Material online). It is therefore unlikely that differences in GC-biased gene conversion among species, due, for example, to differences in recombination rate, explain differences in the DFE among species.
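GC-conservative mutations are those that leave GC content unchanged: A↔T (weak to weak) and G↔C (strong to strong) changes, which GC-biased gene conversion cannot favor in either direction. A minimal sketch of such a filter, assuming SNPs are given as (ref, alt) single-base pairs; the function name and input format are hypothetical, and the filtering actually applied in this study is described in the supplementary material.

```python
WEAK = {"A", "T"}    # W bases
STRONG = {"G", "C"}  # S bases


def is_gc_conservative(ref, alt):
    """True for W<->W (A<->T) and S<->S (G<->C) changes, which leave GC
    content unchanged and so are not affected by GC-biased gene conversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or len(ref) != 1 or len(alt) != 1:
        raise ValueError(f"expected a single-base substitution, got {ref}>{alt}")
    return {ref, alt} <= WEAK or {ref, alt} <= STRONG


snps = [("A", "T"), ("G", "C"), ("A", "G"), ("C", "T")]
gc_conservative = [s for s in snps if is_gc_conservative(*s)]
print(gc_conservative)  # [('A', 'T'), ('G', 'C')]
```

Because the classification is symmetric in ref and alt, it does not depend on how SNPs are polarized, which also removes the CpG C→T transitions that are most prone to polarization error.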
Life History Traits and Ne
There are no significant correlations between any of the estimated parameters of the DFE and the two life history traits that we tested, maximum longevity and age at first reproduction (i.e. minimum age at flowering), which were previously shown to predict genetic diversity and the efficiency of selection in plants (Chen et al. 2017). However, the mean scaled strength of selection acting on deleterious variants, Sd, varies across species and increases with a proxy of Ne, nucleotide diversity at putatively neutral fourfold degenerate sites, π4 (Spearman's rho = −0.79, P = 0.048; Pearson's R = 0.72, P = 0.065). This pattern reflects the stronger effect of drift in smaller populations, as expected under the Nearly Neutral Theory.
As expected under the Nearly Neutral Theory, the fraction of mutations that we infer to be nearly neutral from the DFE is correlated with our estimate of π0/π4 (Fig. 2E). However, π0/π4 is always greater than the nearly neutral fraction of mutations estimated from the DFE. This is likely due to the contribution of segregating beneficial and slightly beneficial mutations to diversity in these species. Indeed, if we consider results from models in which we fit only the deleterious DFE (Fig. 2E), this systematic difference between π0/π4 and the nearly neutral fraction is reduced. B. pendula and P. nigra are particular outliers, highlighting the effect that beneficial variants have on patterns of molecular evolution in these species.
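π0/π4 compares per-site nucleotide diversity at zero-fold degenerate (nonsynonymous) sites with that at fourfold degenerate (synonymous) sites. A minimal sketch of the ratio, assuming per-SNP allele counts and sample sizes for each site class plus the total number of callable sites in that class; the input format, function name, and counts are hypothetical, and refinements such as handling missing data are not reproduced here.

```python
def nucleotide_diversity(snps, n_sites):
    """Per-site pairwise diversity (Tajima's pi).

    snps: iterable of (k, n) = (count of one allele, number of sampled
    chromosomes) at each polymorphic site of the class. Monomorphic sites
    contribute zero but must be included in n_sites.
    """
    total = 0.0
    for k, n in snps:
        # Probability that two distinct sampled chromosomes carry different alleles.
        total += 2.0 * k * (n - k) / (n * (n - 1))
    return total / n_sites


# Hypothetical counts for 20 sampled chromosomes.
zero_fold = [(1, 20), (2, 20)]            # 2 SNPs among 1,000 zero-fold sites
four_fold = [(5, 20), (10, 20), (3, 20)]  # 3 SNPs among 400 fourfold sites
pi0 = nucleotide_diversity(zero_fold, 1000)
pi4 = nucleotide_diversity(four_fold, 400)
print(round(pi0 / pi4, 4))  # 0.0973
```

Because π4 is computed from putatively neutral sites, the ratio measures how strongly selection depresses diversity at nonsynonymous sites relative to neutral expectations.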
Gene Content
The differences in the DFE that we observe between species are unlikely to be due to differences in gene content, or in the genes sequenced, between species. Indeed, the parameters of the DFE were very similar when calculated across all genes in the dataset and when calculated only for the common orthologs sequenced in all six species (for the relative proportions of all-species orthologs, see supplementary table S7, Supplementary Material online). Only in P. pinaster do likelihood ratio tests suggest that an independent DFE for orthologs found in all species is a better fit to the data than a shared DFE for all genes. We found that a slightly higher fraction of new mutations is inferred to be strongly deleterious in orthologs (Fig. 3), which we might expect because such genes are likely to be older, involved in many important biological functions, and under strong purifying selection. This suggests that in P. pinaster, genes in our dataset that are not part of the all-species ortholog set might experience different selective effects, for example, weaker purifying selection. We also note a lower fraction of adaptive substitutions in all-species orthologs.
Differences Among Populations Within Species
The DFE is similar across populations of the same species, with species explaining a considerable proportion of the variation in the parameters of the deleterious and beneficial DFE as calculated across populations (Fig. 5; for deleterious-only DFE inferences, see supplementary fig. S8, Supplementary Material online). For the majority of populations, we could not reject the null model that the DFE of the population is the same as the DFE inferred for the species as a whole. This was true even under a very conservative scenario in which we fit models assuming that both b and Sd are shared across the populations and the species on average. In other words, the mean scaled strength of selection and the variance in the selective effects of new mutations are consistent across populations, despite any differences in demographic history and local adaptation to environmental conditions between populations.
There are some exceptions to these general trends. Although results for P. pinaster populations indicate a considerably greater spread of inferred b among populations (Fig. 5A), no differences between populations are statistically supported. However, model comparison results indicate that two B. pendula populations (ES and IT) might have different DFEs from the species on average. We also find that one Q. petraea population (LT), four F. sylvatica populations (AT, GB, NO, and SI), and three P. nigra populations (GB, ITS, and MA) have significantly different DFEs from the DFE calculated over all populations of each species (Fig. 4; see also supplementary fig. S9, Supplementary Material online). Our results suggest that both Sd and b differ between these populations and the dataset as a whole. Of these outlier populations, only two, F. sylvatica NO and P. nigra GB, remain significant after a strict Bonferroni correction for multiple testing.
In these analyses, we compare a population-specific DFE to a species-level DFE inferred over all populations, which might reduce differences between populations and the species-level “pooled” DFE. We therefore repeated our analyses, comparing each focal population to a central reference population. We again found little difference in the DFE between populations within a species. The only populations with DFEs significantly different from that of the central population were F. sylvatica NO and P. nigra MA and GB, so our results are consistent across both approaches.
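A strict Bonferroni correction, as applied to the population-versus-species tests, compares each P value with α/m, where m is the number of tests performed. A minimal sketch; the population labels and P values below are hypothetical placeholders, not results from this study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the labels whose P value is below alpha / m,
    where m is the number of tests performed."""
    m = len(p_values)
    threshold = alpha / m
    return [label for label, p in p_values.items() if p < threshold]


# Hypothetical P values for five population-versus-species comparisons.
tests = {"pop1": 0.03, "pop2": 0.004, "pop3": 0.2, "pop4": 0.011, "pop5": 0.6}
print(bonferroni_significant(tests))  # ['pop2']
```

With five tests the per-test threshold is 0.01, so a comparison at P = 0.03 that would pass an uncorrected 0.05 threshold is no longer significant.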
Drivers of Differences in the DFE at the Population Level
At the species level, there is no clear evidence of a relationship between population differentiation and variation in the effectiveness of selection, or in the shape of the DFE. P. pinaster, P. abies, and P. nigra have mean FST values approximately an order of magnitude higher than the other species; however, among-population variation in the parameters of the DFE and in the estimated effectiveness of selection is similar in these species and in the other species in the dataset, which have lower levels of population differentiation (see Fig. 4).
To investigate more systematically the possibility that differentiation leads to differences in the DFE, we correlated population pairwise FST with population differences in the DFE parameters Sd and b, and with population differences in π0/π4. Although in some species greater population differentiation appears to correlate with larger differences in parameters of the DFE, this relationship is not consistent (see supplementary fig. S10, Supplementary Material online).
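One way to set up such a comparison is to pair each population pair's FST with the absolute difference in a DFE parameter (for example Sd) between the two populations, and to compute a rank correlation across pairs. Because pairs that share a population are not independent, significance is often assessed by permuting population labels (a Mantel-type test). The sketch below assumes that approach; it illustrates the idea rather than reproducing the exact procedure used here, and the function names and inputs are hypothetical.

```python
import itertools
import random


def ranks(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def mantel_spearman(fst, param, n_perm=999, seed=1):
    """fst: dict mapping frozenset({pop_a, pop_b}) -> pairwise FST.
    param: dict mapping population -> DFE parameter estimate (e.g. Sd).
    Returns (rho, one-sided permutation P value for rho >= observed)."""
    pops = sorted(param)
    pairs = list(itertools.combinations(pops, 2))
    x = [fst[frozenset(p)] for p in pairs]

    def rho_for(labels):
        # labels maps each population to the population whose estimate it receives
        y = [abs(param[labels[a]] - param[labels[b]]) for a, b in pairs]
        return spearman(x, y)

    observed = rho_for({p: p for p in pops})
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = pops[:]
        rng.shuffle(shuffled)
        if rho_for(dict(zip(pops, shuffled))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: FST grows with distance along a line of four populations.
pops_sd = {"A": -1000.0, "B": -1200.0, "C": -1500.0, "D": -2000.0}
positions = {"A": 0, "B": 1, "C": 2, "D": 3}
fst_pairs = {frozenset((a, b)): 0.01 * abs(positions[a] - positions[b])
             for a, b in itertools.combinations(positions, 2)}
rho, p = mantel_spearman(fst_pairs, pops_sd)
print(round(rho, 3), p)
```

With few populations the permutation distribution is coarse (four populations allow only 24 distinct labelings), so a weak or inconsistent relationship is hard to detect within a single species.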
Discussion
Here, we have shown that both the efficiency of selection and the DFE differ among species, but that there is relatively little variation among populations within species. Our results suggest striking differences between tree species, with conifers generally having a smaller fraction of highly deleterious mutations. This variation is not driven by differences in gene content between species. Owing to the design of our exome capture, the dataset contains a high proportion of orthologous genes, with only 140 of a total of 1,042 genes sequenced in a single species. Even when we restrict our analysis to orthologous genes sequenced across every species, which constituted an average of 32% of genes sequenced per species (ranging from 26% in P. nigra to 38% in F. sylvatica), the differences between species in both the mean scaled strength of selection (Sd) and the shape parameter (b), which governs the coefficient of variation in the strength of selection, remain.
An important caveat of the present study is that, in order to estimate the DFE, we have assumed that the DFE can be reasonably well approximated by a continuous gamma distribution. This allowed us to conduct straightforward comparative analyses across species and populations. However, although the DFE is commonly modeled as a gamma distribution (Bataillon and Bailey 2014; Martin and Lenormand 2006), other distributions can be theoretically justified (Loewe and Charlesworth 2006), and some studies have found better support for alternative distributions such as the lognormal or multimodal distributions (Kousathanas and Keightley 2013; Loewe and Charlesworth 2006; Sawyer et al. 2003). Alternative distributions may more accurately model high concentrations of strongly deleterious or lethal mutations, a feature that has been observed in some mutation accumulation experiments (Eyre-Walker and Keightley 2007). Because they are rare, such mutations have little chance of being observed in most datasets, so the shape of the most deleterious class of mutations is extrapolated from the fitted DFE. However, previous analyses have found that the inferred parameters of the gamma DFE are not greatly affected by including or excluding an additional parameter that accounts for these most deleterious mutations (Eyre-Walker et al. 2006). At the other end of the selective scale, alternative models may also better cope with the difficulty of examining the DFE for mutations that are either neutral or have very small selection coefficients (Welch et al. 2008). Some studies have considered models that consist of a distribution of selected effects plus a point mass of neutral mutations, which have sometimes been found to fit data well (Loewe and Charlesworth 2006; Kim et al. 2017).
Our analysis was conducted on targeted resequencing data. This approach allowed us to sample a large number of individuals per population and per species, increasing our power to infer the parameters of the DFE, which are notoriously difficult to estimate. While it is possible that the genomic regions used in this analysis do not reflect processes across the genome, a dataset restricted to all-species orthologs has a similar DFE to the dataset as a whole, making it likely that our DFEs are representative of the whole coding genomes of the species included in this study. Interestingly, recent work (Simons et al. 2022) has argued that in a scenario in which most traits are highly polygenic and under stabilizing selection, the distribution of selection coefficients will be similar across the loci that underlie all such traits. The orthologous genes that make up the majority of the coding sequences in this analysis plausibly both experience stabilizing selection and underlie highly polygenic traits, and so may be well described by the model developed by Simons et al. (2022).
Variation in the efficiency of selection among populations is low in the species in our dataset. It is perhaps not surprising that many populations have DFEs highly similar to that inferred for the species overall, given the remarkably similar levels of π0/π4 across populations in most species (Fig. 1) and their often similar demographic histories. Although many of the species are economically important, their use by humans is unlikely to have affected the DFE, especially given that the domestication of crop plants has had little effect on their DFEs compared to their wild relatives (Chen et al. 2017). In comparison, forest tree domestication and breeding are in their infancy, and the increasing effects of human activity have not yet had sufficient time to have a large impact on the tree species in this study. Recent work on the demographic histories of these species found that populations were remarkably stable in recent times, with little detectable reduction in effective population size even during periods of glaciation (Milesi et al. 2023). In contrast, the two populations for which we have the strongest evidence for differences in the DFE are somewhat unusual in terms of their demographic histories. The P. nigra GB population experienced a sharp population decrease in the past and subsequently recovered. F. sylvatica NO also experienced a fairly extreme decrease in Ne, from which it has since recovered. Both populations differ from the species as a whole in having a comparatively high fraction of strongly deleterious mutations.
It has been suggested that differences in genetic load among populations might drive differences in the DFE, and that populations at the edge of a species’ range will have a temporarily increased mutation load relative to central populations, due to the increased importance of drift in these populations (Peischl et al. 2013; Willi et al. 2018). While for most populations we find no evidence of differences in mutation load, two P. nigra populations, GB and MA, show a reduction in the proportion of derived alleles relative to other P. nigra populations. P. nigra is also one of the two species that show a relationship between the efficiency of selection and latitude. For the GB population, greater purging of deleterious derived alleles is in line with our finding that a high fraction of new mutations in this population are strongly deleterious, and that the mean strength of selection acting on new deleterious mutations is greater.
However, for the Moroccan (MA) P. nigra population, a comparatively low fraction of new mutations is inferred to be strongly deleterious (see supplementary fig. S9, Supplementary Material online). This population is differentiated, and its allele frequencies are only weakly correlated with those of other P. nigra populations. There are also a number of fixed differences between MA and other P. nigra populations. The DFE might differ due to these fixed differences; for example, new mutations may be less strongly deleterious when they occur on a genetic background in which many deleterious mutations are already present. However, the differences we observe are not due to inbreeding; we see no evidence for a shift in mating system in the MA population. There are no clonal individuals in the MA population, nor any increase in relatedness between individuals (as estimated with the KING algorithm, implemented in PLINK; Manichaikul et al. 2010; Chang et al. 2015).
The fact that we generally do not find evidence for variation in the DFE at the population level does not mean that there is no local adaptation in response to different environmental conditions across populations. Tree species generally show high levels of local adaptation, for example, in phenological traits (Savolainen et al. 2007), and the species in this study were generally inferred to have a high proportion of beneficial substitutions, with the exception of P. abies (αDFE, Fig. 3C). Infrequent, strong selective sweeps are expected to leave little signature on the SFS (Booker 2020), and thus to have a relatively small effect on statistics calculated from it, including the DFE. It is therefore possible that the tree populations do experience local adaptation through selective sweeps, the effects of which we would not detect with the summary statistics considered here. However, the DFE is informative about the strength of purifying selection and the variance of mutational effects, which do not differ among populations in the tree species in this study.
It has been hypothesized that higher population differentiation might lead to greater differences in the parameters of the DFE between populations. We find some relationship between population differentiation and differences in the DFE, particularly in the strength of deleterious selection (see supplementary fig. S10, Supplementary Material online), but it is not consistent. It is interesting to consider this finding in light of the scattering and collecting phases of the coalescent (Wakeley 1999). During the collecting phase, the more ancient part of a species’ history, the rate of coalescence is independent of the current geographic distribution of individuals. In contrast, demographic history and geography determine coalescence during the more recent part of a species’ history, the scattering phase. From this study, it seems that the DFE is more strongly affected by ancient events, that is, the collecting phase of the coalescent, and by the long-term Ne, leading to similar strengths of purifying selection across most populations of the same species. Whether this finding is general remains to be seen. The tree species in this study have moderate to high dispersal rates, whereas stronger patterns of isolation by distance lead to a stronger signal during the scattering phase (Wilkins 2004), which may give the scattering phase a greater impact on the DFE.
Why do differences in the DFE exist at the species level? Neither of the life history traits that we examined, maximum longevity and average age at first flowering, showed a relationship with any parameter of the DFE. We focused on these two traits because they have previously been shown to predict genetic diversity in plants (Chen et al. 2017), although other life history traits might affect the DFE. Previous work suggests that there might be a relationship between the DFE and major life history changes, such as a transition between outcrossing and selfing. For example, in the herb Arabis alpina, selfing was associated with a reduction in the fraction of mutations inferred to experience strong negative selection and a general reduction in the efficiency of purifying selection, while populations with mixed mating systems had DFEs very similar to those of outcrossing populations, with no signal of increased genetic load (Laenen et al. 2018). Phylogenetic relatedness also clearly plays a part: previous studies of closely related species have found that they share the same DFE (Chen et al. 2017; Castellano et al. 2019; Liu et al. 2022). The most closely related species in our dataset, F. sylvatica and Q. petraea, also appear to have more similar DFEs (see, for example, Fig. 3), although model comparison tests indicate that fitting DFEs independently to these species provides a better fit to the data, albeit only slightly (log-likelihoods of independent and shared models: −523.6194 and −526.7272, P = 0.045). Some slowly evolving aspect of genome biology, for example, gene interaction networks, modes of gene expression regulation, or genome organization or size, may eventually lead to differences in DFEs between species.
The possibility that genome organization could affect the DFE was previously investigated by Hämälä and Tiffin (2020), who showed that a number of genome features could influence selective constraint, including expression level, expression variability, and gene network connectivity, while Castellano et al. (2020) found that gene density was negatively correlated with nonsynonymous diversity, possibly due to greater constraint acting on gene-dense regions. This is of particular relevance to the species included in this study, because conifer genomes are considerably larger than the genomes of other tree species (De La Torre et al. 2014).
In summary, genome and species biology are important determinants of the DFE, and their long-term effects dominate short-term processes. Our findings indicate that despite differences among populations in the environmental challenges they face, the mean strength of selection experienced by new mutations and the variation in their selective effects remain similar across populations. The DFEs of the tree species in this study are stable, reflecting deep processes. A large change, such as a shift in breeding system (for example, from outcrossing to inbreeding) or in genome structure, may be required before the DFE differs between populations or species.
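The F. sylvatica and Q. petraea model comparison reported above can be checked with a standard likelihood ratio test. The sketch assumes that the shared model constrains two parameters (b and Sd), so that twice the log-likelihood difference is compared with a χ² distribution with 2 degrees of freedom; under that assumption it recovers the reported P = 0.045. Function names are illustrative, and the χ² survival function is written in closed form for integer degrees of freedom to keep the example dependency-free.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """LRT statistic D = 2 * (lnL_alt - lnL_null) and its chi-squared P value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


# Shared (null) versus independent (alternative) DFE, values from the text.
stat, p = likelihood_ratio_test(-526.7272, -523.6194, df=2)
print(round(stat, 4), round(p, 3))  # 6.2156 0.045
```

A comparison in which only b is shared would constrain a single parameter and use `df=1` instead.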
GenTree Consortium: Paraskevi Alizoti1, Ricardo Alía2, Olivier Ambrosio3, Filippos A Aravanopoulos1, Georg von Arx4, Albet Audrey5, Francisco Auñón2, Camilla Avanzi6, Evangelia Avramidou1, Francesca Bagnoli7, Marko Bajc8, Eduardo Ballesteros2, Evangelos Barbas1, José M García del Barrio2, Cristina C Bastias9, Catherine Bastien10, Giorgia Beffa11, Raquel Benavides12, Vanina Benoit13, Frédéric Bernier5, Henri Bignalet5, Guillaume Bodineau14, Damien Bouic5, Sabine Brodbeck11, William Brunetto15, Jurata Buchovska16, Corinne Buret13, Melanie Buy17, Ana M Cabanillas-Saldaña18, Bárbara Carvalho12, Stephen Cavers19, Fernando Del Caño2, Sandra Cervantes20,21, Nicolas Cheval5, José M Climent2, Marianne Correard22, Eva Cremer23, Darius Danusevičius16, Benjamin Dauphin24, Jean-Luc Denou5, Bernard Dokhelar5, Alexis Ducousso25, Bruno Fady26, Patricia Faivre-Rampant27, Anna-Maria Farsakoglou1, Patrick Fonti4, Ioannis Ganopoulos28, Olivier Gilg22, Nicolas De Girardi29, René Graf11, Alan Gray30, Delphine Grivet31, Felix Gugerli24, Christoph Hartleitner32, Katrin Heer33, Enja Hollenbach34, Agathe Hurel25, Bernard Issenhuth5, Florence Jean15, Véronique Jorge35, Arnaud Jouineau36, Jan-Philipp Kappner34, Robert Kesälahti37, Florian Knutzen23, Sonja T Kujala38, Timo A Kumpula37, Katri Kärkkäinen38, Mariaceleste Labriola39, Celine Lalanne25, Johannes Lambertz34, Gregoire Le-Provost25, Vincent Lejeune14, Isabelle Lesur-Kupin40,41, Joseph Levillain42, Mirko Liesebach43, David López-Quiroga12, Ermioni Malliarou1, Jérémy Marchon11, Nicolas Mariotte36, Antonio Mas12, Silvia Matesanz44, Benjamin Meier11, Helge Meischner34, Célia Michotey17, Sandro Morganti11, Tor Myking45, Daniel Nievergelt4, Anne Eskild Nilsen45, Eduardo Notivol46, Dario I Ojeda47, Sanna Olsson31, Lars Opgenoorth24,48, Geir Ostreng45, Birte Pakull43, Annika Perry30, Sara Pinosio7,49, Andrea Piotti6, Christophe Plomion40, Nicolas Poinot5, Mehdi Pringarbe22, Luc Puzos5, Annie Raffin5, José A Ramírez-Valiente2, Christian Rellstab24, Dourthe Remi5, Oliver Reutimann11, Sebastian Richter34, Juan J Robledo-Arnuncio2, Odile Rogier35, Elisabet Martínez Sancho4, Outi Savolainen37, Simone Scalabrin50, Volker Schneck51, Silvio Schueler52, Ivan Scotti26, Sergio San Segundo2, Vladimir Semerikov53, Lenka Slámová4, Ilaria Spanu54, Jørn Henrik Sønstebø45, Jean Thevenet22, Mari Mette Tollefsrud45, Norbert Turion22, Fernando Valladares12, Giovanni G. Vendramin7, Marc Villar55, Marjana Westergren56, Johan Westin57
1Aristotle University of Thessaloniki, School of Forestry and Natural Environment, Laboratory of Forest Genetics and Tree Improvement, 541
2Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria—Centro de Investigación Forestal (INIA-CIFOR), Ctra. de la Coruña km 7.5, 28040, Madrid, Spain
3INRAE, URFM F-84914, Avignon, France
4Forest Dynamics, Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
5INRAE, UEFP, F-33610, Cestas, France
6Institute of Biosciences and Bioresources, National Research Council of Italy
7Institute of Biosciences and Bioresources, National Research Council of Italy (IBBR-CNR), 50019 Sesto Fiorentino, Italy
8Slovenian Forestry Institute, Vecna pot 2, 1000 Ljubljana, Slovenia
9Centre d’Ecologie Fonctionnelle et Evolutive (CEFE), CNRS, UMR 51
10INRAE, Dept ECODIV, F-45075, Orléans, France
11Biodiversity & Conservation Biology, Swiss Federal Research Institute WSL, 8
12LINCGlobal, Department of Biogeography and Global Change, Museo Nacional de Ciencias Naturales, CSIC, Serrano
13INRAE, ONF, BioForA, F-45075, Orléans, France
14INRAE, GBFOR, F-45075, Orléans, France
15INRAE, URFM, F-849
16Vytautas Magnus University, Studentu Street 11, 53361, Akademija, Lithuania
17INRAE, URGI, F-78026, Versailles, France
18Departamento de Agricultura, Ganadería y Medio Ambiente, Gobierno de Aragón, P. Mª Agustín 36, 50071, Zaragoza, Spain
19UK Centre for Ecology & Hydrology (UKCEH), EH26 0QB Bush Estate, United Kingdom
20Department of Ecology and Genetics, University of Oulu, 90014 Oulu, Finland
21Biocenter Oulu, University of Oulu, 90014 Oulu, Finland
22INRAE, UEFM, F-84914, Avignon, France
23Bavarian Institute for Forest Genetics, Forstamtsplatz 1, 83317, Teisendorf, Germany
24Biodiversity and Conservation Biology, Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
25INRAE, Université de Bordeaux, BIOGECO, F-33770, Cestas, France
26National Research Institute for Agriculture, Food and the Environment (INRAE), 84914 Avignon, France
27University of Paris-Saclay, INRAE, Study of Plant Genome Polymorphism, 91000 Evry-Courcouronnes, France
28Institute of Plant Breeding and Genetic Resources, Hellenic Agricultural Organization DEMETER (ex NAGREF), 57001, Thermi, Greece
29Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
30UK Centre for Ecology and Hydrology, Bush Estate Penicuik, EH26 0QB, Edinburgh, UK
31Institute of Forest Sciences (ICIFOR-INIA), CSIC, 28040 Madrid, Spain
32LIECO GmbH & Co KG
33Forest Genetics, Albert-Ludwigs Universität Freiburg, Bertoldstraße 17, 79098 Freiburg, Germany
34Philipps University Marburg, Faculty of Biology, Plant Ecology and Geobotany, Karl-von-Frisch Strasse 8, 35043, Marburg, Germany
35INRAE, ONF, BioForA, 45075 Orléans, France
36INRAE, URFM, F-84914, Avignon, France
37University of Oulu, Pentti Kaiteran katu 1, 90014, University of Oulu, Finland
38Natural Resources Institute Finland, Paavo Havaksentie 3, 90014, University of Oulu, Finland
39Institute of Biosciences and BioResources, National Research Council (CNR), via Madonna del Piano 10, 50019, Sesto Fiorentino, Italy
40University of Bordeaux, INRAE, BIOGECO, 33610 Cestas, France
41Helix Venture, 33700 Mérignac, France
42Université de Lorraine, AgroParisTech, INRAE, SILVA, 54000, Nancy, France
43Thünen Institute of Forest Genetics, Sieker Landstr. 2, 22927, Grosshansdorf, Germany
44Área de Biodiversidad y Conservación, Universidad Rey Juan Carlos, Calle Tulipán s/n, 28933, Móstoles, Spain
45Division of Forestry and Forest Resources, Norwegian Institute of Bioeconomy Research (NIBIO), P.O. Box 115, 1431, Ås, Norway
46Centro de Investigación y Tecnología Agroalimentaria de Aragón -Dpto. de Sistemas Agrarios, Forestales y Medio Ambiente (CITA), Avda. Montañana 930, 50059, Zaragoza, Spain
47Norwegian Institute of Bioeconomy Research (NIBIO), 8027 Bodø, Norway
48Plant Ecology and Geobotany, Philipps-Universität Marburg, 35043 Marburg, Germany
49Institute of Applied Genomics (IGA), 33100 Udine, Italy
50IGA Technology Services S.r.l., 33100 Udine, Italy
51Thünen Institute of Forest Genetics, Eberswalder Chaussee 3a, 15377, Waldsieversdorf, Germany
52Austrian Research Centre for Forests (BFW), Seckendorff-Gudent-Weg 8, 1131, Wien, Austria
53Institute of Plant and Animal Ecology, Ural branch of RAS, 8 Marta St. 202, 620144, Ekaterinburg, Russia
54Institute of Biosciences and BioResources, National Research Council (CNR), via Madonna del Piano 10, 50019, Sesto Fiorentino, Italy
55INRAE, ONF, BioForA, F-45075 Orléans, France
56Slovenian Forestry Institute, 1000 Ljubljana, Slovenia
57Skogforsk, Tomterna 1, 91821, Sävar, Sweden
Acknowledgments
J.J. was supported by a grant from the Wenner-Gren Foundation. The present project was supported by the European Union's Horizon 2020 Research and Innovation Programme grant agreement no. 676876 (GenTree). S.C.G.-M. was supported by the French government in the framework of the IdEX Bordeaux University “Investments for the Future” programme/GPR Bordeaux Plant Sciences. The computations were enabled by resources in project SNIC 2022/22-910 provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council (VR) through grant agreement no. 2018-05973. We are grateful to Sylvain Glémin for his input on an earlier draft of this manuscript, and to two anonymous reviewers for their valuable insights into this work.
Conflict of interest statement. None declared.
Contributor Information
Jennifer James, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden; Swedish Collegium of Advanced Study, Uppsala University, Uppsala, Sweden.
Chedly Kastally, Department of Forest Sciences, University of Helsinki, Helsinki, Finland; Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland.
Katharina B Budde, Department of Forest Genetics and Forest Tree Breeding, Georg-August-University Goettingen, Goettingen, Germany; Center of Biodiversity and Sustainable Land Use (CBL), University of Goettingen, Goettingen, Germany.
Santiago C González-Martínez, National Research Institute for Agriculture, Food and the Environment (INRAE), University of Bordeaux, BIOGECO, Cestas, France.
Pascal Milesi, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden; Science for Life Laboratory (SciLifeLab), Uppsala University, Uppsala, Sweden.
Tanja Pyhäjärvi, Department of Forest Sciences, University of Helsinki, Helsinki, Finland; Viikki Plant Science Centre, University of Helsinki, Helsinki, Finland.
Martin Lascoux, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden.
GenTree Consortium:
Paraskevi Alizoti, Ricardo Alía, Olivier Ambrosio, Filippos A Aravanopoulos, Georg von Arx, Albet Audrey, Francisco Auñón, Camilla Avanzi, Evangelia Avramidou, Francesca Bagnoli, Marko Bajc, Eduardo Ballesteros, Evangelos Barbas, José M García del Barrio, Cristina C Bastias, Catherine Bastien, Giorgia Beffa, Raquel Benavides, Vanina Benoit, Frédéric Bernier, Henri Bignalet, Guillaume Bodineau, Damien Bouic, Sabine Brodbeck, William Brunetto, Jurata Buchovska, Corinne Buret, Melanie Buy, Ana M Cabanillas-Saldaña, Bárbara Carvalho, Stephen Cavers, Fernando Del Caño, Sandra Cervantes, Nicolas Cheval, José M Climent, Marianne Correard, Eva Cremer, Darius Danusevičius, Benjamin Dauphin, Jean-Luc Denou, Bernard Dokhelar, Alexis Ducousso, Bruno Fady, Patricia Faivre-Rampant, Anna-Maria Farsakoglou, Patrick Fonti, Ioannis Ganopoulos, Olivier Gilg, Nicolas De Girardi, René Graf, Alan Gray, Delphine Grivet, Felix Gugerli, Christoph Hartleitner, Katrin Heer, Enja Hollenbach, Agathe Hurel, Bernard Issenhuth, Florence Jean, Véronique Jorge, Arnaud Jouineau, Jan-Philipp Kappner, Robert Kesälahti, Florian Knutzen, Sonja T Kujala, Timo A Kumpula, Katri Kärkkäinen, Mariaceleste Labriola, Celine Lalanne, Johannes Lambertz, Gregoire Le-Provost, Vincent Lejeune, Isabelle Lesur-Kupin, Joseph Levillain, Mirko Liesebach, David López-Quiroga, Ermioni Malliarou, Jérémy Marchon, Nicolas Mariotte, Antonio Mas, Silvia Matesanz, Benjamin Meier, Helge Meischner, Célia Michotey, Sandro Morganti, Tor Myking, Daniel Nievergelt, Anne Eskild Nilsen, Eduardo Notivol, Dario I Ojeda, Sanna Olsson, Lars Opgenoorth, Geir Ostreng, Birte Pakull, Annika Perry, Sara Pinosio, Andrea Piotti, Christophe Plomion, Nicolas Poinot, Mehdi Pringarbe, Luc Puzos, Annie Raffin, José A Ramírez-Valiente, Christian Rellstab, Dourthe Remi, Oliver Reutimann, Sebastian Richter, Juan J Robledo-Arnuncio, Odile Rogier, Elisabet Martínez Sancho, Outi Savolainen, Simone Scalabrin, Volker Schneck, Silvio Schueler, Ivan Scotti, Sergio San Segundo, Vladimir Semerikov, Lenka Slámová, Ilaria Spanu, Jørn Henrik Sønstebø, Jean Thevenet, Mari Mette Tollefsrud, Norbert Turion, Fernando Valladares, Giovanni G Vendramin, Marc Villar, Marjana Westergren, and Johan Westin
Supplementary material
Supplementary material is available at Molecular Biology and Evolution online.
Data Availability
The genetic data underlying this article are available as VCF files at: https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/DV2X0M.
Full documentation of the bioinformatics pipelines used to generate the VCF files, including SNP filtering steps, is available at https://github.com/GenTree-h2020-eu/GenTree.
Code for all other analysis and bioinformatic steps is available at https://github.com/j-e-james/TreeDFEScripts.
References
- Agrawal AF, Whitlock MC. Environmental duress and epistasis: how does stress affect the strength of selection on new mutations? Trends Ecol Evol (Amst). 2010:25(8):450–458. 10.1016/j.tree.2010.05.003.
- Bataillon T, Bailey SF. Effects of new mutations on fitness: insights from models and data. Ann N Y Acad Sci. 2014:1320(1):76–92. 10.1111/nyas.12460.
- Bolívar P, Mugal CF, Rossi M, Nater A, Wang M, Dutoit L, Ellegren H. Biased inference of selection due to GC-biased gene conversion and the rate of protein evolution in flycatchers when accounting for it. Mol Biol Evol. 2018:35(10):2475–2486. 10.1093/molbev/msy149.
- Booker TR. Inferring parameters of the distribution of fitness effects of new mutations when beneficial mutations are strongly advantageous and rare. G3 Genes|Genomes|Genetics. 2020:10(7):2317–2326. 10.1534/g3.120.401052.
- Castellano D, Eyre-Walker A, Munch K. Impact of mutation rate and selection at linked sites on DNA variation across the genomes of humans and other homininae. Genome Biol Evol. 2020:12(1):3550–3561. 10.1093/gbe/evz215.
- Castellano D, Macià MC, Tataru P, Bataillon T, Munch K. Comparison of the full distribution of fitness effects of new amino acid mutations across great apes. Genetics. 2019:213(3):953–966. 10.1534/genetics.119.302494.
- Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015:4(1):1–16. 10.1186/s13742-015-0047-8.
- Chen J, Glémin S, Lascoux M. Genetic diversity and the efficacy of purifying selection across plant and animal species. Mol Biol Evol. 2017:34(6):1417–1428. 10.1093/molbev/msx088.
- De La Torre AR, Birol I, Bousquet J, Ingvarsson PK, Jansson S, Jones SJ, Keeling CI, MacKay J, Nilsson O, Ritland K, et al. Insights into conifer giga-genomes. Plant Physiol. 2014:166(4):1724–1732. 10.1104/pp.114.248708.
- Do R, Balick D, Li H, Adzhubei I, Sunyaev S, Reich D. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet. 2015:47(2):126–131. 10.1038/ng.3186.
- Douglas GM, Gos G, Steige KA, Salcedo A, Holm K, Josephs EB, Arunkumar R, Ågren JA, Hazzouri KM, Wang W, et al. Hybrid origins and the earliest stages of diploidization in the highly successful recent polyploid Capsella bursa-pastoris. Proc Natl Acad Sci USA. 2015:112(9):2806–2811. 10.1073/pnas.1412277112.
- Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002:12(6):640–649. 10.1016/S0959-437X(02)00353-2.
- Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat Rev Genet. 2007:8(8):610–618. 10.1038/nrg2146.
- Eyre-Walker A, Woolfit M, Phelps T. The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics. 2006:173(2):891–900. 10.1534/genetics.106.057570.
- Galtier N. Adaptive protein evolution in animals and the effective population size hypothesis. PLoS Genet. 2016:12(1):e1005774. 10.1371/journal.pgen.
- González-Martínez SC, Ridout K, Pannell JR. Range expansion compromises adaptive evolution in an outcrossing plant. Curr Biol. 2017:27(16):2544–2551.e4. 10.1016/j.cub.2017.07.007.
- Grivet D, Avia K, Vaattovaara A, Eckert AJ, Neale DB, Savolainen O, González-Martínez SC. High rate of adaptive evolution in two widespread European pines. Mol Ecol. 2017:26(24):6857–6870. 10.1111/mec.14402.
- Grossen C, Guillaume F, Keller LF, Croll D. Purging of highly deleterious mutations through severe bottlenecks in Alpine Ibex. Nat Commun. 2020:11(1):1001. 10.1038/s41467-020-14803-1.
- Gutenkunst R, Hernandez R, Williamson S, Bustamante C. Diffusion approximations for demographic inference: DaDi. Nat Preced. 2010:1–1. 10.1038/npre.2010.4594.1.
- Hämälä T, Tiffin P. Biased gene conversion constrains adaptation in Arabidopsis thaliana. Genetics. 2020:215(3):831–846. 10.1534/genetics.120.303335.
- Huang X, Fortier AL, Coffman AJ, Struck TJ, Irby MN, James JE, León-Burguete JE, Ragsdale AP, Gutenkunst RN. Inferring genome-wide correlations of mutation fitness effects between populations. Mol Biol Evol. 2021:38(10):4588–4602. 10.1093/molbev/msab162.
- Huber CD, Kim BY, Marsden CD, Lohmueller KE. Determining the factors driving selective effects of new nonsynonymous mutations. Proc Natl Acad Sci USA. 2017:114(17):4465–4470. 10.1073/pnas.1619508114.
- Johri P, Eyre-Walker A, Jensen JD, Lohmueller KE, Gutenkunst RN. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol Evol. 2022:14(7):evac088. 10.1093/gbe/evac088.
- Keightley PD, Jackson BC. Inferring the probability of the derived vs. the ancestral allelic state at a polymorphic site. Genetics. 2018:209(3):897–906. 10.1534/genetics.118.301120.
- Kim BY, Huber CD, Lohmueller KE. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics. 2017:206(1):345–361. 10.1534/genetics.116.197145.
- Kousathanas A, Keightley PD. A comparison of models to infer the distribution of fitness effects of new mutations. Genetics. 2013:193(4):1197–1208. 10.1534/genetics.112.148023.
- Laenen B, Tedder A, Nowak MD, Toräng P, Wunder J, Wötzel S, Steige KA, Kourmpetis Y, Odong T, Drouzas AD, et al. Demography and mating system shape the genome-wide impact of purifying selection in Arabis alpina. Proc Natl Acad Sci USA. 2018:115(4):816–821. 10.1073/pnas.1707492115.
- Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009:25(14):1754–1760. 10.1093/bioinformatics/btp324.
- Liu S, Zhang L, Sang Y, Lai Q, Zhang X, Jia C, Long Z, Wu J, Ma T, Mao K, et al. Demographic history and natural selection shape patterns of deleterious mutation load and barriers to introgression across Populus genome. Mol Biol Evol. 2022:39(2):msac008. 10.1093/molbev/msac008.
- Loewe L, Charlesworth B. Inferring the distribution of mutational effects on fitness in Drosophila. Biol Lett. 2006:2(3):426–430. 10.1098/rsbl.2006.0481.
- Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010:26(22):2867–2873. 10.1093/bioinformatics/btq559.
- Martin G, Lenormand T. The fitness effect of mutations across environments: a survey in light of fitness landscape models. Evolution. 2006:60(12):2413–2427. 10.1111/j.0014-3820.2006.tb01878.x.
- Milesi P, Kastally C, Dauphin B, Cervantes S, Bagnoli F, Budde KB, Cavers S, Fady B, Faivre-Rampant P, Gonzalez-Martinez SC, et al. Synchronous effective population size changes and genetic stability of forest trees through glacial cycles. bioRxiv. 10.1101/2023.01.05.522822, 6 January 2023, preprint: not peer reviewed.
- Muyle A, Martin H, Zemp N, Mollion M, Gallina S, Tavares R, Silva A, Bataillon T, Widmer A, Glémin S, et al. Dioecy is associated with high genetic diversity and adaptation rates in the plant genus Silene. Mol Biol Evol. 2021:38(3):805–818. 10.1093/molbev/msaa229.
- Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973:246(5428):96–98. 10.1038/246096a0.
- Peischl S, Dupanloup I, Kirkpatrick M, Excoffier L. On the accumulation of deleterious mutations during range expansions. Mol Ecol. 2013:22(24):5972–5982. 10.1111/mec.12524.
- Savolainen O, Pyhäjärvi T, Knürr T. Gene flow and local adaptation in trees. Annu Rev Ecol Evol Syst. 2007:38(1):595–619. 10.1146/annurev.ecolsys.38.091206.095646.
- Sawyer SA, Kulathinal RJ, Bustamante CD, Hartl DL. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J Mol Evol. 2003:S154–S164. 10.1007/s00239-003-0022-3.
- Simons YB, Mostafavi H, Smith CJ, Pritchard JK, Sella G. Simple scaling laws control the genetic architectures of human complex traits. bioRxiv. 2022:2022–10.
- Simons YB, Sella G. The impact of recent population history on the deleterious mutation load in humans and close evolutionary relatives. Curr Opin Genet Dev. 2016:41:150–158. 10.1016/j.gde.2016.09.006.
- Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nat Genet. 2014:46(3):220–224. 10.1038/ng.2896.
- Takou M, Hämälä T, Koch EM, Steige KA, Dittberner H, Yant L, Genete M, Sunyaev S, Castric V, Vekemans X, et al. Maintenance of adaptive dynamics and no detectable load in a range-edge outcrossing plant population. Mol Biol Evol. 2021:38(5):1820–1836. 10.1093/molbev/msaa322.
- Tataru P, Bataillon T. PolyDFEv2.0: testing for invariance of the distribution of fitness effects within and across species. Bioinformatics. 2019:35(16):2868–2869. 10.1093/bioinformatics/bty1060.
- Tataru P, Mollion M, Glémin S, Bataillon T. Inference of distribution of fitness effects and proportion of adaptive substitutions from polymorphism data. Genetics. 2017:207(3):1103–1119. 10.1534/genetics.117.300323.
- Wakeley J. Nonequilibrium migration in human history. Genetics. 1999:153(4):1863–1871. 10.1093/genetics/153.4.1863.
- Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010:38(16):e164. 10.1093/nar/gkq603.
- Welch JJ, Eyre-Walker A, Waxman D. Divergence and polymorphism under the nearly neutral theory of molecular evolution. J Mol Evol. 2008:67(4):418–426. 10.1007/s00239-008-9146-9.
- Weng ML, Ågren J, Imbert E, Nottebrock H, Rutter MT, Fenster CB. Fitness effects of mutation in natural populations of Arabidopsis thaliana reveal a complex influence of local adaptation. Evolution. 2021:75(2):330–348. 10.1111/evo.14152.
- Wilkins JF. A separation-of-timescales approach to the coalescent in a continuous population. Genetics. 2004:168(4):2227–2244. 10.1534/genetics.103.022830.
- Willi Y, Fracassetti M, Bachmann O, Van Buskirk J. Demographic processes linked to genetic diversity and positive selection across a species’ range. Plant Commun. 2020:1(6):100111. 10.1016/j.xplc.2020.100111.
- Willi Y, Fracassetti M, Zoller S, Van Buskirk J. Accumulation of mutational load at the edges of a species range. Mol Biol Evol. 2018:35(4):781–791. 10.1093/molbev/msy003.
- Williamson RJ, Josephs EB, Platts AE, Hazzouri KM, Haudry A, Blanchette M, Wright SI. Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 2014:10(9):e1004622. 10.1371/journal.pgen.1004622.