PLOS Genetics. 2024 Dec 26;20(12):e1011536. doi: 10.1371/journal.pgen.1011536

Estimating the proportion of beneficial mutations that are not adaptive in mammals

Thibault Latrille 1,*,#, Julien Joseph 2,#, Diego A Hartasánchez 1, Nicolas Salamin 1
Editor: Kirk E Lohmueller 3
PMCID: PMC11709321  PMID: 39724093

Abstract

Mutations can be beneficial by bringing innovation to their bearer, allowing them to adapt to environmental change. These mutations are typically unpredictable since they respond to an unforeseen change in the environment. However, mutations can also be beneficial because they are simply restoring a state of higher fitness that was lost due to genetic drift in a stable environment. In contrast to adaptive mutations, these beneficial non-adaptive mutations can be predicted if the underlying fitness landscape is stable and known. The contribution of such non-adaptive mutations to molecular evolution has been widely neglected mainly because their detection is very challenging. We have here reconstructed protein-coding gene fitness landscapes shared between mammals, using mutation-selection models and multi-species alignments across 87 mammals. These fitness landscapes have allowed us to predict the fitness effect of polymorphisms found in 28 mammalian populations. Using methods that quantify selection at the population level, we have confirmed that beneficial non-adaptive mutations are indeed positively selected in extant populations. Our work confirms that deleterious substitutions are accumulating in mammals and are being reverted, generating a balance in which genomes are damaged and restored simultaneously at different loci. We observe that beneficial non-adaptive mutations represent between 15% and 45% of all beneficial mutations in 24 of 28 populations analyzed, suggesting that a substantial part of ongoing positive selection is not driven solely by adaptation to environmental change in mammals.

Author summary

The extent to which adaptation to changing environments is shaping genomes is a central question in molecular evolution. To quantify the rate of adaptation, population geneticists have typically used signatures of positive selection. However, mutations restoring an ancestral state of higher fitness lost by genetic drift are also positively selected, but they do not respond to a change in the environment. In this study, we have managed to distinguish beneficial mutations that are due to changing environments and those that are restoring pre-existing functions in mammals. We show that a substantial proportion of beneficial mutations cannot be interpreted as adaptive.

1 Introduction

Adaptation is one of the main processes shaping the diversity of forms and functions across the tree of life [1]. Evolutionary adaptation is tightly linked to environmental change and species responding to this change [2, 3]. Such environmental changes are either abiotic (e.g. temperature, humidity) or biotic (e.g. pressure from predators or viruses [4]). For adaptation to occur, there must be variation within populations, which mostly appears via mutations in the DNA sequence. While neutral mutations will not impact an individual's fitness, deleterious mutations have a negative effect, and beneficial mutations improve their bearer's fitness. A beneficial mutation is thus more likely than a neutral mutation to invade the population and reach fixation, resulting in a substitution at the species level.

Upon environmental change, because adaptive beneficial mutations toward new fitness optima are more likely, the number of substitutions also increases (Fig 1A). An increased substitution rate is thus commonly interpreted as a sign of adaptation [5–7]. The availability of large-scale genomic data and the development of theoretical models have enabled the detection and quantification of substitution rate changes across genes and lineages [8–10]. These approaches, now common practice in evolutionary biology, have helped better understand the processes underpinning the rates of molecular evolution, contributing to disentangling the effects of mutation, selection and drift in evolution [11]. However, a collateral effect has been conflating beneficial mutations with adaptive evolution when adaptive evolution is not the only process that can lead to beneficial mutations [12–14].

Fig 1. Changing and stable fitness landscapes.

(A & B) For a given codon position of a protein-coding DNA sequence, amino acids (x-axis) have different fitness values (y-axis). Under a changing fitness landscape (A), these fitnesses fluctuate with time. The protein sequence follows the moving target defined by the amino-acid fitnesses. Since substitutions are preferentially accepted if they are in the direction of this target, substitutions are, on average, adaptive. At the phylogenetic scale (C), beneficial substitutions are common (positive signs), promoting phenotype diversification across species. Under a stable fitness landscape (B), most mutations reaching fixation are either slightly deleterious reaching fixation due to drift or are beneficial non-adaptive mutations restoring a more optimal amino acid. At the phylogenetic scale (D), deleterious substitutions (negative signs) are often reverted via beneficial non-adaptive mutations (positive signs), promoting phenotype stability and preserving well-established biological systems. Even though, individually, any beneficial non-adaptive mutation might have a weak effect on its bearer, we expect them to be scattered across the genome and the genome-wide signature of beneficial non-adaptive mutations to be detectable and quantifiable.

1.1 Beneficial yet non-adaptive mutations

In a constant environment, a deleterious mutation can reach fixation by genetic drift [15]. A new mutation restoring the ancestral fitness will thus be beneficial (Fig 1B), even though the environment has not changed [13, 16–19]. We refer to mutations that restore the ancestral fitness, under the assumption that the fitness landscape has not changed, as beneficial non-adaptive mutations [12, 20]. Such a mutation can happen at a different locus from the initial deleterious mutation, in which case it is called a compensatory mutation [13, 17]. While compensatory mutations change the sequence and thus induce genetic diversification, beneficial non-adaptive mutations at the locus of the initial mutation, which are the focus of this manuscript, reduce genetic diversity and do not contribute to genetic innovation. Although Tomoko Ohta considered beneficial non-adaptive mutations negligible in her nearly-neutral theory [15], their importance has now been acknowledged for expanding populations [12]. However, differentiating between an adaptive mutation and a beneficial non-adaptive mutation remains challenging [21]. Indeed, an adaptive mutation responding to a change in the environment and a beneficial non-adaptive mutation have equivalent fitness consequences for their bearer [12]. Similarly, at the population level, both types of mutations will result in a positive transmission bias of the beneficial allele. However, at the macro-evolutionary scale, the consequences of these two types of mutations are fundamentally different. While adaptive mutations promote phenotype diversification (Fig 1C), beneficial non-adaptive mutations promote phenotype stability and may help preserve well-established biological systems (Fig 1D). Additionally, the direction of adaptive evolution is unpredictable because it is caused by an unforeseen change in the environment and, hence, in the underlying fitness landscape [22]. On the other hand, beneficial non-adaptive mutations are predictable because, under a stable fitness landscape, any change from a non-optimal to an optimal amino acid will move the site back toward the equilibrium expected under the fitness landscape [23–25]. They can then be distinguished from truly novel beneficial mutations because the latter are not expected to mutate toward the amino acids of higher fitness defined by the stable fitness landscape but rather to mutate to amino acids showing a diversified pattern (Fig 1).

1.2 Fitness landscape reconstruction

The mutation-selection framework makes it possible to link the patterns of substitution along a phylogenetic tree with the underlying fitness landscape [26, 27]. Such mutation-selection models applied to protein-coding DNA sequence alignments at the codon level allow us to estimate relative fitnesses for all amino acids for each site of the sequence, explicitly assuming that the underlying fitness landscape is stable along the phylogenetic tree [28–30]. Moreover, effective population size (Ne) is considered constant along the phylogenetic tree precisely because of the fixed fitness landscape assumption, the consequences of which are detailed in the Discussion. Importantly, because mutation-selection codon models at the phylogenetic scale are based on population-genetics equations, their estimates of selection coefficients are directly interpretable as fitness effects at the population scale; and because they work at the DNA level, we are able to account for mutational bias in DNA and the structure of the genetic code. The model further integrates the shared evolutionary history between samples and their divergence, which, together, allow us to estimate fitness effects in sequence alignments even though sequences are not independent samples and might not represent the equilibrium distribution of amino acids (see section 4.2 in Materials & methods). The detailed model implementation is available in S1 File, described as a Bayesian hierarchical model (Fig A in S1 File).

Accordingly, fitting the mutation-selection model to a multi-species sequence alignment allows us to obtain relative fitnesses for all amino acids (Fig 2A). The difference in fitness between a pair of amino acids allows us to predict whether any mutation would be a deleterious mutation toward a less fit amino acid, a nearly-neutral mutation, or a mutation toward a known fitter amino acid, thus constituting a beneficial non-adaptive mutation (Fig 2B). We can hence use large-scale genomic data to test whether such fitnesses estimated at the phylogenetic scale predict the fitness effects at the population scale. The placental mammals represent an excellent study system to perform such an analysis. Having originated ∼102 million years ago, they diversified quickly [31]. Additionally, polymorphism data are available for many species [32], as are high quality protein-coding DNA alignments across the genome [33, 34]. By performing our analysis on 14,509 orthologous protein-coding genes across 87 species, we focus on genes shared across all mammals in our dataset and not newly functionalized genes in a lineage.

Fig 2. Selection coefficients at the phylogenetic and population scales.

At the phylogenetic scale (A), we estimated the amino-acid fitness for each site from protein-coding DNA alignments using mutation-selection codon models. For every possible mutation, the difference in amino-acid fitness before and after the mutation allows us to compute the selection coefficient at the phylogenetic scale (S0). Depending on S0 (B), mutations can be predicted as deleterious (D0), nearly-neutral (N0) or beneficial non-adaptive mutations (B0) toward a fitter amino acid and repairing existing functions. At the population scale, each observed single nucleotide polymorphism (SNP) segregating in the population can also be classified according to its S0 value (C). Occurrence and frequency in the population of non-synonymous polymorphisms, contrasted to synonymous polymorphisms (deemed neutral), is used to estimate selection coefficients (D-E) at the population scale (S), for each class of selection (D0, N0, B0). We can thus assess whether S0 predicts S and compute precision (F) and recall (G) for each class. The recall value for class B0 is the probability for beneficial mutations to be non-adaptive (G). Icons are adapted from https://phylopic.org under a Creative Commons license.

Having identified which potential DNA changes represent beneficial non-adaptive mutations (Fig 2A and 2B), we retrieved polymorphism data from 28 wild and domesticated populations belonging to six genera (Equus, Bos, Capra, Ovis, Chlorocebus, and Homo) to assess the presence of beneficial non-adaptive mutations at the population scale. We focused on both mutations currently segregating within populations and on substitutions in the terminal branches, and checked if any of these observed changes were indeed beneficial (Fig 2C and 2E). A similar approach demonstrated the presence of beneficial non-adaptive mutations in humans [23, 24] and in plants [25]. However, the model used to reconstruct the static fitness landscape in these studies can only be applied to deeply conserved protein domains in the tree of life, which corresponds to a subpart of the proteome that evolves slowly. The mutation-selection model used in the present work integrates phylogenetic relationships, and thus allows us to estimate the fitness landscape in shallower phylogenetic trees, and therefore can be applied almost exome-wide [35].

We first quantified the likelihood of any DNA mutation to be a beneficial non-adaptive mutation, that is, whenever a DNA mutation increases fitness under a stable fitness landscape. Subsequently, by quantifying the total amount of beneficial mutations in the current population across all types of DNA mutations, we could tease apart beneficial non-adaptive from adaptive mutations resulting from a change in the fitness landscape. Altogether, in this study, by integrating large-scale genomic datasets at both phylogenetic and population scales, we propose a way to explicitly quantify the contribution of beneficial non-adaptive mutations to positive selection across the entire exome of the six genera (Fig 2F and 2G).

2 Results

2.1 Selection along the terminal branches

First, we assessed whether fitness effects derived from the mutation-selection model at the phylogenetic scale predict selection occurring in terminal branches. We recovered the mutations that reached fixation in the terminal branches of the six genera. We only considered mutations fixed in a population as substitutions in the corresponding branch by discarding mutations segregating in our population samples. For each substitution identified in the terminal branches we obtained its S0 value as predicted at the mammalian scale (Fig 2A and 2B). We could classify each substitution as either deleterious (D0: S0 < -1), nearly-neutral (N0: -1 < S0 < 1), or beneficial (B0: S0 > 1). Because S0 values were based on the assumption that the fitness landscape is stable across mammals, B0 mutations (i.e., with S0 > 1) bring the bearer of this mutation toward an amino acid predicted to be fitter across mammals. Importantly, the mammalian alignment used to estimate the amino acid fitness landscape did not include the six focal genera and their sister species. This ensures independence between, on the one hand, the fitness landscape estimated, and on the other hand, both substitutions that occurred in the terminal branches, and segregating polymorphisms of the focal populations. Example substitutions in the terminal lineage of Chlorocebus sabaeus which are classified as B0 are shown in S2 File (section 1.1). For instance, in the mammalian protein-coding DNA alignment of gene SELE, the nucleotide at site 1722 has mutated (from T to C) at the base of Simiiformes (monkeys and apes), modifying the corresponding amino acid from Serine to Proline, but has been subsequently reverted in the branch of Chlorocebus sabaeus (Fig A in S2 File). However, other substitutions classified as B0 in the terminal branch of Chlorocebus sabaeus cannot be clearly interpreted as reversions along the terminal branch, and show several transitions to this amino acid across the mammalian phylogeny, as for instance site 3145 of gene THSD7A (Fig B in S2 File).
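The classification into D0, N0 and B0 can be illustrated with a minimal Python sketch. It assumes that S0 is the difference in scaled amino-acid fitness after and before the mutation, as described for Fig 2; the function names and the fitness values for the example site are hypothetical and are not estimates from the paper.

```python
from typing import Dict


def scaled_selection_coefficient(fitness: Dict[str, float], source: str, target: str) -> float:
    """S0 of a mutation: scaled fitness of the target minus that of the source amino acid."""
    return fitness[target] - fitness[source]


def classify(s0: float) -> str:
    """Assign a selection class using the thresholds -1 and 1.

    The text uses strict inequalities on both sides; placing the exact
    boundary values in N0 is an arbitrary choice made for this sketch.
    """
    if s0 < -1.0:
        return "D0"
    if s0 > 1.0:
        return "B0"
    return "N0"


# Hypothetical scaled fitnesses at one site (Serine optimal, Proline costly).
site_fitness = {"S": 0.0, "P": -2.5, "A": -0.4}

for source, target in [("S", "P"), ("P", "S"), ("S", "A")]:
    s0 = scaled_selection_coefficient(site_fitness, source, target)
    print(f"{source}->{target}: S0 = {s0:+.1f}, class {classify(s0)}")
```

With these illustrative values, a Serine to Proline change is classified as D0 and the reverse change as B0, mirroring the logic of a reversion.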

Among all the substitutions found in each terminal branch, between 10 and 13% were B0, while B0 mutations only represent between 0.9 and 1.2% of all non-synonymous mutations (Fig 3A and 3B for humans, Table A in S2 File for all datasets). Of note, if we were to assume a stationary mutation-selection-drift equilibrium in the terminal lineage, we would expect a symmetric proportion of positively (B0) and negatively (D0) selected substitutions if there were no adaptation. The lack of symmetry along the terminal branches then provides a means to estimate the frequency of non-adaptive beneficial substitutions. Mathematically, twice the fraction of B0 substitutions is an estimate of this rate. This rate is highly consistent across lineages (Table A in S2 File) and suggests an overall frequency of nearly neutral substitutions due to consistent long-term selection pressures, mutation and drift of approximately 20% of all substitutions (20%-26% across species).
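The arithmetic behind these figures can be retraced in a few lines of Python. This is only a sketch using the range bounds quoted in the text; the helper names are hypothetical, and because the text does not say which lineage carries which bound, the enrichment is reported as an outer envelope.

```python
def equilibrium_share(b0_fraction: float) -> float:
    """Twice the B0 share of substitutions, following the symmetry argument in the text."""
    return 2.0 * b0_fraction


def enrichment(substitution_share: float, opportunity_share: float) -> float:
    """How over-represented a class is among substitutions relative to mutational opportunities."""
    return substitution_share / opportunity_share


b0_substitutions = (0.10, 0.13)    # B0 share of substitutions (range from the text)
b0_opportunities = (0.009, 0.012)  # B0 share of non-synonymous mutations (range from the text)

print([equilibrium_share(f) for f in b0_substitutions])  # bounds of the 20%-26% range
print(enrichment(b0_substitutions[0], b0_opportunities[1]),
      enrichment(b0_substitutions[1], b0_opportunities[0]))  # envelope of the enrichment
```

Whatever the pairing of bounds, B0 changes are several-fold over-represented among substitutions compared with their share of mutational opportunities.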

Fig 3. Selection coefficients of mutations in humans of African descent.

(A) Distribution of scaled selection coefficients (S0), predicted for all possible non-synonymous DNA mutations away from the ancestral human exome (section 4.4). Mutations are divided into three classes of selection: deleterious (D0), nearly-neutral (N0) and beneficial (B0, supposedly beneficial non-adaptive mutations). (B) Distribution of scaled selection coefficients (S0) for all observed substitutions along the Homo branch after the Homo-Pan split (section 4.5). If a class contains fewer substitutions than expected, it is undergoing purifying selection, as is the case for D0. (C) The site-frequency spectrum (SFS) in humans of African descent for a random sample of 16 alleles (means in solid lines and standard deviations in color shades) for each class of selection and for synonymous mutations, supposedly neutral (black). The SFS represents the proportion of mutations (y-axis) with a given number of derived alleles in the population (x-axis). At high frequencies, deleterious mutations are underrepresented. (D) Proportion of beneficial (P[B]), nearly-neutral (P[N]), and deleterious (P[D]) mutations estimated at the population scale for each class of selection at the phylogenetic scale (section 4.6). Proportions depicted here are not weighted by their mutational opportunities.

Furthermore, since in principle, B0 mutations are bound to reach fixation more often than neutral mutations, we calculated the dN/dS ratio of non-synonymous over synonymous divergence for all terminal lineages, focusing on the non-synonymous changes predicted as B0 mutations (dN(B0)/dS). We obtained values between 1.17 and 1.75 in the different lineages (Table B in S2 File), meaning that B0 mutations reach fixation slightly more frequently than synonymous mutations, which are assumed to be neutral. This observation is consistent with the premise that B0 mutations are weakly beneficial, translating to a scaled selection coefficient between 0.32 and 1.24 (section 1.2 in S2 File). Finding dN(B0)/dS>1 for these sites confirms that these sites are closer to optimality at the end of the branch than at the beginning. Even though the beneficial effect of these mutations does not come from an environmental change, it does not change the fact that they have contributed positively to the population’s fitness. It is just that at mutation-selection-drift equilibrium, the increase in fitness at these sites is offset by deleterious substitutions elsewhere in the genome so that there is no net adaptation.
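One way to see how a dN(B0)/dS value maps onto a scaled selection coefficient is sketched below in Python. It assumes the standard population-genetics result that a mutation with scaled coefficient S fixes S / (1 - exp(-S)) times as often as a neutral one; whether section 1.2 of S2 File uses exactly this relation is an assumption here, although the relation does reproduce the quoted bounds. The function names are hypothetical.

```python
import math


def relative_fixation_rate(s: float) -> float:
    """Fixation rate of a mutation with scaled coefficient s, relative to a neutral mutation."""
    if abs(s) < 1e-12:
        return 1.0  # neutral limit of s / (1 - exp(-s))
    return s / (-math.expm1(-s))


def scaled_coefficient_from_ratio(omega: float, upper: float = 50.0) -> float:
    """Invert relative_fixation_rate for omega >= 1 by bisection (the function increases with s)."""
    if omega < 1.0 or omega > relative_fixation_rate(upper):
        raise ValueError("omega outside the range handled by this sketch")
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if relative_fixation_rate(mid) < omega:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for omega in (1.17, 1.75):
    print(f"dN(B0)/dS = {omega}: S ~ {scaled_coefficient_from_ratio(omega):.2f}")
```

Under this assumption, ratios of 1.17 and 1.75 correspond to scaled coefficients close to the 0.32 and 1.24 reported in the text.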

This result further indicates that using dN/dS as an estimate of purifying selection is biased (overestimated) due to the presence of beneficial non-adaptive mutations among the non-synonymous substitutions. By discarding all beneficial non-adaptive mutations we can obtain an estimate of dN/dS which is not inflated. By comparing these two ways of calculating dN/dS (see section 4.5 in Materials & methods), we calculated that beneficial non-adaptive mutations inflate dN/dS values by between 9 and 12% across genera (Table C in S2 File). This represents a substantial increase when considering that beneficial non-adaptive mutations only represent between 0.9 and 1.2% of non-synonymous mutational opportunities (Table A in S2 File).

2.2 Selection in populations

Second, we assessed whether our calculated S0 values predicted at the phylogenetic scale were also indicative of the selective forces exerted at the population level. We retrieved single nucleotide polymorphisms (SNPs) segregating in 28 mammalian populations. To determine if SNPs were ancestral or derived, we reconstructed the ancestral exome of each population. We then classified every non-synonymous SNP as either D0, N0, or B0 according to its S0 value (Fig 2B and 2C).

First, SNPs classified as B0 are spread across the genomes and not strongly associated with the ontology terms of their respective genes (Table D and Fig C in S2 File). In humans, some SNPs have been associated with specific clinical prognosis terms obtained by clinical evaluation of the impact of variants on human Mendelian disorders [36]. Although this classification also relies on deep protein alignments and therefore cannot be considered an independent result from our own, it does provide a consistency check if the effect of a mutation on human health is in line with its fitness effect predicted by our method [37]. Therefore, we investigated whether the non-synonymous SNPs classified as D0 or B0 showed enrichment in specific clinical terms compared to SNPs classified as N0. Our results show that SNPs predicted as deleterious are associated with clinical terms such as Likely Pathogenic and Pathogenic, implying that, in general, the selective pressure of a mutation exerted across mammals is also predictive of its clinical effect in humans (Table E in S2 File) [38]. Conversely, B0 mutations are associated with clinical terms such as Benign and Likely Benign, which shows that B0 mutations are less likely to be functionally damaging (Table F in S2 File).

In addition to clinical prognosis, frequencies at which SNPs are segregating within populations provide information on their selective effects. For instance, deleterious SNPs usually segregate at lower frequencies because of purifying selection, which tends to remove them from the population (Fig 3C for humans). By gathering information across many SNPs, it is possible to estimate the distribution of fitness effects at the population scale, taking synonymous SNPs as a neutral expectation [39–42]. From these estimated fitness effects, we can derive the proportion of deleterious mutations (P[D]), nearly-neutral mutations (P[N]) and beneficial mutations (P[B]) at the population scale (see section 4.6 in Materials & methods, Fig A-C in S3 File). These approaches offer a unique opportunity to contrast selection coefficients estimated at the phylogenetic scale (S0) and at the population scale (S) in different datasets (Fig D in S3 File).
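The site-frequency spectrum used in such comparisons can be tabulated as in the following Python sketch. The derived-allele counts are invented for illustration, the function name is hypothetical, and the sketch omits the subsampling and the DFE inference performed in the study.

```python
from collections import Counter
from typing import Iterable, List


def site_frequency_spectrum(derived_counts: Iterable[int], sample_size: int) -> List[float]:
    """Proportion of segregating SNPs carrying 1 .. sample_size - 1 derived alleles."""
    counts = Counter(c for c in derived_counts if 0 < c < sample_size)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no segregating sites in the sample")
    return [counts.get(k, 0) / total for k in range(1, sample_size)]


# Hypothetical derived-allele counts in a sample of 16 alleles.
deleterious_counts = [1, 1, 1, 1, 2, 1, 3, 1]
synonymous_counts = [1, 2, 1, 5, 9, 3, 14, 1]

sfs_deleterious = site_frequency_spectrum(deleterious_counts, 16)
sfs_synonymous = site_frequency_spectrum(synonymous_counts, 16)
print(sfs_deleterious[0], sfs_synonymous[0])  # share of singletons in each class
```

An excess of low-frequency variants relative to the synonymous spectrum is the kind of signal that purifying selection is expected to leave.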

Across our selection classes (D0, N0 and B0), one can ultimately estimate the proportion of correct and incorrect predictions, leading to an estimation of precision and recall (Fig 2F and 2G and section 4.7 in Materials & methods). Across 28 populations of different mammal species, mutations predicted to be deleterious at the phylogenetic scale (D0) were indeed purged at the population scale, with a precision in the range of 90–97% (Table 1 and Fig 3D for humans). Conversely, a recall in the range of 96–100% implied that mutations found to be deleterious at the population scale were most likely also predicted to be deleterious at the phylogenetic scale (Table 1). Altogether, purifying selection is largely predictable and amino acids with negative fitness across mammals have been effectively purged away in each population.

Table 1. Precision and recall for estimated selection coefficient of mutations given by mutation-selection models (S0).

Classes at the population scale (S) and at the phylogenetic scale (S0): deleterious mutations (D: S < -1; D0: S0 < -1), nearly-neutral mutations (N: -1 < S < 1; N0: -1 < S0 < 1), beneficial mutations (B: S > 1; B0: S0 > 1).

| Population | Species | Ne | Precision P[D \| D0] | Recall P[D0 \| D] | Precision P[N \| N0] | Recall P[N0 \| N] | Precision P[B \| B0] | Recall P[B0 \| B] |
|---|---|---|---|---|---|---|---|---|
| Equus c. | Equus caballus | 7.5 × 10^4 | 0.923 | 0.972 | 0.570 | 0.341 | 0.648 | 0.536 |
| Iran | Bos taurus | 5.6 × 10^4 | 0.915 | 1.000 | 0.632 | 0.358 | 0.873 | 0.243 |
| Uganda | Bos taurus | 1.3 × 10^5 | 0.951 | 0.969 | 0.495 | 0.414 | 0.576 | 0.415 |
| Australia | Capra hircus | 1.7 × 10^5 | 0.944 | 0.971 | 0.527 | 0.437 | 0.368 | 0.177 |
| France | Capra hircus | 1.9 × 10^5 | 0.946 | 0.971 | 0.508 | 0.423 | 0.368 | 0.190 |
| Iran (C. aegagrus) | Capra hircus | 1.9 × 10^5 | 0.948 | 0.969 | 0.486 | 0.444 | 0.368 | 0.165 |
| Iran | Capra hircus | 2.3 × 10^5 | 0.953 | 0.966 | 0.425 | 0.407 | 0.368 | 0.193 |
| Italy | Capra hircus | 1.9 × 10^5 | 0.947 | 0.971 | 0.551 | 0.439 | 0.368 | 0.243 |
| Morocco | Capra hircus | 2.2 × 10^5 | 0.950 | 0.970 | 0.527 | 0.440 | 0.368 | 0.245 |
| Iran | Ovis aries | 3.8 × 10^5 | 0.961 | 0.961 | 0.452 | 0.415 | 0.205 | 0.407 |
| Iran (O. orientalis) | Ovis aries | 4.5 × 10^5 | 0.964 | 0.960 | 0.420 | 0.445 | 0.193 | 0.190 |
| Iran (O. vignei) | Ovis aries | 3.7 × 10^5 | 0.967 | 0.959 | 0.361 | 0.470 | 0.190 | 0.110 |
| Various | Ovis aries | 4.1 × 10^5 | 0.962 | 0.962 | 0.433 | 0.440 | 0.229 | 0.222 |
| Morocco | Ovis aries | 4 × 10^5 | 0.962 | 0.961 | 0.462 | 0.424 | 0.211 | 0.514 |
| Barbados | Chlorocebus sabaeus | 1.1 × 10^5 | 0.935 | 0.975 | 0.565 | 0.402 | 0.648 | 0.293 |
| Central Afr. Rep. | Chlorocebus sabaeus | 1.7 × 10^5 | 0.948 | 0.971 | 0.508 | 0.423 | 0.535 | 0.275 |
| Ethiopia | Chlorocebus sabaeus | 1.4 × 10^5 | 0.935 | 0.975 | 0.580 | 0.416 | 0.552 | 0.245 |
| Gambia | Chlorocebus sabaeus | 1.4 × 10^5 | 0.944 | 0.975 | 0.654 | 0.437 | 0.577 | 0.821 |
| Kenya | Chlorocebus sabaeus | 1.5 × 10^5 | 0.946 | 0.972 | 0.538 | 0.453 | 0.588 | 0.257 |
| Nevis | Chlorocebus sabaeus | 1 × 10^5 | 0.933 | 0.976 | 0.629 | 0.412 | 0.599 | 0.358 |
| South Africa | Chlorocebus sabaeus | 1.8 × 10^5 | 0.944 | 0.971 | 0.548 | 0.423 | 0.574 | 0.341 |
| Saint Kitts | Chlorocebus sabaeus | 1.2 × 10^5 | 0.936 | 0.975 | 0.586 | 0.402 | 0.598 | 0.336 |
| Zambia | Chlorocebus sabaeus | 1.7 × 10^5 | 0.945 | 0.971 | 0.512 | 0.432 | 0.585 | 0.250 |
| African | Homo sapiens | 5.6 × 10^4 | 0.911 | 0.976 | 0.579 | 0.325 | 0.721 | 0.349 |
| Admixed American | Homo sapiens | 4.5 × 10^4 | 0.902 | 0.978 | 0.584 | 0.299 | 0.690 | 0.345 |
| East Asian | Homo sapiens | 4 × 10^4 | 0.905 | 0.978 | 0.585 | 0.325 | 0.688 | 0.249 |
| European | Homo sapiens | 4.2 × 10^4 | 0.906 | 0.978 | 0.584 | 0.329 | 0.688 | 0.248 |
| South Asian | Homo sapiens | 4.4 × 10^4 | 0.908 | 0.978 | 0.584 | 0.342 | 0.691 | 0.224 |

Precision is the probability that a mutation belongs to a given selection class at the population scale (S) given its class predicted at the phylogenetic scale (S0), e.g., P[D | D0]. Conversely, recall is the probability that a mutation belongs to a given class at the phylogenetic scale (S0) given its class at the population scale (S), e.g., P[D0 | D]. Recall for beneficial mutations (P[B0 | B]) is thus the proportion of beneficial non-adaptive mutations among all beneficial mutations. Ne is the estimated effective population size for each population.
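The link between precision and recall can be made concrete with Bayes' rule, as in the Python sketch below. It assumes that each phylogenetic class is weighted by its share of mutational opportunities; this weighting, the function name, and all numbers are illustrative assumptions rather than values or code from the study (the actual procedure is in section 4.7 of Materials & methods).

```python
from typing import Dict


def recall_from_precision(prob_given_class: Dict[str, float],
                          class_weight: Dict[str, float],
                          target: str) -> float:
    """P[target class | X] from P[X | class] and P[class], via Bayes' rule."""
    # Total probability of X across the three phylogenetic classes.
    p_x = sum(prob_given_class[c] * class_weight[c] for c in class_weight)
    return prob_given_class[target] * class_weight[target] / p_x


# Hypothetical shares of mutational opportunities in each class.
weights = {"D0": 0.80, "N0": 0.19, "B0": 0.01}
# Hypothetical probabilities of being beneficial (S > 1) within each class.
p_beneficial = {"D0": 0.001, "N0": 0.05, "B0": 0.70}

recall_b0 = recall_from_precision(p_beneficial, weights, "B0")
print(f"P[B0 | B] = {recall_b0:.3f}")
```

The example shows why a high precision for B0 does not imply a high recall: B0 mutations are rare among mutational opportunities, so a small beneficial fraction in the much larger N0 class can contribute a comparable number of beneficial mutations.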

Mutations predicted as N0 were effectively composed of a mix of neutral and selected mutations with varying precision (36–63%) and recall (32–45%) across the different populations (Table 1, Fig 3D for humans). The variable proportions between populations can be explained by the effective number of individuals in the population (Ne), a major driver of selection efficacy. Moreover, estimates of mutation rate per generation (u), from Bergeron et al. [43] and Orlando et al. [44], and Watterson’s θ obtained from the synonymous SFS as in Achaz [45], allow us to obtain Ne through Ne = θ/(4u). Using correlation analyses that accounted for phylogenetic relationship (see section 4.8 in Materials & methods, Fig E in S3 File), we found that higher Ne was associated with a smaller proportion of nearly-neutral mutations (r² = 0.31, p = 0.001, Fig 4A). This result follows the prediction of the nearly-neutral theory and suggests that in populations with higher diversity (e.g., Bos or Ovis), discrimination between beneficial and deleterious mutations is more likely to occur (Fig F-H in S3 File). Conversely, many more mutations are effectively neutral in populations with lower diversity (e.g., Homo).
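The conversion from diversity to effective population size can be written out as a short Python sketch. It uses the classical Watterson estimator computed from the number of segregating sites, which is an assumption made for illustration (the study derives θ from the synonymous SFS following Achaz [45]); the function names and input values are hypothetical.

```python
def watterson_theta(segregating_sites: int, sample_size: int, sequence_length: int) -> float:
    """Classical Watterson estimator of theta per site for a sample of `sample_size` alleles."""
    harmonic = sum(1.0 / i for i in range(1, sample_size))  # a_n = 1 + 1/2 + ... + 1/(n-1)
    return segregating_sites / (harmonic * sequence_length)


def effective_population_size(theta: float, mutation_rate: float) -> float:
    """Ne = theta / (4u), with u the mutation rate per site per generation."""
    return theta / (4.0 * mutation_rate)


# Hypothetical inputs: theta per site and mutation rate per site per generation.
theta = 0.004
u = 1e-8
print(f"Ne ~ {effective_population_size(theta, u):.0f}")
```

Because Ne scales inversely with u, uncertainty in the per-generation mutation rate propagates directly to the Ne values used in the correlation analyses.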

Fig 4. Proportion of nearly-neutral mutations and beneficial non-adaptive mutations as a function of effective population size (Ne).

Populations in circles, mean of the species across the populations as squares. (A) Proportion of nearly-neutral mutations at the population scale (P[N] in the y-axis), shown as a function of estimated effective population size (Ne in the x-axis). (B) Proportion of beneficial non-adaptive mutations among all beneficial mutations (P[B0 | B] in the y-axis), shown as a function of Ne in the x-axis. Correlations account for phylogenetic relationship and non-independence of samples, through the fit of a Phylogenetic Generalized Linear Model (see section 4.6 in Materials & methods).

Finally, mutations predicted to be B0 were indeed beneficial for individuals bearing them, with a precision (Fig 2F) in the range of 19–87% (Table 1 and Fig 3D for humans). This result confirms that selection toward amino acids restoring existing functions is ongoing in these populations. Importantly, the recall value in this case, computed as P[B0 | B], is the probability for a beneficial mutation at the population scale to be non-adaptive, i.e., going toward a fitter amino acid given a stable fitness landscape (Fig 2G, Table A in S4 File). In other words, the recall value quantifies the proportion of beneficial mutations restoring damaged genomes instead of creating adaptive innovations. Across the 28 populations, this proportion is in the range of 11–82% (Table 1), with a mean of 30%. Accounting for phylogenetic relationships, we found no correlation between the proportion of beneficial non-adaptive mutations and estimates of Ne based on genetic diversity (r² = 0.00, p = 0.772, Fig 4B).
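The summary statistics quoted for the recall of beneficial mutations can be recomputed from the last column of Table 1. The Python sketch below simply transcribes the 28 recall values from the table; the variable names are arbitrary.

```python
# Recall for beneficial mutations, transcribed from Table 1 (one value per population).
recall_b0 = [
    0.536, 0.243, 0.415,                       # Equus, Bos
    0.177, 0.190, 0.165, 0.193, 0.243, 0.245,  # Capra
    0.407, 0.190, 0.110, 0.222, 0.514,         # Ovis
    0.293, 0.275, 0.245, 0.821, 0.257, 0.358, 0.341, 0.336, 0.250,  # Chlorocebus
    0.349, 0.345, 0.249, 0.248, 0.224,         # Homo
]

mean_recall = sum(recall_b0) / len(recall_b0)
in_band = sum(0.15 <= r <= 0.45 for r in recall_b0)

print(f"range: {min(recall_b0):.2f}-{max(recall_b0):.2f}")
print(f"mean: {mean_recall:.2f}")
print(f"populations between 15% and 45%: {in_band} of {len(recall_b0)}")
```

The unweighted mean treats each population equally, so species sampled in many populations (Chlorocebus, Capra) weigh more heavily than species with a single population.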

We additionally performed controls and simulations to ensure that our results were robust. First, we checked that these estimations were not affected by SNP mispolarization (Fig A-B in S4 File). Second, we performed simulations at the population-genetic level and confirmed that our method was able to recover the proportion of beneficial mutations that are non-adaptive in synthetic polymorphism datasets (Fig C in S4 File). Third, we ran our analysis filtering out CpG mutations and obtained values of P[B0 | B] in the range of 5–27%, with a mean of 14% (Table B-C in S4 File, Fig D in S4 File), providing more conservative estimates. Finally, because the phylogenetic mutation-selection codon model should fit better for genes with uniformly conserved functions, we filtered out genes under pervasive adaptation [46] as a control. In this subset of the exome, containing genes with a more stable fitness landscape, we found an increase in the proportion of beneficial mutations that are non-adaptive (Wilcoxon signed-rank, s = 80, p = 0.002, Table D in S4 File), consistent with our expectation that beneficial mutations occur more frequently in genes under changing fitness landscapes.

2.3 Selection in the terminal lineage and in populations

As an alternative to relying solely on currently segregating mutations to quantify selection, one can leverage both polymorphism within a population and substitutions in the terminal lineage to estimate the distribution of fitness effects (DFE). Hence, we estimated precision and recall as done previously, but now including the number of substitutions per site as input for the DFE estimators (see 4.6 in Materials & methods and Fig E in S4 File). When including substitutions in the terminal lineage, estimates of P[B0 | B] are in the 10–78% range with a mean of 36%, and 19 out of 28 estimates fall between 15% and 45% (Table E-F in S4 File).

Additionally, we checked that these estimations were not affected by SNP mispolarization (Fig F in S4 File). We also filtered out genes under pervasive adaptation, and again found an increase in P[B0 | B], consistent with our expectation (Wilcoxon signed-rank, s = 120, p = 0.027, Table G in S4 File). We assessed the impact of fitting the same functional form of DFE to the three different categories of changes D0, N0 and B0. To this aim, we computed the total amount of current selection either by fitting a single DFE on the whole dataset or by summing the three independent DFEs. These disjoint estimates are well correlated, with a goodness of fit r² = 0.95, 0.89 and 0.82 for P[D] (Fig G in S4 File), P[N] (Fig H in S4 File) and P[B] (Fig I in S4 File), respectively. Finally, we evaluated the effect of fitting a parametric functional form for the DFE. As implemented in Tataru et al. [42], the DFE is a mixture between a reflected gamma distribution and an exponential distribution (Eq 8, section 4.7 in Materials & methods). Instead of using such a continuous DFE, we also tested our prediction with a non-parametric functional form for the DFE, obtaining estimates of P[B0 | B] in the 8–94% range, with a mean of 43% (Table H-I in S4 File).
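To illustrate how P[D], P[N] and P[B] follow from a parametric DFE of this kind, the Python sketch below draws scaled selection coefficients from a mixture of a reflected gamma and an exponential distribution and counts the share falling in each class. The parameterization (probability of a beneficial draw, mean and shape of the gamma, mean of the exponential) and all parameter values are assumptions chosen for illustration; they are not taken from Eq 8 or from the fitted estimates.

```python
import math
import random
from typing import Tuple


def sample_scaled_coefficient(rng: random.Random, p_b: float, s_d: float,
                              shape: float, s_b: float) -> float:
    """One draw of S: exponential with mean s_b with probability p_b,
    otherwise minus a gamma variate with mean s_d and the given shape."""
    if rng.random() < p_b:
        return rng.expovariate(1.0 / s_b)
    return -rng.gammavariate(shape, s_d / shape)  # scale = mean / shape


def class_proportions(p_b: float, s_d: float, shape: float, s_b: float,
                      draws: int = 200_000, seed: int = 1) -> Tuple[float, float, float]:
    """Monte Carlo estimates of P[D] (S < -1), P[N] (-1 <= S <= 1) and P[B] (S > 1)."""
    rng = random.Random(seed)
    d = n = b = 0
    for _ in range(draws):
        s = sample_scaled_coefficient(rng, p_b, s_d, shape, s_b)
        if s < -1.0:
            d += 1
        elif s > 1.0:
            b += 1
        else:
            n += 1
    return d / draws, n / draws, b / draws


# Hypothetical DFE parameters.
p_d, p_n, p_beneficial = class_proportions(p_b=0.02, s_d=1000.0, shape=0.2, s_b=2.0)
print(f"P[D] = {p_d:.3f}, P[N] = {p_n:.3f}, P[B] = {p_beneficial:.3f}")
```

Under this parameterization P[B] has the closed form p_b × exp(-1/s_b), which gives a quick check on the simulated value.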

3 Discussion

3.1 Beneficial mutations are not necessarily adaptive

This study represents an essential step toward integrating the different evolutionary scales necessary to understand the combined effects of mutation, selection, and drift on genome evolution. In particular, we have been able to quantify the proportion of beneficial mutations that are non-adaptive (i.e., not a response to a change in fitness landscape), which has only been achievable by combining exome-wide data from both phylogenetic and population scales. At the phylogenetic scale, codon diversity at each site of a protein-coding DNA alignment allows for reconstructing an amino-acid fitness landscape, assuming that this landscape is stable along the phylogenetic tree. These amino-acid fitness landscapes allow us to predict any mutation’s selection coefficient (S0) along a protein-coding sequence. We have compared these selective effects to observations at the population level, and by doing so, we have confirmed that mutations predicted to be deleterious (D0: S0 < −1) are generally purified away in extant populations. Our results concur with previous studies showing that SIFT scores [47, 48], based on amino acid alignments across species, also inform on the deleterious fitness effects exerted at the population scale [25]. However, contrary to SIFT scores, our mutation-selection model is parameterized by a fitness function such that changes are directly interpretable as fitness effects (see also section 1 in S3 File). In this regard, an interesting prediction of our model is that some deleterious mutations reach fixation due to genetic drift, while beneficial non-adaptive mutations restore states of higher fitness. We have tested this hypothesis and have found that a substantial part of these predicted non-adaptive mutations (B0: S0 > 1) are indeed beneficial in extant populations. We estimate that between 11 and 82% of all beneficial mutations in mammalian populations are not adaptive.
More specifically, in 24 out of 28 populations analyzed, the percentage of beneficial mutations estimated to be non-adaptive falls between 15 and 45%. These results suggest that many beneficial mutations are not adaptive, but rather restore states of higher fitness. Hence, we can correctly estimate the extent of adaptive evolution only if we account for the number of beneficial non-adaptive mutations [49, 50]. We therefore argue that positive selection should be dissociated from adaptive evolution, and that the term adaptive mutations should be reserved for those associated with adaptation to environmental change [12, 13, 51].

3.2 Assumptions and methodological limitations

The exact estimation of the contribution of beneficial non-adaptive mutations to positive selection relies on some hypotheses at both the phylogenetic and population scales and is sensitive to methodological limitations. Indeed, data quality and potentially inadequate modeling choices of both the fitness landscape (at the phylogenetic scale) and fitness effects (at the population scale) might also lead to missed predictions [10]. In practice, we obtained different values of the proportion of non-adaptive beneficial mutations depending on i) the filtering or not of CpG mutations [52], ii) whether we included substitutions in the terminal lineage along with within-population polymorphisms to estimate fitness effects [42], and iii) the model used to infer the fitness effects. It appears that our estimation can be sensitive to model misspecification and overall, while we provide an order of magnitude for the contribution of beneficial non-adaptive mutations to positive selection, methodological improvement on the estimation of the DFE is needed to increase the precision of this value.

To be conservative, we considered mutations as adaptive if they were detected as being under positive selection at the population scale despite being incorrectly predicted as either deleterious (D0) or nearly-neutral (N0: −1 < S0 < 1) from the amino-acid mammalian fitnesses. An example of an incorrectly predicted deleterious mutation (D0) from its fitness landscape could be an amino acid having always been deleterious across mammals, but being advantageous (B) in the current species due to environmental changes or a major shift in their fitness landscape (e.g. domestication). To visualize an example of a wrongly predicted nearly-neutral mutation, we can first imagine a site where only hydrophilic amino acids are accepted because of the protein properties (e.g. a surface site of a globular protein). Let us then assume that such a site is also a target for viruses, hence promoting amino-acid changes which modify the site’s viral affinity [4]. Given the selective pressure favoring amino-acid change, but restricting the possibilities to hydrophilic amino acids, most hydrophilic amino acids will likely be visited along the phylogenetic tree and the mutation-selection model will give high and similar fitnesses to all of them. In such a case, any mutation between hydrophilic amino acids will be wrongly predicted as nearly-neutral (N0), while it is in fact adaptive. In summary, under changing fitness landscapes [53], our phylogenetic mutation-selection model takes an average over fitness changes observed along the phylogeny, causing beneficial mutations (B) to be predicted as either deleterious (D0) or nearly-neutral (N0), therefore mechanically reducing P[B0 | B], and making our estimate conservative.

3.3 Convergent adaptation

If there are several substitutions toward the same amino acid along the mammalian tree (section 1.1 in S2 File), our mutation-selection model cannot formally distinguish a scenario where mammals have fixed deleterious mutations that are reverted in several lineages from more complex scenarios involving convergent adaptation across mammals. In a first scenario, repeated changes of fitness landscapes in the same direction could occur along several lineages, leading to repeated substitutions in multiple lineages (parallel or convergent adaptation). In a second scenario, an environmental change that occurred near the root of placental mammals (∼100 Mys ago), to which extant populations are currently responding independently through weakly adaptive mutations, could also lead to repeated substitutions toward the same amino acids. Importantly, we would usually expect adaptive convergent mutations to be linked to particular converging phenotypes across mammals, and hence, they should not massively affect the whole genome as we find (Fig C and Table D in S2 File). Moreover, after filtering out genes usually associated with recurrent adaptation (e.g. immune genes), we recover an even higher proportion of beneficial non-adaptive mutations (Table D in S4 File). For these reasons, we argue that the signal of predictable positive selection we recover in extant populations is indeed mainly driven by non-adaptive evolution.

3.4 The influence of effective population size

Across the genome, beneficial non-adaptive mutations and deleterious mutations reaching fixation create a balance in which genomes are constantly damaged and restored simultaneously at different loci due to drift. Since the probability of fixation of mutations depends on the effective population size (Ne), the history of Ne plays a crucial role in determining the number of beneficial non-adaptive mutations compensating for deleterious mutations [54]. For example, a population size expansion will increase the efficacy of selection, and a larger proportion of mutations will be beneficial (otherwise effectively neutral), thus increasing the number of beneficial non-adaptive mutations. On the other hand, a population that has experienced a high Ne throughout its history should be closer to an optimal state under a stable fitness landscape, having suffered fewer fixations of deleterious mutations and therefore decreasing the probability of beneficial non-adaptive mutations [55]. Overall, we expect the proportion of beneficial non-adaptive mutations to be more dependent on Ne’s long-term expansions and contractions than on the short-term ones [12, 55].

Moreover, because our model assumes a fixed fitness landscape, it implicitly assumes that Ne is constant along the phylogenetic tree. Fluctuations due to changes in the fitness landscape or in Ne will be averaged out by the assumption of the current model that Ne is constant across lineages. It was recently shown [54], using computationally intensive mutation-selection models with fluctuating Ne, that relaxing the assumption of a constant Ne results in more extreme estimates of amino-acid fitnesses than with the standard model used in this study. In other words, by assuming a constant Ne, we are underpowered to detect beneficial non-adaptive mutations since amino acids will have more similar fitnesses. As a consequence, some of the beneficial non-adaptive mutations currently segregating in populations will be incorrectly classified as nearly-neutral by the mutation-selection model, and thus be wrongly interpreted as adaptive (see previous section). This ultimately results in lower estimates of the proportion of beneficial non-adaptive mutations. Given this inflation of missed predictions due to changes in population size [14, 56, 57], our estimated proportion of beneficial non-adaptive mutations among beneficial ones is likely to be an underestimation.

3.5 The role of epistasis and compensatory mutations

Our model assumes that amino-acid fitness landscapes are site-specific and also independent of one another, whereas under pervasive epistasis, the fitness effect of any mutation at a particular site would depend on the amino acids present at other sites. Epistasis is common for mutations that influence the protein’s physical properties (e.g. conformation, stability, or affinity for ligands) or might arise due to a nonlinear relationship between the protein’s physical properties and fitness [58]. Regardless of its origin, epistasis has been shown to play a role in the evolution of protein-coding genes, with amino-acid residues in contact within a protein or between proteins tending to co-evolve [58–60]. Particularly, the residues in contact co-evolve to become more compatible with each other, generating an entrenchment [61–63]. Epistasis therefore allows for compensatory mutations, which restore fitness through mutations at loci different from where deleterious mutations took place, representing another case of non-adaptive beneficial mutations, but one which is not accounted for by our method. Hence, the beneficial mutations that we classify as putatively adaptive might in fact be compensatory mutations, making our estimation of the rate of non-adaptive beneficial mutations conservative.

Despite epistasis being an important factor in protein evolution, several deep-mutational scanning experiments have revealed that a site-specific fitness landscape predicts the evolution of sequences in nature with considerable accuracy [64–66]. Additionally, the fact that we observe such a high proportion of beneficial non-adaptive mutations suggests that the underlying assumptions of our model, namely site-independence, implying no epistasis, and a static fitness landscape, are a reasonable approximation for the underlying fitness landscape of proteins. Our results imply that the fitness effects of new mutations are mostly conserved across mammalian orthologs, in agreement with other studies showing that for conserved orthologs with similar structures and functions, models without epistasis provide a reasonable estimate of fitness effects in protein-coding genes [67, 68]. Conceptually, the framework presented here, with the addition of a more complex protein fitness landscape at the phylogenetic scale, could be used to infer the relative contribution of compensatory mutations to non-adaptive and adaptive evolution.

3.6 Detecting adaptation above the nearly-neutral background

A long-standing debate in molecular evolution is whether the variation we observe between species in protein-coding genes is primarily due to nearly-neutral mutations reaching fixation by drift or primarily due to adaptation [15, 69–71]. Measuring the “rate of adaptation” in proteins, as pioneered by McDonald & Kreitman [5], has been central to inform this debate [72]. However, the McDonald & Kreitman test detects signatures of accelerated evolution in a given terminal branch compared to an expectation based on polymorphism present in the population. It considers the fraction of substitutions that fix too quickly as “adaptive” [57] despite there being other processes that can lead to their fixation [73–75] and some of these substitutions being beneficial but non-adaptive [12–14, 17]. Here, the expectation is built on the pattern of substitutions across a phylogeny compared to the fitness effects that can be estimated from both substitutions in a terminal lineage and polymorphism in populations. Moreover, the goal is not to detect a fraction of beneficial substitutions (i.e. “adaptive” substitutions for McDonald & Kreitman [5]), but to estimate the proportion of non-adaptive mutations among beneficial ones.

We provide evidence that in mammalian orthologs, many substitutions occur through fixation of both deleterious mutations and beneficial non-adaptive mutations. Detecting adaptation above this background of substitutions remains a challenge [69, 76]. Mathematically, the surplus of positive selection due to an externally-driven changing fitness landscape is called fitness flux, and requires experimentally measuring the selection coefficient of each mutation in each genetic background. The fitness flux can be estimated either from a known substitution history [13] or from changes in frequency of currently segregating variants [51]. Without experimentally measured selection coefficients, another strategy is precisely to use a nearly-neutral substitution model as a null model of evolution. Under a strictly neutral evolution of protein-coding sequences, we expect the ratio of non-synonymous over synonymous substitutions (dN/dS) to be equal to one. Deviations from this neutral expectation, such as dN/dS > 1, which can be generated by an excess of non-synonymous substitutions, are generally interpreted as a sign of adaptation. However, as shown in this study, a dN/dS > 1 is not necessarily a signature of adaptation but can be due to beneficial non-adaptive mutations. So, by relaxing the strict neutrality and assuming a stable fitness landscape instead, one can predict the expected rate of evolution, called ω0 [77, 78]. Adaptation can thus be considered as evolution under a changing fitness landscape and tested as such by searching for the signature of dN/dS > ω0 [19, 30, 79]. Using a stable fitness landscape as a null model of evolution, thus accounting for selective constraints exerted on the different amino acids, increased the statistical power in testing for adaptation [46].
Instead of relying solely on summary statistics (such as dN/dS or ω0), another strategy to detect adaptation is to include changes in the fitness landscapes inherently within the mutation-selection framework, either with small changes along the phylogeny [80] or by allowing fitness to change on subsets of branches [81, 82]. Such mechanistic models could be more general than site-specific fitness landscapes, including epistasis and changing fitness landscapes [62, 82].

3.7 Conclusions

We have provided empirical evidence that an evolutionary model assuming a stable fitness landscape at the mammalian scale allows us to predict the fitness effects of mutations in extant populations and individuals, acknowledging the balance between deleterious and beneficial non-adaptive mutations. We argue that such a model would represent a null expectation for the evolution of protein-coding genes in the absence of adaptation. Altogether, because a substantial part of positive selection can be explained by beneficial non-adaptive mutations, but not its entirety, we argue that the mammalian exome is shaped by both adaptive and non-adaptive processes, and that none of them alone is sufficient to explain the observed patterns of changes. In that sense, to avoid conflating beneficial mutations with adaptive evolution, the term “adaptation” should retain its original meaning associated with a change in the underlying fitness landscape and be modelled as such [13, 51].

4 Materials & methods

4.1 Phylogenetic dataset

Protein-coding DNA sequence alignments in placental mammals and their corresponding gene trees come from the OrthoMaM database (https://www.orthomam.univ-montp2.fr) and were processed as in Latrille et al. [46]. OrthoMaM contains a total of 116 mammalian reference sequences in v10c [33, 34, 83].

Genes located on the X and Y chromosomes and on the mitochondrial genome were discarded from the analysis because the level of polymorphism, which is necessary for population-based analyses, is expected to be different in these three regions compared to the autosomal genome. Sequences of species for which we used population-level polymorphism (see section 4.3), and those of their sister species, were removed from the analysis to ensure independence between the data used in the phylogenetic and population scales. Sites in the alignment containing more than 10% of gaps across the species were discarded. Altogether, our genome-wide dataset contains 14,509 protein-coding DNA sequences in 87 placental mammals.

4.2 Selection coefficient (S0) in a phylogeny-based method

We analyzed the phylogenetic-level data using mutation-selection models. These models assume the protein-coding sequences are at mutation-selection balance under a fixed fitness landscape characterized by a fitness vector over the 20 amino acids at each site [26, 28, 84]. Mathematically, the rate of non-synonymous substitution from codon a to codon b (qab(i)) at site i of the sequence is equal to the rate of mutation of the underlying nucleotide change (μab) multiplied by the scaled probability of mutation fixation (Pab(i)). The probability of fixation depends on the difference between the scaled fitness of the amino acid encoded by the mutated codon (Fb(i)) and the amino acid encoded by the original codon (Fa(i)) at site i [85, 86].

The rate of substitution from codon a to b at a site i is thus:

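$$q_{a \to b}^{(i)} = \begin{cases} 0 & \text{if codons } a \text{ and } b \text{ are more than one mutation away,} \\ \mu_{ab} & \text{if codons } a \text{ and } b \text{ are synonymous, and} \\ \mu_{ab} \dfrac{F_b^{(i)} - F_a^{(i)}}{1 - e^{F_a^{(i)} - F_b^{(i)}}} & \text{if codons } a \text{ and } b \text{ are non-synonymous.} \end{cases} \tag{1}$$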
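As an illustration of Eq 1, the substitution rate between two codons can be sketched in a few lines of Python. This is a minimal sketch and not the BayesCode implementation: the function name, its arguments and the numerical guards (the neutral limit when the two fitnesses are equal, and the cutoff for very deleterious changes) are assumptions introduced here.

```python
import math


def substitution_rate(mu_ab, f_a, f_b, synonymous, n_nucleotide_diffs=1):
    """Rate of substitution from codon a to codon b at one site (Eq 1).

    mu_ab: mutation rate of the underlying nucleotide change.
    f_a, f_b: scaled fitness of the amino acids encoded by codons a and b.
    All names are hypothetical; this is only a sketch of the formula.
    """
    if n_nucleotide_diffs != 1:
        return 0.0  # codons more than one mutation away
    if synonymous:
        return mu_ab  # synonymous changes are treated as neutral
    delta_f = f_b - f_a
    if abs(delta_f) < 1e-8:
        return mu_ab  # limit of x / (1 - exp(-x)) when x -> 0 is 1
    if delta_f < -700.0:
        return 0.0  # exp() would overflow; the rate is effectively zero
    return mu_ab * delta_f / (1.0 - math.exp(-delta_f))


rate_forward = substitution_rate(1.0, 0.0, 2.0, synonymous=False)
rate_backward = substitution_rate(1.0, 2.0, 0.0, synonymous=False)
```

With equal mutation rates, the ratio of the forward to the backward rate equals exp(Fb − Fa), which is a convenient sanity check of the formula.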

Fitting the mutation-selection model on a multi-species sequence alignment leads to an estimation of the gene-wide 4 × 4 nucleotide mutation rate matrix (μ) as well as the 20 amino-acid fitness landscape (F(i)) at each site i. The priors and full configuration of the model are given in S1 File (section 1). From a technical perspective, the Bayesian estimation is a two-step procedure [87]. The first step is a data augmentation of the alignment, consisting in sampling a detailed substitution history along the phylogenetic tree for each site, given the current value of the model parameters. In the second step, the parameters of the model can then be directly updated by a Gibbs sampling procedure, conditional on the current substitution history. Alternating between these two sampling steps yields a Markov chain Monte-Carlo (MCMC) procedure whose equilibrium distribution is the posterior probability density of interest [87, 88]. Additionally, across-site heterogeneities in amino-acid fitness profiles are captured by a Dirichlet process. More precisely, the number of amino-acid fitness profiles estimated is lower than the number of sites in the alignment. Consequently each profile has several sites assigned to it, resulting in a particular configuration of the Dirichlet process. Conversely, sites with similar signatures are assigned to the same fitness profile. This configuration of the Dirichlet process is resampled through the MCMC to estimate a posterior distribution of amino acid profiles for each site specifically [35, 89]. From a more mechanistic perspective, even though not all amino acids occur at every single codon site of the DNA alignment, we can nevertheless estimate the distribution of amino-acid fitnesses by generalizing the information recovered across sites and across amino acids based on the phylogenetic relationship among samples. 
In particular, synonymous substitutions along the tree contain the signal to estimate branch lengths and the nucleotide transition matrix, while non-synonymous substitutions contain information on fitness difference between codons connected by single nucleotide changes [35].

The selection coefficient for a mutation from codon a to codon b at site i is defined as:

$$S_0^{(i)}(a \to b) = \Delta F^{(i)} = F_b^{(i)} - F_a^{(i)}. \tag{2}$$

In our subsequent derivation the source (a) and target (b) codons as well as the site (i) are implicit and thus never explicitly written.

The scaled selection coefficient (S0 = ΔF) is formally the product of the selection coefficient at the individual level (s) and the effective population size (Ne), as S0 = 4Ne × s. The value of S0 informs us on the strength of selection exerted on amino-acid changes. Thus, according to its S0 value, we can classify any mutation as either a deleterious mutation toward a less fit amino acid (D0: S0 < −1), a nearly-neutral mutation (N0: −1 < S0 < 1), or a mutation toward a known fitter amino acid, constituting thus a beneficial non-adaptive mutation (B0: S0 > 1).
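The classification above can be written as a small function. This is a minimal sketch: the function name and the string labels are hypothetical, and assigning the boundary values S0 = −1 and S0 = 1 to the nearly-neutral class is an assumption, since the text only defines the classes with strict inequalities.

```python
def classify_s0(s0):
    """Classify a mutation from its scaled selection coefficient S0 = Fb - Fa."""
    if s0 < -1.0:
        return "D0"  # deleterious, toward a less fit amino acid
    if s0 > 1.0:
        return "B0"  # beneficial non-adaptive, toward a known fitter amino acid
    return "N0"  # nearly-neutral (boundary values placed here by assumption)


# Hypothetical scaled fitnesses of two amino acids at one site.
fitness = {"K": 0.3, "R": 2.1}
label_k_to_r = classify_s0(fitness["R"] - fitness["K"])
label_r_to_k = classify_s0(fitness["K"] - fitness["R"])
```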

We used the Bayesian software BayesCode (https://github.com/ThibaultLatrille/bayescode, v1.3.1) to estimate the selection coefficients for each protein-coding gene in the mammalian dataset. We ran the MCMC algorithm implemented in BayesCode for 2,000 generations as described in Latrille et al. [46]. For each gene, after discarding a burn-in period of 1,000 generations of MCMC, we obtained posterior mean estimates (over the 1,000 generations left of MCMC) of the mutation rate matrix (μ) as well as the 20 amino-acid fitness landscape (F(i)) at each site i.
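The posterior mean after burn-in amounts to averaging the retained MCMC samples parameter by parameter. The sketch below assumes the chain is available as a list of equally sized parameter vectors; it does not read BayesCode output files, and the function name is hypothetical.

```python
def posterior_mean(samples, burn_in):
    """Average each parameter over the MCMC samples kept after burn-in.

    samples: list of parameter vectors, one per MCMC generation.
    burn_in: number of initial generations to discard.
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after discarding the burn-in")
    n_params = len(kept[0])
    return [sum(sample[j] for sample in kept) / len(kept) for j in range(n_params)]


# Toy chain of 4 generations with 2 parameters; the first 2 are discarded.
toy_chain = [[9.0, 9.0], [5.0, 7.0], [1.0, 2.0], [3.0, 4.0]]
toy_mean = posterior_mean(toy_chain, burn_in=2)
```

In the setting described above, the chain would hold 2,000 generations and `burn_in` would be 1,000.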

4.3 Polymorphism dataset

The genetic variants representing the population level polymorphisms were obtained from the following species and their available datasets: Equus caballus (EquCab2 assembly in the EVA study PRJEB9799 [90]), Bos taurus (UMD3.1 assembly in the NextGen project: https://projects.ensembl.org/nextgen/), Ovis aries (Oar_v3.1 assembly in the NextGen project), Capra hircus (CHIR1 assembly in the NextGen project, converted to ARS1 assembly with dbSNP identifiers [91]), Chlorocebus sabaeus (ChlSab1.1 assembly in the EVA project PRJEB22989 [92]), Homo sapiens (GRCh38 assembly in the 1000 Genomes Project [93]). In total, we analyzed 28 populations across the 6 different species with polymorphism data. The data was processed as described in Latrille et al. [46].

Only bi-allelic single nucleotide polymorphisms (SNPs) found within a gene were retained in our polymorphism dataset, while nonsense variants and indels were discarded. To construct the dataset, we first recovered the location of each SNP (represented by its chromosome, position, and strand) in the focal species and matched it to its corresponding position in the coding sequence (CDS) using gene annotation files (GTF format) downloaded from Ensembl (ensembl.org). We then verified that the SNP downloaded from Ensembl matched the reference in the CDS in FASTA format. Next, the position in the CDS was converted to the corresponding position in the multi-species sequence alignment (containing gaps) from the OrthoMaM database (see section 4.1) for the corresponding gene by doing a global pairwise alignment (Biopython module pairwise2). This conversion from genomic position to alignment position was only possible when the assembly used for SNP-calling was the same as the one used in the OrthoMaM alignment, the GTF annotations, and the FASTA sequences. SNPs were polarized using the three closest outgroups found in the OrthoMaM alignment with est-usfs v2.04 [94], and alleles with a probability of being derived lower than 0.99 were discarded.
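The conversion from a CDS position to an alignment column can be illustrated in the simplest case, where the focal species' row of the alignment is exactly the CDS with gap characters inserted. This is a simplifying assumption made for the sketch: the procedure described above relies on a global pairwise alignment, which also handles sequences that do not match exactly. The function name is hypothetical.

```python
def cds_to_alignment_positions(aligned_seq, gap="-"):
    """Map each ungapped CDS position to its column in the gapped alignment.

    Returns a list such that positions[i] is the 0-based alignment column of
    the i-th nucleotide of the CDS. Assumes the aligned row, once gaps are
    removed, is identical to the CDS.
    """
    return [col for col, char in enumerate(aligned_seq) if char != gap]


aligned_row = "ATG---GCCTGA"
positions = cds_to_alignment_positions(aligned_row)
# A SNP at CDS position 3 (0-based, the first G of GCC) sits at this column:
snp_column = positions[3]
```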

4.4 Mutational opportunities

The mutational opportunities of any new mutation refer to its likelihood of falling into a specific category (synonymous, deleterious, nearly-neutral, or beneficial). Deriving such opportunities is necessary to estimate the strength of selection exerted at the population scale since different categories might have different mutational opportunities, and thus polymorphism and divergence need to be corrected accordingly (see sections 4.5, 4.6, and 4.7). To calculate mutational opportunities, we reconstructed the ancestral exome of each of the 28 populations by using the most likely ancestral state from est-usfs (see section 4.3), which differs from the corresponding species reference exome since it accounts for the variability present in the specific population.

From the reconstructed ancestral exome, all possible mutations were computed, weighted by the instantaneous rate of change between nucleotides obtained from the mutation rate matrix (μ, see section 4.2), summing to μtot across the whole exome, and to μsyn when restricted to synonymous mutations. Finally, the mutational opportunities for synonymous mutations were computed as the total number of sites across the exome (Ltot) weighted by the proportion of synonymous mutations among all possible mutations as:

$$L_{\mathrm{syn}} = L_{\mathrm{tot}} \frac{\mu_{\mathrm{syn}}}{\mu_{\mathrm{tot}}}. \tag{3}$$

Similarly, for non-synonymous mutations, the total mutation rate for each class of selection x ∈ {D0, N0, B0}, called μ(x), was estimated as the sum across all non-synonymous mutations whose selection coefficient at the phylogenetic scale falls in the class (S0 ∈ x). Accordingly, the mutational opportunities (L(x)) for each class of selection coefficient (x) were computed as the total number of sites across the exome (Ltot) weighted by the ratio of the aggregated mutation rate falling in the class, μ(x), to μtot:

$$L(x) = L_{\mathrm{tot}} \frac{\mu(x)}{\mu_{\mathrm{tot}}}. \tag{4}$$

Finally, P[x] is the probability for a non-synonymous mutation to be in the class x, thus computed as:

$$P[x] = \frac{L(x)}{\sum_{y \in \{D_0, N_0, B_0\}} L(y)}. \tag{5}$$
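Eqs 3 to 5 can be combined in a short function. This is a minimal sketch with hypothetical names; the aggregated mutation rates (μtot, μsyn and μ(x)) are taken as inputs rather than derived from an ancestral exome, and the numbers in the example are invented for illustration.

```python
def mutational_opportunities(l_tot, mu_tot, mu_syn, mu_class):
    """Compute Lsyn (Eq 3), L(x) (Eq 4) and P[x] (Eq 5).

    l_tot: total number of sites across the exome.
    mu_tot: summed rate over all possible mutations.
    mu_syn: summed rate over synonymous mutations.
    mu_class: dict mapping each class ("D0", "N0", "B0") to its summed rate.
    """
    l_syn = l_tot * mu_syn / mu_tot
    l_class = {x: l_tot * mu / mu_tot for x, mu in mu_class.items()}
    l_nonsyn = sum(l_class.values())
    p_class = {x: l / l_nonsyn for x, l in l_class.items()}
    return l_syn, l_class, p_class


# Invented rates, for illustration only.
l_syn, l_class, p_class = mutational_opportunities(
    l_tot=1000.0, mu_tot=10.0, mu_syn=2.5,
    mu_class={"D0": 6.0, "N0": 1.0, "B0": 0.5},
)
```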

4.5 Substitution mapping and dN/dS in the terminal branch

We inferred the protein-coding DNA sequences for each node of the 4-taxa tree containing the focal species and the three closest outgroup species found in the OrthoMaM alignment by applying the M5 codon model (gamma site rate variation) as implemented in FastML.v3.11 [95]. Consequently, for each focal species we reconstructed the protein-coding DNA sequence of the whole exome at the base of the terminal branch before the split from the sister species. We considered Ceratotherium simum simum as Equus caballus’ sister species; Bison bison bison as Bos taurus’ sister species; Pantholops hodgsonii as Ovis aries’ sister species; Pantholops hodgsonii as Capra hircus’ sister species; Macaca mulatta as Chlorocebus sabaeus’ sister species and finally, we considered Pan troglodytes as Homo sapiens’ sister species. From this reconstructed exome, we determined the direction of the substitution occurring along the terminal branch of the phylogenetic tree toward each extant population. SNPs segregating in the population were discarded, and the most likely ancestral state from est-usfs (see section 4.3) was used as the reference for each extant population. For each substitution, we recovered its S0 value as calculated through the phylogeny-based method (see section 4.2). Finally, the rate of non-synonymous over synonymous substitutions for a given class of selection coefficient (x ∈ {D0, N0, B0}) was computed as:

$$\begin{cases} dN(x) = \dfrac{D(x)}{L(x)}, \\ dS = \dfrac{D_{\mathrm{syn}}}{L_{\mathrm{syn}}}, \end{cases} \tag{6}$$

where D(x) was the number of non-synonymous substitutions in class x, Dsyn was the number of synonymous substitutions across the exome, while L(x) and Lsyn were the numbers of non-synonymous and synonymous mutational opportunities, respectively, as defined in section 4.4. δ(dN/dS) was computed as the difference between dN/dS computed over all substitutions and dN/dS when we removed beneficial non-adaptive mutations dN(S0 < 1)/dS, normalized by dN/dS. Note that the quantities δ(dN/dS) and δ(dN) are equivalent due to the simplification of the factor dS:

$$\delta(dN/dS) = \frac{dN/dS - dN(S_0 < 1)/dS}{dN/dS} = \frac{dN - dN(S_0 < 1)}{dN} = \delta(dN). \tag{7}$$
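Eqs 6 and 7 can be sketched as follows. The function names are hypothetical, and one assumption is made explicit here: dN(S0 < 1) is computed by applying Eq 6 to the pooled D0 and N0 classes, that is, as the summed substitutions over the summed mutational opportunities of these two classes. The counts in the example are invented.

```python
def dn_ds_by_class(d_class, l_class, d_syn, l_syn):
    """dN(x)/dS for each class x, following Eq 6."""
    ds = d_syn / l_syn
    return {x: (d_class[x] / l_class[x]) / ds for x in d_class}


def delta_dn(d_class, l_class):
    """Relative drop in dN when B0 substitutions are removed (Eq 7).

    Assumes dN(S0 < 1) pools the D0 and N0 classes (an assumption of this
    sketch). dS cancels out, so it is not needed.
    """
    dn_all = sum(d_class.values()) / sum(l_class.values())
    dn_below_one = (d_class["D0"] + d_class["N0"]) / (l_class["D0"] + l_class["N0"])
    return (dn_all - dn_below_one) / dn_all


# Invented substitution counts and opportunities, for illustration only.
d_counts = {"D0": 30.0, "N0": 50.0, "B0": 20.0}
l_sites = {"D0": 600.0, "N0": 100.0, "B0": 50.0}
omega = dn_ds_by_class(d_counts, l_sites, d_syn=50.0, l_syn=250.0)
delta = delta_dn(d_counts, l_sites)
```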

4.6 Scaled selection coefficients (S) in a population-based method

To obtain a quantitative estimate of the distribution of selection coefficients for each category of SNPs, we used the polyDFE model [42, 96]. This model uses the count of derived alleles to infer the distribution of fitness effects (DFE). The probability of sampling an allele at a given frequency (before fixation or extinction) is informative of its scaled selection coefficient at the population scale (S). Therefore, pooled across many sites, the site-frequency spectrum (SFS) provides information on the underlying S of mutations. However, estimating a single S for all sampled mutations is biologically unrealistic, and a DFE of mutations is usually assumed [39, 40]. The polyDFE [42, 96] software implements a mixture of a Γ and exponential distributions to model the DFE of non-synonymous mutations, while synonymous mutations are considered neutral. The model estimates the parameters βd, b, pb and βb for non-synonymous mutations as:

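$$\phi(S; \beta_d, b, p_b, \beta_b) = \begin{cases} (1 - p_b)\, f_{\Gamma}(-S; -\beta_d, b) & \text{if } S \leq 0, \\ p_b\, f_e(S; \beta_b) & \text{if } S > 0, \end{cases} \tag{8}$$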

where βd ≤ −1 is the estimated mean of the DFE for S ≤ 0; b ≥ 0.2 is the estimated shape of the Γ distribution; 0 ≤ pb ≤ 1 is the estimated probability that S > 0; βb ≥ 1 is the estimated mean of the DFE for S > 0; and fΓ(S;m, b) is the density of the Γ distribution with mean m and shape b, while fe(S;m) is the density of the exponential distribution with mean m.

PolyDFE requires one SFS for non-synonymous mutations and one for synonymous mutations (neutral expectation), as well as the number of sites on which each SFS was sampled. For populations containing more than 8 individuals, the SFS was subsampled down to 16 chromosomes (8 diploid individuals) without replacement (hyper-geometric distribution) to alleviate the effect of different sampling depths in the 28 populations. Altogether, for each class of selection (x ∈ {D0, N0, B0}) of non-synonymous SNPs, we aggregated all the SNPs in the selection class x as an SFS. The number of sites on which each SFS was sampled is given by L(x) for the non-synonymous SFS and Lsyn for the synonymous SFS respectively. For each class of selection x, once fitted to the data using maximum likelihood with polyDFE, the parameters of the DFE (βd, b, pb, βb) were used to compute P[D | x], P[N | x], and P[B | x] as:

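$$P[D \mid x] = P[S < -1 \mid x] = (1 - p_b) \int_{-\infty}^{-1} f_{\Gamma}(-S; -\beta_d, b)\, \mathrm{d}S, \tag{9}$$

$$P[N \mid x] = P[-1 < S < 1 \mid x] = (1 - p_b) \int_{-1}^{0} f_{\Gamma}(-S; -\beta_d, b)\, \mathrm{d}S + p_b \int_{0}^{1} f_e(S; \beta_b)\, \mathrm{d}S, \tag{10}$$

$$P[B \mid x] = P[S > 1 \mid x] = p_b \int_{1}^{+\infty} f_e(S; \beta_b)\, \mathrm{d}S. \tag{11}$$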
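The integrals in Eqs 9 to 11 have closed forms in terms of the Γ and exponential cumulative distribution functions, which the following sketch evaluates with the Python standard library only. This is not polyDFE code: the function names are hypothetical, the regularized incomplete gamma function is computed with a simple power series (adequate for the small arguments b/|βd| expected here, not for very large ones), and the parameter values in the example are invented.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-14, max_terms=10000):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def selection_class_probabilities(beta_d, b, p_b, beta_b):
    """Return (P[D|x], P[N|x], P[B|x]) from the DFE parameters (Eqs 9-11)."""
    mean_del = -beta_d  # mean of the reflected gamma, a positive number
    # A gamma with mean m and shape b has scale m / b, so CDF(1) = P(b, b / m).
    cdf_at_one = regularized_lower_gamma(b, b / mean_del)
    p_deleterious = (1.0 - p_b) * (1.0 - cdf_at_one)
    p_neutral = (1.0 - p_b) * cdf_at_one + p_b * (1.0 - math.exp(-1.0 / beta_b))
    p_beneficial = p_b * math.exp(-1.0 / beta_b)
    return p_deleterious, p_neutral, p_beneficial


# Invented parameter values, for illustration only.
prob_d, prob_n, prob_b = selection_class_probabilities(
    beta_d=-2.0, b=1.0, p_b=0.1, beta_b=1.0
)
```

By construction the three probabilities sum to one, which provides a quick check of any fitted parameter set.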

Rather than relying solely on currently segregating mutations to quantify selection, polyDFE can leverage both divergence and polymorphism to estimate the parameters of the DFE. We can thus add four more inputs to polyDFE: D(x), L(x), Dsyn and Lsyn as defined in the previous section. Because the estimates of DFE are different with this method, we naturally obtained different values of P[D | x], P[N | x], and P[B | x].
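The subsampling of each SFS down to 16 chromosomes, mentioned earlier in this section, can be illustrated with a hypergeometric projection. The sketch below computes the expected subsampled SFS, which is one common way to subsample without replacement; whether the original analysis used this expectation or drew random subsamples is not stated here, so this choice is an assumption, as is the function name.

```python
import math


def project_sfs(sfs, m):
    """Expected SFS after subsampling m chromosomes without replacement.

    sfs[k] is the number of sites with k derived alleles among n chromosomes,
    for k = 0..n (so n = len(sfs) - 1). Returns a list of length m + 1.
    """
    n = len(sfs) - 1
    if m > n:
        raise ValueError("cannot subsample more chromosomes than were sampled")
    n_subsamples = math.comb(n, m)
    projected = [0.0] * (m + 1)
    for k, count in enumerate(sfs):
        for j in range(max(0, m - (n - k)), min(k, m) + 1):
            # Hypergeometric probability of drawing j derived alleles.
            prob = math.comb(k, j) * math.comb(n - k, m - j) / n_subsamples
            projected[j] += count * prob
    return projected


# Toy example: 4 sites, each with 2 derived alleles among 4 chromosomes.
toy_projection = project_sfs([0, 0, 4, 0, 0], m=2)
```

Sites falling in the first or last entry of the projected SFS are no longer polymorphic in the subsample.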

4.7 Precision and recall

For readability, we give here precision and recall for beneficial mutations (B0 and B), but they can be obtained using the same derivation for the deleterious mutations (D0 and D) and nearly-neutral mutations (N0 and N).

Precision is the proportion of mutations correctly predicted as beneficial (P[B ∩ B0]) out of all mutations predicted as beneficial non-adaptive (P[B0]), which can be written as a conditional probability:

$$\frac{P[B \cap B_0]}{P[B_0]} = P[B \mid B_0]. \tag{12}$$

Namely, precision corresponds to the probability for a B0 mutation to be effectively beneficial at the population level (B). This probability, computed from Eq 11, is obtained by restricting our analysis to SNPs that are predicted to be beneficial non-adaptive mutations (yellow fill for the category B0 in Fig 3D).

Recall is the proportion of mutations correctly predicted as beneficial (P[B ∩ B0]) out of all beneficial mutations (P[B]), which can be written as a conditional probability:

$$\frac{P[B \cap B_0]}{P[B]} = P[B_0 \mid B]. \tag{13}$$

Namely, recall corresponds to the probability for a beneficial mutation at the population level (B) to be a beneficial non-adaptive mutation (B0). Using Bayes theorem, recall can be re-written as:

$$P[B_0 \mid B] = \frac{P[B \mid B_0] \times P[B_0]}{P[B]}, \tag{14}$$

where P[B | B0] and P[B0] can be calculated using Eqs 12 and 5, respectively, and P[B] is the probability that a mutation is beneficial at the population level, which can be computed from the law of total probability as:

$$P[B] = \sum_{x \in \{D_0, N_0, B_0\}} P[B \mid x] \times P[x]. \tag{15}$$
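The chain from Eq 12 to Eq 15 can be made concrete with a short Python sketch. The function name `precision_recall` and all numbers in the usage note are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def precision_recall(p_class, p_b_given_class, target="B0"):
    """Precision and recall of the class `target` for beneficial mutations.

    p_class[x]         : P[x], share of mutations predicted in class x
    p_b_given_class[x] : P[B | x], share of class x that is beneficial at the
                         population level (e.g. obtained from Eq 11)
    """
    # Eq 15: law of total probability over the predicted classes.
    p_b = sum(p_b_given_class[x] * p_class[x] for x in p_class)
    # Eq 12: precision is P[B | B0].
    precision = p_b_given_class[target]
    # Eq 14: Bayes' theorem gives recall, P[B0 | B].
    recall = precision * p_class[target] / p_b
    return precision, recall, p_b
```

With illustrative inputs P[x] = {D0: 0.90, N0: 0.09, B0: 0.01} and P[B | x] = {D0: 0.001, N0: 0.01, B0: 0.5}, Eq 15 gives P[B] = 0.0068, so precision is 0.5 and recall is 0.005 / 0.0068, roughly 0.74.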

4.8 Correlation with effective population size (Ne)

Watterson’s estimator of genetic diversity, θS, was obtained for each population from the synonymous SFS as in Achaz [45]. For each population, Ne was estimated from the equation Ne = θS/(4 × u), where u is the mutation rate per site per generation. Estimates for u were averaged per species across the pedigree-based estimates in Bergeron et al. [43] for Homo, Bos, Capra and Chlorocebus. For Ovis, we used the estimated u of Capra. For Equus, we used u as estimated in Orlando et al. [44] (u = 7.24 × 10⁻⁹). Because a correlation must account for phylogenetic relationships and the non-independence of samples, we fitted a Phylogenetic Generalized Linear Model in R with the function pgls (default settings) from the package caper [97]. The mammalian dated tree was obtained from TimeTree [98] and pruned to include only the species analysed in this study, with the multi-furcation of the different populations from each species placed at the same divergence time as the species (section 2.1 in S3 File).
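The Ne estimate above amounts to two short formulas, sketched here in Python. This is a minimal illustration that uses the textbook form of Watterson's estimator (number of segregating sites divided by the harmonic number a_n and by the number of sites); the exact implementation used in the study may differ, and the function names and example numbers are hypothetical.

```python
def watterson_theta(sfs_counts, n_chromosomes, n_sites):
    """Watterson's theta per site from an SFS.

    sfs_counts[i - 1] is the number of SNPs with derived-allele count i,
    for i = 1 .. n_chromosomes - 1 (polymorphic classes only).
    """
    if len(sfs_counts) != n_chromosomes - 1:
        raise ValueError("expected n_chromosomes - 1 SFS entries")
    segregating_sites = sum(sfs_counts)
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))  # harmonic number
    return segregating_sites / (a_n * n_sites)


def effective_population_size(theta, u):
    """Ne = theta / (4 u), with u the mutation rate per site per generation."""
    return theta / (4.0 * u)
```

For example, a per-site θS of 0.005 combined with u = 1.25 × 10⁻⁸ would give Ne = 0.005 / (5 × 10⁻⁸) = 100,000.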

Supporting information

S1 File. Supplementary appendix on the parameterization of Mutation-selection codon models.

Contains 5 pages of supplementary information including 1 figure (Fig A).

(PDF)

pgen.1011536.s001.pdf (304.1KB, pdf)
S2 File. Supplementary appendix on the characterization of non-adaptive beneficial mutations.

Contains 12 pages of supplementary information including 3 figures (Fig A to C) and 6 tables (Table A to F).

(PDF)

pgen.1011536.s002.pdf (750.3KB, pdf)
S3 File. Supplementary appendix on the contrast of selection at the phylogenetic and population-genetic scales.

Contains 7 pages of supplementary information including 8 figures (Fig A to H).

(PDF)

pgen.1011536.s003.pdf (730.3KB, pdf)
S4 File. Supplementary appendix on controls for estimating the proportion of beneficial mutations that are not adaptive.

Contains 21 pages of supplementary information including 9 figures (Fig A to I) and 9 tables (Table A to I).

(PDF)

pgen.1011536.s004.pdf (1.2MB, pdf)

Acknowledgments

We gratefully acknowledge the help of Mélodie Bastian, Nicolas Lartillot, Carina Farah Mugal, Laurent Duret, Alexandre Reymond, Daniele Silvestro and Nicolas Gambardella for their advice and reviews concerning this manuscript. This work was performed using the computing facilities of the CC LBBE/PRABI. This study makes use of data generated by the NextGen Consortium.

Data Availability

The data underlying this article are available at https://doi.org/10.5281/zenodo.7878953. Snakemake pipeline, analysis scripts and documentation are available at https://github.com/ThibaultLatrille/SelCoeff.

Funding Statement

This work was funded by Faculté de Biologie et de Médecine, Université de Lausanne (https://www.unil.ch; to TL, DAH and NS), Swiss National Science Fund (https://www.snf.ch; grant 310030-185223 to NS) and Agence Nationale de la Recherche (https://anr.fr/; grant ANR-19-CE12-0019 / HotRec to JJ). The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Darwin C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. vol. 220. John Murray; 1859.
  • 2. Merrell DJ. The Adaptive Seascape: The Mechanism of Evolution. U of Minnesota Press; 1994.
  • 3. Gavrilets S, Losos JB. Adaptive Radiation: Contrasting Theory with Data. Science. 2009;323(5915):732–737. doi: 10.1126/science.1157966
  • 4. Enard D, Cai L, Gwennap C, Petrov DA. Viruses Are a Dominant Driver of Protein Adaptation in Mammals. eLife. 2016;5:e12469. doi: 10.7554/eLife.12469
  • 5. McDonald JH, Kreitman M. Adaptive Protein Evolution at the Adh Locus in Drosophila. Nature. 1991;351(6328):652–654. doi: 10.1038/351652a0
  • 6. Smith NGC, Eyre-Walker A. Adaptive Protein Evolution in Drosophila. Nature. 2002;415(6875):1022–1024. doi: 10.1038/4151022a
  • 7. Welch JJ. Estimating the Genomewide Rate of Adaptive Protein Evolution in Drosophila. Genetics. 2006;173(2):821–837. doi: 10.1534/genetics.106.056911
  • 8. Yang Z, Bielawski JP. Statistical Methods for Detecting Molecular Adaptation. Trends in Ecology and Evolution. 2000;15(12):496–503. doi: 10.1016/S0169-5347(00)01994-7
  • 9. Eyre-Walker A. The Genomic Rate of Adaptive Evolution. Trends in Ecology & Evolution. 2006;21(10):569–575. doi: 10.1016/j.tree.2006.06.015
  • 10. Moutinho AF, Bataillon T, Dutheil JY. Variation of the Adaptive Substitution Rate between Species and within Genomes. Evolutionary Ecology. 2019;34(3):315–338. doi: 10.1007/s10682-019-10026-z
  • 11. Lynch M. Mutation Pressure, Drift, and the Pace of Molecular Coevolution. Proceedings of the National Academy of Sciences. 2023;120(27):e2306741120. doi: 10.1073/pnas.2306741120
  • 12. Charlesworth J, Eyre-Walker A. The Other Side of the Nearly Neutral Theory, Evidence of Slightly Advantageous Back-Mutations. Proceedings of the National Academy of Sciences. 2007;104(43):16992–16997. doi: 10.1073/pnas.0705456104
  • 13. Mustonen V, Lässig M. From Fitness Landscapes to Seascapes: Non-Equilibrium Dynamics of Selection and Adaptation. Trends in Genetics. 2009;25(3):111–119. doi: 10.1016/j.tig.2009.01.002
  • 14. Jones CT, Youssef N, Susko E, Bielawski JP. Shifting Balance on a Static Mutation–Selection Landscape: A Novel Scenario of Positive Selection. Molecular Biology and Evolution. 2017;34(2):391–407.
  • 15. Ohta T. The Nearly Neutral Theory of Molecular Evolution. Annual Review of Ecology and Systematics. 1992;23(1992):263–286. doi: 10.1146/annurev.es.23.110192.001403
  • 16. Gillespie JH. On Ohta’s Hypothesis: Most Amino Acid Substitutions Are Deleterious. Journal of Molecular Evolution. 1995;40(1):64–69. doi: 10.1007/BF00166596
  • 17. Hartl DL, Taubes CH. Compensatory Nearly Neutral Mutations: Selection without Adaptation. Journal of Theoretical Biology. 1996;182(3):303–309. doi: 10.1006/jtbi.1996.0168
  • 18. Sella G, Hirsh AE. The Application of Statistical Physics to Evolutionary Biology. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(27):9541–9546. doi: 10.1073/pnas.0501865102
  • 19. Cvijović I, Good BH, Jerison ER, Desai MM. Fate of a Mutation in a Fluctuating Environment. Proceedings of the National Academy of Sciences. 2015;112(36):E5021–E5028.
  • 20. Piganeau G, Eyre-Walker A. Estimating the Distribution of Fitness Effects from DNA Sequence Data: Implications for the Molecular Clock. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(18):10335–10340. doi: 10.1073/pnas.1833064100
  • 21. Chi PB, Kosater WM, Liberles DA. Detecting Signatures of Positive Selection against a Backdrop of Compensatory Processes. Molecular Biology and Evolution. 2020;37(11):3353–3362. doi: 10.1093/molbev/msaa161
  • 22. Bazykin GA. Changing Preferences: Deformation of Single Position Amino Acid Fitness Landscapes and Evolution of Proteins. Biology Letters. 2015;11(10):20150315. doi: 10.1098/rsbl.2015.0315
  • 23. Moses AM, Durbin R. Inferring Selection on Amino Acid Preference in Protein Domains. Molecular Biology and Evolution. 2009;26(3):527–536. doi: 10.1093/molbev/msn286
  • 24. Fischer A, Greenman C, Mustonen V. Germline Fitness-Based Scoring of Cancer Mutations. Genetics. 2011;188(2):383–393. doi: 10.1534/genetics.111.127480
  • 25. Chen J, Bataillon T, Glémin S, Lascoux M. Hunting for Beneficial Mutations: Conditioning on SIFT Scores When Estimating the Distribution of Fitness Effect of New Mutations. Genome Biology and Evolution. 2021.
  • 26. Halpern AL, Bruno WJ. Evolutionary Distances for Protein-Coding Sequences: Modeling Site-Specific Residue Frequencies. Molecular Biology and Evolution. 1998;15(7):910–917. doi: 10.1093/oxfordjournals.molbev.a025995
  • 27. McCandlish DM, Stoltzfus A. Modeling Evolution Using the Probability of Fixation: History and Implications. Quarterly Review of Biology. 2014;89(3):225–252. doi: 10.1086/677571
  • 28. Rodrigue N, Philippe H. Mechanistic Revisions of Phenomenological Modeling Strategies in Molecular Evolution. Trends in Genetics. 2010;26(6):248–252. doi: 10.1016/j.tig.2010.04.001
  • 29. Tamuri AU, Goldstein RA. Estimating the Distribution of Selection Coefficients from Phylogenetic Data Using Sitewise Mutation-Selection Models. Genetics. 2012;190(3):1101–1115. doi: 10.1534/genetics.111.136432
  • 30. Rodrigue N, Lartillot N. Detecting Adaptation in Protein-Coding Genes Using a Bayesian Site-Heterogeneous Mutation-Selection Codon Substitution Model. Molecular Biology and Evolution. 2017;34(1):204–214. doi: 10.1093/molbev/msw220
  • 31. Foley NM, Mason VC, Harris AJ, Bredemeyer KR, Damas J, Lewin HA, et al. A Genomic Timescale for Placental Mammal Evolution. Science. 2023;380(6643):eabl8189. doi: 10.1126/science.abl8189
  • 32. Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, et al. Ensembl 2021. Nucleic Acids Research. 2021;49(D1):D884–D891. doi: 10.1093/nar/gkaa942
  • 33. Ranwez V, Delsuc F, Ranwez S, Belkhir K, Tilak MK, Douzery EJ. OrthoMaM: A Database of Orthologous Genomic Markers for Placental Mammal Phylogenetics. BMC Evolutionary Biology. 2007;7(1):1–12. doi: 10.1186/1471-2148-7-241
  • 34. Scornavacca C, Belkhir K, Lopez J, Dernat R, Delsuc F, Douzery EJP, et al. OrthoMaM V10: Scaling-up Orthologous Coding Sequence and Exon Alignments with More than One Hundred Mammalian Genomes. Molecular Biology and Evolution. 2019;36(4):861–862. doi: 10.1093/molbev/msz015
  • 35. Rodrigue N, Philippe H, Lartillot N. Mutation-Selection Models of Coding Sequence Evolution with Site-Heterogeneous Amino Acid Fitness Profiles. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(10):4629–34. doi: 10.1073/pnas.0910915107
  • 36. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Research. 2018;46(D1):D1062–D1067. doi: 10.1093/nar/gkx1153
  • 37. Grimm DG, Azencott CA, Aicheler F, Gieraths U, MacArthur DG, Samocha KE, et al. The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity. Human Mutation. 2015;36(5):513–523. doi: 10.1002/humu.22768
  • 38. Sullivan PF, Meadows JRS, Gazal S, Phan BN, Li X, Genereux DP, et al. Leveraging Base-Pair Mammalian Constraint to Understand Genetic Variation and Human Disease. Science. 2023;380(6643):eabn2937. doi: 10.1126/science.abn2937
  • 39. Eyre-Walker A, Woolfit M, Phelps T. The Distribution of Fitness Effects of New Deleterious Amino Acid Mutations in Humans. Genetics. 2006;173(2):891–900. doi: 10.1534/genetics.106.057570
  • 40. Eyre-Walker A, Keightley PD. Estimating the Rate of Adaptive Molecular Evolution in the Presence of Slightly Deleterious Mutations and Population Size Change. Molecular Biology and Evolution. 2009;26(9):2097–2108. doi: 10.1093/molbev/msp119
  • 41. Galtier N. Adaptive Protein Evolution in Animals and the Effective Population Size Hypothesis. PLoS Genetics. 2016;12(1):e1005774. doi: 10.1371/journal.pgen.1005774
  • 42. Tataru P, Mollion M, Glémin S, Bataillon T. Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data. Genetics. 2017;207(3):1103–1119. doi: 10.1534/genetics.117.300323
  • 43. Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B, et al. Evolution of the Germline Mutation Rate across Vertebrates. Nature. 2023; p. 1–7. doi: 10.1038/s41586-023-05752-y
  • 44. Orlando L, Ginolhac A, Zhang G, Froese D, Albrechtsen A, Stiller M, et al. Recalibrating Equus Evolution Using the Genome Sequence of an Early Middle Pleistocene Horse. Nature. 2013;499(7456):74–78. doi: 10.1038/nature12323
  • 45. Achaz G. Frequency Spectrum Neutrality Tests: One for All and All for One. Genetics. 2009;183(1):249–258. doi: 10.1534/genetics.109.104042
  • 46. Latrille T, Rodrigue N, Lartillot N. Genes and Sites under Adaptation at the Phylogenetic Scale Also Exhibit Adaptation at the Population-Genetic Scale. Proceedings of the National Academy of Sciences of the United States of America. 2023;120(11):e2214977120. doi: 10.1073/pnas.2214977120
  • 47. Ng PC, Henikoff S. SIFT: Predicting Amino Acid Changes That Affect Protein Function. Nucleic Acids Research. 2003;31(13):3812–3814. doi: 10.1093/nar/gkg509
  • 48. Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC. SIFT Missense Predictions for Genomes. Nature Protocols. 2016;11(1):1–9. doi: 10.1038/nprot.2015.123
  • 49. Keightley PD, Eyre-Walker A. What Can We Learn about the Distribution of Fitness Effects of New Mutations from DNA Sequence Data? Philosophical Transactions of the Royal Society B: Biological Sciences. 2010;365(1544):1187–1193. doi: 10.1098/rstb.2009.0266
  • 50. Rice DP, Good BH, Desai MM. The Evolutionarily Stable Distribution of Fitness Effects. Genetics. 2015;200(1):321–329. doi: 10.1534/genetics.114.173815
  • 51. Mustonen V, Lässig M. Fitness Flux and Ubiquity of Adaptive Evolution. Proceedings of the National Academy of Sciences. 2010;107(9):4248–4253. doi: 10.1073/pnas.0907953107
  • 52. Eyre-Walker A, Eyre-Walker YC. How Much of the Variation in the Mutation Rate along the Human Genome Can Be Explained? G3: Genes, Genomes, Genetics. 2014;4(9):1667–1670. doi: 10.1534/g3.114.012849
  • 53. Mustonen V, Lässig M. Molecular Evolution under Fitness Fluctuations. Physical Review Letters. 2008;100(10):108101. doi: 10.1103/PhysRevLett.100.108101
  • 54. Latrille T, Lanore V, Lartillot N. Inferring Long-Term Effective Population Size with Mutation–Selection Models. Molecular Biology and Evolution. 2021;38(10):4573–4587. doi: 10.1093/molbev/msab160
  • 55. Huber CD, Kim BY, Marsden CD, Lohmueller KE. Determining the Factors Driving Selective Effects of New Nonsynonymous Mutations. Proceedings of the National Academy of Sciences. 2017;114(17):4465–4470. doi: 10.1073/pnas.1619508114
  • 56. Lanfear R, Kokko H, Eyre-Walker A. Population Size and the Rate of Evolution. Trends in Ecology and Evolution. 2014;29(1):33–41. doi: 10.1016/j.tree.2013.09.009
  • 57. Platt A, Weber CC, Liberles DA. Protein Evolution Depends on Multiple Distinct Population Size Parameters. BMC Evolutionary Biology. 2018;18(1):1–9. doi: 10.1186/s12862-017-1085-x
  • 58. Starr TN, Thornton JW. Epistasis in Protein Evolution. Protein Science. 2016;25(7):1204–1218. doi: 10.1002/pro.2897
  • 59. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct-Coupling Analysis of Residue Coevolution Captures Native Contacts across Many Protein Families. Proceedings of the National Academy of Sciences. 2011;108(49):E1293–E1301. doi: 10.1073/pnas.1111471108
  • 60. Marks DS, Hopf TA, Sander C. Protein Structure Prediction from Sequence Variation. Nature Biotechnology. 2012;30(11):1072–1080. doi: 10.1038/nbt.2419
  • 61. Goldstein RA, Pollard ST, Shah SD, Pollock DD. Nonadaptive Amino Acid Convergence Rates Decrease over Time. Molecular Biology and Evolution. 2015;32(6):1373–1381. doi: 10.1093/molbev/msv041
  • 62. Goldstein RA, Pollock DD. Sequence Entropy of Folding and the Absolute Rate of Amino Acid Substitutions. Nature Ecology & Evolution. 2017;1(12):1923–1930. doi: 10.1038/s41559-017-0338-9
  • 63. Park Y, Metzger BPH, Thornton JW. Epistatic Drift Causes Gradual Decay of Predictability in Protein Evolution. Science. 2022;376(6595):823–830. doi: 10.1126/science.abn6895
  • 64. Ashenberg O, Gong LI, Bloom JD. Mutational Effects on Stability Are Largely Conserved during Protein Evolution. Proceedings of the National Academy of Sciences of the United States of America. 2013;110(52):21071–21076. doi: 10.1073/pnas.1314781111
  • 65. Doud MB, Ashenberg O, Bloom JD. Site-Specific Amino Acid Preferences Are Mostly Conserved in Two Closely Related Protein Homologs. Molecular Biology and Evolution. 2015;32(11):2944–2960. doi: 10.1093/molbev/msv167
  • 66. Bloom JD. Identification of Positive Selection in Genes Is Greatly Improved by Using Experimentally Informed Site-Specific Models. Biology Direct. 2017;12(1):1–24. doi: 10.1186/s13062-016-0172-z
  • 67. Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Molecular Biology and Evolution. 2020. doi: 10.1093/molbev/msaa151
  • 68. Vigué L, Croce G, Petitjean M, Ruppé E, Tenaillon O, Weigt M. Deciphering Polymorphism in 61,157 Escherichia Coli Genomes via Epistatic Sequence Landscapes. Nature Communications. 2022;13(1):4030. doi: 10.1038/s41467-022-31643-3
  • 69. Kimura M. Evolutionary Rate at the Molecular Level. Nature. 1968;217(5129):624–626. doi: 10.1038/217624a0
  • 70. Gillespie JH. Substitution Processes in Molecular Evolution. III. Deleterious Alleles. Genetics. 1994;138(3):943–952. doi: 10.1093/genetics/138.3.943
  • 71. Jensen JD, Payseur BA, Stephan W, Aquadro CF, Lynch M, Charlesworth D, et al. The Importance of the Neutral Theory in 1968 and 50 Years on: A Response to Kern and Hahn 2018. Evolution. 2019;73(1):111–114. doi: 10.1111/evo.13650
  • 72. Galtier N. Half a Century of Controversy: The Neutralist/Selectionist Debate in Molecular Evolution. Genome Biology and Evolution. 2024;16(2):evae003. doi: 10.1093/gbe/evae003
  • 73. Galtier N, Duret L. Adaptation or Biased Gene Conversion? Extending the Null Hypothesis of Molecular Evolution. Trends in Genetics. 2007;23(6):273–277. doi: 10.1016/j.tig.2007.03.011
  • 74. Rousselle M, Laverré A, Figuet E, Nabholz B, Galtier N. Influence of Recombination and GC-biased Gene Conversion on the Adaptive and Nonadaptive Substitution Rate in Mammals versus Birds. Molecular Biology and Evolution. 2019;36(3):458–471. doi: 10.1093/molbev/msy243
  • 75. Joseph J. Increased Positive Selection in Highly Recombining Genes Does Not Necessarily Reflect an Evolutionary Advantage of Recombination. Molecular Biology and Evolution. 2024; p. msae107. doi: 10.1093/molbev/msae107
  • 76. Ohta T, Gillespie JH. Development of Neutral and Nearly Neutral Theories. Theoretical Population Biology. 1996;49(2):128–142. doi: 10.1006/tpbi.1996.0007
  • 77. Spielman SJ, Wilke CO. The Relationship between dN/dS and Scaled Selection Coefficients. Molecular Biology and Evolution. 2015;32(4):1097–1108. doi: 10.1093/molbev/msv003
  • 78. Dos Reis M. How to Calculate the Non-Synonymous to Synonymous Rate Ratio of Protein-Coding Genes under the Fisher-Wright Mutation-Selection Framework. Biology Letters. 2015;11(4):20141031. doi: 10.1098/rsbl.2014.1031
  • 79. Rodrigue N, Latrille T, Lartillot N. A Bayesian Mutation-Selection Framework for Detecting Site-Specific Adaptive Evolution in Protein-Coding Genes. Molecular Biology and Evolution. 2021;38(3):1199–1208. doi: 10.1093/molbev/msaa265
  • 80. Tamuri AU, dos Reis M. A Mutation-Selection Model of Protein Evolution under Persistent Positive Selection. Molecular Biology and Evolution. 2021.
  • 81. Kazmi SO, Rodrigue N. Detecting Amino Acid Preference Shifts with Codon-Level Mutation-Selection Mixture Models. BMC Evolutionary Biology. 2019;19(1):62. doi: 10.1186/s12862-019-1358-7
  • 82. Stolyarova AV, Nabieva E, Ptushenko VV, Favorov AV, Popova AV, Neverov AD, et al. Senescence and Entrenchment in Evolution of Amino Acid Sites. Nature Communications. 2020;11(1):4603. doi: 10.1038/s41467-020-18366-z
  • 83. Douzery EJP, Scornavacca C, Romiguier J, Belkhir K, Galtier N, Delsuc F, et al. OrthoMaM v8: A Database of Orthologous Exons and Coding Sequences for Comparative Genomics in Mammals. Molecular Biology and Evolution. 2014;31(7):1923–1928. doi: 10.1093/molbev/msu132
  • 84. Yang Z, Nielsen R. Mutation-Selection Models of Codon Substitution and Their Use to Estimate Selective Strengths on Codon Usage. Molecular Biology and Evolution. 2008;25(3):568–579. doi: 10.1093/molbev/msm284
  • 85. Wright S. Evolution in Mendelian Populations. Genetics. 1931;16(2):97–159. doi: 10.1093/genetics/16.2.97
  • 86. Fisher RA. The Genetical Theory of Natural Selection. The Clarendon Press; 1930.
  • 87. Rodrigue N, Lartillot N, Philippe H. Bayesian Comparisons of Codon Substitution Models. Genetics. 2008;180(3):1579–1591. doi: 10.1534/genetics.108.092254
  • 88. Lartillot N. A Bayesian Mixture Model for Across-Site Heterogeneities in the Amino-Acid Replacement Process. Molecular Biology and Evolution. 2004;21(6):1095–1109. doi: 10.1093/molbev/msh112
  • 89. Lartillot N. Inférence Probabiliste Pour La Phylogénie, La Génomique Comparative et Les Sciences de La Macro-Évolution; 2013.
  • 90. Al Abri MA, Holl HM, Kalla SE, Sutter NB, Brooks SA. Whole Genome Detection of Sequence and Structural Polymorphism in Six Diverse Horses. PLoS ONE. 2020;15(4):e0230899. doi: 10.1371/journal.pone.0230899
  • 91. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: The NCBI Database of Genetic Variation. Nucleic Acids Research. 2001;29(1):308–311. doi: 10.1093/nar/29.1.308
  • 92. Svardal H, Jasinska AJ, Apetrei C, Coppola G, Huang Y, Schmitt CA, et al. Ancient Hybridization and Strong Adaptation to Viruses across African Vervet Monkey Populations. Nature Genetics. 2017;49(12):1705–1713. doi: 10.1038/ng.3980
  • 93. Zheng-Bradley X, Streeter I, Fairley S, Richardson D, Clarke L, Flicek P, et al. Alignment of 1000 Genomes Project Reads to Reference Assembly GRCh38. GigaScience. 2017;6(7):gix038. doi: 10.1093/gigascience/gix038
  • 94. Keightley PD, Jackson BC. Inferring the Probability of the Derived vs the Ancestral Allelic State at a Polymorphic Site. Genetics. 2018;209(3):897–906. doi: 10.1534/genetics.118.301120
  • 95. Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, et al. FastML: A Web Server for Probabilistic Reconstruction of Ancestral Sequences. Nucleic Acids Research. 2012;40(W1):W580–W584. doi: 10.1093/nar/gks498
  • 96. Tataru P, Bataillon T. polyDFE: Inferring the Distribution of Fitness Effects and Properties of Beneficial Mutations from Polymorphism Data. In: Methods in Molecular Biology. vol. 2090. Humana Press Inc.; 2020. p. 125–146.
  • 97. Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, et al. The Caper Package: Comparative Analysis of Phylogenetics and Evolution in R. R package version. 2013;5(2):1–36.
  • 98. Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Molecular Biology and Evolution. 2017;34(7):1812–1819. doi: 10.1093/molbev/msx116

Decision Letter 0

Justin C Fay, Kirk E Lohmueller

13 Jun 2024

Dear Dr Latrille,

Thank you very much for submitting your Research Article entitled 'Estimating the proportion of beneficial mutations that are not adaptive in mammals' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Kirk E Lohmueller

Guest Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

If you choose to submit a revised manuscript, please address all of the comments from the reviewers. When revising your manuscript, please pay special attention to the following points:

1) Please add some simulations to address the performance of your approach under the assumptions of the model.

2) Reviewer 3 raises concerns about the functional form for the DFE that you’re fitting. Please carefully address these points.

3) Reviewer 2 asks questions pertaining to the role that changing fitness landscapes could play in the inference. Reviewer 2 also raises concerns about the correlation of the patterns with effective population size. Please resolve this point.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors provide a very careful evaluation of the fitness effect of non-synonymous mutations using both population genetic and phylogenetic (multi-species alignment) data and methods. They suggest that a substantial proportion (between 15-45%) of all new beneficial mutations are “non-adaptive”, i.e. they restore fitness that was lost due to a previous fixation of a slightly deleterious mutation and thus are not responding to a changed environment. This is as far as I know the first quantitative assessment of this idea. A central assumption is that the fitness landscape of amino acid replacements can be estimated from phylogenetic data and that this fitness landscape stays constant over time. There are some potential model violations (changing population size, epistasis, CpG mutation bias) and the authors provide a fair assessment of each of these potential issues and how these could affect the precision of the estimated effect.

I don’t have any major concerns. This is a careful and insightful study and I fully endorse publication in PLOS Genetics. Below, I’ve listed a few suggestions for the discussion.

I think it would be useful to have a more explicit discussion of strongly beneficial mutations. Rare but strongly beneficial mutations would not be inferred by either of the methods, but can contribute substantially to divergence. I think it would help to mention that the inferred beneficial (non-adaptive) mutations are relatively weak and fall within a narrow range of selection strength (1< S <10). I also wonder if it is possible to calculate how much non-adaptive beneficial mutations contribute to nonsynonymous divergence, relative to all beneficial mutations. E.g., how much would they inflate estimates of the rate of adaptive evolution (alpha)?

Related to this, could the authors comment on the study by Enard et al. (eLife, 2016, https://doi.org/10.7554/eLife.12469), which states that viruses are dominantly driving protein adaptation in mammals? They estimate that close to 30% of adaptive amino acid changes are driven by adaptation to viruses. Is this consistent with or contradicting the results of the present study?

Reviewer #2: In this paper, Latrille, Joseph et al. combine phylogenetic and population genetic methods to infer the contribution of beneficial non-adaptive mutations to molecular evolution. They define beneficial non-adaptive mutations as mutations that revert deleterious amino acid changes that have fixed due to drift to their previous, beneficial state. This is contrasted with adaptive mutations, which increase in frequency due to positive selection after a change in the fitness landscape, compensatory mutations, which restore ancestral fitness through a change at a different amino acid than the initial deleterious change, and nearly neutral mutations, which have little fitness effect. They present evidence that ongoing positive selection is driven by a combination of beneficial non-adaptive mutations as well as adaptive mutations, and the proportion of beneficial non-adaptive mutations may be substantial. In addition, they make the point that a model that includes positively-selected beneficial non-adaptive mutations is a more appropriate null model for the evolution of protein-coding genes than is a model of no positive selection. Overall, this work is an interesting and important contribution to our understanding of adaptation and positive selection.

Major comments

My major comments are centered on two main things: The assumption of a stable underlying fitness landscape, and the consequences of the dependence on effective population size.

Stable underlying fitness landscape:

Overall, I would appreciate more clarification and reasoning for why the assumption of a stable underlying fitness landscape is reasonable in this case. It is necessary for the model employed, which makes it extremely important to both justify its use and consider the consequences if the assumption is violated. In the scenario as formulated, the presence of adaptive mutations means the fitness landscape is not entirely stable (acknowledged by the authors eg. P4L113-114 “we could tease apart beneficial non-adaptive from adaptive mutations resulting from a change in the fitness landscape”). Is the implication then that the fitness landscape is stable for some proteins, but not for others? If so, this could have interesting implications for understanding the relative proportions of beneficial non-adaptive mutations vs. adaptive mutations.

Furthermore, what would be the theoretical consequences if this assumption were relaxed? The authors start to approach this in one way by re-conducting the analysis using highly conserved genes (P7L208-211), which might be expected to have a more stable fitness landscape than other genes that are likely to adapt to changing environmental conditions. I would appreciate more discussion of the implications of the findings associated with that version of the analysis in general, and specifically as they might relate to the assumption of a stable fitness landscape.

In addition, could the presence of (an unknown number of) adaptive mutations impact the fitting of the mutation-selection model to the multi-species alignments, and impact the estimation of the amino-acid fitness landscape itself?

This potential concern is especially pronounced for species that have been undergoing domestication, and as a result have had major changes in their fitness landscapes in the relatively recent past. Four of the six species tested (Ovis, Capra, Bos, Equus) are domestic species, and as a result are expected to have had strong adaptive responses to a major shift in their fitness landscape, potential changes in the DFE as a result (Rice et al. 2015 Genetics), and additionally recent large reductions in effective population size.

Effective population size:

Because the definition of mutation categories deleterious, nearly-neutral, and (non-adaptive) beneficial is relative to S0, and S0 = 4Ne*s, the categorization of mutations necessarily depends on species/population Ne. This is appropriate, since efficiency of selection depends on Ne, but it means that the Ne estimates are very important to the results. I appreciate that the authors took a consistent approach based on θ, but their reported Ne values are 1-2 orders of magnitude higher for many of these species than often reported in the literature (for example, see Warren et al. 2015 for Chlorocebus sabaeus; Eyidivandi et al. 2020 Animal Genetics, Taheri et al. 2022 Small Ruminant Research for Ovis aries spp. Taheri et al. also used the NextGen data.) If the Nes reported here are biased upward, that will also bias the proportion of mutations determined to be in each category.

In addition, phylogenetic regressions are conducted to understand the relationship between the proportion of mutations in each category at the population scale and at the population scale given its category at the phylogenetic scale. However the expectations of these regressions are not straightforward, since Prop[B], Prop[B0|B], and the proportions of the other categories are, by definition, dependent on Ne. As a result, the expected relationships are dependent on Nes. That is, a relationship between the dependent variable and Ne is baked in.
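The dependence of the mutation categories on Ne raised in this section can be illustrated with a minimal Python sketch. It assumes the scaling S = 4Ne*s stated above and category boundaries at S = -1 and S = +1 (the truncation points mentioned elsewhere in this review history). The function names `classify` and `category_counts`, the raw selection coefficients, and the Ne values are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
from collections import Counter


def classify(s, ne, threshold=1.0):
    """Classify a mutation by its scaled selection coefficient S = 4 * Ne * s."""
    scaled = 4.0 * ne * s
    if scaled < -threshold:
        return "deleterious"
    if scaled > threshold:
        return "beneficial"
    return "nearly-neutral"


def category_counts(s_values, ne):
    """Count how many mutations fall in each category for a given Ne."""
    return Counter(classify(s, ne) for s in s_values)


# Hypothetical raw selection coefficients (illustrative only).
S_VALUES = [-1e-4, -1e-5, -1e-6, 1e-6, 1e-5, 1e-4]

if __name__ == "__main__":
    # The same set of mutations is re-categorized as Ne changes.
    for ne in (1e4, 1e5, 1e6):
        print(f"Ne={ne:.0e}", dict(category_counts(S_VALUES, ne)))
```

With these illustrative inputs, four of the six mutations are nearly neutral at Ne = 10^4 and none at Ne = 10^6, so an upward bias in Ne would move mutations out of the nearly-neutral class.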

Minor comments

- The structure of the manuscript is at times hard to follow, with material in the introduction that seems to fit better in the methods, etc. I ask the authors to revisit the overall structure during their revisions and make sure that the organization is straightforward. Some examples:

- Some methodological descriptions in introduction rather than methods, these make it hard to follow/find these details (eg. P3L90-98)

- P8L250-251 should be in results

- P16L486-487 “we fitted a Phylogenetic Generalized Linear Model in R with the package caper” it is unclear without more context (found only in the results and discussion) what the nature of this model is.

- Eqs. 9 and 10 are inconsistent with eq. 8, showing fΓ(S; -βd,b) vs. fΓ(-S; -βd,b) Is this intentional?

- The authors bring up compensatory mutations a couple of times throughout the manuscript, including in a section of the discussion in which they discuss the potential conflation of putatively adaptive mutations and compensatory mutations in their study. They argue that the proportion of beneficial non-adaptive mutations observed in the present study supports that a model without epistasis as an adequate fit to the protein data (P8 L277-278). I would appreciate additional clarification of this argument, as on the surface it seems like a high proportion of observed beneficial non-adaptive mutations wouldn’t preclude an additional contribution of compensatory mutations. In addition, this makes me curious about whether the method presented, with the addition of a more complex protein fitness landscape, could be used to infer the relative contribution of compensatory mutations (with perhaps further implications for inferring the contribution of compensatory mutations to adaptive evolution, see eg. Storz 2018 “Compensatory mutations and epistasis for protein function”)?

- The authors point out the difficulty in distinguishing between beneficial non-adaptive mutations from convergent adaptation, which is an interesting point. I wonder if incorporating effective population size of the lineages that include such convergent/beneficial non-adaptive mutations could help to distinguish them, given the expected relationship between the proportion of mutations of different classes and effective population size?

Reviewer #3: The paper concerns the prevalence of advantageous back mutations in molecular evolution. The authors present a method for estimating this prevalence. The method is based on fitting a model of molecular evolution where each site in a protein has a conserved set of amino acid preferences given as scaled fitness values. Then, currently segregating SNPs can be assigned scaled selection coefficients under this model. The site frequency spectrum of these same SNPs can be used to estimate the distribution of current selective pressures as a function of the predicted selection coefficients from the model of long-term amino acid preferences. This can then be used to estimate the fraction of beneficial new mutations that are restoring a historically preferred amino acid state versus the fraction of beneficial new mutations that are responding to new selective pressures. The authors apply the method exome-wide to a set of 80+ mammalian species for the model of molecular evolution and polymorphisms from over 20 populations. While the estimates for the prevalence of advantageous back mutations vary widely between populations, they generally constitute a substantial minority of all beneficial mutations.

The manuscript represents an innovative way of combining approaches from molecular evolution and population genetics to address an interesting but understudied problem (the prevalence of beneficial back mutations). The broad strokes approach is similar to a recent PNAS paper with the same lead author (Latrille, T., Rodrigue, N. & Lartillot, N. 2023 Genes and Sites under Adaptation at the Phylogenetic Scale Also Exhibit Adaptation at the Population-Genetic Scale. PNAS), but the question and methods here are distinct. The manuscript is a good fit for the readership of PLOS Genetics and the conclusions are of broad interest. That being said, I have some technical concerns about whether the proposed method is sound as currently implemented.

Major issues:

1. While the authors do a good job anticipating the effects of violations of model assumptions (which would tend to result in an underestimate of the frequency of beneficial back mutations), they do not show that their method is capable of producing accurate estimates when the model assumptions are satisfied. They should show this via simulations where the ground truth is known.

2. I am particularly concerned about the previous point because I suspect that the method has a problem as currently implemented. The problem is that the modeling of the DFE of the segregating variants is insufficiently expressive for the role that it plays in the inference. The DFE fit by the authors is a reasonable choice for the overall DFE but they are asking it to fit distributions that may look very different than the overall DFE (e.g. distributions truncated at -1 or +1). Fitting this grossly misspecified model will lead to highly inaccurate parameter estimates in some regimes.

Specifically, the DFE they use is a mixture between a reflected gamma distribution and an exponential distribution, which is a standard choice in the literature for estimating the whole DFE, and is roughly appropriate in that it can e.g. produce the peak at neutrality observed in nature and control the fractions of slightly deleterious mutations, etc. However, under the authors’ procedure they split the SNPs into subsets based on their estimated selection coefficients from the phylogenetic model and then estimate the DFEs for these subsets individually. These DFEs will have a completely different shape from the overall DFE. For example, in the case that selective pressures are truly constant over time, the ground truth of the DFEs to be estimated would be a deleterious distribution right-truncated at -1, a neutral distribution truncated between -1 and +1, and a beneficial distribution truncated on the left at +1. Even to the extent that the authors’ model of the DFE can put the right total probability in any one of these regions, the shape of the distribution in those regions will be completely wrong. For example, attempting to put most of the probability greater than +1 will also necessarily produce a DFE with enormously large selection coefficients because the positive portion of the DFE is constrained to be exponential.

Off the top of my head I do not know how to solve this problem. I believe the new fastDFE software from Thomas Bataillon’s group allows you to define custom parametric models for the DFE. Another possibility might be to fit a mixture of the parametric DFE the authors are currently using and the truncated DFEs that would be predicted from the phylogenetic mutation-selection model (but I haven’t thought it through).
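The constraint described in point 2 can be checked with a short calculation. The sketch below is an illustration only: it considers the exponential (beneficial) component in isolation, ignoring the gamma component and the mixture weight, and the helper names are hypothetical. For an exponential distribution with mean m, P(S > t) = exp(-t/m), so the mean required to place a given share of the mass above +1 follows directly.

```python
import math


def exp_mean_for_mass_above(threshold, target_prob):
    """Mean m of an exponential such that P(S > threshold) = target_prob.

    Uses P(S > t) = exp(-t / m), hence m = -t / ln(target_prob).
    """
    return -threshold / math.log(target_prob)


def exp_quantile(mean, q):
    """Quantile q of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - q)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        m = exp_mean_for_mass_above(1.0, p)
        # The 95% quantile shows how far the right tail extends.
        print(f"P(S>1)={p}: mean={m:.2f}, 95% quantile={exp_quantile(m, 0.95):.2f}")
```

Placing 90% of the exponential mass above +1 requires a mean of roughly 9.5, and 99% requires a mean of roughly 99.5, which is the sense in which forcing the mass above +1 pushes the fitted selection coefficients upward.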

3. There are several passages such as line 146-149 stating that dN/dS is “biased” as a measure of adaptation due to the presence of beneficial back mutations. As currently written this discussion appears misguided.

First, it is well-understood that as a practical matter dN/dS is massively biased AGAINST detecting adaptation because the signal from sites under purifying selection typically overwhelms the signal from sites experiencing positive selection. Obviously any discussion of subtle details of the behavior of dN/dS as a method of detecting adaptation has to be viewed against this backdrop.

Second, the authors come to their conclusion by calculating dN/dS for the subset of sites that they predict are maladapted at the start of the terminal branches under their phylogenetic model. Finding dN/dS>1 for these sites is 100% appropriate, since this subset of sites did truly adapt and the new amino acids at these sites really are closer to optimality at the end of the branch than at the beginning! The fact that the cause of the maladaptation is the previous fixation of deleterious mutations rather than environmental change does not change the fact that this subset of sites has contributed positively to the adaptedness of the population. It is just that under the stationary process the adaptation at these sites is offset by maladaptive substitutions elsewhere in the genome so that there is no net adaptation.

Third, there is a broader issue throughout the manuscript of verbally tying adaptation to environmental change. However, rapid evolution in genes with elevated dN/dS is often not driven by environmental change but rather by frequency-dependent selection, e.g. rapid evolution of reproductive genes and genes involved in sexual selection. In the mathematical framework used by the current authors, Mustonen and Lassig 2010 PNAS’s treatment of these issues via the calculation of “fitness flux” provides a rigorous method for addressing these complications precisely in the setting of allelic preferences relevant to the methods described here (which for instance structurally cannot address issues of absolute fitness, reproductive excess, etc.). I think the more intuitive language used in the manuscript is OK as motivation, but the reader should be more explicitly directed to the mathematical theory for a rigorous treatment (and the authors may also want to update some of their views in light of this clearer understanding).

4. The authors do a good job in addressing how epistasis due to interactions between specific sites such as structural contacts would play into their modeling framework but should also address epistasis that arises because mutations act additively but selection is nonlinear (e.g. stabilizing selection on an additive trait such as protein activity). The molecular evolution in the latter case is dealt with by Charlesworth 2013 “Stabilizing Selection, Purifying Selection, and Mutational Bias in Finite Populations” in Genetics.

5. I think there is a missed opportunity in the analysis of substitutions along terminal branches to provide an estimate of the total frequency of amino acid substitutions due to mutation-selection-drift balance. Because the model of molecular evolution used is time reversible, the distribution of mutations fixed at stationarity is symmetric around zero. The lack of symmetry along the terminal branches then provides a means to estimate the frequency of beneficial back substitutions (in contrast to the rest of the study which estimates the frequency of beneficial back mutations). For example, one could use twice the fraction of substitutions with selection coefficients greater than 1 as an estimate of this rate; more refined estimates are also likely possible. Likewise, the fraction of deleterious substitutions minus the fraction of advantageous substitutions provides a crude estimate of the fraction of substitutions due to adaptation to new environmental conditions.

More generally, the highly consistent estimates of the frequency of beneficial back substitutions in the last column of table S1 (very tight between .09-.12) suggest an overall frequency of nearly neutral substitutions due to consistent long-term selection pressures, mutation and drift of approximately 20% of all substitutions (doubling the last column of table S1 to count the corresponding deleterious fixations). This provides additional evidence for the importance of beneficial but not adaptive mutations.
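The two crude estimators suggested in point 5 can be sketched as follows. This is a minimal illustration under stated assumptions: substitutions are summarized by predicted scaled selection coefficients, the thresholds are S = -1 and S = +1, and the function name `crude_estimates`, the dictionary keys, and the input values are hypothetical rather than taken from the manuscript.

```python
def crude_estimates(scaled_coeffs, threshold=1.0):
    """Crude substitution-based estimates from predicted scaled coefficients.

    balance:  twice the fraction of substitutions with S > threshold,
              counting each beneficial back substitution together with
              its matching deleterious fixation.
    adaptive: fraction of substitutions with S < -threshold minus the
              fraction with S > threshold, i.e. the asymmetry.
    """
    n = len(scaled_coeffs)
    frac_ben = sum(1 for s in scaled_coeffs if s > threshold) / n
    frac_del = sum(1 for s in scaled_coeffs if s < -threshold) / n
    return {"balance": 2.0 * frac_ben, "adaptive": frac_del - frac_ben}


# Hypothetical predicted coefficients for ten substitutions on a terminal branch.
EXAMPLE = [-3.0, -2.0, -1.5, -0.5, -0.3, 0.0, 0.1, 0.2, 0.4, 2.0]

if __name__ == "__main__":
    print(crude_estimates(EXAMPLE))
```

Applying the same doubling to the 0.09-0.12 range quoted above gives 0.18-0.24, in line with the figure of approximately 20%.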

Minor issues:

- Line 167, perhaps also helpful to cite Grimm et al. 2015 here

- I think readers would appreciate a little more discussion on the relationship between these results and the McDonald-Kreitman test literature, since both are integrating fixed and segregating variation, and relate to a fraction of adaptive mutations/substitutions.

- I suggest also discussing the alternative approach of considering adaptation to new selective environments by allowing selective preferences to change on subsets of branches (e.g. Kazmi and Rodrigue 2019, or Ritchie, Stark and Liberles 2021) and the strengths / weaknesses of this versus the current approach.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Christian Huber

Reviewer #2: No

Reviewer #3: No

Decision Letter 1

Justin C Fay, Kirk E Lohmueller

19 Nov 2024

PGENETICS-D-24-00471R1

Estimating the proportion of beneficial mutations that are not adaptive in mammals

PLOS Genetics

Dear Dr. Latrille,

Thank you for submitting your manuscript to PLOS Genetics. After careful consideration, we feel that it has merit but does not fully meet PLOS Genetics's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosgenetics@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgenetics/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Kirk E Lohmueller

Academic Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

Aimée Dudley

Editor-in-Chief

PLOS Genetics

Anne Goriely

Editor-in-Chief

PLOS Genetics

Additional Editor Comments:

Thank you for submitting the revised manuscript. The reviewers and I all agree that it substantially improved. Before we move forward with your manuscript, please address the comments from Reviewer 3 about the Supplementary Information.

Journal Requirements:

1) Your current Financial Disclosure states, "This work was funded by Université de Lausanne (https://www.unil.ch; to TL, DAH and NS) and Agence Nationale de la Recherche (https://anr.fr/; grant ANR-19-CE12-0019 / HotRec to JJ). This study makes use of data generated by the NextGen Consortium. The European Union’s Seventh Framework Programme (FP7/2010-2014) provided funding for the project under grant agreement no 244356 - “NextGen”. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript." However, your funding information on the submission form indicates only two funders; the funder "The European Union’s Seventh Framework Programme (FP7/2010-2014)" is currently missing. Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well. Please indicate by return email the full and correct funding information for your study and confirm the order in which funding contributions should appear.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have carefully considered and responded to all of my previous comments, and I believe the revisions have enhanced the manuscript.

Reviewer #2: The authors have completed a thorough and substantial review in response to my and the other reviewers’ comments.

I appreciate the additional discussion of predictions at the population vs phylogenetic level and the fitness seascape, as well as the manner in which they addressed the implications of Ne under multiple circumstances. The manuscript is now also easier to follow.

Reviewer #3: The authors have conducted a thorough and thoughtful revision, both with respect to my comments and the comments of the other reviewers, and have conducted numerous additional analyses that bolster the conclusions of the manuscript.

I do suggest some additional minor changes to the SI for formatting and completeness. Specifically, the SI is written as more of an outline than a publication-ready document. For example, it mostly has lists of bullet points below the figures rather than figure captions. Also, I think that many of the supplemental figures could use more explanation and be more self-contained. For instance, I assume that Figure S10 is using the same convention as main text Figure 4 that the circles denote populations and the squares denote species means, but there isn’t a caption and the bulleted text doesn’t explain this either. Likewise, section 7.1 is just a table (plus bullet points acting as a caption) with no explanation, as are several of the other SI sections. Overall I think the SI should be brought up to a publication-ready standard.

Minor comments:

SI Typo “Additionally to including divergence, with also tested our prediction with polyDFE model D instead of model C” with->we

********** 

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

********** 

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Christian Huber

Reviewer #2: Yes: M. Elise Lauterbur

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Decision Letter 2

Justin C Fay, Kirk E Lohmueller

10 Dec 2024

Dear Dr Latrille,

We are pleased to inform you that your manuscript entitled "Estimating the proportion of beneficial mutations that are not adaptive in mammals" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Kirk E Lohmueller

Academic Editor

PLOS Genetics

Justin Fay

Section Editor

PLOS Genetics

Aimée Dudley

Editor-in-Chief

PLOS Genetics

Anne Goriely

Editor-in-Chief

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-24-00471R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Justin C Fay, Kirk E Lohmueller

19 Dec 2024

PGENETICS-D-24-00471R2

Estimating the proportion of beneficial mutations that are not adaptive in mammals

Dear Dr Latrille,

We are pleased to inform you that your manuscript entitled "Estimating the proportion of beneficial mutations that are not adaptive in mammals" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supplementary appendix on the parameterization of Mutation-selection codon models.

    Contains 5 pages of supplementary information including 1 figure (Fig A).

    (PDF)

    pgen.1011536.s001.pdf (304.1KB, pdf)
    S2 File. Supplementary appendix on the characterization of non-adaptive beneficial mutations.

    Contains 12 pages of supplementary information including 3 figures (Fig A to C) and 6 tables (Table A to F).

    (PDF)

    pgen.1011536.s002.pdf (750.3KB, pdf)
    S3 File. Supplementary appendix on the contrast of selection at the phylogenetic and population-genetic scales.

    Contains 7 pages of supplementary information including 8 figures (Fig A to H).

    (PDF)

    pgen.1011536.s003.pdf (730.3KB, pdf)
    S4 File. Supplementary appendix on controls for estimating the proportion of beneficial mutations that are not adaptive.

    Contains 21 pages of supplementary information including 9 figures (Fig A to I) and 9 tables (Table A to I).

    (PDF)

    pgen.1011536.s004.pdf (1.2MB, pdf)
    Attachment

    Submitted filename: PGENETICS-D-24-00471R-Response-to-reviewers.pdf

    pgen.1011536.s005.pdf (634.2KB, pdf)
    Attachment

    Submitted filename: PGENETICS-D-24-00471R2-Response-to-reviewers.pdf

    pgen.1011536.s006.pdf (80.4KB, pdf)

    Data Availability Statement

    The data underlying this article are available at https://doi.org/10.5281/zenodo.7878953. Snakemake pipeline, analysis scripts and documentation are available at https://github.com/ThibaultLatrille/SelCoeff.


    Articles from PLOS Genetics are provided here courtesy of PLOS
