arising from H. Gu et al. Nature Communications 10.1038/s41467-023-37468-y (2023)
With the recent onset of the SARS-CoV-2 pandemic, there has been great interest in interpreting the within-patient evolutionary dynamics of this virus. Indeed, the accurate identification of genomic regions experiencing positive selection, and the quantification of these selective effects, is of crucial importance for both evolutionary and clinical interpretation. With this goal, the recently published work of Gu et al.1 collected 2820 respiratory samples to investigate observed levels of within-patient synonymous (πS) relative to non-synonymous (πN) variation, and relied upon this comparison to assign genomic regions as evolving under purifying selection, neutrality, or positive selection. Specifically, they interpreted πN − πS > 0 as being indicative of positive selection, πN − πS ~ 0 as being indicative of neutrality, and πN − πS < 0 as being indicative of purifying selection (e.g., see Fig. 2 of Gu et al.). Using this criterion when performing sliding window analyses, the authors claimed that multiple genomic regions are experiencing positive selection. Crucially, the authors relied upon their selection inference derived from these π-based comparisons to support conclusions regarding infection dynamics in vaccinated vs. unvaccinated patients, a focal point of their publication.
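To make the criterion under discussion concrete, the following is a minimal sketch of how a difference between non-synonymous and synonymous nucleotide diversity (written here as πN − πS) could be computed from within-patient variant frequencies and then read by its sign. It is not the pipeline of Gu et al.: the function names, the input layout, the site counts, and the use of the infinite-sample heterozygosity 2p(1 − p) without a finite-sample correction are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def site_heterozygosity(freq: float) -> float:
    """Expected pairwise difference at a biallelic site with allele frequency freq."""
    return 2.0 * freq * (1.0 - freq)


def pi_n_minus_pi_s(
    variants: Iterable[Tuple[float, bool]],
    n_sites: float,
    s_sites: float,
) -> float:
    """Return pi_N - pi_S for one sample.

    variants: (allele frequency, is_nonsynonymous) pairs (hypothetical layout).
    n_sites, s_sites: numbers of non-synonymous and synonymous sites in the
    region analysed (assumed to be known in advance).
    """
    sum_n = 0.0
    sum_s = 0.0
    for freq, is_nonsyn in variants:
        if is_nonsyn:
            sum_n += site_heterozygosity(freq)
        else:
            sum_s += site_heterozygosity(freq)
    return sum_n / n_sites - sum_s / s_sites


def sign_based_label(diff: float, tol: float = 1e-12) -> str:
    """Label a region by the sign of pi_N - pi_S (the criterion being examined)."""
    if diff > tol:
        return "positive selection"
    if diff < -tol:
        return "purifying selection"
    return "neutrality"


# Toy input: one rare non-synonymous and one common synonymous variant.
example = [(0.1, True), (0.5, False)]
difference = pi_n_minus_pi_s(example, n_sites=3.0, s_sites=1.0)
print(difference, sign_based_label(difference))
```

The label depends only on the sign of a difference built from a handful of variant frequencies, which is the property the remainder of this article examines.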
There is a long history in the field of population genetics of comparing non-synonymous and synonymous divergence in this regard (i.e., dN/dS), as well as in jointly interpreting non-synonymous to synonymous divergence relative to polymorphism (e.g., as implemented in the McDonald-Kreitman test2, as well as numerous other related implementations; see refs. 3,4). In this framework, assuming that synonymous sites are evolving neutrally, the neutral divergence at these sites under genetic drift alone will be equal to the neutral mutation rate5, and thus non-synonymous divergence may be interpreted as being depressed by purifying selection or accelerated by positive selection relative to this synonymous/neutral standard.
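The equality between neutral divergence and the neutral mutation rate invoked above follows from a standard one-line argument, sketched here for a haploid population. The symbols N (population size), μ (neutral mutation rate per site per generation) and k (neutral substitution rate per site per generation) are introduced only for this illustration.

```latex
k \;=\; \underbrace{N\mu}_{\text{new neutral mutations per generation}}
\;\times\;
\underbrace{\frac{1}{N}}_{\text{fixation probability of each}}
\;=\; \mu
```

Because N cancels, the neutral substitution rate in this sketch does not depend on population size, which is what allows synonymous divergence to serve as a neutral standard in divergence-based comparisons.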
However, this divergence-based interpretation does not correctly extend to a comparison of πN and πS as utilized by Gu et al. As one example, the effects of selection at linked sites (see the review in ref. 6) render this polymorphism-level interpretation problematic. Namely, even if mutations at synonymous sites are themselves neutral (and see ref. 7), their observed frequency in the population may be shaped by the episodic genetic hitchhiking effects associated with positive selection (i.e., selective sweeps8), and will be shaped by the constantly occurring genetic hitchhiking effects associated with purifying selection (i.e., background selection9). Importantly, these genetic hitchhiking effects will not impact divergence-based comparisons such as dN/dS (ref. 10; though there are nonetheless important considerations, see refs. 11,12), but they will strongly impact polymorphism-based comparisons such as the πN − πS of Gu et al.
For these reasons, one must account for the myriad of evolutionary forces shaping observed levels of within-patient nucleotide variation when performing population genomic inference of this sort13,14. In SARS-CoV-2 specifically, this evolutionary baseline model will necessarily include the underlying mutation and recombination rates, the history of population size change associated with infection, as well as the constant purging of deleterious mutations and the resulting effects on linked sites15,16. Only by accounting for these certain-to-be-operating evolutionary processes may one determine if episodic or hypothesized processes (such as positive or balancing selection) need to be invoked to explain observed levels and patterns of variation17–20.
Thus, to investigate the claims of Gu et al., we simulated this SARS-CoV-2 baseline model in both the presence and absence of positive selection, in order to better interpret the behavior of πN − πS. As shown in Figs. 1 and 2, these simulations reveal multiple reasons to question their interpretations. Firstly, because of the small number of variable sites observed in the SARS-CoV-2 genome in any given patient sample, particularly after their filtering for SNPs segregating at greater than 2.5% frequency in a folded site frequency spectrum (i.e., resulting in a median of ~5 SNPs/sampled genome in the patient data), there is an extremely large variance associated with πN and πS, which is only exacerbated by further reducing the scale of inference to specific genomic windows. For example, as shown in Fig. 1, in the complete absence of positive selection, purifying selection will on average reduce the frequencies of non-synonymous relative to synonymous variants (though the latter will be experiencing background selection effects). However, the variance is such that there is an appreciable probability of observing πN values that are larger than πS (i.e., their criterion for identifying positive selection), particularly on a sliding-window scale.
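The sampling-variance argument above can be illustrated with a deliberately crude toy calculation. This is not the baseline-model simulation used for Figs. 1 and 2: it has no linkage, demography, or explicit fitness effects. It simply draws about five SNPs per sample above a 2.5% frequency threshold and skews non-synonymous variants toward lower frequencies to mimic purifying selection. The site counts, the share of non-synonymous SNPs, and the frequency distributions are all hypothetical values chosen for illustration.

```python
import random
from statistics import mean

N_SITES = 22_000   # hypothetical number of non-synonymous sites
S_SITES = 7_000    # hypothetical number of synonymous sites
MIN_FREQ = 0.025   # frequency filter described in the text (2.5%)
N_SNPS = 5         # roughly the median SNP count per sample in the text
P_NONSYN = 0.5     # assumed chance that a retained SNP is non-synonymous


def heterozygosity(freq):
    """Expected pairwise difference at a biallelic site."""
    return 2.0 * freq * (1.0 - freq)


def draw_frequency(rng, nonsyn):
    """Draw a folded frequency in [MIN_FREQ, 0.5]."""
    u = rng.random()
    if nonsyn:
        u = u * u  # assumed skew toward rare variants (no positive selection)
    return MIN_FREQ + (0.5 - MIN_FREQ) * u


def one_sample(rng):
    """Return pi_N - pi_S for one simulated patient sample."""
    pi_n = 0.0
    pi_s = 0.0
    for _ in range(N_SNPS):
        nonsyn = rng.random() < P_NONSYN
        het = heterozygosity(draw_frequency(rng, nonsyn))
        if nonsyn:
            pi_n += het / N_SITES
        else:
            pi_s += het / S_SITES
    return pi_n - pi_s


def simulate(replicates=2000, seed=1):
    """Return pi_N - pi_S for a set of independent toy samples."""
    rng = random.Random(seed)
    return [one_sample(rng) for _ in range(replicates)]


diffs = simulate()
frac_positive = sum(d > 0 for d in diffs) / len(diffs)
print(f"mean pi_N - pi_S: {mean(diffs):.3e}")
print(f"fraction of samples with pi_N > pi_S: {frac_positive:.3f}")
```

Under these assumptions the average difference is negative, yet some samples still show πN greater than πS purely because so few SNPs contribute to each estimate. The toy model is meant only to convey that intuition and says nothing quantitative about the patient data.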
Secondly, even in the presence of positive selection (Fig. 2), the expectation of πN − πS > 0 implemented by Gu et al. would not successfully identify this evolutionary process. As shown for both a partial selective sweep (i.e., a beneficial mutation having reached 50% frequency in the patient population) and a complete selective sweep (i.e., a beneficial mutation having reached fixation in the patient population immediately prior to sampling), the expectation of πN − πS remains negative. This observation partly owes to the fact that linked synonymous variants will be increased in frequency via genetic hitchhiking more readily than other linked non-synonymous variants, which are likely deleterious; as such, synonymous variation in the hitchhiked region of the genome may be augmented more than non-synonymous variation. In addition, these models are similarly characterized by a large variance.
We additionally extended this model to consider recurrent beneficial mutations. Specifically, we evaluated scenarios in which 1% of new mutations are beneficial and in which 10% of new mutations are beneficial, occurring on the strongly or weakly deleterious DFE backgrounds given in Figs. 1 and 2, or occurring on the DFE background recently estimated for SARS-CoV-2 experimentally21. As shown in Supplementary Fig. 1, genomic windows were observed in all scenarios in which πN − πS is both greater than and less than 0, and even genome-wide there is no significant differentiation in these distributions. It is worth emphasizing that while an extreme scenario in which 10% of all newly arising mutations are strongly beneficial and simultaneously segregating in the population may indeed elevate πN relative to πS, even this unrealistic parameter space does not reliably produce this pattern. Furthermore, given that elevated πN may also be readily generated by models lacking positive selection entirely, as shown, this π-based approach of Gu et al. remains inappropriate owing to issues of identifiability.
In summary, πN − πS is not a reliable indicator of selective effects and dynamics. As shown in the specific case of SARS-CoV-2, the large variance associated with relatively few genomic SNPs renders the interpretation highly tenuous, leading to a situation in which values greater than 0 and less than 0 are both associated with appreciable probabilities in the presence of purifying selection alone. Furthermore, even with the addition of positive selection, the observation of πN > πS is unreliable owing partly to the effects of genetic hitchhiking. For these reasons, statistical inference procedures which directly account for multiple competing evolutionary processes (see refs. 22,23), and which utilize more sophisticated expectations associated with patterns of variation in the site frequency spectrum and linkage disequilibrium associated with positive selection (as reviewed by ref. 24, and see ref. 25), would be required to evaluate the claims of Gu et al.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work was supported by the National Institutes of Health grant R35GM139383 to J.D.J.
Author contributions
VS, JWT and JDJ conceived the project; VS performed simulations with input from JWT and JDJ; VS, JWT and JDJ wrote the manuscript; JDJ provided funding for the project.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Data availability
Datasets generated and/or analyzed during the current study are available in the paper. Source data are provided with this paper.
Code availability
All scripts and data underlying the simulations, analyses, and Figures may be found at: https://github.com/vivaksoni/Gu_etal_2023_response.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-46261-4.
References
- 1. Gu H, et al. Within-host genetic diversity of SARS-CoV-2 lineages in unvaccinated and vaccinated individuals. Nat. Commun. 2023;14:1793. doi: 10.1038/s41467-023-37468-y.
- 2. McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–654. doi: 10.1038/351652a0.
- 3. Charlesworth, B. & Charlesworth, D. Elements of Evolutionary Genetics. (W. H. Freeman and Company, New York, 2010).
- 4. Walsh, B. & Lynch, M. Evolution and Selection of Quantitative Traits. (Oxford University Press, Oxford, 2018).
- 5. Kimura, M. The Neutral Theory of Molecular Evolution. (Cambridge University Press, Cambridge, 1983).
- 6. Charlesworth B, Jensen JD. Effects of selection at linked sites on patterns of genetic variability. Annu. Rev. Ecol. Evol. Syst. 2021;52:177–197. doi: 10.1146/annurev-ecolsys-010621-044528.
- 7. Wang H, Pipes L, Nielsen R. Synonymous mutations and the molecular evolution of SARS-CoV-2 origins. Virus Evol. 2021;7:1–11. doi: 10.1093/ve/veaa098.
- 8. Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet. Res. 1974;23:23–35. doi: 10.1017/S0016672300014634.
- 9. Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134:1289–1303. doi: 10.1093/genetics/134.4.1289.
- 10. Birky CW, Walsh JB. Effects of linkage on rates of molecular evolution. Proc. Natl Acad. Sci. 1988;85:6414–6418. doi: 10.1073/pnas.85.17.6414.
- 11. Eyre-Walker A. Changing effective population size and the McDonald-Kreitman test. Genetics. 2002;162:2017–2024. doi: 10.1093/genetics/162.4.2017.
- 12. Kryazhimskiy S, Plotkin JB. The population genetics of dN/dS. PLoS Genet. 2008;4:e1000304. doi: 10.1371/journal.pgen.1000304.
- 13. Johri P, Eyre-Walker A, Gutenkunst RN, Lohmueller KE, Jensen JD. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol. Evol. 2022;14:evac088. doi: 10.1093/gbe/evac088.
- 14. Johri P, et al. Recommendations for improving statistical inference in population genomics. PLOS Biol. 2022;20:e3001669. doi: 10.1371/journal.pbio.3001669.
- 15. Terbot JW, et al. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLOS Pathog. 2023;19:e1011265. doi: 10.1371/journal.ppat.1011265.
- 16. Terbot JW, et al. A simulation framework for modeling the within-patient evolutionary dynamics of SARS-CoV-2. Genome Biol. Evol. 2023;15:evad204. doi: 10.1093/gbe/evad204.
- 17. Irwin KK, et al. On the importance of skewed offspring distributions and background selection in virus population genetics. Heredity. 2016;117:393–399. doi: 10.1038/hdy.2016.58.
- 18. Jensen JD, Kowalik TF. A consideration of within-host human cytomegalovirus genetic variation. Proc. Natl Acad. Sci. 2020;117:816–817. doi: 10.1073/pnas.1915295117.
- 19. Jensen, J. D. Studying population genetic processes in viruses: from drug-resistance evolution to patient infection dynamics. In: Bamford, D. H. and Zuckerman, M. (eds.) Encyclopedia of Virology, 4th edition 5, 227–232 (2021).
- 20. Johri P, Stephan W, Jensen JD. Soft selective sweeps: addressing new definitions, evaluating competing models, and interpreting empirical outliers. PLOS Genet. 2022;18:e1010022. doi: 10.1371/journal.pgen.1010022.
- 21. Flynn JA, et al. Comprehensive fitness landscape of SARS-CoV-2 Mpro reveals insights into viral resistance mechanisms. Elife. 2022;11:e77433. doi: 10.7554/eLife.77433.
- 22. Johri P, Charlesworth B, Jensen JD. Toward an evolutionarily appropriate null model: jointly inferring demography and purifying selection. Genetics. 2020;215:173–192. doi: 10.1534/genetics.119.303002.
- 23. Howell AA, et al. Developing an appropriate evolutionary baseline model for the study of human cytomegalovirus. Genome Biol. Evol. 2023;15:evad059. doi: 10.1093/gbe/evad059.
- 24. Stephan W. Selective sweeps. Genetics. 2019;211:5–13. doi: 10.1534/genetics.118.301319.
- 25. Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution. 2023;77:2113–2127. doi: 10.1093/evolut/qpad120.
- 26. Haller BC, Messer PW. SLiM 4: Multispecies eco-evolutionary modeling. Am. Nat. 2023;201:E127–E139. doi: 10.1086/723601.