Author manuscript; available in PMC: 2025 Aug 19.
Published in final edited form as: Nature. 2025 Feb 19;638(8051):E19–E22. doi: 10.1038/s41586-024-08496-5

Insufficient evidence for natural selection associated with the Black Death

Alison R Barton 1,8, Cindy G Santander 2,8, Pontus Skoglund 3, Ida Moltke 2, David Reich 1,4,5,6, Iain Mathieson 7
PMCID: PMC11938207  NIHMSID: NIHMS2061492  PMID: 39972236

Klunk et al. (ref. 1) analysed ancient DNA from before, during and after the Black Death, and claimed that large allele frequency changes at immune genes in aggregate, and at four specific variants, reflected natural selection. These claims are unsupported for four reasons. First, the enrichment in immune genes disappears if an appropriate randomization test is carried out; second, after correcting an error in the estimation of allele frequencies, none of the four reported loci pass the original filtering thresholds; third, these filtering thresholds do not adequately correct for multiple testing, and even the original results were not statistically significant. Finally, we find no evidence of significant change in frequency of the ERAP2 variant rs2549794, either in the data reported by Klunk et al. or in published data for samples spanning 2,000 years.

Frequency changes were a technical artefact

Klunk et al. report that variants in immune genes were enriched for large values of FST, a measure of the degree of differentiation, compared with putatively neutral genomic regions (P < 10⁻¹¹). However, when we randomly permuted which samples were labelled as pre- and post-Black Death, such that there should be no enrichment (Supplementary Methods), the P values did not follow the null expectation, with an inflation factor of 11.80 (Fig. 1a,b). Moreover, 7% of the permutations produce a nominal P value even lower than that observed in the original article (P = 7.9 × 10⁻¹²). Judged against the permutations, the reported P value therefore increases by 10 orders of magnitude, to a non-significant level (P > 0.05). Although the exact P values are sensitive to parameter choices (Supplementary Methods), overall they reflect the pattern expected from artefactual differences between immune and neutral loci, rather than a signal due to natural selection. Artefacts are expected because the immune genes were captured with different reagents and sequenced in different batches than the putatively neutral genes.
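The inflation factor quoted above can be illustrated with a short sketch. The exact definition used for the permutation analysis is given in the Supplementary Methods; the version below assumes the conventional genomic-control form, in which each P value is converted to a 1-degree-of-freedom chi-square statistic and the observed median is divided by the null median (about 0.455). The function name and the toy P values are illustrative only.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

def inflation_factor(p_values):
    """Median chi-square (1 df) implied by the P values, divided by the
    null median. Assumed definition; P values must lie in (0, 1]."""
    # For a two-sided test, chi2 = z**2 where z is the normal quantile of p/2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = _STD_NORMAL.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2) / null_median

# Toy input 1: evenly spaced P values, as expected when the null holds.
p_null = [(i + 0.5) / 1000 for i in range(1000)]
# Toy input 2: the same values cubed, which pushes them towards zero.
p_skewed = [p ** 3 for p in p_null]

lambda_null = inflation_factor(p_null)
lambda_skewed = inflation_factor(p_skewed)
print(f"uniform P values: lambda = {lambda_null:.2f}")
print(f"skewed P values:  lambda = {lambda_skewed:.2f}")
```

A value close to 1 indicates well-calibrated P values; values far above 1, when the labels have been permuted and no true signal can remain, point to a miscalibrated test rather than to biology.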

Fig. 1 | Reported enrichment persists even after permutation to remove any signal that could be due to the Black Death.

a, For 100 permutations of the pre- and post-Black Death labels, the observed P values are plotted in a cumulative density plot for the 99th percentile of enrichment of variants with MAF > 10%. The blue dashed line indicates a cumulative count of 5 of 100 runs, the green dashed line indicates the expected significance threshold of 0.05, and the red solid line shows the P value obtained in the original article. b, A qq plot for the same P values as in a, showing the inflation over the expected null distribution of P values. c, For 100 iterations approximately matching the sample size and coverage from the original study by downsampling a subset of samples from the 1000 Genomes Project, the observed P values are plotted in a cumulative density plot for the 99th percentile of enrichment of variants with MAF > 10%. Lines represent the same values as in a. d, A qq plot for the same P values as in c, showing the inflation over the expected null distribution of P values.

Noting that coverage varied across panels (genome-wide association study (GWAS) loci 7.1×, exonic loci 2.9× and neutral loci 8.3×), we carried out an independent set of simulations using sequencing data from present-day populations, randomly assigning individuals to before or after the Black Death and downsampling coverage to match the data from Klunk et al. (Supplementary Methods). We again found enrichment of large FST values at immune loci with an inflation factor of 68.08 (Fig. 1c,d) and 39% of replicates resulted in nominal P values indicating greater significance than reported in the original article. Although this demonstrates that differences in coverage are sufficient to generate a spurious signal of enrichment similar to that reported by Klunk et al., we expect that other technical differences (in read count distribution, reference bias, GC content and other capture bias; ref. 2) would also contribute. If technical differences between panels drive the signal, then permuting across single nucleotide polymorphisms (SNPs) (rather than individuals) would not control the inflation.

Allele frequencies were estimated incorrectly

Klunk et al. estimate allele frequencies as follows: “we calculated the expected number of alternate alleles as the likelihood the individual is heterozygous plus 2× the likelihood the individual is homozygous alternate” (supplementary methods of ref. 1). This procedure is incorrect. Genotype likelihoods are the probability of the data conditional on the genotype (P(data|genotype)), but Klunk et al. treat them as though they were the probability of the genotype conditional on the data (P(genotype|data)). Their expression is mathematically equivalent to a posterior mean with a prior that all three genotypes are equally likely. This will produce frequency estimates with a bias towards the prior mean of 0.5 that depends on coverage (Fig. 2a–c). Thus, differences in estimated allele frequency can reflect differences in coverage, not selection (mean coverage in London pre-Black Death 6.8×, during Black Death 7.5× and post-Black Death 6.0×).
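The coverage-dependent bias described above can be reproduced in a small simulation. This is a minimal sketch under stated assumptions, not the code used for Fig. 2: it assumes a fixed read depth per individual, a 1% per-read error rate, Hardy-Weinberg genotype proportions, and an expectation-maximization (EM) routine as one standard way of obtaining a maximum likelihood estimate. All function names and parameter values are illustrative.

```python
import random

ERR = 0.01  # assumed per-read error rate (illustrative)

def genotype_likelihoods(k_alt, depth, err=ERR):
    """P(reads | genotype) for genotypes carrying 0, 1 or 2 alternate alleles."""
    liks = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
        liks.append(p_alt ** k_alt * (1 - p_alt) ** (depth - k_alt))
    return liks

def simulate(true_freq, n_ind, depth, rng):
    """Draw genotypes at Hardy-Weinberg proportions, then reads at fixed depth."""
    data = []
    for _ in range(n_ind):
        g = (rng.random() < true_freq) + (rng.random() < true_freq)
        p_alt = (g / 2) * (1 - ERR) + (1 - g / 2) * ERR
        k_alt = sum(rng.random() < p_alt for _ in range(depth))
        data.append(genotype_likelihoods(k_alt, depth))
    return data

def flat_prior_estimate(gls):
    """Treat normalised likelihoods as genotype probabilities, which is the
    same as assuming a flat prior over the three genotypes."""
    total = sum((l1 + 2 * l2) / (l0 + l1 + l2) for l0, l1, l2 in gls)
    return total / (2 * len(gls))

def ml_estimate(gls, n_iter=200):
    """EM estimate of the allele frequency, weighting each likelihood by
    Hardy-Weinberg genotype frequencies at the current estimate."""
    f = 0.5
    for _ in range(n_iter):
        total = 0.0
        for l0, l1, l2 in gls:
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            total += (w1 + 2 * w2) / (w0 + w1 + w2)
        f = total / (2 * len(gls))
    return f

rng = random.Random(1)
TRUE_FREQ = 0.1
low = simulate(TRUE_FREQ, 2000, 2, rng)    # 2x coverage
high = simulate(TRUE_FREQ, 2000, 10, rng)  # 10x coverage

flat_low, ml_low = flat_prior_estimate(low), ml_estimate(low)
flat_high, ml_high = flat_prior_estimate(high), ml_estimate(high)
print(f"depth 2:  flat prior {flat_low:.3f}, ML {ml_low:.3f}")
print(f"depth 10: flat prior {flat_high:.3f}, ML {ml_high:.3f}")
```

In this sketch the flat-prior estimate is pulled towards 0.5 at low depth and approaches the true value only as depth increases, so two samples with identical allele frequencies but different coverage would appear to differ.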

Fig. 2 | Bias in allele frequency estimates based on genotype likelihoods.

a, True values against unbiased maximum likelihood (ML) estimates for simulated read data with an average of 5× coverage simulated for 200 individuals at 30,000 SNPs to match the data observed in Klunk et al. (ref. 1). b, True values against biased estimates computed with the Klunk et al. approach for simulated read data with an average of 5× coverage simulated for 200 individuals at 30,000 SNPs. c, Maximum likelihood estimates against Klunk et al. estimates for 31,799 SNPs included in the analysis with no minimum allele frequency threshold. d, Manhattan plot for FST scan of loci that pass the Klunk et al. filtering criteria using maximum likelihood estimates of allele frequencies (equivalent to figure 2c in ref. 1). One variant passes the filtering criteria (but does not pass a Bonferroni correction).

We re-estimated allele frequencies using the unbiased maximum likelihood approach, and applied the original filtering criteria (Supplementary Methods). None of the four reported loci pass the thresholds originally used by Klunk et al. (Fig. 2d). Rerunning the enrichment analysis described in the previous section with maximum likelihood estimates still resulted in inflated P values, indicating that the two issues are independent. Finally, to avoid false-positive calls due to DNA degradation, most ancient DNA studies restrict analysis to known polymorphic sites or to transversions, use single-stranded-aware genotyping, or remove damaged bases with enzymatic treatments. Klunk et al. do not use any of these strategies. Of the 22,868 variants with minor allele frequency (MAF) > 5% that they analysed, the transition/transversion ratio is around 19 (compared with an expectation of 2–3 for genomic data), and only 4,456 appear in the approximately 96 million UK Biobank SNP imputation set (ref. 3), which should include almost all sites with MAF > 5% in the ancient populations. This suggests that most variants in their study are artefactual.
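The transition/transversion ratio used as a diagnostic above is straightforward to compute from a list of reference and alternate alleles. The sketch below is illustrative only; the function names and the toy variant list are assumptions, not part of the published analysis. It is relevant here because the characteristic ancient DNA damage substitutions (C to T, and G to A on the opposite strand) are both transitions, so an excess of transitions is the signature expected from damage-driven false variants.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)

def ts_tv_ratio(variants):
    """Ratio of transitions to transversions among biallelic SNPs given as
    (ref, alt) tuples; entries with identical alleles are ignored."""
    snps = [(r, a) for r, a in variants if r != a]
    n_ts = sum(1 for r, a in snps if is_transition(r, a))
    n_tv = len(snps) - n_ts
    return n_ts / n_tv if n_tv else float("inf")

# Toy example: two transitions (C>T, G>A) and one transversion (A>C).
toy_variants = [("C", "T"), ("G", "A"), ("A", "C")]
toy_ratio = ts_tv_ratio(toy_variants)
print(f"Ts/Tv = {toy_ratio:.1f}")
```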

No control for multiple testing

To control the experiment-wide false-positive rate, genome-wide scans must correct for the large number of statistical tests (ref. 4). Klunk et al. instead apply an ad hoc filtering strategy with no statistical justification. We simulated variants under a null model of identical allele frequencies in all populations using the same sample sizes as Klunk et al. and found that a proportion of 1.4 × 10⁻⁴ of these variants pass these filters (Supplementary Methods). For comparison, a conventional Bonferroni-corrected significance threshold would be 0.05/3,293 = 1.5 × 10⁻⁵, approximately 10 times smaller. Furthermore, to remove spurious outliers driven by genotyping errors, many ancient DNA studies have required multiple, statistically significant, linked variants at a locus (refs. 5,6), a requirement that is also standard in GWAS. Klunk et al. do not apply this filter.
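The comparison between the two thresholds is simple arithmetic, written out below. The expected number of null variants passing the filters is a derived illustration, not a figure from the analysis: it assumes that 3,293 (the Bonferroni denominator above) is the number of variants to which the filters were applied.

```python
alpha = 0.05             # conventional experiment-wide significance level
n_tests = 3293           # Bonferroni denominator quoted in the text
null_pass_rate = 1.4e-4  # simulated proportion of null variants passing the filters

bonferroni_threshold = alpha / n_tests
ratio = null_pass_rate / bonferroni_threshold
# Assumption: the filters were applied to the same 3,293 variants.
expected_null_passes = null_pass_rate * n_tests

print(f"Bonferroni threshold:  {bonferroni_threshold:.2e}")
print(f"Filter rate / threshold: {ratio:.1f}")
print(f"Expected null variants passing: {expected_null_passes:.2f}")
```

Under that assumption, a scan of variants with no true frequency differences would be expected to yield a passing variant in a substantial fraction of runs, whereas a Bonferroni threshold caps the expected number at 0.05.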

No evidence of recent natural selection at ERAP2

Klunk et al. report that their lead SNP, rs2549794 at ERAP2, is associated in vitro with a differential response to Yersinia pestis. However, this is not evidence of natural selection. Such evidence requires a statistical analysis showing a more rapid change in allele frequency than expected by chance, whereas the changes reported by Klunk et al. are entirely consistent with random sampling. If there were no differences in allele frequency, the probability of observing an FST value larger than the 0.0247 reported for rs2549794 would be at least P = 0.067 (Fig. 3a and Supplementary Methods). Even this is an underestimate, as low coverage, genetic drift and reference bias inflate FST values (ref. 7). The large point estimate of the selection coefficient, five times larger than the largest selection coefficient documented at a common variant in humans (the lactase persistence allele; ref. 8), is simply a consequence of the fact that variants were chosen on the basis of large estimated FST values, and is not independent evidence for selection (Fig. 3b).
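The null calculation described above can be sketched as a simulation: draw two samples of 38 and 63 diploid individuals from populations with the same allele frequency (0.438), compute FST for each pair, and count how often it exceeds 0.0247. The estimator actually used is specified in the Supplementary Methods; the sketch below assumes Hudson's estimator with sample sizes counted in alleles and perfectly called genotypes, so its output is only indicative and should be read alongside the P = 0.067 quoted above.

```python
import random

def hudson_fst(p1, n1, p2, n2):
    """Hudson's FST estimator for one biallelic site (assumed estimator).
    p1, p2 are sample allele frequencies; n1, n2 are numbers of sampled alleles."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0

def null_tail_probability(freq, n_ind1, n_ind2, threshold, reps, rng):
    """Fraction of null replicates whose FST exceeds the threshold."""
    n1, n2 = 2 * n_ind1, 2 * n_ind2  # diploid individuals -> alleles
    hits = 0
    for _ in range(reps):
        # Binomial sampling of alleles from one shared frequency.
        p1 = sum(rng.random() < freq for _ in range(n1)) / n1
        p2 = sum(rng.random() < freq for _ in range(n2)) / n2
        if hudson_fst(p1, n1, p2, n2) > threshold:
            hits += 1
    return hits / reps

rng = random.Random(2025)
tail = null_tail_probability(freq=0.438, n_ind1=38, n_ind2=63,
                             threshold=0.0247, reps=10000, rng=rng)
print(f"P(FST > 0.0247 | no frequency difference) ~ {tail:.3f}")
```

Because the simulation assumes error-free diploid genotypes, it omits the additional variance from low coverage that, as noted above, would make large FST values even more likely under the null.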

Fig. 3 | No evidence of selection at rs2549794.

a, Histogram of simulated FST values for 2 samples of 38 and 63 diploid individuals from 2 populations with identical allele frequency of 0.438. The dashed red line shows the reported value for rs2549794. b, Distribution of estimated selection coefficients (ŝ), conditional on passing FST and directionality filters, under a null model of identical allele frequency of 0.438 in all populations assuming diploid coverage in all individuals. c, Estimated frequencies of rs2549794 in the 3 time points from Klunk et al. plus the periods 1,000–2,000 BP and the present day. Dashed lines show present-day frequencies and error bars show approximate 95% confidence intervals that overlap the present-day frequency for all time points except post-Black Death (BD) Denmark.

We also examined rs2549794 in ancient and present-day populations from the UK and Denmark (Supplementary Methods). The C allele frequency in present-day people with ‘British/Irish’ ancestry from UK Biobank is 0.438 (n = 264,261 alleles), not significantly different from that in ancient individuals from England dated between 2,000 and 1,000 years before present (BP) (frequency = 0.466, n = 146 alleles, χ² P = 0.55). Similarly, the frequency in present-day Denmark (0.429, n = 100,528 alleles) is not significantly different from that at 2,000–1,000 BP (0.368, n = 76 alleles, P = 0.34). There is therefore no evidence that the frequency of rs2549794 has changed in either England or Denmark over the past 2,000 years (Fig. 3c). Finally, this locus shows no evidence of selection in scans to detect adaptation in Britain in this period using a range of different data and methods (refs. 8–10).
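The frequency comparisons above can be checked approximately from the quoted numbers. The sketch below makes two assumptions that are not stated in the text: allele counts are reconstructed by rounding frequency × n, and the test is Pearson's chi-square on a 2 × 2 table with Yates' continuity correction. Under those assumptions the P values come out close to the 0.55 and 0.34 quoted above; the exact procedure is described in the Supplementary Methods.

```python
import math

def allele_counts(freq, n_alleles):
    """Reconstruct (C, non-C) allele counts from a rounded frequency (assumption)."""
    c = round(freq * n_alleles)
    return c, n_alleles - c

def chi2_yates_2x2(a, b, c, d):
    """Chi-square statistic with Yates' correction and 1-df P value for the
    table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    chi2 = 0.0
    for obs, i, j in observed:
        expected = rows[i] * cols[j] / n
        chi2 += max(abs(obs - expected) - 0.5, 0.0) ** 2 / expected
    # Survival function of chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# England: 2,000-1,000 BP versus present-day UK Biobank.
_, p_england = chi2_yates_2x2(*allele_counts(0.466, 146),
                              *allele_counts(0.438, 264261))
# Denmark: 2,000-1,000 BP versus present day.
_, p_denmark = chi2_yates_2x2(*allele_counts(0.368, 76),
                              *allele_counts(0.429, 100528))
print(f"England P = {p_england:.2f}, Denmark P = {p_denmark:.2f}")
```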

Conclusion

Inadequate correction for multiple testing, implausibly large effects, and post hoc rationalization for non-significant associations are precisely the issues that led to the development of statistical standards for genome-wide scans. Even ignoring the technical issues and batch effects that we identified, the evidence presented by Klunk et al. does not meet these standards.

Methods

See Supplementary Information for detailed methods.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-08496-5.

Supplementary Material

Supplementary Methods

Acknowledgements

P.S. was supported by the European Molecular Biology Organisation, the Vallee Foundation, the European Research Council (grant 852558), the Wellcome Trust (217223/Z/19/Z), and Francis Crick Institute core funding (FC001595) from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust. C.G.S. and I. Moltke were funded by a European Research Council starting grant awarded to I. Moltke (ERC-2018-STG-804679). I. Moltke was also supported by a Villum Fonden Young Investigator grant (project VIL19114). A.R.B. and D.R. were supported by the John Templeton Foundation (grant 61220). D.R. was also supported by National Institutes of Health grant HG012287, the Allen Discovery Center programme, a Paul G. Allen Frontiers Group advised programme of the Paul G. Allen Family Foundation, and the Howard Hughes Medical Institute. I. Mathieson was supported by the National Institute of General Medical Sciences (GM133708). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other funding agencies.

Footnotes

Competing interests The authors declare no competing interests.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-024-08496-5.

Code availability

Code to replicate the analyses described here is available at https://github.com/arbarton/Klunk_matters_arising/.

Data availability

No new data were generated for this publication. The sequence data from Klunk et al. (ref. 1) were obtained from the authors. Other data sources are described in the Supplementary Methods.
