eLife. 2021 Aug 19;10:e67509. doi: 10.7554/eLife.67509

Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin’s Paradox

Vince Buffalo
Editors: Guy Sella, Detlef Weigel
PMCID: PMC8486380  PMID: 34409937

Abstract

Neutral theory predicts that genetic diversity increases with population size, yet observed levels of diversity across metazoans vary only two orders of magnitude while population sizes vary over several. This unexpectedly narrow range of diversity is known as Lewontin’s Paradox of Variation (1974). While some have suggested selection constrains diversity, tests of this hypothesis seem to fall short. Here, I revisit Lewontin’s Paradox to assess whether current models of linked selection are capable of reducing diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine previously-published estimates of pairwise diversity from 172 metazoan taxa with newly derived estimates of census sizes. Using phylogenetic comparative methods, I show this relationship is significant accounting for phylogeny, but with high phylogenetic signal and evidence that some lineages experience shifts in the evolutionary rate of diversity deep in the past. Additionally, I find a negative relationship between recombination map length and census size, suggesting abundant species have less recombination and experience greater reductions in diversity due to linked selection. However, I show that even assuming strong and abundant selection, models of linked selection are unlikely to explain the observed relationship between diversity and census sizes across species.

Research organism: None

Introduction

A longstanding mystery in evolutionary genetics is that the observed levels of genetic variation across sexual species span an unexpectedly narrow range. Under neutral theory, the average number of nucleotide differences between sequences (pairwise diversity, π) is determined by the balance of new mutations and their loss by genetic drift (Kimura and Crow, 1964; Malécot, 1948; Wright, 1931). In particular, expected pairwise diversity at neutral sites in a panmictic population of Nc diploids is $\pi \approx 4N_c\mu$, where μ is the per basepair per generation mutation rate. Given that metazoan germline mutation rates only differ 10-fold ($10^{-8}$–$10^{-9}$; Kondrashov and Kondrashov, 2010; Lynch, 2010), and census sizes vary over several orders of magnitude, under neutral theory one would expect pairwise diversity to also vary over several orders of magnitude. However, early allozyme surveys revealed that diversity levels across a wide range of species varied by just an order of magnitude (Lewontin, 1974, p. 208); this is known as Lewontin’s ‘Paradox of Variation’. With modern sequencing-based estimates of π across taxa ranging over only three orders of magnitude (0.01–10%; Leffler et al., 2012), Lewontin’s paradox remains unresolved in the genomics era.
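
To make the scale of this expectation concrete, the following is a minimal sketch of the neutral prediction, assuming the effective size equals the census size. The function name and the example census sizes are hypothetical and chosen only for illustration; the mutation rate is the lower end of the range quoted above.

```python
# Neutral expectation for pairwise diversity, pi ~= 4 * Nc * mu,
# assuming (for illustration only) that effective size equals census size.

def expected_neutral_diversity(census_size: float, mutation_rate: float) -> float:
    """Return 4 * Nc * mu, the approximate neutral pairwise diversity."""
    return 4.0 * census_size * mutation_rate

# Hypothetical census sizes spanning six orders of magnitude.
census_sizes = [1e4, 1e6, 1e8, 1e10]
mu = 1e-9  # per basepair per generation

predictions = [expected_neutral_diversity(n, mu) for n in census_sizes]
for n, pi in zip(census_sizes, predictions):
    print(f"Nc = {n:.0e}: expected pi = {pi:.1e}")
```

Because the approximation is linear in Nc, the predicted diversity spans as many orders of magnitude as the census sizes do. For the largest census sizes the product exceeds one difference per basepair, which is a reminder that the linear form only holds while 4Ncμ is small.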

Early explanations for Lewontin’s Paradox were framed in terms of the neutralist–selectionist controversy (Lewontin, 1974; Kimura, 1984; Gillespie, 1991; Gillespie, 2001). The neutralist view is that beneficial alleles are sufficiently rare, and deleterious alleles are removed sufficiently quickly, that levels of genetic diversity are shaped predominantly by genetic drift and mutation (Kimura, 1984). Specifically, non-selective processes decouple the effective population size implied by observed levels of diversity $\hat{\pi}$, $\tilde{N}_e = \hat{\pi}/4\mu$, from the census size, Nc. By contrast, the selectionist view is that direct selection and the indirect effects of selection on linked neutral diversity suppress diversity levels across taxa, specifically because the impact of linked selection is greater in large populations. Undoubtedly, these opposing views represent a false dichotomy, as population genomic studies have uncovered evidence for the substantial impact of both demographic history (e.g. Zhao et al., 2013; Palkopoulou et al., 2015) and linked selection on genome-wide diversity (e.g. Elyashiv et al., 2016; Begun and Aquadro, 1992; Aguade et al., 1989; McVicker et al., 2009).
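
As a rough illustration of the implied effective population size, the sketch below inverts the neutral expectation for the two ends of the diversity range quoted in the Introduction (0.01% and 10%). The function name is hypothetical, and the single mutation rate of 10^-9 is an assumption made only to keep the example simple.

```python
# Effective population size implied by an observed diversity level,
# N_e = pi_hat / (4 * mu). The mutation rate here is an assumed value.

def implied_effective_size(observed_pi: float, mutation_rate: float) -> float:
    """Return the effective size implied by observed pairwise diversity."""
    return observed_pi / (4.0 * mutation_rate)

mu = 1e-9  # assumed per basepair per generation mutation rate

low = implied_effective_size(1e-4, mu)   # pi = 0.01%
high = implied_effective_size(1e-1, mu)  # pi = 10%
print(f"implied Ne ranges from {low:.2e} to {high:.2e}")
```

Under this assumption the implied effective sizes span three orders of magnitude, the same span as the diversity estimates themselves.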

Possible resolutions of Lewontin’s Paradox

A resolution of Lewontin’s Paradox would involve a mechanistic description and quantification of the evolutionary processes that prevent diversity from scaling with census sizes across species. This would necessarily connect to the broader literature on the empirical relationship between diversity and population size (Frankham, 1996; Nei and Graur, 1984; Soulé, 1976; Leroy et al., 2021), and the ecological and life history correlates of genetic diversity (Nevo, 1978; Powell, 1975; Nevo et al., 1984). Three categories of processes stand out as potentially capable of decoupling census sizes from diversity: non-equilibrium demography, variance and skew in reproductive success, and selective processes.

It has long been appreciated that effective population sizes are typically less than census population sizes, tracing back to early debates between R.A. Fisher and Sewall Wright (Fisher and Ford, 1947; Wright, 1948). Possible causes of this divergence between effective and census population sizes include demographic history (e.g. population bottlenecks), extinction and recolonization dynamics, or the breeding structure of populations (e.g. the variance in reproductive success and population substructure). Early explanations for Lewontin’s Paradox suggested bottlenecks during the last glacial maximum severely reduced population sizes (Kimura, 1984; Ohta and Kimura, 1973; Nei and Graur, 1984), and emphasized that large populations recover to equilibrium diversity levels more slowly (Nei and Graur, 1984, Kimura, 1984 p. 203–204). Another explanation is that cosmopolitan species repeatedly endure extinction and recolonization events, which reduces effective population size (Maruyama and Kimura, 1980; Slatkin, 1977).

While intermittent demographic events like bottlenecks and recent expansions have long-term impacts on diversity (since mutation-drift equilibrium is reached over a number of generations on the order of the population size), characteristics of the breeding structure such as high variance (Vw) or skew in reproductive success continuously suppress diversity below the levels predicted by the census size (Wright, 1938). For example, in many marine animals, females are highly fecund, and dispersing larvae face extremely low survivorship, leading to high variance in reproductive success (Waples et al., 2018; Waples et al., 2013; Hedgecock and Pudovkin, 2011; Hauser and Carvalho, 2008). Such ‘sweepstakes’ reproductive systems can lead to remarkably small ratios of effective to census population size (e.g. $N_e/N_c$ can range from $10^{-6}$ to $10^{-2}$), since $N_e/N \approx 1/V_w$ (Hedgecock, 1994; Wright, 1938; Nunney, 1993), and require multiple-merger coalescent processes to describe their genealogies (Eldon and Wakeley, 2006). Overall, these reproductive systems diminish the diversity in some species, but seem unlikely to explain Lewontin’s Paradox broadly across metazoans.
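
The approximation relating the ratio of effective to census size to the variance in reproductive success can be sketched as follows. The function name and the example variances are hypothetical; the relationship used is the large-variance approximation stated above, not a full model of sweepstakes reproduction.

```python
# Approximate ratio of effective to census size under high variance in
# reproductive success: Ne/N ~= 1/Vw (a large-Vw approximation).

def ne_to_n_ratio(offspring_variance: float) -> float:
    """Return the approximate Ne/N ratio for a given variance Vw."""
    if offspring_variance <= 0:
        raise ValueError("offspring variance must be positive")
    return 1.0 / offspring_variance

# Hypothetical variances in reproductive success.
for vw in (1e2, 1e4, 1e6):
    print(f"Vw = {vw:.0e}: Ne/N ~= {ne_to_n_ratio(vw):.0e}")
```

The hypothetical variances were chosen so that the resulting ratios cover the range of Ne/Nc values quoted above.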

Alternatively, selective processes, and in particular the indirect effects of selection on linked neutral variation, could potentially explain the observed narrow range of diversity. The earliest mathematical model of hitchhiking was proffered as an explanation of Lewontin’s Paradox (Smith and Haigh, 1974). Since then, linked selection has been shown to impact diversity levels in a variety of species, as evidenced by the correlation between recombination and diversity (Aguade et al., 1989; Begun and Aquadro, 1992; Cutter and Payseur, 2003; Stephan and Langley, 1998; Cai et al., 2009). Theoretical work to explain this pattern has considered the impact of a steady influx of beneficial mutations (recurrent hitchhiking; Stephan et al., 1992; Stephan, 1995), and purifying selection against deleterious mutations (background selection, BGS; Charlesworth et al., 1993; Nordborg et al., 1996; Hudson and Kaplan, 1994). Indeed, empirical work indicates background selection diminishes diversity around genic regions in a variety of species (McVicker et al., 2009; Hernandez et al., 2011; Charlesworth, 1996), and efforts have now shifted towards teasing apart the effects of positive and negative selection on genomic diversity (Elyashiv et al., 2016).

A class of models that are of particular interest in the context of Lewontin’s Paradox are recurrent hitchhiking models that decouple diversity from the census population size. These models predict diversity levels when strongly selected beneficial mutations regularly enter and sweep through the population, trapping lineages and forcing them to coalesce (Kaplan et al., 1989; Gillespie, 2000). In general, decoupling occurs under these hitchhiking models when the rate of coalescence due to selection is much greater than the rate of neutral coalescence (e.g. Coop and Ralph, 2012, Equation 22). In contrast, under other linked selection models, the resulting effective population size is proportional to population size; these models cannot decouple diversity, all else equal. For example, models of background selection and polygenic fitness variation predict diversity is proportional to population size, mediated by the total recombination map length and the deleterious mutation rate or fitness variation (Charlesworth et al., 1993; Nicolaisen and Desai, 2012; Nordborg et al., 1996; Robertson, 1961; Santiago and Caballero, 1995).

Recent approaches towards resolving Lewontin’s Paradox

Recently, Corbett-Detig et al., 2015 used population genomic data to estimate the reduction in diversity due to background selection and hitchhiking across 40 species, and showed that the impact of selection increases with two proxies of census population size, species range and the inverse of body size. Based on this evidence, they argued that selection could explain Lewontin’s Paradox; however, in a re-analysis, Coop, 2016 demonstrated that the observed magnitude of these reductions is insufficient to explain the orders-of-magnitude shortfall between observed and expected levels of diversity across species. Other recent work has found that life history characteristics related to parental investment, such as propagule size, are good predictors of diversity in animals (Romiguier et al., 2014; Chen et al., 2017). Nevertheless, while these diversity correlates are important clues, they do not propose a mechanism by which these traits act to constrain diversity within a few orders of magnitude.

Here, I revisit Lewontin’s Paradox by integrating several data sets in order to compare the observed relationship between diversity and census size with the predicted relationship under different selection models. Prior surveys of genetic diversity either lacked census population size estimates, used allozyme-based measures of heterozygosity, or included fewer species. To address these shortcomings, I first estimate census sizes by combining predictions of population density based on body size with ranges estimated from geographic occurrence data. Using these estimates, I quantify the relationship between census size and previously-published genomic diversity estimates across 172 metazoan taxa within nine phyla, thus characterizing the relationship between π and Nc that underlies Lewontin’s Paradox.

Past work looking at the relationship between π and Nc has been unable to fully account for phylogenetic non-independence across taxa (Felsenstein, 1985). To address this, I use phylogenetic comparative methods (PCMs) with a synthetic time-calibrated phylogeny to account for shared phylogenetic history. Moreover, it is disputed whether considering phylogenetic non-independence is necessary in population genetics, given that coalescent times within species are much less than divergence times (Whitney and Garland, 2010; Lynch, 2011). Using PCMs, I address this by estimating the degree of phylogenetic signal in the diversity–census-size relationship, and investigating how these traits evolve along the phylogeny.

Finally, I explore whether the predicted reductions of diversity under background selection and recurrent hitchhiking are sufficiently strong to resolve Lewontin’s Paradox. I do so using selection parameters from Drosophila melanogaster, a species known to be strongly affected by linked selection. Given that the effects of linked selection are mediated by recombination map length, I also investigate how map lengths vary with census population size using data from a previously-published survey (Stapley et al., 2017). I find map lengths are typically shorter in large–census-size species, increasing the effects of linked selection in these species, which could further decouple diversity from census size. Still, I find the combined impact of these modes of linked selection falls short in explaining Lewontin’s Paradox, and discuss future avenues through which the Paradox of Variation could be fully resolved.

Results

Estimates of census population size

An impediment in resolving Lewontin’s Paradox is characterizing the relationship between diversity and census population sizes. This is difficult because census population sizes are unavailable for many taxa, especially for extremely abundant, cosmopolitan species that define the upper limit of ranges. Previous work has surveyed the literature for census size estimates (Nei and Graur, 1984; Soulé, 1976; Frankham, 1996), or used range, body size, or qualitative categories as proxies for census size (Corbett-Detig et al., 2015; Leffler et al., 2012). To quantify the relationship between genomic estimates of diversity and census population sizes, I first approximate census population sizes for 172 metazoan taxa (Figure 1). I estimate population densities based on an empirical linear relationship between body sizes and density that holds across metazoans (see Figure 1—figure supplement 1; Damuth, 1981; Damuth, 1987). Then, from geographic occurrence data, I estimate range sizes. Finally, I estimate population size as the product of these predicted densities and range estimates (see Materials and methods: Macroecological Estimates of Population Size). Note that the relationship between population density and body size is driven by energy budgets, and thus reflects macroecological equilibria (Damuth, 1987). Consequently, population sizes are underestimated for taxa like humans and their domesticated species, and overestimated for species with anthropogenically reduced densities or fragmented ranges. For example, the population size of Lynx lynx is likely around 50,000 (IUCN, 2020) which is around two orders of magnitude smaller than my estimate. Additionally, the range size estimates do not consider whether an area has unsuitable habitat, and thus may be overestimated for species with particular niches or patchy habitats. 
While my approach produces approximate and sometimes crude estimates, it has the advantage that it can be efficiently calculated for numerous taxa, which is sufficient to estimate the magnitude of Lewontin’s Paradox (see Population Size Validation for more on validation based on biomass and other approaches).
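
The estimation procedure described above (predict density from body mass, then multiply by range) can be sketched as follows. The intercept and slope are hypothetical placeholders and not the values fitted in this study; the units (grams, individuals per km², km²) and the function names are likewise assumptions made for illustration.

```python
import math

# Hypothetical placeholder coefficients for the log-log relationship
# log10(density) = INTERCEPT + SLOPE * log10(body mass).
# These are NOT the coefficients estimated in the study.
INTERCEPT = 4.0   # log10 individuals per km^2 at a body mass of 1 g (assumed)
SLOPE = -0.75     # density declines with body mass (assumed value)

def predicted_density(body_mass_g: float) -> float:
    """Predict population density (individuals per km^2) from body mass."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(body_mass_g))

def census_size(body_mass_g: float, range_km2: float) -> float:
    """Approximate census size as predicted density times range area."""
    return predicted_density(body_mass_g) * range_km2

# A hypothetical 10 kg animal occupying one million km^2.
print(f"approximate census size: {census_size(1e4, 1e6):.2e}")
```

Because the estimate is a simple product, errors in either the predicted density or the range (for example, unsuitable habitat inside the range) carry through directly to the census size.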

Figure 1. The distribution of approximate census population sizes estimated by this study.

Some phyla containing few species were excluded for clarity.

Figure 1—source data 1. The population size estimates for 172 metazoan taxa.

Figure 1—figure supplement 1. The relationship between body mass and population density found by Damuth, 1987, which is used to predict population densities.

The source of these data is the appendix table of Damuth, 1987; the color indicates Damuth’s original group labels. The dashed line was estimated using a lognormal regression model in Stan. References to each measurement are available in Damuth, 1987.

Figure 1—figure supplement 2. The fraction of total species per class on earth included in this study’s sample.

The color of the points represents phylum, and the size of the point represents the absolute number of species by class.

Figure 1—figure supplement 3. Comparison of this paper’s range estimation procedure against the IUCN Red List’s range estimates.

The correspondence between the ranges estimated with the alpha hull method applied to GBIF data used in this paper and IUCN Red List’s Extent of Occurrence for the subset of species in both datasets. Note that the IUCN Red List contains predominantly endangered species, which leads to ascertainment bias; still, the high correlation between the estimated ranges shows the alpha hull method works well.

Figure 1—figure supplement 4. Validation of this paper’s range estimates against the categorical labels of Leffler et al., 2012.

The estimated ranges using GBIF occurrence data, ordered within and colored by the original range category labels assigned in Leffler et al., 2012.

Figure 1—figure supplement 5. The relationship between body length (meters) and body mass (grams) in the Romiguier et al., 2014 data set.

This relationship is used to infer body masses for taxa. The gray dashed line is the line of best fit inferred using Stan.

Characterizing the Diversity–Census-size Relationship

To determine which ecological or evolutionary processes could decouple diversity from census population size, we first need to quantify this relationship across a wide variety of taxa. Previous work has found there is a significant relationship between heterozygosity and the logarithm of population size or range size, but these studies relied on heterozygosity measured from allozyme data (Soulé, 1976; Frankham, 1996; Nei and Graur, 1984). I confirm these findings using pairwise diversity estimates from genomic sequence data and the estimated census sizes (Figure 2). The pairwise diversity estimates are from three sources: Leffler et al., 2012, Corbett-Detig et al., 2015, and Romiguier et al., 2014, and are predominantly from either synonymous or non-coding DNA (see Materials and methods: Diversity and Map Length Data). Overall, an ordinary least squares (OLS) relationship on a log-log scale fits the data well (Figure 2, gray dashed line). The OLS slope estimate is significant and implies a 13% increase in differences per basepair for every order of magnitude increase in census size (95% confidence interval [12%, 14%], adjusted $R^2 = 0.26$; see also the per-phylum OLS fits, Figure 2—figure supplement 2).

Figure 2. A visualization of Lewontin’s Paradox of Variation.

Pairwise diversity (data from Leffler et al., 2012, Corbett-Detig et al., 2015, and Romiguier et al., 2014), which varies over three orders of magnitude, shows a weak relationship with approximate population size, which varies over 12 orders of magnitude. The shaded curve shows the range of expected neutral diversity if Ne were to equal Nc under the four-alleles model, $\log_{10}(\pi) = \log_{10}(\theta) - \log_{10}(1 + 4\theta/3)$ where $\theta = 4N_c\mu$, for two mutation rates, $\mu = 10^{-8}$ and $\mu = 10^{-9}$, and the light gray dashed line represents the maximum pairwise diversity under the four-alleles model. The dark gray dashed line is the OLS regression fit, and the blue dashed line is the regression fit using a phylogenetic mixed-effects model. Points are colored by phylum. The species Equus ferus przewalskii ($N_c \approx 10^3$ and $\pi = 3.6 \times 10^{-3}$) was an outlier and excluded from this figure for visual clarity.
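
The four-alleles expectation in this caption can be evaluated directly. The following is a minimal sketch with a hypothetical function name; it takes the caption's formula at face value (π = θ/(1 + 4θ/3), with θ = 4Ncμ) and assumes Ne equals Nc, as the shaded curve does.

```python
import math

def four_alleles_diversity(census_size: float, mutation_rate: float) -> float:
    """Expected pairwise diversity under the four-alleles model.

    Uses pi = theta / (1 + 4 * theta / 3) with theta = 4 * Nc * mu,
    which saturates at 3/4 as theta grows large.
    """
    theta = 4.0 * census_size * mutation_rate
    return theta / (1.0 + 4.0 * theta / 3.0)

# Evaluate the curve over a hypothetical grid of census sizes.
for exponent in (3, 6, 9, 12, 15):
    nc = 10.0 ** exponent
    lo = four_alleles_diversity(nc, 1e-9)
    hi = four_alleles_diversity(nc, 1e-8)
    print(f"Nc = 1e{exponent}: log10(pi) from {math.log10(lo):.2f} to {math.log10(hi):.2f}")
```

For small θ the expectation is close to θ itself, while for very large census sizes it approaches the maximum of 3/4 difference per basepair implied by the formula.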

Figure 2—source data 1. The diversity and population size dataset for 172 metazoan taxa.

Figure 2—figure supplement 1. A linear-log version of Figure 2.

Points are colored by phylum, and the shaded region is the predicted neutral level of diversity assuming Ne=Nc, with the mutation rate ranging over $10^{-9} \le \mu \le 10^{-8}$.

Figure 2—figure supplement 2. A version of Figure 2 with OLS estimates per phylum.

Diversity and approximate population size for 172 taxa, colored by phylum; the dashed lines indicate the non-phylogenetic OLS estimates of the relationship between population size and diversity, fit separately for each phylum.

Figure 2—figure supplement 3. The posterior distributions and fitted relationship between diversity and both body mass and range size.

The relationship between diversity (differences per basepair) and body mass (left) and range (right) across 172 species. The top row shows the posterior distributions of the intercept, slope, and phylogenetic signal, estimated with the phylogenetic mixed-effects model using the 166 taxa in the synthetic phylogeny. The bottom row shows each species as a point, colored by phylum. The gray dashed line is the non-phylogenetic standard regression estimate, and the blue dashed line is the relationship fit by the phylogenetic mixed-effects model.

Figure 2—figure supplement 4. Pairwise diversity grouped by the range categories from Leffler et al., 2012, with point size indicating the predicted population density.

The vertical lines are the range category group means.

Notably, this relationship has few outliers and is relatively homoscedastic. This is in part because of the log-log scale, in contrast to previous work (Nei and Graur, 1984; Soulé, 1976); see Figure 2—figure supplement 1 for a version on a linear-log scale. However, it is noteworthy that few taxa have diversity estimates below $10^{-3.5}$ differences per basepair. Those that do, lynx (Lynx lynx), wolverine (Gulo gulo), and Massasauga rattlesnake (Sistrurus catenatus), face habitat loss and declining population sizes. These three species are all in the IUCN Red List, but are listed as least concern (though their presence in the Red List indicates they are of conservation interest). In Appendix D (Diversity and IUCN Red List Status), I explore the relationships between IUCN Red List status, diversity, and population size.

Phylogenetic non-independence and the population size diversity relationship

One limitation of using ordinary least squares is that shared phylogenetic history can create correlation structure in the residuals, which violates an assumption of the regression model and can lead to bias (Felsenstein, 1985; Revell, 2010). To address this shortcoming, I fit the diversity–census-size relationship using a phylogenetic mixed-effects model, investigated whether there is a signal of phylogenetic non-independence, estimated the continuous trait values on the phylogeny, and explored how diversity and population size evolve. Prior population genetic comparative studies have lacked time-calibrated phylogenies and assumed unit branch lengths (Whitney and Garland, 2010), a shortcoming that has drawn criticism (Lynch, 2011). I use a synthetic time-calibrated phylogeny created from the DateLife project (O’Meara et al., 2020) to account for shared phylogenetic history (see Materials and methods: Phylogenetic Comparative Methods).

Using a phylogenetic mixed-effects model (Lynch, 1991; Hadfield and Nakagawa, 2010; de Villemereuil and Nakagawa, 2014) implemented in Stan (Carpenter et al., 2017; Stan Development Team, 2020), I estimated the linear relationship between diversity and population size (on a log-log scale) accounting for phylogeny, for the 166 taxa without missing data and present in the synthetic chronogram. This type of model is needed because closely-related species may differ from the average trend between Nc and π in similar ways due to shared phylogenetic history, similar life history traits, etc., and thus do not represent independent observations as is assumed by the standard regression model. This is a form of phylogenetic pseudoreplication, and can be accounted for with a phylogenetic mixed-effects model. The phylogenetic mixed-effects model does not assume that there is phylogenetic structure in either Nc or π (which itself is not a violation of the standard regression model, Revell, 2010 and Uyeda et al., 2018), but rather accounts for phylogenetic correlation structure in the residuals if any is present. Importantly, phylogenetic mixed-effects models simultaneously estimate the degree of phylogenetic structure in the residuals while fitting the relationship between Nc and π. If the residuals are distributed independently, the estimated relationship would be similar to that found by ordinary least squares, and the estimated phylogenetic signal would be zero. Overall, this approach is conservative, making no assumptions about the source of the phylogenetic signal while accounting for violations of the regression model due to dependence among the residuals if present (see Revell, 2010 for a discussion of this).
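
For reference, a phylogenetic mixed-effects model of this kind is commonly written in the form below. This is a sketch of the general form found in the cited literature (e.g. Hadfield and Nakagawa, 2010), not necessarily the exact parameterization or priors used in this study; the symbols $y_i$, $x_i$, $\beta_0$, $\beta_1$, $a_i$, $e_i$, and $\mathbf{A}$ are notation assumed here for illustration.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + a_i + e_i, \\
\mathbf{a} &\sim \mathcal{N}\!\left(\mathbf{0},\; \sigma_p^2 \mathbf{A}\right), \\
\mathbf{e} &\sim \mathcal{N}\!\left(\mathbf{0},\; \sigma_r^2 \mathbf{I}\right)
\end{aligned}
```

Here $y_i = \log_{10}(\pi_i)$ and $x_i = \log_{10}(N_{c,i})$ for taxon $i$, $\mathbf{A}$ is the phylogenetic correlation matrix implied by shared branch lengths on the tree, $\sigma_p^2$ is the variance of the phylogenetic effect, and $\sigma_r^2$ is the residual variance. When $\sigma_p^2 = 0$ the model reduces to the standard regression with independent residuals.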

As with the linear regression, I find this relationship is positive and significant (95% credible interval [0.03, 0.11]), though somewhat attenuated compared to the OLS estimates (Figure 3B). Since the population size estimates are based on range and body mass, they are essentially a composite trait; fitting phylogenetic mixed-effects models separately on body mass and range indicates these have significant negative and positive effects, respectively (Figure 2—figure supplement 3; see also Figure 2—figure supplement 4 for the relationship between diversity and the range categories of Leffler et al., 2012).

Figure 3. Phylogenetic comparative models of diversity and population size.

(A) The ancestral continuous trait estimates for the population size and diversity (differences per bp, log scaled) across the phylogeny of 166 taxa. The phyla of the tips are indicated by the color bar in the center. (B) The posterior distributions of the intercept, slope, and phylogenetic signal (λ, de Villemereuil and Nakagawa, 2014) of the phylogenetic mixed-effects model of diversity and population size (log scaled). Also shown are the 90% credible interval (light blue shading), posterior mean (blue line), OLS estimate (gray solid line), and bootstrap OLS confidence intervals (light gray shading). (C) The node-height tests of diversity, population size, and the two components of the population size estimates, body mass, and range (all traits on log scale before contrast was calculated). Each point shows the standardized phylogenetic independent contrast and branching time for a pair of lineages. Red lines are robust regression estimates (and are only shown for statistically significant relationships at the α=0.05 level). Note that some outlier pairs with very high phylogenetic independent contrasts were excluded (in all cases, these outliers were in the genus Drosophila).

Figure 3—figure supplement 1. The posterior distributions for the parameters of the phylogenetic mixed-effects model of diversity and population size (this is analogous to Figure 3B) fit separately on chordates (n=68), molluscs (n=13), and arthropods (n=68).

The phylogenetic mixed-effects model for chordates indicated the best-fitting model had no residual variance ($\sigma_r^2 = 0$), so an alternate model without this variance component was used to ensure proper convergence; this model is shown in green. The light blue (green) shaded regions are the 90% credible intervals, the blue (green) lines the posterior averages, the gray shaded regions the OLS bootstrap 95% confidence intervals, and the gray lines the OLS estimate. Note that unlike Figure 3, the OLS estimate uses all taxa, not just those present in the phylogeny, since splitting the data by phyla reduces sample sizes (OLS with just the subset of taxa in the phylogeny is not significant for either chordates or arthropods). The vertical dashed gray line indicates zero.

Figure 3—figure supplement 2. The ancestral continuous trait estimates for diversity and population size with species labels.

Figure 3—figure supplement 3. The ancestral continuous trait estimates for recombination map length and diversity and population size with species labels.

Since the phylogenetic mixed-effects model simultaneously estimates the variance of the phylogenetic effect ($\sigma_p^2$) and the residual variance ($\sigma_r^2$), these can be used to estimate the phylogenetic signal, $\lambda = \sigma_p^2 / (\sigma_p^2 + \sigma_r^2)$ (Lynch, 1991; de Villemereuil and Nakagawa, 2014; see Freckleton et al., 2002 for a comparison to Pagel’s λ). When residuals are free of correlations due to shared phylogenetic history, λ=0 and all the variance can be explained by evolution or noise on the tips. In the relationship between population size and diversity, the posterior mean of λ=0.67 (90% credible interval [0.58, 0.75]) suggests that the majority of the variance is attributable to shared phylogenetic history (Figure 3B).
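
The phylogenetic signal can be computed from posterior draws of the two variance components, as in the minimal sketch below. The draws shown are hypothetical values invented for illustration and are not output from the fitted model; the function name is likewise an assumption.

```python
import statistics

def phylogenetic_signal(sigma_p2: float, sigma_r2: float) -> float:
    """Return lambda = sigma_p^2 / (sigma_p^2 + sigma_r^2)."""
    return sigma_p2 / (sigma_p2 + sigma_r2)

# Hypothetical posterior draws of (phylogenetic variance, residual variance).
draws = [(0.9, 0.5), (1.1, 0.5), (1.0, 0.5), (1.2, 0.6)]

# Compute lambda per draw, then summarize, so that uncertainty in both
# variance components carries through to the summary.
lambdas = [phylogenetic_signal(p, r) for p, r in draws]
posterior_mean = statistics.fmean(lambdas)
print(f"posterior mean of lambda: {posterior_mean:.2f}")
```

Computing λ for each draw before averaging, rather than taking the ratio of averaged variances, keeps the summary a proper posterior mean of the derived quantity.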

This high degree of phylogenetic signal substantiates Gillespie's concern (Gillespie, 1991) that the π–Nc relationship may be driven by chordate-arthropod differences. A visual inspection of the estimated ancestral continuous values for diversity and population size on the phylogeny indicates the high phylogenetic signal seems to be driven in part by chordates having low diversity and small population sizes compared to non-chordates (Figure 3A). This problem resembles Felsenstein’s worst-case scenario (Felsenstein, 1985; Uyeda et al., 2018), where a singular event on a lineage separating two clades generates a spurious association between two traits.

To investigate whether clade-level differences dominated the relationship between diversity and population size, I fit phylogenetic mixed-effects models to phylum-level subsets of the data for clades with sufficient sample sizes (see Materials and methods: Phylogenetic Comparative Methods). This analysis shows a significant positive relationship between diversity and population size in arthropods, and weak positive relationships in molluscs and chordates (Figure 3—figure supplement 1). The 90% credible intervals for the slope all overlap, suggesting the relationship between π and Nc is similar across these clades.

I also explored the rate of trait change through time using node-height tests (Freckleton and Harvey, 2006). Node-height tests regress the absolute values of the standardized contrasts between lineages against the branching time (since present) of these lineages. Under Brownian Motion (BM), standardized contrasts are estimates of the rate of character evolution (Felsenstein, 1985); if a trait evolves under constant rate BM, this relationship should be flat. For both diversity and population size, node-height tests indicate a significant increase in the rate of evolution towards the present (robust regression p-values 0.023 and 0.00018, respectively; Figure 3C). Considering the constituents of the population size estimate, range and body mass, separately, the rate of evolution of range but not body mass shows a significant increase (p-value $1.03 \times 10^{-7}$) towards the present.
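
The logic of a node-height test can be sketched as a regression of absolute standardized contrasts on branching time. The example below substitutes ordinary least squares for the robust regression used in this study, purely to stay dependency-free; the contrasts and branching times are hypothetical values, and the function names are assumptions.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Return the ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def node_height_slope(contrasts: list[float], branching_times: list[float]) -> float:
    """Regress |standardized contrast| on branching time (time since present)."""
    return ols_slope(branching_times, [abs(c) for c in contrasts])

# Hypothetical standardized contrasts and branching times (Mya).
branching_times = [10.0, 50.0, 100.0, 300.0, 500.0]
contrasts = [0.9, -0.8, 0.5, -0.3, 0.2]

slope = node_height_slope(contrasts, branching_times)
print(f"slope of |contrast| on branching time: {slope:.5f}")
```

Because branching time is measured back from the present, a negative slope means contrasts are larger at recent nodes, which corresponds to an increase in the rate of evolution towards the present; a flat relationship is what constant-rate BM predicts.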

Interestingly, the diversity node-height test reveals two rate shifts at deeper splits (Figure 3C, top left) around 570 Mya. These nodes represent the branches between tunicates and vertebrates in chordates, and cephalopods and pleistomollusca (bivalves and gastropods) in molluscs. While the cephalopod-pleistomollusca split outlier may be an artifact of having a single cephalopod (Sepia officinalis) in the phylogeny, the tunicate-vertebrate split outlier is driven by the low diversity of vertebrates and the previously-documented exceptionally high diversity of tunicates (sea squirts; Nydam and Harrison, 2010; Small et al., 2007). This deep node representing a rate shift in diversity could reflect a change in either effective population size or mutation rate, and there is some evidence of both in the tunicate genus Ciona (Small et al., 2007; Tsagkogeorga et al., 2012). Neither of these deep rate shifts in diversity is mirrored in the population size node-height test (Figure 3C, top right). Rather, it appears a trait impacting diversity but not census size (e.g. mutation rate or offspring distributions) has experienced a shift on the lineage separating tunicates and vertebrates. At nearly 600 Mya, these deep nodes illustrate that expected effective population sizes (and thus coalescence times) can share phylogenetic history, due to phylogenetic inertia in some combination of population size, reproductive system, and mutation rates.

Finally, an important caveat is that the increase in rate towards the tips could be caused by measurement noise, or possibly uncertainty or bias in the divergence time estimates deep in the tree. Inspecting the lineage pairs that lead to this increase in rate towards the tips indicates these represent plausible rate shifts, e.g. between cosmopolitan and endemic sister species like Drosophila simulans and Drosophila sechellia; however, ruling out measurement noise entirely as an explanation would involve modeling the uncertainty of diversity and population size estimates.

Assessing the impact of linked selection on diversity across taxa

The above analyses reemphasize the drastic shortfall of diversity levels as compared to census sizes. Linked selection has been proposed as the mechanism that acts to reduce diversity levels from what we would expect given census sizes (Smith and Haigh, 1974; Gillespie, 2000; Corbett-Detig et al., 2015). Here, I test this hypothesis by estimating the scale of diversity reductions expected under background selection and recurrent hitchhiking, and comparing these to the observed relationship between π and Nc.

I quantify the effect of linked selection on diversity as the ratio of observed diversity (π) to the estimated diversity in the absence of linked selection (π0), R=π/π0. Here, π0 would reflect only demographic history and non-heritable variation in reproductive success. There are two difficulties in evaluating whether linked selection could resolve Lewontin’s Paradox. The first difficulty is that π0 is unobserved. Previous work has estimated π0 using methods that exploit the spatial heterogeneity in recombination and functional density across the genome to fit linked selection models that incorporate both hitchhiking and background selection (Elyashiv et al., 2016; Corbett-Detig et al., 2015). The second difficulty is understanding how R varies across taxa, since we lack estimates of critical model parameters for most species. Still, I can address a key question: if diversity levels were determined by census sizes (π0=4Ncμ), would the combined effects of background selection and recurrent hitchhiking be sufficient to reduce diversity to observed levels? Furthermore, does the relationship between census size and predicted diversity under linked selection across species, πBGS+HH=Rπ0, match the observed relationship in Figure 2?

Since we lack estimates of selection parameters across species, I parameterize the hitchhiking and BGS models using estimates from Drosophila melanogaster, a species known to be strongly affected by linked selection (Sella et al., 2009). Under a generalized model of hitchhiking and background selection (Elyashiv et al., 2016; Coop and Ralph, 2012) and assuming Ne=Nc, the expected diversity is

$$\pi_{BGS+HH} \approx \frac{\theta}{1/B(U,L) + 2 N_c S(\gamma, J, L)} \qquad (1)$$

where θ=4Ncμ, B(U,L) is the effect of background selection, and S(γ,J,L) is the rate of coalescence caused by sweeps (Elyashiv et al., 2016, Equation 1; Coop and Ralph, 2012, Equation 20). Under background selection models with recombination, the reduction is B(U,L)=exp(−U/L), where U is the per diploid genome per generation deleterious mutation rate, and L is the recombination map length in Morgans (Hudson and Kaplan, 1994; Nordborg et al., 1996). This BGS model is similar to models of effective population size under polygenic fitness variation, and can account for other modes of linked selection (Robertson, 1961; Santiago and Caballero, 1995; Santiago and Caballero, 1998; see Appendix 2, Background Selection and Polygenic Fitness Models). The coalescence rate due to sweeps is S(γ,J,L)=(γ/L)J, where γ is the number of adaptive substitutions per generation, and J is the probability a lineage is trapped by sweeps as they occur across the genome (J2,2 in Equation 15 of Coop and Ralph, 2012).

Parameterizing the model this way, I then set the key parameters that determine the impact of recurrent hitchhiking and background selection (γ, J, and U) to strong selection values estimated for Drosophila melanogaster by Elyashiv et al., 2016. My estimate of the adaptive substitutions per generation (γDmel ≈ 2.3×10^-3) based on Elyashiv et al. implies a rate of sweeps per basepair of νBP,Dmel ≈ 2.34×10^-11, which is close to other estimates from D. melanogaster (see Figure 4—figure supplement 5A). The rate of deleterious mutations per diploid genome, per generation is parameterized using the estimate from Elyashiv et al., UDmel=1.6, which is slightly greater than previous estimates based on Bateman-Mukai approaches (Mukai, 1985; Mukai, 1988; Charlesworth, 1987). Finally, the probability that a lineage is trapped in a sweep, JDmel ≈ 4.5×10^-4, is calculated from the estimated genome-wide average coalescence rate due to sweeps from Elyashiv et al. (see Figure 4—figure supplement 5B and Materials and methods: Predicted Reductions in Diversity for more details on parameter estimates). Using these parameters, I then explore how the predicted range of diversity levels varies across species with recombination map length (L) and census population size (Nc).
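As a minimal sketch of how Equation 1 behaves under these parameter values, the following Python computes the predicted diversity and the reduction R = π/π0 across a range of census sizes. It assumes B(U,L) = exp(−U/L) and S(γ,J,L) = γJ/L as the functional forms. The map length and mutation rate are hypothetical illustration values, not estimates from the study, and the function names are invented here. The sketch uses θ = 4Ncμ directly and does not reproduce the four-alleles correction applied in the text, so the raw π0 column is not meaningful as a heterozygosity once θ approaches 1.

```python
import math


def bgs_reduction(U, L):
    """Background selection factor, assumed here to be B(U, L) = exp(-U / L)."""
    return math.exp(-U / L)


def sweep_coalescence_rate(gamma, J, L):
    """Coalescence rate due to sweeps, assumed here to be S = gamma * J / L."""
    return gamma * J / L


def predicted_diversity(Nc, mu, U, gamma, J, L):
    """Equation 1 with Ne = Nc: theta / (1/B + 2 * Nc * S), theta = 4 * Nc * mu."""
    theta = 4 * Nc * mu
    B = bgs_reduction(U, L)
    S = sweep_coalescence_rate(gamma, J, L)
    return theta / (1 / B + 2 * Nc * S)


# D. melanogaster parameter values quoted in the text.
U_DMEL, GAMMA_DMEL, J_DMEL = 1.6, 2.3e-3, 4.5e-4

# Hypothetical inputs chosen only for illustration (not from the paper).
L_EXAMPLE = 2.0    # recombination map length in Morgans
MU_EXAMPLE = 1e-9  # per basepair, per generation mutation rate

for Nc in (1e4, 1e6, 1e8, 1e10, 1e12):
    pi0 = 4 * Nc * MU_EXAMPLE
    pi = predicted_diversity(Nc, MU_EXAMPLE, U_DMEL, GAMMA_DMEL, J_DMEL, L_EXAMPLE)
    print(f"Nc={Nc:.0e}  pi0={pi0:.3g}  pi_BGS+HH={pi:.3g}  R={pi / pi0:.3g}")
```

Setting U and γ to zero recovers π = θ, which is a quick sanity check that the two selection terms enter only through the denominator.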

Previous work has found that the impact of linked selection increases with Nc (Corbett-Detig et al., 2015; see Figure 4—figure supplement 4A), and it is often thought that this is driven by higher rates of adaptive substitutions in larger populations (Ohta, 1992), despite equivocal evidence (Galtier, 2016). However, there is another mechanism by which species with larger population sizes might experience a greater impact of linked selection: recombination map length, L, is known to correlate with body mass (Burt and Bell, 1987) and thus varies inversely with population size. As this is a critical parameter that determines the genome-wide impact of both hitchhiking and background selection, I examine the relationship between recombination map length (L) and census population size (Nc) across taxa, using available estimates of map lengths across species (Stapley et al., 2017; Corbett-Detig et al., 2015). I find a significant non-linear relationship using phylogenetic mixed-effects models (Figure 4A; see Methods and materials: 4.4 Phylogenetic Comparative Methods). There is also a correlation between map length and genome size (Figure 4—figure supplement 2) and genome size and population size (Figure 4—figure supplement 1). These findings are consistent with the hypothesis that non-adaptive processes increase genome size in small-Ne species (Lynch and Conery, 2003) which in turn could increase map lengths, as well as the hypothesis that map lengths are adaptively longer to more efficiently select against deleterious alleles in smaller populations (Roze, 2021). Overall, the negative relationship between map length and census size indicates linked selection is expected to be stronger in species with short map lengths, which are high-Nc species.

Figure 4. Predicting the impact of linked selection on diversity.

(A) The observed relationship between recombination map length (L) and census size (Nc) across 136 species with complete data and known phylogeny. Triangle points indicate six social taxa excluded from the model fitting since these have adaptively higher recombination map lengths (Wilfert et al., 2007). The dark gray line is the estimated relationship under a phylogenetic mixed-effects model, and the gray interval is the 95% posterior average. (B) Points indicate the observed π–Nc relationship across taxa shown in Figure 2, and the blue ribbon is the range of predicted diversity were Ne=Nc for μ = 10^-8–10^-9, and after accounting for the expected reduction in diversity due to background selection and recurrent hitchhiking under Drosophila melanogaster parameters. In both plots, point color indicates phylum.

Figure 4—source data 1. The map length, population size, and linked selection estimates for 136 metazoan taxa.

Figure 4—figure supplement 1. The relationship between genome size and approximate census population size.

The dashed gray line indicates the OLS fit. Tiger salamander (Ambystoma tigrinum) was excluded because of its exceptionally large genome size (~30 Gbp).

Figure 4—figure supplement 2. The relationship between genome size and recombination map length.

The dashed gray line indicates the OLS fit for all taxa, and the dashed colored lines indicate the linear relationship fit by phyla. Tiger salamander (Ambystoma tigrinum) was excluded because of its exceptionally large genome size (~30 Gbp).

Figure 4—figure supplement 3. The observed π–Nc relationship (points) across species compared to the predicted diversity (ribbons) under different modes of linked selection and parameters, for a range of mutation rates, 10^-9–10^-8.

In both subplots, the gray ribbon is the expected diversity if Ne=Nc. In (A), the predicted impact on diversity for four modes of linked selection are depicted: background selection (purple) and hitchhiking (yellow) individually under the Drosophila melanogaster parameters as in Figure 4B, and strong background selection (red) where UstrongBGS = 10UDmel ≈ 16, and strong recurrent hitchhiking, where γstrongHH = 10γDmel ≈ 0.23. (B) The predicted diversity under the combined effects of strong background selection and strong hitchhiking (orange) compared to the original predicted diversity as in Figure 4B (blue). Overall, under strong background selection and hitchhiking parameters, predicted diversity would be less than observed for high-Nc species, indicating the poor fit to observed data is not sensitive to the choice of Drosophila melanogaster parameters.

Figure 4—figure supplement 4. The relationship between Nc and diversity in the Corbett-Detig et al., 2015 data, and the relationship between estimated reduction in diversity and census size, for three different approaches.

(A) The diversity data from Corbett-Detig et al., 2015 and the census population size estimated here for metazoan taxa. (B) The reductions in diversity, R=Ne/N, plotted against census size across species. The red points are the reductions estimated by Corbett-Detig et al., 2015. This confirms the finding of Corbett-Detig et al., 2015 that the impact of selection (I=1−R) increases with census population size (though, in the original paper, body size and range were used as separate proxy variables for census population size). The green and yellow points are the predicted reduction in diversity under the recurrent hitchhiking (RHH) and background selection (BGS) model using the Drosophila melanogaster parameters as described in the main text. The reduction in the diversity due to sweeps, from Equation 1, is determined by the term 2NS. Green points treat N as the implied effective population size from diversity, Ñe = π̂/4μ, assuming μ = 10^-9. Yellow points treat N as the census size, N=Nc. Overall, using the census size, i.e. 2NcS, leads to reductions in diversity that far exceed the empirical estimates of Corbett-Detig et al. and reasonable model-based predictions from Ñe.

Figure 4—figure supplement 5. Comparison of the Drosophila sweep parameters used in this study with parameters from other studies.

(A) The estimate of the number of sweeps per basepair, per genome (νBP) from Table 2 of Elyashiv et al., 2016 (the studies included are Li and Stephan, 2006; Andolfatto, 2007; Macpherson et al., 2007 and Jensen et al., 2008); the red point is my estimate used in this paper. (B) Points are the data from Shapiro et al., 2007. The blue line is the non-linear least squares fit to the data, and the green dashed line is the sweep model parameterized by the genome-wide average sweep coalescence rate 2NS ≈ 0.92 from the classic sweep and background selection model of Elyashiv et al., 2016 (rs in Supplementary Table S6).

Then, I predict the expected diversity (πBGS+HH) under background selection and hitchhiking, assuming Ne=Nc and that all species had the same rate of sweeps and strength of BGS as D. melanogaster. Since neutral mutation rates μ are unknown and vary across species, I calculate the range of predicted πBGS+HH estimates for μ = 10^-9–10^-8 (using the four-alleles model, Tajima, 1996), and compare this to the observed relationship between π and Nc in Figure 4B. Under these parameters and assumptions, linked selection begins to appreciably constrain diversity for Nc ≈ 10^7, since S(γDmel,JDmel,L) ≈ 10^-8–10^-7 and linked selection dominates drift when S(γ,J,L) > 1/(2N). Overall, this reveals two problems for the hypothesis that linked selection could solve Lewontin’s Paradox. First, low to mid-Nc species (census sizes between 10^4 and 10^7) have sufficiently long map lengths that their diversity levels are only moderately reduced by linked selection, leading to a wide gap between predicted and observed diversity levels. For this not to be the case, the rate of adaptive mutations or the deleterious mutation rate would need to be orders of magnitude higher for species within this range than in Drosophila melanogaster, which is incompatible with the rate of adaptive protein substitutions across species (Galtier, 2016) and overall mutation rates (Lynch, 2010). Furthermore, linked selection has been quantified in humans, which fall in this census size range, and has been found to be relatively weak (McVicker et al., 2009; Hernandez et al., 2011; Hellmann et al., 2008; Cai et al., 2009; Boyko et al., 2008). Second, while hitchhiking and BGS can reduce predicted diversity levels for high-Nc species (Nc > 10^12) to observed levels, this would imply available estimates of π0 are underestimated by several orders of magnitude in Drosophila (Figure 4—figure supplement 4B). The high reductions in π predicted here (compared to those of Elyashiv et al., 2016) are a result of using Nc, rather than Ne=π0/4μ in the denominator of Equation (1), which leads to a very high rate of sweeps in the population. I do not consider selective interference, though the saturation of adaptive substitutions per Morgan would only act to limit the reduction in diversity (Weissman and Barton, 2012), and thus these results are conservative.
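The census size at which sweeps begin to dominate drift follows from the condition S > 1/(2N). As a rough sketch, the code below solves S = 1/(2N) for N using the D. melanogaster values of γ and J quoted in the text and S = γJ/L as an assumed form. The two map lengths are hypothetical round numbers chosen for illustration, and the function names are invented here.

```python
# D. melanogaster sweep parameters quoted in the text.
GAMMA_DMEL = 2.3e-3  # adaptive substitutions per generation
J_DMEL = 4.5e-4      # probability a lineage is trapped by a sweep


def sweep_coalescence_rate(gamma, J, L):
    """Coalescence rate due to sweeps, assumed here to be S = gamma * J / L."""
    return gamma * J / L


def crossover_population_size(gamma, J, L):
    """Population size N at which S equals the drift rate 1 / (2N)."""
    return 1.0 / (2.0 * sweep_coalescence_rate(gamma, J, L))


# Hypothetical map lengths in Morgans (illustration only).
for L in (10.0, 30.0):
    S = sweep_coalescence_rate(GAMMA_DMEL, J_DMEL, L)
    N_star = crossover_population_size(GAMMA_DMEL, J_DMEL, L)
    print(f"L={L:>4.0f} M  S={S:.3g}  N*={N_star:.3g}")
```

For these inputs S falls in the 10^-8 to 10^-7 range and the crossover sits near 10^7, in line with the figures quoted above; longer maps push the crossover to larger populations.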

Finally, the poor fit between observed and predicted levels of diversity across species is not remedied by stronger selection parameters. In Figure 4—figure supplement 3B, I increase both selection parameters U and γ ten-fold each, and find the same qualitative pattern: on a log-log scale the relationship between Nc and π is linear, while the predicted diversity under linked selection is non-linear with Nc. Under this ten-fold higher selection regime, there is more overlap between observed and predicted levels of diversity, but diversity is severely under-predicted for high-Nc species. Additionally, this would imply that selection in low-to-mid-Nc species is ten-fold higher than estimated in Drosophila melanogaster, which is implausible. Overall, this suggests that present models of linked selection, even with very strong selection across species, are qualitatively incapable of matching the observed relationship between Nc and π and thus cannot explain Lewontin’s Paradox.
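One way to see why the predicted curve is non-linear on a log-log scale is to take the two limits of Equation 1. The derivation below is a sketch that holds the map length L fixed (an assumption; in the data L itself varies with Nc) and takes S = γJ/L as the assumed form of the sweep coalescence rate.

```latex
\pi_{BGS+HH} \approx \frac{4 N_c \mu}{1/B + 2 N_c S}
\;\longrightarrow\;
\begin{cases}
4 N_c \mu B, & N_c \ll \dfrac{1}{2BS} \quad \text{(slope 1 on a log-log scale)} \\[1.5ex]
\dfrac{2\mu}{S} = \dfrac{2 \mu L}{\gamma J}, & N_c \gg \dfrac{1}{2BS} \quad \text{(plateau, independent of } N_c\text{)}
\end{cases}
```

Under this reading, raising γ ten-fold lowers the plateau ten-fold and raising U lowers B, but neither removes the two regimes, so stronger selection shifts the curve rather than straightening it.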

Discussion

Nearly fifty years after Lewontin’s description of the Paradox of Variation, how evolutionary, life history, and ecological processes interact to constrain diversity across taxa to a narrow range remains a mystery. I revisit Lewontin’s Paradox by first characterizing the relationship between genomic estimates of pairwise diversity and approximate census population size across 172 metazoan species. Previous surveys have used allozyme-based estimates, fewer taxa, or proxies of population size. My estimates of census population sizes are rough approximations, since they use body size to predict density. An improved estimate might account for vagility (as Soulé, 1976 did), though this is harder to do systematically across many taxa. Future work might also use other ecological information, such as total biomass, or species distribution modeling to improve census size estimates (Bar-On et al., 2018; Mora et al., 2011). Still, it seems more accurate estimates would be unlikely to change the qualitative findings here, which resemble those of early surveys (Nei and Graur, 1984; Soulé, 1976).

One limitation of this study is that diversity estimates are collated from a variety of sources rather than estimated with a single bioinformatic pipeline. This leads to technical noise across diversity estimates; perhaps the relationship between π and Nc found here could be tighter with a standardized bioinformatic pipeline. In addition, there might be systematic bioinformatic sources of bias: for example, high-diversity sequences may fail to align to the reference genome and end up unaccounted for, leading to a downward bias. Alternatively, high-diversity sequences might map to the reference genome, but adjacent mismatching SNPs might be mistaken for a short insertion or deletion. While these issues might affect estimates in high-diversity species, they are unlikely to change the qualitative π–Nc relationship.

Macroevolution and Across-Taxa population genomics

Lewontin’s Paradox arises from a comparison of diversity across species, yet it has been disputed whether such comparisons require phylogenetic comparative methods. Extending previous work that has accounted for phylogeny in particular clades (Leffler et al., 2012), or using taxonomical-level averages (Romiguier et al., 2014), I show that the positive relationship between diversity and census size is significant using a mixed-effects model with a time-calibrated phylogeny. Additionally, I find a high degree of phylogenetic signal, evidence of deep shifts in the rate of evolution of genetic diversity, and that arthropods and chordates form clusters. Overall, this suggests that previous concerns about phylogenetic non-independence in comparative population genetic studies were warranted (Gillespie, 1991; Whitney and Garland, 2010). Notably, Lynch, 2011 has argued that PCMs for pairwise diversity are unnecessary, since mutation rate evolution is fast and thus free of phylogenetic inertia, sampling variance should exceed the variance due to phylogenetic shared history, and coalescence times are much shorter than divergence times. Since my findings suggest PCMs are necessary in some cases, it is worthwhile to address these points.

First, Lynch has correctly pointed out that while coalescence times are much less than divergence times and should be free of phylogenetic shared history, the factors that determine coalescence times (e.g. mutation rates and effective population size) may not be (Lynch, 2011). In other words, coalescence times are free from phylogenetic shared history were we to condition on these causal factors that could be affected by shared phylogenetic history. My estimates of phylogenetic signal in the residuals, by contrast, are not conditioned on these factors. Importantly, even "correcting for" phylogeny implicitly favors certain causal interpretations over others (Westoby et al., 1995; Uyeda et al., 2018). Future work could try to untangle what causal factors determine coalescence times across species, as well as how these factors evolve across macroevolutionary timescales. Second, it is a misconception that a fast rate of trait evolution necessarily reduces phylogenetic signal (Revell et al., 2008), and that if either or both variables in a regression are free of phylogenetic signal, PCMs are unnecessary (Revell, 2010; Uyeda et al., 2018). The evidence of high phylogenetic signal found in this study suggests PCMs are necessary when fitting the relationship between Nc and π in order to account for correlated residuals among closely-related species, and to avoid spurious results from phylogenetic pseudoreplication.

Finally, beyond just accounting for phylogenetic non-independence, macroevolution and phylogenetic comparative methods are a promising way to approach across-species population genomic questions. For example, one could imagine that diversification processes could contribute to Lewontin’s Paradox. If large-Nc species were to have a rate of speciation that is greater than the rate at which mutation and drift reach equilibrium (which is indeed slower for large Nc species), this could act to decouple diversity from census population size. That is to say, even if the rate of random demographic bottlenecks were constant across taxa, lineage-specific diversification processes could lead certain clades to be systematically further from demographic equilibrium, and thus have lower diversity than expected for their census population size.

How could selection still explain Lewontin’s Paradox?

Even assuming selection parameters estimated from Drosophila melanogaster, where the effects of linked selection are thought to be especially strong, the predicted patterns of diversity under linked selection poorly fit observed patterns of diversity across species. My results support the analysis by Coop, 2016 showing that levels of π0 estimated by Corbett-Detig et al., 2015 are not decoupled from genome-wide average π, as would occur if linked selection were to explain Lewontin’s Paradox. Additionally, my analysis goes a step further, showing that current linked selection models under a wide range of selection parameters are incapable of explaining the observed relationship between census size and diversity. This is in part because mid-Nc species have sufficiently long recombination map lengths to diminish the effects of even strong selection. Overall, while this suggests hitchhiking and background selection seem unlikely to explain patterns of diversity across taxa, there are three major potential limitations of my approach that need further evaluation.

First, I approximate the reduction in diversity using homogeneous background selection and recurrent hitchhiking models (Kaplan et al., 1989; Hudson and Kaplan, 1995; Coop and Ralph, 2012), when in reality, there is genome-wide heterogeneity in functional density, recombination rates, and the adaptive substitutions across species. Each of these factors mediates how strongly linked selection impacts diversity across the genome. Despite these model simplifications, the predicted reduction in diversity in Drosophila melanogaster is 85% (when using Ne, not Nc), which is reasonably close to the estimated 77% from the more realistic model of Elyashiv et al. that accounts for the actual position of substitutions, annotation features, and recombination rate heterogeneity (though it should be noted that these both use the same parameter estimates). Furthermore, even though my model fails to capture the heterogeneity of functional density and recombination rate in real genomes, it is still conservative, likely overestimating the effects of linked selection to see if it could be capable of decoupling diversity from census size and explain Lewontin’s Paradox. This is in part because of the strong selection parameter estimates from Drosophila melanogaster used, but also because I assume that the effective population size is equal to the census size. Even then, this decoupling only occurs in very high–census-size species, and implies that the diversity in the absence of linked selection, π0, is currently underestimated by several orders of magnitude. Moreover, the study of Corbett-Detig et al., 2015 did consider recombination rate and functional density heterogeneity in estimating the reduction due to linked selection across species, yet their predicted reductions are orders of magnitude weaker than those considered here by assuming that Ne=Nc (Figure 4—figure supplement 4B). Overall, given the effects estimated under more realistic inference models are still orders of magnitude weaker than those used in this study, current models of linked selection seem fundamentally unable to fit the diversity–census-size relationship.

Second, my model here only considers hard sweeps, and ignores the contribution of soft sweeps (e.g. from standing variation or recurrent mutations; Hermisson and Pennings, 2005; Pennings and Hermisson, 2006), partial sweeps (e.g. those that do not reach fixation), and the interaction of sweeps and spatial processes. While future work exploring these alternative types of sweeps is needed, the predicted reductions in diversity found here under the simplified sweep model are likely relatively robust to these other modes of sweeps for a few reasons. First, the shape of the diversity–recombination curve is equivalent under models of partial sweeps and hard sweeps, though these imply different rates of sweeps (Coop and Ralph, 2012). Second, in the limit where most fitness variation is due to weak soft sweeps from standing variation scattered across the genome (i.e. due to polygenic fitness variation), levels of diversity are well approximated by quantitative genetic linked selection models (Robertson, 1961; Santiago and Caballero, 1995). The reduction in diversity under these models is nearly identical to that under background selection models, in part because deleterious alleles at mutation-selection balance constitute a considerable component of fitness variation (see Appendix Section B; Charlesworth and Hughes, 2000; Charlesworth, 2015). Third, the parameters from Elyashiv et al., 2016 could reflect a mixture of types of sweeps (Elyashiv et al., 2016 p. 14 and p. 19 of their Supplementary Online Materials). Finally, I also disregarded the interaction of sweeps and spatial processes. For populations spread over wide ranges, limited dispersal slows the spread of sweeps, allowing for new beneficial alleles to arise, spread, and compete against other segregating beneficial variants (Ralph and Coop, 2015; Ralph and Coop, 2010). Though limited dispersal should act to ‘soften’ sweeps and not impact my findings for the reasons described above, future work could investigate how these processes impact diversity in ways not captured by hard sweep models.

Third, other selective processes, such as fluctuating selection or hard selective events (i.e. selection resulting in a reduction in the population size), could reduce diversity in ways not captured by the background selection and hitchhiking models. Since frequency-independent fluctuating selection reduces diversity under most conditions (Novak and Barton, 2017), this could lead seasonality and other sources of temporal heterogeneity to reduce diversity in large-Nc species with short generation times more than longer-lived species with smaller population sizes. Future work could consider the impact of fluctuating selection on diversity under simple models (Barton, 2000) if estimates of key parameters governing the rate of such fluctuations were known across taxa. Additionally, another mode of selection that could severely reduce diversity across taxa, yet remains unaccounted for in this study, is periodic hard selective events. These selective events could occur regularly in a species’ history yet be indistinguishable from demographic bottlenecks with just population genomic data.

Spatial and demographic processes

One limitation of this study is the inability to quantify the impact of spatial and demographic population genetic processes on the relationship between diversity and census population sizes across taxa. The genomic diversity estimates collated in this study unfortunately lack details about the sampling process and spatial data, which can have a profound impact on population genomic summary statistics (Battey et al., 2020). These issues could systematically bias species-wide diversity estimates; for example, if diversity estimates from a cosmopolitan species were primarily from a single region or subpopulation, diversity would be an underestimate relative to the entire population. However, biased spatial sampling alone seems incapable of explaining the π-Nc divergence in high-Nc taxa. In the extreme scenario in which only one subpopulation was sampled, FST would need to be close to one for population subdivision alone to sufficiently reduce the total population heterozygosity to explain the orders-of-magnitude shortfall between predicted and observed diversity levels. This can be seen by rearranging the expression for FST as HS=(1−FST)HT, where HS and HT are the subpopulation and total population heterozygosities; if HT=4Ncμ, then only FST ≈ 1 can reduce HS several orders of magnitude. Yet, across-taxa surveys indicate that FST is almost never this high within species (Roux et al., 2016). Future work could quantify the extent to which more realistic spatial processes contribute to Lewontin’s Paradox. For example, high-Nc taxa usually experience range expansions, with repeated founder effects and local extinction/recolonization dynamics that depress diversity (Slatkin, 1977). In particular, with the appropriate data, one could estimate the empirical relationship between dispersal distance, range size, and coalescent effective population size across taxa.
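The FST argument is simple arithmetic and can be checked directly. The sketch below rearranges HS = (1 − FST)HT to give the FST needed for a given shortfall of subpopulation relative to total heterozygosity; the function name and the shortfall values are illustrative choices, not quantities from the study.

```python
def required_fst(h_sub, h_total):
    """FST implied by HS = (1 - FST) * HT, i.e. FST = 1 - HS / HT."""
    if h_total <= 0:
        raise ValueError("total heterozygosity must be positive")
    return 1.0 - h_sub / h_total


# Shortfall of HS relative to HT, in orders of magnitude (illustrative).
for orders in (1, 2, 3, 4):
    fst = required_fst(10.0 ** -orders, 1.0)
    print(f"{orders} order(s) of magnitude shortfall -> FST = {fst:.4f}")
```

Each additional order of magnitude of shortfall pushes the required FST one more decimal place towards one.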

In this study, I have focused entirely on assessing the role of linked selection, rather than demography, in reducing diversity across taxa. In contrast to demographic models, models of linked selection have comparatively fewer parameters and more readily permit rough estimates of diversity reductions across taxa. Given that I find that models of linked selection are incapable of explaining the observed relationship between Nc and π, this supports the hypothesis that diversity levels across species are shaped primarily by past demographic fluctuations. Still, a full resolution of Lewontin’s Paradox would require understanding how the demographic processes across taxa with incredibly heterogeneous ecologies and life histories transform Nc into Ne. With population genomic data becoming available for more species, this could involve systematically inferring the demographic histories of tens of species and looking for correlations in the frequency and size of bottlenecks with Nc across species.

Measures of effective population size, Timescales, and Lewontin’s Paradox

Lewontin’s Paradox describes the extent to which the effective population sizes implied by diversity, Ñe, diverge from census population sizes. However, there are a variety of other effective population size estimators calculable from different data and summary statistics (Wang et al., 2016; Caballero, 1994; Galtier and Rousselle, 2020). These include estimators based on the site frequency spectrum, observed decay in linkage disequilibrium, or temporal estimators that use the variance in allele frequency change through time. These various estimators capture different summaries of effective population size on shorter timescales than coalescent-based estimators (see Wang, 2005 for a review), and thus could be used to tease apart processes that impact the Ne-Nc relationship in the more recent past.
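For concreteness, the diversity-implied effective size follows from inverting π = 4Neμ. The sketch below computes Ñe = π/(4μ) and its ratio to a census size; the diversity, mutation rate, and census size are hypothetical round numbers rather than values from the dataset, and the function name is invented here.

```python
def implied_ne(pi, mu):
    """Effective population size implied by pairwise diversity: pi / (4 * mu)."""
    return pi / (4.0 * mu)


# Hypothetical inputs for illustration only.
pi_example = 0.01   # pairwise diversity (1%)
mu_example = 1e-9   # mutation rate per basepair, per generation
nc_example = 1e12   # census population size

ne_tilde = implied_ne(pi_example, mu_example)
print(f"implied Ne = {ne_tilde:.3g}")
print(f"implied Ne / Nc = {ne_tilde / nc_example:.3g}")
```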

Temporal Ne estimators already play an important role in understanding another summary of the Ne-Nc relationship: the ratio Ne/Nc, which is an important quantity in conservation genetics (Frankham, 1995; Mace and Lande, 1991) and in understanding evolution in highly fecund marine species. Surveys of the short-term Ne/Nc relationship across taxa indicate mean Ne/Nc is on the order of 0.1 (Frankham, 1995; Palstra and Ruzzante, 2008; Palstra and Fraser, 2012), though the uncertainty in these estimates is high, and some species with sweepstakes reproduction systems like Pacific Oyster (Crassostrea gigas) can have Ne/Nc ≈ 10^-6 (Hedgecock, 1994). Estimates of the Ne/Nc ratio may be an important, yet underappreciated, piece of solving Lewontin’s Paradox. For example, if Ne is estimated from the allele frequency change across a single generation (i.e. Waples, 1989), Ne/Nc constrains estimates of the variance in reproductive success (Wright, 1938; Nunney, 1993; Nunney, 1996). This implies that apart from species with sweepstakes reproductive systems, the variance in reproductive success each generation (whether heritable or non-heritable) is likely insufficient to significantly contribute to constraining Ñe for most taxa. Still, further work is needed to characterize (1) how Ne/Nc varies with Nc across taxa (though see Palstra and Fraser, 2012, Figure 2), and (2) the variance of Ne/Nc over longer time spans (i.e. how periodic sweepstakes reproductive events act to constrain Ne). Overall, characterizing how Ne/Nc varies across taxa and correlates with ecology and life history traits could provide clues into the mechanisms that lead propagule size and survivorship curves to be predictive of diversity levels across taxa (Romiguier et al., 2014; Hallatschek, 2018; Barry et al., 2020).
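To illustrate how a single-generation Ne/Nc ratio constrains the variance in reproductive success, the sketch below uses the textbook relationship Ne = (4N − 2)/(Vk + 2) for a constant-size diploid population with a mean of two offspring per individual. That formula is an assumption brought in for illustration; it is not stated in the text above, and the population sizes are hypothetical.

```python
def offspring_variance_from_ne(N, Ne):
    """Variance in offspring number Vk implied by Ne = (4N - 2) / (Vk + 2).

    Assumes a constant-size diploid population with mean offspring number 2.
    """
    return (4.0 * N - 2.0) / Ne - 2.0


# Hypothetical census sizes, with Ne/Nc ratios of the magnitudes quoted above.
for N, ratio in ((1e6, 0.1), (1e9, 1e-6)):
    vk = offspring_variance_from_ne(N, N * ratio)
    print(f"N={N:.0e}  Ne/Nc={ratio:g}  implied Vk = {vk:.3g}")
```

Under this assumed formula, a ratio of 0.1 corresponds to an offspring-number variance of roughly 38, whereas a ratio near 10^-6 requires a variance in the millions, which is the sweepstakes regime.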

Finally, short-term temporal Ne estimators may play an important role in resolving Lewontin’s Paradox. These estimators, along with short-term estimates of the impact of linked selection (Buffalo and Coop, 2019; Buffalo and Coop, 2020), can inform us how much diversity is depressed by selection on shorter timescales, free from the rare strong selective events or severe bottlenecks that impact pairwise diversity. It could be that in any one generation, selection contributes more to the variance of allele frequency changes than drift, yet across-taxa patterns in diversity are better explained by processes acting sporadically on longer timescales, such as colonization, founder effects, and bottlenecks. Thus, pairwise diversity may not give us the best picture of the generation-to-generation evolutionary processes acting in a population to change allele frequencies. Furthermore, certain observed adaptations occur at a pace that is inexplicable given the small effective population sizes implied by diversity, and are only possible if short-term effective population sizes are orders of magnitude larger (Karasov et al., 2010; Barton, 2010).

Conclusions

In Building a Science of Population Biology (Lewontin et al., 2004), Lewontin laments the difficulty of uniting population genetics and population ecology into a cohesive discipline of population biology. Lewontin’s Paradox of Variation remains a major unsolved problem at the nexus of these two different disciplines: we fail to understand the processes that connect a central parameter of population ecology, census size, to a central parameter of population genetics, effective population size, across species. Given that selection seems to fall short in resolving Lewontin’s Paradox, a full resolution will require a mechanistic understanding of the ecological, life history, and macroevolutionary processes that connect Nc to Ne across taxa. While I have focused exclusively on metazoan taxa since their population densities are more readily approximated from body mass, a full resolution must also include plant species (with the added difficulties of variation in selfing rates, different dispersal strategies, pollination, etc.).

Looking at Lewontin’s Paradox through a macroecological and macroevolutionary lens begets interesting questions outside of the traditional realm of population genetics. Here, I have found that diversity and Nc have a consistent relationship without many outliers, despite the wildly disparate ecologies, life histories, and evolutionary histories of the taxa included. Furthermore, taxa with very large census sizes have surprisingly low diversity. Is this explained by macroevolutionary processes, such as different rates of speciation for large-Nc taxa? Or, are the levels of diversity we observe today an artifact of our timing relative to the last glacial maximum, or the last major extinction? Did large-Nc prehistoric animal populations living in other geological eras have higher levels of diversity than our present taxa? Or, does ecological competition occur on shorter timescales such that strong population size contractions transpire and depress diversity, even if a species is undisturbed by climatic shifts or mass extinctions? Overall, patterns of diversity across taxa are determined by many overlaid evolutionary and ecological processes occurring on vastly different timescales. Lewontin’s Paradox of Variation may persist unresolved for some time because the explanation requires synthesis and model building at the intersection of all these disciplines.

Materials and methods

Diversity and map length data

The data used in this study are collated from a variety of previously published surveys. Of the 172 taxa with diversity estimates, 14 are from Corbett-Detig et al., 2015, 96 are from Leffler et al., 2012, and 62 are from Romiguier et al., 2014. The Corbett-Detig et al. data are estimated from four-fold degenerate sites, the Romiguier et al. data are from synonymous sites, and the Leffler et al. data are estimated predominantly from silent, intronic, and non-coding sites. All types of diversity estimates from Leffler et al., 2012 were included to maximize the number of taxa in the study, since the variability of diversity across functional categories is much less than the variability of diversity across taxa. Multiple diversity estimates per taxon were averaged. The total recombination map length data were from both Stapley et al., 2017 (127 taxa) and Corbett-Detig et al., 2015 (9 taxa). Both studies used sex-averaged recombination maps estimated with cross-based approaches; in some cases errors in the original data were found, documented, and corrected. These studies also included genome size estimates used to create Figure 4—figure supplement 2 and Figure 4—figure supplement 1.

Macroecological estimates of population size

A rough approximation for total population size (census size) is $N_c = D \times R$, where D is the population density in individuals per km² and R is the range size in km². Since population density estimates are not available for many taxa included in this study, I used the macroecological abundance-body size relationship to predict population density from body size. Since body length measurements are more readily available than body mass, I collated body length data from various sources (see https://github.com/vsbuffalo/paradox_variation; copy archived at swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df, Buffalo, 2021); body lengths were averaged across sexes for sexually dimorphic species, and if only a range of lengths was available, the midpoint was used.

Then, I re-estimated the relationship between body mass and population density using the data in the appendix table of Damuth, 1987, which includes 696 taxa with body mass and population density measurements across mammals, fish, reptiles, amphibians, aquatic invertebrates, and terrestrial arthropods. Though the abundance-body size relationship can be noisy at small spatial or phylogenetic scales (Chapter 5, Gaston and Blackburn, 2008), across deeply diverged taxa such as those included in this study and Damuth, 1987, the relationship is linear and homoscedastic (see Figure 1—figure supplement 1). Using Stan (Stan Development Team, 2020), I jointly estimated the relationship between body mass and body length using the Romiguier et al., 2014 taxa, and used this relationship to predict body mass for the taxa in this study. These body masses were then used to predict population density simultaneously, using the Damuth, 1981 relationship. The code of this routine (pred_popsize_missing_centered.stan) is available in the GitHub repository (https://github.com/vsbuffalo/paradox_variation/).
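As a rough illustration of the density-prediction step, the sketch below fits an ordinary least-squares line to log10-transformed body mass and density, predicts density for a new body mass, and multiplies by range size to obtain a census size. It is a simplified stand-in and not the joint Bayesian model fit in Stan; the function names, the toy data, and the range value are hypothetical and are not taken from Damuth, 1987.

```python
import math


def fit_loglog(masses, densities):
    """Ordinary least squares of log10(density) on log10(mass).

    Returns (intercept, slope). A simplified stand-in for the joint
    Bayesian model described in the text, not a reimplementation of it.
    """
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(d) for d in densities]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_density(mass, intercept, slope):
    """Predicted density (individuals per km^2) for a given body mass."""
    return 10 ** (intercept + slope * math.log10(mass))


def census_size(density_per_km2, range_km2):
    """Nc = D * R: population density times range size."""
    return density_per_km2 * range_km2


# Hypothetical toy data (body mass in g, density in individuals per km^2),
# generated from an exact power law purely for illustration.
toy_masses = [1.0, 10.0, 100.0, 1000.0, 10000.0]
toy_densities = [1e4 * m ** -0.75 for m in toy_masses]

intercept, slope = fit_loglog(toy_masses, toy_densities)
density = predict_density(500.0, intercept, slope)  # hypothetical 500 g species
nc = census_size(density, range_km2=2.0e6)          # hypothetical range size
print(f"slope={slope:.2f}, density={density:.1f} per km^2, Nc={nc:.3g}")
```

Working on the log scale keeps the relationship linear, which is why the text emphasizes that the abundance-body size relationship is linear and homoscedastic across deeply diverged taxa.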

To estimate range, I first downloaded occurrence records from the Global Biodiversity Information Facility (Global Biodiversity Information Facility, 2020) using the rgbif R package (Chamberlain et al., 2014; Chamberlain and Boettiger, 2017). Using the occurrence locations, I inferred whether a species was marine or terrestrial, based on whether the majority of its recorded occurrences overlapped a continent, using the rnaturalearth and sf packages (South, 2017; Pebesma, 2018). For each taxon, I estimated its range by finding the minimum α-shape containing these occurrences. The α parameters were set more permissively for marine species since occurrence data for marine taxa were sparser. Then, I intersected the inferred ranges for terrestrial taxa with continental polygons, so their ranges did not overrun landmasses (and likewise with marine taxa and oceans). I inspected diagnostic plots for each taxon for quality control (all of these plots are available in the paradox_variation GitHub repository), and in some cases, I manually adjusted the α parameter or manually corrected the range based on known range maps (these changes are documented in the code data/species_ranges.r and data/species_range_fixes.r). The range of C. elegans was conservatively approximated as the area of the Western US and Western Europe based on the map in Frézal and Félix, 2015. Drosophila species ranges are from the Drosophila Speciation Patterns website (Yukilevich, 2012; Yukilevich, 2017). To further validate these range estimates, I compared them to the qualitative range descriptions of Leffler et al., 2012 (Figure 1—figure supplement 4) and compared my α-shape method to a subset of taxa with range estimates from the IUCN Red List (Chamberlain, 2020; IUCN, 2020; Figure 1—figure supplement 3). Each census population size is then estimated as the product of range and density.

Population size validation

I validated the approximate census sizes by comparing the implied biomass of these estimates to estimates of the total carbon biomass on earth by phylum (Bar-On et al., 2018). For species i with wet body mass $m_i$ and census size $N_i$, the implied biomass is $m_i N_i$. For all species in a phylum S, the total sample biomass is $b_S = \sum_{i \in S} m_i N_i$. I then compare this wet biomass to the carbon biomasses by phylum from Bar-On et al., 2018. Across animal species, the ratio of dry to wet body mass, and of carbon body mass to dry body mass, varies little. In their study, Bar-On et al. assume wet body mass has a 70% water content, and 50% of dry body mass is carbon mass, leading to a wet body mass to carbon mass factor of $(1 - 0.7) \times 0.5 = 0.15$. I use this factor to convert the total wet biomass to carbon biomass per phylum.
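The conversion can be written out as a short sketch. The 70% water content and 50% carbon fraction are the Bar-On et al. assumptions quoted above; the function names and the example species values are hypothetical.

```python
WATER_FRACTION = 0.7        # assumed water content of wet body mass
CARBON_FRACTION_DRY = 0.5   # assumed carbon fraction of dry body mass


def wet_to_carbon_factor(water=WATER_FRACTION, carbon_dry=CARBON_FRACTION_DRY):
    """Fraction of wet body mass that is carbon: (1 - water) * carbon_dry."""
    return (1.0 - water) * carbon_dry


def phylum_carbon_biomass(species):
    """Sum m_i * N_i over the species in a phylum, then convert to carbon mass.

    `species` is a list of (wet_body_mass, census_size) pairs; the result is
    in the same mass unit as wet_body_mass.
    """
    wet_total = sum(m * n for m, n in species)
    return wet_total * wet_to_carbon_factor()


# Hypothetical example: two species, wet body mass in kg.
example = [(0.02, 1.0e9), (5.0, 2.0e6)]
print(wet_to_carbon_factor())          # approximately 0.15
print(phylum_carbon_biomass(example))  # carbon biomass in kg
```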

First, I compared the relative carbon biomass in this study to the relative carbon biomass on earth per phylum. This shows that this study’s sample overrepresents chordate biomass (by a factor of ∼3) and underrepresents arthropod biomass (by a factor of 0.02) relative to the proportion of carbon biomass of these phyla on earth (see column eight of Table 1). Second, to check whether the carbon biomass per phylum in the sample was broadly consistent with the total on earth by phylum ($B_S$ for phylum S), I calculated the expected sample biomass if species were sampled randomly from the total species in a phylum, $B_S \times n_S / T_S$, where $n_S$ is the total number of species in the sample in phylum S and $T_S$ is the total number of species in phylum S on earth. The fraction of total species on earth included in the sample in this study is depicted in Figure 1—figure supplement 2.

Table 1. How the total carbon biomass estimates by phylum from Bar-On et al., 2018 compare to the implied biomass estimates from this study.

All biomass estimates are carbon biomass, and the proportions are of total biomass with respect to the study. The proportion of biomass in this study compared to the Bar-On et al. estimates (Bar-On et al., 2018) indicates chordates are overrepresented and arthropods are underrepresented in the present study; the factor by which each phylum is overrepresented is given in the eighth column. Total species by phylum estimates are from Reaka-Kudla et al., 1996; Nicol, 1969; Zhang, 2013; Chapman, 2009. The ratio column is the ratio of total biomass implied by the Nc estimates of each species in a phylum to the actual biomass of that phylum.

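| phylum | total species (T) | Bar-On et al. biomass (B) | Bar-On et al. prop. biomass | present study biomass (b) | present study prop. biomass | num. species (n) | factor overrepresented | prop. total species (f = n/T) | factor (b/(fB)) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arthropoda | $1.26 \times 10^{6}$ | 1.20 | 0.4635 | $2.80 \times 10^{-4}$ | 0.0102 | 68 | 0.02 | $5.41 \times 10^{-5}$ | 4.31 |
| Chordata | $5.41 \times 10^{4}$ | 0.87 | 0.3357 | $2.67 \times 10^{-2}$ | 0.9715 | 68 | 2.89 | $1.26 \times 10^{-3}$ | 24.40 |
| Annelida | $1.70 \times 10^{4}$ | 0.20 | 0.0772 | $1.23 \times 10^{-5}$ | 0.0004 | 3 | 0.01 | $1.76 \times 10^{-4}$ | 0.35 |
| Mollusca | $9.54 \times 10^{4}$ | 0.20 | 0.0772 | $4.56 \times 10^{-4}$ | 0.0166 | 13 | 0.21 | $1.36 \times 10^{-4}$ | 16.70 |
| Cnidaria | $1.60 \times 10^{4}$ | 0.10 | 0.0386 | $3.07 \times 10^{-5}$ | 0.0011 | 2 | 0.03 | $1.25 \times 10^{-4}$ | 2.45 |
| Nematoda | $2.50 \times 10^{4}$ | 0.02 | 0.0077 | $4.03 \times 10^{-6}$ | 0.0001 | 1 | 0.02 | $4.00 \times 10^{-5}$ | 5.03 |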

Next, I look at the ratio of sample biomass per phylum, $b_S$, to this expected biomass per phylum (Table 1). The consistency is quite close for this rough approach and the non-random sample of taxa included in this study. The carbon biomass estimates for chordates implied by the census size estimates are ∼24-fold higher than expected, but this is well within reasonable expectations given that the chordate sample includes many larger-bodied domesticated species (and is a biased sample in other ways). Similarly, the implied arthropod carbon biomass is quite close to what one would expect. Overall, these values indicate that the census size estimates here do not lead to implied biomasses per phylum that are outside the range of plausibility. For other population size consistency checks, see Appendix 3.
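The last two columns of Table 1 can be recomputed from the others, which makes the definition of the ratio explicit. The sketch below uses the rounded values printed in Table 1, so it reproduces the reported ratios only to within rounding; the dictionary layout and function names are hypothetical.

```python
# Values transcribed from Table 1: total species T, Bar-On et al. biomass B,
# sample biomass b, number of sampled species n, and the reported ratio.
TABLE1 = {
    "Arthropoda": {"T": 1.26e6, "B": 1.20, "b": 2.80e-4, "n": 68, "reported": 4.31},
    "Chordata":   {"T": 5.41e4, "B": 0.87, "b": 2.67e-2, "n": 68, "reported": 24.40},
    "Annelida":   {"T": 1.70e4, "B": 0.20, "b": 1.23e-5, "n": 3,  "reported": 0.35},
    "Mollusca":   {"T": 9.54e4, "B": 0.20, "b": 4.56e-4, "n": 13, "reported": 16.70},
    "Cnidaria":   {"T": 1.60e4, "B": 0.10, "b": 3.07e-5, "n": 2,  "reported": 2.45},
    "Nematoda":   {"T": 2.50e4, "B": 0.02, "b": 4.03e-6, "n": 1,  "reported": 5.03},
}


def sampled_fraction(n, T):
    """f = n / T: fraction of a phylum's species included in the sample."""
    return n / T


def expected_sample_biomass(B, n, T):
    """Expected sample biomass if species were sampled at random: B * n / T."""
    return B * n / T


def biomass_ratio(b, B, n, T):
    """Ratio of implied sample biomass to expected sample biomass, b / (f * B)."""
    return b / expected_sample_biomass(B, n, T)


for phylum, row in TABLE1.items():
    f = sampled_fraction(row["n"], row["T"])
    ratio = biomass_ratio(row["b"], row["B"], row["n"], row["T"])
    print(f"{phylum:<11} f={f:.2e} ratio={ratio:.2f} (reported {row['reported']:.2f})")
```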

Phylogenetic comparative methods

Of the full dataset of 172 taxa with diversity and population size estimates, a synthetic calibrated phylogeny was created for the 166 species that appear in phylogenies in the DateLife project (O’Meara et al., 2020; Sanchez-Reyes and O’Meara, 2019). This calibrated synthetic phylogeny was then subset for the analyses based on which species had complete trait data. The diversity-population size relationship was assessed with a linear phylogenetic mixed-effects model implemented in Stan (Stan Development Team, 2020), according to the methods described in de Villemereuil and Nakagawa, 2014 (see stan/phylo_mm_regression.stan in the GitHub repository). This same Stan model was used to estimate the same relationship within arthropod, chordate, and mollusc subsets of the data, though a reduced model was used for the chordate subset due to identifiability issues leading to poor MCMC convergence (Figure 3—figure supplement 1).

The relationship between recombination map length and the logarithm of population size is non-linear and heteroscedastic, and was fit using a lognormal phylogenetic mixed-effects model on the 130 species with complete data. Since social insects have longer recombination map lengths (Wilfert et al., 2007), social taxa were excluded when fitting this model. All Rhat (Vehtari et al., 2019) values were below 1.01 and the effective number of samples was over 1,000, consistent with good mixing; details about the model are available in the GitHub repository (phylo_mm_lognormal.stan). Continuous trait maps (Figure 3A, Figure 3—figure supplement 3, and Figure 3—figure supplement 2) were created using phytools (Revell, 2012). Node-height tests were implemented based on the methods in Geiger (Pennell et al., 2014; Harmon et al., 2008), and use robust regression to fit a linear relationship between phylogenetic independent contrasts and branching times.

Predicted reductions in diversity

The predicted reductions in diversity due to linked selection are approximated using selection and deleterious mutation parameters from Drosophila melanogaster, and the recombination map length estimates from Stapley et al., 2017 and Corbett-Detig et al., 2015. The mathematical details of the simplified sweep model are explained in Appendix 1. I use estimates of the number of substitutions, m, in genic regions between D. melanogaster and D. simulans from Hu et al., 2013. Following Elyashiv et al., 2016, only substitutions in UTRs and exons are included, since they found no evidence of sweeps in introns. Then, I average over annotation classes to estimate the mean proportion of substitutions that are beneficial, $\alpha_{Dmel} = 0.42$, which is consistent with the estimates of Elyashiv et al. and estimates from McDonald–Kreitman test approaches (see Eyre-Walker, 2006, Table 1). Then, I use a divergence time estimate between D. melanogaster and D. simulans of $4.2 \times 10^{6}$ years and an estimate of ten generations per year (Obbard et al., 2012), calculating that there are $\gamma_{Dmel} = \alpha m / 2T = 2.26 \times 10^{-3}$ beneficial substitutions per generation. Given the length of the Drosophila autosomes, G, this implies that the rate of beneficial substitutions per basepair, per generation is $\nu_{BP,Dmel} = \gamma_{Dmel}/G = 2.34 \times 10^{-11}$. Next, I estimate $J_{Dmel} \approx 4.5 \times 10^{-4}$ from the estimate of the genome-wide average rate of sweeps from Elyashiv et al. (Supplementary Table S6), assuming Drosophila $N_e = 10^{6}$. These Drosophila melanogaster hitchhiking parameter estimates are close to other previously published estimates (Figure 4—figure supplement 5). Finally, I use $U_{Dmel} = 1.6$, from Elyashiv et al., 2016. With these parameter estimates from D. melanogaster, the recombination map lengths across species, and Equation (1), I estimate $\pi_{BGS+HH}$ (assuming $N = N_c$) across all species. This leads to a set of predicted diversity ranges across species corresponding to $\mu = 10^{-9}$–$10^{-8}$; to visualize these, I take a convex hull of all diversity ranges and smooth this with R’s smooth.spline function.

Acknowledgements

I would like to thank Andy Kern and Peter Ralph for helpful discussions and supporting me during this work, and Graham Coop for inspiration and helpful feedback during socially distanced nature walks at Yolo Basin. I thank Jessica Stapley for kindly providing the recombination map length data, and Yaniv Brandvain, Amy Collins, Doc Edge, Tyler Kent, Chuck Langley, Matt Osmond, Sally Otto, Molly Przeworski, Jeff Ross-Ibarra, Aaron Stern, Anastasia Teterina, Michael Turelli, Margot Wood, and my Kern-Ralph labmates for helpful discussions. Sarah Friedman, Katherine Corn, and Josef Uyeda provided very useful advice about phylogenetic comparative methods; yet I take full responsibility for any shortcomings of my analysis. Finally, I am indebted to Guy Sella, Matt Pennell, and two other anonymous reviewers for helpful feedback. I would like to also thank UO librarian Dean Walton for helping me track down some rather difficult to find older papers. This work was supported by an NIH Grant (1R01GM117241) awarded to Andrew Kern.

Appendix 1

Simplified sweep effects model

I use a simplified model of the effects of recurrent hitchhiking and background selection (BGS) occurring uniformly along a genome. Expected diversity is given by

$$E(\pi) = \frac{\theta}{\theta + 1/B + 2NS} \tag{2}$$
$$\approx \frac{\theta}{1/B + 2NS} \tag{3}$$

(Equation 1 of Elyashiv et al., 2016, Equation 4 of Kim and Stephan, 2000, and Equation 20 of Coop and Ralph, 2012). The BGS component is given by Hudson and Kaplan, 1995,

$$B(U, L) = N_e \exp\left(-\frac{U}{L}\right) \tag{4}$$

and the hitchhiking component is

$$S = \frac{\nu_{BP}}{r_{BP}} J \tag{5}$$

(Coop and Ralph, 2012, Equation 20) where $\nu_{BP}$ and $r_{BP}$ are the rates of substitution and recombination per basepair, respectively, and J is the probability that two lineages coalesce down to one, given sweeps occur uniformly along the genome. Under this homogeneous sweep model, J is

$$J = \int_0^L q_f(r)^2 \, dr \tag{6}$$

where qf(r) is the approximate probability that a lineage is trapped by a sweep to frequency f when it is r recombination fraction away from this sweep (Coop and Ralph, 2012; Equation 15).

Since I use Drosophila melanogaster parameter estimates from Elyashiv et al., 2016, I now reconcile their model’s S term with the simple model above. They estimate S in Drosophila melanogaster using a composite likelihood model that considers hitchhiking and background selection simultaneously, using substitutions and stratifying by annotation. For a neutral position at site x, the coalescence rate due to sweeps is given by Elyashiv et al.’s Equation 3,

$$S(x) = \frac{1}{T} \sum_{i_S} \alpha(i_S) \sum_{y \in a(i_S)} \int \exp\left(-r(x,y)\,\tau(s,N)\right) g(s \mid i_S) \, ds \tag{7}$$

where T is the length of the lineage (in generations) on which substitutions accrue, $i_S = 1, \ldots, I_S$ is the annotation class (e.g. exons, introns, UTRs), α(iS) is the fraction of substitutions in annotation class iS that are beneficial, a(iS) is the set of all substitutions in annotation class iS, τ(s,N) is the fixation time of a site with additive effect s, and g(s|iS) is the distribution of selection coefficients for annotation class iS.

Note that we can recover the model of Coop and Ralph, 2012 from this expression. Suppose there is only one annotation class, a fraction α of substitutions is beneficial, and there is a single selection coefficient $\bar{s}$ (i.e. $g(s) = \delta_0(s - \bar{s})$); then

$$S(x) = \frac{\alpha}{T} \sum_{y \in a} \exp\left(-r(x,y)\,\tau(\bar{s},N)\right). \tag{8}$$

Let the number of substitutions be $m := |a|$, and imagine their positions are uniformly distributed on a segment of length G basepairs with the focal site in the middle at position $x = 0$. Then, each substitution y is a random distance $l_y \sim U(-G/2, G/2)$ away from the focal site. Assuming the recombination rate is a constant $r_{BP}$ per basepair, and approximating the sum with an integral, we have,

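$$S = \frac{\alpha}{T} \sum_{i=1}^{m} E_{l_i}\left(\exp\left(-r_{BP}\, l_i\, \tau(\bar{s},N)\right)\right) \tag{9}$$
$$= \frac{\alpha}{TG} \sum_{i=1}^{m} \int_0^G \exp\left(-r_{BP}\, \ell\, \tau(\bar{s},N)\right) d\ell \tag{10}$$
$$= \frac{\alpha m}{TG} \int_0^G \exp\left(-r_{BP}\, \ell\, \tau(\bar{s},N)\right) d\ell \tag{11}$$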

Using u-substitution with $r = r_{BP}\,\ell$, this simplifies to

$$S = \frac{\alpha m}{T G r_{BP}} \int_0^L \exp\left(-r\,\tau(\bar{s},N)\right) dr \tag{12}$$

where L=GrBP.

To simplify this notation, note that the rate of adaptive substitutions per basepair per generation is $\nu_{BP} = \alpha m / (GT)$, so

$$S = \frac{\nu_{BP}}{r_{BP}} \int_0^L \exp\left(-r\,\tau(\bar{s},N)\right) dr \tag{13}$$

This is analogous to the second term of Coop and Ralph, 2012, Equation 17, with k=i=2 and x=1 (e.g. conditioning on a sweep to fixation). Note that there appears to be a factor of two error in Elyashiv et al., 2016 compared to Coop and Ralph, 2012; here I include the factor of two. Then,

$$S = \frac{\nu_{BP}}{r_{BP}} \underbrace{\int_0^L \exp\left(-2r\,\tau(\bar{s},N)\right) dr}_{J} \tag{14}$$

where the integral is equal to J ($J_{2,2}$ of Equation 15 in Coop and Ralph, 2012), since under a simple model $q_f(r) = f \exp(-2r\,\tau(s,N))$ and, if we condition on fixation, $f = 1$. This expression is useful to generalize across species, since we know N and L. Additionally, we have estimates of α and m/T in Drosophila and other species. In Elyashiv et al., they consider the number of substitutions per generation in genic regions only; it should be noted that the number of coding basepairs varies little across species. For convenience, I define $\gamma = \alpha m / T$ as the number of adaptive substitutions per generation per entire genome, such that $S(\gamma, J, L) = \frac{\gamma}{L} J$, as used in the main text. Using the estimates of $m \approx 4.5 \times 10^{5}$, $\alpha \approx 0.42$, and $T \approx 8.4 \times 10^{7}$ from the Supplementary Material of Elyashiv et al., I arrive at $\gamma \approx 0.00226$ adaptive substitutions per generation, per genome. For a 100 megabase genome, this translates to $\nu_{BP} \approx 2.34 \times 10^{-11}$, which is close to previous estimates (Figure 4—figure supplement 5A). For J, I use an empirical estimate calculated from the genome-wide average of the rate of coalescent events due to sweeps, from Supplementary Table S6 of Elyashiv et al. ($r_s = 2NS \approx 0.92$; see Figure 4—figure supplement 5B). This implies $J \approx 4.46 \times 10^{-4}$. Alternatively, I have tried using the estimated distribution of selection coefficients from Elyashiv et al., but this led to a weaker estimate of J, since the adaptive substitutions considered tend to cluster around genic regions.
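The arithmetic behind these parameter values can be checked with a short sketch. It uses only the rounded inputs quoted in this appendix, so it recovers the reported γ and νBP approximately rather than exactly (about 0.00225 and 2.25 × 10^-11 with these inputs); the function names are hypothetical.

```python
# Rounded D. melanogaster inputs quoted in the text (from Elyashiv et al.).
ALPHA = 0.42      # fraction of substitutions that are beneficial
M_SUBS = 4.5e5    # number of substitutions, m
T_GEN = 8.4e7     # generations over which substitutions accrued, T
G_BP = 100e6      # genome length in basepairs (the 100 megabase example)
N_POP = 1e6       # assumed Drosophila population size
TWO_N_S = 0.92    # genome-wide 2NS from Supplementary Table S6


def adaptive_subs_per_generation(alpha, m, T):
    """gamma = alpha * m / T: adaptive substitutions per generation, per genome."""
    return alpha * m / T


def adaptive_subs_per_bp(gamma, G):
    """nu_BP = gamma / G: adaptive substitutions per basepair, per generation."""
    return gamma / G


def sweep_coalescence_rate(two_n_s, N):
    """S recovered from the scaled rate 2NS."""
    return two_n_s / (2.0 * N)


gamma = adaptive_subs_per_generation(ALPHA, M_SUBS, T_GEN)
nu_bp = adaptive_subs_per_bp(gamma, G_BP)
S = sweep_coalescence_rate(TWO_N_S, N_POP)
print(f"gamma ~ {gamma:.3g}, nu_BP ~ {nu_bp:.3g}, S ~ {S:.3g}")
```

Given a recombination map length L, the relation $S = \frac{\gamma}{L} J$ can then be rearranged for J; L is not restated in this appendix, so that step is left out of the sketch.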

Appendix 2

Background selection and polygenic fitness models

Throughout the main text, I use recurrent hitchhiking and background selection models to estimate the reduction in diversity due to linked selection. Another class of linked selection models, which I refer to as quantitative genetic linked selection models (QGLS; Robertson, 1961; Santiago and Caballero, 1995), can also depress genome-wide diversity. Furthermore, these models may depress diversity at neutral sites unlinked to the regions containing fitness variation. While I did not explicitly incorporate these models into my estimates of the diversity reductions, their effect is implicit in background selection models because they are analytically nearly identical. Here, I briefly sketch out the connection between BGS and QGLS models.

Under the Santiago and Caballero, 1998 model, the effective population size is $N_e^{SC98} = N \exp\left(-C^2 / ((1-Z)L)\right)$, where $C^2$ is the standardized heritable fitness variation, $1-Z$ is the decay of genetic variance through time, and L is the recombination map length. This model can accommodate a variety of modes of selection such as selection on an infinitesimal trait (Santiago and Caballero, 1995, p. 1016), and the flux of either weakly advantageous or deleterious alleles (Santiago and Caballero, 1998, p. 2109). If the source of fitness variation is entirely the input of new deleterious mutations with heterozygous effect sh at rate U per diploid genome per generation, then under mutation-selection balance, the equilibrium relative variance in reproductive success is $C^2 = U sh$ (Crow and Kimura, 1970; Caballero, 2020, p. 167), and $Z = 1 - sh - 1/(2N_c)$ (Santiago and Caballero, 1998). Thus, if $1/(2N_c) \ll sh \ll 1$, then $C^2/(1-Z) \approx U$ and $N_e^{SC98} \approx N \exp(-U/L)$, which is the BGS model used in the main text and is a result of many background selection models with similar assumptions (Hudson and Kaplan, 1994, Equation 15; Hudson and Kaplan, 1995, Equation 9; Nordborg et al., 1996, Equation 4; Barton, 1995, Equation 22b). Intuitively, the similarity of these models reflects the fact that a substantial proportion of heritable fitness variation is caused by the continual flux of deleterious alleles across the genome under mutation-selection balance (Charlesworth, 2015; Charlesworth and Hughes, 2000).
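A small numerical check illustrates the limit described above. The sketch assumes the relations as read here, $C^2 = U sh$ and $1 - Z = sh + 1/(2N)$, and compares the resulting effective size with the BGS form $N \exp(-U/L)$. The value U = 1.6 is the Drosophila estimate used in the main text; the values of sh, N, and L are hypothetical, as are the function names.

```python
import math


def c2_over_one_minus_z(U, sh, N):
    """C^2 / (1 - Z), with C^2 = U * sh and Z = 1 - sh - 1/(2N)."""
    c2 = U * sh
    one_minus_z = sh + 1.0 / (2.0 * N)
    return c2 / one_minus_z


def ne_sc98(N, U, sh, L):
    """Effective size under the model: N * exp(-C^2 / ((1 - Z) * L))."""
    return N * math.exp(-c2_over_one_minus_z(U, sh, N) / L)


def ne_bgs(N, U, L):
    """Background selection approximation: N * exp(-U / L)."""
    return N * math.exp(-U / L)


U = 1.6    # deleterious mutation rate per diploid genome (from the text)
sh = 0.01  # hypothetical heterozygous effect, so 1/(2N) << sh << 1
N = 1e6    # hypothetical population size
L = 2.0    # hypothetical map length in Morgans

print(c2_over_one_minus_z(U, sh, N))        # close to U when 1/(2N) << sh
print(ne_sc98(N, U, sh, L) / ne_bgs(N, U, L))  # close to 1 in this regime

# When sh is not much larger than 1/(2N), the approximation breaks down.
print(c2_over_one_minus_z(U, 1e-7, N))
```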

Appendix 3

Additional population size validation

In addition to the biomass-based validation described in the main text, I also conducted a few other consistency checks. First, note that the body-mass-based estimates of density for Drosophila are similar to previously used estimates in surveys of census size and diversity. Nei and Graur, 1984 suggested a maximum of 5 Drosophila per m², including regions of the range that are not inhabitable. Across Drosophila, the body-mass-based estimates suggest $10^{6.7}$–$10^{7.6}$ individuals per km², or 4.5–36.3 individuals per m², which are consistent with this previous estimate. Nei and Graur’s estimates of Drosophila pseudoobscura’s census size are four orders of magnitude smaller than mine, but their approach uses a speculated ratio of population sizes of different Drosophila species rather than range sizes (Nei and Graur, 1984, p. 81).

As another consistency check, I looked at the rank order of mammals by biomass. Whale species have the first and third highest biomass with 11.4 and 3.9 megatons of carbon biomass (for Balaenoptera bonaerensis and Eschrichtius robustus, respectively). While this seems high, a recent study shows that across whale species, pre-whaling carbon biomass was at the tens of megatons level (Pershing et al., 2010, Table 1 and Figure 1). Given that my census size estimates represent populations at a macroecological equilibrium, they would not reflect reduced density due to whaling or other anthropogenic causes. Humans had the second largest biomass, followed by wolf species (Canis lupus and C. latrans); as with whales, the population sizes for wolf species represent pre-anthropogenic densities and are overestimates compared to current population sizes, as expected.

Finally, there are other estimates of approximate population sizes for some species that I compared my estimates to. The United Nations’ FAOSTAT database estimates the total number of horses (Equus caballus) on earth as ∼60 million; the estimate in this study is close to 40 million. For other domesticated species like chicken (Gallus gallus), estimates range from 25 million to 19.6 billion (FAOSTAT statistics database, 2021; Robinson et al., 2014); the present study’s estimate lies in the middle at ∼175 million. Again, this is a known limitation of this method, as the range is estimated from occurrence data and does not consider species’ niches. The present study’s estimate of the number of king penguins (Aptenodytes patagonicus) is about 3 million; the population size was recently estimated as 2.23 million pairs (Shirihai, 2008).

Appendix 4

Diversity and IUCN Red List Status

I also investigated the relationship between species’ IUCN Red List categories (an ordinal scale of how threatened a species is) and both diversity and population size, finding that species categorized as more threatened have both smaller population sizes and reduced diversity compared to non-threatened species (Appendix 4—figure 1), consistent with past work (Spielman et al., 2004). A linear model of diversity regressed on population size has lower AIC when the IUCN Red List categories are included, and the estimated effects of IUCN status on diversity are all negative, though not all are significant, in part because some categories have three or fewer species (Appendix 4—table 1).

Appendix 4—figure 1. A version of Figure 2 with points colored by their IUCN Red List conservation status.

Margin boxplots show the diversity and population size ranges (thin lines) and interquartile ranges (thick lines) for each category. NA/DD indicates no IUCN Red List entry, or Red List status Data Deficient; LC is Least Concern, NT is Near Threatened, VU is Vulnerable, EN is Endangered, and CR is Critically Endangered.

Appendix 4—table 1. The regression estimates of the full IUCN Red List population size model for diversity, $\log_{10}(\pi) = \beta_0 + \beta_{LC} LC + \beta_{NT} NT + \beta_{VU} VU + \beta_{EN} EN + \beta_{CR} CR + \beta_{N_c} \log_{10}(N_c)$; df = 165.

Using AIC to compare this full model to a reduced model of $\log_{10}(\pi) = \beta_0 + \beta_{N_c} \log_{10}(N_c)$: $AIC_{full} = 204.9$, $AIC_{reduced} = 216.4$.

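| Parameter | Mean | 2.5% | 97.5% |
| --- | --- | --- | --- |
| $\beta_0$ | −2.80 | −3.20 | −2.50 |
| $\beta_{LC}$ | −0.39 | −0.57 | −0.21 |
| $\beta_{NT}$ | −0.22 | −0.83 | 0.39 |
| $\beta_{VU}$ | −0.34 | −0.84 | 0.16 |
| $\beta_{EN}$ | −0.40 | −0.73 | −0.07 |
| $\beta_{CR}$ | −0.03 | −0.65 | 0.59 |
| $\beta_{N_c}$ | 0.08 | 0.05 | 0.11 |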
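One common way to read the AIC comparison is through the AIC difference and the corresponding relative likelihood, exp(−ΔAIC/2). The sketch below applies this standard convention to the two AIC values reported above; the relative likelihood is not a quantity reported in the text, and the function names are hypothetical.

```python
import math

AIC_FULL = 204.9      # model including IUCN Red List categories
AIC_REDUCED = 216.4   # model with population size only


def delta_aic(aic_candidate, aic_best):
    """AIC difference between a candidate model and the best (lowest-AIC) model."""
    return aic_candidate - aic_best


def relative_likelihood(delta):
    """Relative likelihood of the candidate versus the best model: exp(-delta / 2)."""
    return math.exp(-delta / 2.0)


delta = delta_aic(AIC_REDUCED, AIC_FULL)
print(f"delta AIC = {delta:.1f}")
print(f"relative likelihood of reduced model = {relative_likelihood(delta):.4f}")
```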

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Vince Buffalo, Email: vsbuffalo@gmail.com.

Guy Sella, Columbia University, United States.

Detlef Weigel, Max Planck Institute for Developmental Biology, Germany.

Funding Information

This paper was supported by the following grant:

  • National Institutes of Health 1R01GM117241 to Vince Buffalo.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Additional files

Transparent reporting form

Data availability

All primary datasets collated by this study, including new census size and range estimates, are available on GitHub at http://github.com/vsbuffalo/paradox_variation (copy archived at https://archive.softwareheritage.org/swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df). An archived version of this repository is also available at Zenodo.

The following dataset was generated:

Buffalo V. 2021. vsbuffalo/paradox_variation: biorxiv v.1 with minor corrections. Zenodo.

The following previously published datasets were used:

Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM. 2017. Supplementary material from "Variation in recombination frequency and distribution across eukaryotes: patterns and processes". figshare.

References

  1. Aguade M, Miyashita N, Langley CH. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics. 1989;122:607–615. doi: 10.1093/genetics/122.3.607. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Andolfatto P. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Research. 2007;17:1755–1762. doi: 10.1101/gr.6691007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bar-On YM, Phillips R, Milo R. The biomass distribution on earth. PNAS. 2018;115:6506–6511. doi: 10.1073/pnas.1711842115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Barry P, Broquet T, Gagnaire P-A. Life tables shape genetic diversity in marine fishes. bioRxiv. 2020 doi: 10.1101/2020.12.18.423459. [DOI] [PMC free article] [PubMed]
  5. Barton NH. Linkage and the limits to natural selection. Genetics. 1995;140:821–841. doi: 10.1093/genetics/140.2.821. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Barton NH. Genetic hitchhiking. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences. 2000;355:1553–1562. doi: 10.1098/rstb.2000.0716. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Barton N. Understanding adaptation in large populations. PLOS Genetics. 2010;6:e1000987. doi: 10.1371/journal.pgen.1000987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Battey CJ, Ralph PL, Kern AD. Space is the place: effects of continuous spatial structure on analysis of population genetic data. Genetics. 2020;215:193–214. doi: 10.1534/genetics.120.303143. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Begun DJ, Aquadro CF. Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature. 1992;356:519–520. doi: 10.1038/356519a0. [DOI] [PubMed] [Google Scholar]
  10. Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, White TJ, Nielsen R, Clark AG, Bustamante CD. Assessing the evolutionary impact of amino acid mutations in the human genome. PLOS Genetics. 2008;4:e1000083. doi: 10.1371/journal.pgen.1000083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Buffalo V. Code and Data for Why do species get a thin slice of π? Software Heritage. 2021. swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df. https://archive.softwareheritage.org/swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df
  12. Buffalo V, Coop G. The linked selection signature of rapid adaptation in temporal genomic data. Genetics. 2019;213:1007–1045. doi: 10.1534/genetics.119.302581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Buffalo V, Coop G. Estimating the genome-wide contribution of selection to temporal allele frequency change. PNAS. 2020;117:20672–20680. doi: 10.1073/pnas.1919039117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Burt A, Bell G. Mammalian chiasma frequencies as a test of two theories of recombination. Nature. 1987;326:803–805. doi: 10.1038/326803a0. [DOI] [PubMed] [Google Scholar]
  15. Caballero A. Developments in the prediction of effective population size. Heredity. 1994;73 (Pt 6):657–679. doi: 10.1038/hdy.1994.174. [DOI] [PubMed] [Google Scholar]
  16. Caballero A. Quantitative Genetics. Cambridge University Press; 2020. [Google Scholar]
  17. Cai JJ, Macpherson JM, Sella G, Petrov DA. Pervasive hitchhiking at coding and regulatory sites in humans. PLOS Genetics. 2009;5:e1000336. doi: 10.1371/journal.pgen.1000336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A. Stan: a probabilistic programming language. Journal of Statistical Software. 2017;76:1–32. doi: 10.18637/jss.v076.i01. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Chamberlain S, Ram K, Barve V, Mcglinn D. rgbif: interface to the Global Biodiversity Information Facility API. R package version 0.7.7. 2014.
  20. Chamberlain S. rredlist: ‘IUCN’ Red List client. 2020.
  21. Chamberlain S, Boettiger C. R, Python, and Ruby clients for GBIF species occurrence data. PeerJ Preprints. 2017 doi: 10.7287/peerj.preprints.3304v1. [DOI]
  22. Chapman AD. Numbers of Living Species in Australia and the World. Department of the Environment, Water, Heritage and the Arts Canberra; 2009. [Google Scholar]
  23. Charlesworth B. Sexual Selection: Testing the Alternatives. Wiley; 1987. [Google Scholar]
  24. Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134:1289–1303. doi: 10.1093/genetics/134.4.1289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Charlesworth B. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genetical Research. 1996;68:131–149. doi: 10.1017/S0016672300034029. [DOI] [PubMed] [Google Scholar]
  26. Charlesworth B. Causes of natural variation in fitness: evidence from studies of Drosophila populations. PNAS. 2015;112:1662–1669. doi: 10.1073/pnas.1423275112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Charlesworth B, Hughes KA. In: Evolutionary Genetics: From Molecules to Morphology. Singh RS, Krimbas C, editors. Vol. 1. Cambridge: Cambridge University Press; 2000. The maintenance of genetic variation in life-history traits; pp. 369–392. [Google Scholar]
  28. Chen J, Glémin S, Lascoux M. Genetic diversity and the efficacy of purifying selection across plant and animal species. Molecular Biology and Evolution. 2017;34:1417–1428. doi: 10.1093/molbev/msx088. [DOI] [PubMed] [Google Scholar]
  29. Coop G. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv. 2016 doi: 10.1101/042598. [DOI]
  30. Coop G, Ralph P. Patterns of neutral diversity under general models of selective sweeps. Genetics. 2012;192:205–224. doi: 10.1534/genetics.112.141861. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Corbett-Detig RB, Hartl DL, Sackton TB. Natural selection constrains neutral diversity across a wide range of species. PLOS Biology. 2015;13:e1002112. doi: 10.1371/journal.pbio.1002112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Crow JF, Kimura M. An Introduction to Population Genetics Theory. New York, Evanston and London: Harper and Row Publishers; 1970. [Google Scholar]
  33. Cutter AD, Payseur BA. Selection at linked sites in the partial selfer Caenorhabditis elegans. Molecular Biology and Evolution. 2003;20:665–673. doi: 10.1093/molbev/msg072. [DOI] [PubMed] [Google Scholar]
  34. Damuth J. Population density and body size in mammals. Nature. 1981;290:699–700. doi: 10.1038/290699a0. [DOI] [Google Scholar]
  35. Damuth J. Interspecific allometry of population density in mammals and other animals: the independence of body mass and population energy-use. Biological Journal of the Linnean Society. 1987;31:193–246. doi: 10.1111/j.1095-8312.1987.tb01990.x. [DOI] [Google Scholar]
  36. de Villemereuil P, Nakagawa S. In: Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology: Concepts and Practice. Garamszegi LZ, editor. Berlin, Heidelberg: Springer; 2014. General quantitative genetic methods for comparative biology; pp. 287–303. [DOI] [Google Scholar]
  37. Eldon B, Wakeley J. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 2006;172:2621–2633. doi: 10.1534/genetics.105.052175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Elyashiv E, Sattath S, Hu TT, Strutsovsky A, McVicker G, Andolfatto P, Coop G, Sella G. A genomic map of the effects of linked selection in Drosophila. PLOS Genetics. 2016;12:e1006130. doi: 10.1371/journal.pgen.1006130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Eyre-Walker A. The genomic rate of adaptive evolution. Trends in Ecology & Evolution. 2006;21:569–575. doi: 10.1016/j.tree.2006.06.015. [DOI] [PubMed] [Google Scholar]
  40. FAOSTAT statistics database. UN Food and Agriculture Organisation, Rome. 2021. [May 17, 2021]. http://www.fao.org/faostat/en/
  41. Felsenstein J. Phylogenies and the comparative method. The American Naturalist. 1985;125:1–15. doi: 10.1086/284325. [DOI] [PubMed] [Google Scholar]
  42. Fisher RA, Ford EB. The spread of a gene in natural conditions in a colony of the moth Panaxia dominula L. Heredity. 1947;1:143–174. doi: 10.1038/hdy.1947.11. [DOI] [Google Scholar]
  43. Frankham R. Effective population size/adult population size ratios in wildlife: a review. Genetical Research. 1995;66:95–107. doi: 10.1017/S0016672300034455. [DOI] [PubMed] [Google Scholar]
  44. Frankham R. Relationship of genetic variation to population size in wildlife. Conservation Biology. 1996;10:1500–1508. doi: 10.1046/j.1523-1739.1996.10061500.x. [DOI] [Google Scholar]
  45. Freckleton RP, Harvey PH, Pagel M. Phylogenetic analysis and comparative data: a test and review of evidence. The American Naturalist. 2002;160:712–726. doi: 10.1086/343873. [DOI] [PubMed] [Google Scholar]
  46. Freckleton RP, Harvey PH. Detecting non-Brownian trait evolution in adaptive radiations. PLOS Biology. 2006;4:e373. doi: 10.1371/journal.pbio.0040373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Frézal L, Félix MA. C. elegans outside the petri dish. eLife. 2015;4:e05849. doi: 10.7554/eLife.05849. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Galtier N. Adaptive protein evolution in animals and the effective population size hypothesis. PLOS Genetics. 2016;12:e1005774. doi: 10.1371/journal.pgen.1005774. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Galtier N, Rousselle M. How much does Ne vary among species? Genetics. 2020;216:303622. doi: 10.1534/genetics.120.303622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Gaston K, Blackburn T. Pattern and Process in Macroecology. John Wiley & Sons; 2008. [Google Scholar]
  51. Gillespie JH. The Causes of Molecular Evolution. Oxford: Oxford University Press; 1991. [Google Scholar]
  52. Gillespie JH. Genetic drift in an infinite population. The pseudohitchhiking model. Genetics. 2000;155:909–919. doi: 10.1093/genetics/155.2.909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Gillespie JH. Is the population size of a species relevant to its evolution? Evolution. 2001;55:2161–2169. doi: 10.1111/j.0014-3820.2001.tb00732.x. [DOI] [PubMed] [Google Scholar]
  54. Global Biodiversity Information Facility. GBIF Occurrence Download. GBIF.org. 2020. [August 27, 2020]. [DOI]
  55. Hadfield JD, Nakagawa S. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. Journal of Evolutionary Biology. 2010;23:494–508. doi: 10.1111/j.1420-9101.2009.01915.x. [DOI] [PubMed] [Google Scholar]
  56. Hallatschek O. Selection-like biases emerge in population models with recurrent jackpot events. Genetics. 2018;210:1053–1073. doi: 10.1534/genetics.118.301516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008;24:129–131. doi: 10.1093/bioinformatics/btm538. [DOI] [PubMed] [Google Scholar]
  58. Hauser L, Carvalho GR. Paradigm shifts in marine fisheries genetics: ugly hypotheses slain by beautiful facts. Fish and Fisheries. 2008;9:333–362. doi: 10.1111/j.1467-2979.2008.00299.x. [DOI] [Google Scholar]
  59. Hedgecock D. Does variance in reproductive success limit effective population sizes of marine organisms. Genetics and Evolution of Aquatic Organisms. 1994;122:122–134. [Google Scholar]
  60. Hedgecock D, Pudovkin AI. Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary. Bulletin of Marine Science. 2011;87:971–1002. doi: 10.5343/bms.2010.1051. [DOI] [Google Scholar]
  61. Hellmann I, Mang Y, Gu Z, Li P, de la Vega FM, Clark AG, Nielsen R. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals. Genome Research. 2008;18:1020–1029. doi: 10.1101/gr.074187.107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Hermisson J, Pennings PS. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005;169:2335–2352. doi: 10.1534/genetics.104.036947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G, Przeworski M, 1000 Genomes Project. Classic selective sweeps were rare in recent human evolution. Science. 2011;331:920–924. doi: 10.1126/science.1198878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Hu TT, Eisen MB, Thornton KR, Andolfatto P. A second-generation assembly of the Drosophila simulans genome provides new insights into patterns of lineage-specific divergence. Genome Research. 2013;23:89–98. doi: 10.1101/gr.141689.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Hudson RR, Kaplan NL. In: Non-Neutral Evolution: Theories and Molecular Data. Golding B, editor. Boston: Springer; 1994. Gene trees with background selection; pp. 140–153. [DOI] [Google Scholar]
  66. Hudson RR, Kaplan NL. Deleterious background selection with recombination. Genetics. 1995;141:1605–1617. doi: 10.1093/genetics/141.4.1605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. IUCN. The IUCN Red List of Threatened Species. 2020. [October 31, 2020]. https://www.iucnredlist.org
  68. Jensen JD, Thornton KR, Andolfatto P. An approximate bayesian estimator suggests strong, recurrent selective sweeps in Drosophila. PLOS Genetics. 2008;4:e1000198. doi: 10.1371/journal.pgen.1000198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Kaplan NL, Hudson RR, Langley CH. The "hitchhiking effect" revisited. Genetics. 1989;123:887–899. doi: 10.1093/genetics/123.4.887. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Karasov T, Messer PW, Petrov DA. Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLOS Genetics. 2010;6:e1000924. doi: 10.1371/journal.pgen.1000924. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Kim Y, Stephan W. Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics. 2000;155:1415–1427. doi: 10.1093/genetics/155.3.1415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Kimura M. The Neutral Theory of Molecular Evolution. Cambridge University Press; 1984. [Google Scholar]
  73. Kimura M, Crow JF. The number of alleles that can be maintained in a finite population. Genetics. 1964;49:725–738. doi: 10.1093/genetics/49.4.725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Kondrashov FA, Kondrashov AS. Measurements of spontaneous rates of mutations in the recent past and the near future. Philosophical Transactions of the Royal Society B: Biological Sciences. 2010;365:1169–1176. doi: 10.1098/rstb.2009.0286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Leffler EM, Bullaughey K, Matute DR, Meyer WK, Ségurel L, Venkat A, Andolfatto P, Przeworski M. Revisiting an old riddle: what determines genetic diversity levels within species? PLOS Biology. 2012;10:e1001388. doi: 10.1371/journal.pbio.1001388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Leroy T, Rousselle M, Tilak MK, Caizergues AE, Scornavacca C, Recuerda M, Fuchs J, Illera JC, De Swardt DH, Blanco G, Thébaud C, Milá B, Nabholz B. Island songbirds as windows into evolution in small populations. Current Biology. 2021;31:1303–1310. doi: 10.1016/j.cub.2020.12.040. [DOI] [PubMed] [Google Scholar]
  77. Lewontin RC. The Genetic Basis of Evolutionary Change. New York: Columbia University Press; 1974. [DOI] [Google Scholar]
  78. Lewontin RC, Singh RS, Uyenoyama MK. In: The Evolution of Population Biology. Singh RS, Uyenoyama MK, editors. Cambridge University Press; 2004. Building a science of population biology; pp. 7–20. [DOI] [Google Scholar]
  79. Li H, Stephan W. Inferring the demographic history and rate of adaptive substitution in Drosophila. PLOS Genetics. 2006;2:e166. doi: 10.1371/journal.pgen.0020166. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Lynch M. Methods for the analysis of comparative data in evolutionary biology. Evolution. 1991;45:1065–1080. doi: 10.1111/j.1558-5646.1991.tb04375.x. [DOI] [PubMed] [Google Scholar]
  81. Lynch M. Evolution of the mutation rate. Trends in Genetics. 2010;26:345–352. doi: 10.1016/j.tig.2010.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Lynch M. Statistical inference on the mechanisms of genome evolution. PLOS Genetics. 2011;7:e1001389. doi: 10.1371/journal.pgen.1001389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302:1401–1404. doi: 10.1126/science.1089370. [DOI] [PubMed] [Google Scholar]
  84. Mace GM, Lande R. Assessing extinction threats: toward a reevaluation of IUCN threatened species categories. Conservation Biology. 1991;5:148–157. doi: 10.1111/j.1523-1739.1991.tb00119.x. [DOI] [Google Scholar]
  85. Macpherson JM, Sella G, Davis JC, Petrov DA. Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics. 2007;177:2083–2099. doi: 10.1534/genetics.107.080226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Malécot G. Les mathématiques de l'hérédité. Paris: Masson; 1948. [Google Scholar]
  87. Maruyama T, Kimura M. Genetic variability and effective population size when local extinction and recolonization of subpopulations are frequent. PNAS. 1980;77:6710–6714. doi: 10.1073/pnas.77.11.6710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. McVicker G, Gordon D, Davis C, Green P. Widespread genomic signatures of natural selection in hominid evolution. PLOS Genetics. 2009;5:e1000471. doi: 10.1371/journal.pgen.1000471. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Mora C, Tittensor DP, Adl S, Simpson AG, Worm B. How many species are there on earth and in the ocean? PLOS Biology. 2011;9:e1001127. doi: 10.1371/journal.pbio.1001127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Mukai T. In: Population Genetics and Molecular Evolution. Ohta T, Aoki K, editors. Berlin: Springer-Verlag; 1985. Experimental verification of the neutral theory; pp. 125–145. [Google Scholar]
  91. Mukai T. Genotype-environment interaction in relation to the maintenance of genetic variability in populations of Drosophila melanogaster. Proceedings of the Second International Conference on Quantitative Genetics. 1988. [Google Scholar]
  92. Nei M, Graur D. Extent of protein polymorphism and the neutral mutation theory. Evolutionary Biology. 1984;17:73–118. [Google Scholar]
  93. Nevo E. Genetic variation in natural populations: patterns and theory. Theoretical Population Biology. 1978;13:121–177. doi: 10.1016/0040-5809(78)90039-4. [DOI] [PubMed] [Google Scholar]
  94. Nevo E, Beiles A, Ben-Shlomo R. In: Evolutionary Dynamics of Genetic Diversity. Mani GS, editor. Berlin, Heidelberg: Springer; 1984. The evolutionary significance of genetic diversity: Ecological, demographic and life history correlates; pp. 13–213. [DOI] [Google Scholar]
  95. Nicol D. The number of living species of molluscs. Systematic Zoology. 1969;18:251–254. doi: 10.2307/2412618. [DOI] [Google Scholar]
  96. Nicolaisen LE, Desai MM. Distortions in genealogies due to purifying selection. Molecular Biology and Evolution. 2012;29:3589–3600. doi: 10.1093/molbev/mss170. [DOI] [PubMed] [Google Scholar]
  97. Nordborg M, Charlesworth B, Charlesworth D. The effect of recombination on background selection. Genetical Research. 1996;67:159–174. doi: 10.1017/S0016672300033619. [DOI] [PubMed] [Google Scholar]
  98. Novak S, Barton NH. When does frequency-independent selection maintain genetic variation? Genetics. 2017;207:653–668. doi: 10.1534/genetics.117.300129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Nunney L. The influence of mating system and overlapping generations on effective population size. Evolution. 1993;47:1329–1341. doi: 10.1111/j.1558-5646.1993.tb02158.x. [DOI] [PubMed] [Google Scholar]
  100. Nunney L. The influence of variation in female fecundity on effective population size. Biological Journal of the Linnean Society. 1996;59:411–425. doi: 10.1111/j.1095-8312.1996.tb01474.x. [DOI] [Google Scholar]
  101. Nydam ML, Harrison RG. Polymorphism and divergence within the ascidian genus Ciona. Molecular Phylogenetics and Evolution. 2010;56:718–726. doi: 10.1016/j.ympev.2010.03.042. [DOI] [PubMed] [Google Scholar]
  102. Obbard DJ, Maclennan J, Kim KW, Rambaut A, O'Grady PM, Jiggins FM. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Molecular Biology and Evolution. 2012;29:3459–3473. doi: 10.1093/molbev/mss150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Ohta T. The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics. 1992;23:263–286. doi: 10.1146/annurev.es.23.110192.001403. [DOI] [Google Scholar]
  104. Ohta T, Kimura M. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genetical Research. 1973;22:201–204. doi: 10.1017/S0016672300012994. [DOI] [PubMed] [Google Scholar]
  105. O’Meara B, Sanchez-Reyes LL, Eastman J, Heath T, Wright A, Schliep K, Chamberlain S, Midford P, Harmon L, Brown J, Pennell M, Alfaro M. Datelife: Go from a List of Taxa or a Tree to a Chronogram using Open Scientific Data. Version 0.3.2. 2020. https://github.com/phylotastic/datelife
  106. Palkopoulou E, Mallick S, Skoglund P, Enk J, Rohland N, Li H, Omrak A, Vartanyan S, Poinar H, Götherström A, Reich D, Dalén L. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Current Biology. 2015;25:1395–1400. doi: 10.1016/j.cub.2015.04.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Palstra FP, Fraser DJ. Effective/census population size ratio estimation: a compendium and appraisal. Ecology and Evolution. 2012;2:2357–2365. doi: 10.1002/ece3.329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Palstra FP, Ruzzante DE. Genetic estimates of contemporary effective population size: what can they tell us about the importance of genetic stochasticity for wild population persistence? Molecular Ecology. 2008;17:3428–3447. doi: 10.1111/j.1365-294X.2008.03842.x. [DOI] [PubMed] [Google Scholar]
  109. Pebesma E. Simple features for R: standardized support for spatial vector data. The R Journal. 2018;10:439. doi: 10.32614/RJ-2018-009. [DOI] [Google Scholar]
  110. Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, Alfaro ME, Harmon LJ. Geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics. 2014;30:2216–2218. doi: 10.1093/bioinformatics/btu181. [DOI] [PubMed] [Google Scholar]
  111. Pennings PS, Hermisson J. Soft sweeps II--molecular population genetics of adaptation from recurrent mutation or migration. Molecular Biology and Evolution. 2006;23:1076–1084. doi: 10.1093/molbev/msj117. [DOI] [PubMed] [Google Scholar]
  112. Pershing AJ, Christensen LB, Record NR, Sherwood GD, Stetson PB. The impact of whaling on the ocean carbon cycle: why bigger was better. PLOS ONE. 2010;5:e12444. doi: 10.1371/journal.pone.0012444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Powell JR. In: Evolutionary Biology. Dobzhansky T, Hecht M, Steere WC, editors. Vol. 8. New York: Plenum Press; 1975. Protein variation in natural populations of animals; pp. 79–199. [Google Scholar]
  114. Ralph P, Coop G. Parallel adaptation: one or many waves of advance of an advantageous allele? Genetics. 2010;186:647–668. doi: 10.1534/genetics.110.119594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Ralph PL, Coop G. The role of standing variation in geographic convergent adaptation. The American Naturalist. 2015;186 Suppl 1:S5–S23. doi: 10.1086/682948. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Reaka-Kudla ML, Wilson DE, Wilson EO. Biodiversity II: Understanding and Protecting Our Biological Resources. Joseph Henry Press; 1996. [Google Scholar]
  117. Revell LJ, Harmon LJ, Collar DC. Phylogenetic signal, evolutionary process, and rate. Systematic Biology. 2008;57:591–601. doi: 10.1080/10635150802302427. [DOI] [PubMed] [Google Scholar]
  118. Revell LJ. Phylogenetic signal and linear regression on species data: phylogenetic regression. Methods in Ecology and Evolution. 2010;1:319–329. doi: 10.1111/j.2041-210X.2010.00044.x. [DOI] [Google Scholar]
  119. Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution. 2012;3:217–223. doi: 10.1111/j.2041-210X.2011.00169.x. [DOI] [Google Scholar]
  120. Robertson A. Inbreeding in artificial selection programmes. Genetical Research. 1961;2:189–194. doi: 10.1017/S0016672300000690. [DOI] [PubMed] [Google Scholar]
  121. Robinson TP, Wint GR, Conchedda G, Van Boeckel TP, Ercoli V, Palamara E, Cinardi G, D'Aietti L, Hay SI, Gilbert M. Mapping the global distribution of livestock. PLOS ONE. 2014;9:e96084. doi: 10.1371/journal.pone.0096084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Romiguier J, Gayral P, Ballenghien M, Bernard A, Cahais V, Chenuil A, Chiari Y, Dernat R, Duret L, Faivre N, Loire E, Lourenco JM, Nabholz B, Roux C, Tsagkogeorga G, Weber AA, Weinert LA, Belkhir K, Bierne N, Glémin S, Galtier N. Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature. 2014;515:261–263. doi: 10.1038/nature13685. [DOI] [PubMed] [Google Scholar]
  123. Roux C, Fraïsse C, Romiguier J, Anciaux Y, Galtier N, Bierne N. Shedding light on the grey zone of speciation along a continuum of genomic divergence. PLOS Biology. 2016;14:e2000234. doi: 10.1371/journal.pbio.2000234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Roze D. A simple expression for the strength of selection on recombination generated by interference among mutations. PNAS. 2021;118:e2022805118. doi: 10.1073/pnas.2022805118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Sanchez-Reyes LL, O’Meara B. Datelife: leveraging databases and analytical tools to reveal the dated tree of life. bioRxiv. 2019 doi: 10.1101/782094. [DOI] [PMC free article] [PubMed]
  126. Santiago E, Caballero A. Effective size of populations under selection. Genetics. 1995;139:1013–1030. doi: 10.1093/genetics/139.2.1013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Santiago E, Caballero A. Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics. 1998;149:2105–2117. doi: 10.1093/genetics/149.4.2105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Sella G, Petrov DA, Przeworski M, Andolfatto P. Pervasive natural selection in the Drosophila genome? PLOS Genetics. 2009;5:e1000495. doi: 10.1371/journal.pgen.1000495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Shapiro JA, Huang W, Zhang C, Hubisz MJ, Lu J, Turissini DA, Fang S, Wang HY, Hudson RR, Nielsen R, Chen Z, Wu CI. Adaptive genic evolution in the Drosophila genomes. PNAS. 2007;104:2271–2276. doi: 10.1073/pnas.0610385104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Shirihai H. The Complete Guide to Antarctic Wildlife: Birds and Marine Mammals of the Antarctic Continent and the Southern Ocean. Princeton University Press; 2008. [Google Scholar]
  131. Slatkin M. Gene flow and genetic drift in a species subject to frequent local extinctions. Theoretical Population Biology. 1977;12:253–262. doi: 10.1016/0040-5809(77)90045-4. [DOI] [PubMed] [Google Scholar]
  132. Small KS, Brudno M, Hill MM, Sidow A. Extreme genomic variation in a natural population. PNAS. 2007;104:5698–5703. doi: 10.1073/pnas.0700890104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genetical Research. 1974;23:23–35. doi: 10.1017/S0016672300014634. [DOI] [PubMed] [Google Scholar]
  134. Soulé ME. In: Molecular Evolution. Ayala FJ, editor. Sunderland, Massachusetts: Sinauer Associates; 1976. Allozyme variation, its determinants in space and time; pp. 60–77. [Google Scholar]
  135. South A. Rnaturalearth: World Map Data From Natural Earth. Natural Earth; 2017. [Google Scholar]
  136. Spielman D, Brook BW, Frankham R. Most species are not driven to extinction before genetic factors impact them. PNAS. 2004;101:15261–15264. doi: 10.1073/pnas.0403809101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Stan Development Team. Stan Modeling Language Users Guide and Reference Manual. Stan Developer; 2020. [Google Scholar]
  138. Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM. Variation in recombination frequency and distribution across eukaryotes: patterns and processes. Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372:20160455. doi: 10.1098/rstb.2016.0455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Stephan W, Wiehe THE, Lenz MW. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theoretical Population Biology. 1992;41:237–254. doi: 10.1016/0040-5809(92)90045-U. [DOI] [Google Scholar]
  140. Stephan W. An improved method for estimating the rate of fixation of favorable mutations based on DNA polymorphism data. Molecular Biology and Evolution. 1995;12:959–962. doi: 10.1093/oxfordjournals.molbev.a040274. [DOI] [PubMed] [Google Scholar]
  141. Stephan W, Langley CH. DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics. 1998;150:1585–1593. doi: 10.1093/genetics/150.4.1585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  142. Tajima F. The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics. 1996;143:1457–1465. doi: 10.1093/genetics/143.3.1457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  143. Tsagkogeorga G, Cahais V, Galtier N. The population genomics of a fast evolver: high levels of diversity, functional constraint, and molecular adaptation in the tunicate Ciona intestinalis. Genome Biology and Evolution. 2012;4:852–861. doi: 10.1093/gbe/evs054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  144. Uyeda JC, Zenil-Ferguson R, Pennell MW. Rethinking phylogenetic comparative methods. Systematic Biology. 2018;67:1091–1109. doi: 10.1093/sysbio/syy031. [DOI] [PubMed] [Google Scholar]
  145. Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C. Rank-normalization, folding, and localization: an improved R̂ for assessing convergence of MCMC. arXiv. 2019 https://arxiv.org/abs/1903.08008
  146. Wang J. Estimation of effective population sizes from data on genetic markers. Philosophical Transactions of the Royal Society B: Biological Sciences. 2005;360:1395–1409. doi: 10.1098/rstb.2005.1682. [DOI] [PMC free article] [PubMed] [Google Scholar]
  147. Wang J, Santiago E, Caballero A. Prediction and estimation of effective population size. Heredity. 2016;117:193–206. doi: 10.1038/hdy.2016.43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Waples RS. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics. 1989;121:379–391. doi: 10.1093/genetics/121.2.379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  149. Waples RS, Luikart G, Faulkner JR, Tallmon DA. Simple life-history traits explain key effective population size ratios across diverse taxa. Proceedings of the Royal Society B: Biological Sciences. 2013;280:20131339. doi: 10.1098/rspb.2013.1339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  150. Waples RS, Grewe PM, Bravington MW, Hillary R, Feutry P. Robust estimates of a high Ne/N ratio in a top marine predator, southern bluefin tuna. Science Advances. 2018;4:eaar7759. doi: 10.1126/sciadv.aar7759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  151. Weissman DB, Barton NH. Limits to the rate of adaptive substitution in sexual populations. PLOS Genetics. 2012;8:e1002740. doi: 10.1371/journal.pgen.1002740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  152. Westoby M, Leishman MR, Lord JM. On Misinterpreting the ‘Phylogenetic Correction’. The Journal of Ecology. 1995;83:531–534. doi: 10.2307/2261605. [DOI] [Google Scholar]
  153. Whitney KD, Garland T. Did genetic drift drive increases in genome complexity? PLOS Genetics. 2010;6:e1001080. doi: 10.1371/journal.pgen.1001080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  154. Wilfert L, Gadau J, Schmid-Hempel P. Variation in genomic recombination rates among animal taxa and the case of social insects. Heredity. 2007;98:189–197. doi: 10.1038/sj.hdy.6800950. [DOI] [PubMed] [Google Scholar]
  155. Wright S. Evolution in Mendelian Populations. Genetics. 1931;16:97–159. doi: 10.1093/genetics/16.2.97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  156. Wright S. Size of population and breeding structure in relation to evolution. Science. 1938;87:430–431. [Google Scholar]
  157. Wright S. On the roles of directed and random changes in gene frequency in the genetics of populations. Evolution. 1948;2:279–294. doi: 10.1111/j.1558-5646.1948.tb02746.x. [DOI] [PubMed] [Google Scholar]
  158. Yukilevich R. Asymmetrical patterns of speciation uniquely support reinforcement in Drosophila. Evolution. 2012;66:1430–1446. doi: 10.1111/j.1558-5646.2011.01534.x. [DOI] [PubMed] [Google Scholar]
  159. Yukilevich R. Drosophila speciation patterns. 2017. [May 27, 2020]. http://www.Drosophila-speciation-patterns.com
  160. Zhang Z-Q. In: Animal Biodiversity: An Outline of Higher-Level Classification and Survey of Taxonomic Richness (Addenda 2013) Zhang Z. Q, editor. Vol. 3703. Zootaxa; 2013. Animal biodiversity: An update of classification and diversity in 2013; pp. 5–11. [DOI] [PubMed] [Google Scholar]
  161. Zhao S, Zheng P, Dong S, Zhan X, Wu Q, Guo X, Hu Y, He W, Zhang S, Fan W, Zhu L, Li D, Zhang X, Chen Q, Zhang H, Zhang Z, Jin X, Zhang J, Yang H, Wang J, Wang J, Wei F. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation. Nature Genetics. 2013;45:67–71. doi: 10.1038/ng.2494. [DOI] [PubMed] [Google Scholar]

Decision letter

Editor: Guy Sella
Reviewed by: Guy Sella, Matthew Pennell

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Thank you for submitting your article "Why do species get a thin slice of π? Revisiting Lewontin's Paradox of Variation" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, including Guy Sella as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Matthew Pennell (Reviewer #4).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers appreciated the scholarship and extensive work that went into this manuscript. At the same time, they had several issues with the novel analyses. After some discussion, they suggested that if the paper is revised into more of a review, then it could be of interest to the broad readership of eLife. This would imply substantial streamlining, down-weighting and even removing some analyses, and addressing the reviewers' comments on others. Specifically:

– Find ways to validate the census size estimates (Reviewers 1 and 2).

– The reviewers had a hard time understanding the motivation for the phylogenetic analysis (e.g., Reviewers 1-3). One option is to clarify this motivation, as well as the interpretation of the results; another is to remove this analysis or parts of it. In addition, Reviewer 4, who is an expert on these analyses, had a number of concrete comments that should be addressed.

– While the reviewers found the analysis of linked selection effects intriguing, they were unclear about its interpretation. Notably, to what extent is it simply affirming the conclusions of Coop (2016), as opposed to illustrating a much more dramatic effect (e.g., Reviewers 1-3).

Reviewer #1 (Recommendations for the authors):

1) Substantial streamlining and editing would make the manuscript much clearer.

2) The abstract is way too long: not on the intended resolution.

3) If you aim for a broad readership, you may consider having the background be less of a historical review (without sacrificing scholarship). Also, no need to recap the historical review at the beginning of the Discussion.

4) You recap what you do at length several times (e.g., L 130-156), which is repetitive.

5) It would be clearer to use consistent terminology, e.g., instead of "enigma", "anomaly", "paradox" and "explanation", "resolution"…

6) The writing switches between acknowledging that several factors are plausibly at work and seeking "a solution".

7) It is claimed that "an ordinary least squares relationship on a log-log scale fits the data well" but I did not find a quantification to this effect. Namely, what proportion of the variance does it explain? Moreover, it is claimed that this relationship is homoscedastic. Can that be quantified as well? From looking at the figure it seems that the regression may explain ~1 of the ~3 orders of magnitude in diversity levels and the residuals explain ~2. It would also be helpful to say what we learn from this fit that we did not learn from staring at the plot. Does one (or more) of the potential explanations for Lewontin's paradox posit a log-log relationship?

Reviewer #2 (Recommendations for the authors):

I would love to see a revised version of this piece published.

Reviewer #3 (Recommendations for the authors):

My main suggestion is to expand a little more the arguments for why the author feels particular factors are unlikely to explain the patterns qualitatively, and to touch on some of the alternative explanations raised in the public review.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Quantifying Lewontin's Paradox Suggests Natural Selection is Unlikely to Explain the Narrow Range of Diversity Among Species" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, including Guy Sella as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Matthew Pennell (Reviewer #4).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

All of us find the topic important, the manuscript very interesting and the revised version improved. At the same time, we also felt that the potential impact of the paper is diminished by not (or only partially) addressing key issues raised in the previous round of review. All the more so, if this is to be more of a research paper than a review (although there can be some range in between).

Notably:

1) We found that the purpose and insights of the phylogenetic regression model were still not clear. While we had Matthew to answer some of our questions, most of the readers will not have such a resource.

2) We think that more precisely articulating how the linked selection analysis may have moved us forward (i.e., beyond just saying that it did not resolve the paradox entirely) would be helpful. For example, if you believe that it accounts for several orders of magnitude even if it does not account for the shape of the dependency of π on Nc for intermediate values of Nc, then that is important progress.

3) In that regard, we also think that exploring a somewhat wider range of selection parameters (along the lines suggested by reviewer 1 in the previous round) would be helpful in elucidating how far linked selection can and cannot take us.

In summary, we think that the impact of your nice work would be enhanced substantially by addressing these issues, but we leave that to your discretion and will not require another round of reviews.

Reviewer #1 (Recommendations for the authors):

The manuscript has improved substantially in both substance and presentation. I believe that it would be all the more impactful with another thorough round of editing. I attach a pdf with comments/suggestions, but am not a native speaker and am also mildly dyslexic, which is to say that it may be worth getting additional feedback from better writers/editors.

A few comments about substance:

– The interpretation and necessity of controlling for phylogenetic non-independence. Having read the revised manuscript, your replies and the comments of Reviewer 4, I remain confused about the interpretation of the analysis. For example, I don't understand whether it is about controlling for phylogenetic relationships in errors or in factors that affect the relationship between π and Nc. Moreover, to the extent that it does the latter, what are the assumptions that go into the correction?

In the discussion you say that (l. 415-16) "The evidence of high phylogenetic signal found in this study suggests PCMs are needed, in part to avoid spurious results from phylogenetic pseudo-replication." If I understand you correctly, you are suggesting that factors that modify the relationship between π and Nc, such as life history traits, are likely to be more similar in more closely related species, and consequently, the (pi, Nc) points corresponding to closely related species may be considered replication in that they reflect the same processes. If this is what you meant, I think it should be spelled out. Moreover, it would be good to do so early on, in the results, such that the reader can better understand what the PCM analysis is plausibly about.

In the Results section you say that (l. 220-223): "If the relationship between diversity and population size was free of shared phylogenetic history, λ = 0 and all the variance could be explained by evolution on the tips; this is analogous to Lynch's conjecture that coalescent times should be free of phylogenetic signal (2011)." Doesn't this contradict the interpretation discussed in the previous paragraph?

Also, I still don't understand the claim that PCM corrects for "spurious pseudo-replication". Shouldn't the determination whether it is "pseudo-replication" depend on the notion of the "true" relationship that you are trying to estimate? Stated differently, maybe some of the phylogenetic signal arises from similarities in factors that affect the pi-Nc relationship and others that just affect π (e.g., mutation rates), a given notion of the "true" relationship would suggest that you want to correct for the latter but not the former. How can PCMs do that without specifying what it is that you are correcting for?

I realize that some of these problems may reflect my ignorance about PCMs, but doubt that the general readership you are aiming for is much more knowledgeable.

– Relationship between genetic map length and Nc. You note that (l 333-5): "These findings are consistent with both the hypothesis that non-adaptive processes increase genome size in small-Ne species (Lynch and Conery, 2003) which in turn could increase map lengths…". I think the map length is largely explained (e.g., R^2=0.96) by the requirement of having one cross-over per chromosome or chromosomal arm (can't remember which). Specifically, I think that this relationship is much stronger than with genome size, and am not sure whether there is any residual effect of size after controlling for the number of chromosomes.

– Discussion about different measures of Ne. I found several points in this section deep and insightful. As you point out, there are different N_e's for different quantities, and comparisons among them inform us about different processes. I think it would be helpful to emphasize that these are different quantities rather than estimators of the same quantity and that Lewontin's paradox is specifically about the one defined by diversity levels.

A few general comments about presentation:

– You often use several different terms for the same thing, e.g., N_e and expected coalescence rate; explain, solve and resolve; genetic/recombinational map length, heterozygosity/pairwise diversity. This is confusing to readers, who wonder whether there was a reason for using the specific term. Choosing one for each term and sticking with it would be clearer.

– You sometimes use terms that seem to be private abbreviations, e.g., "strong selection parameters" and "quantifying… paradox".

– Perhaps avoid hyphenations as abbreviations, e.g., "low-Nc species" and "across-taxa relationship".

– The discussion would benefit from extensive editing.

Very nice work!

Reviewer #2 (Recommendations for the authors):

I have mixed feelings about this revision. The author did not follow the reviewers' suggestion of shifting from a research to a review article, and rather tried to reinforce his original results. One problem I'm having is that I still do not see the point of the sophisticated phylogenetic analysis that is presented. It is clear from the existing literature that π has a strong phylogenetic inertia; I guess the interesting question is what causes this inertia; whether λ is 0.4, 0.6 or 0.8 is good to know but not a major achievement, in my opinion. I have a similar comment regarding the report that genetic map tends to be shorter in large census sized species, which is cool but not totally novel, and not a very strong effect – I do not understand why social insects are excluded from the model, by the way. I like much figure 4b, which substantiates Coop's 2016 argument via empirical data analysis – but again the added value is mainly in the illustration; the argument existed already.

In my opinion this version has essentially the same strengths and weaknesses as the previous one: textbook-like figures and excellent writing, but no breakthrough as far as the newly reported results are concerned. A missed opportunity, I would say.

Reviewer #3 (Recommendations for the authors):

In reviewing the author's response letter and the revised draft, the revisions have certainly streamlined the paper and clarified the author's justification for the analyses.

However, I still have some concerns about how the work is motivated at the outset, and how the novelty of the results is explained:

In the Introduction, at the bottom of page 3, the author says that a limitation of past work is that others do not propose a mechanism by which traits act to constrain diversity within a few orders of magnitude. This seems to be a major motivation for doing a new study, but how does this study tackle this question? Did the author have an a priori reason to expect that a more explicit estimation of Nc would resolve Lewontin's paradox, and if so, why? Could the phylogenetic correction have been expected to have resolved Lewontin's paradox, and if so, why (see below)?

Correcting for the effect of Nc on map length in tandem with the linked selection model fitting does seem to fall into this category, and with this in mind, I think the author could go a long way towards clarifying the novelty of the work by being more explicit about these results. As noted by reviewer 1 in the first round of reviews, the 'glass half full' interpretation of this study for the genetic draft model is that the map length correction combined with linked selection model fitting CAN actually go a significant way towards shrinking down the range of diversity expected. While it can't go ALL the way, this is an important and novel result that is more important to clarify than to simply conclude that we are still in the exact same conclusion zone as before this work. The author has gone some way towards using linked selection to explain the variation, and this is worth clarifying, and perhaps quantifying, in the discussion.

I still find the motivation for the phylogenetic correction hard to parse in the introduction, and I would suggest front-loading some of the points the author makes in the response about the importance of phylogenetic correction even if coalescent times themselves are not constrained by phylogeny into the intro. Clearly, multiple reviewers found it difficult to understand why phylogenetic correction was needed and wasn't just eroding power, so this is important to front-load.

Relatedly, I am still not convinced by the argument that for Lynch's conjecture to be true λ must be zero (page 7). A number of possible interpretations made by the author in the results provide possible mechanisms (like phylogenetic changes in mutation rate) that would still enable coalescent times themselves to be free of a direct phylogenetic effect. The revisions and response now provide a better explanation for the importance of a phylogenetic correction in any case, but I don't think any of the analyses have definitively told us that coalescent times have a phylogenetic signal; it could simply be the phylogenetic inertia of traits associated with Ne and mutation rate.

Reviewer #4 (Recommendations for the authors):

I liked this paper before and I like it even better now. You have done a thorough and thoughtful job at responding to reviewer comments and have addressed my major critiques. This is a really excellent study.

eLife. 2021 Aug 19;10:e67509. doi: 10.7554/eLife.67509.sa2

Author response


Essential revisions:

The reviewers appreciated the scholarship and extensive work that went into this manuscript. At the same time, they had several issues with the novel analyses. After some discussion, they suggested that if the paper is revised into more of a review, then it could be of interest to the broad readership of eLife. This would imply substantial streamlining, down-weighting and even removing some analyses, and addressing the reviewers' comments on others. Specifically:

Thank you for the feedback. At present (and discussed in more detail in the replies below), the three novel analyses and findings of this study (see reply to Reviewer 2) make this outside the scope of a review, so I am submitting it as a research article.

– Find ways to validate the census size estimates (Reviewers 1 and 2).

Reviewer 1’s feedback helped me discover an error in my previous analysis. The census sizes are now a few orders of magnitude smaller for cosmopolitan arthropod species like Drosophila melanogaster. I have also conducted a consistency check by comparing the implied biomasses from my study to previously-published estimates across phyla.

– The reviewers had a hard time understanding the motivation for the phylogenetic analysis (e.g., Reviewers 1-3). One option is to clarify this motivation, as well as the interpretation of the results; another is to remove this analysis or parts of it. In addition, Reviewer 4, who is an expert on these analyses, had a number of concrete comments that should be addressed.

I have significantly reworked how this analysis is framed, in part thanks to the feedback of Reviewer 4. I have also reworked some sections of the discussion to explain why such analyses are needed.

– While the reviewers found the analysis of linked selection effects intriguing, they were unclear about its interpretation. Notably, to what extent is it simply affirming the conclusions of Coop (2016), as opposed to illustrating a much more dramatic effect (e.g., Reviewers 1-3).

I have reworked part of the discussion (lines 518-524) to discuss how the findings in this paper fit in with, and build on, the findings of Coop (2016). I discuss this in much more detail below.

Reviewer #1 (Recommendations for the authors):

1) Substantial streamlining and editing would make the manuscript much clearer.

Thank you. Following this and other feedback, I have edited some unclear sections.

2) The abstract is way too long: not on the intended resolution.

I have reduced the abstract significantly, to 200 words (the limit of eLife).

3) If you aim for a broad readership, you may consider having the background be less of a historical review (without sacrificing scholarship). Also, no need to recap the historical review at the beginning of the Discussion.

I have removed the recap at the beginning of the discussion. I have not cut down the historical context because (1) I believe such a mini review (in the spirit of Coop and Ralph, 2012) is needed, and (2) other reviewers and readers did seem to like this aspect.

4) You recap what you do at length several times (e.g., L 130-156), which is repetitive.

I am not sure I follow this comment — this is the first time I describe what I am doing in the manuscript, which seems like an important part of the introduction?

5) It would be clearer to use consistent terminology, e.g., instead of "enigma", "anomaly", "paradox" and "explanation", "resolution"…

I have removed some synonyms for clarity.

6) The writing switches between acknowledging that several factors are plausibly at work and seeking "a solution".

I agree that “explanation” or “resolution” are better terms, and have made these changes.

7) It is claimed that "an ordinary least squares relationship on a log-log scale fits the data well" but I did not find a quantification to this effect. Namely, what proportion of the variance does it explain? Moreover, it is claimed that this relationship is homoscedastic. Can that be quantified as well? From looking at the figure it seems that the regression may explain ~1 of the ~3 orders of magnitude in diversity levels and the residuals explain ~2. It would also be helpful to say what we learn from this fit that we did not learn from staring at the plot. Does one (or more) of the potential explanations for Lewontin's paradox posit a log-log relationship?

The log-log relationship is used because of heteroscedasticity on a linear-log scale, and because both axes vary over several orders of magnitude. I do not think it is too surprising that π varies over a few orders of magnitude, since Nc varies over so many (and Ne over a fraction of that). Regarding the proportion of variance, I have included an R² in the paper (the adjusted R² is about 0.25). However, it should be noted that R² has numerous statistical issues (see p. 181-182 of Cosma Shalizi’s The Truth about Linear Regression, http://www.stat.cmu.edu/~cshalizi/TALR/TALR.pdf). Likewise, I don’t think a formal test of heteroscedasticity is particularly helpful; a plot of the residuals versus x values is convincing enough (though I think it is unnecessary to include one in the supplementary materials).
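
For readers who want to see what such a fit involves, the following is a minimal sketch of an ordinary least squares fit on a log-log scale, with R² and adjusted R² computed by hand. It runs on synthetic placeholder data, not the data set from the paper; the function name `loglog_ols`, the simulated slope, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def loglog_ols(x, y):
    """OLS fit of log10(y) = a + b * log10(x).

    Returns intercept, slope, R^2, adjusted R^2, and the residuals.
    """
    lx, ly = np.log10(x), np.log10(y)
    X = np.column_stack([np.ones_like(lx), lx])      # design matrix [1, log10 x]
    coef, *_ = np.linalg.lstsq(X, ly, rcond=None)    # least-squares solution
    resid = ly - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((ly - ly.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n, p = len(ly), 1                                # p = number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef[0], coef[1], r2, adj_r2, resid

# Synthetic placeholder data (NOT the paper's data): a shallow log-log slope plus noise.
rng = np.random.default_rng(0)
log_nc = rng.uniform(3, 15, size=172)
census = 10.0 ** log_nc
diversity = 10.0 ** (-3.5 + 0.1 * log_nc + rng.normal(0.0, 0.5, size=172))

a, b, r2, adj_r2, resid = loglog_ols(census, diversity)

# Crude numeric stand-in for a residuals-versus-x plot:
# compare the residual spread in the lower and upper halves of log10(x).
order = np.argsort(census)
sd_low = resid[order[:86]].std(ddof=1)
sd_high = resid[order[86:]].std(ddof=1)
print(f"slope={b:.3f}, R2={r2:.3f}, adjusted R2={adj_r2:.3f}")
print(f"residual SD: low-x half={sd_low:.3f}, high-x half={sd_high:.3f}")
```

Similar residual spreads in the two halves are what a roughly homoscedastic fit would look like; a plot of `resid` against `log_nc` gives the same information visually.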

Reviewer #3 (Recommendations for the authors):

My main suggestion is to expand a little more the arguments for why the author feels particular factors are unlikely to explain the patterns qualitatively, and to touch on some of the alternative explanations raised in the public review.

This is useful feedback that other reviewers have voiced as well. I have expanded on how my results connect to those of Coop (2016) on lines 517-524, and have addressed some more points about the need for PCMs in the discussion (lines 440-473).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

All of us find the topic important, the manuscript very interesting and the revised version improved. At the same time, we also felt that the potential impact of the paper is diminished by not (or only partially) addressing key issues raised in the previous round of review. All the more so, if this is to be more of a research paper than a review (although there can be some range in between).

Notably:

1) We found that the purpose and insights of the phylogenetic regression model were still not clear. While we had Matthew to answer some of our questions, most of the readers will not have such a resource.

I have significantly edited this (lines 197-284) to more clearly explain the purpose of the phylogenetic mixed-effect model and better frame the statistical issues it addresses.

2) We think that more precisely articulating how the linked selection analysis may have moved us forward (i.e., beyond just saying that it did not resolve the paradox entirely) would be helpful. For example, if you believe that it accounts for several orders of magnitude even if it does not account for the shape of the dependency of π on Nc for intermediate values of Nc, then that is important progress.

The main problem I show here is that, assuming strong selection (i.e., the levels predicted under D. melanogaster parameters), theoretical linked selection models do not match the observed data if Ne = Nc. Unfortunately, this analysis cannot shed light on how much linked selection impacts diversity across species, because I use Drosophila melanogaster parameters that are suitable only for qualitatively checking whether linked selection could explain the observed pattern, not for quantifying its impact across species. Still, my findings reveal a serious qualitative mismatch between the observed and predicted relationships between diversity and population size. Furthermore, as the new analysis described below shows, this qualitative finding is not sensitive to parameter choice.

3) In that regard, we also think that exploring a somewhat wider range of selection parameters (along the lines suggested by reviewer 1 in the previous round) would be helpful in elucidating how far linked selection can and cannot take us.

I have added a new supplementary figure (p. 47) looking at how a ten-fold increase in BGS (the U parameter) and HH (the γ parameter) would impact the predicted relationship between diversity and population size. This figure also depicts the predicted effects of BGS and HH individually (subfigure A). Overall, this demonstrates that the poor fit found between observed and predicted levels of diversity is not something that can be remedied with stronger parameters. I discuss this on lines 377-386, which also address the issue above.
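
To illustrate the kind of calculation involved, here is a minimal sketch that assumes a commonly used approximation, π ≈ 4Nμ / (1/B + 2NS), in which background selection scales diversity by a factor B and sweeps add a coalescence rate S per generation. This is a simplified stand-in, not the fitted model from the paper: B and S are treated as single lumped numbers rather than functions of U, γ, and map length, and every parameter value below is hypothetical.

```python
import numpy as np

def expected_diversity(n, mu, b=1.0, s=0.0):
    """Assumed approximation: pi = 4*N*mu / (1/B + 2*N*S).

    With B = 1 and S = 0 this reduces to the neutral expectation 4*N*mu,
    which is only a good approximation to pairwise diversity when it is well below 1.
    """
    n = np.asarray(n, dtype=float)
    return 4.0 * n * mu / (1.0 / b + 2.0 * n * s)

# Hypothetical illustration values, not estimates from any species.
mu = 1e-9   # mutation rate per basepair per generation
B = 0.6     # lumped background selection reduction factor
S = 1e-7    # lumped sweep-induced coalescence rate per generation

sizes = np.logspace(3, 12, 10)                        # N from 1e3 to 1e12
neutral = expected_diversity(sizes, mu)
linked = expected_diversity(sizes, mu, B, S)
stronger = expected_diversity(sizes, mu, B, 10 * S)   # ten-fold stronger sweeps

for n, p0, p1, p2 in zip(sizes, neutral, linked, stronger):
    print(f"N={n:.0e}  neutral={p0:.3g}  linked={p1:.3g}  10x sweeps={p2:.3g}")
```

In this sketch, π levels off near 2μ/S as N grows, and multiplying S by ten lowers that plateau ten-fold without removing it.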

In summary, we think that the impact of your nice work would be enhanced substantially by addressing these issues, but we leave that to your discretion and will not require another round of reviews.

Thank you to all the reviewers for feedback.

Reviewer #1 (Recommendations for the authors):

The manuscript has improved substantially in both substance and presentation. I believe that it would be all the more impactful with another thorough round of editing. I attach a pdf with comments/suggestions, but am not a native speaker and am also mildly dyslexic, which is to say that it may be worth getting additional feedback from better writers/editors.

A few comments about substance:

– The interpretation and necessity of controlling for phylogenetic non-independence. Having read the revised manuscript, your replies and the comments of Reviewer 4, I remain confused about the interpretation of the analysis. For example, I don't understand whether it is about controlling for phylogenetic relationships in errors or in factors that affect the relationship between π and Nc. Moreover, to the extent that it does the latter, what are the assumptions that go into the correction?

I agree this was unclear in the way I presented the results; I have fixed this to make it clear I am dealing with phylogenetic correlation in the residuals.

In the discussion you say that (l. 415-16) "The evidence of high phylogenetic signal found in this study suggests PCMs are needed, in part to avoid spurious results from phylogenetic pseudo-replication." If I understand you correctly, you are suggesting that factors that modify the relationship between π and Nc, such as life history traits, are likely to be more similar in more closely related species, and consequently, the (pi, Nc) points corresponding to closely related species may be considered replication in that they reflect the same processes. If this is what you meant, I think it should be spelled out. Moreover, it would be good to do so early on, in the results, such that the reader can better understand what the PCM analysis is plausibly about.

I added a lot in the Results section on pseudo-replication (lines 197-284) to address this and more clearly explain why this correction is needed, and what it does and does not assume.

In the Results section you say that (l. 220-223): "If the relationship between diversity and population size was free of shared phylogenetic history, λ = 0 and all the variance could be explained by evolution on the tips; this is analogous to Lynch's conjecture that coalescent times should be free of phylogenetic signal (2011)." Doesn't this contradict the interpretation discussed in the previous paragraph?

I agree this was incorrect and have removed this.

Also, I still don't understand the claim that PCM corrects for "spurious pseudo-replication". Shouldn't the determination whether it is "pseudo-replication" depend on the notion of the "true" relationship that you are trying to estimate? Stated differently, maybe some of the phylogenetic signal arises from similarities in factors that affect the pi-Nc relationship and others that just affect π (e.g., mutation rates), a given notion of the "true" relationship would suggest that you want to correct for the latter but not the former. How can PCMs do that without specifying what it is that you are correcting for?

I realize that some of these problems may reflect my ignorance about PCMs, but doubt that the general readership you are aiming for is much more knowledgeable.

This gets at some deeper issues about PCMs, but I think the important point is that phylogenetic signal in either the X or Y variable itself is not a violation of the standard regression model; only phylogenetic correlation in the residuals is (Revell, 2010). Even if the true relationship is driven by, say, a mutation rate change along a lineage, sister taxa that inherit this mutation rate from their parent lineage are not independent observations of this effect in the way that an entirely different clade with its own mutation rate change would be (Maddison and FitzJohn, 2014). The phylogenetic mixed-effect model only looks for correlation in the errors and down-weights the samples accordingly, treating them as correlated observations. This is a conservative approach to dealing with the structure found in the residuals, and importantly, the findings are not qualitatively different from those of the OLS.
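
The down-weighting idea can be sketched with generalized least squares (GLS) on a toy tree. This is a simpler stand-in technique, not the phylogenetic mixed-effect model used in the paper, and the four-taxon covariance matrix, the data values, and the helper names `pagel_lambda_cov` and `gls` are all hypothetical.

```python
import numpy as np

def pagel_lambda_cov(shared, lam):
    """Multiply off-diagonal (shared-history) covariances by lam; keep the diagonal."""
    V = lam * np.asarray(shared, dtype=float)
    np.fill_diagonal(V, np.diag(shared))
    return V

def gls(X, y, V):
    """GLS estimates and their unscaled covariance (residual variance taken as 1)."""
    Vinv = np.linalg.inv(V)
    cov_beta = np.linalg.inv(X.T @ Vinv @ X)
    beta = cov_beta @ X.T @ Vinv @ y
    return beta, cov_beta

# Toy tree: two pairs of sister taxa; sisters share 90% of their history.
r = 0.9
shared = np.array([[1, r, 0, 0],
                   [r, 1, 0, 0],
                   [0, 0, 1, r],
                   [0, 0, r, 1]], dtype=float)

# Made-up log10 census sizes and log10 diversities for the four taxa.
log_nc = np.array([4.0, 5.0, 9.0, 10.0])
log_pi = np.array([-3.0, -2.9, -2.2, -2.0])
X = np.column_stack([np.ones(4), log_nc])

# lambda = 0 ignores the tree (this is OLS); lambda = 1 uses the full shared history.
beta_ols, _ = gls(X, log_pi, pagel_lambda_cov(shared, 0.0))
beta_phy, _ = gls(X, log_pi, pagel_lambda_cov(shared, 1.0))

# Intercept-only model: how much information about the mean do four taxa carry?
ones = np.ones((4, 1))
_, var_mean_indep = gls(ones, log_pi, pagel_lambda_cov(shared, 0.0))
_, var_mean_phylo = gls(ones, log_pi, pagel_lambda_cov(shared, 1.0))

print("OLS slope:", beta_ols[1], " GLS slope:", beta_phy[1])
print("unscaled variance of the mean:", var_mean_indep[0, 0], "vs", var_mean_phylo[0, 0])
```

With independent taxa the unscaled variance of the mean is 1/4; with correlated sisters it rises to (1 + r)/4, which is the sense in which closely related taxa count as fewer than four independent observations.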

– Relationship between genetic map length and Nc. You note that (l 333-5): "These findings are consistent with both the hypothesis that non-adaptive processes increase genome size in small-Ne species (Lynch and Conery, 2003) which in turn could increase map lengths…". I think the map length is largely explained (e.g., R^2=0.96) by the requirement of having one cross-over per chromosome or chromosomal arm (can't remember which). Specifically, I think that this relationship is much stronger than with genome size, and am not sure whether there is any residual effect of size after controlling for the number of chromosomes.

This is an interesting point, and I do have a figure showing the relationship between chromosome number and map length in the project’s GitHub repository. I’ve run a quick regression of map_length ~ chrom_number + genome_size and find both are significant. However, the direction of causality is unclear and this could be plagued by collider bias. Overall, I think the discussion of this is best left as is (the original sentence was added only to address a reviewer’s concerns in the last round of revisions).

– Discussion about different measures of Ne. I found several points in this section deep and insightful. As you point out, there are different N_e's for different quantities, and comparisons among them inform us about different processes. I think it would be helpful to emphasize that these are different quantities rather than estimators of the same quantity and that Lewontin's paradox is specifically about the one defined by diversity levels.

I have tried to make this clear with “These various estimators capture different summaries of effective population size on shorter timescales than coalescent-based estimators”.

A few general comments about presentation:

– You often use several different terms for the same thing, e.g., N_e and expected coalescence rate; explain, solve and resolve; genetic/recombinational map length, heterozygosity/pairwise diversity. This is confusing to readers, who wonder whether there was a reason for using the specific term. Choosing one for each term and sticking with it would be clearer.

I have made these changes (though in some cases allelic heterozygosity, not per basepair diversity was measured by these earlier allozyme studies).

– You sometimes use terms that seem to be private abbreviations, e.g., "strong selection parameters" and "quantifying… paradox".

I have made these changes.

– Perhaps avoid hyphenations as abbreviations, e.g., "low-Nc species" and "across-taxa relationship".

I have kept low-Nc, high-Nc, etc., as this saves a lot of text and makes sentences shorter.

– The discussion would benefit from extensive editing.

I have made the majority of these changes, thank you for your feedback!

Reviewer #2 (Recommendations for the authors):

I have mixed feelings about this revision. The author did not follow the reviewers' suggestion of shifting from a research to a review article, and rather tried to reinforce his original results. One problem I'm having is that I still do not see the point of the sophisticated phylogenetic analysis that is presented. It is clear from the existing literature that π has a strong phylogenetic inertia; I guess the interesting question is what causes this inertia; whether λ is 0.4, 0.6 or 0.8 is good to know but not a major achievement, in my opinion. I have a similar comment regarding the report that the genetic map tends to be shorter in species with large census sizes, which is cool but not totally novel, and not a very strong effect. (I do not understand why social insects are excluded from the model, by the way.) I very much like Figure 4b, which substantiates Coop's 2016 argument via empirical data analysis, but again the added value is mainly in the illustration; the argument existed already.

I think it is important to point out that Coop (2016) only refuted Corbett-Detig et al.’s claim that their estimated impact of linked selection is strong enough to explain Lewontin’s Paradox. Since Corbett-Detig et al.’s work uses only proxies of population size rather than census population sizes, it gives no sense of the magnitude of the effect to be explained, or of whether linked selection can explain it. This can be seen in Coop’s (2016) Figure 1, where the dashed red line is a hypothetical relationship; by quantifying the π-Nc relationship, my work gives a sense of the scale. Finally, my work goes a step further than Coop’s, showing not only that Corbett-Detig et al.’s estimates fall short of explaining Lewontin’s Paradox through linked selection, but that no estimates, not even those orders of magnitude larger, can explain it (see Figure 4-Supplement 4).

Reviewer #3 (Recommendations for the authors):

In reviewing the author's response letter and the revised draft, the revisions have certainly streamlined the paper and clarified the author's justification for the analyses.

However, I still have some concerns about how the work is motivated at the outset, and how the novelty of the results is explained:

In the Introduction, at the bottom of page 3, the author says that a limitation of past work is that others do not propose a mechanism by which traits act to constrain diversity within a few orders of magnitude. This seems to be a major motivation for doing a new study, but how does this study tackle this question? Did the author have an a priori reason to expect that a more explicit estimation of Nc would resolve Lewontin's paradox, and if so, why? Could the phylogenetic correction have been expected to resolve Lewontin's paradox, and if so, why (see below)?

See the response to reviewer #2 above. Also, if the relationship between π and Nc were no longer significant after accounting for phylogeny, and within-clade regressions were similarly non-significant, this would not solve Lewontin’s Paradox. Rather, this would suggest that the relationship is even less dependent on Nc than we expected, and perhaps some macroecological or macroevolutionary process could explain this.

Correcting for the effect of Nc on map length in tandem with the linked selection model fitting does seem to fall into this category, and with this in mind, I think the author could go a long way towards clarifying the novelty of the work by being more explicit about these results. As noted by reviewer 1 in the first round of reviews, the 'glass half full' interpretation of this study for the genetic draft model is that the map length correction combined with linked selection model fitting CAN actually go a significant way towards shrinking the range of diversity expected. While it can't go ALL the way, this is an important and novel result, and clarifying it matters more than simply concluding that we are exactly where we were before this work. The author has gone some way towards using linked selection to explain the variation, and this is worth clarifying, and perhaps quantifying, in the discussion.

I have addressed this in my response to the main comments. The issue is that these findings are qualitative: they show that linked selection models are incapable of explaining Lewontin’s Paradox even under the strong selection expected of D. melanogaster. Given that it would be erroneous to assume that linked selection in all species is as strong as it is in D. melanogaster, I cannot use these results to quantify the impact of linked selection across species.

I still find the motivation for the phylogenetic correction hard to parse in the introduction, and I would suggest front-loading some of the points the author makes in the response about the importance of phylogenetic correction even if coalescent times themselves are not constrained by phylogeny into the intro. Clearly, multiple reviewers found it difficult to understand why phylogenetic correction was needed and wasn't just eroding power, so this is important to front-load.

I have clarified this in the PCM section.

Relatedly, I am still not convinced by the argument that for Lynch's conjecture to be true λ must be zero (page 7). A number of possible interpretations made by the author in the results provide possible mechanisms (like phylogenetic changes in mutation rate) that would still enable coalescent times themselves to be free of a direct phylogenetic effect. The revisions and response now provide a better explanation for the importance of a phylogenetic correction in any case, but I don't think any of the analyses have definitively told us that coalescent times have a phylogenetic signal; it could simply be the phylogenetic inertia of traits associated with Ne and mutation rate.

I address this briefly in the Discussion (see the mention of Uyeda et al. around lines 432-438). Overall, a trait that has phylogenetic signal does not violate the regression model and is not a problem in itself; a problem arises only if the residuals have correlated errors (see Revell 2010 and Uyeda et al. 2018).
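To make the role of λ in this exchange concrete, the sketch below applies Pagel's λ transformation to a phylogenetic covariance matrix of the kind used for the error structure in phylogenetic regression: off-diagonal entries (covariances from shared branch length) are multiplied by λ while the diagonal is left unchanged. With λ = 0 the tips are treated as independent, and with λ = 1 the Brownian-motion covariance is recovered. The three-taxon matrix and the function name `pagel_lambda` are hypothetical and are not taken from the paper's analysis.

```python
def pagel_lambda(cov, lam):
    """Scale off-diagonal covariances by lam; leave the variances untouched."""
    n = len(cov)
    return [[cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical tree ((A,B),C): A and B share 0.7 units of branch length,
# and C shares none with either.
brownian_cov = [
    [1.0, 0.7, 0.0],
    [0.7, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

independent = pagel_lambda(brownian_cov, 0.0)  # no phylogenetic signal
partial = pagel_lambda(brownian_cov, 0.6)      # intermediate signal
full = pagel_lambda(brownian_cov, 1.0)         # Brownian-motion expectation
```

Intermediate values of λ, such as those the reviewer mentions, shrink the expected covariance between related taxa without removing it.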

Associated Data


    Data Citations

    1. Buffalo V. 2021. vsbuffalo/paradox_variation: biorxiv v.1 with minor corrections. Zenodo.
    2. Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM. 2017. Supplementary material from "Variation in recombination frequency and distribution across eukaryotes: patterns and processes". figshare.
    3. Global Biodiversity Information Facility. 2020. GBIF Occurrence Download (27 August 2020). GBIF.org.

    Supplementary Materials

    Figure 1—source data 1. The population size estimates for 172 metazoan taxa.
    Figure 2—source data 1. The diversity and population size dataset for 172 metazoan taxa.
    Figure 4—source data 1. The map length, population size, and linked selection estimates for 136 metazoan taxa.
    Transparent reporting form

    Data Availability Statement

    All primary datasets collated by this study, including new census size and range estimates, are available on GitHub at http://github.com/vsbuffalo/paradox_variation (copy archived at https://archive.softwareheritage.org/swh:1:rev:8fa6b5834f6536319b1e5cd9722ca02d317183df). An archived version of this repository is also available at Zenodo.

    The following dataset was generated:

    Buffalo V. 2021. vsbuffalo/paradox_variation: biorxiv v.1 with minor corrections. Zenodo.

    The following previously published datasets were used:

    Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM. 2017. Supplementary material from "Variation in recombination frequency and distribution across eukaryotes: patterns and processes". figshare.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
