Significance
Human genomes carry hundreds of mutations that are predicted to be deleterious in some environments, potentially affecting the health or fitness of an individual. We characterize the distribution of deleterious mutations among diverse human populations, modeled under different selection coefficients and dominance parameters. Using a new dataset of diverse human genomes from seven different populations, we use spatially explicit simulations to reveal that classes of deleterious alleles have very different patterns across populations, reflecting the interaction between genetic drift and purifying selection. We show that there is a strong signal of purifying selection at conserved genomic positions within African populations, but most predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa.
Keywords: mutation, founder effect, range expansion, expansion load, purifying selection
Abstract
The Out-of-Africa (OOA) dispersal ∼50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.
It has long been recognized that a human genome may carry many strongly deleterious mutations; Morton et al. (1) estimated that each human carries on average four or five mutations that would have a “conspicuous effect on fitness” if expressed in a homozygous state. Empirically estimating the deleterious mutation burden is now feasible through next-generation sequencing (NGS) technology, which can assay the complete breadth of variants in a human genome. For example, recent sequencing of over 6,000 exomes revealed that nearly half of all surveyed individuals carried a likely pathogenic allele in a known Mendelian disease gene (i.e., from a disease panel used for newborn screening) (2). Although there is some variation across individuals in the number of deleterious alleles per genome, we still do not know whether there are significant differences in deleterious variation among populations. Human populations vary dramatically in their levels of neutral genetic diversity, which suggests variation in the effective population size, Ne. Theory suggests that the efficacy of natural selection is reduced in populations with lower Ne because they experience greater genetic drift (3, 4). In an idealized population of constant size, the efficacy of purifying selection depends on the relationship between Ne and the selection coefficient s against deleterious mutations. If 4Nes << 1, deleterious alleles evolve as if they were neutral and can, thus, reach appreciable frequencies. This theory raises the question of whether human populations carry differential burdens of deleterious alleles due to differences in demographic history.
Several recent papers have tested for differences in the burden of deleterious alleles among populations; these papers have focused on primarily comparing populations of western European and western African ancestry. Despite similar genomic datasets, these papers have reached a variety of contradictory conclusions (4–9). Initially, Lohmueller et al. (10) found that a panel of European Americans carried proportionally more derived, deleterious alleles than a panel of African Americans, potentially as the result of the Out-of-Africa (OOA) bottleneck. More recently, analyses using NGS exome datasets from samples of analogous continental ancestry found small or no differences in the average number of deleterious alleles per genome between African Americans and European Americans—depending on which prediction algorithm was used (11–13). Simulations by Fu et al. (11) found strong bottlenecks with recovery could recapitulate patterns of differences in the number of deleterious alleles between African and non-African populations, supporting Lohmueller et al. (10), but in contrast to work by Simons et al. (12).
It is important to note two facts about these contradictory observations. First, these papers tend to use different statistics, which differ in power to detect changes across populations, as well as the impact of recent demographic history (6, 11). Lohmueller et al. (10) compared the relative number of nonsynonymous to synonymous (or “probably damaging” to “benign”) SNPs per population in a sample of n chromosomes, whereas Simons et al. (12) examined the special case of n = 2 chromosomes, namely, the average number of predicted deleterious alleles per genome (i.e., heterozygous + 2 * homozygous derived variants per genome). One way to think about these statistics is that the total number of variants, S, gives equal weight, w = 1, to an SNP regardless of its frequency, p. The average number of deleterious variants statistic gives weights proportional to the expected heterozygous and homozygous frequencies or w = 2p(1 − p) + p² = 2p − p². The average number of deleterious alleles per genome is fairly insensitive to differences in demographic history because heterozygosity is biased toward common variants. In contrast, the proportion of deleterious alleles has greater power to detect the impact of recent demographic history for large n across the populations because it is sensitive to rare variants that tend to be more numerous, younger, and enriched for functionally important mutations (14–16). Second, empirical comparisons between two populations have focused primarily on an additive model for deleterious mutations, even though there is evidence for pathogenic mutations exhibiting a recessive or dominant effect (17, 18), and possibly an inverse relationship between the strength of selection s and the dominance parameter h (19).
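The contrast between the two weighting schemes can be illustrated numerically. The following is a minimal sketch rather than code from any of the cited studies; the function names and the example frequencies are illustrative assumptions, and the per-genome weight is taken exactly as written in the text.

```python
def weight_total_variants(p: float) -> float:
    """Weight of a segregating SNP in the total count S: 1 regardless of frequency p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1] for a variant observed in the sample")
    return 1.0


def weight_per_genome(p: float) -> float:
    """Per-genome weight as given in the text: 2p(1 - p) + p^2 = 2p - p^2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 2.0 * p * (1.0 - p) + p * p


if __name__ == "__main__":
    # Illustrative (hypothetical) derived allele frequencies, rare to common.
    for p in (0.01, 0.05, 0.25, 0.50, 0.90):
        print(f"p={p:.2f}  w_S={weight_total_variants(p):.1f}  "
              f"w_genome={weight_per_genome(p):.4f}")
```

Under this weighting a variant at p = 0.01 contributes about 0.02 to the per-genome statistic but a full count of 1 to S, which is why the per-genome statistic is dominated by common variants.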
There remains substantial conceptual and empirical uncertainty surrounding the processes that shape the distribution of deleterious variation across human populations. We aim here to clarify three aspects underlying this controversy: (i) Are there empirical differences in the total number of deleterious alleles among multiple human populations? (ii) Which model of dominance is appropriate for deleterious alleles (i.e., should zygosity be considered in load calculations)? (iii) Are the observed patterns consistent with predictions from models of range expansions accompanied by founder effects? We address these questions with a new genomic dataset of seven globally distributed human populations.
Results
Population History and Global Patterns of Genetic Diversity.
We obtained moderate coverage whole-genome sequence (median depth 7×) and high coverage exome sequence data (median depth 78×) from individuals from seven populations from the Human Genome Diversity Panel (HGDP) (20). Unrelated individuals (no relationship closer than first cousin) were selected from seven populations chosen to represent the spectrum of human genetic variation from throughout Africa and the OOA expansion, including individuals from the Namibian San, Mbuti Pygmy (Democratic Republic of Congo), Algerian Mozabite, Pakistani Pathan, Cambodian, Siberian Yakut, and Mexican Mayan populations (Fig. 1A). The 2.48-Gb full genome callset consisted of 14,776,723 single nucleotide autosomal variants, for which we could orient 97% to ancestral/derived allele status (SI Appendix).
Heterozygosity among the seven populations decreases with distance from southern Africa, consistent with an expansion of humans from that region (21). The Namibian San population carried the highest number of derived heterozygotes, ∼2.39 million per sample, followed closely by the Mbuti Pygmies (SI Appendix, Table S1 and Fig. S5). The North African Mozabites carry more heterozygotes than the OOA populations in our dataset (2 million) but substantially fewer than the sub-Saharan samples, likely reflecting a complex history of an OOA migration, followed by reentry into North Africa and subsequent recent gene flow with neighboring African populations (22). The Maya have the lowest median number of heterozygotes in our sample, ∼1.5 million, which may be inflated due to recent European admixture (23). Two Mayan individuals displayed substantial recent European admixture (>20%) as assessed with local ancestry assignment (24) (SI Appendix, Fig. S6); these individuals were removed from analyses of deleterious variants. When we recalculated heterozygosity in the Maya, it was reduced by 3.5%. The decline in heterozygosity in OOA populations with distance from Africa strongly supports earlier results based on SNP array and microsatellite data for a serial founder effect model for the OOA dispersal (25, 26). We analyzed population history for individuals having sufficient coverage from five of the studied populations using the pairwise sequential Markovian coalescent software (PSMC) to estimate changes in Ne (11, 12, 27). Because dating demographic events with PSMC is dependent on both the assumed mutation rate and the precision with which a given event can be inferred, we compare relative bottleneck magnitudes and timing among the seven HGDP populations. Consistent with previous analyses (27), the OOA populations show a sharp reduction in Ne, with virtually identical population histories (Fig. 1B and SI Appendix). 
Simulations indicate that the magnitude of the 12-fold bottleneck is accurately estimated (SI Appendix, Fig. S7), even if the time of the presumed bottleneck is difficult to estimate precisely using PSMC. Interestingly, both the Mbuti and the Namibian San show a moderate reduction in Ne relative to the ancestral maximum, with the San experiencing an almost twofold reduction in Ne and the Mbuti displaying a reduction intermediate between the San and OOA populations (see also refs. 20, 28, and 29). These patterns are consistent with multiple population histories (e.g., both short and long bottlenecks) and multiple demographic events, including a reduction in substructure from the ancestral human population rather than a bottleneck per se (27).
Differences in Deleterious Alleles per Individual Genomes.
Owing to differences in coverage among the whole genome sequences, our subsequent analyses focus on the high-coverage exome dataset (78× median coverage) to minimize any bias in comparing populations (Materials and Methods). We classified all mutations discovered in the exome dataset into categories based on Genomic Evolutionary Rate Profiling (GERP) Rejected Substitution (RS) scores. These conservation scores reflect various levels of constraint within a mammalian phylogeny (Materials and Methods) and are used to categorize mutations by their predicted deleterious effect (30, 31). Importantly, the allele present in the human reference genome was not used in the GERP RS calculation, avoiding the reference-bias effect previously observed in other algorithms (11, 12) (SI Appendix, Fig. S8A). Variants were sorted into four groups reflecting the likely severity of mutational effects: “neutral” (−2 < GERP < 2), “moderate” (2 ≤ GERP < 4), “large” (4 ≤ GERP < 6), and “extreme” (GERP ≥ 6) (SI Appendix, Fig. S9). GERP categories were concordant with ANNOVAR functional annotations (SI Appendix, Table S2 and Fig. S8B).
When considering the total number of derived alleles per individual, defined here as AI = (1 × HET) + (2 × HOMder), we observe an increase of predicted deleterious alleles with distance from Africa (Fig. 1C). The number of predicted deleterious alleles per individual increases along the range expansion axis (from San to Maya), consistent with theoretical predictions for expansion load (32). The maximal difference in the number of deleterious alleles between African and OOA individuals is ∼150 alleles. This result is consistent with theoretical predictions; the rate at which deleterious mutations accumulate in wave-front populations is limited by the total number of mutations occurring during the expansion (32). Assuming an exomic mutation rate of u = 0.5 per haploid exome and an expansion that lasted for t = 1,000 generations, a very conservative upper limit for the excess of deleterious alleles in OOA individuals would be 2*u*t = 1,000. The cline in AI is most pronounced for large-effect alleles (4 ≤ GERP < 6, Fig. 2E), whereby the San individuals carry AI = 4,450 large-effect alleles on average, increasing gradually to 4,550 in Yakut. The Mayans carry slightly fewer large-effect mutations per individual than the Yakut, which may be influenced by the residual European ancestry (between 5–20%) in our sample. For extreme alleles (GERP ≥ 6), each individual in the dataset carries on average 110–120 predicted highly deleterious alleles with no significant differences among populations (Fig. 2F). The average additive GERP score—obtained by counting the GERP scores at homozygous sites twice—for all predicted deleterious variants per individual is lowest in the San (∼3.3) and highest in the Maya (∼3.8).
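The per-individual count and the upper bound quoted above are simple arithmetic. A minimal sketch follows, with hypothetical function names and made-up genotype counts; only the formulas AI = (1 × HET) + (2 × HOMder) and 2*u*t come from the text.

```python
def derived_allele_count(n_het: int, n_hom_der: int) -> int:
    """A_I = (1 x HET) + (2 x HOMder): derived alleles carried by one diploid individual."""
    if n_het < 0 or n_hom_der < 0:
        raise ValueError("genotype counts must be non-negative")
    return n_het + 2 * n_hom_der


def max_excess_alleles(u: float, t: float) -> float:
    """Upper limit 2*u*t on the excess of alleles per diploid individual.

    u: mutation rate per haploid exome per generation
    t: duration of the expansion in generations
    """
    if u < 0 or t < 0:
        raise ValueError("u and t must be non-negative")
    return 2.0 * u * t


if __name__ == "__main__":
    # Hypothetical genotype counts for one individual (not taken from the dataset).
    print(derived_allele_count(n_het=1500, n_hom_der=800))
    # Parameter values quoted in the text: u = 0.5, t = 1,000 generations.
    print(max_excess_alleles(u=0.5, t=1000))
```

With the values from the text the bound evaluates to 1,000, well above the observed maximal difference of ∼150 alleles.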
Similar patterns are found when we consider the number of derived homozygous sites per individual. We find that individuals from OOA populations exhibit significantly more homozygotes for moderate, large, and extreme variants than African populations (Fig. 2 A–C). In addition, we observe a clear increase in the number of derived homozygotes with distance from Africa for moderate (2 ≤ GERP < 4) and large (4 ≤ GERP < 6) mutation effects categories, whereas the number of derived “extreme” homozygotes (GERP ≥ 6) is similar among OOA populations: All OOA genomes possess 30–40 extremely deleterious alleles in homozygous state (Fig. 2C). These patterns are in excellent agreement with theoretical predictions for the evolution of genetic variation during range expansions (7). The average GERP score per individual for derived homozygous variants is less differentiated than the additive model (above), varying between 2.43–2.49.
It is important to note that AI is strongly influenced by common variants. Goode et al. (33) observed that as much as 90% of deleterious alleles in a single genome have a derived allele frequency greater than 5%, suggesting that the bulk of mutational burden using this metric will come from common variants. To explore this idea, we randomly chose an individual in each population and calculated the proportion of deleterious variants that are rare (<10%, i.e., a singleton within our population samples) and common (>10%), for each GERP category (Fig. 3A). Common deleterious alleles contribute to more than 90% of an individual’s AI, and the proportion of common deleterious variants increases with distance from Africa, as can be seen by the decrease of rare deleterious variants. This includes common large-effect variants, which make up proportionally more of AI for an OOA individual than for an African individual. For example, in a Mayan individual, 93% of large-effect variants are common compared with a San individual, where only 85% of large-effect variants are common (SI Appendix, Fig. S12). Given the small number of chromosomes in each population (n = 14–16), estimates of allele frequencies are subject to sampling effects. We recently performed the same analysis on exome data from the 1000 Genome Phase 1 Project (34). We find a similar pattern as in our HGDP data: On a per-genome basis, common variants represent a majority of the alleles predicted to be deleterious (5).
Differences in Deleterious Alleles at the Population Level.
To further elucidate the relationship between predicted mutation effect and allele frequencies, we compared the site frequency spectrum (SFS) for neutral and large- (4 ≤ GERP < 6) effect variants (Fig. 3B; see SI Appendix, Fig. S14 for a comparison between neutral and extreme variants). For all populations, singletons are enriched for deleterious variants (compared with neutral variants), consistent with the effect of purifying selection against deleterious variants (15, 35). However, the SFSs of OOA and African populations show marked differences. The neutral and deleterious SFSs of OOA populations show a global shift toward higher frequencies, consistent with the effects of serial bottlenecks/founder effects. It follows that OOA populations have fewer rare deleterious variants than Africans, as well as a larger proportion of fixed deleterious alleles; almost 7.9% of large-effect variants are fixed in the Maya, whereas the San have only 1.8% of deleterious variants fixed (Fig. 3B).
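As a rough illustration of how an unfolded SFS and the fixed fraction can be tabulated from per-site derived allele counts, consider the sketch below. It is an assumption-laden toy, not the pipeline used for Fig. 3B: the function name and the input counts are hypothetical, and sites with zero derived copies in the sample are simply dropped.

```python
from collections import Counter
from typing import List, Sequence


def site_frequency_spectrum(derived_counts: Sequence[int], n_chrom: int) -> List[float]:
    """Proportion of variable sites carrying 1..n_chrom derived copies.

    Index 0 holds singletons; the last entry holds sites fixed for the
    derived allele in the sample.
    """
    if any(c < 0 or c > n_chrom for c in derived_counts):
        raise ValueError("each count must lie between 0 and n_chrom")
    tally = Counter(c for c in derived_counts if c > 0)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no sites with a derived allele")
    return [tally.get(k, 0) / total for k in range(1, n_chrom + 1)]


if __name__ == "__main__":
    # Hypothetical derived allele counts at eight sites in a sample of 14 chromosomes.
    counts = [1, 1, 1, 2, 3, 7, 14, 14]
    sfs = site_frequency_spectrum(counts, n_chrom=14)
    print("singleton fraction:", sfs[0])
    print("fixed fraction:", sfs[-1])
```

Comparing the first and last entries between the neutral and large-effect categories mirrors the contrast described in the text: enrichment of singletons for deleterious variants, and a larger fixed fraction in OOA populations.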
Simulations of Purifying Selection Under a Range Expansion.
We sought to interpret the population-specific patterns of genetic diversity for each GERP category under a model including serial founder effects across geographic space and purifying selection. We simulated the evolution of both neutral and deleterious mutations under a simple model of range expansion in a 2D habitat (SI Appendix, Fig. S21). At selected loci, the ancestral allele was assumed selectively neutral and a mutant allele reduced an individual’s fitness by a factor 1 − s only if it was present in the homozygous state, that is, deleterious mutations were assumed to be completely recessive. Three thousand generations (corresponding to about 75 kya) after the onset of the range expansion, we computed the average expected heterozygosity for all populations. Computational limitations of individual-based simulations prohibit a complete exploration of the parameter space for this model, but, by varying migration rates and selection coefficients, we identified parameter values that fit the observed clines in heterozygosity reasonably well (Fig. 4B). Specifically, we first identified selection coefficients that yield the same relative differences between observed neutral and selected heterozygosities (Fig. 4A). Then, the migration rate was adjusted to fit the observed clines in heterozygosities, assuming that the distance between two demes is 250 km (Fig. 4B). The fit selection coefficients were 0, 1.25 × 10⁻⁴, 1 × 10⁻³, and 2 × 10⁻³ for neutral, moderate, large, and extreme GERP score categories, respectively; the GERP ≥ 6 category showed the worst fit and observed counts indicate that even stronger selection coefficients should be considered for these extreme mutations (16). We performed the same analysis using a model in which mutations are codominant and, as expected, we found that the fit selection coefficients are smaller than those obtained under a recessive model. These coefficients are estimated as s = 0, 0.5 × 10⁻⁴, 1.2 × 10⁻⁴, and 2 × 10⁻⁴, respectively (SI Appendix, Fig. S16) (16).
Evolutionary Forces Acting on Heterozygosity.
To better understand which evolutionary forces have acted in different populations to shape their levels of genetic diversity, we define a new statistic, RH. RH measures the reduction in heterozygosity at conserved sites relative to neutral heterozygosity, RH = (Hneu − Hdel)/Hneu, where Hneu indicates heterozygosity at neutral sites and Hdel at GERP score categories >2. RH can be seen as a way to quantify changes of functional diversity across populations relative to neutral expectations. For instance, a constant RH value across populations would suggest that average functional diversity is determined by the same evolutionary force(s) as neutral diversity, that is, genetic drift and migration. In contrast, if RH changes across populations, it suggests that different evolutionary forces have shaped neutral and functional diversity, that is, selection has changed functional allele frequencies.
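The RH statistic is a one-line computation once the two heterozygosities are in hand. The sketch below assumes Hneu and Hdel have already been estimated for a population; the function name and the example values are hypothetical and are not taken from the dataset.

```python
def reduction_in_heterozygosity(h_neu: float, h_del: float) -> float:
    """R_H = (H_neu - H_del) / H_neu, as defined in the text.

    h_neu: heterozygosity at neutral sites
    h_del: heterozygosity at sites in a conserved (GERP > 2) category
    """
    if h_neu <= 0:
        raise ValueError("neutral heterozygosity must be positive")
    if h_del < 0:
        raise ValueError("heterozygosity cannot be negative")
    return (h_neu - h_del) / h_neu


if __name__ == "__main__":
    # Hypothetical heterozygosities for two populations (illustration only).
    populations = {"pop_A": (0.30, 0.24), "pop_B": (0.20, 0.18)}
    for name, (h_neu, h_del) in populations.items():
        print(name, round(reduction_in_heterozygosity(h_neu, h_del), 3))
```

Because the statistic is normalized by neutral heterozygosity, two populations with very different absolute diversity can still be compared directly.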
In our dataset, RH is significantly larger in sub-Saharan Africans than in OOA populations across all functional GERP categories (Fig. 4C), indicating that selection has acted differently relative to drift between the two groups. The correlation between RH value and predicted mutation effect observed in Africa (Fig. 4A) confirms that purifying selection has kept strongly deleterious alleles at lower frequencies than in OOA populations. We then asked whether there were significant differences across OOA populations, as oriented by their distance from eastern Africa. Interestingly, we see that the OOA RH values do not depend on their distance from Africa for predicted moderate-effect alleles (P = 0.82; SI Appendix, Fig. S15), suggesting that the frequencies of moderate mutations have evolved mainly according to neutral demographic processes during the range expansion out of Africa. In contrast, for strongly deleterious variants (large and extreme GERP categories) we see a significant cline in RH (P = 0.01 and P = 1.12 × 10⁻⁶, respectively; SI Appendix, Fig. S15), which implies that purifying selection has also contributed to their evolution relative to demographic processes.
Models of Dominance.
We next considered whether there is empirical evidence for nonadditive effects for deleterious variants. Prior studies generally calculated “mutation load” by assuming an additive model, summing the number of deleterious alleles per individual, without factoring in whether an SNP occurs in a homozygous or heterozygous state. Determining an individual’s mutation load is, however, highly dependent on the underlying model of dominance (36) (a formal definition of mutation load is given below). For humans, Mendelian diseases tend to be overrepresented in endogamous populations or consanguineous pairings, indicating that many of these mutations are recessive (37); Gao et al. (38) estimate 0.58 lethal recessive mutations per diploid genome in the Hutterite population. Gene conversion can also lead to differential burden of derived, recessive diseases alleles among populations (39). Even height, a largely quantitative trait, seems to be affected by the architecture of recessive homozygous alleles in different populations (40).
To further clarify the impact of dominance, we compared the distribution of deleterious variants across genes associated with dominant or recessive disease as reported in Online Mendelian Inheritance in Man (OMIM) (41). We expect to see a lower proportion of large- and extreme-effect variants in genes with dominant OMIM mutation annotations, compared with genes with recessive OMIM mutation annotations. We tested this hypothesis with the HGDP as well as the much larger 1000 Genomes Phase 1 dataset (SI Appendix, Fig. S18B). We averaged the proportion of variants within each effect category and performed a Wilcoxon test to determine whether the distribution of the proportion of large-effect variants was different between dominant and recessive genes. In the HGDP dataset, we observed P = 0.06, and for the larger 1000 Genomes dataset, P = 0.03. Our results indeed show a significantly higher proportion of large-effect variants in genes with recessive annotations, compared with genes with dominant annotations, suggesting that deleterious variants in the genome may tend to be recessive. However, we caution that OMIM genes are here annotated as dominant or recessive, whereas dominance is a property of specific mutations, and therefore all deleterious variants in a gene will not necessarily have the same dominance coefficient. Nonetheless, our results are consistent with an interpretation that genes may have certain properties, for example negative selection against dominant mutations in crucial housekeeping or developmental genes, that influence the tolerable distribution of dominance among variants. We consider the effect of dominance (summarized by h, which measures the effect of selected mutations in heterozygotes relative to homozygotes) on mutation load in the HGDP population samples given the observed differences in heterozygosity.
Modeling the Burden of Deleterious Alleles.
We modeled three different scenarios to estimate the burden of deleterious alleles across populations. The relationship between fitness W and load for a given locus v is classically defined (36) as
Lv = 1 − Wv,
with Whet = gAa × (1 − hs) and Whom = gaa × (1 − s), where gAa and gaa are the observed genotype frequencies of the heterozygotes and derived homozygotes, respectively. The estimated population load (ignoring epistasis) is the sum of the load for all variants: LT = Σ Lv, summed over all variants v. For each variant we assigned the selection coefficient inferred by the range expansion simulations according to its GERP score [see also Henn et al. (5)]. Given that we do not know the distribution of dominance effects in human variation, we started by estimating the bounds for the mutation load for each population by considering two extreme scenarios: completely recessive and completely additive models for deleterious variants. We calculated LT for each HGDP population (Fig. 5). When all mutations are considered strictly additive (h = 0.5), values for mutation load are very similar across populations, with sub-Saharan African populations having the lowest mutational load (LT = 2.83), followed by the Pathan and Mozabites, and finally the Asian and Native American populations showing the highest load (LT = 2.89) (Fig. 5B). We consider this model, as adopted in earlier studies, to demonstrate that even under an additive assumption there is a statistically significant 1.7% difference in the spectrum of load between populations (SI Appendix, Fig. S24). When all mutations are considered recessive (h = 0), this model yields a much larger 45% difference in load (LT ranges between 1.27 and 1.85) between the San and the Maya (Fig. 5A). Although this is surely an overestimate, it illustrates the broad range of potential values and consistent signal in the data for differences among populations in estimated load. The mutation load under a recessive model is not explained by inbreeding, as measured by the cumulative amount of the genome in runs of homozygosity (cROH) greater than 1 Mb (r = 0.27, P = 0.55) (SI Appendix, Fig. S25); this is because the African hunter–gatherers have relatively high cROH compared with other global populations, as is commonly observed in small endogamous populations (21, 42).
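The two bounding scenarios can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes that mean fitness at a locus is gAA + Whet + Whom with gAA = 1 − gAa − gaa, so that the per-locus load reduces to gAa × hs + gaa × s, which may differ from the exact expression used in the paper. The class and function names and the genotype frequencies are hypothetical; the selection coefficients are those reported above for the recessive simulations.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    g_het: float  # observed frequency of heterozygotes (gAa)
    g_hom: float  # observed frequency of derived homozygotes (gaa)
    s: float      # selection coefficient assigned from the GERP category


def locus_load(v: Variant, h: float) -> float:
    """Per-locus load, ASSUMED here to be gAa*h*s + gaa*s (see lead-in)."""
    return v.g_het * h * v.s + v.g_hom * v.s


def total_load(variants: Iterable[Variant], h: float) -> float:
    """Total load as the sum of per-locus loads (epistasis ignored)."""
    return sum(locus_load(v, h) for v in variants)


if __name__ == "__main__":
    # Hypothetical genotype frequencies; s values for the moderate, large,
    # and extreme categories as fitted under the recessive model in the text.
    variants = [
        Variant(g_het=0.40, g_hom=0.10, s=1.25e-4),
        Variant(g_het=0.25, g_hom=0.05, s=1e-3),
        Variant(g_het=0.10, g_hom=0.00, s=2e-3),
    ]
    print("recessive (h = 0):  ", total_load(variants, h=0.0))
    print("additive  (h = 0.5):", total_load(variants, h=0.5))
```

Under this assumed form, only derived homozygotes contribute when h = 0, so the recessive load is driven entirely by the homozygote frequencies that increase with distance from Africa.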
For the third scenario we used a model based on studies of dominance in yeast and Drosophila (19, 43, 44), in which there is an inverse relationship between selection and dominance (highly deleterious mutations tend to be recessive), and where h is sampled from a distribution following Agrawal and Whitlock (19). The maximal difference in load under this model was 30.8% (Fig. 5C), again between the San and Maya, and the minimum difference in load was 1%, between the Cambodians and Yakut. We note that the difference in relative fitness [e^(−LT)] is much less than the difference in mutation load (i.e., a relative reduction of 79% in the San versus 87% in the Maya translates to an 8% difference between the two populations under the h(s) model; see also Discussion). As in the other modeled dominance scenarios, the majority of calculated mutational load is contributed by the large-effect mutational category, because this category has a relatively strong selection coefficient and thousands of mutations (>4,000 on average per individual). Thus, this category contributes proportionally more to the total load, even though the extreme-effect mutations have a higher selection coefficient. We note, however, that our assumed selection coefficients, particularly for the extreme effect, are somewhat lower than those obtained by other studies of the distribution of fitness effects (16, 45), and simulations under an additive model result in even smaller selection coefficients (discussed above). Because selection coefficients are the same across populations in our calculations, s will affect the absolute value of load but not relative differences across populations.
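The gap between differences in load and differences in relative fitness follows from the exponential mapping. The sketch below uses hypothetical function names and back-calculates the loads implied by the rounded 79% and 87% reductions quoted in the text; those implied loads are derived here for illustration and are not values reported by the authors.

```python
import math


def fitness_reduction(load: float) -> float:
    """Relative reduction in fitness, 1 - exp(-L), for total load L."""
    return 1.0 - math.exp(-load)


def load_from_reduction(reduction: float) -> float:
    """Invert the mapping: L = -ln(1 - reduction)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return -math.log(1.0 - reduction)


if __name__ == "__main__":
    # Rounded reductions quoted in the text for the h(s) model.
    san, maya = 0.79, 0.87
    l_san = load_from_reduction(san)
    l_maya = load_from_reduction(maya)
    print(f"implied load, San:  {l_san:.2f}")
    print(f"implied load, Maya: {l_maya:.2f}")
    print(f"relative difference in load: {100 * (l_maya / l_san - 1):.1f}%")
    print(f"difference in fitness reduction: {100 * (maya - san):.0f} points")
```

The implied loads differ by roughly 30%, in line with the 30.8% maximal difference reported for this model, whereas the corresponding fitness reductions differ by only 8 percentage points.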
Discussion
Two primary demographic signals are reflected in human genetic data from non-African populations. First, a major 5- to 10-fold population bottleneck is associated with the OOA dispersal(s) (46–48). Second, the distribution of genetic diversity among non-African populations is characterized by a decrease in heterozygosity proportional to geographic distance from northeastern Africa. A model of serial founder effects in the ancestral populations of Eurasia, Oceania, and the Americas has been posited as the most likely model for explaining the systematic variation in genetic diversity across this geographic range for humans (25, 26), as well as commensal human species (49, 50). By directly ascertaining genomic variation in over 50 individuals from seven populations, we observe a clear cline of genetic diversity as a function of distance from Africa, supporting evidence for a serial founder effect model. We also observe differences in the amount of predicted deleterious variation across populations. These differences seem to result from the genetic drift of existing deleterious variants to higher frequencies during the sequential range expansion after the OOA exit (Fig. 3B). Clines in heterozygosity for the different mutational effect categories can be reproduced by spatially explicit simulations with negative selection and recessive mutations (Fig. 4; see also codominant simulations in SI Appendix, Fig. S16). Although both moderate- and large-effect deleterious mutations have evolved under negative selection in Africa (Fig. 4C and SI Appendix, Fig. S15), many predicted moderate variants have evolved as if they were neutral in non-African populations. However, selection has remained a major force during the OOA expansion for strongly deleterious variants.
Impact of the OOA Bottleneck.
There is an ongoing debate on whether selection has been equally or more efficient in African versus non-African populations due to the major bottleneck that occurred in the ancestors of OOA populations (10, 12, 13, 35). Two studies found no significant differences in mutation load between European Americans and African Americans under an additive model with two classes of alleles: deleterious and neutral (12, 13, 33). Fu et al. (11) identified small but significant differences in the average number of alleles and the SFS, potentially due to a different algorithm for predicting mutation effect than earlier studies. We argue that estimates of the efficacy of selection should take into account not only the number of mutations per individual but also the predicted severity of mutational effect. Here, we classify mutations into four categories and find differences across populations in some, but not all, mutational categories. For variants that have putatively moderate (2 ≤ GERP < 4) or extreme deleterious effect (GERP ≥ 6), we do not see a significant difference between African and non-African populations in the number of mutations per individual. Significant per-individual differences are only observed for the intermediate large-effect category. We used PhyloP scores (51) as an alternative measure of conservation to verify our main results (SI Appendix, Fig. S26). We found qualitatively very similar patterns for both the spatial distribution of the number of derived homozygous sites per individual (SI Appendix, Fig. S26A) as well as the number of derived alleles per individual, suggesting that our results are robust to the choice of prediction algorithm that is used to estimate deleteriousness of mutations.
We note that the observed differences between populations are relatively small compared with the within-population variance (Fig. 2). Nonetheless, a novel measure of the efficacy of selection, RH, is significantly different across all three mutational categories (Fig. 4C and SI Appendix, Fig. S15) between sub-Saharan Africans and non-Africans in our dataset. That is, the observed heterozygosity at deleterious loci is greater in non-Africans than in Africans after correcting for neutral genetic diversity in each group. This is particularly significant for moderate- and large-effect mutations, in agreement with theory suggesting that differences in purifying selection will primarily emerge for variants at the Nes boundary (Ne, effective population size; s, selection coefficient).
Serial Founder Effects/Range Expansion.
Several simulation studies have attempted to characterize the distribution of deleterious alleles under OOA demographic scenarios. Some simulations focused on differences in the cumulative number of deleterious alleles per individual; others focused on differences in the proportion of segregating alleles within a population that are deleterious. Lohmueller et al. (10) found that a long bottleneck lasting more than 7,500 generations (>150,000 y) could produce the excess proportion of deleterious mutations observed in European Americans. A bottleneck model with subsequent explosive growth has also been proposed to explain the proportionally greater number of nonsynonymous or deleterious mutations in Eurasian populations (52, 53). As a consequence, deleterious mutations accumulate in populations during the expansion process. Simons et al. (12) tested a long bottleneck and subsequent population expansion model contrasting African and non-African populations and found no evidence that human demography played a role in the differential accumulation of deleterious alleles per individual.
A recent theoretical study of spatial range expansions (i.e., a model similar to geographic serial founder effects) showed that strong genetic drift at the wave front of expanding populations decreases the efficiency of selection (32). Under a spatial range expansion model, deleterious variants, unless they have a large selection coefficient, should evolve as if they are neutral on the wave front (32), and their overall frequency should therefore not change much during the range expansion (7). The loss of deleterious variants at some loci should be compensated by an increase of their frequencies at other loci. The frequency of deleterious homozygotes should therefore increase with distance from Africa, which is observed here in the rightward shift of the SFS in OOA populations (Fig. 3), except for the most evolutionarily constrained sites. We can address the question of whether this increased frequency is driven entirely by drift and gene surfing or by differential selection in non-African populations by considering the spatial distribution of the RH statistic (Fig. 4C). The fact that RH does not change among OOA populations for moderately deleterious alleles suggests that they have evolved as if they were neutral alleles during the expansion and that selection has not yet purged the deleterious mutations that increased in frequency. In contrast, extremely deleterious alleles (GERP ≥ 6) exhibit similar heterozygosity in all OOA populations, suggesting that they are subject to similar levels of purifying selection in these populations. The remaining deleterious alleles (4 ≤ GERP < 6) present an intermediate pattern, implying that both drift and selection have acted on this category of sites.
A recent controversy concerns whether there are differences in the efficacy of purifying selection between African and non-African populations (6, 12, 13). It is difficult to discuss our results in the context of this controversy because there is no generally accepted definition of “efficacy of selection,” and different definitions will lead to different interpretations (4). We therefore prefer to interpret our results in the context of our spatially explicit model of range expansions, and the relative roles of drift and selection in this model. Recurrent founder events should contribute to a decrease in the effective population sizes with distance from Africa, and it is commonly assumed that selection will become weaker with smaller effective population sizes. However, reducing the impact of a range expansion to a simple gradient in effective size, and thus to a decrease of the efficacy of selection, can be misleading. Diversity-based estimates of Ne are not necessarily informative about the strength of selection in nonequilibrium scenarios because estimates of Ne may lag behind recent demographic changes (e.g., ref. 54). Rather, if one considers that deleterious alleles were kept at low frequencies by purifying selection in ancestral African populations, those that increased in frequency by gene surfing during the OOA expansion also became more accessible to subsequent selection, especially for those alleles that were recessive. The observed cline in RH for large-effect mutations is more compatible with an unequal purging of deleterious variants by selection. Indeed, selection will have had less time to act on newly formed populations that are further away from Africa, and it will also operate more slowly on populations that have less diversity and therefore lower interindividual differences in fitness. 
Furthermore, the fact that our simulations can reproduce the observed pattern with spatially uniform population sizes and strength of selection against deleterious mutations implies that the simulated gradients in RH in Fig. 4A, as well as the increased number of deleterious homozygous sites, are not the consequence of reduced strength of selection away from Africa. Rather, they are caused by increased drift during the expansion, as well as by differential purging of deleterious mutations after the expansion.
The Importance of Dominance.
Multiple modeling assumptions are crucial when considering the burden of deleterious alleles across populations. In addition to the selection coefficients, the assumed dominance terms are critical. An estimated 16% of Mendelian diseases are known to be autosomal recessive (estimated from the OMIM) and many contribute significantly to infant mortality. Owing to the difficulty of detecting recessive diseases, unless they are extremely damaging, there are potentially many more disease mutations that have an h coefficient less than 0.5. Autosomal recessive diseases seem to be more frequent than autosomal dominant diseases (55), and even mildly deleterious mutations are predicted to have a mean h of 0.25 (56). Although formal calculations of genetic load require multiple assumptions, we demonstrate that differences in calculated load across human populations are primarily sensitive to assumptions about dominance, as expected given the increased extent of homozygosity in OOA populations.
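The sensitivity of load to dominance can be made concrete with a short Python sketch. It assumes the per-variant load Aa × hs + aa × s (heterozygote frequency Aa, derived homozygote frequency aa) and compares two hypothetical populations that share the same derived allele frequency but differ in homozygosity. The genotype frequencies, s, and the h values are illustrative assumptions, not values fitted in this study.

```python
def load(freq_het, freq_hom, s, h):
    """Per-variant load: heterozygotes lose h*s, derived homozygotes lose s."""
    return freq_het * h * s + freq_hom * s


s = 0.01   # hypothetical selection coefficient
q = 0.10   # derived allele frequency shared by both populations (q = Aa/2 + aa)

# (Aa, aa) genotype frequencies; both pairs give the same allele frequency q.
panmictic = (2 * q * (1 - q), q * q)   # Hardy-Weinberg proportions: (0.18, 0.01)
more_homozygous = (0.10, 0.05)         # hypothetical excess of homozygotes

for h in (0.0, 0.25, 0.5):
    l_a = load(*panmictic, s, h)
    l_b = load(*more_homozygous, s, h)
    print(f"h = {h:.2f}: load (HW) = {l_a:.5f}, load (more homozygous) = {l_b:.5f}")
```

Under the additive case (h = 0.5) the load reduces to s × q and is identical in the two populations, whereas under recessive or partially recessive dominance the population with more homozygotes carries the higher load.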
We have modeled deleterious mutations as having variable h coefficients. Whereas strongly deleterious mutations are likely recessive, dominance for weakly deleterious mutations is particularly problematic to estimate because there is less power to measure weak effects and h may be upwardly biased in model organism competition experiments (19). When sampling h coefficients under our model, we allowed weakly deleterious mutations to be assigned a coefficient h > 0.5, but this had little effect on mutational load because the bulk of the load was contributed by large-effect variants. However, a fraction of strongly deleterious mutations are clearly dominant, as ascertained from disease studies, and future work may need to model different mixture distributions of h. We also note that the absolute mutational load is twofold higher under an additive model than under a recessive model (Fig. 5), as expected from theory (36).
Estimates of Mutational Load.
We estimate that there are differences in mutational burden, calculated using a formal load model, among extant human populations, particularly if we depart from a simple additive assumption. We found that mutation load differed significantly (P < 0.05) between sub-Saharan African and Native American populations (the two ends of the range) under recessive, partially recessive, and additive models (SI Appendix, Fig. S24). Mutational load under a fully or partially recessive model is 10 to 30% greater in non-African populations (Fig. 5A), as the result of higher homozygosity from the legacy of the OOA bottleneck across all (deleterious) mutation categories [e.g., $L_T$(Mbuti) = 1.59 and $L_T$(Yakut) = 1.95 under the h(s) model]. All populations carry significant load, relative to a population with the alternate, ancestral allele genotype. Under a model where fitness differences are determined only by genotype and environments are equal across individuals, the relative fitness [$e^{-L_T}$] of 0.204 for the Mbuti indicates a reduction in fitness of 79.6%, whereas a relative fitness of 0.142 for the Yakut indicates an 85.8% reduction. These fitness differences are relatively small, even under a partially recessive model.
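The conversion from total load to relative fitness quoted above can be checked with a few lines of Python. The sketch takes the two load values reported in the text and applies the exponential relationship between relative fitness and total load. The function name is introduced here for illustration, and the last printed digit may differ from quoted values because the reported loads are themselves rounded.

```python
import math


def relative_fitness(total_load):
    """Relative fitness implied by a total load L_T, computed as exp(-L_T)."""
    return math.exp(-total_load)


# Total loads reported in the text under the h(s) model.
reported_loads = {"Mbuti": 1.59, "Yakut": 1.95}

for population, l_t in reported_loads.items():
    w = relative_fitness(l_t)
    print(f"{population}: L_T = {l_t:.2f}, relative fitness = {w:.3f}, "
          f"reduction = {100 * (1 - w):.1f}%")
```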
Although illustrative, such models of load have important limitations. The mutations identified in this dataset have not been functionally characterized and are predicted to be deleterious based on degree of sequence conservation. The assumed selective coefficients across GERP categories are fit based on a recessive model, which is not applicable to all sites. However, although different selection coefficients will change the values of load in our calculation, they will not change the relative difference among populations because the same set of coefficients was applied to all populations (5). If mutations have different fitness effects across heterogeneous global environments, then the values of mutation load will change. Indeed, a proportion of the alleles may be locally adaptive, or neutral, and hence the sign of the selection coefficient for the mutation would be misestimated in our analysis. For example, the Duffy null allele is classified as a large-effect mutation using GERP (RS = 4.27) and is found at high frequency in western Africa; however, it has likely increased in frequency due to positive selection as a response to malaria (57). Recent genome-wide studies have stressed the paucity of selective sweeps in the human genome (35, 58, 59); only 0.5% of nonsynonymous mutations in the 1000 Genomes Pilot Project were identified as having undergone positive selection. Others have emphasized evidence for pervasive adaptive selection (60, 61) and a variety of studies have identified specific beneficial alleles locally adapted to high altitude, immune response, and pigmentation (62–64). We considered local adaptive evolution by examining highly differentiated alleles in our dataset, that is, alleles that differ by 80% in frequency between a pair of populations, indicative of a strong local adaptation.
We find that highly differentiated alleles have the same GERP score distribution as nondifferentiated alleles, indicating there is little reason to believe that most large- and extreme-effect mutations have been subjected to strong local adaptation (SI Appendix, Fig. S20; also see ref. 65). We conclude that the raw, calculated mutational burden may differ across human populations, although the effects of positive selection, varying environments, and epistasis have yet to be explored and remain a significant challenge to fully understanding mutational burden.
Conclusions.
A major difference between our work and previous results is the interpretative framework we present, which underlines the role of range expansions out of Africa to explain patterns of neutral and functional diversity. Whereas previous comparisons between African and non-African diversity attributed the observed increased proportion of deleterious variants in non-Africans to the OOA bottleneck (10), our study shows that a single bottleneck is not sufficient to reproduce the gradient we observe in the number of deleterious alleles per individual with distance from Africa (Fig. 2). Taking into account the range expansion of modern humans (66) sheds new light on this apparent controversy. Finally, we note that recent simulation work (4) suggests that the impact of a bottleneck on the efficacy of natural selection depends critically on the distribution of fitness and dominance effects as well as postbottleneck demographic history. Although these models and parameter choices clearly affect the interpretation of the pattern of deleterious alleles across populations, we find empirical evidence for significant differences in deleterious alleles as tabulated by a variety of statistics across the spectrum of human genetic diversity.
Materials and Methods
Samples and Data.
Aliquots of DNA isolated from cultured lymphoblastoid cell lines were obtained from the Centre d’Étude du Polymorphisme Humain (CEPH) and prepared for both full genome sequencing on Illumina HiSeq technology and exome capture with an Agilent SureSelect 44Mb array. Paired-end reads of 101 bp were mapped onto the human genome reference (GRCh37) using a mapping and variant calling pipeline designed to effectively manage massive amounts of short-read data. This pipeline followed many of the best practices developed by the 1000 Genomes Project Consortium (34).
Variant Annotation.
Ancestral state was inferred based on orthologous regions in a great ape and rhesus macaque phylogeny as reported by Ensembl Compara and used by the 1000 Genomes Project. To determine the biological impact of a variant we used the GERP score (30) as a measure of conservation across a phylogeny. Positive scores reflect a site showing a high degree of conservation, based on the inferred number of “rejected substitutions” (RS) across the phylogeny. GERP scores were obtained from the University of California, Santa Cruz genome browser (hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/All_hg19_RS.bw) based on an alignment of 35 mammals to human. The allele represented in the human hg19 sequence was not included in the calculation of GERP RS scores: the human reference sequence was excluded from the alignment when calculating both the neutral rate and the site-specific “observed” rate for the RS score, to prevent any bias in the estimates. In addition to GERP, we also used PhyloP scores (51) as measures of genomic constraint during the evolution of mammals. We used the PhyloPNH scores computed in Fu et al. (11) from the 36 eutherian-mammal EPO alignments [available in Ensembl release 70 (67)], which were also computed without using the human reference sequence.
Classification of Mutation Effects by GERP Scores.
Variants were classified as being neutral, moderate, large, or extreme for GERP scores with ranges [−2,2), [2,4), [4,6), and [6,max], respectively. The use of four “bins” of GERP scores simplifies the range expansion simulations performed for distinct selection coefficients. For every individual, the total number of derived deleterious alleles found in homozygosity (i.e., 2 × HOM) and the total number of derived deleterious alleles [i.e., HET + (2 × HOM)] within each category were recorded.
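The binning and per-individual tallies described above can be sketched in Python as follows. Two points are assumptions of this sketch rather than details stated in the text: bin boundaries are treated as half-open, consistent with the inequalities used in the Discussion (2 ≤ GERP < 4, 4 ≤ GERP < 6, GERP ≥ 6), and sites scoring below −2 are left out of all four bins. The function names and the toy genotypes are hypothetical.

```python
CATEGORIES = ("neutral", "moderate", "large", "extreme")


def gerp_category(score):
    """Assign a GERP RS score to one of the four effect categories."""
    if score >= 6:
        return "extreme"
    if score >= 4:
        return "large"
    if score >= 2:
        return "moderate"
    if score >= -2:
        return "neutral"
    return None  # assumption: scores below -2 fall outside the four bins


def count_derived_alleles(genotypes):
    """Tally derived alleles per category for one individual.

    genotypes: iterable of (gerp_score, n_derived), with n_derived in {0, 1, 2}.
    Returns, per category, the alleles carried in homozygosity (2 x HOM)
    and the total derived alleles (HET + 2 x HOM).
    """
    counts = {c: {"hom_alleles": 0, "total_alleles": 0} for c in CATEGORIES}
    for score, n_derived in genotypes:
        category = gerp_category(score)
        if category is None or n_derived == 0:
            continue
        counts[category]["total_alleles"] += n_derived
        if n_derived == 2:
            counts[category]["hom_alleles"] += 2
    return counts


# Toy genotypes for one hypothetical individual: (GERP score, derived allele count).
toy_genotypes = [(0.5, 1), (2.0, 2), (3.1, 1), (4.27, 2), (5.9, 1), (6.2, 1), (-3.0, 2)]
counts = count_derived_alleles(toy_genotypes)
for category in CATEGORIES:
    print(category, counts[category])
```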
Individual-Based Simulations.
To simulate changes in heterozygosity, we modeled human range expansion across an array of 10 × 100 demes (32). After reaching migration-selection-drift equilibrium, populations expand into the empty territory, which is separated from the ancestral population by a geographical barrier, through a spatial bottleneck (SI Appendix, Fig. S21). After 3,000 generations, we computed the average expected heterozygosity for all populations. The migration rate and selection coefficients were adjusted to generate heterozygosity consistent with the observed data, without formally maximizing the fit. The code used for simulations can be downloaded from https://github.com/CMPG/ADMRE.
Calculating Load.
Mutational load was calculated following Kimura et al. (36), but using observed genotype frequencies instead of inferring them from Hardy–Weinberg proportions based on the allele frequencies. The fitness for a given variant is relative to that of the ancestral homozygote, which for numerical convenience is set to 1, so that heterozygotes and derived homozygotes have fitness 1 − hs and 1 − s, respectively. In this way, the reductions in mean fitness contributed by the heterozygotes and the homozygotes will be $W_{het} = Aa \times hs$ and $W_{hom} = aa \times s$, where Aa and aa are the genotype frequencies of the heterozygotes and derived homozygotes, respectively. The relationship between fitness and load for a variant v is $L_v = 1 - W = 1 - (1 - W_{het} - W_{hom}) = Aa \times hs + aa \times s$, and the total population load is the sum of the load for all variants, $L_T = \sum_v L_v$.
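A minimal Python sketch of this calculation is given below. It assumes that, with the ancestral homozygote at fitness 1, mean fitness at a variant is 1 − Aa × hs − aa × s, so that the per-variant load is Aa × hs + aa × s and the total load is the sum over variants. The function names and the toy genotype frequencies, selection coefficients, and dominance values are hypothetical and are not the values used in this study.

```python
def variant_load(freq_het, freq_hom, s, h):
    """Load at one variant from observed genotype frequencies.

    freq_het: frequency of heterozygotes (Aa)
    freq_hom: frequency of derived homozygotes (aa)
    Mean fitness is 1 - Aa*h*s - aa*s, so the load is Aa*h*s + aa*s.
    """
    return freq_het * h * s + freq_hom * s


def total_load(variants):
    """Total population load: sum of per-variant loads."""
    return sum(variant_load(v["het"], v["hom"], v["s"], v["h"]) for v in variants)


# Toy variants with hypothetical genotype frequencies and coefficients.
toy_variants = [
    {"het": 0.20, "hom": 0.10, "s": 0.01, "h": 0.5},  # additive, weak effect
    {"het": 0.05, "hom": 0.02, "s": 0.10, "h": 0.0},  # fully recessive, larger effect
]

print(f"total load: {total_load(toy_variants):.4f}")
```

Because observed genotype frequencies enter directly, any excess of homozygotes relative to Hardy–Weinberg proportions is reflected in the load, which matters most for recessive variants.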
Acknowledgments
We thank Chris Tyler-Smith, David Reich, Yuval Simons, Spencer Koury, and Simon Gravel for helpful discussion. L.R.B. was supported by a Beatriu de Pinós Programme Fellowship. This work was supported by NIH Grants 3R01HG003229 (to C.D.B. and B.M.H.) and DP5OD009154 (to J.M.K.). S.P. and I.D. were supported by Swiss SNSF Grant 31003A-143393 (to L.E.).
Footnotes
Conflict of interest statement: C.D.B. is the founder of IdentifyGenomics, LLC, and is on the scientific advisory boards of Personalis, Inc. and Ancestry.com as well as the medical advisory board InVitae. None of this played a role in the design, execution, or interpretation of experiments and results presented here.
This article is a PNAS Direct Submission. C.F.A. is a guest editor invited by the Editorial Board.
Data deposition: The sequence reported in this paper has been deposited in the NCBI Sequence Read Archive (accession no. SRP036155).
3Deceased May 3, 2014.
See Commentary on page 809.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1510805112/-/DCSupplemental.
References
1. Morton NE, Crow JF, Muller HJ. An estimate of the mutational damage in man from data on consanguineous marriages. Proc Natl Acad Sci USA. 1956;42(11):855–863. doi: 10.1073/pnas.42.11.855.
2. Tabor HK, et al.; NHLBI Exome Sequencing Project. Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: Implications for the return of incidental results. Am J Hum Genet. 2014;95(2):183–193. doi: 10.1016/j.ajhg.2014.07.006.
3. Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973;246(5428):96–98. doi: 10.1038/246096a0.
4. Gravel S. When is selection effective? bioRxiv. 2014. doi: 10.1101/010934.
5. Henn BM, Botigué LR, Bustamante CD, Clark AG, Gravel S. Estimating the mutation load in human genomes. Nat Rev Genet. 2015;16(6):333–343. doi: 10.1038/nrg3931.
6. Lohmueller KE. The distribution of deleterious genetic variation in human populations. Curr Opin Genet Dev. 2014;29:139–146. doi: 10.1016/j.gde.2014.09.005.
7. Peischl S, Excoffier L. Expansion load: Recessive mutations and the role of standing genetic variation. Mol Ecol. 2015;24(9):2084–2094. doi: 10.1111/mec.13154.
8. Casals F, et al. Whole-exome sequencing reveals a rapid change in the frequency of rare functional variants in a founding population of humans. PLoS Genet. 2013;9(9):e1003815. doi: 10.1371/journal.pgen.1003815.
9. Kehdy FSG, et al.; Brazilian EPIGEN Project Consortium. Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations. Proc Natl Acad Sci USA. 2015;112(28):8696–8701. doi: 10.1073/pnas.1504447112.
10. Lohmueller KE, et al. Proportionally more deleterious genetic variation in European than in African populations. Nature. 2008;451(7181):994–997. doi: 10.1038/nature06611.
11. Fu W, Gittelman RM, Bamshad MJ, Akey JM. Characteristics of neutral and deleterious protein-coding variation among individuals and populations. Am J Hum Genet. 2014;95(4):421–436. doi: 10.1016/j.ajhg.2014.09.006.
12. Simons YB, Turchin MC, Pritchard JK, Sella G. The deleterious mutation load is insensitive to recent population history. Nat Genet. 2014;46(3):220–224. doi: 10.1038/ng.2896.
13. Do R, et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet. 2015;47(2):126–131. doi: 10.1038/ng.3186.
14. Fu W, et al.; NHLBI Exome Sequencing Project. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493(7431):216–220. doi: 10.1038/nature11690.
15. Tennessen JA, et al.; Broad GO; Seattle GO; NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337(6090):64–69. doi: 10.1126/science.1219240.
16. Boyko AR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4(5):e1000083. doi: 10.1371/journal.pgen.1000083.
17. Bittles AH, Black ML. Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc Natl Acad Sci USA. 2010;107(Suppl 1):1779–1786. doi: 10.1073/pnas.0906079106.
18. Slatkin M. A population-genetic test of founder effects and implications for Ashkenazi Jewish diseases. Am J Hum Genet. 2004;75(2):282–293. doi: 10.1086/423146.
19. Agrawal AF, Whitlock MC. Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics. 2011;187(2):553–566. doi: 10.1534/genetics.110.124560.
20. Cann HM, et al. A human genome diversity cell line panel. Science. 2002;296(5566):261–262. doi: 10.1126/science.296.5566.261b.
21. Henn BM, et al. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc Natl Acad Sci USA. 2011;108(13):5154–5162. doi: 10.1073/pnas.1017511108.
22. Henn BM, et al. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 2012;8(1):e1002397. doi: 10.1371/journal.pgen.1002397.
23. Wang S, et al. Genetic variation and population structure in native Americans. PLoS Genet. 2007;3(11):e185. doi: 10.1371/journal.pgen.0030185.
24. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: A discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 2013;93(2):278–288. doi: 10.1016/j.ajhg.2013.06.020.
25. Ramachandran S, et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA. 2005;102(44):15942–15947. doi: 10.1073/pnas.0507611102.
26. Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Curr Biol. 2005;15(5):R159–R160. doi: 10.1016/j.cub.2005.02.038.
27. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475(7357):493–496. doi: 10.1038/nature10231.
28. Meyer M, et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338(6104):222–226. doi: 10.1126/science.1224344.
29. Kidd JM, et al. Population genetic inference from personal genome data: Impact of ancestry and admixture on human genomic variation. Am J Hum Genet. 2012;91(4):660–671. doi: 10.1016/j.ajhg.2012.08.025.
30. Cooper GM, et al.; NISC Comparative Sequencing Program. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005;15(7):901–913. doi: 10.1101/gr.3577405.
31. Cooper GM, et al. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods. 2010;7(4):250–251. doi: 10.1038/nmeth0410-250.
32. Peischl S, Dupanloup I, Kirkpatrick M, Excoffier L. On the accumulation of deleterious mutations during range expansions. Mol Ecol. 2013;22(24):5972–5982. doi: 10.1111/mec.12524.
33. Goode DL, et al. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. 2010;20(3):301–310. doi: 10.1101/gr.102210.109.
34. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
35. Lohmueller KE, et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 2011;7(10):e1002326. doi: 10.1371/journal.pgen.1002326.
36. Kimura M, Maruyama T, Crow JF. The mutation load in small populations. Genetics. 1963;48:1303–1312. doi: 10.1093/genetics/48.10.1303.
37. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet. 2001;17(9):502–510. doi: 10.1016/s0168-9525(01)02410-6.
38. Gao Z, Waggoner D, Stephens M, Ober C, Przeworski M. An estimate of the average number of recessive lethal mutations carried by humans. Genetics. 2015;199(4):1243–1254. doi: 10.1534/genetics.114.173351.
39. Lachance J, Tishkoff SA. Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet. 2014;95(4):408–420. doi: 10.1016/j.ajhg.2014.09.008.
40. McQuillan R, et al.; ROHgen Consortium. Evidence of inbreeding depression on human height. PLoS Genet. 2012;8(7):e1002655. doi: 10.1371/journal.pgen.1002655.
41. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database issue):D514–D517. doi: 10.1093/nar/gki033.
42. Henn BM, et al. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS One. 2012;7(4):e34267. doi: 10.1371/journal.pone.0034267.
43. Mukai T, Chigusa SI, Mettler LE, Crow JF. Mutation rate and dominance of genes affecting viability in Drosophila melanogaster. Genetics. 1972;72(2):335–355. doi: 10.1093/genetics/72.2.335.
44. Houle D, Hughes KA, Assimacopoulos S, Charlesworth B. The effects of spontaneous mutation on quantitative traits. II. Dominance of mutations with effects on life-history traits. Genet Res. 1997;70(1):27–34. doi: 10.1017/s001667239700284x.
45. Racimo F, Schraiber JG. Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet. 2014;10(11):e1004697. doi: 10.1371/journal.pgen.1004697.
46. Henn BM, Cavalli-Sforza LL, Feldman MW. The great human expansion. Proc Natl Acad Sci USA. 2012;109(44):17758–17764. doi: 10.1073/pnas.1212380109.
47. Laval G, Patin E, Barreiro LB, Quintana-Murci L. Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS One. 2010;5(4):e10284. doi: 10.1371/journal.pone.0010284.
48. Marth GT, Czabarka E, Murvai J, Sherry ST. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics. 2004;166(1):351–372. doi: 10.1534/genetics.166.1.351.
49. Tanabe K, et al. Plasmodium falciparum accompanied the human expansion out of Africa. Curr Biol. 2010;20(14):1283–1289. doi: 10.1016/j.cub.2010.05.053.
50. Linz B, et al. An African origin for the intimate association between humans and Helicobacter pylori. Nature. 2007;445(7130):915–918. doi: 10.1038/nature05562.
51. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20(1):110–121. doi: 10.1101/gr.097857.109.
52. Keinan A, Clark AG. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science. 2012;336(6082):740–743. doi: 10.1126/science.1217283.
53. Lohmueller KE. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 2014;10(5):e1004379. doi: 10.1371/journal.pgen.1004379.
54. Pennings PS, Kryazhimskiy S, Wakeley J. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 2014;10(1):e1004000. doi: 10.1371/journal.pgen.1004000.
55. Erickson RP, Mitchison NA. The low frequency of recessive disease: Insights from ENU mutagenesis, severity of disease phenotype, GWAS associations, and demography: An analytical review. J Appl Genet. 2014;55(3):319–327. doi: 10.1007/s13353-014-0203-3.
56. Manna F, Martin G, Lenormand T. Fitness landscapes: An alternative theory for the dominance of mutation. Genetics. 2011;189(3):923–937. doi: 10.1534/genetics.111.132944.
57. Sabeti PC, et al.; International HapMap Consortium. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449(7164):913–918. doi: 10.1038/nature06250.
58. Hernandez RD, et al.; 1000 Genomes Project. Classic selective sweeps were rare in recent human evolution. Science. 2011;331(6019):920–924. doi: 10.1126/science.1198878.
59. Granka JM, et al. Limited evidence for classic selective sweeps in African populations. Genetics. 2012;192(3):1049–1064. doi: 10.1534/genetics.112.144071.
60. Enard D, Messer PW, Petrov DA. Genome-wide signals of positive selection in human evolution. Genome Res. 2014;24(6):885–895. doi: 10.1101/gr.164822.113.
61. Grossman SR, et al.; 1000 Genomes Project. Identifying recent adaptations in large-scale genomic data. Cell. 2013;152(4):703–713. doi: 10.1016/j.cell.2013.01.035.
62. Yi X, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329(5987):75–78. doi: 10.1126/science.1190371.
63. Pickrell JK, et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009;19(5):826–837. doi: 10.1101/gr.087577.108.
64. Scheinfeldt LB, Tishkoff SA. Recent human adaptation: Genomic approaches, interpretation and insights. Nat Rev Genet. 2013;14(10):692–702. doi: 10.1038/nrg3604.
65. Coop G, et al. The role of geography in human adaptation. PLoS Genet. 2009;5(6):e1000500. doi: 10.1371/journal.pgen.1000500.
66. Sousa V, Peischl S, Excoffier L. Impact of range expansions on current human genomic diversity. Curr Opin Genet Dev. 2014;29:22–30. doi: 10.1016/j.gde.2014.07.007.
67. Flicek P, et al. Ensembl 2013. Nucleic Acids Res. 2013;41(Database issue):D48–D55. doi: 10.1093/nar/gks1236.