eLife. 2018 Aug 23;7:e36317. doi: 10.7554/eLife.36317

Figure 1. Average derived allele frequency per individual (DAFi¯) as a function of recombination rate.

1000G SNPs were ranked by their local recombination rate and divided into 20 bins of equal size. DAFi¯ was computed for each individual as the number of heterozygous sites plus two times the number of derived homozygous sites, and averaged per geographic region. (A) DAFi¯ vs. recombination rate on a log10 scale for all 17,129,351 1000G SNPs. (B) Same as panel A for SNPs in transcribed regions (TR), non-transcribed regions (NTR), or non-transcribed regions more than 50 kb away from TR (NTR-50kb). (C) Same as panel A for SNPs affected differently by GC-biased gene conversion (gBGC). Left: WS sites, where the derived allele is favored by gBGC. Center: SW sites, where the ancestral allele is favored by gBGC. Right: WW and SS sites, which are not affected by gBGC. The vertical dashed lines at 1.5 cM/Mb mark an approximate threshold above which BGS has no effect on WW and SS sites, but where gBGC has a strong and opposite effect on WS and SW sites. Each group (AFR: Africans, EUR: Europeans, EAS: East Asians, SAS: South Asians, AMR: Admixed Americans) includes individuals from two populations (see Supplementary file 1 - Table S1). Shaded areas delimit the 95% confidence interval of each group, estimated using a block-bootstrap approach (see Materials and methods).
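The binning and counting described in this legend can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the 0/1/2 genotype coding, and the division by twice the number of sites (to turn the derived allele count into a frequency) are assumptions made for the example.

```python
from typing import List, Sequence


def equal_size_bins(rates: Sequence[float], n_bins: int = 20) -> List[List[int]]:
    """Rank SNP indices by recombination rate and split them into n_bins
    groups whose sizes differ by at most one SNP."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    bounds = [round(k * len(order) / n_bins) for k in range(n_bins + 1)]
    return [order[bounds[k]:bounds[k + 1]] for k in range(n_bins)]


def daf_per_bin(genotypes: Sequence[int], rates: Sequence[float],
                n_bins: int = 20) -> List[float]:
    """genotypes[j] is the number of derived alleles (0, 1 or 2) carried by
    one individual at SNP j, so summing the codes gives
    (heterozygous sites) + 2 * (derived homozygous sites).
    Assumption: the count is normalized by 2 * (number of sites in the bin).
    Requires at least n_bins SNPs so that no bin is empty."""
    result = []
    for idx in equal_size_bins(rates, n_bins):
        derived = sum(genotypes[j] for j in idx)
        result.append(derived / (2 * len(idx)))
    return result
```

Averaging the per-individual values within a geographic region would then give one curve per group, with one point per recombination bin.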

Figure 1—figure supplement 1. Genealogy of a sample of two diploid individuals at an arbitrary locus.

The genealogy illustrates that the expected number of derived mutations, counted as the number of heterozygous sites plus two times the number of derived homozygous sites, is the same for the two individuals, assuming a constant mutation rate over the whole genealogy. Mutations accumulating along a single line of a given color appear as heterozygous sites in the corresponding individual, whereas mutations occurring along a double line of the same color appear as homozygous sites. Since the orange path has the same length as the blue path, the expected number of mutations accumulated since the global MRCA of the sample is identical in the two individuals. This argument generalizes to an arbitrary number of individuals, irrespective of whether they belong to the same population, and therefore covers the case where populations have different demographic histories. The same argument holds for any locus and can thus be extended to a collection of loci with similar mutation rates. Differences in the number of derived alleles between individuals from different populations should therefore reflect differences in mutation rate, selection intensity, or generation time between populations.
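The counting argument can be written compactly. The notation is introduced here for illustration and does not come from the legend: H_i and D_i are the numbers of heterozygous and derived homozygous sites in individual i, m_{i,c} is the number of derived mutations carried by chromosome c of that individual, \mu is the mutation rate per generation at the locus, and T_MRCA is the time in generations back to the MRCA of the sample. The sketch assumes that each mutation hits a new site (no recurrent or back mutation).

```latex
n_i \;=\; H_i + 2\,D_i \;=\; m_{i,1} + m_{i,2},
\qquad
\mathbb{E}\!\left[m_{i,c}\right] \;=\; \mu\, T_{\mathrm{MRCA}} \quad (c = 1, 2),
\qquad\Longrightarrow\qquad
\mathbb{E}\!\left[n_i\right] \;=\; 2\,\mu\, T_{\mathrm{MRCA}} .
```

The right-hand side does not depend on i, because every sampled chromosome is separated from the MRCA by a path of the same length, whatever the shape of the genealogy below it.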
Figure 1—figure supplement 2. Individual number of derived alleles (ni) for 1000G individuals.

Each panel compares the distributions of ni among five geographic regions, with 20 individuals per region. The distribution of ni is broadly similar among geographic regions for all categories of SNPs. (A) ni computed on 17,129,351 SNPs. Asians (East Asians, EAS, and South Asians, SAS) have a significantly lower mean ni than the other three regions (Tukey test, p<0.01). (B) ni computed on synonymous sites. SAS have a lower mean ni than Africans (AFR) (Tukey test, p=0.0004). (C) ni computed on non-synonymous sites. SAS have a lower mean ni than AFR (Tukey test, p=0.009). (D) ni computed on a set of sites assumed to be evolving neutrally (see text and Figure 1). SAS have a significantly lower mean ni than non-Asians (Tukey test, p=0.0000006). This difference remains in transcribed regions (E), where SAS have a lower mean ni than AFR (p=0.00005) and Europeans (EUR, p=0.002), but disappears in neutrally evolving non-transcribed regions (F), which suggests that background selection could still be acting in transcribed regions with relatively high recombination rates in South Asian populations.
Figure 1—figure supplement 3. The increase of DAFi¯ with recombination rate is robust to the choice of the recombination map.

Panels are analogous to Figure 1A, but based on different recombination maps. (A) LD-based HapMap recombination map derived from a European population (CEU, log-linear regression). (B) LD-based HapMap recombination map derived from a Japanese population (JPT). (C) Pedigree-based sex-averaged recombination map from deCODE (Kong et al., 2010; Wegmann et al., 2011). DAFi¯ increases almost linearly with the logarithm of the recombination rate except for the lowest recombination class (<0.1 cM/Mb for all three maps), which could be due to the difficulty of estimating very low recombination rates with both LD-based and pedigree-based methods. Both methods are likely to miss rare recombination events, such that the true recombination rate around some SNPs of the lowest recombination class could be higher than reported. Shaded areas represent the 95% confidence interval of each group of individuals, estimated using a block-bootstrap approach (see Materials and methods).
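The shaded confidence intervals in these figures come from a block bootstrap whose details are given in Materials and methods. As a rough sketch of the general idea only, the example below resamples genomic blocks with replacement and takes a percentile interval. The block definition, the number of replicates, the percentile method, and the function name are all assumptions of this example, not the authors' settings.

```python
import random
from typing import List, Sequence, Tuple


def block_bootstrap_ci(blocks: Sequence[Tuple[int, int]], n_boot: int = 1000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """blocks holds one (derived_allele_count, n_sites) pair per genomic block.
    Whole blocks are resampled with replacement so that linked SNPs stay
    together, and the statistic is recomputed for each replicate."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(blocks, k=len(blocks))
        derived = sum(d for d, _ in sample)
        sites = sum(s for _, s in sample)
        stats.append(derived / (2 * sites))
    stats.sort()
    # Percentile interval (assumed here; other bootstrap intervals exist).
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling blocks rather than individual SNPs is what keeps the interval from being too narrow when neighboring sites are correlated.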
Figure 1—figure supplement 4. Same figure and legend as Figure 1, but based on 20 SGDP individuals sequenced at high coverage (Mallick et al., 2016).

Note that DAFi¯ is larger than for 1000G (Figure 1). This is because this data set has fewer rare alleles, as polymorphic sites are identified among only 20 individuals, compared to 100 individuals for the 1000G data set. Shaded areas represent the 95% confidence interval of each group of individuals, estimated using a block-bootstrap approach (see Materials and methods).
Figure 1—figure supplement 5. DAFi¯ as a function of various genomic predictors for the 1000G dataset.

(A) DAFi¯ as a function of recombination rate for random individuals from three different populations. The same filtering criteria were used as in Figure 1A. (B) DAFi¯ as a function of the recombination distance to the nearest exon. Non-exonic SNPs in our 1000G dataset were classified into 20 equal-size bins according to their distance in centiMorgans to the nearest exonic region. Individuals are grouped by geographic region as in Figure 1A. The vertical dashed line denotes a distance of 0.01 cM. (C–E) DAFi¯ as a function of recombination rate for mutations associated with different GERP RS conservation scores. (C) SNPs considered as neutral (–2≤GERP RS<2). (D) SNPs with slightly deleterious mutations (2≤GERP RS<4). (E) SNPs with deleterious mutations (GERP RS >4). (F) Relationship between the average B-statistic and DAFi¯ for SNPs for which an estimate of B is available. Both statistics are computed in 20 bins defined as in Figure 1A. The solid lines link the 20 bins from the lowest to the highest recombination rate, and the black arrow indicates the bin of SNPs with the lowest average recombination rate. In all panels, shaded areas represent the 95% confidence interval estimated using a block-bootstrap approach (see Materials and methods).
Figure 1—figure supplement 6. DAFi¯ in 1000G populations as a function of recombination rate for various sites.

(A) DAFi¯ in transcribed regions (TR, red) and in non-transcribed regions (NTR, blue) are compared to DAFi¯ computed over all sites (all, black). DAFi¯ converges to the same values for both TR and NTR at high recombination rates, where BGS does not act. (B) DAFi¯ for TR and NTR regions increases with recombination distance to exons. In the interval between ~0.001 and ~0.1 cM, where SNPs can be in either TR or NTR regions, DAFi¯ values are similar for TR and NTR. (C) Comparison between DAFi¯ computed for three gBGC categories of sites. In regions of high recombination rate, S alleles are favored by gBGC, such that DAFi¯ increases with recombination for WS sites and decreases for SW sites. As expected, DAFi¯ values diverge between gBGC categories in genomic regions with high recombination rates.
Figure 1—figure supplement 7. DAFi¯ as a function of recombination rate for different mutation types in 1000G individuals.

The figure is a decomposition of Figure 1C for all possible substitutions. (A) WS mutations. (B) SW mutations. (C) WW and SS mutations. Note that WW and SS sites should only be affected by BGS and not by gBGC. Dashed vertical lines show the approximate limit of the influence of BGS (1.5 cM/Mb), above which DAFi¯ reaches a plateau for sites unaffected by gBGC. Shaded areas correspond to 95% CI obtained by a block-bootstrap approach (see Materials and methods).
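The WS, SW, WW and SS categories used in these panels follow the usual convention that A and T are "weak" (W) bases and G and C are "strong" (S) bases, with the first letter giving the ancestral allele and the second the derived allele. A minimal sketch of that classification is shown below. The function name and the error handling are choices made for this example.

```python
WEAK = {"A", "T"}    # W: bases forming two hydrogen bonds
STRONG = {"G", "C"}  # S: bases forming three hydrogen bonds


def gbgc_class(ancestral: str, derived: str) -> str:
    """Return 'WS', 'SW', 'WW' or 'SS' for an ancestral -> derived change."""
    a, d = ancestral.upper(), derived.upper()
    valid = WEAK | STRONG
    if a not in valid or d not in valid or a == d:
        raise ValueError(f"not a valid substitution: {ancestral!r} -> {derived!r}")
    if a in WEAK and d in STRONG:
        return "WS"  # derived allele favored by gBGC
    if a in STRONG and d in WEAK:
        return "SW"  # ancestral allele favored by gBGC
    return "WW" if a in WEAK else "SS"  # unaffected by gBGC


# The 12 possible substitutions grouped by category.
BASES = "ACGT"
BY_CLASS = {}
for anc in BASES:
    for der in BASES:
        if anc != der:
            BY_CLASS.setdefault(gbgc_class(anc, der), []).append(anc + ">" + der)
```

With this grouping, four substitutions fall in WS, four in SW, and two each in WW and SS.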
Figure 1—figure supplement 8. Influence of recombination on diversity (B-statistic).

Median B-statistics from McVicker et al. (2009) are reported as a function of recombination rate (solid black line). The observed data are fitted either under the simple background selection (BGS) model of Hudson and Kaplan (1995), with $B = \exp(-u_d/r)$, where $u_d$ is the genome-wide constant deleterious mutation rate and $r$ is the recombination rate in cM/Mb (red dashed line), or by allowing a log-log linear dependence of the deleterious mutation rate on the recombination rate, $u_d = u_0 r^b$ (solid red line). Allowing for a correlation between mutation and recombination considerably improves the fit to the B-statistics and thus better explains the signature of BGS in the human genome. The vertical black dashed line indicates the recombination rate of 1.5 cM/Mb that we used as an approximate threshold to define our set of neutrally evolving SNPs in the remainder of the study. Note that this threshold also marks the beginning of a plateau for the observed B-statistics.
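To make the two fitted curves concrete, the sketch below evaluates the model and recovers its parameters from (r, B) pairs. It assumes the exponent is negative, B = exp(-u_d/r), so that B stays between 0 and 1, and it fits the power-law variant by ordinary least squares on the linearized form ln(-ln B) = ln(u0) + (b - 1) ln(r). Both the fitting procedure and the function names are assumptions of this example. The legend does not say how the authors performed the fit.

```python
import math
from typing import Sequence, Tuple


def bgs_b(r: float, u0: float, b: float = 0.0) -> float:
    """B under the BGS model with u_d = u0 * r**b; b = 0 gives a constant u_d."""
    return math.exp(-u0 * r ** b / r)


def fit_power_law(r: Sequence[float], B: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(-ln B) = ln(u0) + (b - 1) * ln(r).
    Requires r > 0 and 0 < B < 1 for every point. Returns (u0, b)."""
    x = [math.log(ri) for ri in r]
    y = [math.log(-math.log(Bi)) for Bi in B]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope + 1.0
```

In this parameterization, b = 0 recovers the constant-rate model, and any positive b means that the deleterious mutation rate rises with recombination, which flattens the dependence of B on r.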
Figure 1—figure supplement 9. DAFi¯ of WW + SS sites with RR ≥1.5 cM/Mb from the 1000G data set as a function of various covariates.

(A) Recombination rate, (B) B-statistic, (C) recombination distance to hotspots, defined as regions with RR >10 cM/Mb, (D) distance to PhastCons conserved elements, and (E) distance to exons. DAFi¯ is positively correlated with the distance to conserved elements and, to a smaller extent, with B and the distance to exons, whereas it is negatively correlated with the distance to hotspots, which suggests that BGS continues to play a marginal role on DAFi¯. SNPs were divided into 20 bins of equal size. Shaded areas correspond to 95% CI obtained by a block-bootstrap approach (see Materials and methods).
Figure 1—figure supplement 10. Genomic distribution of SNPs.
