Genetics. 2021 Jun 3;218(3):iyab083. doi: 10.1093/genetics/iyab083

Population dynamics of GC-changing mutations in humans and great apes

Juraj Bergman, Mikkel Heide Schierup
Editor: S Wright
PMCID: PMC9335939  PMID: 34081117

Bergman and Schierup analyze fixation differences between different types of mutations that change the GC content of the genome. The authors focus on the role of GC-biased gene conversion, directly inferring the effect of recombination rate on the fixation probability of different mutation types. They find differences between the fixation dynamics of transitions and transversions across different great ape species, human populations, sexes, and nucleotide contexts.

Keywords: nucleotide composition evolution, meiotic recombination, biased gene conversion

Abstract

The nucleotide composition of the genome is a balance between the origin and fixation rates of different mutations. For example, it is well-known that transitions occur more frequently than transversions, particularly at CpG sites. Differences in fixation rates of mutation types are less explored. Specifically, recombination-associated GC-biased gene conversion (gBGC) may differentially impact GC-changing mutations, due to differences in their genomic distributions and efficiency of mismatch repair mechanisms. Given that recombination evolves rapidly across species, we explore gBGC of different mutation types across human populations and great ape species. We report a stronger correlation between segregating GC frequency and recombination for transitions than for transversions. Notably, CpG transitions are most strongly affected by gBGC in humans and chimpanzees. We show that the overall strength of gBGC is generally correlated with effective population sizes in humans, with some notable exceptions, such as a stronger effect of gBGC on non-CpG transitions in populations of European descent. Furthermore, species of the Gorilla and Pongo genus have a greatly reduced gBGC effect on CpG sites. We also study the dependence of gBGC dynamics on flanking nucleotides and show that some mutation types evolve in opposition to the gBGC expectation, likely due to the hypermutability of specific nucleotide contexts. Our results highlight the importance of different gBGC dynamics experienced by GC-changing mutations and their impact on nucleotide composition evolution.

Introduction

The nucleotide composition of the genome evolves under a composite of evolutionary forces, often acting antagonistically. Mutations that are introduced into the genome are generally AT-biased, i.e., the mutation rate from strong (GC) to weak (AT) nucleotides is higher than in the opposite direction. However, the proportion of fixed sites in the genome is GC-biased compared to the expectation based on mutation rate differences. This observation is usually ascribed to a biased, and mechanistically unresolved, recombination-associated repair process, termed GC-biased gene conversion (gBGC; Holmquist 1992; Eyre-Walker 1993; Galtier et al. 2001). Ample evidence from mammals—mainly the repeatedly observed positive correlation between recombination rate and GC content (Fullerton et al. 2001; Galtier 2003; Montoya-Burgos et al. 2003; Meunier and Duret 2004; Romiguier et al. 2010; Pessia et al. 2012; Tortereau et al. 2012; Capra et al. 2013; Lachance and Tishkoff 2014)—points to gBGC as a major determinant of nucleotide composition at the genomic level, and results in increased disease risk (Capra et al. 2013; Lachance and Tishkoff 2014).

The proportions of different mutation types, i.e., the mutational spectrum, are known to vary with the surrounding nucleotide context, as well as differ at the species and population levels in humans and great apes (Harris 2015; Harris and Pritchard 2017; Mathieson and Reich 2017; Carlson et al. 2018). However, the fixation biases of different mutation types are not nearly as well-characterized, even at the basic level of transition and transversion differences. With this study, we aim to provide a broader perspective on mutation fixation biases, particularly with respect to potential differences in gBGC dynamics experienced by different GC-changing mutations. We explore multiple possible sources of these differences, including mutation-specific repair mechanisms, differences in genomic distributions of mutations, surrounding genomic context as well as differences in the effective sizes and recombination rates between sexes, populations, and species.

During meiotic recombination, segregating variants can come into contact in recombination heteroduplexes that consist of homologous pairings of DNA strands between chromosomes of different parental origins. Homology search, as well as mismatch repair required for proper heteroduplex formation, share much of the molecular machinery of postreplicative mismatch repair (MMR; Jiricny 2006; Spies and Fishel 2015; Chakraborty and Alani 2016). However, while MMR that occurs immediately after replication utilizes epigenetic marks to distinguish between the template and nascent DNA strands and restore the mismatched basepair, the repair of mismatches in recombination heteroduplexes is context-dependent. Specifically, heteroduplex mismatches underlain by GC-changing mutations are more often resolved into G:C than A:T pairs, providing the basis for gBGC. GC-changing mutations that precede gBGC are either G(C) ↔ A(T) transitions or G(C) ↔ T(A) GC-changing transversions. Therefore, a heteroduplex mismatch that forms during meiotic recombination will be of the G:T or C:A type if a site segregates for a transition, or of the G:A or C:T type for transversion-segregating sites (Supplementary Figure S1). Molecular studies of Escherichia coli DNA repair efficiency indicate that different mismatch types are successfully resolved at different rates: generally, G:T and C:A mismatches are better substrates for the repair machinery than G:A or C:T mismatches (Kramer et al. 1984; Dohet et al. 1985; Su et al. 1988). Studies on human cell cultures yielded similar results (Holmes et al. 1990; Thomas et al. 1991). Additionally, biased repair of heteroduplex mismatches into G:C, rather than A:T pairs, has been reported in various mammalian systems, especially for transition-associated mismatches (Brown and Jiricny 1988; Wiebauer and Jiricny 1989, 1990; Bill et al. 1998). Together, these results describe molecular underpinnings that may manifest as different gBGC dynamics of transitions and transversions at the population genomic level.

The genomic distribution of different mutations can further modify the gBGC effect experienced by different mutation types. Specifically, as the transition/transversion ratio of new mutations varies along the genome (Seplyarskiy et al. 2012), the proportion of transitions in high (or low) recombination regions can be different from the same proportion for transversions. As gBGC is driven by recombination rate, this inequality could result in different average gBGC strength experienced by the two mutation types. Most notably, the particularly high proportion of CpG transitions in GC-rich regions with high-recombination rates (Savatier et al. 1985; Selker and Stevens 1985) is likely to experience the strongest gBGC dynamics compared to other mutation types.

The relationship between recombination rates and gBGC strength is known to affect fixation patterns and divergence rates in different human populations and chimpanzees (Capra et al. 2013; Lachance and Tishkoff 2014; Dutta et al. 2018). Additionally, sex-specificity of recombination maps has been observed to affect gBGC patterns, via a stronger association to male recombination (Kostka et al. 2012). This observation holds especially for telomeric regions of chromosomes, which are known to be enriched for male recombination events (Halldorsson et al. 2019).

The strength of gBGC is usually quantified as $B = N_e b$, where $N_e$ is the effective population size and $b$ is the conversion bias parameter defined by Nagylaki (1983). Given this parametrization, nucleotide diversity is affected in a manner indistinguishable from directional selection toward GC content. The B parameter can be calculated from the skew of the allele frequency spectrum (AFS), a categorical representation of nucleotide diversity within a sample of individuals from a particular population. AFS-based methods have been recently applied to both human (Glémin et al. 2015) and great ape datasets (Borges et al. 2019). Glémin et al. (2015) considered a model where the ancestral state of the polymorphisms is assumed to be known (i.e., polarized data), making it possible to contrast the AFS of derived weak (W = AT) to strong (S = GC) mutations (W→S) against the S→W AFS. On the other hand, Borges et al. (2019) used unpolarized data and considered AFSs of all six possible nucleotide pairs to estimate GC bias. Both studies yielded gBGC estimates of B < 1, as did a previous study based on substitution rates (Lachance and Tishkoff 2014). Even though these estimates are in the nearly neutral range of fixation coefficients (Ohta 1979; Ohta and Gillespie 1996), they nonetheless have a profound effect on the evolution of nucleotide content as millions of sites across the genome are potentially affected by gBGC.

Here, we use human (1000 Genomes Project 2015; Jónsson et al. 2017) and great ape population genomic data sets (Prado-Martinez et al. 2013; Xue et al. 2015; de Manuel et al. 2016; Nater et al. 2017) to assess gBGC dynamics of the three most common mutation types: CpG and non-CpG transitions, and GC-changing transversions. We relate patterns of nucleotide variation to population-specific recombination maps that have recently been derived for humans (Halldorsson et al. 2019; Spence and Song 2019), or that we reconstruct de novo within this study for eight subspecies of great apes. We find a stronger correlation between the recombination rate and segregating GC frequency for transitions compared to GC-changing transversions. We estimate the GC-fixation bias and observe that CpG transitions have especially large B values compared to the other mutation types, likely due to the strong correlation between their genomic distribution and recombination rate. The observed patterns are largely congruent with differences in historical effective sizes across populations and species.

Materials and methods

Data

The single nucleotide polymorphism (SNP) data for the Yoruba (YRI), Toscana (TSI), and Han Chinese (CHB) populations were taken from the 1000 Genomes Project (2015) and the corresponding recombination maps from Spence and Song (2019). The SNP data for the Icelandic population (ISL) were taken from Jónsson et al. (2017) and the recombination maps from Halldorsson et al. (2019).

The great ape dataset was curated from four different sequencing studies (Prado-Martinez et al. 2013; Xue et al. 2015; de Manuel et al. 2016; Nater et al. 2017), consisting of samples from five subspecies of the Pan genus: Pan troglodytes ellioti (PTE; N = 10), Pan troglodytes schweinfurthii (PTS; N = 19), Pan troglodytes troglodytes (PTT; N = 18), Pan troglodytes verus (PTV; N = 11), and Pan paniscus (PPA; N = 13); one subspecies of the Gorilla genus: Gorilla gorilla gorilla (GGG; N = 23); and two subspecies of the Pongo genus: Pongo abelii (PAB; N = 11) and Pongo pygmaeus (PPY; N = 15). We conducted a de novo read mapping and variant calling procedure for all eight great ape populations. The reference genomes used for mapping short reads were Clint PTRv2 for chimpanzee (University of Washington; January 2018), gorGor4 for gorilla (Wellcome Trust Sanger Institute; October 2015), and Susie PABv2 for orangutan (University of Washington, 2018); https://www.ncbi.nlm.nih.gov/. The short read data were mapped using the bwa mapper, version 0.7.5a (Li and Durbin 2009). Filtering and merging of bam files were done using sambamba v0.5.1 (Tarasov et al. 2015). We retained properly paired reads with mapping quality MQ ≥ 50 and fewer than 2 mismatches to the reference and filtered out duplicated, unmapped, and secondarily aligned reads. The variants were called separately for each individual sample using HaplotypeCaller in -ERC GVCF mode of gatk v4.0 (McKenna et al. 2010). Genotypes were called jointly for the respective subspecies populations using gatk GenotypeGVCFs. The called SNPs were filtered according to gatk guidelines for hard filtering (-filter "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MappingQualityRankSum < -12.5 || ReadPosRankSum < -8.0"). Furthermore, vcftools v0.1.16 (Danecek et al. 2011) was used to filter out sites where more than 20% of individuals had missing data and/or sites that failed the Hardy–Weinberg equilibrium test (P < 0.001). We then filtered out SNPs covered by fewer than 1.5 × n reads (where n is the total ploidy of the region, thus ensuring that, on average, SNPs are covered by at least 1.5 reads per haploid genome) or more than 2 × n × cov reads (where cov is the mean coverage per haploid genome).
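The depth criterion above can be written as a small predicate. This is only a sketch of the stated thresholds: the function name, its arguments, and the default of two haploid genomes per individual are illustrative assumptions, not part of the authors' pipeline.

```python
def passes_depth_filter(total_depth, n_individuals, mean_cov_per_haploid, ploidy=2):
    """Return True if a SNP's total read depth lies within the retained range.

    Hypothetical helper: n is the total ploidy of the region
    (ploidy * number of individuals), as described in the text.
    """
    n = ploidy * n_individuals
    lower = 1.5 * n                        # at least 1.5 reads per haploid genome on average
    upper = 2 * n * mean_cov_per_haploid   # at most twice the expected total depth
    return lower <= total_depth <= upper
```

For example, with 10 diploid individuals and a mean coverage of 10 reads per haploid genome, sites with a total depth between 30 and 400 reads would be kept.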

For both human and great ape datasets, we excluded sites masked by RepeatMasker (Smit et al. 2004) in the respective reference sequences.

Estimation of GC frequency and statistical tests

The GC frequency of a segregating GC-changing mutation is calculated for each bi-allelic site as the proportion of haploid genomes, in a respective population sample, that have a G(C) as opposed to an A(T) at a transition-segregating site, or a G(C) vs a T(A) at a transversion segregating site. For human datasets, we filtered out singleton sites and sites segregating at frequency <0.5%. For the regression analyses, we considered nonoverlapping 1 Mb autosomal windows with at least 1000 segregating sites and an average recombination rate of >0 cM/Mb and <5 cM/Mb. All regression analyses were done using the lm function, as implemented in R (R Core Team 2021). We used the Levene test to assess whether the variance of the GC frequency distribution is equal between different mutation types, i.e., to detect heteroscedasticity of variances (Levene 1960). This test was performed using the levene.test function implemented in the “car” R library.
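As a minimal illustration of the GC frequency definition above, the following sketch computes the statistic for a single bi-allelic site from haploid allele counts. The function name and the dictionary input format are assumptions made for this example.

```python
STRONG = {"G", "C"}  # strong (S) nucleotides; A and T are weak (W)

def gc_frequency(allele_counts):
    """GC frequency at a bi-allelic GC-changing site.

    allele_counts maps each of the two segregating alleles to the number
    of haploid genomes in the sample that carry it.
    """
    if len(allele_counts) != 2:
        raise ValueError("site must be bi-allelic")
    strong = [a for a in allele_counts if a in STRONG]
    if len(strong) != 1:
        raise ValueError("site must segregate one G/C allele and one A/T allele")
    total = sum(allele_counts.values())
    return allele_counts[strong[0]] / total
```

A transition site with 30 G and 70 A haploid genomes has a GC frequency of 0.3 under this definition; GC-conservative sites such as A/T are rejected.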

Estimation of the B parameter

We applied the Glémin et al. (2015) B inference method to the YRI, TSI, and CHB populations, as the ancestral states of sites for these populations are provided in the vcf files of the 1000 Genomes Project (2015). Using this information, we construct three types of polarized AFSs for each population and chromosome, required for input into the B estimation program: AT→GC, GC→AT, and neutral (GC-conservative) derived AFSs. Additionally, the program requires GC content (pGC) as input, which was calculated as described in Glémin et al. (2015), where for each SNP, we take the percentage of GC bases within a 100-bp window surrounding the SNP. The 50 basepairs upstream and downstream of the focal SNP comprising the window were taken from the human reference sequence. The average pGC is then calculated across all SNPs comprising the AFS. To generate B distributions in Supplementary Figure S5A, we resample the AT→GC and GC→AT AFSs of a focal mutation type, such that we vary the sizes (i.e., the sum of all sites) for each of these focal AFSs. The probabilities of sampling a site in a specific polymorphism category are equal to the proportion of sites in this category in the focal AFS. For a focal mutation type, the resampled sizes of the AFSs match the AFS sizes of an alternative mutation type originating from the same chromosome. We then apply the Glémin et al. (2015) method to the resampled AFSs. This was done for all six possible combinations of focal and alternative AFSs. We similarly vary the pGC parameter between mutation types to generate distributions in Supplementary Figure S5B.
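The resampling step can be sketched as a multinomial draw in which category probabilities follow the focal AFS and the total number of sites is set to the size of the alternative AFS. The function name, the list representation of an AFS, and the use of Python's standard random module are assumptions for illustration.

```python
import random
from collections import Counter

def resample_afs(focal_afs, target_size, seed=None):
    """Resample an AFS to a new total size.

    focal_afs   : list of site counts per frequency category
    target_size : total number of sites in the resampled AFS
                  (e.g., the size of the alternative mutation type's AFS)
    Sampling probabilities equal the category proportions of focal_afs.
    """
    rng = random.Random(seed)
    categories = list(range(len(focal_afs)))
    draws = rng.choices(categories, weights=focal_afs, k=target_size)
    counts = Counter(draws)
    return [counts.get(c, 0) for c in categories]
```

The resampled spectrum keeps the expected shape of the focal AFS while taking on the size of the alternative one, so that differences in B attributable to AFS size alone can be examined.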

For B inference using unpolarized AFSs whose categories are based on the segregating GC frequency of sites, we follow Vogl and Bergman (2015) and consider a Moran model of evolution for a bi-allelic locus under the influence of mutation ($\theta = N\mu$) and fixation bias toward the preferred allele ($B = N_e b$). In this application, B quantifies the preferred fixation of GC over AT alleles. Under mutation-fixation-drift equilibrium, and assuming low mutation rates and a high sample size, the frequency x at which the preferred allele segregates in the population is:

$$\Pr(x \mid \vartheta, B) = \vartheta\, e^{Bx}\, x^{-1} (1-x)^{-1}, \tag{1}$$

where $\vartheta = (\alpha\beta\theta)/(\alpha e^{B} + \beta)$ ($\alpha$ is a parameter ranging from 0 to 1 that quantifies the mutation bias toward the preferred allele; $\beta = 1 - \alpha$ is the mutation bias toward the unpreferred allele). Assuming binomial sampling, the probability that an SNP is present at frequency y in a sample of M haploid sequences is:

$$\Pr(y \mid \vartheta, B, M) = \lim_{M \to \infty} \vartheta \int_0^1 \binom{M}{y} e^{Bx}\, x^{y-1} (1-x)^{M-y-1}\, dx = \vartheta\, e^{By/M}\, \frac{M}{y(M-y)} \quad \text{for } 1 < y < (M-1). \tag{2}$$

This formula is a good approximation if αβθ<0.01 (Vogl and Clemente 2012) (well within the range for mammals) and a sample of size M >20. Furthermore, we can treat y as a discrete frequency category or as a category encompassing a specific frequency range.
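The neutral part of the sampling formula can be checked numerically: for B = 0 the integral of the binomial term against x^(-1)(1-x)^(-1) equals a binomial coefficient times a beta function, which simplifies to M/(y(M-y)). The sketch below verifies this identity and evaluates the large-M approximation with the exp(By/M) factor. Function names are illustrative, and the scaling constant is set to 1 by assumption.

```python
from math import comb, exp, lgamma

def beta_fn(a, b):
    """Euler beta function computed through log-gamma for numerical stability."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def neutral_weight_integral(M, y):
    """Integral of C(M, y) x^(y-1) (1-x)^(M-y-1) over [0, 1], i.e. the B = 0 case."""
    return comb(M, y) * beta_fn(y, M - y)

def neutral_weight_closed(M, y):
    """Closed form of the same integral: M / (y (M - y))."""
    return M / (y * (M - y))

def expected_afs(B, M, theta_scaled=1.0):
    """Unnormalized expected AFS for sample frequencies y = 1..M-1 (large-M approximation)."""
    return [theta_scaled * exp(B * y / M) * neutral_weight_closed(M, y)
            for y in range(1, M)]
```

A positive B raises the weight of high-frequency categories relative to the neutral spectrum, which is the skew that the inference exploits.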

To account for the potential effect of demography and population structure on $\Pr(y \mid \vartheta, B, M)$, we introduce nuisance parameters $r_y$ that are obtained by comparing the mutation-drift equilibrium expectation of the AFS with the empirical AFS of putatively neutral sites. In our case, we use the AFS from GC-conservative sites, i.e., sites segregating for A(T):T(A) or G(C):C(G) polymorphisms, as a proxy for putatively neutral sites and define:

$$r_y = \left( \frac{L_{y,n}}{\sum_{y=1}^{C} L_{y,n}} \right) \Bigg/ \left( \frac{y^{-1}(M-y)^{-1}}{\sum_{y=1}^{C} y^{-1}(M-y)^{-1}} \right), \tag{3}$$

where $C$ is the number of frequency categories of the AFS and $L_{y,n}$ is the count of GC-conservative sites in frequency category $y$ of the neutral AFS. As these sites are unaffected by GC-fixation bias, the choice of assigning them into the $y$th or $(M-y)$th frequency category of the AFS is random. We then normalize (2) with the sum across $y$ categories and express the likelihood of the non-neutral AFS (i.e., the AFS of GC-changing mutations) as:

$$\Pr(L_1, \ldots, L_C \mid B, C) = \prod_{y=1}^{C} \left( \frac{r_y\, e^{By/M}\, \frac{M}{y(M-y)}}{\sum_{y=1}^{C} r_y\, e^{By/M}\, \frac{M}{y(M-y)}} \right)^{L_y}, \tag{4}$$

where $L_y$ is the count of GC-changing sites within the $y$th category of the non-neutral AFS. The estimate of B can be obtained by maximizing the likelihood (4) as described in Appendix A of Vogl and Bergman (2015).
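To make the structure of the likelihood concrete, the sketch below computes the nuisance parameters from a neutral AFS and maximizes the log-likelihood over B. It does not reproduce the procedure in Appendix A of Vogl and Bergman (2015): a generic golden-section search is substituted, which is adequate here because the log-likelihood is concave in B. All function names, the search bounds, and the requirement that every neutral count be positive are assumptions of this example.

```python
from math import exp, log

def nuisance(neutral_counts, M, ys):
    """r_y: observed neutral proportion divided by its mutation-drift expectation."""
    total_obs = sum(neutral_counts)
    w = [1.0 / (y * (M - y)) for y in ys]
    total_w = sum(w)
    return [(c / total_obs) / (wi / total_w) for c, wi in zip(neutral_counts, w)]

def log_likelihood(B, counts, r, M, ys):
    """Multinomial log-likelihood of the GC-changing AFS given B."""
    terms = [ry * exp(B * y / M) * M / (y * (M - y)) for ry, y in zip(r, ys)]
    norm = sum(terms)
    return sum(L * log(t / norm) for L, t in zip(counts, terms) if L > 0)

def estimate_B(counts, neutral_counts, M, ys, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximize the log-likelihood over B by golden-section search (assumed optimizer)."""
    r = nuisance(neutral_counts, M, ys)
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts, r, M, ys) > log_likelihood(d, counts, r, M, ys):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2
```

Passing only the retained categories in `ys` (for example, with the lowest and highest categories dropped) restricts both the nuisance parameters and the likelihood to those categories.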

Due to the large sample sizes, allele frequency spectra for the human populations were constructed by classifying sites into 50 equally sized GC frequency bins. For the great ape datasets, due to the smaller sample sizes, we assigned GC-changing sites into discrete GC frequency categories to construct the corresponding AFSs. Furthermore, we omit the lowest and the highest categories of the AFSs before estimating B, since sites assigned to these extreme categories are most likely to be affected by mutation processes and are therefore the most likely source of bias when estimating B.

Classification of mutations according to flanking context

Each GC-changing mutation is classified into a mutation type category according to the 5ʹ- and 3ʹ-adjacent base determined from the reference sequence. We exclude segregating sites that are adjacent to one another. Additionally, we group together mutation types that are reverse complements of each other. For example, a site segregating for a 5ʹACG3ʹ ↔5ʹATG3ʹ mutation is grouped together with a site segregating for a 5ʹCGT3ʹ ↔5ʹCAT3ʹ mutation. This procedure results in 32 categories of GC-changing mutations, 16 categories each for transitions and GC-changing transversions.
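The grouping of reverse-complementary contexts can be sketched as follows. The canonical orientation chosen here (the strand on which the strong allele is C) is an arbitrary convention for this example, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def context_category(five_prime, allele_a, allele_b, three_prime):
    """Classify a GC-changing site by its flanking bases.

    Returns (kind, strong_trinucleotide, weak_trinucleotide) on the strand
    where the strong allele is C, so reverse complements share a category.
    """
    strong = [x for x in (allele_a, allele_b) if x in "GC"]
    weak = [x for x in (allele_a, allele_b) if x in "AT"]
    if len(strong) != 1 or len(weak) != 1:
        raise ValueError("not a GC-changing mutation")
    s_tri = five_prime + strong[0] + three_prime
    w_tri = five_prime + weak[0] + three_prime
    if strong[0] == "G":
        s_tri, w_tri = revcomp(s_tri), revcomp(w_tri)
    # On the canonical strand, C<->T is a transition and C<->A a transversion.
    kind = "transition" if w_tri[1] == "T" else "transversion"
    return kind, s_tri, w_tri
```

Enumerating all flanking bases and allele pairs with this function yields 32 distinct categories, 16 per mutation class, in agreement with the count stated above.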

Estimation of great ape recombination maps

The variant calls used for map estimation are derived de novo within this study as described above. Steps involved in recombination rate inference were adopted from previous studies that used similar data for map estimation (Auton et al. 2012; Stevison et al. 2015). Briefly, to improve genotype calls, we first imputed the missing data using fastPHASE v1.4 (Scheet and Stephens 2006) as described in Stevison et al. (2015) and then rephased the inferred haplotypes with PHASE v2.1 (Stephens et al. 2001) as described in Auton et al. (2012). Recombination maps were generated using the LDhat v2.2 program in interval mode (McVean et al. 2004). Specifically, haplotypes were split into windows containing 4000 nonsingleton SNPs with a 200 SNP overlap between windows. Recombination maps were inferred for each window separately and subsequently joined into chromosome-level recombination maps as described in Auton et al. (2012). To avoid potential biases in map estimation, we excluded data of four individuals from the G. gorilla gorilla subspecies population (Bulera, Kowali, Suzie, and Oko) due to their high level of relatedness with other members of this population (Prado-Martinez et al. 2013). We converted the $4N_e r$/bp estimates of recombination rate provided by the LDhat v2.2 program into cM/Mb for each 1-Mb region by dividing the average $4N_e r$/bp of the region by the per basepair estimate of Watterson’s $\theta_w$ across the 1-Mb region, normalized by the number of callable sites within the region. A site was defined as callable if covered by at least 1.5 reads per haploid genome and no more than 2 × n × cov reads. Since $E[\theta_w] = 4N_e\mu$ (where $\mu$ is the per base and generation mutation rate), the division yields $(4N_e r/\text{bp})/\theta_w = r/\mu$. We then used the recently obtained estimate of $\mu = 1.25 \times 10^{-8}$ for great apes (Besenbacher et al. 2019) to obtain the probability of recombination $r$ and obtain average estimates of recombination rates in 1-Mb regions as cM/Mb $= 100r/10^{-6}$.
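The unit conversion at the end of this paragraph amounts to two lines of arithmetic. The sketch below assumes that both inputs are already window averages expressed per base pair; the function and argument names are illustrative.

```python
MU = 1.25e-8  # per-base, per-generation mutation rate (Besenbacher et al. 2019)

def rho_to_cM_per_Mb(rho_per_bp, theta_w_per_bp, mu=MU):
    """Convert a population-scaled recombination rate (4*Ne*r per bp) to cM/Mb.

    Dividing by Watterson's theta (expected value 4*Ne*mu) cancels 4*Ne and
    leaves r/mu; multiplying by mu gives r, the per-bp recombination probability.
    """
    r = (rho_per_bp / theta_w_per_bp) * mu
    return 100 * r / 1e-6   # 100 cM per unit probability, 1e6 bp per Mb
```

For instance, a window in which the scaled recombination rate equals Watterson's estimate implies r = μ, i.e., about 1.25 cM/Mb.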

Data availability

All data used in the study are from previously published sources and freely available. Supplementary materials are available at https://github.com/jbergman/gcDynamics.

Results

GC-changing mutations and recombination rate

As gBGC only affects GC-changing mutations during recombination, we focus our analyses on population dynamics of transitions and G(C) ↔ T(A) transversions. The basis of our analyses is the GC frequency of a segregating GC-changing mutation that we calculate for each bi-allelic site as the proportion of haploid genomes, in a respective population sample, that have a G(C) as opposed to an A(T) at a transition-segregating site, or a G(C) vs a T(A) at a transversion-segregating site. For most of the analyses of human data, we focus on three populations from the 1000 Genomes Project (2015), namely African Yoruba (YRI), Italian Toscana (TSI), and Han Chinese (CHB), as representatives of the ancestral and initial Out-of-Africa human populations. The Icelandic (ISL) population was also studied, as the nucleotide diversity data and recombination map of this population have the highest resolution of any human population studied to date.

We observe a positive correlation between the average recombination rate and average segregating GC frequency of GC-changing mutations in 1-Mb genomic windows of the African YRI population (Spearman’s ρ = 0.3757, P < 2.2 × 10⁻¹⁶; Figure 1A). This pattern is also consistent across Eurasian populations (Table 1). We next separate GC-changing mutations into transitions and transversions to investigate potential differences between their population dynamics with respect to recombination-associated processes. We distinguish between CpG and non-CpG transitions, as a significant proportion of GC-changing mutations are transitions at CpG sites (27.78%) that likely experience distinct population dynamics (Fullerton et al. 2001; Bhérer et al. 2017). Of the remainder of GC-changing mutations, non-CpG transitions are the most abundant mutation type (53.13%), while non-CpG and CpG transversions comprise 15.94% and 3.14%, respectively. These proportions are based on the African YRI population. We will omit CpG transversions from this part of the analysis and revisit their particular dynamics in the section regarding flanking context dependency.

Figure 1.


Relationship between the average GC frequency (fGC) at segregating sites and average recombination rate in 1-Mb autosomal windows of the African Yoruba population, for (A) all GC-changing mutations, and (B) for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–) separately. (C) Relationship between the count of GC-changing segregating sites and recombination rate in 1-Mb autosomal windows of the African Yoruba population for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–) separately. (D) Relationship between the count of GC-changing segregating sites and average GC frequency (fGC) in 1-Mb autosomal windows of the African Yoruba population for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–) separately. Total number of analyzed SNPs is 7,934,810, spread across 2666 autosomal 1-Mb regions.

Table 1.

Linear regression coefficients with corresponding R² values for the relationship between the average GC frequency (fGC) or count proportion of all segregating sites, non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), or GC-changing transversions (TV; CpG–) and average recombination rate across 1-Mb autosomal regions of different human populations (YRI, Yoruba; TSI, Toscana; CHB, Han Chinese; ISL, Iceland)

Population
Parameter | YRI | TSI | CHB | ISL
fGC (All) | 0.0294 (R² = 0.1608) | 0.0201 (R² = 0.1437) | 0.0154 (R² = 0.1393) | 0.0154 (R² = 0.1464)
fGC (TS; CpG–) | 0.0267 (R² = 0.1303) | 0.0184 (R² = 0.1111) | 0.0137 (R² = 0.1007) | 0.0141 (R² = 0.1166)
fGC (TS; CpG+) | 0.0363 (R² = 0.1752) | 0.0253 (R² = 0.1477) | 0.0194 (R² = 0.1401) | 0.0188 (R² = 0.1551)
fGC (TV; CpG–) | 0.0119 (R² = 0.0420) | 0.0068 (R² = 0.0180) | 0.0048 (R² = 0.0133) | 0.0092 (R² = 0.0567)
%Count (All) | 0.0111 (R² = 0.3776) | 0.0112 (R² = 0.3580) | 0.0091 (R² = 0.3086) | 0.0056 (R² = 0.2640)
%Count (TS; CpG–) | 0.0077 (R² = 0.2149) | 0.0081 (R² = 0.2105) | 0.0066 (R² = 0.1715) | 0.0039 (R² = 0.1429)
%Count (TS; CpG+) | 0.0189 (R² = 0.4695) | 0.0184 (R² = 0.4427) | 0.0149 (R² = 0.3991) | 0.0100 (R² = 0.3721)
%Count (TV; CpG–) | 0.0076 (R² = 0.1468) | 0.0081 (R² = 0.1512) | 0.0064 (R² = 0.1240) | 0.0036 (R² = 0.0852)

All coefficients are significant at P <0.0001. The numbers of SNPs used in the analysis are provided in Supplementary Table S1.

As expected, the correlation between average segregating GC frequency and recombination rate in 1-Mb genomic windows for different mutation types is positive and significant (Spearman’s ρ = 0.3480, P < 2.2 × 10⁻¹⁶ for non-CpG transitions; Spearman’s ρ = 0.3924, P < 2.2 × 10⁻¹⁶ for CpG transitions; Spearman’s ρ = 0.1753, P < 2.2 × 10⁻¹⁶ for transversions; Figure 1B). We also observe that this correlation is stronger for transitions than for transversions (t = 12.0153, P < 0.001 for the comparison between CpG transitions and transversions; t = 10.8384, P < 0.001 for the comparison between non-CpG transitions and transversions). Notably, the GC frequency distributions differ between the three mutation types both with respect to their means and variances. The mean of GC frequencies across 1-Mb regions is higher for CpG transitions compared to non-CpG transversions (t = 6.528, P = 7.364 × 10⁻¹¹), which have a higher mean compared to non-CpG transitions (t = 33.339, P < 2.2 × 10⁻¹⁶). Furthermore, average GC frequencies of transitions are more variable compared to transversions (Levene test statistic = 107.72, P < 2.2 × 10⁻¹⁶ for the comparison between CpG transitions and non-CpG transversions; Levene test statistic = 300.86, P < 2.2 × 10⁻¹⁶ for the comparison between non-CpG transitions and non-CpG transversions), and GC frequencies of CpG transitions are more variable compared to non-CpG transitions (Levene test statistic = 50.45, P = 1.382 × 10⁻¹²). Therefore, the differences between correlations in Figure 1B are likely driven by the wider range of average GC frequencies for transitions compared to transversions. Linear regression analysis of the YRI population (Table 1) predicts that an increase of 2.67% and 3.63% in average GC frequency is expected for non-CpG and CpG transitions, respectively, per 1 cM/Mb increase in recombination rate, while only a 1.19% increase is expected for transversions. The recombination rate explains ∼13–18% of the variance in average GC frequency for transition sites, and ∼4% for transversions. The expected increase in average GC frequency per unit of recombination rate for the Eurasian populations is consistent with these results, but lower compared to the YRI population (Table 1). This observation is in accordance with the lower effective sizes of these populations, despite the fact that a higher recombination rate is inferred for Eurasian populations (Supplementary Figure S2A). This observation indicates that the effective size, rather than the difference between recombination maps, is a more important determining factor of between-population differences in average GC frequency of segregating mutations at the 1-Mb scale. Generally, correlations and regression coefficients are more similar between all sites and transitions, rather than transversions, because the ratio of segregating transitions to GC-changing transversions is biased, approximately 4:1 in favor of transitions. Therefore, the genome-wide profile of the average GC frequency of segregating sites is mainly due to transition-specific processes.

The correlation between the abundance of segregating sites and recombination rate is positive (Spearman’s ρ = 0.5089, P < 2.2 × 10⁻¹⁶ for non-CpG transitions; Spearman’s ρ = 0.7689, P < 2.2 × 10⁻¹⁶ for CpG transitions; Spearman’s ρ = 0.3762, P < 2.2 × 10⁻¹⁶ for transversions; Figure 1C), indicating that mutations generally accumulate in high-recombination regions. Furthermore, this correlation is strongest for CpG transitions compared to the two other mutation types (t = 25.0652, P < 0.001 for the comparison between CpG and non-CpG transitions; t = 31.7503, P < 0.001 for the comparison between CpG transitions and non-CpG transversions). We next quantify the effect of recombination on mutation abundance for each mutation type by calculating linear regression coefficients for the relationship between the count proportion of a specific mutation type and the average recombination rate in 1-Mb windows (Table 1). We define the count proportion as the percentage of segregating sites of a specific mutation type within a 1-Mb window, with respect to the corresponding total genome-wide count. As this is done separately for each mutation type, we account for the differences between the abundance of different mutation types in the genome. We find that the recombination rate explains ∼37–47% of the variance in the proportion of CpG transitions, compared to ∼14–21% and ∼9–15% for non-CpG transitions and transversions, respectively (Table 1). For the YRI population, an increase in one unit of recombination rate results in an increase of 0.0189% of total CpG transitions, compared to ∼0.0077% and ∼0.0076% for non-CpG transitions and transversions, respectively. A similar trend holds for the Eurasian populations. These results demonstrate that CpG transitions are especially enriched in high-recombination regions. Notably, the ISL population has a considerably lower increase in the count percentage per unit increase of recombination when compared to other Eurasian populations. This is likely due to the different inference methodology and higher resolution of the ISL recombination map, resulting in a larger variance of recombination rates (Supplementary Figure S2A).

The analyses above were done using sex-averaged recombination rates. However, as sex-specific recombination maps are available for the ISL population, this allows us to specifically assess the influence of sex on the population dynamics of GC-changing mutations. Namely, we repeat the linear regression analysis from Table 1 with a model where the explanatory variables include male and female recombination rates, as well as their interaction (Table 2). Notably, the regression coefficients are positive and significant for both the maternal and paternal recombination maps, signifying that recombination in both sexes is associated with the increase in average GC frequency and abundance of mutations. However, the magnitude of the coefficients is always higher for the paternal recombination rate, as observed previously (Kostka et al. 2012), despite the fact that the genome-wide maternal recombination rate is higher (Supplementary Figure S2A). This is likely due to the higher variance of male recombination rates along the genome, most likely due to the high telomeric recombination specific to male meiosis (Halldorsson et al. 2016). Interestingly, the interaction coefficient is always negative, implying that maternal and paternal recombination have antagonistic effects. This is likely a consequence of the differences between male and female meiosis, such as the duration of the processes, crossover rates and locations, and possibly different repair mechanisms that are involved in mismatch resolution.
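One way to write the regression with sex-specific rates is given below, as a sketch rather than the authors' exact specification. The coefficient symbols and the window index $i$ are notation introduced here; the response is shown for the average GC frequency, and the count proportion would take its place in the corresponding models.

```latex
f_{GC,i} = \beta_0 + \beta_m\, r_{m,i} + \beta_p\, r_{p,i}
         + \beta_{mp}\, r_{m,i}\, r_{p,i} + \varepsilon_i ,
\qquad
\frac{\partial f_{GC,i}}{\partial r_{p,i}} = \beta_p + \beta_{mp}\, r_{m,i}
```

Under this form, a negative interaction coefficient means that the marginal effect of the paternal rate shrinks as the maternal rate increases, and vice versa. Using the Table 2 estimates for all sites, the paternal slope would be 0.0186 − 0.0044 × r_m, which is 0.0142 at a maternal rate of 1 cM/Mb.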

Table 2.

Linear regression coefficients with corresponding R2 values for the relationship between the average GC frequency (fGC) or count proportion of all segregating sites, non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), or GC-changing transversions (TV; CpG–) and recombination rate parameters across 1-Mb autosomal regions for the ISL population, where rm is the maternal recombination rate and rp is the paternal recombination rate

                      Explanatory variables
Parameter             rm          rp          rm×rp            R2
fGC (All)             0.0105***   0.0186***   −0.0044***       0.147
fGC (TS; CpG–)        0.0113***   0.0173***   −0.0048***       0.119
fGC (TS; CpG+)        0.0112***   0.0232***   −0.0050***       0.1582
fGC (TV; CpG–)        0.0070***   0.0078***   −0.0019**        0.0488
%Count (All)          0.0023***   0.0061***   −0.0008***       0.2776
%Count (TS; CpG–)     0.0017***   0.0035***   −0.0003*         0.1408
%Count (TS; CpG+)     0.0042***   0.0123***   −0.0020***       0.4345
%Count (TV; CpG–)     0.0009**    0.0036***   −0.0001 (n.s.)   0.0903

The numbers of SNPs used in the analysis are provided in Supplementary Table S1.

*** P <0.001;

** P <0.01;

* P <0.05; n.s., nonsignificant.

Distribution of GC frequencies

The distribution of GC frequencies for transition- and transversion-segregating sites has the characteristic U-shape, typical for human populations with a relatively low population size-scaled mutation rate θ (4Neμ) < 0.01 (Figure 2). All distributions are skewed toward high GC frequencies. For the YRI population, the distribution of GC frequencies for CpG transitions has a more extreme skew compared to both non-CpG transition (t =155.81, P <2.2 × 10–16) and transversion distributions (t =40.973, P <2.2 × 10–16), with a higher mean (60.01% GC for CpG transitions, 54.90% GC for non-CpG transitions and 58.21% GC for transversions) and median (79.17% GC for CpG transitions, 67.59% GC for non-CpG transitions, and 75.93% GC for transversions; Supplementary Table S2). A similar trend is observed for Eurasian populations, but with a general reduction in proportions of low and high GC frequency variants likely due to the bottleneck-induced reduction of effective sizes in these populations following the Out-of-Africa migration. The skewness of the GC frequency distribution possibly reflects the strength of gBGC dynamics acting on each mutation type and is in accordance with the observed correlations in Figure 1, B and C. Additionally, these distributions indicate that CpG transitions potentially evolve under strongest gBGC dynamics across all human populations.

Figure 2.


Distribution of GC frequencies (fGC) for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–) of Yoruba (YRI), Toscana (TSI), Han Chinese (CHB), and Iceland (ISL) populations. The solid lines represent the medians and the dashed lines represent the means of the distributions.

The distribution of GC frequencies differs between non-CpG transitions and transversions (Figure 2), with transversions having a significantly higher average GC frequency (t =82.452, P <2.2 × 10–16), despite a stronger correlation between recombination rate and average GC frequency for non-CpG transitions (Figure 1B). A contributing factor to this observation is the fact that the abundance of non-CpG transitions is negatively correlated with their average GC frequency (Spearman’s ρ= −0.0782, P < 0.0001), while the same correlation is positive for CpG transitions (Spearman’s ρ= 0.4841, P <2.2 × 10–16) and transversions (Spearman’s ρ= 0.0396, P =0.0408; Figure 1D). In other words, the proportion of non-CpG transitions in high- vs low-recombination regions is relatively smaller than the corresponding proportion of transversions, thus contributing to their less skewed GC frequency distribution. Another likely contributor is the higher variance of GC frequencies for non-CpG transitions compared to transversions (Figure 1B).

The effect of mutation bias and local GC content on GC frequencies

While multiple previous studies attribute skews in the GC frequency to gBGC, in this section, we wish to explore possible alternative explanations for these observations.

Firstly, the difference between AT→GC and GC→AT mutation rates can potentially cause skewness of the AFSs. However, for low mutation rates typical for humans and great apes, the theory predicts that the skewness of the GC frequency distribution for segregating sites should be weakly influenced by mutation processes. Under the assumption of mutation-selection equilibrium and accounting for mutation bias (i.e., differences between forward and backward mutation rates), the skewness of the allele frequency distribution is largely determined by the fixation bias toward the preferred allele (McVean and Charlesworth 1999; Vogl and Bergman 2015). In practice, the strongest potential impact of mutation bias on the GC frequency skew should be present at CpG sites, as CpG→TpG transitions have a 10× higher per base mutation rate compared to the backward rate (Jónsson et al. 2017). Supplementary Figure S3 shows that the mutation rate for human CpG transitions estimated from a recent de novo mutation (DNM) study (Jónsson et al. 2017) causes a very minor skew in the distribution of GC frequencies. Notably, a 10–100× higher CpG→TpG mutation rate would be required to cause a skew of the GC frequency distributions as in Figure 2.
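
The theoretical expectation can be illustrated with a short numerical sketch. It assumes Wright's stationary distribution for a biallelic site with reversible mutation and a directional fixation bias B, in the limit of a small scaled mutation rate, where the density of segregating sites at GC frequency x is proportional to exp(Bx)/(x(1−x)) and the mutation bias no longer affects the shape of the interior of the spectrum. This is a simplified stand-in rather than the inference procedure used in this study; the function names and the use of 50 bin midpoints are illustrative choices.

```python
import math


def expected_gc_spectrum(B, n_bins=50):
    """Relative density exp(B*x) / (x*(1-x)) at bin midpoints, normalised to sum to 1."""
    mids = [(i + 0.5) / n_bins for i in range(n_bins)]
    dens = [math.exp(B * x) / (x * (1.0 - x)) for x in mids]
    total = sum(dens)
    return mids, [d / total for d in dens]


def mean_gc(B, n_bins=50):
    """Mean GC frequency of segregating sites under the spectrum above."""
    mids, probs = expected_gc_spectrum(B, n_bins)
    return sum(x * p for x, p in zip(mids, probs))


# B = 0 gives a symmetric U-shape; B > 0 shifts mass toward high GC frequency.
for B in (0.0, 0.3, 0.6):
    print(f"B = {B:.1f}: mean GC frequency of segregating sites = {mean_gc(B):.3f}")
```

In this approximation the skew toward high GC frequencies is controlled by B alone, which is the reasoning behind attributing the skew primarily to fixation bias rather than to mutation bias.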

Secondly, recombination-associated mutations may contribute to population dynamics of GC-changing mutations and partly underlie the observation that average GC frequency and abundance of segregating mutations are positively correlated to recombination rate (Figure 1, A–C). While recombination is mutagenic, recombination-associated mutations comprise a very small fraction of genome-wide DNMs; in a recent study, only 173 out of 200,435 DNMs were found within 1 kb from a recombination crossover event (Halldorsson et al. 2019). An increase in mutation rate can extend up to 40 kb around maternal crossover events; however, these mutations are mostly GC-conservative C→G transversions that do not affect the average GC frequency of segregating GC-changing mutations. Furthermore, direct sequencing of recombination hotspots in human sperm cells demonstrates that, despite a higher mutation rate in these regions, the opposing effect of gBGC is nonetheless the dominant factor in determining the sequence evolution of high-recombination regions (Arbeithuber et al. 2015).

Thirdly, as the GC content of the genome is not at equilibrium (Duret and Arndt 2008), mutation bias and local GC content can affect the skewness of AFSs, and therefore the estimation of the gBGC parameter B. To minimize this effect we chose to study gBGC dynamics at the chromosome level, i.e., at a large genomic scale. Additionally, to explicitly test for the effect of mutation processes at this scale, we estimated B by the method of Glémin et al. (2015) based on polarized AT→GC and GC→AT AFSs, which takes into account local GC content, mutation bias, as well as possible misinference due to errors when assigning ancestral nucleotide states. We applied the approach to individual chromosomes across three 1000 Genomes populations (YRI, TSI, and CHB), where ancestral state estimates are already available. This analysis recapitulates the results of Glémin et al. (2015), where larger B values were detected for CpG sites, and qualitatively supports our inferences of differences in gBGC dynamics between mutation types as a function of effective population sizes (Supplementary Figure S4A).

Fourthly, the ratio between the total number of sites in AT→GC vs GC→AT AFSs is largely determined by mutation bias and local GC content. On the other hand, the skewness of these spectra can be affected both by mutation parameters, as well as gBGC dynamics. To gain further insight into the potential effect of mutation processes on gBGC dynamics at the chromosomal level, we assess the effect of variance of the mutation parameters on B estimation. We do this by resampling the AFSs of a focal mutation type to match the number of sites (and therefore the ratio) of the AFSs of another mutation type (see Materials and methods). We apply this to all six possible combinations of the three mutation types and re-estimate the chromosomal B distributions. We observe no notable difference between reinferred B distributions compared to the original distributions (Supplementary Figure S5A). Similarly, we test the impact of local GC content on B inference by permuting the GC content parameter (pGC) between mutation types and reinferring B distributions. Again, we observe no significant difference compared to the original distributions (Supplementary Figure S5B). We conclude that the variation in mutation bias and local GC content at the chromosomal level likely has a small effect on B estimates, and can be largely ignored. We therefore present B values estimated using the Vogl and Bergman (2015) method for the remainder of the analyses.

Fixation bias in human populations

Given the results of the previous section, we quantify the effect of gBGC on segregating sites using chromosome-level allele frequency spectra that are constructed by assigning sites into 50 equally sized categories based on their segregating GC frequency. Such AFSs have been successfully used to assess fixation biases (Vogl and Bergman 2015; Borges et al. 2019) and have the added benefit of not requiring potentially erroneous polarization of sites. Additionally, our approach specifically accommodates the exclusion of the most extreme categories of the AFS when inferring the B parameter. These categories comprise relatively young mutations that are most likely to be affected by mutation processes. The exclusion of these categories additionally accounts for any residual effect of mutation bias on gBGC inference, in turn amplifying the gBGC signal in the data relative to mutation processes. Applying our method on the YRI population, we calculate the average genome-level B estimates to be 0.3045, 0.6282, and 0.5162 for non-CpG transitions, CpG transitions, and non-CpG transversions, respectively. These estimates are in the nearly neutral range of selection coefficients (Ohta 1979; Ohta and Gillespie 1996) and in line with previous estimates for humans (Lachance and Tishkoff 2014; Glémin et al. 2015). Consistent with this result, the per-chromosome B estimates (Figure 3A) are higher for CpG transitions than non-CpG transitions (t =7.3553, P <0.001) or transversions (t =3.5954, P= 0.001). Furthermore, the correlation between B estimates inferred by our method and the method of Glémin et al. (2015) is positive (Spearman’s ρ= 0.4141, P= 0.0006) and B ranges of the two methods largely overlap (Supplementary Figure S6), indicating that both methods capture similar gBGC dynamics.

Figure 3.


(A) Distribution of the per-chromosome B estimates for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–). The “diamond” points correspond to the genome-wide B estimated for all autosomes together. (B) Relationship between the non-CpG transitions (blue), CpG transitions (red), or GC-changing transversions (green) B parameter and the sex-averaged recombination rate across autosomes. Each point is an autosomal chromosome and the four panels in each figure correspond to the Yoruba (YRI), Toscana (TSI), Han Chinese (CHB), and Iceland (ISL) populations. AFSs used in the analysis are provided in Supplementary data S1. B values were estimated using the Vogl and Bergman (2015) method.

The B estimates for CpG transitions are approximately 2.06× and 1.22× larger compared to non-CpG transitions and transversions, respectively. These observations are in accordance with the strong correlation between the average segregating GC frequency of CpG transitions and recombination rate (Figure 1B) and their accumulation in high-recombination regions (Figure 1C). Estimates of B for CpG transitions and non-CpG transversions are in correspondence with the differences in effective sizes of populations, with YRI having the highest B estimates compared to the Eurasian populations. The differences are significant when comparing estimates of CpG transitions between the YRI and CHB (t =2.4865, P =0.0172) or ISL population (t =2.707, P =0.0097), and estimates of non-CpG transversions between the YRI population and all Eurasian populations (t =4.3104, P =0.001 for the YRI-TSI comparison; t =6.3491, P <0.001 for the YRI-CHB comparison; t =3.6811, P =0.0007 for the YRI-ISL comparison). The B estimates for non-CpG transitions are lowest in the CHB population, consistent with the lower effective size (t =2.0321, P =0.0491 for the YRI-CHB comparison; t =2.3822, P =0.0219 for the TSI-CHB comparison; t =2.3411, P =0.0243 for the ISL-CHB comparison). Another notable trend is the higher B estimates of non-CpG transitions in European populations compared to Africans and Asians. This trend is in opposition to the historical difference in population sizes between the African and European populations. It is also unlikely to be a product of the generally higher recombination rate in European populations, as we would then expect a similar effect on the two other mutation types. Furthermore, the correlations between the African YRI and European TSI populations for the GC frequency (Spearman’s ρ= 0.8799, P <2.2 × 10–16; Supplementary Figure S7A) and abundance (Spearman’s ρ= 0.8679, P <2.2 × 10–16; Supplementary Figure S7B) of non-CpG transitions are highly significant at the 1-Mb scale, as is the correlation of recombination rates (Spearman’s ρ= 0.9268, P <2.2 × 10–16; Supplementary Figure S7C). Therefore, this trend is likely a consequence of genomic features that are only detectable at a finer genomic scale, such as the difference in location and intensity of recombination hotspots between African and European populations. Interestingly, the observed trends from Figure 3 are also conserved at the geographic superpopulation level (Figure 4 and Supplementary Figure S8), suggesting that gBGC processes have been stable after population establishment following the Out-of-Africa migration.
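
The fold differences quoted at the start of this paragraph follow directly from the genome-level YRI estimates reported earlier (0.3045, 0.6282, and 0.5162). As a quick arithmetic check, with a dictionary layout that is only an illustrative choice:

```python
# Genome-level B estimates for the YRI population, as reported in the text.
b_yri = {"TS; CpG-": 0.3045, "TS; CpG+": 0.6282, "TV; CpG-": 0.5162}

ratio_vs_transitions = b_yri["TS; CpG+"] / b_yri["TS; CpG-"]
ratio_vs_transversions = b_yri["TS; CpG+"] / b_yri["TV; CpG-"]

print(f"CpG transitions vs non-CpG transitions:   {ratio_vs_transitions:.2f}x")
print(f"CpG transitions vs non-CpG transversions: {ratio_vs_transversions:.2f}x")
```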

Figure 4.


Distribution of the genome-wide B estimates for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–) across populations comprising different geographic superpopulations (AFR, African; SAS, South Asian; EAS, East Asian; EUR, European; AMR, Admixed American). AFSs used in the analysis are provided in Supplementary data S1. B values were estimated using the Vogl and Bergman (2015) method.

We next investigate the relationship between B and recombination rate—as expected, for all populations and mutation types, there is a positive correlation between B and the average recombination rate at the chromosome level (Figure 3B). Chromosome-level recombination rates explain a substantial proportion of the variance in per-chromosome B estimates (Table 3)—on average ∼46–58% for non-CpG transitions, ∼60–68% for CpG transitions, and ∼20–34% for non-CpG transversions. Furthermore, the YRI recombination rate varies at a similar scale to Eurasian populations (Figure 3B), yet has the highest increase in B per increase in unit of recombination (Table 3). Again, this observation demonstrates that the efficiency of gBGC is promoted by the historically higher effective size of the African population.

Table 3.

Linear regression coefficients with corresponding R2 values for the relationship between per-chromosome B of non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), or GC-changing transversions (TV; CpG–) and recombination rate, for different human populations (YRI, Yoruba; TSI, Toscana; CHB, Han Chinese; ISL, Iceland)

               Population
Parameter      YRI                      TSI                       CHB                       ISL
B (TS; CpG–)   0.4852 (R2 = 0.4602)*    0.3731 (R2 = 0.4773)**    0.2813 (R2 = 0.5262)**    0.3029 (R2 = 0.5782)**
B (TS; CpG+)   0.6503 (R2 = 0.6186)**   0.5184 (R2 = 0.5990)**    0.3834 (R2 = 0.6713)**    0.4068 (R2 = 0.6829)**
B (TV; CpG–)   0.2319 (R2 = 0.3248)*    0.1302 (R2 = 0.2146)*     0.0864 (R2 = 0.2046)*     0.1451 (R2 = 0.3389)*

** P <0.001;

* P <0.05.

GC-changing mutations and flanking nucleotide context

In this section, we classify GC-changing mutations and their reverse complements into 32 mutational types depending on their flanking 5ʹ and 3ʹ nucleotides. We note that the proportions of different mutation types vary widely, with the 5ʹACG3ʹ ↔ 5ʹATG3ʹ type being the most abundant (Figure 5A). Generally, transitions, and especially CpG transitions, are the most abundant mutation types. Furthermore, the proportions of specific mutation types are very similar between populations. However, we do recover the strong enrichment signal of the 5ʹTCC3ʹ ↔ 5ʹTTC3ʹ mutation type in European populations, compared to the YRI and CHB populations (Supplementary Figure S9), as previously observed (Harris and Pritchard 2017).

Figure 5.


(A) The proportions of the 32 different mutation types for the YRI, TSI, CHB, and ISL populations. (B) The values of the fixation bias B of 32 different mutation types for the YRI, TSI, CHB, and ISL populations. Each block represents the value of a statistic across different mutation types (transitions; TS, and transversions; TV) for each population. Values are calculated separately based on their flanking context, where the x-axis and y-axis represent the 3ʹ and 5ʹ nucleotides, respectively. Complementary nucleotide contexts are grouped together. AFSs used in the analysis are provided in Supplementary data S2. B values were estimated using the Vogl and Bergman (2015) method.

We next calculate the fixation bias B for each of the 32 mutation types (Figure 5B). Most B values are positive, indicating fixation toward GC content as expected for sites evolving under gBGC. Interestingly, the highest B values are estimated for 5ʹ*CC3ʹ ↔ 5ʹ*AC3ʹ transversions, demonstrating that subsets of non-CpG transversions may experience gBGC dynamics stronger than even CpG transitions. Unexpectedly, however, some mutation types have negative B values. For transitions, the 5ʹACT3ʹ ↔ 5ʹATT3ʹ mutation type has a negative B value in all populations, in opposition to the gBGC expectation. For transversions, negative B values are present for the 5ʹCCT3ʹ ↔ 5ʹCAT3ʹ mutation type and three types of CpG transversions (5ʹCCG3ʹ ↔ 5ʹCAG3ʹ, 5ʹTCG3ʹ ↔ 5ʹTAG3ʹ, and 5ʹACG3ʹ ↔ 5ʹAAG3ʹ). This unexpected result is likely due to non-gBGC processes affecting the GC frequency distribution of these mutation types. We next focus on CpG transversions, which seem to be the most affected by this phenomenon.

Transversions that segregate at CpG sites have the following characteristics. Firstly, they are the rarest mutation type of all GC-changing mutations (comprising 3.14%; 250,334 sites in total; Supplementary Table S2). Secondly, their average segregating GC frequency is the lowest out of all the mutation types with a mean of 40.27% GC and median of 21.76% GC (Supplementary Table S2). Accordingly, the AFS of CpG transversions is skewed toward low GC (i.e., high AT) frequency variants (Supplementary Figure S10). However, despite their low GC content, CpG transversions have typical characteristics of sites evolving under gBGC dynamics, such as the positive correlation between average GC frequency and recombination (Supplementary Figure S11A). Furthermore, the count of CpG transversions is positively correlated to recombination rate, as well as average GC frequency (Supplementary Figure S11, B and C). Therefore, the GC frequency distribution of CpG transversions is affected either by an excess of low GC variants or conversely, the lack of high GC variants. An excess of low GC variants would imply an exceptionally high AT→GC mutation rate, which is highly unlikely given that mutation rates in humans are generally low and AT-biased. Furthermore, under this scenario, we would expect a much higher representation of CpG transversions, as opposed to only 3.14% of all GC-changing mutations. Another possible explanation for the skew in the CpG transversion AFS is a biased fixation of AT nucleotides at CpG transversion sites. However, this is also highly unlikely, as no biological mechanisms are known that would cause such a bias. Therefore, a more likely explanation for the observed GC frequency distribution of CpG transversions is a lack of high-frequency GC variants. This underrepresentation can occur due to the high mutability of GC content at CpG sites. Specifically, as sites that segregate for CpG transversions reach high GC frequency, the probability of additional mutations increases, in turn decreasing the number of detectable CpG transversions segregating at high GC frequency. As a result, high GC frequency CpG transversions would be enriched for tri-allelic variants, harder to detect with conventional SNP calling methods, and possibly misclassified into other mutation types, resulting in their general underrepresentation. Additionally, recent shifts in the recombination map can also contribute to the observed patterns. While it is difficult to reliably test all of these expectations, we can do a basic analysis of tri-allelic sites within our sample of segregating sites.

As expected, the number of tri-allelic sites is higher for CpG sites (18,168 at CpG sites and 11,264 at non-CpG sites; Supplementary Table S2), as is the overall proportion of these sites within their respective type classes (0.7% at CpG sites and 0.2% at non-CpG sites; χ2 = 12,998, P <2.2 × 10–16). Generally, the average segregating GC frequency at tri-allelic sites is high compared to bi-allelic sites (Supplementary Table S2). Furthermore, tri-allelic CpG sites tend to have higher GC frequency compared to non-CpG sites (t =25.089, P <2.2 × 10–16). These observations indicate that sites segregating at high GC frequency are especially susceptible to additional mutations and the formation of tri-allelic sites. Therefore, we expect a negative correlation between B values and the abundance of tri-allelic sites. Indeed, we observe a highly significant negative correlation between transversion B values and the number of tri-allelic sites (Spearman’s ρ= −0.8118, P= 0.0001), while the same is not observed for transitions (Spearman’s ρ= 0.1971, P= 0.4643; Figure 6). This suggests that, at least for rare CpG mutation types such as CpG transversions, the GC frequency distribution is affected by the general hypermutability of CpG sites. Importantly, this observation does not exclude the action of other aforementioned modifiers of GC frequency distributions, which may interact to produce similar GC frequency patterns for other mutation types.
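
For readers who wish to reproduce this type of rank correlation on per-context summaries, a minimal pure-Python sketch of Spearman's ρ (the Pearson correlation of average ranks, with ties sharing their mean rank) is given below. The tri-allelic counts and B values in the example are hypothetical and serve only to show the direction of the expected relationship; no P-value is computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient of two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-context values (NOT taken from the study).
triallelic_counts = [120, 340, 560, 900, 1500]
b_values = [0.9, 0.7, 0.4, 0.1, -0.3]
print(f"Spearman's rho = {spearman_rho(triallelic_counts, b_values):.4f}")
```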

Figure 6.


The correlation between the numbers of tri-allelic sites and B values for transitions and transversions of the YRI population. Each point represents a value for a specific flanking nucleotide context. B values were estimated using the Vogl and Bergman (2015) method.

Fixation bias and recombination rate in great apes

In this section, we investigate the GC-fixation bias in eight great ape populations consisting of five subspecies of the Pan genus, one subspecies of the Gorilla genus, and two subspecies of the Pongo genus. Additionally, we have derived recombination maps specific to each of these populations and study the relationship between the population-specific recombination rate and the fixation parameter B.

The distributions of per-chromosome B estimates for the populations of the Pan genus are similar to those of humans, with CpG transitions having the highest values, followed by non-CpG transversions and transitions (Figure 7A). A direct comparison between per-chromosome B values of the P. paniscus and human YRI population reveals no difference for CpG transitions (t =0.18628, P =0.8531), reflective of similar effective sizes of the two populations (Prado-Martinez et al. 2013) and a historically stable recombination environment experienced by this mutation type across the two species. On the other hand, somewhat higher B values are estimated for non-CpG transitions (t =4.7306, P <0.0001) and transversions (t =7.0785, P <0.0001) in humans, which may occur if these mutation types are affected by the differential evolution of the recombination landscape since their split from the common ancestor (Munch et al. 2014). Notably, at the 1-Mb scale, the correlations between recombination rate and average GC frequency of segregating mutations for the P. paniscus population are positive (Spearman’s ρ= 0.3208, P <2.2 × 10–16 for non-CpG transitions; Spearman’s ρ= 0.3614, P <2.2 × 10–16 for CpG transitions; Spearman’s ρ= 0.2520, P <2.2 × 10–16 for transversions; Supplementary Figure S12A) and similar to the observed trend in humans (Figure 1B). Additionally, as for humans (Figure 1C), the correlations between recombination rate and mutation abundance are positive and strongest for CpG transitions (Spearman’s ρ= 0.1739, P <2.2 × 10–16 for non-CpG transitions; Spearman’s ρ= 0.5658, P <2.2 × 10–16 for CpG transitions; Spearman’s ρ= 0.1568, P <2.2 × 10–16 for transversions; Supplementary Figure S12B). These results further demonstrate that large-scale gBGC dynamics are mostly congruent between chimpanzees and humans.

Figure 7.


(A) Distribution of the per-chromosome B parameter for non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), and GC-changing transversions (TV; CpG–). The “diamond” points correspond to the genome-wide B estimated for all autosomes together. (B) Relationship between the non-CpG transitions (blue), CpG transitions (red), or GC-changing transversions (green) per-chromosome B parameter and the sex-averaged recombination rate. The eight panels in each figure correspond to the P. troglodytes ellioti (PTE), P. troglodytes schweinfurthii (PTS), P. troglodytes troglodytes (PTT), P. troglodytes verus (PTV), P. paniscus (PPA), G. gorilla gorilla (GGG), P. abelii (PAB), and P. pygmaeus (PPY) populations. AFSs used in the analysis are provided in Supplementary data S3. B values were estimated using the Vogl and Bergman (2015) method.

In contrast to humans and Pan species, great apes of the Gorilla and Pongo genera have the highest B estimates for non-CpG transversions, followed by CpG and non-CpG transitions. Therefore, the most notable characteristic of these species is the reduction of gBGC intensity at CpG transitions. Accordingly, we observe a reduced mean GC frequency of segregating CpG transitions (55.66% GC for G. gorilla gorilla and 53.21% GC for P. abelii) compared to non-CpG transversions (59.4% GC for G. gorilla gorilla and 61.4% GC for P. abelii) that is highly significant (t =82.0, P <2.2 × 10–16 for G. gorilla gorilla; t =221.63, P <2.2 × 10–16 for P. abelii). At the 1-Mb scale, the correlations between recombination rate and average segregating GC frequency or mutation abundance for Gorilla and Pongo populations are positive (Supplementary Figure S11, C–H), but also notably different compared to the same correlations for human and Pan populations. Therefore, gBGC dynamics have likely diverged between humans and more distantly related great ape species. An important contributing factor to this observation could be differences in recombination landscapes between the species, as Gorilla and Pongo populations have more uniform recombination landscapes compared to other great apes (Stevison et al. 2015; Supplementary Figure S2B). Such recombination landscapes may hypothetically result in more similar distributions of B values across different mutation categories for these populations (Figure 7A). Furthermore, great apes of the Gorilla and Pongo genera have the highest per-chromosome B estimates for non-CpG transversions, while subspecies of the Pongo genus have the highest B values for non-CpG transitions, in accordance with high effective sizes of these populations (Prado-Martinez et al. 2013).

Similarly to human populations, the linear regression coefficients for the relationship between per-chromosome B estimates and recombination rates are positive for all populations and mutation classes (Table 4). Recombination rates explain ∼16–60%, ∼23–69%, and ∼25–48% of the variance in B estimates for non-CpG transitions, CpG transitions, and non-CpG transversions, respectively. These estimates overlap with those of human populations. Notably, the increase in B with unit increase in recombination rate is highest for the Gorilla and Pongo populations, which is likely due to the lower variance of recombination rates (Supplementary Figure S2B). When considering GC-changing mutations classified according to their flanking nucleotide composition, we observe similar patterns as in humans, especially for the P. paniscus population. The proportions of different mutation classes are similar between representative populations of each genus, with a general excess of CpG transitions and the highest paucity of CpG transversions (Figure 8A). Furthermore, differences in B estimates across the 32 mutation categories are subtle, and major characteristics, such as the negative B values for CpG transversions, are present in all species (Figure 8B). Notably, the majority of observed patterns in humans are also replicated in great apes, especially in the closest human relatives—bonobos and chimpanzees. Therefore, the differential impact of effective population sizes, recombination, and gBGC on distinct mutation types is a conserved phenomenon of the Hominidae family.

Table 4.

Linear regression coefficients with corresponding R2 values for the relationship between per-chromosome B of non-CpG transitions (TS; CpG–), CpG transitions (TS; CpG+), or GC-changing transversions (TV; CpG–) and recombination rate, for different great ape populations (PTE, P. troglodytes ellioti; PTS, P. troglodytes schweinfurthii; PTT, P. troglodytes troglodytes; PTV, P. troglodytes verus; PPA, P. paniscus; GGG, G. gorilla gorilla; PAB, P. abelii; PPY, P. pygmaeus)

               Population
Parameter      PTE                       PTS                       PTT                       PTV
B (TS; CpG–)   0.6849 (R2 = 0.2134)*     1.1613 (R2 = 0.4117)***   0.7148 (R2 = 0.1621)*     0.7641 (R2 = 0.1999)*
B (TS; CpG+)   0.8927 (R2 = 0.3640)**    1.3206 (R2 = 0.5150)***   0.9952 (R2 = 0.3369)**    0.7729 (R2 = 0.2332)*
B (TV; CpG–)   0.1088 (n.s.)             0.3525 (n.s.)             0.1406 (n.s.)             0.2455 (n.s.)

Parameter      PPA                       GGG                       PAB                       PPY
B (TS; CpG–)   0.9297 (R2 = 0.4688)***   1.6106 (R2 = 0.5122)***   2.0949 (R2 = 0.5026)***   0.9643 (R2 = 0.6006)***
B (TS; CpG+)   0.9110 (R2 = 0.5911)***   1.8518 (R2 = 0.6195)***   2.2734 (R2 = 0.5771)***   1.0836 (R2 = 0.6897)***
B (TV; CpG–)   0.4762 (R2 = 0.2692)**    0.8227 (R2 = 0.4167)***   0.8979 (R2 = 0.3951)***   0.4756 (R2 = 0.4276)***

*** P <0.001;

** P <0.01;

* P <0.05; n.s., nonsignificant.

Figure 8.


(A) The proportions of the 32 different mutation types for the P. paniscus (PPA), G. gorilla gorilla (GGG), and P. abelii (PAB) populations. (B) The fixation bias B of 32 different mutation types for the PPA, GGG, and PAB populations. Each block represents the value of a statistic across different mutation types (transitions; TS, and GC-changing transversions; TV) for each population. Values are calculated separately based on their flanking context, where the x-axis and y-axis represent the 3ʹ and 5ʹ nucleotides, respectively. Complementary nucleotide contexts are grouped together. AFSs used in the analysis are provided in Supplementary data S4. B values were estimated using the Vogl and Bergman (2015) method.

Discussion

Within this article, we use correlation analysis and population genetic theory to quantify the difference in gBGC dynamics between different types of GC-changing mutations. The basis of our analysis is the estimation of GC frequencies at segregating sites using large population genomic datasets, allowing us to examine the relationship between segregating GC content, recombination, and the gBGC parameter B. We conclude that differences in GC frequency distributions of different mutation types are largely due to the complex relationship between the effective population size, the distribution of different mutation types across the genome, and the features of the recombination map, such as its rate intensity and sex-specificity.

The stronger correlations between the average GC frequency and recombination for transitions, compared to the same correlations for transversions, indicate that transition-segregating sites are more prone to gBGC, a result that is in accordance with early molecular studies of DNA repair (Brown and Jiricny 1988; Wiebauer and Jiricny 1989, 1990; Bill et al. 1998). Generally, MMR is an essential process for maintaining genomic integrity. The most common types of mutations during replication are misincorporations of A or T nucleotides in place of G or C. Therefore, postreplicative MMR machinery may have evolved to preferentially recognize these mismatches and restore them back to G:C pairs. Since mismatches in recombination heteroduplexes are resolved using much of the molecular machinery of postreplicative MMR, recombination-associated gBGC may simply be an indirect consequence of the coevolution between the prevalence of AT-biased mutations and GC-biased MMR during DNA replication. Correspondingly, since transitions are more common than transversions, a higher specificity of MMR for transition-associated mismatches and, consequently, a higher GC bias may be expected. Recent studies of noncrossover (NCO) gene conversions in a Mexican-American population estimated the GC-biased conversion ratio to be 0.7:0.3 for transitions and 0.64:0.36 for transversions (Williams et al. 2015), while a similar study in the Icelandic population resulted in ratios of 0.7:0.3 and 0.68:0.32 for transitions and transversions, respectively (Halldorsson et al. 2016). Although suggestive of a stronger bias for transitions, these differences are not statistically significant. On the other hand, Halldorsson et al. (2016) detected a slightly greater GC bias for CpG SNPs, while no such bias was detected by Williams et al. (2015).
Given that the difference in the increase of average GC frequency per 1 cM/Mb is only ∼2% between the different mutation types (Table 1), we conclude that mutation type-specific repair mechanisms have a limited effect on mutation type-specific gBGC dynamics. A similar conclusion was reached by Glémin et al. (2015), where only 2% of the variance in B estimates was explained by the CpG status of the mutation.
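To put the pedigree-based conversion ratios quoted above on a common scale, the following sketch converts a GC:AT transmission ratio p:(1 − p) into an excess GC transmission b0 = 2p − 1. This parameterization is a common convention that is assumed here for illustration; it is not stated in the cited studies as quoted, and the function and dictionary names are hypothetical.

```python
def conversion_bias(gc_share):
    """Excess GC transmission b0 = 2p - 1 for a GC share p (0.5 means no bias)."""
    if not 0.0 <= gc_share <= 1.0:
        raise ValueError("gc_share must lie between 0 and 1")
    return 2.0 * gc_share - 1.0


# GC shares of the conversion ratios quoted in the text.
REPORTED_GC_SHARES = {
    ("Williams et al. 2015", "transitions"): 0.70,
    ("Williams et al. 2015", "transversions"): 0.64,
    ("Halldorsson et al. 2016", "transitions"): 0.70,
    ("Halldorsson et al. 2016", "transversions"): 0.68,
}

if __name__ == "__main__":
    for (study, mutation_type), share in REPORTED_GC_SHARES.items():
        bias = conversion_bias(share)
        print(f"{study}, {mutation_type}: b0 = {bias:.2f}")
```

With this convention, the quoted ratios correspond to b0 values of 0.40 for transitions in both studies, and 0.28 and 0.36 for transversions, which makes the small size of the transition-transversion gap explicit.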

Our estimates of the B parameter represent gBGC dynamics given the long-term, average Ne of the studied populations. Therefore, if the conversion bias changed between transitions and transversions in the recent past, this may affect the transition-transversion difference derived from human pedigree studies compared to our estimated difference. Since the pedigree studies focused only on NCO conversions, there is a substantial fraction of crossover conversions that remain unexplored, which may further impede the comparison between gBGC dynamics inferred from pedigree data and the population genomic approach presented herein. Additionally, demography and population structure are taken into account using the ry coefficients derived from the neutral AFS to normalize the gBGC AFS. However, this comes with the assumption that segregating frequencies of both neutral sites and sites evolving under gBGC dynamics are equally affected by demography, which can be a somewhat unrealistic assumption, even though simulations show that ry coefficients generally perform well in correcting for nonequilibrium dynamics (Eyre-Walker et al. 2006). It is also important to note that the Vogl and Bergman (2015) and Glémin et al. (2015) methods are preferred in different scenarios. We opted to present B estimates following Vogl and Bergman (2015) given the chromosomal scale at which we infer B and the fact that this method is not affected by misinference of ancestral states. On the other hand, the Glémin et al. (2015) method explicitly models local GC content and is therefore preferred when B is estimated at a finer genomic scale (1 Mb), where GC content is more variable. However, ancestral misinference is a potential confounder of B estimates obtained with this method.

The difference between transition and transversion B estimates could also arise if Ne along the genome varies locally with the distribution of transitions or transversions. Specifically, transversion mutations are likely to be more deleterious in both coding (Zhang 2000) and regulatory regions (Guo et al. 2017), therefore experiencing stronger purifying selection, which would in turn reduce Ne around transversions more than around transitions. To test the potential effect that such Ne reduction has on our B estimates, we exclude sites in coding regions and regions 1000 bp upstream and downstream of genes and redo the per-chromosome B estimation for the YRI population. The B estimates remain very similar for all mutation types (Supplementary Figure S13), likely because mutations in regulatory and coding regions are only a small fraction of all segregating sites that we analyze: only 300,476 (<4%) out of the full 7,714,418 sites were excluded by this analysis.
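The size of the excluded fraction follows directly from the two site counts reported above. The short check below reproduces that arithmetic; the variable and function names are illustrative.

```python
TOTAL_SITES = 7_714_418    # all segregating sites analyzed (from the text)
EXCLUDED_SITES = 300_476   # sites in coding regions or within 1000 bp of genes


def excluded_fraction(excluded, total):
    """Share of sites removed by the filter."""
    return excluded / total


fraction = excluded_fraction(EXCLUDED_SITES, TOTAL_SITES)
remaining = TOTAL_SITES - EXCLUDED_SITES

if __name__ == "__main__":
    print(f"Excluded fraction: {fraction:.2%}")
    print(f"Sites retained: {remaining:,}")
```

The excluded share is just under 4%, consistent with the statement that removing these sites leaves the bulk of the data, and hence the B estimates, essentially unchanged.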

Our analyses show that the major factor in determining the patterns of gBGC dynamics is the historical effective size of the population. This is most evident when comparing per-chromosome B estimates for CpG transitions and non-CpG transversions between human populations, with Africans having higher values compared to Eurasians (Figures 3A and 4 and Supplementary Figure S5). Additionally, great ape populations of the Gorilla and Pongo genera, which have high effective sizes, have the highest B estimates for non-CpG transversions and non-CpG transitions, respectively (Figure 7A). However, there are notable exceptions to this pattern that are likely due to factors other than effective size differences. Specifically, Europeans have higher B estimates for non-CpG transitions compared to both Africans and Asians, while B estimates of CpG transitions for Gorilla and Pongo populations are notably lower compared to other species. Such patterns are likely due to between-species differences in recombination landscapes (Supplementary Figure S2) and/or shifts in distributions and frequencies of different mutation types (Harris and Pritchard 2017). Additionally, the analysis of male and female recombination maps (Table 2) implies that sex-specific meiotic processes are important determinants of gBGC dynamics, as observed previously (Kostka et al. 2012).

When considering gBGC dynamics of GC-changing mutations that have been classified into categories according to their flanking nucleotide context, we uncovered that some mutation categories have negative B values, in opposition to the gBGC hypothesis. This is a surprising result that is likely explained by CpG hypermutability in the case of transversions segregating at CpG sites (Figure 6). However, negative B values for 5ʹACT3ʹ ↔ 5ʹATT3ʹ and 5ʹCCT3ʹ ↔ 5ʹCAT3ʹ mutation types have likely arisen by different mechanisms, as they cannot be explained by CpG hypermutability. Therefore, more investigation is needed into the dependence of gBGC dynamics on flanking nucleotide context.

We illustrate the importance of analyzing population genomic datasets with their corresponding population-specific recombination maps when considering gBGC dynamics. Given the current accumulation of large population genomic datasets that are suitable for recombination map estimation, recombination-associated processes can now be studied at an unprecedented scale, providing us with a better understanding of the effect of fixation biases on the evolution of nucleotide composition across a large variety of species.

Acknowledgments

The authors thank Marjolaine Rousselle and Claus Vogl for critical reading of the manuscript.

Funding

This work was supported by the Novo Nordisk Foundation (grant NNF18OC0031004).

Conflicts of interest

None declared.

Literature cited

  1. 1000 Genomes Project. 2015. A global reference for human genetic variation. Nature. 526:68–74.
  2. Arbeithuber B, Betancourt AJ, Ebner T, Tiemann-Boege I. 2015. Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proc Natl Acad Sci U S A. 112:2109–2114.
  3. Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, et al. 2012. A fine-scale chimpanzee genetic map from population sequencing. Science. 336:193–198.
  4. Besenbacher S, Hvilsom C, Marques-Bonet T, Mailund T, Schierup MH. 2019. Direct estimation of mutations in great apes reconciles phylogenetic dating. Nat Ecol Evol. 3:286–292.
  5. Bhérer C, Campbell CL, Auton A. 2017. Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nat Commun. 8:1–9.
  6. Bill CA, Duran WA, Miselis NR, Nickoloff JA. 1998. Efficient repair of all types of single-base mismatches in recombination intermediates in Chinese hamster ovary cells: competition between long-patch and G-T glycosylase-mediated repair of G-T mismatches. Genetics. 149:1935–1943.
  7. Borges R, Szöllősi G, Kosiol C. 2019. Quantifying GC-biased gene conversion in great ape genomes using polymorphism-aware models. Genetics. 212:1321–1336.
  8. Brown TC, Jiricny J. 1988. Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells. Cell. 54:705–711.
  9. Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. 2013. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet. 9:e1003684.
  10. Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, et al.; The BRIDGES Consortium. 2018. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun. 9:1–13.
  11. Chakraborty U, Alani E. 2016. Understanding how mismatch repair proteins participate in the repair/anti-recombination decision. FEMS Yeast Res. 16:fow071.
  12. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, et al. 2011. The variant call format and vcftools. Bioinformatics. 27:2156–2158.
  13. de Manuel M, Kuhlwilm M, Frandsen P, Sousa VC, Desai T, et al. 2016. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science. 354:477–481.
  14. Dohet C, Wagner R, Radman M. 1985. Repair of defined single base-pair mismatches in Escherichia coli. Proc Natl Acad Sci U S A. 82:503–505.
  15. Duret L, Arndt PF. 2008. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 4:e1000071.
  16. Dutta R, Saha-Mandal A, Cheng X, Qiu S, Serpen J, et al. 2018. 1000 human genomes carry widespread signatures of GC biased gene conversion. BMC Genomics. 19:1–9.
  17. Eyre-Walker A. 1993. Recombination and mammalian genome evolution. Proc Biol Sci. 252:237–243.
  18. Eyre-Walker A, Woolfit M, Phelps T. 2006. The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics. 173:891–900.
  19. Fullerton SM, Carvalho AB, Clark AG. 2001. Local rates of recombination are positively correlated with GC content in the human genome. Mol Biol Evol. 18:1139–1142.
  20. Galtier N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 19:65–68.
  21. Galtier N, Piganeau G, Mouchiroud D, Duret L. 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics. 159:907–911.
  22. Glémin S, Arndt PF, Messer PW, Petrov D, Galtier N, et al. 2015. Quantification of GC-biased gene conversion in the human genome. Genome Res. 25:1215–1228.
  23. Guo C, McDowell IC, Nodzenski M, Scholtens DM, Allen AS, et al. 2017. Transversions have larger regulatory effects than transitions. BMC Genomics. 18:394.
  24. Halldorsson BV, Hardarson MT, Kehr B, Styrkarsdottir U, Gylfason A, et al. 2016. The rate of meiotic gene conversion varies by sex and age. Nat Genet. 48:1377.
  25. Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, et al. 2019. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science. 363:eaau1043.
  26. Harris K. 2015. Evidence for recent, population-specific evolution of the human mutation rate. Proc Natl Acad Sci U S A. 112:3439–3444.
  27. Harris K, Pritchard JK. 2017. Rapid evolution of the human mutation spectrum. eLife. 6:e24284.
  28. Holmes J, Clark S, Modrich P. 1990. Strand-specific mismatch correction in nuclear extracts of human and Drosophila melanogaster cell lines. Proc Natl Acad Sci U S A. 87:5837–5841.
  29. Holmquist GP. 1992. Chromosome bands, their chromatin flavors, and their functional features. Am J Hum Genet. 51:17.
  30. Jiricny J. 2006. The multifaceted mismatch-repair system. Nat Rev Mol Cell Biol. 7:335.
  31. Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, et al. 2017. Whole genome characterization of sequence diversity of 15,220 Icelanders. Sci Data. 4:170115.
  32. Kostka D, Hubisz MJ, Siepel A, Pollard KS. 2012. The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol Biol Evol. 29:1047–1057.
  33. Kramer B, Kramer W, Fritz H-J. 1984. Different base/base mismatches are corrected with different efficiencies by the methyl-directed DNA mismatch-repair system of E. coli. Cell. 38:879–887.
  34. Lachance J, Tishkoff SA. 2014. Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet. 95:408–420.
  35. Levene H. 1960. Robust tests for equality of variances. In: Olkin I, Ghurye SG, Hoeffding W, Madow WG, Mann HB, editors. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. p. 278–292. Stanford University Press, California, USA.
  36. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 25:1754–1760.
  37. Mathieson I, Reich D. 2017. Differences in the rare variant spectrum among human populations. PLoS Genet. 13:e1006581.
  38. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.
  39. McVean GA, Charlesworth B. 1999. A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet Res. 74:145–158.
  40. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, et al. 2004. The fine-scale structure of recombination rate variation in the human genome. Science. 304:581–584.
  41. Meunier J, Duret L. 2004. Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. 21:984–990.
  42. Montoya-Burgos JI, Boursot P, Galtier N. 2003. Recombination explains isochores in mammalian genomes. Trends Genet. 19:128–130.
  43. Munch K, Mailund T, Dutheil JY, Schierup MH. 2014. A fine-scale recombination map of the human–chimpanzee ancestor reveals faster change in humans than in chimpanzees and a strong impact of GC-biased gene conversion. Genome Res. 24:467–474.
  44. Nagylaki T. 1983. Evolution of a large population under gene conversion. Proc Natl Acad Sci U S A. 80:5941–5945.
  45. Nater A, Mattle-Greminger MP, Nurcahyo A, Nowak MG, Manuel MD, et al. 2017. Morphometric, behavioral, and genomic evidence for a new orangutan species. Curr Biol. 27:3487–3498.
  46. Ohta T. 1979. Slightly deleterious mutant substitutions in evolution. Nature. 246:96–98.
  47. Ohta T, Gillespie J. 1996. Development of neutral and nearly neutral theories. Theor Popul Biol. 49:128–142.
  48. Pessia E, Popa A, Mousset S, Rezvoy C, Duret L, et al. 2012. Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol. 4:675–682.
  49. Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, et al. 2013. Great ape genetic diversity and population history. Nature. 499:471–475.
  50. R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  51. Romiguier J, Ranwez V, Douzery EJ, Galtier N. 2010. Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes. Genome Res. 20:1001–1009.
  52. Savatier P, Trabuchet G, Faure C, Chebloune Y, Gouy M, et al. 1985. Evolution of the primate β-globin gene region: High rate of variation in CpG dinucleotides and in short repeated sequences between man and chimpanzee. J Mol Biol. 182:21–29.
  53. Scheet P, Stephens M. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644.
  54. Selker EU, Stevens JN. 1985. DNA methylation at asymmetric sites is associated with numerous transition mutations. Proc Natl Acad Sci U S A. 82:8114–8118.
  55. Seplyarskiy VB, Kharchenko P, Kondrashov AS, Bazykin GA. 2012. Heterogeneity of the transition/transversion ratio in Drosophila and Hominidae genomes. Mol Biol Evol. 29:1943–1955.
  56. Smit A, Hubley R, Green P. 2004. Repeatmasker open-3.0. 1996–2010. http://www.repeatmasker.org.
  57. Spence JP, Song YS. 2019. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv. 5:eaaw9206.
  58. Spies M, Fishel R. 2015. Mismatch repair during homologous and homeologous recombination. Cold Spring Harb Perspect Biol. 7:a022657.
  59. Stephens M, Smith NJ, Donnelly P. 2001. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 68:978–989.
  60. Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, et al. 2015. The time scale of recombination rate evolution in great apes. Mol Biol Evol. 33:928–945.
  61. Su S-S, Lahue RS, Au KG, Modrich P. 1988. Mispair specificity of methyl-directed DNA mismatch correction in vitro. J Biol Chem. 263:6829–6835.
  62. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. 2015. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 31:2032–2034.
  63. Thomas DC, Roberts J, Kunkel T. 1991. Heteroduplex repair in extracts of human HeLa cells. J Biol Chem. 266:3744–3751.
  64. Tortereau F, Servin B, Frantz L, Megens H-J, Milan D, et al. 2012. A high density recombination map of the pig reveals a correlation between sex-specific recombination and GC content. BMC Genomics. 13:586.
  65. Vogl C, Bergman J. 2015. Inference of directional selection and mutation parameters assuming equilibrium. Theor Popul Biol. 106:71–82.
  66. Vogl C, Clemente F. 2012. The allele-frequency spectrum in a decoupled Moran model with mutation, drift, and directional selection, assuming small mutation rates. Theor Popul Biol. 81:197–209.
  67. Wiebauer K, Jiricny J. 1989. In vitro correction of G.T mispairs to G.C pairs in nuclear extracts from human cells. Nature. 339:234.
  68. Wiebauer K, Jiricny J. 1990. Mismatch-specific thymine DNA glycosylase and DNA polymerase beta mediate the correction of GT mispairs in nuclear extracts from human cells. Proc Natl Acad Sci U S A. 87:5842–5845.
  69. Williams AL, Genovese G, Dyer T, Altemose N, Truax K, et al.; on behalf of the T2D-GENES Consortium. 2015. Non-crossover gene conversions show strong GC bias and unexpected clustering in humans. eLife. 4:e04637.
  70. Xue Y, Prado-Martinez J, Sudmant PH, Narasimhan V, Ayub Q, et al. 2015. Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science. 348:242–245.
  71. Zhang J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol. 50:56–68.

Data Availability Statement

All data used in the study are from previously published sources and freely available. Supplementary materials are available at https://github.com/jbergman/gcDynamics.


Articles from Genetics are provided here courtesy of Oxford University Press
