Abstract
Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.
Keywords: genomics, gorillas, natural selection, great apes
Introduction
The Gorilla genus consists of two morphologically distinguishable species, western (Gorilla gorilla) and eastern (G. beringei) gorillas (Grubb et al. 2003), each of which is divided into two recognized subspecies. Eastern gorilla populations occur in lowlands and highlands in the Democratic Republic of Congo, Uganda, and Rwanda, whereas western gorilla populations reside primarily in Cameroon, Equatorial Guinea, Gabon, Congo, and the Central African Republic (Sarmiento 2003). Western gorillas include western lowland gorillas (G. gorilla gorilla), the subspecies with the largest population size (and the main focus of this study), and Cross River gorillas (G. gorilla diehli), of which only a few hundred individuals remain. Eastern gorillas are composed of eastern lowland gorillas (G. beringei graueri) and mountain gorillas (G. beringei beringei), which are found today in only two small isolated subpopulations.
Gorillas are the largest extant nonhuman primates, with male and female western lowland gorilla body weights averaging 170 and 71 kg, respectively (Smith and Jungers 1997). Gorillas also demonstrate the largest sexual dimorphism in body size of any of the apes. This is likely related to their mating system (Plavcan 2001): gorillas exhibit a polygynous structure in which a single dominant male largely controls access to reproduction with a number of adult females. Among the apes, gorillas also demonstrate an unusual diet and digestive anatomy. For example, field studies indicate that the western gorilla diet comprises as many as 230 plant parts from 180 plant species (Rothman et al. 2006). While consuming extremely diverse and large quantities of terrestrial vegetation throughout the year, western gorillas will also regularly eat fruit when it is available (Rogers et al. 2004; Doran-Sheehy et al. 2009). Given their large body size and strictly herbivorous/frugivorous diet, it is not surprising that gorillas have evolved a distinctive digestive anatomy and physiology. Mainly as a result of the large capacity for microbial fermentation in a large pouched colon, gorilla gut anatomy allows for energy gain through the absorption of volatile fatty acids and microbial protein (Stevens and Hume 1995).
Research on demographic events and selective pressures experienced by gorillas may provide insights into the evolutionary forces that have uniquely influenced patterns of gorilla morphological and genetic variation. Both species of gorilla are considered threatened on the IUCN Red List of Threatened Species (IUCN 2013); western gorillas are classified as critically endangered and eastern gorillas are classified as endangered. Recent census estimates indicate a rapid recent population size contraction in gorillas due to multiple factors including outbreaks of the Ebola virus, the bushmeat trade, habitat loss, and fragmentation (Walsh et al. 2003; Anthony et al. 2007; Le Gouar et al. 2009).
There has been considerable effort to estimate split times and population sizes for western and eastern gorillas (Thalmann et al. 2007, 2011; Ackermann and Bishop 2010; Scally et al. 2012; Prado-Martinez et al. 2013). These studies make use of disparate data sets and modeling assumptions, particularly in terms of the treatment of gene flow subsequent to initial population separations. Based on eight microsatellites, Thalmann et al. (2011) estimate that the separation of Cross River and western lowland gorilla populations occurred 17.8 ka, followed by a comparatively high level of gene flow. On the other hand, Prado-Martinez et al. (2013) estimated this population divergence time at 114 ka based on a modified PSMC (pairwise sequentially Markovian coalescent) approach (Note: The above mentioned values have been adjusted to match the mutation rate used in this manuscript where appropriate). The random phasing procedure applied in the modified PSMC approach may not be appropriate for such recent population split times (Prado-Martinez et al. 2013). Moreover, estimates of the separation of eastern gorillas from the western lowland/Cross River gorillas range from about 100 to 450 ka, with varying degrees, lengths, and directions of gene flow (Becquet and Przeworski 2007; Thalmann et al. 2007; Ackermann and Bishop 2010; Mailund et al. 2012; Scally et al. 2012; Prado-Martinez et al. 2013). Additionally, previous studies suggest substructure within the western lowland gorilla species (Clifford et al. 2004; Nsubuga et al. 2010; Scally et al. 2013; Fünfstück et al. 2014).
Few previous studies have analyzed broad patterns of natural selection in the gorilla genome. Scally et al. (2012) found that regions exhibiting accelerated evolution in gorillas, compared with humans and chimpanzees, were most enriched for developmental terms including ear, hair follicle, gonad and brain development, and sensory perception of sound. Most other studies have focused on specific loci hypothesized to be under recent positive selection (e.g., reproductive genes; Good et al. 2013).
In this study, we use a coalescent approach to infer divergence times, rates of gene flow, and effective population sizes based on medium to high coverage whole-genome data from three gorilla subspecies: Western lowland, Cross River, and eastern lowland. We use a diffusion approximation approach to infer temporal changes in western lowland gorilla effective population size, conduct a genome scan for positive selection to identify signatures of recently completed selective sweeps, and investigate the nature of genetic changes within those regions to identify putative targets of selection. Additionally, we analyze the overall distribution of fitness effects (DFE) for polymorphic sites in western lowland gorillas and the proportion of substitutions compared with humans that have been driven by adaptive evolution versus random genetic drift.
Results
Gorilla Population Structure
Whole-genome sequence data from 17 gorillas, including 14 western lowland gorillas, 2 eastern lowland gorillas, and 1 Cross River gorilla, were aligned to the gorGor3 (Ensembl release 62) reference genome and processed with filtering as previously described (Prado-Martinez et al. 2013; supplementary table S1, Supplementary Material online). We limited our analysis to samples without evidence of inter- and intraspecies sequence contamination and characterized patterns of genetic variation based on single nucleotide polymorphism (SNP) genotypes obtained using the GATK Unified Genotyper, limited to sites with at least 8-fold (8×) coverage in all samples. Across the autosomes, we observe that eastern lowland gorillas have the lowest heterozygosity (5.62–5.69 × 10−4) of all the groups studied here, followed by the single Cross River sample (9.09 × 10−4) and the 14 western lowland gorillas (1.2–1.6 × 10−3) (supplementary fig. S1, Supplementary Material online). We used principal components analysis (PCA) (Patterson et al. 2006) and ADMIXTURE (Alexander et al. 2009) to further explore relationships among the samples. As previously observed (Prado-Martinez et al. 2013), when considering all samples together, PC1 shows clear separation of eastern and western gorillas with western lowland and Cross River gorillas arrayed along PC2 (supplementary fig. S2, Supplementary Material online). PCA performed on only the western lowland gorilla samples does not reveal clear population clusters, although the individuals are somewhat ordered by sample geography (supplementary fig. S3, Supplementary Material online). Results from ADMIXTURE, a model-based clustering algorithm allowing for mixed ancestry, support the existence of two clusters dividing eastern and western lowland gorillas (supplementary fig. S4, Supplementary Material online). When ADMIXTURE is applied to only the 14 western lowland gorilla samples, K = 1 has the lowest cross-validation (CV) error (supplementary fig. S5, Supplementary Material online). Moreover, the appearance of a cline at K = 2 suggests a poor fit of the data to an admixture model with discrete sources, and PCA does support substructure in the form of a cline in western lowland gorillas (supplementary fig. S3, Supplementary Material online).
Relationship between Western Lowland and Eastern Lowland Gorillas
We applied Generalized Phylogenetic Coalescent Sampler (G-PhoCS), a Bayesian coalescent-based approach, to infer ancestral population sizes, divergence times, and rates of gene flow (Gronau et al. 2011) among the three gorilla subspecies. This inference is based on genealogies inferred at many independent and neutrally evolving loci across the autosomal genome. To avoid bias caused by the alleles represented in the reference genome, which is derived from a western gorilla, we used BSNP (Gronau et al. 2011), a reference-genome-free Bayesian genotype inference algorithm, to perform variant calling separately for each sample. Based on the BSNP output, we produced diploid sequence alignments of two eastern lowland gorillas, nine western lowland gorillas, one Cross River gorilla, and the human reference genome at 25,573 “neutral loci” with size approximately 1 kb and an interlocus distance of approximately 50 kb. The neutral loci were chosen based on the positions of putatively neutral loci previously utilized for humans (Gronau et al. 2011), but further filtered to remove loci that intersected with exons, conserved elements, recent transposable elements, and recent segmental duplications in the gorilla genome.
For many of the analyses presented here, we used a four-population phylogeny as inferred by TreeMix (Pickrell and Pritchard 2012) and in agreement with our ADMIXTURE results (supplementary figs. S6 and S7, Supplementary Material online), with eastern and western gorilla ancestors separating first, followed by western lowland and Cross River gorilla (fig. 1). We first evaluated four alternative scenarios, having either no gene flow between any gorillas (fig. 1, scenario 1) or bidirectional gene flow between any two gorilla species (fig. 1, scenarios 2–4). In G-PhoCS, gene flow is modeled using migration bands of constant migration rate between two lineages over the entire time period of their existence. We utilized several combinations of western lowland gorilla samples, always including two eastern lowland gorillas, two western lowland gorillas, one Cross River gorilla, and one human. We initially ran G-PhoCS for 50,000 iterations and monitored convergence using Tracer (Rambaut et al. 2013). Estimates of population split times are sensitive to model assumptions, particularly gene flow. Our G-PhoCS analysis finds no evidence of migration events between western lowland and Cross River gorillas (supplementary fig. S8, scenario 2, Supplementary Material online). We do observe evidence of gene flow from western lowland gorilla to eastern lowland gorilla with mean total migration rate 0.3 (95% CI: 0.240–0.356), equivalent to 0.37 migrants per generation (95% CI: 0.312–0.433) (supplementary fig. S8, scenario 4, Supplementary Material online). We also observe a small signal of gene flow from Cross River gorilla to eastern lowland gorilla (supplementary fig. S8, scenario 3, Supplementary Material online); however, 50,000 iterations were not sufficient for convergence. To further explore these results, we tested a scenario with two migration bands: One from western lowland to eastern lowland gorilla and another from Cross River to eastern lowland gorilla (fig. 1, scenario 5), and extended the number of iterations to 300,000 to allow the posterior estimates to fully converge (supplementary fig. S9, Supplementary Material online). Setting an additional migration band from Cross River to eastern lowland gorilla makes little difference because migration from western lowland to eastern lowland gorilla has the strongest migration signal (supplementary figs. S10–S13, Supplementary Material online). The estimated migration rate from Cross River to eastern lowland gorilla is 0.004 (95% CI: 0.000–0.018), equivalent to 0.019 migrants per generation (95% CI: 0.000–0.071). By using this setting (fig. 1, scenario 5), we estimate the split time between western lowland gorilla and Cross River gorilla to be approximately 68 ka, and the split time between eastern lowland and western ancestral gorilla to be approximately 261 ka when assuming a human and gorilla divergence time of 12 Ma (Scally et al. 2012) (table 1). We also observed decreases in both western and eastern gorilla population sizes after their initial split and a 6-fold difference between current eastern and western gorilla population sizes. The relative sizes of the gorilla populations are rather robust to the chronological human/gorilla split time used for calibration, although the absolute population sizes and split dates are sensitive to this assumption because they are scaled by the calibration date (table 1).
Table 1.
Parameter | Human–Gorilla Divergence Time: 8 Ma | Human–Gorilla Divergence Time: 10 Ma | Human–Gorilla Divergence Time: 12 Ma
---|---|---|---
Mutation rate per generation without CpG (×10−8) | 1.461 (1.456–1.466) | 1.169 (1.165–1.173) | 0.974 (0.970–0.978)
Eastern gorilla population size (×103) | 2.853 (2.755–2.956) | 3.566 (3.443–3.696) | 4.280 (4.132–4.435)
Western gorilla population size (×103) | 16.774 (13.114–21.439) | 20.967 (16.393–26.798) | 25.161 (19.672–32.158)
Cross River gorilla population size (×103) | 2.054 (2.352–2.755) | 2.567 (2.940–3.443) | 3.080 (3.529–4.132)
Western–Cross River ancestral population size (×103) | 20.462 (17.294–24.191) | 25.578 (21.617–30.239) | 30.693 (25.940–36.287)
Gorilla ancestral population size (×103) | 26.500 (25.829–26.965) | 33.126 (32.286–33.706) | 39.751 (38.743–40.447)
Human–gorilla ancestral population size (×103) | 45.472 (44.349–46.608) | 56.840 (55.437–58.259) | 68.208 (66.524–69.911)
Western–Cross River split time (Ma) | 0.046 (0.038–0.056) | 0.057 (0.048–0.070) | 0.068 (0.057–0.084)
Eastern–Western–Cross River ancestral split time (Ma) | 0.174 (0.161–0.194) | 0.218 (0.201–0.243) | 0.261 (0.242–0.292)
Note.—Population history estimates by using G-PhoCS when assuming a range of human–gorilla divergence time (8, 10, and 12 Ma). We assumed migration events from western lowland to eastern lowland gorilla and from Cross River to eastern lowland gorilla (fig. 1, scenario 5). Values in parentheses correspond to 95% credible intervals.
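The proportional pattern across the columns of table 1 can be illustrated with a short sketch. It assumes, as our reading of the table rather than a statement of the method, that population sizes and split times scale linearly with the assumed human–gorilla divergence time, whereas the per-generation mutation rate scales inversely. The function name `rescale` is hypothetical.

```python
def rescale(value, from_ma, to_ma, inverse=False):
    """Rescale a calibrated estimate between assumed divergence times (in Ma).

    Sizes and split times are multiplied by to_ma / from_ma; the mutation
    rate (inverse=True) is divided by that factor. This linear scaling is an
    assumption that matches the point estimates in table 1.
    """
    factor = to_ma / from_ma
    return value / factor if inverse else value * factor


# Point estimates taken from the 12 Ma column of table 1.
western_size_12 = 25.161   # x10^3 individuals
eastern_split_12 = 0.261   # Ma
mutation_rate_12 = 0.974   # x10^-8 per generation, without CpG

for target in (8, 10):
    print(
        target,
        round(rescale(western_size_12, 12, target), 3),
        round(rescale(eastern_split_12, 12, target), 3),
        round(rescale(mutation_rate_12, 12, target, inverse=True), 3),
    )
```

Under this assumption, ratios between populations (for example, western versus eastern size) are unchanged by the choice of calibration, which is why the relative sizes are robust while the absolute values are not.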
Western Gorilla Demographic Inference
We additionally inferred the fine-scale population history of western lowland gorillas using the genome-wide site frequency spectrum (SFS) obtained from 14 individuals (Gutenkunst et al. 2009). We utilized a diffusion approximation for demographic inference (∂a∂i) on the unfolded SFS based on 4,554,752 SNPs, only considering sites where all samples had at least 8× coverage. Variants were polarized to ancestral and derived alleles based on human outgroup sequences, and we implemented a context-dependent correction for ancestral misidentification (Hernandez et al. 2007). Five demographic models were fit using ∂a∂i; inferring the best-fit demographic model requires us to assess whether the improvement in fit afforded by the additional parameters of more complex models is justified (table 2). Although the bottleneck followed by exponential growth model and the three-epoch model have similar fits, the three-epoch model has the best fit; moreover, the model selection is robust when SNPs are thinned to 100 kb. Our results suggest that an ancient expansion followed by a more recent, drastic 5.6-fold population contraction is the best model for the data. Specifically, assuming a mutation rate of 1.1 × 10−8 per base pair per generation (Roach et al. 2010) and generation time of 19 years (Langergraber et al. 2012), the best-fit model is a three-epoch model that has an ancestral effective population size of 31,800 (95% CI: 30,690–32,582) (table 2). The first size change event occurred 969,000 years ago (95% CI: 764,074–1,221,403) and increased the effective population size to 44,200 (95% CI: 42,424–46,403) individuals. The second size change event occurred 22,800 years ago (95% CI: 16,457–30,178) and decreased the effective population size to 7,900 (95% CI: 6,433–9,240) individuals (fig. 2).
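The model comparison can be sketched with the standard Akaike information criterion, AIC = 2k − 2 ln L. The log-likelihoods below are those reported in table 2; the parameter counts k are our assumption (the number of free size and time parameters in each model, with theta excluded), chosen because they reproduce the reported AIC values.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# (log-likelihood from table 2, assumed number of free parameters)
models = {
    "Standard neutral": (-60420, 0),
    "Exponential growth": (-6222, 2),
    "Bottleneck, then exponential growth": (-578, 3),
    "Two epochs": (-5654, 2),
    "Three epochs": (-473, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score}")
print("Lowest AIC:", best)
```

The extra parameter of the three-epoch model costs 2 AIC units relative to the bottleneck-growth model, which is small compared with the gain in log-likelihood.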
Table 2.
Demographic Model | Row | Theta/Ancestral Pop Size | P1 | T1 | P2 | T2 | Log-Likelihood | AIC
---|---|---|---|---|---|---|---|---
Standard neutral | Program output | 1,167,204 | | | | | −60,420 | 120,840
Standard neutral | Converted | 32,643 | | | | | |
Exponential growth | Program output | 1,299,805 | 0.09 | 0.009 | | | −6,222 | 12,448
Exponential growth | Converted | 36,352 | 3,272 | 12,432 | | | |
Bottleneck, then exponential growth | Program output | 1,181,405 | 39.54 | | 0.33 | 0.32 | −578 | 1,162
Bottleneck, then exponential growth | Converted | 33,040 | 1,306,416 | | 10,903 | 401,771 | |
Two epochs | Program output | 1,297,300 | 3.4e-13 | 1.2e-14 | | | −5,654 | 11,312
Two epochs | Converted | 36,282 | 0 | 0 | | | |
Three epochs | Program output | 1,136,249 | 1.391 | 0.785 | 0.249 | 0.019 | −473 | 954
Three epochs | Converted | 31,777 | 44,190 | 946,129 | 7,905 | 22,842 | |
Note.—For each model, the first row contains the program parameter output and the second row contains the conversion into effective population sizes and years. P1, first population size change; T1, length of bottleneck; P2, second size change; T2, time of second size change. For the conversion, a mutation rate of 1.1e-8 mutations per base pair per generation and a 19-year generation time were used. The total number of callable sites is 812,645,853.
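The conversion described in the note can be sketched as follows for the three-epoch model. The sketch assumes the usual ∂a∂i scaling, in which theta = 4 × Nanc × mutation rate × callable sites, sizes are expressed relative to Nanc, and times are in units of 2 × Nanc generations. Function names are hypothetical, and because the inputs are the rounded values printed in table 2, the results only approximate the converted row.

```python
MUTATION_RATE = 1.1e-8        # per base pair per generation (from the note)
GENERATION_TIME = 19          # years (from the note)
CALLABLE_SITES = 812_645_853  # from the note


def ancestral_size(theta):
    """Ancestral effective size, assuming theta = 4 * N_anc * mu * L."""
    return theta / (4 * MUTATION_RATE * CALLABLE_SITES)


def to_individuals(relative_size, n_anc):
    """Convert a size expressed relative to N_anc into individuals."""
    return relative_size * n_anc


def to_years(scaled_time, n_anc):
    """Convert a time in units of 2 * N_anc generations into years."""
    return 2 * n_anc * scaled_time * GENERATION_TIME


# Three-epoch program output from table 2 (rounded values).
n_anc = ancestral_size(1_136_249)
size_after_expansion = to_individuals(1.391, n_anc)
expanded_epoch_years = to_years(0.785, n_anc)
size_after_contraction = to_individuals(0.249, n_anc)
contraction_years_ago = to_years(0.019, n_anc)

print(round(n_anc), round(size_after_expansion), round(size_after_contraction))
print(round(expanded_epoch_years), round(contraction_years_ago))
# The first size change is dated by summing the two epoch lengths.
print(round(expanded_epoch_years + contraction_years_ago))
```

Summing the two converted times (946,129 + 22,842 years in table 2) gives the date of roughly 969,000 years ago quoted for the first size change.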
Nsubuga et al. (2010) and Fünfstück et al. (2014) found evidence for multiple population clusters within western lowland gorillas utilizing data from simple sequence repeats (SSR; microsatellite) variation. Though ADMIXTURE results from our data find the lowest CV error with a one-population model, we also inferred demography separately for individuals on either side of the putative cline (supplementary table S2, Supplementary Material online). Both sample sets yield very similar demographic inferences compared with those obtained from the combined set of 14 individuals.
Selection in Western Lowland Gorillas
Identifying Selective Sweeps
We employed a composite likelihood approach (SweeD, Pavlidis et al. 2013) to scan for genomic regions showing signs of recent selective sweeps (Nielsen et al. 2005). The method compares the regional SFS with the background SFS to calculate a composite likelihood ratio (CLR), which indicates the likelihood of a sweep at a specific genomic region (in 100-kb windows). Significance was determined by comparisons to neutral regions (without selection) simulated in the software ms (Hudson 2002) with the inferred three-epoch demography. Genomic windows were compared with simulated regions with similar estimated recombination rate and percent of sequence masked. For the autosomes, these analyses identified 273 windows of size 100 kb with P < 10−3. With a more stringent P value cutoff a subset of the windows are identified: 111 windows of size 100 kb with P < 10−4 (supplementary table S3, Supplementary Material online). A total of 50 windows had a P value < 10−5, indicating that the CLR of these windows surpassed every CLR obtained from the simulated neutral distribution. As some of the 50 windows were adjacent to each other, these correspond to 43 distinct regions where the observed CLR value exceeded that obtained from 100,000 neutral simulations.
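The two bookkeeping steps in this paragraph, assigning an empirical P value from neutral simulations and collapsing adjacent significant windows into regions, can be sketched as below. This is a simplified illustration, not the pipeline used here: it ignores the matching of windows to simulations by recombination rate and masked fraction, the function names are hypothetical, and the inputs are toy values.

```python
from bisect import bisect_left


def empirical_p(observed_clr, sorted_null_clrs):
    """Fraction of simulated neutral CLR values that are >= the observed CLR.

    A result of 0.0 means the observed value exceeded every simulation, so
    with n simulations it can only be reported as P < 1/n.
    """
    n = len(sorted_null_clrs)
    n_greater_equal = n - bisect_left(sorted_null_clrs, observed_clr)
    return n_greater_equal / n


def merge_adjacent(windows, size=100_000):
    """Merge abutting (chrom, start) windows into (chrom, start, end) regions."""
    regions = []
    for chrom, start in sorted(windows):
        if regions and regions[-1][0] == chrom and regions[-1][2] == start:
            # Extend the previous region when this window starts where it ends.
            regions[-1] = (chrom, regions[-1][1], start + size)
        else:
            regions.append((chrom, start, start + size))
    return regions


# Toy example with made-up CLR values and window coordinates.
null_clrs = sorted([0.5, 1.0, 2.0, 4.0])
print(empirical_p(3.0, null_clrs))
print(merge_adjacent([("chr5", 0), ("chr5", 100_000), ("chr5", 300_000)]))
```

With 100,000 simulations, a window whose CLR exceeds every simulated value has an empirical P value bounded by 10−5, which is the threshold used for the 50 top windows.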
The region with the largest CLR in the western lowland gorilla genome is located on chromosome 5 (fig. 3). This region consists of four adjacent 100-kb windows with P value < 10−5 (chr5: 122,465,120–122,864,624). There are several genomic features in this top-scoring 400-kb region, including the protein coding genes CTNNA1, SIL1, and MATR3 as well as other noncoding features, including 5S rRNA, U6 snRNA, and SNORA74. The region contains three nonsynonymous SNPs which pass the quality filtering but do not have coverage to pass the 8× depth filter: One coding change in CTNNA1, a cadherin-associated protein, and two in SIL1, a nucleotide exchange factor which interacts with heat-shock protein-70. We also note that this region is directly upstream of SLC23A1, a vitamin C transporter and PAIP2, a repressor of polyadenylate-binding protein PABP1. PAIP2 acts as part of innate defense against cytomegalovirus (CMV) (McKinney et al. 2013), which has been detected in wild gorilla populations (Leendertz et al. 2009).
Another region identified (at P < 10−4) contains several genes involved in taste reception. Although in the 8× data set, there is one nonsynonymous SNP in TAS2R20, our full SNP data set contains segregating nonsynonymous changes in three of the taste receptors, including one change in TAS2R50 (derived allele frequency [DAF] = 93%), three in TAS2R20 (DAFs = 89%, 11%, and 7%), and two in TAS2R19 (DAFs = 7% and 4%).
We conducted a gene ontology (GO) enrichment analysis of all regions with P < 10−3 using the Bioconductor package topGO (Alexa and Rahnenfuhrer 2010) to identify gene pathways subjected to recent selective sweeps in western lowland gorillas. Using the elimination method with Fisher’s exact test, we identified 16 enriched GO categories (P < 0.01) (supplementary table S4, Supplementary Material online). The term with the lowest P value is sodium ion transmembrane transport (GO:0035725, P = 0.00039); terms related to taste, pancreatic and saliva secretion, cardiac muscle cell function, and several others were also identified.
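The core of such an enrichment test can be sketched as a one-sided Fisher's exact test, which is a hypergeometric tail probability. This is a minimal illustration only: it does not implement the elimination step of topGO, which additionally accounts for the GO graph structure, and the function name and gene counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(hits, selected, annotated, total):
    """One-sided Fisher's exact P value for GO term enrichment.

    hits      -- genes in swept regions annotated with the term
    selected  -- all genes in swept regions
    annotated -- all genes in the background annotated with the term
    total     -- all genes in the background
    Returns P(X >= hits) under the hypergeometric distribution.
    """
    upper = min(selected, annotated)
    tail = sum(
        comb(annotated, i) * comb(total - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(total, selected)


# Toy counts: 10 background genes, 4 annotated, 3 selected, 2 of them annotated.
print(fisher_enrichment_p(hits=2, selected=3, annotated=4, total=10))
```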
Distribution of Fitness Effects
We estimated α, the fraction of nonsynonymous mutations to reach fixation due to adaptive evolution, through the method outlined in Keightley and Eyre-Walker (2012). This method utilizes the synonymous and nonsynonymous SFS, as well as divergence relative to an outgroup, to simultaneously infer demography and the DFE assuming a gamma distribution. Using the human reference genome as an outgroup we estimate α for western lowland gorillas to be 1.4% (95% CI: −11.6% to 11.0%). The DFE has a shape parameter of 0.152 and mean Nes of 3,076. This distribution is leptokurtic, with a strong peak near zero and a long negative tail that extends to lethality, indicating that the vast majority of fixed and segregating nonsynonymous variants are nearly neutral.
Discussion
Relationship between Western Lowland, Cross River, and Eastern Lowland Gorillas
Several other studies have made use of disparate data sets and modeling assumptions to estimate population split times, sizes, and levels of gene flow for different gorilla species. The estimates described in this manuscript are broadly consistent with previous studies, but there are some differences (supplementary table S5, Supplementary Material online). Our estimate of 68 ka for the Cross River–western lowland split is intermediate between the previous estimates; however, we do not find support for gene flow between these two groups in our G-PhoCS analysis. Two main caveats apply to this analysis. First, in our G-PhoCS model, estimates of gene flow and population split time are confounded because western lowland and Cross River gorilla are sister species. Second, immediately following a gene flow event the variance in individual ancestry proportions across a population is large, with ancestry proportions becoming more uniform over time (Gravel 2012). As our analysis utilized a single Cross River sample, by chance we may have missed signals associated with very recent gene flow.
We estimate that the separation of eastern gorillas from the western lowland/Cross River ancestor occurred 261 ka, with subsequent gene flow from both western lowland and Cross River populations to the eastern gorillas. This value is similar to the 214 ka split time inferred by the modified PSMC approach. Scally et al. (2012), based on a model of symmetric gene flow, estimated a separation time of 429 ka. Mailund et al. (2012) arrive at a broadly similar estimate based on a coal-HMM, and estimate gene flow continuing until 150 ka. We note that our analysis indicates that the direction of gene flow was from western lowland and Cross River to eastern gorillas, with a higher rate from western lowland than from Cross River gorilla. However, Thalmann et al. (2007) find evidence for gene flow from eastern to western gorillas. Alternatively, Ackermann and Bishop (2010) find support for western-to-eastern gene flow in morphological and molecular data. One way to assess evidence for gene flow is through the use of D statistics, which provide a formal test for the fit of an unrooted tree to the data (Kulathinal et al. 2009; Green et al. 2010; Durand et al. 2011; Patterson et al. 2012). Excessive allele sharing not accounted for by the population tree is evidence in support of gene flow among the studied populations. The D statistics calculated in Prado-Martinez et al. (2013) suggest that Cross River gorillas are genetically closer to eastern gorillas than western lowland gorillas are to eastern gorillas, which would not be predicted by the gene flow values we infer. We further explored this apparent contradiction by calculating D statistics for additional samples from Prado-Martinez et al. (2013) and using variants identified by BSNP based on mapping to the gorilla reference genome (supplementary table S6, Supplementary Material online). 
The western lowland gorilla sample A934_Delphi is not included in this study as it contains low-level contamination from a bonobo (Prado-Martinez et al. 2013). Consistent with this potential contamination, A934_Delphi shows an extreme value for the D statistic relative to other western gorillas; however, significant statistics are also obtained when using other samples (supplementary table S6A, Supplementary Material online). We do not observe significant D statistics for genotypes calculated from reads mapped to the gorilla reference genome using BSNP (supplementary table S6B, Supplementary Material online). Additional Cross River samples, as well as new analytic approaches that take advantage of the additional information contained in physically phased genome sequences (Schiffels and Durbin 2014), may shed further light on patterns of gene flow among extant gorilla species.
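The logic of the D statistic discussed above can be sketched with a simple count of ABBA and BABA site patterns. This is a minimal illustration under stated assumptions: one allele is sampled per population at each biallelic site, the outgroup allele is taken as ancestral, and the population labels P1, P2, P3, and O (for example, western lowland, Cross River, eastern, and human) are our own illustrative assignment. Assessing significance, typically by a block jackknife, is not shown.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    sites -- iterable of (p1, p2, p3, outgroup) alleles at biallelic sites.
    ABBA: P2 and P3 share the derived allele; BABA: P1 and P3 share it.
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p3 == outgroup:
            continue  # P3 carries the ancestral allele: not informative
        if p1 == outgroup and p2 == p3:
            abba += 1
        elif p2 == outgroup and p1 == p3:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Toy data: six ABBA sites, two BABA sites, one uninformative site.
toy_sites = (
    [("A", "G", "G", "A")] * 6
    + [("G", "A", "G", "A")] * 2
    + [("G", "G", "A", "A")]
)
print(d_statistic(toy_sites))
```

Under a tree with no gene flow, ABBA and BABA patterns arise equally often from incomplete lineage sorting, so D is expected to be near zero; a positive value in this labeling indicates excess allele sharing between P2 and P3.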
Western Gorilla Demographic Inference
Given the availability of 14 western lowland gorilla samples, we estimated a single-population demographic history using ∂a∂i. Due to limited sample size, our model does not incorporate other subspecies/species. Our ∂a∂i analysis indicates that western lowland gorillas have undergone a small, ancient population size expansion event 970 ka followed by a drastic size reduction 23 ka. These results are broadly concordant with previous estimates of temporal population size change in gorillas based on the PSMC model (Prado-Martinez et al. 2013) (supplementary fig. S14, Supplementary Material online), especially given that PSMC is known to smooth instantaneous size changes. We note that the ancient increase predates our estimate for the separation of eastern and western gorillas, and the recent size decrease postdates our estimate of the Cross River–western lowland separation. The underlying causes of these effective population size changes are unclear. Previous studies note that glacial and interglacial oscillations during the last 2 My may have had an effect on gorilla population size and structure (Thalmann et al. 2007). For example, during the Last Glacial Maximum, rainforest cover was greatly diminished, especially in West Africa where a few refugia were surrounded by tropical grassland (Jolly et al. 1997).
Previous studies suggest substructure within the western lowland gorilla species (Clifford et al. 2004; Nsubuga et al. 2010; Scally et al. 2013; Fünfstück et al. 2014), but our results support the use of a one-population model of western lowland gorillas (though there may be some subtle isolation-by-distance or demic structure). Earlier studies that involved analysis of SSR motifs (DNA microsatellites) provided some indications of substructure within western lowland gorillas (Nsubuga et al. 2010; Fünfstück et al. 2014). Although a slower evolving set of markers, such as SNPs, can identify expansion from a common ancestor and imply demographic changes over tens of thousands of generations, more rapidly evolving microsatellite loci can reveal more recent aspects of gene flow and population substructure. The gorillas utilized in this study have diverse origins; however, some origins cannot be precisely confirmed. PCA and ADMIXTURE analysis support grouping of samples into one population for ∂a∂i analysis. Additionally, models inferred separately on subsets of the data yielded concordant results (supplementary table S2, Supplementary Material online).
In addition to the inferred decline in gorilla effective population size, census estimates note that the gorilla population has declined by more than 60% in the past 20–25 years, prompting their “critically endangered” conservation status (IUCN 2013). This decrease is thought to be due predominantly to Ebola outbreaks and commercial hunting (Walsh et al. 2003; Le Gouar et al. 2009). This sharp decline is much too recent to be observed in our analysis given the data set available.
Natural Selection in Western Lowland Gorillas
Identifying Recent Selective Sweeps
One of the goals of this study was to identify regions that have been under recent positive selection in the western gorilla genome. Our analysis complements previous results (e.g., Scally et al. 2012) by inferring regions with significant recent signs of selective sweeps within population-level full-genome western lowland gorilla data. GO enrichment analysis identified sensory perception of taste (GO:0050909) as one of the most significantly enriched categories in the genome. This term has also been identified as enriched in selected genomic regions in mammalian genomes (Kosiol et al. 2008). All identified genes (GOGO-T2R14, TAS2R19, TAS2R20, and TAS2R50) are type 2 taste receptors, which are thought to be responsible for bitter taste perception in humans (Adler et al. 2000; Chandrashekar et al. 2000; Matsunami et al. 2000). Bitter taste receptors are thought to be important for avoiding harmful substances, and have been predicted to have undergone an extensive gene expansion in mammalian evolution (Go 2006). Gorilla diets are eclectic and highly selective, consisting of a wide array of plant species, fruits, and some insects. Variation in dietary composition across well-studied sites has been noted, as well as seasonal variation within sites (Rogers et al. 2004). Their large body size and large colons with the presence of many cellulose-digesting ciliates, as well as their hindgut fermentation digestive strategy, assist dietary flexibility and consumption of difficult-to-digest foods, such as bark (Remis 2004; Remis and Dierenfeld 2004). Investigating the influence of heritable aspects of taste perception on dietary preferences in gorilla populations has not previously been possible. Our genomic studies suggest that bitter taste receptor variation merits consideration as a factor in dietary variation and feeding ecology in gorilla populations and warrants further investigation. 
Furthermore, among the top 16 most enriched GO terms are terms involving cardiac muscle function and fibroblast apoptosis. Interestingly, cardiomyopathy involving fibrotic proliferation is a prominent cause of death in captive gorillas, particularly in males (Schulman et al. 1995).
This study focused specifically on selective sweeps. There is much interest in identifying regions under balancing selection, for example between humans and chimpanzees (Leffler et al. 2013). We did not attempt to identify regions under balancing selection due to the sensitivity of the tests to the quality of the reference genome, the depth of coverage of sequencing, and other filtering parameters. We hope that these data will provide a basis for continuing studies of balancing selection.
Distribution of Fitness Effects
We found the DFE and the rate of fixation of adaptive mutations to be similar to, but lower than, estimates in humans (Boyko et al. 2008). This result is counterintuitive given that western lowland gorillas have a larger effective population size than humans. Because the mean gamma (Nes) is lower in gorillas and Ne is higher, we infer the magnitude of E(s) to be considerably smaller in gorillas than in humans. Our best-fit demographic model is a three-epoch model, whereas DFE-alpha utilizes a two-epoch demographic model to estimate the proportion of adaptive nonsynonymous substitutions in the genome. Messer and Petrov (2013) have previously shown that the approach invoked by DFE-alpha generally correctly recovers α; however, Veeramah et al. (2014) demonstrated that DFE-alpha can substantially underestimate the true Nes because of background selection acting at linked sites. In addition, the strength of any selection, gamma, acting at synonymous sites, which are taken as putatively neutral in this approach, is likely to be larger in gorillas due to their larger effective population size, potentially further distorting estimates of the DFE. As such, although our estimate of α may be quite robust, the reliability of the DFE estimate is more uncertain.
Conservation Implications
Conservation of wild gorilla populations in their habitats will benefit from focused efforts to protect populations that collectively encompass the genetic diversity of each species and subspecies. Identification of gene flow that occurred in the past between populations provides impetus for landscape-level conservation plans to provide for migration corridors inferred from genetic data. Managed populations of gorillas in zoos benefit from veterinary care that, increasingly, may benefit from medical approaches based on genetic information. Cardiac disease is a major mortality factor in managed gorilla populations (McManamon and Lowenstine 2012). The opportunities to provide supportive care based on an understanding of the evolutionary similarities and differences in cardiac development and physiology between gorillas and humans can contribute to the welfare of managed gorilla populations, while also providing insights into the evolution of loci associated with cardiac disease risk in humans.
Materials and Methods
Samples
Samples without evidence of sequence read contamination from unrelated western lowland gorillas (n = 14), eastern lowland gorillas (n = 2), and a Cross River gorilla (n = 1) were mostly obtained from blood from wild-caught zoo specimens (supplementary table S1, Supplementary Material online) (Prado-Martinez et al. 2013). All samples were sequenced on an Illumina sequencing platform (HiSeq 2000) with data production at three different sequencing centers; samples were sequenced to 12.7–42.1× coverage. Samples were collected under the supervision of ethical committees and CITES permissions were obtained as necessary. Sequence reads are available from the SRA under accession SRP018689.
Mapping to Gorilla Reference Assembly
Sequences were mapped to gorGor3 and filtered as detailed in Prado-Martinez et al. (2013). Variants were identified in three pools of samples: the 14 western lowland gorillas, the 2 eastern gorillas, and the 1 Cross River gorilla sample. To compare variant calls among sample sets, we generated genome masks that identified all sites that were callable across all samples. Filters were calibrated such that we captured 90% of sites that passed the VQSR procedure (Prado-Martinez et al. 2013). For western lowland gorillas, the filters correspond to a total sample read depth (DP) ≥95 and ≤307, mapping quality (MQ) ≥39, and percent of reads with mapping quality 0 (MQ0fraction) ≤3. For eastern lowland gorillas, the criteria were DP ≥12 and ≤37, MQ ≥33, and MQ0fraction ≤4. For the Cross River gorilla, the criteria were DP ≥5 and ≤24, MQ ≥38, and MQ0fraction ≤0. For each sample set, we additionally removed sites within 5 bp of called indels, and removed all positions overlapping with segmental duplications (Sudmant et al. 2013). For analysis of the SFS, we additionally imposed a minimum depth criterion of eight to increase accuracy at singleton sites. For G-PhoCS analysis, variants were identified for each sample independently using BSNP to avoid bias induced by the reference genome and from population-level genotype calling. The genotype coordinates were then converted from gorGor3 (Ensembl release 62) to gorGor3.1 (Ensembl release 64) using a custom script.
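The per-sample-set thresholds above can be expressed as a small site filter. The following is a minimal sketch in Python, not the pipeline used in the study: the class and function names are hypothetical, and it assumes that a site counts as callable only when it passes the thresholds of all three sample sets. The indel and segmental duplication filters are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteFilter:
    """Depth and mapping-quality thresholds for one sample set."""
    min_dp: int
    max_dp: int
    min_mq: float
    max_mq0_fraction: float  # percent of reads with mapping quality 0

    def passes(self, dp, mq, mq0_fraction):
        return (self.min_dp <= dp <= self.max_dp
                and mq >= self.min_mq
                and mq0_fraction <= self.max_mq0_fraction)


# Thresholds quoted in the text; the dictionary keys are illustrative labels.
FILTERS = {
    "western_lowland": SiteFilter(95, 307, 39, 3),
    "eastern_lowland": SiteFilter(12, 37, 33, 4),
    "cross_river": SiteFilter(5, 24, 38, 0),
}


def callable_in_all(site_stats):
    """site_stats maps each sample-set label to (DP, MQ, MQ0fraction)."""
    return all(FILTERS[name].passes(*site_stats[name]) for name in FILTERS)
```

A site would then be kept in the shared genome mask only if `callable_in_all` returns `True` for its per-set statistics.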
Recombination Rate Estimates
Gorilla-specific recombination rates were estimated using western gorilla SNP data, as described in detail in Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, McManus KF, Great Ape Genome Project, Bustamante CD, Hammer MF, Wall JD (in preparation). Briefly, using both the human-based mapping and the species-specific mapping described above, the data were filtered using a combination of vcftools (Danecek et al. 2011) and custom scripts. Sites with more than 80% missing data were removed. Then, variable sites within 15 bp of each other were thinned to only retain a single site. Next, a reciprocal liftOver (minMatch = 0.1) (Hinrichs et al. 2006) was performed to remove sites that did not map back to the original position. Finally, sites not in Hardy–Weinberg equilibrium were removed (cutoff = 0.001). After these initial filters were performed on the sites mapped to both reference genomes, the remaining sites were intersected between the two assemblies, with only the species-specific orientation used for subsequent phasing and rate estimation steps. Next, synteny blocks were defined based on the coordinates in both the human and nonhuman primate reference genomes. Then, within each syntenic region, phasing and imputation were performed using the software fastPHASE (Scheet and Stephens 2006), and an additional filter based on minor allele frequency was applied (cutoff = 0.05). For improved phasing accuracy, the variants were rephased using the software PHASE (Stephens and Donnelly 2003), as in Auton et al. (2012). Rates were then estimated in 4,000 SNP blocks using LDhat (Fearnhead and Donnelly 2001; International HapMap Consortium 2005) (same run parameters as in Auton et al. 2012). The final number of sites used to estimate recombination rates was approximately 7.8 million, as compared with 5.3 and 1.6 million for western chimpanzee and HapMap, respectively.
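The thinning step can be illustrated with a short sketch. The text does not state which site is retained from a cluster, so the version below assumes a greedy left-to-right pass on one chromosome that keeps a site only when it lies more than 15 bp from the last retained site. The function name is hypothetical.

```python
def thin_sites(positions, min_gap=15):
    """Greedily retain sites so that no two kept sites are within min_gap bp.

    Assumption: positions are coordinates on a single chromosome, and the
    left-most site of each cluster is the one that is kept.
    """
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] > min_gap:
            kept.append(pos)
    return kept


# Sites at 110 and 115 fall within 15 bp of the retained site at 100.
print(thin_sites([100, 110, 115, 116, 140]))  # [100, 116, 140]
```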
Summary Measures: PCA, Population structure, and Heterozygosity
Inference of population structure and PCA, which require a set of independent SNPs, was conducted on 10% thinned data when comparing all three subspecies, using ADMIXTURE (Alexander et al. 2009) and smartpca (Patterson et al. 2006), respectively. PCA of the three subspecies was conducted on the intersect of the 8× data in western lowland, Cross River, and eastern lowland gorillas. When considering only the western lowland gorillas, data were pruned for linkage (plink --indep 50 5 2). We performed ten independent ADMIXTURE runs for each tested value of K. Heterozygosity was estimated based on the number of heterozygous SNPs per individual in the unfiltered 8× data.
Evolutionary Relationship of Western Lowland, Cross River, and Eastern Lowland Gorillas
G-PhoCS utilizes input alignments from multiple independent “neutral loci” in which recombination within loci occurred at negligible rate but recombination between loci was sufficient to assume that genealogies are approximately uncorrelated (Gronau et al. 2011). Assuming that parameters of recombination are broadly consistent among primates, we adopted the 37,574 neutral loci previously identified by Gronau et al. (2011) for the human genome (build NCBI36), lifted-over these loci to the gorilla genome, and then applied a series of filters to obtain a new set of “neutral loci” for the gorilla genome. Specifically, we removed regions without conserved synteny in human–gorilla alignments, recent transposable elements annotated by RepeatMasker with ≤20% divergence, exons of protein-coding genes, conserved noncoding elements according to phastCons, and recent segmental duplications in Gorilla. This resulted in 26,248 loci, with size of approximately 1 kb and interlocus distance of approximately 50 kb. We called genotypes from the whole-genome data at these neutral loci using BSNP, setting –P flat, which assumes uniform prior distribution to determine genotype calls for each individual without bias introduced by the reference genome. For each locus, we also masked simple repeats, positions within 3 bp of an insertion/deletion, positions with less than five reads, and CpG sites. Finally we used MUSCLE (Edgar 2004) to make alignments of each inferred sequence. After removing loci with completely missing data (all Ns) in at least one individual, we obtained a final set of 25,573 neutral loci for input to G-PhoCS.
We applied G-PhoCS to different combinations of samples. These combinations always included both eastern lowland gorillas and the single Cross River gorilla, but contained different combinations of two western lowland gorillas. An aligned human reference genome was included as an outgroup. We first evaluated four alternative scenarios: no gene flow between gorilla species, and bidirectional gene flow between each of the three possible pairs of gorilla species. For each case, we ran G-PhoCS for 50,000 iterations and found that this was sufficient to establish convergence for the no gene flow model and for the model with bidirectional gene flow between western and Cross River gorillas. We reran the analysis with bidirectional gene flow between western and eastern gorillas, and allowed two migration band parameters, one from western lowland to eastern lowland gorilla and another from Cross River to eastern lowland gorilla. We found that 300,000 iterations were sufficient to establish convergence for parameters of interest and we set the burn-in as the first two-thirds of iterations (supplementary fig. S9, Supplementary Material online). The raw estimates by G-PhoCS are ratios between model parameters. Using humans as an outgroup, we calibrated the model based on the average genomic divergence time between human and gorilla, denoted $T_{div}$. We assume a range of $T_{div}$ = 8.0–12.0 Ma. An average mutation rate was calculated by $\mu = \tau_{div}/T_{div}$, where $\tau_{div}$ is the mutation-scaled human–gorilla divergence estimated by G-PhoCS. This mutation rate differs from the rate used in ∂a∂i analyses because it ignores CpG mutations, which are excluded by our filters. Effective population sizes were calibrated by a factor of $(4 \times 19 \times \mu)^{-1}$, assuming an average gorilla generation time of 19 years (Langergraber et al. 2012). We also calculated estimates of the expected number of migrants per generation, given by $m_{AB} \times \theta_B$, and the total migration rate, given by $m_{AB} \times \tau_{AB}$.
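The calibration can be sketched as follows. This is an illustration rather than the authors' script: it assumes the usual G-PhoCS scaling in which divergence times (τ) and population size parameters (θ) are reported in units of expected mutations per site, so that a per-year mutation rate follows from dividing the human–gorilla τ by the assumed divergence time. All function names and the numerical inputs are hypothetical.

```python
GENERATION_YEARS = 19.0  # gorilla generation time (Langergraber et al. 2012)


def mutation_rate_per_year(tau_div, t_div_years):
    """mu = tau_div / T_div, with tau_div in expected mutations per site."""
    return tau_div / t_div_years


def theta_to_ne(theta, mu_per_year, generation_years=GENERATION_YEARS):
    """Ne = theta / (4 * generation time * per-year mutation rate)."""
    return theta / (4.0 * generation_years * mu_per_year)


def tau_to_years(tau, mu_per_year):
    """Convert a mutation-scaled divergence time to years."""
    return tau / mu_per_year


# Hypothetical raw estimates, calibrated at both ends of the assumed range.
for t_div in (8.0e6, 12.0e6):
    mu = mutation_rate_per_year(tau_div=0.006, t_div_years=t_div)
    print(t_div, mu, theta_to_ne(0.002, mu), tau_to_years(0.0001, mu))
```

Running the conversion at both ends of the 8.0–12.0 Ma range shows how uncertainty in the calibration point propagates linearly into every calibrated time and population size.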
Demographic Inference of Western Lowland Gorilla
The western lowland gorilla single population demographic model was inferred through a diffusion approximation approach implemented in the ∂a∂i software (Gutenkunst et al. 2009). This approach calculates the log likelihood of the model fit based on a comparison between the observed and expected SFS. Five demographic models were evaluated: a standard neutral model, an exponential growth model, a model of a bottleneck followed by exponential growth, a two-epoch model, and a three-epoch model. We evaluated results with all SNPs that passed the filters and had at least 8× coverage, as well as a subset of these SNPs thinned to 100 kb. For each model, ten independent runs were performed and the model and associated parameters that maximized the likelihood were chosen. To convert ∂a∂i parameter output to years and effective population sizes, we assumed a mutation rate of 1.1 × 10^−8 per generation (Roach et al. 2010; 1000 Genomes Project Consortium 2010) and a generation time of 19 years (Langergraber et al. 2012). Confidence intervals for each parameter were determined through bootstrapping the input SNPs in blocks of 500 kb 1,000 times.
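The conversion from scaled parameters to years and effective population sizes can be sketched as below. The sketch relies on the scaling convention of the software, in which θ = 4·Nref·µ·L, times are in units of 2·Nref generations, and population sizes are relative to Nref. The callable sequence length L and all numerical inputs in the example are hypothetical, and the function names are illustrative.

```python
MU_PER_GENERATION = 1.1e-8  # mutation rate assumed in the text
GENERATION_YEARS = 19.0     # generation time assumed in the text


def reference_size(theta, seq_len, mu=MU_PER_GENERATION):
    """Ancestral effective size from theta = 4 * N_ref * mu * L."""
    return theta / (4.0 * mu * seq_len)


def scaled_time_to_years(t_scaled, n_ref, generation_years=GENERATION_YEARS):
    """Times are in units of 2 * N_ref generations."""
    return t_scaled * 2.0 * n_ref * generation_years


def relative_size_to_ne(nu, n_ref):
    """Population sizes are expressed relative to N_ref."""
    return nu * n_ref


# Hypothetical fit: theta = 440 over 1 Mb of callable sequence.
n_ref = reference_size(theta=440.0, seq_len=1_000_000)
print(n_ref, scaled_time_to_years(0.1, n_ref), relative_size_to_ne(0.5, n_ref))
```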
For analysis of western lowland gorillas, we used all sites from the genomic data with at least 8× coverage in all samples. The unfolded (polarized) SFS was determined using humans as an outgroup and ancestral misidentification was corrected using the method developed in Hernandez et al. (2007), which is implemented in ∂a∂i. Briefly, this approach infers the unfolded SFS through a context-dependent mutation model. It considers the trinucleotide sequence context of each SNP in gorillas and the outgroup, the great ape transition rate matrix for each nucleotide (as in Hwang and Green 2004, provided by Hwang DG, unpublished data), the proportion of each trinucleotide sequence in the gorilla sequence data, and the gorilla-outgroup divergence (empirically estimated at 1.60% and 1.51% from the complete and 8× filtered sequence data, respectively).
Signals of Recent Selective Sweeps in Western Lowland Gorillas
Signals of recent selective sweeps were inferred using the SweepFinder method developed in Nielsen et al. (2005), as implemented in SweeD (Pavlidis et al. 2013). The method uses a nonoverlapping sliding window approach to calculate the composite likelihood of the data for two models: (1) a model of a recently completed selective sweep in the window and (2) a model in which the window SFS is from the same distribution as the background SFS, where the background SFS is the SFS of the entire chromosome. This method outputs a composite likelihood ratio (CLR) of these two models. Nonoverlapping windows of 100 kb along each chromosome were used to analyze the polarized 8× data. Windows that had less than 10% of the base pairs callable in the 8× data set were excluded from analysis.
The unfolded SFS was determined using a two outgroup approach. Nucleotides at each SNP in the gorilla genome were compared with the reference alleles for human (hg19) and rhesus macaque (rheMac2). At each SNP position, if one gorilla allele matched the reference allele in both humans and rhesus macaques, then that allele was assumed to be the ancestral allele. If the three species did not share an allele at a specific SNP, the site was excluded. The ancestral misidentification correction implemented in Hernandez et al. (2007) adjusts the overall SFS and not individual SNPs, and was therefore inappropriate for this analysis.
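The two-outgroup rule can be written down directly. The sketch below is illustrative only (function names are hypothetical): it takes the set of alleles observed in gorillas at a SNP together with the human and rhesus macaque reference alleles, and returns the inferred ancestral allele, or `None` when the site would be excluded.

```python
def ancestral_allele(gorilla_alleles, human_ref, macaque_ref):
    """Return the ancestral allele, or None if the site is excluded.

    An allele is called ancestral only when both outgroup reference alleles
    agree and that allele is also segregating in gorillas.
    """
    if human_ref == macaque_ref and human_ref in gorilla_alleles:
        return human_ref
    return None


def derived_allele_count(allele_counts, ancestral):
    """Count chromosomes carrying any allele other than the ancestral one."""
    return sum(n for allele, n in allele_counts.items() if allele != ancestral)


counts = {"A": 20, "G": 8}  # hypothetical allele counts at one SNP
anc = ancestral_allele(set(counts), human_ref="A", macaque_ref="A")
print(anc, derived_allele_count(counts, anc))  # A 8
```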
To determine CLR significance, neutral genomic regions were simulated with ms (Hudson 2002). This test may be weakly dependent on demography, recombination rate, and the length of sequence where variants can be identified (Nielsen et al. 2005; Williamson et al. 2007); therefore, all neutral simulations are based on the inferred three-epoch demography, as well as conservative estimates of the recombination rate and the callable sequence length. Though the CLR is rather robust to variable recombination rates, a false inference of a selective sweep is more likely to occur in regions that have a recombination rate lower than the assumed rate (Nielsen et al. 2005). Furthermore, regions with less callable sequence have less data and thus may have lower power to recognize a selective sweep through the CLR. For this reason, neutral regions were simulated for each of the following recombination rates in centimorgans per megabase (cM/Mb): 0, 0.25, 0.5, 1, and 2. In each region, base pairs were randomly masked at one of the following levels: 90%, 80%, 70%, 60%, and 50%. A total of 2,500,000 regions were simulated: 100,000 for each of the 25 combinations of recombination rate and masking level. The average recombination rate in each gorilla 100-kb genomic region was calculated from a gorilla-specific recombination map (Stevison et al., in preparation). For gorilla regions without an estimated recombination rate, the average recombination rate (0.6429 cM/Mb) was used (Stevison et al., in preparation). The CLR significance of each gorilla 100-kb region was determined through comparison with the closest set of neutral simulations. When gorilla windows were between parameters, simulations with lower recombination and higher masking were used, making the test more conservative.
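The conservative matching of a window to a simulation set can be sketched with two lookups on the simulation grid. This is a minimal illustration with hypothetical function names. It assumes that windows falling outside the grid are clamped to the nearest grid value, and that the P value is the fraction of matched neutral simulations with a CLR at least as large as the observed one; the text does not spell out either detail.

```python
import bisect

# Simulation grid described in the text.
RECOMB_GRID_CM_PER_MB = [0.0, 0.25, 0.5, 1.0, 2.0]
PERCENT_MASKED_GRID = [50, 60, 70, 80, 90]


def matching_simulation_set(recomb_rate, percent_masked):
    """Pick the simulation set for a window, rounding conservatively.

    Returns (recombination rate, percent masked), or None when more than 90%
    of the window is masked (less than 10% callable), in which case the
    window is excluded from analysis.
    """
    if percent_masked > PERCENT_MASKED_GRID[-1]:
        return None
    # Largest simulated recombination rate not exceeding the observed rate.
    i = max(bisect.bisect_right(RECOMB_GRID_CM_PER_MB, recomb_rate) - 1, 0)
    # Smallest simulated masking level not below the observed level.
    j = bisect.bisect_left(PERCENT_MASKED_GRID, percent_masked)
    return RECOMB_GRID_CM_PER_MB[i], PERCENT_MASKED_GRID[j]


def empirical_p_value(observed_clr, simulated_clrs):
    """Fraction of neutral simulations with CLR >= the observed CLR."""
    return sum(c >= observed_clr for c in simulated_clrs) / len(simulated_clrs)


# A window at the genome-wide average rate with 65% of its bases masked.
print(matching_simulation_set(0.6429, 65))  # (0.5, 70)
```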
As we are testing many hypotheses simultaneously, we estimated the proportion of inferred selected genomic regions likely to be false positives at various P value thresholds. We utilized the false discovery rate (FDR) methods outlined in Storey and Tibshirani (2003) together with the tuning parameter selection method from Williamson et al. (2007) to estimate the percentage of features we call “significant” that are actually null. The tuning parameter selection method was used because, as Williamson et al. (2007) point out, the CLR was designed to be conservative and thus there are many regions with P = 1, which violates Storey and Tibshirani’s (2003) assumption that P values of null features follow a uniform distribution. Using the same parameters as Williamson et al. (2007), we found that the FDR at a P value threshold of 10^−5 was 0.50%, at 10^−4 was 3.4%, at 0.001 was 9.11%, and at 0.01 was 34.04%.
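For orientation, the basic Storey and Tibshirani style estimate with a fixed tuning parameter λ can be sketched as follows. This is not the procedure used in the study: the tuning parameter selection of Williamson et al. (2007), which addresses the excess of P = 1 regions, is not reproduced here, and the function name and example P values are hypothetical.

```python
def estimate_fdr(p_values, threshold, lam=0.5):
    """Estimate the FDR among features with P <= threshold.

    pi0, the proportion of null features, is estimated from the P values
    above the fixed tuning parameter lam, assuming null P values are uniform.
    """
    m = len(p_values)
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    n_called = sum(p <= threshold for p in p_values)
    if n_called == 0:
        return 0.0
    return min(1.0, pi0 * m * threshold / n_called)


# Five strongly significant windows plus ten evenly spread P values.
p_values = [0.0005] * 5 + [k / 10 for k in range(1, 11)]
print(estimate_fdr(p_values, threshold=0.001))
```

With a large excess of P = 1 windows, the fixed-λ estimate of the null proportion would be inflated, which is the motivation given in the text for using a tuned parameter instead.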
Genomic features, including genes and their corresponding GO terms, in regions with significant signs of selective sweeps were identified through the Ensembl database (Flicek et al. 2011). All genes reported are verified human genes that are computationally predicted to have an orthologous gene in the gorilla genome. The Bioconductor package topGO was used for GO enrichment analysis (Alexa and Rahnenfuhrer 2010). The set of significant genes tested comprised genes in windows with P < 0.001, and the background set comprised genes that overlapped with windows tested in SweeD (regions with greater than 10% of their sequence in the callable genome). The elimination method with Fisher’s exact test was used to infer significantly enriched GO terms.
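The enrichment test for a single GO term reduces to a one-sided Fisher's exact test, which can be computed from the hypergeometric distribution. The sketch below shows only that per-term test; it does not implement topGO's elimination algorithm, which additionally removes genes annotated to significant child terms before testing parent terms. The function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_significant, n_overlap):
    """One-sided Fisher's exact test for over-representation of a GO term.

    n_background:  genes in the background set
    n_annotated:   background genes annotated with the term
    n_significant: background genes in significant windows
    n_overlap:     significant genes annotated with the term
    Returns P(X >= n_overlap) for X ~ Hypergeometric.
    """
    upper = min(n_annotated, n_significant)
    tail = sum(
        comb(n_annotated, k)
        * comb(n_background - n_annotated, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_significant)


# Toy example: 4 of 10 background genes carry the term; 2 of 3 significant
# genes carry it.
print(enrichment_p_value(10, 4, 3, 2))
```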
Distribution of Fitness Effects
The DFE was inferred through the DFE-alpha server (Keightley and Eyre-Walker 2012). This method attempts to correct for the biases in the McDonald–Kreitman test due to slightly deleterious mutations. Briefly, this method simultaneously infers demography and the DFE, based on transition matrix methods. Adaptive substitutions are inferred through the difference between the observed divergence and the predicted divergence, based on a gamma distribution and a two-epoch demographic model. Input data were the folded nonsynonymous and synonymous frequency spectra, as annotated by SnpEff (Cingolani et al. 2012). To calculate the number of divergent sites, we utilized the UCSC multiz alignment of the human (hg19), chimpanzee (panTro3), and rhesus macaque (rheMac2) genomes to the gorilla (gorGor3) genome coding region and restricted the analysis to sites with no missing data. The number of divergent sites was then calculated from the sites where the human, chimpanzee, and rhesus macaque shared the same allele, and the gorilla genome allele differed. Confidence intervals were determined through bootstrapping the input synonymous and nonsynonymous SNPs 1,000 times.
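The divergence count can be illustrated with a short sketch over alignment columns. It assumes each column is given as a tuple of single bases in the order gorilla, human, chimpanzee, rhesus macaque, and that any character other than A, C, G, or T represents missing data. The function name is hypothetical and parsing of the multiz alignment itself is not shown.

```python
VALID_BASES = frozenset("ACGT")


def count_gorilla_divergent_sites(columns):
    """Count sites where the three outgroups agree and gorilla differs.

    columns: iterable of (gorilla, human, chimpanzee, macaque) bases.
    Columns with missing data in any species are skipped.
    """
    n_divergent = 0
    for gorilla, human, chimp, macaque in columns:
        bases = (gorilla, human, chimp, macaque)
        if any(b.upper() not in VALID_BASES for b in bases):
            continue
        gorilla, human, chimp, macaque = (b.upper() for b in bases)
        if human == chimp == macaque and gorilla != human:
            n_divergent += 1
    return n_divergent


columns = [
    ("A", "G", "G", "G"),  # gorilla-specific difference: counted
    ("A", "A", "A", "A"),  # no difference
    ("A", "G", "G", "N"),  # missing data: skipped
    ("A", "G", "A", "G"),  # outgroups disagree: not counted
]
print(count_gorilla_divergent_sites(columns))  # 1
```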
Supplementary Material
Supplementary tables S1–S5 and figures S1–S14 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors thank Ryan Gutenkunst for assistance with ∂a∂i, Ryan Hernandez for discussions about the DFE and site frequency spectra, and Omar Cornejo for extensive input on analytical methods. This work was supported by the National Institutes of Health to M.F.H. and J.D.W. (R01_HG005226), National Institutes of Health (2T32GM007276-39) and a Stanford Center for Computational, Evolutionary and Human Genomics (CEHG) fellowship to K.F.M., and National Science Foundation Graduate Research Fellowship Grant DGE-1143953 to A.E.W.
References
- Ackermann RR, Bishop JM. Morphological and molecular evidence reveals recent hybridization between gorilla taxa. Evolution. 2010;64:271–290. doi: 10.1111/j.1558-5646.2009.00858.x.
- Adler E, Hoon MA, Mueller KL, Chandrashekar J, Ryba NJ, Zuker CS. A novel family of mammalian taste receptors. Cell. 2000;100:693–702. doi: 10.1016/s0092-8674(00)80705-9.
- Alexa A, Rahnenfuhrer J. 2010. topGO: enrichment analysis for Gene Ontology. R package version 2.16.0.
- Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
- Anthony NM, Johnson-Bawe M, Jeffery K, Clifford SL, Abernethy KA, Tutin CE, Lahm SA, White LJT, Utley JF, Wickings EJ, et al. The role of Pleistocene refugia and rivers in shaping gorilla genetic diversity in central Africa. Proc Natl Acad Sci U S A. 2007;104:20432–20436. doi: 10.1073/pnas.0704816105.
- Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, et al. A fine-scale chimpanzee genetic map from population sequencing. Science. 2012;336:193–198. doi: 10.1126/science.1216872.
- Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4:e1000083. doi: 10.1371/journal.pgen.1000083.
- Becquet C, Przeworski M. A new approach to estimate parameters of speciation models with applications to apes. Genome Res. 2007;17:1505–1519. doi: 10.1101/gr.6409707.
- Chandrashekar J, Mueller KL, Hoon MA, Adler E, Feng L, Guo W, Zuker CS, Ryba J. T2Rs function as bitter taste receptors. Cell. 2000;100:703–711. doi: 10.1016/s0092-8674(00)80706-0.
- Cingolani P, Platts A, Wang Le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92. doi: 10.4161/fly.19695.
- Clifford SL, Anthony NM, Bawe-Johnson M, Abernethy KA, Tutin CEG, White LJT, Bermejo M, Goldsmith ML, McFarland K, Jeffery KJ, et al. Mitochondrial DNA phylogeography of western lowland gorillas (Gorilla gorilla gorilla). Mol Ecol. 2004;13:1551–1565. doi: 10.1111/j.1365-294X.2004.02140.x.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker R, Lunter G, Marth G, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- Doran-Sheehy D, Mongo P, Lodwick J, Conklin-Brittain NL. Male and female western gorilla diet: preferred foods, use of fallback resources, and implications for ape versus old world monkey foraging strategies. Am J Phys Anthropol. 2009;140:727–738. doi: 10.1002/ajpa.21118.
- Durand EY, Patterson N, Reich D, Slatkin M. Testing for ancient admixture between closely related populations. Mol Biol Evol. 2011;28:2239–2252. doi: 10.1093/molbev/msr048.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–1797. doi: 10.1093/nar/gkh340.
- Fearnhead P, Donnelly PJ. Estimating recombination rates from population genetic data. Genetics. 2001;159:1299–1318. doi: 10.1093/genetics/159.3.1299.
- Flicek P, Amode RM, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2011. Nucleic Acids Res. 2011;39:D800–D806. doi: 10.1093/nar/gkq1064.
- Fünfstück T, Arandjelovic M, Morgan DB, Sanz C, Breuer T, Stokes EJ, Reed P, Olson SH, Cameron K, Ondzie A, et al. The genetic population structure of wild western lowland gorillas (Gorilla gorilla gorilla) living in continuous rain forest. Am J Primatol. 2014;76(9):868–878. doi: 10.1002/ajp.22274.
- 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
- Go Y. Lineage-specific expansions and contractions of the bitter taste receptor gene repertoire in vertebrates. Mol Biol Evol. 2006;23:964–972. doi: 10.1093/molbev/msj106.
- Good JM, Wiebe V, Albert FW, Burbano HA, Kircher M, Green RE, Halbwax M, André C, Atencia R, Fischer A, et al. Comparative population genomics of the ejaculate in humans and the great apes. Mol Biol Evol. 2013;30:964–976. doi: 10.1093/molbev/mst005.
- Gravel S. Population genetics models of local ancestry. Genetics. 2012;191(2):607–619. doi: 10.1534/genetics.112.139808.
- Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, et al. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
- Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nat Genet. 2011;43:1031–1034. doi: 10.1038/ng.937.
- Grubb P, Butynski TM, Oates JF, Bearder SK, Disotell TR, Groves CP, Struhsaker TT. Assessment of the diversity of African primates. Int J Primatol. 2003;24:1301–1357.
- Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the demographic history of multiple populations from multidimensional SNP frequency data. PLOS Genet. 2009;5(10):e1000695. doi: 10.1371/journal.pgen.1000695.
- Hernandez RD, Williamson SH, Bustamante CD. Context dependence, ancestral misidentification, and spurious signatures of natural selection. Mol Biol Evol. 2007;24:1792–1800. doi: 10.1093/molbev/msm108.
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34:D590–D598. doi: 10.1093/nar/gkj144.
- Hudson RR. Generating samples under a Wright-Fisher neutral model. Bioinformatics. 2002;18:337–338. doi: 10.1093/bioinformatics/18.2.337.
- Hwang DG, Green P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A. 2004;101(39):13994–14001. doi: 10.1073/pnas.0404142101.
- International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320. doi: 10.1038/nature04226.
- IUCN. 2013. IUCN red list of threatened species. Version 2013.2. [cited 2014 May 5]. Available from: http://www.iucnredlist.org.
- Jolly D, Taylor D, Marchant R, Hamilton A, Bonnefille R, Buchet G, Riollet G. Vegetation dynamics in central Africa since 18,000 yr BP: pollen records from the interlacustrine highlands of Burundi, Rwanda and western Uganda. J Biogeogr. 1997;24:492–512. [Google Scholar]
- Keightley PD, Eyre-Walker A. Estimating the rate of adaptive molecular evolution when the evolutionary divergence between species is small. J Mol Evol. 2012;74:61–68. doi: 10.1007/s00239-012-9488-1. [DOI] [PubMed] [Google Scholar]
- Kosiol C, Vinař T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A. Patterns of positive selection in six mammalian genomes. PLOS Genet. 2008;4(8):e1000144. doi: 10.1371/journal.pgen.1000144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kulathinal RJ, Stevison LS, Noor MAF. The genomics of speciation in Drosophila: diversity, divergence, and introgression estimated using low-coverage genome sequencing. PLOS Genet. 2009;5(7):e1000550. doi: 10.1371/journal.pgen.1000550. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langergraber KE, Prüfer K, Rowney C, Boesch C, Crockford C, Fawcett K, Inoue E, Inoue-Muruyama M, Mitani JC, Muller MN, et al. Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc Natl Acad Sci U S A. 2012;109(39):15716–15721. doi: 10.1073/pnas.1211740109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Le Gouar JP, Vallet D, David L, Bermejo M, Gatti S, Levréro F, Petit EJ, Ménard N. How Ebola impacts genetics of Western Lowland Gorilla populations. PLoS One. 2009;4(12):e8375. doi: 10.1371/journal.pone.0008375. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leendertz FH, Deckers M, Schempp W, Lankester F, Boesch C, Mugisha L, Dolan A, Gatherer D, McGeoch DJ, Ehlers B. Novel cytomegaloviruses in free-ranging and captive great apes: phylogenetic evidence for bidirectional horizontal transmission. J Gen Virol. 2009;90:2386–2394. doi: 10.1099/vir.0.011866-0. [DOI] [PubMed] [Google Scholar]
- Leffler EM, Gao Z, Pfeifer S, Ségurel L, Auton A, Venn O, Bowden R, Bontrop R, Wall JD, Sella G, et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science. 2013;339:1578–1582. doi: 10.1126/science.1234070. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mailund T, Halager AE, Westergaard M, Dutheil JY, Munch K, Andersen LN, Lunter G, Prüfer K, Scally A, Hobolth A, et al. A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species. PLoS Genet. 2012;8:e1003125. doi: 10.1371/journal.pgen.1003125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matsunami H, Montmayeur JP, Buck LB. A family of candidate taste receptors in human and mouse. Nature. 2000;404:601–614. doi: 10.1038/35007072. [DOI] [PubMed] [Google Scholar]
- McKinney C, Yu D, Mohr I. A new role for the cellular PABP repressor Paip2 as an innate restriction factor capable of limiting productive cytomegalovirus replication. Genes Dev. 2013;27:1809–1820. doi: 10.1101/gad.221341.113.
- McManamon R, Lowenstine L. Cardiovascular disease in great apes. In: Fowler ME, Miller RE, editors. Fowler’s Zoo and Wildlife Medicine. 7th ed. Missouri: Elsevier Saunders; 2012.
- Messer PW, Petrov DA. Frequent adaptation and the McDonald-Kreitman test. Proc Natl Acad Sci U S A. 2013;110(21):8615–8620. doi: 10.1073/pnas.1220835110.
- Nielsen R, Williamson S, Kim Y, Hubisz M, Clark A, Bustamante C. Genomic scans for selective sweeps using SNP data. Genome Res. 2005;15:1566–1575. doi: 10.1101/gr.4252305.
- Nsubuga AM, Holzman J, Chemnick LG, Ryder OA. The cryptic genetic structure of the North American captive gorilla population. Conserv Genet. 2010;11:161–172.
- Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics. 2012;192(3):1065–1093. doi: 10.1534/genetics.112.145037.
- Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190. doi: 10.1371/journal.pgen.0020190.
- Pavlidis P, Zivkovic D, Stamatakis A, Alachiotis N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol. 2013;30(9):2224–2234. doi: 10.1093/molbev/mst112.
- Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8(11):e1002967. doi: 10.1371/journal.pgen.1002967.
- Plavcan JM. Sexual dimorphism in primate evolution. Yearb Phys Anthropol. 2001;44:25–53. doi: 10.1002/ajpa.10011.abs.
- Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. Great ape genetic diversity and population history. Nature. 2013;499:471–475. doi: 10.1038/nature12228.
- Rambaut A, Suchard MA, Xie D, Drummond AJ. 2013. Tracer v1.5. [cited 2014 May 5]. Available from: http://beast.bio.ed.ac.uk/Tracer.
- Remis MJ, Dierenfeld ES. Digesta passage, digestibility and behavior in captive gorillas under two dietary regimens. Int J Primatol. 2004;25(4):825–845.
- Remis MJ. Western lowland gorillas as seasonal frugivores: use of variable resources. Am J Primatol. 1997;43:87–109. doi: 10.1002/(SICI)1098-2345(1997)43:2<87::AID-AJP1>3.0.CO;2-T.
- Roach JC, Glusman G, Smith AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328(5978):636–639. doi: 10.1126/science.1186802.
- Rogers ME, Abernethy K, Bermejo M, Cipolletta C, Doran D, McFarland K, Nishihara T, Remis M, Tutin CE. Western gorilla diet: a synthesis from six sites. Am J Primatol. 2004;64:173–192. doi: 10.1002/ajp.20071.
- Rothman JM, Pell AN, Nkurunungi JB, Dierenfeld ES. Nutritional aspects of the diet of wild gorillas: how do Bwindi gorillas compare? In: Newton-Fisher NE, Reynolds V, Notman H, Paterson J, editors. Primates of western Uganda. New York: Kluwer Press; 2006.
- Sarmiento EE. Distribution, taxonomy, genetics, ecology, and causal links of gorilla survival: the need to develop practical knowledge for gorilla conservation. In: Taylor AB, Goldsmith ML, editors. Gorilla biology: a multidisciplinary perspective. Cambridge: Cambridge University Press; 2003. pp. 432–471.
- Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, et al. Insights into hominid evolution from the gorilla genome sequence. Nature. 2012;483:169–175. doi: 10.1038/nature10842.
- Scally A, Yngvadottir B, Xue Y, Ayub Q, Durbin R, Tyler-Smith C. A genome-wide survey of genetic variation in gorillas using reduced representation sequencing. PLoS One. 2013;8(6):e65066. doi: 10.1371/journal.pone.0065066.
- Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78:629–644. doi: 10.1086/502802.
- Schiffels S, Durbin R. Inferring human population size and separation history from multiple genome sequences. Nat Genet. 2014;46:919–925. doi: 10.1038/ng.3015.
- Schulman Y, Farb A, Virmani R, Montali RJ. Fibrosing cardiomyopathy in lowland gorillas (Gorilla gorilla gorilla) in the United States: a retrospective study. J Zoo Wildl Med. 1995;26:43–51.
- Smith RJ, Jungers WL. Body mass in comparative primatology. J Hum Evol. 1997;32:523–559. doi: 10.1006/jhev.1996.0122.
- Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003;73:1162–1169. doi: 10.1086/379378.
- Stevens CE, Hume ID. Comparative physiology of the vertebrate digestive system. 2nd ed. Cambridge: Cambridge University Press; 1995. p. 75.
- Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100(16):9440–9445. doi: 10.1073/pnas.1530509100.
- Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, et al. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 2013;23:1373–1382. doi: 10.1101/gr.158543.113.
- Thalmann O, Fischer A, Lankester F, Pääbo S, Vigilant L. The complex evolutionary history of gorillas: insights from genomic data. Mol Biol Evol. 2007;24:146–158. doi: 10.1093/molbev/msl160.
- Thalmann O, Wegmann D, Spitzner M, Arandjelovic M, Guschanski K, Leuenberger C, Bergl RA, Vigilant L. Historical sampling reveals dramatic demographic changes in western gorilla populations. BMC Evol Biol. 2011;11:85. doi: 10.1186/1471-2148-11-85.
- Veeramah KR, Gutenkunst RN, Woerner AE, Watkins JC, Hammer MF. Evidence for increased levels of positive and negative selection on the X chromosome versus autosomes in humans. Mol Biol Evol. 2014;31(9):2267–2282. doi: 10.1093/molbev/msu166.
- Walsh PD, Abernethy KA, Bermejo M, Beyers R, De Wachter P, Akou ME, Huijbregts B, Mambounga DI, Toham AK, Kilbourn AM, et al. Catastrophic ape decline in western equatorial Africa. Nature. 2003;422:611–614. doi: 10.1038/nature01566.
- Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R. Localizing recent adaptive evolution in the human genome. PLoS Genet. 2007;3(6):e90. doi: 10.1371/journal.pgen.0030090.