eLife. 2019 Jun 25;8:e42989. doi: 10.7554/eLife.42989

Figure 3. Archaic cenhaps are found in AMH populations.

A full resolution version of this figure is available as Figure 3—source data 3. (a) Haplotypic representation of 8816 SNPs from 5008 imputed chr11 genotypes from the 1000 Genomes Project (Left: chr11:50509493–51594084, Right: chr11:54697078–55326684; hg19). SNPs were filtered for MAC ≥ 35 and passing the 4gt_dco with a tolerance of three (see Materials and methods). Minor alleles are shown in black, and the assembly gap is indicated by a red line. Haplotypes were clustered with UPGMA based on the Hamming distance between haplotypes composed of 1000 SNPs surrounding the gap (Left: chr11:51532172–51594084, Right: chr11:54697078–54845667; hg19, indicated by red bar at bottom). Superpopulation and cenhap partitioning are indicated by bars at far left. Log2 counts of DM (derived in archaic, shared by haplotype), DN (derived in archaic, not shared by haplotype) and AN (ancestral in archaic, not shared by haplotype) for each cenhap relative to Altai Neanderthal (NEA) and Denisovan (DEN) are shown at left. Gray horizontal bar (top) indicates the region included in analysis of archaic content; black bars indicate SNPs with data for archaic and ancestral states. (b) Bar plots indicating the mean and 95% confidence intervals of DM, DN, AM (ancestral in archaic, shared by cenhap) and AN counts for cenhap groups (as partitioned in a and c) relative to Altai Neanderthal and Denisovan genomes, using chimpanzee as an outgroup (Speidel et al., 2019). (c) Haplotypic representation, as above, of 21950 SNPs from 5008 imputed chr12 genotypes from the 1000 Genomes Project (Left: chr12:33939700–34856380, Right: chr12:37856765–39471374; hg19). SNPs were filtered for MAC ≥ 35. Haplotypes were clustered with UPGMA based on 1000 SNPs surrounding the gap (Left: chr12:34821738–34856670, Right: chr12:37856765–37923684; hg19). Bars at side, top and bottom are the same as in a.
(d) A UPGMA tree based on the synonymous divergence for 30 genes in the seven major chr11 cenhaps (see Figure 3—source data 1), assuming the TMRCA of humans and chimpanzees is 6.5 MY (see Materials and methods and legend for Figure 1d). The error bars at each node represent ±two standard deviations of the distributions of estimated TMRCAs across the genes.
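
The time scaling in panel d can be illustrated with a minimal sketch. It assumes a simple linear molecular clock, in which the synonymous divergence between two cenhaps is rescaled by the human-chimpanzee synonymous divergence of the same gene and by the 6.5 MY calibration; the estimator actually used is described in Materials and methods and may differ. The function names and the per-gene values below are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

# Calibration stated in the legend: human-chimpanzee TMRCA of 6.5 MY.
HUMAN_CHIMP_TMRCA_MY = 6.5


def scaled_tmrca(syn_div_cenhaps, syn_div_human_chimp,
                 calibration_my=HUMAN_CHIMP_TMRCA_MY):
    """Rescale a between-cenhap synonymous divergence to millions of years.

    Assumption (not from the source): a strict linear clock, so time is
    proportional to synonymous divergence within a gene.
    """
    return calibration_my * syn_div_cenhaps / syn_div_human_chimp


def node_summary(per_gene_divergences):
    """Return (mean TMRCA, two standard deviations) across genes.

    per_gene_divergences: list of (between-cenhap syn_div, human-chimp syn_div)
    pairs, one pair per gene.
    """
    estimates = [scaled_tmrca(d, d_hc) for d, d_hc in per_gene_divergences]
    return mean(estimates), 2 * stdev(estimates)


# Invented per-gene values, for illustration only.
example_genes = [(0.002, 0.013), (0.004, 0.013)]
node_mean, node_two_sd = node_summary(example_genes)
```

Under this sketch, the error bar at a node would span the mean plus or minus `node_two_sd`, matching the "±two standard deviations" convention in the legend.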

Figure 3—source data 1. The 37 chr11 coding genes in the CPR (2 left and 35 right of the centromere gap) used in the UPGMA clustering and estimation of TMRCAs.
Gene models and alignments from Ensembl release 92 (April 2018). Numbers of nonsynonymous differences in the two basal cenhaps (1, 2 and both, 1_&_2; see Figure 3) from the other cenhaps of the 5008 imputed chr11 CPR haplotypes (see Materials and methods). Numbers of sites divergent (human-chimp): div_sites. Numbers of sites polymorphic: polym_sites. Average nonsynonymous divergence: nonsyn_div. Average synonymous divergence: syn_div. Average nonsynonymous diversity: nonsyn_π. Average synonymous diversity: syn_π.
DOI: 10.7554/eLife.42989.019
Figure 3—source data 2. The eight chr8 coding genes in the CPR (8 left and 0 right of the centromere gap) used in the UPGMA clustering and estimation of TMRCAs.
Gene models and alignments from Ensembl release 92 (April 2018). Numbers of sites divergent (human-chimp): div_sites. Numbers of sites polymorphic: polym_sites. Average nonsynonymous divergence: nonsyn_div. Average synonymous divergence: syn_div. Average nonsynonymous diversity: nonsyn_π. Average synonymous diversity: syn_π.
DOI: 10.7554/eLife.42989.020
Figure 3—source data 3. Full resolution version of Figure 3.
DOI: 10.7554/eLife.42989.021

Figure 3—figure supplement 1. Region of chromosome 11 used for cenhap coding region divergence.

A full resolution version of this figure is available as Figure 3—figure supplement 1—source data 1. (a) Haplotypic representation of 38644 SNPs from 5008 imputed chr11 genotypes from the 1000 Genomes Project (Left: chr11:46509551–51594084, Right: chr11:54695707–59326455; hg19). Green lines indicate the region used for analysis of divergence in coding regions (Left: 49952369, Right: 56643039; hg19). SNPs were filtered for MAC ≥ 35 and passing the 4gt_dco with a tolerance of three (see Materials and methods). Minor alleles are shown in black, and the assembly gap is indicated by a red line. Haplotypes were clustered with UPGMA based on the Hamming distances between haplotypes composed of 1000 SNPs surrounding the gap, indicated by the red bar at bottom. Superpopulation and cenhap partitioning are shown at left.
Figure 3—figure supplement 1—source data 1. Full resolution version of Figure 3—figure supplement 1.
DOI: 10.7554/eLife.42989.014
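
The clustering step named in the legend (UPGMA on Hamming distances between haplotypes) can be sketched in a few lines. This is an illustrative pure-Python version, not the authors' pipeline: haplotypes are assumed to be equal-length strings of `0` (major allele) and `1` (minor allele), and the function names `hamming` and `upgma` are hypothetical.

```python
from itertools import combinations


def hamming(hap_a, hap_b):
    """Number of SNP positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(hap_a, hap_b))


def upgma(haplotypes):
    """Cluster haplotypes by average linkage (UPGMA).

    Returns the tree as nested tuples of haplotype indices.
    """
    n = len(haplotypes)
    # Each cluster id maps to (subtree, list of member indices).
    clusters = {i: (i, [i]) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): hamming(haplotypes[i], haplotypes[j])
            for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        merged = {}
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            merged[c] = ((len(members_a) * d_ac + len(members_b) * d_bc)
                         / (len(members_a) + len(members_b)))
        # Drop distances involving the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, d in merged.items():
            dist[(c, next_id)] = d  # next_id is always the larger id
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Four toy haplotypes forming two obvious groups.
tree = upgma(["0000", "0001", "1110", "1111"])
```

This naive version scales poorly and is meant only to show the logic; for thousands of haplotypes an optimized library routine would be the practical choice, for example `scipy.cluster.hierarchy.linkage` with `method="average"`, which implements the same average-linkage criterion.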

Figure 3—figure supplement 2. Evidence of an archaic cenhap within Africa on chromosome 8.

A full resolution version of this figure is available as Figure 3—figure supplement 2—source data 1. (a) Haplotypic representation of 14000 SNPs from 5008 imputed chr8 genotypes from the 1000 Genomes Project (Left: chr8:42178101–43838849, Right: chr8:46839548–49217022; hg19). SNPs were filtered for MAC ≥ 35 and passing the 4gt_dco with a tolerance of three (see Materials and methods). Minor alleles are shown in black. The centromeric gap is indicated by a red line. Haplotypes were clustered with UPGMA based on the Hamming distances between haplotypes composed of 1000 SNPs surrounding the gap (indicated by red bar at bottom). Superpopulation is indicated at left. (b) Filtered cenhaps show very little evidence of recombination and support archaic ancestry of a basal cenhap found in Africa. Cenhaps with putative exchange in their ancestry were filtered from the data in a by clustering SNPs on the low recombination regions on the left and right side of the gap separately (Left: chr8:42668082–43838849, Right: chr8:46839548–48639846, indicated by green and red lines; hg19). Left-side and right-side clades with little evidence of recombination were intersected to yield 1661 cenhaps used in downstream analysis of archaic contribution and TMRCA. Analysis of possible archaic descent was limited to an internal window of 10602 SNPs, indicated by green lines (chr8:43202774–47755914; hg19). At the left are log2 counts of DM (derived in archaic, shared by cenhap), DN (derived in archaic, not shared by cenhap) and AN (ancestral in archaic, not shared by cenhap) based on the Altai Neanderthal (NEA) and Denisovan (DEN) sequence using chimpanzee as an outgroup (Prüfer et al., 2017). Gray bar at top indicates the region included in analysis of archaic content, and black bars indicate SNPs with data for archaic and outgroup state. Red bar at bottom indicates the 1000 SNP region used for clustering.
(c) A UPGMA tree based on the synonymous divergence in eight genes (see Figure 3—source data 2) in the three major chr8 cenhaps, assuming the TMRCA of humans and chimps is 6.5 MYA. The error bars at each node represent ±two standard deviations of the distributions of estimated TMRCAs across the genes. (d) Bar plots indicating the mean and 95% confidence intervals of DM, DN, AM (ancestral in archaic, shared by cenhap) and AN counts for cenhap groups (as partitioned in b) relative to the archaic genomes (Prüfer et al., 2017).
Figure 3—figure supplement 2—source data 1. Full resolution version of Figure 3—figure supplement 2.
DOI: 10.7554/eLife.42989.016
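
The four count classes used in the legend (DM, DN, AM, AN) can be made concrete with a small sketch. It assumes one allele call per SNP for the outgroup (chimpanzee, taken as the ancestral state), for the archaic genome and for the cenhap, with `None` marking missing data; sites without archaic or outgroup data are skipped. How the archaic genotypes were actually reduced to allele states is not given in this legend, so that part is an assumption, and the function names are hypothetical.

```python
def classify_site(ancestral, archaic, cenhap):
    """Label one SNP as DM, DN, AM or AN; return None if any call is missing.

    First letter: archaic allele is Derived (differs from the outgroup)
    or Ancestral (matches the outgroup).
    Second letter: cenhap allele Matches the archaic allele (shared)
    or does Not match it (not shared).
    """
    if ancestral is None or archaic is None or cenhap is None:
        return None
    state = "D" if archaic != ancestral else "A"
    shared = "M" if cenhap == archaic else "N"
    return state + shared


def count_classes(ancestral_alleles, archaic_alleles, cenhap_alleles):
    """Tally DM, DN, AM and AN over all SNPs with complete data."""
    counts = {"DM": 0, "DN": 0, "AM": 0, "AN": 0}
    for anc, arc, hap in zip(ancestral_alleles, archaic_alleles, cenhap_alleles):
        label = classify_site(anc, arc, hap)
        if label is not None:
            counts[label] += 1
    return counts


# Toy example: six SNPs, the last one lacking an outgroup call.
chimp = ["A", "C", "G", "T", "A", None]
neanderthal = ["G", "C", "G", "C", "A", "T"]
cenhap = ["G", "C", "A", "T", "A", "T"]
counts = count_classes(chimp, neanderthal, cenhap)
```

A cenhap of archaic origin would be expected to show a high DM count and low DN and AN counts relative to the corresponding archaic genome, which is the pattern the bar plots summarize.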

Figure 3—figure supplement 3. Evidence of archaic cenhap introgression on chromosome 10.

A full resolution version of this figure is available as Figure 3—figure supplement 3—source data 1. Haplotypic representation of 14000 SNPs from 5008 imputed chr10 genotypes from the 1000 Genomes Project (Left: chr10:37341777–39154888, Right: chr10:42354982–43762908; hg19). SNPs were filtered for MAC ≥ 35 and passing the 4gt_dco with a tolerance of three (see Materials and methods). Minor alleles are shown in black, and the centromeric gap is indicated by a red line. Haplotypes were clustered with UPGMA based on the Hamming distance between haplotypes composed of 1400 SNPs surrounding the gap (indicated by red bar at bottom). Superpopulation is indicated at left. Analysis of possible archaic descent was limited to an internal window of 7221 SNPs showing little evidence of exchange in the most centromere-distal regions, indicated by green lines. At the left are log2 counts of DM (derived in archaic, shared by cenhap), DN (derived in archaic, not shared by cenhap) and AN (ancestral in archaic, not shared by cenhap) based on the Altai Neanderthal (NEA) and Denisovan (DEN) sequence using chimpanzee as an outgroup (Prüfer et al., 2017). Gray bar at top indicates the region included in analysis of archaic content, and black bars indicate SNPs with data for archaic and outgroup state.
Figure 3—figure supplement 3—source data 1. Full resolution version of Figure 3—figure supplement 3.
DOI: 10.7554/eLife.42989.018
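
The SNP filters mentioned in the legends can be sketched as follows. The minor allele count (MAC) threshold of 35 is taken directly from the text. The second function is only one plausible reading of a four-gamete test with a tolerance: the precise definition of the 4gt_dco filter is given in Materials and methods, not here, so the rule below (a pair of SNPs is flagged only when the rarest of the four gametes occurs more often than the tolerance) is an assumption, as are the function names.

```python
from collections import Counter


def minor_allele_count(column):
    """Count of the rarer allele at one biallelic SNP.

    column: sequence of 0/1 alleles, one entry per haplotype.
    """
    ones = sum(column)
    return min(ones, len(column) - ones)


def passes_mac(column, min_mac=35):
    """MAC filter from the legend: keep SNPs with MAC >= 35."""
    return minor_allele_count(column) >= min_mac


def four_gamete_violation(col_a, col_b, tolerance=3):
    """Classic four-gamete check for two SNPs, with a tolerance.

    Assumed rule (not confirmed by the source): the pair is flagged when
    all four gametes 00, 01, 10 and 11 are present and even the rarest
    one is carried by more than `tolerance` haplotypes.
    """
    gametes = Counter(zip(col_a, col_b))
    rarest = min(gametes.get(g, 0) for g in [(0, 0), (0, 1), (1, 0), (1, 1)])
    return rarest > tolerance


# Toy data: 100 haplotypes.
snp_common = [1] * 40 + [0] * 60   # MAC = 40, kept
snp_rare = [1] * 10 + [0] * 90     # MAC = 10, dropped
left = [0] * 50 + [1] * 50
right = [0] * 25 + [1] * 25 + [0] * 25 + [1] * 25  # all four gametes common
```

Allowing a small tolerance keeps SNP pairs whose fourth gamete appears only a handful of times, which could arise from genotyping or imputation error rather than from historical exchange.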