A full resolution version of this figure is available as
Figure 3—figure supplement 2—source data 1.
(a) Haplotypic representation of 14000 SNPs from 5008 imputed chr8 genotypes from the 1000 Genomes Project (Left: chr8:42178101–43838849, Right: chr8:46839548–49217022; hg19). SNPs were filtered for MAC ≥ 35 and passing the
4gt_dco with a tolerance of three (see Materials and methods). Minor alleles are shown in black. The centromeric gap is indicated by a red line. Haplotypes were clustered with UPGMA based on the Hamming distances between haplotypes composed of 1000 SNPs surrounding the gap (indicated by the red bar at bottom). Superpopulation is indicated at left.
(b) Filtered cenhaps show very little evidence of recombination and support archaic ancestry of a basal cenhap found in Africa. Cenhaps with putative exchange in their ancestry were filtered from the data in
a by clustering SNPs in the low-recombination regions on the left and right sides of the gap separately (Left: chr8:42668082–43838849, Right: chr8:46839548–48639846, indicated by green and red lines; hg19). Left-side and right-side clades with little evidence of recombination were intersected to yield 1661 cenhaps used in downstream analysis of archaic contribution and TMRCA. Analysis of possible archaic descent was limited to an internal window of 10602 SNPs, indicated by green lines (chr8:43202774–47755914; hg19). At the left are log₂ counts of DM (derived in archaic, shared by cenhap), DN (derived in archaic, not shared by cenhap) and AN (ancestral in archaic, not shared by cenhap) based on the Altai Neanderthal (NEA) and Denisovan (DEN) sequences using chimpanzee as an outgroup (Prüfer et al., 2017). The gray bar at top indicates the region included in analysis of archaic content and black bars indicate SNPs with data for archaic and outgroup state. The red bar at bottom indicates the 1000 SNP region used for clustering.
(c) A UPGMA tree based on the synonymous divergence in eight genes (see
Figure 3—source data 2) in the three major chr8 cenhaps, assuming the TMRCA of humans and chimps is 6.5 MYA. The error bars at each node represent ± two standard deviations of the distribution of estimated TMRCAs across the genes.
(d) Bar plots indicating the mean and 95% confidence intervals of DM, DN, AM (ancestral in archaic, shared by cenhap) and AN counts for cenhap groups (as partitioned in b) relative to the archaic genomes (Prüfer et al., 2017).
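
As an illustration of the clustering described for panel a, the following is a minimal sketch of UPGMA (average-linkage) clustering on pairwise Hamming distances. It is not the authors' pipeline: the function names `hamming` and `upgma`, the string encoding of haplotypes (one character per SNP) and the toy input are all assumptions made for this example.

```python
from itertools import combinations


def hamming(h1, h2):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


def upgma(haplotypes):
    """Average-linkage clustering; returns a nested tuple of haplotype indices."""
    # Each active cluster id maps to (subtree, number of haplotypes in it).
    clusters = {i: (i, 1) for i in range(len(haplotypes))}
    dist = {
        frozenset((i, j)): float(hamming(haplotypes[i], haplotypes[j]))
        for i, j in combinations(range(len(haplotypes)), 2)
    }
    next_id = len(haplotypes)
    while len(clusters) > 1:
        # Closest pair of active clusters (ties resolved by lowest ids).
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # UPGMA update: size-weighted average of the two old distances.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del dist[frozenset((a, b))]
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical toy data: four haplotypes over four SNPs (0 = major, 1 = minor).
toy = ["0000", "0001", "1110", "1111"]
print(upgma(toy))
```

For thousands of haplotypes, a dedicated library implementation of average linkage would be a more practical choice than this quadratic-memory sketch.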
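
The DM, DN, AM and AN categories used in panels b and d can be sketched as a per-site classification. This is one reading of the caption's definitions, under stated assumptions: the outgroup (chimpanzee) allele is taken as ancestral, the archaic genome is reduced to a single allele per site, "shared" means the cenhap carries the archaic allele, and sites with missing data are skipped. The function names and the pseudocount used before taking log₂ are hypothetical; the caption does not say how zero counts are handled.

```python
import math
from collections import Counter


def classify_site(outgroup, archaic, cenhap):
    """Return 'DM', 'DN', 'AM' or 'AN' for one site, or None if data are missing."""
    if None in (outgroup, archaic, cenhap):
        return None
    # Assumption: the outgroup allele defines the ancestral state.
    state = "A" if archaic == outgroup else "D"
    # Assumption: "shared" means the cenhap allele equals the archaic allele.
    match = "M" if cenhap == archaic else "N"
    return state + match


def count_classes(outgroup_seq, archaic_seq, cenhap_seq):
    """Tally the four categories over aligned per-site alleles."""
    counts = Counter({"DM": 0, "DN": 0, "AM": 0, "AN": 0})
    for o, a, c in zip(outgroup_seq, archaic_seq, cenhap_seq):
        label = classify_site(o, a, c)
        if label is not None:
            counts[label] += 1
    return counts


def log2_counts(counts, pseudocount=1):
    """log2 of each count; the pseudocount (an assumption) avoids log2(0)."""
    return {k: math.log2(v + pseudocount) for k, v in counts.items()}


# Hypothetical five-site example; None marks a site without archaic data.
outgroup = ["A", "A", "A", "A", "A"]
archaic = ["G", "G", "A", "A", None]
cenhap = ["G", "A", "A", "G", "A"]
counts = count_classes(outgroup, archaic, cenhap)
print(dict(counts), log2_counts(counts))
```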
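
The calibration in panel c can be illustrated with a short sketch that scales per-gene synonymous divergence to time and summarizes the spread across genes. It assumes a strict molecular clock, so that a node's TMRCA is 6.5 MYA multiplied by the ratio of its synonymous divergence to the human-chimpanzee synonymous divergence in the same gene. That scaling rule, the function names and the numbers below are assumptions for illustration, not values or methods taken from the paper.

```python
from statistics import mean, stdev

HUMAN_CHIMP_TMRCA_MYA = 6.5  # calibration stated in the caption


def tmrca_per_gene(ds_node, ds_human_chimp, calibration=HUMAN_CHIMP_TMRCA_MYA):
    """Per-gene TMRCA estimates (MYA) under an assumed strict molecular clock."""
    if len(ds_node) != len(ds_human_chimp):
        raise ValueError("need one human-chimp divergence per gene")
    return [calibration * d / d_hc for d, d_hc in zip(ds_node, ds_human_chimp)]


def summarize(estimates):
    """Mean and the mean ± two sample standard deviations across genes."""
    m = mean(estimates)
    sd = stdev(estimates)  # requires at least two genes
    return m, m - 2 * sd, m + 2 * sd


# Hypothetical synonymous divergences for three genes.
ds_between_cenhaps = [0.001, 0.002, 0.003]
ds_human_chimp = [0.010, 0.010, 0.010]
estimates = tmrca_per_gene(ds_between_cenhaps, ds_human_chimp)
print(summarize(estimates))
```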