
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 May 24:2023.09.25.559202. Originally published 2023 Sep 25. [Version 3] doi: 10.1101/2023.09.25.559202

The MUC19 Gene: An Evolutionary History of Recurrent Introgression and Natural Selection

Fernando A Villanea 1, David Peede 2,3,4, Eli J Kaufman 5, Valeria Añorve-Garibay 3,6, Elizabeth T Chevy 3, Viridiana Villa-Islas 6,7, Kelsey E Witt 8, Roberta Zeloni 9,10, Davide Marnetto 10, Priya Moorjani 11,12, Flora Jay 13, Paul N Valdmanis 5, María C Ávila-Arcos 6, Emilia Huerta-Sánchez 2,3,14
PMCID: PMC10557577  PMID: 37808839

Abstract

We study the gene MUC19, for which some modern humans carry a Denisovan-like haplotype. MUC19 is a mucin, a glycoprotein that forms gels with various biological functions. We find diagnostic variants for the Denisovan-like MUC19 haplotype at high frequencies in admixed Latin American individuals, and at the highest frequency in 23 ancient Indigenous American individuals, all of whom predate population admixture with Europeans and Africans. We find that the Denisovan-like MUC19 haplotype is under positive selection and carries a higher copy number of a 30 base-pair variable number tandem repeat, and that copy numbers of this repeat are exceedingly high in American populations. Finally, we find that some Neanderthals carry the Denisovan-like MUC19 haplotype, and that it was likely introgressed into human populations through Neanderthal introgression rather than Denisovan introgression.

One-Sentence Summary:

Modern humans and Neanderthals carry a Denisovan variant of the MUC19 gene, which is under positive selection in populations of Indigenous American ancestry.


Most modern humans of non-African ancestry carry both Neanderthal and Denisovan genomic variants [1–3]. While most of these variants are putatively neutral, some archaic variants found in modern humans have been targets of positive natural selection [4–9]. Interbreeding with Neanderthals and Denisovans may have thereby facilitated adaptation to the myriad novel environments that modern humans encountered as they populated the globe [10]. Indeed, several studies have identified signatures of adaptive introgression in Eurasian and Oceanian populations [11–20]. Indigenous American populations, however, present great potential for studying the underlying evolutionary processes of local adaptation [21]. In the 25,000 years since the first individuals populated the American continent, these populations would have encountered manifold novel environments, far different from the Beringian steppe, to which their ancestral population was adapted [22].

Previous studies identified MUC19, a gene involved in immunity, as a candidate for adaptive introgression among populations from the 1000 Genomes Project (1KG). These studies found that the region surrounding MUC19 harbors several Denisovan variants in Mexicans (MXL) [23] and reported that this region has one of the highest densities of Denisovan alleles in Mexicans [24]. MUC19 was also reported to be under positive selection in North American Indigenous populations using two methods for detecting positive selection, the Population Branch Statistic (PBS) and the integrated haplotype score (iHS) [25].

In this study, we confirm and further characterize signatures of both introgression and positive selection at MUC19 in MXL. We find an archaic haplotype segregating at high frequency in most populations on the American continent, which is also present in two of the late high-coverage Neanderthal genomes—Chagyrskaya and Vindija. MXL individuals harbor Denisovan-specific coding mutations in MUC19 at high frequencies, and exhibit elevated copy number of a tandem repeat region within MUC19 compared to other worldwide populations. Our results point to a complex pattern of multiple introgression events, from Denisovans to Neanderthals, and Neanderthals to modern humans, which may have played a unique role in the evolutionary history of Indigenous American populations.

Results

Signatures of adaptive introgression at MUC19 in admixed populations from the Americas

We compiled introgressed tracts that overlap the NCBI RefSeq coordinates for MUC19 (hg19, Chr12:40787196–40964559) by at least one base pair. Figure 1A shows the density of introgressed tracts for all non-African populations in the region, using introgression maps inferred with hmmix [26]. All non-African populations harbor introgressed tracts overlapping this region, but at much lower frequencies than the American populations (AMR tract frequency: ~0.183, non-AMR tract frequency: ~0.087; Proportions Z-test, P-value: 5.011e-14; Fisher’s Exact Test, P-value: 2.144e-12; Table S1). The Mexican population (MXL), which has a large component of Indigenous American genetic ancestry (~48%; [27]), exhibits the highest frequency of introgressed tracts (0.305; Table S2). Given this, we examined a 742kb window containing the longest introgressed tract found in Mexicans (hg19, Chr12:40272001–41014000; Figure S1). This region contains 135 Denisovan-specific SNPs, classified as such because they are rare or absent in African populations (<1%), present in MXL (>1%), and shared uniquely with the Altai Denisovan. All 135 of these SNPs fall within a core 72kb region (hg19, Chr12:40759001–40831000; shaded gray region in Figure 1A) that has the highest introgressed tract density among individuals in the 1KG (see [51]), making both the 742kb and 72kb regions outliers for Denisovan-specific SNP density in MXL (742kb region P-value: <3.164e-4; 72kb region P-value: <3.389e-5; Figure S2; Tables S3–S4). In contrast, there are 80 Neanderthal-specific SNPs in MXL within the larger 742kb region (P-value: 0.159; Figure S3; Table S5), with only four located in the 72kb region (P-value: 0.263; Figure S3; Table S6).
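The SNP partition described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Site` record, its field names, and the helper functions are hypothetical; we assume frequencies refer to the allele carried by the archaic, and we interpret "shared uniquely with the Altai Denisovan" as carried by the Altai Denisovan and by none of the three high-coverage Neanderthals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Site:
    """Hypothetical per-SNP record; field names are illustrative only."""
    pos: int               # hg19 position on chromosome 12
    afr_freq: float        # frequency of the archaic-matching allele in 1KG AFR
    mxl_freq: float        # frequency of the same allele in MXL
    in_denisovan: bool     # allele carried by the Altai Denisovan
    in_neanderthal: bool   # allele carried by at least one high-coverage Neanderthal


def classify_site(site: Site, afr_max: float = 0.01, mxl_min: float = 0.01) -> str:
    """Assign a SNP to an archaic partition using the cut-offs described in the text."""
    # Every archaic partition requires rare/absent in Africa (<1%) and present in MXL (>1%).
    if not (site.afr_freq < afr_max and site.mxl_freq > mxl_min):
        return "other"
    if site.in_denisovan and not site.in_neanderthal:
        return "denisovan_specific"
    if site.in_neanderthal and not site.in_denisovan:
        return "neanderthal_specific"
    if site.in_denisovan and site.in_neanderthal:
        return "shared_archaic"
    return "other"  # present in MXL but not carried by any archaic


def count_in_window(sites: Sequence[Site], start: int, end: int,
                    label: str = "denisovan_specific") -> int:
    """Count SNPs of one partition whose position lies in [start, end]."""
    return sum(1 for s in sites if start <= s.pos <= end and classify_site(s) == label)
```

Comparing such a count with the same count in every other equally sized window genome-wide would yield empirical outlier P-values of the kind reported above; how the background windows were tiled is not specified in this section.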

Figure 1. Signals of adaptive introgression at MUC19.


(A) Density of introgressed tracts inferred using hmmix that overlap MUC19 for the 1KG (black outline) and stratified by superpopulation: Admixed Americans (AMR) in bluish green, South Asians (SAS) in reddish purple, East Asians (EAS) in blue, and Europeans (EUR) in vermillion. The gray shaded region corresponds to the focal 72kb region, which is the densest contiguous region of introgressed tracts longer than 40kb. (B) $U_{\mathrm{AFR},B,\mathrm{Denisovan}}(1\%, 30\%, 100\%)$ values for each non-African population, stratified by superpopulation, per NCBI RefSeq gene (gray X’s), where MUC19 is denoted by a yellow X. (C) Population Branch Statistic (PBS) for the Mexican population (MXL) in the 1KG, using the Han Chinese (CHB) and Central European (CEU) populations in the 1KG as control populations ($\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$), for all SNPs in the 742kb region that corresponds to the longest introgressed tract found in MXL. The orange squares represent Denisovan-specific SNPs, the sky blue diamonds represent Neanderthal-specific SNPs, and the reddish purple pentagons represent shared archaic SNPs; all of these archaic SNP partitions are rare or absent in Africa and present in MXL (see [51]). The black triangles represent SNPs present in both modern human populations and the archaics, while the gray circles represent SNPs private to modern humans. The black dashed line represents the 99.95th percentile of genome-wide $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ scores, and the gray shaded region corresponds to the focal 72kb region (the same gray shaded region as in panel A). The MUC19 and LRRK2 genes are fully encompassed within the 742kb region, while ~65% of SLC2A13 overlaps the 742kb region. Below the $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ points are the introgressed tracts for MXL (bluish green), CHB (blue), and CEU (vermillion), sorted from shortest to longest within each population.

To test whether natural selection is acting on this region, we computed three statistics: one developed to detect adaptive introgression, $U_{A,B,C}(w, x, y)$ [24], where A is the African superpopulation, B is a non-African population, C is the Altai Denisovan, and (w, x, y) are allele frequency thresholds in A, B, and C; and two for positive selection (PBS and iHS). For each gene, we computed $U_{\mathrm{AFR},B,\mathrm{Denisovan}}(w=1\%, x=30\%, y=100\%)$, which counts the Denisovan alleles found in the homozygous state (100%) that are almost absent in Africans (<1%) and reach a frequency of at least 30% in a given non-African population. Figure 1B shows that MUC19 in MXL is an extreme outlier: no other gene in any non-African population exhibits such a large value of $U_{\mathrm{AFR},B,\mathrm{Denisovan}}(1\%, 30\%, 100\%)$. When we compute the same statistic in windows instead of per gene, the MUC19 region is an outlier only in MXL, and the statistic is zero for all other non-African populations (72kb region P-value: <3.284e-5; 742kb region P-value: <3.139e-4; Figure S4; Tables S7–S8). Furthermore, we compared the windowed $U_{\mathrm{AFR},B,\mathrm{Denisovan}}(w=1\%, x=30\%, y=100\%)$ results with the corresponding $Q95_{\mathrm{AFR},B,\mathrm{Denisovan}}(w=1\%, y=100\%)$ values, which quantify the 95th percentile of the frequencies, in a given non-African population B, of Denisovan alleles that are found in the homozygous state (100%) and are almost absent in Africans (<1%). For both the 72kb and 742kb MUC19 regions, $Q95_{\mathrm{AFR},\mathrm{MXL},\mathrm{Denisovan}}(w=1\%, y=100\%) \approx 30\%$, which suggests that both regions exhibit signals consistent with adaptive introgression that are not observed in any other 1KG population (Figures S5–S6; Table S9).
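The two windowed statistics can be made concrete with a short sketch that follows the definitions given above. The function and argument names are our own, per-site archaic-allele frequencies are assumed to be precomputed, and the percentile uses linear interpolation, which may differ from the convention in [24]. For a single diploid archaic individual the frequency in C is 0, 0.5, or 1, so y = 100% means homozygous.

```python
from typing import List, Sequence, Tuple


def _quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile (the same rule as NumPy's default)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def u_and_q95(
    freq_a: Sequence[float],  # archaic-allele frequency in A (e.g., AFR)
    freq_b: Sequence[float],  # archaic-allele frequency in B (e.g., MXL)
    freq_c: Sequence[float],  # archaic-allele frequency in C (Altai Denisovan: 0, 0.5, or 1)
    w: float = 0.01,
    x: float = 0.30,
    y: float = 1.0,
) -> Tuple[int, float]:
    """Return (U_{A,B,C}(w, x, y), Q95_{A,B,C}(w, y)) for one gene or window."""
    # Candidate sites: allele nearly absent in A (< w) and at frequency y in C.
    candidates = [b for a, b, c in zip(freq_a, freq_b, freq_c) if a < w and c == y]
    # U counts the candidates that reach at least x in B.
    u = sum(1 for b in candidates if b >= x)
    # Q95 is the 95th percentile of B frequencies among the candidates.
    q95 = _quantile(candidates, 0.95) if candidates else 0.0
    return u, q95
```

Keeping U and Q95 in one function makes explicit that both are computed over the same candidate sites and differ only in how the B frequencies are summarized (a count above x versus an upper percentile).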

We next computed $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$, with the Han Chinese (CHB) and Central European (CEU) populations as controls, for both the region corresponding to the longest introgressed tract in MXL (742kb) and the 72kb region in MUC19. Both regions exhibit statistically significant $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ values compared to other 742kb ($\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$: 0.066; P-value: 0.004) and 72kb ($\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$: 0.127; P-value: 0.002) windows of the genome, respectively (Figure S7; Tables S10–S11). We then computed $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ for each SNP in the 742kb region. Figure 1C shows that in MXL many SNPs in that region have statistically significant PBS values (417 out of 6144 SNPs), all of which lie above the 99.95th percentile of genome-wide $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ values (Benjamini-Hochberg corrected P-values: <0.01; see Supplemental Section S1 in [51]). Some SNPs near the SLC2A13 gene have larger $\mathrm{PBS}_{\mathrm{MXL:CHB:CEU}}$ values than those within the 72kb MUC19 region, but this is due to changes in the archaic allele frequency in CHB and CEU, as the introgressed tracts in these populations are sparser than those in MXL (see tracts in Figure 1C). When we partition the MXL population into two demes, consisting of individuals with more than 50% and less than 50% Indigenous American ancestry genome-wide [27], and recompute PBS, we find that PBS values for archaic variants are elevated among individuals with a higher proportion of Indigenous American ancestry, suggesting that this region was likely targeted by selection before admixture with European and African populations (Figure S8; Tables S10–S11).
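For reference, PBS for a focal population with two controls is usually computed from pairwise $F_{ST}$ via $T = -\ln(1 - F_{ST})$ and $\mathrm{PBS} = (T_{\mathrm{MXL,CHB}} + T_{\mathrm{MXL,CEU}} - T_{\mathrm{CHB,CEU}})/2$. The per-SNP sketch below assumes a simplified Hudson-style $F_{ST}$ without finite-sample correction; the estimator actually used (see [51]) may differ, and the function names are hypothetical.

```python
import math


def fst_hudson_simple(p1: float, p2: float) -> float:
    """Per-SNP Hudson-style F_ST from allele frequencies (no sample-size correction)."""
    denom = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if denom == 0 else (p1 - p2) ** 2 / denom


def branch_length(p1: float, p2: float) -> float:
    """T = -ln(1 - F_ST); F_ST is capped just below 1 to avoid log(0) at fixed differences."""
    fst = min(fst_hudson_simple(p1, p2), 1 - 1e-6)
    return -math.log(1 - fst)


def pbs(p_focal: float, p_ctrl1: float, p_ctrl2: float) -> float:
    """Population Branch Statistic for the focal population (e.g., MXL vs. CHB and CEU)."""
    return (
        branch_length(p_focal, p_ctrl1)
        + branch_length(p_focal, p_ctrl2)
        - branch_length(p_ctrl1, p_ctrl2)
    ) / 2
```

A SNP whose archaic allele has risen in MXL while staying rare in both controls receives a large positive value, whereas a shift shared with CHB or CEU is partly subtracted out by the $T_{\mathrm{CHB,CEU}}$ term.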

To exclude the possibility that demographic events such as a founder effect explain the observed signatures of positive selection, we simulated under the best-fitting demographic parameters inferred for the MXL population [28] to obtain the expected null distribution of PBS values. We first showed that PBS has power to detect adaptive introgression under this demographic model (see Supplemental Section S1 in [51]). We found that demographic forces alone produce lower PBS values than those observed in this gene region, even under a very conservative null model of heterosis (see Supplemental Section S1 in [51]). To also consider haplotype-based measures of positive selection, we computed the integrated haplotype score (iHS) for every 1KG population using selscan [29] ([51]). Among all 1KG populations, MXL is the only population with an elevated proportion of SNPs with normalized |iHS| > 2 in either the 742kb (599 out of 2248 SNPs) or 72kb region (229 out of 425 SNPs; Tables S12–S13). In MXL, 130 of the 135 Denisovan-specific SNPs in the 72kb region have normalized |iHS| > 2, reflective of positive selection (Figure S9; Tables S12–S13; see Supplemental Section S2 in [51]), which supports our previous allele frequency-based tests of natural selection.
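The normalized |iHS| > 2 criterion refers to iHS values standardized within derived-allele-frequency bins genome-wide, which, to our understanding, is what selscan's companion `norm` program does. The sketch below re-implements that idea in plain Python for illustration only; the bin count, the use of the population standard deviation, and the function names are assumptions, and it does not compute unstandardized iHS itself.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def normalize_ihs(raw_ihs: Sequence[float], daf: Sequence[float], n_bins: int = 100) -> List[float]:
    """Z-score unstandardized iHS within derived-allele-frequency bins."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for i, f in enumerate(daf):
        bins[min(int(f * n_bins), n_bins - 1)].append(i)
    normalized = [0.0] * len(raw_ihs)
    for idx in bins.values():
        vals = [raw_ihs[i] for i in idx]
        mean = statistics.fmean(vals)
        sd = statistics.pstdev(vals)
        for i in idx:
            normalized[i] = (raw_ihs[i] - mean) / sd if sd > 0 else 0.0
    return normalized


def fraction_extreme(normalized: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of SNPs with |normalized iHS| > threshold."""
    return sum(abs(v) > threshold for v in normalized) / len(normalized)
```

Applied genome-wide and then restricted to the 72kb window, `fraction_extreme` would correspond to ratios such as 229/425 ≈ 0.54 reported above.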

Admixed individuals exhibit an elevated number of variable number tandem repeats at MUC19

MUC19 contains a 30 base-pair variable number tandem repeat (VNTR; hg19, Chr12:40876395–40885001; Figure S10), located 45.4kb away from the core 72kb haplotype but within the larger 742kb introgressed region. To test whether individuals who harbor an introgressed tract overlapping the repeat region differ in the number of repeats from individuals who do not, we calculated the number of repeats of the 30bp motif in the 1KG individuals (see [51]; Figure S11; Tables S14–S15). For each individual, we report the average number of repeats between their two chromosomes. The genomes of the four archaic individuals do not harbor an elevated copy number of tandem repeats (Altai Denisovan: 296 copies; Altai Neanderthal: 379 copies; Vindija Neanderthal: 268 copies; Chagyrskaya Neanderthal: 293 copies). Among all individuals from the 1KG, we identified outlier individuals with a number of repeats above the 95th percentile (>487 repeats; dashed line in Figure 2). MXL individuals have on average ~493 repeats, and individuals from the admixed American superpopulation have on average ~417 repeats (Figure 2A; Tables S16–S17). In contrast, populations outside the admixed American superpopulation have an average of ~341 to ~365 repeats (Figure 2A; Table S16). Of all the outlier individuals in the 1KG (>487 repeats), a significant proportion (~77%) are from admixed American populations (Proportions Z-test, P-value: 3.971e-17; Tables S18–S21; Figure S12). Outlier individuals from the Americas also carry a significantly higher copy number of tandem repeats than outlier individuals from populations outside the admixed American superpopulation (Mann-Whitney U, P-value: 5.789e-7; Figure S12; Tables S18–S21). In MXL, exactly 50% of individuals exhibit an elevated copy number of tandem repeats (Table S16).
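The outlier rule above (average copy number across an individual's two chromosomes, compared with the 95th percentile over all 1KG individuals) can be sketched as follows. The per-haplotype copy-number input, the linear-interpolation percentile rule, and the function names are assumptions for illustration.

```python
import statistics
from typing import Dict, Sequence, Tuple


def mean_copies(per_haplotype: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Average repeat copies between each individual's two chromosomes."""
    return {ind: (h1 + h2) / 2 for ind, (h1, h2) in per_haplotype.items()}


def percentile_threshold(values: Sequence[float], pct: int = 95) -> float:
    """pct-th percentile with linear interpolation (statistics method='inclusive')."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


def outliers(copies: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Individuals whose average copy number exceeds the threshold."""
    return {ind: c for ind, c in copies.items() if c > threshold}
```

Because the threshold is computed from the pooled 1KG distribution, the share of outliers from any one superpopulation (such as the ~77% from admixed American populations) can then be tested against its share of the full sample.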

Figure 2. Copy number variation of a 30 base pair variable number tandem repeat motif in the 1KG individuals at MUC19.


(A) Average number of repeat copies between an individual’s two chromosomes for archaic individuals (black X’s), individuals who do not harbor an introgressed tract (sky blue X’s), individuals with one introgressed tract (yellow X’s), and individuals with two introgressed tracts (bluish green X’s) determined by the number of introgressed tracts inferred using hmmix overlapping the MUC19 VNTR, for each population in the 1KG. The mean number of repeat copies stratified by population is denoted by a grey diamond and the average number of repeat copies amongst individuals who carry exactly zero, one, and two introgressed tracts are denoted by sky blue, yellow, and bluish green circles respectively and are stratified by population. The black dashed line denotes the outlier threshold, which corresponds to the 95th percentile of the 1KG repeat copies distribution. Repeat copies appeared similar to the reference human genome (287.5 copies) in the Altai Denisovan (296 copies) and Altai (379 copies), Vindija (268 copies), and Chagyrskaya (293 copies) Neanderthal genomes. (B) The relationship between the average number of repeat copies between a MXL individual’s two chromosomes and the number of introgressed tracts overlapping the MUC19 VNTR region. Note that there is a significant positive correlation between the number of repeat copies and the number of introgressed tracts present in an MXL individual (Spearman’s ρ: 0.885; P-value: 2.839e-22). (C) The relationship between the average number of repeat copies between a MXL individual’s two chromosomes and the proportion of Indigenous American ancestry at the MUC19 VNTR region. Note that there is a significant positive correlation between the number of repeat copies and the proportion of Indigenous American ancestry in an MXL individual (Spearman’s ρ: 0.438; P-value: 2.940e-4).

Among individuals with an outlier number of repeats (>487), a significant proportion (~86%) have an introgressed tract overlapping the repeat region, and these individuals harbor more repeats than outlier individuals who do not carry an introgressed tract overlapping the VNTR region (Proportions Z-test, P-value: 2.127e-29; Mann-Whitney U, P-value: 1.398e-06; Figure S13; Tables S18–S21). All outlier MXL individuals carry at least one introgressed tract that overlaps the VNTR region (Figure 2). MXL has more individuals with an elevated copy number (>487 repeats) than any other 1KG population, and the number of repeats in an MXL individual is positively correlated with the number of introgressed tracts that overlap the VNTR (Spearman’s ρ: 0.885; P-value: 2.839e-22; Figure 2B; Figure S14; Table S22). Among MXL individuals, the number of repeats is also significantly positively correlated with the Indigenous American ancestry proportion at the repeat region (Spearman’s ρ: 0.483; P-value: 2.940e-4; Figure 2C; Figure S15; Tables S23–S24), while the African (Spearman’s ρ: −0.289; P-value: 2.072e-2; Figure S15; Tables S23–S24) and European (Spearman’s ρ: −0.353; P-value: 4.191e-3; Figure S15; Tables S23–S24) ancestry proportions are significantly negatively correlated with it. Taken together, in MXL an individual’s VNTR copy number is strongly predicted by the number of introgressed tracts that overlap the VNTR. To a lesser extent, it is also predicted by the Indigenous American ancestry proportion at the repeat region, indicating that individuals with elevated VNTR copy number have higher proportions of Indigenous American ancestry and harbor the introgressed haplotype. Individuals who carry an elevated number of MUC19 VNTR copies are likely to also carry the archaic haplotype, especially in admixed American populations, where the archaic MUC19 haplotype is found at the highest frequencies (Mann-Whitney U, P-value: 1.597e-87; Figure S13; Figure 2; Tables S18–S21).

Given the difficulty of calling repeat copy numbers from short-read data, we examined long-read sequence data from the Human Pangenome Reference Consortium (HPRC) and the Human Genome Structural Variant Consortium (HGSVC) [42]. These data corroborated our findings (Figure S10; Figure S16), revealing an extra 424 copies of the 30bp MUC19 tandem repeat exclusively in American samples, arranged in four additional segments of 106 repeats (3,171 bp each). This structural variant is exceptionally large: it effectively doubles the size of the ~12kb coding exon that harbors the tandem repeat (Figure S10).
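As a quick consistency check on these numbers, assuming each extra copy spans the nominal 30 bp motif, the added sequence is:

```latex
4 \times 106 = 424\ \text{copies}, \qquad
424 \times 30\ \text{bp} = 12{,}720\ \text{bp} \approx 12.7\ \text{kb}, \qquad
4 \times 3{,}171\ \text{bp} = 12{,}684\ \text{bp}
```

Both estimates are on the order of the ~12kb exon, in line with the statement that the variant roughly doubles its length. The small gap between 106 × 30 = 3,180 bp and the reported 3,171 bp per segment may reflect motif-length variation within segments, which this check does not address.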

Introgression introduced missense variants at MUC19

Inspecting the 135 Denisovan-specific SNPs and 4 Neanderthal-specific SNPs in the core 72kb region reveals that some modern humans carry two Denisovan-specific synonymous variants and nine Denisovan-specific non-synonymous (missense) variants (Table S25). We quantified the allele frequencies of these nine Denisovan-specific missense variants in present-day populations and in 23 ancient Indigenous American genomes that predate European colonization and the African slave trade (Figure 3A; Tables S26–S33). The Denisovan-specific missense mutations segregate at the highest frequencies in the admixed American superpopulation (frequency range in AMR: ~0.154 to ~0.157) compared to all other 1KG superpopulations (frequency range outside AMR: ~0 to ~0.108; Tables S27–S28). When we stratify by population instead of by superpopulation, the Denisovan-specific missense mutations segregate at frequencies between ~0.069 and ~0.305 among admixed American populations and at varying frequencies between ~0.005 and ~0.157 throughout European, East Asian, and South Asian populations, reaching the highest frequency in MXL, where all nine segregate at a frequency of ~0.305 (Figure 3A; Table S29). The mean Denisovan-specific missense mutation frequency is positively correlated with the introgressed tract frequency per population (Pearson’s ρ: 0.976; P-value: 5.306e-16; Figure S17).

Figure 3. Frequency and protein sequence context of the nine Denisovan-specific missense mutations at the 72kb region in MUC19.


(A) Heatmap depicting the frequency of Denisovan-specific missense mutations (columns) among the four archaic individuals (n = 2 per archaic individual), 23 ancient pre-European colonization American individuals (n = 46), the entire African superpopulation in the 1KG (AFR; n = 1008), and admixed American populations in the 1KG: Mexico (MXL; n = 128), Peru (PEL; n = 170), Colombia (CLM; n = 188), and Puerto Rico (PUR; n = 208), where n is the number of chromosomes in each population. The left-hand side of each row denotes one of the nine Denisovan-specific missense mutations by its position and amino acid substitution (hg19 reference amino acid → Denisovan-specific amino acid). The text in each cell gives the Denisovan-specific missense mutation frequency; for the ancient Americans we also give the 95% confidence interval. For the archaic individuals, each cell gives the individual’s amino acid genotype, and each AFR cell gives the homozygous hg19 reference amino acid genotype. (B) Denisovan-specific missense mutations in the context of the MUC19 protein sequence. The main plot shows the first 2000 residues, and the smaller subplot shows the full protein sequence. Conserved exons are colored sky blue and UniProt domains are colored orange, with text labels giving the UniProt domain identity: von Willebrand factor (VWF) D domains, the VWFC domain, and the C-terminal cystine knot-like (CTCK) domain. Each of the nine Denisovan-specific missense mutations is labeled by its rsID and plotted by residue index (x-axis) and Grantham score (y-axis). The color of each mutation denotes whether its Grantham score is less than 100 (black) or greater than 100 (vermillion), and the marker denotes whether its exon has a negative PhyloP score (diamonds) or a positive PhyloP score (crosses).

We then evaluated the frequency of the nine Denisovan-specific missense mutations in the 23 ancient pre-European colonization American individuals and found that each of them segregates at a higher frequency than in any admixed American population in the 1KG, although at frequencies statistically similar to those in MXL (see [51]; Figure 3A; Tables S29–S32). These ancient individuals were sampled across a wide geographic and temporal range (Figure S18; Table S26; [51]) and do not comprise a single population, yet we detect the Denisovan-specific missense mutations in sampled individuals from Alaska, Montana, California, Ontario, Central Mexico, Peru, and Patagonia (Table S30). In 22 unadmixed Indigenous Americans from the Simons Genome Diversity Project (SGDP), all nine Denisovan-specific missense variants segregate at a frequency of ~0.364, which is statistically similar to the ancient American frequencies (see [51]; Tables S31–S32) and higher than in any admixed American population in the 1KG, albeit statistically similar to MXL (Tables S31–S32). Because all nine missense mutations lie within a ~17.5kb region, we focused on the Denisovan-specific missense mutation at position Chr12:40808726, which has genotype information in 20 of the 23 ancient American individuals (Table S30), and quantified its frequency in both the ancient individuals and the admixed Americans in the 1KG. We then assessed the relationship between Indigenous American ancestry proportion at the 72kb region and the frequency of this Denisovan-specific missense mutation. We find a positive and significant relationship (Pearson’s ρ: 0.489; P-value: 1.982e-23; Figure S19) between an individual’s Indigenous American ancestry proportion and their Denisovan-specific missense mutation frequency, which suggests that recent admixture in the Americas may have diluted the introgressed ancestry at the 72kb region. We also quantified the frequency of these variants in 44 African individuals from the SGDP and found all nine Denisovan-specific missense variants at a frequency of ~0.011, in a single chromosome from a Khomani San individual (Table S33).
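Figure 3A reports 95% confidence intervals for the ancient American frequencies, but the interval method is not stated in this section. As an illustration only, the sketch below uses the Wilson score interval for a binomial proportion of k Denisovan-specific alleles among n chromosomes; this choice and the function name are our assumptions and may not match the authors' method.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.959963984540054) -> Tuple[float, float]:
    """Wilson score interval for k successes out of n trials (default: 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)
```

For example, a frequency of ~0.305 in 128 MXL chromosomes corresponds to 39 alleles, and `wilson_interval(39, 128)` returns an interval that brackets 39/128. Unlike the simple normal approximation, the Wilson interval stays within [0, 1] for small samples such as the 46 ancient chromosomes.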

To estimate the potential effect of these missense mutations on the MUC19 protein, we relied on Grantham scores [30]. One of the Denisovan-specific missense mutations, at position Chr12:40821871 (rs17467284 in Figure 3B), results in an amino acid change with a Grantham score of 102. This substitution is classified as moderately radical [31], suggesting that the amino acid introduced through introgression is likely to affect the translated protein’s structure or function. This mutation falls within an exon that is highly conserved across vertebrates (PhyloP score: 5.15, P-value: 7.08e-6; Figure 3B) [32], indicating that this residue is likely functionally important and that the amino acid change introduced by the Denisovan-specific missense mutation may have a significant structural or functional impact. Furthermore, this missense mutation falls between two von Willebrand factor D domains, which play an important role in the formation of mucin polymers and gel-like matrices [33]. Our results suggest that this Denisovan-specific missense mutation is a candidate for altering the MUC19 protein and may affect the polymerization properties of MUC19 and the viscosity of the mucin matrix.
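The moderately radical label for a score of 102 follows the commonly used binning of Grantham distances: conservative (0–50), moderately conservative (51–100), moderately radical (101–150), and radical (>150). We assume these are the categories of [31]; the hypothetical helper below only bins a score and does not compute Grantham distances from amino acid properties.

```python
def grantham_category(score: float) -> str:
    """Bin a Grantham distance into the four commonly used categories (assumed cut-offs)."""
    if score < 0:
        raise ValueError("Grantham distances are non-negative")
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"
```

The threshold of 100 used to color the points in Figure 3B coincides with the boundary between the moderately conservative and moderately radical bins.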

Identification of the most likely donor of the introgressed haplotype at MUC19

To identify the most likely archaic donor, we investigated patterns of haplotype divergence at MUC19 by comparing the modern human haplotypes in the 1KG in the 72kb region (see Methods; shaded region in Figure 1A) to the high-coverage archaic humans. We calculated the sequence divergence (the number of pairwise differences normalized by the effective sequence length) between every haplotype in the 1KG and the genotypes of the Altai Denisovan and the three high-coverage Neanderthal individuals (Figures S20–S22; Tables S34–S35). Haplotypes from the Americas exhibit a bimodal distribution of sequence divergence from the Altai Denisovan, which we do not observe for the African haplotypes (Figure 4A), as expected for an introgressed region. When comparing to all four high-coverage archaic genomes at the 72kb region (Figure 4B), the introgressed haplotypes found in the American superpopulation of the 1KG (AMR) form a distinct group in sequence divergence. Interestingly, Figure 4B shows that African haplotypes are closer in sequence divergence to the Altai Neanderthal than to the Altai Denisovan, although this difference is not statistically significant (Dataset 1 [52]; [51]). The Altai Neanderthal itself is significantly more distant than expected from the Altai Denisovan (sequence divergence: 0.003782, P-value: 0.002; Figure S23; Table S36), and this larger-than-expected divergence explains why African haplotypes appear closer to the Altai Neanderthal in this region. We corroborate the pattern observed in Figure 4 using PCA to visualize the haplotype structure in this region (Figure S24).
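Sequence divergence between a phased haplotype (alleles coded 0/1) and an unphased archaic genotype (0, 1, or 2 copies of the alternate allele) can be computed by averaging the mismatches against the two archaic alleles, so that a heterozygous archaic site contributes 0.5. This per-site rule, the function name, and treating the effective sequence length as the number of callable base pairs in the window are assumptions, not necessarily the exact procedure in [51].

```python
from typing import Sequence


def haplotype_divergence(hap: Sequence[int], geno: Sequence[int], effective_length: int) -> float:
    """Pairwise differences between a haplotype and a diploid genotype, per callable bp.

    hap[i] is 0 or 1; geno[i] is the archaic alternate-allele count (0, 1, or 2).
    Sites with missing data are assumed to be filtered out beforehand.
    """
    if effective_length <= 0:
        raise ValueError("effective_length must be positive")
    # |h - g/2| is the mean mismatch between the haplotype allele and the two archaic alleles.
    diffs = sum(abs(h - g / 2) for h, g in zip(hap, geno))
    return diffs / effective_length
```

Computing this value for every 1KG haplotype against each archaic genome gives the per-haplotype points plotted in Figure 4.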

Figure 4. Haplotype divergence at the 72kb region in MUC19.


(A) Distribution of haplotype divergence (the number of pairwise differences between a modern human haplotype and an archaic genotype, normalized by the effective sequence length) with respect to the Altai Denisovan for all individuals in the Admixed American (AMR, black bars) and African (AFR, gray bars) superpopulations. (B) Joint distribution of haplotype divergence from the Altai Denisovan (x-axis) and from the Neanderthals (y-axis; Altai Neanderthal in sky blue, Chagyrskaya Neanderthal in yellow, and Vindija Neanderthal in reddish purple) for all individuals in the AMR (circles) and AFR (triangles) superpopulations. The three grey ellipses (α, β, and γ) represent the three distinct haplotype groups segregating in the 1KG. The α ellipse represents the introgressed haplotypes, which exhibit low sequence divergence from the Altai Denisovan, high sequence divergence from the Altai Neanderthal, and intermediate sequence divergence from the Chagyrskaya and Vindija Neanderthals (higher than from the Altai Denisovan but lower than from the Altai Neanderthal). The β ellipse represents the non-introgressed haplotypes, which exhibit high sequence divergence from the Altai Denisovan, low sequence divergence from the Altai Neanderthal, and intermediate sequence divergence from the Chagyrskaya and Vindija Neanderthals (lower than from the Altai Denisovan but higher than from the Altai Neanderthal). The AMR haplotype within the γ ellipse lies at intermediate sequence divergence values relative to the α and β ellipses and represents one of seven recombinant haplotypes segregating in the 1KG (see Figure S44 in [51]).

Although our $U_{\mathrm{AFR},\mathrm{MXL},\mathrm{Denisovan}}(1\%, 30\%, 100\%)$ and archaic SNP density results demonstrate that the introgressed haplotype at the 72kb region shares the most alleles with the Altai Denisovan (Figure 4B), this region is not statistically significantly closer to the Altai Denisovan individual than expected from the genomic background of sequence divergence (sequence divergence: 0.00097, P-value: 0.237; Figure S25; Table S37). This is not unusual, given that the Altai Denisovan is not closely related to the Denisovan introgressed segments in modern humans (see Supplemental Section S5 in [51]), and it may indicate that the Denisovan donor population of the 72kb region in MUC19 is not closely related to the Altai Denisovan individual. Furthermore, the 72kb region is also not statistically significantly closer to the Neanderthals than expected from the genomic background of sequence divergence (sequence divergence from the Altai Neanderthal: 0.003648, P-value: 0.995; Chagyrskaya Neanderthal: 0.001818, P-value: 0.811; Vindija Neanderthal: 0.001816, P-value: 0.806; Figure S25; Table S37).
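The P-values above compare the focal divergence with a genomic background distribution. A minimal sketch follows, assuming the background consists of divergence values from genome-wide windows of the same size and that the P-value is the one-sided fraction of background windows at least as extreme as the focal region; both assumptions and the function names are ours.

```python
from typing import Sequence


def empirical_p_closer(observed: float, background: Sequence[float]) -> float:
    """Fraction of background windows with divergence <= observed (at least as close)."""
    return sum(b <= observed for b in background) / len(background)


def empirical_p_farther(observed: float, background: Sequence[float]) -> float:
    """Fraction of background windows with divergence >= observed (at least as distant)."""
    return sum(b >= observed for b in background) / len(background)
```

Under this convention, a large value such as 0.995 for the Altai Neanderthal would mean that almost all background windows are at least as close as the focal region, while the "more distant than expected" test for the Altai Neanderthal versus the Altai Denisovan uses the opposite tail.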

As an additional approach, we used the D+ statistic to assess which archaic human shares the most alleles with the introgressed haplotype at the 72kb region in MUC19 [34, 35]. We performed D+(P1, P2; P3, Outgroup) tests with the following configuration: the Yoruba population (YRI) as P1; the focal MXL individual (NA19664), who carries two copies of the introgressed haplotype with an affinity to the Altai Denisovan, as P2; one of the four high-coverage archaic genomes as P3; and the EPO ancestral allele call from the six-primate alignment as the Outgroup. We observe a positive and significant D+ value only when the Altai Denisovan is used as P3, the putative donor population (D+: 0.743, P-value: 1.386e-5; Figure S26; Table S38). Conversely, when any of the three Neanderthals is used as P3, we observe non-significant D+ values (P3: Altai Neanderthal, D+: −0.622, P-value: 0.999; P3: Chagyrskaya Neanderthal, D+: 0.175, P-value: 0.183; P3: Vindija Neanderthal, D+: 0.182, P-value: 0.174; Figure S26; Table S38). These D+ results suggest that the introgressed haplotype at the 72kb MUC19 region shares more alleles with the Altai Denisovan, which is not observed with any of the three Neanderthals, and provide evidence that the introgressed haplotype found in modern humans is Denisovan-like.
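D+ extends Patterson's D by also counting sites where P2 and P3 share the ancestral allele while P1 carries the derived one. With derived-allele frequencies $p_1, p_2, p_3, p_O$ at each site, we assume the formulation $D^+ = \frac{\sum(\mathrm{ABBA} - \mathrm{BABA}) + \sum(\mathrm{BAAA} - \mathrm{ABAA})}{\sum(\mathrm{ABBA} + \mathrm{BABA}) + \sum(\mathrm{BAAA} + \mathrm{ABAA})}$. The sketch below computes only the point estimate; assessing significance requires a resampling scheme such as a block bootstrap or jackknife, which is omitted, and the function name is our own.

```python
from typing import Sequence


def d_plus(p1: Sequence[float], p2: Sequence[float], p3: Sequence[float], po: Sequence[float]) -> float:
    """Point estimate of D+(P1, P2; P3, Outgroup) from per-site derived-allele frequencies."""
    abba = baba = baaa = abaa = 0.0
    for a, b, c, o in zip(p1, p2, p3, po):
        anc_o = 1 - o  # probability the outgroup carries the ancestral allele
        abba += (1 - a) * b * c * anc_o        # P2 and P3 share the derived allele
        baba += a * (1 - b) * c * anc_o        # P1 and P3 share the derived allele
        baaa += a * (1 - b) * (1 - c) * anc_o  # P2 and P3 share the ancestral allele
        abaa += (1 - a) * b * (1 - c) * anc_o  # P1 and P3 share the ancestral allele
    denom = abba + baba + baaa + abaa
    return ((abba - baba) + (baaa - abaa)) / denom if denom > 0 else 0.0
```

Positive values indicate excess allele sharing between P2 (the focal MXL individual) and P3, as reported above for the Altai Denisovan; values near zero indicate no excess sharing relative to P1.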

When we consider the 742kb region in MXL, we find that it is closest to the Chagyrskaya and Vindija Neanderthals and significantly closer to them than expected from the genomic background (sequence divergence from the Chagyrskaya Neanderthal: 0.000661, P-value: 0.006; from the Vindija Neanderthal: 0.000656, P-value: 0.007; Figures S27–S30; Tables S39–S41; Dataset 2 [52]; [51]). We also tested whether this region is statistically significantly closer to the Altai Denisovan than expected from the genomic background and found that it is, albeit not as close as it is to the Chagyrskaya and Vindija Neanderthals (sequence divergence from the Altai Denisovan: 0.000806, P-value: 0.019; Figures S27–S30; Tables S39–S41). We then performed D+ analyses for the 742kb region with the same configurations as for the 72kb region and observe positive and significant D+ values when P3 is the Chagyrskaya Neanderthal (D+: 0.381, P-value: 7.375e-6; Figure S31; Table S42) or the Vindija Neanderthal (D+: 0.383, P-value: 7.505e-6; Figure S31; Table S42), but not when P3 is the Altai Neanderthal (D+: 0.091, P-value: 1.442e-1; Figure S31; Table S42). D+ is also significant when the Altai Denisovan is P3 (D+: 0.377, P-value: 9.889e-8; Figure S31; Table S42). These D+ results are consistent with our sequence divergence results, which indicate that the introgressed haplotype at the 742kb MUC19 region has a high affinity for the Altai Denisovan and the two late Neanderthals, but not for the Altai Neanderthal (Figures S20–S31; Tables S34–S42).

Given the high density of Denisovan-specific alleles (Figure S2; Table S4) and the sequence divergence and D+ results for the 72kb and 742kb regions, the most parsimonious explanation would be that a Denisovan population introduced this haplotype into non-Africans. However, our 742kb results suggest that a Neanderthal population could also have introduced the introgressed haplotype. This is further supported by the sequence divergence results at the 72kb region, where the late Neanderthals exhibit intermediate distances to the introgressed haplotype (Figure 4B), suggesting that they harbor some of the Denisovan alleles.

Neanderthals introduce Denisovan-like introgression into non-African modern humans

Based on sequence divergence, the Chagyrskaya and Vindija Neanderthals carry a 742kb haplotype that is most similar to the Altai Neanderthal, except at the 72kb region. To understand why the Chagyrskaya and Vindija Neanderthals exhibit sequence divergences to the introgressed haplotype in MXL at the 72kb region in MUC19 that are intermediate between those of the Altai Denisovan and the Altai Neanderthal (see the α ellipse in Figure 4B), we computed the number of heterozygous sites for each archaic human. Because the Chagyrskaya and Vindija Neanderthals show intermediate sequence divergences, we expected these two individuals to be more heterozygous than the Altai Neanderthal. At the 72kb region in MUC19, the Chagyrskaya and Vindija Neanderthals carry an elevated number of heterozygous sites (Chagyrskaya: 168, P-value: 2.307e-4; Vindija: 171, P-value: 3.282e-4; Figure 5A; Figure S32; Table S43), higher than those of the Altai Neanderthal (1, P-value: 0.679; Figure 5A; Figure S32; Table S43) and the Altai Denisovan (6, P-value: 0.455; Figure 5A; Figure S32; Table S43). The Chagyrskaya and Vindija Neanderthals also carry more heterozygous sites than all African individuals (average: ~75, P-value: 0.424; Figure 5A; Figure S33; Table S44), and their pattern more closely resembles that of non-African individuals carrying exactly one Denisovan-like haplotype (average: ~287, P-value: 3.157e-4; yellow X's in Figure 5A; Figure S33; Table S44). This observation runs opposite to the genome-wide expectation for Neanderthals, as archaic humans have much lower heterozygosity than modern humans (genome-wide heterozygosity is ~0.00014 to ~0.00017 for the Neanderthals, ~0.00019 for the Denisovan, and ~0.001 for African modern humans; Figure S34; Table S45).

Figure 5. The high levels of heterozygosity in the Chagyrskaya and Vindija Neanderthals are explained by Denisovan-like ancestry at the 72kb region in MUC19.


(A) Number of heterozygous sites at the 72kb region in MUC19 per archaic individual (black X's), 1KG individuals without the introgressed haplotype (sky blue X's), 1KG individuals with exactly one copy of the introgressed haplotype (yellow X's), 1KG individuals with a recombinant introgressed haplotype (vermillion X's), and 1KG individuals with two copies of the introgressed haplotype (bluish green X's). The average number of heterozygous sites per population is denoted by grey diamonds, and the average number of heterozygous sites among individuals carrying exactly zero, one, and two introgressed haplotypes is denoted by sky blue, yellow, and bluish green circles, respectively, stratified by population. (B) Haplotype matrix of the 233 segregating sites (columns) among the focal MXL individual (NA19664) with two copies of the introgressed haplotype; the focal YRI individual (NA19190) without the introgressed haplotype; the Altai Denisovan; the Altai Neanderthal; and the two phased haplotypes of each of the Chagyrskaya and Vindija Neanderthals. Cells shaded blue denote the hg19 reference allele, cells shaded reddish purple denote the alternative allele, and cells shaded white represent sites that did not pass quality control in the given archaic individual. Note that the focal MXL and YRI individuals are homozygous at every position in the 72kb region in MUC19, and that the heterozygous sites of the Altai Denisovan and Altai Neanderthal (six and one, respectively) are omitted.

Within modern humans, we find that individuals carrying exactly one Denisovan-like haplotype at the 72kb region harbor significantly more heterozygous sites at MUC19 than in the rest of their genome (average number of heterozygous sites: ~287, P-value: 3.157e-4; Figure S33; Table S44), exceeding the number of heterozygous sites at MUC19 in any African individual (Figure 5A). Individuals carrying two Denisovan-like haplotypes harbor significantly fewer heterozygous sites at MUC19 than expected from the rest of their genome (average: ~4, P-value: 6.945e-4; Figure S33; Table S44), while African individuals harbor the expected number (average: ~75, P-value: 0.424; Figure S33; Table S44). Because both the Chagyrskaya and Vindija Neanderthals and the non-African individuals with one copy of the Denisovan-like haplotype exhibit an excess of heterozygous sites at the 72kb region, we hypothesized that the Chagyrskaya and Vindija Neanderthals also harbor one Denisovan-like haplotype. This arrangement would explain both their elevated number of heterozygous sites and their intermediate sequence divergences to the introgressed haplotype.
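The reasoning in this paragraph rests on a simple identity: a diploid individual is heterozygous exactly at the sites where its two haplotypes carry different alleles. The minimal sketch below uses hypothetical toy haplotypes (not data from this study) to show why pairing a Denisovan-like haplotype with a divergent non-introgressed haplotype inflates heterozygosity, while carrying two copies of the same haplotype yields none.

```python
from typing import Sequence


def heterozygous_sites(hap_a: Sequence[int], hap_b: Sequence[int]) -> int:
    """Count sites where a diploid's two haplotypes differ (0 = ref, 1 = alt)."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(a != b for a, b in zip(hap_a, hap_b))


# Hypothetical toy haplotypes over 10 segregating sites (1 = alternative allele).
denisovan_like = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
non_introgressed = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

one_copy = heterozygous_sites(denisovan_like, non_introgressed)
two_copies = heterozygous_sites(denisovan_like, denisovan_like)
```

Here `one_copy` is 6 and `two_copies` is 0: the heterozygosity of a one-copy carrier is bounded by the divergence between the two haplotypes, which is why such carriers stand out at MUC19.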

To test this hypothesis, we first performed additional D+ tests for gene flow between the archaic individuals within the 72kb MUC19 region, which provided evidence that the Chagyrskaya and Vindija Neanderthals harbor one copy of the Denisovan-like haplotype. With the Altai Neanderthal as P1, either the Chagyrskaya or the Vindija Neanderthal as P2, and the Altai Denisovan as P3, we observe significant and positive D+ values supporting gene flow between the Denisovan and the Chagyrskaya (D+: 0.783; P-value: 0.029) and Vindija (D+: 0.819; P-value: 0.018) Neanderthals (Figure S35; Table S46). To further investigate whether the Chagyrskaya and Vindija Neanderthals harbor one Denisovan-like haplotype in the 72kb region, we used BEAGLE to phase the 72kb region. Because phasing had not previously been performed for archaic humans, we tested the reliability of using the 1KG as a reference panel by constructing a synthetic 72kb region: at the heterozygous sites of either the Chagyrskaya or the Vindija Neanderthal, we sampled one allele from the Altai Neanderthal and one allele from the Altai Denisovan. We could phase this synthetic individual perfectly at this region (see Supplemental Sections S3–S4 in [51]). Encouraged by these results, we phased the Chagyrskaya and Vindija Neanderthals at the 72kb region and confirmed that each carries one haplotype similar to the Altai Neanderthal and one haplotype similar to the Denisovan-like haplotype in MXL. Relative to the Altai Neanderthal, the Chagyrskaya Neanderthal-like haplotype exhibits 3.5 differences and the Vindija Neanderthal-like haplotype exhibits 4 differences (Figure 5B; Table S47). Relative to the Altai Denisovan, the Chagyrskaya Denisovan-like haplotype exhibits 43 differences and the Vindija Denisovan-like haplotype exhibits 41 differences (Figure 5B; Table S47).
As expected, the phased Denisovan-like haplotype in these two Neanderthals is closest to the Denisovan-like haplotype in MXL: the Chagyrskaya haplotype exhibits 5 differences and the Vindija haplotype exhibits 4 differences (Figure 5B; Table S48). In the 72kb region, the introgressed haplotype in MXL is statistically significantly closer to the phased Denisovan-like haplotypes of the Chagyrskaya and Vindija Neanderthals than expected from the genomic background (sequence divergence from the Chagyrskaya Neanderthal haplotype: 0.000104, P-value: 0.003; from the Vindija Neanderthal haplotype: 0.000083, P-value: 0.002; Figure S36; Table S48; Dataset 3 [52]; [51]). Because phasing ancient DNA data may introduce biases, we also developed an approach called Pseudo-Ancestry Painting (PAP; see [51]), which assigns the two alleles at a heterozygous site to two source individuals, to investigate whether the Chagyrskaya and Vindija Neanderthals carry a Denisovan-like haplotype without relying on phasing. We found that using an MXL (NA19664) and a YRI (NA19190) individual as sources maximizes the number of heterozygous sites explained in the Chagyrskaya (PAP score: 0.94, P-value: 3.683e-4) and Vindija (PAP score: 0.929, P-value: 8.679e-05) Neanderthals (Figure S37; Table S49).
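As a rough illustration of the idea behind Pseudo-Ancestry Painting, the sketch below scores the fraction of a target's heterozygous sites that can be explained by drawing one allele from each of two source individuals, then searches all candidate source pairs for the highest score. This is our simplified reading, not the definition in [51]: we assume a site is painted only when both sources are homozygous and carry different alleles (the focal MXL and YRI individuals are homozygous throughout the 72kb region). The function names, source labels, and genotypes are hypothetical, and the genomic-background P-values are not reproduced.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Genotypes = Sequence[Optional[int]]  # alt-allele counts (0, 1, 2) or None if missing


def pap_score(target: Genotypes, source1: Genotypes, source2: Genotypes) -> float:
    """Fraction of target heterozygous sites painted by one allele per source.

    Simplifying assumption: a site is painted only when both sources are
    homozygous and carry different alleles.
    """
    het = painted = 0
    for t, s1, s2 in zip(target, source1, source2):
        if t != 1:
            continue  # only heterozygous target sites are scored
        het += 1
        if s1 in (0, 2) and s2 in (0, 2) and s1 != s2:
            painted += 1  # one source supplies ref, the other supplies alt
    return painted / het if het else 0.0


def best_source_pair(
    target: Genotypes, sources: Dict[str, Genotypes]
) -> Tuple[Tuple[str, str], float]:
    """Return the source pair with the highest PAP score and that score."""
    return max(
        (((a, b), pap_score(target, sources[a], sources[b]))
         for a, b in combinations(sorted(sources), 2)),
        key=lambda item: item[1],
    )


# Hypothetical toy data: sites 0, 1, 2, and 4 are heterozygous in the target.
target = [1, 1, 1, 0, 1, None]
sources = {
    "A": [2, 0, 2, 2, 2, 2],
    "B": [0, 2, 0, 2, 0, 0],
    "C": [2, 2, 2, 2, 2, 2],
}
pair, score = best_source_pair(target, sources)
```

In this toy case sources A and B paint every heterozygous site, mirroring the way a high-scoring MXL and YRI pair would indicate that the target carries one haplotype resembling each source.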

In sum, our analyses suggest that some non-Africans carry a mosaic region of archaic ancestry: a small Denisovan-like haplotype (72kb) embedded in a larger Neanderthal haplotype (742kb), inherited through Neanderthals who had themselves acquired Denisovan ancestry in an earlier introgression event (Figure S38). This is consistent with previous evidence of gene flow between Denisovans and Neanderthals [37, 38]. We therefore refer to the mosaic haplotype found in modern humans as the archaic haplotype.

Discussion

The study of adaptive archaic introgression has highlighted candidate genomic regions that affect the health and overall fitness of global populations. In this study, we identified several features of the gene MUC19 that make it an important candidate for studying adaptive introgression: one of the haplotypes spanning this gene in modern humans is of archaic origin; modern humans inherited this haplotype from Neanderthals, who in turn inherited it from Denisovans; the haplotype introduced nine missense mutations that are at high frequency in both Indigenous and Admixed American populations; and individuals with the archaic haplotype carry a massive expansion of a coding VNTR relative to the non-archaic haplotype. The functional differences between these haplotypes may help explain how mainland Indigenous Americans adapted to their environments, a question that remains under-explored. This study adds an example to the growing literature on natural selection acting on archaic alleles at coding sites, and possibly an example of natural selection acting on human VNTRs, a developing research frontier [see, e.g., 39].

A broader implication of our findings is that archaic ancestry could have been a useful source of standing genetic variation as early Indigenous American populations adapted to new environments, with genes like MUC19 and other mucins possibly mediating important fitness effects [40]. The variation in the MUC19 coding VNTR across global populations fits this idea and adds to a growing body of evidence for the important role of structural variants in human genomics and evolution [41, 42]. In American populations, particular haplotypes carrying the most extreme copy numbers were selected and are now relatively frequent. This VNTR expansion effectively doubles the functional domain of this mucin, suggesting an adaptive role driven by environmental pressures particular to the Americas. However, we cannot determine whether the non-synonymous variants or the VNTR is driving natural selection, as they are linked on the same haplotypes and our evidence for positive selection is based on SNP variation rather than on the VNTR itself.

Another interesting aspect of MUC19 is the evolutionary history of the introgressed region. Our observation of a 72kb Denisovan-like haplotype in Neanderthals and non-African modern humans, nested within a larger Neanderthal haplotype, suggests that the smaller Denisovan haplotype was first introgressed into Neanderthals, who later admixed with modern humans and introduced the full 742kb haplotype. While the Altai Neanderthal does not harbor the Denisovan-like haplotype at the 72kb region, the two chronologically younger Neanderthals (Chagyrskaya and Vindija) do. We phased these younger Neanderthals (see Supplementary Sections S3–S5 in [51]) and showed that each harbors exactly one Denisovan-like haplotype, which explains their excess heterozygosity. The Denisovan-like haplotype in the younger Neanderthals is also statistically significantly closer to the archaic haplotype present in MXL (Figure S36; Table S48), providing additional evidence that modern humans obtained this haplotype by interbreeding with Neanderthals. Although the introgressed archaic haplotype shares an excess of alleles with the Altai Denisovan at the 72kb region, the Altai Denisovan harbors several private mutations (14 in the homozygous and 6 in the heterozygous state) that are absent from all 287 Denisovan-like haplotypes in the 1KG, suggesting that the introgressing Denisovan population may not be closely related to the Altai Denisovan (see Supplemental Section S5 in [51]). Indeed, the introgressed haplotype in the 72kb region is present at low frequencies in other non-African populations, including Papuans, whose genome-wide Denisovan ancestry has been estimated to originate from a Denisovan population that was not closely related to the Altai Denisovan [33].
Finding two highly divergent haplotypes maintained as a polymorphism in two Neanderthal populations, and finding the archaic haplotype at high frequencies in American populations but not at fixation, may point to a balanced polymorphism [45]. More generally, this region has a complex history involving recurrent introgression and natural selection, paralleling complex introgression patterns in other regions of the genome [46–48].

Finally, we find a single San individual who carries the nine Denisovan missense variants in the heterozygous state, the only such individual among all African individuals considered here. The sequence divergence between this San haplotype and the archaic MXL haplotype at the 72kb region is high (0.001342), further supporting an introgressed origin of the archaic haplotype in non-Africans. Khoe-San populations are estimated to have diverged from other African groups 120 thousand years ago [43]. Finding a divergent haplotype in the San is consistent with a previous study [44] that attributed ~1% of their ancestry to lineages that diverged from the main human lineage more than 1 million years ago. We note that this San individual does not carry an expanded number of VNTR repeat copies (301 copies), which further supports the importance of the VNTR expansion in the Americas. However, we cannot determine whether this variant entered the San through recent admixture of non-African ancestry into Sub-Saharan populations.

Perhaps the largest knowledge gap concerning why the archaic haplotype of MUC19 would be under positive selection is its underlying function. Mucins are secreted glycoproteins responsible for the gel-like properties and viscosity of mucus [49]. Mucins are characterized by proline, threonine, and serine (PTS) tandem repeats, which in MUC19 are structured as 30bp tandem repeats. The massive difference in copy numbers of the 30bp PTS tandem repeat domains between individuals harboring the Human-like and archaic haplotypes strongly suggests that the two MUC19 variants differ in function as a consequence of different molecular binding affinities. This is the case for other mucins such as MUC7, where variants carrying different numbers of PTS repeats exhibit different microbe-binding properties [40]. If the two MUC19 variants also have different binding properties, this would help explain why positive selection increased the frequency of the archaic haplotype in American populations. Yet there is limited medical literature associating variation in MUC19 with human fitness. Further experimental validation of how the VNTR and the Denisovan-specific missense mutations affect MUC19 function is necessary to understand the effect the archaic haplotype may exert on the translated MUC19 protein and how it modifies its function during the formation of mucin polymers.

Methods developed in evolutionary biology can be useful for identifying candidate variants underlying biological functions. Future functional and evolutionary studies of the MUC19 region will provide insight not only into the specific mechanisms by which variation at this gene confers a selective advantage, but also into specific evolutionary events in human history. Beyond improving our understanding of how archaic variants facilitated adaptation to novel environments, our findings highlight the importance of studying archaic introgression in understudied populations, such as admixed populations from the Americas [50]. Genetic variation in American populations is less well characterized than in other global populations; following 500 years of European colonization, it is difficult to deconvolve Indigenous ancestries from European, African, and, to a lesser extent, South Asian ancestries [29]. This knowledge gap is exacerbated by the high cost of performing genomic studies, building infrastructure, and developing scientific capacity in Latin America. It is nonetheless a worthwhile investment, as our study shows that leveraging these populations can identify exciting candidate loci that expand our understanding of adaptation from archaic standing variation.

Supplementary Material

Supplement 1
media-1.pdf (6.5MB, pdf)
Supplement 2
media-2.pdf (507.1KB, pdf)

Acknowledgments:

We would like to thank Alyssa Funk for contributing to the development of the PBS analysis, Ratchanon Pornmongkolsuk for early visualizations of global frequencies of MUC19, and Diego Ortega del Vecchyo and Paolo Provero for their insightful comments and discussion. We would also like to thank the Crawford and Ramachandran laboratories, especially Ria Vinod, Julian Stamp, Chibuikem Nwizu, Cole Williams, and Leah Darwin for their invaluable feedback and support throughout the duration of this project. Part of this research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.

Funding:

The Leakey Foundation (to FAV).

National Institutes of Health (1R35GM128946-01 to EHS).

Alfred P. Sloan Foundation (to EHS).

Blavatnik Family Graduate Fellowship in Biology and Medicine (to DP).

Brown University Predoctoral Training Program in Biological Data Science (NIH T32 GM128596 to DP and ETC).

National Institutes of Health (R35GM142978 to PM).

Burroughs Wellcome Fund (Career Award at the Scientific Interface to PM).

National Institutes of Health (R01NS122766 to PNV).

Human Frontier Science Program (to EHS, FJ, and MAA).

Footnotes

Competing interests: Authors declare that they have no competing interests.

Data and materials availability:

The 1,000 Genomes Project Phase III, Simons Genome Diversity Project, high-coverage archaic genomes, Human Pangenome Reference Consortium, and Human Genome Structural Variant Consortium datasets are all publicly available. Ancient American genomes are available after signing data agreements from the original publications. All software used in this study is publicly available, and all statistical tests are described in the methods. All the information needed to reproduce the results in this study is described in the methods and supplemental methods. Additionally, the original code and final results can be found at: https://github.com/David-Peede/MUC19; intermediary files used to produce our final results can be found at: https://doi.org/10.5061/dryad.z612jm6pj; and the introgressed tracts, repeat information, phased late Neanderthal haplotypes, and Datasets S1–S4 can be found at: https://doi.org/10.5281/zenodo.15042423.

References and Notes

  • [1].Ahlquist KD, Banuelos Mayra M, Funk Alyssa, Lai Jiaying, Rong Stephen, Villanea Fernando A, and Witt Kelsey E. Our tangled family tree: new genomic methods offer insight into the legacy of archaic admixture. Genome biology and evolution, 13(7):evab115, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Browning Sharon R, Browning Brian L, Zhou Ying, Tucci Serena, and Akey Joshua M. Analysis of human sequence data reveals two pulses of archaic denisovan admixture. Cell, 173(1):53–61, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Villanea Fernando Aand Schraiber Joshua G. Multiple episodes of interbreeding between neanderthal and modern humans. Nature ecology & evolution, 3(1):39, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Yang Melinda A, Malaspinas Anna-Sapfo, Durand Eric Y, and Slatkin Montgomery. Ancient structure in Africa unlikely to explain neanderthal and non African genetic similarity. Molecular biology and evolution, 29(10):2987–2995, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Sankararaman Sriram, Mallick Swapan, Patterson Nick, and Reich David. The combined landscape of Denisovan and Neanderthal ancestry in present day humans. Current Biology, 26(9):1241–1247, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Petr Martin, Pääbo Svante, Kelso Janet, and Vernot Benjamin. Limits of long-term selection against Neandertal introgression. Proceedings of the National Academy of Sciences, 116(5):1639–1644, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Zhang Xinjun, Kim Bernard, Lohmueller Kirk E, and Sánchez Emilia Huerta. The impact of recessive deleterious variation on signals of adaptive introgression in human populations. Genetics, 215(3):799–812, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Racimo Fernando, Sankararaman Sriram, Nielsen Rasmus, and Huerta-Sánchez Emilia. Evidence for archaic adaptive introgression in humans. Nature Reviews Genetics, 16(6):359, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Zhang Xinjun, Kim Bernard, Singh Armaan, Sankararaman Sriram, Durvasula Arun, and Lohmueller Kirk E. Maladapt reveals novel targets of adaptive introgression from neanderthals and denisovans in worldwide human populations. Molecular Biology and Evolution, 40(1):msad001, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Fan Shaohua, Hansen Matthew EB, Lo Yancy, and Tishkoff Sarah A. Going global by adapting local: A review of recent human adaptation. Science, 354(6308):54–59, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Mendez Fernando L, Watkins Joseph C, and Hammer Michael F. A haplotype at STAT2 introgressed from Neanderthals and serves as a candidate of positive selection in papua new guinea. The American Journal of Human Genetics, 91(2):265–274, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Sankararaman Sriram, Mallick Swapan, Dannemann Michael, Kay Prüfer Janet Kelso, Pääbo Svante, Patterson Nick, and Reich David. The genomic landscape of Neanderthal ancestry in present-day humans. Nature, 507 (7492):354–357, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Vernot Benjamin and Akey Joshua M. Resurrecting surviving Neandertal lineages from modern human genomes. Science, 343(6174):1017–1021, 2014. [DOI] [PubMed] [Google Scholar]
  • [14].Gittelman Rachel M, Schraiber Joshua G, Vernot Benjamin, Mikacenic Carmen, Wurfel Mark M, and Akey Joshua M. Archaic hominin admixture facilitated adaptation to out-of-africa environments. Current Biology, 26(24):3375–3382, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Aaron J Sams Anne Dumaine, Nédélec Yohann, Yotova Vania, Alfieri Carolina, Tanner Jerome E, Messer Philipp W, and Barreiro Luis B. Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans. Genome biology, 17(1):1–15, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Dannemann Michael and Kelso Janet. The contribution of Neanderthals to phenotypic variation in modern humans. The American journal of human genetics, 101(4):578–589, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Marnetto Davide and Huerta-Sánchez Emilia. Haplostrips: revealing population structure through haplotype visualization. Methods in Ecology and Evolution, 8(10):1389–1392, 2017. [Google Scholar]
  • [18].Huerta-Sánchez Emilia, Jin Xin, Asan Zhuoma Bianba, Peter Benjamin M., Vinckenbosch Nicolas, Liang Yu, Yi Xin, He Mingze, Somel Mehmet, Ni Peixiang, Wang Bo, Ou Xiaohua, Huasang, Luosang Jiangbai, Cuo Zha Xi Ping, Li Kui, Gao Guoyi, Yin Ye, Wang Wei, Zhang Xiuqing, Xu Xun, Yang Huanming, Li Yingrui, Wang Jian, Wang Jun & Nielsen Rasmus. Altitude adaptation in Tibetans caused by introgression of denisovan-like DNA. Nature, 512(7513):194, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Racimo Fernando, Gokhman David, Fumagalli Matteo, Ko Amy, Hansen Tor ben, Moltke Ida, Albrechtsen Anders, Carmel Liran, Huerta-Sánchez Emilia, and Nielsen Rasmus. Archaic adaptive introgression in tbx15/wars2. Molecular biology and evolution, 34(3):509–524, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Zhang Xinjun, Witt Kelsey E, Bañuelos Mayra M, Ko Amy, Yuan Kai, Xu Shuhua, Nielsen Rasmus, and Huerta-Sánchez Emilia. The history and evolution of the denisovan-epas1 haplotype in tibetans. Proceedings of the National Academy of Sciences, 118(22):e2020803118, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Tamm Erika, Kivisild Toomas, Reidla Maere, Metspalu Mait, Smith David Glenn, Mulligan Connie J, Bravi Claudio M, Rickards Olga, Martinez-Labarga Cristina, Khusnutdinova Elsa K, Fedorova Sardana A.,Golubenko Maria V.,Stepanov Vadim A.,Gubina Marina A.,Zhadanov Sergey I.,Ossipova Ludmila P.,Damba Larisa,Voevoda Mikhail I.,Dipierri Jose E.,Villems Richard,Malhi Ripan S. Beringian standstill and spread of Native American founders. PloS one, 2(9):e829, 2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Beck Hylke E, Zimmermann Niklaus E, Tim R McVicar Noemi Vergopolan, Berg Alexis, and Wood Eric F. Present and future köppen-geiger climate classification maps at 1-km resolution. Scientific data, 5(1):1–12, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Witt Kelsey E, Funk Alyssa, Añorve-Garibay Valeria, Fang Lesly Lopez, and Huerta-Sánchez Emilia. The impact of modern admixture on archaic human ancestry in human populations. Genome Biology and Evolution, 15 (5):evad066, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Racimo Fernando, Marnetto Davide, and Huerta-Sánchez Emilia. Signatures of archaic adaptive introgression in present-day human populations. Molecular biology and evolution, 34(2):296–317, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Reynolds Austin W, Mata-Míguez Jaime, Miró-Herrans Aida, Briggs-Cloud Marcus, Sylestine Ana, Barajas-Olmos Francisco, Ortiz Humberto Garcia, Rzhetskaya Margarita, Orozco Lorena, Raff Jennifer A, Hayes Geoffrey, and Bolnick Deborah A. Comparing signals of natural selection between three indigenous North American populations. Proceedings of the National Academy of Sciences, page 201819467, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Skov Laurits, Hui Ruoyun, Shchur Vladimir, Hobolth Asger, Scally Aylwyn, Schierup Mikkel Heide, and Durbin Richard. Detecting archaic introgression using an unadmixed outgroup. PLoS Genetics, 14(9):e1007641, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Martin Alicia R, Gignoux Christopher R, Walters Raymond K, Wojcik Genevieve L, Neale Benjamin M, Gravel Simon, Daly Mark J, Bustamante Carlos D, and Kenny Eimear E. Human demographic history impacts genetic risk prediction across diverse populations. The American Journal of Human Genetics, 100(4):635–649, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Medina-Munoz Santiago G, Vecchyo Diego Ortega-Del, Hervert Luis Pablo Cruz, Ferreyra-Reyes Leticia, Garcia-Garcia Lourdes, Estrada Andres Moreno, and Ragsdale Aaron. “Demographic modeling of admixed Latin American populations from whole genomes.” The American Journal of Human Genetics, 110(10), 1804–1816, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Szpiech Zachary A. selscan 2.0: scanning for sweeps in unphased data. Bioinformatics. 40(1):btae006, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Grantham Richard. Amino acid difference formula to help explain protein evolution. science, 185(4154):862–864, 1974. [DOI] [PubMed] [Google Scholar]
  • [31].Li Wen-Hsiung, Wu Chung-I, and Luo Chi-Cheng. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Molecular Biology and Evolution, 2(2):150–174, 1985.
  • [32].Pollard Katherine S, Hubisz Melissa J, Rosenbloom Kate R, and Siepel Adam. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research, 20(1):110–121, 2010.
  • [33].Javitt Gabriel, Khmelnitsky Lev, Albert Lis, Bigman Lavi Shlomo, Elad Nadav, Morgenstern David, Ilani Tal, Levy Yaakov, Diskin Ron, and Fass Deborah. Assembly mechanism of mucin and von Willebrand factor polymers. Cell, 183(3):717–729, 2020.
  • [34].Lopez-Fang Lesly, Peede David, Ortega-Del Vecchyo Diego, McTavish Emily Jane, and Huerta-Sánchez Emilia. Leveraging shared ancestral variation to detect local introgression. PLoS Genetics, 20(1):e1010155, 2024.
  • [35].Peede David, Ortega-Del Vecchyo Diego, and Huerta-Sánchez Emilia. The utility of ancestral and derived allele sharing for genome-wide inferences of introgression. bioRxiv, 2022. doi: 10.1101/2022.12.02.518851.
  • [36].Ongaro Linda and Huerta-Sánchez Emilia. A history of multiple Denisovan introgression events in modern humans. Nature Genetics, 5:1-1, 2024.
  • [37].Slon Viviane, Mafessoni Fabrizio, Vernot Benjamin, de Filippo Cesare, Grote Steffi, Viola Bence, Hajdinjak Mateja, Peyrégne Stéphane, Nagel Sarah, Brown Samantha, Douka Katerina, Higham Tom, Kozlikin Maxim B., Shunkov Michael V., Derevianko Anatoly P., Kelso Janet, Meyer Matthias, Prüfer Kay, and Pääbo Svante. The genome of the offspring of a Neanderthal mother and a Denisovan father. Nature, 561(7721):113–116, 2018.
  • [38].Peter Benjamin M. 100,000 years of gene flow between Neandertals and Denisovans in the Altai mountains. bioRxiv, 2020. doi: 10.1101/2020.03.13.990523.
  • [39].Plender E.G., Prodanov T., Hsieh P., Nizamis E., Harvey W.T., Sulovari A., Munson K.M., Kaufman E.J., O’Neal W.K., Valdmanis P.N., and Marschall T. Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B. The American Journal of Human Genetics, 111(8):1700–1716, 2024.
  • [40].Xu Duo, Pavlidis Pavlos, Taskent Recep Ozgur, Alachiotis Nikolaos, Flanagan Colin, DeGiorgio Michael, Blekhman Ran, Ruhl Stefan, and Gokcumen Omer. Archaic hominin introgression in Africa contributes to functional salivary MUC7 genetic variation. Molecular Biology and Evolution, 34(10):2704–2715, 2017.
  • [41].Yan S. M., Sherman R. M., Taylor D. J., Nair D. R., Bortvin A. N., Schatz M. C., and McCoy R. C. Local adaptation and archaic introgression shape global diversity at human structural variant loci. eLife, 10:e67615, 2021.
  • [42].Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science, 372(6537):eabf7117, 2021.
  • [43].Ragsdale A. P., Weaver T. D., Atkinson E. G., Hoal E. G., Möller M., Henn B. M., and Gravel S. A weakly structured stem for human origins in Africa. Nature, 617(7962):755–763, 2023.
  • [44].Wang K., Mathieson I., O’Connell J., and Schiffels S. Tracking human population structure through time from whole genome sequences. PLoS Genetics, 16(3):e1008552, 2020.
  • [45].Viscardi Lucas Henriques, Paixao-Cortes Vanessa Rodrigues, Comas David, Salzano Francisco Mauro, Rovaris Diego, Bau Claiton Dotto, Amorim Carlos Eduardo G, and Bortolini Maria Catira. Searching for ancient balanced polymorphisms shared between Neanderthals and modern humans. Genetics and Molecular Biology, 41:67–81, 2018.
  • [46].Posth Cosimo, Wißing Christoph, Kitagawa Keiko, Pagani Luca, van Holstein Laura, Racimo Fernando, Wehrberger Kurt, Conard Nicholas J., Kind Claus Joachim, Bocherens Hervé, and Krause Johannes. Deeply divergent archaic mitochondrial genome provides lower time boundary for African gene flow into Neanderthals. Nature Communications, 8(1):16046, 2017.
  • [47].Petr Martin, Hajdinjak Mateja, Fu Qiaomei, Essel Elena, Rougier Hélène, Crevecoeur Isabelle, Semal Patrick, Golovanova Liubov V., Doronichev Vladimir B., Lalueza-Fox Carles, de la Rasilla Marco, Rosas Antonio, Shunkov Michael V., Kozlikin Maxim B., Derevianko Anatoli P., Vernot Benjamin, Meyer Matthias, and Kelso Janet. The evolutionary history of Neanderthal and Denisovan Y chromosomes. Science, 369(6511):1653–1656, 2020.
  • [48].Peyrégne Stéphane, Kelso Janet, Peter Benjamin M, and Pääbo Svante. The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals. eLife, 11:e75464, 2022.
  • [49].Pajic Petar, Shen Shichen, Qu Jun, May Alison J, Knox Sarah, Ruhl Stefan, and Gokcumen Omer. A mechanism of gene evolution generating mucin function. Science Advances, 8(34):eabm8757, 2022.
  • [50].Villanea Fernando A and Witt Kelsey E. Underrepresented populations at the archaic introgression frontier. Frontiers in Genetics, 13:821170, 2022.
  • [51].Materials and methods are available as supplementary materials.
  • [52].Villanea Fernando A., Peede David, Kaufman Eli J., Añorve-Garibay Valeria, Chevy Elizabeth T., Villa-Islas Viridiana, Witt Kelsey E., Zeloni Roberta, Marnetto Davide, Moorjani Priya, Jay Flora, Valdmanis Paul N., Ávila-Arcos María C., and Huerta-Sánchez Emilia. Data for: The MUC19 gene in Denisovans, Neanderthals, and Modern Humans: Recurrent Introgression and Natural Selection. Zenodo (2025); doi: 10.5281/zenodo.15042423.
  • [53].Villanea Fernando A., Peede David, Kaufman Eli J., Añorve-Garibay Valeria, Chevy Elizabeth T., Villa-Islas Viridiana, Witt Kelsey E., Zeloni Roberta, Marnetto Davide, Moorjani Priya, Jay Flora, Valdmanis Paul N., Ávila-Arcos María C., and Huerta-Sánchez Emilia. Data for: The MUC19 gene in Denisovans, Neanderthals, and Modern Humans: Recurrent Introgression and Natural Selection. DRYAD (2025); doi: 10.5061/dryad.z612jm6pj.
  • [54].1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015.
  • [55].Mallick Swapan, Li Heng, Lipson Mark, Mathieson Iain, Gymrek Melissa, Racimo Fernando, Zhao Mengyao, Chennagiri Niru, Nordenfelt Susanne, Tandon Arti, Skoglund Pontus, Lazaridis Iosif, Sankararaman Sriram, Fu Qiaomei, Rohland Nadin, Renaud Gabriel, Erlich Yaniv, Willems Thomas, Gallo Carla, Spence Jeffrey P., Song Yun S., Poletti Giovanni, Balloux Francois, van Driem George, de Knijff Peter, Gallego Romero Irene, Jha Aashish R., Behar Doron M., Bravi Claudio M., Capelli Cristian, Hervig Tor, Moreno-Estrada Andres, Posukh Olga L., Balanovska Elena, Balanovsky Oleg, Karachanak-Yankova Sena, Sahakyan Hovhannes, Toncheva Draga, Yepiskoposyan Levon, Tyler-Smith Chris, Xue Yali, Abdullah M. Syafiq, Ruiz-Linares Andres, Beall Cynthia M., Di Rienzo Anna, Jeong Choongwon, Starikovskaya Elena B., Metspalu Ene, Parik Jüri, Villems Richard, Henn Brenna M., Hodoglugil Ugur, Mahley Robert, Sajantila Antti, Stamatoyannopoulos George, Wee Joseph T. S., Khusainova Rita, Khusnutdinova Elza, Litvinov Sergey, Ayodo George, Comas David, Hammer Michael F., Kivisild Toomas, Klitz William, Winkler Cheryl A., Labuda Damian, Bamshad Michael, Jorde Lynn B., Tishkoff Sarah A., Watkins W. Scott, Metspalu Mait, Dryomov Stanislav, Sukernik Rem, Singh Lalji, Thangaraj Kumarasamy, Pääbo Svante, Kelso Janet, Patterson Nick, and Reich David. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 538(7624):201–206, 2016.
  • [56].Wong Karen H. Y., Ma Walfred, Wei Chun-Yu, Yeh Erh-Chan, Lin Wan-Jia, Wang Elin H. F., Su Jen-Ping, Hsieh Feng-Jen, Kao Hsiao-Jung, Chen Hsiao-Huei, Chow Stephen K., Young Eleanor, Chu Catherine, Poon Annie, Yang Chi-Fan, Lin Dar-Shong, Hu Yu-Feng, Wu Jer-Yuarn, Lee Ni-Chung, Hwu Wuh-Liang, Boffelli Dario, Martin David, Xiao Ming, and Kwok Pui-Yan. Towards a reference genome that captures global genetic diversity. Nature Communications, 11(1):5482, 2020.
  • [57].Paten Benedict, Herrero Javier, Beal Kathryn, Fitzgerald Stephen, and Birney Ewan. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Research, 18(11):1814–1828, 2008.
  • [58].Paten Benedict, Herrero Javier, Fitzgerald Stephen, Beal Kathryn, Flicek Paul, Holmes Ian, and Birney Ewan. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Research, 18(11):1829–1843, 2008.
  • [59].Liao Wen-Wei, Asri Mobin, Ebler Jana, Doerr Daniel, Haukness Marina, Hickey Glenn, Lu Shuangjia, Lucas Julian K., Monlong Jean, Abel Haley J., Buonaiuto Silvia, Chang Xian H., Cheng Haoyu, Chu Justin, Colonna Vincenza, Eizenga Jordan M., Feng Xiaowen, Fischer Christian, Fulton Robert S., Garg Shilpa, Groza Cristian, Guarracino Andrea, Harvey William T., Heumos Simon, Howe Kerstin, Jain Miten, Lu Tsung-Yu, Markello Charles, Martin Fergal J., Mitchell Matthew W., Munson Katherine M., Mwaniki Moses Njagi, Novak Adam M., Olsen Hugh E., Pesout Trevor, Porubsky David, Prins Pjotr, Sibbesen Jonas A., Sirén Jouni, Tomlinson Chad, Villani Flavia, Vollger Mitchell R., Antonacci-Fulton Lucinda L., Baid Gunjan, Baker Carl A., Belyaeva Anastasiya, Billis Konstantinos, Carroll Andrew, Chang Pi-Chuan, Cody Sarah, Cook Daniel E., Cook-Deegan Robert M., Cornejo Omar E., Diekhans Mark, Ebert Peter, Fairley Susan, Fedrigo Olivier, Felsenfeld Adam L., Formenti Giulio, Frankish Adam, Gao Yan, Garrison Nanibaa’ A., Garcia Giron Carlos, Green Richard E., Haggerty Leanne, Hoekzema Kendra, Hourlier Thibaut, Ji Hanlee P., Kenny Eimear E., Koenig Barbara A., Kolesnikov Alexey, Korbel Jan O., Kordosky Jennifer, Koren Sergey, Lee HoJoon, Lewis Alexandra P., Magalhães Hugo, Marco-Sola Santiago, Marijon Pierre, McCartney Ann, McDaniel Jennifer, Mountcastle Jacquelyn, Nattestad Maria, Nurk Sergey, Olson Nathan D., Popejoy Alice B., Puiu Daniela, Rautiainen Mikko, Regier Allison A., Rhie Arang, Sacco Samuel, Sanders Ashley D., Schneider Valerie A., Schultz Baergen I., Shafin Kishwar, Smith Michael W., Sofia Heidi J., Abou Tayoun Ahmad N., Thibaud-Nissen Françoise, Tricomi Francesca Floriana, Wagner Justin, Walenz Brian, Wood Jonathan M. D., Zimin Aleksey V., Bourque Guillaume, Chaisson Mark J. P., Flicek Paul, Phillippy Adam M., Zook Justin M., Eichler Evan E., Haussler David, Wang Ting, Jarvis Erich D., Miga Karen H., Garrison Erik, Marschall Tobias, Hall Ira M., Li Heng, and Paten Benedict. A draft human pangenome reference. Nature, 617(7960):312–324, 2023.
  • [60].Gustafson Jonas A., Gibson Sophia B., Damaraju Nikhita, Zalusky Miranda P.G., Hoekzema Kendra, Twesigomwe David, Yang Lei, Snead Anthony A., Richmond Phillip A., De Coster Wouter, Olson Nathan D., Guarracino Andrea, Li Qiuhui, Miller Angela L., Goffena Joy, Anderson Zachary B., Storz Sophie H.R., Ward Sydney A., Sinha Maisha, Gonzaga-Jauregui Claudia, Clarke Wayne E., Basile Anna O., Corvelo André, Reeves Catherine, Helland Adrienne, Musunuri Rajeeva Lochan, Revsine Mahler, Patterson Karynne E., Paschal Cate R., Zakarian Christina, Goodwin Sara, Jensen Tanner D., Robb Esther, The Genomes ONT Sequencing Consortium, University of Washington Center for Rare Disease Research (UW-CRDR), Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium, McCombie William Richard, Sedlazeck Fritz J., Zook Justin M., Montgomery Stephen B., Garrison Erik, Kolmogorov Mikhail, Schatz Michael C., McLaughlin Richard N. Jr., Dashnow Harriet, Zody Michael C., Loose Matt, Jain Miten, Eichler Evan E., and Miller Danny E. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Research, 34(11):2061–2073, 2024.
  • [61].Herrero Javier, Muffato Matthieu, Beal Kathryn, Fitzgerald Stephen, Gordon Leo, Pignatelli Miguel, Vilella Albert J., Searle Stephen M. J., Amode Ridwan, Brent Simon, Spooner William, Kulesha Eugene, Yates Andrew, and Flicek Paul. Ensembl comparative genomics resources. Database, 2016:bav096, 2016.
  • [62].Prüfer K, De Filippo C, Grote S, Mafessoni F, Korlević P, Hajdinjak M, Vernot B, Skov L, Hsieh P, Peyrégne S, Reher D. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science, 358(6363):655–658, 2017.
  • [63].Mafessoni F, Grote S, De Filippo C, Slon V, Kolobova KA, Viola B, Markin SV, Chintalapati M, Peyrégne S, Skov L, Skoglund P. A high-coverage Neandertal genome from Chagyrskaya Cave. Proceedings of the National Academy of Sciences, 117(26):15132–15136, 2020.
  • [64].Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research, 35(suppl_1):D61–D65, 2007.
  • [65].Cingolani Pablo, Platts Adrian, Wang Le Lily, Coon Melissa, Nguyen Tung, Wang Luan, Land Susan J, Lu Xiangyi, and Ruden Douglas M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2):80–92, 2012.
  • [66].Scheib CL, Li Hongjie, Desai Tariq, Link Vivian, Kendall Christopher, Dewar Genevieve, Griffith Peter William, Mörseburg Alexander, Johnson John R, Potter Amiee, Kerr Susan L., Endicott Phillip, Lindo John, Haber Marc, Xue Yali, Tyler-Smith Chris, Sandhu Manjinder S., Lorenz Joseph G., Randall Tori D., Faltyskova Zuzana, Pagani Luca, Danecek Petr, O’Connell Tamsin C., Martz Patricia, Boraas Alan S., Byrd Brian F., Leventhal Alan, Cambra Rosemary, Williamson Ronald, Lesage Louis, Holguin Brian, Ygnacio-De Soto Ernestine, Rosas JohnTommy, Metspalu Mait, Stock Jay T., Manica Andrea, Scally Aylwyn, Wegmann Daniel, Malhi Ripan S., and Kivisild Toomas. Ancient human parallel lineages within North America contributed to a coastal expansion. Science, 360(6392):1024–1027, 2018.
  • [67].Lindo John, Haas Randall, Hofman Courtney, Apata Mario, Moraga Mauricio, Verdugo Ricardo A, Watson James T, Viviano Llave Carlos, Witonsky David, Beall Cynthia, Warinner Christina, Novembre John, Aldenderfer Mark, and Di Rienzo Anna. The genetic prehistory of the Andean highlands 7000 years BP through European contact. Science Advances, 4(11):eaau4921, 2018.
  • [68].De la Fuente Constanza, Ávila-Arcos María C, Galimany Jacqueline, Carpenter Meredith L, Homburger Julian R, Blanco Alejandro, Contreras Paloma, Cruz Dávalos Diana, Reyes Omar, San Roman Manuel, Moreno-Estrada Andrés, Campos Paula F., Eng Celeste, Huntsman Scott, Burchard Esteban G., Malaspinas Anna-Sapfo, Bustamante Carlos D., Willerslev Eske, Llop Elena, Verdugo Ricardo A., and Moraga Mauricio. Genomic insights into the origin and diversification of late maritime hunter-gatherers from the Chilean Patagonia. Proceedings of the National Academy of Sciences, 115(17):E4006–E4012, 2018.
  • [69].Moreno-Mayar J. Víctor, Potter Ben A., Vinner Lasse, Steinrücken Matthias, Rasmussen Simon, Terhorst Jonathan, Kamm John A., Albrechtsen Anders, Malaspinas Anna-Sapfo, Sikora Martin, Reuther Joshua D., Irish Joel D., Malhi Ripan S., Orlando Ludovic, Song Yun S., Nielsen Rasmus, Meltzer David J., and Willerslev Eske. Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature, 553(7687):203–207, 2018.
  • [70].Rasmussen Morten, Anzick Sarah L., Waters Michael R., Skoglund Pontus, DeGiorgio Michael, Stafford Thomas W. Jr, Rasmussen Simon, Moltke Ida, Albrechtsen Anders, Doyle Shane M., Poznik G. David, Gudmundsdottir Valborg, Yadav Rachita, Malaspinas Anna-Sapfo, White V Samuel Stockton, Allentoft Morten E., Cornejo Omar E., Tambets Kristiina, Eriksson Anders, Heintzman Peter D., Karmin Monika, Korneliussen Thorfinn Sand, Meltzer David J., Pierre Tracey L., Stenderup Jesper, Saag Lauri, Warmuth Vera M., Lopes Margarida C., Malhi Ripan S., Brunak Søren, Sicheritz-Ponten Thomas, Barnes Ian, Collins Matthew, Orlando Ludovic, Balloux Francois, Manica Andrea, Gupta Ramneek, Metspalu Mait, Bustamante Carlos D., Jakobsson Mattias, Nielsen Rasmus, and Willerslev Eske. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature, 506(7487):225–229, 2014.
  • [71].Villa-Islas Viridiana, Izarraras-Gomez Alan, Larena Maximilian, Mejía Perez Campos Elizabeth, Sandoval-Velasco Marcela, Rodríguez Juan Esteban Rodríguez, Bravo-Lopez Miriam, Moguel Barbara, Fregel Rosa, Garfias-Morales Ernesto, Medina Tretmanis Jazeps, Velázquez-Ramírez David Alberto, Herrera-Muñóz Alberto, Sandoval Karla, Nieves-Colón Maria A., Moreno Gabriela Zepeda García, Villanea Fernando A., Medina Eugenia Fernández Villanueva, Aguayo-Haro Ramiro, Valdiosera Cristina, Ioannidis Alexander G., Moreno-Estrada Andrés, Jay Flora, Huerta-Sanchez Emilia, Moreno-Mayar J. Víctor, Sánchez-Quinto Federico, and Ávila-Arcos María C. Demographic history and genetic structure in pre-Hispanic Central Mexico. Science, 380(6645):eadd6142, 2023.
  • [72].Li Heng and Durbin Richard. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14):1754–1760, 2009.
  • [73].Li Heng, Handsaker Bob, Wysoker Alec, Fennell Tim, Ruan Jue, Homer Nils, Marth Gabor, Abecasis Goncalo, and Durbin Richard. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, 2009.
  • [74].Coll Macià M, Skov L, Peter BM, Schierup MH. Different historical generation intervals in human populations inferred from Neanderthal fragment lengths and mutation signatures. Nature Communications, 12(1):5317, 2021.
  • [75].Seabold S. and Perktold J. Statsmodels: econometric and statistical modeling with Python. SciPy, 7(1), 2010.
  • [76].Witt KE, Villanea F, Loughran E, Zhang X, Huerta-Sanchez E. Apportioning archaic variants among modern populations. Philosophical Transactions of the Royal Society B, 377(1852):20200411, 2023.
  • [77].Yi Xin, Liang Yu, Huerta-Sanchez Emilia, Jin Xin, Ping Cuo Zha Xi, Pool John E., Xu Xun, Jiang Hui, Vinckenbosch Nicolas, Korneliussen Thorfinn Sand, Zheng Hancheng, Liu Tao, He Weiming, Li Kui, Luo Ruibang, Nie Xifang, Wu Honglong, Zhao Meiru, Cao Hongzhi, Zou Jing, Shan Ying, Li Shuzheng, Yang Qi, Asan, Ni Peixiang, Tian Geng, Xu Junming, Liu Xiao, Jiang Tao, Wu Renhua, Zhou Guangyu, Tang Meifang, Qin Junjie, Wang Tong, Feng Shuijian, Li Guohong, Huasang, Luosang Jiangbai, Wang Wei, Chen Fang, Wang Yading, Zheng Xiaoguang, Li Zhuo, Bianba Zhuoma, Yang Ge, Wang Xinping, Tang Shuhui, Gao Guoyi, Chen Yong, Luo Zhen, Gusang Lamu, Cao Zheng, Zhang Qinghui, Ouyang Weihan, Ren Xiaoli, Liang Huiqing, Zheng Huisong, Huang Yebo, Li Jingxiang, Bolund Lars, Kristiansen Karsten, Li Yingrui, Zhang Yong, Zhang Xiuqing, Li Ruiqiang, Li Songgang, Yang Huanming, Nielsen Rasmus, Wang Jun, and Wang Jian. Sequencing of 50 human exomes reveals adaptation to high altitude. Science, 329(5987):75–78, 2010.
  • [78].Bhatia Gaurav, Patterson Nick, Sankararaman Sriram, and Price Alkes L. Estimating and interpreting FST: the impact of rare variants. Genome Research, 23(9):1514–1521, 2013.
  • [79].Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biology, 4(3):e72, 2006.
  • [80].International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature, 449(7164):851–861, 2007.
  • [81].Virtanen Pauli, Gommers Ralf, Oliphant Travis E, Haberland Matt, Reddy Tyler, Cournapeau David, Burovski Evgeni, Peterson Pearu, Weckesser Warren, Bright Jonathan, van der Walt Stéfan J., Brett Matthew, Wilson Joshua, Millman K. Jarrod, Mayorov Nikolay, Nelson Andrew R. J., Jones Eric, Kern Robert, Larson Eric, Carey C J, Polat İlhan, Feng Yu, Moore Eric W., VanderPlas Jake, Laxalde Denis, Perktold Josef, Cimrman Robert, Henriksen Ian, Quintero E. A., Harris Charles R., Archibald Anne M., Ribeiro Antônio H., Pedregosa Fabian, van Mulbregt Paul, and SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020.
  • [82].Byrska-Bishop Marta, Evani Uday S., Zhao Xuefang, Basile Anna O., Abel Haley J., Regier Allison A., Corvelo André, Clarke Wayne E., Musunuri Rajeeva, Nagulapalli Kshithija, Fairley Susan, Runnels Alexi, Winterkorn Ernesto Lowy, Human Genome Structural Variation Consortium, Flicek Paul, Germer Soren, Brand Harrison, Hall Ira M., Talkowski Michael E., Narzisi Giuseppe, and Zody Michael C. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell, 185(18):3426–3440, 2022.
  • [83].Course Meredith M, Sulovari Arvis, Gudsnuk Kathryn, Eichler Evan E, and Valdmanis Paul N. Characterizing nucleotide variation and expansion dynamics in human-specific variable number tandem repeats. Genome Research, 31(8):1313–1324, 2021.
  • [84].Quinlan A. R. and Hall I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6):841–842, 2010.
  • [85].Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2):573–580, 1999.
  • [86].Hubisz Melissa and Siepel Adam. Inference of ancestral recombination graphs using ARGweaver. Statistical Population Genomics, 231–266, 2020.
  • [87].Haller Benjamin C and Messer Philipp W. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Molecular Biology and Evolution, 36(3):632–637, 2019.
  • [88].Harrow Jennifer, Frankish Adam, Gonzalez Jose M, Tapanari Electra, Diekhans Mark, Kokocinski Felix, Aken Bronwen L, Barrell Daniel, Zadissa Amonida, Searle Stephen, Barnes If, Bignell Alexandra, Boychenko Veronika, Hunt Toby, Kay Mike, Mukherjee Gaurab, Rajan Jeena, Despacio-Reyes Gloria, Saunders Gary, Steward Charles, Harte Rachel, Lin Michael, Howald Cédric, Tanzer Andrea, Derrien Thomas, Chrast Jacqueline, Walters Nathalie, Balasubramanian Suganthi, Pei Baikang, Tress Michael, Rodriguez Jose Manuel, Ezkurdia Iakes, van Baren Jeltje, Brent Michael, Haussler David, Kellis Manolis, Valencia Alfonso, Reymond Alexandre, Gerstein Mark, Guigó Roderic, and Hubbard Tim J. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Research, 22(9):1760–1774, 2012.
  • [89].Gutenkunst Ryan N, Hernandez Ryan D, Williamson Scott H, and Bustamante Carlos D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics, 5(10):e1000695, 2009.
  • [90].Green Richard E., Krause Johannes, Briggs Adrian W., Maricic Tomislav, Stenzel Udo, Kircher Martin, Patterson Nick, Li Heng, Zhai Weiwei, Hsi-Yang Fritz Markus, Hansen Nancy F., Durand Eric Y., Malaspinas Anna-Sapfo, Jensen Jeffrey D., Marques-Bonet Tomas, Alkan Can, Prüfer Kay, Meyer Matthias, Burbano Hernán A., Good Jeffrey M., Schultz Rigo, Aximu-Petri Ayinuer, Butthof Anne, Höber Barbara, Höffner Barbara, Siegemund Madlen, Weihmann Antje, Nusbaum Chad, Lander Eric S., Russ Carsten, Novod Nathaniel, Affourtit Jason, Egholm Michael, Verna Christine, Rudan Pavao, Brajkovic Dejana, Kucan Željko, Gušic Ivan, Doronichev Vladimir B., Golovanova Liubov V., Lalueza-Fox Carles, de la Rasilla Marco, Fortea Javier, Rosas Antonio, Schmitz Ralf W., Johnson Philip L. F., Eichler Evan E., Falush Daniel, Birney Ewan, Mullikin James C., Slatkin Montgomery, Nielsen Rasmus, Kelso Janet, Lachmann Michael, Reich David, and Pääbo Svante. A draft sequence of the Neandertal genome. Science, 328(5979):710–722, 2010.
  • [91].Gravel Simon, Henn Brenna M, Gutenkunst Ryan N, Indap Amit R, Marth Gabor T, Clark Andrew G, Yu Fuli, Gibbs Richard A, 1000 Genomes Project, Bustamante Carlos D, et al. Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences, 108(29):11983–11988, 2011.
  • [92].Reich David, Green Richard E, Kircher Martin, Krause Johannes, Patterson Nick, Durand Eric Y, Viola Bence, Briggs Adrian W, Stenzel Udo, Johnson Philip LF, Maricic Tomislav, Good Jeffrey M., Marques-Bonet Tomas, Alkan Can, Fu Qiaomei, Mallick Swapan, Li Heng, Meyer Matthias, Eichler Evan E., Stoneking Mark, Richards Michael, Talamo Sahra, Shunkov Michael V., Derevianko Anatoli P., Hublin Jean-Jacques, Kelso Janet, Slatkin Montgomery, and Pääbo Svante. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468(7327):1053–1060, 2010.
  • [93].Kim Bernard Y, Huber Christian D, and Lohmueller Kirk E. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. Genetics, 206(1):345–361, 2017.
  • [94].Miles Alistair and Harding NJ. scikit-allel: a Python package for exploring and analysing genetic variation data, 2016.
  • [95].Jacobs Guy S, Hudjashov Georgi, Saag Lauri, Kusuma Pradiptajati, Darusallam Chelzie C, Lawson Daniel J, Mondal Mayukh, Pagani Luca, Ricaut François-Xavier, Stoneking Mark, Metspalu Mait, Sudoyo Herawati, Lansing J. Stephen, and Cox Murray P. Multiple deeply divergent Denisovan ancestries in Papuans. Cell, 2019.
  • [96].Skoglund Pontus and Jakobsson Matthias. Archaic human ancestry in East Asia. Proceedings of the National Academy of Sciences, 108(45):18301–18306, 2011.
  • [97].Hamid I., Korunes K. L., Beleza S., and Goldberg A. Rapid adaptation to malaria facilitated by admixture in the human population of Cabo Verde. eLife, 10:e63177, 2021.
  • [98].Robinson James T., Thorvaldsdóttir Helga, Winckler Wendy, Guttman Mitchell, Lander Eric S., Getz Gad, and Mesirov Jill P. Integrative genomics viewer. Nature Biotechnology, 29(1):24–26, 2011.
  • [99].Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, De Filippo C, Li H. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481):43–49, 2014.

Associated Data


Supplementary Materials

Supplement 1
media-1.pdf (6.5MB, pdf)
Supplement 2
media-2.pdf (507.1KB, pdf)

Data Availability Statement

The 1000 Genomes Project Phase III, Simons Genome Diversity Project, high-coverage archaic genomes, Human Pangenome Reference Consortium, and Human Genome Structural Variation Consortium datasets are all publicly available. Ancient American genomes are available from the original publications after signing the corresponding data-use agreements. All software used in this study is publicly available, and all statistical tests are described in the methods. All information needed to reproduce the results in this study is described in the methods and supplementary methods. Additionally, the original code and final results can be found at https://github.com/David-Peede/MUC19; intermediary files used to produce our final results can be found at https://doi.org/10.5061/dryad.z612jm6pj; and the introgressed tracts, repeat information, phased late Neanderthal haplotypes, and Datasets S1–S4 can be found at https://doi.org/10.5281/zenodo.15042423.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
