Abstract
Saliva houses over 2000 proteins and peptides with poorly clarified functions, including proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases. Their genes are poorly conserved across related species, reflecting an evolutionary adaptation. We searched for nucleotide substitutions fixed in the gene loci of these salivary proteins in modern humans compared with ancient hominins. We mapped 3472 sequence variants/nucleotide substitutions in coding, noncoding, and 5′ and 3′ untranslated regions. Although most of the detected variations were within noncoding regions, the frequency of coding variations was far higher than the general rate found throughout the genome. Among the various missense substitutions, specific substitutions detected in the PRB1 and PRB2 genes were responsible for the introduction/abrogation of consensus sequences recognized by the convertase enzymes that cleave the protein precursors. Overall, these changes, which occurred during recent human evolution, might have generated novel functional features and/or different expression ratios among the various components of the salivary proteome. This may have influenced the homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. However, fixed nucleotide changes in modern humans represented only 7.3% of all the substitutions reported in this study, and no signs of evolutionary pressure or adaptive introgression from archaic hominins were found in the tested genes.
Keywords: salivary proteins, nucleotide substitutions, evolution
1. Introduction
Saliva is a multifaceted bodily fluid that contains enzymes (amylases, lysozymes, and lipases), proteins, peptides and glycoproteins, lipids (hormones such as testosterone and progesterone), and proteases, along with a high concentration of inorganic ions [1]. To date, more than 2000 proteins and peptides have been identified in saliva [2]. They are mainly involved in the homeostasis of the oral cavity, the digestion process, and the innate immune response [3]. Ninety percent of the salivary proteins and peptides derive from the secretion of the three major salivary glands (parotid, submandibular, and sublingual glands), while the remaining 10% are secreted by minor salivary glands or derive from exfoliated cells, leucocytes present in the gingival–crevicular fluid [4], and plasma exudate, with some contribution from the oral microbial flora. During their transit in the secretory pathway, salivary proteins undergo a series of post-translational modifications (PTMs), including phosphorylation, N-terminal acetylation, glycosylation, sulfation, and proteolytic cleavages. Further changes in proteins and peptides also occur after secretion in the oral cavity, through the action of exogenous (microflora) and endogenous enzymes [1].
The main contribution to the composition of the human salivary proteome derives from a few protein families. In particular, proline-rich proteins (PRPs), statherin (STATH), P-B peptide, histatins (HTN), cystatins (CST), and amylases (AMY) altogether represent more than 95% (w/w) of all proteins found in saliva to date [5]. PRPs represent the major fraction of the salivary proteome in Homo sapiens (nearly 70% of the total protein content; >50% in weight) and include basic (bPRPs), acidic (aPRPs), and basic glycosylated (gPRPs) PRPs. They share a high abundance of proline, glycine, and glutamine residues, which represent 70–80% of the entire amino acid sequence [6,7]. bPRPs include eleven parent peptides/proteins and more than six parent glycosylated proteins (gPRPs), plus several proteoforms derived from gene polymorphisms and PTMs [8,9,10] (Figure 1). PRPs are encoded by genes belonging to the PRP multigene family, located within the PRB locus mapping on 12p13.2. The locus includes six tandemly linked genes, PRB2–PRB1–PRB4–PRH2–PRB3–PRH1 in the 5′-to-3′ direction, and is highly polymorphic because it contains internally repetitive DNA sequences that lead to frequent recombination events [11,12]. At least four alleles (S, small; M, medium; L, large; and VL, very large) are present in the Western population of Homo sapiens at the PRB1 and PRB3 loci and three (S, M, L) at the PRB2 and PRB4 loci [8] (Figure 1). Except for the protein encoded by the PRB3 locus, which gives rise to gPRPs, all the bPRP pro-proteins are completely cleaved by pro-protein convertases before granule maturation, generating smaller peptides/proteins [9] (Figure 1). aPRPs are encoded by two loci, PRH1 and PRH2, mapping on chromosome 12p13.
Single amino acid substitution and repeat insertion generate three PRH1 alleles, encoding the parotid isoelectric-focusing slow isoform (PIF-s) and the parotid acidic protein (Pa), both 150 residues long, and the double band isoform slow (Db-s), which is 171 amino acid residues long [10] (Figure 2A). A single nucleotide substitution generates two PRH2 alleles, encoding the PRP-1 and PRP-2 isoforms [11] (Figure 2A). A pro-protein convertase partially cleaves PRP-1, PRP-2, and PIF-s into three N-terminal fragments of 106 residues, called PRP-3, PRP-4, and PIF-f (PRP-3 type), and a common C-terminal fragment of 44 amino acids, called P-C peptide. Db-s is cleaved at position 127, generating two peptides: Db-f (f stands for fast) and the P-C peptide (same as above) [12] (Figure 2A). The Pa isoform, which does not carry the convertase consensus sequence, forms a dimer through a disulfide bond [13] (Figure 2A). STATH is encoded by the STATH gene located in chromosome 4q13-19 [13,14]. Several STATH proteoforms are detectable in saliva due to phosphorylation, cyclization by transglutaminase 2, and proteolysis by amino-/carboxy-peptidases and convertase action [13,15,16]. P-B is a small proline-rich peptide encoded by the SMR3B gene, mapping on chromosome 4q13.3 [17], near the STATH gene, with which it possibly shares epigenetic control and/or the DNA replication timeframe [13,15,16]. HTN are small cationic histidine-rich peptides encoded by the HTN1 and HTN3 genes on chromosome 4q13. Despite their high sequence homology, HTN1 and HTN3 have different maturation pathways and biological activities [17,18,19].
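The aPRP fragment sizes quoted above can be cross-checked with simple bookkeeping. The snippet below is only an illustrative sketch: the dictionary and function names are hypothetical, the 150-residue length of PRP-1 and PRP-2 is an assumption inferred from the 106 + 44 fragment sizes given in the text (only PIF-s and Pa are explicitly stated to be 150 residues long), and the cleavage position of Db-s is taken to equal the length of its N-terminal fragment.

```python
# Fragment-length bookkeeping for the aPRP isoforms (illustrative sketch only).

P_C_PEPTIDE_LENGTH = 44  # common C-terminal fragment (P-C peptide)

# isoform -> (full length, N-terminal fragment name, N-terminal fragment length)
APRP_CLEAVAGE = {
    "PRP-1": (150, "PRP-3", 106),  # full length assumed from 106 + 44
    "PRP-2": (150, "PRP-4", 106),  # full length assumed from 106 + 44
    "PIF-s": (150, "PIF-f", 106),
    "Db-s": (171, "Db-f", 127),    # cleaved at position 127
}


def c_terminal_length(isoform):
    """Return the length of the C-terminal fragment left after cleavage."""
    full_length, _fragment_name, n_terminal_length = APRP_CLEAVAGE[isoform]
    return full_length - n_terminal_length


for isoform, (full_length, fragment, n_len) in APRP_CLEAVAGE.items():
    print(f"{isoform} ({full_length} aa) -> {fragment} ({n_len} aa) "
          f"+ P-C peptide ({c_terminal_length(isoform)} aa)")
```

Under these assumptions every isoform leaves the same 44-residue C-terminal fragment, which is consistent with the P-C peptide being shared by all cleaved aPRPs.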
CST are inhibitors of cysteine proteases involved in the innate immune response [20]. Cystatin A and cystatin B are encoded by the CSTA and CSTB genes, respectively, whereas CST-SN, CST-SA, CST-C, CST-S, and CST-D are encoded by the CST1-CST5 genes (Figure 2B). Several PTMs occur in CST proteins, including N-acetylation, proteolytic cleavages, phosphorylation, and M-, W-, and C-oxidation (oxidation of methionine, tryptophan, and cysteine residues), giving rise to the different final protein structures detectable in human saliva [21]. In addition, two isoforms of cystatin D and cystatin SN generated by single amino acid substitutions are present in saliva [21] (Figure 2B).
The amylase alpha 1A (AMY1A) gene, on chromosome 1p21.1, encodes salivary amylase (AMY), which accounts for about 20% of the weight of salivary proteins and is the most abundant protein in the whole saliva of Homo sapiens.
Several comparative studies have shown that the human salivary proteome differs from that of other species owing to genetic divergence possibly driven by environmental factors, including diet and pathogens [22,23,24,25]. A recent study compared the salivary proteome of Homo sapiens sapiens (modern humans) with those of our closest extant evolutionary relatives, chimpanzees and gorillas [26]. The authors demonstrated that the salivary protein composition is unique to each species despite their close sequence homology, which likely reflects an evolutionary adaptation [26]. Despite this initial observation, the evolution of the human loci encoding salivary proteins has not been studied to date. The increasing amount of genomic data obtained by sequencing the preserved skeletal remains of extinct hominins, such as Homo neanderthalensis (Neanderthals) and Homo denisova (Denisovans), can now reveal the extent of the diversity that has emerged at the genomic level during more recent human evolution.
In this study, we aimed to identify the sequence changes that have been fixed during recent human evolution in the gene loci encoding the most abundant salivary proteins (namely, PRPs, statherin, P-B peptide, histatins, cystatins, and amylases) to gather possible functional indications regarding their evolutionary path and their contribution to oral homeostasis and salivary functions. Eating habits and the biology of salivary proteins may indeed influence each other, since these proteins are implicated in the modulation of the microbiome of the oral cavity and the entire gastrointestinal tract [26]. To achieve this, we interrogated the publicly available sequence databases of Neanderthals and Denisovans and compared them with modern human genome sequence data. This allowed us to identify several nucleotide substitutions in the loci coding for the most relevant human salivary protein families.
2. Results
By comparing the genomic sequences of salivary gene loci in modern humans with those of Altai Neanderthals, Chagyrskaya Neanderthals, Vindija Neanderthals, and Denisovans, we identified a total of 3472 sequence variants/nucleotide substitutions across the 17 tested salivary genes in coding, noncoding, 5′ and 3′ untranslated (UTRs), and regulatory regions. The nucleotide substitutions observed in the 17 tested salivary genes are summarized in Figure 3. Of the 3472 changed nucleotides, only 428 were in coding regions, and 121 of these were annotated as synonymous (Figure 3). The remaining 307 nucleotide variations were nonsynonymous (Figure 3); such variations are known to be subjected to a higher evolutionary pressure and are frequently exposed to natural selection [27,28]. We have, therefore, attempted a functional interpretation of nonsynonymous variations, which is inherently speculative and deserves future functional studies. The potential impact of nonsynonymous variants on the function of salivary proteins in Neanderthals and Denisovans was predicted by a SIFT (sorting intolerant from tolerant) analysis (see Table 1, Table 2 and Table 3), which predicts amino acid substitutions that may exert a deleterious effect. The reference single nucleotide polymorphism (SNP) number (rs) and the corresponding frequencies of the 107 missense changes in coding regions are also reported in Table 1, Table 2 and Table 3. Of note, even though the nucleotide changes located in noncoding regions should not affect the primary structure of the encoded protein, they could affect regulatory elements that may modify splicing and/or the binding of epigenetic modulators and/or chromatin folding/looping. The variants fixed at 100% in modern humans compared to ancient hominins are highlighted in light orange in Table 1, Table 2 and Table 3 and Tables S1–S17.
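The "Codon→Amino Acid" column of the tables for genes read on the reverse strand can be reproduced with a short script. The following is a minimal sketch, not the pipeline used in this study: it assumes that the triplets are listed so that a base-by-base complement (without reversing the order) yields the mRNA codon, which is what the tabulated rows suggest, and it uses the standard genetic code. All function and variable names are hypothetical.

```python
# Translate genomic triplets of reverse-strand genes into codons and amino acids
# (illustrative sketch; assumes a per-base complement gives the mRNA codon).

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# DNA base on the listed strand -> complementary RNA base.
COMPLEMENT_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def triplet_to_codon(triplet):
    """Complement each base of a listed DNA triplet to obtain the mRNA codon."""
    return "".join(COMPLEMENT_TO_RNA[base] for base in triplet.upper())


def classify_change(reference_triplet, variant_triplet):
    """Return (ref codon, ref aa, variant codon, variant aa, effect)."""
    ref_codon = triplet_to_codon(reference_triplet)
    var_codon = triplet_to_codon(variant_triplet)
    ref_aa = CODON_TABLE[ref_codon]
    var_aa = CODON_TABLE[var_codon]
    if var_aa == ref_aa:
        effect = "synonymous"
    elif var_aa == "*":
        effect = "stop gained"
    else:
        effect = "missense"
    return ref_codon, ref_aa, var_codon, var_aa, effect


print(classify_change("CTT", "TTT"))  # PRB1, position 11,507,477
print(classify_change("GTT", "ATT"))  # PRB1, position 11,506,790
```

For the first PRB1 row (CTT in modern humans, TTT in the Chagyrskaya and Vindija Neanderthals) the sketch gives GAA (E) and AAA (K), a missense change, and for position 11,506,790 it gives CAA (Q) and UAA (stop), in agreement with the tabulated annotations.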
Table 1.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score) |
---|---|---|---|---|---|---|---|---|---|---|
PRB1 (reverse reading, chromosome 12) | | | | | | | | | | |
11,507,477 | Exon 2 (II-2) | CTT | CTT (100%) | TTT (13%) | TTT (7%) * | CTT (100%) | GAA→E10 AAA→K10 | n.a. | n.a. | Damaging (0.02) |
11,507,464 | Exon 2 (II-2) | AGG | AGG (100%) | AGG (100%) | AAG (12%) | AGG (100%) | UCC→S14 UUC→F14 | rs1173856027 | A = 0% | Tolerated (0.72) |
11,506,888 | Exon 3 (II-2) | GGG | GGG (100%) | GGG (100%) | GAG (12%) | GGG (100%) | CCC→P35 CUC→L35 | n.a. | n.a. | Tolerated (0.06) |
11,506,856 | Exon 3 (II-2) | GGG | GGG (100%) | AGG (11%) | GGG (100%) | GGG (100%) | CCC→P45 UCC→S45 | rs762910991 | A = 0.003% | Tolerated (0.17) |
11,506,853 | Exon 3 (II-2) | GGT | TGT (3%) * | GGT (100%) | AGT (15%) | GGT (100%) | CCA→P46 UCA→S46 | rs745726339 | A = 0% | Damaging (0) |
11,506,852 | Exon 3 (II-2) | GGT | GGT (100%) | GGT (100%) | GAT (11%) | GGT (100%) | CCA→P46 CUA→L46 | n.a. | n.a. | Damaging (0) |
11,506,804 | Exon 3 (II-2) | GTT | GAT (61%) | GAT (63%) | GAT (60%) | GTT (100%) | CAA→Q62 CUA→L62 | n.a. | n.a. | Tolerated (0.29) |
11,506,801 | Exon 3 (II-2) | CCT | CCT (100%) | CTT (11%) | CTT (5%) * | CCT (100%) | GGA→G63 GAA→E63 | n.a. | n.a. | Damaging (0.01) |
11,506,790 | Exon 3 (II-2) | GTT | GTT (100%) | ATT (11%) | ATT (6%) * | GTT (100%) | CAA→Q67 UAA→stop | rs1409612167 | A = 0% | Damaging due to stop |
11,506,784 | Exon 3 (II-2) | CTG | CTG (100%) | CTG (100%) | TTG (13%) | CTG (100%) | GAC→D69 AAC→N69 | rs554211998 | T = 0% | Tolerated (0.95) |
11,506,774 | Exon 3 (II-2) | GCT | GTT (13%) | GTT (8%) * | GTT (6%) * | GTT (9%) * | CGA→R72 CAA→Q72 | rs202083397 | T = 10.6% | Tolerated (0.08) |
11,506,766 | Exon 3 (II-2) | GCT | GCT (100%) | GCT (100%) | ACT (12%) | GCT (100%) | CGA→R75 UGA→stop | rs766131639 | A = 0% | Damaging due to stop |
11,506,730 | Exon 3 (Ps-2) | GTT | GTT (100%) | ATT (16%) | GTT (100%) | GTT (100%) | CAA→Q12 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,506,723 | Exon 3 (Ps-2) | CCA | CCA (100%) | CTA (12%) | CTA (3%) * | CCA (100%) | GGU→G14 GAU→D14 | rs534597111 | T = 0% | NS |
11,506,669 | Exon 3 (Ps-2) | GGT | GTT (39%) | GTT (36%) | GTT (55%) | GTT (26%) | CCA→P32 CAA→Q32 | rs772365043 | C = 0% | NS |
11,506,618 | Exon 3 (Ps-2) | CCT | CCT (100%) | CTT (17%) | CTT (3%) * | CCT (100%) | GGA→G49 GAA→E49 | n.a. | n.a. | NS |
11,506,612 | Exon 3 (Ps-2) | GGG | GGG (100%) | GAG (11%) | GGG (100%) | GGG (100%) | CCC→P51 CUC→L51 | n.a. | n.a. | NS |
11,506,577 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (13%) | GGA (100%) | GGA (100%) | CCU→P2 UCU→S2 | n.a. | n.a. | NS |
11,506,514 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (6%) * | AGA (11%) | GGA (100%) | CCU→P23 UCU→S23 | n.a. | n.a. | NS |
11,506,492 | Exon 3 (IB-6) | GGT | GGT (100%) | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P30 CUA→L30 | n.a. | n.a. | NS |
11,506,490 | Exon 3 (IB-6) | GGG | AGG (5%) * | AGG (18%) | AGG (8%) * | GGG (100%) | CCC→P31 UCC→S31 | n.a. | n.a. | NS |
11,506,486 | Exon 3 (IB-6) | GGT | GGT (100%) | GGT (100%) | GTT (18%) | GGT (100%) | CCA→P32 CAA→Q32 | rs755622101 | T = 1.3% | NS |
11,506,473 | Exon 3 (Ps-2) | TTC | TTG (100%) | TTG (83%) ** | TTG (100%) ** | TTG (75%) ** | AAG→K37 AAC→N37 | rs61930109 | G = 72.1% | NS |
11,506,403 | Exon 3 (Ps-2) | AGG | GGG (50%) ** | GGG (50%) ** | AGG (100%) ** | GGG (100%) | UCC→S59 CCC→P59 | n.a. | n.a. | NS |
11,506,370 | Exon 3 (Ps-2) | GGG | GGG (100%) | GGG (100%) | AGG (21%) | GGG (100%) | CCC→P70 UCC→S70 | rs774158904 | A = 0% | NS |
11,506,369 | Exon 3 (Ps-2) | GGG | GGG (93%) | GGG (100%) | GAG (16%) | GGG (100%) | CCC→P71 CUC→L71 | rs369001998 | A = 0.007% | NS |
11,506,339 | Exon 3 (Ps-2) | GGG | GGG (97%) | GAG (5%) * | GAG (23%) | GGG (100%) | CCC→P81 CUC→L81 | n.a. | n.a. | NS |
11,506,333 | Exon 3 (Ps-2) | GGA | GGA (100%) | GAA (5%) * | GAA (11%) | GGA (100%) | CCU→P83 CUU→L83 | n.a. | n.a. | NS |
11,506,309 | Exon 3 (Ps-2) | GGT | GAT (4%) * | GAT (6%) * | GAT (17%) | GGT (100%) | CCA→P91 CUU→L91 | n.a. | n.a. | Damaging (0.01) |
11,506,303 | Exon 3 (Ps-2) | GGT | GTT (3%) * | GTT (13%) | GGT (100%) | GGT (100%) | CCA→P93 CAA→Q93 | rs201682460 | T = 2.8% | Damaging (0) |
11,506,301 | Exon 3 (Ps-2) | GTT | ATT (4%) * | GTT (100%) | ATT (15%) | GTT (100%) | CAA→Q94 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,506,285 | Exon 3 (Ps-2) | GGA | GGA (100%) | GGA (100%) | GAA (14%) | GGA (100%) | CCU→P99 CUU→L99 | n.a. | n.a. | Damaging (0.01) |
11,506,283 | Exon 3 (Ps-2) | GTT | GTT (100%) | ATT (14%) | ATT (13%) | GTT (100%) | CAA→Q100 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,506,250 | Exon 3 (Ps-2) | GGT | GGT (100%) ** | GGT (100%) | AGT (14%) | GGT (100%) | CCA→P111 UCA→S111 | n.a. | n.a. | Tolerated (0.08) |
11,506,249 | Exon 3 (Ps-2) | GGT | GGT (100%) ** | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P111 CUA→L111 | rs1208300501 | A = 0% | Tolerated (0.09) |
11,506,246 | Exon 3 (Ps-2) | GGG | GGG (100%) ** | GAG (18%) | GGG (100%) | GGG (100%) | CCC→P112 CUC→L112 | rs1303924609 | A = 0% | Damaging (0.02) |
11,506,241 | Exon 3 (Ps-2) | GTT | GTT (100%) ** | GTT (100%) | ATT (14%) | GTT (100%) | CAA→Q114 UAA→stop | rs751826141 | A = 0% | Damaging due to stop |
11,506,217 | Exon 3 (IB-6) | CGG | GGG (67%) ** | GGG (17%) ** | GGG (25%) | CGG (100%) | GCC→A61 CCC→P61 | rs771648794 | G = 0.04% | Tolerated (1) |
11,506,154 | Exon 3 (IB-6) | GGG | GGG (100%) | AGG (17%) | AGG (4%) * | GGG (100%) | CCC→P82 UCC→S82 | n.a. | n.a. | Tolerated (0.15) |
11,506,150 | Exon 3 (IB-6) | GGT | GGT (100%) | GAT (14%) | GGT (100%) | GAT (6%) * | CCA→P83 CUA→L83 | rs747444571 | A = 0% | Damaging (0.03) |
11,506,079 | Exon 3 (IB-6) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P107 UCU→S107 | n.a. | n.a. | Tolerated (0.06) |
11,506,075 | Exon 3 (IB-6) | GGA | GGA (100%) | GGA (100%) | GAA (13%) | GGA (100%) | CCU→P108 CUU→L108 | n.a. | n.a. | Damaging (0.01) |
11,506,070 | Exon 3 (IB-6) | CCC | CCC (100%) | CCC (100%) | TCC (12%) | CCC (100%) | GGG→G110 AGG→R110 | n.a. | n.a. | Tolerated (0.3) |
11,506,057 | Exon 3 (IB-6) | AGG | AGG (100%) | AAG (11%) | AAG (5%) * | AGG (100%) | UCC→S114 UUC→F114 | n.a. | n.a. | Damaging (0.03) |
11,506,052 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (10%) * | AGA (18%) | GGA (100%) | CCU→P116 UCU→S116 | rs1372423355 | A = 0% | Tolerated (0.06) |
PRB2 (reverse reading, chromosome 12) | | | | | | | | | | |
11,548,429 | Exon 1 (Signal) | CGG | CGG (100%) | CAG (3%) * | CAG (13%) | CGG (100%) | GCC→A11(sp) GUC→V11(sp) | rs1415819382 | A = 0% | Damaging (0) |
11,547,429 | Exon 2 (IB-1) | CCT | TCT (4%) * | CCT (100%) | TCT (12%) | CCT (100%) | GGA→G18 AGA→R18 | n.a. | n.a. | Damaging (0.2) |
11,546,899 | Exon 3 (IB-1) | CCT | CCT (100%) | CTT (11%) | CCT (100%) | CCT (100%) | GGA→G22 GAA→E22 | rs188924826 | T = 0.007% | Tolerated (0.1) |
11,546,894 | Exon 3 (IB-1) | GGG | GGG (100%) | AGG (14%) | GGG (100%) | GGG (100%) | CCC→P24 UCC→S24 | n.a. | n.a. | Tolerated (0.73) |
11,546,872 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | GAA (11%) | GGA (100%) | CCU→P31 CUU→L31 | rs748769813 | A = 0% | Tolerated (0.46) |
11,546,830 | Exon 3 (IB-1) | GGG | GGG (100%) | GAG (9%) * | GAG (17%) | GGG (100%) | CCC→P45 CUC→L45 | n.a. | n.a. | Tolerated (0.1) |
11,546,828 | Exon 3 (IB-1) | GGT | AGT (3%) * | GGT (100%) | AGT (17%) | GGT (100%) | CCA→P46 UCA→S46 | rs755161117 | A = 0.007% | Tolerated (0.36) |
11,546,825 | Exon 3 (IB-1) | GTT | GTT (97%) | GTT (100%) | ATT (17%) | GTT (100%) | CAA→Q47 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,546,810 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P52 UCU→S52 | rs1347881375 | A = 0% | Tolerated (0.97) |
11,546,809 | Exon 3 (IB-1) | GGA | GGA (100%) | GAA (6%) * | GAA (12%) | GGA (100%) | CCU→P52 CUU→L52 | n.a. | n.a. | Tolerated (0.3) |
11,546,807 | Exon 3 (IB-1) | GTT | GTT (97%) | ATT (11%) | ATT (11%) | GTT (100%) | CAA→Q53 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,546,792 | Exon 3 (IB-1) | GGA | GGA (100%) | AGA (18%) | GGA (100%) | GGA (100%) | CCU→P58 UCU→S58 | n.a. | n.a. | Tolerated (0.76) |
11,546,780 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (100%) | AGT (12%) | GGT (100%) | CCA→P62 UCA→S62 | n.a. | n.a. | Tolerated (0.64) |
11,546,770 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P65 CUA→L65 | n.a. | n.a. | Tolerated (1) |
11,546,764 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (96%) | GAT (12%) | GGT (100%) | CCA→P67 CAA→Q67 | rs201994479 | T = 0.008% | Tolerated (0.43) |
11,546,732 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P78 UCU→S78 | n.a. | n.a. | Tolerated (0.38) |
11,546,716 | Exon 3 (IB-1) | GTT | GAT (4%) * | GAT (14%) | GTT (97%) | GTT (100%) | CAA→Q83 CUA→L83 | n.a. | n.a. | Tolerated (0.32) |
11,546,686 | Exon 3 (IB-1) | GCT | GTT (42%) | GTT (39%) | GTT (51%) | GTT (29%) | CGA→R93 CAA→Q93 | rs76832300 | n.a. | Tolerated (0.5) |
11,546,677 | Exon 3 (IB-1) | GCT | GCT (100%) | GCT (100%) | GCT (100%) | GTT (24%) | CGA→R96 CAA→Q96 | rs201144571 | T = 0.08% | Tolerated (0.47) |
11,546,647 | Exon 3 (P-J) | GGG | GGG (100%) | GGG (100%) | GAG (15%) | GGG (100%) | CCC→P10 CUC→L10 | n.a. | n.a. | Tolerated (0.18) |
11,546,642 | Exon 3 (P-J) | GTT | GTT (100%) | GTT (100%) | ATT (17%) | GTT (100%) | CAA→Q12 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,546,627 | Exon 3 (P-J) | GGA | AGA (3%) * | AGA (11%) | AGA (5%) * | GGA (100%) | CCU→P17 UCU→S17 | n.a. | n.a. | Tolerated (0.45) |
11,546,618 | Exon 3 (P-J) | GGA | GGA (100%) | GGA (93%) | AGA (17%) | GGA (100%) | CCU→P20 UCU→S20 | n.a. | n.a. | Tolerated (0.81) |
11,546,617 | Exon 3 (P-J) | GGA | GGA (100%) | GGA (100%) | GAA (17%) | GGA (100%) | CCU→P20 CUU→L20 | rs780517289 | A = 0% | Tolerated (0.82) |
11,546,615 | Exon 3 (P-J) | GGT | GGT (100%) | AGT (12%) | AGT (8%) * | GGT (100%) | CCA→P21 UCA→S21 | n.a. | n.a. | Tolerated (0.39) |
11,546,614 | Exon 3 (P-J) | GGT | GGT (100%) | GAT (11%) | GGT (100%) | GGT (100%) | CCA→P21 CUA→L21 | n.a. | n.a. | Tolerated (0.29) |
11,546,585 | Exon 3 (P-J) | GGG | GGG (100%) | GGG (100%) | AGG (13%) | GGG (100%) | CCC→P31 UCC→S31 | n.a. | n.a. | Tolerated (0.53) |
11,546,581 | Exon 3 (P-J) | GGT | GTT (6%) * | GTT (13%) | GGT (100%) | GGT (100%) | CCA→P32 CAA→Q32 | n.a. | n.a. | Damaging (0.05) |
11,546,566 | Exon 3 (P-J) | TTT | TCT (8%) * | TCT (12%) | TTT (100%) | TTT (100%) | AAA→K37 AGA→R37 | rs746515947 | C = 0% | Tolerated (1) |
11,546,462 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (13%) | GGG (100%) | GGG (100%) | CCC→P9 UCC→S9 | rs201392419 | A = 0% | Tolerated (0.58) |
11,546,395 | Exon 3 (IB-8a) | GGT | GTT (16%) | GTT (10%) * | GTT (13%) | GTT (4%) * | CCA→P31 CAA→Q31 | rs11054277 | T = 0.01% | Damaging (0) |
11,546,380 | Exon 3 (IB-8a) | TTT | TCT (17%) | TCT (14%) | TCT (6%) * | TTT (100%) | AAA→K37 AGA→R37 | rs11054276 | C = 0.01% | Tolerated (1) |
11,546,381 | Exon 3 (IB-8a) | TTT | TTT (100%) | CTT (100%) | TTT (100%) | GTT (13%) | AAA→K37 CAA→Q37 | rs201455726 | G = 0.2% | Tolerated (0.42) |
11,546,369 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (12%) | GGG (100%) | GGG (100%) | CCC→P41 UCC→S41 | rs1238238576 | A = 0% | Tolerated (0.42) |
11,546,347 | Exon 3 (IB-8a) | GTT | GAT (6%) * | GAT (4%) * | GAT (15%) | GTT (100%) | CAA→Q48 CUA→L48 | n.a. | n.a. | Tolerated (0.32) |
11,546,342 | Exon 3 (IB-8a) | GGT | GGT (100%) | GGT (100%) | AGT (18%) | GGT (100%) | CCA→P50 UCA→S50 | n.a. | n.a. | Tolerated (0.41) |
11,546,327 | Exon 3 (IB-8a) | CTG | CTG (100%) | TTG (11%) | TTG (18%) | CTG (100%) | GAC→D55 AAC→N55 | n.a. | n.a. | Tolerated (0.28) |
11,546,314 | Exon 3 (IB-8a) | GTT | GCT (87%) | GCT (77%) | GCT (67%) | GCT (94%) | CAA→Q59 CGA→R59 | rs34305575 | C = 7.6% | Tolerated (0.35) |
11,546,309 | Exon 3 (IB-8a) | CGG | GGG (12%) | GGG (13%) | GGG (18%) | GGG (5%) * | GCC→A61 CCC→P61 | rs201308939 | G = 3.8% | Tolerated (0.25) |
11,546,305 | Exon 3 (IB-8a) | GCT | GTT (3%) * | GCT (100%) | GTT (11%) | GCT (100%) | CGA→R62 CAA→Q62 | rs199748368 | T = 0.07% | Tolerated (0.46) |
11,546,300 | Exon 3 (IB-8a) | GGA | GGA (100%) | AGA (13%) | GGA (100%) | GGA (100%) | CCU→P64 UCU→S64 | rs755713521 | n.a. | Tolerated (0.66) |
11,546,294 | Exon 3 (IB-8a) | CCT | CCT (100%) | TCT (13%) | CCT (100%) | CCT (100%) | GGA→G66 AGA→R66 | n.a. | n.a. | Damaging (0.03) |
11,546,279 | Exon 3 (IB-8a) | GGT | AGT (2%) * | GGT (100%) | AGT (13%) | GGT (100%) | CCA→P71 UCA→S71 | n.a. | n.a. | Tolerated (0.67) |
11,546,278 | Exon 3 (IB-8a) | GGT | GAT (2%) * | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P71 CUA→L71 | rs766408532 | n.a. | Tolerated (0.26) |
11,546,246 | Exon 3 (IB-8a) | GGG | GGG (100%) | GGG (100%) | AGG (14%) | GGG (100%) | CCC→P82 UCC→S82 | rs1440556057 | A = 0.0004% | Tolerated (0.42) |
11,546,245 | Exon 3 (IB-8a) | GGG | GGG (97%) | GAG (7%) * | GAG (26%) | GAG (7%) * | CCC→P82 CUC→L82 | rs1262267049 | A = 0.0004% | Tolerated (0.15) |
11,546,213 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (8%) * | AGG (25%) | GGG (100%) | CCC→P93 UCC→S93 | rs1408969762 | n.a. | Tolerated (0.26) |
11,546,187 | Exon 3 (IB-8a) | GTT | GTT (96%) | GTC (10%) * | GTC (12%) | GTC (4%) * | CAA→Q101 CAC→H101 | n.a. | n.a. | Tolerated (0.23) |
11,546,161 | Exon 3 (IB-8a) | GTT | GAT (21%) | GTT (100%) | GAT (30%) | GTT (100%) | CAA→Q110 CUA→L110 | n.a. | n.a. | Tolerated (0.61) |
11,546,089 | Exon 3 (P-F) | GGG | GGG (100%) | GAG (17%) ** | GAG (17%) | GGG (100%) | CCC→P10 CUC→L10 | n.a. | n.a. | Tolerated (0.61) |
11,546,084 | Exon 3 (P-F) | GTT | GTT (100%) | GTT (100%) | ATT (15%) | GTT (100%) | CAA→Q12 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,546,059 | Exon 3 (P-F) | GGG | GGG (100%) | GAG (7%) * | GAG (21%) | GGG (100%) | CCC→P20 CUC→L20 | n.a. | n.a. | Tolerated (0.19) |
11,546,050 | Exon 3 (P-F) | GGA | GTA (4%) * | GTA (13%) | GGA (100%) | GTA (7%) * | CCU→P23 CAU→H23 | n.a. | n.a. | Tolerated (0.56) |
11,546,027 | Exon 3 (P-F) | GGG | GGG (100%) | AGG (11%) | AGG (7%) * | GGG (100%) | CCC→P31 UCC→S31 | rs1201001162 | n.a. | Tolerated (0.61) |
11,546,023 | Exon 3 (P-F) | GGT | GGT (100%) | GTT (5%) * | GTT (13%) | GTT (4%) * | CCA→P32 CAA→Q32 | rs201391404 | T = 0.059% | Damaging (0.03) |
11,546,009 | Exon 3 (P-F) | TTT | TTT (100%) | TTT (100%) | TTT (95%) | GTT (12%) | AAA→K37 CAA→Q37 | n.a. | n.a. | Tolerated (0.26) |
11,545,975 | Exon 3 (P-F) | GTT | GAT (2%) * | GAT (16%) | GAT (33%) | GTT (100%) | CAA→Q48 CUA→L48 | n.a. | n.a. | Tolerated (0.31) |
11,545,964 | Exon 3 (P-F) | GGT | GGT (100%) | CGT (20%) | CGT (22%) | CGT (19%) | CCA→P51 GCA→A51 | n.a. | n.a. | Tolerated (0.74) |
11,545,904 | Exon 3 (P-H) | GGG | GGG (100%) | AGG (3%) * | AGG (11%) | GGG (100%) | CCC→P10 UCC→S10 | n.a. | n.a. | Tolerated (0.8) |
11,545,868 | Exon 3 (P-H) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P22 UCU→S22 | n.a. | n.a. | Tolerated (0.69) |
11,545,814 | Exon 3 (P-H) | GTC | GTC (100%) | ATC (4%) * | ATC (12%) | GTC (100%) | CAG→Q40 UAG→stop | n.a. | n.a. | Damaging due to stop |
11,545,802 | Exon 3 (P-H) | GCG | GCG (100%) | GCG (100%) | ACG (11%) | GCG (100%) | CGC→R44 UGC→C44 | rs748815572 | A = 0% | Tolerated (0.07) |
11,545,793 | Exon 3 (P-H) | GTT | GTT (100%) | ATT (12%) | GTT (100%) | GTT (100%) | CAA→Q47 UAA→stop | n.a. | n.a. | Damaging due to stop |
11,545,790 | Exon 3 (P-H) | CCC | CCC (100%) | CCC (100%) | TCC (13%) | CCC (100%) | GGG→G48 AGG→R48 | n.a. | n.a. | Tolerated (0.7) |
PRB3 (reverse reading, chromosome 12) | | | | | | | | | | |
11,422,578 | Exon 1 (Signal) | CGG | CGG (100%) | CAG (14%) | CAG (3%) * | CGG (100%) | GCC→A8(sp) GUC→V8(sp) | rs1337927316 | n.a. | Tolerated (0.06) |
11,421,578 | Exon 2 (Gl-5) | AGG | AGG (100%) | AAG (11%) | AAG (11%) | AGG (100%) | UCC→S14 UUC→F14 | n.a. | n.a. | Tolerated (0.32) |
11,421,002 | Exon 3 (Gl-5) | GGG | GGG (100%) | AGG (11%) | AGG (4%) * | GGG (100%) | CCC→P45 UCC→S45 | rs533382585 | n.a. | Damaging (0.04) |
11,420,989 | Exon 3 (Gl-5) | CCG | CCG (100%) | CTG (14%) | CTG (5%) * | CCG (96%) | GGC→G49 GAC→D49 | n.a. | n.a. | Damaging (0) |
11,420,975 | Exon 3 (Gl-5) | CCA | TCA (2%) * | TCA (17%) | CCA (100%) | CCA (100%) | GGU→G54 AGU→S54 | rs1197023343 | n.a. | Tolerated (0.12) |
11,420,974 | Exon 3 (Gl-5) | CCA | CCA (100%) | CTA (8%) * | CTA (21%) | CCA (100%) | GGU→G54 GAU→D54 | n.a. | n.a. | Tolerated (0.19) |
11,420,971 | Exon 3 (Gl-5) | GGG | GGG (100%) | GGG (100%) | GAG (11%) | GGG (100%) | CCC→P55 CUC→L55 | n.a. | n.a. | Damaging (0.02) |
11,420,956 | Exon 3 (Gl-5) | CCT | CCT (98%) | CCT (100%) | CTT (14%) | CCT (100%) | GGA→G60 GAA→E60 | rs745804122 | T = 0% | Tolerated (0.06) |
11,420,945 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | TCT (14%) | TCT (4%) * | GGA→G64 AGA→R64 | rs781151188 | T = 0% | Damaging (0.02) |
11,420,939 | Exon 3 (Gl-5) | GGG | GGG (100%) | AGG (11%) ** | AGG (11%) | GGG (100%) | CCC→P66 UCC→S66 | n.a. | n.a. | Damaging (0.04) |
11,420,927 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | TCT (11%) | CCT (100%) | GGA→G70 AGA→R70 | n.a. | n.a. | Damaging (0) |
11,420,926 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | CTT (16%) | CCT (100%) | GGA→G70 GAA→E70 | n.a. | n.a. | Damaging (0) |
11,420,906 | Exon 3 (Gl-5) | GGT | GGT (100%) | GGT (100%) | AGT (12%) | GGT (100%) | CCA→P77 UCA→S77 | n.a. | n.a. | Damaging (0.04) |
11,420,899 | Exon 3 (Gl-5) | GCA | GTA (73%) | GCA (100%) | GTA (65%) | GTA (80%) | CGU→R79 CAU→H79 | rs769836435 | T = 0.02% | Tolerated (0.59) |
11,420,896 | Exon 3 (Gl-5) | GGC | GGC (100%) | GGC (100%) | GAC (13%) | GGC (100%) | CCG→P80 CUG→L80 | n.a. | n.a. | Tolerated (0.09) |
11,420,836 | Exon 3 (Gl-5) | GCA | GTA (7%) * | GTA (5%) * | GTA (9%) * | GTA (22%) | CGU→R100 CAU→H100 | n.a. | n.a. | Tolerated (0.24) |
11,420,815 | Exon 3 (Gl-5) | GGT | GTT (18%) | GGT (100%) | GGT (96%) | GGT (100%) | CCA→P107 CAA→Q107 | rs201963893 | T = 0% | Tolerated (0.45) |
11,420,803 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | CTT (15%) | CCT (100%) | GGA→G111 GAA→E111 | n.a. | n.a. | Tolerated (0.41) |
11,420,800 | Exon 3 (Gl-5) | CCT | CCT (97%) | CCT (100%) | CTT (11%) | CCT (100%) | GGA→G112 GAA→E112 | n.a. | n.a. | Damaging (0.01) |
11,420,780 | Exon 3 (Gl-5) | GGC | GGC (100%) | AGC (11%) | GGC (100%) | GGC (100%) | CCG→P119 UCG→S119 | n.a. | n.a. | Damaging (0.04) |
11,420,779 | Exon 3 (Gl-5) | GGC | GAC (4%) * | GAC (6%) * | GAC (35%) | GGC (100%) | CCG→P119 CUG→L119 | n.a. | n.a. | Damaging (0.03) |
11,420,728 | Exon 3 (Gl-5) | AGG | AAG (4%) * | AGG (100%) | AAG (11%) | AGG (100%) | UCC→S136 UUC→F136 | n.a. | n.a. | Damaging (0.04) |
11,420,716 | Exon 3 (Gl-5) | GGC | GAC (4%) * | GGC (100%) | GAC (17%) | GGC (100%) | CCG→P140 CUG→L140 | n.a. | n.a. | Tolerated (0.12) |
11,420,687 | Exon 3 (Gl-5) | GGG | GGG (98%) | AGG (15%) | GGG (100%) | GGG (100%) | CCC→P150 UCC→S150 | n.a. | n.a. | Tolerated (0.15) |
11,420,686 | Exon 3 (Gl-5) | GGG | GGG (98%) | GAG (8%) * | GAG (18%) | GGG (100%) | CCC→P150 CUC→L150 | n.a. | n.a. | Tolerated (0.15) |
11,420,614 | Exon 3 (Gl-2) | CCT | CCT (100%) | CCT (100%) | CTT (11%) | CCT (100%) | GGA→G132 GAA→E132 | rs768625455 | n.a. | NS |
11,420,597 | Exon 3 (Gl-2) | CCA | CCA (100%) | CCA (100%) | TCA (13%) | CCA (100%) | GGU→G138 AGU→S138 | rs780713977 | n.a. | Tolerated (0.09) |
11,420,588 | Exon 3 (Gl-2) | GGA | AGA (4%) * | AGA (10%) * | AGA (16%) | GGA (100%) | CCU→P141 UCU→S141 | n.a. | n.a. | Tolerated (0.78) |
11,420,495 | Exon 3 (Gl-2) | GGT | AGT (12%) | AGT (3%) * | AGT (6%) * | AGT (14%) | CCA→P172 UCA→S172 | n.a. | n.a. | Tolerated (0.14) |
11,420,308 | Exon 4 (Gl-2) | GGG | GGG (100%) | AGG (17%) | GGG (100%) | GGG (100%) | CCC→P234 UCC→S234 | rs760324380 | A = 0.0008% | Tolerated (0.09) |
11,420,307 | Exon 4 (Gl-2) | GGG | GGG (100%) | GAG (12%) | GGG (100%) | GGG (100%) | CCC→P234 CUC→L234 | n.a. | n.a. | Damaging (0.03) |
11,420,304 | Exon 4 (Gl-2) | GGT | GGT (100%) | GAT (12%) | GGT (100%) | GGT (100%) | CCA→P235 CUA→L235 | n.a. | n.a. | Damaging (0.01) |
11,420,281 | Exon 4 (Gl-2) | GCA | GCA (100%) | ACA (13%) | ACA (10%) * | GCA (100%) | CGU→R243 UGU→C243 | rs758570507 | A = 0% | Damaging (0.05) |
11,420,278 | Exon 4 (Gl-2) | GGG | GGG (100%) | GGG (100%) | AGG (11%) | GGG (100%) | CCC→P244 UCC→S244 | n.a. | n.a. | Tolerated (0.27) |
11,420,182 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | AGT (11%) | GGT (100%) | CCA→P277 UCA→S277 | rs755939114 | A = 0% | Tolerated (0.06) |
11,420,170 | Exon 4 (Gl-2) | CCC | CCC (100%) | CCC (100%) | TCC (11%) | CCC (100%) | GGG→G280 AGG→R280 | n.a. | n.a. | Tolerated (0.07) |
11,420,161 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | AGT (13%) | GGT (100%) | CCA→P283 UCA→S283 | n.a. | n.a. | Tolerated (0.21) |
11,420,160 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | GAT (19%) | GGT (100%) | CCA→P283 CUA→L283 | n.a. | n.a. | Tolerated (0.09) |
11,420,154 | Exon 4 (Gl-2) | TCT | TTT (3%) * | TCT (100%) | TTT (11%) | TCT (100%) | AGA→R285 AAA→K285 | n.a. | n.a. | Tolerated (0.63) |
PRB4 (reverse reading, chromosome 12) | | | | | | | | | | |
11,463,280 | Exon 1 (PGA) | TCA | TGA (100%) | TGA (100%) | TGA (97%) | TGA (100%) | AGU→S2 ACU→T2 | n.a. | n.a. | Tolerated (0.83) |
11,461,801 | Exon 3 (PGA) | GCT | GCT (98%) | GCT (97%) | GTT (13%) | GCT (100%) | CGA→R23 CAA→Q23 | n.a. | n.a. | Tolerated (0.57) |
11,461,772 | Exon 3 (PGA) | GCA | GCA (100%) | GCA (96%) | ACA (12%) | GCA (100%) | CGU→R33 UGU→C33 | rs77775235 | A = 0% | Tolerated (0.06) |
11,461,769 | Exon 3 (PGA) | GGG | TGG (5%) * | TGG (9%) * | TGG (5%) * | TGG (13%) | CCC→P34 ACC→T34 | rs144658455 | T = 0% | Tolerated (0.53) |
11,461,745 | Exon 3 (PGA) | GTT | CTT (8%) * | CTT (8%) * | CTT (5%) * | CTT (12%) | CAA→Q42 GAA→E42 | rs76859544 | C = 6.8% | Tolerated (1) |
11,461,742 | Exon 3 (PGA) | CCT | TCT (10%) * | TCT (27%) | TCT (11%) | TCT (7%) * | GGA→G43 AGA→R43 | rs776943151 | T = 0.05% | Tolerated (0.45) |
11,461,706 | Exon 3 (PGA) | GGG | TGG (14%) | TGG (23%) | TGG (13%) | TGG (20%) | CCC→P55 ACC→T55 | rs12308381 | T = 21.6% | Tolerated (0.12) |
11,461,675 | Exon 3 (PGA) | GCT | GGT (1%) * | GGT (2%) * | GGT (2%) * | GGT (28%) | CGA→R65 CCA→P65 | rs75743553 | G = 0% | Tolerated (0.32) |
11,461,673 | Exon 3 (PGA) | GGG | GGG (99%) | AGG (13%) | AGG (2%) * | GGG (100%) | CCC→P66 UCC→S66 | rs1332850459 | A = 0% | Tolerated (0.25) |
11,461,580 | Exon 3 (PGA) | TGG | GGG (65%) | GGG (52%) | GGG (24%) | GGG (54%) | ACC→T97 CCC→P97 | n.a. | n.a. | Tolerated (0.81) |
11,461,570 | Exon 3 (PGA) | GGA | GTA (51%) | GTA (54%) | GTA (8%) * | GTA (47%) | CCU→P100 CAU→H100 | n.a. | n.a. | Tolerated (0.59) |
11,461,553 | Exon 3 (PGA) | TCT | CCT (13%) | CCT (15%) | TCT (100%) | CCT (24%) | AGA→R106 GGA→G106 | n.a. | n.a. | Tolerated (0.84) |
11,461,550 | Exon 3 (PGA) | GGT | GGT (100%) | AGT (17%) | GGT (100%) | GGT (100%) | CCA→P107 UCA→S107 | n.a. | n.a. | Tolerated (0.50) |
11,461,549 | Exon 3 (PGA) | GGT | GCT (13%) | GCT (6%) * | GGT (100%) | GCT (13%) | CCA→P107 CGA→R107 | n.a. | n.a. | Tolerated (0.9) |
11,461,525 | Exon 3 (PGA) |
AGG | AGG (100%) | AAG (100%) | AAG (100%) | AGG (100%) | UCC→S115 UUC→F115 |
n.a. | n.a. | Damaging (0.04) |
11,461,513 | Exon 3 (PGA) |
GGT | GGT (100%) | GAT (10%) * | GAT (11%) | GGT (100%) | CCA→P119 CUA→L119 |
n.a. | n.a. | Damaging (0.04) |
11,461,471 | Exon 3 (PGA) |
CCA | CCA (100%) | CTA (4%) * | CTA (14%) | CCA (100%) | GGU→G133 GAU→D133 |
n.a. | n.a. | Tolerated (0.46) |
11,461,421 | Exon 3 (PGA) |
GGG | GGG (100%) | AGG (5%) * | AGG (6%) * | AGG (100%) | CCC→P150 UCC→S150 |
n.a. | n.a. | Tolerated (0.18) |
11,461,420 | Exon 3 (PGA) |
GGG | GGG (100%) | GAG (11%) | GGG (100%) | GGG (100%) | CCC→P150 CUC→L150 |
n.a. | n.a. | Tolerated (0.1) |
11,461,412 | Exon 3 (PGA) |
CTT | CTT (100%) | TTT (14%) | CTT (100%) | CTT (100%) | GAA→E153 AAA→K153 |
n.a. | n.a. | Tolerated (0.85) |
11,461,319 | Exon 4 (P-D P32A) |
GGA | GGA (97%) | AGA (9%) * | AGA (11%) | GGA (100%) | CCU→P23 UCU→S23 |
n.a. | n.a. | Tolerated (0.55) |
11,461,309 | Exon 4 (P-D P32A) |
GGT | GGT (100%) | GGT (100%) | GAT (11%) | GGT (100%) | CCA→P26 CUA→L26 |
n.a. | n.a. | Damaging (0.01) |
11,461,229 | Exon 4 (P-D P32A) |
GGA | GGA (100%) | AGA (13%) | AGA (4%) * | GGA (100%) | CCU→P54 UCU→S54 |
n.a. | n.a. | Tolerated (0.13) |
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored. The variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.
Table 2.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score) |
---|---|---|---|---|---|---|---|---|---|---|
PRH2 (direct reading, chromosome 12) | ||||||||||
11,082,885 | Exon 2 (PRP-1) |
GTT | ATT (2%) * | ATT (12%) | ATT (4%) * | GTT (100%) | GUU→V12 AUU→I12 |
rs776898585 | A = 0% | NS |
11,082,894 | Exon 2 (PRP-1) |
GTA | GTA (100%) | ATA (12%) | ATA (10%) * | GTA (100%) | GUA→V15 AUA→I15 |
n.a. | n.a. | Tolerated (0.26) |
11,083,305 | Exon 3 (PRP-1) |
CCA | CCA (98%) | TCA (14%) | TCA (14%) | CCA (100%) | CCA→P33 UCA→S33 |
n.a. | n.a. | Tolerated (0.07) |
11,083,318 | Exon 3 (PRP-1) |
GGA | GGA (100%) | GAA (14%) | GGA (100%) | GGA (100%) | GGA→G37 GAA→E37 |
n.a. | n.a. | Tolerated (0.07) |
11,083,323 | Exon 3 (PRP-1) |
CAA | CAA (100%) | TAA (8%) * | TAA (12%) | CAA (100%) | CAA→Q39 UAA→stop |
n.a. | n.a. | Damaging due to stop |
11,083,426 | Exon 3 (PRP-1) |
GGA | GGA (100%) | GGA (100%) | GAA (11%) | GGA (100%) | GGA→G73 GAA→E73 |
n.a. | n.a. | Damaging (0.02) |
11,083,431 | Exon 3 (PRP-1) |
CCA | CCA (100%) | TCA (13%) | TCA (8%) * | TCA (6%) * | CCA→P75 UCA→S75 |
n.a. | n.a. | Tolerated (0.23) |
11,083,452 | Exon 3 (PRP-1) |
GGA | GGA (100%) | AGA (6%) * | AGA (14%) | GGA (100%) | GGA→G82 AGA→R82 |
n.a. | n.a. | Damaging (0.01) |
11,083,455 | Exon 3 (PRP-1) |
GGC | GGC (100%) | AGC (17%) | GGC (100%) | GGC (100%) | GGC→G83 AGC→S83 |
n.a. | n.a. | NS |
11,083,488 | Exon 3 (PRP-1) |
GGA | GGA (100%) | GGA (100%) | AGA (11%) | GGA (100%) | GGA→G94 AGA→R94 |
n.a. | n.a. | Damaging (0.04) |
11,083,531 | Exon 3 (PRP-1) |
AGG | AGG (100%) | AGG (100%) | AAG (18%) | AGG (100%) | AGG→R108 AAG→K108 |
n.a. | n.a. | NS |
11,083,536 | Exon 3 (PRP-1) |
CAA | CAA (100%) | TAA (11%) | CAA (100%) | CAA (100%) | CAA→Q110 UAA→stop |
n.a. | n.a. | NS |
11,083,545 | Exon 3 (PRP-1) |
CCC | CCC (100%) | TCC (12%) | TCC (6%) * | CCC (100%) | CCC→P113 UCC→S113 |
rs1289206423 | T = 0% | NS |
11,083,551 | Exon 3 (PRP-1) |
CAG | CAG (97%) | CAG (100%) | TAG (13%) | CAG (100%) | CAG→Q115 UAG→stop |
n.a. | n.a. | NS |
11,083,570 | Exon 3 (PRP-1) |
GGT | GGT (100%) | GAT (18%) | GGT (100%) | GGT (100%) | GGU→G121 GAU→D121 |
n.a. | n.a. | NS |
11,083,575 | Exon 3 (PRP-1) |
CCC | CCC (96%) | TCC (8%) * | TCC (15%) | CCC (100%) | CCC→P123 UCC→S123 |
n.a. | n.a. | NS |
11,083,581 | Exon 3 (PRP-1) |
CCT | CCT (100%) | TCT (20%) | TCT (8%) * | CCT (100%) | CCU→P125 UCU→S125 |
n.a. | n.a. | NS |
11,083,582 | Exon 3 (PRP-1) |
CCT | CCT (100%) | CTT (13%) | CTT (8%) * | CCT (100%) | CCU→P125 CUU→L125 |
n.a. | n.a. | NS |
11,083,605 | Exon 3 (PRP-1) |
CCA | CCA (100%) | TCA (11%) | CCA (100%) | CCA (100%) | CCA→P133 UCA→S133 |
rs1343870622 | T = 0% | NS |
11,083,618 | Exon 3 (PRP-1) |
GGG | GGG (100%) | GAG (11%) | GGG (100%) | GGG (100%) | GGG→G137 GAG→E137 |
n.a. | n.a. | NS |
11,083,635 | Exon 3 (PRP-1) |
CCT | CCT (100%) | CCT (100%) | TCT (16%) | CCT (100%) | CCU→P143 UCU→S143 |
n.a. | n.a. | NS |
11,083,636 | Exon 3 (PRP-1) |
CCT | CCT (100%) | CCT (100%) | CTT (11%) | CCT (100%) | CCU→P143 CUU→L143 |
n.a. | n.a. | NS |
11,083,663 | Exon 3 (C-term removal) |
TCT | TCT (100%) | TCT (100%) | TTT (17%) | TCT (100%) | UCU→S152(rem) UUU→F152(rem) |
rs746351335 | n.a. | NS |
HTN1 (direct reading, chromosome 4) | ||||||||||
70,920,165 | Exon 4 | CAT | CAT (100%) | TAT (2%) * | TAT (13%) | CAT (100%) | CAU→H15 UAU→Y15 |
n.a. | n.a. | Tolerated (0.37) |
70,921,215 | Exon 5 | GAA | GAA (100%) | AAA (3%) * | AAA (11%) | GAA (100%) | GAA→E16 AAA→K16 |
n.a. | n.a. | NS |
70,921,234 | Exon 5 | CGA | CAA (2%) * | CAA (58%) | CAA (3%) * | CGA (100%) | CGA→R32 CAA→Q32 |
rs375127098 | A = 0.014% | NS |
HTN3 (direct reading, chromosome 4) | ||||||||||
70,896,460 | Exon 2 (Signal) |
ATG | ATG (100%) | ATA (11%) | ATG (100%) | ATG (100%) | AUG→M0(sp) AUA→I0(sp) |
n.a. | n.a. | NS |
70,897,696 | Exon 3 (Signal) |
GGA | GGA (100%) | AGA (12%) | AGA (4%) * | GGA (100%) | GGA→G17(sp) AGA→R17(sp) |
rs1254624179 | n.a. | NS |
AMY1A (reverse reading, chromosome 1) | ||||||||||
104,238,248 | Exon 2 (Signal) |
ACC | ACC (100%) | ACC (100%) | ATC (15%) | ACC (100%) | UGG→W4(sp) UAG→stop |
n.a. | n.a. | Damaging due to stop |
104,238,189 | Exon 2 | GCT | GCT (100%) | ACT (13%) | ACT (20%) ** | GCT (100%) | CGA→R10 UGA→stop |
n.a. | n.a. | Damaging due to stop |
104,237,696 | Exon 3 | ACC | ACC (100%) | ACC (100%) | ATC (17%) | ACC (100%) | UGG→W59 UAG→stop |
n.a. | n.a. | Damaging due to stop |
104,237,685 | Exon 3 | GTT | GTT (100%) | GTT (100%) | ATT (14%) | GTT (100%) | CAA→Q63 UAA→stop |
n.a. | n.a. | Damaging due to stop |
104,237,626 | Exon 3 | TAC | TAC (100%) | TAC (100%) | TAT (15%) | TAC (100%) | AUG→M82 AUA→I82 |
n.a. | n.a. | Damaging (0.01) |
104,236,795 | Exon 4 | GCA | GCA (100%) | GCA (100%) | ACA (13%) | GCA (100%) | CGU→R92 UGU→C92 |
n.a. | n.a. | Damaging (0) |
104,236,666 | Exon 4 | CTA | CTA (100%) | CTA (100%) | TTA (11%) | CTA (100%) | GAU→D135 AAU→N135 |
n.a. | n.a. | Tolerated (0.08) |
104,236,654 | Exon 4 | CCA | CCA (100%) | TCA (5%) * | TCA (11%) | CCA (100%) | GGU→G139 AGU→S139 |
n.a. | n.a. | Tolerated (0.6) |
104,236,152 | Exon 5 | CAG | CAG (100%) | TAG (15%) | TAG (20%) | CAG (100%) | GUC→V157 AUC→I157 |
n.a. | n.a. | Tolerated (0.17) |
104,236,146 | Exon 5 | CTA | CTA (100%) | TTA (8%) * | TTA (12%) | CTA (100%) | GAU→D159 AAU→N159 |
n.a. | n.a. | Tolerated (1) |
104,236,139 | Exon 5 | GCA | GTA (4%) * | GTA (7%) * | GTA (12%) | GCA (100%) | CGU→R161 CAU→H161 |
n.a. | n.a. | Damaging (0.01) |
104,236,080 | Exon 5 | CTT | CTT (100%) | CTT (100%) | TTT (13%) | CTT (100%) | GAA→E181 AAA→K181 |
n.a. | n.a. | Tolerated (0.11) |
104,235,996 | Exon 5 | CGT | CGT (96%) | CGT (100%) | TGT (13%) | CGT (100%) | GCA→A209 ACA→T209 |
n.a. | n.a. | Tolerated (0.27) |
104,235,164 | Exon 6 | CTC | CTC (100%) | CTC (100%) | TTC (11%) | CTC (100%) | GAG→E240 AAG→K240 |
n.a. | n.a. | Damaging (0.01) |
104,235,148 | Exon 6 | TCA | TCA (100%) | TCA (100%) | TTA (18%) | TCA (100%) | AGU→S245 AAU→N245 |
n.a. | n.a. | Tolerated (0.52) |
104,235,083 | Exon 6 | GCG | ACG (3%) * | ACG (6%) * | ACG (12%) | GCG (100%) | CGC→R267 UGC→C267 |
n.a. | n.a. | Damaging (0) |
104,234,224 | Exon 7 | CCT | CCT (100%) | CCT (100%) | CTT (13%) | CCT (100%) | GGA→G281 GAA→E281 |
n.a. | n.a. | Damaging (0) |
104,234,218 | Exon 7 | CCA | CCA (100%) | CTA (13%) | CTA (15%) | CCA (100%) | GGU→G283 GAU→D283 |
n.a. | n.a. | Tolerated (0.25) |
104,234,129 | Exon 7 | GAA | GAA (100%) | AAA (13%) | GAA (100%) | GAA (100%) | CUU→L313 UUU→F313 |
n.a. | n.a. | Damaging (0) |
104,234,125 | Exon 7 | TGG | TGG (100%) | TAG (17%) | TGG (100%) | TGG (100%) | ACC→T314 AUC→I314 |
n.a. | n.a. | Damaging (0) |
104,233,978 | Exon 8 | GGA | GGA (100%) | AGA (13%) | AGA (11%) | GGA (100%) | CCU→P332 UCU→S332 |
n.a. | n.a. | Damaging (0.05) |
104,233,977 | Exon 8 | GGA | GGA (100%) | GAA (6%) * | GAA (11%) | GGA (100%) | CCU→P332 CUU→L332 |
n.a. | n.a. | Damaging (0) |
104,233,963 | Exon 8 | GCT | GCT (100%) | GCT (100%) | ACT (14%) | GCT (100%) | CGA→R337 UGA→stop |
rs19955486 | A = 0.08% | Damaging due to stop |
104,231,858 | Exon 9 | ACA | ACA (100%) | ACA (100%) | ATA (11%) | ACA (100%) | UGU→C378 UAU→Y378 |
n.a. | n.a. | Damaging (0) |
104,231,680 | Exon 10 | CAC | CAC (100%) | TAC (4%) * | TAC (20%) | CAC (100%) | GUG→V401 AUG→M401 |
n.a. | n.a. | Damaging (0) |
104,231,643 | Exon 10 | CCC | CCC (100%) | CTC (5%) * | CTC (11%) | CCC (100%) | GGG→G413 GAG→E413 |
n.a. | n.a. | Damaging (0.02) |
104,231,622 | Exon 10 | CCC | CCC (100%) | CCC (100%) | CTC (13%) | CCC (100%) | GGG→G420 GAG→E420 |
n.a. | n.a. | Tolerated (0.08) |
104,230,237 | Exon 11 | TGA | TGA (100%) | TGA (100%) | TAA (13%) | TGA (100%) | ACU→T442 AUU→I442 |
n.a. | n.a. | Damaging (0) |
104,230,129 | Exon 11 | AGA | AGA (100%) | AGA (100%) | AAA (13%) | AGA (100%) | UCU→S478 UUU→F478 |
n.a. | n.a. | Tolerated (0.62) |
STATH (direct reading, chromosome 4) | ||||||||||
70,866,583 | Exon 5 | GGG | GGG (100%) | AGG (13%) | AGG (3%) * | GGG (100%) | GGG→G17 AGG→R17 |
n.a. | n.a. | n.a. |
70,866,616 | Exon 5 | CCA | CCA (98%) | CCA (100%) | TCA (11%) | TCA (3%) * | CCA→P28 UCA→S28 |
n.a. | n.a. | n.a. |
70,866,626 | Exon 5 | CCA | CCA (100%) | CTA (15%) | CCA (100%) | CCA (96%) | CCA→P31 CUA→L31 |
n.a. | n.a. | n.a. |
70,866,628 | Exon 5 | CAA | CAA (100%) | TAA (15%) | CAA (100%) | CAA (100%) | CAA→Q32 UAA→stop |
n.a. | n.a. | Damaging due to stop |
SMR3B (direct reading, chromosome 4) | ||||||||||
71,255,405 | Exon 3 | AGG | AGG (100%) | AGG (100%) | AAG (12%) | AGG (100%) | AGG→R5 AAG→K5 |
rs777831757 | A = 0% | NS |
71,255,444 | Exon 3 | CCT | CCT (100%) | CTT (12%) | CTT (3%) * | CCT (100%) | CCU→P18 CUU→L18 |
n.a. | n.a. | NS |
71,255,495 | Exon 3 | GGG | GGG (100%) | GGG (94%) | GAG (17%) | GGG (100%) | GGG→G35 GAG→E35 |
n.a. | n.a. | NS |
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored.
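The footnote marks can be illustrated with a small Python sketch. It assumes that the percentage is the share of reads carrying the variant base at a locus and that "counts" refers to the read depth at that locus. The read numbers and the function name are hypothetical, since the tables report only the final percentages.

```python
def format_call(triplet, variant_reads, depth):
    """Format a table cell such as 'TAT (13%)' and append the footnote marks."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    percent = round(100 * variant_reads / depth)
    cell = f"{triplet} ({percent}%)"
    if percent <= 10:
        cell += " *"   # frequency <= 10%
    if depth < 10:
        cell += " **"  # assumed reading of 'counts < 10': fewer than 10 reads
    return cell


# Hypothetical read counts, chosen only to show the three kinds of cell.
print(format_call("TAT", 4, 31))  # TAT (13%)
print(format_call("AAA", 1, 33))  # AAA (3%) *
print(format_call("ACT", 1, 5))   # ACT (20%) **
```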
Table 3.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score) |
---|---|---|---|---|---|---|---|---|---|---|
CST1 (reverse reading, chromosome 20) | ||||||||||
23,731,494 | Exon 1 (Signal) | ATA | GTA (100%) | GTA (95%) | GTA (100%) | GTA (100%) | UAU→Y3(sp) CAU→H3(sp) |
rs6076122 | G = 71.1% | Tolerated (0.11) |
23,731,463 | Exon 1 (Signal) |
TGG | TAG (2%) * | TAG (13%) | TAG (5%) * | TGG (100%) | ACC→T13(sp) AUC→I13(sp) |
n.a. | n.a. | Tolerated (0.39) |
23,731,455 | Exon 1 (Signal) |
CAC | CAC (100%) | CAC (100%) | TAC (16%) | CAC (100%) | GUG→V16(sp) AUG→M16(sp) |
n.a. | n.a. | Tolerated (0.23) |
23,731,446 | Exon 1 (Signal) |
CGG | CGG (100%) | CGG (100%) | TGG (11%) | CGG (100%) | GCC→A19(sp) ACC→T19(sp) |
rs1425228752 | T = 0.001% | Damaging (0.01) |
23,731,439 | Exon 1 | TCG | TCG (100%) | TTG (6%) * | TTG (14%) | TCG (100%) | AGC→S2 AAC→N2 |
n.a. | n.a. | Tolerated (0.15) |
23,731,428 | Exon 1 | CTC | CTC (100%) | CTC (100%) | TTC (21%) | CTC (100%) | GAG→E6 AAG→K6 |
rs1292698911 | T = 0.0004% | Tolerated (0.66) |
23,731,394 | Exon 1 | CGT | CGT (100%) | CAT (13%) | CGT (100%) | CGT (100%) | GCA→A17 GUA→V17 |
n.a. | n.a. | Tolerated (0.25) |
23,731,344 | Exon 1 | CTC | TTC (3%) * | CTC (100%) | TTC (11%) | TTC (3%) * | GAG→E34 AAG→K34 |
rs368203290 | T = 0.008% | Tolerated (0.07) |
23,731,307 | Exon 1 | GCA | GCA (100%) | GTA (14%) | GCA (100%) | GTA (6%) * | CGU→R46 CAU→H46 |
rs758187154 | T = 0% | Damaging (0.01) |
23,731,281 | Exon 1 | GTT | GTT (100%) | GTT (100%) | ATT (13%) | GTT (100%) | CAA→Q55 UAA→stop |
n.a. | n.a. | Damaging due to stop |
23,729,759 | Exon 2 | CCC | CCC (100%) | CCC (100%) | CGC (26%) | CCC (100%) | GGG→G59 GCG→A59 |
n.a. | n.a. | Tolerated (1) |
23,729,700 | Exon 2 | GGG | GGG (100%) | GGG (100%) | AGG (11%) | GGG (100%) | CCC→P79 UCC→S79 |
n.a. | n.a. | Tolerated (0.38) |
23,729,699 | Exon 2 | GGG | GGG (100%) | GAG (3%) * | GAG (11%) | GGG (100%) | CCC→P79 CUC→L79 |
rs756782667 | A = 0% | Tolerated (0.06) |
23,729,687 | Exon 2 | TGG | TGG (100%) | TAG (16%) | TAG (4%) * | TGG (100%) | ACC→T83 AUC→I83 |
n.a. | n.a. | Damaging (0.02) |
23,728,503 | Exon 3 | GGG | GGG (100%) | AGG (11%) | AGG (3%) * | GGG (100%) | CCC→P106 UCC→S106 |
rs754531104 | A = 0.004% | Tolerated (0.09) |
23,728,494 | Exon 3 (Cys-SN) |
TTG | CTG (10%) * | CTG (11%) | CTG (14%) | CTG (4%) * | AAC→N109 GAC→D109 |
rs3188319 | C = 0.004% | Tolerated (1) |
23,728,490 | Exon 3 | TCT | TTT (2%) * | TTT (14%) | TCT (100%) | TCT (100%) | AGA→R110 AAA→K110 |
n.a. | n.a. | Tolerated (1) |
23,728,487 | Exon 3 | TCC | TCC (100%) | TTC (13%) | TTC (7%) * | TCC (100%) | AGG→R111 AAG→K111 |
rs3188320 | T = 0% | Tolerated (0.85) |
CST2 (reverse reading, chromosome 20) | ||||||||||
23,807,260 | Exon 1 (Signal) |
CGG | CGG (100%) | CGG (100%) | CAG (14%) | CGG (100%) | GCC→A12(sp) GUC→V12(sp) |
rs1411653443 | A = 0.007% | Damaging (0.02) |
23,807,257 | Exon 1 (Signal) |
TGG | TGG (100%) | TAG (14%) | TGG (100%) | TGG (100%) | ACC→T13(sp) AUC→I13(sp) |
n.a. | n.a. | Tolerated (0.43) |
23,807,245 | Exon 1 (Signal) |
CGG | CGG (100%) | CAG (14%) | CGG (100%) | CGG (100%) | GCC→A17(sp) GUC→V17(sp) |
n.a. | n.a. | Tolerated (0.1) |
23,807,231 | Exon 1 | GGG | GGG (100%) | AGG (14%) | AGG (8%) * | GGG (100%) | CCC→P3 UCC→S3 |
n.a. | n.a. | Tolerated (1) |
23,807,162 | Exon 1 | GCA | ACA (95%) | ACA (100%) | ACA (100%) | ACA (8%) * | CGU→R26 UGU→C26 |
rs111349461 | A = 0.06% | Damaging (0.05) |
23,807,138 | Exon 1 | CTC | TTC (3%) * | TTC (12%) | TTC (6%) * | CTC (100%) | GAG→E34 AAG→K34 |
rs541427772 | T = 0.017% | Tolerated (0.07) |
23,807,102 | Exon 1 | GCG | ACG (3%) * | GCG (100%) | ACG (11%) | GCG (100%) | CGC→R46 UGC→C46 |
rs112783512 | A = 0.019% | Tolerated (0.07) |
23,807,093 | Exon 1 | GCC | GCC (100%) | ACC (4%) | ACC (20%) | GCC (100%) | CGG→R49 UGG→W49 |
rs55860552 | A = 0.12% | Damaging (0) |
23,807,084 | Exon 1 | GCT | GCT (100%) | ACT (5%) * | ACT (15%) | GCT (100%) | CGA→R52 UGA→stop |
rs568411970 | A = 0% | Damaging due to stop |
23,807,077 | Exon 1 | TCC | TCC (100%) | TCC (100%) | TTC (13%) | TCC (100%) | AGG→R54 AAG→K54 |
n.a. | n.a. | Tolerated (0.34) |
23,807,075 | Exon 1 | CTC | CTC (100%) | TTC (12%) | TTC (12%) | CTC (100%) | GAG→E55 AAG→K55 |
n.a. | n.a. | Tolerated (1) |
23,805,930 | Exon 2 | TAT | CAT (7%) * | CAT (5%) * | CAT (14%) | CAT (4%) * | AUA→I67 GUA→V67 |
rs199856966 | C = 0.004% | Tolerated (1) |
23,805,917 | Exon 2 | GCT | GTT (2%) * | GTT (13%) | GTT (5%) * | GTT (2%) * | CGA→R71 CAA→Q71 |
rs150428155 | T = 0.008% | Damaging (0.01) |
23,805,878 | Exon 2 | ACA | ACA (100%) | ACA (97%) | ATA (14%) | ACA (100%) | UGU→C84 UAU→Y84 |
n.a. | n.a. | Damaging (0) |
23,805,875 | Exon 2 | CGG | CGG (100%) | CAG (15%) | CAG (2%) * | CGG (100%) | GCC→A85 GUC→V85 |
n.a. | n.a. | Tolerated (0.06) |
23,804,730 | Exon 3 | ACG | ACG (100%) | ATG (7%) * | ATG (11%) | ACG (100%) | UGC→C98 UAC→Y98 |
n.a. | n.a. | Damaging (0) |
23,804,702 | Exon 3 | ACC | ACC (100%) | ACT (12%) | ACC (100%) | ACC (100%) | UGG→W107 UGA→stop |
rs1380420803 | n.a. | Damaging due to stop |
23,804,691 | Exon 3 | TAC | TCC (13%) | TCC (10%) * | TCC (9%) * | TAC (100%) | AUG→M111 AGG→R111 |
rs202150666 | C = 0.01% | Tolerated (0.31) |
CST3 (reverse reading, chromosome 20) | ||||||||||
23,618,472 | Exon 1 (Signal) |
GAG | GAG (100%) | AAG (8%) * | AAG (15%) | GAG (100%) | CUC→L8(sp) UUC→F8(sp) |
rs1285248919 | n.a. | Damaging (0) |
23,618,433 | Exon 1 | GGG | GGG (100%) | GGG (100%) | AGG (13%) | GGG (100%) ** | CCC→P22(sp) UCC→S22(sp) |
n.a. | n.a. | Tolerated (0.5) |
23,618,370 | Exon 1 | CAC | CAC (100%) | CAC (100%) | TAC (13%) | CAC (100%) | GUG→V18 AUG→M18 |
n.a. | n.a. | Tolerated (0.11) |
23,618,358 | Exon 1 | CCA | CCA (100%) | TCA (22%) | TCA (4%) * | CCA (100%) | GGU→G22 AGU→S22 |
n.a. | n.a. | Tolerated (0.48) |
23,618,357 | Exon 1 | CCA | CCA (100%) | CTA (11%) | CCA (100%) | CCA (100%) | GGU→G22 GAU→D22 |
n.a. | n.a. | Tolerated (0.56) |
23,618,295 | Exon 1 | GTG | GTG (100%) | GTG (100%) | ATG (13%) | GTG (100%) | CAC→H43 UAC→Y43 |
n.a. | n.a. | Tolerated (1) |
23,615,994 | Exon 2 | CCC | CTC (3%) * | CCC (100%) | CTC (13%) | CCC (100%) | GGG→G59 GAG→E59 |
n.a. | n.a. | Damaging (0.01) |
23,614,564 | Exon 3 | GTC | GTC (100%) | GTC (100%) | ATC (13%) | GTC (100%) | CAG→Q118 UAG→stop |
n.a. | n.a. | Damaging due to stop |
CST4 (reverse reading, chromosome 20) | ||||||||||
23,669,566 | Exon 1 (Signal) |
TGG | TGG (100%) | TAG (7%) * | TAG (11%) | TGG (100%) | ACC→T13(sp) AUC→I13(sp) |
rs770415022 | n.a. | Tolerated (0.37) |
23,669,561 | Exon 1 (Signal) |
CGA | CGA (100%) | CGA (100%) | CGA (100%) | AGA (100%) | GCU→A15(sp) UCU→S15(sp) |
n.a. | n.a. | Tolerated (0.39) |
23,669,539 | Exon 1 | AGG | AGG (100%) | AAG (5%) * | AAG (13%) | AGG (100%) | UCC→S3 UUC→F3 |
n.a. | n.a. | Tolerated (0.08) |
23,669,470 | Exon 1 | GCA | GCA (100%) | GTA (15%) | GCA (100%) | GTA (17%) | CGU→R26 CAU→H26 |
rs201273557 | T = 0.01% | Tolerated (0.08) |
23,669,462 | Exon 1 | GTG | GTG (100%) | GTG (100%) | ATG (18%) | GTG (100%) | CAC→H29 UAC→Y29 |
n.a. | n.a. | Tolerated (0.06) |
23,669,408 | Exon 1 | GGC | GGC (100%) | AGC (12%) | GGC (100%) | GGC (100%) | CCG→P47 UCG→S47 |
n.a. | n.a. | Tolerated (0.06) |
23,667,835 | Exon 2 | AAA | CAA (97%) | CAA (100%) | CAA (90%) | AAA (100%) | UUU→F58 GUU→V58 |
rs145608577 | C = 0.2% | Tolerated (1) |
23,667,828 | Exon 2 | CCC | CCC (100%) | CTC (18%) | CCC (100%) | CCC (100%) | GGG→G60 GAG→E60 |
rs144556333 | T = 0.007% | Damaging (0) |
23,667,826 | Exon 2 | CAC | CAC (100%) | TAC (10%) * | TAC (27%) | CAC (100%) | GUG→V61 AUG→M61 |
n.a. | n.a. | Tolerated (0.24) |
23,667,808 | Exon 2 | CAT | CAT (100%) | TAT (13%) | CAT (100%) | TAT (4%) * | GUA→V67 AUA→I67 |
rs774067751 | T = 0.007% | Tolerated (0.23) |
23,667,792 | Exon 2 | TGG | TGG (100%) | TAG (13%) | TGG (100%) | TGG (100%) | ACC→T72 AUC→I72 |
n.a. | n.a. | Damaging (0) |
23,667,783 | Exon 2 | TGG | TGG (100%) | TGG (95%) | TAG (15%) | TGG (100%) | ACC→T75 AUC→I75 |
rs760057501 | A = 0% | Damaging (0.01) |
23,666,565 | Exon 3 | TAC | TCC (88%) | TCC (14%) | TCC (80%) | TAC (100%) | AUG→M111 AGG→R111 |
rs779547810 | C = 0% | Tolerated (0.87) |
CST5 (reverse reading, chromosome 20) | ||||||||||
23,860,243 | Exon 1 | AGC | AAC (3%) * | AGC (100%) | AAC (11%) | AAC (5%) * | UCG→S4 UUG→L4 |
rs145031249 | A = 0.011% | Tolerated (0.27) |
23,860,211 | Exon 1 | GTA | GTA (100%) | GTA (100%) | ATA (12%) | GTA (100%) | CAU→H15 UAU→Y15 |
n.a. | n.a. | Tolerated (1) |
23,860,199 | Exon 1 | GAG | GAG (100%) | AAG (11%) | GAG (100%) | GAG (100%) | CUC→L19 UUC→F19 |
rs370924959 | A = 0% | Tolerated (0.66) |
23,860,178 | Exon 1 | ACA | GCA (93%) | GCA (100%) | GCA (95%) | GCA (100%) | UGU→C26 CGU→R26 |
rs1799841 | G = 43.2% | Tolerated (1) |
23,860,174 | Exon 1 | CGG | CGG (100%) | CGG (100%) | CAG (11%) | CGG (100%) | GCC→A27 GUC→V27 |
n.a. | n.a. | Tolerated (0.18) |
23,860,130 | Exon 1 | CTA | CTA (100%) | CTA (100%) | TTA (14%) | CTA (100%) | GAU→D42 AAU→N42 |
rs1257216384 | n.a. | Tolerated (0.11) |
23,860,093 | Exon 1 | CGG | CGG (100%) | CGG (100%) | CAG (11%) | CGG (100%) | GCC→A54 GUC→V54 |
n.a. | n.a. | Tolerated (0.11) |
23,858,200 | Exon 2 | TGG | TGG (100%) | TAG (22%) | TGG (100%) | TGG (100%) | ACC→T76 AUC→I76 |
rs41282292 | A = 0.061% | Damaging (0) |
CSTA (direct reading, chromosome 3) | ||||||||||
122,044,197 | Exon 1 | GTT | GTT (100%) | ATT (11%) | GTT (100%) | GTT (100%) | GUU→V20 AUU→I20 |
rs778366890 | A = 0% | Tolerated (0.23) |
122,056,400 | Exon 2 | CCA | CCA (100%) | CCA (100%) | TCA (12%) | CCA (100%) | CCA→P25 UCA→S25 |
n.a. | n.a. | Tolerated (0.74) |
122,060,361 | Exon 3 | CTT | CTT (100%) | CTT (100%) | TTT (16%) | CTT (100%) | CUU→L82 UUU→F82 |
n.a. | n.a. | Damaging (0) |
122,060,373 | Exon 3 | CAG | CAG (100%) | CAG (100%) | TAG (12%) | CAG (100%) | CAG→Q86 UAG→stop |
n.a. | n.a. | Damaging due to stop |
CSTB (reverse reading, chromosome 21) | ||||||||||
45,194,562 | Exon 2 | CGC | TGC (2%) * | TGC (11%) | CGC (100%) | CGC (100%) | GCG→A49 ACG→T49 |
rs559906825 | T = 0.007% | Damaging (0) |
45,194,138 | Exon 3 | TGG | TGG (98%) | TCG (13%) | TGG (95%) | TGG (100%) | ACC→T81 AGC→S81 |
n.a. | n.a. | Tolerated (0.65) |
45,194,132 | Exon 3 | AGA | AGA (100%) | AGA (100%) | AAA (15%) | AGA (100%) | UCU→S83 UUU→F83 |
n.a. | n.a. | Tolerated (0.1) |
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available. The variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.
In the following subsections, the results are detailed one locus at a time. Note that, given the extreme structural heterogeneity of the tested genes, which have multiple alleles of different lengths, the nucleotide variations are indicated according to their genomic coordinates (see Section 4 for details).
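As a reading aid for the Codon→Amino Acid column of the tables, the following minimal Python sketch shows how a genomic triplet could be mapped to the listed codon. It assumes, from the table rows themselves (for example, GGA listed beside CCU for a reverse-reading gene), that triplets of reverse-reading loci are complemented base by base without reordering, whereas triplets of direct-reading loci only have T replaced by U. The function names are illustrative and not part of the original analysis.

```python
BASES = "UCAG"
# Standard genetic code, ordered by first, second, and third base (U, C, A, G).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT_TO_RNA = str.maketrans("ACGT", "UGCA")


def codon_from_triplet(triplet, reverse_reading):
    """Return the mRNA codon for a genomic triplet as listed in the tables."""
    if reverse_reading:
        # Assumption: base-by-base complement, order kept as in the table rows.
        return triplet.translate(COMPLEMENT_TO_RNA)
    return triplet.replace("T", "U")


def amino_acid(triplet, reverse_reading):
    """Translate a genomic triplet; '*' marks a stop codon."""
    return CODON_TABLE[codon_from_triplet(triplet, reverse_reading)]


# PRB3 (reverse reading): GGA -> CCU (P), variant AGA -> UCU (S)
print(codon_from_triplet("GGA", True), amino_acid("GGA", True))    # CCU P
print(codon_from_triplet("AGA", True), amino_acid("AGA", True))    # UCU S
# HTN1 (direct reading): CGA (R), variant CAA (Q)
print(amino_acid("CGA", False), amino_acid("CAA", False))          # R Q
```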
2.1. Nucleotide Variations in the Gene Loci Encoding Basic Proline-Rich Proteins
2.1.1. PRB1 Gene
The genomic alignment allowed us to identify 130 nucleotide changes in the PRB1 gene in ancient hominins compared with modern humans (Table 1 and Table S1). Fifty-five of these were detected within coding exons and included ten synonymous and forty-five nonsynonymous nucleotide substitutions. Among the nonsynonymous substitutions, 20 corresponded to SNPs annotated in modern humans (Table 1). SIFT prediction indicated that 46% of these missense variants have a significant effect on protein function, based on sequence homology and the physical properties of the amino acids involved (Table 1). The T-C transition at position 11,506,774, which underlies the difference between R72 in modern humans and Q in the archaic II-2 isoform (Table 1 and Figure 4a), may have an impact on post-translational protein processing. Indeed, the modern human R72 residue is part of the R72SPR75 consensus sequence recognized by the pro-protein convertase responsible for the cleavage between the II-2 and P-E peptides. We may therefore hypothesize that in archaic species, the PRB1-encoded protein was a fused peptide spanning 136 amino acids that integrates the modern II-2 and P-E peptides (Table 1 and Figure 4a). The sequences of the peptides and the resulting putative archaic protein primary structure (named PRB-1 salivary archaic fusion 1 peptide, PRB-1 SAF-1) are reported in Figure 4a. The remaining seventy-five nucleotide changes identified in the PRB1 locus fell within noncoding regions, namely fifty-four in introns, six in upstream regions, one in the 5′UTR, one in the 3′UTR, and thirteen in downstream regions (Table S1).
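The effect of losing one arginine from a convertase consensus sequence can be illustrated with a short Python sketch. It assumes that the consensus can be approximated as R-X-X-R, a simplification suggested by the R72SPR75 site named above. The two peptide fragments are hypothetical toy strings, not the actual II-2 sequence, and the function name is illustrative.

```python
import re

# Assumed approximation of the convertase consensus: R, any two residues, R.
# The lookahead lets overlapping sites be reported as well.
SITE = re.compile(r"(?=(R..R))")


def convertase_sites(peptide):
    """Return (1-based start position, motif) for every R-X-X-R match."""
    return [(m.start() + 1, m.group(1)) for m in SITE.finditer(peptide)]


modern_like = "GPPRSPRGPP"   # hypothetical fragment carrying an intact R-S-P-R site
archaic_like = "GPPQSPRGPP"  # same fragment with the first R replaced by Q

print(convertase_sites(modern_like))   # [(4, 'RSPR')]
print(convertase_sites(archaic_like))  # []
```

With the first arginine replaced, no site is found, which corresponds to the hypothesis that the two peptides would remain fused.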
2.1.2. PRB2 Gene
One hundred and thirty-six nucleotide substitutions were detected in the PRB2 locus in ancient hominins compared with modern humans (Table 1 and Table S2). Thirty-seven of these were identified in introns, ten in upstream regions, one in the 3′UTR, and eight in downstream regions. The remaining eighty variations were found in coding regions, namely two in exon 1 (corresponding to the signal peptide), one in exon 2, and the remainder in exon 3 (Table 1 and Table S2). Of note, the modern human sequence reported in the UniProtKB database corresponds to the L allele coding for the common isoforms IB-8a Con1- and P-H S1, the former with a P residue instead of an S at position 100, the latter with an S residue instead of an A at position 1 [8]. Of the 80 sequence variants found in coding exons, 64 were nonsynonymous. SIFT prediction indicated that 19% of these missense variants have a significant effect on protein function, based on sequence homology and the physical properties of the amino acids involved (Table 1). Twenty-six of the sixty-four nonsynonymous substitutions were annotated as common variants (SNPs) in modern humans (Table 1). In particular, two changes occurring at 11,546,686 bp and 11,546,677 bp caused the substitution of R93 and R96 with Q within the ancient IB-1 isoform. The two archaic residues were found in all four species (Table 1). This implies that, in archaic hominins, the R93SPR96 consensus sequence recognized by the pro-protein convertase lacked its two key arginine residues, thus preventing the post-translational cleavage. Therefore, ancient saliva should have featured a protein deriving from the fusion of the IB-1 and P-J peptides, spanning 157 amino acids (named the PRB-2 salivary archaic fusion 2 peptide, PRB-2 SAF-2, in Figure 4b).
Conversely, the presence of a C nucleotide at 11,546,314 bp in Neanderthals and Denisovans, instead of the T found in modern humans, introduced an R in place of Q59 (Q217 in the pro-protein) of the IB-8a Con1- isoform. This archaic primary structure would then include an additional pro-protein convertase consensus sequence, R59SAR62, causing the cleavage of the IB-8a Con1- protein into two smaller peptides. Given the removal of the C-terminal arginine residue that is usually observed for almost all bPRPs, both peptides should be 61 amino acid residues long (Figure 4c). We named these putative archaic PRB-2 variants the PRB-2 salivary archaic cleavage 1 peptide (PRB-2 SAC-1 peptide) and the PRB-2 salivary archaic cleavage 2 peptide (PRB-2 SAC-2 peptide); both are shown in Figure 4c. Of note, the sequence of the PRB-2 SAC-1 peptide corresponds exactly to that of the modern human P-J peptide, except for an alanine (A61) instead of a serine as the last amino acid residue. The sequence of the PRB-2 SAC-2 peptide corresponds exactly to that of the modern human P-F peptide, except for a serine (S61) instead of an alanine as the last amino acid residue (Figure 4d and [9]). The variation at 11,546,395 bp indicated that in archaic hominins, the P31 residue (P189 of the pro-protein) was replaced by a Q in IB-8a Con1-; this change probably has a deleterious effect on protein function, as predicted by SIFT analysis.
The protein name, the modifications with respect to modern humans, and the corresponding frequencies found in the Altai, Chagyrskaya, and Vindija Neanderthals and/or Denisovans are reported for each archaic protein. The positions of the substitutions are also marked in the primary sequences (residues in bold characters). q: pyroglutamic acid; S: phosphorylated serine.
2.1.3. PRB3 Gene
We identified 163 nucleotide variations in the PRB3 locus in ancient hominins compared with modern humans (Table 1 and Table S3). Of these, 53 were detected in coding regions and 110 in noncoding regions (71 within introns, 14 in upstream regions, 2 in the 3′UTR, and 23 in downstream regions; Table S3). The archaic sequences were compared with the Gl-2 (or PRP-3M) allele of modern humans. Fourteen variations identified in coding exons were synonymous, whereas thirty-nine were missense variants. Twelve of the thirty-nine nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 1). The PRP3 protein contains eight N-glycosylated Asn residues falling into the NXS/pS sequon; among the substitutions found in the PRB3 gene, only the one at position 11,420,728 falls within this consensus sequence (S136F), and SIFT predicted it to be deleterious for protein function (Table 1). Overall, 37.5% of the substitutions were predicted to be deleterious for protein function (Table 1). The noncoding variant found at position 11,420,458 could affect the splicing of PRB3 transcripts in ancient hominins, since it falls within the GU consensus site (splice donor site) at the 5′ end of intron 3 (Table S3).
2.1.4. PRB4 Gene
For the PRB4 locus, we detected 129 nucleotide substitutions in ancient hominins compared with modern humans (Table 1 and Table S4). Of these, 27 were found in coding exons, including 4 synonymous and 23 nonsynonymous (Table 1), and 102 in noncoding regions (Table S4). The archaic sequence was compared with the small allele of the modern human locus coding for P-D peptides and glycosylated protein A (PGA). The 23 missense variants were all found within coding regions for the glycosylated protein A, while none of the identified variations would affect the P-D variant (see Table 1 for details). These variations had no consequence on the consensus sequence of the pro-protein convertase or on the sequence of the glycosylation sites. Interestingly, all the archaic sequences code for the P-D P32A variant. Overall, seven of the twenty-three nonsynonymous substitutions in the PRB4 locus corresponded to annotated common variants in modern humans, and only 13% were predicted to be deleterious for protein function (Table 1).
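The 13% figure can be reproduced from the 23 PRB4 rows of Table 1. The Python sketch below assumes a cutoff of 0.05, with scores at or below it labelled as damaging, which matches the labels shown in the tables (0.05 is listed as Damaging and 0.06 as Tolerated). The variable and function names are illustrative.

```python
SIFT_CUTOFF = 0.05  # assumption: scores <= 0.05 are labelled "Damaging"

# SIFT scores of the 23 PRB4 missense variants, in the order of Table 1.
PRB4_SCORES = [
    0.83, 0.57, 0.06, 0.53, 1.0, 0.45, 0.12, 0.32, 0.25, 0.81, 0.59, 0.84,
    0.50, 0.9, 0.04, 0.04, 0.46, 0.18, 0.1, 0.85, 0.55, 0.01, 0.13,
]


def classify(score):
    """Label a SIFT score using the assumed cutoff."""
    return "Damaging" if score <= SIFT_CUTOFF else "Tolerated"


damaging = sum(classify(score) == "Damaging" for score in PRB4_SCORES)
share = 100 * damaging / len(PRB4_SCORES)
print(f"{damaging}/{len(PRB4_SCORES)} damaging ({share:.0f}%)")  # 3/23 damaging (13%)
```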
2.2. Nucleotide Variations in the Gene Locus Encoding the a-PRP
One hundred and sixty-three nucleotide substitutions have been annotated in the PRH2 gene locus in ancient hominins compared with modern humans (Table 2 and Table S5), of which thirty fell within coding exons, including seven synonymous and twenty-three nonsynonymous. Four of the latter corresponded to annotated common variants in modern humans (Table 2). Sixty-six nucleotide substitutions were identified in introns, seven in upstream regions, three in the 5′UTR, forty-nine in the 3′UTR, and eight in downstream regions (Table S5). The archaic DNA sequences reported in the sequence database used in this study (see Section 4 for details) corresponded to the PRP-1 protein of the PRH2 alleles, thus having an N50 residue. The nucleotide variations reported in Table 1 generated two synonymous substitutions at D6 and P135.
2.3. Nucleotide Variations in the HTN Gene Loci
A total of 188 and 175 nucleotide substitutions were identified in the HTN1 and HTN3 genes, respectively (Table 2, Tables S6 and S7). The nucleotide substitutions reported in HTN1 are distributed as follows: 4 fell within coding exons, including 1 synonymous and 3 nonsynonymous, and 184 fell in noncoding regions, including 146 within introns, 6 in upstream regions, 3 in the 5′UTR, 9 in the 3′UTR, and 20 in downstream regions (Table 2 and Table S6). Regarding HTN3, 3 nucleotide changes were reported in coding exons (1 synonymous and 2 nonsynonymous), whereas 172 fell in noncoding regions (145 within introns, 9 in upstream regions, 3 in the 5′UTR, 5 in the 3′UTR, and 10 in downstream regions) (Table 2 and Table S7). One missense variant for HTN1 and one for HTN3 found in ancient hominins were also reported as SNPs in modern humans (Table 2).
2.4. Nucleotide Variations in the AMY1A Gene Locus
Two hundred and twelve nucleotide substitutions have been annotated in the AMY1A gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S8). Forty changes fell within coding exons, of which eleven were synonymous and twenty-nine were nonsynonymous. Only one of the nonsynonymous substitutions corresponded to an annotated common variant in modern humans (Table 2). One hundred forty-four nucleotide substitutions were identified in introns, four in upstream regions, nine in the 5′UTR, and fifteen in downstream regions (Table S8).
2.5. Nucleotide Variations in the STATH and P-B Gene Loci
One hundred fifty-nine nucleotide substitutions have been annotated in the STATH gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S9). Six changes fell within coding exons, of which two were synonymous and four were nonsynonymous (Table 2). One hundred fifty-three nucleotide substitutions were detected in introns and regulatory regions (Table S9).
One hundred eighty-seven nucleotide substitutions were detected in the SMR3B locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S10). Of these, 5 were found in coding exons (2 synonymous and 3 nonsynonymous), 155 were in introns, 3 in upstream regions, 3 in 5′UTRs, 10 in 3′UTR, and 11 in downstream regions (Table 2 and Table S10). One missense variant was reported as an SNP in modern humans (Table S10).
2.6. Nucleotide Variations in the CST Gene Loci
2.6.1. CST1 Gene
We have annotated 227 nucleotide substitutions in the CST1 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S11). Of these, 128 were found in introns, 19 in upstream regions, 7 in the 5′UTR, 12 in the 3′UTR, 32 in downstream regions (Table S11), and 29 in coding regions, including 11 synonymous and 18 missense variations (Table 3). The nucleotide variation at 23,731,494 bp caused the substitution of the Y3(sp) with an H, affecting the third amino acid residue of the signal peptide. This should not impact the function of the protein, although it may have affected the speed of protein translation and/or the correct processing and trafficking. Four substitutions out of eighteen could have a negative impact on protein function, as predicted by SIFT. Overall, nine nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table S11).
2.6.2. CST2 Gene
We detected 167 nucleotide changes in the CST2 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S12). Of these, 103 were in introns, 15 in upstream regions, 8 in the 3′UTR, 17 in downstream noncoding regions (Table S12), and 24 in coding regions (Table 2). The latter included six synonymous and eighteen nonsynonymous variations, eight of which were predicted to have a deleterious effect on protein function (SIFT score < 0.05). Ten out of the eighteen nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 2). Interestingly, the nucleotide change at 23,804,691 bp fell into the canonical DNA-binding motif for the NR3C1 (nuclear receptor subfamily 3 group C member 1) transcription factor, as reported in the UCSC Genome Browser. This variation could affect the affinity of this factor for the regulatory region and thus the expression of the CST2 gene.
2.6.3. CST3 Gene
In the CST3 locus, we have identified 452 nucleotide variations in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S13). Of these, 329 were in introns, 18 in upstream regions, 9 in 5′UTR, 50 in 3′UTR, 29 in downstream noncoding regions (Table S13), and 17 in coding regions, including 9 synonymous and 8 nonsynonymous variations (Table 2). One nucleotide substitution corresponded to an annotated common variant in modern humans (Table 2).
2.6.4. CST4 Gene
Two hundred and sixty-three nucleotide substitutions were detected in the CST4 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S14). These included 130 changes in introns, 42 in upstream regions, 4 in the 5′UTR, 20 in the 3′UTR, 43 in downstream noncoding regions (Table S14), and 24 in coding exons (11 synonymous and 13 missense variations; Table 3). Seven variations in this locus corresponded to annotated common variants in modern humans (Table 3). The change at 23,666,565 bp caused the substitution of the M111 with an R in the corresponding Neanderthal peptide structure. Even if it causes the substitution of an uncharged amino acid with a charged one, the SIFT analysis did not predict a deleterious effect of this variant on the function of the archaic protein compared to modern humans.
2.6.5. CST5 Gene
One hundred ninety-three nucleotide substitutions were annotated in the CST5 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S15). Sixteen changes were mapped in the coding region, including eight synonymous and eight nonsynonymous (Table 3). Of the 177 nucleotide substitutions located in noncoding regions, 118 were in introns, 24 in upstream regions, 18 in the 3′UTR, and 17 in downstream regions (Table S15). The exonic nucleotide variation generated the codon for an R in both archaic hominins instead of C26. This represented a common variant also found in modern humans (rs1799841). The cystatin D variant with the R26 is frequently detected in the soluble fraction of human saliva, probably because it is more soluble than the C26-containing isoform [19]. Moreover, the opposite substitution (R26C) was detectable with high frequency at the same amino acid residue in the cystatin SA gene of Neanderthals. Five out of the eight nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table 3).
2.6.6. CSTA and CSTB Genes
Finally, 394 and 134 nucleotide substitutions were identified in the CSTA and CSTB loci, respectively, in Neanderthals and Denisovans compared with modern humans (Table 3, Tables S16 and S17). The nucleotide substitutions reported in CSTA were distributed as follows: 6 fell in coding exons, including 2 synonymous and 4 nonsynonymous, and 388 fell in noncoding regions, including 346 in introns, 10 in upstream regions, 5 in the 5′UTR, 10 in the 3′UTR, and 17 in downstream regions (Table 3 and Table S16). Among these changes, the variation at positions 122,044,848-122,044,850 of CSTA was a CTT deletion, observed exclusively in Denisovans (Table S16). This fell within the canonical DNA-binding motif for the Spi-1 proto-oncogene transcription factor (source: UCSC Genome Browser); therefore, it could affect the expression of the CSTA gene in the ancient hominin. Regarding CSTB, 9 nucleotide changes were reported in coding exons (6 synonymous and 3 nonsynonymous), whereas 125 fell in noncoding regions (55 within introns, 27 in upstream regions, 5 in the 5′UTR, 15 in the 3′UTR, and 23 in downstream regions) (Table 3 and Table S17). One missense variant for CSTA and one for CSTB found in ancient hominins were also reported as SNPs in modern humans (Table 3).
2.7. Geographic Distribution of Genetic Variants in Modern Humans
Of note, the salivary protein genes tested proved to be polymorphic in humans. The frequency of specific coding nonsynonymous genetic variants also differed between populations, as reported in the Geography of Genetic Variants Browser (https://popgen.uchicago.edu/ggv; accessed on 22 July 2022) (File S1) [29]. In particular, 20 genetic variants (three in the PRB1 gene, six in PRB2, one in PRB3, two in CST1, four in CST2, three in CST5, and one in CSTB; highlighted in red in Table 1, Table 2 and Table 3) displayed a different geographic distribution; specifically, rs554211998, rs201994479, rs34305575, rs6076122, rs111349461, rs55860552, rs568411970, rs145031249, and rs1799841 showed a peculiar allele frequency in African populations (File S1).
2.8. Evolutionary Pressure of Salivary Protein Genes
To investigate whether any of the salivary protein genes studied showed evidence of positive selection in anatomically modern humans, we performed a population branch statistic (PBS) analysis [30]. Our results showed no signal of recent selective pressure for the genes analysed, suggesting that variants in these genes did not affect individual fitness (File S2). We also implemented Tajima’s test as an additional evolutionary analysis to evaluate the selective effects of each observed substitution. Tajima’s D values showed comparable variance among the genes analysed. The D values were mostly slightly negative or positive (ranging from −0.698 to 3.359) (File S3), confirming the absence of a selective sweep [31], which was already suggested by the PBS test.
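The text does not state which software was used to compute Tajima’s D, so the following is only a minimal sketch of the standard estimator (Tajima, 1989), included to make the meaning of the reported D values concrete. The function name `tajimas_d` and the input format (a list of aligned haplotype strings) are assumptions made for illustration and are not taken from the study.

```python
import math
from itertools import combinations
from typing import Optional, Sequence


def tajimas_d(haplotypes: Sequence[str]) -> Optional[float]:
    """Tajima's D for aligned haplotypes (hypothetical helper, illustration only).

    Returns None when D is undefined: fewer than 4 sequences, no segregating
    sites, or a non-positive variance term.
    """
    n = len(haplotypes)
    if n < 4:
        return None  # the variance constants are zero for n < 4
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")

    # S: number of segregating (polymorphic) sites
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    # D compares pi with Watterson's estimator S / a1
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

In this sketch, an excess of rare alleles (as expected after a selective sweep) pulls D below zero, whereas an excess of intermediate-frequency alleles pushes it above zero.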
Comparisons of Neanderthal and Denisovan genomes with those of modern humans have shown evidence of ancient interbreeding [32], which led to an uneven distribution of introgressed chromosomal regions because of natural selection [33]. To investigate whether some of the salivary protein gene variants studied might be due to interbreeding, we used two databases of archaic introgression based on a comparison with modern genomes from the 1000 Genomes Project [34] and the Estonian Biocentre collection [35], which also reported data from previous studies [33,36]. However, the considered genes were not encompassed within the chromosomal regions highlighted in the databases and, therefore, did not show an apparent sign of adaptative introgression from archaic hominins.
3. Discussion
The different dietary habits of archaic hominins and modern humans have been mostly attributed to the changes in the availability of natural food resources, the oral bacterial community (microbiota), and climatic conditions [37,38]. A role for salivary proteins can also be inferred, as they are known to be implicated in the modulation of the microbiome of the oral cavity, the entire gastrointestinal tract, and taste perception [39]. aPRPs can promote the attachment of several important bacteria, such as Actinomyces viscosus, Bacteroides gingivalis, and some strains of Streptococcus mutans. Moreover, both aPRPs and statherin promote the colonization of oral surfaces by Porphyromonas gingivalis [40]. It was reported that the salivary proteins may modulate oral health and homeostasis, maintain a stable ecosystem, and inhibit the growth of cariogenic bacteria [41,42]. Recently, 258 salivary proteins were found to be differentially expressed between caries-free and caries-active children [43]. They are also involved in taste perception. In particular, the salivary bPRPs II-2 and Ps-1 contribute to bitter taste sensitivity [44]. Also, some salivary peptides belonging to the bPRPs and the histatin families can bind polyphenols in tannin-rich foods, thus evoking the typical astringent sensation [44]. Salivary proteins play an important role in affecting sweet [45], salt [46], and umami [47] tastes, along with fat, salt, and bitter acceptance [48,49]. Also, cystatins are supposed to affect taste perception, as lower salivary levels of these peptides may enhance proteolysis, which would affect the mucosal pellicle lining of the oral cavity, thereby increasing the accessibility of tastants to taste receptors [49]. Interestingly, most of these proteins have been shown to be modulated in pathological conditions, including tumors and inflammation, suggesting that they play a role as clinically relevant biomarkers [5].
Therefore, the hypothesis has been raised that the evolutionary changes that occurred in the structure of these proteins could be associated with the different dietary habits of archaic hominins. In this regard, mutations in different bitter taste receptor genes (namely TAS2R62, TAS2R64, and TAS2R38) and the masticatory myosin gene MYH16, along with the duplication of the salivary amylase gene AMY1 that has occurred in recent human evolution, have been associated with variations in taste sensitivity and the shift toward the food cooking habits of modern humans [50].
Based on this emerging background, in this study, we identified and inferred the functional consequences of the nucleotide substitutions fixed in the gene loci coding for the main salivary proteins in modern humans compared to ancient hominin species (Neanderthals and Denisovans).
By mapping over 3400 nucleotide substitutions, we have shown that the majority (87.7%) of the changes detectable in the genes expressing the most important salivary proteins (proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases) of modern humans, compared with Neanderthals and Denisovans, map within noncoding regions.
Quite unexpectedly, our data also showed the presence of nucleotide variations affecting the coding sequence of all 17 gene loci analysed. Overall, the frequency of coding variations in these genomic loci is far higher than the general rate found throughout the genome, since previous studies highlighted that relatively few amino acid changes have become fixed in recent human evolution [51,52]. To the best of our knowledge, this study provides the first description of coding nucleotide changes that occurred in salivary protein genes during the recent evolutionary divergence of modern humans from Neanderthals and Denisovans. Focusing on these missense variations, we hypothesized the possible functional effects they could have had on protein structure, processing, and function. Of the 307 missense changes found in the coding regions of the tested genes, 92 were predicted to have a potentially deleterious effect on protein function.
The changes identified in the PRB1 and PRB2 genes are worth particular attention and could be interpreted in light of the extant knowledge of the biology of the encoded proteins. As already mentioned, the PRB protein family is highly polymorphic and, despite being common to all mammals, the proteins belonging to this family feature significant structural differences among species. For instance, the peptides generated by the convertase cleavage span 50 to 90 amino acids in length in humans and 10 to 40 in pigs, with considerable variations in the peptide sequences [53]. Therefore, bPRPs appear to be non-conserved across species, probably because they are mostly implicated in taste perception and underwent a deep transformation during evolution due to the changing habits and habitats of the species [44]. Interestingly, our results showed that three nucleotide substitutions annotated in the archaic hominins’ PRB1 and PRB2 genes affect specific arginine residues within the consensus sequences of the polypeptide, which are recognized by the pro-protein convertases responsible for their cleavage. These changes could have determined the presence of fused proteins in the archaic hominins’ proteome. The putative “PRB1 salivary archaic fusion 1 peptide” and “PRB2 salivary archaic fusion 2 peptide” could possibly have been associated with additional and/or alternative functions able to influence the eating habits of extinct hominins. In addition, we have also identified a sequence change in the PRB2 gene that instead generates a new pro-protein convertase consensus sequence in the encoded peptide. As a result, ancient hominins could have expressed two smaller peptides, the “PRB2 salivary archaic cleavage 1 peptide” and the “PRB2 salivary archaic cleavage 2 peptide”, possibly exerting alternative functions, which deserve further functional studies.
The missense nucleotide substitutions annotated in the remaining salivary protein genes described in this study (aPRPs, histatins, amylases, statherin, P-B peptide, and cystatins) could be interpreted, at least in part, considering the putative changes that they can cause in post-translational protein processing, sorting, localization, and trafficking toward secretion. In addition, all the missense variations that introduce or remove a cysteine residue on the archaic cystatins, most likely affecting the conserved sequences involved in the protein-protein binding [53], could also influence protein function.
We also annotated the nucleotide variations fixed within the noncoding regions of the tested genes in modern humans, given that these could reasonably affect the expression levels of salivary proteins by changing the affinity of transcriptional regulators for promoter, enhancer, and/or silencer elements, or affect splicing by changing splice site consensus sequences, leading to the formation of alternative coding transcripts. Also, they could affect post-transcriptional regulation mechanisms, such as the binding of noncoding regulatory RNAs, leading to the varying protein types and amounts that emerged during recent evolution. Specifically, two nucleotide substitutions found in the CST2 and CSTA gene loci appear to fall within the canonical DNA-binding motifs for specific transcription factors, which could intervene in the modulation of their expression. We also annotated 216 changes in the 3′ untranslated regions in 16 of the 17 genes analysed (in all but AMY1A). These substitutions might instead condition the binding of specific microRNAs targeting salivary protein transcripts, modulating their stability and the translation process.
Lastly, 34.9% of the nonsynonymous nucleotide substitutions identified in this study appear to be frequent in the modern human genome, where they are annotated as single nucleotide polymorphisms (SNPs). In addition, some of these coding genetic variants display a different geographic distribution in humans. This observation reduces the evolutionary significance of such changes, which are to be considered in light of the polymorphic nature of these genomic loci. However, taken together, variants showing alternative nucleotide fixation in modern vs. archaic humans represent 7.3% of all the nucleotide substitutions reported in the study.
Also, our results do not suggest any significant evolutionary pressure or sign of adaptative introgression from archaic hominins on the tested genes.
4. Materials and Methods
4.1. Nucleotide Variants Annotation
In order to annotate all the nucleotide variants within the gene loci of the salivary proteins of interest, we compared modern human sequences with Altai Neanderthals (downloaded from http://cdna.eva.mpg.de/Neanderthal/altai/AltaiNeanderthal/bam/, accessed on 2 May 2020), Chagyrskaya Neanderthals (Index of/neandertal/Chagyrskaya/BAM (mpg.de), accessed on 9 December 2022), Vindija Neanderthals (Index of/neandertal/Vindija/bam/Pruefer_etal_2017/Vindija33.19 (mpg.de), accessed on 9 December 2022), and Denisova sequences (http://cdna.eva.mpg.de/denisova/alignments/, accessed on 2 May 2020) [54,55]. The fossil remains, aged between 50,000 and 30,000 years, come from two distinct geographical areas. The female Neanderthal sample from Vindija (Croatia), in the Western Balkans, yielded a 30× genome coverage [56]. The other samples came from two different sites in the Altai Mountains in Siberia (Russia): the genomic data of a female Neanderthal (at 52× coverage) [57] and a juvenile female Denisovan individual (at 30× coverage) [55] came from the Denisova cave, and another female sample came from the Chagyrskaya cave, located about 100 km westward, and yielded a genome of 27× coverage [58]. In particular, we aligned the sequences of modern humans and ancient hominins by means of the Integrative Genomics Viewer (IGV) tool (2.3.72 version) [59,60,61]. Note that the reference genomes annotated in this database are set on the hg19 genome assembly coordinates. To account for the possible damage and fragmentation to which the ancient hominin DNA was subjected, we annotated, for each gene of interest, all the nucleotide substitutions with a frequency greater than 10% and a minimum coverage of 10 counts in coding, noncoding, and regulatory sequences (i.e., 5′ and 3′ untranslated regions and flanking upstream and downstream regulatory regions).
Of note, the variant frequency indicated the percentage of reads carrying that substitution in ancient hominins, as reported by the IGV tool, considering the depth (coverage) of the reads displayed at each locus. For each tested gene, a region of approximately 500 bp upstream of the first exon and downstream of the last exon was also screened to annotate nucleotide substitutions within regulatory regions able to affect the gene expression rate. The precise hg19 genomic coordinates for each tested gene locus were as follows: PRB1 locus 11,509,000–11,504,200 on chromosome 12; PRB2 locus 11,549,000–11,544,000 on chromosome 12; PRB3 locus 11,423,140–11,418,300 on chromosome 12; PRB4 locus 11,463,900–11,459,500 on chromosome 12; PRH2 locus 11,081,500–11,087,950 on chromosome 12; HTN1 locus 70,915,750–70,925,000 on chromosome 4; HTN3 locus 70,893,670–70,902,700 on chromosome 4; AMY1A locus 104,239,500–104,229,500 on chromosome 1; STATH locus 70,861,200–70,868,790 on chromosome 4; SMR3B locus 71,248,550–71,256,400 on chromosome 4; CST1 locus 23,732,000–23,727,600 on chromosome 20; CST2 locus 23,807,800–23,803,900 on chromosome 20; CST3 locus 23,619,100–23,606,800 on chromosome 20; CST4 locus 23,670,200–23,665,700 on chromosome 20; CST5 locus 23,860,900–23,856,000 on chromosome 20; CSTA locus 122,043,600–122,061,300 on chromosome 3; and CSTB locus 45,196,800–45,193,000 on chromosome 21.
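The annotation thresholds described above (variant frequency greater than 10% and a minimum coverage of 10) can be illustrated with a short sketch. In the study the values were read from IGV; the function `passing_variants` and its input format (per-base read counts at one position) are hypothetical, and the sketch assumes that "coverage" refers to the total read depth at the position.

```python
from typing import Dict, List, Tuple

MIN_COVERAGE = 10      # minimum read depth at the position (threshold from the text)
MIN_FREQUENCY = 0.10   # variant frequency must be greater than 10% (from the text)


def passing_variants(base_counts: Dict[str, int], ref_base: str) -> List[Tuple[str, float]]:
    """Return (base, frequency) for non-reference bases that pass both thresholds.

    Hypothetical helper: base_counts maps each observed base to its read count
    at a single genomic position; ref_base is the modern human reference base.
    """
    depth = sum(base_counts.values())
    if depth < MIN_COVERAGE:
        return []  # too few reads to trust any call at this position
    passing = []
    for base, count in sorted(base_counts.items()):
        if base == ref_base or count == 0:
            continue
        frequency = count / depth
        if frequency > MIN_FREQUENCY:
            passing.append((base, frequency))
    return passing
```

For example, a position covered by 30 reads with 10 reads supporting a non-reference G would be retained, whereas the same allele seen in 4 of 9 reads would be discarded for insufficient coverage.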
The annotation of all variations, with their corresponding frequency in present-day human populations, was collected by integrating information from both the dbSNP (Single Nucleotide Polymorphism Database; https://www.ncbi.nlm.nih.gov/snp, accessed on 15 July 2020) and the Ensembl (http://www.ensembl.org/index.html, accessed on 15 July 2020) databases. In particular, the frequency was reported as given by the Allele Frequency Aggregator (ALFA New). The analysis of regulatory regions in the gene loci analysed was carried out using the information available in the UCSC Genome Browser database (https://genome.ucsc.edu, accessed on 15 July 2020).
The coding sequences of salivary proteins were extracted from the publicly available UniProtKB database (https://www.uniprot.org/, accessed on 15 July 2020): PRB1, primary accession number: P04280; PRB2: P02812; PRB3: Q04118; PRB4: P10163; PRH2: P02810; HTN1: P15515; HTN3: P15516; STATH: P02808; AMY1A: P0DUB6; P-B: P02814, CST1: P01037; CST2: P09228; CST3: P01034; CST4: P01036; CST5: P28325, CSTA: P01040, CSTB: P04080.
4.2. Protein Data Analysis
The potential impact of each amino acid substitution on salivary protein function was predicted by SIFT (sorting intolerant from tolerant) version 5.1.1 using the Genome tool (SIFT nonsynonymous single nucleotide variants, genome-scale), available at the SIFT website (http://sift.jcvi.org/, accessed on 20 June 2022). The SIFT algorithm is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST [62]. A SIFT score < 0.05 indicates an amino acid substitution predicted to be deleterious to protein function.
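As a small worked illustration of the cutoff described above, the sketch below applies the score < 0.05 rule and reproduces the proportion of potentially deleterious changes implied by the counts given in the Discussion (92 of 307 missense changes). The helper names are hypothetical and the snippet does not call SIFT itself.

```python
SIFT_DELETERIOUS_THRESHOLD = 0.05  # cutoff stated in the text


def is_deleterious(sift_score: float) -> bool:
    """True when a SIFT score falls below the deleterious cutoff."""
    return sift_score < SIFT_DELETERIOUS_THRESHOLD


def percent_deleterious(n_deleterious: int, n_missense: int) -> float:
    """Percentage of missense changes predicted to be deleterious."""
    if n_missense <= 0:
        raise ValueError("n_missense must be positive")
    return 100.0 * n_deleterious / n_missense


# Counts taken from the Discussion: 92 of 307 missense changes
overall = round(percent_deleterious(92, 307), 1)
```

With the counts reported in the text, `overall` evaluates to 30.0, i.e., roughly three in ten missense changes were predicted to be deleterious.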
4.3. Selective Pressure Analysis
To detect any possible trace of selective pressure, the PBS test was applied. PBS is a statistical three-population test based on the FST fixation index, and it has proven to be one of the best methods of detecting signs of recent natural selection on genomes [31]. Regarding the choice of the three populations, we used three distant populations worldwide (CEU for Europe, CHB for Asia, and YRI for Africa), which are the most commonly used [63,64] and are among the first populations released by the 1000 Genomes Project, Phase 1 [64].
FST for the three possible population pairs (among CEU, CHB, and YRI) was calculated with VCFtools v0.1.16 [65] using the VCF files of each gene under scrutiny. The variants were previously filtered with Plink 1.9 [66] to keep only those with MAF ≥ 0.05. Then, PBS values were computed and the related plots were generated with R Studio software (R Core Team 2021, https://www.R-project.org, accessed on 2 December 2022).
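To make the last step concrete, the following minimal sketch shows how PBS is derived from the three pairwise FST values, using the usual definition: each FST is converted to a branch length T = −ln(1 − FST), and PBS for the focal population is (T_focal,B + T_focal,C − T_B,C)/2. The authors performed this step in R; the Python function names here are hypothetical, and clamping negative FST estimates to zero is an assumption of this sketch rather than a documented step of the study.

```python
import math


def branch_length(fst: float, cap: float = 0.999999) -> float:
    """Convert a pairwise FST into a divergence time T = -ln(1 - FST).

    Assumption of this sketch: negative FST estimates are clamped to 0 and
    values approaching 1 are capped to keep the logarithm finite.
    """
    fst = min(max(fst, 0.0), cap)
    return -math.log(1.0 - fst)


def pbs(fst_focal_b: float, fst_focal_c: float, fst_b_c: float) -> float:
    """Population branch statistic for the focal population.

    fst_focal_b, fst_focal_c: FST between the focal population and the two
    comparison populations; fst_b_c: FST between the comparison populations.
    """
    t_fb = branch_length(fst_focal_b)
    t_fc = branch_length(fst_focal_c)
    t_bc = branch_length(fst_b_c)
    return (t_fb + t_fc - t_bc) / 2.0
```

For instance, with CEU as the focal population one would call `pbs(fst_ceu_chb, fst_ceu_yri, fst_chb_yri)`; a large positive value indicates a long branch leading to the focal population, which is the pattern expected under recent positive selection.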
5. Conclusions
In conclusion, the nucleotide substitutions that have putatively affected the amino acid composition, the post-translational modification, and/or the gene expression levels of the salivary proteins described in this study might have generated novel functional features and a different expression ratio among the several components of the salivary proteome. Given the largely unknown functional roles of most salivary proteins, we may only speculate that these changes could have ultimately modified the entire homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. Our data may pave the way to unravelling the evolutionary processes that have acted on oral cavity homeostasis through changes in salivary composition. This knowledge could provide additional novel cues toward a better understanding of the ability of different species to adapt to different and changing environments.
Acknowledgments
We thank Luca Pagani (Università di Padova) for their useful advice on adaptative introgression.
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms241915010/s1.
Author Contributions
Conceptualization: M.C. and O.P.; data elaboration and collection, L.D.P., M.C., M.B., B.M. and A.O.; manuscript editing, L.D.P., W.L., M.C., B.M., T.C., O.P. and S.S. All authors contributed to the discussion and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement
All data reported in this manuscript are shown in the results section and further supported by the extended datasets provided in the supplementary files. No new primary datasets to be deposited have been generated.
Conflicts of Interest
The authors declare no conflict of interest.
Funding Statement
This study was partially supported by the FIR 2021 funds (Cagliari, Italy) to T.C. and the “Linea D.1–D.3.1” funds from the Università Cattolica del Sacro Cuore (Rome, Italy) to L.D.P., W.L., and O.P.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1. Cabras T., Iavarone F., Manconi B., Olianas A., Sanna M.T., Castagnola M., Messana I. Top-down analytical platforms for the characterization of the human salivary proteome. Bioanalysis. 2014;6:563–581. doi: 10.4155/bio.13.349.
- 2. Bandhakavi S., Stone M.D., Onsongo G., Van Riper S.K., Griffin T.J. A Dynamic Range Compression and Three-Dimensional Peptide Fractionation Analysis Platform Expands Proteome Coverage and the Diagnostic Potential of Whole Saliva. J. Proteome Res. 2009;8:5590–5600. doi: 10.1021/pr900675w.
- 3. Vila T., Rizk A.M., Sultan A.S., Jabra-Rizk M.A. The power of saliva: Antimicrobial and beyond. PLoS Pathog. 2019;15:e1008058. doi: 10.1371/journal.ppat.1008058.
- 4. Ngo L.H., Veith P.D., Chen Y.Y., Chen D., Darby I.B., Reynolds E.C. Mass Spectrometric Analyses of Peptides and Proteins in Human Gingival Crevicular Fluid. J. Proteome Res. 2010;9:1683–1693. doi: 10.1021/pr900775s.
- 5. Boroumand M., Olianas A., Cabras T., Manconi B., Fanni D., Faa G., Desiderio C., Messana I., Castagnola M. Saliva, a bodily fluid with recognized and potential diagnostic applications. J. Sep. Sci. 2021;44:3677–3690. doi: 10.1002/jssc.202100384.
- 6. Beeley J.A. Basic proline-rich proteins: Multifunctional defence molecules? Oral Dis. 2001;7:69–70. doi: 10.1034/j.1601-0825.2001.0070201.x.
- 7. Hajishengallis G., Russell M.W. Innate Humoral Defense Factors. Mucosal Immunol. 2015;1:251–270. doi: 10.1016/B978-0-12-415847-4.00015-X.
- 8. Lyons K.M., Azen E.A., Goodman P.A., Smithies O. Many protein products from a few loci: Assignment of human salivary proline-rich proteins to specific loci. Genetics. 1988;120:255–265. doi: 10.1093/genetics/120.1.255.
- 9. Padiglia A., Orrù R., Boroumand M., Olianas A., Manconi B., Sanna M.T., Desiderio C., Iavarone F., Liori B., Messana I., et al. Extensive Characterization of the Human Salivary Basic Proline-Rich Protein Family by Top-Down Mass Spectrometry. J. Proteome Res. 2018;17:3292–3307. doi: 10.1021/acs.jproteome.8b00444.
- 10. Manconi B., Castagnola M., Cabras T., Olianas A., Vitali A., Desiderio C., Sanna M.T., Messana I. The intriguing heterogeneity of human salivary proline-rich proteins. J. Proteom. 2016;134:47–56. doi: 10.1016/j.jprot.2015.09.009.
- 11. Lyons K.M., Stein J.H., Smithies O. Length polymorphisms in human proline-rich protein genes generated by intragenic unequal crossing over. Genetics. 1988;120:267–278. doi: 10.1093/genetics/120.1.267.
- 12. Azen E.A., Amberger E., Fisher S., Prakobphol A., Niece R.L. PRB1, PRB2, and PRB4 coded polymorphisms among human salivary concanavalin-A binding, II-1, and Po proline-rich proteins. Am. J. Hum. Genet. 1996;58:143–153.
- 13. Messana I., Cabras T., Pisano E., Sanna M.T., Olianas A., Manconi B., Pellegrini M., Paludetti G., Scarano E., Fiorita A., et al. Trafficking and Postsecretory Events Responsible for the Formation of Secreted Human Salivary Peptides: A Proteomics Approach. Mol. Cell. Proteom. 2008;7:911–926. doi: 10.1074/mcp.M700501-MCP200.
- 14. Jensen J.L., Lamkin M.S., Troxler R.F., Oppenheim F.G. Multiple forms of statherin in human salivary secretions. Arch. Oral Biol. 1991;36:529–534. doi: 10.1016/0003-9969(91)90147-M.
- 15. Inzitari R., Cabras T., Rossetti D.V., Fanali C., Vitali A., Pellegrini M., Paludetti G., Manni A., Giardina B., Messana I., et al. Detection in human saliva of different statherin and P-B fragments and derivatives. Proteomics. 2006;6:6370–6379. doi: 10.1002/pmic.200600395.
- 16. Cabras T., Inzitari R., Fanali C., Scarano E., Patamia M., Sanna M.T., Pisano E., Giardina B., Castagnola M., Messana I. HPLC–MS characterization of cyclo-statherin Q-37, a specific cyclization product of human salivary statherin generated by transglutaminase 2. J. Sep. Sci. 2006;29:2600–2608. doi: 10.1002/jssc.200600244.
- 17. Torres P., Castro M., Reyes M., Torres V. Histatins, wound healing, and cell migration. Oral Dis. 2018;24:1150–1160. doi: 10.1111/odi.12816.
- 18. Castagnola M., Inzitari R., Rossetti D.V., Olmi C., Cabras T., Piras V., Nicolussi P., Sanna M.T., Pellegrini M., Giardina B., et al. A Cascade of 24 Histatins (Histatin 3 Fragments) in Human Saliva: Suggestion for a Pre-Secretory Sequential Cleavage Pathway. J. Biol. Chem. 2004;279:41436–41443. doi: 10.1074/jbc.M404322200.
- 19. Wang G. Human Antimicrobial Peptides and Proteins. Pharmaceuticals. 2014;7:545–594. doi: 10.3390/ph7050545.
- 20. Dickinson D.P. Cysteine peptidases of mammals: Their biological roles and potential effects in the oral cavity and other tissues in health and disease. Crit. Rev. Oral Biol. Med. 2002;13:238–275. doi: 10.1177/154411130201300304.
- 21. Manconi B., Liori B., Cabras T., Vincenzoni F., Iavarone F., Castagnola M., Messana I., Olianas A. Salivary Cystatins: Exploring New Post-Translational Modifications and Polymorphisms by Top-Down High-Resolution Mass Spectrometry. J. Proteome Res. 2017;16:4196–4207. doi: 10.1021/acs.jproteome.7b00567.
- 22. Perry G.H., Dominy N.J., Claw K.G., Lee A.S., Fiegler H., Redon R., Werner J., Villanea F.A., Mountain J.L., Misra R., et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 2007;39:1256–1260. doi: 10.1038/ng2123.
- 23. Polley S., Louzada S., Forni D., Sironi M., Balaskas T., Hains D.S., Yang F., Hollox E.J. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. USA. 2015;112:5105–5110. doi: 10.1073/pnas.1416531112.
- 24. Xu D., Pavlidis P., Taskent R.O., Alachiotis N., Flanagan C., DeGiorgio M., Blekhman R., Ruhl S., Gokcumen O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol. Biol. Evol. 2017;34:2704–2715. doi: 10.1093/molbev/msx206.
- 25. Xu D., Pavlidis P., Thamadilok S., Redwood E., Fox S., Blekhman R., Ruhl S., Gokcumen O. Recent evolution of the salivary mucin MUC7. Sci. Rep. 2016;6:31791. doi: 10.1038/srep31791.
- 26. Thamadilok S., Choi K.S., Ruhl L., Schulte F., Kazim A.L., Hardt M., Gokcumen O., Ruhl S. Human and Nonhuman Primate Lineage-Specific Footprints in the Salivary Proteome. Mol. Biol. Evol. 2020;37:395–405. doi: 10.1093/molbev/msz223.
- 27. Edwards A.W.F. The Genetical Theory of Natural Selection. Genetics. 2000;154:1419–1426. doi: 10.1093/genetics/154.4.1419.
- 28. Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA. 2010;107:961–968. doi: 10.1073/pnas.0912629107.
- 29. Marcus J.H., Novembre J. Visualizing the geography of genetic variants. Bioinformatics. 2017;33:594–595. doi: 10.1093/bioinformatics/btw643.
- 30. Yi X., Liang Y., Huerta-Sanchez E., Jin X., Cuo Z.X., Pool J.E., Xu X., Jiang H., Vinckenbosch N., Korneliussen T.S., et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329:75–78. doi: 10.1126/science.1190371.
- 31. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585.
- 32. Skoglund P., Jakobsson M. Archaic human ancestry in East Asia. Proc. Natl. Acad. Sci. USA. 2011;108:18301–18306. doi: 10.1073/pnas.1108181108.
- 33. Sankararaman S., Mallick S., Patterson N., Reich D. The Combined Landscape of Denisovan and Neanderthal Ancestry in Present-Day Humans. Curr. Biol. 2016;26:1241–1247. doi: 10.1016/j.cub.2016.03.037.
- 34. Racimo F., Marnetto D., Huerta-Sánchez E. Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Mol. Biol. Evol. 2017;34:296–317. doi: 10.1093/molbev/msw216.
- 35. Jagoda E., Lawson D.J., Wall J.D., Lambert D., Muller C., Westaway M., Leavesley M., Capellini T.D., Mirazón Lahr M., Gerbault P., et al. Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans. Mol. Biol. Evol. 2018;35:623–630. doi: 10.1093/molbev/msx314.
- 36. Vernot B., Akey J.M. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343:1017–1021. doi: 10.1126/science.1245938.
- 37. Weyrich L.S., Duchene S., Soubrier J., Arriola L., Llamas B., Breen J., Morris A.G., Alt K.W., Caramelli D., Dresely V., et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature. 2017;544:357–361. doi: 10.1038/nature21674.
- 38. El Zaatari S., Grine F.E., Ungar P.S., Hublin J.J. Neandertal versus Modern Human Dietary Responses to Climatic Fluctuations. PLoS ONE. 2016;11:e0153277. doi: 10.1371/journal.pone.0153277.
- 39. Cornejo Ulloa P., van der Veen M.H., Krom B.P. Review: Modulation of the oral microbiome by the host to promote ecological balance. Odontology. 2019;107:437–448. doi: 10.1007/s10266-019-00413-x.
- 40. Lamont R.J., Jenkinson H.F. Subgingival colonization by Porphyromonas gingivalis. Oral Microbiol. Immunol. 2000;15:341–349. doi: 10.1034/j.1399-302x.2000.150601.x.
- 41. Laputková G., Schwartzová V., Bánovčin J., Alexovič M., Sabo J. Salivary Protein Roles in Oral Health and as Predictors of Caries Risk. Open Life Sci. 2018;13:174–200. doi: 10.1515/biol-2018-0023.
- 42. Lynge Pedersen A.M., Belstrøm D. The role of natural salivary defences in maintaining a healthy oral microbiota. J. Dent. 2019;80:S3–S12. doi: 10.1016/j.jdent.2018.08.010.
- 43. Chen W., Jiang Q., Yan G., Yang D. The oral microbiome and salivary proteins influence caries in children aged 6 to 8 years. BMC Oral Health. 2020;20:295. doi: 10.1186/s12903-020-01262-9.
- 44. Cabras T., Melis M., Castagnola M., Padiglia A., Tepper B.J., Messana I., Tomassini Barbarossa I. Responsiveness to 6-n-Propylthiouracil (PROP) Is Associated with Salivary Levels of Two Specific Basic Proline-Rich Proteins in Humans. PLoS ONE. 2012;7:e30962. doi: 10.1371/journal.pone.0030962.
- 45. Rodrigues L., Costa G., Cordeiro C., Pinheiro C., Amado F., Lamy E. Salivary proteome and glucose levels are related with sweet taste sensitivity in young adults. Food Nutr. Res. 2017;61:1389208. doi: 10.1080/16546628.2017.1389208.
- 46. Stolle T., Grondinger F., Dunkel A., Meng C., Médard G., Kuster B., Hofmann T. Salivary Proteome Patterns Affecting Human Salt Taste Sensitivity. J. Agric. Food Chem. 2017;65:9275–9286. doi: 10.1021/acs.jafc.7b03862.
- 47. Scinska-Bienkowska A., Wrobel E., Turzynska D., Bidzinski A., Jezewska E., Sienkiewicz-Jarosz H., Golembiowska K., Kostowski W., Kukwa A., Plaznik A., et al. Glutamate concentration in whole saliva and taste responses to monosodium glutamate in humans. Nutr. Neurosci. 2006;9:25–31. doi: 10.1080/10284150600621964.
- 48. Méjean C., Morzel M., Neyraud E., Issanchou S., Martin C., Bozonnet S., Urbano C., Schlich P., Hercberg S., Péneau S., et al. Salivary Composition Is Associated with Liking and Usual Nutrient Intake. PLoS ONE. 2015;10:e0137473. doi: 10.1371/journal.pone.0137473.
- 49. Morzel M., Chabanet C., Schwartz C., Lucchi G., Ducoroy P., Nicklaus S. Salivary protein profiles are linked to bitter taste acceptance in infants. Eur. J. Pediatr. 2014;173:575–582. doi: 10.1007/s00431-013-2216-z.
- 50. Perry G.H., Kistler L., Kelaita M.A., Sams A.J. Insights into hominin phenotypic and dietary evolution from ancient DNA sequence data. J. Hum. Evol. 2015;79:55–63. doi: 10.1016/j.jhevol.2014.10.018.
- 51. Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.H., et al. A Draft Sequence of the Neandertal Genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
- 52. Burbano H.A., Hodges E., Green R.E., Briggs A.W., Krause J., Meyer M., Good J.M., Maricic T., Johnson P.L., Xuan Z., et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science. 2010;328:723–725. doi: 10.1126/science.1188046.
- 53. Bode W., Engh R., Musil D., Thiele U., Huber R., Karshikov A., Brzin J., Kos J., Turk V. The 2.0 Å X-ray crystal structure of chicken egg white cystatin and its possible mode of interaction with cysteine proteinases. EMBO J. 1988;7:2593–2599. doi: 10.1002/j.1460-2075.1988.tb03109.x.
- 54. Mednikova M.B. A Proximal Pedal Phalanx of a Paleolithic Hominin from Denisova Cave, Altai. Archaeol. Ethnol. Anthropol. Eurasia. 2011;39:129–138. doi: 10.1016/j.aeae.2011.06.017.
- 55. Meyer M., Kircher M., Gansauge M.T., Li H., Racimo F., Mallick S., Schraiber J.G., Jay F., Prüfer K., de Filippo C., et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344.
- 56. Prüfer K., de Filippo C., Grote S., Mafessoni F., Korlević P., Hajdinjak M., Vernot B., Skov L., Hsieh P., Peyrégne S., et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science. 2017;358:655–658. doi: 10.1126/science.aao1887.
- 57. Prüfer K., Racimo F., Patterson N., Jay F., Sankararaman S., Sawyer S., Heinze A., Renaud G., Sudmant P.H., de Filippo C., et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–49. doi: 10.1038/nature12886.
- 58. Mafessoni F., Grote S., de Filippo C., Slon V., Kolobova K.A., Viola B., Markin S.V., Chintalapati M., Peyrégne S., Skov L., et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. USA. 2020;117:15132–15136. doi: 10.1073/pnas.2004944117.
- 59. Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
- 60. Thorvaldsdottir H., Robinson J.T., Mesirov J.P. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief. Bioinform. 2013;14:178–192. doi: 10.1093/bib/bbs017.
- 61. Robinson J.T., Thorvaldsdóttir H., Wenger A.M., Zehir A., Mesirov J.P. Variant Review with the Integrative Genomics Viewer. Cancer Res. 2017;77:e31–e34. doi: 10.1158/0008-5472.CAN-17-0337.
- 62. Ng P.C. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509.
- 63. Pfeifer B., Alachiotis N., Pavlidis P., Schimek M.G. Genome scans for selection and introgression based on k-nearest neighbour techniques. Mol. Ecol. Resour. 2020;20:1597–1609. doi: 10.1111/1755-0998.13221.
- 64. Bhatia G., Patterson N., Pasaniuc B., Zaitlen N., Genovese G., Pollack S., Mallick S., Myers S., Tandon A., Spencer C., et al. Genome-wide comparison of African-ancestry populations from CARe and other cohorts reveals signals of natural selection. Am. J. Hum. Genet. 2011;89:368–381. doi: 10.1016/j.ajhg.2011.07.025.
- 65. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 66. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
Data Availability Statement
All data reported in this manuscript are shown in the Results section and are further supported by the extended datasets provided in the supplementary files. No new primary datasets requiring deposition were generated.