International Journal of Molecular Sciences. 2023 Oct 9;24(19):15010. doi: 10.3390/ijms241915010

A Catalog of Coding Sequence Variations in Salivary Proteins’ Genes Occurring during Recent Human Evolution

Lorena Di Pietro 1,2, Mozhgan Boroumand 3,†, Wanda Lattanzi 1,2,*, Barbara Manconi 4, Martina Salvati 1, Tiziana Cabras 4, Alessandra Olianas 4, Laura Flore 4, Simone Serrao 5, Carla M Calò 4, Paolo Francalacci 4, Ornella Parolini 1,2, Massimo Castagnola 3
Editor: Pierre Pontarotti
PMCID: PMC10573131  PMID: 37834461

Abstract

Saliva houses over 2000 proteins and peptides with poorly clarified functions, including proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases. Their genes are poorly conserved across related species, reflecting an evolutionary adaptation. We searched for the nucleotide substitutions fixed in these salivary proteins’ gene loci in modern humans compared with ancient hominins. We mapped 3472 sequence variants/nucleotide substitutions in coding, noncoding, and 5′-3′ untranslated regions. Despite most of the detected variations being within noncoding regions, the frequency of coding variations was far higher than the general rate found throughout the genome. Among the various missense substitutions, specific substitutions detected in PRB1 and PRB2 genes were responsible for the introduction/abrogation of consensus sequences recognized by convertase enzymes that cleave the protein precursors. Overall, these changes that occurred during recent human evolution might have generated novel functional features and/or different expression ratios among the various components of the salivary proteome. This may have influenced the homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. However, fixed nucleotide changes in modern humans represented only 7.3% of all the substitutions reported in this study, and no signs of evolutionary pressure or adaptive introgression from archaic hominins were found on the tested genes.

Keywords: salivary proteins, nucleotide substitutions, evolution

1. Introduction

Saliva is a multifaceted bodily fluid that contains enzymes (amylases, lysozymes, and lipases), proteins, peptides and glycoproteins, lipids (hormones such as testosterone and progesterone), and proteases, along with a high concentration of inorganic ions [1]. To date, more than 2000 proteins and peptides have been identified in saliva [2]. They are mainly involved in the homeostasis of the oral cavity, the digestion process, and the innate immune response [3]. Ninety percent of the salivary proteins and peptides derive from the secretion of the three major salivary glands (parotid, submandibular, and sublingual glands), while the remaining 10% are secreted by minor salivary glands or derive from exfoliated cells and leucocytes present in the gingival–crevicular fluid [4] from plasma exudate, plus some contributions from the oral microbial flora. During their transit in the secretory pathway, salivary proteins undergo a series of post-translational modifications (PTMs), including phosphorylation, N-terminal acetylation, glycosylation, sulfation, and proteolytic cleavages. Further changes in proteins and peptides also occur after secretion in the oral cavity, through the action of exogenous (microflora) and endogenous enzymes [1].

The main contribution to the composition of the human salivary proteome derives from a few protein families. In particular, proline-rich proteins (PRPs), statherin (STATH), P-B peptide, histatins (HTN), cystatins (CST), and amylases (AMY) altogether represent more than 95% (w/w) of all proteins found in saliva to date [5]. PRPs represent the major fraction of the salivary proteome in Homo sapiens (nearly 70% of the total protein content; >50% in weight) and include basic (bPRPs), acidic (aPRPs), and basic glycosylated (gPRPs) PRPs. They share a high abundance of proline, glycine, and glutamine residues, which represent 70–80% of the entire amino acid sequence [6,7]. bPRPs include eleven parent peptides/proteins and more than six parent glycosylated proteins (gPRPs), plus several proteoforms derived from gene polymorphisms and PTMs [8,9,10] (Figure 1). PRPs are encoded by genes belonging to the PRP multigene family, located within the PRB locus mapping on 12p13.2. The locus includes six tandemly linked genes: PRB2–PRB1–PRB4–PRH2–PRB3–PRH1, in the 5′-to-3′ direction, and is highly polymorphic as it contains internally repetitive DNA sequences, leading to frequent recombinational events [11,12]. At least four alleles (S, small; M, medium; L, large; and VL, very large) are present in the Western population of Homo sapiens at PRB1 and PRB3 loci and three (S, M, L) at PRB2 and PRB4 loci [8] (Figure 1). Except for the protein encoded by the PRB3 locus that gives rise to gPRPs, all the bPRP pro-proteins are cleaved completely by pro-protein convertases, generating smaller peptides/proteins, before granule maturation [9] (Figure 1). aPRPs are encoded by two loci, PRH1 and PRH2, mapping on chromosome 12p13. Single amino acid substitution and repeat insertion generate three PRH1 alleles, encoding the parotid isoelectric-focusing slow isoform (PIF-s) and the parotid acidic protein (Pa), both 150 residues long, and the double band isoform slow (Db-s), 171 amino acid residues long [10] (Figure 2A). A single nucleotide substitution generates two PRH2 alleles, encoding the PRP-1 and PRP2 isoforms [11] (Figure 2A). A pro-protein convertase partially cleaves PRP-1, PRP2 and PIF-s into three N-terminal fragments of 106 residues, called PRP3, PRP4, and PIF-f (PRP3 type), and a common C-terminal fragment of 44 amino acids, called P-C peptide. Db-s is cleaved at position 127, generating two peptides: Db-f (f stands for fast) and the P-C peptide (same as above) [12] (Figure 2A). The Pa isoform, which does not carry the convertase sequence, generates a dimeric form through a disulfide bond [13] (Figure 2A). STATH is encoded by the STATH gene located on chromosome 4q13-19 [13,14]. Several STATH proteoforms are detectable in saliva due to phosphorylation, cyclization by transglutaminase 2, and proteolysis by amino-/carboxy-peptidases and convertase action [13,15,16]. P-B is a proline-rich small peptide encoded by the SMR3B gene, mapping on chromosome 4q13.3 [17], near the STATH gene, possibly sharing epigenetic control and/or the DNA replication timeframe [13,15,16]. HTN are small cationic histidine-rich peptides encoded by the HTN1 and HTN3 genes on chromosome 4q13. Despite their high sequence homology, HTN1 and HTN3 have different maturation pathways and biological activities [17,18,19].
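
The fragment sizes quoted above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below assumes that "cleaved at position N" means a single cut after residue N; the helper name `cleave` is illustrative and does not come from the paper.

```python
def cleave(length: int, site: int) -> tuple[int, int]:
    """Return (N-terminal, C-terminal) fragment lengths for one cut after residue `site`."""
    if not 0 < site < length:
        raise ValueError("cleavage site must fall inside the protein")
    return site, length - site


# Lengths and cut positions as quoted in the text.
pif_s = cleave(150, 106)  # PIF-s: 106-residue N-terminal fragment plus P-C peptide
db_s = cleave(171, 127)   # Db-s: Db-f plus P-C peptide

print(pif_s, db_s)
```

Under this reading, both cuts leave a 44-residue C-terminal piece, in line with the statement that Db-s releases the same P-C peptide as the 150-residue isoforms.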

Figure 1.

Schematic representation of basic proline-rich genes and encoded proteins: PRB1 (A), PRB2 (B), PRB3 (C), PRB4 (D). For each protein, the genetic allelic variants (S, small; M, medium; L, large; and VL, very large) are shown in the left-hand column; the resulting alternative proteoforms are shown in the right-hand column as blocks, with the corresponding symbol on top. Vertical dashed lines indicate the pro-protein convertase cleavage sites with corresponding Arg (R) residues’ positions. The P enclosed in a circle denotes phosphorylation sites; amino acid substitutions are shown for selected isoforms. See text for additional details.

Figure 2.

Schematic representation of acidic proline-rich proteins (A) and cystatins (B). For each protein, the genetic allelic variants (S, small; M, medium; L, large; and VL, very large) are shown in the left-hand column; the resulting alternative proteoforms are shown in the right-hand column as blocks with corresponding symbols on top. All cystatin alternative proteoforms feature two disulfide bridges (indicated by brackets between Cys), oxidation (ox), and phosphorylation (P) sites. Vertical dashed lines indicate the pro-protein convertase cleavage sites with corresponding Arg (R) residues’ positions. The P enclosed in a circle denotes phosphorylation sites; ox: oxidation sites; p-E: N-terminal pyroglutamic acid; amino acid substitutions are shown for selected isoforms. See text for additional details.

CST are inhibitors of cysteine proteases involved in the innate immune response [20]. Cystatin A and cystatin B are encoded by the CSTA and CSTB genes, respectively, whereas CST-SN, CST-SA, CST-C, CST-S, and CST-D are encoded by the CST1–CST5 genes (Figure 2B). Several PTMs occur in CST proteins, including N-acetylation, proteolytic cleavages, phosphorylation, and M-, W-, and C-oxidation, causing different final protein structures detectable in human saliva [21]. Also, two isoforms generated by single amino acid substitutions of cystatin D and cystatin SN are present in saliva [21] (Figure 2B).

The amylase alpha 1A (AMY1A) gene, on chromosome 1p21.1, is responsible for the expression of AMY, which accounts for about 20% of the weight of salivary proteins and is the most abundant protein of the whole saliva of Homo sapiens.

Several comparative studies have shown that the human salivary proteome differs from that of other species owing to genetic divergences that are possibly due to environmental factors, including diet and pathogens [22,23,24,25]. A recent study compared the salivary proteome of Homo sapiens sapiens (modern humans) with those of our closest extant evolutionary relatives, chimpanzees and gorillas [26]. The authors demonstrated that the salivary protein composition is unique to each species despite their close sequence homology, which likely reflects an evolutionary adaptation [26]. Despite this initial observation, the evolution of the human loci encoding salivary proteins has not been studied to date. Nowadays, the increasing amount of genomic data obtained through sequencing of preserved skeletal remains of extinct hominins, such as Homo neanderthalensis (Neanderthals) and Homo Denisova (Denisovans), can reveal the extent of diversity that has emerged at the genomic level during more recent human evolution.

In this study, we aimed to identify the sequence changes that have been fixed during recent human evolution in the gene loci encoding the most abundant salivary proteins (namely, PRPs, statherin, P-B peptide, histatins, cystatins, and amylases) to gather possible functional indications regarding their evolutionary path and their contribution to oral homeostasis and salivary functions. Eating habits and salivary protein biology may indeed influence each other, since these proteins are implicated in the modulation of the microbiome of the oral cavity and the entire gastrointestinal tract [26]. To achieve this, we interrogated the publicly available sequence databases of Neanderthals and Denisovans and compared them with modern human genome sequence data. This allowed us to identify several nucleotide substitutions in the loci coding for the most relevant human salivary protein families.

2. Results

By comparing the genomic sequences of salivary gene loci in modern humans with those of Altai Neanderthals, Chagyrskaya Neanderthals, Vindija Neanderthals, and Denisovans, we identified a total of 3472 sequence variants/nucleotide substitutions across the 17 tested salivary genes in coding, noncoding, 5′-3′ untranslated (UTRs), and regulatory regions. The nucleotide substitutions observed in the 17 tested salivary genes are summarized in Figure 3. Of the 3472 changed nucleotides, only 428 were in coding regions, and 121 of these were annotated as synonymous (Figure 3). The remaining 307 nucleotide variations were nonsynonymous (Figure 3); such changes are known to be subjected to a higher evolutionary pressure and are frequently exposed to natural selection [27,28]. We have, therefore, attempted a functional interpretation of nonsynonymous variations, which is inherently speculative and deserves future functional studies. The potential impact of nonsynonymous variants on the function of salivary proteins in Neanderthals and Denisovans was predicted by a SIFT (sorting intolerant from tolerant) analysis (see Table 1, Table 2 and Table 3), which predicts amino acid substitutions that may exert a deleterious effect. The reference single nucleotide polymorphism (SNP) number (rs) and the corresponding frequencies of the 107 missense changes in coding regions are also reported in Table 1, Table 2 and Table 3. Of note, even though the nucleotide changes located in noncoding regions should not affect the primary structure of the encoded protein, they could affect regulatory elements that may modify the splicing and/or the binding of epigenetic modulators and/or chromatin folding/looping. The variants fixed at 100% in modern humans compared to ancient hominins are highlighted in light orange in Table 1, Table 2 and Table 3 and Tables S1–S17.
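
As an illustration of how a SIFT score maps to the qualitative labels used in the tables, the sketch below applies the conventional SIFT cutoff of 0.05. The cutoff is an assumption here: the text does not state it, although rows scored exactly 0.05 in Table 1 are labeled damaging. The function name `sift_label` is hypothetical.

```python
SIFT_CUTOFF = 0.05  # assumed: conventional SIFT threshold, not stated in the text


def sift_label(score: float) -> str:
    """Map a SIFT score in [0, 1] to a qualitative prediction."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return "Damaging" if score <= SIFT_CUTOFF else "Tolerated"


for score in (0.0, 0.02, 0.05, 0.06, 0.72):
    print(score, sift_label(score))
```

Substitutions that introduce a stop codon are reported in the tables as damaging without a score, so they would need to be handled separately from this numeric rule.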

Figure 3.

Nucleotide substitutions in salivary protein genes. The pie chart shows the type and number of 3472 nucleotide substitutions across the 17 tested salivary genes. In particular, the 428 substitutions found in coding regions included 307 nonsynonymous changes across all the 17 genes tested. See text for additional details.
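
The counts reported for Figure 3 can be cross-checked and expressed as proportions with simple arithmetic. The sketch below uses only the totals given in the text; the variable names are illustrative.

```python
total = 3472       # all substitutions across the 17 tested genes
coding = 428       # substitutions located in coding regions
synonymous = 121   # coding substitutions annotated as synonymous

nonsynonymous = coding - synonymous  # should match the 307 reported
outside_coding = total - coding      # noncoding, UTR, and regulatory sites

coding_pct = 100 * coding / total
nonsyn_pct_of_coding = 100 * nonsynonymous / coding

print(f"nonsynonymous: {nonsynonymous}")
print(f"outside coding regions: {outside_coding}")
print(f"coding share of all substitutions: {coding_pct:.1f}%")
print(f"nonsynonymous share of coding substitutions: {nonsyn_pct_of_coding:.1f}%")
```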

Table 1.

Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRB1, PRB2, PRB3, and PRB4 gene loci.

Chromosome position (hg19) | Gene region | Modern human | Altai Neanderthal (variant frequency a) | Chagyrskaya Neanderthal (variant frequency a) | Vindija Neanderthal (variant frequency a) | Denisovan (variant frequency a) | Codon→amino acid | SNP id | SNP total frequency (ALFA) | SIFT results (score)

PRB1 (reverse reading, chromosome 12)
11,507,477 | Exon 2 (II-2) | CTT | CTT (100%) | TTT (13%) | TTT (7%) * | CTT (100%) | GAA→E10; AAA→K10 | n.a. | n.a. | Damaging (0.02)
11,507,464 | Exon 2 (II-2) | AGG | AGG (100%) | AGG (100%) | AAG (12%) | AGG (100%) | UCC→S14; UUC→F14 | rs1173856027 | A = 0% | Tolerated (0.72)
11,506,888 | Exon 3 (II-2) | GGG | GGG (100%) | GGG (100%) | GAG (12%) | GGG (100%) | CCC→P35; CUC→L35 | n.a. | n.a. | Tolerated (0.06)
11,506,856 | Exon 3 (II-2) | GGG | GGG (100%) | AGG (11%) | GGG (100%) | GGG (100%) | CCC→P45; UCC→S45 | rs762910991 | A = 0.003% | Tolerated (0.17)
11,506,853 | Exon 3 (II-2) | GGT | TGT (3%) * | GGT (100%) | AGT (15%) | GGT (100%) | CCA→P46; UCA→S46 | rs745726339 | A = 0% | Damaging (0)
11,506,852 | Exon 3 (II-2) | GGT | GGT (100%) | GGT (100%) | GAT (11%) | GGT (100%) | CCA→P46; CUA→L46 | n.a. | n.a. | Damaging (0)
11,506,804 | Exon 3 (II-2) | GTT | GAT (61%) | GAT (63%) | GAT (60%) | GTT (100%) | CAA→Q62; CUA→L62 | n.a. | n.a. | Tolerated (0.29)
11,506,801 | Exon 3 (II-2) | CCT | CCT (100%) | CTT (11%) | CTT (5%) * | CCT (100%) | GGA→G63; GAA→E63 | n.a. | n.a. | Damaging (0.01)
11,506,790 | Exon 3 (II-2) | GTT | GTT (100%) | ATT (11%) | ATT (6%) * | GTT (100%) | CAA→Q67; UAA→stop | rs1409612167 | A = 0% | Damaging due to stop
11,506,784 | Exon 3 (II-2) | CTG | CTG (100%) | CTG (100%) | TTG (13%) | CTG (100%) | GAC→D69; AAC→N69 | rs554211998 | T = 0% | Tolerated (0.95)
11,506,774 | Exon 3 (II-2) | GCT | GTT (13%) | GTT (8%) * | GTT (6%) * | GTT (9%) * | CGA→R72; CAA→Q72 | rs202083397 | T = 10.6% | Tolerated (0.08)
11,506,766 | Exon 3 (II-2) | GCT | GCT (100%) | GCT (100%) | ACT (12%) | GCT (100%) | CGA→R75; UGA→stop | rs766131639 | A = 0% | Damaging due to stop
11,506,730 | Exon 3 (Ps-2) | GTT | GTT (100%) | ATT (16%) | GTT (100%) | GTT (100%) | CAA→Q12; UAA→stop | n.a. | n.a. | Damaging due to stop
11,506,723 | Exon 3 (Ps-2) | CCA | CCA (100%) | CTA (12%) | CTA (3%) * | CCA (100%) | GGU→G14; GAU→D14 | rs534597111 | T = 0% | NS
11,506,669 | Exon 3 (Ps-2) | GGT | GTT (39%) | GTT (36%) | GTT (55%) | GTT (26%) | CCA→P32; CAA→Q32 | rs772365043 | C = 0% | NS
11,506,618 | Exon 3 (Ps-2) | CCT | CCT (100%) | CTT (17%) | CTT (3%) * | CCT (100%) | GGA→G49; GAA→E49 | n.a. | n.a. | NS
11,506,612 | Exon 3 (Ps-2) | GGG | GGG (100%) | GAG (11%) | GGG (100%) | GGG (100%) | CCC→P51; CUC→L51 | n.a. | n.a. | NS
11,506,577 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (13%) | GGA (100%) | GGA (100%) | CCU→P2; UCU→S2 | n.a. | n.a. | NS
11,506,514 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (6%) * | AGA (11%) | GGA (100%) | CCU→P23; UCU→S23 | n.a. | n.a. | NS
11,506,492 | Exon 3 (IB-6) | GGT | GGT (100%) | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P30; CUA→L30 | n.a. | n.a. | NS
11,506,490 | Exon 3 (IB-6) | GGG | AGG (5%) * | AGG (18%) | AGG (8%) * | GGG (100%) | CCC→P31; UCC→S31 | n.a. | n.a. | NS
11,506,486 | Exon 3 (IB-6) | GGT | GGT (100%) | GGT (100%) | GTT (18%) | GGT (100%) | CCA→P32; CAA→Q32 | rs755622101 | T = 1.3% | NS
11,506,473 | Exon 3 (Ps-2) | TTC | TTG (100%) | TTG (83%) ** | TTG (100%) ** | TTG (75%) ** | AAG→K37; AAC→N37 | rs61930109 | G = 72.1% | NS
11,506,403 | Exon 3 (Ps-2) | AGG | GGG (50%) ** | GGG (50%) ** | AGG (100%) ** | GGG (100%) | UCC→S59; CCC→P59 | n.a. | n.a. | NS
11,506,370 | Exon 3 (Ps-2) | GGG | GGG (100%) | GGG (100%) | AGG (21%) | GGG (100%) | CCC→P70; UCC→S70 | rs774158904 | A = 0% | NS
11,506,369 | Exon 3 (Ps-2) | GGG | GGG (93%) | GGG (100%) | GAG (16%) | GGG (100%) | CCC→P71; CUC→L71 | rs369001998 | A = 0.007% | NS
11,506,339 | Exon 3 (Ps-2) | GGG | GGG (97%) | GAG (5%) * | GAG (23%) | GGG (100%) | CCC→P81; CUC→L81 | n.a. | n.a. | NS
11,506,333 | Exon 3 (Ps-2) | GGA | GGA (100%) | GAA (5%) * | GAA (11%) | GGA (100%) | CCU→P83; CUU→L83 | n.a. | n.a. | NS
11,506,309 | Exon 3 (Ps-2) | GGT | GAT (4%) * | GAT (6%) * | GAT (17%) | GGT (100%) | CCA→P91; CUU→L91 | n.a. | n.a. | Damaging (0.01)
11,506,303 | Exon 3 (Ps-2) | GGT | GTT (3%) * | GTT (13%) | GGT (100%) | GGT (100%) | CCA→P93; CAA→Q93 | rs201682460 | T = 2.8% | Damaging (0)
11,506,301 | Exon 3 (Ps-2) | GTT | ATT (4%) * | GTT (100%) | ATT (15%) | GTT (100%) | CAA→Q94; UAA→stop | n.a. | n.a. | Damaging due to stop
11,506,285 | Exon 3 (Ps-2) | GGA | GGA (100%) | GGA (100%) | GAA (14%) | GGA (100%) | CCU→P99; CUU→L99 | n.a. | n.a. | Damaging (0.01)
11,506,283 | Exon 3 (Ps-2) | GTT | GTT (100%) | ATT (14%) | ATT (13%) | GTT (100%) | CAA→Q100; UAA→stop | n.a. | n.a. | Damaging due to stop
11,506,250 | Exon 3 (Ps-2) | GGT | GGT (100%) ** | GGT (100%) | AGT (14%) | GGT (100%) | CCA→P111; UCA→S111 | n.a. | n.a. | Tolerated (0.08)
11,506,249 | Exon 3 (Ps-2) | GGT | GGT (100%) ** | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P111; CUA→L111 | rs1208300501 | A = 0% | Tolerated (0.09)
11,506,246 | Exon 3 (Ps-2) | GGG | GGG (100%) ** | GAG (18%) | GGG (100%) | GGG (100%) | CCC→P112; CUC→L112 | rs1303924609 | A = 0% | Damaging (0.02)
11,506,241 | Exon 3 (Ps-2) | GTT | GTT (100%) ** | GTT (100%) | ATT (14%) | GTT (100%) | CAA→Q114; UAA→stop | rs751826141 | A = 0% | Damaging due to stop
11,506,217 | Exon 3 (IB-6) | CGG | GGG (67%) ** | GGG (17%) ** | GGG (25%) | CGG (100%) | GCC→A61; CCC→P61 | rs771648794 | G = 0.04% | Tolerated (1)
11,506,154 | Exon 3 (IB-6) | GGG | GGG (100%) | AGG (17%) | AGG (4%) * | GGG (100%) | CCC→P82; UCC→S82 | n.a. | n.a. | Tolerated (0.15)
11,506,150 | Exon 3 (IB-6) | GGT | GGT (100%) | GAT (14%) | GGT (100%) | GAT (6%) * | CCA→P83; CUA→L83 | rs747444571 | A = 0% | Damaging (0.03)
11,506,079 | Exon 3 (IB-6) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P107; UCU→S107 | n.a. | n.a. | Tolerated (0.06)
11,506,075 | Exon 3 (IB-6) | GGA | GGA (100%) | GGA (100%) | GAA (13%) | GGA (100%) | CCU→P108; CUU→L108 | n.a. | n.a. | Damaging (0.01)
11,506,070 | Exon 3 (IB-6) | CCC | CCC (100%) | CCC (100%) | TCC (12%) | CCC (100%) | GGG→G110; AGG→R110 | n.a. | n.a. | Tolerated (0.3)
11,506,057 | Exon 3 (IB-6) | AGG | AGG (100%) | AAG (11%) | AAG (5%) * | AGG (100%) | UCC→S114; UUC→F114 | n.a. | n.a. | Damaging (0.03)
11,506,052 | Exon 3 (IB-6) | GGA | GGA (100%) | AGA (10%) * | AGA (18%) | GGA (100%) | CCU→P116; UCU→S116 | rs1372423355 | A = 0% | Tolerated (0.06)

PRB2 (reverse reading, chromosome 12)
11,548,429 | Exon 1 (Signal) | CGG | CGG (100%) | CAG (3%) * | CAG (13%) | CGG (100%) | GCC→A11(sp); GUC→V11(sp) | rs1415819382 | A = 0% | Damaging (0)
11,547,429 | Exon 2 (IB-1) | CCT | TCT (4%) * | CCT (100%) | TCT (12%) | CCT (100%) | GGA→G18; AGA→R18 | n.a. | n.a. | Damaging (0.2)
11,546,899 | Exon 3 (IB-1) | CCT | CCT (100%) | CTT (11%) | CCT (100%) | CCT (100%) | GGA→G22; GAA→E22 | rs188924826 | T = 0.007% | Tolerated (0.1)
11,546,894 | Exon 3 (IB-1) | GGG | GGG (100%) | AGG (14%) | GGG (100%) | GGG (100%) | CCC→P24; UCC→S24 | n.a. | n.a. | Tolerated (0.73)
11,546,872 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | GAA (11%) | GGA (100%) | CCU→P31; CUU→L31 | rs748769813 | A = 0% | Tolerated (0.46)
11,546,830 | Exon 3 (IB-1) | GGG | GGG (100%) | GAG (9%) * | GAG (17%) | GGG (100%) | CCC→P45; CUC→L45 | n.a. | n.a. | Tolerated (0.1)
11,546,828 | Exon 3 (IB-1) | GGT | AGT (3%) * | GGT (100%) | AGT (17%) | GGT (100%) | CCA→P46; UCA→S46 | rs755161117 | A = 0.007% | Tolerated (0.36)
11,546,825 | Exon 3 (IB-1) | GTT | GTT (97%) | GTT (100%) | ATT (17%) | GTT (100%) | CAA→Q47; UAA→stop | n.a. | n.a. | Damaging due to stop
11,546,810 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P52; UCU→S52 | rs1347881375 | A = 0% | Tolerated (0.97)
11,546,809 | Exon 3 (IB-1) | GGA | GGA (100%) | GAA (6%) * | GAA (12%) | GGA (100%) | CCU→P52; CUU→L52 | n.a. | n.a. | Tolerated (0.3)
11,546,807 | Exon 3 (IB-1) | GTT | GTT (97%) | ATT (11%) | ATT (11%) | GTT (100%) | CAA→Q53; UAA→stop | n.a. | n.a. | Damaging due to stop
11,546,792 | Exon 3 (IB-1) | GGA | GGA (100%) | AGA (18%) | GGA (100%) | GGA (100%) | CCU→P58; UCU→S58 | n.a. | n.a. | Tolerated (0.76)
11,546,780 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (100%) | AGT (12%) | GGT (100%) | CCA→P62; UCA→S62 | n.a. | n.a. | Tolerated (0.64)
11,546,770 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P65; CUA→L65 | n.a. | n.a. | Tolerated (1)
11,546,764 | Exon 3 (IB-1) | GGT | GGT (100%) | GGT (96%) | GAT (12%) | GGT (100%) | CCA→P67; CAA→Q67 | rs201994479 | T = 0.008% | Tolerated (0.43)
11,546,732 | Exon 3 (IB-1) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P78; UCU→S78 | n.a. | n.a. | Tolerated (0.38)
11,546,716 | Exon 3 (IB-1) | GTT | GAT (4%) * | GAT (14%) | GTT (97%) | GTT (100%) | CAA→Q83; CUA→L83 | n.a. | n.a. | Tolerated (0.32)
11,546,686 | Exon 3 (IB-1) | GCT | GTT (42%) | GTT (39%) | GTT (51%) | GTT (29%) | CGA→R93; CAA→Q93 | rs76832300 | n.a. | Tolerated (0.5)
11,546,677 | Exon 3 (IB-1) | GCT | GCT (100%) | GCT (100%) | GCT (100%) | GTT (24%) | CGA→R96; CAA→Q96 | rs201144571 | T = 0.08% | Tolerated (0.47)
11,546,647 | Exon 3 (P-J) | GGG | GGG (100%) | GGG (100%) | GAG (15%) | GGG (100%) | CCC→P10; CUC→L10 | n.a. | n.a. | Tolerated (0.18)
11,546,642 | Exon 3 (P-J) | GTT | GTT (100%) | GTT (100%) | ATT (17%) | GTT (100%) | CAA→Q12; UAA→stop | n.a. | n.a. | Damaging due to stop
11,546,627 | Exon 3 (P-J) | GGA | AGA (3%) * | AGA (11%) | AGA (5%) * | GGA (100%) | CCU→P17; UCU→S17 | n.a. | n.a. | Tolerated (0.45)
11,546,618 | Exon 3 (P-J) | GGA | GGA (100%) | GGA (93%) | AGA (17%) | GGA (100%) | CCU→P20; UCU→S20 | n.a. | n.a. | Tolerated (0.81)
11,546,617 | Exon 3 (P-J) | GGA | GGA (100%) | GGA (100%) | GAA (17%) | GGA (100%) | CCU→P20; CUU→L20 | rs780517289 | A = 0% | Tolerated (0.82)
11,546,615 | Exon 3 (P-J) | GGT | GGT (100%) | AGT (12%) | AGT (8%) * | GGT (100%) | CCA→P21; UCA→S21 | n.a. | n.a. | Tolerated (0.39)
11,546,614 | Exon 3 (P-J) | GGT | GGT (100%) | GAT (11%) | GGT (100%) | GGT (100%) | CCA→P21; CUA→L21 | n.a. | n.a. | Tolerated (0.29)
11,546,585 | Exon 3 (P-J) | GGG | GGG (100%) | GGG (100%) | AGG (13%) | GGG (100%) | CCC→P31; UCC→S31 | n.a. | n.a. | Tolerated (0.53)
11,546,581 | Exon 3 (P-J) | GGT | GTT (6%) * | GTT (13%) | GGT (100%) | GGT (100%) | CCA→P32; CAA→Q32 | n.a. | n.a. | Damaging (0.05)
11,546,566 | Exon 3 (P-J) | TTT | TCT (8%) * | TCT (12%) | TTT (100%) | TTT (100%) | AAA→K37; AGA→R37 | rs746515947 | C = 0% | Tolerated (1)
11,546,462 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (13%) | GGG (100%) | GGG (100%) | CCC→P9; UCC→S9 | rs201392419 | A = 0% | Tolerated (0.58)
11,546,395 | Exon 3 (IB-8a) | GGT | GTT (16%) | GTT (10%) * | GTT (13%) | GTT (4%) * | CCA→P31; CAA→Q31 | rs11054277 | T = 0.01% | Damaging (0)
11,546,380 | Exon 3 (IB-8a) | TTT | TCT (17%) | TCT (14%) | TCT (6%) * | TTT (100%) | AAA→K37; AGA→R37 | rs11054276 | C = 0.01% | Tolerated (1)
11,546,381 | Exon 3 (IB-8a) | TTT | TTT (100%) | CTT (100%) | TTT (100%) | GTT (13%) | AAA→K37; CAA→Q37 | rs201455726 | G = 0.2% | Tolerated (0.42)
11,546,369 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (12%) | GGG (100%) | GGG (100%) | CCC→P41; UCC→S41 | rs1238238576 | A = 0% | Tolerated (0.42)
11,546,347 | Exon 3 (IB-8a) | GTT | GAT (6%) * | GAT (4%) * | GAT (15%) | GTT (100%) | CAA→Q48; CUA→L48 | n.a. | n.a. | Tolerated (0.32)
11,546,342 | Exon 3 (IB-8a) | GGT | GGT (100%) | GGT (100%) | AGT (18%) | GGT (100%) | CCA→P50; UCA→S50 | n.a. | n.a. | Tolerated (0.41)
11,546,327 | Exon 3 (IB-8a) | CTG | CTG (100%) | TTG (11%) | TTG (18%) | CTG (100%) | GAC→D55; AAC→N55 | n.a. | n.a. | Tolerated (0.28)
11,546,314 | Exon 3 (IB-8a) | GTT | GCT (87%) | GCT (77%) | GCT (67%) | GCT (94%) | CAA→Q59; CGA→R59 | rs34305575 | C = 7.6% | Tolerated (0.35)
11,546,309 | Exon 3 (IB-8a) | CGG | GGG (12%) | GGG (13%) | GGG (18%) | GGG (5%) * | GCC→A61; CCC→P61 | rs201308939 | G = 3.8% | Tolerated (0.25)
11,546,305 | Exon 3 (IB-8a) | GCT | GTT (3%) * | GCT (100%) | GTT (11%) | GCT (100%) | CGA→R62; CAA→Q62 | rs199748368 | T = 0.07% | Tolerated (0.46)
11,546,300 | Exon 3 (IB-8a) | GGA | GGA (100%) | AGA (13%) | GGA (100%) | GGA (100%) | CCU→P64; UCU→S64 | rs755713521 | n.a. | Tolerated (0.66)
11,546,294 | Exon 3 (IB-8a) | CCT | CCT (100%) | TCT (13%) | CCT (100%) | CCT (100%) | GGA→G66; AGA→R66 | n.a. | n.a. | Damaging (0.03)
11,546,279 | Exon 3 (IB-8a) | GGT | AGT (2%) * | GGT (100%) | AGT (13%) | GGT (100%) | CCA→P71; UCA→S71 | n.a. | n.a. | Tolerated (0.67)
11,546,278 | Exon 3 (IB-8a) | GGT | GAT (2%) * | GGT (100%) | GAT (13%) | GGT (100%) | CCA→P71; CUA→L71 | rs766408532 | n.a. | Tolerated (0.26)
11,546,246 | Exon 3 (IB-8a) | GGG | GGG (100%) | GGG (100%) | AGG (14%) | GGG (100%) | CCC→P82; UCC→S82 | rs1440556057 | A = 0.0004% | Tolerated (0.42)
11,546,245 | Exon 3 (IB-8a) | GGG | GGG (97%) | GAG (7%) * | GAG (26%) | GAG (7%) * | CCC→P82; CUC→L82 | rs1262267049 | A = 0.0004% | Tolerated (0.15)
11,546,213 | Exon 3 (IB-8a) | GGG | GGG (100%) | AGG (8%) * | AGG (25%) | GGG (100%) | CCC→P93; UCC→S93 | rs1408969762 | n.a. | Tolerated (0.26)
11,546,187 | Exon 3 (IB-8a) | GTT | GTT (96%) | GTC (10%) * | GTC (12%) | GTC (4%) * | CAA→Q101; CAC→H101 | n.a. | n.a. | Tolerated (0.23)
11,546,161 | Exon 3 (IB-8a) | GTT | GAT (21%) | GTT (100%) | GAT (30%) | GTT (100%) | CAA→Q110; CUA→L110 | n.a. | n.a. | Tolerated (0.61)
11,546,089 | Exon 3 (P-F) | GGG | GGG (100%) | GAG (17%) ** | GAG (17%) | GGG (100%) | CCC→P10; CUC→L10 | n.a. | n.a. | Tolerated (0.61)
11,546,084 | Exon 3 (P-F) | GTT | GTT (100%) | GTT (100%) | ATT (15%) | GTT (100%) | CAA→Q12; UAA→stop | n.a. | n.a. | Damaging due to stop
11,546,059 | Exon 3 (P-F) | GGG | GGG (100%) | GAG (7%) * | GAG (21%) | GGG (100%) | CCC→P20; CUC→L20 | n.a. | n.a. | Tolerated (0.19)
11,546,050 | Exon 3 (P-F) | GGA | GTA (4%) * | GTA (13%) | GGA (100%) | GTA (7%) * | CCU→P23; CAU→H23 | n.a. | n.a. | Tolerated (0.56)
11,546,027 | Exon 3 (P-F) | GGG | GGG (100%) | AGG (11%) | AGG (7%) * | GGG (100%) | CCC→P31; UCC→S31 | rs1201001162 | n.a. | Tolerated (0.61)
11,546,023 | Exon 3 (P-F) | GGT | GGT (100%) | GTT (5%) * | GTT (13%) | GTT (4%) * | CCA→P32; CAA→Q32 | rs201391404 | T = 0.059% | Damaging (0.03)
11,546,009 | Exon 3 (P-F) | TTT | TTT (100%) | TTT (100%) | TTT (95%) | GTT (12%) | AAA→K37; CAA→Q37 | n.a. | n.a. | Tolerated (0.26)
11,545,975 | Exon 3 (P-F) | GTT | GAT (2%) * | GAT (16%) | GAT (33%) | GTT (100%) | CAA→Q48; CUA→L48 | n.a. | n.a. | Tolerated (0.31)
11,545,964 | Exon 3 (P-F) | GGT | GGT (100%) | CGT (20%) | CGT (22%) | CGT (19%) | CCA→P51; GCA→A51 | n.a. | n.a. | Tolerated (0.74)
11,545,904 | Exon 3 (P-H) | GGG | GGG (100%) | AGG (3%) * | AGG (11%) | GGG (100%) | CCC→P10; UCC→S10 | n.a. | n.a. | Tolerated (0.8)
11,545,868 | Exon 3 (P-H) | GGA | GGA (100%) | GGA (100%) | AGA (13%) | GGA (100%) | CCU→P22; UCU→S22 | n.a. | n.a. | Tolerated (0.69)
11,545,814 | Exon 3 (P-H) | GTC | GTC (100%) | ATC (4%) * | ATC (12%) | GTC (100%) | CAG→Q40; UAG→stop | n.a. | n.a. | Damaging due to stop
11,545,802 | Exon 3 (P-H) | GCG | GCG (100%) | GCG (100%) | ACG (11%) | GCG (100%) | CGC→R44; UGC→C44 | rs748815572 | A = 0% | Tolerated (0.07)
11,545,793 | Exon 3 (P-H) | GTT | GTT (100%) | ATT (12%) | GTT (100%) | GTT (100%) | CAA→Q47; UAA→stop | n.a. | n.a. | Damaging due to stop
11,545,790 | Exon 3 (P-H) | CCC | CCC (100%) | CCC (100%) | TCC (13%) | CCC (100%) | GGG→G48; AGG→R48 | n.a. | n.a. | Tolerated (0.7)

PRB3 (reverse reading, chromosome 12)
11,422,578 | Exon 1 (Signal) | CGG | CGG (100%) | CAG (14%) | CAG (3%) * | CGG (100%) | GCC→A8(sp); GUC→V8(sp) | rs1337927316 | n.a. | Tolerated (0.06)
11,421,578 | Exon 2 (Gl-5) | AGG | AGG (100%) | AAG (11%) | AAG (11%) | AGG (100%) | UCC→S14; UUC→F14 | n.a. | n.a. | Tolerated (0.32)
11,421,002 | Exon 3 (Gl-5) | GGG | GGG (100%) | AGG (11%) | AGG (4%) * | GGG (100%) | CCC→P45; UCC→S45 | rs533382585 | n.a. | Damaging (0.04)
11,420,989 | Exon 3 (Gl-5) | CCG | CCG (100%) | CTG (14%) | CTG (5%) * | CCG (96%) | GGC→G49; GAC→D49 | n.a. | n.a. | Damaging (0)
11,420,975 | Exon 3 (Gl-5) | CCA | TCA (2%) * | TCA (17%) | CCA (100%) | CCA (100%) | GGU→G54; AGU→S54 | rs1197023343 | n.a. | Tolerated (0.12)
11,420,974 | Exon 3 (Gl-5) | CCA | CCA (100%) | CTA (8%) * | CTA (21%) | CCA (100%) | GGU→G54; GAU→D54 | n.a. | n.a. | Tolerated (0.19)
11,420,971 | Exon 3 (Gl-5) | GGG | GGG (100%) | GGG (100%) | GAG (11%) | GGG (100%) | CCC→P55; CUC→L55 | n.a. | n.a. | Damaging (0.02)
11,420,956 | Exon 3 (Gl-5) | CCT | CCT (98%) | CCT (100%) | CTT (14%) | CCT (100%) | GGA→G60; GAA→E60 | rs745804122 | T = 0% | Tolerated (0.06)
11,420,945 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | TCT (14%) | TCT (4%) * | GGA→G64; AGA→R64 | rs781151188 | T = 0% | Damaging (0.02)
11,420,939 | Exon 3 (Gl-5) | GGG | GGG (100%) | AGG (11%) ** | AGG (11%) | GGG (100%) | CCC→P66; UCC→S66 | n.a. | n.a. | Damaging (0.04)
11,420,927 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | TCT (11%) | CCT (100%) | GGA→G70; AGA→R70 | n.a. | n.a. | Damaging (0)
11,420,926 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | CTT (16%) | CCT (100%) | GGA→G70; GAA→E70 | n.a. | n.a. | Damaging (0)
11,420,906 | Exon 3 (Gl-5) | GGT | GGT (100%) | GGT (100%) | AGT (12%) | GGT (100%) | CCA→P77; UCA→S77 | n.a. | n.a. | Damaging (0.04)
11,420,899 | Exon 3 (Gl-5) | GCA | GTA (73%) | GCA (100%) | GTA (65%) | GTA (80%) | CGU→R79; CAU→H79 | rs769836435 | T = 0.02% | Tolerated (0.59)
11,420,896 | Exon 3 (Gl-5) | GGC | GGC (100%) | GGC (100%) | GAC (13%) | GGC (100%) | CCG→P80; CUG→L80 | n.a. | n.a. | Tolerated (0.09)
11,420,836 | Exon 3 (Gl-5) | GCA | GTA (7%) * | GTA (5%) * | GTA (9%) * | GTA (22%) | CGU→R100; CAU→H100 | n.a. | n.a. | Tolerated (0.24)
11,420,815 | Exon 3 (Gl-5) | GGT | GTT (18%) | GGT (100%) | GGT (96%) | GGT (100%) | CCA→P107; CAA→Q107 | rs201963893 | T = 0% | Tolerated (0.45)
11,420,803 | Exon 3 (Gl-5) | CCT | CCT (100%) | CCT (100%) | CTT (15%) | CCT (100%) | GGA→G111; GAA→E111 | n.a. | n.a. | Tolerated (0.41)
11,420,800 | Exon 3 (Gl-5) | CCT | CCT (97%) | CCT (100%) | CTT (11%) | CCT (100%) | GGA→G112; GAA→E112 | n.a. | n.a. | Damaging (0.01)
11,420,780 | Exon 3 (Gl-5) | GGC | GGC (100%) | AGC (11%) | GGC (100%) | GGC (100%) | CCG→P119; UCG→S119 | n.a. | n.a. | Damaging (0.04)
11,420,779 | Exon 3 (Gl-5) | GGC | GAC (4%) * | GAC (6%) * | GAC (35%) | GGC (100%) | CCG→P119; CUG→L119 | n.a. | n.a. | Damaging (0.03)
11,420,728 | Exon 3 (Gl-5) | AGG | AAG (4%) * | AGG (100%) | AAG (11%) | AGG (100%) | UCC→S136; UUC→F136 | n.a. | n.a. | Damaging (0.04)
11,420,716 | Exon 3 (Gl-5) | GGC | GAC (4%) * | GGC (100%) | GAC (17%) | GGC (100%) | CCG→P140; CUG→L140 | n.a. | n.a. | Tolerated (0.12)
11,420,687 | Exon 3 (Gl-5) | GGG | GGG (98%) | AGG (15%) | GGG (100%) | GGG (100%) | CCC→P150; UCC→S150 | n.a. | n.a. | Tolerated (0.15)
11,420,686 | Exon 3 (Gl-5) | GGG | GGG (98%) | GAG (8%) * | GAG (18%) | GGG (100%) | CCC→P150; CUC→L150 | n.a. | n.a. | Tolerated (0.15)
11,420,614 | Exon 3 (Gl-2) | CCT | CCT (100%) | CCT (100%) | CTT (11%) | CCT (100%) | GGA→G132; GAA→E132 | rs768625455 | n.a. | NS
11,420,597 | Exon 3 (Gl-2) | CCA | CCA (100%) | CCA (100%) | TCA (13%) | CCA (100%) | GGU→G138; AGU→S138 | rs780713977 | n.a. | Tolerated (0.09)
11,420,588 | Exon 3 (Gl-2) | GGA | AGA (4%) * | AGA (10%) * | AGA (16%) | GGA (100%) | CCU→P141; UCU→S141 | n.a. | n.a. | Tolerated (0.78)
11,420,495 | Exon 3 (Gl-2) | GGT | AGT (12%) | AGT (3%) * | AGT (6%) * | AGT (14%) | CCA→P172; UCA→S172 | n.a. | n.a. | Tolerated (0.14)
11,420,308 | Exon 4 (Gl-2) | GGG | GGG (100%) | AGG (17%) | GGG (100%) | GGG (100%) | CCC→P234; UCC→S234 | rs760324380 | A = 0.0008% | Tolerated (0.09)
11,420,307 | Exon 4 (Gl-2) | GGG | GGG (100%) | GAG (12%) | GGG (100%) | GGG (100%) | CCC→P234; CUC→L234 | n.a. | n.a. | Damaging (0.03)
11,420,304 | Exon 4 (Gl-2) | GGT | GGT (100%) | GAT (12%) | GGT (100%) | GGT (100%) | CCA→P235; CUA→L235 | n.a. | n.a. | Damaging (0.01)
11,420,281 | Exon 4 (Gl-2) | GCA | GCA (100%) | ACA (13%) | ACA (10%) * | GCA (100%) | CGU→R243; UGU→C243 | rs758570507 | A = 0% | Damaging (0.05)
11,420,278 | Exon 4 (Gl-2) | GGG | GGG (100%) | GGG (100%) | AGG (11%) | GGG (100%) | CCC→P244; UCC→S244 | n.a. | n.a. | Tolerated (0.27)
11,420,182 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | AGT (11%) | GGT (100%) | CCA→P277; UCA→S277 | rs755939114 | A = 0% | Tolerated (0.06)
11,420,170 | Exon 4 (Gl-2) | CCC | CCC (100%) | CCC (100%) | TCC (11%) | CCC (100%) | GGG→G280; AGG→R280 | n.a. | n.a. | Tolerated (0.07)
11,420,161 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | AGT (13%) | GGT (100%) | CCA→P283; UCA→S283 | n.a. | n.a. | Tolerated (0.21)
11,420,160 | Exon 4 (Gl-2) | GGT | GGT (100%) | GGT (100%) | GAT (19%) | GGT (100%) | CCA→P283; CUA→L283 | n.a. | n.a. | Tolerated (0.09)
11,420,154 | Exon 4 (Gl-2) | TCT | TTT (3%) * | TCT (100%) | TTT (11%) | TCT (100%) | AGA→R285; AAA→K285 | n.a. | n.a. | Tolerated (0.63)

PRB4 (reverse reading, chromosome 12)
11,463,280 | Exon 1 (PGA) | TCA | TGA (100%) | TGA (100%) | TGA (97%) | TGA (100%) | AGU→S2; ACU→T2 | n.a. | n.a. | Tolerated (0.83)
11,461,801 | Exon 3 (PGA) | GCT | GCT (98%) | GCT (97%) | GTT (13%) | GCT (100%) | CGA→R23; CAA→Q23 | n.a. | n.a. | Tolerated (0.57)
11,461,772 | Exon 3 (PGA) | GCA | GCA (100%) | GCA (96%) | ACA (12%) | GCA (100%) | CGU→R33; UGU→C33 | rs77775235 | A = 0% | Tolerated (0.06)
11,461,769 | Exon 3 (PGA) | GGG | TGG (5%) * | TGG (9%) * | TGG (5%) * | TGG (13%) | CCC→P34; ACC→T34 | rs144658455 | T = 0% | Tolerated (0.53)
11,461,745 | Exon 3 (PGA) | GTT | CTT (8%) * | CTT (8%) * | CTT (5%) * | CTT (12%) | CAA→Q42; GAA→E42 | rs76859544 | C = 6.8% | Tolerated (1)
11,461,742 | Exon 3 (PGA) | CCT | TCT (10%) * | TCT (27%) | TCT (11%) | TCT (7%) * | GGA→G43; AGA→R43 | rs776943151 | T = 0.05% | Tolerated (0.45)
11,461,706 | Exon 3 (PGA) | GGG | TGG (14%) | TGG (23%) | TGG (13%) | TGG (20%) | CCC→P55; ACC→T55 | rs12308381 | T = 21.6% | Tolerated (0.12)
11,461,675 | Exon 3 (PGA) | GCT | GGT (1%) * | GGT (2%) * | GGT (2%) * | GGT (28%) | CGA→R65; CCA→P65 | rs75743553 | G = 0% | Tolerated (0.32)
11,461,673 | Exon 3 (PGA) | GGG | GGG (99%) | AGG (13%) | AGG (2%) * | GGG (100%) | CCC→P66; UCC→S66 | rs1332850459 | A = 0% | Tolerated (0.25)
11,461,580 | Exon 3 (PGA) | TGG | GGG (65%) | GGG (52%) | GGG (24%) | GGG (54%) | ACC→T97; CCC→P97 | n.a. | n.a. | Tolerated (0.81)
11,461,570 | Exon 3 (PGA) | GGA | GTA (51%) | GTA (54%) | GTA (8%) * | GTA (47%) | CCU→P100; CAU→H100 | n.a. | n.a. | Tolerated (0.59)
11,461,553 | Exon 3 (PGA) | TCT | CCT (13%) | CCT (15%) | TCT (100%) | CCT (24%) | AGA→R106; GGA→G106 | n.a. | n.a. | Tolerated (0.84)
11,461,550 | Exon 3 (PGA) | GGT | GGT (100%) | AGT (17%) | GGT (100%) | GGT (100%) | CCA→P107; UCA→S107 | n.a. | n.a. | Tolerated (0.50)
11,461,549 | Exon 3 (PGA) | GGT | GCT (13%) | GCT (6%) * | GGT (100%) | GCT (13%) | CCA→P107; CGA→R107 | n.a. | n.a. | Tolerated (0.9)
11,461,525 | Exon 3 (PGA) | AGG | AGG (100%) | AAG (100%) | AAG (100%) | AGG (100%) | UCC→S115; UUC→F115 | n.a. | n.a. | Damaging (0.04)
11,461,513 | Exon 3 (PGA) | GGT | GGT (100%) | GAT (10%) * | GAT (11%) | GGT (100%) | CCA→P119; CUA→L119 | n.a. | n.a. | Damaging (0.04)
11,461,471 | Exon 3 (PGA) | CCA | CCA (100%) | CTA (4%) * | CTA (14%) | CCA (100%) | GGU→G133; GAU→D133 | n.a. | n.a. | Tolerated (0.46)
11,461,421 | Exon 3 (PGA) | GGG | GGG (100%) | AGG (5%) * | AGG (6%) * | AGG (100%) | CCC→P150; UCC→S150 | n.a. | n.a. | Tolerated (0.18)
11,461,420 | Exon 3 (PGA) | GGG | GGG (100%) | GAG (11%) | GGG (100%) | GGG (100%) | CCC→P150; CUC→L150 | n.a. | n.a. | Tolerated (0.1)
11,461,412 | Exon 3 (PGA) | CTT | CTT (100%) | TTT (14%) | CTT (100%) | CTT (100%) | GAA→E153; AAA→K153 | n.a. | n.a. | Tolerated (0.85)
11,461,319 | Exon 4 (P-D P32A) | GGA | GGA (97%) | AGA (9%) * | AGA (11%) | GGA (100%) | CCU→P23; UCU→S23 | n.a. | n.a. | Tolerated (0.55)
11,461,309 | Exon 4 (P-D P32A) | GGT | GGT (100%) | GGT (100%) | GAT (11%) | GGT (100%) | CCA→P26; CUA→L26 | n.a. | n.a. | Damaging (0.01)
11,461,229 | Exon 4 (P-D P32A) | GGA | GGA (100%) | AGA (13%) | AGA (4%) * | GGA (100%) | CCU→P54; UCU→S54 | n.a. | n.a. | Tolerated (0.13)

a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored. The variants fixed at 100% in modern humans compared with ancient hominines are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.

Table 2.

Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRH2, HTN1, HTN3, AMY1A, STATH, and SMR3B gene loci.

Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score)
PRH2 (direct reading, chromosome 12)
11,082,885 Exon 2
(PRP-1)
GTT ATT (2%) * ATT (12%) ATT (4%) * GTT (100%) GUU→V12
AUU→I12
rs776898585 A = 0% N.S
11,082,894 Exon 2
(PRP-1)
GTA GTA (100%) ATA (12%) ATA (10%) * GTA (100%) GUA→V15
AUA→I15
n.a. n.a. Tolerated (0.26)
11,083,305 Exon 3
(PRP-1)
CCA CCA (98%) TCA (14%) TCA (14%) CCA (100%) CCA→P33
UCA→S33
n.a. n.a. Tolerated (0.07)
11,083,318 Exon 3
(PRP-1)
GGA GGA (100%) GAA (14%) GGA (100%) GGA (100%) GGA→G37
GAA→E37
n.a. n.a. Tolerated (0.07)
11,083,323 Exon 3
(PRP-1)
CAA CAA (100%) TAA (8%) * TAA (12%) CAA (100%) CAA→Q39
UAA→stop
n.a. n.a. Damaging due to stop
11,083,426 Exon 3
(PRP-1)
GGA GGA (100%) GGA (100%) GAA (11%) GGA (100%) GGA→G73
GAA→E73
n.a. n.a. Damaging (0.02)
11,083,431 Exon 3
(PRP-1)
CCA CCA (100%) TCA (13%) TCA (8%) * TCA (6%) * CCA→P75
UCA→S75
n.a. n.a. Tolerated (0.23)
11,083,452 Exon 3
(PRP-1)
GGA GGA (100%) AGA (6%) * AGA (14%) GGA (100%) GGA→G82
AGA→R82
n.a. n.a. Damaging (0.01)
11,083,455 Exon 3
(PRP-1)
GGC GGC (100%) AGC (17%) GGC (100%) GGC (100%) GGC→G83
AGC→S83
n.a. n.a. N.S.
11,083,488 Exon 3
(PRP-1)
GGA GGA (100%) GGA (100%) AGA (11%) GGA (100%) GGA→G94
AGA→R94
n.a. n.a. Damaging (0.04)
11,083,531 Exon 3
(PRP-1)
AGG AGG (100%) AGG (100%) AAG (18%) AGG (100%) AGG→R108
AAG→K108
n.a. n.a. N.S.
11,083,536 Exon 3
(PRP-1)
CAA CAA (100%) TAA (11%) CAA (100%) CAA (100%) CAA→Q110
UAA→stop
n.a. n.a. N.S.
11,083,545 Exon 3
(PRP-1)
CCC CCC (100%) TCC (12%) TCC (6%) * CCC (100%) CCC→P113
UCC→S113
rs1289206423 T = 0% N.S.
11,083,551 Exon 3
(PRP-1)
CAG CAG (97%) CAG (100%) TAG (13%) CAG (100%) CAG→Q115
UAG→stop
n.a. n.a. N.S.
11,083,570 Exon 3
(PRP-1)
GGT GGT (100%) GAT (18%) GGT (100%) GGT (100%) GGU→G121
GAU→D121
n.a. n.a. N.S.
11,083,575 Exon 3
(PRP-1)
CCC CCC (96%) TCC (8%) * TCC (15%) CCC (100%) CCC→P123
UCC→S123
n.a. n.a. N.S.
11,083,581 Exon 3
(PRP-1)
CCT CCT (100%) TCT (20%) TCT (8%) * CCT (100%) CCU→P125
UCU→S125
n.a. n.a. N.S.
11,083,582 Exon 3
(PRP-1)
CCT CCT (100%) CTT (13%) CTT (8%) * CCT (100%) CCU→P125
CUU→L125
n.a. n.a. N.S.
11,083,605 Exon 3
(PRP-1)
CCA CCA (100%) TCA (11%) CCA (100%) CCA (100%) CCA→P133
UCA→S133
rs1343870622 T = 0% N.S.
11,083,618 Exon 3
(PRP-1)
GGG GGG (100%) GAG (11%) GGG (100%) GGG (100%) GGG→G137
GAG→E137
n.a. n.a. N.S.
11,083,635 Exon 3
(PRP-1)
CCT CCT (100%) CCT (100%) TCT (16%) CCT (100%) CCU→P143
UCU→S143
n.a. n.a. N.S.
11,083,636 Exon 3
(PRP-1)
CCT CCT (100%) CCT (100%) CTT (11%) CCT (100%) CCU→P143
CUU→L143
n.a. n.a. N.S.
11,083,663 Exon 3
(C-term removal)
TCT TCT (100%) TCT (100%) TTT (17%) TCT (100%) UCU→S152(rem)
UUU→F152(rem)
rs746351335 n.a. N.S.
HTN1 (direct reading, chromosome 4)
70,920,165 Exon 4 CAT CAT (100%) TAT (2%) * TAT (13%) CAT (100%) CAU→H15
UAU→Y15
n.a. n.a. Tolerated (0.37)
70,921,215 Exon 5 GAA GAA (100%) AAA (3%) * AAA (11%) GAA (100%) GAA→E16
AAA→K16
n.a. n.a. N.S
70,921,234 Exon 5 CGA CAA (2%) * CAA (58%) CAA (3%) * CGA (100%) CGA→R32
CAA→Q32
rs375127098 A = 0.014% N.S
HTN3 (direct reading, chromosome 4)
70,896,460 Exon 2
(Signal)
ATG ATG (100%) ATA (11%) ATG (100%) ATG (100%) AUG→M0(sp)
AUA→I0(sp)
n.a. n.a. N.S
70,897,696 Exon 3
(Signal)
GGA GGA (100%) AGA (12%) AGA (4%) * GGA (100%) GGA→G17(sp)
AGA→R17(sp)
rs1254624179 n.a. N.S
AMY1A (reverse reading, chromosome 1)
104,238,248 Exon 2
(Signal)
ACC ACC (100%) ACC (100%) ATC (15%) ACC (100%) UGG→W4(sp)
UAG→stop
n.a. n.a. Damaging due to stop
104,238,189 Exon 2 GCT GCT (100%) ACT (13%) ACT (20%) ** GCT (100%) CGA→R10
UGA→stop
n.a. n.a. Damaging due to stop
104,237,696 Exon 3 ACC ACC (100%) ACC (100%) ATC (17%) ACC (100%) UGG→W59
UAG→stop
n.a. n.a. Damaging due to stop
104,237,685 Exon 3 GTT GTT (100%) GTT (100%) ATT (14%) GTT (100%) CAA→Q63
UAA→stop
n.a. n.a. Damaging due to stop
104,237,626 Exon 3 TAC TAC (100%) TAC (100%) TAT (15%) TAC (100%) AUG→M82
AUA→I82
n.a. n.a. Damaging (0.01)
104,236,795 Exon 4 GCA GCA (100%) GCA (100%) ACA (13%) GCA (100%) CGU→R92
UGU→C92
n.a. n.a. Damaging (0)
104,236,666 Exon 4 CTA CTA (100%) CTA (100%) TTA (11%) CTA (100%) GAU→D135
AAU→N135
n.a. n.a. Tolerated (0.08)
104,236,654 Exon 4 CCA CCA (100%) TCA (5%) * TCA (11%) CCA (100%) GGU→G139
AGU→S139
n.a. n.a. Tolerated (0.6)
104,236,152 Exon 5 CAG CAG (100%) TAG (15%) TAG (20%) CAG (100%) GUC→V157
AUC→I157
n.a. n.a. Tolerated (0.17)
104,236,146 Exon 5 CTA CTA (100%) TTA (8%) * TTA (12%) CTA (100%) GAU→D159
AAU→N159
n.a. n.a. Tolerated (1)
104,236,139 Exon 5 GCA GTA (4%) * GTA (7%) * GTA (12%) GCA (100%) CGU→R161
CAU→H161
n.a. n.a. Damaging (0.01)
104,236,080 Exon 5 CTT CTT (100%) CTT (100%) TTT (13%) CTT (100%) GAA→E181
AAA→K181
n.a. n.a. Tolerated (0.11)
104,235,996 Exon 5 CGT CGT (96%) CGT (100%) TGT (13%) CGT (100%) GCA→A209
ACA→T209
n.a. n.a. Tolerated (0.27)
104,235,164 Exon 6 CTC CTC (100%) CTC (100%) TTC (11%) CTC (100%) GAG→E240
AAG→K240
n.a. n.a. Damaging (0.01)
104,235,148 Exon 6 TCA TCA (100%) TCA (100%) TTA (18%) TCA (100%) AGU→S245
AAU→N245
n.a. n.a. Tolerated (0.52)
104,235,083 Exon 6 GCG ACG (3%) * ACG (6%) * ACG (12%) GCG (100%) CGC→R267
UGC→C267
n.a. n.a. Damaging (0)
104,234,224 Exon 7 CCT CCT (100%) CCT (100%) CTT (13%) CCT (100%) GGA→G281
GAA→E281
n.a. n.a. Damaging (0)
104,234,218 Exon 7 CCA CCA (100%) CTA (13%) CTA (15%) CCA (100%) GGU→G283
GAU→D283
n.a. n.a. Tolerated (0.25)
104,234,129 Exon 7 GAA GAA (100%) AAA (13%) GAA (100%) GAA (100%) CUU→L313
UUU→F313
n.a. n.a. Damaging (0)
104,234,125 Exon 7 TGG TGG (100%) TAG (17%) TGG (100%) TGG (100%) ACC→T314
AUC→I314
n.a. n.a. Damaging (0)
104,233,978 Exon 8 GGA GGA (100%) AGA (13%) AGA (11%) GGA (100%) CCU→P332
UCU→S332
n.a. n.a. Damaging (0.05)
104,233,977 Exon 8 GGA GGA (100%) GAA (6%) * GAA (11%) GGA (100%) CCU→P332
CUU→L332
n.a. n.a. Damaging (0)
104,233,963 Exon 8 GCT GCT (100%) GCT (100%) ACT (14%) GCT (100%) CGA→R337
UGA→stop
rs19955486 A = 0.08% Damaging due to stop
104,231,858 Exon 9 ACA ACA (100%) ACA (100%) ATA (11%) ACA (100%) UGU→C378
UAU→Y378
n.a. n.a. Damaging (0)
104,231,680 Exon 10 CAC CAC (100%) TAC (4%) * TAC (20%) CAC (100%) GUG→V401
AUG→M401
n.a. n.a. Damaging (0)
104,231,643 Exon 10 CCC CCC (100%) CTC (5%) * CTC (11%) CCC (100%) GGG→G413
GAG→E413
n.a. n.a. Damaging (0.02)
104,231,622 Exon 10 CCC CCC (100%) CCC (100%) CTC (13%) CCC (100%) GGG→G420
GAG→E420
n.a. n.a. Tolerated (0.08)
104,230,237 Exon 11 TGA TGA (100%) TGA (100%) TAA (13%) TGA (100%) ACU→T442
AUU→I442
n.a. n.a. Damaging (0)
104,230,129 Exon 11 AGA AGA (100%) AGA (100%) AAA (13%) AGA (100%) UCU→S478
UUU→F478
n.a. n.a. Tolerated (0.62)
STATH (direct reading, chromosome 4)
70,866,583 Exon 5 GGG GGG (100%) AGG (13%) AGG (3%) * GGG (100%) GGG→G17
AGG→R17
n.a. n.a. N.A.
70,866,616 Exon 5 CCA CCA (98%) CCA (100%) TCA (11%) TCA (3%) * CCA→P28
UCA→S28
n.a. n.a. N.A.
70,866,626 Exon 5 CCA CCA (100%) CTA (15%) CCA (100%) CCA (96%) CCA→P31
CUA→L31
n.a. n.a. N.A.
70,866,628 Exon 5 CAA CAA (100%) TAA (15%) CAA (100%) CAA (100%) CAA→Q32
UAA→stop
n.a. n.a. Damaging due to stop
SMR3B (direct reading, chromosome 4)
71,255,405 Exon 3 AGG AGG (100%) AGG (100%) AAG (12%) AGG (100%) AGG→R5
AAG→K5
rs777831757 A = 0% NS
71,255,444 Exon 3 CCT CCT (100%) CTT (12%) CTT (3%) * CCT (100%) CCU→P18
CUU→L18
n.a. n.a. NS
71,255,495 Exon 3 GGG GGG (100%) GGG (94%) GAG (17%) GGG (100%) GGG→G35
GAG→E35
n.a. n.a. NS

a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored.

Table 3.

Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on CST1, CST2, CST3, CST4, CST5, CSTA, and CSTB gene loci.

Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score)
CST1 (reverse reading, chromosome 20)
23,731,494 Exon 1 (Signal) ATA GTA (100%) GTA (95%) GTA (100%) GTA (100%) UAU→Y3(sp)
CAU→H3(sp)
rs6076122 G = 71.1% Tolerated
(0.11)
23,731,463 Exon 1
(Signal)
TGG TAG (2%) * TAG (13%) TAG (5%) * TGG (100%) ACC→T13(sp)
AUC→I13(sp)
n.a. n.a. Tolerated
(0.39)
23,731,455 Exon 1
(Signal)
CAC CAC (100%) CAC (100%) TAC (16%) CAC (100%) GUG→V16(sp)
AUG→M16(sp)
n.a. n.a. Tolerated
(0.23)
23,731,446 Exon 1
(Signal)
CGG CGG (100%) CGG (100%) TGG (11%) CGG (100%) GCC→A19(sp)
ACC→T19(sp)
rs1425228752 T = 0.001% Damaging
(0.01)
23,731,439 Exon 1 TCG TCG (100%) TTG (6%) * TTG (14%) TCG (100%) AGC→S2
AAC→N2
n.a. n.a. Tolerated
(0.15)
23,731,428 Exon 1 CTC CTC (100%) CTC (100%) TTC (21%) CTC (100%) GAG→E6
AAG→K6
rs1292698911 T = 0.0004% Tolerated
(0.66)
23,731,394 Exon 1 CGT CGT (100%) CAT (13%) CGT (100%) CGT (100%) GCA→A17
GUA→V17
n.a. n.a. Tolerated
(0.25)
23,731,344 Exon 1 CTC TTC (3%) * CTC (100%) TTC (11%) TTC (3%) * GAG→E34
AAG→K34
rs368203290 T = 0.008% Tolerated
(0.07)
23,731,307 Exon 1 GCA GCA (100%) GTA (14%) GCA (100%) GTA (6%) * CGU→R46
CAU→H46
rs758187154 T = 0% Damaging
(0.01)
23,731,281 Exon 1 GTT GTT (100%) GTT (100%) ATT (13%) GTT (100%) CAA→Q55
UAA→stop
n.a. n.a. Damaging due
to stop
23,729,759 Exon 2 CCC CCC (100%) CCC (100%) CGC (26%) CCC (100%) GGG→G59
GCG→A59
n.a. n.a. Tolerated
(1)
23,729,700 Exon 2 GGG GGG (100%) GGG (100%) AGG (11%) GGG (100%) CCC→P79
UCC→S79
n.a. n.a. Tolerated
(0.38)
23,729,699 Exon 2 GGG GGG (100%) GAG (3%) * GAG (11%) GGG (100%) CCC→P79
CUC→L79
rs756782667 A = 0% Tolerated
(0.06)
23,729,687 Exon 2 TGG TGG (100%) TAG (16%) TAG (4%) * TGG (100%) ACC→T83
AUC→I83
n.a. n.a. Damaging
(0.02)
23,728,503 Exon 3 GGG GGG (100%) AGG (11%) AGG (3%) * GGG (100%) CCC→P106
UCC→S106
rs754531104 A = 0.004% Tolerated
(0.09)
23,728,494 Exon 3
(Cys-SN)
TTG CTG (10%) * CTG (11%) CTG (14%) CTG (4%) * AAC→N109
GAC→D109
rs3188319 C = 0.004% Tolerated
(1)
23,728,490 Exon 3 TCT TTT (2%) * TTT (14%) TCT (100%) TCT (100%) AGA→R110
AAA→K110
n.a. n.a. Tolerated
(1)
23,728,487 Exon 3 TCC TCC (100%) TTC (13%) TTC (7%) * TCC (100%) AGG→R111
AAG→K111
rs3188320 T = 0% Tolerated
(0.85)
CST2 (reverse reading, chromosome 20)
23,807,260 Exon 1
(Signal)
CGG CGG (100%) CGG (100%) CAG (14%) CGG (100%) GCC→A12(sp)
GUC→V12(sp)
rs1411653443 A = 0.007% Damaging
(0.02)
23,807,257 Exon 1
(Signal)
TGG TGG (100%) TAG (14%) TGG (100%) TGG (100%) ACC→T13(sp)
AUC→I13(sp)
n.a. n.a. Tolerated
(0.43)
23,807,245 Exon 1
(Signal)
CGG CGG (100%) CAG (14%) CGG (100%) CGG (100%) GCC→A17(sp)
GUC→V17(sp)
n.a. n.a. Tolerated
(0.1)
23,807,231 Exon 1 GGG GGG (100%) AGG (14%) AGG (8%) * GGG (100%) CCC→P3
UCC→S3
n.a. n.a. Tolerated
(1)
23,807,162 Exon 1 GCA ACA (95%) ACA (100%) ACA (100%) ACA (8%) * CGU→R26
UGU→C26
rs111349461 A = 0.06% Damaging
(0.05)
23,807,138 Exon 1 CTC TTC (3%) * TTC (12%) TTC (6%) * CTC (100%) GAG→E34
AAG→K34
rs541427772 T = 0.017% Tolerated
(0.07)
23,807,102 Exon 1 GCG ACG (3%) * GCG (100%) ACG (11%) GCG (100%) CGC→R46
UGC→C46
rs112783512 A = 0.019% Tolerated
(0.07)
23,807,093 Exon 1 GCC GCC (100%) ACC (4%) ACC (20%) GCC (100%) CGG→R49
UGG→W49
rs55860552 A = 0.12% Damaging
(0)
23,807,084 Exon 1 GCT GCT (100%) ACT (5%) * ACT (15%) GCT (100%) CGA→R52
UGA→stop
rs568411970 A = 0% Damaging due
to stop
23,807,077 Exon 1 TCC TCC (100%) TCC (100%) TTC (13%) TCC (100%) AGG→R54
AAG→K54
n.a. n.a. Tolerated
(0.34)
23,807,075 Exon 1 CTC CTC (100%) TTC (12%) TTC (12%) CTC (100%) GAG→E55
AAG→K55
n.a. n.a. Tolerated
(1)
23,805,930 Exon 2 TAT CAT (7%) * CAT (5%) * CAT (14%) CAT (4%) * AUA→I67
GUA→V67
rs199856966 C = 0.004% Tolerated
(1)
23,805,917 Exon 2 GCT GTT (2%) * GTT (13%) GTT (5%) * GTT (2%) * CGA→R71
CAA→Q71
rs150428155 T = 0.008% Damaging
(0.01)
23,805,878 Exon 2 ACA ACA (100%) ACA (97%) ATA (14%) ACA (100%) UGU→C84
UAU→Y84
n.a. n.a. Damaging
(0)
23,805,875 Exon 2 CGG CGG (100%) CAG (15%) CAG (2%) * CGG (100%) GCC→A85
GUC→V85
n.a. n.a. Tolerated
(0.06)
23,804,730 Exon 3 ACG ACG (100%) ATG (7%) * ATG (11%) ACG (100%) UGC→C98
UAC→Y98
n.a. n.a. Damaging
(0)
23,804,702 Exon 3 ACC ACC (100%) ACT (12%) ACC (100%) ACC (100%) UGG→W107
UGA→stop
rs1380420803 n.a. Damaging due to stop
23,804,691 Exon 3 TAC TCC (13%) TCC (10%) * TCC (9%) * TAC (100%) AUG→M111
AGG→R111
rs202150666 C = 0.01% Tolerated
(0.31)
CST3 (reverse reading, chromosome 20)
23,618,472 Exon 1
(Signal)
GAG GAG (100%) AAG (8%) * AAG (15%) GAG (100%) CUC→L8(sp)
UUC→F8(sp)
rs1285248919 n.a. Damaging
(0)
23,618,433 Exon 1 GGG GGG (100%) GGG (100%) AGG (13%) GGG (100%) ** CCC→P22(sp)
UCC→S22(sp)
n.a. n.a. Tolerated
(0.5)
23,618,370 Exon 1 CAC CAC (100%) CAC (100%) TAC (13%) CAC (100%) GUG→V18
AUG→M18
n.a. n.a. Tolerated
(0.11)
23,618,358 Exon 1 CCA CCA (100%) TCA (22%) TCA (4%) * CCA (100%) GGU→G22
AGU→S22
n.a. n.a. Tolerated
(0.48)
23,618,357 Exon 1 CCA CCA (100%) CTA (11%) CCA (100%) CCA (100%) GGU→G22
GAU→D22
n.a. n.a. Tolerated
(0.56)
23,618,295 Exon 1 GTG GTG (100%) GTG (100%) ATG (13%) GTG (100%) CAC→H43
UAC→Y43
n.a. n.a. Tolerated
(1)
23,615,994 Exon 2 CCC CTC (3%) * CCC (100%) CTC (13%) CCC (100%) GGG→G59
GAG→E59
n.a. n.a. Damaging
(0.01)
23,614,564 Exon 3 GTC GTC (100%) GTC (100%) ATC (13%) GTC (100%) CAG→Q118
UAG→stop
n.a. n.a. Damaging due to stop
CST4 (reverse reading, chromosome 20)
23,669,566 Exon 1
(Signal)
TGG TGG (100%) TAG (7%) * TAG (11%) TGG (100%) ACC→T13(sp)
AUC→I13(sp)
rs770415022 n.a. Tolerated (0.37)
23,669,561 Exon 1
(Signal)
CGA CGA (100%) CGA (100%) CGA (100%) AGA (100%) GCU→A15(sp)
UCU→S15(sp)
n.a. n.a. Tolerated (0.39)
23,669,539 Exon 1 AGG AGG (100%) AAG (5%) * AAG (13%) AGG (100%) UCC→S3
UUC→F3
n.a. n.a. Tolerated (0.08)
23,669,470 Exon 1 GCA GCA (100%) GTA (15%) GCA (100%) GTA (17%) CGU→R26
CAU→H26
rs201273557 T = 0.01% Tolerated (0.08)
23,669,462 Exon 1 GTG GTG (100%) GTG (100%) ATG (18%) GTG (100%) CAC→H29
UAC→Y29
n.a. n.a. Tolerated (0.06)
23,669,408 Exon 1 GGC GGC (100%) AGC (12%) GGC (100%) GGC (100%) CCG→P47
UCG→S47
n.a. n.a. Tolerated (0.06)
23,667,835 Exon 2 AAA CAA (97%) CAA (100%) CAA (90%) AAA (100%) UUU→F58
GUU→V58
rs145608577 C = 0.2% Tolerated (1)
23,667,828 Exon 2 CCC CCC (100%) CTC (18%) CCC (100%) CCC (100%) GGG→G60
GAG→E60
rs144556333 T = 0.007% Damaging (0)
23,667,826 Exon 2 CAC CAC (100%) TAC (10%) * TAC (27%) CAC (100%) GUG→V61
AUG→M61
n.a. n.a. Tolerated (0.24)
23,667,808 Exon 2 CAT CAT (100%) TAT (13%) CAT (100%) TAT (4%) * GUA→V67
AUA→I67
rs774067751 T = 0.007% Tolerated (0.23)
23,667,792 Exon 2 TGG TGG (100%) TAG (13%) TGG (100%) TGG (100%) ACC→T72
AUC→I72
n.a. n.a. Damaging (0)
23,667,783 Exon 2 TGG TGG (100%) TGG (95%) TAG (15%) TGG (100%) ACC→T75
AUC→I75
rs760057501 A = 0% Damaging (0.01)
23,666,565 Exon 3 TAC TCC (88%) TCC (14%) TCC (80%) TAC (100%) AUG→M111
AGG→R111
rs779547810 C = 0% Tolerated (0.87)
CST5 (reverse reading, chromosome 20)
23,860,243 Exon 1 AGC AAC (3%) * AGC (100%) AAC (11%) AAC (5%) * UCG→S4
UUG→L4
rs145031249 A = 0.011% Tolerated (0.27)
23,860,211 Exon 1 GTA GTA (100%) GTA (100%) ATA (12%) GTA (100%) CAU→H15
UAU→Y15
n.a. n.a. Tolerated (1)
23,860,199 Exon 1 GAG GAG (100%) AAG (11%) GAG (100%) GAG (100%) CUC→L19
UUC→F19
rs370924959 A = 0% Tolerated (0.66)
23,860,178 Exon 1 ACA GCA (93%) GCA (100%) GCA (95%) GCA (100%) UGU→C26
CGU→R26
rs1799841 G = 43.2% Tolerated (1)
23,860,174 Exon 1 CGG CGG (100%) CGG (100%) CAG (11%) CGG (100%) GCC→A27
GUC→V27
n.a. n.a. Tolerated (0.18)
23,860,130 Exon 1 CTA CTA (100%) CTA (100%) TTA (14%) CTA (100%) GAU→D42
AAU→N42
rs1257216384 n.a. Tolerated (0.11)
23,860,093 Exon 1 CGG CGG (100%) CGG (100%) CAG (11%) CGG (100%) GCC→A54
GUC→V54
n.a. n.a. Tolerated (0.11)
23,858,200 Exon 2 TGG TGG (100%) TAG (22%) TGG (100%) TGG (100%) ACC→T76
AUC→I76
rs41282292 A = 0.061% Damaging (0)
CSTA (direct reading, chromosome 3)
122,044,197 Exon 1 GTT GTT (100%) ATT (11%) GTT (100%) GTT (100%) GUU→V20
AUU→I20
rs778366890 A = 0% Tolerated (0.23)
122,056,400 Exon 2 CCA CCA (100%) CCA (100%) TCA (12%) CCA (100%) CCA→P25
UCA→S25
n.a. n.a. Tolerated (0.74)
122,060,361 Exon 3 CTT CTT (100%) CTT (100%) TTT (16%) CTT (100%) CUU→L82
UUU→F82
n.a. n.a. Damaging (0)
122,060,373 Exon 3 CAG CAG (100%) CAG (100%) TAG (12%) CAG (100%) CAG→Q86
UAG→stop
n.a. n.a. Damaging due
to stop
CSTB (reverse reading, chromosome 21)
45,194,562 Exon 2 CGC TGC (2%) * TGC (11%) CGC (100%) CGC (100%) GCG→A49
ACG→T49
rs559906825 T = 0.007% Damaging (0)
45,194,138 Exon 3 TGG TGG (98%) TCG (13%) TGG (95%) TGG (100%) ACC→T81
AGC→S81
n.a. n.a. Tolerated (0.65)
45,194,132 Exon 3 AGA AGA (100%) AGA (100%) AAA (15%) AGA (100%) UCU→S83
UUU→F83
n.a. n.a. Tolerated (0.1)

a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available. The variants fixed at 100% in modern humans compared with ancient hominines are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.

In the following subsections, the results are detailed one locus at a time. Note that, given the extreme structural heterogeneity of the tested genes, which have multiple alleles of different lengths, the nucleotide variations are indicated according to their genomic coordinates (see Section 4 for details).
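As a reading aid for the "Codon→Amino Acid" column of Tables 1 to 3, the following minimal Python sketch reproduces how a tabulated genomic triplet maps to the listed codon. It assumes, based on the table entries themselves, that triplets of "direct reading" genes are transcribed as written (T replaced by U), whereas triplets of "reverse reading" genes are complemented base by base. The function names are illustrative and are not part of the original analysis pipeline.

```python
# Standard genetic code, indexed in U, C, A, G order at each codon position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# DNA base -> complementary RNA base (assumed rule for "reverse reading" genes).
COMPLEMENT_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def codon_from_triplet(triplet, reverse_reading):
    """Convert an uppercase genomic triplet, as listed in the tables, to an mRNA codon."""
    if reverse_reading:
        return "".join(COMPLEMENT_RNA[base] for base in triplet)
    return triplet.replace("T", "U")


def amino_acid(triplet, reverse_reading):
    """One-letter amino acid for a tabulated triplet ('*' marks a stop codon)."""
    return CODON_TABLE[codon_from_triplet(triplet, reverse_reading)]


# PRB3 (reverse reading), position 11,420,281: GCA -> ACA is listed as R243 -> C243.
print(amino_acid("GCA", True), amino_acid("ACA", True))
# PRH2 (direct reading), position 11,082,885: GTT -> ATT is listed as V12 -> I12.
print(amino_acid("GTT", False), amino_acid("ATT", False))
```

The same helper flags premature stops: the AMY1A change ACC→ATC (reverse reading) yields UGG→UAG, i.e., W→stop, as reported in Table 2.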

2.1. Nucleotide Variations in the Gene Loci Encoding Basic Proline-Rich Proteins

2.1.1. PRB1 Gene

The genomic alignment allowed us to identify 130 nucleotide changes in the PRB1 gene in ancient hominines compared with modern humans (Table 1 and Table S1). Fifty-five of these were detected within coding exons and included ten synonymous and forty-five nonsynonymous nucleotide substitutions. Among the nonsynonymous nucleotide substitutions, 20 corresponded to SNPs annotated in modern humans (Table 1). SIFT prediction indicated that 46% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). The T-C transition, which occurred in modern humans at position 11,506,774, causing the substitution of R72 with a Q in the II-2 isoform (Table 1 and Figure 4a), may have an impact on post-translational protein processing. Indeed, the modern human R72 residue is part of the R72SPR75 consensus sequence recognized by the pro-protein convertase responsible for the cleavage between the II-2 and P-E peptides. Therefore, we may hypothesize that in archaic species, the PRB-1-encoded protein was a fused peptide spanning 136 amino acids, integrating the modern II-2 and P-E peptides (Table 1 and Figure 4a). The sequences of the peptides and the resulting putative archaic protein primary structure (named PRB-1 salivary archaic fusion 1 peptide, PRB-1 SAF-1) are reported in Figure 4a. The remaining seventy-five nucleotide changes identified in the PRB1 locus fell within noncoding regions, namely fifty-four in introns, six in upstream regions, one in the 5′UTR, one in the 3′UTR, and thirteen in downstream regions (Table S1).

Figure 4.

Predicted archaic hominins’ PRB-1 (panel (a)) and PRB-2 (panels (b–d)) protein variants.

2.1.2. PRB2 Gene

One hundred and thirty-six nucleotide substitutions were detected in the PRB2 locus in ancient hominines compared with modern humans (Table 1 and Table S2). Thirty-seven of these were identified in introns, ten in upstream regions, one in the 3′UTR, and eight in downstream regions. The remaining eighty variations were found in coding regions, namely two in exon 1 (corresponding to the signal peptide), one in exon 2, and the remainder in exon 3 (Table 1 and Table S2). Of note, the modern human sequence reported in the UniProtKB database corresponded to the L allele coding for the common isoforms IB-8a Con1- and P-H S1, the first one with a P residue instead of an S at position 100, the second one with an S residue instead of an A at position 1 [8]. Of the 80 sequence variants found in coding exons, 64 were nonsynonymous, causing amino acid substitutions. SIFT prediction indicated that 19% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). Twenty-six out of the sixty-four nonsynonymous substitutions were annotated as common variants (SNPs) in modern humans (Table 1). In particular, two changes occurring at 11,546,686 bp and 11,546,677 bp caused the substitution of R93 and R96 with Q within the ancient IB-1 isoform. The two archaic residues were found in all four species (Table 1). This implied that the archaic hominins’ counterpart of the R93SPR96 consensus sequence, recognized by the pro-protein convertase, lacked the two key arginine residues, thus disabling the post-translational cleavage. Therefore, ancient saliva should have featured a protein deriving from the fusion of the IB-1 and P-J peptides, spanning 157 amino acids (named the PRB-2 salivary archaic fusion 2 peptide, PRB-2 SAF-2 peptide, in Figure 4b).
Conversely, the presence of a C nucleotide at 11,546,314 bp in Neanderthals and Denisovans, instead of the T found in modern humans, led to the introduction of an R instead of the Q59 (Q217 in the pro-protein) of the IB-8a Con1- isoform. This archaic primary structure would then include an additional pro-protein convertase consensus sequence, R59SAR62, causing the cleavage of the IB-8a Con1- protein into two smaller peptides. Given the removal of the C-terminal arginine residue that is usually observed for almost all the bPRPs, both peptides should be 61 amino acid residues long (Figure 4c). We named these putative archaic hominins’ PRB-2 variants the PRB-2 salivary archaic cleavage 1 peptide (PRB-2 SAC-1 peptide) and the PRB-2 salivary archaic cleavage 2 peptide (PRB-2 SAC-2 peptide); both are shown in Figure 4c. Of note, the sequence of the PRB-2 SAC-1 peptide corresponds to that of the modern human P-J peptide, except for an alanine (A61) instead of a serine at the last amino acid residue. The sequence of the PRB-2 SAC-2 peptide corresponds to that of the modern human P-F peptide, except for a serine (S61) instead of an alanine at the last amino acid residue (Figure 4d and [9]). The variation at 11,546,395 bp indicated that in archaic hominins, the P31 (P189 of the pro-protein) residue was replaced by a Q in the IB-8a Con1-; this change probably has a deleterious effect on protein function, as predicted by SIFT analysis.
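The gain or loss of a convertase recognition site described above can be illustrated with a short Python sketch. The pattern `RS[AP]R` is an assumption that covers only the three motifs quoted in the text (R72SPR75, R93SPR96, and R59SAR62); it is not a general convertase consensus. The peptide strings are hypothetical toy sequences chosen for illustration, not the actual PRB1 or PRB2 sequences.

```python
import re

# Assumed pattern: matches only the RSPR and RSAR motifs quoted in the text.
# The lookahead allows overlapping sites to be reported.
CONVERTASE_SITE = re.compile(r"(?=(RS[AP]R))")


def convertase_sites(peptide):
    """Return (1-based position of the first arginine, motif) for each site found."""
    return [(m.start() + 1, m.group(1)) for m in CONVERTASE_SITE.finditer(peptide)]


# Hypothetical toy peptides (NOT the real bPRP sequences).
with_site = "PPGKPQGPPRSPRGPPQ"     # contains an RSPR site
site_lost = "PPGKPQGPPQSPRGPPQ"     # first arginine replaced by Q: site abolished
without_site = "GPPQSARGPP"         # QSAR is not recognized
site_gained = "GPPRSARGPP"          # Q replaced by R: new RSAR site

for name, seq in [("with_site", with_site), ("site_lost", site_lost),
                  ("without_site", without_site), ("site_gained", site_gained)]:
    print(name, convertase_sites(seq))
```

In this toy setting, a single R→Q change removes the site (the fusion scenario proposed for PRB-1 SAF-1 and PRB-2 SAF-2), while a single Q→R change creates one (the cleavage scenario proposed for PRB-2 SAC-1 and SAC-2).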

The protein name, the modifications with respect to modern humans, and the corresponding frequencies found in Neanderthals, Chagyrskayas, Vindijas and/or Denisovans are reported for each archaic protein. The positions of each substitution are also reported in the primary sequences (residues in bold characters). q: pyroglutamic acid; S: phosphorylated serine.

2.1.3. PRB3 Gene

We identified 163 nucleotide variations in the PRB3 locus in ancient hominines compared with modern humans (Table 1 and Table S3). Of these, 53 were detected in coding regions and 110 in noncoding regions (71 within introns, 14 in upstream regions, 2 in the 3′UTR, and 23 in downstream regions; Table S3). The archaic sequences were compared with the allele Gl-2 (or PRP-3M) of modern humans. Fourteen variations identified in coding exons were synonymous, whereas thirty-nine changes were missense variants. Twelve out of the thirty-nine nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 1). The PRP3 protein contains eight N-glycosylated Asn residues falling into the NXS/pS sequon; among the substitutions found in the PRB3 gene, only the one at position 11,420,728 falls within the consensus sequence (S136F), and SIFT predicted a deleterious effect on protein function (Table 1). Overall, 37.5% of the substitutions were predicted to be deleterious to protein function (Table 1). The noncoding variant found at position 11,420,458 could affect the splicing of PRB3 transcripts in ancient hominins since it fell within the GU consensus site (splice donor site) at the 5′ end of intron 3 (Table S3).

2.1.4. PRB4 Gene

For the PRB4 locus, we detected 129 nucleotide substitutions in ancient hominines compared with modern humans (Table 1 and Table S4). Of these, 27 were found in coding exons, including 4 synonymous and 23 nonsynonymous (Table 1), and 102 in noncoding regions (Table S4). The archaic sequence was compared with the small allele of the modern human locus coding for P-D peptides and glycosylated protein A (PGA). The 23 missense variants were all found within coding regions for the glycosylated protein A, while none of the identified variations would affect the P-D variant (see Table 1 for details). These variations had no consequence on the consensus sequence of the pro-protein convertase or on the sequence of the glycosylation sites. It is interesting to observe that all the archaic sequences code for the P-D P32A variant. Overall, seven out of the twenty-three nonsynonymous substitutions in the PRB4 locus corresponded to annotated common variants in modern humans, and only 13% were predicted to be deleterious to protein function (Table 1).
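The 13% figure can be retraced from the PRB4 rows of Table 1 with a minimal Python sketch. The scores below are transcribed from the SIFT column of the 23 PRB4 missense rows; the cutoff (score ≤ 0.05 counted as damaging) is an assumption chosen to match how the tables label their entries, where a score of 0.05 is reported as "Damaging". Stop-gain variants, which the tables report as "Damaging due to stop" without a score, would need separate handling and do not occur in PRB4.

```python
# Assumed cutoff, consistent with the table labels (0.05 is listed as "Damaging").
SIFT_DAMAGING_THRESHOLD = 0.05

# SIFT scores of the 23 PRB4 missense variants, in the order listed in Table 1.
PRB4_SIFT_SCORES = [
    0.83, 0.57, 0.06, 0.53, 1.0, 0.45, 0.12, 0.32, 0.25, 0.81,
    0.59, 0.84, 0.50, 0.9, 0.04, 0.04, 0.46, 0.18, 0.1, 0.85,
    0.55, 0.01, 0.13,
]


def damaging_summary(scores, threshold=SIFT_DAMAGING_THRESHOLD):
    """Return (number of damaging variants, fraction of damaging variants)."""
    n_damaging = sum(1 for score in scores if score <= threshold)
    return n_damaging, n_damaging / len(scores)


n_damaging, fraction = damaging_summary(PRB4_SIFT_SCORES)
print(f"{n_damaging} of {len(PRB4_SIFT_SCORES)} variants damaging ({fraction:.0%})")
```

Under these assumptions, three of the twenty-three variants fall at or below the cutoff, which agrees with the reported 13%.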

2.2. Nucleotide Variations in the Gene Locus Encoding the a-PRP

One hundred and sixty-three nucleotide substitutions have been annotated in the PRH2 gene locus in ancient hominines compared with modern humans (Table 2 and Table S5), of which thirty fell within coding exons, including seven synonymous and twenty-three nonsynonymous. Four of the latter corresponded to annotated common variants in modern humans (Table 2). Sixty-six nucleotide substitutions were identified in introns, seven in upstream regions, three in the 5′UTR, forty-nine in the 3′UTR, and eight in downstream regions (Table S5). The archaic DNA sequences reported in the sequence database used in this study (see Section 4 for details) corresponded to the PRP-1 protein of the PRH2 alleles, thus having an N50 residue. The nucleotide variations reported in Table 1 generated two synonymous substitutions at D6 and P135.

2.3. Nucleotide Variations in the HTN Gene Loci

A total of 188 and 175 nucleotide substitutions were identified in the HTN1 and HTN3 genes, respectively (Table 2, Tables S6 and S7). The nucleotide substitutions reported in HTN1 are distributed as follows: 4 fell within coding exons, including 1 synonymous and 3 nonsynonymous, and 184 fell in noncoding regions, including 146 within introns, 6 in upstream regions, 3 in the 5′UTR, 9 in the 3′UTR, and 20 in downstream regions (Table 2 and Table S6). Regarding HTN3, 3 nucleotide changes were reported in coding exons (1 synonymous and 2 nonsynonymous), whereas 172 fell in noncoding regions (145 within introns, 9 in upstream regions, 3 in the 5′UTR, 5 in the 3′UTR, and 10 in downstream regions) (Table 2 and Table S7). One missense variant for HTN1 and one for HTN3 found in ancient hominins were also reported as SNPs in modern humans (Table 2).

2.4. Nucleotide Variations in the AMY1A Gene Locus

Two hundred and twelve nucleotide substitutions have been annotated in the AMY1A gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S8). Forty changes fell within coding exons, of which eleven were synonymous and twenty-nine were nonsynonymous. Only one of the nonsynonymous substitutions corresponded to an annotated common variant in modern humans (Table 2). One hundred forty-four nucleotide substitutions were identified in introns, four in upstream regions, nine in the 5′UTR, and fifteen in downstream regions (Table S8).

2.5. Nucleotide Variations in the STATH and P-B Gene Loci

One hundred fifty-nine nucleotide substitutions have been annotated in the STATH gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S9). Six changes fell within coding exons, of which two were synonymous and four were nonsynonymous (Table 2). One hundred fifty-three nucleotide substitutions were detected in introns and regulatory regions (Table S9).

One hundred eighty-seven nucleotide substitutions were detected in the SMR3B locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S10). Of these, 5 were found in coding exons (2 synonymous and 3 nonsynonymous), 155 in introns, 3 in upstream regions, 3 in the 5′UTR, 10 in the 3′UTR, and 11 in downstream regions (Table 2 and Table S10). One missense variant was reported as an SNP in modern humans (Table S10).

2.6. Nucleotide Variations in the CST Gene Loci

2.6.1. CST1 Gene

We have annotated 227 nucleotide substitutions in the CST1 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S11). Of these, 128 were found in introns, 19 in upstream regions, 7 in the 5′UTR, 12 in the 3′UTR, 32 in downstream regions (Table S11), and 29 in coding regions, including 11 synonymous and 18 missense variations (Table 3). The nucleotide variation at 23,731,494 bp caused the substitution of the Y3(sp) with an H, affecting the third amino acid residue of the signal peptide. This should not impact the function of the protein, although it may have affected the speed of protein translation and/or the correct processing and trafficking. Four substitutions out of eighteen could have a negative impact on protein function, as predicted by SIFT. Overall, nine nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table S11).

2.6.2. CST2 Gene

We detected 167 nucleotide changes in the CST2 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S12). Of these, 103 were in introns, 15 in upstream regions, 8 in the 3′UTR, 17 in downstream noncoding regions (Table S12), and 24 in coding regions (Table 2). The latter included six synonymous and eighteen nonsynonymous variations, eight of which were predicted to have a deleterious effect on protein function (SIFT score < 0.05). Ten out of the eighteen nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 2). Interestingly, the nucleotide change at 23,804,691 bp fell into the canonical DNA-binding motif for the NR3C1 (nuclear receptor subfamily 3 group C member 1) transcription factor, as reported in the UCSC Genome Browser. This variation could affect the affinity of this factor for the regulatory region and thus the expression of the CST2 gene.

2.6.3. CST3 Gene

In the CST3 locus, we have identified 452 nucleotide variations in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S13). Of these, 329 were in introns, 18 in upstream regions, 9 in 5′UTR, 50 in 3′UTR, 29 in downstream noncoding regions (Table S13), and 17 in coding regions, including 9 synonymous and 8 nonsynonymous variations (Table 2). One nucleotide substitution corresponded to an annotated common variant in modern humans (Table 2).

2.6.4. CST4 Gene

Two hundred and sixty-three nucleotide substitutions were detected in the CST4 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S14). These included 130 changes in introns, 42 in upstream regions, 4 in the 5′UTR, 20 in the 3′UTR, 43 in downstream noncoding regions (Table S14), and 24 in coding exons (11 synonymous and 13 missense variations; Table 3). Seven variations in this locus corresponded to annotated common variants in modern humans (Table 3). The change at 23,666,565 bp caused the substitution of the M111 with an R in the corresponding Neanderthal peptide structure. Even if it causes the substitution of an uncharged amino acid with a charged one, the SIFT analysis did not predict a deleterious effect of this variant on the function of the archaic protein compared to modern humans.

2.6.5. CST5 Gene

One hundred ninety-three nucleotide substitutions were annotated in the CST5 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S15). Sixteen changes were mapped in the coding region, including eight synonymous and eight nonsynonymous (Table 3). Of the 177 nucleotide substitutions located in noncoding regions, 118 were in introns, 24 in upstream regions, 18 in the 3′UTR, and 17 in downstream regions (Table S15). One exonic nucleotide variation generated the codon for an R in both archaic hominins instead of C26. This represented a common variant also found in modern humans (rs1799841). The cystatin D variant with R26 is frequently detected in the soluble fraction of human saliva, probably because it is more soluble than the C26-containing isoform [19]. Moreover, the opposite substitution (R26C) was detectable with high frequency at the same amino acid residue in the cystatin SA gene of Neanderthals. Five out of the eight nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table 3).

2.6.6. CSTA and CSTB Genes

Finally, 394 and 134 nucleotide substitutions were identified in the CSTA and CSTB loci, respectively, in Neanderthals and Denisovans compared with modern humans (Table 3, Tables S16 and S17). The nucleotide substitutions reported in CSTA were distributed as follows: 6 fell in coding exons, including 2 synonymous and 4 nonsynonymous, and 388 fell in noncoding regions, including 346 in introns, 10 in upstream regions, 5 in the 5′UTR, 10 in the 3′UTR, and 17 in downstream regions (Table 3 and Table S16). Among these changes, the variation at positions 122,044,848–122,044,850 of CSTA was a CTT deletion, observed exclusively in Denisovans (Table S16). This fell within the canonical DNA-binding motif for the Spi-1 proto-oncogene transcription factor (source: UCSC Genome Browser); therefore, it could affect the expression of the CSTA gene in the ancient hominin. Regarding CSTB, 9 nucleotide changes were reported in coding exons (6 synonymous and 3 nonsynonymous), whereas 125 fell in noncoding regions (55 within introns, 27 in upstream regions, 5 in the 5′UTR, 15 in the 3′UTR, and 23 in downstream regions) (Table 3 and Table S17). One missense variant for CSTA and one for CSTB found in ancient hominins were also reported as SNPs in modern humans (Table 3).
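The per-locus counts reported in Sections 2.6.1 to 2.6.6 can be cross-checked with a short tally. The following is only an illustrative sketch: the numbers are transcribed from the text above, whereas the dictionary layout and the function names (`category_sum`, `inconsistent_loci`) are assumptions introduced here and are not part of the original analysis.

```python
# Counts transcribed from Sections 2.6.1-2.6.6 of the text.
# "coding" is the number of changes reported in coding exons
# (synonymous plus nonsynonymous); categories not mentioned in the
# text for a given locus are simply omitted.
CYSTATIN_COUNTS = {
    "CST1": {"total": 227, "introns": 128, "upstream": 19, "5'UTR": 7,
             "3'UTR": 12, "downstream": 32, "coding": 29},
    "CST2": {"total": 167, "introns": 103, "upstream": 15,
             "3'UTR": 8, "downstream": 17, "coding": 24},
    "CST3": {"total": 452, "introns": 329, "upstream": 18, "5'UTR": 9,
             "3'UTR": 50, "downstream": 29, "coding": 17},
    "CST4": {"total": 263, "introns": 130, "upstream": 42, "5'UTR": 4,
             "3'UTR": 20, "downstream": 43, "coding": 24},
    "CST5": {"total": 193, "introns": 118, "upstream": 24,
             "3'UTR": 18, "downstream": 17, "coding": 16},
    "CSTA": {"total": 394, "introns": 346, "upstream": 10, "5'UTR": 5,
             "3'UTR": 10, "downstream": 17, "coding": 6},
    "CSTB": {"total": 134, "introns": 55, "upstream": 27, "5'UTR": 5,
             "3'UTR": 15, "downstream": 23, "coding": 9},
}


def category_sum(counts):
    """Sum every region category of one locus, excluding the stated total."""
    return sum(value for key, value in counts.items() if key != "total")


def inconsistent_loci(table):
    """Return the loci whose categories do not add up to the stated total."""
    return [gene for gene, counts in table.items()
            if category_sum(counts) != counts["total"]]


if __name__ == "__main__":
    print("Loci with mismatching totals:", inconsistent_loci(CYSTATIN_COUNTS))
    for gene, counts in CYSTATIN_COUNTS.items():
        share = 100.0 * counts["coding"] / counts["total"]
        print(f"{gene}: {share:.1f}% of changes fall in coding exons")
```

The same layout could be extended to the other loci described in the Results, provided that every region category is transcribed for each gene.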

2.7. Geographic Distribution of Genetic Variants in Modern Humans

Of note, the salivary protein genes tested proved to be polymorphic in humans. The frequency of specific coding nonsynonymous genetic variants also differed between populations, as reported in the Geography of Genetic Variants Browser (https://popgen.uchicago.edu/ggv; accessed on 22 July 2022) (File S1) [29]. In particular, 20 genetic variants (three in the PRB1 gene, six in PRB2, one in PRB3, two in CST1, four in CST2, three in CST5, and one in CSTB; highlighted in red in Table 1, Table 2 and Table 3) displayed a different geographic distribution. Specifically, rs554211998, rs201994479, rs34305575, rs6076122, rs111349461, rs55860552, rs568411970, rs145031249, and rs1799841 showed a peculiar allele frequency in African populations (File S1).

2.8. Evolutionary Pressure of Salivary Protein Genes

To investigate whether any of the salivary protein genes studied showed evidence of positive selection in anatomically modern humans, we performed a population branch statistic (PBS) analysis [30]. Our results showed no signal of recent selective pressure for the genes analysed, suggesting that variants in these genes did not affect individual fitness (File S2). We also applied Tajima’s test as an additional evolutionary analysis to evaluate the selective effects of each observed substitution. Tajima’s D values showed comparable variance among the genes analysed. The D values were mostly slightly negative or positive (ranging from −0.698 to 3.359) (File S3), consistent with the absence of a selective sweep [31], as already suggested by the PBS test.
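For readers unfamiliar with Tajima’s D [31], the statistic compares the mean number of pairwise differences with the number of segregating sites, scaled by its expected standard deviation under neutrality. The sketch below follows the standard formula; it is not the code used in this study, and the function name `tajimas_d` and the input format (one allele count per site in a sample of `n` sequences) are assumptions made for illustration.

```python
from math import sqrt


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts observed in n sequences.

    allele_counts: one integer per site, the number of sequences carrying
    one of the two alleles; monomorphic sites (0 or n) are ignored.
    """
    if n < 4:
        # The variance term is zero for n = 2 or 3, so D is undefined.
        raise ValueError("at least 4 sequences are needed")
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences (pi).
    pi = sum(2.0 * k * (n - k) for k in seg) / (n * (n - 1))

    # Constants of the neutral expectation and its variance.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy inputs (not data from this study): 10 sites in 10 sequences.
    print(tajimas_d([1] * 10, 10))  # only singletons: negative D
    print(tajimas_d([5] * 10, 10))  # intermediate frequencies: positive D
```

An excess of rare variants gives negative values, as expected after a selective sweep or a population expansion, whereas an excess of intermediate-frequency variants gives positive values.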

Comparisons between modern human genomes and those of Neanderthals and Denisovans have shown evidence of ancient interbreeding [32], which left an uneven distribution of introgressed chromosomal regions because of natural selection [33]. To investigate whether some of the salivary protein gene variants studied might be due to interbreeding, we used two databases of archaic introgression based on a comparison with modern genomes from the 1000 Genomes Project [34] and the Estonian Biocentre collection [35], which also reported data from previous studies [33,36]. However, the genes considered were not encompassed within the chromosomal regions highlighted in the databases and, therefore, did not show an apparent sign of adaptive introgression from archaic hominins.
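Checking whether a gene locus is encompassed by a catalogued introgressed region amounts to an interval-overlap test. A minimal sketch is given below. The three loci use the hg19 coordinates listed in Section 4.1, whereas the entries in `SEGMENTS` are hypothetical placeholders (they are not taken from the cited databases), and all names in the code are assumptions introduced for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def make_interval(chrom, a, b):
    """Build an interval with start <= end, whatever the input order."""
    return Interval(chrom, min(a, b), max(a, b))


def overlaps(x, y):
    """True if two closed intervals on the same chromosome share a position."""
    return x.chrom == y.chrom and x.start <= y.end and y.start <= x.end


# hg19 coordinates as listed in Section 4.1 (some are given end-first there).
LOCI = {
    "PRB1": make_interval("chr12", 11_509_000, 11_504_200),
    "CST1": make_interval("chr20", 23_732_000, 23_727_600),
    "CSTA": make_interval("chr3", 122_043_600, 122_061_300),
}

# HYPOTHETICAL introgressed segments, for illustration only.
SEGMENTS = [
    Interval("chr12", 10_000_000, 10_050_000),
    Interval("chr20", 30_000_000, 30_100_000),
]


def loci_in_segments(loci, segments):
    """Return the names of loci overlapping at least one segment."""
    return sorted(gene for gene, locus in loci.items()
                  if any(overlaps(locus, seg) for seg in segments))


if __name__ == "__main__":
    print(loci_in_segments(LOCI, SEGMENTS))
```

With real data, `SEGMENTS` would be replaced by the regions reported in the introgression databases, converted to the same genome assembly as the loci.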

3. Discussion

The different dietary habits of archaic hominins and modern humans have been mostly attributed to the changes in the availability of natural food resources, the oral bacterial community (microbiota), and climatic conditions [37,38]. A role for salivary proteins can also be inferred, as they are known to be implicated in the modulation of the microbiome of the oral cavity and of the entire gastrointestinal tract, as well as in taste perception [39]. aPRPs can promote the attachment of several important bacteria, such as Actinomyces viscosus, Bacteroides gingivalis, and some strains of Streptococcus mutans. Moreover, both aPRPs and statherin promote the colonization of oral surfaces by Porphyromonas gingivalis [40]. Salivary proteins have been reported to modulate oral health and homeostasis, maintain a stable ecosystem, and inhibit the growth of cariogenic bacteria [41,42]. Recently, 258 salivary proteins were found to be differentially expressed between caries-free and caries-active children [43]. Salivary proteins are also involved in taste perception. In particular, the salivary bPRPs II-2 and Ps-1 contribute to bitter taste sensitivity [44]. Also, some salivary peptides belonging to the bPRP and histatin families can bind polyphenols in tannin-rich foods, thus evoking the typical astringent sensation [44]. Salivary proteins play an important role in affecting sweet [45], salt [46], and umami [47] tastes, along with fat, salt, and bitter acceptance [48,49]. Cystatins are also thought to affect taste perception, as lower salivary levels of these peptides may enhance proteolysis, which would affect the mucosal pellicle lining of the oral cavity, thereby increasing the accessibility of tastants to taste receptors [49]. Interestingly, most of these proteins have been shown to be modulated in pathological conditions, including tumors and inflammation, suggesting that they may serve as clinically relevant biomarkers [5].

Therefore, the hypothesis has been raised that the evolutionary changes that occurred in the structure of these proteins could be associated with the different dietary habits of archaic hominins. In this regard, mutations in different bitter taste receptor genes (namely TAS2R62, TAS2R64, and TAS2R38) and the masticatory myosin gene MYH16, along with the duplication of the salivary amylase gene AMY1 that has occurred in recent human evolution, have been associated with variations in taste sensitivity and the shift toward the food cooking habits of modern humans [50].

Based on this emerging background, in this study, we identified the nucleotide substitutions fixed in the gene loci coding for the main salivary proteins in modern humans compared to ancient hominin species (Neanderthals and Denisovans) and inferred their functional consequences.

By mapping over 3400 nucleotide substitutions, we have shown that the majority (87.7%) of the changes detectable in the genes expressing the most important salivary proteins (proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases) of modern humans, compared with Neanderthals and Denisovans, mapped within noncoding regions.

Quite unexpectedly, our data also showed the presence of nucleotide variations affecting the coding sequence of all 17 gene loci analysed. Overall, the frequency of coding variations in these genomic loci is far higher than the general rate found throughout the genome, since previous studies highlighted that relatively few amino acid changes have become fixed in recent human evolution [51,52]. To the best of our knowledge, this study provides the first description of coding nucleotide changes that occurred in salivary protein genes during the recent evolutionary shift of modern humans from Neanderthal and Denisovan species. Focusing on these missense variations, we hypothesized the possible functional effects they could have had on protein structure, processing, and function. Of the 307 missense changes found in the coding regions of the tested genes, 92 were predicted to have a potentially deleterious effect on protein function.

The changes identified in the PRB1 and PRB2 genes are worth particular attention and could be interpreted in light of the extant knowledge of the biology of the encoded proteins. As already mentioned, the PRB protein family is highly polymorphic and, despite being common to all mammals, the proteins belonging to this family feature significant structural differences among species. For instance, the peptides generated by the convertase cleavage span 50 to 90 amino acids in length in humans and 10 to 40 in pigs, with considerable variations in the peptide sequences [53]. Therefore, bPRPs appear to be non-conserved across species, probably because they are mostly implicated in taste perception and underwent a deep transformation during evolution due to the changing habits and habitats of the species [44]. Interestingly, our results showed that three nucleotide substitutions annotated in the archaic hominins’ PRB1 and PRB2 genes affect specific arginine residues within the consensus sequences of the polypeptide, which are recognized by the pro-protein convertases responsible for their cleavage. These changes could have determined the presence of fused proteins in the archaic hominins’ proteome. The putative “PRB1 salivary archaic fusion 1 peptide” and “PRB2 salivary archaic fusion 2 peptide” could have been associated with additional and/or alternative functions able to influence the eating habits of extinct hominins. In addition, we have also identified a sequence change in the PRB2 gene that instead generates a new pro-protein convertase consensus sequence in the encoded peptide. As a result, ancient hominins could have expressed two smaller peptides, the “PRB2 salivary archaic cleavage 1 peptide” and the “PRB2 salivary archaic cleavage 2 peptide”, possibly exerting alternative functions, which deserve further functional studies.

The missense nucleotide substitutions annotated in the remaining salivary protein genes described in this study (aPRPs, histatins, amylases, statherin, P-B peptide, and cystatins) could be interpreted, at least in part, considering the putative changes that they can cause in post-translational protein processing, sorting, localization, and trafficking toward secretion. In addition, all the missense variations that introduce or remove a cysteine residue on the archaic cystatins, most likely affecting the conserved sequences involved in the protein-protein binding [53], could also influence protein function.

We also annotated the nucleotide variations fixed within the noncoding regions of the tested genes in modern humans, given that these could reasonably affect the expression levels of salivary proteins by changing the affinity of transcriptional regulators for promoter, enhancer, and/or silencer elements, or by altering splice site consensus sequences and leading to the formation of alternative coding transcripts. They could also affect post-transcriptional regulation mechanisms, such as the binding of noncoding regulatory RNAs, leading to the varying protein types and amounts that emerged during recent evolution. Specifically, two nucleotide substitutions found in the CST2 and CSTA gene loci appear to fall within the canonical DNA-binding motifs for specific transcription factors, which could modulate the expression of these genes. We also annotated 216 changes in the 3′ untranslated regions in 16 of the 17 genes analysed (in all but AMY1A). These substitutions might instead affect the binding of specific microRNAs targeting salivary protein transcripts, modulating their stability and translation.

Lastly, 34.9% of the nonsynonymous nucleotide substitutions identified in this study appear to be frequent in the modern human genome, where they are annotated as single nucleotide polymorphisms (SNPs). In addition, some of these coding genetic variants display a different geographic distribution in humans. This observation reduces the evolutionary significance of such changes, which are to be considered in light of the polymorphic nature of these genomic loci. However, taken together, variants showing alternative nucleotide fixation in modern vs. archaic humans represent 7.3% of all the nucleotide substitutions reported in the study.

Also, our results do not suggest any significant evolutionary pressure or sign of adaptive introgression from archaic hominins on the tested genes.

4. Materials and Methods

4.1. Nucleotide Variants Annotation

In order to annotate all the nucleotide variants within the gene loci of the salivary proteins of interest, we compared modern human sequences with Altai Neanderthals (downloaded from http://cdna.eva.mpg.de/Neanderthal/altai/AltaiNeanderthal/bam/, accessed on 2 May 2020), Chagyrskaya Neanderthals (Index of/neandertal/Chagyrskaya/BAM (mpg.de), accessed on 9 December 2022), Vindija Neanderthals (Index of/neandertal/Vindija/bam/Pruefer_etal_2017/Vindija33.19 (mpg.de), accessed on 9 December 2022), and Denisova sequences (http://cdna.eva.mpg.de/denisova/alignments/, accessed on 2 May 2020) [54,55]. The fossil remains, aged between 50,000 and 30,000 years, come from two distinct geographical areas. The female Neanderthal sample from Vindija (Croatia), in the Western Balkans, yielded a 30× genome coverage [56]. The other samples came from two different sites in the Altai Mountains in Siberia (Russia): the genomic data of a female Neanderthal (at 52× coverage) [57] and a juvenile female Denisovan individual (at 30× coverage) [55] came from the Denisova cave, and another female sample came from the Chagyrskaya cave, located about 100 km westward, and yielded a genome of 27× coverage [58]. In particular, we aligned the sequences of modern humans and ancient hominins by means of the Integrative Genomics Viewer (IGV) tool (version 2.3.72) [59,60,61]. Note that the reference genomes annotated in this database are set on the hg19 genome assembly coordinates. To account for the possible damage and fragmentation to which the ancient hominin DNA was subjected, we annotated all the nucleotide substitutions with a frequency greater than 10% and a coverage of at least 10 counts in coding, noncoding, and regulatory sequences (i.e., 5′ and 3′ untranslated and flanking upstream and downstream regulatory regions) for each gene of interest.
Of note, the variant frequency indicated the frequency of that substitution, expressed as a percentage, in ancient hominins, as reported by the IGV tool, considering the depth (coverage) of the reads displayed at each locus. For each tested gene, a region of approximately 500 bp upstream of the first exon and downstream of the last exon was also screened to annotate nucleotide substitutions within regulatory regions able to affect the gene expression rate. The precise hg19 genomic coordinates for each tested gene locus were as follows: PRB1 locus 11,509,000–11,504,200 on chromosome 12; PRB2 locus 11,549,000–11,544,000 on chromosome 12; PRB3 locus 11,423,140–11,418,300 on chromosome 12; PRB4 locus 11,463,900–11,459,500 on chromosome 12; PRH2 locus 11,081,500–11,087,950 on chromosome 12; HTN1 locus 70,915,750–70,925,000 on chromosome 4; HTN3 locus 70,893,670–70,902,700 on chromosome 4; AMY1A locus 104,239,500–104,229,500 on chromosome 1; STATH locus 70,861,200–70,868,790 on chromosome 4; SMR3B locus 71,248,550–71,256,400 on chromosome 4; CST1 locus 23,732,000–23,727,600 on chromosome 20; CST2 locus 23,807,800–23,803,900 on chromosome 20; CST3 locus 23,619,100–23,606,800 on chromosome 20; CST4 locus 23,670,200–23,665,700 on chromosome 20; CST5 locus 23,860,900–23,856,000 on chromosome 20; CSTA locus 122,043,600–122,061,300 on chromosome 3; and CSTB locus 45,196,800–45,193,000 on chromosome 21.
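The two annotation thresholds described above (variant frequency greater than 10% and a coverage of at least 10 counts) can be expressed as a simple filter on per-position read counts. The sketch below is illustrative only: the annotation in this study was carried out by inspection in IGV, the function names are hypothetical, and "coverage" is assumed here to mean the total read depth at the position (reference plus variant reads).

```python
MIN_DEPTH = 10        # minimum coverage (read counts) at the position
MIN_FREQUENCY = 0.10  # the variant frequency must be greater than 10%


def variant_frequency(ref_reads, alt_reads):
    """Fraction of reads at a position that carry the substitution."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


def passes_filter(ref_reads, alt_reads,
                  min_depth=MIN_DEPTH, min_frequency=MIN_FREQUENCY):
    """Apply both thresholds to one position.

    ASSUMPTION: coverage is the total depth (reference + variant reads).
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return False
    return variant_frequency(ref_reads, alt_reads) > min_frequency


if __name__ == "__main__":
    # Toy read counts (reference reads, variant reads); not data from the study.
    for ref, alt in [(8, 2), (9, 1), (4, 5), (0, 30)]:
        print(ref, alt, passes_filter(ref, alt))
```

Requiring a minimum depth guards against calls supported by very few reads, which is a relevant concern for damaged and fragmented ancient DNA.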

The annotation of all variations, with their corresponding frequency in present-day human populations, was collected by integrating information from both the dbSNP (Single Nucleotide Polymorphism Database; https://www.ncbi.nlm.nih.gov/snp, accessed on 15 July 2020) and the Ensembl (http://www.ensembl.org/index.html, accessed on 15 July 2020) databases. In particular, the frequency was reported as the Allele Frequency Aggregator (ALFA New). The analysis of regulatory regions in the gene loci analysed was carried out using the information available on the UCSC Genome Browser database (https://genome.ucsc.edu, accessed on 15 July 2020).

The coding sequences of salivary proteins were extracted from the publicly available UniProtKB database (https://www.uniprot.org/, accessed on 15 July 2020): PRB1, primary accession number: P04280; PRB2: P02812; PRB3: Q04118; PRB4: P10163; PRH2: P02810; HTN1: P15515; HTN3: P15516; STATH: P02808; AMY1A: P0DUB6; P-B: P02814; CST1: P01037; CST2: P09228; CST3: P01034; CST4: P01036; CST5: P28325; CSTA: P01040; CSTB: P04080.

4.2. Protein Data Analysis

The potential impact of the amino acid substitutions on salivary protein function was predicted by SIFT (sorting intolerant from tolerant) version 5.1.1 using the Genome tool (SIFT nonsynonymous single nucleotide variants, genome-scale), available at the SIFT website (http://sift.jcvi.org/, accessed on 20 June 2022). The SIFT algorithm is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST [62]. SIFT scores < 0.05 indicate amino acid substitutions predicted to be deleterious to protein function.
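The score cut-off stated above can be applied as a simple classification rule. The snippet below is a minimal sketch of that rule only: it does not call SIFT, the example scores are hypothetical placeholders, and the names `sift_label` and `count_deleterious` are assumptions introduced for illustration.

```python
SIFT_CUTOFF = 0.05  # scores below this value are predicted deleterious


def sift_label(score, cutoff=SIFT_CUTOFF):
    """Label one SIFT score (range 0 to 1) using the cut-off in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores range from 0 to 1")
    return "deleterious" if score < cutoff else "tolerated"


def count_deleterious(scores, cutoff=SIFT_CUTOFF):
    """Number of substitutions predicted to be deleterious."""
    return sum(1 for s in scores if sift_label(s, cutoff) == "deleterious")


if __name__ == "__main__":
    # HYPOTHETICAL scores, for illustration only.
    example_scores = [0.00, 0.03, 0.05, 0.21, 1.00]
    print([sift_label(s) for s in example_scores])
    print(count_deleterious(example_scores), "of", len(example_scores))
```

Note that a score exactly equal to 0.05 is labelled as tolerated, because the criterion is a strict inequality.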

4.3. Selective Pressure Analysis

To detect any possible trace of selective pressure, we applied the PBS. PBS is a statistical three-population test based on the FST fixation index, and it has proven to be one of the best methods of detecting signs of recent natural selection on genomes [31]. Regarding the choice of the three populations, we used three distant populations worldwide (CEU for Europe, CHB for Asia, and YRI for Africa), which are the most commonly used [63,64] and are among the first populations released by the 1000 Genomes Project, Phase 1 [64].

FST among the three possible population pairs (CEU, CHB, and YRI) was calculated with VCFtools v0.1.16 [65] using the VCF files of each gene under scrutiny. The data for each gene were previously filtered with Plink 1.9 [66] to keep only the variants with MAF ≥ 0.05. Then, the PBS calculations and the related plots were performed with R Studio software (R Core Team 2021, https://www.R-project.org, accessed on 2 December 2022).
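The step from pairwise FST values to PBS can be sketched as follows, using the usual definition of the statistic [30]: each FST is converted into a branch length T = −ln(1 − FST), and the PBS of a focal population A is (T_AB + T_AC − T_BC)/2. This is an illustrative Python sketch and not the R code used in the study; the function names, the clipping of negative FST estimates to zero, and the example values are assumptions.

```python
from math import log


def branch_length(fst):
    """Convert a pairwise FST into the divergence time T = -ln(1 - FST).

    ASSUMPTION: negative FST estimates are clipped to zero before the
    transformation; FST must be lower than 1.
    """
    fst = max(fst, 0.0)
    if fst >= 1.0:
        raise ValueError("FST must be lower than 1")
    return -log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic of the focal population A.

    fst_ab, fst_ac: FST between A and each of the other two populations.
    fst_bc: FST between the two non-focal populations.
    """
    return (branch_length(fst_ab) + branch_length(fst_ac)
            - branch_length(fst_bc)) / 2.0


if __name__ == "__main__":
    # HYPOTHETICAL pairwise FST values for one variant; the choice of CEU
    # as the focal population is also only an example.
    fst_ceu_chb, fst_ceu_yri, fst_chb_yri = 0.10, 0.15, 0.18
    print("PBS (CEU):", pbs(fst_ceu_chb, fst_ceu_yri, fst_chb_yri))
    print("PBS (CHB):", pbs(fst_ceu_chb, fst_chb_yri, fst_ceu_yri))
    print("PBS (YRI):", pbs(fst_ceu_yri, fst_chb_yri, fst_ceu_chb))
```

A large PBS indicates that allele frequencies changed mostly along the branch leading to the focal population, which is the pattern expected under recent local positive selection.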

5. Conclusions

In conclusion, the nucleotide substitutions described in this study, which have putatively affected the amino acid composition, the post-translational modification, and/or the gene expression levels of salivary proteins, might have generated novel functional features and a different expression ratio among the several components of the salivary proteome. Given the largely unknown functional roles of most salivary proteins, we may only speculate that these changes could have ultimately modified the entire homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. Our data may pave the way to unravelling the evolutionary processes that have acted on oral cavity homeostasis through changes in salivary composition. This knowledge could provide additional novel cues toward a better understanding of the ability of different species to adapt to different and changing environments.

Acknowledgments

We thank Luca Pagani (Università di Padova) for their useful advice on adaptive introgression.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms241915010/s1.

Author Contributions

Conceptualization: M.C. and O.P.; data elaboration and collection, L.D.P., M.C., M.B., B.M. and A.O.; manuscript editing, L.D.P., W.L., M.C., B.M., T.C., O.P. and S.S. All authors contributed to the discussion and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

All data reported in this manuscript are shown in the results section and further supported by the extended datasets provided in the supplementary files. No new primary datasets to be deposited have been generated.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This study was partially supported by the FIR 2021 funds (Cagliari, Italy) to T.C. and the “Linea D.1–D.3.1” funds from the Università Cattolica del Sacro Cuore (Rome, Italy) to L.D.P., W.L., and O.P.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Cabras T., Iavarone F., Manconi B., Olianas A., Sanna M.T., Castagnola M., Messana I. Top-down analytical platforms for the characterization of the human salivary proteome. Bioanalysis. 2014;6:563–581. doi: 10.4155/bio.13.349. [DOI] [PubMed] [Google Scholar]
  • 2.Bandhakavi S., Stone M.D., Onsongo G., Van Riper S.K., Griffin T.J. A Dynamic Range Compression and Three-Dimensional Peptide Fractionation Analysis Platform Expands Proteome Coverage and the Diagnostic Potential of Whole Saliva. J. Proteome Res. 2009;8:5590–5600. doi: 10.1021/pr900675w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Vila T., Rizk A.M., Sultan A.S., Jabra-Rizk M.A. The power of saliva: Antimicrobial and beyond. PLoS Pathog. 2019;15:e1008058. doi: 10.1371/journal.ppat.1008058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Ngo L.H., Veith P.D., Chen Y.Y., Chen D., Darby I.B., Reynolds E.C. Mass Spectrometric Analyses of Peptides and Proteins in Human Gingival Crevicular Fluid. J. Proteome Res. 2010;9:1683–1693. doi: 10.1021/pr900775s. [DOI] [PubMed] [Google Scholar]
  • 5.Boroumand M., Olianas A., Cabras T., Manconi B., Fanni D., Faa G., Desiderio C., Messana I., Castagnola M. Saliva, a bodily fluid with recognized and potential diagnostic applications. J. Sep. Sci. 2021;44:3677–3690. doi: 10.1002/jssc.202100384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Beeley J.A. Basic proline-rich proteins: Multifunctional defence molecules? Oral Dis. 2012;7:69–70. doi: 10.1034/j.1601-0825.2001.0070201.x. [DOI] [PubMed] [Google Scholar]
  • 7.Hajishengallis G., Russell M.W. Innate Humoral Defense Factors. Mucosal Immunol. 2015;1:251–270. doi: 10.1016/B978-0-12-415847-4.00015-X. [DOI] [Google Scholar]
  • 8.Lyons K.M., Azen E.A., Goodman P.A., Smithies O. Many protein products from a few loci: Assignment of human salivary proline-rich proteins to specific loci. Genetics. 1988;120:255–265. doi: 10.1093/genetics/120.1.255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Padiglia A., Orrù R., Boroumand M., Olianas A., Manconi B., Sanna M.T., Desiderio C., Iavarone F., Liori B., Messana I., et al. Extensive Characterization of the Human Salivary Basic Proline-Rich Protein Family by Top-Down Mass Spectrometry. J. Proteome Res. 2018;17:3292–3307. doi: 10.1021/acs.jproteome.8b00444. [DOI] [PubMed] [Google Scholar]
  • 10.Manconi B., Castagnola M., Cabras T., Olianas A., Vitali A., Desiderio C., Sanna M.T., Messana I. The intriguing heterogeneity of human salivary proline-rich proteins. J. Proteom. 2016;134:47–56. doi: 10.1016/j.jprot.2015.09.009. [DOI] [PubMed] [Google Scholar]
  • 11.Lyons K.M., Stein J.H., Smithies O. Length polymorphisms in human proline-rich protein genes generated by intragenic unequal crossing over. Genetics. 1988;120:267–278. doi: 10.1093/genetics/120.1.267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Azen E.A., Amberger E., Fisher S., Prakobphol A., Niece R.L. PRB1, PRB2, and PRB4 coded polymorphisms among human salivary concanavalin-A binding, II-1, and Po proline-rich proteins. Am. J. Hum. Genet. 1966;58:143–153. [PMC free article] [PubMed] [Google Scholar]
  • 13.Messana I., Cabras T., Pisano E., Sanna M.T., Olianas A., Manconi B., Pellegrini M., Paludetti G., Scarano E., Fiorita A., et al. Trafficking and Postsecretory Events Responsible for the Formation of Secreted Human Salivary Peptides: A Proteomics Approach. Mol. Cell. Proteom. 2008;7:911–926. doi: 10.1074/mcp.M700501-MCP200. [DOI] [PubMed] [Google Scholar]
  • 14.Jensen J.L., Lamkin M.S., Troxler R.F., Oppenheim F.G. Multiple forms of statherin in human salivary secretions. Arch. Oral Biol. 1991;36:529–534. doi: 10.1016/0003-9969(91)90147-M. [DOI] [PubMed] [Google Scholar]
  • 15.Inzitari R., Cabras T., Rossetti D.V., Fanali C., Vitali A., Pellegrini M., Paludetti G., Manni A., Giardina B., Messana I., et al. Detection in human saliva of different statherin and P-B fragments and derivatives. Proteomics. 2006;6:6370–6379. doi: 10.1002/pmic.200600395. [DOI] [PubMed] [Google Scholar]
  • 16.Cabras T., Inzitari R., Fanali C., Scarano E., Patamia M., Sanna M.T., Pisano E., Giardina B., Castagnola M., Messana I. HPLC–MS characterization of cyclo-statherin Q-37, a specific cyclization product of human salivary statherin generated by transglutaminase 2. J. Sep. Sci. 2006;29:2600–2608. doi: 10.1002/jssc.200600244. [DOI] [PubMed] [Google Scholar]
  • 17.Torres P., Castro M., Reyes M., Torres V. Histatins, wound healing, and cell migration. Oral Dis. 2018;24:1150–1160. doi: 10.1111/odi.12816. [DOI] [PubMed] [Google Scholar]
  • 18.Castagnola M., Inzitari R., Rossetti D.V., Olmi C., Cabras T., Piras V., Nicolussi P., Sanna M.T., Pellegrini M., Giardina B., et al. A Cascade of 24 Histatins (Histatin 3 Fragments) in Human Saliva: Suggestion for a Pre-Secretory Sequential Cleavage Pathway. J. Biol. Chem. 2004;279:41436–41443. doi: 10.1074/jbc.M404322200. [DOI] [PubMed] [Google Scholar]
  • 19.Wang G. Human Antimicrobial Peptides and Proteins. Pharmaceuticals. 2014;7:545–594. doi: 10.3390/ph7050545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Dickinson D.P. Cysteine peptidases of mammals: Their biological roles and potential effects in the oral cavity and other tissues in health and disease. Crit. Rev. Oral Biol. Med. 2022;13:238–275. doi: 10.1177/154411130201300304. [DOI] [PubMed] [Google Scholar]
  • 21.Manconi B., Liori B., Cabras T., Vincenzoni F., Iavarone F., Castagnola M., Messana I., Olianas A. Salivary Cystatins: Exploring New Post-Translational Modifications and Polymorphisms by Top-Down High-Resolution Mass Spectrometry. J. Proteome Res. 2017;16:4196–4207. doi: 10.1021/acs.jproteome.7b00567. [DOI] [PubMed] [Google Scholar]
  • 22.Perry G.H., Dominy N.J., Claw K.G., Lee A.S., Fiegler H., Redon R., Werner J., Villanea F.A., Mountain J.L., Misra R., et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 2007;39:1256–1260. doi: 10.1038/ng2123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Polley S., Louzada S., Forni D., Sironi M., Balaskas T., Hains D.S., Yang F., Hollox E.J. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. USA. 2015;112:5105–5110. doi: 10.1073/pnas.1416531112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Xu D., Pavlidis P., Taskent R.O., Alachiotis N., Flanagan C., DeGiorgio M., Blekhman R., Ruhl S., Gokcumen O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol. Biol. Evol. 2017;34:2704–2715. doi: 10.1093/molbev/msx206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Xu D., Pavlidis P., Thamadilok S., Redwood E., Fox S., Blekhman R., Ruhl S., Gokcumen O. Recent evolution of the salivary mucin MUC7. Sci. Rep. 2016;6:31791. doi: 10.1038/srep31791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Thamadilok S., Choi K.S., Ruhl L., Schulte F., Kazim A.L., Hardt M., Gokcumen O., RuhL S. Human and Nonhuman Primate Lineage-Specific Footprints in the Salivary Proteome. Mol. Biol. Evol. 2020;37:395–405. doi: 10.1093/molbev/msz223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Edwards A.W.F. The Genetical Theory of Natural Selection. Genetics. 2000;154:1419–1426. doi: 10.1093/genetics/154.4.1419.
  • 28.Lynch M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA. 2010;107:961–968. doi: 10.1073/pnas.0912629107.
  • 29.Marcus J.H., Novembre J. Visualizing the geography of genetic variants. Bioinformatics. 2017;33:594–595. doi: 10.1093/bioinformatics/btw643.
  • 30.Yi X., Liang Y., Huerta-Sanchez E., Jin X., Cuo Z.X., Pool J.E., Xu X., Jiang H., Vinckenbosch N., Korneliussen T.S., et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329:75–78. doi: 10.1126/science.1190371.
  • 31.Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585.
  • 32.Skoglund P., Jakobsson M. Archaic human ancestry in East Asia. Proc. Natl. Acad. Sci. USA. 2011;108:18301–18306. doi: 10.1073/pnas.1108181108.
  • 33.Sankararaman S., Mallick S., Patterson N., Reich D. The Combined Landscape of Denisovan and Neanderthal Ancestry in Present-Day Humans. Curr. Biol. 2016;26:1241–1247. doi: 10.1016/j.cub.2016.03.037.
  • 34.Racimo F., Marnetto D., Huerta-Sánchez E. Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Mol. Biol. Evol. 2017;34:296–317. doi: 10.1093/molbev/msw216.
  • 35.Jagoda E., Lawson D.J., Wall J.D., Lambert D., Muller C., Westaway M., Leavesley M., Capellini T.D., Mirazón Lahr M., Gerbault P., et al. Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans. Mol. Biol. Evol. 2018;35:623–630. doi: 10.1093/molbev/msx314.
  • 36.Vernot B., Akey J.M. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343:1017–1021. doi: 10.1126/science.1245938.
  • 37.Weyrich L.S., Duchene S., Soubrier J., Arriola L., Llamas B., Breen J., Morris A.G., Alt K.W., Caramelli D., Dresely V., et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature. 2017;544:357–361. doi: 10.1038/nature21674.
  • 38.El Zaatari S., Grine F.E., Ungar P.S., Hublin J.J. Neandertal versus Modern Human Dietary Responses to Climatic Fluctuations. PLoS ONE. 2016;11:e0153277. doi: 10.1371/journal.pone.0153277.
  • 39.Cornejo Ulloa P., van der Veen M.H., Krom B.P. Review: Modulation of the oral microbiome by the host to promote ecological balance. Odontology. 2019;107:437–448. doi: 10.1007/s10266-019-00413-x.
  • 40.Lamont R.J., Jenkinson H.F. Subgingival colonization by Porphyromonas gingivalis. Oral Microbiol. Immunol. 2000;15:341–349. doi: 10.1034/j.1399-302x.2000.150601.x.
  • 41.Laputková G., Schwartzová V., Bánovčin J., Alexovič M., Sabo J. Salivary Protein Roles in Oral Health and as Predictors of Caries Risk. Open Life Sci. 2018;13:174–200. doi: 10.1515/biol-2018-0023.
  • 42.Lynge Pedersen A.M., Belstrøm D. The role of natural salivary defences in maintaining a healthy oral microbiota. J. Dent. 2019;80:S3–S12. doi: 10.1016/j.jdent.2018.08.010.
  • 43.Chen W., Jiang Q., Yan G., Yang D. The oral microbiome and salivary proteins influence caries in children aged 6 to 8 years. BMC Oral Health. 2020;20:295. doi: 10.1186/s12903-020-01262-9.
  • 44.Cabras T., Melis M., Castagnola M., Padiglia A., Tepper B.J., Messana I., Tomassini Barbarossa I. Responsiveness to 6-n-Propylthiouracil (PROP) Is Associated with Salivary Levels of Two Specific Basic Proline-Rich Proteins in Humans. PLoS ONE. 2012;7:e30962. doi: 10.1371/journal.pone.0030962.
  • 45.Rodrigues L., Costa G., Cordeiro C., Pinheiro C., Amado F., Lamy E. Salivary proteome and glucose levels are related with sweet taste sensitivity in young adults. Food Nutr. Res. 2017;61:1389208. doi: 10.1080/16546628.2017.1389208.
  • 46.Stolle T., Grondinger F., Dunkel A., Meng C., Médard G., Kuster B., Hofmann T. Salivary Proteome Patterns Affecting Human Salt Taste Sensitivity. J. Agric. Food Chem. 2017;65:9275–9286. doi: 10.1021/acs.jafc.7b03862.
  • 47.Scinska-Bienkowska A., Wrobel E., Turzynska D., Bidzinski A., Jezewska E., Sienkiewicz-Jarosz H., Golembiowska K., Kostowski W., Kukwa A., Plaznik A., et al. Glutamate concentration in whole saliva and taste responses to monosodium glutamate in humans. Nutr. Neurosci. 2006;9:25–31. doi: 10.1080/10284150600621964.
  • 48.Méjean C., Morzel M., Neyraud E., Issanchou S., Martin C., Bozonnet S., Urbano C., Schlich P., Hercberg S., Péneau S., et al. Salivary Composition Is Associated with Liking and Usual Nutrient Intake. PLoS ONE. 2015;10:e0137473. doi: 10.1371/journal.pone.0137473.
  • 49.Morzel M., Chabanet C., Schwartz C., Lucchi G., Ducoroy P., Nicklaus S. Salivary protein profiles are linked to bitter taste acceptance in infants. Eur. J. Pediatr. 2014;173:575–582. doi: 10.1007/s00431-013-2216-z.
  • 50.Perry G.H., Kistler L., Kelaita M.A., Sams A.J. Insights into hominin phenotypic and dietary evolution from ancient DNA sequence data. J. Hum. Evol. 2015;79:55–63. doi: 10.1016/j.jhevol.2014.10.018.
  • 51.Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.H., et al. A Draft Sequence of the Neandertal Genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
  • 52.Burbano H.A., Hodges E., Green R.E., Briggs A.W., Krause J., Meyer M., Good J.M., Maricic T., Johnson P.L., Xuan Z., et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science. 2010;328:723–725. doi: 10.1126/science.1188046.
  • 53.Bode W., Engh R., Musil D., Thiele U., Huber R., Karshikov A., Brzin J., Kos J., Turk V. The 2.0 Å X-ray crystal structure of chicken egg white cystatin and its possible mode of interaction with cysteine proteinases. EMBO J. 1988;7:2593–2599. doi: 10.1002/j.1460-2075.1988.tb03109.x.
  • 54.Mednikova B.B. A Proximal Pedal Phalanx of a Paleolithic Hominin from Denisova Cave, Altai. Archaeol. Ethnol. Anthropol. Eurasia. 2011;39:129–138. doi: 10.1016/j.aeae.2011.06.017.
  • 55.Meyer M., Kircher M., Gansauge M.T., Li H., Racimo F., Mallick S., Schraiber J.G., Jay F., Prüfer K., de Filippo C., et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344.
  • 56.Prüfer K., de Filippo C., Grote S., Mafessoni F., Korlević P., Hajdinjak M., Vernot B., Skov L., Hsieh P., Peyrégne S., et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science. 2017;358:655–658. doi: 10.1126/science.aao1887.
  • 57.Prüfer K., Racimo F., Patterson N., Jay F., Sankararaman S., Sawyer S., Heinze A., Renaud G., Sudmant P.H., de Filippo C., et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–49. doi: 10.1038/nature12886.
  • 58.Mafessoni F., Grote S., de Filippo C., Slon V., Kolobova K.A., Viola B., Markin S.V., Chintalapati M., Peyrégne S., Skov L., et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. USA. 2020;117:15132–15136. doi: 10.1073/pnas.2004944117.
  • 59.Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
  • 60.Thorvaldsdottir H., Robinson J.T., Mesirov J.P. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief. Bioinform. 2013;14:178–192. doi: 10.1093/bib/bbs017.
  • 61.Robinson J.T., Thorvaldsdóttir H., Wenger A.M., Zehir A., Mesirov J.P. Variant Review with the Integrative Genomics Viewer. Cancer Res. 2017;77:e31–e34. doi: 10.1158/0008-5472.CAN-17-0337.
  • 62.Ng P.C. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509.
  • 63.Pfeifer B., Alachiotis N., Pavlidis P., Schimek M.G. Genome scans for selection and introgression based on k-nearest neighbour techniques. Mol. Ecol. Resour. 2020;20:1597–1609. doi: 10.1111/1755-0998.13221.
  • 64.Bhatia G., Patterson N., Pasaniuc B., Zaitlen N., Genovese G., Pollack S., Mallick S., Myers S., Tandon A., Spencer C., et al. Genome-wide comparison of African-ancestry populations from CARe and other cohorts reveals signals of natural selection. Am. J. Hum. Genet. 2011;89:368–381. doi: 10.1016/j.ajhg.2011.07.025.
  • 65.Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
  • 66.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.

Data Availability Statement

All data reported in this manuscript are shown in the results section and further supported by the extended datasets provided in the supplementary files. No new primary datasets to be deposited have been generated.


Articles from International Journal of Molecular Sciences are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)