Proceedings of the National Academy of Sciences of the United States of America. 2025 Jul 15;122(29):e2502158122. doi: 10.1073/pnas.2502158122

Inference of human pigmentation from ancient DNA by genotype likelihoods

Silvia Perretti a,1, Patrícia Santos a,1, Maria Teresa Vizzari a, Enrico Tassani b, Andrea Benazzo a, Silvia Ghirotto a,2, Guido Barbujani a,2
PMCID: PMC12304992  PMID: 40663601

Significance

Inference from ancient DNA may be flawed when informative sites are typed at low coverage, i.e., when they are read only once or a few times. We empirically show that a widely used protocol for pigmentation inference, HIrisPlex-S, often yields incorrect results if the coverage level is <8×, but accuracy is improved by an approach based on the calculation of genotype likelihoods from raw sequencing data. We used that approach to describe changes in eye, hair, and skin color in prehistoric Eurasia. We showed that the Neolithic diffusion of early farmers introduced lighter phenotypes, but for millennia, pigmentation diversity remained extensive, so that many Europeans kept the dark skins of their African ancestors well within the Bronze and Iron ages.

Keywords: ancient DNA, human evolution, phenotype inference, pigmentation, genotype likelihood

Abstract

Light eyes, hair, and skins probably evolved several times as Homo sapiens dispersed from Africa. In areas with lower UV radiation, light pigmentation alleles increased in frequency because of their adaptive advantage and of other contingent factors such as migration and drift. However, the tempo and mode of their spread are not known. Phenotypic inference from ancient DNA is complicated, both because these traits are polygenic and because of low sequence depth. We evaluated the effects of the latter by randomly removing reads in three high-coverage ancient samples, the Paleolithic Ust’-Ishim from Russia, the Mesolithic SF12 from Sweden, and the Neolithic I5077 from current Croatia. We could thus compare three approaches to pigmentation inference, concluding that for suboptimal levels of coverage (<8×), a probabilistic method estimating genotype likelihoods leads to the most robust predictions. We then applied that protocol to 348 ancient genomes from Eurasia, describing how skin, eye, and hair color evolved over the past 45,000 y. The shift toward lighter pigmentations turned out to be anything but linear in time and place, and slower than expected, with half of the individuals showing dark or intermediate skin colors well into the Bronze and Iron ages. We also observed a peak of light eye pigmentation in Mesolithic times, and an accelerated change during the spread of Neolithic farmers over Western Eurasia, although localized processes of gene flow and admixture, or lack thereof, also played a significant role.


Skin is not preserved in fossils, but there is little doubt that the early hominins’ epidermis was covered by hair and lightly pigmented (1). Exposure to sunlight induces degradation of folate, a molecule essential in DNA synthesis and cell proliferation (2); as the hair protection was lost in the course of evolution, there is evidence for a selective sweep favoring alleles that would make skins darker (3). Conversely, light skin colors promote photochemical-controlled synthesis of vitamin D (2). When Homo spread Northward from Africa into Eurasia, the selection regime thus changed, and lighter phenotypes emerged. In both phases, UV radiation seems the most likely selective agent since other environmental factors such as temperature, rainfall, or humidity show lower levels of correlation with skin color (4). On top of these major trends, adaptation to local conditions, genetic drift, and migration contributed to past and present patterns of human pigmentation.

Skin, eye, and hair colors are complex phenotypes with polygenic inheritance. All depend on the different amounts, type, and distribution of two pigments, eumelanin (brown-black) and pheomelanin (red-yellow), produced by human melanocytes (5). Blue and green irises are not due to additional eye pigments, but to the scattering of the light caused by variable cellular density of the corneal stroma (5).

The melanin biosynthetic pathway involves several enzymatic steps, and at least 26 relevant genes have been identified in association studies (6). As is the case for other complex traits, AI-based algorithms have been designed to predict skin, eye, and hair color from DNA information. One of the most widely used test systems, HIrisPlex-S (7–9), infers from 41 SNPs the individual probabilities for three eye, four hair, and five skin color categories. When based on good-quality genomic data, HIrisPlex-S has a low margin of error. However, problems may arise when, regardless of the inferential method used, phenotypes are to be inferred from ancient genomes, usually typed at low coverage.

HIrisPlex-S assumes that the allelic and genotypic states of the loci of interest are known, which for many ancient samples is unwarranted. Indeed, directly calling genotypic variants from such data is challenging, because of DNA fragmentation, exogenous contamination, and degradation. More often than not, all these factors result in low sequencing depth, which in turn affects the possibility to reliably identify genotypes, and hence phenotypes too. Thus, the robustness of phenotypic prediction methods when applied to ancient samples needs to be validated considering their particular characteristics, as also remarked by Manuel Ferrando-Bernal et al. (10).

A critical factor, in these cases, is the possibility to take into account genotype uncertainty. In the first part of this paper we compared three approaches to pigmentation inference, concluding that, for suboptimal levels of coverage (<8×), a probabilistic method estimating genotype likelihoods leads to the most robust predictions. Then, in the second part of this paper, we applied that protocol to a broad dataset of 348 ancient genomes from Eurasia, thus describing how skin, eye, and hair color evolved over the past 45,000 y.

Results

Testing the Robustness of Phenotypic Inference on Ancient Data.

The primary objective of this study was to evaluate the effect of different levels of genome coverage on phenotypic inference in the context of low-coverage ancient DNA studies. HIrisPlex-S was developed for forensic science, but has recently been used in ancient DNA studies as well. The HIrisPlex-S protocol has been applied to various ancient specimens, including, for example, medieval samples (11), remains dated to around 1485 attributed to King Richard III (12), the “Cheddar Man” (13), the “La Braña” Mesolithic sample (14), and even a 5,700-y-old chewed birch pitch from which human DNA was extracted (15). So far, however, no one has tested whether the protocol is effective on samples sequenced at low coverage, or whether it leads to robust inference when different calling algorithms are used. As detailed in Fig. 1, we evaluated the inferential robustness of HIrisPlex-S with genotypes generated using three different calling procedures: a direct approach, in which genotypes are called directly using GATK UnifiedGenotyper v3.5 (16); an imputation protocol, commonly used in the ancient DNA context, performed using GLIMPSE v1.1.1 (17); and a probabilistic approach, in which genotype likelihoods (18) were calculated and 1,000 different genotypes were sampled, weighting their effect on the phenotypic inference according to their likelihood. We obtained phenotypic predictions by analyzing the 41 HIrisPlex-S positions; direct and imputed genotypes were entered directly into the HIrisPlex-S website, whereas probabilistic predictions incorporated likelihood-based genotype sampling. We then postprocessed the resulting prediction files in the R environment v4.3.3 (19) to extract, interpret, and compare the phenotypic outcomes across all approaches (see Methods and SI Appendix for details).

Fig. 1.

Diagram illustrating the conceptual framework for phenotypic prediction across varying sequencing coverages. The original high-coverage sample (Top) undergoes progressive downsampling, decreasing sequencing coverage (Left). The downsampled data are processed through three distinct workflows: direct approach (Top branch), imputation approach (Middle branch), and probabilistic approach (Bottom branch). Each branch represents a step-by-step workflow that returns predicted phenotypic traits for downstream comparative analyses.

To measure the loss of information when coverage is low, we iteratively downsampled the reads of the Ust’-Ishim [45,045 calBP (20)], SF12 [9,033 to 8,757 calBP (21)], and I5077 [6,110 ± 25 calBP (22)] samples. We selected these samples because they represent crucial historical periods and geographical locations (Paleolithic Western Siberia, Mesolithic Sweden, and the Neolithic Balkans), and were, at the time of the analysis, the best-covered samples at the 41 HIrisPlex-S positions. Subsequently, we identified the minimum coverage across the 41 HIrisPlex-S positions (SI Appendix, Figs. S19–S21), that is, 17× for the Ust’-Ishim sample, 33× for SF12, and 14× for I5077, and performed progressive downsampling starting from these coverage levels. We could obtain robust phenotypic predictions only for the Ust’-Ishim sample (28×) and for the SF12 sample (44×). Conversely, because of low coverage at several sites and the allelic combination, we did not reach the threshold for a reliable skin color prediction for the I5077 sample (see SI Appendix for details).

We thus compared the phenotypic predictions obtained from two high-coverage genomes with the ones obtained considering 10 or 11 different coverage levels: 17×, 15×, 12×, 10×, 8×, 5×, 4×, 3×, 2×, and 1× for the Ust’-Ishim sample and 33×, 20×, 15×, 12×, 10×, 8×, 5×, 4×, 3×, 2×, and 1× for the SF12 sample. For each coverage level, we tested 10 independent downsampled datasets. Fig. 2 shows the estimated phenotypes and their frequency for each coverage level, under the three different calling algorithms.

Fig. 2.

Comparative phenotypic predictions for Ust’-Ishim (A) and SF12 (B) across the three different approaches and sequencing coverages. Columns correspond to the three different phenotypic traits predicted: eye color (Left), hair color (Center), and skin color (Right). Each panel consists of four rows: the first row presents the phenotypes inferred from high-coverage data (17× for Ust’-Ishim and 33× for SF12, respectively) following the direct approach. The second, third, and fourth rows show the results obtained using the direct, imputation, and probabilistic approaches, respectively. The x-axis represents the different coverage levels, while the y-axis shows the percentage of times each phenotype was predicted over 10 independent downsampled datasets.

Ust’-Ishim sample.

What we might call the true phenotypes were evaluated through the direct approach with a mean coverage level of 28×, namely brown eyes, black hair, and dark to black skin. As shown in Fig. 2A, the three estimation methods returned robust estimates for eye and hair color, which were correctly predicted 100% of the time regardless of the coverage level. Conversely, the skin prediction yielded different results. Only the probabilistic approach recognized the true phenotype every time; the direct approach gave a 10% error rate at the lowest coverage levels, whereas the imputation protocol started providing erroneous estimates at 3× (10% error), reaching 50% wrong predictions at 2× and 1×.

SF12 sample.

The true phenotypes at a mean coverage level of 44× were blue eyes, brown hair, and dark skin (Fig. 2B). With the direct approach, the system was unable to predict the eye color for some downsampled datasets starting at 8×, whereas with the probabilistic approach the eye color was unpredicted in only 10% of the 1× datasets. In all the other downsampled datasets the blue-eye phenotype was correctly identified. With the imputation approach, the system failed to identify the true phenotype at 4×, 2×, and 1×, where brown eyes were inferred in 10% of cases. For this sample, hair color proved harder to predict for all three approaches. Indeed, we observed wrong estimates starting at 8× coverage, with the misassignment proportion reaching 80% for some coverage levels. The probabilistic method showed the lowest misidentification rate, whereas the highest error rate was observed for the imputation approach (a 68% misidentification rate on average below 8×). Skin color was correctly predicted 100% of the time by the three approaches when the coverage level was ≥10×. Starting at 8×, the system had some difficulty predicting the true phenotype, especially when the genotypes were defined through the direct approach or the imputation method. In particular, datasets generated through the imputation method were misassigned almost 100% of the time for coverage levels below 8×. This percentage is lower when the direct approach is used, about 30% on average. The probabilistic approach performed slightly better; below 8× coverage we had, on average, 23% misassigned phenotypes.

This analysis highlights how easily the HIrisPlex-S phenotypic estimation procedure fails if genotypes are called from data with a coverage <8×, a situation commonly encountered when DNA comes from ancient specimens. Above this coverage threshold, the direct, probabilistic, and imputation methods recover the true phenotype 100% of the time. At a genomic coverage level of <8×, conversely, the three methods perform differently. Among them, the probabilistic method is the one that most frequently returns the same phenotype estimated at full coverage. The imputation method performs worst, suggesting that phenotypes inferred after imputing all missing HIrisPlex-S positions should be treated with caution.
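The percentages reported in this comparison can be tallied per coverage level from the replicate predictions. The following is a minimal sketch, not the authors' R code: the function name `replicate_rates` and the use of `None` for an undetermined prediction are assumptions made for illustration.

```python
def replicate_rates(predictions, truth):
    """Summarize replicate predictions against the full-coverage phenotype.

    predictions: one label per downsampled dataset; None marks a replicate
                 for which no phenotype could be predicted (assumed encoding).
    truth:       the phenotype inferred from the full-coverage data.
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("at least one replicate is required")
    undetermined = sum(p is None for p in predictions)
    wrong = sum(p is not None and p != truth for p in predictions)
    return {
        "correct": (n - undetermined - wrong) / n,
        "misassigned": wrong / n,
        "undetermined": undetermined / n,
    }


# Hypothetical outcome for 10 replicates at one coverage level.
example = ["dark"] * 7 + ["intermediate"] * 2 + [None]
rates = replicate_rates(example, truth="dark")
```

Keeping undetermined calls separate from misassigned ones mirrors the distinction drawn in the text between a phenotype that could not be predicted and one that was predicted incorrectly.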

Phenotypic Inference of Eurasian Samples from Paleolithic to Iron Age.

We collected a large dataset of 348 published ancient DNAs from individuals typed at different coverage levels (all above 1×; Fig. 3 and Dataset S1), and spanning from 45,000 to 1,700 y ago. Samples were labeled based on archaeological evidence and not only on chronology. We estimated eye, skin, and hair color of each sample, applying the inferential approach that provided more robust results for that specific coverage level in the previous analyses (Methods and Dataset S2). Fig. 4 and SI Appendix, Figs. S22–S33 show the pigmentation traits inferred in samples belonging to different time periods. We only reported phenotypes that the calling methods could confidently predict; for the probabilistic method, we reported the status of a phenotype only if it had been predicted in at least 90% of the 1,000 replications generated by the genotype likelihoods.
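The 90% reporting rule for the probabilistic method can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `consensus_phenotype` is hypothetical, and the input is taken to be the list of phenotype labels returned for the 1,000 sampled genotype combinations.

```python
from collections import Counter


def consensus_phenotype(predictions, min_fraction=0.9):
    """Return (label, support) for the most frequent prediction.

    The label is None when the most frequent phenotype is supported by
    fewer than min_fraction of the replicates, i.e. it is not reported.
    """
    if not predictions:
        return None, 0.0
    label, count = Counter(predictions).most_common(1)[0]
    support = count / len(predictions)
    return (label if support >= min_fraction else None), support


# Hypothetical replicate outcomes for two samples.
confident = consensus_phenotype(["dark"] * 950 + ["intermediate"] * 50)
uncertain = consensus_phenotype(["dark"] * 800 + ["intermediate"] * 200)
```

Returning the support alongside the label keeps the reliability measure available even when the phenotype itself is withheld.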

Fig. 3.

Geographical distribution across Eurasia of selected ancient samples. The colored dots represent different time transects: black (Paleolithic), blue (Mesolithic), yellow (Neolithic), green (Copper Age), orange (Bronze Age), and purple (Iron Age). The number of samples at each site is indicated inside each dot. The two test samples, the Paleolithic Ust’-Ishim and the Mesolithic SF12, are highlighted with labels.

Fig. 4.

Temporal and geographical distribution of skin pigmentation estimates in Eurasia from Paleolithic to Iron Age. The maps illustrate the spatial and temporal distribution of the inferred skin pigmentation phenotypes. Dimension of each pie chart indicates the sample size. Skin color results are grouped into three categories: Dark (Dark to Black and Dark), Intermediate, and Light (Pale and Very Pale).

Paleolithic period: from approximately 45,000 to 13,000 y ago; 12 samples; 11 typed for eye color, hereafter E, 10 for hair color, hereafter H; 12 for skin color, hereafter S; one of them is the Ust'-Ishim test sample.

Dark phenotypes are inferred for all traits for almost all the samples analyzed. The only exception is a Russian sample, Kostenki 14, dated to between 38,700 and 36,200 y ago, which exhibits an intermediate skin color (23).

Mesolithic period: From approximately 14,000 to 4,000 y ago; 66 samples; 35 E, 63 H, 53 S; one of them is the SF12 test sample.

Light eye colors are inferred for 11 samples; they come from Northern Europe, France, and Serbia. By contrast, all 24 samples from the easternmost regions only display the dark phenotype. In Serbia, both phenotypes coexist, one with blue eyes and four with brown eyes. Sixty-one samples show dark hair phenotypes, with the exception of 1 Swedish and 1 Serbian sample, both showing blonde features. Skin color displays a broader range of phenotypes: predominantly dark (43 samples), with regions in Europe also showing intermediate phenotypes (7 samples from Denmark, France, Georgia, Western Russia, Serbia, and Spain) and the earliest light phenotypes observed in this study (3 samples from France and Sweden). In this time transect, we observe the earliest individual with inferred blue eyes, blonde hair, and light skin, NEO27, a hunter-gatherer from Sweden who lived approximately 12,000 y ago (24).

Neolithic period: From approximately 10,000 to 4,000 y ago; 132 samples, 93 E, 120 H, 93 S; one of them is the I5077 test sample.

We still observe most individuals showing the dark eye phenotype (81 samples), including in France, where we previously found only light phenotype. Both dark and light eye phenotypes are observed in Northern and Central-Eastern Europe, with the light phenotype inferred in 12 samples from Austria, Denmark, Greece, Ireland, Latvia, Serbia, and Sweden. Hair color is predicted as dark in almost all samples, with one exception from Austria who has an intermediate phenotype and five from Denmark, Greece, Ireland, and Serbia with light phenotype. Additionally, we observed for the first time in our dataset 1 sample with red hair, an early farmer from Türkiye. The skin phenotype is more variable, with regions in Europe (Portugal, Italy, Austria, Germany, Hungary, Estonia, and Russia) and Western Asia (Iran and Türkiye) exhibiting exclusively a dark phenotype, whereas other regions show either both dark and intermediate phenotypes (25 samples exhibit the latter, from Croatia, Denmark, France, Greece, Ireland, Latvia, Malta, Poland, Serbia, Sweden, and Ukraine), or even light skin phenotypes (in 5 samples from the Czech Republic, Great Britain, Latvia, Sweden, and Ukraine).

Copper Age: From approximately 6,000 to 3,500 y ago; 42 samples, 31 E, 33 H, 28 S.

Even during the Copper Age dark phenotypes are prevalent. Most samples, 26, showed dark eyes, with the light phenotype present in 5 samples from Denmark, Hungary, Italy, and Romania. Hair phenotypes remain mostly dark, with only 1 sample showing intermediate hair color (Denmark) and 1 sample exhibiting light hair color (Romania). Skin color is still predominantly dark (17 samples) in Eastern Europe and the Iberian Peninsula, but intermediate skin tones are observed in Spain, Kazakhstan, and Central Europe (7 samples from Hungary, Italy, the Netherlands, Poland, and Romania), and light skins in Denmark, Great Britain, and Romania (4 samples).

Bronze Age: From approximately 7,000 to 3,000 y ago; 71 samples, 55 E, 64 H, 43 S.

In this time period, we observed an increasing proportion of light eye phenotype. While 39 samples throughout Europe and Asia are still exhibiting dark eyes, 16 samples display a light phenotype. These light phenotypes are still mainly found in Europe, but are also emerging in other regions such as Western Russia and Jordan, and as far East as Kazakhstan. Dark hair phenotypes remain predominant in most of Europe and Asia (49 samples), with intermediate phenotypes present in 2 samples from Denmark and Hungary. However, there is a greater proportion of light phenotypes (12 samples), specifically in Italy, Western Russia, Jordan, and Kazakhstan. One sample from Greece exhibits red hair. Most samples from Western and Southern Europe, Russia, and Southern Asia still show dark skin (22 samples), but we also observed an increase in intermediate phenotypes in Central Europe and Central-Eastern Europe, as well as their first appearance in Russia (15 samples in total). The light phenotype emerged in 6 samples from the Czech Republic, Denmark, Estonia, France, Great Britain, and Hungary. During this period, we observed an increase in the co-occurrence of estimated blue eyes, blonde hair, and light skin, with 4 samples exhibiting this combination of phenotypes: I7198 from the Czech Republic (14), EKA1 from Estonia (25), I2445 from England (14), and SZ1 from Hungary (26).

Iron Age: From approximately 3,000 to 1,700 y ago; 25 samples, 15 E, 19 H, 11 S.

In this phase, the dark eye phenotype (10 samples) is present in Great Britain, Spain, and Central and Eastern Russia, while the light eye phenotype (3 samples) occurs in Denmark and Finland. Both phenotypes coexist in Italy and Kazakhstan. Hair remains predominantly dark throughout Europe and Asia (14 samples), with one intermediate phenotype observed in Denmark and four light phenotypes in Denmark, Finland, Italy, and Kazakhstan. Skin color analysis shows the dark phenotype (6 samples) in Central and Eastern Russia, Kazakhstan, and Italy. The intermediate phenotype occurs in 3 samples from Denmark, Kazakhstan, and Spain, whereas the light phenotype (2 samples) is still present in Northern Europe. A combination of blue eyes, blonde hair, and pale skin is observed in 2 samples: VK521 from Denmark (27) and DA236 from Finland (28).

SI Appendix, Fig. S34 and Dataset S2 show that the inferred phenotypes would be different if we called them using a direct rather than a probabilistic approach for low coverage levels. In particular, direct calls would lead to a higher frequency of dark skin phenotypes in all the time periods considered.

Discussion

Inference of human pigmentation traits from ancient DNA evidence has garnered significant attention in recent years. So far, however, all the available approaches rest on the assumption that the sample’s allelic state at polymorphic positions is exactly known, which is not always warranted. Indeed, ancient genomic data are often produced at low coverage, meaning that few reads (if any) map to a specific genomic region; as a consequence, the direct calling of genotypes on these samples may introduce a substantial bias in downstream analyses (7–9). As for imputation, it is based on association of alleles in current genomes; assuming that levels of linkage disequilibrium were the same in ancient genomes would introduce a hard-to-quantify bias in the analyses. Our downsampling experiments show that the HIrisPlex-S system can be safely exploited through the currently available pipeline (direct approach) only when the coverage level at the 41 inferential sites is at least 8×, which is uncommon in ancient samples.

To address this issue, in this work, we propose a framework integrating a probabilistic approach in phenotype inference, useful when a direct genotype calling would not be accurate. We tested this framework by estimating phenotypes considering for each sample 1,000 combinations of genotypes at the 41 HIrisPlex-S positions, reflecting their likelihoods (18). This way, we obtained both a phenotype inference and a measure of its reliability. The coverage downsampling procedure we applied to the Ust’-Ishim and SF12 samples showed that the probabilistic approach is robust in inferring the true pigmentation trait even at low (below 8×) or very low (1× to 2×) coverage levels, although, in the SF12 sample, there is some degree of uncertainty at low coverage levels. With low coverage, one approach, widely used in paleogenomics, is to impute the genotypic state using algorithms that leverage information from adjacent loci and a reference database to fill in the missing position. However, we showed that imputing several HIrisPlex-S positions may bias the estimation of phenotypes, particularly at very low coverage, and therefore the results of the imputation approach should be treated with caution. As is the case when continuous traits are described as discrete, not all phenotypes are inferred with the same level of confidence by HIrisPlex-S (7–9). This is particularly true for the skin phenotype categories, the limits of which are admittedly somewhat arbitrary.

By a probabilistic approach, we showed that eye, hair, and skin color changed substantially through time in Eurasia. It was reasonable to imagine that the first hunting-gathering settlers, who came from warmer climates, had mostly dark pigmentation (2). We are now showing that their phenotypes persisted up to the Iron age. We found the earliest instance of light skin color in the Swedish Mesolithic, but it comes from only one sample in >50. Things changed afterward, but very slowly, so that only in the Bronze Age did the frequency of light skins equal that of dark skins in Europe; during much of prehistory, most Europeans were dark-skinned. A similar trend, with dark pigmentation long coexisting with an increasing, yet relatively small proportion of lighter traits, is observed for hair and eye color, although there was a temporary peak of light eye frequency in the Mesolithic period, when we inferred light pigmentation for 11 out of 35 samples.

There is little doubt that, on top of selection, gene flow was the main factor causing shifts of pigmentation traits. Antonio et al. (29) observed a decrease of between-population genetic variances as we move in time from the Mesolithic period to the present. In this study, we described a parallel increase of within-population variance for pigmentation traits. Both results are the expected consequences of an evolutionary scenario in which the effects of gene flow exceed those of genetic drift.

Among the episodes of gene flow documented in Western Eurasian prehistory, the spread of early Neolithic farmers from Anatolia is known to have profoundly changed the genetic makeup of populations (30), to the point that some authors speak of “population turnover,” e.g., in the British Isles (13) and in Denmark (24). What we observed for pigmentation traits appears to be, in part, a consequence of that massive migration. Actually, the transition to food production also led to an increase in infectious disease and to a poorer diet (see, e.g., ref. 31), but once in the new territories, immigrating farmers had two evolutionary advantages over their hunting-gathering counterparts. By farming and animal herding they increased the amount of available food, and had a skin phenotype fit for the lower levels of UV radiation. Both factors gave them (and only them) the potential for demographic growth (32, 33), ultimately leading to profound changes in the Europeans’ genomes.

However, these and other ancient DNA data (34) show that the process was anything but linear, and took longer than the Neolithic to be completed. The first instance of light phenotypes we identified dates back to Mesolithic (skin, eye) or Neolithic (hair) times, but whereas light skins steadily increased in frequency across time, possibly due to their adaptive value, hair and eye colors showed fluctuations, the main one being the localized Mesolithic increase of light eye frequency. As a consequence, we do not think that the changes described in this paper can be regarded as the effects of a wave of migration proceeding at a regular pace. Rather, what we think we are observing is a process in which, above and beyond the major Neolithic demic diffusion over much of Western Eurasia, localized processes of migration and admixture, or lack thereof, played a significant role. Even under reduced UV radiation, then, food availability was a factor. Some hunting and gathering populations could still obtain sufficient vitamin D from dietary sources, such as fish and game. Only when farming settlements got larger and the fauna was depleted did pale skin colors replace the dark phenotypes for good.

Identifying the specific genes responsible for the observed trends is not straightforward, due to the low mean coverage, which often leaves the allelic status at several loci undefined. However, the Paleolithic sample Ust-Ishim (with dark phenotypes for all three traits) and the Hungarian Bronze age SZ1 (with light phenotypes for all three traits) differ at several loci known to have changed in the course of time (SLC45A2, LOC105374875, HERC2, PIGU, TYRP1, ANKRD11, BNC2, HERC2, OCA2, and DEF8), as well as two novel loci, TYR (rs1042602) and SLC24A5 (rs1426654), both of which carry the AA genotypes in the SZ1 sample. Strong support from previous studies links the A allele of rs1042602 to the absence of freckles (35). Furthermore, the rs1042602 variant, located in the TYR gene encoding tyrosinase involved in melanin biosynthesis, is associated with albinism, particularly in individuals who are homozygous for the A allele, in combination with another genetic variant within the TYR gene, rs4547091, when homozygous for the C allele (36). The nonsynonymous mutation (G→A) at rs1426654 in the SLC24A5 gene is strongly associated with reduced melanin production and lighter skin pigmentation in humans. The ancestral allele G is fixed in Africans and East Asians while the derived allele A is nearly fixed in European populations (98.7 to 100%) (37, 38) and present at high frequencies in the Near East and South Asia. In addition, rs1426654 has been linked to skin pigmentation variation in admixed populations with recent European ancestry and within South Asians (38). Its rarity in East Asia and most sub-Saharan African populations supports the hypothesis that it originated in Europe or the Near East within the past 10,000 to 35,000 y (38). As noted by Lin et al., the SLC24A5 locus exemplifies a rare instance of strong, recent adaptation in human history and serves as a case of adaptive gene flow at a pigmentation-related locus (39).

HIrisPlex-S has a low margin of error, particularly for European populations. While the system provides robust inferences, its accuracy can be affected in recently admixed populations, especially for intermediate phenotypes (40, 41) and skin color. This can be attributed to factors such as phenotype misclassification resulting from subjective skin tone perception, as observed in Southeastern Brazilians (42). With the development of inferential procedures for phenotype estimation, such as the one proposed in this work, together with the increasing production of high-quality ancient genomic data, we will gain deep insights into the evolution of pigmentation-related phenotypes in our species. This will help reconstruct a comprehensive and detailed picture of crucial phases of our prehistory while enhancing our understanding of the evolutionary and demographic forces that drove and shaped modern human phenotypic variation.

Methods

Testing the Robustness of Phenotypic Inference on Ancient Data.

To evaluate the robustness of the three different approaches we considered two ancient high-coverage samples, Ust’-Ishim and SF12. For the direct and probabilistic approaches, we generated a pileup file from the alignment data for the 41 HIrisPlex-S positions using the SAMtools v1.11 mpileup command (43). Then, we conducted a pointwise progressive downsampling from the mpileup output file within the R environment v4.3.3 (19), sampling without replacement so that no read or nucleotide could be drawn more than once (see SI Appendix for details). For the imputation approach, the downsampling was performed using the SAMtools v1.11 view command with the -s option (43), and the 41 informative positions were masked and then imputed (see SI Appendix for details).
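The pointwise downsampling of a pileup column can be sketched as below. The authors worked in R; this Python version is only an illustration, and it assumes that "progressive" means each lower coverage level is drawn from the bases retained at the previous level. The function name `progressive_downsample` and the example column are hypothetical.

```python
import random
from collections import Counter


def progressive_downsample(bases, depths, seed=0):
    """Downsample one pileup column to successively lower depths.

    bases:  the observed bases at a site, one character per read.
    depths: target depths; they are processed from highest to lowest.
    Sampling is without replacement, so no observed base is drawn twice,
    and each level is a subset of the level above it (an assumption).
    """
    rng = random.Random(seed)
    current = list(bases)
    levels = {}
    for depth in sorted(depths, reverse=True):
        if depth < len(current):
            current = rng.sample(current, depth)
        levels[depth] = list(current)
    return levels


# Hypothetical heterozygous site covered by 17 reads.
column = "A" * 10 + "G" * 7
levels = progressive_downsample(column, [17, 15, 12, 10, 8, 5, 4, 3, 2, 1], seed=42)
```

Repeating the call with different seeds would give independent replicates for each coverage level, comparable to the 10 downsampled datasets per level described in the Results.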

For the probabilistic approach, we computed, for each of the 41 informative positions, the genotype likelihoods of each of the 10 possible genotypes within the R environment v4.3.3 (19), applying the formula of the first version of GATK (dragon) (44). This approach allows multiple genotypes to be considered at the same position, each weighted by its genotype likelihood, improving on the procedures of ref. 13. Since the HIrisPlex-S system accepts only a single allele derived from a genotype, we performed 1,000 samplings from the 10 possible genotypes according to their genotype likelihoods and posterior probabilities. Each sampling event resulted in a unique combination of informative alleles. The final result consists of a table containing 1,000 rows, each representing a set of 41 allelic combinations (see SI Appendix for details).
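The following Python sketch illustrates the kind of calculation described above for a single site. It assumes the commonly cited form of the original GATK diploid model, in which each read contributes (1/2)·P(b | A1) + (1/2)·P(b | A2), with P(b | A) = 1 − e if the observed base matches the allele and e/3 otherwise, where e is the error probability implied by the Phred base quality. The flat prior used to turn likelihoods into sampling weights, the cap on the error probability, and all function names are assumptions of this sketch, not details confirmed by the text; the authors' own implementation is in R.

```python
import itertools
import math
import random

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(g) for g in itertools.combinations_with_replacement(BASES, 2)]


def base_prob(observed, allele, err):
    """P(observed base | true allele) under a symmetric error model."""
    return 1.0 - err if observed == allele else err / 3.0


def genotype_log_likelihoods(bases, phred_quals):
    """Log-likelihood of each of the 10 genotypes given the reads at one site."""
    logl = {}
    for gt in GENOTYPES:
        total = 0.0
        for base, qual in zip(bases, phred_quals):
            # Assumption: cap the error at 0.75 so a quality of 0 is
            # uninformative instead of producing log(0).
            err = min(10 ** (-qual / 10), 0.75)
            total += math.log(
                0.5 * base_prob(base, gt[0], err) + 0.5 * base_prob(base, gt[1], err)
            )
        logl[gt] = total
    return logl


def normalise(logl):
    """Convert log-likelihoods to weights summing to 1 (flat prior assumed)."""
    top = max(logl.values())
    weights = {gt: math.exp(v - top) for gt, v in logl.items()}
    total = sum(weights.values())
    return {gt: w / total for gt, w in weights.items()}


def sample_genotypes(probs, n=1000, seed=0):
    """Draw n genotypes for one site in proportion to their weights."""
    rng = random.Random(seed)
    genotypes = list(probs)
    return rng.choices(genotypes, weights=[probs[g] for g in genotypes], k=n)
```

Repeating `sample_genotypes` for each of the 41 positions and stacking the draws column by column would give a table of 1,000 rows, one sampled genotype per position in each row. With very few reads, several genotypes receive appreciable weight, so the uncertainty is carried into the downstream prediction rather than being hidden by a single hard call.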

To infer the phenotypic traits, we uploaded the required CSV file containing the allele count for the 41 positions, obtained with the three protocols, into the HIrisPlex-S website (https://hirisplex.erasmusmc.nl). The results obtained from the HIrisPlex-S system were analyzed within the R environment v4.3.3 (19) following the guidelines outlined in the HIrisPlex-S User Manual (2018) (79).
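As a rough illustration of how a genotype becomes an allele count for upload, the Python sketch below counts copies of a designated allele and writes one sample row to CSV. The column naming (`sampleid`, `rsid_allele`), the `NA` code for missing data, and the counted alleles shown in the usage note are placeholders assumed for this sketch; the real header and the allele counted at each of the 41 SNPs must be taken from the HIrisPlex-S User Manual.

```python
import csv


def allele_count(genotype, target_allele):
    """Number of copies (0, 1, or 2) of `target_allele` in a genotype string.

    A missing genotype (None) is written as "NA" (assumed missing-data code).
    """
    if genotype is None:
        return "NA"
    return genotype.count(target_allele)


def write_input_csv(sample_id, genotypes, targets, handle):
    """Write a header row and one sample row of allele counts.

    `genotypes` maps rsID -> genotype string such as "AG".
    `targets` is an ordered list of (rsID, counted allele) pairs; the
    pairs used by the real tool are NOT reproduced here.
    """
    writer = csv.writer(handle)
    writer.writerow(["sampleid"] + [f"{rsid}_{allele}" for rsid, allele in targets])
    writer.writerow(
        [sample_id]
        + [allele_count(genotypes.get(rsid), allele) for rsid, allele in targets]
    )
```

Calling `write_input_csv("sample1", {"rs1426654": "AG"}, [("rs1426654", "A"), ("rs1042602", "A")], handle)` writes the header `sampleid,rs1426654_A,rs1042602_A` followed by the row `sample1,1,NA`. For the probabilistic approach, the same conversion would be applied to each of the 1,000 sampled rows.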

Inference of Pigmentation Traits on Ancient Genomic Data.

We analyzed a dataset of 348 published ancient human whole genomes (Fig. 3 and Dataset S1), including the two test samples used for the validation step, with a minimum mean coverage of 1×. The samples span a temporal range from approximately 45,000 to 1,700 y ago and are distributed from Western Europe to Asia, representing a total of 34 countries. We defined six different groups based on the chronological period, archeological context, cultural affiliation, and genetic affinities of the samples, as reported in the literature. All the samples were processed using an in-house pipeline (see SI Appendix for details).

Within our dataset, 13 samples have a coverage level above 8× across all the 41 HIrisPlex-S positions; for these, we applied the direct approach to extract the genotypes at the 41 positions. For the remaining 335 samples, whose coverage was equal to or below 8×, we applied the probabilistic approach and treated uncovered HIrisPlex-S positions as missing data.
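The decision rule in this paragraph can be sketched in a few lines of Python. The sketch assumes that "above 8× across all the 41 positions" means that every individual position exceeds the threshold; the text could also be read as a mean over the positions. Function names and the input layout are hypothetical.

```python
def choose_approach(depths, threshold=8):
    """Pick the genotyping approach for one sample.

    `depths` maps each HIrisPlex-S rsID to the read depth at that site.
    Assumption: the direct approach requires every site to exceed the
    threshold; otherwise the probabilistic approach is used.
    """
    if not depths:
        raise ValueError("no positions supplied")
    if all(depth > threshold for depth in depths.values()):
        return "direct"
    return "probabilistic"


def missing_positions(depths):
    """Positions with no reads, which are treated as missing data."""
    return sorted(rsid for rsid, depth in depths.items() if depth == 0)
```

Under this reading, a sample with depths of 9× and 8× at two sites is assigned to the probabilistic approach, because a depth equal to the threshold does not qualify for direct genotype calling.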

Supplementary Material

Appendix 01 (PDF)

Dataset S01 (XLSX)

pnas.2502158122.sd01.xlsx (66.7KB, xlsx)

Dataset S02 (XLSX)

pnas.2502158122.sd02.xlsx (78.4KB, xlsx)

Acknowledgments

G.B. and S.P. were financially supported by PRIN 2020 (Grant 2020HJXCK9) from the Italian Ministry of Education, University and Research (MIUR). M.T.V., A.B., P.S., and S.G. were financially supported by PRIN 2020 (Grant 2020TACEZR) from the MIUR, and by PRIN 2022 PNRR from the MIUR, funded by the European Union—NextGenerationEU—mission 4, component C2, investment 1.1—P20228M8ZN—CUP F53D23008180001. We thank Gloria Gonzalez Fortes for her contribution in the initial stages of this project and Nina Jablonski for discussion on the previous version of this manuscript.

Author contributions

S.G. and G.B. designed research; S.P., P.S., and M.T.V. performed research; A.B. contributed new reagents/analytic tools; S.P., P.S., M.T.V., and E.T. analyzed data; and S.P., P.S., S.G., and G.B. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

PNAS policy is to publish maps as provided by the authors.

Contributor Information

Silvia Ghirotto, Email: ghrslv@unife.it.

Guido Barbujani, Email: g.barbujani@unife.it.

Data, Materials, and Software Availability

All scripts underlying the inferential procedure presented in this manuscript are fully documented and available on GitHub at https://github.com/Ghirotto-Lab-at-University-of-Ferrara/Phenotypic_inference and Zenodo (45). Previously published data were used for this work (SI Appendix, Dataset S1).

Supporting Information

References

  • 1. Jablonski N. G., Chaplin G., The evolution of human skin coloration. J. Hum. Evol. 39, 57–106 (2000).
  • 2. Jablonski N. G., The evolution of human skin pigmentation involved the interactions of genetic, environmental, and cultural variables. Pigment Cell Melanoma Res. 34, 707–729 (2021).
  • 3. Rogers A. R., Iltis D., Wooding S., Genetic variation at the MC1R locus and the time since loss of human body hair. Curr. Anthropol. 45, 105–108 (2004).
  • 4. Chaplin G., Geographic distribution of environmental factors influencing human skin coloration. Am. J. Phys. Anthropol. 125, 292–302 (2004).
  • 5. Sturm R. A., Larsson M., Genetics of human iris colour and patterns. Pigment Cell Melanoma Res. 22, 544–562 (2009).
  • 6. Liu J., Bitsue H. K., Yang Z., Skin colour: A window into human phenotypic evolution and environmental adaptation. Mol. Ecol. 33, e17369 (2024), 10.1111/mec.17369.
  • 7. Chaitanya L., et al., The HIrisPlex-S system for eye, hair and skin colour prediction from DNA: Introduction and forensic developmental validation. Forensic Sci. Int. Genet. 35, 123–135 (2018).
  • 8. Walsh S., et al., Global skin colour prediction from DNA. Hum. Genet. 136, 847–863 (2017).
  • 9. Walsh S., et al., Developmental validation of the HIrisPlex system: DNA-based eye and hair colour prediction for forensic and anthropological usage. Forensic Sci. Int. Genet. 9, 150–161 (2014).
  • 10. Ferrando-Bernal M., Brand C. M., Capra J. A., Inferring human phenotypes using ancient DNA: From molecules to populations. Curr. Opin. Genet. Dev. 90, 102283 (2025), 10.1016/j.gde.2024.102283.
  • 11. Draus-Barini J., et al., Bona fide colour: DNA prediction of human eye and hair colour from ancient and contemporary skeletal remains. Investig. Genet. 4, 3 (2013).
  • 12. King T. E., et al., Identification of the remains of King Richard III. Nat. Commun. 5, 5631 (2014).
  • 13. Brace S., et al., Ancient genomes indicate population replacement in early neolithic Britain. Nat. Ecol. Evol. 3, 765–771 (2019).
  • 14. Olalde I., et al., The beaker phenomenon and the genomic transformation of northwest Europe. Nature 555, 190–196 (2018).
  • 15. Jensen T. Z. T., et al., A 5700 year-old human genome and oral microbiome from chewed birch pitch. Nat. Commun. 10, 5520 (2019).
  • 16. DePristo M. A., et al., A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
  • 17. Rubinacci S., Ribeiro D. M., Hofmeister R. J., Delaneau O., Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 53, 120–126 (2021).
  • 18. Nielsen R., Paul J. S., Albrechtsen A., Song Y. S., Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12, 443–451 (2011).
  • 19. R Core Team, R: A Language and Environment for Statistical Computing [Preprint] (R Foundation for Statistical Computing, Vienna, Austria, 2021). https://www.R-project.org/.
  • 20. Fu Q., et al., Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014).
  • 21. Günther T., et al., Population genomics of mesolithic Scandinavia: Investigating early postglacial migration routes and high-latitude adaptation. PLoS Biol. 16, e2003703 (2018), 10.1371/journal.pbio.2003703.
  • 22. Mathieson I., et al., The genomic history of southeastern Europe. Nature 555, 197–203 (2018).
  • 23. Seguin-Orlando A., et al., Genomic structure in Europeans dating back at least 36,200 years. Science 346, 1113–1118 (2014).
  • 24. Allentoft M. E., et al., 100 ancient genomes show repeated population turnovers in neolithic Denmark. Nature 625, 329–337 (2024).
  • 25. Malmström H., et al., The genomic ancestry of the Scandinavian Battle Axe Culture people and their relation to the broader Corded Ware horizon. Proc. R. Soc. B: Biol. Sci. 286, 20191528 (2019).
  • 26. Amorim C. E. G., et al., Understanding 6th-century barbarian social organization and migration through paleogenomics. Nat. Commun. 9, 3547 (2018).
  • 27. Margaryan A., et al., Population genomics of the Viking world. Nature 585, 390–396 (2020).
  • 28. Sikora M., et al., The population history of northeastern Siberia since the Pleistocene. Nature 570, 182–188 (2019).
  • 29. Antonio M. L., et al., Stable population structure in Europe since the Iron Age, despite high mobility. eLife 13, e79714 (2024), 10.7554/eLife.79714.
  • 30. Marchi N., et al., The genomic origins of the world’s first farmers. Cell 185, 1842–1859.e18 (2022).
  • 31. Pearson J., et al., Mobility and kinship in the world’s first village societies. Proc. Natl. Acad. Sci. U.S.A. 120, e2209480119 (2023).
  • 32. Ammerman A. J., “The transition to early farming in Europe” in Simulating Transitions to Agriculture in Prehistory, Pardo-Gordó S., Bergin S., Eds. (Springer International Publishing, 2021), pp. 225–253.
  • 33. Bellwood P., First Farmers: The Origins of Agricultural Societies (Wiley-Blackwell, ed. 2, 2023).
  • 34. Kuijpers Y., et al., Evolutionary trajectories of complex traits in European populations of modern humans. Front. Genet. 13, 833190 (2022), 10.3389/fgene.2022.833190.
  • 35. Sulem P., et al., Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat. Genet. 39, 1443–1452 (2007).
  • 36. Liu J., Black G. C., Kimber S. J., Sergouniotis P. I., Generation of a human induced pluripotent stem cell line carrying the TYR c.575C>A (p.Ser192Tyr) and c.1205G>A (p.Arg402Gln) variants in homozygous state using CRISPR-Cas9 genome editing. Stem Cell Res. 64, 102880 (2022).
  • 37. Lamason R. L., et al., SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310, 1782–1786 (2005).
  • 38. Basu Mallick C., et al., The light skin allele of SLC24A5 in south Asians and Europeans shares identity by descent. PLoS Genet. 9, e1003912 (2013), 10.1371/journal.pgen.1003912.
  • 39. Lin M., et al., Rapid evolution of a skin-lightening allele in southern African KhoeSan. Proc. Natl. Acad. Sci. U.S.A. 115, 13324–13329 (2018).
  • 40. Marano L. A., Andersen J. D., Goncalves F. T., Garcia A. L. O., Fridman C., Evaluation of HIrisplex-S system markers for eye, skin and hair color prediction in an admixed Brazilian population. Forensic Sci. Int. Genet. Suppl. Ser. 7, 427–428 (2019).
  • 41. Hohl D. M., et al., Applicability of the IrisPlex system for eye color prediction in an admixed population from Argentina. Ann. Hum. Genet. 86, 297–327 (2022).
  • 42. Carratto T. M. T., et al., Evaluation of the HIrisPlex-S system in a Brazilian population sample. Forensic Sci. Int. Genet. Suppl. Ser. 7, 794–796 (2019).
  • 43. Li H., et al., The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  • 44. McKenna A., et al., The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  • 45. Perretti S., et al., Phenotypic_inference-v1.0. Zenodo. 10.5281/zenodo.15738991. Deposited 25 June 2025.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
