Summary
Indigenous populations of the Qinghai-Xizang Plateau exhibit unique high-altitude adaptations, especially within Tibeto–Burman (TB) groups. However, the paternal genetic heritage of eastern Plateau regions remains less explored. We present an integrative Y chromosome dataset of 9,901 modern and ancient individuals, including whole Y chromosome sequences from 1,297 individuals and extensive Y-SNP/STR genotype data. We reveal the Paleolithic common origin of Qinghai-Xizang Plateau ancestors and their subsequent divergence from East Asian lowlanders, followed by isolation and a Holocene expansion involving local hunter-gatherers and millet-farming communities. We identified two key TB-related founding lineages, D-Z31591 and O-CTS4658, which underwent significant expansions around 5,000 years ago on the Qinghai-Xizang Plateau and its eastern Tibetan-Yi Corridor. The genetic legacy of these TB lineages highlights crucial migration pathways linking the Plateau and lowland southwestern China. Our findings align paternal genetic structures with East Asian geography and linguistic groups, underscoring the utility of Y chromosome analyses in unraveling complex paternal histories.
Subject areas: Evolutionary biology, Paleobiology, Anthropology
Highlights
• We reported an integrative YanHuang cohort Y chromosome dataset from 9,901 individuals
• Paleolithic and Holocene events shaped Qinghai-Xizang Plateau paternal lineages
• Two Tibeto-Burman-related founding lineages had star-like expansions ∼5,000 years ago
• We revealed migration corridors and shared paternal origins among Sino-Tibetan-speaking populations
Introduction
The non-recombining region of the Y chromosome (NRY), which is uniquely inherited along male lines, offers significant potential for applications in forensic science and molecular anthropology. Analyses of the genetic structure and genomic diversity of ethnolinguistically different human populations, informed by databases such as the 1000 Genomes Project, 10K Chinese People Genomic Diversity Project, and the Human Genome Diversity Project, revealed that populations with diverse ethnolinguistic backgrounds possess distinct genetic architectures influencing human traits and diseases.1,2,3 Initiatives such as the All of Us Research Program enhance the understanding of human genetic diversity by focusing on previously underrepresented populations, thus reducing European bias in genetic research.4,5,6 Despite abundant genomic resources for mitochondrial DNA and autosomes, comprehensive resources for the Y chromosome remain scarce.4,7,8,9,10 The complexity and high repetitiveness of the Y chromosome sequence have historically hindered detailed studies of its structural variations and the biological implications of its variants. However, recent advancements in capture sequencing and long-read sequencing technologies have facilitated more precise Y chromosome assembly, which is critical for various applications.11,12 The completion of a telomere-to-telomere (T2T) assembly of the Y chromosome, coupled with population genetic analyses of 43 diverse human Y chromosomes, underscores the complexity and variability of Y chromosome sequences and their population-specific variations.11,12 These developments have significantly advanced the ability to use high-confidence NRY regions and measurable Y chromosome segments in forensic investigations, population genetic studies, and molecular anthropology, promising substantial impacts on multiple disciplines.
China’s vast genetic, cultural, and ethnic diversity reflects a history shaped by complex movements and admixture events involving ancient Yellow River millet farmers, Yangtze River rice cultivators, diverse Paleolithic hunter-gatherers, and Western Eurasian pastoralists.13 This intricate genetic history underpins the spatiotemporal diversity observed in ancient and modern East Asian populations.8,13 The origins and dispersal of the Sino-Tibetan (ST) language family, which is predominant in eastern Eurasia and comprises the Tibeto-Burman (TB) and Sinitic languages, remain debated. Competing hypotheses place the origin of the ST languages in North China, in the Tibetan-Yi Corridor (TYC) in western Sichuan, or in northeastern India on the southern Qinghai-Xizang Plateau.14 Analyses of ancient DNA from the Yellow River Basin revealed connections between Neolithic millet farmers and early highland East Asians, including populations in the Qinghai-Xizang Plateau and Nepal.15,16,17 Mitochondrial DNA and Y chromosomal data have highlighted the Paleolithic origins of the region’s initial settlers and their links to broader East Asian maternal and paternal lineages.15,18,19,20 Population genetic studies suggest that the genetic composition of modern Tibetans was shaped by both Paleolithic colonization and Neolithic expansion events.21,22 This is further corroborated by recent ancient DNA studies identifying a Holocene link between millet farmers and ancient Qinghai-Xizang Plateau populations, as well as a deep genetic connection between Tibetans and early Asians.16 Due to their varying natural environments and interactions with culturally diverse groups, geographically distinct TB-speaking populations show differentiated population structures.16,23 While core Tibetan populations on the Plateau display unique genetic profiles, those in the surrounding lowlands have been influenced by gene flow from neighboring Indian, Central Asian, and other East Asian populations.16,23,24 This complex genetic legacy underscores the need for further exploration into paternal genetic diversity and population evolutionary processes among geographically distinct TB groups, offering profound insights into the demographic processes that have shaped regional human history.
Recent Chinese genomic cohorts, such as STROMICS, the China Kadoorie Biobank, ChinaMAP, the NyuWa Genome Resource, and the Born in Guangzhou Cohort Study, have documented the genomic diversity of the Chinese populations.3,13,25,26,27,28,29 These studies have significantly contributed to filling the gaps in the genomic data of Chinese populations and advancing human health equity.5 Despite these advancements, the genomic resources of the Y chromosome and their potential to elucidate the paternal genetic history of this group have not yet been explored. To address the missing diversity of Y chromosomes in China, we launched the YanHuang cohort, aimed at sequencing over 100K ethnolinguistically diverse Chinese males to delineate the complete genetic landscape of Y chromosome variations and investigate the paternal origins of ancient and modern Chinese populations. Our pilot work reported the paternal genetic background of diverse admixture models within the Han Chinese and ethnic minority groups.13,30 Wang et al. constructed a phylogenetic tree from modern and ancient East Asian populations, revealing multiple founding lineages from ancient farmers, herders, and hunter-gatherers that shaped the paternal gene pool of contemporary East Asians.13 Another study highlighted the diverse contributions to East Asian paternal lineages and introduced the "Weakly-Differentiated Multi-source Admixture model" to decode the complex demographic history of Han Chinese populations using extensive genomic data.30 However, paternal genomic diversity, settlements on the Qinghai-Xizang Plateau, and potential geographical corridors facilitating population exchange between highland and lowland areas remain uncharacterized in the current era of sequencing.
Y chromosome markers are pivotal in reconstructing paternal demographic history, enhancing forensic paternal biogeographic inferences, and refining pedigree searches.7,9,23 Specifically, Y chromosome short tandem repeats (Y-STRs) are frequently utilized in genetic research due to their effectiveness.31,32,33,34 Analyzing numerous Y-STRs enhances haplotype identification resolution within populations, improving the discriminative capacity of genetic analysis. However, the high mutation rates of Y-STRs, ranging from 1.0 × 10⁻⁴ to 1.0 × 10⁻³ per generation, introduce challenges by potentially altering haplotypes within the same lineage, complicating forensic familial searches.35 Conversely, Y chromosome single nucleotide polymorphisms (Y-SNPs), which have lower mutation rates of approximately 1.0 × 10⁻⁸ per generation, provide a stable method for preserving paternal lineage information over extensive periods.36 Here, we report large-scale paternal genomic data intended to refine paternal lineage investigations by distinguishing lineages through shared haplotypes or haplogroups, thereby providing a clearer picture of the genetic structure and forensic characteristics of geographically distinct TB-speaking populations. This comprehensive approach enhances the understanding of genetic diversity and supports forensic applications by providing more accurate lineage information.
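To make the contrast between the two mutation-rate ranges concrete, the following minimal sketch estimates the probability that a haplotype passes unchanged through a number of father-to-son transmissions. It assumes independent loci that all mutate at one constant rate, which is a simplification. The 27-locus panel size, the 10-generation depth, and the function name `prob_unchanged` are illustrative assumptions, not values or methods taken from this study.

```python
def prob_unchanged(rate_per_locus, n_loci, generations):
    """Probability that no locus mutates over the given number of
    father-to-son transmissions, assuming independent loci and one
    constant per-locus, per-generation mutation rate (a simplification)."""
    return (1.0 - rate_per_locus) ** (n_loci * generations)


# Illustrative inputs (assumptions): a 27-locus Y-STR panel, 10 generations.
for rate in (1.0e-4, 1.0e-3):
    p = prob_unchanged(rate, n_loci=27, generations=10)
    print(f"Y-STR rate {rate:.0e}: P(haplotype unchanged) = {p:.3f}")

# A single Y-SNP site at about 1e-8 per generation is effectively stable.
p_snp = prob_unchanged(1.0e-8, n_loci=1, generations=10)
print(f"Y-SNP site: P(unchanged) = {p_snp:.8f}")
```

Under these assumptions, a multi-locus Y-STR haplotype at the upper end of the rate range has a substantial chance of changing within a few generations, whereas a single Y-SNP site is practically invariant over the same span.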
Results
YanHuang cohort genomic resources and episodes of Paleolithic and Neolithic diversification and isolation in Tibeto–Burman people
We present an integrated YanHuang Y chromosome genomic resource encompassing data from 9,901 ethnolinguistically diverse individuals across 38 ethnic groups and 34 provinces (Figures 1A and 1B). Our objective was to identify the founding lineages of TB people and reconstruct their paternal demographic history. The dataset comprises three distinct types of data. First, whole Y chromosome sequences from 994 modern and 303 ancient individuals16,18,37,38,39,40,41 were used to reconstruct the phylogenetic relationships between modern and ancient Chinese populations and to estimate the chronology of divergence, expansion, and migration events in modern and ancient TB populations. Second, we analyzed 4,298 genetic profiles featuring population-specific SNP and STR variations from Chinese populations to explore the genetic relationships and landscape of TB people and other reference Chinese populations.42,43,44 Finally, we examined genetic data from 4,306 individuals with high-density Y-SNPs from 38 ethnic groups in the Chinese Paternal Genomic Diversity Project (CPGDP) to elucidate the origins and dispersal patterns of two TB-related founding lineages.
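As a quick consistency check, the three data types described above can be summed to confirm that they add up to the reported totals. The variable names are illustrative; the counts are those stated in the text.

```python
# Counts as reported in the text above.
wgs_modern, wgs_ancient = 994, 303      # whole Y chromosome sequences
snp_str_profiles = 4298                 # Y-SNP/STR genotype profiles
cpgdp_high_density = 4306               # high-density Y-SNP individuals (CPGDP)

whole_y_sequences = wgs_modern + wgs_ancient
total_individuals = whole_y_sequences + snp_str_profiles + cpgdp_high_density
print(whole_y_sequences, total_individuals)
```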
Figure 1.
Geographical position, phylogeny, and phylogenetic relationships between modern and ancient populations
(A) Geographical distribution of the newly whole-genome sequenced and genotyped Tibeto-Burman (TB)-speaking populations and reference groups.
(B) Detailed map of the Chinese regions encompassing the newly collected TB groups.
(C) Time-calibrated TB-dominant D and O lineage phylogeny showing the main founding lineage highlighted in this work.
(D) Maximum likelihood-based phylogenetic relationships showing a clustering pattern between modern and spatiotemporally different ancient populations.
Y chromosome sequences provide insights into the common patrilineal ancestors of founding lineages. We sequenced genomes from 72 TB-speaking representative samples and integrated them with modern and ancient Eurasian data from the pilot work of the YanHuang cohort,13 creating a comprehensive dataset of 1,297 Y chromosomes. The four B2b1a1b African lineages served as the basal branch in our time-stamped phylogenetic analysis. This analysis revealed a coalescence between the D and O founding lineages 65,339 to 74,810 years ago (ya). Divergence and admixture events, indicated by BEAST analyses, suggested prolonged population bottlenecks followed by recent expansions in these lineages. Specifically, the D1a1a and D1a1b lineages diverged between 43,354 and 51,057 ya after a 19,270-year bottleneck. The D1a1a lineage then split into the D1a1a1b and D1a1a1a1b sublineages after an 11,340-year period of stability, after which it expanded during the Neolithic transition (Figure 1C). We identified two Neolithic TB-related lineages, O2a2b1a1a1a4a-CTS4658 and D1a1a1a1b-Z31591, that expanded in TB people. The Tibetan-dominant D1a1a1a1b lineage expanded between 4,692 and 6,663 ya, likely coinciding with the Proto-Tibetan adoption of millet or barley farming and adaptation to high-altitude environments. Similarly, a lineage associated with Sherpa and other Tibetan populations expanded between 5,403 and 7,040 ya, as observed in the Pumi and other groups.
Further exploration of phylogenetic patterns among modern and ancient populations led to the construction of a unified paternal genealogy (Figure 1D). The early population structure, associated with multiple typical East Asian lineages, revealed that at least two distinct ancestral founding lineages contributed to the genetic pool of the TB-speaking populations. Phylogenetic analyses among modern and ancient East Asians confirmed genetic continuity during the Neolithic period across major genetically differentiated regions: the northern Yellow River Basin, southern Yangtze River Basin, Amur River Basin, and Qinghai-Xizang Plateau. The D1a2 lineages found in the Jomon people represent an early divergence from the Qinghai-Xizang Plateau-related D1a1 lineages. The two primary sublineages of D1a1, D1a1a and D1a1b, diverged during the Upper Paleolithic period. The D1a1a lineage was identified in both modern Tibetan and Yi populations, as well as in ancient individuals from the Qinghai-Xizang Plateau (Figure 2A), including those from archeological sites such as Samdzong, Gebusailu, Qulongsazha, and Sangdalongguo. The D1a1b lineage was observed in the Tibetan and Yi populations, as well as in the Mosuo and Pumi populations and in Iron Age individuals from Nyingchi Kangyu (D1a1b1a). The O2a2b1a1a1a4a lineages were identified in the Yi, Lahu, Pumi, and previously documented Zhuang populations, which clustered with 39 individuals from the Bronze Age to historical periods in highland areas (Figure 2B). Meanwhile, the N1b2 lineage has been observed in modern Yi and Tujia populations, as well as in several ancient individuals from the Qinghai-Xizang Plateau (Sangdalongguo, Laga, Gebusailu, Qulongsazha, and Zongri). Additionally, four ancient individuals from high-altitude regions belong to the O2a2b1a2a1a lineage. Overall, the TB-related O-CTS4658 and D-Z31591 lineages were identified in both modern and ancient TB individuals and clustered with a diverse range of ancient highland populations.
Figure 2.
Geographical position and pathPhynder placement of ancient East Asian samples belonging to two TB-founding lineages into this fully resolved Y chromosome phylogeny
(A) Phylogenetic and clustering patterns of D lineages among modern TB people and ancient eastern Eurasian individuals.
(B) O lineages carried by modern TB people, Tai-Kadai people, and ancient highland Qinghai-Xizang Plateau individuals. The geographical positions of key ancient individuals were labeled in the middle maps. Ancient individuals were denoted via the green background. The base map was officially approved with the number GS(2019)1674 (http://bzdt.ch.mnr.gov.cn/).
Genetic diversity of Tibeto–Burman people inferred from Y-STR haplotypes
To comprehensively explore the genetic patterns of the TB-speaking population and their relationships with other reference groups, we genotyped paternal diversity data of large-scale populations via more cost-effective genotyping methods. We reported 4,298 Y chromosome haplotypes, including 37 Y-STRs and 215 Y-SNPs, with 519 newly genotyped TB individuals. These data were submitted to the YHRD database, revealing significant genetic diversity across ethnolinguistically distinct populations (Figures 3 and 4). Among the TB-speaking individuals, 495 unique Y-STR haplotypes were identified, distributed as follows: 85 Tibetans in Muli (TML), 93 Tibetans in Chengdu (TCD), 104 Yis in Liangshan (YLS), 137 Sherpas in Dingjie (SDJ), and 58 Tibetans in Qinghai (TQH), indicating considerable genetic heterogeneity. The shared haplotype between the two populations highlighted their genetic interconnectedness. The haplotype diversity (HD) ranged from 0.9978 to 1.0000, demonstrating the robust discrimination power of our genetic profiling, especially when using the AGCU Y37 kit (Table S5). This kit outperformed Yfiler Plus and Yfiler in delineating genetic diversity due to its overall higher discrimination capacity (DC) and lower haplotype match probability (HMP) in the studied population (Table S5), supported by the analysis of 234 alleles across 31 single-copy loci (Table S6). We identified a unique 20.3 microvariant allele at DYS627, traced to a 'G' deletion at the 19th repeat unit via Sanger sequencing (Figure S1). The analysis of three multicopy loci (DYS527, DYS385, and DYF387S1) revealed 133 allele combinations, indicating significant genetic diversity, with rapidly mutating Y-STRs (RM Y-STRs) showing greater diversity than slower-mutating loci such as DYS391, DYS437, and DYS645 (Tables S6 and S7). Comprehensive Y-SNP-STR analysis associated all samples carrying microvariant alleles at DYS518 with the Q1a1-F746 haplogroup, enhancing our understanding of genetic structures and refining paternal lineage analysis for forensic and anthropological applications (Table S8).
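The three forensic summary statistics named above (HD, DC, and HMP) can be sketched as follows. The text does not state which formulas were used, so this sketch assumes the definitions commonly applied in forensic Y-STR studies: HMP as the sum of squared haplotype frequencies, HD as Nei's estimator n/(n−1) × (1 − HMP), and DC as the proportion of distinct haplotypes. The function name and the toy three-locus haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (HD, DC, HMP) for a list of hashable haplotypes.

    HMP = sum(p_i^2); HD = n/(n-1) * (1 - HMP) (Nei's estimator);
    DC  = number of distinct haplotypes / n.
    These are assumed definitions, not ones stated in the text.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    hmp = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1.0 - hmp)
    dc = len(counts) / n
    return hd, dc, hmp


# Toy data (hypothetical): each tuple holds repeat counts at three loci.
sample = [(14, 13, 29), (14, 13, 29), (15, 12, 30), (16, 13, 28), (14, 14, 29)]
hd, dc, hmp = haplotype_stats(sample)
print(f"HD={hd:.4f} DC={dc:.4f} HMP={hmp:.4f}")
```

When every sampled haplotype is unique, HD reaches 1 under this estimator, which matches the upper end of the range reported above.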
Figure 3.
Fully-resolved Y chromosome phylogeny and paternal genetic history of TB people
(A) High-resolution phylogenetic tree and haplogroup frequency heatmap for five TB-speaking populations. This figure presents a streamlined phylogenetic tree alongside a heatmap that illustrates the distribution frequencies of various haplogroups across five distinct TB-speaking populations.
(B) Median-joining network topologies derived from Y-SNP-STR haplotypes. This series of networks elucidates the genetic relationships and evolutionary divergence within key paternal lineages among the studied populations, with each panel focusing on a different level of haplogroup resolution. The network for D1a1a-M15 shows its subhaplogroups against different backgrounds, with the diversity within subclades denoted by different circle colors.
(C) Network depicting the structure of the D1a1b-P99 subhaplogroups. Detailed topology of D1a1b-P99 subhaplogroups, highlighting specific lineage relationships.
(D) Network topology for O2a2b1a1a1a4a-CTS4658 sublineages showing branching patterns. Detailed view of the haplogroup distribution within the O2a2b1a1a1a4a-CTS4658 sublineages.
Figure 4.
Geographical distribution and genetic relationships of newly collected and reference populations
(A) Multidimensional scaling analysis based on the Fst genetic distance matrix comparing newly collected TB populations with 28 Chinese reference groups.
(B) Neighbor-joining phylogenetic tree derived from the 27-Y-STR-based Rst genetic distance matrix illustrating the genetic relationships among the populations studied.
(C) Principal component analysis depicting clustering patterns among target TB-speaking populations and 87 global reference populations. The base map was officially approved with the numbers GS(2016)1760 and GS(2023)2761 (http://bzdt.ch.mnr.gov.cn/).
Our findings revealed an increase in shared haplotypes and a decrease in discrimination capacity as the number of genotyping markers decreased. This underscores the need for a tailored Y-STR panel for Chinese populations to reduce the risk of false matches in forensic applications. Genetic diversity (GD) assessments indicated that multicopy loci exhibited the highest diversity, with RM Y-STRs showing significant diversity. However, due to their high mutation rates, RM Y-STRs are less suited for paternal kinship identification, whereas conventional Y-STRs, which mutate more slowly, are preferred for reliable lineage tracing. Our integrated analysis linked all samples with microvariant alleles at DYS518 to the Q1a1-F746 haplogroup. This finding emphasizes the necessity of considering both allele and haplogroup data to clarify genetic lineage and historical migratory events, thereby enhancing the accuracy and reliability of forensic and genealogical investigations.
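The relationship between panel size and discrimination capacity can be illustrated with a small sketch that recomputes DC after truncating each haplotype to its first k loci. The four-locus profiles and the function name are hypothetical, and DC is assumed to be the proportion of distinct haplotypes.

```python
def discrimination_capacity(haplotypes, n_loci):
    """DC when only the first n_loci markers of each haplotype are typed.
    DC is taken here as distinct haplotypes / sample size (an assumption)."""
    truncated = [h[:n_loci] for h in haplotypes]
    return len(set(truncated)) / len(truncated)


# Hypothetical four-locus haplotypes; later loci split otherwise identical ones.
profiles = [
    (14, 13, 29, 22),
    (14, 13, 29, 23),
    (14, 13, 30, 22),
    (15, 13, 29, 22),
    (15, 13, 29, 22),
]
for k in (1, 2, 3, 4):
    print(k, discrimination_capacity(profiles, k))
```

In this toy sample, dropping markers can only merge haplotypes, so DC never increases as the panel shrinks. This is the pattern described above.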
Y chromosome haplogroup distribution among geographically diverse Tibeto–Burman groups
We identified a wide range of Y chromosome haplogroups across geographically distinct TB-speaking populations. Specifically, 44 haplogroups were observed in 254 Tibetan individuals: 25 in TML, 19 in TCD, and 19 in TQH. For YLS, 33 haplogroups were identified, while SDJ exhibited only seven, demonstrating reduced genetic diversity (Figure 3A). Haplogroup diversity (HGD) varied significantly, from a low of 0.6183 in SDJ to a high of 0.9376 in YLS. Prevalent haplogroups among Tibetans included D1∗-M174, which was found in more than half of the individuals across the three Tibetan subpopulations, and O2∗-M122, which was notably more frequent among YLS individuals. Additionally, O1b∗-P31, N∗-M231, and O1a∗-M119 were common in YLS, with N1b2-M1819 being the dominant subhaplogroup of N-M231. In the SDJ, O2∗-M122 dominated (98.14%). Subhaplogroup analysis revealed distinct distribution patterns. For example, in Tibetan populations, subhaplogroups of D1-M174, such as D1a1a1a1b-SK541 and D1a1b1a2∼-PH97/Z34364/Z34365, showed variable frequencies across regions, with D1a1a1a1b-SK541 being the most frequent in YLS. In contrast, the subclade O2a2b1a1∗-M117 of O2-M122 was most prevalent across the Tibetan and Yi populations, highlighting different genetic legacies and geographical differences. This detailed haplogroup profiling underscores the complex genetic landscape of TB-speaking populations and provides crucial insights into their historical migrations and interactions. The distinct haplogroup compositions reflect the unique evolutionary histories and adaptive strategies of these populations at different altitudes, influenced by both their environment and their historical migration patterns.
We extensively investigated the phylogeographic distribution of founding haplogroups among TB speakers. Our findings indicated the prominent presence of the D1a1a haplogroup in Southwest China, particularly among TB-speaking communities (Figure S2A). The D1a1b haplogroup showed a broader regional presence in both Southwest and Northwest China, especially among geographically diverse Tibetan groups (Figure S2B). The O2a1 haplogroup, predominantly found among East Asians (including many Han Chinese individuals), was notably absent in Northwest East Asians (Figure S2C). Conversely, O2a2 exhibited a broad distribution across East and Southeast Asians (Figure S2D). The N1a haplogroup was primarily found in northern European and northern East Asian populations, while N1b appeared predominantly in southwestern East Asia (Figures S2E and S2F). Additional haplogroups, such as C2∗-M217, G∗-M201, and J∗-M304, were present at low frequencies within these populations, underscoring a complex mosaic of paternal lineages in geographically diverse regions of China (Figure 3A). Our data also revealed rich diversity within Tibetan populations, including rare haplogroups such as LT-P326 and L-M20 (Figure 3A), suggesting varied historical interactions and migrations. This analysis provides a comprehensive overview of the genetic structure within TB-speaking populations, highlighting the significant variation and widespread distribution of specific haplogroups and enhancing our understanding of their historical and evolutionary backgrounds.
Fine-scale paternal genetic structure of Tibeto–Burman people
Genetic similarities and differences revealed by Y-STR haplotypes
Population genetic work suggested that STR markers with high mutation rates have a stronger power to illuminate recent dynamics of human genetic history.45 We analyzed Y-STR haplotype data to investigate the paternal genetic structure of TB speakers, revealing significant genetic relationships within and across TB populations (Figures 4A‒4C, S3‒S5, and Tables S9, S10, and S11). The initial findings showed significant genetic proximity within regional subgroups (Figure S3; Table S9). For instance, the TML and Tibetan in Nagqu (TNG) populations exhibited no measurable genetic distance, indicating strong genetic continuity. Similarly, Tibetan subpopulations such as Tibetan in Shigatse (TSG) demonstrated close genetic affiliations with TML, suggesting regional genetic coherence among highland Tibetan communities. Using 29 Y-STR markers, we confirmed that Tibetan populations share more genetic similarities with other highland groups than with lowland East Asian populations (Figure S4; Table S10). The YLS population showed a close genetic affinity with the Qiang in Beichuan (QBC) population, underscoring shared genetic traits across geographically and culturally connected groups. In contrast, the SDJ population was markedly distinct from other TB-speaking groups, aligning more closely with certain lowland Han populations. This highlights the complex mosaic of genetic influences in this region due to historical migrations and interactions. These results underscore the importance of regional and cultural contexts in shaping genetic structures, contributing to a deeper understanding of genetic diversity within East Asian populations.
The analysis of 27 Y-STR haplotypes (Figure S5; Table S11) confirmed close genetic relationships among Tibetan groups, highlighting significant affinities, particularly between the YLS and Sichuan Hui populations. The SDJ population showed closer genetic ties with the Han populations of Henan and Shanxi, with Rst values of 0.2634 and 0.2663, respectively. Multidimensional scaling (MDS) based on 29 Y-STRs revealed a distinct Tibetan-related cluster, underscoring strong genetic links across Tibetan populations in different geographic locations (Figure S6; Table S10). This is notable among Tibetan groups in the TYC and on the Qinghai-Xizang Plateau, and among geographically different Tibetan groups in Northwest China. Interestingly, the SDJ population appeared to be genetically isolated from other East Asian groups, corroborated by high Rst genetic distances greater than 0.21 (Table S10). Additionally, MDS analysis using 29 Y-STRs indicated that YLS shows greater genetic similarity with the linguistically related Yi population in Guizhou than with the geographically close QBC population. Conversely, the 27-Y-STR-based MDS revealed a cluster predominantly associated with Sinitic languages, positioning the SDJ and Hainan Li populations as distinct from the other analyzed groups (Figure S7). In this context, the TML closely aligns with other highland Tibetans, while the TCD and TQH are distinct from typical high-altitude Tibetan populations. The clustering of the YLS and Hui populations from Shaanxi underscores their shared genetic makeup, emphasizing the complex interplay of geography, language, and genetics in shaping the population structure of East Asia.
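The step from a pairwise genetic distance matrix to a low-dimensional plot can be sketched with classical (Torgerson) MDS. The text does not state which MDS variant or software was used, so classical MDS is swapped in here purely for illustration. The population labels and distance values are hypothetical and are not Rst values from this study.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)    # guard small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical pairwise distances among four populations (not study values).
labels = ["PopA", "PopB", "PopC", "PopD"]
dist = [
    [0.00, 0.02, 0.03, 0.25],
    [0.02, 0.00, 0.04, 0.27],
    [0.03, 0.04, 0.00, 0.24],
    [0.25, 0.27, 0.24, 0.00],
]
coords = classical_mds(dist)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

In this toy matrix, the population with uniformly large distances to the others is placed far from the remaining cluster, which is the kind of pattern the text describes for an isolated group.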
Clustering patterns revealed by Y-SNP haplotypes and haplogroups
Y-SNP haplotype analysis revealed clear clustering patterns among Tibetan and other East Asian populations. The TML and TCD populations were closely related to TNG and TSG. Similarly, TQH showed significant genetic links to TSG and TCD. YLS exhibited notable genetic affinity with the Hui population in Xinjiang, reflecting shared paternal lineages and regional genetic influences. The SDJ group shared genetic closeness with the northern Han Chinese, possibly indicating historical migrations or genetic admixture (Tables S12, S13, Figures S8 and S9). MDS analyses of 113 overlapping Y-SNPs suggested that the newly studied Tibetan populations (TML, TCD, and TQH) formed a distinct cluster; YLS was closely related to TSG, while SDJ was distinctly separate from other global populations (Figure S10). Additional MDS analysis using 157 Y-SNPs revealed that TML and TCD grouped with TSG, highlighting strong regional genetic coherence, while YLS aligned more closely with Mongolian reference populations (Figure 4A). Phylogenetic analyses confirmed the genetic proximity of Tibetan populations across different regions and underscored the distinct genetic makeup of the SDJ population (Figures 4B, S11, and S12). This complex genetic landscape illustrates the diverse genetic heritage of TB-speaking populations and underscores the impact of geographic separation and historical migrations on genetic diversity.
PCA patterns based on haplogroup frequencies revealed significant insights into the population structure of TB-speaking groups. We identified distinct clusters associated with geographic and ethnic origins (Figures 4C and S13). An extensive Asian-related gradient stretched from southern Han Chinese to Pathan populations in Afghanistan, while European and American populations aligned along the second principal component (PC2), showing diverse genetic backgrounds (Figure 4C). Within East Asia, a pronounced north‒south genetic gradient encompassed Han-, Hui-, and Mongolian-related clusters. This gradient was particularly marked in a focused analysis of East Asian populations, where Hui/Mongolian and Tibetan-related clusters were distinct, and the Austronesian-speaking Gaoshan population of Taiwan and Han Chinese from Shanxi occupied the extremes (Figure S13). The target Tibetan populations aligned closely with other East Asian groups, reflecting shared regional heritage. Notably, the YLS population showed closer genetic affiliation with the Hui population from Henan, while the SDJ population was isolated from other East Asian groups, indicating complex historical interactions and migrations within these regions.
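A PCA on haplogroup frequencies of the kind described above can be sketched with a singular value decomposition of the column-centred frequency matrix. The preprocessing used in the study is not stated, so plain centring without scaling is an assumption, and the frequency table below is hypothetical.

```python
import numpy as np


def pca_scores(freq, n_components=2):
    """Project rows (populations) of a frequency matrix onto the top PCs."""
    x = np.asarray(freq, dtype=float)
    centred = x - x.mean(axis=0)                 # centre each haplogroup column
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)        # variance share per component
    return scores, explained[:n_components]


# Hypothetical haplogroup frequencies: rows are populations, columns haplogroups.
freq = [
    [0.55, 0.30, 0.10, 0.05],
    [0.50, 0.35, 0.10, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.02, 0.95, 0.02, 0.01],
]
scores, explained = pca_scores(freq)
print(np.round(scores, 3))
print(np.round(explained, 3))
```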
Evolutionary history patterns inferred from the combined analysis of Y-SNP-STR haplotypes
To investigate the distribution patterns and evolutionary trajectories of major haplogroups among TB groups, we utilized MJ network topologies constructed from haplotypes derived from 27 Y-STRs and 157 Y-SNPs (Figures 3B‒3D and S14‒S17). We found that haplogroup D1a1a∗-M15, particularly its subhaplogroup D1a1a1a1b-SK541, was mainly present at low and middle altitudes in the TCD and YLS populations, with a lower prevalence among highland Tibetans (Figure 3B). Another significant subhaplogroup, D1a1b∗-P99, especially D1a1b1a2∼-PH97/Z34364/Z34365, was distributed across various Tibetan populations, indicating broad geographical spread among high-altitude communities (Figure 3C). Our analysis revealed that haplogroup O2∗-M122 was significantly prevalent across TB-speaking populations. Within this group, subhaplogroup O2a1∗-L467 was widespread among Han Chinese individuals, whereas O2a2a∗-M188 was predominant in southern Han Chinese individuals (Figures S14 and S15). Subhaplogroup O2a2b1a1∗-M117 exhibited high frequencies among the newly studied TB speakers and the Sinitic-speaking Hui and Han populations (Figure S16). O2a2b1a2a∗-F444 was notably prevalent among the Han and Hui populations (Figure S17). Particularly striking was the prominence of O2a2b1a1a1a4a-CTS4658 and its subhaplogroups in the SDJ, where it formed a star-like topology in the MJ network, suggesting a recent rapid expansion in this highland population (Figure 3D). This subhaplogroup also showed significant differences between the Tibetan and Yi groups, although it was less common in the Han Chinese population. These findings enrich our understanding of the complex genetic makeup and historical migrations of TB groups across different regions of Asia.
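A star-like topology means that most haplotypes sit within a few mutational steps of one central, high-frequency haplotype. The sketch below is a crude heuristic for that idea and is not the median-joining algorithm itself. It finds the modal haplotype and reports the fraction of samples within a chosen number of single-repeat steps of it. The function name, the step metric (sum of absolute repeat differences), and the toy haplotypes are assumptions made for illustration.

```python
from collections import Counter


def star_likeness(haplotypes, max_steps=1):
    """Return (modal haplotype, fraction of samples within max_steps of it).

    Step distance is the sum of absolute repeat-count differences across
    loci, a simple stepwise-mutation proxy (an assumption of this sketch).
    """
    counts = Counter(haplotypes)
    modal, _ = counts.most_common(1)[0]

    def steps(h):
        return sum(abs(a - b) for a, b in zip(h, modal))

    near = sum(c for h, c in counts.items() if steps(h) <= max_steps)
    return modal, near / len(haplotypes)


# Hypothetical three-locus haplotypes within one haplogroup.
toy = [(14, 13, 29)] * 4 + [(15, 13, 29), (14, 12, 29), (14, 13, 30), (16, 14, 29)]
modal, fraction = star_likeness(toy)
print(modal, fraction)
```

A fraction close to 1 is compatible with a recent expansion from a single founder haplotype, although a formal inference would rely on the network and dating methods used in the study.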
To investigate factors influencing genetic diversity among ethnolinguistically and geographically distinct populations, we conducted an AMOVA using 27 Y-STR and 157 Y-SNP markers across 33 Chinese populations categorized by ethnicity, linguistic affiliation, and altitude. Our analysis revealed that variations among groups and populations derived from 157 Y-SNPs were significantly greater than those from 27 Y-STRs (Table S14). Specifically, among-group variations based on ethnic categorization (15.32% for 157 Y-SNPs and 5.41% for 27 Y-STRs) exceeded those based on linguistic (8.70% for 157 Y-SNPs and 2.34% for 27 Y-STRs) or altitude groupings (6.65% for 157 Y-SNPs and 2.70% for 27 Y-STRs). Within-group variations among populations sharing the same altitude or linguistic family were notably more pronounced than those between ethnically similar groups. Intrapopulation variations accounted for the majority of genetic differences among Chinese populations, exceeding 82% for 157 Y-SNPs and 93% for 27 Y-STRs. These findings underscore the enhanced discriminatory power of 157 Y-SNPs over 27 Y-STRs and their utility in tracing paternal lineages among diverse Chinese groups.
Phylogeographical origin of Tibeto–Burman founding lineages
Finally, we examined the phylogeographical distribution of key mutations within the CPGDP resource. From a cohort of 232,413 individuals, we screened 918 samples from the D-Z31591 lineage and 3,388 from the O-CTS4658 lineage. Among the D-Z31591 sublineages (D-MF122648, D-F2625, D-F3401, D-F19133, and D-MF175572), we observed substantial population expansions. Notably, we collected 13 samples from Xizang, 8 from Qinghai, and 155 from Sichuan (Figure 1B). Analysis of haplogroup frequency and Y-SNP/STR profiles indicated that the highest frequencies occurred on the Qinghai-Xizang Plateau, suggesting that this region was a potential origin and center of post-colonization expansion for these ancient highlanders (Figure 5A). This pattern was further supported by optimized correlation analysis. Within the D-Z31591 lineage, 793 samples were from Han, 33 from Tibetan, and 21 from Yi individuals, the three most represented ethnic groups. For O-related TB founders, equivalent methodologies revealed the highest frequencies predominantly in the Qinghai-Xizang Plateau and Southwest China, as supported by optimized hotspot analysis (Figure 5B). Pearson correlation analysis between geographical coordinates and prefecture-level frequencies indicated no significant correlation with latitude for the D-Z31591 lineage and marginal negative correlations for other parameters (Figure 5D). These results suggest that the formation of these lineages in highland and lowland East Asians was not solely driven by isolation by distance. We also analyzed the correlation between human migration patterns inferred from autosomal and Y chromosome evidence by comparing ADMIXTURE-based ancestral proportions and lineage frequencies. A positive correlation emerged between our identified lineages and Lubrak-related highland East Asian ancestry (Figure 5C), underscoring a significant genetic link.
Figure 5.
Phylogeographical analysis and correlation results of two TB-related founding lineages
(A and B) Haplogroup frequency and optimized hotspot analysis results for O2a2b1a1a1a4a-CTS4658 and D1a1a1a1b-Z31591. The red color in the left panel indicates a higher haplogroup frequency, and the yellow color indicates a low haplogroup frequency in the frequency spectrum. The red color in the right panel denotes the possible original center.
(C) The correlation between the founding lineage frequency and the geographical coordinates and ADMIXTURE-based admixture proportion using Pearson correlation analysis. The blue color indicates a positive correlation, and the red color indicates a negative correlation. ∗ represents 0.01 ≤ p value < 0.05, ∗∗ represents 0.001 ≤ p value < 0.01, and ∗∗∗ represents p value < 0.001.
(D) The correlation between the frequency of two founding lineages and the latitude and longitude coordinates using Pearson correlation analysis. The base map was officially approved with the number GS(2023)2767 (http://bzdt.ch.mnr.gov.cn/).
Discussion
Previous genetic studies on paternal genetic diversity have sought to elucidate the formation of East Asia through preglacial and postglacial migrations via southern and northern routes.46,47,48,49 These studies also examined complex migrations and admixture within and between lowland and highland East Asia using low-density Y-SNP variations22,31,32,50,51 and sex-biased adaptations shaping uniparental gene pools.47 The Qinghai-Xizang Plateau, known for its harsh environmental conditions such as high altitude, low temperatures, severe aridity, and oxygen scarcity, has been home to human settlement since the Paleolithic era.49,52,53 Despite these formidable challenges, modern humans established themselves in the region, with many Paleolithic sites across the plateau dating back to around 20,000 ya.49,54 However, genetic research reveals that present-day Tibetan populations have their origins in Neolithic East Asia, specifically northern China.14,55 Recent gene flow has been strongly indicated by previous studies.49,56 However, fine-scale paternal genetic history from eastern regions, including the northeastern Qinghai-Xizang Plateau and the TYC, remains largely unknown, particularly from large-scale high-density Y-SNP data or whole-genome sequencing data. We reported an integrated YanHuang Y chromosome genomic resource, focusing on the formation of modern highland East Asians through whole Y chromosome sequencing, Y-SNP/STR genotyping of TB-speaking individuals, and high-density Y-SNP data from ethnolinguistically diverse Chinese populations across 34 provinces. Our study identified prevalent paternal lineages within highland TB-speaking populations, highlighting haplogroups D1∗-M174 (including subhaplogroups D1a1a1a1b-SK541, D1a1b1a2∼-PH97/Z34364/Z34365, and D1a1b1a3∼-Z42599/Z42600/Z42601/Z42602) and O2∗-M122, especially O2a2b1a1∗-M117. The TB-speaking SDJ predominantly exhibited haplogroup O2a2b1a1a1a4a∗-CTS4658, indicating a unique paternal lineage. 
Among the lowland TB-speaking YLS, the dominant haplogroup was O2∗-M122 (O2a2b1a1∗-M117 and O2a2b1a2a∗-F444), with significant occurrences of D1∗-M174 (D1a1a1a1b-SK541), N∗-M231 (N1b2-M1819), O1b∗-P31 (O1b1a1∗-PK4), and O1a∗-M119 (O1a1a∗-P203.1), indicating diverse genetic backgrounds. The D1-M174 haplogroup, integral to the East Asian paternal lineage, is particularly frequent among Tibetan and some Japanese populations, illustrating its historical significance and geographical spread.57 Variations within this haplogroup, such as D1a1a∗-M15 and D1a1b∗-P99, underscore their regional importance22 and are often considered Tibetan-specific lineages. The presence of this lineage across different Tibetan groups from Muli to Qinghai suggests a deep-rooted and widespread historical influence. It is widely believed that haplogroup D-M174 represents the remnants of the earliest modern human settlers on the Qinghai-Xizang Plateau, who likely endured through the Last Glacial Maximum.22,49 The migration patterns of D1a1a-M15, derived from D1-M174, highlight its evolution and expansion from western Sichuan northward into Qinghai and across the TYC into the Himalayas, reflecting significant migratory events and adaptations.58 These genetic insights enrich our understanding of the paternal genetic structure among Tibetan-speaking populations and enhance our knowledge of their historical migrations and interactions across diverse ecological and geographical landscapes.
The haplogroup O2∗-M122, predominant among the newly analyzed Tibetan-speaking populations, is widely distributed across East and Southeast Asia.32,42,59,60,61 Studies, including those by Yan et al., indicate that approximately 40% of Han Chinese people trace their paternal lineage to late Neolithic progenitors, particularly from the Oα (O2a2b1a1∗-M117), Oβ (O2a2b1a2a1a∗-F46), and Oγ (O2a1b1a1a1a∗-F11) lineages.60 These lineages significantly shaped the paternal genetic landscape of East Asian populations during the Neolithic period. Previous studies have confirmed that approximately 6,000 ya, farmers from the Yangshao culture in the middle Yellow River basin, carrying the O2a2b1a1a-F5 lineage, migrated to the Qinghai-Xizang Plateau.49,62 Additionally, based on ancient DNA from the Banpo site, it is possible that the Yangshao culture also contributed to the spread of haplogroup O2a1b1a1a1a-F11.63 The subhaplogroup O2a2b1a1a1a4a∗-CTS4658 was notably prevalent among the Sherpa population, with network analyses indicating recent rapid expansion. This high frequency in the SDJ population may be due to the localized population of Sherpas in China, primarily residing in Dingjie County within the Tibet Autonomous Region, highlighting the genetic distinctiveness of the Sherpa and Tibetan communities on the Qinghai-Xizang Plateau. Meanwhile, the phylogeographic analysis confirmed the highest frequency among highland Tibetans and their neighbors. Taken together, our time-labeled phylogeny of the O and D lineages, along with phylogenetic relationships among modern and ancient Chinese populations, confirmed that both Paleolithic and Neolithic genetic legacies contributed to the formation of proto-TB populations.
The haplogroup O1a1a∗-P203.1, predominantly observed in the YLS, is widespread among southern Chinese and Southeast Asian populations and also appears among eastern and northern Han Chinese populations;42,65,66 its upstream haplogroup O1a-M119 was observed in remains from the Liangzhu site.64 Subhaplogroups of O1b∗-P31, notably O1b1a1∗-PK4, frequently found in the TML and YLS, are prevalent across southern Chinese and South Asian populations, Southeast Asian tribal communities, and even among the Japanese population.42,66,67,68 Additionally, ancient DNA sequences confirm that around 3,000 ya, the Wucheng people in Jiangxi Province carried the O1b1a1a-M95 lineage.64 Conversely, the sublineage O1b2∗-M176 is common in Japanese, Korean, and some Manchu populations.57,67 The strategic positioning of the TML and YLS along the TYC, a significant migratory route to the Qinghai-Xizang Plateau, highlights the influence of ancient southern East Asian migrations carrying O1b-related subhaplogroups on the genetic landscape of modern ST-speaking populations,69,70 which explains the relatively high frequency of O1b∗-P31 observed in the TYC populations.
To elucidate the genetic relationships and differences among geographically different TB groups and various East Asian reference populations, we conducted genetic analyses, including genetic distance estimations, MDS, PCA, AMOVA, and phylogenetic relationship construction. These analyses utilized data from haplogroup frequencies, Y-STR/Y-SNP haplotypes, high-density SNP profiles and whole-genome sequences. Notably, the results based on Y-SNP haplotypes and haplogroup frequencies provided a more precise reflection of genetic affinity and differentiation among the ethnolinguistically diverse groups compared to Y-STR haplotypes. This enhanced resolution underscores the value of using diverse genetic markers to capture the complex patterns of genetic affinity and differentiation within and between populations.
Our analysis aimed to enhance the understanding of paternal demographic history among diverse TB-speaking populations. We found that the TML and TCD populations maintained close genetic ties with the Ü-Tsang Tibetans, notably the TSG and TNG groups. In contrast, the TQH population was more genetically aligned with the Kham Tibetan population, particularly the Tibetan_Chamdo population from the eastern Qinghai-Xizang Plateau. The YLS population showed a significant genetic affinity with Hui populations from Sichuan, Shaanxi, and Henan, suggesting considerable gene flow from these regions into the Yi population in the TYC. Conversely, the Sherpa population exhibited distinct genetic traits, supported by unique haplogroup distributions observed in the SDJ, indicating their relative genetic isolation from other groups.
Limitations of the study
This study has certain limitations, such as limited sampling locations and limited high-coverage data on Y chromosome variation. Tibetan populations are widely distributed across the Qinghai-Xizang Plateau, Qinghai, and Sichuan, with smaller populations in Gansu and Yunnan. Expanding sampling to these regions would help provide a more comprehensive understanding of the population history of TB groups. Additionally, much of the data used in this study relies on genotyping and haplogroup frequency information. In the era of whole-genome sequencing, using whole Y chromosome sequences could capture more genetic information and offer deeper insights. Lastly, incorporating large-scale ancient DNA data from the Paleolithic and Neolithic periods in South Asia, surrounding areas of the Qinghai-Xizang Plateau, and the Yellow River and Yangtze River basins would further elucidate the complex and dynamic genetic landscape of Tibeto-Burman populations.
Conclusion
This study utilized three kinds of advanced Y-SNP genotyping technologies to create a valuable genetic resource for forensic genetics and molecular anthropology. Our analysis highlights a strong correlation between specific allelic variations in Y-STRs and well-defined haplogroups, providing a theoretical framework for predicting haplogroups from Y-STR haplotypes. Despite variability within Y-STR haplotypes across similar haplogroups in Chinese populations, we found a consistent association of identical Y-STR haplotypes with specific haplogroups. This confirms a robust relationship between Y-STR haplotypes and haplogroup classifications. This study also revealed a distinct correlation between the complex paternal genetic structures of Chinese populations and their geographical and linguistic contexts. This finding underscores the utility of Y chromosomal markers in forensic pedigree analysis and paternal biogeographical ancestry assessments. Our findings suggest that geographically diverse TB groups exhibit distinct paternal genetic histories yet share close genetic ties with northern lowland East Asians, supporting a shared origin in North China for the ST people. Overall, this work deepens our understanding of genetic diversity and underscores the broader applicability of genetic markers in anthropological and forensic investigations.
Resource availability
Lead contact
Further information and requests for genomic resources should be directed to the lead contact, Guanglin He (guanglinhescu@163.com).
Materials availability
This study did not generate new unique reagents.
Data and code availability
- Data: The Y-STR and Y-SNP haplotype data for 519 TB-speaking individuals have been deposited in the YHRD database (https://yhrd.org/) under accession numbers YA004726 (TCD), YA004729 (TML), YA004613 (TQH), YA004223 (YLS), and YA004730 (SDJ). The supplementary materials contain all additional data used in this study. The data collection and usage adhered to the guidelines stipulated by the People’s Republic of China on the administration of human genetic resources.
- Code: This article does not report the original code.
- All other items: Requests for access to the raw data should be directed to Guanglin He at guanglinhescu@163.com or Mengge Wang at Menggewang2021@163.com.
Acknowledgments
We express our gratitude to all the volunteers who contributed to this study. We acknowledge financial support from the National Natural Science Foundation of China (Grant No. 82202078 to M.W. and Grant No. 82402203 to G.H.), the National Social Science Foundation of China (Major Project Grant No. 23&ZD203 to G.H.), and the Open Research Project of the Ministry of Public Security (Grant No. 2024FGKFKT02 to M.W.). Additional support for G.H. includes the Open Project of the Key Laboratory of Forensic Genetics of the Ministry of Public Security (2022FGKFKT05), the Center for Archaeological Science of Sichuan University (23SASA01), the 1‧3‧5 Project for Disciplines of Excellence at West China Hospital, Sichuan University (ZYJC20002), and the Sichuan Science and Technology Program (2024NSFSC1518).
Author contributions
Chao Liu, Mengge Wang, Huijun Yuan, and Guanglin He conceived and designed the study. Mengge Wang and Guanglin He collected the samples. Mengge Wang and Guanglin He extracted the genomic DNA and performed the genotyping. Yunhui Liu, Lintao Luo, Yuhang Feng, Zhiyong Wang, and Ting Yang performed the population genetic analysis. Mengge Wang and Guanglin He drafted the article. Mengge Wang and Guanglin He revised the article.
Declaration of interests
The authors declare no competing interests.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| Y-STR and Y-SNP haplotype data | This study | YHRD: https://yhrd.org/ (YA004726 for TCD, YA004729 for TML, YA004613 for TQH, YA004223 for YLS, and YA004730 for SDJ). National Genomics Data Center: https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA028381 |
| Software and algorithms | ||
| BWA v0.7.13 | Li and Durbin71 | http://bio-bwa.sourceforge.net; RRID: SCR_010910 |
| Picard v3.0.0 | N/A | http://broadinstitute.github.io/picard; RRID: SCR_006525 |
| GATK v4.2.6.1 | McKenna et al.72 | https://gatk.broadinstitute.org/hc/en-us; RRID: SCR_001876 |
| BCFtools v1.8 | Li73 | https://www.htslib.org; RRID: SCR_005227 |
| VCFtools | Danecek et al.74 | https://vcftools.github.io/index.html; RRID: SCR_001235 |
| GeneMapper ID v.1.5 | N/A | https://www.thermofisher.com/order/catalog/product/4475073; RRID: SCR_014290 |
| Chromas Lite V2.6.6 | N/A | https://technelysium.com.au/wp/chromas/; RRID: SCR_000598 |
| HaploGrouper | Jagadeesan et al.75 | https://gitlab.com/bio_anth_decode/haploGrouper |
| STR Analysis for Forensics (STRAF) | Gouy et al.76 | https://straf-p7bdrhm3xq-ew.a.run.app/; https://github.com/agouy/straf |
| YHRD website | N/A | https://yhrd.org/pages/tools/amova |
| SPSS v.25.0 | N/A | https://www.ibm.com/support/pages/downloading-ibm-spss-statistics-25; RRID: SCR_002865 |
| R v4.3.3 | R Core Team77 | https://cran.r-project.org/bin/windows/base/; RRID: SCR_001905 |
| Arlequin v.3.5 | Excoffier et al.78 | https://cmpg.unibe.ch/software/arlequin35/; RRID: SCR_009051 |
| MEGA v.7.0 | Kumar et al.79 | https://www.megasoftware.net/; RRID: SCR_000667 |
| Surfer v.19 | Relethford80 | https://www.goldensoftware.com/products/surfer/ |
| MVSP v.3.22 | N/A | https://www.kovcomp.co.uk/downl2.html |
| Network 10.1 | N/A | https://www.fluxus-engineering.com/sharenet.htm |
| Network Publisher | N/A | https://www.fluxus-engineering.com/sharenet.htm |
| Y-LineageTracker | Chen et al.81 | https://github.com/Shuhua-Group/Y-LineageTracker |
| ArcMap | N/A | https://www.esri.com/en-us/arcgis/products/arcgis-desktop/overview |
| RAxML v8.0.0 | Stamatakis et al.82 | https://github.com/stamatak/standard-RAxML; RRID: SCR_006086 |
| pathPhynder | Martiniano et al.83 | https://github.com/ruidlpm/pathPhynder |
| BEAST v.1.10.4 | Suchard et al.84 | https://beast.community; RRID: SCR_010228 |
| LogCombiner v1.10.4 | Drummond and Rambaut85 | https://beast.community/logcombiner |
| Tracer v1.7 | Rambaut et al.86 | https://beast.community/tracer; RRID: SCR_019121 |
| TreeAnnotator v1.10.4 | Drummond and Rambaut85 | https://beast.community/treeannotator |
| FigTree v1.4.4 | N/A | http://tree.bio.ed.ac.uk/software/figtree/; RRID: SCR_008515 |
Experimental model and study participant details
This study followed ethical standards set by the Medical Ethics Committees of West China Hospital of Sichuan University (Approval No. 2023-306) and the principles of the International Declaration of Helsinki. We collected samples in three batches. First, we obtained peripheral venous blood from 519 unrelated TB-speaking individuals in various communities after they provided informed consent for genotyping STR and SNP profiles. This included 254 Tibetan individuals from multiple locations: 101 from Muli County, Liangshan Yi Autonomous Prefecture; 95 from Chengdu, Sichuan Province; and 58 from Qinghai Province. Additionally, we sampled 104 Yi participants from Liangshan Yi Autonomous Prefecture and 161 Sherpa participants from Dingjie County, Shigatse, Tibet Autonomous Region. We integrated these data with 3,779 previously reported genotypes from ethnolinguistically diverse Chinese populations to characterize general paternal profiles across China (Figures 1A and 1B). Second, we collected 72 representative samples from the D-Z31591 and O-CTS4658 lineages for whole-genome sequencing and merged them with 918 modern samples from the pilot work of the YanHuang cohort and 303 ancient Y chromosome sequences from published ancient autosome-based studies to further elucidate the demographic dynamics of TB people (Tables S15 and S16). Finally, we collected additional samples to explore the evolutionary history of the founding TB lineages. This included 918 samples from 31 provinces covering 275 prefecture-level cities associated with the D-Z31591 lineage and 3,388 samples from 34 provinces covering 373 prefecture-level cities linked to the O-CTS4658 lineage for high-density Y-SNP genotyping. The resource encompassed 37 ethnic groups and over four thousand ST-speaking individuals, including 3,811 Han Chinese individuals, 89 Tibetan individuals, 84 Yi individuals, 39 Hui individuals, 38 Manchu individuals, 27 Bai individuals, and 168 individuals from 31 other minority groups.
All participants provided informed consent, and the study procedures were approved by the Medical Ethics Committee of West China Hospital, Sichuan University (Approval No. 2023-1288). The study was conducted following the Human Genetic Resources Administration of China (HGRAC) guidelines and adhered to the principles of the 2013 revision of the Helsinki Declaration.
Method details
Whole-genome sequencing and ancient Y chromosome sequences
The whole genomes of representative samples were sequenced using the DNBSEQ-T7 platform (MGI, Shenzhen, China) following an in-house protocol.3 We used BWA v0.7.13 71 to map the raw sequencing reads to the GRCh37 human reference genome and Picard v3.0.0 to remove duplicate reads. Base quality score recalibration was performed using GATK v4.2.6.1. Y chromosome BAM files were extracted and combined with reference targeting sequencing 20 Mb Y chromosome BAMs.72 The GATK HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs modules were used for the joint calling of genome-wide variants.72 We focused on high-quality Y chromosome regions, specifically the 10 Mb region used in Poznik’s population evolution modeling.87 Quality control was performed using BCFtools v1.8, filtering variants with missing call rates greater than 5%, base quality less than 20, and heterogeneity rates greater than 15%.73 Variants with missing call rates exceeding 5% were removed using VCFtools.74 The raw sequencing reads of ancient Tibetans were downloaded from the Genome Sequence Archive of the National Genomics Data Center (https://ngdc.cncb.ac.cn/gsa-human/) and aligned following standard ancient DNA research protocols.41 Quality-controlled BAM files were used for integrative analysis between modern and ancient genomic data and haplogroup classification.
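The missing-call-rate filter described above can be illustrated on a toy genotype matrix. This is a minimal Python sketch under stated assumptions, not the BCFtools or VCFtools commands used in the study: the function names, the `None` encoding for a missing call, and the example sites are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumed encoding: 0 = ancestral, 1 = derived, None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with no call at one variant site."""
    return sum(call is None for call in calls) / len(calls)


def filter_by_missing_rate(
    variants: Dict[str, List[Genotype]], max_missing: float = 0.05
) -> Dict[str, List[Genotype]]:
    """Keep variants whose missing call rate does not exceed max_missing."""
    return {
        name: calls
        for name, calls in variants.items()
        if missing_rate(calls) <= max_missing
    }


# Toy data: 20 samples per site (site names are hypothetical).
toy = {
    "site_A": [1] * 20,               # 0% missing, kept
    "site_B": [0] * 19 + [None],      # 5% missing, kept (not greater than 5%)
    "site_C": [1] * 17 + [None] * 3,  # 15% missing, removed
}
kept = filter_by_missing_rate(toy)
print(sorted(kept))
```

The threshold is applied as "greater than 5% is removed", matching the wording of the text; a site at exactly 5% is retained.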
Y-STR haplotype profiling
As a quality control measure, we used male DNA standard 9948 (Promega Corporation, USA) as a positive control throughout the study. For Y-STR haplotype profiling, we employed the AGCU Y37 Kit for multiplex amplification of 37 Y-STR loci.88 Ultrapure water served as the negative control. Each reaction mixture included 2 μL of reaction mixture, 1 μL of Y37 primers, 0.2 μL of DNA polymerase, and 1 μL of DNA template at 2 ng/μL, with the final volume adjusted to 5 μL using 0.8 μL of deionized water (ddH2O). Thermal cycling was conducted on a ProFlex 96-well PCR system (Thermo Fisher Scientific) under the following conditions: initial denaturation at 95°C for 2 min, 30 cycles of denaturation at 94°C for 30 s, annealing at 60°C for 1 min, extension at 72°C for 1 min, a final extension at 60°C for 20 min, and holding at 4°C. We analyzed the amplified products using an ABI 3500XL Genetic Analyzer. The electrophoresis setup included 9.8 μL of deionized formamide, 0.2 μL of AGCU Marker SIZ-500 internal standard, and 1 μL of either the amplified product or the Y37 allelic ladder standard. The electrophoresis parameters were an injection time of 10 s at 1.2 kV, followed by a 3-min prerun and a 22-min electrophoresis at 15 kV. Data interpretation was performed using GeneMapper ID v.1.5 software.
Sanger sequencing and detection of microvariant alleles
To identify microvariant alleles not cataloged in the standard Bin file, we used Sanger sequencing for validation. We used the DYS448 amplification primers from Hohoff et al.,89 the DYS570 and DYS627 primers from Ballantyne et al.,35 and the DYS527 primers from the NIST website (Table S1). The PCR amplification mixture included 10 μL of QIAGEN Multiplex PCR Master Mix (2×), 1 μL each of forward and reverse primers (10 μM), 2 μL of DNA template (2 ng/μL), and 6 μL of ddH2O. The PCR conditions, which varied by primer-specific annealing temperature, are detailed in Table S2. After amplification, we verified the specificity of the PCR products via polyacrylamide gel electrophoresis. We then sequenced the PCR products using the Sanger method to genotype the alleles accurately. Sequencing analyses were performed using Chromas Lite V2.6.6 software (Technelysium Pty Ltd., Australia), ensuring precise allele identification.
SNaPshot-based Y-SNP genotyping, microarray genotyping, and haplogroup classification
We genotyped 215 Y-SNP loci using SNaPshot panels (Figure 2A) following protocols described by Wang et al.90 Y-SNP profiles were analyzed with GeneMapper ID v.1.5 software. High-density Y-SNPs from 918 D-Z31591 and 3,388 O-CTS4658 samples were genotyped using the Thermo Fisher Scientific Illumina 23MF_v1 array, which includes 769,530 SNPs, 27,280 of which are phylogenetically informative Y chromosome SNPs. We manually classified haplogroups for the 215 Y-SNP-based genotypes and used HaploGrouper75 for haplogroup inference on high-density Y-SNP data and whole Y chromosome sequences, adhering to the Y-DNA Haplogroup Tree 2019–2020 standards.
Data analysis
Y-STR data analysis
For our Y-STR data analysis, the allele frequencies and genetic diversity of each Y-STR locus were calculated using the STR Analysis for Forensics (STRAF) software.76 To ensure data clarity, three multicopy loci (DYS527, DYS385a/b, and DYF387S1) were excluded from the analysis. Furthermore, the allele count for DYS389II was adjusted by subtracting DYS389I to derive DYS389b. Allele frequency was computed using the direct counting method for multicopy loci, copy number variations, and null alleles. The frequency of each Y-STR haplotype was calculated using the formula $p = x/N$, where x represents the number of times the haplotype was observed and N is the total sample size. HD, GD, HMP, and DC were derived using the following formulas: $\mathrm{HD/GD} = \frac{N}{N-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$, $\mathrm{HMP} = \sum_{i=1}^{k} p_i^2$, and $\mathrm{DC} = k/N$. Here, $p_i$ is the frequency of the i-th haplotype, k is the number of haplotypes, and N is the sample size of each studied population.
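The haplotype-level statistics can be sketched in a few lines of Python. This is an illustration on toy data, not the STRAF implementation; it assumes the standard forensic definitions HD = N/(N−1)·(1 − Σpᵢ²), HMP = Σpᵢ², and DC = k/N, and the function names and example haplotypes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]  # one repeat count per Y-STR locus


def dys389b(dys389i: int, dys389ii: int) -> int:
    """Derive DYS389b by subtracting DYS389I from DYS389II."""
    return dys389ii - dys389i


def forensic_parameters(haplotypes: List[Haplotype]) -> Dict[str, float]:
    """Compute HD, HMP, and DC by direct counting (assumed standard formulas)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    k = len(counts)  # number of distinct haplotypes
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return {
        "HD": n / (n - 1) * (1 - sum_p2),  # haplotype diversity
        "HMP": sum_p2,                     # haplotype match probability
        "DC": k / n,                       # discrimination capacity
    }


# Toy population: five males typed at three hypothetical loci.
toy = [(13, 16, 24), (13, 16, 24), (14, 16, 23), (13, 17, 24), (15, 16, 25)]
params = forensic_parameters(toy)
print(params)
print(dys389b(13, 29))
```

With five samples and four distinct haplotypes, the squared frequencies sum to 0.28, so HD is about 0.9 and DC is 0.8.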
To evaluate genetic distances among geographically diverse populations, we analyzed 29 Y-STRs cataloged in the Y Chromosome Haplotype Reference Database (YHRD), including subsets of 27 Y-STRs from the Yfiler Plus kit and 17 Y-STRs from the Yfiler kit. We estimated genetic distances (Rst) using the AMOVA&MDS tool on the YHRD website. The resulting Rst genetic distance matrix was subjected to multidimensional scaling (MDS) analysis in SPSS v.25.0 and visualized using R software77 to enhance the interpretability of genetic relationships. Detailed descriptions of the reference populations used in this analysis are provided in Tables S3 and S4.42,43,44,91 We performed an analysis of molecular variance (AMOVA) with Arlequin v.3.5 to assess molecular variance within and between these populations.78 Additionally, we constructed a neighbor-joining (NJ) phylogenetic tree based on the Rst matrix using MEGA v.7.0 79 to further elucidate phylogenetic relationships.
Y-SNP data analysis
We calculated haplogroup frequencies within the studied populations using the direct counting method. HGD was determined by the formula $\mathrm{HGD} = \frac{N}{N-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$, where $p_i$ represents the frequency of the i-th haplogroup, k is the total number of observed haplogroups, and N is the sample size. To visualize the distribution of major haplogroups among the TB populations, we generated contour maps using Surfer v.19.80 Genetic distances (Fst) between geographically distinct populations were calculated based on either 113 Y-SNPs common among worldwide populations or 157 Y-SNPs common among Chinese populations using Arlequin v.3.5. We conducted MDS analysis of the Fst genetic distance matrix with SPSS v.25.0 to determine spatial genetic relationships. Additionally, we performed principal component analysis (PCA) based on haplogroup frequencies using MVSP v.3.22. Phylogenetic relationships were further delineated through an NJ phylogenetic tree constructed with MEGA v.7.0. AMOVA based on Y-SNP haplotypes was conducted using Arlequin v.3.5 to determine variance components attributed to different levels of population grouping.
Integrative analysis of Y-SNPs and Y-STRs
To elucidate genetic relationships among populations, we constructed a median-joining (MJ) network using Network 10.1 and Network Publisher software, integrating Y-SNP-STR haplotypes. To enhance analysis accuracy, we excluded DYS385a/b due to its multicopy nature and treated DYS389 as two separate loci: DYS389I and DYS389b (calculated as DYS389II−DYS389I). For DYF387S1, we considered only the DYF387S1b allele. In this network analysis, we assigned Y-SNPs a high weight of 99 due to their lower mutation rates, providing stability to the network structure. Conversely, Y-STRs, with greater variability, were assigned weights ranging from 1 to 5, inversely proportional to their mutation rates. This weighting system balanced the contributions of SNPs and STRs, offering a detailed and nuanced view of the genetic landscape and historical population dynamics.92
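The text states only that Y-STR weights range from 1 to 5 and are inversely proportional to mutation rate; the exact mapping is not given. The Python sketch below shows one possible mapping, assuming the slowest locus receives the top weight and results are rounded and clamped. The locus names and mutation rates are hypothetical placeholders, not published values.

```python
from typing import Dict

SNP_WEIGHT = 99  # weight given to every Y-SNP, as stated in the text


def str_weights(
    mutation_rates: Dict[str, float], w_min: int = 1, w_max: int = 5
) -> Dict[str, int]:
    """Map per-locus mutation rates to integer weights in [w_min, w_max].

    Assumed rule: weight = w_max * (slowest rate / locus rate),
    rounded to the nearest integer and clamped to the allowed range.
    """
    slowest = min(mutation_rates.values())
    weights = {}
    for locus, rate in mutation_rates.items():
        raw = w_max * slowest / rate
        weights[locus] = max(w_min, min(w_max, int(raw + 0.5)))
    return weights


# Hypothetical mutation rates per generation (illustrative only).
rates = {"locus_slow": 0.001, "locus_mid": 0.0025, "locus_fast": 0.01}
weights = str_weights(rates)
print(weights)
```

Under this rule the slowest-mutating locus gets weight 5 and the fastest gets weight 1, so rapidly mutating STRs contribute least to the network distance while SNPs, at weight 99, dominate its backbone.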
Spatial correlation analysis
We conducted spatial correlation analysis using R software and Y-LineageTracker,81 applying parameters such as –level and –freq to estimate haplogroup frequencies at both the provincial and prefectural levels. We examined the geographical distribution and potential phylogeographic origins of founding lineages through spatial autocorrelation analysis performed in ArcMap.
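Region-level lineage frequencies obtained by direct counting can be sketched as follows. This is a toy Python illustration and does not reproduce the Y-LineageTracker interface; the function name, region labels, and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def lineage_frequency(
    samples: Iterable[Tuple[str, str]], lineage: str
) -> Dict[str, float]:
    """Frequency of one lineage per region, from (region, haplogroup) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for region, haplogroup in samples:
        totals[region] += 1
        if haplogroup == lineage:
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


# Toy records: (prefecture label, assigned lineage); labels are hypothetical.
records = [
    ("prefecture_1", "D-Z31591"),
    ("prefecture_1", "O-CTS4658"),
    ("prefecture_1", "O-CTS4658"),
    ("prefecture_1", "other"),
    ("prefecture_2", "D-Z31591"),
    ("prefecture_2", "other"),
    ("prefecture_3", "other"),
]
freq_d = lineage_frequency(records, "D-Z31591")
print(freq_d)
```

The same function applied with province labels instead of prefecture labels yields the coarser provincial frequencies.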
Whole Y chromosome sequence-based demographical history reconstruction
We obtained high-quality variant calls within the high-confidence 10 Mb targeted Y chromosome regions using the sequence masks and filters described above. The final dataset of 994 samples was used to construct a maximum-likelihood tree via RAxML v8.0.0,82 with 200 rapid bootstrap inferences and a maximum-likelihood search. We then integrated 303 ancient Y chromosome sequences into the reconstructed reference phylogeny for combined analysis using pathPhynder.83 Coalescent times for each node were estimated using Bayesian Markov Chain Monte Carlo (MCMC) methods with BEAST v.1.10.4 software.84 To preserve the phylogenetic topology, we included additional samples from other haplogroups, specifically four samples from haplogroup B, to root the tree.93 We conducted four parallel runs, each with a different seed number, and merged them using LogCombiner v1.10.4.85 Each run had a chain length of 60 million steps, with parameters logged every 3,000 steps. The results were manually inspected using Tracer v1.7 software86 and the initial 25% was discarded as burn-in using TreeAnnotator v1.10.4.85 Consistent parameters were maintained across all runs, including the GTR substitution model with the Gamma and Invariant sites heterogeneity model, a strict clock with a uniform prior on the mutation rate ($7.4 \times 10^{-10}$; 95% CI: $6.7 \times 10^{-10}$ to $8.6 \times 10^{-10}$ mutations/nucleotide/year), and the Bayesian Skyline model with a group size of 10. The NO-M214 node served as the calibration point for estimating coalescence age, with an age of 41,900 years (95% CI: 40,175–43,591).94 The maximum clade credibility tree was then visualized using FigTree.
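The run settings above imply a specific number of posterior samples and a rough mutation accumulation rate, which the following Python sketch works out. It is plain arithmetic on the stated values, not BEAST configuration; it assumes the initial state is not counted as a logged sample and that the callable region is exactly 10 Mb, and the variable names are illustrative.

```python
# Values taken from the text.
CHAIN_LENGTH = 60_000_000    # MCMC steps per run
LOG_EVERY = 3_000            # steps between logged samples
BURN_IN_FRACTION = 0.25      # initial fraction discarded
N_RUNS = 4                   # independent runs merged
MUTATION_RATE = 7.4e-10      # mutations per nucleotide per year
REGION_LENGTH = 10_000_000   # assumed 10 Mb of callable sequence

samples_per_run = CHAIN_LENGTH // LOG_EVERY                # 20,000
burn_in_samples = int(samples_per_run * BURN_IN_FRACTION)  # 5,000
retained_per_run = samples_per_run - burn_in_samples       # 15,000
combined_samples = retained_per_run * N_RUNS               # 60,000

# Expected mutations per lineage per year across the whole region.
mutations_per_year = MUTATION_RATE * REGION_LENGTH
years_per_mutation = 1 / mutations_per_year

print(samples_per_run, burn_in_samples, retained_per_run, combined_samples)
print(round(years_per_mutation, 1))
```

Under these assumptions a single lineage accumulates roughly one mutation every 135 years in the 10 Mb region, which gives a sense of the time resolution available for dating recent expansions.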
Quantification and statistical analysis
We conducted a Pearson correlation analysis using R software between the founding lineage frequency and the geographical coordinates and ADMIXTURE-based admixture proportion in Figure 5C, where ∗ represents 0.01 ≤ p value < 0.05, ∗∗ represents 0.001 ≤ p value < 0.01, and ∗∗∗ represents p value < 0.001. We also conducted a Pearson correlation analysis between the frequency of the two founding lineages and the latitude and longitude coordinates in Figure 5D, where the R value is considered significant if the p value is less than 0.05.
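The correlation coefficient and the star annotation can be sketched in Python as follows. The study used R; this stdlib-only version is an illustration that assumes the conventional thresholds (one star for p < 0.05, two for p < 0.01, three for p < 0.001). It computes r only and takes the p value as an input rather than deriving it; the function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def significance_stars(p_value: float) -> str:
    """Map a p value to the star notation (assumed conventional thresholds)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Toy data: lineage frequency rising with latitude (hypothetical values).
latitude = [1.0, 2.0, 3.0, 4.0, 5.0]
frequency = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(latitude, frequency)
print(r, significance_stars(0.03), significance_stars(0.0005))
```

A perfectly linear toy relationship gives r = 1; real prefecture-level frequencies would give intermediate values whose p values come from the t distribution with N − 2 degrees of freedom.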
Published: November 22, 2024
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.111456.
Contributor Information
Mengge Wang, Email: menggewang2021@163.com.
Yunhui Liu, Email: gianthuihui@gmail.com.
Huijun Yuan, Email: yuanhj301@wchscu.cn.
Chao Liu, Email: liuchaogzf@163.com.
Guanglin He, Email: guanglinhescu@163.com.
References
- 1. Byrska-Bishop M., Evani U.S., Zhao X., Basile A.O., Abel H.J., Regier A.A., Corvelo A., Clarke W.E., Musunuri R., Nagulapalli K., et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022;185:3426–3440. doi: 10.1016/j.cell.2022.08.004.
- 2. Bergström A., McCarthy S.A., Hui R., Almarri M.A., Ayub Q., Danecek P., Chen Y., Felkel S., Hallast P., Kamm J., et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020;367:1339. doi: 10.1126/science.aay5012.
- 3. Wang M., Yao H., Sun Q., Duan S., Tang R., Chen J., Wang Z., Sun Y., Li X., Wang S., et al. Pilot work of the 10K Chinese People Genomic Diversity Project along the Silk Road suggests a complex east‒west admixture landscape and biological adaptations. Sci. China Life Sci. 2024 doi: 10.1101/2023.02.26.530053.
- 4. Bick A.G., Metcalf G.A., Mayo K.R., Lichtenstein L., Rura S., Carroll R.J., Musick A., Linder J.E., Jordan I.K., Nagar S.D., et al. Genomic data in the All of Us Research Program. Nature. 2024;627:340–346. doi: 10.1038/s41586-023-06957-x.
- 5. Wang M., Chen H., Luo L., Huang Y., Duan S., Yuan H., Tang R., Liu C., He G. Forensic investigative genetic genealogy: expanding pedigree tracing and genetic inquiry in the genomic era. J Genet Genomics. 2024 doi: 10.1016/j.jgg.2024.06.016.
- 6. He G., Wang M., Luo L., Sun Q., Yuan H., Lv H., Feng Y., Liu X., Cheng J., Bu F., et al. Population genomics of Central Asian peoples unveil ancient Trans-Eurasian genetic admixture and cultural exchanges. hLife. 2024;2:554–562. doi: 10.1016/j.hlife.2024.06.006.
- 7. Sun Q., Wang M., Lu T., Duan S., Liu Y., Chen J., Wang Z., Sun Y., Li X., Wang S., et al. Differentiated adaptative genetic architecture and language-related demographical history in South China inferred from 619 genomes from 56 populations. BMC Biol. 2024;22:55. doi: 10.1186/s12915-024-01854-9.
- 8. Li X., Wang M., Su H., Duan S., Sun Y., Chen H., Wang Z., Sun Q., Yang Q., Chen J., et al. Evolutionary history and biological adaptation of Han Chinese people on the Mongolian Plateau. hLife. 2024;2:296–313. doi: 10.1016/j.hlife.2024.04.005.
- 9. He G., Wang P., Chen J., Liu Y., Sun Y., Hu R., Duan S., Sun Q., Tang R., Yang J., et al. Differentiated genomic footprints suggest isolation and long-distance migration of Hmong-Mien populations. BMC Biol. 2024;22:18. doi: 10.1186/s12915-024-01828-x.
- 10. Luo L., Wang M., Liu Y., Li J., Bu F., Yuan H., Tang R., Liu C., He G. Sequencing and characterizing human mitochondrial genomes in the biobank-based genomic research paradigm. Sci. China Life Sci. 2024 doi: 10.1007/s11427-024-2736-7.
- 11. Rhie A., Nurk S., Cechova M., Hoyt S.J., Taylor D.J., Altemose N., Hook P.W., Koren S., Rautiainen M., Alexandrov I.A., et al. The complete sequence of a human Y chromosome. Nature. 2023;621:344–354. doi: 10.1038/s41586-023-06457-y.
- 12. Hallast P., Ebert P., Loftus M., Yilmaz F., Audano P.A., Logsdon G.A., Bonder M.J., Zhou W., Höps W., Kim K., et al. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature. 2023;621:355–364. doi: 10.1038/s41586-023-06425-6.
- 13. Wang M., Huang Y., Liu K., Wang Z., Zhang M., Yuan H., Duan S., Wei L., Yao H., Sun Q., et al. Multiple Human Population Movements and Cultural Dispersal Events Shaped the Landscape of Chinese Paternal Heritage. Mol. Biol. Evol. 2024;41 doi: 10.1093/molbev/msae122.
- 14. Zhang M., Yan S., Pan W., Jin L. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature. 2019;569:112–115. doi: 10.1038/s41586-019-1153-z.
- 15. Ning C., Li T., Wang K., Zhang F., Li T., Wu X., Gao S., Zhang Q., Zhang H., Hudson M.J., et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat. Commun. 2020;11:2700. doi: 10.1038/s41467-020-16557-2.
- 16. Wang H., Yang M.A., Wangdue S., Lu H., Chen H., Li L., Dong G., Tsring T., Yuan H., He W., et al. Human genetic history on the Tibetan Plateau in the past 5100 years. Sci. Adv. 2023;9 doi: 10.1126/sciadv.add5582.
- 17. Sun Y., Wang M., Sun Q., Liu Y., Duan S., Wang Z., Zhou Y., Zhong J., Huang Y., Huang X., et al. Distinguished biological adaptation architecture aggravated population differentiation of Tibeto-Burman-speaking people. J Genet Genomics. 2024;51:517–530. doi: 10.1016/j.jgg.2023.10.002.
- 18. Wang C.C., Yeh H.Y., Popov A.N., Zhang H.Q., Matsumura H., Sirak K., Cheronet O., Kovalev A., Rohland N., Kim A.M., et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021;591:413–419. doi: 10.1038/s41586-021-03336-2.
- 19. Zhao M., Kong Q.P., Wang H.W., Peng M.S., Xie X.D., Wang W.Z., Zhao S.N., Jiayang, Duan J.G., Duan J.G., Cai M.C. Mitochondrial genome evidence reveals successful Late Paleolithic settlement on the Tibetan Plateau. Proc. Natl. Acad. Sci. USA. 2009;106:21230–21235. doi: 10.1073/pnas.0907844106.
- 20. Jeong C., Ozga A.T., Witonsky D.B., Malmström H., Edlund H., Hofman C.A., Hagan R.W., Jakobsson M., Lewis C.M., Aldenderfer M.S., et al. Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc. Proc. Natl. Acad. Sci. USA. 2016;113:7485–7490. doi: 10.1073/pnas.1520844113.
- 21. Lu D., Lou H., Yuan K., Wang X., Wang Y., Zhang C., Lu Y., Yang X., Deng L., Zhou Y., et al. Ancestral Origins and Genetic History of Tibetan Highlanders. Am. J. Hum. Genet. 2016;99:580–594. doi: 10.1016/j.ajhg.2016.07.002.
- 22. Qi X., Cui C., Peng Y., Zhang X., Yang Z., Zhong H., Zhang H., Xiang K., Cao X., Wang Y., et al. Genetic Evidence of Paleolithic Colonization and Neolithic Expansion of Modern Humans on the Tibetan Plateau. Mol. Biol. Evol. 2013;30:1761–1778. doi: 10.1093/molbev/mst093.
- 23. Sun Y., Wang M., Sun Q., Liu Y., Duan S., Wang Z., Zhou Y., Zhong J., Huang Y., Huang X., et al. Distinguished biological adaptation architecture aggravated population differentiation of Tibeto-Burman-speaking people. J Genet Genomics. 2024;51:517–530. doi: 10.1016/j.jgg.2023.10.002.
- 24. Yang X.Y., Rakha A., Chen W., Hou J., Qi X.B., Shen Q.K., Dai S.S., Sulaiman X., Abdulloevich N.T., Afanasevna M.E., et al. Tracing the Genetic Legacy of the Tibetan Empire in the Balti. Mol. Biol. Evol. 2021;38:1529–1536. doi: 10.1093/molbev/msaa313.
- 25. Cheng S., Xu Z., Bian S., Chen X., Shi Y., Li Y., Duan Y., Liu Y., Lin J., Jiang Y., et al. The STROMICS genome study: deep whole-genome sequencing and analysis of 10K Chinese patients with ischemic stroke reveal complex genetic and phenotypic interplay. Cell Discov. 2023;9:75. doi: 10.1038/s41421-023-00582-8.
- 26. Walters R.G., Millwood I.Y., Lin K., Schmidt Valle D., McDonnell P., Hacker A., Avery D., Edris A., Fry H., Cai N., et al. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genom. 2023;3 doi: 10.1016/j.xgen.2023.100361.
- 27. Zhang P., Luo H., Li Y., Wang Y., Wang J., Zheng Y., Niu Y., Shi Y., Zhou H., Song T., et al. NyuWa Genome resource: A deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep. 2021;37 doi: 10.1016/j.celrep.2021.110017.
- 28. Cao Y., Li L., Xu M., Feng Z., Sun X., Lu J., Xu Y., Du P., Wang T., Hu R., et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 2020;30:717–731. doi: 10.1038/s41422-020-0322-9.
- 29. Huang S., Liu S., Huang M., He J.R., Wang C., Wang T., Feng X., Kuang Y., Lu J., Gu Y., et al. The Born in Guangzhou Cohort Study enables generational genetic discoveries. Nature. 2024;626:565–573. doi: 10.1038/s41586-023-06988-4.
- 30. Wang Z., Wang M., Liu K., Yuan H., Duan S., Liu Y., Luo L., Jiang X., Chen S., Wei L., et al. Paternal genomic resources from the YanHuang cohort suggested a Weakly-Differentiated Multi-source Admixture model for the formation of Han’s founding ancestral lineages. Dev. Reprod. Biol. 2023;2023 doi: 10.1101/2023.11.08.566335.
- 31. Shi H., Zhong H., Peng Y., Dong Y.-L., Qi X.-B., Zhang F., Liu L.-F., Tan S.-J., Ma R.Z., Xiao C.-J., et al. Y chromosome evidence of earliest modern human settlement in East Asia and multiple origins of Tibetan and Japanese populations. BMC Biol. 2008;6:45. doi: 10.1186/1741-7007-6-45.
- 32. Zhong H., Shi H., Qi X.B., Duan Z.Y., Tan P.P., Jin L., Su B., Ma R.Z. Extended Y Chromosome Investigation Suggests Postglacial Migrations of Modern Humans into East Asia via the Northern Route. Mol. Biol. Evol. 2011;28:717–727. doi: 10.1093/molbev/msq247.
- 33. Luis J.R., Palencia-Madrid L., Runfeldt G., Garcia-Bertrand R., Herrera R.J. Delineating the dispersal of Y-chromosome sub-haplogroup O2a2b-P164 among Austronesian-speaking populations. Sci. Rep. 2024;14 doi: 10.1038/s41598-024-52293-z.
- 34. Wei L.-H., Yan S., Teo Y.-Y., Huang Y.-Z., Wang L.-X., Yu G., Saw W.-Y., Ong R.T.-H., Lu Y., Zhang C., et al. Phylogeography of Y-chromosome haplogroup O3a2b2-N6 reveals patrilineal traces of Austronesian populations on the eastern coastal regions of Asia. PLoS One. 2017;12 doi: 10.1371/journal.pone.0175080.
- 35. Ballantyne K.N., Goedbloed M., Fang R., Schaap O., Lao O., Wollstein A., Choi Y., van Duijn K., Vermeulen M., Brauer S., et al. Mutability of Y-Chromosomal Microsatellites: Rates, Characteristics, Molecular Bases, and Forensic Implications. Am. J. Hum. Genet. 2010;87:341–353. doi: 10.1016/j.ajhg.2010.08.006.
- 36. Balanovsky O. Toward a consensus on SNP and STR mutation rates on the human Y-chromosome. Hum. Genet. 2017;136:575–590. doi: 10.1007/s00439-017-1805-8.
- 37. Mao X., Zhang H., Qiao S., Liu Y., Chang F., Xie P., Zhang M., Wang T., Li M., Cao P., et al. The deep population history of northern East Asia from the Late Pleistocene to the Holocene. Cell. 2021;184:3256–3266.e13. doi: 10.1016/j.cell.2021.04.040.
- 38. Wang T., Wang W., Xie G., Li Z., Fan X., Yang Q., Wu X., Cao P., Liu Y., Yang R., et al. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell. 2021;184:3829–3841. doi: 10.1016/j.cell.2021.05.018.
- 39. Tao L., Yuan H., Zhu K., Liu X., Guo J., Min R., He H., Cao D., Yang X., Zhou Z., et al. Ancient genomes reveal millet farming-related demic diffusion from the Yellow River into southwest China. Curr. Biol. 2023;33:4995–5002. doi: 10.1016/j.cub.2023.09.055.
- 40. Yang M.A., Fan X., Sun B., Chen C., Lang J., Ko Y.C., Tsang C.H., Chiu H., Wang T., Bao Q., et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science. 2020;369:282–288. doi: 10.1126/science.aba0909.
- 41. Bai F., Liu Y., Wangdue S., Wang T., He W., Xi L., Tsho Y., Tsering T., Cao P., Dai Q., et al. Ancient genomes revealed the complex human interactions of the ancient western Tibetans. Curr. Biol. 2024;34:2594–2605.e7. doi: 10.1016/j.cub.2024.04.068.
- 42. Lang M., Liu H., Song F., Qiao X., Ye Y., Ren H., Li J., Huang J., Xie M., Chen S., et al. Forensic characteristics and genetic analysis of both 27 Y-STRs and 143 Y-SNPs in Eastern Han Chinese population. Forensic Sci. Int. Genet. 2019;42:e13–e20. doi: 10.1016/j.fsigen.2019.07.011.
- 43. Song M., Wang Z., Zhang Y., Zhao C., Lang M., Xie M., Qian X., Wang M., Hou Y. Forensic characteristics and phylogenetic analysis of both Y-STR and Y-SNP in the Li and Han ethnic groups from Hainan Island of China. Forensic Sci. Int. Genet. 2019;39:e14–e20. doi: 10.1016/j.fsigen.2018.11.016.
- 44. Xie M., Song F., Li J., Lang M., Luo H., Wang Z., Wu J., Li C., Tian C., Wang W., et al. Genetic substructure and forensic characteristics of Chinese Hui populations using 157 Y-SNPs and 27 Y-STRs. Forensic Sci. Int. Genet. 2019;41:11–18. doi: 10.1016/j.fsigen.2019.03.022.
- 45. Chu J.Y., Huang W., Kuang S.Q., Wang J.M., Xu J.J., Chu Z.T., Yang Z.Q., Lin K.Q., Li P., Wu M., et al. Genetic relationship of populations in China. Proc. Natl. Acad. Sci. USA. 1998;95:11763–11768. doi: 10.1073/pnas.95.20.11763.
- 46. Peng M.S., He J.D., Fan L., Liu J., Adeola A.C., Wu S.F., Murphy R.W., Yao Y.G., Zhang Y.P. Retrieving Y chromosomal haplogroup trees using GWAS data. Eur. J. Hum. Genet. 2014;22:1046–1050. doi: 10.1038/ejhg.2013.272.
- 47. Peng M.S., Zhang Y.P. Sex-biased adaptation shapes uniparental gene pools in Tibetans. Sci. China Life Sci. 2024;67:611–613. doi: 10.1007/s11427-023-2506-0.
- 48. Wen B., Xie X., Gao S., Li H., Shi H., Song X., Qian T., Xiao C., Jin J., Su B., et al. Analyses of genetic structure of Tibeto-Burman populations reveals sex-biased admixture in southern Tibeto-Burmans. Am. J. Hum. Genet. 2004;74:856–865. doi: 10.1086/386292.
- 49. Yu X., Li H. Origin of ethnic groups, linguistic families, and civilizations in China viewed from the Y chromosome. Mol. Genet. Genom. 2021;296:783–797. doi: 10.1007/s00438-021-01794-x.
- 50. Qi X., Cui C., Ouzhuluobu, Wu T., Su B. Prehistoric Colonization and Demographic History of Modern Humans on the Tibetan Plateau. eLS (Wiley); 2014.
- 51. Shi H., Dong Y.-l., Wen B., Xiao C.-J., Underhill P.A., Shen P.-d., Chakraborty R., Jin L., Su B. Y-Chromosome Evidence of Southern Origin of the East Asian–Specific Haplogroup O3-M122. Am. J. Hum. Genet. 2005;77:408–419. doi: 10.1086/444436.
- 52. Yuan B., Huang W., Zhang D. New evidence for human occupation of the northern Tibetan Plateau, China during the Late Pleistocene. Chin. Sci. Bull. 2007;52:2675–2679. doi: 10.1007/s11434-007-0357-z.
- 53. Zhang D.D., Li S.H. Optical dating of Tibetan human hand- and footprints: An implication for the palaeoenvironment of the last glaciation of the Tibetan Plateau. Geophys. Res. Lett. 2002;29:16. doi: 10.1029/2001GL013749.
- 54. Aldenderfer M., Yinong Z. The Prehistory of the Tibetan Plateau to the Seventh Century A.D.: Perspectives and Research from China and the West Since 1950. J. World PreHistory. 2004;18:1–55. doi: 10.1023/B:JOWO.0000038657.79035.9e.
- 55. Sagart L., Jacques G., Lai Y., Ryder R.J., Thouzeau V., Greenhill S.J., List J.-M. Dated language phylogenies shed light on the ancestry of Sino-Tibetan. Proc. Natl. Acad. Sci. USA. 2019;116:10317–10322. doi: 10.1073/pnas.1817972116.
- 56. Gayden T., Cadenas A.M., Regueiro M., Singh N.B., Zhivotovsky L.A., Underhill P.A., Cavalli-Sforza L.L., Herrera R.J. The Himalayas as a Directional Barrier to Gene Flow. Am. J. Hum. Genet. 2007;80:884–894. doi: 10.1086/516757.
- 57. Hammer M.F., Karafet T.M., Park H., Omoto K., Harihara S., Stoneking M., Horai S. Dual origins of the Japanese: common ground for hunter-gatherer and farmer Y chromosomes. J. Hum. Genet. 2006;51:47–58. doi: 10.1007/s10038-005-0322-0.
- 58. Wang C.-C., Wang L.-X., Shrestha R., Zhang M., Huang X.-Y., Hu K., Jin L., Li H. Genetic Structure of Qiangic Populations Residing in the Western Sichuan Corridor. PLoS One. 2014;9 doi: 10.1371/journal.pone.0103772.
- 59. Poznik G.D., Xue Y., Mendez F.L., Willems T.F., Massaia A., Wilson Sayres M.A., Ayub Q., McCarthy S.A., Narechania A., Kashin S., et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat. Genet. 2016;48:593–599. doi: 10.1038/ng.3559.
- 60. Yan S., Wang C.-C., Zheng H.-X., Wang W., Qin Z.-D., Wei L.-H., Wang Y., Pan X.-D., Fu W.-Q., He Y.-G., et al. Y Chromosomes of 40% Chinese Descend from Three Neolithic Super-Grandfathers. PLoS One. 2014;9 doi: 10.1371/journal.pone.0105691.
- 61. Ning C., Yan S., Hu K., Cui Y.-Q., Jin L. Refined phylogenetic structure of an abundant East Asian Y-chromosomal haplogroup O∗-M134. Eur. J. Hum. Genet. 2016;24:307–309. doi: 10.1038/ejhg.2015.183.
- 62. Wang L.-X., Lu Y., Zhang C., Wei L.-H., Yan S., Huang Y.-Z., Wang C.-C., Mallick S., Wen S.-Q., Jin L., et al. Reconstruction of Y-chromosome phylogeny reveals two neolithic expansions of Tibeto-Burman populations. Mol. Genet. Genom. 2018;293:1293–1300. doi: 10.1007/s00438-018-1461-2.
- 63. Zhang Y., Lei X., Chen H., Zhou H., Huang S. Ancient DNAs and the Neolithic Chinese super-grandfather Y haplotypes. bioRxiv. 2018 doi: 10.1101/487918. Preprint.
- 64. Li H., Huang Y., Mustavich L.F., Zhang F., Tan J.-Z., Wang L.-E., Qian J., Gao M.-H., Jin L. Y chromosomes of prehistoric people along the Yangtze River. Hum. Genet. 2007;122:383–388. doi: 10.1007/s00439-007-0407-2.
- 65. Karafet T.M., Hallmark B., Cox M.P., Sudoyo H., Downey S., Lansing J.S., Hammer M.F. Major East-West Division Underlies Y Chromosome Stratification across Indonesia. Mol. Biol. Evol. 2010;27:1833–1844. doi: 10.1093/molbev/msq063.
- 66. Yan S., Wang C.-C., Li H., Li S.-L., Jin L., Genographic Consortium. An updated tree of Y-chromosome Haplogroup O and revised phylogenetic positions of mutations P164 and PK4. Eur. J. Hum. Genet. 2011;19:1013–1015. doi: 10.1038/ejhg.2011.64.
- 67. Park M.J., Lee H.Y., Yang W.I., Shin K.-J. Understanding the Y chromosome variation in Korea—relevance of combined haplogroup and haplotype analyses. Int. J. Leg. Med. 2012;126:589–599. doi: 10.1007/s00414-012-0703-9.
- 68. Kutanan W., Kampuansai J., Srikummool M., Brunelli A., Ghirotto S., Arias L., Macholdt E., Hübner A., Schröder R., Stoneking M. Contrasting Paternal and Maternal Genetic Histories of Thai and Lao Populations. Mol. Biol. Evol. 2019;36:1490–1506. doi: 10.1093/molbev/msz083.
- 69. Wang M., Du W., He G., Wang S., Zou X., Liu J., Liu C., Liu C., Wang Z. Revisiting the genetic background and phylogenetic structure of five Sino-Tibetan-speaking populations: insights from autosomal InDels. Mol. Genet. Genom. 2020;295:969–979. doi: 10.1007/s00438-020-01673-x.
- 70. Yao H.B., Tang S., Yao X., Yeh H.Y., Zhang W., Xie Z., Du Q., Ma L., Wei S., Gong X., et al. The genetic admixture in Tibetan-Yi Corridor. Am. J. Phys. Anthropol. 2017;164:522–532. doi: 10.1002/ajpa.23291.
- 71. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 72. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
- 73. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
- 74. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 75. Jagadeesan A., Ebenesersdóttir S.S., Guðmundsdóttir V.B., Thordardottir E.L., Moore K.H.S., Helgason A. HaploGrouper: a generalized approach to haplogroup classification. Bioinformatics. 2020;37:570–572. doi: 10.1093/bioinformatics/btaa729.
- 76. Gouy A., Zieger M. STRAF—A convenient online tool for STR data evaluation in forensic genetics. Forensic Sci. Int. Genet. 2017;30:148–151. doi: 10.1016/j.fsigen.2017.07.007.
- 77. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2018.
- 78. Excoffier L., Lischer H.E.L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 2010;10:564–567. doi: 10.1111/j.1755-0998.2010.02847.x.
- 79. Kumar S., Stecher G., Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 2016;33:1870–1874. doi: 10.1093/molbev/msw054.
- 80. Relethford J.H. Geostatistics and spatial analysis in biological anthropology. Am. J. Phys. Anthropol. 2008;136:1–10. doi: 10.1002/ajpa.20789.
- 81. Chen H., Lu Y., Lu D., Xu S. Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinf. 2021;22:114. doi: 10.1186/s12859-021-04057-z.
- 82. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 83. Martiniano R., De Sanctis B., Hallast P., Durbin R. Placing Ancient DNA Sequences into Reference Phylogenies. Mol. Biol. Evol. 2022;39 doi: 10.1093/molbev/msac017.
- 84. Suchard M.A., Lemey P., Baele G., Ayres D.L., Drummond A.J., Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4 doi: 10.1093/ve/vey016.
- 85. Drummond A.J., Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 2007;7 doi: 10.1186/1471-2148-7-214.
- 86. Rambaut A., Drummond A.J., Xie D., Baele G., Suchard M.A. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst. Biol. 2018;67:901–904. doi: 10.1093/sysbio/syy032.
- 87. Poznik G.D., Henn B.M., Yee M.C., Sliwerska E., Euskirchen G.M., Lin A.A., Snyder M., Quintana-Murci L., Kidd J.M., Underhill P.A., Bustamante C.D. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science. 2013;341:562–565. doi: 10.1126/science.1237619.
- 88. Du W., Feng P., Huang H., Wu W., Zhang L., Guo Y., Liu C., Liu H., Liu C., Chen L. Technical note: developmental validation of a novel 6-dye typing system with 36 Y-STR loci. Int. J. Leg. Med. 2019;133:1015–1027. doi: 10.1007/s00414-018-1864-y.
- 89. Hohoff C., Dewa K., Sibbing U., Hoppe K., Forster P., Brinkmann B. Y-chromosomal microsatellite mutation rates in a population sample from northwestern Germany. Int. J. Leg. Med. 2007;121:359–363. doi: 10.1007/s00414-006-0123-9.
- 90. Wang M., He G., Zou X., Liu J., Ye Z., Ming T., Du W., Wang Z., Hou Y. Genetic insights into the paternal admixture history of Chinese Mongolians via high-resolution customized Y-SNP SNaPshot panels. Forensic Sci. Int. Genet. 2021;54 doi: 10.1016/j.fsigen.2021.102565.
- 91. Qian X., Hou J., Wang Z., Ye Y., Lang M., Gao T., Liu J., Hou Y. Next Generation Sequencing Plus (NGS+) with Y-chromosomal Markers for Forensic Pedigree Searches. Sci. Rep. 2017;7 doi: 10.1038/s41598-017-11955-x.
- 92. Wei W., Ayub Q., Xue Y., Tyler-Smith C. A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping. Forensic Sci. Int. Genet. 2013;7:568–572. doi: 10.1016/j.fsigen.2013.03.014.
- 93. Mallick S., Li H., Lipson M., Mathieson I., Gymrek M., Racimo F., Zhao M., Chennagiri N., Nordenfelt S., Tandon A., et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016;538:201–206. doi: 10.1038/nature18964.
- 94. Fu Q., Li H., Moorjani P., Jay F., Slepchenko S.M., Bondarev A.A., Johnson P.L.F., Aximu-Petri A., Prüfer K., de Filippo C., et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature. 2014;514:445–449. doi: 10.1038/nature13810.
Data Availability Statement
- Data: The Y-STR and Y-SNP haplotype data for 519 TB-speaking individuals have been deposited in the YHRD database (https://yhrd.org/) under accession numbers YA004726 (TCD), YA004729 (TML), YA004613 (TQH), YA004223 (YLS), and YA004730 (SDJ). The supplementary materials contain all additional data used in this study. The data collection and usage adhered to the guidelines stipulated by the People’s Republic of China on the administration of human genetic resources.
- Code: This article does not report original code.
- All other items: Requests for access to the raw data should be directed to Guanglin He at guanglinhescu@163.com or Mengge Wang at menggewang2021@163.com.