Molecular Biology and Evolution. 2026 Feb 3;43(2):msag034. doi: 10.1093/molbev/msag034

Genetic formation and regional disparities of Kra–Dai and Hmong–Mien speakers inferred from ancient genomes of cave burial populations in southwest China

Le Tao 1,2,3,1, Ying Xie 4,1, Haifeng He 5,1, Tianyou Bai 6, Jianxin Guo 7, Kongyang Zhu 8, Baitong Wang 9,10, Guangmao Xie 11,12,✉,3, Qiang Lin 13,✉,3, Chuan-Chao Wang 14,15,✉,3
Editor: Hie Lim KIM
PMCID: PMC12917239  PMID: 41632808

Abstract

Cave burial is a funerary practice believed to have been associated with modern Kra–Dai (KD) and Hmong–Mien (HM) speakers for thousands of years. However, the extent to which these ancient cave burial practitioners contributed to the formation of modern ethnic minority groups remains poorly understood because of the limited ancient genomic data. We generated 14 new ancient human genomes from cave burial sites in Guangxi. The findings reveal continuous gene flow from northern lineages into ancient cave burial populations, shaping their genetic profiles over time. We observed a significant genetic distinction in HM populations: Southeast Asian HM groups derive 74.8% to 100% of their ancestry from cave burials, preserving a robust ancient southern genetic signature, while Chinese HM populations exhibit only 11.1% to 37.2% ancient cave burial ancestry and are heavily admixed with Yellow River-related populations (14.7% to 52.1%), reflecting differential historical interactions with northern migrants. In contrast, most KD speakers maintain tight genetic clustering with Guangxi ancestors (28.5% to 100% contribution from cave burials). The HM formation involved admixture between ancient cave burials, northern farmers, and local KD-related groups, which is evident in the genetic cline of She and Miao populations.

Keywords: ancient DNA, cave burials, Kra–Dai speakers, Hmong–Mien speakers

Introduction

Cave burial is a special burial practice in southern China and Southeast Asia. Previous archaeological studies have suggested that the populations associated with cave burials in Guangxi are the ancestors of modern Kra–Dai (KD) and Hmong–Mien (HM) speakers (Zhang 1982a, 1982b; Zhang et al. 1986; Zhou and Feng 1991). Genetic evidence from ancient cave burial sites has demonstrated close genetic connections between ancient individuals and present-day KD and HM populations, with affinities aligned with their respective living periods (Wang et al. 2021b). Furthermore, the matrilineal connections of cave burial populations in Thailand and southern China suggest that the spread of this custom may have been accompanied by demic diffusion and admixture (Zhang et al. 2020). The Baiku Yao, an ethnic group among the Yao people who practice cave burial, are believed to be descendants of ancient cave burial populations dating back around 500 years (Guo et al. 2024). Ancient mitochondrial DNA studies have also suggested that ancient cave burial populations in southern China and Southeast Asia are ancestors of present-day KD speakers (Zhang et al. 2020; Carlhoff et al. 2023; Zhou et al. 2025). These multidisciplinary lines of evidence demonstrate a potential ancestor-descendant relationship between ancient cave burials and present-day KD and HM speakers.

Previous genetic studies of present-day populations have primarily relied on sequencing data from living individuals. Mitochondrial analysis suggested that modern KD populations may have descended from ancient rice-farming populations near the Yangtze River basin (Sun et al. 2021). Genome-wide SNP data from Maonan and Hlai individuals revealed genetic connections to HM and Sino-Tibetan populations (Wang et al. 2020; Chen et al. 2022). Genomic data from KD populations in the Yunnan–Guizhou Plateau indicate substantial genetic diversity, implying a complex population history and admixture events during their formation (Wang et al. 2023). The HM speakers, linguistically related people living in southern China and Southeast Asia, have also been the subject of population genetic studies. Early mitochondrial data highlighted the genetic divergence between Hmong and Mien populations (Wen et al. 2005). The contributions of multiple ancestral sources, including Sino-Tibetan, KD, Austroasiatic, and ancient rice-farming populations, to the formation of HM populations have been demonstrated using genetic data from present-day HM individuals (Li et al. 2007; Xia et al. 2019; Yang et al. 2022). Although computationally constructed ancient genomic data have provided an effective approach for population genetics (Gao et al. 2024), direct ancient genomic evidence remains indispensable. Considering the complexity of population history and the scarcity of ancient DNA data related to KD and HM groups, additional ancient genomic datasets are essential for further elucidating the temporal dynamics of their population structure.

To resolve these competing hypotheses, we generated 14 new ancient genomes from key cave burial sites in Guangxi, which we co-analyzed with previously published datasets. We then employed a rigorous, hypothesis-testing framework to quantitatively evaluate the following competing scenarios:

  1. The primary local ancestry hypothesis: If the archaeological association is correct, modern KD and HM speakers should derive a substantial and measurable proportion of their ancestry directly from the ancient cave burial populations of Guangxi.

  2. The northern ancestry dominance hypothesis: Alternatively, if southward migrations were the dominant force, modern KD and HM groups would be primarily modeled as descendants of Yellow River-related northern populations, with minimal genetic contribution from the local cave burial practitioners.

  3. The complex admixture hypothesis: A third, more nuanced scenario posits that modern groups were formed through variable mixtures of local cave burial ancestry, northern Chinese lineages, and other unsampled regional populations, with proportions that may differ systematically between KD and HM speakers and across geographic regions.

By leveraging direct genetic evidence from the putative source populations, this study moves beyond correlation to formally test the genetic legacy of cave burial cultures and elucidate the complex interplay between local traditions and large-scale migrations in shaping the genomic diversity of Kra–Dai and Hmong–Mien speakers in East Asia.

Methods and materials

Archaeological information

Layi: This site is in the basin of the Hongshui River in Dahua Yao Autonomous County, Hechi City, Guangxi Zhuang Autonomous Region. We obtained 3 samples from 2 caves at this site.

Huatuyan: This site is near Huatu Village in Nandan County, Hechi City, Guangxi Zhuang Autonomous Region. We obtained 5 individuals from this site. Archaeological and genetic research indicate that the ancient cave burial practitioners here share close genetic relationships with the modern Baiku Yao population (Zhang et al. 1986; Guo et al. 2024).

Genggaishan: This site is also near Huatu Village in Nandan County, Hechi City, Guangxi Zhuang Autonomous Region. We obtained one individual from this site.

Shenxiandong: This site is near Fuqin Village in Pingguo County, Baise City, Guangxi Zhuang Autonomous Region. We obtained three individuals from this site.

Cenxun: This site is near Taiping Village in Pingguo County, Baise City, Guangxi Zhuang Autonomous Region. We obtained one individual from this site.

Banda: This site is at Banda Mountain in Dahua Yao Autonomous County, Hechi City, Guangxi Zhuang Autonomous Region. We obtained one sample from this site.

Lada: This site is at Lada Mountain in Hechi City, Guangxi Zhuang Autonomous Region. We obtained one sample from this site.

Ancient DNA extraction and library preparation

All samples were processed in the dedicated ancient DNA clean room at the Institute of Anthropology, Xiamen University. The outer surface of bones and teeth was first removed mechanically with a dental drill to eliminate surface-exposed contaminants. The remains were then cleaned with 75% ethanol and 10% sodium hypochlorite solution, followed by 30 min of ultraviolet light exposure. We used dental drills to obtain 80 to 120 mg of powder from teeth and the petrous parts of the temporal bones. DNA extraction followed a modified version of Rohland's protocol (Rohland et al. 2018): the powder was digested in 1 mL of lysis buffer containing 0.5 M EDTA and 0.25 mg/mL Proteinase K in a shaker at 37 °C and 300 rpm. The DNA solution was purified with a MinElute kit (Qiagen, Germany) following the manufacturer's manual. Extraction blanks were included in each batch.

We applied a single-strand library preparation procedure to all samples (Gansauge et al. 2020) without UDG treatment, and amplified the libraries for 18 PCR cycles using AccuPrime Pfx. We performed in-solution DNA hybridization capture with the Twist Mitochondrial Panel (CATALOG #104562) and the Twist Ancient Human SNP panel (CATALOG #106658) at 65 °C for 16 h (Rohland et al. 2022). Enriched libraries were sequenced as paired-end 100 bp runs on the Illumina NovaSeq 6000 platform, using a customized sequencing primer (ACACTCTTTCCCTACACGACGCTCTTCC).

Sequence data processing

We used AdapterRemoval v2.3.15 to trim adaptors, remove low-quality bases, and merge paired reads into a single sequence (Schubert et al. 2016). Only merged reads ≥30 bp were retained for downstream analysis. Reads were mapped to the human reference genome hs37d5 using BWA v0.7.176, with the parameters “−l 1024” and “−n 0.01” (Li and Durbin 2009). PCR duplicates were removed using dedupe v0.12.3 (Peltzer et al. 2016). Each end of the reads was trimmed by 6 bp using trimBam implemented in BamUtil v1.0.14 (Jun et al. 2015) to minimize terminal misincorporations typical of ancient DNA. Alignment filtering was performed using SAMtools v1.15 with the parameters −q 30 (mapping quality) and −Q 30 (base quality) (Danecek et al. 2021).
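The two read-level filters described above (the 30 bp length cutoff and the 6 bp end trimming) can be illustrated with a minimal Python sketch. It is not the code of AdapterRemoval or BamUtil; the function names `keep_read` and `mask_ends` are hypothetical, and masking trimmed bases with `N` and quality `!` is an assumption about how the trimming is represented.

```python
MIN_LENGTH = 30  # minimum merged-read length retained, as stated in the text
TRIM_BP = 6      # bases trimmed at each read end, as stated in the text


def keep_read(sequence: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if a merged read is long enough to be retained."""
    return len(sequence) >= min_length


def mask_ends(sequence: str, qualities: str, trim_bp: int = TRIM_BP):
    """Mask trim_bp bases at both read ends.

    Assumption: trimmed positions are replaced by 'N' with the lowest
    quality character '!', so read length and alignment coordinates are
    unchanged. Reads no longer than 2 * trim_bp are masked entirely.
    """
    n = len(sequence)
    if n <= 2 * trim_bp:
        return "N" * n, "!" * n
    core = slice(trim_bp, n - trim_bp)
    masked_seq = "N" * trim_bp + sequence[core] + "N" * trim_bp
    masked_qual = "!" * trim_bp + qualities[core] + "!" * trim_bp
    return masked_seq, masked_qual
```

Masking rather than clipping keeps downstream coordinates stable while ensuring that damage-prone terminal bases cannot contribute to genotype calls.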

Authentication of ancient DNA

To authenticate ancient DNA, we examined terminal deamination patterns using PMDtools (Skoglund et al. 2014). We evaluated contamination using 3 independent approaches: mitochondrial contamination with Schmutzi (Renaud et al. 2015), nuclear contamination with ContamLD (Nakatsuka et al. 2020), and X-chromosome contamination in males using ANGSD (Korneliussen et al. 2014). All samples showed contamination rates less than 3%.

Sex determination

Sex determination was based on comparing sequencing coverage across autosomes, the X chromosome, and the Y chromosome. Coverage was obtained using samtools depth with parameters −q30 −Q37 −a and the Twist Ancient DNA panel (−b). Coverage for autosomal, X-chromosomal, and Y-chromosomal sites was then summarized using an awk-based script following the workflow described in the GA workshop documentation. Individuals showing autosomal-level X coverage and negligible Y coverage were classified as female, whereas those showing approximately half autosomal coverage on both X and Y chromosomes were classified as male (Fu et al. 2016).
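The classification rule above can be sketched in Python as follows. The function name `classify_sex` and the numeric cutoffs (`tolerance`, `max_female_y`) are illustrative assumptions; the text gives only the qualitative criteria, not the thresholds actually applied.

```python
def classify_sex(auto_cov: float, x_cov: float, y_cov: float,
                 tolerance: float = 0.2, max_female_y: float = 0.1) -> str:
    """Classify genetic sex from mean coverage on autosomes, X, and Y.

    Assumed cutoffs (not taken from the paper):
      female: X/autosome within `tolerance` of 1.0 and Y/autosome <= max_female_y
      male:   X/autosome and Y/autosome both within `tolerance` of 0.5
    Anything else is reported as undetermined.
    """
    if auto_cov <= 0:
        raise ValueError("autosomal coverage must be positive")
    x_ratio = x_cov / auto_cov
    y_ratio = y_cov / auto_cov
    if abs(x_ratio - 1.0) <= tolerance and y_ratio <= max_female_y:
        return "female"
    if abs(x_ratio - 0.5) <= tolerance and abs(y_ratio - 0.5) <= tolerance:
        return "male"
    return "undetermined"
```

Returning an explicit "undetermined" label for intermediate ratios avoids forcing a call on low-coverage or contaminated libraries.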

Kinship detection

We detected kinship among individuals using READ, which infers first- and second-degree relatedness based on allele mismatch patterns (Monroy Kuhn et al. 2018). Pairs showing Z-scores outside the unrelated distribution were further inspected manually.
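The idea behind mismatch-based kinship inference can be illustrated with a simplified Python sketch. It computes a genome-wide mismatch rate between pseudo-haploid genotypes and normalizes it by the median over all pairs, which is assumed to represent unrelated individuals. This is not READ itself, which works in genomic windows; the function names are hypothetical and the cutoffs (0.625, 0.8125, 0.90625) are assumed values that should be checked against the READ documentation.

```python
from itertools import combinations
from statistics import median


def mismatch_rate(a, b):
    """Proportion of mismatching alleles over sites called in both individuals.

    Genotypes are pseudo-haploid allele codes; None marks a missing call.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no overlapping sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def classify_pairs(genotypes: dict) -> dict:
    """Label each pair by its mismatch rate normalized to the median pair."""
    rates = {
        (i, j): mismatch_rate(genotypes[i], genotypes[j])
        for i, j in combinations(sorted(genotypes), 2)
    }
    baseline = median(rates.values())  # assumed to reflect unrelated pairs
    if baseline == 0:
        raise ValueError("median mismatch rate is zero; cannot normalize")
    labels = {}
    for pair, rate in rates.items():
        normalized = rate / baseline
        if normalized < 0.625:          # assumed cutoff
            labels[pair] = "identical/twin"
        elif normalized < 0.8125:       # assumed cutoff
            labels[pair] = "first-degree"
        elif normalized < 0.90625:      # assumed cutoff
            labels[pair] = "second-degree"
        else:
            labels[pair] = "unrelated"
    return labels
```

Because the baseline is the median over all pairs, the approach presumes that most pairs in the dataset are unrelated, which is why borderline pairs warrant manual inspection.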

Data merging

We merged newly generated data with previously published datasets (Ning et al. 2020; Wang et al. 2021a, 2021b; Guo et al. 2024; Mallick et al. 2024; Zhu et al. 2024b; He et al. 2025; Xiong et al. 2025) using mergeit in EIGENSOFT. Two merged datasets were used in downstream analyses: the HumanOrigins dataset was used for principal component analysis, and the 1,240k dataset was used for ADMIXTURE, outgroup-f3, f4, qpWave, and qpAdm analyses. Individuals showing close genetic relatedness (see section Kinship detection) were removed prior to merging to avoid bias.

Principal components analysis (PCA)

Principal components analysis was performed using smartpca v16000 in EIGENSOFT based on the HumanOrigins dataset (Patterson et al. 2012). PCA was computed using modern populations only, and all ancient individuals were projected onto the resulting axes by setting lsqproject: YES (Zhu et al. 2024a). Default parameters were otherwise applied.

ADMIXTURE analysis

We pruned our dataset for linkage disequilibrium using PLINK v1.90 with parameters “--indep-pairwise 200 25 0.4” (Chang et al. 2015; Peter 2016; Lawson et al. 2018). Then, we performed unsupervised admixture analysis using ADMIXTURE v.1.3.0 (Alexander et al. 2009).

f-statistic analysis

We computed outgroup-f3 statistics using qp3pop v651 from ADMIXTOOLS, with the parameter “inbreed: YES”, to assess shared genetic drift between populations. f4 statistics were calculated using qpDstat v980 in f4 mode (f4mode: YES) to test asymmetric allele sharing and admixture signals (Patterson et al. 2012; Peter 2016). Results with |Z| ≥ 3 were considered statistically significant, whereas results with 2 ≤ |Z| < 3 were considered suggestive.
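As an illustration of how an f4 statistic and its Z-score are interpreted, the following Python sketch computes f4(A, B; C, D) from per-SNP allele frequencies and derives a Z-score with an unweighted delete-one-block jackknife. It is a simplification under stated assumptions: ADMIXTOOLS uses a weighted block jackknife over genomic blocks, whereas the equal-sized contiguous blocks and the function names `f4`, `jackknife_z`, and `classify_z` here are hypothetical.

```python
from math import sqrt


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    terms = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(terms) / len(terms)


def jackknife_z(per_snp_values, n_blocks):
    """Z-score of the mean via an unweighted delete-one-block jackknife.

    SNPs are split into n_blocks contiguous blocks of equal size; any
    remainder at the end is dropped (a simplification).
    """
    block_size = len(per_snp_values) // n_blocks
    if n_blocks < 2 or block_size == 0:
        raise ValueError("need at least two non-empty blocks")
    used = per_snp_values[: block_size * n_blocks]
    total = sum(used)
    estimate = total / len(used)
    leave_one_out = []
    for i in range(n_blocks):
        block = used[i * block_size:(i + 1) * block_size]
        leave_one_out.append((total - sum(block)) / (len(used) - block_size))
    loo_mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - loo_mean) ** 2 for x in leave_one_out
    )
    if variance == 0:
        raise ValueError("zero jackknife variance")
    return estimate / sqrt(variance)


def classify_z(z):
    """Apply the thresholds stated in the text to a Z-score."""
    if abs(z) >= 3:
        return "significant"
    if abs(z) >= 2:
        return "suggestive"
    return "not significant"
```

The per-SNP products `(a - b) * (c - d)` can be passed directly to `jackknife_z`, and the resulting Z-score to `classify_z`.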

Admixture modeling

Admixture proportions were estimated using qpAdm v810 from ADMIXTOOLS, modeling target populations as mixtures of selected source populations. Analyses were conducted with parameters allsnps: YES and inbreed: YES (Peter 2016).
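The intuition behind modeling a target as a mixture of sources can be conveyed with a deliberately simplified two-source sketch in Python. Each population is summarized by a vector of f4 statistics against a shared set of reference populations, and the mixture weight is obtained by ordinary least squares. This is not the qpAdm algorithm, which fits a matrix of f4 statistics with covariance weighting and tests model rank; the function name `two_source_mixture` and the input vectors are hypothetical.

```python
def two_source_mixture(target_f4, source1_f4, source2_f4):
    """Least-squares weight alpha such that
    target ~ alpha * source1 + (1 - alpha) * source2.

    Inputs are equal-length sequences of f4 statistics measured against
    the same reference populations. Returns (alpha, 1 - alpha).
    """
    diffs = [s1 - s2 for s1, s2 in zip(source1_f4, source2_f4)]
    denominator = sum(d * d for d in diffs)
    if denominator == 0:
        raise ValueError("sources are indistinguishable with these references")
    numerator = sum(
        (t - s2) * d for t, s2, d in zip(target_f4, source2_f4, diffs)
    )
    alpha = numerator / denominator
    return alpha, 1 - alpha
```

An estimate outside the interval from 0 to 1 would indicate that the target cannot be described as a mixture of the two chosen sources, which loosely parallels the rejection of infeasible models.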

Runs of homozygosity analysis

Runs of homozygosity were inferred using PLINK v1.9, following parameter settings optimized for ancient DNA (Chang et al. 2015; Ceballos et al. 2018). ROH length distributions were used to assess recent effective population size changes and potential population bottlenecks.

Results

Genetic structure of the cave burial populations in historic Guangxi

Guangxi, located at the crossroads of the Yunnan–Guizhou plateau, southern China, and Southeast Asia, harbors a diverse population structure. To provide genetic evidence for the formation of KD and HM populations in our study area and surrounding regions, we collected samples from ancient cave burial individuals, who, according to archaeological studies, are believed to be the ancestors of modern KD and HM populations (Zhang 1982a, 1982b; Zhang et al. 1986; Zhou and Feng 1991) (Fig. 1). We applied single-strand library preparation and in-solution capture to obtain high-quality genomic data (Gansauge et al. 2020; Rohland et al. 2022). After quality control, we retained 14 samples that exhibit ancient DNA damage patterns, with endogenous DNA ranging from 0.98% to 73.59% and 45,637 to 936,918 SNPs covered on the 1,240k panel, for downstream analysis (Table S1a and Figure S1). We merged our dataset with previously published datasets, obtaining 26 unrelated cave burial individuals from Guangxi, dated between 1,500 and 300 BP (Wang et al. 2021b) (Table S1b and c). With these genomic data, we explored the genetic contributions of cave burial-related ancestors and the admixture history of KD and HM populations.

Figure 1.

Geographic location and genetic structure of ancient cave burial practitioners in Guangxi. Each symbol represents an archaeological site.

We first estimated genetic variation across individuals to assess the genetic homogeneity among cave burial populations in this region. Our results revealed distinct genetic profiles for individuals dated around 500 BP, which contained more rice-farming-related components than those dated around 1,500 BP (Fig. 2a and Table S1). However, we observed genetic homogeneity within each site, suggesting a familial and community-based nature to the cave burial practices.

Figure 2.

Genetic structure of ancient cave burial practitioners in Guangxi. a) Genetic homogeneity analysis using pairwise qpWave; each cell shows the P-value for Rank = 0 qpWave modeling based on the 1,240k dataset. b) Principal Component Analysis (PCA) of East Asian individuals. Ancient individuals (colored symbols) were projected onto the PCs calculated from modern East Asians (grey symbols). c, d) Ancestry components of ancient cave burial practitioners modeled using qpAdm. Each color represents an ancestral population used in the analysis. See also Table S4a and b.

We grouped individuals based on archaeological sites and qpWave analysis and performed principal component analysis (PCA). In line with previous research, all individuals from historical Guangxi clustered with ancient Southeast Asian populations, yet they formed two distinct clusters (Fig. 2b). Outgroup-f3 analysis further confirmed that these populations exhibited a higher genetic affinity to ancient Southeast Asian (SEA) and Southeast Coastal (SEC) populations (Table S2a). To explore admixture signals, we employed admixture-f3 analysis, which indicated that the historical populations of Guangxi were a mixture of prehistoric Guangxi populations and ancient northern populations. The significant Z-scores from this analysis suggest that these individuals harbored ancestries related to both ancient SEC and Yellow River populations (f3(source 1, source 2; target), where each source is a Yellow River-related population, a prehistoric Guangxi population, or an ancient Southeast Coastal population; Z-score > 3) (Table S2b). Additionally, f4 analysis further confirmed the impact of north-to-south migration on the genetic landscape of Guangxi, with a significant Z-score for f4(Mbuti, Yellow River-related; 1,500/500, pre_his_SEC) (Table S3a).

Next, we used qpAdm to estimate the ancestry proportions of historical Guangxi individuals (Table S4a and b and Fig. 2c and d). For those dated to 1,500 BP, the results indicated a mixture of Southeast Asian, Coastal East Asian, and northern Chinese ancestries. This genetic pattern suggests that Guangxi was a key link between China and Southeast Asia. Individuals from 500 BP exhibited similar genetic profiles but carried an additional northern-related lineage. The expansion of the Yellow River-related farming population significantly influenced the genetic components in this region (Wang et al. 2021b). Multiple migration events from the Central Plain to Guangxi since the Ming Dynasty contributed to shaping the genetic profile of Guangxi inhabitants (Guo et al. 2024).

Formation of modern Kra–Dai and Hmong–Mien speakers

Previous studies have highlighted the genetic relationships between ancient Guangxi (GX) populations and ethnic minority groups in Southeast Asia (SEA) and southern China (SC) (Wang et al. 2021b; Guo et al. 2024). However, continuous southward migration has significantly shaped genetic profiles by increasing the northern-related components in southern China (Yang et al. 2020; Wang et al. 2021b; Tao et al. 2023; Guo et al. 2024; Zhou et al. 2025). To investigate whether northern-related ancestry has homogenized the genetic structure of modern KD and HM speakers, we first conducted qpWave analysis to examine the genetic diversity within KD and HM populations in our study area and surrounding regions. Despite the strong influence of northern lineages, both KD and HM populations continue to exhibit genetic diversity (Fig. 3a).

Figure 3.

Genetic relationships between modern Kra–Dai (KD) and Hmong–Mien (HM) speakers and ancient cave burial populations. a) Genetic homogeneity analysis using pairwise qpWave; each cell shows the P-value for Rank = 0 qpWave modeling based on the Human Origins (HO) dataset. b) PCA of East Asian individuals. Each symbol represents an individual. Ancient cave burial individuals (red) were projected onto the PCs calculated from modern Han, KD, and HM populations.

We performed Outgroup-f3 and f4 analysis to evaluate the genetic affinities between cave burials and present-day KD and HM speakers. The results are consistent with previous studies, indicating a close genetic affinity between them (Tables S2c and S3b). In contrast to the extinct prehistorical Guangxi lineage (eg Longlin), results from admixture f3 analyses suggest that KD and HM populations primarily derived ancestry from historical GX ancestors (Table S2d). These findings imply that northern immigrants interacted and intermixed with local populations in Guangxi, rather than replacing them during historical periods.

To further explore the genetic relationships among these populations, we performed principal component analysis (PCA), which included Han, KD, and HM speakers (Fig. 3b). The results revealed that HM speakers form a distinct cluster, separate from Han, KD, and 1,500 BP Guangxi ancient populations. Additionally, we observed that the Dao people form a tight cluster with most KD speakers, indicating a closer genetic relationship between them. ADMIXTURE analysis at K = 6 also showed that the Dao share genetic components similar to KD speakers (Fig. S2). We found that the She and Miao populations are positioned on a genetic cline between 500 BP Guangxi and southern Han populations, suggesting an admixture-driven differentiation during the formation of HM populations (Fig. 3b). The f3 and f4 statistics (eg f3(500BP GX, Han; HM), f4(Mbuti, Han; HM, 500 BP GX)) further support this observation (Tables S2e and S3c).

In contrast to HM speakers, all KD speakers, except for the CoLao and LaChi groups, form a cluster with 1,500 BP Guangxi ancient populations, showing a strong genetic affinity between them. However, the Dong populations from Hunan and Guizhou shift toward the Hmong–Mien cluster, indicating admixture events between these groups. The f4 analysis (eg f4(Mbuti, HM; Dong_Hunan/Dong_Guizhou, KD)) further supports this conclusion (Table S3d).

Interestingly, the CoLao and LaChi populations represent two distinct clusters that deviate from other Kra–Dai speakers and the 1,500 BP Guangxi samples. We hypothesize that CoLao and LaChi populations either inherited alleles from another Kra–Dai-related ancestry or underwent a genetic bottleneck that intensified genetic drift, leading to their deviation from other Kra–Dai populations. To test this hypothesis, we used PLINK to estimate the runs of homozygosity (ROH) in modern Kra–Dai and Hmong–Mien speakers. Our results revealed significant short ROHs in CoLao, Hmong, and LaChi populations, suggesting a rapid decline in effective population size (Ne), which may explain their deviation from other Kra–Dai and Hmong–Mien speakers (Fig. 4). Moreover, we performed an f4 analysis of the form f4(Mbuti, deep; target, aEA) to detect potential genetic influence from deep lineages that may have contributed to the distinct PCA patterns observed in CoLao and LaChi. Our results indicate that CoLao shares genetic affinities with some deep lineages such as Longlin, Laos_Hoabinhian, and Japan_Jomon, which may explain the distinctive patterns in PCA (Table S3e).

Figure 4.

Runs of homozygosity (ROH) analysis of modern KD and HM speakers.

Additionally, PCA revealed an outlier in the CoLao population, and a significant Z-score indicates that this individual shares more alleles with Hmong populations than other CoLao individuals do (f4(Mbuti, CoLao_o; CoLao, Hmong) > 0, Z-score = 17.862).

To further investigate the contribution of ancient cave burial populations to modern HM- and KD-speaking populations, we performed qpAdm analysis (Table S4c and Fig. 5a and b).

Figure 5.

Ancestry modeling of present-day Hmong–Mien and Kra–Dai speakers. Each color represents an ancestral population used in the analysis. See also Table S4c and d. a) Ancestry components of modern Kra–Dai speakers. See also Table S4c. b) Ancestry components of modern Hmong–Mien speakers. See also Table S4c. c) Ancestry components of CoLao, Mulam, and Nung populations. See also Table S4d.

Our analysis indicates that genetic contributions from 500 BP individuals are present in all modern HM speakers, with a significantly higher proportion observed in populations from Southeast Asia. Specifically, HM speakers in Southeast Asia derive 74.5% to 100% of their ancestry from 500 BP individuals, with only 0% to 25.5% contributed by modern Han populations. In contrast, HM speakers in China exhibit a lower proportion of 500 BP ancestry (22.2% to 26.5%), with modern Han contributing 73.5% to 77.8%. However, this regional difference was not observed in the genetic modeling of modern KD speakers. Most modern KD speakers can be modeled as an admixture of 1,500 BP individuals and modern Han, with proportions ranging from 14.4% to 71.5% and 28.5% to 85.6%, respectively. Additionally, Li and Maonan populations contain extra components related to Southeast Asian ancestry, possibly influenced by southward coastal migrations. Notably, 1,500 BP individuals cannot be modeled as direct ancestors of the Nung and Mulam populations. Furthermore, our qpAdm analysis could not model the genetic composition of the CoLao population.

Since Han exhibits comparatively complex genetic structures, it may not be an ideal proxy for ancient northern ancestry components. To address this issue, we performed additional qpAdm analysis using ancient Yellow River-related populations (eg YR_LBIA, YR_MN) as the northern source (Table S4d). Our results remained consistent whether Han or YR-related populations were used as the northern source, indicating the feasibility of Han as a proxy for ancient northern ancestry in qpAdm modeling. Interestingly, our results reveal that Mulam and Nung derived their ancestries from Ami and ancient cave burials, with additional genetic contributions from YR-related (29.3%) and SEA populations (15.4%) (Table S4d and Fig. 5c). The variation in genetic components, which corresponds to their geographic distributions, suggests that two pre-HM clines interacted genetically with local populations during their formation. Furthermore, CoLao can be modeled as a mixture of Ami (45%), SEA (17%), and YR-related populations (38%) (Table S4d and Fig. 5c). This distinct genetic structure may explain the deviating pattern observed in the PCA.

Discussion

This study provides a high-resolution genetic perspective on the formation of Kra–Dai and Hmong–Mien speakers by directly testing the long-standing archaeological hypothesis that ancient cave burial practitioners were their primary ancestors. Our analyses, particularly the formal ancestry modeling with qpAdm, allow us to adjudicate between the competing scenarios outlined in the introduction. Our findings support the complex admixture hypothesis, while simultaneously refining our understanding of the roles of local and northern ancestries. Temporal genetic homogeneity among these burial-related individuals, along with the increasing presence of northern-related ancestry, suggests that continuous large-scale migration from the north has significantly shaped the genetic profiles of the region, potentially decreasing genetic diversity over time.

Ancient populations practicing cave burial at the Zuoyou River basin and Nandan in Guangxi have been considered ancestors of present-day KD and HM speakers, respectively (Zhang 1982a, 1982b; Zhang et al. 1986; Zhou and Feng 1991). Our results and previous archaeological and genetic studies show a close relationship between these cave burial populations and modern KD and HM speakers. Present-day HM speakers exhibit a genetic cline toward both Han and KD populations, with shared alleles indicating the importance of admixture with Han and KD-related groups in the formation of HM populations. We found that most KD populations show a high genetic affinity with 1,500 BP Guangxi ancestors, suggesting that the genetic components of present-day KD speakers have remained relatively consistent over the past 1,500 years.

As expected, our qpAdm analysis revealed that nearly all present-day HM and KD speakers retain genetic components from their respective related populations. Notably, HM populations in Southeast Asia exhibit a much higher proportion of ancestry from cave burial individuals (74.5% to 100%) than their Chinese counterparts (22.2% to 26.5%), which are more admixed with modern Han populations. This regional disparity likely reflects historical migration patterns: as HM ancestral groups migrated southward, those settling in Southeast Asia encountered fewer sustained interactions with Han populations, preserving more of their ancient cave burial-related ancestry (Ge et al. 1993). In contrast, HM groups in China experienced prolonged admixture with Han migrants, particularly during the Ming Dynasty and subsequent periods (Ge et al. 1993), leading to a dilution of ancient southern ancestry. This aligns with historical records of north-south demographic shifts and the role of the Central Plain as a source of agricultural expansions that influenced ethnic formation in southern China.

Our results highlight the role of demographic integration in the ethnic formation of present-day HM groups. When using the YR-related population to represent northern-related ancestries, we observed that Hmong exhibited genetic profiles distinct from those in the previous qpAdm model, providing suggestive evidence that Hmong experienced different population admixture events compared to other HM speakers in SEA. The admixture patterns observed in HM populations, such as the She and Miao forming a cline between 500 BP Guangxi and southern Han, further highlight the role of admixture-driven differentiation. This is consistent with linguistic and archaeological evidence suggesting HM groups emerged from a blend of local southern populations and northern migrants, with genetic contributions from both KD-related and Sino-Tibetan lineages (Ratliff 2021).

Similarly, modern KD speakers also show genetic contributions from northern populations. Ancestry modeling indicates that most modern KD speakers derived their ancestry from ancient cave burials and Han populations. Although the initial models suggested that Mulam and Nung derived no ancestry from cave burials, subsequent qpAdm modeling showed that Mulam and Nung can be modeled as mixtures of Ami and YR-related populations with additional genetic influence from geographically proximate ancient individuals (ie cave burials from Guangxi and Malaysia_LN, respectively).

The deviating pattern observed in PCA indicates notable genetic heterogeneity of CoLao and LaChi populations compared to other KD groups. In particular, CoLao could not be modeled in our initial qpAdm models using modern Han as the northern-related source. Although alternative qpAdm models suggest that CoLao could be modeled as a descendant of Ami, SEA, and YR-related populations, indicating more complex genetic interactions among multiple ethnic groups, our analysis suggests that these genetic divergences may reflect a population bottleneck, as evidenced by the significant short ROH found in both CoLao and LaChi populations. Notably, the ROH analysis was based on SNP datasets, which cannot fully represent the temporal dynamics of these populations (eg changes in effective population size, Ne). Further investigation into the population history of CoLao and LaChi is currently limited by the lack of available whole-genome datasets.

Interestingly, one CoLao individual shared more alleles with Hmong than with other CoLao individuals. We hypothesize that this individual may have been adopted by the CoLao during childhood or that there was an issue with sample labeling. Additionally, the Dong populations’ shift toward the geographically overlapping HM cluster in PCA indicates cross-cultural interactions and genetic exchange between KD and HM groups in specific regions, such as Hunan and Guizhou, demonstrating the fluidity of ethnic boundaries in prehistoric and historic times.

However, our study was based on a limited number of cave burial individuals, which may only represent the genetic structure of ancient cave burial populations in this region. While the increasing availability of modern samples from KD and HM populations, along with reconstructed whole-genome data from modern population datasets, has advanced our understanding, the scarcity of genomic data from ancient individuals in South China and Southeast Asia still limits our ability to investigate the population history of HM and KD groups thoroughly. The sampling of contemporary HM and KD populations also remains limited, which has constrained more comprehensive genetic analyses. Expanding spatial and temporal sampling efforts across South China and Southeast Asia could clarify whether the observed genetic patterns are region-specific or part of a larger-scale demographic process. Additionally, integrating paleoproteomic or isotopic data would provide a more holistic view of subsistence strategies and mobility patterns, complementing genetic evidence of admixture.

Conclusion

This study bridges archaeological, linguistic, and genetic evidence to demonstrate that ancient cave burial populations were pivotal contributors to the formation of modern KD and HM groups, with their genetic legacy shaped by millennia of interactions with northern migrants. The regional disparities in HM ancestry and the stability of KD genetic components highlight the complex, layered processes of ethnic formation in Southwest China, underscoring the value of ancient DNA in unraveling the multifaceted histories of populations at the crossroads of East and Southeast Asia.

Supplementary Material

msag034_Supplementary_Data

Acknowledgments

We sincerely thank the editors and reviewers for their contributions and suggestions. We thank all participants in these studies. SF and ZX from the Information and Network Center of Xiamen University are acknowledged for their help with high-performance computing.

Contributor Information

Le Tao, Institute of Forensic Science, Fudan University, Shanghai 200032, China; Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China; State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen 361005, China.

Ying Xie, Guangxi Institute of Cultural Relics Protection and Archaeology, Nanning 530003, China.

Haifeng He, State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen 361005, China.

Tianyou Bai, State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen 361005, China.

Jianxin Guo, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China.

Kongyang Zhu, Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, Vienna 1030, Austria.

Baitong Wang, Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China; Fujian Provincial Key Laboratory of Philosophy and Social Sciences in Bioanthropology, Institute of Anthropology, Xiamen University, Xiamen 361005, China.

Guangmao Xie, Guangxi Institute of Cultural Relics Protection and Archaeology, Nanning 530003, China; School of History, Culture and Tourism, Guangxi Normal University, Guilin 541001, China.

Qiang Lin, Guangxi Institute of Cultural Relics Protection and Archaeology, Nanning 530003, China.

Chuan-Chao Wang, Institute of Forensic Science, Fudan University, Shanghai 200032, China; Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China.

Author contributions

Le Tao (Data curation, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing), Ying Xie (Investigation, Resources, Writing – original draft), Haifeng He (Conceptualization, Investigation, Visualization, Writing – original draft, Writing – review & editing), Tianyou Bai (Investigation), Jianxin Guo (Conceptualization), Kongyang Zhu (Methodology, Software, Data curation), Baitong Wang (Investigation), Guangmao Xie (Resources, Project administration, Supervision), Qiang Lin (Resources, Project administration, Supervision), Chuan-Chao Wang (Conceptualization, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing).

Supplementary material

Supplementary material is available at Molecular Biology and Evolution online.

Funding

This work was funded by the National Key Research and Development Program of China (2024YFC3306701).

Data availability

The BAM files reported in this paper have been deposited in the Genome Sequence Archive in the National Genomics Data Center, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA-Human: HRA015436). All other data are available from the corresponding author on reasonable request.

Ethics approval and consent to participate

All procedures performed in this study, including the collection and destructive sampling of samples, were approved by the Biomedical Research Ethics Committee of Xiamen University (No. XDYX202412K88).

References

  1. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19:1655–1664. 10.1101/gr.094052.109.
  2. Carlhoff S et al. Genomic portrait and relatedness patterns of the Iron Age Log Coffin culture in northwestern Thailand. Nat Commun. 2023:14:8527. 10.1038/s41467-023-44328-2.
  3. Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018:19:220–234. 10.1038/nrg.2017.109.
  4. Chang CC et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015:4:7. 10.1186/s13742-015-0047-8.
  5. Chen J et al. Fine-scale population admixture landscape of Tai–Kadai-speaking Maonan in Southwest China inferred from genome-wide SNP data. Front Genet. 2022:13:815285. 10.3389/fgene.2022.815285.
  6. Danecek P et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021:10:giab008. 10.1093/gigascience/giab008.
  7. Fu Q et al. The genetic history of Ice Age Europe. Nature. 2016:534:200–205. 10.1038/nature17993.
  8. Gansauge MT, Aximu-Petri A, Nagel S, Meyer M. Manual and automated preparation of single-stranded DNA libraries for the sequencing of DNA from ancient biological remains and other sources of highly degraded DNA. Nat Protoc. 2020:15:2279–2300. 10.1038/s41596-020-0338-0.
  9. Gao Y et al. Reconstructing the ancestral gene pool to uncover the origins and genetic links of Hmong–Mien speakers. BMC Biol. 2024:22:59. 10.1186/s12915-024-01838-9.
  10. Ge J, Cao S, Wu S. A concise history of Chinese migration [in Chinese]. Fujian People's Publishing House; 1993.
  11. Guo J et al. Genetic affinity of cave burial and Hmong–Mien populations in Guangxi inferred from ancient genomes. Archaeol Anthropol Sci. 2024:16:121. 10.1007/s12520-024-02033-1.
  12. He H et al. Genetic stability in the lower Yangtze River basin from Song to Qing Dynasty. BMC Biol. 2025:23:270. 10.1186/s12915-025-02343-3.
  13. Jun G, Wing MK, Abecasis GR, Kang HM. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 2015:25:918–925. 10.1101/gr.176552.114.
  14. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics. 2014:15:356. 10.1186/s12859-014-0356-4.
  15. Lawson DJ, van Dorp L, Falush D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat Commun. 2018:9:3258. 10.1038/s41467-018-05257-7.
  16. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009:25:1754–1760. 10.1093/bioinformatics/btp324.
  17. Li H et al. Y chromosomes of prehistoric people along the Yangtze River. Hum Genet. 2007:122:383–388. 10.1007/s00439-007-0407-2.
  18. Mallick S et al. The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci Data. 2024:11:182. 10.1038/s41597-024-03031-7.
  19. Monroy Kuhn JM, Jakobsson M, Gunther T. Estimating genetic kin relationships in prehistoric populations. PLoS One. 2018:13:e0195491. 10.1371/journal.pone.0195491.
  20. Nakatsuka N et al. ContamLD: estimation of ancient nuclear DNA contamination using breakdown of linkage disequilibrium. Genome Biol. 2020:21:199. 10.1186/s13059-020-02111-2.
  21. Ning C et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat Commun. 2020:11:2700. 10.1038/s41467-020-16557-2.
  22. Patterson N et al. Ancient admixture in human history. Genetics. 2012:192(3):1065–1093. 10.1534/genetics.112.145037.
  23. Peltzer A et al. EAGER: efficient ancient genome reconstruction. Genome Biol. 2016:17:60. 10.1186/s13059-016-0918-z.
  24. Peter BM. Admixture, population structure, and F-statistics. Genetics. 2016:202:1485–1501. 10.1534/genetics.115.183913.
  25. Ratliff M. 14 Classification and historical overview of Hmong–Mien languages. In: The languages and linguistics of mainland Southeast Asia. De Gruyter Mouton; 2021. p. 247–260.
  26. Renaud G, Slon V, Duggan AT, Kelso J. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biol. 2015:16:224. 10.1186/s13059-015-0776-0.
  27. Rohland N et al. Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs. Genome Res. 2022:32:2068–2078. 10.1101/gr.276728.122.
  28. Rohland N, Glocke I, Aximu-Petri A, Meyer M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat Protoc. 2018:13:2447–2461. 10.1038/s41596-018-0050-5.
  29. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016:9:88. 10.1186/s13104-016-1900-2.
  30. Skoglund P et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc Natl Acad Sci U S A. 2014:111:2229–2234. 10.1073/pnas.1318934111.
  31. Sun J et al. Shared paternal ancestry of Han, Tai-Kadai-speaking, and Austronesian-speaking populations as revealed by the high resolution phylogeny of O1a-M119 and distribution of its sub-lineages within China. Am J Phys Anthropol. 2021:174:686–700. 10.1002/ajpa.24240.
  32. Tao L et al. Ancient genomes reveal millet farming-related demic diffusion from the Yellow River into southwest China. Curr Biol. 2023:33:4995–5002.e7. 10.1016/j.cub.2023.09.055.
  33. Wang CC et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021a:591:413–419. 10.1038/s41586-021-03336-2.
  34. Wang J et al. Extensive genetic admixture between Tai-Kadai-speaking people and their neighbours in the northeastern region of the Yungui Plateau inferred from genome-wide variations. BMC Genomics. 2023:24:317. 10.1186/s12864-023-09412-3.
  35. Wang M et al. Massively parallel sequencing of mitogenome sequences reveals the forensic features and maternal diversity of Tai-Kadai-speaking Hlai islanders. Forensic Sci Int Genet. 2020:47:102303. 10.1016/j.fsigen.2020.102303.
  36. Wang T et al. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell. 2021b:184:3829–3841.e21. 10.1016/j.cell.2021.05.018.
  37. Wen B et al. Genetic structure of Hmong–Mien speaking populations in East Asia as revealed by mtDNA lineages. Mol Biol Evol. 2005:22:725–734. 10.1093/molbev/msi055.
  38. Xia Z-Y et al. Inland-coastal bifurcation of southern East Asians revealed by Hmong–Mien genomic history. BioRxiv. 2019. 10.1101/730903.
  39. Xiong J et al. The genomic history of East Asian Middle Neolithic millet- and rice-agricultural populations. Cell Genom. 2025:5:100976. 10.1016/j.xgen.2025.100976.
  40. Yang M et al. Genomic insights into the unique demographic history and genetic structure of five Hmong–Mien-Speaking Miao and Yao populations in Southwest China. Front Ecol Evol. 2022:10:849195. 10.3389/fevo.2022.849195.
  41. Yang MA et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science. 2020:369:282–288. 10.1126/science.aba0909.
  42. Zhang S. Discussion on cliff cave burials in Guangxi and several related issues [in Chinese]. Ethnol Res. 1982a:2:85–118.
  43. Zhang S, Peng S, Zhou S. Survey report on the Lihu cave burials in Nandan County, Guangxi [in Chinese]. Cultural Relics. 1986:11:65–75+105.
  44. Zhang X et al. A matrilineal genetic perspective of hanging coffin custom in Southern China and Northern Thailand. iScience. 2020:23:101032. 10.1016/j.isci.2020.101032.
  45. Zhang Y. A preliminary investigation of cliff cave burials in the Left and Right River areas of Guangxi [in Chinese]. Ethnol Res. 1982b:2:250–260.
  46. Zhou H et al. Exploration of hanging coffin customs and the bo people in China through comparative genomics. Nat Commun. 2025:16:10230. 10.1038/s41467-025-65264-3.
  47. Zhou J, Feng T. Investigation and research on cliff cave burials in the Left and Right River Basin of Guangxi [in Chinese]. Jianghan Archaeol. 1991:3:28–36.
  48. Zhu K et al. Protocol for a comprehensive pipeline to study ancient human genomes. STAR Protoc. 2024a:5:102985. 10.1016/j.xpro.2024.102985.
  49. Zhu K et al. The demic diffusion of Han culture into the Yunnan-Guizhou plateau inferred from ancient genomes. Natl Sci Rev. 2024b:11:nwae387. 10.1093/nsr/nwae387.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press
