Whole-genome variant of 220 Tibetan sheep from the Qinghai-Tibetan Plateau

Zengkui Lu; Chao Yuan; Tingting Guo; Fan Wang; Bowei Chen; Jianbin Liu

doi:10.1038/s41597-025-06360-3

2025 Dec 6;13:61. doi: 10.1038/s41597-025-06360-3

PMCID: PMC12820102 PMID: 41353230

Abstract

The ancient Tibetan sheep breed has been shaped by long-term natural selection and artificial breeding in the Qinghai-Tibet Plateau region. Although their green organic mutton is highly favored by consumers, the low production efficiency of Tibetan sheep has resulted in a persistent supply shortage. Whole-genome sequencing analysis can identify genetic markers and candidate genes associated with important economic traits, which can be used in genomic-assisted breeding to accelerate genetic improvement and increase production efficiency. Here, we report whole-genome sequencing data from 220 Tibetan sheep across 11 populations inhabiting different altitudes, with an average coverage of 6.20X. Over 98.34% of clean reads were successfully mapped to the Tibetan sheep reference genome, identifying approximately 21.10 million high-quality single-nucleotide polymorphisms. This dataset provides a valuable resource for studying the genetic diversity and adaptability of Tibetan sheep, and may accelerate improvements in genetic traits.

Subject terms: Genetic variation, Genetic markers

Background & Summary

As one of the three major primitive sheep breeds in China, Tibetan sheep (Ovis aries) have lived on the Qinghai-Tibet Plateau for thousands of years, uniquely adapted to its high-altitude, cold, and strong ultraviolet conditions. Their excellent survival traits also include coarse feed tolerance, strong disease resistance, and robust foraging ability. The Tibetan sheep breed, shaped by long-term natural selection and artificial breeding, serves as a vital livelihood resource for local farmers and herders, and contributes to the sustainable growth and high-quality development of the pastoral economy.

Sheep, which are among the earliest domesticated animals, have maintained a close relationship with humans, especially nomadic people. Research indicates that Tibetan sheep originated from ancient northern Chinese sheep around 3,100 years ago¹, diverging approximately 2,000‒2,600 years ago². A small group of Tibetan sheep continued to expand southwestward, reaching central Tibet about 1,300 years ago¹, while the remaining populations settled across various regions of Qinghai, gradually adapting to local geographical conditions and evolving into distinct breeds. Statistics show that China’s Tibetan sheep population stands at 32.5 million head, accounting for 11% of the total sheep population^3,4. However, because of environmental constraints, the Tibetan breed has long been trapped in a cycle of low-level development and low-efficiency production. Despite the publication of the Tibetan sheep genome sequence^5,6, most research remains focused on high-altitude adaptability, mutton quality, and nutrient metabolism^7–9. Omics technologies have revealed that significant convergent evolution of the EPAS1 and EGLN1 genes has contributed to the breed’s adaptability to high-altitude environments, along with the identification of additional novel adaptive genes^10–12. Plateau adaptability is a complex, polygenic trait^13–15, with distinct local adaptation mechanisms observed among different Tibetan sheep populations and subtypes inhabiting varying altitudes. Furthermore, the lack of phenotypic and genomic data has hindered efforts to improve key economic traits in Tibetan sheep through genetic improvement. To bridge productivity gaps and accelerate breeding progress, it is imperative that we elucidate the genetic mechanisms underlying the formation of important economic traits in these sheep.

Whole-genome sequencing (WGS) has become a standard tool in livestock genetic breeding research, enabling the detection of genome-wide single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), copy number variations (CNVs), and structural variations (SVs). This approach enables the identification of causal variations related to growth, reproduction, adaptability, and disease resistance. Here, we provide WGS data from 220 Tibetan sheep across 11 populations spanning an altitudinal gradient of 2,887 m to 4,643 m, marking the most comprehensive collection of whole-genome sequences from this breed to date. After aligning the sequencing data with the Tibetan sheep reference genome, a total of 21,099,381 high-quality SNPs were identified. We anticipate that this dataset will play an important role in assessing genetic diversity, gene flow, and regions of positive selection, as well as identifying candidate genes associated with economic traits in the Tibetan sheep population.

Methods

Sample collection

All animal experiments were performed under the guidance of ethical regulations from the Institutional Animal Care and Use Committee of Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences (Approval No. NKMYD201805; Approval Date: 18 October 2018). For this WGS analysis, we selected 11 Tibetan sheep populations that are grazed year-round in different agroclimatic zones, representing the diverse environments of the Qinghai-Tibet Plateau (Table 1). Twenty unrelated adult samples were collected from each population, with 5 mL of blood drawn from the jugular vein before morning grazing and stored in EDTA tubes at −20 °C.

Table 1.

Details of Tibetan sheep populations.

Population	Abbreviation	Number	Province	County	Altitude	Latitude	Longitude
Awang sheep	AW	20	Tibet	Gongjue	4643 m	33°51′31″	101°52′424″
Gangba white sheep	GB	20	Tibet	Gangba	4401 m	28°45′89″	88°61′37″
Gangba black sheep	GBB	20	Tibet	Gangba	4555 m	28°24′66″	88°41′36″
Ganjia sheep	GJ	20	Gansu	Xiahe	3022 m	34°63′72″	102°22′319″
Huoerba sheep	HB	20	Tibet	Zhongba	4614 m	30°12′10″	98°63′098″
Kecai sheep	KC	20	Gansu	Xiahe	3238 m	35°32′49″	102°40′802″
Oula sheep	OL	20	Gansu	Maqu	3501 m	33°51′31″	101°52′424″
Qiaoke sheep	QK	20	Gansu	Luqu	3498 m	35°42′11″	102°42′210″
Tao sheep	TS	20	Gansu	Zhuoni	2887 m	34°64′97″	103°52′386″
Tianjun white Tibetan sheep	WT	20	Qinghai	Tianjun	3331 m	37°28′46″	99°10′188″
Zashijia sheep	ZSJ	20	Qinghai	Qumalai	4269 m	34°14′87″	95°80′422″

DNA extraction and quality control

The blood samples were thawed at room temperature for 30 min, and genomic DNA was extracted using a blood genomic DNA extraction kit (TIANGEN, Beijing, China), in accordance with the manufacturer’s instructions. Agarose gel (1%) electrophoresis was used to detect DNA degradation and contamination in the samples. DNA purity was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, MA, USA), and DNA concentrations were determined using a Qubit® 3.0 Fluorometer (Invitrogen, CA, USA). Qualifying DNA samples were sent to Guangzhou GeneDenovo Biotechnology Co., Ltd. (Guangzhou, China) for WGS.

Library preparation and sequencing

Following the manufacturer’s instructions, sequencing libraries for all samples were generated using the library construction kit from Illumina (CA, USA). Appropriate amounts of DNA were enzymatically fragmented into short segments, end-repaired, and dA-tailed prior to ligation of sequencing adapters. The DNA fragments were purified using AMPure XP beads (Merck, Shanghai, China), and fragments in the range of 300–400 bp were selected for PCR amplification. The size and concentration of the libraries were measured using a Qubit® 3.0 Fluorometer and an Agilent 2100 Bioanalyzer (Agilent, CA, USA). The effective concentration of each library was accurately quantified using the Bio-Rad CFX96 Real-Time PCR Detection System (Bio-Rad, CA, USA). Libraries that passed quality control were sequenced on the HiSeq X10 PE150 platform (Illumina, CA, USA).

Sequence data pre-processing and mapping

Raw image data obtained from sequencing were converted into raw sequencing reads through base calling, and the results were stored in FASTQ file format. The fastp software (v0.23.4) was used for quality control of the sequencing data, including filtering low-quality reads, trimming low-quality bases from the 3’ end, and removing adapter sequences. Additionally, statistics on quality score distribution, GC content, error rate distribution, and N content were generated.

High-quality filtered reads were aligned to the Tibetan sheep reference genome (GCA_017524585.1) using the BWA-MEM algorithm (v0.7.17-r1188). The resulting Binary Alignment Map (BAM) files were sorted using Samtools (v1.17), and PCR duplicate reads were marked using the MarkDuplicates module in the Genome Analysis Toolkit (GATK, v4.5.0.0).
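The pre-processing and mapping steps above can be sketched as a short command builder. This is an illustrative outline only, not the authors' script: the sample name, file names, thread count, and read-group string are hypothetical, and only widely documented options of fastp, BWA-MEM, Samtools, and GATK MarkDuplicates are used. The sketch assembles the commands without running them.

```python
import shlex
from typing import List


def mapping_commands(sample: str, r1: str, r2: str, reference: str,
                     threads: int = 8) -> List[List[str]]:
    """Build argument lists for read cleaning, alignment, sorting, and duplicate marking.

    All output file names are hypothetical placeholders derived from `sample`.
    """
    clean1 = f"{sample}_clean_R1.fq.gz"
    clean2 = f"{sample}_clean_R2.fq.gz"
    sorted_bam = f"{sample}.sorted.bam"
    # Read group passed to BWA; the literal "\t" is expanded by bwa itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Quality filtering and adapter removal with fastp.
        ["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2,
         "-j", f"{sample}.fastp.json", "-h", f"{sample}.fastp.html"],
        # 2. Alignment with BWA-MEM (its stdout is meant to be piped into step 3).
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         reference, clean1, clean2],
        # 3. Coordinate sorting with Samtools, reading SAM from stdin ("-").
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"],
        # 4. Marking PCR duplicates with GATK MarkDuplicates.
        ["gatk", "MarkDuplicates", "-I", sorted_bam,
         "-O", f"{sample}.markdup.bam", "-M", f"{sample}.markdup.metrics.txt"],
    ]


commands = mapping_commands("AW01", "AW01_R1.fq.gz", "AW01_R2.fq.gz", "reference.fa")
for cmd in commands:
    print(shlex.join(cmd))
```

Steps 2 and 3 would normally be joined with a shell pipe so that no intermediate SAM file is written.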

Variant calling, filtering and annotation

As shown in Fig. 1, variant calling and filtering were performed using GATK (v4.5.0.0). For SNP calling, Genomic Variant Call Format (GVCF) files were generated using the HaplotypeCaller module with the "-ERC GVCF" option. To improve scalability and accelerate joint genotyping, GVCF files were consolidated into a GenomicsDB datastore. Next, the GenotypeGVCFs module was used for joint calling to produce population-based Variant Call Format (VCF) files. Biallelic SNPs were obtained using the SelectVariants module with the "-select-type SNP" and "--restrict-alleles-to BIALLELIC" parameters. To reduce false-positive SNPs, we applied the VariantFiltration module with the following quality control parameters: --filter-name "QD_filter" -filter "QD < 2.0"; --filter-name "FS_filter" -filter "FS > 60.0"; --filter-name "MQ_filter" -filter "MQ < 40.0"; --filter-name "SOR_filter" -filter "SOR > 3.0"; --filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5"; --filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"; and --cluster-size 3 --missing-values-evaluate-as-failing. Additional SNP filtering was performed using PLINK (v1.9): SNPs with a call rate < 0.1 or a minor allele frequency (MAF) < 0.01 were removed, and only SNPs on autosomes were retained. The remaining SNPs were annotated based on genomic position using ANNOVAR.
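The SNP selection and hard-filtering steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: file names are hypothetical, the long option names `--select-type-to-include` and `--filter-expression` are used in place of the short forms `-select-type` and `-filter`, and the commands are only assembled, not executed.

```python
from typing import Dict, List

# Hard-filter thresholds as listed in the text (filter name -> expression).
HARD_FILTERS: Dict[str, str] = {
    "QD_filter": "QD < 2.0",
    "FS_filter": "FS > 60.0",
    "MQ_filter": "MQ < 40.0",
    "SOR_filter": "SOR > 3.0",
    "MQRankSum_filter": "MQRankSum < -12.5",
    "ReadPosRankSum_filter": "ReadPosRankSum < -8.0",
}


def select_biallelic_snps_cmd(reference: str, in_vcf: str, out_vcf: str) -> List[str]:
    """GATK SelectVariants call that keeps only biallelic SNPs."""
    return ["gatk", "SelectVariants", "-R", reference, "-V", in_vcf,
            "--select-type-to-include", "SNP",
            "--restrict-alleles-to", "BIALLELIC",
            "-O", out_vcf]


def variant_filtration_cmd(reference: str, in_vcf: str, out_vcf: str) -> List[str]:
    """GATK VariantFiltration call applying every entry of HARD_FILTERS."""
    cmd = ["gatk", "VariantFiltration", "-R", reference, "-V", in_vcf]
    for name, expression in HARD_FILTERS.items():
        # Each filter needs a name and an expression, given as a pair.
        cmd += ["--filter-name", name, "--filter-expression", expression]
    cmd += ["--cluster-size", "3", "--missing-values-evaluate-as-failing",
            "-O", out_vcf]
    return cmd


select_cmd = select_biallelic_snps_cmd("reference.fa", "joint.vcf.gz", "snps.vcf.gz")
filter_cmd = variant_filtration_cmd("reference.fa", "snps.vcf.gz", "snps.flagged.vcf.gz")
print(" ".join(select_cmd))
print(" ".join(filter_cmd))
```

Note that VariantFiltration annotates failing records in the FILTER column rather than deleting them, so a subsequent step that drops flagged records would be needed before the PLINK filters.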

SNP validation

To assess the accuracy of identified SNPs, variants were compared against public datasets from the Database of Single Nucleotide Polymorphisms (dbSNP; https://ftp.ncbi.nih.gov/snp/organisms/archive/sheep_9940/VCF/) and the iSheep database (https://ngdc.cncb.ac.cn/isheep/download). To convert physical coordinates from these public datasets to the reference genome used in this study, we employed the LiftOver tool from the University of California Santa Cruz¹⁶. The overlap between the variants identified in this study and the public datasets was then calculated to determine the proportion of newly discovered SNPs.
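The overlap calculation can be illustrated with a small set-based sketch. It assumes that variants have already been lifted over to a common coordinate system and that a variant is identified by chromosome, position, reference allele, and alternate allele; the matching key actually used in the study is not stated, and the toy variants below are invented for illustration.

```python
from typing import Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def novel_variants(study: Set[Variant], known: Set[Variant]) -> Set[Variant]:
    """Variants present in the study set but absent from the public set."""
    return study - known


def novel_fraction(study: Set[Variant], known: Set[Variant]) -> float:
    """Proportion of study variants that are not found in the public set."""
    if not study:
        raise ValueError("study set is empty")
    return len(novel_variants(study, known)) / len(study)


# Toy data (hypothetical), standing in for the study VCF and dbSNP/iSheep records.
study = {("chr1", 100, "G", "A"), ("chr1", 250, "C", "T"),
         ("chr2", 75, "A", "C"), ("chr3", 10, "T", "G")}
known = {("chr1", 100, "G", "A"), ("chr2", 75, "A", "C"),
         ("chr5", 999, "G", "T")}

print(sorted(novel_variants(study, known)))
print(f"novel fraction: {novel_fraction(study, known):.1%}")
```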

Data Records

Whole-genome sequence data (FASTQ format) from the 220 Tibetan sheep samples representing the 11 populations analyzed herein have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA1138910 (https://www.ncbi.nlm.nih.gov/sra/SRP527227)¹⁷. The final VCF files have been deposited in the European Variation Archive (EVA) under accession number PRJEB100942¹⁸.

Technical Validation

Quality control of sequencing data

Quality control of raw WGS data is the foundation for ensuring accuracy and reliability in downstream analysis¹⁹. For each individual, we obtained 13.78–24.29 Gb of sequenced bases (average: 16.75 Gb), with 85.32%–94.27% (average: 91.86%) of the bases having a Phred-scaled quality score of at least 30 (Q30; Fig. 2, Table 2). The genome coverage ranged from 5.10X to 9.00X (average: 6.20X). Increasing the sequencing depth improves genome coverage and variant detection rates, and high-depth resequencing (30X) is considered the “gold standard”²⁰. However, if funding is limited, sequencing fewer samples at high depth may not provide adequate detection of all genetic variations^19,21. Indeed, a WGS study of pigs found that sequencing depths below 4X resulted in more false-positive variants, indicating that 4X is the lower limit for sequencing quality²². In this study, the average sequencing depth was 6.20X, which is above the 4X threshold. Additionally, with low-depth sequencing, a larger sample size reduces the false-positive rate in variant detection²³; thus, our sample size of 220 individuals was considered sufficient to accurately identify genetic variations in the Tibetan sheep genome. Although high-depth sequencing can yield more information, recent studies suggest that low-depth sequencing is a more effective approach for large sheep populations^24–26.
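The relationship between sequencing yield and depth can be made explicit with a short calculation: mean depth is the number of sequenced bases divided by the genome size. The sketch below assumes a genome size of about 2.70 Gb, a value that is not stated in the text and is used here only because it is consistent with the reported yield and coverage figures.

```python
# Assumed genome size in bases (hypothetical; not given in the text).
ASSUMED_GENOME_SIZE = 2.70e9


def mean_depth(total_bases: float, genome_size: float = ASSUMED_GENOME_SIZE) -> float:
    """Average sequencing depth (X) = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Minimum, average, and maximum per-individual yields reported in the text (in bases).
for label, bases in [("min", 13.78e9), ("mean", 16.75e9), ("max", 24.29e9)]:
    print(f"{label}: {mean_depth(bases):.2f}X")
```

Under this assumption the three yields correspond to the 5.10X, 6.20X, and 9.00X coverage values given above.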

Fig. 2 — Boxplots showing the average raw base (A), raw Q30 (B) and sequencing depth (C) for Tibetan sheep samples.

Table 2.

Summary of whole genome resequencing data.

Population	Raw reads	Clean reads	Clean data rate(%)	Clean Q20 rate(%)	Clean Q30 rate(%)	Clean GC content(%)	Mapped Reads	Mapped Rate(%)
AW	95,701,170-137,835,824	94,678,924-136,045,192	98.43-99.08	95.36-97.74	87.83-93.42	43.95-44.84	94,261,765-135,881,157	99.56-99.96
GB	94,298,058-119,696,088	93,163,830-118,156,734	98.26-99.13	96.95-97.69	91.55-93.18	43.54-44.79	93,117,006-118,097,420	98.34-99.96
GBB	94,669,372-161,355,650	93,637,276-159,027,818	98.36-98.98	95.19-97.70	87.41-93.27	44.47-45.57	93,487,334-158,823,568	99.77-99.95
GJ	98,599,898-161,943,674	97,646,642-160,075,082	98.31-99.18	94.25-97.55	85.87-92.92	41.83-45.41	97,593,705-159,926,799	99.78-99.97
HB	93,872,024-122,164,806	92,772,578-120,126,382	98.11-98.99	95.67-97.71	88.39-93.35	44.34-45.45	92,734,799-120,008,442	99.83-99.96
KC	94,082,272-145,093,246	92,906,510-143,021,248	98.19-99.15	94.05-97.41	85.32-92.50	44.46-45.24	92,807,922-142,810,713	99.71-99.93
OL	93,811,744-142,172,432	92,911,336-140,093,662	97.95-99.04	95.86-97.52	88.76-92.93	44.67-45.69	92,861,844-139,844,041	99.82-99.95
QK	93,931,324-125,795,418	92,399,782-124,477,264	98.26-98.95	96.70-97.72	90.91-93.37	43.45-45.29	92,242,120-124,169,989	99.48-99.88
TS	94,844,408-145,690,888	93,843,280-143,920,534	98.09-99.12	96.76-97.79	91.27-93.45	43.71-45.35	93,732,435-143,830,294	98.63-99.94
WT	94,262,028-127,932,770	93,349,410-126,557,328	98.36-99.04	96.24-98.13	90.21-94.27	43.65-44.91	93,073,816-126,432,906	99.60-99.94
ZSJ	91,865,506-119,984,184	90,788,858-118,826,332	98.64-99.23	94.35-97.50	85.77-92.90	43.70-45.40	90,325,988-118,588,175	99.30-99.89

Fastp enables rapid data preprocessing and quality control of high-throughput sequencing data²⁷. As shown in Fig. 3, MultiQC was used to integrate fastp results and generate quality reports. The duplicate read rate serves as an indicator of the quality of sequencing data, with lower rates indicating better data quality. In this study, the average duplicate read rate was 22.84% and the average unique read rate was 77.16% (Fig. 3A). However, distinct peaks were observed in certain regions of some individual samples, possibly attributable to PCR over-amplification during sequencing²⁸. Figure 3B shows the average quality score for each base position, which was maintained at ~35, indicating very high sequencing quality. Similarly, the per-sequence quality score was consistently ~35, further demonstrating the high quality of the sequencing data (Fig. 3C). The GC content across all samples showed a stable distribution (average: 44.60%, Fig. 3D), suggesting no exogenous genome contamination during sequencing²⁹.

Quality control of SNP data

Using the HaplotypeCaller function in GATK³⁰, a total of 235,803,940 raw SNPs were identified in the 11 Tibetan sheep populations. To exclude low-quality SNPs, we used the VariantFiltration function in GATK, which left 213,867,729 SNPs. Finally, SNPs with a minor allele frequency (MAF) < 0.05 or a missing rate > 10% were removed, leaving 21,099,381 SNPs for subsequent analyses. As shown in Fig. 4, we identified ~4.58 million SNPs (21.704%) that have not been reported previously in dbSNP (https://ftp.ncbi.nih.gov/snp/organisms/archive/sheep_9940/VCF/) or the iSheep database (https://ngdc.cncb.ac.cn/isheep/download), which could be due to prior underrepresentation of the sheep breeds studied here. Analysis of SNP counts by mutation type across populations showed the G:C → A:T mutation to be the most prevalent (Fig. 5).

Fig. 4 — Venn diagrams for novel variants detected in 11 Tibetan sheep populations.

Fig. 5 — Statistics for the SNP number of different mutation types.

Variant detection accuracy can be assessed by the ratio of transitions (Ti) to transversions (Tv)^31,32. In the absence of selection pressure, Ti/Tv is expected to be 0.5; however, this is rarely observed. A typical Ti/Tv ratio for whole-genome analysis is ~2.0–2.1, while novel variants generally show a ratio of ~1.5. In this study, the observed Ti/Tv ratio was 2.56, suggesting high SNP calling accuracy; ratios exceeding 4 may indicate artifacts³³.
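The Ti/Tv ratio and its random-mutation expectation of 0.5 can be illustrated with a brief sketch. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes, and every other substitution is a transversion. The function names are illustrative and do not correspond to any tool used in the study.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True if ref -> alt stays within purines or within pyrimidines."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) SNP pairs."""
    ti = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ti / tv


# If all 12 possible substitutions were equally likely, 4 would be transitions
# and 8 transversions, which gives the 0.5 expectation mentioned in the text.
all_substitutions = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(ti_tv_ratio(all_substitutions))
```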

In the quality control of genomic data, the heterozygous-to-homozygous (Het/Hom) ratio is used to assess the genetic diversity of samples. Under the assumption of Hardy-Weinberg equilibrium, the expected Het/Hom ratio in human genomic data is 2.0²⁹. In this study, the Het/Hom ratio was 0.92, possibly reflecting inbreeding in Tibetan sheep, which can increase the likelihood of homozygosity. Additionally, a genomic evaluation of inbreeding coefficients in the 11 populations revealed severe inbreeding in some³⁴, further demonstrating the high sequence quality in this study.

The SNP density can reflect both the genetic diversity of samples and the distribution of variations in the genome. In this study, there was an average of one SNP every 125.62 bp. SNPs were distributed relatively uniformly across the autosomes (107.48–138.88 bp per SNP), whereas the X chromosome showed the sparsest distribution (232.33 bp per SNP; Table 3). To understand the functions of these SNPs, annotation was performed using the ANNOVAR software³⁵. Most SNPs were found to be distributed in intronic and intergenic regions (Table 4).
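The density values reported in Table 3 are expressed as bp per SNP, that is, chromosome size divided by SNP count, so a larger value means sparser SNPs. A minimal sketch using three rows taken from Table 3 reproduces the calculation; the variable names are illustrative.

```python
from typing import Dict, Tuple

# (chromosome size in bp, SNP count) for three rows of Table 3.
CHROMOSOMES: Dict[str, Tuple[int, int]] = {
    "chr1": (280_644_136, 2_287_570),
    "chr25": (45_418_129, 422_578),
    "chrX": (144_257_616, 620_923),
}


def bp_per_snp(size_bp: int, snp_count: int) -> float:
    """Average spacing between SNPs; larger values mean lower SNP density."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return size_bp / snp_count


spacing = {name: bp_per_snp(size, count) for name, (size, count) in CHROMOSOMES.items()}
sparsest = max(spacing, key=spacing.get)

for name, value in spacing.items():
    print(f"{name}: {value:.2f} bp/SNP")
print("sparsest:", sparsest)
```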

Table 3.

Summary statistics of SNPs in each chromosome.

No	Chromosome ID in GenBank	Chrom	Size (bp)	SNP number	SNP density (bp/SNP)
1	CM029819.1	chr1	280,644,136	2,287,570	122.68
2	CM029820.1	chr2	252,362,713	1,978,758	127.54
3	CM029821.1	chr3	226,826,216	1,757,294	129.08
4	CM029822.1	chr4	121,514,229	992,298	122.46
5	CM029823.1	chr5	108,580,084	858,786	126.43
6	CM029824.1	chr6	118,786,809	1,038,525	114.38
7	CM029825.1	chr7	105,184,753	807,578	130.25
8	CM029826.1	chr8	91,237,811	745,722	122.35
9	CM029827.1	chr9	95,403,819	828,421	115.16
10	CM029828.1	chr10	91,562,662	779,322	117.49
11	CM029829.1	chr11	67,716,071	487,579	138.88
12	CM029830.1	chr12	81,071,312	657,600	123.28
13	CM029831.1	chr13	83,580,535	627,802	133.13
14	CM029832.1	chr14	67,387,920	530,405	127.05
15	CM029833.1	chr15	83,093,940	711,143	116.85
16	CM029834.1	chr16	72,105,115	623,418	115.66
17	CM029835.1	chr17	73,523,957	625,458	117.55
18	CM029836.1	chr18	70,760,168	579,921	122.02
19	CM029837.1	chr19	60,600,000	471,256	128.59
20	CM029838.1	chr20	52,181,980	458,225	113.88
21	CM029839.1	chr21	51,086,750	442,809	115.37
22	CM029840.1	chr22	51,925,645	443,008	117.21
23	CM029841.1	chr23	62,764,803	544,019	115.37
24	CM029842.1	chr24	45,006,986	357,790	125.79
25	CM029843.1	chr25	45,418,129	422,578	107.48
26	CM029844.1	chr26	45,297,346	399,013	113.52
27	CM029845.1	chrX	144,257,616	620,923	232.33

Table 4.

Annotation result for SNPs in the Tibetan sheep populations.

Category	AW	GB	GBB	GJ	HB	KC	OL	QK	TS	WT	ZSJ
exonic-nonsynonymous	41,250	42,044	41,674	42,614	42,354	42,955	42,439	41,065	41,376	43,687	42,013
exonic-synonymous	71,622	72,846	72,247	73,496	73,507	74,274	73,649	71,312	71,959	75,324	72,808
exonic-startloss/stopgain/loss	1,038	1,062	1,039	1,088	1,077	1,072	1,070	1,037	1,028	1,096	1,062
exonic-unknown	1,269	1,290	1,280	1,303	1,301	1,298	1,282	1,230	1,272	1,344	1,277
intronic	6,122,920	6,293,405	6,208,376	6,332,554	6,295,737	6,357,533	6,323,299	6,124,934	6,234,045	6,473,080	6,242,532
splicing	483	477	482	492	491	498	485	470	481	504	487
UTR3/UTR5	2	2	2	2	2	2	2	2	2	2	2
up/downstream	201,370	205,852	203,485	208,194	207,676	208,614	207,457	201,088	203,951	212,304	204,694
intergenic	11,917,492	12,231,525	12,092,829	12,350,508	12,227,833	12,318,005	12,310,123	11,936,580	12,154,590	12,572,495	12,156,930
Total SNPs	18,357,446	18,848,503	18,621,414	19,010,251	18,849,978	19,004,251	18,959,806	18,377,718	18,708,704	19,379,836	18,721,805

Polymorphism information content (PIC) is an indicator used to measure the polymorphism of genetic markers, reflecting the diversity of alleles at a locus. Our analysis showed the highest PIC value in the Tao sheep (TS) group and the lowest in the Zashijia sheep (ZSJ) group (Fig. 6A). Nucleotide diversity (π) is an indicator used to measure the degree of nucleotide variation, reflecting the average number of nucleotide substitutions in a population. In our analysis, the Tianjun white Tibetan sheep (WT) group had the highest π value, while the Zashijia sheep (ZSJ) group had the lowest (Fig. 6B).
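For a biallelic SNP, both statistics can be written in terms of the allele frequency. The sketch below uses the standard definitions, PIC = 1 − (p² + q²) − 2p²q² and per-site π = [n/(n − 1)]·2pq for n sampled chromosomes; the text does not state which software or exact estimator was used, so these formulas are an assumption made for illustration.

```python
def pic_biallelic(p: float) -> float:
    """Polymorphism information content of a biallelic locus with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def site_pi(p: float, n_chromosomes: int) -> float:
    """Per-site nucleotide diversity with the n/(n-1) small-sample correction."""
    if n_chromosomes < 2:
        raise ValueError("at least two chromosomes are required")
    q = 1.0 - p
    return n_chromosomes / (n_chromosomes - 1) * 2.0 * p * q


# Example: 20 diploid animals per population give 40 sampled chromosomes.
for freq in (0.05, 0.25, 0.50):
    print(f"p={freq:.2f}  PIC={pic_biallelic(freq):.4f}  pi={site_pi(freq, 40):.4f}")
```

Both quantities peak at p = 0.5 and fall to zero at fixation, which is why populations with more intermediate-frequency alleles score higher on both measures.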

Fig. 6 — Estimation of genomic PIC (A) and π (B) based on SNPs of 11 Tibetan sheep populations. Each bar represents a Tibetan sheep population, and the data are presented as mean ± standard deviation.

Acknowledgements

This work was supported by the Gansu Provincial Science and Technology Plan (25JRRA453), the Central Public-interest Scientific Institution Basal Research Fund (1610322024012), the Innovation Project of Chinese Academy of Agricultural Sciences (25-LZIHPS-07), the Key R&D Program in Gansu Province (24YFNA022), and the Modern China Wool Cashmere Technology Research System (CARS-39-02).

Author contributions

Z.L. and J.L. conceived this study. Z.L. and C.Y. collected the samples and performed the experiments; Z.L., C.Y., T.G., F.W. and B.C. performed the research and analyzed the data. Z.L. drafted the manuscript. All authors have read and approved the final manuscript.

Data availability

Sequencing data was uploaded to SRA under accession number SRP527227 (https://www.ncbi.nlm.nih.gov/sra/SRP527227). The identified variants data were deposited in the EVA under accession number PRJEB100942.

Code availability

The list of the software and parameters used in this study is available through GitHub (https://github.com/luzengkui/sheep_NGS_01).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Chao Yuan, Email: yuanchao@caas.cn.

Jianbin Liu, Email: liujianbin@caas.cn.

References

1. Hu, X. J. et al. The genome landscape of Tibetan sheep reveals adaptive introgression from argali and the history of early human settlements on the Qinghai-Tibetan Plateau. Mol Biol Evol. 36, 283–303 (2019).
2. Zhao, Y. X. et al. Genomic reconstruction of the history of native sheep reveals the peopling patterns of nomads and the expansion of early pastoralism in East Asia. Mol Biol Evol. 34, 2380–2395 (2017).
3. China National Commission of Animal Genetic Resources. Animal genetic resources in China (sheep and goats). (China Agriculture Press, 2011).
4. Li, X. et al. Whole-genome resequencing to investigate the genetic diversity and mechanisms of plateau adaptation in Tibetan sheep. J Anim Sci Biotechnol. 15, 164 (2024).
5. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_017524585.1 (2021).
6. Lu, Z. et al. Chromosome-level genome assembly of Guide Black-Fur sheep (Ovis aries). Sci Data. 11, 711 (2024).
7. Han, B. et al. Multiomics analyses provide new insight into genetic variation of reproductive adaptability in Tibetan sheep. Mol Biol Evol. 41, msae058 (2024).
8. Xu, X. et al. Insight into the differences in meat quality among three breeds of sheep on the Qinghai-Tibetan plateau from the perspective of metabolomics and rumen microbiota. Food Chem X. 23, 101731 (2024).
9. Zhang, X. et al. Effects of different feeding regimes on muscle metabolism and its association with meat quality of Tibetan sheep. Food Chem. 374, 131611 (2022).
10. Wu, D. et al. Convergent genomic signatures of high-altitude adaptation among domestic mammals. Natl Sci Rev. 7, 952–963 (2020).
11. Jin, M. et al. Whole-genome resequencing of Chinese indigenous sheep provides insight into the genetic basis underlying climate adaptation. Genet Sel Evol. 56, 26 (2024).
12. Li, C. et al. Multi-omic analyses shed light on the genetic control of high-altitude adaptation in sheep. Genomics Proteomics Bioinformatics 22, qzae030 (2024).
13. Wen, J. et al. Ancestral origins and post-admixture adaptive evolution of highland Tajiks. Natl Sci Rev. 11, nwae284 (2024).
14. Ping, J. et al. A highland-adaptation variant near MCUR1 reduces its transcription and attenuates erythrogenesis in Tibetans. Cell Genom. 5, 100782 (2025).
15. Ferraretti, G. et al. Convergent evolution of complex adaptive traits modulates angiogenesis in high-altitude Andean and Himalayan human populations. Commun Biol. 8, 377 (2025).
16. Kuhn, R. M. et al. The UCSC genome browser database: update 2007. Nucleic Acids Res. 35, D668–673 (2007).
17. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP527227 (2024).
18. ENA European Variation Archive https://identifiers.org/ena.embl:ERP182370 (2025).
19. Pfeifer, S. P. From next-generation resequencing reads to a high-quality variant data set. Heredity 118, 111–124 (2017).
20. Sims, D. et al. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 15, 121–132 (2014).
21. Ayalew, W. et al. Whole genome sequences of 70 indigenous Ethiopian cattle. Sci Data. 11, 584 (2024).
22. Jiang, Y. et al. Optimal sequencing depth design for whole genome re-sequencing in pigs. BMC Bioinformatics 20, 556 (2019).
23. Le, S. Q. & Durbin, R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 21, 952–960 (2011).
24. Zhao, L. et al. Whole-genome resequencing of Hu sheep identifies candidate genes associated with agronomic traits. J Genet Genomics. 51, 866–876 (2024).
25. Zhao, F. et al. Analysis of 206 whole-genome resequencing reveals selection signatures associated with breed-specific traits in Hu sheep. Evol Appl. 17, e13697 (2024).
26. Jin, M. et al. Genomic insights into the population history of fat-tailed sheep and identification of two mutations that contribute to fat tail adipogenesis. J Adv Res. S2090-1232, 00304–2 (2025).
27. Chen, S. et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
28. Belay, S. et al. Whole-genome resource sequences of 57 indigenous Ethiopian goats. Sci Data. 11, 139 (2024).
29. Guo, Y. et al. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform. 15, 879–889 (2014).
30. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43, 491–498 (2011).
31. Gheyas, A. et al. Whole genome sequences of 234 indigenous African chickens from Ethiopia. Sci Data. 9, 53 (2022).
32. Rabbani, M. A. G. et al. Whole genome sequencing of three native chicken varieties (Common Deshi, Hilly and Naked Neck) of Bangladesh. Sci Data. 11, 1432 (2024).
33. Wang, J. et al. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 31, 318–323 (2015).
34. Sun, L. et al. The accumulation of harmful genes within the ROH hotspot regions of the Tibetan sheep genome does not lead to genetic load. BMC Genomics 26, 60 (2025).
35. Wang, K. et al. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

NCBI Sequence Read Archivehttps://identifiers.org/ncbi/insdc.sra:SRP527227 (2024).

Data Availability Statement

The list of the software and parameters used in this study is available through GitHub (https://github.com/luzengkui/sheep_NGS_01).

[CR1] 1.Hu, X. J. et al. The genome landscape of Tibetan sheep reveals adaptive introgression from argali and the history of early human settlements on the Qinghai-Tibetan Plateau. Mol Biol Evol.36, 283–303 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Zhao, Y. X. et al. Genomic reconstruction of the history of native sheep reveals the peopling patterns of nomads and the expansion of early pastoralism in East Asia. Mol Biol Evol.34, 2380–2395 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.China National Commission of Animal Genetic Resources. Animal genetic resources in China (sheep and goats). (China Agriculture Press, 2011).

[CR4] 4.Li, X. et al. Whole-genome resequencing to investigate the genetic diversity and mechanisms of plateau adaptation in Tibetan sheep. J Anim Sci Biotechnol.15, 164 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.NCBI GenBankhttps://identifiers.org/ncbi/insdc.gca:GCA_017524585.1 (2021).

[CR6] 6.Lu, Z. et al. Chromosome-level genome assembly of Guide Black-Fur sheep (Ovis aries). Sci Data.11, 711 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Han, B. et al. Multiomics analyses provide new insight into genetic variation of reproductive adaptability in Tibetan sheep. Mol Biol Evol.41, msae058 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Xu, X. et al. Insight into the differences in meat quality among three breeds of sheep on the Qinghai-Tibetan plateau from the perspective of metabolomics and rumen microbiota. Food Chem X.23, 101731 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Zhang, X. et al. Effects of different feeding regimes on muscle metabolism and its association with meat quality of Tibetan sheep. Food Chem.374, 131611 (2022). [DOI] [PubMed] [Google Scholar]

[CR10] 10.Wu, D. et al. Convergent genomic signatures of high-altitude adaptation among domestic mammals. Natl Sci Rev.7, 952–963 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Jin, M. et al. Whole-genome resequencing of Chinese indigenous sheep provides insight into the genetic basis underlying climate adaptation. Genet Sel Evol.56, 26 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12. Li, C. et al. Multi-omic analyses shed light on the genetic control of high-altitude adaptation in sheep. Genomics Proteomics Bioinformatics 22, qzae030 (2024).

[CR13] 13. Wen, J. et al. Ancestral origins and post-admixture adaptive evolution of highland Tajiks. Natl Sci Rev. 11, nwae284 (2024).

[CR14] 14. Ping, J. et al. A highland-adaptation variant near MCUR1 reduces its transcription and attenuates erythrogenesis in Tibetans. Cell Genom. 5, 100782 (2025).

[CR15] 15. Ferraretti, G. et al. Convergent evolution of complex adaptive traits modulates angiogenesis in high-altitude Andean and Himalayan human populations. Commun Biol. 8, 377 (2025).

[CR16] 16. Kuhn, R. M. et al. The UCSC genome browser database: update 2007. Nucleic Acids Res. 35, D668–673 (2007).

[CR17] 17. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP527227 (2024).

[CR18] 18. ENA European Variation Archive https://identifiers.org/ena.embl:ERP182370 (2025).

[CR19] 19. Pfeifer, S. P. From next-generation resequencing reads to a high-quality variant data set. Heredity 118, 111–124 (2017).

[CR20] 20. Sims, D. et al. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 15, 121–132 (2014).

[CR21] 21. Ayalew, W. et al. Whole genome sequences of 70 indigenous Ethiopian cattle. Sci Data. 11, 584 (2024).

[CR22] 22. Jiang, Y. et al. Optimal sequencing depth design for whole genome re-sequencing in pigs. BMC Bioinformatics 20, 556 (2019).

[CR23] 23. Le, S. Q. & Durbin, R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 21, 952–960 (2011).

[CR24] 24. Zhao, L. et al. Whole-genome resequencing of Hu sheep identifies candidate genes associated with agronomic traits. J Genet Genomics. 51, 866–876 (2024).

[CR25] 25. Zhao, F. et al. Analysis of 206 whole-genome resequencing reveals selection signatures associated with breed-specific traits in Hu sheep. Evol Appl. 17, e13697 (2024).

[CR26] 26. Jin, M. et al. Genomic insights into the population history of fat-tailed sheep and identification of two mutations that contribute to fat tail adipogenesis. J Adv Res. S2090-1232, 00304–2 (2025).

[CR27] 27. Chen, S. et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

[CR28] 28. Belay, S. et al. Whole-genome resource sequences of 57 indigenous Ethiopian goats. Sci Data. 11, 139 (2024).

[CR29] 29. Guo, Y. et al. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform. 15, 879–889 (2014).

[CR30] 30. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43, 491–498 (2011).

[CR31] 31. Gheyas, A. et al. Whole genome sequences of 234 indigenous African chickens from Ethiopia. Sci Data. 9, 53 (2022).

[CR32] 32. Rabbani, M. A. G. et al. Whole genome sequencing of three native chicken varieties (Common Deshi, Hilly and Naked Neck) of Bangladesh. Sci Data. 11, 1432 (2024).

[CR33] 33. Wang, J. et al. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 31, 318–323 (2015).

[CR34] 34. Sun, L. et al. The accumulation of harmful genes within the ROH hotspot regions of the Tibetan sheep genome does not lead to genetic load. BMC Genomics 26, 60 (2025).

[CR35] 35. Wang, K. et al. Annovar: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
