Scientific Data. 2025 Dec 6;13:61. doi: 10.1038/s41597-025-06360-3

Whole-genome variant of 220 Tibetan sheep from the Qinghai-Tibetan Plateau

Zengkui Lu 1,2, Chao Yuan 1,2, Tingting Guo 1,2, Fan Wang 3, Bowei Chen 1,2, Jianbin Liu 1,2
PMCID: PMC12820102  PMID: 41353230

Abstract

The ancient Tibetan sheep breed has been shaped by long-term natural selection and artificial breeding on the Qinghai-Tibet Plateau. Although its green, organic mutton is highly favored by consumers, the low production efficiency of Tibetan sheep has resulted in a persistent supply shortage. Whole-genome sequencing can identify genetic markers and candidate genes associated with important economic traits, which can then be used in genomics-assisted breeding to accelerate genetic improvement and increase production efficiency. Here, we report whole-genome sequencing data from 220 Tibetan sheep across 11 populations inhabiting different altitudes, with an average coverage of 6.20X. At least 98.34% of clean reads in each sample were mapped to the Tibetan sheep reference genome, and approximately 21.10 million high-quality single-nucleotide polymorphisms were identified. This dataset provides a valuable resource for studying the genetic diversity and adaptability of Tibetan sheep and may help accelerate their genetic improvement.

Subject terms: Genetic variation, Genetic markers

Background & Summary

As one of the three major primitive sheep breeds in China, Tibetan sheep (Ovis aries) have lived on the Qinghai-Tibet Plateau for thousands of years and are uniquely adapted to its high-altitude, cold environment and intense ultraviolet radiation. Their survival traits also include tolerance of coarse feed, strong disease resistance, and robust foraging ability. Shaped by long-term natural selection and artificial breeding, the breed is a vital livelihood resource for local farmers and herders and contributes to the sustainable, high-quality development of the pastoral economy.

Sheep, among the earliest domesticated animals, have maintained a close relationship with humans, especially nomadic peoples. Research indicates that Tibetan sheep originated from ancient northern Chinese sheep around 3,100 years ago1, diverging approximately 2,000–2,600 years ago2. A small group of Tibetan sheep continued to expand southwestward, reaching central Tibet about 1,300 years ago1, while the remaining populations settled across various regions of Qinghai, gradually adapting to local geographical conditions and evolving into distinct breeds. Statistics show that China’s Tibetan sheep population stands at 32.5 million head, accounting for 11% of the total sheep population3,4. However, because of environmental constraints, Tibetan sheep production has long been trapped in a cycle of low-level development and low efficiency. Although the Tibetan sheep genome sequence has been published5,6, most research remains focused on high-altitude adaptability, mutton quality, and nutrient metabolism7–9. Omics studies have revealed significant convergent evolution of the EPAS1 and EGLN1 genes that contributes to the breed’s adaptation to high-altitude environments, and have identified additional novel adaptive genes10–12. Plateau adaptability is a complex, polygenic trait13–15, and distinct local adaptation mechanisms are observed among Tibetan sheep populations and subtypes inhabiting different altitudes. Furthermore, the lack of phenotypic and genomic data has hindered efforts to improve key economic traits in Tibetan sheep. To bridge productivity gaps and accelerate breeding progress, it is imperative to elucidate the genetic mechanisms underlying important economic traits in these sheep.

Whole-genome sequencing (WGS) has become a standard tool in livestock genetics and breeding research. It enables genome-wide detection of single-nucleotide polymorphisms (SNPs), insertions and deletions (InDels), copy number variations (CNVs), and structural variations (SVs), and thereby the identification of causal variants related to growth, reproduction, adaptability, and disease resistance. Here, we provide WGS data from 220 Tibetan sheep across 11 populations spanning an altitudinal gradient from 2,887 m to 4,643 m, the most comprehensive collection of whole-genome sequences from this breed to date. After aligning the sequencing data to the Tibetan sheep reference genome, we identified a total of 21,099,381 high-quality SNPs. We anticipate that this dataset will be valuable for assessing genetic diversity, gene flow, and regions under positive selection, and for identifying candidate genes associated with economic traits in Tibetan sheep.

Methods

Sample collection

All animal experiments were performed in accordance with the ethical regulations of the Institutional Animal Care and Use Committee of the Lanzhou Institute of Husbandry and Pharmaceutical Science, Chinese Academy of Agricultural Sciences (Approval No. NKMYD201805; Approval Date: 18 October 2018). For this WGS analysis, we selected 11 Tibetan sheep populations that are grazed year-round in different agroclimatic zones, representing the diverse environments of the Qinghai-Tibet Plateau (Table 1). Twenty unrelated adults were sampled from each population: 5 mL of blood was drawn from the jugular vein before morning grazing, collected in EDTA tubes, and stored at −20 °C.

Table 1.

Details of Tibetan sheep populations.

Population | Abbreviation | Sample size | Province | County | Altitude (m) | Latitude | Longitude
Awang sheep | AW | 20 | Tibet | Gongjue | 4643 | 33°51′31″ | 101°52′424″
Gangba white sheep | GB | 20 | Tibet | Gangba | 4401 | 28°45′89″ | 88°61′37″
Gangba black sheep | GBB | 20 | Tibet | Gangba | 4555 | 28°24′66″ | 88°41′36″
Ganjia sheep | GJ | 20 | Gansu | Xiahe | 3022 | 34°63′72″ | 102°22′319″
Huoerba sheep | HB | 20 | Tibet | Zhongba | 4614 | 30°12′10″ | 98°63′098″
Kecai sheep | KC | 20 | Gansu | Xiahe | 3238 | 35°32′49″ | 102°40′802″
Oula sheep | OL | 20 | Gansu | Maqu | 3501 | 33°51′31″ | 101°52′424″
Qiaoke sheep | QK | 20 | Gansu | Luqu | 3498 | 35°42′11″ | 102°42′210″
Tao sheep | TS | 20 | Gansu | Zhuoni | 2887 | 34°64′97″ | 103°52′386″
Tianjun white Tibetan sheep | WT | 20 | Qinghai | Tianjun | 3331 | 37°28′46″ | 99°10′188″
Zashijia sheep | ZSJ | 20 | Qinghai | Qumalai | 4269 | 34°14′87″ | 95°80′422″

DNA extraction and quality control

The blood samples were thawed at room temperature for 30 min, and genomic DNA was extracted using a blood genomic DNA extraction kit (TIANGEN, Beijing, China), in accordance with the manufacturer’s instructions. Agarose gel (1%) electrophoresis was used to detect DNA degradation and contamination in the samples. DNA purity was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, MA, USA), and DNA concentrations were determined using a Qubit® 3.0 Fluorometer (Invitrogen, CA, USA). Qualifying DNA samples were sent to Guangzhou GeneDenovo Biotechnology Co., Ltd. (Guangzhou, China) for WGS.

Library preparation and sequencing

Following the manufacturer’s instructions, sequencing libraries for all samples were constructed using an Illumina library construction kit (Illumina, CA, USA). Appropriate amounts of DNA were enzymatically fragmented, end-repaired, and dA-tailed prior to ligation of sequencing adapters. The DNA fragments were purified using AMPure XP beads (Merck, Shanghai, China), and fragments of 300–400 bp were selected for PCR amplification. Library size and concentration were measured using a Qubit® 3.0 Fluorometer and an Agilent 2100 Bioanalyzer (Agilent, CA, USA), and the effective concentration of each library was accurately quantified using a Bio-Rad CFX96 Real-Time PCR Detection System (Bio-Rad, CA, USA). Libraries that passed quality control were sequenced on an Illumina HiSeq X Ten platform with 150 bp paired-end reads (PE150; Illumina, CA, USA).

Sequence data pre-processing and mapping

Raw image data obtained from sequencing were converted into raw sequencing reads through base calling, and the results were stored in FASTQ file format. The fastp software (v0.23.4) was used for quality control of the sequencing data, including filtering low-quality reads, trimming low-quality bases from the 3’ end, and removing adapter sequences. Additionally, statistics on quality score distribution, GC content, error rate distribution, and N content were generated.
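The Q20/Q30 rates and GC content that fastp reports are simple fractions over all bases. The sketch below illustrates only that arithmetic; it is not fastp’s implementation, and it assumes uncompressed, well-formed four-line FASTQ records with Phred+33 quality encoding (the encoding used by current Illumina instruments). The helper names `parse_fastq` and `qc_summary` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed 4-line FASTQ input."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield FastqRecord(header.strip()[1:], seq, qual)


def qc_summary(records: Iterable[FastqRecord], offset: int = 33) -> dict:
    """Return total bases plus the Q20, Q30 and GC fractions over all bases."""
    total = q20 = q30 = gc = 0
    for rec in records:
        for base, q_char in zip(rec.seq, rec.qual):
            q = ord(q_char) - offset  # decode the Phred quality score
            total += 1
            q20 += int(q >= 20)
            q30 += int(q >= 30)
            gc += int(base in "GCgc")
    if total == 0:
        return {"bases": 0, "q20": 0.0, "q30": 0.0, "gc": 0.0}
    return {"bases": total, "q20": q20 / total, "q30": q30 / total, "gc": gc / total}
```

With a real file, `qc_summary(parse_fastq(open("sample_1.fq")))` would stream records; the file name is a placeholder.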

High-quality filtered reads were aligned to the Tibetan sheep reference genome (GCA_017524585.1) using the BWA-MEM algorithm (v0.7.17-r1188). The resulting Binary Alignment Map (BAM) files were sorted using Samtools (v1.17), and PCR duplicate reads were marked using the MarkDuplicates module in the Genome Analysis Toolkit (GATK, v4.5.0.0).
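A per-sample mapping step of this kind can be scripted as below. This is a hedged sketch: the file names, thread count and read-group fields are hypothetical placeholders, and the authors’ actual parameters are those listed in their GitHub repository. The function only builds the command strings; running them would require bwa, samtools and GATK on the PATH. A read group with an SM tag is set because downstream GATK tools such as HaplotypeCaller identify samples by it.

```python
import shlex


def mapping_commands(sample: str, ref: str, fq1: str, fq2: str, threads: int = 8) -> list:
    """Build shell commands for BWA-MEM alignment, sorting and duplicate marking.

    All paths are hypothetical; adapt them to the local file layout.
    """
    # bwa expands the literal "\t" escapes in the read-group string itself.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    align = (
        f"bwa mem -t {threads} -R {shlex.quote(read_group)} "
        f"{shlex.quote(ref)} {shlex.quote(fq1)} {shlex.quote(fq2)} "
        f"| samtools sort -@ {threads} -o {sorted_bam} -"
    )
    # Mark (not remove) PCR duplicates and write a metrics report.
    markdup = (
        f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam} "
        f"-M {sample}.dup_metrics.txt"
    )
    index = f"samtools index {dedup_bam}"
    return [align, markdup, index]


if __name__ == "__main__":
    for cmd in mapping_commands("AW01", "ref.fa", "AW01_1.fq.gz", "AW01_2.fq.gz"):
        print(cmd)
```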

Variant calling, filtering and annotation

As shown in Fig. 1, variant calling and filtering were performed using GATK (v4.5.0.0). For SNP calling, per-sample Genomic Variant Call Format (GVCF) files were generated using the HaplotypeCaller module with the “-ERC GVCF” option. To improve scalability and accelerate joint genotyping, the GVCF files were consolidated into a GenomicsDB datastore, and the GenotypeGVCFs module was then used for joint genotyping to produce population-level Variant Call Format (VCF) files. Biallelic SNPs were extracted using the SelectVariants module with the “-select-type SNP” and “--restrict-alleles-to BIALLELIC” parameters. To reduce false-positive SNPs, we applied the VariantFiltration module with the following hard filters: --filter-name “QD_filter” -filter “QD < 2.0”; --filter-name “FS_filter” -filter “FS > 60.0”; --filter-name “MQ_filter” -filter “MQ < 40.0”; --filter-name “SOR_filter” -filter “SOR > 3.0”; --filter-name “MQRankSum_filter” -filter “MQRankSum < -12.5”; and --filter-name “ReadPosRankSum_filter” -filter “ReadPosRankSum < -8.0”, together with --cluster-size 3 and --missing-values-evaluate-as-failing. Additional SNP filtering was performed using PLINK (v1.9) with three criteria: SNPs with a missing call rate > 0.1 or a minor allele frequency (MAF) < 0.01 were removed, and only SNPs on autosomes were retained. The remaining SNPs were annotated by genomic position using ANNOVAR.
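The VariantFiltration thresholds listed above can be mirrored in a few lines of Python, for example to check how many sites a given threshold would remove. This is an illustrative re-implementation, not GATK’s code, under two assumptions: the INFO annotations have already been parsed into a dictionary of floats, and an absent annotation counts as a failure, as with --missing-values-evaluate-as-failing. The window-based cluster filter is omitted.

```python
# Hard-filter thresholds from the VariantFiltration step: a site fails a
# filter when the corresponding expression is true.
HARD_FILTERS = {
    "QD_filter": ("QD", lambda v: v < 2.0),
    "FS_filter": ("FS", lambda v: v > 60.0),
    "MQ_filter": ("MQ", lambda v: v < 40.0),
    "SOR_filter": ("SOR", lambda v: v > 3.0),
    "MQRankSum_filter": ("MQRankSum", lambda v: v < -12.5),
    "ReadPosRankSum_filter": ("ReadPosRankSum", lambda v: v < -8.0),
}


def failed_filters(info: dict, missing_fails: bool = True) -> list:
    """Return the names of the hard filters that a site fails.

    If an annotation is absent, the site fails that filter when
    missing_fails is True (mimicking --missing-values-evaluate-as-failing).
    """
    failed = []
    for name, (key, fails) in HARD_FILTERS.items():
        value = info.get(key)
        if value is None:
            if missing_fails:
                failed.append(name)
        elif fails(value):
            failed.append(name)
    return failed


def filter_column(info: dict, missing_fails: bool = True) -> str:
    """Return a VCF FILTER value: 'PASS' or the semicolon-joined filter names."""
    failed = failed_filters(info, missing_fails)
    return ";".join(failed) if failed else "PASS"
```

Note that MQRankSum and ReadPosRankSum are not computed at sites where every read carries the alternative allele, so with missing values treated as failing, such sites are filtered out.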

Fig. 1.

Overview of the sequence alignment, variant calling and variant filtration process.

SNP validation

To assess the accuracy of the identified SNPs, variants were compared against public datasets from the Database of Single Nucleotide Polymorphisms (dbSNP; https://ftp.ncbi.nih.gov/snp/organisms/archive/sheep_9940/VCF/) and the iSheep database (https://ngdc.cncb.ac.cn/isheep/download). To convert the physical coordinates of these public datasets to the reference genome used in this study, we employed the LiftOver tool from the University of California, Santa Cruz (UCSC)16. The overlap between the variants identified in this study and the public datasets was then calculated to determine the proportion of newly discovered SNPs.
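Once the public variants have been lifted over, the proportion of novel SNPs reduces to a set difference on variant keys. The sketch below assumes that every variant has been reduced to a (chromosome, position, ref, alt) tuple on the same reference coordinates; matching on position alone, or tolerating ref/alt swaps, are alternative choices that would change the count. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def novel_fraction(study: Iterable[Variant], *public_sets: Set[Variant]):
    """Return (novel variants, fraction novel) relative to the public sets."""
    study_set = set(study)
    known = set().union(*public_sets)  # union of all public catalogues
    novel = study_set - known
    frac = len(novel) / len(study_set) if study_set else 0.0
    return novel, frac
```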

Data Records

Whole-genome sequence data (FASTQ format) from the 220 Tibetan sheep samples representing 11 populations analyzed herein have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA1138910 (https://www.ncbi.nlm.nih.gov/sra/SRP527227)17. The final VCF files have been deposited in the European Variation Archive (EVA)18 under accession number PRJEB100942.

Technical Validation

Quality control of sequencing data

Quality control of raw WGS data is the foundation for accurate and reliable downstream analysis19. For each individual, we obtained 13.78–24.29 Gb of sequenced bases (average: 16.75 Gb), with 85.32%–94.27% (average: 91.86%) of bases having a Phred-scaled quality score of at least 30 (Q30) (Fig. 2, Table 2). Genome coverage ranged from 5.10X to 9.00X (average: 6.20X). Increasing sequencing depth improves genome coverage and variant detection rates, and high-depth resequencing (30X) is considered the “gold standard”20. However, when funding is limited, sequencing fewer samples at high depth may not adequately capture the genetic variation in a population19,21. A WGS study of pigs found that sequencing depths below 4X resulted in more false-positive variants, suggesting that 4X is a practical lower limit22. The average sequencing depth in this study (6.20X) exceeds this threshold. Moreover, with low-depth sequencing, a larger sample size reduces the false-positive rate of variant detection23; we therefore considered our sample size of 220 individuals sufficient to accurately identify genetic variants in the Tibetan sheep genome. Although high-depth sequencing yields more information per individual, recent studies suggest that low-depth sequencing is a more cost-effective approach for large sheep populations24–26.
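To a first approximation, mean depth is total sequenced bases divided by genome size (the Lander–Waterman estimate), and under a Poisson model the expected fraction of the genome covered at least k times follows from that depth. The sketch below is an idealized calculation that ignores mapping losses, duplicates and GC bias, so realized coverage will be somewhat lower; the function names are hypothetical.

```python
import math


def mean_depth(total_bases: float, genome_size: float) -> float:
    """Lander-Waterman mean depth: sequenced bases / genome length."""
    return total_bases / genome_size


def frac_covered_at_least(depth: float, k: int = 1) -> float:
    """Poisson probability that a site is covered by at least k reads."""
    # P(X >= k) = 1 - sum_{i<k} e^(-c) c^i / i!
    below = sum(math.exp(-depth) * depth**i / math.factorial(i) for i in range(k))
    return 1.0 - below
```

For example, dividing the average 16.75 Gb per animal by the roughly 2.65 Gb summed chromosome length in Table 3 gives about 6.3X, close to the reported 6.20X average; at that depth the Poisson model predicts that about 99.8% of sites are covered at least once.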

Fig. 2.

Boxplots of raw bases (A), raw Q30 rate (B), and sequencing depth (C) for the Tibetan sheep samples.

Table 2.

Summary of whole genome resequencing data.

Population | Raw reads | Clean reads | Clean data rate (%) | Clean Q20 rate (%) | Clean Q30 rate (%) | Clean GC content (%) | Mapped reads | Mapping rate (%)
AW | 95,701,170–137,835,824 | 94,678,924–136,045,192 | 98.43–99.08 | 95.36–97.74 | 87.83–93.42 | 43.95–44.84 | 94,261,765–135,881,157 | 99.56–99.96
GB | 94,298,058–119,696,088 | 93,163,830–118,156,734 | 98.26–99.13 | 96.95–97.69 | 91.55–93.18 | 43.54–44.79 | 93,117,006–118,097,420 | 98.34–99.96
GBB | 94,669,372–161,355,650 | 93,637,276–159,027,818 | 98.36–98.98 | 95.19–97.70 | 87.41–93.27 | 44.47–45.57 | 93,487,334–158,823,568 | 99.77–99.95
GJ | 98,599,898–161,943,674 | 97,646,642–160,075,082 | 98.31–99.18 | 94.25–97.55 | 85.87–92.92 | 41.83–45.41 | 97,593,705–159,926,799 | 99.78–99.97
HB | 93,872,024–122,164,806 | 92,772,578–120,126,382 | 98.11–98.99 | 95.67–97.71 | 88.39–93.35 | 44.34–45.45 | 92,734,799–120,008,442 | 99.83–99.96
KC | 94,082,272–145,093,246 | 92,906,510–143,021,248 | 98.19–99.15 | 94.05–97.41 | 85.32–92.50 | 44.46–45.24 | 92,807,922–142,810,713 | 99.71–99.93
OL | 93,811,744–142,172,432 | 92,911,336–140,093,662 | 97.95–99.04 | 95.86–97.52 | 88.76–92.93 | 44.67–45.69 | 92,861,844–139,844,041 | 99.82–99.95
QK | 93,931,324–125,795,418 | 92,399,782–124,477,264 | 98.26–98.95 | 96.70–97.72 | 90.91–93.37 | 43.45–45.29 | 92,242,120–124,169,989 | 99.48–99.88
TS | 94,844,408–145,690,888 | 93,843,280–143,920,534 | 98.09–99.12 | 96.76–97.79 | 91.27–93.45 | 43.71–45.35 | 93,732,435–143,830,294 | 98.63–99.94
WT | 94,262,028–127,932,770 | 93,349,410–126,557,328 | 98.36–99.04 | 96.24–98.13 | 90.21–94.27 | 43.65–44.91 | 93,073,816–126,432,906 | 99.60–99.94
ZSJ | 91,865,506–119,984,184 | 90,788,858–118,826,332 | 98.64–99.23 | 94.35–97.50 | 85.77–92.90 | 43.70–45.40 | 90,325,988–118,588,175 | 99.30–99.89

Fastp enables rapid preprocessing and quality control of high-throughput sequencing data27. As shown in Fig. 3, MultiQC was used to aggregate the fastp results into quality reports. The duplicate read rate is an indicator of sequencing data quality, with lower rates indicating better quality. In this study, the average duplicate read rate was 22.84% and the average unique read rate was 77.16% (Fig. 3A). However, distinct peaks were observed in certain regions for some samples, possibly attributable to PCR over-amplification during library preparation28. The mean quality score at each base position remained at ~35 (Fig. 3B), indicating very high sequencing quality, and the per-sequence quality score was likewise consistently ~35 (Fig. 3C). The GC content showed a stable distribution across all samples (average: 44.60%; Fig. 3D), suggesting no contamination with exogenous DNA during sequencing29.
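The duplicate and unique read rates are complementary fractions of the same total (22.84% + 77.16% = 100%). The sketch below shows that relationship with a simplified, exact-match definition in which a read pair is a duplicate if an identical (read 1, read 2) sequence pair has already been seen; this is an assumption for illustration, and fastp’s own duplication estimator differs in detail.

```python
from collections import Counter
from typing import Iterable, Tuple


def duplication_rates(read_pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (duplicate rate, unique rate) for (read1, read2) sequence pairs.

    The first occurrence of each distinct pair counts as unique; every
    further identical copy counts as a duplicate.
    """
    counts = Counter(read_pairs)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    unique = len(counts)  # one representative per distinct pair
    return (total - unique) / total, unique / total
```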

Fig. 3.

Quality control metrics from FastQC analysis of sequencing data. (A) Unique and duplicated sequence counts. (B) Mean quality score at each base position. (C) Per sequence quality score. (D) Per sequence GC content.

Quality control of SNP data

Using the HaplotypeCaller function in GATK30, we identified a total of 235,803,940 raw SNPs in the 11 Tibetan sheep populations. After low-quality SNPs were excluded with the VariantFiltration function in GATK, 213,867,729 SNPs remained. Finally, SNPs with a minor allele frequency (MAF) < 0.05 or a missing rate > 10% were removed, leaving 21,099,381 SNPs for subsequent analyses. As shown in Fig. 4, ~4.58 million SNPs (21.704%) have not been reported previously in dbSNP (https://ftp.ncbi.nih.gov/snp/organisms/archive/sheep_9940/VCF/) or the iSheep database (https://ngdc.cncb.ac.cn/isheep/download), which may reflect the prior underrepresentation of the sheep populations studied here. Analysis of SNP counts by mutation type across populations showed that G:C → A:T mutations were the most prevalent (Fig. 5).
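The final frequency and missingness filter can be sketched on a genotype matrix. The encoding here is a hypothetical one chosen for illustration (one row per SNP, alternative-allele dosage 0/1/2 per diploid sample, -1 for a missing call); PLINK 1.9 applies equivalent thresholds on its own binary files through its --maf and --geno options. The defaults follow the MAF < 0.05 and > 10% missingness cut-offs given in this paragraph.

```python
from typing import List, Sequence


def snp_qc_mask(genotypes: Sequence[Sequence[int]],
                min_maf: float = 0.05,
                max_missing: float = 0.10) -> List[bool]:
    """Return True for each SNP (row) that passes both filters.

    Values are alt-allele dosages 0, 1, 2 per diploid sample, or -1 for a
    missing call (hypothetical encoding for illustration).
    """
    mask = []
    for row in genotypes:
        calls = [g for g in row if g >= 0]
        missing_rate = 1.0 - len(calls) / len(row)
        if not calls or missing_rate > max_missing:
            mask.append(False)
            continue
        alt_freq = sum(calls) / (2.0 * len(calls))  # two alleles per call
        maf = min(alt_freq, 1.0 - alt_freq)
        mask.append(maf >= min_maf)
    return mask
```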

Fig. 4.

Venn diagrams for novel variants detected in 11 Tibetan sheep populations.

Fig. 5.

Number of SNPs for each mutation type.

Variant detection accuracy can be assessed using the ratio of transitions (Ti) to transversions (Tv)31,32. If all substitutions were equally likely, Ti/Tv would be 0.5, because there are twice as many possible transversions as transitions; however, this is rarely observed. A typical Ti/Tv ratio for whole-genome analysis is ~2.0–2.1, whereas novel variants generally show a ratio of ~1.5. In this study, the observed Ti/Tv ratio was 2.56, suggesting high SNP-calling accuracy; by contrast, ratios exceeding 4 may indicate artifacts33.
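Classifying each SNP as a transition or a transversion needs only its reference and alternative bases. The sketch below also makes the 0.5 baseline concrete: of the 12 possible single-base substitutions, 4 are transitions (A↔G and C↔T, in either direction) and 8 are transversions. Input as (ref, alt) pairs of single bases is an assumption, and the function names are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps) -> float:
    """Ti/Tv for an iterable of (ref, alt) single-base substitutions."""
    ti = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # identical bases are not a substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```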

In genomic data quality control, the heterozygous-to-homozygous (Het/Hom) ratio is used to assess the genetic diversity of samples. Under Hardy-Weinberg equilibrium, the expected Het/Hom ratio for human genomic data29 is 2.0. In this study, the Het/Hom ratio was 0.92, possibly reflecting inbreeding in Tibetan sheep, which increases homozygosity. Additionally, a genomic evaluation of inbreeding coefficients in the 11 populations revealed severe inbreeding in some of them34, suggesting that the low Het/Hom ratio reflects population history rather than poor sequence quality.
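A per-sample Het/Hom ratio can be computed directly from genotype calls. The sketch below assumes diploid, VCF-style GT strings and the common convention in which “Hom” counts only homozygous-alternative genotypes (homozygous-reference and missing calls are ignored); the authors do not state their exact definition, and the function name is hypothetical.

```python
from typing import Iterable


def het_hom_ratio(genotypes: Iterable[str]) -> float:
    """Het / hom-alt ratio for one sample's diploid GT strings ("0/1", "1|1", "./.")."""
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # skip missing calls
        if alleles[0] != alleles[1]:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1  # homozygous for a non-reference allele
    return het / hom_alt if hom_alt else float("inf")
```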

SNP density reflects both the genetic diversity of the samples and the distribution of variation across the genome. In this study, there was on average one SNP every 125.62 bp; SNP density was lowest on the X chromosome (one SNP every 232.33 bp) and relatively uniform across the autosomes (one SNP every 107.48–138.88 bp; Table 3). To explore the functions of these SNPs, annotation was performed using ANNOVAR35. Most SNPs were located in intronic and intergenic regions (Table 4).
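The bp/SNP column of Table 3 is chromosome length divided by SNP count, so a larger value means sparser SNPs. The short sketch below recomputes two rows from the values in Table 3; the dictionary and function names are hypothetical.

```python
# Chromosome sizes (bp) and SNP counts for two rows of Table 3.
TABLE3_SUBSET = {
    "chr1": (280_644_136, 2_287_570),
    "chrX": (144_257_616, 620_923),
}


def snp_density(chrom_size_bp: int, n_snps: int) -> float:
    """Average spacing in bp per SNP (larger means fewer SNPs per bp)."""
    return chrom_size_bp / n_snps


for chrom, (size, n) in TABLE3_SUBSET.items():
    print(f"{chrom}: one SNP every {snp_density(size, n):.2f} bp")
```

This prints 122.68 bp for chr1 and 232.33 bp for chrX, matching Table 3.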

Table 3.

Summary statistics of SNPs in each chromosome.

No. | GenBank chromosome ID | Chromosome | Size (bp) | SNP number | SNP density (bp/SNP)
1 | CM029819.1 | chr1 | 280,644,136 | 2,287,570 | 122.68
2 | CM029820.1 | chr2 | 252,362,713 | 1,978,758 | 127.54
3 | CM029821.1 | chr3 | 226,826,216 | 1,757,294 | 129.08
4 | CM029822.1 | chr4 | 121,514,229 | 992,298 | 122.46
5 | CM029823.1 | chr5 | 108,580,084 | 858,786 | 126.43
6 | CM029824.1 | chr6 | 118,786,809 | 1,038,525 | 114.38
7 | CM029825.1 | chr7 | 105,184,753 | 807,578 | 130.25
8 | CM029826.1 | chr8 | 91,237,811 | 745,722 | 122.35
9 | CM029827.1 | chr9 | 95,403,819 | 828,421 | 115.16
10 | CM029828.1 | chr10 | 91,562,662 | 779,322 | 117.49
11 | CM029829.1 | chr11 | 67,716,071 | 487,579 | 138.88
12 | CM029830.1 | chr12 | 81,071,312 | 657,600 | 123.28
13 | CM029831.1 | chr13 | 83,580,535 | 627,802 | 133.13
14 | CM029832.1 | chr14 | 67,387,920 | 530,405 | 127.05
15 | CM029833.1 | chr15 | 83,093,940 | 711,143 | 116.85
16 | CM029834.1 | chr16 | 72,105,115 | 623,418 | 115.66
17 | CM029835.1 | chr17 | 73,523,957 | 625,458 | 117.55
18 | CM029836.1 | chr18 | 70,760,168 | 579,921 | 122.02
19 | CM029837.1 | chr19 | 60,600,000 | 471,256 | 128.59
20 | CM029838.1 | chr20 | 52,181,980 | 458,225 | 113.88
21 | CM029839.1 | chr21 | 51,086,750 | 442,809 | 115.37
22 | CM029840.1 | chr22 | 51,925,645 | 443,008 | 117.21
23 | CM029841.1 | chr23 | 62,764,803 | 544,019 | 115.37
24 | CM029842.1 | chr24 | 45,006,986 | 357,790 | 125.79
25 | CM029843.1 | chr25 | 45,418,129 | 422,578 | 107.48
26 | CM029844.1 | chr26 | 45,297,346 | 399,013 | 113.52
27 | CM029845.1 | chrX | 144,257,616 | 620,923 | 232.33

Table 4.

Annotation result for SNPs in the Tibetan sheep populations.

Category | AW | GB | GBB | GJ | HB | KC | OL | QK | TS | WT | ZSJ
exonic-nonsynonymous | 41,250 | 42,044 | 41,674 | 42,614 | 42,354 | 42,955 | 42,439 | 41,065 | 41,376 | 43,687 | 42,013
exonic-synonymous | 71,622 | 72,846 | 72,247 | 73,496 | 73,507 | 74,274 | 73,649 | 71,312 | 71,959 | 75,324 | 72,808
exonic-startloss/stopgain/stoploss | 1,038 | 1,062 | 1,039 | 1,088 | 1,077 | 1,072 | 1,070 | 1,037 | 1,028 | 1,096 | 1,062
exonic-unknown | 1,269 | 1,290 | 1,280 | 1,303 | 1,301 | 1,298 | 1,282 | 1,230 | 1,272 | 1,344 | 1,277
intronic | 6,122,920 | 6,293,405 | 6,208,376 | 6,332,554 | 6,295,737 | 6,357,533 | 6,323,299 | 6,124,934 | 6,234,045 | 6,473,080 | 6,242,532
splicing | 483 | 477 | 482 | 492 | 491 | 498 | 485 | 470 | 481 | 504 | 487
UTR3/UTR5 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
up/downstream | 201,370 | 205,852 | 203,485 | 208,194 | 207,676 | 208,614 | 207,457 | 201,088 | 203,951 | 212,304 | 204,694
intergenic | 11,917,492 | 12,231,525 | 12,092,829 | 12,350,508 | 12,227,833 | 12,318,005 | 12,310,123 | 11,936,580 | 12,154,590 | 12,572,495 | 12,156,930
Total SNPs | 18,357,446 | 18,848,503 | 18,621,414 | 19,010,251 | 18,849,978 | 19,004,251 | 18,959,806 | 18,377,718 | 18,708,704 | 19,379,836 | 18,721,805

Polymorphism information content (PIC) measures the informativeness of a genetic marker and reflects the diversity of alleles at a locus. PIC was highest in the Tao sheep (TS) population and lowest in the Zashijia sheep (ZSJ) population (Fig. 6A). Nucleotide diversity (π) measures the degree of nucleotide variation in a population, defined as the average number of nucleotide differences per site between two randomly chosen sequences. The Tianjun white Tibetan sheep (WT) population had the highest π, whereas the Zashijia sheep (ZSJ) population had the lowest (Fig. 6B).
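Both statistics have closed forms for biallelic SNPs. The sketch below uses the standard formulas: PIC as defined by Botstein et al. (1980), which for two alleles with frequencies p and q = 1 − p reduces to 1 − p² − q² − 2p²q², and per-site π as the unbiased expected heterozygosity n/(n − 1)·2pq, averaged over all sites (polymorphic and monomorphic) in a region. The paper does not state which software produced Fig. 6, so this illustrates the definitions rather than the authors’ pipeline; function names are hypothetical.

```python
from typing import Iterable, Tuple


def pic_biallelic(p: float) -> float:
    """PIC for a biallelic locus with allele frequency p (maximum 0.375 at p = 0.5)."""
    q = 1.0 - p
    return 1.0 - (p**2 + q**2) - 2.0 * (p**2) * (q**2)


def pi_per_site(alt_count: int, n_chrom: int) -> float:
    """Unbiased pairwise diversity at one site: n/(n-1) * 2pq.

    alt_count: copies of the alternative allele among n_chrom sampled
    chromosomes (2 per diploid individual with a call).
    """
    p = alt_count / n_chrom
    return n_chrom / (n_chrom - 1) * 2.0 * p * (1.0 - p)


def nucleotide_diversity(site_counts: Iterable[Tuple[int, int]], n_sites: int) -> float:
    """Mean π over a region: summed per-site values divided by all sites in it."""
    return sum(pi_per_site(a, n) for a, n in site_counts) / n_sites
```

For one population of 20 diploid sheep, a SNP at which 20 of the 40 sampled chromosomes carry the alternative allele gives π = 40/39 × 0.5 ≈ 0.513 at that site and the maximal biallelic PIC of 0.375.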

Fig. 6.

Estimation of genomic PIC (A) and π (B) based on SNPs of 11 Tibetan sheep populations. Each bar represents a Tibetan sheep population, and data are presented as mean ± standard deviation.

Acknowledgements

This work was supported by the Gansu Provincial Science and Technology Plan (25JRRA453), the Central Public-interest Scientific Institution Basal Research Fund (1610322024012), the Innovation Project of Chinese Academy of Agricultural Sciences (25-LZIHPS-07), the Key R&D Program in Gansu Province (24YFNA022), and the Modern China Wool Cashmere Technology Research System (CARS-39-02).

Author contributions

Z.L. and J.L. conceived this study. Z.L. and C.Y. collected the samples and performed the experiments; Z.L., C.Y., T.G., F.W. and B.C. performed the research and analyzed the data. Z.L. drafted the manuscript. All authors have read and approved the final manuscript.

Data availability

Sequencing data were deposited in the SRA under accession number SRP527227 (https://www.ncbi.nlm.nih.gov/sra/SRP527227). The identified variant data were deposited in the EVA under accession number PRJEB100942.

Code availability

The list of the software and parameters used in this study is available through GitHub (https://github.com/luzengkui/sheep_NGS_01).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Chao Yuan, Email: yuanchao@caas.cn.

Jianbin Liu, Email: liujianbin@caas.cn.

References

  • 1. Hu, X. J. et al. The genome landscape of Tibetan sheep reveals adaptive introgression from argali and the history of early human settlements on the Qinghai-Tibetan Plateau. Mol Biol Evol. 36, 283–303 (2019).
  • 2. Zhao, Y. X. et al. Genomic reconstruction of the history of native sheep reveals the peopling patterns of nomads and the expansion of early pastoralism in East Asia. Mol Biol Evol. 34, 2380–2395 (2017).
  • 3. China National Commission of Animal Genetic Resources. Animal genetic resources in China (sheep and goats). (China Agriculture Press, 2011).
  • 4. Li, X. et al. Whole-genome resequencing to investigate the genetic diversity and mechanisms of plateau adaptation in Tibetan sheep. J Anim Sci Biotechnol. 15, 164 (2024).
  • 5. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_017524585.1 (2021).
  • 6. Lu, Z. et al. Chromosome-level genome assembly of Guide Black-Fur sheep (Ovis aries). Sci Data. 11, 711 (2024).
  • 7. Han, B. et al. Multiomics analyses provide new insight into genetic variation of reproductive adaptability in Tibetan sheep. Mol Biol Evol. 41, msae058 (2024).
  • 8. Xu, X. et al. Insight into the differences in meat quality among three breeds of sheep on the Qinghai-Tibetan plateau from the perspective of metabolomics and rumen microbiota. Food Chem X. 23, 101731 (2024).
  • 9. Zhang, X. et al. Effects of different feeding regimes on muscle metabolism and its association with meat quality of Tibetan sheep. Food Chem. 374, 131611 (2022).
  • 10. Wu, D. et al. Convergent genomic signatures of high-altitude adaptation among domestic mammals. Natl Sci Rev. 7, 952–963 (2020).
  • 11. Jin, M. et al. Whole-genome resequencing of Chinese indigenous sheep provides insight into the genetic basis underlying climate adaptation. Genet Sel Evol. 56, 26 (2024).
  • 12. Li, C. et al. Multi-omic analyses shed light on the genetic control of high-altitude adaptation in sheep. Genomics Proteomics Bioinformatics. 22, qzae030 (2024).
  • 13. Wen, J. et al. Ancestral origins and post-admixture adaptive evolution of highland Tajiks. Natl Sci Rev. 11, nwae284 (2024).
  • 14. Ping, J. et al. A highland-adaptation variant near MCUR1 reduces its transcription and attenuates erythrogenesis in Tibetans. Cell Genom. 5, 100782 (2025).
  • 15. Ferraretti, G. et al. Convergent evolution of complex adaptive traits modulates angiogenesis in high-altitude Andean and Himalayan human populations. Commun Biol. 8, 377 (2025).
  • 16. Kuhn, R. M. et al. The UCSC genome browser database: update 2007. Nucleic Acids Res. 35, D668–673 (2007).
  • 17. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP527227 (2024).
  • 18. ENA European Variation Archive https://identifiers.org/ena.embl:ERP182370 (2025).
  • 19. Pfeifer, S. P. From next-generation resequencing reads to a high-quality variant data set. Heredity. 118, 111–124 (2017).
  • 20. Sims, D. et al. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 15, 121–132 (2014).
  • 21. Ayalew, W. et al. Whole genome sequences of 70 indigenous Ethiopian cattle. Sci Data. 11, 584 (2024).
  • 22. Jiang, Y. et al. Optimal sequencing depth design for whole genome re-sequencing in pigs. BMC Bioinformatics. 20, 556 (2019).
  • 23. Le, S. Q. & Durbin, R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 21, 952–960 (2011).
  • 24. Zhao, L. et al. Whole-genome resequencing of Hu sheep identifies candidate genes associated with agronomic traits. J Genet Genomics. 51, 866–876 (2024).
  • 25. Zhao, F. et al. Analysis of 206 whole-genome resequencing reveals selection signatures associated with breed-specific traits in Hu sheep. Evol Appl. 17, e13697 (2024).
  • 26. Jin, M. et al. Genomic insights into the population history of fat-tailed sheep and identification of two mutations that contribute to fat tail adipogenesis. J Adv Res. S2090-1232, 00304–2 (2025).
  • 27. Chen, S. et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018).
  • 28. Belay, S. et al. Whole-genome resource sequences of 57 indigenous Ethiopian goats. Sci Data. 11, 139 (2024).
  • 29. Guo, Y. et al. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform. 15, 879–889 (2014).
  • 30. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43, 491–498 (2011).
  • 31. Gheyas, A. et al. Whole genome sequences of 234 indigenous African chickens from Ethiopia. Sci Data. 9, 53 (2022).
  • 32. Rabbani, M. A. G. et al. Whole genome sequencing of three native chicken varieties (Common Deshi, Hilly and Naked Neck) of Bangladesh. Sci Data. 11, 1432 (2024).
  • 33. Wang, J. et al. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics. 31, 318–323 (2015).
  • 34. Sun, L. et al. The accumulation of harmful genes within the ROH hotspot regions of the Tibetan sheep genome does not lead to genetic load. BMC Genomics. 26, 60 (2025).
  • 35. Wang, K. et al. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
