Abstract
Structural variations (SVs) play crucial roles in the evolutionary adaptation of domesticated animals to natural and human‐controlled environments, but SVs have not been explored in Tibetan cattle, which recently migrated and rapidly adapted to the high altitudes of the Qinghai‐Tibetan Plateau (QTP). In this study, a de novo chromosome‐level genome assembly for Tibetan cattle is constructed. Using a lineage‐specific reference genome is found to significantly increase the accuracy and completeness of variant detection. Analysis of long‐read sequencing data from 36 high‐altitude QTP and 48 low‐altitude cattle identified 222 528 SVs and 259 SV hotspot regions. Positively selected SVs in high‐altitude cattle are related to energy metabolism, erythropoiesis and angiogenesis, and peroxisomal metabolism. A 102‐bp intronic deletion in GNPAT likely upregulated its expression. A total of 7293 SVs that may be introgressed from yak are distinguished, including variants upstream of the hypoxia‐inducing gene EGLN1. Finally, a ≈2‐Mb heterozygous inversion and two translocations on chromosome 6 are likely associated with the cattle gray coat via regulatory effects on the KIT gene. The results confirm the importance of SVs in evolutionary adaptation and the contribution of yak‐introgressed SVs to the rapid acclimatization of QTP cattle.
Keywords: cattle, coat color, genome assembly, high‐altitude adaptation, structural variation
This study reveals the landscape of structural variants in Qinghai‐Tibetan Plateau cattle through long‐read sequencing. Discoveries include metabolic and oxygen‐regulation gene variants, along with a 2‐Mb KIT‐containing inversion and translocations responsible for cattle gray coat. These findings highlight the significant role of structural variants in the evolutionary adaptation of cattle on the Qinghai‐Tibetan Plateau.

1. Introduction
The Qinghai‐Tibetan Plateau (QTP) is one of the most extreme environments on Earth, with an average altitude of 4268 m above sea level. Species that inhabit the QTP must adapt to harsh conditions, including low oxygen concentrations, high‐intensity UV radiation, and large fluctuations in temperature. The geographical range of the QTP includes the high mountain and plateau region in southwestern China (Xizang Autonomous Region, most of Qinghai, parts of Sichuan, Gansu, and Yunnan Province) and extends to neighboring countries. Several studies have been carried out to understand the adaptations of yak,[ 1 ] dog,[ 2 , 3 ] horse,[ 4 ] pig,[ 5 ] and chicken[ 6 ] to high‐altitude environments. Interestingly, selection signatures suggest that diverse evolutionary pathways underlie their adaptation. For example, EPAS1, a key gene involved in hypoxia adaptation, has undergone convergent selection in Tibetan human, horse, dog, and gray wolf.[ 2 , 4 , 7 , 8 , 9 ] However, in yak and chicken, adaptation to high altitudes is achieved through the selection of other genes (e.g., ADAM17 and RYR2).[ 1 , 6 ]
The domestication of Bos taurus (taurine cattle) and B. indicus (indicine cattle) dates back more than 10 000 and 8000 years, respectively.[ 10 , 11 ] During the history and development of human society, cattle have spread and adapted to a vast variety of environmental conditions and have also experienced admixture with other bovine species.[ 12 , 13 , 14 , 15 , 16 ] QTP cattle include mainly Tibetan taurine and indicine cattle. Ancient DNA evidence indicates that taurine cattle arrived on the QTP at least 3900 years ago and represent the earliest domesticated cattle entering East Asia.[ 17 ] Tibetan cattle live at altitudes of 3000–4600 m and retain distinct morphological and other characteristics,[ 13 ] such as small body size, thick coat, low metabolic rate, and strong adaptability to high altitudes. High‐altitude indicine cattle include mainly the Shigatse Humped cattle breed in Xizang Autonomous Region (≈4000 m), whereas molecular data show a clear indicine influence on the humpless Diqing cattle in Yunnan Province (≈3460 m).[ 18 , 19 ]
Structural variations (SVs) are believed to have a broader influence on gene regulation and protein functionality than single nucleotide polymorphisms (SNPs) and shorter insertions or deletions (InDels) and are therefore also more important for understanding certain aspects of the adaptation process.[ 20 , 21 , 22 , 23 , 24 ] Previous studies have used SNPs to identify selection signals or detect adaptive introgression from yak into Tibetan taurine cattle; these analyses revealed several candidate genes related to the hypoxia‐inducible transcription factor pathway, including EGLN1.[ 13 , 14 , 19 ] A recent study utilizing large‐scale genomic data revealed selected genes related to body size (HMGA2 and NCAPG) and energy metabolism (DUOXA2) in Tibetan cattle.[ 19 ] However, research on the adaptation and phenotypic associations of QTP cattle through SVs has not yet been conducted.
Here, we generated a high‐quality de novo genome assembly of Tibetan cattle based on the Pacific Biosciences high‐fidelity (HiFi) long‐read and high‐throughput chromosome conformation capture (Hi‐C) data. We detected the SVs associated with high‐altitude adaptation by collecting long‐read sequencing (LRS) data from 84 cattle across various altitudes, including 36 cattle on the QTP at altitudes above 3000 m, specifically from the Xizang Autonomous Region, Qinghai, and Yunnan Province in China (Figure 1a). To expand the sample size, these SVs were further genotyped in a larger panel of short‐read sequencing (SRS) data of 281 cattle and 12 individuals from other bovine species. Combined with epigenomic analysis, we highlight the unique contributions of SVs to high‐altitude adaptation and gray coat color in QTP cattle.
Figure 1.

Geographic distribution, phylogenetic relationships, and genome assembly of Qinghai‐Tibetan Plateau cattle. a) Geographic locations of the cattle breeds used in this study and their sequencing platforms. b) NeighborNet graph of 37 cattle populations. An allele frequency‐dependent distance metric (Reynolds) was used to construct NeighborNet. c) Chromosome diagrams for the ARS‐UCD1.2 (bottom) and Tibetan_v1 (top) assemblies for autosomes from the centromere (left) to the telomere (right); the X chromosome has two telomeres. The positions of the telomeres are marked with orange triangles. d) A phylogenetic tree of MASH distances derived from 11 bovine assemblies, including the taurine cattle reference genome. The yak assembly was used as the outgroup.
2. Results
2.1. Phylogenomic Relationships and De Novo Genome Assembly of QTP Cattle
To describe the population genetic structure of the QTP cattle populations, we generated an SNP dataset using 84 LRS and 281 SRS samples, which includes data from 148 QTP cattle (36 LRS and 112 SRS data) encompassing Changdu, Dingjie, Langkazi, Diqing, Yushu, Zhangmu, and Shigatse populations living at different altitudes (Figure 1a). A phylogenetic network (Figure 1b) revealed that QTP taurine cattle are closely related to Northeast Asian taurine and Northwest Chinese taurine cattle, whereas QTP indicine cattle (Shigatse Humped) are close to South Asian indicine cattle. The Diqing cattle exhibited mixed taurine‐indicine ancestry. These results were confirmed by a phylogenetic tree of individuals (Figure S1a, Supporting Information) and a principal component analysis (PCA, Figure S1b, Supporting Information).
We selected a Tibetan bull (Dingjie DJ12) living at an altitude of 4610 m to generate a Tibetan cattle reference genome sequence (Tibetan_v1). Assembly of 102.0 Gb of HiFi data by the Hifiasm program[ 25 ] (Table S1 and Figure S2, Supporting Information) resulted in 709 contigs with an N50 of 84.6 Mb. After the incorporation of 188.5 Gb of Hi‐C data, we obtained a genome of 2.92 Gbp, consisting of 31 chromosomes in 60 scaffolds, with a scaffold N50 of 110.9 Mb. The chromosomes of Tibetan_v1 are up to 24.50 Mb longer than those of the taurine cattle reference genome (ARS‐UCD1.2) (Figure 1c and Table S2, Supporting Information), and a large part (49%) of this difference was due to additional repetitive regions. We detected telomere sequences (TTAGGG)n on 21 autosomes of Tibetan_v1 (ARS‐UCD1.2: 5 autosomes) (Figure 1c). A phylogenetic tree based on the Mash distances[ 26 ] between genomes positioned Dingjie (Tibetan_v1) between the Mongolian (Eurasian taurine) and N'Dama (African taurine) genomes[ 13 ] (Figure 1d).
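The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions, not values from the Tibetan_v1 assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy lengths (hypothetical): total = 290, so N50 is reached at the second sequence.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Because N50 depends only on the sorted lengths, the same function applies to contigs before scaffolding and to scaffolds after Hi‐C integration.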
An alignment of the Tibetan_v1 scaffolds against ARS‐UCD1.2 revealed high collinearity between the two genomes (Figure S3, Supporting Information). SVs are nonrandomly distributed throughout the genome but are highly prevalent in the terminal regions of chromosomes (Figure S4, Supporting Information). Benchmarking Universal Single‐Copy Orthologs (BUSCO) evaluation revealed that 93.4% of the 9226 orthologous single‐copy genes were assembled as complete single‐copy genes in Tibetan_v1 (Table S3, Supporting Information). A K‐mer evaluation of the quality by the Merqury program[ 27 ] gave a consensus quality score (QV) of 52.67 for Tibetan_v1, which was clearly higher than that of the other references (31.1 to 39.9), indicating that the Tibetan_v1 assembly has a higher base‐level accuracy than the other cattle reference genomes (Table 1). Interspersed repeats accounted for 44.25% of Tibetan_v1, among which long interspersed nuclear elements (LINEs) constituted the greatest proportion (26.15%), whereas short interspersed nuclear elements (SINEs) accounted for 10.88% (Table S4, Supporting Information).
Table 1.
Comparison of the Tibetan_v1 de novo assembly with representative cattle assemblies.
| Statistics | Tibetan_v1 | Hainan_v1 | Mongolian_v1 | ARS‐UCD1.2 |
|---|---|---|---|---|
| Genome size (bp) | 2919463950 | 2651892370 | 2638491090 | 2715853794 |
| Contig N50 (bp) | 84618144 | 44352259 | 47790958 | 25896116 |
| Scaffold N50 (bp) | 110861251 | 104224000 | 104053164 | 103308737 |
| Number of scaffolds | 60 | 301 | 400 | 2211 |
| Max scaffold (bp) | 163509520 | 157081543 | 146429245 | 158534110 |
| Min scaffold (bp) | 21316 | 749 | 1000 | 1034 |
| Scaffold mean (bp) | 48657732 | 8810273 | 6596227 | 1228337 |
| Consensus quality score (QV) | 52.7 | 35.9 | 39.9 | 31.1 |
| K‐mer based error rate | 5.4 × 10⁻⁶ | 2.6 × 10⁻⁴ | 1.0 × 10⁻⁴ | 7.8 × 10⁻⁴ |
| K‐mer completeness (%) | 94.4 | 91.7 | 93.9 | 96.7 |
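The QV and K‐mer based error rate rows of Table 1 are two views of the same quantity, because a Phred‐scaled quality score is defined as QV = −10·log10(error rate). The short sketch below converts the QV values from Table 1 back into error rates; the function names are illustrative, and only the table values are taken from the text.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value into a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Inverse conversion: per-base error rate to Phred-scaled QV."""
    return -10 * math.log10(error_rate)


# QV values as listed in Table 1.
table_qv = {
    "Tibetan_v1": 52.7,
    "Hainan_v1": 35.9,
    "Mongolian_v1": 39.9,
    "ARS-UCD1.2": 31.1,
}

for assembly, qv in table_qv.items():
    print(f"{assembly}: QV {qv} -> error rate {qv_to_error_rate(qv):.1e}")
```

Rounded to two significant digits, the converted values match the error rates listed in the table (5.4 × 10⁻⁶, 2.6 × 10⁻⁴, 1.0 × 10⁻⁴, and 7.8 × 10⁻⁴).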
2.2. Application of Lineage‐specific Assembly in Population Genetic Analysis
We aligned the SRS data of Tibetan_v1 to each of the four assemblies (ARS‐UCD1.2, Tibetan_v1, Hainan_v1, and Mongolian_v1). As expected, Tibetan_v1 had the highest mapping rate (Table S5, Supporting Information). We then identified small variants by aligning the SRS data of 27 Dingjie cattle to the ARS‐UCD1.2 and Tibetan_v1 reference genomes. Alignments to Tibetan_v1 resulted in more singleton and fixed variants than alignments to ARS‐UCD1.2 (Figures S5a,b, Supporting Information), indicating that a lineage‐specific assembly facilitates the discovery of rare variants. The use of a lineage‐specific reference could convert “multiallelic” into “biallelic” loci in a target lineage.[ 28 ] We found that 5.58–8.41% (708–1417 per genome) of the multiallelic variants identified in the 27 Dingjie cattle against ARS‐UCD1.2 were converted into biallelic variants by alignment to Tibetan_v1 (Table S6, Supporting Information), resulting in 11 928 biallelic variants linked with 5056 genes. Among them, 265 SNPs were found in exons of 181 genes. These genes were significantly enriched in pathways related to immunity (corrected P value < 0.05), which might be important for the adaptation of Tibetan cattle to high altitudes[ 13 ] (Table S7 and Figure S5c, Supporting Information). This example indicates that calling variants against non‐lineage‐specific references may miss important variants. However, a PCA of 119 East Asian taurine cattle on the basis of SNP genotypes (Figure S5d, Supporting Information) was not sensitive to the use of ARS‐UCD1.2 or Tibetan_v1 as the reference. The same was found for the effective population size (Ne) calculated via a multiple sequentially Markovian coalescent (MSMC),[ 29 ] which for the Tibetan cattle population has remained relatively stable over the past 10 000 years (Figure S5e, Supporting Information).
Therefore, this study confirms that for East Asian taurine cattle, the use of lineage‐specific references can improve variant calling but has a minimal effect on optimizing the evaluation of population genetic structure and effective population size.
2.3. SV Detection in QTP Cattle Based on LRS Data
We performed LRS of 36 QTP cattle sampled from six locations of Yushu (n = 10), Langkazi (n = 7), Diqing (n = 10), Dingjie (n = 6), and Bailang (n = 3) (Figure 1a; Table S8, Supporting Information). We also generated 28 new LRS genomes and retrieved 20 published LRS genomes of other cattle breeds of Chaidamu (n = 6), Mongolian (n = 10), Yanbian (n = 5), Simmental (n = 1), Hainan (n = 9), Fuzhou (n = 10), and Luxi (n = 7). To keep the coordinates of the genome consistent with previous reports, ARS‐UCD1.2 was used as the reference genome in this study. SVs > 50 bp relative to ARS‐UCD1.2 were detected and retained if they were supported by at least two of the following tools: CuteSV,[ 30 ] SVIM,[ 31 ] and Sniffles[ 32 ] (Figure S6, Supporting Information). On average, 36 300 SVs per sample were identified and merged into 222 528 nonredundant SVs (Figure 2a,b), including 89 767 deletions (DELs), 6484 duplications (DUPs), 119 806 insertions (INSs), 2771 inversions (INVs), and 3700 complex SVs (break‐ends, BNDs) (Figure 2c). Annotation using the ANNOVAR program[ 33 ] revealed that most of these SVs were located in intergenic and intron regions, with exonic SVs accounting for only 2.2%. Among the 222 528 SVs, 46 028 (20.68%) SVs exhibited homozygous reference genotypes (0/0) across all examined low‐altitude cattle (Yanbian, Simmental, Luxi, Fuzhou, and Hainan), including 16 229 DELs, 27 112 INSs, 1679 DUPs, 458 INVs, and 550 BNDs.
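The "supported by at least two tools" rule can be illustrated with a small sketch. The text does not state how calls from CuteSV, SVIM, and Sniffles were matched, so the matching rule used here (same chromosome, same SV type, start positions within a fixed distance) and the 500‐bp tolerance are assumptions made purely for illustration, as are the toy calls.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str
    length: int


def same_event(a, b, max_dist=500):
    """Assumed matching rule: same chromosome and type, starts within max_dist bp."""
    return a.chrom == b.chrom and a.svtype == b.svtype and abs(a.pos - b.pos) <= max_dist


def consensus_calls(calls_by_tool, min_support=2, min_len=50, max_dist=500):
    """Keep one representative per SV that is reported by at least min_support tools."""
    consensus = []
    for calls in calls_by_tool.values():
        for sv in calls:
            if sv.length <= min_len:  # the text retains SVs > 50 bp
                continue
            # Count the tools that report a matching call (including the current tool).
            support = sum(
                any(same_event(sv, other, max_dist) for other in other_calls)
                for other_calls in calls_by_tool.values()
            )
            already_kept = any(same_event(sv, kept, max_dist) for kept in consensus)
            if support >= min_support and not already_kept:
                consensus.append(sv)
    return consensus


# Toy calls (hypothetical coordinates, not from the study).
calls_by_tool = {
    "cuteSV": [SV("6", 1000, "DEL", 120), SV("6", 5000, "INS", 300)],
    "SVIM": [SV("6", 1010, "DEL", 118)],
    "Sniffles": [SV("6", 5020, "INS", 310), SV("10", 200, "DEL", 60)],
}
consensus = consensus_calls(calls_by_tool)
for sv in consensus:
    print(sv)
```

In this toy set the deletion and the insertion on chromosome 6 are each seen by two tools and are kept, whereas the deletion on chromosome 10 is reported by a single tool and is dropped.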
Figure 2.

Discovery of structural variations (SVs) in 84 cattle genomes. a) SVs from each sample were merged via a nonredundant strategy starting with Simmental (Sim01) and iteratively adding unique calls from additional samples. The growth rate of the nonredundant set decreases as the number of samples increases. Shared SVs are shown in red. b) Number of SVs per sample that are classified into shared (identified in all samples), major (in ≥ 42 samples), minor (in > 1 and < 42 samples), and singleton (in only one sample) SVs. c) Proportions of SVs in the four categories defined in (b). d) Chromosomal distribution of SV hotspot regions and 47 SV‐linked genes in newly identified hotspot regions.
We classified these SVs into four categories: shared (present in all samples), major (present in ≥ 50% of the samples), minor (present in more than one but < 50% of the samples), and singleton (present in only one sample). Among the QTP cattle, Diqing had the most singletons (4.93%), which probably reflects admixture with East Asian indicine cattle (Figure 1b) or other bovine species (e.g., yak).[ 19 ] The number of nonredundant SVs initially increased steeply but began to plateau as more samples were added (Figure 2a), suggesting that the 36 cattle captured a substantial proportion of common SVs within the QTP population. The number of nonredundant SVs increased with the addition of taurindicine (Diqing and Luxi) and indicine (Hainan) cattle, which suggests that these additional SVs are indicine‐specific variants. Although the rate of novel SV discovery gradually slowed with increasing sample size, the continued contribution of nonredundant SVs, particularly from genetically distinct groups, implies that SV diversity in cattle may not yet be fully saturated. We detected 795 SVs in all samples (Figure 2b), which probably correspond to variants private to the reference animal or to errors in the cattle reference genome.
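The four frequency categories can be written as a simple rule on the number of samples carrying each SV. The sketch below assumes the 84 LRS samples of this study, so that "≥ 50% of the samples" corresponds to at least 42 carriers as stated in the Figure 2 legend; the function name is illustrative.

```python
def classify_sv(n_carriers, n_samples=84):
    """Assign an SV to shared / major / minor / singleton from its carrier count."""
    if not 1 <= n_carriers <= n_samples:
        raise ValueError("carrier count must be between 1 and n_samples")
    if n_carriers == n_samples:
        return "shared"      # present in all samples
    if n_carriers == 1:
        return "singleton"   # present in only one sample
    if 2 * n_carriers >= n_samples:
        return "major"       # present in at least 50% of the samples
    return "minor"           # present in more than one but fewer than 50%


for count in (84, 42, 41, 2, 1):
    print(count, classify_sv(count))
```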
2.4. Characterization of Genomic SVs
Transposable elements (TEs) are important sources of SVs in cattle.[ 34 , 35 , 36 ] A total of 21 147 TEs were identified across INS, DEL, and DUP, accounting for 9.50% of all the SVs. The TEs included 20 904 retrotransposons (17 126 LINEs, 2257 SINEs, and 1521 LTRs) and 243 DNA transposons (Table S9, Supporting Information). The enrichment of LINEs has been reported previously for sheep[ 37 ] and pigs.[ 38 ] The SV length distribution displayed sharp peaks at ≈300, 1300, and 8000 bp corresponding to BOV‐A2 (SINE), BTLTR1/BTLTR1B (LTR/ERVK), and L1_BT/LIMA9 (LINE/L1), respectively (Figure S7, Supporting Information).
We identified 259 SV hotspots (see Materials and Methods) with a total size of ≈281 Mb, 157 of which have previously been identified[ 34 , 39 , 40 ] (Figure 2d). Random sampling (1000 times) and a Z test revealed that the SV hotspots are significantly enriched in TE‐derived SVs, with a major contribution of DELs (Z score = 19.67, P = 1.81 × 10⁻⁸⁶) (Table S10, Supporting Information). The same analysis revealed that hotspots are not enriched in conserved element regions found by genomic evolutionary rate profiling (GERP) (Z score = −5.76, P = 4.14 × 10⁻⁹). These findings were consistent with previous reports, which showed that SV hotspots are present mainly in genomic regions rich in genes and high in diversity within populations.[ 34 , 41 ] We observed that many newly identified SV hotspot regions, such as those on B. taurus autosome 4 (BTA4, TRGC3 and TRGC4), BTA13 (SIRPB1), and BTA19 (TRIM25 and RAP1GAP2), are related to immunity (Figure 2d). Notably, the genes TRGC3 and TRGC4 on BTA4 belong to the TRG family, which is one of the T‐cell receptor gene subgroups.[ 42 ]
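The enrichment test described above (random sampling followed by a Z test) can be sketched as follows. The exact procedure is given in the Materials and Methods, which are not reproduced here, so the binning scheme, the toy data, and the use of a one‐sided normal tail are assumptions chosen only to show the logic: count the SVs in the observed hotspots, build a null distribution from randomly placed regions of the same number, and standardize the observed count against that null.

```python
import math
import random
import statistics


def normal_upper_tail(z):
    """One-sided P value for a standard normal Z score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def count_in_bins(sv_bins, bins):
    """Number of SVs whose genomic bin falls inside the given set of bins."""
    return sum(1 for b in sv_bins if b in bins)


# Toy genome of 1000 bins (hypothetical): 20 hotspot bins carry 3 SVs each,
# and 40 further SVs sit in single bins elsewhere.
n_bins = 1000
hotspot_bins = set(range(20))
sv_bins = [b for b in range(20) for _ in range(3)] + list(range(100, 140))

observed = count_in_bins(sv_bins, hotspot_bins)

# Null distribution: 1000 random sets with the same number of bins as the hotspots.
rng = random.Random(0)
null_counts = [
    count_in_bins(sv_bins, set(rng.sample(range(n_bins), len(hotspot_bins))))
    for _ in range(1000)
]

z_score = (observed - statistics.mean(null_counts)) / statistics.stdev(null_counts)
print(f"observed = {observed}, Z = {z_score:.2f}, P = {normal_upper_tail(z_score):.2e}")
```

A negative Z score from the same calculation, as reported for the GERP conserved elements, indicates depletion rather than enrichment; in that case the lower tail of the normal distribution would be the relevant one.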
2.5. Identification of High‐Altitude‐Adapted SVs
To identify candidate SVs that may play a role in high‐altitude adaptation, we analyzed SVs that have extremely different allele frequencies in high‐altitude QTP cattle relative to low‐altitude cattle using the di statistic.[ 43 , 44 ] We defined outlier SVs as di ‐SVs based on the Z test (P < 0.001) (Figure 3a). Thus, we identified 1692 di ‐SVs, including 906 INSs, 763 DELs, 16 DUPs, and 10 INVs, involving 601 genes (Table S11, Supporting Information). Functional enrichment analyses of the 601 genes via Kyoto Encyclopedia of Genes and Genomes (KEGG) revealed significant enrichment in pathways (corrected P < 0.05) related to cardiac health, metabolic regulation, and neuronal function, including arrhythmogenic right ventricular cardiomyopathy, metabolic pathways, amino sugar and nucleotide sugar metabolism, and the calcium signaling pathway (Figure S8 and Table S12, Supporting Information).
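As a rough illustration of a di‐type statistic, the sketch below assumes the widely used form in which, for each variant, the F ST value of every population comparison is standardized by the genome‐wide mean and standard deviation of that comparison and the standardized values are summed. Whether this matches the exact implementation used here is an assumption, and the comparison labels and F ST values are invented toy data.

```python
import statistics


def di_scores(fst_by_comparison):
    """Sum of standardized F_ST values per variant across population comparisons.

    fst_by_comparison maps a comparison label to a list of per-variant F_ST
    values; every list must follow the same variant order.
    """
    n_variants = len(next(iter(fst_by_comparison.values())))
    scores = [0.0] * n_variants
    for values in fst_by_comparison.values():
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        for k, value in enumerate(values):
            scores[k] += (value - mean) / sd
    return scores


# Toy F_ST values for five SVs in two comparisons (hypothetical labels and numbers).
toy_fst = {
    "QTP_vs_lowland_taurine": [0.02, 0.05, 0.60, 0.03, 0.04],
    "QTP_vs_lowland_indicine": [0.10, 0.08, 0.55, 0.12, 0.09],
}
scores = di_scores(toy_fst)
for index, score in enumerate(scores):
    print(f"SV{index + 1}: di = {score:.2f}")
```

Variants with strongly positive scores are consistently more differentiated than the genomic background; outliers would then be selected with a significance threshold such as the Z test at P < 0.001 mentioned in the text.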
Figure 3.

Differentiated structural variations (SVs) associated with the high‐altitude adaptation of Qinghai‐Tibetan Plateau cattle. a) Genome‐wide distribution of di for each SV across all autosomes between high‐ and low‐altitude cattle (top panel). The dashed line denotes the significance threshold (P < 0.001, corresponding to the Z test). The blue block in the Manhattan plot for SNP‐F ST and SV‐F ST (bottom panel) indicates the consistency of SORD region selection for SNPs and SVs. b) The red and blue tracks represent the ATAC‐seq data from muscle and thyroid tissues, respectively, of Tibetan cattle (GL01 and GL02). The pink line shows the 102‐bp DEL (BTA28:4013916‐4014018) in GNPAT, located within chromatin accessibility signals. c) Genotype frequency of the 102‐bp DEL in GNPAT of the 96 LRS‐based genomes, including data from 84 cattle and 12 yak. d) SNP and SV genotypes in the GNPAT region. e) Comparison of the relative luciferase activity between the wild‐type (WT) and Tibetan (TB) cattle haplotypes of GNPAT. The bars display the means±SD (n = 3 technical replicates). A two‐sided t‐test was used, where ** indicates P < 0.01 and *** indicates P < 0.001. f) The top panel shows the known binding motif of the transcription factor Ddit3::Cebpa, while the bottom panel represents the sequence motif identified within the 102‐bp DEL, which matches the binding site of Ddit3::Cebpa. This alignment indicates that the 102‐bp DEL contains the Ddit3::Cebpa binding motif, suggesting that this DEL may affect the function of the regulatory element.
Among the 601 genes, 45 genes have been previously reported with known functions in hypoxic responses and/or as gene candidates associated with physiological traits,[ 45 ] encompassing 78 SVs (Figure 3a; Table S13, Supporting Information). Several of these SVs are located in genes involved in erythropoiesis and angiogenesis, such as SSH2,[ 46 ] VGLL4,[ 47 ] PLCB1,[ 48 ] HPSE2,[ 49 ] and HPSE [ 50 ] (Figure 3a). Notably, the 67‐bp INS (BTA19:20937384) in SSH2 exhibited the highest di signal among these genes. As a member of the slingshot phosphatase family, SSH2 may play an important role in cofilin‐mediated vascular smooth muscle cell migration and neointima formation.[ 46 ] HPSE and PLCB1 are associated with the adaptation of Tibetan[ 51 ] and alpine Merino sheep[ 52 ] to high‐altitude hypoxia, respectively. In addition, we detected an 8493‐bp INS (BTA22:13818542) in CTNNB1, with an allelic difference of 0.211 (SV‐F ST) between QTP and low‐altitude cattle. CTNNB1 knockdown inhibited hypoxia‐regulated gene expression in vitro and reduced the protein level of hypoxia‐inducible factor‐1α (HIF‐1α).[ 53 ]
We further detected several important candidate SV‐related genes associated with lipid metabolism and mitochondrial metabolism, which are crucial for oxygen homeostasis.[ 54 ] The strongest signals were observed on BTA10, including four consecutive SVs of 74‐bp DEL (BTA10:65366980‐65367054), 71‐bp INS (BTA10:65418831), 601‐bp INS (BTA10:65422143), and 1056‐bp DEL (BTA10:65424402‐65425458) in the intergenic and intronic regions of the sorbitol dehydrogenase (SORD) gene (Figure 3a; Figure S9, Supporting Information). The allele frequencies of these SVs were as high as 35.7 to 83.3% in QTP cattle (Figure S9, Supporting Information). These SVs shared the same selection signatures of their linked SNPs (Figure 3a). SORD is the second enzyme of the polyol pathway. Low expression of SORD is expected to attenuate the increase in the cytosolic NADH/NAD+ ratio and increase glycolysis.[ 55 ] Another highly di‐SV is an 86‐bp DEL in the intron of NDUFB6 (BTA8:11510584‐11510670). This is a crucial component of respiratory complex I, which, together with complex III generates most of the mitochondrial ROS.[ 56 , 57 ] A third top signal is a 52‐bp DEL (BTA11:104554345‐104554397) within the intron of SARDH. This gene encodes an enzyme localized to the mitochondrial matrix and plays an important role in promoting energy balance in the context of nutrient stress through the CREB/CRTC pathway.[ 58 ] We also detected a 65‐bp INS (BTA1:80427709) located upstream of ADIPOQ, which is involved in adipose metabolism, body homeostasis, and cardiovascular protection.[ 59 ]
To explore the potential regulatory roles of the 1692 di ‐SVs under selection, we integrated ATAC‐seq data from six key tissues (adipose, liver, heart, lung, muscle, and thyroid) from high‐ and low‐altitude cattle to identify chromatin accessible regions (CARs) (Table S14, Supporting Information). By intersecting with CARs, we identified 59 di ‐SVs (3.5%) and 75 SV‐associated genes (Table S15, Supporting Information). Notably, three genes (GNPAT, CAT, and ACOT2) were significantly enriched in the GO pathway of the peroxisomal matrix. GNPAT encodes glyceronephosphate O‐acyltransferase, which is located in the peroxisomal membrane and catalyzes the first step of ether lipid synthesis involved in the decomposition of hydrogen peroxide and the maintenance of reactive oxygen species homeostasis.[ 60 , 61 ] We identified a 102‐bp DEL in the first intron of GNPAT (BTA28:4013916‐4014018) (Figure 3a,b), which had the highest signal among these three SVs. This 102‐bp DEL was found in 13 out of the 36 LRS‐based (36%, Figure 3c) and 36 out of the 112 SRS‐based genomes (32%, Figure 3d; Figure S10, Supporting Information) of QTP cattle, with a very low detection rate in other cattle breeds. A previous study indicated that peroxisomal genes play crucial roles under hypoxic conditions.[ 54 ] ATAC‐seq data from different tissues of QTP cattle revealed that the 102‐bp DEL falls within an open chromatin region. Chromatin accessibility at this locus was reduced in the heterozygous GL02 individual (genotype 0/1) compared to the wild‐type GL01 (genotype 0/0) (Figure 3b; Figure S11, Supporting Information).
Compared with the normal taurine haplotype, the QTP cattle‐specific haplotype carrying the 102‐bp DEL exhibited greater activity in a luciferase assay (P < 0.001) (Figure 3e). This finding indicates that the DEL may disrupt the structure of the open chromatin and subsequently affect GNPAT expression. Within the 102‐bp sequence, we identified a putative CCAAT/enhancer‐binding protein (C/EBP)‐binding motif (Ddit3::Cebpa), which controls lineage‐specific gene expression and acts as a master regulator of the terminal differentiation of several cell types[ 62 ] (Figure 3f). Notably, the GNPAT gene region was found to be introgressed from yak to QTP cattle.[ 19 ] However, yak carried this 102‐bp fragment (Figure 3d), suggesting that this 102‐bp DEL was not derived from yak introgression.
2.6. Yak Introgression
Previous studies have reported that historic yak introgression events have contributed to the rapid adaptation of QTP cattle.[ 13 , 14 , 63 ] We used LRS‐based genomes of 12 yak to identify the SVs that may have resulted from the introgression of yak into QTP cattle (Table S8, Supporting Information). We then searched for SVs that were shared by yak genomes but not present in other cattle genomes. Using LRS‐based genomes, 7293 autosomal SVs possibly derived from yak introgression were identified in QTP cattle, totaling 3.38 Mb (Figure 4a). To further validate these regions, we constructed a neighbour‐joining (NJ) tree using SNPs from 50‐kb flanking regions of the SVs in QTP cattle and yak, tracing the haplotype origins. In this way, we identified 5642 SVs in QTP cattle as probable yak introgressions (Table S17, Supporting Information). We found an introgressed SV hotspot region on BTA28, including two SVs (1324‐bp DEL in BTA28:4236113‐4237437 and 570‐bp INS in BTA28:4204966) located upstream of the well‐known hypoxia‐inducing gene EGLN1.[ 64 ] The two SVs had high frequencies in QTP cattle (Figure 4a‐c,e). Another high‐signal SV was a 57‐bp DEL (BTA9:87336841‐87336898) located in the first intron of PPP1R14C, a gene associated with angiogenesis.[ 65 ]
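The first screening step, SVs shared by yak genomes and found in QTP cattle but absent from other cattle, amounts to a set filter on the carriers of each SV. The sketch below assumes that "shared by yak genomes" means present in all yak (exposed as an adjustable fraction) and uses invented sample and SV identifiers; the subsequent NJ‐tree validation on flanking SNPs is not covered.

```python
def candidate_introgressed_svs(carriers_by_sv, yak, qtp_cattle, other_cattle,
                               min_yak_fraction=1.0):
    """Return SV ids carried by yak and QTP cattle but by no other cattle."""
    candidates = []
    for sv_id, carriers in carriers_by_sv.items():
        yak_fraction = len(carriers & yak) / len(yak)
        present_in_qtp = bool(carriers & qtp_cattle)
        absent_elsewhere = not (carriers & other_cattle)
        if yak_fraction >= min_yak_fraction and present_in_qtp and absent_elsewhere:
            candidates.append(sv_id)
    return candidates


# Hypothetical sample names and carrier sets.
yak = {"yak1", "yak2"}
qtp_cattle = {"DJ01", "YS01"}
other_cattle = {"HN01", "SIM01"}
carriers_by_sv = {
    "sv_a": {"yak1", "yak2", "DJ01"},          # candidate
    "sv_b": {"yak1", "yak2", "DJ01", "HN01"},  # also in lowland cattle
    "sv_c": {"yak1", "yak2"},                  # not found in QTP cattle
    "sv_d": {"yak1", "YS01"},                  # carried by only half of the yak
}
print(candidate_introgressed_svs(carriers_by_sv, yak, qtp_cattle, other_cattle))
```

Lowering `min_yak_fraction` relaxes the requirement that every yak carries the SV, which trades specificity for sensitivity when the yak panel is small.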
Figure 4.

Genome‐wide introgression from yak into Qinghai‐Tibetan Plateau (QTP) cattle. a) The distribution of yak introgressed fragments in the genome based on LRS‐SV results, as well as the SV hotspot region on BTA28. b–d), Allele frequencies of the SVs resulting from yak introgression b) in the LRS‐based genomes of all QTP cattle (LRS‐All QTP cattle); c) in the SRS‐based genomes of all QTP cattle (SRS‐All QTP cattle); and d) in the SRS‐based genomes of Changdu cattle (SRS‐Changdu). e) Phylogenetic analysis of the SNPs in the introgressed EGLN1 gene. f) Phylogenetic analysis of the SNPs in the introgressed NFE2L2 gene.
To expand the population size, the SVs were further genotyped using 293 SRS‐based genomes, from 112 QTP cattle, 169 cattle from various global populations, and 12 individuals from other bovine species with high‐coverage genome sequences (Table S16, Supporting Information). Applying the same identification criteria used for the LRS data, we detected 2334 SVs possibly derived from yak introgression (Table S18, Supporting Information), of which 971 showed overlap with the LRS SVs. Consistent with a recent study using SNP data,[ 19 ] our results indicate that not all QTP cattle populations have the same yak introgression signals. For example, the top candidate regions in the Changdu cattle population differ from those in other QTP cattle populations (Figure 4d). Interestingly, an introgressed region on BTA2, including a 52‐bp DEL in the intron of NFE2L2 (BTA2:19540699‐19540751), has a high frequency in the Changdu population and is absent in other cattle (Figure 4f). NFE2L2 (NFE2‐like bZIP transcription factor 2, also known as NRF2) plays a key role in the response to oxidative stress and induces the upregulation of glucose transporter 1 (GLUT1), which allows increased glucose uptake and HIF‐1‐driven augmentation of glycolysis.[ 66 ]
2.7. A 2‐Mb Heterozygous Inversion Related to the Gray Coat Phenotype of QTP Cattle
We further examined QTP‐specific SVs and found a 2‐Mb heterozygous INV and two translocations (TRAs) related to the gray coat phenotype in cattle. The gray phenotype of mammals occurs when at birth the hairs lose their pigment and turn white. This is observed in gray horses[ 67 ] and, at a low frequency, in QTP cattle. To confirm the genomic candidate region of the gray coat phenotype, SRS data from 23 gray cattle and 258 non‐gray cattle were used for SNP‐based GWAS (SNP‐GWAS) and SNP‐based F ST (SNP‐F ST) analyses (Figure 5a). Both SNP‐GWAS, performed with the genome‐wide efficient mixed model association (GEMMA) program using a mixed linear model, and SNP‐F ST confirmed a strong signal on BTA6. The GWAS signal spans 2.99 Mb (BTA6:69765842‐72752916; P < 1 × 10⁻⁸). This region harbors 23 non‐LOC genes and 13 LOC genes, including KIT (Figure S12, Supporting Information). The signals of the SNP‐F ST contrasting 22 gray and 29 non‐gray QTP cattle (BTA6:69750001‐72725001, P < 0.001) overlapped with the SNP‐GWAS signals (Figure 5a). However, no functional SNP or InDel was found. We then performed a whole‐genome SV‐based GWAS (SV‐GWAS) (Figure 5b) and SV‐based F ST (SV‐F ST) analysis (Figure S13, Supporting Information), which showed that the highest signal came from the 2‐Mb heterozygous INV (BTA6:69772181‐71772202), together with two TRAs in BTA17 (47058249) and BTA6 (70471991) (Figure 5b), overlapping with the region detected by SNPs for the gray coat color phenotype in cattle.
Figure 5.

A 2‐Mb heterozygous inversion related to the gray coat phenotype of Qinghai‐Tibetan Plateau (QTP) cattle. a) Genome‐wide Manhattan plots generated via GWAS and F ST based on SNPs for the gray coat phenotype. b) Genome‐wide Manhattan plot using the GWAS based on SVs for gray coats. c) Rearrangement of the 2‐Mb INV in non‐gray (top) and gray (bottom) QTP cattle. This region displays the genes located on BTA6 between positions 69.50‐72.00 Mb in the B. taurus reference genome (ARS‐UCD1.2) from the Ensembl database, with the blue shaded area indicating the specific location of the inversion at BTA6:69772181‐71772202. In addition to the 2‐Mb inversion on BTA6, we also identified two translocations (pink and purple lines), one of which involves a fragment from BTA17 that is translocated into intron 22 of PDGFRA. d) PCR validation of the 2‐Mb INV. We designed two pairs of PCR primers at the insertion site of the translocation (Primer1, BTA6:69764591) and the starting position of the inversion (Primer2, BTA6:69772181) and validated them in gray and non‐gray cattle. e) Skin sections of gray and black cattle stained with hematoxylin and eosin (H&E) and subjected to immunohistochemistry (IHC), respectively. For the IHC analysis, a rabbit antibody against KIT was used, with the brown color indicating positive staining and the blue color representing the cell nuclei. Scale bars, 50 µm. f) Chromatin architecture and regulatory landscape in the gray and non‐gray cattle genomic regions flanking the 2‐Mb INV region on BTA6. The top section represents a Hi‐C contact map, which visualizes chromatin interactions, with red and blue colors indicating high interaction densities. The grey shaded area encompasses four consecutive TADs in the cattle genome, with a highlighted region (red dashed circle) indicating strong chromatin interactions in gray cattle. In the AB index plot, red bars represent active (A) compartments and blue bars denote inactive (B) compartments. The ATAC and H3K27ac tracks show regulatory activity in this region. Loops derived from Hi‐C data highlight chromatin interactions across this region. The bottom panel shows an annotation map of genes located on BTA6 between positions 68–74 Mb in the ARS‐UCD1.2 genome from the Ensembl database.
The INV was confirmed by analysis of discordant and split reads at the junctions in gray cattle compared with those at the junctions in non‐gray cattle (Figure S14, Supporting Information). We further confirmed the presence of this 2‐Mb heterozygous INV via the program npInv,[ 68 ] an accurate tool for detecting and genotyping inversions using multiple alignment long reads. This INV was probably caused by error‐prone nonhomologous end joining (NHEJ) events and contains 20 coding genes, including KIT (Figure 5c). There was no disrupted gene present at the INV boundary; however, a translocation fragment of ≈33 kb from BTA17 was found 7590 bp upstream of the INV proximal breakpoint (Figure 5b,c, Supporting Information). This fragment was inserted into the 22nd intron of PDGFRA. We confirmed the INV and TRAs by PCR in 23 other gray cattle (Figure 5d; Figure S15, Supporting Information).
To further investigate the possible origin of this INV in QTP cattle, we conducted a comprehensive haplotype phylogenetic analysis using sequences from the INV region (BTA6:69722181‐71822202). The results indicate that the INV‐carrying haplotypes show strong phylogenetic clustering with South Asian indicine cattle (Bos indicus), while the alternative haplotype primarily groups with East Asian taurine cattle (Bos taurus) (Figure S16, Supporting Information). This suggests that the inversion haplotype segment likely originated from South Asian indicine cattle, consistent with the presence of South Asian indicine ancestry components in QTP cattle populations as previously reported.[ 19 ]
Due to the well‐established role of KIT in determining white‐spotting phenotypes in several species, this gene is an obvious candidate for the gray coat phenotype in QTP cattle. KIT is expressed in several tissues, particularly in mast cells and melanocytes.[ 69 ] Using H&E staining and immunohistochemistry of skin samples, we observed a more intense presence of melanin and more pronounced KIT signals around the hair follicles of black cattle than in those of gray cattle (Figure 5e). Hi‐C data from gray cattle and non‐gray cattle indicated that the breakpoints of this INV were located at topologically associating domain (TAD) boundaries (Figure 5f; Table S19, Supporting Information). The presence of the INV led to stronger interactions across the TAD boundaries and stronger H3K27ac signals upstream of the inversion in gray cattle. Evolutionary conservation was confirmed through cross‐species alignment (LiftOver; BTA6:68‐70 Mb to GRCm39), showing that mouse Kit enhancer sequences are conserved in cattle (Figure S17a, Supporting Information), with these enhancers known to reside within the Kit TAD in mouse melanocytes.[ 69 ] Furthermore, the TRA at BTA6:70471991 in gray cattle precisely localizes to the CTCF‐defined TAD border between KIT and KDR (Figure 5f; Figure S17b, Supporting Information), while deletion of the Kit‐Kdr TAD boundary in mice results in a lighter coat color phenotype.[ 69 ] Together, these findings suggest that the INV and associated TRA events may alter KIT expression through TAD boundary disruption and consequent enhancer dysregulation, contributing to the gray coat phenotype in QTP cattle. We then examined the linkage disequilibrium (LD) level between the SNPs positioned within or adjacent to the 2‐Mb heterozygous INV (Figures S18 and S19, Supporting Information). The LD levels between distant SNPs are generally higher in gray cattle than in non‐gray cattle, indicating a suppression of recombination by the large inversion.
Previous studies have established the cell‐specific expression pattern of the Kit gene in mice, leading us to hypothesize that the effects of INV might exhibit cell‐type specificity (e.g., melanocytes). However, current bulk‐level Hi‐C interaction data lack resolution to discern such cellular heterogeneity. Future investigations employing single‐cell Hi‐C or other high‐resolution chromatin conformation capture methods could help verify these possibilities and elucidate cell‐type‐specific regulatory mechanisms.
3. Discussion
In this study, we generated a high‐quality assembly of Tibetan taurine cattle (Tibetan_v1) using a combination of PacBio HiFi and Hi‐C technologies. Tibetan_v1 has quality metrics comparable to or even exceeding those of other currently available cattle reference genomes. It may serve as a reference genome and as an important data resource for further analysis of Asian cattle breeds, particularly for studies of the bovine pangenome.[ 70 ] Bovine autosomes are acrocentric,[ 71 ] and a complete bovine assembly is close to “centromere‐to‐telomere”.[ 72 ] We identified 20 autosomes with telomeric repeats in the Tibetan_v1 assembly.[ 72 ] In the application of population genetics, the use of a population‐specific reference genome (Tibetan_v1) instead of ARS‐UCD1.2 more accurately distinguishes East Asian cattle, detects rare low‐frequency variants, and allows the conversion of multiallelic into biallelic variants. The multiallelic variants covered by the European bovine reference genome in QTP cattle are mainly restricted to highly polymorphic immune‐related genes (BoLA, BoLA‐NC1, etc.).[ 73 , 74 ] Our results suggest that sequences with high nucleotide diversity will benefit from population‐specific reference genomes. This observation aligns with findings in other species, such as Tibetan goat, where population‐specific reference genomes have been shown to improve the detection of novel selective regions.[ 75 , 76 ]
Whole‐genome scanning of nucleotide variations has been widely used to identify candidate genes for hypoxia adaptation in high‐altitude species.[ 19 ] In this study, we focused on the role of SVs in high‐altitude adaptation, which involves both polygenic inheritance and gene introgression. We generated the largest dataset of QTP cattle SVs to date and identified SVs that are highly differentiated between high‐ and low‐altitude cattle populations, potentially linked to high‐altitude adaptation. Increasing oxygen delivery and reducing oxygen consumption are the two primary responses to hypoxia; the former relies on the improvement of blood and vascular conditions, and the latter mainly refers to glucose and lipid metabolism.[ 77 ] We identified SVs associated with high‐altitude adaptation in QTP cattle, primarily overlapping with genes related to erythropoiesis and angiogenesis (SSH2, VGLL4, PLCB1, HPSE2, and HPSE) and energy metabolism (SORD, NDUFB6, SARDH, and ADIPOQ). Similar adaptive pathways (e.g., hypoxia response and energy metabolism) were previously observed in yak,[ 1 , 63 , 78 ] though involving distinct genes, suggesting convergent mechanisms for high‐altitude survival between QTP cattle and native yak.
The selection signal around the SORD gene was highly consistent in both the SNPs and the SVs between the high‐ and low‐altitude cattle populations, indicating its potential crucial role in high‐altitude adaptation. A previous study on ischemia/reperfusion‐induced injury of the rat heart revealed that SORD (also called SDH) affects the induction of HIF‐1α by regulating the level of NAD+.[ 55 ] HIF‐1α is a subunit of a transcription factor called hypoxia‐inducible factor 1.[ 79 ] The physiological function of HIFs is to promote the adaptation of cells to low oxygen by inducing neovascularization and glycolysis.[ 80 ] Other high‐signal SV‐associated genes, such as CTNNB1[ 53 ] and HPSE,[ 81 ] may also help the high‐altitude adaptation of QTP cattle through the HIF pathway. Notably, some strongly selected SVs (e.g., SSH2) were located outside ATAC‐seq predicted open chromatin regions. This observation suggests two non‐exclusive possibilities: 1) the regulatory effects of adaptive SVs may operate through mechanisms independent of canonical chromatin accessibility; 2) regulatory landscapes may be tissue‐specific: although our ATAC‐seq data covered six major tissues (heart, liver, lung, muscle, adipose and thyroid), key regulatory elements for certain genes might be active in other cell types. This highlights the necessity for future studies to integrate broader tissue sampling, single‐cell multi‐omics profiling, and functional validation to comprehensively resolve the regulatory mechanisms underlying high‐altitude adaptation. Additionally, we identified di ‐SVs in three peroxisomal genes (GNPAT, CAT, and ACOT2). We detected a 102‐bp DEL in the first intron of GNPAT, which is located within a predicted open chromatin region.
The results from the luciferase reporter assay showed that the fragment containing the 102‐bp DEL significantly increased luciferase expression in HEK293T cells, suggesting that the DEL sequence may block the function of a regulatory element.[ 82 ] In Tibetans, GNPAT has also been identified as a key gene involved in the response to UV treatment through the promotion of melanin synthesis.[ 83 ] CAT encodes a catalase that plays a key role in protecting cells and dismutates H2O2 to water and oxygen in response to oxidative stress. Together with a previous study,[ 54 ] this suggests that peroxisomal genes may play crucial roles under hypoxic conditions in QTP cattle.
BTA28 contains relatively many yak‐introgressed regions, which is consistent with previous SNP‐based analyses.[ 19 ] Notably, this region includes hypoxia genes such as EGLN1, EXOC8, GNPAT, SPRTN, TSNAX, FAM89A, TRIM67, and ARV1.[ 54 , 83 , 84 ] Interestingly, the same genomic region has also been reported in high‐altitude species such as the zokor (Eospalax baileyi)[ 85 ] and Tibetan humans.[ 86 ] SNPs revealed that the proportions of yak ancestry ranged from 0.64% to 3.26% in the QTP cattle genomes, among which Nangqian, Changdu, and Yushu were the highest, followed by Diqing, Zhangmu, Dingjie, and Shigatse taurine cattle.[ 19 ] In this study, we identified distinct hotspots of SV introgression across different QTP populations, indicating various patterns of yak introgression. For example, in the Changdu population, the highest introgression region signal was found on BTA2, which includes NFE2L2, a gene involved in the oxidative stress response.[ 87 ] Changdu cattle inhabit an agro‐pastoral zone where interbreeding with yak is common. Our results indicate that not all QTP cattle populations have the same yak introgression signals or similar adaptive mechanisms, which may reflect the vast size of the QTP and its geographical barriers. These factors have also influenced human variation. The top candidate genes that carry adaptive genetic variation previously found in Tibetan humans, such as EPAS1 and EGLN1, do not have strong positive selection signals in Deng people, who also live on the Plateau and instead have selection signals in KLHL12 and other genes.[ 88 ]
Chromosomal INVs may affect recombination and chromosome structure. However, INVs can be challenging to detect, so the prevalence and hence the significance of INVs segregating within a species are often overlooked. LRS data for gray cattle revealed a 2‐Mb heterozygous INV that distinguished phenotypically gray cattle from non‐gray cattle. These INVs may well be favored by natural selection because they suppress recombination between coadapted genes,[ 89 ] where the gray coat phenotype contributes to adaptation to high‐altitude environments. Previous studies in humans and other mammals have shown that large INVs modify gene expression via two major mechanisms: i) directly disrupting regulatory elements adjacent to breakpoints and ii) rearranging regulatory elements positioned within or near the inverted region.[ 89 ] At the same time, we observed that the 2‐Mb INV was accompanied by two TRAs, a phenomenon that appears to be common in cytogenetically detected INVs.[ 90 ] At the INV breakpoints, complexity and imbalances are often observed, which may result from error‐prone repair of processed double‐strand breaks.[ 91 ] Suppressed recombination in inversion heterozygotes has been reported in several species, such as flat‐fruit peach[ 92 ] and deer mice.[ 93 ] A light coat color may contribute to adaptation to high‐latitude or tropical/subtropical environments by reflecting a relatively large proportion of incident solar radiation.[ 94 ] We propose that the gray coat phenotype in cattle arises through structural rearrangement of regulatory elements by this INV, or alternatively, through potential disruption of the KIT‐KDR TAD by the TRA, leading to dysregulated KIT expression.
GWAS based on SNPs constitute a powerful experimental strategy for identifying genetic variations underlying animal traits.[ 95 , 96 ] However, causative genomic variations often involve long stretches of DNA, such as the 2‐Mb heterozygous INV identified in this study, and are not detected by genome‐wide SNP genotyping. Our study suggests that LRS analysis can identify important SVs that SRS data cannot detect, providing high‐confidence candidate genes and variants for subsequent functional studies.
4. Conclusion
We generated a high‐quality genome assembly for Tibetan cattle and presented a comprehensive catalog of SVs. We evaluated the important role of the population‐specific reference genome in detecting rare and low‐frequency variations. Leveraging the advantages of population‐scale long‐read sequencing, we identified genes and related SVs associated with high‐altitude adaptation and coat phenotypes, offering new insights into the adaptation of cattle to the QTP.
5. Experimental Section
Ethics Statement
This study was approved by the Institutional Animal Care and Use Committee of Northwest A&F University (FAPWCNWAFU, Protocol no. NWAFAC 1008).
Samples Collected for Tibetan Cattle Genome Assembly
Whole blood from the Dingjie bull DJ12 was collected to construct the Tibetan cattle genome assembly. The bull was located in Dingjie County, Shigatse City, Xizang Autonomous Region, China. Genomic DNA was extracted from the tissues of the animals using the phenol/chloroform method. The resulting DNA was sequenced using the Pacific Biosciences (PacBio) Sequel II platform at Novogene, yielding a total of 102 Gbp, corresponding to a genomic coverage of ≈39×. In addition to long reads, the same animal was resequenced using Illumina HiSeq X Ten paired‐end short‐read sequencing, yielding 376 Gbp with an average insert size of 350 bp, corresponding to a genomic coverage of ≈145×.
In Situ Hi‐C Library Preparation
In this study, one library was generated from the blood of the Dingjie bull DJ12 and six in situ Hi‐C libraries from the skin tissue of gray and non‐gray cattle. Hi‐C libraries were constructed according to previous studies.[ 97 ] Briefly, the samples were crosslinked with 1% formaldehyde for 10 min at room temperature and quenched with 0.125 M glycine for 5 min. The crosslinked cells were subsequently lysed. Endogenous nuclease was inactivated with 0.3% SDS, and chromatin DNA was digested with 100 U of MboI (NEB), marked with biotin‐14‐dCTP (Invitrogen), and then ligated with 50 U of T4 DNA ligase (NEB). After the crosslinking was reversed, the ligated DNA was extracted with a QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. The purified DNA was sheared into 300‐ to 500‐bp fragments and subjected to further blunt‐end repair, the addition of A‐tails and adaptors, followed by purification using biotin‐streptavidin‐mediated pull‐down and PCR amplification. Finally, the Hi‐C libraries were quantified and sequenced on the MGI‐seq platform (BGI, China).
Whole‐Genome Nanopore LRS Samples
Blood or ear tissue samples were collected from 36 cattle in five high‐altitude regions (above 3000 m) in the Xizang Autonomous Region, Qinghai, and Yunnan Province, China. A total of 28 blood or ear tissue samples were collected from low‐altitude cattle in Qinghai, Jilin, and Shandong Province, all from sites below 3000 m above sea level. The sequencing libraries were prepared using an SQK‐LSK109 ligation kit (Oxford Nanopore Technologies; Oxford, UK). Sequencing was performed using PromethION flow cells (R9.4). Base calling was performed using Guppy (v.5.1.13). In addition, 20 published LRS datasets from low‐altitude cattle were downloaded for joint analysis. LRS data from six domestic yak and six wild yak were downloaded from the NCBI to identify SVs. All the samples used in this study are listed in Table S8 (Supporting Information).
Samples Collected for Whole‐genome Illumina SRS
The Illumina SRS data of 281 cattle from six geographic regions (Africa, Europe, Eurasia, Northeast Asia, South China, and South Asia) as well as six wild bovine species (n = 12) were generated in our study or retrieved from the sequence read archive of the National Center for Biotechnology Information (NCBI, Table S16, Supporting Information).
Genome Assembly and Quality Assessment
Hifiasm (v.0.16.1‐r375) was used to generate the assembly from HiFi CCS reads with default parameters.[ 25 ] Hifiasm yields one primary contig assembly and a pair of partially phased contig assemblies. The primary contigs were scaffolded to chromosomes using clean Hi‐C reads with Juicer (v.1.5)[ 98 ] and 3D‐DNA (v.201008).[ 99 ] The assembly was manually reviewed and adjusted using Juicebox Assembly Tools.[ 100 ] Genome completeness was assessed using BUSCO (v.4.0)[ 101 ] with the Mammalia_odb10 database. Merqury (v.1.0)[ 27 ] was used to estimate the quality value (QV) by employing high‐depth Illumina SRS data from the same individual whose genome was assembled. Mash[ 26 ] was used to estimate the distance between the assembled genome of the Tibetan cattle genome and other genomes.
Genome Annotation
Repetitive elements in Tibetan_v1 were identified by their matches to Repbase (v.20140131) using RepeatMasker (v.4.0.5) (http://www.repeatmasker.org). To identify protein‐coding genes in Tibetan_v1, Liftoff (v.1.5.2)[ 102 ] and GffRead (v.0.12.1)[ 103 ] were used to generate gene annotations from ARS‐UCD1.2. The annotation used for the transfer was NCBI GCF_002263795.1_ARS‐UCD1.2_genomic.gff.
Variant Calling Evaluation
To assess the variant calling performance of the different references, the SRS data of 27 Dingjie samples were aligned to each of the two assemblies (ARS‐UCD1.2 and Tibetan_v1) using BWA‐MEM (v.0.7.13‐r1126) with default parameters, and the duplicated reads were removed using Picard Tools (http://broadinstitute.github.io/picard). The Genome Analysis Toolkit (GATK, v.3.8‐1‐0‐gf15c1c3ef) was used to detect SNPs and InDels. The variations were called using the “HaplotypeCaller”, “GenotypeGVCFs”, and “SelectVariants” of GATK. After SNP or InDel calling, “VariantFiltration” was used to discard sequencing and alignment artifacts from the SNPs with the parameters “QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0, and SOR > 3.0” and based on a mean sequencing depth of variants (all individuals) of “< 1/3× and > 3×.” To identify the biallelic SNPs in Tibetan_v1 that appeared to be multiallelic under ARS‐UCD1.2, first the biallelic SNPs were extracted from Tibetan_v1 and the multiallelic SNPs from ARS‐UCD1.2 in 27 Dingjie cattle. Then, using the chain file created between Tibetan_v1 and ARS‐UCD1.2, LiftOver tools were employed for coordinate conversion.
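The hard‐filtering thresholds quoted above can be read as a simple predicate on each variant's annotations. The following is a minimal Python sketch of that logic, not the GATK implementation: the function names and the dictionary‐based input are hypothetical, missing annotations are skipped by assumption, and the depth rule is interpreted here as discarding sites whose mean depth falls below one third or above three times the cohort mean.

```python
from typing import Mapping

# Thresholds as quoted in the text; a variant FAILS if any test returns True.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
    "SOR": lambda v: v > 3.0,
}


def passes_hard_filters(info: Mapping[str, float]) -> bool:
    """Return True if no quoted threshold is violated.

    Assumption of this sketch: annotations absent from `info` are skipped.
    """
    for key, fails in HARD_FILTERS.items():
        value = info.get(key)
        if value is not None and fails(value):
            return False
    return True


def passes_depth_filter(site_mean_depth: float, cohort_mean_depth: float) -> bool:
    """Keep sites whose mean depth lies within [1/3x, 3x] of the cohort mean
    (our interpretation of the depth rule in the text)."""
    return cohort_mean_depth / 3.0 <= site_mean_depth <= 3.0 * cohort_mean_depth
```

A site would be retained only if both predicates return True.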
Population Structure Analysis and Estimation of the Effective Population Size
To examine the fine‐scale population structure revealed by genetic variants from different assemblies (ARS‐UCD1.2 and Tibetan_v1), PCA of cattle populations was performed in the QTP region and northern China. PCA was conducted using the smartPCA program in EIGENSOFT (v.5.0).[ 104 ] A neighbor‐joining (NJ) tree was constructed using the matrix of pairwise genetic distances calculated by PLINK (v.1.9).[ 105 ] The unrooted NJ tree was then visualized via MEGA (v.5.0)[ 106 ] and FigTree (v.1.4.3) (http://tree.bio.ed.ac.uk/software/fgtree/). A NeighborNet network was constructed using Reynold's distances between populations using SplitsTree App (v.6.3.32).[ 107 ]
For the genetic variants called from different reference assemblies (ARS‐UCD1.2 and Tibetan_v1), MSMC2 analysis was applied to estimate the effective population size (Ne) of the Tibetan cattle, Anxi cattle, and Angus populations over time. This method was applied to all groups with two deep‐coverage (>14 ×) individuals per group. The samples (coverage) used in this analysis were as follows: Anxi, O718844 (32.72 ×) and V347862 (33.36 ×); Angus, SRR1365144 (16.27 ×) and SRR1425124 (16.44 ×); and Tibetan cattle, Xizang22 (26.28 ×) and Xizang7 (24.56 ×). For the calculation of the effective population size, the parameters of MSMC2 were set to “msmc2 ‐t 10 ‐p 1*2+25*1+1*2 ‐I 0, 1, 2, 3” and “msmc2 ‐t 10 ‐p 1*2+25*1+1*2 ‐I 4, 5, 6, 7”. For the calculation of population separation, the parameters of MSMC2 were set to “msmc2 ‐t 8 ‐P 0, 0, 0, 0, 1, 1, 1, 1 ‐s ‐p 1*2+25*1+1*2”. For effective population size inference, two individuals (4 phased haplotypes) from each group were used. A time scale for generation time of g = 6 and a mutation rate per generation of µg = 1.26 × 10−8 were used.
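For readers unfamiliar with MSMC2 output, the scaled time boundaries and coalescence rates it reports are usually converted to years and Ne with the mutation rate and generation time given above. The sketch below shows the standard conversions under that assumption; the function names are ours, and the relative cross‐coalescence rate formula is the one commonly applied to combined MSMC2 output rather than a detail stated in this study.

```python
MU = 1.26e-8   # mutation rate per site per generation (value from the text)
GEN_TIME = 6   # generation time in years (value from the text)


def scaled_time_to_years(t_scaled: float, mu: float = MU, gen_time: float = GEN_TIME) -> float:
    """Convert a mutation-scaled time boundary to years: t / mu * g."""
    return t_scaled / mu * gen_time


def coalescence_rate_to_ne(lam: float, mu: float = MU) -> float:
    """Convert a scaled coalescence rate (lambda) to Ne: 1 / (2 * mu * lambda)."""
    return 1.0 / (2.0 * mu * lam)


def relative_cross_coalescence(lambda_00: float, lambda_01: float, lambda_11: float) -> float:
    """Relative cross-coalescence rate between two populations:
    2 * lambda_01 / (lambda_00 + lambda_11)."""
    return 2.0 * lambda_01 / (lambda_00 + lambda_11)
```

For example, a scaled time of 1.26 × 10−4 corresponds to 10 000 generations, or about 60 000 years with g = 6.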
SV Discovery and Genotyping
LRS data from 84 samples were aligned to ARS‐UCD1.2. Mapping was performed using NGMLR (v.0.2.7)[ 32 ] with default parameters. SV calling was performed using CuteSV (v.2.0.3),[ 30 ] SVIM (v.1.4.0),[ 31 ] and Sniffles (v.2.2).[ 32 ] Each SV required at least three supporting reads and a minimum length of 50 bp. Specifically, CuteSV was run with the parameters “–max_cluster_bias_INS 100 –diff_ratio_merging_INS 0.3 –max_cluster_bias_DEL 100 –diff_ratio_merging_DEL 0.3 –min_support 3 –min_size 50 –genotype –report_readid –sample.” The following parameters were set for SVIM: “–min_sv_size 50 –min_mapq 20 –minimum_depth 3 –insertion_sequences –sequence_alleles –read_names –sample.” Sniffles was run with the parameters “–minsupport 3 –output‐rnames”, and positions marked as IMPRECISE in INFO or as UNRESOLVED in FILTER were removed. SURVIVOR (v.1.0.7)[ 108 ] was used to merge the SVs supported by two or three calling methods, with a maximum allowed pairwise distance of 500 bp between breakpoints. Individuals were genotyped using the force calling function of Sniffles with the –genotype‐vcf option. Multisample VCF files were merged and deduplicated using BCFtools (v.1.17)[ 109 ] and the “collapse” module in Truvari software.[ 110 ] After filtering out unreliable genotypes (all 0/0), a total of 222 528 SVs were obtained.
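The consensus step (keeping SVs supported by at least two of the three callers, with breakpoints no more than 500 bp apart) can be illustrated with a small greedy clustering routine. This is a simplified Python sketch of the idea only; it is not SURVIVOR's algorithm, and the `SVCall` record and `merge_calls` function are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical minimal record for one SV call from one caller.
SVCall = namedtuple("SVCall", "caller chrom svtype start end")


def merge_calls(calls, max_dist=500, min_callers=2):
    """Greedily cluster calls of the same type on the same chromosome whose
    start and end breakpoints both lie within `max_dist` bp of the first call
    in a cluster, then keep clusters supported by >= `min_callers` callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]  # first call acts as the cluster representative
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and abs(rep.start - call.start) <= max_dist
                    and abs(rep.end - call.end) <= max_dist):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]
```

In this toy version a call seen by only one caller is dropped, mirroring the requirement for support from two or three calling methods.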
These 222 528 SVs were genotyped in 281 cattle and 12 individuals from other bovine species with available SRS data. Reads were mapped to ARS‐UCD1.2 using BWA‐MEM (v.0.7.13‐r1126) with default parameters. Paragraph (v.2.4a)[ 111 ] was used to genotype the SVs from the SRS data. BCFtools was used to combine the results for all the genotypes. Genotypes that did not pass the Paragraph filters were then set to missing (./.), and SVs without any remaining nonreference genotypes were excluded.
Following the methodology outlined in a previous study,[ 41 ] the midpoint of each SV was used to calculate SV hotspots via a publicly available script (https://github.com/daewoooo/primatR/blob/master/R/hotspotter.R) with minor modifications.
Detection of Selective Signals
To detect SVs with highly differentiated allele frequencies between QTP cattle (Bailang, Dingjie, Diqing, Langkazi, and Yushu) and low‐altitude cattle (Fuzhou, Yanbian, Luxi and Simmental), the di values for 222528 SVs were calculated from the 84 LRS datasets as described in previous studies.[ 43 , 44 ] Chaidamu and Mongolian cattle were not included in either of the two groups because their altitudes were between 800 and 3000 m, which may have affected the results. For the SV‐F ST, the Weir and Cockerham estimator was employed for the F ST estimates via VCFtools (v.0.1.16)[ 112 ] to identify population‐stratified SVs in six comparisons: all QTP cattle versus low‐altitude cattle; Bailang versus low‐altitude cattle; Dingjie versus low‐altitude cattle; Diqing versus low‐altitude cattle; Langkazi versus low‐altitude cattle; and Yushu versus low‐altitude cattle. SVs with P values less than 0.001 (Z test) were selected as di ‐SVs.
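The di statistic is commonly defined, for a focal population i, as the sum over the other populations j of the standardized pairwise F ST at each variant, di = Σ (F ST(i,j) − mean) / sd, where the mean and standard deviation are taken across all variants for that pair. The sketch below follows that general definition and adds an upper‐tail Z test; the input layout, the function names, and the one‐sided test are assumptions of this illustration rather than details confirmed by the text.

```python
import math
from statistics import mean, pstdev


def di_scores(fst_by_pair):
    """fst_by_pair maps a comparison name to a list of per-SV F_ST values
    (same SV order in every list). Returns one d_i value per SV."""
    n = len(next(iter(fst_by_pair.values())))
    di = [0.0] * n
    for values in fst_by_pair.values():
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            raise ValueError("F_ST values are constant; cannot standardize")
        for k, v in enumerate(values):
            di[k] += (v - mu) / sd  # standardized F_ST for this comparison
    return di


def upper_tail_p(values):
    """Standardize values to Z scores and return one-sided (upper-tail)
    P values under a normal approximation."""
    mu, sd = mean(values), pstdev(values)
    return [0.5 * math.erfc((v - mu) / sd / math.sqrt(2.0)) for v in values]
```

With real data, SVs whose P value falls below 0.001 would be the candidates retained as highly differentiated.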
Annotation of SVs
Consensus sequences of SVs were extracted and compared with consensus repeat sequences of mammals using RepeatMasker (v.4.0.5) (http://www.repeatmasker.org) to determine whether they were repetitive sequences. Specifically, for DELs and DUPs, BEDTools (v.2.25.0)[ 113 ] was used to intersect these segments with the repetitive sequences of ARS‐UCD1.2, requiring a minimum reciprocal overlap of 80%. For INSs, the sequences were extracted and annotated using RepeatMasker. INSs were classified as repetitive sequences if the query sequence aligned completely, allowing for a maximum of 20 bp unaligned regions at the ends (left < 20). The functional regions of the SVs in the genomes were annotated using ANNOVAR.[ 33 ] KEGG and GO enrichment analyses were performed for SV‐linked genes in population stratification, and functional categories with corrected P values less than 0.05 were considered significantly enriched.
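The 80% reciprocal overlap criterion means that the shared interval must cover at least 80% of the SV and at least 80% of the repeat annotation. A minimal Python sketch of this check is given below, assuming half‐open (BED‐style) coordinates; the function names are illustrative and do not correspond to a BEDTools interface.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions for intervals
    [a_start, a_end) and [b_start, b_end)."""
    len_a, len_b = a_end - a_start, b_end - b_start
    if len_a <= 0 or len_b <= 0:
        return 0.0
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(overlap / len_a, overlap / len_b)


def is_repeat_derived(sv, repeat, min_fraction=0.8):
    """sv and repeat are (start, end) tuples on the same chromosome."""
    return reciprocal_overlap(sv[0], sv[1], repeat[0], repeat[1]) >= min_fraction
```

A 100‐bp deletion lying entirely inside a 900‐bp repeat would not qualify, because the overlap covers only about 11% of the repeat.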
ATAC‐seq Analysis
Some of the ATAC‐seq data used in this study were generated through our own sequencing experiments, whereas others were obtained from publicly available NCBI datasets (Table S14, Supporting Information). After the adapters were checked for and trimmed with Trim Galore (v.0.6.10) (https://github.com/FelixKrueger/TrimGalore), the ATAC‐seq clean reads were mapped to ARS‐UCD1.2 using Bowtie2 (v.2.4.5).[ 114 ] BAM files were sorted using SAMtools (v.1.3),[ 115 ] and duplicate alignments were removed with Picard (v.2.20.2) (http://broadinstitute.github.io/picard). The peaks for individual replicates were called separately with MACS2 (v.2.2.7.1)[ 116 ] and then merged across three replicates within the same tissue using BEDTools (v.2.25.0).[ 113 ] An R package obtained from https://github.com/junjunlab was used for plotting.
Dual‐Luciferase Assay
For the 102‐bp DEL of the GNPAT gene, sequences containing the entire SV and the 360‐bp flanking sequences were extracted from the cattle genome (ARS‐UCD1.2) and inserted into the pGL3‐Promoter vector. The HEK293T cell line was purchased from the National Science & Technology Infrastructure (Shanghai, China). Transfection was performed when the cell density in the 12‐well plate reached approximately 80%. Lipofectamine 2000 (Invitrogen, CA, USA) was mixed with plasmids (recombinant plasmid and pRL‐TK plasmid vector) at a ratio of 1:1, incubated for 20 min, and then added to the cells. The dual‐luciferase reporter assay kit was purchased from Promega (Wisconsin, USA), and the experiments were conducted according to the manufacturer's instructions. After cell lysis, the lysates were transferred to a 96‐well plate, and two substrates were sequentially added. Detection was then performed using a Multiskan system (TECAN, Spark, Switzerland). Each experiment was independently performed three times. All the data are presented as the means with standard deviations, and significant differences between groups were examined via two‐sided Student's t test. A P value < 0.05 was considered to indicate a significant difference.
Discovery of the Introgressed SVs
Yak LRS data were downloaded from the NCBI to identify SVs, including six domestic yak and six wild yak (Table S8, Supporting Information). Using SURVIVOR software, 12 yak and 84 cattle vcf files were merged, and the subsequent genotyping workflow was the same as that used for “SV discovery and genotyping”. After filtering out unreliable genotypes (all 0/0), a total of 250156 SVs were obtained. First, using LRS data, SVs specific to QTP cattle and fixed in yak genomes but absent from low‐altitude cattle populations (Fuzhou, Yanbian, Luxi, and Simmental) were investigated. A total of 7293 autosomal SVs possibly derived from yak introgression were identified in QTP cattle (Figure 4a). To further validate the SVs, haplotype trees were constructed using SNPs in the 50‐kb flanking regions of each SV, using a SNP dataset of 148 QTP (36 LRS and 112 SRS) samples and 16 reference SRS samples.
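The first screening criterion (present in QTP cattle, fixed in yak, absent from low‐altitude cattle) can be written as a filter on alternative‐allele frequencies. The sketch below is a hypothetical Python illustration: genotypes are encoded as alternative‐allele counts (0, 1, 2) with None for missing calls, and treating SVs with no called genotypes in a group as non‐candidates is our assumption.

```python
def alt_allele_freq(genotypes):
    """Alternative-allele frequency from diploid allele counts (0/1/2);
    None entries are missing genotypes and are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def is_candidate_introgressed(qtp, yak, lowland):
    """True if the SV segregates in QTP cattle, is fixed in yak,
    and is absent from low-altitude cattle."""
    f_qtp, f_yak, f_low = (alt_allele_freq(g) for g in (qtp, yak, lowland))
    if f_qtp is None or f_yak is None or f_low is None:
        return False  # assumption: uninformative groups disqualify the SV
    return f_qtp > 0 and f_yak == 1.0 and f_low == 0.0
```

Candidates passing such a filter would still need the haplotype‐based validation described above, since shared ancestral polymorphism can produce the same pattern.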
To expand the population size, the SVs were further genotyped using 293 SRS data, including 112 QTP cattle and 12 high‐coverage individuals from other bovine species (Table S16, Supporting Information). Paragraph (v.2.4a)[ 111 ] was used to genotype the SVs from the SRS data. Cattle from low‐altitude taurine breeds/populations (African taurine cattle, European taurine cattle, and East Asian taurine cattle) were used as control populations for genotyping (Table S18, Supporting Information). Then SVs that were shared with yak genomes but not present in other cattle samples were searched, and 2334 candidate introgressed tract SVs remained after SRS genotyping. The –maf parameter of VCFtools (v.0.1.16)[ 112 ] with the –freq parameter was used to calculate the allelic frequency in different breeds/populations. IQTREE (v.1.6.6)[ 117 ] was used to construct a phylogenetic tree for the introgressed yak regions. ModelFinder[ 118 ] was used to identify the best model of the phylogenetic tree.
Hi‐C Data Preprocessing and Normalization
Hi‐C read pairs were processed using the HiC‐Pro pipeline[ 119 ] with default parameters. After removing duplicates, valid pairs were summed across three technical replicates and used to generate an interchromosome contact matrix. The normalized observed contact matrices were generated using the Knight‒Ruiz algorithm within the Juicer toolkit to eliminate intrinsic biases within the matrices. Quantile normalization was subsequently performed using the BNBC R package (v.1.0.0) to mitigate biases between samples. The correlation between normalized matrices of technical replicates was calculated using GenomeDISCO[ 120 ] (Figure S20, Supporting Information).
A/B Compartment Determination and Analysis
A/B compartments at 20 kb resolution were identified using PCA and the A‒B index as described in a previous study.[ 121 ] In brief, observed/expected contact matrices were obtained by normalizing intrachromosomal observed contact matrices using KR normalization and quantile normalization. For loci i and j, the expected contact frequency was determined as the median observed contact frequency at the same genomic distance. The observed/expected contact matrices were then calculated by dividing the observed contact frequency by the expected contact frequency for each locus pair. The “prcomp” function in R (v.4.1.2) was applied with default parameters to these matrices to generate PC1 vectors. The gene density for each bin was defined as the number of transcriptional start sites (TSSs) within ±100 kb of the bin. Compartments A and B were identified by calculating the Pearson's correlation between PC1 values and gene density for each chromosome using the “cor.test” function in R (v.4.1.2). Positive PC1 values were assigned to compartment A, and negative values were assigned to compartment B if the correlation was positive. The assignment was reversed if the correlation was negative. The A‐B index was subsequently calculated at a 20 kb resolution following previously described methods, representing the likelihood of a sequence interacting with either the A or B compartment.[ 121 ] Bins 20 kb in length with a positive A‐B index were classified as A compartments, whereas those with a negative A‐B index were classified as B compartments.
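Two steps of this procedure, the distance normalization and the sign orientation of PC1, can be sketched in a few lines of Python. The example is illustrative only: it works on a small symmetric matrix held as nested lists, does not compute PC1 itself (the study used R for that), and all function names are hypothetical.

```python
import math
from statistics import median


def observed_over_expected(matrix):
    """Divide each contact by the median contact at the same bin distance."""
    n = len(matrix)
    expected = {d: median(matrix[i][i + d] for i in range(n - d)) for d in range(n)}
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def assign_compartments(pc1, gene_density):
    """Label bins A/B; the sign of PC1 is flipped when PC1 correlates
    negatively with gene density, as described in the text."""
    sign = 1 if pearson(pc1, gene_density) >= 0 else -1
    return ["A" if sign * v > 0 else "B" for v in pc1]
```

Because the sign of a principal component is arbitrary, the orientation step ensures that gene‐rich bins are consistently labeled as compartment A on every chromosome.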
TAD Identification and Loop Calling
TADs were identified via a previously reported insulation score (IS)[ 122 ] method based on the KR and quantile‐normalized intrachromosomal observed contact matrices at a 20‐kb resolution. The IS reflects the aggregate interactions passing across each bin, and IS boundaries were identified using the public script matrix2insulation.pl (https://github.com/dekkerlab/cworld‐dekker) with the following parameters: ‐v, ‐is 260 000, ‐ids 200 000, ‐im mean, ‐nt 0.1, and ‐bmoe 0. Fit‐Hi‐C software (v.2.0.8)[ 123 ] was used with default parameters to identify loops at a 5‐kb resolution based on the KR and quantile‐normalized intrachromosomal contact matrices, with a q value < 1×10−8 used as a cutoff.
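The insulation score idea can be illustrated with a toy implementation: for each bin, contacts between the bins immediately upstream and those starting at the bin itself are averaged, and local minima mark candidate boundaries. This is a simplified Python sketch under our own conventions (the score at bin i describes the junction between bins i − 1 and i); it does not reproduce the additional normalization and boundary‐filtering options of matrix2insulation.pl. At 20‐kb resolution, the 260‐kb square used in the study would correspond to a window of 13 bins.

```python
def insulation_scores(matrix, window_bins):
    """Mean contact frequency between the `window_bins` bins upstream of bin i
    and the `window_bins` bins starting at bin i. Bins too close to the matrix
    edge receive None."""
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window_bins < 0 or i + window_bins > n:
            scores.append(None)
            continue
        values = [matrix[a][b]
                  for a in range(i - window_bins, i)
                  for b in range(i, i + window_bins)]
        scores.append(sum(values) / len(values))
    return scores


def local_minima(scores):
    """Indices whose score is strictly lower than both neighbors."""
    return [i for i in range(1, len(scores) - 1)
            if None not in (scores[i - 1], scores[i], scores[i + 1])
            and scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]
```

On a matrix with two dense blocks and sparse contacts between them, the only local minimum falls at the first bin of the second block, which is where a TAD boundary would be placed.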
GWAS and FST Analysis of Gray Cattle
For both the SNP‐GWAS and SNP‐FST analyses, read mapping and SNP calling for the SRS data were performed as described in the previous “Variant calling evaluation” section. To identify the locus associated with gray coat color, SRS data from 23 gray cattle and 258 non‐gray cattle were used. SNP‐GWAS analysis was conducted on 62 091 927 SNPs using GEMMA.[ 124 ] The genome‐wide distribution of FST values was estimated using VCFtools (v.0.1.16)[ 112 ] with a 50‐kb window size and 25‐kb step to examine pairwise genetic differentiation between 23 gray and 29 non‐gray cattle. For the SV‐GWAS analysis, the five gray cattle and the remaining 79 non‐gray cattle from the LRS dataset were analyzed with GEMMA. For SV‐FST, five gray cattle and 11 non‐gray cattle were used to calculate the FST value for each SV.
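The overlapping window layout used for the FST scan (50‐kb windows advanced in 25‐kb steps) can be sketched as follows. This is only an illustration of the windowing scheme in Python; the function names are hypothetical, the 1‐based inclusive coordinates are an assumption, and the FST estimator itself is left to VCFtools.

```python
def sliding_windows(chrom_length, window=50_000, step=25_000):
    """Yield 1-based inclusive (start, end) windows along a chromosome."""
    start = 1
    while start <= chrom_length:
        yield start, min(start + window - 1, chrom_length)
        start += step

def windows_containing(pos, chrom_length, window=50_000, step=25_000):
    """Return every window that overlaps a given position."""
    return [(s, e) for s, e in sliding_windows(chrom_length, window, step)
            if s <= pos <= e]
```

Because the step is half the window size, most positions contribute to two adjacent windows, which smooths the genome‐wide profile.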
SV Validation
Visualization of the detected SVs was performed using IGV (v.2.2).[ 125 ] To obtain more reliable detection outputs for inversions, npInv (v.1.24)[ 68 ] was used to detect and genotype inversion variants. To further verify the authenticity and heterozygosity of the 2‐Mb INV and two associated TRAs, breakpoint‐spanning primers were designed and PCR validation was performed in an additional 23 gray cattle. Three primer pairs were used to independently validate each SV: Primer1 (F: 5′‐GTTGACCCTTTCTTTCCA‐3′; R: 5′‐GACCCTCCTTTCTATCCC‐3′) targeted TRA (BTA17:47058249‐BTA6:69764591); Primer2 (F: 5′‐ATCTTAGCTGTAGCATGT‐3′; R: 5′‐ACTTGTAGGTTACTGGTT‐3′) spanned the INV (BTA6:69722181‐71822202) breakpoints; and Primer3 (F: 5′‐TCCTTCCCTAAGATTCAGA‐3′; R: 5′‐GAGCCCTTGACCCACTA‐3′) amplified TRA (BTA6:70471991‐BTA17:46961621).
For the four introgressed SVs, four primer pairs were designed: Primer4 (F: 5′‐AAGACGCTTGCTCC‐3′; R: 5′‐CTTCTCCAAAACCACA‐3′) targeted DEL (BTA28:4236113‐4237437); Primer5 (F: 5′‐CTGGTCCCATCACTTC‐3′; R: 5′‐TTGCCATACCTCATTTT‐3′) spanned the INS (BTA28:4204966) breakpoint; Primer6 (F: 5′‐CTTAGGACTTCAGGGA‐3′; R: 5′‐GCTCATCATAGGTTTGTA‐3′) amplified DEL (BTA9:87336847‐87336904); and Primer7 (F: 5′‐ACCCTGTACCTCTTAGCA‐3′; R: 5′‐GACATGACTGAGCGACTG‐3′) targeted DEL (BTA2:19540699‐19540751) (Figure S21, Supporting Information).
PCR was performed in 25 µL reactions with 1 µL of genomic DNA (50 ng), 1 µL each of forward and reverse primers (10 µM), and 22 µL of Golden Star T6 Super PCR mix (Beijing Tsingke Biotech Co., Beijing, China). The PCR products were examined using 1.5% agarose gel electrophoresis. The presence and size of the amplified fragments were determined and used to infer the genotype of the inversion.
Immunohistochemistry and H&E Staining
Skin samples were fixed in 4% paraformaldehyde, washed in PBS, dehydrated and embedded in paraffin. The rabbit polyclonal antibody anti‐KIT (D260893, Sangon) was used for immunohistochemical staining.
Linkage Disequilibrium
LD was assessed across the region containing the 2‐Mb INV (BTA6:69770000‐71780000) and the extended region (BTA6:67770000‐73780000) using samples from both gray and non‐gray cattle. The gray cattle group comprised 28 samples from Dingjie (n = 13), Langkazi (n = 14), and Bailang cattle (n = 1) in the QTP. The non‐gray cattle group consisted of Dingjie (n = 29), Diqing (n = 19), and Yushu cattle (n = 32) from the QTP. A total of 8000 SNPs were randomly sampled from the VCF file, and haplotype blocks were analyzed using LDBlockShow[ 126 ] with the parameter “-SeleVar 2”.
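The subsampling and pairwise LD steps can be illustrated with a short Python sketch. It does not reproduce LDBlockShow: the function names and the fixed seed are hypothetical, and r² is approximated here as the squared Pearson correlation of unphased genotype dosages (0, 1, or 2 copies of the alternate allele), which is an assumption rather than the statistic reported by the tool.

```python
import numpy as np

def sample_snps(n_snps, n_keep=8000, seed=0):
    """Randomly pick SNP indices without replacement, returned in genomic order."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(n_snps, size=min(n_keep, n_snps), replace=False)
    return np.sort(keep)

def dosage_r2(genotypes):
    """Pairwise r^2 between SNPs from a samples x SNPs dosage matrix.

    Monomorphic SNPs have zero variance and yield NaN entries.
    """
    r = np.corrcoef(np.asarray(genotypes, dtype=float), rowvar=False)
    return r ** 2
```

Sorting the sampled indices keeps the SNPs in positional order, so blocks of high r² along the diagonal of the resulting matrix still correspond to contiguous genomic segments.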
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
X.X., F.W., X.L., S.L. and L.Y. contributed equally to this work. N.C., C.L., W.L., and B.H. designed and supervised the project. X.X., F.W., X.L., S.L. and L.Y. performed most of the analyses. K.Q., R.S., J.L., J.Z., B.W., B.Z., S.Q., L.Z., S.W., C. Luobu, N. Cangjue, D.L., S.S., Z.M., R.L., Z.W., X.Y., and H.C. collected the samples. Y.Z. and L.D. conducted the experiments. X.X., F.W. and N.C. wrote the manuscript with input from all the authors. N.C., J.A.L., J.H., L.X., H.H., W.L., Z.Z., Y.W., Y.G., R.D., Y.H. and X.L. revised the manuscript.
Supporting information
Supporting Information
Supplemental Table 1
Acknowledgements
This work was supported by the National Key R&D Program of China (2021YFF1001000 and 2021YFD1200400), the National Natural Science Foundation of China (32372854, 32341054, 32102523 and 32260823), the Key Research and Development Program of Xizang Autonomous Region of China (XZ202301ZY0008N), the Yunnan Expert Workstations (202305AF150156), the China Agriculture Research System of MOF and MARA (CARS‐37), the Postdoctoral Fellowship Program of CPSF (GZC20232149), the Postdoctoral Research Project Funding of Shaanxi Province (2023BSHEDZZ132), the Open Project of State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University (2024‐KF‐02), the Program of Yunling Scholar and Yunling Cattle Special Program of Yunnan Joint Laboratory of Seeds and Seeding Industry (202205AR070001), the Construction of Yunling Cattle Technology Innovation Center and Industrialization of Achievements (2019ZG007), and Chuxiong Science and Technology Leading Talents (CXKJLJRC2023‐07). The authors thank Lijing Tang and Yafei Mao for their valuable suggestions, and the High‐Performance Computing (HPC) of Northwest A&F University (NWAFU) for providing the computing resources.
Xia X., Wang F., Luo X., Li S., Lyu Y., Zheng Y., Ma Z., Qu K., Song R., Liu J., Zhang J., Wangdui B., Zhuzha B., Quji S., Zhao L., Wangmu S., Luobu C., Cangjue N., Luosang D., Sizhu S., Cheng H., Li R., Wu Z., Dang R., Huang Y., Lan X., Xu L., Hu H., Low W., Zheng Z., Wang Y., Gao Y., Deng L., Lenstra J. A., Han J., Yang X., Lyu W., Huang B., Lei C., Chen N., Structural Variations Associated with Adaptation and Coat Color in Qinghai‐Tibetan Plateau Cattle. Adv. Sci. 2025, 12, e03258. 10.1002/advs.202503258
Contributor Information
Wenfa Lyu, Email: lvwenfa@jlau.edu.cn.
Bizhi Huang, Email: hbz@ynbp.cn.
Chuzhao Lei, Email: leichuzhao1118@nwafu.edu.cn.
Ningbo Chen, Email: ningbochen@nwafu.edu.cn.
Data Availability Statement
The data that support the findings of this study are openly available in NCBI at https://dataview.ncbi.nlm.nih.gov/object/PRJNA1158668?reviewer=k781773a0ej2l1mnj2j9j41a8l, reference number 1158668.
References
- 1. Qiu Q., Zhang G., Ma T., Qian W., Wang J., Ye Z., Cao C., Hu Q., Kim J., Larkin D. M., Auvil L., Capitanu B., Ma J., Lewin H. A., Qian X., Lang Y., Zhou R., Wang L., Wang K., Xia J., Liao S., Pan S., Lu X., Hou H., Wang Y., Zang X., Yin Y., Ma H., Zhang J., Wang Z., et al., Nat. Genet. 2012, 44, 946.
- 2. Wang G. D., Fan R. X., Zhai W., Liu F., Wang L., Zhong L., Wu H., Yang H. C., Wu S. F., Zhu C. L., Li Y., Gao Y., Ge R. L., Wu C. I., Zhang Y. P., Genome Biol Evol 2014, 6, 2122.
- 3. Gou X., Wang Z., Li N., Qiu F., Xu Z., Yan D., Yang S., Jia J., Kong X., Wei Z., Lu S., Lian L., Wu C., Wang X., Li G., Ma T., Jiang Q., Zhao X., Yang J., Liu B., Wei D., Li H., Yang J., Yan Y., Zhao G., Dong X., Li M., Deng W., Leng J., Wei C., et al., Genome Res. 2014, 24, 1308.
- 4. Liu X., Zhang Y., Li Y., Pan J., Wang D., Chen W., Zheng Z., He X., Zhao Q., Pu Y., Guan W., Han J., Orlando L., Ma Y., Jiang L., Mol. Biol. Evol. 2019, 36, 2591.
- 5. Li M., Tian S., Jin L., Zhou G., Li Y., Zhang Y., Wang T., Yeung C. K., Chen L., Ma J., Zhang J., Jiang A., Li J., Zhou C., Zhang J., Liu Y., Sun X., Zhao H., Niu Z., Lou P., Xian L., Shen X., Liu S., Zhang S., Zhang M., Zhu L., Shuai S., Bai L., Tang G., Liu H., et al., Nat. Genet. 2013, 45, 1431.
- 6. Wang M. S., Li Y., Peng M. S., Zhong L., Wang Z. J., Li Q. Y., Tu X. L., Dong Y., Zhu C. L., Wang L., Yang M. M., Wu S. F., Miao Y. W., Liu J. P., Irwin D. M., Wang W., Wu D. D., Zhang Y. P., Mol. Biol. Evol. 2015, 32, 1880.
- 7. Wu D. D., Yang C. P., Wang M. S., Dong K. Z., Yan D. W., Hao Z. Q., Fan S. Q., Chu S. Z., Shen Q. S., Jiang L. P., Li Y., Zeng L., Liu H. Q., Xie H. B., Ma Y. F., Kong X. Y., Yang S. L., Dong X. X., Esmailizadeh A., Irwin D. M., Xiao X., Li M., Dong Y., Wang W., Shi P., Li H. P., Ma Y. H., Gou X., Chen Y. B., Zhang Y. P., Natl. Sci. Rev. 2020, 7, 952.
- 8. Zhang W., Fan Z., Han E., Hou R., Zhang L., Galaverni M., Huang J., Liu H., Silva P., Li P., Pollinger J. P., Du L., Zhang X., Yue B., Wayne R. K., Zhang Z., PLoS Genet. 2014, 10, 1004466.
- 9. Simonson T. S., Yang Y., Huff C. D., Yun H., Qin G., Witherspoon D. J., Bai Z., Lorenzo F. R., Xing J., Jorde L. B., Prchal J. T., Ge R., Science 2010, 329, 72.
- 10. Loftus R. T., MacHugh D. E., Bradley D. G., Sharp P. M., Cunningham P., Proc Natl Acad Sci U S A 1994, 91, 2757.
- 11. Vigne J.‐D., Peters J., Helmer D., The first steps of animal domestication, Oxbow Books, Oxford: 2005.
- 12. Decker J. E., McKay S. D., Rolf M. M., Kim J., Molina Alcalá A., Sonstegard T. S., Hanotte O., Götherström A., Seabury C. M., Praharani L., Babar M. E., Correia de Almeida Regitano L., Yildiz M. A., Heaton M. P., Liu W. S., Lei C. Z., Reecy J. M., Saif‐Ur‐Rehman M., Schnabel R. D., Taylor J. F., PLoS Genet. 2014, 10, 1004254.
- 13. Chen N., Cai Y., Chen Q., Li R., Wang K., Huang Y., Hu S., Huang S., Zhang H., Zheng Z., Song W., Ma Z., Ma Y., Dang R., Zhang Z., Xu L., Jia Y., Liu S., Yue X., Deng W., Zhang X., Sun Z., Lan X., Han J., Chen H., Bradley D. G., Jiang Y., Lei C., Nat. Commun. 2018, 9, 2337.
- 14. Wu D. D., Ding X. D., Wang S., Wójcik J. M., Zhang Y., Tokarska M., Li Y., Wang M. S., Faruque O., Nielsen R., Zhang Q., Zhang Y. P., Nat Ecol Evol 2018, 2, 1139.
- 15. Medugorac I., Graf A., Grohs C., Rothammer S., Zagdsuren Y., Gladyr E., Zinovieva N., Barbieri J., Seichter D., Russ I., Eggen A., Hellenthal G., Brem G., Blum H., Krebs S., Capitan A., Nat. Genet. 2017, 49, 470.
- 16. Zhang K., Lenstra J. A., Zhang S., Liu W., Liu J., Anim Genet 2020, 51, 637.
- 17. Xia X. T., Achilli A., Lenstra J. A., Tong B., Ma Y., Huang Y. Z., Han J. L., Sun Z. Y., Chen H., Lei C. Z., Hu S. M., Chen N. B., Heredity (Edinb) 2021, 126, 1000.
- 18. Chen N., Xia X., Hanif Q., Zhang F., Dang R., Huang B., Lyu Y., Luo X., Zhang H., Yan H., Wang S., Wang F., Chen J., Guan X., Liu Y., Li S., Jin L., Wang P., Sun L., Zhang J., Liu J., Qu K., Cao Y., Sun J., Liao Y., Xiao Z., Cai M., Mu L., Siddiki A. Z., Asif M., et al., Nat. Commun. 2023, 14, 7803.
- 19. Lyu Y., Wang F., Cheng H., Han J., Dang R., Xia X., Wang H., Zhong J., Lenstra J. A., Zhang H., Han J., MacHugh D. E., Medugorac I., Upadhyay M., Leonard A. S., Ding H., Yang X., Wang M. S., Quji S., Zhuzha B., Quzhen P., Wangmu S., Cangjue N., Wa D., Ma W., Liu J., Zhang J., Huang B., Qi X., Li F., et al., Sci. Bull. (Beijing) 2024, 69, 3415.
- 20. He Y., Lou H., Cui C., Deng L., Gao Y., Zheng W., Guo Y., Wang X., Ning Z., Li J., Li B., Bai C., Liu S., Wu T., Xu S., Qi X., Su B., Natl. Sci. Rev. 2020, 7, 391.
- 21. Chiang C., Scott A. J., Davis J. R., Tsang E. K., Li X., Kim Y., Hadzic T., Damani F. N., Ganel L., Montgomery S. B., Battle A., Conrad D. F., Hall I. M., Nat. Genet. 2017, 49, 692.
- 22. Quan C., Li Y., Liu X., Wang Y., Ping J., Lu Y., Zhou G., Genome Biol. 2021, 22, 159.
- 23. Lenstra J. A., Animal Research and One Health 2024, 2, 360.
- 24. Li X., Liu Q., Fu C., Li M., Li C., Li X., Zhao S., Zheng Z., J Genet Genomics 2024, 51, 394.
- 25. Cheng H., Concepcion G. T., Feng X., Zhang H., Li H., Nat. Methods 2021, 18, 170.
- 26. Ondov B. D., Treangen T. J., Melsted P., Mallonee A. B., Bergman N. H., Koren S., Phillippy A. M., Genome Biol. 2016, 17, 132.
- 27. Rhie A., Walenz B. P., Koren S., Phillippy A. M., Genome Biol. 2020, 21, 245.
- 28. Lou H., Gao Y., Xie B., Wang Y., Zhang H., Shi M., Ma S., Zhang X., Liu C., Xu S., Cell Syst 2022, 13, 321.
- 29. Schiffels S., Durbin R., Nat. Genet. 2014, 46, 919.
- 30. Jiang T., Liu Y., Jiang Y., Li J., Gao Y., Cui Z., Liu Y., Liu B., Wang Y., Genome Biol. 2020, 21, 189.
- 31. Heller D., Vingron M., Bioinformatics 2019, 35, 2907.
- 32. Sedlazeck F. J., Rescheneder P., Smolka M., Fang H., Nattestad M., von Haeseler A., Schatz M. C., Nat. Methods 2018, 15, 461.
- 33. Wang K., Li M., Hakonarson H., Nucleic Acids Res. 2010, 38, 164.
- 34. Dai X., Bian P., Hu D., Luo F., Huang Y., Jiao S., Wang X., Gong M., Li R., Cai Y., Wen J., Yang Q., Deng W., Nanaei H. A., Wang Y., Wang F., Zhang Z., Rosen B. D., Heller R., Jiang Y., Genome Res. 2023, 33, 1284.
- 35. Kelly C. J., Chitko‐McKown C. G., Chuong E. B., Genome Res. 2022, 32, 1474.
- 36. Leonard A. S., Mapel X. M., Pausch H., Genome Res. 2024, 34, 300.
- 37. Li R., Gong M., Zhang X., Wang F., Liu Z., Zhang L., Yang Q., Xu Y., Xu M., Zhang H., Zhang Y., Dai X., Gao Y., Zhang Z., Fang W., Yang Y., Fu W., Cao C., Yang P., Ghanatsaman Z. A., Negari N. J., Nanaei H. A., Yue X., Song Y., Lan X., Deng W., Wang X., Pan C., Xiang R., Ibeagha‐Awemu E. M., et al., Genome Res. 2023, 33, 463.
- 38. Li Z., Liu X., Wang C., Li Z., Jiang B., Zhang R., Tong L., Qu Y., He S., Chen H., Mao Y., Li Q., Pook T., Wu Y., Zan Y., Zhang H., Li L., Wen K., Chen Y., Genome Res. 2023, 33, 1833.
- 39. Crysnanto D., Leonard A. S., Fang Z. H., Pausch H., Proc Natl Acad Sci U S A 2021, 118, 2101056118.
- 40. Talenti A., Powell J., Hemmink J. D., Cook E. A. J., Wragg D., Jayaraman S., Paxton E., Ezeasor C., Obishakin E. T., Agusi E. R., Tijjani A., Marshall K., Fisch A., Ferreira B. R., Qasim A., Chaudhry U., Wiener P., Toye P., Morrison L. J., Connelley T., Prendergast J. G. D., Nat. Commun. 2022, 13, 910.
- 41. Ebert P., Audano P. A., Zhu Q., Rodriguez‐Martin B., Porubsky D., Bonder M. J., Sulovari A., Ebler J., Zhou W., Serra Mari R., Yilmaz F., Zhao X., Hsieh P., Lee J., Kumar S., Lin J., Rausch T., Chen Y., Ren J., Santamarina M., Höps W., Ashraf H., Chuang N. T., Yang X., Munson K. M., Lewis A. P., Fairley S., Tallon L. J., Clarke W. E., Basile A. O., et al., Science 2021, 372, abf7117.
- 42. Li T. T., Xia T., Wu J. Q., Hong H., Sun Z. L., Wang M., Ding F. R., Wang J., Jiang S., Li J., Pan J., Yang G., Feng J. N., Dai Y. P., Zhang X. M., Zhou T., Li T., Nat. Commun. 2023, 14, 6601.
- 43. Akey J. M., Ruhe A. L., Akey D. T., Wong A. K., Connelly C. F., Madeoy J., Nicholas T. J., Neff M. W., Proc Natl Acad Sci U S A 2010, 107, 1160.
- 44. Feng Y., Xie N., Inoue F., Fan S., Saskin J., Zhang C., Zhang F., Hansen M. E. B., Nyambo T., Mpoloka S. W., Mokone G. G., Fokunang C., Belay G., Njamnshi A. K., Marks M. S., Oancea E., Ahituv N., Tishkoff S. A., Nat. Genet. 2024, 56, 258.
- 45. Zheng W., He Y., Guo Y., Yue T., Zhang H., Li J., Zhou B., Zeng X., Li L., Wang B., Cao J., Chen L., Li C., Li H., Cui C., Bai C., Baimakangzhuo, Qi X., Ouzhuluobu, Su B., Genome Biol. 2023, 24, 73.
- 46. Torres R. A., Drake D. A., Solodushko V., Jadhav R., Smith E., Rocic P., Weber D. S., Arterioscler Thromb Vasc Biol 2011, 31, 2424.
- 47. Wang Y., Liu X., Xie B., Yuan H., Zhang Y., Zhu J., Redox Biol. 2020, 28, 101313.
- 48. Cocco L., Manzoli L., Faenza I., Ramazzotti G., Yang Y. R., McCubrey J. A., Suh P. G., Follo M. Y., Adv. Biol. Regul. 2016, 60, 1.
- 49. Pinhal M. A. S., Melo C. M., Nader H. B., Adv. Exp. Med. Biol. 2020, 1221, 821.
- 50. Elkin M., Ilan N., Ishai‐Michaeli R., Friedmann Y., Papo O., Pecker I., Vlodavsky I., FASEB J. 2001, 15, 1661.
- 51. Zhao P., Zhao F., Hu J., Wang J., Liu X., Zhao Z., Xi Q., Sun H., Li S., Luo Y., Front Physiol 2022, 13, 885444.
- 52. Zhu S., Guo T., Zhao H., Qiao G., Han M., Liu J., Yuan C., Wang T., Li F., Yue Y., Yang B., Front. Genet. 2020, 11, 848.
- 53. Hu J., Hameed M. R., Agaram N. P., Whiting K. A., Qin L. X., Villano A. M., O'Connor R. B., Rozenberg J. M., Cohen S., Prendergast K., Kryeziu S., White R. L. Jr., Posner M. C., Socci N. D., Gounder M. M., Singer S., Crago A. M., Clin. Cancer Res. 2024, 30, 450.
- 54. Jain I. H., Calvo S. E., Markhard A. L., Skinner O. S., To T. L., Ast T., Mootha V. K., Cell 2020, 181, 716.
- 55. Tang W. H., Wu S., Wong T. M., Chung S. K., Chung S. S., Free Radic Biol Med 2008, 45, 602.
- 56. Clanton T. L., J. Appl. Physiol. 2007, 102, 2379.
- 57. Wu Z., Zuo M., Zeng L., Cui K., Liu B., Yan C., Chen L., Dong J., Shangguan F., Hu W., He H., Lu B., Song Z., EMBO Rep. 2021, 22, 50827.
- 58. Wang T., Wiater E., Zhang X., Thomas J. B., Montminy M., Proc Natl Acad Sci U S A 2021, 118.
- 59. Maeda N., Funahashi T., Matsuzawa Y., Shimomura I., Atherosclerosis 2020, 292, 1.
- 60. Chen Z., Ho I. L., Soeung M., Yen E. Y., Liu J., Yan L., Rose J. L., Srinivasan S., Jiang S., Chang Q. E., Feng N., Gay J. P., Wang Q., Wang J., Lorenzi P. L., Veillon L. J., Wei B., Weinstein J. N., Deem A. K., Gao S., Genovese G., Viale A., Yao W., Lyssiotis C. A., Marszalek J. R., Draetta G. F., Ying H., Nat. Commun. 2023, 14, 2194.
- 61. Han J., Zheng D., Liu P. S., Wang S., Xie X., Cell Commun Signal 2024, 22, 475.
- 62. Radomska H. S., Huettner C. S., Zhang P., Cheng T., Scadden D. T., Tenen D. G., Mol. Cell. Biol. 1998, 18, 4301.
- 63. Liu X., Liu W., Lenstra J. A., Zheng Z., Wu X., Yang J., Li B., Yang Y., Qiu Q., Liu H., Li K., Liang C., Guo X., Ma X., Abbott R. J., Kang M., Yan P., Liu J., Nat. Commun. 2023, 14, 5617.
- 64. Li C., Chen B., Langda S., Pu P., Zhu X., Zhou S., Kalds P., Zhang K., Bhati M., Leonard A., Huang S., Li R., Cuoji A., Wang X., Zhu H., Wu Y., Cuomu R., Gui B., Li M., Wang Y., Li Y., Fang W., Jia T., Pu T., Pan X., Cai Y., He C., Wang L., Jiang Y., Han J. L., et al., Genomics Proteomics Bioinformatics 2024, 22.
- 65. Tsuji‐Tamura K., Ogawa M., Angiogenesis 2023, 26, 523.
- 66. Zheng J., Kim S. J., Saeidi S., Kim S. H., Fang X., Lee Y. H., Guillen‐Quispe Y. N., Ngo H. K. C., Kim D. H., Kim D., Surh Y. J., Free Radic Biol Med 2023, 194, 347.
- 67. Pielberg G. R., Golovko A., Sundström E., Curik I., Lennartsson J., Seltenhammer M. H., Druml T., Binns M., Fitzsimmons C., Lindgren G., Sandberg K., Baumung R., Vetterlein M., Strömberg S., Grabherr M., Wade C., Lindblad‐Toh K., Pontén F., Heldin C. H., Sölkner J., Andersson L., Nat. Genet. 2008, 40, 1004.
- 68. Shao H., Ganesamoorthy D., Duarte T., Cao M. D., Hoggart C. J., Coin L. J. M., BMC Bioinformatics 2018, 19, 261.
- 69. Kabirova E., Ryzhkova A., Lukyanchikova V., Khabarova A., Korablev A., Shnaider T., Nuriddinov M., Belokopytova P., Smirnov A., Khotskin N. V., Kontsevaya G., Serova I., Battulin N., Nat. Commun. 2024, 15, 4521.
- 70. Low W. Y., Animal Research and One Health 2024, 2, 363.
- 71. Blazak W. F., Eldridge F. E., J. Dairy Sci. 1977, 60, 1133.
- 72. Leonard A. S., Crysnanto D., Fang Z. H., Heaton M. P., Vander Ley B. L., Herrera C., Bollwein H., Bickhart D. M., Kuhn K. L., Smith T. P. L., Rosen B. D., Pausch H., Nat. Commun. 2022, 13, 3012.
- 73. Mandefro A., Sisay T., Edea Z., Uzzaman M. R., Kim K. S., Dadi H., J Anim Sci Technol 2021, 63, 248.
- 74. Takeshima S. N., Corbi‐Botto C., Giovambattista G., Aida Y., BMC Genet. 2018, 19, 33.
- 75. Li C., Wu Y., Chen B., Cai Y., Guo J., Leonard A. S., Kalds P., Zhou S., Zhang J., Zhou P., Gan S., Jia T., Pu T., Suo L., Li Y., Zhang K., Li L., Purevdorj M., Wang X., Li M., Wang Y., Liu Y., Huang S., Sonstegard T., Wang M. S., Kemp S., Pausch H., Chen Y., Han J. L., Jiang Y., et al., Mol. Biol. Evol. 2022, 39.
- 76. Xia X., Zhang F., Li S., Luo X., Peng L., Dong Z., Pausch H., Leonard A. S., Crysnanto D., Wang S., Tong B., Lenstra J. A., Han J., Li F., Xu T., Gu L., Jin L., Dang R., Huang Y., Lan X., Ren G., Wang Y., Gao Y., Ma Z., Cheng H., Ma Y., Chen H., Pang W., Lei C., Chen N., Genome Biol. 2023, 24, 211.
- 77. Deng L., Zhang C., Yuan K., Gao Y., Pan Y., Ge X., He Y., Yuan Y., Lu Y., Zhang X., Chen H., Lou H., Wang X., Lu D., Liu J., Tian L., Feng Q., Khan A., Yang Y., Jin Z. B., Yang J., Lu F., Qu J., Kang L., Su B., Xu S., Natl. Sci. Rev. 2019, 6, 1201.
- 78. Qiu Q., Wang L., Wang K., Yang Y., Ma T., Wang Z., Zhang X., Ni Z., Hou F., Long R., Abbott R., Lenstra J., Liu J., Nat. Commun. 2015, 6, 10283.
- 79. Semenza G. L., Rue E. A., Iyer N. V., Pang M. G., Kearns W. G., Genomics 1996, 34, 437.
- 80. Pugh C. W., Ratcliffe P. J., Nat. Med. 2003, 9, 677.
- 81. Si J., Guo J., Zhang X., Li W., Zhang S., Shang S., Zhang Q., Biol. Direct 2024, 19, 45.
- 82. Borsari B., Villegas‐Mirón P., Pérez‐Lluch S., Turpin I., Laayouni H., Segarra‐Casas A., Bertranpetit J., Guigó R., Acosta S., Genome Res. 2021, 31, 1325.
- 83. Yang Z., Bai C., Pu Y., Kong Q., Guo Y., Liu X., Zhao Q., Qiu Z., Zheng W., He Y., Lin Y., Deng L., Zhang C., Xu S., Peng Y., Xiang K., Zhang X., Cui C., Pan Y., Xin J., Wang Y., Liu S., Wang L., Guo H., Feng Z., Wang S., Shi H., Jiang B., Wu T., Qi X., et al., Proc Natl Acad Sci U S A 2022, 119, 2200421119.
- 84. Storz J. F., Mol. Biol. Evol. 2021, 38, 2677.
- 85. An X., Mao L., Wang Y., Xu Q., Liu X., Zhang S., Qiao Z., Li B., Li F., Kuang Z., Wan N., Liang X., Duan Q., Feng Z., Yang X., Liu S., Nevo E., Liu J., Storz J. F., Li K., Nat Ecol Evol 2024, 8, 339.
- 86. Shi J., Jia Z., Sun J., Wang X., Zhao X., Zhao C., Liang F., Song X., Guan J., Jia X., Yang J., Chen Q., Yu K., Jia Q., Wu J., Wang D., Xiao Y., Xu X., Liu Y., Wu S., Zhong Q., Wu J., Cui S., Bo X., Wu Z., Park M., Kellis M., He K., Nat. Commun. 2023, 14, 8282.
- 87. Bae T., Hallis S. P., Kwak M. K., Exp. Mol. Med. 2024, 56, 501.
- 88. Ge X., Lu Y., Chen S., Gao Y., Ma L., Liu L., Liu J., Ma X., Kang L., Xu S., Mol. Biol. Evol. 2023, 40, msad205.
- 89. Said I., Byrne A., Serrano V., Cardeno C., Vollmers C., Corbett‐Detig R., Proc Natl Acad Sci U S A 2018, 115, 5492.
- 90. Bilgrav Saether K., Eisfeldt J., Bengtsson J. D., Lun M. Y., Grochowski C. M., Mahmoud M., Chao H. T., Rosenfeld J. A., Liu P., Ek M., Schuy J., Ameur A., Dai H., Hwang J. P., Sedlazeck F. J., Bi W., Marom R., Wincent J., Nordgren A., Carvalho C. M. B., Lindstrand A., Genome Res. 2024, 34, 1785.
- 91. Nilsson D., Pettersson M., Gustavsson P., Förster A., Hofmeister W., Wincent J., Zachariadis V., Anderlid B. M., Nordgren A., Mäkitie O., Wirta V., Käller M., Vezzi F., Lupski J. R., Nordenskjöld M., Lundberg E. S., Carvalho C. M. B., Lindstrand A., Hum Mutat 2017, 38, 180.
- 92. Guan J., Xu Y., Yu Y., Fu J., Ren F., Guo J., Zhao J., Jiang Q., Wei J., Xie H., Genome Biol. 2021, 22, 13.
- 93. Harringmeyer O. S., Hoekstra H. E., Nat Ecol Evol 2022, 6, 1965.
- 94. Hansen P. J., Anim. Reprod. Sci. 2004, 82–83, 349.
- 95. Littlejohn M. D., Henty K. M., Tiplady K., Johnson T., Harland C., Lopdell T., Sherlock R. G., Li W., Lukefahr S. D., Shanks B. C., Garrick D. J., Snell R. G., Spelman R. J., Davis S. R., Nat. Commun. 2014, 5, 5861.
- 96. Plassais J., Kim J., Davis B. W., Karyadi D. M., Hogan A. N., Harris A. C., Decker B., Parker H. G., Ostrander E. A., Nat. Commun. 2019, 10, 1489.
- 97. Rao S. S., Huntley M. H., Durand N. C., Stamenova E. K., Bochkov I. D., Robinson J. T., Sanborn A. L., Machol I., Omer A. D., Lander E. S., Aiden E. L., Cell 2014, 159, 1665.
- 98. Durand N. C., Shamim M. S., Machol I., Rao S. S., Huntley M. H., Lander E. S., Aiden E. L., Cell Syst 2016, 3, 95.
- 99. Dudchenko O., Batra S. S., Omer A. D., Nyquist S. K., Hoeger M., Durand N. C., Shamim M. S., Machol I., Lander E. S., Aiden A. P., Aiden E. L., Science 2017, 356, 92.
- 100. Durand N. C., Robinson J. T., Shamim M. S., Machol I., Mesirov J. P., Lander E. S., Aiden E. L., Cell Syst 2016, 3, 99.
- 101. Simão F. A., Waterhouse R. M., Ioannidis P., Kriventseva E. V., Zdobnov E. M., Bioinformatics 2015, 31, 3210.
- 102. Shumate A., Salzberg S. L., Bioinformatics 2021, 37, 1639.
- 103. Pertea G., Pertea M., F1000Res 2020, 9.
- 104. Patterson N., Price A. L., Reich D., PLoS Genet. 2006, 2, 190.
- 105. Purcell S., Neale B., Todd‐Brown K., Thomas L., Ferreira M. A., Bender D., Maller J., Sklar P., de Bakker P. I., Daly M. J., Sham P. C., Am. J. Hum. Genet. 2007, 81, 559.
- 106. Tamura K., Peterson D., Peterson N., Stecher G., Nei M., Kumar S., Mol. Biol. Evol. 2011, 28, 2731.
- 107. Huson D. H., Bryant D., Nat. Methods 2024, 21, 1773.
- 108. Jeffares D. C., Jolly C., Hoti M., Speed D., Shaw L., Rallis C., Balloux F., Dessimoz C., Bähler J., Sedlazeck F. J., Nat. Commun. 2017, 8, 14061.
- 109. Danecek P., Bonfield J. K., Liddle J., Marshall J., Ohan V., Pollard M. O., Whitwham A., Keane T., McCarthy S. A., Davies R. M., Li H., Gigascience 2021, 10, giab008.
- 110. English A. C., Menon V. K., Gibbs R. A., Metcalf G. A., Sedlazeck F. J., Genome Biol. 2022, 23, 271.
- 111. Chen S., Krusche P., Dolzhenko E., Sherman R. M., Petrovski R., Schlesinger F., Kirsche M., Bentley D. R., Schatz M. C., Sedlazeck F. J., Eberle M. A., Genome Biol. 2019, 20, 291.
- 112. Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A., Handsaker R. E., Lunter G., Marth G. T., Sherry S. T., McVean G., Durbin R., Bioinformatics 2011, 27, 2156.
- 113. Quinlan A. R., Curr Protoc Bioinformatics 2014, 47.
- 114. Langmead B., Salzberg S. L., Nat. Methods 2012, 9, 357.
- 115. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., Bioinformatics 2009, 25, 2078.
- 116. Zhang Y., Liu T., Meyer C. A., Eeckhoute J., Johnson D. S., Bernstein B. E., Nusbaum C., Myers R. M., Brown M., Li W., Liu X. S., Genome Biol. 2008, 9, R137.
- 117. Nguyen L. T., Schmidt H. A., von Haeseler A., Minh B. Q., Mol. Biol. Evol. 2015, 32, 268.
- 118. Kalyaanamoorthy S., Minh B. Q., Wong T. K. F., von Haeseler A., Jermiin L. S., Nat. Methods 2017, 14, 587.
- 119. Servant N., Varoquaux N., Lajoie B. R., Viara E., Chen C. J., Vert J. P., Heard E., Dekker J., Barillot E., Genome Biol. 2015, 16, 259.
- 120. Ursu O., Boley N., Taranova M., Wang Y. X. R., Yardimci G. G., Stafford Noble W., Kundaje A., Bioinformatics 2018, 34, 2701.
- 121. Rowley M. J., Nichols M. H., Lyu X., Ando‐Kuri M., Rivera I. S. M., Hermetz K., Wang P., Ruan Y., Corces V. G., Mol. Cell 2017, 67, 837.
- 122. Crane E., Bian Q., McCord R. P., Lajoie B. R., Wheeler B. S., Ralston E. J., Uzawa S., Dekker J., Meyer B. J., Nature 2015, 523, 240.
- 123. Kaul A., Bhattacharyya S., Ay F., Nat. Protoc. 2020, 15, 991.
- 124. Zhou X., Stephens M., Nat. Genet. 2012, 44, 821.
- 125. Thorvaldsdóttir H., Robinson J. T., Mesirov J. P., Brief Bioinform 2013, 14, 178.
- 126. Dong S. S., He W. M., Ji J. J., Zhang C., Guo Y., Yang T. L., Brief Bioinform 2021, 22, bbaa227.
