Abstract
Based on a pangenome graph platform, we simultaneously analyzed the impacts of SNPs and SVs on the population structure and phenotypic formation of global cattle using 2,409 individuals from 82 breeds. We demonstrated that SVs, like SNPs, effectively explain the population structure of global cattle. Genomic regions under strong selection, identified using both SNPs and SVs, consistently revealed footprints associated with human-mediated selection of economic traits in European improved cattle or natural selection of geographical adaptations. Notably, we detected that ∼40.14% of SVs were not tagged (LD, r2 < 0.6) by nearby SNPs. These “orphan” SVs may uncover new genetic signals and represent recent mutations associated with specific selection pressures or local environmental adaptation. Selected SVs tagged by SNPs also play causal or dominant roles in regions under selection. For example, our single-cell RNA sequencing demonstrated that a notable SNP-tagged SV functions as an enhancer of the IGFBP7 gene, regulating fat deposition through IGFBP7+ cells. In conclusion, these SV-related mechanisms have likely caused some differences in economic traits and local adaptability across global cattle populations. Our integrated approaches highlight the unique and indispensable roles of SVs in shaping genetic diversity, offering novel insights into adaptation, selection, and strategies for improving cattle populations.
Keywords: cattle, pangenome, structural variation (SV), population differentiation, single cell sequencing, IGFBP7
Graphical Abstract
Introduction
Cattle are among the most economically and culturally significant domesticated species, contributing to food security, agriculture, and livelihoods worldwide. Due to natural and human selection, the global cattle populations have shown abundant phenotypic diversity. These phenotypic differences related to traits, especially adaptability and economic traits, contribute to differences in the economic value of cattle (Hayes and Daetwyler 2019). Generally, modern cattle trace their origins to multiple domestication events from wild aurochs populations in distinct geographic regions (Rossi et al. 2024). Two primary domestication events shaped the species: one approximately 10,000 years ago in the Fertile Crescent, giving rise to humpless taurine cattle (Bos taurus or B. taurus), and another around 8,000 years ago in the Indus Valley, resulting in humped indicine cattle (Bos indicus, B. indicus or Zebu cattle) (Pitt et al. 2019). Taurine and indicine cattle are interfertile and have experienced historical and recent hybridization. Indicine cattle have unique advantages in adapting to hot environments that have been facilitated by humans in response to a multi-century drought approximately 4,200 years ago (Verdugo et al. 2019). In Europe, taurine cattle underwent continuous breeding for economic gains such as milk and meat production after the Agricultural Revolution, and achieved great success, developing multiple world-renowned breeds. Compared to those in Europe, most of the cattle in Asia and Africa have not undergone targeted breeding due to conscription and nonintensive feeding.
These divergent traits are likely driven by subspecies-specific alleles or by shared variants at different allele frequencies, which underlie unique quantitative trait loci (QTL) associated with these characteristics (Bolormaa et al. 2013). Many genomic studies have focused on the diversity of cattle populations and on uneven allele frequency distributions, work that has drawn wide interest and benefits the genetic improvement of the cattle industry (Karim et al. 2011; Chen et al. 2018a; Hayes and Daetwyler 2019; Kim et al. 2020). However, at least two factors limit such studies from obtaining more comprehensive and accurate results. The first is the reliance on a single reference genome sequence derived from a Hereford cow. Even though it has undergone multiple upgrades since its initial release (Consortium 2009; Zimin et al. 2009; Rosen et al. 2020), a single reference genome does not fully capture cattle genetic diversity. It is also not yet of telomere-to-telomere (T2T) quality; the best effort so far is the genome from the Wagyu breed, which now has complete T2T sex chromosomes and a few autosomes (https://www.researchsquare.com/article/rs-6068440/v1). A solution that has been proposed is the pangenome, which incorporates broader genetic diversity beyond a single reference genome (Smith et al. 2023). In cattle, previous studies developed breed-specific and pangenome reference graphs, uncovering 70 Mb of novel sequences (Crysnanto et al. 2019; Crysnanto and Pausch 2020; Crysnanto et al. 2021; Gao et al. 2025). Another study added 116 Mb of African cattle sequences, enhancing read mapping and variant calling accuracy (Talenti et al. 2022). Leonard et al. reported consistent pangenomes across platforms and constructed a pangenome from 16 PacBio HiFi assemblies (Leonard et al. 2022, 2023).
A pangenome is defined as the nonredundant collection of all DNA sequences present in a population, including a “core” genome shared by all individuals and a “variable” genome found only in a subset (Tettelin et al. 2005). The “variable” genome, including structural variation (SV), significantly contributes to overall genetic divergence (Eichler et al. 2007). SV involves larger genomic segments than single nucleotide polymorphisms (SNPs) and short insertion/deletion (InDel) variants. It includes deletions, insertions, duplications (commonly grouped as copy number variations or CNVs), inversions, translocations, and complex events (involving two or more different types), ranging from 50 base pairs (bp) to 5 megabase pairs (Mb) (Scherer et al. 2007). The large size of SVs increases the likelihood of influencing gene expression and function by altering gene dosage, interrupting coding sequences, or disturbing long-range regulation (Alkan et al. 2011; Sudmant et al. 2015). Previous research has detected SVs in cattle genomes and linked them to potential phenotypes (Liu et al. 2010; Bickhart et al. 2012; Cicconardi et al. 2013; Kommadath et al. 2019; Chen et al. 2020; Hu et al. 2020; Low et al. 2020; Jang et al. 2021; Lee et al. 2021; Upadhyay et al. 2021). However, most previous research focused on SNPs, and SV research has only recently gained momentum in the era of pangenome research (Zhou et al. 2022; Leonard et al. 2024). In humans, more than 24% of SVs have been reported to show no or low linkage disequilibrium (LD) with SNPs, and SVs can independently play important roles in genome biology and complex traits (Pang et al. 2010). Thus, another current limitation is that other kinds of mutations, especially SVs, have not been fully studied, which leaves the basis of the diverse phenotypic differences in cattle unclear.
In this study, we integrated SNP and SV to provide a comprehensive understanding of the genetic architecture of global cattle populations based on a pangenome graph platform. Using short-read whole-genome sequencing data from 2,409 individuals spanning 82 breeds, we investigated population structure, selective pressures, and adaptive traits. We identified genomic regions under selection associated with economically important traits and environmental adaptation. We examined both selected SVs, whether tagged or untagged by SNPs, for their potential functional roles and as candidates for recent favorable mutations. Finally, we validated the enhancer activity of one SV, which might regulate the cell-type-specific expression of IGFBP7 using single-cell RNA sequencing. Overall, our study underscores the complementary roles of SNPs and SVs in shaping genetic diversity and offers valuable insights into adaptation, selection, and strategies for livestock improvement, particularly in developing regions where environmental challenges are pronounced.
Results
Phenotype and Genome Variations of the Collected Datasets From Worldwide Cattle Breeds
We collected WGS datasets totaling ∼110 Tb for 2,409 cattle of 82 breeds, which represented the extensive genetic diversity of global cattle populations (Fig. 1a and supplementary table S1, Supplementary Material online). We classified the cattle into 13 different populations according to their geographic origins and taxonomies. Additionally, we retrieved phenotype information for each cattle breed from reliable sources and publications (supplementary table S2, Supplementary Material online). These cattle exhibited substantial differences in economic phenotypes and adaptability due to their diverse origins and levels of selective breeding. These include several widely recognized economic traits of cattle, such as milk production, body weight, and body height, as well as adaptation to different climates (Fig. 1a and supplementary table S2, Supplementary Material online). Notably, cattle in Asia and Africa showed significantly lower beef and milk production compared to European cattle, which have undergone hundreds of years of breeding and selection (Fig. 1a and supplementary fig. S1a, and b, Supplementary Material online). However, European cattle also face challenges, such as adapting to high-temperature climates, when they are introduced into southern Asia and Africa (Fig. 1a and supplementary fig. S1c, Supplementary Material online).
Fig. 1.
Population structure and genetic diversity of the worldwide cattle breeds. a) Geographic classification, distribution, and phenotypes of the cattle breeds analyzed in this study. b) PCA analysis based on SVs. c) Admixture analysis based on SVs. EA, European aurochs; OBT, Oceania breeding taurine; WET, West European taurine; CSET, Central-South European taurine; WRT, West Russian taurine; ERT, East Russian taurine; NAT, Northeast Asian taurine; AT, African taurines; NWCT, Northwest Chinese taurines; TCT, Tibet Chinese taurine; NCCH, North-Central Chinese hybrid; SCI, South Chinese indicine; AI, African indicine; II, Indian indicine. d) Correlation analysis of average FST values across 28 different cattle population comparisons based on SNPs and SVs. e) Pie chart showing the percentage of SVs tagged by SNPs (r2 for linkage disequilibrium ≥0.6, within 1 Mb of the SV).
While the 1000 Bull Genomes Project had a slightly higher total sample count (Hayes and Daetwyler 2019), our study emphasizes diversity across major global lineages and complements that resource by including populations that were underrepresented in previous datasets. We analyzed two important types of genomic variation (SNPs and SVs) to comprehensively study the genetic diversity of worldwide cattle. Using a standardized SNP calling pipeline via GATK (McKenna et al. 2010), we identified and genotyped 28,088,254 SNPs with minor allele frequency (MAF) ≥ 0.01 across all samples using ARS-UCD1.2 as the reference. For SVs, we built a cattle pangenome graph by integrating 23 cattle assemblies and 92,518 previously reported SVs (supplementary table S3, Supplementary Material online). In total, 152,199 SV events were genotyped as biallelic variations (0/0, 0/1, and 1/1) for each animal using the VG software with the pangenome graph as the reference (Hickey et al. 2020). Compared to the previous study (Zhou et al. 2022), we identified 75,774 novel SVs, including 32,769 deletions (absences) and 43,005 insertions (presences). The inserted sequences, which were absent from the reference genome, had a total length of 24.51 Mb. Over 82.61% (76,425) of the deletions previously reported (Zhou et al. 2022) were re-detected in this study. The call rate improved by an average of 14.26% using the VG software compared to our previous strategy (Zhou et al. 2022) (supplementary fig. S2a and S2b, Supplementary Material online). The SV genotyping results showed high consistency (93.38%) with our previous method (supplementary fig. S2c, Supplementary Material online). Moreover, 93.75% of the homozygous mutants and 83.75% of the heterozygous mutants were confirmed through targeted PCRs (supplementary table S4, Supplementary Material online). Generally, 139,036 (91.35%) SVs were supported by at least two animals. 
Considering a one-Mb distance, each SV was surrounded by at least 7,085 SNPs, with an average of 22,602 SNPs used for further analysis (supplementary fig. S2d, Supplementary Material online). This dataset provides a high-quality resource for comprehensive studies on the genetic diversity of cattle populations based on both SNPs and SVs.
SVs Reveal Global Cattle Population Structure Comparable to SNPs
To elucidate relationships among cattle populations, we conducted PCA, clustering, admixture, and gene drift analyses. Both SNP-based and SV-based population taxonomy analyses yielded consistent results. In the PCA analysis, the first component PC1 divided cattle into taurine and indicine, with hybrids positioned between them (Fig. 1b and supplementary fig. S3a, Supplementary Material online). The second component clearly separated the cattle from India, Africa, and South China within indicine and the cattle from Europe and West Africa within taurine (Fig. 1b and supplementary fig. S3a, Supplementary Material online). Clustering analysis supported these PCA findings and aligned with the historical distribution of cattle (supplementary fig. S3b, Supplementary Material online). For example, Australian Lowline cattle were derived from European Angus cattle, while West Russian cattle were similar to West European taurine, and the East Russian cattle shared similarities with Northeast Asian taurine (Fig. 1b and supplementary fig. S3a and b, Supplementary Material online).
In the admixture analysis, at K = 2, taurine and indicine cattle were successfully distinguished by both SNPs and SVs (Fig. 1c and supplementary fig. S3c, Supplementary Material online). At K = 6, although SVs did not separate West European and Central-South European taurine as SNPs did, they accurately explained the relationships of other cattle breeds, identifying taurine in Europe, Northeast Asia, and Africa, as well as indicine in North China and Africa (Fig. 1c and supplementary fig. S3c, Supplementary Material online). Chinese cattle showed the most complex ancestry. Northwest Chinese taurine showed mixed contributions from European taurine and Northeast Asian taurine, while North-Central Chinese hybrids combined ancestry from European taurine, Northeast Asian taurine, and South Chinese indicine. In Africa, as previously reported, taurine and indicine cattle were partially crossed with each other. The gene drift analyses based on SNPs and on SVs both showed that Chinese and African cattle were affected by European cattle breeds, alongside regional gene flow within their respective areas (supplementary fig. S3d, Supplementary Material online).
We calculated the average FST values for 74 population pairs, finding a strong correlation (0.85) between SNP- and SV-based results (Fig. 1d and supplementary table S5, Supplementary Material online). However, when considering the global LD (measured as r2) between SNPs and SVs, we found that a large portion (75.84%) of SVs could not be tagged by SNPs, most of which had a low MAF (< 0.01). Among SVs with MAF ≥ 0.01, only 59.86% were tagged (r2 ≥ 0.6) by at least one SNP within a one-Mb distance of the SV (Fig. 1e). Overall, besides the contributions of SVs that are captured by SNPs, our results emphasize the independent and significant role of SVs in global cattle population classification and their potential influence on phenotypic variation.
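The tagging criterion described above (r2 ≥ 0.6 with at least one SNP within one Mb of the SV) can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 alternate-allele dosages and uses the squared Pearson correlation of dosages as a stand-in for LD r2; the function names and toy data are hypothetical, and the LD software actually used in the study is not stated here.

```python
import numpy as np

def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic site: LD is undefined, treat as not tagging
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def is_tagged(sv_pos, sv_geno, snp_pos, snp_geno, window=1_000_000, min_r2=0.6):
    """True if any SNP within `window` bp of the SV reaches r2 >= min_r2."""
    snp_pos = np.asarray(snp_pos)
    nearby = np.flatnonzero(np.abs(snp_pos - sv_pos) <= window)
    return any(r_squared(sv_geno, snp_geno[i]) >= min_r2 for i in nearby)

# Toy data (hypothetical): 6 animals, one SV at 1 Mb, three SNPs on the same chromosome.
sv_geno = [0, 1, 2, 0, 1, 2]
snp_pos = [900_000, 1_500_000, 5_000_000]
snp_geno = [
    [0, 1, 2, 0, 1, 2],  # perfectly correlated with the SV, within 1 Mb
    [2, 0, 1, 1, 0, 2],  # uncorrelated with the SV, within 1 Mb
    [0, 1, 2, 0, 1, 2],  # perfectly correlated, but 4 Mb away
]

tagged = is_tagged(1_000_000, sv_geno, snp_pos, snp_geno)
# Without the first SNP, the SV has no tagging SNP in the window: an "orphan" SV.
still_tagged = is_tagged(1_000_000, sv_geno, snp_pos[1:], snp_geno[1:])
```

In this toy case the SV is tagged only because of the first SNP; the distant SNP is ignored regardless of its correlation, which mirrors the one-Mb distance constraint.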
SNP-based Selection Signals in European Cattle Guide Genomic Improvement in Unselected Populations
European improved cattle breeds have undergone systematic, targeted breeding for beef and milk production during the past centuries, leaving genome footprints of selection that are valuable for directing cattle breeding in other regions. These improved cattle breeds exhibited slower decay of LD for both SNPs and SVs and had larger regions of homozygosity (ROH) compared to other European cattle breeds (supplementary fig. S4, Supplementary Material online and Fig. 2c). We grouped animals by breeding purpose and size: dairy cattle (e.g. Holstein, Jersey, German Black Pied, Brown Swiss, Fleckvieh, Gelbvieh, Rendena, Normande; 1,341 individuals in total), medium-small (MS) beef cattle (e.g. Angus, Hereford; 201 individuals in total), and large (L) beef cattle (Charolais, Limousin, Belgian Blue, Piemontese; 158 individuals in total). We applied a combined strategy of FST, XP-CLR, and XP-EHH based on SNPs to detect the selected genomic regions among these three populations (Fig. 2a). In total, 223, 237, 223, and 273 nonoverlapping regions were identified using 22.85 million SNPs between Beef and Dairy cattle, Beef (MS) and Dairy cattle, Beef (L) and Dairy cattle, and Beef (MS) and Beef (L) cattle, respectively (supplementary table S6, Supplementary Material online).
Fig. 2.
Selection signatures of SNPs in European beef and dairy cattle. a) Manhattan plot of FST values from SNP-based selection scans across four comparisons: beef cattle vs. dairy cattle, medium-small (MS) beef cattle vs. dairy cattle, large (L) beef cattle vs. dairy cattle, medium-small (MS) beef cattle vs. large (L) beef cattle. Dairy: European improved dairy cattle; Beef (MS): European improved medium-small beef cattle; Beef (L): European improved large beef cattle; NAT, Northeast Asian taurine; NWCT, Northwest Chinese taurine; NCCH, North-Central Chinese hybrid; SCI, South Chinese indicine; AT, African taurine; AI, African indicine. The dotted line represents the threshold for the top 1%. b) PCA analysis using the top 1% FST SNPs between beef cattle and dairy cattle; Beef: European improved beef cattle; Dairy: European improved dairy cattle; Europe others: cattle breeds except the improved beef and dairy cattle; China: cattle breeds in China; Africa: cattle breeds in Africa. c) Dot plot showing ROH length and count across different cattle populations; Beef: European improved beef cattle; Dairy: European improved dairy cattle; China: cattle breeds in China; Africa: cattle breeds in Africa. d) SNP heterozygous frequency and potential regulation mechanisms integrating chromatin states and FarmGTEx EWAS data for a beef ROH hotspot (chr14:23004920–23408215). CattleGTEx and FAANG results, including eSNPs, eGenes, and chromatin states, were graphically visualized to explore potential enhancer signals. While eSNP–eGene relationships were found to be widespread due to their complex, many-to-one and one-to-many nature, the pairings were clearly defined. Muscle, rumen, and testis tissues from beef cattle, as well as adipose and rumen tissues, were included in the analysis, with tissue-specific pairings indicated by line colors.
We examined the potential effects of selective signatures by comparing their chromosomal positions to cattle QTLs found in the public database that includes 192,247 QTLs for 552 different base traits from 1,179 publications (Hu et al. 2021). As expected, the significantly selected windows were enriched with milk content-related traits (milk beta-casein percentage, milk lactose content, etc.) between dairy cattle and all beef cattle, and with meat and carcass-associated traits (average daily gain, subcutaneous fat, carcass weight, etc.) between the two kinds of beef cattle (supplementary fig. S5, Supplementary Material online). Functional annotation of genes within these selected regions showed that they were partially enriched in the GO terms or pathways related to specific characteristics of two populations compared (supplementary table S7, Supplementary Material online). This included protein binding, calcium ion import across the plasma membrane, etc. between beef cattle and dairy cattle, and ATP binding, lipid binding, vascular smooth muscle contraction, thyroid hormone synthesis, etc. between MS and L beef cattle.
Interestingly, when using the top 1% SNPs derived from the FST analysis between European beef and dairy cattle to perform PCA analysis, we observed both Chinese and African cattle positioned between European beef and dairy cattle along PC1 (Fig. 2b). This supports the notion that Chinese cattle and African cattle have not undergone targeted breeding for dairy and meat production, i.e. the important genomic variations underlying beef and milk production in Chinese and African cattle were not under selection pressure. For example, the SNPs, located in SLC24A2 (chr8:24555356) selected for calcium channel activity in dairy cattle, ASIP (chr13:63667387) selected for fat deposition in beef (MS) cattle, and MYL6 (chr5:57162869) and MYL6B (chr5:57161801) selected for muscle development in beef (L) cattle (Magalhães et al. 2016; Liu et al. 2019; da Cruz et al. 2020), were all in the opposite ends of frequency distributions to the selected European cattle populations (Fig. 2a). This implied that the candidate selection signatures for Europe's improved beef and dairy cattle would be useful to guide the target breeding for Chinese and African cattle.
European improved cattle had significantly longer and more ROH regions compared to Chinese and African cattle (Fig. 2c). We identified 45 and 36 ROH hotspots (shared in over 80% of cattle and ≥100 kb) for European improved beef and dairy cattle, respectively, in which the Chinese and African cattle both showed low ROH frequency (supplementary tables S8 and S9, Supplementary Material online). These ROH hotspot regions were enriched for beef and milk production-related QTLs and genes. We identified an ROH hotspot specific for beef cattle in chr14:23004920–23408215. It covered multiple QTLs for carcass weight, average daily gain, body weight, fat thickness at the 12th rib, etc., as well as six genes including PLAG1, CHCHD7, MOS, LYN, TGS1, and TMEM68 (Fig. 2d and supplementary table S10, Supplementary Material online). Multiple studies have shown that this region, including PLAG1, is strongly related to the growth and body size of cattle (Karim et al. 2011; Nishimura et al. 2012; Engle and Hayes 2022). In dairy cattle, we also detected a specific ROH hotspot in chr18:56932170–57022131 covering dozens of the KLK gene family members (supplementary fig. S6, Supplementary Material online and supplementary table S10, Supplementary Material online). Even though KLK genes have not been reported to have a direct association with milk production, they have been shown to play key roles in the health of the mammary gland (Pampalakis et al. 2019; Christiono and Anggarani 2021). Cross-referencing with the 13 chromatin states and eQTLs from FarmGTEx, we located potential causal mutations and regulatory elements for the genes covered by the ROH hotspots. Notably, eQTL signals were especially enriched in several tissues (muscle, rumen, and testis for beef cattle; adipose and rumen for dairy cattle) corresponding to the improvement of beef and dairy production (Fig. 2d and supplementary fig. S6, Supplementary Material online, supplementary table S10, Supplementary Material online). 
For example, enhancers located in chr14:23092257–23095329 might regulate the expression of PLAG1 in the testis which might be the actual causal region for body weight in cattle (Reverter et al. 2020; Engle and Hayes 2022) (Fig. 2d).
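The ROH hotspot definition used above (regions shared by over 80% of animals and at least 100 kb long) can be sketched as a simple coverage sweep. This is an illustrative sketch only: it assumes per-animal ROH segments are already called as half-open (start, end) intervals on one chromosome, treats "over 80%" as "at least 80%", and uses hypothetical function and variable names.

```python
import math
from collections import defaultdict

def roh_hotspots(roh_by_animal, min_fraction=0.8, min_length=100_000):
    """Return (start, end) regions covered by ROH in >= min_fraction of animals.

    roh_by_animal: one list per animal of non-overlapping (start, end)
    half-open intervals, all on the same chromosome.
    """
    n_animals = len(roh_by_animal)
    needed = math.ceil(min_fraction * n_animals - 1e-9)

    # Net change in the number of animals in ROH at each breakpoint.
    delta = defaultdict(int)
    for segments in roh_by_animal:
        for start, end in segments:
            delta[start] += 1
            delta[end] -= 1

    hotspots = []
    depth = 0
    region_start = None
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= needed and region_start is None:
            region_start = pos  # coverage just reached the threshold
        elif depth < needed and region_start is not None:
            if pos - region_start >= min_length:
                hotspots.append((region_start, pos))
            region_start = None
    return hotspots

# Toy data (hypothetical): five animals, so at least four must share the region.
animals = [
    [(1_000_000, 1_500_000)],
    [(1_100_000, 1_450_000)],
    [(1_050_000, 1_400_000)],
    [(1_200_000, 1_600_000)],
    [(2_000_000, 2_300_000)],
]
hotspots = roh_hotspots(animals)
```

Here four of the five animals overlap between 1.2 Mb and 1.4 Mb, so that 200 kb stretch is the only region reported.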
Recent SV Mutations Provide Genetic Variations Beyond SNPs for the Improvement of European Beef and Dairy Cattle
We detected 2,423 selective SV signatures from 116,194 SVs by applying FST statistics with a top 1% threshold across all comparisons among European improved beef and dairy cattle (supplementary table S11, Supplementary Material online). Notably, 32.80% to 60.12% of genes uncovered by SVs were not detected by SNPs using similar thresholds (top 1% FST value) (supplementary table S12, Supplementary Material online). At the same time, we note that a substantial portion of selective SNP signals (∼95%) could not be tagged by SVs. For example, genes like ASIP, MYL6, and MYL6B, detected by SNPs, were not identified by SVs. Nevertheless, annotated genes affected by selected SV signatures were enriched in functions similar to those candidate SNPs (supplementary table S13, Supplementary Material online). Significantly selected SV regions were enriched with QTLs related to milk production and meat and carcass traits (Fig. 3a). This confirmed that the SVs are as significant as SNPs and may explain phenotypic variations in cattle that SNPs alone cannot account for.
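As an illustration of a per-variant FST scan with a top 1% cutoff, the following sketch computes FST from allele frequencies in two populations and flags variants above the 99th percentile. The Hudson estimator used here is an assumption chosen for simplicity (the text does not state which FST estimator was applied), and the simulated frequencies and sample sizes are hypothetical.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Per-variant Hudson FST from allele frequencies p1, p2 and the number
    of sampled chromosomes n1, n2 in the two populations."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Variants fixed for the same allele in both populations carry no signal.
        return np.where(den > 0, num / den, 0.0)

# Two hand-checkable variants: fixed difference, and identical frequencies.
check = hudson_fst([1.0, 0.5], [0.0, 0.5], n1=100, n2=100)

# Simulated scan (hypothetical data): flag the top 1% of FST values.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, size=1000)
freq_b = rng.uniform(0.05, 0.95, size=1000)
fst = hudson_fst(freq_a, freq_b, n1=400, n2=300)
threshold = np.quantile(fst, 0.99)
selected = fst >= threshold
```

An empirical percentile cutoff of this kind ranks variants relative to the genome-wide distribution rather than testing each one against a null model, so the number of flagged variants is fixed by construction.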
Fig. 3.
SV contributions to the genetic improvement of European beef and dairy cattle. a) Heatmap of QTL enrichment in significantly selected SV regions. b) Frequency and divergence of different MEI elements among significantly selected SVs. c) Box plot comparing the divergence of L1_BT between orphan SVs and SNP-tagged SVs. d) Partial Manhattan plot showing significantly selected orphan SVs in LRP1B between all beef cattle and dairy cattle. The LD block is plotted with or without significant SVs. e) Genomic position and sequence characteristics for the SV in the LRP1B gene. f) Manual inspection of genome assemblies across ruminants for the existence of the L1_BT sequence within the orphan SV. The species name with red color represented L1_BT presence and the species name with blue color represented absence. g) Frequency distribution of the orphan SV across cattle populations with different geographic and taxonomic classifications. WRT, West Russian taurine; ERT, East Russian taurines; NAT, Northeast Asian taurine; NWCT, Northwest Chinese taurine; TCT, Tibet Chinese taurine; NCCH, North-Central Chinese hybrid; SCI, South Chinese indicine; II, Indian indicine; AT, African taurine; AI, African indicine.
Generally, selective SNP signals became stronger as the distance to selected SV signals decreased, consistent with selective sweeps (supplementary fig. S7, Supplementary Material online). However, not all selected SVs could be tagged by nearby SNPs. We defined the SVs without tagging SNPs (r2 < 0.6, distance ≤1 Mb) as orphan SVs. We identified on average 1,395 (57.57%) orphan SVs and 1,082 (42.47%) SNP-tagged SVs across the four comparisons. Over half (55.30%) of the selected SVs overlapped with MEIs (mobile element insertions) in the genome. A detailed analysis of MEI types revealed that bovine-specific MEIs, including L1_BT (43.51%), BovB (13.25%), and BovA2 (10.32%), contributed most to the selective SV signatures (supplementary fig. S8, Supplementary Material online and Fig. 3b). L1_BT was particularly prominent among orphan SVs, exhibiting significantly lower divergence than its SNP-tagged counterparts (Fig. 3c), suggesting that orphan SVs represent more recent (i.e. younger) events.
Among the selected SV signatures distinguishing dairy and all beef cattle, we observed a top selected orphan SV (chr2:56290045–56298615) located in an intron of the LRP1B gene (Fig. 3d and e). LRP1B (low-density lipoprotein-related protein 1B) has been implicated as a candidate gene for fat deposition in milk (Martins et al. 2021). This orphan SV consisted of a complete L1_BT sequence with low divergence (1.5%) and contained H3K27ac enhancer signals in tissues such as the mammary gland and liver (Fig. 3e). Manual inspection of genome assemblies across ruminants revealed that this SV insertion occurred exclusively in taurine cattle (Fig. 3f). Population variation analysis further suggested that this L1_BT insertion likely arose after the divergence of European taurine and other taurine populations (Fig. 3g). Admixture analysis indicated that the SV variation in Northwest Chinese taurine and North-Central Chinese hybrid cattle likely resulted from European taurine introgression (Figs. 1c and 3g). Thus, we hypothesize that this orphan SV represents a recent L1_BT insertion in the European taurine population, introducing functional elements that modulate LRP1B expression, thereby influencing milk fat deposition. During subsequent selection processes, this mutation diverged between dairy and beef cattle.
SNP and SV Selection Signatures Jointly Explain the Challenges Faced by European Cattle When Introduced to Tropical Regions
The introduction of improved dairy and beef cattle from Europe to regions lacking high-performance breeds is often complicated by challenges in adapting to local environmental and climatic conditions. The primary challenge for European cattle is environmental adaptation, including heat and disease resistance, when they are imported to South China and Africa. We detected 2,322 SVs specific to indicine cattle in these regions (absent in taurine populations, frequency ≥0.1 in indicine populations), contributing 1.19 Mb of novel sequences and overlapping with 588 gene body regions (supplementary table S14, Supplementary Material online). By performing selective signature scanning, we detected 25,743 SNPs (in 890 selected 50 kb-windows) and 2,081 SVs (1,050 presence and 1,031 absence events) as significantly selected between European taurine and South Chinese/African indicine cattle populations. Almost all selected SVs between European taurine and South Chinese/African indicine were tagged by SNPs, with orphan SVs again defined as those with LD r2 < 0.6 with SNPs (supplementary fig. S10a, Supplementary Material online). There were 30 genes commonly annotated by significantly selected SNPs and SVs (Fig. 4a). Previously identified selected genes, including CTNNA1, DNAJC18, etc., were uncovered by both SNPs and SVs. Genes overlapping significant SNPs and SVs were most enriched in the Bacterial invasion of epithelial cells pathway and Endocytosis pathway for disease resistance; the Thyroid hormone signaling pathway, Phospholipase D signaling pathway, Rap1 signaling pathway, and Sphingolipid signaling pathway for heat/cold resistance; as well as the Growth hormone synthesis, secretion and action pathway and Carbohydrate digestion and absorption pathway for body size (Fig. 4b and supplementary table S15, Supplementary Material online).
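The criterion for indicine-specific SVs stated above (absent in taurine populations, frequency ≥0.1 in indicine populations) amounts to a two-sided frequency filter. The sketch below assumes biallelic SV genotypes coded as 0/1/2 with -1 for missing calls, arranged as one row per SV and one column per animal; this encoding, the function names, and the toy matrices are illustrative assumptions.

```python
import numpy as np

def alt_allele_freq(genotypes):
    """Alternate-allele frequency per SV from an (n_sv, n_animals) matrix
    of 0/1/2 calls, where -1 marks a missing genotype."""
    g = np.array(genotypes, dtype=float)  # copy so the input is not modified
    g[g < 0] = np.nan
    return np.nanmean(g, axis=1) / 2.0

def indicine_specific(taurine_geno, indicine_geno, min_freq=0.1):
    """Boolean mask of SVs absent in taurine and at >= min_freq in indicine."""
    f_taurine = alt_allele_freq(taurine_geno)
    f_indicine = alt_allele_freq(indicine_geno)
    return (f_taurine == 0) & (f_indicine >= min_freq)

# Toy data (hypothetical): three SVs, four animals per group.
taurine = [
    [0, 0, 0, 0],   # absent in taurine
    [0, 1, 0, 0],   # present in one taurine animal
    [0, 0, -1, 0],  # absent in taurine (one missing call)
]
indicine = [
    [1, 2, 0, 1],   # frequency 0.5 in indicine
    [2, 2, 1, 1],   # frequency 0.75 in indicine
    [0, 0, 0, 0],   # absent in indicine as well
]
mask = indicine_specific(taurine, indicine)
```

Only the first SV passes both conditions; the second is excluded by a single taurine carrier, which shows how strict an "absent in taurine" requirement is in the presence of admixture or genotyping error.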
Fig. 4.
Dominant effects of SVs in selected genome regions between European taurine and Chinese-African indicine. a) Manhattan plot displaying selection signatures of SNPs and SVs. Significantly selected variants are highlighted in orange, with commonly selected genes labeled with stars and the gene names. b) Phenotypic differences and functional pathways affected by significant SVs between European taurine and Chinese-African indicine. c) Distribution of Fst values and overlap with functional genomic elements for selected SVs. d) Expression differences between taurine and indicine cattle in different tissues for the four genes with functional elements overlapping SVs. The student's t-test was used for the significance test. “*”P < 0.05; “**”P < 0.01.
However, even where SVs and SNPs uncovered the same genes under selection pressure, SVs might play the key roles in those selected regions in differentiating European taurine from indicine. We compared all FST values for SNPs and SVs that overlapped with genes. An SV had the highest FST value in 7 of the 30 genes under selection pressure, and SVs in 4 genes (BBS9, ABHD12, CD300A, and GHR) directly changed the sequence structure of functional elements, including a primed enhancer, an active element, an active TSS, and a polycomb-repressed region (Fig. 4c). To determine whether the four selected SVs affect the expression of these genes, we compared the expression levels of the four genes between 7,105 taurine and 490 indicine in eight different tissues from CattleGTEx. The results showed that all four genes were widely differentially expressed in different tissues between taurine and indicine cattle (Fig. 4d), implying that some SVs may play dominant roles in the genome regions under selective pressure. We acknowledge that future studies integrating tissue- and breed-matched epigenomic data will improve the precision of functional interpretation.
Novel SV Mutations Promote Subpopulation of Local Cattle Adaptability
We further performed selective signature scanning for both SNPs and SVs across cattle populations grouped by taxonomic relationship and geographical region. In total, we defined 28 pairwise comparisons within 8 subgroups (supplementary table S16, Supplementary Material online). We detected 3,301 regions (spanning 312.81 Mb in length) and 6,573 SVs that were significantly selected between two different populations (Fig. 5a and supplementary table S16, Supplementary Material online). Among these, 1,831 SVs (27.50%) were classified as orphan SVs (Fig. 5b). Interestingly, the proportion of orphan SVs was negatively correlated with the genetic divergence (average FST value) between the populations compared (Fig. 5c). Compared to SNP-tagged SVs, orphan SVs displayed significantly lower FST values and MAF (supplementary fig. S9a and b, Supplementary Material online) and exhibited patterns that followed geographic location rather than subspecies formation (supplementary fig. S9c and d, Supplementary Material online). This again suggests that the orphan SVs might represent recent mutations that have spread among cattle herds in the surrounding area because of their adaptive value under selection.
Fig. 5.
Selection of orphan SVs among cattle populations from different geographic regions. a) Overview of population comparisons from different regions. b) Pie chart showing proportions of SNP-tagged SVs and orphan SVs. c) Correlation analysis between average FST value and orphan SV ratio across population comparisons. d) Heatmap of significantly selected SVs between African and other taurine populations. e) Manhattan plot for the selection signatures of SVs in African taurine vs. European taurine and African indicine vs. European taurine. SVs with FST > 0.3 in both of the two comparisons were marked as red dots. SVs that only showed an FST > 0.3 between African taurine and European taurine were marked as red stars. Functions of genes were annotated using DAVID online software. The genes related to various adaptabilities were marked using different colors.
The ability of African taurine to thrive in hot, humid environments may reflect convergent evolution and the benefits of hybridization with indicine cattle (Kim et al. 2020, 2023). Among the 926 SVs with shared strong selection signatures (FST > 0.3) between African and other taurine, many had similar frequencies in African taurine and indicine but not in other taurine. These SVs might partially explain why African taurine, like African indicine, are adapted to the local environment (Fig. 5d). There were 19 genes related to heat/cold resistance, disease resistance/immunity, energy metabolism, and body size that overlapped with SVs commonly selected in African taurine and African indicine relative to other taurine (Fig. 5e). Notably, the orphan SVs among these selective signatures generally showed intermediate frequency in African taurine but low or zero frequency in other cattle populations (Fig. 5d). Moreover, those SVs overlapped with genes for heat/cold resistance, disease resistance/immunity, etc. (Fig. 5e). For example, an orphan SV overlapping CRCP (chr25:28034437–28034687), a key immunity-related gene, was present in 41.66% of African taurine and a small proportion of African indicine, possibly because of local hybridization (supplementary fig. S11, Supplementary Material online). These orphan SVs might have originated in African taurine and may help African taurine adapt to the local environment.
Dominant Role of SV in Regulating IGFBP7+ Adipocyte Formation to Coordinate European Cattle Differences and World Cattle Adaptability
A majority of SVs in our study were tagged by SNPs. However, SVs may play more dominant regulatory roles due to their longer sequences, which provide greater potential as regulatory elements compared to SNPs. Using the same thresholds, certain SVs but not the surrounding SNPs could be detected, and these selection signals would likely be missed in traditional SNP-based analyses. For further investigation, we selected an SV (chr6:72419812–72422015) that was significantly selected in European improved cattle and across various regional cattle populations. This SV exhibited a high presence frequency in MS beef cattle compared to cattle populations with lower fat deposition ability (large beef cattle and dairy cattle). Its presence frequency decreased progressively from cattle populations in colder regions to those in hotter regions (Fig. 6a). Although tagged by SNPs, this SV demonstrated the strongest selection signals relative to surrounding SNPs (supplementary fig. S12, Supplementary Material online). At the same time, SNPs near this SV displayed haplotype variations among different cattle populations, providing further evidence for the significance of this selective genomic region (supplementary fig. S13, Supplementary Material online). This SV is located in the first intron of IGFBP7, a gene with a critical role in lipid accumulation (Fig. 6b). Its selective characteristics were consistent with energy metabolism traits in MS beef cattle and with adaptation to cold and heat. We validated the existence of the SV using PCR and the IGV software (supplementary fig. S14, Supplementary Material online). The SV sequence was identified as a partial L1_BT element with high divergence (Fig. 6b). Scanning the SV sequence using MEME software revealed multiple motifs, including the GGRRGAGGGAG motif, which was significantly enriched with transcription factors involved in glucose and lipid metabolism, apart from growth factor activity (Fig. 6b and supplementary fig. S15, Supplementary Material online).
To understand its functional role, we isolated primary adipocytes from cattle with different SV genotypes and performed single-cell RNA sequencing. We observed a small cluster of cells (Cluster 18) that uniquely and highly expressed IGFBP7 (Fig. 6e and f). By applying a dual-luciferase reporter assay, we confirmed that this SV sequence exhibited significant enhancer activity (Fig. 6c). Furthermore, IGFBP7 expression was significantly lower in Cluster 18 cells from SV(−) cattle compared to those from SV(+) cattle (Fig. 6d and supplementary fig. S16, Supplementary Material online). This suggests that the SV functions as an enhancer regulating IGFBP7 expression in Cluster 18 cells.
Fig. 6.
Functional analysis of an SV in IGFBP7 and its potential regulation mechanisms for fat deposition. a) Frequency distribution of the SV across cattle populations from different regions and among European improved beef and dairy cattle. Dairy: European improved dairy cattle; Beef (MS): European improved medium-small beef cattle; Beef (L): European improved large beef cattle; WRT, West Russian taurine; ERT, East Russian taurine; NAT, Northeast Asian taurine; NWCT, Northwest Chinese taurine; NCCH, North-Central Chinese hybrid; SCI, South China indicine; II, Indian indicine; AT, African taurine; AI, African indicine. The average temperature for each geographical region was shown on the left side of the map. b) Location and sequence characteristics of the SV. c) Dual-luciferase reporter assay for enhancer activity of the SV sequence. The student's t-test was used for the significance test. “*”P < 0.05; “***”P < 0.001; “****”P < 0.0001. d) Expression levels of IGFBP7 in Cluster 18 cells from cattle with different SV genotypes. e) t-SNE analysis of cell clusters. f) Expression levels of IGFBP7 across cell clusters. g) Comparison of lipid accumulation in primary adipocytes from the SV(−) and SV(+) cattle adipose tissues at Day 12 after the induction for adipogenesis.
IGFBP7 is a secreted protein that regulates adjacent cells by binding to receptors in the cell membrane. Specifically, IGFBP7 binds to the IGF-1 receptor and blocks its activation by insulin-like growth factors (Evdokimova et al. 2012). Differentially expressed genes between IGFBP7+ cells and other cell types were significantly enriched in pathways related to biological regulation, regulation of cellular processes, response to stimuli, and cell communication (supplementary fig. S17, Supplementary Material online). Previous studies have also shown that certain stromal cell populations can regulate adipogenesis (Schwalie et al. 2018), supporting the hypothesis that IGFBP7+ cells may act as regulatory cells. After induction of adipogenesis, primary adipocytes isolated from SV(−) cattle adipose tissue showed higher lipid accumulation than primary adipocytes from SV(+) cattle (Fig. 6g and supplementary fig. S18, Supplementary Material online). Therefore, we hypothesize that this SV functions as an enhancer to upregulate IGFBP7 expression in Cluster 18 cells, resulting in elevated secretion of IGFBP7 into the extracellular matrix. This secreted IGFBP7 may modulate lipid accumulation by interacting with IGF1R, potentially through effects on glucose uptake (supplementary fig. S19, Supplementary Material online).
Discussion
Cattle are central to agriculture and have evolved under natural environmental pressures and human-driven selective breeding (Rexroad et al. 2019). Selective forces (natural and human-imposed) and nonselective forces (demographic events and introgression) have shaped the cattle genome, influencing traits related to survival, productivity, and market demands (Decker et al. 2009, 2014). This study integrates SNP and SV analyses for a comprehensive understanding of the genetic factors driving cattle diversity and selection. The use of a pangenome approach captures SVs missed by a single genome. As seen in the latest T2T cattle work (https://www.researchsquare.com/article/rs-6068440/v1), which added 431 Mb of new sequences, the number of new bases will likely increase as the linear reference improves and more T2T assemblies are included. This highlights the growing importance of pangenomes in capturing genetic diversity.
Population Structure and Genetic Diversity
Principal component and admixture analyses revealed distinct genetic clusters separating taurine and indicine lineages, reflecting their evolutionary divergence and domestication histories (Decker et al. 2009, 2014). While taurine breeds are adapted to temperate climates and display high productivity in beef and dairy traits (Hayes et al. 2009), indicine breeds are adapted to tropical climates with resilience to heat, parasites, and diseases (Cooke et al. 2020a; Cooke et al. 2020b). Admixture patterns between these two subspecies indicate historical crossbreeding, aiming to combine desirable traits from both lineages (Cooke et al. 2020a).
SVs contributed significantly to genetic diversity. We observed that ∼40.14% of SVs were not tagged (r2 < 0.6, MAF ≥ 0.01) by nearby SNPs within 1 Mb, emphasizing their unique role in phenotypic variation. Certain SVs cannot be tagged by SNPs (Korn et al. 2008; Handsaker et al. 2015), as shown in grapevine studies where SVs improved trait heritability (Liu et al. 2024). We further examined the characteristics and potential formation mechanisms of these orphan SVs. They were mostly associated with MEI-related elements, including L1_BT (43.51%), BovB (13.25%), and BovA2 (10.32%), all of which were unique to ruminants. L1_BT exhibited smaller average divergence compared to other MEIs, suggesting that these SVs may represent recent mutations (Adelson et al. 2009).
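The orphan-SV criterion above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 dosages and that r2 is the squared Pearson correlation between dosage vectors, which may differ from the LD estimator actually used in this study; the function names and input layout are hypothetical, and the MAF ≥ 0.01 filter is assumed to have been applied upstream.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def is_orphan(sv_pos, sv_geno, snps, r2_threshold=0.6, window=1_000_000):
    """Return True if no SNP within `window` bp tags the SV at r2 >= threshold.

    snps: iterable of (position, dosage_vector) on the same chromosome
    (hypothetical layout, for illustration only).
    """
    for pos, geno in snps:
        if abs(pos - sv_pos) <= window and r_squared(sv_geno, geno) >= r2_threshold:
            return False
    return True
```

Under this sketch an SV is "tagged" as soon as a single nearby SNP reaches the threshold, so the orphan fraction depends directly on both the r2 cutoff and the window size.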
An orphan SV in the LRP1B gene was only detected in European taurine, supporting the idea that some orphan SVs may be recent and localized. Orphan SVs typically displayed low selective signals with medium MAFs in specific populations. We observed that populations clustered more by geography than taxonomy and speculated that orphan SVs in African taurine may contribute to local adaptability. If these SVs align with natural or artificial selection, they may undergo positive selection and expand within subpopulations. Thus, some orphan SVs could represent recent, favorable variants selected in specific populations. Further evidence is needed to clarify their formation mechanisms.
Even when SVs are tagged by SNPs, they may still exert a dominant and causal influence on phenotypic traits. For example, we observed that some SVs, such as those in GHR and IGFBP7, displayed the highest divergence values in the SNP-SV shared selective regions. We conducted an in-depth study to explore the function of the SV in IGFBP7 and finally showed that this SV may act as an enhancer for IGFBP7, regulating lipid accumulation in an IGFBP7+ cell cluster. This discovery illustrates how SVs can reveal previously unrecognized genetic mechanisms influencing economically important traits. At the same time, we need to note that this SV in IGFBP7 was significantly selected both in European improved cattle and across various regional cattle populations. In European improved cattle, the SV may regulate fat deposition in MS cattle, which are prone to fat deposition, compared to other cattle. However, in southern China and Africa, where hot environments prevail, the SV may modulate energy deposition in response to environmental pressures, as less energy is required for fat deposition compared to colder regions. Therefore, these findings underscore the importance of considering population-specific characteristics when utilizing selected loci for breeding and management strategies.
Selection and Adaptive Genetic Signatures
Selective sweep analyses have identified genomic regions under strong selection in European improved beef and dairy breeds, driven by human breeding priorities aimed at maximizing productivity. For example, previous findings include genes involved in milk yield and quality (e.g. DGAT1 and CSN2) and genes for growth and muscle development (e.g. MYF5 and GDF8) (Ma et al. 2021; Mohammadabadi et al. 2021). Functional enrichment was conducted to link these regions to key economically important traits such as milk production, carcass weight, and muscle development (Qanbari et al. 2014).
Selective signature analysis for European and regional cattle populations identified numerous SNP loci related to economic traits (milk and beef production) and environmental adaptability among different cattle herds, and also uncovered previously underexplored SV loci. Both selected SNPs and SVs appear to play critical roles in driving variation across cattle populations. This provides further evidence for the crucial role of SVs in explaining the diversity of cattle, as well as valuable genomic resources for a balanced enhancement of the economic value of cattle worldwide.
We also identified SVs that were not tagged by SNPs yet were under significant selection across different cattle populations. The annotated genes and QTLs overlapping with these SVs could explain phenotypic differences in ways similar to SNPs. This indicates that SVs play an indispensable role in understanding cattle diversity, particularly where SNP signals alone are not enough (Xu et al. 2014; Chen et al. 2024). Our results revealed the potential influence of introgression in shaping SV landscapes and supplied information that advances the understanding of adaptation and phenotypic differences between taurine and indicine cattle at the SV level. Additionally, our study provided proof of concept for utilizing SVs as important markers in evolutionary studies and breeding programs aimed at enhancing the adaptive potential of local cattle populations.
Regional-specific analyses further identified adaptive genetic signatures linked to environmental resilience, consistent with previous findings in African and South Chinese cattle (Gao et al. 2017; Kim et al. 2017; Liu et al. 2020; Tijjani et al. 2022; Ayalew et al. 2023). For example, genes like HSP70, associated with heat resistance, help support thermoregulation in high-temperature environments (Basiricò et al. 2011). In African cattle, immune-related genes provide resistance to endemic diseases such as trypanosomiasis (Rajavel et al. 2020). These findings provide essential insights into breeding strategies for climate resilience and disease resistance. However, adaptation often comes with tradeoffs. For instance, the enhanced thermotolerance of indicine breeds can lead to reduced milk production compared to taurine breeds (Cooke et al. 2020a). In heat-stressed Holstein cows, three genomic SNPs in the genes TLR4, GRM8, and SMAD3 have been identified as molecular markers for both milk production and thermotolerance (Zamorano-Algandar et al. 2023). Understanding these tradeoffs is essential for designing breeding programs that balance productivity with resilience, ensuring sustainable improvements in cattle genetics for diverse environments.
Implications and Applications
This study demonstrates the independent and complementary roles of SVs vs. SNPs in understanding genetic diversity and selection. While SNPs are crucial for detecting single-base changes and associated traits, SVs capture the large variations that often escape SNP-based analyses. For instance, the IGFBP7 SV highlights how structural variations can influence phenotypic traits independently of SNP signals. The findings have significant implications for cattle breeding and improvement. Incorporating SVs into genomic selection pipelines will enhance the accuracy of breeding programs, particularly for traits where SNP signals are insufficient (Hayes et al. 2009). Additionally, insights into adaptive genetic signatures can guide breeding strategies to improve resilience against climate change and disease pressures, which is especially relevant for sustainability in developing regions (Akinsola et al. 2024). Developing countries face unique challenges in cattle production, including limited resources, harsh environments, and diverse disease pressures. Identifying regional-specific genetic adaptations provides opportunities to develop breeds tailored to these conditions. Enhancing heat and disease resistance through targeted breeding or gene editing offers a promising pathway to boost productivity and food security (Liu et al. 2022).
Limitations and Future Directions
The improvements in SV calling accuracy and resolution in this study remain constrained by limitations such as reliance on short-read sequencing and low coverage in samples. Short reads are particularly ineffective for detecting large duplications and inversions, and mapping challenges persist in repetitive or complex genomic regions. Long-read sequencing technologies, like PacBio or Oxford Nanopore, offer significantly higher sensitivity for structural variant detection, with studies showing up to fivefold improvements compared to short-read methods (Huddleston et al. 2017; Miga et al. 2020; Cheng et al. 2021; Ebert et al. 2021; Logsdon et al. 2021; Bai et al. 2022; Olagunju et al. 2024; Su et al. 2024). Future efforts should prioritize improving reference genome quality, incorporating high-quality assemblies into graph-based pangenomes (Nguyen et al. 2023; Smith et al. 2023; Kalbfleisch et al. 2024; Zhang et al. 2024), and expanding datasets to include underrepresented cattle populations, especially indigenous breeds. Cattle graph pangenomes, which already show advantages in structural variant detection, hold great promise for enhancing genome-wide studies (Leonard et al. 2022, 2024). Additionally, experimental validation of candidate SNPs and SVs, through tools like CRISPR-based editing, will be essential for confirming functional roles. As sequencing costs decrease, integrating long-read data and other structural variants such as inversions and translocations will provide a more comprehensive understanding of cattle genetic diversity, adaptation, and evolutionary processes.
Conclusion
Integrating SNP and SV analyses provides a powerful framework for unraveling the complex genetic architecture of cattle populations. This approach identifies distinct genetic clusters for taurine and indicine lineages, genomic regions under selection for economically important traits such as milk production and muscle development, and unique contributions of SVs, including regulatory roles in fat deposition and other traits. The enhanced SV catalog generated here serves as a vital resource across diverse cattle breeds, shedding light on genomic content changes during domestication, breeding, and improvement. These insights highlight the potential of integrating SNP and SV analyses to unravel complex genetic networks driving cattle diversity and selection. This approach offers transformative opportunities for breeding programs to tackle food security challenges in developing countries and foster sustainable livestock management in the face of climate change.
Materials and Methods
Whole Genome Sequencing Data Collection and SNP Calling Processes
The 2,409 cattle of 82 cattle breeds were retrieved from the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra/) and our previous studies (Zhou et al. 2022). The accession ID for each data set is in supplementary table S1, Supplementary Material online. Adapters and low-quality reads were filtered using parameters of “-q 20 -u 30 -l 75” by fastp (V0.23.2) (Chen et al. 2018b). The cleaned reads were then aligned to the cattle reference genome (ARS-UCD1.2) and SNPs were called using the BWA-GATK pipeline in Sentieon tools (https://www.sentieon.com/) (V202308). For the initial quality control of variants, VCFtools (V0.1.16) (Danecek et al. 2011) and GATK (V4.3.0) (Van der Auwera et al. 2013) were used with filtration criteria for SNPs: (i) max-missing 0.3; (ii) max-alleles 2; (iii) maf 0.01; (iv) min-meanDP 3; (v) QD < 2.0; (vi) QUAL < 30.0; (vii) SOR > 3.0; (viii) FS > 60.0; (ix) MQ < 40.0; (x) MQRankSum < −12.5; and (xi) ReadPosRankSum < −8.0.
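The filtration criteria listed above can be read as a per-site predicate, sketched below. The dictionary keys are hypothetical names for per-site statistics, and the sketch assumes the VCFtools convention that "max-missing 0.3" keeps sites genotyped in at least 30% of samples, while the GATK annotation expressions (QD, QUAL, SOR, FS, MQ, MQRankSum, ReadPosRankSum) describe sites to be removed.

```python
def passes_snp_filters(site):
    """Return True if a SNP site survives the hard filters described in the text.

    `site` is a dict of per-site statistics; the key names are hypothetical.
    """
    keep = (
        site["call_rate"] >= 0.3      # max-missing 0.3 (assumed VCFtools semantics)
        and site["n_alleles"] <= 2    # max-alleles 2
        and site["maf"] >= 0.01       # maf 0.01
        and site["mean_dp"] >= 3      # min-meanDP 3
    )
    fail = (
        site["QD"] < 2.0
        or site["QUAL"] < 30.0
        or site["SOR"] > 3.0
        or site["FS"] > 60.0
        or site["MQ"] < 40.0
        or site["MQRankSum"] < -12.5
        or site["ReadPosRankSum"] < -8.0
    )
    return keep and not fail
```

Writing the two groups separately makes the direction of each threshold explicit: the first four are retention conditions, the remaining seven are exclusion conditions.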
Pangenome Graph Construction and SV Calling
To genotype more SVs in the collected samples, we employed a strategy combining SVs called from different assemblies with high-quality SVs detected by short reads in large populations. First, we identified insertions and deletions in 23 collected cattle genome assemblies by comparing each assembly pairwise to the reference genome (ARS-UCD1.2) with three assembly-based SV calling tools: (i) Svrefine (V0.35), with the parameter “--maxsize 1,000,000”; (ii) Assemblytics (V1.2.1), with the “unique_anchor_length” set to 10,000, “min_variant_size” set to 50, and “max_variant_size” set to 10,000; and (iii) Minimap2 (V2.25) (Li 2021), used to map the assembly to the reference with the parameter “asm5”. The output PAF file was used to call SVs using paftools, with the parameter “-L 10,000.” To improve SV breakpoint accuracy, we realigned sequences 100 bp upstream and downstream of the SV boundaries detected by the three tools to the local reference using both mafft (V7.487) (Katoh and Toh 2010) and msa2vcf (V1.0). Second, we incorporated 92,518 high-quality deletions identified through WGS in 898 cattle samples across 57 different breeds in our previous study (Zhou et al. 2022). Finally, we used the ARS-UCD1.2 assembly as the backbone of the pangenome graph. All merged SVs were incorporated into a variant graph using the “construct” module of the vg toolkit (V1.51.0) without removing any alternate alleles (Hickey et al. 2020). The resulting pangenome graph was then indexed in XG and GCSA formats using “vg index”, with the “-L” parameter enabled for both formats. SVs were depicted as bubbles in the graph, with paths representing the corresponding alleles.
Each sample's clean reads were mapped against this graph genome using vg Giraffe, resulting in alignments in the GAM format (Sirén et al. 2021). Alignments with mapping quality below five or base quality below five were filtered. A compressed coverage index was calculated using “vg pack”, and snarls were created using “vg snarls,” both with default parameters. Finally, SV genotyping results for all samples were produced using “vg call,” with the parameter “-v --bias-mode --het-bias 2,4” on the constructed pangenome graph.
Population Structure Analysis
We prepared data for population analyses by randomly selecting 50 individuals from each breed with more than 50 samples and retaining all individuals from breeds with fewer samples. Variants with MAF < 0.01 and animals with low call rates (<0.8) were filtered out. We then performed LD-based pruning of the genotype data, removing variants with r2 > 0.5 using PLINK. This resulted in data for 472 samples from 14 cattle populations for 1,667,536 SNPs and 86,775 SVs. PCA was performed using the “--pca” option of the PLINK software (v1.9) (Purcell et al. 2007). ADMIXTURE (v1.3.0) was run for each number of ancestral populations (K = 2 to 10) with 200 bootstraps (Alexander et al. 2009). The neighbor-joining tree was constructed using the matrix of pairwise genetic distances calculated by VCF2Dis (v1.4.0). A population-level phylogeny was reconstructed using the maximum likelihood (ML) mode in TreeMix (v.1.13).
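The per-breed downsampling step can be sketched as follows. This is an illustration only: the function name, the (sample ID, breed) input layout, and the fixed random seed are assumptions, and the actual sampling procedure used for this study is not described beyond the cap of 50.

```python
import random
from collections import defaultdict


def downsample_by_breed(samples, cap=50, seed=0):
    """Keep at most `cap` randomly chosen individuals per breed.

    samples: iterable of (sample_id, breed) pairs (hypothetical layout).
    """
    by_breed = defaultdict(list)
    for sample_id, breed in samples:
        by_breed[breed].append(sample_id)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    for breed in sorted(by_breed):
        ids = by_breed[breed]
        # Breeds at or below the cap are retained in full.
        kept.extend(rng.sample(ids, cap) if len(ids) > cap else ids)
    return kept
```

Capping large breeds in this way limits the influence of heavily sequenced breeds on PCA axes and ADMIXTURE components.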
Genetic Diversity Evaluation, LD Analysis, and ROH Detection
Genome-wide nucleotide diversity and genetic distances (FST) of different cattle populations were estimated using VCFtools (v0.1.17). The populations and variants used for selective signature analysis are listed in supplementary table S5, Supplementary Material online. LD decay was calculated using PopLDdecay. To ensure comparability between European improved cattle and other European cattle, pi-hat values were calculated using PLINK, and animals with pi-hat values higher than 0.5 were filtered out. Population sizes were balanced by retaining a similar number of animals. ROH were identified using PLINK (v1.9) with the “--homozyg” option and default parameters: “--homozyg-window-snp 20 --homozyg-density 50 --homozyg-window-het 1 --homozyg-window-missing 0 --homozyg-snp 20 --homozyg-kb 100 --homozyg-gap 50”. ROH islands were defined as genomic regions where at least 80% of the animals in a population had consecutive SNPs in an ROH (Pemberton et al. 2012).
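The ROH island definition can be made concrete with a short sketch. It assumes ROH segments are available per animal as (start, end) intervals on one chromosome and that an island is a run of consecutive SNPs each covered by an ROH in at least 80% of animals; the function name and data layout are hypothetical.

```python
def roh_islands(snp_positions, roh_by_animal, min_fraction=0.8):
    """Return (start, end) intervals of consecutive SNPs that lie in an ROH
    in at least `min_fraction` of animals.

    roh_by_animal: dict mapping animal ID to a list of (start, end) ROH
    segments on a single chromosome (hypothetical layout).
    """
    n_animals = len(roh_by_animal)
    islands, current = [], None
    for pos in sorted(snp_positions):
        covered = sum(
            any(start <= pos <= end for start, end in segments)
            for segments in roh_by_animal.values()
        )
        if n_animals and covered / n_animals >= min_fraction:
            # Extend the open island, or start a new one at this SNP.
            current = (current[0], pos) if current else (pos, pos)
        elif current:
            islands.append(current)
            current = None
    if current:
        islands.append(current)
    return islands
```

For example, with five animals of which four carry an ROH spanning positions 100 to 500, SNPs inside that span reach exactly the 80% threshold and are merged into one island.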
Selective Signature Analysis for SNPs and SVs
To minimize false positives, we combined single-site and window-based methods for identifying selective signatures. First, FST values were calculated for each SNP between compared cattle populations using VCFtools (v0.1.17). Then, three methods (FST, XP-EHH, and XP-CLR) were applied to detect selected genomic regions using SNPs. FST, average normalized XP-EHH scores, and XP-CLR were calculated for 50-kb windows with 20-kb steps using VCFtools (v0.1.17), selscan (v1.3.0), and xpclr (v1.1.2), respectively. Only windows shared in the top 1% across the two methods were considered putative significant windows. High-confidence selected windows were defined as those containing at least one SNP in the top 1% of FST values. For the SVs, single-site FST calculations were conducted using VCFtools (v0.1.17), and SVs in the top 1% were considered significant. When comparing taurine and indicine cattle, only cattle with over 90% pure lineage were included.
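The window layout and the top-1% intersection rule can be sketched as below. The helper names are hypothetical, the rule for the last window at a chromosome end is an assumption, and the number of score tables passed to the intersection is left to the caller rather than fixed by this sketch.

```python
def sliding_windows(chrom_length, size=50_000, step=20_000):
    """Yield (start, end) half-open windows of `size` bp every `step` bp.

    Windows running past the chromosome end are truncated (an assumption).
    """
    start = 0
    while start < chrom_length:
        yield (start, min(start + size, chrom_length))
        start += step


def top_fraction(scores, fraction=0.01):
    """Return the keys of `scores` ranking in the top `fraction` by value."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def candidate_windows(*score_tables, fraction=0.01):
    """Windows that fall in the top `fraction` of every supplied statistic."""
    tops = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*tops)
```

Requiring agreement between statistics trades sensitivity for specificity: a window must be extreme under each supplied measure to be retained as a candidate.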
Gene Function and QTL Enrichment
The gene annotation file was downloaded from the Ensembl database (https://ftp.ensembl.org/pub/release-80/gtf/bos_taurus/). Cattle QTL information was downloaded from the Animal QTL database (https://www.animalgenome.org/cgi-bin/QTLdb/index). Gene functional annotation analyses were performed using the online DAVID software (https://david.ncifcrf.gov/). Fisher's exact test was conducted to measure gene enrichment in annotation terms (threshold: 0.05). QTL enrichment analysis in particular regions (selected SVs) was conducted by Genome Association Tester software with 1,000 simulations of randomly assigned genomic intervals of the same size (Heger et al. 2013). Enrichments with FDR < 0.05 were considered statistically significant.
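As an illustration of the enrichment test, a one-sided Fisher's exact test for over-representation is equivalent to the upper tail of a hypergeometric distribution, which can be computed with the standard library alone. The function and argument names are hypothetical, and the statistic reported by DAVID may differ in detail from this plain version.

```python
from math import comb


def fisher_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided P(X >= hits_in_set) under the hypergeometric null.

    hits_in_set:        genes in the selected set carrying the annotation term
    set_size:           number of genes in the selected set
    hits_in_background: genes in the background carrying the term
    background_size:    total number of background genes
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total
```

A term would then be called enriched when this P value falls below the 0.05 threshold stated above.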
Presence/Absence Analysis for the Selected SV Sequence Located in LRP1B
We downloaded 16 representative assemblies for ruminant suborder, which included GCA_014182915.2 for Gaur, GWHBDOC00000000 for Gayal, GCF_032452875.1 for Banteng, GCF_002263795.3 for Cattle, GCF_003369695.1 for Zebu Cattle, GCA_005887515.3 for Domestic Yak, GCF_000754665.1 for American Bison, GCA_963879515.1 for European Bison, GCF_019923935.1 for Water Buffalo, GCF_001704415.2 for Goat, GCF_016772045.2 for Sheep, GCA_017591445.1 for Giraffe, GCF_910594005.1 for Red Deer, GCF_022376915.1 for Chinese Forest Musk Deer, GCA_022376925.1 for Lesser Mouse Deer, GCF_949987535.1 for Minke Whale. We constructed a phylogenetic tree of species genetic distance for these assemblies based on the Mash algorithm. Then, Cactus (v2.9.0) was used to perform whole genome alignment for all species to obtain the .maf file (Hickey et al. 2024). We checked for the sequence of the SV in each species to determine its presence or absence.
Targeted-PCR Validation of the L1_BT Insertion in IGFBP7
Genomic DNA for one Holstein and two Angus cattle was extracted from preadipocytes following the instructions of the DNA extraction kit (Tiangen, Beijing). Primers were designed (forward: GCCAGGGGTCTTAGTCT; reverse: AGGTTCGATCCACGATA) based on the cattle reference genome (ARS-UCD1.2). PCR amplification was performed in a 50 µL reaction volume according to the Taq DNA polymerase manufacturer's protocol (2×Hieff PCR Master Mix, Yeasen, Shanghai). The genomic DNA was amplified on a Bio-Rad MyIQ thermocycler. The touchdown PCR cycle for target region amplification was as follows: initial denaturation at 94 °C for 4 min; followed by 18 cycles of 94 °C for 30 s, annealing at 68 °C ∼ 50 °C (decrease 1 °C per cycle) for 30 s; 22 cycles of 94 °C for 30 s, annealing at 50 °C for 30 s; primer extension at 72 °C for 1 min; and final extension at 72 °C for 10 min. All the amplified products were run on a 1.5% agarose gel.
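The annealing temperatures implied by the touchdown program can be listed with a small sketch. It reflects one reading of the protocol, namely that the first touchdown cycle anneals at 68 °C and each subsequent touchdown cycle is 1 °C lower, followed by the fixed-temperature cycles at 50 °C; the function name and parameters are hypothetical.

```python
def touchdown_annealing(start=68, step=1, touchdown_cycles=18,
                        plateau=50, plateau_cycles=22):
    """Return the annealing temperature (in degrees C) for every PCR cycle."""
    # Touchdown phase: temperature drops by `step` each cycle.
    temps = [start - step * i for i in range(touchdown_cycles)]
    # Plateau phase: constant annealing temperature.
    temps += [plateau] * plateau_cycles
    return temps
```

Under this reading the 18th touchdown cycle anneals at 51 °C, so the program steps down smoothly into the 22 cycles at 50 °C, for 40 amplification cycles in total.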
Single-cell RNA Sequencing for Cattle Primary Adipocytes
Cattle primary adipocytes were isolated from subcutaneous adipose tissue following our previously established protocols (Peng et al. 2023). The subcutaneous adipose tissues were collected under the approval of the Animal Experimental Ethical Inspection of Laboratory Animal Center, Huazhong Agricultural University, with the ID number HZAUCA-2022-0010. The cells were cultured in high-glucose DMEM medium (4.5 g/L glucose, 4.0 mM L-glutamine, Cytiva) containing 10% FBS supplemented with 1% penicillin-streptomycin at 37 °C and 5% CO2. When the density of the cattle primary adipocytes reached 70%, cells were collected and washed with resuspension buffer. Cells were filtered through a 40 µm strainer to remove cell clumps. Cell viability was determined to be 90% using trypan blue staining and hemocytometer counting. Three libraries were constructed for cattle primary adipocytes isolated from Holstein cattle PAV(+), Angus cattle PAV(+), and Angus cattle PAV(−) using the Chromium Single Cell 3′ Gel Bead-in-Emulsion, Library & Gel Bead Kit v3 (10× Genomics, Pleasanton, CA, United States). Each scRNA-seq library was paired-end sequenced on a single flow cell lane on an Illumina HiSeq system.
Cell Ranger (v7.1.0, 10× Genomics) was used to perform sample demultiplexing, alignment, filtering, and UMI counting. Cells were excluded if their number of detected genes deviated from the median across all cells by more than 5 median absolute deviations, or if more than 10% of their total UMIs mapped to the mitochondrial genome. Potential cell doublets were removed using DoubletFinder (v2.0) with default parameters and a 7.5% homotypic doublet proportion estimate (McGinnis et al. 2019). Data analysis was performed using the R package Seurat (v5.1.0) in the R environment (v4.1.1), including quality control, normalization, scaling, feature selection, dimensionality reduction, clustering, and visualization (Hao et al. 2021). In Seurat, data were normalized using “NormalizeData,” 2,000 highly variable genes were identified using “FindVariableFeatures,” and data were scaled using “ScaleData.” The remaining highly variable genes were decomposed by PCA, and the top 50 dimensions were selected. The 50 PCs were harmonized across samples using Harmony (v0.1) (Korsunsky et al. 2019). Cells were clustered into subpopulations according to the same dimensions using the “FindClusters” function with a resolution of 1.0.
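The cell-level quality control rule can be sketched in a few lines. This is an illustration in Python rather than the R code used for the analysis; it assumes a two-sided cutoff on an unscaled median absolute deviation and treats a cell as failing if either criterion is violated, and the function and argument names are hypothetical.

```python
from statistics import median


def keep_cell_mask(n_genes, mito_fraction, n_mads=5, max_mito=0.10):
    """Return one boolean per cell: True if the cell passes both QC rules.

    n_genes:       detected genes per cell
    mito_fraction: fraction of UMIs mapping to the mitochondrial genome per cell
    """
    med = median(n_genes)
    mad = median([abs(g - med) for g in n_genes])  # unscaled MAD (assumption)
    return [
        abs(g - med) <= n_mads * mad and m <= max_mito
        for g, m in zip(n_genes, mito_fraction)
    ]
```

A MAD-based cutoff adapts to each library's own distribution of detected genes, which is why it is often preferred over a fixed gene-count threshold.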
Function Prediction and Enhancer Activity Analysis for the L1_BT Insertion Sequence in IGFBP7
The MEME online tool (https://meme-suite.org/meme/) was used to search for motifs in the sequence of L1_BT in the IGFBP7 gene. The top 10 motifs were identified and submitted to the GOMo module to identify possible roles (Gene Ontology terms) for each motif. Enhancer activity of the L1_BT insertion sequence in IGFBP7 was analyzed using a luciferase reporter assay in HEK293T cells. Briefly, cells were seeded in a 96-well plate. When cell confluence reached ∼70% to 80%, the vectors were transfected into the cells using a transfection reagent (PEI MW 25000, Shanghai). In the luciferase reporter system, the vector pRL-TK served as an internal reference, while pGL3-Control and the empty pGL3-Basic were used as positive and negative controls, respectively. The activities of firefly luciferase and Renilla luciferase were measured according to the manufacturer's instructions for the luciferase reporter assay kit (Yeasen, Shanghai). After both luminescence values were measured, the ratio of firefly luciferase intensity to Renilla luciferase intensity was calculated. Results are presented as mean ± standard error.
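The normalization and summary described above can be sketched in a few lines of Python. This is an illustration only: the number of replicate wells, the readings, and the function names are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of wells.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Firefly/Renilla ratio per well; Renilla (pRL-TK) is the internal reference."""
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_sem(values):
    """Return (mean, standard error of the mean) from the sample standard deviation."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical readings for three replicate wells of one construct.
firefly = [1200.0, 1500.0, 1800.0]
renilla = [100.0, 100.0, 100.0]
ratios = relative_activity(firefly, renilla)
avg, sem = mean_and_sem(ratios)
print(f"{avg:.2f} ± {sem:.2f}")  # 15.00 ± 1.73
```

Dividing by the Renilla signal corrects each well for differences in transfection efficiency and cell number before constructs are compared.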
Lipid Accumulation Ability Comparison Between Cattle With Presence/Absence Status of the L1_BT Insertion Sequence in IGFBP7
Cattle primary adipocytes, with six replicates from Angus cattle with different SV genotypes, were cultured in a high-glucose DMEM medium containing 10% FBS and supplemented with 1% penicillin-streptomycin at 37 °C and 5% CO2, separately. The culture medium was changed every 48 h. When cell confluence reached 90%, a differentiation-inducing medium containing 1.0 μmol/L dexamethasone, 0.5 mmol/L 3-isobutyl-1-methylxanthine, 1.0 μmol/L rosiglitazone, and 10 mg/L insulin was applied for 48 h. Subsequently, a fresh medium containing 10 mg/L insulin and 1.0 μmol/L rosiglitazone was used to maintain differentiation. On the 12th day, the cells were fixed in 2% formaldehyde at 4 °C, then washed three times with PBS for 5 min at 4 °C. Fixed cells were stained with 7.5 µM BODIPY and 28.55 µM DAPI for 30 min at room temperature. For each replicate, 45 images were captured using fluorescence microscopy to quantify and compare lipid accumulation in cattle primary adipocytes with different SV genotypes.
Contributor Information
Shoulu Dai, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Pengju Zhao, Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
Wenhao Li, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Lingwei Peng, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China; Yazhouwan National Laboratory, Sanya 572024, China.
Enhui Jiang, College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Agricultural Molecular Biology, Yangling, Shaanxi, China.
Yuqin Du, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Wengang Zhang, Yazhouwan National Laboratory, Sanya 572024, China.
Xuelei Dai, Yazhouwan National Laboratory, Sanya 572024, China.
Liu Yang, Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD 20705, USA.
Zhiqiang Li, Yazhouwan National Laboratory, Sanya 572024, China.
Linjing Xu, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Xianyong Lan, College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Agricultural Molecular Biology, Yangling, Shaanxi, China.
Wenfa Lyu, Yazhouwan National Laboratory, Sanya 572024, China; Key Laboratory of Animal Production, Product Quality, and Security, Ministry of Education, College of Animal Science and Technology, Jilin Agricultural University, Changchun 130118, China.
Liguo Yang, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
Lingzhao Fang, Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark.
George E Liu, Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD 20705, USA.
Yang Zhou, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China; Yazhouwan National Laboratory, Sanya 572024, China.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Author Contributions
All authors have read and approved the manuscript. Y.Z., G.E.L., and L.Z.F. conceived and designed the experiments. Y.Z., S.L.D., P.J.Z., W.H.L., L.W.P., X.Z., W.G.Z., X.L.D., and L.J.X. performed in silico prediction and computational analyses. Y.Q.D. performed PCR confirmation. X.Y.L. and E.H.J. performed enhancer activity analysis. Y.Z., S.L.D., G.E.L., L.Z.F., P.J.Z., Z.Q.L., L.Y., W.F.L., and L.G.Y. collected samples and generated genome sequencing data. Y.Z., S.L.D., G.E.L., L.Z.F., and P.J.Z. wrote and revised the paper.
Funding
This work was supported by the STI2030—Major Projects (2023ZD0404802), National Natural Science Foundation of China (32472876), the Support high-quality development of the seed industry project in Hubei (HBZY2023B008), and the Basic Research Project of Yazhouwan National Laboratory (2310SH01). G.E.L. is supported in part by AFRI grant numbers 2019-67015-29321 and 2021-67015-33409 from the USDA National Institute of Food and Agriculture (NIFA). This research used resources provided by the SCINet project of the USDA ARS project number 0500-00093-001-00-D.
Data Availability
Whole-genome sequencing (WGS) data for 2,409 individuals from 82 cattle breeds were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/sra/) and our previous studies. The accession ID for each dataset is listed in supplementary table S1, Supplementary Material online. The single-cell RNA sequencing data generated in this study have been submitted to the NCBI (https://www.ncbi.nlm.nih.gov) under accession number PRJNA1198857. The pangenome graphs and SNP-SV joint reference panel for the current study are available from Figshare (https://doi.org/10.6084/m9.figshare.28021577).
All software and their respective versions used in the study are publicly available as described in the Methods section. The pipelines for pangenome construction, SNP calling, SV genotyping, etc. are available at https://github.com/PengjuZ/CattleSV.
References
- Adelson DL, Raison JM, Edgar RC. Characterization and distribution of retrotransposons and simple sequence repeats in the bovine genome. Proc Natl Acad Sci U S A. 2009:106(31):12855–12860. 10.1073/pnas.0901282106.
- Akinsola OM, Musa AA, Muansangi L, Singh SP, Mukherjee S, Mukherjee A. Genomic insights into adaptation and inbreeding among Sub-Saharan African cattle from pastoral and agropastoral systems. Front Genet. 2024:15:1430291. 10.3389/fgene.2024.1430291.
- Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19(9):1655–1664. 10.1101/gr.094052.109.
- Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011:12(5):363–375. 10.1038/nrg2958.
- Ayalew W, Wu X-y, Tarekegn GM, Chu M, Liang C-n, Sisay Tessema T, Yan P. Signatures of positive selection for local adaptation of African native cattle populations: a review. J Integr Agric. 2023:22(7):1967–1984. 10.1016/j.jia.2023.01.004.
- Bai B, Wang Y, Zhu R, Zhang Y, Wang H, Fan G, Liu X, Shi H, Niu Y, Ji W. Long-read sequencing and de novo assembly of the cynomolgus macaque genome. J Genet Genomics. 2022:49(10):975–978. 10.1016/j.jgg.2021.12.013.
- Basiricò L, Morera P, Primi V, Lacetera N, Nardone A, Bernabucci U. Cellular thermotolerance is associated with heat shock protein 70.1 genetic polymorphisms in Holstein lactating cows. Cell Stress Chaperones. 2011:16(4):441–448. 10.1007/s12192-011-0257-7.
- Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, et al. Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 2012:22(4):778–790. 10.1101/gr.133967.111.
- Bolormaa S, Pryce JE, Kemper KE, Hayes BJ, Zhang Y, Tier B, Barendse W, Reverter A, Goddard ME. Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome-wide association studies. Genet Sel Evol. 2013:45(1):43. 10.1186/1297-9686-45-43.
- Chen N, Cai Y, Chen Q, Li R, Wang K, Huang Y, Hu S, Huang S, Zhang H, Zheng Z, et al. Whole-genome resequencing reveals world-wide ancestry and adaptive introgression events of domesticated cattle in east Asia. Nat Commun. 2018a:9(1):2337. 10.1038/s41467-018-04737-0.
- Chen N, Fu W, Zhao J, Shen J, Chen Q, Zheng Z, Chen H, Sonstegard TS, Lei C, Jiang Y. BGVD: an integrated database for bovine sequencing variations and selective signatures. Genomics Proteomics Bioinformatics. 2020:18(2):186–193. 10.1016/j.gpb.2019.03.007.
- Chen S, Zhou Y, Chen Y, Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018b:34(17):i884–i890. 10.1093/bioinformatics/bty560.
- Chen Y, Khan MZ, Wang X, Liang H, Ren W, Kou X, Liu X, Chen W, Peng Y, Wang C. Structural variations in livestock genomes and their associations with phenotypic traits: a review. Front Vet Sci. 2024:11:1416220. 10.3389/fvets.2024.1416220.
- Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021:18(2):170–175. 10.1038/s41592-020-01056-5.
- Christiono S, Anggarani W. The effect of pregnancy milk on the expression of kallikrein related peptidase-4 (KLK-4) and collagen type 1 (Coll-1) in amelogenesis. Dentino (Jur. Ked. Gigi). 2021:6(2):126–130. 10.20527/dentino.v6i2.11993.
- Cicconardi F, Chillemi G, Tramontano A, Marchitelli C, Valentini A, Ajmone-Marsan P, Nardone A. Massive screening of copy number population-scale variation in Bos taurus genome. BMC Genomics. 2013:14(1):124. 10.1186/1471-2164-14-124.
- Consortium TBGSaA. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009:324(5926):522–528. 10.1126/science.1169588.
- Cooke RF, Cardoso RC, Cerri RLA, Lamb GC, Pohler KG, Riley DG, Vasconcelos JLM. Cattle adapted to tropical and subtropical environments: genetic and reproductive considerations. J Anim Sci. 2020a:98(2):skaa015. 10.1093/jas/skaa015.
- Cooke RF, Daigle CL, Moriel P, Smith SB, Tedeschi LO, Vendramini JMB. Cattle adapted to tropical and subtropical environments: social, nutritional, and carcass quality considerations. J Anim Sci. 2020b:98(2):skaa014. 10.1093/jas/skaa014.
- Crysnanto D, Leonard AS, Fang Z-H, Pausch H. Novel functional sequences uncovered through a bovine multiassembly graph. Proc Natl Acad Sci U S A. 2021:118(20):e2101056118. 10.1073/pnas.2101056118.
- Crysnanto D, Pausch H. Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery. Genome Biol. 2020:21(1):184. 10.1186/s13059-020-02105-0.
- Crysnanto D, Wurmser C, Pausch H. Accurate sequence variant genotyping in cattle using variation-aware genome graphs. Genet Sel Evol. 2019:51(1):21. 10.1186/s12711-019-0462-x.
- da Cruz AS, Silva DC, Minasi LB, de Farias Teixeira LK, Rodrigues FM, da Silva CC, do Carmo AS, da Silva M, Utsunomiya YT, Garcia JF, et al. Single-nucleotide polymorphism variations associated with specific genes putatively identified enhanced genetic predisposition for 305-day milk yield in the Girolando crossbreed. Front Genet. 2020:11:573344. 10.3389/fgene.2020.573344.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011:27(15):2156–2158. 10.1093/bioinformatics/btr330.
- Decker JE, McKay SD, Rolf MM, Kim J, Molina Alcalá A, Sonstegard TS, Hanotte O, Götherström A, Seabury CM, Praharani L, et al. Worldwide patterns of ancestry, divergence, and admixture in domesticated cattle. PLoS Genet. 2014:10(3):e1004254. 10.1371/journal.pgen.1004254.
- Decker JE, Pires JC, Conant GC, McKay SD, Heaton MP, Chen K, Cooper A, Vilkki J, Seabury CM, Caetano AR, et al. Resolving the evolution of extant and extinct ruminants with high-throughput phylogenomics. Proc Natl Acad Sci U S A. 2009:106(44):18644–18649. 10.1073/pnas.0904691106.
- Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021:372:eabf7117. 10.1126/science.abf7117.
- Eichler EE, Nickerson DA, Altshuler D, Bowcock AM, Brooks LD, Carter NP, Church DM, Felsenfeld A, Guyer M, Lee C, et al. Completing the map of human genetic variation. Nature. 2007:447(7141):161–165. 10.1038/447161a.
- Engle BN, Hayes BJ. Genetic variation in PLAG1 is associated with early fertility in Australian Brahman cattle. J Anim Sci. 2022:100(4):skac084. 10.1093/jas/skac084.
- Evdokimova V, Tognon CE, Benatar T, Yang W, Krutikov K, Pollak M, Sorensen PH, Seth A. IGFBP7 binds to the IGF-1 receptor and blocks its activation by insulin-like growth factors. Sci Signal. 2012:5(255):ra92. 10.1126/scisignal.2003184.
- Gao Y, Gautier M, Ding X, Zhang H, Wang Y, Wang X, Faruque MDO, Li J, Ye S, Gou X, et al. Species composition and environmental adaptation of indigenous Chinese cattle. Sci Rep. 2017:7(1):16196. 10.1038/s41598-017-16438-7.
- Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, et al. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res. 2025. In press. 10.2991/978-94-6463-728-1.
- Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, Boettger LM, McCarroll SA. Large multiallelic copy number variations in humans. Nat Genet. 2015:47(3):296–303. 10.1038/ng.3200.
- Hao Y, Hao S, Andersen-Nissen E, Mauck WM, 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. Integrated analysis of multimodal single-cell data. Cell. 2021:184(13):3573–3587.e29. 10.1016/j.cell.2021.04.048.
- Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009:92(2):433–443. 10.3168/jds.2008-1646.
- Hayes BJ, Daetwyler HD. 1000 bull genomes project to map simple and complex genetic traits in cattle: applications and outcomes. Annu Rev Anim Biosci. 2019:7(1):89–102. 10.1146/annurev-animal-020518-115024.
- Heger A, Webber C, Goodson M, Ponting CP, Lunter G. GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics. 2013:29(16):2046–2048. 10.1093/bioinformatics/btt343.
- Hickey G, Heller D, Monlong J, Sibbesen JA, Sirén J, Eizenga J, Dawson ET, Garrison E, Novak AM, Paten B. Genotyping structural variants in pangenome graphs using the vg toolkit. Genome Biol. 2020:21(1):35. 10.1186/s13059-020-1941-7.
- Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2024:42(4):663–673. 10.1038/s41587-023-01793-w.
- Hu Y, Xia H, Li M, Xu C, Ye X, Su R, Zhang M, Nash O, Sonstegard TS, Yang L, et al. Comparative analyses of copy number variations between Bos taurus and Bos indicus. BMC Genomics. 2020:21(1):682. 10.1186/s12864-020-07097-6.
- Hu Z-L, Park CA, Reecy JM. Bringing the animal QTLdb and CorrDB into the future: meeting new challenges and providing updated services. Nucleic Acids Res. 2021:50(D1):D956–D961. 10.1093/nar/gkab1116.
- Huddleston J, Chaisson MJP, Steinberg KM, Warren W, Hoekzema K, Gordon D, Graves-Lindsay TA, Munson KM, Kronenberg ZN, Vives L, et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 2017:27(5):677–685. 10.1101/gr.214007.116.
- Jang J, Kim K, Lee YH, Kim H. Population differentiated copy number variation of Bos taurus, Bos indicus and their African hybrids. BMC Genomics. 2021:22(1):531. 10.1186/s12864-021-07808-7.
- Kalbfleisch TS, McKay SD, Murdoch BM, Adelson DL, Almansa-Villa D, Becker G, Beckett LM, Benítez-Galeano MJ, Biase F, Casey T, et al. The Ruminant Telomere-to-Telomere (RT2T) Consortium. Nat Genet. 2024:56(8):1566–1573. 10.1038/s41588-024-01835-2.
- Karim L, Takeda H, Lin L, Druet T, Arias JA, Baurain D, Cambisano N, Davis SR, Farnir F, Grisart B, et al. Variants modulating the expression of a chromosome domain encompassing PLAG1 influence bovine stature. Nat Genet. 2011:43(5):405–413. 10.1038/ng.814.
- Katoh K, Toh H. Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics. 2010:26(15):1899–1900. 10.1093/bioinformatics/btq224.
- Kim J, Hanotte O, Mwai OA, Dessie T, Bashir S, Diallo B, Agaba M, Kim K, Kwak W, Sung S, et al. The genome landscape of indigenous African cattle. Genome Biol. 2017:18(1):34. 10.1186/s13059-017-1153-y.
- Kim K, Kim D, Hanotte O, Lee C, Kim H, Jeong C. Inference of admixture origins in indigenous African cattle. Mol Biol Evol. 2023:40(12):msad257. 10.1093/molbev/msad257.
- Kim K, Kwon T, Dessie T, Yoo D, Mwai OA, Jang J, Sung S, Lee S, Salim B, Jung J, et al. The mosaic genome of indigenous African cattle as a unique genetic resource for African pastoralism. Nat Genet. 2020:52(10):1099–1110. 10.1038/s41588-020-0694-2.
- Kommadath A, Grant JR, Krivushin K, Butty AM, Baes CF, Carthy TR, Berry DP, Stothard P. A large interactive visual database of copy number variants discovered in taurine cattle. Gigascience. 2019:8(6):giz073. 10.1093/gigascience/giz073.
- Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet. 2008:40(10):1253–1260. 10.1038/ng.237.
- Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods. 2019:16(12):1289–1296. 10.1038/s41592-019-0619-0.
- Lee YL, Takeda H, Costa Monteiro Moreira G, Karim L, Mullaart E, Coppieters W, Appeltant R, Veerkamp RF, Groenen MAM, Georges M, et al. A 12 kb multi-allelic copy number variation encompassing a GC gene enhancer is associated with mastitis resistance in dairy cattle. PLoS Genet. 2021:17(7):e1009331. 10.1371/journal.pgen.1009331.
- Leonard AS, Crysnanto D, Fang Z-H, Heaton MP, Vander Ley BL, Herrera C, Bollwein H, Bickhart DM, Kuhn KL, Smith TPL, et al. Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies. Nat Commun. 2022:13(1):3012. 10.1038/s41467-022-30680-2.
- Leonard AS, Crysnanto D, Mapel XM, Bhati M, Pausch H. Graph construction method impacts variation representation and analyses in a bovine super-pangenome. Genome Biol. 2023:24(1):124. 10.1186/s13059-023-02969-y.
- Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res. 2024:34(2):300–309. 10.1101/gr.278267.123.
- Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 2021:37(23):4572–4574. 10.1093/bioinformatics/btab705.
- Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, Mitra A, Alexander LJ, Coutinho LL, Dell'aquila ME, et al. Analysis of copy number variations among diverse cattle breeds. Genome Res. 2010:20(5):693–703. 10.1101/gr.105403.110.
- Liu Y, Fang X, Zhao Z, Li J, Albrecht E, Schering L, Maak S, Yang R. Polymorphisms of the ASIP gene and the haplotype are associated with fat deposition traits and fatty acid composition in Chinese Simmental steers. Arch Anim Breed. 2019:62(1):135–142. 10.5194/aab-62-135-2019.
- Liu Y, Xu L, Yang L, Zhao G, Li J, Liu D, Li Y. Discovery of genomic characteristics and selection signatures in southern Chinese local cattle. Front Genet. 2020:11:533052. 10.3389/fgene.2020.533052.
- Liu Z, Wang N, Su Y, Long Q, Peng Y, Shangguan L, Zhang F, Cao S, Wang X, Ge M, et al. Grapevine pangenome facilitates trait genetics and genomic breeding. Nat Genet. 2024:56(12):2804–2814. 10.1038/s41588-024-01967-5.
- Liu Z, Wu T, Xiang G, Wang H, Wang B, Feng Z, Mu Y, Li K. Enhancing animal disease resistance, production efficiency, and welfare through precise genome editing. Int J Mol Sci. 2022:23:7331. 10.3390/ijms23137331.
- Logsdon GA, Vollger MR, Hsieh P, Mao Y, Liskovykh MA, Koren S, Nurk S, Mercuri L, Dishuck PC, Rhie A, et al. The structure, function and evolution of a complete human chromosome 8. Nature. 2021:593(7857):101–107. 10.1038/s41586-021-03420-7.
- Low WY, Tearle R, Liu R, Koren S, Rhie A, Bickhart DM, Rosen BD, Kronenberg ZN, Kingan SB, Tseng E, et al. Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle. Nat Commun. 2020:11(1):2071. 10.1038/s41467-020-15848-y.
- Ma Y, Khan MZ, Xiao J, Alugongo GM, Chen X, Chen T, Liu S, He Z, Wang J, Shah MK, et al. Genetic markers associated with milk production traits in dairy cattle. Agriculture. 2021:11(10):1018. 10.3390/agriculture11101018.
- Magalhães AF, de Camargo GM, Fernandes GAJ, Gordo DG, Tonussi RL, Costa RB, Espigolan R, Silva RM, Bresolin T, de Andrade WB, et al. Genome-wide association study of meat quality traits in Nellore cattle. PLoS One. 2016:11(6):e0157845. 10.1371/journal.pone.0157845.
- Martins R, Machado PC, Pinto LFB, Silva MR, Schenkel FS, Brito LF, Pedrosa VB. Genome-wide association study and pathway analysis for fat deposition traits in nellore cattle raised in pasture-based systems. J Anim Breed Genet. 2021:138(3):360–378. 10.1111/jbg.12525.
- McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 2019:8(4):329–337.e4. 10.1016/j.cels.2019.03.003.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010:20(9):1297–1303. 10.1101/gr.107524.110.
- Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020:585(7823):79–84. 10.1038/s41586-020-2547-7.
- Mohammadabadi M, Bordbar F, Jensen J, Du M, Guo W. Key genes regulating skeletal muscle development and growth in farm animals. Animals (Basel). 2021:11(3):835. 10.3390/ani11030835.
- Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, et al. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol. 2023:55(1):9. 10.1186/s12711-023-00783-5.
- Nishimura S, Watanabe T, Mizoshita K, Tatsuda K, Fujita T, Watanabe N, Sugimoto Y, Takasuga A. Genome-wide association study identified three major QTL for carcass weight including the PLAG1-CHCHD7 QTN for stature in Japanese Black cattle. BMC Genet. 2012:13(1):40. 10.1186/1471-2156-13-40.
- Olagunju TA, Rosen BD, Neibergs HL, Becker GM, Davenport KM, Elsik CG, Hadfield TS, Koren S, Kuhn KL, Rhie A, et al. Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content. Nat Commun. 2024:15(1):8277. 10.1038/s41467-024-52384-5.
- Pampalakis G, Zingkou E, Sidiropoulos KG, Diamandis EP, Zoumpourlis V, Yousef GM, Sotiropoulou G. Biochemical pathways mediated by KLK6 protease in breast cancer. Mol Oncol. 2019:13(11):2329–2343. 10.1002/1878-0261.12493.
- Pang AW, MacDonald JR, Pinto D, Wei J, Rafiq MA, Conrad D, Park H, Hurles M, Lee C, Venter JC, et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 2010:11(5):R52. 10.1186/gb-2010-11-5-r52.
- Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012:91(2):275–292. 10.1016/j.ajhg.2012.06.014.
- Peng L, Zhang X, Du Y, Li F, Han J, Liu O, Dai S, Zhang X, Liu GE, Yang L, et al. New insights into transcriptome variation during cattle adipocyte adipogenesis by direct RNA sequencing. iScience. 2023:26(10):107753. 10.1016/j.isci.2023.107753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pitt D, Sevane N, Nicolazzi EL, MacHugh DE, Park SDE, Colli L, Martinez R, Bruford MW, Orozco-terWengel P. Domestication of cattle: two or three events? Evol Appl. 2019:12(1):123–136. 10.1111/eva.12674. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007:81(3):559–575. 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qanbari S, Pausch H, Jansen S, Somel M, Strom TM, Fries R, Nielsen R, Simianer H. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 2014:10(2):e1004148. 10.1371/journal.pgen.1004148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rajavel A, Heinrich F, Schmitt AO, Gültas M. Identifying cattle breed-specific partner choice of transcription factors during the African trypanosomiasis disease progression using bioinformatics analysis. Vaccines (Basel). 2020:8(2):246. 10.3390/vaccines8020246. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reverter A, Hudson NJ, McWilliam S, Alexandre PA, Li Y, Barlow R, Welti N, Daetwyler H, Porto-Neto LR, Dominik S. A low-density SNP genotyping panel for the accurate prediction of cattle breeds. J Anim Sci. 2020:98(11):skaa337. 10.1093/jas/skaa337. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rexroad C, Vallet J, Matukumalli LK, Reecy J, Bickhart D, Blackburn H, Boggess M, Cheng H, Clutter A, Cockett N, et al. Genome to phenome: improving animal health, production, and well-being—a new USDA blueprint for animal genome research 2018–2027. Front Genet. 2019:10:327. 10.3389/fgene.2019.00327. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rosen BD, Bickhart DM, Schnabel RD, Koren S, Elsik CG, Tseng E, Rowan TN, Low WY, Zimin A, Couldrey C, et al. De novo assembly of the cattle reference genome with single-molecule sequencing. Gigascience. 2020:9(3):giaa021. 10.1093/gigascience/giaa021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rossi C, Sinding MS, Mullin VE, Scheu A, Erven JAM, Verdugo MP, Daly KG, Ciucani MM, Mattiangeli V, Teasdale MD, et al. The genomic natural history of the aurochs. Nature. 2024:635(8037):136–141. 10.1038/s41586-024-08112-6. [DOI] [PubMed] [Google Scholar]
- Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME, Feuk L. (4806 co-authors). Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007:39(S7):S7–S15. 10.1038/ng2093. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schwalie PC, Dong H, Zachara M, Russeil J, Alpern D, Akchiche N, Caprara C, Sun W, Schlaudraff KU, Soldati G, et al. A stromal cell population that inhibits adipogenesis in mammalian fat depots. Nature. 2018:559(7712):103–108. 10.1038/s41586-018-0226-8.
- Sirén J, Monlong J, Chang X, Novak AM, Eizenga JM, Markello C, Sibbesen JA, Hickey G, Chang PC, Carroll A, et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science. 2021:374(6574):abg8871. 10.1126/science.abg8871.
- Smith TPL, Bickhart DM, Boichard D, Chamberlain AJ, Djikeng A, Jiang Y, Low WY, Pausch H, Demyda-Peyrás S, Prendergast J, et al. The Bovine Pangenome Consortium: democratizing production and accessibility of genome assemblies for global cattle breeds and other bovine species. Genome Biol. 2023:24(1):139. 10.1186/s13059-023-02975-0.
- Su R, Zhou H, Yang W, Moqir S, Ritu X, Liu L, Shi Y, Dong A, Bayier M, Letu Y, et al. Near telomere-to-telomere genome assembly of Mongolian cattle: implications for population genetic variation and beef quality. Gigascience. 2024:13:giae099. 10.1093/gigascience/giae099.
- Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang FM, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015:526(7571):75–81. 10.1038/nature15394.
- Talenti A, Powell J, Hemmink JD, Cook EAJ, Wragg D, Jayaraman S, Paxton E, Ezeasor C, Obishakin ET, Agusi ER, et al. A cattle graph genome incorporating global breed diversity. Nat Commun. 2022:13(1):910. 10.1038/s41467-022-28605-0.
- Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci U S A. 2005:102(39):13950–13955. 10.1073/pnas.0506758102.
- Tijjani A, Salim B, da Silva MVB, Eltahir HA, Musa TH, Marshall K, Hanotte O, Musa HH. Genomic signatures for drylands adaptation at gene-rich regions in African zebu cattle. Genomics. 2022:114(4):110423. 10.1016/j.ygeno.2022.110423.
- Upadhyay M, Derks MFL, Andersson G, Medugorac I, Groenen MAM, Crooijmans RPMA. Introgression contributes to distribution of structural variations in cattle. Genomics. 2021:113(5):3092–3102. 10.1016/j.ygeno.2021.07.005.
- Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, et al. From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013:43(1):11.10.11–11.10.33. 10.1002/0471250953.bi1110s43.
- Verdugo MP, Mullin VE, Scheu A, Mattiangeli V, Daly KG, Maisano Delser P, Hare AJ, Burger J, Collins MJ, Kehati R, et al. Ancient cattle genomics, origins, and rapid turnover in the Fertile Crescent. Science. 2019:365(6449):173–176. 10.1126/science.aav1002.
- Xu L, Cole JB, Bickhart DM, Hou Y, Song J, VanRaden PM, Sonstegard TS, Van Tassell CP, Liu GE. Genome wide CNV analysis reveals additional variants associated with milk production traits in Holsteins. BMC Genomics. 2014:15(1):683. 10.1186/1471-2164-15-683.
- Zamorano-Algandar R, Medrano JF, Thomas MG, Enns RM, Speidel SE, Sánchez-Castro MA, Luna-Nevárez G, Leyva-Corona JC, Luna-Nevárez P. Genetic markers associated with milk production and thermotolerance in Holstein dairy cows managed in a heat-stressed environment. Biology (Basel). 2023:12(5):679. 10.3390/biology12050679.
- Zhang T, Li H, Jiang M, Hou H, Gao Y, Li Y, Wang F, Wang J, Peng K, Liu Y-X. Nanopore sequencing: flourishing in its teenage years. J Genet Genomics. 2024:51(12):1361–1374. 10.1016/j.jgg.2024.09.007.
- Zhou Y, Yang L, Han X, Han J, Hu Y, Li F, Xia H, Peng L, Boschiero C, Rosen BD, et al. Assembly of a pangenome for global cattle reveals missing sequences and novel structural variations, providing new insights into their diversity and evolutionary history. Genome Res. 2022:32(8):1585–1601. 10.1101/gr.276550.122.
- Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, et al. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009:10(4):R42. 10.1186/gb-2009-10-4-r42.
Data Availability Statement
The WGS data for 2,409 individuals from 82 cattle breeds were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/sra/) and our previous studies. The accession ID for each dataset is listed in supplementary table S1, Supplementary Material online. The single-cell RNA sequencing data generated in this study have been submitted to the NCBI (https://www.ncbi.nlm.nih.gov) under accession number PRJNA1198857. The pangenome graphs and the SNP-SV joint reference panel for the current study are available from Figshare (https://doi.org/10.6084/m9.figshare.28021577).
All software used in the study, with the respective versions, is publicly available as described in the Methods section. The pipelines for pangenome construction, SNP calling, and SV genotyping, among other steps, are available at https://github.com/PengjuZ/CattleSV.







