Abstract
The role of structurally dynamic genomic regions in speciation is poorly understood due to challenges inherent in diploid genome assembly. Here, we reconstructed the evolutionary dynamics of structural variation in five cat species by phasing the genomes of three interspecies F1 hybrids to generate near-gapless single haplotype assemblies. We discerned that cat genomes have a paucity of segmental duplications relative to great apes, explaining their remarkable karyotypic stability. X chromosomes were hotspots of structural variation, including enrichment with inversions in a large recombination desert with characteristics of a supergene. The X-linked macrosatellite DXZ4 evolves more rapidly than 99.5% of the genome, clarifying its role in felid hybrid incompatibility. Resolved sensory gene repertoires revealed functional copy number changes associated with ecomorphological adaptations, sociality, and domestication. This study highlights the value of gapless genomes to reveal structural mechanisms underpinning karyotypic evolution, reproductive isolation, and ecological niche adaptation.
Introduction
Comparative genomics is a powerful approach for inferring the genetic basis of adaptation and speciation. Its success depends on accurate and representative whole-genome alignments that precisely quantify genetic similarities and differences between evolutionary lineages to make predictions regarding the impact of genomic divergence on phenotypic evolution and diversification. The application of long-read sequencing has enabled increasingly precise comparisons between taxa, facilitating the assembly of 92–96% of a diploid genome sequence into chromosomes1,2. However, tracing the evolutionary history of regions of high structural complexity and allelic divergence has remained challenging. Until the completion of the human telomere-to-telomere (T2T) project3–5, genomic “dark matter”6,7 that encompasses satellite arrays, centromeres, segmental duplications, and complex gene families had been missing from nearly all comparative genomic studies. Consequently, for most species, we still have a limited understanding of the evolutionary dynamics of the most repetitive genomic sequences and how their divergence manifests in reproductive isolation and phenotypic innovation.
The cat family Felidae represents a speciose and successful apex predator radiation that occupies diverse biomes across the globe. Previous comparative genomic studies have illuminated their rapid diversification in the Miocene8,9, frequent post-speciation gene flow9,10, the impacts of demographic changes on genetic diversity and fitness11–13, and the genetic consequences of domestication14. Here, we applied the trio-binning approach15 to three divergent interspecific crosses amenable to high-resolution haplotype phasing (Fig. 1a) to generate near-gapless genome assemblies from multiple species pairs along the felid phylogeny. Comparisons of these assemblies provided an unprecedented glimpse into the properties of large and complex gene families and functional repetitive elements that were previously inaccessible14,16,17. We describe insights into the cauldron of repetitive genetic variation with potentially large effects on chromosome function and speciation.
Fig. 1. Assembly and synteny comparisons among the genomes of five cat species.

(A) Phylogeny and timescale of the parent species of the three hybrid trios used for assembly and comparative analysis. Pie charts illustrate the phasing results (% of total reads) for the F1 PacBio CLR long reads. (B) Comparison of contig N50 statistics and number of assembly gaps against other highly contiguous mammalian reference genomes from domestic species. CatMax refers to the theoretical N50 maximum based on domestic cat chromosome sizes. (C) Contig alignments for the six felid single haplotype assemblies from chrsA3, B4, E2, and F2/C3 to the felCat9 diploid domestic cat long-read genome assembly, depicted on the bottom as a G-banded ideogram. Inferred centromere locations are indicated by red bars. The bars above each ideogram are colored by species and represent assembly contigs > 1 Mb. Breaks between contigs are indicated by a black line and a shift in color contrast. The full set of chromosome alignments is found in fig. S1. (D) Synteny plot133 illustrating extensive collinearity of the five species assemblies. Blue and purple alignment tracks highlight the only chromosome number change in Felidae, the Robertsonian fusion of chrF1 and chrF2 present in all felid genera, and the derived C3 chromosome observed in Geoffroy’s cat, and all species of the genus Leopardus. (E) Dot-plot alignment (left) of Geoffroy’s cat chrC3 and domestic cat chrF1 and chrF2 (illustrated with multicolor FISH in F). Note the orange alignment fragment indicating a small centromeric fragment of chrF2 that defines the (G) inversion breakpoint on the ancestral chrF2.
Results
Phased Genome Assembly Reveals Remarkable Collinearity
We used long-read PacBio sequencing to phase and assemble six single haplotype genomes from five cat species (domestic cat, leopard cat, Geoffroy’s cat, tiger, and lion) through the application of trio-binning to three F1 interspecies hybrids15. The parent species of the crosses diverged ≥4 million years ago (Mya) (Fig. 1a), enabling >99.5% of the long sequence reads to be accurately phased into the parental haplotypes18 (Fig. 1a, Supplementary Figs. 1–4, Supplementary Table 1). De novo assembly produced ultracontiguous assemblies with contig N50=77–104 Mb (Table 1, Fig. 1b). At least 99.6% of the euchromatic sequence was assembled into chromosome-length scaffolds using high-throughput chromosome conformation capture (Hi-C; Supplementary Fig. 5), with an average of just 53 gaps per genome assembly, 15 gapless chromosomes across all species, and 62% of the assembled autosomes containing two or fewer gaps (Fig. 1c, Supplementary Fig. 6), exceeding comparable parameters from all other domestic species reference assemblies (Fig. 1b). The canonical telomeric sequence shared by vertebrates is TTAGGG19; however, telomeres of other species contain different microsatellite blocks that follow the generalized pattern (TxAyGz)20. To determine which chromosome assemblies extended into one or both telomeres, we searched for telomere-like repeat sequences by requiring 80% of the terminal 100 bases of the chromosome to be labeled as a repeat family or a tandem repeat, and then extended the search window progressively. Overall, 61% of the chromosomes in the six assemblies likely extend into both telomeres, 32% extend into one telomere, and the remaining 7% lack terminal repeats and are likely incomplete. Only 32% of the assembled chromosomes possess the canonical TTAGGG tandem array at the telomere, while 21 chromosomes terminated with the FA-satellite21,22 (Supplementary Table 2). Felids have a surprising level of intraspecific variation in telomeric sequences, unlike the human genome, whose telomeres uniformly possess the canonical vertebrate telomeric sequence.
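The terminal-repeat screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: repeat annotations are assumed to be available as half-open (start, end) intervals for one chromosome, all function names are hypothetical, and the window-doubling schedule and 10-kb cap are placeholder choices, because the text specifies only the initial 100-base window and the 80% threshold.

```python
def repeat_fraction(intervals, start, end):
    """Fraction of bases in [start, end) covered by half-open repeat intervals."""
    covered = 0
    last_end = start
    for s, e in sorted(intervals):
        # Clip to the query window and to what has already been counted.
        s, e = max(s, last_end), min(e, end)
        if e > s:
            covered += e - s
            last_end = e
    return covered / (end - start)


def terminal_repeat_window(intervals, chrom_len, side, window=100,
                           min_frac=0.8, max_window=10_000):
    """Largest tested terminal window still >= min_frac repeat-labeled (0 if none)."""
    best = 0
    while window <= min(max_window, chrom_len):
        if side == "left":
            lo, hi = 0, window
        else:
            lo, hi = chrom_len - window, chrom_len
        if repeat_fraction(intervals, lo, hi) < min_frac:
            break
        best = window
        window *= 2  # hypothetical extension schedule (assumption)
    return best


def classify_chromosome(intervals, chrom_len):
    """Label a chromosome by how many ends pass the terminal-repeat screen."""
    n_ends = sum(terminal_repeat_window(intervals, chrom_len, side) > 0
                 for side in ("left", "right"))
    return {2: "both telomeres", 1: "one telomere", 0: "no terminal repeats"}[n_ends]
```

Under these assumptions, a chromosome whose annotation covers only its first 950 bases would be labeled as extending into one telomere.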
Table 1.
Felid single haplotype genome assembly statistics.
| Species/Hybrid sequenced | Domestic cat-508 (Bengal cat F1) | Domestic cat-126 (Safari cat F1) | Asian leopard cat (Bengal cat F1) | Geoffroy’s cat (Safari cat F1) | Lion (Liger F1) | Tiger (Liger F1) |
|---|---|---|---|---|---|---|
| Sex of parent haplotype | ♀ | ♀ | ♀ | ♀ | ♂ | ♀ |
| chromosome # | 18, X | 18, X | 18, X | 17, X | 18, Y | 18, X |
| # contigs | 123 | 103 | 132 | 88 | 103 | 135 |
| Largest contig | 205,171,639 | 172,124,406 | 240,846,738 | 239,106,607 | 166,870,000 | 166,130,000 |
| Ungapped assembly length (bp) | 2,422,283,418 | 2,425,722,929 | 2,435,689,660 | 2,426,362,316 | 2,297,542,863 | 2,408,668,598 |
| Contig N50 (bp) | 84,507,663 | 92,686,623 | 83,696,501 | 104,474,415 | 77,781,637 | 74,360,613 |
| # scaffolds | 71 | 70 | 83 | 46 | 53 | 74 |
| Total assembly length (bp) | 2,422,299,418 | 2,425,747,038 | 2,435,718,761 | 2,426,370,816 | 2,297,568,983 | 2,408,695,688 |
| Scaffold N50 (bp) | 147,603,332 | 148,491,486 | 148,587,958 | 152,606,360 | 147,402,474 | 146,942,463 |
| # chromosome gaps | 60 | 39 | 56 | 45 | 55 | 65 |
| Complete BUSCO genes (mammalia_odb10) | 8,621 | 8,619 | 8,621 | 8,612 | 8,417 | 8,630 |
| Percent Complete | 93.4 | 93.4 | 93.4 | 93.3 | 91.2 | 93.5 |
| Single Copy | 8,599 | 8,596 | 8,599 | 8,592 | 8,383 | 8,601 |
| Fragmented | 154 | 160 | 154 | 152 | 143 | 147 |
| Missing | 451 | 447 | 451 | 462 | 666 | 449 |
| %complete+partial | 95.1 | 95.2 | 95.1 | 95.0 | 92.8 | 95.1 |
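The contig and scaffold N50 values in Table 1 follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (arbitrary units), not data from Table 1.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

Applying the same function to chromosome lengths rather than contig lengths gives the theoretical ceiling that Fig. 1b refers to as CatMax.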
Pairwise whole-genome alignments between the five species’ assemblies revealed near-complete karyotypic stasis since they diverged from a common ancestor ~11–15 million years ago (Mya)9,10 (Fig. 1d). The only change in chromosome number is a single Robertsonian translocation of two small acrocentrics (chrF1 and chrF2), producing a medium-size metacentric (chrC3) shared by all species of the Neotropical cat genus Leopardus (Figs. 1e–f)23. Close inspection of alignments between Leopardus geoffroyi and Felis catus showed that chromosome C3 was the product of a centric fusion, followed by a near chromosome arm-length inversion that reoriented >99% of C3q relative to the ancestral chrF2 homolog (Fig. 1e). All other chromosomal rearrangements between species were inversions several orders of magnitude smaller in size (<2 Mb) (Fig. 2a; Supplementary Table 3). We identified 170 fixed inversions greater than 50 bp (Fig. 2a) across the five-species phylogeny, which samples >50 million years of independent branch length. By comparison, great ape genomes contain the products of 1,326 fixed inversions larger than 50 bp1 (Fig. 2a). Felids and great apes diverged on a very similar evolutionary timescale, matching nearly 1:1 for divergence events (Fig. 2a). Given the similarity in sampled evolutionary history, great ape genomes possess 7.8-fold more rearrangements than felids (1,326 versus 170 fixed inversions), suggesting that great ape genomes are more structurally prone to chromosome rearrangement than felids.
Fig. 2. Felid structural variation.

(A) Comparison of fixed inversions (red numbers) plotted on branches of the phylogeny of felids (right) and great apes (left) (5). Note the similar divergence times between ape and felid species sampled. (B) Per chromosome inversion counts plotted against chromosome length. Autosomes are indicated with blue dots and chrX in red. (C) Comparison of inversion size between the autosomes and chrX for each branch of the phylogeny (colored dots) shown in (A) (except for the lion genome, which is derived from the paternal haplotype of the male F1 Liger) (Supplementary Table 3). Significance was assessed with a one-sided Wilcoxon rank-sum test. Domestic cat (n=40 autosomal inversions, n=11 X inversions, U=2.52, p=5.9e-03), Geoffroy’s cat (n=33 autosomal inversions, n=4 X inversions, U=2.15, p=1.6e-02), Asian leopard cat (n=40 autosomal inversions, n=6 X inversions, U=1.92, p=2.7e-02), domestic cat + Asian leopard cat (n=17 autosomal inversions, n=3 X inversions, U=2.59, p=4.8e-03), tiger (n=34 autosomal inversions, n=11 X inversions, U=2.54, p=5.6e-03), lion (n=34 autosomal inversions). Box plots show the interquartile range with the center line representing the median. Whiskers indicate the highest and lowest value within the upper and lower fences (upper fence = 75% quantile + 1.5*interquartile range, lower fence = 25% quantile − 1.5*interquartile range). (D) The physical distribution of fixed and polymorphic inversions (Supplementary Table 4) on chrX for each branch of the phylogeny relative to the tiger genome. The X chromosome genome sequences are otherwise collinear across species. A tiger recombination map estimated from population genomic data (Supplementary Fig. 30) is depicted at the bottom (see Methods) and is highly conserved with the recombination rate profile of the domestic cat X chromosome9,134. The shaded area refers to a large recombination cold spot shared with domestic cat, human, and pig9,10. CEN=centromere.
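For readers reproducing the box plots in panel (C), the whisker limits can be sketched as below. This assumes the standard Tukey convention, in which the fences sit 1.5 interquartile ranges beyond the first (25%) and third (75%) quartiles. The function name, the choice of the "inclusive" quantile method, and the toy values are assumptions for illustration; plotting packages differ in how they interpolate quartiles.

```python
import statistics


def box_plot_summary(values):
    """Median, Tukey fences, and whisker ends for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that lie inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Toy inversion sizes (arbitrary units), not data from Supplementary Table 3.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In the toy example the value 100 falls above the upper fence and would be drawn as an outlier rather than extending the whisker.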
Segmental duplications (SDs) have been hypothesized to be major drivers of chromosome evolution and disease susceptibility in the great ape lineage by promoting non-allelic homologous recombination24,25, particularly because of their uniquely interspersed distribution 26. In support of this hypothesis, SDs flank 82–86% of known primate inversions27. To determine whether SDs might be a primary driver of felid inversions, we used SEDEF28 to identify SDs in each cat haplotype. The total bases in felid SDs range from 25Mb to 35Mb, or 1 to 1.5% of each genome (Supplementary Fig. 7). Most SDs reside in the small portion of unlocalized sequences, ≤0.4% of the total assembly length (Table 1). The SD frequency (7%) estimated in the human T2T genome29 is 5–7-fold higher than in felid genomes. Compared to great apes, the similar-fold reduction in chromosomal rearrangements and SD frequency in felid genomes supports the hypothesis that the overall frequency of SDs is the primary driver of chromosome evolution in these two lineages. Future analysis of near-gapless genomes in other mammalian lineages with highly variable rates of karyotypic evolution will enable the testing of this hypothesis.
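The fold differences compared in this paragraph follow directly from the reported counts and percentages. The short calculation below is only a worked check of that arithmetic, using the inversion counts and SD percentages stated in the text; the variable and function names are illustrative.

```python
def fold(numerator, denominator):
    """Simple fold difference between two quantities."""
    return numerator / denominator


# Fixed inversions > 50 bp reported for each lineage.
ape_inversions = 1326
felid_inversions = 170
inversion_fold = fold(ape_inversions, felid_inversions)

# Share of the genome in segmental duplications (percent).
human_sd_pct = 7.0
felid_sd_pct_range = (1.0, 1.5)
sd_fold_range = (
    fold(human_sd_pct, felid_sd_pct_range[1]),  # against the highest felid value
    fold(human_sd_pct, felid_sd_pct_range[0]),  # against the lowest felid value
)
```

The two ratios are of the same order, which is the basis for the comparison drawn in the text.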
Structural variation is enriched on the X chromosome.
The hemizygous nature of the X chromosome (chrX) in male heterogametic taxa promotes faster rates of evolution relative to the autosomes and the accumulation of loci associated with reproductive isolation and speciation30,31. Previous studies also revealed a higher fixation rate of inversions on chrX relative to autosomes32,33. In cats, chrX was an outlier in terms of the number of inversions relative to chromosome length (Fig. 2b). For each branch in the phylogeny, inversions were significantly larger on chrX than on the autosomes (Fig. 2c). Inversions accumulated disproportionately in a ~40-Mb recombination cold spot on chrX that is enriched for barriers to gene flow across multiple felid lineages10 (Fig. 2d). Two-thirds (24/36) of the X-linked inversions were fixed rather than polymorphic (Supplementary Table 3). Seventy percent of fixed inversions harbored at least one protein-coding gene (mean 1.3 genes/fixed inversion). In contrast, only 33% of polymorphic inversions spanned or overlapped a single protein-coding gene, and in half of these cases the inversion was located within a long intron (Supplementary Table 4). These results support previous observations in insects33 and suggest that the fixed X-linked inversions within the 40-Mb recombination cold spot may harbor beneficial alleles, given their longer length and enrichment with protein-coding genes. Previous studies of small and big cats also identified signatures of natural selection within the large recombination cold spot14,34. We hypothesize that this gene-rich, inversion-rich region is a major X-linked supergene locus underpinning felid reproductive isolation that warrants future comparative genomic analyses.
Satellite elements have been implicated in speciation but are poorly represented in diploid genome assemblies35,36. Cat chrX harbors the only X-linked speciation gene identified in mammals: the macrosatellite repeat DXZ437. DXZ4 has been well studied regarding its putative role in mammalian X chromosome inactivation (XCI). Human DXZ4 consists of a single 3-kb tandem repeat array containing 56 monomers, where each repeat contains a single CTCF binding site4 (Fig. 3a). Long non-coding RNAs (DANT1 and DANT2) expressed from DXZ4 on the inactive X chromosome (Xi) promote superlooping with other macrosatellites on the Xi38 and facilitate the localization of the Barr body in female placental mammals to the nucleolar membrane39 (Fig. 3a). The human T2T genome assembly first resolved the DXZ4 array structure, but a complete assembly of DXZ4 sequences in other mammalian taxa is largely lacking, clouding our understanding of its evolution and function. DXZ4 was resolved in all six cat assemblies, revealing a unique compound tandem repeat composed of two highly divergent (mean p-distance=0.67) repeat arrays, RA and RB (Fig. 3b). Both monomer types contain CTCF binding sites but differ notably in the number and orientation of the sites, features that are important for CTCF binding affinity and loop extrusion directionality40, suggesting divergent superlooping functions between the arrays. The human and mouse genomes notably lack the RB array.
Fig. 3. DXZ4 evolution in placental mammals.

(A) (left) X-linked lncRNAs from Dxz4, Xist, and Firre cooperatively interact in 3D space to anchor the inactive X chromosome to the nucleolus (figure modified from ref. 135). (right) Comparison of the human and domestic cat DXZ4 repeat structure and GC content shown in genomic context to flanking genes PLS3 and AGTR2. Felids possess two distinct repeat arrays, RA (blue) and RB (yellow), while human only possesses the RA type. (B) DXZ4 repeat unit size, CTCF binding site composition (purple arrows), and copy number in human (top) and sequenced cat species. The Jungle cat data are from a single haplotype chrX assembly (27). (C) StainedGlass (59) dot-plots showing DXZ4 repeat array divergence between the domestic cat (FCA-126) and other cat species (% identity of between-species alignments is shown to the right), in increasing order of evolutionary divergence. Note higher conservation across the central and flanking regions adjacent to the RA and RB arrays. (D) Distribution of genomic divergence rates between tiger-Geoffroy’s cat and tiger-domestic cat across 28,312 5-kb alignment windows. Pairwise divergence values for DXZ4 RA and RB, and the internal spacer region are shown for comparison. (E) Phylogeny of placental mammals with DXZ4 repeat array presence (blue=RA type, yellow=RB type, gray=ambiguous) inferred from each genome assembly.
Studies using interspecific backcross hybrids of the domestic cat and Jungle cat (Felis chaus) identified DXZ4 as a major-effect hybrid male sterility locus, with a likely role in reproductive isolation and speciation within the Felis genus37. The germ cells of sterile male hybrid cats possess RA-specific methylation defects and DANT1 misregulation, culminating in the failure of meiotic sex chromosome inactivation (MSCI) and meiotic arrest, hallmark phenotypes in mammalian interspecies hybrids31. Evidence that DXZ4 functions in male meiotic silencing was intriguing, given the parallels between the heterochromatic Barr body formed during female XCI and the condensed X-Y body in male MSCI. Although the hybrid sterility phenotype was attributed to DXZ4 interspecific divergence, the precise mechanism is not well understood. Here, our expanded sampling of felid genomes demonstrates that the compound RA and RB repeat structure is copy number variable across all species (Fig. 3b), suggesting copy number-mediated expression effects may play an important role in speciation in other felids. In addition, StainedGlass41 plots illustrate the rapidity of DXZ4 repeat array sequence divergence (Fig. 3c). RA and RB arrays evolve 2–3 fold faster than the flanking and intervening non-coding spacer sequences. Notably, a genome-wide analysis of pairwise interspecific genetic divergence calculated across 28,312 5-kb alignment windows (94.1% of the multispecies alignment) placed DXZ4 RA in the top 0.5% of the most rapidly evolving genomic loci (Fig. 3d), supporting its role as a speciation gene37.
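The window-based ranking of DXZ4 against the rest of the genome can be illustrated with a short sketch. It assumes a pairwise alignment supplied as two equal-length gapped strings and uses the uncorrected p-distance over non-overlapping windows. The function names are hypothetical, and the study's actual divergence measure and window filtering may differ from this simplification.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites among columns where both sequences are A/C/G/T."""
    valid = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            valid += 1
            diffs += a != b
    return diffs / valid if valid else None


def window_distances(aln_a, aln_b, window=5000):
    """p-distance for consecutive non-overlapping alignment windows."""
    out = []
    for start in range(0, len(aln_a) - window + 1, window):
        d = p_distance(aln_a[start:start + window], aln_b[start:start + window])
        if d is not None:  # skip windows with no comparable sites
            out.append(d)
    return out


def fraction_slower(background, focal):
    """Fraction of background windows with divergence strictly below the focal value."""
    return sum(d < focal for d in background) / len(background)
```

With this helper, a locus in the top 0.5% of the distribution is one for which `fraction_slower` returns at least 0.995.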
To determine whether the compound DXZ4 array structure in cats is the exception or the rule in placental mammals, we searched for DXZ4 arrays in long-read genome assemblies from species representing divergent superorders (Fig. 3e, Supplementary Figs. 8–11). Most assemblies possessed a gap within or adjacent to the predicted position of DXZ4 (Supplementary Figs. 12–13). We were able to recover sufficient repeat array resolution at the edge of some assembly gaps to characterize the CTCF array. Although the DXZ4 monomer sequence diverges rapidly to the point of phylogenetic saturation and lack of phylogenetic patterning (Supplementary Fig. 14), we observed conservation of the CTCF binding motif patterns across species from different ordinal lineages. Euarchontoglires (e.g., primates, rodents, rabbits) possessed only RA or RB, while members of Laurasiatheria possess RA, RB, or both types (Fig. 3e). RA and RB were therefore present in the most recent common ancestor of boreoeutherian mammals. Moreover, the repeat unit length is relatively constrained (between 3–4.6 kb) across species despite rapid sequence divergence and little conservation outside the CTCF motif42. Given this unusual combination of spatial and structural evolutionary conservation and an extremely fast rate of sequence evolution, we predict that DXZ4 satellite divergence may play a more widespread role in establishing and maintaining species boundaries in other mammalian clades.
Intriguingly, all sampled species from the family Bovidae lack DXZ4 in their assemblies, suggesting they may have evolved compensatory mechanisms for its loss. Multiple studies have shown that ablation of DXZ4 has no significant impact on the silenced state of the inactive X chromosome in mouse and human cells40,43. Nonetheless, the high degree of syntenic, CTCF42, and spatial conservation of the DXZ4 repeat array over the past 104 million years of the placental mammal radiation suggests that DXZ4 expression and long-range chromatin interactions are functionally important for some heretofore unidentified cellular role during XCI and MSCI44. Pan-autosomal gene downregulation is one noteworthy cellular phenotype shared by in vivo Dxz4-knock-out mice45 and sterile feline interspecific hybrid testes37. These observations raise the possibility that DXZ4, acting alone or in concert with other X-linked macrosatellites, may function in RNA-dependent, chrX-autosomal crosstalk associated with the X chromosome “counting” process in XCI45 and proper sequestration of the DNA damage response factors from the autosomes to the X-Y body during MSCI46,47. Gapless X chromosome assemblies from a diverse sampling of mammalian genomes will be critical to understanding the functional relevance of DXZ4 in the X chromosome biology of mammals.
Variation in centromere structure and size
Current human and great ape centromere sequence models portray large tandem repeat arrays of alpha satellites flanked by other satellite repeat types, SDs, transposable elements, and even some genes48. Whether centromere structure is conserved across mammalian lineages is poorly understood because centromeres are not sequence-resolved in most genome assemblies. Therefore, we sought to determine whether our assemblies possessed genomic signatures characteristic of centromeric satellites5. Given the absence of previously annotated cat centromeric sequences, we first characterized the overall landscape of feline repetitive elements to enable de novo prediction of the most probable centromeric satellites (Supplementary Fig. 15). Interspersed repeats comprise 38% of each genome, with a marked distinction between Felinae (Felis, Prionailurus, and Leopardus) and Panthera: Felinae show an average SINE insertion rate ~2.7x higher than Panthera, whereas the LINE insertion rate in Panthera is ~1.6x higher than in Felinae (Supplementary Fig. 16).
Next, we searched for novel repeat enrichment within narrowly defined chromosomal regions for which we had strong a priori evidence classifying that region as centromere-containing based on integrative analysis of comparative mapping approaches9,14,17 (Supplementary Fig. 17). This strategy identified a single, most probable centromere-containing interval for each chromosome enriched >1,000-fold with a small class of tandem repeats (Supplementary Fig. 18). The location of these intervals was highly conserved across species and consistent with stability of the felid karyotype. Like human and ape centromeres, several better-resolved cat centromeres (e.g., chrE3, Fig. 4a) consisted of a central satellite array of higher-order repeats (HORs). The predominant satellite repeat was 113-bp in length, ~25% smaller than the 151-bp alpha satellite typical of great ape centromeres5,48 (Supplementary Fig. 19). StainedGlass analysis of these candidate satellite arrays revealed patterns of monomer divergence similar to great ape centromere arrays, with more divergent monomers flanking higher identity monomers within the central satellite array (Fig. 4a). The Geoffroy’s cat possessed the largest centromeric repeat arrays on most chromosomes (Supplementary Fig. 20). This species’ karyotype also has the distinct C3 metacentric chromosome, a product of a Robertsonian chromosome fusion between chrF1 and chrF2 that occurred in the ancestor of the Leopardus lineage ≥3 Mya9,10. StainedGlass and syntenic alignment plots (Fig. 4b, Supplementary Fig. 21) reveal that Geoffroy’s cat chrC3 centromeric region retains the highest pattern and sequence similarity to the ancestral chrF1 centromeric satellite array.
Fig. 4. Centromere annotation and evolution.

(A) StainedGlass41 dot-plot of domestic cat 126 chrE3 centromere region showing percent identity of self-alignments within the satellite repeat array (colored triangle, with % identity scale and distribution shown in the upper right). Below the chromosome are tracks for tandem repeat annotations (colors indicate different GRM-defined repeat units) and RepeatMasker annotations (key at bottom). (B) Geoffroy’s cat chrC3 centromere region. The bottom two panels display NCBI CpG and gene annotations and inferred homology to the domestic cat F1 and F2 centromeric regions. The top tracks show StainedGlass plots and repeat annotations (and fractions observed on y axis). The most probable centromeric repeat array is highlighted in yellow and supported by alignments in Supplementary Figure 21.
Centromere sizes and repeat composition varied markedly between chromosomes and across felid species. Although we cannot exclude incomplete or collapsed sequences as the source of some of this variation (Supplementary Figs. 22–25), the centromeric regions of three autosomes were gapless in all six felid genomes (chrs. B4, D4, and E2), likely due to reduced satellite array repeat complexity. For example, Felis chrB4 possesses a narrower centromeric interval and lacks the large satellite arrays observed on other chromosomes (Supplementary Fig. 26). Some mammalian families, like equids (donkeys, onagers, zebras), also exhibit considerable variability in the presence/absence of satellite repeats at their centromeres49,50. By contrast, the chrD4 centromere possesses a mostly conserved satellite array and illustrates the rapidity with which the central satellite monomer array sequences diverge relative to the flanking sequence (Supplementary Figs. 27–28), similar to great apes5. These new assemblies pave the way to exploring the potential role of interspecific centromeric satellite variation in felid meiotic drive and speciation51.
Evolutionary Innovations in Sensory Supergene Families
Olfactory receptor genes (ORGs) encode receptors that detect odorants and represent the largest gene superfamily, dispersed across the majority of mammalian chromosomes52 (Fig. 5a). Variation in repertoire size and functional content has been linked to shifts in ecology, diet, and life history traits and is likely a crucial component of adaptation to new environments53,54. Most comparative studies of ORG variation were based on short-read assemblies, which confound allelic discrimination and gene copy number differences. Indeed, previous enumerations of differences in ORG repertoire sizes between cats and tigers produced opposing results14,54. We quantified the functional ORG and vomeronasal receptor (V1R) gene profiles within each genome assembly and added published repertoire reconstructions from the Jungle cat (Felis chaus)37 and a Fishing cat (Prionailurus viverrinus) based on HiFi reads55. These assemblies showed gapless ORG and V1R gene cluster inclusion with contiguity metrics approaching the single haplotype assemblies (mean cN50=80 vs. 91 Mb).
We observed large differences in ORG copy number (>10% of the maximum repertoire size) between species (Fig. 5b, Supplementary Table 5). Felids retain >70% functional ORGs (Supplementary Table 6), a larger proportion than most mammals54. This elevated functional repertoire may reflect their predatory behaviors, with an acute sense of smell to track and locate prey across great physical distances56. The tiger is solitary and has among the largest home range sizes and habitat diversity of any living felid57. It possessed the most extensive functional ORG repertoire and the highest number of gene duplications of any sampled species for air-borne Class II ORGs (Fig. 5b–c, Supplementary Tables 6–7). Several ORGs that are known to bind volatile compounds in the blood (OR1G1: nonanal, OR2W1, and OR51V1: hexanal)58,59, and the pheromone androstenone (OR7D4)58 had relatively high copy numbers (Supplementary Fig. 29). The tiger and Geoffroy’s cat lineages both possessed specific duplications in ORGs associated with blood-associated odorants. By contrast, the ancestor of the domestic cat lineage had the fewest ORG duplication events, potentially reflecting relaxed evolutionary pressure on olfaction before or during domestication.
Fig. 5. Olfactory (ORG) and Vomeronasal (V1R) receptor gene evolution in cats.

(A) Chromosomal distribution of ORG (red) and V1R (blue) genes within the domestic cat genome. (B) Phylogeny and rate of ORG family duplications (scale to lower left). Barplots to the right illustrate per-species ORG (navy blue) and V1R (purple) functional gene copy number. (C) Number of per-branch unique ORG retention, classified into class I (blue=“water-borne”) and class II (green=“airborne”) receptor types. Each circle represents a uniquely retained gene, with its subfamily classification depicted by the number. (D–F) Models of ORG birth and death with specific examples. (D) shows the standard birth and death (pseudogenization) model, illustrated by tiger chrD1 (OR4P4a and OR4P4b). (E) illustrates a gene birth followed by paralog birth via segmental duplication in the fishing cat. (F) illustrates a gene birth via segmental duplication in the Panthera ancestor preceding speciation of the lion and tiger lineages.
Class I ORG families (OR51, OR52, OR55, OR56) are generally considered the ‘water-borne’ odorant-binding class, and selection for functional copies is usually rare in terrestrial mammals. The Fishing cat (Prionailurus viverrinus) is one of two felids with pronounced aquatic adaptations such as foot webbing and other otter-like morphological adaptations to the head and tail56. The Fishing cat possesses one of the largest relative percentages of functional water-borne ORGs (75%), similar to the two domestic cats (74 and 76%), and higher than the other wild felids (Lion: 67%, Tiger: 71%, Geoffroy’s cat: 72%, Leopard cat and Jungle cat: 73%; Supplementary Table 8). Notably, the adaptive importance of water-borne olfactory receptors to the Fishing cat is reflected in the lack of any Class I-specific pseudogenization events within its lineage and the retention of three functional Class I ORGs that have subsequently been pseudogenized in all other felid species (Fig. 5c, Supplementary Table 9).
Olfactory receptor gene sequences evolve through an evolutionary pattern known as the birth-and-death model60 (Fig. 5d). This model assumes new ORGs are ‘born’ through tandem gene duplication and retained via subfunctionalization or neofunctionalization61. Gene death occurs from nonsense mutations or larger-scale genic deletions. Analysis of the chromosomal regions flanking ORG clusters revealed that while many of the inferred duplication events consisted of the ORG sequence alone, 18 of the 198 detected lineage-specific gene duplications (9.1%) were the product of larger SDs spanning ≥2,000 bp (Fig. 5e–f), similar to the frequency (10%) of SD-driven ORG duplications in humans62. A mean of 2.73 amino acid changes was observed between functional segmentally duplicated ORGs, compared to 2.3 amino acid changes in gene-specific duplicates, suggesting that the strength of natural selection acting on ORG evolution may depend on the duplication mechanism. This distinction is important because not all genes duplicated as part of a larger block may be targets of selection. Segmental duplication likely explains some of the more extensive ORG repertoires observed in mammals, as in the African elephant, which is estimated to possess over 2,000 functional genes but more than 1,000 pseudogenes63. Future analyses of sensory genes in T2T genomes will allow further exploration of this model of ORG evolution in a range of vertebrate taxa.
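The distinction between gene-only and SD-driven duplications rests on the length of the duplicated span. A minimal sketch of that classification is shown below, using the 2,000-bp threshold stated in the text. The function name and the span lengths are synthetic placeholders chosen only to mirror the reported totals (18 of 198 events); they are not measurements from the study.

```python
def sd_driven_fraction(span_lengths_bp, min_sd_bp=2000):
    """Count duplication events whose duplicated span meets the SD length threshold."""
    sd_events = sum(span >= min_sd_bp for span in span_lengths_bp)
    return sd_events, sd_events / len(span_lengths_bp)


# Synthetic spans: 180 gene-sized duplications and 18 longer blocks (toy values).
toy_spans = [950] * 180 + [2000] * 10 + [15000] * 8
n_sd, frac_sd = sd_driven_fraction(toy_spans)
pct_sd = round(100 * frac_sd, 1)
```

With the toy input, the function recovers the 9.1% share reported for SD-driven duplications.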
Vomeronasal receptors (V1R) detect pheromones and other sociochemicals. We recovered complete V1R gene repertoires for each species, ranging from 67 genes in the Jungle cat to 85 genes in the Tiger (Fig. 5b), with ~36% of V1R genes retaining function across species (Supplementary Tables 10–12). The Tiger genome possessed the most functional V1R loci. As with the Tiger’s large functional ORG repertoire, this is potentially attributable to the large physical distances over which tigers must detect scent marks and discriminate potential conspecific and reproductively receptive mates64. Most of the estimated gene duplication events occurred in the Tiger and Lion genomes and may reflect divergent adaptations to the use of social/sexual cues in solitary and social life histories. Interestingly, we observed the highest frequency of non-functional V1R genes (68%) within the Lion genome. Because Lions live in highly cooperative groups in physical proximity, we hypothesize that the increased pseudogenization rate may be the product of relaxed selection on the use of chemical cues for determining sexual status and identifying mates relative to solitary species. Furthermore, while there was no lineage- or species-specific retention of functional V1R genes as in the ORG family, the only unique V1R gene loss event occurred in the ancestor of the domestic cats, evidence of relaxed selective pressures during domestication14.
Discussion
Here we applied feline hybrid models to produce multiple well-annotated and near-gapless sequence assemblies spanning the felid radiation. Despite their similar evolutionary ages, great ape and felid lineages differ markedly in segmental duplication density, which provides a genomic explanation for the striking karyotypic stability observed across the cat radiation. Resolving recalcitrant sequence structures also clarifies how natural selection continues to shape different axes of genomic diversity. The chemosensory system is particularly relevant as gene family variation has large fitness effects, and here we showed that precisely resolved gene repertoires allow for discriminating the ecological relevance of gene birth and death. Notably, large differences in ORG and V1R gene repertoires between the closely related lion and tiger likely mirror the outcome of natural selection on evolved differences in social versus solitary life histories. The private retention of aquatic-borne odorant receptors in the fishing cat also helps to clarify the role of natural selection in ecological niche adaptation. Future studies of sensory gene repertoire variation within species occupying broad geographic ranges and habitats (e.g., tiger, puma, bobcat) using phased assembly approaches will provide critical insights into the genetic basis of local sensory adaptation.
Speciation studies typically focus on the landscape of divergence, seeking outlier loci or ‘islands of speciation’ to uncover the genetic barriers that maintain species boundaries in the face of gene flow65. Our study illustrates the rapidity with which functional satellite elements evolve relative to background rates of protein and genic sequence variation and provides additional evidence as to the role of DXZ4’s exceptional divergence in felid speciation. Yet satellites are often invisible to divergence scans as these highly repetitive regions are typically missing4,37 or misassembled in most diploid genome assemblies. Future genomic prospecting from T2T genomes3,66 promises to lend new insights into the landscape of genomic and structural divergence in adaptive phenotypic variation. We anticipate exciting breakthroughs inferring the genetic mechanisms of speciation and enabling genomically-informed biodiversity conservation67–69.
Methods
Biological materials and genome sequencing
Fibroblast cell lines were established at the National Cancer Institute under protocols approved under contract N01-CO-12400. The parent-offspring trio of the Safari cat was composed of a random-bred domestic cat (Felis silvestris catus) dam, a Geoffroy’s cat (Leopardus geoffroyi) sire, and a female F1 offspring. Cell lines were karyotyped to confirm species identity and F1 status (Supplementary Fig. 31). The details of the Bengal cat F1 trio were previously reported18,70,71. The parent-offspring trio of the Liger was composed of a Tiger dam, a Lion sire, and a male F1 offspring (LxT-3). A karyotype of the F1 male liger was generated (Supplementary Fig. 32).
High molecular weight genomic DNA was extracted from cells using a modified salting-out protocol72. PacBio SMRT libraries were size selected (>20-kb) and sequenced on the Sequel IIe instrument to yield approximately 158x and 153x coverage for the Safari and Liger F1, respectively. The Bengal F1 reads18 were sequenced on the Sequel I platform to 90x coverage.
Illumina fragment libraries (~300-bp average insert size) were prepared for the parent samples of trios using the NEBNext Ultra II FS DNA Library Kit (New England Biolabs Inc.). Samples were sequenced to ~30x coverage with 2×150-bp reads on the NovaSeq 6000 platform.
Hi-C library preparation and sequencing
Fibroblasts were fixed as a monolayer using 1% formaldehyde, divided into ~4.2×10⁶ cell aliquots, snap-frozen in liquid nitrogen, and stored at −80°C73. Cells were lysed, resuspended in 200 µl of 0.5x DNase I digestion buffer, and chromatin digested with 1.5 units of DNase I for 4 minutes. Downstream library preparation was performed as described73 and sequenced on the Illumina NovaSeq 6000 to ~78x coverage.
Genome Assembly and Annotation
Haplotype Binning
All Illumina data were quality-checked with FastQC74 v0.11.8 and adapter-trimmed using Trim Galore! v0.6.4. Illumina sequences were unavailable for the parents of the F1 Safari cat. Therefore, we used the domestic cat parent (Fca-508) of the Bengal F1 hybrid and published Geoffroy’s cat Illumina data (Oge-3: SRR6071645)10 for phasing. Long reads were phased into haplotype bins using the trio binning feature of Canu v1.8 (TrioCanu)15,75.
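The idea behind trio binning can be illustrated with a minimal sketch: k-mers found in only one parent act as haplotype markers, and each long read is assigned to the parent whose markers it carries more of. This is a simplified illustration and not TrioCanu's implementation; the toy sequences, the k-mer size, the raw-count comparison, and the function names `kmers` and `bin_read` are all assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_read(read, maternal_only, paternal_only, k):
    """Assign a read to a haplotype by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & maternal_only)
    p = len(read_kmers & paternal_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy parental sequences (hypothetical); real inputs would be k-mer sets
# built from the parental Illumina reads.
k = 5
dam = "ACGTACGTTAGCCGATAGGCT"
sire = "ACGTACGTTAGGCGATTGGCT"
dam_kmers, sire_kmers = kmers(dam, k), kmers(sire, k)
maternal_only = dam_kmers - sire_kmers  # markers unique to the dam
paternal_only = sire_kmers - dam_kmers  # markers unique to the sire

reads = {"r1": "TAGCCGATAGG", "r2": "TAGGCGATTGG", "r3": "ACGTACGTTAG"}
bins = {name: bin_read(seq, maternal_only, paternal_only, k)
        for name, seq in reads.items()}
print(bins)
```

Reads that carry no parent-specific k-mers (here `r3`, which falls in sequence shared by both parents) stay unassigned. Interspecies F1 hybrids are favorable for this approach because the high divergence between parental haplotypes leaves few such reads.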
De novo Assembly
Haplotyped long reads for each species were assembled using NextDenovo v2.2-beta.0 (github:Nextomics/Nextdenovo) with the configuration file (.cfg) altered for inputs: minimap2_options_raw = -x ava-pb, minimap2_options_cns = -x ava-pb. The seed_cutoff= option was adjusted to 32k for all assemblies. Lion Y chromosome contigs were identified using published procedures37.
Contig Polishing and QC
NextPolish76 v1.3.0 and NextDenovo-corrected long reads were used to polish the raw contigs. Changes to the NextPolish configuration file included genome_size=auto and task=best, which instructs the program to perform two iterations of polishing using the corrected long reads. The sgs option was removed because polishing with the parental diploid short reads could convert the consensus sequence to alternate haplotypes not present in the F1. The lgs options within the configuration file were left at default settings except for adjusting minimap2_options= -x map-pb for PacBio long reads. Basic assembly statistics were generated using QUAST77 v5.0.2 with the --fast run option selected. BUSCO78 v4.0.6 was used to assess genome completeness, with the -m genome setting and the -l mammalia_odb10 database selected (9,226 single copy genes). Visual assessment of the assemblies was performed through alignment to the domestic cat assembly Fcat_Pben_1.0_maternal_alt (Fca-508: GCA_016509815.1)18 using nucmer (mummer3.23 package)79 with default settings. Delta files were used to generate dot plots using Dot: interactive dot plot viewer for genome-genome alignments (DNAnexus).
We also assessed assembly quality based on k-mer accuracy and completeness. Illumina data from each respective F1-hybrid was used to generate meryl (v1.3) k-mer databases for the two parents and child. Resulting meryl databases were then used to generate hapmer databases using Merqury’s (v1.3) hapmer script ($ sh $MERQURY/trio/hapmers.sh). The parental hapmer databases and child database were then passed to Merqury to evaluate assembly quality. We also assessed assembly quality using Inspector (https://github.com/Maggi-Chen/Inspector) (v1.0.2).
Scaffolding
Polished contigs from the domestic and Geoffroy’s cat were scaffolded using Hi-C data generated from the F1 Safari cat hybrid fibroblasts. Hi-C reads were binned into parental haplotypes prior to scaffolding by aligning the offspring reads to both polished parental assemblies using bwa mem84 v0.7.17 and the classify_by_alignment (https://github.com/esrice/trio_binning/v0.2.0) program as described in Rice et al.88. Haplotyped reads were mapped to polished contigs using the pipeline and scripts described previously88 (https://github.com/esrice/slurm-hic/), and scaffolding was performed using SALSA89,90 v2.2 with parameters -e none -m yes. We removed all Y chromosome contigs prior to scaffolding to prevent incorporation of repetitive Y chromosome contigs into paralogous autosomal regions. Previously published Hi-C data for tiger (SRR8616865) and lion (SRR10075807/SRR10075808) (DNA Zoo91) were used to scaffold their respective assemblies with SALSA parameters -e GATC -m yes. The resulting scaffolds were inspected using QUAST, nucmer, and Hi-C contact maps. RagTag92 v1.0.1 was used to align scaffolds relative to Fcat_Pben_1.0_maternal_alt (Fca-508: GCA_016509815.1). Selected RagTag parameters included --remove-small, -f 10000 and -j unplaced.txt. RagTag scaffolds were manually inspected with Hi-C maps generated using Juicer93 v1.5.7 with option -s for compatibility with DNase Hi-C libraries. Maps were visualized using Juicebox94 v1.11.08 and Juicebox Assembly Tools with scripts from 3d-dna v.180922.
Genome Annotation
The NCBI annotation pipeline provided the final assembly annotations used in our analyses. Identification and annotation of DXZ4 repeat units were performed manually using GC content traces, CTCF motif annotations, and self-self dot plots for the region using Geneious Prime v2021.0.3 and FlexiDot95 v1.06. CTCF motifs were annotated using the Geneious Annotate & Predict tool with a sequence motif of GAGTTTCGCTTGATGGCAGTGTTGCACCACGAAT, based on the conserved CTCF motif logo96, with the most prevalent nucleotide representing each position. A maximum of 13 mismatches was allowed to accommodate interspecific variation within the motif. CTCF sites annotated using this method corresponded to the approximate location within human DXZ4 repeat units originally described by Chadwick97. Independent repeat units were aligned using the Mafft Multiple Aligner v1.4.0, and maximum likelihood trees were generated with RAxML98 v8.2.11 under a GTR+I+G model of sequence evolution. Trees were pruned using Mesquite99 v3.61 and visualized using FigTree v1.4.4. Mean within- and between-group p-distances for masked (10% gaps masked) DXZ4 repeat unit alignments were calculated using Mega-X100 v10.0.5. To compare the rate of DXZ4 repeat evolution to the remainder of the genome, we created a multiple-sequence alignment (MSA) with the domestic cat genome (Fca126) and Geoffroy’s cat aligned to the tiger SHA reference. The alignment was passed to Tree House Explorer (v1.0.2)101 where the THExBuilder function was used to calculate p-distances in 10kb windows with a strict missing data threshold of 0.0.
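The mismatch-tolerant motif annotation described above can be sketched as a sliding Hamming-distance scan. This is an illustrative stand-in for the Geneious Annotate & Predict tool, not its actual algorithm: it searches the forward strand only and allows substitutions but no indels. The function names and the toy sequence are assumptions; the motif and the mismatch limit of 13 are taken from the text.

```python
CTCF_MOTIF = "GAGTTTCGCTTGATGGCAGTGTTGCACCACGAAT"  # 34-bp consensus from the text


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_motif(sequence, motif=CTCF_MOTIF, max_mismatch=13):
    """Return (start, mismatches) for each window within max_mismatch of the motif.

    Forward strand only; a full annotation would also scan the reverse complement.
    """
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        distance = hamming(sequence[start:start + width], motif)
        if distance <= max_mismatch:
            hits.append((start, distance))
    return hits


# Toy example (hypothetical sequence): a motif copy with two substitutions,
# embedded between short flanks.
variant = "T" + CTCF_MOTIF[1:5] + "A" + CTCF_MOTIF[6:]
toy_sequence = "A" * 10 + variant + "C" * 10
hits = scan_motif(toy_sequence)
print(hits)
```

A permissive threshold such as 13 of 34 positions can also admit overlapping or spurious windows, so in practice hits would be checked against the expected position within each repeat unit, as the text describes.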
Comparative genomic analyses of DXZ4 were assessed with contiguous long-read genome assemblies from all placental mammal superordinal clades102 downloaded from NCBI. We chose male assemblies, where available, due to their single chrX haplotype. Reference gene annotations for PLS3 and AGTR2 were used with Liftoff to identify the location of DXZ496. Centromere positions were identified using a combination of NCBI annotations, interspecific alignments, and the Atlas of Mammalian Chromosomes, 2nd Edition103. Dot-plots were generated using FlexiDot. We determined the presence/absence of DXZ4 based on the presence of repeat structure, CTCF binding motifs, and location relative to PLS3 and AGTR2. Human, cat, and pig DXZ4 repeat monomers were also queried against the X-chromosome using the discontiguous-megablast BLAST algorithm.
Repetitive Landscape, Centromere Annotation and Analysis
Repeats
Repeats in each of the genomes were masked using RepeatMasker (RepeatMasker-4.1.2-p1; Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-4.0. 2013–2015 <http://www.repeatmasker.org>.) with the Dfam3.5+RepBase (rbrm-20181026) libraries. RepeatMasker was configured to use trf 4.09.1 for identifying tandem repeats and rmblastn (2.10.0+) to generate alignments, and was called with the -species cat option to mask using the cat-specific libraries. All repeats identified in that RepeatMasker run using the standard cat libraries were then masked as Ns, and RepeatModeler2 104 (RepeatModeler 2 v2.0.2a; rmblast 2.10.0+; TRF 4.10, RECON, RepeatScout 1.0.6, RepeatMasker 4.1.2-p1; LTR Structural Analysis: Enabled (GenomeTools 1.6.2, LTR_Retriever v2.9.0, Ninja 1.10.2, MAFFT 7.453, CD-HIT 4.8.1)) was used to model additional repetitive elements with the LTRstruct option enabled (LTR_retriever v2.9.0 configured to use rmblast2.10.+; RepeatMasker; hmmer3.3.2; cdhit4.8.1). All newly identified repeats were masked, RepeatModeler2 was run again, and the genomes were N-masked. Finally, to be certain the centromeres had been fully sampled, centromere regions from the final N-masked genomes were used as the input to RepeatModeler to create a final set of repeat models that were added to the Dfam3.5 + RepBase libraries and the two previous rounds of RepeatModeler output. The consensus sequences from the three RepeatModeler rounds were extended when the full repeat was not modeled, trimmed when the repeat model ran into a neighboring exon, and then concatenated, and redundancy was removed.
Segmental duplications
Before identifying segmental duplications, repetitive elements identified using the RepeatMasker/RepeatModeler approach described above, as well as tandem repeats identified by GRM105 and ULTRA106 (version 0.99.17ultra; using period=10, period=100 and period=4000), were masked. Segmental duplications were defined using SEDEF28 with default parameters.
Centromeres
Initial outer bounds for the centromere region of each chromosome were defined by aligning known bounding markers107 against each cat genome using blat108. The location was further refined by identifying human/cat synteny breakpoints: each cat genome was aligned to the human genome (GCA_000001405.27_GRCh38.p12_genomic.fna) using nucmer79 with default parameters, and alignments were then filtered using a 70% identity filter (delta-filter -i 70). Zoo-FISH data show that many felid chromosome arms are painted by separate human chromosomes, hence synteny breaks should define centromeric regions109. Reciprocal best alignments were extracted (show-coords -cT) and human/cat breakpoints were identified. To identify the centromere boundary, beginning at the human/cat alignment breakpoint, we moved into the centromere, analyzing the density of Unknown+Satellite repeats (see RepeatMasker methods) in 25kb windows in 1kb steps. When that repeat density exceeded 0.3, we stepped “back” to the base of the repeat density peak. To identify the position at which there was a significant change in the Unknown+Satellite repeat density, we identified the change point with a probability of at least 0.75110. From that point, we again walked “away” from the centromere using a window size of 1.5kb on the density of all repetitive elements that were enriched >500x within the centromere, to incorporate any missed elements (density >0.25) within 30kb, and to incorporate missed tandem repeats (repeat unit sizes 100 to 4000, window size = 5k; density >0.20). Finally, we checked that each boundary fell between, and not within, predicted genes.
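The first step of the boundary search, computing repeat density in 25kb windows with 1kb steps and finding where it first exceeds 0.3, can be sketched as follows. This is a simplified illustration under stated assumptions: it scans in increasing coordinate order rather than walking inward from a synteny breakpoint, and it omits the change-point analysis and the later extension steps. The function names and the toy interval are hypothetical.

```python
from itertools import accumulate


def window_density(intervals, chrom_len, window=25_000, step=1_000):
    """Fraction of each sliding window covered by the given repeat intervals.

    intervals: iterable of half-open (start, end) repeat coordinates.
    Returns a list of (window_start, density).
    """
    mask = bytearray(chrom_len)  # 1 where a base is annotated as repeat
    for start, end in intervals:
        start, end = max(start, 0), min(end, chrom_len)
        if end > start:
            mask[start:end] = b"\x01" * (end - start)
    prefix = [0] + list(accumulate(mask))  # prefix sums for O(1) window counts
    densities = []
    for w_start in range(0, chrom_len - window + 1, step):
        covered = prefix[w_start + window] - prefix[w_start]
        densities.append((w_start, covered / window))
    return densities


def first_window_above(densities, threshold=0.3):
    """Start of the first window whose density exceeds the threshold, else None."""
    for w_start, density in densities:
        if density > threshold:
            return w_start
    return None


# Toy chromosome (hypothetical): 100 kb with one satellite block at 60-90 kb.
densities = window_density([(60_000, 90_000)], chrom_len=100_000)
boundary_window = first_window_above(densities)
print(boundary_window)
```

Using prefix sums keeps the scan linear in chromosome length, which matters when the same pass is repeated for several repeat classes and window sizes.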
Sensory Receptor Annotation and Analysis
To identify both olfactory receptor (ORG) and vomeronasal receptor (V1R) genes, we combined the BLAST86 and Olfactory Receptor Assigner (ORA)53 algorithms into a single workflow. Initially, genomic regions containing putative sequences were identified by mapping a set of mammal-annotated ORG and V1R sequences, available via RefSeq, to each genome using blastn. A minimum of 85% sequence identity and 200bp covered per hit was used to highlight potential sensory gene sequences and exclude non-specific GPCR-like regions. Genomic regions for each hit were extracted with an additional 500bp up and downstream to ensure start and stop codons were included. ORA uses a set of reference profile hidden Markov models (HMMs) to annotate ORG/V1R genes for each region extracted. Profile HMMs specific to V1Rs were generated using HMMER3111. ORG/V1R genes were classified as non-functional if they contained an in-frame stop codon or if they were less than 650bp in length (i.e., not long enough to complete the seven-transmembrane domain). Identified ORG/V1R sequences were mapped to the original RefSeq data to confirm they were definitive sensory genes. All ORG/V1R genes were mapped (blastn) between felid genomes to ensure no sequences were missing between species. ORA was used to classify all ORG and V1R genes into 13 subfamilies (OR1/3/7, OR2/13, OR4, OR5/8/9, OR6, OR10, OR11, OR12, OR14, OR51, OR52, OR55, OR56) and eight subfamilies (V1R1, V1R2, V1R3, V1R4, V1R5, V1R48, V1R90, V1R100), respectively.
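The functional/non-functional rule stated above (an in-frame stop codon, or a length under 650bp) can be expressed as a short sketch. It assumes the input is a coding sequence already in frame from its start codon, which in the actual workflow is determined by ORA's profile HMMs; the function names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_LENGTH_BP = 650  # shorter sequences cannot span the seven-transmembrane domain


def has_inframe_stop(cds):
    """True if a stop codon occurs in frame before the final codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # A stop as the last codon is the normal terminator, not a premature stop.
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return any(codon in STOP_CODONS for codon in codons)


def classify_receptor(cds):
    """Label a receptor coding sequence using the two criteria from the text."""
    cds = cds.upper()
    if len(cds) < MIN_LENGTH_BP or has_inframe_stop(cds):
        return "non-functional"
    return "functional"


# Toy sequences (hypothetical), assumed to be in frame from the start codon.
intact = "ATG" + "GCT" * 300 + "TAA"                       # 906 bp, no early stop
truncated = "ATG" + "GCT" * 100 + "TAA"                    # 306 bp, too short
premature = "ATG" + "GCT" * 100 + "TGA" + "GCT" * 200 + "TAA"  # in-frame stop

labels = [classify_receptor(s) for s in (intact, truncated, premature)]
print(labels)
```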
Maximum likelihood (ML) gene trees per gene family per chromosome were inferred using IQTREE v.1.6.12(GTR+I+G)112 based on multiple sequence alignments generated with Clustal Omega113. The number of lineage-specific gene duplication events per species was estimated using Notung114. Additionally, by splitting gene trees into all possible subtrees via the ‘ape’ package in R115, gene presence/absence per subtree was used to characterize putative one-to-one orthologs across species. Ambiguous orthologous relationships were further resolved using both genomic coordinates and blast hits. To determine if lineage-specific ORG/V1R gene duplications consisted of only the specific receptor gene or represented the duplication of a larger chromosomal region, 1000bps both up and downstream of each sequence was extracted and analyzed for segmental duplications as described above.
Tiger Recombination Map
Publicly available short-read data for 4 individual Panthera tigris jacksonii (SRR7152390, SRR7152389, SRR7152391, SRR715294) were trimmed, filtered, and mapped to the Panthera tigris (P.tigris_Pti1_mat1.1) reference genome. Mapping results were evaluated and summarized using the Qualimap function bamqc116. Samtools117 was used to remove duplicate reads. Base quality score recalibration (BQSR) was performed using GATK118,119 by generating an initial reference set of SNPs from the dataset itself. Variants were then called, and all samples were jointly genotyped. Variants in repeatmasked regions were removed using GATK. Variants were further filtered in bcftools (https://github.com/samtools/bcftools) by removing those within 5bps of an indel and excluding sites matching the expression -e '%QUAL<30 | INFO/DP<16 | INFO/DP>62 | QD<2 | FS>60 | SOR>10 | ReadPosRankSum<-8 | MQRankSum<-12.5 | MQ<40'. VCFtools (https://vcftools.github.io/man_latest.html) was used to remove indels, leaving 3,067,994 biallelic SNPs for further analysis. ReLERNN v.1.0.0, a deep learning approach that uses recurrent neural networks, was used to model the genome-wide recombination rate120. A mutation rate121 of 2.2e-9 was used. ReLERNN was run using the simulate, train, predict and bscorrect modules with default settings. Inferred recombination rates were averaged in 2Mb blocks in 50kb sliding windows.
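The final smoothing step, averaging inferred rates in 2Mb blocks that slide in 50kb increments, can be sketched as below. The input format (start, end, rate per window) and the choice to assign each window to a block by its midpoint with an unweighted mean are assumptions made for illustration; the text does not specify how windows straddling block edges were handled. The function name and the toy rates are hypothetical.

```python
def sliding_block_means(windows, chrom_len, block=2_000_000, step=50_000):
    """Average per-window rates in overlapping blocks.

    windows: list of (start, end, rate) tuples, e.g. per-window rate estimates.
    A window contributes to a block if its midpoint lies inside the block
    (an assumption of this sketch). Returns a list of (block_start, mean_rate),
    with None where a block contains no windows.
    """
    midpoints = [((start + end) / 2, rate) for start, end, rate in windows]
    results = []
    for block_start in range(0, max(chrom_len - block, 0) + 1, step):
        block_end = block_start + block
        rates = [rate for mid, rate in midpoints if block_start <= mid < block_end]
        results.append((block_start, sum(rates) / len(rates) if rates else None))
    return results


# Toy data (hypothetical): eight 500-kb windows on a 4-Mb chromosome, with a
# low-recombination half followed by a high-recombination half.
toy_windows = [(i * 500_000, (i + 1) * 500_000, 1.0 if i < 4 else 3.0)
               for i in range(8)]
block_means = dict(sliding_block_means(toy_windows, chrom_len=4_000_000))
print(block_means[0], block_means[1_000_000], block_means[2_000_000])
```

Because consecutive blocks share most of their windows, the smoothed profile changes gradually across the transition, which is the intended effect when looking for large recombination deserts rather than fine-scale hotspots.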
Structural Variant/Inversion Identification and Analysis
Initial Inversion Call Set Detection with PAV
An initial variant call set was generated using PAV122 (GitHub commit: 24efbea) with minimap2 (v2.24)123 parameters “-x asm20 --secondary=no -a -t {params.cpu} --eqx -Y -B 2 -z 10000,50 --end-bonus=100” and PAV configuration settings “inv_region_limit: 3000000”, henceforth referred to as the PAV-mm2 call set. The “sv_inv.bed.gz” bed files containing inversion calls for each sample were then used for downstream filtration and validation. As an additional line of validation, we also ran PAV using Long-Read Aligner (LRA) (v 1.3.2)124 with parameters “-CONTIG -p s -t”. The resulting “sv_inv.bed.gz” inversion bed file was used for validation of the PAV-mm2 initial call set. Inversions overlapping collapsed segmental duplications identified by SDA127 were removed from the analysis.
PBSV
CLR reads were mapped to the Geoffroy’s cat reference assembly (O.geoffroyi_Oge1_pat1.0) using pbmm2 (v1.9.0) with the parameters “--sort --median-filter”. The variant call set was generated using PBSV (v2.8.0) by first identifying signatures of structural variants using the discover command “pbsv discover --tandem-repeats tandem_repeats.bed <input.bam> <output.svsig.gz>” where tandem repeats were identified by GRM and ULTRA. Variants were then called using the call command “pbsv call <reference.fasta> <output.svsig.gz> <output.vcf>”.
Sniffles
CLR reads were mapped to the Geoffroy’s cat reference assembly (O.geoffroyi_Oge1_pat1.0) using pbmm2 (v1.9.0) with the parameters “--sort --median-filter”. Variants were then called using Sniffles (v2.0.7)125,126 with parameters “-t <cpu_count> -i <input.bam> -v <output.sniffles.vcf> --tandem-repeats <reference-repeats.bed>”.
Long-read Mapping-based Call Set Filtration
Call sets from PAV-LRA, PBSV, and Sniffles were used to filter the initial PAV-mm2 call set by removing variants that were not supported by at least one of the three additional variant call sets. We utilized BEDTools (v2.30.0)85 to call inversion variants with a 50% reciprocal overlap. Inversions identified on unplaced scaffolds were excluded. We identified large inversions (>500kbp) not called by PAV with SafFire (https://github.com/mrvollger/SafFire). Input paf files were generated by mapping each assembly to the Geoffroy’s cat reference assembly (O.geoffroyi_Oge1_pat1.0) with minimap2 (v2.24) with parameters “-x asm20 -t <cpu_count> -c --eqx” and then rustybam (https://github.com/mrvollger/rustybam - bioconda v0.1.31) parameters “rb trim-paf sample.paf | rb break-paf --max-size 5000 | rb orient | rb filter --paired-len 100000 | rb stats --paf > sample.SafFire.bed”. Inversions greater than 500kbp were called if supported by both SafFire and Nucmer79 based dot plots.
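The support filter described above, retaining a PAV-mm2 inversion only if at least one other call set contains a call with 50% reciprocal overlap, can be sketched in plain Python. This is an illustration of the criterion, not the BEDTools commands actually run; the function names, the half-open (chrom, start, end) tuple format, and the toy coordinates are assumptions.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b overlap by at least min_frac of BOTH lengths.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    a_chrom, a_start, a_end = a
    b_chrom, b_start, b_end = b
    if a_chrom != b_chrom:
        return False
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a_end - a_start)
            and overlap >= min_frac * (b_end - b_start))


def filter_supported(primary_calls, support_sets, min_frac=0.5):
    """Keep primary calls matched by at least one call in any supporting set."""
    return [call for call in primary_calls
            if any(reciprocal_overlap(call, other, min_frac)
                   for calls in support_sets for other in calls)]


# Toy call sets (hypothetical coordinates).
pav_mm2 = [("chrA1", 1000, 2000), ("chrA1", 5000, 9000), ("chrX", 100, 200)]
pav_lra = [("chrA1", 1200, 2100)]   # overlaps the first call by 80% / 89%
pbsv = [("chrA1", 5000, 6000)]      # covers only 25% of the second call
sniffles = [("chrB1", 100, 200)]    # same coordinates, different chromosome

supported = filter_supported(pav_mm2, [pav_lra, pbsv, sniffles])
print(supported)
```

Requiring the overlap fraction on both intervals prevents a small call nested inside a much larger one from counting as support, which is why the second toy call is dropped.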
Short-read genotyping and inversion classification
Pangenie (v1.0.1)128 was used to classify inversions as species/lineage-specific, paraphyletic with breakpoint use, or polymorphic. Paired-end Illumina datasets for the lion (n=14), tiger (n=14), domestic cat (n=10), and Asian leopard cat (n=10) were downloaded from NCBI’s SRA database and interleaved utilizing Seqkit’s (v0.16.0)129 concat function. The interleaved FASTQ files and fully-phased VCF files were then passed to Pangenie using the parameters “-u -s <sample_name> -o <sample_name> -i <sample_interleaved_fastq> -r <reference_assembly> -v <fully_phased_PAV_inversions.vcf>”. We could not genotype Geoffroy’s cat-specific inversions using Illumina data; these were instead called if supported by inverted alignments to all query species. An initial phylogenetic matrix was constructed by merging inversions across all samples based on 50% reciprocal overlap (calculated by pybedtools v0.9.0)85,130.
Annotation of SV-overlapping/containing SD’s, Gaps, Genes, and Repetitive Elements
pybedtools (v0.9.0) was used to intersect the breakpoint positions of the inversions with the coordinates of SDs, gaps, genes, and repetitive elements. SciPy’s (v1.7.3)131 ranksums function (one-sided, greater) was used to determine whether inversions flanked by SDs were significantly larger than inversions not flanked by SDs. Inversions flanked by repetitive elements sharing more than 90% identity were identified using pandas (v1.4.0). Repetitive elements within 100kb of the inversion breakpoints were aligned using biopython’s (v1.79)132 pairwise2.align.globalmx (upstream_seq, downstream_seq, 1, 0, score_only=True).
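With a match score of 1, a mismatch score of 0, and no gap penalties, the global alignment score requested from globalmx equals the length of the longest common subsequence of the two sequences. The dependency-free sketch below computes that score by dynamic programming to show what the number represents. Converting the score to percent identity by dividing by the longer sequence length is an assumption of this sketch, since the text does not state the denominator; the function names and toy sequences are hypothetical.

```python
def match_score(a, b):
    """Best global alignment score with match=1, mismatch=0, and free gaps.

    Under this scoring the result is the longest-common-subsequence length.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(max(prev[j],                 # gap in b
                            curr[j - 1],             # gap in a
                            prev[j - 1] + (x == y)))  # aligned pair
        prev = curr
    return prev[-1]


def identity(a, b):
    """Score divided by the longer sequence length (denominator is an assumption)."""
    longest = max(len(a), len(b))
    return match_score(a, b) / longest if longest else 0.0


def shares_identity(upstream_seq, downstream_seq, threshold=0.9):
    """True if the two flanking repeats exceed the identity threshold."""
    return identity(upstream_seq, downstream_seq) > threshold


# Toy flanking repeats (hypothetical sequences).
score = match_score("ACGTACGT", "ACGTTCGT")   # one substituted base
print(score, identity("ACGTACGT", "ACGTTCGT"))
```

Because gaps are free under this scoring, the score rewards shared order of bases rather than contiguous similarity, so long unrelated sequences can still reach moderate scores; a high threshold such as 90% compensates for this.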
Statistics and Reproducibility
The one-sided Wilcoxon rank sum test was used to determine differences in inversion sizes between the autosomes and X chromosomes. In this study, no statistical method was used to predetermine sample size, no data were excluded from the analyses, and the experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
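For readers unfamiliar with the test, the one-sided rank sum statistic can be computed as in the sketch below, which uses the normal approximation without a tie correction. This is an illustration of the statistic only: the study itself used SciPy, and an exact test may be preferable for very small samples. The function name and the toy inversion sizes are hypothetical.

```python
import math


def ranksum_greater(x, y):
    """One-sided Wilcoxon rank sum test (alternative: x tends to be larger than y).

    Returns (z, p) using the normal approximation; tied values get average ranks.
    """
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p


# Toy inversion sizes in kb (hypothetical values, not data from this study).
x_linked = [10, 11, 12]
autosomal = [1, 2, 3]
z, p = ranksum_greater(x_linked, autosomal)
print(round(z, 3), round(p, 4))
```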
Supplementary Material
Acknowledgements
We thank the High-Performance Research Computing center at Texas A&M University for support. DNA library preparation and long-read sequencing were performed at the University of Maryland Institute for Genome Sciences (Luke Tallon, Lisa Sadzewicz). Illumina sequencing was performed through the Texas A&M Institute for Genome Sciences & Society research core (Andrew Hillhouse). We thank Dr. Roscoe Stanyon for the flow-sorted domestic cat chromosomes for FISH experiments. This research was supported by grants from the Morris Animal Foundation (grant no. D19FE-04 to WJM and WCW), the National Science Foundation (grant no. DEB-1753760 to WJM), and the National Institutes of Health (grant no. R01 GM59290 to MAB). AJH was funded, in part, by a training grant from the National Institute of General Medical Sciences, NIH (T32 GM135115). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Ethics declarations
Competing interests: The authors declare no competing interests.
Code Availability Statement
Publicly available software and packages were used in this study. No custom code was used. All software and packages used in this study are described within the Methods section.
Data Availability
Assemblies are available in NCBI under accession numbers GCA_016509475.2, GCA_016509815.2, GCA_018350155.1, GCA_018350175.1, GCA_018350195.2, GCA_018350215.1. OR gene sequences and DXZ4 alignments are found at: https://figshare.com/s/68266360874d5078bdf5
References
- 1. Kronenberg ZN et al. High-resolution comparative analysis of great ape genomes. Science 360, (2018).
- 2. Rhie A et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
- 3. Nurk S et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
- 4. Miga KH et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79–84 (2020).
- 5. Logsdon GA et al. The structure, function and evolution of a complete human chromosome 8. Nature 593, 101–107 (2021).
- 6. Sedlazeck FJ, Lee H, Darby CA & Schatz MC Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat. Rev. Genet. 19, 329–346 (2018).
- 7. Ahmad SF et al. Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics. Cells 9, 2714 (2020).
- 8. Johnson WE et al. The late Miocene radiation of modern Felidae: a genetic assessment. Science 311, 73–77 (2006).
- 9. Li G, Davis BW, Eizirik E & Murphy WJ Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae). Genome Res. 26, 1–11 (2016).
- 10. Li G, Figueiró HV, Eizirik E & Murphy WJ Recombination-Aware Phylogenomics Reveals the Structured Genomic Landscape of Hybridizing Cat Species. Mol. Biol. Evol. 36, 2111–2126 (2019).
- 11. Dobrynin P et al. Genomic legacy of the African cheetah, Acinonyx jubatus. Genome Biol. 16, 277 (2015).
- 12. Abascal F et al. Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx. Genome Biol. 17, 251 (2016).
- 13. Johnson WE et al. Genetic restoration of the Florida panther. Science 329, 1641–1645 (2010).
- 14. Montague MJ et al. Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proc. Natl. Acad. Sci. U. S. A. 111, 17230–17235 (2014).
- 15. Koren S et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. (2018) doi: 10.1038/nbt.4277.
- 16. Cho YS et al. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat. Commun. 4, 2433 (2013).
- 17. Buckley RM et al. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. PLoS Genet. 16, e1008926 (2020).
- 18. Bredemeyer KR et al. Ultracontinuous single haplotype genome assemblies for the domestic cat (Felis catus) and Asian leopard cat (Prionailurus bengalensis). J. Hered. 112, 165–173 (2021).
- 19.Meyne J, Ratliff RL & Moyzis RK Conservation of the human telomere sequence (TTAGGG)n among vertebrates. Proc. Natl. Acad. Sci. U. S. A. 86, 7049–7053 (1989). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Peska V & Garcia S Origin, Diversity, and Evolution of Telomere Sequences in Plants. Front. Plant Sci. 11, 117 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Fanning TG Origin and evolution of a major feline satellite DNA. J. Mol. Biol. 197, 627–634 (1987). [DOI] [PubMed] [Google Scholar]
- 22.Santos S, Chaves R & Guedes-Pinto H Chromosomal localization of the major satellite DNA family (FA-SAT) in the domestic cat. Cytogenet. Genome Res. 107, 119–122 (2004). [DOI] [PubMed] [Google Scholar]
- 23.Wurster-Hill DH, & Centerwall WR The interrelationships of chromosome banding patterns in canids, mustelids, hyena, and felids. Cytogenet. Cell Genet. 34, 178–192 (1982). [DOI] [PubMed] [Google Scholar]
- 24. Bailey JA, Baertsch R, Kent WJ, Haussler D & Eichler EE Hotspots of mammalian chromosomal evolution. Genome Biol. 5, R23 (2004).
- 25. Marques-Bonet T, Ryder OA & Eichler EE Sequencing primate genomes: what have we learned? Annu. Rev. Genomics Hum. Genet. 10, 355–386 (2009).
- 26. Cantsilieris S et al. An evolutionary driver of interspersed segmental duplications in primates. Genome Biol. 21, 202 (2020).
- 27. Mao Y et al. A high-quality bonobo genome refines the analysis of hominid evolution. Nature 594, 77–81 (2021).
- 28. Numanagic I et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34, i706–i714 (2018).
- 29. Vollger MR et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965 (2022).
- 30. Charlesworth D & Charlesworth B Sex chromosomes: evolution of the weird and wonderful. Curr. Biol. 15, R129–31 (2005).
- 31. Larson EL, Keeble S, Vanderpool D, Dean MD & Good JM The Composite Regulatory Basis of the Large X-Effect in Mouse Speciation. Mol. Biol. Evol. 34, 282–295 (2017).
- 32. Charlesworth B, Coyne JA & Barton NH The Relative Rates of Evolution of Sex Chromosomes and Autosomes. Am. Nat. 130, 113–146 (1987).
- 33. Cheng C & Kirkpatrick M Inversions are bigger on the X chromosome. Mol. Ecol. 28, 1238–1245 (2019).
- 34. Figueiró HV et al. Genome-wide signatures of complex introgression and adaptive evolution in the big cats. Sci. Adv. 3, e1700299 (2017).
- 35. Ferree PM & Prasad S How can satellite DNA divergence cause reproductive isolation? Let us count the chromosomal ways. Genet. Res. Int. 2012, 430136 (2012).
- 36. Bayes JJ & Malik HS Altered heterochromatin binding by a hybrid sterility protein in Drosophila sibling species. Science 326, 1538–1541 (2009).
- 37. Bredemeyer KR et al. Rapid Macrosatellite Evolution Promotes X-Linked Hybrid Male Sterility in a Feline Interspecies Cross. Mol. Biol. Evol. 38, 5588–5609 (2021).
- 38. Figueroa DM, Darrow EM & Chadwick BP Two novel DXZ4-associated long noncoding RNAs show developmental changes in expression coincident with heterochromatin formation at the human (Homo sapiens) macrosatellite repeat. Chromosome Res. 23, 733–752 (2015).
- 39. Dossin F & Heard E The Molecular and Nuclear Dynamics of X-Chromosome Inactivation. Cold Spring Harb. Perspect. Biol. 14, (2022).
- 40. Bonora G et al. Orientation-dependent Dxz4 contacts shape the 3D structure of the inactive X chromosome. Nat. Commun. 9, 1445 (2018).
- 41. Vollger MR, Kerpedjiev P, Phillippy AM & Eichler EE StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
- 42. Horakova AH et al. The mouse DXZ4 homolog retains Ctcf binding and proximity to Pls3 despite substantial organizational differences compared to the primate macrosatellite. Genome Biol. 13, R70 (2012).
- 43. Froberg JE, Pinter SF, Kriz AJ, Jégu T & Lee JT Megadomains and superloops form dynamically but are dispensable for X-chromosome inactivation and gene escape. Nat. Commun. 9, 5004 (2018).
- 44. Brashear WA, Bredemeyer KR & Murphy WJ Genomic architecture constrained placental mammal X Chromosome evolution. Genome Res. 31, 1353–1365 (2021).
- 45. Andergassen D et al. In vivo Firre and Dxz4 deletion elucidates roles for autosomal gene regulation. Elife 8, (2019).
- 46. Abe H et al. Active DNA damage response signaling initiates and maintains meiotic sex chromosome inactivation. Nat. Commun. 13, 7212 (2022).
- 47. Abe H et al. The Initiation of Meiotic Sex Chromosome Inactivation Sequesters DNA Damage Signaling from Autosomes in Mouse Spermatogenesis. Curr. Biol. 30, 408–420.e5 (2020).
- 48. Altemose N et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
- 49. Carbone L et al. Evolutionary movement of centromeres in horse, donkey, and zebra. Genomics 87, 777–782 (2006).
- 50. Raudsepp T, Finno CJ, Bellone RR & Petersen JL Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Anim. Genet. 50, 569–597 (2019).
- 51. Henikoff S, Ahmad K & Malik HS The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001).
- 52. Young JM & Trask BJ The sense of smell: genomics of vertebrate odorant receptors. Hum. Mol. Genet. 11, 1153–1160 (2002).
- 53. Hayden S et al. Ecological adaptation determines functional mammalian olfactory subgenomes. Genome Res. 20, 1–9 (2010).
- 54. Hughes GM et al. The birth and death of olfactory receptor gene families in mammalian niche adaptation. Mol. Biol. Evol. 35, 1390–1406 (2018).
- 55. Carroll RA et al. A novel fishing cat reference genome for the evaluation of potential germline risk variants. bioRxiv (2022) doi: 10.1101/2022.11.17.516921.
- 56. Sunquist M & Sunquist F Wild Cats of the World. (University of Chicago Press, 2017).
- 57. Nel JAJ Handbook of the Mammals of the World, Vol. 1: Carnivores, edited by Wilson DE & Mittermeier RA Lynx Edicions, Barcelona. 2009.
- 58. Dunkel A et al. Nature’s Chemical Signatures in Human Olfaction: A Foodborne Perspective for Future Biotechnology. Angew. Chem. Int. Ed. 53, 7124–7143 (2014).
- 59. Moran Y, Barzilai MG, Liebeskind BJ & Zakon HH Evolution of voltage-gated ion channels at the emergence of Metazoa. J. Exp. Biol. 218, 515–525 (2015).
- 60. Nei M & Rooney AP Concerted and birth-and-death evolution of multigene families. Annu. Rev. Genet. 39, 121–152 (2005).
- 61. Zhao J, Teufel AI, Liberles DA & Liu L A generalized birth and death process for modeling the fates of gene duplication. BMC Evol. Biol. 15, 275 (2015).
- 62. Newman T & Trask BJ Complex evolution of 7E olfactory receptor genes in segmental duplications. Genome Res. 13, 781–793 (2003).
- 63. Niimura Y, Matsui A & Touhara K Corrigendum: Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Res. 25, 926 (2015).
- 64. Soso SB & Koziel JA Characterizing the scent and chemical composition of Panthera leo marking fluid using solid-phase microextraction and multidimensional gas chromatography–mass spectrometry-olfactometry. Sci. Rep. 7, 1–15 (2017).
- 65. Nosil P & Feder JL Genomic divergence during speciation: causes and consequences. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 332–342 (2012).
- 66. Miga KH & Sullivan BA Expanding studies of chromosome structure and function in the era of T2T genomics. Hum. Mol. Genet. 30, R198–R205 (2021).
- 67. Wold J et al. Expanding the conservation genomics toolbox: Incorporating structural variants to enhance genomic studies for species of conservation concern. Mol. Ecol. 30, 5949–5965 (2021).
- 68. Formenti G et al. The era of reference genomes in conservation genomics. Trends Ecol. Evol. 37, 197–202 (2022).
- 69. Mérot C, Oomen RA, Tigano A & Wellenreuther M A Roadmap for Understanding the Evolutionary Significance of Structural Genomic Variation. Trends Ecol. Evol. 35, 561–572 (2020).
- 70. Menotti-Raymond M et al. A genetic linkage map of microsatellites in the domestic cat (Felis catus). Genomics 57, 9–23 (1999).
- 71. Menotti-Raymond M et al. Second-generation integrated genetic linkage/radiation hybrid maps of the domestic cat (Felis catus). J. Hered. 94, 95–106 (2003).
- 72. Miller SA, Dykes DD & Polesky HF A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 16, 1215 (1988).
- 73. Ramani V et al. Mapping 3D genome architecture through in situ DNase Hi-C. Nat. Protoc. 11, 2104–2121 (2016).
- 74. Andrews S et al. FastQC: a quality control tool for high throughput sequence data. Preprint at (2010).
- 75. Koren S et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
- 76. Hu J, Fan J, Sun Z & Liu S NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
- 77. Mikheenko A, Prjibelski A, Saveliev V, Antipov D & Gurevich A Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150 (2018).
- 78. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV & Zdobnov EM BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
- 79. Marçais G et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
- 80. Murphy WJ et al. Novel gene acquisition on carnivore Y chromosomes. PLoS Genet. 2, e43 (2006).
- 81. Li G et al. Comparative analysis of mammalian Y chromosomes illuminates ancestral structure and lineage-specific evolution. Genome Res. 23, 1486–1495 (2013).
- 82. Brashear WA, Raudsepp T & Murphy WJ Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation. Genome Res. 28, 1841–1851 (2018).
- 83. Armstrong EE et al. Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long-read data. BMC Biol. 18, 3 (2020).
- 84. Li H & Durbin R Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- 85. Quinlan AR & Hall IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
- 86. Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
- 87. Smit AFA, Hubley R & Green P RepeatMasker Open-4.0. 2013–2015. Preprint at (2015).
- 88. Rice ES et al. Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle. Gigascience 9, (2020).
- 89. Ghurye J, Pop M, Koren S, Bickhart D & Chin C-S Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527 (2017).
- 90. Ghurye J et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput. Biol. 15, e1007273 (2019).
- 91. Dudchenko O et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
- 92. Alonge M et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).
- 93. Durand NC et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
- 94. Robinson JT et al. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 6, 256–258.e1 (2018).
- 95. Seibt KM, Schmidt T & Heitkam T FlexiDot: highly customizable, ambiguity-aware dotplots for visual sequence analyses. Bioinformatics 34, 3575–3577 (2018).
- 96. Horakova AH, Moseley SC, McLaughlin CR, Tremblay DC & Chadwick BP The macrosatellite DXZ4 mediates CTCF-dependent long-range intrachromosomal interactions on the human inactive X chromosome. Hum. Mol. Genet. 21, 4367–4377 (2012).
- 97. Chadwick BP DXZ4 chromatin adopts an opposing conformation to that of the surrounding chromosome and acquires a novel inactive X-specific role involving CTCF and antisense transcripts. Genome Res. 18, 1259–1269 (2008).
- 98. Stamatakis A RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
- 99. Maddison WP & Maddison DR Mesquite: a modular system for evolutionary analysis, v. 3.61. See http://mesquiteproject.org.
- 100. Kumar S, Stecher G, Li M, Knyaz C & Tamura K MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
- 101. Harris AJ, Foley NM, Williams TL & Murphy WJ Tree House Explorer: A Novel Genome Browser for Phylogenomics. Mol. Biol. Evol. 39, (2022).
- 102. Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J & Springer MS Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu. Rev. Anim. Biosci. 9, 29–53 (2021).
- 103. O’Brien SJ, Graphodatsky AS & Perelman PL Atlas of Mammalian Chromosomes. (John Wiley & Sons, 2020).
- 104. Flynn JM et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. U. S. A. 117, 9451–9457 (2020).
- 105. Vlahovic I et al. Global repeat map algorithm (GRM) reveals differences in alpha satellite number of tandem and higher order repeats (HORs) in human, Neanderthal and chimpanzee genomes – novel tandem repeat database. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) Preprint at 10.23919/mipro48935.2020.9245278 (2020).
- 106. Olson D & Wheeler T ULTRA: A Model Based Tool to Detect Tandem Repeats. ACM BCB 2018, 37–46 (2018).
- 107. Davis BW et al. A high-resolution cat radiation hybrid and integrated FISH mapping resource for phylogenomic studies across Felidae. Genomics 93, 299–304 (2009).
- 108. Kent WJ BLAT—The BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002).
- 109. Murphy WJ et al. A radiation hybrid map of the cat genome: implications for comparative mapping. Genome Res. 10, 691–702 (2000).
- 110. Erdman C & Emerson JW bcp: A Package for Performing a Bayesian Analysis of Change Point Problems, R package version 1.8.4, URL (http://CRAN.R-project.org/). J. Korea Water Resour. Assoc.
- 111. Eddy SR Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011).
- 112. Nguyen L-T, Schmidt HA, von Haeseler A & Minh BQ IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
- 113. Sievers F et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
- 114. Chen K, Durand D & Farach-Colton M NOTUNG: a program for dating gene duplications and optimizing gene family trees. J. Comput. Biol. 7, 429–447 (2000).
- 115. Paradis E & Schliep K ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
- 116. García-Alcalde F et al. Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics 28, 2678–2679 (2012).
- 117. Li H et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
- 118. Van der Auwera GA et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
- 119. McKenna A et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
- 120. Adrion JR, Galloway JG & Kern AD Predicting the landscape of recombination using deep learning. Mol. Biol. Evol. 37, 1790–1808 (2020).
- 121. Kumar S & Subramanian S Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. U. S. A. 99, 803–808 (2002).
- 122. Ebert P et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
- 123. Li H Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
- 124. Ren J & Chaisson MJP lra: A long read aligner for sequences and contigs. PLoS Comput. Biol. 17, e1009078 (2021).
- 125. Smolka M et al. Comprehensive Structural Variant Detection: From Mosaic to Population-Level. bioRxiv 2022.04.04.487055 (2022).
- 126. Sedlazeck FJ et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
- 127. Vollger MR et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
- 128. Ebler J et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54, 518–525 (2022).
- 129. Shen W, Le S, Li Y & Hu F SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One 11, e0163962 (2016).
- 130. Dale RK, Pedersen BS & Quinlan AR Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
- 131. Virtanen P et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
- 132. Cock PJA et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
- 133. Lovell JT et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. Elife 11, (2022).
- 134. Li G et al. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3 6, 1607–1616 (2016).
- 135. Jégu T, Aeby E & Lee JT The X chromosome in space. Nat. Rev. Genet. 18, 377–389 (2017).
Data Availability Statement
Assemblies are available in NCBI under accession numbers GCA_016509475.2, GCA_016509815.2, GCA_018350155.1, GCA_018350175.1, GCA_018350195.2, and GCA_018350215.1. OR gene sequences and DXZ4 alignments are available at https://figshare.com/s/68266360874d5078bdf5
