Abstract
Chromosomal inversions are an important form of structural variation that can affect recombination, chromosome structure and fitness. However, because inversions can be challenging to detect, the prevalence and hence the significance of inversions segregating within species remains largely unknown, especially in natural populations of mammals. Here, by combining population-genomic and long-read sequencing analyses in a single, widespread species of deer mouse (Peromyscus maniculatus), we identified 21 polymorphic inversions that are large (1.5–43.8 Mb) and cause near-complete suppression of recombination when heterozygous (0–0.03 cM Mb⁻¹). We found that inversion breakpoints frequently occur in centromeric and telomeric regions and are often flanked by long inverted repeats (0.5–50 kb), suggesting that they probably arose via ectopic recombination. By genotyping inversions in populations across the species’ range, we found that the inversions are often widespread and do not harbour deleterious mutational loads, and many are likely to be maintained as polymorphisms by divergent selection. Comparisons of forest and prairie ecotypes of deer mice revealed 13 inversions that contribute to differentiation between populations, of which five exhibit significant associations with traits implicated in local adaptation. Taken together, these results show that inversion polymorphisms have a significant impact on recombination, genome structure and genetic diversity in deer mice and likely facilitate local adaptation across the widespread range of this species.
Subject terms: Evolutionary genetics, Genome evolution, Structural variation
The identification of 21 large inversion polymorphisms in populations of deer mice shows that they are widespread, important for patterns of recombination and likely to be involved in local adaptation.
Main
A longstanding goal in population genetics has been to quantify intraspecific genetic variation, which serves as the substrate for evolutionary change. Since Lewontin and Hubby first characterized protein sequence variation in Drosophila pseudoobscura in 1966, tremendous progress has been made in measuring levels of single nucleotide polymorphisms (SNPs) in a wide diversity of species1. However, the prevalence of structural genomic variation, a focus of cytogenetics, remains largely uncharacterized in the molecular era2. Chromosomal inversions, in particular, are an important form of structural variation: inversions can be large (affecting megabases of sequence)3 and have been implicated in local adaptation, including differentiation of annual and perennial ecotypes of monkeyflowers4, wing-pattern morphs of mimetic butterflies5 and mating types of ruffs6,7.
Inversions may play a key role in local adaptation because of their effects on recombination. When heterozygous, an inversion will suppress recombination with the noninverted arrangement and, as a result, can drastically increase linkage disequilibrium (LD) between the loci it carries. As such, inversions can act as ‘supergenes’8, linking multiple locally adaptive alleles together into coinherited haplotype blocks, which may be advantageous in the face of gene flow9–11. Although inversions have been identified across a diversity of species in the context of local adaptation, suggesting that beneficial inversions may be common3, few studies have performed unbiased scans across the genome for inversion polymorphisms (but see refs. 12–16), raising the question of whether adaptive inversions are the exception or the rule. Thus, characterizing the abundance of inversion polymorphisms—that is, inversions segregating within a species—is a critical step towards quantifying levels of intraspecific genetic variation and understanding how and why inversion polymorphisms are established and maintained.
Detecting inversion polymorphisms with molecular data has traditionally been challenging (for example, breakpoints often reside in highly repetitive regions)17, but recent advances in long-read sequencing and increased feasibility of population-level genome resequencing provide new, powerful approaches for identifying inversions18,19. Using these approaches, recent studies have revealed the abundance of inversion polymorphisms in a few species: for example, sunflowers harbour dozens of large (1–100 Mb) inversion polymorphisms16, and humans harbour, on average, hundreds of inversion polymorphisms that affect more DNA base pairs (bp) in total than SNPs12,13.
Here, we perform an unbiased genome-wide scan for inversion polymorphisms in the deer mouse, Peromyscus maniculatus. The deer mouse is the most abundant and widespread mammal in North America: it has large effective population sizes20,21 and a range spanning all major terrestrial habitats, including dense forests and open prairies22. Early cytogenetic work in deer mice identified at least 13 visible chromosomal rearrangements23,24. Returning to this system in the molecular age, we detect 21 large inversion polymorphisms segregating within deer mice (some of which are likely to overlap with rearrangements detected by cytogenetics). In localizing these inversions, we determine their positions relative to centromeres and telomeres, explore their effects on chromosome structure, characterize genomic content at their breakpoints and propose a mechanism by which inversions arise in this species. Further, we quantify the impact of the inversions on recombination and the resulting effects on mutational load. Finally, we survey the distributions of the inversions across the species range and identify several inversions that contribute to local adaptation. Taken together, these results reveal proximate and ultimate mechanisms involved in the establishment and maintenance of inversion polymorphisms and suggest a prominent role for these inversions in local adaptation.
Results
Identifying inversion polymorphisms
To identify putative inversion polymorphisms, we initially focused on five populations—four deer mouse (P. maniculatus) and one oldfield mouse (Peromyscus polionotus), which is nested within the P. maniculatus clade (Fig. 1a)—and performed whole-genome resequencing (15× coverage with Illumina short-read data) on 15 individuals per population. To identify patterns of genetic variation consistent with inversion polymorphisms, we first characterized local population structure within populations and between population pairs in 100 kb windows across the genome using local principal component analyses (PCA)25 and identified outlier regions (Fig. 1b and Extended Data Fig. 1; as described in refs. 16,26). We then focused on genomic regions for which the first principal component separated individuals into three clusters, probably representing the three possible inversion genotypes (Fig. 1c and Extended Data Fig. 1), with the central cluster having the highest heterozygosity, consistent with inversion heterozygotes (Fig. 1d and Extended Data Fig. 1).
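The three-cluster criterion described above can be illustrated with a minimal sketch. It assumes that per-sample PC1 scores and heterozygosity values for a candidate region are already available; the function names and the simple one-dimensional k-means used here are illustrative assumptions, not the clustering procedure used in the study.

```python
from statistics import mean

def cluster_pc1(pc1, n_iter=50):
    """Assign each sample to one of three clusters along PC1 (1-D k-means)."""
    lo, hi = min(pc1), max(pc1)
    centers = [lo, (lo + hi) / 2, hi]  # start at the two extremes and the midpoint
    labels = []
    for _ in range(n_iter):
        # Assign every sample to its nearest cluster centre.
        labels = [min(range(3), key=lambda k: abs(x - centers[k])) for x in pc1]
        updated = []
        for k in range(3):
            members = [x for x, lab in zip(pc1, labels) if lab == k]
            updated.append(mean(members) if members else centers[k])
        if updated == centers:  # converged
            break
        centers = updated
    return labels

def middle_cluster_is_most_heterozygous(pc1, het):
    """True if three non-empty clusters exist and the central one has the highest mean heterozygosity."""
    labels = cluster_pc1(pc1)
    groups = [[h for h, lab in zip(het, labels) if lab == k] for k in range(3)]
    if any(len(g) == 0 for g in groups):
        return False
    low, mid, high = (mean(g) for g in groups)
    return mid > low and mid > high
```

A region in which samples fall into only two groups along PC1 returns False under this sketch, because the central cluster is empty.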
To verify that these genomic patterns were driven by suppression of recombination between haplotypes, we next measured linkage disequilibrium (LD) and recombination rates. In wild-caught mice, LD across all genotypes (but not within homozygotes) was elevated within predicted inversion regions (Fig. 1e and Extended Data Fig. 1), suggesting that recombination is suppressed between but not within haplotypes. We also estimated recombination rates using laboratory-raised inversion heterozygotes and found that putative inversion regions showed nearly complete suppression of recombination in heterozygotes (mean recombination per inversion: 0–0.03 cM Mb⁻¹; Fig. 1f and Extended Data Fig. 1). Together, these results suggest that suppression of recombination is specifically driven by heterozygotes, providing strong evidence that inversion polymorphisms occur in the identified regions. In total, using this approach, we identified 21 inversion polymorphisms in this species. This is a conservative estimate because our approach was limited to identifying inversions >1 Mb in length with a minimum allele frequency of ~10%.
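As an illustration of the LD contrast described above, the following sketch computes a genotype-based r² (the squared correlation between genotype dosages at two sites). This statistic is an assumption chosen for simplicity; the LD estimator used in the study is not specified in this section.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between genotype dosages (0, 1, 2) at two sites."""
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2)) / n
    v1 = sum((a - m1) ** 2 for a in g1) / n
    v2 = sum((b - m2) ** 2 for b in g2) / n
    if v1 == 0 or v2 == 0:
        return float("nan")  # r2 is undefined when a site is monomorphic
    return cov * cov / (v1 * v2)
```

For two sites that differ between arrangements, r² computed across all samples is high, whereas within a single homozygous class both sites are monomorphic and r² is undefined, in line with elevated LD across all genotypes but not within homozygotes.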
Owing to their number and sizes, these inversions alone affect recombination rates on a massive scale. The detected inversions range in size from 1.5 to 43.8 Mb and, in total, span 17.5% of the deer mouse genome. These inversions cause a near-complete suppression of recombination in heterozygotes: inversion regions show an average recombination rate of only 0.01 cM Mb⁻¹, compared with a genome-wide rate (excluding inversion regions) of 0.80 cM Mb⁻¹ (Extended Data Fig. 2). We also found no significant correlation between inversion size and recombination rate, highlighting how even the largest inversions almost completely suppress recombination (Extended Data Fig. 2). As a consequence, inversions can trap existing mutations or accumulate new mutations and maintain them in LD. Indeed, we found that genetic differentiation (FST) between inversion and standard haplotypes was elevated in a block-like structure (Fig. 1g and Extended Data Fig. 1), suggesting that the inversions partition genetic variation into large haploblocks, shaping patterns of genetic diversity across the deer mouse genome.
Inversion breakpoints
To localize inversion breakpoints, we performed PacBio long-read sequencing for one individual from each of the five focal populations and created de novo genome assemblies at the contig level (Extended Data Table 1). By aligning the de novo genome assemblies to the deer mouse reference genome (NCBI accession: GCA_003704035.3), we identified breakpoints for 13 of the 21 inversions (Fig. 2a and Extended Data Fig. 3). The eight inversions for which we did not identify breakpoints included five inversions (inv6.0, inv7.0, inv7.1, inv19.0, inv21.0) not represented in homozygotes among the PacBio-sequenced individuals (Extended Data Table 2); repetitive sequences probably prevented assembly across breakpoints for the remaining three inversions (inv10.0, inv11.0, inv15.1). Using the de novo genome assemblies, we predicted unique centromere locations for 21 of the 23 autosomes using a 344 bp satellite sequence that localizes to deer mouse centromeres27. Although centromeres are notoriously difficult to assemble28, the de novo genome assemblies spanned multiple predicted centromeres, revealing the highly repetitive nature of centromeric regions, with satellite sequence repeats spanning as much as 1.1 Mb (Fig. 2b). Together these data allowed us to precisely map many of the inversions to chromosomes and their positions relative to centromeres.
Extended Data Table 1.
Contig-level genomes were assembled using Flye for one individual from each of the five focal populations. PacBio sequencing outputs are reported as subread N50, and Flye genome assemblies are summarized by total assembly length, contig N50 and number of contigs. Differences in assembly contiguity are likely driven by differences in heterozygosity among the sequenced samples.
Extended Data Table 2.
Inversion genotypes (0 = standard, 1 = inversion) for all 21 inversions in each of the five PacBio long-read sequenced samples. Inv6.0, inv7.0, inv7.1, inv19.0, inv20.0, inv21.0 are not represented by both homozygous genotypes in these samples.
We found that the distribution of the inversion polymorphisms across the genome is nonrandom. Of the 21 inversions, 15 are terminal, where the inversion ends within 1.5 Mb of the end of the chromosome (Fig. 2c). In addition, nine inversions have breakpoints (predicted or identified) within 1 Mb of the centromere (Fig. 2c); as predicted centromeres localize within the three inversions with identified breakpoints (inv13.0, inv14.0, inv20.0) and the other six inversions (inv6.0, inv7.0, inv7.1, inv10.0, inv15.1, inv19.0) are terminal and occur on acrocentric chromosomes, these inversions are likely to be pericentric (contain the centromere). As such, these nine inversions may toggle chromosomes between acrocentric and metacentric states, shifting centromere locations by as much as 43 Mb. In addition, these results suggest that centromeric and telomeric regions are likely to harbour inversion breakpoints in deer mice.
We also identified multiple genomic regions with recurrent inversion breakpoints. For example, on chromosome 7, we detected two overlapping inversions (inv7.2, inv7.3) with nearly identical breakpoints localizing only 80.2 kb apart (Fig. 2c, inset). Using whole-genome alignments between P. maniculatus and Peromyscus californicus, an outgroup, we determined the ancestral versus derived orientation for these two inversions and found that they arose independently rather than as a series of nested inversions. We also identified two inversions on chromosome 15 (inv15.1, inv15.2) with a shared breakpoint (although we localized breakpoints for only one of these inversions with the de novo assemblies) and two additional inversions on chromosome 7 (inv7.0, inv7.1) with breakpoints both occurring near the telomere (although we were unable to localize breakpoints for either) (Fig. 2c). The recurrence of inversion breakpoints further suggests that certain genomic regions have a greater tendency to participate in the formation of chromosomal rearrangements.
Characterizing the nature of inversion breakpoint regions is critical to understanding how inversions arise and why some genomic regions may be more susceptible to breakpoints. There are two major mechanisms by which inversions form: (1) nonhomologous end joining (NHEJ) can create inversions if double-stranded breaks occur and the sequence is reintegrated in reverse orientation; and (2) nonallelic homologous recombination (NAHR) can yield inversions if intrachromosomal crossing over occurs between inverted repeats (Fig. 3a). For 12 of the 13 inversions with localized breakpoints, we identified at least one pair of inverted repeats flanking the inversion (Fig. 3b). These inverted repeats ranged from 500 bp to 50 kb in length (Fig. 3b) and were often duplicated near the breakpoints (Fig. 3c and Extended Data Fig. 4). This suggests that the vast majority of inversions for which we identified breakpoints likely arose owing to NAHR, as opposed to NHEJ, consistent with the formation of inversions in humans29.
We next explored whether the breakpoints were enriched in repetitive genomic regions. For the 20 localized inversion breakpoints (excluding six breakpoints at chromosome ends), we used SEDEF30 to identify segmental duplications (SDs), defined as duplicated sequence within 500 kb of the breakpoint that is >1 kb in length and contains <70% common repeats. We found that breakpoint regions were significantly enriched for SDs compared with randomized regions genome-wide (Kolmogorov–Smirnov test: P < 0.001); for example, 50% of breakpoints had an SD density above the 90th percentile of random regions genome-wide (Fig. 3d). The repetitive structure of the breakpoints varied, with some breakpoint regions harbouring highly structured SDs in tandem (Fig. 3e and Extended Data Fig. 4) and others harbouring multiple interspersed SDs (Extended Data Fig. 4). Together, these analyses show that genomic regions with an accumulation of SDs may be prone to chromosomal rearrangements via ectopic recombination in deer mice.
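The enrichment comparison can be sketched as follows, assuming SD densities have already been computed for breakpoint regions and for randomized regions. The function names and the nearest-rank percentile definition are illustrative assumptions; the sketch computes only the two-sample Kolmogorov–Smirnov statistic, not its P value.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def fraction_above_percentile(observed, null, pct=90):
    """Fraction of observed values above the pct-th percentile (nearest rank) of the null values."""
    null_sorted = sorted(null)
    rank = max(1, -(-pct * len(null_sorted) // 100))  # integer ceiling of pct * n / 100
    threshold = null_sorted[rank - 1]
    return sum(x > threshold for x in observed) / len(observed)
```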
Frequencies and evolution of inversions
To explore the distributions of these inversions, we next characterized their frequencies across the species range. We first determined the derived inversion arrangement based on genome alignments with an outgroup, P. californicus, and then genotyped the inversions in 218 mice from 13 populations (Fig. 4a). Most inversions were found in multiple populations: 18 of the 21 inversions were present in at least three of the 13 sampled populations (Fig. 4b). However, the varying distributions of the inversions suggest that they have differing evolutionary histories (for example, inversion age and selection): some inversions (for example, inv14.0) are widespread, whereas others (for example, inv7.2) are spatially constrained (Fig. 4c and Extended Data Fig. 5). The highly polymorphic nature of many of the inversions (for example, inv21.0) (Fig. 4c and Extended Data Fig. 5) was particularly striking, with 16 of 21 inversions segregating in at least two of the sampled populations (Fig. 4b). As such, inversion heterozygotes are common (Fig. 4b), indicating that the inversions have a profound impact on recombination rates in the wild.
Limited evidence for deleterious effects of inversions
To explore any negative consequences of inversions on fitness, we first examined possible deleterious effects due to inversion breakpoints. If an inversion breakpoint occurs within or near a gene, it may substantially affect the function and/or expression of that gene31. We found that significantly fewer inversion breakpoints occurred within protein-coding genes than expected based on the deer mouse gene density (binomial test: P = 0.004): of the 13 inversions for which we localized breakpoints, only two inversions (inv9.1, inv18.0) had breakpoints occurring within a protein-coding gene (inv9.1 disrupts the 1700129C05Rik intron, inv18.0 disrupts the Slc39a5 coding sequence (left breakpoint) and Baz2a intron (right breakpoint)) (Fig. 5a). Whereas these two inversions may affect phenotypes through disrupting gene function, the other 11 inversions with localized breakpoints do not disrupt annotated genes (Fig. 5a) and are thus less likely to convey strongly deleterious effects, although their breakpoints may still influence gene expression.
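A one-sided binomial calculation of this kind can be sketched as below. The genic fraction and the breakpoint counts passed to the function are hypothetical inputs, since the values used in the study's test are not given in this section.

```python
from math import comb

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of seeing k or fewer genic breakpoints
    among n breakpoints if each falls in a gene with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical example: 3 genic breakpoints out of 20, with an assumed genic fraction of 0.5.
p_value = binom_lower_tail(3, 20, 0.5)
```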
We next characterized possible mutational loads carried by the inversions, which may accumulate owing to suppressed recombination in inversion heterozygotes32. To do so, we tested whether the inversions were enriched for nonsynonymous mutations relative to the standard haplotypes. We found that the inversions did not show a significant increase in their proportion of segregating nonsynonymous to synonymous mutations (pN/pS) compared with the standard haplotypes (two-sided t-test: P > 0.05 for all inversions), nor did they show a significant increase in nucleotide diversity at nonsynonymous versus synonymous sites (πN/πS) compared with the standard haplotypes (two-sided t-test: P > 0.05 for all inversions) (Fig. 5b). In addition, neither the inversions nor the standard haplotypes showed enrichment for nonsynonymous mutations (pN/pS and πN/πS) relative to the rest of the genome (one-sided t-test: P > 0.05 for all inversions and standard haplotypes) (Fig. 5b), and we did not find a correlation between inversion heterozygote frequency and mutational load (Extended Data Fig. 6). Using nonsynonymous mutation accumulation as an estimate of mutational load, these results suggest that the inversions do not harbour a strong deleterious mutational load.
In addition, if inversions accumulate a recessive mutational load, inversion homozygotes should be rare (for example, in butterflies33 and sparrows34). In deer mice, however, inversion genotype frequencies are consistent with Hardy–Weinberg equilibrium (HWE): we found only five (of 73) instances in which a segregating inversion significantly deviated from HWE within a population (Fig. 4b). This suggests that inversion homozygotes are not strongly underrepresented relative to expectation in populations segregating for a given inversion, which further supports the observation of limited mutational load. We also note that since most inversion genotype frequencies are consistent with random mating, strong assortative or disassortative mating by inversion genotype does not readily occur (unlike in the ruff6,7 or white-throated sparrow34). Together, these lines of evidence suggest that these inversions in deer mice are not associated with strongly negative effects on fitness.
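The comparison with Hardy–Weinberg expectations can be illustrated with a chi-square goodness-of-fit sketch for a single inversion in a single population. The chi-square form (one degree of freedom) is an assumption made for illustration; the exact HWE test used in the study is not specified in this section.

```python
from math import erfc, sqrt

def hwe_chi2(n_std_hom, n_het, n_inv_hom):
    """Chi-square statistic and P value (1 d.f.) for deviation from Hardy-Weinberg proportions."""
    n = n_std_hom + n_het + n_inv_hom
    q = (2 * n_inv_hom + n_het) / (2 * n)  # frequency of the inverted arrangement
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_std_hom, n_het, n_inv_hom)
    if min(expected) == 0:  # monomorphic population: nothing to test
        return 0.0, 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 d.f., the upper tail of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

A deficit of inversion homozygotes, as expected under a recessive load, would appear as an excess of heterozygotes relative to these expected counts.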
Multiple inversions contribute to local adaptation
To explore the role of positive selection in the establishment and maintenance of these inversion polymorphisms, we characterized the contribution of inversions to local population differentiation. We took advantage of previous work on two populations, representing forest and prairie deer mouse ecotypes (populations c and e, Fig. 4a), which are well characterized and widespread20. Forest and prairie mice show many pronounced phenotypic differences (for example, coat colour, tail length, foot length) despite ongoing gene flow. We previously identified an inversion on chromosome 15 (inv15.0) that contributes to phenotypic divergence between these ecotypes20. Returning to this system, we found that multiple newly identified inversions were also major contributors to differentiation between these populations. Specifically, genome-wide FST is low between ecotypes (genome-wide forest–prairie FST: 0.03 ± 0.03) owing to high migration rates20, yet we found multiple ‘genomic islands of divergence’ that showed remarkable overlap with identified inversion polymorphisms (inversion-region forest–prairie FST: 0.26 ± 0.16) (Fig. 6a). For 13 inversions, the ecotypes differed by >50% in their inversion frequencies. Using forward-genetic simulations in SLiM35, we found that for a locus to be maintained at >50% frequency difference between the forest and prairie ecotypes given high gene flow, it was most likely to be evolving under divergent selection (Extended Data Fig. 7), implicating these 13 inversions in local adaptation.
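To give a sense of how a >50% difference in inversion frequency translates into differentiation, the sketch below computes FST for a biallelic locus (here, an inversion treated as a single locus) from its frequency in two populations. It uses the simple heterozygosity-based definition, FST = (HT − HS)/HT, with equal weighting of the two populations; this is an illustrative assumption, not necessarily the estimator used in the study.

```python
def fst_two_pops(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # locus fixed for the same arrangement in both populations
    return (h_t - h_s) / h_t

# Hypothetical frequencies of 0.8 (forest) and 0.2 (prairie) differ by 60%.
example_fst = fst_two_pops(0.8, 0.2)
```

Under these hypothetical frequencies the locus-level FST is about 0.36, well above the genome-wide forest–prairie value of 0.03 reported above.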
The distributions of these inversions across a forest–prairie habitat gradient further support their role in adaptation. Specifically, we genotyped the 13 polymorphic inversions in 136 samples across an environmental gradient and found that nine inversions showed steep changes in frequency across the forest–prairie habitat transition (Fig. 6b and Extended Data Fig. 8), suggesting that these inversions may be favoured in alternate habitats. Furthermore, five inversions (inv7.2, inv14.0, inv15.0, inv18.0, inv21.0) were significantly associated with an ecotype-defining trait, tail length, in laboratory-raised F2 hybrids20 (P < 0.05, linear model) and, for all five, the forest arrangement was associated with longer tails (Fig. 6c), consistent with long tails being important for balance in arboreal habitats36. These five inversions together explain 23.0% of the variance in tail length (individually explaining 2.0–12.5% of the variance, with additive effects ranging from 1.1–2.7 mm change in tail length). Inv15.0 has also been previously found to be significantly associated with coat colour, a second ecotype-defining trait20 (explaining 40% of coat colour variance) (Fig. 6c). Together, these results suggest that inversions may be a key source of genetic variation differentiating locally adapted deer mouse populations, with divergent selection likely to play a role in maintaining the inversions as polymorphisms within this species.
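The additive effect and the proportion of variance explained by a single inversion can be sketched with an ordinary least-squares fit of trait value on inversion genotype, coded as the number of copies of one arrangement (0, 1 or 2). The function name and the single-predictor model are illustrative assumptions; the study's linear model for the F2 hybrids may include additional terms.

```python
def additive_fit(genotypes, trait):
    """Least-squares slope (effect per arrangement copy) and R^2 for trait ~ genotype."""
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(trait) / n
    s_xx = sum((g - g_mean) ** 2 for g in genotypes)
    s_yy = sum((y - y_mean) ** 2 for y in trait)
    s_xy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, trait))
    slope = s_xy / s_xx                 # additive effect, in trait units per copy
    r2 = s_xy ** 2 / (s_xx * s_yy)      # proportion of trait variance explained
    return slope, r2
```

With tail length in millimetres, the slope corresponds to the additive effect per copy and R² to the fraction of variance explained by that inversion.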
Discussion
Technological advances in genome sequencing have recently led to new opportunities for characterizing intraspecific structural variation. For example, the ability to perform population-level whole-genome resequencing allows signatures of large structural variants such as chromosomal inversions to be more easily detected19. This approach has recently been successful in identifying inversions in sunflowers16,26 and seaweed flies15 and now in deer mice. In addition, long-read sequencing has also greatly facilitated the detection and classification of structural variants. For example, here we found that inversion breakpoints reside in highly repetitive genomic regions, harbouring an enrichment of segmental duplications, similar to other mammalian species (that is, humans and great apes13,17). The repetitive nature of mammalian inversion breakpoints probably explains why breakpoints are so challenging to detect with short-read sequencing data alone, as well as with long-read data if read length or coverage is insufficient to resolve repeat regions, as we suspect is the case for the deer mouse inversions for which we failed to localize breakpoints. Future work combining these two approaches—to perform population-level long-read genome sequencing—will further our ability to detect structural variation within a diversity of species18.
In discovering deer mouse inversion polymorphisms, we found that they have an interesting distribution in the genome: a majority of the inversions occur terminally, and most of these involve breakpoints near centromeres. The inversions with breakpoints adjacent to centromeres are likely to be shifting centromere locations from the middle of the chromosome to the end of the chromosome (and vice versa), transforming chromosomes between metacentric and acrocentric states. This result could explain the longstanding observation that deer mice vary in number of acrocentric chromosomes23,24. Furthermore, inversions are also likely to influence chromosome accessibility owing to changes in the three-dimensional genome structure, which, in addition to the mutations the inversions carry, may influence the expression of genes found within the inversions. Despite this large variation in chromosome structure, deer mice (and, more generally, the Peromyscus genus) have a strongly conserved chromosome number (diploid number 2n = 48)24. Unlike the case in other rodents such as the house mouse, which harbours Robertsonian fusions37, the large rearrangements involving centromeres occur primarily within and not between chromosomes in deer mice.
One hypothesis for why deer mouse inversions tend to involve telomeric and centromeric regions is that inversion breakpoints arise more frequently in these regions: genomic regions near centromeres and telomeres can harbour an excess of SDs (as well as other repeats), which may facilitate ectopic recombination38. A second hypothesis is that inversions with breakpoints in telomeric or centromeric regions are less likely to be removed by purifying selection than inversions that occur in other genomic regions: breakpoints that occur near centromeres and telomeres may be unlikely to have deleterious effects as these regions tend to be gene-sparse38. Indeed, none of the inversion breakpoints we found near centromeres (and only one near a telomere) disrupted protein-coding sequences. Terminal inversions may also be less likely than non-terminal inversions to have strong underdominant effects, which often occur owing to inversion loops that form in heterozygotes during meiosis3. If an inversion lacks homologous sequence on one side, such as in a terminal inversion, loop formation may be prevented. Previous evidence from deer mice suggests that inversion loop formation is rare in putative terminal inversions39. Thus, deer mouse inversions involving telomeres and centromeres may confer fewer deleterious costs associated with breakpoint effects and underdominance than inversions occurring in the rest of the genome.
Inversions are a particularly interesting form of structural variation because of their effects on recombination. Inversions in deer mice, when heterozygous, suppress recombination across their entire lengths. The number and sizes of the inversions thus seem striking in the context of recombination: 21 detected inversion polymorphisms, with a mean length of 20.0 Mb, affect a total of 420 million DNA bp (or 17%) of the deer mouse genome. Although these results are consistent with large inversions causing suppression of recombination in other species (for example, quails40, maize41 and cod42), whether inversion polymorphisms affect similar proportions of the genome in other species remains largely unknown. Furthermore, as our approach was limited to detecting inversions >1 Mb in length, there are possibly many additional inversions of shorter lengths segregating within deer mice, which is an important direction for future work. Nevertheless, we found that the detected inversions substantially shape the recombination landscape of deer mice: although suppression of recombination is limited to inversion heterozygotes (so the frequency of an inversion will determine the extent to which it affects recombination), most deer mouse inversions are widespread and inversion heterozygotes are common in natural populations.
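The total span quoted above follows directly from the number of inversions and their mean length. The short check below reproduces that arithmetic; the implied genome size is a back-of-envelope value derived from the quoted 17%, not a figure reported in the text.

```python
n_inversions = 21
mean_length_mb = 20.0

# Total sequence covered by the inversions, in Mb.
total_span_mb = n_inversions * mean_length_mb

# Genome size implied by the quoted proportion (an inference, not a reported value).
implied_genome_mb = total_span_mb / 0.17
```

The result is 420 Mb, matching the 420 million bp stated above, with an implied genome size of roughly 2.5 Gb.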
Recombination plays an important role in evolution through creating new combinations of alleles and increasing the efficiency of natural selection43. In particular, through uncoupling deleterious and beneficial mutations, recombination reduces Hill–Robertson interference and facilitates the elimination of deleterious mutations and the spread of beneficial mutations44,45. Given the benefits of recombination, the abundance of inversions presents a paradox. With reduced efficacy of purifying selection in the absence of recombination, the expectation is that inversions will accumulate a deleterious mutational load (when inversion heterozygotes are common)32, which will limit their spread46. In deer mouse inversions, however, we did not find evidence for the accumulation of mutational load based on nonsynonymous mutations (although these inversions may harbour an excess of other types of deleterious variants such as transposable elements, which future work will further resolve), consistent with a recent study in sunflowers47. In both deer mice and sunflowers, inversion homozygotes are common47; as recombination proceeds uninterrupted in inversion homozygotes, deleterious mutations can efficiently be removed once an inversion reaches substantial allele frequency32, especially if effective population sizes (Ne) are high, as in many populations of deer mice (for example, Ne ≈ 4 × 10⁶ in a single population20). As in sunflowers47, we hypothesize that these inversions, which act as large-scale modifiers of recombination when heterozygous, largely evaded deleterious costs associated with suppressed recombination by quickly spreading to high frequencies in deer mice, whose large population sizes could facilitate effective purifying selection in inversion homozygotes32 (noting that gene conversion between inversion and standard haplotypes may also have a role in reducing deleterious mutational load32).
A major hypothesis for the maintenance of inversion polymorphisms is the ‘local adaptation hypothesis’, which posits that when a population is locally adapting in the face of gene flow, suppressed recombination between multiple beneficial mutations can be advantageous, reducing the strength of selection necessary to establish and/or maintain each mutation in migration–selection equilibrium9–11. As deer mice are found continuously across a wide range of habitats, they are subjected to a range of selective pressures, probably with ongoing gene flow. Our results support an important role for divergent selection in maintaining inversions as polymorphisms within the species at large. In particular, we found that 13 inversions, including one previously identified20, are segregating between forest and prairie deer mouse ecotypes with high allele frequency differences and are likely to be subject to habitat-associated divergent selection, consistent with multiple inversions differentiating ecotypes in a diversity of species such as snails48, cod42, sunflowers26 and sticklebacks49. Although it remains an open question whether the inversions segregating between these forest–prairie ecotypes are advantageous because of their suppression of recombination, the high levels of migration between the forest and prairie populations suggest that increased LD between adaptive alleles may be particularly beneficial in this system20. In addition, five of these inversions have significant effects on tail length, and thus variation in this ecotype-specific trait is largely partitioned into inversions, consistent with the evolution of concentrated genetic architectures in the face of gene flow50.
A concrete understanding of the prevalence and significance of inversion polymorphisms specifically, and of structural variation more generally, remains largely elusive across natural populations of organisms, especially mammals51. We find that inversion polymorphisms are abundant in deer mice. Whether the abundance of inversion polymorphisms in deer mice is unique or representative of mammalian species will require similar investigations across additional species. Nevertheless, this work highlights the critical role of inversions in shaping patterns of recombination, genetic diversity and chromosomal structure in the deer mouse and suggests that inversions may play an even more important part in local adaptation than previously appreciated.
Methods
Population sampling and sequencing
Focal population sampling
We focused our initial analyses on five populations of P. maniculatus, each representing a distinct subspecies (P. m. rubidus, P. m. gambelii, P. m. bairdii, P. m. nubiterrae and P. p. subgriseus). Tissues from 15–17 wild-caught mice per population were collected in Siuslaw National Forest, Oregon, USA (P. m. rubidus)20, Baker City, Oregon, USA (P. m. gambelii)20, Derry, Pennsylvania, USA (P. m. nubiterrae)52, Ocala National Forest, Florida, USA (P. p. subgriseus)53 and Bridgewater, Michigan, USA (P. m. bairdii; obtained from the University of Michigan). All samples used in this study are listed in Supplementary Table 1.
Whole-genome resequencing and variant calling
To generate whole-genome resequencing data, we first extracted DNA from ~20 mg of liver tissue and generated sequencing libraries using Illumina DNA library preparation kits. We sequenced the resulting libraries using 150 bp paired-end sequencing on an Illumina NovaSeq S4 flow cell to obtain ~15× coverage per sample. Following demultiplexing, we mapped sequencing reads to the P. maniculatus bairdii reference genome (NCBI accession: GCA_003704035.3) using BWA-MEM. We accessed published whole-genome resequencing data for three populations: P. m. rubidus, P. m. gambelii20 (NCBI: PRJNA688305) and P. p. subgriseus53 (PRJNA838595). To call variant sites, we used HaplotypeCaller (GATK3.8) on each sample with the default heterozygosity prior (-hets 0.001) and -ERC GVCF to produce per-sample genomic variant call format files (vcfs). Then, we ran GenotypeGVCFs (GATK3.8) to jointly genotype the samples. We performed hard filtering of SNPs based on GATK best practices (filtering variants with quality by depth (QD) < 2.0, FisherStrand (FS) > 60.0, mapping quality (MQ) < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0) using VariantFiltration.
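The hard-filter thresholds listed above can be written as a simple predicate. The following is a minimal sketch in Python, not the GATK VariantFiltration implementation; the function name and the dictionary-based record are hypothetical, and an annotation that is absent from a record is assumed not to trigger its filter.

```python
# Thresholds taken from the text; a SNP is filtered if ANY condition holds.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

def failed_filters(info):
    """Return the names of the hard filters that a SNP's annotations fail.

    `info` is a hypothetical dict of annotation name -> numeric value.
    Missing annotations are skipped (an assumption of this sketch).
    """
    failed = []
    for name, fails in HARD_FILTERS.items():
        value = info.get(name)
        if value is not None and fails(value):
            failed.append(name)
    return failed
```

A SNP would be retained only when the returned list is empty.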
Identifying inversions
Local PCA
To identify genomic regions with outlier population structure, we performed local PCA with the lostruct package25 in R on each of the five focal populations and for all focal population pairs. Note that when all populations are included, population structure is driven by population divergence, which masks the signatures of possible inversions. Therefore, we included only individual populations or population pairs for this analysis, such that inversion signatures were detectable. Using lostruct, we performed local PCA for 100 kb windows with a step size of 100 kb. We then computed the distance between PCA maps (with the top two PCs) using the pc_dist function with default parameters and visualized these distances using multidimensional scaling (MDS) with the cmdscale function with two MDS axes.
To identify genomic regions with unusual population structure, we scanned for consecutive 100 kb windows that showed similar population structure to each other and distinct population structure from the rest of the chromosome. To do so, we first performed k-means clustering of the 100 kb windows in the MDS space, defined by the MDS1 and MDS2 axes, using numbers of clusters from k = 2 to k = 10. To determine the best k, we chose the k with the maximum silhouette score; averaged over observations, this score measures how much more similar an observation is to its own cluster than to the nearest neighbouring cluster. We then assigned 100 kb windows to the cluster determined by the k-means clustering for the chosen k. We next calculated the z score for the MDS1 score for each 100 kb window and selected genomic regions with consecutive windows belonging to the same cluster in which at least ten consecutive windows had z score >1.5.
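The final selection step can be sketched as a scan over per-window cluster labels and z scores. This is an illustrative Python sketch rather than the code used in the study; the function name and input layout are hypothetical, and the z score is compared as a signed value exactly as written above (whether an absolute value was intended is not stated).

```python
def outlier_regions(clusters, z_scores, z_min=1.5, min_run=10):
    """Return (start, end) window-index pairs (inclusive) for runs of
    same-cluster windows that contain at least `min_run` consecutive
    windows with z score > `z_min`."""
    regions = []
    n = len(clusters)
    start = 0
    while start < n:
        # Extend the run while the cluster label stays the same.
        end = start
        while end + 1 < n and clusters[end + 1] == clusters[start]:
            end += 1
        # Longest stretch of consecutive high-z windows inside this run.
        longest = current = 0
        for z in z_scores[start:end + 1]:
            current = current + 1 if z > z_min else 0
            longest = max(longest, current)
        if longest >= min_run:
            regions.append((start, end))
        start = end + 1
    return regions
```

With 100 kb windows, a returned pair (start, end) corresponds to a candidate region of (end - start + 1) × 100 kb.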
PCA and heterozygosity
For each identified outlier region, we performed PCA on the entire region using scikit-allel v.1.3.2 (https://github.com/cggh/scikit-allel). For scikit-allel analyses, we created zarr objects from the whole-genome resequenced vcfs using allel.vcf_to_zarr. We then performed PCA using all SNPs in the region, with the function allel.pca, with n_components = 10, scaler = ‘patterson’ and ploidy = 2. k-means clustering of samples in PC1 versus PC2 space was performed in R with kmeans, following the approach detailed by Todesco et al.16, where samples were assigned to three clusters, setting the cluster starting positions as the minimum, maximum and middle value for PC1 scores to prevent clustering from being influenced by unequal numbers of samples per cluster. When clustering into three groups failed, we tried clustering into two groups, which would be the case if only two inversion genotypes are present. In a few cases (n = 4), we manually reassigned clusters for samples when k-means clustering had clear misassignments. For each outlier region identified, we also computed heterozygosity (reported as the percentage of sites that are heterozygous) for every sample in the relevant populations, using count_het in scikit-allel. Finally, we selected putative inversions to be outlier regions for which samples clustered into three distinct groups along PC1 with high heterozygosity for the middle cluster. We also included an additional four regions for which samples clustered into only two distinct groups along PC1 but signatures of recombination suggested the presence of an inversion (see below).
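The three-cluster assignment with fixed starting positions can be illustrated with a small k-means routine. This is a simplified sketch under stated assumptions: it clusters along PC1 only (the study clustered in PC1 versus PC2 space using kmeans in R), and it interprets the "middle value" as the midpoint of the PC1 range. The function name is hypothetical.

```python
def kmeans_pc1(pc1, max_iter=100):
    """Cluster PC1 scores into three groups with Lloyd's algorithm,
    starting from the minimum, midpoint and maximum of the scores."""
    lo, hi = min(pc1), max(pc1)
    centres = [lo, (lo + hi) / 2.0, hi]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centre for each sample.
        new_labels = [min(range(3), key=lambda k: abs(x - centres[k]))
                      for x in pc1]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(3):
            members = [x for x, lab in zip(pc1, labels) if lab == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = sum(members) / len(members)
    return labels, centres
```

Under this scheme, label 1 corresponds to the middle cluster, which is expected to contain inversion heterozygotes when heterozygosity in that cluster is elevated.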
Linkage disequilibrium
For each putative inversion, we computed LD across the chromosome harbouring that putative inversion using: (1) all samples belonging to the population or population pair from which the putative inversion was identified; and (2) only the samples homozygous for the more common haplotype, based on the PCA clustering. To compute LD, we subset the vcf by sample and chromosome with bcftools. We then used vcftools to filter for SNPs with minor allele frequency (MAF) > 5% (--maf 0.05) and number of missing genotypes = 0 (--max-missing-count 0) and thinned SNPs to at most one SNP per 1 kb (--thin 1000). We computed LD with vcftools (--geno-r2). Finally, we used the script emerald2windowldcounts.pl (https://github.com/owensgl/reformat, https://github.com/owensgl/haploblocks) to calculate the mean r² between 500 kb windows (that is, for a given set of two 500 kb windows, the mean r² across all pairwise SNP comparisons between the two windows was computed).
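The window-level summary can be sketched as follows. This Python sketch is not the emerald2windowldcounts.pl script; it assumes genotypes coded as 0/1/2 alternate-allele counts with no missing data, computes r² as the squared Pearson correlation between genotype vectors, and uses hypothetical function names and input layout.

```python
def geno_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return None  # undefined for monomorphic sites
    return cov * cov / (v1 * v2)

def mean_window_r2(snps, window_a, window_b, size=500_000):
    """Mean r2 over all SNP pairs with one SNP in each window.

    `snps` is a list of (position, genotypes); windows are 0-based
    indices of non-overlapping `size`-bp bins along one chromosome.
    """
    in_a = [g for pos, g in snps if pos // size == window_a]
    in_b = [g for pos, g in snps if pos // size == window_b]
    values = [geno_r2(a, b) for a in in_a for b in in_b]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None
```

Elevated mean r² between distant windows across all samples, but not among samples homozygous for one haplotype, is the pattern expected for an inversion.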
Recombination rates
We estimated recombination maps for both the whole genome and within inversion regions, using laboratory-raised F2 hybrids from previous intercrosses between two population pairs: P. m. rubidus × P. m. gambelii20 and P. m. bairdii × P. p. subgriseus54, which yielded a total of 547 and 1,061 F2 hybrids, respectively. Using double-digest restriction-site associated DNA (ddRAD) sequencing data of F2 hybrids, we determined ancestry and the location of recombination breakpoints in the F2 hybrids using the multiplexed shotgun genotyping pipeline (see ref. 20 for details). For the P. m. rubidus × P. m. gambelii intercross, we genotyped the founders (n = 4) and F1 hybrids (n = 49) of the intercross for the inversions (see Genotyping samples for inversions) to ensure that only F2 hybrids that were offspring of F1 inversion heterozygotes were used for computing recombination rates within inversion regions. All inversions analyzed in the P. m. bairdii × P. p. subgriseus intercross were fixed between the founders. Five inversions (inv7.0, inv7.3, inv9.1, inv15.2, inv20.0) were not represented by heterozygous F1 hybrids and so we were unable to characterize recombination rates for these inversions.
Genetic differentiation
To measure genetic differentiation between inversion and standard haplotypes across each identified inversion, we computed FST between predicted homozygote genotypes (clusters 1 and 3 from PCA clustering) using scikit-allel. We performed sliding-window FST analyses for 10 kb windows with a 10 kb step size using scikit-allel with the windowed_hudson_fst function and visualized FST with loess smoothing in R.
To analyze genome-wide genetic differentiation between forest (P. m. rubidus) and prairie (P. m. gambelii) ecotypes, we computed FST between forest and prairie populations in 100 kb windows across the genome with a step size of 100 kb, using scikit-allel with the windowed_hudson_fst function.
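For readers unfamiliar with Hudson's FST, the per-window quantity can be sketched from allele counts. This is a minimal Python sketch of a commonly used allele-frequency form of the estimator, combined across SNPs as a ratio of sums; it is not the scikit-allel implementation, and the function name and input layout are hypothetical.

```python
def hudson_fst(counts1, counts2):
    """Hudson's FST as a ratio of sums over biallelic SNPs.

    counts1 / counts2: lists of (alt_allele_count, total_allele_count)
    per SNP for population 1 and population 2 (totals must be >= 2).
    """
    num_sum = den_sum = 0.0
    for (a1, n1), (a2, n2) in zip(counts1, counts2):
        p1, p2 = a1 / n1, a2 / n2
        # Between-population divergence minus the within-population terms.
        num = ((p1 - p2) ** 2
               - p1 * (1 - p1) / (n1 - 1)
               - p2 * (1 - p2) / (n2 - 1))
        # Expected between-population pairwise difference.
        den = p1 * (1 - p2) + p2 * (1 - p1)
        num_sum += num
        den_sum += den
    return num_sum / den_sum if den_sum > 0 else None
```

Applying the function to the SNPs falling in each 10 kb or 100 kb window gives a windowed profile; a window of fixed differences yields 1, while similar allele frequencies give values near (or slightly below) 0.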
Localizing inversion breakpoints
PacBio long-read sequencing and de novo genome assembly
We performed long-read sequencing on five individuals (laboratory-colony-raised mice), one from each focal population. First, we extracted high-molecular-weight DNA from 200 μl fresh blood using the MagAttract HMW DNA mini kit (Qiagen), following the Whole Blood protocol. We quantified the resulting DNA using a Genomic DNA ScreenTape on a TapeStation 4200 (Agilent). Library preparation and sequencing were performed at the PacBio Sequencing Core of the University of Washington. In brief, libraries were prepared with the SMRTbell Express Template Prep Kit 2.0 (PacBio). We performed a size selection of 30 kb for the P. m. rubidus, P. m. nubiterrae and P. m. bairdii samples using BluePippin (Sage Science); we did not perform any size selection for the P. m. gambelii and P. p. subgriseus samples as the total library mass was below 500 ng. We then sequenced each library on a Sequel II SMRT Cell 8M (PacBio), using a 15 h movie for the P. m. rubidus, P. m. nubiterrae and P. m. bairdii samples and a 30 h movie for the P. m. gambelii and P. p. subgriseus samples.
We converted the bam files from each movie to fastq files using bam2fastx (PacBio). We then used flye55 to create de novo genome assemblies at the contig level for each population. The flye assembler uses a repeat graph to assemble across repetitive genomic regions, a critical feature for localizing inversion breakpoints, which often occur in repetitive genomic regions. To reduce run time, we downsampled to 40× coverage (--asm-coverage 40) for the initial disjointig assembly but otherwise ran the assembler with default parameters. Genome qualities are reported in Extended Data Table 1.
To genotype each PacBio sample for the inversions, we first mapped the PacBio fastq files to the P. maniculatus reference genome using ngmlr56. Then, we used longshot57, a long-read-specific variant caller, to call variants for each sample. We merged the variant calls with the whole-genome resequencing vcfs and performed PCA for each inversion region, which allowed us to genotype the PacBio samples for the inversions (Extended Data Table 2; for details, see Genotyping samples for inversions).
Inversion breakpoint identification
We aligned the PacBio genome assemblies to the P. maniculatus bairdii reference genome using nucmer (mummer)58 with default parameters. Owing to the possibility of reference genome errors, we reoriented any scaffolds in the reference genome that were misoriented relative to the P. m. bairdii long-read assembly (that is, we identified signatures of inversions or translocations in the P. m. bairdii long-read assembly relative to the reference genome and resolved these regions to match the P. m. bairdii long-read assembly). Thus, all inversion analyses were relative to the P. m. bairdii long-read assembly. We also aligned published P. californicus59 (NCBI accession: GCA_007827085.2) and P. leucopus60 (NCBI accession: GCA_004664715.2) genomes as well as previously assembled de novo genomes for P. m. rubidus and P. m. gambelii20 from canu (a long-read genome assembler complementary to flye) to the P. maniculatus reference genome using nucmer.
For each inversion, we scanned for evidence of inversion breakpoints. To do so, we filtered for nucmer alignments >4 kb in length (or >10 kb for P. californicus, P. leucopus alignments). Inversion breakpoints are identifiable if: (1) a contig spans the inversion region and maps to the reference genome in opposite orientation within the inversion region; or (2) a contig spans only part of the inversion region and maps to the reference genome in opposite orientation to the flanking region of the other end of the inversion. We thus identified contigs that showed signatures of inversions in predicted inversion regions and identified breakpoint positions based on the PacBio assembly alignments to the P. maniculatus reference genome. In addition, we identified breakpoints for one of the predicted inversions based on the P. leucopus genome alignment to the P. maniculatus reference genome and one of the predicted inversions based on the P. californicus genome alignment to the P. maniculatus reference genome.
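Criterion (1) above amounts to finding places where a single contig switches strand along the reference. The following Python sketch illustrates that idea only; it does not parse nucmer output, the tuple layout and function name are hypothetical, and it assumes all alignments belong to one contig mapped to one reference chromosome.

```python
def orientation_switches(alignments, min_len=4000):
    """Locate candidate breakpoints as strand switches along the reference.

    alignments: list of (ref_start, ref_end, strand), strand '+' or '-'.
    Alignments of `min_len` bp or shorter are discarded, mirroring the
    >4 kb filter in the text. Returns (left_block_end, right_block_start)
    reference intervals between adjacent blocks of opposite strand.
    """
    kept = [a for a in alignments if a[1] - a[0] > min_len]
    kept.sort(key=lambda a: a[0])
    switches = []
    for left, right in zip(kept, kept[1:]):
        if left[2] != right[2]:
            switches.append((left[1], right[0]))
    return switches
```

A contig spanning a whole inversion would be expected to give two switches, one per breakpoint; a contig covering only one end gives a single switch, corresponding to criterion (2).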
Determining derived arrangement
For each inversion polymorphism, we determined which arrangement was ancestral (standard) versus derived (inversion) based on the whole-genome alignments between P. californicus (outgroup) and P. maniculatus. We evaluated whether the P. californicus reference genome was inverted relative to the P. maniculatus reference genome for each inversion region, and we assigned the P. californicus orientation to be the ancestral, standard arrangement.
Predicting centromere locations
Peromyscus species are known to have satellite sequences that map to centromeres; specifically, a 344 bp satellite sequence (NCBI accession: KX555281.1) localizes to P. maniculatus centromeres27. We used blastn (blast v.2.2.29) to map this satellite sequence to the P. maniculatus reference genome and to each PacBio genome assembly (as long-read genome assemblies are more likely to assemble across repetitive regions), filtering for alignments with >85% identity. Using this approach, we then determined centromere locations in the reference genome (converting alignment positions in the PacBio genomes to their corresponding or closest reference genome coordinates). To further explore the predicted centromeres, we created dotplots for contigs from the PacBio genomes that spanned a predicted centromere. To do so, we used nucmer with --maxmatch, -l 50, -c 100 to align each contig to itself and then plotted all alignments >100 bp using R.
Characterizing repeat content at inversion breakpoints
Dotplots
To evaluate whether inversion breakpoints occurred in repetitive regions, we created dotplots from the PacBio contig-level assemblies. We performed self-versus-self nucmer alignments for contigs spanning inversion breakpoints, with --maxmatch, -l 50, -c 100; we filtered for alignments >1 kb and plotted the alignments in R.
Inverted repeats and SDs
We identified inverted repeats and segmental duplications (SDs) near inversion breakpoints using the package SEDEF30. For the relevant PacBio contigs identified above (spanning or adjacent to inversion breakpoints), we softmasked common repeats with RepeatMasker, using --xsmall and --species rodentia and masked the 344 bp centromere satellite sequence. We then performed SEDEF with default parameters on the entire set of relevant PacBio contigs. First, we determined inverted repeats to be any repeat identified by SEDEF that mapped in the opposite orientation to within 500 kb of both inversion breakpoints. Next, we called repeats as SDs if they were duplicated within 500 kb of a breakpoint, were ≥1 kb in length, had ≥70% identity with a duplication and had <70% of their sequence masked as common repeats. We then determined the density of SDs within 500 kb of each inversion breakpoint (note that we excluded breakpoints at chromosome ends as telomeres are not fully assembled in these genome assemblies). To compare the breakpoint SD density to that of random regions genome-wide, we also ran SEDEF on each contig from the P. m. bairdii PacBio genome assembly and called SDs. We then randomly selected 10,000 sites from across the genome and calculated the density of SDs within 500 kb of each site. Finally, we tested whether inversion breakpoints were significantly enriched for SDs relative to the 10,000 randomized genome-wide regions using the Kolmogorov–Smirnov test in R.
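The SD-calling criteria and the enrichment comparison can be sketched as below. This is an illustrative Python sketch, not SEDEF or the R test used in the study: the function names and argument layout are hypothetical, identity and masked fraction are assumed to be expressed as proportions, and only the Kolmogorov–Smirnov D statistic is computed (no P value).

```python
from bisect import bisect_right

def is_sd(length_bp, identity, masked_fraction, dist_to_breakpoint_bp):
    """Apply the SD criteria from the text to one SEDEF-style repeat call."""
    return (dist_to_breakpoint_bp <= 500_000
            and length_bp >= 1000
            and identity >= 0.70
            and masked_fraction < 0.70)

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest absolute difference
    between the empirical cumulative distributions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Here `sample_a` would hold SD densities around inversion breakpoints and `sample_b` the densities around the 10,000 randomly chosen sites.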
Genes near inversion breakpoints
We used the P. m. bairdii genome annotation (Pman2.1_chr_NCBI.corrected.merged-with-Apollo.Aug19.sorted_chr15.gff3) to explore whether inversion breakpoints disrupted annotated protein-coding genes. We tested whether the number of breakpoints disrupting gene sequence was expected by chance based on overall gene density using a binomial test; we calculated the gene density (including exons, introns and untranslated regions) to be 39% genome-wide and then used binom.test in R to perform a binomial test, with probability of success of 0.39.
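The binomial test against a 39% gene density can be illustrated with an exact calculation. This Python sketch is an assumption-laden stand-in for binom.test in R: it uses the convention of summing the probabilities of all outcomes no more likely than the observed one, and the function names are hypothetical. No breakpoint counts are assumed here.

```python
from math import comb

GENE_DENSITY = 0.39  # probability of success stated in the text

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_two_sided(k, n, p=GENE_DENSITY):
    """Two-sided exact binomial P value: total probability of outcomes
    that are no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(binom_pmf(i, n, p) for i in range(n + 1)
                if binom_pmf(i, n, p) <= observed * tolerance)
    return min(1.0, total)
```

In use, `k` would be the number of breakpoints that fall within annotated genes and `n` the number of breakpoints examined.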
Inversion frequencies
Sampling populations across species range
To characterize the frequencies and distributions of the inversions across the P. maniculatus range, we included 3–46 individuals from each of an additional eight populations, which, when combined with the initial populations, yielded a total of 218 mice from 13 populations. For five of the new eight populations (populations a, b, f, i and k in Fig. 4a; see Supplementary Table 1 for sample details), we extracted DNA from liver tissue and performed whole-genome resequencing (~10–15× coverage) and variant calling as described above. We also performed whole-genome resequencing for 11 P. leucopus samples and two P. californicus samples (see Supplementary Table 1 for sample details), which we also included in our variant calling pipeline. For three additional populations (populations d, g and h in Fig. 4a; see Supplementary Table 1 for sample details), we obtained publicly available exome-sequencing data61 (NCBI: PRJNA528923) and mapped sequencing reads to the P. maniculatus reference genome with BWA-MEM. We then performed variant calling as described above, except that these samples were joint-genotyped separately from the whole-genome resequenced samples. We approximated the P. maniculatus range using the IUCN Red List of Threatened Species (https://www.iucnredlist.org/species/16672/22360898) and plotted the range map in R, as shown in Fig. 4a.
Phylogenetic trees
To reconstruct the evolutionary relationships among populations, we used RAxML62 to build maximum-likelihood trees. First, we created a tree for the five focal P. maniculatus populations and two outgroups (Peromyscus leucopus and P. californicus). Using hard-filtered SNPs from across the autosomes, we thinned SNPs to at most one SNP per 100 kb using vcftools and merged vcfs across chromosomes. We converted the merged vcf to a PHYLIP matrix using vcf2phylip.py (https://github.com/edgardomortiz/vcf2phylip) and removed invariant sites using ascbias.py (https://github.com/btmartin721/raxml_ascbias), resulting in a total of 12,292 SNPs. We then ran RAxML v.8.2.12 using the ASC_GTRCAT model with the conditional likelihood method, --asc-corr=lewis, to correct for the ascertainment bias due to using SNPs63. We ran 100 bootstraps, with ‘-f a’ to perform rapid bootstrap analysis and visualized trees in iTOL64. We next created a tree for all 13 P. maniculatus populations and the two outgroups. To do so, we first merged the variants called for the three exome-sequenced populations with the whole-genome resequenced vcfs and subset each population to at most 15 individuals. We removed variants with missing genotypes for >20% of samples and masked inversion regions using bcftools. We then converted the vcf to a PHYLIP matrix and removed invariant sites as described above, resulting in a total of 15,518 SNPs. We ran RAxML as described above, with 100 bootstraps using ‘-f a’ to perform rapid bootstrap analysis and visualized trees in iTOL.
Genotyping samples for inversions
To genotype individuals for the presence or absence of inversions, we used a PCA approach. For each inversion, we selected closely related populations segregating for the inversion of interest and performed PCA for that inversion region using scikit-allel, as described above. PCA was performed with only a subset of populations to allow for the inversion of interest (rather than population divergence) to drive variance along PC1. We then projected the remaining samples onto the PC1 and PC2 axes. We genotyped samples for each inversion based on loading scores along PC1 (along which samples clustered into inversion genotype groups) with manual determination of boundaries. We verified that samples called as inversion heterozygotes had elevated heterozygosity in the inversion region using the count_het function in scikit-allel. We set any populations with ambiguous clustering along PC1 for a given inversion to have missing genotypes. Finally, we determined inversion genotype frequencies for each population and tested for deviations from HWE using HWE.chisq in R from the genetics package.
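The test for deviation from HWE can be sketched as a chi-square goodness-of-fit calculation on inversion genotype counts. This is a minimal Python sketch, not the HWE.chisq function from the R genetics package, which may compute its P value differently; the function name is hypothetical and the P value assumes one degree of freedom.

```python
from math import erfc, sqrt

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square statistic and 1-d.f. P value for deviation from HWE.

    n_aa, n_ab, n_bb: counts of standard homozygotes, heterozygotes and
    inversion homozygotes in one population.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first arrangement
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With the small per-population sample sizes used here, expected counts can be low, so an exact or simulation-based test may be preferable to the asymptotic P value shown.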
We also determined inversion genotypes for: (1) 547 F2 hybrids from the P. m. rubidus × P. m. gambelii cross; and (2) the 136 wild-caught mice from the environmental transect. To do so, we first created a set of SNPs fixed between the inversion and standard arrangements using homozygous samples from only forest (P. m. rubidus) and prairie (P. m. gambelii) populations, unless there were fewer than three homozygous samples per genotype, in which case we included additional homozygous samples from nearby populations (populations b and f, Fig. 4a) to improve filtering. Previously, the F2 hybrids were sequenced using the ddRAD-sequencing pipeline (as described in Recombination rates, NCBI: PRJNA687993), and the 136 transect mice were whole-genome resequenced at low coverage (NCBI: PRJNA688305)20. Using these sequencing data, we selected the fixed inversion-standard SNPs from bam files for the F2 hybrids and transect mice using mpileup and performed the hidden Markov model step of the multiplexed shotgun genotyping pipeline65 to determine genotype for each inversion.
Mutational load
To test whether the inversions were enriched for deleterious mutations compared with standard haplotypes, we analyzed the number of segregating nonsynonymous (pN) versus synonymous (pS) sites and nucleotide diversity at nonsynonymous (πN) versus synonymous (πS) sites using PopGenome66. For each inversion, we selected samples homozygous for the inversion arrangement and used readVCF to import biallelic SNPs for the samples and inversion regions of interest into PopGenome; specifically, we selected homozygous samples from the major P. maniculatus clade (Fig. 4b; populations a, b, c, e, f, i and j) except for inv10.0 and inv11.0, for which we also included populations k, l and m in order to sample both homozygous genotypes. We then used the set.synnonsyn function with the P. m. bairdii genome annotation to determine nonsynonymous and synonymous sites. Next, we computed nucleotide diversity for each synonymous and nonsynonymous site with the diversity.stats function. Finally, for 500 kb windows across each inversion region, we calculated the ratios pN/pS and πN/πS (using only sites that were segregating within the homozygous sample set). We then repeated these analyses for samples homozygous for the standard arrangement. To test whether the inversion and standard haplotypes significantly differed in pN/pS or πN/πS, we performed two-sided t-tests in R. Inv7.1 was excluded from this analysis because we had sequencing data for only one homozygous inversion sample; inv9.1 was also excluded because it harbours only six genes.
We also tested whether the inverted or standard haplotypes were enriched for deleterious mutations compared to the rest of the genome. To do so, we included all samples from the major P. maniculatus clade (Fig. 4b; populations a, b, c, e, f, i and j) and calculated pN/pS and πN/πS for 500 kb windows across all regions genome-wide, excluding the inverted regions. We tested whether the inverted or standard haplotypes showed significantly higher pN/pS or πN/πS than genome-wide regions using one-sided t-tests in R.
SLiM simulations
To explore a possible role of selection on the inversions, we performed forward-genetic simulations in SLiM v.3.6 (ref. 35). We simulated the forest (population c, P. m. rubidus) and prairie (population e, P. m. gambelii) populations evolving under a previously estimated best-fit demographic model20 and introduced an inversion as a Mendelian locus as a single copy. We set separate selection coefficients for the inversion locus in the forest versus prairie populations, varying the selection coefficients from −0.01 to +0.01. We introduced the inversion into either the forest or the prairie population at five time points, corresponding to 1.5 × 10⁴, 1.5 × 10⁵, 7.5 × 10⁵, 1.5 × 10⁶ and 2.2 × 10⁶ generations ago, with 2.2 × 10⁶ being the estimated time of the forest–prairie split. To reduce computational time, we scaled parameters by a factor of 100, with population sizes (N) and times divided by 100 (for example, after scaling, time points ranged from 1.5 × 10² to 2.2 × 10⁴ generations ago) and migration rates (m) and selection coefficients (s) multiplied by 100 (for example, after scaling, selection coefficients ranged from −1.0 to +1.0), to keep Nm and Ns consistent35. For each set of forest and prairie selection coefficients and each time point, we ran 1,000 simulations and recorded the frequency of the inversion in the forest and prairie populations at the end of the simulation. Finally, for each scenario, we computed the probability that the inversion reached an absolute allele frequency difference between the forest and prairie populations >50%. All selection coefficients are reported as their values before scaling.
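The parameter scaling and the summary statistic can be sketched in a few lines. This Python sketch is illustrative and is not the SLiM script used in the study; the function names are hypothetical, and any population size or migration rate passed in is a placeholder rather than a value from the fitted demographic model.

```python
SCALE = 100  # scaling factor stated in the text

def scale_parameters(n, generations, m, s, factor=SCALE):
    """Divide N and time by `factor`; multiply m and s by `factor`,
    so that the products N*m and N*s are unchanged."""
    return {
        "N": n / factor,
        "generations": generations / factor,
        "m": m * factor,
        "s": s * factor,
    }

def prob_differentiated(final_freqs, threshold=0.5):
    """Fraction of replicates whose absolute inversion-frequency
    difference between forest and prairie exceeds `threshold`.

    final_freqs: list of (forest_frequency, prairie_frequency) pairs,
    one per simulation replicate.
    """
    hits = sum(1 for forest, prairie in final_freqs
               if abs(forest - prairie) > threshold)
    return hits / len(final_freqs)
```

For example, a selection coefficient of 0.01 becomes 1.0 after scaling and a split time of 2.2 × 10⁶ generations becomes 2.2 × 10⁴, matching the ranges given above.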
Clinal variation
To test whether inversion frequency was associated with local habitat, we analyzed P. maniculatus mice previously collected across a forest–prairie environmental gradient, which included 136 samples from nine sites across the Cascade mountains in Oregon, USA20. Using publicly available sequencing data20 (NCBI: PRJNA688305), we genotyped the 136 samples for the inversions (see Genotyping samples for inversions section above) and then used the package HZAR v.0.2.5 (ref. 67) to fit clines to inversion genotypes (https://github.com/oharring/chr15_inversion). We fit ten different cline models by varying the scaling of minimum and maximum allele frequencies (scaling: ‘fixed’ or ‘free’) and how exponential tails were fit (tails: ‘none’, ‘left’, ‘right’, ‘mirror’ and ‘both’). We selected the best model for each inversion using Akaike information criterion (with correction for small sample sizes) (AICc) values. Clines shown in Fig. 6b are fit with tails: ‘none’ and scales: ‘fixed’; best-fit clines are shown in Extended Data Fig. 8.
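The model set and the selection criterion can be sketched as follows. This Python sketch does not call HZAR; it simply enumerates the two scaling options and five tail options described above and applies the standard AICc formula. The function names are hypothetical, and what counts as the number of observations n is an assumption left to the caller.

```python
from itertools import product

SCALINGS = ("fixed", "free")
TAILS = ("none", "left", "right", "mirror", "both")
MODELS = list(product(SCALINGS, TAILS))  # 2 x 5 = 10 candidate cline models

def aicc(log_likelihood, k, n):
    """AIC with small-sample correction.

    k: number of free parameters; n: number of observations (n > k + 1).
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def best_model(fits, n):
    """Return the model with the lowest AICc.

    fits: dict mapping a (scaling, tails) model to (log_likelihood, k).
    """
    return min(fits, key=lambda model: aicc(fits[model][0], fits[model][1], n))
```

Because the correction term grows with k, a model with exponential tails is preferred only when its gain in log-likelihood outweighs the extra parameters.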
Genotype–phenotype associations
Using data from a reciprocal intercross between P. m. rubidus (forest population) × P. m. gambelii (prairie population) F2 hybrids (n = 547) as described above, we tested for associations between inversion genotype and three forest-ecotype-defining traits: tail length, foot length and coat colour. We used previously published phenotypic measurements20 and the inversion genotypes reported here. For each of the 13 polymorphic forest–prairie inversions, we tested whether inversion genotype was significantly correlated with trait variation using linear models in R, with genotype coded numerically (additive genetic model); for tail and foot length, we included body length as a fixed effect. We corrected for multiple hypothesis testing (that is, testing 13 different inversions) using Bonferroni correction.
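The additive genotype coding and the multiple-testing correction can be sketched as below. This is a minimal Python illustration, not the R analysis used in the study; the genotype labels and function names are hypothetical, and the correction is assumed to be applied across the 13 inversions tested for a given trait.

```python
N_TESTS = 13  # polymorphic forest-prairie inversions tested

def additive_code(genotype):
    """Map a genotype label to the number of inversion copies (0, 1, 2).

    The labels 'SS' (standard homozygote), 'SI' (heterozygote) and
    'II' (inversion homozygote) are hypothetical.
    """
    return {"SS": 0, "SI": 1, "II": 2}[genotype]

def bonferroni(p_values, n_tests=N_TESTS):
    """Bonferroni-adjusted P values, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]
```

Equivalently, an unadjusted P value is significant at the 0.05 level only if it falls below 0.05/13, or about 0.0038.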
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We thank T. Sackton, D. Khost and members of the Hoekstra laboratory for their advice on the analyses; T. Sackton, J. Mallet, L. Gozashti, A. Kautt and members of the Mallet laboratory for providing helpful feedback on the manuscript; T. B. Wooldridge for sharing short-read sequencing data; and E. Hager and T. B. Wooldridge for many helpful discussions on inversions. The Bauer Core Facility at Harvard University provided short-read library preparation and sequencing services. The University of Washington PacBio Sequencing Core provided long-read library preparation and sequencing services. Computational analyses were run on the Odyssey and Cannon clusters supported by the Faculty of Arts and Sciences Research Computing Group at Harvard University. We thank the Museum of Southwestern Biology (University of New Mexico), Museum of Comparative Zoology (Harvard University), S. Cushman (US Forest Service, Rocky Mountain Research Station) and C. Thompson (University of Michigan) for providing specimens used in this study. O.S.H. was supported by a National Science Foundation Graduate Research Fellowship, a Harvard Quantitative Biology Student Fellowship (DMS 1764269), the Molecular Biophysics Training Grant (NIH NIGMS T32GM008313), an American Society of Mammalogists Grants-in-Aid of Research and a Society for the Study of Evolution R.C. Lewontin Early Award. H.E.H. is funded as a Howard Hughes Medical Institute Investigator.
Author contributions
O.S.H. conceived the study and performed the analyses, with input from H.E.H. O.S.H. and H.E.H. wrote the manuscript.
Peer review
Peer review information
Nature Ecology & Evolution thanks Maren Wellenreuther and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Sequencing data are available from NCBI SRA under BioProject accessions PRJNA856879, PRJNA816517, PRJNA860096, PRJNA862503; NCBI SRA accessions for individual samples are listed in Supplementary Table 1. Source data are provided with this paper.
Code availability
The code used for the analyses is available from GitHub (https://github.com/oharring/pman_inversions).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Olivia S. Harringmeyer, Email: olivia_meyerson@g.harvard.edu
Hopi E. Hoekstra, Email: hoekstra@oeb.harvard.edu
Extended data is available for this paper at 10.1038/s41559-022-01890-0.
Supplementary information: the online version contains supplementary material available at 10.1038/s41559-022-01890-0.
References
- 1. Charlesworth B, Charlesworth D, Coyne JA, Langley CH. Hubby and Lewontin on protein variation in natural populations: when molecular genetics came to the rescue of population genetics. Genetics. 2016;203:1497–1503. doi: 10.1534/genetics.115.185975.
- 2. Mérot C, Oomen RA, Tigano A, Wellenreuther M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol. Evol. 2020;35:561–572. doi: 10.1016/j.tree.2020.03.002.
- 3. Wellenreuther M, Bernatchez L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 2018;33:427–440. doi: 10.1016/j.tree.2018.04.002.
- 4. Lowry DB, Willis JH. A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation. PLoS Biol. 2010;8:e1000500. doi: 10.1371/journal.pbio.1000500.
- 5. Joron M, et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature. 2011;477:203–206. doi: 10.1038/nature10341.
- 6. Küpper C, et al. A supergene determines highly divergent male reproductive morphs in the ruff. Nat. Genet. 2016;48:79–83. doi: 10.1038/ng.3443.
- 7. Lamichhaney S, et al. Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nat. Genet. 2016;48:84–88. doi: 10.1038/ng.3430.
- 8. Thompson MJ, Jiggins CD. Supergenes and their role in evolution. Heredity. 2014;113:1–8. doi: 10.1038/hdy.2014.20.
- 9. Bürger R, Akerman A. The effects of linkage and gene flow on local adaptation: a two-locus continent–island model. Theor. Popul. Biol. 2011;80:272–288. doi: 10.1016/j.tpb.2011.07.002.
- 10. Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and speciation. Genetics. 2006;173:419–434. doi: 10.1534/genetics.105.047985.
- 11. Charlesworth B, Barton NH. The spread of an inversion with migration and selection. Genetics. 2018;208:377–382. doi: 10.1534/genetics.117.300426.
- 12. Ebert P, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021;372:eabf7117. doi: 10.1126/science.abf7117.
- 13. Porubsky D, et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell. 2022;185:1986–2005. doi: 10.1016/j.cell.2022.04.017.
- 14. Kapun M, Flatt T. The adaptive significance of chromosomal inversion polymorphisms in Drosophila melanogaster. Mol. Ecol. 2019;28:1263–1282. doi: 10.1111/mec.14871.
- 15. Mérot C, et al. Locally adaptive inversions modulate genetic variation at different geographic scales in a seaweed fly. Mol. Biol. Evol. 2021;38:3953–3971. doi: 10.1093/molbev/msab143.
- 16. Todesco M, et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature. 2020;584:602–607. doi: 10.1038/s41586-020-2467-6.
- 17. Porubsky D, et al. Recurrent inversion toggling and great ape genome evolution. Nat. Genet. 2020;52:849–858. doi: 10.1038/s41588-020-0646-x.
- 18. De Coster W, Weissensteiner MH, Sedlazeck FJ. Towards population-scale long-read sequencing. Nat. Rev. Genet. 2021;22:572–587. doi: 10.1038/s41576-021-00367-3.
- 19. Mérot C. Making the most of population genomic data to understand the importance of chromosomal inversions for adaptation and speciation. Mol. Ecol. 2020;29:2513–2516. doi: 10.1111/mec.15500.
- 20. Hager ER, et al. A chromosomal inversion contributes to divergence in multiple traits between deer mouse ecotypes. Science. 2022;377:399–405. doi: 10.1126/science.abg0718.
- 21. Linnen CR, et al. Adaptive evolution of multiple traits through multiple mutations at a single gene. Science. 2013;339:1312–1316. doi: 10.1126/science.1233213.
- 22. Bedford NL, Hoekstra HE. Peromyscus mice as a model for studying natural variation. eLife. 2015;4:e06813. doi: 10.7554/eLife.06813.
- 23. Bradshaw WN, Hsu TC. Chromosomes of Peromyscus (Rodentia, Cricetidae). Cytogenetics. 1972;11:436–451. doi: 10.1159/000130209.
- 24. Sparkes RS, Arakaki DT. Intrasubspecific and intersubspecific chromosomal polymorphism in Peromyscus maniculatus (deer mouse). Cytogenetics. 1966;5:411–418. doi: 10.1159/000129916.
- 25. Li H, Ralph P. Local PCA shows how the effect of population structure differs along the genome. Genetics. 2019;211:289–304. doi: 10.1534/genetics.118.301747.
- 26. Huang K, Andrew RL, Owens GL, Ostevik KL, Rieseberg LH. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol. 2020;29:2535–2549. doi: 10.1111/mec.15428.
- 27. Smalec BM, Heider TN, Flynn BL, O’Neill RJ. A centromere satellite concomitant with extensive karyotypic diversity across the Peromyscus genus defies predictions of molecular drive. Chromosome Res. 2019;27:237–252. doi: 10.1007/s10577-019-09605-1.
- 28. Alkan C, et al. Genome-wide characterization of centromeric satellites from multiple mammalian genomes. Genome Res. 2011;21:137–145. doi: 10.1101/gr.111278.110.
- 29. Kidd JM, et al. A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell. 2010;143:837–847. doi: 10.1016/j.cell.2010.10.027.
- 30. Numanagić I, et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics. 2018;34:i706–i714. doi: 10.1093/bioinformatics/bty586.
- 31. Kirkpatrick M. How and why chromosome inversions evolve. PLoS Biol. 2010;8:e1000501. doi: 10.1371/journal.pbio.1000501.
- 32. Berdan EL, Blanckaert A, Butlin RK, Bank C. Deleterious mutation accumulation and the long-term fate of chromosomal inversions. PLoS Genet. 2021;17:e1009411. doi: 10.1371/journal.pgen.1009411.
- 33. Jay P, et al. Mutation load at a mimicry supergene sheds new light on the evolution of inversion polymorphisms. Nat. Genet. 2021;53:288–293. doi: 10.1038/s41588-020-00771-1.
- 34. Tuttle EM, et al. Divergence and functional degradation of a sex chromosome-like supergene. Curr. Biol. 2016;26:344–350. doi: 10.1016/j.cub.2015.11.069.
- 35. Haller BC, Messer PW. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 2019;36:632–637. doi: 10.1093/molbev/msy228.
- 36. Hager ER, Hoekstra HE. Tail length evolution in deer mice: linking morphology, behavior, and function. Integr. Comp. Biol. 2021;61:385–397. doi: 10.1093/icb/icab030.
- 37. Nachman MW, Searle JB. Why is the house mouse karyotype so variable? Trends Ecol. Evol. 1995;10:397–402. doi: 10.1016/S0169-5347(00)89155-7.
- 38. Samonte RV, Eichler EE. Segmental duplications and the evolution of the primate genome. Nat. Rev. Genet. 2002;3:65–72. doi: 10.1038/nrg705.
- 39. Greenbaum IF, Reed MJ. Evidence for heterosynaptic pairing of the inverted segment in pericentric inversion heterozygotes of the deer mouse (Peromyscus maniculatus). Cytogenet. Genome Res. 1984;38:106–111. doi: 10.1159/000132040.
- 40. Sanchez-Donoso I, et al. Massive genome inversion drives coexistence of divergent morphs in common quails. Curr. Biol. 2022;32:462–469.e6. doi: 10.1016/j.cub.2021.11.019.
- 41. Fang Z, et al. Megabase-scale inversion polymorphism in the wild ancestor of maize. Genetics. 2012;191:883–894. doi: 10.1534/genetics.112.138578.
- 42. Matschiner M, et al. Supergene origin and maintenance in Atlantic cod. Nat. Ecol. Evol. 2022;6:469–481. doi: 10.1038/s41559-022-01661-x.
- 43. Felsenstein J. The evolutionary advantage of recombination. Genetics. 1974;78:737–756. doi: 10.1093/genetics/78.2.737.
- 44. Charlesworth B, Jensen JD. Effects of selection at linked sites on patterns of genetic variability. Annu. Rev. Ecol. Evol. Syst. 2021;52:177–197. doi: 10.1146/annurev-ecolsys-010621-044528.
- 45. Keightley PD, Otto SP. Interference among deleterious mutations favours sex and recombination in finite populations. Nature. 2006;443:89–92. doi: 10.1038/nature05049.
- 46. Nei M, Kojima K-I, Schaffer HE. Frequency changes of new inversions in populations under mutation-selection equilibria. Genetics. 1967;57:741–750. doi: 10.1093/genetics/57.4.741.
- 47. Huang K, et al. Mutation load in sunflower inversions is negatively correlated with inversion heterozygosity. Mol. Biol. Evol. 2022;39:msac101.
- 48. Faria R, et al. Multiple chromosomal rearrangements in a hybrid zone between Littorina saxatilis ecotypes. Mol. Ecol. 2019;28:1375–1393. doi: 10.1111/mec.14972.
- 49. Jones FC, et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 2012;484:55–61. doi: 10.1038/nature10944.
- 50. Yeaman S, Whitlock MC. The genetic architecture of adaptation under migration–selection balance. Evolution. 2011;65:1897–1911. doi: 10.1111/j.1558-5646.2011.01269.x.
- 51. Dobigny G, Britton‐Davidian J, Robinson TJ. Chromosomal polymorphism in mammals: an evolutionary perspective. Biol. Rev. 2017;92:1–21. doi: 10.1111/brv.12213.
- 52. Kingsley EP, Kozak KM, Pfeifer SP, Yang D-S, Hoekstra HE. The ultimate and proximate mechanisms driving the evolution of long tails in forest deer mice. Evolution. 2017;71:261–273. doi: 10.1111/evo.13150.
- 53. Wooldridge TB, et al. An enhancer of Agouti contributes to parallel evolution of cryptically colored beach mice. Proc. Natl Acad. Sci. USA. 2022;119:e2202862119. doi: 10.1073/pnas.2202862119.
- 54. Bendesky A, et al. The genetic basis of parental care evolution in monogamous mice. Nature. 2017;544:434–439. doi: 10.1038/nature22074.
- 55. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
- 56. Sedlazeck FJ, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods. 2018;15:461–468. doi: 10.1038/s41592-018-0001-7.
- 57. Edge P, Bansal V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat. Commun. 2019;10:4660. doi: 10.1038/s41467-019-12493-y.
- 58. Kurtz S, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.
- 59. Tigano A, et al. Chromosome size affects sequence divergence between species through the interplay of recombination and selection. Evolution. 2022;76:782–798. doi: 10.1111/evo.14467.
- 60. Long AD, et al. The genome of Peromyscus leucopus, natural host for Lyme disease and other emerging infections. Sci. Adv. 2019;5:eaaw6441. doi: 10.1126/sciadv.aaw6441.
- 61. Schweizer RM, et al. Physiological and genomic evidence that selection on the transcription factor Epas1 has altered cardiovascular function in high-altitude deer mice. PLoS Genet. 2019;15:e1008420. doi: 10.1371/journal.pgen.1008420.
- 62. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
- 63. Leaché AD, et al. Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies. Syst. Biol. 2015;64:1032–1047. doi: 10.1093/sysbio/syv053.
- 64. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:W256–W259. doi: 10.1093/nar/gkz239.
- 65. Andolfatto P, et al. Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Genome Res. 2011;21:610–617. doi: 10.1101/gr.115402.110.
- 66. Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 2014;31:1929–1936. doi: 10.1093/molbev/msu136.
- 67. Derryberry EP, Derryberry GE, Maley JM, Brumfield RT. HZAR: hybrid zone analysis using an R software package. Mol. Ecol. Resour. 2014;14:652–663. doi: 10.1111/1755-0998.12209.
Data Availability Statement
Sequencing data are available from NCBI SRA under BioProject accessions PRJNA856879, PRJNA816517, PRJNA860096, PRJNA862503; NCBI SRA accessions for individual samples are listed in Supplementary Table 1. Source data are provided with this paper.
The code used for the analyses is available from GitHub (https://github.com/oharring/pman_inversions).