PLOS Genetics. 2021 Apr 29;17(4):e1009495. doi: 10.1371/journal.pgen.1009495

The genomics of rapid climatic adaptation and parallel evolution in North American house mice

Kathleen G Ferris 1,¤a,*, Andreas S Chavez 1,¤b, Taichi A Suzuki 1,¤c, Elizabeth J Beckman 1, Megan Phifer-Rixey 1,¤d, Ke Bi 1, Michael W Nachman 1,*
Editor: Sarah A Tishkoff 2
PMCID: PMC8084166  PMID: 33914747

Abstract

Parallel changes in genotype and phenotype in response to similar selection pressures in different populations provide compelling evidence of adaptation. House mice (Mus musculus domesticus) have recently colonized North America and are found in a wide range of environments. Here we measure phenotypic and genotypic differentiation among house mice from five populations sampled across 21° of latitude in western North America, and we compare our results to a parallel latitudinal cline in eastern North America. First, we show that mice are genetically differentiated between transects, indicating that they have independently colonized similar environments in eastern and western North America. Next, we find genetically-based differences in body weight and nest building behavior between mice from the ends of the western transect which mirror differences seen in the eastern transect, demonstrating parallel phenotypic change. We then conduct genome-wide scans for selection and a genome-wide association study to identify targets of selection and candidate genes for body weight. We find some genomic signatures that are unique to each transect, indicating population-specific responses to selection. However, there is significant overlap between genes under selection in eastern and western house mouse transects, providing evidence of parallel genetic evolution in response to similar selection pressures across North America.

Author summary

Dissecting the genetic basis of parallel evolution, the independent evolution of similar phenotypes in similar environments among closely related lineages, allows evolutionary biologists to test whether evolution is predictable at the molecular level. Relatively little is still known about the genetics of parallel evolution in quantitative traits. Here we identify significant phenotypic and genomic parallel evolution in quantitative traits across two latitudinal transects of wild house mice in eastern and western North America. We find parallel evolution in thermally adaptive phenotypes (nest building behavior and body mass) and in genes involved in temperature-related traits such as body mass, metabolism, and temperature-sensing using population genomic scans for selection. We also find considerable divergent phenotypic and genomic evolution between eastern and western transects corresponding to known environmental differences between these transects. In this case, the evolution of quantitative traits across similar latitudinal transects involved a mixture of unique and shared responses to selection at the molecular level.

Introduction

A central goal of evolutionary biology is to understand how organisms adapt to novel environments. The geographic distribution of genotypes and phenotypes can provide information about the targets of spatially varying selection [1,2]. For example, clinal patterns of variation in Drosophila have been described for individual genes [3–5] and various traits such as fecundity [6] and wing size [7]. More recent work in Drosophila has taken a genome-wide approach, which has the advantage of being agnostic with respect to phenotype and thus has the potential to identify previously unsuspected targets of selection [8–11]. Genome-wide surveys of clinal variation have also been adopted in many other organisms [12–19].

To understand the predictability of adaptive evolution in response to spatial variation in selection pressures, several studies have looked for repeated patterns of evolution across different populations that have experienced similar environments and selection pressures. For example, sticklebacks have repeatedly colonized freshwater lakes and streams from marine environments. Comparison of pairs of freshwater and marine populations has led to the discovery of genes that show parallel changes in independent transects [15]. Similarly, Drosophila melanogaster has independently colonized Australia and North America from its ancestral range. Consistent latitudinal clines of genetic and phenotypic variation have been identified on both continents [8,9]. Parallel patterns of clinal variation across independent transects provide strong evidence that the traits in question are adaptive. In cases where parallel phenotypic clines are observed, the discovery of parallel genetic clines illustrates the repeatability of evolution at the molecular level. For example, in sticklebacks, most freshwater populations share a suite of common phenotypic changes. Genetic patterns of variation reveal both parallel clines and some transect-specific clines, suggesting that adaptation to a freshwater environment may involve a mix of both shared and unique genetic changes [15,20].

Despite this extensive previous work, links between genotype and phenotype are still relatively uncommon in the context of clinal variation. Notable exceptions include protein variants such as Adh in Drosophila [4,21], Pgi in butterflies [22–24], and Ldh in killifish [25,26]. In some cases, the genetic basis of clinally varying phenotypes has been identified through traditional mapping strategies [5,27]. One of the challenges of identifying the genes underlying clinal phenotypic variation is that many traits are highly polygenic with only modest contributions from individual genes. While there have been notable successes in identifying the genetic basis of parallel evolution in Mendelian or oligogenic traits such as cardenolide resistance [28], coat color [29–31], or body armor in sticklebacks [27], much less is known about the extent of genetic parallelism in quantitative traits. In principle, one might expect highly polygenic traits to show less genetic parallelism since there may be numerous genetic paths to the same phenotypic optimum. One approach to identifying the genomic extent of parallel evolution in polygenic traits is to combine population genomic scans for selection across independent environmental clines [32], measurement of complex phenotypes in a common environment [33], and genome-wide association studies (GWAS) of the traits of interest [34].

The recent introduction of house mice, Mus musculus domesticus, into the Americas from Western Europe provides an opportunity to study rapid and parallel environmental adaptation. In their native range in Western Europe, house mice live in temperate climates. However, since their introduction into the Americas, M. m. domesticus have expanded into many novel environments from Alaska to the tip of South America, including the subarctic, xeric, and tropical climatic zones [35,36]. Throughout this range house mice frequently occupy outdoor structures, such as barns and sheds, exposing them to greater environmental variation than that which is experienced by humans. This recent population expansion into new and extreme habitats, combined with their status as a mammalian model system, makes house mice useful for studying the genetic basis of rapid local adaptation. Additionally, their colonization of multiple similar thermal environments across the Americas provides a powerful system to study the genomic basis of parallel evolution in response to temperature.

In this study we examine rapid adaptation along a latitudinal cline in western North America and compare our results to a previous study of clinal variation in house mice along a similar thermal gradient in eastern North America (Fig 1A) [33]. We analyzed the phenotypic and genomic basis of adaptation using wild mice collected from five populations between Tucson, Arizona and Edmonton, Alberta. The environment varies dramatically in temperature, precipitation, and seasonality across this latitudinal transect (S1 Fig). We first measured body weight and nest-building behavior, traits involved in thermal adaptation, in lab-reared descendants of wild mice collected from the ends of the western transect. We then performed a population genomic scan for selection and a genome-wide association study on body weight using exome data to identify candidate loci for adaptive variation in quantitative traits in the western transect. Finally, we examined the degree of parallel genomic evolution by determining the overlap between the loci under selection in eastern and western North America. While both transects traverse a similar thermal range, the eastern transect does not share the western transect’s striking precipitation gradient, suggesting there will be both strong parallel and non-parallel selective forces driving clinal variation in house mice across North America (S1 Fig) [37].

Fig 1. Sampling localities and relationships among populations.

A) Heat map of mean annual temperature (MAT) in North America with the parallel eastern and western house mouse transects with populations at similar latitudes shown in the same color. Degrees of latitude are marked on the y-axis, longitude on the x-axis, and MAT in °C is indicated by color. This map was created using the R package “raster” to plot Mean Annual Temperature data from worldclim (MAT = bioclim variable 1). B) A bootstrap consensus neighbor-joining tree constructed in PAUP using a distance matrix generated from the exome sequences of all 100 mice in both the eastern and western transects depicting the relatedness among all 10 populations. This tree is rooted using five M. m. domesticus samples from Europe.

Results

Independent evolutionary history of eastern and western populations

Parallel phenotypic and genetic clines in eastern and western North America could result from independent evolution or shared history among populations at the same latitudes. To distinguish between these possibilities, we reconstructed the history of the 10 populations in both transects using exome data generated here from 50 wild mice across 21° of latitude in western North America (Fig 1A, collecting localities and samples are given in S1 Table, and exome data are summarized in S2 and S3 Tables), and exome data from the five eastern populations in [33]. Unrelated wild-caught mice were sampled in the same way in both transects, with 10 mice per locality. A neighbor-joining tree generated from a distance matrix based on the exome sequences of mice in both transects depicts the relationships among these 10 populations using mice from the ancestral European range to root the tree (Fig 1B). In general, mice within populations are more closely related to each other than they are to mice from other populations (Fig 1B). At a larger geographic scale, mice within each transect are more closely related to each other than they are to mice from the other transect. The tree contains two major clades, one containing mice from western North America and one containing mice from eastern North America. Both of these clades have a bootstrap support of 100%. We also generated a population tree using quartet assembly with random sampling of 50,000 quartets with SVDQuartets (see Materials and Methods). This tree also depicts two major clades corresponding to the eastern and western transects, with 100% bootstrap support for each clade (S2 Fig). Similar patterns can be seen in a principal components analysis, in which mice from each transect are separated along PC1 (S3 Fig). Thus, populations are clustered broadly by longitude rather than by latitude, suggesting that adaptation to different latitudes likely occurred independently in these two transects. 
An alternative possibility is that adaptation to high latitudes occurred once and that beneficial alleles were carried by rare migrants between the eastern and western transects. This seems less likely in light of the very recent colonization of house mice in the Americas and the well-supported phylogenetic relationships depicted in Figs 1B and S2.

We also documented genome-wide patterns of genetic variation within western North America. Overall levels of nucleotide diversity (π) within populations ranged from 0.14% to 0.25% (S4 Table), similar to levels of variation seen in the eastern transect [33]. Also similar to patterns observed in the eastern transect, principal components analysis largely grouped western mice by population (Fig 2A), which is consistent with the neighbor-joining tree depicted in Fig 1B. We observed a modest signature of isolation by distance (IBD) in the western transect (R² = 0.28, Fig 2B), in contrast to the lack of IBD found in the eastern transect [33].
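An isolation-by-distance check of this kind can be sketched as a regression of pairwise genetic distance on pairwise geographic distance. The Python below is a minimal illustration, not the authors' pipeline: the function names, the use of great-circle distance, and the input layout are all assumptions, and the value returned is a plain squared Pearson correlation.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def ibd_r_squared(coords, fst):
    """R^2 of pairwise Fst against pairwise geographic distance.

    coords: {population: (lat, lon)} (hypothetical input layout)
    fst:    {(pop_a, pop_b): Fst}, keyed by alphabetically sorted pairs
    """
    dist, gen = [], []
    for a, b in combinations(sorted(coords), 2):
        dist.append(haversine_km(*coords[a], *coords[b]))
        gen.append(fst[(a, b)])
    return r_squared(dist, gen)
```

Because population pairs share populations, the points in such a regression are not independent, so significance is usually assessed with a permutation approach such as a Mantel test rather than from the regression alone.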

Fig 2. Population structure in mice from the western transect.

A) Three-dimensional principal components plot of the five western populations from Fig 1: Tucson, AZ (red), St. George, UT (orange), Provo, UT (purple), Missoula, MT (cyan), and Edmonton, AB (dark blue); B) plot of genetic distance as measured by Fst versus geographic distance (km); C) admixture plot of the five western populations with M. m. castaneus ancestry plotted in green and M. m. domesticus ancestry plotted in black.

To better understand the subspecific origin of house mice in the western transect we used the software ngsAdmix to test for admixture between M. m. castaneus and M. m. domesticus. We detected a small signature of admixture between M. m. castaneus and M. m. domesticus in the Tucson, AZ population, but no evidence of admixture in the other four populations (Fig 2C). No signature of admixture among house mouse subspecies was found in the eastern transect [33]. Admixture in the Tucson population may explain the finding that this population has the highest pairwise FST (mean = 0.13) and sequence diversity (π = 0.0025, θω = 0.0024) of any population in the western transect (S4 and S5 Tables).

Phenotypic variation in house mice from western North America

Next, we sought to characterize potentially adaptive phenotypic variation in the western transect and compare this to previously documented variation in the eastern transect. We focused first on body weight and nest-building behavior (see Materials and Methods), two traits involved in thermal adaptation and known to differ between northern and southern populations in the eastern transect [33]. Among fifth-generation lab-born descendants of wild-caught mice, we found that body mass was significantly greater in Alberta mice than in Arizona mice (ANOVA; population p = 0.003, sex p = 0.035, age p = 0.547; Fig 3A). We also found that nest weight was significantly greater in Alberta mice than in Arizona mice (ANOVA; population p = 0.038, body weight p = 0.966, sex p = 0.394, age p = 0.954; Fig 3B, 3C, and 3D). The magnitude and direction of these differences were similar to those seen in the eastern transect [33]. The fact that these differences persisted over multiple generations among descendants of wild-caught animals reared in a common laboratory environment indicates that the differences have a genetic basis and are not due to either phenotypic plasticity or maternal effects. These genetic differences indicate that there has been parallel phenotypic evolution in morphology and behavior between eastern and western transects.

Fig 3. Phenotypic variation in lab-reared and wild-caught mice.

A) Body mass of fifth-generation laboratory-reared female (F) and male (M) mice from Arizona (red) and Alberta (blue); B) mass of nesting material used by fifth-generation laboratory-reared female (F) and male (M) mice from Arizona (red) and Alberta (blue) in a 24-hour period; C) a typical large nest built by a mouse from Alberta; D) a typical small nest built by a mouse from Arizona; E) brightness of the dorsal fur of 50 wild-caught mice as measured by a spectrometer. Brightness is measured as the total area under the average reflectance curve from 300–700 nm in arbitrary units.

We also characterized phenotypic variation among wild-caught animals. These animals likely varied in age, health, reproductive status, pathogen exposure, diet, and many other unknown variables. We did not observe a significant latitudinal cline for body weight (p = 0.573). Since we did see significant differences in body weight in the lab-reared mice, the lack of a significant cline among wild mice is likely due to the relatively small sample sizes per population as well as the influence of uncontrolled factors such as age, diet, and health. However, we did observe a significant cline for coat color in terms of brightness (Fig 3E; p = 0.001). Darker mice were observed in more northern latitudes where darker soils are associated with more humid environments and higher degrees of organic matter in the soil. In the western transect, this variation in coat color was readily discernable by eye. In contrast, no discernable variation in coat color was observed in the eastern transect, and therefore spectrophotometric data were not collected from eastern mice.

The genomic signature of adaptation in mice from western North America

To examine regions of the genome contributing to environmental adaptation across western North America, we performed population genomic scans for selection using the latent factor mixed model (LFMM) method [38]. With a q-value cut-off of 0.05 we identified 13,057 single nucleotide polymorphisms (SNPs) in 4,438 genes that were significantly associated with mean annual temperature (Fig 4A and S6 Table). Of those 13,057 SNPs, only 4% were non-synonymous, while 8.7% were synonymous and 87.3% were non-coding. These proportions are similar to those seen in the eastern transect [33] and are roughly similar to the fractions of variable sites in the dataset. Nonetheless, the very small number of non-synonymous sites showing signatures of selection suggests that selection is acting primarily on regulatory rather than on protein-coding changes. We narrowed the list of candidate loci under selection by using a more stringent q-value cut-off (q = 0.001) and by only including genes with at least two SNPs meeting this cut-off. These more stringent filters identified 311 SNPs in 95 genes (S7 Table). Among these top candidates are genes annotated to phenotypes that likely mediate responses to temperature, precipitation, and seasonality across the western transect such as osmoregulation in the gut epithelia (Vipr1), circadian rhythm (Per2), skeletal development and body size (Bmp7, Bmp5), kidney function (Pkhd1), metabolism and body weight (Mc3r), and heat sensing (Trpm2). All of the top candidate genes complete with functional information are listed in S7 Table.

Fig 4.

Manhattan plots depicting the results of A) a population genomic scan for selection using LFMM (blue line indicates q-value = 0.05 and the red line indicates q-value = 0.001); B) a genome-wide association study of body weight using 38 mice from the western transect (blue line indicates a q-value = 0.01 and the red line indicates a q-value = 0.0001).

Parallel evolution in eastern and western populations

To address the extent of parallel genomic evolution we evaluated the overlap between LFMM outlier loci in both transects at two levels of significance: q-value<0.05 and q-value<0.001. Genetic parallelism was evaluated at the level of the gene, not the SNP. To determine whether the genetic overlap was significantly greater than expected by chance we conducted permutation tests with 100,000 replicates with replacement. At the lower stringency (q-value < 0.05), we observed 434 loci with signatures of selection in both transects, and this is significantly more than expected by chance (expected number = 339; permutation test, p-value < 0.001; S4 Fig). At the higher stringency (q-value < 0.001), we observed 16 genes with signatures of selection in both transects, also a significantly greater number than expected by chance (expected number = 5; permutation test, p-value < 0.001; S4 Fig and S8 Table). Four of these 16 genes show signatures of selection at the same SNP in the eastern and western transects, and none of these are non-synonymous mutations (S9 Table). Fourteen of these 16 genes have known functions, and of these, five have functions related to body size or fat composition (Mc3r, Mtx3), metabolism (Galnt2, Zfp663), or other aspects of thermoregulation such as temperature sensing (Trpm2). The observation that 31% of these genes involve traits potentially relevant to thermal adaptation suggests that much of this parallel evolution may be driven by adaptation to similar thermal gradients in eastern and western North America (Fig 1A).
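The logic of testing whether two outlier gene lists overlap more than expected by chance can be sketched with a simple resampling null. The Python below is an illustration under stated assumptions, not the authors' procedure: the function name and arguments are hypothetical, each replicate draws two random gene sets of fixed size from a common universe of numbered genes, and details of the published test (for example, how resampling with replacement was done) may differ.

```python
import random

def overlap_permutation_test(n_genes, n_east, n_west, observed, n_perm=10_000, seed=0):
    """Return (mean null overlap, one-sided p-value) for an observed overlap.

    n_genes: size of the shared gene universe (hypothetical parameter)
    n_east, n_west: number of outlier genes in each transect
    observed: number of genes that are outliers in both transects
    """
    rng = random.Random(seed)
    universe = range(n_genes)
    at_least_observed = 0
    total_overlap = 0
    for _ in range(n_perm):
        east = set(rng.sample(universe, n_east))   # random "eastern" outlier set
        west = rng.sample(universe, n_west)        # random "western" outlier set
        k = sum(1 for gene in west if gene in east)
        total_overlap += k
        if k >= observed:
            at_least_observed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_observed + 1) / (n_perm + 1)
    return total_overlap / n_perm, p_value
```

Under this null the expected overlap is n_east × n_west / n_genes, which gives a quick sanity check on the simulated mean. The gene-universe size and the eastern outlier count are not stated in this section, so real values would have to come from the supplementary tables.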

Two genes with similar patterns in the eastern and western transects are noteworthy for their large changes in allele frequency and known relationship to traits that are likely adaptive along latitudinal gradients. Melanocortin receptor 3 (Mc3r) showed differences in allele frequency of 80% in both transects (Fig 5A and 5B) and is known to be involved in feeding, metabolism, and body weight [39]. Mc3r knock-out mice have significantly greater fat mass and lower lean mass than wild type mice [39]. Thermo-TRP ion channel 2 (Trpm2) showed shifts in allele frequency of 70% in both transects and encodes a neuronal axon ion channel involved in the sensation of non-noxious heat. Expression of Trpm2 in response to warm temperatures causes mice to behaviorally seek cooler temperatures [40]. Genetic variation in Trpm2 may underlie adaptive variation in behavioral thermoregulation in response to increasing temperatures in southern populations.

Fig 5. Examples of concordant and discordant clinal patterns between the eastern and western transects.

Allele frequency changes at SNPs in Mc3r are similar in the western (A) and eastern (B) transects, while allele frequency changes at SNPs in Pkhd1 are different in the western (C) and eastern (D) transects. Mc3r is believed to be involved in body size variation, while Pkhd1 is involved in kidney function.

Divergent evolution in eastern and western populations

In addition to the greater than expected patterns of parallel genetic change, we observed that most loci show signatures of selection in only one transect. For example, a locus associated with kidney function, Pkhd1, showed strong patterns of clinal variation in the western transect where there is a significant cline in mean annual precipitation (Fig 5C), but Pkhd1 did not show clinal patterns of variation in the eastern transect where there is little variation in precipitation (Fig 5D). In fact, of the top 10 candidate loci under selection (i.e. LFMM q < 0.05) in the western transect with known kidney-related functions, nine show weak or no signatures of selection in the eastern transect. Similarly, one locus showing signatures of selection in the western transect (S6 Table) that is potentially involved in coat color variation, Adam12, did not show signatures of selection in the eastern transect (S8 Table). This is consistent with patterns of environmental variation, with pronounced clines for soil color in the western transect but not in the eastern transect.

Genome-wide association study of body weight

To link patterns of genetic variation with known phenotypic differences in body weight, we conducted a genome-wide association study (GWAS) using GEMMA [41]. Using the exome data generated here, we tested for an association between body weight and genotype at each SNP among the mice from the western transect. We used a linear mixed model controlling for genetic relatedness and sex as covariates and a false discovery rate of 5% to control for multiple testing. We also analyzed the data from [33] in the same way to look for associations between body weight and SNPs among mice from the eastern transect; however, after correcting for multiple testing, there were no significant SNPs associated with body weight in eastern North America.
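Controlling the false discovery rate across many per-SNP tests can be illustrated with the Benjamini-Hochberg step-up procedure. This is a stand-in chosen for illustration: the section reports q-values but does not say which FDR method was used, and the function name and inputs below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking tests declared significant at the given FDR."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= fdr * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```

In a GWAS setting, `p_values` would hold one association p-value per SNP, and the SNPs flagged `True` would be those reported as significant at a 5% false discovery rate.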

We found that eight SNPs in five genes were significantly associated with variation in mouse body weight in the western transect (q-value < 0.05; Fig 4B and S10 Table). All of these loci except Lrrfip2 show signatures of selection in both western and eastern North America (LFMM q-value < 0.05). The average difference in allele frequency between the southernmost population and the northernmost population for these eight SNPs was 0.33, consistent with the idea that polygenic adaptation may be driven by modest changes in allele frequency at many genes [42]. Collectively, allelic variation at these five genes accounts for 1.82% of the phenotypic variance in mouse body weight in the western transect. Of these five genes, Cep85, Cdh8, and Epm2aip1 have established links to body mass or metabolism in lab mice or humans (S10 Table). Centrosomal protein 85 (Cep85) contains SNPs with a strong signature of selection (LFMM q-value = 0.006), and Cep85 expression is associated with variation in female body mass index in the eastern transect [43]. Cadherin 8 (Cdh8) also contains SNPs with a strong signature of selection (LFMM q-value < 0.001) and has been linked to obesity and metabolic traits through QTL mapping and differential expression analysis in mice [44]. Epm2aip1 (LFMM q-value = 0.002) is involved in glycogen metabolism [45]; inactivation of this gene in laboratory mice causes hepatic insulin resistance and resistance to age-related obesity [46]. Variation at each of these genes explains less than one percent of the variance in mouse body weight along the western transect (Cep85 PVE = 0.46%; Cdh8 PVE = 0.47%; Epm2aip1 PVE = 0.45%). The functional information linking these three genes to body weight or metabolism, combined with their signatures of selection in both transects, makes them strong candidates for adaptive variation in mouse body weight.

Discussion

We collected house mice along a latitudinal transect in western North America and compared these to house mice sampled along a similar thermal gradient in eastern North America [33]. First, we found that mice within each transect were more closely related to each other than they were to mice in the other transect. Nonetheless, lab-born progeny of mice sampled from the ends of both transects showed parallel differences in body weight and nest building behavior, suggesting that these adaptations have evolved independently in each transect. Second, genome scans identified candidate genes for environmental adaptation in the western transect and revealed that the overlap among candidate genes for each transect was more than expected by chance, indicating parallel evolution at the level of the genetic locus. Nevertheless, each transect also contained a number of unique candidates which may be driven by divergent environmental features in eastern and western North America. Finally, a small subset of genes showed GWAS hits for body weight in the western transect and signatures of selection in both transects. These genes are attractive candidates for linking genotype to phenotype for an adaptive quantitative trait. Below we discuss each of these issues in turn.

Colonization history and rapid phenotypic evolution in North America

House mice have been spread around the world in association with humans [47] and likely colonized the Americas during the last few hundred years. The earliest museum records of Mus in the Americas date to the early 1800s, but it is likely that house mouse populations were established before that time. In the wild, mice breed seasonally and may undergo ~2 generations per year. Thus, mice have likely been in the Americas for 400–500 generations or more. In this short evolutionary timeframe, mice have adapted to a wide range of environmental conditions.

Since house mice are an important biomedical model system, it is worth noting a number of parallels between humans and house mice in the context of adaptation in the Americas. First, the timeframe for house mice in the Americas is similar to the timeframe for humans when measured in generations. Humans colonized the Americas ~15,000 years ago [48]. With a generation time of 28 years [49], this corresponds to ~500 generations. Second, house mice are commensal and live in close association with humans. House mice are frequently found in outdoor structures such as barns, sheds, and grain storage locations where they are exposed to similar environmental pressures as native rodents (e.g. ambient temperatures, pathogens, predators). Finally, like mice, humans show evidence of adaptive differences among populations from different environments in the Americas [50,51].
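The generation counts quoted for humans and mice follow from simple division and multiplication. The worked arithmetic below uses only the figures stated in the text, except for the 200 to 250 year span for mice, which is an assumption back-calculated here from "~2 generations per year" and "400–500 generations".

```latex
% Humans: years since colonization divided by generation time
\[
  \frac{15{,}000\ \text{years}}{28\ \text{years per generation}} \approx 536\ \text{generations} \quad (\text{roughly } 500)
\]
% House mice: generations per year times an assumed residence time
\[
  2\ \text{generations per year} \times (200 \text{ to } 250)\ \text{years} = 400 \text{ to } 500\ \text{generations}
\]
```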

Details of the colonization history of house mice in North America from their ancestral range in western Europe are mostly unknown, but several conclusions can be drawn from the available data. Levels of nucleotide diversity in North American populations are similar to those seen in Europe [52,53], suggesting that the colonization of North America was not associated with a very strong bottleneck. Patterns of genetic variation in eastern North America do not show isolation-by-distance, arguing against a single introduction from which mice dispersed [33]. The relationships of populations depicted in Fig 1 indicate that mice are grouped more by longitude than by latitude, consistent with repeated colonizations of similar latitudes in eastern and western North America. Finally, the presence of M. m. castaneus alleles in Tucson suggests a connection with southern California where the presence of M. m. castaneus alleles has previously been documented [54]. These alleles were not found in other populations in the western transect. A railroad line connects southern California with Tucson and may have provided a conduit for dispersal of mice.

We find that, despite their recent introduction, house mice in western North America are significantly differentiated in many ecologically important traits including body weight, nest building behavior, and coat color. There was significant clinal variation in dorsal coat color in the western transect, with southern mice having lighter fur than mice in the north (Fig 3E). Matching dorsal fur with background soil environments is an important anti-predator adaptation in small ground-dwelling mammals [55]. The latitudinal coat color pattern that we found matches the general transition of background soil cover from southern regions with lighter colored soils that are more sparsely vegetated to northern locations with darker soils that are more densely vegetated. Similar patterns of color variation were not evident in the eastern transect. This divergence in fur color variation between the transects is consistent with environmental differences since there is a great deal of variation in soil brightness and precipitation across the intermountain West (Fig 1), while in the East these environmental factors are fairly uniform [33].

The patterns of house mouse colonization discussed above provide a context for understanding the parallel differences in body size and nest-building behavior seen between northern and southern populations in both transects. Phenotypic measurements of fifth-generation lab-reared mice from the ends of the western transect demonstrated that there is a genetically determined difference in body weight, with significantly heavier mice in the north compared to the south (Fig 3A). These results parallel differences in body weight seen in the eastern transect [33] and are consistent with Bergmann’s rule [56]. Larger mice have lower surface area to volume ratios and therefore suffer less heat loss [57]. Therefore, heavier mice from Alberta and New York should be better able to thermoregulate during cold northern winters than mice from more southern latitudes. Bergmann’s rule has been described for both body mass and body size, although in mammals, associations with latitude are typically stronger for body mass [58]. In most cases, as in the present study, body size and body mass are strongly correlated (linear regression R² = 0.57, P < 0.0001, data in S1 Table).
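The surface-area argument can be made concrete with a deliberately crude model. Treating the animal as a sphere of radius r and uniform density ρ is an assumption made here for illustration only; it is not part of the study.

```latex
\[
  A = 4\pi r^{2}, \qquad V = \tfrac{4}{3}\pi r^{3}, \qquad \frac{A}{V} = \frac{3}{r}
\]
% With mass M = \rho V, the radius is r = \left(\frac{3M}{4\pi\rho}\right)^{1/3}, so
\[
  \frac{A}{V} = 3\left(\frac{4\pi\rho}{3M}\right)^{1/3} \propto M^{-1/3}
\]
```

Under this idealization, doubling body mass lowers the surface area to volume ratio by a factor of 2^{-1/3} ≈ 0.79, which is the direction of effect that Bergmann’s rule relies on.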

Nest building behavior also showed a genetically-determined difference between fifth-generation lab-reared mice from the ends of the western transect. Mice from Alberta built nests twice as large on average (12.49g) as mice from Tucson (5.58g, Fig 3B, 3C, and 3D). A similar pattern was observed in lab-reared mice from eastern North America, where New York mice built nests twice as large as mice from Florida (Fig 1C) [33]. This genetic difference in nest weight is likely adaptive since a larger nest will better insulate mice from cold winter temperatures in the north. The observation of parallel differences in body size and nest building among mice in the eastern and western transects, combined with the separate evolutionary history of the mice in each transect (Fig 1), suggests that these phenotypic differences have arisen independently, presumably as an adaptive response to novel thermal environments.

Parallel and unique genomic changes underlie clinal adaptation

Parallel phenotypic changes in similar environments provide an opportunity to study the repeatability of evolution at the genetic level. The degree of parallelism may depend on many factors such as the relatedness of the taxa being compared, the degree of similarity between environments, and the genetic architecture of the traits being studied. A common pattern is that populations within a species exhibit greater levels of genetic parallelism than between-species comparisons, presumably because of the large proportion of shared standing genetic variation [59,60].

Much of the work on parallelism has focused on targets of selection where single genes are important, such as cardenolide resistance in diverse milkweed-feeding insects due to mutations in ATPα [28,61] or oxygen-binding in high-altitude birds due to mutations in Hbb [62]. In both cases, adaptation seems to involve a combination of parallel and unique changes. In situations involving complex traits that are highly polygenic, we expect fewer parallel changes because the target on which selection acts is so much larger. Despite this expectation, we found significantly more overlap in loci under selection between transects than expected by chance, and this pattern was robust to different significance cutoffs. This degree of overlap most likely arose as a consequence of selection having acted on the same pool of standing genetic variation in each latitudinal cline, increasing the likelihood of a similar response.

There are a number of biological and statistical reasons for expecting both shared and unique changes in house mice from eastern and western North America. While these two transects share many environmental similarities, such as a similar range in mean annual temperature, they also exhibit many differences, including considerably more variation in elevation, substrate, vegetation, and precipitation in the west compared to the east. It is likely that these differences impose distinct selective pressures in each transect, and these would be expected to lead to different genetic responses. Statistical issues also confound the easy interpretation of the proportion of shared and unique changes. The LFMM outliers undoubtedly include some false positives. Conversely, by controlling for genetic relatedness LFMM may exclude some loci that are truly under spatially varying selection, but whose variation in allele frequency is closely mirrored by population structure [63]. Additionally, there are probably genes that have responded to selection through modest changes in allele frequency [42] and these may not be detected by LFMM. Identifying polygenic signatures of adaptation remains a complex and active area of research still without perfect solutions [64–68].

Despite these caveats, we identified 16 candidate genes showing parallel adaptation to variation in mean annual temperature in both transects at a stringent false discovery rate (LFMM q-value < 0.001). Many of these 16 genes are known from studies of laboratory mouse mutants to underlie traits that are important in thermal adaptation. For example, Mc3r controls variation in body weight and feeding behavior in laboratory mice [39,69] and Mc3r shows strong clinal signatures in both transects. Another gene of note, Trpm2, is a member of the transient receptor potential (TRP) family of thermally activated ion channels that is involved in neuronal sensing of non-noxious heat and behavioral thermoregulation in mice [40]. Activation of Trpm2 causes mice to seek out cooler temperatures. Thermo-TRP ion channels are involved in cold and heat sensing in both invertebrates and vertebrates and have functionally diversified across species [70]. Thermo-TRPs are therefore likely targets for spatially varying selection across thermal gradients. In fact, Trpm8, another member of this gene family, is involved in cold sensing and has been implicated in adaptation to cold in thirteen-lined ground squirrels and Syrian hamsters [71], humans [72], and woolly mammoths [73–75]. Our finding that Trpm2 is under selection in parallel latitudinal clines of house mice suggests that both cold- and heat-sensing receptors may be important in repeated adaptation to different temperature regimes.

Linking genotype to phenotype in an adaptive quantitative trait

Population genomic approaches are useful for detecting signatures of selection on traits that vary along ecological gradients. However, because the function of many genes remains unknown or described primarily through gene knockouts, the results of these scans can be difficult to interpret biologically. Naturally-occurring allelic variation within a gene may result in dramatically different phenotypic effects than those obtained from the inactivation of an entire locus through a knockout mutation. One way to more directly link loci under selection to phenotypic variation is to identify genome-wide associations between SNPs and adaptive traits of interest. By combining population genomic scans for selection and genome wide association studies we were able to identify a small set of genes that were significantly associated with body weight in the western transect, have established functional links to body mass in laboratory mice, and show signatures of parallel selection in the eastern and western transects: Cdh8, Epm2aip1, and Cep85. These genes account for a small portion of the variance in body weight, but they represent strong candidates for targets of selection related to a complex adaptive phenotype.

Many examples of the genetic basis of adaptation involve traits that are controlled by a few genes of major effect [27,28,30,76], yet most of adaptive evolution surely involves quantitative traits, where the contribution of individual genes is small. Under such situations, large changes in phenotype may be governed by modest changes in allele frequency at many loci [42]. This insight led to the development of several polygenic tests of selection based on GWAS hits for human height [64,66], although subsequent work revealed that population structure in GWAS can lead to spurious inferences [67,68]. Genome wide association studies have been performed on many traits in humans yet have rarely been applied to natural populations of other organisms. The GWAS reported in this paper was based on relatively small samples. Future studies aimed at large, well-powered GWAS on polygenic traits such as body size from a single population of wild mice would provide a means to implement more comprehensive polygenic tests of adaptation [77].

Conclusion

This work has demonstrated rapid and parallel environmental adaptation in quantitative traits at both the phenotypic and genomic levels in house mice across North America. We found that house mice in northern populations have adapted in parallel to cold environments by becoming larger and building bigger nests compared to mice in southern populations. This adaptation appears to be largely due to regulatory changes, as opposed to protein coding changes, since very few of the signatures of selection involved non-synonymous mutations. We discovered significant overlap in the loci under selection between transects, and many of the genes identified underlie traits likely to be important in thermal adaptation, such as body size and heat-sensing. Finally, we also discovered divergent patterns of selection between the two transects at both the phenotypic and genetic levels. Color variation tracked gradients in soil color and precipitation in the western transect, but no clines for fur color were observed in the eastern transect, where less variation in soil color and precipitation is seen. Similarly, genomic patterns of variation revealed stronger evidence of selection on genes involved in kidney function in the western transect compared to the eastern transect. Together our results show that a mixture of parallel and unique changes at both the phenotypic and genetic level may be expected when closely related populations adapt to parallel latitudinal gradients.

Materials and methods

Ethics statement

Animals were collected and sacrificed in accordance with protocols approved by the Institutional Animal Care and Use Committee (IACUC) of the University of Arizona and the University of California, Berkeley. All wild-caught animals were collected with permits issued from the states of Arizona, Utah, and Montana in the U.S. and the province of Alberta, Canada.

Sampling

Fifty wild Mus musculus were collected along a latitudinal transect in western North America from Arizona to Alberta using Sherman live traps. Ten mice were collected from each of the following locations: Tucson, AZ, St. George, UT, Provo, UT, Missoula, MT, and Edmonton, Alberta (Fig 1A and S1 Table). Each animal was caught at least 500 m from every other animal to avoid collecting relatives. Skins, skulls, and skeletons were prepared as museum specimens and deposited in the Museum of Vertebrate Zoology, University of California, Berkeley (accession numbers are given in S1 Table). Following euthanasia, fresh tissues were collected and stored in liquid nitrogen and then kept at -80°C until used for DNA extraction and sequencing.

In addition, live mice were collected from the ends of the transect, and descendants of these wild-caught mice were used to study traits in a common laboratory environment. Fourteen mice from Tucson, AZ, USA and 27 mice from Edmonton, Alberta, Canada were used to create new inbred lines. Within locations, individual collection sites were at least 500 m from each other. Lines were established from different sites, so that lines were unrelated. Animals were shipped to the University of California, Berkeley. Wild-caught animals were mated to create the first lab-reared (N1) generation. These mice were then inbred for five generations through brother-sister mating before being phenotyped for body mass and nest building as described below.

Phenotyping field collected and live mice

For field-collected mice we measured total length, tail length, hind limb length, ear length, body weight, and testis size. Weight and length were measured in the field by a single investigator (TAS) using a 30g micro-line spring scale (Schindellegi, Switzerland) and a ruler after euthanasia and before specimen preparation. Coat color was measured on museum specimens by assessing spectral reflectance of the mid-dorsal region using a USB2000 (Ocean Optics Inc., Dunedin, FL, USA) spectrophotometer with a dual deuterium and halogen light source. Coat brightness was assessed by calculating the total area under the average reflectance curve from 300-700nm [78]. Three measurements were taken with the probe perpendicular to the specimen and were averaged for analysis. Spectral reflectance wavelengths were recorded using SPECTRASUITE (Ocean Optics Inc., Dunedin, FL, USA).
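The brightness calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the trapezoidal rule is an assumption (the integration scheme is not stated in the text), and the wavelengths are assumed to be sorted in ascending order on a grid shared by all three readings.

```python
def mean_curve(curves):
    """Average several reflectance curves recorded on the same wavelength grid."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


def brightness(wavelengths_nm, reflectance, lo=300.0, hi=700.0):
    """Area under a reflectance curve between lo and hi (nm).

    Assumption: trapezoidal integration over the recorded points that fall
    inside [lo, hi]; no interpolation is done at the window boundaries.
    """
    points = [(w, r) for w, r in zip(wavelengths_nm, reflectance) if lo <= w <= hi]
    area = 0.0
    for (w0, r0), (w1, r1) in zip(points, points[1:]):
        area += 0.5 * (r0 + r1) * (w1 - w0)
    return area


# Toy example: three flat readings of one specimen (made-up values).
wavelengths = [300.0, 400.0, 500.0, 600.0, 700.0]
readings = [[10.0] * 5, [20.0] * 5, [30.0] * 5]
specimen_brightness = brightness(wavelengths, mean_curve(readings))
```

Averaging the three readings before integrating follows the order given in the text (area under the average reflectance curve); because integration is linear, integrating each reading and then averaging would give the same value.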

For phenotyping, laboratory mice were housed singly after weaning in static cages at 23°C with 10 hour dark and 14 hour light cycles. Mice were phenotyped for nest building behavior and body weight, traits that are known to vary clinally in mice from eastern North America [33,79]. Thirteen Tucson mice (representing four different inbred lines) and 11 Edmonton mice (representing six different inbred lines) were phenotyped. Males and females were sampled from each line except for one Tucson line for which there was only a single female available, and two Edmonton lines for which only females were available. Mice used in these assays were 154–327 days old. Nest building behavior was measured by placing 40g of cotton on top of each cage and then weighing the remaining unused cotton 24 hours later. The difference between the initial and final cotton weights was used as a measure of nest size [33]. Body weight was measured using a digital scale. To test whether nest weight was significantly different between lab reared mice from Arizona and Alberta we used a generalized linear model (GLM) implemented in R including all mice with population, age, body weight, and sex as factors. A separate generalized linear model (GLM) was run in R to test whether body mass was significantly different between lab reared mice from Arizona and Alberta, including all mice with population, age, and sex as factors.

Exome capture, sequencing and assembly

Genomic DNA was extracted from liver of wild-caught mice using the Gentra PureGene (Qiagen Inc., Valencia, California) DNA extraction kit following the manufacturer’s fresh tissue protocol and was quantified on a Qubit 2.0 Fluorometer (Life Technologies, Foster City, CA) using the Qubit dsDNA BR Assay Kit (Life Technologies, Foster City, CA). We sheared 1 μg of genomic DNA to less than 500 bp with a Bioruptor (Diagenode, Denville, NJ) by sonicating the DNA for five cycles of 30 seconds on and 30 seconds off, briefly centrifuging the tubes, and then sonicating for five more cycles. Barcoded Illumina sequencing libraries were prepared using the Meyer and Kircher protocol [80]. Libraries were amplified with Phusion High-Fidelity DNA Polymerase (Thermo Scientific) for 6–8 cycles during the indexing polymerase chain reaction (PCR). Each individual sample was amplified twice in parallel and then merged to decrease PCR stochastic drift. Individually barcoded libraries were multiplexed in groups of 10 in equimolar amounts with each pool containing 1.25 μg of total DNA. Exome enrichment was conducted with five captures from the SeqCap EZ Developer Library: Mouse Exome Kit (Nimblegen, Madison, WI) with slight modification to the manufacturer’s protocols. DNA multiplex sample pools were combined with blocking oligonucleotides and mouse COT-1 and EZ Library and incubated on a thermocycler for 72 hours at 47°C. Following hybridization, each enriched pool was split into three PCR reactions and amplified three times in parallel for 11–14 cycles and then merged. We used qPCR to determine the postcapture-enrichment efficiency. All five enriched-pooled libraries were combined and sequenced on three lanes of an Illumina HiSeq3000 at the UC Davis Genome Center (150-bp paired end). Targeted areas include ~54.3 Mb of nuclear coding and UTR sequence.

For cleaning raw sequence data we followed the general protocol outlined by [81] and [82] with some modifications. Briefly, raw fastq reads were filtered using Skewer [83] and Trimmomatic [84] to trim adapter contamination and low-quality reads. Exact PCR and/or optical duplicate reads were removed using Super-Deduper (https://github.com/dstreett/Super-Deduper). We used Bowtie2 [85] to align the resulting reads against the Escherichia coli genome to remove any potential bacterial contamination in the data. Overlapping paired reads were merged using Flash [86]. After cleaning, paired-end reads and merged single-end reads from each individual library were then aligned to the Mus musculus reference genome (GRCm38.p3) using Novoalign (http://www.novocraft.com/products/novoalign/) and we only kept reads that mapped uniquely to the reference. We used Picard (http://broadinstitute.github.io/picard/) to add read groups and GATK v3.7 [87] to perform re-alignment around indels. We then used SAMTools/bcftools [88] to generate a VCF file that contained all sites. Each site was sequenced to an average read depth of ~31X (S3 Table). The data in the VCF file were then filtered using a custom filtering program, SNPcleaner (https://github.com/tplinderoth/ngsQC) by following the protocol specified in [82]. We masked sites within 10 bp upstream and downstream of indels. We also only kept sites where at least 70% of the samples had at least 3x coverage. For most analyses, the dataset was not pruned for SNPs in close linkage. LD decays over relatively short distances in mice and rarely extends between genes [89]. Since the data are from exomes rather than whole genomes, the number of SNPs in tight association is modest. After these filters, 635K sites were used in downstream analyses.

SNP calling

SNP and genotype calling based on fixed coverage cutoffs can result in potential bias or introduce noise in downstream population genetic analyses [90]. To take into account the statistical uncertainties around SNP/genotype calling, we called SNPs and estimated allele frequencies using an empirical Bayesian framework implemented in ANGSD with a posterior probability of 0.95 and the p-value of the likelihood ratio test of a SNP being variable to be 1e-6 [91]. For each population we only kept variants where at least 80% of the samples had data after filtering. We also eliminated sites where the minor allele frequency was less than 5%, resulting in 342,106 SNPs total.
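The two site filters described above (at least 80% of samples with data in each population, and a minor allele frequency of at least 5%) can be illustrated with a short sketch. It is a simplification under stated assumptions: it works on hard genotype calls (0, 1, 2 alternate-allele counts, `None` for missing), whereas the analysis in the text used genotype likelihoods in ANGSD; pooling all populations to compute the minor allele frequency is also an assumption, and the function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes_by_population, min_call_rate=0.8, min_maf=0.05):
    """Apply the per-population call-rate filter, then the MAF filter.

    Assumption: MAF is computed across all populations pooled together.
    """
    for genotypes in genotypes_by_population.values():
        n_called = sum(g is not None for g in genotypes)
        if n_called / len(genotypes) < min_call_rate:
            return False
    pooled = [g for gs in genotypes_by_population.values() for g in gs]
    maf = minor_allele_frequency(pooled)
    return maf is not None and maf >= min_maf


# Toy site with two hypothetical populations of ten mice each.
site = {
    "popA": [0, 1, 2, None, 0, 0, 0, 0, 0, 0],
    "popB": [0] * 10,
}
keep_site = passes_filters(site)
```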

House mice consist of three main subspecies (M. m. domesticus in Western Europe, M. m. musculus in Eastern Europe and northern Asia, and M. m. castaneus in southeast Asia). M. m. domesticus is the subspecies that is believed to have colonized most of North and South America, although there are previous reports of introgression from M. m. castaneus in California [36,54]. We tested for the presence of admixture with M. m. castaneus in each of the five populations by acquiring whole genome sequence data for M. m. castaneus [53]. We downloaded fastq reads of 10 M. m. castaneus and 10 M. m. domesticus specimens (see S2 Table for SRA IDs). The raw fastq reads were cleaned, aligned and re-aligned to the same Mus musculus reference genome using the methods described above. Admixture with M. m. castaneus was investigated using “NGSAdmix” [92] implemented in ANGSD which handles genotype likelihoods in a Maximum Likelihood framework. We ran the analyses considering two to five ancestral populations (K) and 5% as the minimum minor allele frequency (minMaf) cutoff. For each value of K, we performed 10 replicates and plotted the results.

We also used genetic PCA to summarize variation within and among populations and calculated pairwise Fst. Both Fst calculations and genetic principal component analyses were implemented via the ngsTools software package [93].

Phylogenetic analyses

To investigate the evolutionary relationships among individuals and populations, we included mice from Europe (n = 5) [53], eastern North America (n = 50) [33] and western North America (n = 50; this paper). We pruned the autosomal biallelic SNPs for linkage disequilibrium with plink v1.90 [94] using non-overlapping 50 Kb windows and an r² threshold of 0.5 (--indep-pairwise 50 50 0.5). Using PAUP* v4.0 [95], we estimated a neighbor joining tree and assessed node support by bootstrapping (100 repetitions). We also estimated a population tree in PAUP* using quartet assembly with random sampling of 50,000 quartets with SVDQuartets v1.0 [96] and bootstrapping to evaluate node support (100 repetitions). We defined each North American Mus population as an independent lineage in the tree and European M. m. domesticus as the outgroup.

Environmental association

We used the Latent Factor Mixed Model (LFMM) program to perform a population genomic scan for selection and identify candidate genes underlying environmental adaptation [38]. LFMM uses a hierarchical Bayesian mixed model based on PCA residuals to account for population genetic structure while testing for significant associations between variation in allele frequency and environmental variables. We note that LFMM generally outperforms GEMMA (see below) at identifying loci under environmental selection by having a much lower false negative rate and a similar false positive rate [38]. We ran LFMM 10 times with K = 2. We chose K = 2 as the number of appropriate latent factors to use in the LFMM model because it gave the best estimate of the genomic inflation factor (λ). P-values were adjusted to control for the false discovery rate (FDR). The distribution of p-values was examined and λ was modified to obtain a flatter distribution with a peak near zero (λ = 0.9).
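The calibration and multiple-testing steps described above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it takes one z-score per SNP as input (how the 10 runs were combined is not shown here), it rescales the squared z-scores by λ and reads p-values from a chi-square distribution with one degree of freedom, and it uses the Benjamini-Hochberg procedure for the FDR adjustment, which the text does not name explicitly. Function names are hypothetical.

```python
import math


def calibrated_pvalues(z_scores, lam):
    """p-values from z-scores after dividing z^2 by the inflation factor lam.

    For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    return [math.erfc(math.sqrt((z * z / lam) / 2.0)) for z in z_scores]


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy z-scores (made-up values), calibrated with lambda = 0.9 as in the text.
toy_p = calibrated_pvalues([0.3, 2.5, 4.1, 1.0], lam=0.9)
toy_q = bh_qvalues(toy_p)
outliers = [i for i, q in enumerate(toy_q) if q < 0.05]
```

A value of λ below 1 makes the test statistics larger and the p-values smaller than the uncalibrated ones, which is the direction of the adjustment reported in the text.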

We acquired mean annual temperature for each of the five sampling localities using BIOCLIM [97]. We chose mean annual temperature (MAT) as the environmental variable in our LFMM analysis because MAT was the variable most closely associated with latitude and most similar between the eastern and western transects. Outlier SNPs were identified using a false discovery rate of 5% (q-value < 0.05). To identify a narrower set of candidate genes under selection we also used a more stringent cut-off of 0.1% (q-value < 0.001) and required loci to contain at least two SNPs below this cut-off. Sex chromosomes were excluded from the analyses. Outlier loci were annotated here and below using the GRCm38.75 version of the M. m. domesticus genome and phenotype data from Mouse Genome Informatics (MGI) (www.informatics.jax.org). MGI compiles all mouse phenotype data, the majority of which derive from studies on classical inbred strains of mice.
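The stringent gene-level criterion above (a q-value below 0.001 at two or more SNPs within the same gene) amounts to a simple count per gene. The sketch below assumes each SNP has already been annotated with a gene; the function name and the toy records are hypothetical and do not come from the study's data.

```python
from collections import Counter


def stringent_genes(snp_records, q_cutoff=0.001, min_snps=2):
    """Return genes with at least min_snps SNPs whose q-value is below q_cutoff.

    snp_records: iterable of (gene_name, q_value) pairs, one per SNP.
    """
    hits = Counter(gene for gene, q in snp_records if q < q_cutoff)
    return {gene for gene, count in hits.items() if count >= min_snps}


# Toy annotation table with placeholder gene names and made-up q-values.
records = [
    ("geneA", 0.0001),
    ("geneA", 0.0005),
    ("geneB", 0.0002),  # only one SNP below the cut-off, so geneB is dropped
    ("geneB", 0.0100),
    ("geneC", 0.2000),
]
candidates = stringent_genes(records)
```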

In natural populations of house mice, including the Tucson population studied here, linkage disequilibrium (LD) typically extends only short distances [89]. Nonetheless, outlier SNPs in this analysis may be in LD with other nearby SNPs, including some that have not been surveyed such as intronic SNPs not captured by the exome probes. Thus, the outlier SNPs may be targets of selection themselves or may be in LD with SNPs that are targets of selection. Since LD does not typically extend over multiple genes, we have annotated the genes containing outlier SNPs, although it is possible that in some cases, the target of selection is a nearby gene. The same reasoning applies to the association study (below).

Genome wide association study

To identify genes underlying body weight we performed a genome wide association study (GWAS) using a linear mixed model approach with the program GEMMA [41]. Linear mixed model approaches have been used to successfully control for relatedness among samples and population stratification [98–101]. Input files for mapping body weight were created with the program PLINK. A total of 339,130 SNPs were used in the final mapping analysis [102]. Sex chromosomes were excluded from the analyses. We excluded pregnant females, juveniles, and individuals whose reproductive status was uncertain from this analysis. The resulting body weight input file contained SNP genotypes and phenotypes for 38 mice that had both phenotype and genotype data from the western transect. A centered kinship matrix was created in GEMMA and the linear mixed model was run with sex and kinship as covariates. The linear mixed model used in GEMMA accounts for population structure in a GWAS by calculating kinship among sampled individuals and this should reduce false positives even with our small sample size [98].
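For readers unfamiliar with the centered kinship matrix, the following sketch shows the usual definition: each SNP is centered on its mean genotype across individuals, and kinship is the average cross-product over SNPs. It is a pure-Python illustration with a hypothetical function name; it assumes complete genotype data and is not a substitute for GEMMA, which also handles missing genotypes.

```python
def centered_kinship(genotypes):
    """Centered relatedness matrix K = (1/p) * Xc * Xc^T.

    genotypes: list of n individuals, each a list of p allele counts (0/1/2).
    Assumption: no missing genotypes.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Mean genotype of each SNP across individuals.
    means = [sum(ind[j] for ind in genotypes) / n for j in range(p)]
    centered = [[ind[j] - means[j] for j in range(p)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / p for row_j in centered]
        for row_i in centered
    ]


# Toy example: two individuals typed at two SNPs (made-up genotypes).
K = centered_kinship([[0, 2], [2, 0]])
```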

GEMMA fits a linear mixed model in the following form:

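$$y = W\alpha + x\beta + u + \epsilon; \quad u \sim \mathrm{MVN}_n(0, \lambda\tau^{-1}K), \quad \epsilon \sim \mathrm{MVN}_n(0, \tau^{-1}I_n)$$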

where y represents an n-vector of quantitative traits for n individuals, W is an n × c matrix of covariates, α is a c-vector of the corresponding coefficients including the intercept, x is an n-vector of genotypes, β is the effect size, u is a vector of random effects, ϵ represents a vector of errors, $\tau^{-1}$ is the variance of residual errors, λ is the ratio between the two variance components, K is the n × n relatedness matrix, $I_n$ is an n × n identity matrix, and MVN is the multivariate normal distribution. In this case, y is a vector of body weight for n individuals, x is the n × 1 vector of genotypes, u is an n × 1 vector to control for relatedness and population structure, and ϵ represents residual errors as an n × 1 vector. We used a genome-wide 5% false discovery rate (FDR) to correct for multiple testing. To calculate the percent of the phenotypic variance explained (PVE) by each significantly associated SNP with our GEMMA output we used the following equation from [103]:

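$$\mathrm{PVE} = \frac{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF})}{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF}) + (\mathrm{se}(\beta))^{2}\,2N\,\mathrm{MAF}(1-\mathrm{MAF})}$$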

where β is the effect size of a single SNP calculated using the mixed effects model in GEMMA, se(β) represents the standard error of the effect size, MAF is the minor allele frequency of the focal SNP, and N is equal to the sample size. To determine the effect size of an individual gene we calculated PVE for the most highly associated SNP within that locus.
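As a worked illustration of the PVE calculation, the sketch below computes the ratio 2β²·MAF(1−MAF) / [2β²·MAF(1−MAF) + se(β)²·2N·MAF(1−MAF)] from the four quantities defined above. The function name and the input values are hypothetical; only the sample size of 38 is taken from the text.

```python
def pve(beta, se_beta, maf, n):
    """Proportion of phenotypic variance explained by one SNP.

    beta: estimated effect size; se_beta: its standard error;
    maf: minor allele frequency; n: sample size.
    """
    genetic = 2.0 * beta ** 2 * maf * (1.0 - maf)
    residual = se_beta ** 2 * 2.0 * n * maf * (1.0 - maf)
    return genetic / (genetic + residual)


# Made-up effect size and standard error, with N = 38 as in the western GWAS.
example_pve = pve(beta=1.0, se_beta=0.5, maf=0.5, n=38)
```

Because the factor 2·MAF(1−MAF) appears in every term, the expression reduces algebraically to β² / (β² + N·se(β)²), so for a polymorphic SNP the value depends on the allele frequency only through β and se(β).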

We also analyzed the data from [33] in the same way to look for associations between body weight and SNPs among mice from the eastern transect. We performed GWAS on the mice from each transect separately, rather than in a combined dataset, because the combined dataset has significant population structure due to the deep divergence between transects (Figs 1B and S2). Additionally, our motivation for performing a GWAS in each transect was to ask whether the genetic basis of body weight overlapped between the transects.

Statistical evaluation of parallel genetic evolution

To detect parallel evolution between the east and west transects we compared the loci under selection in each transect. These analyses were performed at the level of the gene, not the SNP. A gene was considered to be an outlier if it contained at least one SNP under selection. To test whether the observed overlap was greater than expected by chance we performed a permutation test using the sample function in R without replacement with 100,000 permutations. We began with candidate genes identified by LFMM using a false discovery rate of 5% (q-value < 0.05, Z-score > 2.5). We then identified the number of overlapping outlier genes between the transects. Specifically, for each permutation we randomly sampled 7407 genes, representing the western transect outliers, from the total number of genes in the M. m. domesticus genome (24336). We then randomly sampled 1859 genes, representing the eastern transect outliers, from the total number of genes in the genome. We then tabulated the number of genes at the intersection of these two samples. We repeated these analyses using the outliers identified with a more stringent q-value cut-off of 0.001 in each transect (Z-score > 3.1). Ninety-five percent confidence intervals for the permutation p-values were calculated using equations 2 & 3 in [104].
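The permutation scheme described above can be sketched in a few lines. The original analysis used the sample function in R; the version below is a Python analogue with hypothetical function names, a fixed random seed for reproducibility, and far fewer permutations than the 100,000 used in the study. The one-sided p-value is taken here as the plain proportion of permutations with an overlap at least as large as the observed one, which is an assumption about the exact definition.

```python
import random


def permuted_overlaps(n_genes, n_west, n_east, n_perm, seed=0):
    """Overlap counts between two random gene sets drawn without replacement."""
    rng = random.Random(seed)
    genes = range(n_genes)
    counts = []
    for _ in range(n_perm):
        west = set(rng.sample(genes, n_west))
        east = rng.sample(genes, n_east)
        counts.append(sum(g in west for g in east))
    return counts


def permutation_p(observed, counts):
    """Proportion of permutations with overlap >= observed (one-sided)."""
    return sum(c >= observed for c in counts) / len(counts)


# Gene-set sizes from the text; 200 permutations keeps the sketch fast.
null_counts = permuted_overlaps(24336, 7407, 1859, n_perm=200, seed=1)
null_mean = sum(null_counts) / len(null_counts)
```

With these set sizes the overlap expected by chance is 7407 × 1859 / 24336, or about 566 genes, so the simulated mean should sit close to that value and an observed overlap can be judged against the spread of `null_counts`.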

Supporting information

S1 Table. Individuals sampled, including specimen catalog numbers, exact collecting localities, reproductive data, and body measurements of 50 wild-caught Mus musculus from western North America.

(XLSX)

S2 Table. Sample information for Mus musculus domesticus and M. m. castaneus from Harr et al. 2016 [53].

(XLSX)

S3 Table. Exon capture sequencing coverage statistics of data for each mouse in the western transect on an Illumina HiSeq 4000.

(XLSX)

S4 Table. Average nucleotide diversity (π) and average proportion of segregating sites (θ) for each of the western transect populations: Tucson, AZ, USA (TUC); St. George, UT, USA (STG); Provo, UT, USA (PRO); Missoula, MT, USA (MIS); Edmonton, Alberta, Canada (EDM).

(XLSX)

S5 Table. Pairwise Fst calculated using exome capture data among the western transect populations: Tucson, AZ, USA (TUC); St. George, UT, USA (STG); Provo, UT, USA (PRO); Missoula, MT, USA (MIS); Edmonton, Alberta, Canada (EDM).

(XLSX)

S6 Table. List of 13,057 SNPs in 4,438 genes significantly associated with variation in MAT across western North America using LFMM at the q-value < 0.05 level.

The Ensembl ID, chromosome, bp position, base pair change (SNP), q-value, and allele frequency in TUC, STG, PRO, MIS, and EDM populations (from left to right) are listed for each SNP along with the start and end positions, gene name, and gene description.

(XLS)

S7 Table. List of 311 SNPs in 95 genes significantly associated with variation in MAT across western North America using LFMM at the q-value < 0.001 level with at least two SNPs meeting this threshold.

The Ensembl ID, chromosome, bp position, base pair change (SNP), q-value, and allele frequency in TUC, STG, PRO, MIS, and EDM populations (from left to right) are listed for each SNP along with the start and end positions, gene name, and gene description.

(XLS)

S8 Table. A list of the top 16 genes under selection in both transects (q-values < 0.001) with their Ensembl ID, abbreviated gene name, full gene name, and a description of their function.

Genes involved in metabolic processes have been highlighted in green, while genes involved specifically in thermoregulation are highlighted in blue.

(XLSX)

S9 Table. Parallel SNPs at four of the 16 genes under selection in both transects at the q-value < 0.001 significance cut-off.

(XLSX)

S10 Table. Top eight SNPs in five genes (FDR q-value < 0.05) associated with body weight variation in the western transect from a GWAS conducted using GEMMA.

Chromosome (chr), bp position (ps), alleles, minor allele frequency (MAF), SNP effect size (beta), effect size standard error (beta se), percent of the phenotypic variance explained by each SNP calculated according to Shim et al. 2015 (SNP PVE), likelihood ratio p-value (p_lrt), gene Ensembl ID, gene name, the lowest LFMM q-value for SNPs in that gene, and a description of gene function are listed for each significant SNP in the GWAS.

(XLSX)

S1 Fig. Plots of the bioclim variables mean annual temperature, temperature seasonality, isothermality, and mean annual precipitation against latitude for each house mouse population in the western transect.

(TIF)

S2 Fig. An estimated population tree using SVDQuartets.

Bootstrap support out of a total of 100 repetitions is represented on each node.

(TIF)

S3 Fig. Genetic principal components analysis (PCA) of all 100 house mice from 10 populations across Eastern and Western North America.

Circles represent populations from the Western transect: AZ (cyan), St. George, UT (black), Provo, UT (red), MT (green), AB (blue). Triangles represent populations from the Eastern transect: FL (cyan), GA (black), VA (red), PA (green), VT/NH (blue). PC1 explains 14% and PC2 5% of the genetic variance.

(TIF)

S4 Fig. Permuted distributions of the number of overlapping genes expected by chance in the eastern and western transects for genes showing LFMM q < 0.05 and genes showing LFMM q < 0.001.

Red lines indicate the observed number in each analysis.

(TIFF)

Acknowledgments

We thank Dana Lin and Felipe Martins for their assistance with western specimen collection in the field. We thank Kennedy Agwamba and Carrie Olson-Manning for help with bioinformatic and phylogenetic analyses, Gaby Heyer and Emily Tze for mouse husbandry assistance, Madeleine Rossanese for her work quantifying coat color, and Lydia Smith for assistance with molecular work.

Data Availability

Sequence data are available through the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra) under BioProject ID: PRJNA718321. Scripts, input files for analyses, and phenotype data are available through Dryad: https://doi.org/10.5061/dryad.sxksn0324.

Funding Statement

ASC was supported by an NSF Postdoctoral Fellowship (PRFB-1402539). This work was supported by NIH grants to MWN (R01 GM074245 and R01 GM127468). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Huxley JS. Clines: An auxiliary method in taxonomy. Bijdr. Dierk. 1939;27:491–520.
  • 2. Endler JA. Geographic Variation, Speciation, and Clines. 10th ed. Princeton, New Jersey: Princeton University Press; 1977.
  • 3. Singh RS, Hickey DA, David J. Genetic differentiation between geographically distant populations of Drosophila melanogaster. Genetics. 1982;101(2):235–256.
  • 4. Berry A, Kreitman M. Molecular analysis of an allozyme cline: alcohol dehydrogenase in Drosophila melanogaster on the east coast of North America. Genetics. 1993;134(3):869–893.
  • 5. Schmidt PS, Zhu CT, Das J, Batavia M, Yang L, Eanes WF. An amino acid polymorphism in the couch potato gene forms the basis for climatic adaptation in Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(42):16207–11. doi: 10.1073/pnas.0805485105
  • 6. Schmidt PS, Paaby AB. Reproductive diapause and life-history clines in North American populations of Drosophila melanogaster. Evolution. 2008;62(5):1204–1215. doi: 10.1111/j.1558-5646.2008.00351.x
  • 7. Coyne JA, Beecham E. Heritability of two morphological characters within and among natural populations of Drosophila melanogaster. Genetics. 1987;117(4):727–737.
  • 8. Turner TL, Levine MT, Eckert ML, Begun DJ. Genomic analysis of adaptive differentiation in Drosophila melanogaster. Genetics. 2008;179(1):455–473. doi: 10.1534/genetics.107.083659
  • 9. Reinhardt JA, Kolaczkowski B, Jones CD, Begun DJ, Kern AD. Parallel geographic variation in Drosophila melanogaster. Genetics. 2014;197(1):361–373. doi: 10.1534/genetics.114.161463
  • 10. Machado HE, Bergland AO, O’Brien KR, Behrman EL, Schmidt PS, Petrov DA. Comparative population genomics of latitudinal variation in Drosophila simulans and Drosophila melanogaster. Molecular Ecology. 2016;25(3):723–740. doi: 10.1111/mec.13446
  • 11. Sedghifar A, Saelao P, Begun D. Genomic patterns of geographic differentiation in Drosophila simulans. Genetics. 2016;202(3):1229–1240. doi: 10.1534/genetics.115.185496
  • 12. Hancock AM, Di Rienzo A. Detecting the Genetic Signature of Natural Selection in Human Populations: Models, Methods, and Data. Annu Rev Anthropol. 2008;37:197–217. doi: 10.1146/annurev.anthro.37.081407.085141
  • 13. Fournier-Level A, Korte A, Cooper MD, Nordborg M, Schmitt J, Wilczek AM. A map of local adaptation in Arabidopsis thaliana. Science. 2011;334(6052):86–89. doi: 10.1126/science.1209271
  • 14. Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, Gebremedhin A, Sukernik R, et al. Adaptations to climate-mediated selective pressures in humans. PLoS Genetics. 2011;7(4):e1001375. doi: 10.1371/journal.pgen.1001375
  • 15. Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 2012;484(7392):55–61. doi: 10.1038/nature10944
  • 16. Gould BA, Stinchcombe JR. Population genomic scans suggest novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana. Molecular Ecology. 2017;26(1):92–106. doi: 10.1111/mec.13643
  • 17. Bilinski P, Albert PS, Berg JJ, Birchler JA, Grote MN, Lorant A, et al. Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays. PLoS Genetics. 2018;14(5):e1007162. doi: 10.1371/journal.pgen.1007162
  • 18. Wang GD, Zhang BL, Zhou WW, Li YX, Jin JQ, Shao Y, et al. Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri. Proceedings of the National Academy of Sciences of the United States of America. 2018;115(22):E5056–65. doi: 10.1073/pnas.1716257115
  • 19. Zhang M, Suren H, Holliday JA. Phenotypic and genomic local adaptation across latitude and altitude in Populus trichocarpa. Genome Biology and Evolution. 2019;11(8):2256–2272. doi: 10.1093/gbe/evz151
  • 20. Ellis NA, Glazer AM, Donde NN, Cleves PA, Agoglia RM, Miller CT. Distinct developmental genetic mechanisms underlie convergently evolved tooth gain in sticklebacks. Development. 2015;142(14):2442–2451. doi: 10.1242/dev.124248
  • 21. Siddiq MA, Loehlin DW, Montooth KL, Thornton JW. Experimental test and refutation of a classic case of molecular adaptation in Drosophila melanogaster. Nature Ecology and Evolution. 2017;1(2):1–6. doi: 10.1038/s41559-016-0001
  • 22. Watt WB, Carter PA, Blower SM. Adaptation at specific loci. IV. Differential mating success among glycolytic allozyme genotypes of Colias butterflies. Genetics. 1985;109(1):157–175.
  • 23. Niitepõld K, Smith AD, Osborne JL, Reynolds DR, Carreck NL, Martin AP, et al. Flight metabolic rate and Pgi genotype influence butterfly dispersal rate in the field. Ecology. 2009;90(8):2223–2232. doi: 10.1890/08-1498.1
  • 24. Wheat CW, Haag CR, Marden JH, Hanski I, Frilander MJ. Nucleotide polymorphism at a gene (Pgi) under balancing selection in a butterfly metapopulation. Molecular Biology and Evolution. 2010;27(2):267–281. doi: 10.1093/molbev/msp227
  • 25. Powers DA, Place AR. Biochemical genetics of Fundulus heteroclitus (L.). I. Temporal and spatial variation in gene frequencies of Ldh-B, Mdh-A, Gpi-B, and Pgm-A. Biochemical Genetics. 1978;16(5–6):593–607. doi: 10.1007/BF00484222
  • 26. Powers DA, Lauerman T, Crawford D, DiMichele L. Genetic mechanisms for adapting to a changing environment. Annual Review of Genetics. 1991;25(1):629–660. doi: 10.1146/annurev.ge.25.120191.003213
  • 27. Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G, Dickson M, Grimwood J, et al. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science. 2005;307:1928–1933. doi: 10.1126/science.1107239
  • 28. Zhen Y, Aardema ML, Medina EM, Schumer M, Andolfatto P. Parallel molecular evolution in an herbivore community. Science. 2012;337(6102):1634–1637. doi: 10.1126/science.1226630
  • 29. Hoekstra HE, Nachman MW. Different genes underlie adaptive melanism in different populations of rock pocket mice. Molecular Ecology. 2003;12(5):1185–1194. doi: 10.1046/j.1365-294x.2003.01788.x
  • 30. Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP. A single amino acid mutation contributes to adaptive beach mouse color pattern. Science. 2006;313(5783):101–104. doi: 10.1126/science.1126121
  • 31. Rosenblum EB, Römpler H, Schöneberg T, Hoekstra HE. Molecular and functional basis of phenotypic convergence in white lizards at White Sands. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(5):2113–2117. doi: 10.1073/pnas.0911042107
  • 32. Reid NM, Proestou DA, Clark BW, Warren WC, Colbourne JK, Shaw JR, et al. The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science. 2016;354(6317):1305–1308. doi: 10.1126/science.aah4993
  • 33. Phifer-Rixey M, Bi K, Ferris KG, Sheehan MJ, Lin D, Mack KL, et al. The genomic basis of environmental adaptation in house mice. PLOS Genetics. 2018;14:e1007672. doi: 10.1371/journal.pgen.1007672
  • 34. Brennan RS, Healy TM, Bryant HJ, La MV, Schulte PM, Whitehead A. Integrative population and physiological genomics reveals mechanisms of adaptation in killifish. Molecular Biology and Evolution. 2018;35(11):2639–2653. doi: 10.1093/molbev/msy154
  • 35. Didion JP, Pardo-Manuel de Villena F. Deconstructing Mus gemischus: advances in understanding ancestry, structure, and variation in the genome of the laboratory mouse. Mammalian Genome. 2013;24:1–20. doi: 10.1007/s00335-012-9441-z
  • 36. Phifer-Rixey M, Nachman MW. Insights into mammalian biology from the wild house mouse Mus musculus. eLife. 2015;4:e05959. doi: 10.7554/eLife.05959
  • 37. Wang T, Hamann A, Spittlehouse D, Carroll C. Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLoS ONE. 2016;11(6):e0156720. doi: 10.1371/journal.pone.0156720
  • 38. Frichot E, Schoville SD, Bouchard G, François O. Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular Biology and Evolution. 2013;30:1687–1699. doi: 10.1093/molbev/mst063
  • 39. Chen AS, Marsh DJ, Trumbauer ME, Frazier EG, Guan X-M, Yu H, et al. Inactivation of the mouse melanocortin-3 receptor results in increased fat mass and reduced lean body mass. Nature Genetics. 2000;26:97–102. doi: 10.1038/79254
  • 40. Tan C-H, McNaughton PA. The TRPM2 ion channel is required for sensitivity to warmth. Nature. 2016;536:460–463. doi: 10.1038/nature19074
  • 41. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nature Genetics. 2012;44(7):821–824. doi: 10.1038/ng.2310
  • 42. Pritchard JK, Pickrell JK, Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Current Biology. 2010;20(4):R208–215. doi: 10.1016/j.cub.2009.11.055
  • 43. Mack KL, Ballinger MA, Phifer-Rixey M, Nachman MW. Gene regulation underlies environmental adaptation in house mice. Genome Research. 2018;28:1636–1645. doi: 10.1101/gr.238998.118
  • 44. Stewart TP, Kim H, Saxton AM, Kim J. Genetic and genomic analysis of hyperlipidemia, obesity and diabetes using (C57BL/6J × TALLYHO/JngJ) F2 mice. BMC Genomics. 2010;11(1):713. doi: 10.1186/1471-2164-11-713
  • 45. Tagliabracci VS, Turnbull J, Wang W, Girard J-M, Zhao X, Skurat AV, et al. Laforin is a glycogen phosphatase, deficiency of which leads to elevated phosphorylation of glycogen in vivo. Proceedings of the National Academy of Sciences. 2007;104(49):19262–19266. doi: 10.1073/pnas.0707952104
  • 46. Turnbull J, Tiberia E, Pereira S, Zhao X, Pencea N, Wheeler AL, et al. Deficiency of a glycogen synthase-associated protein, Epm2aip1, causes decreased glycogen synthesis and hepatic insulin resistance. Journal of Biological Chemistry. 2013;288(48):34627–34637. doi: 10.1074/jbc.M113.483198
  • 47. Bonhomme F, Searle JB. House mouse phylogeography. In: Macholan M, Baird SJE, Munclinger P, Pialek J, editors. Evolution of the house mouse. Cambridge University Press, 2012. pp. 278–296.
  • 48. Waters MR. Late Pleistocene exploration and settlement of the Americas by modern humans. Science. 2019;365(6449):eaat5447. doi: 10.1126/science.aat5447
  • 49. Moorjani P, Amorim CEG, Arndt PF, Przeworski M. Variation in the molecular clock of primates. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(38):10607–10612. doi: 10.1073/pnas.1600374113
  • 50. Fan S, Hansen MEB, Lo Y, Tishkoff SA. Going global by adapting local: A review of recent human adaptation. Science. 2016;354(6308):54–59. doi: 10.1126/science.aaf5098
  • 51. Reynolds AW, Mata-Míguez J, Miró-Herrans A, Briggs-Cloud M, Sylestine A, Barajas-Olmos F, et al. Comparing signals of natural selection between three Indigenous North American populations. Proceedings of the National Academy of Sciences of the United States of America. 2019;116(19):9312–9317. doi: 10.1073/pnas.1819467116
  • 52. Geraldes A, Basset P, Smith KL, Nachman MW. Higher differentiation among subspecies of the house mouse (Mus musculus) in genomic regions with low recombination. Molecular Ecology. 2011;20(22):4722–4736. doi: 10.1111/j.1365-294X.2011.05285.x
  • 53. Harr B, Karakoc E, Neme R, Teschke M, Pfeifle C, Pezer Ž, et al. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Scientific Data. 2016;3:1–14. doi: 10.1038/sdata.2016.75
  • 54. Orth A, Adama T, Din W, Bonhomme F. Hybridation naturelle entre deux sous-espèces de souris domestique, Mus musculus domesticus et Mus musculus castaneus, près du lac Casitas (Californie). Genome. 1998;41(1):104–110. doi: 10.1139/g97-109
  • 55. Hubbard JK, Uy JAC, Hauber ME, Hoekstra HE, Safran RJ. Vertebrate pigmentation: from underlying genes to adaptive function. Trends in Genetics. 2010;26(5):231–239. doi: 10.1016/j.tig.2010.02.002
  • 56. Bergmann KGLC. Über die Verhältnisse der Wärmeökonomie der Thiere zu ihrer Grösse. Göttinger Studien. 1847;3:595–708.
  • 57. Mayr E. Geographical character gradients and climatic adaptation. Evolution. 1956;10(1):105–108.
  • 58. Meiri S, Dayan T. On the validity of Bergmann’s rule. Journal of Biogeography. 2003;30(3):331–351.
  • 59. Conte GL, Arnegard ME, Peichel CL, Schluter D. The probability of genetic parallelism and convergence in natural populations. Proceedings of the Royal Society B: Biological Sciences. 2012;279:5039–5047. doi: 10.1098/rspb.2012.2146
  • 60. Preite V, Sailer C, Syllwasschy L, Bray S, Ahmadi H, Krämer U, et al. Convergent evolution in Arabidopsis halleri and Arabidopsis arenosa on calamine metalliferous soils. Philosophical Transactions of the Royal Society B: Biological Sciences. 2019;374:20180243. doi: 10.1098/rstb.2018.0243
  • 61. Karageorgi M, Groen SC, Sumbul F, Pelaez JN, Verster KI, Aguilar JM, et al. Genome editing retraces the evolution of toxin resistance in the monarch butterfly. Nature. 2019;574(7778):409–412. doi: 10.1038/s41586-019-1610-8
  • 62. Natarajan C, Hoffman FG, Weber RE, Fago A, Witt CC, Storz JF. Predictable convergence in hemoglobin function has unpredictable molecular underpinnings. Science. 2016;354(6310):336–339. doi: 10.1126/science.aaf9070
  • 63. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLOS Genetics. 2016;12(2):e1005767. doi: 10.1371/journal.pgen.1005767
  • 64. Turchin MC, Chiang CWK, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics. 2012;44(9):1015–1019. doi: 10.1038/ng.2368
  • 65. Berg JJ, Coop G. A population genetic signal of polygenic adaptation. PLoS Genetics. 2014;10(8):e1004412. doi: 10.1371/journal.pgen.1004412
  • 66. Field Y, Boyle EA, Telis N, Gao Z, Gaulton KJ, Golan D, et al. Detection of human adaptation during the past 2000 years. Science. 2016;354(6313):760–764. doi: 10.1126/science.aag0776
  • 67. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK biobank. eLife. 2019;8:e39725. doi: 10.7554/eLife.39725
  • 68. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8:e39702. doi: 10.7554/eLife.39702
  • 69. Butler AA, Kesterson RA, Khong K, Cullen MJ, Pelleymounter MA, Dekoning J, et al. A unique metabolic syndrome causes obesity in the melanocortin-3 receptor-deficient mouse. Endocrinology. 2000;141(9):3518–3521. doi: 10.1210/endo.141.9.7791
  • 70. Saito S, Tominaga M. Functional diversity and evolutionary dynamics of thermoTRP channels. Cell Calcium. 2015;57:214–221. doi: 10.1016/j.ceca.2014.12.001
  • 71. Matos-Cruz V, Schneider ER, Mastrotto M, Merriman DK, Bagriantsev SN, Gracheva EO. Molecular prerequisites for diminished cold sensitivity in ground squirrels and hamsters. Cell Reports. 2017;21(12):3329–3337. doi: 10.1016/j.celrep.2017.11.083
  • 72. Key FM, Abdul-Aziz MA, Mundry R, Peter BM, Sekar A, D’Amato M, et al. Human local adaptation of the TRPM8 cold receptor along a latitudinal cline. PLoS Genetics. 2018;14(5):e1007298. doi: 10.1371/journal.pgen.1007298
  • 73. Lynch VJ, Bedoya-Reina OC, Ratan A, Sulak M, Drautz-Moses DI, Perry GH, et al. Elephantid genomes reveal the molecular bases of woolly mammoth adaptations to the arctic. Cell Reports. 2015;12(2):217–228. doi: 10.1016/j.celrep.2015.06.027
  • 74. Smith SD, Kawash JK, Karaiskos S, Biluck I, Grigoriev A. Evolutionary adaptation revealed by comparative genome analysis of woolly mammoths and elephants. DNA Research. 2017;24(4):359–369. doi: 10.1093/dnares/dsx007
  • 75. Chigurapati S, Sulak M, Miller W, Lynch VJ. Relaxed constraint and thermal desensitization of the cold-sensing ion channel TRPM8 in mammoths. bioRxiv. 2018 August 22;397356.
  • 76. Nachman MW, Hoekstra HE, D’Agostino SL. The genetic basis of adaptive melanism in pocket mice. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(9):5268–5273. doi: 10.1073/pnas.0431157100
  • 77. Stern AJ, Speidel L, Zaitlen NA, Nielsen R. Disentangling selection on genetically correlated polygenic traits via whole-genome genealogies. American Journal of Human Genetics. 2021;108(2):219–239. doi: 10.1016/j.ajhg.2020.12.005
  • 78. Wlasiuk G, Nachman MW. The genetics of adaptive coat color in gophers: coding variation at Mc1r is not responsible for dorsal color differences. Journal of Heredity. 2007;98(6):567–574.
  • 79. Lynch CB. Clinal variation in cold adaptation in Mus domesticus: verification of predictions from laboratory populations. American Naturalist. 1992;139(6):1219–1236.
  • 80. Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protocols. 2010;2010(6):pdb.prot5448. doi: 10.1101/pdb.prot5448
  • 81. Singhal S. De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set. Molecular Ecology Resources. 2013;13(3):403–416. doi: 10.1111/1755-0998.12077
  • 82. Bi K, Linderoth T, Vanderpool D, Good JM, Nielsen R, Moritz C. Unlocking the vault: next-generation museum population genomics. Molecular Ecology. 2013;22(24):6018–6032. doi: 10.1111/mec.12516
  • 83. Jiang H, Lei R, Ding S-W, Zhu S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics. 2014;15(1):182. doi: 10.1186/1471-2105-15-182
  • 84. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170
  • 85. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923
  • 86. Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–2963. doi: 10.1093/bioinformatics/btr507
  • 87. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20(9):1297–1303. doi: 10.1101/gr.107524.110
  • 88. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352
  • 89. Laurie CC, Nickerson DA, Anderson AD, Weir BS, Livingston RJ, Dean MD, et al. Linkage disequilibrium in wild mice. PLoS Genetics. 2007;3(8):e0030144. doi: 10.1371/journal.pgen.0030144
  • 90. Johnson PLF, Slatkin M. Accounting for bias from sequencing error in population genetic estimates. Molecular Biology and Evolution. 2007;25(1):199–206. doi: 10.1093/molbev/msm239
  • 91. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15(1):356. doi: 10.1186/s12859-014-0356-4
  • 92. Skotte L, Korneliussen TS, Albrechtsen A. Estimating individual admixture proportions from next generation sequencing data. Genetics. 2013;195:693–702. doi: 10.1534/genetics.113.154138
  • 93. Fumagalli M, Vieira FG, Linderoth T, Nielsen R. ngsTools: methods for population genetics analyses from next-generation sequencing data. Bioinformatics. 2014;30(10):1486–1487. doi: 10.1093/bioinformatics/btu041
  • 94. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience. 2015;4(1):s13742–015. doi: 10.1186/s13742-015-0047-8
  • 95. Swofford DL. Phylogenetic Analysis Using Parsimony. Version 4. Sinauer Associates, Sunderland, Massachusetts; 2002.
  • 96. Chifman J, Kubatko L. Quartet inference from SNP data under the coalescent model. Bioinformatics. 2014;30(23):3317–3324. doi: 10.1093/bioinformatics/btu530
  • 97. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol. 2005;25:1965–1978.
  • 98. Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. doi: 10.1534/genetics.107.080101
  • 99. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics. 2010;11(7):459–463. doi: 10.1038/nrg2813
  • 100. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nature Genetics. 2010;42(4):355–360. doi: 10.1038/ng.546
  • 101. Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D. Improved linear mixed models for genome-wide association studies. Nature Methods. 2012;9(6):525–526. doi: 10.1038/nmeth.2037
  • 102. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81(3):559–575. doi: 10.1086/519795
  • 103. Shim H, Chasman DI, Smith JD, Mora S, Ridker PM, Nickerson DA, et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE. 2015;10(4):e0120758. doi: 10.1371/journal.pone.0120758
  • 104. Ruxton GD, Neuhäuser M. Improving the reporting of P-values generated by randomization methods. Methods in Ecology and Evolution. 2013;4:1033–1036.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. Individuals sampled, including specimen catalog numbers, exact collecting localities, reproductive data, and body measurements of 50 wild-caught Mus musculus from western North America.

(XLSX)

S2 Table. Sample information for Mus musculus domesticus and M. m. castaneus from Harr et al. 2016 [53].

(XLSX)

S3 Table. Exon capture sequencing coverage statistics for each mouse in the western transect, sequenced on an Illumina HiSeq 4000.

(XLSX)

S4 Table. Average nucleotide diversity (π) and average proportion of segregating sites (θ) for each of the western transect populations: Tucson, AZ, USA (TUC); St. George, UT, USA (STG); Provo, UT, USA (PRO); Missoula, MT, USA (MIS); Edmonton, Alberta, Canada (EDM).

(XLSX)

S5 Table. Pairwise Fst calculated using exome capture data among the western transect populations: Tucson, AZ, USA (TUC); St. George, UT, USA (STG); Provo, UT, USA (PRO); Missoula, MT, USA (MIS); Edmonton, Alberta, Canada (EDM).

(XLSX)

S6 Table. List of 13,057 SNPs in 4,438 genes significantly associated with variation in MAT across western North America using LFMM at the q-value < 0.05 level.

The Ensembl ID, chromosome, bp position, base pair change (SNP), q-value, and allele frequency in TUC, STG, PRO, MIS, and EDM populations (from left to right) are listed for each SNP along with the start and end positions, gene name, and gene description.

(XLS)

S7 Table. List of 311 SNPs in 95 genes significantly associated with variation in MAT across western North America using LFMM at the q-value < 0.001 level with at least two SNPs meeting this threshold.

The Ensembl ID, chromosome, bp position, base pair change (SNP), q-value, and allele frequency in TUC, STG, PRO, MIS, and EDM populations (from left to right) are listed for each SNP along with the start and end positions, gene name, and gene description.

(XLS)

S8 Table. A list of the top 16 genes under selection in both transects (q-values < 0.001) with their Ensembl ID, abbreviated gene name, full gene name, and a description of their function.

Genes involved in metabolic processes have been highlighted in green, while genes involved specifically in thermoregulation are highlighted in blue.

(XLSX)

S9 Table. Parallel SNPs at four of the 16 genes under selection in both transects at the q-value < 0.001 significance cut-off.

(XLSX)

S10 Table. Top eight SNPs in five genes (FDR q-value < 0.05) associated with body weight variation in the western transect from a GWAS conducted using GEMMA.

Chromosome (chr), bp position (ps), alleles, minor allele frequency (MAF), SNP effect size (beta), effect size standard error (beta se), percent of the phenotypic variance explained by each SNP calculated according to Shim et al. 2015 (SNP PVE), likelihood ratio p-value (p_lrt), gene Ensembl ID, gene name, the lowest LFMM q-value for SNPs in that gene, and a description of gene function are listed for each significant SNP in the GWAS.

(XLSX)
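As a rough illustration of how the SNP PVE column in S10 Table relates to the other columns, the sketch below uses the per-SNP formula commonly attributed to Shim et al. 2015: PVE = 2β²·MAF·(1 − MAF) / [2β²·MAF·(1 − MAF) + se(β)²·2N·MAF·(1 − MAF)], where N is the sample size. The exact form the authors applied is an assumption here and should be checked against the reference. The function name and the example values are hypothetical.

```python
def snp_pve(beta: float, se: float, maf: float, n: int) -> float:
    """Proportion of phenotypic variance explained by a single SNP.

    beta : estimated SNP effect size (e.g. the GEMMA "beta" column)
    se   : standard error of beta (the "beta se" column)
    maf  : minor allele frequency, strictly between 0 and 1
    n    : number of phenotyped individuals in the GWAS
    The formula is assumed to follow Shim et al. 2015.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    # Variance attributed to the SNP under an additive model.
    genetic = 2.0 * beta ** 2 * maf * (1.0 - maf)
    # Sampling-variance term scaled by sample size.
    residual = 2.0 * n * se ** 2 * maf * (1.0 - maf)
    return genetic / (genetic + residual)
```

Multiplying the result by 100 gives a percentage, as reported in the table. Because the MAF terms cancel algebraically, the value reduces to β² / (β² + N·se(β)²); the unsimplified form is kept only so the code mirrors the formula as usually written.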

