Author manuscript; available in PMC: 2018 Oct 1.
Published in final edited form as: Mamm Genome. 2017 Aug 17;28(9-10):416–425. doi: 10.1007/s00335-017-9704-9

Whole exome sequencing of wild-derived inbred strains of mice improves power to link phenotype and genotype

Peter L Chang 1, Emily Kopania 1, Sara Keeble 1,2, Brice Sarver 2, Erica Larson 2,3, Annie Orth 4, Khalid Belkhir 4, Pierre Boursot 4, François Bonhomme 4, Jeffrey M Good 2, Matthew D Dean 1,4
PMCID: PMC5693759  NIHMSID: NIHMS900499  PMID: 28819774

Abstract

The house mouse is a powerful model to dissect the genetic basis of phenotypic variation, and serves as a model to study human diseases. Despite a wealth of discoveries, most classical laboratory strains have captured only a small fraction of genetic variation known to segregate in their wild progenitors, and existing strains are often related to each other. Inbred strains of mice independently derived from natural populations have the potential to increase power in genetic studies with the addition of novel genetic variation. Here, we performed exome-enrichment and high-throughput sequencing (~8X coverage) of 26 wild-derived strains known in the mouse research community as the “Montpellier strains”. We identified 1.46 million SNPs in our dataset, approximately 19% of which have not been detected from other inbred strains. This novel genetic variation is expected to contribute to phenotypic variation, as it includes 18,496 nonsynonymous variants and 262 early stop codons. Simulations demonstrate that the higher density of genetic variation in the Montpellier strains provides increased power for quantitative genetic studies. Inasmuch as the power to connect genotype to phenotype depends on genetic variation, it is important to incorporate these additional genetic strains into future research programs.

Keywords: wild mice, inbred strains, genetic variation

Introduction

For more than 100 years, the house mouse (Mus musculus) has been a useful model for genetic research (Paigen 2003a, b). Several important features contribute to their utility, including a high quality reference genome with more than a decade’s worth of improved assembly and annotation (Church et al. 2009; Waterston et al. 2002), multiple complete genomes from distinct genetic strains (Keane et al. 2011; Nikolskiy et al. 2015; Srivastava et al. 2017; Wang et al. 2016; Waterston et al. 2002; Wong et al. 2012) and wild individuals (Harr et al. 2016), and dense genotyping of commonly used laboratory strains (Laurie et al. 2007; Lindblad-Toh et al. 2000; Petkov et al. 2004; Wade et al. 2002; Yang et al. 2007; Yang et al. 2009; Yang et al. 2011). Thousands of phenotypes have been gathered from hundreds of inbred mouse strains (Grubb et al. 2004; Wang et al. 2016; White et al. 2013), many of which are commercially available through institutions like The Jackson Laboratory.

Although the impact of the house mouse on biological research cannot be overstated, many existing inbred strains of mice are related to each other in complex ways, and capture only a small amount of genetic variation known to segregate in their wild progenitors (Beck et al. 2000; Keane et al. 2011; Salcedo et al. 2007; Wade et al. 2002; Yang et al. 2011). Inasmuch as the power to connect genotype to phenotype depends on genetic variation, it is important to incorporate additional genetic strains into future research programs. Inbred mouse strains are generally classified as “wild-derived inbred strains” or “classical inbred strains”. Wild-derived inbred strains represent independent derivations from particular geographic areas. In contrast, classical inbred strains derive from a small pool of founders whose origins trace to Japanese and European mouse fanciers (Beck et al. 2000; Frazer et al. 2007; Moriwaki 1994; Morse 1978; Morse 2007; Silver 1995; Wade and Daly 2005; Wade et al. 2002), who likely crossed strains from different geographic or species origins prior to inbreeding (Didion and Villena 2013; Keane et al. 2011; Yang et al. 2007; Yang et al. 2011). As a result, 97% of the genome of classical inbred strains can be traced to 10 different haplotypes (Yang et al. 2011).

Several groups have established wild-derived inbred strains in an attempt to increase the amount of genetic variation available to researchers. Bonhomme and colleagues at the University of Montpellier have set up one of the largest collections. The Montpellier collection includes 29 strains representing subspecies of Mus musculus, 4 of M. spretus, and one each of M. spicilegus, M. macedonicus, and M. caroli (Supplementary Table 1), all of which are available for nominal fees through the Montpellier Stock Center (http://www.isem.univ-montp2.fr/recherche/les-plate-formes/conservatoire-genetique-de-souris-sauvages/). Thirty-two strains have undergone at least 20 generations of brother-sister mating, allowing for the same level of pseudo-replication that can be achieved with classical inbred strains. Even though the Montpellier strains could greatly increase the amount of known genetic variation, they have been the topic of only approximately 60 publications (Supplementary Table 2), which pales in comparison to the thousands of publications on just a few classical inbred strains such as the reference genome strain C57BL/6J.

To characterize genomic variation of the Montpellier strains, we generate and analyze exome sequences from 26 strains derived from natural populations of Mus musculus and M. spretus. Our goal was to increase the total amount of genetic variation known among inbred mouse strains. With very conservative methods, we identified 1.46 million SNPs, nearly 19% of which were not known from existing inbred strains. More than 77,000 were nonsynonymous or nonsense mutations that may introduce novel phenotypic variation, and we identified a few hundred genes that carry early termination codons and may provide alternatives to traditional knockouts. Using simulations, we show that inclusion of this new genetic variation would significantly improve the power of genetic mapping experiments. Our study demonstrates that the Montpellier strains represent a powerful yet under-utilized resource in genetic research.

Materials and Methods

Mouse strains employed

All husbandry and experimental methods, as well as all personnel involved, were approved by the University of Southern California’s Institutional Animal Care and Use Committee, protocol #11394. We chose 26 wild-derived inbred strains (WDIS) from the Montpellier genetic repository to perform high throughput sequencing of enriched exomes. Fourteen of these strains were considered Mus musculus domesticus: 22MO (originally isolated from Monastir, Tunisia), BIK (Kefar Galim, Israel), BZO (Oran, Algeria), DCA (Akrotiri, Cyprus), DCP (Paphos, Cyprus), DDO (Odis, Denmark), DEB (Barcelona, Spain), DGA (Adjaria, Georgia), DIK (Keshet, Israel), DJO (Orcetto, Italy), DMZ (Azemmour, Morocco), DOT (Tahiti, French Polynesia), WLA (Toulouse, France), and WMP (Monastir, Tunisia). Four were considered M. m. musculus: MAM (Megri, Armenia), MBS (Sokolovo, Bulgaria), MGA (Alazani, Georgia) and MPB (Bialowieza, Poland). One was considered M. m. castaneus: CIM (Masinagudi, India). Four additional strains, BID (Birdjand, Iran), KAK (Khakhk, Iran), MPR (Rawalpindi, Pakistan), and TEH (Tehran, Iran), originated from regions that probably harbor multiple subspecies and were not assigned to any one subspecies (Hardouin et al. 2015). Lastly, we included three strains of M. spretus: SEG (Granada, Spain), SFM (Montpellier, France), and STF (Fondouk Djedid, Tunisia), a more distantly related species but one that can still interbreed with M. m. musculus (Bonhomme et al. 1978; Burgio et al. 2007; Dejager et al. 2009). Our sampling included all Montpellier strains considered to be M. m. domesticus; most classical inbred strains are most closely related to this subspecies (Yang et al. 2007).

With the exception of 22MO and WMP, all strains were collected at least several kilometers from every other strain, so they should not be close relatives. This type of sampling is expected to maximize the total amount of genetic variation captured; however, it is inappropriate for population genetic analyses since the samples are not derived from a single population. All strains were initially maintained under a moderate inbreeding scheme, then under brother-sister mating for at least 20 generations, and are thus highly inbred.

Exome sequencing

DNA was extracted from spleen collected from female mice, when possible, using the Qiagen MasterPure Complete DNA and RNA Purification Kit and protocol from Epicentre (Madison, WI). DNA was sheared using a Bioruptor UCD-200 with 7 rounds of sonication (7 minutes per round on high, 30s on 30s off) and genomic DNA libraries were constructed and individually barcoded using a previously described protocol designed to facilitate multiplexed exome capture (Rohland and Reich 2012). To reduce molecular interference during enrichment, we used truncated adaptors containing unique “internal” barcodes on the P5 end of genomic fragments (Rohland and Reich 2012). PCR primers were designed according to Rohland and Reich (2012).

In-solution sequence capture was performed using Nimblegen SeqCap EZ Mouse Exome probes as described in Nimblegen’s SeqCap EZ Library User’s Guide. Libraries were pooled equally to obtain 1 μg total DNA for each hybridization experiment. Libraries were then enriched using two separate capture reactions with eight libraries each, including blocking oligonucleotides specific to our custom adapters (Rohland and Reich 2012) and mouse COT-1 DNA (Invitrogen) to reduce non-specific hybridization. The capture reactions were hybridized for 68 hours at 47°C in an Eppendorf Mastercycler Pro, and then washed, eluted, and PCR-enriched. Capture enrichment success was verified using qPCR analysis of three targeted regions on pre- and post-capture library pools. Sequencing was performed using 76 bp paired-end reads on the Illumina Hi-seq 2500 platform provided by the Epigenome Center at the University of Southern California.

Illumina reads were mapped to different pseudoreference genomes dependent upon their species of origin (Sarver et al. 2017). A pseudoreference contains the backbone of the mm10 reference mouse genome (strain=C57BL/6J), but allelic states taken from representative strains of M. m. musculus, M. m. domesticus, M. m. castaneus or M. spretus are inserted into the mm10 reference genome. This approach leaves all called SNPs on a common coordinate system (mm10). All pseudoreferences were taken from Sarver et al. (2017). Preliminary analyses demonstrated that the four strains of unknown origin (BID, KAK, MPR, and TEH) fell within the M. m. musculus clade, and so were mapped to the M. m. musculus pseudoreference. The advantage of this technique is that species-specific variation can be incorporated into the reference, thus improving mapping accuracy and recovery, while preserving the coordinates of the mm10 build so that genome annotation can be used. Sequences were mapped with BWA MEM (v0.7.9a, Li and Durbin 2009), using default mapping parameters. Alignments to their respective pseudoreferences were used to identify variants using GATK HaplotypeCaller (McKenna et al. 2010), following PCR duplicate removal and indel realignment. Variants from dbSNP (version 142) were used as the training set during SNP recalibration and subjected to standard hard filtering parameters according to the GATK Best Practices recommendation (Auwera et al. 2013; DePristo et al. 2011): MQ > 56, QD > 24, FS < 12, MQRankSum < 8, ReadPosRankSum < 3, DP > 3. To assess confidence, we compared SNPs identified in the Montpellier strains that also overlapped with dbSNP and quantified how many calls agreed with known allelic states at those sites. In addition to novel SNPs, we identified 152,342 insertion/deletion mutations (Supplementary Table 4). SNPs were classified by functional categories using GRCm38.73 as annotated by EMBL-EBI Ensembl (www.ensembl.org).
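The hard-filtering thresholds listed above can be read as a simple per-site predicate. The following is a minimal Python sketch of that logic, not the GATK implementation used in this study. The function name, the dictionary-based site record, and the decision to let a site pass when a rank-sum annotation is absent are all assumptions made for illustration.

```python
# Thresholds copied from the filtering criteria in the text.
# The dict-based site record and the function name are illustrative assumptions.
MINIMUMS = {"MQ": 56, "QD": 24, "DP": 3}                    # keep if value > cutoff
MAXIMUMS = {"FS": 12, "MQRankSum": 8, "ReadPosRankSum": 3}  # keep if value < cutoff
OPTIONAL = {"MQRankSum", "ReadPosRankSum"}  # assumption: may be absent at some sites


def passes_hard_filters(info):
    """Return True if a site's annotations satisfy every threshold."""
    for key, cutoff in MINIMUMS.items():
        if key not in info or not info[key] > cutoff:
            return False
    for key, cutoff in MAXIMUMS.items():
        if key not in info:
            if key in OPTIONAL:
                continue  # assumption: a missing rank-sum test does not fail the site
            return False
        if not info[key] < cutoff:
            return False
    return True


example_site = {"MQ": 60.0, "QD": 30.1, "FS": 1.2,
                "MQRankSum": 0.1, "ReadPosRankSum": 0.4, "DP": 8}
print(passes_hard_filters(example_site))
```

Because DP must exceed 3, a site supported by exactly three reads would be rejected under this reading of the thresholds.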

Genetic relationships

We placed the 26 Montpellier exomes in the context of an additional 36 inbred strains commonly used in genetic research and compiled in dbSNP (v142) (Sherry et al. 1999; Sherry et al. 2001). Most of these are “classical inbred strains”, which share a common ancestry predominantly originating from a limited M. m. domesticus stock, with some human-mediated contribution from other species (Keane et al. 2011; Yang et al. 2007; Yang et al. 2011). Seven strains from dbSNP are considered wild-derived strains: three M. m. domesticus (LEWES/EiJ, WSB/EiJ, and ZALENDE/EiJ), one M. m. musculus (PWD/PhJ), one M. m. molossinus (MOLF/EiJ), one M. m. castaneus (CAST/EiJ), and one M. spretus (SPRET/EiJ). M. m. molossinus is a subspecies that originated through natural hybridization between M. m. musculus and M. m. castaneus (Yonekawa et al. 1986; Yonekawa et al. 1988).

In addition to placing the Montpellier strains in the context of known variation among other inbred strains, we repeated our analyses after also combining our data with the genetic variants called by Harr et al. (2016), specifically their file named AllMouse.vcf_90_recalibrated_snps_raw_indels_reheader_PopSorted.PASS.vcf available from http://wwwuser.gwdg.de/~evolbio/evolgen/wildmouse/. We only used sites covered in our Montpellier strains and dbSNP. Harr et al. (2016) performed whole genome sequencing from 27 M. m. domesticus, 22 M. m. musculus, 10 M. m. castaneus, and 8 M. spretus, to an average depth of 20.9X autosomal coverage. Although the Harr et al. (2016) data are from wild caught animals, they provide valuable context through which to view genetic relationships of the inbred strains (Montpellier + dbSNP strains).

SNPs that were found in our Montpellier panel as well as dbSNP were used to generate genealogies with the SNPHYLO program (Lee et al. 2014), which uses the maximum likelihood framework of the DNAML program in the PHYLIP package (Felsenstein 1993) to construct trees. 1000 bootstrap analyses were performed using all variants with the PHANGORN package (Schliep 2011) in R (www.r-project.org). Trees were visualized using the APE package (Paradis 2012; Paradis et al. 2004) in R. Resulting trees will be strongly affected by introgression known to occur in the history of inbreeding and therefore serve only as a rough approximation of genetic relationships.

We assessed genetic structure in our sample using STRUCTURE (Falush et al. 2007; Hubisz et al. 2009; Pritchard et al. 2000) and Principal Components Analyses (PCA). STRUCTURE was run under the admixture and correlated allele frequency model with ten independent runs of 10,000 burn-in MCMC iterations followed by 50,000 iterations for 2 to 8 clusters (k=2 to 8). Results were inspected with STRUCTURE HARVESTER (Earl 2012; Evanno et al. 2005). PCA was performed using the SNPRELATE package (Zheng et al. 2012). Unlike the above genealogical analyses, STRUCTURE and PCA require sites called in all 62 strains (i.e., no missing data across the 26 Montpellier exomes plus the 36 additional dbSNP strains). Multiallelic sites were excluded. To reduce the size of the dataset, we chose sites that were at least 100 kb apart, roughly the extent of linkage disequilibrium in wild mice (Laurie et al. 2007), resulting in 26,991 sites used in STRUCTURE and PCA analyses.
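One way to obtain sites that are at least 100 kb apart is a greedy pass along each chromosome. The Python sketch below illustrates that idea only. The text does not state how sites were selected, so the greedy rule, the function name, and the (chromosome, position) input format are assumptions.

```python
def thin_sites(sites, min_gap=100_000):
    """Greedily keep sites so that kept sites on a chromosome are >= min_gap bp apart.

    sites: iterable of (chromosome, position) tuples (hypothetical input format).
    """
    kept = []
    last_kept = {}  # chromosome -> position of the most recently kept site
    for chrom, pos in sorted(sites):
        previous = last_kept.get(chrom)
        if previous is None or pos - previous >= min_gap:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


candidates = [("1", 100), ("1", 50_000), ("1", 100_100), ("2", 10)]
print(thin_sites(candidates))
```

A greedy pass keeps the first site it meets on each chromosome, so the exact set retained depends on the sort order. Other selection rules would give a different but similarly spaced set.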

Protein coding variation

In addition to classifying SNPs into basic categories such as nonsynonymous and synonymous, we estimated codon usage bias across strains, using ENCPRIME (Novembre 2002), which tests the null hypothesis that codons within an amino acid class are used at equal frequency, after accounting for background base composition. Estimates of codon usage theoretically range from 20 (every amino acid coded by a single codon, representing maximal bias) to 61 (each amino acid coded by each of its synonymous codons at equal frequency, representing minimal bias). Background base composition was estimated from flanking non-exonic sequence, which is inevitably sequenced even when performing exome enrichment. Because base composition varies across the genome, we analyzed codon usage bias in 5 Mb windows. Lastly, we annotated early stop codons, which could represent alternatives to traditional gene knockouts if they disrupt normal gene function. To infer their functional impact, we determined the percentage of the protein truncated by the early stop codons, and whether they occurred on constitutive or facultative exons.
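The bounds of 20 and 61 follow from the structure of the standard genetic code. The sketch below uses Wright's unadjusted effective number of codons, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of amino acids with k synonymous codons. This is a simplification for illustration: ENCPRIME additionally corrects for background base composition, which is not modeled here, and the function name is an assumption.

```python
# Number of amino acids in each synonymous-degeneracy class of the standard
# genetic code: 9 two-fold, 1 three-fold, 5 four-fold and 3 six-fold.
# The 2 single-codon amino acids (Met, Trp) contribute the constant 2.
DEGENERACY_CLASSES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(mean_homozygosity):
    """Wright's Nc from the mean codon homozygosity F of each degeneracy class."""
    return 2 + sum(count / mean_homozygosity[k]
                   for k, count in DEGENERACY_CLASSES.items())


# Maximal bias: one codon per amino acid, so F = 1 in every class.
maximal_bias = effective_number_of_codons({k: 1.0 for k in DEGENERACY_CLASSES})
# Minimal bias: k codons used equally, so F = k * (1/k)**2 = 1/k.
minimal_bias = effective_number_of_codons({k: 1.0 / k for k in DEGENERACY_CLASSES})
print(maximal_bias, minimal_bias)
```

With F = 1 the sum is 2 + 9 + 1 + 5 + 3 = 20, and with F = 1/k it is 2 + 18 + 3 + 20 + 18 = 61, matching the range quoted above.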

Gene flow between lineages and strains

Previous studies have shown that many laboratory strains have a mosaic genome that contains genetic material from multiple species (Dai et al. 2005; Ferris et al. 1982; Frazer et al. 2007; Ideraabdullah et al. 2004; Nagamine et al. 1992; Tucker et al. 1992; Yalcin et al. 2004; Yang et al. 2007; Yang et al. 2011). We tested for interspecific introgression using the ABBA-BABA test (Green et al. 2010). In this test, “A” and “B” indicate the distribution of a biallelic state across a rooted four-taxon genealogy. We tested the genealogy: (((M. m. domesticus strain #1, M. m. domesticus strain #2), M. m. musculus), M. spretus) represented by (((DOM*, LEWES/EiJ), CzechII/EiJ), SPRET/EiJ), where DOM* represents each Montpellier strain of M. m. domesticus tested individually. LEWES/EiJ and CzechII/EiJ were chosen to represent “pure” M. m. domesticus and M. m. musculus, respectively, as neither shows high levels of introgression (Yang et al. 2011, and unpublished data). We have sequenced the genome of CzechII/EiJ as part of unrelated work, only using variant calls here for the ABBA-BABA implementation. These results should be treated with caution as M. m. musculus and M. m. domesticus are closely related, can still interbreed in nature, and their genomes are not fully differentiated. An excess of either ABBA or BABA sites is best explained by introgression, in our case from M. m. musculus (represented by CzechII/EiJ) into M. m. domesticus.
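The site-pattern counting behind the ABBA-BABA test can be sketched as follows for the genealogy (((P1, P2), P3), outgroup). This is a minimal Python illustration that assumes one allele per strain per site, as expected for homozygous inbred genotypes. The function names and input format are hypothetical, and this is not the ANGSD code used in this study.

```python
def count_abba_baba(sites):
    """Count ABBA and BABA patterns.

    sites: iterable of (p1, p2, p3, outgroup) alleles, one tuple per site.
    The outgroup allele is treated as ancestral ("A"), the other as derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if len({p1, p2, p3, outgroup}) != 2 or p3 == outgroup:
            continue  # not biallelic, or P3 does not carry the derived allele
        if p1 == outgroup and p2 == p3:
            abba += 1  # P2 shares the derived allele with P3
        elif p2 == outgroup and p1 == p3:
            baba += 1  # P1 shares the derived allele with P3
    return abba, baba


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA); zero is expected without introgression."""
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


sites = [("A", "G", "G", "A"), ("G", "A", "G", "A"), ("G", "A", "G", "A"),
         ("A", "A", "G", "A"), ("G", "G", "G", "A")]
abba, baba = count_abba_baba(sites)
print(abba, baba, d_statistic(abba, baba))
```

With P1 set to a Montpellier M. m. domesticus strain, P2 to LEWES/EiJ, P3 to CzechII/EiJ and the outgroup to SPRET/EiJ, an excess of BABA sites (negative D under this sign convention) would indicate that the Montpellier strain shares more derived alleles with M. m. musculus than LEWES/EiJ does.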

We divided the genome into 489 non-overlapping 5 Mb windows, discarding any that had fewer than 20 ABBA-BABA sites with genotypes called with a minimum genotype quality (GQ) of 10. GQ is the Phred-scaled confidence in a called genotype and is strongly correlated to the number of reads that map to a particular site, the quality of the base calls from those reads, and sequencing error. For any DOM* strain that contained at least 20 windows with at least 20 ABBA-BABA sites, the null hypothesis of no introgression was evaluated using the block Jackknife procedure as implemented in the script jackKnife.R from the ANGSD package (Korneliussen et al. 2014).
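The block jackknife treats each genomic window as one resampling unit. The Python sketch below shows a simplified, unweighted delete-one version that returns D, its standard error, and the Z-score from per-window ABBA and BABA counts. It is intended to convey the idea and does not reproduce the jackKnife.R script from ANGSD. The function name and input format are assumptions.

```python
import math


def block_jackknife_z(windows):
    """Delete-one block jackknife for D.

    windows: list of (abba, baba) counts, one pair per genomic window.
    Returns (D, standard error, Z-score).
    """
    n = len(windows)
    if n < 2:
        raise ValueError("need at least two windows")
    total_abba = sum(a for a, _ in windows)
    total_baba = sum(b for _, b in windows)
    d_full = (total_abba - total_baba) / (total_abba + total_baba)

    # Recompute D with each window left out in turn.
    leave_one_out = []
    for a, b in windows:
        abba, baba = total_abba - a, total_baba - b
        leave_one_out.append((abba - baba) / (abba + baba))

    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    standard_error = math.sqrt(variance)
    return d_full, standard_error, d_full / standard_error


d, se, z = block_jackknife_z([(30, 10), (25, 15), (28, 12), (35, 5)])
print(d, se, z)
```

Resampling whole windows rather than individual sites accounts for the non-independence of linked sites, which is why windows with too few informative sites are discarded beforehand.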

Several recent studies have raised the possibility of gene flow between some wild-derived strains that may have unintentionally occurred after their establishment in the laboratory (Yang et al. 2007; Yang et al. 2009; Yang et al. 2011). To test for more recent introgression, we inspected the distribution of pairwise divergence between every possible pair of Montpellier strains having shared the same lab environment. Recent introgression should result in long stretches of the genome that are identical by descent. We used the B-SMUCE segmentation algorithm of Futschik et al. (2014) implemented in the smuceR function of the R package stepR with default settings. This multiscale segmentation algorithm looks for compositionally homogeneous segments. It estimates the best fit to a series of a minimal number of steps and provides estimates for the number of segments and their boundaries at the same time. Originally employed to find regional variation in GC (1) vs. AT (0) content, we applied this algorithm to a simple literal distance whereby two homologous SNPs yield 0 if they were identical or identically heterozygous, 1 if different and 0.5 if one of them was heterozygous. The program was run with default options to estimate the chromosomal distribution of haplotype sharing between every pair of strains.
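The per-site distance described above can be written as a small function. This Python sketch encodes each genotype as a two-character string of alleles, which is an assumed representation. The handling of two different heterozygous genotypes (scored as 1) is also an assumption, since the text does not cover that case.

```python
def site_distance(genotype_1, genotype_2):
    """Distance between two strains at one SNP: 0, 0.5 or 1.

    Genotypes are two-character strings such as "AA" or "AG" (assumed encoding).
    """
    a = tuple(sorted(genotype_1))
    b = tuple(sorted(genotype_2))
    if a == b:
        return 0.0  # identical, or identically heterozygous
    a_is_het = a[0] != a[1]
    b_is_het = b[0] != b[1]
    if a_is_het != b_is_het:
        return 0.5  # exactly one of the two genotypes is heterozygous
    return 1.0      # different genotypes


pair_profile = [site_distance(x, y)
                for x, y in [("AA", "AA"), ("AG", "GA"), ("AA", "AG"), ("AA", "GG")]]
print(pair_profile)
```

The resulting series of 0, 0.5 and 1 values along a chromosome is the kind of signal that a step-function segmentation method can then divide into homogeneous stretches.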

Simulating mapping power

To quantify the effect of additional genetic variation on the power to map quantitative trait loci, we simulated two different backcross experiments, started by two different pairs of parental strains: C57BL/6J and DBA/2J, which are the parental strains to the BXD family, a classic recombinant inbred family from which over 5,000 phenotypes have been collected (Wang et al. 2016), and DGA and DJO, two Montpellier strains sequenced in this study. We confined the simulation to sites that were covered in all four strains. We did not include the X chromosome. Using the R package QTL (Broman and Sen 2009; Broman et al. 2003) we simulated a backcross design with heritability=0.1 or 0.5, and sample size of 50 or 100 individuals. We systematically simulated QTL along the genome at intervals of 10 cM, and then performed Haley-Knott regression (Haley and Knott 1992) using the SCANONE function of QTL (Broman and Sen 2009; Broman et al. 2003). We quantified the differences in maximal LOD scores, as well as the average length of the 95% confidence interval in QTL, quantified using the LODINT function in QTL, in these two hypothetical backcross designs.
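To make the simulation design more concrete, the Python sketch below simulates a single QTL in a backcross and scores it with single-marker regression, where LOD = (n/2) log10(RSS0/RSS1). It is a deliberately reduced illustration: it evaluates the QTL position itself rather than performing Haley-Knott interval mapping with R/qtl, it ignores marker density and recombination, and the function names and the unit residual variance are assumptions.

```python
import math
import random


def marker_lod(genotypes, phenotypes):
    """LOD score for one marker in a backcross (genotypes coded 0 or 1)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_alt = 0.0
    for genotype in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == genotype]
        if not group:
            return 0.0  # only one genotype class present, so no test is possible
        group_mean = sum(group) / len(group)
        rss_alt += sum((y - group_mean) ** 2 for y in group)
    return (n / 2) * math.log10(rss_null / rss_alt)


def simulate_backcross_lod(n, heritability, seed):
    """Simulate n backcross individuals with one QTL and return its LOD score."""
    rng = random.Random(seed)
    # With residual variance 1 and genotype variance 0.25, this effect size
    # gives the requested heritability: h2 = 0.25*a^2 / (0.25*a^2 + 1).
    effect = 2 * math.sqrt(heritability / (1 - heritability))
    genotypes = [rng.randint(0, 1) for _ in range(n)]
    phenotypes = [effect * g + rng.gauss(0, 1) for g in genotypes]
    return marker_lod(genotypes, phenotypes)


for h2 in (0.1, 0.5):
    for n in (50, 100):
        print(h2, n, round(simulate_backcross_lod(n, h2, seed=1), 2))
```

In the actual analysis the comparison of interest is between crosses with different marker densities, which this single-marker sketch does not capture. It only shows how heritability and sample size enter the LOD calculation.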

Results and Discussion

SNP discovery

An average of 19.9 million reads were generated per Montpellier strain, with an average of 73.5% uniquely mapping to the genome (Supplementary Table 3). This amounted to an average coverage of 8.1X at called SNPs (Supplementary Table 3). All raw sequencing data are deposited as NCBI BioProject PRJNA326865.

Using a conservative SNP-calling pipeline and alleles in the mm10 genome as reference, we identified 1,460,057 SNPs (77,439 nonsynonymous, 166,048 synonymous, 178,088 untranslated regions, 740,923 intronic, and 297,559 intergenic) among 26 Montpellier strains (Table 1). Of these, 1,184,277 (81%) occurred in dbSNP. The Montpellier strains thus contribute 275,780 novel SNPs not previously known from inbred strains (18,496 nonsynonymous, 35,833 synonymous, 40,325 untranslated regions, 127,641 intronic, and 53,485 intergenic), with roughly one novel SNP every 12,000 bp. We did not experimentally validate called SNPs through further sequencing; however, among the SNPs identified in the Montpellier strains that also overlapped with dbSNP, 99.8% agreed with the known allelic states at those sites, confirming our pipeline yielded high quality SNP calls. In addition to novel SNPs, we identified 152,342 insertion/deletion mutations (Supplementary Table 4).
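The counts reported above can be checked for internal consistency with a few lines of arithmetic. The Python sketch below uses only numbers quoted in this paragraph; the variable names are illustrative.

```python
# SNP counts by functional category, as quoted in the text.
all_snps = {"nonsynonymous": 77439, "synonymous": 166048,
            "untranslated": 178088, "intronic": 740923, "intergenic": 297559}
novel_snps = {"nonsynonymous": 18496, "synonymous": 35833,
              "untranslated": 40325, "intronic": 127641, "intergenic": 53485}

total = sum(all_snps.values())
novel = sum(novel_snps.values())
in_dbsnp = total - novel
novel_percent = 100 * novel / total

print(total, novel, in_dbsnp, round(novel_percent, 1))
```

The category counts sum to the reported totals, and the novel fraction of about 18.9% corresponds to the "approximately 19%" cited in the Abstract.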

Table 1.

Number of SNPs observed (in parentheses: number not observed in dbSNP version 142) across the Montpellier strains

Type | All genotypes | Only M. m. castaneus | Only M. m. domesticus | Only M. m. musculus | Only M. spretus | Excluding M. m. castaneus | Excluding M. m. domesticus | Excluding M. m. musculus | Excluding M. spretus
All SNPs | 1460057 (275780) | 423044 (50233) | 673858 (72241) | 713103 (100122) | 957550 (104101) | 1404809 (241355) | 1374654 (236018) | 1296270 (203381) | 1043512 (192527)
Non-synonymous | 77439 (18496) | 22230 (2947) | 39558 (6068) | 33688 (6204) | 52373 (6196) | 74073 (16292) | 71556 (14505) | 70305 (13779) | 54783 (13601)
Synonymous | 166048 (35833) | 52873 (6708) | 88900 (10993) | 76383 (11432) | 114176 (12887) | 158386 (30904) | 155812 (28991) | 153004 (27667) | 120723 (25575)
UTR_5_PRIME | 22412 (5210) | 6966 (914) | 11579 (1447) | 10217 (1694) | 15566 (2109) | 21425 (4565) | 21119 (4401) | 20438 (4006) | 16040 (3513)
UTR_3_PRIME | 155676 (35115) | 46086 (6455) | 77481 (9577) | 71698 (12568) | 104339 (12898) | 148707 (30520) | 146639 (29816) | 140354 (25927) | 112167 (24907)
Intronic | 740923 (127641) | 214331 (23643) | 317763 (29848) | 372879 (48261) | 486745 (49911) | 715231 (112058) | 701972 (112323) | 654260 (92930) | 521351 (87140)
Intergenic | 297559 (53485) | 80558 (9566) | 138577 (14308) | 148238 (19963) | 184351 (20100) | 286987 (47016) | 277556 (45982) | 257909 (39072) | 218448 (37791)

We repeated our analyses after including only a single subspecies, showing that novel variants were not confined to a subset of strains sampled (Table 1). We reached a similar conclusion after systematically excluding one subspecies at a time (Table 1).

Genetic relationships

The four M. spretus strains (Fig. 1, left panel, strains labeled in black) formed a distinct group that was distantly related to the other strains, consistent with previous phylogenetic hypotheses (Lundrigan et al. 2002; Sarver et al. 2017). M. m. castaneus and M. m. musculus formed a group (Fig. 1, left panel, strains labeled in green), that then grouped with M. m. domesticus strains (Fig. 1, left panel, strains labeled in red), also consistent with previous phylogenetic inference (Keane et al. 2011; Phifer-Rixey et al. 2012; White et al. 2009). M. m. domesticus, M. m. musculus, and M. m. castaneus are closely related lineages that diverged less than 350 thousand years ago (Boursot et al. 1996; Geraldes et al. 2008; Salcedo et al. 2007; She et al. 1990; Suzuki et al. 2004). Many genomic regions are still unsorted and lack species-specific substitutions (Geraldes et al. 2011; Salcedo et al. 2007). MOLF/EiJ falls within the other M. m. musculus strains, suggesting that even though it is a natural hybrid species between M. m. castaneus and M. m. musculus (Yonekawa et al. 1986; Yonekawa et al. 1988), a majority of its genome is derived from M. m. musculus.

Figure 1.

Genealogical relationships among 26 Montpellier strains and 36 strains from dbSNP, determined by A) maximum likelihood tree, or B) STRUCTURE analysis considering k=5. Strain names on a black box indicate novel Montpellier exomes sequenced in this study. Nodes labeled with small black circles were supported with at least 95% bootstrap support.

The classical inbred strains (Fig. 1, left panel, strains labeled in blue) fell within the M. m. domesticus group, consistent with previous studies suggesting they are mostly derived from this subspecies (Yang et al. 2007; Yang et al. 2009; Yang et al. 2011). Interestingly, the classical inbred strains formed their own group within wild-derived M. m. domesticus, and even other wild-derived inbred strains from The Jackson Laboratory, like LEWES/EiJ and WSB/EiJ, fall outside of this group. The distinct grouping of the classical inbred strains reinforces what is known about their history, that they derived from a small set of founders during the early mouse fancy trade (Beck et al. 2000; Frazer et al. 2007; Moriwaki 1994; Morse 1978; Morse 2007; Silver 1995; Wade and Daly 2005; Wade et al. 2002). These general groups remained unchanged after repeating the analysis with wild-caught M. m. domesticus, M. m. castaneus, M. m. musculus, and M. spretus of Harr et al. (2016) (Supplementary Figure 1).

The four strains that were not assigned to a subspecies a priori (BID, KAK, MPR, and TEH) all fell within the M. m. musculus clade, but in a basal position within that group (Fig. 1, left panel). These strains originated from individuals collected near the center of mouse diversity in the Middle East (Pakistan and Iran) (Hardouin et al. 2015). Some of these populations may constitute independent (sub-) species that have yet to be described (Rajabi-Maham et al. 2012), or they may have captured ancestral polymorphism or secondary admixture that occurred in the field.

STRUCTURE analyses largely supported the genealogical tree, with log-likelihood values reaching a plateau at K=5 groups. The Evanno method was used to characterize the clustering, with delta K peaking at K=3 and falling to 0 at K=5. The two M. m. castaneus strains (dark green, Fig. 1, right panel) shared a large proportion of genetic variation with M. m. musculus (light green, Fig. 1, right panel). Interestingly, the allele frequencies among wild-derived M. m. domesticus strains (red, Fig. 1, right panel) differed from those of classical inbred strains (blue, Fig. 1, right panel). There is no evidence for genetic structure within classical inbred strains, which capture approximately half of the structure seen in wild-derived M. m. domesticus (blue in Fig. 1, right panel). At K=3, M. spretus, M. m. castaneus + M. m. musculus, and M. m. domesticus form distinct clusters. At K=4, M. m. castaneus separates from M. m. musculus. At K=5, the classical inbreds separate from the wild-derived M. m. domesticus. K=6 does not separate out any samples from the rest.
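The delta K statistic of the Evanno method relates the second-order change in log-likelihood across successive values of K to the run-to-run variability at K. The Python sketch below shows one common way to compute it from replicate runs, using the mean log-likelihood at each K and the sample standard deviation. It is not the STRUCTURE HARVESTER code, the input format is an assumption, and the example values are invented purely to exercise the function.

```python
import statistics


def delta_k(ln_prob_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    ln_prob_by_k: dict mapping K -> list of ln P(D) values, one per replicate run.
    """
    means = {k: statistics.mean(values) for k, values in ln_prob_by_k.items()}
    result = {}
    for k in sorted(means):
        if k - 1 in means and k + 1 in means:
            second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # stdev is the sample standard deviation across replicate runs;
            # it must be non-zero for delta K to be defined.
            result[k] = second_difference / statistics.stdev(ln_prob_by_k[k])
    return result


# Invented log-likelihoods, for illustration only (not results from this study).
toy_runs = {2: [-1000.0, -1002.0], 3: [-900.0, -902.0], 4: [-890.0, -892.0]}
print(delta_k(toy_runs))
```

Delta K is undefined for the smallest and largest K tested because it needs both neighbours, so with K=2 to 8 it can be evaluated only for K=3 to 7.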

PCA also supported the groupings seen in the genealogy (Supplementary Fig. 2). The first principal component explained 45.2% of the variation and separated the four M. spretus strains from the rest of the panel. The second principal component explained 16.7% of the genetic variation and separated the M. m. musculus and M. m. castaneus strains from the rest of the panel.

Protein coding variation

M. spretus showed low codon usage bias (corresponding to high ENCprime estimates), M. m. castaneus and M. m. musculus strains had high codon usage bias, and M. m. domesticus was intermediate (Fig. 2). Natural selection shapes codon usage bias across mammals, but shows no consistent correlation to proxies of effective population size (Kessler and Dean 2014). The effective population sizes of M. m. castaneus, M. m. domesticus, and M. m. musculus have been genetically estimated at 220K, 100K, and 60K, respectively (Geraldes et al. 2008; Geraldes et al. 2011; Phifer-Rixey et al. 2012). We might hypothesize that M. spretus has a smaller effective population size than M. musculus subspecies because it occupies a smaller geographic area and maintains lower density than M. musculus subspecies (Boursot et al. 1985; Britton and Thaler 1978; Dejager et al. 2009; Orsini et al. 1982). As predicted by a model of weak selection, M. m. castaneus has the strongest, and M. spretus the weakest, codon usage bias. However, the two intermediate taxa do not follow the predictions of weak selection. Thus, there is no consistent relationship between inferred effective population size and codon usage bias estimated from our exomes, as observed by Kessler and Dean (2014).

Figure 2.

Codon usage bias among Montpellier strains.

Across all 26 exomes of wild-derived inbred strains, we identified 262 genes with at least one early stop codon segregating in at least one isoform, many of which segregate in multiple strains (Supplementary Table 5). Of these, 193 truncate less than 10% of the protein and therefore may not have a strong functional impact (Supplementary Fig. 3). However, 117 genes segregate an early stop codon that truncates more than 50% of its wild type length (Supplementary Fig. 3), and could represent a novel source of effective knockouts for future functional studies. Surprisingly, the length of wild type protein truncated by early stop codons did not differ between early stop codons that occurred in constitutively versus facultatively spliced exons (t=0.98, df=260, P=0.33) (Supplementary Fig. 4), suggesting that both types of stop codons have similar functional impacts. In fact, more stop codons affected constitutively (N=143) vs. facultatively (N=119) expressed exons.

Here, we highlight one gene, Bard1, with an early stop codon that truncates 50.6% of the wild type protein in two Montpellier strains, BIK and DDO (Supplementary Table 5). BARD1 forms a heterodimer with BRCA1, and together they play an important role in chromosomal stability and tumor suppression. A truncated BARD1 protein results in defective homologous DNA repair (Westermark et al. 2003), and knockouts for Bard1 die at an early embryonic stage (McCarthy et al. 2003). Whether or not the two Montpellier strains with truncated BARD1 have altered chromosomal stability represents one of many potentially interesting follow-up studies.

Gene flow between lineages and strains

Eight wild-derived strains of M. m. domesticus (BIK, DCP, DDO, DEB, DIK, DJO, DOT, WLA) showed a significant ABBA-BABA result (|Z-score| > 3, Supplementary Table 6), suggesting past introgression from M. m. musculus (represented by CzechII/EiJ). Four strains (22MO, BZO, DMZ, DCA) did not show a significant ABBA-BABA result. Among these 12 strains, we analyzed an average of 2,961.5 ABBA-BABA sites from an average of 79.3 windows. The remaining two strains (DGA and WMP) lacked enough data to apply an ABBA-BABA test (fewer than 20 windows with at least 20 ABBA-BABA sites).

Using the B-SMUCE segmentation algorithm (Futschik et al. 2014), two M. m. domesticus strains, DJO and 22MO, show “typical” segment lengths of estimated genetic distances between two independently derived strains (Supplementary Fig. 5, Supplementary Table 7). In contrast, the two M. m. domesticus strains DEB and BZO show a marked shift towards long segments with no genetic distance (Supplementary Fig. 5), a possible signature of recent introgression followed by limited recombination and inbreeding. However, the fact that all regions have pairwise genetic distances greater than zero (Supplementary Fig. 5) argues against recent introgression occurring after laboratory establishment.

Simulating mapping power

The power to detect quantitative trait loci will depend in part on the number of markers across the genome. DJO and DGA have ample variants between them where well-known “SNP deserts” occur in classical inbred strains, including on chromosomes 10, 16 and X (Yang et al. 2007) (Fig. 3). The power to detect quantitative trait loci was higher, and the confidence intervals narrower, in the simulated DJO x DGA cross than in the C57 x DBA cross. With a sample of 100 individuals, and a single QTL with heritability=0.5, the median LOD score of DJO x DGA F2 descendants was 0.13 higher than that of C57 x DBA descendants, which amounts to a 25% reduction in a traditional p-value. Fourteen regions were detected as significant QTL in the DJO x DGA cross that were not significant in C57 x DBA (Fig. 4A). Furthermore, the average confidence interval was 3.19 cM shorter in the DJO x DGA cross than in the C57 x DBA cross (Fig. 4B). The narrower confidence interval translates to roughly 3 Mb of genome, or roughly 20 genes.
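The 25% figure can be checked with a short calculation. The sketch below assumes that the “traditional p-value” refers to a pointwise likelihood-ratio test with two degrees of freedom (additive and dominance effects in an F2), for which the p-value equals 10 raised to the power of minus the LOD score. This interpretation is an assumption rather than something stated in the text, and the function names are hypothetical.

```python
import math

def pointwise_p_from_lod(lod):
    """Pointwise p-value for a 2-df likelihood-ratio test: p = 10**(-LOD)."""
    # LRT statistic = 2*ln(10)*LOD; the chi-square(2 df) tail is exp(-x/2).
    lrt = 2.0 * math.log(10.0) * lod
    return math.exp(-lrt / 2.0)

def relative_p_reduction(delta_lod):
    """Fractional drop in the p-value when the LOD score rises by delta_lod."""
    return 1.0 - 10.0 ** (-delta_lod)

# LOD difference reported for DJO x DGA versus C57 x DBA.
print(round(relative_p_reduction(0.13), 3))
```

Under this assumption a LOD gain of 0.13 lowers the p-value by roughly a quarter, whatever the starting LOD score, in line with the reduction quoted above.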

Figure 3. SNP density between DJO and DGA or between C57BL/6J and DBA/2J.

Figure 4. Simulated F2 cross between DJO and DGA resulted in A) higher LOD scores, and B) narrower confidence intervals compared to an F2 cross between C57BL/6J and DBA/2J.

Conclusions

Mouse genetics relies heavily on classical inbred strains (the blue strains of Fig. 1). Given that most classical strains are highly related to each other and capture a small amount of genetic variation from their wild progenitors, such research is inherently underpowered to link genomic and phenotypic variation. Our study identified several hundred thousand variants that were previously unknown from inbred strains. These include many that are predicted to have functional impact, including nonsynonymous mutations and early stop codons. Our data demonstrate that the Montpellier strains would increase the power of mouse genetics.

Supplementary Material

335_2017_9704_MOESM10_ESM. Supplementary Figure 1.

Genealogical relationships among 26 Montpellier strains, 36 strains from dbSNP, and 67 wild caught mice from Harr et al. (2016). Similar to Figure 1 but note that species names have changed to accommodate larger sample sizes. Strain names on a black box indicate novel Montpellier exomes sequenced in this study. Nodes labeled with small black circles were supported with at least 95% bootstrap support. Importantly, as seen in Fig. 1, the classical inbred strains cluster to the exclusion of wild derived inbred strains. Ignoring the classical inbred strains, M. m. musculus and M. m. domesticus are sister groups, unlike in Fig. 1 where M. m. musculus and M. m. castaneus are sister groups.

335_2017_9704_MOESM11_ESM. Supplementary Figure 2.

Principal components analysis of genetic variation among 26 Montpellier strains and 36 “classical inbred” strains. Colors follow Fig. 1.

335_2017_9704_MOESM12_ESM. Supplementary Figure 3.

Distribution of stop codons, arranged by the proportion of wild type protein that remains.

335_2017_9704_MOESM13_ESM. Supplementary Figure 4.

Proportion of wild type protein truncated by early stop codons, separated by whether the early stop codon occurs on a facultative or constitutive exon.

335_2017_9704_MOESM1_ESM. Supplementary Figure 5.

Examples of pairwise distribution of genetic divergence vs. haplotype length for 2 pairs of wild derived inbred strains.

335_2017_9704_MOESM2_ESM

Supplementary Table 1. A list of the Montpellier strains

order=arbitrary order; Strain=strain name; Genus=genus name; Species=species name; Country=country from which strain was originally collected; Established=year that strain was established in laboratory; Genotyped_MDA=whether strain was genotyped with Mouse Diversity Array; Karyotype=standard (std, 2N=40) or alternative 2N; Exome_sequencing_Current_study=whether strain was included in present study.

335_2017_9704_MOESM3_ESM
335_2017_9704_MOESM4_ESM
335_2017_9704_MOESM5_ESM

Supplementary Table 4. Number of SNPs discovered across “All Genotypes”, each respective species analyzed separately, or each respective species removed. Table divided according to whether SNPs were also found in dbSNP (top panel), total WDIS dataset (middle panel), or not found in dbSNP (bottom panel).

335_2017_9704_MOESM6_ESM
335_2017_9704_MOESM7_ESM. Supplementary Table 6.

Results of ABBA-BABA tests.

DOM_strain=wild-derived M. m. domesticus strain that was included in the ABBA-BABA test; ABBA and BABA=the number of ABBA or BABA sites counted; N_windows=the number of 5 Mb windows that contained at least 20 ABBA-BABA sites; remaining columns=statistics associated with the ABBA-BABA jackknife procedure.

335_2017_9704_MOESM8_ESM
335_2017_9704_MOESM9_ESM

Acknowledgments

We thank Charlie Nicolet and Selene Tyndale from the Epigenome Center at USC. Brent Young, Rachel Mangels, and Lorraine Provencio helped with molecular work. Matt Salomon and Rob Williams gave many helpful suggestions. Jean-Jacques Duquesne maintained the wild mouse repository in Montpellier. Funding was provided by the National Institutes of Health Grant #GM098536 (MDD), National Science Foundation Grant #1146525 (MDD), the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health Grant #HD073439 (JMG), and the University of Montana Genomics Core, supported by a grant from the M.J. Murdock Charitable Trust.

Footnotes

Data and associated code from the entire pipeline described below are available on the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.[NNNN]). Illumina sequencing data are available in NCBI under the BioProject PRJNA326865.

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Auwera GA, Carneiro MO, Hartl C, et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013:11.10.11–11.10.33. doi: 10.1002/0471250953.bi1110s43.
  2. Beck JA, Lloyd S, Hafezparast M, Lennon-Pierce M, Eppig JT, Festing MF, Fisher EM. Genealogies of mouse inbred strains. Nat Genet. 2000;24:23–25. doi: 10.1038/71641.
  3. Bonhomme F, Martin S, Thaler L. Hybridation en laboratoire de Mus musculus L. et Mus spretus Lataste. Experientia. 1978;34:1140–1141. doi: 10.1007/BF01922917.
  4. Boursot P, Din W, Anand R, Darviche D, Dod B, Von Deimling F, Talwar GP, Bonhomme F. Origin and radiation of the house mouse: mitochondrial DNA phylogeny. Journal of Evolutionary Biology. 1996;9:391–415.
  5. Boursot P, Jacquart T, Bonhomme F, Britton-Davidian J, Thaler L. Differenciation geographique du genome mitochondrial chez Mus spretus Lataste. Comptes rendus de l’Academie des sciences. 1985;301:161–166.
  6. Britton J, Thaler L. Evidence for the presence of two sympatric species of mice (genus Mus L.) in southern France based on biochemical genetics. Biochemical Genetics. 1978;16:213–225. doi: 10.1007/BF00484079.
  7. Broman KW, Sen S. A guide to QTL mapping with R/qtl. New York: Springer; 2009.
  8. Broman KW, Wu H, Sen S, Churchill GA. R/qtl: QTL mapping in experimental crosses. Bioinformatics (Oxford, England) 2003;19:889–890. doi: 10.1093/bioinformatics/btg112.
  9. Burgio G, Szatanik M, Guenet J-L, Arnau M-R, Panthier J-J, Montagutelli X. Interspecific recombinant congenic strains between C57BL/6 and mice of the Mus spretus species: a powerful tool to dissect genetic control of complex traits. Genetics. 2007;177:2321–2333. doi: 10.1534/genetics.107.078006.
  10. Church DM, Goodstadt L, Hillier LW, et al. Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse. PLoS Biol. 2009;7:e1000112. doi: 10.1371/journal.pbio.1000112.
  11. Dai J-g, Min J-x, Xiao Y-b, Lei X, Shen W-h, Wei H. The absence of mitochondrial DNA diversity among common laboratory inbred mouse strains. Journal of experimental biology. 2005;208:4445–4450. doi: 10.1242/jeb.01920.
  12. Dejager L, Libert C, Montagutelli X. Thirty years of Mus spretus: a promising future. Trends in Genetics. 2009;25:234–241. doi: 10.1016/j.tig.2009.03.007.
  13. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–498. doi: 10.1038/ng.806.
  14. Didion J, Villena F-M. Deconstructing Mus gemischus: advances in understanding ancestry, structure, and variation in the genome of the laboratory mouse. Mammalian Genome. 2013;24:1–20. doi: 10.1007/s00335-012-9441-z.
  15. Earl DA. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation genetics resources. 2012;4:359–361.
  16. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular ecology. 2005;14:2611–2620. doi: 10.1111/j.1365-294X.2005.02553.x.
  17. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007;7:574–578. doi: 10.1111/j.1471-8286.2007.01758.x.
  18. Felsenstein J. PHYLIP: phylogenetic inference package, version 3.5c. 1993.
  19. Ferris SD, Sage RD, Wilson AC. Evidence from mtDNA sequences that common laboratory strains of inbred mice are descended from a single female. Nature. 1982;295:163–165. doi: 10.1038/295163a0.
  20. Frazer KA, Eskin E, Kang HM, et al. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature. 2007;448:1050–1053. doi: 10.1038/nature06067.
  21. Futschik A, Hotz T, Munk A, Sieling H. Multiscale DNA partitioning: statistical evidence for segments. Bioinformatics (Oxford, England) 2014.
  22. Geraldes A, Basset P, Gibson B, Smith KL, Harr B, Yu HT, Bulatova N, Ziv Y, Nachman MW. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol Ecol. 2008;17:5349–5363. doi: 10.1111/j.1365-294X.2008.04005.x.
  23. Geraldes A, Basset P, Smith KL, Nachman MW. Higher differentiation among subspecies of the house mouse (Mus musculus) in genomic regions with low recombination. Mol Ecol. 2011;20:4722–4736. doi: 10.1111/j.1365-294X.2011.05285.x.
  24. Green RE, Krause J, Briggs AW, et al. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
  25. Grubb SC, Churchill GA, Bogue MA. A collaborative database of inbred mouse strain characteristics. Bioinformatics (Oxford, England) 2004;20:2857–2859. doi: 10.1093/bioinformatics/bth299.
  26. Haley CS, Knott SA. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity (Edinb) 1992;69:315–324. doi: 10.1038/hdy.1992.131.
  27. Hardouin EA, Orth A, Teschke M, Darvish J, Tautz D, Bonhomme F. Eurasian house mouse (Mus musculus L.) differentiation at microsatellite loci identifies the Iranian plateau as a phylogeographic hotspot. BMC evolutionary biology. 2015;15:26. doi: 10.1186/s12862-015-0306-4.
  28. Harr B, Karakoc E, Neme R, et al. Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Scientific Data. 2016;3:160075. doi: 10.1038/sdata.2016.75.
  29. Hubisz MJ, Falush D, Stephens M, Pritchard JK. Inferring weak population structure with the assistance of sample group information. Molecular ecology resources. 2009;9:1322–1332. doi: 10.1111/j.1755-0998.2009.02591.x.
  30. Ideraabdullah FY, de la Casa-Esperon E, Bell TA, Detwiler DA, Magnuson T, Sapienza C, de Villena FP. Genetic and haplotype diversity among wild-derived mouse inbred strains. Genome Res. 2004;14:1880–1887. doi: 10.1101/gr.2519704.
  31. Keane TM, Goodstadt L, Danecek P, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–294. doi: 10.1038/nature10413.
  32. Kessler MD, Dean MD. Effective population size does not predict codon usage bias in mammals. Ecology and Evolution. 2014;4:3887–3900. doi: 10.1002/ece3.1249.
  33. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: analysis of next generation sequencing data. BMC bioinformatics. 2014;15:1. doi: 10.1186/s12859-014-0356-4.
  34. Laurie CC, Nickerson DA, Anderson AD, Weir BS, Livingston RJ, Dean MD, Smith KL, Schadt EE, Nachman MW. Linkage disequilibrium in wild mice. PLoS Genetics. 2007;3:e144. doi: 10.1371/journal.pgen.0030144.
  35. Lee TH, Guo H, Wang X, Kim C, Paterson AH. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics. 2014;15:162. doi: 10.1186/1471-2164-15-162.
  36. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  37. Lindblad-Toh K, Winchester E, Daly MJ, et al. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat Genet. 2000;24:381–386. doi: 10.1038/74215.
  38. Lundrigan BL, Jansa SA, Tucker PK. Phylogenetic relationships in the genus Mus, based on paternally, maternally, and biparentally inherited characters. Systematic biology. 2002;51:410–431. doi: 10.1080/10635150290069878.
  39. McCarthy EE, Celebi JT, Baer R, Ludwig T. Loss of Bard1, the Heterodimeric Partner of the Brca1 Tumor Suppressor, Results in Early Embryonic Lethality and Chromosomal Instability. Molecular and cellular biology. 2003;23:5056–5063. doi: 10.1128/MCB.23.14.5056-5063.2003.
  40. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  41. Moriwaki K. Wild mouse from a geneticist’s viewpoint. In: Genetics in wild mice. Tokyo, Japan: Japan Scientific Societies Press; 1994.
  42. Morse HC. Origins of inbred mice. Academic Press; 1978.
  43. Morse HCI. Building a better mouse: one hundred years of genetics and biology. In: Fox JG, Barthold SW, Davisson MT, Newcomer CE, Quimby FW, Smith AL, editors. The mouse in biomedical research. Waltham, MA: Elsevier; 2007.
  44. Nagamine CM, Nishioka Y, Moriwaki K, Boursot P, Bonhomme F, Lau YFC. The musculus-type Y chromosome of the laboratory mouse is of Asian origin. Mammalian Genome. 1992;3:84–91. doi: 10.1007/BF00431251.
  45. Nikolskiy I, Conrad DF, Chun S, Fay JC, Cheverud JM, Lawson HA. Using whole-genome sequences of the LG/J and SM/J inbred mouse strains to prioritize quantitative trait genes and nucleotides. BMC genomics. 2015;16:415. doi: 10.1186/s12864-015-1592-3.
  46. Novembre JA. Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol. 2002;19:1390–1394. doi: 10.1093/oxfordjournals.molbev.a004201.
  47. Orsini P, Cassaing J, Duplantier J, Croset H. Mus spretus Lataste et Mus musculus domesticus Rutty dans le Midi de la France. 1982. Premieres donnees sur l’ecologie des populations naturelles de souris.
  48. Paigen K. One hundred years of mouse genetics: an intellectual history. I. The classical period (1902–1980) Genetics. 2003a;163:1–7. doi: 10.1093/genetics/163.1.1.
  49. Paigen K. One hundred years of mouse genetics: an intellectual history. II. The molecular revolution (1981–2002) Genetics. 2003b;163:1227–1235. doi: 10.1093/genetics/163.4.1227.
  50. Paradis E. Analysis of phylogenetics and evolution with R. New York, NY: Springer; 2012.
  51. Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics (Oxford, England) 2004;20:289–290. doi: 10.1093/bioinformatics/btg412.
  52. Petkov PM, Ding Y, Cassell MA, et al. An efficient SNP system for mouse genome scanning and elucidating strain relationships. Genome Res. 2004;14:1806–1811. doi: 10.1101/gr.2825804.
  53. Phifer-Rixey M, Bonhomme F, Boursot P, Churchill GA, Piálek J, Tucker PK, Nachman MW. Adaptive evolution and effective population size in wild house mice. Molecular Biology and Evolution. 2012;29:2949–2955. doi: 10.1093/molbev/mss105.
  54. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
  55. Rajabi-Maham H, Orth A, Siahsarvie R, Boursot P, Darvish J, Bonhomme F. The south-eastern house mouse Mus musculus castaneus (Rodentia: Muridae) is a polytypic subspecies. Biol J Linn Soc. 2012;107:295–306.
  56. Rohland N, Reich D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 2012;22:939–946. doi: 10.1101/gr.128124.111.
  57. Salcedo T, Geraldes A, Nachman MW. Nucleotide variation in wild and inbred mice. Genetics. 2007;177:2277–2291. doi: 10.1534/genetics.107.079988.
  58. Sarver B, Keeble S, Cosart T, Tucker P, Dean MD, Good JM. Phylogenomic insights into mouse evolution using a pseudoreference approach. Genome Biology and Evolution. 2017;9:726–739. doi: 10.1093/gbe/evx034.
  59. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics (Oxford, England) 2011;27:592–593. doi: 10.1093/bioinformatics/btq706.
  60. She JX, Bonhomme F, Boursot P, Thaler L, Catzeflis F. Molecular phylogenies in the genus Mus - comparative analysis of electrophoretic, scnDNA hybridization, and mtDNA RFLP data. Biol J Linn Soc. 1990;41:83–103.
  61. Sherry ST, Ward M, Sirotkin K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome research. 1999;9:677–679.
  62. Sherry ST, Ward M-H, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic acids research. 2001;29:308–311. doi: 10.1093/nar/29.1.308.
  63. Silver L. Mouse genetics: concepts and applications. New York, New York: Oxford University Press; 1995.
  64. Srivastava A, Morgan AP, Najarian ML, et al. Genomes of the mouse collaborative cross. Genetics. 2017;206:537–556. doi: 10.1534/genetics.116.198838.
  65. Suzuki H, Shimada T, Terashima M, Tsuchiya K, Aplin K. Temporal, spatial, and ecological modes of evolution of Eurasian Mus based on mitochondrial and nuclear gene sequences. Mol Phylogenet Evol. 2004;33:626–646. doi: 10.1016/j.ympev.2004.08.003.
  66. Tucker PK, Lee BK, Lundrigan BL, Eicher EM. Geographic origin of the Y chromosomes in “old” inbred strains of mice. Mammalian genome. 1992;3:254–261. doi: 10.1007/BF00292153.
  67. Wade CM, Daly MJ. Genetic variation in laboratory mice. Nature genetics. 2005;37:1175–1180. doi: 10.1038/ng1666.
  68. Wade CM, Kulbokas EJ, 3rd, Kirby AW, Zody MC, Mullikin JC, Lander ES, Lindblad-Toh K, Daly MJ. The mosaic structure of variation in the laboratory mouse genome. Nature. 2002;420:574–578. doi: 10.1038/nature01252.
  69. Wang X, Pandey AK, Mulligan MK, et al. Joint mouse–human phenome-wide association to test gene function and disease risk. Nature communications. 2016. doi: 10.1038/ncomms10464.
  70. Waterston RH, Lindblad-Toh K, Birney E, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
  71. Westermark UK, Reyngold M, Olshen AB, Baer R, Jasin M, Moynahan ME. BARD1 Participates with BRCA1 in Homology-Directed Repair of Chromosome Breaks. Molecular and cellular biology. 2003;23:7926–7936. doi: 10.1128/MCB.23.21.7926-7936.2003.
  72. White JK, Gerdin A-K, Karp NA, et al. Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell. 2013;154:452–464. doi: 10.1016/j.cell.2013.06.022.
  73. White MA, Ané C, Dewey CN, Larget BR, Payseur BA. Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet. 2009;5:e1000729. doi: 10.1371/journal.pgen.1000729.
  74. Wong K, Bumpstead S, Van Der Weyden L, Reinholdt LG, Wilming LG, Adams DJ, Keane TM. Sequencing and characterization of the FVB/NJ mouse genome. Genome biology. 2012;13:R72. doi: 10.1186/gb-2012-13-8-r72.
  75. Yalcin B, Fullerton J, Miller S, et al. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci USA. 2004;101:9734–9739. doi: 10.1073/pnas.0401189101.
  76. Yang H, Bell TA, Churchill GA, Pardo-Manuel de Villena F. On the subspecific origin of the laboratory mouse. Nat Genet. 2007;39:1100–1107. doi: 10.1038/ng2087.
  77. Yang H, Ding Y, Hutchins LN, Szatkiewicz J, Bell TA, Paigen BJ, Graber JH, de Villena FP-M, Churchill GA. A customized and versatile high-density genotyping array for the mouse. Nat Meth. 2009;6:663–666. doi: 10.1038/nmeth.1359.
  78. Yang H, Wang JR, Didion JP, et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet. 2011;43:648–655. doi: 10.1038/ng.847.
  79. Yonekawa H, Gotoh O, Tagashira Y, Matsushima Y, Shi LI, Cho WS, Miyashita N, Moriwaki K. A hybrid origin of Japanese mice “Mus musculus molossinus”. Current topics in microbiology and immunology. 1986;127:62–67.
  80. Yonekawa H, Moriwaki K, Gotoh O, Miyashita N, Matsushima Y, Shi LM, Cho WS, Zhen XL, Tagashira Y. Hybrid origin of Japanese mice “Mus musculus molossinus”: evidence from restriction analysis of mitochondrial DNA. Mol Biol Evol. 1988;5:63–78. doi: 10.1093/oxfordjournals.molbev.a040476.
  81. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics (Oxford, England) 2012;28:3326–3328. doi: 10.1093/bioinformatics/bts606.
