Author manuscript; available in PMC: 2017 Jan 4.
Published in final edited form as: Nat Genet. 2016 Jul 4;48(8):919–926. doi: 10.1038/ng.3609

Genome-wide association study of behavioral, physiological and gene expression traits in commercially available outbred CFW mice

Clarissa C Parker 1,2,3, Shyam Gopalakrishnan 1,4, Peter Carbonetto 1,5, Natalia M Gonzales 1, Emily Leung 1, Yeonhee J Park 1, Emmanuel Aryee 1, Joe Davis 1, David A Blizard 6, Cheryl L Ackert-Bicknell 7,8, Arimantas Lionikas 9, Jonathan K Pritchard 10,11,12, Abraham A Palmer 1,13,14,15
PMCID: PMC4963286  NIHMSID: NIHMS794262  PMID: 27376237

Abstract

Although mice are the most widely used model organism, genetic studies have suffered from limited mapping resolution due to extensive linkage disequilibrium (LD) that is characteristic of crosses among inbred strains. Carworth Farms White (CFW) mice are a commercially available outbred mouse population that exhibits rapid LD decay compared to other available mouse populations. We performed a genome-wide association study (GWAS) of behavioral, physiological and gene expression phenotypes using 1,200 male CFW mice. We used genotyping-by-sequencing (GBS) to obtain genotypes at 92,734 single nucleotide polymorphisms (SNPs). We also measured gene expression using RNA-Sequencing in three brain regions. Our study identified numerous behavioral, physiological and expression quantitative trait loci (QTLs). We integrated the behavioral QTL and eQTL results to implicate specific genes, including Azi2 in sensitivity to methamphetamine and Zmynd11 in anxiety-like behavior. The combination of CFW mice, GBS and RNA-Sequencing constitutes a powerful approach to GWAS in mice.

Introduction

In the last decade, genome-wide association studies (GWAS) have demonstrated that common alleles influence susceptibility to virtually all common diseases1–3. The success of GWAS in elucidating the genetic determinants of disease in humans is due in part to the large number of recombinations among unrelated individuals, which permits high-resolution mapping across the genome. One important conclusion from those studies is that most causal loci appear to be due to regulatory rather than coding polymorphisms4.

Mice offer a powerful tool for elucidating the genetic architecture of complex traits: environmental factors can be held constant or systematically varied; genome editing permits experimental testing of identified genotype-phenotype relationships; most mouse genes have a human homolog, allowing rapid translation to humans; and relevant tissues can be obtained under highly controlled conditions and used to identify gene expression quantitative trait loci (eQTLs). However, the mouse populations used in most prior studies lacked sufficient recombination to narrow the implicated loci to a tractable size and thus generally failed to identify specific genes5,6.

In this study, we mapped QTLs and eQTLs using Carworth Farms White (CFW) mice, which are a commercially available outbred population7. While CFW mice were not developed for genetic research, they have several attractive properties. CFW mice were derived from a small number of founders and have been subsequently maintained as an outbred population for more than 100 generations, thus degrading linkage disequilibrium (LD) between nearby alleles8–10. Although CFW mice have longer range LD compared to most human populations, they have less LD than other commercially available laboratory mice9, and therefore should provide fine-scale mapping resolution. Compared to humans, the more extensive LD in CFW mice means that fewer markers are needed to perform GWAS and correspondingly lower levels of significance are required because fewer independent hypotheses are tested. We used genotyping-by-sequencing (GBS) to overcome another barrier to GWAS in mice, which is the high cost and limited coverage of extant SNP genotyping arrays. Finally, based on the importance of regulatory variation suggested by human GWAS4,11, we identified eQTLs that co-mapped with behavioral QTLs in an effort to identify the most likely causal genes.

Results

We phenotyped 1,200 male CFW mice for conditioned fear, anxiety-like behavior, methamphetamine sensitivity, prepulse inhibition, fasting glucose, body weight, tail length, testis weight, the weight of five hindlimb muscles, bone mineral density, bone morphology and gene expression in prefrontal cortex, hippocampus and striatum (Figure 1, Supplementary Figures 1–4, Online Methods, Supplementary Note).

Figure 1. Components of study.

Each of the 4 panels illustrates a component of the study: (A) Behavioral testing and measurement of physiological traits; (B) Genotyping-by-sequencing (GBS); (C) Measurement of gene expression in brain tissues using RNA-Seq; (D) QTL mapping for physiological and behavioral traits, and for gene expression.

Genotyping

Existing mouse SNP genotyping technologies, such as the Mouse Universal Genotyping Array (MUGA), MegaMUGA12, the more recent GigaMUGA13 and the Mouse Diversity Array (MDA)14 were not designed to capture common genetic variation in the CFW population. Furthermore, we sought to reduce the cost of genotyping, which has been a barrier to GWAS in mice. Therefore, we adapted GBS, which was originally developed in maize15, for use in mice. We used GBS to genotype 1,024 CFW mice, and identified 92,734 autosomal bi-allelic SNPs after filtering, 79,284 (86%) of which were present in dbSNP (v137). The remaining 13,450 SNPs (14%) represent “novel” SNPs that had not been previously reported. The distribution of GBS SNPs on autosomes is shown in Figure 2A. The non-uniform distribution of SNPs is likely due to differences in the numbers of polymorphic markers among all laboratory mice (Figure 2A, Supplementary Figure 5) and regions that are identical-by-descent among CFW mice. The non-uniform distribution of polymorphic SNPs appears to be a characteristic of CFW mice since polymorphic SNPs identified by the MegaMUGA array showed a similar pattern (Figure 2A; r² = 0.43 on log-scale).

Figure 2. Genetic characteristics of CFW mouse population.

(A) Density of GBS SNPs on autosomal chromosomes; (B) Mean LD (r²) decay rates estimated using frequency-matched SNPs55, with MAF > 20%, in a 34th generation AIL derived from LG/J and SM/J strains43,46, heterogeneous stock (HS) mice bred for > 50 generations49, the Hybrid Mouse Diversity Panel (HMDP)83, a panel of 30 inbred lab strains14,52, Diversity Outbred mice12, and CFW mice; (C) Treemix analysis summarizing genetic relationship between CFW mice and inbred strains in Wellcome Trust sequencing panel.

To assess the quality of GBS genotypes, we estimated the genotyping error rate in two ways. First, we compared GBS SNPs against those that were also present on the MegaMUGA array among 24 CFW mice that were genotyped using both platforms. This comparison yielded an overall discordance rate of 3%. We obtained a second estimate of the error rate of 1.6% by comparing genotypes in pairs of haplotypes that were identical-by-descent. Based on these results, we concluded that GBS provided a larger number of polymorphic SNPs than were found using MegaMUGA.

Genetic Architecture of the CFW Population

Comparing LD in different populations is useful for gauging mapping resolution16. Figure 2B shows that LD (r²) decays rapidly in CFW mice compared to other populations, consistent with previous findings based on a much smaller number of SNPs9,17, supporting their suitability for high-resolution mapping. Importantly, the majority of the SNPs we identified in CFW mice segregate among domesticus-derived laboratory strains (Figure 2C). Unlike the Collaborative Cross (CC) and the Diversity Outbred (DO), few of the SNPs found in CFW are derived from the castaneus and musculus subspecies18–20. When compared to a panel of inbred mice, CFW are most genetically similar to FVB/NJ (Figure 2C).
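As an illustration of the LD statistic discussed above, the following minimal sketch computes r² between two SNPs as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of the alternate allele). This genotype-based r² is a common approximation to haplotype-based r² and is an assumption of this sketch; it is not necessarily the estimator used in the study. The function name `ld_r2` and the toy genotypes are hypothetical.

```python
from statistics import mean


def ld_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2)."""
    if len(g1) != len(g2):
        raise ValueError("both SNPs must be genotyped in the same individuals")
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


# Toy genotypes for eight individuals (illustrative values only).
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 1, 1]  # differs from snp_a in one individual
snp_c = [2, 0, 1, 1, 2, 0, 1, 0]  # largely unrelated to snp_a

print(ld_r2(snp_a, snp_b), ld_r2(snp_a, snp_c))
```

Averaging such pairwise values within bins of physical distance would give an LD decay curve of the kind summarized in Figure 2B.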

Next, we considered the distribution of minor allele frequencies (MAF) of SNPs genotyped in the CFW mice (Supplementary Figure 6). The majority of SNPs (73%) had relatively high allele frequencies (MAF > 0.05). This profile is consistent with the reported history of CFW mice; namely, a severe bottleneck at the inception of the CFW population, followed by expansion to create an outbred population with a modest effective population size9. The mean MAF of novel SNPs was lower than for previously reported SNPs, consistent with the hypothesis that some of these novel SNPs are unique to the CFW population.
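The MAF summary above can be sketched as follows, assuming genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls. The function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0/1/2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid mouse
    return min(alt_freq, 1.0 - alt_freq)


def fraction_above(snps, threshold=0.05):
    """Fraction of SNPs whose MAF exceeds the threshold."""
    mafs = [minor_allele_frequency(g) for g in snps]
    return sum(m > threshold for m in mafs) / len(mafs)


# Toy data: three SNPs genotyped in a handful of mice (illustrative only).
toy_snps = [
    [0, 1, 2, 1],           # MAF 0.5
    [2, 2, 2, 1],           # MAF 0.125
    [0, 0, 0, 0, None, 0],  # monomorphic among called genotypes, MAF 0
]
print(fraction_above(toy_snps))
```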

Although we requested only one mouse from each litter, we were concerned that individuals in our study might have close familial relationships because they were sampled from a finite breeding population; however, we did not detect widespread population structure or cryptic relatedness in the CFW mice (Supplementary Figures 7–11).

SNP Heritability

Supplementary Table 1 shows the SNP heritability21,22, which is the proportion of variance in the trait explained by available SNP genotypes. SNP heritability estimates ranged from 9–60%, with a mean of 28%. The mean SNP heritability for physiological traits was slightly higher (32%) compared to behavioral traits (27%).

GWAS

We mapped QTLs for 66 behavioral and physiological phenotypes (Supplementary Tables 1, 2; Figure 3A). We used GEMMA to fit a linear mixed model (LMM) and quantify support for an association at each SNP. We also used a simpler linear model that did not correct for population structure and observed that it produced broadly similar results (see Supplementary Figure 12). However, we have presented the results from the LMM-based analysis because it may reduce subtle inflation of the test statistics due to close relationships or fine-scale population structure. We calculated a threshold via permutation, which is a standard approach for QTL mapping in mice that controls for the type I error rate23,24 (Supplementary Figure 13). This approach identified numerous QTLs for physiological and behavioral traits (Figure 3A, Supplementary Figures 14–18) that exceeded the significance threshold of 2 × 10⁻⁶ (permutation-based p < 0.1). Supplementary Table 2 contains more detailed information about all the most significant physiological and behavioral QTLs.
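The idea behind a permutation-derived genome-wide threshold can be sketched as below. This is a simplified illustration, not the study's procedure: it uses the squared correlation between genotype and phenotype as the per-SNP statistic (the study used an LMM fitted with GEMMA), and the function names, data sizes and simulated data are all hypothetical. Phenotypes are shuffled relative to genotypes, the genome-wide maximum statistic is recorded for each shuffle, and the (1 − alpha) quantile of those maxima serves as the threshold.

```python
import random
from statistics import mean


def squared_correlation(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def permutation_threshold(genotypes, phenotype, alpha=0.1, n_perm=200, seed=0):
    """(1 - alpha) quantile of the genome-wide maximum statistic under shuffling."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        maxima.append(max(squared_correlation(snp, shuffled) for snp in genotypes))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]


# Simulated null data: 25 SNPs in 60 mice (illustrative sizes only).
rng = random.Random(1)
n_mice, n_snps = 60, 25
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_mice)] for _ in range(n_snps)]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_mice)]

threshold = permutation_threshold(genotypes, phenotype)
print(f"r^2 threshold at genome-wide alpha = 0.1: {threshold:.3f}")
```

Because the maximum is taken across all SNPs in each shuffle, the resulting threshold accounts for the correlation among markers, which is why more extensive LD leads to a less stringent threshold.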

Figure 3. QTLs for physiological and behavioral traits.

(A) Minimum p-values for association across all tested behavioral and physiological phenotypes (see Supplementary Table 1 and 2 for details). (B) Genome-wide scan for testis weight and (C) Pre-pulse inhibition in response to +12 dB pre-pulse. (D) Association signal for testis weight near the QTL on chromosome 13. (E) Association signal for pre-pulse inhibition near the QTL on chromosome 7. Dotted red lines indicate thresholds (p < 0.1) estimated via permutation tests.

For testis weight we found a strong association with rs6279141 on chromosome 13 (Figure 3B, p-value: 4.51 × 10⁻¹⁸) that accounted for 7.5% of variation in that trait. The implicated region contained few genes (Figure 3D), one of which was Inhba, a gene that has been shown to affect testis morphogenesis, testicular cell proliferation and testis weight in mice25–27, and is therefore a promising candidate gene.

The strongest association for soleus muscle weight mapped to rs30535702 on chromosome 13 and explained 2.8% of trait variance (p-value: 8.33 × 10⁻⁸). One of the genes in this interval, Fst, is known to influence muscle mass28,29 and is a strong candidate to explain this association.

We identified several examples of pleiotropy. For example, two independently measured muscle weights, tibialis anterior (TA) and extensor digitorum longus (EDL), were both associated with rs27338905 on chromosome 2, in each case accounting for 2.3% of the variation. Tp53inp2 is near the peak marker and is abundantly expressed in skeletal muscle30, where it functions as a negative regulator of muscle mass31. Likewise, the weight of three muscles (gastrocnemius, EDL and soleus) mapped to the proximal end of chromosome 13; in each case, the minor allele was associated with increased muscle weight. Finally, on chromosome 12 we identified pleiotropic effects on tibia length and EDL weight.

Unexpectedly, we found that CFW mice appear predisposed toward abnormally high bone mineral density (BMD). This is a characteristic of CFW mice that does not appear to be shared with commonly used inbred laboratory strains (Supplementary Figure 3). This “abnormal BMD” phenotype was strongly associated with rs33583459 on chromosome 5 and rs29477109 on chromosome 11 (p-values: 1.57 × 10⁻⁹, 1.12 × 10⁻¹⁴, respectively). The locus on chromosome 5 contains a large number of genes, including Abcf2 and Slc4a2. The human ortholog, ABCF2, has been associated with BMD in the largest GWAS of BMD completed to date32, and is highly expressed in osteoblasts33. Slc4a2 plays a critical role in osteoclasts34 and homozygous deletion of Slc4a2 is associated with the osteopetrosis-like phenotype “Marble Bone Disease” in Red Angus cattle35. Thus, both Abcf2 and Slc4a2 are viable candidates for this region. The association on chromosome 11 contains the gene Col1a1. In humans, Osteogenesis Imperfecta Type I can be caused by a null allele of COL1A1 and results in gracile bones with decreased strength36,37. COL1A1 is also associated with other bone size phenotypes38, making Col1a1 a likely causal gene for this locus.

Finally, we identified several associations for behavioral traits, including methamphetamine sensitivity on chromosome 6 at rs22397909 (p-value: 9.03 × 10⁻⁷) and chromosome 9 at rs46497021 (p-value: 1.58 × 10⁻⁶); these associations account for 2.6% and 2.1% of the phenotype variance, respectively (Figure 4). We also identified an association for anxiety-like behavior with rs238465220 on chromosome 13 (p-value: 7.31 × 10⁻⁸) that explained 3% of the variance. For prepulse inhibition (12 dB), we identified associations with rs264716939 on chromosome 7 (Figure 3C) and rs230308064 on chromosome 13 (p-values: 1.18 × 10⁻⁶ and 2.17 × 10⁻⁶, respectively). There were many genes in the ~3 Mb region on chromosome 7 that were associated with PPI (Figure 3E), making it difficult to identify the causal gene(s). Candidate genes for the associations with behavioral traits are discussed below.

Figure 4. Overview of eQTL mapping.

(A) Color of each pixel in the matrix depicts the lowest p-value among all eQTLs using a 10 Mb × 10 Mb window. (B) Overlap of genes with eQTLs in the three brain tissues detected using the traditional cis-eQTL mapping method (not ASE). The permutation-based p-value threshold for each eQTL is 0.05. (C) Genome-wide scan for total locomotor activity on day 3 of the methamphetamine sensitivity tests. (D) Association signal for total locomotor activity in the QTL region on chromosome 9. (E) Association signal for expression of Azi2 in the striatum, in the same region as panel D. Dotted red lines indicate thresholds (p < 0.1) estimated via permutation tests.

eQTLs

In an effort to identify causal genes within our behavioral QTLs, we mapped eQTLs for three brain regions that are critical for the behaviors that we studied. We performed RNA-Seq on messenger RNA (mRNA) from three brain regions: hippocampus (n = 79), striatum (n = 55) and prefrontal cortex (n = 54). In a cis-eQTL scan that was limited to the region flanking the gene being interrogated (Supplementary Figures 19, 20), we identified a total of 6,045 associations for 4,174 genes (Figure 4A, Supplementary Figure 21, Supplementary Table 3) at a permutation-derived significance threshold of p < 0.05 (this threshold applies per gene and per brain region). For 534 of those genes we identified a cis-eQTL in all three tissues. For an additional 803 genes, we identified a cis-eQTL in two of the three tissues (Figure 4B). The RNA-Seq data were generated from a set of partially overlapping individuals; therefore, we did not perform a joint analysis of the three brain tissues39.

In addition, we searched for cis-eQTLs by examining allele-specific expression (ASE), which measures relative expression of the two possible RNA alleles derived from a heterozygous SNP40,41. We identified 655 genes with ASE in at least one of the three tissues. Of these, 380 (58%) were found only using ASE, and 275 (42%) were also identified in the conventional cis-eQTL scan, suggesting that there was more overlap than would be expected by chance. Overlap was likely limited by several factors, including type I errors in the ASE and type II errors in both the ASE and conventional cis-eQTL mapping.
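To make the ASE idea concrete, the sketch below tests whether reads covering a heterozygous SNP are split evenly between the two alleles. The text above does not state which statistical test was applied, so the exact two-sided binomial test against a 50:50 ratio used here is an assumption, as are the function name and the read counts.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this heterozygous site")
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


# Hypothetical read counts at two heterozygous SNPs.
print(ase_binomial_pvalue(52, 48))  # balanced expression, large p-value
print(ase_binomial_pvalue(80, 20))  # strong imbalance, small p-value
```

In practice, reference-mapping bias and overdispersion of read counts usually call for more elaborate models than a plain binomial test.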

We also mapped eQTLs genome-wide for each gene in an effort to detect trans-eQTLs. We identified 2,278 trans-eQTLs that were significant (p < 0.05 permutation-based threshold) after testing 43,414 transcripts across the three brain regions. We expected almost that many tests to be positive under the null hypothesis. Consistent with this, a quantile-quantile (QQ) plot of these results suggested that only a small number of these results were true positives (Supplementary Figure 22). As expected, most true positive results appear to be from the hippocampus, which had the largest sample size (n = 79).
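The statement that almost as many positives are expected under the null can be checked with simple arithmetic, assuming each of the 43,414 transcript-level scans has a 5% chance of producing a significant result when no true trans-eQTL exists. The function name below is hypothetical.

```python
def expected_null_positives(n_tests, alpha):
    """Expected number of significant tests if every null hypothesis is true."""
    return n_tests * alpha


n_transcripts = 43_414  # transcripts tested across the three brain regions
alpha = 0.05            # permutation-based per-transcript threshold
observed = 2_278        # trans-eQTLs reported as significant

expected = expected_null_positives(n_transcripts, alpha)
excess = observed - expected
print(f"expected under null: {expected:.0f}, observed: {observed}, excess: {excess:.0f}")
```

The observed count exceeds the null expectation by only about a hundred, which agrees with the QQ-plot interpretation that few of these trans-eQTLs are true positives.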

Integration of behavioral QTLs with eQTLs

Based on evidence from human GWAS, we anticipated that heritable gene expression polymorphisms (eQTLs) would be responsible for most of the observed behavioral associations. Therefore, we tried to identify eQTLs that co-mapped with behavioral QTLs, under the assumption that the eQTL might be the molecular cause of the behavioral QTL. For example, we observed an association between methamphetamine sensitivity and rs46497021 on chromosome 9 (p-value: 1.6 × 10⁻⁶; Figure 4C, Supplementary Figure 18). The implicated region was small (<1 Mb) and contained only 2 genes: Cmc1 and Azi2 (Figure 4D). We identified cis-eQTLs for both genes in the striatum, which is the tissue that is most relevant for methamphetamine sensitivity. However, rs46497021 was most strongly correlated with Azi2 expression (p-value: 1.2 × 10⁻⁸; Figure 4E). In addition, the pattern of SNPs associated with methamphetamine sensitivity and Azi2 expression showed obvious overlap. Therefore, while both Cmc1 and Azi2 are credible positional candidates, the eQTL data suggest that Azi2 is most likely to be the causative gene. Neither gene has been previously implicated in dopaminergic/striatal processes, suggesting that this observation may offer novel insights into the biology of this drug abuse-relevant trait.

Additionally, we identified an association between anxiety-like behavior and rs238465220 on chromosome 13 (p-value: 7.3 × 10⁻⁸, Supplementary Figure 17). The implicated region spanned ~1.5 Mb and contained 4 genes: Chrm3, Larp4b, Dip2c, and Zmynd11. Among those genes, rs238465220 was also associated with expression of Zmynd11 in the hippocampus, suggesting that this locus may influence anxiety-like behavior through regulation of Zmynd11 expression (Supplementary Figure 23). Zmynd11 has not been previously implicated in anxiety; however, copy number variants in ZMYND11 were recently shown to be associated with autistic tendencies and aggressive behaviors in humans42. These examples illustrate the utility of combining GWAS with eQTL data to identify the molecular mechanism by which a chromosomal region influences a complex trait.

Discussion

We performed a GWAS in a commercially available outbred mouse population, which identified numerous physiological, behavioral, and expression QTLs. In several cases the implicated loci were smaller than 1 Mb and contained just a handful of genes that included an obvious candidate. In addition, we used the eQTL results to further parse among the genes in the intervals that were implicated in the behavioral traits.

The goal of using CFW mice was to enhance our mapping resolution. CFW have shorter-range LD than other commercially available populations9. Using GBS genotypes, we estimated LD in CFW mice and compared it to other mapping populations (Figure 2B). The 34th generation of the LG/J×SM/J advanced intercross line (LG×SM-AIL) that we have used in prior studies43–46 showed more extensive LD compared to CFW. Various outbred heterogeneous stocks (HS), typically made up of 8 inbred strains, have also been used in prior mapping efforts47–49. We examined one HS49, and found that it also had longer range LD compared to CFW. The Hybrid Mouse Diversity Panel (HMDP)50,51, which is a collection of approximately 100 inbred mouse strains that has been used for QTL mapping, also showed greater LD compared to CFW, as did a smaller panel of 30 inbred strains52. The DO12,18,19,53 exhibited LD decay that was almost as rapid as that of CFW. Populations like the AIL and HS (including the DO) are expected to show decreased LD in the future due to the accumulation of additional recombinations (for example, the LG×SM-AIL is now at generation 62). MF-1 is another commercially available outbred population that has been used to map QTLs50,54, but we were unable to obtain the data needed to estimate LD decay in this population. Comparing LD patterns in different populations is a common method for estimating mapping resolution16; however, additional factors including the allele frequency distribution55, population structure56, error rates and the number, effect size and frequency of causal variants all influence power and mapping resolution. Despite these limitations, our comparison of LD (Figure 2B) and our mapping results (Figures 3–4, Supplementary Table 2, Supplementary Figures 14–18, 21–23) suggest that CFW mice are an attractive option for fine-mapping studies.

Another important parameter for GWAS studies is allele frequency, since power to identify associations increases with greater MAF. Laboratory mouse populations have higher average MAF compared to humans or wild populations17. We found that 73% of SNPs genotyped in this study had MAF > 0.05, although our SNP filtering steps may have underestimated the number of rare SNPs. Populations produced by crossing inbred strains, such as F2 crosses, recombinant inbred (RI) lines, AILs and HS typically have even more desirable MAF distributions43. Because the ascertainment of SNPs included in genotyping platforms directly influences the estimated MAF distribution, we did not attempt to use publicly available data to compare MAFs in commonly used mapping populations.

We found that CFW mice lacked genetic variability in certain regions; for example, Chromosome 16 had a low density of polymorphic markers as measured using both GBS and MegaMUGA (Figure 2A) and no significant QTLs (Figure 3A). This is an example of a previously described tendency for laboratory mouse populations to harbor regions that are identical-by-descent43,57.

Several other advantages of CFW mice include their commercial availability, their low cost, and the ability to acquire non-siblings upon request. We also found that the CFW mice were easy to handle, and their uniform coat color simplified automated scoring of certain behavioral traits.

One barrier to more widespread adoption of GWAS in mice has been the lack of universal and economical SNP genotyping platforms. One innovative aspect of this paper is the use of GBS to overcome this obstacle. GBS is a reduced-representation sequencing approach in which a small fraction of the genome is sequenced at moderate depth in order to obtain genotypes at a subset of markers. While GBS shares some characteristics with low-coverage whole-genome sequencing58–60, GBS yields high coverage for a subset of the genome, thus acquiring information about fewer SNPs but with greater confidence. Our GBS methods included a custom-designed library preparation protocol (which reduced per-sample costs), and used the standard software toolkits GATK61 and IMPUTE262. An advantage of GBS was that it did not require pre-selection of polymorphic SNPs. We chose conservative criteria for SNP calling, which yielded 92,734 SNPs, of which 14% were newly discovered and possibly unique to CFW. These 92,734 SNPs provided extensive coverage of the genome (Figure 2A) and allowed for fine-mapping (Figures 3C–D, 4D–E and Supplementary Figures 17, 23). The number of markers obtained using GBS can be titrated by varying the restriction enzymes used, the fragment sizes selected and the degree of sample multiplexing. GBS involves imputation to correct errors and to populate missing genotypes, requiring more expertise than analysis of SNP genotyping arrays. Compared to conventional array-based SNP genotyping, GBS had a higher error rate, which is expected to modestly decrease power, but should not produce false positive QTLs since the errors will not be correlated with the traits. We are currently improving genotype imputation methods for populations in which the founder haplotypes are known, such as AILs and HS populations12,45,46,63,64. Because the monetary advantage of GBS over array-based genotyping will continue to improve as sequencing prices decrease, we anticipate that GBS and other sequencing-based approaches will supplant array-based methods in the coming years.

The majority of human GWAS findings implicate regulatory rather than coding differences4,11. The identified haplotypes frequently contain several genes. It is now widely appreciated that even when an association can be localized to a single gene, that gene may not be the cause of the association65, meaning that proximity to the peak SNP is not sufficient to identify the causal gene. eQTLs can provide the crucial link between a region implicated by GWAS and the biological processes that underlie that association. Therefore, a major goal of our study was to integrate behavioral QTL and eQTL data. We used RNA-Seq to examine gene expression in three brain regions that are known to be important for the behavioral traits that we studied. Although Azi2 was not an obvious candidate for the behavioral QTL for methamphetamine sensitivity, our data showing the co-mapping of an eQTL for Azi2 expression in the striatum provide an additional layer of evidence. Similarly, Zmynd11 has not been previously implicated in anxiety-like behavior, but the eQTL for Zmynd11 expression in the hippocampus suggests that it is the most promising of the four genes within the behavioral QTL. These examples demonstrate the power of integrating fine-mapping of behavioral QTLs and eQTLs and extend multiple prior mouse studies that have used similar approaches in conjunction with F2 crosses66, RIs67–69, selected lines70, HS71, outbred MF-1 mice50 and the HMDP51,72,73.

RNA-Seq offers a number of advantages relative to array-based gene expression measurements74–80. In particular, we were able to map cis- and trans-eQTLs using a traditional mapping approach, and simultaneously map cis-eQTLs by quantifying ASE. Since only a fraction of genes can be studied using ASE, we did not anticipate complete overlap between genes identified using these two approaches. Using ASE we identified 655 cis-eQTLs of which 42% were also identified as cis-eQTLs using conventional mapping.

We found that physiological traits typically had slightly higher heritabilities than behavioral traits (Supplementary Table 1). We also found the effect sizes of individual associations tended to be higher for physiological traits (Supplementary Table 2), consistent with findings from another recent study in rats81. However, it was not always true that traits with the highest heritabilities also showed the largest effect sizes for individual associations. Because the effect size of individual QTL alleles is of paramount importance for assessing power at a given sample size, and because this parameter is never known in advance, it is not possible to provide general guidelines about the sample size needed for future studies. Based on our results, we suggest that a sample size of 1,000 or more CFW mice should be used for most traits, though traits like testis weight and abnormal BMD would have yielded significant results with just a few hundred mice. While our use of the CFW was intended to increase mapping precision, there is a direct tradeoff between mapping precision and statistical power6; therefore, sample sizes required for studies using CFW will necessarily be larger than for F2, recombinant inbred, or other traditional mapping populations that offer less precision.

Our data do not directly address the reasons that the effect sizes we observed are so much larger than the effect sizes observed in most human GWAS. We can speculate that the unique population history of laboratory mice (domestication, selection and repeated population bottlenecks) has increased the frequency of alleles that may have been rare in ancestral wild mouse populations. It is also true that, unlike many traits studied in human GWAS, the traits we are examining are not disease traits (and thus may not influence fitness), and therefore may not have been influenced by natural selection even among ancestral wild mouse populations from which laboratory populations were originally derived. Furthermore, laboratory mice are drawn from a much more uniform environment, potentially diminishing gene-by-environment interactions that may reduce effect sizes in human GWAS. Finally, because LD in the CFW mice is more extensive as compared to humans, we are effectively testing fewer hypotheses and therefore applied a lower (permutation-derived) significance threshold.

We have shown that use of CFW mice in conjunction with GBS and RNA-Seq provides a powerful and efficient means for identifying genetic associations, and for nominating candidate genes within the associated regions. Compared to other outbred mouse populations, CFW mice showed rapid decay of LD (Figure 2B), were less expensive, and primarily allowed examination of domesticus-derived alleles (Figure 2C). Compared to human GWAS, this approach provided dramatically reduced costs, the ability to examine phenotypes that include experimental manipulations that would be impractical or unethical in humans, the ability to obtain tissue samples for expression analysis, and the ability to exert exquisite control over environmental variables. Identified genes can be manipulated in future studies via genome engineering82. Thus, our approach can be used to rapidly generate specific and testable hypotheses for a wide array of complex traits. More broadly, our results demonstrate methods and principles that apply to a variety of other model systems.

Online Methods

Animal Models

We phenotyped 1,200 male Carworth Farms White (CFW) mice (Mus musculus) that were obtained from the Charles River Laboratories facility in Portage, Michigan, USA (CRL; strain code: CRL:CFW(SW); facility code: P08). We performed a power analysis using the program Quanto (hydra.usc.edu/gxe). This indicated that 1,200 mice would provide 80% power to detect QTLs that accounted for ~3% of total trait variance with p < 5 × 10⁻⁷. Since our study was completed, the Portage colony has been relocated to Kingston, New York (new code K92). It has been reported that ancestors of the CFW mice were obtained from a large colony of Swiss mice in 1926, and maintained by Dr. Webster at the Rockefeller Institute. A single pair of highly inbred albino mice were later acquired by Carworth Farms and used to initiate an outbred mouse stock. Several mice from this colony were later acquired in 1974 by CRL and were subsequently maintained as an outbred population8–10.
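The power calculation above can be approximated with a short sketch. This is not the Quanto computation: it assumes a single-SNP test whose z-statistic is approximately normal with mean sqrt(n · h² / (1 − h²)), where h² is the fraction of trait variance explained by the QTL, and a two-sided test at level alpha. The function name is hypothetical, and the result need not match Quanto exactly.

```python
from math import sqrt
from statistics import NormalDist


def approx_qtl_power(n, variance_explained, alpha):
    """Approximate power of a two-sided single-SNP test (normal approximation)."""
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Expected z-score of the association test under the alternative.
    shift = sqrt(n * variance_explained / (1.0 - variance_explained))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value when Z ~ N(shift, 1).
    return (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


power = approx_qtl_power(n=1200, variance_explained=0.03, alpha=5e-7)
print(f"approximate power: {power:.2f}")
```

Varying `n` in this sketch shows how quickly power falls for QTLs of this size as the sample shrinks, which is the tradeoff behind the sample-size recommendation in the Discussion.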

Every two weeks, 48 male CFW mice were shipped from CRL in Portage, MI to our laboratory in Chicago, IL. We requested that CRL send only one mouse from each litter to avoid obtaining siblings, since close relatives reduce power to map QTLs, and complicate analysis. The average age of the mice upon arrival in our labs was 35 days (ranging from 34 to 46 days), and their average weight was 25.5 g (ranging from 13.4 g to 38.7 g). Mice were housed 4 per cage and given ~15 days to adapt to their new environment (Supplementary Figure 1). Standard lab chow and water were available ad libitum, except during the behavioral procedures and prior to testing for fasting glucose. Mice were maintained on a standard 12:12h light-dark cycle (lights on at 06:30). All phenotyping occurred during the light phase between 08:00 and 16:00 hours, over the period of August 2011 to December 2012. All procedures were approved by the University of Chicago Institutional Animal Care and Use Committee (IACUC) in accordance with National Institutes of Health guidelines for the care and use of laboratory animals.

Phenotyping

The order of phenotyping was identical for each mouse, and is shown schematically in Supplementary Figure 1. One day after arrival, mice were fasted for four hours prior to measurement of blood glucose levels. Fourteen days later, we assessed their response to a novel environment and to administration of 1.5 mg/kg of methamphetamine in a 3-day paradigm [64]. Twelve days later, we tested mice for conditioned fear [46]. Nine days after that, we tested mice for prepulse inhibition (PPI) [44]. Finally, after 15 days we weighed and sacrificed the mice. Immediately after sacrifice, we weighed testis, collected one leg for measurement of muscle phenotypes, and collected the other leg for measurement of bone phenotypes. We also measured tail length at this time (Supplementary Note).

RNA-Seq

After sacrifice, we collected brain tissue from a subset of mice as a source of mRNA from the hippocampus (n = 79), striatum (n = 55) and prefrontal cortex (n = 54). We used RNA-Seq [84,85] to quantify gene transcript abundance in these brain tissues. Library preparation was performed with the TruSeq RNA Sample Kit (Illumina). Samples were multiplexed 5 per lane and sequenced on an Illumina HiSeq 2000 sequencer, using single-end 100-bp reads. We processed the RNA-Seq short reads using the Tuxedo software suite [79]: (1) first, we aligned the short reads to the reference genome assembly (NCBI release 38, mm10) with bowtie2 [86]; (2) next, we used tophat2 [79] to align the short reads to known splice junctions; (3) finally, we used cufflinks [87] to calculate, for each gene, a gene-level measure of expression based on the mapped reads. This measure is reported in reads per kilobase per million reads mapped (RPKM). Because RPKM is normalized for the length of coding sequences and for the sequencing depth of each sample, eQTL mapping will not be biased by these factors. We focused on this gene-level measurement for subsequent investigation, including eQTL mapping and assessment of allele-specific expression (ASE). See Supplementary Note for further details.
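To make the RPKM unit concrete, the following minimal sketch computes it directly from raw read counts. It is only an illustration of the definition: cufflinks estimates abundances with its own statistical model, and the gene names, counts and lengths below are invented for the example.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    read_counts: dict mapping gene -> number of reads mapped to that gene.
    gene_lengths_bp: dict mapping gene -> transcript length in base pairs.
    """
    total_mapped = sum(read_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_mapped)
        for gene, count in read_counts.items()
    }


# Hypothetical toy data: geneB has three times the reads of geneA but is
# also three times as long, so both receive the same RPKM.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 4000}
values = rpkm(counts, lengths)
for gene, value in values.items():
    print(gene, round(value, 1))
```

Dividing by both transcript length and the total number of mapped reads is what removes the dependence on gene length and sequencing depth described above.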

Genotyping-by-sequencing (GBS)

Genotyping-by-sequencing (GBS) is a reduced-representation genotyping method for obtaining genotyping information by sequencing only regions that are proximal to a restriction enzyme cut site [15]. Our protocol was adapted from previously described procedures [88]. GBS libraries were prepared by digesting genomic DNA with a restriction enzyme, PstI, and annealing oligonucleotide adapters to the resulting overhangs. Samples were multiplexed 12 per lane, and sequenced on an Illumina HiSeq 2000 sequencer using single-end 100-bp reads. We obtained an average of 4.8M reads per sample. By focusing the sequencing effort on the PstI restriction sites, we obtained high coverage (~15x, Supplementary Figure 24) at a subset of genomic loci, although the reads were distributed very non-uniformly. We aligned the 100-bp single-end reads to Mouse Reference Assembly 38 from the NCBI database (mm10) using bwa [89]. We used GATK [61,90] to discover variants and to obtain genotype probabilities. For the Variant Quality Score Recalibration (VQSR) step, we calibrated variant discovery against (1) whole-genome sequencing (WGS) data that we ascertained from a small set of CFW mice, (2) SNPs and indels from the Wellcome Trust Sanger Mouse Genome project [91], and (3) SNPs available in dbSNP release 137. We used IMPUTE2 [62] to improve low-confidence genotypes, or genotypes that were not called in individual mice. Supplementary Table 4 and the Supplementary Note detail our efforts to estimate the error rate of GBS in this study. The Supplementary Note also contains a description of a small number of SNPs that were discarded because a large proportion of the genotypes were imputed with low certainty. Finally, the Supplementary Note details our identification of 110 DNA samples that appeared to be mislabeled and were therefore excluded from our study (Supplementary Figures 25–28).

Treemix analysis

We estimated phylogenetic relationships between the CFW mice and different lab strains sequenced as part of the Wellcome Trust mouse genome sequencing project using treemix [92]. We used the genotypes for the lab strains sequenced by the Wellcome Trust to obtain the locations of SNPs that were identified in the CFW mice using our GBS pipeline. We excluded the Mus spretus strain from the Wellcome Trust data, since that project included this strain as an outgroup. Since the lab strains are all inbred, we assumed that the allele frequency was 1 or 0. We represented each strain by only a single individual. We used a subset of 100 CFW mice to compute the allele frequencies from the genotype likelihoods of GBS SNPs in our sample. Treemix was used to fit a maximum-likelihood tree to all the lab strains and CFW samples.

QTL mapping for behavioral and physiological traits

We performed a GWAS for the behavioral and physiological phenotypes using all SNPs with MAF > 2% and good imputation quality (defined as 95% of the samples having a maximum genotype probability greater than 0.5). Although our analyses did not suggest the presence of close relatives or population structure, we used the linear mixed model (LMM) implemented in the program GEMMA [93]. This model is similar to a standard linear regression, in which the quantitative trait (Y) is modeled as a linear combination of the genotype (X) and the covariates (Z), except that it includes an additional “random” or “polygenic” effect capturing the covariance structure in the phenotype that is attributed to genome-wide genetic sharing:

$$y_i = \mu + z_{i1}\alpha_1 + \cdots + z_{im}\alpha_m + x_{ij}\beta_j + u_i + \varepsilon_i.$$

The notation in this expression is defined as follows: $y_i$ is the ith phenotype sample; $z_{ik}$ is the ith sample of covariate k, in which k ranges from 1 to the number of covariates included in the regression (m); $\alpha_k$ is the coefficient corresponding to covariate k; $x_{ij}$ is the genotype of sample i at SNP j; $\beta_j$ is the coefficient corresponding to SNP j; $u_i$ is the polygenic effect for the ith sample; $\varepsilon_i$ is the residual error; and $\mu$ is the intercept. The genotype, $x_{ij}$, is represented as the expected allele count, in which 0 represents a homozygous major allele and 2 represents a homozygous minor allele, and $\beta_j$ is the additive effect of the expected allele count on the phenotype. The residuals $\varepsilon_i$ are assumed to be i.i.d. normal with zero mean and variance $\sigma^2$, whereas the polygenic effect $u = (u_1, \ldots, u_n)^T$ is a random vector drawn from the multivariate normal distribution with mean zero and n × n covariance matrix $\sigma^2 \lambda K$, where n is the number of samples.
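The genotype coding and the SNP filters described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: genotype probabilities are assumed to be ordered as (homozygous major, heterozygous, homozygous minor), "95% of the samples" is read as "at least 95%", and the function names are hypothetical.

```python
def expected_allele_count(probs):
    """Expected minor-allele count from (P(hom major), P(het), P(hom minor))."""
    return probs[1] + 2.0 * probs[2]


def passes_filters(snp_probs, min_maf=0.02, min_prob=0.5, min_fraction=0.95):
    """Apply the MAF and imputation-quality filters to one SNP.

    snp_probs: list with one genotype-probability triple per sample.
    """
    n = len(snp_probs)
    dosages = [expected_allele_count(p) for p in snp_probs]
    freq = sum(dosages) / (2.0 * n)
    maf = min(freq, 1.0 - freq)
    # Count samples whose most probable genotype exceeds min_prob.
    confident = sum(1 for p in snp_probs if max(p) > min_prob)
    return maf > min_maf and confident >= min_fraction * n


# Hypothetical toy SNPs, 20 samples each.
well_called = [(0.9, 0.1, 0.0)] * 10 + [(0.05, 0.9, 0.05)] * 10
monomorphic = [(1.0, 0.0, 0.0)] * 20
uncertain = well_called[:17] + [(0.4, 0.3, 0.3)] * 3

print(passes_filters(well_called), passes_filters(monomorphic), passes_filters(uncertain))
```

Only the first toy SNP passes: the second fails the MAF filter and the third has too many samples without a confident genotype.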

We estimated the relatedness matrix, K, from the genotype data. We specified the covariance matrix using the realized relationship matrix $K = XX^T/p$, where p is the number of SNPs, and X is the n × p genotype matrix with entries $x_{ij}$. This formulation was derived from a polygenic model of the phenotype in which all SNPs helped explain variance in the phenotype, and the contributions of individual SNPs were i.i.d. normal [94–96].
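A minimal pure-Python sketch of the realized relationship matrix is given below. It centers each SNP column before forming the product, following the centering assumption stated in the heritability section; whether and how columns were scaled in the actual analysis is not specified here, so this should be read as an illustration only. The function name and toy genotypes are hypothetical.

```python
def realized_relationship_matrix(genotypes):
    """Return K = X X^T / p for an n x p genotype matrix (list of rows).

    Each SNP column is centered to mean zero before the product is formed.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(x[i], x[k])) / p for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 samples, 3 SNPs.
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
K = realized_relationship_matrix(toy_genotypes)
for row in K:
    print([round(v, 3) for v in row])
```

In this toy example the first two samples carry opposite genotypes and therefore receive a negative off-diagonal entry, while the third sample sits at the column means and contributes zeros.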

The inclusion of a genetic marker in both the fixed and random terms can deflate the test statistic for this marker, leading to a loss of power to detect a QTL; this problem has been termed “proximal contamination” [95]. To avoid proximal contamination, we computed 19 different K matrices, each one excluding one of the 19 autosomes. To scan markers on a given chromosome, we used the version of K that did not include that chromosome. We have previously proposed this leave-one-chromosome-out (LOCO) approach as a simple solution for avoiding the problem of proximal contamination [23].
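The leave-one-chromosome-out idea can be sketched as follows: for each chromosome, build K from only the SNPs located on the other chromosomes. The helper names, the centering step and the toy data are assumptions made for the illustration, not a description of the GEMMA workflow.

```python
def kinship(genotypes, columns):
    """K = X X^T / p using only the given SNP columns (centered)."""
    n = len(genotypes)
    p = len(columns)
    means = {j: sum(row[j] for row in genotypes) / n for j in columns}
    centered = [[row[j] - means[j] for j in columns] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / p for k in range(n)]
        for i in range(n)
    ]


def loco_kinship(genotypes, snp_chromosomes):
    """Map each chromosome to the K matrix computed without its SNPs."""
    matrices = {}
    for chrom in sorted(set(snp_chromosomes)):
        keep = [j for j, c in enumerate(snp_chromosomes) if c != chrom]
        matrices[chrom] = kinship(genotypes, keep)
    return matrices


# Hypothetical toy data: 3 samples, 4 SNPs on chromosomes 1, 1, 2 and 3.
toy_genotypes = [[0, 2, 1, 0], [1, 0, 1, 2], [2, 1, 1, 1]]
toy_chromosomes = [1, 1, 2, 3]
loco = loco_kinship(toy_genotypes, toy_chromosomes)
print(sorted(loco))   # one matrix per chromosome
print(loco[1])        # K used when scanning markers on chromosome 1
```

When scanning a marker on chromosome c, the model would then use `loco[c]`, so the tested marker never contributes to the random effect.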

We used a permutation-based approach to calculate the genome-wide significance threshold for p-values calculated in GEMMA. We estimated the distribution of p-values under the null hypothesis by mapping QTLs in 1,000 randomly permuted data sets, then taking the threshold to be the 100(1 − α)th percentile of this distribution, with α = 0.1. Although this permutation test is technically only valid under the assumption that the samples are exchangeable [97], we have previously suggested that ‘naive’ permutations are generally sufficient [23]. Furthermore, given our observation that population structure is subtle, we expect that this simulation provides a good approximation to the null (Supplementary Note).
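A permutation threshold of this kind can be illustrated with a simplified stand-in for the association test. The sketch below assumes that the genome-wide maximum of a per-SNP statistic is recorded for each permutation, and it uses the squared correlation between phenotype and genotype instead of the GEMMA LMM; all names and the simulated data are hypothetical.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; returns 0 for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(snps, phenotype, n_perm=1000, alpha=0.1, seed=0):
    """100(1 - alpha)th percentile of the genome-wide maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        null_max.append(max(squared_correlation(s, shuffled) for s in snps))
    null_max.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return null_max[index]


# Simulated toy data: 15 SNPs (one list per SNP), 40 samples, no true QTL.
rng = random.Random(1)
n_samples, n_snps = 40, 15
snps = [[rng.choice([0, 1, 2]) for _ in range(n_samples)] for _ in range(n_snps)]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
threshold = permutation_threshold(snps, phenotype, n_perm=200, alpha=0.1)
print(f"squared-correlation threshold at alpha = 0.1: {threshold:.3f}")
```

Taking a high percentile of the maximum statistic is equivalent to taking a low percentile of the minimum p-value when every test has the same null distribution.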

Heritability estimates

Instead of computing a point estimate for $h^2$, which is the usual approach (e.g. using the REML estimate [98]), we evaluated the likelihood over a regular grid of values for $h^2$, which allowed us to directly quantify uncertainty in $h^2$ under the reasonable assumption of a uniform prior for the proportion of variance explained [96].
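The grid-based approach can be sketched as follows: evaluate a log-likelihood at evenly spaced values of the heritability, then normalize so that the values sum to one, which corresponds to a posterior under a uniform prior on the grid. The log-likelihood used here is a made-up placeholder, not the LMM likelihood, and the function names are hypothetical.

```python
import math


def grid_posterior(log_likelihood, n_points=101):
    """Posterior over a regular grid on [0, 1] under a uniform prior."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    logs = [log_likelihood(h) for h in grid]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return grid, [w / total for w in weights]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed interval read off the cumulative posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for h, prob in zip(grid, posterior):
        cumulative += prob
        if lower is None and cumulative >= tail:
            lower = h
        if upper is None and cumulative >= 1.0 - tail:
            upper = h
    return lower, upper


def toy_log_likelihood(h2):
    """Placeholder log-likelihood peaked at 0.3 (illustration only)."""
    return -0.5 * ((h2 - 0.3) / 0.05) ** 2


grid, posterior = grid_posterior(toy_log_likelihood)
posterior_mean = sum(h * p for h, p in zip(grid, posterior))
lower, upper = credible_interval(grid, posterior)
print(f"posterior mean {posterior_mean:.3f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

In a real analysis the placeholder would be replaced by the likelihood of the polygenic model evaluated at each grid value.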

We estimated the SNP heritability, $h^2$, of our phenotypes [21]. Because the GBS SNPs did not completely tag all causal variants (and because we excluded the sex chromosomes), our estimates of $h^2$ underestimated a trait’s true narrow-sense heritability. To estimate $h^2$, we assumed that all genetic markers made some small contribution to variation in the trait, and that these contributions were normally distributed with the same variance [96,98,99]. Under this polygenic model, the covariance of the phenotype measurements was $\mathrm{Cov}(Y_1, \ldots, Y_n) = \sigma^2 H$, where $H = I + \sigma_a^2 K$, I is the n × n identity matrix, K is the n × n realized relatedness matrix, $\sigma_a^2$ is the variance of the additive genetic effects, and $\sigma^2$ is the variance of the residuals. Under this formulation, $\sigma_a^2$ represents the relative contribution of the additive genetic variance, and we can use this parameter to provide an estimate for $h^2$:

$$h^2 = \frac{\sigma_a^2 s_a}{\sigma_a^2 s_a + 1}$$

where $s_a$ is the mean sample variance of all the available SNPs or, equivalently, the mean of the diagonal entries of K, assuming that the columns of X are centered so that each column has a mean of zero. See Supplementary Note for further details.
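The conversion from the variance parameter to the heritability estimate is simple enough to write out directly. In this sketch the SNP variance is computed with divisor n, which is the choice that makes it equal to the mean diagonal entry of K for centered columns; the function names and toy genotypes are hypothetical.

```python
def mean_snp_variance(genotypes):
    """Mean over SNPs of the per-SNP sample variance (divisor n)."""
    n = len(genotypes)
    p = len(genotypes[0])
    total = 0.0
    for j in range(p):
        column = [row[j] for row in genotypes]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / n
    return total / p


def snp_heritability(sigma_a2, s_a):
    """h^2 = sigma_a^2 * s_a / (sigma_a^2 * s_a + 1)."""
    return sigma_a2 * s_a / (sigma_a2 * s_a + 1.0)


# Hypothetical toy data: 3 samples, 3 SNPs.
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
s_a = mean_snp_variance(toy_genotypes)
print(round(s_a, 4), round(snp_heritability(1.5, s_a), 4))
```

The formula maps any non-negative value of the variance parameter into the interval [0, 1), with a value of zero giving a heritability of zero.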

Expression QTL (eQTL) mapping

We used the RPKM measurements from RNA-Seq and the GBS genotype data to map eQTLs. We performed an eQTL scan separately in each brain tissue (hippocampus, striatum, prefrontal cortex). First, we discarded genes with low levels of expression (RPKM < 1), and genes that showed no variability in expression. For the remaining genes, we quantile-normalized the expression data. To account for unknown confounders, we removed linear effects of the first few principal components (PCs) calculated from the K × N gene expression matrix (20 PCs for hippocampus, 10 PCs for striatum, 20 PCs for prefrontal cortex) [41]. After removing linear effects of the PCs, we again quantile-normalized the expression data. We then used an LMM as implemented in GEMMA to scan for cis-eQTLs, as described above for the behavioral and physiological phenotypes. To define cis-eQTLs, we only considered SNPs within 1 Mb of the gene’s transcribed region (preliminary analyses indicated that 1 Mb captured most of the significant signals; Supplementary Figures 19–20). We used a permutation-based approach to calculate significance thresholds for p-values in the cis-eQTL mapping. We used 1,000 permutations of the expression values to compute a separate significance threshold for each gene, using only the SNPs that were included in the cis-eQTL scan. In addition to cis-eQTL scans, we also performed genome-wide trans-eQTL scans for all the genes. The genome-wide scans were performed using the same LMM that was used for cis-eQTL analyses, except that all the SNPs outside a 2 Mb region around the gene were included in the trans-eQTL analysis. The significance threshold for trans-eQTLs was computed using permutations of 1,000 randomly selected genes in each tissue; this approach is permissible because all expression traits were quantile-normalized (Supplementary Note).
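Two of the steps above lend themselves to a short illustration: quantile normalization of one gene's expression values and selection of the SNPs in the cis window. The sketch assumes that quantile normalization means a rank-based mapping to standard normal quantiles, which is one common reading and may not match the exact procedure used; it also assumes all SNP positions lie on the gene's chromosome. Names and data are hypothetical.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-based mapping to standard normal quantiles (ties by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std = NormalDist()
    normalized = [0.0] * n
    for rank, i in enumerate(order, start=1):
        normalized[i] = std.inv_cdf(rank / (n + 1))
    return normalized


def cis_snps(snp_positions, gene_start, gene_end, window_bp=1_000_000):
    """Positions within window_bp of the gene's transcribed region."""
    return [
        pos for pos in snp_positions
        if gene_start - window_bp <= pos <= gene_end + window_bp
    ]


# Hypothetical expression values for one gene across three samples.
normalized = quantile_normalize([5.0, 1.0, 3.0])
print([round(v, 3) for v in normalized])

# Hypothetical SNP positions and gene coordinates (base pairs).
selected = cis_snps([100, 1_500_000, 3_500_000], 2_000_000, 2_100_000)
print(selected)
```

Because every gene ends up with the same marginal distribution after this kind of normalization, a null distribution estimated from one set of genes can plausibly be shared across genes, which is the rationale given above for the trans-eQTL threshold.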

Allele-specific expression (ASE)

Finally, we performed an analysis of allele-specific expression (ASE) to identify genes that had ASE QTLs. This analysis was performed independently from the mapping of cis-eQTLs described above. We identified variants that had at least 10 samples with high-confidence heterozygote genotype calls. For genes that contained at least one such variant, we compared the relative expression of the two alleles across these heterozygote samples. To account for overdispersion, we used a beta-binomial model to fit the counts of the two alleles for each sample. We then used a likelihood-ratio test to test for significant deviation of the observed data from the expectation of equal counts from both alleles (Supplementary Figure 29 and Supplementary Note).
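A beta-binomial likelihood-ratio test of allelic balance can be sketched in pure Python. The details here are assumptions made for illustration: the model is parameterized by a mean allelic fraction and an overdispersion parameter shared across heterozygous samples, the maximization uses a coarse grid search rather than a numerical optimizer, and the statistic is referred to a chi-square distribution with 1 degree of freedom. The read counts are invented.

```python
import math


def betabinom_logpmf(k, n, mu, rho):
    """Beta-binomial log-probability with mean mu and overdispersion rho."""
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den


def log_likelihood(counts, mu, rho):
    """counts: list of (reads from allele 1, reads from allele 2) per sample."""
    return sum(betabinom_logpmf(ref, ref + alt, mu, rho) for ref, alt in counts)


MU_GRID = [i / 100 for i in range(1, 100)]   # includes 0.5
RHO_GRID = [i / 50 for i in range(1, 50)]


def ase_lrt(counts):
    """Likelihood-ratio test of mu = 0.5 against a free mu."""
    null = max(log_likelihood(counts, 0.5, r) for r in RHO_GRID)
    alt = max(log_likelihood(counts, m, r) for m in MU_GRID for r in RHO_GRID)
    stat = max(0.0, 2.0 * (alt - null))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    return stat, p_value


# Hypothetical allele counts in ten heterozygous samples.
balanced = [(50, 50), (52, 48), (47, 53), (49, 51), (51, 49),
            (50, 50), (48, 52), (53, 47), (50, 50), (50, 50)]
skewed = [(80, 20), (78, 22), (75, 25), (82, 18), (79, 21),
          (77, 23), (81, 19), (76, 24), (80, 20), (78, 22)]
stat_balanced, p_balanced = ase_lrt(balanced)
stat_skewed, p_skewed = ase_lrt(skewed)
print(f"balanced p = {p_balanced:.3g}, skewed p = {p_skewed:.3g}")
```

The overdispersion parameter lets the null model absorb extra sample-to-sample variability, so a consistent shift away from equal counts is needed before the test rejects.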

Acknowledgments

This project was funded by NIH R01GM097737 and P50DA037844 (A.A.P.), NIH T32DA07255 (C.C.P.), NIH T32GM07197 (N.M.G.), NIH R01AR056280 (D.A.B.), NIH R01AR060234 (C.A.B.), a fellowship from the Human Frontier Science Program (P.C.), and the Howard Hughes Medical Institute (J.K.P.). The authors wish to acknowledge technical assistance from Dana Godfrey, Sima Lionikaite, Vikte Lionikaite, Ausra S. Lionikiene, and John Zekos, as well as technical and intellectual input from Drs. Mark Abney, Justin Borevitz, Karl Broman, Na Cai, Riyan Cheng, Nancy Cox, Robert Davies, Jonathan Flint, Leo Goodstadt, Paul Grabowski, Bettina Harr, Ellen Leffler, Richard Mott, Jerome Nicod, John Novembre, Alkes Price, Matthew Stephens, Daniel Weeks, and Xiang Zhou. Finally, we thank the anonymous reviewers for their input, which improved the quality of our study.

Footnotes

The authors declare no competing financial interests.

Author Contributions

A.A.P. conceived of the study; C.C.P. and A.A.P. supervised the project; S.G. and P.C. designed and implemented the statistical and bioinformatics analyses with contributions from C.C.P., J.K.P. and A.A.P.; N.M.G. designed and executed the RNA-Seq and GBS protocols with assistance from E.A. and J.D.; C.C.P. performed the behavioral phenotyping with assistance from E.L. and Y.J.P.; A.L. performed the muscle and bone phenotyping with input from D.A.B.; C.L.A-B. performed the bone mineral density phenotyping; C.C.P., S.G., P.C. and A.A.P. wrote the paper, with input from all co-authors.

References

  • 1.Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008;118:1590–1605. doi: 10.1172/JCI34772. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Manolio TA. Bringing genome-wide association findings into clinical use. Nat Rev Genet. 2013;14:549–558. doi: 10.1038/nrg3523. [DOI] [PubMed] [Google Scholar]
  • 3.Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. doi: 10.1038/nature13595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Albert FW, Kruglyak L. The role of regulatory variation in complex traits and disease. Nat Rev Genet. 2015;16:197–212. doi: 10.1038/nrg3891. [DOI] [PubMed] [Google Scholar]
  • 5.Mott R, Flint J. Dissecting quantitative traits in mice. Annu Rev Genomics Hum Genet. 2013;14:421–439. doi: 10.1146/annurev-genom-091212-153419. [DOI] [PubMed] [Google Scholar]
  • 6.Parker CC, Palmer AA. Dark matter: are mice the solution to missing heritability? Front Genet. 2011;2:32. doi: 10.3389/fgene.2011.00032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lynch CJ. The so-called Swiss mouse. Lab Anim Care. 1969;19:214–220. [PubMed] [Google Scholar]
  • 8.Rice MC, O’Brien SJ. Genetic variance of laboratory outbred Swiss mice. Nature. 1980;283:157–161. doi: 10.1038/283157a0. [DOI] [PubMed] [Google Scholar]
  • 9.Yalcin B, et al. Commercially Available Outbred Mice for Genome-Wide Association Studies. PLoS Genet. 2010;6 doi: 10.1371/journal.pgen.1001085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Chia R, Achilli F, Festing MFW, Fisher EMC. The origins and uses of mouse outbred stocks. Nat Genet. 2005;37:1181–1186. doi: 10.1038/ng1665. [DOI] [PubMed] [Google Scholar]
  • 11.Gusev A, et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet. 2014;95:535–552. doi: 10.1016/j.ajhg.2014.10.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Gatti DM, et al. Quantitative trait locus mapping methods for diversity outbred mice. G3 Bethesda Md. 2014;4:1623–1633. doi: 10.1534/g3.114.013748. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Morgan AP, et al. The Mouse Universal Genotyping Array: From Substrains to Subspecies. G3 Bethesda Md. 2015 doi: 10.1534/g3.115.022087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Yang H, et al. A customized and versatile high-density genotyping array for the mouse. Nat Methods. 2009;6:663–666. doi: 10.1038/nmeth.1359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Elshire RJ, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS One. 2011;6:e19379. doi: 10.1371/journal.pone.0019379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet. 2001;69:1–14. doi: 10.1086/321275. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Laurie CC, et al. Linkage disequilibrium in wild mice. PLoS Genet. 2007;3:e144. doi: 10.1371/journal.pgen.0030144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Chesler EJ. Out of the bottleneck: the Diversity Outcross and Collaborative Cross mouse populations in behavioral genetics research. Mamm Genome Off J Int Mamm Genome Soc. 2014;25:3–11. doi: 10.1007/s00335-013-9492-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Churchill GA, Gatti DM, Munger SC, Svenson KL. The Diversity Outbred mouse population. Mamm Genome Off J Int Mamm Genome Soc. 2012;23:713–718. doi: 10.1007/s00335-012-9414-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Collaborative Cross Consortium. The genome architecture of the Collaborative Cross mouse genetic reference population. Genetics. 2012;190:389–401. doi: 10.1534/genetics.111.132639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lee SH, et al. Estimation of SNP heritability from dense genotype data. Am J Hum Genet. 2013;93:1151–1155. doi: 10.1016/j.ajhg.2013.10.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Wray NR, et al. Pitfalls of predicting complex traits from SNPs. Nat Rev Genet. 2013;14:507–515. doi: 10.1038/nrg3457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Cheng R, Palmer AA. A simulation study of permutation, bootstrap, and gene dropping for assessing statistical significance in the case of unequal relatedness. Genetics. 2013;193:1015–1018. doi: 10.1534/genetics.112.146332. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Mendis SHS, Meachem SJ, Sarraj MA, Loveland KL. Activin A balances Sertoli and germ cell proliferation in the fetal mouse testis. Biol Reprod. 2011;84:379–391. doi: 10.1095/biolreprod.110.086231. [DOI] [PubMed] [Google Scholar]
  • 26.Mithraprabhu S, et al. Activin bioactivity affects germ cell differentiation in the postnatal mouse testis in vivo. Biol Reprod. 2010;82:980–990. doi: 10.1095/biolreprod.109.079855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Tomaszewski J, Joseph A, Archambeault D, Yao HHC. Essential roles of inhibin beta A in mouse epididymal coiling. Proc Natl Acad Sci U S A. 2007;104:11322–11327. doi: 10.1073/pnas.0703445104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Lee SJ. Quadrupling muscle mass in mice by targeting TGF-beta signaling pathways. PloS One. 2007;2:e789. doi: 10.1371/journal.pone.0000789. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Lee SJ, et al. Regulation of muscle mass by follistatin and activins. Mol Endocrinol Baltim Md. 2010;24:1998–2008. doi: 10.1210/me.2010-0127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Lionikas A, et al. Resolving candidate genes of mouse skeletal muscle QTL via RNA-Seq and expression network analyses. BMC Genomics. 2012;13:592. doi: 10.1186/1471-2164-13-592. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Sala D, et al. Autophagy-regulating TP53INP2 mediates muscle wasting and is repressed in diabetes. J Clin Invest. 2014;124:1914–1927. doi: 10.1172/JCI72327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Estrada K, et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat Genet. 2012;44:491–501. doi: 10.1038/ng.2249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zheng HF, et al. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature. 2015;526:112–117. doi: 10.1038/nature14878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Coury F, et al. SLC4A2-mediated Cl-/HCO3- exchange activity is essential for calpain-dependent regulation of the actin cytoskeleton in osteoclasts. Proc Natl Acad Sci U S A. 2013;110:2163–2168. doi: 10.1073/pnas.1206392110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Meyers SN, et al. A deletion mutation in bovine SLC4A2 is associated with osteopetrosis in Red Angus cattle. BMC Genomics. 2010;11:337. doi: 10.1186/1471-2164-11-337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Sillence DO, Senn A, Danks DM. Genetic heterogeneity in osteogenesis imperfecta. J Med Genet. 1979;16:101–116. doi: 10.1136/jmg.16.2.101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Sykes B, Ogilvie D, Wordsworth P, Anderson null, Jones N. Osteogenesis imperfecta is linked to both type I collagen structural genes. Lancet Lond Engl. 1986;2:69–72. doi: 10.1016/s0140-6736(86)91609-0. [DOI] [PubMed] [Google Scholar]
  • 38.Long JR, et al. Association between COL1A1 gene polymorphisms and bone size in Caucasians. Eur J Hum Genet EJHG. 2004;12:383–388. doi: 10.1038/sj.ejhg.5201152. [DOI] [PubMed] [Google Scholar]
  • 39.Flutre T, Wen X, Pritchard J, Stephens M. A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet. 2013;9:e1003486. doi: 10.1371/journal.pgen.1003486. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Serre D, et al. Differential allelic expression in the human genome: a robust approach to identify genetic and epigenetic cis-acting mechanisms regulating gene expression. PLoS Genet. 2008;4:e1000006. doi: 10.1371/journal.pgen.1000006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Pickrell JK, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. doi: 10.1038/nature08872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Coe BP, et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat Genet. 2014;46:1063–1071. doi: 10.1038/ng.3092. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Cheng R, et al. Genome-wide association studies and the problem of relatedness among advanced intercross lines and other highly recombinant populations. Genetics. 2010;185:1033–1044. doi: 10.1534/genetics.110.116863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Samocha KE, Lim JE, Cheng R, Sokoloff G, Palmer AA. Fine mapping of QTL for prepulse inhibition in LG/J and SM/J mice using F(2) and advanced intercross lines. Genes Brain Behav. 2010;9:759–767. doi: 10.1111/j.1601-183X.2010.00613.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Parker CC, et al. Fine-mapping alleles for body weight in LG/J × SM/J F and F(34) advanced intercross lines. Mamm Genome Off J Int Mamm Genome Soc. 2011;22:563–571. doi: 10.1007/s00335-011-9349-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Parker CC, et al. High-resolution genetic mapping of complex traits from a combined analysis of F2 and advanced intercross mice. Genetics. 2014;198:103–116. doi: 10.1534/genetics.114.167056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Talbot CJ, et al. High-resolution mapping of quantitative trait loci in outbred mice. Nat Genet. 1999;21:305–308. doi: 10.1038/6825. [DOI] [PubMed] [Google Scholar]
  • 48.Demarest K, Koyner J, McCaughran J, Cipp L, Hitzemann R. Further characterization and high-resolution mapping of quantitative trait loci for ethanol-induced locomotor activity. Behav Genet. 2001;31:79–91. doi: 10.1023/a:1010261909853. [DOI] [PubMed] [Google Scholar]
  • 49.Valdar W, et al. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet. 2006;38:879–887. doi: 10.1038/ng1840. [DOI] [PubMed] [Google Scholar]
  • 50.Ghazalpour A, et al. High-resolution mapping of gene expression using association in an outbred mouse stock. PLoS Genet. 2008;4:e1000149. doi: 10.1371/journal.pgen.1000149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Orozco LD, et al. Unraveling inflammatory responses using systems genetics and gene-environment interactions in macrophages. Cell. 2012;151:658–670. doi: 10.1016/j.cell.2012.08.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Sittig LJ, Carbonetto P, Engel KA, Krauss KS, Palmer AA. Integration of genome-wide association and extant brain expression QTL identifies candidate genes influencing prepulse inhibition in inbred F1 mice. Genes Brain Behav. 2016;15:260–270. doi: 10.1111/gbb.12262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Svenson KL, et al. High-resolution genetic mapping using the Mouse Diversity outbred population. Genetics. 2012;190:437–447. doi: 10.1534/genetics.111.132597. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Yalcin B, et al. Genetic dissection of a behavioral quantitative trait locus shows that Rgs2 modulates anxiety in mice. Nat Genet. 2004;36:1197–1202. doi: 10.1038/ng1450. [DOI] [PubMed] [Google Scholar]
  • 55.Eberle MA, Rieder MJ, Kruglyak L, Nickerson DA. Allele frequency matching between SNPs reveals an excess of linkage disequilibrium in genic regions of the human genome. PLoS Genet. 2006;2:e142. doi: 10.1371/journal.pgen.0020142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Mangin B, et al. Novel measures of linkage disequilibrium that correct the bias due to population structure and relatedness. Heredity. 2012;108:285–291. doi: 10.1038/hdy.2011.73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Yang H, et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet. 2011;43:648–655. doi: 10.1038/ng.847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.CONVERGE consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature. 2015;523:588–591. doi: 10.1038/nature14659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Le SQ, Durbin R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 2011;21:952–960. doi: 10.1101/gr.113084.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR. Low-coverage sequencing: implications for design of complex trait association studies. Genome Res. 2011;21:940–951. doi: 10.1101/gr.117259.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Parker CC, Sokoloff G, Cheng R, Palmer AA. Genome-wide association for fear conditioning in an advanced intercross mouse line. Behav Genet. 2012;42:437–448. doi: 10.1007/s10519-011-9524-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Parker CC, Cheng R, Sokoloff G, Palmer AA. Genome-wide association for methamphetamine sensitivity in an advanced intercross mouse line. Genes Brain Behav. 2012;11:52–61. doi: 10.1111/j.1601-183X.2011.00747.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Smemo S, et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507:371–375. doi: 10.1038/nature13138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Schadt EE, et al. Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003;422:297–302. doi: 10.1038/nature01434. [DOI] [PubMed] [Google Scholar]
  • 67.Chesler EJ, Lu L, Wang J, Williams RW, Manly KF. WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior. Nat Neurosci. 2004;7:485–486. doi: 10.1038/nn0504-485. [DOI] [PubMed] [Google Scholar]
  • 68.Chesler EJ, et al. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat Genet. 2005;37:233–242. doi: 10.1038/ng1518. [DOI] [PubMed] [Google Scholar]
  • 69.Bystrykh L, et al. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nat Genet. 2005;37:225–232. doi: 10.1038/ng1497. [DOI] [PubMed] [Google Scholar]
  • 70.Palmer AA, et al. Gene expression differences in mice divergently selected for methamphetamine sensitivity. Mamm Genome Off J Int Mamm Genome Soc. 2005;16:291–305. doi: 10.1007/s00335-004-2451-8. [DOI] [PubMed] [Google Scholar]
  • 71.Huang GJ, et al. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res. 2009;19:1133–1140. doi: 10.1101/gr.088120.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Farber CR, et al. Mouse genome-wide association and systems genetics identify Asxl2 as a regulator of bone mineral density and osteoclastogenesis. PLoS Genet. 2011;7:e1002038. doi: 10.1371/journal.pgen.1002038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Calabrese G, et al. Systems genetic analysis of osteoblast-lineage cells. PLoS Genet. 2012;8:e1003150. doi: 10.1371/journal.pgen.1003150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.de Klerk E, ‘t Hoen PAC. Alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet TIG. 2015;31:128–139. doi: 10.1016/j.tig.2015.01.001. [DOI] [PubMed] [Google Scholar]
  • 75.Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]
  • 76.Mane SP, et al. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing. BMC Genomics. 2009;10:264. doi: 10.1186/1471-2164-10-264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Tang F, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6:377–382. doi: 10.1038/nmeth.1315. [DOI] [PubMed] [Google Scholar]
  • 78.Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinforma Oxf Engl. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Walter NA, et al. High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs. BMC Genomics. 2009;10:379. doi: 10.1186/1471-2164-10-379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Rat Genome Sequencing and Mapping Consortium et al. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat Genet. 2013;45:767–775. doi: 10.1038/ng.2644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol. 2014;32:347–355. doi: 10.1038/nbt.2842.
  • 83. Bennett BJ, et al. A high-resolution association mapping panel for the dissection of complex traits in mice. Genome Res. 2010;20:281–290. doi: 10.1101/gr.099234.109.
  • 84. Majewski J, Pastinen T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends Genet. 2011;27:72–79. doi: 10.1016/j.tig.2010.10.006.
  • 85. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484.
  • 86. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
  • 87. Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011;27:2325–2329. doi: 10.1093/bioinformatics/btr355.
  • 88. Grabowski PP, Morris GP, Casler MD, Borevitz JO. Population genomic variation reveals roles of history, adaptation and ploidy in switchgrass. Mol Ecol. 2014;23:4059–4073. doi: 10.1111/mec.12845.
  • 89. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 90. Van der Auwera GA, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;11:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43.
  • 91. Keane TM, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–294. doi: 10.1038/nature10413.
  • 92. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967. doi: 10.1371/journal.pgen.1002967.
  • 93. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–824. doi: 10.1038/ng.2310.
  • 94. Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60. doi: 10.1017/S0016672308009981.
  • 95. Listgarten J, et al. Improved linear mixed models for genome-wide association studies. Nat Methods. 2012;9:525–526. doi: 10.1038/nmeth.2037.
  • 96. Zhou X, Carbonetto P, Stephens M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 2013;9:e1003264. doi: 10.1371/journal.pgen.1003264.
  • 97. Abney M. Permutation testing in the presence of polygenic variation. Genet Epidemiol. 2015;39:249–258. doi: 10.1002/gepi.21893.
  • 98. Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
  • 99. Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010.
