2013 Dec 31;25(1):12–22. doi: 10.1007/s00335-013-9495-6

The genetics of gene expression in complex mouse crosses as a tool to study the molecular underpinnings of behavior traits

Robert Hitzemann 1,2, Daniel Bottomly 3, Ovidiu Iancu 2, Kari Buck 1,2, Beth Wilmot 3,4, Michael Mooney 4, Robert Searles 5, Christina Zheng 4, John Belknap 1,2, John Crabbe 1,2, Shannon McWeeney 1,3,4,6
PMCID: PMC3916704  PMID: 24374554

Abstract

Complex Mus musculus crosses provide increased resolution to examine the relationships between gene expression and behavior. While the advantages are clear, there are numerous analytical and technological concerns that arise from the increased genetic complexity that must be considered. Each of these issues is discussed, providing an initial framework for complex cross study design and planning.

Introduction

Sandberg et al. (2000), using Affymetrix microarrays, were the first to detect differences in genome-wide brain gene expression between two inbred mouse strains (C57BL/6J [B6] and 129SvEv [129; now 129S6/SvEvTac]). Importantly, these authors observed that some differentially expressed (DE) genes were found in chromosomal regions with known behavioral quantitative trait loci (bQTLs). For example, Kcnj9, which encodes GIRK3, an inwardly rectifying potassium channel, was differentially expressed (higher expression in the 129 strain) and is located on distal chromosome 1 in a region where QTLs had been identified for locomotor activity, alcohol and pentobarbital withdrawal, open-field emotionality, and certain aspects of fear-conditioned behavior. This study was unable to address the question of whether or not the elements regulating Kcnj9 expression were located within the QTL intervals and/or near the gene locus. However, it is possible to extract such causal relationships by combining gene expression and genotype data in genetically segregating populations. Jansen and Nap (2001) were among the first to suggest this approach, which they termed “genetical genomics”. Although originally described for Arabidopsis, the strategy was quickly used to examine gene expression in Drosophila, yeast, and the mouse (see Lum et al. 2006 and references therein). Schadt et al. (2003) and others defined the expression QTLs (eQTLs) as either cis (mapping near the gene locus) or trans (mapping elsewhere in the genome). When bQTLs and cis-eQTLs overlap, the cis-eQTL genes are inferred to be strong quantitative trait gene (QTG) candidates (see e.g. Farris et al. 2010). The situation for trans-eQTLs is more complicated since the QTL confidence interval is generally larger and any gene within the QTL interval could have a regulatory role.

The application of genetical genomics to mouse has generally focused on segregating populations involving two inbred strains, one of which is very frequently the B6 strain. Descriptions of these applications are found in the following section. The data analysis is relatively straightforward, especially because good sequence data are available for essentially all strains that would ever be used in a behavioral experiment (Keane et al. 2011). There are, however, problems with the two strain intercross approach. First, two strains will capture only a fraction of the genetic diversity that is available in Mus musculus (Roberts et al. 2007; Keane et al. 2011). Behavioral techniques and apparatus have been engineered for the placid and some would argue somnambulant laboratory strains of mice that are highly related (Roberts et al. 2007). Using SNPs as a surrogate for genetic diversity, a B6 x DBA/2J (D2) F2 intercross has only 1/6 the gene diversity of a heterogeneous stock (HS) formed from the eight inbred strains used to form the collaborative cross (CC) (Churchill et al. 2004; Iancu et al. 2010); the CC strains include three wild-derived strains. Crosses of low genetic diversity are not optimal for systems biology applications (Churchill et al. 2004; Threadgill and Churchill 2012). Second, given high quality sequence data and dense genotyping platforms, the use of complex crosses allows one to extract for any QTL a haplotype structure which in turn can markedly reduce the QTL confidence interval, in some cases to less than 1 Mbp. Although QTLs of this size are still 1–2 orders of magnitude larger than QTLs detected in human association studies, the reduction in size, especially in gene poor regions, is still sufficient to focus the analysis on a handful of candidates.

This article focuses on the use of complex crosses to examine the relationships between gene expression and behavior. Some historical background is provided as the field has moved from simple to complex segregating populations. While the advantages of complex crosses are obvious, there are several disadvantages, especially ones associated with data analysis. Microarray platforms were not designed for complex crosses and thus, RNA-Seq becomes the preferred strategy for assessing gene expression. While RNA-Seq allows one to examine not only gene expression but also the expression of non-coding RNAs, alternative splicing and allele specific expression, the data analysis is computationally intensive. An additional consideration is that the inclusion of wild-derived strains in the HS-CC has sometimes limited the application of this population for mapping certain behavioral responses. Behavioral testing protocols in mice have been primarily established for assessment in the common laboratory strains and increased locomotor activity associated with the inclusion of the wild-derived alleles has raised concerns about testing validity (see Logan et al. (2013) for recent examination of potential impact in the Diversity Outbred).

Model systems for complex populations

One could begin a discussion of brain gene expression, behavior, and complex crosses with Sandberg et al. (2000) (see above) but to fully understand the role of mouse complex crosses in this equation, it is perhaps best to start with a series of papers that appeared more than 20 years ago and demonstrated that it was possible to map QTLs for behavioral traits in recombinant inbred (RI) strains of mice (e.g. Gora-Maslak et al. 1991; Belknap 1992). While several RI panels were available, it was the BXD RI panel (Taylor 1978) that was most widely used. These papers and confirmatory F2 intercross studies clearly established two important and related points. The first was that the QTL effect sizes were generally small; the second, as a consequence, was that the QTL confidence intervals were typically very large, frequently more than 25 cM (or ~50 Mbp). As a result, it was almost impossible to know which gene or genes within the QTL interval were causally related to the phenotype of interest. This search was of course further complicated at the time by the poor annotation of the mouse genome. Several strategies were developed to reduce the QTL interval (see e.g. Darvasi 1998). These included the use of interval specific congenic strains, mapping in advanced intercross populations, recombinant progeny testing, and the recombinant inbred segregation test. Talbot et al. (1999) used a variant of the advanced intercross strategy to map QTLs for open-field behavior in a heterogeneous stock (HS) created from eight inbred laboratory mouse strains. A subsequent analysis of these data (Mott et al. 2000) revealed that it would be possible to map QTLs with good precision and extract an approximate QTL haplotype structure. However, despite these and other improvements, only a very small number of behavioral quantitative trait genes (bQTG) have been identified (see e.g. Shirley et al. 2004). Although QTL resolution at the gene level is not typical in some mouse populations, it can be possible to approach gene level resolution in some commercially available outbred populations (Yalcin and Flint 2012) and interval specific congenic lines (Shirley et al. 2004).

Several approaches have been used to identify and prioritize candidate genes within a QTL interval. This initially focused on allelic sequence variation, but, even just a decade ago this was possible only if one was willing to sequence individual genes. Today, given the availability of high quality inbred strain sequence data (Keane et al. 2011), it is now possible to interrogate a QTL interval and determine which genes harbor non-synonymous coding SNPs that match the QTL profile. An alternative approach, which was widely adopted, was to integrate QTL analysis and gene expression profiling, emphasizing the genetical genomics approach (Jansen and Nap 2001). The emphasis on this approach was key to the development of WebQTL (Wang et al. 2003). Gene expression data from multiple brain regions was made available for the B6 and D2 inbred strains and 32 BXD RI strains. Also posted at the Web site were a variety of RI strain behavioral and genotype data. For many investigators, this was the first portal for examining how the natural variation in gene expression and behavior were correlated. Over the years, the Website has been updated by the inclusion of brain gene expression data from other RI panels, mouse F2 intercrosses, additional BXD RI strains, and a significant number of inbred mouse strains, including whole brain and brain regional data. The data have been used in a variety of ways, including detecting how patterns of gene co-expression have behavioral associations (Chesler et al. 2005).

Peirce et al. (2006) mined the data to address the question of “how reliable are eQTLs?”. These authors noted that for B6xD2 genotypes, cis-eQTLs are highly replicable but that there is an overabundance of eQTLs where the B6 strain is associated with higher expression. These data suggested that some of these QTLs were artifacts due to SNPs and the poor hybridization of the D2 cDNA. Subsequent experiments showed that this was indeed the case (Walter et al. 2007, 2009). Flint and colleagues (see Solberg et al. 2006; Valdar et al. 2006a, b) mapped QTLs for a variety of behavioral phenotypes in >2,000 HS animals; this HS population (HS/NPT), also an eight strain cross, differed from that used by Talbot et al. (1999). Importantly for this article, Flint and colleagues collected hippocampal gene expression data on 460 animals (Huang et al. 2009). Similar to Peirce et al. (2006), Huang et al. (2009) concluded that a significant proportion of the cis-eQTLs were hybridization artifacts. Nonetheless and not unexpectedly, the number of “true” cis-eQTLs appeared to be significantly greater than those previously detected in simpler crosses; i.e., in the HS population, additional regulatory alleles are detected. Similar results were obtained for gene expression in a simpler HS (HS4), derived from crossing four laboratory strains (Malmanger et al. 2006).

The CC (Churchill et al. 2004) was formed to provide a unique systems biology resource that addresses many shortcomings in available mouse strain resources, such as limited genetic diversity. The goal was to generate >1,000 RI strains formed from eight inbred strain founders that capture >90 % of the genetic diversity available in Mus musculus. Three of the CC founders are wild-derived strains. Although it appears that only several hundred RI strains will reach completion, the CC, like the BXD RI panel, will in time provide an important reference population for examining gene-behavior relationships. Two outbred versions of the CC have been created, the HS-CC and the Diversity Outbred (DO) (Iancu et al. 2010; Churchill et al. 2012). To date, brain gene expression data are only available for the HS-CC. Iancu et al. (2010) compared brain (striatum) gene expression in a B6xD2 F2 intercross, HS4, and HS-CC animals. Although it was assumed that the regulation of gene expression would differ in each of the populations, it was also assumed that, given that striatal function is not cross dependent, at some level function and gene expression should overlap in a similar way for all three crosses. To address this issue, Iancu et al. (2010) utilized the Weighted Gene Co-expression Network Analysis (WGCNA) (Zhang and Horvath 2005). This analysis builds from the premise that (a) gene expression networks have scale free properties (i.e. there are a few highly connected nodes) and (b) co-expressed genes share similar functions. The analysis revealed that while there were some cross-dependent differences, the overall modular substructure of the co-expression network was cross independent; the highly connected nodes remained intact. Iancu et al. (2013) next asked if selection for a behavioral phenotype (haloperidol-induced catalepsy) had similar effects on expression network structure across the three crosses. The results obtained are both interesting and cautionary as we press forward examining complex cross gene expression. The selection paradigm was short-term (3–4 generations), the rate of segregation of the responsive and non-responsive lines was similar, and the responsive and non-responsive lines all differed by greater than 20-fold in the haloperidol ED50. The difference in response was not pharmacokinetic. The first key observation was that there was no overlap of differential gene expression for the three selections. The second key observation was that as genetic diversity increased, the number of co-expression modules affected by selection also increased. It was possible to identify a core set of modules affected by selection. What is unknown is whether or not the additional modules that were affected by selection (e.g. in the HS-CC population) are relevant to our understanding of the gene-behavior relationship.
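The network construction step behind WGCNA can be illustrated with a minimal sketch. It assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power; the function names and the default power of 6 are illustrative assumptions, not details taken from the studies cited above. Connectivity, the row sum of the adjacency matrix, is the quantity by which highly connected nodes are identified.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, with a zero diagonal.

    expr is a list of genes, each a list of expression values across animals.
    beta=6 is an assumed default; in practice it is tuned per data set.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies to all other genes."""
    return [sum(row) for row in adj]
```

Raising correlations to a power suppresses weak correlations more strongly than strong ones, which is what allows a small number of hub genes to dominate the connectivity distribution.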

Phenotype measurements in eQTL analysis

Several technological advances have fundamentally altered the definition of phenotype in QTL studies. Mapping RNA transcript and protein abundance levels is widespread, and in principle any biologic characteristic of interest can be tested for association with genetic polymorphisms. In the context of neurobehavioral traits, examples include number of neuronal cells in specific brain regions (Rosen and Williams 2001; Airey et al. 2001) and also brain morphometry (Li et al. 2005; Jan et al. 2008). The focus of this review is on high-throughput methodologies and in particular measurements of gene expression such as microarrays, qPCR, and RNA-Seq. While these technologies offer tremendous breadth to transcriptome analysis, several factors can adversely affect the quality of the results. All technologies assume intact RNA; the extent to which this assumption is true can be evaluated using the RNA integrity number (RIN) (Schroeder et al. 2006). From human studies, it has been shown that possible confounding factors include length of time post-mortem and the pH of the sample; statistical analysis can incorporate these as covariates (Liu 2011). For hybridization based methods, factors affecting probe matching can strongly affect expression measurement (Walter et al. 2007); these errors can further propagate in the course of eQTL mapping (Iancu et al. 2012). PCR based methods can also be affected by polymorphisms within the primer sequence. Taking into account these factors has beneficial effects on the downstream analysis.

Batch effects can introduce serious confounding factors in the analysis of expression levels; ideally, all samples should be processed at the same time. If separate batches are unavoidable, balancing cases and controls, as well as sex, within batches is important. Several techniques that alleviate batch effects have been proposed, with the ComBat package among the most popular (Johnson et al. 2007).

A major limitation affecting microarray-based analyses is the limited dynamic range of the fluorescence signals. This problem is resolved by the RNA-Seq methodology, where the dynamic range is orders of magnitude above the microarray capacity (Nagalakshmi et al. 2008). The adverse effects of SNPs on probe hybridization are also completely alleviated by RNA-Seq. Count data are directly related to expression level, as opposed to microarrays where the fluorescent intensity is an indirect measurement. Although RNA-Seq is more costly than array-based technologies, costs are steadily decreasing, which promises increased utilization of this technology.

Analytical approaches for eQTL

The analysis of eQTL in complex crosses mirrors that of traditional QTL mapping at its core. However, it also raises additional issues that require special care by the analyst, issues that are either not considered in the simplest forms of QTL mapping or are further exacerbated in the eQTL setting. We will briefly review some of the most common choices of statistical methodology with an emphasis on methods for the analysis of crosses with more than two founders. First, we will consider issues common to high dimensional eQTL techniques. Specifically, we will consider methods devised to deal with the multitude of statistical tests that need to be performed for a given experiment, through either corrections to significance measures or approaches that reduce the number of tests that need to be performed. We will then discuss specific statistical methodology devised for the analysis of the emerging RNA-Seq technology as related to more established microarray eQTL methods. Note that this review will mainly consider frequentist methods, though we note that Bayesian approaches are becoming more prevalent in mouse genetics. See, for instance, the review by Stephens and Balding (2009) for an introduction to Bayesian methods in genetics. Also note that we focus on the case of a single QTL/eQTL underlying a given trait, though generalizations of the methodology below allow the examination of two or more loci.

Overview of genetic and statistical considerations

The analytical methods with which QTL/eQTL analysis occurs depend on the cross as well as other experimental factors such as the assumed genetic model and phenotype. It is important to note that there are a number of design considerations that should be taken into account early in the planning process, particularly for studies utilizing complex crosses (Fig. 1). For crosses involving two inbred progenitor mouse lines (i.e. F2 intercrosses or backcrosses) either a single marker analysis of variance (Broman and Speed 1999), interval mapping (Lander and Botstein 1989), or related regression based approaches (Haley and Knott 1992) are typically applied when assuming the presence of a single QTL. For crosses with more than two inbred founders such as in heterogeneous stock (HS) (McClearn et al. 1970), CC (Churchill et al. 2004) or Diversity Outbred (DO) (Svenson et al. 2012) mice, typically multiple regression procedures are performed based on estimates of founder strain allelic contributions for a given marker/interval (Talbot et al. 1999; Mott et al. 2000; Svenson et al. 2012; Aylor et al. 2011; Durrant et al. 2011; Philip et al. 2011). These values are the result of haplotype reconstruction in terms of the founder lines using either the genotype calls (Mott et al. 2000; Liu et al. 2010), or intensities of the genotyping arrays (Svenson et al. 2012; Collaborative Cross 2012). Haplotype reconstructions in this manner mainly draw on the use of a Hidden Markov Model though alternate approaches have also been recently considered (Zhou et al. 2012). Hidden Markov Models are a machine learning approach designed for inferring underlying states of an unknown spatially/temporally ordered variable (Rabiner 1989). For this application, the states would correspond to founder inbred strain haplotypes and the end result would be a matrix of probabilities of descent from each pair of founder inbred strains which can be further summarized per strain (Mott et al. 2000; Valdar et al. 2009). The basic multiple linear regression model approach in this case would typically compare a model with the founder contributions to one without the founder contributions for each marker interval. The comparison of these two models allows the computation of an F statistic and accompanying p value (Valdar et al. 2009).
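The model comparison described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited work: it assumes each animal has a row of founder probabilities at one marker interval (summing to 1), fits an intercept-only model and a model that adds the founder contributions, and forms the F statistic from the two residual sums of squares. One founder column is dropped because the probabilities are collinear with the intercept. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def founder_f_statistic(probs, y):
    """F statistic for founder contributions at one marker interval.

    probs: per-animal founder probabilities (may be fractional).
    Returns (F, df1, df2). Assumes the full model does not fit perfectly.
    """
    n, k = len(y), len(probs[0])
    null = [[1.0] for _ in range(n)]
    full = [[1.0] + list(row[:-1]) for row in probs]  # drop last founder
    rss0, rss1 = rss(null, y), rss(full, y)
    df1, df2 = k - 1, n - k
    return ((rss0 - rss1) / df1) / (rss1 / df2), df1, df2
```

The p value would then follow from the upper tail of the F distribution with the returned degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.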

Fig. 1.


Simple framework that highlights (in each orange box) the design and analysis considerations that should be taken into account for expression studies utilizing complex crosses. It is noted that the primary research question, as well as the cross, accompanying assumed genetic model and phenotype must be determined first

Multiple testing considerations

One issue that is exacerbated in high dimensional eQTL scans is how to pick a significance threshold once p values (or LOD scores) are generated for each expression phenotype. The way in which these thresholds are chosen can be roughly divided into three categories ordered by decreasing conservativeness: familywise error rate, false discovery rate (FDR), and permutation/simulation procedures. The procedure used depends on the expected effect size as well as the type of downstream analysis desired. For instance, if the main goal is to confirm the top ranked genes via qPCR, there is little benefit in incurring the increased computational and analytical time required to generate and interpret large lists of genes potentially regulated by a QTL. Therefore a familywise based approach such as the Bonferroni correction would make sense (Bottomly et al. 2012). The Bonferroni correction has also been used as an approach to estimate the number of false positives (Schadt et al. 2003).
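As a minimal sketch of the familywise approach, the Bonferroni correction divides the desired error rate by the number of tests performed. The function names and the example numbers in the comments are illustrative assumptions only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value threshold that controls the familywise error rate at alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test whose p value passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical scan size: 20,000 expression traits tested at a single marker.
per_test_threshold = bonferroni_threshold(0.05, 20000)
```

In a genome-wide eQTL scan the number of tests is the product of the number of traits and the number of markers, so the corrected threshold becomes very stringent, which is the source of the conservativeness noted above.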

Controlling the false discovery rate also has been suggested (Storey and Tibshirani 2003; Carlborg et al. 2005). A common way to implement this control is through the computation of q values from the scan p values. A q value corresponds to the expected proportion of false positives when calling a given test significant (Storey and Tibshirani 2003). It has been used on top of permutation-based p values as a way to estimate the specificity of the given scan (Aylor et al. 2011; Chesler et al. 2005). In addition, FDR values have been estimated directly using subsets of the eQTL p values (Ghazalpour et al. 2008). One issue with considering FDR corrections is the presence of dependence if multiple p values are considered per expression trait (Kendziorski and Wang 2006). Dependence between two tests in this context means that, say, a low p value for trait A implies a low p value for trait B. For instance, the computation of q values relies on at most weak dependence between p values, and violations of this assumption may cause inaccuracies in the method (Storey and Tibshirani 2003). However, an approach such as surrogate variable analysis could be applied to remove dependencies between the test statistics, increasing the validity of the q values (Leek and Storey 2007, 2008).
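To make the idea of FDR-adjusted significance concrete, the sketch below computes Benjamini-Hochberg adjusted p values. This is a deliberately simpler substitute for the q value procedure of Storey and Tibshirani (2003), which additionally estimates the proportion of true null hypotheses; the Benjamini-Hochberg values correspond to assuming that proportion is 1. The function name is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Each p value is scaled by m / rank and then made monotone by taking a
    running minimum from the largest p value downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling a test significant when its adjusted value is at most 0.05 targets an expected false discovery proportion of 5 % among the calls, under the weak-dependence assumption discussed above.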

Permutation testing is arguably the most common approach for significance assessment in eQTL studies. An approach similar to QTL studies would apply a permutation procedure to each expression trait separately (Churchill and Doerge 1994). However, as the number of tests is thousands of times greater than in a standard QTL analysis, a full permutation test is not desirable, as it would potentially increase computation time by at least an additional thousand-fold. One approach is to reduce the number of permutations necessary to compute the significance threshold through the use of a parametric model (Valdar et al. 2006a). Also, permutation testing procedures can be applied to only a subset of expression traits with the result then used to choose thresholds for the remaining traits (Huang et al. 2009; Aylor et al. 2011). This approach needs to take into consideration distributional differences among the traits that can lead to large differences in threshold values (Carlborg et al. 2005). One approach to choose representative threshold values is to interpolate based on a representative group of threshold values (Huang et al. 2009); another is to choose a global threshold based on the distribution of the thresholds (West et al. 2007). Regardless of the approach used to generate the significance thresholds, permutations need to be carried out appropriately with regard to experimental design (Churchill and Doerge 2008).
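A per-trait permutation threshold in the spirit of Churchill and Doerge (1994) can be sketched as follows. The sketch assumes a simple single-marker statistic (the squared Pearson correlation between genotype code and expression) and unrestricted shuffling of the trait; both are simplifying assumptions, and a structured design would require restricted permutations. The function names and defaults are hypothetical.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation; 0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold for one trait from the permutation null.

    genotypes: list of markers, each a list of genotype codes per animal.
    For every permutation the trait is shuffled across animals and the
    maximum statistic over all markers is recorded; the threshold is the
    (1 - alpha) quantile of those maxima.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(pearson_r2(m, shuffled) for m in genotypes))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]
```

Because the maximum is taken over all markers within each permutation, the resulting threshold accounts for the number of markers tested and for the correlation between them; repeating this for every expression trait is what makes the full procedure so costly.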

Dimension reduction

One strategy to reduce the number of tests being performed in an eQTL setting is to focus only on a subset of expression traits relevant to the phenotype(s) of interest. Relevance in this case is determined through differential expression analysis (Schadt et al. 2003). Other approaches take advantage of the fact that expression data are highly correlated to first form groups of genes with highly similar expression profiles, followed by a QTL mapping procedure; two common procedures for doing this are clustering and principal component analysis. Clustering algorithms are commonly used in microarray experiments (Eisen et al. 1998) and have been used successfully as a means to reduce the number of traits necessary to map (Chun and Keleş 2009; Lan et al. 2003; Yvert et al. 2003). Procedures based on principal components analysis, which seeks to find eigengenes or eigentraits that explain a certain amount of variability while being independent from one another (Alter et al. 2000), have also been applied to expression data prior to mapping (Lan et al. 2003; Biswas et al. 2008). Mapping expression traits by first clustering the expression data and then summarizing the clusters using the ‘eigengene’ has also been shown to be effective for finding QTL regions with a large effect on expression traits (Fuller et al. 2007).
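The eigengene summary mentioned above can be illustrated with a short sketch. It assumes the eigengene of a cluster is taken as the first principal component of the gene-centered expression values across animals, computed here by power iteration so that no external library is needed. The function name, iteration count and starting vector are illustrative choices.

```python
def eigengene(expr, n_iter=200):
    """First principal component across samples for one cluster of genes.

    expr: list of genes, each a list of expression values across animals.
    Returns a unit-length vector with one score per animal. The sign of a
    principal component is arbitrary.
    """
    n = len(expr[0])
    centered = []
    for gene in expr:
        mean = sum(gene) / n
        centered.append([v - mean for v in gene])
    # Non-constant start vector, since centered profiles are orthogonal
    # to a constant vector.
    v = [float(j + 1) for j in range(n)]
    for _ in range(n_iter):
        # One power-iteration step: w = C^T (C v), with C the centered matrix.
        scores = [sum(g[j] * v[j] for j in range(n)) for g in centered]
        w = [sum(s * g[j] for s, g in zip(scores, centered)) for j in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return [0.0] * n  # no variation in the cluster
        v = [x / norm for x in w]
    return v
```

The resulting vector can be mapped as a single trait in place of every gene in the cluster, which is how this approach reduces the number of tests.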

RNA-Seq eQTL approaches

The advent of microarrays made eQTL approaches an attractive option to elucidate the genetic underpinnings of gene expression. However, microarrays have many issues that prevent them from being an ideal data source. For instance, microarrays have fixed probes/reporters that can limit expression estimates. This means both that a potential gene of interest may not be interrogated and that hybridization of the probes on the array may be affected by genomic differences, as is discussed later. A more recent approach is the high throughput sequencing of the mRNA population in a given experimental condition for a given animal (Mortazavi et al. 2008). This data source is less constrained by annotation, is free from relying on reporter hybridization and therefore allows additional types of analyses related to basic microarray-based eQTL to be performed.

The first type of analysis facilitated by RNA-Seq is the study of transcript-level expression, specifically alternative splicing QTL (sQTL), which has been found to be informative in humans (Heinzen et al. 2008; Kwan et al. 2008). This type of analysis has been examined using microarrays for complex mouse crosses (Alberts et al. 2005); however, in practice fixed microarray probe placement and genomic differences between probe sequence and RNA source were major impediments (Huang et al. 2009; Ciobanu et al. 2010). From recent studies using RNA-Seq, it appears that the technology is better suited to assessing the genetics of alternative splicing in humans (Pickrell et al. 2010; Rakitsch et al. 2012). However, although it has been suggested as a promising avenue of research (Guryev and Cuppen 2009; Hitzemann et al. 2013), little work appears to have been done applying the method to mouse crosses.

Another potential benefit to the use of RNA-Seq is the direct study of allele-specific expression. These experiments have traditionally been performed through the use of RT-PCR based confirmation approaches (Cowles et al. 2002). Allele-specific expression is implemented in practice for RNA-Seq in a similar manner by essentially counting the number of sequence reads generated by the technology that overlap with either the reference or alternative allele(s) (Degner et al. 2009). Initial applications of this approach to study embryonic imprinting yielded promising (Gregg et al. 2010) though conflicting messages (DeVeale et al. 2012) about the additional power RNA-Seq lends to the problem.
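The read-counting idea can be illustrated with a minimal sketch: given the number of reads overlapping the reference and alternative alleles at a heterozygous site, an exact two-sided binomial test asks whether the split departs from the expected ratio. The null ratio of 0.5 is an assumption that ignores mapping bias toward the reference allele, and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial p value for allele-specific expression.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference read count under the null fraction.
    """
    n = ref_reads + alt_reads
    p = expected_ref_fraction
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    tail = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, tail)
```

In practice many sites are tested per gene and per animal, so the resulting p values are subject to the same multiple testing considerations discussed earlier.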

Computational issues

One of the central issues with eQTL mapping is the drastic increase in computational capabilities it requires over a similar QTL study. This is only exacerbated by increases in marker density of new genotyping arrays (Yang et al. 2009) and expression traits in exon-level oligonucleotide arrays (Gardina et al. 2006) or RNA-Seq (Mortazavi et al. 2008). In order to gain computational efficiency, aspects of the underlying mathematics can be leveraged to provide essentially the same results using fewer computational resources. The simplest example of this is the ability to use a matrix of phenotypes in standard linear model fitting as opposed to a single phenotype vector as is typically used. This means that relatively computationally expensive matrix calculations are performed only once and can therefore be leveraged to perform batch processing of phenotypes at a significant decrease in computational time (Valdar et al. 2009). This type of batch processing also lends itself to parallel processing either through a cluster computing environment or a single computer with multiple processors. A related example is the mixed effects model framework of EMMA (Kang et al. 2008). Similarly, analysis methods have also been developed for RNA-Seq that make computationally beneficial approximations to the underlying parameter estimation procedure (McCarthy et al. 2012).
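The saving from batch processing can be seen in a deliberately simplified sketch. It assumes a single additive genotype code per animal and ordinary least squares; the quantities that depend only on the genotype are computed once and then reused for every expression trait. The function name is hypothetical, and the general case replaces the scalar weights with a matrix factorization of the design matrix.

```python
def batch_slopes(genotype, phenotypes):
    """Least-squares slope of every trait on one marker.

    genotype: genotype code per animal (assumed not constant).
    phenotypes: list of traits, each a list of values per animal.
    """
    n = len(genotype)
    g_mean = sum(genotype) / n
    centered = [g - g_mean for g in genotype]
    sxx = sum(c * c for c in centered)
    # Genotype-only work, done once and shared by all traits.
    weights = [c / sxx for c in centered]
    # Per-trait work is a single dot product; centering the trait is
    # unnecessary because the weights sum to zero.
    return [sum(w * y for w, y in zip(weights, trait)) for trait in phenotypes]
```

Each trait's slope is independent of the others once the weights are known, which is why the per-trait step parallelizes naturally across processors or cluster nodes.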

Population substructure

Population substructure is a serious confounding factor in many QTL and eQTL mapping studies (Devlin et al. 2001; Pritchard and Donnelly 2001; Kang et al. 2008; Valdar et al. 2009; Listgarten et al. 2010). In brief, the problem can be summarized as follows: for a statistical test used to identify the causative genetic effects on a phenotype, the null hypothesis states that there is no association between the genetic locus and the phenotype. However, this assumption does not hold in cases where population substructure is present: differences in average phenotype value between the subpopulations will be detected as a QTL for each genetic locus that segregates between the subpopulations, even though the locus is not necessarily causative. It is therefore important to distinguish between causative associations and associations due solely to genetic linkage.

In mouse QTL studies, much of the uneven relatedness between individuals is due to the complex genetic history of the commonly used inbred strains. The most significant differences are between the classical inbred strains and the wild-derived inbred strains (Ideraabdullah et al. 2004; Yalcin et al. 2004). Classical inbred strains are derived from a limited number of individuals of the Mus musculus subspecies that have widely varying degrees of relatedness (Bonhomme et al. 1989). The wild-derived strains are derived from several Mus subspecies captured at different times and geographic locations (Bonhomme and Guenet 1989). Therefore, studies that evaluate phenotypic variability among several inbred strains need to account for the phylogenetic differences.

Heterogeneous stock mice are derived from inbred strains using various outbreeding procedures (Chia et al. 2005). QTL mapping in these populations offers markedly higher resolution as compared to simple intercrosses (Talbot et al. 1999; Svenson et al. 2012). However, despite efforts to randomize the mating process, individuals in outbred mouse populations display varying levels of relatedness (Aldinger et al. 2009; Iancu et al. 2012). Furthermore, an in-depth analysis of the structure of a heterogeneous stock mouse population revealed that relatedness is not evenly distributed across the genome and individual chromosomes can have effects on phenotype that are distinct from the whole genome kinship information (Iancu et al. 2012) adding another layer of complexity. Therefore, mapping strategies employed in outbred populations need to adjust for this confounding factor (e.g., Cheng et al. 2011 and references therein).

Attempts to adjust for population substructure fall into several categories. In human association studies, genomic control (Devlin et al. 2000), structured association (Pritchard et al. 2000), and principal component analysis (Patterson et al. 2006) are the most commonly employed procedures. In mouse populations, the relatively large effect size of the kinship structure seems to favor an alternative mixed-model approach (Kang et al. 2008). In a further refinement of this approach (Iancu et al. 2012), we recently demonstrated that it is possible to simultaneously detect strain-specific effects and also correct for population structure.
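The idea behind the mixed-model approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of Kang et al. (2008) or of Iancu et al. (2012): the function names kinship and lmm_marker_test, the grid search over the variance ratio h2, the use of maximum likelihood rather than restricted maximum likelihood, and all simulation settings are choices made here for brevity. The model is y = Xb + u + e, where the covariance of u is proportional to a genotype-derived kinship matrix K.

```python
import numpy as np

def kinship(genotypes):
    """Relatedness matrix from an (individuals x markers) genotype array.

    Assumes every marker is polymorphic, so no column has zero variance.
    """
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1]

def lmm_marker_test(y, marker, K, h2_grid=None):
    """Generalized least-squares t statistic for one marker under a kinship model.

    The phenotype covariance is taken as sigma2 * (h2 * K + (1 - h2) * I);
    h2 is chosen on a grid by maximizing the profile likelihood.
    Returns (t statistic, selected h2).
    """
    if h2_grid is None:
        h2_grid = np.linspace(0.0, 0.99, 100)
    n = len(y)
    X = np.column_stack([np.ones(n), marker])
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)          # guard against round-off
    y_rot, X_rot = eigvecs.T @ y, eigvecs.T @ X    # rotate so the covariance is diagonal
    best = (-np.inf, None, None)
    for h2 in h2_grid:
        w = 1.0 / (h2 * eigvals + (1.0 - h2))      # inverse variances, up to sigma2
        XtWX = X_rot.T @ (w[:, None] * X_rot)
        beta = np.linalg.solve(XtWX, X_rot.T @ (w * y_rot))
        rss = np.sum(w * (y_rot - X_rot @ beta) ** 2)
        loglik = -0.5 * (n * np.log(rss / n) - np.sum(np.log(w)))
        if loglik > best[0]:
            se = np.sqrt(rss / (n - 2) * np.linalg.inv(XtWX)[1, 1])
            best = (loglik, beta[1] / se, h2)
    return best[1], best[2]

# Simulated structured population (all settings are illustrative assumptions).
rng = np.random.default_rng(1)
n_per_group, n_markers = 100, 300
subpop = np.repeat([0.0, 1.0], n_per_group)
freq = np.where(subpop[:, None] == 0, 0.2, 0.8) * np.ones((1, n_markers))
genotypes = rng.binomial(2, freq).astype(float)
phenotype = 2.0 * subpop + rng.normal(size=subpop.size)   # no marker is causal

K = kinship(genotypes)
# With h2 fixed at 0 the model reduces to ordinary least squares.
t_naive, _ = lmm_marker_test(phenotype, genotypes[:, 0], K, h2_grid=[0.0])
t_lmm, h2_hat = lmm_marker_test(phenotype, genotypes[:, 0], K)
print(f"naive t = {t_naive:.2f}, mixed-model t = {t_lmm:.2f}, h2 = {h2_hat:.2f}")
```

Unlike a fixed covariate, the kinship matrix requires no prior knowledge of subpopulation labels, because relatedness is estimated from the genotypes themselves. Dedicated packages should be preferred for real analyses, since they handle variance-component estimation and genome-wide scans far more carefully than this sketch.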

Causal inference

One of the main benefits of eQTL studies is the ability to form networks based on the correlation/covariation structure of the expression data across the experimental populations (Chesler et al. 2005). Such networks describe relationships between expression traits: for example, if Trait A and Trait B are correlated, there is potentially a relationship between the two traits. Without additional information or assumptions, however, one typically cannot state confidently whether Trait A causes Trait B (Trait A→Trait B), Trait A reacts to Trait B (Trait A←Trait B), or a confounding factor is responsible for the observed correlation. Co-expression networks by themselves therefore cannot usually be used to form ‘causal’ or ‘reactive’ hypotheses; when they are considered jointly with DNA variation data, however, such inference is possible (Schadt et al. 2005). DNA variation data are essential here because, in an experimental cross, DNA variation can be assumed to be the main driver of variation in the traits under consideration (Schadt et al. 2005). Causal reasoning in the eQTL context is performed in several related ways: model selection approaches (Schadt et al. 2005; Chen et al. 2007; Millstein et al. 2009), structural equation modeling (SEM) (Liu et al. 2008; Aten et al. 2008), and Bayesian networks (Zhu et al. 2007). All of these approaches are similar in spirit in that they attempt to define local or global relationships of the form Marker A→Trait B→Trait C. Although causal inference approaches have shown promise, some cautions apply to the interpretation of causal modeling in eQTL studies. Specifically, large sample sizes, the removal of factors that can act as hidden confounders, and the consideration of comprehensive sets of models are seen as necessary steps for robust causal modeling (Li et al. 2010).
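The model selection idea can be illustrated with a toy example. The Python sketch below is a deliberately simplified stand-in for the likelihood-based approaches cited above, not a reimplementation of any of them: the function names gaussian_loglik and rank_models, the restriction to three linear Gaussian models with equal numbers of parameters, and the simulated effect sizes are all assumptions made here. For a marker L and two traits A and B, it compares a causal model (L→A→B), a reactive model (L→B→A), and an independent model (A←L→B) by their maximized likelihoods.

```python
import numpy as np

def gaussian_loglik(y, x):
    """Maximized Gaussian log-likelihood of the linear regression y ~ 1 + x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                     # maximum-likelihood variance
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def rank_models(marker, trait_a, trait_b):
    """Rank three candidate models by log-likelihood, best first.

    Each joint likelihood factorizes into two regressions, so all three
    models have the same number of parameters and can be compared directly.
    """
    scores = {
        # L -> A -> B: P(A | L) * P(B | A)
        "causal": gaussian_loglik(trait_a, marker) + gaussian_loglik(trait_b, trait_a),
        # L -> B -> A: P(B | L) * P(A | B)
        "reactive": gaussian_loglik(trait_b, marker) + gaussian_loglik(trait_a, trait_b),
        # A <- L -> B: P(A | L) * P(B | L)
        "independent": gaussian_loglik(trait_a, marker) + gaussian_loglik(trait_b, marker),
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Simulate a true causal chain L -> A -> B (settings are illustrative).
rng = np.random.default_rng(2)
n = 500
marker = rng.binomial(2, 0.5, size=n).astype(float)
trait_a = marker + rng.normal(size=n)
trait_b = trait_a + rng.normal(size=n)

ranking = rank_models(marker, trait_a, trait_b)
best_model = ranking[0][0]
print(ranking)
```

With a large sample and no hidden confounder, the causal model is expected to rank first in this simulation. The cautions noted above apply directly: with small samples, unmeasured confounders, or a candidate set that omits the true model, the same ranking procedure can be confidently wrong.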

Conclusion

The utility and value of complex crosses for examining the relationship between behavior and expression are clear. However, the increased genetic complexity raises numerous considerations that must be dealt with in the design of these types of studies. By highlighting each of these, we provide a conceptual framework to guide researchers in study planning and implementation.

Acknowledgments

This study was supported in part by United States Public Health Service grants AA10760, AA11034, AA13484, MH 51372, AA 13519, AA 20245, DA005228, Oregon Clinical and Translational Research Institute [5UL1RR024140], Knight Cancer Institute [5 P30 CA069533], and grant support from the Department of Veterans Affairs.

References

  1. Airey DC, Lu L, Williams R. Genetic control of the mouse cerebellum: identification of quantitative trait loci modulating size and architecture. J Neurosci. 2001;21(14):5099–5109. doi: 10.1523/JNEUROSCI.21-14-05099.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alberts R, Terpstra P, Bystrykh LV, de Haan G, Jansen RC. A statistical multiprobe model for analyzing cis and trans genes in genetical genomics experiments with short-oligonucleotide arrays. Genetics. 2005;171(3):1437–1439. doi: 10.1534/genetics.105.045930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Aldinger KA, Sokoloff G, Rosenberg DM, Palmer AA, Millen KJ. Genetic variation and population substructure in outbred CD-1 mice: implications for genome-wide association studies. PLoS One. 2009;4(3):e4729. doi: 10.1371/journal.pone.0004729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci. 2000;97(18):10101–10106. doi: 10.1073/pnas.97.18.10101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Aten J, Fuller T, Lusis A, Horvath S. Using genetic markers to orient the edges in quantitative trait networks: the NEO software. BMC Syst Biol. 2008;2(1):34. doi: 10.1186/1752-0509-2-34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Aylor DL, Valdar W, Foulds-Mathes W, Buus RJ, Verdugo RA, Baric RS, Ferris MT, Frelinger JA, Heise M, Frieman MB, Gralinski LE, Bell TA, Didion JD, Hua K, Nehrenberg DL, Powell CL, Steigerwalt J, Xie Y, Kelada SNP, Collins FS, Yang IV, Schwartz DA, Branstetter LA, Chesler EJ, Miller DR, Spence J, Liu EY, McMillan L, Sarkar A, Wang J, Wang W, Zhang Q, Broman KW, Korstanje R, Durrant C, Mott R, Iraqi FA, Pomp D, Threadgill D, Pardo-Manuel de Villena F, Churchill GA. Genetic analysis of complex traits in the emerging collaborative cross. Genome Res. 2011;21(8):1213–1222. doi: 10.1101/gr.111310.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Belknap JK. Empirical estimates of Bonferroni corrections for use in chromosome mapping studies with the BXD recombinant inbred strains. Behav Genet. 1992;22(6):677–684. doi: 10.1007/BF01066638. [DOI] [PubMed] [Google Scholar]
  8. Biswas S, Storey J, Akey J. Mapping gene expression quantitative trait loci by singular value decomposition and independent component analysis. BMC Bioinformatics. 2008;9(1):244. doi: 10.1186/1471-2105-9-244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bonhomme F, Guenet JL. The laboratory mouse and its wild relatives. In: Lyon MF, editor. Genetic variants and strains of the laboratory mouse. Oxford: Oxford University Press; 1989. p. 876. [Google Scholar]
  10. Bonhomme F, Miyashita N, Boursot P, Catalan J, Moriwaki K. Genetical variation and polyphyletic origin in Japanese Mus musculus. Heredity (Edinb) 1989;63(Pt 3):299–308. doi: 10.1038/hdy.1989.102. [DOI] [PubMed] [Google Scholar]
  11. Bottomly D, Ferris M, Aicher L, Rosenzweig E, Whitmore A, Aylor D, Haagmans B, Gralinski L, Bradel-Tretheway B, Bryan J, Threadgill D, Pardo-Manuel de Villena F, Baric R, Katze M, Heise M, McWeeney S. Expression quantitative trait loci for extreme host response to Influenza A in pre-collaborative cross mice. G3: Genes Genomes Genet. 2012;2:213–221. doi: 10.1534/g3.111.001800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Broman KW, Speed TP. A review of methods for identifying QTLs in experimental crosses. Lecture Notes-Monograph Series. 1999:114–142.
  13. Carlborg Ö, De Koning DJ, Manly KF, Chesler E, Williams RW, Haley CS. Methodological aspects of the genetic dissection of gene expression. Bioinformatics. 2005;21(10):2383–2393. doi: 10.1093/bioinformatics/bti241. [DOI] [PubMed] [Google Scholar]
  14. Chen L, Emmert-Streib F, Storey J. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol. 2007;8(10):R219. doi: 10.1186/gb-2007-8-10-r219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Cheng R, Abney M, Palmer AA, Skol AD. QTLRel: an R package for genome-wide association studies in which relatedness is a concern. BMC Genet. 2011;12:66. doi: 10.1186/1471-2156-12-66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Chesler EJ, Lu L, Shou S, Qu Y, Gu J, Wang J, Hsu HC, Mountz JD, Baldwin NE, Langston MA, et al. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat Genet. 2005;37(3):233–242. doi: 10.1038/ng1518. [DOI] [PubMed] [Google Scholar]
  17. Chia R, Achilli F, Festing MF, Fisher EM. The origins and uses of mouse outbred stocks. Nat Genet. 2005;37(11):1181–1186. doi: 10.1038/ng1665. [DOI] [PubMed] [Google Scholar]
  18. Chun H, Keleş S. Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics. 2009;182(1):79–90. doi: 10.1534/genetics.109.100362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138(3):963–971. doi: 10.1093/genetics/138.3.963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Churchill GA, Doerge RW. Naive Application of Permutation Testing Leads to Inflated Type I Error Rates. Genetics. 2008;178(1):609–610. doi: 10.1534/genetics.107.074609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Churchill GA, Airey DC, Allayee H, Angel JM, Attie AD, Beatty J, Beavis WD, Belknap JK, Bennett B, Berrettini W, Bleich A, Bogue M, Broman KW, Buck KJ, Buckler E, Burmeister M, Chesler EJ, Cheverud JM, Clapcote S, Cook MN, Cox RD, Crabbe JC, Crusio WE, Darvasi A, Deschepper CF, Doerge RW, Farber CR, Forejt J, Gaile D, Garlow SJ, Geiger H, Gershenfeld H, Gordon T, Gu J, Gu W, de Haan G, Hayes NL, Heller C, Himmelbauer H, Hitzemann R, Hunter K, Hsu HC, Iraqi FA, Ivandic B, Jacob HJ, Jansen RC, Jepsen KJ, Johnson DK, Johnson TE, Kempermann G, Kendziorski C, Kotb M, Kooy RF, Llamas B, Lammert F, Lassalle JM, Lowenstein PR, Lu L, Lusis A, Manly KF, Marcucio R, Matthews D, Medrano JF, Miller DR, Mittleman G, Mock BA, Mogil JS, Montagutelli X, Morahan G, Morris DG, Mott R, Nadeau JH, Nagase H, Nowakowski RS, O’Hara BF, Osadchuk AV, Page GP, Paigen B, Paigen K, Palmer AA, Pan HJ, Peltonen-Palotie L, Peirce J, Pomp D, Pravenec M, Prows DR, Qi Z, Reeves RH, Roder J, Rosen GD, Schadt EE, Schalkwyk LC, Seltzer Z, Shimomura K, Shou S, Sillanpaa MJ, Siracusa LD, Snoeck HW, Spearow JL, Svenson K, Tarantino LM, Threadgill D, Toth LA, Valdar W, de Villena FP, Warden C, Whatley S, Williams RW, Wiltshire T, Yi N, Zhang D, Zhang M, Zou F. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat Genet. 2004;36(11):1133–1137. doi: 10.1038/ng1104-1133. [DOI] [PubMed] [Google Scholar]
  22. Churchill GA, Gatti DM, Munger SC, Svenson KL. The diversity outbred mouse population. Mamm Genome. 2012;23(9–10):713–718. doi: 10.1007/s00335-012-9414-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Ciobanu DC, Lu L, Mozhui K, Wang X, Jagalur M, Morris JA, Taylor WL, Dietz K, Simon P, Williams RW. Detection, validation, and downstream analysis of allelic variation in gene expression. Genetics. 2010;184(1):119–128. doi: 10.1534/genetics.109.107474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Collaborative Cross Consortium. The genome architecture of the collaborative cross mouse genetic reference population. Genetics. 2012;190(2):389–401. doi: 10.1534/genetics.111.132639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Cowles CR, Hirschhorn JN, Altshuler D, Lander ES. Detection of regulatory variation in mouse genes. Nat Genet. 2002;32(3):432–437. doi: 10.1038/ng992. [DOI] [PubMed] [Google Scholar]
  26. Darvasi A. Experimental strategies for the genetic dissection of complex traits in animal models. Nat Genet. 1998;18(1):19–24. doi: 10.1038/ng0198-19. [DOI] [PubMed] [Google Scholar]
  27. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y, Pritchard JK. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25(24):3207–3212. doi: 10.1093/bioinformatics/btp579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. DeVeale B, van der Kooy D, Babak T. Critical evaluation of imprinted gene expression by RNA–Seq: a new perspective. PLoS Genet. 2012;8:e1002600. doi: 10.1371/journal.pgen.1002600. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Devlin B, Roeder K, Wasserman L. Genomic control for association studies: a semiparametric test to detect excess-haplotype sharing. Biostatistics. 2000;1(4):369–387. doi: 10.1093/biostatistics/1.4.369. [DOI] [PubMed] [Google Scholar]
  30. Devlin B, Roeder K, Wasserman L. Genomic control, a new approach to genetic-based association studies. Theor Popul Biol. 2001;60(3):155–166. doi: 10.1006/tpbi.2001.1542. [DOI] [PubMed] [Google Scholar]
  31. Durrant C, Tayem H, Yalcin B, Cleak J, Goodstadt L, de Villena FPM, Mott R, Iraqi FA. Collaborative cross mice and their power to map host susceptibility to Aspergillus fumigatus infection. Genome Res. 2011;21(8):1239–1248. doi: 10.1101/gr.118786.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci. 1998;95(25):14863–14868. doi: 10.1073/pnas.95.25.14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Farris SP, Wolen AR, Miles MF. Using expression genetics to study the neurobiology of ethanol and alcoholism. Int Rev Neurobiol. 2010;91:95–128. doi: 10.1016/S0074-7742(10)91004-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome. 2007;18(6–7):463–472. [DOI] [PMC free article] [PubMed]
  35. Gardina P, Clark T, Shimada B, Staples M, Yang Q, Veitch J, Schweitzer A, Awad T, Sugnet C, Dee S, Davies C, Williams A, Turpaz Y. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array. BMC Genomics. 2006;7(1):325. doi: 10.1186/1471-2164-7-325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Ghazalpour A, Doss S, Kang H, Farber C, Wen PZ, Brozell A, Castellanos R, Eskin E, Smith DJ, Drake TA. High-resolution mapping of gene expression using association in an outbred mouse stock. PLoS Genet. 2008;4(8):e1000149. doi: 10.1371/journal.pgen.1000149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Gora-Maslak G, McClearn GE, Crabbe JC, Phillips TJ, Belknap JK, Plomin R. Use of recombinant inbred strains to identify quantitative trait loci in psychopharmacology. Psychopharmacology. 1991;104(4):413–424. doi: 10.1007/BF02245643. [DOI] [PubMed] [Google Scholar]
  38. Gregg C, Zhang J, Weissbourd B, Luo S, Schroth GP, Haig D, Dulac C. High-resolution analysis of parent-of-origin allelic expression in the mouse brain. Science. 2010;329(5992):643–648. doi: 10.1126/science.1190830. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Guryev V, Cuppen E. Next-generation sequencing approaches in genetic rodent model systems to study functional effects of human genetic variation. FEBS Lett. 2009;583(11):1668–1673. doi: 10.1016/j.febslet.2009.04.020. [DOI] [PubMed] [Google Scholar]
  40. Haley CS, Knott SA. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992;69(4):315. doi: 10.1038/hdy.1992.131. [DOI] [PubMed] [Google Scholar]
  41. Heinzen EL, Ge D, Cronin KD, Maia JM, Shianna KV, Gabriel WN, Welsh-Bohmer K, Hulette CM, Denny TN, Goldstein DB. Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 2008;6(12):e1000001. doi: 10.1371/journal.pbio.1000001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Hitzemann R, Bottomly D, Darakjian P, Walter N, Iancu O, Searles R, Wilmot B, McWeeney S. Genes, behavior and next-generation RNA sequencing. Genes, Brain and Behavior. 2013;12(1):1–12. doi: 10.1111/gbb.12007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Huang G, Shifman S, Valdar W, Johannesson M, Yalcin B, Taylor MS, Taylor JM, Mott R, Flint J. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res. 2009;19(6):1133–1140. doi: 10.1101/gr.088120.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Iancu OD, Darakjian P, Walter NA, Malmanger B, Oberbeck D, Belknap J, McWeeney S, Hitzemann R. Genetic diversity and striatal gene networks: focus on the heterogeneous stock-collaborative cross (HS-CC) mouse. BMC Genomics. 2010;11:585. doi: 10.1186/1471-2164-11-585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Iancu O, Darakjian P, Hitzemann R, McWeeney S. Detection of expression quantitative trait loci in complex mouse crosses: impact and alleviation of data quality and complex population substructure. Front Genet. 2012;3:157. doi: 10.3389/fgene.2012.00157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Iancu OD, Oberbeck D, Darakjian P, Kawane S, Erk J, McWeeney S, Hitzemann R. Differential network analysis reveals genetic effects on catalepsy modules. PLoS One. 2013;8(3):e58951. doi: 10.1371/journal.pone.0058951. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Ideraabdullah FY, de la Casa-Esperon E, Bell TA, Detwiler DA, Magnuson T, Sapienza C, de Villena FP. Genetic and haplotype diversity among wild-derived mouse inbred strains. Genome Res. 2004;14(10A):1880–1887. doi: 10.1101/gr.2519704. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Jan T, Lu L, Li C, Williams R, Waters R. Genetic analysis of posterior medial barrel subfield (PMBSF) size in somatosensory cortex (SI) in recombinant inbred strains of mice. BMC Neurosci. 2008;9:3. doi: 10.1186/1471-2202-9-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet. 2001;17(7):388–391. doi: 10.1016/s0168-9525(01)02310-1. [DOI] [PubMed] [Google Scholar]
  50. Johnson W, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–127. doi: 10.1093/biostatistics/kxj037. [DOI] [PubMed] [Google Scholar]
  51. Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E. Efficient control of population structure in model organism association mapping. Genetics. 2008;178(3):1709–1723. doi: 10.1534/genetics.107.080101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellaker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assuncao JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J, Adams DJ. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477(7364):289–294. doi: 10.1038/nature10413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Kendziorski C, Wang P. A review of statistical methods for expression quantitative trait loci mapping. Mamm Genome. 2006;17(6):509–517. doi: 10.1007/s00335-005-0189-6. [DOI] [PubMed] [Google Scholar]
  54. Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P, Hudson TJ, Sladek R, Majewski J. Genome-wide analysis of transcript isoform variation in humans. Nat Genet. 2008;40(2):225–231. doi: 10.1038/ng.2007.57. [DOI] [PubMed] [Google Scholar]
  55. Lan H, Stoehr JP, Nadler ST, Schueler KL, Yandell BS, Attie AD. Dimension reduction for mapping mRNA abundance as quantitative traits. Genetics. 2003;164(4):1607–1614. doi: 10.1093/genetics/164.4.1607. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121(1):185–199. doi: 10.1093/genetics/121.1.185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3(9):e161. doi: 10.1371/journal.pgen.0030161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Leek JT, Storey JD. A general framework for multiple testing dependence. Proc Natl Acad Sci. 2008;105(48):18718–18723. doi: 10.1073/pnas.0808709105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Li C, Wei X, Lu L, Peirce J, Williams R, Waters RS. Genetic analysis of barrel field size in the first somatosensory area (SI) in inbred and recombinant inbred strains of mice. Somatosens Mot Res. 2005;22(3):141–150. doi: 10.1080/08990220500262182. [DOI] [PubMed] [Google Scholar]
  60. Li Y, Tesson BM, Churchill GA, Jansen RC. Critical reasoning on causal inference in genome-wide linkage and association studies. Trends Genet. 2010;26(12):493–498. doi: 10.1016/j.tig.2010.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Listgarten J, Kadie C, Schadt EE, Heckerman D. Correction for hidden confounders in the genetic analysis of gene expression. Proc Natl Acad Sci USA. 2010;107(38):16465–16470. doi: 10.1073/pnas.1002425107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Liu C. Brain eQTL mapping informs genetic studies of psychiatric diseases. Neurosci Bull. 2011;27(2):123–133. doi: 10.1007/s12264-011-1203-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Liu B, de la Fuente A, Hoeschele I. Gene network inference via structural equation modeling in genetical genomics experiments. Genetics. 2008;178(3):1763–1776. doi: 10.1534/genetics.107.080069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Liu EY, Zhang Q, McMillan L, de Villena FP-M, Wang W. Efficient genome ancestry inference in complex pedigrees with inbreeding. Bioinformatics. 2010;26(12):i199–i207. doi: 10.1093/bioinformatics/btq187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Logan RW, Robledo RF, Recla JM, Philip VM, Bubier JA, Jay JJ, Harwood C, Wilcox T, Gatti DM, Bult CJ, Churchill GA, Chesler EJ. High-precision genetic mapping of behavioral traits in the diversity outbred mouse population. Genes Brain Behav. 2013;12(4):424–437. doi: 10.1111/gbb.12029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Lum PY, Chen Y, Zhu J, Lamb J, Melmed S, Wang S, Drake TA, Lusis AJ, Schadt EE. Elucidating the murine brain transcriptional network in a segregating mouse population to identify core functional modules for obesity and diabetes. J Neurochem. 2006;97(Suppl 1):50–62. doi: 10.1111/j.1471-4159.2006.03661.x. [DOI] [PubMed] [Google Scholar]
  67. Malmanger B, Lawler M, Coulombe S, Murray R, Cooper S, Polyakov Y, Belknap J, Hitzemann R. Further studies on using multiple-cross mapping (MCM) to map quantitative trait loci. Mamm Genome. 2006;17(12):1193–1204. doi: 10.1007/s00335-006-0070-2. [DOI] [PubMed] [Google Scholar]
  68. McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40(10):4288–4297. doi: 10.1093/nar/gks042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. McClearn GE, Wilson JR, Meredith W. The use of isogenic and heterogenic mouse stocks in behavioral research. In: Lindzey G, Thiessen DD, editors. Contributions to behavior-genetic analysis: the mouse as a prototype. New York: Appleton-Century-Crofts; 1970. pp. 3–22. [Google Scholar]
  70. Millstein J, Zhang B, Zhu J, Schadt E. Disentangling molecular relationships with a causal inference test. BMC Genet. 2009;10(1):23. doi: 10.1186/1471-2156-10-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]
  72. Mott R, Talbot CJ, Turri MG, Collins AC, Flint J. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci USA. 2000;97(23):12649–12654. doi: 10.1073/pnas.230304397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320(5881):1344–1349. doi: 10.1126/science.1158441. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190. doi: 10.1371/journal.pgen.0020190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Peirce JL, Li H, Wang J, Manly KF, Hitzemann RJ, Belknap JK, Rosen GD, Goodwin S, Sutter TR, Williams RW, Lu L. How replicable are mRNA expression QTL? Mamm Genome. 2006;17(6):643–656. doi: 10.1007/s00335-005-0187-8. [DOI] [PubMed] [Google Scholar]
  76. Philip VM, Sokoloff G, Ackert-Bicknell CL, Striz M, Branstetter L, Beckmann MA, Spence JS, Jackson BL, Galloway LD, Barker P, et al. Genetic analysis in the Collaborative Cross breeding population. Genome Res. 2011;21(8):1223–1238. doi: 10.1101/gr.113886.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras J-B, Stephens M, Gilad Y, Pritchard JK. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464(7289):768–772. doi: 10.1038/nature08872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001;60(3):227–237. doi: 10.1006/tpbi.2001.1543. [DOI] [PubMed] [Google Scholar]
  79. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000;67(1):170–181. doi: 10.1086/302959. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Rabiner LR. A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE. 1989;77:2. [Google Scholar]
  81. Rakitsch B, Lippert C, Topa H, Borgwardt K, Honkela A, Stegle O. A mixed model approach for joint genetic analysis of alternatively spliced transcript isoforms using RNA-Seq data. arXiv preprint arXiv:1210.2850. 2012.
  82. Roberts A, Pardo-Manuel de Villena F, Wang W, McMillan L, Threadgill DW. The polymorphism architecture of mouse genetic resources elucidated using genome-wide resequencing data: implications for QTL discovery and systems genetics. Mamm Genome. 2007;18(6–7):473–481. doi: 10.1007/s00335-007-9045-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Rosen G, Williams R. Complex trait analysis of the mouse striatum: independent QTLs modulate volume and neuron number. BMC Neurosci. 2001;2:5. doi: 10.1186/1471-2202-2-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Sandberg R, Yasuda R, Pankratz DG, Carter TA, Del Rio JA, Wodicka L, Mayford M, Lockhart DJ, Barlow C. Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci USA. 2000;97(20):11038–11043. doi: 10.1073/pnas.97.20.11038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, Linsley PS, Mao M, Stoughton RB, Friend SH. Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003;422(6929):297–302. doi: 10.1038/nature01434. [DOI] [PubMed] [Google Scholar]
  86. Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, GuhaThakurta D, Sieberts SK, Monks S, Reitman M, Zhang C, Lum PY, Leonardson A, Thieringer R, Metzger JM, Yang L, Castle J, Zhu H, Kash SF, Drake TA, Sachs A, Lusis AJ. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet. 2005;37(7):710–717. doi: 10.1038/ng1589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, Lightfoot S, Menzel W, Granzow M, Ragg T. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol. 2006;7:3. doi: 10.1186/1471-2199-7-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Shirley RL, Walter NA, Reilly MT, Fehr C, Buck KJ. Mpdz is a quantitative trait gene for drug withdrawal seizures. Nat Neurosci. 2004;7(7):699–700. doi: 10.1038/nn1271. [DOI] [PubMed] [Google Scholar]
  89. Solberg LC, Valdar W, Gauguier D, Nunez G, Taylor A, Burnett S, Arboledas-Hita C, Hernandez-Pliego P, Davidson S, Burns P, Bhattacharya S, Hough T, Higgs D, Klenerman P, Cookson WO, Zhang Y, Deacon RM, Rawlins JN, Mott R, Flint J. A protocol for high-throughput phenotyping, suitable for quantitative trait analysis in mice. Mamm Genome. 2006;17(2):129–146. doi: 10.1007/s00335-005-0112-1. [DOI] [PubMed] [Google Scholar]
  90. Stephens M, Balding DJ. Bayesian statistical methods for genetic association studies. Nat Rev Genet. 2009;10(10):681–690. doi: 10.1038/nrg2615. [DOI] [PubMed] [Google Scholar]
  91. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003;100(16):9440–9445. doi: 10.1073/pnas.1530509100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Svenson KL, Gatti DM, Valdar W, Welsh CE, Cheng R, Chesler EJ, Palmer AA, McMillan L, Churchill GA. High-resolution genetic mapping using the mouse diversity outbred population. Genetics. 2012;190(2):437–447. doi: 10.1534/genetics.111.132597. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Talbot CJ, Nicod A, Cherny SS, Fulker DW, Collins AC, Flint J. High-resolution mapping of quantitative trait loci in outbred mice. Nat Genet. 1999;21(3):305–308. doi: 10.1038/6825. [DOI] [PubMed] [Google Scholar]
  94. Taylor B. Recombinant inbred strains: Use in gene mapping. In: Morse HC, editor. Origins of inbred mice. New York: Academic Press; 1978. pp. 423–438. [Google Scholar]
  95. Threadgill DW, Churchill GA. Ten years of the collaborative cross. G3 (Bethesda) 2012;2(2):153–156. doi: 10.1534/g3.111.001891. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Valdar W, Flint J, Mott R. Simulating the collaborative cross: power of quantitative trait loci detection and mapping resolution in large sets of recombinant inbred strains of mice. Genetics. 2006;172(3):1783–1797. doi: 10.1534/genetics.104.039313. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JN, Mott R, Flint J. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet. 2006;38(8):879–887. doi: 10.1038/ng1840. [DOI] [PubMed] [Google Scholar]
  98. Valdar W, Holmes CC, Mott R, Flint J. Mapping in structured populations by resample model averaging. Genetics. 2009;182(4):1263–1277. doi: 10.1534/genetics.109.100727. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Walter NA, McWeeney SK, Peters ST, Belknap JK, Hitzemann R, Buck KJ. SNPs matter: impact on detection of differential expression. Nat Methods. 2007;4(9):679–680. doi: 10.1038/nmeth0907-679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Walter NA, Bottomly D, Laderas T, Mooney MA, Darakjian P, Searles RP, Harrington CA, McWeeney SK, Hitzemann R, Buck KJ. High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs. BMC Genomics. 2009;10:379. doi: 10.1186/1471-2164-10-379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Wang J, Williams RW, Manly KF. WebQTL: web-based complex trait analysis. Neuroinformatics. 2003;1(4):299–308. doi: 10.1385/NI:1:4:299. [DOI] [PubMed] [Google Scholar]
  102. West MAL, Kim K, Kliebenstein DJ, van Leeuwen H, Michelmore RW, Doerge RW, Clair DASt. Global eQTL mapping reveals the complex genetic architecture of transcript-level variation in arabidopsis. Genetics. 2007;175(3):1441–1450. doi: 10.1534/genetics.106.064972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Yalcin B, Flint J. Association studies in outbred mice in a new era of full-genome sequencing. Mamm Genome. 2012;23(9–10):719–726. doi: 10.1007/s00335-012-9409-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Yalcin B, Fullerton J, Miller S, Keays DA, Brady S, Bhomra A, Jefferson A, Volpi E, Copley RR, Flint J, Mott R. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci USA. 2004;101(26):9734–9739. doi: 10.1073/pnas.0401189101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Yang H, Ding Y, Hutchins LN, Szatkiewicz J, Bell TA, Paigen BJ, Graber JH, de Villena FPM, Churchill GA. A customized and versatile high-density genotyping array for the mouse. Nat Methods. 2009;6(9):663–666. doi: 10.1038/nmeth.1359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Yvert G, Brem RB, Whittle J, Akey JM, Foss E, Smith EN, Mackelprang R, Kruglyak L. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet. 2003;35(1):57–64. doi: 10.1038/ng1222. [DOI] [PubMed] [Google Scholar]
  107. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005;4:17. doi: 10.2202/1544-6115.1128. [DOI] [PubMed] [Google Scholar]
  108. Zhou JJ, Ghazalpour A, Sobel EM, Sinsheimer JS, Lange K. Quantitative trait loci association mapping by imputation of strain origins in multifounder crosses. Genetics. 2012;190(2):459–473. doi: 10.1534/genetics.111.135095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Zhu J, Wiener MC, Zhang C, Fridman A, Minch E, Lum PY, Sachs JR, Schadt EE. Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations. PLoS Comput Biol. 2007;3(4):e69. doi: 10.1371/journal.pcbi.0030069. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Mammalian Genome are provided here courtesy of Springer
