Abstract
Multiparental populations (MPPs) encompass greater genetic diversity than traditional experimental crosses of two inbred strains, enabling broader surveys of genetic variation underlying complex traits. Two such mouse MPPs are the Collaborative Cross (CC) inbred panel and the Diversity Outbred (DO) population, which are descended from the same eight inbred strains. Additionally, the F1 intercrosses of CC strains (CC-RIX) have been used and enable study designs with replicate outbred mice. Genetic analyses commonly used by researchers to investigate complex traits in these populations include characterizing how heritable a trait is, i.e. its heritability, and mapping its underlying genetic loci, i.e. its quantitative trait loci (QTLs). Here we evaluate the relative merits of these populations for these tasks through simulation, as well as provide recommendations for performing the quantitative genetic analyses. We find that sample populations that include replicate animals, as possible with the CC and CC-RIX, provide more efficient and precise estimates of heritability. We report QTL mapping power curves for the CC, CC-RIX, and DO across a range of QTL effect sizes and polygenic backgrounds for samples of 174 and 500 mice. The utility of replicate animals in the CC and CC-RIX for mapping QTLs rapidly decreased as traits became more polygenic. Only large sample populations of 500 DO mice were well-powered to detect smaller effect loci (7.5–10%) for highly complex traits (80% polygenic background). All results were generated with our R package musppr, which we developed to simulate data from these MPPs and evaluate genetic analyses from user-provided genotypes.
Keywords: CC, recombinant inbred intercross, CC-RIX, DO, MPP, multiparent advanced generation intercross, MAGIC, heritability, quantitative trait locus, QTL
The Collaborative Cross (CC), their F1 hybrids (CC-RIX), and Diversity Outbred (DO) stock are murine multiparental populations that possess high levels of genetic diversity and thus enable powerful genetic studies of complex traits. We compared the performance across these populations for heritability and QTL analyses. Populations with replicate animals (CC and CC-RIX) better estimated heritability. All populations had sufficient power to detect QTLs in simple genetic backgrounds, but only large DO populations mapped small effect loci in highly polygenic backgrounds.
Introduction
Multiparental populations (MPPs) are a powerful class of experimental cross for genetic studies of complex traits (de Koning and McIntyre 2017). Multiple isogenic founder strains are intercrossed to eventually produce offspring that possess recombinant genomes that encompass greater genetic diversity than traditional experimental crosses of two strains. MPPs have been developed across a wide range of model organisms, representing both animal models, such as heterogeneous stocks of mice and rats (Woods and Mott 2017), flies (Long et al. 2014), roundworm (Noble et al. 2017), and yeast (Cubillos et al. 2013) as well as plants, including Arabidopsis (Kover et al. 2009), maize (Dell’Acqua et al. 2015), and rice (Bandillo et al. 2013). Here we focus on three related MPPs of the house mouse, Mus musculus: the Collaborative Cross (CC) panel of inbred strains (Collaborative Cross Consortium 2012; Srivastava et al. 2017), their F1 intercrosses (CC-RIX) (Threadgill et al. 2011; Schoenrock et al. 2018; Sun et al. 2021), and the Diversity Outbred (DO) population (Churchill et al. 2012).
The CC, CC-RIX, and DO share the same eight inbred founder strains (short names in parentheses): A/J (AJ), C57BL/6J (B6), 129S1/SvImJ (129), NOD/ShiLtJ (NOD), NZO/HlLtJ (NZO), CAST/EiJ (CAST), PWK/PhJ (PWK), and WSB/EiJ (WSB), which include both traditional laboratory and wild-derived strains and represent three subspecies of Mus musculus (Yang et al. 2007, 2011). As recombinant populations, the CC and DO can be used to map quantitative trait loci (QTLs) and have thus been used to genetically dissect a wide range of phenotypes. Traits that have been studied in the CC include behavioral traits (Philip et al. 2011), hematology traits (Kelada et al. 2012), airway damage (Tovar et al. 2022) and allergies (Kelada et al. 2014), susceptibility to influenza (Ferris et al. 2013) and SARS-CoV (Gralinski et al. 2015; Schäfer et al. 2022), homeostatic immune regulation (Hampton et al. 2022), and drug response (Mosedale et al. 2017, 2019, 2021); in the DO, serum cholesterol (Svenson et al. 2012), insulin secretion (Keller et al. 2019), glutathione metabolism (Gould et al. 2021), response to benzene exposure (French et al. 2015), bone and skeletal traits (Al-Barghouthi et al. 2021), and working (Ouellette et al. 2020) and short-term (Hsiao et al. 2020) memory.
The number of realized fully inbred CC strains was lower than planned (The Complex Trait Consortium 2004), with approximately 70 strains realized rather than 1,000 due to extinctions caused by allelic incompatibilities (Shorter et al. 2017). This reduction in the number of available strains, i.e. the number of unique genomes, reduced the potential power of genetic mapping studies in the CC; however, the inbred nature of the CC enables the use of replicates within and across experiments. Strain replicates can improve mapping power by reducing variation due to noise. They can also capture and identify strain-specific genetic effects and phenotypes, which can be caused by strain-specific variants and/or unique combinations of alleles across multiple loci. CC strains with unique phenotypes have been identified for a number of traits and diseases, including spontaneous colitis (Rogala et al. 2014), peanut allergy (Orgel et al. 2019), immune cell diversity (Dupont et al. 2021), and susceptibility to virally induced neurological phenotypes (Eldridge et al. 2020), tuberculosis (Smith et al. 2019), and Salmonella (Zhang et al. 2019; Scoggin et al. 2022).
Recently, genomic studies have been a growing area of research for both the CC and DO, leveraging their genetic diversity in the presence of large genetic effects on molecular (i.e. omic) traits, such as gene expression (Aylor et al. 2011; Keller et al. 2018) and chromatin accessibility (Keele et al. 2020), proteins (Chick et al. 2016) and their phosphorylation sites (Zhang et al. 2022), and lipids (Linke et al. 2020), across tissues and organs. The effects of biologically related factors other than genetic variation, such as age, on gene expression and protein abundance have been studied in DO mice (Takemon et al. 2021; Gerdes Gyuricza et al. 2022). Omic studies have also been performed in embryonic stem cells derived from DO mice (Skelly et al. 2020; Aydin et al. 2022). Genetic effects are strongly consistent between the CC and DO populations (Keele et al. 2021), supporting the use of both for joint analysis and validation of findings between populations.
QTL mapping power in the CC was initially evaluated prior to the development of the final panel of fully inbred strains, using far greater numbers of simulated genomes than were actually realized (Valdar et al. 2006). We updated mapping power estimates by simulating from readily available CC strains (Keele et al. 2019). Mapping power has also been assessed in the DO population (Gatti et al. 2014). Here we extend our previous approach of simulating from observed genomes, now across the CC, CC-RIX, and DO, to evaluate and compare the performance of commonly used genetic analyses: mapping QTLs and quantifying genetic architecture (in the form of estimating heritability). This work will aid researchers in designing and tailoring their experiments to maximize their efficiency for various goals (e.g. characterizing genetic architecture or mapping causal genetic variation), as well as provide recommendations for best practices in performing the quantitative and statistical analyses.
Methods
Sample populations
Generating realistic genetic data for MPPs from scratch poses multiple challenges. An ideal approach would involve full simulation of the various breeding designs from initial founder strains all the way to the reconstruction of founder haplotypes from marker genotypes in the final recombinant offspring. To avoid this complex and computationally intensive approach, we simulated data for CC, CC-RIX, and DO (Fig. 1) from observed populations of CC and DO. This approach has the additional advantage of capturing the effects of genetic drift or any deviations from the breeding designs that occurred, better approximating real animals available to researchers.
Genotype data
The CC sample population consisted of 116 mice (Keele et al. 2021) (female/male pairs from 58 strains) that were genotyped on an 11,000 marker array (MiniMUGA) (Sigmon et al. 2020). We used two DO sample populations, including 192 mice (Chick et al. 2016) genotyped on a 57,000 marker array (MegaMUGA) (Morgan and Welsh 2015) and 500 mice (Keller et al. 2018) genotyped on a 143,000 marker array (GigaMUGA). Founder haplotypes were probabilistically inferred using a hidden Markov model (HMM) implemented in the qtl2 R package (Broman et al. 2019). Genetic mapping in experimental crosses is commonly performed in terms of founder haplotypes rather than specific genetic variants, with uncertainty accounted for using a mixture model (Lander and Botstein 1989) (i.e. interval or linkage mapping) or regression-based approximations (Haley and Knott 1992; Martínez and Curnow 1992). Because the CC were genotyped on a sparser array than the DO, their founder haplotype reconstruction possesses greater uncertainty. To make these sample populations as comparable as possible in terms of quantified founder haplotype uncertainty and its effects on heritability estimation and QTL mapping, we imputed founder haplotype probabilities at the same 64,000 loci (i.e. pseudomarkers) spanning the genome and then imputed the founder haplotype pair (i.e. diplotype) based on greatest probability. We note that the genetic data and corresponding results reported here are idealized to some extent, and real data will be subject to genotype uncertainty. We derived the founder diplotypes for the 1,653 CC-RIX F1s, representing all possible pairings between 58 CC strains (ignoring parent-of-origin features, e.g. sex chromosomes and mitochondria). See “Appendix A” for greater detail on how diplotypes were processed for all sample populations.
CC-RIX F1 selection
For the CC-RIX, given that it is unlikely researchers would collect data for a full set of all possible F1s (1,653 for 58 CC parental strains), we evaluated three classes of simulated populations. These three types of CC-RIX populations are not meant to represent all possible approaches to selecting F1s and designing a CC-RIX experiment, though they do possess distinct features described below.
For the first class of CC-RIX population, F1s were selected such that each CC strain is a parent for two F1s, as is possible with a rotational breeding scheme (e.g. CC001×CC058, CC001×CC002, CC002×CC003, …, CC057×CC058). We refer to these populations as “balanced” because all CC strains are observed equally as parental strains. Each F1 was simulated with multiple replicates. Balanced samples of CC-RIX F1s possess the same overall allele frequencies as a corresponding sample of the parental CC strains, allowing us to decouple the effects of allele frequency from how the CC-RIX population structure and overall heterozygosity affect heritability estimation and mapping power.
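The rotational scheme can be illustrated with a minimal Python sketch. The contiguous labels CC001 to CC058 and the function name `rotational_pairs` are hypothetical and used only for illustration; real CC strain identifiers are not contiguous.

```python
from math import comb


def rotational_pairs(strains):
    """Pair each strain with its successor, wrapping around at the end.

    With at least three strains, every strain is a parent of exactly two F1s.
    """
    n = len(strains)
    return [(strains[i], strains[(i + 1) % n]) for i in range(n)]


# Hypothetical contiguous labels, for illustration only.
strains = [f"CC{i:03d}" for i in range(1, 59)]
pairs = rotational_pairs(strains)

# Count how often each strain appears as a parent.
parent_counts = {s: 0 for s in strains}
for dam, sire in pairs:
    parent_counts[dam] += 1
    parent_counts[sire] += 1

# Number of all possible F1s when parent-of-origin is ignored.
n_all_f1s = comb(len(strains), 2)
```

Here `pairs` holds 58 F1s out of the `n_all_f1s` = 1,653 possible pairings, and every entry of `parent_counts` equals two, which is the sense in which the design is balanced.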
We also evaluated simulated populations that were composed of randomly sampled CC-RIX F1s, which we refer to as “unbalanced.” This results in unequal representation of parental CC strains by their offspring F1s in the resulting CC-RIX sample population. We evaluated unbalanced CC-RIX F1 sets in CC-RIX-only populations and in combination with the parental CC strains. Examples of each class of CC-RIX sample population are shown in Supplementary Figure S1.
Heritability
Heritability, the proportion of variation in a population explained by genetic relationship, is commonly used to assess evidence that a phenotype is genetically controlled. It is important to note that heritability is specific to a population, and thus cannot be extrapolated across populations. Furthermore, it can be influenced by other sources of variation in the data, such as the measurement error of the phenotype, which will likely reduce heritability estimates.
Depending on the sample population, heritability can be estimated using different approaches and even decomposed into multiple components, such as the proportion of variation explained by all genetic effects, i.e. broad-sense heritability (commonly symbolized as $H^2$) and the proportion of variation explained by additive genetic effects, i.e. narrow-sense heritability ($h^2$) (Lynch and Walsh 1998). In this work, we use the $h^2$ notation more generally and will include subscripts to specify components of heritability in certain contexts. For studies of inbred strains with replicates, as is possible with the CC, an intraclass correlation can be used (as in Yam et al. 2021), though it is less appropriate in the outbred CC-RIX and not applicable to the DO.
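As a rough illustration of the intraclass correlation for inbred strains with replicates, the sketch below uses the textbook one-way ANOVA estimator for a balanced design. This is an assumption made for illustration; it is not necessarily the estimator used in the cited work, and the function name and toy data are hypothetical.

```python
def intraclass_correlation(groups):
    """One-way ANOVA intraclass correlation for a balanced design.

    `groups` maps a strain label to its list of replicate phenotypes.
    Assumes at least some variation in the data (non-zero denominator).
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch requires equal replicates per strain")
    k = sizes.pop()   # replicates per strain
    m = len(groups)   # number of strains
    if k < 2 or m < 2:
        raise ValueError("need at least two strains and two replicates")

    means = {s: sum(v) / k for s, v in groups.items()}
    grand = sum(means.values()) / m
    # Mean squares between and within strains.
    ms_between = k * sum((mu - grand) ** 2 for mu in means.values()) / (m - 1)
    ms_within = sum(
        (y - means[s]) ** 2 for s, v in groups.items() for y in v
    ) / (m * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data: two hypothetical strains with two replicates each.
icc = intraclass_correlation({"strain_a": [1.0, 3.0], "strain_b": [5.0, 7.0]})
```

For the toy data, the between-strain mean square is 16 and the within-strain mean square is 2, so `icc` is 14/18.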
Heritability model
Linear mixed effects models (LMMs) offer an appealing general approach that can be adjusted for each of the CC, CC-RIX, and DO. We estimate heritability using the following general LMM:
$$y = \mu + u + \varepsilon \tag{1}$$
where $y$ is the phenotype vector for a sample population of N mice, $\mu$ is the intercept, and $u$ and $\varepsilon$ are N-length random vectors. $u$ can be referred to as the polygenic effect, here representing structured error (i.e. population structure), and is modeled as $u \sim \mathrm{N}(0, K\tau^2)$, where $K$ is an N × N (often additive) genetic relationship matrix (i.e. kinship matrix) and $\tau^2$ is the corresponding variance component. $\varepsilon$ is unstructured error, distributed according to $\varepsilon \sim \mathrm{N}(0, I\sigma^2)$, where $I$ is the N × N identity matrix and $\sigma^2$ is its variance component. Heritability is then calculated as the ratio of variation due to genetics to total variation:
$$h^2 = \frac{\tau^2}{\tau^2 + \sigma^2} \tag{2}$$
To obtain unbiased estimates of heritability, the variance component parameters are estimated by optimizing the restricted maximum likelihood (REML) (Patterson and Thompson 1971). The heritability LMM can be fit with various software packages; here we compare results from the qtl2 R package (Broman et al. 2019), our miQTL R package (available at https://github.com/gkeele/miqtl), and the sommer R package (Covarrubias-Pazaran 2016).
For sample populations that include strain or F1 replicates, such as in the CC and CC-RIX, the kinship matrix can be characterized as $K = Z K_M Z^{T}$, where $Z$ is the N × M strain/F1 identity matrix that maps the N individuals to M strains/F1s (M < N) and $K_M$ is the M × M kinship matrix encoding the overall genetic relationship between the M strains/F1s.
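Because each row of the strain/F1 identity matrix has a single one, the expansion from the M × M strain-level kinship matrix to the N × N individual-level matrix reduces to a lookup. The sketch below shows this in plain Python; the function name and the toy values are hypothetical.

```python
def expand_kinship(k_strain, strain_index):
    """Expand an M x M strain-level kinship matrix to N x N individuals.

    `strain_index[i]` is the row/column of `k_strain` for individual i,
    which plays the role of the identity (incidence) matrix Z. The result
    equals Z K_M Z^T without forming Z explicitly.
    """
    return [[k_strain[a][b] for b in strain_index] for a in strain_index]


# Toy example: two strains with three replicates each (hypothetical values).
k_m = [[1.0, 0.2],
       [0.2, 1.0]]
strain_index = [0, 0, 0, 1, 1, 1]
k = expand_kinship(k_m, strain_index)
```

Replicates of the same strain share the strain's diagonal value (`k[0][1]` is 1.0), while individuals from different strains inherit the off-diagonal value (`k[0][3]` is 0.2), producing the block structure that replicates contribute to the kinship matrix.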
Expanded heritability model with non-additive component for replicates
If replicates are included in the sample population, Equation (1) can be expanded to include two components of heritability:
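$$y = \mu + u_{\mathrm{add}} + u_{\mathrm{rep}} + \varepsilon \tag{3}$$
where $u_{\mathrm{add}}$ is equivalent to $u$ from Equation (1) in the presence of replicates, $u_{\mathrm{add}} \sim \mathrm{N}(0, Z K_M Z^{T} \tau^2_{\mathrm{add}})$, and $u_{\mathrm{rep}}$ is a random effect specific to strain/F1, $u_{\mathrm{rep}} \sim \mathrm{N}(0, Z Z^{T} \tau^2_{\mathrm{rep}})$. The two components of heritability are then estimated as
$$h^2_{\mathrm{add}} = \frac{\tau^2_{\mathrm{add}}}{\tau^2_{\mathrm{add}} + \tau^2_{\mathrm{rep}} + \sigma^2} \tag{4}$$
$$h^2_{\mathrm{rep}} = \frac{\tau^2_{\mathrm{rep}}}{\tau^2_{\mathrm{add}} + \tau^2_{\mathrm{rep}} + \sigma^2} \tag{5}$$
where $h^2_{\mathrm{add}}$ is the proportion of variation explained by additive genetic effects, i.e. narrow-sense heritability, and $h^2_{\mathrm{rep}}$ is the proportion of variation explained by strain/F1 identity, such as due to epistasis between loci with combinations of alleles unique to a given strain/F1. Of the three software packages we used, only sommer allows for flexible specification of multiple random effects, each with its own covariance matrix.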
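As a small worked example of the two-component decomposition, the sketch below converts three variance components into the additive and strain/F1-specific proportions. It assumes that each component is divided by the total variance (additive plus strain/F1-specific plus unstructured error); the function and argument names are hypothetical.

```python
def heritability_components(tau2_add, tau2_rep, sigma2):
    """Return (additive, strain/F1-specific) proportions of total variance."""
    total = tau2_add + tau2_rep + sigma2
    if total <= 0:
        raise ValueError("total variance must be positive")
    return tau2_add / total, tau2_rep / total


# Hypothetical variance components chosen so that the sum is 90%.
h2_add, h2_rep = heritability_components(tau2_add=6.0, tau2_rep=3.0, sigma2=1.0)
h2_total = h2_add + h2_rep
```

With these values the additive component is 60% and the strain/F1-specific component is 30%, for a cumulative heritability of 90%.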
Kinship matrix for heritability
The true kinship matrix $K$ is unknown and must be estimated from genotypes (haplotypes or SNPs), pedigree information, or both (Cheng et al. 2013). Here our focus is not a thorough analysis of how to best estimate the kinship matrix, though we do evaluate a number of options. For a SNP-based kinship matrix, we use the form described in Endelman and Jannink (2012) with selection parameter set to 0. For a haplotype-based kinship matrix, we used qtl2 (Broman et al. 2019), which calculates $K = \frac{1}{L}\sum_{p=1}^{L} A_p A_p^{T}$, where $A_p$ is the scaled founder haplotype dosage matrix at locus p and $L$ is the number of loci. Recently, Feldmann et al. (2022) proposed the average semivariance (ASV) transformation for kinship matrices, which resulted in improved heritability estimation. Here we also evaluated ASV forms of each kinship matrix as $K_{\mathrm{ASV}} = \frac{(N-1)\,K}{\mathrm{tr}(PKP)}$, where $P = I - \frac{1}{N}\mathbf{1}\mathbf{1}^{T}$ is the mean-centering matrix.
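The ASV transformation can be sketched in a few lines of Python. This sketch assumes the rescaling K_ASV = (N − 1) K / tr(PKP), with P = I − (1/N) 11ᵀ the mean-centering matrix, and uses the identity tr(PKP) = tr(K) − (1/N) Σᵢⱼ Kᵢⱼ to avoid forming P. The function name is hypothetical.

```python
def asv_kinship(k):
    """Rescale a kinship matrix to its average-semivariance (ASV) form.

    Assumes K_ASV = (N - 1) * K / tr(PKP), where P = I - (1/N) 1 1^T.
    Because P is idempotent, tr(PKP) = tr(K) - sum(K) / N.
    """
    n = len(k)
    trace = sum(k[i][i] for i in range(n))
    total = sum(sum(row) for row in k)
    tr_pkp = trace - total / n
    if tr_pkp <= 0:
        raise ValueError("tr(PKP) must be positive")
    scale = (n - 1) / tr_pkp
    return [[scale * value for value in row] for row in k]


identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
k_asv = asv_kinship(identity3)
```

Under this form the result is invariant to the overall scale of the input: multiplying the kinship matrix by any positive constant gives the same ASV matrix, and the identity matrix is left unchanged.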
Heritability simulations
We adapted our approach used in Keele et al. (2019) to simulate phenotype data with specified heritability; see “Appendix B” for greater detail. When simulating a single additive component of heritability for a given population, we simulated 1,000 data sets for each specified level of heritability. Heritability was evaluated across a grid, ranging from 0 to 1 with increments of 0.05. Heritability was then estimated using one of qtl2, miQTL, or sommer, and summarized with the mean and the 95% estimate interval (defined by the 0.025 and 0.975 empirical quantiles). Within this simulation framework, we also compared using kinship matrices estimated from SNPs and founder haplotypes, as well as ASV forms of both.
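The summary step (mean plus the 0.025 and 0.975 empirical quantiles) can be sketched as below. The quantile rule used here is linear interpolation between order statistics, which is an assumption; the source does not state which empirical quantile definition was used. The function names and the placeholder estimates are hypothetical.

```python
def empirical_quantile(values, prob):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = prob * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (pos - lower) * (xs[upper] - xs[lower])


def summarize_estimates(estimates):
    """Return the mean and the 95% estimate interval of heritability estimates."""
    mean = sum(estimates) / len(estimates)
    return (
        mean,
        empirical_quantile(estimates, 0.025),
        empirical_quantile(estimates, 0.975),
    )


# Grid of true heritability values: 0 to 1 in steps of 0.05.
h2_grid = [round(0.05 * i, 2) for i in range(21)]

# Placeholder estimates standing in for the output of one simulation setting.
placeholder_estimates = [i / 100 for i in range(101)]
mean_h2, lower_h2, upper_h2 = summarize_estimates(placeholder_estimates)
```

In an actual run, `placeholder_estimates` would be replaced by the 1,000 heritability estimates obtained for a given grid value.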
We extended the previous simulation approach to heritability with two components by fixing the total heritability at a specified level (e.g. 90%) and varying the ratio of the components across a grid, ranging from 0 to 1 with increments of 0.1. Only 100 data sets were simulated per parameter setting because the two-component heritability model is more computationally intensive to fit in sommer. Means and 95% estimate intervals were used as summaries as before, but now for each component of heritability separately as well as their sum.
We performed various comparisons of heritability estimation across the sample populations of CC, DO, and balanced and unbalanced CC-RIX by generating simulated populations derived from their genomes. Generally, we downsampled mice to make the total number of mice consistent across populations for a given comparison. For example, we randomly sampled 174 DO mice from the sample population of 192 to compare to 174 CC mice (three per 58 strains). When evaluating two components of heritability, we simulated larger populations of 522 mice given that each genome must be observed with multiple replicates to distinguish the strain- or F1-specific component from the additive one.
QTL mapping
The chromosomes of individuals from MPPs are mosaics of the founder haplotypes formed through recombination events that occurred during outbreeding (Fig. 1). Setting aside linkage disequilibrium (LD) and population structure, this mixing process randomizes loci with respect to each other, allowing genetic variation at loci to be causally associated with traits. We evaluate mapping power in the context of conventional single locus genetic mapping, though note that these approaches could be extended to multi-locus models and epistasis.
QTL model
The underlying model used during QTL mapping is similar to Equation (1) for heritability; however, a locus effect is evaluated at loci spanning the genome (i.e. a genome scan):
$$y = \mu + \mathrm{QTL}_{[p]} + u_{[c[p]]} + \varepsilon \tag{6}$$
where $\mathrm{QTL}_{[p]} = A_p \beta_{\mathrm{QTL}}$ is the effect of locus p, $A_p$ is the N × 8 scaled founder haplotype dosage matrix at locus p (assuming an additive model), $\beta_{\mathrm{QTL}}$ is the 8-vector of founder haplotype effects, $u_{[c[p]]}$ is the random polygenic effect vector with respect to chromosome c (on which locus p is located), and all other terms are as previously defined. This represents a leave-one-chromosome-out (LOCO) approach (Yang et al. 2014; Gatti et al. 2014) in which $u_{[c[p]]} \sim \mathrm{N}(0, K_{-c[p]}\tau^2)$, meaning $K$ is estimated from all markers excluding those on locus p’s chromosome, which avoids any of the effect of locus p being absorbed into the random term and increases mapping power. We used qtl2 to perform all genome scans.
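To make the LOCO idea concrete, the sketch below builds a kinship matrix for one held-out chromosome by averaging the cross-products of dosage matrices over loci on all other chromosomes. The averaged cross-product form is an assumption made for illustration, and the function names and toy data (two mice, two founders instead of eight) are hypothetical. In practice qtl2 provides this through `calc_kinship()` with `type = "loco"`.

```python
def add_cross_product(acc, dosage):
    """Add A A^T for one locus (A is N x F) into the N x N accumulator."""
    n = len(dosage)
    for i in range(n):
        for j in range(n):
            acc[i][j] += sum(x * y for x, y in zip(dosage[i], dosage[j]))


def loco_kinship(dosages_by_chr, held_out):
    """Average A_p A_p^T over loci on every chromosome except `held_out`."""
    loci = [
        dosage
        for chrom, matrices in dosages_by_chr.items()
        if chrom != held_out
        for dosage in matrices
    ]
    if not loci:
        raise ValueError("no loci remain after holding out the chromosome")
    n = len(loci[0])
    kinship = [[0.0] * n for _ in range(n)]
    for dosage in loci:
        add_cross_product(kinship, dosage)
    return [[value / len(loci) for value in row] for row in kinship]


# Toy data: one locus per chromosome, rows are mice, columns are founders.
toy = {
    "1": [[[1.0, 0.0], [1.0, 0.0]]],
    "2": [[[1.0, 0.0], [0.0, 1.0]]],
    "3": [[[0.5, 0.5], [1.0, 0.0]]],
}
k_minus_chr1 = loco_kinship(toy, held_out="1")
```

When scanning loci on chromosome 1, only chromosomes 2 and 3 contribute, so `k_minus_chr1` is `[[0.75, 0.25], [0.25, 1.0]]` for the toy data.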
QTL significance thresholds
Instead of using permutations (Churchill and Doerge 1994) to estimate significance thresholds that control the genome-wide family-wise error rate (FWER), we used parametric bootstrap samples generated from the null model (Equation (6) excluding the QTL term). We note that the bootstrap samples are generated given the true value of heritability, which would normally need to be estimated from the data and would thus be subject to error. Our goal is to produce thresholds that can be used across simulated data for a given population with the same parameter settings. Thresholds were calculated as quantiles from extreme value distributions fit from the maximum LOD scores across the null bootstrap samples (Dudbridge and Koeleman 2004). For a QTL to be correctly detected, we required its peak marker to be located within 1.5 LOD support intervals (Dupuis and Siegmund 1999), which we discuss in greater detail in the next section of the “Methods.”
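The threshold step can be approximated with a simplified stand-in: a Gumbel (type I extreme value) distribution fit by the method of moments to the maximum LOD scores from null samples. This is a swapped-in simplification for illustration and not necessarily the distribution family or fitting procedure used here; the function name and the LOD values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_threshold(max_lods, alpha=0.05):
    """Genome-wide threshold as the (1 - alpha) quantile of a fitted Gumbel.

    Method-of-moments fit: variance = (pi * scale)^2 / 6 and
    mean = location + EULER_GAMMA * scale.
    """
    mean = statistics.fmean(max_lods)
    sd = statistics.stdev(max_lods)
    scale = sd * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    # Gumbel quantile function: location - scale * log(-log(p)).
    return location - scale * math.log(-math.log(1 - alpha))


# Hypothetical maximum LOD scores, one per null bootstrap sample.
null_max_lods = [6.1, 7.4, 6.8, 8.2, 7.0, 6.5, 7.9, 7.2]
threshold_05 = gumbel_threshold(null_max_lods, alpha=0.05)
threshold_20 = gumbel_threshold(null_max_lods, alpha=0.20)
```

A stricter error rate gives a higher threshold, so `threshold_05` exceeds `threshold_20`. A chromosome-wide threshold follows from the same function by passing maxima taken over a single chromosome instead of the whole genome.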
In the context of an omic trait, such as a gene’s expression, there is strong biological support for genetic variants that are near the trait’s genomic position (i.e. local) having strong genetic effects, such as cis-eQTLs. This prior evidence allows for more lenient thresholds to be used to more powerfully detect local QTLs (Keele et al. 2020). Here we use a simple approach to lenient thresholds by repeating the previous genome-wide procedure, but now reducing the multiple testing burden to the loci on the trait’s chromosome (i.e. local chromosome). This is analogous to only considering the chromosome on which a gene is encoded for cis-eQTLs.
Confidence interval for QTL location
The goal of QTL mapping is often to identify candidate genes or functional variants that underlie a QTL. But just as estimation of the effects of the QTL is subject to error, so is the estimation of QTL location. Furthermore, most often the causal genetic variant itself is not being tested as a QTL, but rather a locus that is in LD with it. This is particularly relevant in MPPs, where it is conventional to initially detect QTL using sparser scans based on founder haplotypes at intervals across the genome (Equation (6)). A confidence or support interval for the location of QTL summarizes this uncertainty and can prioritize a genomic region for specific candidate genes or genetic variants.
Approximate likelihood-based support intervals are commonly used due to their ease of computation. They were statistically characterized for F2 intercrosses and backcrosses (Dupuis and Siegmund 1999) but not for MPPs such as the CC, CC-RIX, or DO. Empirical sampling-based approaches have also been proposed, such as nonparametric bootstrapping (Visscher et al. 1996), though they have been found to perform poorly with sparse markers (Manichaikul and Dupuis 2006). Here we evaluate approaches for estimating QTL intervals in CC, CC-RIX, and DO populations, including likelihood-based intervals (LOD support intervals and approximate Bayes credible intervals [Broman and Sen 2009]) and sampling-based intervals (parametric bootstrap, parametric permutation, and Bayesian bootstrap [Rubin 1981]). See “Appendix C” for more details on each method.
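A LOD support interval can be sketched as the contiguous run of loci around the peak whose LOD scores stay within a fixed drop of the maximum. The version below is a simplified illustration; production implementations may differ in detail, for example by extending the interval to the first flanking marker outside the drop. The function name and toy scan are hypothetical.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, peak, right) positions of a LOD support interval.

    Expands outward from the peak locus while neighboring LOD scores
    remain at or above (max LOD - drop).
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak], positions[right]


# Toy scan along one chromosome (positions in Mbp, hypothetical values).
positions = [10, 11, 12, 13, 14, 15, 16]
lods = [2.0, 5.0, 7.2, 8.0, 6.9, 4.0, 7.0]
interval = lod_support_interval(positions, lods)
```

For the toy scan the peak LOD is 8.0 at position 13 and the cutoff is 6.5, giving the interval from 12 to 14. The locus at position 16 also exceeds the cutoff but is excluded because it is not contiguous with the peak.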
QTL simulations
We extended our approach to simulating mapping data in the CC (Keele et al. 2019) to the CC-RIX and DO populations. We evaluated QTL mapping performance across the populations, comparing power, mapping resolution, and interval summaries for QTL location, while varying the number of mice and the proportion of phenotypic variation due to a QTL and the cumulative effect of background genetic loci (approximated through a polygenic effect). All combinations of QTL effect size ranging from 10% to 40% in increments of 5% and polygenic background of 5% (essentially a Mendelian trait), 30%, and 55% were simulated for CC, CC-RIX, and DO populations of 174 and 500 mice. We also considered the context of small QTLs (≤10%) in highly complex traits (polygenic background of 80%). For this case, we simulated QTL effect sizes of 1%, 2.5%, 5%, 7.5%, and 10% for sample populations of 174 and 500 mice.
For CC and balanced CC-RIX samples of 174, three replicates per 58 strains/F1s were simulated. We used 174 because we had genetic data for 58 CC strains and a smaller DO sample population of 192 mice. Using three replicates per CC strain resulted in 174 mice, whereas four replicates (232 mice) would have exceeded the smaller DO sample. For samples of 500, ten replicates per 50 strains/F1s were simulated for CC and balanced CC-RIX. Unbalanced CC-RIX populations did not include any replicates.
Across all simulation scenarios (QTL effect size, polygenic background, population, sample size), we randomly selected 1,000 loci as QTLs from which to simulate data. The variation explained by the QTL and polygenic background were strictly controlled with respect to the sample population. Each QTL was simulated as having a bi-allelic effect evenly split across the founder strains. We discuss the implications of this further in the “Results and discussion.”
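A bi-allelic QTL evenly split across the founders can be illustrated as a random partition of the eight founder strains into two groups of four. The sketch below assigns unscaled effects of +0.5 and −0.5; the scaling needed to hit a target proportion of variance explained is not reproduced here, and the function name is hypothetical.

```python
import random
from math import comb

FOUNDERS = ["AJ", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB"]


def biallelic_effects(rng, founders=FOUNDERS):
    """Randomly assign half of the founders to each of two alleles.

    Returns unscaled founder haplotype effects (+0.5 or -0.5) that sum to zero.
    """
    carriers = set(rng.sample(founders, len(founders) // 2))
    return {f: (0.5 if f in carriers else -0.5) for f in founders}


effects = biallelic_effects(random.Random(1))

# Number of distinct 4-versus-4 splits when the allele labels are interchangeable.
n_even_splits = comb(len(FOUNDERS), 4) // 2
```

There are 35 such even splits of eight founders, so independent draws across simulated QTLs cover a range of allelic series rather than a single fixed pattern.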
We also performed reduced sets of simulations to compare mapping power when analyzing CC strain means versus individual mouse data. In addition, we evaluated how differences in allele frequency and heterozygosity across the CC, CC-RIX, and DO populations may affect mapping results. For the comparison of CC strain means to individual-level data, 1,000 QTLs were simulated in CC populations of 116 mice (2 replicates per 58 strains) across the previously used low-to-moderate polygenic backgrounds (5%, 30%, and 55%). To evaluate the effect of allele frequency and heterozygosity, we scaled the QTL effect size with respect to a reference population, one that is fully inbred with perfectly balanced allele frequencies. We simulated 1,000 QTLs in populations of 174 mice with a polygenic background of 30%. For both reduced analyses, we varied QTL effect size from 10% to 40% in increments of 5%. See “Appendix D” for further explanation of our simulation approach for QTL data.
Results and discussion
We simulated data using previously observed genomes of CC, CC-RIX, and DO mice to evaluate heritability estimation and QTL mapping; see “Methods” for more details on the populations, the simulations, and statistical methods used.
Heritability estimation performance
ASV transformation improves accuracy and precision
We first compared heritability estimation across kinship matrices (haplotype-based and SNP-based, both non-ASV and ASV forms) and statistical software (qtl2, miQTL, and sommer) that can fit the same LMM (Supplementary Figure S2). Haplotype-based and SNP-based kinship matrices with matching ASV status performed similarly, which is not surprising given that the SNPs were imputed from the haplotypes. This does confirm that these estimators produce similar genetic relationship matrices. In terms of software packages, miQTL and sommer produce essentially identical estimates of heritability, performing best with ASV kinship matrices. In contrast, qtl2 performs best in the DO with non-ASV haplotype-based matrices and is negatively biased when estimating heritability in the CC. For these reasons, we present the remaining heritability results using miQTL, which is computationally faster than sommer, and using ASV haplotype-based kinship matrices.
Replicates improve precision
We first evaluated heritability estimation in small samples of 50 and 100 mice in the CC and DO (Fig. 2). For a sample of 50 mice, the mean heritability across the simulations was more biased at the tails of the distribution. More importantly, the 95% estimate intervals cover the full support of heritability; in other words, regardless of the true value of heritability, the interval ranges from 0% to 100% (Fig. 2a–b). Doubling the number of animals to 100 increases the precision of heritability estimation (Fig. 2c–d), most notably in the CC population with two replicates per strain. For a 50% heritable trait in 100 DO mice, the 95% estimate interval still ranges from 0% to 100%, whereas in the CC, it ranges from 30% to 65%. In the CC, there was also greater variability in heritability estimates when the true heritability was lower, though the widest intervals are still notably narrower than in the DO sample population of the same number of mice.
We also evaluated heritability estimation using CC strain means (Fig. 2e), which are commonly used for QTL mapping, and observed upward bias and intervals that cover the full support of heritability. The upward bias is consistent with the expected reduction in noise on a mean, and the wide intervals are due to the loss of the within-strain correlation information. The underlying heritability model could be adjusted for strain means, incorporating the number of replicates per strain and standard errors on the strain means, which would likely correct bias and increase precision. However, this would require statistical models tailored to data with replicates. In the absence of such models, this finding suggests that replicates should not be reduced to strain-level summaries for the purpose of estimating heritability.
We expanded our evaluation of heritability estimation to include three classes of CC-RIX populations while fixing the number of total mice at 174 (three replicates per 58 strains/F1s for CC and Balanced CC-RIX; Fig. 3) and 500 (10 replicates per 50 strains/F1s for CC and Balanced CC-RIX; Supplementary Figure S3). Compared to populations of 50 or 100 mice, larger populations of 174 or 500 are able to estimate heritability more accurately and precisely. To more directly compare how the number of replicates versus the total number of mice influence the accuracy and precision of heritability estimation across the populations, we performed 1,000 simulations for varying numbers of replicates and total mice at two levels of heritability (25% and 75%) (Fig. 4). Overall, bias was not an issue for any of the populations except for small samples of CC (50 mice with no replicates) and DO (50–100 mice) (Fig. 4a). Precision of heritability, however, was notably improved through replicates rather than total number of mice—a sample of five replicates per 10 CC strains (50 mice total) has similar or better precision than a DO population of 400–500 mice (Fig. 4b). For example, the 95% estimate interval width in a sample of five mice from 10 CC strains is 15.1% compared to 100.0% in 100 DO mice, 43.3% in 400 DO mice, and 32.7% in 500 DO mice when the true heritability is 75%.
The CC and CC-RIX populations provide more precise heritability estimation than the DO because they possess greater variation in inter-relatedness across their sample populations (essentially, population structure) (Supplementary Figure S4) in the form of genetic replicates and/or shared parental CC strains (for the CC-RIX). In data simulated for 174 mice and heritability set to 80%, all populations produce low bias, but the 95% estimate interval width is greater than 45% for the DO compared with around 10% for the CC and CC-RIX (Supplementary Figure S4b). Notably, population structure also has implications for QTL mapping power. For example, not accounting for population structure in QTL mapping produces many false positives, including many with strong associations (LOD score >10), in the unbalanced CC-RIX populations (Supplementary Figure S4c). This emphasizes the importance of accounting for population structure through the LMM in QTL mapping. We will discuss QTL mapping power in greater detail below.
Replicates of CC-RIX distinguish additive and non-additive components of heritability
We next performed simulations with two components of heritability: an additive component as before that reflects the additive kinship matrix, as well as a strain- or F1-specific component. Because replicate DO mice are not an option, we only evaluated CC and CC-RIX populations (Fig. 5). All simulations had a cumulative heritability of 90% in populations of 522 mice. Across 100 simulations in the CC, the ratio of estimated components is generally unbiased but with 95% intervals ranging from 0% to 90%, whereas the cumulative heritability was accurately and precisely estimated at 90% (Fig. 5b). In the CC, it is unsurprising that cumulative heritability can be accurately estimated while the individual components are indistinguishable: the strains of the CC are largely equally related, and thus the majority of population structure in a sample population comes from replicates, confounding the additive and strain-specific effects. This is further confirmed by how fitting an additive-only heritability model to data simulated with both additive and strain components generally returns the cumulative heritability total rather than the additive component (Supplementary Figure S5a).
All CC-RIX populations better distinguish the additive and non-additive components of heritability than the CC, particularly the unbalanced CC-RIX populations (Supplementary Figure S1). Fitting an additive-only heritability model in simulated unbalanced CC-RIX data with two components is also better able to capture only the additive component, though some of the non-additive variation does appear to upwardly bias the estimate (Supplementary Figure S5b). These findings demonstrate the utility of the CC-RIX and their unique half-sibling F1s for distinguishing additive and non-additive genetic effects.
QTL mapping power curves for traits with low-to-moderate polygenic backgrounds
We evaluated the power to detect QTLs with effect sizes ranging from 10% to 40% by 5% increments in traits with low-to-moderate polygenic backgrounds (5%, 30%, and 55%) (Fig. 6a). This range of genetic effect parameters covers a phenotype spectrum from simple Mendelian traits to more genetically complicated ones. Sample populations of 174 (three replicates per 58 strains/F1s in CC and balanced CC-RIX) and 500 (10 replicates per 50 strains/F1s) were simulated and evaluated.
Population structure reduces QTL mapping power
For nearly monogenic traits (5% polygenic background), all populations were well powered to detect QTL with effect sizes within the range of 10% to 40% (Fig. 6a [top row]). The value of replicates for QTL mapping is maximized for traits with low polygenic effect, with the CC and balanced CC-RIX mostly out-performing even the DO. As the genetic background of the trait became more polygenic, populations with high levels of structure (i.e. unequal relatedness across individuals) due to replicates or CC-RIX half-sibling F1s performed more poorly. With a 30% polygenic effect, populations of 174 with replicates (CC and balanced CC-RIX) were essentially as well powered as populations of 500 (Fig. 6a [middle row]). This becomes more pronounced with 55% polygenic traits; populations of 174 with replicates were better powered than corresponding populations of 500 because they included more distinct genomes (58 CC strains compared to 50) (Fig. 6a [bottom row]). The unbalanced CC-RIX populations of 500 are only slightly better powered than the corresponding populations of 174. In contrast to the CC and CC-RIX populations, mapping power in the DO was highly consistent across background polygenicity.
DO provide narrower QTL intervals than CC or CC-RIX
Mapping in the DO produced narrower QTL intervals, as measured by LOD support intervals, followed by CC and balanced CC-RIX, and finally the unbalanced CC-RIX populations (Fig. 6b). The QTL interval width is expected to be inversely related to the number of distinct recombination events that are observed in a mapping population. These results are consistent with this expectation, given that the DO possess far more recombination events that occurred during additional outbreeding generations over the CC and CC-RIX, as well as possessing more genetically unique individuals than populations with replicates. The unbalanced CC-RIX populations potentially include the fewest recombination events because not all the available 58 CC strains are necessarily selected as parental strains, resulting in wider QTL intervals. We also note that the DO continue to be intercrossed to produce new generations, with each subsequent generation accruing more recombinations and thus finer mapping resolution in principle. LOD support intervals were used as a summary of mapping resolution due to ease of computation. Further on we more deeply evaluate the statistical performance of QTL interval estimates.
Individual-level data improved QTL mapping power over strain/F1 means
In the CC, it has been common practice to perform QTL mapping based on strain-level summaries (e.g. means) rather than individual mouse-level data. Often specific animals are not even genotyped, and instead resource genotypes summarized from multiple ancestors are used (available at http://csbio.unc.edu/CCstatus/index.py?run=FounderProbs). Strain summaries also have the added benefit of reducing the data and the resulting computational burden. We sought to evaluate how this strain-level approach compared to use of individual-level data in terms of QTL mapping power. Based on simulations of 1,000 QTLs in 116 mice (two replicates per strain), we observed reduced mapping power with CC strain means compared to individual-level data. For a 40% QTL in a 30% polygenic background, we observed 88% power in CC strain means compared to 98% in CC individuals (Supplementary Figure S6). The disparity in power varied with QTL effect size and polygenic background, but individual-level data universally performed better. We primarily report results using strain means (and F1 means for balanced CC-RIX) given this approach has been commonly used, but these findings suggest that the use of individual-level data results in appreciable gains in mapping power.
QTL mapping power curves for traits with a highly polygenic background
We next evaluated power to detect QTLs with small effect sizes (1–10%) in traits that are highly polygenic (80%) (Fig. 6c). These simulations are meant to approximate highly heritable traits that are controlled by many genetic loci with small individual effects, such as height in humans (Vinkhuyzen et al. 2013), which has heritability estimates around 80%. As before, sample sizes of 174 and 500 were evaluated.
Only large samples of DO are well-powered to detect small effect QTLs in a highly polygenic background
Only sample populations of 500 DO mice were potentially powered to detect QTLs in a highly polygenic background, whereas CC and CC-RIX populations had no power. Specifically, samples of 500 DO mice were well powered (>80%) to detect QTLs that explain 7.5–10% of phenotypic variation. A key takeaway is that as the genetic background becomes more polygenic, the value of a genetic replicate in terms of mapping power decreases. Though genetic replicates will still greatly increase the precision of heritability estimates, they contribute essentially no improvement to mapping power for traits with polygenic backgrounds of 30% or greater.
QTL mapping power curves for omic traits using lenient local significance thresholds
Genetic variants that strongly affect an omic trait are often in close proximity to the trait’s genomic coordinate, such as a cis-eQTL that affects the availability of a gene’s promoter for the initiation of transcription. In this context, analyses focused on the local genomic region of a trait can improve power by reducing the multiple testing burden (Keele et al. 2020). We reevaluated the power to detect QTLs in the prior scenarios, but now assuming all 1,000 simulated QTLs are local and restricting our testing to the local chromosome of each trait (Fig. 7). Detecting QTLs based on local significance improves power, most notably in the CC and CC-RIX for traits with low-to-moderate polygenic backgrounds, suggesting that they have more utility for mapping local QTLs for omic studies than traditional complex traits, in which they are often under-powered (Fig. 6). In a highly polygenic background, the power gains are less noticeable and the CC and CC-RIX do not exceed 12% power for any QTL effect size.
The effect of genotype frequencies on QTL effect size in mapping populations
For the power curves reported above, we strictly scaled the effects of the simulated QTL so that it causes a specified proportion of the total phenotypic variation in the observed mapping population. Consider the case of an allele that is rare in one sample population but common in another; this scaling implicitly increases the effect of the QTL in the population in which the allele is rare to equalize the variation explained across populations. More broadly speaking, controlling the proportion of variation explained by the QTL in the sample population will inflate the QTL effect as its minor allele frequency decreases. In practice this likely inflates the power estimate for the population in which the minor allele is rarer.
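The inflation described above can be illustrated with a minimal sketch. It assumes a bi-allelic QTL in a fully inbred sample, genotypes coded 0 or 1 (homozygous for either allele), and a total phenotypic variance of 1. The function name and the example frequencies are hypothetical and are not part of musppr.

```python
import math


def effect_for_variance_explained(prop_var, maf):
    """Difference between the two homozygote means needed for a QTL to
    explain `prop_var` of a unit phenotypic variance in an inbred sample
    where the minor allele has frequency `maf` (illustrative assumption)."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    if not 0.0 < prop_var < 1.0:
        raise ValueError("prop_var must be in (0, 1)")
    genotype_var = maf * (1.0 - maf)  # variance of a 0/1 genotype indicator
    return math.sqrt(prop_var / genotype_var)


# Same 10% QTL, progressively rarer minor allele: the required effect grows.
effects = {maf: effect_for_variance_explained(0.10, maf)
           for maf in (0.5, 0.25, 0.125)}
for maf, effect in effects.items():
    print(f"MAF {maf:.3f}: required effect {effect:.3f}")
```

Holding the proportion of variance fixed therefore implies a larger underlying effect wherever the minor allele is rarer, which is the implicit inflation noted in the text.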
In MPPs, allele frequencies will be strongly affected by the allelic series at the QTL, i.e. how the QTL’s alleles are distributed among the founder strains (Crouse et al. 2020). For all results reported here, we simulated a bi-allelic variant with alleles evenly distributed among the founder strains, which results in fairly balanced allele frequencies across the populations. This assumption is optimistic in terms of power, given that many QTLs are driven by alleles from the phylogenetically distinct CAST and PWK founder strains (Aylor et al. 2011; Keele et al. 2020). We previously explored how the allelic series of a QTL influenced mapping power in the CC (Keele et al. 2019); less balanced allelic series generally had reduced mapping power due to lower allele frequencies. Researchers should expect QTLs with less balanced allelic series to have reduced power compared with the estimates reported here.
Beyond allele frequencies, when comparing inbred and outbred populations, the genotype frequencies at the QTL can greatly influence the phenotypic variation observed in the sample population. Variation due to the QTL is maximized when the sample population is composed of individuals with homozygous genotypes at the QTL rather than heterozygous, resulting in a larger QTL effect size in the CC than the CC-RIX or DO. Though power curves are often contextualized in terms of the proportion variance explained within the mapping population, given the large-scale differences in genotype frequencies between these populations, we sought to evaluate how these population features would affect mapping power. We again performed 1,000 simulations for QTLs with effect sizes ranging from 10% to 40% (increments of 5%) in a 30% polygenic background for samples of 174 mice. We now scaled the QTL effect equally across the CC, CC-RIX, and DO populations with respect to a reference population with homozygous genotypes and balanced allele frequencies at the QTL. The effect of scaling to the same reference population on mapping power, as well as the relationships between QTL effect size in the sample population and minor allele frequency and heterozygosity are shown in Fig. 8.
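As a worked illustration of why homozygous genotypes maximize the variation due to an additive QTL, consider a bi-allelic locus with minor allele frequency q, per-allele effect a, and allele dosage x in {0, 1, 2}. The symbols q, a, and x are introduced here only for illustration, and the outbred case assumes Hardy-Weinberg proportions, which is a simplification for the DO and CC-RIX.

```latex
\begin{aligned}
\text{Inbred, } x \in \{0, 2\}: \quad & \mathrm{Var}(a x) = a^{2} \cdot 4q(1-q) \\
\text{Outbred, Hardy-Weinberg}: \quad & \mathrm{Var}(a x) = a^{2} \cdot 2q(1-q)
\end{aligned}
```

Under these assumptions the same per-allele effect produces twice the QTL variance in the inbred sample; with q = 0.5 and a = 1, for example, the two values are 1 and 0.5.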
QTL effect size is maximized in the CC
The mapping power in the CC was only slightly reduced based on scaling to the reference population (Fig. 8a), due to minor imbalances in allele frequency reducing the QTL effect size (Fig. 8b). Heterozygosity had no effect on QTL effect size in the inbred CC because they are fully homozygous (Fig. 8c). The DO experienced a larger reduction in mapping power than the CC, resulting in its mapping power being slightly lower than the CC’s, mostly due to large-scale heterozygosity. Mapping powers in the CC-RIX populations were penalized the most, likely due to the combined effects of heterozygosity, imbalanced allele frequencies, and population structure.
These findings suggest there is some cause for optimism when mapping in the CC, in which an additive QTL effect is likely to be maximized within the sample population. It is important to note that deviations from additivity could lessen this effect. These findings also reiterate that the CC-RIX populations have less utility for QTL mapping. Samples of 174 CC and DO are well-powered to detect a large 35–40% QTL (in the reference population), whereas none of the CC-RIX have greater than 50% power. We emphasize that though the inbred genotypes of the CC maximize simulated QTL effect size and thus power in some contexts, this benefit will still be limited by the reduced number of unique genomes compared to the DO, particularly as the genetic complexity of the trait increases. Large sample populations of DO are by far the best option for mapping highly polygenic traits.
QTL location interval performance
QTL location intervals allow researchers to survey and prioritize candidate genetic variants and genes near a QTL. Defining a conventional statistical confidence interval for QTL location is challenging because of the non-smooth likelihood of the location parameter of the QTL model due to discrete genotype markers (Manichaikul and Dupuis 2006). Approximations to a confidence interval include likelihood-based methods that estimate intervals using the LOD scores from markers around a detected QTL as a profile likelihood for QTL location. Two such methods include the LOD support (Dupuis and Siegmund 1999) and Bayes credible intervals (Broman and Sen 2009). These approaches were proposed and calibrated in simpler bi-parental crosses (e.g. F2 intercrosses and backcrosses). The greater complexity of the multi-allelic QTL model (Equation (6)) for MPPs compared to bi-parental populations warrants statistical assessment in the context of the CC, CC-RIX, and DO.
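To make the LOD support idea concrete, the following is a minimal sketch that treats the LOD profile on one chromosome as a profile likelihood and reports the contiguous region around the peak whose LOD stays within a chosen drop of the maximum. The function name, the default drop of 1.5, and the toy profile are assumptions for illustration; software implementations may treat flanking markers differently.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (lower, upper) marker positions bounding the contiguous
    region around the peak where LOD >= max(LOD) - drop."""
    if len(positions) != len(lods) or not lods:
        raise ValueError("positions and lods must be non-empty and equal length")
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    lower = peak
    while lower > 0 and lods[lower - 1] >= cutoff:
        lower -= 1  # extend left while still above the cutoff
    upper = peak
    while upper < len(lods) - 1 and lods[upper + 1] >= cutoff:
        upper += 1  # extend right while still above the cutoff
    return positions[lower], positions[upper]


# Toy profile (positions in Mbp); the last marker exceeds the cutoff but is
# not contiguous with the peak, so it is excluded.
toy_positions = [10, 11, 12, 13, 14, 15, 16]
toy_lods = [2.0, 5.5, 7.2, 8.0, 6.9, 4.0, 6.8]
interval_15 = lod_support_interval(toy_positions, toy_lods, drop=1.5)
interval_10 = lod_support_interval(toy_positions, toy_lods, drop=1.0)
```

A smaller drop gives a narrower interval with lower coverage, which is the trade-off that the support level controls.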
Sampling the data represents another approach to quantifying uncertainty on QTL location through methods like bootstrapping and permutation. We evaluated three sampling-based QTL location intervals: parametric bootstrap, parametric permutation, and Bayesian bootstrap (Rubin 1981). See “Appendix C” for more details on these sampling-based methods. Note that in practical terms the likelihood-based approaches are appealing because they involve no sampling and therefore require significantly less computation.
We estimated these intervals for the 1,000 simulated QTLs in the CC, CC-RIX, and DO populations of 174 animals in the 40% QTL and 30% polygenic background scenario. For all methods, we evaluated how the QTL coverage rate (the rate that the estimated interval included the true QTL across simulations) compared to the support level or nominal probability. For methods with a nominal probability (e.g. 95% confidence interval), ideally the observed coverage rate would be close to the nominal probability. We also evaluated methods based on mapping power when strictly requiring the estimated QTL interval to cover the true location.
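The two summaries described above can be sketched as follows. This is a minimal illustration that assumes the coverage rate is computed among detected QTLs, while power with required coverage is computed over all simulations; the function name and the tuple layout are hypothetical.

```python
def coverage_and_power(simulations):
    """Summarize interval performance across simulated QTLs.

    Each simulation is a tuple (detected, lower, upper, true_position).
    Returns (coverage rate among detected QTLs,
             power when the interval must also cover the true position).
    """
    n_total = len(simulations)
    detected = [s for s in simulations if s[0]]
    covered = [s for s in detected if s[1] <= s[3] <= s[2]]
    coverage_rate = len(covered) / len(detected) if detected else float("nan")
    power_with_coverage = len(covered) / n_total if n_total else float("nan")
    return coverage_rate, power_with_coverage


# Four toy simulations: three detections, two of which cover the true locus.
example = [
    (True, 10.0, 20.0, 15.0),
    (True, 10.0, 20.0, 25.0),
    (False, None, None, 15.0),
    (True, 5.0, 8.0, 6.0),
]
coverage_rate, power_with_coverage = coverage_and_power(example)
```

For an interval with a nominal probability, the first value is the one to compare against that probability.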
The support level that corresponded to a LOD support interval with 80% coverage varied across populations and statistical threshold, ranging from 1.25 in the CC and balanced CC-RIX to less than 1 (≈0.7) in the unbalanced CC-RIX populations (Supplementary Figure S7a). The likelihood-based Bayes credible interval performed similarly to the LOD support interval, though power was slightly reduced for all populations (Supplementary Figure S7b). Bayes credible intervals were also conservative in terms of coverage rate: observed coverage was generally higher than the nominal probability.
Likelihood-based QTL intervals outperform sampling-based ones
The sampling-based intervals resulted in reduced mapping power compared to likelihood-based intervals (Supplementary Figure S8). However, the observed coverage rate for sampling-based methods did generally track more closely with the nominal probability than for the likelihood-based Bayes credible intervals. In terms of coverage rate and the median interval width (narrower being better), the likelihood-based intervals generally performed better (Fig. 9 and Supplementary Figure S9). For likelihood-based methods, the Bayes credible intervals were slightly narrower than the LOD support intervals, but this was balanced by LOD support intervals having slightly better coverage. For sampling-based intervals, parametric bootstrap and permutation outperformed Bayesian bootstrap. In summary, we found likelihood-based intervals to be the superior option for estimating QTL intervals in the CC, CC-RIX, and DO given their computational ease and overall superior statistical performance.
We note that these results are based on simulated data, which often does not reflect all the structure and complexity of real data. Fitting the founder haplotype model, particularly in smaller sample populations (<200 mice), can result in unstable associations when there are rare alleles present (Keele et al. 2018; Hsiao et al. 2020). We speculate that the likelihood-based intervals could be susceptible to these problematic loci, producing overly certain, i.e. narrow, QTL intervals. In these cases, researchers should consider comparing multiple interval estimates, including sampling-based ones.
Extending the musppr R package to future studies
We designed musppr to be reusable, allowing researchers to input genetic data from their own sample populations of CC, CC-RIX, and DO, and thus tailoring findings to specific studies. Its functions are amenable to being run in parallel on a computing cluster, allowing deeper evaluations of experimental performance, such as mapping power across more QTL effect size and polygenic background settings, which could be useful for proposals and when planning experiments. The broad findings reported here, such as the value of genetic replicates for estimating heritability, are also largely valid when extrapolating to non-recombinant inbred panels, such as the CC/DO founder strains or the Hybrid Mouse Diversity Panel (Lusis et al. 2016), as well as non-mouse MPPs. We do caveat that to use musppr to analyze genetic data from non-mouse MPPs, functions may need to be expanded or adjusted. For example, if the model underlying the QTL mapping analysis requires population-specific features that are not available in the model fit by the qtl2 R package, musppr’s mapping function would need to be adjusted.
Conclusions
Here we evaluated the performance of three related genetically diverse mouse MPPs, the CC, CC-RIX, and DO, in estimating heritability and mapping QTLs, commonly used genetic analyses for the study of complex traits. Our findings provide examples of best practices for researchers designing studies with these population resources, such as using the ASV form of the kinship matrix for heritability estimation. More broadly, this work reveals the relative strengths of these populations. Replicate mice in the CC and CC-RIX samples result in more efficient estimation of heritability, potentially offering more precise estimates from far fewer animals than would be required in the DO. The CC and CC-RIX can be powerful tools for genetic mapping when the QTL effect is large (≥40%) and the genetic architecture is fairly simple, but as the trait becomes more polygenic and QTL effect sizes smaller, only large sample populations of DO are likely to be well-powered for QTL mapping. The complex population structure of the CC-RIX reduces mapping power, but does enable more accurate estimation of additive heritability. Furthermore, the CC-RIX can be used to detect parent-of-origin effects using reciprocal F1 designs (Oreper et al. 2018; Sun et al. 2021), though we do not investigate these approaches here. These key principles should extend to MPPs (and more broadly any mapping population) of other organisms with similar experimental features (e.g. genetic replicates, inbred/outbred).
These results emphasize the complementary nature of these populations for joint analyses. For example, even though CC and CC-RIX sample populations are unlikely to be sufficiently powered to dissect highly polygenic traits, a leniently detected result could confirm a QTL stringently detected in a larger DO sample population; furthermore, replicable CC and CC-RIX mice that possess key alleles of the QTL identified in the DO could be used for follow-up mechanistic studies. Taken together, they represent a flexible and powerful MPP resource for next generation complex trait studies.
Acknowledgments
We acknowledge Martin T. Ferris and William Valdar, both of the University of North Carolina at Chapel Hill, and Gary A. Churchill of the Jackson Laboratory for discussions related to this work. We thank Callan O’Connor of the Jackson Laboratory and Paul L. Maurizio of the University of Chicago for providing feedback on an early draft manuscript.
Appendix A
Processing founder haplotype data
Our initial sample populations were 116 CC mice (two animals per 58 strains) genotyped on MiniMUGA (11,000 markers), 192 DO mice genotyped on MegaMUGA (57,000 markers), and 500 DO mice genotyped on GigaMUGA (143,000 markers). We used the qtl2 R package to impute all populations to the same 64,000 loci. Though the shared set of loci far exceeds the mapping resolution of the CC, it is appealing to compare populations based on the same set of loci.
The input data for the DO are scaled haplotype dosages, reflecting an additive model rather than the full 36-state space of outbred diplotypes. Denser genotype arrays result in greater certainty in diplotype estimation. To keep this from influencing our findings, we imputed all loci based on the most likely haplotype.
CC individuals and strains
Consider the diplotype probability vector for CC mouse i at the locus p:
where elements represent the homozygous diplotype probabilities for the founder haplotypes ordered as AJ, B6, 129, NOD, NZO, CAST, PWK, and WSB. For this example, we impute the mouse as having the NZO diplotype:
$$\mathbf{a}_{i,p} = [0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^{\mathsf{T}}$$
For the CC, we performed this step both at the individual-level and strain-level. For strain-level, we first averaged the diplotype probabilities at all loci between the two individuals from the same strain before doing imputation of most likely diplotype, thus smoothing out loci with segregating variation within a CC strain.
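The individual-level and strain-level imputation steps can be sketched as below. This is a minimal illustration, not the musppr implementation; the function names and the probability values are made up for the example, and ties are broken in favor of the first founder in the stated order.

```python
FOUNDERS = ("AJ", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB")


def impute_most_likely(probs):
    """Replace a vector of homozygous diplotype probabilities with an
    indicator vector for the most likely founder diplotype."""
    if len(probs) != len(FOUNDERS):
        raise ValueError("expected one probability per founder")
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if idx == best else 0.0 for idx in range(len(probs))]


def impute_strain_level(probs_a, probs_b):
    """Average the probabilities of two mice from the same strain, then
    impute the most likely diplotype from the averaged vector."""
    averaged = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    return impute_most_likely(averaged)


# Illustrative probabilities: mouse 1 is confidently NZO, while mouse 2 is
# ambiguous between NOD and NZO at this locus.
mouse_1 = [0.01, 0.01, 0.01, 0.01, 0.93, 0.01, 0.01, 0.01]
mouse_2 = [0.02, 0.02, 0.02, 0.40, 0.48, 0.02, 0.02, 0.02]

individual_call = impute_most_likely(mouse_1)
strain_call = impute_strain_level(mouse_1, mouse_2)
called_founder = FOUNDERS[strain_call.index(1.0)]
```

Averaging before imputation means a locus that differs between the two mice is resolved to a single strain-level call rather than two conflicting ones.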
CC-RIX
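We derived founder haplotype data for the possible F1s given 58 CC strains (ignoring reciprocal F1s). Consider CC strains j and k that have the B6 and PWK alleles at locus p, respectively:
$$\mathbf{a}_{j,p} = [0\ 1\ 0\ 0\ 0\ 0\ 0\ 0]^{\mathsf{T}}, \qquad \mathbf{a}_{k,p} = [0\ 0\ 0\ 0\ 0\ 0\ 1\ 0]^{\mathsf{T}}$$
The F1 offspring’s scaled haplotype count at the locus p is calculated by taking element-wise averages:
$$\mathbf{a}_{jk,p} = \tfrac{1}{2}\left(\mathbf{a}_{j,p} + \mathbf{a}_{k,p}\right) = [0\ 0.5\ 0\ 0\ 0\ 0\ 0.5\ 0]^{\mathsf{T}}$$
This process was repeated across all loci for all pairwise combinations of CC strains.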
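The F1 construction can be sketched in a few lines. This is an illustrative example rather than the musppr code; the function name is hypothetical, and the founder order follows the one given for the CC (AJ, B6, 129, NOD, NZO, CAST, PWK, WSB).

```python
import math


def f1_haplotype_dosage(parent_j, parent_k):
    """Element-wise average of two inbred parental haplotype vectors,
    giving the scaled founder haplotype counts of their F1."""
    if len(parent_j) != len(parent_k):
        raise ValueError("parental vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(parent_j, parent_k)]


# Strain j carries B6 and strain k carries PWK at this locus.
strain_j = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
strain_k = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
f1 = f1_haplotype_dosage(strain_j, strain_k)

# Number of distinct F1s from 58 strains when reciprocal crosses are ignored.
n_possible_f1 = math.comb(58, 2)
```

With 58 strains and reciprocal crosses ignored, this yields 1,653 possible F1s from which CC-RIX sample populations can be drawn.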
DO
Selecting most likely haplotypes for the DO samples is more complicated than for the CC because the input is scaled haplotype dosages rather than 36-state probabilities. Consider the scaled haplotype dosage vector for DO mouse i at the locus p, we calculate and . If , the dosages suggest mouse i is homozygous at locus p and we set the allele with maximum probability from to 1 and all others to 0. If , the dosages suggest mouse i is heterozygous at locus p, and for ai,p we set the alleles with highest and second highest probability from to 0.5 and all others to 0.
Appendix B
Simulating data for heritability estimation
We extended our simulation approach for QTL mapping (Keele et al. 2019) to heritability, built from the model specified in Equation (1). We generate simulation t using the following model:
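$$\mathbf{y}^{(t)} = \mu + \mathbf{u}^{(t)} + \mathbf{e}^{(t)} \qquad \text{(B1)}$$
An intercept ($\mu$) can be included but by default we set it to 0. We randomly sample draws of $\mathbf{u}^{*}$ and $\mathbf{e}^{*}$ according to $\mathbf{u}^{*} \sim \text{N}(\mathbf{0}, \mathbf{K})$ and $\mathbf{e}^{*} \sim \text{N}(\mathbf{0}, \mathbf{I})$, where $\mathbf{K}$ is the kinship matrix. To control the relative contributions of a polygenic effect (u) at a specified proportion of total variation $\phi^2$, we scale the random draws accordingly:
$$\mathbf{u}^{(t)} = \mathbf{u}^{*}\sqrt{\frac{\phi^{2}}{V(\mathbf{u}^{*})}}, \qquad \mathbf{e}^{(t)} = \mathbf{e}^{*}\sqrt{\frac{1-\phi^{2}}{V(\mathbf{e}^{*})}}$$
where V(.) returns the unbiased sample variance of its argument vector. For simulations with two components of heritability, we used the same approach but with respect to Equation (3), now with two polygenic effects, each controlled at its own specified proportion of total variation.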
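The scaling step can be sketched as follows. This is a simplified illustration: instead of drawing the polygenic effect from a full kinship matrix, it assumes equally unrelated strains whose replicates share one strain-level draw. The function names, seed, and design (58 strains, three replicates) are choices made for the example.

```python
import random
import statistics


def scale_to_variance(draws, target_var):
    """Rescale draws so their unbiased sample variance equals target_var."""
    factor = (target_var / statistics.variance(draws)) ** 0.5
    return [value * factor for value in draws]


def simulate_trait(n_strains, n_reps, phi2, rng):
    """Simulate one trait with polygenic proportion phi2 (intercept 0)."""
    if not 0.0 < phi2 < 1.0:
        raise ValueError("phi2 must be in (0, 1)")
    # Assumption: replicates of a strain share one polygenic draw.
    strain_draws = [rng.gauss(0.0, 1.0) for _ in range(n_strains)]
    u_star = [strain_draws[s] for s in range(n_strains) for _ in range(n_reps)]
    e_star = [rng.gauss(0.0, 1.0) for _ in range(n_strains * n_reps)]
    u = scale_to_variance(u_star, phi2)
    e = scale_to_variance(e_star, 1.0 - phi2)
    y = [u_i + e_i for u_i, e_i in zip(u, e)]
    return y, u, e


rng = random.Random(2023)
y, u, e = simulate_trait(n_strains=58, n_reps=3, phi2=0.8, rng=rng)
```

Because each component is rescaled by its own sample variance, the realized proportions match the targets exactly within each simulated data set, rather than only in expectation.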
Appendix C
Estimating confidence intervals for the location of a QTL
We estimated likelihood-based intervals for QTL location using LOD support intervals (Dupuis and Siegmund 1999) and Bayes credible intervals (Broman and Sen 2009), which were calculated through the qtl2 R package (Broman et al. 2019). These approaches are fast and efficient because they do not involve a sampling process and are instead summarized directly from the LOD score profile around a detected QTL.
In contrast, sampling-based approaches are slow but statistically appealing because they incorporate uncertainty through a sampling process. We evaluated four related sampling-based procedures.
Parametric bootstrap (par boot)
The sampling process for parametric bootstrap involves simulating data from the QTL model (Equation (6)) at the detected locus. We fit the model to generate parametric bootstrap sample t:
$$\mathbf{y}^{(t)} = \widehat{\text{QTL}}[p] + \mathbf{e}^{(t)} \qquad \text{(C2)}$$
where $\widehat{\text{QTL}}[p]$ is the fit prediction of the QTL (based on eight fixed effect parameters for founder haplotypes) and $\mathbf{e}^{(t)}$ is generated according to $\mathbf{e}^{(t)} \sim \text{N}(\mathbf{0}, \mathbf{I}\hat{\sigma}^{2})$, where $\hat{\sigma}^{2}$ is the estimated residual variance. We then map all bootstrap samples for the chromosome of the detected QTL and record the peak location. Middle quantile intervals for peak location were then calculated for a specified level, e.g. 95%. Notably, we exclude the polygenic effect to reduce the computational cost of running scans of the bootstrap samples. Incorporating variation from estimating the polygenic effect would make the intervals wider.
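The two non-mapping steps of this procedure, generating a bootstrap sample and summarizing recorded peak locations, can be sketched as below. The chromosome scan itself is omitted; the peak positions are placeholder values, and the function names and the linear-interpolation quantile rule are assumptions for illustration.

```python
import random


def parametric_bootstrap_sample(fitted, resid_sd, rng):
    """Fitted QTL prediction plus freshly drawn Gaussian noise."""
    return [value + rng.gauss(0.0, resid_sd) for value in fitted]


def middle_quantile_interval(peaks, level=0.95):
    """Central interval of recorded peak locations at the given level."""
    if not peaks:
        raise ValueError("peaks must be non-empty")
    ordered = sorted(peaks)
    tail = (1.0 - level) / 2.0

    def quantile(prob):
        # Linear interpolation between neighboring order statistics.
        h = prob * (len(ordered) - 1)
        lower = int(h)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])

    return quantile(tail), quantile(1.0 - tail)


rng = random.Random(1)
y_boot = parametric_bootstrap_sample([1.0, 2.0, 3.0], resid_sd=0.5, rng=rng)

# Placeholder peak locations standing in for scans of bootstrap samples.
peak_positions = [float(pos) for pos in range(101)]
interval = middle_quantile_interval(peak_positions, level=0.90)
```

In a full analysis each bootstrap sample would be scanned along the chromosome and its peak position appended to the list before the interval is computed.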
Parametric permutation (par perm)
The sampling process for parametric permutation is similar to parametric bootstrap. To generate parametric permutation sample t, we fit:
$$\mathbf{y}^{(t)} = \widehat{\text{QTL}}[p] + \pi(\hat{\mathbf{e}}) \qquad \text{(C3)}$$
where π() is a function that permutes (randomly reorders) a vector, $\hat{\mathbf{e}}$ are the fit residuals, and all other terms are as previously defined. Subsequent scans of parametric permutations and interval estimation are performed as for the parametric bootstrap.
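The difference from the parametric bootstrap is that the noise is not redrawn from a distribution: the observed residuals are reshuffled across individuals. A minimal sketch, with a hypothetical function name and toy values:

```python
import random


def parametric_permutation_sample(fitted, residuals, rng):
    """Fitted QTL prediction plus a random reordering of the residuals."""
    if len(fitted) != len(residuals):
        raise ValueError("fitted and residuals must have the same length")
    shuffled = list(residuals)  # copy so the input is left untouched
    rng.shuffle(shuffled)
    return [fit + res for fit, res in zip(fitted, shuffled)]


rng = random.Random(7)
fitted_values = [1.0, 2.0, 3.0, 4.0]
fit_residuals = [0.5, -0.25, 0.125, -0.375]
y_perm = parametric_permutation_sample(fitted_values, fit_residuals, rng)

# The set of residuals is preserved; only their assignment changes.
recovered = sorted(y - f for y, f in zip(y_perm, fitted_values))
```

Because the residuals are reused rather than simulated, this variant makes no distributional assumption about the noise beyond exchangeability.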
Parametric permutation with kinship (par permK)
We also performed the parametric permutation procedure as previously described but with the kinship term fit as part of the subsequent scans, allowing us to assess whether ignoring the kinship effect makes a significant difference.
Bayesian bootstrap (Bayes boot)
The Bayesian bootstrap procedure is a continuous generalization of nonparametric bootstrapping, i.e. sampling with replacement (Rubin 1981). Instead of sampling with replacement, we generate sample weights for individuals (), ensuring that all individuals make some contribution to each sample, even if only fractional. The original data are re-scanned for the chromosome with the detected QTL, but now with each sample of weights. Intervals are recorded using the same quantile summary as with the other sampling-based approaches. Initially we included the kinship term but found that doing so resulted in essentially no variation in the LOD scores based on weights. Further work is needed on weighting in the presence of a random term for kinship.
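In the standard formulation of the Bayesian bootstrap (Rubin 1981), the weights follow a flat Dirichlet distribution, which can be generated by normalizing independent exponential draws. The sketch below assumes that standard form with weights summing to 1; the function name is hypothetical and any rescaling used in practice would be an additional choice.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Draw one set of Dirichlet(1, ..., 1) weights for n individuals by
    normalizing independent Exponential(1) draws."""
    if n < 1:
        raise ValueError("n must be positive")
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [value / total for value in draws]


rng = random.Random(42)
weights = bayesian_bootstrap_weights(174, rng)

# Every individual keeps a fractional contribution, unlike sampling with
# replacement, where some individuals are left out of each sample.
smallest_weight = min(weights)
```

Each set of weights would then be supplied to a weighted scan of the chromosome, and the peak location recorded as in the other sampling-based procedures.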
Evaluating the sampling-based procedures represents a significant computational burden because of the additional sampling steps for each simulated QTL. For the 1,000 QTLs that were simulated for the 40% QTL and 30% polygenic background scenario, we evaluated parametric bootstrap and parametric permutation based on 1,000 samples. For parametric permutation with kinship and Bayesian bootstrap, the number of samples was reduced to 200 because both procedures are computationally more intensive.
Appendix D
Simulating data for QTL mapping
We simulated QTL data using our previous approach in the CC (Keele et al. 2019), now extended for CC-RIX and DO data as well. For sample populations of CC or CC-RIX with replicates, we simulated the data as strain/F1 means, reducing the size of the data and making subsequent mapping analysis more computationally efficient. Consider the QTL model specified in Equation (6) with effects on a trait stemming from the QTL, polygenic background, and random noise. We calculate the expected QTL effect size at the level of strain/F1 means based on the reduction in noise on means. Similar to our simulations of heritability (“Appendix B”), we generate simulation t for a QTL at p with the following model:
(D4) |
where the * superscript indicates randomly sampled vectors, QTL*[p] = Apβ*QTL, and scaling factors are calculated as
for the proportions of variation explained by the QTL () and polygenic background (), r is the number of replicates per strain/F1, is a variance for the QTL term, and all other terms as previously defined.
We evaluated QTL mapping power based on scaling the QTL effect with respect to both the sample population and a shared reference population that has homozygous and balanced allele frequencies at the QTL. Which scaling is used is determined by adjusting the term. When scaling with respect to the sample population, . If scaling with respect to the reference population, then .
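The reduction in noise on strain/F1 means mentioned above can be illustrated with a short calculation. This sketch assumes that the QTL and polygenic effects are shared by all replicates of a strain or F1 while the noise is independent across replicates, so that averaging r replicates divides only the noise variance by r. The function name is hypothetical and the formula is an illustration under those assumptions rather than the exact scaling used in musppr.

```python
def effect_sizes_on_means(h2_qtl, h2_poly, n_reps):
    """Proportions of variance in strain/F1 means explained by the QTL and
    by the polygenic background, given individual-level proportions."""
    noise = 1.0 - h2_qtl - h2_poly
    if noise < 0.0 or n_reps < 1:
        raise ValueError("proportions must sum to at most 1 and n_reps >= 1")
    total_on_means = h2_qtl + h2_poly + noise / n_reps
    return h2_qtl / total_on_means, h2_poly / total_on_means


# A 40% QTL in a 30% polygenic background, averaged over three replicates.
qtl_on_means, poly_on_means = effect_sizes_on_means(0.40, 0.30, 3)
```

In this example the QTL explains 50% of the variation in means, compared with 40% at the individual level, because two thirds of the noise variance is averaged away.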
Data availability
All analyses were performed using the R statistical programming language (R Core Team 2022). We wrote the musppr R package to perform all simulations and subsequent analysis, using the qtl2, miQTL, and sommer R packages as dependencies. The musppr R package is available at https://github.com/gkeele/musppr. Founder haplotype data for all populations used in this study, a fixed version of musppr, and R code used to generate the simulated data, figures, and reported results can be found at https://doi.org/10.6084/m9.figshare.20560821. Supplementary material is available at G3 online.
Funding
This work was supported by grants from the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH): F32GM134599 and R01GM070683.
Literature cited
- Al-Barghouthi BM, Mesner LD, Calabrese GM, Brooks D, Tommasini SM, Bouxsein ML, Horowitz MC, Rosen CJ, Nguyen K, Haddox S, et al. . Systems genetics in diversity outbred mice inform BMD GWAS and identify determinants of bone strength. Nat Commun. 2021;12:3408. doi: 10.1038/s41467-021-23649-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aydin S, Pham DT, Zhang T, Keele GR, Skelly DA, Pankratz M, Choi T, Gygi SP, Reinholdt LG, Baker CL, et al. . Genetic dissection of the pluripotent proteome through multi-omics data integration. bioRxiv. 2022. [DOI] [PMC free article] [PubMed]
- Aylor DL, Valdar W, Foulds-Mathes W, Buus RJ, Verdugo RA, Baric RS, Ferris MT, Frelinger JA, Heise M, Frieman MB, et al. . Genetic analysis of complex traits in the emerging collaborative cross. Genome Res. 2011;21:1213–1222. doi: 10.1101/gr.111310.110 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bandillo N, Raghavan C, Muyco PA, Sevilla MAL, Lobina IT, Dilla-Ermita CJ, Tung CW, McCouch S, Thomson M, Mauleon R, et al. . Multi-parent advanced generation inter-cross (MAGIC) populations in rice: progress and potential for genetics research and breeding. Rice. 2013;6:11. doi: 10.1186/1939-8433-6-11 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, Churchill GA. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 2019;211:495–502. doi: 10.1534/genetics.118.301595
- Broman KW, Sen S. A Guide to QTL Mapping with R/qtl. Statistics for Biology and Health. New York (NY): Springer; 2009.
- Cheng R, Parker CC, Abney M, Palmer AA. Practical considerations regarding the use of genotype and pedigree data to model relatedness in the context of genome-wide association studies. G3 Genes|Genomes|Genetics. 2013;3:1861–1867.
- Chick JM, Munger SC, Simecek P, Huttlin EL, Choi K, Gatti DM, Raghupathy N, Svenson KL, Churchill GA, Gygi SP. Defining the consequences of genetic variation on a proteome-wide scale. Nature. 2016;534:500–505. doi: 10.1038/nature18270
- Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963
- Churchill GA, Gatti DM, Munger SC, Svenson KL. The diversity outbred mouse population. Mamm Genome. 2012;23:713–718. doi: 10.1007/s00335-012-9414-2
- Collaborative Cross Consortium. The genome architecture of the collaborative cross mouse genetic reference population. Genetics. 2012;190:389–401.
- The Complex Trait Consortium. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat Genet. 2004;36:1133–1137.
- Covarrubias-Pazaran G. Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE. 2016;11:e0156744. doi: 10.1371/journal.pone.0156744
- Crouse WL, Kelada SNP, Valdar W. Inferring the allelic series at QTL in multiparental populations. Genetics. 2020;216:957–983. doi: 10.1534/genetics.120.303393
- Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics. 2013;195:1141–1155. doi: 10.1534/genetics.113.155515
- de Koning DJ, McIntyre LM. Back to the future: multiparent populations provide the key to unlocking the genetic basis of complex traits. G3 Genes|Genomes|Genetics. 2017;7:1617–1618.
- Dell’Acqua M, Gatti DM, Pea G, Cattonaro F, Coppens F, Magris G, Hlaing AL, Aung HH, Nelissen H, Baute J, et al. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 2015;16:167.
- Dudbridge F, Koeleman BP. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am J Hum Genet. 2004;75:424–435. doi: 10.1086/423738
- Dupont MSJ, Guillemot V, Campagne P, Serafini N, Marie S, Montagutelli X, Di Santo JP, Vosshenrich CAJ. Host genetic control of natural killer cell diversity revealed in the collaborative cross. Proc Natl Acad Sci USA. 2021;118:e2018834118. doi: 10.1073/pnas.2018834118
- Dupuis J, Siegmund D. Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics. 1999;151:373–386. doi: 10.1093/genetics/151.1.373
- Eldridge R, Osorio D, Amstalden K, Edwards C, Young CR, Cai JJ, Konganti K, Hillhouse A, Threadgill DW, Welsh CJ, et al. Antecedent presentation of neurological phenotypes in the collaborative cross reveals four classes with complex sex-dependencies. Sci Rep. 2020;10:7918. doi: 10.1038/s41598-020-64862-z
- Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3 Genes|Genomes|Genetics. 2012;2:1405–1413.
- Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 Genes|Genomes|Genetics. 2022;12:jkac080.
- Ferris MT, Aylor DL, Bottomly D, Whitmore AC, Aicher LD, Bell TA, Bradel-Tretheway B, Bryan JT, Buus RJ, Gralinski LE, et al. Modeling host genetic regulation of influenza pathogenesis in the collaborative cross. PLoS Pathog. 2013;9:e1003196. doi: 10.1371/journal.ppat.1003196
- French JE, Gatti DM, Morgan DL, Kissling GE, Shockley KR, Knudsen GA, Shepard KG, Price HC, King D, Witt KL, et al. Diversity outbred mice identify population-based exposure thresholds and genetic factors that influence benzene-induced genotoxicity. Environ Health Perspect. 2015;123:237–245. doi: 10.1289/ehp.1408202
- Gatti DM, Svenson KL, Shabalin A, Wu LY, Valdar W, Simecek P, Goodwin N, Cheng R, Pomp D, Palmer A, et al. Quantitative trait locus mapping methods for diversity outbred mice. G3 Genes|Genomes|Genetics. 2014;4:1623–1633.
- Gerdes Gyuricza I, Chick JM, Keele GR, Deighan AG, Munger SC, Korstanje R, Gygi SP, Churchill GA. Genome-wide transcript and protein analysis highlights the role of protein homeostasis in the aging mouse heart. Genome Res. 2022;32:838–852.
- Gould RL, Craig SW, McClatchy S, Churchill GA, Pazdro R. Quantitative trait mapping in diversity outbred mice identifies novel genomic regions associated with the hepatic glutathione redox system. Redox Biol. 2021;46:102093. doi: 10.1016/j.redox.2021.102093
- Gralinski LE, Ferris MT, Aylor DL, Whitmore AC, Green R, Frieman MB, Deming D, Menachery VD, Miller DR, Buus RJ, et al. Genome wide identification of SARS-CoV susceptibility loci using the collaborative cross. PLoS Genet. 2015;11:e1005504. doi: 10.1371/journal.pgen.1005504
- Haley CS, Knott SA. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992;69:315–324. doi: 10.1038/hdy.1992.131
- Hampton BK, Plante KS, Whitmore AC, Linnertz CL, Madden EA, Noll KE, Boyson SP, Parotti B, Xenakis JG, Bell TA, et al. Forward genetic screen of homeostatic antibody levels in the collaborative cross identifies MBD1 as a novel regulator of B cell homeostasis. PLoS Genet. 2022;18:e1010548. doi: 10.1371/journal.pgen.1010548
- Hsiao K, Noble C, Pitman W, Yadav N, Kumar S, Keele GR, Terceros A, Kanke M, Conniff T, Cheleuitte-Nieves C, et al. A thalamic orphan receptor drives variability in short-term memory. Cell. 2020;183:522–536.e19. doi: 10.1016/j.cell.2020.09.011
- Keele GR, Crouse WL, Kelada SNP, Valdar W. Determinants of QTL mapping power in the realized collaborative cross. G3 Genes|Genomes|Genetics. 2019;9:1707–1727.
- Keele GR, Prokop JW, He H, Holl K, Littrell J, Deal A, Francic S, Cui L, Gatti DM, Broman KW, et al. Genetic fine-mapping and identification of candidate genes and variants for adiposity traits in outbred rats: mapping adiposity traits in outbred rats. Obesity. 2018;26:213–222. doi: 10.1002/oby.22075
- Keele GR, Quach BC, Israel JW, Chappell GA, Lewis L, Safi A, Simon JM, Cotney P, Crawford GE, Valdar W, et al. Integrative QTL analysis of gene expression and chromatin accessibility identifies multi-tissue patterns of genetic regulation. PLoS Genet. 2020;16:e1008537. doi: 10.1371/journal.pgen.1008537
- Keele GR, Zhang T, Pham DT, Vincent M, Bell TA, Hock P, Shaw GD, Paulo JA, Munger SC, Pardo-Manuel de Villena F, et al. Regulation of protein abundance in genetically diverse mouse populations. Cell Genomics. 2021;1:100003. doi: 10.1016/j.xgen.2021.100003
- Kelada SNP, Aylor DL, Peck BCE, Ryan JF, Tavarez U, Buus RJ, Miller DR, Chesler EJ, Threadgill DW, Churchill GA, et al. Genetic analysis of hematological parameters in incipient lines of the collaborative cross. G3 Genes|Genomes|Genetics. 2012;2:157–165.
- Kelada SNP, Carpenter DE, Aylor DL, Chines P, Rutledge H, Chesler EJ, Churchill GA, Pardo-Manuel de Villena F, Schwartz DA, Collins FS. Integrative genetic analysis of allergic inflammation in the murine lung. Am J Respir Cell Mol Biol. 2014;51:436–445. doi: 10.1165/rcmb.2013-0501OC
- Keller MP, Gatti DM, Schueler KL, Rabaglia ME, Stapleton DS, Simecek P, Vincent M, Allen S, Broman AT, Bacher R, et al. Genetic drivers of pancreatic islet function. Genetics. 2018;209:335–356. doi: 10.1534/genetics.118.300864
- Keller MP, Rabaglia ME, Schueler KL, Stapleton DS, Gatti DM, Vincent M, Mitok KA, Wang Z, Ishimura T, Simonett SP, et al. Gene loci associated with insulin secretion in islets from nondiabetic mice. J Clin Invest. 2019;129:4419–4432. doi: 10.1172/JCI129143
- Kover PX, Valdar W, Trakalo J, Scarcelli N, Ehrenreich IM, Purugganan MD, Durrant C, Mott R. A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet. 2009;5:e1000551. doi: 10.1371/journal.pgen.1000551
- Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121:185–199. doi: 10.1093/genetics/121.1.185
- Linke V, Overmyer KA, Miller IJ, Brademan DR, Hutchins PD, Trujillo EA, Reddy TR, Russell JD, Cushing EM, Schueler KL, et al. A large-scale genome–lipid association map guides lipid identification. Nat Metab. 2020;2:1149–1162. doi: 10.1038/s42255-020-00278-3
- Long AD, Macdonald SJ, King EG. Dissecting complex traits using the Drosophila synthetic population resource. Trends Genet. 2014;30:488–495. doi: 10.1016/j.tig.2014.07.009
- Lusis AJ, Seldin MM, Allayee H, Bennett BJ, Civelek M, Davis RC, Eskin E, Farber CR, Hui S, Mehrabian M, et al. The hybrid mouse diversity panel: a resource for systems genetics analyses of metabolic and cardiovascular traits. J Lipid Res. 2016;57:925–942. doi: 10.1194/jlr.R066944
- Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sunderland (MA): Sinauer Associates; 1998.
- Manichaikul A, Dupuis J. Poor performance of bootstrap confidence intervals for the location of a quantitative trait locus. Genetics. 2006;174:481–489. doi: 10.1534/genetics.106.061549
- Martínez O, Curnow RN. Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor Appl Genet. 1992;85:480–488.
- Morgan AP, Welsh CE. Informatics resources for the collaborative cross and related mouse populations. Mamm Genome. 2015;26:521–539. doi: 10.1007/s00335-015-9581-z
- Mosedale M, Cai Y, Eaddy JS, Corty RW, Nautiyal M, Watkins PB, Valdar W. Identification of candidate risk factor genes for human idelalisib toxicity using a collaborative cross approach. Toxicol Sci. 2019;172:265–278. doi: 10.1093/toxsci/kfz199
- Mosedale M, Cai Y, Eaddy JS, Kirby PJ, Wolenski FS, Dragan Y, Valdar W. Human-relevant mechanisms and risk factors for TAK-875-induced liver injury identified via a gene pathway-based approach in collaborative cross mice. Toxicology. 2021;461:152902. doi: 10.1016/j.tox.2021.152902
- Mosedale M, Kim Y, Brock WJ, Roth SE, Wiltshire T, Scott Eaddy J, Keele GR, Corty RW, Xie Y, Valdar W, et al. Candidate risk factors and mechanisms for tolvaptan-induced liver injury are identified using a collaborative cross approach. Toxicol Sci. 2017;156:438–454.
- Noble LM, Chelo I, Guzella T, Afonso B, Riccardi DD, Ammerman P, Dayarian A, Carvalho S, Crist A, Pino-Querido A, et al. Polygenicity and epistasis underlie fitness-proximal traits in the Caenorhabditis elegans multiparental experimental evolution (CeMEE) panel. Genetics. 2017;207:1663–1685. doi: 10.1534/genetics.117.300406
- Oreper D, Schoenrock SA, McMullan R, Ervin R, Farrington J, Miller DR, de Villena FPM, Valdar W, Tarantino LM. Reciprocal F1 hybrids of two inbred mouse strains reveal parent-of-origin and perinatal diet effects on behavior and expression. G3 Genes|Genomes|Genetics. 2018;8:3447–3468.
- Orgel K, Smeekens JM, Ye P, Fotsch L, Guo R, Miller DR, Pardo-Manuel de Villena F, Burks AW, Ferris MT, Kulis MD. Genetic diversity between mouse strains allows identification of the CC027/GeniUnc strain as an orally reactive model of peanut allergy. J Allergy Clin Immunol. 2019;143:1027–1037.e7. doi: 10.1016/j.jaci.2018.10.009
- Ouellette AR, Neuner SM, Dumitrescu L, Anderson LC, Gatti DM, Mahoney ER, Bubier JA, Churchill G, Peters L, Huentelman MJ, et al. Cross-species analyses identify Dlgap2 as a regulator of age-related cognitive decline and Alzheimer’s dementia. Cell Rep. 2020;32:108091. doi: 10.1016/j.celrep.2020.108091
- Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554. doi: 10.1093/biomet/58.3.545
- Philip VM, Sokoloff G, Ackert-Bicknell CL, Striz M, Branstetter L, Beckmann MA, Spence JS, Jackson BL, Galloway LD, Barker P, et al. Genetic analysis in the collaborative cross breeding population. Genome Res. 2011;21:1223–1238. doi: 10.1101/gr.113886.110
- R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. 2022.
- Rogala AR, Morgan AP, Christensen AM, Gooch TJ, Bell TA, Miller DR, Godfrey VL, de Villena FPM. The collaborative cross as a resource for modeling human disease: CC011/UNC, a new mouse model for spontaneous colitis. Mamm Genome. 2014;25:95–108. doi: 10.1007/s00335-013-9499-2
- Rubin DB. The Bayesian bootstrap. Ann Stat. 1981;9:130–134. doi: 10.1214/aos/1176345338
- Schäfer A, Leist SR, Gralinski LE, Martinez DR, Winkler ES, Okuda K, Hawkins PE, Gully KL, Graham RL, Scobey DT, et al. A multitrait locus regulates sarbecovirus pathogenesis. Microbiology, preprint. 2022.
- Schoenrock SA, Oreper D, Farrington J, McMullan RC, Ervin R, Miller DR, Pardo-Manuel de Villena F, Valdar W, Tarantino LM. Perinatal nutrition interacts with genetic background to alter behavior in a parent-of-origin-dependent manner in adult collaborative cross mice. Genes Brain Behav. 2018;17:e12438. doi: 10.1111/gbb.12438
- Scoggin K, Lynch R, Gupta J, Nagarajan A, Sheffield M, Elsaadi A, Bowden C, Aminian M, Peterson A, Adams LG, et al. Genetic background influences survival of infections with Salmonella enterica serovar Typhimurium in the collaborative cross. PLoS Genet. 2022;18:e1010075. doi: 10.1371/journal.pgen.1010075
- Shorter JR, Odet F, Aylor DL, Pan W, Kao CY, Fu CP, Morgan AP, Greenstein S, Bell TA, Stevans AM, et al. Male infertility is responsible for nearly half of the extinction observed in the mouse collaborative cross. Genetics. 2017;206:557–572. doi: 10.1534/genetics.116.199596
- Sigmon JS, Blanchard MW, Baric RS, Bell TA, Brennan J, Brockmann GA, Burks AW, Calabrese JM, Caron KM, Cheney RE, et al. Content and performance of the MiniMUGA genotyping array: a new tool to improve rigor and reproducibility in mouse research. Genetics. 2020;216:905–930. doi: 10.1534/genetics.120.303596
- Skelly DA, Czechanski A, Byers C, Aydin S, Spruce C, Olivier C, Choi K, Gatti DM, Raghupathy N, Keele GR, et al. Mapping the effects of genetic variation on chromatin state and gene expression reveals loci that control ground state pluripotency. Cell Stem Cell. 2020;27:459–469.e8. doi: 10.1016/j.stem.2020.07.005
- Smith CM, Proulx MK, Lai R, Kiritsy MC, Bell TA, Hock P, Pardo-Manuel de Villena F, Ferris MT, Baker RE, Behar SM, et al. Functionally overlapping variants control tuberculosis susceptibility in collaborative cross mice. mBio. 2019;10:e02791–19. doi: 10.1128/mBio.02791-19
- Srivastava A, Morgan AP, Najarian ML, Sarsani VK, Sigmon JS, Shorter JR, Kashfeen A, McMullan RC, Williams LH, Giusti-Rodríguez P, et al. Genomes of the mouse collaborative cross. Genetics. 2017;206:537–556. doi: 10.1534/genetics.116.198838
- Sun KY, Oreper D, Schoenrock SA, McMullan R, Giusti-Rodríguez P, Zhabotynsky V, Miller DR, Tarantino LM, Pardo-Manuel de Villena F, Valdar W. Bayesian modeling of skewed X inactivation in genetically diverse mice identifies a novel Xce allele associated with copy number changes. Genetics. 2021;218:iyab034. doi: 10.1093/genetics/iyab034
- Svenson KL, Gatti DM, Valdar W, Welsh CE, Cheng R, Chesler EJ, Palmer AA, McMillan L, Churchill GA. High-resolution genetic mapping using the mouse diversity outbred population. Genetics. 2012;190:437–447. doi: 10.1534/genetics.111.132597
- Takemon Y, Chick JM, Gerdes Gyuricza I, Skelly DA, Devuyst O, Gygi SP, Churchill GA, Korstanje R. Proteomic and transcriptomic profiling reveal different aspects of aging in the kidney. eLife. 2021;10:e62585. doi: 10.7554/eLife.62585
- Threadgill DW, Miller DR, Churchill GA, de Villena FPM. The collaborative cross: a recombinant inbred mouse population for the systems genetic era. ILAR J. 2011;52:24–31. doi: 10.1093/ilar.52.1.24
- Tovar A, Smith GJ, Nalesnik MB, Thomas JM, McFadden KM, Harkema JR, Kelada SNP. A locus on chromosome 15 contributes to acute ozone-induced lung injury in collaborative cross mice. Am J Respir Cell Mol Biol. 2022;67:528–538. doi: 10.1165/rcmb.2021-0326OC
- Valdar W, Flint J, Mott R. Simulating the collaborative cross: power of quantitative trait loci detection and mapping resolution in large sets of recombinant inbred strains of mice. Genetics. 2006;172:1783–1797. doi: 10.1534/genetics.104.039313
- Vinkhuyzen AA, Wray NR, Yang J, Goddard ME, Visscher PM. Estimation and partition of heritability in human populations using whole-genome analysis methods. Annu Rev Genet. 2013;47:75–95. doi: 10.1146/annurev-genet-111212-133258
- Visscher PM, Thompson R, Haley CS. Confidence intervals in QTL mapping by bootstrapping. Genetics. 1996;143:1013–1020. doi: 10.1093/genetics/143.2.1013
- Woods LCS, Mott R. Heterogeneous stock populations for analysis of complex traits. In: Schughart K, Williams RW, editors. Systems Genetics. New York (NY): Springer; 2017. (Methods in Molecular Biology; 1488). p. 31–44.
- Yam P, Albright J, VerHague M, Gertz ER, Pardo-Manuel de Villena F, Bennett BJ. Genetic background shapes phenotypic response to diet for adiposity in the collaborative cross. Front Genet. 2021;11:615012. doi: 10.3389/fgene.2020.615012
- Yang H, Bell TA, Churchill GA, Pardo-Manuel de Villena F. On the subspecific origin of the laboratory mouse. Nat Genet. 2007;39:1100–1107. doi: 10.1038/ng2087
- Yang H, Wang JR, Didion JP, Buus RJ, Bell TA, Welsh CE, Bonhomme F, Yu AHT, Nachman MW, Pialek J, et al. Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet. 2011;43:648–655. doi: 10.1038/ng.847
- Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 2014;46:100–106. doi: 10.1038/ng.2876
- Zhang T, Keele GR, Gyuricza IG, Vincent M, Brunton C, Bell TA, Hock P, Shaw GD, Munger SC, de Villena FPM, et al. Multi-omics analysis identifies drivers of protein phosphorylation. bioRxiv. 2022.
- Zhang J, Teh M, Kim J, Eva MM, Cayrol R, Meade R, Nijnik A, Montagutelli X, Malo D, Jaubert J. A loss-of-function mutation in the integrin alpha L (Itgal) gene contributes to susceptibility to Salmonella enterica serovar Typhimurium infection in collaborative cross strain CC042. Infect Immun. 2019;88:e00656–19. doi: 10.1128/IAI.00656-19
Data Availability Statement
All analyses were performed using the R statistical programming language (R Core Team 2022). We wrote the musppr R package to perform all simulations and subsequent analyses, using the qtl2, miQTL, and sommer R packages as dependencies. The musppr R package is available at https://github.com/gkeele/musppr. Founder haplotype data for all populations used in this study, a fixed version of musppr, and the R code used to generate the simulated data, figures, and reported results can be found at https://doi.org/10.6084/m9.figshare.20560821. Supplementary material is available at G3 online.