A generic hidden Markov model for multiparent populations

Karl W Broman

doi:10.1093/g3journal/jkab396

2021 Nov 16;12(2):jkab396.

Editor: R W Doerge

PMCID: PMC9210298 PMID: 34791211

Abstract

A common step in the analysis of multiparent populations (MPPs) is genotype reconstruction: identifying the founder origin of haplotypes from dense marker data. This process often makes use of a probability model for the pattern of founder alleles along chromosomes, including the relative frequency of founder alleles and the probability of exchanges among them, which depend on a model for meiotic recombination and on the mating design for the population. While the precise experimental design used to generate the population may be used to derive a precise characterization of the model for exchanges among founder alleles, this can be tedious, particularly given the great variety of experimental designs that have been proposed. We describe an approximate model that can be applied for a variety of MPPs. We have implemented the approach in the R/qtl2 software, and we illustrate its use in applications to publicly available data on Diversity Outbred and Collaborative Cross mice.

Keywords: quantitative trait loci, QTL, HMM, Collaborative Cross, Diversity Outbred mice, heterogeneous stock, MPP, multiparental populations, Multiparent Advanced Generation Inter-Cross (MAGIC)

Introduction

Multiparent populations (MPPs) are valuable resources for the analysis of complex traits (de Koning and McIntyre 2017), including the mapping of quantitative trait loci (QTL). A wide variety of MPPs have been developed, including heterogeneous stock (HS) in mice (Mott et al. 2000) and rats (Solberg Woods et al. 2010), eight-way recombinant inbred lines (RIL) in mice (Complex Trait Consortium 2004) and Drosophila (King et al. 2012), and multiparent advanced generation intercross (MAGIC) populations in a variety of plant species including Arabidopsis (Kover et al. 2009), wheat (Cavanagh et al. 2008), maize (Dell’Acqua et al. 2015), and rice (Bandillo et al. 2013).

QTL mapping in MPPs can be performed through statistical tests at individual single nucleotide polymorphisms (SNPs), as used in genome-wide association studies. However, many investigators first seek to reconstruct the mosaic of founder haplotypes along the chromosomes of MPP individuals and use this reconstruction to test for association between founder alleles and the quantitative phenotype. This approach was first introduced by Mott et al. (2000) for the analysis of HS mice, implemented in the HAPPY software, and has been continued in packages such as R/mpMap (Huang and George 2011), DOQTL (Gatti et al. 2014), and R/qtl2 (Broman et al. 2019a).

The process of genotype reconstruction in an MPP individual is illustrated in Figure 1. The genotypes in the founder strains (Figure 1A) and the MPP offspring (Figure 1B) are used to calculate the probability of each possible founder genotype at each position along the chromosome (Figure 1C). Thresholding of these probabilities can be used to infer the founder genotypes and the locations of recombination breakpoints (Figure 1D).

Figure 1. Illustration of genotype reconstruction in a 1 Mbp region in a single DO mouse. (A) Genotypes of eight founder strains at a set of SNPs, with open and closed circles corresponding to being homozygous for the more-frequent and less-frequent allele, respectively. (B) Genotype of the DO mouse at the SNPs, with gray indicating the mouse is heterozygous. (C) Genotype probabilities for the DO mouse along the chromosome segment, given the observed data. Genotypes other than the two shown have negligible probability across the region. (D) Inferred haplotypes in the DO mouse.

Such genotype reconstructions are valuable not just for QTL analysis but also for data diagnostics (Broman et al. 2019b). For example, the inferred number of recombination breakpoints is a useful diagnostic for sample quality. Further, the reconstructed genotypes can be used to derive predicted SNP genotypes; comparing these to the observed SNP genotypes can help to identify problems in both samples and SNPs.
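The thresholding step and the breakpoint count used as a diagnostic can be sketched in a few lines of Python. This is a minimal illustration, not the R/qtl2 implementation: the function names, the threshold of 0.9, and the toy probabilities are all assumptions made for the example, and counting changes between successive genotype calls is a simplification (in a heterozygous population a change in genotype may involve one or both haplotypes).

```python
def infer_genotypes(probs, threshold=0.9):
    """Call the most probable genotype at each position.

    probs: list over positions of dicts {genotype: probability}.
    Returns a list of calls, with None where no genotype reaches the threshold.
    The threshold value is an illustrative assumption.
    """
    calls = []
    for p in probs:
        best = max(p, key=p.get)
        calls.append(best if p[best] >= threshold else None)
    return calls


def count_breakpoints(calls):
    """Count changes between successive called genotypes, skipping uncalled positions."""
    called = [g for g in calls if g is not None]
    return sum(1 for a, b in zip(called, called[1:]) if a != b)


# Toy genotype probabilities at five positions (made-up numbers).
probs = [
    {"AA": 0.98, "BB": 0.02},
    {"AA": 0.95, "BB": 0.05},
    {"AA": 0.55, "BB": 0.45},
    {"AA": 0.08, "BB": 0.92},
    {"AA": 0.01, "BB": 0.99},
]
calls = infer_genotypes(probs)
n_breakpoints = count_breakpoints(calls)
print(calls, n_breakpoints)
```

In this toy example the middle position is left uncalled and a single breakpoint is inferred between the second and fourth positions.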

The probability calculation in Figure 1C depends on a model for the process along MPP chromosomes in Figure 1D. In the HAPPY software for HS mice, Mott et al. (2000) used a model of random mating in a large population. Broman (2005) extended the work of Haldane and Waddington (1931) to derive two-locus genotype probabilities in multiparent RIL. This was later developed for the case of multiparent advanced intercross populations (Broman 2012a, 2012b), including Diversity Outbred (DO) mice (Churchill et al. 2012).

Genotype reconstruction for a variety of MPP designs has been implemented in the R/qtl2 software (Broman et al. 2019a, https://kbroman.org/qtl2). But it can be tedious analytical work to derive the appropriate transition probabilities for each new MPP design that is proposed. An alternative is to develop a more general approach for genotype reconstruction, such as used in the software RABBIT (Zheng et al. 2015). However, this approach has a variety of parameters that can be difficult to specify.

Here, we propose a similarly general method for genotype reconstruction in MPPs. We imagine that an MPP was derived from a population of homozygous founder strains at known proportions, α_i, followed by n generations of random mating among a large number of mating pairs. We can derive the exact transition probabilities for this situation. The α_i should be simple to specify from the MPP design, and the effective number of generations of random mating, n, can be determined by computer simulation, to match the expected density of recombination breakpoints.

Our approach has been implemented in R/qtl2. While we currently focus on data with SNP genotype calls, such as from microarrays, our model could potentially be incorporated into methods for genotype imputation from low-coverage sequencing, such as that of Zheng et al. (2018). We illustrate our approach through application to publicly available datasets on DO (Al-Barghouthi et al. 2021) and Collaborative Cross (CC) mice (Srivastava et al. 2017).

Methods

For genotype reconstruction in an MPP, we use a hidden Markov model (HMM; see Rabiner 1989). Our basic approach is as described in Broman and Sen (2009, Appendix D) for a biparental cross; the extension to an MPP is straightforward and described below.

Consider an MPP derived from k inbred lines. We focus on a single individual, and on a single chromosome with M marker positions (including pseudomarkers: positions between markers at which we have no data but would like to infer the underlying genotype). Let G_m be the underlying genotype at position m. In a homozygous population, such as RIL, the G_m take one of k possible values, the k homozygous genotypes. In a heterozygous population, such as advanced intercross lines (AIL), the G_m take one of $\binom{k}{2} + k$ possible values, the $\binom{k}{2}$ heterozygotes and k homozygotes. Let O_m be the observed SNP genotype at position m (possibly missing), and let $O = (O_{1}, \dots, O_{M})$. We assume that the G_m form a Markov chain (that is, $G_{1}, \dots, G_{m - 1}$ are conditionally independent of $G_{m + 1}, \dots, G_{M}$, given G_m), and that O_m is conditionally independent of everything else, given G_m. The forward-backward algorithm (see Rabiner 1989) takes advantage of the conditional independence structure of the HMM to calculate $\Pr (G_{m} | O)$.
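As a concrete illustration of the calculation of $\Pr (G_{m} | O)$, here is a minimal pure-Python sketch of the forward-backward algorithm for a discrete HMM. It is a generic textbook version with rescaling at each position, not the R/qtl2 code; the function name, the argument layout, and the two-state toy example are assumptions made for illustration.

```python
def forward_backward(init, trans, emit):
    """Posterior Pr(G_m = g | O) for a discrete HMM.

    init:  list of length K, init[g] = Pr(G_1 = g)
    trans: list of M-1 matrices, trans[m][g][h] = Pr(G_{m+1} = h | G_m = g)
    emit:  list of M lists, emit[m][g] = Pr(O_m | G_m = g)
    """
    M, K = len(emit), len(init)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass, rescaled at each position to avoid numerical underflow.
    alpha = [normalize([init[g] * emit[0][g] for g in range(K)])]
    for m in range(1, M):
        alpha.append(normalize([
            emit[m][h] * sum(alpha[m - 1][g] * trans[m - 1][g][h] for g in range(K))
            for h in range(K)
        ]))

    # Backward pass, rescaled in the same way.
    beta = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        beta[m] = normalize([
            sum(trans[m][g][h] * emit[m + 1][h] * beta[m + 1][h] for h in range(K))
            for g in range(K)
        ])

    # Combine the two passes and normalize at each position.
    return [normalize([alpha[m][g] * beta[m][g] for g in range(K)]) for m in range(M)]


# Toy example: two genotypes, three positions, no data at the middle position.
init = [0.5, 0.5]
step = [[0.9, 0.1], [0.1, 0.9]]
trans = [step, step]
emit = [[0.99, 0.01], [1.0, 1.0], [0.99, 0.01]]
post = forward_backward(init, trans, emit)
print(post[1])
```

The middle position has no observed genotype (its emission probabilities are all 1), so its posterior is driven entirely by the flanking positions, which is how pseudomarkers are handled in this kind of model.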

The key parameters in the model are the initial probabilities, $π_{g} = \Pr (G_{1} = g)$, the transition probabilities, $t_{m} (g, g') = \Pr (G_{m + 1} = g' | G_{m} = g)$, and the emission probabilities, $e_{m} (g) = \Pr (O_{m} | G_{m} = g)$. A particular advantage of the HMM for genotype reconstruction is the easy incorporation of a model for genotyping errors (Lincoln and Lander 1992). This is done through the emission probabilities: they condition on the founder SNP genotypes but allow some fixed probability ϵ that the observed SNP genotype in the MPP individual is in error, that is, incompatible with the underlying genotype G_m and the SNP genotypes in the founder lines.
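To make the role of the error probability concrete, the following Python sketch computes emission probabilities at one biallelic SNP in a homozygous population. The specific form used here (probability 1 − ϵ when the observed allele matches the founder's allele, ϵ otherwise, and 1 for a missing genotype) is an assumed, simplified error model chosen for illustration; it is not taken from the R/qtl2 source. The function name and the 0/1 allele coding are likewise hypothetical; the default error rate of 0.002 matches the 0.2% rate used in the DO application.

```python
def emission_probs(observed, founder_alleles, error_prob=0.002):
    """Pr(O_m | G_m = g) for each founder genotype g at one SNP.

    observed:        0 or 1 (allele carried by the homozygous MPP individual),
                     or None if the genotype is missing
    founder_alleles: list of 0/1 alleles, one per founder line
    error_prob:      assumed fixed genotyping error probability (epsilon)
    """
    if observed is None:
        # A missing genotype is uninformative about the founder origin.
        return [1.0] * len(founder_alleles)
    return [
        1.0 - error_prob if observed == allele else error_prob
        for allele in founder_alleles
    ]


# Three founders carrying alleles 0, 1, 1; the individual shows allele 1.
e = emission_probs(1, [0, 1, 1], error_prob=0.01)
print(e)
```

With a positive error probability, a single discordant SNP lowers the probability of a founder but does not exclude it, so isolated genotyping errors do not force spurious recombination breakpoints.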

The initial and transition probabilities govern the underlying Markov chain, including the relative frequency of founder alleles and the frequency of recombination breakpoints along MPP chromosomes. In principle, these probabilities may be derived on the basis of the crossing design for the MPP. In practice, the transition probabilities can be tedious to derive, and exact calculations may provide no real advantage for genotype reconstruction.

Here, we derive the transition probabilities for a generic MPP design, which may then be applied generally. We consider a founder population with k inbred lines in proportions α_i, and imagine subsequent generations are produced by random mating with a very large set of mating pairs.

Consider a pair of loci separated by a recombination fraction of r (assumed the same in both sexes) and let $p_{i j}^{(n)}$ be the probability that a random haplotype at generation n has alleles i and j. At n = 0, we have just the founding inbred lines, and so $p_{i j}^{(0)} = α_{i}$ if i = j and $= 0$ if $i \neq j$.

The probabilities from one generation to the next are related by a simple recursion, as in Broman (2012b). Consider a random haplotype at generation n. It was either a random haplotype from generation n−1 transmitted intact without recombination, or it is a recombinant haplotype bringing together two random alleles. Thus

$$p_{i j}^{(n)} = (1 - r) \, p_{i j}^{(n - 1)} + r \, α_{i} α_{j} . \qquad (1)$$

Using the same techniques described in Broman (2012b), we find the solutions:

$$p_{i j}^{(n)} = \begin{cases} α_{i}^{2} + (1 - r)^{n} α_{i} (1 - α_{i}) & \text{if } i = j \\ α_{i} α_{j} [1 - (1 - r)^{n}] & \text{if } i \neq j . \end{cases} \qquad (2)$$
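The closed-form solution can be checked numerically by iterating the recursion from the generation-0 haplotype probabilities. The Python sketch below does this for an arbitrary set of founder proportions; the function names and the example values of α, r, and n are illustrative assumptions.

```python
def two_locus_by_recursion(alpha, r, n):
    """Iterate p_ij^(n) = (1 - r) p_ij^(n-1) + r alpha_i alpha_j, starting from n = 0."""
    k = len(alpha)
    # Generation 0: only the founder haplotypes, so p_ii = alpha_i and p_ij = 0.
    p = [[alpha[i] if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        p = [[(1 - r) * p[i][j] + r * alpha[i] * alpha[j] for j in range(k)]
             for i in range(k)]
    return p


def two_locus_closed_form(alpha, r, n):
    """Closed-form two-locus haplotype probabilities at generation n."""
    k = len(alpha)
    decay = (1 - r) ** n
    return [[alpha[i] ** 2 + decay * alpha[i] * (1 - alpha[i]) if i == j
             else alpha[i] * alpha[j] * (1 - decay)
             for j in range(k)] for i in range(k)]


alpha = [0.5, 0.25, 0.125, 0.125]   # unequal founder proportions (made up)
r, n = 0.01, 12
rec = two_locus_by_recursion(alpha, r, n)
cf = two_locus_closed_form(alpha, r, n)
max_diff = max(abs(rec[i][j] - cf[i][j]) for i in range(4) for j in range(4))
print(max_diff)
```

The two calculations should agree to within floating-point rounding, and the entries of either matrix sum to 1.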

The transition probabilities along a haplotype are derived by dividing the above by the marginal probability, α_i. Thus if G₁ and G₂ are the genotypes at the two loci, we have the following transition probabilities.

$$\Pr (G_{2} = j | G_{1} = i) = \begin{cases} α_{i} + (1 - r)^{n} (1 - α_{i}) & \text{if } i = j \\ α_{j} [1 - (1 - r)^{n}] & \text{if } i \neq j . \end{cases} \qquad (3)$$
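For a single haplotype, Equation (3) translates directly into a transition matrix. The following Python sketch builds it and can be used to confirm two properties one would expect: each row sums to 1, and the founder proportions α_i form the stationary distribution. The function name and the example values are assumptions for illustration.

```python
def transition_matrix(alpha, r, n):
    """Haplotype transition probabilities Pr(G_2 = j | G_1 = i) for the generic model.

    alpha: founder proportions (summing to 1)
    r:     recombination fraction between the two positions
    n:     number of generations of random mating
    """
    k = len(alpha)
    decay = (1.0 - r) ** n   # probability of no exchange in any of n generations
    return [[alpha[j] + decay * (1.0 - alpha[j]) if i == j
             else alpha[j] * (1.0 - decay)
             for j in range(k)] for i in range(k)]


alpha = [1 / 8] * 8
T = transition_matrix(alpha, r=0.005, n=28)
row_sums = [sum(row) for row in T]
# Stationarity check: sum_i alpha_i T[i][j] should equal alpha_j.
stationary = [sum(alpha[i] * T[i][j] for i in range(8)) for j in range(8)]
print(row_sums[0], stationary[0])
```

Note that only two parameters beyond the genetic map enter the matrix, the founder proportions and n, which is what makes the model easy to specify for a new design.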

For a heterozygous population (such as HS or DO mice), an individual will have two such random haplotypes. For a homozygous population (such as MAGIC), we treat individuals like doubled haploids, by taking a single random chromosome and doubling it.

For the X chromosome, we use the same equations but replace n with $(2 / 3) n$, since the X chromosome recombines only in females, and so in 2/3 of the X chromosomes in each generation. This provides a remarkably tight approximation.

One can use the expected number of crossovers to calibrate the number of generations of random mating, or equivalently the map expansion, which is the relative increase in the number of crossovers. Let R(r) be the chance that a random haplotype has an exchange of alleles across an interval with recombination fraction r, so $R (r) = 1 - \sum_{i} p_{i i}^{(n)}$. The map expansion is dR/dr evaluated at r = 0 (see Teuscher and Broman 2007). Using Equation (2) above, we then get that the map expansion in this population is $n (1 - \sum_{i} α_{i}^{2})$. In the special case that $α_{i} = 1 / k$ for all i, this reduces to $n (k - 1) / k$.
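For readers who want the intermediate steps, the map expansion follows from Equation (2) in a few lines, using only the fact that the founder proportions sum to 1:

```latex
\begin{align*}
R(r) &= 1 - \sum_i p_{ii}^{(n)}
      = 1 - \sum_i \left[ \alpha_i^2 + (1-r)^n \alpha_i (1 - \alpha_i) \right] \\
     &= \left( 1 - \sum_i \alpha_i^2 \right) \left[ 1 - (1-r)^n \right],
        \quad \text{since } \sum_i \alpha_i (1 - \alpha_i) = 1 - \sum_i \alpha_i^2 , \\
\frac{dR}{dr} &= n (1-r)^{n-1} \left( 1 - \sum_i \alpha_i^2 \right),
\qquad
\left. \frac{dR}{dr} \right|_{r=0} = n \left( 1 - \sum_i \alpha_i^2 \right).
\end{align*}
```

With k = 8 founders in equal proportions, $\sum_i \alpha_i^2 = 1/8$ and the map expansion is $7n/8$.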

The map expansion at generation s in DO mice on an autosome is $(7 / 8) (s - 1) + M_{1}$ where M₁ is the weighted average of map expansion in the pre-CC founders (Broman 2012b), or about $(7 s + 37) / 8$ . Equating this with $(7 / 8) n$ , we can thus take $n \approx s + 5$ when using this model to approximate the DO. For the CC, Broman (2005) showed that R = 7r/(1 + 6r), and so the map expansion is 7. Thus we can take n = 8 as the effective number of generations of random mating.
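The calibration arithmetic can be reproduced with exact fractions. In the Python sketch below, the function names are hypothetical, the DO map expansion uses the approximate expression $(7 s + 37) / 8$ quoted above, and generation s = 23 is chosen only as an example.

```python
from fractions import Fraction


def effective_generations(target_expansion, alpha):
    """Solve n * (1 - sum(alpha_i^2)) = target_expansion for n."""
    return target_expansion / (1 - sum(a * a for a in alpha))


def do_map_expansion(s):
    """Approximate autosomal map expansion for DO mice at generation s."""
    return Fraction(7 * s + 37, 8)


alpha8 = [Fraction(1, 8)] * 8   # eight founders in equal proportions

# DO at generation s = 23: n = s + 37/7, which is close to s + 5.
n_do = effective_generations(do_map_expansion(23), alpha8)

# CC: map expansion of 7 gives exactly n = 8.
n_cc = effective_generations(Fraction(7), alpha8)

print(n_do, float(n_do), n_cc)
```

For the DO the exact solution is n = s + 37/7, or roughly s + 5.3, which is rounded to s + 5 in practice.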

Applications

We illustrate our approach with application to datasets on DO mice (Al-Barghouthi et al. 2021) and CC mice (Srivastava et al. 2017). In both cases, the approach provided results that were generally equivalent to those from the more exact model, though with important differences in the results for the X chromosome in the CC application.

DO mice

The DO mouse data of Al-Barghouthi et al. (2021) concerns a set of 619 mice from DO generations 23–33, in 11 batches by generation and including 304 females and 315 males. The mice were genotyped on the GigaMUGA array (Morgan et al. 2016) and the cleaned data consist of genotypes at 109,427 markers. A wide variety of phenotypes are available; we focus on the 20 contributing to the results in Table 1 of Al-Barghouthi et al. (2021).

We performed genotype reconstruction using the transition matrices derived specifically for DO mice (Broman et al. 2019b) as well as by the approximate model proposed above. For DO mice at generation s, we used the transition probabilities for the general eight-way AIL model with n = s + 5 generations of random mating.

Following Al-Barghouthi et al. (2021), we assumed a 0.2% genotyping error rate and used the Carter–Falconer map function (Carter and Falconer 1951). Calculations were performed in R (R Core Team 2021) with R/qtl2 (Broman et al. 2019a), on an 8-core Linux laptop with 64 GB RAM. The calculations with the DO-specific model took approximately 35 min, while those with the general AIL model took 27 min, an almost 25% reduction in computation time.

The transition probabilities used by the two models are only subtly different and become less different in later generations. The probability of an exchange across an interval on a random DO chromosome, as a function of the recombination fraction for the interval and the number of generations, is shown in Figure 2.

Figure 2. Differences in transition probabilities for DO mice from more-exact calculations and the proposed approximations. Probability of an exchange of alleles across an interval as a function of generation with the more-exact calculations (solid lines) and the proposed approximation (dashed lines) for autosomes (A) and the X chromosome (B). Ratio of the probabilities (more-exact vs approximation) for autosomes (C) and the X chromosome (D).

QTL analysis proceeded by the method described in Gatti et al. (2014) and also used by Al-Barghouthi et al. (2021). Namely, we fit a linear mixed model assuming an additive model for the founder haplotypes, with a residual polygenic effect to account for relationships among individuals with kinship matrices calculated using the “leave-one-chromosome-out” method (see Yang et al. 2014), and with a set of fixed-effect covariates defined in Al-Barghouthi et al. (2021).

The genotype probabilities were almost indistinguishable. The maximum difference was 0.011 on the X chromosome followed by a difference of 0.007 on chromosome 8. For that reason, the QTL mapping results were hardly different. Across all 20 traits considered, the maximum difference in LOD scores in the two sets of results was 0.02.

The LOD curves by the two methods for tissue mineral density (TMD) and the differences between them are shown in Figure 3. The QTL on chromosomes 1 and 10 have LOD scores of 23.9 and 14.6, respectively, but the maximum difference in LOD, genome wide, between the two methods is just 0.014.

Figure 3. Genome scan for TMD for the DO mouse data from Al-Barghouthi et al. (2021). (A) LOD curves across the genome using the genotype probabilities from the DO-specific model (solid blue curves) and the proposed general model (dotted pink curves). (B) Differences between the two sets of LOD curves.

CC mice

As a second application of our approach, we consider the data for a set of 69 CC lines (Srivastava et al. 2017). These are eight-way RIL derived from the same eight founders as the DO mice, as the DO was formed from 144 partially inbred lines from the process of developing the CC (Svenson et al. 2012).

Each CC line was formed from a separate “funnel,” bringing the eight founder genomes together as rapidly as possible, for example [(A × B)×(C × D)]×[(E × F)×(G × H)], where the female parent is listed first in each cross. Inbreeding was accomplished by repeated mating between siblings.

The recombination probabilities for the autosomes in the CC do not depend on the order of the founders in the funnel for a line (Broman 2005). This is in contrast with the case of eight-way RIL by selfing (see Broman 2005, Table 2). For the X chromosome, however, the cross order is important, as only five of the eight founders can contribute. For example, in a line derived from the cross [(A × B)×(C × D)]×[(E × F)×(G × H)], the single-locus genotype probabilities on the X chromosome are 1/6 each for alleles A, B, E, and F, and 1/3 for allele C, while alleles D, G, and H will be absent. And note that the mitochondrial DNA will come from founder A, while the Y chromosome will be from founder H.
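The X-chromosome frequencies quoted above can be recovered by tracing X chromosomes through the funnel. The Python sketch below is an illustration under stated assumptions, not code from R/qtl2: a funnel is written as nested (mother, father) tuples, all function names are hypothetical, and the final step assumes the standard result for sib mating that the fixed X-chromosome allele comes from the initial female with probability 2/3 and from the initial male with probability 1/3.

```python
from fractions import Fraction


def mix(d1, d2):
    """Equal-weight mixture of two founder distributions."""
    out = {}
    for d in (d1, d2):
        for founder, p in d.items():
            out[founder] = out.get(founder, Fraction(0)) + p / 2
    return out


def male_x(node):
    """Founder distribution of the single X chromosome of a male from this cross."""
    if isinstance(node, str):
        return {node: Fraction(1)}
    mother, _father = node
    return female_x(mother)          # sons inherit their X from the mother


def female_x(node):
    """Founder distribution of a randomly chosen X chromosome of a female."""
    if isinstance(node, str):
        return {node: Fraction(1)}
    mother, father = node
    return mix(female_x(mother), male_x(father))


def ril_x_probs(funnel):
    """Single-locus X-chromosome probabilities after inbreeding by sib mating.

    Assumes weights of 2/3 for the female and 1/3 for the male of the final cross.
    """
    mother, father = funnel
    out = {}
    for dist, weight in ((female_x(mother), Fraction(2, 3)),
                         (male_x(father), Fraction(1, 3))):
        for founder, p in dist.items():
            out[founder] = out.get(founder, Fraction(0)) + weight * p
    return out


def maternal_line(node):
    """Founder contributing the mitochondria (first-listed parent at each level)."""
    while not isinstance(node, str):
        node = node[0]
    return node


def paternal_line(node):
    """Founder contributing the Y chromosome (second-listed parent at each level)."""
    while not isinstance(node, str):
        node = node[1]
    return node


funnel = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
x_probs = ril_x_probs(funnel)
print(x_probs, maternal_line(funnel), paternal_line(funnel))
```

For this funnel the sketch gives 1/6 each for A, B, E, and F, and 1/3 for C, with D, G, and H absent, and it identifies A and H as the sources of the mitochondria and Y chromosome.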

The cross funnel information was missing for 14 of the 69 CC lines. While the sources of the mitochondria and Y chromosome were provided for all lines, there were several inconsistencies in these data: line CC013/GeniUnc has the same founder listed as the source for its mitochondria and Y chromosome, and for three lines (CC031/GeniUnc, CC037/TauUnc, and CC056/GeniUnc) the founder on the Y chromosome is also seen contributing to the X chromosome. We used the genotype probabilities reported in Srivastava et al. (2017) to construct compatible cross funnels, with small modifications to handle the inconsistent information.

We performed genotype reconstruction using the transition matrices derived specifically for CC mice (Broman 2005) as well as by the approximate model proposed above, using n = 8 generations of random mating, chosen to match the expected frequency of recombination breakpoints.

The resulting probabilities were nearly identical on all autosomes in all CC lines. The maximum difference in probabilities on the autosomes was just 0.0006.

There were some important differences on the X chromosome, however. There were no cases with high probability pointing to different founder alleles by the two models, but there were several cases where two or more founders cannot be distinguished, but some would be excluded by the assumed cross design.

For example, in Figure 4, we show the genotype probabilities along the X chromosome for strain CC038/GeniUnc, as calculated with the more-exact model (Figure 4A) and with the approximate model (Figure 4B). We also include the results from the more-exact model when an incorrect cross design was used (Figure 4C). Note the segment near 135 Mbp, which is inferred to be from founder NOD with the more-exact model but is equally likely B6 or NOD with the approximate model; the B6 and NOD founder strains are identical in the region, but the assumed cross design for the CC038/GeniUnc strain excluded B6. For the results using the incorrect cross design (which excluded not just B6 but also 129 and NOD), the results across the entire chromosome become a chopped-up mess, with an apparent 39 recombination breakpoints, vs 5 when the correct cross information is used.

Figure 4. Genotype probabilities along the X chromosome for CC strain CC038/GeniUnc. (A) Results using the more-exact model that excludes founders B6, CAST, and WSB. (B) Results using the proposed approximate model. (C) Results using the more-exact model but with the wrong cross information, excluding founders B6, 129, and NOD.

Overall, there were seven strains where the maximum difference in the probabilities from the more-exact model and the proposed approximate model was in the range 0.25–0.50, and another eight strains with maximum difference in the range 0.10–0.25. All of the differences concern cases where multiple founders are identical for a region and either some would be excluded by the cross design, or the difference in prior frequencies affects the results. For example, in the cross [(A × B)×(C × D)]×[(E × F)×(G × H)], the frequency of the C allele on the X chromosome is twice that of A, B, E, and F.

Discussion

We have proposed an approximate model for use with genotype reconstruction in MPPs. We derived the two-point probabilities on autosomes in the case of random mating in large, discrete generations, derived from a founder population of a set of inbred lines in known proportions. We use the same frequencies for the X chromosome, but with 2/3 the number of generations. The approach is shown to give equivalent results for the mouse DO and CC populations, though with important differences for the X chromosome in CC lines, where some founder alleles can be excluded based on the cross design. The more-exact model for the X chromosome in the CC excludes three of the eight founders based on the cross design. This is particularly useful in cases where multiple founders are identical by descent across a region. However, the approximate model is not affected by errors in the specified cross design (see Figure 4).

The value of this generic model points toward the general usefulness of the original software for MPPs, HAPPY (Mott et al. 2000), developed for the analysis of mouse HS. The results may depend on marker density and informativeness, but with a dense set of informative markers, a generic approach can provide good-quality genome reconstructions.

The HMM itself is an approximation. Meiosis generally exhibits positive crossover interference, but the Markov property is closer to being correct in MPPs with multiple generations of mating, because nearby recombination events come from independent generations. This was apparent in the three-point probabilities derived by Haldane and Waddington (1931) for two-way RIL and was further explored in Broman (2005) for multiway RIL.

The proposed method has been implemented in the R/qtl2 software (Broman et al. 2019a). It requires specification of the founder proportions and one other parameter (the number of generations of random mating) which governs the frequency of recombination breakpoints. The founder proportions should be straightforward from the cross design; the effective number of generations of random mating may require some calibration, such as through computer simulation to match the expected frequency of recombination breakpoints.

Data availability

The R/qtl2 software is available at the Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=qtl2, as well as GitHub, https://github.com/rqtl/qtl2. Further documentation is available at the R/qtl2 website, https://kbroman.org/qtl2.

The DO mouse data from Al-Barghouthi et al. (2021) are available at Zenodo, https://doi.org/10.5281/zenodo.4265417. Also see their companion repository of analysis scripts at GitHub, https://github.com/basel-maher/DO_project, and archived at Zenodo, https://doi.org/10.5281/zenodo.4718146.

The CC mouse data from Srivastava et al. (2017) are available at Zenodo, https://doi.org/10.5281/zenodo.377036. Reorganized files in R/qtl2 format are at https://github.com/rqtl/qtl2data/tree/main/CC.

Our detailed analysis code is available at GitHub, https://github.com/kbroman/Paper_GenericHMM, and archived at Zenodo, https://doi.org/10.5281/zenodo.5718739.

Acknowledgments

Two anonymous reviewers provided valuable comments for improvement of the manuscript.

Funding

This work was supported in part by the National Institutes of Health grant R01GM070683.

Conflicts of interest

The author declares that there is no conflict of interest.

Literature cited

Al-Barghouthi BM, Mesner LD, Calabrese GM, Brooks D, Tommasini SM, et al. 2021. Systems genetics in Diversity Outbred mice inform BMD GWAS and identify determinants of bone strength. Nat Commun. 12:3408.
Bandillo N, Raghavan C, Muyco PA, Sevilla MAL, Lobina IT, et al. 2013. Multi-parent advanced generation inter-cross (MAGIC) populations in rice: progress and potential for genetics research and breeding. Rice (N Y). 6:11.
Broman KW. 2005. The genomes of recombinant inbred lines. Genetics. 169:1133–1146.
Broman KW. 2012a. Genotype probabilities at intermediate generations in the construction of recombinant inbred lines. Genetics. 190:403–412.
Broman KW. 2012b. Haplotype probabilities in advanced intercross populations. G3 (Bethesda). 2:199–202.
Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, et al. 2019a. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 211:495–502.
Broman KW, Gatti DM, Svenson KL, Sen Ś, Churchill GA. 2019b. Cleaning genotype data from Diversity Outbred mice. G3 (Bethesda). 9:1571–1579.
Broman KW, Sen S. 2009. A Guide to QTL Mapping with R/qtl. New York: Springer.
Carter T, Falconer D. 1951. Stocks for detecting linkage in the mouse, and the theory of their design. J Genet. 50:307–323.
Cavanagh C, Morell M, Mackay I, Powell W. 2008. From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants. Curr Opin Plant Biol. 11:215–221.
Churchill GA, Gatti DM, Munger SC, Svenson KL. 2012. The Diversity Outbred mouse population. Mamm Genome. 23:713–718.
Complex Trait Consortium. 2004. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet. 36:1133–1137.
de Koning D, McIntyre L. 2017. Back to the future: multiparent populations provide the key to unlocking the genetic basis of complex traits. Genetics. 206:527–529.
Dell'Acqua M, Gatti DM, Pea G, Cattonaro F, Coppens F, et al. 2015. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 16:167.
Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W, et al. 2014. Quantitative trait locus mapping methods for Diversity Outbred mice. G3 (Bethesda). 4:1623–1633.
Haldane JBS, Waddington CH. 1931. Inbreeding and linkage. Genetics. 16:357–374.
Huang BE, George AW. 2011. R/mpmap: a computational platform for the genetic analysis of multiparent recombinant inbred lines. Bioinformatics. 27:727–729.
King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, et al. 2012. Genetic dissection of a model complex trait using the Drosophila synthetic population resource. Genome Res. 22:1558–1566.
Kover PX, Valdar W, Trakalo J, Scarcelli N, Ehrenreich IM, et al. 2009. A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet. 5:e1000551.
Lincoln SE, Lander ES. 1992. Systematic detection of errors in genetic linkage data. Genomics. 14:604–610.
Morgan AP, Fu C-P, Kao C-Y, Welsh CE, Didion JP, et al. 2016. The Mouse Universal Genotyping Array: from substrains to subspecies. G3 (Bethesda). 6:263–279.
Mott R, Talbot CJ, Turri MG, Collins AC, Flint J. 2000. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci U S A. 97:12649–12654.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rabiner LR. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 77:257–286.
Solberg Woods LC, Stelloh C, Regner KR, Schwabe T, Eisenhauer J, et al. 2010. Heterogeneous stock rats: a new model to study the genetics of renal phenotypes. Am J Physiol Renal Physiol. 298:F1484–F1491.
Srivastava A, Morgan AP, Najarian ML, Sarsani VK, Sigmon JS, et al. 2017. Genomes of the mouse Collaborative Cross. Genetics. 206:537–556.
Svenson KL, Gatti DM, Valdar W, Welsh CE, Cheng R, et al. 2012. High-resolution genetic mapping using the mouse Diversity Outbred population. Genetics. 190:437–447.
Teuscher F, Broman KW. 2007. Haplotype probabilities for multiple-strain recombinant inbred lines. Genetics. 175:1267–1274.
Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. 2014. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 46:100–106.
Zheng C, Boer MP, van Eeuwijk FA. 2015. Reconstruction of genome ancestry blocks in multiparental populations. Genetics. 200:1073–1087.
Zheng C, Boer MP, van Eeuwijk FA. 2018. Accurate genotype imputation in multiparental populations from low-coverage sequence. Genetics. 210:71–82.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Our detailed analysis code is available at GitHub, https://github.com/kbroman/Paper_GenericHMM, and archived at Zenodo, https://doi.org/10.5281/zenodo.5718739.

[jkab396-B1] Al-Barghouthi BM, Mesner LD, Calabrese GM, Brooks D, Tommasini SM, et al. 2021. Systems genetics in Diversity Outbred mice inform BMD GWAS and identify determinants of bone strength. Nat Commun. 12:3408. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B2] Bandillo N, Raghavan C, Muyco PA, Sevilla MAL, Lobina IT, et al. 2013. Multi-parent advanced generation inter-cross (MAGIC) populations in rice: progress and potential for genetics research and breeding. Rice (N Y). 6:11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B3] Broman KW. 2005. The genomes of recombinant inbred lines. Genetics. 169:1133–1146. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B4] Broman KW. 2012a. Genotype probabilities at intermediate generations in the construction of recombinant inbred lines. Genetics. 190:403–412. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B5] Broman KW. 2012b. Haplotype probabilities in advanced intercross populations. G3 (Bethesda). 2:199–202. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B6] Broman KW, Gatti DM, Simecek P, Furlotte NA, Prins P, et al. 2019a. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 211:495–502. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B7] Broman KW, Gatti DM, Svenson KL, Sen Ś, Churchill GA.. 2019b. Cleaning genotype data from Diversity Outbred mice. G3 (Bethesda). 9:1571–1579. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkab396-B8] Broman KW, Sen S.. 2009. A Guide to QTL Mapping with R/qtl. New York: Springer. [Google Scholar]

[jkab396-B9] Carter T, Falconer D.. 1951. Stocks for detecting linkage in the mouse, and the theory of their design. J Genet. 50:307–323. [DOI] [PubMed] [Google Scholar]

[jkab396-B10] Cavanagh C, Morell M, Mackay I, Powell W.. 2008. From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants. Curr Opin Plant Biol. 11:215–221. [DOI] [PubMed] [Google Scholar]

Churchill GA, Gatti DM, Munger SC, Svenson KL. 2012. The Diversity Outbred mouse population. Mamm Genome. 23:713–718.

Complex Trait Consortium. 2004. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet. 36:1133–1137.

de Koning D, McIntyre L. 2017. Back to the future: multiparent populations provide the key to unlocking the genetic basis of complex traits. Genetics. 206:527–529.

Dell'Acqua M, Gatti DM, Pea G, Cattonaro F, Coppens F, et al. 2015. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 16:167.

Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W, et al. 2014. Quantitative trait locus mapping methods for Diversity Outbred mice. G3 (Bethesda). 4:1623–1633.

Haldane JBS, Waddington CH. 1931. Inbreeding and linkage. Genetics. 16:357–374.

Huang BE, George AW. 2011. R/mpmap: a computational platform for the genetic analysis of multiparent recombinant inbred lines. Bioinformatics. 27:727–729.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, et al. 2012. Genetic dissection of a model complex trait using the Drosophila synthetic population resource. Genome Res. 22:1558–1566.

Kover PX, Valdar W, Trakalo J, Scarcelli N, Ehrenreich IM, et al. 2009. A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet. 5:e1000551.

Lincoln SE, Lander ES. 1992. Systematic detection of errors in genetic linkage data. Genomics. 14:604–610.

Morgan AP, Fu C-P, Kao C-Y, Welsh CE, Didion JP, et al. 2016. The Mouse Universal Genotyping Array: from substrains to subspecies. G3 (Bethesda). 6:263–279.

Mott R, Talbot CJ, Turri MG, Collins AC, Flint J. 2000. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci U S A. 97:12649–12654.

R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Rabiner LR. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 77:257–286.

Solberg Woods LC, Stelloh C, Regner KR, Schwabe T, Eisenhauer J, et al. 2010. Heterogeneous stock rats: a new model to study the genetics of renal phenotypes. Am J Physiol Renal Physiol. 298:F1484–F1491.

Srivastava A, Morgan AP, Najarian ML, Sarsani VK, Sigmon JS, et al. 2017. Genomes of the mouse Collaborative Cross. Genetics. 206:537–556.

Svenson KL, Gatti DM, Valdar W, Welsh CE, Cheng R, et al. 2012. High-resolution genetic mapping using the mouse Diversity Outbred population. Genetics. 190:437–447.

Teuscher F, Broman KW. 2007. Haplotype probabilities for multiple-strain recombinant inbred lines. Genetics. 175:1267–1274.

Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. 2014. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 46:100–106.

Zheng C, Boer MP, van Eeuwijk FA. 2015. Reconstruction of genome ancestry blocks in multiparental populations. Genetics. 200:1073–1087.

Zheng C, Boer MP, van Eeuwijk FA. 2018. Accurate genotype imputation in multiparental populations from low-coverage sequence. Genetics. 210:71–82.
