Summary
R/qtlbim is an extensible, interactive environment for the Bayesian Interval Mapping of QTL, built on top of R/qtl (Broman et al. 2003), providing Bayesian analysis of multiple interacting quantitative trait loci (QTL) models for continuous, binary and ordinal traits in experimental crosses. It includes several efficient Markov chain Monte Carlo (MCMC) algorithms for evaluating the posterior of genetic architectures, i.e. the number and locations of QTL, their main and epistatic effects, and gene-environment interactions. R/qtlbim provides extensive informative graphical and numerical summaries, and model selection and convergence diagnostics of the MCMC output, illustrated through the vignette, example and demo capabilities of R (R Development Core Team 2006).
1 INTRODUCTION
The freely available QTL mapping package, R/qtlbim (www.rqtlbim.org), provides a comprehensive framework for Bayesian model selection of the genetic architecture of complex traits in experimental crosses. Classical approaches to model selection in QTL mapping, such as multiple interval mapping in QTL Cartographer (Basten, Weir and Zeng 2002), largely rely on stepwise model selection with separate fits to each possible model. The Bayesian approach has the advantage of sampling across the more probable models, and providing graphical summaries that can compare many models at once. R/qtlbim can infer multiple QTL in the presence of epistasis (gene-gene interaction) and gene-environment interactions.
R/qtlbim is built on the widely used R/qtl framework (Broman et al. 2003), which provides many graphical tools for data checking and classical model selection. R/qtlbim shares the extensibility features of R/qtl. Computationally intensive algorithms are written in C, with data manipulation and graphics in R. R/qtlbim is available on Windows, Linux and Mac OS X platforms and accepts a variety of input formats via R/qtl.
2 MCMC TECHNOLOGY
Central to R/qtlbim is the Markov chain Monte Carlo (MCMC) technology (Yi 2004; Yi et al. 2004; Yi et al. 2005). MCMC samples are drawn from the posterior distribution of genetic architecture, including number and location of genetic loci, gene action effects at all loci, epistatic interactions between pairs of loci, fixed and random covariates, and gene-environment interactions. These MCMC samples are then summarized and interpreted with graphs to infer key aspects of the genetic architecture.
MCMC provides a mechanism to study the full Bayesian posterior distribution for any particular genetic architecture. Further, using a prior to allow uncertainty of genetic architecture leads to MCMC samples across multiple genetic architectures. This brings model selection formally into a Bayesian framework in which the genetic architecture is just another parameter to be estimated.
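To make concrete the idea that the genetic architecture is itself a parameter, the following minimal Python sketch (not R/qtlbim code) estimates the posterior probability of each architecture as its relative frequency among MCMC draws. The function name `architecture_posterior` and the draws are hypothetical. An architecture is reduced here to the set of chromosomes carrying a QTL, which is a simplifying assumption.

```python
from collections import Counter


def architecture_posterior(draws):
    """Estimate posterior probabilities of architectures from MCMC draws.

    Each draw is an iterable of chromosome labels that carry a QTL in that
    iteration (a simplified, hypothetical encoding of an architecture).
    """
    # Sort each draw so that (4, 1) and (1, 4) count as the same architecture.
    counts = Counter(tuple(sorted(draw)) for draw in draws)
    total = sum(counts.values())
    return {arch: n / total for arch, n in counts.items()}


# Made-up draws for illustration only; these are not real MCMC output.
draws = [(1, 4), (1, 4), (1,), (1, 4, 15), (4, 1), (1, 4), (1,), (1, 4)]
post = architecture_posterior(draws)

# Report architectures from most to least probable.
for arch, prob in sorted(post.items(), key=lambda item: -item[1]):
    print(arch, round(prob, 3))
```

In this toy example the two-QTL architecture on chromosomes 1 and 4 is visited most often, so it receives the largest estimated posterior probability.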
3 FEATURES
R/qtlbim contains several efficient MCMC algorithms to search for genetic architectures that are most probable. It includes graphical and tabular summaries that assess the contribution of individual loci and pairs of loci while adjusting for effects of all other possible loci and covariates via model averaging.
Graphical tools allow the user to examine the MCMC samples by loci and genotypic effects in a variety of ways. They include estimation of Bayes factors for model selection on the number of QTL, the pattern of QTL across chromosomes, and the patterns of epistatic and gene-environment interactions. High posterior density (HPD) regions provide estimates of QTL analogous to LOD support intervals. Suggestions for model selection are provided in vignettes, examples and demos using the R (R Development Core Team 2006) interactive documentation system. The primary vignette, qtlbim.pdf, gives an overview of the package. Each command has a help page and corresponding example, which can be adapted to the user’s own data. The command qb.demo leads the user through several demonstrations of the package capabilities.
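As a rough illustration of a Bayes factor for the number of QTL, the Python sketch below divides the posterior odds estimated from MCMC draws by the prior odds. It is not the R/qtlbim implementation: the Poisson prior on the number of QTL, the prior mean, the function names and the draws are all assumptions made for this example.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Prior probability of k QTL under a Poisson prior (assumed prior family)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def posterior_to_prior_ratios(n_qtl_draws, prior_mean):
    """Return posterior/prior ratios for each number of QTL seen in the draws."""
    counts = Counter(n_qtl_draws)
    total = len(n_qtl_draws)
    return {m: (c / total) / poisson_pmf(m, prior_mean) for m, c in counts.items()}


def bayes_factor(ratios, m1, m2):
    """Bayes factor of m1 QTL versus m2 QTL: posterior odds over prior odds."""
    return ratios[m1] / ratios[m2]


# Made-up draws of the number of QTL per MCMC iteration (illustration only).
n_qtl_draws = [2] * 6 + [3] * 3 + [1] * 1
ratios = posterior_to_prior_ratios(n_qtl_draws, prior_mean=2.0)

bf_2_vs_1 = bayes_factor(ratios, 2, 1)  # 6.0 up to floating-point rounding
bf_2_vs_3 = bayes_factor(ratios, 2, 3)
print(bf_2_vs_1, bf_2_vs_3)
```

Dividing by the prior matters: with a prior mean of 2, one and two QTL are equally likely a priori, so the Bayes factor of 2 versus 1 equals the posterior odds, whereas the comparison of 2 versus 3 is adjusted for the lower prior weight on three QTL.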
The scan.pdf vignette provides more detail on R/qtlbim extensions of typical interval mapping scans. The R/qtl package (Broman et al. 2003) offers genome scans of the classical log odds (LOD) of Lander and Botstein (1989) and the Bayesian log posterior density (LPD) of Sen and Churchill (2001). These only consider the effect of one (scanone) or two (scantwo) QTL on a phenotype. R/qtlbim’s scan routines, principally qb.scanone and qb.scantwo, use R/qtl’s generic plot routines for 1-D and 2-D scans, respectively. However, our philosophy differs in important ways beyond changes in the calling sequence. Our scans consider the contribution of a given locus, or pair of loci, to the LPD after adjusting for all other possible QTL and covariates by model averaging over all genetic architectures that contain the QTL(s) being examined. These marginal scans are partitioned into contributions from main effects, epistatic interactions and gene-environment interactions. Another important distinction is that we provide the facility to estimate marginal heritability, Bayes factors, means by genotypes, and other features in addition to LPD.
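The model-averaging idea behind these marginal scans can be sketched in a few lines of Python. This is a toy under stated assumptions, not the computation performed by qb.scanone: each MCMC draw is encoded as a hypothetical dictionary mapping a locus label to its sampled main effect, and the summary for a locus averages only over the draws whose architecture contains that locus.

```python
from collections import defaultdict


def marginal_scan(draws):
    """Summarize each locus by averaging over the draws that contain it.

    draws: list of dicts, one per MCMC iteration, mapping a locus label to
    the main effect sampled for that locus (hypothetical encoding).
    """
    n_draws = len(draws)
    hits = defaultdict(int)          # number of draws that include the locus
    effect_sum = defaultdict(float)  # sum of sampled effects over those draws
    for draw in draws:
        for locus, effect in draw.items():
            hits[locus] += 1
            effect_sum[locus] += effect
    return {
        locus: {
            "inclusion": hits[locus] / n_draws,
            "mean_effect": effect_sum[locus] / hits[locus],
        }
        for locus in hits
    }


# Made-up draws for illustration only; the locus labels are hypothetical.
draws = [
    {"chr1@40": 2.0, "chr4@30": 1.0},
    {"chr1@40": 2.4},
    {"chr1@40": 1.6, "chr4@30": 1.4},
    {"chr4@30": 1.2},
]
scan = marginal_scan(draws)
for locus, summary in scan.items():
    print(locus, summary)
```

Because the other loci present in each draw vary from iteration to iteration, the per-locus averages are implicitly adjusted for the rest of the architecture, which is the sense in which the scan is marginal.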
A third vignette, hyperslide.pdf, offers a way to automate model selection for the genetic architecture of a complex trait. By default, it analyzes the hypertension data of Sugiyama et al. (2001) illustrated below. However, this vignette is an interactive object that can be reconfigured with another dataset using the qb.sweave command, via the Sweave framework (Leisch 2002). Sweave is an advanced feature that uses R and the LaTeX (www.latex-project.org) typesetting system; LaTeX must be installed separately.
An example of R/qtlbim is provided in Figure 1 using the salt-induced hypertension data of Sugiyama et al. (2001) that is available in R/qtl. Note the improvement in LPD over R/qtl scanone by (1) adjusting for effects of all other possible QTL and (2) providing marginal evidence for epistasis.
4 FUTURE DEVELOPMENT
R/qtlbim is under continual development. Future plans include proper treatment of the X chromosome (Broman et al. 2006) and extensions to correlated traits and experimental crosses derived from multiple inbred lines and outbred populations. We are also investigating ways to assess false discovery and check the fit of a model to data and prior assumptions. More extensive graphics for gene-environment interactions and for ordinal traits are planned. We intend to build on the graphical user interface for R/qtl that is under development, and we are in close communication with the R/qtl development team.
Acknowledgments
This work is supported by National Institutes of Health (NIH) Grant R01 GM069430 (NY). In addition, RVS has partial support from NIH GM070683; BSY has partial support from NIH/PA-02-110, NIH/NIDDK 5803701 and NIH/NIDDK 66369-01; and NY has partial support from NIH HL80812, ES09912 and DK067487.
Footnotes
Availability: The package is freely available from cran.r-project.org.
Contributor Information
Brian S. Yandell, Email: byandell@wisc.edu.
Nengjun Yi, Email: nyi@ms.soph.uab.edu.
References
- Basten CJ, Weir BS, Zeng ZB. QTL Cartographer, Version 1.16. Department of Statistics, North Carolina State University; Raleigh, NC: 2002.
- Broman KW, Sen Ś, Owens SE, Manichaikul A, Southard-Smith EM, Churchill GA. The X chromosome in quantitative trait locus mapping. Genetics. 2006;00:000–000. doi: 10.1534/genetics.106.061176. dx.doi.org/10.1534/genetics.106.064311.
- Broman KW, Wu H, Sen Ś, Churchill GA. R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003;19:889–890. doi: 10.1093/bioinformatics/btg112. www.rqtl.org.
- Leisch F. Dynamic generation of statistical reports using literate data analysis. In: Härdle W, Rönz B, editors. Compstat 2002 - Proceedings in Computational Statistics; Heidelberg, Germany: Physika Verlag; 2002. pp. 575–580.
- Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121:185–199. doi: 10.1093/genetics/121.1.185.
- R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2006. www.R-project.org.
- Sen Ś, Churchill GA. A statistical framework for quantitative trait mapping. Genetics. 2001;159:371–387. doi: 10.1093/genetics/159.1.371.
- Sugiyama F, Churchill GA, Higgens DC, Johns C, Makaritsis KP, Gavras H, Paigen B. Concordance of murine quantitative trait loci for salt-induced hypertension with rat and human loci. Genomics. 2001;71:70–77. doi: 10.1006/geno.2000.6401.
- Yi N. A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics. 2004;167:967–975. doi: 10.1534/genetics.104.026286.
- Yi N, Xu S, George V, Allison DB. Mapping multiple quantitative trait loci for ordinal traits. Behavior Genetics. 2004;34:3–15. doi: 10.1023/B:BEGE.0000009473.43185.43.
- Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D. Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005;170:1333–1344. doi: 10.1534/genetics.104.040386.