BMC Bioinformatics. 2008 May 29;9:251. doi: 10.1186/1471-2105-9-251

Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

Min Zhang 1, Dabao Zhang 1, Martin T Wells 2
PMCID: PMC2435550  PMID: 18510743

Abstract

Background

Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC).

Results

We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets.

Conclusion

The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.

Background

With the advent of high-throughput biotechnologies to genotype dense molecular markers throughout the genome, statistical methodologies are crucial in understanding the genetic architecture of complex traits and in locating the genes underlying important traits. Since the pioneering statistical work by Lander and Botstein [1], much effort has been devoted to improving the efficiency and accuracy of QTL mapping. Traditional approaches to QTL mapping test each locus on a dense grid along the chromosomes via likelihood ratios of linear regression models (see the reviews by Doerge et al. [2] and Broman and Speed [3]). Wang et al. [4] also proposed a Bayesian shrinkage estimation of QTL parameters that allows the shrinkage factors to vary across effects.

Epistases (that is, interactions between genes) are ubiquitous in biological systems [5] and may even play a more important role than additive effects, as has been shown in human populations [6,7] and other organisms [8-12]. However, even a moderate number of markers implies a large number of pairwise combinations, thus creating statistical issues in QTL mapping. Due to the small sample sizes and the lack of efficient statistical tools, the number of identified genes is limited, although the existence of epistasis has been recognized for nearly a hundred years [13]. To detect epistatic effects, Kao and Zeng [14] proposed modeling epistasis via orthogonal contrast scales using Cockerham's model; Yi and Xu [15] developed a Bayesian method to detect epistasis using a reversible jump Markov chain Monte Carlo (MCMC) algorithm; Yi et al. [16-18] then proposed a Bayesian model selection approach to detect genome-wide epistasis (with the software described in [19]); Bogdan et al. [20] modified the Bayesian information criterion (mBIC) to permit the identification of additive effects as well as pairwise interactions; and Cui and Wu [21] proposed a statistical framework to detect genetic interactions derived from different genomes in self-pollinated plants. Recently, Żak et al. [22] developed a rank-based model selection method and Shi et al. [23] developed a LASSO-type penalized likelihood method to locate interacting QTL, while Bogdan et al. [24] extended mBIC to strongly correlated markers and multiple interval mapping.

Consider Yi as the trait value of strain i = 1, ⋯, n, and let Xij be the genotypic value of marker j = 1, ⋯, pβ within the i-th strain. Here we focus on the populations with binary markers Xij (coded as -0.5 and 0.5), such as doubled-haploid, backcross or recombinant inbred lines. With available markers (either observed or imputed) densely located on chromosomes, we assume the putative QTL co-transmit with some of the markers. Let I{X} denote the set including all pairwise epistases of interest, and define Zij = Xik Xil for the j-th candidate epistasis (k, l) ∈ I{X}, j = 1, ⋯, pγ. We investigate the additive effects of putative QTL and the epistatic interactions between them through the following multiple regression model,

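$$Y_i = \mu + \sum_{j=1}^{p_\beta} \beta_j X_{ij} + \sum_{j=1}^{p_\gamma} \gamma_j Z_{ij} + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma_\varepsilon^2), \qquad (1)$$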

where μ is the overall mean, βj is the additive effect of marker j, γj represents the j-th epistatic effect, and εi is the random error.

QTL mapping with this multiple regression model can be viewed as a model selection procedure [3,25-27]. However, several characteristics of the data complicate the application of classical statistical methodologies. First, a large number of missing molecular marker values, due to failure in genotyping or selective genotyping, is common in practice. When markers are sparse, the missing genotype information between markers must be inferred. Second, the molecular markers in the same linkage group may be highly correlated. Third, the total number of molecular markers and putative epistases, i.e., p = pβ + pγ, is usually much larger than the sample size n. Because of these issues, statistical approaches often sacrifice efficiency and accuracy for ease of development. Characteristics of the "large p small n" data with missing values require further attention via extensions of traditional model selection approaches. We extend the Bayesian classification approach in Zhang et al. [28] to map QTL with epistases. Spike and slab priors have been used by, for example, Mitchell and Beauchamp [29], George and McCulloch [30], and Ishwaran and Rao [31] to develop Bayesian variable selection approaches. The spike and slab priors consist of two components, with one modeling zero coefficients and the other modeling non-zero ones.

Furthermore, the mixing weight plays a crucial role in condensing the searchable parameter space and enforcing a stochastic search within low-dimensional spaces. When only a limited number of covariates are being investigated, a uniform distribution on [0, 1] or even a fixed value (e.g., 0.5) is usually chosen for the mixing weight. However, when n ≪ p, it is unrealistic to expect half of the variables to be selected because the final model may still be unidentifiable. Instead, we expect that, for a successful variable selection, the prior distributions of the mixing weights depend on both n and p.

We investigate the predictability of a model developed for a dataset of sample size n, and tackle the aforementioned issues. We then construct a two-step Bayesian variable selection approach for model (1) in the case that n ≪ (pβ + pγ). In the first step, we employ a restrictive prior for each of the coefficients in model (1) in order to enforce stochastic filtering of the large number of candidate variables. This prior also allows flexibility for possibly different numbers and/or scales of positive and negative coefficients (see [32] for more details on its advantage over symmetric priors). A Gibbs sampling algorithm is developed to compute the posterior distributions and to implement the stochastic search. Only a limited number of variables pass the filter and go through the second step, which repeats the first step but with far fewer candidate variables. The second step is necessary for model (1) when n ≪ (pβ + pγ), as the priors in the first step could potentially be too restrictive. The performance of our approach is evaluated via a simulation study and application to real datasets.

Results and Discussion

Simulation

Simulation studies were performed to evaluate the performance of our method in the case of p ≫ n. We simulated 56 markers across 3 chromosomes, which carried 10, 20, and 26 markers and were 56.7 cM, 133.5 cM and 171.6 cM long, respectively. We specify σε2 = 0.5415, and the locations of 28 markers are chosen based on the Drosophila data [28], which include 221 inbred introgression lines between two closely related species. The other 28 markers are chosen such that neighboring markers are at least 5 cM apart. Table 1 shows the detailed information of the non-zero effects specified in the simulation, including two additive effects and three epistatic effects. To assess whether our method is able to identify different types of epistatic effects, we include all three possible interactions in the simulation: (1) neither of the two markers has additive effects (that is, 2–133.8 and 3–56.7); (2) one of them has additive effects (that is, 1–24.7 and 2–47.8); (3) both have additive effects (that is, 2–47.8 and 3–141.5). All epistatic effects were set at the same size so that differences in effect size would not influence detectability. Due to the intensive computation involved in Gibbs sampling, a total of 100 complete data sets were simulated. Each of the 100 data sets was analyzed using two models, one with both additive and epistatic effects and the other with additive effects only. When mapping QTL with epistases, we have a total of 1596 variables (56 additive-effect loci and 1540 epistases) versus 221 observations in the model.
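The variable counts quoted above follow from simple combinatorics: every unordered pair of markers is a candidate epistasis. The short sketch below reproduces the arithmetic; the function name is illustrative and not part of the authors' software.

```python
from math import comb

def count_candidates(n_markers):
    """Return (additive candidates, pairwise epistasis candidates, total)."""
    n_epistases = comb(n_markers, 2)   # all unordered marker pairs
    return n_markers, n_epistases, n_markers + n_epistases

# Simulation design: 56 markers observed on 221 lines.
additive, epistatic, total = count_candidates(56)
print(additive, epistatic, total, round(total / 221, 2))
```

With 56 markers this gives 1540 pairs and 1596 candidate variables in total, roughly seven candidates per observation.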

Table 1.

Design of the simulation studies.

Type of effect | Marker(s) | Effect size | Heritability
Additive effect | 2–47.8 | 0.8 | 0.0989
Additive effect | 3–141.5 | 1.2 | 0.2225
Interaction effect | (1–24.7, 2–47.8) | 1.7321 | 0.1159
Interaction effect | (2–47.8, 3–141.5) | 1.7321 | 0.1159
Interaction effect | (2–133.8, 3–56.7) | -1.7321 | 0.1159

Each marker is referred to by its chromosome index and its location on the chromosome.

For the model without epistases, both markers can be detected in most of the 100 simulated datasets even when the false discovery rate (FDR) is controlled as low as 0 (via setting the Bayes factor higher than 3.2); see Table 2. When modeling the epistases, all (additive and interaction) effects are still detected in at least 90% of the data sets for all levels of the Bayes factor (BF), though the FDRs are higher. For those data sets with any effect not identified, the immediate neighbors of the corresponding marker locus are mostly detected instead. As expected, it is more difficult to detect epistases than to detect additive effects. Among all epistases, the one between two markers that both have additive effects is the easiest to detect. The true parameter values are included in their 95% credible intervals with the associated posterior probabilities being very close to one (results not shown).

Table 2.

Simulation results on the basis of model (1).

Model | Marker(s) | Mean | SE | BF ≥ 1 | BF ≥ 3.2 | BF ≥ 10 | BF ≥ 100
Without epistases | 2–47.8 | 0.7453 | 0.1340 | 94 (5) | 93 (5) | 92 (5) | 90 (3)
Without epistases | 3–141.5 | 1.1222 | 0.231 | 100 (0) | 100 (0) | 100 (0) | 100 (0)
Without epistases | FDR (additive) | -- | -- | 0.0067 | 0 | 0 | 0
With epistases | 2–47.8 | 0.7610 | 0.1439 | 94 (6) | 93 (6) | 93 (6) | 93 (6)
With epistases | 3–141.5 | 1.1316 | 0.1402 | 100 (0) | 100 (0) | 100 (0) | 100 (0)
With epistases | (1–24.7, 2–47.8) | 1.5607 | 0.3921 | 92 (6) | 91 (7) | 91 (7) | 90 (7)
With epistases | (2–47.8, 3–141.5) | 1.5558 | 0.3054 | 97 (3) | 96 (4) | 96 (4) | 96 (4)
With epistases | (2–133.8, 3–56.7) | -1.6204 | 0.3875 | 92 (8) | 92 (8) | 92 (8) | 90 (10)
With epistases | FDR (additive) | -- | -- | 0.0408 | 0.0333 | 0.0133 | 0.0067
With epistases | FDR (epistatic) | -- | -- | 0.4872 | 0.3251 | 0.2283 | 0.1122

Out of 100 simulated data sets, the total numbers of data sets that correctly identify the true additive and interaction effects (in the brackets, their neighboring ones when the true ones are missed) are counted respectively when thresholding the Bayes factor (BF) at different levels. Also listed are the mean and standard error (SE) of the estimated effect sizes.

Application

We apply the developed method to the simulans backcross II (BS2) data and the mauritiana backcross II (BM2) data [33,34]. An F1 population was first produced by females from an inbred line of D. simulans and males from an inbred line of D. mauritiana. Then the F1 females were backcrossed to the parental line of D. simulans, which was fixed for different alleles at 45 marker loci, to produce a simulans backcross (BS) population. A mauritiana backcross (BM) population was also produced by backcrossing the F1 females to the other parental line. Based on the two different times of crossing, a total of four data sets were obtained, namely, BS1 (n = 186), BS2 (n = 288), BM1 (n = 192), and BM2 (n = 299). The phenotypic value of an individual is a morphometric descriptor of the posterior lobe, obtained by averaging both sides of the first principal component (PC1) of the Fourier coefficients of the posterior lobe. The genotypes of males were determined at each marker locus, and genetic map positions were estimated from gametes produced by the F1 females in this study. For further information about the data, see Liu et al. [33] and Zeng et al. [34].

Applying multiple interval mapping (MIM) [25,35] to the BS2 data, Zeng et al. [34] detected a total of 16 additive effects and no epistatic effect. Pooling all four data sets, Zeng et al. [34] detected three extra additive effects and six epistatic effects. These epistatic effects appeared to be relatively unimportant for PC1 in the interspecific backcross populations, an observation that is difficult to interpret biologically. Of the 19 additive effects, 18 have estimates of the same sign [34]. Zeng et al. [34] explained this interesting phenomenon as the result of unusually strong directional selection, although Tanksley [36] suggested that transgressive segregation usually followed a mixture of plus and minus alleles in each species, as demonstrated by most previous analyses of quantitative traits.

We focused our analysis on the BS2 and BM2 data with the standardized phenotypic values. Of the 19 putative QTL reported by Zeng et al. [34], only nine are at least 1 cM away from the 45 marker loci. Therefore, we analyzed both datasets with these 54 additive effects (nine putative QTL and 45 markers) and all possible pairwise interactions (that is, 1431 putative epistases). When controlling BF ≥ 1, the analysis of the BS2 data reported a total of 25 additive effects (see Table 3), including all nine putative QTL, but no epistatic effect. The analysis of the BM2 data instead reported a total of 20 additive effects (see Table 4), including three of the nine putative QTL, and 18 epistatic effects (see Table 5). On the basis of the simulation study, we may expect less than 0.67% FDR for those 17 and 16 additive effects reported with BF ≥ 100 in analyzing the BS2 and BM2 data respectively. Similarly, three epistatic effects reported in analyzing the BM2 data have BF ≥ 100, less than 12% of which may be false discoveries.

Table 3.

Additive effects with BF ≥ 1 in analyzing the BS2 data.

Marker Coefficient S.D. BF
1–3.6 -0.3797 0.0707 > 1000
1–23.4 -0.3462 0.0426 > 1000
2-0 -0.2284 0.0493 > 1000
2–17.08 -0.1906 0.1055 > 1000
2–27 -0.1262 0.1491 > 1000
2–28.53 -0.1618 0.1387 > 1000
2–69 -0.2969 0.1382 > 1000
2–113.92 -0.0682 0.0487 4.38
2–143 -0.0454 0.0592 1.72
2–145.85 -0.0322 0.0648 1.29
3-0 -0.1726 0.0880 > 1000
3–21.3 -0.3100 0.0569 > 1000
3–43.2 -0.1482 0.1052 > 1000
3–47 -0.1261 0.1571 > 1000
3–49.99 -0.2164 0.0992 > 1000
3–75 -0.4018 0.1072 > 1000
3–94 -0.2147 0.1360 > 1000
3–101.29 -0.0520 0.0904 2.03
3–117 -0.0941 0.0960 29.55
3–126.62 -0.0378 0.0780 1.20
3–134.6 -0.0724 0.1255 4.21
3–139 -0.2624 0.1604 > 1000
3–147.69 -0.0420 0.0833 1.29
3–160 -0.1847 0.1154 > 1000
3–171.22 -0.3295 0.0567 > 1000

The position of each significant additive effect is specified by an index of the corresponding chromosome and its location on this chromosome (cM). The estimated sizes of additive effects and the standard deviations of the Markov chains are also shown in the columns of coefficient and S.D., respectively.

Table 4.

Additive effects with BF ≥ 1 in analyzing the BM2 data.

Marker Coefficient S.D. BF
1-0 -0.2181 0.1426 > 1000
1–3.6 -0.1438 0.1506 920.66
1–23.4 -0.1909 0.0654 > 1000
2–6.98 -0.2393 0.0809 > 1000
2–27 -0.3361 0.0855 > 1000
2–67.96 -0.0561 0.1093 1.29
2–69 -0.1473 0.1146 > 1000
2–113.92 -0.2496 0.0509 > 1000
2–145.85 -0.1145 0.0856 79.06
3–4.99 -0.1973 0.0954 > 1000
3–14.33 -0.2855 0.0928 > 1000
3–28.74 -0.1754 0.0934 > 1000
3–43.2 -0.0586 0.1077 1.80
3–47 -0.2213 0.1648 > 1000
3–49.99 -0.1749 0.1355 > 1000
3–83.15 -0.5978 0.0781 > 1000
3–126.62 -0.1970 0.1066 > 1000
3–147.69 -0.0698 0.0826 3.05
3–161.43 -0.1982 0.0950 > 1000
3–171.22 -0.2385 0.1028 > 1000

The position of each significant additive effect is specified by an index of the corresponding chromosome and its location on this chromosome (cM). The estimated sizes of additive effects and the standard deviations of the Markov chains are also shown in the columns of coefficient and S.D., respectively.

Table 5.

Epistatic effects with BF ≥ 1 in analyzing the BM2 data.

Markers Coefficient S.D. BF
(1–3.6, 3–14.34) 0.0601 0.1077 1.36
(1–3.6, 3–101.29) 0.0383 0.0994 1.05
(1–14.2, 2–28.53) 0.0156 0.0792 1.07
(1–14.2, 3–134.6) 0.2231 0.1552 116.30
(1–14.2, 3–139) 0.1689 0.1453 13.40
(2–17.08, 3–157.73) 0.3304 0.0806 > 1000
(2–28.53, 3–101.29) 0.2688 0.1063 > 1000
(2–34.72, 3–76.3) 0.1307 0.0960 5.34
(2–113.92, 3–83.15) 0.0779 0.0911 1.89
(2–138.82, 3–147.69) 0.1678 0.0943 12.23
(2–143, 3–101.29) 0.0463 0.0972 1.25
(2–145.85, 3–28.74) 0.0896 0.0909 2.61
(2–145.85, 3–43.2) 0.0330 0.0980 1.07
(2–145.85, 3–101.29) 0.0419 0.0921 1.29
(3–21.3, 3–76.3) 0.0797 0.0856 1.97
(3–28.74, 3–53.54) 0.0487 0.0999 1.19
(3–43.2, 3–123.32) 0.0400 0.1014 1.04
(3–53.54, 3–123.33) 0.1925 0.1226 43.36

The QTL positions of each significant epistatic effect are specified by the indices of the corresponding chromosomes and the locations on the chromosomes (cM). The estimated sizes of the epistatic effects and the standard deviations of the Markov chains are also shown in the columns of coefficient and S.D., respectively.

Interestingly, the 25 additive effects detected from the BS2 data include all those detected by Zeng et al. [34] except 2–135, 3–5 and 3–83 (we consider markers within 1 cM of each other to be the same), but the 20 additive effects detected from the BM2 data include only nine of those detected by Zeng et al. [34]. On the other hand, nine additive effects (i.e., 2–28.53, 2–145.85, 3-0, 3–43.2, 3–49.99, 3–101.29, 3–126.62, 3–134.6, 3–147.69) from the BS2 data are not reported by Zeng et al. [34], and eleven additive effects from the BM2 data (i.e., 1-0, 2–6.98, 2–67.96, 2–145.85, 3–14.33, 3–28.74, 3–43.2, 3–49.99, 3–126.62, 3–147.69, 3–161.43) are not reported by Zeng et al. [34]. Note that almost every additive effect uniquely detected by Zeng et al. [34] has a neighboring one (within 10 cM) in our lists, except 2–135 and 3–94 for the BM2 dataset, and almost every additive effect unique to our lists has a neighboring one (within 10 cM) detected by Zeng et al. [34]. Per the discussion on the precision of QTL location by Bogdan and Doerge [37] and Bogdan et al. [24], these effects of close neighbors may be due to identical QTL. Our analysis reported R2 = 0.934 and R2 = 0.902 for the BS2 and BM2 data respectively.

Conclusion

This article extends the Bayesian framework in Zhang et al. [28] to identify both additive and epistatic effects of QTL based on model (1). The advantage of this approach lies mainly in the flexible priors for the regression coefficients, which account for some characteristics of "large p small n" data, in the predictability of a model constructed with size n data, and in the two-step strategy for dimension reduction. A Gibbs sampler is developed to draw Markov chain samples from the posterior distributions, which can be considered as a stochastic search for an optimal model. Unlike model selection based on information criteria, which requires calculation of the effective sample size for incomplete data, missing values can be naturally imputed within the Gibbs sampling scheme. The corresponding algorithm has been implemented in Matlab and is available as QTLBayes at http://www.stat.purdue.edu/~zhangdb/QTLBayes/.

Bayesian variable selections can be viewed as penalized likelihood approaches, which have been studied recently [38,39]. With "large p small n" data, it is not clear how to set up the penalty properly such that it will neither overpenalize nor underpenalize the likelihood. An overpenalized likelihood will lose some significant variables of particular interest, while an underpenalized likelihood will introduce false positives. The predictability of size n data sheds light on the choice of this penalty. Since a size n data set will allow us to understand the variation of the trait explained by only pn = O(√n) QTL with accuracy O(n^(-1/2)), selecting too many variables into the model will ruin this practice of QTL mapping. As shown by Bogdan and Doerge [37], severely biased estimates can result from a large genome and/or a large number of markers in QTL mapping. We propose a Bayesian framework to resolve the bias problem. We have illustrated our approach by application to the BS2 and BM2 data [33,34], both of which have 45 markers observed across three chromosomes. The disadvantage of this approach is the heavy computation required by the Markov chain Monte Carlo algorithm. For example, the analysis of a dataset with more than 200 markers from 1,000 subjects takes almost 24 hours using one Intel® Xeon™ CPU at 2.80 GHz.

Coding binary markers with -0.5 and 0.5 has been commonly utilized in QTL mapping as it does not introduce correlation between additive effects and interactive effects, and this lack of correlation benefits the identification of additive effects. On the other hand, coding binary markers with 0 and 1 introduces correlation and thus is not preferred for QTL mapping with epistases [40,41]. Although developed for QTL mapping, this approach is completely general and can be applied to other settings with "large p small n" data, such as associating genomic features with clinical outcomes or phenotypes of biological interest. Unlike QTL mapping data, whose missing structure is known from the linkage information, genomic data from imaging and microarrays may require more information to impute missing values because the missing mechanism is unknown. Even though missing values are usually imputed with a nearest-neighbor approach [42], Gibbs samplers allow natural multiple imputation under the assumption of missing at random (MAR, see Little and Rubin [43]).
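The effect of the two coding schemes can be checked with a small exact calculation. The sketch below assumes two independent markers whose two genotypes are equally likely (an idealization chosen for illustration, not a claim about the real data) and computes the correlation between an additive term X_k and the epistatic term Z = X_k X_l. The function name is hypothetical.

```python
from itertools import product
from math import sqrt

def corr_additive_vs_epistatic(low, high):
    """Exact correlation between X_k and Z = X_k * X_l when X_k and X_l are
    independent and each takes `low` or `high` with probability 1/2."""
    combos = list(product((low, high), repeat=2))  # four equally likely genotype pairs
    xs = [xk for xk, _ in combos]
    zs = [xk * xl for xk, xl in combos]

    def mean(values):
        return sum(values) / len(values)

    mx, mz = mean(xs), mean(zs)
    cov = mean([(x - mx) * (z - mz) for x, z in zip(xs, zs)])
    var_x = mean([(x - mx) ** 2 for x in xs])
    var_z = mean([(z - mz) ** 2 for z in zs])
    return cov / sqrt(var_x * var_z)

print(corr_additive_vs_epistatic(-0.5, 0.5))  # centered coding
print(corr_additive_vs_epistatic(0.0, 1.0))   # 0/1 coding
```

Under these assumptions the centered coding gives a correlation of zero, while the 0/1 coding gives 1/√3 (about 0.577), which illustrates why the centered coding is preferred when epistases are modeled.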

Methods

Predictability and Sample Size

Suppose, for a sample of size n, we select up to pn (assuming pn <n) significant variables into the following regression model,

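$$Y_n = X_n \beta + \varepsilon_n, \qquad \varepsilon_n \sim N(0, \sigma_\varepsilon^2 I_n),$$

where Yn is an n-dimensional column vector; Xn is an n × pn design matrix such that $X_n^T X_n = n I_{p_n}$. The best linear unbiased estimator (BLUE) of β is

$$\hat\beta_n = \beta + \frac{1}{n} X_n^T \varepsilon_n.$$

Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_{p_n})$ include pn predictors for $\tilde y$ such that $\max_{1 \le j \le p_n} |\tilde x_j| = O(1)$. Since $\mathrm{trace}\{\mathrm{Var}(\hat\beta_n)\} = \frac{p_n}{n}\sigma_\varepsilon^2$, $\tilde x \beta$ can be consistently estimated by $\tilde x \hat\beta_n$. When using $\tilde x \hat\beta_n$ to predict $\tilde y$, the mean squared prediction error is

$$E\left[(\tilde y - \tilde x \hat\beta_n)^2\right] = \sigma_\varepsilon^2 + \frac{p_n}{n} \cdot \frac{\tilde x \tilde x^T}{p_n}\, \sigma_\varepsilon^2.$$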

If pn = o(n), the mean squared prediction error asymptotically achieves the minimum variance, and thus the prediction is asymptotically efficient.

This illustration implies that, with a sample of size n and pn = O(√n) predictors, the mean squared prediction error can reach the minimum prediction error at rate O(n^(-1/2)). Even if all pn significant variables could be perfectly selected out of p candidates, we still need pn = o(n) in order to have a chance to correctly understand the variation of the dependent variable explained by the selected predictors. Therefore, we always assume that there are at most pn = O(√n) significant variables among a total of p candidates in the case of p ≫ n. Indeed, the study of consistency in a triangular array setting for regression problems was conducted by Huber [44-46]. In examining the underlying theory of 'model-selection' and 'variable-selection' procedures that choose pn explanatory variables from an initial set of variables, Greenshtein and Ritov [46] proved that one may expect consistency for the choice of pn with an order between o(√(n/log(n))) and o(n/log(n)). Our choice of pn = O(√n) satisfies the Greenshtein and Ritov [46] conditions for consistency.
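The rate statement can be made concrete with the closed-form prediction error. The sketch below evaluates the mean squared prediction error under the orthogonal design assumed in this section, taking every |x̃_j| equal to a constant and letting pn grow like √n. That growth rate is the reading assumed here, chosen because it matches the stated O(n^(-1/2)) rate; the function name and the specific values of n are illustrative.

```python
from math import isqrt

def mean_squared_prediction_error(n, p_n, sigma2=1.0, x_abs=1.0):
    """sigma2 + (p_n / n) * (x x^T / p_n) * sigma2 for a predictor vector
    whose p_n entries all have absolute value x_abs (so x x^T = p_n * x_abs**2)."""
    x_xt = p_n * x_abs ** 2
    return sigma2 + (p_n / n) * (x_xt / p_n) * sigma2

for n in (100, 10_000, 1_000_000):
    p_n = isqrt(n)  # assumed growth: p_n = sqrt(n)
    excess = mean_squared_prediction_error(n, p_n) - 1.0
    print(n, p_n, excess)  # excess over sigma2 equals p_n / n, i.e. n ** -0.5 here
```

The excess over the irreducible variance shrinks by a factor of ten each time n grows by a factor of one hundred, which is the n^(-1/2) behavior described above.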

Bayesian Variable Selection

Here we propose a two-step Bayesian variable selection approach to map QTL with epistases through model (1). With the following Bayesian framework, we first select c√n out of pβ additive effects and c√n out of pγ epistatic effects (e.g., we use c = 2), respectively, using a restrictive prior for each coefficient. We then apply the same Bayesian framework to select stochastically among the filtered variables, using a non-restrictive prior for each coefficient. Gibbs sampling algorithms are developed to stochastically search low-dimensional subspaces, as implied by the predictability of a size n data set.

Prior Specification

For a two-state marker system, both additive effects βj, j = 1, ⋯, pβ, and epistatic effects γj, j = 1, ⋯, pγ, are the primary focus of QTL mapping. As is often the case p = (pβ +pγ) ≫ n, many of these coefficients are zero, either because the variation of the trait can be explained by only a few QTL or because the limited sample size precludes selecting too many variables (otherwise the constructed model is not reliable as shown in the previous section). It is also possible that the number and/or scale of the positive coefficients may be different from those of the negative ones. To account for these properties, a three-component mixture prior is specified for each coefficient βj or γj. More specifically,

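$$\begin{cases}
\beta_j \overset{iid}{\sim} (1 - w_{\beta+} - w_{\beta-})\,\delta(\cdot) + w_{\beta+} N_+(\mu_{\beta+}, \sigma_{\beta+}^2) + w_{\beta-} N_-(\mu_{\beta-}, \sigma_{\beta-}^2), \\
\gamma_j \overset{iid}{\sim} (1 - w_{\gamma+} - w_{\gamma-})\,\delta(\cdot) + w_{\gamma+} N_+(\mu_{\gamma+}, \sigma_{\gamma+}^2) + w_{\gamma-} N_-(\mu_{\gamma-}, \sigma_{\gamma-}^2),
\end{cases} \qquad (2)$$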

where δ (·) is a Dirac function with mass one at zero; N+(μ, σ2) and N-(μ, σ2) denote the normal distribution N(μ, σ2) truncated to the positive and negative half-lines, respectively. Therefore, wβ+ (or wβ-) is the probability for any single marker to have a positive (or negative) additive effect, and wγ+ (or wγ-) is the probability for any pair of markers in I{X} to have a positive (or negative) interactive effect on the trait.
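To make the three-component prior concrete, the following sketch draws coefficients from a point mass at zero mixed with positively and negatively truncated normals. It is a minimal illustration using only the Python standard library; the function names, the simple rejection step for the truncated normals, and the numerical settings are assumptions made here, not taken from the authors' Matlab implementation.

```python
import random

def draw_truncated_normal(mu, sd, sign, rng):
    """Draw from N(mu, sd^2) restricted to the positive (sign=+1) or
    negative (sign=-1) half-line by simple rejection."""
    while True:
        x = rng.gauss(mu, sd)
        if x * sign > 0:
            return x

def draw_coefficient(w_pos, w_neg, mu_pos, sd_pos, mu_neg, sd_neg, rng):
    """One draw from (1 - w_pos - w_neg) * delta_0
    + w_pos * N_+(mu_pos, sd_pos^2) + w_neg * N_-(mu_neg, sd_neg^2)."""
    u = rng.random()
    if u < w_pos:
        return draw_truncated_normal(mu_pos, sd_pos, +1, rng)
    if u < w_pos + w_neg:
        return draw_truncated_normal(mu_neg, sd_neg, -1, rng)
    return 0.0  # point mass at zero

rng = random.Random(0)
# Illustrative, asymmetric settings: positive effects more frequent than negative ones.
sample = [draw_coefficient(0.07, 0.03, 1.0, 1.0, -1.0, 1.0, rng) for _ in range(10)]
print(sample)
```

Most draws are exactly zero, and the separate weights and scales for the two signs are what allow the prior to be asymmetric.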

The hyperparameters, $\sigma_{\beta+}^2$, $\sigma_{\beta-}^2$, $\sigma_{\gamma+}^2$ and $\sigma_{\gamma-}^2$, are assumed to have priors as inverse gamma distributions, that is, IG(θβ+, φβ+), IG(θβ-, φβ-), IG(θγ+, φγ+), and IG(θγ-, φγ-), respectively (e.g., setting θβ+ = θβ- = θγ+ = θγ- = 0.1 and φβ+ = φβ- = φγ+ = φγ- = 10). As a result, the prior on β (and γ) is essentially a mixture of a point mass at zero and some truncated t-distributions, which shrinks the smaller effects towards zero and allows sufficient flexibility for non-zero effects. Furthermore, t-type prior distributions yield Bayes rules with desirable decision-theoretic frequentist properties [47]. The hyperparameters, μβ+, μγ+, μβ- and μγ-, are assumed to have diffuse priors, and the prior distribution for $\sigma_\varepsilon^2$ is proportional to $1/\sigma_\varepsilon^2$.

As suggested by the predictability of a size n data set, we expect to select at most pn = O(√n) out of the p variables for the final model. Therefore, we specify the priors for (wβ+, wβ-) and (wγ+, wγ-) as

$$w_{\beta+} + w_{\beta-} \sim U(0, c\sqrt{n}/p_\beta), \qquad w_{\gamma+} + w_{\gamma-} \sim U(0, c\sqrt{n}/p_\gamma), \qquad (3)$$

that is, expecting at most c√n significant additive effects and epistatic effects, respectively. Gaffney [48] and Yi et al. [17], among others, employed similar ideas to rescale the priors based on the number of possible effects. Clearly, when n ≪ (pβ + pγ), either c√n/pβ or c√n/pγ is very small, which implies a restrictive prior on each corresponding coefficient. Therefore, we usually select c√n additive effects and c√n epistatic effects during the first run of Bayesian analysis. We then apply the same Bayesian analysis to these pre-selected variables. The second run of Bayesian analysis has both wβ+ + wβ- and wγ+ + wγ-, a priori, uniformly distributed on [0, 1].
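The size of these prior bounds is easy to tabulate. The sketch below assumes the bound has the form c√n/p with c = 2 (the square root is the reading adopted here, in line with the O(n^(-1/2)) rate discussed earlier), caps it at 1 so that it remains a valid probability, and plugs in the dimensions of the simulation study. The function name and the cap are illustrative choices.

```python
from math import sqrt

def weight_prior_upper_bound(n, p, c=2.0):
    """Upper end of the uniform prior on w_plus + w_minus, assumed to be
    c * sqrt(n) / p and capped at 1."""
    return min(1.0, c * sqrt(n) / p)

n = 221                      # lines in the simulated data
p_beta, p_gamma = 56, 1540   # additive and epistatic candidates

print(weight_prior_upper_bound(n, p_beta))    # first step, additive effects
print(weight_prior_upper_bound(n, p_gamma))   # first step, epistatic effects
print(weight_prior_upper_bound(n, 2.0 * sqrt(n)))  # second step: p equals c * sqrt(n)
```

For the epistatic candidates the bound is below 0.02, which is what makes the first-step prior restrictive; once only c√n candidates remain, the bound becomes 1 and the prior reduces to the uniform distribution on [0, 1] used in the second step.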

Likelihood

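Let Yn be the column vector including the trait values of all strains under investigation, let Xi be the vector of all marker values of the i-th strain with $X_n = (X_1^T, \ldots, X_n^T)^T$, and let Zi be the vector of all epistatic candidate values of the i-th strain. Denote the marginal distribution of A as [A], and the conditional distribution of A given B as [A|B]. With data (Yn, Xn) and the prior specification above, we have the likelihood function, that is, the joint distribution function of the data (Yn, Xn), the parameters (μ, β, γ), $\sigma_\varepsilon^2$, and all hyperparameters $(w_{\beta+}, w_{\beta-}, w_{\gamma+}, w_{\gamma-}, \mu_{\beta+}, \mu_{\gamma+}, \sigma_{\beta+}^2, \sigma_{\gamma+}^2, \mu_{\beta-}, \mu_{\gamma-}, \sigma_{\beta-}^2, \sigma_{\gamma-}^2)$,

$$\begin{aligned}
L \propto{} & [Y_n \mid X_n, \mu, \beta, \gamma, \sigma_\varepsilon^2] \times [\mu] \times [\mu_{\beta+}] \times [\sigma_{\beta+}^2] \times [\mu_{\beta-}] \times [\sigma_{\beta-}^2] \times [w_{\beta+}, w_{\beta-}] \\
& \times [\beta \mid w_{\beta+}, w_{\beta-}, \mu_{\beta+}, \sigma_{\beta+}^2, \mu_{\beta-}, \sigma_{\beta-}^2] \times [\mu_{\gamma+}] \times [\sigma_{\gamma+}^2] \times [\mu_{\gamma-}] \times [\sigma_{\gamma-}^2] \times [w_{\gamma+}, w_{\gamma-}] \\
& \times [\gamma \mid w_{\gamma+}, w_{\gamma-}, \mu_{\gamma+}, \sigma_{\gamma+}^2, \mu_{\gamma-}, \sigma_{\gamma-}^2] \times [\sigma_\varepsilon^2] \times [X_n] \\
\propto{} & \sigma_\varepsilon^{-n-2} \exp\left\{ -\frac{\sum_{i=1}^{n} (Y_i - \mu - X_i\beta - Z_i\gamma)^2}{2\sigma_\varepsilon^2} \right\} \\
& \times \exp\left( -\frac{1}{\sigma_{\beta+}^2 \varphi_{\beta+}} - \frac{1}{\sigma_{\beta-}^2 \varphi_{\beta-}} \right) \times (\sigma_{\beta+}^2)^{-\theta_{\beta+}-1} \times (\sigma_{\beta-}^2)^{-\theta_{\beta-}-1} \\
& \times \exp\left( -\frac{1}{\sigma_{\gamma+}^2 \varphi_{\gamma+}} - \frac{1}{\sigma_{\gamma-}^2 \varphi_{\gamma-}} \right) \times (\sigma_{\gamma+}^2)^{-\theta_{\gamma+}-1} \times (\sigma_{\gamma-}^2)^{-\theta_{\gamma-}-1} \\
& \times [\beta \mid w_{\beta+}, w_{\beta-}, \mu_{\beta+}, \sigma_{\beta+}^2, \mu_{\beta-}, \sigma_{\beta-}^2] \times I\left[w_{\beta+} + w_{\beta-} \le \frac{c\sqrt{n}}{p_\beta}\right] \\
& \times [\gamma \mid w_{\gamma+}, w_{\gamma-}, \mu_{\gamma+}, \sigma_{\gamma+}^2, \mu_{\gamma-}, \sigma_{\gamma-}^2] \times I\left[w_{\gamma+} + w_{\gamma-} \le \frac{c\sqrt{n}}{p_\gamma}\right] \times [X_n].
\end{aligned} \qquad (4)$$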

The distribution of Xn can be specified based on the available linkage map information [2]. The conditional distribution $[\beta \mid w_{\beta+}, w_{\beta-}, \mu_{\beta+}, \sigma_{\beta+}^2, \mu_{\beta-}, \sigma_{\beta-}^2]$ is a product of the prior distribution for each βj. Similarly, the conditional distribution $[\gamma \mid w_{\gamma+}, w_{\gamma-}, \mu_{\gamma+}, \sigma_{\gamma+}^2, \mu_{\gamma-}, \sigma_{\gamma-}^2]$ is a product of the prior distribution for each γj. The priors of the hyperparameters, θβ+, θγ+, φβ+, φγ+, θβ-, θγ-, φβ- and φγ-, are specified to be as noninformative as possible.

Gibbs Sampling

Since the specified priors are conditionally conjugate, Bayesian variable selection can be implemented with a Gibbs sampling algorithm. We initialize the algorithm by imputing missing genotypic values based on the observed genotypes and linkage information. The initial value of μ is set as the mean of the observed trait values. Then, with individuals having fully observed trait values, each component of β and γ is initially estimated using recursive univariate regression. Other parameters, $w_{\beta+}$, $w_{\beta-}$, $\mu_{\beta+}$, $\sigma_{\beta+}^2$, $\mu_{\beta-}$ and $\sigma_{\beta-}^2$, are simply initialized based on the initial value of β, and similarly, the initial values for $w_{\gamma+}$, $w_{\gamma-}$, $\mu_{\gamma+}$, $\sigma_{\gamma+}^2$, $\mu_{\gamma-}$ and $\sigma_{\gamma-}^2$ can be specified using the information from γ. For example, we can initialize $\sigma_{\beta+}^2 = \sigma_{\beta-}^2$ with an estimate from the initial value of β, and then use max{#{j : βj > 2σβ+}, 1}/pβ to initialize wβ+.

Let Xi,-j be Xi excluding the j-th component, and define β-j and γ-j similarly. Based on the likelihood function in (4), the Gibbs sampler can be developed by recursively drawing the missing genotypic values, the missing trait values, and the model parameters from their full conditional posterior distributions as follows.

Sample missing values: Sample each missing genotypic value Xij from its full conditional posterior distribution,

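$$[X_{ij} \mid Y_i, X_{i,-j}, \mu, \beta, \gamma, \sigma_\varepsilon^2] \propto [Y_i \mid X_{i,-j}, X_{ij}, \mu, \beta, \gamma, \sigma_\varepsilon^2] \times [X_{ij} \mid X_{i,j-1}, X_{i,j+1}],$$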

and then sample each missing trait value Yi from its full conditional posterior distribution [Yi|Xi, μ, β, γ, σε2].
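For a binary marker, the conditional distribution above has only two support points, so the draw reduces to a weighted coin flip. The sketch below assumes the linkage-based term [Xij | Xi,j-1, Xi,j+1] is supplied as a probability computed elsewhere from the map, and that the fitted trait means under each genotype (including any epistatic terms that involve marker j) have already been evaluated. All names are illustrative.

```python
import math
import random

def prob_genotype_high(y, mean_low, mean_high, sigma2, prior_high):
    """P(X_ij = +0.5 | Y_i, other quantities): normal likelihood of Y_i under each
    genotype, weighted by the linkage-based probability prior_high."""
    log_lik_high = -((y - mean_high) ** 2) / (2.0 * sigma2)
    log_lik_low = -((y - mean_low) ** 2) / (2.0 * sigma2)
    shift = max(log_lik_high, log_lik_low)  # guards against underflow in exp
    a = prior_high * math.exp(log_lik_high - shift)
    b = (1.0 - prior_high) * math.exp(log_lik_low - shift)
    return a / (a + b)

def draw_genotype(y, mean_low, mean_high, sigma2, prior_high, rng=random):
    """Draw the missing genotype, coded -0.5 or +0.5."""
    p_high = prob_genotype_high(y, mean_low, mean_high, sigma2, prior_high)
    return 0.5 if rng.random() < p_high else -0.5

# A trait value close to the "+0.5" mean pulls the draw towards +0.5.
print(prob_genotype_high(y=1.0, mean_low=-0.4, mean_high=0.8, sigma2=0.5, prior_high=0.5))
```

When the marker has no effect on the fitted mean, the two likelihood terms cancel and the draw depends on the flanking markers alone.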

Sample μ: Sample μ from its full conditional posterior distribution,

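$$\mu \mid Y_n, X_n, \beta, \gamma, \sigma_\varepsilon^2 \sim N\left( \frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i\beta - Z_i\gamma),\ \frac{\sigma_\varepsilon^2}{n} \right).$$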
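As a minimal sketch of this step, the draw for μ only needs the partial residuals Yi − Xiβ − Ziγ and the current error variance. The function below uses the Python standard library; its name and the toy inputs are illustrative.

```python
import random
from math import sqrt

def draw_mu(partial_residuals, sigma2_eps, rng=random):
    """Draw mu from N(mean of (Y_i - X_i beta - Z_i gamma), sigma2_eps / n)."""
    n = len(partial_residuals)
    center = sum(partial_residuals) / n
    return rng.gauss(center, sqrt(sigma2_eps / n))

rng = random.Random(1)
print(draw_mu([1.0, 2.0, 3.0], sigma2_eps=0.03, rng=rng))
```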

Sample β and γ: Sample each βj and γj from their full conditional posterior distributions,

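$$\begin{aligned}
{}[\beta_j \mid Y_n, X_n, \mu, \beta_{-j}, \gamma, w_{\beta+}, w_{\beta-}, \sigma_\varepsilon^2, \sigma_{\beta+}^2, \sigma_{\beta-}^2] &\sim (1 - \tilde w_{\beta j+} - \tilde w_{\beta j-})\,\delta(\beta_j) + \tilde w_{\beta j+} N_+(\tilde\mu_{\beta j+}, \tilde\sigma_{\beta j+}^2) + \tilde w_{\beta j-} N_-(\tilde\mu_{\beta j-}, \tilde\sigma_{\beta j-}^2), \\
[\gamma_j \mid Y_n, X_n, \mu, \beta, \gamma_{-j}, w_{\gamma+}, w_{\gamma-}, \sigma_\varepsilon^2, \sigma_{\gamma+}^2, \sigma_{\gamma-}^2] &\sim (1 - \tilde w_{\gamma j+} - \tilde w_{\gamma j-})\,\delta(\gamma_j) + \tilde w_{\gamma j+} N_+(\tilde\mu_{\gamma j+}, \tilde\sigma_{\gamma j+}^2) + \tilde w_{\gamma j-} N_-(\tilde\mu_{\gamma j-}, \tilde\sigma_{\gamma j-}^2),
\end{aligned}$$

where $\tilde w_{\beta j+}$, $\tilde w_{\beta j-}$, $\tilde\mu_{\beta j+}$, $\tilde\sigma_{\beta j+}^2$, $\tilde\mu_{\beta j-}$ and $\tilde\sigma_{\beta j-}^2$ are specified in the APPENDIX. In addition, $\tilde w_{\gamma j+}$, $\tilde w_{\gamma j-}$, $\tilde\mu_{\gamma j+}$, $\tilde\sigma_{\gamma j+}^2$, $\tilde\mu_{\gamma j-}$ and $\tilde\sigma_{\gamma j-}^2$ can be obtained similarly.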

Sample wβ+, wγ+, wβ-, and wγ-: These parameters can be sampled from the conditional posterior distributions,

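$$\begin{aligned}
(w_{\beta+}, w_{\beta-}, 1 - w_{\beta+} - w_{\beta-}) \mid \beta &\sim \mathrm{Dirichlet}(\tilde p_{\beta+} + 1,\ \tilde p_{\beta-} + 1,\ p_\beta - \tilde p_{\beta+} - \tilde p_{\beta-} + 1), \qquad w_{\beta+} + w_{\beta-} \le \frac{c\sqrt{n}}{p_\beta}, \\
(w_{\gamma+}, w_{\gamma-}, 1 - w_{\gamma+} - w_{\gamma-}) \mid \gamma &\sim \mathrm{Dirichlet}(\tilde p_{\gamma+} + 1,\ \tilde p_{\gamma-} + 1,\ p_\gamma - \tilde p_{\gamma+} - \tilde p_{\gamma-} + 1), \qquad w_{\gamma+} + w_{\gamma-} \le \frac{c\sqrt{n}}{p_\gamma},
\end{aligned}$$

where $\tilde p_{\beta+}$ = #{βj > 0 : 1 ≤ j ≤ pβ} and $\tilde p_{\beta-}$ = #{βj < 0 : 1 ≤ j ≤ pβ}; $\tilde p_{\gamma+}$ = #{γj > 0 : 1 ≤ j ≤ pγ} and $\tilde p_{\gamma-}$ = #{γj < 0 : 1 ≤ j ≤ pγ}.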
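One simple way to draw from a Dirichlet distribution restricted by such an upper bound is rejection: draw from the unrestricted Dirichlet (via normalized gamma variates) and keep the first draw that satisfies the constraint. This is a stand-in technique chosen for illustration and not necessarily how the authors' software handles the truncation; the function name and the retry limit are assumptions.

```python
import random

def draw_weights(n_pos, n_neg, p, upper, rng=random, max_tries=100_000):
    """Draw (w_plus, w_minus) from Dirichlet(n_pos + 1, n_neg + 1, p - n_pos - n_neg + 1)
    restricted to w_plus + w_minus <= upper, by rejection sampling."""
    alphas = (n_pos + 1, n_neg + 1, p - n_pos - n_neg + 1)
    for _ in range(max_tries):
        gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
        total = sum(gammas)
        w_pos, w_neg = gammas[0] / total, gammas[1] / total
        if w_pos + w_neg <= upper:
            return w_pos, w_neg
    raise RuntimeError("no draw satisfied the constraint; the bound may be too tight")

# Example: 3 positive and 1 negative epistatic effects currently in the model,
# out of 1540 candidates, with an illustrative upper bound of 0.0193.
print(draw_weights(3, 1, 1540, 0.0193, random.Random(0)))
```

Rejection is efficient when the current number of non-zero coefficients is well below the bound times p; otherwise a dedicated truncated sampler would be preferable.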

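Sample $\sigma_\varepsilon^2$, $\sigma_{\beta+}^2$, $\sigma_{\gamma+}^2$, $\sigma_{\beta-}^2$, and $\sigma_{\gamma-}^2$: With conditionally conjugate priors, the posteriors of all variance parameters are still inverse gamma distributions. Specifically,

$$\begin{aligned}
\sigma_\varepsilon^2 \mid Y_n, X_n, \mu, \beta, \gamma &\sim IG\left( \frac{n}{2},\ \frac{2}{\sum_{i=1}^{n} (Y_i - \mu - X_i\beta - Z_i\gamma)^2} \right), \\
\sigma_{\beta+}^2 \mid \beta &\sim IG\left( \theta_{\beta+} + \frac{\tilde p_{\beta+}}{2},\ \frac{2}{\frac{2}{\varphi_{\beta+}} + \sum_{j=1}^{p_\beta} \beta_j^2 I[\beta_j > 0]} \right), \\
\sigma_{\gamma+}^2 \mid \gamma &\sim IG\left( \theta_{\gamma+} + \frac{\tilde p_{\gamma+}}{2},\ \frac{2}{\frac{2}{\varphi_{\gamma+}} + \sum_{j=1}^{p_\gamma} \gamma_j^2 I[\gamma_j > 0]} \right), \\
\sigma_{\beta-}^2 \mid \beta &\sim IG\left( \theta_{\beta-} + \frac{\tilde p_{\beta-}}{2},\ \frac{2}{\frac{2}{\varphi_{\beta-}} + \sum_{j=1}^{p_\beta} \beta_j^2 I[\beta_j < 0]} \right), \\
\sigma_{\gamma-}^2 \mid \gamma &\sim IG\left( \theta_{\gamma-} + \frac{\tilde p_{\gamma-}}{2},\ \frac{2}{\frac{2}{\varphi_{\gamma-}} + \sum_{j=1}^{p_\gamma} \gamma_j^2 I[\gamma_j < 0]} \right).
\end{aligned}$$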
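The error-variance update can be sketched as follows. The code assumes that IG(θ, φ) denotes the distribution of x when 1/x follows a gamma distribution with shape θ and scale φ; this parameterization is inferred here from the form of the conditional posteriors above and should be checked against the authors' implementation. Function names are illustrative.

```python
import random

def draw_inverse_gamma(theta, phi, rng=random):
    """Draw x such that 1/x ~ Gamma(shape=theta, scale=phi) (assumed IG(theta, phi))."""
    return 1.0 / rng.gammavariate(theta, phi)

def draw_sigma2_eps(residuals, rng=random):
    """Draw the error variance from IG(n/2, 2 / sum of squared residuals), where
    residuals[i] = Y_i - mu - X_i beta - Z_i gamma."""
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    return draw_inverse_gamma(n / 2.0, 2.0 / sum_sq, rng)

rng = random.Random(0)
print(draw_sigma2_eps([1.0, -1.0] * 100, rng))  # residual variance near 1
```

The same helper can serve the four coefficient-variance updates by passing the corresponding shape and scale values.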

Bayesian Inference

For each variable in model (1), one pair of parameters is used to select the corresponding variable. For the j-th additive effect, they are the posterior probabilities $w_{\beta_j+} = P(\beta_j > 0 \mid Y_n, X_n)$ and $w_{\beta_j-} = P(\beta_j < 0 \mid Y_n, X_n)$. With the full conditional posterior distribution of $\beta_j$ and the notation in the APPENDIX, we have

$$w_{\beta_j+} = E[\tilde{w}_{\beta_j+} \mid Y_n, X_n], \qquad w_{\beta_j-} = E[\tilde{w}_{\beta_j-} \mid Y_n, X_n].$$

Therefore, the two parameters $w_{\beta_j+}$ and $w_{\beta_j-}$ can be estimated with the Markov chains of $\tilde{w}_{\beta_j+}$ and $\tilde{w}_{\beta_j-}$ drawn from the above Gibbs sampler. The median of the posterior distribution of $\beta_j$ is zero if and only if both $w_{\beta_j+}$ and $w_{\beta_j-}$ are less than 0.5. Similarly, the posterior probabilities $w_{\gamma_j+} = P(\gamma_j > 0 \mid Y_n, X_n)$ and $w_{\gamma_j-} = P(\gamma_j < 0 \mid Y_n, X_n)$ can be estimated with the Markov chains of $\tilde{w}_{\gamma_j+}$ and $\tilde{w}_{\gamma_j-}$ drawn from the above Gibbs sampler.

We propose to select variables twice under the above Bayesian framework for model (1). At the first step, we use a restrictive prior for each coefficient to ensure an identifiable Bayesian model and to force the sampler to search stochastically for an optimal low-dimensional parameter subspace. We then rank the j-th additive effect by $\max\{w_{\beta_j+}, w_{\beta_j-}\}$, and the j-th epistatic effect by $\max\{w_{\gamma_j+}, w_{\gamma_j-}\}$. The top $c_n$ out of $p_\beta$ additive effects and the top $c_n$ out of $p_\gamma$ epistatic effects are selected. At the second step, we select variables from those $c_n$ additive effects and $c_n$ epistatic effects under the same Bayesian framework for model (1). At this step we use a non-restrictive prior for each coefficient, and therefore avoid possible over-penalization due to restrictive priors.
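The first-step screening can be sketched in Python as follows: each posterior probability is estimated by the average of its sampled chain, and the effects are then ranked by the larger of the two sign probabilities. The function names and the toy probabilities are hypothetical, and ties are broken by index order, which the text does not specify.

```python
from statistics import fmean

def posterior_probability(chain):
    """Estimate a posterior probability by the mean of its sampled chain."""
    return fmean(chain)

def top_effects(w_plus, w_minus, c_n):
    """Indices of the c_n effects with the largest max(w_plus[j], w_minus[j])."""
    scores = [max(p, m) for p, m in zip(w_plus, w_minus)]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:c_n])

# Hypothetical first-step estimates for five candidate effects.
w_plus = [0.02, 0.91, 0.10, 0.05, 0.40]
w_minus = [0.01, 0.03, 0.75, 0.02, 0.05]
selected = top_effects(w_plus, w_minus, c_n=3)
```

Only the effects indexed by `selected` would be carried into the second step, where the sampler is rerun on the reduced model.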

Following Jeffreys [49,50], we test the hypothesis H0: βj = 0 vs. H1: βj ≠ 0 on the basis of the Bayes factor, which is defined as

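$$B_{10}(\beta_j) = \frac{P(\mathrm{Data} \mid \beta_j \ne 0)}{P(\mathrm{Data} \mid \beta_j = 0)} = \frac{P(\beta_j \ne 0 \mid \mathrm{Data})}{P(\beta_j = 0 \mid \mathrm{Data})} \times \frac{\pi(\beta_j = 0)}{\pi(\beta_j \ne 0)} = \frac{w_{\beta_j+} + w_{\beta_j-}}{1 - w_{\beta_j+} - w_{\beta_j-}},$$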

where π(βj = 0) and π(βj ≠ 0) are the a priori probabilities, and the last equality follows from the fact that π(βj = 0) = π(βj ≠ 0) at the second step of our Bayesian classification. As suggested by Jeffreys [50], a B10(βj) with value between 1 and √10 ≈ 3.2 provides evidence against H0 that is "not worth more than a bare mention"; a B10(βj) with value from √10 to 10 provides "substantial" evidence against H0; a B10(βj) with value from 10 to 100 provides "strong" evidence against H0; and a B10(βj) with value larger than 100 provides "decisive" evidence against H0. Similarly, we can test the hypothesis H0: γj = 0 vs. H1: γj ≠ 0 using the following Bayes factor

$$B_{10}(\gamma_j) = \frac{w_{\gamma_j+} + w_{\gamma_j-}}{1 - w_{\gamma_j+} - w_{\gamma_j-}}. \qquad (5)$$
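The Bayes factor and the evidence categories quoted above can be computed as in the following sketch. The function names are hypothetical, the assignment of the boundary values to categories and the label used when the Bayes factor is below 1 are assumptions, and the formula applies only under the equal prior odds of the second step.

```python
import math

def bayes_factor(w_plus, w_minus):
    """B10 = (w_plus + w_minus) / (1 - w_plus - w_minus), equal prior odds."""
    p_nonzero = w_plus + w_minus
    if p_nonzero >= 1.0:
        return math.inf
    return p_nonzero / (1.0 - p_nonzero)

def evidence_label(b10):
    """Map a Bayes factor to the evidence categories quoted in the text."""
    if b10 < 1.0:
        return "favours H0"            # assumed label, not from the text
    if b10 < math.sqrt(10.0):
        return "not worth more than a bare mention"
    if b10 < 10.0:
        return "substantial"
    if b10 <= 100.0:
        return "strong"
    return "decisive"

# Hypothetical posterior probabilities for one effect.
b10 = bayes_factor(0.6, 0.3)           # about 9
label = evidence_label(b10)
```

For example, an effect with posterior probabilities 0.6 and 0.3 for the two signs has a Bayes factor of about 9 and falls in the "substantial" category.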

Authors' contributions

MZ and DZ both conceived the study and developed the method. DZ wrote the MATLAB® code and did the simulation study. MZ analyzed the real data. MTW participated in conceptual development, writing, reviewing and editing the manuscript. All authors read and approved the final manuscript.

Appendix

Fully Conditional Posterior Distribution of βj

For each j = 1, ⋯, pβ, the fully conditional posterior distribution of βj is

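$$\beta_j \mid Y_n, X_n, \mu, \beta_{-j}, \gamma, w_{\beta+}, w_{\beta-}, \sigma_\varepsilon^2, \sigma^2_{\beta+}, \sigma^2_{\beta-} \sim (1 - \tilde{w}_{\beta_j+} - \tilde{w}_{\beta_j-})\,\delta(0) + \tilde{w}_{\beta_j+} N_+(\tilde{\mu}_{\beta_j+}, \tilde{\sigma}^2_{\beta_j+}) + \tilde{w}_{\beta_j-} N_-(\tilde{\mu}_{\beta_j-}, \tilde{\sigma}^2_{\beta_j-}),$$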

where the updated parameter values are

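$$\begin{aligned}
\tilde{\mu}_{\beta_j+} &= \sigma^2_{\beta+} \sum_{i=1}^{n} X_{ij}\left(Y_i - \mu - X_{i,-j}\beta_{-j} - Z_i\gamma\right) \Big/ \left(\sigma_\varepsilon^2 + \sigma^2_{\beta+}\sum_{i=1}^{n} X_{ij}^2\right), \\
\tilde{\sigma}^2_{\beta_j+} &= \sigma^2_{\beta+}\sigma_\varepsilon^2 \Big/ \left(\sigma_\varepsilon^2 + \sigma^2_{\beta+}\sum_{i=1}^{n} X_{ij}^2\right), \\
\tilde{\mu}_{\beta_j-} &= \sigma^2_{\beta-} \sum_{i=1}^{n} X_{ij}\left(Y_i - \mu - X_{i,-j}\beta_{-j} - Z_i\gamma\right) \Big/ \left(\sigma_\varepsilon^2 + \sigma^2_{\beta-}\sum_{i=1}^{n} X_{ij}^2\right), \\
\tilde{\sigma}^2_{\beta_j-} &= \sigma^2_{\beta-}\sigma_\varepsilon^2 \Big/ \left(\sigma_\varepsilon^2 + \sigma^2_{\beta-}\sum_{i=1}^{n} X_{ij}^2\right), \\
\tilde{w}_{\beta_j+} &= \frac{\tilde{\sigma}_{\beta_j+}}{\sigma_{\beta+}}\,\Phi\!\left(\frac{\tilde{\mu}_{\beta_j+}}{\tilde{\sigma}_{\beta_j+}}\right) \Bigg/ \left[\frac{1 - w_{\beta+} - w_{\beta-}}{2 w_{\beta+}} \exp\!\left(-\frac{\tilde{\mu}^2_{\beta_j+}}{2\tilde{\sigma}^2_{\beta_j+}}\right) + \frac{\tilde{\sigma}_{\beta_j+}}{\sigma_{\beta+}}\,\Phi\!\left(\frac{\tilde{\mu}_{\beta_j+}}{\tilde{\sigma}_{\beta_j+}}\right) + \frac{w_{\beta-}\tilde{\sigma}_{\beta_j-}}{w_{\beta+}\sigma_{\beta-}}\,\Phi\!\left(-\frac{\tilde{\mu}_{\beta_j-}}{\tilde{\sigma}_{\beta_j-}}\right) \exp\!\left(\frac{\tilde{\mu}^2_{\beta_j-}}{2\tilde{\sigma}^2_{\beta_j-}} - \frac{\tilde{\mu}^2_{\beta_j+}}{2\tilde{\sigma}^2_{\beta_j+}}\right)\right], \\
\tilde{w}_{\beta_j-} &= \frac{\tilde{\sigma}_{\beta_j-}}{\sigma_{\beta-}}\,\Phi\!\left(-\frac{\tilde{\mu}_{\beta_j-}}{\tilde{\sigma}_{\beta_j-}}\right) \Bigg/ \left[\frac{1 - w_{\beta+} - w_{\beta-}}{2 w_{\beta-}} \exp\!\left(-\frac{\tilde{\mu}^2_{\beta_j-}}{2\tilde{\sigma}^2_{\beta_j-}}\right) + \frac{\tilde{\sigma}_{\beta_j-}}{\sigma_{\beta-}}\,\Phi\!\left(-\frac{\tilde{\mu}_{\beta_j-}}{\tilde{\sigma}_{\beta_j-}}\right) + \frac{w_{\beta+}\tilde{\sigma}_{\beta_j+}}{w_{\beta-}\sigma_{\beta+}}\,\Phi\!\left(\frac{\tilde{\mu}_{\beta_j+}}{\tilde{\sigma}_{\beta_j+}}\right) \exp\!\left(\frac{\tilde{\mu}^2_{\beta_j+}}{2\tilde{\sigma}^2_{\beta_j+}} - \frac{\tilde{\mu}^2_{\beta_j-}}{2\tilde{\sigma}^2_{\beta_j-}}\right)\right].
\end{aligned}$$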
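The updated parameters can be evaluated numerically as in the sketch below, which computes the three unnormalized component masses and normalizes them; after cancelling common factors this is algebraically the same as the two weight formulas above. It assumes zero-centred truncated-normal prior components, as the formulas for the updated means imply. The function and argument names are hypothetical, and the direct use of `exp` would overflow for very strong signals, where a log-scale version would be needed.

```python
import math
from statistics import NormalDist

def effect_conditional(xr, xx, w_plus, w_minus, s2_eps, s2_plus, s2_minus):
    """Updated means, variances and weights for one coefficient.

    xr: sum_i X_ij * r_i, where r_i is the residual excluding effect j
    xx: sum_i X_ij ** 2
    """
    Phi = NormalDist().cdf
    mu_p = s2_plus * xr / (s2_eps + s2_plus * xx)
    var_p = s2_plus * s2_eps / (s2_eps + s2_plus * xx)
    mu_m = s2_minus * xr / (s2_eps + s2_minus * xx)
    var_m = s2_minus * s2_eps / (s2_eps + s2_minus * xx)

    # Unnormalized masses of the zero, positive and negative components.
    m_zero = (1.0 - w_plus - w_minus) / 2.0
    m_pos = (w_plus * math.sqrt(var_p / s2_plus)
             * Phi(mu_p / math.sqrt(var_p)) * math.exp(mu_p ** 2 / (2.0 * var_p)))
    m_neg = (w_minus * math.sqrt(var_m / s2_minus)
             * Phi(-mu_m / math.sqrt(var_m)) * math.exp(mu_m ** 2 / (2.0 * var_m)))
    total = m_zero + m_pos + m_neg

    return {"mu_plus": mu_p, "var_plus": var_p,
            "mu_minus": mu_m, "var_minus": var_m,
            "w_plus": m_pos / total, "w_minus": m_neg / total}

# Hypothetical inputs: a marker positively associated with the residual.
updated = effect_conditional(xr=40.0, xx=50.0, w_plus=0.05, w_minus=0.05,
                             s2_eps=1.0, s2_plus=1.0, s2_minus=1.0)
```

With a strongly positive cross-product `xr`, nearly all of the conditional mass moves to the positive component, which is the behaviour the weights are designed to capture.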

Acknowledgements

We thank Rebecca Doerge and three anonymous referees for suggestions and comments that significantly improved this article. This research was supported by NSF grant 0612031 and NIH-NIGMS 1R01GM083606-01 to MTW, Purdue Research Foundation grants to MZ and DZ, and Purdue Alumni Association Faculty Incentive Grant to MZ.

Contributor Information

Min Zhang, Email: minzhang@stat.purdue.edu.

Dabao Zhang, Email: zhangdb@stat.purdue.edu.

Martin T Wells, Email: mtw1@cornell.edu.

References

  1. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121:185–199. doi: 10.1093/genetics/121.1.185.
  2. Doerge RW, Zeng ZB, Weir BS. Statistical issues in the search for genes affecting quantitative traits in experimental populations. Statistical Science. 1997;12:195–219. doi: 10.1214/ss/1030037909.
  3. Broman KW, Speed TP. A model selection approach for the identification of quantitative trait loci in experimental crosses. Journal of the Royal Statistical Society Series B. 2002;64:641–656. doi: 10.1111/1467-9868.00354.
  4. Wang H, Zhang YM, Li X, Masinde GL, Mohan S, Baylink DJ, Xu S. Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics. 2005;170:465–480. doi: 10.1534/genetics.104.039354.
  5. Carlborg Ö, Haley CS. Epistasis: too often neglected in complex trait studies? Nature Reviews Genetics. 2004;5:618–625. doi: 10.1038/nrg1407.
  6. Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human disease. Human Heredity. 2003;56:73–82. doi: 10.1159/000073735.
  7. Williams SM, Addy JH, Phillips JAI, Dai M, Kpodonu J, Afful J, Jackson H, Joseph K, Eason F, Murray MM, Epperson P, Aduonum A, Wong LJ, Jose PA, Felder RA. Combinations of variation in multiple genes are associated with hypertension. Hypertension. 2000;36:2–6. doi: 10.1161/01.hyp.36.1.2.
  8. Leamy LJ, Routman EJ, Cheverud JM. An epistatic genetic basis for fluctuating asymmetry of mandible size in mice. Evolution. 2002;56:642–653. doi: 10.1111/j.0014-3820.2002.tb01373.x.
  9. Wagner A. Robustness against mutations in genetic networks of yeast. Nature Genetics. 2000;24:355–361. doi: 10.1038/74174.
  10. Sanjuán R, Cuevas JM, Moya A, Elena SF. Epistasis and the adaptability of an RNA virus. Genetics. 2005;170:1001–1008. doi: 10.1534/genetics.105.040741.
  11. Eshed Y, Zamir D. Less-than-additive epistatic interactions of quantitative trait loci in tomato. Genetics. 1996;143:1807–1817. doi: 10.1093/genetics/143.4.1807.
  12. Xu S, Jia Z. Genomewide analysis of epistatic effects for quantitative traits in barley. Genetics. 2007;175:1955–1963. doi: 10.1534/genetics.106.066571.
  13. Bateson W. Mendel's Principles of Heredity. Cambridge: Cambridge University Press; 1909.
  14. Kao CH, Zeng ZB. Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics. 2002;160:1243–1261. doi: 10.1093/genetics/160.3.1243.
  15. Yi N, Xu S. Mapping quantitative trait loci with epistatic effects. Genetical Research. 2002;79:185–198. doi: 10.1017/S0016672301005511.
  16. Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D. Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005;170:1333–1344. doi: 10.1534/genetics.104.040386.
  17. Yi N, Banerjee S, Pomp D, Yandell BS. Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits. Genetics. 2007;176:1855–1864. doi: 10.1534/genetics.107.071142.
  18. Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS. An efficient Bayesian model selection approach for interacting quantitative trait loci models with many effects. Genetics. 2007;176:1865–1877. doi: 10.1534/genetics.107.071365.
  19. Yandell BS, Mehta T, Banerjee S, Shriner D, Venkataraman R, Moon JY, Neely WW, Wu H, von Smith R, Yi N. R/qtlbim: QTL with Bayesian interval mapping in experimental crosses. Bioinformatics. 2007;23:641–643. doi: 10.1093/bioinformatics/btm011.
  20. Bogdan M, Ghosh JK, Doerge RW. Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics. 2004;167:989–999. doi: 10.1534/genetics.103.021683.
  21. Cui YH, Wu R. Mapping genome-genome epistasis: a high-dimensional model. Bioinformatics. 2005;21:2447–2455. doi: 10.1093/bioinformatics/bti342.
  22. Żak M, Baierl A, Bogdan M, Futschik A. Locating multiple interacting quantitative trait loci using rank-based model selection. Genetics. 2007;176:1845–1854. doi: 10.1534/genetics.106.068031.
  23. Shi W, Lee KE, Wahba G. Detecting disease-causing genes by LASSO-Patternsearch algorithm. BMC Proceedings. 2007;1:S60. doi: 10.1186/1753-6561-1-s1-s60.
  24. Bogdan M, Frommlet F, Biecek P, Cheng R, Ghosh JK, Doerge RW. Extending the modified Bayesian information criterion (mBIC) to dense markers and multiple interval mapping. Biometrics. doi: 10.1111/j.1541-0420.2008.00989.x.
  25. Kao CH, Zeng ZB, Teasdale RD. Multiple interval mapping for quantitative trait loci. Genetics. 1999;152:1203–1216. doi: 10.1093/genetics/152.3.1203.
  26. Zeng ZB, Kao CH, Basten CJ. Estimating the genetic architecture of quantitative traits. Genetical Research. 1999;74:279–289. doi: 10.1017/S0016672399004255.
  27. Ball RD. Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian information criterion. Genetics. 2001;159:1351–1364. doi: 10.1093/genetics/159.3.1351.
  28. Zhang M, Montooth KL, Wells MT, Clark AG, Zhang D. Mapping multiple quantitative trait loci by Bayesian classification. Genetics. 2005;169:2305–2318. doi: 10.1534/genetics.104.034181.
  29. Mitchell TJ, Beauchamp JJ. Bayesian variable selection in linear regression (with discussion). Journal of the American Statistical Association. 1988;83:1023–1036. doi: 10.2307/2290129.
  30. George EI, McCulloch RE. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993;88:881–889. doi: 10.2307/2290777.
  31. Ishwaran H, Rao JS. Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics. 2005;33:730–773. doi: 10.1214/009053604000001147.
  32. Zhang M, Zhang D, Wells MT. Generalized Shrinkage Estimators Adaptive to Sparsity and Asymmetry of High Dimensional Parameter Spaces. Technical Reports, Department of Statistics, Purdue University. 2008. pp. 08–01.
  33. Liu J, Mercer JM, Stam LF, Gibson G, Zeng ZB, Laurie CC. Genetic analysis of a morphological shape difference in the male genitalia of Drosophila simulans and D. mauritiana. Genetics. 1996;142:1129–1145. doi: 10.1093/genetics/142.4.1129.
  34. Zeng ZB, Liu J, Stam LF, Kao CH, Mercer JM, Laurie CC. Genetic architecture of a morphological shape difference between two Drosophila species. Genetics. 2000;154:299–310. doi: 10.1093/genetics/154.1.299.
  35. Kao CH, Zeng ZB. General formula for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics. 1997;53:653–665. doi: 10.2307/2533965.
  36. Tanksley SD. Mapping polygenes. Annual Review of Genetics. 1993;27:205–233. doi: 10.1146/annurev.ge.27.120193.001225.
  37. Bogdan M, Doerge RW. Biased estimators of quantitative trait locus heritability and location in interval mapping. Heredity. 2005;95:476–484. doi: 10.1038/sj.hdy.6800747.
  38. Tibshirani RJ. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B. 1996;58:267–288.
  39. Fan J, Peng H. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics. 2004;32:928–961. doi: 10.1214/009053604000000256.
  40. Álvarez-Castro JM, Carlborg Ö. A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics. 2007;176:1151–1167. doi: 10.1534/genetics.106.067348.
  41. Zeng ZB, Wang T, Zou W. Modeling quantitative trait loci and interpretation of models. Genetics. 2005;169:1711–1725. doi: 10.1534/genetics.104.035857.
  42. Hastie T, Tibshirani R, Sherlock G, Eisen M, Brown P, Botstein D. Imputing missing data for gene expression arrays. PhD thesis. Stanford University, Statistics Department; 1999.
  43. Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley; 2002.
  44. Huber P. Robust regression: asymptotics, conjectures, and Monte Carlo. The Annals of Statistics. 1973;1:799–821. doi: 10.1214/aos/1176342503.
  45. Portnoy S. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large, I. Consistency. The Annals of Statistics. 1984;12:1298–1309. doi: 10.1214/aos/1176346793.
  46. Greenshtein E, Ritov Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli. 2004;10:971–988. doi: 10.3150/bj/1106314846.
  47. Fourdrinier D, Strawderman WE, Wells MT. On the construction of Bayes minimax estimators. The Annals of Statistics. 1998;26:660–671. doi: 10.1214/aos/1028144853.
  48. Gaffney PJ. An efficient reversible jump Markov chain Monte Carlo approach to detect multiple loci and their effects in inbred crosses. PhD thesis. Department of Statistics, University of Wisconsin, Madison, WI; 2001.
  49. Jeffreys H. Some tests of significance, treated by the theory of probability. Proceedings of the Cambridge Philosophical Society. 1935;31:201–222.
  50. Jeffreys H. Theory of Probability. Oxford: Clarendon Press; 1961.

Articles from BMC Bioinformatics are provided here courtesy of BMC
