Genetics. 2007 Oct;177(2):1255–1258. doi: 10.1534/genetics.107.077487

Derivation of the Shrinkage Estimates of Quantitative Trait Locus Effects

Shizhong Xu
PMCID: PMC2034631  PMID: 17720913

Abstract

The shrinkage estimate of a quantitative trait locus (QTL) effect is the posterior mean of the QTL effect when a normal prior distribution is assigned to it. This note gives the derivation of the shrinkage estimate under the multivariate linear model. An important lemma regarding the posterior mean of a normal likelihood combined with a normal prior is introduced. The lemma is then used to derive the Bayesian shrinkage estimates of the QTL effects.


THE Bayesian shrinkage estimation of quantitative trait locus (QTL) effects was first introduced by Xu (2003) and later formalized by Wang et al. (2005). The multivariate version of the shrinkage estimation of QTL effects was recently developed by Yang and Xu (2007). The main purpose of the shrinkage estimation is to avoid variable selection in mapping multiple QTL. Once a normal prior distribution for each regression coefficient is incorporated into the QTL mapping program, the method can handle substantially more QTL effects than the classical maximum-likelihood (ML) method. In addition, the shrinkage method produces much clearer signals of QTL on the genome than the ML method. As a result, shrinkage mapping appears to point in a new direction for future research in QTL mapping.

The key issue in shrinkage estimation is the normal prior distribution assigned to each regression coefficient (QTL effect). More importantly, different regression coefficients are assigned different normal priors. Because the variances of the prior distributions determine the degrees of shrinkage, assigning different prior variances to different regression coefficients allows the method to shrink the coefficients differentially: a smaller prior variance causes the regression coefficient to shrink more, while a larger prior variance leads to less shrinkage. This phenomenon is called selective shrinkage.

After incorporating the normal prior distribution into the likelihood function, we can derive the posterior distribution of the regression coefficient, which remains normal because the normal prior is conjugate. The posterior mean and posterior variance are used to generate a posterior sample of the regression coefficient. The formulas for the posterior mean and posterior variance are mathematically attractive (see Xu 2003; Wang et al. 2005; Yang and Xu 2007). However, because of page limitations, the derivation of the formulas was not provided in those articles.

Derivation of the univariate shrinkage estimate closely followed Box and Tiao's (1973, Appendix A1.1) combination of a univariate normal likelihood with a univariate normal prior. Derivation of the multivariate shrinkage estimate followed the general Bayesian linear model of Lindley and Smith (1972) and the best linear unbiased prediction (BLUP) of Robinson (1991). The derivations presented by these authors were targeted primarily at statisticians and are often difficult for the genetics community to follow. I have regularly received e-mails and calls from readers asking for the derivation. These readers (almost all genetics professionals and students) are often interested in extending the shrinkage method to handle QTL mapping in different mapping populations, and understanding the derivation of these formulas is crucial to the development of new shrinkage methods. Simply pointing them to the above references often does not help much, because intermediate steps are needed to arrive at the shrinkage estimate presented by Xu (2003), and such a reply can leave an impression of irresponsibility. Therefore, I prepared a short note on the derivation and distributed it to interested readers. The note briefly summarizes the derivation in language accessible to geneticists with basic statistical training. Given the increasing interest in the derivation from the QTL mapping community, it is more efficient to publish the note in Genetics, where the very first shrinkage method (Xu 2003) was published.

THEORY AND MODEL

Shrinkage estimates:

Let $\mathbf{y}_j$ be an $m \times 1$ vector of the phenotypic values of $m$ traits collected from the $j$th individual, for $j = 1, \ldots, n$, where $n$ is the sample size. This vector is described by the following linear model,

$$\mathbf{y}_j = \boldsymbol{\mu} + \sum_{k=1}^{p} X_{jk}\mathbf{b}_k + \mathbf{e}_j \tag{1}$$

where $\boldsymbol{\mu}$ is an $m \times 1$ vector of population means (or intercepts), $X_{jk}$ is an $m \times q$ design matrix (determined by the genotype of the $j$th individual at the $k$th locus), $\mathbf{b}_k$ is a $q \times 1$ vector of regression coefficients (QTL effects) for locus $k$ ($k = 1, \ldots, p$), $\mathbf{e}_j$ is an $m \times 1$ vector of residual errors with an assumed $N(\mathbf{0}, D)$ distribution, and $D$ is an $m \times m$ positive definite covariance matrix. When the $k$th regression coefficient is considered, all other regression coefficients are treated as constants, and thus model (1) can be rewritten as

$$\mathbf{y}_{jk} = X_{jk}\mathbf{b}_k + \mathbf{e}_j \tag{2}$$

where

$$\mathbf{y}_{jk} = \mathbf{y}_j - \boldsymbol{\mu} - \sum_{k' \neq k} X_{jk'}\mathbf{b}_{k'} \tag{3}$$

is the phenotypic value adjusted for all other regression coefficients that are not currently under consideration. Let us describe $\mathbf{b}_k$ by the normal prior $\mathbf{b}_k \sim N(\boldsymbol{\eta}_k, \Sigma_k)$, where $\boldsymbol{\eta}_k$ is a $q \times 1$ vector of prior means and $\Sigma_k$ is a $q \times q$ prior variance–covariance matrix. The posterior distribution of $\mathbf{b}_k$ is multivariate normal with mean

$$\bar{\mathbf{b}}_k = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1}\left(\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk} + \Sigma_k^{-1}\boldsymbol{\eta}_k\right) \tag{4}$$

and variance–covariance matrix

$$V_k = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1} \tag{5}$$

In shrinkage analysis, we often set $\boldsymbol{\eta}_k = \mathbf{0}$ for all $k = 1, \ldots, p$, so that the posterior mean becomes

$$\bar{\mathbf{b}}_k = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1}\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk} \tag{6}$$

This posterior mean is called the shrinkage estimate of the regression coefficient $\mathbf{b}_k$. When $\Sigma_k^{-1} = 0$, the prior is flat, leading to the usual least-squares estimate,

$$\hat{\mathbf{b}}_k = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk}\right)^{-1}\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk} \tag{7}$$

When $\Sigma_k \to 0$, we have $\Sigma_k^{-1} \to \infty$, which leads to $\left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1} \to 0$ and thus $\bar{\mathbf{b}}_k \to \mathbf{0}$, an estimate shrunken to zero. Therefore, the matrix $\Sigma_k$ serves as a factor that determines the degree of shrinkage for the estimate of $\mathbf{b}_k$. Because $\Sigma_k$ varies across loci, the degree of shrinkage also varies across $k$, as the numerical sketch below illustrates.
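As a concrete illustration (added here; it is not part of the original derivation), the following minimal Python sketch evaluates Equation 6 on simulated data. The sample size, dimensions, and the simulated $X_{jk}$ and $\mathbf{y}_{jk}$ are all illustrative assumptions.

```python
# A minimal numerical sketch of the multivariate shrinkage estimate of
# Equation 6. The data (X, y), residual covariance D, and prior covariance
# Sigma_k are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 50, 3, 3                 # sample size, traits, effect dimension

D = np.eye(m)                      # residual covariance (m x m)
Dinv = np.linalg.inv(D)
X = rng.normal(size=(n, m, q))     # X[j] is the m x q design matrix X_jk
b_true = np.array([0.5, 0.0, -0.3])
y = np.einsum('jab,b->ja', X, b_true) + rng.normal(size=(n, m))

def shrinkage_estimate(X, y, Dinv, Sigma_k):
    """Posterior mean of b_k with prior N(0, Sigma_k) -- Equation 6."""
    A = sum(Xj.T @ Dinv @ Xj for Xj in X)              # sum_j X' D^-1 X
    r = sum(Xj.T @ Dinv @ yj for Xj, yj in zip(X, y))  # sum_j X' D^-1 y
    return np.linalg.solve(A + np.linalg.inv(Sigma_k), r)

# A diffuse prior recovers the least-squares estimate of Equation 7,
# whereas a tight prior shrinks the estimate toward zero.
print(shrinkage_estimate(X, y, Dinv, 1e6 * np.eye(q)))   # ~ least squares
print(shrinkage_estimate(X, y, Dinv, 1e-6 * np.eye(q)))  # ~ 0
```

To prove the shrinkage estimate, I first introduce the following lemma: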

Lemma. Assume that a parameter $\mathbf{b}$ can be inferred from two independent sources of information. Let $N(\mathbf{b} \mid \boldsymbol{\mu}_1, V_1)$ and $N(\mathbf{b} \mid \boldsymbol{\mu}_2, V_2)$ be the distributions of $\mathbf{b}$ from the two sources. When we combine the two sources, the distribution of $\mathbf{b}$ remains multivariate normal, $N(\boldsymbol{\mu}, V)$, with mean $\boldsymbol{\mu} = (V_1^{-1} + V_2^{-1})^{-1}(V_1^{-1}\boldsymbol{\mu}_1 + V_2^{-1}\boldsymbol{\mu}_2)$ and variance–covariance matrix $V = (V_1^{-1} + V_2^{-1})^{-1}$.

Proof of the lemma. The distribution of $\mathbf{b}$ given the two sources of information is described by

$$p(\mathbf{b}) = C \times N(\mathbf{b} \mid \boldsymbol{\mu}_1, V_1)\, N(\mathbf{b} \mid \boldsymbol{\mu}_2, V_2) \tag{8}$$

where $C$ is a constant with respect to $\mathbf{b}$. When deriving a distribution, we are interested only in the kernel of the distribution, i.e., the central part of the distribution function that remains after all constants are disregarded. In the above distribution, the logarithm of the kernel is

$$-\frac{1}{2}(\mathbf{b} - \boldsymbol{\mu}_1)' V_1^{-1} (\mathbf{b} - \boldsymbol{\mu}_1) - \frac{1}{2}(\mathbf{b} - \boldsymbol{\mu}_2)' V_2^{-1} (\mathbf{b} - \boldsymbol{\mu}_2) \tag{9}$$

which expands to

$$-\frac{1}{2}\left[\mathbf{b}'(V_1^{-1} + V_2^{-1})\mathbf{b} - 2\mathbf{b}'(V_1^{-1}\boldsymbol{\mu}_1 + V_2^{-1}\boldsymbol{\mu}_2) + \boldsymbol{\mu}_1' V_1^{-1}\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2' V_2^{-1}\boldsymbol{\mu}_2\right] \tag{10}$$

We can see that this kernel involves another constant, $\boldsymbol{\mu}_1' V_1^{-1}\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2' V_2^{-1}\boldsymbol{\mu}_2$, which can also be ignored. Therefore, the actual kernel, containing only the linear and quadratic functions of $\mathbf{b}$, is

$$-\frac{1}{2}\left[\mathbf{b}'(V_1^{-1} + V_2^{-1})\mathbf{b} - 2\mathbf{b}'(V_1^{-1}\boldsymbol{\mu}_1 + V_2^{-1}\boldsymbol{\mu}_2)\right] \tag{11}$$

Let $V = (V_1^{-1} + V_2^{-1})^{-1}$ and $\boldsymbol{\mu} = V(V_1^{-1}\boldsymbol{\mu}_1 + V_2^{-1}\boldsymbol{\mu}_2)$. Completing the square in $\mathbf{b}$ (and absorbing the resulting constant $\boldsymbol{\mu}' V^{-1}\boldsymbol{\mu}$ into $C$), the kernel simplifies to

$$-\frac{1}{2}(\mathbf{b} - \boldsymbol{\mu})' V^{-1} (\mathbf{b} - \boldsymbol{\mu}) \tag{12}$$

which turns out to be the kernel of $N(\boldsymbol{\mu}, V)$. Therefore, we conclude that $\mathbf{b} \sim N(\boldsymbol{\mu}, V)$.
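The lemma is easy to check numerically. The following univariate sketch, an illustration added here with arbitrary values of $\mu_1$, $V_1$, $\mu_2$, and $V_2$, renormalizes the product of the two normal densities on a grid and compares the resulting mean and variance with the lemma's formulas.

```python
# A numerical check of the lemma in the univariate case: the renormalized
# product of two normal densities matches N(mu, V) with mu and V given by
# the precision-weighted formulas. All numbers below are illustrative.
import numpy as np

mu1, V1 = 1.0, 2.0
mu2, V2 = -0.5, 0.5

V = 1.0 / (1.0 / V1 + 1.0 / V2)              # V = (V1^-1 + V2^-1)^-1
mu = V * (mu1 / V1 + mu2 / V2)               # mu = V (V1^-1 mu1 + V2^-1 mu2)

b = np.linspace(-8.0, 8.0, 200001)
db = b[1] - b[0]
dens = lambda x, c, v: np.exp(-0.5 * (x - c) ** 2 / v) / np.sqrt(2 * np.pi * v)

post = dens(b, mu1, V1) * dens(b, mu2, V2)
post /= post.sum() * db                      # renormalize the product

print(mu, (b * post).sum() * db)             # means agree
print(V, ((b - mu) ** 2 * post).sum() * db)  # variances agree
```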

Derivation of the shrinkage estimates:

We now use the above lemma to derive the shrinkage estimate of $\mathbf{b}_k$. The two sources of information for $\mathbf{b}_k$ come from the data ($\mathbf{y}_{jk}$) and from the prior. Information from the data is used to infer $\mathbf{b}_k$ through the maximum-likelihood method. The log-likelihood function is

$$L(\mathbf{b}_k) = -\frac{1}{2}\sum_{j=1}^{n}(\mathbf{y}_{jk} - X_{jk}\mathbf{b}_k)' D^{-1} (\mathbf{y}_{jk} - X_{jk}\mathbf{b}_k) \tag{13}$$

The maximum-likelihood estimate of $\mathbf{b}_k$ is

$$\hat{\mathbf{b}}_k = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk}\right)^{-1}\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk} \tag{14}$$

and the variance of this estimate is

$$\mathrm{var}(\hat{\mathbf{b}}_k) = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk}\right)^{-1} \tag{15}$$

Let $\boldsymbol{\mu}_1 = \hat{\mathbf{b}}_k$ and $V_1 = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk}\right)^{-1}$. After some algebraic manipulation of the likelihood function, we find that Equation 13 has the following normal kernel with respect to $\mathbf{b}_k$:

$$-\frac{1}{2}(\mathbf{b}_k - \boldsymbol{\mu}_1)' V_1^{-1} (\mathbf{b}_k - \boldsymbol{\mu}_1) \tag{16}$$

Therefore, the distribution of $\mathbf{b}_k$ inferred from the data is $N(\mathbf{b}_k \mid \boldsymbol{\mu}_1, V_1)$. The second source of information for $\mathbf{b}_k$ is the prior distribution $N(\mathbf{b}_k \mid \boldsymbol{\eta}_k, \Sigma_k)$. If we let $\boldsymbol{\mu}_2 = \boldsymbol{\eta}_k$ and $V_2 = \Sigma_k$, the distribution of $\mathbf{b}_k$ from the second source of information is $N(\mathbf{b}_k \mid \boldsymbol{\mu}_2, V_2)$. According to the lemma, the posterior mean of $\mathbf{b}_k$ is

$$\bar{\mathbf{b}}_k = (V_1^{-1} + V_2^{-1})^{-1}(V_1^{-1}\boldsymbol{\mu}_1 + V_2^{-1}\boldsymbol{\mu}_2) = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1}\left(\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk} + \Sigma_k^{-1}\boldsymbol{\eta}_k\right) \tag{17}$$

where the second equality uses $V_1^{-1}\boldsymbol{\mu}_1 = \sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk}$, and the posterior variance is

$$V_k = (V_1^{-1} + V_2^{-1})^{-1} = \left(\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk} + \Sigma_k^{-1}\right)^{-1} \tag{18}$$

These are exactly Equations 4 and 5, which concludes the derivation of the shrinkage estimate of $\mathbf{b}_k$.
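To see the lemma at work on the multivariate model itself, the sketch below (again an illustration with simulated data, not the original authors' code) forms the ML estimate of Equations 14 and 15, combines it with a zero-mean prior through Equation 17, and confirms that the result coincides with the direct shrinkage estimate of Equation 6.

```python
# A sketch verifying that combining the ML estimate (Equations 14 and 15)
# with the prior via the lemma reproduces the direct shrinkage estimate of
# Equation 6. The data and dimensions below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(2)
n, m, q = 40, 3, 2                          # sample size, traits, effect dim
Dinv = np.linalg.inv(np.eye(m) + 0.3 * np.ones((m, m)))  # D^-1, illustrative
X = rng.normal(size=(n, m, q))              # X[j] is the m x q matrix X_jk
y = rng.normal(size=(n, m))                 # adjusted phenotypes y_jk
Sigma_k = 0.2 * np.eye(q)                   # prior covariance, eta_k = 0

A = sum(Xj.T @ Dinv @ Xj for Xj in X)                 # sum_j X' D^-1 X
r = sum(Xj.T @ Dinv @ yj for Xj, yj in zip(X, y))     # sum_j X' D^-1 y

V1 = np.linalg.inv(A)                       # Equation 15
mu1 = V1 @ r                                # Equation 14, the ML estimate

# Lemma route, Equation 17, with prior mean mu2 = 0 and V2 = Sigma_k.
V = np.linalg.inv(np.linalg.inv(V1) + np.linalg.inv(Sigma_k))
b_lemma = V @ (np.linalg.inv(V1) @ mu1)

# Direct route, Equation 6.
b_direct = np.linalg.solve(A + np.linalg.inv(Sigma_k), r)
print(np.allclose(b_lemma, b_direct))       # True
```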

Univariate version of the shrinkage estimate:

The shrinkage estimate of the regression coefficient given by Xu (2003) is a special case of the general shrinkage estimate. The regression model of Xu (2003) is

$$y_j = \mu + \sum_{k=1}^{p} x_{jk} b_k + e_j \tag{19}$$

where every variable in the equation is a scalar rather than a vector or matrix. When we focus on the $k$th regression coefficient, the model is rewritten as

$$y_{jk} = x_{jk} b_k + e_j \tag{20}$$

where $y_{jk} = y_j - \mu - \sum_{k' \neq k} x_{jk'} b_{k'}$ is the adjusted phenotypic value. Let us assume $e_j \sim N(0, \sigma^2)$, where $\sigma^2$ is the univariate version of matrix $D$. Assume that the prior distribution for $b_k$ is $N(0, \sigma_k^2)$. Therefore, the univariate versions of $\sum_{j=1}^{n} X_{jk}' D^{-1} X_{jk}$ and $\sum_{j=1}^{n} X_{jk}' D^{-1}\mathbf{y}_{jk}$ are $\sigma^{-2}\sum_{j=1}^{n} x_{jk}^2$ and $\sigma^{-2}\sum_{j=1}^{n} x_{jk} y_{jk}$, respectively. Substituting all the parameters of Equations 4 and 5 by their univariate counterparts, we have

$$\bar{b}_k = \left(\sum_{j=1}^{n} x_{jk}^2 + \frac{\sigma^2}{\sigma_k^2}\right)^{-1}\sum_{j=1}^{n} x_{jk} y_{jk} \tag{21}$$

and

$$\mathrm{var}(b_k) = \left(\sum_{j=1}^{n} x_{jk}^2 + \frac{\sigma^2}{\sigma_k^2}\right)^{-1}\sigma^2 \tag{22}$$

These equations are exactly the same as Equations 5 and 6 given by Xu (2003).
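For readers who want the scalar case in executable form, here is a minimal sketch of Equations 21 and 22 on simulated data; the genotype coding, the residual variance $\sigma^2$, and the prior variance $\sigma_k^2$ are illustrative choices, not prescriptions.

```python
# A minimal univariate sketch of Equations 21 and 22: the shrinkage
# estimate of a single regression coefficient and its posterior variance,
# with simulated x and y and illustrative sigma^2 and sigma_k^2.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.choice([-1.0, 1.0], size=n)        # genotype codes, illustrative
y = 0.4 * x + rng.normal(scale=1.0, size=n)

sigma2 = 1.0                               # residual variance
sigma2_k = 0.25                            # prior variance of b_k

denom = np.sum(x * x) + sigma2 / sigma2_k
b_bar = np.sum(x * y) / denom              # Equation 21
var_b = sigma2 / denom                     # Equation 22
print(b_bar, var_b)
```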

DISCUSSION

There are several alternative ways to derive the shrinkage estimates, such as the conditional distribution of multivariate normal variables (Giri 1996). The method presented in this note is a generalization of Box and Tiao's (1973, Appendix A1.1) combination of a univariate normal likelihood with a univariate normal prior. Using the method of Box and Tiao (1973), we can extend the lemma to the situation of inferring $\mathbf{b}$ from more than two independent sources of information. Let $m$ be the number of mutually independent sources of information used to infer $\mathbf{b}$, and let the distribution from the $i$th source be $N(\mathbf{b} \mid \boldsymbol{\mu}_i, V_i)$ for $i = 1, \ldots, m$. The posterior distribution of $\mathbf{b}$ combining all the sources of information is $N(\boldsymbol{\mu}, V)$, where

$$\boldsymbol{\mu} = \left(\sum_{i=1}^{m} V_i^{-1}\right)^{-1}\sum_{i=1}^{m} V_i^{-1}\boldsymbol{\mu}_i \tag{23}$$

and

$$V = \left(\sum_{i=1}^{m} V_i^{-1}\right)^{-1} \tag{24}$$

One can prove Equations 23 and 24 by mathematical induction, starting from $m = 2$ (given in the lemma) and proceeding to $m = 3$, and so on.
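The induction argument can also be mimicked numerically: folding in one source at a time with the two-source lemma gives the same answer as the closed forms of Equations 23 and 24. The five simulated sources below are purely illustrative.

```python
# A sketch of Equations 23 and 24: combining several independent normal
# sources of information about b, both directly and by repeated
# application of the two-source lemma (mirroring the induction argument).
import numpy as np

rng = np.random.default_rng(4)
q = 3                                       # dimension of b
sources = []
for _ in range(5):                          # five independent sources
    A = rng.normal(size=(q, q))
    sources.append((rng.normal(size=q), A @ A.T + q * np.eye(q)))

# Direct formulas, Equations 23 and 24.
precisions = [np.linalg.inv(V) for _, V in sources]
V_all = np.linalg.inv(sum(precisions))
mu_all = V_all @ sum(P @ mu for (mu, _), P in zip(sources, precisions))

# Sequential combination with the two-source lemma.
mu, V = sources[0]
for mu_i, V_i in sources[1:]:
    V_new = np.linalg.inv(np.linalg.inv(V) + np.linalg.inv(V_i))
    mu = V_new @ (np.linalg.inv(V) @ mu + np.linalg.inv(V_i) @ mu_i)
    V = V_new
print(np.allclose(mu, mu_all), np.allclose(V, V_all))  # True True
```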

Bayesian shrinkage estimation refers to the biased estimation of a regression coefficient toward zero, using a prior variance as a factor to control the degree of shrinkage. A normal prior is often selected because it is conjugate, so the posterior distribution remains normal. A normal posterior simplifies the MCMC sampling process because the Gibbs sampler can be used to draw the regression coefficient. Other prior distributions have been proposed, e.g., the mixture of two normal distributions (George and McCulloch 1993; Yi et al. 2003) and the spike-and-slab model (Ishwaran and Rao 2005). A t-distribution may also be used as a prior for the regression coefficient. However, the posterior distribution under a nonnormal prior rarely has an explicit distributional form, which rules out Gibbs sampling and thus complicates the MCMC sampling process.

The shrinkage method for regression analysis may also be called the random-model approach to regression analysis, or simply random regression, because each regression coefficient is treated as a random effect with a (prior) normal distribution. It is well known that there is no limit to the number of random effects that can be handled by a random model. The success of a random linear-model analysis, however, depends on the variance components chosen for the random model. If a random model contains an excessively large number of regression coefficients, most of them will be zero or close to zero. This sparse nature of the regression coefficients cannot be characterized by the random linear model alone; it must be accompanied by an efficient method for choosing the variance components. In QTL mapping, the number of variance components can be extremely large, making subjective selection of the variance components impossible. Therefore, the variance components must be estimated from the data.

The most convenient way to estimate the variance components is the maximum-likelihood method. The estimated variance components are then used in place of the prior variances to estimate the regression coefficients. As far as the estimation of regression coefficients is concerned, this is called the empirical Bayes method (Xu 2007). To reflect the sparse nature of the regression coefficients, a prior distribution is often assigned to each variance component; this is called hierarchical modeling (Gelman 2005). Furthermore, the prior distribution should be highly concentrated around zero. Many different prior distributions can be chosen for the variance components, but the scaled inverse chi-square distribution is the most convenient and flexible prior with this property (Lindley and Smith 1972). The exponential distribution (Tibshirani 1996) and the half-t distribution (Gelman 2006) have also been used. The choice of priors for the variance components in random regression analysis remains an active research area, and more efficient priors may be developed in the future.
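As one hedged illustration of such a hierarchical step (the hyperparameter values here are assumptions for this sketch, not values taken from the articles cited above), the conditional posterior of a coefficient's prior variance under a scaled inverse chi-square prior is itself scaled inverse chi-square, so it can be sampled directly:

```python
# A sketch of one variance-component update in a hierarchical shrinkage
# model: with the prior sigma_k^2 ~ scaled Inv-chi^2(nu, s2) and
# b_k ~ N(0, sigma_k^2), the conditional posterior of sigma_k^2 given b_k
# is scaled Inv-chi^2(nu + 1, (nu*s2 + b_k^2)/(nu + 1)). The values of nu
# and s2 below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)

def draw_sigma2_k(b_k, nu=1.0, s2=0.01):
    # If X ~ chi^2(nu + 1), then (nu*s2 + b_k^2) / X is a draw from the
    # scaled inverse chi-square conditional posterior stated above.
    return (nu * s2 + b_k ** 2) / rng.chisquare(nu + 1.0)

# A coefficient near zero keeps a tiny variance (strong shrinkage), while a
# large coefficient inflates it (weak shrinkage): selective shrinkage.
print(np.median([draw_sigma2_k(0.01) for _ in range(10000)]))
print(np.median([draw_sigma2_k(1.00) for _ in range(10000)]))
```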

In random regression analysis, the variance of a regression coefficient is not of primary interest to the investigator; rather, it is used only to control the magnitude of shrinkage. If the regression coefficients are batched (clustered) so that coefficients in the same batch share the same prior distribution, the variance may be estimated accurately and its estimate may be meaningful (Gelman 2005). In this case, the primary interest shifts from the regression coefficients to the variances of the regression coefficients; the method is then better called analysis of variance (ANOVA) (Gelman 2005). In the usual shrinkage analysis, the regression coefficients are not batched, i.e., every regression coefficient has its own prior variance, and the estimated variance for a regression coefficient may vary drastically across the posterior sample. This problem may look severe, but it does not seriously harm the Bayesian shrinkage estimates of the regression coefficients. One can minimize the variation of the sampled variance across the posterior sample by using a proper prior distribution for the variance (Gelman 2005).

References

  1. Box, G. E. P., and G. C. Tiao, 1973. Bayesian Inference in Statistical Analysis. Wiley & Sons, New York.
  2. Gelman, A., 2005. Analysis of variance: why it is more important than ever. Ann. Stat. 33: 1–53.
  3. Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 1: 515–533.
  4. George, E. I., and R. E. McCulloch, 1993. Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88: 881–889.
  5. Giri, N. C., 1996. Multivariate Statistical Analysis. Marcel Dekker, New York.
  6. Ishwaran, H., and J. S. Rao, 2005. Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33: 730–773.
  7. Lindley, D. V., and A. F. M. Smith, 1972. Bayes estimates for the linear model. J. R. Stat. Soc. Ser. B 34: 1–41.
  8. Robinson, G. K., 1991. That BLUP is a good thing: the estimation of random effects. Stat. Sci. 6: 15–32.
  9. Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58: 267–288.
  10. Wang, H., Y. M. Zhang, X. Li, G. L. Masinde, S. Mohan et al., 2005. Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465–480.
  11. Xu, S., 2003. Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–801.
  12. Xu, S., 2007. An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513–521.
  13. Yang, R., and S. Xu, 2007. Bayesian shrinkage analysis of quantitative trait loci for dynamic traits. Genetics 176: 1169–1185.
  14. Yi, N., V. George and D. B. Allison, 2003. Stochastic search variable selection for identifying quantitative trait loci. Genetics 164: 1129–1138.

