Biostatistics (Oxford, England)
. 2010 Jul;11(3):473–483. doi: 10.1093/biostatistics/kxq012

Multiplicity-calibrated Bayesian hypothesis tests

Mengye Guo 1,2,*, Daniel F Heitjan 1,2
PMCID: PMC2912702  PMID: 20212321

Abstract

When testing multiple hypotheses simultaneously, there is a need to adjust the levels of the individual tests to effect control of the family-wise error rate (FWER). Standard frequentist adjustments control the error rate but are typically both conservative and oblivious to prior information. We propose a Bayesian testing approach—multiplicity-calibrated Bayesian hypothesis testing—that sets individual critical values to reflect prior information while controlling the FWER via the Bonferroni inequality. If the prior information is specified correctly, in the sense that those null hypotheses considered most likely to be false in fact are false, the power of our method is substantially greater than that of standard frequentist approaches. We illustrate our method using data from a pharmacogenetic trial and a preclinical cancer study. We demonstrate its error rate control and power advantage by simulation.

Keywords: Bayes factor, Bonferroni inequality, Frequentist calibration, Multiplicity

1. INTRODUCTION

When testing multiple hypotheses simultaneously, failure to adjust for the multiplicity of tests will lead to an inflated family-wise error rate (FWER) or probability of incorrectly rejecting at least one hypothesis. The problem can occur in clinical trials where there are multiple outcome variables, genetic studies in which large numbers of markers are to be tested for association with a key phenotype and basic science experiments with multiple control groups.

Standard procedures to address this problem include the Bonferroni correction and its variants (Shaffer, 1995). Assume that there are m null hypotheses, H1,…,Hm, whose tests lead to individual p values P1,…,Pm. The unweighted Bonferroni procedure rejects any hypothesis Hi for which Pi ≤ α/m. This method controls the FWER through the Bonferroni inequality: Pr[∪i{Pi ≤ α/m}|all Hi true] ≤ α. Simes (1986) proposed assigning individual thresholds to ordered p values to test all the hypotheses simultaneously. Hommel (1988) and Hochberg (1988) extended Simes's procedure to individual hypotheses to derive more powerful tests. Westfall and Young (1993) proposed a step-down adjusted p value approach that takes into account the dependence structure of the hypotheses.

Criticisms of p value-based procedures include their incompatibility with the likelihood principle (Berger and Berry, 1988), overweighting of evidence against the null (Berger and Sellke, 1987), conflation of evidence against the null with evidence for the alternative, widespread misinterpretation (Diamond and Forrester, 1983), peculiar behavior such as Lindley's paradox (Lindley, 1957; Bartlett, 1957), and failure to systematically incorporate prior information. Practically, perhaps the most damning critique is their overconservatism, which becomes acute when the number of hypotheses is large (Hochberg, 1988).

An alternative approach is Bayesian testing, in which one assigns prior probabilities to the null and alternative hypotheses and computes their posterior probabilities by Bayes's Theorem. A key measure of evidence is the Bayes factor (BF) defined as the ratio of the posterior to prior odds for the null (Berger, 1985; Kass and Raftery, 1995). Bayesian testing obeys the likelihood principle because the data exert their influence only through the likelihood. The posterior probabilities measure the evidence for the null and alternative directly and hence have a simple, intuitive interpretation (Berger and Sellke, 1987). Moreover, Bayesian testing automatically incorporates prior information. Ironically, this is often portrayed as a disadvantage, in that sensitivity to the prior is seen as a failure to exhibit “objectivity.”

The usual Bayesian approach to adjusting for multiplicity works through the prior: either ranking the BFs and multiplying each by a factor that reflects the current prior (Jeffreys, 1961), or using a hierarchical prior that shrinks the effects being tested toward a common mean (Westfall and others, 1997; Berry and Hochberg, 1999). Such methods adjust for multiplicity without necessarily effecting a frequentist calibration.

The frequentist approaches cited above treat all null hypotheses equally, whereas in practice all null hypotheses may not be equally likely to be true. For example, markers on genes that affect the biological pathway of a treatment are more likely to be pharmacogenetically active than randomly sampled markers. It seems reasonable to use such prior information as the basis of a multiplicity adjustment strategy. There have been some attempts along these lines from the frequentist perspective: Holm (1979) suggested controlling the FWER while assigning larger weights to p values whose null hypotheses are more likely to be false, and Benjamini and Hochberg (1997) used weights in a loss function to indicate the importance of each hypothesis. By comparison, a Bayesian analysis automatically includes prior information.

We propose here a multiplicity-calibrated Bayesian hypothesis test (MCBHT) that controls the overall type I error using the Bonferroni inequality and increases the power for likely alternatives by selection of priors. We discuss in detail a special case of our method that places the hypotheses into 2 classes—candidate and control—attaching greater prior weight to the alternatives for candidate tests. We then distribute the overall type I error probability among all the hypotheses by assigning a lower threshold to the candidate hypotheses. In this way, the method increases the power as long as our conjectured identification of candidate tests is correct.

We illustrate MCBHT with 2 examples. The first is a pharmacogenetic trial that sought to identify genetic markers that modify the treatment effect of bupropion among smokers attempting to quit (Heitjan and others, 2007). The markers fell into 2 classes: those selected because they reside on genes associated with the action of nicotine, and those randomly selected from throughout the genome for use in testing for population stratification. The second example comes from a preclinical study to assess the effect of combined hormonal and polyamine manipulation on breast cancer cell proliferation in vivo. This was a 3-arm trial with 2 endpoints (Manni and others, 1992); the hypotheses included one of primary interest together with 2 negative controls.

The article is organized as follows: In Section 2, we describe the calibrated Bayesian hypothesis testing method. Section 3 presents the pharmacogenetic example and supporting simulation studies. Section 4 presents conclusions and discussion. In the supplementary material (available at Biostatistics online), we present the preclinical example and the corresponding simulations.

2. CALIBRATED BAYESIAN HYPOTHESIS TESTS

2.1. Calibrating a single test

Suppose we want to test a simple null hypothesis H0:θ = 0 versus a composite alternative H1:θ≠0. The Bayesian hypothesis test starts by setting the prior probability for H0 (H1) to be π0 (π1 = 1 − π0) (Berger, 1985). The posterior probability of Hj is

Pr[Hj|x] = πjPr[x|Hj]/(π0Pr[x|H0] + π1Pr[x|H1]),  j = 0,1,

where x is the observed data and Pr[x|Hj] is the marginal density of the data under Hj. The BF in favor of H0 is

BF = (Pr[H0|x]/Pr[H1|x])/(π0/π1) = Pr[x|H0]/Pr[x|H1].

Smaller values of BF indicate greater support for the alternative. Because

Pr[H0|x] = π0BF/(π0BF + 1 − π0),  (2.1)

there is a 1-1 correspondence between the posterior probability and the BF when π0 is specified.
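As a concrete illustration of (2.1), the map between the BF and the posterior null probability for fixed π0 can be written in a few lines. This is our own sketch, not code from the paper:

```python
def posterior_null(bf: float, pi0: float) -> float:
    """Posterior Pr[H0|x] implied by a Bayes factor `bf` and prior Pr[H0] = pi0,
    via posterior odds = BF * prior odds, as in (2.1)."""
    return pi0 * bf / (pi0 * bf + (1.0 - pi0))

def bf_from_posterior(p: float, pi0: float) -> float:
    """Invert (2.1): the BF that yields posterior null probability p under prior pi0."""
    return (p / (1.0 - p)) * ((1.0 - pi0) / pi0)
```

A BF of 1 leaves the prior unchanged, and the two functions are inverses for any fixed π0, which is the 1-1 correspondence in the text.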

The main effort in a Bayesian hypothesis test lies in computing the marginal density of the data, for which various analytical and numerical approximations are available. The marginal density is Pr[x|Hj] = ∫f(x|θj,Hj)πj(θj|Hj)dθj, j = 0,1, where θj is the parameter vector under Hj, πj(θj|Hj) is its prior distribution under Hj, and f(x|θj,Hj) is the likelihood function under Hj (Kass and Raftery, 1995). The prior distribution of θ under the alternative is commonly specified as a symmetric distribution centered at the null (Berger and Sellke, 1987). In both of our examples, we use a normal prior centered at the null value of 0, with standard deviation based on prior relevant studies. The prior for a nuisance parameter might well be the same under the null and alternative hypotheses, as illustrated in our second example.

Because the posterior probability is sensitive to the choice of π0, many analysts prefer to summarize the data with the BF, which does not depend on π0. Jeffreys proposed BF < 1/3 as a threshold indicating moderate evidence for the alternative (Kass and Raftery, 1995). Although Bayesian tests are typically more conservative than frequentist tests (Edwards and others, 1963), increasingly so in large samples, they are not in general frequentist-calibrated (Kass and Raftery, 1995).

A natural way to calibrate a Bayesian test is to select a threshold for significance that achieves desired frequentist properties (Weiss, 1997). Thus, letting α be the target type I error rate, one must solve the equation

Pr[BF ≤ BF*|H0] = α

for BF*. For example, in testing a single normal mean, Weiss (1997) found that BF is a function of the sample mean under specific priors and obtained its α-quantile under the null accordingly. In our examples, we will compute the null distribution of BF by simulation.
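To make the simulation step concrete, here is a hedged sketch for a toy setting: a single normal mean with known variance and a N(0, τ²) prior on θ under H1. This toy model and all names are our own, not the paper's logistic-regression setup. The BF has a closed form, and the calibrated threshold BF* is the α-quantile of its simulated null distribution:

```python
import math
import random

def bf_normal_mean(xbar, n, sigma=1.0, tau=1.0):
    """BF for H0: theta = 0 vs H1: theta ~ N(0, tau^2), where
    xbar ~ N(theta, sigma^2/n): the ratio of the two marginal densities of xbar."""
    def normpdf(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    s2 = sigma ** 2 / n
    return normpdf(xbar, s2) / normpdf(xbar, s2 + tau ** 2)

def calibrate_bf_threshold(alpha, n, sigma=1.0, tau=1.0, reps=20000, seed=1):
    """Simulate the null distribution of BF and return its alpha-quantile,
    so that rejecting when BF <= BF* has type I error rate close to alpha."""
    rng = random.Random(seed)
    null_bfs = sorted(
        bf_normal_mean(rng.gauss(0.0, sigma / math.sqrt(n)), n, sigma, tau)
        for _ in range(reps)
    )
    return null_bfs[int(alpha * reps)]
```

Rejecting H0 whenever the observed BF falls at or below the returned threshold then attains approximately the nominal type I error rate under the null.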

In light of (2.1), for a fixed prior we can view BF as a function of the posterior probability, and hence a test based on BF is equivalent to a test based on the posterior probability.

2.2. Calibrating multiple tests

With multiple testing, we seek to calibrate the Bayesian test by controlling its type I error rate across a family of hypothesis tests. Specifically, we will construct a weighted Bonferroni test, based on the BF, that assigns a more forgiving threshold to tests where the alternative is more likely to be true.

We assume initially that there are m hypotheses partitioned into 2 classes: mC candidate and mN control hypotheses, mC + mN = m. The basis for this classification would be prior knowledge from pilot data or relevant studies in the literature. For example, in the pharmacogenetic study, the locations of the genetic markers and the relevance of their genes to the biological pathway of the treatment are known from prior studies. Markers located within genes that affect the pathway of the treatment are likely to exhibit an effect, whereas markers that lie elsewhere are not. An analysis plan that adjusts for multiplicity but treats all the hypotheses equally would be inefficient compared to one that gives greater prior probability to the alternatives that are more likely to be true. Our method groups the markers that are likely to be positive as candidates and those likely to be negative as controls.

In the basic science example, previous data may suggest that one component of a combination treatment is likely to exhibit an effect and another component is not. An experimenter would simultaneously test the candidate component (expected to have an effect), the noncandidate component (expected to have no effect), and their combination (expected to have the same effect as the candidate alone). Thus, we classify the components expected to have an effect as candidates and those expected to have no effect as controls.

To incorporate this information into the Bayesian test, we assign a smaller prior probability of the null to the candidate hypotheses (π0C) and a larger prior probability of the null to the control hypotheses (π0N), π0C < π0N. We define k = [π0N/(1 − π0N)]/[π0C/(1 − π0C)], the ratio of the prior odds for the null under the control and candidate hypotheses. For simplicity, we assume that the prior distribution of θ under the alternative is the same for candidate and control hypotheses, as illustrated in the examples.

Once we specify the candidate and control hypotheses and the priors, we calibrate the multiple Bayesian test by controlling its type I error over the family of tests. We start with a common threshold for the posterior probability, denoted as P*, and control the overall type I error rate at level α through the Bonferroni inequality:

∑i=1,…,mC Pr[Pr[HC(i)|x,π0C] ≤ P*|HC(i)] + ∑j=1,…,mN Pr[Pr[HN(j)|x,π0N] ≤ P*|HN(j)] ≤ α,  (2.2)

where HC(i) is the ith candidate null hypothesis, HN(j) is the jth control null hypothesis, and Pr[H|x,π0] is the posterior probability of hypothesis H, whose prior probability is π0. The posterior probability accommodates the difference between candidate and control hypotheses through the prior.

Alternatively, one can base the test on the BF. We propose to fix the threshold for BF for a candidate hypothesis (BF*) at k times the threshold for a control hypothesis (Appendix A), while we control the overall type I error rate by

∑i=1,…,mC Pr[BFC(i) ≤ BF*|HC(i)] + ∑j=1,…,mN Pr[BFN(j) ≤ BF*/k|HN(j)] ≤ α,  (2.3)

where BFC(i) is the BF for the ith candidate null hypothesis and BFN(j) is the BF for the jth control null hypothesis. We solve Inequality (2.3) for BF*, the threshold for the candidate hypotheses, which gives us BF*/k as the threshold for the control hypotheses. Because the thresholds differ, the type I error rates attributed to candidate and control hypotheses also differ.
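Numerically, (2.3) can be solved by bisection once one has a large sample from the null distribution of BF. The following sketch is our own helper with hypothetical names; it finds the largest BF* satisfying the inequality for an empirical null CDF:

```python
import bisect

def solve_mcbht_threshold(null_bfs, m_c, m_n, k, alpha, iters=60):
    """Find the largest b with m_c*G(b) + m_n*G(b/k) <= alpha, where G is the
    empirical null CDF of BF; b is the candidate threshold BF*, b/k the control one."""
    xs = sorted(null_bfs)
    n = len(xs)

    def cdf(t):  # empirical Pr[BF <= t | H0]
        return bisect.bisect_right(xs, t) / n

    lo, hi = 0.0, xs[-1]
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if m_c * cdf(mid) + m_n * cdf(mid / k) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo  # threshold for candidates; lo / k is the control threshold
```

With m_n = 0 and k = 1 this reduces to the single-test calibration of Section 2.1.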

By selecting k > 1, we can increase the power if the candidate alternatives are more often true. Reducing the type I error rates among the controls does not harm power if, as expected, the control nulls are true. How the type I error is distributed among the hypotheses is determined by the threshold ratio k, with a larger k implying a more liberal type I error rate for candidate hypotheses and hence greater opportunity for improvement in power.

3. EXAMPLE: TREATMENT–SNP INTERACTIONS IN SMOKING CESSATION

Heitjan and others (2007) sought to identify single nucleotide polymorphisms (SNPs) that modify the treatment effect of bupropion in smoking cessation. As part of a pharmacogenetic trial of bupropion versus placebo (Lerman and others, 2006), eligible smokers provided blood samples for DNA extraction and genotyping. Smoking status was recorded at the end of the treatment. As a follow-up to this trial (Heitjan and others, 2007), 59 SNPs coding for neuronal nicotinic acetylcholine receptors (nAChRs) were genotyped, along with 43 randomly selected SNPs to test for population stratification. The nAChR genes were believed a priori to contribute to smoking relapse and bupropion response.

Heitjan and others (2007) estimated logistic regression models predicting outcome from treatment, SNP, and the treatment–SNP interaction for each SNP in the panel. They assigned a probability π0 to the null hypothesis of zero interaction and assumed a normal prior distribution for the interaction coefficient under the alternative hypothesis, with the parameters in the prior based on results of past relevant studies. With no adjustment for multiplicity, the uncorrected BF identified 4 SNPs, and the likelihood ratio p value 7, as potentially pharmacogenetically active.

To illustrate our method, we restrict our analysis to the 6 SNPs on the CHRNA5 gene (the candidates) and the 43 randomly selected control SNPs. We set k = 20 and the significance level to 0.10. We solved for the threshold for BF from (2.3), with the null distribution of BF simulated from its analytical form (Appendix B).

Table 1 shows that only the most significant SNP by both p value and BF retains significance after multiplicity calibration. No SNP is significant under unweighted Bonferroni correction.

Table 1.

Results of the bupropion example including all the SNPs on CHRNA5 and all control markers; the threshold for BF in the MCBHT is 0.225

Class Gene rs number BF p value Decision (P, BF, Bonferroni, MCBHT)
Candidate CHRNA5 rs871058 0.191 0.019 1 1 0 1
rs601079 0.489 0.079 0 0 0 0
rs692780 0.701 0.161 0 0 0 0
rs514743 0.890 0.207 0 0 0 0
rs684513 1.585 0.670 0 0 0 0
rs637137 1.591 0.532 0 0 0 0
Control DIP2A rs2839290 0.647 0.208 0 0 0 0
MGC35440 rs741441 0.702 0.041 1 0 0 0
VGCNL1 rs638732 0.706 0.159 0 0 0 0
Unidentified rs2828759 0.721 0.055 0 0 0 0
Unidentified rs256875 0.784 0.176 0 0 0 0
Unidentified rs2750097 0.852 0.244 0 0 0 0
NFLA rs1909118 0.883 0.256 0 0 0 0
Unidentified rs1024766 0.924 0.281 0 0 0 0
UNC93A rs588981 0.946 0.196 0 0 0 0
LASS3 rs1910412 0.947 0.086 0 0 0 0
Unidentified rs907444 0.961 0.206 0 0 0 0
GRK7 rs1467200 1.043 0.306 0 0 0 0
AHCTF1 rs1691251 1.073 0.437 0 0 0 0
Unidentified rs2611611 1.131 0.320 0 0 0 0
EFCAB3 rs2009866 1.200 0.579 0 0 0 0
FMN1 rs1534596 1.210 0.378 0 0 0 0
Unidentified rs1468158 1.295 0.298 0 0 0 0
LOC728727 rs1330106 1.376 0.459 0 0 0 0
CCDC105 rs736737 1.400 0.429 0 0 0 0
ZC3H13 rs2031633 1.419 0.409 0 0 0 0
LOC152485 rs878451 1.464 0.540 0 0 0 0
KIAA1826 rs1939810 1.466 0.466 0 0 0 0
Unidentified rs2190184 1.489 0.521 0 0 0 0
Unidentified rs1365057 1.493 0.449 0 0 0 0
Unidentified rs2036943 1.522 0.624 0 0 0 0
NUDCD1 rs2054255 1.594 0.543 0 0 0 0
C20orf23 rs2208056 1.602 0.604 0 0 0 0
Unidentified rs136501 1.652 0.667 0 0 0 0
PHLPPL rs2052584 1.659 0.787 0 0 0 0
DNAJC10 rs288259 1.660 0.937 0 0 0 0
Unidentified rs829864 1.670 0.993 0 0 0 0
Unidentified rs1885423 1.678 0.978 0 0 0 0
Unidentified rs1906810 1.692 0.877 0 0 0 0
EPDR1 rs2598108 1.702 0.871 0 0 0 0
Unidentified rs719674 1.702 0.667 0 0 0 0
NFASC rs2802853 1.708 0.976 0 0 0 0
SMCR7 rs2605141 1.709 0.918 0 0 0 0
CEP110 rs1998505 1.712 0.987 0 0 0 0
UNC5CL rs2294693 1.713 0.737 0 0 0 0
ZNF445 rs1106499 1.724 0.862 0 0 0 0
ZFYVE27 rs946778 1.730 0.835 0 0 0 0
Unidentified rs1359719 1.735 0.992 0 0 0 0

We used simulation to evaluate the sizes and powers of the methods under consideration. We set the total number of SNPs to 49 and varied the number of nonnull SNPs (m1) and the number of candidate SNPs (mC). Table 2 presents the simulated type I error rates of each method for the setting m1 = 0 and mC = 6. All the multiplicity adjustment approaches controlled the type I error rate at the 0.05 level. An initial set of power simulations appears in Table 3. This time we set m1 = 6 and mC = 6, with all the nonnull SNPs correctly specified as candidates, and varied the effect size βI ∈ { − 0.5, 0.5, 1}. The results resemble those in Table S3 (supplementary material available at Biostatistics online): the power of MCBHT exceeds that of Bonferroni, reaching a plateau for large k.

Table 2.

Type I error rate (%) with 49 hypotheses, all null, assuming 6 candidate hypotheses and 43 controls

Method Type I error
p value 91.4
BF 72.6
Bonferroni 5.0
MCBHT k = 1 4.3
MCBHT k = 2 4.2
MCBHT k = 4 4.6
MCBHT k = 6 5.0
MCBHT k = 8 5.0
MCBHT k = 10 4.8
MCBHT k = 20 4.6

Table 3.

Power (%) with 49 hypotheses, 6 nonnull (all candidates), and 43 null (all controls)

Method βI = – 0.5 βI = 0.5 βI = 1
p value 19.5 19.8 56.5
BF 13.3 15.2 46.7
Bonferroni 1.5 1.7 13.6
MCBHT k = 1 1.5 1.9 13.6
MCBHT k = 2 2.5 2.9 18.2
MCBHT k = 4 3.8 4.3 22.8
MCBHT k = 6 4.5 5.0 25.1
MCBHT k = 8 4.8 5.3 26.5
MCBHT k = 10 5.1 5.6 27.3
MCBHT k = 20 5.6 6.2 29.6

We also illustrate the power of MCBHT for a range of values of m1 = mC, with all the nonnull SNPs correctly specified as candidates (Table 4). The power of MCBHT is always larger than that of Bonferroni. On average, power declines with m1 regardless of k.

Table 4.

Power (%) with 49 hypotheses, with varying numbers of candidate markers mC, all of them nonnull (m1 = mC), and all control markers null

Method m1 = 1 m1 = 2 m1 = 3 m1 = 4 m1 = 5
p value 21.2 20.1 19.3 20.8 17.9
BF 13.5 14.9 13.4 13.7 11.9
Bonferroni 2 1.6 1.7 1.8 1.5
MCBHT k = 1 1.9 1.8 1.7 1.8 1.3
MCBHT k = 2 2.9 3.1 2.7 2.8 2.2
MCBHT k = 4 5.2 5.3 4.2 4.1 3.5
MCBHT k = 6 6.9 6.8 5.2 5 4.1
MCBHT k = 10 9.2 8.3 6.7 5.9 4.8
MCBHT k = 20 12.9 10.7 8.1 7.2 5.4

Table 5.

Power (%) with 49 hypotheses, with a fraction of nonnulls incorrectly specified as controls

Method m1=1, PCN=0 m1=2, PCN=1/2 m1=2, PCN=0 m1=3, PCN=2/3 m1=3, PCN=1/3 m1=3, PCN=0
p value 11.7 20.3 21.6 17.5 21.1 18.4
BF 5.6 14.4 13.6 12.4 15.3 15.7
Bonferroni 0.5 2.2 2.1 1.7 1.7 1.4
MCBHT k = 1 0.2 2.2 2.1 1.7 1.9 1.8
MCBHT k = 2 0.2 2.9 1.8 1.9 2.1 1.6
MCBHT k = 4 0.2 3.5 1.4 2.2 2.4 1.3
MCBHT k = 6 0.2 3.5 1.1 2.4 2.6 1.1
MCBHT k = 8 0.1 3.5 1.0 2.5 2.6 1.0
MCBHT k = 10 0.1 3.6 0.8 2.4 2.4 0.9
MCBHT k = 20 0.0 4.0 0.5 2.6 2.6 0.6

Finally, we considered the situation where m1 < mC = 6 and some nonnull SNPs are misspecified as controls. We quantify a priori validity as the proportion of candidates among the nonnull markers (PCN). The lower the fraction of nonnulls specified as candidates, the worse the power of MCBHT (Table 5). Thus, the value of MCBHT depends on the user's ability to identify the nonnull markers as candidates.

4. DISCUSSION

Incorporating prior information can raise the power of frequentist multiplicity-adjusted tests, rendering them useful as screening tools when the number of tests is moderate to large. We propose a multiplicity-calibrated Bayesian hypothesis test that assigns a separate prior probability to each null hypothesis and controls for multiplicity through a weighted Bonferroni adjustment. Simulations demonstrate that our method increases the power if the prior information reflects the true state of nature. One could also use our method to calibrate other p value-based multiplicity adjustment procedures such as step-down tests (Westfall and Young, 1993) and tests that control the false discovery rate (FDR; Benjamini and Hochberg, 1995).

Genovese and others (2006) proposed a similar approach, controlling the FDR at a prespecified level by assigning to each p value a weight associated with the probability that its null hypothesis is false. Chen and Sarkar (2005) proposed incorporating the uncertainty in both the parameter and the data by averaging the FDR over the parameter space. Our method accomplishes the calibration by basing the tests on the BF, adjusting its critical value until the desired error rate is achieved.

Although we illustrated our method in the scenario of two classes of hypotheses—candidate and control—the idea is more general, in that each null hypothesis can have a unique prior probability. As in many applications of Bayesian analysis, the choice of prior can be challenging. Our method for the 2-class case only requires specification of k, the ratio of the prior odds of the null under the control (null is more likely) and candidate (null is less likely) classes. Because our method is based on BF, there is no need to specify the exact prior probability for the null. When k = 1, our method is similar to an unweighted Bonferroni test. When k is large, for example, k > 10 in our examples, the method essentially eliminates the control tests and becomes a weighted Bonferroni test of the candidate null hypotheses.

In pharmacogenetic applications, one might ultimately seek to identify a best model for predicting outcome from treatment, an array of genetic markers, and treatment-by-marker (and even marker-by-marker) interactions. Information on relevant biological pathways involving the treatment effect could be used to further inform the selection of priors. In that context, the test described here would serve as a screening procedure to select a subset of markers for a more comprehensive model selection exercise.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

FUNDING

The United States Public Health Service (P50 CA084718 from the National Cancer Institute and National Institute on Drug Abuse; R01 CA063562 and R01 CA116723 from the National Cancer Institute; P20 RR020741 from the National Center for Research Resources).


Acknowledgments

We are grateful to Caryn Lerman, Jinbo Chen, Edward George, and Stephen Kimmel for helpful comments and support. Conflict of Interest: None declared.

APPENDIX A: 1-1 POSTERIOR PROBABILITY AND BF

Assume that there are m null hypotheses H1,…,Hm, among which mC are candidate hypotheses and mN are controls. We assign a prior probability for the null of π0C to the candidate hypotheses and π0N to the controls, with k = [π0N/(1 − π0N)]/[π0C/(1 − π0C)]. For each null hypothesis Hi, we compute its posterior probability Pr[Hi|x] and its BF, BFi. Rejecting Hi when the posterior probability of the null falls below a common threshold P* is equivalent to rejecting candidates when BFi ≤ BFC* and controls when BFi ≤ BFN*, by the following deduction. From (2.1),

Pr[Hi|x] ≤ P* ⟺ π0BFi/(π0BFi + 1 − π0) ≤ P* ⟺ BFi ≤ [P*/(1 − P*)][(1 − π0)/π0],

so that BFC* = [P*/(1 − P*)][(1 − π0C)/π0C] and BFN* = [P*/(1 − P*)][(1 − π0N)/π0N]. Because

BFC*/BFN* = [(1 − π0C)/π0C]/[(1 − π0N)/π0N] = [π0N/(1 − π0N)]/[π0C/(1 − π0C)] = k,

the BF threshold for candidate hypotheses is k times the BF threshold for controls.
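The threshold-ratio identity is easy to verify numerically. A quick sketch, with illustrative values of our own choosing:

```python
def bf_threshold(p_star, pi0):
    """BF cutoff equivalent to posterior Pr[H0|x] <= P* under prior pi0, from (2.1)."""
    return (p_star / (1.0 - p_star)) * ((1.0 - pi0) / pi0)

pi0_c, pi0_n, p_star = 0.5, 0.95, 0.2              # candidate and control null priors
k = (pi0_n / (1 - pi0_n)) / (pi0_c / (1 - pi0_c))  # ratio of prior odds for the null
ratio = bf_threshold(p_star, pi0_c) / bf_threshold(p_star, pi0_n)
# ratio equals k, whatever the common posterior threshold P*
```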

APPENDIX B: ASYMPTOTIC APPROXIMATION OF BF

In Heitjan and others (2007), we developed an asymptotic approximation to compute the BF for the test of a treatment-by-SNP interaction with a binary outcome variable. Using the notation of that paper, we apply a second-order Taylor expansion to the log-likelihood of β around its maximum likelihood estimate (MLE) β̂, which gives

f(x|β) ≈ f(x|β̂)exp{−(β − β̂)′Σ̂⁻¹(β − β̂)/2},

where Σ̂ is the inverse of the observed information matrix of the likelihood evaluated at β̂. Plugging the approximation into m0(x) and integrating out β gives

m0(x) ≈ f(x|β̂0)(2π)^{d0/2}|Σ̂0|^{1/2}φ(β̂0; μ0, Σ̂0 + Σ0),

where β0 = β0TG is the parameter (of dimension d0) under the null model, β̂0 and Σ̂0 are the MLE and estimated variance matrix for β0TG, φ(·; μ0, Σ0) denotes the multivariate normal density with mean μ0 and variance matrix Σ0, and the constants μ0 and Σ0 are the prior mean and variance for β0TG. Similarly, m1(x) ≈ f(x|β̂1)(2π)^{d1/2}|Σ̂1|^{1/2}φ(β̂1; μ1, Σ̂1 + Σ1), where β1 = β0TGI, β̂1 and Σ̂1 are the MLE and estimated variance matrix for β0TGI, and the constants μ1 and Σ1 are the prior mean and variance for β0TGI. Then,

BF = m0(x)/m1(x) ≈ [f(x|β̂0)(2π)^{d0/2}|Σ̂0|^{1/2}φ(β̂0; μ0, Σ̂0 + Σ0)]/[f(x|β̂1)(2π)^{d1/2}|Σ̂1|^{1/2}φ(β̂1; μ1, Σ̂1 + Σ1)],

which one can show to have the same order of error, O(n⁻¹), as Laplace's method under certain regularity conditions (Wang and George, 2004).
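In the scalar conjugate-normal case, the quadratic expansion of the log-likelihood is exact, so a Laplace-type approximate marginal of the form m(x) ≈ f(x|β̂)(2π)^{d/2}|Σ̂|^{1/2}φ(β̂; μ, Σ̂ + Σ) coincides with the closed-form marginal. This gives a convenient sanity check; the sketch below uses our own notation and is not code from the paper:

```python
import math

def normpdf(x, mean, var):
    """Normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def approx_marginal(f_at_mle, mle, var_hat, mu, tau2):
    """Laplace-type approximation of the marginal density m(x), scalar case:
    f(x|bhat) * (2*pi*var_hat)^(1/2) * N(bhat; mu, var_hat + tau2)."""
    return f_at_mle * math.sqrt(2.0 * math.pi * var_hat) * normpdf(mle, mu, var_hat + tau2)

# Conjugate normal check: x ~ N(beta, v) with prior beta ~ N(mu, tau2)
# has exact marginal m(x) = N(x; mu, v + tau2); here the MLE is x itself.
x, v, mu, tau2 = 0.7, 0.2, 0.0, 1.0
approx = approx_marginal(normpdf(x, x, v), x, v, mu, tau2)
exact = normpdf(x, mu, v + tau2)
```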

References

  1. Bartlett MS. A comment on D. V. Lindley's statistical paradox. Biometrika. 1957;44:533–534.
  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  3. Benjamini Y, Hochberg Y. Multiple hypotheses testing with weights. Scandinavian Journal of Statistics. 1997;24:407–418.
  4. Berger JO. Statistical Decision Theory and Bayesian Analysis. New York: Springer; 1985.
  5. Berger JO, Berry DA. Statistical analysis and the illusion of objectivity. American Scientist. 1988;76:159–165.
  6. Berger JO, Sellke T. Testing a point null hypothesis: the irreconcilability of p values and evidence. Journal of the American Statistical Association. 1987;82:112–122.
  7. Berry DA, Hochberg Y. Bayesian perspectives on multiple comparisons. Journal of Statistical Planning and Inference. 1999;82:215–227.
  8. Chen J, Sarkar SK. A Bayesian determination of threshold for identifying differentially expressed genes in microarray experiments. Statistics in Medicine. 2005;25:3174–3189.
  9. Diamond GA, Forrester JS. Clinical trials and statistical verdicts: probable grounds for appeal. Annals of Internal Medicine. 1983;98:385–394.
  10. Edwards W, Lindman H, Savage LJ. Bayesian statistical inference for psychological research. Psychological Review. 1963;70:193–242.
  11. Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509.
  12. Gönen M, Johnson WO, Lu Y, Westfall PH. The Bayesian two-sample t test. The American Statistician. 2005;59:252–257.
  13. Heitjan DF, Guo M, Ray R, Wileyto EP, Epstein LH, Lerman C. Identification of pharmacogenetic markers in smoking cessation therapy. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2007;147B:712–719.
  14. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800–802.
  15. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979;6:65–70.
  16. Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383–386.
  17. Jeffreys H. Theory of Probability. Oxford: Oxford University Press; 1961.
  18. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
  19. Lerman C, Jepson C, Wileyto EP, Epstein LH, Rukstalis M, Patterson F, Kaufmann V, Restine S, Hawk L, Niaura R, et al. Role of functional genetic variation in the dopamine D2 receptor (DRD2) in response to bupropion and nicotine replacement therapy for tobacco dependence: results of two randomized clinical trials. Neuropsychopharmacology. 2006;31:231–242.
  20. Lindley DV. A statistical paradox. Biometrika. 1957;44:187–192.
  21. Manni A, Khin S, Biser N, English H, Badger B, Martel J, Demers L. Synchronization of breast cancer cell proliferation in vivo by combined hormonal and polyamine manipulation. Cancer Research. 1992;52:5720–5724.
  22. Shaffer JP. Multiple hypothesis testing. Annual Review of Psychology. 1995;46:561–584.
  23. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–754.
  24. Wang X, George EI. A hierarchical Bayes approach to variable selection for generalized linear models. Technical Report SMU-TR-321. Dallas, TX: Department of Statistics, Southern Methodist University; 2004.
  25. Weiss R. Bayesian sample size calculations for hypothesis testing. Journal of the Royal Statistical Society, Series D: The Statistician. 1997;46:185–191.
  26. Westfall PH, Johnson WO, Utts JM. A Bayesian perspective on the Bonferroni adjustment. Biometrika. 1997;84:419–427.
  27. Westfall PH, Young SS. Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Hoboken, NJ: Wiley-Interscience; 1993.

