Biostatistics (Oxford, England)
. 2010 Jul;11(3):473–483. doi: 10.1093/biostatistics/kxq012

Multiplicity-calibrated Bayesian hypothesis tests

Mengye Guo 1,2,*, Daniel F Heitjan 1,2
PMCID: PMC2912702  PMID: 20212321

Abstract

When testing multiple hypotheses simultaneously, there is a need to adjust the levels of the individual tests to effect control of the family-wise error rate (FWER). Standard frequentist adjustments control the error rate but are typically both conservative and oblivious to prior information. We propose a Bayesian testing approach—multiplicity-calibrated Bayesian hypothesis testing—that sets individual critical values to reflect prior information while controlling the FWER via the Bonferroni inequality. If the prior information is specified correctly, in the sense that those null hypotheses considered most likely to be false in fact are false, the power of our method is substantially greater than that of standard frequentist approaches. We illustrate our method using data from a pharmacogenetic trial and a preclinical cancer study. We demonstrate its error rate control and power advantage by simulation.

Keywords: Bayes factor, Bonferroni inequality, Frequentist calibration, Multiplicity

1. INTRODUCTION

When testing multiple hypotheses simultaneously, failure to adjust for the multiplicity of tests will lead to an inflated family-wise error rate (FWER) or probability of incorrectly rejecting at least one hypothesis. The problem can occur in clinical trials where there are multiple outcome variables, genetic studies in which large numbers of markers are to be tested for association with a key phenotype and basic science experiments with multiple control groups.

Standard procedures to address this problem include the Bonferroni correction and its variants (Shaffer, 1995). Assume that there are m null hypotheses, H1,…,Hm, whose tests lead to individual p values P1,…,Pm. The unweighted Bonferroni procedure rejects any hypothesis Hi for which Pi ≤ α/m. This method controls the FWER through the Bonferroni inequality: Pr[∪i{Pi ≤ α/m}|all Hi true] ≤ α. Simes (1986) proposed assigning individual thresholds to ordered p values to test all the hypotheses simultaneously. Hommel (1988) and Hochberg (1988) extended Simes's procedure to individual hypotheses to derive more powerful tests. Westfall and Young (1993) proposed a step-down adjusted p value approach that takes into account the dependence structure of the hypotheses.

Criticisms of p value-based procedures include their incompatibility with the likelihood principle (Berger and Berry, 1988), overweighting of evidence against the null (Berger and Sellke, 1987), conflation of evidence against the null with evidence for the alternative, widespread misinterpretation (Diamond and Forrester, 1983), peculiar behavior such as Lindley's paradox (Lindley, 1957; Bartlett, 1957), and failure to systematically incorporate prior information. Practically, perhaps the most damning critique is their overconservatism, which becomes acute when the number of hypotheses is large (Hochberg, 1988).

An alternative approach is Bayesian testing, in which one assigns prior probabilities to the null and alternative hypotheses and computes their posterior probabilities by Bayes's Theorem. A key measure of evidence is the Bayes factor (BF) defined as the ratio of the posterior to prior odds for the null (Berger, 1985; Kass and Raftery, 1995). Bayesian testing obeys the likelihood principle because the data exert their influence only through the likelihood. The posterior probabilities measure the evidence for the null and alternative directly and hence have a simple, intuitive interpretation (Berger and Sellke, 1987). Moreover, Bayesian testing automatically incorporates prior information. Ironically, this is often portrayed as a disadvantage, in that sensitivity to the prior is seen as a failure to exhibit “objectivity.”

The usual Bayesian approach to adjusting for multiplicity works through the prior: either ranking the BFs and multiplying each by a factor that reflects the current prior (Jeffreys, 1961), or using a hierarchical prior that shrinks the effects being tested toward a common mean (Westfall and others, 1997; Berry and Hochberg, 1999). Such methods adjust for multiplicity without necessarily effecting a frequentist calibration.

The frequentist approaches cited above treat all null hypotheses equally, whereas in practice all null hypotheses may not be equally likely to be true. For example, markers on genes that affect the biological pathway of a treatment are more likely to be pharmacogenetically active than randomly sampled markers. It seems reasonable to use such prior information as the basis of a multiplicity adjustment strategy. There have been some attempts along these lines from the frequentist perspective: Holm (1979) suggested controlling the FWER while assigning larger weights to p values whose null hypotheses are more likely to be false, and Benjamini and Hochberg (1997) used weights in a loss function to indicate the importance of each hypothesis. By comparison, a Bayesian analysis automatically includes prior information.

We propose here a multiplicity-calibrated Bayesian hypothesis test (MCBHT) that controls the overall type I error using the Bonferroni inequality and increases the power for likely alternatives by selection of priors. We discuss in detail a special case of our method that places the hypotheses into 2 classes—candidate and control—attaching greater prior weight to the alternatives for candidate tests. We then distribute the overall type I error probability among all the hypotheses by assigning a lower threshold to the candidate hypotheses. In this way, the method increases the power as long as our conjectured identification of candidate tests is correct.

We illustrate MCBHT with 2 examples. The first is a pharmacogenetic trial that sought to identify genetic markers that modify the treatment effect of bupropion among smokers attempting to quit (Heitjan and others, 2007). The markers fell into 2 classes: those selected because they reside on genes associated with the action of nicotine, and those randomly selected from throughout the genome for use in testing for population stratification. The second example comes from a preclinical study to assess the effect of combined hormonal and polyamine manipulation on breast cancer cell proliferation in vivo. This was a 3-arm trial with 2 endpoints (Manni and others, 1992); the hypotheses included one of primary interest together with 2 negative controls.

The article is organized as follows: In Section 2, we describe the calibrated Bayesian hypothesis testing method. Section 3 presents the pharmacogenetic example and supporting simulation studies. Section 4 presents conclusions and discussion. In the supplementary material (available at Biostatistics online), we present the preclinical example and the corresponding simulations.

2. CALIBRATED BAYESIAN HYPOTHESIS TESTS

2.1. Calibrating a single test

Suppose we want to test a simple null hypothesis H0:θ = 0 versus a composite alternative H1:θ≠0. The Bayesian hypothesis test starts by setting the prior probability for H0 (H1) to be π0 (π1 = 1 − π0) (Berger, 1985). The posterior probability of Hj is

Pr[Hj|x] = πjPr[x|Hj]/(π0Pr[x|H0] + π1Pr[x|H1]),  j = 0,1,

where x is the observed data and Pr[x|Hj] is the marginal density of the data under Hj. The BF in favor of H0 is

BF = (Pr[H0|x]/Pr[H1|x])/(π0/π1) = Pr[x|H0]/Pr[x|H1].

Smaller values of BF indicate greater support for the alternative. Because

Pr[H0|x] = π0BF/(π0BF + 1 − π0),  (2.1)

there is a 1-1 correspondence between the posterior probability and the BF when π0 is specified.
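As a concrete illustration of (2.1), the map between the BF and the posterior null probability for fixed π0 can be written in a few lines. This is our own sketch, not code from the paper:

```python
def posterior_null(bf: float, pi0: float) -> float:
    """Posterior Pr[H0|x] implied by a Bayes factor `bf` and prior Pr[H0] = pi0,
    via posterior odds = BF * prior odds, as in (2.1)."""
    return pi0 * bf / (pi0 * bf + (1.0 - pi0))

def bf_from_posterior(p: float, pi0: float) -> float:
    """Invert (2.1): the BF that yields posterior null probability p under prior pi0."""
    return (p / (1.0 - p)) * ((1.0 - pi0) / pi0)
```

A BF of 1 leaves the prior unchanged, and the two functions are inverses for any fixed π0, which is the 1-1 correspondence in the text.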

The main effort in a Bayesian hypothesis test lies in computing the marginal density of the data, for which various analytical and numerical approximations are available. The marginal density is Pr[x|Hj] = ∫f(x|θj,Hj)πj(θj|Hj)dθj, j = 0,1, where θj is the parameter vector under Hj, πj(θj|Hj) is its prior distribution under Hj, and f(x|θj,Hj) is the likelihood function under Hj (Kass and Raftery, 1995). The prior distribution of θ under the alternative is commonly specified as a symmetric distribution centered at the null (Berger and Sellke, 1987). In both of our examples, we use a normal prior centered at the null value of 0, with standard deviation based on prior relevant studies. The prior for a nuisance parameter might well be the same under the null and alternative hypotheses, as illustrated in our second example.

Because the posterior probability is sensitive to the choice of π0, many analysts prefer to summarize the data with the BF, which does not depend on π0. Jeffreys proposed BF < 1/3 as a threshold indicating moderate evidence for the alternative (Kass and Raftery, 1995). Although Bayesian tests are typically more conservative than frequentist tests (Edwards and others, 1963), increasingly so in large samples, they are not in general frequentist-calibrated (Kass and Raftery, 1995).

A natural way to calibrate a Bayesian test is to select a threshold for significance that achieves desired frequentist properties (Weiss, 1997). Thus, letting α be the target type I error rate, one must solve the equation

Pr[BF ≤ BF*|H0] = α

for BF*. For example, in testing a single normal mean, Weiss (1997) found that BF is a function of the sample mean under specific priors and obtained its α-quantile under the null accordingly. In our examples, we will compute the null distribution of BF by simulation.
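To make the simulation step concrete, here is a hedged sketch for a toy setting: a single normal mean with known variance and a N(0, τ²) prior on θ under H1. This toy model and all names are our own, not the paper's logistic-regression setup. The BF has a closed form, and the calibrated threshold BF* is the α-quantile of its simulated null distribution:

```python
import math
import random

def bf_normal_mean(xbar, n, sigma=1.0, tau=1.0):
    """BF for H0: theta = 0 vs H1: theta ~ N(0, tau^2), where
    xbar ~ N(theta, sigma^2/n): the ratio of the two marginal densities of xbar."""
    def normpdf(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    s2 = sigma ** 2 / n
    return normpdf(xbar, s2) / normpdf(xbar, s2 + tau ** 2)

def calibrate_bf_threshold(alpha, n, sigma=1.0, tau=1.0, reps=20000, seed=1):
    """Simulate the null distribution of BF and return its alpha-quantile,
    so that rejecting when BF <= BF* has type I error rate close to alpha."""
    rng = random.Random(seed)
    null_bfs = sorted(
        bf_normal_mean(rng.gauss(0.0, sigma / math.sqrt(n)), n, sigma, tau)
        for _ in range(reps)
    )
    return null_bfs[int(alpha * reps)]
```

Rejecting H0 whenever the observed BF falls at or below the returned threshold then attains approximately the nominal type I error rate under the null.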

In light of (2.1), for a fixed prior we can view BF as a function of the posterior probability, and hence a test based on BF is equivalent to a test based on the posterior probability.

2.2. Calibrating multiple tests

With multiple testing, we seek to calibrate the Bayesian test by controlling its type I error rate across a family of hypothesis tests. Specifically, we will construct a weighted Bonferroni test, based on the BF, that assigns a more forgiving threshold to tests where the alternative is more likely to be true.

We assume initially that there are m hypotheses partitioned into 2 classes: mC candidate and mN control hypotheses, mC + mN = m. The basis for this classification would be prior knowledge from pilot data or relevant studies in the literature. For example, in the pharmacogenetic study, the locations of the genetic markers and the relevance of their genes to the biological pathway of the treatment are known from prior studies. Markers located within genes that affect the pathway of the treatment are likely to exhibit an effect, whereas markers that lie elsewhere are not. An analysis plan that adjusts for multiplicity but treats all the hypotheses equally would be inefficient compared to one that gives greater prior probability to the alternatives that are more likely to be true. Our method groups the markers that are likely to be positive as candidates and those likely to be negative as controls.

In the basic science example, previous data may suggest that one component of a combination treatment is likely to exhibit an effect and another component is not. An experimenter would simultaneously test the candidate component (expected to have an effect), the noncandidate component (expected to have no effect), and their combination (expected to have the same effect as the candidate alone). Thus, we classify the components expected to have an effect as candidates and those expected to have no effect as controls.

To incorporate this information into the Bayesian test, we assign a smaller prior probability of the null to the candidate hypotheses (π0C) and a larger prior probability of the null to the control hypotheses (π0N), π0C < π0N. We define k = [π0N/(1 − π0N)]/[π0C/(1 − π0C)], the ratio of the prior odds for the null under the control and candidate hypotheses. For simplicity, we assume that the prior distribution of θ under the alternative is the same for candidate and control hypotheses, as illustrated in the examples.

Once we specify the candidate and control hypotheses and the priors, we calibrate the multiple Bayesian test by controlling its type I error over the family of tests. We start with a common threshold for the posterior probability, denoted as P*, and control the overall type I error rate at level α through the Bonferroni inequality:

∑i=1,…,mC Pr[Pr[HC(i)|x,π0C] ≤ P*|HC(i)] + ∑j=1,…,mN Pr[Pr[HN(j)|x,π0N] ≤ P*|HN(j)] ≤ α,  (2.2)

where HC(i) is the ith candidate null hypothesis, HN(j) is the jth control null hypothesis, and Pr[H|x,π0] is the posterior probability of hypothesis H, whose prior probability is π0. The posterior probability accommodates the difference between candidate and control hypotheses through the prior.

Alternatively, one can base the test on the BF. We propose to fix the threshold for BF for a candidate hypothesis (BF*) at k times the threshold for a control hypothesis (Appendix A), while we control the overall type I error rate by

∑i=1,…,mC Pr[BFC(i) ≤ BF*|HC(i)] + ∑j=1,…,mN Pr[BFN(j) ≤ BF*/k|HN(j)] ≤ α,  (2.3)

where BFC(i) is the BF for the ith candidate null hypothesis and BFN(j) is the BF for the jth control null hypothesis. We solve Inequality (2.3) for BF*, the threshold for the candidate hypotheses, which gives us BF*/k as the threshold for the control hypotheses. Because the thresholds differ, the type I error rates attributed to candidate and control hypotheses also differ.
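Numerically, (2.3) can be solved by bisection once one has a large sample from the null distribution of BF. The following sketch is our own helper with hypothetical names; it finds the largest BF* satisfying the inequality for an empirical null CDF:

```python
import bisect

def solve_mcbht_threshold(null_bfs, m_c, m_n, k, alpha, iters=60):
    """Find the largest b with m_c*G(b) + m_n*G(b/k) <= alpha, where G is the
    empirical null CDF of BF; b is the candidate threshold BF*, b/k the control one."""
    xs = sorted(null_bfs)
    n = len(xs)

    def cdf(t):  # empirical Pr[BF <= t | H0]
        return bisect.bisect_right(xs, t) / n

    lo, hi = 0.0, xs[-1]
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if m_c * cdf(mid) + m_n * cdf(mid / k) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo  # threshold for candidates; lo / k is the control threshold
```

With m_n = 0 and k = 1 this reduces to the single-test calibration of Section 2.1.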

By selecting k > 1, we can increase the power if the candidate alternatives are more often true. Reducing the type I error rates among the controls does not harm power if, as expected, the control nulls are true. How the type I error is distributed among the hypotheses is determined by the threshold ratio k, with a larger k implying a more liberal type I error rate for candidate hypotheses and hence greater opportunity for improvement in power.

3. EXAMPLE: TREATMENT–SNP INTERACTIONS IN SMOKING CESSATION

Heitjan and others (2007) sought to identify single nucleotide polymorphisms (SNPs) that modify the treatment effect of bupropion in smoking cessation. As part of a pharmacogenetic trial of bupropion versus placebo (Lerman and others, 2006), eligible smokers provided blood samples for DNA extraction and genotyping. Smoking status was recorded at the end of the treatment. As a follow-up to this trial (Heitjan and others, 2007), 59 SNPs coding for neuronal nicotinic acetylcholine receptors (nAChRs) were genotyped, along with 43 randomly selected SNPs to test for population stratification. The nAChR genes were believed a priori to contribute to smoking relapse and bupropion response.

Heitjan and others (2007) estimated logistic regression models predicting outcome from treatment, SNP, and the treatment–SNP interaction for each SNP in the panel. They assigned a probability π0 to the null hypothesis of zero interaction and assumed a normal prior distribution for the interaction coefficient under the alternative hypothesis, with the parameters in the prior based on results of past relevant studies. With no adjustment for multiplicity, the uncorrected BF identified 4 SNPs, and the likelihood ratio p value 7, as potentially pharmacogenetically active.

To illustrate our method, we restrict our analysis to the 6 SNPs on the CHRNA5 gene (the candidates) and the 43 randomly selected control SNPs. We set k = 20 and the significance level to 0.10. We solved for the threshold for BF from (2.3), with the null distribution of BF simulated from its analytical form (Appendix B).

Table 1 shows that only the most significant SNP by both p value and BF retains significance after multiplicity calibration. No SNP is significant under unweighted Bonferroni correction.

Table 1.

Results of the bupropion example including all the SNPs on CHRNA5 and all control markers; the threshold for BF in the MCBHT is 0.225

Class Gene rs number BF p value Decision (P, BF, Bonferroni, MCBHT)
Candidate CHRNA5 rs871058 0.191 0.019 1 1 0 1
rs601079 0.489 0.079 0 0 0 0
rs692780 0.701 0.161 0 0 0 0
rs514743 0.890 0.207 0 0 0 0
rs684513 1.585 0.670 0 0 0 0
rs637137 1.591 0.532 0 0 0 0
Control DIP2A rs2839290 0.647 0.208 0 0 0 0
MGC35440 rs741441 0.702 0.041 1 0 0 0
VGCNL1 rs638732 0.706 0.159 0 0 0 0
Unidentified rs2828759 0.721 0.055 0 0 0 0
Unidentified rs256875 0.784 0.176 0 0 0 0
Unidentified rs2750097 0.852 0.244 0 0 0 0
NFLA rs1909118 0.883 0.256 0 0 0 0
Unidentified rs1024766 0.924 0.281 0 0 0 0
UNC93A rs588981 0.946 0.196 0 0 0 0
LASS3 rs1910412 0.947 0.086 0 0 0 0
Unidentified rs907444 0.961 0.206 0 0 0 0
GRK7 rs1467200 1.043 0.306 0 0 0 0
AHCTF1 rs1691251 1.073 0.437 0 0 0 0
Unidentified rs2611611 1.131 0.320 0 0 0 0
EFCAB3 rs2009866 1.200 0.579 0 0 0 0
FMN1 rs1534596 1.210 0.378 0 0 0 0
Unidentified rs1468158 1.295 0.298 0 0 0 0
LOC728727 rs1330106 1.376 0.459 0 0 0 0
CCDC105 rs736737 1.400 0.429 0 0 0 0
ZC3H13 rs2031633 1.419 0.409 0 0 0 0
LOC152485 rs878451 1.464 0.540 0 0 0 0
KIAA1826 rs1939810 1.466 0.466 0 0 0 0
Unidentified rs2190184 1.489 0.521 0 0 0 0
Unidentified rs1365057 1.493 0.449 0 0 0 0
Unidentified rs2036943 1.522 0.624 0 0 0 0
NUDCD1 rs2054255 1.594 0.543 0 0 0 0
C20orf23 rs2208056 1.602 0.604 0 0 0 0
Unidentified rs136501 1.652 0.667 0 0 0 0
PHLPPL rs2052584 1.659 0.787 0 0 0 0
DNAJC10 rs288259 1.660 0.937 0 0 0 0
Unidentified rs829864 1.670 0.993 0 0 0 0
Unidentified rs1885423 1.678 0.978 0 0 0 0
Unidentified rs1906810 1.692 0.877 0 0 0 0
EPDR1 rs2598108 1.702 0.871 0 0 0 0
Unidentified rs719674 1.702 0.667 0 0 0 0
NFASC rs2802853 1.708 0.976 0 0 0 0
SMCR7 rs2605141 1.709 0.918 0 0 0 0
CEP110 rs1998505 1.712 0.987 0 0 0 0
UNC5CL rs2294693 1.713 0.737 0 0 0 0
ZNF445 rs1106499 1.724 0.862 0 0 0 0
ZFYVE27 rs946778 1.730 0.835 0 0 0 0
Unidentified rs1359719 1.735 0.992 0 0 0 0

We used simulation to evaluate the sizes and powers of the methods under consideration. We set the total number of SNPs to 49 and varied the number of nonnull SNPs (m1) and the number of candidate SNPs (mC). Table 2 presents the simulated type I error rates of each method for the setting m1 = 0 and mC = 6. All the multiplicity adjustment approaches controlled the type I error rate at the 0.05 level. An initial set of power simulations appears in Table 3. This time we set m1 = 6 and mC = 6, with all the nonnull SNPs correctly specified as candidates, and varied the effect size βI ∈ { − 0.5, 0.5, 1}. The results resemble those in Table S3 (supplementary material available at Biostatistics online): the power of MCBHT exceeds that of Bonferroni, reaching a plateau for large k.

Table 2.

Type I error rate (%) with 49 hypotheses, all null, assuming 6 candidate hypotheses and 43 controls

Method Type I error
p value 91.4
BF 72.6
Bonferroni 5.0
MCBHT k = 1 4.3
MCBHT k = 2 4.2
MCBHT k = 4 4.6
MCBHT k = 6 5.0
MCBHT k = 8 5.0
MCBHT k = 10 4.8
MCBHT k = 20 4.6

Table 3.

Power (%) with 49 hypotheses, 6 nonnull (all candidates), and 43 null (all controls)

Method βI = – 0.5 βI = 0.5 βI = 1
p value 19.5 19.8 56.5
BF 13.3 15.2 46.7
Bonferroni 1.5 1.7 13.6
MCBHT k = 1 1.5 1.9 13.6
MCBHT k = 2 2.5 2.9 18.2
MCBHT k = 4 3.8 4.3 22.8
MCBHT k = 6 4.5 5.0 25.1
MCBHT k = 8 4.8 5.3 26.5
MCBHT k = 10 5.1 5.6 27.3
MCBHT k = 20 5.6 6.2 29.6

We also illustrate the power of MCBHT for a range of values of m1 = mC, with all the nonnull SNPs correctly specified as candidates (Table 4). The power of MCBHT is always larger than that of Bonferroni. On average, power declines with m1 regardless of k.

Table 4.

Power (%) with 49 hypotheses, with varying numbers of candidate markers mC, all of them nonnull (m1 = mC), and all control markers null

Method m1 = 1 m1 = 2 m1 = 3 m1 = 4 m1 = 5
p value 21.2 20.1 19.3 20.8 17.9
BF 13.5 14.9 13.4 13.7 11.9
Bonferroni 2 1.6 1.7 1.8 1.5
MCBHT k = 1 1.9 1.8 1.7 1.8 1.3
MCBHT k = 2 2.9 3.1 2.7 2.8 2.2
MCBHT k = 4 5.2 5.3 4.2 4.1 3.5
MCBHT k = 6 6.9 6.8 5.2 5 4.1
MCBHT k = 10 9.2 8.3 6.7 5.9 4.8
MCBHT k = 20 12.9 10.7 8.1 7.2 5.4

Table 5.

Power (%) with 49 hypotheses, with a fraction of nonnulls incorrectly specified as controls

Method m1=1, PCN=0 m1=2, PCN=1/2 m1=2, PCN=0 m1=3, PCN=2/3 m1=3, PCN=1/3 m1=3, PCN=0
p value 11.7 20.3 21.6 17.5 21.1 18.4
BF 5.6 14.4 13.6 12.4 15.3 15.7
Bonferroni 0.5 2.2 2.1 1.7 1.7 1.4
MCBHT k = 1 0.2 2.2 2.1 1.7 1.9 1.8
MCBHT k = 2 0.2 2.9 1.8 1.9 2.1 1.6
MCBHT k = 4 0.2 3.5 1.4 2.2 2.4 1.3
MCBHT k = 6 0.2 3.5 1.1 2.4 2.6 1.1
MCBHT k = 8 0.1 3.5 1.0 2.5 2.6 1.0
MCBHT k = 10 0.1 3.6 0.8 2.4 2.4 0.9
MCBHT k = 20 0.0 4.0 0.5 2.6 2.6 0.6

Finally, we considered the situation where m1 < mC = 6 and some nonnull SNPs are misspecified as controls. We quantify a priori validity as the proportion of candidates among the nonnull markers (PCN). The lower the fraction of nonnulls specified as candidates, the worse the power of MCBHT (Table 5). Thus, the value of MCBHT depends on the user's ability to identify the nonnull markers as candidates.

4. DISCUSSION

Incorporating prior information can raise the power of frequentist multiplicity-adjusted tests, rendering them useful as screening tools when the number of tests is moderate to large. We propose a multiplicity-calibrated Bayesian hypothesis test that assigns a separate prior probability to each null hypothesis and controls for multiplicity through a weighted Bonferroni adjustment. Simulations demonstrate that our method increases the power if the prior information reflects the true state of nature. One could also use our method to calibrate other p value-based multiplicity adjustment procedures such as step-down tests (Westfall and Young, 1993) and tests that control the false discovery rate (FDR; Benjamini and Hochberg, 1995).

Genovese and others (2006) proposed a similar approach, controlling the FDR at a prespecified level by assigning to each p value a weight associated with the probability that its null hypothesis is false. Chen and Sarkar (2005) proposed incorporating the uncertainty in both the parameter and the data by averaging the FDR over the parameter space. Our method accomplishes the calibration by basing the tests on the BF, adjusting its critical value until the desired error rate is achieved.

Although we illustrated our method in the scenario of two classes of hypotheses—candidate and control—the idea is more general, in that each null hypothesis can have a unique prior probability. As in many applications of Bayesian analysis, the choice of prior can be challenging. Our method for the 2-class case only requires specification of k, the ratio of the prior odds of the null under the control (null is more likely) and candidate (null is less likely) classes. Because our method is based on BF, there is no need to specify the exact prior probability for the null. When k = 1, our method is similar to an unweighted Bonferroni test. When k is large, for example, k > 10 in our examples, the method essentially eliminates the control tests and becomes a weighted Bonferroni test of the candidate null hypotheses.

In pharmacogenetic applications, one might ultimately seek to identify a best model for predicting outcome from treatment, an array of genetic markers, and treatment-by-marker (and even marker-by-marker) interactions. Information on relevant biological pathways involving the treatment effect could be used to further inform the selection of priors. In that context, the test described here would serve as a screening procedure to select a subset of markers for a more comprehensive model selection exercise.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

FUNDING

The United States Public Health Service (P50 CA084718 from the National Cancer Institute and National Institute on Drug Abuse; R01 CA063562 and R01 CA116723 from the National Cancer Institute; P20 RR020741 from the National Center for Research Resources).


Acknowledgments

We are grateful to Caryn Lerman, Jinbo Chen, Edward George, and Stephen Kimmel for helpful comments and support. Conflict of Interest: None declared.

APPENDIX A: 1-1 POSTERIOR PROBABILITY AND BF

Assume that there are m null hypotheses H1,…,Hm, among which mC are candidate hypotheses and mN are controls. We assign a prior probability for the null of π0C to the candidate hypotheses and π0N to the controls, with k = [π0N/(1 − π0N)]/[π0C/(1 − π0C)]. For each null hypothesis Hi, we compute its posterior probability Pr[Hi|x] and its BF, BFi. Rejecting Hi when the posterior probability of the null falls below a common threshold P* is equivalent to rejecting candidates when BFi ≤ BFC* and controls when BFi ≤ BFN*, by the following deduction. From (2.1),

Pr[Hi|x] ≤ P* ⟺ π0BFi/(π0BFi + 1 − π0) ≤ P* ⟺ BFi ≤ [P*/(1 − P*)][(1 − π0)/π0],

so that BFC* = [P*/(1 − P*)][(1 − π0C)/π0C] and BFN* = [P*/(1 − P*)][(1 − π0N)/π0N]. Because

BFC*/BFN* = [(1 − π0C)/π0C]/[(1 − π0N)/π0N] = [π0N/(1 − π0N)]/[π0C/(1 − π0C)] = k,

the BF threshold for candidate hypotheses is k times the BF threshold for controls.
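The threshold-ratio identity is easy to verify numerically. A quick sketch, with illustrative values of our own choosing:

```python
def bf_threshold(p_star, pi0):
    """BF cutoff equivalent to posterior Pr[H0|x] <= P* under prior pi0, from (2.1)."""
    return (p_star / (1.0 - p_star)) * ((1.0 - pi0) / pi0)

pi0_c, pi0_n, p_star = 0.5, 0.95, 0.2              # candidate and control null priors
k = (pi0_n / (1 - pi0_n)) / (pi0_c / (1 - pi0_c))  # ratio of prior odds for the null
ratio = bf_threshold(p_star, pi0_c) / bf_threshold(p_star, pi0_n)
# ratio equals k, whatever the common posterior threshold P*
```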

APPENDIX B: ASYMPTOTIC APPROXIMATION OF BF

In Heitjan and others (2007), we developed an asymptotic approximation to compute the BF for the test of a treatment-by-SNP interaction with a binary outcome variable. Using the notation of that paper, we apply a second-order Taylor expansion to the log-likelihood of β around its maximum likelihood estimate (MLE) β̂, which gives

f(x|β) ≈ f(x|β̂)exp{−(β − β̂)′Σ̂⁻¹(β − β̂)/2},

where Σ̂ is the inverse of the observed information matrix of the likelihood evaluated at β̂. Plugging the approximation into m0(x) and integrating out β gives

m0(x) ≈ f(x|β̂0)(2π)^{d0/2}|Σ̂0|^{1/2}φ(β̂0; μ0, Σ̂0 + Σ0),

where β0 = β0TG is the parameter (of dimension d0) under the null model, β̂0 and Σ̂0 are the MLE and estimated variance matrix for β0TG, φ(·; μ0, Σ0) denotes the multivariate normal density with mean μ0 and variance matrix Σ0, and the constants μ0 and Σ0 are the prior mean and variance for β0TG. Similarly, m1(x) ≈ f(x|β̂1)(2π)^{d1/2}|Σ̂1|^{1/2}φ(β̂1; μ1, Σ̂1 + Σ1), where β1 = β0TGI, β̂1 and Σ̂1 are the MLE and estimated variance matrix for β0TGI, and the constants μ1 and Σ1 are the prior mean and variance for β0TGI. Then,

BF = m0(x)/m1(x) ≈ [f(x|β̂0)(2π)^{d0/2}|Σ̂0|^{1/2}φ(β̂0; μ0, Σ̂0 + Σ0)]/[f(x|β̂1)(2π)^{d1/2}|Σ̂1|^{1/2}φ(β̂1; μ1, Σ̂1 + Σ1)],

which one can show to have the same order of error, O(n⁻¹), as Laplace's method under certain regularity conditions (Wang and George, 2004).
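In the scalar conjugate-normal case, the quadratic expansion of the log-likelihood is exact, so a Laplace-type approximate marginal of the form m(x) ≈ f(x|β̂)(2π)^{d/2}|Σ̂|^{1/2}φ(β̂; μ, Σ̂ + Σ) coincides with the closed-form marginal. This gives a convenient sanity check; the sketch below uses our own notation and is not code from the paper:

```python
import math

def normpdf(x, mean, var):
    """Normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def approx_marginal(f_at_mle, mle, var_hat, mu, tau2):
    """Laplace-type approximation of the marginal density m(x), scalar case:
    f(x|bhat) * (2*pi*var_hat)^(1/2) * N(bhat; mu, var_hat + tau2)."""
    return f_at_mle * math.sqrt(2.0 * math.pi * var_hat) * normpdf(mle, mu, var_hat + tau2)

# Conjugate normal check: x ~ N(beta, v) with prior beta ~ N(mu, tau2)
# has exact marginal m(x) = N(x; mu, v + tau2); here the MLE is x itself.
x, v, mu, tau2 = 0.7, 0.2, 0.0, 1.0
approx = approx_marginal(normpdf(x, x, v), x, v, mu, tau2)
exact = normpdf(x, mu, v + tau2)
```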

References

  1. Bartlett MS. A comment on D. V. Lindley's statistical paradox. Biometrika. 1957;44:533–534.
  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
  3. Benjamini Y, Hochberg Y. Multiple hypotheses testing with weights. Scandinavian Journal of Statistics. 1997;24:407–418.
  4. Berger JO. Statistical Decision Theory and Bayesian Analysis. New York: Springer; 1985.
  5. Berger JO, Berry DA. Statistical analysis and the illusion of objectivity. American Scientist. 1988;76:159–165.
  6. Berger JO, Sellke T. Testing a point null hypothesis: the irreconcilability of p values and evidence. Journal of the American Statistical Association. 1987;82:112–122.
  7. Berry DA, Hochberg Y. Bayesian perspectives on multiple comparisons. Journal of Statistical Planning and Inference. 1999;82:215–227.
  8. Chen J, Sarkar SK. A Bayesian determination of threshold for identifying differentially expressed genes in microarray experiments. Statistics in Medicine. 2005;25:3174–3189.
  9. Diamond GA, Forrester JS. Clinical trials and statistical verdicts: probable grounds for appeal. Annals of Internal Medicine. 1983;98:385–394.
  10. Edwards W, Lindman H, Savage LJ. Bayesian statistical inference for psychological research. Psychological Review. 1963;70:193–242.
  11. Genovese CR, Roeder K, Wasserman L. False discovery control with p-value weighting. Biometrika. 2006;93:509.
  12. Gönen M, Johnson WO, Lu Y, Westfall PH. The Bayesian two-sample t test. The American Statistician. 2005;59:252–257.
  13. Heitjan DF, Guo M, Ray R, Wileyto EP, Epstein LH, Lerman C. Identification of pharmacogenetic markers in smoking cessation therapy. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2007;147B:712–719.
  14. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800–802.
  15. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979;6:65–70.
  16. Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383–386.
  17. Jeffreys H. Theory of Probability. Oxford: Oxford University Press; 1961.
  18. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
  19. Lerman C, Jepson C, Wileyto EP, Epstein LH, Rukstalis M, Patterson F, Kaufmann V, Restine S, Hawk L, Niaura R, et al. Role of functional genetic variation in the dopamine D2 receptor (DRD2) in response to bupropion and nicotine replacement therapy for tobacco dependence: results of two randomized clinical trials. Neuropsychopharmacology. 2006;31:231–242.
  20. Lindley DV. A statistical paradox. Biometrika. 1957;44:187–192.
  21. Manni A, Khin S, Biser N, English H, Badger B, Martel J, Demers L. Synchronization of breast cancer cell proliferation in vivo by combined hormonal and polyamine manipulation. Cancer Research. 1992;52:5720–5724.
  22. Shaffer JP. Multiple hypothesis testing. Annual Review of Psychology. 1995;46:561–584.
  23. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751–754.
  24. Wang X, George EI. A hierarchical Bayes approach to variable selection for generalized linear models. Technical Report SMU-TR-321. Dallas, TX: Department of Statistics, Southern Methodist University; 2004.
  25. Weiss R. Bayesian sample size calculations for hypothesis testing. Journal of the Royal Statistical Society, Series D: The Statistician. 1997;46:185–191.
  26. Westfall PH, Johnson WO, Utts JM. A Bayesian perspective on the Bonferroni adjustment. Biometrika. 1997;84:419–427.
  27. Westfall PH, Young SS. Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Hoboken, NJ: Wiley-Interscience; 1993.

