Abstract
We develop a Bayes factor based testing procedure for comparing two population means in high dimensional settings. In ‘large-p-small-n’ settings, Bayes factors based on proper priors require eliciting a large and complex p×p covariance matrix, whereas Bayes factors based on Jeffreys prior suffer the same impediment as the classical Hotelling T2 test statistic, as they involve inverting singular or ill-conditioned sample covariance matrices. To circumvent this limitation, we propose basing the Bayes factor on lower dimensional random projections of the high dimensional data vectors. We choose the prior under the alternative to maximize the power of the test for a fixed threshold level, yielding a restricted most powerful Bayesian test (RMPBT). The final test statistic is based on the ensemble of Bayes factors corresponding to multiple replications of randomly projected data. We show that the test is unbiased and, under mild conditions, also locally consistent. We demonstrate the efficacy of the approach through simulated and real data examples.
Some Key Words: Bayes factor, Random projection, Restricted most powerful Bayesian tests, Testing of hypotheses
1 Introduction
High dimensional population mean testing is common in many application areas including genomics, where gene-set testing is often of more interest than individual gene tests (Ein-Dor et al., 2006; Subramanian et al., 2005). A natural high dimensional test is based on the distance between the sample mean vectors weighted by the inverse sample covariance matrix, also known as the Mahalanobis distance (Johnson and Wichern, 1992). However, the weight matrix is undefined when the dimension of the population mean vectors is larger than the total sample size minus 2, since the sample covariance matrix is then singular.
To circumvent these limitations, two major approaches have emerged. The first approach centers on constructing tests that eliminate the need to invert ill-formed covariance matrices. Bai and Saranadasa (1996) replaced the sample covariance matrix by a diagonal covariance matrix, for which the inverse exists. Srivastava (2007) substituted the inverse covariance matrix by its Moore-Penrose inverse, under the assumption that the groups have the same covariances. Wu et al. (2006) and Gregory et al. (2014) proposed tests based on pooled squared univariate t-tests, eliminating the need to invert non-positive definite matrices.
The second approach centers on transforming the data, instead of the test statistic, so that existing tests can be applied to the transformed data. Random projection (RP) is one such method that works by projecting high dimensional data into lower dimensions while only slightly distorting the distances between the original vectors. See, for example, Dasgupta and Gupta (2003). RP has become a popular tool used extensively in the machine learning literature, where text documents, imaging and MRI data are often high dimensional. Dasgupta (2000), for example, used RP to uncover the components of high dimensional mixtures of Gaussians. Fern and Brodley (2003) showed the improvement in clustering high dimensional data using RP over other standard approaches. Recently, Guhaniyogi and Dunson (2015) proposed a Bayesian compression regression approach in n ≪ p scenarios, where RP is used to reduce the covariate space. RP has also entered the frequentist hypothesis testing literature, where T2 statistics are based on projected versions of the data in the ‘large-p-small-n’ setting. See, for example, Lopes et al. (2011) and Srivastava et al. (2016).
However, to our knowledge, no work has been done to extend Bayesian machinery to high dimensional group means testing. Bayesian hypothesis testing differs from its frequentist counterpart in that the decision to reject or accept a null hypothesis is based on the Bayes factor and a chosen evidence threshold (Jeffreys, 1961; Kass and Raftery, 1995). More precisely, the Bayes factor in favor of the alternative hypothesis H1, denoted by BF10, is defined as
BF10 = m(Y | H1)/m(Y | H0) = ∫ f(Y | Θ) π(Θ | H1) dΘ / ∫ f(Y | Θ) π(Θ | H0) dΘ, | (1) |
where m(Y|Hi) denotes the marginal distribution of Y under Hi; π(Θ|Hi) denotes the prior distribution of Θ under Hi, for i = 0, 1.
Equation (1) involves high dimensional integrals and the choice of π(Θ|Hi) often focuses on distributions that lead to closed form expressions for the Bayes factor.
In this paper, we use the random projection approach to develop a Bayes factor based restricted most powerful Bayesian test (RMPBT) (Goddard and Johnson, 2016) for the high dimensional group means testing problem. In an RMPBT, the prior distribution under the alternative is chosen by maximizing the power of the test with respect to a restricted class of priors. The evidence threshold is selected to match the rejection region of its non-Bayesian counterpart. We show that our proposed test is unbiased and consistent.
The paper is organized as follows. In Section 2, we derive RMPBT for testing differences between two mean vectors. We establish some asymptotic properties of the test in Section 3. Section 4 provides a simulation study investigating the power of the proposed test. We apply the proposed test to the analysis of some real data sets in Section 5. Section 6 concludes with a discussion.
2 Bayes factor in high dimensions
Let Np(μ,Σ) denote a p-dimensional multivariate normal density with mean vector μ and covariance matrix Σ. Let X1, · · ·, Xn1 ∈ ℝp and Y1, · · ·, Yn2 ∈ ℝp be independent random draws from Np(μ1,Σ) and Np(μ2,Σ), respectively. Also, let Xn1×p = (X1, · · ·, Xn1)T and Yn2×p = (Y1, · · ·, Yn2)T.
The minimal sufficient statistics are D = Ȳ − X̄, A = (n1X̄ + n2Ȳ)/(n1 + n2) and S = {Σi(Xi − X̄)(Xi − X̄)^T + Σi(Yi − Ȳ)(Yi − Ȳ)^T}/(n − 2). Also, D ~ Np(δ, n0^{−1}Σ), A ~ Np(μ, n^{−1}Σ) and (n − 2)S ~ Wp(n − 2, Σ), independently, where δ = μ2 − μ1, μ = (n1μ1 + n2μ2)/n, n0 = n1n2/n, n = n1 + n2 and Wp(n, Σ) denotes a Wishart distribution on the space of p×p dimensional positive definite matrices with degrees of freedom n and mean nΣ.
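These statistics are straightforward to compute from the two data matrices; below is a minimal NumPy sketch (the function name is ours), with D estimating δ and S the pooled sample covariance:

```python
import numpy as np

def sufficient_stats(X, Y):
    """Minimal sufficient statistics (D, A, S) for the two-sample problem.

    X : (n1, p) array of draws from N_p(mu1, Sigma)
    Y : (n2, p) array of draws from N_p(mu2, Sigma)
    """
    n1, n2 = X.shape[0], Y.shape[0]
    n = n1 + n2
    Xbar, Ybar = X.mean(axis=0), Y.mean(axis=0)
    D = Ybar - Xbar                          # estimates delta = mu2 - mu1
    A = (n1 * Xbar + n2 * Ybar) / n          # estimates mu = (n1*mu1 + n2*mu2)/n
    # pooled cross-products: (n - 2) S ~ W_p(n - 2, Sigma)
    S = ((X - Xbar).T @ (X - Xbar) + (Y - Ybar).T @ (Y - Ybar)) / (n - 2)
    return D, A, S
```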
The problem is to test H0 : μ1 = μ2 against H1 : μ1 ≠ μ2. We will work with the reparametrization in terms of δ, μ and Σ, which parametrize the distribution of the minimal sufficient statistics. The hypotheses of interest can accordingly be reformulated as H0 : δ = 0 against H1 : δ ≠ 0.
The generic form of the prior that we consider is given by π(δ, μ, Σ | Hi) = π(μ, Σ) π(δ | Σ, Hi). For π(μ, Σ), we consider the Jeffreys prior given by
π(μ, Σ) ∝ |Σ|^{−(p+1)/2}. | (2) |
The choice π(δ|Σ,H0) = 1{δ = 0} is trivially dictated by H0. For π(δ|Σ,H1), we consider the prior
π(δ | Σ, H1) = Np(δ; 0, τ0^{−1}Σ). | (3) |
The choice of the hyper-parameter τ0 ∈ (0,∞) is crucial and is discussed in Section 2.1. Throughout the paper, we assume the same prior weight for each hypothesis, that is, P(H0) = P(H1) = 0.5.
When 1 < p < n−2, the Bayes factor admits a closed form expression under the assumed priors, as shown by the following result.
Lemma 1
With 1 < p < n− 2 and under the priors (2) and (3), we have
BF10(X, Y) = (1 + η)^{−p/2} [{1 + pf/(n − p − 1)} / {1 + (1 + η)^{−1} pf/(n − p − 1)}]^{(n−1)/2}, | (4) |
where η = n0/τ0 and f = {(n − p − 1)/(p(n − 2))} n0 D^T S^{−1} D.
We show the derivation of BF10 in Appendix A. Here, f is the scaled Hotelling’s T2 statistic with f ~ Fp,n−p−1 under H0, where Fν1,ν2 denotes a central F distribution with ν1 and ν2 degrees of freedom. When p ≥ n − 2, S is no longer positive definite, hence its inverse does not exist and (4) cannot be employed to test H0 against H1. To handle the dimensionality problem, we project the data vectors to a lower dimensional subspace using a random projection matrix Rp×m satisfying R^TR = Im×m, where 1 < m < n − 2. The projected data for group 1 are then obtained as X★i = R^T Xi, i = 1, · · ·, n1. Likewise, the projected data for group 2 are Y★i = R^T Yi, i = 1, · · ·, n2. For a given projection matrix Rp×m, under the priors (2) and (3), using Lemma 1 and basic properties of multivariate normal distributions, BF10 based on the projected data is given by
BF10(X★, Y★) = (1 + η)^{−m/2} [{1 + mf★/(n − m − 1)} / {1 + (1 + η)^{−1} mf★/(n − m − 1)}]^{(n−1)/2}, | (5) |
where f★ = {(n − m − 1)/(m(n − 2))} n0 D^T R(R^T S R)^{−1} R^T D, 1 < m < n − 2, p ≥ n − 2, and n = n1 + n2. Also, f★ ~ Fm,n−m−1 under H0. The following result establishes some desirable asymptotic properties of BF10(X★, Y★) when τ0 is fixed and also when τ0 is allowed to depend on n.
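The computation can be sketched as follows from data that have already been projected to ℝm, assuming the closed form in (5) with η = n0/τ0 (the function name is ours):

```python
import numpy as np

def bf10_projected(Xs, Ys, tau0):
    """Bayes factor (5) from projected data Xs (n1, m) and Ys (n2, m).

    A sketch under the priors (2)-(3); tau0 is the hyper-parameter of
    the prior on delta under H1.
    """
    n1, n2 = Xs.shape[0], Ys.shape[0]
    n, m = n1 + n2, Xs.shape[1]
    n0 = n1 * n2 / n
    eta = n0 / tau0
    Xb, Yb = Xs.mean(axis=0), Ys.mean(axis=0)
    D = Yb - Xb
    S = ((Xs - Xb).T @ (Xs - Xb) + (Ys - Yb).T @ (Ys - Yb)) / (n - 2)
    T2 = n0 * D @ np.linalg.solve(S, D)           # Hotelling's T^2, projected data
    f_star = (n - m - 1) / (m * (n - 2)) * T2     # f* ~ F_{m, n-m-1} under H0
    u = m * f_star / (n - m - 1)
    log_bf = (-m / 2) * np.log1p(eta) \
        + ((n - 1) / 2) * (np.log1p(u) - np.log1p(u / (1 + eta)))
    return np.exp(log_bf), f_star
```

Working on the log scale avoids overflow of the (n − 1)/2 power for larger samples.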
Theorem 1
Let nmin = min{n1, n2} → ∞ and m→∞ with limnmin→∞ m/n = θ ∈ (0, 1).
(a) If τ0 is fixed, then, under H0, BF10(X★, Y★) →p 0 and, under H1, BF10(X★, Y★) →p ∞.
(b) If n0/τ0 → 0 and mn0/τ0 → ∞, then, under H0, log{BF10(X★, Y★)} = 𝒪p(1). For the corresponding sequence of local alternatives, BF10(X★, Y★) →p ∞ as nmin → ∞, provided f★ →p ∞.
The proof of Theorem 1 is deferred to Appendix B. Part (a) of Theorem 1 states that for fixed τ0, the Bayes factor is consistent under H0 and under a fixed alternative H1. However, if τ0 is allowed to depend on n so that n0/τ0 → 0 at a slower rate than 1/n, BF10(X★,Y★) is not consistent under H0, but is consistent for that sequence of local alternatives, provided that the F-statistic f★ is unbounded as nmin →∞. Although the lack of consistency of the Bayes factor in part (b) of Theorem 1 under H0 seems unsettling at first, this property is similar to that of frequentist tests, where, for a chosen significance level, the null hypothesis has a non-zero probability of being rejected regardless of the sample size when the null is actually true. We show below that the construction of the restricted most powerful Bayesian test satisfies the conditions enumerated above.
2.1 Restricted most powerful Bayesian tests
Recently, Johnson (2013b) introduced the idea of uniformly most powerful Bayesian tests (UMPBTs) in the context of point hypothesis testing, providing a Bayesian parallel to the uniformly most powerful tests (UMPTs) proposed by Neyman and Pearson (1928, 1933). UMPTs are defined as tests with the highest power among all possible tests of a given size; for a fixed size, UMPTs have the rejection region with the highest probability under the alternative, where the rejection region refers to the range of values of the test statistic that lead to a rejection of the null.
In Bayesian hypothesis testing, the decision to reject the null hypothesis is based on the Bayes factor or evidence (log Bayes factor) for a given fixed alternative. Johnson (2013b) defines a UMPBT for testing a null against a fixed alternative as the test corresponding to the prior under the alternative that maximizes the probability of deciding in favor of the alternative for a fixed evidence level γ, among all possible data generating parameters. More precisely, we have a UMPBT for a given evidence threshold γ if the Bayes factor in favor of an alternative hypothesis H1 against a fixed null hypothesis H0 satisfies
Pθ{BF10 > γ} ≥ Pθ{BF20 > γ}
for all possible values of the data generating parameter θ and all alternative hypotheses H2. However, as noted in Johnson (2013b), UMPBTs exist only in a limited number of relatively simple testing scenarios, and finding a UMPBT in our setting is a daunting task. Recently, Goddard and Johnson (2016) introduced the idea of restricted most powerful Bayesian tests (RMPBTs), which are obtained by restricting the choice of the alternatives to a smaller family of distributions. Here, we restrict the search to a narrow class of priors that lead to Bayes factors with closed forms, like the prior considered in (3). We can subsequently choose the hyper-parameter τ0 using the idea of an RMPBT by maximizing the probability of deciding in favor of the alternative for a fixed evidence level γ. In other words, we choose τ0 = τ★ so that
Pθ{BF10(X★, Y★; τ★) > γ} ≥ Pθ{BF10(X★, Y★; τ0) > γ}
for a chosen value of the evidence threshold γ, all possible values of τ0, and all data generating model parameters θ = (δ, μ, Σ). That is, we choose τ0 so as to maximize the probability
Pθ[f★ > {(n − m − 1)/m} Cn/(1 − Cn)],
where Cn, defined following (7) below, depends on both γ and τ0. This probability is at its maximum when the quantity on the right-hand side of the inequality is at its minimum. The RMPBT is thus obtained with
τ★ = argmin_{τ0 ∈ (0, ∞)} {(n − m − 1)/m} Cn/(1 − Cn). | (6) |
The optimization in (6) requires a value of γ which can be chosen according to the evidence threshold suggested in Kass and Raftery (1995). Alternatively, we can choose γ by equating the rejection region of the Bayes factor to that of the classical F statistic. In the non-Bayesian setting, a level α test would reject H0 if f★ > Fα,m,n−m−1, where Fα,m,n−m−1 is the upper α quantile of an F distribution with m and n − m − 1 degrees of freedom. The rejection region based on the Bayes factor in favor of the alternative can be expressed as
BF10(X★, Y★) > γ if and only if f★ > {(n − m − 1)/m} Cn/(1 − Cn), | (7) |
where Cn = {(1 + n0/τ0)/(n0/τ0)}[1 − {γ(1 + n0/τ0)^{m/2}}^{−2/(n−1)}]. Setting
{(n − m − 1)/m} Cn/(1 − Cn) = Fα,m,n−m−1, | (8) |
we can then solve for γα. This way, under H0, we have P{BF10(X★,Y★) > γα} = P(f★ > Fα,m,n−m−1) = α. We obtain an exact form for τ0, denoted τα(n), that satisfies (6) given (8) as
τα(n) = n0 m(1 − Cn)/{(n − 1)Cn − m}, | (9) |
where, from (8), Cn = (mFα,m,n−m−1)/(mFα,m,n−m−1 + n − m − 1).
We can then obtain the equivalent value of γ, denoted γα(n), as
γα(n) = {1 + mηα/(n − 1)}^{(n−1)/2} (1 + ηα)^{−m/2}, where ηα = n0/τα(n). | (10) |
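The calibration of (τα(n), γα(n)) from (8)–(10) can be sketched with scipy as follows (the function name is ours; we assume n0 = n1n2/n throughout):

```python
import numpy as np
from scipy.stats import f as f_dist

def rmpbt_calibration(n1, n2, m, alpha):
    """Calibrate (tau_alpha, gamma_alpha) so that the Bayes factor
    rejection region matches that of the level-alpha F-test.

    Sketch based on the closed forms in Section 2.1.
    """
    n = n1 + n2
    n0 = n1 * n2 / n
    F_alpha = f_dist.ppf(1 - alpha, m, n - m - 1)     # upper-alpha F quantile
    Cn = m * F_alpha / (m * F_alpha + n - m - 1)      # from (8)
    b = m / (n - 1)
    tau_alpha = n0 * b * (1 - Cn) / (Cn - b)          # (9)
    eta = n0 / tau_alpha
    gamma_alpha = (1 + b * eta) ** ((n - 1) / 2) / (1 + eta) ** (m / 2)  # (10)
    return tau_alpha, gamma_alpha

tau, gamma = rmpbt_calibration(50, 50, 43, 0.05)  # paper's first setting
```

For the first simulation setting of Section 4 (n1 = n2 = 50, m = 43, α = 0.05) this should approximately reproduce the values τα ≈ 41.9 and γα ≈ 3.84 reported there.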
The plot of the evidence threshold γα(n), along with the associated τα(n) value, for various values of α is shown in Figure 1 for two different cases. In both cases, we note that values of γα(n) above 20 (strong evidence) are associated with very small significance levels, α < 0.007. This is consistent with the finding that evidence regarded as strong on the Bayes factor scale typically corresponds to very stringent significance levels in classical settings (Johnson, 2013a).
Figure 1.
Plot of the estimated Bayes factor threshold γα(n) and the value of τα(n) for various values of the significance level α. The values of the Bayes factor above the horizontal line at 20 denote the (γα, τα, α) triplets that represent strong evidence against the null hypothesis according to Kass and Raftery (1995).
2.2 Choice of R and m
We discuss the choice of R and m here. We make no attempt to find an optimal projection matrix but are primarily motivated by practical convenience.
Intuitively, however, the projection matrix R should be selected so as to only slightly perturb all pairwise distances between the sample vectors (Li et al., 2006). One possible way to achieve this is to sample the entries of R from a distribution with mean zero and variance one. Since our test statistic involves the inversion of R^TSR, which is positive definite if R^TR = Im (see Lemma 1 of Srivastava et al., 2016), we further restrict our choices to the family of semi-orthogonal matrices. We consider two constructions of the projection matrix. The first one, denoted R1, is similar to the one permutation + one random projection considered in Srivastava et al. (2016) and yields a sparse matrix with only p non-zero elements. It is constructed as follows.
Start with a p × m matrix of zeros.
Simulate {r1, r2, · · ·, rp} independently from a standard normal distribution.
For each of the m columns, iteratively select ⌊p/m⌋ elements from r = {r1, r2, · · ·, rp} without replacement and assign them to rows 1 to ⌊p/m⌋ for column 1, rows ⌊p/m⌋ + 1 to 2⌊p/m⌋ for column 2, and so on, up to rows (m − 1)⌊p/m⌋ + 1 to m⌊p/m⌋ for column m. Finally, assign the remaining elements of r, if any, one per column and one per row in the remaining rows. Each row of R1 should now have exactly one non-zero element.
Randomly permute the row vectors of R1.
Finally, standardize the column vectors so that they have length 1.
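The steps above can be sketched in NumPy as follows (the function name is ours; we assume p ≥ m). Because the columns have disjoint supports, normalizing them gives R1^T R1 = Im:

```python
import numpy as np

def sparse_projection_R1(p, m, rng=None):
    """One-permutation + one-random-projection matrix R1 (p x m).

    Each row has exactly one non-zero entry and the columns are
    standardized to unit length, so R1.T @ R1 = I_m.
    """
    rng = np.random.default_rng(rng)
    r = rng.standard_normal(p)
    R = np.zeros((p, m))
    block = p // m
    for j in range(m):                            # one block of rows per column
        R[j * block:(j + 1) * block, j] = r[j * block:(j + 1) * block]
    for k, row in enumerate(range(m * block, p)):  # leftover rows, one per column
        R[row, k % m] = r[row]
    R = R[rng.permutation(p)]                      # randomly permute the rows
    R /= np.linalg.norm(R, axis=0)                 # unit-length columns
    return R
```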
The second approach obtains R2 as the Q matrix of the QR decomposition of a p × m matrix with entries simulated independently from a standard normal distribution.
QR decomposition of a large matrix is computationally intensive. Note, however, that any full-rank matrix U ∈ ℝp×m admits a QR decomposition U = RB, where R ∈ ℝp×m has orthonormal columns, that is, R^TR = Im, and B ∈ ℝm×m is an upper triangular matrix with positive diagonal entries. This implies U(U^TSU)^{−1}U^T = RB(B^TR^TSRB)^{−1}B^TR^T = R(R^TSR)^{−1}R^T, which suggests that we could simply replace R by U in the expression for f★ and skip the QR step to speed up the computation.
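The identity can be checked numerically; the sketch below (names ours) compares the two forms on a random singular S with p > n:

```python
import numpy as np

def projected_inverse(U, S):
    """Compute U (U^T S U)^{-1} U^T; valid whether or not U is orthonormal."""
    return U @ np.linalg.solve(U.T @ S @ U, U.T)

rng = np.random.default_rng(2)
p, m, n = 40, 5, 30
U = rng.standard_normal((p, m))       # raw Gaussian matrix, no QR step
R, _ = np.linalg.qr(U)                # its orthonormal factor, R^T R = I_m
W = rng.standard_normal((n, p))
S = W.T @ W / n                       # singular sample covariance (p > n)
same = np.allclose(projected_inverse(U, S), projected_inverse(R, S))
```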
As one reviewer pointed out, the matrix R could also be obtained from singular value decomposition (SVD) of the sample covariance matrix by ignoring the eigenvectors associated with small eigenvalues. However, since such a construction involves the data, the projected data would have a more complex distribution, adding another layer of complexity to the test. Simulation experiments, where we approximated the null distribution of the test statistic using Monte Carlo simulations, also suggest that this approach does not perform well in practice. The results are summarized in Section S.2 of the Supplementary Material.
Now, we discuss how to choose m. Intuitively, small values of m tend to ignore dependence in the data, and the value m = 1 ignores any correlation completely. Large values of m close to n1 + n2 − 2, on the other hand, lead to tests with low power, as the sample covariance matrix approaches a degenerate matrix with small eigenvalues, as noted by Bai and Saranadasa (1996). The best value of m is expected to depend on the form of the true unknown covariance matrix: while smaller values of m may perform well when the true covariance matrix is diagonal, larger values will be appealing in more complex cases. Here, we present a heuristic approach to obtain m in general settings.
For a significance level α and a random projection matrix R(m), we have Pm[BF10{X★(m),Y★(m),m} > γα | H1,R(m)] = Pm{f★(m) > Fα,m,n−m−1 | H1,R(m)}. Ideally, we would want to choose a value of m that maximizes this probability and hence optimizes the power. This, however, is a difficult problem since m is also involved in the construction of f★ itself and in its distribution as well. We obtain an approximate solution instead by minimizing the value of the threshold Fα,m,n−m−1 with respect to m. The values of m hence obtained are similar to the values of m obtained by Srivastava et al. (2016) using a similar argument. We show in the proof of Theorem 2 in Appendix C that such a choice of m satisfies the assumptions of Theorem 1.
Numerical experiments also suggest the empirical power of the test based on such a choice of m to be very close to the optimal power.
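The heuristic of minimizing the critical value Fα,m,n−m−1 over m can be sketched in a few lines with scipy (the function name is ours; it need not reproduce the exact values of m reported in Section 4, which may depend on implementation details):

```python
from scipy.stats import f as f_dist

def choose_m(n, alpha=0.05):
    """Heuristic projection dimension (Section 2.2 sketch): pick the m in
    1 < m < n - 2 that minimizes the critical value F_{alpha, m, n-m-1}."""
    candidates = range(2, n - 2)
    return min(candidates, key=lambda m: f_dist.ppf(1 - alpha, m, n - m - 1))
```

For n = 100 this yields a value of m in the low-to-mid forties, broadly consistent with the m = 43 used in the first simulation setting.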
3 Test based on Bayes factor and random projections
3.1 Single random projection
We derived the Bayes factor in Section 2 after applying a single random projection R to the data (see Equation 5). Given the sample sizes n1 and n2 and a choice of α, we choose m(n), τα(n), and γα(n) as discussed in Sections 2.1 and 2.2. A test based on the resulting Bayes factor is then obtained as
ϕ(R) = 1{BF10(X★, Y★) > γα(n)}, | (11) |
where ϕ(R) = 1 signifies rejection of H0 in favor of H1, and ϕ(R) = 0 signifies acceptance of H0. We make the following observations about the test in (11).
Theorem 2
For a given significance level α ∈ (0, 1), we have
(a) Under H0, for fixed n1, n2, and m(n), E{ϕ(R) | H0} = α.
(b) Under H1, for fixed n1, n2, and m(n), E{ϕ(R) | H1} ≥ α.
(c) Let the assumptions in Theorem 1 part (b) be satisfied such that n1, n2, p → ∞, with m(n), τα(n), γα(n) chosen as described in Section 2.2. Then E{ϕ(R) | H1} → 1 under the corresponding sequence of alternatives, where m(n)/n → θ ∈ (0, 1) and n0m(n)/τα(n) → ∞.
We show the proof of Theorem 2 in Appendix C. Note that (a) shows that the test described in (11) has size α; (b) shows that the test is unbiased; finally (c) shows that the power converges to 1 with increasing sample size. In part (c) of Theorem 2, we impose that m(n)/n → θ ∈ (0, 1) and n0m(n)/τα(n) → ∞ which are satisfied by our construction suggested in Section 2.2.
3.2 Multiple random projections
A test based on a Bayes factor obtained from a single random projection may lead to different decisions for two different random projection matrices. To avoid that, we consider a multitude of Bayes factors computed using many different random projections. Subsequently, we define our test statistic based on the ensemble of Bayes factors and study its power.
Let R1, · · ·, RN be a collection of independently and identically distributed random projection matrices. For a choice of n1, n2, and α, the values of m(n), τα(n), and γα(n) are obtained as discussed in Sections 2.1 and 2.2. We then define ϕ̄(N) as
ϕ̄(N) = N^{−1} Σ_{i=1}^{N} 1{BF10(Ri) > γα(n)}, | (12) |
where 1{A} = 1 if A is true and 0 otherwise. Clearly, BF10(Ri) depends on τα(n). Note that ϕ̄(N) represents the proportion of projections whose Bayes factors, based on the projected data, exceed the specified evidence threshold γα(n), for a choice of α and m(n). We then define the RMPBT as
Reject H0 if ϕ̄(N) > q̄α(N), | (13) |
where q̄α(N) is the upper α quantile of the distribution of ϕ̄(N) under H0, which depends on m(n), n1, n2, p and α.
Theorem 3
Suppose the assumptions of Theorems 1 and 2 hold. Given a collection R1, · · ·, RN of random projection matrices with Ri^TRi = Im for all i = 1, · · ·, N, the power of the test in (13) converges to one under the sequence of alternatives.
We show the proof of Theorem 3 in Appendix D. For fixed m(n), p, n1, n2 and α, the RMPBT in (13) requires that we first compute the upper α quantile of the null distribution of ϕ̄(N). Under H0, δ = 0, but μ1 = μ2 = μ and Σ are unknown. Fortunately, the asymptotic null distribution of ϕ̄(N) is independent of the nuisance parameters μ and Σ, providing a simple way of approximating this quantile. The result is formalized in the following theorem.
Theorem 4
Under H0, the distribution of ϕ̄(N) as N →∞ is independent of μ and Σ for any fixed n1, n2, m ∈ (1, n1 + n2 − 2) and p ≥ n1 + n2 − 2.
We show the proof of Theorem 4 in Appendix E. Theorem 4 suggests that for large values of N we can approximate the null distribution of ϕ̄(N) by simulating data assuming Σ = I and μ1 = μ2 = 0.
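Putting the pieces together, the computation of ϕ̄(N) can be sketched as below (the function name is ours; it uses the matched frequentist rejection region f★ > Fα,m,n−m−1 from Section 2.1 rather than evaluating the Bayes factors directly, and the dense projection R2). By Theorem 4, applying it to data simulated with μ1 = μ2 = 0 and Σ = I approximates the null distribution of ϕ̄(N):

```python
import numpy as np
from scipy.stats import f as f_dist

def phi_bar(X, Y, m, alpha, N, rng=None):
    """Proportion of N random projections whose projected F-statistic
    exceeds its level-alpha critical value (equivalent to counting Bayes
    factors above gamma_alpha(n), by the matched rejection regions)."""
    rng = np.random.default_rng(rng)
    n1, n2 = X.shape[0], Y.shape[0]
    n = n1 + n2
    n0 = n1 * n2 / n
    F_alpha = f_dist.ppf(1 - alpha, m, n - m - 1)
    count = 0
    for _ in range(N):
        # dense projection R2: orthonormal factor of a Gaussian matrix
        R, _ = np.linalg.qr(rng.standard_normal((X.shape[1], m)))
        Xs, Ys = X @ R, Y @ R
        Xb, Yb = Xs.mean(axis=0), Ys.mean(axis=0)
        D = Yb - Xb
        S = ((Xs - Xb).T @ (Xs - Xb) + (Ys - Yb).T @ (Ys - Yb)) / (n - 2)
        T2 = n0 * D @ np.linalg.solve(S, D)
        f_star = (n - m - 1) / (m * (n - 2)) * T2
        count += f_star > F_alpha
    return count / N
```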
4 Simulation study
We designed a simulation study aimed at investigating the power of the test proposed in (13) as a function of the proportion of elements of δ that are exactly zero, for various choices of covariance matrices and simulation scenarios. We consider two simulation settings. In the first, we assume p = 200 and n1 = n2 = 50. Using the approach described in Section 2.2, we find m = 43. In the second setting, p = 1000, n1 = n2 = 70 and we get m = 62. We denote by p0 the proportion of entries of the vector δ that are exactly zero and choose p0 = 0.50, 0.75, 0.80, 0.95, 0.975, 0.99, 1.00. In each setting, the values of τα and γα are chosen according to our discussion in Section 2.1. We consider two types of random projection matrices, R1 and R2, as described in Section 2.2.
We consider the following choices for Σ = ((σij)).
Σ1 = Ip×p is the identity matrix.
Σ2 is a diagonal matrix whose first 20 diagonal elements are set to 1 through 20 and whose remaining diagonal elements are set to exactly 1.
Σ3 is an AR(1) covariance matrix with σij = σ2ρ|i−j|. We chose σ2 = 1 and ρ = 0.4.
Σ4 is a block diagonal matrix with blocks B = 0.85I25×25 + 0.15J25×25, where J denotes a matrix with 1 in all of its entries.
Σ5 is an ARIMA(1,1) covariance matrix with σij = σ²γ^{1{|i−j|>0}}ρ^{|i−j|1{|i−j|≥2}}. We chose σ² = 1, γ = 0.5, ρ = 0.9.
We also consider two possible alternatives.
We simulate μ2 ~ Np(1, I), set a randomly selected proportion p0 of its elements to zero, and scale μ2 so that (μ2 − μ1)^TΣ^{−1}(μ2 − μ1) = 2.
We simulate μ2 ~ Np(1, I), set a proportion p0 of its elements to zero, and rescale μ2 so that (μ2 − μ1)^T(μ2 − μ1) = 2.
μ1 is chosen to be a vector of zeros. The two alternatives described above were also considered by Srivastava et al. (2016). We include in these comparisons the following competitors.
The approach of Srivastava et al. (2016) referred to as RAPTT.
The approach of Bai and Saranadasa (1996) referred to as BS96.
The approach of Srivastava and Du (2008) referred to as SD08.
The approach of Chen et al. (2010) referred to as CQ10.
To estimate the power of our test, we use N = 5000 random projections for each of the 1000 independently simulated data sets under each of the alternatives considered. Recall that when p0 = 1.0, the null hypothesis is true and the power represents the type-I error rate estimate. When the true covariance matrix Σ is diagonal (Σ1 and Σ2) (see Table 1 and Table S.1 in the Supplemental Material), the tests that do not rely on random projections, namely BS96, SD08 and CQ10, performed slightly better than the tests based on random projections, namely RAPTT and the proposed RMPBT. However, RMPBT tended to have higher power than RAPTT for Σ1 and Σ2. Also, for Σ3, an AR(1) covariance matrix with ρ = 0.4, RMPBT performed better than all its competitors, especially for sparse alternatives (Table S.2 in the Supplemental Material). The random projection based tests also tended to be slightly more conservative when Σ = Σ1, Σ2, or Σ3. For more complex covariance matrices (Σ4, Σ5), the non-random projection based approaches tended to have lower overall power (Table 2, Table 3), and, in some cases, significantly lower power than their random projection based competitors, especially when n1 = n2 = 50. However, random projection based approaches tended to have slightly higher or similar estimated type-I error rates for complex true covariance matrices.
Table 1.
Power analysis of 5 tests assuming the true covariance matrix is Σ1 = Ip×p. We set the significance level at α = 0.05. For the case n1 = n2 = 50, p = 200, we have m = 43, τα = 41.918, and γα = 3.841. For the case n1 = n2 = 70, p = 1000, we have m = 62, τα = 72.318, and γα = 3.850. RMPBT is our approach. RAPTT is the approach of Srivastava et al. (2016). BS96 is the approach of Bai and Saranadasa (1996). SD08 is the approach of Srivastava and Du (2008). CQ10 is the approach of Chen et al. (2010).
| p0 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternative 1 | Alternative 2 | ||||||||||||||
| n1 = n2 = 50, p = 200 | 0.500 | 0.674 | 0.669 | 0.675 | 0.650 | 0.754 | 0.723 | 0.754 | 0.469 | 0.435 | 0.453 | 0.408 | 0.542 | 0.513 | 0.542 |
| 0.750 | 0.684 | 0.663 | 0.669 | 0.654 | 0.769 | 0.726 | 0.769 | 0.474 | 0.463 | 0.455 | 0.414 | 0.535 | 0.503 | 0.535 | |
| 0.800 | 0.680 | 0.651 | 0.652 | 0.634 | 0.749 | 0.715 | 0.749 | 0.441 | 0.413 | 0.425 | 0.379 | 0.517 | 0.481 | 0.517 | |
| 0.950 | 0.716 | 0.698 | 0.649 | 0.622 | 0.756 | 0.720 | 0.756 | 0.480 | 0.439 | 0.425 | 0.384 | 0.525 | 0.482 | 0.525 | |
| 0.975 | 0.707 | 0.680 | 0.572 | 0.555 | 0.721 | 0.681 | 0.721 | 0.471 | 0.450 | 0.382 | 0.351 | 0.497 | 0.453 | 0.497 | |
| 0.990 | 0.793 | 0.771 | 0.561 | 0.553 | 0.748 | 0.726 | 0.748 | 0.553 | 0.525 | 0.367 | 0.326 | 0.554 | 0.506 | 0.554 | |
| 1.000 | 0.046 | 0.041 | 0.037 | 0.032 | 0.042 | 0.039 | 0.042 | 0.046 | 0.038 | 0.037 | 0.025 | 0.042 | 0.039 | 0.042 | |
| n1 = n2 = 70, p = 1000 | 0.500 | 0.372 | 0.343 | 0.347 | 0.311 | 0.473 | 0.422 | 0.473 | 0.677 | 0.588 | 0.644 | 0.559 | 0.767 | 0.727 | 0.767 |
| 0.750 | 0.348 | 0.294 | 0.316 | 0.271 | 0.470 | 0.401 | 0.470 | 0.695 | 0.616 | 0.673 | 0.578 | 0.789 | 0.746 | 0.789 | |
| 0.800 | 0.337 | 0.304 | 0.314 | 0.267 | 0.448 | 0.389 | 0.448 | 0.660 | 0.581 | 0.634 | 0.552 | 0.762 | 0.717 | 0.762 | |
| 0.950 | 0.388 | 0.339 | 0.349 | 0.304 | 0.474 | 0.423 | 0.474 | 0.694 | 0.612 | 0.646 | 0.559 | 0.775 | 0.741 | 0.775 | |
| 0.975 | 0.332 | 0.309 | 0.298 | 0.253 | 0.450 | 0.384 | 0.450 | 0.685 | 0.612 | 0.619 | 0.532 | 0.761 | 0.722 | 0.761 | |
| 0.990 | 0.377 | 0.348 | 0.316 | 0.280 | 0.453 | 0.401 | 0.453 | 0.719 | 0.640 | 0.610 | 0.527 | 0.774 | 0.719 | 0.774 | |
| 1.000 | 0.031 | 0.030 | 0.028 | 0.023 | 0.063 | 0.040 | 0.063 | 0.031 | 0.022 | 0.028 | 0.020 | 0.063 | 0.040 | 0.063 | |
Table 2.
Power analysis of 5 tests assuming the true covariance matrix is Σ4. We set the significance level at α = 0.05. For the case n1 = n2 = 50, p = 200, we have m = 43, τα = 41.918, and γα = 3.841. For the case n1 = n2 = 70, p = 1000, we have m = 62, τα = 72.318, and γα = 3.850. RMPBT is our approach. RAPTT is the approach of Srivastava et al. (2016). BS96 is the approach of Bai and Saranadasa (1996). SD08 is the approach of Srivastava and Du (2008). CQ10 is the approach of Chen et al. (2010).
| p0 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternative 1 | Alternative 2 | ||||||||||||||
| n1 = n2 = 50, p = 200 | 0.500 | 0.668 | 0.653 | 0.682 | 0.669 | 0.644 | 0.602 | 0.644 | 0.534 | 0.504 | 0.539 | 0.489 | 0.508 | 0.465 | 0.508 |
| 0.750 | 0.642 | 0.621 | 0.639 | 0.606 | 0.558 | 0.526 | 0.558 | 0.565 | 0.539 | 0.561 | 0.522 | 0.497 | 0.447 | 0.497 | |
| 0.800 | 0.658 | 0.649 | 0.645 | 0.617 | 0.576 | 0.535 | 0.576 | 0.591 | 0.569 | 0.577 | 0.532 | 0.528 | 0.485 | 0.528 | |
| 0.950 | 0.659 | 0.627 | 0.592 | 0.567 | 0.521 | 0.481 | 0.521 | 0.644 | 0.605 | 0.580 | 0.537 | 0.513 | 0.469 | 0.513 | |
| 0.975 | 0.688 | 0.667 | 0.564 | 0.538 | 0.531 | 0.478 | 0.531 | 0.678 | 0.656 | 0.556 | 0.521 | 0.523 | 0.473 | 0.523 | |
| 0.990 | 0.731 | 0.706 | 0.513 | 0.492 | 0.511 | 0.464 | 0.511 | 0.729 | 0.694 | 0.511 | 0.474 | 0.509 | 0.462 | 0.509 | |
| 1.000 | 0.049 | 0.049 | 0.049 | 0.048 | 0.057 | 0.044 | 0.057 | 0.049 | 0.046 | 0.049 | 0.042 | 0.057 | 0.044 | 0.057 | |
| n1 = n2 = 70, p = 1000 | 0.500 | 0.414 | 0.379 | 0.407 | 0.378 | 0.401 | 0.353 | 0.401 | 0.774 | 0.717 | 0.767 | 0.704 | 0.761 | 0.720 | 0.761 |
| 0.750 | 0.327 | 0.294 | 0.323 | 0.291 | 0.331 | 0.278 | 0.331 | 0.790 | 0.727 | 0.783 | 0.711 | 0.764 | 0.718 | 0.764 | |
| 0.800 | 0.348 | 0.318 | 0.347 | 0.304 | 0.343 | 0.285 | 0.343 | 0.796 | 0.734 | 0.788 | 0.713 | 0.775 | 0.728 | 0.775 | |
| 0.950 | 0.335 | 0.307 | 0.319 | 0.287 | 0.311 | 0.270 | 0.311 | 0.827 | 0.776 | 0.806 | 0.742 | 0.782 | 0.730 | 0.782 | |
| 0.975 | 0.315 | 0.278 | 0.295 | 0.266 | 0.294 | 0.245 | 0.294 | 0.836 | 0.776 | 0.786 | 0.720 | 0.755 | 0.716 | 0.755 | |
| 0.990 | 0.335 | 0.308 | 0.298 | 0.261 | 0.294 | 0.248 | 0.294 | 0.871 | 0.819 | 0.769 | 0.714 | 0.775 | 0.738 | 0.775 | |
| 1.000 | 0.060 | 0.052 | 0.056 | 0.050 | 0.063 | 0.045 | 0.063 | 0.060 | 0.040 | 0.056 | 0.038 | 0.063 | 0.045 | 0.063 | |
Table 3.
Power analysis of 5 tests assuming the true covariance matrix is Σ5. We set the significance level at α = 0.05. For the case n1 = n2 = 50, p = 200, we have m = 43, τα = 41.918, and γα = 3.841. For the second case, n1 = n2 = 70, p = 1000, m = 62, τα = 72.318, and γα = 3.850. RMPBT is our approach. RAPTT is the approach of Srivastava et al. (2016). BS96 is the approach of Bai and Saranadasa (1996). SD08 is the approach of Srivastava and Du (2008). CQ10 is the approach of Chen et al. (2010).
| p0 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternative 1 | Alternative 2 | ||||||||||||||
| n1 = n2 = 50, p = 200 | 0.500 | 0.526 | 0.508 | 0.541 | 0.520 | 0.255 | 0.216 | 0.255 | 0.861 | 0.845 | 0.870 | 0.850 | 0.481 | 0.408 | 0.481 |
| 0.750 | 0.490 | 0.468 | 0.504 | 0.489 | 0.214 | 0.180 | 0.214 | 0.921 | 0.914 | 0.916 | 0.900 | 0.480 | 0.417 | 0.480 | |
| 0.800 | 0.505 | 0.497 | 0.519 | 0.502 | 0.226 | 0.168 | 0.226 | 0.928 | 0.923 | 0.918 | 0.908 | 0.500 | 0.409 | 0.500 | |
| 0.950 | 0.491 | 0.469 | 0.468 | 0.445 | 0.172 | 0.128 | 0.172 | 0.969 | 0.957 | 0.929 | 0.909 | 0.466 | 0.390 | 0.466 | |
| 0.975 | 0.537 | 0.517 | 0.455 | 0.437 | 0.192 | 0.139 | 0.192 | 0.969 | 0.959 | 0.894 | 0.882 | 0.484 | 0.410 | 0.484 | |
| 0.990 | 0.587 | 0.559 | 0.426 | 0.415 | 0.203 | 0.155 | 0.203 | 0.990 | 0.983 | 0.841 | 0.835 | 0.525 | 0.452 | 0.525 | |
| 1.000 | 0.060 | 0.055 | 0.075 | 0.067 | 0.064 | 0.047 | 0.064 | 0.060 | 0.054 | 0.075 | 0.061 | 0.064 | 0.047 | 0.064 | |
| n1 = n2 = 70, p = 1000 | 0.500 | 0.295 | 0.273 | 0.326 | 0.304 | 0.175 | 0.132 | 0.175 | 0.950 | 0.930 | 0.954 | 0.934 | 0.763 | 0.695 | 0.763 |
| 0.750 | 0.258 | 0.231 | 0.290 | 0.265 | 0.135 | 0.089 | 0.135 | 0.967 | 0.953 | 0.966 | 0.951 | 0.755 | 0.681 | 0.755 | |
| 0.800 | 0.273 | 0.242 | 0.288 | 0.264 | 0.133 | 0.097 | 0.133 | 0.966 | 0.950 | 0.969 | 0.954 | 0.772 | 0.708 | 0.772 | |
| 0.950 | 0.260 | 0.246 | 0.289 | 0.266 | 0.146 | 0.107 | 0.146 | 0.986 | 0.973 | 0.971 | 0.962 | 0.783 | 0.699 | 0.783 | |
| 0.975 | 0.229 | 0.207 | 0.248 | 0.225 | 0.109 | 0.078 | 0.109 | 0.986 | 0.978 | 0.970 | 0.956 | 0.767 | 0.692 | 0.767 | |
| 0.990 | 0.246 | 0.229 | 0.268 | 0.243 | 0.122 | 0.092 | 0.122 | 0.988 | 0.984 | 0.961 | 0.939 | 0.768 | 0.687 | 0.768 | |
| 1.000 | 0.096 | 0.078 | 0.112 | 0.103 | 0.059 | 0.034 | 0.059 | 0.096 | 0.071 | 0.112 | 0.089 | 0.059 | 0.034 | 0.059 | |
These performance differences can be explained as follows. All three non-projection approaches attempt to avoid inverting an ill-formed sample covariance matrix. In doing so, BS96 based their test statistic on a quadratic norm of the sample mean difference, ignoring any possible weighting of the vector entries. CQ10 developed their test around cross-products of the sample vectors, also without weighting. SD08 likewise based their test statistic on the squared differences of the group sample means, but used the diagonal of the sample covariance matrix as weights. Consequently, all three approaches tended to perform well only for diagonal or nearly diagonal covariance structures. The random projection based approaches, by contrast, make no assumption about the covariance matrix of the data and can therefore exploit the dependence structure to improve power. Both random projection based approaches perform similarly under less sparse alternatives, that is, for smaller values of p0.
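The distance preservation that lets the projection based tests retain power can be sketched in a few lines. This is an illustrative simulation, not the paper's construction: the Gaussian projection below is a generic stand-in for the R1 and R2 matrices used in the study, and the guarantee invoked is the Johnson–Lindenstrauss property proved in Dasgupta and Gupta (2003), cited in the references.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, reps = 1000, 50, 2000

# Two high dimensional points with a sparse difference, mimicking the
# sparse-alternative setting of the simulation study.
x = np.zeros(p)
y = np.zeros(p)
y[:10] = 1.0                      # difference concentrated in 10 coordinates
d2 = np.sum((x - y) ** 2)         # squared distance in the original space

# Gaussian random projections with entries N(0, 1/m) preserve squared
# distances in expectation (Johnson-Lindenstrauss; Dasgupta and Gupta, 2003).
ratios = []
for _ in range(reps):
    R = rng.normal(scale=1.0 / np.sqrt(m), size=(p, m))
    d2_proj = np.sum((R.T @ (x - y)) ** 2)
    ratios.append(d2_proj / d2)

ratios = np.array(ratios)
print(ratios.mean())              # close to 1: distances preserved on average
```

With entries scaled by 1/√m, squared distances are preserved in expectation, which is why an m-dimensional projected test can still detect a p-dimensional mean difference.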
However, RMPBT tended to have much higher power than RAPTT when the true mean differences were very sparse, especially in the scenario n1 = n2 = 50 and p = 200. Both RMPBT and RAPTT depend on F-statistics, so we do not attribute the observed power differences to the F-statistic or the choice of m per se. Rather, the two tests quantify the evidence contained in each F-statistic, computed from each projected copy of the data, differently. For example, consider two arbitrary values, 20 and 1, for an F-statistic with 2 and 3 degrees of freedom. The RMPBT statistic is 0.5, whereas the corresponding RAPTT statistic is 0.256. If the F-statistics are instead 10 and 0.01, the RMPBT statistic remains unchanged, but the RAPTT statistic becomes 0.518. Under sparse alternatives with small samples, a large number of F-statistics are large (small p-values) while a smaller number are small (large p-values); the RAPTT statistic is pulled toward non-rejection by these few non-significant p-values. RMPBT does not suffer from this problem because its statistic relies only on 0–1 decisions. Under less sparse alternatives this discrepancy is less severe, which is reflected in the very similar power reported by both approaches.
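The aggregation contrast described above can be reproduced with a toy calculation. The sketch below is ours, not the paper's code: the rejection threshold 5.0 is an arbitrary illustrative value (the paper calibrates its threshold through τα and γα), and the RAPTT statistic is simplified here to an average p-value. With first degree of freedom 2, the F survival function has the closed form (1 + 2x/d2)^(−d2/2), so no distribution library is needed.

```python
import numpy as np

# Closed-form survival function of an F(2, d2) random variable:
# P(F > x) = (1 + 2x/d2) ** (-d2/2), valid when the first df equals 2.
def f_sf_df2(x, d2):
    return (1.0 + 2.0 * x / d2) ** (-d2 / 2.0)

# Two hypothetical pairs of F-statistics with 2 and 3 degrees of freedom,
# echoing the example in the text.
case_a = np.array([20.0, 1.0])
case_b = np.array([10.0, 0.01])

# RMPBT-style statistic: the fraction of projections whose F-statistic
# clears a rejection threshold (5.0 is illustrative, not the calibrated value).
threshold = 5.0
rmpbt_a = np.mean(case_a > threshold)
rmpbt_b = np.mean(case_b > threshold)

# RAPTT-style statistic (simplified to the average p-value): one very
# insignificant F-statistic drags the average toward non-rejection.
raptt_a = np.mean(f_sf_df2(case_a, 3))
raptt_b = np.mean(f_sf_df2(case_b, 3))

print(rmpbt_a, rmpbt_b)   # both 0.5: the 0-1 decisions are unchanged
print(raptt_a, raptt_b)   # the average p-value shifts toward non-rejection
```

The second case reproduces the 0.518 figure quoted in the text, while the 0–1 ensemble stays at 0.5 in both cases.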
The Bayes factor based test proposed in this article assumes a non-informative prior for the nuisance parameters with a single scalar hyper-parameter. Other possibilities include proper priors, such as a joint normal-inverse-Wishart prior for the nuisance parameters. Specifying such a prior, however, requires a practitioner to carefully choose high dimensional prior hyper-parameters, including a large p×p covariance matrix, a very difficult exercise in most applications. We found the power of the resulting test to be highly sensitive to the choice of the covariance matrix hyper-parameter of the normal-inverse-Wishart prior. Results are deferred to Section S.3 of the Supplementary Material.
5 Applications
5.1 Colon organoids data
Stem cells have unique regenerative abilities and offer new potential for treating chronic diseases. However, stem cells are modulated by many factors, such as the aryl hydrocarbon receptor (AhR), that are not well understood. A study was designed to examine the effect of the AhR on intestinal stem cells. Intestinal crypts were isolated from one mouse, plated, cultivated, and separated into 4 sets of 3 plates, and each set received one of 4 treatments: TCDD only, Indole only, TCDD+Indole, or DMSO (control). TCDD is a cancer-inducing agent that changes the expression of many genes by activating the AhR; Indole modulates the effect of TCDD, whereas DMSO has anti-inflammatory properties. Finally, RNA was isolated from each of the 12 organoids and sequenced. A gene-by-gene comparison between the TCDD only and TCDD+Indole groups yielded only 6 differentially expressed (DE) genes after adjusting for multiple comparisons (McCarthy et al., 2012). We use RMPBT to compare the expression of p = 2000 genes simultaneously between the TCDD only and TCDD+Indole groups, with n1 = 3 and n2 = 3. Before applying the tests, we take a log2 transformation of the gene expressions (after adding one to avoid taking log2 of 0). All 6 previously identified DE genes are included in this set of 2000 genes. For RMPBT, we use m = 2, N = 100000 random projections, τα = 0.175, and γα = 4.302. Based on the results in Table 4, RMPBT is the only test that reports significance when the random projection matrix is R1. This is consistent with the simulation findings, where RMPBT showed high power for sparse alternatives when the projection matrix was R1. The next smallest p-value is also produced by RMPBT, with projection matrix R2.
Table 4.
Summary of the analysis of 3 data sets: ORGNDS=organoids, BC=breast cancer, and SRBCT=small round blue cell tumors. In each case, we assume a significance level of α = 0.05 and report the probability of exceeding the test statistic obtained based on the data under the null (p-value). RMPBT is our approach. RAPTT is the approach of Srivastava et al. (2016). BS96 is the approach of Bai and Saranadasa (1996). SD08 is the approach of Srivastava and Du (2008). CQ10 is the approach of Chen et al. (2010).
| Data Set | (n1, n2) | Subset | p | RMPBT (R1) | RMPBT (R2) | RAPTT (R1) | RAPTT (R2) | BS96 | SD08 | CQ10 |
|---|---|---|---|---|---|---|---|---|---|---|
| ORGNDS | (3, 3) | - | 2000 | 0.0058 | 0.212 | 0.1392 | 0.510 | 0.6309 | - | 0.6309 |
| BC | (111, 57) | Chromosome 1 | 374 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.19 | 0.11 | 0.23 |
| | | Chromosome 2 | 233 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.012 | 0.043 | 0.022 |
| | | Chromosome 12 | 191 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.0004 | 0.026 | 0.003 |
| | (37, 19) | Chromosome 1 | 374 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.416 | 0.371 | 0.466 |
| | | Chromosome 2 | 233 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.223 | 0.273 | 0.327 |
| | | Chromosome 12 | 191 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | 0.135 | 0.266 | 0.304 |
| SRBCT | (11, 18) | - | 2308 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
5.2 Breast cancer data
We apply the proposed method to the analysis of a breast cancer data set reported by Gravier et al. (2010). The study investigated small, invasive ductal carcinomas without lymph node involvement (T1T2N0) in predicting the metastasis of small node-negative breast carcinoma. Expression levels of 2905 genes were reported for 168 patients followed over five years. Of the 168, n1 = 111 patients with no event after diagnosis were labelled good, and the remaining n2 = 57 with early metastasis were labelled poor. We performed three gene-set comparisons between the good and poor groups. The gene sets were similar to those compared in Thulin (2014): the first had p = 374 genes located on Chromosome 1, the second p = 233 genes on Chromosome 2, and the third p = 191 genes on Chromosome 12. A restricted most powerful test is obtained by choosing τα = 86.880 and evidence threshold γα = 3.852 with m = 75 and N = 10000 random projections. The cutoff values of the test statistic for Chromosomes 1, 2, and 12 are 0.127, 0.165, and 0.175, respectively. We also compared the two groups using RAPTT, BS96, SD08 and CQ10. The results reported in Table 4 indicate that RMPBT and RAPTT found significance in each gene set, whereas the non-projection approaches failed to find significance for the genes on Chromosome 1.
To investigate the impact of smaller sample sizes on the performance of each test, we compared the two groups again using only one-third of the total samples (n1 = 37 and n2 = 19). We ran the tests on 100 data sets independently sampled from the original data set. For all three chromosomes, the median p-value (over the 100 replications) for RMPBT (R1), RMPBT (R2), RAPTT (R1) and RAPTT (R2) is highly significant, whereas the median p-values for BS96, SD08 and CQ10 are highly insignificant (see Table 4). The random projection based approaches are thus able to detect differences between the two groups even at relatively small sample sizes, whereas the other approaches fail to do so.
5.3 SRBCT data
We finally apply our approach to the small round blue cell tumors (SRBCT) data set, available at http://www.biolab.si/supp/bi-cancer/projections/info/SRBCT.htm. SRBCTs comprise 4 different childhood tumors. Here we test for equality of the mean gene expression between the neuroblastoma (NB) and Burkitt's lymphoma (BL) tumor groups. The data contain p = 2308 gene expression measurements for the NB and BL tumors, with sample sizes 11 and 18, respectively. We use τα = 4.836, γα = 3.720, and N = 10000. The p-values for each of these tests are reported in the SRBCT part of Table 4. All the tests rejected the null hypothesis with high significance.
6 Conclusion
In this article, we proposed a Bayes factor based test for differences between group means in high dimensions. We transformed the data points to lower dimensional spaces using random projections. Using the transformed data, we obtained a closed form Bayes factor by carefully choosing the priors for the model parameters, involving a single scalar hyper-parameter. The hyper-parameter was chosen to obtain a restricted most powerful Bayesian test (RMPBT). Our final test was based on an ensemble of Bayes factors obtained from multiple projected copies. We showed unbiasedness and consistency of the proposed test under mild conditions. We illustrated the efficacy of the test in real and simulated examples.
An ongoing extension of the proposed test also considers non-local priors of Johnson and Rossell (2010) for the distribution of δ under the alternative and relaxes the assumption of equal covariance matrices between the two groups.
Supplementary Material
Figure 2.
Empirical distribution function of ϕ̄(N) under the null hypothesis for 5 different covariance matrices based on N = 50000 random projections and 1000 data sets.
Acknowledgments
Carroll and Mallick’s research was supported by grant U01-CA057030 and by grant R01-CA194391, both from the National Cancer Institute. We are grateful to Dr. Robert Chapkin for sharing the organoids data with us. We also thank the Associate Editor and the anonymous referees for their comments and suggestions which helped greatly improve the paper’s presentation.
Appendix
Appendix A Lemma 1 - Derivation of the Bayes factor
We recall that the minimal sufficient statistics are D = Ȳ − X̄, A = (n1X̄ + n2Ȳ)/(n1 + n2) and the pooled sample covariance matrix S. Also, D ~ Np(δ, Σ/n0), A ~ Np(μ, n−1Σ) and (n − 2)S ~ Wp(n − 2, Σ), independently, where δ = μ2 − μ1, μ = (n1μ1 + n2μ2)/n, 1/n0 = 1/n1 + 1/n2, and n = n1 + n2. Under our assumed framework, the joint distribution of the data in terms of the minimal sufficient statistics D, A and S is given by
We assume the joint prior π(μ, Σ) ∝ |Σ|−(p+1)/2 for the nuisance parameters μ and Σ. Under H1, we choose the conditional prior δ | Σ ~ Np(0, Σ/τ0). Under H1, we then have
If we denote the marginal distribution of the data under H1 by m1(Data), then we have
where the second line is obtained by taking the integral with respect to μ. This integral only involves Np(μ | A, n−1Σ) and evaluates to 1. With a little bit of algebra, we get that
where A0 represents a quantity independent of the data. Similarly, under H0, δ = 0 and we can show that the marginal distribution of the data, denoted m0(Data), is
Therefore, the Bayes factor in favor of the alternative is
Setting η = n0/τ0 and simplifying, we get Equation (4).
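The distributional fact used repeatedly in the appendices, that under H0 the F-statistic built from randomly projected data follows an Fm,n−m−1 distribution, can be checked with a small Monte Carlo sketch. This is an illustration under assumed Gaussian data, not the paper's code; the scaling of the projected two-sample Hotelling T2 statistic into an F-statistic is the classical one.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 50, 5
n1 = n2 = 30
n = n1 + n2
n0 = 1.0 / (1.0 / n1 + 1.0 / n2)

fstats = []
for _ in range(2000):
    # Two groups with a common mean and covariance: the null hypothesis.
    X = rng.normal(size=(n1, p))
    Y = rng.normal(size=(n2, p))
    R = rng.normal(size=(p, m))          # random projection to m dimensions

    XR, YR = X @ R, Y @ R                # projected data
    D = YR.mean(axis=0) - XR.mean(axis=0)
    S = (np.cov(XR, rowvar=False) * (n1 - 1) +
         np.cov(YR, rowvar=False) * (n2 - 1)) / (n - 2)   # pooled covariance

    T2 = n0 * D @ np.linalg.solve(S, D)  # projected Hotelling T^2
    fstats.append((n - m - 1) / ((n - 2) * m) * T2)       # classical F scaling

fstats = np.array(fstats)
# The mean of an F(m, n-m-1) variable is (n-m-1)/(n-m-3), about 1.04 here.
print(fstats.mean())
```

Because the projection is applied before the covariance is estimated, only an m×m matrix is inverted, which is well conditioned whenever m is small relative to n.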
Appendix B Proof of Theorem 1
-
Part(a) For 1 < m < n − 2, we integrate out the parameters with respect to the conjugate priors to obtain the Bayes factor in favor of the alternative as BF10(X★,Y★) = (1 + η)−m/2{1 − ηU/(1 + η)}−(n−1)/2, where U = mf★/{mf★ + (n − m − 1)}. Recall that n = n1 + n2, 1/n0 = 1/n1 + 1/n2, η = n0/τ0, and nmin = min{n1, n2}. Since τ0 is fixed, η → ∞ as nmin → ∞. For a randomly chosen R, under H0, f★ ~ Fm,n−m−1 with m and n − m − 1 degrees of freedom; thus, f★ = Op(1). Also, from well-known properties of the F distribution, U ~ Beta{m/2, (n − m − 1)/2}, where Beta(a, b) denotes a Beta distribution. Therefore, {η/(1 + η)}U = Op(1), and hence log{1 − ηU/(1 + η)} = Op(1) as nmin → ∞. We then get
log{BF10(X★,Y★)} = −(m/2) log(1 + η) − {(n − 1)/2} log{1 − ηU/(1 + η)} → −∞ in probability,
since log(1 + η) → ∞ as nmin → ∞ and limnmin→∞ m/n = θ ∈ (0, 1). We conclude that log{BF10(X★,Y★)} → −∞ in probability under the null hypothesis.
Under the alternative, μ1 ≠ μ2 and δ ~ Np(0, Σ/τ0). Then, f★ | λ ~ Fm,n−m−1(λ) with non-centrality λ = n0δTR(RTΣR)−1RTδ. Since δ ~ Np(0, Σ/τ0), λ/η ~ χm2, where χm2 denotes a χ2 distribution with m degrees of freedom. The non-centrality parameter depends on n through n0. We can show that the unconditional distribution of f★/(1 + η) is Fm,n−m−1 (see Johnson, 2005, page 704). If we denote f0 = f★/(1 + η), we have f0 = Op(1) and mf0/n = Op(1) as nmin → ∞. Substituting f★ = (1 + η)f0 into the expression for log{BF10(X★,Y★)} above, the leading term is {(n − 1)/2 − m/2} log(1 + η), while the remaining terms involve only f0 = Op(1) and the convergent ratio m/(n − m − 1). We conclude that log{BF10(X★,Y★)} → ∞ in probability under the alternative hypothesis.
-
Part(b) We now assume that η → 0 and mη → ∞. We have
where U ~ Beta {m/2, (n − m − 1)/2} under H0. For large n, none of the terms involving n dominates, and their difference converges. The distribution of log{BF10(X★,Y★)} then depends on that of U, which is bounded in probability. Therefore, under H0, log{BF10(X★,Y★)} = Op(1).
Under H1, we again have log{BF10(X★,Y★)} = −(m/2) log(1 + η) − {(n − 1)/2} log{1 − ηU/(1 + η)}, where now U is computed from the non-central f★. Since log(1 + η){(n − 1)/2 − m/2} → ∞, we conclude that log{BF10(X★,Y★)} → ∞ in probability under the alternative.
Appendix C Proof of Theorem 2
In this proof, we denote m(n), τα(n) and γα(n) simply as mn, τn and γn, respectively. That the projection matrix R, the Bayes factor, the variable f★, their distributions etc. all depend on mn is implicitly understood and hence mn is suppressed in their notation.
-
Part(a) Under H0, for chosen values of α and mn, we have ER{ϕ(R) | H0} = PX,Y{BF10(XR,YR) > γn} = PX,Y(f★ > Fα,mn,n−mn−1) = α, making use of the results in Section 2.1 and noting that PX,Y{BF10(XR,YR) > γn} is computed over the data generating model under the null hypothesis.
Part(b) Under the alternative hypothesis, δ = δ1 ≠ 0, and f★ | λ ~ Fmn,n−mn−1(λ), with degrees of freedom mn and n − mn − 1 and non-centrality parameter λ = n0δ1TR(RTΣR)−1RTδ1. Hence, under the alternative hypothesis, P{BF10(XR,YR) > γα} = P(f★ > Fα,mn,n−mn−1) ≥ α, where Fα,mn,n−mn−1 is the upper α-quantile of an Fmn,n−mn−1 distribution, as seen in Section 2.1. Marginalizing over δ1 under the alternative, f★/(1 + ηn) ~ Fmn,n−mn−1, where ηn = n0/τn. Therefore, α = P{f★/(1 + ηn) > Fα,mn,n−mn−1} = P{f★ > (1 + ηn)Fα,mn,n−mn−1} ≤ P(f★ > Fα,mn,n−mn−1). We conclude that ER{ϕ(R) | H1} ≥ α.
-
Part(c) First, we show that our construction of mn satisfies mn/n → θ ∈ (0, 1). For a chosen n, Fα,m,n−m−1 is a convex function of m over its range of possible values, suggesting that mn and n − mn both diverge; see Figure 3.
The mean and variance of an Fmn,n−mn−1 distribution are μn = (n − mn − 1)/(n − mn − 3) and σn2 = 2(n − mn − 1)2(n − 3)/{mn(n − mn − 3)2(n − mn − 5)}. For large mn and n − mn, we have Fα,mn,n−mn−1 ≈ μn + σnΦ−1(α), where Φ−1(α) is the upper α-quantile of the standard normal distribution. Thus Fα,mn,n−mn−1 is at its minimum when σn is at its minimum, which happens when mn ≈ n/2. Hence, mn/n → θ = 1/2 ∈ (0, 1) as nmin → ∞. Second, we show that nηn → ∞ as nmin → ∞. From equation (8), we have (A.1). The variance of a central F distribution with mn and n − mn − 1 degrees of freedom is 2(n − mn − 1)2(n − 3)/{mn(n − mn − 3)2(n − mn − 5)} = O(1/mn), so the convergence of Fα,mn,n−mn−1 is slower than that of an O(1/mn) sequence. We conclude that nηn, and hence mnηn, diverge to ∞. For a chosen α and the sequence of alternatives under consideration, since Fα,mn,n−mn−1 → 1, we conclude that the power converges to one.
Figure 3.
Plot of Fα,m,n−m−1 against m = 1, 2, · · ·, n − 2 for different values of n = n1 + n2. The arrows point to values of mn obtained by our method for different n.
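The construction of mn in Part (c) can be mimicked numerically with the same normal approximation used in the proof, Fα,m,n−m−1 ≈ μn + σnΦ−1(α). The sketch below is illustrative rather than exact: it hardcodes the upper 5% normal quantile 1.645 instead of calling a quantile routine, so the minimizer is approximate.

```python
import numpy as np

def approx_f_upper_quantile(z_alpha, m, n):
    """Normal approximation to the upper quantile of F(m, n-m-1),
    built from the exact mean and variance of the F distribution."""
    d1, d2 = m, n - m - 1
    mean = d2 / (d2 - 2.0)
    var = 2.0 * d2**2 * (d1 + d2 - 2.0) / (d1 * (d2 - 2.0)**2 * (d2 - 4.0))
    return mean + z_alpha * np.sqrt(var)

z_05 = 1.645                      # upper 5% standard normal quantile

def best_m(n):
    # Search over m with both degrees of freedom well defined (d2 > 4).
    ms = np.arange(2, n - 6)
    q = np.array([approx_f_upper_quantile(z_05, m, n) for m in ms])
    return ms[np.argmin(q)]

for n in (100, 200, 400):
    m = best_m(n)
    print(n, m, m / n)            # ratio m/n stays in the interior of (0, 1)
```

For moderate n the minimizing m grows with n and sits well inside (0, n), consistent with the limit mn/n → θ ∈ (0, 1) established in the proof.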
Appendix D Proof of Theorem 3
The power of RMPBT depends on (n1, n2); henceforth, we make this dependence explicit in the notation.
For given n1, n2 and α, we choose so that . Since , for 0 < α < 1, we have that .
We have that as nmin → ∞, under the alternative for i = 1, · · ·, N. So, . Additionally, for fixed N as nmin →∞. We conclude that as nmin →∞.
Appendix E Proof of Theorem 4
The proof is similar to that of Theorem 2 of Srivastava et al. (2016) except that we do not rely on the cumulative distribution function of the F distribution.
Suppose R1, R2, · · ·, RN is a collection of independently sampled projection matrices. Let ϕi = ϕ(Ri) = 1{BF10(Ri, τα) > γα} for i = 1, 2, · · ·, N, and recall that ϕ̄(N) = N−1(ϕ1 + · · · + ϕN). Evaluating the conditional probability that ϕ̄(N) < x over the distribution of the random projection matrices given X and Y and then taking the expectation over the data, we get
| (A.2) |
We have
where ER(ϕ1 | X,Y ) and varR(ϕ1 | X,Y ) are respectively the conditional mean and variance of ϕ1. Given X and Y, the binary variables ϕi, i = 1, · · ·, N, are independent and identically distributed with finite mean and variance. By the Central Limit Theorem,
| (A.3) |
where Φ(a) is the standard normal cumulative distribution function evaluated at a. We need to show that both ER(ϕ1 | X,Y ) and varR(ϕ1 | X,Y ) have a distribution independent of μ1, μ2 and Σ under H0. This is equivalent to showing that EX,Y {ER(ϕ1 | X,Y )}r is independent of μ1, μ2, and Σ, for r = 1, 2, · · ·. From (7), we have
where f(R | X,Y) is the F-statistic computed from the data projected by R and PR is the probability measure associated with the random projection matrices. Thus, the rth moment of ER(ϕ1 | X,Y) is expressed as
| (A.4) |
Since 0 ≤ ER(ϕ1 | X,Y ) ≤ 1, we have
| (A.5) |
where we can safely interchange the order of integration by Fubini’s Theorem. Under H0, f(R | X,Y ) ~ Fmn,n−mn−1 and PX,Y {f(R | X,Y ) > Fα,mn,n−mn−1} = α. We conclude that (A.5) is independent of μ1, μ2, and Σ for any positive integer r.
Next, from (A.4), we have
| (A.6) |
where we can again safely exchange the order of integration using Fubini’s Theorem. In (A.6), since Ri, i = 1, · · ·, N, are identically and independently distributed with respect to the probability measure PR, we get that
is also independent of μ1, μ2, and Σ based on the result obtained in (A.5).
Also, we have
| (A.7) |
Using (A.2), (A.3), (A.7) and the bounded convergence theorem, we have
| (A.8) |
We conclude that for n1, n2, mn, and α, the asymptotic distribution of ϕ̄(N) as N → ∞ does not depend on the true parameters μ1, μ2, and Σ under H0.
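The conditional CLT step of the proof can be visualized with a toy simulation: given the data, the ϕi are iid Bernoulli indicators, so ϕ̄(N) is asymptotically normal. The value q below is an arbitrary stand-in for the conditional rejection probability ER(ϕ1 | X,Y), not a quantity computed from data.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5000          # number of random projections per ensemble
q = 0.3           # stand-in for the conditional rejection probability

# Conditional on the data, each phi_i is a Bernoulli(q) indicator and
# phi_bar(N) is their average; the CLT in (A.3) gives its normal limit.
phibars = rng.binomial(N, q, size=4000) / N

# Standardized averages should be approximately standard normal.
z = np.sqrt(N) * (phibars - q) / np.sqrt(q * (1 - q))
print(z.mean(), z.std())
```

The standardized averages have mean near 0 and standard deviation near 1, illustrating why the null distribution of ϕ̄(N) can be tabulated once and reused for any μ1, μ2 and Σ.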
Footnotes
The Supplementary Material presents additional tables from the simulation study and a section deriving a Bayes factor based on a proper joint normal-inverse-Wishart prior for the nuisance parameters. We provide a table comparing power estimates between a deterministic projection based test and a random projection based test, as well as additional tables from a simulation comparing the power of RMPBT to that of a test obtained from a proper prior. Julia code implementing our approach is also available as part of the Supplementary Material.
Contributor Information
Roger S. Zoh, Department of Epidemiology & Biostatistics, Texas A&M University, 1266 TAMU, College Station, TX 77843-1266, USA
Abhra Sarkar, Department of Statistical Science, Duke University, Box 90251, Durham NC 27708-0251, USA.
Raymond J. Carroll, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA. School of Mathematical Sciences, University of Technology, Sydney, Broadway NSW 2007, Australia
Bani K. Mallick, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA
References
- Bai ZD, Saranadasa H. Effect of high dimension: by an example of a two sample problem. Statistica Sinica. 1996;6:311–329.
- Chen SX, Qin YL. A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics. 2010;38:808–835.
- Dasgupta S. Experiments with random projection. Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000); Morgan Kaufmann; 2000. pp. 143–151.
- Dasgupta S, Gupta A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms. 2003;22:60–65.
- Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proceedings of the National Academy of Sciences. 2006;103:5923–5928. doi:10.1073/pnas.0601231103.
- Fern XZ, Brodley CE. Random projection for high dimensional data clustering: a cluster ensemble approach. Proceedings of the 20th International Conference on Machine Learning; 2003. pp. 186–193.
- Goddard SD, Johnson VE. Restricted most powerful Bayesian tests for linear models. Scandinavian Journal of Statistics. 2016;43(4):1162–1177.
- Gravier E, Pierron G, Vincent-Salomon A, Gruel N, Raynal V, Savignoni A, De Rycke Y, Pierga JY, Lucchesi C, Reyal F, Fourquet A, Roman-Roman S, Radvanyi F, Sastre-Garau X, Asselain B, Delattre O. A prognostic DNA signature for T1T2 node-negative breast cancer patients. Genes, Chromosomes and Cancer. 2010;49:1125–1125. doi:10.1002/gcc.20820.
- Gregory KB, Carroll RJ, Baladandayuthapani V, Lahiri SN. A two-sample test for equality of means in high dimension. Journal of the American Statistical Association. 2014;110:837–849. doi:10.1080/01621459.2014.934826.
- Guhaniyogi R, Dunson DB. Bayesian compressed regression. Journal of the American Statistical Association. 2015;110(512):1500–1514.
- Jeffreys H. Theory of Probability. 3rd ed. Oxford University Press; New York, NY: 1961.
- Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. Vol. 4. Prentice Hall; Englewood Cliffs, NJ: 1992.
- Johnson VE. Bayes factors based on test statistics. Journal of the Royal Statistical Society: Series B. 2005;67:689–701.
- Johnson VE. Revised standards for statistical evidence. Proceedings of the National Academy of Sciences. 2013a;110:19313–19317. doi:10.1073/pnas.1313476110.
- Johnson VE. Uniformly most powerful Bayesian tests. Annals of Statistics. 2013b;41:1716. doi:10.1214/13-AOS1123.
- Johnson VE, Rossell D. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B. 2010;72:143–170.
- Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
- Li P, Hastie TJ, Church KW. Very sparse random projections. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2006. pp. 287–296.
- Lopes M, Jacob L, Wainwright MJ. A more powerful two-sample test in high dimensions using random projection. Advances in Neural Information Processing Systems. 2011:1206–1214.
- McCarthy D, Chen Y, Smyth G. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Research. 2012;40(10):4288–4297. doi:10.1093/nar/gks042.
- Neyman J, Pearson ES. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. 1933;231:289–337.
- Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika. 1928;20A:263–294.
- Srivastava MS. Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society. 2007;37:53–86.
- Srivastava MS, Du M. A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis. 2008;99:386–402.
- Srivastava R, Li P, Ruppert D. RAPTT: an exact two-sample test in high dimensions using random projections. Journal of Computational and Graphical Statistics. 2016;25(3):954–970.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545–15550. doi:10.1073/pnas.0506580102.
- Thulin M. A high-dimensional two-sample test for the mean using random subspaces. Computational Statistics & Data Analysis. 2014;74:26–38.
- Wu Y, Genton MG, Stefanski LA. A multivariate two-sample mean test for small sample size and missing data. Biometrics. 2006;62:877–885. doi:10.1111/j.1541-0420.2006.00533.x.