Biometrika. 2017 Mar 18;103(3):609–624. doi: 10.1093/biomet/asw029

An adaptive two-sample test for high-dimensional means

Gongjun Xu, Lifeng Lin, Peng Wei, Wei Pan
PMCID: PMC5549874  NIHMSID: NIHMS886329  PMID: 28804142

SUMMARY

Several two-sample tests for high-dimensional data have been proposed recently, but they are powerful only against certain alternative hypotheses. In practice, since the true alternative hypothesis is unknown, it is unclear how to choose a powerful test. We propose an adaptive test that maintains high power across a wide range of situations and study its asymptotic properties. Its finite-sample performance is compared with that of existing tests. We apply it and other tests to detect possible associations between bipolar disease and a large number of single nucleotide polymorphisms on each chromosome based on data from a genome-wide association study. Numerical studies demonstrate the superior performance and high power of the proposed test across a wide spectrum of applications.

Keywords: Genome-wide association study, Single nucleotide polymorphism, Sum-of-powers test

1. Introduction

Two-sample testing on the equality of two high-dimensional means is common in genomics and genetics. For instance, Chen & Qin (2010) considered analysis of differential expressions for gene sets based on microarray data. In our motivating example and other genome-wide association studies (The International Schizophrenia Consortium, 2009), polygenic testing is of interest: one would like to test whether there is any association between a disease and a large number of genetic variants, mostly single nucleotide polymorphisms. In these applications, the dimension of the data, $p$, is often much larger than the sample size $n$. Traditional multivariate two-sample tests, such as the $T^2$-test of Hotelling (1931), either cannot be directly applied or have too little power. As shown theoretically in Fan (1996), as the dimension $p$ increases, even for simple one-sample testing on the mean of a normal distribution with a known covariance matrix Inline graphic, the standard Wald, score or likelihood ratio tests may have power that decreases to the Type I error rate as the departure from the null hypothesis increases. Several two-sample tests for high-dimensional data have been proposed (Bai & Saranadasa, 1996; Srivastava & Du, 2008; Chen & Qin, 2010; Cai et al., 2014; Gregory et al., 2015; Srivastava et al., 2015). There are two common types of testing approach when Inline graphic: one based on the sum-of-squares of the sample mean differences and the other based on the maximum componentwise sample mean difference. The two types of tests are powerful against different alternatives: if the true mean differences are dense in the sense that there is a large proportion of small to moderate componentwise differences, then the former type is more powerful; in contrast, if the true mean differences are sparse in the sense that there are only a few large nonzero componentwise differences, the latter type of test is more powerful. In practice, however, it is unclear which should be applied. Furthermore, as will be shown in the simulation study, there are denser and intermediate situations in which neither type of test is powerful.

In this paper, we develop an adaptive testing procedure which yields high testing power against various alternative hypotheses in the high-dimensional setting. This is achieved through combining information across a class of sum-of-powers tests, including tests based on the sum-of-squares of the mean differences and the supremum mean difference. The main idea is to incorporate multiple tests in the procedure so that at least one of them would yield a high power for a particular application with unknown truth. The proposed adaptive sum-of-powers test then selects the most powerful of the candidate tests. To perform the proposed test, we establish the asymptotic null distribution of the adaptive test statistic. In particular, we derive the joint asymptotic distribution for a set of the sum-of-powers test statistics. The marginal distributions of the test statistics converge to the normal distribution or the extreme value distribution, depending on the power parameters. Based on the theoretical results, we develop a new way to calculate asymptotic $p$-values for the adaptive test.

We further demonstrate the superior performance of the proposed adaptive test in the context of large $p$ and small $n$. We compare its performance with several existing tests which have not yet been applied to single nucleotide polymorphism data. Due to the discrete nature of single nucleotide polymorphism data, normal-based parametric tests are not suitable. In addition, although the sparsity assumption has been widely adopted, the nonzero differences in single nucleotide polymorphism data may not be sparse, as predicted by the polygenic theory of Fisher (1918). The problem of nonsparse signals has begun to attract the attention of statisticians (e.g., Hall et al., 2014). It is highly relevant here because the performance of a test, especially a nonadaptive one, may depend on how sparse the signals are, as illustrated in the real-data analysis. An R (R Development Core Team, 2016) package, highmean, that implements the tests studied here is available from the Comprehensive R Archive Network, CRAN.

2. Some existing tests

Suppose that we observe two groups of $p$-dimensional independent and identically distributed samples Inline graphic and Inline graphic; we consider high-dimensional data with Inline graphic. Let Inline graphic and Inline graphic denote the true mean vectors of the groups, and assume throughout that the two groups share a common covariance matrix Inline graphic. Our primary objective is to test Inline graphic versus Inline graphic. In this section, we review existing two-sample tests for high-dimensional data. For Inline graphic, let Inline graphic be the sample mean for group Inline graphic, and let Inline graphic be the pooled sample covariance matrix. The precision matrix, i.e., the inverse of the covariance matrix, is written as Inline graphic. Moreover, for a vector Inline graphic, we denote by Inline graphic its Inline graphicth element.

The best-known two-sample test for low-dimensional data is the $T^2$-test of Hotelling (1931), which is a generalization of the two-sample $t$-test for Inline graphic to multivariate data with Inline graphic but Inline graphic: Inline graphic. The $T^2$-test, however, is not applicable to high-dimensional data because Inline graphic is singular. Accordingly, some modifications have been proposed in which Inline graphic is replaced by a known quantity or another estimate. A straightforward procedure is to substitute an identity matrix Inline graphic for Inline graphic, forming a sum-of-squares-type test, which is directly based on the $L_2$-norm of the sample mean differences, Inline graphic, or its weighted version (Bai & Saranadasa, 1996; Srivastava & Du, 2008; Chen & Qin, 2010). Bai & Saranadasa (1996) proposed a test statistic

graphic file with name M40.gif

and established its asymptotic normal null distribution. Chen & Qin (2010) noticed some theoretical difficulties due to the presence of the cross-product terms Inline graphic in Inline graphic, and proposed removing them to obtain a new test statistic

graphic file with name M43.gif

whose asymptotic properties were established under much weaker conditions.
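The two display equations for these statistics are missing from this copy. As a hedged reconstruction taken from the cited papers rather than from the lost images, and assuming the notation $n = n_1 + n_2 - 2$, $S$ for the pooled sample covariance matrix, $\bar X_k$ for the sample mean of group $k$ and $X_{ki}$ for the $i$th observation in group $k$ (all of these symbols are assumptions), the Bai & Saranadasa (1996) statistic and the unstandardized Chen & Qin (2010) statistic are commonly written as follows.

```latex
T_{\mathrm{BS}}
  = \frac{\dfrac{n_1 n_2}{n_1+n_2}\,(\bar X_1-\bar X_2)^{\mathrm T}(\bar X_1-\bar X_2)
          - \operatorname{tr}(S)}
         {\sqrt{\dfrac{2n(n+1)}{(n-1)(n+2)}
          \left\{\operatorname{tr}(S^2) - n^{-1}(\operatorname{tr} S)^2\right\}}},
\qquad
T_{\mathrm{CQ}}
  = \frac{\sum_{i\neq j} X_{1i}^{\mathrm T}X_{1j}}{n_1(n_1-1)}
  + \frac{\sum_{i\neq j} X_{2i}^{\mathrm T}X_{2j}}{n_2(n_2-1)}
  - \frac{2\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_{1i}^{\mathrm T}X_{2j}}{n_1 n_2}.
```

The second expression omits the $i = j$ terms within each group, which corresponds to the removal described above; it is usually divided by an estimate of its standard deviation before being compared with a normal limit.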

To account for possibly varying variances of the components of the data, one may replace Inline graphic by a diagonal version Inline graphic, where Inline graphic are the diagonal elements of Inline graphic; the matrix Inline graphic is in general nonsingular. Srivastava & Du (2008) introduced such a weighted version of the sum-of-squares-type test of Bai & Saranadasa (1996):

graphic file with name M49.gif

where Inline graphic is the sample correlation matrix and Inline graphic.

All of the above sum-of-squares-type test statistics are asymptotically normal under $H_0$. These tests are usually powerful against moderately dense alternative hypotheses, where there is a large proportion of nonzero components in the true mean differences Inline graphic. However, if the nonzero signals are sparse, these tests lose substantial power (Cai et al., 2014). Accordingly, Cai et al. (2014) proposed a supremum-type statistic using the $L_\infty$-norm of the sample mean differences, i.e.,

graphic file with name M55.gif

where Inline graphic are the diagonal elements of the covariance matrix Inline graphic. In practice, we use the sample variances Inline graphic to estimate the Inline graphic.
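To make the contrast between the two types of statistic concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the supremum-type statistic is the largest squared mean difference divided by its variance and scaled by n1*n2/(n1+n2), and it uses the plain squared norm for the sum-of-squares type. The function names and the two mean-difference vectors are hypothetical.

```python
def sum_of_squares_stat(diff):
    """Squared L2-norm of the vector of sample mean differences."""
    return sum(d * d for d in diff)


def supremum_stat(diff, variances, n1, n2):
    """Largest standardized squared mean difference (assumed scaling n1*n2/(n1+n2))."""
    scale = n1 * n2 / (n1 + n2)
    return scale * max(d * d / v for d, v in zip(diff, variances))


# Two hypothetical mean-difference vectors with the same squared L2-norm.
p, n1, n2 = 100, 50, 50
variances = [1.0] * p
dense = [0.3] * p                  # many small differences
sparse = [3.0] + [0.0] * (p - 1)   # a single large difference

for name, diff in (("dense", dense), ("sparse", sparse)):
    print(name,
          round(sum_of_squares_stat(diff), 6),
          round(supremum_stat(diff, variances, n1, n2), 6))
```

Both vectors give the same sum-of-squares value, so a sum-of-squares test cannot tell them apart, while the supremum-type value is far larger for the sparse vector.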

A supremum-type statistic and a sum-of-squares-type statistic represent two extremes: the former uses only a single component as evidence against the null hypothesis, while the latter uses all of the components. Neither of the statistics will be uniformly better; they are more powerful for sparse and dense nonzero signals, respectively (Gregory et al., 2015). However, for more dense or only weakly dense nonzero signals, neither may be powerful: there may not be a single component to represent a strong departure from Inline graphic, whereas a sum-of-squares statistic may accumulate too much noise through summing over the zero components. To boost the power when nonzero signals are neither too dense nor too sparse, Chen et al. (2014) proposed removing estimated zero components through thresholding; since zero components are expected to give small squared sample mean differences, those smaller than a given threshold would be ignored, leading to a test statistic

graphic file with name M61.gif

where the threshold level is Inline graphic and Inline graphic is the indicator function. Since an optimal choice of the threshold is unknown, Chen et al. (2014) proposed trying all possible threshold values and then choosing the most significant one as the final test statistic:

graphic file with name M64.gif

where Inline graphic and Inline graphic are estimates of the mean and standard deviation of Inline graphic under the null hypothesis. The asymptotic null distribution of Inline graphic is an extreme value distribution. Because of the slow convergence to the asymptotic null distribution, Chen et al. (2014) proposed using the parametric bootstrap to calculate its $p$-values. The test Inline graphic can be regarded as an adaptive test: it uses thresholding to adapt to unknown signal sparsity. It is closely related to another adaptive test for association analysis of rare variants in genetics (Pan & Shen, 2011).
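The thresholding idea can be sketched in a few lines of Python. Because the displays are missing from this copy, the exact form below is an assumption: each standardized squared difference is centred by 1, its mean under the null hypothesis, and kept only if it exceeds a threshold of the form 2 s log p for a hypothetical tuning parameter s in (0, 1). The function name and inputs are illustrative only.

```python
import math


def thresholded_stat(diff, variances, n1, n2, s):
    """Sum of centred standardized squared differences that exceed the threshold.

    Assumed form: threshold = 2 * s * log(p); components at or below it are ignored.
    """
    p = len(diff)
    scale = n1 * n2 / (n1 + n2)
    threshold = 2.0 * s * math.log(p)
    total = 0.0
    for d, v in zip(diff, variances):
        z2 = scale * d * d / v          # standardized squared difference
        if z2 > threshold:
            total += z2 - 1.0           # centre by the null mean of z2
    return total


# Hypothetical example: one strong component, one weak, two exact zeros.
print(thresholded_stat([3.0, 0.1, 0.0, 0.0], [1.0] * 4, 50, 50, s=0.5))
```

The adaptive version described above would evaluate a standardized form of this quantity over a range of thresholds and keep the most significant value; that step is not shown here because it needs the null mean and standard deviation estimates.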

Remark 1. —

Sum-of-squares-type tests and supremum-type tests have also been used in analyses of genome-wide association studies with large $n$ and small $p$. For example, in the framework of generalized linear models, the sum-of-squared-score test in Pan (2009) for association analysis of multiple single nucleotide polymorphisms can be regarded as a sum-of-squares-type test, while another widely used test in single nucleotide polymorphism analysis is similar to the supremum-type test of Cai et al. (2014). As shown in Pan (2011), the sum-of-squared-score test is equivalent to the variance-component-score test with a linear kernel (Wu et al., 2010) and a nonparametric multivariate analysis of variance (Wessel & Schork, 2006), both used in genetics, as well as to an empirical Bayes test for high-dimensional data (Goeman et al., 2006).

Remark 2. —

Cai et al. (2014) also introduced test statistics based on linearly transformed sample mean differences. Although they discussed the transformation only for their supremum-type statistic, the same transformation can be applied to other test statistics (Chen et al., 2014). However, the transformation may not work for very dense signals; for example, in some cases with more than Inline graphic nonzero signals for Inline graphic, a test using the precision matrix transformation could be outperformed by that without transformation (Cai et al., 2014). Furthermore, conducting the Inline graphic-transformation requires an estimate of the $p \times p$ precision matrix, which is time-consuming for large $p$ (Gregory et al., 2015). More importantly, any test can be conducted on either the original or the transformed data, which is not a focus here. Therefore, this article does not consider data transformations.

3. Main results

3.1. Test statistics

We first propose a family of sum-of-powers tests, indexed by a positive integer $\gamma$. For any Inline graphic, we define a sum-of-powers test statistic with power index $\gamma$ as

graphic file with name M81.gif

When $\gamma = 2$, this yields a sum-of-squares-type test statistic equivalent to that of Bai & Saranadasa (1996). Since, for even $\gamma \to \infty$,

graphic file with name M84.gif

following the supremum-type test statistic in Cai et al. (2014) we define

graphic file with name M85.gif

Thus, the class of the sum-of-powers tests includes both a sum-of-squares test and a supremum-type test as special cases. Furthermore, Inline graphic is like a burden test widely studied in genetic association analysis of rare variants for large $n$ and small $p$ (Pan & Shen, 2011; Lee et al., 2012). If nonzero signals are extremely dense with almost the same sign, then a burden test like Inline graphic can be more powerful than both the sum-of-squares and the supremum-type tests; see our numerical examples and § 3.3. Similarly, there are situations with only weakly dense signals, in which an Inline graphic test with Inline graphic may be more powerful than both the sum-of-squares and the supremum-type tests.
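The family can be illustrated with a short Python sketch. It assumes, since the displays are missing from this copy, that the statistic with finite power index gamma is the sum of the componentwise mean differences raised to the power gamma, and that the infinite-index statistic is the largest squared difference divided by its variance times (1/n1 + 1/n2). The function names and the example vector are hypothetical.

```python
def spu_statistic(diff, gamma):
    """Sum of componentwise mean differences raised to the integer power gamma."""
    return sum(d ** gamma for d in diff)


def spu_infinity(diff, variances, n1, n2):
    """Supremum-type member of the family (assumed standardization)."""
    scale = 1.0 / n1 + 1.0 / n2
    return max(d * d / (v * scale) for d, v in zip(diff, variances))


diff = [0.5, -2.0, 1.0]               # hypothetical mean differences
print(spu_statistic(diff, 1))         # burden-type: signs can cancel
print(spu_statistic(diff, 2))         # sum-of-squares type
print(spu_statistic(diff, 3))         # odd index keeps the sign of each term

# For a large even index, the gamma-th root approaches the largest |difference|.
print(spu_statistic(diff, 50) ** (1.0 / 50))
```

With an odd index the positive and negative differences partly cancel, which is why such statistics favour signals of a common sign, while a large even index is dominated by the single largest component.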

Which Inline graphic is most powerful depends on the unknown pattern of nonzero signals, such as sparsity and signal strength. Hence, we propose the following adaptive test to combine the sum-of-powers tests and improve the test power:

graphic file with name M93.gif (1)

where Inline graphic is the $p$-value of the Inline graphic test. The idea of taking the minimum $p$-value to approximate the maximum power has been widely used (e.g., Yu et al., 2009), but Inline graphic is no longer a genuine $p$-value. In order to perform the proposed adaptive test, in the next section we derive its asymptotic distribution. In practice, one has to decide what candidate values of $\gamma$ are to be used. From the theoretical power study in § 3.3 and the simulation study, we suggest using Inline graphic with Inline graphic or a little bigger for a larger Inline graphic ratio.
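The following Python sketch shows why the minimum $p$-value needs recalibrating. It is not the asymptotic procedure of this paper: it swaps in a permutation calibration, a common alternative for adaptive tests of this kind, and for brevity it uses unstandardized statistics with absolute values. All names, the candidate set and the data are hypothetical.

```python
import random

INF = float("inf")


def mean_diff(a, b):
    """Componentwise difference of sample means of two lists of rows."""
    p = len(a[0])
    return [sum(r[i] for r in a) / len(a) - sum(r[i] for r in b) / len(b)
            for i in range(p)]


def spu_stats(diff, gammas):
    """Absolute sum-of-powers statistics; INF uses the largest |difference|."""
    return [max(abs(d) for d in diff) if g == INF
            else abs(sum(d ** g for d in diff)) for g in gammas]


def adaptive_permutation_pvalue(x1, x2, gammas=(1, 2, 3, 4, INF),
                                n_perm=199, seed=0):
    rng = random.Random(seed)
    pooled, n1, m = x1 + x2, len(x1), len(gammas)
    observed = spu_stats(mean_diff(x1, x2), gammas)
    perm = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)           # random relabelling of the two groups
        perm.append(spu_stats(mean_diff(shuffled[:n1], shuffled[n1:]), gammas))
    # Permutation p-value of each individual test.
    p_each = [(1 + sum(t[j] >= observed[j] for t in perm)) / (n_perm + 1)
              for j in range(m)]
    p_min = min(p_each)
    # Null distribution of the minimum p-value, reusing the same permutations.
    hits = 0
    for b, t in enumerate(perm):
        p_b = [(1 + sum(u[j] >= t[j] for k, u in enumerate(perm) if k != b))
               / n_perm for j in range(m)]
        hits += min(p_b) <= p_min
    return p_each, (1 + hits) / (n_perm + 1)


# Hypothetical data: group 2 is group 1 shifted by 5 in every component.
x1 = [[0.1 * ((i + j) % 5) for j in range(5)] for i in range(10)]
x2 = [[v + 5.0 for v in row] for row in x1]
p_each, p_adaptive = adaptive_permutation_pvalue(x1, x2)
print(p_each, p_adaptive)
```

The second stage compares the observed minimum with the minima obtained under permutation, so the final value accounts for having chosen the best of several tests.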

Remark 3. —

Our tests for small $n$ and large $p$ are in the same spirit as those proposed for analysis of rare variants with large $n$ and small $p$ (Pan et al., 2014). Specifically, Pan et al. (2014) defined spuInline graphic, where Inline graphic is the score vector for a parameter, say Inline graphic, in a generalized linear model under a null hypothesis Inline graphic. The spuInline graphic test can be regarded as a weighted score test (Lin & Tang, 2011) with weights Inline graphic. In the current context, the score Inline graphic becomes the sample mean difference, so we use the same name and denote the adaptive test statistic by Inline graphic in (1). Apart from the difference between the small-$n$ large-$p$ and large-$n$ small-$p$ scenarios, asymptotic results for the adaptive test have not yet been described in the literature. In this paper we derive asymptotics of the test statistics Inline graphic in the high-dimensional setting, based on which we can calculate the asymptotic $p$-values of Inline graphic and Inline graphic.

3.2. Asymptotic theory

For simplicity, we present our results under the assumption of a common covariance matrix Inline graphic, although our derivations and proofs in the Supplementary Material are established without this assumption. In the following we write Inline graphic. Under Inline graphic, we first derive asymptotic approximations to the mean and variance of Inline graphic for Inline graphic, denoted by Inline graphic and Inline graphic, respectively. We assume that Inline graphic as Inline graphic. We write Inline graphic if Inline graphic and let Inline graphic denote the largest integer not greater than Inline graphic.

We need the following assumptions.

Condition 1 (Covariance assumption). —

There exists some constant Inline graphic such that Inline graphic, where Inline graphic and Inline graphic denote the minimum and maximum eigenvalues of the covariance matrix Inline graphic. In addition, all correlations are bounded away from $-1$ and 1, i.e., Inline graphic for some Inline graphic.

Condition 2 (Mixing assumption). —

For a set of multivariate random vectors Inline graphic and integers Inline graphic, let Inline graphic be the $\sigma$-algebra generated by Inline graphic. For each Inline graphic, define the Inline graphic-mixing coefficient Inline graphic. We assume that Inline graphic is Inline graphic-mixing for Inline graphic and that Inline graphic, where Inline graphic and Inline graphic is some constant.

Condition 3 (Moment assumption). —

We assume that Inline graphic and $\max_{1\le i\le p} E[\exp\{h(X_{k1}^{(i)}-\mu_k^{(i)})^2\}]<\infty$ for Inline graphic and Inline graphic.

Remark 4. —

Conditions 1 and 3 were also assumed in Cai et al. (2014), and they are needed to establish the weak convergence of Inline graphic. When Inline graphic, asymptotic normality can be established under weaker assumptions on the eigenvalues and correlations. However, in order to establish weak convergence of Inline graphic for Inline graphic, stronger moment assumptions may be needed than those in Chen & Qin (2010), whose test statistic is similar to Inline graphic. Condition 2 imposes weak dependence on the data. A similar mixing condition is considered in Chen et al. (2014), and such weak dependence is also commonly assumed in time series and spatial statistics. Alternatively, we may consider the weak dependence structure introduced in Bai & Saranadasa (1996) and Chen & Qin (2010), where a factor-type model for Inline graphic is assumed. Since the variables in the motivating genome-wide association studies have a local dependence structure, with their correlations often decaying to zero as their physical distances on a chromosome increase, we focus on mixing-type weak dependence in this paper.

We write Inline graphic, where Inline graphic. Then the following approximation holds for Inline graphic and Inline graphic with Inline graphic.

Proposition 1. —

Under Inline graphic, we have Inline graphic and

graphic file with name M175.gif

where Inline graphic is the third central moment of the random variable in component Inline graphic from group Inline graphic, i.e., Inline graphic.

For any positive integers Inline graphic and Inline graphic with Inline graphic even, define a set Inline graphic of integers Inline graphic such that Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic. For simplicity, we write the set as Inline graphic.

Proposition 2. —

Under Conditions 1–3 and Inline graphic, Inline graphic where Inline graphic is a Inline graphic vector whose elements are all 1, and, for Inline graphic,

graphic file with name M198.gif (2)

Because Inline graphic, we have that Inline graphic if Inline graphic for any Inline graphic. Since the boundedness condition on the eigenvalues, Condition 1, implies the boundedness of the variances Inline graphic, Inline graphic is of order Inline graphic.

To derive the asymptotic joint distribution of the test statistics Inline graphic, we also need the following result to approximate their correlations: Inline graphic.

Proposition 3. —

Under Conditions 1–3 and Inline graphic, for finite Inline graphic, if Inline graphic is even then

graphic file with name M211.gif

and if Inline graphic is odd, Inline graphic.

Now we are ready to introduce the asymptotic joint distributions for the test statistics Inline graphic.

Theorem 1. —

Let Inline graphic be a candidate set of Inline graphic values containing Inline graphic. Assume that Inline graphic for Inline graphic. Under Conditions 1–3 and the null hypothesis Inline graphic, the following properties hold:

  1. for the set Inline graphic, Inline graphic converges weakly to a normal distribution Inline graphic, where Inline graphic satisfies Inline graphic for Inline graphic and Inline graphic for Inline graphic. In particular, Inline graphic when Inline graphic is odd;

  2. when Inline graphic, Inline graphic for any Inline graphic, where Inline graphic;

  3. Inline graphic and Inline graphic are asymptotically independent.

We can use Propositions 1–3 to approximate Inline graphic, Inline graphic and Inline graphic, respectively, and then calculate the $p$-value for the proposed adaptive test. Define Inline graphic and Inline graphic as the sets consisting of standardized Inline graphic with Inline graphic odd and even, respectively, i.e., Inline graphic and Inline graphic. By Theorem 1, Inline graphic and Inline graphic are asymptotically independent, and each is asymptotically independent of Inline graphic. Thus we can obtain the $p$-value of the adaptive test from these three sets of statistics. Consider the realizations of the test statistics, Inline graphic and Inline graphic. We calculate the $p$-values for Inline graphic and Inline graphic as Inline graphic and Inline graphic. We use the function pmvnorm in the R package mvtnorm to calculate the multivariate normal tail probabilities Inline graphic and Inline graphic (R Development Core Team, 2016). Finally, we take the minimum $p$-value from the odd, even and infinity tests, i.e., Inline graphic; then, by the asymptotic independence of Inline graphic, Inline graphic and Inline graphic, the asymptotic $p$-value for the adaptive test is Inline graphic.
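The final combination step can be sketched in Python. The sketch relies on one standard fact: if m $p$-values are independent and each is uniform on (0, 1) under the null hypothesis, then their minimum is at most t with probability 1 - (1 - t)^m. Applying this with m = 3 to the odd, even and infinity $p$-values is an assumption about the form of the missing final formula, and the function name and inputs are hypothetical.

```python
def combine_min_pvalue(p_values):
    """P-value of the minimum of m independent Uniform(0, 1) p-values."""
    m = len(p_values)
    p_min = min(p_values)
    return 1.0 - (1.0 - p_min) ** m


# Hypothetical p-values from the odd, even and infinity groups of statistics.
p_odd, p_even, p_inf = 0.01, 0.5, 0.2
p_combined = combine_min_pvalue([p_odd, p_even, p_inf])
print(p_combined)
```

The combined value is larger than the raw minimum, which reflects the price of selecting the best of three groups, and it never exceeds the Bonferroni bound of three times the minimum.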

The above discussion focuses on the case where the covariance matrix Inline graphic is known. In practice, Inline graphic must be estimated. We can apply existing methods, such as banding and thresholding techniques, to estimate a high-dimensional sparse covariance matrix (Bickel & Levina, 2008; Rothman et al., 2010; Cai & Liu, 2011; Xue et al., 2012). In the simulation study and real-data analysis, we used the banding approach of Bickel & Levina (2008): for a sample covariance matrix Inline graphic, the banded matrix with bandwidth Inline graphic is defined as Inline graphic. Theoretical properties of Inline graphic have been studied in Bickel & Levina (2008). We used five-fold crossvalidation to select an optimal bandwidth in our simulations and real-data analysis (Bickel & Levina, 2008; Cai & Liu, 2011). Under the conditions in Theorem 1, we can show that Inline graphic estimated based on the banded matrix Inline graphic satisfies Inline graphic for properly chosen Inline graphic. Consider the approximation of Inline graphic in (2). Under the weak dependence condition, Condition 2, for any Inline graphic and Inline graphic, there is a constant Inline graphic such that Inline graphic (see, e.g., Guyon, 1995). Therefore, for Inline graphic as Inline graphic, the sum of terms with Inline graphic, i.e., Inline graphic, is negligible. On the other hand, in Inline graphic there are Inline graphic summands in total. Since Inline graphic, we can obtain Inline graphic if Inline graphic. By the result that Inline graphic is of order Inline graphic, we obtain Inline graphic. Similarly, we can show that Inline graphic and the estimators of the correlations are consistent.
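Banding itself is simple to sketch in Python. The version below follows the usual definition associated with Bickel & Levina (2008), in which entries more than k positions away from the diagonal are set to zero; the function name and the small matrix are hypothetical, and bandwidth selection by crossvalidation is not shown.

```python
def band_matrix(s, k):
    """Keep entries with |i - j| <= k and set all other entries to zero."""
    p = len(s)
    return [[s[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


# Hypothetical 3 x 3 sample covariance matrix.
sample_cov = [[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]]

print(band_matrix(sample_cov, 1))   # corner entries are zeroed
print(band_matrix(sample_cov, 0))   # only the diagonal survives
```

A small bandwidth suits the local dependence described for genome-wide association data, where correlations decay with physical distance along a chromosome.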

In applications, the components of the observations may be measured on different scales. Therefore, we could consider an inverse variance weighted test statistic Inline graphic (Inline graphic). For Inline graphic, Inline graphic is already weighted by the inverse variances and we let Inline graphic. To calculate the $p$-values for Inline graphic and the corresponding adaptive test, it is straightforward to use the asymptotic properties on the weighted samples Inline graphic, where Inline graphic and Inline graphic. In practice, we replace the unknown Inline graphic with the sample variances Inline graphic. Results similar to Theorem 1 can be established.

Remark 5. —

For simplicity, this paper focuses on the case where two groups of samples share a common covariance matrix. More generally, the two groups may have different covariance matrices, Inline graphic. In this situation, one may apply a two-sample test without assuming a common covariance matrix: the definitions of the tests remain the same, while for the weighted tests, the weights for the sample mean differences become the reciprocals of the diagonal elements in Inline graphic. The asymptotic properties of the proposed tests are still valid in this situation; see the Supplementary Material.

Remark 6. —

The asymptotic independence of the sum-of-squares- and supremum-type statistics has been studied in Hsing (1995) for weakly dependent observations. Under the sparse signal alternative with Inline graphic, similar tests to the proposed Inline graphic and Inline graphic have also been studied in Zhong et al. (2013) with an additional higher criticism thresholding of the means; the asymptotic independence between the sum-of-squares-type statistics and a screening statistic by higher criticism thresholding has been studied in Fan et al. (2015). However, our study differs from theirs in several respects. First, our proposed method is adaptive and powerful for both sparse and dense signal alternatives, as shown by the theoretical and numerical results, whereas Zhong et al. (2013) and Fan et al. (2015) focus on sparse alternatives. As illustrated in the simulation, when the signals are dense, the proposed test performs better than the thresholding-type test in Chen et al. (2014). Second, we theoretically study a family of power statistics Inline graphic with different finite and infinite values of Inline graphic and establish their joint distribution; Zhong et al. (2013), on the other hand, focused on Inline graphic- and Inline graphic-type statistics and studied their performance separately, while Fan et al. (2015) considered the limiting behaviour of the summation of a sum-of-squares-type statistic and a screening statistic by higher criticism thresholding.

3.3. Asymptotic power analysis

In this section, we analyse the asymptotic power of the proposed adaptive test. Under the alternative Inline graphic, we first derive approximations for the mean, variance and covariance functions for Inline graphic with Inline graphic, denoted respectively by Inline graphic, Inline graphic and Inline graphic for Inline graphic. We write Inline graphic (Inline graphic).

Proposition 4. —

Under the regularity conditions in Theorem 1 and Inline graphic,

graphic file with name M326.gif

where approximations for Inline graphic and Inline graphic are given in Proposition 1. In particular, Inline graphic, Inline graphic, Inline graphic, and

graphic file with name M332.gif

Proposition 5. —

Under the conditions in Theorem 1 and Inline graphic,

graphic file with name M334.gif

where for Inline graphic or Inline graphic, Inline graphic, and for Inline graphic and Inline graphic,

graphic file with name M340.gif

where Inline graphic for Inline graphic and Inline graphic is the set of nonnegative integers Inline graphic such that Inline graphic, Inline graphic, Inline graphic and Inline graphic or Inline graphic.

The variance function is Inline graphic. In particular, Inline graphic and Inline graphic.

We now analyse the power of the test. For the test statistic in (1), let Inline graphic be the critical threshold under Inline graphic with significance level Inline graphic. The test power under Inline graphic then satisfies Inline graphic for any Inline graphic. Therefore, the asymptotic power of the proposed adaptive test is 1 if there exists Inline graphic such that Inline graphic, that is, if Inline graphic has asymptotic power equal to 1. Hence, to study the asymptotic power of the adaptive test, we only need to focus on the power of Inline graphic for Inline graphic.

Under the alternative, we denote the set of locations of the signals by Inline graphic and the cardinality of Inline graphic by Inline graphic, where Inline graphic is the sparsity parameter. In the following, we consider two cases: the dense signal case with Inline graphic and the sparse signal case with Inline graphic.

Case 1: Inline graphic. To study the asymptotic power, we consider the local alternative with small Inline graphic. Consider the set Inline graphic, and for any finite Inline graphic define the corresponding average standardized signal as Inline graphic. If Inline graphic with Inline graphic, then Inline graphic and Inline graphic. A proof similar to that of Theorem 1 gives the following result.

Theorem 2. —

Under the conditions in Theorem 1 and the alternative Inline graphic with Inline graphic and Inline graphic for Inline graphic, Inline graphic converges weakly to a multivariate normal distribution with mean zero and covariance matrix Inline graphic given in Proposition 5.

Theorem 2 gives the asymptotic test power of Inline graphic at significance level Inline graphic as

graphic file with name M387.gif

where Inline graphic is the standard normal cumulative distribution function and Inline graphic is its Inline graphicth quantile. Since Inline graphic is bounded under the alternative considered, the asymptotic power is mainly dominated by Inline graphic. In addition, Inline graphic is of order Inline graphic and therefore the power goes to 1 if Inline graphic. Intuitively speaking, the power of the adaptive test converges to 1 if some of the average standardized signals are of order higher than Inline graphic, which is Inline graphic. For example, when Inline graphic or 2, from the derivations in Proposition 4 we have that the asymptotic power of Inline graphic or Inline graphic goes to 1 if Inline graphic or Inline graphic, that is, if Inline graphic or Inline graphic is of order higher than Inline graphic.

For different values of Inline graphic, the test statistic Inline graphic that achieves the highest power depends on the specific dense alternative. To further study the power of different test statistics Inline graphic and how to choose the set Inline graphic, we consider a special case where the signal strength is fixed at the same level, Inline graphic, Inline graphic and Inline graphic. In this case, we show in the Supplementary Material that under the alternative hypothesis with small Inline graphic, the Inline graphic test is asymptotically more powerful than the other Inline graphic tests. On the other hand, because of the slow convergence rate to the asymptotic distribution, which depends on the value of Inline graphic, the performance of Inline graphic for a finite sample may not be as good as that of Inline graphic tests with Inline graphic, especially when the sparsity parameter Inline graphic is close to Inline graphic and Inline graphic is not large enough; see the Supplementary Material. Similarly, we can show that Inline graphic is asymptotically more powerful if the absolute values of the Inline graphic have the same level but the signs are random with about half being positive.

Case 2: Inline graphic. The result in Case 1 implies that when Inline graphic and Inline graphic, the test power of Inline graphic goes to 1 if Inline graphic, which is satisfied in most cases if some average standardized signal is of order higher than Inline graphic. However, in the sparse setting with Inline graphic and Inline graphic, Inline graphic loses power. To illustrate this, take Inline graphic and 2. For any Inline graphic, the powers of Inline graphic and Inline graphic converge to 1 if Inline graphic and Inline graphic are of order higher than Inline graphic. However, when Inline graphic, Inline graphic and the asymptotic powers of Inline graphic and Inline graphic are strictly less than 1 even if Inline graphic and Inline graphic.

On the other hand, Inline graphic is known to be powerful against sparse alternatives; therefore, the proposed adaptive sum-of-powers test still has asymptotic power equal to 1 if that of Inline graphic converges to 1. The asymptotic power of Inline graphic has been studied in Cai et al. (2014); from their Theorem 2, the power of Inline graphic converges to 1 if Inline graphic for a certain constant Inline graphic and if the nonzero Inline graphic are sampled uniformly at random with sparsity level Inline graphic. The condition that Inline graphic was assumed by the authors because of the technical difficulty in proving the asymptotic results. It is expected that the asymptotic power is still 1 for Inline graphic, but the proof would be more challenging (Cai et al., 2014).

Combining the above theoretical arguments and simulation results, we recommend including small Inline graphic values such as Inline graphic and medium Inline graphic values such as Inline graphic in Inline graphic to achieve balance between the asymptotic and finite-sample performances when the signals are dense; in addition, we also recommend including Inline graphic in Inline graphic, as Inline graphic is more powerful when the signals are sparse. See the Supplementary Material for more details and simulation studies.
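The recommendation above can be illustrated with a minimal sketch in Python. It assumes that the sum-of-powers statistic with a finite power index is the sum over coordinates of the sample mean difference raised to that power, and that the infinite index corresponds to the largest variance-scaled squared difference. The function names, the default candidate set (1 to 6 plus infinity, mirroring the rows of Table 1) and the permutation scheme used to combine the tests are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def spu_statistics(x, y, gammas):
    """Sum-of-powers statistics of the sample mean difference, one per gamma.

    Finite gamma: sum_i d_i ** gamma with d = colmean(x) - colmean(y).
    gamma = inf: max_i d_i ** 2 / s_ii, where s_ii is the pooled variance
    (an assumed standardisation; constant factors such as n1*n2/(n1+n2)
    do not change permutation p-values and are omitted).
    """
    n1, n2 = x.shape[0], y.shape[0]
    d = x.mean(axis=0) - y.mean(axis=0)
    pooled_var = ((n1 - 1) * x.var(axis=0, ddof=1)
                  + (n2 - 1) * y.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    stats = []
    for g in gammas:
        if np.isinf(g):
            stats.append(np.max(d ** 2 / pooled_var))
        else:
            stats.append(np.sum(d ** int(g)))
    return np.array(stats)


def aspu_permutation(x, y, gammas=(1, 2, 3, 4, 5, 6, np.inf),
                     n_perm=1000, seed=0):
    """Permutation p-values for each gamma and for the adaptive (min-p) test."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    obs = np.abs(spu_statistics(x, y, gammas))

    # Null distribution: recompute the statistics after shuffling group labels.
    perm = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        perm[b] = spu_statistics(pooled[idx[:n1]], pooled[idx[n1:]], gammas)
    perm = np.abs(perm)  # odd powers can be negative, so compare magnitudes

    # p-value of each observed statistic: rank among itself plus n_perm draws.
    p_gamma = (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)

    # p-value of each permuted statistic: rank among all n_perm draws
    # (the count includes the draw itself, playing the role of the "+1").
    counts = (perm[None, :, :] >= perm[:, None, :]).sum(axis=1)
    p_perm = counts / n_perm

    # Adaptive test: the minimum p-value is itself calibrated by permutation.
    min_obs = p_gamma.min()
    min_perm = p_perm.min(axis=1)
    p_aspu = (1 + (min_perm <= min_obs).sum()) / (n_perm + 1)
    return p_gamma, p_aspu
```

Calling `aspu_permutation(x, y)` on two data matrices with observations in rows returns one p-value per candidate power index and a single adaptive p-value. Reusing the same permutations for both stages keeps the cost at one pass over the shuffled data; this is a design choice of the sketch, not something the text above prescribes.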

Remark 7. —

When the signal is dense, Inline graphic, the Inline graphic test performs similarly to the tests in Bai & Saranadasa (1996) and Chen & Qin (2010). As discussed above, there are alternatives under which Inline graphic is not as powerful as other Inline graphic tests, and therefore in these dense signal cases, the proposed test is more powerful than those of Bai & Saranadasa (1996) and Chen & Qin (2010), as illustrated by the simulation study. When the signal is sparse, Inline graphic, the Inline graphic test is equivalent to the supremum test in Cai et al. (2014), so the proposed adaptive test would perform similarly to that of Cai et al. (2014). On the other hand, under certain sparse alternatives, the Inline graphic test may not be as powerful as the thresholding tests in the literature, such as the test proposed in Chen et al. (2014). To illustrate this, consider the oracle case, where the signal set Inline graphic is known and has order Inline graphic with Inline graphic. Suppose that the Inline graphic are independent standard normal and signals are at the same level Inline graphic for some large constant Inline graphic. Then, the oracle test statistic with power index Inline graphic, namely Inline graphic, has power going to 1 if Inline graphic. In particular, the log of the Type II error is of the order of Inline graphic. For the Inline graphic test, the log of the Type II error is of the order of Inline graphic. Therefore, in this ideal case, the Inline graphic test, which excludes nonsignal locations, is more powerful than the supremum-type test Inline graphic.

4. Simulations

In this section we compare, through simulations, the performance of the proposed adaptive method and the existing tests described in §2. The candidate set of Inline graphic for the sum-of-powers tests Inline graphic was taken to be Inline graphic. We generated two groups of random samples, Inline graphic and Inline graphic, with sample sizes Inline graphic, from two multivariate normal distributions of dimension Inline graphic, so Inline graphic for Inline graphic. Without loss of generality, we let Inline graphic. Under the null hypothesis, Inline graphic; under the alternative hypothesis, Inline graphic elements in Inline graphic were set to nonzero values, where Inline graphic controls the signal sparsity. In our simulations we used Inline graphic, covering very dense signals for an alternative hypothesis at Inline graphic, to dense and then only moderately dense signals at Inline graphic and Inline graphic, and finally to moderately sparse and very sparse signals at Inline graphic and Inline graphic, respectively. The nonzero elements of Inline graphic were assumed to be uniformly distributed in Inline graphic, and their values were constant at Inline graphic, where Inline graphic controls the signal strength. The common covariance matrix is Inline graphic, where Inline graphic is the correlation matrix and the diagonal matrix Inline graphic contains the variances. We considered various structures of Inline graphic and Inline graphic, as detailed in the Supplementary Material. To save space, here we only show results for a first-order autoregressive correlation matrix Inline graphic and an equal-variance case with Inline graphic. Although this covariance matrix is only approximately bandable, we applied the banding estimator of Bickel & Levina (2008) to show the robustness of the proposed tests. For each setting, 1000 replicates were simulated to calculate the empirical Type I error and power of each test. 
The Inline graphic-values were calculated based on both the asymptotic distributions of the tests and the permutation method with Inline graphic iterations. The nominal significance level was set to Inline graphic.
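The simulation design described above can be sketched in Python as follows. Several specifics are assumptions made only for illustration: the number of nonzero mean components is taken to be the integer part of p to the power (1 - beta), the first group has zero mean, the variances are all equal to 1, the nonzero value `delta` is passed in directly rather than derived from the signal-strength parameter, and the banding rule keeps entries within `k` of the diagonal. All function and parameter names are hypothetical.

```python
import numpy as np


def ar1_correlation(p, rho):
    """First-order autoregressive correlation matrix with entries rho**|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_two_samples(n1, n2, p, beta, delta, rho, rng):
    """Draw X ~ N(0, R) and Y ~ N(mu, R) with a randomly placed sparse mu.

    Assumption: floor(p ** (1 - beta)) components of mu equal delta and the
    rest are zero; R is the AR(1) correlation matrix (unit variances).
    """
    corr = ar1_correlation(p, rho)
    n_signal = int(np.floor(p ** (1 - beta)))
    mu = np.zeros(p)
    support = rng.choice(p, size=n_signal, replace=False)
    mu[support] = delta
    chol = np.linalg.cholesky(corr)          # corr = chol @ chol.T
    x = rng.standard_normal((n1, p)) @ chol.T
    y = rng.standard_normal((n2, p)) @ chol.T + mu
    return x, y, mu


def band(cov, k):
    """Banding operator: zero out entries more than k off the diagonal."""
    idx = np.arange(cov.shape[0])
    keep = np.abs(idx[:, None] - idx[None, :]) <= k
    return np.where(keep, cov, 0.0)


# Example with illustrative values that are not taken from the text.
rng = np.random.default_rng(0)
x, y, mu = simulate_two_samples(n1=50, n2=50, p=200, beta=0.5,
                                delta=0.2, rho=0.6, rng=rng)
pooled_cov = np.cov(np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)]),
                    rowvar=False)
banded_cov = band(pooled_cov, k=5)
```

Setting `delta=0.0` gives data under the null hypothesis, so repeating the draw many times and recording how often a test rejects yields an empirical Type I error rate; a positive `delta` yields an empirical power in the same way.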

Table 1 presents empirical Type I error rates and powers for Inline graphic. For most tests, the results based on the asymptotic distributions are very close to those based on permutations, which is consistent with Theorem 1. The Type I error rate and power of the thresholding test Inline graphic were overestimated by the corresponding asymptotic approximation, probably due to the slow convergence to its asymptotic distribution.

Table 1.

Empirical Type I errors and powers Inline graphic of various tests for normal samples with Inline graphic, Inline graphic and covariance matrix Inline graphic. Zero signal strength Inline graphic represents Type I errors, while Inline graphic represents powers; the results outside and inside parentheses were calculated from asymptotics- and permutation-based Inline graphic-values, respectively. The sparsity parameter was Inline graphic, leading to 117 nonzero elements in Inline graphic with a constant value of Inline graphic.

Test Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
SPU(1) 5 (5) 50 (46) 78 (76) 92 (91) 98 (97)
SPU(2) 5 (5) 22 (20) 47 (46) 69 (67) 87 (85)
SPU(3) 4 (4) 40 (40) 71 (70) 88 (89) 97 (97)
SPU(4) 5 (5) 19 (18) 38 (37) 61 (60) 79 (78)
SPU(5) 4 (5) 24 (25) 47 (49) 70 (72) 84 (86)
SPU(6) 4 (4) 13 (14) 26 (29) 42 (45) 60 (64)
SPU(Inline graphic) 6 (5) 12 (9) 18 (15) 25 (21) 35 (28)
aSPU 6 (5) 33 (34) 66 (66) 85 (85) 94 (94)
CLZ 12 (5) 33 (15) 56 (34) 77 (57) 91 (76)
CLX 6 (5) 12 (9) 18 (15) 25 (21) 35 (28)
BS 6 (5) 23 (20) 48 (46) 70 (67) 88 (85)
CQ 6 (5) 23 (20) 48 (46) 70 (67) 88 (85)
SD 4 (5) 19 (19) 43 (45) 67 (68) 85 (86)

SPU, the proposed sum-of-powers tests with different values of Inline graphic; aSPU, the adaptive sum-of-powers test; CLZ, test of Chen et al. (2014); CLX, test of Cai et al. (2014); BS, test of Bai & Saranadasa (1996); CQ, test of Chen & Qin (2010); SD, test of Srivastava & Du (2008).

Since the Type I error rates of all the tests were well controlled by their permutation-based Inline graphic-values, we present the permutation-based powers in Fig. 1 to offer a fair comparison between the tests. The proposed adaptive sum-of-powers test Inline graphic was much more powerful than the other tests when the signals were highly dense, with Inline graphic. When the signal sparsity increased from Inline graphic to Inline graphic, the adaptive sum-of-powers test performed similarly to the sum-of-squares-type tests in Bai & Saranadasa (1996), Srivastava & Du (2008) and Chen & Qin (2010), and it was slightly more powerful than the thresholding test in Chen et al. (2014) and much more powerful than the supremum-type test in Cai et al. (2014). As the signals became less dense at Inline graphic, the adaptive sum-of-powers and thresholding tests were the most powerful, closely followed by the sum-of-squares-type tests and then the supremum-type test. At Inline graphic, although the adaptive sum-of-powers and thresholding tests remained the winners, the supremum-type test was more powerful than the sum-of-squares-type tests. When the signals were moderately sparse at Inline graphic, the adaptive sum-of-powers and supremum-type tests were the most powerful, closely followed by the thresholding test; they were much more powerful than the sum-of-squares-type tests. When the signals were highly sparse at Inline graphic, as expected, the supremum-type test became the sole winner, and the powers of the sum-of-squares-type and thresholding tests dropped substantially; however, the power of the adaptive sum-of-powers test remained high, close to that of the winner, the supremum-type test.

Fig. 1.

Empirical powers of the adaptive sum-of-powers test (squares) and the tests of Chen et al. (2014) (upward-pointing triangles), Cai et al. (2014) (plus signs), Bai & Saranadasa (1996) (crosses), Chen & Qin (2010) (diamonds), and Srivastava & Du (2008) (downward-pointing triangles). The signal sparsity parameter Inline graphic varies from Inline graphic to Inline graphic.

We obtained similar results for other simulation settings, including a more extreme case with a compound symmetric Inline graphic and unequal variances Inline graphic for multivariate normal data, and for simulated single nucleotide polymorphism data; see the Supplementary Material. In summary, owing to its adaptivity, the adaptive sum-of-powers test either achieved the highest power or had power close to that of the winner in any setting; it performed consistently well across all the situations. The banding estimator performed well, although occasionally the asymptotic adaptive sum-of-powers test would have slightly inflated Type I error rates when the assumptions in §3.2 were severely violated.

5. Real-data analysis

We applied the various tests to the bipolar disorder dataset from a genome-wide association study collected by The Wellcome Trust Case Control Consortium (2007). We used their quality control procedure to screen the subjects and obtained Inline graphic controls and Inline graphic cases. We filtered out all the single nucleotide polymorphisms with minor allele frequency lower than Inline graphic and those with Hardy–Weinberg equilibrium test Inline graphic-value less than Inline graphic in either cases or controls, giving 354 796 variables in total. To obtain a set of single nucleotide polymorphisms in approximate linkage equilibrium, as in the work of The International Schizophrenia Consortium (2009), we used the software plink (Purcell et al., 2007) to prune them with a criterion of linkage disequilibrium Inline graphic, a sliding window covering 200 single nucleotide polymorphisms, and a moving step of 20; this yielded 42 092 remaining single nucleotide polymorphisms. As The International Schizophrenia Consortium (2009) has shown that for bipolar disorder there is strong evidence of polygenic effects, we applied the various tests to the single nucleotide polymorphisms in each of the 22 autosomes separately to better demonstrate the possible power differences between the tests. The familywise nominal significance level was set at Inline graphic, and it would be Inline graphic for each chromosome after Bonferroni adjustment. This indicates that 10 000 permutations should be sufficient to yield a possibly significant Inline graphic-value to reject the null hypothesis.
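The arithmetic behind the last two sentences can be checked with a short Python sketch. The 22 chromosomes and the 10 000 permutations come from the text; the familywise level of 0.05 and the convention that the smallest attainable permutation p-value is 1/(B + 1) are assumptions made here for illustration, and the function names are hypothetical.

```python
def bonferroni_level(familywise_alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return familywise_alpha / n_tests


def min_permutation_pvalue(n_perm):
    """Smallest p-value attainable when p = (1 + count) / (n_perm + 1)."""
    return 1.0 / (n_perm + 1)


familywise_alpha = 0.05   # assumed value, not stated in the text above
n_chromosomes = 22        # autosomes tested separately
n_perm = 10_000           # permutations used in the analysis

per_chromosome = bonferroni_level(familywise_alpha, n_chromosomes)
smallest = min_permutation_pvalue(n_perm)

# Permutation p-values can fall below the adjusted threshold only if the
# smallest attainable value is itself below that threshold.
print(per_chromosome, smallest, smallest < per_chromosome)
```

Under these assumptions the smallest attainable permutation p-value lies well below the per-chromosome threshold, which is what makes it possible for a permutation-based test to reject at the adjusted level.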

We calculated both asymptotics- and permutation-based Inline graphic-values for each test. To save space, Table 2 shows only some representative results. Most of the asymptotics-based Inline graphic-values of the proposed sum-of-powers and adaptive tests were similar to their permutation-based ones, indicating good approximations. Again, the thresholding test Inline graphic produced asymptotics-based Inline graphic-values that were far more significant than the permutation-based ones for most chromosomes, indicating its poor approximation. The test of Srivastava & Du (2008) also performed poorly; it always gave asymptotic Inline graphic-values less than Inline graphic. To avoid potentially poor asymptotic approximations, we use the permutation-based Inline graphic-values to compare the various tests. In chromosomes 1, 2, 3, 6, 7, 9, 14, 15 and 16, both the sum-of-squares-type tests and the adaptive sum-of-powers test gave Inline graphic-values less than Inline graphic. In contrast, the thresholding test yielded significant Inline graphic-values for only five of those chromosomes, while the supremum-type test was not significant for any chromosome. These results were presumably due to dense signals in these chromosomes, thus favouring the sum-of-squares-type tests. However, in other situations the sum-of-squares-type tests might not perform well. For example, for chromosome 13, only the sum-of-powers test Inline graphic with Inline graphic gave a significant Inline graphic-value. Another example is chromosome 18: perhaps due to sparse signals, the supremum-type test gave the most significant Inline graphic-value, but none of the sum-of-squares-type tests yielded even marginal significance; borrowing strength from the supremum-type test, i.e., Inline graphic, the Inline graphic-value of the adaptive sum-of-powers test was marginally significant. 
In summary, owing to its adaptivity, the proposed adaptive test retained high power across various chromosomes with varying association patterns.

Table 2.

The Inline graphic-values Inline graphic of various tests applied to the Wellcome Trust Case Control Consortium bipolar disease data; the Inline graphic-values outside parentheses were calculated from asymptotic distributions, and those inside parentheses were based on permutations

Chromosome (number of single nucleotide polymorphisms)
Test 1 (3340) 2 (3194) 4 (2617) 13 (1592) 18 (1421)
SPU(1) 63.6 (64.3) 17.0 (17.8) 0.2 (0.2) 3.7 (3.7) 33.0 (32.3)
SPU(2) Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) 1.5 (1.7) 2.7 (2.9) 28.9 (28.7)
SPU(3) 73.8 (74.5) 0.6 (0.7) 3.1 (3.1) 12.9 (12.6) 18.7 (17.4)
SPU(4) Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) 2.0 (2.7) Inline graphic1 (0.2) 35.3 (33.1)
SPU(5) 74.2 (73.2) 0.2 (0.3) 37.5 (36.1) 39.4 (37.1) 25.9 (23.4)
SPU(6) Inline graphic1 (Inline graphic1) Inline graphic1 (0.1) 2.7 (4.1) Inline graphic1 (0.4) 44.8 (38.6)
SPU(Inline graphic) 13.1 (11.8) 4.5 (4.3) 12.1 (11.9) 8.8 (8.0) 0.5 (0.4)
aSPU Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) 1.0 (1.2) Inline graphic1 (1.3) 1.4 (1.9)
CLZ Inline graphic1 (Inline graphic1) Inline graphic1 (0.3) 9.6 (10.2) 0.2 (0.5) 5.6 (6.6)
CLX 13.1 (11.8) 4.5 (4.3) 12.1 (11.9) 8.8 (8.0) 0.5 (0.4)
BS Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) 1.5 (1.7) 2.6 (2.9) 28.8 (28.7)
CQ Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) 1.5 (1.7) 2.7 (2.9) 29.0 (28.7)
SD Inline graphic1 (Inline graphic1) Inline graphic1 (Inline graphic1) Inline graphic1 (1.0) Inline graphic1 (11.4) Inline graphic1 (9.7)

SPU, the proposed sum-of-powers tests with different values of Inline graphic; aSPU, the adaptive sum-of-powers test; CLZ, test of Chen et al. (2014); CLX, test of Cai et al. (2014); BS, test of Bai & Saranadasa (1996); CQ, test of Chen & Qin (2010); SD, test of Srivastava & Du (2008).

Supplementary material

Supplementary material available at Biometrika online includes additional numerical results and proofs of the main theoretical results.

Acknowledgments

We thank the editor, an associate editor and two reviewers for many helpful and constructive comments. This research was supported by the U.S. National Institutes of Health. This study makes use of data generated by The Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available at www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust. Peng Wei is also affiliated with the Human Genetics Center at the University of Texas.

References

  1. Bai Z. D. & Saranadasa H. (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6, 311–29.
  2. Bickel P. J. & Levina E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199–227.
  3. Cai T. T. & Liu W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Am. Statist. Assoc. 106, 672–84.
  4. Cai T. T., Liu W. & Xia Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Am. Statist. Assoc. 108, 265–77.
  5. Cai T. T., Liu W. & Xia Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Statist. Soc. B 76, 349–72.
  6. Chen S. X., Li J. & Zhong P.-S. (2014). Two-sample tests for high dimensional means with thresholding and data transformation. arXiv:1410.2848.
  7. Chen S. X. & Qin Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38, 808–35.
  8. Fan J. (1996). Test of significance based on wavelet thresholding and Neyman's truncation. J. Am. Statist. Assoc. 91, 674–88.
  9. Fan J., Liao Y. & Yao J. (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica 83, 1497–541.
  10. Fisher R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edin. 52, 399–433.
  11. Goeman J. J., van de Geer S. A. & van Houwelingen H. C. (2006). Testing against a high dimensional alternative. J. R. Statist. Soc. B 68, 477–93.
  12. Gregory K. B., Carroll R. J., Baladandayuthapani V. & Lahiri S. N. (2015). A two-sample test for equality of means in high dimension. J. Am. Statist. Assoc. 110, 837–49.
  13. Guyon X. (1995). Random Fields on a Network: Modeling, Statistics, and Applications. New York: Springer.
  14. Hall P., Jin J. & Miller H. (2014). Feature selection when there are many influential features. Bernoulli 20, 1647–71.
  15. Hotelling H. (1931). The generalization of Student's ratio. Ann. Math. Statist. 2, 360–78.
  16. Hsing T. (1995). A note on the asymptotic independence of the sum and maximum of strongly mixing stationary random variables. Ann. Prob. 23, 938–47.
  17. Lee S., Emond M. J., Bamshad M. J., Barnes K. C., Rieder M. J., Nickerson D. A., NHLBI GO Exome Sequencing Project ESP Lung Project Team, Christiani D. C., Wurfel M. M. & Lin X. (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224–37.
  18. Lin D.-Y. & Tang Z.-Z. (2011). A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–67.
  19. Pan W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet. Epidemiol. 33, 497–507.
  20. Pan W. (2011). Relationship between genomic distance-based regression and kernel machine regression for multi-marker association testing. Genet. Epidemiol. 35, 211–6.
  21. Pan W., Kim J., Zhang Y., Shen X. & Wei P. (2014). A powerful and adaptive association test for rare variants. Genetics 197, 1081–95.
  22. Pan W. & Shen X. (2011). Adaptive tests for association analysis of rare variants. Genet. Epidemiol. 35, 381–8.
  23. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A. R., Bender D., Maller J., Sklar P., deBakker P. I. W., Daly M. J. et al. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–75.
  24. R Development Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org.
  25. Rothman A. J., Levina E. & Zhu J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97, 539–50.
  26. Srivastava M. S. & Du M. (2008). A test for the mean vector with fewer observations than the dimension. J. Mult. Anal. 99, 386–402.
  27. Srivastava R., Li P. & Ruppert D. (2015). RAPTT: An exact two-sample test in high dimensions using random projections. J. Comp. Graph. Statist. 25, 954–70.
  28. The International Schizophrenia Consortium (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–52.
  29. The Wellcome Trust Case Control Consortium (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–78.
  30. Wessel J. & Schork N. J. (2006). Generalized genomic distance-based regression methodology for multilocus association analysis. Am. J. Hum. Genet. 79, 792–806.
  31. Wu M. C., Kraft P., Epstein M. P., Taylor D. M., Chanock S. J., Hunter D. J. & Lin X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. 86, 929–42.
  32. Xue L., Ma S. & Zou H. (2012). Positive-definite Inline graphic-penalized estimation of large covariance matrices. J. Am. Statist. Assoc. 107, 1480–91.
  33. Yu K., Li Q., Bergen A. W., Pfeiffer R. M., Rosenberg P. S., Caporaso N., Kraft P. & Chatterjee N. (2009). Pathway analysis by adaptive combination of Inline graphic-values. Genet. Epidemiol. 33, 700–9.
  34. Zhong P.-S., Chen S. X. & Xu M. (2013). Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence. Ann. Statist. 41, 2820–51.

Articles from Biometrika are provided here courtesy of Oxford University Press