Abstract
This paper studies the projection test for high-dimensional mean vectors via optimal projection. The idea of the projection test is to project high-dimensional data onto a space of low dimension so that traditional methods can be applied. We first propose a new estimator of the optimal projection direction, obtained by solving a constrained and regularized quadratic program. Two tests are then constructed using the estimated optimal projection direction. The first one is based on a data-splitting procedure, which achieves an exact $t$-test under the normality assumption. To mitigate the power loss due to data splitting, we further propose an online framework, which iteratively updates the estimated projection direction as new observations arrive. We show that this online-style projection test statistic asymptotically converges to the standard normal distribution. Various simulation studies as well as a real data example show that the proposed online-style projection test controls the type I error rate well and is more powerful than existing tests.
Keywords: Data splitting, one-sample mean problem, online-style estimation, power enhancement, regularization methods
1. Introduction
Testing whether a population mean equals a known vector, or whether the means of two populations are equal, is a fundamental problem in high-dimensional statistics with wide applications. Such tests are commonly encountered in genome-wide association studies. For instance, Chen and Qin (2010) performed a test to identify sets of genes that are responsible for certain types of tumors in genetics research. Xu et al. (2016) applied various tests to a bipolar disorder dataset collected by the Wellcome Trust Case Control Consortium (2007), in which one would like to test whether there is any association between a disease and a large number of genetic variants. In these applications, the dimension $p$ of the collected data is often much larger than the sample size $n$. Suppose that $x_1, \dots, x_n$ is a random sample from a $p$-dimensional population with mean $\mu$ and positive definite covariance matrix $\Sigma$. Of interest is to test the following hypothesis
\[ H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0 \tag{1} \]
for some known vector $\mu_0$. This is typically referred to as the one-sample hypothesis testing problem in multivariate analysis and has been extensively studied when $p$ is fixed. When $p > n$, the well-known Hotelling's $T^2$ test (Hotelling 1931) is not directly applicable because the sample covariance matrix is not invertible. Even in the case $p < n$, the Hotelling's $T^2$ test may suffer from low power against the alternative if $p/n$ converges to a constant close to 1 (Bai and Saranadasa 1996).
More recently, various tests have been proposed for high-dimensional mean vectors; they can be roughly classified into three types (Huang et al. 2022). The first type is known as quadratic-form tests, which can be viewed as modified Hotelling's $T^2$ tests. Quadratic-form tests substitute an invertible diagonal matrix for the singular sample covariance matrix, leading to a quadratic form of the sample mean. For example, Bai and Saranadasa (1996) replaced the sample covariance matrix by the identity matrix $I$ and showed that the resulting statistic has an asymptotically normal null distribution when $p$ and $n$ grow proportionally. Quadratic-form tests are known to be more powerful against dense alternatives. However, these tests tend to ignore the dependence among the dimensions and lose power when the correlations among the variables are high. The second type is known as extreme-type tests. An extreme-type test usually makes a decision based on the maximum value of a sequence of candidate test statistics. For example, to test a high-dimensional mean vector, one can easily construct a marginal test for each dimension and then use the maximum of the marginal test statistics as the final test statistic (Cai et al. 2014, Chang et al. 2017). Another example is the sum-of-powers test, which takes the form of a sum of marginal statistics with the power index chosen to yield the most extreme p-value (Xu et al. 2016). Though extreme-type tests are typically more powerful against sparse alternatives (Cai et al. 2014), they generally require stringent conditions in order to derive the limiting null distribution and are likely to suffer from size distortions due to slow convergence rates. The third type is the projection test. The idea of the projection test is to project the high-dimensional observations onto a space of low dimension so that traditional methods such as Hotelling's $T^2$ can be directly applied. A primary task of the projection test is to find a good projection matrix (or vector) such that the resulting test achieves high power. A random projection test was proposed in Lopes et al. (2011), where the entries of the projection matrix are randomly drawn from the standard normal distribution. Instead of random projection, data-driven approaches have also been studied to construct the projection matrix. Lauter (1996) proposed a test that projects data onto a 1-dimensional space by a weight vector that depends on the data only through the matrix $X^{\top}X$. Recently, Huang (2015) studied the optimal projection such that the resulting projection test has the highest power among all possible projection matrices. It has been shown that one should project the data onto a 1-dimensional space to achieve the highest power, and the optimal projection direction is of the form $\Sigma^{-1}\mu$. Li and Li (2021) further extended the idea of optimal projection to hypothesis testing in high-dimensional linear models.
Optimal-direction-based projection tests have not been systematically studied yet, and a few open questions remain to be addressed. The first question is how to obtain a good estimator of the optimal projection with statistical guarantees. A naive ridge-type estimator was proposed in Huang (2015), but little is known about its theoretical properties. The power analysis of that test relies on the assumption that the ridge-type estimator is consistent, which is not necessarily true in the high-dimensional setting. The second question is how to construct a powerful test based on the estimated optimal projection. Notice that the optimal projection is typically unknown and needs to be estimated from the observed data. Huang (2015) proposed a data-splitting strategy that utilizes half of the observations to estimate the optimal projection and uses the remaining half to test the hypothesis. The major advantage of the data-splitting strategy is that the estimated projection direction is independent of the remaining data, so an exact test can easily be constructed. However, there is substantial power loss due to data splitting, as only half of the data is used to perform the test. This paper aims to fill these gaps by constructing an online-style projection test via optimal projection. Firstly, we propose a new estimator of the optimal projection direction, obtained by solving a constrained and regularized quadratic programming problem. Under the assumption that the optimal projection is sparse, we show that any stationary point of the quadratic programming problem is a consistent estimator. In other words, we do not need to find the global solution to the possibly nonconvex optimization problem, as any stationary point enjoys desirable statistical properties. Secondly, we propose an online framework to boost testing power, which originates from online learning algorithms for streaming data. The main idea is to recursively update the estimate when new observations arrive. In particular, we first obtain an initial estimator of the optimal projection from a subset of the data. Then we recursively project the incoming observations onto a 1-dimensional space and update the estimate by including the newly arrived observations. We repeat this procedure until the arrival of the last observation and then perform a test based on the projected sample.
The rest of this paper is organized as follows. In Section 2, we propose a new estimator of the optimal projection via constrained and regularized quadratic programming and derive non-asymptotic $\ell_1$ and $\ell_2$ error bounds. Such error bounds hold for all stationary points. In Section 3, we propose the online-style projection test and study its asymptotic limiting distributions. In Section 4, numerical studies are conducted to examine the finite-sample performance of different tests, with an application to a real data example. We conclude this paper with some discussion in Section 5. All technical proofs and additional simulation results are given in the supplemental material.
2. Optimal Direction Based Projection Tests
2.1. Preliminary
Without loss of generality, we assume $\mu_0 = 0$ and the one-sample problem (1) becomes
\[ H_0: \mu = 0 \quad \text{versus} \quad H_1: \mu \neq 0. \tag{2} \]
Given a random sample $x_1, \dots, x_n$ with mean $\mu$ and covariance matrix $\Sigma$, let $\bar{x}$ and $\hat{\Sigma}$ be the sample mean and the sample covariance matrix, respectively,
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}. \tag{3} \]
When $p < n$, the Hotelling's $T^2$ statistic for problem (2) is $T^2 = n\,\bar{x}^{\top}\hat{\Sigma}^{-1}\bar{x}$. If $x_1, \dots, x_n$ are independent and identically distributed (i.i.d.) normal random vectors, then $\frac{n-p}{(n-1)p}\,T^2 \sim F_{p,\,n-p}$ under $H_0$, where $F_{p,\,n-p}$ denotes the $F$ distribution with degrees of freedom $p$ and $n-p$. The Hotelling's $T^2$ test requires the sample covariance matrix to be invertible and cannot be directly applied when $p > n$. Beyond the singularity of $\hat{\Sigma}$, it has been observed that the power of the Hotelling's $T^2$ test can be adversely affected even when $p < n$ if $\hat{\Sigma}$ is nearly singular (Bai and Saranadasa 1996).
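For reference, the following minimal sketch (NumPy/SciPy; the sample size, dimension, and data below are illustrative, not from the paper) computes the classical Hotelling's $T^2$ statistic and its $F$-based p-value; the comments note where it breaks down once $p \ge n$.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X, mu0=None):
    """One-sample Hotelling's T^2 test; only valid when p < n."""
    n, p = X.shape
    mu0 = np.zeros(p) if mu0 is None else mu0
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                 # sample covariance with (n-1) denominator
    diff = xbar - mu0
    t2 = n * diff @ np.linalg.solve(S, diff)    # n * (xbar - mu0)' S^{-1} (xbar - mu0)
    f_stat = (n - p) / ((n - 1) * p) * t2       # follows F(p, n-p) under H0 and normality
    return t2, stats.f.sf(f_stat, p, n - p)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                    # p = 5 << n = 50, so S is invertible
print(hotelling_t2(X))
# Once p >= n, S is singular, np.linalg.solve fails, and the F calibration breaks down.
```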
Projection tests are designed for high-dimensional data. The basic idea of a projection test is to reduce the dimension by projecting the high-dimensional vector onto a space of lower dimension, after which traditional methods such as Hotelling's $T^2$ can be carried out. Let $P \in \mathbb{R}^{p \times k}$ be a projection matrix with $k \le p$. We project the $p$-dimensional vector $x_i$ onto a $k$-dimensional space by left-multiplying by $P^{\top}$, i.e., $y_i = P^{\top} x_i$. Conditional on the projection matrix $P$, the projected sample $y_1, \dots, y_n$ are independent and identically distributed with mean $P^{\top}\mu$ and covariance matrix $P^{\top}\Sigma P$. Then the original problem (2) can be reformulated as
\[ H_0: P^{\top}\mu = 0 \quad \text{versus} \quad H_1: P^{\top}\mu \neq 0. \tag{4} \]
Note that the hypothesis in (4) is not necessarily equivalent to that in (2): if $H_0$ in (4) is rejected, then $H_0$ in (2) is rejected as well, but not vice versa. Given the projection matrix $P$, we construct the Hotelling's $T^2$ test, denoted by $T_P^2 = n\,\bar{y}^{\top} S_y^{-1} \bar{y}$, based on the projected sample $y_1, \dots, y_n$, where $\bar{y}$ and $S_y$ are the sample mean and sample covariance matrix of the projected sample.
A key question is how to choose the dimension $k$ and the projection matrix $P$. Huang (2015) proved that under the normality assumption, the optimal projection direction is proportional to $\Sigma^{-1}\mu$, in the sense that the power of $T_P^2$ is maximized by taking $k = 1$ and $P \propto \Sigma^{-1}\mu$. Without the normality assumption, the direction $\Sigma^{-1}\mu$ is asymptotically optimal provided that the sample mean is asymptotically normal. Throughout this paper, it is assumed that $x_1, \dots, x_n$ are i.i.d. sub-Gaussian, and hence $\Sigma^{-1}\mu$ is asymptotically the optimal direction.
Define $\theta_0 = \Sigma^{-1}\mu$. With the choice $P = \theta_0$, the hypothesis becomes
\[ H_0: \theta_0^{\top}\mu = 0 \quad \text{versus} \quad H_1: \theta_0^{\top}\mu \neq 0, \tag{5} \]
which is equivalent to the original hypothesis (2), as $\theta_0^{\top}\mu = \mu^{\top}\Sigma^{-1}\mu = 0$ implies $\mu = 0$. This optimal projection direction depends on unknown parameters and needs to be estimated from data. In order to control the type I error, Huang (2015) proposed an exact $t$-test via data splitting. The entire dataset is randomly partitioned into two disjoint sets $\mathcal{D}_1$ and $\mathcal{D}_2$. The first dataset is used to estimate $\theta_0$ and the second dataset is used to construct the test statistic. To estimate $\theta_0$, a ridge-type estimator $\hat{\theta} = (\hat{\Sigma}_1 + \lambda I_p)^{-1}\bar{x}_1$ is used in Huang (2015), where $\bar{x}_1$ and $\hat{\Sigma}_1$ are the sample mean and the sample covariance matrix estimated from $\mathcal{D}_1$. Define $y_i = \hat{\theta}^{\top}x_i$ for $x_i \in \mathcal{D}_2$; the test statistic is then constructed as $T = \sqrt{n_2}\,\bar{y}/s_y$, where $\bar{y}$ and $s_y$ are the sample mean and standard deviation based on $\mathcal{D}_2$ and $n_2 = |\mathcal{D}_2|$. The advantage of data splitting is that the constructed test is an exact $t$-test under the normality assumption. The power analysis of this test relies on the consistency of the ridge-type estimator $\hat{\theta}$, which no longer holds in the high-dimensional regime. In addition, this data-splitting approach suffers from power loss since only a subset of the data is used to perform the test. In this paper, we propose a different approach to estimate the optimal projection direction such that the resulting estimator is consistent. Furthermore, we propose an online framework to perform the test such that more observations are utilized to boost the power.
2.2. Estimation of optimal projection direction
We first introduce some notation. For a vector $v = (v_1, \dots, v_p)^{\top}$, let $\|v\|_q = (\sum_{j=1}^{p}|v_j|^q)^{1/q}$ be its $\ell_q$ norm, $q = 1, 2$. Its $\ell_0$ norm is the number of nonzero entries in $v$, and its $\ell_{\infty}$ norm is $\max_j |v_j|$. For a matrix $A$, its entrywise $\ell_{\infty}$ norm is $\|A\|_{\infty} = \max_{i,j}|A_{ij}|$. For a set $S$, $|S|$ denotes its cardinality. We use $\lfloor a \rfloor$ to denote the largest integer that is smaller than or equal to $a$, and use $X = (x_1, \dots, x_n)^{\top}$ to denote the $n \times p$ data matrix.
In practice, the optimal projection direction is typically unknown and needs to be learned from the sample. A simple plug-in estimator for $\Sigma^{-1}\mu$ replaces $\Sigma$ and $\mu$ by their sample counterparts $\hat{\Sigma}$ and $\bar{x}$. The first challenge is that when $p > n$, the sample covariance matrix is no longer invertible. This challenge can be overcome by imposing additional sparsity structure on $\Sigma$ or $\Sigma^{-1}$ and applying thresholding or penalized likelihood methods (Fan et al. 2016). The second challenge is that even if we obtain a consistent estimator of $\Sigma$, its consistency does not imply the consistency of the resulting plug-in estimator of $\Sigma^{-1}\mu$ (Chen et al. 2016). These challenges motivate us to seek a direct estimator of the vector $\Sigma^{-1}\mu$. Let $w_0 = \Sigma^{-1}\mu$ denote the optimal projection direction. Notice that any vector proportional to $w_0$ is also optimal, as the projection test statistic is scale invariant. Based on this observation, we consider the following constrained quadratic optimization problem
\[ \min_{w \in \mathbb{R}^p}\; w^{\top}\Sigma w \quad \text{subject to} \quad w^{\top}\mu = 1. \tag{6} \]
The solution to (6) is $w^* = \Sigma^{-1}\mu/(\mu^{\top}\Sigma^{-1}\mu)$, which can be viewed as a scaled version of the optimal projection direction. Note that the formulation (6) rules out the null hypothesis, as the solution is not well-defined for $\mu = 0$. We want to point out that the optimal projection direction, which maximizes the power of the projection test, is derived under the alternative hypothesis. Under the null hypothesis, any choice of projection works perfectly fine in the sense that the type I error rate can be well controlled; see more discussion in Section 3. Hence excluding $\mu = 0$ does not cause any inconvenience.
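As a quick numerical check of this closed form, assuming the reconstruction of (6) above, the following sketch (with an arbitrary illustrative covariance and mean) verifies that $\Sigma^{-1}\mu/(\mu^{\top}\Sigma^{-1}\mu)$ is feasible and attains a smaller quadratic value than another feasible direction:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)     # an arbitrary positive definite covariance (illustrative)
mu = rng.normal(size=p)

w_star = np.linalg.solve(Sigma, mu)
w_star /= mu @ w_star               # w* = Sigma^{-1} mu / (mu' Sigma^{-1} mu); satisfies w' mu = 1

w_alt = mu / (mu @ mu)              # another feasible vector with w' mu = 1
# The closed-form minimizer attains a smaller quadratic value than any other feasible vector.
print(w_star @ Sigma @ w_star, "<=", w_alt @ Sigma @ w_alt)
```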
In high-dimensional regimes, it is well known that consistent estimators cannot be obtained unless additional structural assumptions are imposed. A common assumption is that the optimal projection direction is sparse, i.e., a large proportion of the elements in $w_0$ are 0. To encourage a sparse estimator of $w^*$, we consider the following constrained and regularized quadratic program,
\[ \min_{w \in \mathbb{R}^p}\; \frac{1}{2} w^{\top}\hat{\Sigma} w + \sum_{j=1}^{p} p_{\lambda}(|w_j|) \quad \text{subject to} \quad w^{\top}\bar{x} = 1, \tag{7} \]
where $p_{\lambda}(\cdot)$ is a penalty function with tuning parameter $\lambda$ satisfying the conditions in Supplement S.2.1. Such conditions on $p_{\lambda}$ are relatively mild and are satisfied by a wide variety of popular penalties (Fan et al. 2020).
Before we proceed to solve the optimization problem (7), we provide a different perspective on the estimation of the optimal projection direction that does not require knowing its explicit form $\Sigma^{-1}\mu$. Suppose we want to project the data onto a 1-dimensional space. Let $a$ be the projection vector and $y_i = a^{\top}x_i$ the projected sample. Then $t(a) = \sqrt{n}\,\bar{y}/s_y$ is the $t$-statistic based on the projected sample, where $\bar{y}$ and $s_y^2$ are the sample mean and sample variance of the $y_i$'s.
The goal is to find $a$ that maximizes the absolute value of the $t$-statistic (i.e., maximizes power). Since $t(a)$ is an odd function with respect to $a$, maximizing $|t(a)|$ is the same as maximizing $t(a)^2$, which is equivalent to
\[ \min_{a \in \mathbb{R}^p}\; a^{\top}\hat{\Sigma} a \quad \text{subject to} \quad a^{\top}\bar{x} = 1. \tag{8} \]
Therefore, formulation (7) can be viewed as the regularized version of (8), which intends to maximize the test statistic while the penalty term is added to avoid overfitting, from a machine learning perspective. Another perspective that provides insight into the optimal projection direction is Fisher's discriminant analysis. Consider a two-sample problem in which the interest is to test whether the two population means are the same, that is, $H_0: \mu_1 = \mu_2$. We can treat the hypothesis testing as a classification problem under Fisher's discriminant analysis framework, which finds the best linear combination of features that maximizes the ratio of the variance between the two populations to the variance within the populations (Fisher 1936). That is, one finds $a$ to maximize the ratio
\[ \frac{\{a^{\top}(\mu_1 - \mu_2)\}^2}{a^{\top}\Sigma a}. \]
This ratio measures the signal-to-noise ratio for the class labeling. It can be shown that the maximum ratio is attained when $a \propto \Sigma^{-1}(\mu_1 - \mu_2)$, which is exactly the optimal projection direction for the two-sample problem (Huang 2015).
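The following small sketch illustrates this point on simulated two-sample Gaussian data, with the pooled sample covariance standing in for $\Sigma$; all sizes and mean vectors are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n1, n2 = 10, 60, 60
X1 = rng.normal(size=(n1, p))                                # population 1: mean 0
X2 = rng.normal(size=(n2, p)) + np.r_[1.0, np.zeros(p - 1)]  # population 2: shifted first coordinate

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
            + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

def fisher_ratio(a):
    # between-group separation over within-group variability along direction a
    return (a @ (xbar1 - xbar2)) ** 2 / (a @ S_pooled @ a)

a_fisher = np.linalg.solve(S_pooled, xbar1 - xbar2)          # maximizer of the ratio, up to scale
a_random = rng.normal(size=p)
print(fisher_ratio(a_fisher), ">=", fisher_ratio(a_random))
```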
2.3. Non-asymptotic error bounds
In this section, we establish non-asymptotic error bounds for all stationary points. The nonconvexity of the penalty $p_{\lambda}$ brings extra difficulty in solving the constrained and regularized quadratic program (7). Direct computation of the global solution is quite challenging in high-dimensional settings. For instance, Liu et al. (2016) developed a mixed integer linear program to find a provably global solution, which is computationally expensive. Instead of searching for the global solution, many algorithms have been developed to efficiently find local solutions with satisfactory theoretical guarantees (Zou and Li 2008, Wang et al. 2013, Loh and Wainwright 2015). We will show that any stationary point of (7) enjoys satisfactory statistical properties and introduce an ADMM algorithm with local linear approximation to obtain a stationary point. A vector $\hat{w}$ is a stationary point of the optimization problem (7) if it is feasible and satisfies the following first-order condition,
\[ \langle \hat{\Sigma}\hat{w} + \xi,\; w - \hat{w} \rangle \;\ge\; 0 \quad \text{for all } w \text{ with } w^{\top}\bar{x} = 1, \tag{9} \]
where $\xi \in \partial\{\sum_{j=1}^{p} p_{\lambda}(|\hat{w}_j|)\}$ denotes a subgradient of the penalty and $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors. The first-order condition (9) is a necessary condition for $\hat{w}$ to be a local minimum. In other words, the set of $\hat{w}$ satisfying condition (9) includes all local minimizers as well as the global minimizer.
We further assume the sample covariance matrix $\hat{\Sigma}$ satisfies the following restricted strong convexity (RSC) condition,
\[ \Delta^{\top}\hat{\Sigma}\Delta \;\ge\; \alpha \|\Delta\|_2^2 - \tau \frac{\log p}{n}\|\Delta\|_1^2 \quad \text{for all } \Delta \in \mathcal{C}, \tag{10} \]
where $\alpha$ is a strictly positive constant and $\tau$ is a nonnegative constant. Though we only impose the RSC condition on the set $\mathcal{C}$, it is easy to verify that the RSC condition actually holds for all $\Delta$ up to rescaling, due to the quadratic form; see Lemma S.3 in Supplement S.1. In a classical setting where $p < n$, the sample covariance matrix is positive definite and hence one can set $\tau = 0$ and $\alpha = \lambda_{\min}(\hat{\Sigma})$, the minimal eigenvalue of $\hat{\Sigma}$. In the high-dimensional regime where $p > n$, the sample covariance matrix is typically only positive semi-definite and hence $\Delta^{\top}\hat{\Sigma}\Delta = 0$ for any $\Delta$ in its null space. The RSC condition (10) holds trivially for those $\Delta$ for which the right-hand side of (10) is nonpositive. As a result, we only require the RSC condition to hold on the set $\mathcal{C}$ of directions for which the right-hand side of (10) is positive. Such an RSC-type condition is widely adopted to develop non-asymptotic error bounds in high-dimensional $M$-estimation and is satisfied with high probability if $x_1, \dots, x_n$ are i.i.d. sub-Gaussian random vectors. See, for example, Loh and Wainwright (2015).
Recall that $w^* = \Sigma^{-1}\mu/(\mu^{\top}\Sigma^{-1}\mu)$ is the scaled optimal projection direction. It is challenging to directly derive the error bounds between a stationary point $\hat{w}$ and $w^*$, as $w^*$ is not necessarily feasible, i.e., $w^*$ may not satisfy the linear equality constraint $w^{\top}\bar{x} = 1$. To this end, we introduce the scaled feasible optimal projection direction $\tilde{w}$, which is the intersection of the set of optimal projection directions $\{c\,\Sigma^{-1}\mu : c \neq 0\}$ and the feasible set $\{w : w^{\top}\bar{x} = 1\}$. Note that $\tilde{w}$ points in the same direction as the optimal projection but also satisfies the linear constraint $\tilde{w}^{\top}\bar{x} = 1$. Figure 1 illustrates the relationship between the scaled optimal projection direction $w^*$, the scaled feasible optimal projection direction $\tilde{w}$, and a stationary point $\hat{w}$. We first derive error bounds between $\hat{w}$ and $\tilde{w}$, and then error bounds between $\hat{w}$ and $w^*$ follow from the triangle inequality
\[ \|\hat{w} - w^*\| \;\le\; \|\hat{w} - \tilde{w}\| + \|\tilde{w} - w^*\|. \tag{11} \]
Figure 1:
Illustration of the scaled optimal projection direction $w^*$, the scaled feasible optimal projection direction $\tilde{w}$, and a stationary point $\hat{w}$. $\tilde{w}$ is the intersection of the line of optimal projection directions and the feasible set. Both $\tilde{w}$ and $\hat{w}$ lie in the feasible set, while $w^*$ and $\tilde{w}$ are optimal projection directions.
We impose the following conditions,
(C1) $x_1, \dots, x_n$ are independent and identically distributed sub-Gaussian random vectors.
(C2) The sample covariance matrix $\hat{\Sigma}$ satisfies the RSC condition (10) with $\alpha > 0$.
(C3) There exists a constant $M > 0$ such that $\|w^*\|_1 \le M$.
The following theorem states the $\ell_1$ and $\ell_2$ error bounds for $\hat{w} - \tilde{w}$ and $\hat{w} - w^*$.
Theorem 1. Suppose conditions (C1)–(C3) hold. Let $\hat{w}$ be any stationary point of the program (7) with $\lambda = C\sqrt{\log p/n}$ for some large constant $C$. If the sample size satisfies
\[ n \;\ge\; C_0\, s \log p \tag{12} \]
for some constant $C_0 > 0$, then with probability at least $1 - c_1 p^{-c_2}$ for some constants $c_1, c_2 > 0$, we have
$\|\hat{w} - \tilde{w}\|_2 \le C_1\sqrt{s\log p/n}$ and $\|\hat{w} - w^*\|_2 \le C_1'\sqrt{s\log p/n}$;
$\|\hat{w} - \tilde{w}\|_1 \le C_2\, s\sqrt{\log p/n}$ and $\|\hat{w} - w^*\|_1 \le C_2'\, s\sqrt{\log p/n}$,
where $s$ is the number of nonzero elements in $w^*$.
Remark. The error bounds in Theorem 1 hold for all stationary points satisfying condition (9). In other words, any local solution is guaranteed to have the desired statistical accuracy. Condition (12) describes the relationship between the sample size and the dimension and is satisfied if $s\log p/n \to 0$. If $s\sqrt{\log p/n} \to 0$ as $n \to \infty$, then $\hat{w}$ is a consistent estimator of $w^*$. In other words, in order to obtain a consistent estimator, we require the optimal projection direction to be sparse. Chernozhukov et al. (2017) established a central limit theorem for high-dimensional data with Gaussian and bootstrap approximations, which can be applied to a one-sample hypothesis testing problem and requires $(\log p)^7/n \to 0$. In contrast, we impose a weaker condition on the dimension for the proposed projection test. Conditions (C1)–(C2) are commonly adopted in high-dimensional statistics. Condition (C3) is posited to ensure that the estimation error of the covariance matrix is not amplified by $\|w^*\|_1$ and passed along to the estimation of $w^*$. Note that $\|(\hat{\Sigma} - \Sigma)w^*\|_{\infty} \le \|\hat{\Sigma} - \Sigma\|_{\infty}\|w^*\|_1$. A diverging $\|w^*\|_1$ would amplify the estimation error of $\hat{w}$.
Geometry of $\hat{w}$.
Notice that the projection test is scale invariant with respect to the projection direction, i.e., $w$ and $cw$ have exactly the same testing performance for any $c \neq 0$. To eliminate the scale effect, we also measure the closeness between $\hat{w}$ and $w^*$ using the cosine similarity. Formally, the cosine similarity between two vectors $a$ and $b$ is defined as
\[ \cos(a, b) = \frac{a^{\top}b}{\|a\|_2\,\|b\|_2}. \]
If $a$ and $b$ point in the same direction ($a = cb$ for some $c > 0$), then the cosine similarity equals 1. In general, the closer the cosine similarity is to 1, the closer the directions of the two vectors are to each other. We show that the cosine similarity between $\hat{w}$ and $w^*$ converges to 1.
Corollary 1. Suppose conditions (C1)–(C3) hold. Let $\hat{w}$ be a stationary point of the program (7) with $\lambda = C\sqrt{\log p/n}$ for some large constant $C$. Then $\cos(\hat{w}, w^*) \to 1$ in probability as $n \to \infty$, provided $s\sqrt{\log p/n} \to 0$.
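A minimal numerical illustration of this geometric notion (deliberately low-dimensional, so the plain plug-in direction $\hat{\Sigma}^{-1}\bar{x}$ is well defined; it is not the penalized estimator of Section 2.2): the cosine similarity between the estimated and population directions approaches 1 as the sample size grows.

```python
import numpy as np

def cosine_similarity(a, b):
    # absolute value, since the projection test is invariant to rescaling the direction
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(4)
p = 10
Sigma = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))   # compound symmetry, rho = 0.5 (illustrative)
L = np.linalg.cholesky(Sigma)
mu = np.r_[1.0, 0.5, np.zeros(p - 2)]
target = np.linalg.solve(Sigma, mu)               # population direction Sigma^{-1} mu

for n in (50, 500, 5000):
    X = rng.normal(size=(n, p)) @ L.T + mu
    w_hat = np.linalg.solve(np.cov(X, rowvar=False), X.mean(axis=0))   # plug-in direction
    print(n, round(cosine_similarity(w_hat, target), 3))
```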
2.4. ADMM with local linear approximation
The error bounds in Theorem 1 indicate that any stationary point has satisfactory statistical accuracy. Instead of finding the global solution to the possibly nonconvex optimization problem (7), we only need an algorithm that finds a stationary point satisfying the first-order condition (9). We propose to solve the constrained and regularized quadratic program using the local linear approximation (LLA) technique. The LLA algorithm was first proposed in Zou and Li (2008) to deal with nonconvex penalties in generalized linear models. The idea of LLA is to approximate the nonconvex penalty by its first-order expansion, which is a convex function. It turns out that we only need to solve a convex problem iteratively until a convergence criterion is met. A similar idea was proposed in Wang et al. (2013), which approximates the nonconvex penalty by a tight upper bound. Here we extend the LLA algorithm to solve the constrained and regularized quadratic program (7).
Suppose $w$ is close to $w^{(t)}$. Ignoring constants that do not involve $w$, the penalty function can be approximated by
\[ p_{\lambda}(|w_j|) \;\approx\; p_{\lambda}(|w_j^{(t)}|) + p_{\lambda}'(|w_j^{(t)}|)\,\bigl(|w_j| - |w_j^{(t)}|\bigr). \]
Given the $t$-th iterate $w^{(t)}$, the optimization problem (7) is then updated as follows
\[ w^{(t+1)} = \arg\min_{w:\, w^{\top}\bar{x} = 1}\; \frac{1}{2} w^{\top}\hat{\Sigma} w + \sum_{j=1}^{p} \omega_j^{(t)}\,|w_j|, \tag{13} \]
where $\omega_j^{(t)} = p_{\lambda}'(|w_j^{(t)}|)$. We iteratively solve (13) until $w^{(t)}$ converges. The LLA algorithm is an instance of Majorization-Minimization (MM) algorithms and is guaranteed to converge to a stationary point (Zou and Li 2008). With the local linear approximation, there is a unique solution to the resulting constrained convex program, and this solution is a stationary point. Hence the theoretical results developed in Section 2.3 hold for the estimator given by the LLA algorithm. The theoretical properties of LLA estimators in high dimensions are systematically studied in Fan et al. (2014), where the LLA algorithm is shown to find the oracle estimator with high probability after one iteration. To solve the convex optimization problem (13) after local linear approximation, we adopt the alternating direction method of multipliers (ADMM), which naturally handles the linear equality constraint (Boyd et al. 2011, Fang et al. 2015). The penalization parameter $\lambda$ is chosen via a high-dimensional BIC (Wang et al. 2013). The details of the ADMM algorithm and the choice of $\lambda$ are summarized in Supplements S.4 and S.5, respectively.
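Below is a self-contained sketch of the LLA iteration for the program as reconstructed in (7) and (13), with the SCAD derivative supplying the weights. For brevity each convex subproblem is handed to a generic convex solver (cvxpy) rather than the ADMM routine of Supplement S.4, and the tuning parameter, iteration count, and tolerance are illustrative defaults.

```python
import numpy as np
import cvxpy as cp

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li 2001), used as LLA weights."""
    t = np.abs(t)
    return lam * ((t <= lam) + (t > lam) * np.maximum(a * lam - t, 0) / ((a - 1) * lam))

def estimate_direction(X, lam=0.1, n_iter=5, tol=1e-6):
    """LLA iterations for the constrained, SCAD-penalized quadratic program.

    Each step is a convex, weighted-l1 problem; it is handed to a generic convex
    solver here for brevity, whereas the paper uses ADMM (Supplement S.4).
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    A = (X - xbar) / np.sqrt(n - 1)               # so that ||A w||^2 = w' SampleCov w
    w_curr = np.zeros(p)                          # first pass reduces to a lasso-type problem
    for _ in range(n_iter):
        weights = scad_deriv(w_curr, lam)
        w = cp.Variable(p)
        obj = 0.5 * cp.sum_squares(A @ w) + cp.sum(cp.multiply(weights, cp.abs(w)))
        cp.Problem(cp.Minimize(obj), [xbar @ w == 1]).solve()
        w_new = np.asarray(w.value).ravel()
        if np.max(np.abs(w_new - w_curr)) < tol:
            return w_new
        w_curr = w_new
    return w_curr
```

Usage is simply `w_hat = estimate_direction(X, lam=0.1)` for an $n \times p$ data matrix `X`; warm-starting `w_curr` from a previous fit, as suggested later for the online tests, drops in naturally here.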
3. Data Splitting and Power Enhancement
Ideally, one would like to use the full sample to estimate the optimal projection direction so that the estimated projection is as accurate as possible. Then we would project the full sample onto a 1-dimensional space and perform a test based on the projected sample. However, the limiting distribution of the resulting test statistic is challenging to derive, as the estimated projection direction and the data to be projected are dependent. In this section, we introduce a new projection test via an online framework. Before presenting the full methodology of the online-style projection test, we first describe a data-splitting procedure.
3.1. A data-splitting procedure
The data-splitting technique has been widely used in high-dimensional statistical inference, where one uses a subset of the data to learn the underlying model and the remaining data to conduct statistical inference. Given a dataset $\mathcal{D} = \{x_1, \dots, x_n\}$, we partition it into two disjoint subsets $\mathcal{D}_1$ and $\mathcal{D}_2$ of sizes $n_1$ and $n_2$. We use the first subset $\mathcal{D}_1$ to estimate the optimal projection direction and the second subset $\mathcal{D}_2$ to perform the projection test. More specifically, let $\bar{x}_1$ and $\hat{\Sigma}_1$ be the sample mean and the sample covariance matrix estimated from $\mathcal{D}_1$. The projection direction can be estimated by solving the following constrained and penalized quadratic program introduced in Section 2.2,
\[ \hat{w}_1 = \arg\min_{w:\, w^{\top}\bar{x}_1 = 1}\; \frac{1}{2} w^{\top}\hat{\Sigma}_1 w + \sum_{j=1}^{p} p_{\lambda}(|w_j|). \tag{14} \]
Then we project the data in $\mathcal{D}_2$ onto a 1-dimensional space by taking the inner product of $\hat{w}_1$ and $x_i$, i.e., $y_i = \hat{w}_1^{\top}x_i$ for $x_i \in \mathcal{D}_2$. Note that the estimated projection direction $\hat{w}_1$ is independent of $\mathcal{D}_2$, which is the key benefit of the data-splitting procedure. Conditional on $\mathcal{D}_1$, the $y_i$'s are independent and identically distributed with mean $\hat{w}_1^{\top}\mu$ and variance $\hat{w}_1^{\top}\Sigma\hat{w}_1$. Now the dimensionality is reduced from $p$ to 1, and we can simply apply the one-sample $t$-test to the projected data. Let
\[ T_1 = \sqrt{n_2}\,\frac{\bar{y}}{s_y}, \]
where $\bar{y}$ and $s_y$ are the sample mean and standard deviation of the $y_i$'s in $\mathcal{D}_2$. Clearly, the test statistic $T_1$ asymptotically follows $N(0,1)$ under $H_0$, and we reject the null hypothesis whenever $T_1 > z_{\alpha}$, where $z_{\alpha}$ is the upper $\alpha$ quantile of the standard normal distribution. If the $x_i$'s are normally distributed, then $T_1$ follows an exact $t$ distribution with $n_2 - 1$ degrees of freedom. We refer to this test as the data-splitting projection test (DSPT) and summarize it in Algorithm 1. Thanks to the data-splitting procedure, the estimated $\hat{w}_1$ is independent of the remaining subset $\mathcal{D}_2$. As a result, we obtain an exact $t$-test and hence control the type I error rate. However, the performance of the data-splitting procedure may not be satisfactory, as the data in $\mathcal{D}_1$ are not utilized when conducting the $t$-test, leading to a loss of power. Next, we derive the asymptotic power function for the DSPT, defined as $\beta_{\mathrm{DSPT}} = P(T_1 > z_{\alpha})$.
[Algorithm 1: Data-splitting projection test (DSPT).]
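A minimal sketch of the DSPT in Algorithm 1 as described above, using the one-sided rejection rule stated in the text. Any direction estimator can be passed in; the ridge-type helper `ridge_dir` in the usage lines is a hypothetical stand-in so that the snippet runs on its own.

```python
import numpy as np
from scipy import stats

def dspt(X, estimate_direction, alpha=0.05, seed=0):
    """Data-splitting projection test: estimate the direction on D1, test on D2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    D1, D2 = X[idx[: n // 2]], X[idx[n // 2:]]
    w_hat = estimate_direction(D1)                    # uses D1 only
    y = D2 @ w_hat                                    # projected held-out sample, i.i.d. given D1
    t1 = np.sqrt(len(y)) * y.mean() / y.std(ddof=1)
    crit = stats.t.ppf(1 - alpha, df=len(y) - 1)      # exact t reference under normality
    return t1, t1 > crit

# Illustration with a simple ridge-type direction so the snippet is self-contained;
# the penalized estimator of Section 2.4 (or any other estimator) can be passed instead.
ridge_dir = lambda D: np.linalg.solve(np.cov(D, rowvar=False) + np.eye(D.shape[1]),
                                      D.mean(axis=0))
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 200)) + 0.2
print(dspt(X, ridge_dir))
```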
Theorem 2. Suppose conditions (C1)–(C3) hold. Let $\hat{w}_1$ be a stationary point of program (14) with $\lambda = C\sqrt{\log p/n_1}$ for some large constant $C$. If $s\sqrt{\log p/n_1} \to 0$, then we have
\[ \beta_{\mathrm{DSPT}} - \Phi\!\bigl(-z_{\alpha} + \sqrt{n_2}\,\Delta\bigr) \;\to\; 0, \]
where $\Delta = \sqrt{\mu^{\top}\Sigma^{-1}\mu}$ and $\Phi$ is the cdf of the standard normal distribution.
The quantity $\Delta$ can be interpreted as the signal strength under the alternative hypothesis. As long as $\sqrt{n_2}\,\Delta \to \infty$, the DSPT has asymptotic power approaching 1. If the full sample of size $n$ were used to perform the test, one would expect the power function to be $\Phi(-z_{\alpha} + \sqrt{n}\,\Delta)$. Theorem 2 clearly indicates the power loss of the DSPT due to the data-splitting procedure: only the $n_2$ observations in $\mathcal{D}_2$ are used to test the hypothesis. Though the DSPT is not optimal in terms of testing power, it is still more powerful than the quadratic-form tests, which ignore the dependence among variables, under mild conditions on the correlation structure (Huang 2015). In practice, it is recommended to set $n_1 = \lfloor n/2 \rfloor$ based on numerical studies. Huang (2015) proposed a ridge-type estimator of the projection direction, which is not guaranteed to be consistent in high dimensions, and its theoretical properties require $p$ and $n$ to be of the same order; its power analysis relies on the consistency of the ridge-type estimator. We propose to estimate the optimal projection via a constrained and regularized quadratic program. The resulting estimator is consistent provided that $s\sqrt{\log p/n_1} \to 0$, and we derive the asymptotic power function under this condition.
3.2. An online-style data splitting
The data-splitting procedure introduced in the previous section yields an exact $t$-test but suffers from power loss. In this subsection, we propose an online framework for the projection test to enhance the testing power while retaining control of the type I error rate. Imagine that observations arrive one by one in chronological order, and we repeat a projection-estimation procedure whenever a new observation arrives. More specifically, we obtain an estimator of the optimal projection based on the current observations, and when a new observation arrives we first project the new observation and then update the estimate by including it. Suppose we have observed $x_1, \dots, x_m$ at time $m$, and we first use these $m$ observations to estimate the optimal projection direction via the penalized and constrained quadratic program introduced in Section 2.2,
\[ \hat{w}^{(m)} = \arg\min_{w:\, w^{\top}\bar{x}^{(m)} = 1}\; \frac{1}{2} w^{\top}\hat{\Sigma}^{(m)} w + \sum_{j=1}^{p} p_{\lambda}(|w_j|), \tag{15} \]
where $\bar{x}^{(m)}$ and $\hat{\Sigma}^{(m)}$ are the sample mean and sample covariance estimated from $x_1, \dots, x_m$. When a new observation $x_{t+1}$ arrives ($t \ge m$), we take the following Projection-Estimation steps:
- Projection: project the new observation $x_{t+1}$ using the current estimator $\hat{w}^{(t)}$, i.e., $y_{t+1} = (\hat{w}^{(t)})^{\top}x_{t+1}$.
- Estimation: update the estimated projection direction by incorporating the information in the new observation $x_{t+1}$. That is,
\[ \hat{w}^{(t+1)} = \arg\min_{w:\, w^{\top}\bar{x}^{(t+1)} = 1}\; \frac{1}{2} w^{\top}\hat{\Sigma}^{(t+1)} w + \sum_{j=1}^{p} p_{\lambda}(|w_j|), \]
where $\bar{x}^{(t+1)}$ and $\hat{\Sigma}^{(t+1)}$ are the sample mean and sample covariance estimated from $x_1, \dots, x_{t+1}$. Given an integer $m$, we obtain an initial estimator $\hat{w}^{(m)}$ based on the first $m$ observations. Then we repeat the Projection-Estimation steps until the last observation $x_n$ arrives. We require $2 \le m < n$ since we need at least two observations to estimate the covariance matrix. As a result, we obtain a sequence of projected observations $y_{m+1}, \dots, y_n$ of sample size $n - m$, based on which we can carry out the one-sample $t$-test. Let
\[ T = \sqrt{n-m}\,\frac{\bar{y}}{s_y}, \tag{16} \]
where $\bar{y}$ and $s_y^2$ are the sample mean and sample variance of $y_{m+1}, \dots, y_n$. Later in this section we show that the test statistic $T$ converges to $N(0,1)$ under $H_0$. As a result, we can use $z_{\alpha}$ as the critical value and reject $H_0$ whenever $T > z_{\alpha}$. We refer to this test as the Online-style Projection Test One-by-one (OPT-O), as we update $\hat{w}$ whenever a new observation arrives. The details of the OPT-O are summarized in Algorithm 2.
Under the one-by-one framework of the OPT-O, one needs to solve a constrained and penalized quadratic program, which can be nonconvex, whenever a new observation arrives. As a result, the OPT-O test can be computationally expensive, especially when $n$ is large. To reduce the computational cost, we also propose a mini-batch version of the OPT-O test. That is, we update the estimated projection direction only when a batch of observations of size $b$ arrives. Suppose we have obtained an estimator $\hat{w}^{(t)}$ at time $t$ based on $x_1, \dots, x_t$. When the next $b$ observations $x_{t+1}, \dots, x_{t+b}$ arrive, we first project them onto a 1-dimensional space by multiplying by $\hat{w}^{(t)}$, i.e., $y_i = (\hat{w}^{(t)})^{\top}x_i$ for $i = t+1, \dots, t+b$. Then we update the estimated projection direction by including the additional $b$ observations,
\[ \hat{w}^{(t+b)} = \arg\min_{w:\, w^{\top}\bar{x}^{(t+b)} = 1}\; \frac{1}{2} w^{\top}\hat{\Sigma}^{(t+b)} w + \sum_{j=1}^{p} p_{\lambda}(|w_j|), \tag{17} \]
where $\bar{x}^{(t+b)}$ and $\hat{\Sigma}^{(t+b)}$ are the sample mean and sample covariance matrix computed from the first $t+b$ observations $x_1, \dots, x_{t+b}$. We repeat this procedure until the last batch arrives. Note that the size of the last batch can be smaller than $b$. Similar to the OPT-O, we reject $H_0$ if $T > z_{\alpha}$, where the test statistic $T$ is defined in (16). The details of the mini-batch version are summarized in Algorithm 3, and the corresponding test is referred to as the Online-style Projection Test mini-Batch (OPT-B). Note that the data-splitting projection test can be regarded as a special case of the OPT-B with a single batch. In practice, we recommend the OPT-O test (i.e., $b = 1$) to maximize power; if computational cost is a concern, one can increase the batch size, though a small value of $b$ is preferred to retain high power whenever the cost is affordable. Section S.8 studies how the batch size $b$ affects the power of projection tests. Another way to reduce computation time is warm starting: for both OPT-O and OPT-B, one can use the solution from the previous step as the initial value in the next iteration when computing $\hat{w}^{(t+1)}$ or $\hat{w}^{(t+b)}$. Such a warm start helps shorten the time needed to find a solution.
[Algorithms 2 and 3: Online-style projection tests OPT-O (one-by-one) and OPT-B (mini-batch).]
Algorithms 2 and 3 introduce a general online framework for constructing projection tests. Instead of the constrained and penalized quadratic program, any appropriate estimator of the projection direction can be plugged in; for instance, one can use the ridge-type estimator proposed in Huang (2015).
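The sketch below implements the Projection-Estimation loop under this general framework (batch size 1 corresponds to OPT-O, larger batches to OPT-B). The direction estimator is pluggable; the ridge-type helper and all sizes in the usage lines are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from scipy import stats

def online_projection_test(X, estimate_direction, m, batch=1, alpha=0.05):
    """Online-style projection test: OPT-O when batch=1, OPT-B for larger batches."""
    n = X.shape[0]
    w_hat = estimate_direction(X[:m])            # initial direction from the first m observations
    y, t0 = [], m
    while t0 < n:
        t1 = min(t0 + batch, n)                  # the last batch may be smaller
        y.extend(X[t0:t1] @ w_hat)               # project the new observations first ...
        if t1 < n:
            w_hat = estimate_direction(X[:t1])   # ... then update the direction using all t1 obs
        t0 = t1
    y = np.asarray(y)
    T = np.sqrt(len(y)) * y.mean() / y.std(ddof=1)
    return T, T > stats.norm.ppf(1 - alpha)      # asymptotic N(0,1) critical value

# Illustration with a ridge-type direction estimator and arbitrary sizes:
ridge_dir = lambda D: np.linalg.solve(np.cov(D, rowvar=False) + np.eye(D.shape[1]),
                                      D.mean(axis=0))
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 300)) + 0.15
print(online_projection_test(X, ridge_dir, m=30, batch=10))
```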
3.3. Asymptotic distribution of online-style projection test
In this subsection, we establish the limiting distributions of the proposed online-style projection test. Under the online framework, we obtain a sequence of projected observations $y_{m+1}, \dots, y_n$, either one by one or via mini-batches. We then need to draw an inference regarding $H_0$ based on the projected sample. One challenge is that the projected observations $y_{m+1}, \dots, y_n$ are no longer independent of each other. A key observation is that we can construct a sequence of martingale differences from the projected sample. Let $\tilde{y}_i$ be the centralized version of $y_i$, i.e., $\tilde{y}_i = y_i - (\hat{w}^{(i-1)})^{\top}\mu$ for $i = m+1, \dots, n$, so that $\tilde{y}_i$ has mean 0. The following lemma states that the sequence of $\tilde{y}_i$'s is a martingale difference sequence; the proof can be found in Section S.3.
Lemma 1. The sequence $\{\tilde{y}_i\}_{i=m+1}^{n}$ is a martingale difference sequence with respect to the $\sigma$-field $\mathcal{F}_{i-1}$ generated by $x_1, \dots, x_{i-1}$, under both the one-by-one and the mini-batch frameworks.
Lemma 1 also implies that $\{y_i\}$ is a martingale difference sequence under $H_0$. With this observation, we are ready to establish the central limit theorem for the test statistic. To this end, let $\sigma_i^2 = \mathrm{Var}(\tilde{y}_i \mid \mathcal{F}_{i-1})$ be the conditional variance of $\tilde{y}_i$ given $\mathcal{F}_{i-1}$, and let $\bar{y}$ and $s_y^2$ be the sample mean and sample variance of $y_{m+1}, \dots, y_n$. The following theorem establishes the asymptotic normality of the test statistic $T$.
Theorem 3. (Normality) Assume $n - m \to \infty$ and that there exists a sequence $a_n$, depending on $n$, such that $\frac{1}{a_n (n-m)}\sum_{i=m+1}^{n}\sigma_i^2 \to \sigma^2$ almost surely, where $\sigma^2$ is an almost surely finite random variable satisfying $P(\sigma^2 > 0) = 1$. Then we have
\[ \frac{\sum_{i=m+1}^{n}\tilde{y}_i}{\bigl(\sum_{i=m+1}^{n}\sigma_i^2\bigr)^{1/2}} \;\rightsquigarrow\; N(0, 1). \]
In particular, under $H_0$, $T \rightsquigarrow N(0,1)$.
We also empirically verify the asymptotic normality of the proposed online-style tests under both $H_0$ and $H_1$ in Section S.6.
Remark. In order to establish the central limit theorem for the proposed online-style test statistic, we require the sample average of the conditional variances to be nondegenerate. Note that for each $i$, the estimated projection direction $\hat{w}^{(i)}$ satisfies the linear constraint $(\hat{w}^{(i)})^{\top}\bar{x}^{(i)} = 1$, and hence $\|\hat{w}^{(i)}\|_2 \ge 1/\|\bar{x}^{(i)}\|_2$. Since $\|\bar{x}^{(i)}\|_2$ is small with high probability under the null, $\|\hat{w}^{(i)}\|_2$ diverges in the high-dimensional setting, and so does the conditional variance $\sigma_i^2 = (\hat{w}^{(i)})^{\top}\Sigma\,\hat{w}^{(i)}$. With a proper choice of $a_n$, the sample average of the conditional variances (up to the factor $a_n$) converges to a nonzero random variable. In general, one can set $a_n$ to be of the order of the common magnitude of the $\sigma_i^2$'s. In practice, we do not need to derive the explicit form of $a_n$, as the test statistic is scale invariant and $a_n$ cancels out. Under the alternative hypothesis, we can show that the sample average of the conditional variances converges to $1/(\mu^{\top}\Sigma^{-1}\mu)$, which is positive under $H_1$, and hence one can set $a_n = 1$. Thanks to the linear constraint $w^{\top}\bar{x} = 1$, the resulting estimator is bounded away from $0$ and the conditional variance is not degenerate. Without the linear constraint, we cannot guarantee that $\hat{w}$ is bounded away from $0$, and hence the central limit theorem may not hold.
Here is an example to illustrate the convergence of with the following conditions imposed on the first moment and second moment of . Assume there exists , and sequences such that and for all . Therefore, and . Note that
Further assume converges to its expectation , where
As a result, the sample average of conditional variance can be approximated by
Under , we have and hence one can take . Under , we have and hence one can take .
Theorem 3 establishes the asymptotic normality of the test statistic under both the null and alternative hypotheses, assuming the sample average of conditional variances (up to some factor $a_n$) converges. This theorem justifies using $z_{\alpha}$ as the critical value: we reject the null hypothesis if and only if $T > z_{\alpha}$. A bootstrap version of the projection test is discussed in Supplement S.7, where the critical value is chosen based on bootstrap samples. Under conditions (C1)–(C3), we can show that the estimated projection direction is consistent as $n \to \infty$, based on which we derive the asymptotic power function for the OPT-O test. The following theorem establishes the asymptotic power function for the OPT-O test.
Theorem 4. Suppose conditions (C1)–(C3) hold. Let $\hat{w}^{(t)}$ be a stationary point of program (15) with $\lambda = C\sqrt{\log p/t}$ for some large constant $C$. Under additional rate conditions on $(n, m, p, s)$, we have
(Normality under the alternative) the test statistic $T$, suitably centered and scaled, converges to $N(0, 1)$ under $H_1$;
(Power function) $\beta_{\mathrm{OPT\text{-}O}} - \Phi\!\bigl(-z_{\alpha} + \sqrt{n-m}\,\Delta\bigr) \to 0$ as $n \to \infty$.
Theorem 4 establishes the normality of the proposed OPT-O test statistic under the alternative, based on which we derive the asymptotic power function. The asymptotic power function in Theorem 4 holds for any choice of $m$. In practice, one may set $m$ to be a fixed constant or let it grow slowly with $n$; see Section 3.4 for a numerical study on how to select $m$. For the mini-batch version OPT-B, it is easy to verify that the projected sequence is also a martingale difference sequence, and thus the asymptotic normality and power function in Theorem 4 also hold for the OPT-B.
Remark. The power function indicates that $\Delta = \sqrt{\mu^{\top}\Sigma^{-1}\mu}$ can be regarded as the signal strength in the alternative. The larger $\Delta$ is, the easier it is to reject the null. If $\sqrt{n-m}\,\Delta \to \infty$, then we have asymptotic power 1 to reject the null. The sum-of-squares-type tests, which ignore the dependence among the variables, have asymptotic power
\[ \Phi\!\left(-z_{\alpha} + \frac{n\,\mu^{\top}\mu}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}}\right) \]
under standard regularity conditions. Since $\mu^{\top}\Sigma^{-1}\mu$ exploits the dependence among the variables while $\mu^{\top}\mu$ does not, the optimal-projection-based tests can dominate the sum-of-squares-type tests when the variables are correlated. The OPT-O test improves upon the DSPT in two ways: (1) the OPT-O test keeps including new observations in the estimation of $w^*$, and hence the estimator of $w^*$ becomes more and more accurate as more observations arrive; (2) fewer observations are discarded by the OPT-O test than by the DSPT when performing the test. For the OPT-O, only the first $m$ observations are discarded, while $n_1 = \lfloor n/2 \rfloor$ observations are discarded for the DSPT. Let us further examine the technical conditions for the asymptotic power. For the DSPT, we require $s\sqrt{\log p/n_1} \to 0$ to control the error bounds of the estimated projection direction. In order to have high power, the sample size $n_2$ used for the test cannot be too small; in particular, when $\sqrt{n_2}\,\Delta$ dominates the corresponding signal term of the quadratic-form tests, the DSPT is more powerful than the quadratic-form tests such as the CQ test. As a result, it is assumed that $n_1$ and $n_2$ are of the same order as $n$. The same condition on the dimension is needed for the OPT-O, but the OPT-O allows the number of observations discarded for the test to be small, i.e., $m = o(n)$. As a result, more observations can be used to perform the test, and the OPT-O has higher asymptotic power than the DSPT. To summarize, under the same conditions, the OPT-O is more powerful than the DSPT. We can easily translate the asymptotic power functions to the local alternative $\mu = n^{-1/2}v$ for some fixed vector $v$. With $n_2 = \lfloor n/2 \rfloor$ and a small $m$, the local asymptotic power functions for the DSPT and the OPT-O are approximately
\[ \Phi\!\bigl(-z_{\alpha} + \sqrt{n_2/n}\;\Delta_v\bigr) \quad \text{and} \quad \Phi\!\bigl(-z_{\alpha} + \sqrt{(n-m)/n}\;\Delta_v\bigr), \]
where $\Delta_v = \sqrt{v^{\top}\Sigma^{-1}v}$. Clearly, for local alternatives, we still have $\beta_{\mathrm{DSPT}} \le \beta_{\mathrm{OPT\text{-}O}}$. To evaluate the efficiency of the proposed OPT-O and DSPT, we introduce the oracle projection test, which utilizes the full sample with known optimal projection direction. Hence the asymptotic power for the oracle projection test is $\Phi(-z_{\alpha} + \Delta_v)$ under the local alternative, or $\Phi(-z_{\alpha} + \sqrt{n}\,\Delta)$ for a fixed alternative.
Here we present an asymptotic power comparison between the proposed OPT-O, the DSPT, and the oracle projection test for fixed $n$, $m$, and $\alpha$ and different signal strengths $\Delta$. Figure 2 indicates that there is a certain power loss due to data splitting compared with the oracle test. The online-style test improves the power of the DSPT. Overall we have $\beta_{\mathrm{DSPT}} \le \beta_{\mathrm{OPT\text{-}O}} \le \beta_{\mathrm{oracle}}$.
Figure 2:
Asymptotic power functions of the OPT-O, the DSPT and the oracle projection test. The oracle power function is based on the full sample with known optimal direction.
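Taking the power approximations discussed above at face value, the short sketch below traces curves in the spirit of Figure 2; the sample size, initial size $m$, significance level, and signal grid are arbitrary illustrative values.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

alpha, n, m = 0.05, 200, 30                       # illustrative sizes
z = stats.norm.ppf(1 - alpha)
delta = np.linspace(0, 0.4, 100)                  # signal strength on the x-axis

power = lambda n_eff: stats.norm.cdf(-z + np.sqrt(n_eff) * delta)
plt.plot(delta, power(n), label="oracle (full sample, known direction)")
plt.plot(delta, power(n - m), label="OPT-O (first m observations discarded)")
plt.plot(delta, power(n // 2), label="DSPT (half of the sample discarded)")
plt.xlabel("signal strength"); plt.ylabel("asymptotic power"); plt.legend()
plt.show()
```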
3.4. Choice of $m$
The proposed online-style projection test involves choosing the parameter $m$ such that the first $m$ observations are used to obtain an initial estimator of the optimal projection direction. According to Theorem 4, any choice of $m$ results in testing power of the same asymptotic order. Clearly, $m$ should not be too large, since the first $m$ observations are not utilized when performing the online-style projection test; a large $m$ may lead to a significant loss in power. To maximize the number of observations used for testing, one may set $m = 2$ (since we need at least two observations to estimate the covariance matrix). However, a small $m$ results in an inaccurate initial estimator and consequently affects the power. In what follows we conduct numerical studies to investigate how $m$ affects the testing power, under both the autocorrelation and compound symmetry covariance structures described in Section 4.1. Figure 3 depicts the testing power against the choice of $m$, where the upper two panels are for the autocorrelation structure and the lower two panels are for the compound symmetry structure. Figure 3 shows that the power of the test is almost flat when $m$ is small, then increases gradually, and drops quickly as $m$ further increases. In other words, the power of the online-style projection test is not very sensitive to the choice of $m$ and gives very similar power when $m$ is relatively small. When $m$ is large, we perform the online-style projection test on a relatively small number of projected observations, and lower power is expected. According to Figure 3, we suggest choosing a relatively small $m$ in practice, and we do so in the rest of this paper.
Figure 3:
Power of the online-style projection test against the choice of $m$, for both the autocorrelation and compound symmetry covariance structures. The upper two panels show the power for the autocorrelation structure and the lower two panels show the power for the compound symmetry structure.
4. Numerical Results
In this section, we conduct numerical experiments to examine the finite-sample performance of different tests for the one-sample mean vector problem in high dimensions, including projection tests as well as quadratic-form tests. In particular, we consider the following optimal-projection-based tests:
OPT-O: Online-style Projection Test One-by-one version according to Algorithm 2.
OPT-B: Online-style Projection Test mini-Batch version according to Algorithm 3.
OPT-R: Online-style Projection Test One-by-one version where the projection direction is estimated by the ridge-type estimator in Huang (2015).
DSPT: Data-splitting Projection Test according to Algorithm 1.
RDSPT: Data-splitting Projection Test where the projection direction is estimated by the ridge-type estimator in Huang (2015).
We use the SCAD penalty (Fan and Li 2001) to estimate the optimal projection direction. The batch size for the OPT-B test is set to be 10. We also include two other projection tests, which are the Lauter test (Lauter 1996) and the random projection test (RPT) proposed in Lopes et al. (2011). The quadratic-form tests include the BS test (Bai and Saranadasa 1996), the CQ test (Chen and Qin 2010) and the SD test (Srivastava and Du 2008).
4.1. Simulation studies
We generate random samples of size $n$ from the multivariate normal distribution with mean $\mu = c\,\mu_0$ and covariance matrix $\Sigma$, from a multivariate $t$-distribution with 6 degrees of freedom, and from an asymmetric distribution with 1 degree of freedom. We set $c = 0, 0.5$, and 1 to examine the type I error rate and compare the power of different tests, where $c = 0$ corresponds to the null hypothesis and $c = 0.5$ or 1 corresponds to the alternative hypothesis. For $\Sigma$, we consider the following two covariance structures: (1) compound symmetry, $\Sigma = (1-\rho)I_p + \rho\,\mathbf{1}\mathbf{1}^{\top}$, and (2) autocorrelation, $\Sigma = (\rho^{|i-j|})_{p \times p}$, where $\mathbf{1}$ is a vector with all entries equal to 1 and $I_p$ is the identity matrix. We consider $\rho = 0.25, 0.5, 0.75$, and 0.95 to study the impact of correlation on testing power. The dimension $p$ ranges from 400 to 1600. For the online-style projection tests (i.e., OPT-O, OPT-B, and OPT-R), the initial sample size $m$ is chosen as recommended in Section 3.4. For the data-splitting projection tests (i.e., DSPT and RDSPT), the sample is split into two halves. In our implementation, we replace the sample covariance matrix $\hat{\Sigma}$ by $\hat{\Sigma} + \epsilon I_p$ for a small positive number $\epsilon$; all the theoretical results still hold when $\epsilon$ is sufficiently small. Such a perturbation does not noticeably affect the computational accuracy of the estimator but leads to faster convergence according to our numerical studies. We set the type I error rate to $\alpha = 0.05$. All simulation results are based on 10,000 replications. The top two panels of Table 1 report the type I error rate and power of all tests for normally distributed data under the compound symmetry structure and the autocorrelation structure, respectively. To save space, all other simulation results can be found in Tables S.2 – S.6 of Supplement S.9.
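For concreteness, the following sketch generates data with the two covariance structures just described; the mean direction `mu0`, the specific sizes, and the use of a $t$ mixture for the non-normal case are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def make_cov(p, rho, structure="cs"):
    """Compound symmetry: (1-rho) I + rho 1 1'.  Autocorrelation: rho^{|i-j|}."""
    if structure == "cs":
        return (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, p, c, rho, structure="cs", df=None, seed=0):
    """Draw n observations with mean c*mu0 and the chosen covariance structure."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(make_cov(p, rho, structure))
    Z = rng.normal(size=(n, p)) @ L.T
    if df is not None:                                   # multivariate t via a chi-square mixture
        Z = Z / np.sqrt(rng.chisquare(df, size=(n, 1)) / df)
    mu0 = np.r_[np.ones(10), np.zeros(p - 10)] / np.sqrt(10)   # an illustrative sparse mean direction
    return Z + c * mu0

X = simulate(n=100, p=400, c=0.5, rho=0.5, structure="ar")
print(X.shape)
```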
Table 1:
Size and power comparison for the settings described in Section 4.1 (values are in percentages).
| | c = 0 | | | | c = 0.5 | | | | c = 1 | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ρ | 0.25 | 0.5 | 0.75 | 0.95 | 0.25 | 0.5 | 0.75 | 0.95 | 0.25 | 0.5 | 0.75 | 0.95 |
| OPT-O | 5.70 | 5.83 | 5.07 | 5.11 | 90.35 | 99.70 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| OPT-B | 6.11 | 5.74 | 5.68 | 5.41 | 77.18 | 95.28 | 99.80 | 99.97 | 100.0 | 100.0 | 100.0 | 100.0 |
| OPT-R | 5.20 | 5.03 | 5.27 | 5.05 | 15.82 | 28.90 | 82.01 | 100.0 | 97.97 | 99.79 | 99.99 | 100.0 |
| DSPT | 5.22 | 4.99 | 5.21 | 5.08 | 50.43 | 79.97 | 98.51 | 99.98 | 99.92 | 99.94 | 99.99 | 100.0 |
| RDSPT | 5.01 | 4.71 | 5.06 | 4.94 | 14.62 | 23.71 | 54.68 | 98.14 | 71.49 | 81.98 | 95.74 | 100.0 |
| BS | 6.08 | 6.20 | 6.22 | 6.22 | 7.28 | 6.72 | 6.58 | 6.50 | 11.18 | 8.46 | 7.70 | 7.34 |
| CQ | 6.02 | 6.14 | 6.14 | 6.12 | 7.30 | 6.62 | 6.54 | 6.46 | 11.18 | 8.46 | 7.58 | 7.32 |
| SD | 2.29 | 0.56 | 0.11 | 0.01 | 2.50 | 0.58 | 0.12 | 0.01 | 4.06 | 0.72 | 0.14 | 0.01 |
| Lauter | 5.16 | 5.18 | 5.14 | 5.16 | 5.04 | 5.06 | 5.08 | 5.06 | 5.12 | 5.08 | 5.02 | 5.02 |
| RPT | 5.12 | 5.24 | 5.04 | 4.90 | 6.88 | 8.00 | 11.78 | 51.76 | 14.56 | 20.94 | 42.22 | 98.38 |
| OPT-O | 5.87 | 5.99 | 5.76 | 5.88 | 74.33 | 59.22 | 40.02 | 25.12 | 100.0 | 100.0 | 99.98 | 99.84 |
| OPT-B | 5.48 | 5.90 | 6.06 | 5.98 | 63.21 | 49.59 | 32.92 | 20.84 | 100.0 | 100.0 | 99.87 | 98.33 |
| OPT-R | 6.14 | 5.45 | 5.96 | 5.47 | 32.44 | 24.99 | 15.73 | 8.37 | 99.89 | 98.48 | 84.55 | 40.02 |
| DSPT | 5.25 | 5.19 | 5.09 | 5.12 | 38.03 | 30.96 | 22.88 | 16.49 | 100.0 | 99.94 | 98.81 | 91.04 |
| RDSPT | 4.61 | 4.95 | 5.30 | 4.92 | 17.85 | 14.57 | 9.55 | 6.10 | 94.90 | 84.59 | 58.09 | 22.43 |
| BS | 5.18 | 4.96 | 5.24 | 4.46 | 38.02 | 29.42 | 17.74 | 8.02 | 100.0 | 99.84 | 91.44 | 33.82 |
| CQ | 5.16 | 5.06 | 5.24 | 4.44 | 38.08 | 29.46 | 17.72 | 8.06 | 100.0 | 99.84 | 91.48 | 33.92 |
| SD | 11.73 | 8.22 | 4.08 | 1.65 | 64.17 | 45.15 | 20.19 | 3.45 | 100.0 | 99.82 | 91.48 | 20.90 |
| Lauter | 4.90 | 4.66 | 5.20 | 5.10 | 8.70 | 6.42 | 5.94 | 5.14 | 14.64 | 9.48 | 6.98 | 5.36 |
| RPT | 4.86 | 4.98 | 5.16 | 4.90 | 6.34 | 6.30 | 6.60 | 6.86 | 11.88 | 12.16 | 11.52 | 13.36 |
| OPT-O | 5.81 | 5.39 | 5.18 | 4.89 | 75.58 | 95.94 | 99.90 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| OPT-B | 6.21 | 5.32 | 5.11 | 4.79 | 61.58 | 87.35 | 98.92 | 100.0 | 100.0 | 99.98 | 99.99 | 100.0 |
| OPT-R | 4.93 | 4.79 | 4.75 | 5.74 | 12.71 | 22.99 | 67.67 | 99.95 | 89.86 | 96.65 | 99.74 | 100.0 |
| DSPT | 5.28 | 5.09 | 4.83 | 4.62 | 40.44 | 69.29 | 95.49 | 99.97 | 99.82 | 99.83 | 99.99 | 99.99 |
| RDSPT | 4.88 | 5.06 | 5.02 | 4.74 | 11.40 | 18.51 | 46.12 | 96.50 | 61.99 | 74.62 | 92.62 | 99.98 |
| BS | 4.92 | 5.75 | 5.93 | 5.98 | 5.45 | 6.03 | 6.09 | 6.12 | 7.48 | 7.13 | 6.82 | 6.65 |
| CQ | 6.16 | 6.19 | 6.18 | 6.13 | 6.75 | 6.44 | 6.31 | 6.28 | 9.42 | 7.57 | 7.12 | 6.89 |
| SD | 1.19 | 0.43 | 0.05 | 0.02 | 1.34 | 0.43 | 0.05 | 0.02 | 1.89 | 0.53 | 0.05 | 0.02 |
| Lauter | 4.95 | 4.97 | 4.93 | 4.96 | 4.98 | 4.95 | 4.97 | 4.96 | 4.97 | 4.93 | 4.92 | 4.91 |
| RPT | 4.43 | 4.61 | 4.26 | 4.15 | 5.71 | 7.08 | 9.87 | 44.45 | 12.53 | 18.11 | 35.10 | 96.15 |
| OPT-O | 5.93 | 5.86 | 5.91 | 6.03 | 58.34 | 45.07 | 30.92 | 17.82 | 99.99 | 99.96 | 99.36 | 96.20 |
| OPT-B | 5.46 | 5.82 | 6.02 | 6.24 | 48.12 | 36.98 | 26.30 | 16.15 | 99.99 | 99.90 | 98.59 | 91.16 |
| OPT-R | 5.48 | 5.96 | 6.20 | 6.19 | 23.81 | 18.51 | 12.64 | 7.43 | 97.26 | 91.16 | 70.57 | 30.22 |
| DSPT | 4.89 | 4.61 | 4.85 | 4.77 | 28.90 | 24.90 | 17.90 | 12.35 | 99.89 | 99.19 | 94.29 | 78.77 |
| RDSPT | 5.24 | 4.58 | 5.08 | 5.37 | 13.82 | 11.03 | 8.69 | 5.40 | 83.00 | 69.65 | 44.60 | 17.05 |
| BS | 0.00 | 0.00 | 0.01 | 0.78 | 0.00 | 0.01 | 0.06 | 1.36 | 3.05 | 3.60 | 5.24 | 6.85 |
| CQ | 5.14 | 5.17 | 5.09 | 5.25 | 21.73 | 17.47 | 11.39 | 7.08 | 97.03 | 90.64 | 66.82 | 20.99 |
| SD | 0.00 | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 | 0.01 | 0.18 | 1.31 | 1.82 | 2.21 | 1.66 |
| Lauter | 5.07 | 5.12 | 4.76 | 5.09 | 7.72 | 6.65 | 5.08 | 5.27 | 12.20 | 9.06 | 6.38 | 5.44 |
| RPT | 3.89 | 4.63 | 4.00 | 4.34 | 5.47 | 5.81 | 5.14 | 5.78 | 10.12 | 10.39 | 10.17 | 11.02 |
We first examine the type I error rate. Among all these tests, the DSPT, the RDSPT, the Lauter test, and the RPT are exact tests (under the normality assumption) and their sizes equal the nominal level $\alpha$. The online-style projection tests have an asymptotic normal null distribution and control the type I error rate very well. Similar to the online-style projection tests, the BS test and the CQ test also control the size reasonably well. The SD test is very sensitive to the correlation level, and its size often deviates from the nominal 0.05.
Next we compare the power of these tests. Table 1 suggests that the power of these tests strongly depends on the covariance structure, the correlation $\rho$, and the signal strength $c$. In summary, the proposed OPT-O test is the most powerful test in all settings. The one-by-one online-style projection test (OPT-O) is slightly more powerful than the corresponding mini-batch version (OPT-B) and substantially improves on the corresponding data-splitting projection test (DSPT). This is not surprising, since the OPT-O test keeps updating the estimated projection direction whenever a new observation arrives and generally has a more accurate estimate than its mini-batch version. The mini-batch projection test slightly sacrifices accuracy but reduces the computational cost. The DSPT is less powerful, as it discards much more information than the online-style projection tests. The test based on the constrained and regularized quadratic program (DSPT) is more powerful than the one based on the ridge-type estimator (RDSPT), as the true optimal projection direction is (approximately) sparse. When the covariance structure is compound symmetry, the power of the optimal-projection-based tests improves as $\rho$ increases, since a larger $\rho$ leads to a stronger signal in the alternative. As the value of $c$ jumps from 0.5 to 1, the power of all tests increases dramatically. As the dimension $p$ goes from 400 to 1600, there is a downward trend in the performance of these tests. However, even in the most challenging setting, the proposed OPT-O still performs very satisfactorily, with power greater than 90%. The quadratic-form tests tend to become less powerful as $\rho$ increases. This is because quadratic-form tests ignore the correlation among the variables, and therefore their overall performance is not satisfactory when the correlation is strong. When the covariance structure is autocorrelation, unlike the compound symmetry setting, the power of all the tests drops as $\rho$ increases, since a larger $\rho$ leads to a weaker signal in the alternative. We notice that some of the quadratic-form tests have better power than the data-splitting projection tests when $\rho$ is small. This is because $\Sigma^{-1}$ is then a 3-sparse (tridiagonal) matrix that is almost an identity matrix, so ignoring the dependence among variables does not noticeably diminish the power. In fact, the power of the quadratic-form tests decreases dramatically as $\rho$ increases, and they become less powerful than the DSPT when $\rho$ is large, not to mention the OPT-O test. As shown in the bottom two panels of Table 1 and Tables S.4, S.5, and S.6, the overall pattern of size and power for the two non-normal distributions is very similar to that under the normality assumption, indicating that the proposed projection test is not very sensitive to the sub-Gaussianity assumption and performs well for asymmetric data. For non-sub-Gaussian data, we observe that the online-style projection tests still control the type I error rate well and retain high power; the proposed OPT-O test remains the most powerful among the tests that successfully control the type I error rate. For the other tests, the BS test and the CQ test show slight size distortion, while the SD test and the Lauter test fail to control the type I error in some of the non-normal settings.
4.2. Real data example
In this section, we apply the proposed online-style projection test as well as other tests to a real dataset of high-resolution micro-computed tomography. This dataset consists of the skull bone densities of 58 mice from three different genotypes (“T0A0”, “T0A1”, “T1A1”) in a genetic mutation study. For each mouse, bone density is measured at density levels from 130 to 249 for 16 different areas of its skull. See Percival et al. (2014) for a detailed description of the protocols. In this empirical study, we would like to know whether there is a difference in the bone density patterns of two different areas of the mouse skull. To emphasize the high-dimensional nature of this dataset, we only use a subset of the data. We select the mice of genotype “T0A1”, for which there are 29 observations available, i.e., sample size $n = 29$. The two skull areas “Mandible” and “Nasal” are selected. We use all density levels from 130 to 249 in our analysis, hence dimension $p = 120$. We first take the difference of the bone densities of the two selected areas at each density level for each subject, since the two bones come from the same mouse. We then normalize the bone density measurements at each density level.
We first apply the OPT-O, the DSPT, and other existing tests to perform the one-sample test and compute the p-values. The p-values are reported in the first column of Table 2. All p-values are very close to 0, implying that the bone densities of the two areas are significantly different. In order to compare the power of the different tests, we also compute their p-values as we weaken the signal strength in the alternative. To be more specific, let $\hat{\mu}$ be the sample mean and $\hat{\epsilon}_i = x_i - \hat{\mu}$ be the residual for the $i$th subject. We then construct a new observation $x_i(\delta) = \delta\hat{\mu} + \hat{\epsilon}_i$ for the $i$th subject, where $\delta \in (0, 1]$. By construction, a smaller $\delta$ leads to a weaker signal and makes the test more challenging. Table 2 also reports the p-values of the different tests with $\delta = 0.8, 0.6, 0.4, 0.3$, and 0.2. As expected, the p-values of all tests increase as $\delta$ decreases. When $\delta = 0.8$ or 0.6, all tests perform well and reject the null hypothesis at level 0.05. When $\delta = 0.4$, the Lauter test starts to fail to reject the null hypothesis. When $\delta$ is further decreased to 0.3, the three optimal-projection-based tests OPT-O, DSPT, and RDSPT are able to reject the null hypothesis, while all other tests except the RPT fail to reject the null hypothesis. When $\delta = 0.2$, only the OPT-O and the DSPT reject the null hypothesis, which suggests that our proposed projection tests can still perform very well even when the signal is weak. Among the tests that fail to reject the null at $\delta = 0.2$, the RDSPT is the most powerful, as it has the smallest p-value.
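A short sketch of the signal-attenuation device used above (the function name is ours, for illustration):

```python
import numpy as np

def attenuate(X, delta):
    """Return x_i(delta) = delta * xbar + (x_i - xbar): same residuals, shrunken signal."""
    xbar = X.mean(axis=0)
    return delta * xbar + (X - xbar)

# Smaller delta => weaker signal => larger p-values, as in Table 2.
```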
Table 2:
P-values of different tests for bone density dataset.
| δ | 1.0 | 0.8 | 0.6 | 0.4 | 0.3 | 0.2 |
|---|---|---|---|---|---|---|
| OPT-O | 0 | 0 | 4.6 × 10−13 | 4.0 × 10−10 | 1.6 × 10−4 | 0.0325 |
| DSPT | 7.9 × 10−10 | 6.7 × 10−9 | 3.3 × 10−7 | 9.9 × 10−6 | 3.4 × 10−4 | 0.0418 |
| RDSPT | 2.4 × 10−8 | 5.1 × 10−7 | 1.9 × 10−5 | 0.0014 | 0.0140 | 0.1462 |
| BS | 0 | 0 | 0 | 1.2 × 10−4 | 0.0763 | 0.7684 |
| CQ | 0 | 0 | 0 | 1.6 × 10−4 | 0.0810 | 0.7717 |
| SD | 0 | 0 | 2.0 × 10−9 | 1.6 × 10−2 | 0.2494 | 0.7995 |
| Lauter | 1.1 × 10−10 | 3.1 × 10−8 | 1.6 × 10−3 | 0.1265 | 0.2625 | 0.4574 |
| RPT | 3.8 × 10−9 | 4.2 × 10−8 | 4.5 × 10−6 | 8.5 × 10−4 | 9.6 × 10−3 | 0.2031 |
We plot the heatmap of the absolute values of the pairwise sample correlations of all bone density levels in Figure 4. The heatmap clearly shows that some bone density levels are highly correlated. This explains why the proposed projection tests are more powerful than the quadratic-form tests such as the BS, CQ, and SD tests, as those tests do not take the dependence among the variables into account.
Figure 4:
Heatmap of the absolute values of the pairwise sample correlations of bone densities. The yellow grid cells indicate that some bone density levels are highly correlated. The green curve in the right panel shows the distribution of the absolute pairwise sample correlations.
5. Discussion
This paper studies the projection test for mean vectors in high dimensions. Existing projection tests either fail to utilize the optimal projection or lack theoretical justification for the estimated projection direction. To maximize the power of projection tests, a critical task is to obtain a good estimator of the optimal projection with statistical guarantees. We propose a constrained and regularized quadratic program that yields a consistent estimator under a sparsity assumption. We further propose an online-style framework for projection tests to enhance the testing power. This is a general testing framework in the sense that any proper estimator (e.g., the ridge-type estimator) of the optimal projection can be applied, and it can be easily extended to the two-sample problem. Under this framework, the first $m$ observations are used to obtain an initial estimator and are not utilized to perform the test. In future work, it would be interesting to investigate how to reuse these discarded observations to further enhance the power; a key challenge is whether the type I error can still be controlled with guarantees when the first $m$ observations are reused. Another interesting question is how to construct a projection test when the sparsity assumption does not hold. When the optimal projection direction is not sparse, the ridge-type estimator in Huang (2015) seems to be a good candidate, and it is worth investigating its theoretical properties in future work.
Acknowledgments
The authors would like to thank the AE and reviewers for their constructive comments, which lead to a significant improvement of this work.
Funding
Zhong’s research was supported by National Natural Science Foundation of China (NNSFC) grants 11922117, 12231011, 7198801, and a National Statistical Science Research Grant 2022LD0. Li’s research was supported by National Science Foundation (NSF) DMS-1820702, and NIH grants R01AI136664 and R01AI170249. The content is solely the responsibility of the authors and does not necessarily represent the official views of NNSFC, NSF or NIH.
Footnotes
Supplementary Materials
Supplemental materials contain some useful lemmas, technical proofs and additional numerical results.
Disclosure Statement
The authors report there are no competing interests to declare.
References
- Bai Z and Saranadasa H (1996), ‘Effect of high dimension: By an example of a two sample problem’, Statistica Sinica 6, 311–329.
- Boyd S, Parikh N, Chu E, Peleato B and Eckstein J (2011), ‘Distributed optimization and statistical learning via the alternating direction method of multipliers’, Foundations and Trends® in Machine Learning 3(1), 1–122.
- Cai T, Liu W and Xia Y (2014), ‘Two-sample test of high dimensional means under dependence’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(2), 349–372.
- Chang J, Zheng C, Zhou W-X and Zhou W (2017), ‘Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity’, Biometrics 73(4), 1300–1310.
- Chen SX and Qin Y-L (2010), ‘A two-sample test for high-dimensional data with applications to gene-set testing’, The Annals of Statistics 38(2), 808–835.
- Chen X, Xu M and Wu WB (2016), ‘Regularized estimation of linear functionals of precision matrices for high-dimensional time series’, IEEE Transactions on Signal Processing 64(24), 6459–6470.
- Fan J and Li R (2001), ‘Variable selection via nonconcave penalized likelihood and its oracle properties’, Journal of the American Statistical Association 96(456), 1348–1360.
- Fan J, Li R, Zhang C and Zou H (2020), Statistical Foundations of Data Science, Chapman and Hall/CRC, Boca Raton, FL.
- Fan J, Liao Y and Liu H (2016), ‘An overview of the estimation of large covariance and precision matrices’, The Econometrics Journal 19(1), C1–C32.
- Fan J, Xue L and Zou H (2014), ‘Strong oracle optimality of folded concave penalized estimation’, The Annals of Statistics 42(3), 819–849.
- Fang EX, He B, Liu H and Yuan X (2015), ‘Generalized alternating direction method of multipliers: new theoretical insights and applications’, Mathematical Programming Computation 7(2), 149–187.
- Hotelling H (1931), ‘The generalization of Student’s ratio’, The Annals of Mathematical Statistics 2(3), 360–378.
- Huang Y (2015), ‘Projection test for high-dimensional mean vectors with optimal direction’, Department of Statistics, The Pennsylvania State University at University Park.
- Huang Y, Li C, Li R and Yang S (2022), ‘An overview of tests on high-dimensional means’, Journal of Multivariate Analysis 188, 104813.
- Lauter J (1996), ‘Exact t and F tests for analyzing studies with multiple endpoints’, Biometrics 52(3), 964–970.
- Li C and Li R (2021), ‘Linear hypothesis testing in linear models with high dimensional responses’, Journal of the American Statistical Association, pp. 1–37.
- Liu H, Yao T and Li R (2016), ‘Global solutions to folded concave penalized nonconvex learning’, The Annals of Statistics 44(2), 629–659.
- Loh P-L and Wainwright MJ (2015), ‘Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima’, Journal of Machine Learning Research 16(1), 559–616.
- Lopes M, Jacob L and Wainwright MJ (2011), A more powerful two-sample test in high dimensions using random projection, in ‘Advances in Neural Information Processing Systems’, Vol. 24, pp. 1206–1214.
- Percival CJ, Huang Y, Jabs EW, Li R and Richtsmeier JT (2014), ‘Embryonic craniofacial bone volume and bone mineral density in fgfr2+/p253r and nonmutant mice’, Developmental Dynamics 243(4), 541–551.
- Srivastava MS and Du M (2008), ‘A test for the mean vector with fewer observations than the dimension’, Journal of Multivariate Analysis 99(3), 386–402.
- Wang L, Kim Y and Li R (2013), ‘Calibrating non-convex penalized regression in ultra-high dimension’, The Annals of Statistics 41(5), 2505–2536.
- Wellcome Trust Case Control Consortium (2007), ‘Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls’, Nature 447(7145), 661–678.
- Xu G, Lin L, Wei P and Pan W (2016), ‘An adaptive two-sample test for high-dimensional means’, Biometrika 103(3), 609–624.
- Zou H and Li R (2008), ‘One-step sparse estimates in nonconcave penalized likelihood models’, The Annals of Statistics 36(4), 1509–1533.