ASYMPTOTICALLY INDEPENDENT U-STATISTICS IN HIGH-DIMENSIONAL TESTING

Yinqiu He; Gongjun Xu; Chong Wu; Wei Pan

doi:10.1214/20-aos1951

Author manuscript; available in PMC: 2021 Dec 1.

Published in final edited form as: Ann Stat. 2021 Jan 29;49(1):154–181. doi: 10.1214/20-aos1951

PMCID: PMC8634550 NIHMSID: NIHMS1737820 PMID: 34857975

Abstract

Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, such as testing of mean vectors, covariance matrices and regression coefficients. This paper constructs a family of U-statistics as unbiased estimators of the ℓ_p-norms of those features. We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed. Moreover, they are also asymptotically independent with the maximum-type test statistic, whose limiting distribution is an extreme value distribution. Based on the asymptotic independence property, we propose an adaptive testing procedure which combines p-values computed from the U-statistics of different orders. We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives.

Keywords: High-dimensional hypothesis test, U-statistics, adaptive testing

MSC2020 subject classifications: 62F03, 62F05

1. Introduction.

Motivation.

Analysis of high-dimensional data, whose dimension p could be much larger than the sample size n, has emerged as an important and active research area (e.g., [19, 21, 23, 63]). In many large-scale inference problems, one is often interested in globally testing some overall patterns of low-dimensional features of the high-dimensional random observations. One example is genome-wide association studies (GWAS), whose primary goal is to identify single nucleotide polymorphisms (SNPs) associated with certain complex diseases of interest. A popular approach in GWAS is to perform univariate tests, which examine each SNP one by one. This, however, may lead to low statistical power due to the weak effect size of each SNP [47] and the small statistical significance threshold (~ 10⁻⁸) chosen to control the multiple-comparison type I error [40]. Researchers therefore have proposed to globally test a genetic marker set with many SNPs [40, 64] in order to achieve higher statistical power and to better understand the underlying genetic mechanisms.

In this paper, we focus on a family of global testing problems in the high-dimensional setting, including testing of mean vectors, covariance matrices and regression coefficients in generalized linear models. These problems can be formulated as $H_{0} : E = 0$, where 0 is an all-zero vector, $E = \{e_{l} : l \in L\}$ is a parameter vector with $L$ being the index set, and the e_l’s are the corresponding parameters of interest, for example, elements in mean vectors, covariance matrices or coefficients in generalized linear models. For the global testing problem $H_{0} : E = 0$ versus $H_{A} : E \neq 0$, two different types of methods are often used in the literature. One is sum-of-squares-type statistics. They are usually powerful against “dense” alternatives, where $E$ has a high proportion of nonzero elements with a large $‖ E ‖_{2}^{2} = \sum_{l \in L} e_{l}^{2}$ or its weighted variants. See examples in mean testing (e.g., [4, 11, 12, 25, 26, 60, 62]) and covariance testing (e.g., [3, 13, 42, 45]). The other is maximum-type statistics. They are usually powerful against “sparse” alternatives, where $E$ has few nonzero elements with a large $‖ E ‖_{\infty}$ (e.g., [6, 8, 9, 27, 36, 46, 58]). More recently, [20, 70] also proposed to combine these two kinds of test statistics. However, for denser or only moderately dense alternatives, neither of these two types of statistics may be powerful, as will be further illustrated in this paper both theoretically and numerically. Importantly, in real applications, the underlying truth is usually unknown and could be sparse, dense or in-between. As global testing could be highly underpowered if an inappropriate testing method is used (e.g., [15]), it is desirable in practice to have a testing procedure with high statistical power against a variety of alternatives.

A family of asymptotically independent U-statistics.

To address these issues, we propose a U-statistics framework and introduce its applications to adaptive high-dimensional testing. The U-statistics framework constructs unbiased and asymptotically independent estimators of $‖ E ‖_{a}^{a} ≔ \sum_{l \in L} e_{l}^{a}$ for different (positive) integers a, where a = 2 corresponds to a sum-of-squares-type statistic, and an even integer a → ∞ yields a maximum-type statistic. The adaptive testing then combines the information from different $‖ E ‖_{a}^{a}$ ’s, and our power analysis shows that it is powerful against a wide range of alternatives, from highly sparse, moderately sparse to dense, to highly dense.

To illustrate our idea, suppose z₁, … , z_n are n independent and identically distributed (i.i.d.) copies of a random vector z. We consider the setting where each parameter e_l has an unbiased kernel function estimator K_l(z_i₁, … , z_{i_{γ_l}}), and γ_l is the smallest integer such that for any 1 ≤ i₁ ≠ ⋯ ≠ i_{γ_l} ≤ n, E[K_l(z_i₁, … , z_{i_{γ_l}})] = e_l. This includes many testing problems on moments of low orders, such as entries in mean vectors, covariance matrices and score vectors of generalized linear models, which shall be discussed in detail. The family of U-statistics can be constructed generally as follows. For integers a ≥ 1 and 1 ≤ i₁ ≠ ⋯ ≠ i_{γ_l} ≠ ⋯ ≠ i_{(a−1)×γ_l+1} ⋯ ≠ i_{a×γ_l} ≤ n, since the z’s are i.i.d., we have $E [K_{l} (z_{i_{1}}, \dots, z_{i_{γ_{l}}}) \dots K_{l} (z_{i_{(a - 1) \times γ_{l} + 1}}, \dots, z_{i_{a \times γ_{l}}})] = e_{l}^{a}$ . Therefore, we can construct an unbiased estimator of the parameters of augmented powers $e_{l}^{a}$ with different a. Then $‖ E ‖_{a}^{a}$ has an unbiased estimator

U (a) = \sum_{l \in L} (P_{a \times γ_{l}}^{n})^{- 1} \sum_{1 \leq i_{1} \neq \dots \neq i_{a \times γ_{l}} \leq n} \prod_{k = 1}^{a} K_{l} (z_{i_{(k - 1) \times γ_{l} + 1}}, \dots, z_{i_{k \times γ_{l}}}),

(1.1)

where $P_{k}^{n} = n! ∕ (n - k)!$ denotes the number of k-permutations of n. We call a the order of the U-statistic $U (a)$ . If a > b, we say $U (a)$ is of higher order than $U (b)$ and vice versa.

This construction procedure can be applied to many testing problems. We give three common examples below for illustration and more detailed case studies will be discussed in Sections 2 and 4.

Example 1. Consider one-sample mean testing of H₀ : μ = 0, where $E = μ$ is the mean vector of a p-dimensional random vector x. Suppose x₁, … , x_n are n i.i.d. copies of x. Since x_i,j is a simple unbiased estimator of μ_j for each i = 1, … , n and j = 1, … , p, we can take the kernel function K_j(x_i) = x_i,j. Following (1.1), we know the U-statistic

U (a) = (P_{a}^{n})^{- 1} \sum_{j = 1}^{p} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} x_{i_{k}, j}

is an unbiased estimator of $‖ E ‖_{a}^{a} = ‖ μ ‖_{a}^{a} = \sum_{j = 1}^{p} μ_{j}^{a}$ . Please see Section 4.1 for the two-sample mean testing example and related theoretical properties.
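As a minimal sketch of Example 1 (not the authors' implementation), the Python below evaluates $U (a)$ for one-sample mean testing in two ways. The first function enumerates all ordered tuples of distinct indices, mirroring the displayed formula. The second uses the standard identity that the sum over distinct indices of a product equals a! times the elementary symmetric polynomial e_a of the column, so that the normalized sum is e_a divided by the binomial coefficient. The function names and the toy data are assumptions made for illustration.

```python
from itertools import permutations
from math import comb, perm, prod


def u_mean_bruteforce(x, a):
    """U(a) for H0: mu = 0, summing over ordered tuples of distinct indices."""
    n, p = len(x), len(x[0])
    total = 0.0
    for j in range(p):
        for idx in permutations(range(n), a):
            total += prod(x[i][j] for i in idx)
    return total / perm(n, a)  # P_a^n = n! / (n - a)!


def elementary_symmetric(values, a):
    """e_a(values) via the usual O(n * a) dynamic programme."""
    e = [1.0] + [0.0] * a
    for v in values:
        for k in range(a, 0, -1):  # descending k so each value enters once
            e[k] += v * e[k - 1]
    return e[a]


def u_mean_fast(x, a):
    """Same statistic: (a! * e_a) / P_a^n = e_a / C(n, a), summed over columns."""
    n, p = len(x), len(x[0])
    columns = ([row[j] for row in x] for j in range(p))
    return sum(elementary_symmetric(col, a) for col in columns) / comb(n, a)


# Hypothetical toy data: n = 4 observations, p = 2 coordinates.
x = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0], [3.0, 1.5]]
```

The brute-force version costs on the order of p·n^a operations and is only useful as a check; the dynamic-programme version costs on the order of p·n·a.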

Example 2. Suppose x₁, … , x_n are n i.i.d. copies of a random vector x with mean vector μ = 0 and covariance matrix Σ = {σ_j₁,j₂}_p×p. For covariance testing H₀ : σ_j₁,j₂ = 0 for any 1 ≤ j₁ ≠ j₂ ≤ p, we have $E = \{σ_{l} : l \in L\}$ with $L = \{(j_{1}, j_{2}) : 1 \leq j_{1} \neq j_{2} \leq p\}$. Since x_i,j₁x_i,j₂ is a simple unbiased estimator of σ_j₁,j₂, for each pair $l = (j_{1}, j_{2}) \in L$, we can take the kernel function K_l(x_i) = x_i,j₁x_i,j₂. Following (1.1), the U-statistic

U (a) = (P_{a}^{n})^{- 1} \sum_{1 \leq j_{1} \neq j_{2} \leq p} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} (x_{i_{k}, j_{1}} x_{i_{k}, j_{2}})

is an unbiased estimator of $‖ E ‖_{a}^{a} = \sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a}$ . Please see Section 2 for one-sample covariance testing with unknown μ, and Section 4.2 for two-sample covariance testing.

Example 3. Consider a response variable y and its covariates $x \in R^{p}$ following a generalized linear model: E(y∣x) = g⁻¹(x^⊤β), where g is the canonical link function and $β \in R^{p}$ are the regression coefficients. Suppose that (x_i, y_i), i = 1, … , n, are i.i.d. copies of (x, y). For testing H₀ : β = β₀, the score vectors (S_i,j = (y_i − μ_0,i)x_i,j : j = 1, … , p)^⊤ are often used in the literature, where $μ_{0, i} = g^{- 1} (x_{i}^{⊺} β_{0})$. Note that E(S_i,j) = 0 under H₀. Thus, to test H₀, we can take $E = \{E (S_{i, j}) : j = 1, \dots, p\}$ and use the U-statistic

U (a) = (P_{a}^{n})^{- 1} \sum_{j = 1}^{p} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} S_{i_{k}, j},

which is an unbiased estimator of $‖ E ‖_{a}^{a} = \sum_{j = 1}^{p} \{E (S_{i, j})\}^{a}$. Please see Section 4.3.
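To make Example 3 concrete, here is a small sketch for one specific model, logistic regression, where the canonical link gives g⁻¹(t) = 1/(1 + e^{−t}). The choice of logistic regression, the function names and the toy data are assumptions for illustration only; the example in the text covers general canonical-link models.

```python
from math import comb, exp


def logistic_scores(x, y, beta0):
    """Score entries S_ij = (y_i - mu_0i) * x_ij, with mu_0i = sigmoid(x_i' beta0)."""
    scores = []
    for xi, yi in zip(x, y):
        eta = sum(b * v for b, v in zip(beta0, xi))
        mu0 = 1.0 / (1.0 + exp(-eta))
        scores.append([(yi - mu0) * v for v in xi])
    return scores


def elementary_symmetric(values, a):
    """e_a(values): sum of products over all size-a subsets."""
    e = [1.0] + [0.0] * a
    for v in values:
        for k in range(a, 0, -1):
            e[k] += v * e[k - 1]
    return e[a]


def u_score(scores, a):
    """U(a) built from the score entries; the distinct-index sum equals a! * e_a."""
    n, p = len(scores), len(scores[0])
    columns = ([row[j] for row in scores] for j in range(p))
    return sum(elementary_symmetric(col, a) for col in columns) / comb(n, a)


# Hypothetical toy data: an intercept column plus one covariate.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
scores = logistic_scores(x, y, beta0=[0.0, 0.0])
```

With β₀ = 0 every μ_0,i equals 1/2, so the scores reduce to (y_i − 1/2)x_i,j, which makes the small example easy to verify by hand.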

Related literature.

For high-dimensional testing, some other adaptive testing procedures have recently been proposed in [52, 65, 67]. These works combine the p-values of a family of sum-of-powered statistics that are powerful against different $‖ E ‖_{a}^{a}$ ’s. However, in these existing works, to evaluate the p-value of the adaptive test statistic, the joint asymptotic distribution of the statistics is difficult to obtain or calculate. Accordingly, computationally expensive resampling methods are often used in practice [40, 52, 69]. For some special cases such as testing means and the coefficients of generalized linear models, [67] and [65] derived the limiting distributions of the test statistics under the framework of a family of von Mises V-statistics. However, the constructed V-statistics are usually correlated and biased estimators of the target $‖ E ‖_{a}^{a}$ . It follows that in [67] and [65], numerical approximations are still needed to calculate the tail probabilities of the adaptive test statistics; see Remark 4.1 and Section 4.3. In addition, these existing adaptive testing works mainly focus on the first-order moments, and their results do not directly apply to testing second-order moments, such as covariance matrices.

To overcome these issues, this paper considers the proposed family of unbiased U-statistics. There are some other recent works providing important results on high-dimensional U-statistics (e.g., [14, 43, 72]). For instance, [72] considered testing the regression coefficients in linear models using the fourth-order U-statistic; [43] studied the limiting distributions of rank-based U-statistics; and [14] studied bootstrap approximation of the second-order U-statistics. However, these results do not directly apply to the high-order U-statistics considered in this paper.

Our contributions.

We establish the theoretical properties of the U-statistics in various high dimensional testing problems, including testing mean vectors, regression coefficients of generalized linear models, and covariance matrices. Our contributions are summarized as follows.

Under the null hypothesis, we show that the normalized U-statistics of different finite orders are jointly normally distributed. The result applies generally for any asymptotic regime with n → ∞ and p → ∞. In addition, we prove that all the finite-order U-statistics are asymptotically independent with each other under the null hypothesis. Moreover, we prove that U-statistics of finite orders are also asymptotically independent of the maximum-type test statistic with a limiting extreme value distribution.

Under the alternative hypothesis, we further analyze the asymptotic power for U-statistics of different orders. We show that when $E$ has denser nonzero entries, $U (a)$ ’s of lower orders tend to be more powerful; and when $E$ has sparser nonzero entries, $U (a)$ ’s of higher orders tend to be more powerful. More interestingly, we show that in the boundary case of “moderate” sparsity levels, $U (a)$ with a finite a > 2 gives the highest power among the family of U-statistics, clearly indicating the inadequacy of both the sum-of-squares- and the maximum-type statistics.

An important application of the independence property among $U (a)$ ’s is to construct adaptive testing procedures by combining the information of different $U (a)$ ’s, whose univariate distributions or p-values can be easily combined to form a joint distribution to calculate the p-value of an adaptive test statistic. Compared with other existing works (e.g., [65, 67]), numerical approximations of tail probabilities are no longer needed. As shown in the power analysis, an adaptive integration of information across different tests leads to a powerful testing procedure.

The rest of the paper is organized as follows. In Sections 2 and 3, we illustrate the framework by a covariance testing problem. Particularly, in Section 2.1, we study the U-statistics under the null hypothesis; in Section 2.2, we analyze the power of the U-statistics; in Section 2.3, we develop an adaptive testing procedure. In Sections 3.1 and 3.2, we report simulations and a real dataset analysis. In Section 4, we study other high-dimensional testing problems, including testing means, regression coefficients, and two-sample covariances. In Section 5, we discuss several extensions of the proposed framework. We give proofs and other simulations in Supplementary Material [28].

2. Motivating example: One-sample covariance testing.

The constructed family of U-statistics and adaptive testing procedure can be applied to various high-dimensional testing problems. In this section, we illustrate the framework with a motivating example of one-sample covariance testing. Analogous results for other high-dimensional testing problems in Section 4 can be obtained following similar analyses. We showcase the study of one-sample covariance testing problem since this is more challenging than mean testing due to the two-way dependency structure and the one-sample problem can be used as the building block for more general cases.

Specifically, we focus on testing

H_{0} : σ_{j_{1}, j_{2}} = 0 \forall 1 \leq j_{1} \neq j_{2} \leq p,

(2.1)

where Σ = {σ_j₁,j₂ : 1 ≤ j₁, j₂ ≤ p} is the covariance matrix of a p-dimensional real-valued random vector x = (x₁, … , x_p)^⊤ with E(x) = μ = (μ₁, … , μ_p)^⊤. The observed data include n i.i.d. copies of x, denoted by x₁, … , x_n with x_i = (x_i,1, … , x_i,p)^⊤. In factor analysis, testing H₀ in (2.1) can be used to examine whether Σ has any significant factor or not [1].

Global testing of covariance structure plays an important role in many statistical analyses and applications; see a review in [7]. Conventional tests include the likelihood ratio test, John’s test and Nagao’s test, etc. [1, 50]. These methods, however, often fail in the high-dimensional setting when both n, p → ∞. To address this issue, new procedures have been recently proposed (e.g., [3, 8, 13, 36-38, 41, 42, 45, 46, 53, 57-59]). However, these methods might suffer from loss of power when the sparsity level of the alternative covariance matrix varies. In the following subsections, we introduce the general U-statistics framework, study their asymptotic properties and develop a powerful adaptive testing procedure.

We introduce some notation. For two sequences of numbers u_n,p and v_n,p that depend on n and p: $u_{n, p} = o (v_{n, p})$ denotes $\limsup_{n, p \to \infty} ∣ u_{n, p} ∕ v_{n, p} ∣ = 0$; $u_{n, p} = O (v_{n, p})$ denotes $\limsup_{n, p \to \infty} ∣ u_{n, p} ∕ v_{n, p} ∣ < \infty$; $u_{n, p} = Θ (v_{n, p})$ denotes $0 < \liminf_{n, p \to \infty} ∣ u_{n, p} ∕ v_{n, p} ∣ \leq \limsup_{n, p \to \infty} ∣ u_{n, p} ∕ v_{n, p} ∣ < \infty$; and $u_{n, p} ≃ v_{n, p}$ denotes $\lim_{n, p \to \infty} u_{n, p} ∕ v_{n, p} = 1$. Moreover, $\overset{P}{\to}$ and $\overset{D}{\to}$ represent convergence in probability and in distribution, respectively. For a p-dimensional random vector x with mean μ and any j₁, … , j_t ∈ {1, … , p}, we write the central moment as

Π_{j_{1}, \dots, j_{t}} = E [(x_{j_{1}} - μ_{j_{1}}) \dots (x_{j_{t}} - μ_{j_{t}})] .

(2.2)

2.1. Asymptotically independent U-statistics.

For testing (2.1), the set of parameters that we are interested in is $E = \{σ_{j_{1}, j_{2}} : 1 \leq j_{1} \neq j_{2} \leq p\}$. Following the previous analysis of (1.1), since σ_j₁,j₂ has a simple unbiased estimator x_i₁,j₁x_i₁,j₂ − x_i₁,j₁x_i₂,j₂ with 1 ≤ i₁ ≠ i₂ ≤ n, for integers a ≥ 1, an unbiased U-statistic of $‖ E ‖_{a}^{a} = \sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a}$ is

U (a) = (P_{2 a}^{n})^{- 1} \sum_{1 \leq j_{1} \neq j_{2} \leq p} \sum_{1 \leq i_{1} \neq \dots \neq i_{2 a} \leq n} \prod_{k = 1}^{a} (x_{i_{2 k - 1}, j_{1}} x_{i_{2 k - 1}, j_{2}} - x_{i_{2 k - 1}, j_{1}} x_{i_{2 k}, j_{2}}) .

This is equivalent to

U (a) = \sum_{1 \leq j_{1} \neq j_{2} \leq p} \sum_{c = 0}^{a} (- 1)^{c} \binom{a}{c} \frac{1}{P_{a + c}^{n}} \sum_{1 \leq i_{1} \neq \dots \neq i_{a + c} \leq n} \prod_{k = 1}^{a - c} (x_{i_{k}, j_{1}} x_{i_{k}, j_{2}}) \prod_{s = a - c + 1}^{a} x_{i_{s}, j_{1}} \prod_{t = a + 1}^{a + c} x_{i_{t}, j_{2}} .

(2.3)

Remark 2.1. The U-statistics can be constructed by another method equivalently. Given 1 ≤ j₁ ≠ j₂ ≤ p, define φ_j₁,j₂ = σ_j₁,j₂ + μ_j₁μ_j₂. Then

\sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a} = \sum_{1 \leq j_{1} \neq j_{2} \leq p} \sum_{c = 0}^{a} \binom{a}{c} φ_{j_{1}, j_{2}}^{a - c} \times (- μ_{j_{1}} μ_{j_{2}})^{c},

(2.4)

which is a polynomial function of the moments μ_j and φ_j₁,j₂. Since μ_j and φ_j₁,j₂ have unbiased estimators x_i,j and x_i,j₁x_i,j₂ respectively, then for 1 ≤ i₁ ≠ ⋯ ≠ i_a+c ≤ n, $E (\prod_{k = 1}^{a - c} x_{i_{k}, j_{1}} x_{i_{k}, j_{2}} \prod_{s = a - c + 1}^{a} x_{i_{s}, j_{1}} \prod_{t = a + 1}^{a + c} x_{i_{t}, j_{2}}) = φ_{j_{1}, j_{2}}^{a - c} μ_{j_{1}}^{c} μ_{j_{2}}^{c}$ . Given this and (2.4), the U-statistics (2.3) can be obtained.

Remark 2.2. The summed term with c = 0 in (2.3) is

\tilde{U} (a) ≔ (P_{a}^{n})^{- 1} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \sum_{1 \leq j_{1} \neq j_{2} \leq p} \prod_{k = 1}^{a} (x_{i_{k}, j_{1}} x_{i_{k}, j_{2}}),

(2.5)

which has the same form as the simplified U-statistic for mean-zero observations in Example 2, and is shown to be the leading term of (2.3) in the proof.

We next introduce some nice properties of the U-statistics (2.3). The first one is the following location invariant property.

Proposition 2.1. $U (a)$ constructed as in (2.3) is location invariant; that is, for any vector $Δ \in R^{p}$ , the U-statistic constructed based on the transformed data {x_i + Δ : i = 1, … , n} is still $U (a)$ .
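The following sketch checks Proposition 2.1 numerically on a tiny dataset. It evaluates $U (a)$ by direct enumeration of the product-kernel form that precedes (2.3), which is feasible only for very small n and p, and compares the value before and after shifting every observation by a fixed vector. The function names, the toy data and the shift vector are assumptions made for illustration. For a = 1 the same routine returns the sum of the off-diagonal entries of the usual unbiased sample covariance matrix, which gives a second sanity check.

```python
from itertools import permutations
from math import perm


def u_cov_bruteforce(x, a):
    """U(a) for covariance testing with unknown mean, by direct enumeration."""
    n, p = len(x), len(x[0])
    total = 0.0
    for j1 in range(p):
        for j2 in range(p):
            if j1 == j2:
                continue
            # Ordered tuples (i_1, ..., i_{2a}) of distinct sample indices.
            for idx in permutations(range(n), 2 * a):
                term = 1.0
                for k in range(a):
                    i_odd, i_even = idx[2 * k], idx[2 * k + 1]
                    term *= (x[i_odd][j1] * x[i_odd][j2]
                             - x[i_odd][j1] * x[i_even][j2])
                total += term
    return total / perm(n, 2 * a)  # P_{2a}^n


def shift(x, delta):
    """Add the vector delta to every observation."""
    return [[v + d for v, d in zip(row, delta)] for row in x]


# Hypothetical toy data: n = 5 observations, p = 3 coordinates.
x = [[0.2, 1.0, -0.5],
     [1.5, -0.3, 0.8],
     [-1.0, 0.4, 0.1],
     [0.7, 2.0, -1.2],
     [0.0, -0.6, 0.9]]
delta = [10.0, -3.0, 0.5]
```

Comparing `u_cov_bruteforce(x, 2)` with `u_cov_bruteforce(shift(x, delta), 2)` should give agreement up to floating-point rounding.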

The following proposition verifies that the constructed U-statistics are unbiased estimators of $‖ E ‖_{a}^{a} = \sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a}$ .

Proposition 2.2. For any integer a, $E [U (a)] = \sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a}$ . Under H₀ in (2.1), $E [U (a)] = 0$ .

We next study the limiting properties of the constructed U-statistics under H₀ given the following assumptions on the random vector x = (x₁, … , x_p)^⊤.

Condition 2.1 (Moment assumption). ${lim}_{p \to \infty} \max_{1 \leq j \leq p} E (x_{j} - μ_{j})^{8} < \infty$ and ${lim}_{p \to \infty} {min}_{1 \leq j \leq p} E (x_{j} - μ_{j})^{2} > 0$ .

Condition 2.2 (Dependence assumption). For a sequence of random variables z = {z_j : j ≥ 1} and integers a < b, let $Z_{a}^{b}$ be the σ-algebra generated by {z_j : j ∈ {a, … , b}}. For each s ≥ 1, define the α-mixing coefficient $α_{z} (s) = \sup_{t \geq 1} \{∣ P (A \cap B) - P (A) P (B) ∣ : A \in Z_{1}^{t}, B \in Z_{t + s}^{\infty}\}$. We assume that under H₀, x is α-mixing with α_x(s) ≤ Mδ^s, where δ ∈ (0, 1) and M > 0 are some constants.

Condition 2.2* (Alternative dependence assumption to Condition 2.2). Following the notation in (2.2), we assume that under H₀, for any j₁, j₂, j₃ ∈ {1, … , p}, Π_{j₁,j₂,j₃} = 0; for any j₁, j₂, j₃, j₄ ∈ {1, … , p}, Π_{j₁,j₂,j₃,j₄} = κ₁(σ_{j₁,j₂}σ_{j₃,j₄} + σ_{j₁,j₃}σ_{j₂,j₄} + σ_{j₁,j₄}σ_{j₂,j₃}) for some constant κ₁ < ∞; and for t = 6, 8, and any j₁, … , j_t ∈ {1, … , p}, Π_{j₁,…,j_t} = 0 when at least one of these indexes appears an odd number of times in {j₁, … , j_t}.

Condition 2.1 assumes that the eighth marginal moments of x are uniformly bounded from above and the second moments are uniformly bounded from below, which are true for most light-tailed distributions. Condition 2.2 assumes weak dependence among different x_j’s under H₀, since the uncorrelatedness of x_j’s under H₀ may not imply the independence of them, especially when x_j’s are non-Gaussian. Under H₀, Condition 2.2 automatically holds when x is Gaussian or m-dependent. The mixing-type weak dependence is similarly considered in previous works such as [5, 11, 67] and also commonly assumed in time series and spatial statistics [24, 55]. Moreover, the variables in our motivating genome-wide association studies have a local dependence structure, with their associations often decreasing to zero as the corresponding physical distances on a chromosome increase. We note that it suffices to have Condition 2.2 hold up to a permutation of the variables.

Alternatively, we can substitute Condition 2.2 with Condition 2.2*. Condition 2.2* specifies some higher-order moments of x and is satisfied when x follows an elliptical distribution with finite eighth moments and covariance Σ (see [1, 22, 50, 51]). Conditions 2.2* and 2.2 become equivalent when x follows a multivariate Gaussian distribution. The fourth moment condition is also assumed in other high-dimensional research [6]. In this work, the eighth moment condition is needed to establish the asymptotic joint distribution of different U-statistics.

The following theorem specifies the asymptotic variances of the finite order U-statistics and their joint limiting distribution. Since the U-statistics are degenerate under H₀, an analysis different from the asymptotic theory on nondegenerate U-statistics (e.g., [32]) is needed in the proof.

Theorem 2.1. Under H₀ in (2.1) and Conditions 2.1 and 2.2 (or 2.2*), for $U (a)$ ’s defined in (2.3) and any distinct finite (and positive) integers {a₁, … , a_m}, as n, p → ∞,

{[\frac{U (a_{1})}{σ (a_{1})}, \dots, \frac{U (a_{m})}{σ (a_{m})}]}^{⊺} \overset{D}{\to} N (0, I_{m}),

(2.6)

where

σ^{2} (a) ≔ var [U (a)] ≃ \frac{a!}{P_{a}^{n}} \sum_{1 \leq j_{1} \neq j_{2} \leq p; 1 \leq j_{3} \neq j_{4} \leq p} (Π_{j_{1}, j_{2}, j_{3}, j_{4}})^{a},

(2.7)

with Π_{j₁,j₂,j₃,j₄} defined in (2.2). Note that $σ^{2} (a) = Θ (p^{2} n^{- a})$.

Theorem 2.1 shows that after normalization, the finite-order U-statistics have a joint normal limiting distribution with an identity covariance matrix, which implies that they are asymptotically independent as n, p → ∞. The nice independence property makes it easy to combine these U-statistics and apply our proposed adaptive testing later. Moreover, the conclusion holds on general asymptotic regime for n, p → ∞, without any constraint on the relationship between n and p. We will also see in Section 4 that similar results hold generally for some other testing problems.
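To illustrate why asymptotic independence makes combination easy, the sketch below applies a generic minimum-p rule: if m p-values are independent and uniform under the null, then the minimum p-value p_min satisfies P(p_min ≤ t) = 1 − (1 − t)^m. This rule and the two-sided normal p-values are assumptions chosen for illustration, not a statement of the adaptive procedure that the paper develops in Section 2.3, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_pvalue(z):
    """Two-sided p-value for a statistic that is approximately N(0, 1) under H0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def min_p_combination(pvalues):
    """Combined p-value for the minimum of m independent uniform p-values."""
    m = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** m


# Hypothetical standardized statistics U(a) / sigma(a) for three orders a.
z_values = [0.4, -2.6, 1.1]
combined = min_p_combination([two_sided_pvalue(z) for z in z_values])
```

Because the rule only needs the marginal p-values, no resampling or numerical integration of a joint distribution is involved in this sketch.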

Remark 2.3. Theorem 2.1 discusses the U-statistics of finite orders, that is, the a values do not grow with n, p. When {x₁, … , x_p} are independent, Theorem 2.1 can be extended when a = O(1) min{log^ϵ n, log^ϵ p} for some ϵ > 0. On the other hand, we will show in Section 2.2 that it is usually enough to include $U (a)$ ’s of finite a. Therefore, we do not pursue the general case when a grows with n, p in this work.

In the following, we further discuss the maximum-type test statistic $U (\infty)$ , which corresponds to the ℓ_∞-norm of the parameter vector $E = {e_{l} : l \in L}$ , that is, $‖ E ‖_{\infty} = \max_{l \in L} ∣ e_{l} ∣$ . In the existing literature, there is already some corresponding established work [8, 36] on the test statistic:

M_{n}^{*} ≔ \max_{1 \leq j_{1} \neq j_{2} \leq p} ∣ {\hat{σ}}_{j_{1}, j_{2}} ∕ \sqrt{{\hat{σ}}_{j_{1}, j_{1}} {\hat{σ}}_{j_{2}, j_{2}}} ∣,

(2.8)

where $({\hat{σ}}_{j_{1}, j_{2}})_{p \times p} = \sum_{i = 1}^{n} (x_{i} - \bar{x}) (x_{i} - \bar{x})^{⊺} ∕ n$ and $\bar{x} = \sum_{i = 1}^{n} x_{i} ∕ n$ . We will take $U (\infty) = M_{n}^{*}$ below. The limiting distribution of $U (\infty)$ was first studied in [36] and extended by [8, 46, 58]. Next, we restate the result in [8], which gives the limiting distribution of (2.8) under the following condition.

Condition 2.3. Consider the random vector x = (x₁, … , x_p)^⊤ with mean vector μ = (μ₁, … , μ_p)^⊤ and covariance matrix Σ = diag(σ_1,1, … , σ_p,p). The standardized variables $(x_{j} - μ_{j}) ∕ \sqrt{σ_{j, j}}$ are i.i.d. for j = 1, … , p. Furthermore, $E e^{t_{0} (∣ x_{1} - μ_{1} ∣ ∕ \sqrt{σ_{1, 1}})^{ς}} < \infty$ for some 0 < ς ≤ 2 and t₀ > 0.

Theorem 2.2 (Cai and Jiang [8], Theorem 2). Assume Condition 2.3 and log p = o(n^β), where β = ς/(4 + ς). Then $P (n \times U (\infty)^{2} + ϖ_{p} \leq u) \to G (u) = e^{- (1 ∕ \sqrt{8 π}) e^{- u ∕ 2}}$ , where $ϖ_{p} = - 4 \log p + \log \log p$ and G(u) is an extreme value distribution of type I.
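As a rough sketch of how Theorem 2.2 could be used in practice, the code below computes the maximum-type statistic in (2.8) and converts it to an asymptotic p-value through the limiting distribution G. The function names and the toy data are assumptions made for illustration; the approximation is only meaningful when Condition 2.3 and the growth condition on log p are plausible, and the sketch assumes p ≥ 2 and no constant columns.

```python
from math import exp, log, pi, sqrt


def max_type_statistic(x):
    """M_n^* in (2.8): largest absolute off-diagonal sample correlation."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in x]
    # Sample covariance with the 1/n normalization used in the text.
    s = [[sum(r[j1] * r[j2] for r in centered) / n for j2 in range(p)]
         for j1 in range(p)]
    return max(abs(s[j1][j2]) / sqrt(s[j1][j1] * s[j2][j2])
               for j1 in range(p) for j2 in range(p) if j1 != j2)


def extreme_value_cdf(u):
    """G(u) = exp(-(1 / sqrt(8 * pi)) * exp(-u / 2)) from Theorem 2.2."""
    return exp(-exp(-u / 2.0) / sqrt(8.0 * pi))


def max_type_pvalue(x):
    """Asymptotic p-value 1 - G(n * M^2 + varpi_p), varpi_p = -4 log p + log log p."""
    n, p = len(x), len(x[0])
    varpi = -4.0 * log(p) + log(log(p))
    return 1.0 - extreme_value_cdf(n * max_type_statistic(x) ** 2 + varpi)


# Hypothetical toy data: the second column is exactly twice the first.
x = [[1.0, 2.0, 0.3], [2.0, 4.0, -1.0], [3.0, 6.0, 0.5], [4.0, 8.0, 2.0]]
```

With n = 4 and p = 3 the asymptotic approximation is of course not reliable; the toy data only exercise the arithmetic.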

Theorems 2.1 and 2.2 give the limiting distributions of $U (a)$ of finite orders and $U (\infty)$ respectively; it is of interest to examine their joint distribution. The following theorem shows that although $U (\infty)$ has limiting distribution different from $U (a)$ , a < ∞, they are still asymptotically independent.

Theorem 2.3. Assume that Condition 2.1 is satisfied, Condition 2.3 holds for ς = 2, and log p = o(n^{1/7}). For finite integers {a₁, … , a_m}, under H₀, $U (a_{1}), \dots, U (a_{m})$ and $U (\infty)$ are mutually asymptotically independent. Specifically, ∀z₁, … , z_m, $y \in R$, as n, p → ∞,

∣ P (n U (\infty)^{2} + ϖ_{p} \geq y, \frac{U (a_{1})}{σ (a_{1})} \leq z_{1}, \dots, \frac{U (a_{m})}{σ (a_{m})} \leq z_{m}) - P (n U (\infty)^{2} + ϖ_{p} \geq y) \times \prod_{r = 1}^{m} P (\frac{U (a_{r})}{σ (a_{r})} \leq z_{r}) ∣ \to 0 .

Theorem 2.1 suggests that all the finite-order U-statistics are asymptotically independent of each other. Given this, Theorem 2.3 further shows that the maximum-type test statistic $U (\infty)$ is also asymptotically mutually independent of those finite-order U-statistics. The conclusion shares similarity with some classical results on the asymptotic independence between the sum-of-squares-type and maximum-type statistics. Specifically, for random variables w₁, … , w_n, [30, 33] proved the asymptotic independence between $\sum_{i = 1}^{n} w_{i}^{2}$ and $\max_{i = 1, \dots, n} ∣ w_{i} ∣$ for weakly dependent observations. Similar independence properties have been extensively studied in the literature (e.g., [31, 34, 44, 48, 54, 67]). However, there are several differences between the existing literature and the results in this paper. First, we discuss a family of U-statistics $U (a)$’s, which takes different a values, and $U (2)$ here, corresponding to the sum-of-squares-type statistic, is only a special case of general $U (a)$. Furthermore, we have shown not only the asymptotic independence between $U (a)$ and $U (\infty)$, but also the asymptotic independence among $U (a)$’s of finite a values. Second, the constructed $U (a)$’s are unbiased estimators, which are different from the sum-of-squares statistics usually examined in the literature. Moreover, the x’s are allowed to be dependent and the theoretical development in the covariance testing involves a two-way dependence structure, which requires different proof techniques from the existing studies.

Remark 2.4. An alternative way to construct $U (\infty)$ is to standardize ${\hat{σ}}_{j_{1}, j_{2}}$ by its estimated variance $\hat{var} ({\hat{σ}}_{j_{1}, j_{2}})$. Specifically, following Cai et al. [6], we take $\hat{var} ({\hat{σ}}_{j_{1}, j_{2}}) = n^{- 1} \sum_{i = 1}^{n} \{(x_{i, j_{1}} - {\bar{x}}_{j_{1}}) (x_{i, j_{2}} - {\bar{x}}_{j_{2}}) - {\hat{σ}}_{j_{1}, j_{2}}\}^{2}$. Define $M_{n}^{†} = \max_{1 \leq j_{1} \neq j_{2} \leq p} ∣ {\hat{σ}}_{j_{1}, j_{2}} ∣ ∕ \{\hat{var} ({\hat{σ}}_{j_{1}, j_{2}})\}^{1 ∕ 2}$ and take $U (\infty) = M_{n}^{†}$. Theoretically, we prove that Theorem 2.3 still holds with $U (\infty) = M_{n}^{†}$ in Supplementary Material [28], Section B.11. Numerically, we provide simulations in Supplementary Material [28], Section C.2, which show that $M_{n}^{*}$ in (2.8) generally has higher power than $M_{n}^{†}$.

To apply hypothesis testing using the asymptotic results in Theorems 2.1 and 2.3, we need to estimate $var \{U (a)\}$. In particular, we propose the following moment estimator of (2.7):

V_{u} (a) = \frac{2 a!}{(P_{a}^{n})^{2}} \sum_{1 \leq j_{1} \neq j_{2} \leq p} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{t = 1}^{a} (x_{i_{t}, j_{1}} - {\bar{x}}_{j_{1}})^{2} (x_{i_{t}, j_{2}} - {\bar{x}}_{j_{2}})^{2} .

(2.9)

The next result establishes the statistical consistency of $V_{u} (a)$ .

Condition 2.4. For integer a, ${lim}_{p \to \infty} \max_{1 \leq j \leq p} E (x_{j} - μ_{j})^{8 a} < \infty$ .

Theorem 2.4. Under H₀ in (2.1), assume Conditions 2.1, 2.2 and 2.4 hold. Then $V_{u} (a) ∕ var \{U (a)\} \overset{P}{\to} 1$.

Theorem 2.4 implies that the asymptotic results in Theorems 2.1 and 2.3 still hold when $var \{U (a)\}$ is replaced with its estimator $V_{u} (a)$. Specifically, under H₀, $[U (a_{1}) ∕ \sqrt{V_{u} (a_{1})}, \dots, U (a_{m}) ∕ \sqrt{V_{u} (a_{m})}]^{⊺} \overset{D}{\to} N (0, I_{m})$ under Conditions 2.1, 2.2 and 2.4. Moreover, Theorem 2.3 implies that the $\{U (a) ∕ \sqrt{V_{u} (a)}\}$’s are asymptotically independent of $U (\infty)$.
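The variance estimator (2.9) can be evaluated without enumerating index tuples. The sketch below uses the identity that the sum over distinct indices of a product equals a! times an elementary symmetric polynomial, applied to the products of squared centered entries. The function names and the toy data are assumptions made for illustration, and the code is a direct transcription of (2.9) rather than an optimized implementation.

```python
from math import factorial, perm


def elementary_symmetric(values, a):
    """e_a(values): sum of products over all size-a subsets."""
    e = [1.0] + [0.0] * a
    for v in values:
        for k in range(a, 0, -1):
            e[k] += v * e[k - 1]
    return e[a]


def variance_estimator(x, a):
    """V_u(a) in (2.9) for data x given as a list of n rows of length p."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    sq = [[(row[j] - means[j]) ** 2 for j in range(p)] for row in x]
    total = 0.0
    for j1 in range(p):
        for j2 in range(p):
            if j1 != j2:
                w = [r[j1] * r[j2] for r in sq]
                # Sum over ordered distinct (i_1, ..., i_a) equals a! * e_a(w).
                total += factorial(a) * elementary_symmetric(w, a)
    return 2.0 * factorial(a) * total / perm(n, a) ** 2


def standardize(u_value, v_value):
    """U(a) / sqrt(V_u(a)), approximately N(0, 1) under H0 by Theorems 2.1 and 2.4."""
    return u_value / v_value ** 0.5


# Hypothetical toy data: n = 5 observations, p = 3 coordinates.
x = [[0.2, 1.0, -0.5],
     [1.5, -0.3, 0.8],
     [-1.0, 0.4, 0.1],
     [0.7, 2.0, -1.2],
     [0.0, -0.6, 0.9]]
```

The cost is on the order of p²·n·a operations, compared with p²·n^a for direct enumeration.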

2.2. Power analysis.

In this section, we analyze the asymptotic power of the U-statistics. The power of $U (2)$ has been studied in the literature. In particular, [10] studied the hypothesis testing of a high-dimensional covariance matrix with H₀ : Σ = I_p. The authors characterized the boundary that distinguishes the testable region from the nontestable region in terms of the Frobenius norm ∥Σ − I_p∥_F, and showed that the test statistic proposed by [10, 13], which corresponds to $U (2)$ in this paper, is rate optimal over their considered regime. In practice, however, $U (2)$ may not be powerful if the alternative covariance matrix is sparse with a small ∥Σ − I_p∥_F. When the alternative covariance has different sparsity levels, it is of interest to further examine which $U (a)$ achieves the best power performance among the constructed family of U-statistics.

To study the test power, we establish the limiting distributions of $U (a)$ ’s under the alternative hypothesis H_A : Σ = Σ_A, where the alternative covariance matrix Σ_A = (σ_j₁,j₂)_p×p is specified in the following Condition 2.5. Define J_A = {(j₁, j₂) : σ_j₁,j₂ ≠ 0, 1 ≤ j₁ ≠ j₂ ≤ p}, which indicates the nonzero off-diagonal entries in Σ_A. The cardinality of J_A, denoted by ∣J_A∣, then represents the sparsity level of Σ_A.

Condition 2.5. Assume ∣J_A∣ = o(p²) and for (j₁, j₂) ∈ J_A, ∣σ_j₁,j₂∣ = Θ(ρ), where ρ = Σ_{(j₁,j₂)∈J_A}∣σ_j₁,j₂∣/∣J_A∣.

Here ρ represents the average signal strength of Σ_A. In our following power comparison of two U-statistics $U (a)$ and $U (b)$, we say $U (a)$ is “better” than $U (b)$ if, under the same test power, $U (a)$ can detect a smaller average signal strength ρ (see the specific definition in Criterion 1). Condition 2.5 specifies a general family of “local” alternatives, which include banded covariance matrices, block covariance matrices and sparse covariance matrices whose nonzero entries are randomly located.

Theorem 2.5. Suppose Conditions 2.1, 2.5 and A.1 (an analogous condition to Condition 2.2* under H_A) in Supplementary Material [28] hold. For $U (a)$ in (2.3) and finite integers {a₁, … , a_m}, if $ρ = O (∣ J_{A} ∣^{- 1 ∕ a_{t}} p^{1 ∕ a_{t}} n^{- 1 ∕ 2})$ for t = 1, … , m, then as n, p → ∞,

{[\frac{U (a_{1}) - E [U (a_{1})]}{σ (a_{1})}, \dots, \frac{U (a_{m}) - E [U (a_{m})]}{σ (a_{m})}]}^{⊺} \overset{D}{\to} N (0, I_{m}),

where for a ∈ {a₁, … , a_m}, $E [U (a)] = \sum_{(j_{1}, j_{2}) \in J_{A}} σ_{j_{1}, j_{2}}^{a}$ and $σ^{2} (a) = var [U (a)] ≃ 2 a! κ_{1}^{a} \times n^{- a} \sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{1}}^{a} σ_{j_{2}, j_{2}}^{a}$ , which is of order Θ(p²n^−a).

Theorem 2.5 shows that for a single U-statistic $U (a)$ of finite order a,

P (\frac{U (a)}{\sqrt{var [U (a)]}} > z_{1 - α}) \to 1 - Φ (z_{1 - α} - \frac{E [U (a)]}{\sqrt{var [U (a)]}}),

(2.10)

where z_1−α is the upper α quantile of $N (0, 1)$ and Φ(·) is the cumulative distribution function of $N (0, 1)$ . By Theorem 2.5, the asymptotic power of $U (a)$ of the one-sided test depends on

\frac{E [U (a)]}{\sqrt{var [U (a)]}} ≃ \frac{\sum_{(j_{1}, j_{2}) \in J_{A}} σ_{j_{1}, j_{2}}^{a}}{{2 a! κ_{1}^{a} n^{- a} \sum_{1 \leq j_{1} \neq j_{2} \leq p} (σ_{j_{1}, j_{1}} σ_{j_{2}, j_{2}})^{a}}^{1 ∕ 2}},

(2.11)

where (2.11) = Θ(∣J_A∣ ρ^a p⁻¹ n^{a/2}). It follows that when $E [U (a)]$ is of the same order as $\sqrt{var [U (a)]}$ , that is, $E [U (a)] = O (1) \sqrt{var [U (a)]}$ , the constraint on ρ in Theorem 2.5 is satisfied.

In the following power analysis, we will first compare $U (a)$ ’s of finite a and then compare them with $U (\infty)$ . As we focus on studying the relationship between the sparsity level and power, we consider an ideal case where σ_j₁,j₂ = ρ > 0 for (j₁, j₂) ∈ J_A and σ_{j, j} = v² > 0 for j = 1, … , p. Then

(2.11) ≃ ∣ J_{A} ∣ ρ^{a} ∕ (\sqrt{2 a! κ_{1}^{a}} v^{2 a} p n^{- a ∕ 2}) .

(2.12)

We next show how the order of the “best” U-statistic changes when the sparsity level ∣J_A∣ varies. To be specific about the meaning of “best,” we compare the ρ values needed by different U-statistics to achieve the same asymptotic power. In particular, we fix $E [U (a)] ∕ \sqrt{var [U (a)]}$ , that is, (2.12), to be some constant $M ∕ \sqrt{2}$ for different a’s, so that the asymptotic power of each $U (a)$ is (2.10) = $1 - Φ (z_{1 - α} - M ∕ \sqrt{2})$ . Then by (2.12), the ρ value such that $U (a)$ attains the power above is

ρ_{a} = \sqrt{κ_{1}} (a!)^{\frac{1}{2 a}} v^{2} (M p ∕ ∣ J_{A} ∣)^{\frac{1}{a}} n^{- \frac{1}{2}} .

(2.13)
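As a minimal numerical sketch (not part of the original analysis), the quantities in (2.10), (2.12) and (2.13) can be evaluated directly in the ideal case. The function names, the argument `v2` standing for v², and the default values κ₁ = 1 and v² = 1 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def rho_a(a, n, p, card_JA, M, kappa1=1.0, v2=1.0):
    """Signal strength rho_a in (2.13); v2 denotes v^2 (assumed name)."""
    return (math.sqrt(kappa1) * math.factorial(a) ** (1.0 / (2 * a))
            * v2 * (M * p / card_JA) ** (1.0 / a) / math.sqrt(n))


def mean_sd_ratio(rho, a, n, p, card_JA, kappa1=1.0, v2=1.0):
    """E[U(a)] / sqrt(var[U(a)]) in the ideal case, following (2.12)."""
    denom = (math.sqrt(2 * math.factorial(a) * kappa1 ** a)
             * v2 ** a * p * n ** (-a / 2))
    return card_JA * rho ** a / denom


def asymptotic_power(ratio, alpha=0.05):
    """Right-hand side of (2.10) for a given mean-to-standard-deviation ratio."""
    z = NormalDist().inv_cdf(1 - alpha)  # z_{1 - alpha}
    return 1 - NormalDist().cdf(z - ratio)
```

Plugging `rho_a(...)` back into `mean_sd_ratio(...)` returns M/√2 for every order a, which is the calibration used to define ρ_a.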

By the definition in (2.13), we compare the power of two U-statistics $U (a)$ and $U (b)$ with a ≠ b following the Criterion 1 below:

Criterion 1. We say $U (a)$ is “better” than $U (b)$ if ρ_a < ρ_b.

Given values of n, p, ∣J_A∣ and M, (2.13) is a function of a. Therefore, to find the “best” $U (a)$ , it suffices to find the order, denoted by a₀, that gives the smallest ρ_a value in (2.13). We then have the following proposition discussing the optimality among the U-statistics of finite orders in (2.3).

Proposition 2.3. Given n, p, ∣J_A∣ and any constant M ∈ (0, +∞), we consider ρ_a in (2.13) as a function of integer a, then:

(i) when ∣J_A∣ ≥ Mp, the minimum of ρ_a is achieved at a₀ = 1;

(ii) when ∣J_A∣ < Mp, the minimum of ρ_a is achieved at some a₀, which increases as Mp/∣J_A∣ increases.

By Proposition 2.3, the order a₀ that attains the smallest value of ρ_a depends on the value of Mp/∣J_A∣ and does not have a closed-form solution. We use numerical plots to demonstrate the relationship between a₀ and the sparsity level. In particular, let ∣J_A∣ = p^{2(1−β)}, where β ∈ (0, 1) denotes the sparsity level. For better visualization, we use $g (a) = \log (ρ_{a} n^{1 ∕ 2} κ_{1}^{- 1 ∕ 2} v^{- 2}) = (2 a)^{- 1} \log a! + a^{- 1} \log (M p^{2 β - 1})$ instead of ρ_a. We plot g(a) curves in Figure 1 for each β ∈ {0.1,…, 0.9} with M = 4 and p ∈ {100, 10000}. Other values of M and p are also taken, which give similar patterns to Figure 1 and are not presented.

Fig. 1. g(a) versus a with different sparsity levels β for p = 100, 10000.

Figure 1 shows that the a₀ such that g(a) attains the smallest value increases when the sparsity level β increases. In particular, when the sparsity level β ≤ 0.3, that is, when ∣J_A∣ is “very” large and then Σ_A is “very” dense, g(a) has the smallest value at a₀ = 1. This is consistent with the conclusion in Proposition 2.3 (i). When the sparsity level β is between 0.4 and 0.5, we note that a₀ = 2 achieves the minimum of g(a). This shows that when ∣J_A∣ is “moderately” large and Σ_A is “moderately” dense, $U (2)$ is more powerful than $U (1)$ . When the sparsity level β > 0.5, we find that a₀ > 2. This implies that when ∣J_A∣ becomes smaller and Σ_A becomes sparser, U-statistics of higher orders are more powerful. Additionally, we note that a₀ increases slowly as β increases, which verifies Proposition 2.3(ii). Moreover, the curves converge as a increases and the differences of g(a) for large a values (a ≥ 6) are small. This implies that when selecting the range of considered orders of U-statistics, it suffices to select an upper bound with a = 6 or 8, which gives better or similar ρ_a values to those larger a’s.
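The search for a₀ can be sketched numerically by evaluating g(a) on a grid of integer orders. This is only an illustration under the parametrization ∣J_A∣ = p^{2(1−β)} used above; the function names and the cap `a_max` on the searched orders are assumptions introduced here.

```python
import math


def g(a, beta, p, M=4.0):
    """g(a) = log(a!) / (2a) + log(M * p^(2*beta - 1)) / a."""
    return math.lgamma(a + 1) / (2 * a) + math.log(M * p ** (2 * beta - 1)) / a


def best_order(beta, p, M=4.0, a_max=20):
    """Integer order in {1, ..., a_max} minimizing g(a); a_max is an assumed cap."""
    return min(range(1, a_max + 1), key=lambda a: g(a, beta, p, M))
```

Calling `best_order` over a grid of β values traces how the minimizing order moves as the alternative becomes sparser.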

In summary, when ∣J_A∣ is large, that is, Σ_A is dense, a small a tends to obtain a smaller lower bound in terms of ρ. But when ∣J_A∣ decreases, that is, Σ_A becomes sparse, a U-statistic of large finite order (or the maximum-type U-statistic as shown next) tends to obtain a smaller lower bound in ρ. This observation is consistent with the existing literature [7, 8, 10, 13].

Next, we proceed to examine the power of the maximum-type test statistic $U (\infty)$ , and compare it with the U-statistics $U (a)$ of finite a defined in (2.3). By [8], the rejection region for $U (\infty)$ with significance level α is

∣ U (\infty) ∣ \geq t_{p} ≔ n^{- 1 ∕ 2} \sqrt{4 \log p - \log \log p - \log (8 π) - 2 \log \log (1 - α)^{- 1}} .

Note $t_{p} ≃ 2 \sqrt{\log p ∕ n}$ and under the alternative, the power for $U (\infty)$ is

P (∣ U (\infty) ∣ \geq t_{p}) .

(2.14)
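For concreteness, the rejection threshold t_p and its leading-order approximation can be computed as in the sketch below. The function name and arguments are illustrative assumptions; the formula is the one displayed above, with log log(1 − α)⁻¹ read as log{log(1/(1 − α))}.

```python
import math


def threshold_tp(n, p, alpha=0.05):
    """Threshold t_p for |U(inf)| at significance level alpha (requires p > 1)."""
    inner = (4 * math.log(p)
             - math.log(math.log(p))
             - math.log(8 * math.pi)
             - 2 * math.log(math.log(1 / (1 - alpha))))
    return math.sqrt(inner / n)


def threshold_leading_order(n, p):
    """Leading-order approximation 2 * sqrt(log p / n)."""
    return 2 * math.sqrt(math.log(p) / n)
```

The two quantities differ only through lower-order terms, so their ratio approaches 1 as p grows.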

As discussed, we consider the alternatives satisfying Conditions 2.2* and 2.5, σ_{j₁, j₂} = ρ > 0 for (j₁, j₂) ∈ J_A, and σ_{j, j} = v² for j = 1,…,p. For simplicity, we assume E(x) = μ and v² are given, and focus on the simplified

U (\infty) = \max_{1 \leq j_{1} < j_{2} \leq p} ∣ v^{- 2} n^{- 1} \sum_{i = 1}^{n} (x_{i, j_{1}} - μ_{j_{1}}) (x_{i, j_{2}} - μ_{j_{2}}) ∣ .

(2.15)

We show in the following proposition when the power of $U (\infty)$ asymptotically converges to 1 or is strictly smaller than 1 under the alternative.

Proposition 2.4. Under the considered alternative Σ_A above, suppose max_j=1,…,p Ee^{t₀∣x_j–μ_j∣^ς} < ∞ for some 0 < ς ≤ 2 and t₀ > 0, and log p = o(n^β) with β = ς/(4 + ς). Then for (2.15), when n, p → ∞:

(i) there exists a constant c₁ > 2 such that if $ρ \geq c_{1} \sqrt{\log p ∕ n}$ , (2.14) → 1;

(ii) there exists another constant 0 < c₂ < 2 such that if $ρ \leq c_{2} \sqrt{\log p ∕ n}$ , Condition 2.2* holds with κ₁ ≤ 1, and $∣ J_{A} ∣ = o (1) p^{\frac{2 (1 - c_{2} ∕ 2)^{2}}{κ_{1} + m}} (\log p)^{\frac{1}{2} - \frac{1}{2 (κ_{1} + m)}}$ for some m > 0, then (2.14) ≤ log(1 – α)⁻¹.

Recall that Proposition 2.3 shows that there exists a finite integer a₀, such that ρ_a₀ is the minimum of (2.13), and ρ_a₀ is a lower bound of ρ value for the finite-order U-statistics to achieve the given asymptotic power. With Propositions 2.3 and 2.4, we next compare the finite-order U-statistics defined in (2.3) with the maximum-type test statistic $U (\infty)$ .

Proposition 2.5. Under the conditions of Theorem 2.5 and Proposition 2.4, for any finite integer a, there exist constants c₁ and c₂ such that when p is sufficiently large:

(i) For any M, when $∣ J_{A} ∣ < c_{1}^{- a} (a!)^{\frac{1}{2}} κ_{1}^{\frac{a}{2}} (\log p)^{- \frac{a}{2}} M p$ , $U (\infty)$ has higher asymptotic power than $U (a)$ .

(ii) When M is big enough and $∣ J_{A} ∣ > c_{2}^{- a} (a!)^{\frac{1}{2}} κ_{1}^{\frac{a}{2}} (\log p)^{- \frac{a}{2}} M p$ , $U (a)$ has higher asymptotic power than $U (\infty)$ .

From Proposition 2.3, we know when Mp/∣J_A∣ = O(1), there exists a finite a₀ such that $U (a_{0})$ is the “best” among all the finite-order U-statistics; in this case, Proposition 2.5(ii) further indicates that $U (a_{0})$ has higher asymptotic power than $U (\infty)$ . Specifically, if Mp/∣J_A∣ < 1, then a₀ = 1, so $U (1)$ is the “best” and its lowest detectable order of ρ is Θ(p∣J_A∣⁻¹n^{−1/2}). More interestingly, when Σ_A is moderately dense or moderately sparse with Mp/∣J_A∣ > 1 and bounded, some U-statistic of finite order a₀ > 1 would become the “best.” By Figure 1, the value of a₀ increases as Σ_A becomes sparser. On the other hand, when Σ_A is “very” sparse with $∣ J_{A} ∣ < c_{1}^{- a_{0}} (a_{0}!)^{\frac{1}{2}} κ_{1}^{\frac{a_{0}}{2}} (\log p)^{- \frac{a_{0}}{2}} M p$ , $U (\infty)$ is the “best” and its lowest detectable order of ρ is $Θ (\sqrt{\log p ∕ n})$ .

Remark 2.5. The above power comparison results are under the constructed family of U-statistics. We note that additional formulation may further enhance the test power. For instance, [11, 73] showed that an adaptive thresholding in certain ℓ_p-type test statistics can achieve high power under the alternatives with sparse and faint signals. It is of interest to incorporate the adaptive thresholding into the constructed family of U-statistics, which is left for future study.

Remark 2.6. The analysis above focuses on the ideal case where the nonzero off-diagonal entries of Σ_A are the same for illustration. When these entries of Σ_A are different, similar analysis still applies by Theorem 2.5 for general covariance matrices. In particular, the asymptotic power of $U (a)$ depends on the mean variance ratio (2.11) and $ρ_{a} = \sqrt{κ_{1}} n^{- 1 ∕ 2} (a!)^{1 ∕ 2 a} \times (M \sum_{j = 1}^{p} σ_{j, j}^{a} ∕ \sum_{1 \leq j_{1}, j_{2} \leq p} σ_{j_{1}, j_{2}}^{a})^{1 ∕ a}$ . We can then obtain conclusions similar to Propositions 2.3-2.5. One interesting case is when Σ_A contains both positive and negative entries; the same analysis applies for even-order U-statistics, since $σ_{j_{1}, j_{2}}^{a}$ ’s are all nonnegative for even a. On the other hand, the odd-order U-statistics would have low power, since $\sum_{1 \leq j_{1} \neq j_{2} \leq p} σ_{j_{1}, j_{2}}^{a}$ ’s could be small due to the cancellation of positive and negative $σ_{j_{1}, j_{2}}^{a}$ . We have conducted simulations when the nonzero σ_j₁,j₂’s are different in Section 3.1, and the results exhibit consistent patterns as expected.

2.3. Application to adaptive testing and computation.

Adaptive testing.

Power analysis in Section 2.2 shows that when the sparsity level of the alternative changes, the test statistic that achieves the highest power could vary. However, since the truth is often unknown in practice, it is unclear which test statistic should be chosen. Therefore, we develop an adaptive testing procedure by combining the information from U-statistics of different orders, which would yield high power against various alternatives.

In particular, we propose to combine the U-statistics through their p-values, an approach that is widely used in the literature [49, 52, 71]. One popular method is the minimum combination, whose idea is to take the minimum p-value to approximate the maximum power [52, 67, 71]. Specifically, let Γ be a candidate set of the orders of U-statistics, which contains both finite values and ∞. We compute p-values p_a’s of the U-statistics $U (a)$ ’s satisfying a ∈ Γ. The minimum combination takes the statistic T_adpUmin = min{p_a : a ∈ Γ} and has the asymptotic p-value p_adpUmin = 1 – (1 – T_adpUmin)^∣Γ∣, where ∣Γ∣ denotes the size of the candidate set Γ. We reject H₀ if p_adpUmin < α. Under H₀, p_a’s are asymptotically independent and uniformly distributed by the theoretical results in Section 2.1. The type I error is asymptotically controlled as $P (p_{adpUmin} < α) = P ({min}_{a \in Γ} p_{a} < p_{α}^{*}) \to α$ , where $p_{α}^{*} = 1 - (1 - α)^{1 ∕ ∣ Γ ∣}$ . Since $P ({min}_{a \in Γ} p_{a} < p_{α}^{*}) \geq P (p_{a} < p_{α}^{*})$ , the power of the adaptive test goes to 1 if there exists a ∈ Γ such that the power of $U (a)$ goes to 1. We note that the power of the adaptive test is not necessarily higher than that of all the U-statistics. This is because the power of $U (a)$ is P(p_a < α), and is different from $P (p_{a} < p_{α}^{*})$ since $p_{α}^{*} < α$ when ∣Γ∣ > 1. Based on our extensive simulations, we find that the power of the adaptive test is usually close to or even higher than the maximum power of the U-statistics.

Remark 2.7. Fisher’s method [49] is another popular method for combining independent p-values. It has the test statistic $T_{adpUf} = - 2 \sum_{k = 1}^{∣ Γ ∣} \log p_{k}$ , which converges to $χ_{2 ∣ Γ ∣}^{2}$ under H₀. By our simulations, the minimum combination and Fisher’s method are generally comparable, while Fisher’s method has higher power under several cases. Moreover, we can also use other methods to combine the p-values, such as higher criticism [16, 17]. We leave the study of how to efficiently combine the p-values for future research.
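The two combination rules can be sketched as follows, assuming the individual p-values p_a, a ∈ Γ, have already been computed and lie in (0, 1]. The function names are illustrative. For Fisher's method the sketch uses the closed-form survival function of a chi-square distribution with an even number 2k of degrees of freedom, exp(−x/2) Σ_{j<k} (x/2)^j / j!, so that only the standard library is needed.

```python
import math


def min_combination_pvalue(pvals):
    """Asymptotic p-value 1 - (1 - min p_a)^{|Gamma|} of the minimum combination."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def fisher_pvalue(pvals):
    """P-value of Fisher's statistic T = -2 * sum(log p_a) under chi-square(2k)."""
    k = len(pvals)
    half_t = -sum(math.log(p) for p in pvals)  # T / 2
    # Survival function of chi-square with 2k degrees of freedom at T.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_t / j
        total += term
    return math.exp(-half_t) * total
```

With a single p-value both rules return that p-value unchanged, which is a convenient sanity check; either combined p-value is then compared with α to decide whether to reject H₀.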

We select the candidate set Γ by the power analysis in Section 2.2. We would recommend including {1, 2,…, 6, ∞}, which can be powerful against a wide spectrum of alternatives. In particular, by Propositions 2.3 and 2.5, we include a = 1, 2 that are powerful against dense signals; a = ∞ that is powerful against sparse signals; and also a ∈ {3,…, 6} for the moderately dense and moderately sparse signals. By Figure 1, it generally suffices to choose finite a up to 6–8, which often gives performance similar to or better than that of larger a values. The simulations in Section 3.1 confirm the good performance of this choice of Γ; and the proposed adaptive test appears to well approximate the “best” performance even when Γ may not always contain the unknown “optimal” U-statistics.

We would like to mention that the adaptive procedure can be generalized to other testing problems, as long as similar theoretical properties are given, such as the examples in Section 4.

Computation.

Next, we discuss the computation in the adaptive testing. A direct calculation following the form of $U (a)$ in (2.3) and $V (a)$ in (2.9) would be computationally expensive for large a with a cost of O(p²n^{2a}). To address this issue, we introduce a method that can reduce the cost.

We first consider a simplified setting when E(x_i,j) = 0 to illustrate the idea. As discussed in Remark 2.2, we examine $\tilde{U} (a)$ defined in (2.5). Let $L = {(j_{1}, j_{2}) : 1 \leq j_{1} \neq j_{2} \leq p}$ denote the set of index tuples, and for each index tuple $l = (j_{1}, j_{2}) \in L$ , define s_i,l = x_i,j₁ x_i,j₂. Note that $\tilde{U} (a) = (P_{a}^{n})^{- 1} \sum_{l \in L} U_{l} (a)$ , where $U_{l} (a) = \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} s_{i_{k}, l}$ . Calculating $U_{l} (a)$ directly is of order O(n^a). We then focus on reducing the computational cost of $U_{l} (a)$ . For $l \in L$ and finite integers t₁,…,t_k, define

V_{l}^{(t_{1}, \dots, t_{k})} = \prod_{r = 1}^{k} (\sum_{i = 1}^{n} s_{i, l}^{t_{r}}), U_{l}^{(t_{1}, \dots, t_{k})} = \sum_{1 \leq i_{1} \neq \dots \neq i_{k} \leq n} \prod_{r = 1}^{k} s_{i_{r}, l}^{t_{r}} .

(2.16)

We can see that $U_{l} (a) = U_{l}^{1_{a}}$ with 1_a being an a-dimensional vector of all ones, and $U_{l}^{(a)} = V_{l}^{(a)}$ for any finite integer a. To reduce the computational cost of $U_{l} (a)$ , the main idea is to obtain $U_{l}^{1_{a}}$ from $V_{l}^{(t_{1}, \dots, t_{k})}$ , whose computational cost is O(n). In particular, $U_{l} (a)$ can be attained iteratively from $V_{l}^{(t_{1}, \dots, t_{k})}$ based on the following equation:

U_{l}^{(k, 1_{r - k})} = V_{l}^{(k)} \times U_{l}^{1_{r - k}} - (r - k) \times U_{l}^{(k + 1, 1_{r - k - 1})},

(2.17)

which follows from the definitions. Algorithm 1 below summarizes the steps.

We illustrate the idea of the algorithm by some examples. By definition, $U_{l}^{(1)} = V_{l}^{(1)}$ , which can be computed with cost O(n). Next, consider in (2.17), if r = 2 and k = 1, then $U_{l}^{(1, 1)} = V_{l}^{(1)} \times U_{l}^{(1)} - (2 - 1) \times U_{l}^{(2)} = V_{l}^{(1)} \times V_{l}^{(1)} - V_{l}^{(2)}$ , which yields $U_{l}^{1_{2}}$ with cost O(n). For $U_{l}^{1_{3}}$ , we first take r = 3 and k = 2 in (2.17), then with cost O(n), we have $U_{l}^{(2, 1)} = V_{l}^{(2)} \times U_{l}^{(1)} - U_{l}^{(3)} = V_{l}^{(2)} \times V_{l}^{(1)} - V_{l}^{(3)}$ , as $V_{l}^{(k)} = U_{l}^{(k)}$ by the definition. Given $U_{l}^{1_{2}}$ and $U_{l}^{(2, 1)}$ , we obtain $U_{l}^{(1, 1_{2})} = V_{l}^{(1)} \times U_{l}^{1_{2}} - 2 \times U_{l}^{(2, 1_{1})}$ . Thus $U_{l}^{1_{3}}$ is also computed with cost O(n). Iteratively, for any finite integer a, we can obtain $U_{l}^{1_{a}}$ from $V_{l}^{(t_{1}, \dots, t_{k})}$ whose computational cost is O(n). More closed-form formulae representing $U_{l}^{1_{a}}$ by $V_{l}^{(t_{1}, \dots, t_{k})}$ are given in Section C.1.1 of Supplementary Material [28].
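The iteration described above can be sketched in a few lines for a single index tuple l. This is an illustrative implementation of the recursion (2.17), not a transcription of Algorithm 1; the function names are assumptions, and the brute-force routine is included only to check small cases.

```python
from functools import lru_cache
from itertools import permutations
from math import prod


def distinct_index_sum(s, a):
    """Return U_l^{1_a}: the sum of s[i_1] * ... * s[i_a] over distinct indices."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    # V^{(t)} = sum_i s_i^t for t = 1, ..., a; each power sum costs O(n).
    V = {t: sum(x ** t for x in s) for t in range(1, a + 1)}

    @lru_cache(maxsize=None)
    def U(k, m):
        """U^{(k, 1_m)}: one index carries power k, m further indices carry power 1."""
        if m == 0:
            return V[k]  # U^{(k)} = V^{(k)} by definition
        # Equation (2.17) with r - k = m, using U^{1_m} = U^{(1, 1_{m-1})}.
        return V[k] * U(1, m - 1) - m * U(k + 1, m - 1)

    return U(1, a - 1)


def distinct_index_sum_bruteforce(s, a):
    """Direct O(n^a) evaluation of the same sum, for checking small inputs only."""
    return sum(prod(s[i] for i in idx) for idx in permutations(range(len(s)), a))
```

After the a power sums are formed, the recursion touches only O(a²) memoized values, so the cost per index tuple is linear in n for fixed a. Taking `s` to be the values s_{1,l}, …, s_{n,l} and summing the result over l ∈ L, then dividing by P_a^n, gives $\tilde{U} (a)$.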

Algorithm 1 reduces the computational cost of $\tilde{U} (a)$ from O(p²n^a) to O(p²n). Its idea is general and can be extended to compute other different U-statistics by changing the input s_i,l. In particular, the variance estimator $V (a)$ can be computed with cost O(p²n) by specifying $s_{i, l} = (x_{i, j_{1}} - {\bar{x}}_{j_{1}})^{2} (x_{i, j_{2}} - {\bar{x}}_{j_{2}})^{2}$ , for each $l \in L = {(j_{1}, j_{2}) : 1 \leq j_{1} \neq j_{2} \leq p}$ . Then $V (a) = 2 a! (P_{a}^{n})^{- 2} \sum_{l \in L} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} s_{i_{k}, l}$ and Algorithm 1 can be applied. Moreover, when E(x_i,j) is unknown, $U (a)$ can still be computed with cost O(p²n) using the iterative method similar to Algorithm 1. The details are provided in Section C.1.2 of Supplementary Material [28].

3. Simulations and real data analysis.

3.1. Simulations.

We conduct simulation studies to evaluate the performance of the proposed adaptive testing procedures, and investigate the relationship between the power and sparsity levels. For one-sample covariance testing discussed in Section 2, we generate n i.i.d. p-dimensional x_i for i = 1,…,n, and consider the following five simulation settings.

Setting 1: x_i has p i.i.d. entries of $N (0, 1)$ and Gamma(2, 0.5), respectively. Under each case, we take n = 100 and p ∈ {50, 100, 200,400, 600, 800, 1000} to verify the theoretical results under H₀ and the validity of the adaptive test across different n and p combinations.

For the following settings 2–5, we generate x_i from multivariate Gaussian distributions with mean zero and different covariance matrices Σ_A’s.

Setting 2: $Σ_{A} = (1 - ρ) I_{p} + ρ 1_{p, k_{0}} 1_{p, k_{0}}^{⊺}$ , where 1_p,k₀ is a p-dimensional vector with the first k₀ elements one and the rest zero. We take (n, p) ∈ {(100, 300), (100, 600), (100, 1000)}, and study the power with respect to different signal sizes ρ and sparsity levels k₀.

Setting 3: The diagonal elements of Σ_A are all one and ∣J_A∣ number of off-diagonal elements are ρ with random positions. We take (n, p) ∈ {(100, 600), (100, 1000)} and let the signal size ρ and sparsity level ∣J_A∣ vary to examine how the power changes accordingly.

Setting 4: The diagonal elements of Σ_A are all one and ∣J_A∣ number of off-diagonal elements are uniformly generated from (0, 2ρ) with random positions. We take (n, p) = (100, 1000) and similarly let the signal size ρ and sparsity level ∣J_A∣ vary to examine how the power changes accordingly.

Setting 5: We consider the multivariate models in [13]. Specifically, for each i = 1,…,n, x_i = Ξz_i + μ, where Ξ is a matrix of dimension p × m, and z_i’s are i.i.d. Gaussian or Gamma random vectors. Under the null hypothesis, m = p, Ξ = I_p, μ = 2 · 1_p; under the alternative hypothesis, m = p + 1, $Ξ = (\sqrt{1 - ρ} I_{p}, \sqrt{2 ρ} 1_{p})$ , $μ = 2 (\sqrt{1 - ρ} + \sqrt{2 ρ}) 1_{p}$ . We also take the n and p combinations in [13] with (n, p) ∈ {(40, 159), (40, 331), (80, 159), (80, 331), (80, 642)}.

We compare several methods in the literature, including both maximum-type and sum-of-squares-type tests. In particular, the maximum-type test statistic in Jiang [36] is taken as $U (\infty)$ in this framework. Since the convergence in [36] is known to be slow, we use permutation to approximate the distribution in the simulations. In addition, we consider some sum-of-squares-type methods. Specifically, we examine the identity and sphericity tests in Chen et al. [13], which are denoted as “Equal” and “Spher,” respectively. We also compare the methods in Ledoit and Wolf [42] and Schott [57], which are referred to as “LW” and “Schott,” respectively.

To illustrate, Figure 2 summarizes the numerical results for Setting 3 when n = 100 and p = 1000. All the results are based on 1000 simulations at the 5% nominal significance level. In Figure 2, we present the power of single U-statistics with orders in {1,…, 6, ∞}. “adpUmin” and “adpUf” represent the results of the adaptive testing procedure using the minimum combination and Fisher’s method in Section 2.3, respectively. The simulation results show that the type I error rates of the U-statistics and adaptive test are well controlled under H₀. In addition, Figure 2 exhibits several patterns that are consistent with the power analysis in Section 2.2. First, it shows that among the U-statistics, when ∣J_A∣ is very small, $U (\infty)$ performs best; and when ∣J_A∣ increases, the performances of some U-statistics of finite orders catch up. For instance, when ∣J_A∣ = 100, $U (6)$ and $U (\infty)$ are similar and are better than the other U-statistics; when ∣J_A∣ = 400, $U (4)$ and $U (5)$ are similar and better than the other U-statistics. When Σ_A is relatively dense, $U (2)$ and $U (1)$ become more powerful. Particularly, when ∣J_A∣ = 1600, $U (2)$ is powerful; when ∣J_A∣ becomes larger, such as when ∣J_A∣ = 3200, $U (1)$ is overall the most powerful. Second, Figure 2 shows that “LW,” “Schott,” “Equal,” “Spher” and $U (2)$ perform similarly under various cases. In particular, these methods are not powerful when the alternative is sparse but become more powerful when the alternative gets denser. This is because they are all sum-of-squares-type statistics that target dense alternatives. Third and importantly, the two adaptive tests “adpUmin” and “adpUf” maintain high power across different settings. Specifically, they perform better than most single U-statistics: their powers are usually close to or even higher than the best single U-statistic. Moreover, “adpUmin” and “adpUf” generally have higher power than the compared existing methods. 
We also note that “adpUf” overall performs better than “adpUmin” in this simulation setting. In summary, Figure 2 demonstrates the relationship between the sparsity levels of alternatives and the power of the tests, confirming the theoretical conclusions in Section 2.2. Notably, the proposed adaptive testing procedure is powerful against a wide range of alternatives, and thus advantageous in practice when the true alternative is unknown.

Due to the space limitation, we provide other extensive numerical studies in Supplementary Material [28], Section C.2. The conclusions are similar to those of Figure 2, and consistent with the theoretical results in Section 2.2. In particular, the results show that the empirical sizes of the tests are close to the nominal level, suggesting the good finite-sample performance of the asymptotic approximations. Moreover, under highly dense alternatives with only nonnegative entries in the covariance matrix, $U (1)$ is the most powerful one among the $U (a)$ ’s and the other tests in [13, 42, 57], in agreement with the results in Propositions 2.3 and 2.5. Furthermore, the proposed adaptive testing procedures often have higher power than most single U-statistics.

3.2. Real data analysis.

Alzheimer’s disease (AD) is the most prevalent neurodegenerative disease [56] and is ranked as the sixth leading cause of death in the US [68]. Every 65 seconds, someone in the US develops AD [2]. To advance our understanding of AD, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) was started in 2004, collecting extensive genetic data for both healthy individuals and AD patients. To gain insight into the genetic mechanisms of AD, one can test one SNP at a time. However, due to the relatively small sample size of the ADNI data, scanning across all SNPs failed to identify any genomewide significant SNP (with p-value < 5 × 10⁻⁸) [40]. To date, the largest meta-analysis of more than 600,000 individuals identified 29 significant risk loci [35], which can explain only a small proportion of AD variance. On the other hand, a group of functionally related genes as annotated in a biological pathway are often involved in the same disease susceptibility and progression [29]. Thus, pathway-based analyses, which jointly analyze a group of SNPs in a biological pathway, have become increasingly popular. We retrieve a total of 214 pathways from the KEGG database [39] for the subsequent analysis.

Although pathway-based analyses with KEGG pathways are common in real studies, formally testing the correlations of the genes in a KEGG pathway has been largely untouched. Here, we apply our method and other competing methods in [13] to test if all the genes in a pathway have correlated gene expression levels. Perhaps as expected, all methods reject the null hypothesis for all pathways with highly significant p-values, since the KEGG pathways are constructed to include only the genes with similar function into the same pathway [39], while similar function often implies co-expression (and vice versa). To compare the performance of the different tests, for each pathway we randomly select 50 subjects and restrict our analysis to pathways of at least 50 genes, leading to 103 pathways for the following analysis. Then we perturb the data by shuffling the gene expression levels of randomly selected 100(1 – α)% genes in a pathway before applying each test. Figure 3 shows the performance of the tests with two significance cutoffs, where “ $U (2)$ ” represents the single $U (2)$ statistic, “adpU” represents our proposed adaptive testing procedure using the minimum combination with candidate U-statistics of orders in {1,…, 6, ∞}, and “Equal” and “Spher” represent the identity and sphericity tests in [13], respectively. Because all pathways are highly significant with all samples, we can treat all pathways as the true positives. Due to the adaptiveness of our proposed testing procedure, “adpU” identifies more significant pathways than the competing methods across all the levels of data perturbation (mimicking the varying sparsity levels of the alternatives).

Fig. 3. Power comparison of different methods with ADNI data.

4. Other high-dimensional examples.

In this section, we apply the proposed U-statistics framework to other high-dimensional testing problems, including testing means, two-sample covariances, and regression coefficients in generalized linear regression models. Similar theoretical results to Section 2 are developed, with detailed proofs and related simulation studies provided in Supplementary Material [28].

4.1. Mean testing.

Testing mean vectors is widely used in many statistical analyses and applications [1, 50]. Under high-dimensional scenarios, for example, in genome-wide studies, the dimension of the data is often much larger than the sample size, so traditional multivariate tests such as Hotelling’s T²-test either cannot be directly applied or have low power [18]. To address this issue, several new procedures for testing high-dimensional mean vectors have been proposed [4, 9, 11, 12, 16, 17, 25-27, 60, 62, 67]. However, many of the statistics only target either sparse or dense alternatives, and suffer from loss of power for other types of alternatives. We next apply the U-statistics framework to one-sample and two-sample mean testing problems.

One-sample mean testing.

We first discuss the one-sample mean vector testing. Assume that x₁,…, x_n are n i.i.d. copies of a p-dimensional real-valued random vector x = (x₁,…,x_p)^⊤ with mean vector μ = (μ₁,…,μ_p)^⊤, covariance matrix Σ = {σ_j₁,j₂ : 1 ≤ j₁, j₂ ≤ p}. We want to conduct the global test on H₀ : μ = μ₀ where μ₀ = (μ_1,0,…,μ_p,0)^⊤ is given.

Similar to previous discussion, the parameter set that we are interested in is $E = {μ_{1} - μ_{1, 0}, \dots, μ_{p} - μ_{p, 0}}$ . For each j = 1,…, p, E(x_i,j) = μ_j, so K_j(x_i) = x_i,j – μ_j,0 is a kernel function, which is a simple unbiased estimator of the target. Following our construction, the U-statistic for finite a is

U (a) = \sum_{j = 1}^{p} \frac{1}{P_{a}^{n}} \sum_{1 \leq i_{1} \neq \dots \neq i_{a} \leq n} \prod_{k = 1}^{a} (x_{i_{k}, j} - μ_{j, 0}),

(4.1)

which targets $‖ E ‖_{a}^{a} = \sum_{j = 1}^{p} (μ_{j} - μ_{j, 0})^{a}$ , and the U-statistic corresponding to $‖ E ‖_{\infty}$ is $U (\infty) = \max_{1 \leq j \leq p} σ_{j, j}^{- 1} ({\bar{x}}_{j} - μ_{j, 0})^{2}$ with ${\bar{x}}_{j} = \sum_{i = 1}^{n} x_{i, j} ∕ n$ .
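As a rough illustration, the finite-order statistic in (4.1) can be computed without enumerating index tuples. The sketch below does not use the recursion (2.17); instead it relies on the identity that the sum over distinct indices equals a! times the elementary symmetric polynomial e_a, so each coordinate contributes e_a divided by the binomial coefficient C(n, a). The function names and the list-of-rows data layout are assumptions made for this example, and a ≤ n is required.

```python
from math import comb


def elementary_symmetric(values, a):
    """e_a(values), computed by the standard O(n * a) dynamic program."""
    e = [1.0] + [0.0] * a
    for x in values:
        for t in range(a, 0, -1):  # descending so each value is used at most once
            e[t] += e[t - 1] * x
    return e[a]


def one_sample_mean_U(X, mu0, a):
    """U(a) in (4.1) for an n x p data set X (list of rows) and null mean mu0."""
    n, p = len(X), len(mu0)
    total = 0.0
    for j in range(p):
        centered = [X[i][j] - mu0[j] for i in range(n)]
        # Sum over distinct indices = a! * e_a, and P_a^n = a! * C(n, a).
        total += elementary_symmetric(centered, a) / comb(n, a)
    return total
```

For a = 1 the statistic reduces to the sum over j of the sample mean of x_{i,j} minus μ_{j,0}, which gives a quick check of the implementation.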

Given the statistics, we have theoretical results similar to Theorems 2.1-2.3. The following Theorems 4.1-4.2 are established under conditions similar to those of Theorems 2.1-2.3. Due to the limited space, we provide the conditions and corresponding discussions in Supplementary Material [28].

Theorem 4.1. Under H₀: μ = μ₀, assume Condition A.2 in Supplementary Material [28]. Then for any finite integers {a₁,…,a_m}, as n, p → ∞, $[U (a_{1}) ∕ σ (a_{1}), \dots, U (a_{m}) ∕ σ (a_{m})]^{⊺} \overset{D}{\to} N (0, I_{m})$ , where $σ^{2} (a) = var [U (a)] = \sum_{i = 1}^{p} \sum_{j = 1}^{p} a! σ_{i, j}^{a} ∕ P_{a}^{n}$ with the order of Θ(a!pn^−a).

Theorem 4.2. Under H₀: μ = μ₀, assume Condition A.3 in Supplementary Material [28]. Then $\forall u \in R$ , $P (n U (\infty) - τ_{p} \leq u) \to exp {- π^{- 1 ∕ 2} exp (- u ∕ 2)}$ , as n, p → ∞, where τ_p = 2 log p – log log p. In addition, for any finite integer a, ${U (a) ∕ σ (a)}$ and ${n U (\infty) - τ_{p}}$ are asymptotically independent.
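The limiting distribution in Theorem 4.2 translates into an asymptotic p-value for the maximum-type statistic, as in the sketch below. The function name and arguments are illustrative assumptions; `u_inf` stands for an observed value of $U (\infty)$, and p > 1 is required.

```python
import math


def max_type_pvalue(n, p, u_inf):
    """Asymptotic p-value 1 - exp{-pi^{-1/2} exp(-u/2)} with u = n * U(inf) - tau_p."""
    tau_p = 2 * math.log(p) - math.log(math.log(p))
    u = n * u_inf - tau_p
    return 1.0 - math.exp(-math.exp(-u / 2) / math.sqrt(math.pi))
```

Larger observed values of $U (\infty)$ give smaller p-values, and the resulting p-value can be used as the a = ∞ entry in the adaptive combination of Section 2.3.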

By Theorems 4.1 and 4.2, we obtain the asymptotic independence among the U-statistics and the corresponding limiting distributions of the U-statistics under H₀. Under the alternative hypothesis, since the power analysis of the one-sample mean testing is similar to that of the two-sample case, we delay the power analysis after presenting the asymptotic independence property of the proposed U-statistics in the two-sample mean testing problem.

Two-sample mean testing.

Next, we discuss the two-sample mean testing problem. Suppose we have two groups of p-dimensional observations ${x_{i}}_{i = 1}^{n_{x}}$ and ${y_{i}}_{i = 1}^{n_{y}}$ , which are i.i.d. copies of two independent random vectors x = (x₁, … , x_p)^⊤ and y = (y₁, … , y_p)^⊤, respectively. Suppose E(x) = μ = (μ₁, … , μ_p)^⊤, E(y) = ν = (ν₁, … , ν_p)^⊤, cov(x) = Σ_x and cov(y) = Σ_y. We write n = n_x + n_y and assume n_x = Θ(n_y). For easy illustration, we first consider Σ_x = Σ_y = Σ = {σ_{j₁, j₂} : 1 ≤ j₁, j₂ ≤ p}. We will then discuss the case when Σ_x ≠ Σ_y, where similar analysis applies.

The two-sample mean testing examines H₀: μ = ν versus H_A: μ ≠ ν, so $E = (μ_{1} - ν_{1}, \dots, μ_{p} - ν_{p})^{⊺}$ . For 1 ≤ j ≤ p, 1 ≤ k ≤ n_x, 1 ≤ s ≤ n_y, K_j(x_k, y_s) = x_k,j − y_s,j is a simple unbiased estimator of μ_j − ν_j, and thus we construct $U (a) = \sum_{j = 1}^{p} (P_{a}^{n_{x}} P_{a}^{n_{y}})^{- 1} \times \sum_{1 \leq k_{1} \neq \dots \neq k_{a} \leq n_{x}; 1 \leq s_{1} \neq \dots \neq s_{a} \leq n_{y}} \prod_{t = 1}^{a} (x_{k_{t}, j} - y_{s_{t}, j})$ , which is also equivalent to

U (a) = \sum_{j = 1}^{p} \sum_{c = 0}^{a} (\begin{matrix} a \\ c \end{matrix}) \frac{(- 1)^{a - c}}{P_{c}^{n_{x}} P_{a - c}^{n_{y}}} \sum_{\begin{matrix} 1 \leq k_{1} \neq \dots \neq k_{c} \leq n_{x} \\ 1 \leq s_{1} \neq \dots \neq s_{a - c} \leq n_{y} \end{matrix}} \prod_{t = 1}^{c} x_{k_{t}, j} \prod_{m = 1}^{a - c} y_{s_{m}, j} .

(4.2)
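The expanded form (4.2) lends itself to direct computation, because each inner sum over distinct indices factorizes into one-sample pieces. The following sketch is an illustration only: it evaluates those pieces through elementary symmetric polynomials (c! e_c for the x-sample, divided by P_c^{n_x}, reduces to e_c / C(n_x, c), and likewise for y) rather than through the authors' algorithm. The function names and the list-of-rows layout are assumptions, and a ≤ min(n_x, n_y) is required.

```python
from math import comb


def elementary_symmetric_upto(values, a):
    """Return [e_0, e_1, ..., e_a] of the given values via an O(n * a) recursion."""
    e = [1.0] + [0.0] * a
    for x in values:
        for t in range(a, 0, -1):
            e[t] += e[t - 1] * x
    return e


def two_sample_mean_U(X, Y, a):
    """U(a) in (4.2) for samples X (n_x rows) and Y (n_y rows) with p columns each."""
    nx, ny, p = len(X), len(Y), len(X[0])
    total = 0.0
    for j in range(p):
        ex = elementary_symmetric_upto([row[j] for row in X], a)
        ey = elementary_symmetric_upto([row[j] for row in Y], a)
        for c in range(a + 1):
            sign = (-1) ** (a - c)
            total += (comb(a, c) * sign
                      * ex[c] / comb(nx, c)
                      * ey[a - c] / comb(ny, a - c))
    return total
```

For a = 1 this returns the sum over j of the difference of the two sample means, and for small samples it can be compared against a brute-force evaluation of the product-kernel form of $U (a)$.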

We can check that (4.2) satisfies $E {U (a)} = \sum_{j = 1}^{p} (μ_{j} - ν_{j})^{a}$ , so $U (a)$ is an unbiased estimator of $‖ E ‖_{a}^{a} = \sum_{j = 1}^{p} (μ_{j} - ν_{j})^{a}$ . On the other hand, for $‖ E ‖_{\infty}$ , following the maximum-type test statistic in Cai et al. [9], we have

U (\infty) = \max_{1 \leq j \leq p} σ_{j, j}^{- 1} ({\bar{x}}_{j} - {\bar{y}}_{j})^{2},

(4.3)

where ${\bar{x}}_{j} = \sum_{i = 1}^{n_{x}} x_{i, j} ∕ n_{x}$ , ${\bar{y}}_{j} = \sum_{i = 1}^{n_{y}} y_{i, j} ∕ n_{y}$ . We then obtain results similar to Theorems 2.1, 2.3 and 2.5. As the conditions are similar to those in Section 2, we only keep the key conclusions, and the details of conditions and discussions are given in Supplementary Material [28], Section A.8.

Theorem 4.3. Under Condition A.4 in Supplementary Material [28], Σ_x = Σ_y and H₀: μ = ν, for any finite integers (a₁, … , a_m), as n, p → ∞, $[U (a_{1}) ∕ σ (a_{1}), \dots, U (a_{m}) ∕ σ (a_{m})]^{⊺} \overset{D}{\to} N (0, I_{m})$ , where $σ^{2} (a) ≃ a! \sum_{j_{1}, j_{2} = 1}^{p} (n_{x} + n_{y})^{a} σ_{j_{1}, j_{2}}^{a} ∕ (n_{x} n_{y})^{a}$ is of the order Θ(a!pn^−a).

Theorem 4.4. Under Condition A.4 in Supplementary Material [28], Σ_x = Σ_y and H₀: μ = ν, $\forall u \in \mathbb{R}$ , $P (\frac{n_{x} n_{y}}{n_{x} + n_{y}} U (\infty) - τ_{p} \leq u) \to \exp \{- π^{- 1 ∕ 2} \exp (- u ∕ 2)\}$ , as n, p → ∞, where τ_p = 2 log p − log log p. Moreover, $\{U (a) ∕ σ (a)\}$ of finite integer a and $\{n_{x} n_{y} U (\infty) ∕ (n_{x} + n_{y}) - τ_{p}\}$ are asymptotically independent.
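The limiting law in Theorem 4.4 translates directly into an asymptotic p-value for $U (\infty)$ in (4.3). The sketch below is one possible implementation under stated assumptions: σ_{j,j} is unknown in practice, so a pooled sample variance is plugged in (an assumption of this sketch, not a statement of the paper's procedure), and the function name is hypothetical.

```python
import numpy as np


def u_inf_pvalue(x, y):
    """Return U(inf) of (4.3) and its asymptotic p-value from Theorem 4.4.

    x has shape (n_x, p) and y has shape (n_y, p), with p >= 2.
    """
    n_x, n_y = x.shape[0], y.shape[0]
    p = x.shape[1]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Assumption: plug in the pooled sample variance for sigma_{j,j}.
    ss_x = ((x - x.mean(axis=0)) ** 2).sum(axis=0)
    ss_y = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    pooled_var = (ss_x + ss_y) / (n_x + n_y - 2)
    u_inf = np.max(diff ** 2 / pooled_var)
    tau_p = 2.0 * np.log(p) - np.log(np.log(p))
    u = n_x * n_y / (n_x + n_y) * u_inf - tau_p
    # p-value = 1 - exp{-pi^{-1/2} exp(-u/2)}, written with expm1 for accuracy
    pvalue = -np.expm1(-np.exp(-u / 2.0) / np.sqrt(np.pi))
    return u_inf, pvalue
```

Large values of the statistic give small p-values, since the test rejects in the upper tail of the extreme value limit.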

Theorems 4.3 and 4.4 provide the asymptotic properties of finite-order U-statistics and $U (\infty)$ under H₀. To analyze the power of $U (a)$ ’s, we derive the asymptotic results of $U (a)$ ’s under the alternative hypotheses. We focus on the two-sample mean testing problem, while the one-sample case can be treated similarly. Specifically, we consider the alternative $\mathcal{E}_{A} = \{μ_{j} - ν_{j} = ρ > 0 \text{ for } j = 1, \dots, k_{0};\ μ_{j} - ν_{j} = 0 \text{ for } j = k_{0} + 1, \dots, p\}$ . We then obtain conclusions similar to Theorem 2.5.

Theorem 4.5. Assume Condition A.4 in Supplementary Material [28] and k₀ = o(p). For any finite integers {a₁, … , a_m}, if ρ in $\mathcal{E}_{A}$ satisfies $ρ = O (k_{0}^{- 1 ∕ a_{t}} p^{1 ∕ (2 a_{t})} n^{- 1 ∕ 2})$ for t = 1, … , m, then $[[U (a_{1}) - E \{U (a_{1})\}] ∕ σ (a_{1}), \dots, [U (a_{m}) - E \{U (a_{m})\}] ∕ σ (a_{m})]^{⊺} \overset{D}{\to} N (0, I_{m})$ , as n, p → ∞. Here, $E \{U (a)\} = ‖ \mathcal{E}_{A} ‖_{a}^{a} = k_{0} ρ^{a}$ and $σ^{2} (a) = var \{U (a)\} ≃ V_{a}$ , with $V_{a} = a! \sum_{j_{1}, j_{2} = k_{0} + 1}^{p} (n_{x} + n_{y})^{a} σ_{j_{1}, j_{2}}^{a} ∕ (n_{x} n_{y})^{a}$ of the order Θ(a!pn^−a).

Next, we compare the power of different U-statistics under alternatives with different sparsity levels. Theorem 4.5 shows that under the local alternatives, the asymptotic power of $U (a)$ mainly depends on $E {U (a)} ∕ \sqrt{var {U (a)}}$ . Therefore, by Theorem 4.5, given constant M > 0, for each $U (a)$ , if $ρ = M^{1 ∕ a} k_{0}^{- 1 ∕ a} V_{a}^{1 ∕ (2 a)}$ , then $E {U (a)} ∕ \sqrt{var {U (a)}} ≃ M$ ; that is, different $U (a)$ ’s have the same power asymptotically. For easy illustration, we consider σ_{j₁, j₂} = 1 when j₁ = j₂ ∈ {k₀ + 1, … , p}, and σ_{j₁, j₂} = 0 when j₁ ≠ j₂ ∈ {k₀ + 1, … , p}, then $M^{1 ∕ a} k_{0}^{- 1 ∕ a} V_{a}^{1 ∕ (2 a)} ≃ ρ_{a}$ with

ρ_{a} ≔ (a!)^{\frac{1}{2 a}} (M \sqrt{p} ∕ k_{0})^{\frac{1}{a}} \{(n_{x} + n_{y}) ∕ (n_{x} n_{y})\}^{\frac{1}{2}} .

(4.4)

Therefore, similar to the analysis in Section 2.2, to find the “best” $U (a)$ , it suffices to find the order, denoted by a₀, that gives the minimum ρ_a in (4.4). We have the following result similar to Proposition 2.3.

Proposition 4.1. Given any constant M ∈ (0, + ∞) and n, p, k₀, we consider ρ_a in (4.4) as a function of positive integers a, then:

(i) when $k_{0} \geq M \sqrt{p}$ , the minimum of ρ_a is achieved at a₀ = 1;

(ii) when $k_{0} < M \sqrt{p}$ , the minimum of ρ_a is achieved at some a₀, which increases as $M \sqrt{p} ∕ k_{0}$ increases.
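The behaviour described in Proposition 4.1 can be explored numerically by evaluating ρ_a in (4.4) over a grid of orders and taking the minimizer. The sketch below is only an illustration; the search range `max_order` and the function names are assumptions, and the computation is done on the log scale to avoid overflow in a!.

```python
from math import exp, lgamma, log, sqrt


def rho_a(a, n_x, n_y, p, k0, M):
    """Evaluate rho_a in (4.4); lgamma(a + 1) equals log(a!)."""
    log_rho = (lgamma(a + 1) / (2 * a)
               + log(M * sqrt(p) / k0) / a
               + 0.5 * log((n_x + n_y) / (n_x * n_y)))
    return exp(log_rho)


def best_order(n_x, n_y, p, k0, M, max_order=50):
    """Return the order a in {1, ..., max_order} that minimizes rho_a."""
    return min(range(1, max_order + 1),
               key=lambda a: rho_a(a, n_x, n_y, p, k0, M))
```

For instance, with p = 10000 and M = 4 one has $M \sqrt{p} = 400$, so `best_order(100, 100, 10000, 400, 4.0)` falls in case (i), while smaller values of `k0` fall in case (ii) and yield a minimizer larger than 1.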

Proposition 4.1 shows that when the sparsity level k₀ is large, that is, $\mathcal{E}_{A}$ is dense, a small a tends to obtain a smaller lower bound in ρ, and vice versa. As (4.4) and (2.13) are similar, we have similar patterns to those in Figure 1 when examining the corresponding numerical plots of ρ_a. In addition, [9] shows that when $ρ = ρ_{\infty} ≔ C_{1} \sqrt{\log p ∕ n}$ for a large C₁, the power of $U (\infty)$ converges to 1, and $\sqrt{\log p ∕ n}$ is minimax rate optimal for sparse alternatives; see also [17]. Thus, if $ρ_{\infty} < ρ_{a_{0}}$ , that is, $k_{0} < M C_{1}^{- a_{0}} \sqrt{p a_{0}!} ∕ \log^{a_{0} ∕ 2} p$ , $U (\infty)$ is the “best” and its lowest detectable order of ρ is $Θ (\sqrt{\log p ∕ n})$ . On the other hand, Proposition 4.1 shows that when $\mathcal{E}_{A}$ is dense with $k_{0} \geq M \sqrt{p}$ , $U (1)$ is the “best” and its lowest detectable order of ρ is $Θ (\sqrt{p} k_{0}^{- 1} n^{- 1 ∕ 2})$ . Moreover, for some large M and C₂, when $\mathcal{E}_{A}$ is “moderately dense” or “moderately sparse” with $C_{2} \sqrt{p a_{0}!} ∕ \log^{a_{0} ∕ 2} p < k_{0} < M \sqrt{p}$ , $U (a_{0})$ is the “best” and its lowest detectable order of ρ is $Θ \{(\sqrt{p} ∕ k_{0})^{\frac{1}{a_{0}}} n^{- 1 ∕ 2}\}$ , which is of a smaller order than the optimal detection boundary of the sparse case $Θ (\sqrt{\log p ∕ n})$ .

More generally, when Σ_x ≠ Σ_y, similar results to Theorems 4.3 and 4.5 can be obtained. In particular, we have the following corollary.

Corollary 4.1. When Σ_x ≠ Σ_y, under Condition A.4 in Supplementary Material [28], Theorem 4.3 holds with $σ^{2} (a) ≃ a! \sum_{j_{1}, j_{2} = 1}^{p} (σ_{x, j_{1}, j_{2}} ∕ n_{x} + σ_{y, j_{1}, j_{2}} ∕ n_{y})^{a}$ and Theorem 4.5 holds with $V_{a} = a! \sum_{j_{1}, j_{2} = k_{0} + 1}^{p} (σ_{x, j_{1}, j_{2}} ∕ n_{x} + σ_{y, j_{1}, j_{2}} ∕ n_{y})^{a}$ .
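As a quick consistency check, the variance expression of Corollary 4.1 reduces to the one in Theorem 4.3 when Σ_x = Σ_y, because σ/n_x + σ/n_y = σ(n_x + n_y)/(n_x n_y) entrywise. A minimal sketch, assuming the covariance matrices are known and passed in as arrays (the function names are illustrative):

```python
from math import factorial

import numpy as np


def sigma2_unequal(a, sigma_x, sigma_y, n_x, n_y):
    """a! * sum_{j1,j2} (sigma_x[j1,j2]/n_x + sigma_y[j1,j2]/n_y)^a (entrywise power)."""
    return factorial(a) * np.sum((sigma_x / n_x + sigma_y / n_y) ** a)


def sigma2_equal(a, sigma, n_x, n_y):
    """a! * sum_{j1,j2} (n_x + n_y)^a sigma[j1,j2]^a / (n_x n_y)^a, as in Theorem 4.3."""
    return factorial(a) * ((n_x + n_y) / (n_x * n_y)) ** a * np.sum(sigma ** a)
```

Note that `** a` acts entrywise on the arrays, matching $σ_{j_{1}, j_{2}}^{a}$; it is not a matrix power.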

Corollary 4.1 shows that the asymptotic power of finite-order U-statistics depends on $E {U (a)} ∕ \sqrt{var {U (a)}}$ . By the construction of finite-order U-statistics and the proof, we obtain that $E {U (a)} = k_{0} ρ^{a}$ and $var {U (a)} = Θ (a! p n^{- a})$ . We then know that for finite-order U-statistics, similar results to Proposition 4.1 still hold by examining $E {U (a)} ∕ \sqrt{var {U (a)}}$ .

The above power analysis shows that the optimal U-statistic varies when the alternative hypothesis changes. To achieve high power across various alternatives, we can develop an adaptive test similar to that in Section 2.3. Specifically, we calculate the p-values of the U-statistics (4.1) and (4.2) following the theoretical results above and the algorithm in Section 2.3. By combining the p-values as discussed in Section 2.3, the asymptotic power of the adaptive test goes to 1 if there exists one $U (a)$ whose power goes to 1.
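To make the combination step concrete, the sketch below shows two standard ways of combining m asymptotically independent p-values: the minimum-p rule and Fisher's method. Whether these coincide exactly with the procedure of Section 2.3 is an assumption here, since that section is not reproduced in this part of the paper; the function names are hypothetical, and the inputs are assumed to lie in (0, 1].

```python
from math import exp, factorial, log


def min_p_combination(pvalues):
    """P(min of m independent uniforms <= observed minimum) = 1 - (1 - p_min)^m."""
    m = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** m


def fisher_combination(pvalues):
    """Fisher's method: T = -2 * sum(log p) is chi-square with 2m d.f. under independence.

    For even degrees of freedom the upper tail has the closed form
    exp(-T/2) * sum_{i < m} (T/2)^i / i!.
    """
    m = len(pvalues)
    half_t = -sum(log(pv) for pv in pvalues)
    return exp(-half_t) * sum(half_t ** i / factorial(i) for i in range(m))
```

Under the minimum-p rule, the combined p-value is at most m times the smallest individual p-value, which is in line with the statement that the adaptive test has power tending to 1 whenever one of the $U (a)$ ’s does.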

Remark 4.1. Xu et al. [67] have also discussed adaptive testing of two-sample means that is powerful against various ℓ_p-norm-like sums of μ − ν. However, [67] works under the framework of a family of von Mises V-statistics, where $V (a) = \sum_{j = 1}^{p} ({\bar{x}}_{j} - {\bar{y}}_{j})^{a}$ . We note that $V (a)$ is equivalent to

V (a) = \sum_{j = 1}^{p} \sum_{c = 0}^{a} (- 1)^{a - c} \binom{a}{c} (n_{x}^{c} n_{y}^{a - c})^{- 1} \sum_{\substack{1 \leq k_{1}, \dots, k_{c} \leq n_{x} \\ 1 \leq s_{1}, \dots, s_{a - c} \leq n_{y}}} \prod_{t = 1}^{c} x_{k_{t}, j} \prod_{m = 1}^{a - c} y_{s_{m}, j},

which allows the indices k’s and s’s to coincide, and thus is different from the U-statistics in (4.2). [67] shows that the constructed V-statistics are biased estimators of $‖ μ - ν ‖_{a}^{a}$ , and that $V (a)$ and $V (b)$ are asymptotically independent if a + b is odd, but are asymptotically correlated if a + b is even. The constructed U-statistics in this work improve on those V-statistics in that $U (a)$ in (4.2) is an unbiased estimator of $‖ μ - ν ‖_{a}^{a}$ , and all $U (a)$ ’s are asymptotically independent of each other. Given these properties, it becomes easier to obtain the joint asymptotic distribution of the U-statistics, and then apply the adaptive test.
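The source of the bias can be seen in the simplest case a = 2. The following worked identity is a standard calculation for a single coordinate j under the common-variance setting Σ_x = Σ_y, given here only as an illustration:

```latex
% V-statistic summand (repeated indices allowed) versus U-statistic kernel (distinct indices)
E\{(\bar{x}_{j} - \bar{y}_{j})^{2}\}
  = (\mu_{j} - \nu_{j})^{2} + \sigma_{j,j}\Big(\frac{1}{n_{x}} + \frac{1}{n_{y}}\Big),
\qquad
E\{(x_{k_{1},j} - y_{s_{1},j})(x_{k_{2},j} - y_{s_{2},j})\}
  = (\mu_{j} - \nu_{j})^{2}
  \quad (k_{1} \neq k_{2},\ s_{1} \neq s_{2}).
```

Summing over j, V(2) carries the extra term $\sum_{j} σ_{j, j} (1 ∕ n_{x} + 1 ∕ n_{y})$, whereas the product over distinct indices in U(2) factorizes by independence and leaves no such term.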

4.2. Two-sample covariance testing.

The U-statistics framework can be applied similarly to testing the equality of two covariance matrices. Suppose $\{x_{i}\}_{i = 1}^{n_{x}}$ and $\{y_{i}\}_{i = 1}^{n_{y}}$ are i.i.d. copies of two independent random vectors x = (x₁, … , x_p)^⊤ and y = (y₁, … , y_p)^⊤, respectively. Denote E(x) = μ = (μ₁, … , μ_p)^⊤, E(y) = ν = (ν₁, … , ν_p)^⊤; cov(x) = Σ_x = {σ_{x, j₁, j₂} : 1 ≤ j₁, j₂ ≤ p} and cov(y) = Σ_y = {σ_{y, j₁, j₂} : 1 ≤ j₁, j₂ ≤ p}. Consider H₀ : Σ_x = Σ_y = Σ = (σ_{j₁, j₂})_{p×p}. Given 1 ≤ j₁, j₂ ≤ p, 1 ≤ k₁ ≠ k₂ ≤ n_x, and 1 ≤ s₁ ≠ s₂ ≤ n_y, $K_{j_{1}, j_{2}} (x_{k_{1}}, x_{k_{2}}, y_{s_{1}}, y_{s_{2}}) = (x_{k_{1}, j_{1}} x_{k_{1}, j_{2}} - x_{k_{1}, j_{1}} x_{k_{2}, j_{2}}) - (y_{s_{1}, j_{1}} y_{s_{1}, j_{2}} - y_{s_{1}, j_{1}} y_{s_{2}, j_{2}})$ is a simple unbiased estimator of σ_{x, j₁, j₂} − σ_{y, j₁, j₂}. Therefore, for a finite positive integer a, we have the U-statistic

U (a) = \sum_{1 \leq j_{1}, j_{2} \leq p} \frac{1}{P_{2 a}^{n_{x}} P_{2 a}^{n_{y}}} \sum_{\substack{1 \leq k_{1, 1} \neq k_{1, 2} \neq \dots \\ \neq k_{a, 1} \neq k_{a, 2} \leq n_{x}}} \sum_{\substack{1 \leq s_{1, 1} \neq s_{1, 2} \neq \dots \\ \neq s_{a, 1} \neq s_{a, 2} \leq n_{y}}} \prod_{t = 1}^{a} K_{j_{1}, j_{2}} (x_{k_{t, 1}}, x_{k_{t, 2}}, y_{s_{t, 1}}, y_{s_{t, 2}}) .

(4.5)
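For a = 1, (4.5) has a familiar closed form: averaging x_{k₁,j₁} x_{k₁,j₂} − x_{k₁,j₁} x_{k₂,j₂} over all ordered pairs k₁ ≠ k₂ gives the usual unbiased sample covariance, so U(1) is the sum of all entries of S_x − S_y. The sketch below checks this against a brute-force evaluation of the kernel; it assumes $P_{2}^{n} = n (n - 1)$, and the function names are illustrative.

```python
from itertools import permutations
from math import perm

import numpy as np


def u1_two_sample_cov(x, y):
    """U(1) of (4.5) as the sum of entries of S_x - S_y (unbiased sample covariances)."""
    s_x = np.cov(x, rowvar=False, ddof=1)
    s_y = np.cov(y, rowvar=False, ddof=1)
    return float(np.sum(s_x - s_y))


def u1_brute_force(x, y):
    """Direct evaluation of (4.5) with a = 1 by enumerating k1 != k2 and s1 != s2."""
    n_x, n_y = x.shape[0], y.shape[0]
    total = 0.0
    for k1, k2 in permutations(range(n_x), 2):
        # entry (j1, j2): x_{k1,j1} x_{k1,j2} - x_{k1,j1} x_{k2,j2}
        kern_x = np.outer(x[k1], x[k1]) - np.outer(x[k1], x[k2])
        for s1, s2 in permutations(range(n_y), 2):
            kern_y = np.outer(y[s1], y[s1]) - np.outer(y[s1], y[s2])
            total += np.sum(kern_x - kern_y)
    return total / (perm(n_x, 2) * perm(n_y, 2))
```

The brute-force version costs O(n_x² n_y² p²) and is meant only as a reference for small inputs.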

As in Remark 2.1, another formulation of $U (a)$ equivalent to (4.5) is

U (a) = \sum_{c = 0}^{a} \sum_{b_{1} = 0}^{c} \sum_{b_{2} = 0}^{a - c} (- 1)^{c - b_{1} + b_{2}} \sum_{1 \leq j_{1}, j_{2} \leq p} \sum_{\substack{1 \leq i_{1} \neq \dots \neq \\ i_{2 c - b_{1}} \leq n_{x}}} \sum_{\substack{1 \leq w_{1} \neq \dots \neq \\ w_{2 (a - c) - b_{2}} \leq n_{y}}} C_{n_{x}, n_{y}, a, c, b_{1}, b_{2}} \times \prod_{k = 1}^{b_{1}} (x_{i_{k}, j_{1}} x_{i_{k}, j_{2}}) \prod_{s = b_{1} + 1}^{c} x_{i_{s}, j_{1}} \prod_{t = c + 1}^{2 c - b_{1}} x_{i_{t}, j_{2}} \times \prod_{m = 1}^{b_{2}} (y_{w_{m}, j_{1}} y_{w_{m}, j_{2}}) \prod_{l = b_{2} + 1}^{a - c} y_{w_{l}, j_{1}} \prod_{q = a - c + 1}^{2 (a - c) - b_{2}} y_{w_{q}, j_{2}},

(4.6)

where $C_{n_{x}, n_{y}, a, c, b_{1}, b_{2}} = (P_{2 c - b_{1}}^{n_{x}} P_{2 (a - c) - b_{2}}^{n_{y}})^{- 1} a! ∕ \{b_{1}! (c - b_{1})! b_{2}! (a - c - b_{2})!\}$ , and (4.6) shall be used in the theoretical developments.

We next present the asymptotic results of the constructed U-statistics under the null hypothesis. Here, we assume the regularity Condition A.5 or A.6, whose details and discussions are provided in Section A.13.1 of Supplementary Material [28] due to the space limitation. We mention that Condition A.5 is a mixing-type dependence assumption similar to Condition 2.2, and Condition A.6 is a moment-type dependence assumption similar to Condition 2.2*. Particularly, Condition A.6 extends the moment assumption for second-order U-statistics in Li and Chen [45] to U-statistics of general orders; please see the detailed discussions in Section A.13.1.

Theorem 4.6. Under H₀ and Condition A.5 or A.6 in Supplementary Material [28], for finite integers {a₁, … , a_m}, $[U (a_{1}) ∕ σ (a_{1}), \dots, U (a_{m}) ∕ σ (a_{m})]^{⊺} \overset{D}{\to} N (0, I_{m})$ , where for a ∈ {a₁, … , a_m},

σ^{2} (a) = var \{U (a)\} ≃ \sum_{1 \leq j_{1}, j_{2}, j_{3}, j_{4} \leq p} a! \left\{\frac{1}{n_{x}} (Π_{j_{1}, j_{2}, j_{3}, j_{4}}^{x} - σ_{j_{1}, j_{2}} σ_{j_{3}, j_{4}}) + \frac{1}{n_{y}} (Π_{j_{1}, j_{2}, j_{3}, j_{4}}^{y} - σ_{j_{1}, j_{2}} σ_{j_{3}, j_{4}})\right\}^{a}

with $Π_{j_{1}, j_{2}, j_{3}, j_{4}}^{x} = E \{\prod_{t = 1}^{4} (x_{1, j_{t}} - μ_{j_{t}})\}$ and $Π_{j_{1}, j_{2}, j_{3}, j_{4}}^{y} = E \{\prod_{t = 1}^{4} (y_{1, j_{t}} - ν_{j_{t}})\}$ .

Theorem 4.6 provides the asymptotic independence and joint normality of the finite-order U-statistics, which are similar to Theorems 2.1, 4.1 and 4.3. To further study the power of these finite-order U-statistics, we next consider the alternative hypotheses where Σ_x ≠ Σ_y. Let $J_{0}$ be the largest subset of {1, … , p} such that σ_{x, j₁, j₂} = σ_{y, j₁, j₂} = σ_{j₁, j₂} for any j₁, $j_{2} \in J_{0}$ . We then obtain the following theorem under the regularity conditions given in Section A.14 of Supplementary Material [28].

Theorem 4.7. Under Conditions A.7 and A.8 in Supplementary Material [28], for finite integers {a₁, … , a_m}, $[[U (a_{1}) - E \{U (a_{1})\}] ∕ σ (a_{1}), \dots, [U (a_{m}) - E \{U (a_{m})\}] ∕ σ (a_{m})]^{⊺} \overset{D}{\to} N (0, I_{m})$ , where

σ^{2} (a) = var {U (a)} ≃ a! C_{κ, a} \sum_{j_{1}, j_{2}, j_{3}, j_{4} \in J_{0}} σ_{j_{1}, j_{2}}^{a} σ_{j_{3}, j_{4}}^{a},

and C_κ,a = {(κ_x − 1)/n_x + (κ_y − 1)/n_y}^a + 2(κ_x/n_x + κ_y/n_y)^a with κ_x and κ_y given in Condition A.7.

Given the asymptotic results under the alternatives, we next analyze the power of the finite-order U-statistics. By Theorem 4.7, the asymptotic power of $U (a)$ depends on $E {U (a)} ∕ \sqrt{var {U (a)}}$ . Let J_D = {(j₁, j₂) : σ_{x, j₁, j₂} ≠ σ_{y, j₁, j₂}, 1 ≤ j₁, j₂ ≤ p}, then $E {U (a)} = \sum_{(j_{1}, j_{2}) \in J_{D}} (σ_{x, j_{1}, j_{2}} - σ_{y, j_{1}, j_{2}})^{a}$ . Similar to Section 2.2, to study the relationship between the sparsity level of Σ_x − Σ_y and the power of U-statistics, we consider the case where the nonzero differences between Σ_x and Σ_y are the same. Specifically, let σ_{x, j₁, j₂} − σ_{y, j₁, j₂} = ρ for (j₁, j₂) ∈ J_D, and then $E {U (a)} = ∣ J_{D} ∣ ρ^{a}$ . Following the analysis in Section 2.2, we compare the ρ values needed by different $U (a)$ ’s to achieve $E {U (a)} ∕ \sqrt{var {U (a)}} ≃ M$ for a given constant M. In particular, for given integer a, suppose $E {U (a)} ∕ \sqrt{var {U (a)}} ≃ M$ is achieved when ρ = ρ_a. For any a ≠ b, we compare $U (a)$ and $U (b)$ following Criterion 1.

We use the following example as an illustration, where Σ_x and Σ_y satisfy the conditions of Theorem 4.7. Specifically, we assume that Σ_x = (σ_{x, j₁, j₂})_{p×p} has the diagonal elements σ_{x, j, j} = ν² and the off-diagonal elements σ_{x, j₁, j₂} = h_{∣j₁−j₂∣} ∈ (0, ν²) with h_{∣j₁−j₂∣} = Θ(ν²) when ∣j₁ − j₂∣ ≤ s, while σ_{x, j₁, j₂} = 0 when ∣j₁ − j₂∣ > s. This covers the moving average covariance structure of order s, and Σ_x is a banded matrix with bandwidth s. In addition, we assume the bandwidth s = o(p) and $p - ∣ J_{0} ∣ = o (p)$ . By the definition of $J_{0}$ , the assumption $p - ∣ J_{0} ∣ = o (p)$ implies that Σ_x and Σ_y share a large common square sub-matrix. For simplicity, we let n_x = n_y with n = n_x + n_y; a similar analysis can be applied when n_x ≠ n_y. By Theorem 4.7, $var \{U (a)\} ≃ (n ∕ 2)^{- a} a! \{2 κ_{1}^{a} + κ_{2}^{a}\} \{p ν^{2 a} + 2 \sum_{t = 1}^{s} h_{t}^{a} (p - t)\}^{2}$ , where κ₁ = κ_x + κ_y and κ₂ = κ_x + κ_y − 2. Therefore, for a given finite integer a, $E \{U (a)\} ∕ \sqrt{var \{U (a)\}} ≃ M$ holds when ρ = ρ_a defined as

ρ_{a} = \frac{(a!)^{\frac{1}{2 a}} \sqrt{κ_{1}} ν^{2}}{(n ∕ 2)^{1 ∕ 2}} \left(\frac{M p}{∣ J_{D} ∣}\right)^{1 ∕ a} \left\{2 + \left(\frac{κ_{2}}{κ_{1}}\right)^{a}\right\}^{\frac{1}{2 a}} \left\{1 + 2 \sum_{t = 1}^{s} \left(\frac{h_{t}}{ν^{2}}\right)^{a} \left(1 - \frac{t}{p}\right)\right\}^{\frac{1}{a}} .

We next compare the ρ_a’s and obtain the following proposition.

Proposition 4.2. There exists $D_{0}$ that only depends on the given κ_x, κ_y, ν², s, and h_t, t = 1, … , s, and satisfies $D_{0} = Θ (1 ∕ s^{2})$ such that:

(i) When $∣ J_{D} ∣ \geq M p ∕ \sqrt{D_{0}}$ , the minimum of ρ_a is achieved at a₀ = 1.

(ii) When $∣ J_{D} ∣ < M p ∕ \sqrt{D_{0}}$ , the minimum of ρ_a is achieved at some a₀, which increases as Mp/∣J_D∣ increases.
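The comparison behind Proposition 4.2 can also be explored numerically. Rather than coding a closed form for ρ_a, the sketch below solves $∣ J_{D} ∣ ρ^{a} = M \sqrt{var \{U (a)\}}$ directly, using the variance expression of the banded example with n_x = n_y = n/2. The numerical values in the usage note, the candidate set of orders, and the function names are illustrative assumptions; in particular, they are not checked against Conditions A.7 and A.8.

```python
from math import exp, lgamma, log


def log_var_u(a, n, p, kappa_x, kappa_y, nu2, h):
    """log of (n/2)^{-a} a! {2 k1^a + k2^a} {p nu2^a + 2 sum_t h_t^a (p - t)}^2.

    nu2 is the common diagonal value of Sigma_x; h = [h_1, ..., h_s] are the band values.
    """
    k1 = kappa_x + kappa_y
    k2 = kappa_x + kappa_y - 2.0
    band = p * nu2 ** a + 2.0 * sum(h_t ** a * (p - t) for t, h_t in enumerate(h, start=1))
    return -a * log(n / 2.0) + lgamma(a + 1) + log(2.0 * k1 ** a + k2 ** a) + 2.0 * log(band)


def rho_a_cov(a, size_jd, n, p, M, kappa_x, kappa_y, nu2, h):
    """Value of rho at which E{U(a)} / sqrt(var{U(a)}) equals M, with E{U(a)} = |J_D| rho^a."""
    log_sd = 0.5 * log_var_u(a, n, p, kappa_x, kappa_y, nu2, h)
    return exp((log(M) - log(size_jd) + log_sd) / a)


def best_order_cov(size_jd, n, p, M, kappa_x, kappa_y, nu2, h, orders=range(1, 7)):
    """Order in the candidate set with the smallest rho_a."""
    return min(orders, key=lambda a: rho_a_cov(a, size_jd, n, p, M, kappa_x, kappa_y, nu2, h))
```

As an illustration, with n = 200, p = 100, M = 5, κ_x = κ_y = 1, ν² = 1 and a single band value h₁ = 0.5, a dense difference such as `size_jd=2000` selects order 1, while a sparse one such as `size_jd=10` selects an order larger than 1.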

Proposition 4.2 is similar to Propositions 2.3 and 4.1. Following the analysis in Section 2.2, Proposition 4.2 shows that when the difference Σ_x − Σ_y is “very” dense with $∣ J_{D} ∣ \geq M p ∕ \sqrt{D_{0}}$ , $U (1)$ is the most powerful U-statistic; when Σ_x − Σ_y becomes sparser, that is, as Mp/∣J_D∣ increases, a higher-order U-statistic is more powerful; when Σ_x − Σ_y is “moderately” dense or sparse, a U-statistic of finite order a₀ > 1 would be the most powerful one.

The power analysis above shows that the power of the U-statistics varies when the alternative changes. To maintain high power across different alternatives, we can develop an adaptive testing procedure similar to that in Section 2.3. Given the asymptotic independence in Theorem 4.6, an adaptive testing procedure using the constructed $U (a)$ ’s is valid with the type I error asymptotically controlled. Also, the adaptive test achieves high power by combining the U-statistics as discussed in Section 2.3.

We provide simulation studies on two-sample covariance testing in Supplementary Material [28], Section C.3. In the simulations, we first find that the type I errors of the U-statistics and the adaptive test are well controlled under H₀. This agrees with the theoretical results under the null hypothesis in Theorem 4.6. Second, similar to the one-sample covariance testing, we find that generally when the difference Σ_x − Σ_y is sparser, a U-statistic of higher order is more powerful, and vice versa. Moreover, under moderately sparse/dense alternatives, $U (a_{0})$ with a₀ > 1 could achieve the highest power. The results are consistent with Proposition 4.2. Third, we compare the proposed adaptive test with existing methods in the literature including [6, 45, 57, 61], and find that the proposed adaptive testing procedure maintains high power across various alternatives.

Remark 4.2. Similar to Section 2, we can let $U (\infty)$ be the maximum-type test statistic in [6], and expect that a result similar to Theorem 2.3 holds under certain regularity conditions. However, as the dependence structure of two-sample covariance matrices is more complicated than in the one-sample case, it is more challenging to establish the asymptotic joint distribution of $U (\infty)$ and finite-order U-statistics. We leave this interesting problem for future study, while we find in simulations that the performance of $U (\infty)$ is similar to that of high-order U-statistics $U (a)$ ’s.

4.3. Generalized linear model.

In this section, we consider Example 3 of generalized linear models (on page 156) to show that the proposed framework can be extended to other testing problems. Similar to the results in Section 4.1, we show that the constructed U-statistics are asymptotically independent and normally distributed, and also establish the power analysis results of the U-statistics. We provide the details in Section A.16 of Supplementary Material [28]. Recently, Wu et al. [65] also discussed adaptive testing for generalized linear models. However, [65] works under the framework of a family of von Mises V-statistics, and thus is different from the current paper as discussed in Remark 4.1. Moreover, the current work provides the theoretical power analysis while [65] did not.

5. Discussion.

This paper introduces a general U-statistics framework for applications to high-dimensional adaptive testing. Particularly, we focus on the examples including testing of means, covariances and regression coefficients in generalized linear models. Under the null hypothesis, we prove that the U-statistics of finite orders have asymptotic joint normality, and establish the asymptotic mutual independence among the finite-order U-statistics and $U (\infty)$ . Moreover, under alternative hypotheses, we analyze the power of different U-statistics and demonstrate how the most powerful U-statistic changes with the sparsity level of the alternative parameters. Based on the theoretical results, we propose an adaptive testing procedure, which is powerful against different alternatives. The superior performance of this adaptive testing is confirmed in the simulations and real data analysis.

There are several possible extensions of the U-statistics framework in this paper. First, by our current proof, the convergence rate in Theorem 2.3 is bounded by $O (\log^{- 1 ∕ 2} p)$, which is an upper bound and not sharp. From our extensive simulations, we find that the type I error rate of the adaptive testing is well controlled with a relatively small p, for example, p = 50. We might obtain a sharper bound of the convergence rate, but a more refined concentration property of the high-dimensional and high-order U-statistics is needed. Second, the proposed framework requires that the elements in the parameter set $\mathcal{E}$ have unbiased estimates. When we cannot obtain unbiased estimates easily, for example, for the precision matrix, the proposed construction may not follow directly. Nevertheless we may use “nearly” unbiased estimators to construct “U-statistics” for hypothesis testing, such as the “nearly” unbiased estimator of the precision matrix proposed in [66]; the main challenge is then to control the accumulative bias over the parameters under high dimensions. Third, this paper discusses the examples where the elements in $\mathcal{E}$ are comparable. When the parameters in $\mathcal{E}$ are not comparable, such as $\mathcal{E}$ containing both mean and covariance parameters, the construction of U-statistics still follows but the theoretical derivation may require a careful case-by-case examination. Fourth, the construction of the U-statistics treats the parameters in $\mathcal{E}$ with equal weight. More generally, we could assign different weights to different parameter estimators. For instance, standardizing the data is one example of assigning different weights. As inappropriate weight assignments could lead to power loss, when the truth is unknown, how to effectively assign weights to maximize the test power is an interesting research question. We shall discuss these extensions in the future as a significant amount of additional work is still needed.

In addition to the examples in this paper, the proposed U-statistics framework can be applied to other high-dimensional hypothesis testing problems. For example, it can be applied to testing the block-diagonality of a covariance matrix, whose theoretical analysis would be similar to the considered one-sample and two-sample covariance testing problems. It can also be used to test high-dimensional regression coefficients in complex regression models other than the generalized linear models, following a similar construction based on the score functions. A key step is then to characterize the impact of nuisance parameters that are estimated under the null hypothesis, and challenges arise especially when the nuisance parameters are high dimensional. Such interesting extensions will be further explored in our follow-up studies.

Acknowledgments.

The authors thank Co-Editors, Professor Edward I. George and Professor Richard J. Samworth, an Associate Editor and three anonymous referees for their constructive comments. The authors also thank Professor Ping-Shou Zhong for sharing the code of the paper [13] and Professor Xuming He and Professor Peter Song for helpful discussions.

This research is supported by NSF Grants DMS-1711226, DMS-1712717, SES-1659328, CAREER SES-1846747, and NIH grants R01GM113250, R01GM126002, R01HL105397 and R01HL116720.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Asymptotically independent U-statistics in high-dimensional testing” (DOI: 10.1214/20-AOS1951SUPP; .pdf). This supplementary material contains the technical proofs of the main paper and additional simulations.

REFERENCES

[1] Anderson TW (2009). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
[2] Alzheimer’s Association (2018). 2018 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 14 367–429.
[3] Bai Z, Jiang D, Yao J-F and Zheng S (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Statist. 37 3822–3840. MR2572444 doi:10.1214/09-AOS694
[4] Bai Z and Saranadasa H (1996). Effect of high dimension: By an example of a two sample problem. Statist. Sinica 6 311–329. MR1399305
[5] Bickel PJ and Levina E (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227. MR2387969 doi:10.1214/009053607000000758
[6] Cai T, Liu W and Xia Y (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108 265–277. MR3174618 doi:10.1080/01621459.2012.758041
[7] Cai TT (2017). Global testing and large-scale multiple testing for high-dimensional covariance structures. Annu. Rev. Stat. Appl. 4 423–446. doi:10.1146/annurev-statistics-060116-053754
[8] Cai TT and Jiang T (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist. 39 1496–1525. MR2850210 doi:10.1214/11-AOS879
[9] Cai TT, Liu W and Xia Y (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349–372. MR3164870 doi:10.1111/rssb.12034
[10] Cai TT and Ma Z (2013). Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli 19 2359–2388. MR3160557 doi:10.3150/12-BEJ455
[11] Chen SX, Li J and Zhong P-S (2019). Two-sample and ANOVA tests for high dimensional means. Ann. Statist. 47 1443–1474. MR3911118 doi:10.1214/18-AOS1720
[12] Chen SX and Qin Y-L (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808–835. MR2604697 doi:10.1214/09-AOS716
[13] Chen SX, Zhang L-X and Zhong P-S (2010). Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 105 810–819. MR2724863 doi:10.1198/jasa.2010.tm09560
[14] Chen X (2018). Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Statist. 46 642–678. MR3782380 doi:10.1214/17-AOS1563
[15] Colantuoni C, Lipska BK, Ye T, Hyde TM, Tao R, Leek JT, Colantuoni EA, Elkahloun AG, Herman MM et al. (2011). Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature 478 519.
[16] Donoho D and Jin J (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994. MR2065195 doi:10.1214/009053604000000265
[17] Donoho D and Jin J (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statist. Sci. 30 1–25. MR3317751 doi:10.1214/14-STS506
[18] Fan J (1996). Test of significance based on wavelet thresholding and Neyman’s truncation. J. Amer. Statist. Assoc. 91 674–688. MR1395735 doi:10.2307/2291663
[19] Fan J, Han F and Liu H (2014). Challenges of big data analysis. Nat. Sci. Rev. 1 293–314.
[20] Fan J, Liao Y and Yao J (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica 83 1497–1541. MR3384226 doi:10.3982/ECTA12749
[21] Fan J, Lv J and Qi L (2011). Sparse high dimensional models in economics. Ann. Rev. Econ. 3 291–317. doi:10.1146/annurev-economics-061109-080451
[22] Frahm G (2004). Generalized elliptical distributions: Theory and applications. Ph.D. thesis, Univ. Köln.
[23] Friston KJ (2009). Modalities, modes, and models in functional neuroimaging. Science 326 399–403. doi:10.1126/science.1174521
[24] Gaetan C and Guyon X (2010). Spatial Statistics and Modeling. Springer Series in Statistics. Springer, New York. MR2569034 doi:10.1007/978-0-387-92257-7
[25] Goeman JJ, van de Geer SA and van Houwelingen HC (2006). Testing against a high dimensional alternative. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 477–493. MR2278336 doi:10.1111/j.1467-9868.2006.00551.x
[26] Gregory KB, Carroll RJ, Baladandayuthapani V and Lahiri SN (2015). A two-sample test for equality of means in high dimension. J. Amer. Statist. Assoc. 110 837–849. MR3367268 doi:10.1080/01621459.2014.934826
[27] Hall P and Jin J (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732. MR2662357 doi:10.1214/09-AOS764
[28] He Y, Xu G, Wu C and Pan W (2021). Supplement to “Asymptotically independent U-statistics in high-dimensional testing.” doi:10.1214/20-AOS1951SUPP
[29] Heinig M, Petretto E, Wallace C, Bottolo L, Rotival M, Lu H, Li Y, Sarwar R, Langley SR et al. (2010). A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk. Nature 467 460.
[30] Ho H-C and Hsing T (1996). On the asymptotic joint distribution of the sum and maximum of stationary normal random variables. J. Appl. Probab. 33 138–145. MR1371961 doi:10.2307/3215271
[31] Ho H-C and McCormick WP (1999). Asymptotic distribution of sum and maximum for Gaussian processes. J. Appl. Probab. 36 1031–1044. MR1742148 doi:10.1239/jap/1032374753
[32] Hoeffding W (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19 293–325. MR0026294 doi:10.1214/aoms/1177730196
[33] Hsing T (1995). A note on the asymptotic independence of the sum and maximum of strongly mixing stationary random variables. Ann. Probab. 23 938–947. MR1334178
[34] James B, James K and Qi Y (2007). Limit distribution of the sum and maximum from multivariate Gaussian sequences. J. Multivariate Anal. 98 517–532. MR2293012 doi:10.1016/j.jmva.2006.06.009
[35] Jansen IE, Savage JE, Watanabe K, Bryois J, Williams DM et al. (2019). Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51 404–413.
[36] Jiang T (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab. 14 865–880. MR2052906 doi:10.1214/105051604000000143
[37] Jiang T and Yang F (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Ann. Statist. 41 2029–2074. MR3127857 doi:10.1214/13-AOS1134
[38] Johnstone IM (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327. MR1863961 doi:10.1214/aos/1009210544
[39] Kanehisa M, Goto S, Furumichi M, Tanabe M and Hirakawa M (2010). KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38 D355–D360. doi:10.1093/nar/gkp896
[40] Kim J, Zhang Y and Pan W (2016). Powerful and adaptive testing for multi-trait and multi-SNP associations with GWAS and sequencing data. Genetics 203 715–731.
[41] Lan W, Luo R, Tsai C-L, Wang H and Yang Y (2015). Testing the diagonality of a large covariance matrix in a regression setting. J. Bus. Econom. Statist. 33 76–86. MR3303743 doi:10.1080/07350015.2014.923317
[42] Ledoit O and Wolf M (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 30 1081–1102. MR1926169 doi:10.1214/aos/1031689018
[43] Leung D and Drton M (2018). Testing independence in high dimensions with sums of rank correlations. Ann. Statist. 46 280–307. MR3766953 doi:10.1214/17-AOS1550
[44] Li D and Xue L (2015). Joint limiting laws for high-dimensional independence tests. ArXiv E-prints.
[45] Li J and Chen SX (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40 908–940. MR2985938 doi:10.1214/12-AOS993
[46] Liu W-D, Lin Z and Shao Q-M (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab. 18 2337–2366. MR2474539 doi:10.1214/08-AAP527
[47] Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA et al. (2009). Finding the missing heritability of complex diseases. Nature 461 747.
[48] McCormick WP and Qi Y (2000). Asymptotic distribution for the sum and maximum of Gaussian processes. J. Appl. Probab. 37 958–971. MR1808861 doi:10.1239/jap/1014843076
[49] Mosteller F and Fisher RA (1948). Questions and answers. Amer. Statist. 2 30–31.
[50] Muirhead RJ (2009). Aspects of Multivariate Statistical Theory. Wiley, New York.
[51] Paindaveine D and Van Bever G (2014). Inference on the shape of elliptical distributions based on the MCD. J. Multivariate Anal. 129 125–144. MR3215984 doi:10.1016/j.jmva.2014.04.013
[52] Pan W, Kim J, Zhang Y, Shen X and Wei P (2014). A powerful and adaptive association test for rare variants. Genetics 197 1081–1095.
[53] Péché S (2009). Universality results for the largest eigenvalues of some sample covariance matrix ensembles. Probab. Theory Related Fields 143 481–516. MR2475670 doi:10.1007/s00440-007-0133-7
[54] Peng Z and Nadarajah S (2003). On the joint limiting distribution of sums and maxima of stationary normal sequence. Theory Probab. Appl. 47 706–709.
[55] Pham TD and Tran LT (1985). Some mixing properties of time series models. Stochastic Process. Appl. 19 297–303. MR0787587 doi:10.1016/0304-4149(85)90031-6
[56] Prince M, Bryce R, Albanese E, Wimo A, Ribeiro W and Ferri CP (2013). The global prevalence of dementia: A systematic review and metaanalysis. Alzheimer’s Dement. 9 63–75.
[57] Schott JR (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput. Statist. Data Anal. 51 6535–6542. MR2408613 doi:10.1016/j.csda.2007.03.004
[58] Shao Q-M and Zhou W-X (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab. 42 623–648. MR3178469 doi:10.1214/13-AOP837
[59].Soshnikov A (2002). A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. J. Stat. Phys 108 1033–1056. MR1933444 10.1023/A:1019739414239 [DOI] [Google Scholar]
[60].Srivastava MS and Du M (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal 99 386–402. MR2396970 10.1016/j.jmva.2006.11.002 [DOI] [Google Scholar]
[61].Srivastava MS and Yanagihara H (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivariate Anal 101 1319–1329. MR2609494 10.1016/j.jmva.2009.12.010 [DOI] [Google Scholar]
[62].Srivastava R, Li P and Ruppert D (2016). RAPTT: An exact two-sample test in high dimensions using random projections. J. Comput. Graph. Statist 25 954–970. MR3533647 10.1080/10618600.2015.1062771 [DOI] [Google Scholar]
[63].Storey JD and Tibshirani R (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100 9440–9445. MR1994856 10.1073/pnas.1530509100 [DOI] [PMC free article] [PubMed] [Google Scholar]
[64].Wang L, Jia P, Wolfinger RD, Chen X and Zhao Z (2011). Gene set analysis of genome-wide association studies: Methodological issues and perspectives. Genomics 98 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
[65].Wu C, Xu G and Pan W (2019). An adaptive test on high-dimensional parameters in generalized linear models. Statist. Sinica 29 2163–2186. MR3970351 [Google Scholar]
[66].Xia Y, Cai T and Cai TT (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 102 247–266. MR3371002 10.1093/biomet/asu074 [DOI] [PMC free article] [PubMed] [Google Scholar]
[67].Xu G, Lin L, Wei P and Pan W (2016). An adaptive two-sample test for high-dimensional means. Biometrika 103 609–624. MR3551787 10.1093/biomet/asw029 [DOI] [PMC free article] [PubMed] [Google Scholar]
[68].Xu J, Murphy SL, Kochanek KD, Bastian B and Arias E (2018). Deaths: Final data for 2016. Natl. Vital Stat. Rep 67 1–76. [PubMed] [Google Scholar]
[69].Xu Z, Xu G and Pan W (2017). Adaptive testing for association between two random vectors in moderate to high dimensions. Genet. Epidemiol 41 599–609. [DOI] [PMC free article] [PubMed] [Google Scholar]
[70].Yang Q and Pan G (2017). Weighted statistic in detecting faint and sparse alternatives for high-dimensional covariance matrices. J. Amer. Statist. Assoc 112 188–200. MR3646565 10.1080/01621459.2015.1122602 [DOI] [Google Scholar]
[71].Yu K, Li Q, Bergen AW, Pfeiffer RM, Rosenberg PS, Caporaso N, Kraft P and Chatterjee N (2009). Pathway analysis by adaptive combination of p-values. Genet. Epidemiol 33 700–709. [DOI] [PMC free article] [PubMed] [Google Scholar]
[72].Zhong P-S and Chen SX (2011). Tests for high-dimensional regression coefficients with factorial designs. J. Amer. Statist. Assoc 106 260–274. MR2816719 10.1198/jasa.2011.tm10284 [DOI] [Google Scholar]
[73].Zhong P-S, Chen SX and Xu M (2013). Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence. Ann. Statist 41 2820–2851. MR3161449 10.1214/13-AOS1168 [DOI] [Google Scholar]
