. 2024 Mar 1;26(3):226. doi: 10.3390/e26030226

A Blockwise Bootstrap-Based Two-Sample Test for High-Dimensional Time Series

Lin Yang 1
Editor: Boris Ryabko
PMCID: PMC10969679  PMID: 38539738

Abstract

We propose a two-sample testing procedure for high-dimensional time series. To obtain the asymptotic distribution of our ℓ∞-type test statistic under the null hypothesis, we establish high-dimensional central limit theorems (HCLTs) for an α-mixing sequence. Specifically, we derive two HCLTs for the maximum of a sum of high-dimensional α-mixing random vectors under the assumptions of bounded finite moments and exponential tails, respectively. The proposed HCLT for α-mixing sequences under the bounded finite moments assumption is novel, and, in comparison with existing results, we improve the convergence rate of the HCLT under the exponential tails assumption. To compute the critical value, we employ the blockwise bootstrap method. Importantly, our approach does not require the independence of the two samples, making it applicable to detecting change points in high-dimensional time series. Numerical results demonstrate the effectiveness and advantages of our method.

Keywords: two-sample testing, high-dimensional time series, α-mixing, Gaussian approximation, blockwise bootstrap

1. Introduction

A fundamental testing problem in multivariate analysis involves assessing the equality of two mean vectors, denoted μX and μY. Since its introduction by [1], the Hotelling T² test has proven to be a valuable tool in multivariate analysis. Subsequently, numerous studies have addressed the testing of μX = μY within various contexts and under distinct assumptions. See refs. [2,3], along with their respective references.

Consider two sets of observations, {Xt}t=1n1 and {Yt}t=1n2, where Xt = (Xt,1,…,Xt,p)T and Yt = (Yt,1,…,Yt,p)T. These observations are drawn from two populations with means μX and μY, respectively. The classical problem is to test the hypotheses:

H0: μX = μY versus H1: μX ≠ μY. (1)

When {Xt}t=1n1 and {Yt}t=1n2 are two independent sequences that are also independent of each other, a considerable body of literature focuses on testing Hypothesis (1). The ℓ2-type test statistic corresponding to (1) is of the form (X̄ − Ȳ)T S^{-1} (X̄ − Ȳ), where X̄ = n1^{-1} ∑_{t=1}^{n1} Xt, Ȳ = n2^{-1} ∑_{t=1}^{n2} Yt and S^{-1} is the weight matrix. A straightforward choice for S^{-1} is the identity matrix Ip [4,5], implying equal weighting for each dimension. Several classical asymptotic theories have been developed based on this selection of S^{-1}. However, this choice disregards the variability in each dimension and the correlations between dimensions, resulting in suboptimal performance, particularly in the presence of heterogeneity or of correlations between dimensions. In recent decades, numerous researchers have investigated various choices for S^{-1} along with the corresponding asymptotic theories. See refs. [6,7]. In addition, some researchers have developed a framework centered on ℓ∞-type test statistics, represented as max_{j∈[p]} |(S^{-1/2}(X̄ − Ȳ))j| [8,9,10]. Extreme value theory plays a pivotal role in deriving the asymptotic behaviors of these test statistics.

However, when {Xt}t=1n1 and {Yt}t=1n2 are two weakly dependent sequences and are not independent of each other, the above methods may not work well. In this paper, we introduce an ℓ∞-type test statistic Tn := (n1n2)^{1/2}(n1+n2)^{-1/2} |X̄ − Ȳ|_∞ for testing H0 with two dependent sequences. Based on Σ, which represents the variance of (n1n2)^{1/2}(n1+n2)^{-1/2}(X̄ − Ȳ), we construct a Gaussian maximum, denoted TnG, to approximate Tn under the null hypothesis. When n1 = n2 = n, Tn can be written as |Sn|_∞, the maximum of a sum of high-dimensional weakly dependent random vectors, where Sn = n^{-1/2} ∑_{t=1}^{n} (Xt − Yt). Let TnG = |G|_∞ with G = (G1,…,Gp)T ∼ N{0, var(Sn)} and let A be a class of Borel subsets of Rp. Define

ρn(A) = sup_{A∈A} |P(Sn ∈ A) − P(G ∈ A)|.

In particular, let Amax consist of all sets of the form Amax = {(a1,…,ap)T ∈ Rp : max_{j∈[p]} |aj| ≤ x} for some x ∈ R. Then we have

ρn(Amax) = sup_{x∈R} |P(Tn ≤ x) − P(TnG ≤ x)|.

Note that ρn(Amax) is the Kolmogorov distance between Tn and TnG.

When the dimension p diverges exponentially with respect to the sample size n, several studies have focused on deriving ρn(Amax) = o(1) under a weak dependence assumption. Based on the coupling method for β-mixing sequences, ref. [11] obtained ρn(Amax) = o(1) under the β-mixing condition, contributing to the understanding of such phenomena. Ref. [12] extended the scope of the investigation to the physical dependence framework introduced by [13]. Considering three distinct types of dependence (α-mixing, m-dependence, and physical dependence measures), ref. [14] made significant strides. They established nonasymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors. Their analysis encompassed various choices of A, including hyper-rectangles, simple convex sets, and sparsely convex sets. Let Are be the class of all hyper-rectangles in Rp. Under the α-mixing scenario and some mild regularity conditions, ref. [14] showed

ρn(Are) ≲ {log(pn)}^{7/6} n^{-1/9},

hence the Gaussian approximation holds if log(pn) = o(n^{2/21}). In this paper, under conditions similar to or even weaker than those in [14], we obtain

ρn(Amax) ≲ {log(pn)}^{3/2} n^{-1/6},

which implies that the Gaussian approximation holds if log(pn) = o(n^{1/9}). Refer to Remark 1 for more details on the comparison of the convergence rates. By using the Gaussian-to-Gaussian comparison and Nazarov's inequality for p-dimensional random vectors, we can easily extend our result to ρn(Are) ≲ {log(pn)}^{3/2} n^{-1/6}. Given that our framework and numerous testing procedures rely on ℓ∞-type test statistics, we state our results for Amax. When p diverges polynomially with respect to n, to the best of our knowledge, no existing literature provides the convergence rate of ρn(Amax) for α-mixing sequences under bounded finite moments.
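The threshold log(pn) = o(n^{1/9}) follows from elementary rate algebra applied to the bound above:

```latex
\{\log(pn)\}^{3/2}\, n^{-1/6} = o(1)
\;\Longleftrightarrow\;
\log(pn) = o\bigl(n^{(1/6)\cdot(2/3)}\bigr) = o\bigl(n^{1/9}\bigr).
```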

Based on the Gaussian approximation for high-dimensional independent random vectors [15,16], we employ the coupling method for α-mixing sequences [17] and the "big-and-small" block technique to specify the convergence rate of ρn(Amax) under various divergence rates of p. For more details, refer to Theorem 1 in Section 3.1 and its proof in Appendix A. Given that Σ is typically unknown in practice, we develop a data-driven procedure based on the blockwise wild bootstrap [18] to determine the critical value for a given significance level α. The blockwise wild bootstrap method is widely used in time series analysis. See [19,20] and the references therein.

The independence between {Xt}t=1n1 and {Yt}t=1n2 is not a necessary assumption in our method. We only require that the pair sequence {(Xt,Yt)} be weakly dependent. Therefore, our method can be applied effectively to detect change points in high-dimensional time series. Further details on this application can be found in Section 4.

The rest of this paper is organized as follows. Section 2 introduces the test statistic and the blockwise bootstrap method. The convergence rates of Gaussian approximations for high-dimensional α-mixing sequences and the theoretical properties of the proposed test can be found in Section 3. In Section 4, an application to change point detection for high-dimensional time series is presented. The selection method for the tuning parameter and a simulation study investigating the numerical performance of the test are presented in Section 5. We apply the proposed method to opening price data from multiple stocks in Section 6. Section 7 discusses the results and outlines our future work. The proofs of the main results in Section 3 are detailed in Appendices A, B, C and D.

Notation: 

For any positive integer p ≥ 1, we write [p] = {1,…,p}. We use |a|_∞ = max_{j∈[p]} |aj| to denote the ℓ∞-norm of the p-dimensional vector a. Let ⌊x⌋ and ⌈x⌉ represent the greatest integer less than or equal to x and the smallest integer greater than or equal to x, respectively. For two sequences of positive numbers {an} and {bn}, we write an ≲ bn or bn ≳ an if lim sup_{n→∞} an/bn ≤ c0 for some positive constant c0. Let an ≍ bn if an ≲ bn and bn ≲ an hold simultaneously. Denote 0p = (0,…,0)T ∈ Rp. For any m×m matrix A = (aij)m×m, let |A|_∞ = max_{i,j∈[m]} |aij| and ‖A‖2 be the spectral norm of A. Additionally, denote λmin(A) as the smallest eigenvalue of A. Let 1(·) be the indicator function. For any x, y ∈ R, denote x∨y = max{x,y} and x∧y = min{x,y}. Given γ > 0, we define the function ψγ(x) := exp(x^γ) − 1 for any x > 0. For a real-valued random variable ξ, we define ‖ξ‖_{ψγ} := inf[λ > 0 : E{ψγ(|ξ|/λ)} ≤ 1]. Throughout the paper, we use c, C ∈ (0,∞) to denote two generic finite constants that do not depend on (n1, n2, p) and may differ from use to use.
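As a numerical illustration of the Orlicz norm ‖ξ‖_{ψγ} just defined, the following sketch (illustrative code, not from the paper) approximates the infimum over λ by bisection, replacing the expectation with a Monte Carlo average over a sample of ξ:

```python
import numpy as np

def orlicz_psi_norm(xs, gamma=2.0, tol=1e-6):
    """Approximate ||xi||_{psi_gamma} = inf{lam > 0 : E[exp((|xi|/lam)^gamma) - 1] <= 1}
    from a sample xs of xi (a finite-sample sketch, not the exact norm)."""
    xs = np.abs(np.asarray(xs, dtype=float))

    def feasible(lam):
        # Monte Carlo check of E{psi_gamma(|xi|/lam)} <= 1; overflow -> inf -> infeasible
        with np.errstate(over="ignore"):
            return np.mean(np.expm1((xs / lam) ** gamma)) <= 1.0

    lo, hi = 1e-8, 1.0
    while not feasible(hi):          # grow hi until the constraint holds
        hi *= 2.0
    while hi - lo > tol * hi:        # bisect down to the infimum
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
    return hi
```

For a degenerate ξ ≡ 1 and γ = 2, the constraint exp(λ^{-2}) − 1 ≤ 1 gives the exact value 1/√(log 2), which the bisection recovers.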

2. Methodology

2.1. Test Statistic and Its Gaussian Analog

Consider two weakly stationary time series {Xt, t∈Z} and {Yt, t∈Z} with Xt = (Xt,1,…,Xt,p)T and Yt = (Yt,1,…,Yt,p)T. Let μX = E(Xt) and μY = E(Yt). The primary focus is on testing the equality of the mean vectors of the two populations:

H0: μX = μY versus H1: μX ≠ μY.

Given the observations {Xt}t=1n1 and {Yt}t=1n2, the estimators of μX and μY are, respectively, μ̂X = n1^{-1} ∑_{t=1}^{n1} Xt and μ̂Y = n2^{-1} ∑_{t=1}^{n2} Yt. In this paper, we assume n1 ≍ n2 ≍ n. It is natural to consider the ℓ∞-type test statistic Tn = (n1n2)^{1/2}(n1+n2)^{-1/2} |μ̂X − μ̂Y|_∞. Write ñ = max{n1, n2}. Define two new sequences {X̃t}_{t=1}^{ñ} and {Ỹt}_{t=1}^{ñ} with

X̃t = Xt·1(1 ≤ t ≤ n1) and Ỹt = Yt·1(1 ≤ t ≤ n2).

For each t ∈ [ñ], let

Zt = {n2ñ/(n1(n1+n2))}^{1/2} X̃t − {n1ñ/(n2(n1+n2))}^{1/2} Ỹt.

Then, Tn can be rewritten as

Tn = |ñ^{-1/2} ∑_{t=1}^{ñ} Zt|_∞. (2)

We reject the null hypothesis H0 if Tn > cvα, where cvα represents the critical value at the significance level α ∈ (0,1). Determining cvα involves deriving the distribution of Tn under H0. However, due to the divergence of p in the high-dimensional scenario, obtaining this distribution is challenging. To address this challenge, we employ the Gaussian approximation theorem [15,16]. We seek a Gaussian analog, denoted TnG, satisfying the property that the Kolmogorov distance between Tn and TnG converges to zero under H0. Then, we can replace cvα by cvαG := inf{x > 0 : P(TnG > x) ≤ α}. Define a p-dimensional Gaussian vector

G ∼ N(0p, Ξñ) with Ξñ = var(ñ^{-1/2} ∑_{t=1}^{ñ} Zt). (3)

We then define the Gaussian analogue of Tn as

TnG = |G|_∞.

Proposition 1 below demonstrates that the null distribution of Tn can be effectively approximated by the distribution of TnG.
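To make the construction above concrete, here is a minimal numerical sketch (illustrative code, not the authors' implementation) computing Tn both directly from the two sample means and through the zero-padded sequence Zt, and verifying that the two expressions coincide:

```python
import numpy as np

def two_sample_stat(X, Y):
    """T_n = {n1*n2/(n1+n2)}^{1/2} * |mean(X) - mean(Y)|_inf for X: (n1,p), Y: (n2,p)."""
    n1, n2 = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return np.sqrt(n1 * n2 / (n1 + n2)) * np.max(np.abs(diff))

def build_Z(X, Y):
    """Return the (n_tilde, p) array Z_t such that T_n = |Z.sum(0)/sqrt(n_tilde)|_inf."""
    n1, n2 = X.shape[0], Y.shape[0]
    nt, p = max(n1, n2), X.shape[1]
    Xt = np.zeros((nt, p)); Xt[:n1] = X      # zero-padded X-tilde
    Yt = np.zeros((nt, p)); Yt[:n2] = Y      # zero-padded Y-tilde
    return (np.sqrt(n2 * nt / (n1 * (n1 + n2))) * Xt
            - np.sqrt(n1 * nt / (n2 * (n1 + n2))) * Yt)
```

Summing Z over t and dividing by √ñ reproduces √(n1n2/(n1+n2))(μ̂X − μ̂Y) exactly, which is the identity behind (2).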

2.2. Blockwise Bootstrap

Note that the long-run covariance matrix Ξn˜ specified in (3) is typically unknown. As a result, determining cvαG through the distribution of TnG becomes challenging. To address this challenge, we introduce a parametric bootstrap estimator for Tn using the blockwise bootstrap method [18].

For some positive constant ϑ ∈ [1/2, 1), let S ≍ ñ^{1−ϑ} and B = ⌊ñ/S⌋ be the size of each block and the number of blocks, respectively. Denote Ib = {(b−1)S+1,…,bS} for b ∈ [B−1] and IB = {(B−1)S+1,…,ñ}. Let {ϱb}b=1B be a sequence of i.i.d. standard normal random variables and ϱ = (ϱ1,…,ϱñ), where ϱt = ϱb if t ∈ Ib. Define the bootstrap estimator of Tn as

T̂nG = |ñ^{-1/2} ∑_{t=1}^{ñ} (Zt − Z̄)ϱt|_∞,

where Z̄ = ñ^{-1} ∑_{t=1}^{ñ} Zt. Based on this estimator, we define the estimated critical value cv̂α as

cv̂α := inf{x > 0 : P(T̂nG > x | E) ≤ α}, (4)

where E = {X1,…,Xn1, Y1,…,Yn2}. Then, we reject the null hypothesis H0 if Tn > cv̂α. The procedure for selecting the parameter ϑ (or the block size S) is detailed in Section 5.1. In practice, we obtain cv̂α through the following bootstrap procedure: Generate K independent sequences {ϱ(1),t}t=1ñ,…,{ϱ(K),t}t=1ñ, with each {ϱ(k),t}t=1ñ generated in the same way as {ϱt}t=1ñ. For each k ∈ [K], calculate T̂(k),nG with {ϱ(k),t}t=1ñ. Then, cv̂α is the empirical (1−α)-quantile of {T̂(1),nG,…,T̂(K),nG}, i.e., the ⌈αK⌉-th largest value among them. Here, K is the number of bootstrap replications.
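The bootstrap procedure just described can be sketched as follows (our illustrative code, not the authors'): observations within each block of size S share one standard normal multiplier, and the critical value is the empirical (1−α)-quantile of the K bootstrap statistics.

```python
import numpy as np

def blockwise_bootstrap_cv(Z, S, alpha=0.05, K=1000, rng=None):
    """Blockwise (wild) bootstrap critical value for T_n.

    Z: (n_tilde, p) array of the transformed observations Z_t.
    S: block size; alpha: significance level; K: bootstrap replications.
    """
    rng = np.random.default_rng(rng)
    nt, p = Z.shape
    Zc = Z - Z.mean(axis=0)                  # center: Z_t - Z-bar
    B = nt // S                              # number of blocks; last block absorbs remainder
    block_id = np.minimum(np.arange(nt) // S, B - 1)
    stats = np.empty(K)
    for k in range(K):
        rho_b = rng.standard_normal(B)       # one N(0,1) multiplier per block
        rho_t = rho_b[block_id]              # constant within each block
        stats[k] = np.max(np.abs(Zc.T @ rho_t)) / np.sqrt(nt)
    return np.quantile(stats, 1 - alpha)
```

The test then rejects H0 when the observed Tn exceeds the returned critical value.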

3. Theoretical Results

We employ the concept of ‘α-mixing’ to characterize the serial dependence of {(Xt,Yt)}, with the α-mixing coefficient at lag κ defined as

α(κ) := sup_r sup_{A∈F_{−∞}^{r}, B∈F_{r+κ}^{∞}} |P(A∩B) − P(A)P(B)|, (5)

where F_{−∞}^{r} and F_{r+κ}^{∞} are the σ-fields generated by {(Xt,Yt): t ≤ r} and {(Xt,Yt): t ≥ r+κ}, respectively. The sequence {(Xt,Yt)} is called α-mixing if α(κ) → 0 as κ → ∞.

3.1. Gaussian Approximation for High-Dimensional α-Mixing Sequence

To show that the Kolmogorov distance between Tn and TnG converges to zero under various divergence rates of p, we need the following central limit theorems for high-dimensional α-mixing sequences.

Theorem 1.

Let {ξt}t=1n be an α-mixing sequence of p-dimensional centered random vectors, and let {α(κ)}κ≥1 denote the α-mixing coefficients of {ξt}, defined in the same manner as (5). Write Sn = (Sn,1,…,Sn,p)T = n^{-1/2} ∑_{t=1}^{n} ξt and W = (W1,…,Wp)T ∼ N(0p, Σn) with Σn = E(SnSnT). Define

ρn = sup_{x∈R} |P(|Sn|_∞ ≤ x) − P(|W|_∞ ≤ x)|.
  • (i) 
    If max_{t∈[n]} max_{j∈[p]} E(|ξt,j|^m) ≤ C1*, α(κ) ≤ C2* κ^{−τ} and λmin(Σn) ≥ C3* for some m > 3, τ > max{2m/(m−3), 3} and constants C1*, C2*, C3* > 0, we have
    ρn ≲ p^{1/2}(log p)^{1/4} n^{−τ̃}
    provided that p = o(n^{2τ̃}), where τ̃ = τ/(11τ+12).
  • (ii) 
    If max_{t∈[n]} max_{j∈[p]} ‖ξt,j‖_{ψγ1} ≤ Mn, α(κ) ≤ C1** exp(−C2** κ^{γ2}) and min_{j∈[p]} (Σn)j,j ≥ C3** for some Mn ≥ 1, γ1 ∈ (0,2], γ2 > 0 and constants C1**, C2**, C3** > 0, we have
    ρn ≲ Mn {log(pn)}^{max{(2γ2+1)/(2γ2), 3/2}} n^{−1/6}
    provided that {log(pn)}^3 = o{n^{γ1γ2/(2γ1+2γ2−γ1γ2)}} and Mn^2 {log(pn)}^{1/γ2} = o(n^{1/3}).

Remark 1.

In scenarios where the dimension p diverges polynomially with respect to n, Theorem 1(i) represents a novel contribution to the existing literature. Moreover, if τ → ∞ (i.e., α(κ) ≲ exp(−Cκ) for some constant C > 0), we have τ̃ → 1/11, and thus ρn = o(1) if p(log p)^{1/2} = o(n^{2/11}). Compared with Theorem 1 in [14], which provides a Gaussian approximation result when p diverges exponentially with respect to n, Theorem 1(ii) offers three improvements. Firstly, all conditions of Theorem 1(ii) are equivalent to those in Theorem 1 of [14], except that we permit γ1 ∈ (0,1), thereby offering a weaker assumption that is more broadly applicable. Secondly, the convergence rate in n of n^{−1/6} in Theorem 1(ii) outperforms the rate of n^{−1/9} demonstrated in Theorem 1 of [14]. Note that the convergence rate in Theorem 1 of [14] can be rewritten as

Mn{log(pn)}^{(2γ2+1)/(2γ2)} n^{−1/6} + Mn{log(pn)}^{7/6} n^{−1/9}.

To ensure ρn = o(1), in our result it is necessary to require Mn^6 {log(pn)}^{(6γ2+3)/γ2} = o(n) when γ2 ≤ 2/3 and Mn^6 {log(pn)}^{max{(6γ2+3)/γ2, 9}} = o(n) when γ2 > 2/3, respectively. Comparatively, the basic requirements under Theorem 1 of [14] are Mn^6 {log(pn)}^{(6γ2+3)/γ2} = o(n) when γ2 ≤ 2/3 and Mn^9 {log(pn)}^{21/2} = o(n) when γ2 > 2/3, respectively. Since (6γ2+3)/γ2 < 21/2 when γ2 > 2/3, our result permits a divergence rate of p that is at least as large as that of Theorem 1 in [14].

3.2. Theoretical Properties

In order to derive the theoretical properties of Tn, the following regular assumptions are needed.

Assumption 1.

  • (i) 

For some m > 4, there exists a constant C1 > 0 s.t. max_{t∈[ñ]} max_{j∈[p]} E(|Zt,j|^m) ≤ C1.

  • (ii) 

There exists a constant C2 > 0 s.t. α(κ) ≤ C2 κ^{−τ} for some τ > 3m/(m−4).

  • (iii) 

There exists a constant C3 > 0 s.t. λmin(Ξñ) ≥ C3.

Assumption 2.

  • (i) 

There exists a constant C1 > 0 s.t. max_{t∈[ñ]} max_{j∈[p]} ‖Zt,j‖_{ψ2} ≤ C1.

  • (ii) 

There exist two constants C2, C3 > 0 s.t. α(κ) ≤ C2 exp(−C3κ).

  • (iii) 

There exists a constant C4 > 0 s.t. min_{j∈[p]} (Ξñ)j,j ≥ C4.

Remark 2.

The two mild Assumptions, 1 and 2, delineate the conditions on {(Xt,Yt)} needed to develop Gaussian approximation theories when the dimension p diverges at a polynomial and an exponential rate relative to the sample size n, respectively. Assumptions 1(i) and 1(ii) are common assumptions in multivariate time series analysis. Since n1 ≍ n2 ≍ n, if max_{t∈[n1], j∈[p]} E(|Xt,j|^m) ≤ C and max_{t∈[n2], j∈[p]} E(|Yt,j|^m) ≤ C, then Assumption 1(i) holds, as verified by the triangle inequality. Additionally, Assumption 1(iii) necessitates the strong nondegeneracy of Ξñ, a requirement commonly assumed in Gaussian approximation theories (see refs. [21,22], among others). Note that Assumption 2(iii) is implied by Assumption 1(iii); the former only necessitates the nondegeneracy of min_{j∈[p]} var(ñ^{-1/2} ∑_{t=1}^{ñ} Zt,j). We can relax Assumption 2(i) to max_{t∈[ñ]} max_{j∈[p]} ‖Zt,j‖_{ψγ} ≤ C for any γ ∈ (0,2], a standard assumption in the literature on ultra-high-dimensional data analysis. This assumption ensures subexponential upper bounds for the tail probabilities of the statistics in question when p ≫ n, as discussed in [23,24]. The sub-Gaussian requirement in Assumption 2(i) is made for the sake of simplicity. If {Xt} and {Yt} share the same tail behavior, Assumption 2(i) is satisfied automatically. Assumption 2(ii) necessitates that the α-mixing coefficients decay at an exponential rate.

Write Δn := max{n1,n2} − min{n1,n2}. Define two cases with respect to the distinct divergence rates of p as

  • Case1: {Xt}t=1n1 and {Yt}t=1n2 satisfy Assumption 1, and the dimension p satisfies p^2 log p = o{n^{4τ/(11τ+12)}} and Δn^2 log p = o(n);

  • Case2: {Xt}t=1n1 and {Yt}t=1n2 satisfy Assumption 2, and the dimension p satisfies log(pn) = o(n^{1/9}) and Δn^2 log p = o(n).

Note that Δn^2 log p = o(n) bounds the maximum difference between the two sample sizes. Proposition 1 below demonstrates that, under the aforementioned cases and H0, the Kolmogorov distance between Tn and TnG converges to zero as the sample size approaches infinity. Proposition 1 can be directly derived from Theorem 1. Note that, in the scenario where the dimension p diverges at a polynomial rate with respect to n, obtaining Proposition 1 requires only m > 3 and τ > max{2m/(m−3), 3}, an assumption weaker than Assumption 1. The more stringent restrictions m > 4 and τ > 3m/(m−4) in Assumption 1 are imposed to establish the results presented in Theorems 2 and 3.

Proposition 1.

In either Case1 or Case2, it holds under the null hypothesis H0 that

sup_{x∈R} |P(Tn ≤ x) − P(TnG ≤ x)| = o(1).

According to Proposition 1, the critical value cvα can be substituted with cvαG. However, in practical scenarios, the long-run covariance Ξn˜ defined in (3) is typically unknown. This implies that obtaining cvαG directly from the distribution of TnG is not feasible. We introduce a bootstrap method for obtaining the estimator cv^α defined in (4). In situations where the dimension p diverges at a polynomial rate relative to the sample size n, we require an additional Assumption 3 to ensure that cv^α serves as a reliable estimator for cvα. Assumption 3 places restrictions on the cumulant function, a commonly assumed criterion in time series analysis. Refer to [25,26] for examples of such assumptions in the literature.

Assumption 3.

For each i, j ∈ [p], define cum_{i,j}(h,t,s) = cov(Z̊0,i Z̊h,j, Z̊t,i Z̊s,j) − γ_{t,i,i} γ_{s−h,j,j} − γ_{s,i,j} γ_{t−h,j,i}, where γ_{h,i,j} = cov(Z0,i, Zh,j) and Z̊t,j = Zt,j − E(Zt,j). There exists a constant C4 > 0 s.t.

max_{i,j∈[p]} ∑_{h=−∞}^{∞} ∑_{t=−∞}^{∞} ∑_{s=−∞}^{∞} |cum_{i,j}(h,t,s)| < C4.

Similar to Case1 and Case2, we consider two cases corresponding to different divergence rates of the dimension p, as outlined below:

  • Case3: {Xt}t=1n1 and {Yt}t=1n2 satisfy Assumptions 1 and 3.

  • Case4: {Xt}t=1n1 and {Yt}t=1n2 satisfy Assumption 2.

Theorem 2.

In either Case3 with p log p = o[n^{min{(1−ϑ)/4, 2τ/(11τ+12)}}] and Δn^2 log p = o(n), or Case4 with log(pn) = o[n^{min{(1−ϑ)/2, ϑ/7, 1/9}}] and Δn^2 log p = o(n), it holds under H0 that sup_{x∈R} |P(Tn ≤ x) − P(T̂nG ≤ x | E)| = op(1). Moreover, it holds under H0 that

P(Tn > cv̂α) → α as n → ∞.

Theorem 3.

In either Case3 with p = o{n^{(1−ϑ)/4}} or Case4 with log(pn) = o[n^{min{ϑ/3, (1−ϑ)/2}}], if max_{j∈[p]} |μX,j − μY,j| ≳ n^{-1/2}(log p)^{1/2}, it holds that

P(Tn > cv̂α) → 1 as n → ∞.

Remark 3.

The different requirements for the divergence rates of p follow from the fact that we do not rely on the Gaussian approximation and comparison results under certain alternative hypotheses. By Theorems 2 and 3, the optimal selections of ϑ are 1/2 and 7/9 in Case3 and Case4, respectively. This implies that lim_{n→∞} P_{H0}(Tn > cv̂α) = α holds with p log p = o(n^{1/8}) in Case3 and log(pn) = o(n^{1/9}) in Case4. Under certain alternative hypotheses, lim_{n→∞} P_{H1}(Tn > cv̂α) = 1 holds with p = o(n^{1/8}) in Case3 and log(pn) = o(n^{1/9}) in Case4.

4. Application: Change Point Detection

In this section, we show that our two-sample testing procedure can be regarded as a novel method for detecting change points in high-dimensional time series. For clarity, we present the notation for the detection of a single change point, with the understanding that the procedure extends readily to the multiple change points case.

Consider a p-dimensional time series {Xt}t=1n and let μt = E(Xt). We consider the following hypothesis testing problem:

H0: μ1 = ⋯ = μn versus H1: μ1 = ⋯ = μτ0−1 ≠ μτ0 = ⋯ = μn.

Here, τ0 is the unknown change point. Let w be a positive integer such that w < min{τ0, n−τ0}. We define μ̄t = w^{-1} ∑_{l=t−w/2+1}^{t+w/2} μl, μ̄(1) = w^{-1} ∑_{l=1}^{w} μl and μ̄(2) = w^{-1} ∑_{l=n−w+1}^{n} μl. Then, for each t ∈ [3w/2, n−3w/2], define Δt,(1) = μ̄t − μ̄(1) and Δt,(2) = μ̄t − μ̄(2). Thus,

Δt,(1) = 0p if 3w/2 ≤ t ≤ τ0 − w/2; = (μ̄(2) − μ̄(1))(t + w/2 − τ0)/w if τ0 − w/2 < t ≤ τ0 + w/2; = μ̄(2) − μ̄(1) if τ0 + w/2 < t ≤ n − 3w/2;
Δt,(2) = μ̄(1) − μ̄(2) if 3w/2 ≤ t ≤ τ0 − w/2; = (μ̄(1) − μ̄(2))(τ0 + w/2 − t)/w if τ0 − w/2 < t ≤ τ0 + w/2; = 0p if τ0 + w/2 < t ≤ n − 3w/2.

Assume |μ̄(1) − μ̄(2)|_∞ = O(1), which represents the sparse signals case. Define t1(εt,(1)) = min{t ∈ [3w/2, n−3w/2] : |Δt,(1)|_∞ > εt,(1)} and t2(εt,(2)) = max{t ∈ [3w/2, n−3w/2] : |Δt,(2)|_∞ > εt,(2)} with two well-defined thresholds εt,(1), εt,(2) ≥ 0. Due to the symmetry of |Δt,(1)|_∞ and |Δt,(2)|_∞, it holds under H1 that

τ0 = {t1(εt,(1)) + t2(εt,(2))}/2.

The sample estimators of μ̄t, μ̄(1) and μ̄(2) are, respectively, μ̄̂t = w^{-1} ∑_{l=t−w/2+1}^{t+w/2} Xl, μ̄̂(1) = w^{-1} ∑_{l=1}^{w} Xl and μ̄̂(2) = w^{-1} ∑_{l=n−w+1}^{n} Xl. Based on the method proposed in Section 2, with n1 = n2 = w, we define the following two test statistics:

Twt,(1) = w^{1/2} |μ̄̂t − μ̄̂(1)|_∞ and Twt,(2) = w^{1/2} |μ̄̂t − μ̄̂(2)|_∞.

Given a significance level α > 0, we choose εt,(1) = cv1αt and εt,(2) = cv2αt, where cv1αt and cv2αt are, respectively, the (1−α)-quantiles of the distributions of Twt,(1) and Twt,(2). The estimated critical values cv̂1αt and cv̂2αt can be obtained by (4). Thus, t̂1 = min{t ∈ [3w/2, n−3w/2] : Twt,(1) > cv̂1αt} and t̂2 = max{t ∈ [3w/2, n−3w/2] : Twt,(2) > cv̂2αt}. Hence, the estimator of τ0 is given by

τ̂0 = (t̂1 + t̂2)/2.

We utilize Twt,(1) as an illustrative example to elucidate the applicability of our proposed method. Let w be an even integer. For any t ∈ [5w/2, n−3w/2], we have Twt,(1) = |w^{-1/2} ∑_{l=1}^{w} (X_{t−w/2+l} − Xl)|_∞, where the sequence {X_{t−w/2+l} − Xl}_{l=1}^{w} possesses the same weak dependence properties and similar moment/tail conditions as {Xl}_{l=1}^{n}. For t ∈ [3w/2, 5w/2−1], let {X̃l}_{l=1}^{t−w/2} be defined as X̃l = Xl when l ∈ [1, w] and X̃l = 0p when l ∈ [w+1, t−w/2]. Additionally, define {Ỹl}_{l=t−w/2+1}^{2t−w} as Ỹl = Xl when l ∈ [t−w/2+1, t+w/2] and Ỹl = 0p when l ∈ [t+w/2+1, 2t−w]. Then, Twt,(1) can be expressed as |w^{-1/2} ∑_{l=1}^{t/2−w/4} {(Ỹ_{t−w/2+l} − X̃l) + (Ỹ_{2t−w+1−l} − X̃_{t−w/2+1−l})}|_∞, and {(Ỹ_{t−w/2+l} − X̃l) + (Ỹ_{2t−w+1−l} − X̃_{t−w/2+1−l})}_{l=1}^{t/2−w/4} shares the same weak dependence properties and similar moment/tail conditions as {Xl}_{l=1}^{n}. Hence, our method can be applied to change point detection.
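The scanning estimator τ̂0 = (t̂1 + t̂2)/2 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the critical value at each t is obtained by an S = 1 multiplier bootstrap on the centered local window, and the local statistics use the two-sample scaling √(w/2) of Tn with n1 = n2 = w so that they are comparable to that bootstrap distribution.

```python
import numpy as np

def detect_change_point(X, w, alpha=0.05, K=300, rng=0):
    """Sketch of the single change point estimator tau_hat = (t1_hat + t2_hat)/2.

    Scans t in [3w/2, n - 3w/2], comparing the local window mean with the means of
    the first and last w observations via sqrt(w/2)-scaled l-infinity statistics.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    mu1 = X[:w].mean(axis=0)                 # mean of the first w observations
    mu2 = X[n - w:].mean(axis=0)             # mean of the last w observations
    t1_hat = t2_hat = None
    for t in range(3 * w // 2, n - 3 * w // 2 + 1):
        win = X[t - w // 2: t + w // 2]      # window of width w around t
        mu_t = win.mean(axis=0)
        Zc = win - mu_t                      # centered window for the bootstrap
        mult = rng.standard_normal((w, K))   # i.i.d. N(0,1) multipliers (S = 1)
        boot = np.max(np.abs(Zc.T @ mult), axis=0) / np.sqrt(w)
        cv = np.quantile(boot, 1 - alpha)    # bootstrap critical value at time t
        if t1_hat is None and np.sqrt(w / 2) * np.max(np.abs(mu_t - mu1)) > cv:
            t1_hat = t                       # first exceedance against the left mean
        if np.sqrt(w / 2) * np.max(np.abs(mu_t - mu2)) > cv:
            t2_hat = t                       # last exceedance against the right mean
    if t1_hat is None or t2_hat is None:
        return None                         # no change point detected
    return (t1_hat + t2_hat) // 2
```

With a pronounced mean shift, the first and last exceedance times bracket τ0 roughly symmetrically, so their average localizes the change point.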

The selections of w and α are crucial in this method. We will elaborate on specific choices for them in future work.

5. Simulation Study

5.1. Tuning Parameter Selection

Given the observations {Xt}t=1n1 and {Yt}t=1n2, we use the minimum volatility (MV) method proposed in [27] to select the block size S.

When the data are independent, by the multiplier bootstrap method described in [28], we set B = ñ (thus S = 1). In this case,

Ξ̂ñ = var{ñ^{-1/2} ∑_{t=1}^{ñ} (Zt − Z̄)ϱt | Z1,…,Zñ} = ñ^{-1} ∑_{b=1}^{B} {∑_{t∈Ib} (Zt − Z̄)}{∑_{t∈Ib} (Zt − Z̄)}T = ñ^{-1} ∑_{t=1}^{ñ} (Zt − Z̄)(Zt − Z̄)T

proves to be a reliable estimator of Ξñ introduced in Section 3. When the data are weakly dependent (and thus nearly independent), we expect a small value for S and a large value for B. Therefore, we recommend exploring a narrow range of S, such as S ∈ {1,…,m}, where m is a moderate integer. In our theoretical proof, the quality of the bootstrap approximation depends on how well Ξ̂ñ approximates the covariance Ξñ. The idea behind the MV method is that the conditional covariance Ξ̂ñ should exhibit stable behavior as a function of S within an appropriate range. For more comprehensive discussions of the MV method and its applications in time series analysis, we refer readers to [27,29]. For a moderately sized integer m, let S1 < S2 < ⋯ < Sm be a sequence of equally spaced candidate block sizes, and set S0 = 2S1 − S2 and Sm+1 = 2Sm − Sm−1. For each i ∈ {0,…,m+1}, let

Yj^i = ∑_{b=1}^{B(Si)} {∑_{t∈Ib} (Zt,j − Z̄j)}^2,

where j ∈ [p] and B(S) = ⌊ñ/S⌋. Then, for each i ∈ {1,…,m}, we compute

Yi = ∑_{j=1}^{p} sd({Yj^l}_{l=i−1}^{i+1}),

where sd(·) denotes the standard deviation. We then select the block size Si* with i* = argmin_{i∈{1,…,m}} Yi.
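The MV selection rule above can be sketched as follows (an illustrative implementation under our reading of the criterion, not the authors' code): for each candidate block size we accumulate the squared centered block sums per coordinate, then pick the candidate whose values are most stable across its two neighbors on the grid.

```python
import numpy as np

def mv_block_size(Z, candidates):
    """Minimum volatility (MV) selection of the block size S from equally spaced candidates."""
    Z = np.asarray(Z, dtype=float)
    nt, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    S = sorted(candidates)
    # pad the grid at both ends: S_0 = 2S_1 - S_2, S_{m+1} = 2S_m - S_{m-1}
    S = [max(1, 2 * S[0] - S[1])] + S + [2 * S[-1] - S[-2]]
    Ys = []
    for s in S:
        B = nt // s
        block_id = np.minimum(np.arange(nt) // s, B - 1)
        block_sums = np.zeros((B, p))
        np.add.at(block_sums, block_id, Zc)       # sum of Z_t - Z-bar within each block
        Ys.append((block_sums ** 2).sum(axis=0))  # Y_j^i for j = 1..p
    Ys = np.array(Ys)                             # shape (m + 2, p)
    # volatility criterion: sum over j of sd of {Y_j^l : l = i-1, i, i+1}
    crit = [Ys[i - 1:i + 2].std(axis=0, ddof=1).sum() for i in range(1, len(S) - 1)]
    return S[1 + int(np.argmin(crit))]
```

The returned value is always one of the original candidates; the padded endpoints only supply neighbors for the volatility computation.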

5.2. Simulation Settings

We present the results of a simulation study aimed at evaluating the performance of tests based on Tn, as defined in (2), in finite samples. To assess the finite-sample properties of the proposed test, we employ the following fundamental generating process: W = HA + f(a) ∈ R^{n×p}, where A ∈ R^{p×p} is the loading matrix, f(·): R → R^{n×p} is a mean-shift function that is constant in t, and the parameter a ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6} represents the distance between the null and alternative hypotheses. Additionally, H = (H1,…,Hn)T ∈ R^{n×p} with Ht = ρH_{t−1} + εt ∈ R^{p×1}, where εt are i.i.d. N(0p, Ip) and ρ ∈ {0, 0.1, 0.2}. Construct fi(a) = (m1^{(i)},…,mn^{(i)})T ∈ R^{n×p} with mt^{(i)} = (mt,1^{(i)},…,mt,p^{(i)})T for i ∈ {1,2}, where mt,j^{(1)} = aj and mt,j^{(2)} = a(1 − j/p) for each t ∈ [n] and j ∈ [p]. Then f1(·) and f2(·) represent the sparse and dense signal cases, respectively. We consider three different loading matrices A as follows:

  • (M1).

Let V = (vk,l)_{1≤k,l≤p} s.t. vk,l = 0.995^{|k−l|}, then let A = V^{1/2}.

  • (M2).

Let A = (ak,l)_{1≤k,l≤p} s.t. ak,k = 1, ak,l = 0.7 for |k−l| = 1 and ak,l = 0 otherwise.

  • (M3).

Let r = p/2.5 and V = (vk,l)_{1≤k,l≤p}, where vk,k = 1, vk,l = 0.9 for r(q−1)+1 ≤ k ≠ l ≤ rq with q = 1,…,p/r, and vk,l = 0 otherwise. Let A = V^{1/2}.

We assess the finite-sample performance of our proposed test (denoted by Yang) in comparison with the tests introduced by [5] (denoted by Dempster), [4] (denoted by BS), [6] (denoted by SD), and [8] (denoted by CLX). All tests in our simulations are conducted at the 5% significance level with 1000 Monte Carlo replications, and the number of bootstrap replications is set to 1000. We consider dimensions p ∈ {50, 200, 400, 800} and sample size pairs (n1, n2) ∈ {(200, 220), (400, 420)}.
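A sketch of the data-generating process above is given below (illustrative code; the sparse mean form did not survive extraction cleanly, so `a / j` is used as a hypothetical stand-in for m_{t,j}^{(1)}, and only the (M2) loading matrix is shown):

```python
import numpy as np

def generate_sample(n, p, rho, A, a=0.0, signal="sparse", rng=None):
    """Generate one sample W = H A + f(a): AR(1) factors H times loading A plus a mean shift."""
    rng = np.random.default_rng(rng)
    H = np.zeros((n, p))
    h = np.zeros(p)
    for t in range(n):
        h = rho * h + rng.standard_normal(p)   # H_t = rho * H_{t-1} + eps_t
        H[t] = h
    j = np.arange(1, p + 1)
    if signal == "sparse":
        m = a / j                              # placeholder sparse decay (assumption)
    else:
        m = a * (1 - j / p)                    # dense signal: a(1 - j/p)
    return H @ A + m                           # every row shifted by the same mean vector

def loading_M2(p):
    """Loading matrix (M2): unit diagonal with 0.7 on the first off-diagonals."""
    A = np.eye(p)
    idx = np.arange(p - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 0.7
    return A
```

Under H0, both samples are drawn with a = 0; under H1, one sample receives the shift f(a).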

5.3. Simulation Results

For testing the null hypothesis, {Xt} and {Yt} are generated independently, following the same process as W, with identical values of ρ and with f(a) = 0. The choice f(a) = 0 here is made for the sake of simplicity. We present only the simulation results for (M1) in the main body of the paper. The results obtained for (M2) and (M3) are analogous to those for (M1) and are detailed in Appendix E.

Table 1 presents the performance of the various methods in controlling the Type I error based on (M1). As the dimension p or the sample sizes (n1, n2) increase, the results of all methods change little, except for BS's. When ρ equals 0, so that the samples are generated from independent Gaussian distributions, both Yang's method and BS's method effectively control the Type I error at around 5%, while the control achieved by the other three methods is less satisfactory. It is noteworthy that, as ρ increases, the data generated by the AR(1) model significantly affect the other methods. In contrast, Yang's method delivers superior and more stable results as ρ grows. These comparative effects are also observable in the results based on (M2) and (M3) in Appendix E. For this reason, we compare the empirical power of the different methods only at ρ = 0.

Table 1.

The Type I error rates, expressed as percentages, were calculated by independently generated sequences {Xt}t=1n1 and {Yt}t=1n2 based on (M1). The simulations were replicated 1000 times.

(n1,n2) ρ p Yang Dempster BS SD CLX
(200,220) 0 50 5 18.5 5.8 0.9 0.3
200 5.9 16.5 6.6 0.4 0.4
400 5.4 17.4 6.2 0.2 0.3
800 4.2 13.5 6.7 0.3 0.2
0.1 50 6.5 22.8 9.3 2 1
200 6.6 22.6 9.6 1.2 0.8
400 7.4 22.9 10.4 1 0.8
800 5.8 22.5 12.4 1 1.2
0.2 50 6.8 30.2 13.8 3.1 2.5
200 7.7 29.9 14.3 2.2 2.7
400 9.3 30.5 18.2 2.2 2.4
800 7.9 33.3 21.3 3 3.2
(400,420) 0 50 5.2 17.6 6.8 1 0.5
200 5.3 17.2 6.8 0.5 0.1
400 4.6 15.1 5.7 0.3 0
800 5.2 14.2 6.3 0.3 0.4
0.1 50 5.6 22.4 9.6 1.4 1
200 6.3 22.5 9.6 1.3 0.8
400 6.1 21.4 9.7 0.8 0.8
800 6.5 23.6 12.1 0.7 1.2
0.2 50 6.7 26.9 12.8 2.5 1.9
200 7.6 29.2 14.9 2.3 2.4
400 7.6 29.4 15.1 1.5 2.9
800 8.3 36.3 21.9 2.5 3.8

Figure 1 and Figure 2 depict the empirical power of the various methods for sparse and dense signals based on (M1). Similarly, as the dimension p increases, the results of all methods show little variation, except Dempster's. However, as the sample sizes (n1, n2) increase, most methods improve. In Figure 1, it is evident that Yang's method outperforms the others significantly when the signal is sparse. Methods such as SD, BS, and Dempster rely on the ℓ2-norm of the data, aggregating signals across all dimensions for testing. This makes them less effective when the signal is sparse, i.e., when anomalies appear in only a few dimensions. CLX's approach, akin to Yang's, tests whether the largest signal is abnormal. Consequently, CLX performs better than the other three methods in scenarios with sparse signals but still falls short of Yang's method. Conversely, when the signal is dense, Figure 2 shows that all methods yield favorable results, with Dempster's method proving the most effective. Yang's method performs at a relatively high level among these methods. In contrast, CLX's method, which performs well for sparse signals, exhibits relatively lower performance for dense signals. In conclusion, the proposed method exhibits the most stable performance across all settings and performs exceptionally well on sparse signals.

Figure 1.

Figure 1

The empirical powers with sparse signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M1), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M1), f(·)=f1(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

Figure 2.

Figure 2

The empirical powers with dense signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M1), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M1), f(·)=f2(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

6. Real Data Analysis

In this section, we apply the proposed method to a dataset of stock data obtained from Bloomberg's public database. This dataset includes daily opening prices from 1 January 2018 to 31 December 2021 for 30 companies in the Consumer Discretionary Sector (CDS) and 31 companies in the Information Technology Sector (ITS), all listed in the S&P 500. The sample sizes for the years 2018, 2019, 2020, and 2021 are 251, 250, 253, and 252, respectively. The findings are presented in Table 2. For both the Consumer Discretionary and Information Technology sectors, all p-values from the tests between two consecutive years are 0. This suggests a significant variation in the average annual opening prices across different years in both sectors.

Table 2.

The p-values for testing the equality of average annual opening prices across two consecutive years in the Consumer Discretionary Sector and Information Technology Sector, respectively.

Sector of S&P 500 2018–2019 2019–2020 2020–2021
Consumer Discretionary 0 0 0
Information Technology 0 0 0

For data visualization, Figure 3 displays the average annual opening prices of the 30 companies in the CDS (left subgraph) and the 31 companies in the ITS (right subgraph) in 2018, 2019, 2020, and 2021. Both subgraphs exhibit a pattern of annual growth in the opening prices of nearly every stock. These results are well in line with the conclusions of Table 2.

Figure 3.

Figure 3

The average annual opening prices of 30 Consumer Discretionary corporations and 31 Information Technology corporations in 2018, 2019, 2020, and 2021.

7. Discussion

In this paper, we propose a two-sample test for high-dimensional time series based on the blockwise bootstrap. Our ℓ∞-type test statistic is designed to detect the largest abnormal signal across dimensions. Unlike some frameworks, we do not require independence within each observation sequence or between the two sets of observations. Instead, we rely on the weak dependence of the pair sequence {(Xt,Yt)} to ensure the asymptotic properties of our proposed method. We derive two Gaussian approximation results for the cases in which the dimension p diverges at a polynomial rate and at an exponential rate relative to the sample size n, respectively. In the bootstrap procedure, the block size serves as the tuning parameter, and we employ the minimum volatility method, as proposed by [27], for block size selection.

Our test statistic targets the maximum value across dimensions, facilitating the detection of significant differences in certain dimensions. In cases where the differences in each dimension are small, it is more appropriate to consider an ℓ2-type test statistic rather than the ℓ∞-type one. Consequently, in the absence of prior information, test statistics that combine both types prove advantageous. However, deriving theoretical results for such a combined approach remains a significant challenge. As discussed in Section 4, our two-sample testing procedure can be applied to change point detection in high-dimensional time series. The choices of w, the size of each subsample mean, and the significance level α play crucial roles in this change point detection procedure. We leave these considerations for future research.
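To make the change point application concrete, here is a hedged sketch (the scanning scheme, names, and synthetic data are illustrative assumptions, not the paper’s procedure): form non-overlapping subsample means of size w, scan candidate split points with the max-type two-sample statistic, and report the split that maximises it.

```python
import numpy as np

def scan_change_point(X, w=10):
    """Return (estimated change location, max statistic) for an (n, p)
    series X, comparing the non-overlapping subsample means of size w
    before and after each candidate split with the max-type two-sample
    statistic."""
    n = X.shape[0]
    m = n // w
    means = X[:m * w].reshape(m, w, -1).mean(axis=1)   # (m, p) block means
    best_stat, best_k = -np.inf, None
    for k in range(2, m - 1):                          # >= 2 blocks per side
        n1, n2 = k, m - k
        diff = means[:k].mean(axis=0) - means[k:].mean(axis=0)
        stat = np.sqrt(n1 * n2 / (n1 + n2)) * np.max(np.abs(diff))
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_k * w, best_stat
```

The choice of w trades off the number of candidate splits against the noise level of each subsample mean, which is exactly the tuning issue raised above.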

Appendix A. Proof of Theorem 1

Appendix A.1. Proof of Theorem 1(i)

Proof. 

We first show that, for any $\tau > (q-1)m/(m-q)$ with some $q \in [2, m]$,

$$\max_{j\in[p]}\mathbb{E}\Big|\sum_{t=1}^{n}\xi_{t,j}\Big|^{q} \lesssim n^{q/2}. \tag{A1}$$

If $q=2$, since $\sum_{\kappa=1}^{\infty}\alpha^{(m-2)/m}(\kappa) \lesssim \sum_{\kappa=1}^{\infty}\kappa^{-(m-2)\tau/m} < \infty$, Equation (1.12b) (Davydov’s inequality) of [30] yields

$$\mathbb{E}\Big(\sum_{t=1}^{n}\xi_{t,j}\Big)^{2} = \sum_{t=1}^{n}\mathbb{E}(|\xi_{t,j}|^{2}) + \sum_{t_1\neq t_2}\operatorname{cov}(\xi_{t_1,j},\xi_{t_2,j}) \lesssim n + \sum_{t_1\neq t_2}\{\mathbb{E}(|\xi_{t_1,j}|^{m})\}^{1/m}\{\mathbb{E}(|\xi_{t_2,j}|^{m})\}^{1/m}\alpha^{(m-2)/m}(|t_1-t_2|) \lesssim n + n\sum_{\kappa=1}^{n}\alpha^{(m-2)/m}(\kappa) \lesssim n \tag{A2}$$

for any $j\in[p]$. For $q>2$ and $j\in[p]$, Theorem 6.3 of [30] yields

$$\mathbb{E}\Big|\sum_{t=1}^{n}\xi_{t,j}\Big|^{q} \lesssim a_q s_{n,j}^{q} + n b_q \int_0^1 [\alpha^{-1}(u)\wedge n]^{q-1}\sup_{t\in[n]}Q_{t,j}(u)^{q}\,du,$$

where $a_q, b_q>0$ are two constants depending only on $q$, $s_{n,j}^{2}=\sum_{t_1,t_2=1}^{n}|\operatorname{Cov}(\xi_{t_1,j},\xi_{t_2,j})|$, $\alpha^{-1}(u)=\sum_{\kappa\ge 0}\mathbf{1}\{u\le\alpha(\kappa)\}$ and $Q_{t,j}(u)=\inf\{x:\mathbb{P}(|\xi_{t,j}|>x)\le u\}$. By (A2), it holds that $s_{n,j}^{q}=(s_{n,j}^{2})^{q/2}\lesssim n^{q/2}$. Since $\max_{t\in[n]}\max_{j\in[p]}\mathbb{E}(|\xi_{t,j}|^{m})\le C$, we have $\max_{t\in[n]}\max_{j\in[p]}Q_{t,j}(u)\lesssim u^{-1/m}$. By the definition of $\alpha^{-1}(\cdot)$, we know that $\alpha^{-1}(u)\lesssim u^{-1/\tau}$. Thus

$$\int_0^1[\alpha^{-1}(u)\wedge n]^{q-1}\sup_{t\in[n]}Q_{t,j}(u)^{q}\,du \lesssim \int_0^1 u^{-(q-1)/\tau - q/m}\,du \le C,$$

where the last inequality follows from $\tau>(q-1)m/(m-q)$. Hence, we have

$$\mathbb{E}\Big|\sum_{t=1}^{n}\xi_{t,j}\Big|^{q}\lesssim n^{q/2}$$

for any $j\in[p]$. Combining the above results completes the proof of (A1).

Now, we begin to prove Theorem 1(i). Define

ωn=supx>0|Pmaxj[p]Sn,jxPmaxj[p]Wjx|.

Let Sˇn=(Sˇn,1,,Sˇn,2p)T=(Sn,1,Sn,1,,Sn,p,Sn,p)T and Wˇ=(Wˇ1,,Wˇ2p)T=(W1,W1,,Wp,Wp)T. Then, we have maxj[p]|Sn,j|=maxj[2p]Sˇn,j and maxj[p]|Wj|=maxj[2p]Wˇj. Then, to obtain Theorem 1(i), without loss of generality, it suffices to specify the convergence rate of ωn.

For some constant ς(0,1), let Bn=nς and Kn=n/Bn be the number of blocks and the size of each block, respectively. For simplicity, we assume Bnnς and Kn=n/Bnn1ς. We first decompose the sequence {1,,n} into Bn blocks: Gb={(b1)Kn+1,,bKn} for b[Bn]. Let gnkn be two non-negative integers such that Kn=gn+kn. We then decompose each Gb(b[Bn]) to a “large” block Ib with length gn and a “small” block Jb with length kn: Ib={(b1)Kn+1,,bKnkn} and Jb={bKnkn+1,,bKn}. Let Hb=(Hb,1,,Hb,p)T=Kn1/2tIbξt. For each b[Bn] and some Dn, define Hb+=(Hb,1+,,Hb,p+)T with Hb,j+=Hb,j1(|Hb,j|Dn)E{Hb,j1(|Hb,j|Dn)} and Hb=(Hb,1,,Hb,p)T with Hb,j=Hb,j1(|Hb,j|>Dn)E{Hb,j1(|Hb,j|>Dn)}. For each j[p], by Theorem 2 of [17], there exists an independent sequence {H˜b,j}b=1Bn such that H˜b,j has the same distribution as Hb,j+ and

E(|H˜b,jHb,j+|)0α(kn)inf{xR:P(|Hb,j+|>x)u}du.

Due to |Hb,j+|2Dn, we have inf{xR:P(|Hb,j+|>x)u}Dn for any u0, which implies

E(|H˜b,jHb,j+|)Dnα(kn). (A3)

Define S˜n=(S˜n,1,,S˜n,p)T=Bn1/2b=1BnH˜b with S˜n,j=Bn1/2b=1BnH˜b,j and

ω˜n=supx>0|Pmaxj[p]S˜n,jxPmaxj[p]Wjx|. (A4)

For any ϵ1>0, triangle inequality implies

Pmaxj[p]Sn,jxPmaxj[p]S˜n,jx+ϵ1+P|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1Pmaxj[p]Wjx+ϵ1+ω˜n+P|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1Pmaxj[p]Wjx+Pxϵ1<maxj[p]Wjx+ϵ1+ω˜n+P|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1

for any x>0, then P(maxj[p]Sn,jx)P(maxj[p]Wjx)P(xϵ1<maxj[p]Wjx+ϵ1)+ω˜n+P(|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1). Likewise, P(maxj[p]Sn,jx)P(maxj[p]Wjx)P(xϵ1<maxj[p]Wjx+ϵ1)ω˜nP(|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1). Due to minj[p](Σn)j,jλmin(Σn)c, Lemma A.1 of [31] yields

supxRPxϵ1<maxj[p]Wjx+ϵ1ϵ1(logp)1/2

for any ϵ1>0. Thus, we can conclude that

ωnω˜n+P|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ1+ϵ1(logp)1/2. (A5)

Define Sn+=(Sn,1+,,Sn,p+)T=Bn1/2b=1BnHb+. By triangle inequality,

|maxj[p]Sn,jmaxj[p]Sn,j+|maxj[p]|Sn,jSn,j+|maxj[p]|1n1/2b=1BntJbξt,j|+maxj[p]|1Bn1/2b=1BnHb,j|.

By (A1), we have E(|Hb,j|3)C. Thus E(|Hb,j|3)E(|Hb,j|3)C, and

E(|Hb,j|2)E{|Hb,j|21(|Hb,j|>Dn)}E(|Hb,j|3)Dn1Dn1. (A6)

Similar to (A2), we have E(|b=1BntJbξt,j|)Bn1/2kn1/2 for any j[p], and

Eb=1BnHb,j2=b=1BnE(|Hb,j|2)+b1b2cov(Hb1,j,Hb2,j)BnDn1+b1b2α13kn1(|b1b2|=1)+|b2b11|Kn1(|b1b2|>1)BnDn1+|b1b2|=1α13(kn)+|b1b2|>1α13(|b1b21|Kn)BnDn1+Bnknτ3, (A7)

where the last inequality follows from τ>3. Thus, E(|b=1BnHb,j|)Bn1/2(Dn1/2+knτ/6) and

E|maxj[p]Sn,jmaxj[p]Sn,j+|pkn1/2Kn1/2+pDn1/2+pknτ/6. (A8)

Due to H˜b,j having the same distribution as Hb,j+ and |Hb,j+|2Dn, by (A3), we have E(|H˜b,jHb,j+|s)Dnsknτ for s{2,3}. Thus, following the same arguments as in the proof of (A7), it holds that

Eb=1Bn(H˜b,jHb,j+)2b=1BnDn2knτ+Dn2kn2τ/3|b1b2|=1α13(kn)+|b1b2|>1α13(|b1b21|Kn)BnDn2knτ.

Thus, E(|b=1Bn(H˜b,jHb,j+)|)Bn1/2Dnknτ/2 and

E|maxj[p]S˜n,jmaxj[p]Sn,j+|Emaxj[p]|S˜n,jSn,j+|pDnknτ/2.

Together with (A8), we have

E|maxj[p]Sn,jmaxj[p]S˜n,j|pkn1/2Kn1/2+pDn1/2+pknτ/6+pDnknτ/2.

Let ϵ1=p1/2(logp)1/4(kn1/2Kn1/2+Dn1/2+knτ/6+Dnknτ/2)1/2. It holds by (A5) and Markov inequality that

ωnω˜n+p1/2(logp)1/4kn1/2Kn1/2+1Dn1/2+1knτ/6+Dnknτ/21/2. (A9)

Define Σ˜G=Bn1b=1Bnvar(H˜b) and Δ=|ΣnΣ˜G|, where Σn=E(SnSnT). Note that

Δ=|1Bnb=1Bnvar(H˜b)var(Hb+)+1Bnb=1Bnvar(Hb+)var(Hb)+1Bnb=1Bnvar(Hb)Σn|1Bnb=1Bn|var(H˜b)var(Hb+)|Δ1+1Bnb=1Bn|var(Hb+)var(Hb)|Δ2+|1Bnb=1Bnvar(Hb)Σn|Δ3. (A10)

In what follows, we specify the convergence rates of |Δ1|, |Δ2|, and |Δ3|, respectively. Note that the (i,j)-th element of var(H˜b)var(Hb+) is E(H˜b,iH˜b,jHb,i+Hb,j+). Since H˜b,j has the same distribution as Hb,j+ and |Hb,j+|Dn for any b[Bn] and j[p], it holds by (A3) that

|E(H˜b,iH˜b,jHb,i+Hb,j+)||E{(H˜b,iHb,i+)H˜b,j}|+|E{(H˜b,jHb,j+)H˜b,i+}|Dn2knτ

for any b[Bn] and i,j[p]. Thus, we can conclude that |Δ1|Dn2knτ. The (i,j)-th element of var(Hb+)var(Hb) is E(Hb,i+Hb,j+Hb,iHb,j). Note that E(|Hb,j|)E{|Hb,j|1(|Hb,j|>Dn)}E(|Hb,j|3)Dn2Dn2. Due to Hb,j=Hb,j++Hb,j, it holds by (A6) that

|E(Hb,i+Hb,j+Hb,iHb,j)|=|E{Hb,i+Hb,j+(Hb,i++Hb,i)(Hb,j++Hb,j)}||E(Hb,i+Hb,j)|+|E(Hb,j+Hb,i)|+|E(Hb,iHb,j)|Dn1

for any b[Bn] and i,j[p]. Thus, we can conclude that |Δ2|Dn1. The (i,j)-th element of ΣnBn1b=1Bnvar(Hb) is n1t1,t2=1nE(ξt1,iξt2,j)n1b=1Bnt1,t2IbE(ξt1,iξt2,j), and

|1nt1,t2=1nE(ξt1,iξt2,j)1nb=1Bnt1,t2IbE(ξt1,iξt2,j)|=1n|b1b2EtGb1ξt,itGb2ξt,j+b=1BnEtIbξt,itJbξt,j+tJbξt,itGbξt,j|. (A11)

Similar to the proof of (A2), we have

|EtJbξt,itGbξt,j|=|tJbcov(ξt,i,ξt,j)+t1t2:t1,t2Jbcov(ξt1,i,ξt2,j)+t1Jbt2Ibcov(ξt1,i,ξt2,j)|kn+t1t2:t1,t2Jb{E(|ξt1,i|3)}13{E(|ξt2,j|3)}13α13(|t1t2|)+t1Jbt2Ib{E(|ξt1,i|3)}13{E(|ξt2,j|3)}13α13(|t1t2|)kn.

Similarly, we can also obtain

|EtIbξt,itJbξt,j|kn.

Thus,

|b=1BnEtIbξt,itJbξt,j+tJbξt,itGbξt,j|knBn.

Analogously to the proof of (A2), if b1<b2, due to τ>2m/(m3),

|EtGb1ξt,itGb2ξt,j|t1Gb1t2Gb2{E(|ξt1,i|m)}1m{E(|ξt2,i|m)}1mαm2m(|t1t2|)δ=1Knδαm2m{(b2b11)Kn+δ}1(b2b1=1)+Kn2αm2m{(b2b11)Kn}1(b2b1>1).

Then,

b1<b2|EtGb1ξt,itGb2ξt,j|Bn+BnKn2m(m2)τmδ=1Bnδ(m2)τmBn.

The same result still holds for b1>b2. Thus, we can conclude that

b1b2|EtGb1ξt,itGb2ξt,j|Bn.

Then, by (A11), it holds that

|1nt1,t2=1nE(ξt1,iξt2,j)1nb=1Bnt1,t2IbE(ξt1,iξt2,j)|knKn

for any i,j[p]. Thus, |Δ3|knKn1. By (A10), we can conclude that

|Δ||Δ1|+|Δ2|+|Δ3|Dn2knτ+1Dn+knKn.

Let {H˜bG}b=1Bn be a sequence of independent Gaussian vectors such that H˜bG=(H˜b,1G,,H˜b,pG)TN{0p,var(H˜b)} for each b[Bn], where H˜b=(H˜b,1,,H˜b,p)T. By Theorem 1.1 of [15], the Cauchy–Schwarz inequality and Jensen’s inequality,

supx>0|Pmaxj[p]1Bn1/2b=1BnH˜b,jxPmaxj[p]1Bn1/2b=1BnH˜b,jGx|p1/4·b=1BnE(|Σ˜G1/2Bn1/2H˜b|23)p1/4Bn3/2Σ˜G1/223·b=1BnEj=1pH˜b,j23/2p7/4Bn3/2Σ˜G1/223·b=1Bnmaxj[p]E(|H˜b,j|3),

where Σ˜G=Bn1b=1Bnvar(H˜b). Note that

|λmin(Σ˜G)λmin(Σn)|Δ2p|Δ|.

Due to λmin(Σn)c, we have λmin(Σ˜G)c as long as p|Δ|=o(1). Thus, if p|Δ|=o(1), we have Σ˜G1/22C. Since Hb,j=Kn1/2tIbξt,j, (A1) yields E(|H˜b,j|3)=E(|Hb,j+|3)E(|Hb,j|3)C for any b[Bn] and j[p], which implies

supx>0|Pmaxj[p]1Bn1/2b=1BnH˜b,jxPmaxj[p]1Bn1/2b=1BnH˜b,jGx|p7/4Bn1/2 (A12)

provided that p|Δ|=o(1). By Proposition 2.1 of [16], we have

supx>0|Pmaxj[p]1Bn1/2b=1BnH˜b,jGxPmaxj[p]Wjx||Δ|1/2logp. (A13)

Then, by (A4), (A12), and (A13), we have

ω˜np7/4Bn1/2+|Δ|1/2logp

provided that p|Δ|=o(1). Together with (A9),

ωnp7/4Bn1/2+|Δ|1/2logp+p1/2(logp)1/4kn1/2Kn1/2+1Dn1/2+1knτ/6+Dnknτ/21/2

provided that p|Δ|=o(1). Select Dnn4τ/(11τ+12), knn12/(11τ+12), and ς=7τ/(11τ+12). Then, if p=o{n2τ/(11τ+12)}, we have

ωnp7/4n7τ/(22τ+24)+logpn2τ/(11τ+12)+p1/2(logp)1/4nτ/(11τ+12)p1/2(logp)1/4nτ/(11τ+12).

Hence, we complete the proof of Theorem 1(i). □

Appendix A.2. Proof of Theorem 1(ii)

Proof. 

Define {(Gb,Ib,Jb)}b=1Bn, {Hb+}b=1Bn, and {Hb}b=1Bn in the same manner as in the proof of Theorem 1(i) with Bnnς, Knn1ς, knn1ς and Dn, where ς(0,1). Let

ωn=supx>0|Pmaxj[p]Sn,jxPmaxj[p]Wjx|.

Analogously to (A5), due to minj[p](Σn)j,j>c, we have

ωnω˜n+P|maxj[p]Sn,jmaxj[p]S˜n,j|>ϵ2+ϵ2(logp)1/2 (A14)

for some ϵ2>0, where S˜n,j=Bn1/2b=1BnH˜b,j with {H˜b,j} specified in the same manner as in the proof of Theorem 1(i), and

ω˜n=supx>0|Pmaxj[p]S˜n,jxPmaxj[p]Wjx|.

Define Sn+=(Sn,1+,,Sn,p+)T=Bn1/2b=1BnHb+. By triangle inequality,

|maxj[p]Sn,jmaxj[p]Sn,j+|maxj[p]|Sn,jSn,j+|maxj[p]|1n1/2b=1BntJbξt,j|+maxj[p]|1Bn1/2b=1BnHb,j|.

Note that P(|ξt,jMn1|>x)exp(Cxγ1) for any x>0. Let γ˜=(1/γ1+1/γ2)1. By Theorem 1 of [32] and Bonferroni inequality, we have

Pmaxj[p]|1n1/2b=1BntJbξt,j|>xpBnknexpCnγ˜/2xγ˜Mnγ˜+pexpCnx2Mn2Bnkn (A15)

for any xMnn1/2. Similarly, by Theorem 1 of [32] again, for any xMnKn1/2,

P(|Hb,j|>x)=P|1Kn1/2tIbξt,j|>xKnexpCKnγ˜/2xγ˜Mnγ˜+expCx2Mn2.

Then, if Dn>Mn,

E{Hb,j21(|Hb,j|>Dn)}=20DnxP(|Hb,j|>Dn)dx+2DnxP(|Hb,j|>x)dxDn2KnexpCKnγ˜/2Dnγ˜Mnγ˜+expCDn2Mn2+KnDnxexpCKnγ˜/2xγ˜Mnγ˜dx+DnxexpCx2Mn2dxDn2KnCKnγ˜/2Dnγ˜Mnγ˜+expCDn2Mn2.

Thus, for any b[Bn] and j[p],

E(|Hb,j|2)E{Hb,j21(|Hb,j|>Dn)}Dn2KnCKnγ˜/2Dnγ˜Mnγ˜+expCDn2Mn2. (A16)

Select Dn=C*Mn{log(pn)}1/2 for some sufficiently large constant C*>0. Thus, for any x0,

Pmaxj[p]|1Bn1/2b=1BnHb,j|>xpBn1/2maxj[p]maxb[Bn]E(|Hb,j|)x(pn)1x

provided that log(pn)=o{Knγ˜/(2γ˜)}. Then, by (A15), we can conclude that for any xMnn1/2,

P|maxj[p]Sn,jmaxj[p]Sn,j+|>xpBnknexpCnγ˜/2xγ˜Mnγ˜+pexpCnx2Mn2Bnkn+(pn)1x (A17)

provided that log(pn)=o{Knγ˜/(2γ˜)}. Similar to (A3), we have

E(|H˜b,jHb,j+|)Dnα(kn)Dnexp(Cknγ2). (A18)

Select kn=C**{log(pn)}1/γ2 for some sufficiently large constant C**>0. By (A18) and triangle inequality,

P|maxj[p]S˜n,jmaxj[p]Sn,j+|>xpBn1/2maxb[Bn]maxj[p]E(|H˜b,jHb,j+|)x(pn)1x

for any x0. Thus, by (A17), for any xMnn1/2,

P|maxj[p]Sn,jmaxj[p]S˜n,j|>xpBnknexpCnγ˜/2xγ˜Mnγ˜+pexpCnx2Mn2Bnkn+(pn)1x (A19)

provided that log(pn)=o{Knγ˜/(2γ˜)}. Let ϵ2=C***Mnkn1/2Kn1/2{log(pn)}1/2 for some sufficient large constant C***>0. It holds by (A14) that

ωnω˜n+Mn{log(pn)}(2γ2+1)/2γ2Kn1/2 (A20)

provided that log(pn)=o{knγ˜/(2γ˜)Bnγ˜/(2γ˜)Knγ˜/(2γ˜)}. Define Σ˜G=Bn1b=1Bnvar(H˜b) and Δ=|ΣnΣ˜G|, where Σn=E(SnSnT). Note that

Δ=|1Bnb=1Bnvar(H˜b)var(Hb+)+1Bnb=1Bnvar(Hb+)var(Hb)+1Bnb=1Bnvar(Hb)Σn|1Bnb=1Bn|var(H˜b)var(Hb+)|Δ1+1Bnb=1Bn|var(Hb+)var(Hb)|Δ2+|1Bnb=1Bnvar(Hb)Σn|Δ3. (A21)

In what follows, we will specify the convergence rates of |Δ1|, |Δ2| and |Δ3|, respectively. Note that the (i,j)-th element of var(H˜b)var(Hb+) is E(H˜b,iH˜b,jHb,i+Hb,j+). Since H˜b,j has the same distribution as Hb,j+ and |Hb,j+|Dn for any b[Bn] and j[p], it holds by (A18) that

|E(H˜b,iH˜b,jHb,i+Hb,j+)||E{(H˜b,iHb,i+)H˜b,j}|+|E{(H˜b,jHb,j+)H˜b,i+}|(pn)1

for any b[Bn] and i,j[p]. Thus, we can conclude that |Δ1|(pn)1. The (i,j)-th element of var(Hb+)var(Hb) is E(Hb,i+Hb,j+Hb,iHb,j). Due to Hb,j=Hb,j++Hb,j, then it holds by (A16) that

|E(Hb,i+Hb,j+Hb,iHb,j)|=|E{Hb,i+Hb,j+(Hb,i++Hb,i)(Hb,j++Hb,j)}||E(Hb,i+Hb,j)|+|E(Hb,j+Hb,i)|+|E(Hb,iHb,j)|(pn)1

for any b[Bn] and i,j[p] provided that log(pn)=o{Knγ˜/(2γ˜)}. Thus, we can conclude that |Δ2|(pn)1 provided that log(pn)=o{Knγ˜/(2γ˜)}. The (i,j)-th element of ΣnBn1b=1Bnvar(Hb) is n1t1,t2=1nE(ξt1,iξt2,j)n1b=1Bnt1,t2IbE(ξt1,iξt2,j), and

|1nt1,t2=1nE(ξt1,iξt2,j)1nb=1Bnt1,t2IbE(ξt1,iξt2,j)|=1n|b1b2EtGb1ξt,itGb2ξt,j+b=1BnEtIbξt,itJbξt,j+tJbξt,itGbξt,j|. (A22)

Note that E(|ξt,j|r)Mnr for any constant integer r>0. Equation (1.12b) of [30] yields

|EtJbξt,itGbξt,j|=|tJbcov(ξt,i,ξt,j)+t1t2:t1,t2Jbcov(ξt1,i,ξt2,j)+t1Jbt2Ibcov(ξt1,i,ξt2,j)|Mn2kn+t1t2:t1,t2Jb{E(|ξt1,i|3)}13{E(|ξt2,j|3)}13α13(|t1t2|)+t1Jbt2Ib{E(|ξt1,i|3)}13{E(|ξt2,j|3)}13α13(|t1t2|)Mn2kn. (A23)

Similarly, we can also obtain

|EtIbξt,itJbξt,j|Mn2kn.

Thus,

|b=1BnEtIbξt,itJbξt,j+tJbξt,itGbξt,j|Mn2knBn.

By Equation (1.12b) of [30], if b1<b2,

|EtGb1ξt,itGb2ξt,j|t1Gb1t2Gb2{E(|ξt1,i|3)}13{E(|ξt2,j|3)}13α13(|t1t2|)Mn2δ=1Knδexp[C{(b2b11)Kn+δ}γ2]Mn21(b2b1=1)+Mn2Kn2exp{C(b2b11)γ2Knγ2}1(b2b1>1).

Thus,

b1<b2|EtGb1ξt,itGb2ξt,j|Mn2Bn+Mn2Kn2b2b1=2Bn1exp{C(b2b11)γ2Knγ2}Mn2Bn.

The same result holds for b1>b2. Thus, we can conclude that

b1b2|EtGb1ξt,itGb2ξt,j|Mn2Bn.

Note that the above upper bounds do not depend on (i,j). Then by (A22), it holds that |Δ3|Mn2knKn1. By (A21), we can conclude that

|Δ|Mn2knKn (A24)

provided that log(pn)=o{Knγ˜/(2γ˜)}. Let {H˜bG}b=1Bn be a sequence of independent Gaussian vector such that H˜bG=(H˜b,1G,,H˜b,pG)TN{0p,var(H˜b)} for any b[Bn], where H˜b=(H˜b,1,,H˜b,p)T. Due to kn{log(pn)}1/γ2, we know that minj[p](Σ˜G)j,j>c provided that Mn2{log(pn)}1/γ2=o(Kn) and log(pn)=o{Knγ˜/(2γ˜)}. Due to H˜b,j2DnMn{log(pn)}1/2, it holds that E(H˜b,j4)Dn2E(H˜b,j2)Mn4log(pn) for any b[Bn] and j[p], where the last inequality follows from E(H˜b,j2)=E(|Hb,j+|2)E(Hb,j2) and the similar arguments as in the proof of (A23). By Theorem 2.1 of [16], we have

supx>0|Pmaxj[p]S˜n,jxPmaxj[p]1Bn1/2b=1BnH˜b,jGx|Mn{log(pn)}3/2Bn1/4. (A25)

provided that Mn2{log(pn)}1/γ2=o(Kn) and log(pn)=o{Knγ˜/(2γ˜)}. By Proposition 2.1 of [16] and (A24), we have

supx>0|Pmaxj[p]1Bn1/2b=1BnH˜b,jGxPmaxj[p]Wjx||Δ|1/2logpMn{log(pn)}(2γ2+1)/2γ2Kn1/2. (A26)

By (A20), (A25) and (A26), due to γ˜=(1/γ1+1/γ2)1, we have

ωnMn{log(pn)}3/2Bn1/4+Mn{log(pn)}(2γ2+1)/2γ2Kn1/2

provided that log(pn)=o{Bnγ1γ2/(γ1+2γ2γ1γ2)Knγ1γ2/(2γ1+2γ2γ1γ2)} and Mn2{log(pn)}1/γ2=o(Kn). Select ς=2/3. Then Bnn2/3, Knn1/3 and

ωnMn{log(pn)}max{(2γ2+1)/2γ2,3/2}n1/6

provided that Mn2{log(pn)}1/γ2=o(n1/3) and {log(pn)}3=o{nγ1γ2/(2γ1+2γ2γ1γ2)}. Thus we complete the proof of Theorem 1(ii). □

Appendix B. Proof of Proposition 1

Proof. 

Define

T˚n=|1n˜t=1n˜Z˚t|,

where Z˚t=ZtE(Zt). Under H0, we know that μX=μY=:μ. Recall n1n2n and Δn=n1n2n1n2. Without loss of generality, we assume n1n2. By triangle inequality, for any j[p],

|t=1n˜Z˚t,jt=1n˜Zt,j|t=1n1|n22n1(n1+n2)n1(n1+n2)|+t=n1+1n2|n1(n1+n2)|=O(Δn).

Thus |TnT˚n|=O(Δnn1/2). Write δn=Δnn1/2πn, where πn>0 diverges at a sufficiently slow rate. Thus, we have

P(Tnx)P(T˚nx+δn)+P(|TnT˚n|>δn)P(TnGx+δn)+supxR|P(T˚nx)P(TnGx)|+o(1)P(TnGx)+supxRP(xδnTnGx+δn)+supxR|P(T˚nx)P(TnGx)|+o(1).

Analogously, we can also obtain that P(Tnx)P(TnGx)supxRP(xδnTnGx+δn)supxR|P(T˚nx)P(TnGx)|o(1). Thus,

supxR|P(Tnx)P(TnGx)|supxRP(xδnTnGx+δn)+supxR|P(T˚nx)P(TnGx)|+o(1).

In Case1, by Assumption 1(iii), we have minj[p](Ξn˜)j,j>c. Then by Lemma A.1 of [31], due to Δn2logp=o(n), we have supxRP(xδnTnGx+δn)Δnn1/2πn(logp)1/2=o(1). By Assumption 1(i), we have maxt[n˜]maxj[p]E(|Z˚t,j|m)C. Note that Ξn˜=E(n˜1/2t=1n˜Z˚t,n˜1/2t=1n˜Z˚tT). Then by Assumption 1 and Theorem 1(i), due to 3m/(m4)>max{2m/(m3),3}, we have supxR|P(T˚nx)P(TnGx)|=o(1) provided that p2logp=o{n4τ/(11τ+12)}. Thus, if p2logp=o{n4τ/(11τ+12)},

supxR|P(Tnx)P(TnGx)|=o(1).

Similarly, in Case2, by Assumption 2 and Theorem 1(ii) with (Mn,γ1,γ2)=(C,2,1), we have supxRP(xδnTnGx+δn)Δnn1/2πn(logp)1/2=o(1) and supxR|P(T˚nx)P(TnGx)|=o(1) provided that log(pn)=o(n1/9). Thus, if log(pn)=o(n1/9),

supxR|P(Tnx)P(TnGx)|=o(1).

We complete the proof of Proposition 1. □

Appendix C. Proof of Theorem 2

Appendix C.1. Proof of Theorem 2 under Case3

Proof. 

By Proposition 1 under Case1, it suffices to show

supxR|P(T^nGx|E)P(TnGx)|=op(1).

Recall TnG=|G| with GN(0,Ξn˜) and T^nG=|n˜1/2t=1n˜(ZtZ¯)ϱt|, where Ξn˜=var(n˜1/2t=1n˜Zt). Let

Ξ^n˜=1n˜b=1BtIb(ZtZ¯)tIb(ZtZ¯)T. (A27)

By Proposition 2.1 of [16], we have

supxR|P(T^nGx|E)P(TnGx)|Γ1/2logp, (A28)

where

Γ=|Ξn˜Ξ^n˜|=1n˜|b=1BtIb(ZtZ¯)tIb(ZtZ¯)Tvart=1n˜Zt|.

Let Z˚t=(Z˚t,1,,Z˚t,p)T=ZtE(Zt). Then, for any i,j[p], triangle inequality yields

|b=1BtIb(Zt,iZ¯i)tIb(Zt,jZ¯j)Et=1n˜Z˚t,it=1n˜Z˚t,j|=|b=1BtIb(Z˚t,iZ˚¯i)tIb(Z˚t,jZ˚¯j)Et=1n˜Z˚t,it=1n˜Z˚t,j||b=1BtIb(Z˚t,iZ˚¯i)tIb(Z˚t,jZ˚¯j)b=1BEtIbZ˚t,itIbZ˚t,j|+|b=1BEtIbZ˚t,itIbZ˚t,jEt=1n˜Z˚t,it=1n˜Z˚t,j||b=1Bt1,t2Ib{Z˚t1,iZ˚t2,jE(Z˚t1,iZ˚t2,j)}|I1,i,j+Sn˜|t=1n˜Z˚t,it=1n˜Z˚t,j|I2,i,j+|b=1BEtIbZ˚t,itIbZ˚t,jEt=1n˜Z˚t,it=1n˜Z˚t,j|I3,i,j.

In what follows, we will specify the upper bounds of I1,i,j, I2,i,j and I3,i,j, respectively. Without loss of generality, we assume n˜=BS with Bnϑ and Sn1ϑ. By Assumption 1(i), it holds that maxt[n˜]maxj[p]E(|Z˚t,j|m)C for some m>4. Then, due to τ>3m/(m4), (A1) yields

Et1,t2Ib{Z˚t1,iZ˚t2,jE(Z˚t1,iZ˚t2,j)}2EtIbZ˚t,i2tIbZ˚t,j2S2.

By triangle inequality,

Eb=1Bt1,t2Ib{Z˚t1,iZ˚t2,jE(Z˚t1,iZ˚t2,j)}2BS2+b=1B1s=1Bb|t1,t2Ibt3,t4Ib+scov(Z˚t1,iZ˚t2,j,Z˚t3,iZ˚t4,j)|BS2+b=1B1s=1Bb|t1,t2Ibt3,t4Ib+scumi,j(t2t1,t3t1,t4t1)|+b=1B1s=1Bb|t1,t2Ibt3,t4Ib+sE(Z˚t1,iZ˚t3,i)E(Z˚t2,jZ˚t4,j)|+b=1B1s=1Bb|t1,t2Ibt3,t4Ib+sE(Z˚t1,iZ˚t4,j)E(Z˚t3,iZ˚t2,j)}|. (A29)

By Assumption 3, t1,t2Ibt3,t4Ib+scumi,j(t2t1,t3t1,t4t1)S, which implies

b=1B1s=1Bb|t1,t2Ibt3,t4Ib+scumi,j(t2t1,t3t1,t4t1)|B2S. (A30)

For any b[B1] and s[Bb], due to τ>3m/(m4), Equation (1.12b) of [30] yields

|t1Ibt3Ib+sE(Z˚t1,iZ˚t3,i)|t1Ibt3Ib+s|E(Z˚t1,iZ˚t3,i)|t1Ibt3Ib+s{E(|Zt1,i|m)}1m{E(|Zt3,i|m)}1mαm2m(t3t1)t1Ibt3Ib+sαm2m(t3t1)h=1Shαm2m{h+(s1)S}1(s=1)+S2m(m2)τm(s1)(m2)τm1(s>1).

Similarly, we also have

|t2Ibt4Ib+sE(Z˚t2,jZ˚t4,j)|1(s=1)+S2m(m2)τm(s1)(m2)τm1(s>1).

Thus,

b=1B1s=1Bb|t1,t2Ibt3,t4Ib+sE(Z˚t1,iZ˚t3,i)E(Z˚t2,jZ˚t4,j)|b=1B1s=1Bb1(s=1)+S4m2(m2)τm(s1)2(m2)τm1(s>1)B. (A31)

Analogously, we also have b=1B1s=1Bb|t1,t2Ibt3,t4Ib+sE(Z˚t1,iZ˚t4,j)E(Z˚t3,iZ˚t2,j)}|B. Combining this with (A29)–(A31), due to BS,

Eb=1Bt1,t2Ib{Z˚t1,iZ˚t2,jE(Z˚t1,iZ˚t2,j)}2B2S.

Then it holds that

I1,i,j=Op(BS1/2). (A32)

Similar to (A1), we have |(t=1n˜Z˚t,i)(t=1n˜Z˚t,j)|=Op(n). Thus, we know that

I2,i,j=Op(S). (A33)

Note that

I3,i,jb1b2|EtIb1Z˚t,itIb2Z˚t,j|.

For b1<b2, due to τ>3m/(m4), Equation (1.12b) of [30] yields

|EtIb1Z˚t,itIb2Z˚t,j|s=1Ss{E(|Zt,i|m)}1m{E(|Zt+s,j|m)}1mαm2m{s+(b2b11)S}1(b2b1=1)+S2m(m2)τm(b2b11)(m2)τm1(b2b1>1).

Thus,

b1<b2|EtIb1Z˚t,itIb2Z˚t,j|B+S2m(m2)τmb2b1>1(b2b11)(m2)τmB.

Similarly, we can also obtain b1>b2|E{(tIb1Z˚t,i)(tIb2Z˚t,j)}|B, which implies I3,i,jB. Then by (A32) and (A33), it holds that

1n˜|b=1BtIb(Zt,iZ¯i)tIb(Zt,jZ¯j)Et=1n˜Z˚t,it=1n˜Z˚t,j|=Op(S1/2).

Then, by Markov’s inequality,

Γ=|Ξn˜Ξ^n˜|=Op(p2S1/2). (A34)

By (A28), due to Sn1ϑ, it holds that

supxR|P(T^nGx|E)P(TnGx)|=op(1) (A35)

provided that plogp=o{n(1ϑ)/4}.

Recall cv^α=inf{x>0:P(T^nG>x|E)α}. For any ϵ>0, let cvα(ϵ) and cvα(ϵ) be two constants which satisfy P{TnG>cvα(ϵ)}=α+ϵ and P{TnG>cvα(ϵ)}=αϵ, respectively. We claim that for any ϵ>0, it holds that P{cvα(ϵ)<cv^α<cvα(ϵ)}1 as n. Otherwise, if cv^αcvα(ϵ), by (A35), we have

α=P(T^nG>cv^α|E)P{T^nG>cvα(ϵ)|E}=P{TnG>cvα(ϵ)}+op(1)=α+ϵ+op(1),

which is a contradiction with probability approaching one as n. Analogously, if cv^αcvα(ϵ), by (A35), we have

α=P(T^nG>cv^α|E)P{T^nG>cvα(ϵ)|E}=P{TnG>cvα(ϵ)}+op(1)=αϵ+op(1),

which is also a contradiction with probability approaching one as n.

For any ϵ>0, define the event E1,ϵ={cvα(ϵ)<cv^α<cvα(ϵ)}. Then P(E1,ϵ)1 as n. On the one hand, by Proposition 1,

P(Tn>cv^α)P(Tn>cv^α|E1,ϵ)+P(E1,ϵc)P{Tn>cvα(ϵ)}+o(1)=P{TnG>cvα(ϵ)}+o(1)=α+ϵ+o(1),

which implies that lim¯nP(Tn>cv^α)α+ϵ. On the other hand, by Proposition 1,

P(Tn>cv^α)P(Tn>cv^α|E1,ϵ)P{Tn>cvα(ϵ)}P(E1,ϵc)P{TnG>cvα(ϵ)}o(1)=αϵo(1),

which implies that lim_nP(Tn>cv^α)αϵ. Since P(Tn>cv^α) does not depend on ϵ, by letting ϵ0+, we have limnP(Tn>cv^α)=α. Thus we complete the proof of Theorem 2 under Case3. □

Appendix C.2. Proof of Theorem 2 under Case4

Proof. 

By Proposition 1 under Case2 and the arguments in Appendix C.1, it suffices to show

supxR|P(T^nGx|E)P(TnGx)|Γ1/2logp=op(1),

where

Γmaxi,j[p]n˜1|I1,i,j|+maxi,j[p]n˜1|I2,i,j|+maxi,j[p]n˜1|I3,i,j|

with I1,i,j, I2,i,j and I3,i,j specified in Appendix C.1. In what follows, we will specify the upper bounds of maxi,j[p]|I1,i,j|, maxi,j[p]|I2,i,j| and maxi,j[p]|I3,i,j|, respectively.

Without loss of generality, we assume n˜=BS with Bn˜ϑ and Sn˜1ϑ for some ϑ[1/2,1). Let Wb,i,j=t1,t2IbZ˚t1,iZ˚t2,jE(t1,t2IbZ˚t1,iZ˚t2,j). For Rn>C*S with some sufficiently large constant C*>0, denote Wb,i,j+=Wb,i,j1(|Wb,i,j|Rn)E{Wb,i,j1(|Wb,i,j|Rn)} and Wb,i,j=Wb,i,j1(|Wb,i,j|>Rn)E{Wb,i,j1(|Wb,i,j|>Rn)}. Then for some Cn>0, it holds by Bonferroni inequality that

Pmaxi,j[p]|b=1BWb,i,j|>n˜xp2maxi,j[p]P|b=1BWb,i,j+|+|b=1BWb,i,j|>n˜xp2maxi,j[p]P|b=1BWb,i,j+|>n˜xCn+P|b=1BWb,i,j|>Cn

for all x>Cnn˜1. Note that

E{Wb,i,j21(|Wb,i,j|>Rn)}=20RnxP(|Wb,i,j|>Rn)dx+2RnxP(|Wb,i,j|>x)dx

By Assumptions 2(i)–(ii) and Cauchy–Schwarz inequality, E(t1,t2IbZ˚t1,iZ˚t2,j)S. By Assumptions 2(i)–(ii) again and Theorem 1 of [32], we know that

P|tIbZ˚t,i|>xSexp(Cx2/3)+exp(CS1x2)

for any x. Thus, for any x>CS, we have

P(|Wb,i,j|>x)P|tIbZ˚t,i||tIbZ˚t,j|>CxP|tIbZ˚t,i|>Cx1/2+P|tIbZ˚t,j|>Cx1/2Sexp(Cx1/3)+exp(CS1x).

Due to Rn>CS, we can show that

E(|Wb,i,j|2)E{Wb,i,j21(|Wb,i,j|>Rn)}Rn2Sexp(CRn1/3)+Rn2exp(CS1Rn).

Selecting Rn=C**Slog(pn) for some sufficiently large constant C**>0, and CnB1/2, it holds by Markov’s inequality that

p2maxi,j[p]P|b=1BWb,i,j|>Cnp2B1/2maxi,j[p]maxb[B]E(|Wb,i,j|)=o(1)

provided that log(pn)=o(S1/2). Due to |Wb,i,j+|2Rn, by Theorem 1 of [33],

P|b=1BWb,i,j+|>n˜xCnexpCn˜2x2BRn2+Rnn˜x(logB)(loglogB)

for any x>CB1/2S1. Thus, we can conclude that

maxi,j[p]n˜1|I1,i,j|=maxi,j[p]|1n˜b=1BWb,i,j|=Op{log(pn)}3/2B1/2

provided that log(pn)=o[min{S1/2,B(lognloglogn)2}]. By Bonferroni inequality and Theorem 1 of [32], we know that

Pmaxi,j[p]n˜1|I2,i,j|>xp2maxi[p]P|t=1n˜Z˚t,i|>CS1/2nx1/2p2nexp(CS1/3n2/3x1/3)+p2exp(CS1nx)

for any xSn2. Then, we can conclude that maxi,j[p]n˜1|I2,i,j|=Op{B1log(pn)} provided that log(pn)=o(n1/2). Finally, Equation (1.12b) of [30] yields

|b=1BEtIbZ˚t,itIbZ˚t,jEt=1n˜Z˚t,it=1n˜Z˚t,j|b=1B1κ=1Bb|t1Ibt2Ib+κE(Z˚t1,iZ˚t2,j)|b=1B1κ=1Bbδ=1Bδexp[C{δ+(κ1)S}]b=1B1δ=1Bδexp(Cδ)+b=1B1κ=2BbB2exp{C(κ1)S}B+B3exp(CS)B,

which implies maxi,j[p]n˜1|I3,i,j|=O(S1). Thus,

Γ=Op{log(pn)}3/2B1/2+1S (A36)

provided that log(pn)=o[min{S1/2,B(lognloglogn)2}]. It holds that

supxR|P(T^nGx|E)P(TnGx)|Γ1/2logp=op(1)

provided that log(pn)=o[nmin{(1ϑ)/2,ϑ/7}]. The proof of the second result of Theorem 2 under Case4 is the same as in the proof of the second result of Theorem 2 under Case3. Thus, we complete the proof of Theorem 2. □

Appendix D. Proof of Theorem 3

Proof. 

Let s=Cπnp2S1/2 in Case3 and s=Cπn[B1/2{log(pn)}3/2+CS1] in Case4, where πn>0 diverges at a sufficiently slow rate. Then s=o(1) provided that p=o(S1/4) in Case3 and log(pn)=o(B1/3) in Case4. Define an event

Φ(s)=maxj[p]|(Ξ^n˜)j,j(Ξn˜)j,j1|s,

where Ξ^n˜ and Ξn˜ are specified in (A27) and (3), respectively. By (A34) and (A36) in Appendix C, we have

maxj[p]|(Ξ^n˜)j,j(Ξn˜)j,j||Ξ^n˜Ξn˜|=op(s)

holds under Case3 and Case4 with log(pn)=o(S1/2). By Assumption 1(iii) and Assumption 2(iii), we know that minj[p](Ξn˜)j,j>c holds under Case3 and Case4. Therefore,

maxj[p]|(Ξ^n˜)j,j(Ξn˜)j,j1|maxj[p]|(Ξ^n˜)j,j(Ξn˜)j,j|minj[p](Ξn˜)j,j=op(s).

Then it holds that P{Φc(s)|E}=op(1) under Case3 and Case4. Let ϱ=maxj[p](Ξn˜)j,j. Restricted on Φ(s), there exists a constant C0>0 such that

E(T^nG|E)C0(logp)1/2maxj[p](Ξ^n˜)j,j1/2(1+s)1/2C0(logp)1/2ϱ1/2.

By Borell inequality for Gaussian process,

P{T^nG>E(T^nG|E)+x|E}2expx22maxj[p](Ξ^n˜)j,j

for any x>0. Let x0=ϱ1/2(1+s)1/2[C0(logp)1/2+{2log(4/α)}1/2]. Restricted on Φ(s), we have

x0E(T^nG|E)+(2ϱ)1/2(1+s)1/2log1/24α,

which implies

P{T^nG>x0,Φ(s)|E}2exp2ϱ(1+s)log(4/α)2ϱ(1+s)=α2.

Since P{Φc(s)|E}=op(1), then P{Φc(s)|E}α/4 with probability approaching one. Hence, P(T^nG>x0|E)α with probability approaching one. Similar to the proof of (A1), we know that ϱC under Case3 and Case4. By the definition of cv^α, it holds with probability approaching one that

cv^αϱ1/2(1+s)1/2[C0(logp)1/2+{2log(4/α)}1/2](logp)1/2

under Case3 with p=o(S1/4) and Case4 with log(pn)=o(B1/3S1/2). Let μX=(μX,1,,μX,p)T=E(Xt), μY=(μY,1,,μY,p)T=E(Yt) and j0=argmaxj[p]|μX,jμY,j|, then

Tn=n1n2n1+n2|μ^Xμ^Y|n1n2n1+n2|1n1t=1n1Xt,j01n2t=1n2Yt,j0|=n1n2n1+n2|1n1t=1n1(Xt,j0μX,j0)1n2t=1n2(Yt,j0μY,j0)+μX,j0μY,j0|n1n2n1+n2|μX,j0μY,j0|n1n2n1+n2|1n1t=1n1(Xt,j0μX,j0)1n2t=1n2(Yt,j0μY,j0)|.

Similar to the proof of (A1), we know that

|1n1t=1n1(Xt,j0μX,j0)1n2t=1n2(Yt,j0μY,j0)|=Op(n1/2)

under Case3 and Case4. If |μX,j0μY,j0|n1/2, we can conclude that P(TnC*n1/2|μX,j0μY,j0|)1 as n for some constant C*>0. Due to cv^α(logp)1/2=o(n1/2|μX,j0μY,j0|) under Case3 and Case4, we have that Theorem 3 holds under Case3 and Case4. □

Appendix E. Additional Simulation Results

Table A1.

The Type I error rates, expressed as percentages, were computed from independently generated sequences {Xt}t=1n1 and {Yt}t=1n2 based on (M2). The simulations were replicated 1000 times.

(n1,n2) ρ p Yang Dempster BS SD CLX
(200,220) 0 50 4.6 2.3 6.4 4.5 3
200 3.3 0 6.7 5.8 3.4
400 3.7 0 5 4.4 4.1
800 3.3 0 6.4 5.2 4.2
0.1 50 6.2 12.4 23.4 18.4 9.8
200 5.2 1.8 43.2 39.5 12.9
400 5.8 0.1 63 59.7 14.7
800 5.3 0 88.7 87.3 19.4
0.2 50 8.1 35.6 51.3 44.3 21.9
200 7.7 23.5 87.2 85.5 37.9
400 9.2 16.9 98.5 98.3 43
800 9.6 9.8 100 100 54.5
(400,420) 0 50 4.9 1.8 5.4 3.6 3.5
200 3.5 0 6.4 5.3 3.2
400 5 0 5.7 4.8 4.4
800 4.3 0 6 5 4.5
0.1 50 6.9 12.2 21.7 17 9.3
200 4.9 1.9 41.7 39 11.1
400 7.8 0.1 63.6 61.4 17.7
800 7.3 0 87.9 86.7 18.3
0.2 50 8.6 33.7 46.9 40.7 20.6
200 7.7 23.7 86.3 84.7 31
400 9.4 17.1 99.2 99 43.6
800 9 9.5 100 100 53.2

Figure A1.

Figure A1

The empirical powers with sparse signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M2), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M2), f(·)=f1(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

Figure A2.

Figure A2

The empirical powers with dense signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M2), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M2), f(·)=f2(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

Table A2.

The Type I error rates, expressed as percentages, were computed from independently generated sequences {Xt}t=1n1 and {Yt}t=1n2 based on (M3). The simulations were replicated 1000 times.

(n1,n2) ρ p Yang Dempster BS SD CLX
(200,220) 0 50 5.7 16.8 7.7 3 1.6
200 4.3 14.9 6.9 0.9 1.6
400 3.5 14.7 7.7 0.2 1.2
800 4.2 15.4 6.9 0.2 1.7
0.1 50 7.9 25.2 13.7 5.5 5.4
200 6.2 23 12 2.7 5
400 5.5 23.3 12.5 1.2 4.2
800 6.9 24 12.9 0.7 5.5
0.2 50 8.6 33.8 21 10.7 12.8
200 7.5 32.5 19.7 5.8 13.9
400 6.9 30.4 20.2 4.4 15
800 9.3 32.3 20.7 1.7 18.4
(400,420) 0 50 5.4 13.9 6.7 1.7 1.6
200 5.1 15.5 6.4 1 1.1
400 5.3 14.1 7.1 0.8 1.3
800 4 16.2 6.3 0.1 1.3
0.1 50 6.9 21.3 10.7 4.7 5.4
200 6.6 23.1 12.5 2.7 4.9
400 7.3 22 11.4 1.7 5.9
800 6.2 23.8 12.7 0.6 5.4
0.2 50 8.2 31.8 18.2 8.6 11
200 8.2 31 19.6 5.2 13.5
400 8.7 32.6 18.9 3.9 14.7
800 7.4 35.2 21.3 1.8 17.3

Figure A3.

Figure A3

The empirical powers with sparse signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M3), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M3), f(·)=f1(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

Figure A4.

Figure A4

The empirical powers with dense signals were evaluated by independently generated sequences {Xt}t=1n1 based on (M3), f(·)=0 and ρ=0, and {Yt}t=1n2 based on (M3), f(·)=f2(·) and ρ=0. The parameter a represents the distance between the null and alternative hypotheses. The simulations were replicated 1000 times.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares no conflict of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Hotelling H. The generalization of Student’s ratio. Ann. Math. Stat. 1931;2:360–378. doi: 10.1214/aoms/1177732979.
2. Hu J., Bai Z. A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices. Sci. China Math. 2016;59:2281–2300. doi: 10.1007/s11425-016-0131-0.
3. Harrar S.W., Kong X. Recent developments in high-dimensional inference for multivariate data: Parametric, semiparametric and nonparametric approaches. J. Multivar. Anal. 2022;188:104855. doi: 10.1016/j.jmva.2021.104855.
4. Bai Z., Saranadasa H. Effect of high dimension: By an example of a two sample problem. Stat. Sin. 1996;6:311–329.
5. Dempster A.P. A high dimensional two sample significance test. Ann. Math. Stat. 1958;29:995–1010. doi: 10.1214/aoms/1177706437.
6. Srivastava M.S., Du M. A test for the mean vector with fewer observations than the dimension. J. Multivar. Anal. 2008;99:386–402. doi: 10.1016/j.jmva.2006.11.002.
7. Gregory K.B., Carroll R.J., Baladandayuthapani V., Lahiri S.N. A two-sample test for equality of means in high dimension. J. Am. Stat. Assoc. 2015;110:837–849. doi: 10.1080/01621459.2014.934826.
8. Cai T.T., Liu W., Xia Y. Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014;76:349–372.
9. Chang J., Zheng C., Zhou W.X., Zhou W. Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics. 2017;73:1300–1310. doi: 10.1111/biom.12695.
10. Xu G., Lin L., Wei P., Pan W. An adaptive two-sample test for high-dimensional means. Biometrika. 2017;103:609–624. doi: 10.1093/biomet/asw029.
11. Chernozhukov V., Chetverikov D., Kato K. Testing Many Moment Inequalities. Cemmap Working Paper No. CWP42/16. Centre for Microdata Methods and Practice (cemmap); London, UK: 2016.
12. Zhang D., Wu W.B. Gaussian approximation for high dimensional time series. Ann. Stat. 2017;45:1895–1919. doi: 10.1214/16-AOS1512.
13. Wu W.B. Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA. 2005;102:14150–14154. doi: 10.1073/pnas.0506715102.
14. Chang J., Chen X., Wu M. Central limit theorems for high dimensional dependent data. Bernoulli. 2024;30:712–742. doi: 10.3150/23-BEJ1614.
15. Raič M. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli. 2019;25:2824–2853.
16. Chernozhukov V., Chetverikov D., Kato K., Koike Y. Improved central limit theorem and bootstrap approximations in high dimensions. Ann. Stat. 2022;50:2562–2586. doi: 10.1214/22-AOS2193.
17. Peligrad M. Some remarks on coupling of dependent random variables. Stat. Probab. Lett. 2002;60:201–209. doi: 10.1016/S0167-7152(02)00318-8.
18. Künsch H.R. The jackknife and the bootstrap for general stationary observations. Ann. Stat. 1989;17:1217–1241. doi: 10.1214/aos/1176347265.
19. Liu R.Y. Bootstrap procedures under some non-i.i.d. models. Ann. Stat. 1988;16:1696–1708. doi: 10.1214/aos/1176351062.
20. Hill J.B., Li T. A global wavelet based bootstrapped test of covariance stationarity. arXiv. 2022. arXiv:2210.14086.
21. Fang X., Koike Y. High-dimensional central limit theorems by Stein’s method. Ann. Appl. Probab. 2021;31:1660–1686. doi: 10.1214/20-AAP1629.
22. Chernozhukov V., Chetverikov D., Koike Y. Nearly optimal central limit theorem and bootstrap approximations in high dimensions. Ann. Appl. Probab. 2023;33:2374–2425. doi: 10.1214/22-AAP1870.
23. Chang J., He J., Yang L., Yao Q. Modelling matrix time series via a tensor CP-decomposition. J. R. Stat. Soc. Ser. B Stat. Methodol. 2023;85:127–148. doi: 10.1093/jrsssb/qkac011.
24. Koike Y. High-dimensional central limit theorems for homogeneous sums. J. Theor. Probab. 2023;36:1–45. doi: 10.1007/s10959-022-01156-2.
25. Hörmann S., Kokoszka P. Weakly dependent functional data. Ann. Stat. 2010;38:1845–1884. doi: 10.1214/09-AOS768.
26. Zhang X. White noise testing and model diagnostic checking for functional time series. J. Econom. 2016;194:76–95. doi: 10.1016/j.jeconom.2016.04.004.
27. Politis D.N., Romano J.P., Wolf M. Subsampling. Springer Series in Statistics. Springer; Berlin/Heidelberg, Germany: 1999.
28. Chernozhukov V., Chetverikov D., Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 2013;41:2786–2819. doi: 10.1214/13-AOS1161.
29. Zhou Z. Heteroscedasticity and autocorrelation robust structural change detection. J. Am. Stat. Assoc. 2013;108:726–740. doi: 10.1080/01621459.2013.787184.
30. Rio E. Inequalities and Limit Theorems for Weakly Dependent Sequences. Lecture Notes, 3rd Cycle; France, 2013; p. 170. Available online: https://cel.hal.science/cel-00867106v2 (accessed on 8 December 2023).
  • 31.Chernozhukov V., Chetverikov D., Kato K. Central limit theorems and bootstrap in high dimensions. Ann. Probab. 2017;45:2309–2352. doi: 10.1214/16-AOP1113. [DOI] [Google Scholar]
  • 32.Merlevède F., Peligrad M., Rio E. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Relat. Fields. 2011;151:435–474. doi: 10.1007/s00440-010-0304-9. [DOI] [Google Scholar]
  • 33.Merlevède F., Peligrad M., Rio E. High Dimensional Probability V: The Luminy Volume. Volume 5. Institute of Mathematical Statistics; Waite Hill, OH, USA: 2009. Bernstein inequality and moderate deviations under strong mixing conditions; pp. 273–292. [Google Scholar]

Data Availability Statement

The data used to support the findings of this study are included within the article.


Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)