Simultaneous Inference for High-Dimensional Approximate Factor Model

Yong Wang; Xiao Guo

doi:10.3390/e22111258

. 2020 Nov 5;22(11):1258. doi: 10.3390/e22111258

Simultaneous Inference for High-Dimensional Approximate Factor Model

Yong Wang ¹, Xiao Guo ^2,^*

PMCID: PMC7712464 PMID: 33287026

Abstract

This paper studies simultaneous inference for factor loadings in the approximate factor model. We propose a test statistic based on the maximum discrepancy measure. Taking advantage of the fact that the test statistic can be approximated by the sum of the independent random variables, we develop a multiplier bootstrap procedure to calculate the critical value, and demonstrate the asymptotic size and power of the test. Finally, we apply our result to multiple testing problems by controlling the family-wise error rate (FWER). The conclusions are confirmed by simulations and real data analysis.

Keywords: high-dimensional factor model, multiple testing, multiplier bootstrap, simultaneous inference

1. Introduction

The high-dimensional factor model is becoming more and more important in different scientific areas including finance and macroeconomics. For example, the data in the World Bank contain two-hundred countries over forty years and the number of stocks can be in the thousands which is larger than or of the same order of the sample size for portfolio allocation. Due to its broad applications, much efforts have been devoted to analyzing factor model in different aspects. Examples include estimation of factors and loadings for latent factor model [1,2], covariance matrix estimation [3,4,5,6], and simultaneous inference for factor loadings of dynamic factor model [7,8], among others.

This work focuses on the simultaneous inference for the loading matrix with observed factors, which is an important issue in the analysis of approximate factor models. For example, in the study of gene expression genomics, it is commonly assumed that each gene is associated with only a few factors. For example, the authors of [9] showed that several oncogenes are related to Rb/E2F pathway rather than any other pathways. The authors of [10] also considered sparse loading matrix for gene expression data. Therefore, it is necessary to test the sparsity of the factor loadings. In the literature, some inference procedures have been proposed for latent factor models. For example, in low-dimensional setting, the authors of [11] considered testing the homogeneity assumption, i.e., the loadings associated to a factor are identical for all variables. The same testing problem has been considered by the authors of [12] in high-dimensional situation. As for observed factors, to the best of our knowledge, very limited work has been conducted.

Inference for the factor loadings with observed factors is not trivial. The approaches for latent factors cannot be directly applied to observed factors. The major difficulty is due to the high dimensionality, which poses significant challenges in deriving the asymptotic null limiting distribution of the test statistic. We propose a test statistic based on the maximum discrepancy measure. The distribution of this statistic is attractive in high-dimensional statistical inference such as model selection, simultaneous inference, and multiple testing. Examples include the works in [13,14,15,16,17], among others.

We use the multiplier bootstrap procedure to obtain the critical value of our test statistic. Based on the fact that the test statistic can be approximated by the sum of the independent random variables, we show that the proposed multiplier bootstrap method consistently approximates the null limiting distribution of the test statistic, and thus the testing procedure achieves the prespecified significance level asymptotically. There are some related works applying multiplier bootstrap method to high-dimensional inference; see in [16,18,19], among others. However, their procedures require sparsity assumption on the parameters and cannot be directly applied to factor model. Compared with the works with latent factors, we do not require homogeneity constraints or sparsity on the model and our procedure is adaptive to high-dimensional regime.

Another application of our procedure is the multiple testing problem. Combining the multiplier bootstrap method with step-down procedure proposed by [17], we show that our procedure has a strong control of the family-wise error rate (FWER). Our method is asymptotically non-conservative as compared to the Bonferroni–Holm procedure since the correlation among the test statistics has been taken into account. We also want to point out that any procedure controlling the FWER will also control the false discovery rate [20] when there exist some true discoveries.

The rest of the paper is organized as follows. In Section 2.1, we develop the multiplier bootstrap procedure for simultaneous test of parameters for a single factor and demonstrate its asymptotic level and power. In Section 2.2, we give the result of simultaneous test of parameters for multiple factors. Section 3 discusses the multiple testing problem by combining the multiplier bootstrap procedure with the step-down method proposed by [17]. Section 4 investigates the numerical performance of the proposed test by simulations. We also conduct real data analysis on portfolio risk of S&P stocks via Fama–French model in Section 5. The proofs of the main results are given in Appendix A.

Finally, we introduce some notation. For set S, let $| S |$ denote the cardinality of S. Let $0_{p} \in R^{p}$ be the vector of zeros. For $p \times p$ matrix $A = {(a_{i j})}_{i, j = 1}^{p}$ , denote by $λ_{\min} (A)$ and $λ_{\max} (A)$ the minimum and maximum eigenvalues of $A$ , respectively. The matrix element-wise maximum norm and $L_{2}$ norm are defined as ${∥ A ∥}_{\infty} = \max_{1 \leq i, j \leq p} | a_{i j} |$ and $∥ A ∥ = λ_{\max}^{1 / 2} (A^{'} A)$ , respectively. For $a = {(a_{1}, \dots, a_{p})}^{'} \in R^{p}$ and $q > 0$ , denote by ${∥ a ∥}_{q} = (\sum_{i = 1}^{p} | a_{i} {|^{q})}^{1 / q}$ and ${∥ a ∥}_{\infty} = \max_{1 \leq j \leq p} | a_{j} |$ . Let $v_{i} \in R^{K}$ be the ith column of the $K \times K$ identity matrix. We write $a_{t} ≲ b_{t}$ if $a_{t}$ is smaller than or equal to $b_{t}$ up to a universal positive constant. For $a, b \in R$ , we write $a \lor b = \max {a, b}$ . For two sets, A and B, $A ⊖ B$ denotes their symmetric difference, that is, $A ⊖ B = (A ∖ B) \cup (B ∖ A)$ .

2. Methodology

2.1. Simultaneous Test for a Single Factor

We consider the factor model defined as follows,

y_{i t} = b_{i}^{'} f_{t} + u_{i t}, i = 1, \dots, p and t = 1, \dots, T,

(1)

where $y_{i t}$ is the observed response for the ith variable at time t, $b_{i} \in R^{K}$ is the unknown vector of factor loadings, $f_{t} \in R^{K}$ is the observed vector of common factors, and $u_{i t}$ is the latent error. Here, K is a fixed integer denoting the number of factors, p is the number of variables, and T denotes the sample size. Model (1) is commonly used in finance and macroeconomics, see, e.g., in [3,4,21], among others.

Denote by $B = {(b_{1}, \dots, b_{p})}^{'}$ , $y_{t} = {(y_{1 t}, \dots, y_{p t})}^{'}$ and $u_{t} = {(u_{1 t}, \dots, u_{p t})}^{'}$ , then model (1) can be re-expressed as

y_{t} = B f_{t} + u_{t} .

(2)

We first focus on testing the coefficients $b_{i k} = b_{i}^{'} v_{k}$ corresponding to a single factor, i.e., the kth factor. Specifically, we consider the following simultaneous testing problem that given $k = 1, \dots, K$ ,

H_{0, G} : b_{i k} = b_{i k}^{null} for all i \in G versus H_{1, G} : b_{i k} \neq b_{i k}^{null} for some i \in G,

(3)

where G is a subset of ${1, \dots, p}$ and $b_{i k}^{null}$ are prespecified values. For example, if $b_{i k}^{null}$ are 0, then the hypotheses are able to test whether the variables with indices in G are significantly associated with the kth factor. Throughout the paper, $| G |$ is allowed to grow as fast as p, which may have exponential growth with T as in Assumption 3.

The ordinary least squares (OLS) estimator $\hat{B} = Y^{'} F {(F^{'} F)}^{- 1}$ is applied to estimate $B$ , where $Y = {(y_{1}, \dots, y_{T})}^{'}$ and $F = {(f_{1}, \dots, f_{T})}^{'}$ . Therefore,

\hat{B} - B = (\sum_{t = 1}^{T} u_{t} f_{t}^{'}) {(\sum_{t = 1}^{T} f_{t} f_{t}^{'})}^{- 1} .

(4)

We propose the following test statistic for $H_{0, G}$ ,

M_{T, k} = max_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k}^{null} |,

where ${({\hat{b}}_{i k})}_{i \leq p, k \leq K} = \hat{B}$ . For each $i \in G$ , the asymptotic normality of ${\hat{b}}_{i k}$ is straightforward due to the central limit theorem. However, when $| G |$ diverges with p, it is very challenging to demonstrate the existence of the limiting distribution of $M_{T, k}$ . In order to approximate the asymptotic distribution of $M_{T, k}$ , we will use the multiplier bootstrap method. From (4), we know

\sqrt{T} ({\hat{b}}_{i k} - b_{i k}) = \frac{1}{\sqrt{T}} \sum_{t = 1}^{T} u_{i t} f_{t}^{'} {\hat{Ω}}_{f} v_{k} = \frac{1}{\sqrt{T}} \sum_{t = 1}^{T} {\hat{ξ}}_{i t},

(5)

where ${\hat{ξ}}_{i t} = u_{i t} f_{t}^{'} {\hat{Ω}}_{f} v_{k}$ and ${\hat{Ω}}_{f} = {(\sum_{t = 1}^{T} f_{t} f_{t}^{'} / T)}^{- 1}$ .

In order to apply the multiplier bootstrap procedure, we need to approximate $\sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T}$ by sum of independent random variables. As ${\hat{Ω}}_{f}$ is consistent for $Ω_{f} = {E (f_{t} f_{t}^{'})}^{- 1}$ , we can replace the former with the latter in ${\hat{ξ}}_{i t}$ , and define $ξ_{i t} = u_{i t} f_{t}^{'} Ω_{f} v_{k}$ . Then, for each $i \in G$ , ${ξ_{i t}}_{t \geq 1}$ are i.i.d. and $\sum_{t = 1}^{T} ξ_{i t} / \sqrt{T}$ well approximates $\sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T}$ .

We then apply the multiplier bootstrap procedure to approximate the distribution of ${max}_{i \in G} | \sum_{t = 1}^{T} ξ_{i t} | / \sqrt{T}$ . Denote by $Σ_{u} = {(σ_{i j})}_{p \times p}$ the covariance matrix of $u_{t}$ , and hence $cov (ξ_{i t}, ξ_{j t}) = Ω_{f} (k, k) σ_{i j}$ , where $Ω_{f} (k, k) = v_{k}^{'} Ω_{f} v_{k}$ . We know that ${\hat{Ω}}_{f} (k, k) = v_{k}^{'} {\hat{Ω}}_{f} v_{k}$ is $\sqrt{T}$ -consistent for $Ω_{f} (k, k)$ . To estimate $σ_{i j}$ , we first calculate the residuals

{\hat{u}}_{i t} = y_{i t} - {\hat{b}}_{i}^{'} f_{t} .

Denote by ${\hat{u}}_{t} = {({\hat{u}}_{1 t}, \dots, {\hat{u}}_{p t})}^{'}$ , then the error covariance matrix is estimated by

{\hat{Σ}}_{u} = \frac{1}{T} \sum_{t = 1}^{T} {\hat{u}}_{t} {\hat{u}}_{t}^{'} = {({\hat{σ}}_{i j})}_{i \leq p, j \leq p} .

Let ${e_{t}}_{t = 1}^{T}$ , a sequence of i.i.d. $N (0, 1)$ independent of ${y_{t}, f_{t}}_{t = 1}^{T}$ , be the multiplier random variables. Then the multiplier bootstrap statistic is defined as

W_{T, k} = max_{i \in G} T^{- 1 / 2} \sqrt{{\hat{Ω}}_{f} (k, k)} | \sum_{t = 1}^{T} {\hat{u}}_{i t} e_{t} | .

Conditioning on ${y_{t}, f_{t}}_{t = 1}^{T}$ , the covariance of $T^{- 1 / 2} \sum_{t = 1}^{T} \sqrt{{\hat{Ω}}_{f} (k, k)} {\hat{u}}_{i t} e_{t}$ and $T^{- 1 / 2} \sum_{t = 1}^{T} \sqrt{{\hat{Ω}}_{f} (k, k)} {\hat{u}}_{j t} e_{t}$ is ${\hat{Ω}}_{f} (k, k) {\hat{σ}}_{i j}$ , which can sufficiently approximate the covariance between $ξ_{i t}$ and $ξ_{j t}$ . Then, the bootstrap critical value can be obtained via

c_{W_{T, k}} (α) = \inf {t \in R : P (W_{T, k} \leq t | (Y, F)) \geq 1 - α} .

$c_{W_{T, k}} (α)$ is calculated by generating ${e_{t}}_{t = 1}^{T}$ repeatedly. In our simulations and real data, we conduct bootstrap 500 times. We now present some technical assumptions.

Assumption 1.

(i)
${f_{t}, u_{t}}_{t \geq 1}$ are $i . i . d .$ with $E (u_{t}) = 0_{p}$ and $Σ_{u} = cov (u_{t})$ .

(ii)
There exist constants $c_{1}, c_{2}$ such that $0 < c_{1} < λ_{\min} (Σ_{u}) < λ_{\max} (Σ_{u}) < c_{2} < \infty$ .

(iii)
${u_{t}}_{t \geq 1}$ and ${f_{t}}_{t \geq 1}$ are independent.

Assumption 2.

There exist positive constants $r_{1}, r_{2}, b_{1}, b_{2}$ , such that for any $s > 0$ , $t \leq T$ , $i \leq p$ and $j \leq K$ ,

$P (| u_{i t} | > s) \leq exp {- {(s / b_{1})}^{r_{1}}}, P (| f_{j t} | > s) \leq exp {- {(s / b_{2})}^{r_{2}}} .$

The “ $i . i . d .$ ” condition in Assumption 1 is commonly considered in the literature for high-dimensional inference, see, e.g., in [16]. Assumption 1 (ii) allows the bounded eigenvalue of the error covariance matrix. As noted in [22], such assumption is satisfied by two situations: (1) $cov (U_{1}, \dots, U_{p})$ , where ${U_{i}, i \geq 1}$ is a stationary ergodic process with spectral density f, $0 < c_{1} < f < c_{2}$ and (2) $cov (X_{1}, \dots, X_{p})$ where $X_{i} = U_{i} + V_{i}, i = 1, \dots, p$ , ${U_{i}}$ is a stationary process as above and ${V_{i}}$ is a noise process independent of ${U_{i}}$ . In Example 1 in [22], they demonstrated that ARMA( $r, q$ ) process satisfies Assumption 1 (ii). Furthermore, this assumption is commonly considered in the literature, see, e.g., in [4,15].

Assumption 2 allows the application of the large deviation theory to $(1 / T) \sum_{t = 1}^{T} u_{i t} u_{j t} - σ_{i j}$ and $(1 / T) \sum_{t = 1}^{T} u_{i t} f_{j t}$ . In this paper, we assume that $f_{t}$ and $u_{t}$ have exponential-type tails. Let $γ_{1}^{- 1} = 3 r_{1}^{- 1}$ and $γ_{2}^{- 1} = 1.5 r_{1}^{- 1} + 1.5 r_{2}^{- 1}$ .

Assumption 3.

Suppose $γ_{1} < 1$ , $γ_{2} < 1$ and there exists a constant $c_{1} > 0$ , such that ${(log p)}^{γ} = o (T)$ , where $γ = max {2 / γ_{1} - 1, 2 / γ_{2} - 1, 7 + c_{1}}$ .

Assumption 4.

There exists a constant $C > 0$ such that $λ_{max} (Ω_{f}) < C$ .

Assumption 3 is needed in Bernstein-type inequality [23] and commonly assumed in the literature for Gaussian approximation theory. Assumption 4 is also reasonable by bounding the eigenvalues of $Ω_{f}$ .

Theorem 1.

Under Assumptions 1–4, we have

$sup_{α \in (0, 1)} | P (max_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | > c_{W_{T, k}} (α)) - α | = o (1) .$

Theorem 1 demonstrates that the multiplier bootstrap critical value $c_{W_{T, k}} (α)$ well approximates the quantile of the test statistic. It is worth mentioning that our method does not require any sparsity assumption on either $Σ_{u}$ or $B$ .

The proof of Theorem 1 depends on the two results: (1) ${max}_{i \in G} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T}$ is sufficiently close to ${max}_{i \in G} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T}$ and (2) the covariances of $ξ_{i t}$ and $ξ_{j t}$ are well approximated by the bootstrap version. The first result is demonstrated in Lemma A7 that there exist $ζ_{1} > 0$ and $ζ_{2} > 0$ such that

P (| max_{i \in G} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - max_{i \in G} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | > ζ_{1}) < ζ_{2},

where $ζ_{1} \sqrt{1 \lor log (p / ζ_{1})} = o (1)$ and $ζ_{2} = o (1)$ . The second result is shown in Lemma A6 that

Δ = max_{1 \leq i, j \leq p} | {\hat{Ω}}_{f} (k, k) {\hat{σ}}_{i j} - Ω_{f} (k, k) σ_{i j} | = o_{P} ({(log p)}^{- 2}),

i.e., the maximum discrepancy between the empirical and population covariance matrices converges to zero.

Based on Theorem 1, for a given significance level $0 < α < 1$ , we define the test $Φ_{α}$ by

Φ_{α} = I (M_{T, k} > c_{W_{T, k}} (α)) .

(6)

The hypothesis $H_{0, G}$ is rejected whenever $Φ_{α} = 1$ .

Bootstrap is a commonly used resampling method and full theories about it can be found in [24]. There are many versions of bootstrap, for example, wild bootstrap, empirical bootstrap, and multiplier bootstrap, among others. As discussed in [25], other exchangeable bootstrap methods are asymptotically equivalent to the multiplier bootstrap. As our test statistic can be approximated by the maximum of sum of independent random vectors, we adopt the multiplier bootstrap method in [25] based on Gaussian approximation.

Alternatively, we propose the studentized statistic $M_{T, k}^{*} : = {max}_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k}^{null} | / \sqrt{{\hat{ω}}_{i i}}$ for $H_{0, G}$ , where ${\hat{ω}}_{i i} = {\hat{Ω}}_{f} (k, k) {\hat{σ}}_{i i}$ . Similarly to Section 2.1, we define the multiplier bootstrap statistic as

W_{T, k}^{*} = max_{i \in G} T^{- 1 / 2} | \sum_{t = 1}^{T} {\hat{u}}_{i t} e_{t} | \sqrt{{\hat{Ω}}_{f} (k, k) / {\hat{ω}}_{i i}} = max_{i \in G} T^{- 1 / 2} {\hat{σ}}_{i i}^{- 1 / 2} | \sum_{t = 1}^{T} {\hat{u}}_{i t} e_{t} |,

where ${e_{t}}_{t = 1}^{T} \overset{i . i . d .}{\sim} N (0, 1)$ are independent of ${y_{t}, f_{t}}_{t = 1}^{T}$ . Then, the bootstrap critical value can be obtained via

c_{W_{T, k}}^{*} (α) = inf {t \in R : P (W_{T, k}^{*} \leq t | (Y, F)) \geq 1 - α} .

Theorem 2 below justifies the validity of the bootstrap procedure for the studentized statistic.

Theorem 2.

Under the assumptions in Theorem 1, we have

$sup_{α \in (0, 1)} | P (max_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | / \sqrt{{\hat{ω}}_{i i}} > c_{W_{T, k}}^{*} (α)) - α | = o (1) .$

Based on this result, for a given significance level $0 < α < 1$ , we define the test $Φ_{α}^{*}$ by

Φ_{α}^{*} = I (M_{T, k}^{*} > c_{W_{T, k}}^{*} (α)) .

The hypothesis $H_{0, G}$ is rejected whenever $Φ_{α}^{*} = 1$ .

For the studentized statistic, we can derive its asymptotic distribution. By Lemma 6 of the work in [15], for any $x \in R$ and as $p \to \infty$ , we have

P (max_{1 \leq i \leq p} T {| {\hat{b}}_{i k} - b_{i k} |}^{2} / {\hat{ω}}_{i i} - 2 log (p) + log log (p) \leq x) \to exp \{- \frac{1}{\sqrt{π}} exp (- \frac{x}{2})\} .

However, the above alternative testing procedure may not perform well in practice, because it requires diverging p, and the convergence rate is typically slow.

In contrast to the extreme value approach, our testing procedure explicitly accounts for the effect of $| G |$ in the sense that the bootstrap critical value $c_{W_{T, k}}^{*} (α)$ depends on G. Therefore, our approach is more robust to the change in $| G |$ .

Next, we turn to the (asymptotic) power analysis of the above procedure. Denote by $B_{k}$ the kth column of $B$ . We focus on the case where $| G | \to \infty$ as $T \to \infty$ below. Define the separation set

U_{G} (c) = {{(b_{1 k}, \dots, b_{p k})}^{T} : max_{i \in G} | b_{i k} - b_{i k}^{null} | / \sqrt{ω_{i i}} > c \sqrt{log (| G |) / T}},

(7)

where $ω_{i j} = Ω_{f} (k, k) σ_{i j}$ . Let $Θ = {(θ_{i j})}_{i, j = 1}^{p}$ with $θ_{i j} = ω_{i j} / \sqrt{ω_{i i} ω_{j j}} = σ_{i j} / \sqrt{σ_{i i} σ_{j j}}$ , which is the correlation matrix of $u_{t}$ .

Assumption 5.

Suppose ${max}_{1 \leq i \neq j \leq p} | θ_{i j} | \leq c_{0} < 1$ for some constant $c_{0}$ .

Theorem 3.

Under Assumptions 1–5, for any $ε_{0} > 0$ , we have

$inf_{B_{k} \in U_{G} (\sqrt{2} + ε_{0})} P (max_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k}^{null} | / \sqrt{{\hat{ω}}_{i i}} > c_{W_{T, k}}^{*} (α)) \to 1 .$

As long as one entry of $b_{i k} - b_{i k}^{null}$ has a magnitude larger than $(\sqrt{2} + ε_{0}) \sqrt{log | G | / T}$ , our bootstrap-assisted test can reject the null correctly. Therefore, with $B$ being sparse, our procedure performs well in detecting non-sparse alternatives. According to Section 3.2 of [26], the separation rate $(\sqrt{2} + ε_{0}) \sqrt{log (| G |) / T}$ is minimax optimal under suitable assumptions.

2.2. Simultaneous Test for Multiple Factors

In this section, we test the elements of the loading matrix corresponding to different factors. The testing problem can be stated as follows,

H_{0, G^{*}} : b_{i k} = b_{i k}^{null} for all (i, k) \in G^{*} versus H_{1, G^{*}} : b_{i k} \neq b_{i k}^{null} for some (i, k) \in G^{*},

where $G^{*}$ is a subset of $M \equiv {(i, j) : i = 1, \dots p and j = 1, \dots, K}$ . Define

\begin{matrix} ω_{(i, k), (j, ℓ)}^{*} & = & cov (u_{i t} f_{t}^{'} Ω_{f} v_{k}, u_{j t} f_{t}^{'} Ω_{f} v_{ℓ}) = σ_{i j} v_{k}^{'} Ω_{f} v_{ℓ}, \\ {\hat{ω}}_{(i, k), (j, ℓ)}^{*} & = & {({\hat{Ω}}_{f} v_{k})}^{'} (\frac{1}{T} \sum_{t = 1}^{T} {\hat{u}}_{i t} {\hat{u}}_{j t} f_{t} f_{t}^{'}) ({\hat{Ω}}_{f} v_{ℓ}) . \end{matrix}

We propose the studentized test statistic

M_{T, G^{*}} = max_{(i, k) \in G^{*}} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | / \sqrt{{\hat{ω}}_{(i, k), (i, k)}^{*}} .

From the linear expansion in (5), the multiplier bootstrap statistic is defined as

W_{T, G^{*}} = max_{(i, k) \in G^{*}} T^{- 1 / 2} | \sum_{t = 1}^{T} {\hat{u}}_{i t} f_{t}^{'} {\hat{Ω}}_{f} v_{k} e_{t} | / \sqrt{{\hat{ω}}_{(i, k), (i, k)}^{*}},

where ${e_{t}}_{t = 1}^{T} \overset{i . i . d .}{\sim} N (0, 1)$ are independent of ${y_{t}, f_{t}}_{t = 1}^{T}$ . Then, the bootstrap critical value can be obtained via

c_{W_{T, G^{*}}} (α) = \inf {t \in R : P (W_{T, G^{*}} \leq t | (Y, F)) \geq 1 - α} .

Let $γ_{3}^{- 1} = 4 r_{1}^{- 1} + 4 r_{2}^{- 1}$ , $r_{3}^{- 1} = 3 r_{1}^{- 1} + 9 r_{2}^{- 1}$ and $r = max {2 / γ_{3} - 1, 2 / r_{3} - 1, c_{1} + 7}$ for a constant $c_{1} > 0$ .

Theorem 4.

Suppose ${(log p)}^{r} = o (T)$ , under Assumptions 1,2 and 4, we have

$sup_{α \in (0, 1)} | P (max_{(i, k) \in G^{*}} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | / \sqrt{{\hat{ω}}_{(i, k), (i, k)}^{*}} > c_{W_{T, G^{*}}} (α)) - α | = o (1) .$

Based on Theorem 4, for a given significance level $0 < α < 1$ , we define the test $Φ_{α} (G^{*})$ by

Φ_{α} (G^{*}) = I (M_{T, G^{*}} > c_{W_{T, G^{*}}} (α)) .

The hypothesis $H_{0, G^{*}}$ is rejected whenever $Φ_{α} (G^{*}) = 1$ .

Now we turn to the power analysis of the test $Φ_{α} (G^{*})$ . Similar to Section 2.1, we focus on the case where $| G^{*} | \to \infty$ as $T \to \infty$ and define the separation set

V_{G^{*}} (c) = {{(b_{i k})}_{i \leq p, k \leq K} : max_{(i, k) \in G^{*}} | b_{i k} - b_{i k}^{null} | / \sqrt{ω_{(i, k), (i, k)}^{*}} > c \sqrt{log (| G^{*} |) / T}},

Let $θ_{(i, k), (j, ℓ)}^{*} = ω_{(i, k), (j, ℓ)}^{*} / \sqrt{ω_{(i, k), (i, k)}^{*} ω_{(j, ℓ), (j, ℓ)}^{*}}$ . We consider the following condition.

Assumption 6.

Suppose ${max}_{(i, k), (j, ℓ)} | θ_{(i, k), (j, ℓ)}^{*} | \leq c_{0}^{*} < 1$ for some constant $c_{0}^{*}$ .

The asymptotic power of the testing procedure is given as follows.

Theorem 5.

Under the assumptions in Theorem 4 and Assumption 6 , for any $ε_{0} > 0$ , we have

$inf_{B \in V_{G^{*}} (\sqrt{2} + ε_{0})} P (max_{(i, k) \in G^{*}} \sqrt{T} | {\hat{b}}_{i k} - b_{i k}^{null} | / \sqrt{{\hat{ω}}_{(i, k), (i, k)}^{*}} > c_{W_{T, G^{*}}} (α)) \to 1 .$

3. Multiple Testing with Strong FWER Control

In this section, we study the following multiple testing problem,

H_{0, i} : b_{i j} \leq b_{i j}^{null} versus H_{1, i} : b_{i j} > b_{i j}^{null} for all i \in G .

For simplicity, we set $G = {1, 2, \dots, p}$ and let j be fixed. We combine the bootstrap-assisted procedure with the step-down method proposed by [17]. Our method can be seen as a special case in Section 5 of [25]. Note that this framework can cover the case of testing equalities $(H_{0, j} : b_{i j} = b_{i j}^{null})$ because equalities can be rewritten as pairs of inequalities.

We briefly illustrate the control of the FWER. Full details and theory can be found in [25]. Let $Ω$ be the space for all data generating processes, and $ω$ be the true process. Each null hypothesis $H_{0, i}$ is equivalent to $ω \in Ω_{i}$ for some $Ω_{i} \subset Ω$ . For any $η \subset G$ , denote by $Ω^{η} = (\cap_{i \in η} Ω_{i}) \cap (\cap_{i \notin η} Ω_{i}^{c})$ with $Ω_{i}^{c} = Ω ∖ Ω_{i}$ . The strong control of the FWER means that

sup_{η \subset G} sup_{ω \in Ω^{η}} P_{ω} (reject at least one hypothesis H_{0, i}, i \in ω) \leq α + o (1),

(8)

where $P_{ω}$ denotes the probability distribution under the data-generating process $ω$ .

For $i = 1, \dots, p$ , denote $t_{i j} = \sqrt{T} ({\hat{b}}_{i j} - b_{i j}^{null})$ . For a subset $η \subset G$ , let $c_{η} (α)$ be the bootstrapped estimate for the $(1 - α)$ -quantile of $\max_{i \in η} t_{i j}$ . The step-down procedure in [17] is described as follows. Define $η (1) = G$ at the first step and reject all $H_{0, i}$ satisfying $t_{i j} > c_{η (1)} (α)$ . If no $H_{0, i}$ is rejected, then stop the procedure. If some $H_{0, i}$ are rejected, let $η (2)$ be the set of indices for those hypotheses not being rejected at the first step. On step $ℓ \geq 2$ , let $η (ℓ) \subset G$ be the subset of hypotheses that were not rejected at step $ℓ - 1$ . Reject all hypotheses $H_{0, i}$ for $i \in η (ℓ)$ satisfying that $t_{i j} > c_{η (ℓ)} (α)$ . If no hypothesis is rejected, then stop the procedure. Proceed in this way until the algorithm stops.

Romano and Wolf [17] proved the following result:

c_{η} (α) \leq c_{η^{'}} (α), for η \subset η^{'}

(9)

sup_{η \subset G} sup_{ω \in Ω^{η}} P_{ω} (max_{i \in η} t_{i j} > c_{η} (α)) \leq α + o (1) .

(10)

Therefore, we can show that the step-down method together with the multiplier bootstrap provide strong control of the FWER by verifying (9) and (10). The theoretical results are given in the proposition below. The proofs are similar to those of Theorem 1, which are omitted here.

Proposition 1.

Under the assumptions in Theorem 1, the step-down procedure with the bootstrap critical value $c_{η} (α)$ satisfies (8).

Our multiple testing method has the following two important features: (i) It can be applied to models with an increasing dimension; (ii) It takes into account the correlation amongst statistics and hence is asymptotically non-conservative.

In the simulation, we also consider Benjamini–Hochberg procedure [20] to control the false discovery rate (FDR), which is summarized as follows. For each of $H_{0, 1}, \dots, H_{0, p}$ , we calculate the p-values $P_{1}, \dots, P_{p}$ based on the studentized test statistic. Let $P_{(1)} \leq \dots \leq P_{(p)}$ be the ordered p-values, and denote by $H_{0, (i)}$ the null hypothesis corresponding to $P_{(i)}$ . Let $k = max {i : P_{(i)} \leq i α / p}$ , and then reject all $H_{0, (i)}$ for $i = 1, \dots, k$ .

4. Simulation Study

This section examines the performance of the proposed testing procedure by a simulation study. We fix the number of factors $K = 3$ , the sample size $T \in {200, 400}$ , and let the dimensionality p increase from 50 to 600. Throughout the simulation, we consider testing the first column of $B$ and repeat multiplier bootstrap 500 times.

Each row of $B$ is generated independently from $N (0, I_{K})$ , where $I_{K}$ is $K \times K$ identity matrix. Let $cov (f_{t}) = {(σ_{i j}^{f})}_{K \times K}$ with $σ_{i j}^{f} = {0.6}^{| i - j |}$ . Here, we consider two models for the covariance structure $Σ_{u}$ .

(a)
Model 1 (sparse): $Ω_{u} = {(ω_{i j})}_{1 \leq i, j \leq p}$ where $ω_{i i} = 1$ , $ω_{i j} = 0.8$ for $2 (k - 1) + 1 \leq i \neq j \leq 2 k$ , where $k = 1, \dots, [p / 2]$ and $ω_{i, j} = 0$ otherwise. $Σ_{u} = Ω_{u}^{- 1}$ .
(b)
Model 2 (non-sparse): $Σ_{u} = {(σ_{i j})}_{1 \leq i, j \leq p}$ where $σ_{i i} = 1$ and $σ_{i j} = 0.5$ for $i \neq j$ .

Under each model, ${f_{t}}_{t = 1}^{T}$ and ${u_{t}}_{t = 1}^{T}$ are generated independently from $N (0, cov (f_{t}))$ and $N (0, Σ_{u})$ , respectively.

We calculate the empirical sizes of test for each column of $B$ under each model by considering hypothesis (3) with $G = {1, 2, \dots, p}$ and $b_{i k}^{null}$ being the true value of $b_{i k}$ . The results are summarized in Table 1. Here “NST”, “ST” denote the non-studentized, studentized Bootstrap-based test, respectively, and “EX” denotes the test using extreme value distribution. The estimated sizes of the three tests are reasonably close to the nominal level 0.05 for the values of p ranging from 50 to 600.

Table 1.

Empirical sizes of tests, $α = 0.05$ , $T = 400$ , and 500 replications.

	p = 50	p = 100	p = 200	p = 400	p = 600
Model 1
NST	0.076	0.060	0.058	0.050	0.058
ST	0.074	0.064	0.064	0.058	0.078
EX	0.046	0.046	0.038	0.046	0.058
Model 2
NST	0.050	0.052	0.056	0.060	0.038
ST	0.070	0.058	0.064	0.068	0.048
EX	0.038	0.030	0.024	0.018	0.016

Open in a new tab

For all $i \in G$ , by varying $b_{i k} = b_{i k}^{null} + c / 40$ with $c = \pm 0.8 ℓ$ and $ℓ = 0, \dots, 10$ , we plot the empirical powers of $M_{T, k}$ and $M_{T, k}^{*}$ in Figure 1. For ease of presentation, we only consider $p \in {10, 200, 600}$ . The results for other dimensionality are similar in spirit, and are not presented here. For all tests, the significance level is fixed at $α = 0.05$ . From Figure 1, we can tell that the empirical rejection rate grows from the nominal level to one as c deviates away from zero. The difference between NST test and ST test is slight. For small p, the EX test does not perform well because this approach requires diverging p. Furthermore, for non-sparse error covariance matrix, our method performs better than the EX method. These numerical results confirm our theoretical analysis.

Empirical powers of the NST, ST, and EX methods. The figures in the left panels are based on Model 1, while those in the right panels are for Model 2. The red solid line corresponds to the nominal level. (a) $p = 10$ , (b) $p = 10$ , (c) $p = 200$ , (d) $p = 200$ , (e) $p = 600$ , (f) $p = 600$ .

Next, we study the numerical performance of the step-down method in Section 3 and compare it with the Bonferroni–Holm procedure. Consider the following two-sided multiple testing problem; $H_{0, i} : b_{i j} = {\tilde{b}}_{i j}^{null}$ among all $i = 1, 2, \dots, p$ with $j = 1$ . For Models 1 and 2, the first $s_{0}$ entries of ${{\tilde{b}}_{i j}^{null}}_{i = 1}^{p}$ are $b_{i j}^{null} + 0.5$ and $b_{i j}^{null} + 0.35$ , respectively, and the rest are equal to $b_{i j}^{null}$ . We set $T \in {200, 400}$ and $p \in {50, 200, 500, 600}$ .

We employ both the step-down method based on the studentized/non-studentized test statistic, and the Bonferroni–Holm procedure (based on the studentized test statistic) to control the FWER. We denote these three procedures by NST-FWER, ST-FWER, and BH-FWER, respectively. For comparison, we also consider using Benjamini–Hochberg procedure to control FDR. We denote this procedure by BH-FDR. Based on 500 replications, we calculate the average empirical FWER

Average {I {At least one hypothesis H_{0, i} is rejected, i \in {s_{0} + 1, \dots, p}}}

for methods NST-FWER, ST-FWER, and BH-FWER, the average empirical FDR

Average \{\sum_{i \in S_{0}} I {H_{0, i} is rejected} / \sum_{i \in G} I {H_{0, i} is rejected}\}

for method BH-FDR, and the average empirical power

Average \{\sum_{i \in S_{0}} I {H_{0, i} is rejected} / s_{0}\}

for all the four methods, where $S_{0} = {1, 2, \dots, s_{0}}$ and $G = {1, \dots, p}$ . Under each model, we consider the case $s_{0} = 3$ and $s_{0} = 15$ . Table 2 and Table 3 report the empirical FWER, FDR, and the average power. From Table 2 and Table 3, the proposed and Bonferroni–Holm procedures provide similar control on the FWER, and Benjamini–Hochberg procedure can control FDR. The empirical powers of the step-down method and Benjamini–Hochberg procedure are higher than that of the Bonferroni–Holm procedure. It is also seen that controlling the FDR is more powerful than controlling the FWER.

Table 2.

Empirical family-wise error rate (FWER) and false discovery rate (FDR) with power in the brackets of multiple testing based on Model 1, $α = 0.05$ , and 500 replications.

T	$s_{0}$	Method	p = 50	p = 200	p = 500	p = 600
200	3	NST-FWER	0.058 (0.551)	0.062 (0.405)	0.048 (0.309)	0.056 (0.291)
		ST-FWER	0.074 (0.554)	0.082 (0.431)	0.086 (0.337)	0.090 (0.324)
		BH-FWER	0.054 (0.528)	0.070 (0.409)	0.074 (0.319)	0.068 (0.300)
		BH-FDR	0.061 (0.635)	0.064 (0.470)	0.086 (0.380)	0.069 (0.353)
	15	NST-FWER	0.056 (0.569)	0.050 (0.412)	0.040 (0.306)	0.046 (0.303)
		ST-FWER	0.066 (0.583)	0.086 (0.430)	0.074 (0.334)	0.084 (0.327)
		BH-FWER	0.060 (0.561)	0.066 (0.410)	0.056 (0.310)	0.068 (0.309)
		BH-FDR	0.043 (0.810)	0.064 (0.655)	0.06 (0.518)	0.061 (0.509)
400	3	NST-FWER	0.050 (0.935)	0.058 (0.889)	0.062 (0.839)	0.058 (0.808)
		ST-FWER	0.070 (0.937)	0.062 (0.885)	0.078 (0.842)	0.066 (0.813)
		BH-FWER	0.052 (0.931)	0.054 (0.873)	0.062 (0.834)	0.052 (0.795)
		BH-FDR	0.057 (0.957)	0.056 (0.924)	0.064 (0.889)	0.068 (0.863)
	15	NST-FWER	0.058 (0.947)	0.054 (0.881)	0.040 (0.819)	0.070 (0.815)
		ST-FWER	0.052 (0.946)	0.066 (0.881)	0.058 (0.825)	0.084 (0.882)
		BH-FWER	0.050 (0.942)	0.056 (0.871)	0.050 (0.809)	0.060 (0.806)
		BH-FDR	0.035 (0.989)	0.052 (0.968)	0.056 (0.946)	0.059 (0.941)

Open in a new tab

Table 3.

Empirical FWER and FDR with power in the brackets of multiple testing based on Model 2, $α = 0.05$ , and 500 replications.

T	$s_{0}$	Method	p = 50	p = 200	p = 500	p = 600
200	3	NST-FWER	0.044 (0.805)	0.052 (0.692)	0.066 (0.622)	0.056 (0.609)
		ST-FWER	0.058 (0.807)	0.066 (0.701)	0.084 (0.638)	0.066 (0.621)
		BH-FWER	0.030 (0.759)	0.042 (0.620)	0.024 (0.517)	0.024 (0.505)
		BH-FDR	0.039 (0.819)	0.046 (0.691)	0.038 (0.592)	0.030 (0.570)
	15	NST-FWER	0.042 (0.805)	0.060 (0.697)	0.058 (0.626)	0.048 (0.618)
		ST-FWER	0.050 (0.809)	0.080 (0.708)	0.080 (0.637)	0.072 (0.630)
		BH-FWER	0.028 (0.757)	0.042 (0.621)	0.034 (0.530)	0.038 (0.519)
		BH-FDR	0.035 (0.922)	0.044 (0.822)	0.046 (0.746)	0.040 (0.717)
400	3	NST-FWER	0.060 (0.989)	0.052 (0.985)	0.052 (0.971)	0.050 (0.970)
		ST-FWER	0.064 (0.989)	0.054 (0.985)	0.068 (0.970)	0.072 (0.973)
		BH-FWER	0.046 (0.983)	0.022 (0.975)	0.026 (0.951)	0.024 (0.945)
		BH-FDR	0.045 (0.995)	0.034 (0.990)	0.040 (0.972)	0.035 (0.973)
	15	NST-FWER	0.066 (0.992)	0.072 (0.986)	0.056 (0.975)	0.056 (0.975)
		ST-FWER	0.072 (0.992)	0.076 (0.986)	0.056 (0.975)	0.064 (0.975)
		BH-FWER	0.046 (0.988)	0.036 (0.973)	0.024 (0.952)	0.022 (0.950)
		BH-FDR	0.043 (0.999)	0.042 (0.998)	0.031 (0.993)	0.044 (0.992)

Open in a new tab

5. Real Data Analysis

This section conducts hypothesis testing for financial data from 1 January 2017 to 14 March 2018. The dataset consists of daily returns of 491 stocks from S&P 500 index. In addition, we collected Fama–French three factors [21] in the same period. In summary, the panel matrix is a 300 by 491 matrix $Y$ , in addition to a factor matrix $F$ of size 300 by 3. Here, 300 is the number of days and 491 is the number of stocks.

We first centralize and standardize the factor matrix $F$ and $Y$ is centralized as well. We consider testing the sparsity of each column of $B$ and repeat the multiplier bootstrap 500 times. Simultaneous test of parameters corresponding to multiple factors is also considered. The hypotheses are

H_{0} : b_{i k} = 0 for all (i, k) \in s versus H_{1} : b_{i k} \neq 0 for some (i, k) \in s,

where $s = {(i, k) : k \in s_{*} \subset {1, 2, 3}, | {\hat{b}}_{i k} | are the smallest β % among {| {\hat{b}}_{i k} {|}}_{i \in {1, \dots, p}; k \in s_{*}}}$ , with $s_{*} = {1}, {2}, {3}$ or ${2, 3}$ and $β = 10, 30, 50, 70, 90$ . The results are depicted in Table 4. For the first column of $B$ , it is therefore not reasonable to assume $b_{i 1} = 0$ . However, we can claim that the last two columns of $B$ are sparse. Hence, a sufficiently large number of stocks are not influenced by the last two factors.

Table 4.

Results of sparse testing.

	$β$ = 10	$β$ = 30	$β$ = 50	$β$ = 70	$β$ = 90
1st loading	R	R	R	R	R
2nd loading	A	A	A	A	A
3rd loading	A	A	A	A	R
2nd and 3rd loading	A	A	A	R	R

Open in a new tab

Note: “A” means accepting the null hypothesis; “R” denotes rejecting the null hypothesis.

Acknowledgments

We thank the three referees for insightful comments and suggestions.

Appendix A. Technical Details

We prove the main results in this section. First, we introduce some notations. Throughout this section, we denote by $c, c^{'}, C, C^{'}, C_{i}$ constants that do not depend on $p, T$ and may vary from place to place. Define $T_{0} = {max}_{i \in G} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T}$ and $T_{1} = {max}_{i \in G} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T}$ . Let ${z_{t}}_{t = 1}^{T}$ be a sequence of $N (0, Ω_{f} (k, k) Σ_{u})$ vectors. Define $c_{W_{0}} (α) = \inf {t \in R : P (W_{0} \leq t | (Y, F)) \geq 1 - α}$ with $W_{0} = {max}_{i \in G} T^{- 1 / 2} \sqrt{{\hat{Ω}}_{f} (k, k)} \sum_{t = 1}^{T} {\hat{u}}_{i t} e_{t}$ and $c_{Z_{0}} (α) = \inf {t \in R : P (Z_{0} \leq t | (Y, F)) \geq 1 - α}$ with $Z_{0} = {max}_{i \in G} \sum_{t = 1}^{T} z_{i t} / \sqrt{T}$ . Denote by $Σ_{f} (m, n) = v_{m}^{'} Σ_{f} v_{n}$ with $Σ_{f} = Ω_{f}^{- 1}$ . We begin by presenting some useful lemmas that will be used in the proofs of the main results.

Lemma A1.

Suppose that the random variables $Z_{1}, Z_{2}$ both satisfy the exponential-type tail condition: There exist $r_{1}, r_{2} \in (0, 1)$ and $b_{1}, b_{2} > 0$ such that $\forall s > 0$ ,

$P (| Z_{i} | > s) \leq exp {1 - {(s / b_{i})}^{r_{i}}}, i = 1, 2 .$

Then, for some $r_{3}$ and $b_{3} > 0$ , and any $s > 0$ ,

(i)
$P (| Z_{1} Z_{2} | > s) \leq exp {1 - {(s / b_{3})}^{r_{3}}}$ , where $r_{3} \in (0, r)$ with $r = r_{1} r_{2} / (r_{1} + r_{2})$ ,

(ii)
$P (| Z | > s) \leq exp {1 - {(s / b_{3})}^{r_{3}}}$ , where $Z = max {Z_{1}, Z_{2}}$ .

Proof of Lemma A1.

The proof of the first claim can be found in the proof of Lemma A.2 in [4], thus we prove the second claim. For any $s > 0$ , we have

$\begin{matrix} P (| Z | > s) & \leq P (| Z_{1} | > s) + P (| Z_{2} | > s) \\ = exp {1 - {(s / b_{1})}^{r_{1}}} + exp {1 - {(s / b_{2})}^{r_{2}}} \\ \leq 2 exp {1 - {(s / b)}^{r}}, \end{matrix}$

where $b^{r} = max {b_{1}^{r_{1}}, b_{2}^{r_{2}}}$ . Pick up an $r_{3} \in (0, r)$ , and $b_{3} > max {{(r_{3} / r)}^{1 / r} b, {(1 + log 2)}^{1 / r} b}$ ; then, it can be shown that $F (s) = {(s / b)}^{r} - {(s / b_{3})}^{r_{3}}$ is increasing when $s > b_{3}$ . Therefore, $F (s) > F (b_{3}) > log 2$ when $s > b_{3}$ , which implies when $s > b_{3}$ ,

$P (| Z | > s) \leq 2 exp {1 - {(s / b)}^{r}} \leq exp {1 - {(s / b_{3})}^{r_{3}}} .$

When $s \leq b_{3}$ ,

$P (| Z | > s) \leq 1 \leq exp {1 - {(s / b_{3})}^{r_{3}}} .$

Then the proof is complete. ☐

Lemma A2.

Under Assumptions 1–4, we have

(i)
${max}_{i, j \leq p} | {\hat{σ}}_{i j} - σ_{i j} | = O_{p} (\sqrt{log p / T})$ .

(ii)
${max}_{i \leq p, k \leq K} | (1 / T) \sum_{t = 1}^{T} f_{k t} u_{i t} | = O_{p} (\sqrt{log p / T})$ .

(iii)
${max}_{i, j \leq K} | (1 / T) \sum_{t = 1}^{T} f_{i t} f_{j t} - E f_{i t} f_{j t} | = O_{p} (\sqrt{log T / T})$ .

Proof of Lemma A2.

For a proof, see the proof of Lemma A.3 and Lemma B.1 in [4]. ☐

Lemma A3.

If a random variable X satisfies exponential-type tail: there exist $r > 0$ and $b > 0$ such that $\forall s > 0$ , $P (| X | > s) \leq exp {1 - {(s / b)}^{r}}$ . Then $E (| X |) = O (1)$ .

Proof of Lemma A3.

Note that

$\begin{matrix} E (| X |) & = \int_{0}^{\infty} P (| X | > x) d x \\ \leq \int_{0}^{\infty} exp {1 - {(x / b)}^{r}} d x \\ ≲ \int_{0}^{\infty} exp (- x^{r}) d x : = I \end{matrix}$

It is not hard to check when $r \geq 1$ , $I \leq 1 + e^{- 1}$ . When $r < 1$ , $I = α Γ (α)$ , where $α = 1 / r$ and $Γ (α) = \int_{0}^{\infty} e^{- x} x^{α - 1} d x = O (1)$ . Then, the proof is complete. ☐

Lemma A4.

Under the assumptions in Theorem 1, there exist constants $c, C > 0$ such that

$ρ : = sup_{t \in R} | P (T_{0} \leq t) - P (Z_{0} \leq t) | \leq C T^{- c} .$

Proof of Lemma A4.

Implied by Assumption 3, we have ${(log (p T))}^{7} / T \leq C_{1} T^{- c_{1}}$ for some constants $c_{1}, C_{1} > 0$ . We then apply Corollary 2.1 of [25] to the sequence $ξ_{i t}$ . What we should check is its Condition (E.2), that is, uniformly over i,

$c_{0} \leq E {(ξ_{i t})}^{2} \leq C_{0}, max_{k = 1, 2} E (| ξ_{i t} |^{k + 2} / B^{k}) + E {(max_{1 \leq i \leq p} | ξ_{i t} {| / B)}^{4}} \leq 4,$

where $c_{0}, C_{0} > 0$ and B is some large enough constant. By Lemmas A1 and A3 we have $E (f_{i t} f_{j t}) = O (1)$ uniformly for $i, j \leq K$ . This implies $∥ Ω_{f} ∥_{\infty} = O (1)$ . Uniformly for $i \leq p$ , we have

$ξ_{i t} = u_{i t} f_{t}^{'} Ω_{f} v_{k} \leq u_{i t} ∥ f_{t} ∥_{\infty} {∥ Ω_{f} v_{k} ∥}_{1} : = γ_{i t} .$

By Lemma A1, we know $γ_{i t}$ and ${max}_{1 \leq i \leq p} | γ_{i t} |$ is exponential-type tail. Then by Lemma A3 we have $E (| ξ_{i t} |^{4}) \leq E (| γ_{i t} |^{4})$ =O(1) and $E ({max}_{i \leq p} | ξ_{i t} {|)}^{4} \leq E ({max}_{i \leq p} | γ_{i t} {|)}^{4} = O (1)$ . Thus, we can find large enough B such that the above condition is satisfied. Then, the proof is complete. ☐

Lemma A5.

Under the assumptions in Theorem 1, there exists a sequence of positive numbers $α_{T} \to \infty$ such that $α_{T} / p = o (1)$ and $P (α_{T} {(log p)}^{2} | {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | > 1) \to 0$ .

Proof of Lemma A5.

By Lemma A2 (iii), we have $∥ {\hat{Ω}}_{f}^{- 1} - Ω_{f}^{- 1} ∥_{\infty} = O_{p} (\sqrt{log T / T})$ . Since $∥ Ω_{f} ∥ = O (1)$ and $∥ {\hat{Ω}}_{f} ∥_{\infty} = O_{p} (1)$ , we have

$\begin{matrix} ∥ {\hat{Ω}}_{f} - Ω_{f} ∥_{\infty} & = ∥ {\hat{Ω}}_{f} (Ω_{f}^{- 1} - {\hat{Ω}}_{f}^{- 1}) Ω_{f} ∥_{\infty} \\ \leq ∥ {\hat{Ω}}_{f} ∥_{\infty} ∥ {\hat{Ω}}_{f}^{- 1} - Ω_{f}^{- 1} ∥_{\infty} {∥ Ω_{f} ∥}_{\infty} \\ = O_{p} (\sqrt{log T / T}) . \end{matrix}$

On the other hand,

$| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | \leq ∥ {\hat{Ω}}_{f} - Ω_{f} ∥_{\infty} = O_{p} (\sqrt{log T / T}) .$

Choosing $α_{T} = \sqrt{log T}$ , by Assumption 3, the proof is complete. ☐

Lemma A6.

Under the assumptions in Theorem 1, we have for every $α \in (0, 1)$ and $ϑ > 0$ ,

$\begin{matrix} P (c_{W_{0}} (α) \leq c_{Z_{0}} (α + π (ϑ))) \geq 1 - P (Δ > ϑ), \\ P (c_{Z_{0}} (α) \leq c_{W_{0}} (α + π (ϑ))) \geq 1 - P (Δ > ϑ), \end{matrix}$

Proof of Lemma A6.

For $ϑ > 0$ , let $π (ϑ) = C_{2} ϑ^{1 / 3} {(1 \lor log (p / ϑ))}^{2 / 3}$ with $C_{2} > 0$ . Recall that $Δ = {max}_{1 \leq i, j \leq p} | {\hat{Ω}}_{f} (k, k) {\hat{σ}}_{i j} - Ω_{f} (k, k) σ_{i j} |$ . As $| Ω_{f} (k, k) | = O (1)$ uniformly for $k \leq K$ , by Lemma A2 (i), we have

$Δ = O_{p} (| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + \sqrt{(log p) / T}) .$

By Lemma A5 and Assumption 3, choosing $ϑ = 1 / (α_{T} {(log p)}^{2})$ , we have $P (Δ > ϑ) = o (1)$ . By Lemma 3.1 of [25], on the event ${(Y, F) : Δ \leq ϑ}$ , we have $| P (Z_{0} \leq t) - P (P (W_{0} \leq t | (Y, F)) | \leq π (ϑ)$ for all $t \in R$ , and so on this event

$\begin{matrix} P (P (W_{0} \leq c_{Z_{0}} (α + π (ϑ)) | (Y, F))) & \geq P (Z_{0} \leq c_{Z_{0}} (α + π (ϑ))) - π (ϑ) \\ \geq α + π (ϑ) - π (ϑ) = α, \end{matrix}$

implying the first claim. The second claim follows similarly. ☐

Lemma A7.

Under the assumptions in Theorem 1, there exist $ζ_{1}, ζ_{2} > 0$ such that

$P (| max_{1 \leq i \leq p} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - max_{1 \leq i \leq p} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | > ζ_{1}) < ζ_{2},$

where $ζ_{1} \sqrt{1 \lor log (p / ζ_{1})} = o (1)$ , $ζ_{2} = o (1)$ .

Proof of Lemma A7.

The arguments in the proof of Lemma A5 imply that

$∥ {\hat{Ω}}_{f} v_{k} - Ω_{f} v_{k} ∥_{1} = O_{p} (\sqrt{log T / T}) .$

By Lemma A2 (ii), uniformly for $i \leq p$ , we have

$\begin{matrix} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | & = | \sum_{t = 1}^{T} u_{i t} f_{t}^{'} ({\hat{Ω}}_{f} v_{k} - Ω_{f} v_{k}) / \sqrt{T} | \\ \leq ∥ {\hat{Ω}}_{f} v_{k} - Ω_{f} v_{k} ∥_{1} max_{i \leq p, k \leq K} | \frac{1}{\sqrt{T}} \sum_{t = 1}^{T} f_{k t} u_{i t} | \\ = O_{p} (\sqrt{log p log T / T}) . \end{matrix}$

Choosing $ζ_{1}^{2} = O (\sqrt{log p log T / T})$ , we have

$P (max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | > ζ_{1}) \leq ζ_{2}, ζ_{2} = o (1) .$

Note that

$| max_{1 \leq i \leq p} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - max_{1 \leq i \leq p} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | \leq max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} - \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} |,$

then the proof is complete. ☐

Lemma A8.

Under the assumptions in Theorem 4, we have

(i)
${max}_{i, j, m, n} | (1 / T) \sum_{t = 1}^{T} u_{i t} u_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | = O_{p} (\sqrt{log p / T})$ ,

(ii)
${max}_{i, j, m, n} | (1 / T) \sum_{t = 1}^{T} u_{i t} f_{j t} f_{m t} f_{n t} | = O_{p} (\sqrt{log p / T})$ ,

Proof of Lemma A8.

(i) By Assumption 1 and Lemma A1, $u_{i t} f_{m t}$ satisfies the exponential tail condition, with parameter $2 r_{1} r_{2} / (3 r_{1} + 3 r_{2})$ as shown in Lemma A1. Thus, $u_{i t} u_{j t} f_{m t} f_{n t}$ satisfies the exponential tail condition, with parameter $r_{1} r_{2} / (4 r_{1} + 4 r_{2})$ . It follows from $1.5 (r_{1}^{- 1} + r_{2}^{- 1}) > 1$ that $γ_{3} < 1$ . Therefore, by the Bernstein’s inequality [23], there exist constants $C_{i}$ , $i = 1, \dots, 5$ , for any $s > 0$

$\begin{matrix} max_{i, j, m, n} P (| \frac{1}{T} \sum_{t = 1}^{T} u_{i t} u_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | \geq s) \leq T exp (- \frac{{(T s)}^{γ_{3}}}{C_{1}}) \\ + exp (- \frac{T^{2} s^{2}}{C_{2} (1 + T C_{3})}) + exp (- \frac{{(T s)}^{2}}{C_{4} T} exp (\frac{{(T s)}^{γ_{3} (1 - γ_{3})}}{C_{5} {(log T s)}^{γ_{3}}})) . \end{matrix}$ (A1)

Using Bonferroni’s method, we have

$\begin{matrix} P ( & max_{i, j, m, n} | \frac{1}{T} \sum_{t = 1}^{T} u_{i t} u_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | > s) \\ \leq {(p K)}^{2} max_{i, j, m, n} P (| \frac{1}{T} \sum_{t = 1}^{T} u_{i t} u_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | > s) . \end{matrix}$

Let $s = C \sqrt{(log p) / T}$ for some $C > 0$ . It is not hard to check that when ${(log p)}^{2 / γ_{3} - 1} = o (T)$ (by assumption), for large enough C,

$p^{2} T exp (- \frac{{(T s)}^{γ_{3}}}{C_{1}}) + p^{2} exp (- \frac{{(T s)}^{2}}{C_{4} T} exp (\frac{{(T s)}^{γ_{3} (1 - γ_{3})}}{C_{5} {(log T s)}^{γ_{3}}})) = o (\frac{1}{p^{2}})$

and

$p^{2} exp (- \frac{T^{2} s^{2}}{C_{2} (1 + T C_{3})}) = O (\frac{1}{p^{2}}) .$

As $K = O (1)$ , this proves (i).

(ii) By Assumption 1 and Lemma A1, $u_{i t} f_{j t} f_{m t} f_{n t}$ satisfies the exponential tail condition for the tail parameter $r_{1} r_{2} / (9 r_{1} + 3 r_{2})$ . Therefore, again by the Bernstein’s inequality and the Bonferroni method on $u_{i t} f_{j t} f_{m t} f_{n t}$ similar to (A1) with the parameter $r_{3}^{- 1} = 3 r_{1}^{- 1} + 9 r_{2}^{- 1}$ , it follows from $1.5 (r_{1}^{- 1} + r_{2}^{- 1}) > 1$ that $r_{3} < 1$ . Thus, when $s = C \sqrt{log p / T}$ for large enough C, as K is fixed, the term

$p K^{3} exp (- \frac{T^{2} s^{2}}{C_{2} (1 + T C_{3})}) \leq p^{- 2},$

and the rest terms on the right-hand side of the inequality, multiplied by $p K^{3}$ are of order $o (p^{- 2})$ . Hence when ${(log p)}^{2 / r_{3} - 1} = o (T)$ (by assumption), we have

$max_{i, j, m, n} | \frac{1}{T} \sum_{t = 1}^{T} u_{i t} f_{j t} f_{m t} f_{n t} | = O_{p} (\sqrt{log p / T}),$

which completes the proof. ☐

Lemma A9.

Under the assumptions in Theorem 4, we have

(i)
${max}_{i, j, m, n} | (1 / T) \sum_{t = 1}^{T} ({\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - u_{i t} u_{j t} f_{m t} f_{n t}) | = O_{p} (log p / T) .$

(ii)
${max}_{i, j, m, n} | (1 / T) \sum_{t = 1}^{T} {\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | = O_{p} (\sqrt{log p / T}) .$

Proof of Lemma A9.

(i) By the triangular inequality, we have

$\begin{matrix} | \frac{1}{T} \sum_{t = 1}^{T} ({\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - u_{i t} u_{j t} f_{m t} f_{n t}) | \\ \leq \underset{I}{\underset{︸}{| \frac{1}{T} \sum_{t = 1}^{T} ({\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - {\hat{u}}_{i t} u_{j t} f_{m t} f_{n t}) |}} + \underset{I I}{\underset{︸}{| \frac{1}{T} \sum_{t = 1}^{T} ({\hat{u}}_{i t} u_{j t} f_{m t} f_{n t} - u_{i t} u_{j t} f_{m t} f_{n t}) |}} \end{matrix}$

For I, we have

$\begin{matrix} I & = | \frac{1}{T} \sum_{t = 1}^{T} {\hat{u}}_{i t} {({\hat{b}}_{j} - b_{j})}^{'} f_{t} f_{m t} f_{n t} | \\ \leq \underset{i}{\underset{︸}{| \frac{1}{T} \sum_{t = 1}^{T} {({\hat{b}}_{i} - b_{i})}^{'} f_{t} {({\hat{b}}_{j} - b_{j})}^{'} f_{t} f_{m t} f_{n t} |}} + \underset{i i}{\underset{︸}{| \frac{1}{T} \sum_{t = 1}^{T} u_{i t} {({\hat{b}}_{j} - b_{j})}^{'} f_{t} f_{m t} f_{n t} |}} \end{matrix}$

By Lemma 3.1 of [4], we have ${max}_{i \leq p} ∥ {\hat{b}}_{i} - b_{i} ∥ = O_{p} (\sqrt{log p / T})$ . It is straightforward to see that

$max_{m, n \leq K} ∥ \frac{1}{T} \sum_{t = 1}^{T} f_{t} f_{t}^{'} f_{m t} f_{n t} ∥_{\infty} = O_{p} (1) .$

then we have,

$i \leq max_{i, m, n} {∥ {\hat{b}}_{i} - b_{i} ∥}^{2} ∥ \frac{1}{T} \sum_{t = 1}^{T} f_{t} f_{t}^{'} f_{m t} f_{n t} ∥_{\infty} = O_{p} (log p / T) .$

By Lemma A8 (ii), we have $∥ (1 / T) \sum_{t = 1}^{T} u_{i t} f_{t} f_{m t} f_{n t} ∥_{\infty} = O_{p} (\sqrt{log p / T})$ , which implies that

$i i \leq max_{j \leq p} ∥ {\hat{b}}_{j} - b_{j} ∥ ∥ \frac{1}{T} \sum_{t = 1}^{T} u_{i t} f_{t} f_{m t} f_{n t} ∥_{\infty} = O_{p} (log p / T)$

Part $I I$ is similar to $i i$ , thus we have

$I I = | \frac{1}{T} \sum_{t = 1}^{T} {({\hat{b}}_{i} - b_{i})}^{'} f_{t} u_{j t} f_{m t} f_{n t} | \leq O_{p} (log p / T) .$

Then the proof is complete.

(ii) By the triangular inequality, we have

$\begin{matrix} max_{i, j, m, n} | \frac{1}{T} \sum_{t = 1}^{T} {\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | \\ \leq max_{i, j, m, n} | \frac{1}{T} \sum_{t = 1}^{T} ({\hat{u}}_{i t} {\hat{u}}_{j t} f_{m t} f_{n t} - u_{i t} u_{j t} f_{m t} f_{n t}) | + max_{i, j, m, n} | \frac{1}{T} \sum_{t = 1}^{T} u_{i t} u_{j t} f_{m t} f_{n t} - σ_{i j} Σ_{f} (m, n) | \\ = O_{p} (\sqrt{log p / T}), \end{matrix}$

which proves the result. ☐

Proof of Theorem 1.

Without loss of generality, we set $G = {1, 2, \dots, p}$ . First, we prove the following fact,

$sup_{α \in (0, 1)} | P (T_{1} > c_{W_{0}} (α)) - α | = o (1),$ (A2)

For $ϑ > 0$ , let $π (ϑ) : = C_{2} ϑ^{1 / 3} {(1 \lor log (p / ϑ))}^{2 / 3}$ with $C_{2} > 0$ . In addition, Let $κ_{1} (ϑ) : = c_{Z_{0}} (α - ζ_{2} - π (ϑ))$ and $κ_{2} (ϑ) : = c_{Z_{0}} (α + ζ_{2} + π (ϑ))$ . For every $α \in (0, 1)$ , note that

$\begin{matrix} P ({T_{1} \leq c_{W_{0}} (α)} ⊖ {T_{0} \leq c_{Z_{0}} (α)}) \\ ≲_{(1)} P (κ_{1} (ϑ) - 2 ζ_{1} < T_{0} \leq κ_{2} (ϑ) + 2 ζ_{1}) + P (Δ > ϑ) + ζ_{2} \\ ≲_{(2)} P (κ_{1} (ϑ) - 2 ζ_{1} < Z_{0} \leq κ_{2} (ϑ) + 2 ζ_{1}) + P (Δ > ϑ) + ρ + ζ_{2} \\ ≲_{(3)} π (ϑ) + P (Δ > ϑ) + ρ + ζ_{1} \sqrt{1 \lor log (p / ζ_{1})} + ζ_{2}, \end{matrix}$

where (1) follows from Lemmas A6 and A7, (2) follows from Lemma A4, and (3) follows from Lemma 2.1 in [25] and the fact that $Z_{0}$ has no point masses. Then, by the definition of $ρ$ in Lemma A4, we have

$sup_{α \in (0, 1)} | P (T_{1} > c_{W_{0}} (α)) - α | \leq ρ_{⊖} + ρ,$

where $ρ_{⊖} = {sup}_{α \in (0, 1)} P ({T_{1} \leq c_{W_{0}} (α)} ⊖ {T_{0} \leq c_{Z_{0}} (α)})$ . The right-hand side of the above inequality is $o (1)$ , which has proved (A2). Since ${max}_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | = \sqrt{T} {max}_{i \in G} max {{\hat{b}}_{i k} - b_{i k}, b_{i k} - {\hat{b}}_{i k}}$ , similar arguments imply that

$sup_{α \in (0, 1)} | P (max_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | > c_{W_{T, k}} (α)) - α | = o (1),$

which completes the proof. ☐

Proof of Theorem 2.

From the arguments in the proof of Lemma A6, we have

$Δ = O_{p} (| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + \sqrt{log p / T}),$

which implies that ${max}_{1 \leq i \leq p} | {\hat{ω}}_{i i} - ω_{i i} | = O_{p} (| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + \sqrt{log p / T})$ . We then have

$P (ω_{i i} / 2 < {\hat{ω}}_{i i} < 2 ω_{i i} for all 1 \leq i \leq p) \to 1 .$ (A3)

Define ${\bar{T}}_{1} = {max}_{i \in G} \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T {\hat{ω}}_{i i}}$ and ${\bar{T}}_{0} = {max}_{i \in G} \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T ω_{i i}}$ . Note that

$\begin{matrix} | {\bar{T}}_{1} - {\bar{T}}_{0} | \\ \leq max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T {\hat{ω}}_{i i}} - \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T ω_{i i}} | \\ \leq max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T {\hat{ω}}_{i i}} - \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T ω_{i i}} | + max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T ω_{i i}} - \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T ω_{i i}} | \\ \leq C^{'} max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} | max_{1 \leq i \leq p} | \sqrt{ω_{i i} / {\hat{ω}}_{i i}} - 1 | + C^{''} max_{1 \leq i \leq p} | \sum_{t = 1}^{T} ({\hat{ξ}}_{i t} - ξ_{i t}) / \sqrt{T} | \\ : = I_{1} + I_{2}, \end{matrix}$

where $C^{'}, C^{''} > 0$ .

On the event $ω_{i i} / 2 < {\hat{ω}}_{i i} < 2 ω_{i i}$ for all $1 \leq i \leq p$ ,

$\begin{matrix} max_{1 \leq i \leq p} | \sqrt{ω_{i i} / {\hat{ω}}_{i i}} - 1 | & \leq max_{1 \leq i \leq p} | \sqrt{ω_{i i}} - \sqrt{{\hat{ω}}_{i i}} | max_{1 \leq i \leq p} \sqrt{2 / ω_{i i}} \\ \leq max_{1 \leq i \leq p} | \frac{ω_{i i} - {\hat{ω}}_{i i}}{\sqrt{ω_{i i}} + \sqrt{{\hat{ω}}_{i i}}} | max_{1 \leq i \leq p} \sqrt{2 / ω_{i i}} \\ \leq max_{1 \leq i \leq p} | ω_{i i} - {\hat{ω}}_{i i} | max_{1 \leq i \leq p} 1 / ω_{i i} \\ = O_{p} (| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + \sqrt{(log p) / T}) . \end{matrix}$

On the other hand,

$\begin{matrix} max_{1 \leq i \leq p} | \sum_{t = 1}^{T} {\hat{ξ}}_{i t} / \sqrt{T} | & \leq max_{1 \leq i \leq p} | \sum_{t = 1}^{T} ({\hat{ξ}}_{i t} - ξ_{i t}) / \sqrt{T} | + max_{1 \leq i \leq p} | \sum_{t = 1}^{T} ξ_{i t} / \sqrt{T} | \\ = O_{P} (\sqrt{log p log T / T} + \sqrt{log p}) = O_{P} (\sqrt{log p}) . \end{matrix}$

Therefore, on the above event, $I_{1} \leq O_{p} (\sqrt{log p} | {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + log p / \sqrt{T})$ . By Lemma A5, we can find $ζ_{1}^{'}$ such that $P (I_{1} > ζ_{1}^{'}) = o (1)$ and $ζ_{1}^{'} \sqrt{1 \lor log (p / ζ_{1}^{'})} = o (1)$ . Thus by Lemma A7 and (A3), we have

$P (| {\bar{T}}_{1} - {\bar{T}}_{0} | > ζ_{1}) \leq P (I_{1} + I_{2} > ζ_{1}) < ζ_{2},$

for $ζ_{1} \sqrt{1 \lor log (p / ζ_{1})} = o (1)$ and $ζ_{2} = o (1)$ .

Let $\bar{Δ} = {max}_{1 \leq j, k \leq p} | ω_{j k} / \sqrt{ω_{j j} ω_{k k}} - {\hat{ω}}_{j k} / \sqrt{{\hat{ω}}_{j j} {\hat{ω}}_{k k}} |$ . Note that

$| \sqrt{ω_{j j} ω_{k k}} - \sqrt{{\hat{ω}}_{j j} {\hat{ω}}_{k k}} | = \frac{| ω_{j j} ω_{k k} - {\hat{ω}}_{j j} {\hat{ω}}_{k k} |}{\sqrt{ω_{j j} ω_{k k}} + \sqrt{{\hat{ω}}_{j j} {\hat{ω}}_{k k}}} .$

On the event $ω_{i i} / 2 < {\hat{ω}}_{i i} < 2 ω_{i i}$ for all $1 \leq i \leq p$ , we have

$\frac{| ω_{j j} ω_{k k} - {\hat{ω}}_{j j} {\hat{ω}}_{k k} |}{\sqrt{ω_{j j} ω_{k k}} + \sqrt{{\hat{ω}}_{j j} {\hat{ω}}_{k k}}} \leq \frac{| ω_{j j} ω_{k k} - {\hat{ω}}_{j j} {\hat{ω}}_{k k} |}{\sqrt{ω_{j j} ω_{k k}} + \sqrt{ω_{j j} ω_{k k} / 4}} \leq (2 / 3) | ω_{j j} ω_{k k} - {\hat{ω}}_{j j} {\hat{ω}}_{k k} | max_{1 \leq j \leq p} 1 / ω_{j j},$

which implies that

$\begin{matrix} max_{1 \leq j, k \leq p} | \sqrt{ω_{j j} ω_{k k} / {\hat{ω}}_{j j} {\hat{ω}}_{k k}} - 1 | & \leq max_{1 \leq j, k \leq p} | \sqrt{ω_{j j} ω_{k k}} - \sqrt{{\hat{ω}}_{j j} {\hat{ω}}_{k k}} | max_{1 \leq j \leq p} 2 / ω_{j j} \\ \leq (4 / 3) max_{1 \leq j, k \leq p} | ω_{j j} ω_{k k} - {\hat{ω}}_{j j} {\hat{ω}}_{k k} | max_{1 \leq j \leq p} 1 / ω_{j j}^{2} \\ = O_{p} (| {\hat{Ω}}_{f} (k, k) - Ω_{f} (k, k) | + \sqrt{(log p) / T}) . \end{matrix}$

Choosing $ϑ = 1 / (α_{T} {(log p)}^{2})$ , we can show that $P (\bar{Δ} > ϑ) = o (1)$ . The rest of the proofs are similar to those in the proof of Theorem 1. We skip the details. ☐

Proof of Theorem 3.

Let $Z = (Z_{1}, \dots, Z_{p}) \overset{d}{\sim} N (0, Θ)$ . Following the arguments in the proof of Theorem 2, we can show that the distribution of ${max}_{i \in G} \sqrt{T} | {\hat{b}}_{i k} - b_{i k} | / \sqrt{{\hat{ω}}_{i i}}$ can be approximated by ${max}_{i \in G} | Z_{i} |$ . Under Assumption 5, by Lemma 6 of [15], we have for any $x \in R$ and as $| G | \to \infty,$

$P (max_{i \in G} | Z_{i} |^{2} - 2 log (| G |) + log log (| G |) \leq x) \to F (x) : = exp \{- \frac{1}{\sqrt{π}} exp (- \frac{x}{2})\} .$

It implies that

$P (max_{i \in G} T | {\hat{b}}_{i k} - b_{i k} |^{2} / {\hat{ω}}_{i i} \leq 2 log (| G |) - log log (| G |) / 2) \to 1 .$ (A4)

The bootstrap consistency result implies that

$| {(c_{W_{T, k}}^{*} (α))}^{2} - 2 log (| G |) + log log (| G |) - q_{α} | = o_{P} (1),$ (A5)

where $q_{α}$ is the $100 (1 - α)$ th quantile of $F (x)$ . Consider any $i \in G$ such that $| b_{i k}^{null} - b_{i k} | / \sqrt{ω_{i i}} > (\sqrt{2} + ε_{0}) \sqrt{log | G | / T}$ . Using the inequality $2 a_{1} a_{2} \leq δ^{- 1} a_{1}^{2} + δ a_{2}^{2}$ for any $δ > 0$ , we have

$T | b_{i k}^{null} - b_{i k} |^{2} / {\hat{ω}}_{i i} \leq (1 + δ^{- 1}) T | {\hat{b}}_{i k} - b_{i k} |^{2} / {\hat{ω}}_{i i} + (1 + δ) T {| {\hat{b}}_{i k} - b_{i k}^{null} |}^{2} / {\hat{ω}}_{i i},$ (A6)

where $T | {\hat{b}}_{i k} - b_{i k} |^{2} / {\hat{ω}}_{i i} = o_{p} (log | G |)$ as i is fixed and $| G |$ grows. From the proof of Theorem 2, we know the difference between $T | b_{i k}^{null} - b_{i k} |^{2} / {\hat{ω}}_{i i}$ and $T | b_{i k}^{null} - b_{i k} |^{2} / ω_{i i}$ is asymptotically negligible. Thus, by (A6) and the fact that $B_{k} \in U_{G} (\sqrt{2} + ε_{0})$ , we have

$max_{i \in G} T {| {\hat{b}}_{i k} - b_{i k}^{null} |}^{2} / {\hat{ω}}_{i i} \geq \frac{1}{1 + δ} \{{(\sqrt{2} + ε_{0})}^{2} (log | G |) - o_{p} (log | G |)\} .$ (A7)

The conclusion thus follows from (A7) and (A5) provided that $δ$ is small enough. ☐

Proof of Theorem 4.

Without loss of generality, we set $G^{*} = M$ . Define

$\hat{Γ} = \frac{1}{T} \sum_{t = 1}^{T} {\hat{u}}_{i t} {\hat{u}}_{j t} f_{t} f_{t}^{'}, and Γ = σ_{i j} Σ_{f} .$

Let

$Δ^{*} : = max_{i, j, m, n} | {({\hat{Ω}}_{f} v_{m})}^{'} \hat{Γ} ({\hat{Ω}}_{f} v_{n}) - σ_{i j} Ω_{f} (m, n) |$

denote the maximum discrepancy between the empirical and population covariance matrices. By the triangular inequality, we have

$\begin{matrix} | {({\hat{Ω}}_{f} v_{m})}^{'} \hat{Γ} ({\hat{Ω}}_{f} v_{n}) - σ_{i j} Ω_{f} (m, n) | \\ = | {({\hat{Ω}}_{f} v_{m})}^{'} \hat{Γ} ({\hat{Ω}}_{f} v_{n}) - {(Ω_{f} v_{m})}^{'} Γ (Ω_{f} v_{n}) | \\ \leq \underset{I}{\underset{︸}{| {({\hat{Ω}}_{f} v_{m})}^{'} (\hat{Γ} - Γ) ({\hat{Ω}}_{f} v_{n}) |}} + \underset{I I}{\underset{︸}{| {({\hat{Ω}}_{f} v_{m} - Ω_{f} v_{m})}^{'} Γ ({\hat{Ω}}_{f} v_{n}) |}} + \underset{I I I}{\underset{︸}{| {(Ω_{f} v_{m})}^{'} Γ ({\hat{Ω}}_{f} - Ω_{f}) v_{n} |}} . \end{matrix}$

Note that $∥ {\hat{Ω}}_{f} ∥_{\infty} = O_{p} (1)$ , by Lemma A9 (ii), we have

$I \leq ∥ {\hat{Ω}}_{f} v_{m} ∥_{\infty}^{2} {∥ \hat{Γ} - Γ ∥}_{\infty} = O_{p} (\sqrt{log p / T}) .$

By Lemma A2 (iii) and ${∥ Γ ∥}_{\infty} = O (1)$ , we have

$I I \leq ∥ {\hat{Ω}}_{f} v_{m} - Ω_{f} v_{m} ∥_{\infty} {∥ Γ ∥}_{\infty} {∥ {\hat{Ω}}_{f} v_{n} ∥}_{\infty} = O_{p} (\sqrt{log T / T}) .$

Since $∥ Ω_{f} ∥_{\infty} = O (1)$ , we have

$I I I \leq ∥ {(Ω_{f} v_{m})}^{'} ∥_{\infty} {∥ Γ ∥}_{\infty} {∥ ({\hat{Ω}}_{f} - Ω_{f}) v_{n} ∥}_{\infty} = O_{p} (\sqrt{log T / T}) .$

The above results hold uniformly for $i, j, m, n$ , thus we have

$Δ^{*} = O_{p} (\sqrt{log p / T} + \sqrt{log T / T}) .$

The rest of the proofs is similar to those in the proof of Theorem 2. We skip the details. ☐

Proof of Theorem 5.

For a proof, see the proof of Theorem 3. ☐

Author Contributions

X.G. conceived and designed the experiments; Y.W. performed the experiments; Y.W. analyzed the data; X.G. contributed to analysis tools; X.G. and Y.W. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grants 12071452, 11601500, 11671374 and 11771418 and the Fundamental Research Funds for the Central Universities.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.Bai J., Liao Y. Efficient estimation of approximate factor models via penalized maximum likelihood. J. Econ. 2016;191:1–18. [Google Scholar]
2.Heinemann A. Efficient estimation of factor models with time and cross-sectional dependence. J. Appl. Econ. 2017;32:1107–1122. [Google Scholar]
3.Fan J., Fan Y., Lv J. High dimensional covariance matrix estimation using a factor model. J. Econ. 2008;147:186–197. [Google Scholar]
4.Fan J., Liao Y., Mincheva M. High-dimensional covariance matrix estimation in approximate factor models1. Ann. Stat. 2011;39:3320–3356. doi: 10.1214/11-AOS944. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Fan J., Liao Y., Mincheva M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013;75:603–680. doi: 10.1111/rssb.12016. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Fan J., Wang W., Zhong Y. Robust covariance estimation for approximate factor models. J. Econom. 2019;208:5–22. doi: 10.1016/j.jeconom.2018.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Dickhaus T., Pauly M. Time Series Analysis and Forecasting. Springer; Berlin, Germany: 2016. Simultaneous statistical inference in dynamic factor models; pp. 27–45. [Google Scholar]
8.Dickhaus T., Sirotko-Sibirskaya N. Simultaneous statistical inference in dynamic factor models: Chi-square approximation and model-based bootstrap. Comput. Stat. Data Anal. 2019;129:30–46. [Google Scholar]
9.Lucas J., Carvalho C., Wang Q., Bild A., Nevins J.R., West M. Sparse statistical modelling in gene expression genomics. Bayesian Inference Gene Expr. Proteom. 2006;1:155–176. [Google Scholar]
10.Carvalho C.M., Chang J., Lucas J.E., Nevins J.R., Wang Q., West M. High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 2008;103:1438–1456. doi: 10.1198/016214508000000869. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Reis R., Watson M.W. Relative goods’ prices, pure inflation, and the Phillips correlation. Am. Econ. J. Macroecon. 2010;2:128–157. [Google Scholar]
12.Amengual D., Repetto L. Testing a Large Number of Hypotheses in Approximate Factor Models. CEMFI; Madrid, Spain: 2014. Technical Report. [Google Scholar]
13.Candès E., Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007;35:2313–2351. [Google Scholar]
14.Cai T., Liu W., Xia Y. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Am. Stat. Assoc. 2013;108:265–277. doi: 10.1080/01621459.2012.758041. [DOI] [Google Scholar]
15.Cai T.T., Liu W., Xia Y. Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. (Stat. Methodol.) 2014;76:349–372. [Google Scholar]
16.Zhang X., Cheng G. Simultaneous inference for high-dimensional linear models. J. Am. Stat. Assoc. 2017;112:757–768. doi: 10.1080/01621459.2016.1166114. [DOI] [Google Scholar]
17.Romano J.P., Wolf M. Exact and approximate stepdown methods for multiple hypothesis testing. J. Am. Stat. Assoc. 2005;100:94–108. [Google Scholar]
18.Zhu Y., Yu Z., Cheng G. High dimensional inference in partially linear models; Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS); Naha, Okinawa, Japan. 16–18 April 2019; pp. 2760–2769. [Google Scholar]
19.Zhang X., Cheng G. Bootstrapping high dimensional time series. arXiv. 20141406.1037 [Google Scholar]
20.Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. (Methodol.) 1995;57:289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x. [DOI] [Google Scholar]
21.Fama E.F., French K.R. The cross-section of expected stock returns. J. Financ. 1992;47:427–465. doi: 10.1111/j.1540-6261.1992.tb04398.x. [DOI] [Google Scholar]
22.Bickel P., Levina E. Some theory for Fisher’s linear discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations. Bernoulli. 2004;10:989–1010. doi: 10.3150/bj/1106314847. [DOI] [Google Scholar]
23.Merlevède F., Peligrad M., Rio E. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Relat. Fields. 2011;151:435–474. doi: 10.1007/s00440-010-0304-9. [DOI] [Google Scholar]
24.Kosorok M.R. Introduction to Empirical Processes and Semiparametric Inference. Springer; New York, NY, USA: 2008. [Google Scholar]
25.Chernozhukov V., Chetverikov D., Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 2013;41:2786–2819. [Google Scholar]
26.Ingster Y.I., Tsybakov A.B., Verzelen N. Detection boundary in sparse regression. Electron. J. Stat. 2010;4:1476–1526. doi: 10.1214/10-EJS589. [DOI] [Google Scholar]

[B1-entropy-22-01258] 1.Bai J., Liao Y. Efficient estimation of approximate factor models via penalized maximum likelihood. J. Econ. 2016;191:1–18. [Google Scholar]

[B2-entropy-22-01258] 2.Heinemann A. Efficient estimation of factor models with time and cross-sectional dependence. J. Appl. Econ. 2017;32:1107–1122. [Google Scholar]

[B3-entropy-22-01258] 3.Fan J., Fan Y., Lv J. High dimensional covariance matrix estimation using a factor model. J. Econ. 2008;147:186–197. [Google Scholar]

[B4-entropy-22-01258] 4.Fan J., Liao Y., Mincheva M. High-dimensional covariance matrix estimation in approximate factor models1. Ann. Stat. 2011;39:3320–3356. doi: 10.1214/11-AOS944. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5-entropy-22-01258] 5.Fan J., Liao Y., Mincheva M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013;75:603–680. doi: 10.1111/rssb.12016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6-entropy-22-01258] 6.Fan J., Wang W., Zhong Y. Robust covariance estimation for approximate factor models. J. Econom. 2019;208:5–22. doi: 10.1016/j.jeconom.2018.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7-entropy-22-01258] 7.Dickhaus T., Pauly M. Time Series Analysis and Forecasting. Springer; Berlin, Germany: 2016. Simultaneous statistical inference in dynamic factor models; pp. 27–45. [Google Scholar]

[B8-entropy-22-01258] 8.Dickhaus T., Sirotko-Sibirskaya N. Simultaneous statistical inference in dynamic factor models: Chi-square approximation and model-based bootstrap. Comput. Stat. Data Anal. 2019;129:30–46. [Google Scholar]

[B9-entropy-22-01258] 9.Lucas J., Carvalho C., Wang Q., Bild A., Nevins J.R., West M. Sparse statistical modelling in gene expression genomics. Bayesian Inference Gene Expr. Proteom. 2006;1:155–176. [Google Scholar]

[B10-entropy-22-01258] 10.Carvalho C.M., Chang J., Lucas J.E., Nevins J.R., Wang Q., West M. High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 2008;103:1438–1456. doi: 10.1198/016214508000000869. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11-entropy-22-01258] 11.Reis R., Watson M.W. Relative goods’ prices, pure inflation, and the Phillips correlation. Am. Econ. J. Macroecon. 2010;2:128–157. [Google Scholar]

[B12-entropy-22-01258] 12.Amengual D., Repetto L. Testing a Large Number of Hypotheses in Approximate Factor Models. CEMFI; Madrid, Spain: 2014. Technical Report. [Google Scholar]

[B13-entropy-22-01258] 13.Candès E., Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007;35:2313–2351. [Google Scholar]

[B14-entropy-22-01258] 14.Cai T., Liu W., Xia Y. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Am. Stat. Assoc. 2013;108:265–277. doi: 10.1080/01621459.2012.758041. [DOI] [Google Scholar]

[B15-entropy-22-01258] 15.Cai T.T., Liu W., Xia Y. Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. (Stat. Methodol.) 2014;76:349–372. [Google Scholar]

[B16-entropy-22-01258] 16.Zhang X., Cheng G. Simultaneous inference for high-dimensional linear models. J. Am. Stat. Assoc. 2017;112:757–768. doi: 10.1080/01621459.2016.1166114. [DOI] [Google Scholar]

[B17-entropy-22-01258] 17.Romano J.P., Wolf M. Exact and approximate stepdown methods for multiple hypothesis testing. J. Am. Stat. Assoc. 2005;100:94–108. [Google Scholar]

[B18-entropy-22-01258] 18.Zhu Y., Yu Z., Cheng G. High dimensional inference in partially linear models; Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS); Naha, Okinawa, Japan. 16–18 April 2019; pp. 2760–2769. [Google Scholar]

[B19-entropy-22-01258] 19.Zhang X., Cheng G. Bootstrapping high dimensional time series. arXiv. 20141406.1037 [Google Scholar]

[B20-entropy-22-01258] 20.Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. (Methodol.) 1995;57:289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x. [DOI] [Google Scholar]

[B21-entropy-22-01258] 21.Fama E.F., French K.R. The cross-section of expected stock returns. J. Financ. 1992;47:427–465. doi: 10.1111/j.1540-6261.1992.tb04398.x. [DOI] [Google Scholar]

[B22-entropy-22-01258] 22.Bickel P., Levina E. Some theory for Fisher’s linear discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations. Bernoulli. 2004;10:989–1010. doi: 10.3150/bj/1106314847. [DOI] [Google Scholar]

[B23-entropy-22-01258] 23.Merlevède F., Peligrad M., Rio E. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Relat. Fields. 2011;151:435–474. doi: 10.1007/s00440-010-0304-9. [DOI] [Google Scholar]

[B24-entropy-22-01258] 24.Kosorok M.R. Introduction to Empirical Processes and Semiparametric Inference. Springer; New York, NY, USA: 2008. [Google Scholar]

[B25-entropy-22-01258] 25.Chernozhukov V., Chetverikov D., Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 2013;41:2786–2819. [Google Scholar]

[B26-entropy-22-01258] 26.Ingster Y.I., Tsybakov A.B., Verzelen N. Detection boundary in sparse regression. Electron. J. Stat. 2010;4:1476–1526. doi: 10.1214/10-EJS589. [DOI] [Google Scholar]

PERMALINK

Simultaneous Inference for High-Dimensional Approximate Factor Model

Yong Wang

Xiao Guo

Abstract

1. Introduction

2. Methodology

2.1. Simultaneous Test for a Single Factor

Assumption 1.

Assumption 2.

Assumption 3.

Assumption 4.

Theorem 1.

Theorem 2.

Assumption 5.

Theorem 3.

2.2. Simultaneous Test for Multiple Factors

Theorem 4.

Assumption 6.

Theorem 5.

3. Multiple Testing with Strong FWER Control

Proposition 1.

4. Simulation Study

Table 1.

Figure 1.

Table 2.

Table 3.

5. Real Data Analysis

Table 4.

Acknowledgments

Appendix A. Technical Details

Lemma A1.

Proof of Lemma A1.

Lemma A2.

Proof of Lemma A2.

Lemma A3.

Proof of Lemma A3.

Lemma A4.

Proof of Lemma A4.

Lemma A5.

Proof of Lemma A5.

Lemma A6.

Proof of Lemma A6.

Lemma A7.

Proof of Lemma A7.

Lemma A8.

Proof of Lemma A8.

Lemma A9.

Proof of Lemma A9.

Proof of Theorem 1.

Proof of Theorem 2.

Proof of Theorem 3.

Proof of Theorem 4.

Proof of Theorem 5.

Author Contributions

Funding

Conflicts of Interest

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases