Summary
In this paper, we develop a model averaging method to estimate a high-dimensional covariance matrix, where the candidate models are constructed by different orders of polynomial functions. We propose a Mallows-type model averaging criterion and select the weights by minimizing this criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. Then, we prove the asymptotic optimality of the resulting model average covariance estimators. Finally, we conduct numerical simulations and a case study on Chinese airport network structure data to demonstrate the usefulness of the proposed approaches.
Keywords: asymptotic optimality, consistency, covariance regression network model, Mallows criterion, model averaging
1. INTRODUCTION
The covariance matrix is a familiar concept and has been widely used in many fields. For example, Markowitz (1952) illustrated geometrically the relationship between beliefs and the choice of portfolio according to the covariance matrix of asset returns. Campbell et al. (1998) and Jagannathan and Ma (2003) considered finance and risk management based on covariance matrices. Bilmes (2000) improved the parsimony of speech recognition systems by adjusting the type of covariance matrices used. Chen and Conley (2001) presented a semiparametric spatial model for high-dimensional panel time series that exploits the positive-definiteness of a covariance function. Friedman et al. (2008) considered the problem of estimating sparse graphs by incorporating the lasso penalty into the estimation of the inverse covariance matrix. Motivated by the arbitrage pricing theory in finance, Fan et al. (2008) used a multi-factor model to reduce dimensionality and estimate the covariance matrix.
Let $\Sigma$ be the $p \times p$ covariance matrix of a $p$-dimensional random vector $\mathbb{Y}$. The most commonly used estimator of $\Sigma$ is the classic sample covariance matrix. However, the sample covariance matrix estimator does not perform well when $p$ is large and the sample size is fixed or grows at a slower rate than $p$, because the number of unknown parameters is then much larger than the sample size. Inevitably, additional assumptions need to be made to ensure that the estimation of $\Sigma$ is feasible. Here, we consider the novel strategy proposed by Lan et al. (2018), which uses a covariance regression network model (CRNM) to estimate the high-dimensional covariance matrix. In this framework, the covariance matrix is regarded as a polynomial function of a symmetric adjacency matrix representing a given network structure. In this way, the estimation of a high-dimensional covariance matrix is converted into the estimation of the low-dimensional coefficients of the CRNM. These authors further developed a Bayesian information criterion (BIC) to select the order of the polynomial function and proved the consistency of the BIC. Their treatment can thus be classed as a model selection approach.
Model averaging can be viewed as a smooth extension of model selection and it can substantially reduce the risk relative to model selection (Hansen, 2014). Furthermore, model averaging is often more stable than model selection, in which a small change of the data may lead to a significant change in the model selected; see Yuan and Yang (2005) and Leung and Barron (2006) for further discussion. There has been much research into model averaging for regression models. For example, Buckland et al. (1997) proposed the smoothed AIC and smoothed BIC method, in which weights are assigned based on the information criterion scores obtained from different models. Hansen (2007) and Wan et al. (2010) developed a Mallows model averaging method for linear regression models. Other methods include jackknife model averaging (Hansen and Racine, 2012; Zhang et al., 2013), heteroscedasticity-robust Cp model averaging (Liu and Okui, 2013), leave-subject-out cross-validation (Gao et al., 2016), and Mahalanobis Mallows model averaging (Zhu et al., 2018).
In this paper, in order to improve the estimation of a high-dimensional covariance matrix with a network structure, we propose a model averaging method based on a normalized version of the CRNM (Lan et al., 2018). We select the averaging weights by minimizing a Mallows-type criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. We then establish the asymptotic optimality of the resulting model average covariance (MAC) estimator when none of the candidate models is correct. We also show that the MAC estimator is consistent in estimating the covariance matrix $\Sigma$ if at least one of the candidate models is in fact correct. Following Lan et al. (2018), for the sake of convenience, we first consider the case of a single observation ($N = 1$), and then discuss the extension when the sample size $N$ increases.
The remainder of this paper is organized as follows. In Section 2, we describe the estimation method for the normalized CRNM (nCRNM), introduce the Mallows-type weight choice criterion, and propose the MAC estimator based on this criterion. The asymptotic optimality of the model averaging estimator is established in Section 3, while the consistency of the method is shown in Section 4. We extend the theorems on asymptotic optimality and consistency to the case where the sample size $N$ is larger than one in Section 5. We compare the finite-sample properties of the MAC estimator with several information criterion-based model selection and averaging estimators in Section 6. A real data example is considered in Section 7. Section 8 contains some concluding remarks. Technical proofs are given in the Online Appendices.
2. MODEL SET-UP AND ESTIMATION
2.1. Covariance regression network model
Consider a graph with $p$ nodes, where the $i$th node represents the $i$th observation, for example the $i$th individual. Let $A = (a_{ij}) \in \mathbb{R}^{p \times p}$ denote the symmetric adjacency matrix of the graph, where $a_{ij} = 1$ if the two nodes $i$ and $j$ are directly related, for example if the $i$th and $j$th individuals know each other, and $a_{ij} = 0$ otherwise. We set the diagonal elements $a_{ii} = 0$. Let $A^k$ be the $k$th power of $A$, with the zeroth power defined as the identity, i.e., $A^0 = I_p$. The $(i, j)$th component of $A^k$ then counts the number of ways to connect the $i$th node and the $j$th node through a path with exactly $k$ steps. In other words, $A^k$ measures the $k$-path relationships of the $p$ nodes.
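To make the path-counting interpretation concrete, the following small numerical check (a toy graph of our own, not one taken from the paper) verifies that the entries of the matrix powers count the number of $k$-step connections between nodes.

```python
import numpy as np

# A toy undirected graph on 4 nodes (illustrative only): edges 1-2, 1-3, 2-3, 3-4.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# The (i, j)th entry of A^k counts the walks of exactly k steps from node i to node j.
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

print(A2[0, 3])  # 1: the single 2-step connection 1 -> 3 -> 4
print(A3[0, 0])  # 2: the 3-step closed walks 1 -> 2 -> 3 -> 1 and 1 -> 3 -> 2 -> 1
```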
Let $Y_i$ be a random variable that describes a certain property of the $i$th node, such as the $i$th person's social activities in a period of time. Let $\mathbb{Y} = (Y_1, \dots, Y_p)^{\top}$, and let the covariance matrix of $\mathbb{Y}$ be $\Sigma = \mathrm{cov}(\mathbb{Y})$. Naturally, $\Sigma$ can be linked to the adjacency matrix $A$. Lan et al. (2018) introduced the CRNM $\Sigma = \sum_{k=0}^{K}\theta_k A^k$, where $K$ is a positive integer and $\theta = (\theta_0, \dots, \theta_K)^{\top}$ is the vector of regression coefficients. To ensure that the parameters in the CRNM are comparable between models, we normalize the elements of $A^k$, $k = 1, \dots, K$, by dividing each power by a common scaling constant. The resulting normalized matrix is denoted $W_k$, whose $(i, j)$th component is the 'normalized' number of ways to connect the $i$th node and the $j$th node through a path with $k$ steps; we also write $W_0 = I_p$. Rather than using the CRNM of Lan et al. (2018), we assume the following normalized CRNM (nCRNM):
$$\Sigma = \sum_{k=0}^{K}\theta_k\,W_k, \qquad (2.1)$$
where $K$ is a positive integer that can increase to infinity with $p$, and $\theta = (\theta_0, \theta_1, \dots, \theta_K)^{\top}$ is the vector of regression coefficients.
To ensure that $\Sigma$ in (2.1) is positive-definite, constraints are imposed on $\theta$. Because the adjacency matrix $A$ is assumed known, an obvious benefit of the nCRNM (2.1) is that it reduces the number of parameters in $\Sigma$ from $p(p+1)/2$ to $K+1$. Note that (2.1) is not necessarily the smallest model, since it is possible that $\theta_k = 0$ for some $k$. In particular, we allow the 'wasteful' case where $\theta_k = 0$ for all $k$ larger than a certain value.
In estimating the covariance matrix $\Sigma$, it is common to assume that the components of $\mathbb{Y}$ have the same mean, which can be consistently estimated by their sample average. One can further centre the data by subtracting the sample mean and work directly with the centred vector. Hence, without loss of generality, we assume that $E(\mathbb{Y}) = 0$ in the rest of the paper. For convenience, we also perform a standard eigenvalue decomposition on the adjacency matrix to obtain
$$A = \Gamma\,\Lambda\,\Gamma^{\top}, \qquad (2.2)$$
where $\Gamma$ is an orthogonal matrix, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is a diagonal matrix, and $\lambda_i$ is the $i$th-largest eigenvalue of $A$. These preparations allow us to convert the model in (2.1) to a simple linear regression model, which facilitates straightforward estimation. Specifically, let $\mathbb{Z} = (Z_1, \dots, Z_p)^{\top} = \Gamma^{\top}\mathbb{Y}$, where $\Gamma$ is defined in (2.2). Then $\mathbb{Z}$ has mean $0$ and covariance matrix $\Gamma^{\top}\Sigma\,\Gamma = \sum_{k=0}^{K}\theta_k\,\Gamma^{\top}W_k\,\Gamma$, which is diagonal because every normalized power $W_k$ shares the eigenvectors $\Gamma$. Extracting the $i$th diagonal element of these relations, we obtain $E(Z_i^2) = \sum_{k=0}^{K}\theta_k\,x_{ik}$, where $x_{ik}$ is the $i$th diagonal element of $\Gamma^{\top}W_k\,\Gamma$; this is a multiple linear regression model if we view $Z_i^2$ as the response variable, $x_{ik}$ as the $k$th regressor, and $\theta_0, \dots, \theta_K$ as the regression coefficients. To fix the notation, we define $\mathbb{Z}^{(2)} = (Z_1^2, \dots, Z_p^2)^{\top}$ and $\mathbb{X} = (x_{ik}) \in \mathbb{R}^{p \times (K+1)}$. Then, the nCRNM (2.1) is equivalently written as
$$\mathbb{Z}^{(2)} = \mathbb{X}\,\theta + e, \qquad (2.3)$$
where $e = \mathbb{Z}^{(2)} - E(\mathbb{Z}^{(2)})$ is a mean-zero error vector. Let $\mu = E(\mathbb{Z}^{(2)}) = \mathbb{X}\theta$, and let $D(v)$ denote the diagonal matrix whose $i$th diagonal entry is the $i$th element of a vector $v$. With this notation,
$$\Sigma = \Gamma\,D(\mu)\,\Gamma^{\top} = \Gamma\,D(\mathbb{X}\theta)\,\Gamma^{\top}. \qquad (2.4)$$
Thus, for the nCRNM (2.1) with a pre-determined $K$, one can use the ordinary least squares method when $\mathbb{X}$ is of full column rank to obtain the estimator of $\theta$ in (2.3), namely $\hat\theta = (\mathbb{X}^{\top}\mathbb{X})^{-1}\mathbb{X}^{\top}\mathbb{Z}^{(2)}$, and then use (2.4) to obtain $\hat\Sigma = \Gamma\,D(\mathbb{X}\hat\theta)\,\Gamma^{\top}$.
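To fix ideas, here is a minimal numerical sketch of the estimation route just described: form normalized powers of the adjacency matrix, rotate the data with its eigenvectors, run the diagonal regression, and map the fitted values back to a covariance estimate. The function name, the Frobenius-norm scaling of the powers, and the single-observation input are illustrative assumptions of ours rather than the paper's exact specification.

```python
import numpy as np

def ncrnm_fit(Y, A, K):
    """Sketch of the nCRNM estimation described above (names and scaling are assumptions).

    Y : (p,) observed vector, assumed centred; A : (p, p) symmetric adjacency matrix;
    K : polynomial order.  Returns (theta_hat, Sigma_hat).
    """
    p = len(Y)
    # Powers A^0, ..., A^K; the powers with k >= 1 are rescaled by their Frobenius norm,
    # which is only an illustrative choice of normalization.
    powers = [np.eye(p)]
    for k in range(1, K + 1):
        powers.append(powers[-1] @ A)
    W = [powers[0]] + [P / max(np.linalg.norm(P), 1.0) for P in powers[1:]]

    # Eigendecomposition A = Gamma Lambda Gamma^T; all powers share the eigenvectors Gamma.
    _, Gamma = np.linalg.eigh(A)

    # Rotate the data and form the linear regression E(Z_i^2) = sum_k theta_k x_{ik},
    # where x_{ik} is the i-th diagonal element of Gamma^T W_k Gamma.
    Z = Gamma.T @ Y
    X = np.column_stack([np.diag(Gamma.T @ Wk @ Gamma) for Wk in W])
    theta_hat, *_ = np.linalg.lstsq(X, Z**2, rcond=None)

    # Map the fitted diagonal back to the covariance scale: Sigma_hat = Gamma D(X theta) Gamma^T.
    Sigma_hat = Gamma @ np.diag(X @ theta_hat) @ Gamma.T
    return theta_hat, Sigma_hat
```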
2.2. Candidate models and estimation
Each fixed polynomial order $K$ represents a different nCRNM (2.1). In practice, which value of $K$ is suitable is often unclear. We thus consider many possible values of $K$, which corresponds to a series of possible models. We index these models by $m = 1, \dots, M$, where the total number of models $M$ can be related to the total number of nodes $p$. To further increase flexibility, we allow the $m$th candidate nCRNM to contain $k_m$ arbitrary monomials of the normalized adjacency matrix, which do not have to be the first $k_m$ powers but must include the intercept term $W_0 = I_p$. Similar to (2.3), the resulting $m$th nCRNM is equivalently written as
$$\mathbb{Z}^{(2)} = \mathbb{X}_m\,\theta_{(m)} + e_{(m)}, \qquad (2.5)$$
where $\mathbb{X}_m$ is a $p \times k_m$ submatrix of $\widetilde{\mathbb{X}}$, $\theta_{(m)}$ is the coefficient vector, and $e_{(m)}$ is the error. Here we can assume that $\widetilde{\mathbb{X}}$ is a $p \times \bar{K}$ matrix with $\bar{K}$ sufficiently large so that $\mathbb{X}_m$ is a submatrix of $\widetilde{\mathbb{X}}$ for all $m = 1, \dots, M$ and $\widetilde{\mathbb{X}}$ is of full column rank. Although we consider $M$ models, we do not assume that any of these models are correct. Thus, we allow some or all of the $M$ models to be misspecified.
It is easy to see that the ordinary least squares estimator of $\theta_{(m)}$ is $\hat\theta_{(m)} = (\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}\mathbb{Z}^{(2)}$, and the estimator of $\mu$ is $\hat\mu_m = P_m\,\mathbb{Z}^{(2)}$, where $P_m = \mathbb{X}_m(\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}$ is the projection matrix. By (2.4), the estimator of $\Sigma$ based on the $m$th candidate model is
$$\hat\Sigma_m = \Gamma\,D(\hat\mu_m)\,\Gamma^{\top} = \Gamma\,D(P_m\,\mathbb{Z}^{(2)})\,\Gamma^{\top}.$$
We thus have $M$ estimators of $\Sigma$ based on the $M$ candidate models.
2.3. Model averaging and weight choice criterion
Our purpose is now to combine the $M$ estimators of $\Sigma$ obtained from the $M$ candidate models to achieve an optimal estimator of $\Sigma$ through a weighted average of the $\hat\Sigma_m$s. The optimality is reflected in the fact that the resulting estimator minimizes the distance to the true $\Sigma$, while it simultaneously ensures that the estimated covariance matrix is positive-definite. Note that the positive-definiteness property is not guaranteed by the estimators described in Subsections 2.1 and 2.2. Of course, to obtain a positive-definite matrix by forming a weighted average of several candidate matrices, we have to assume that at least one of these candidate matrices is positive-definite. This is reasonable and is certainly true when the set of candidates contains the trivial model, which contains only the intercept term $W_0 = I_p$.
Let $w = (w_1, \dots, w_M)^{\top}$ be a weight vector. A model average estimator of $\Sigma$ is of the form
$$\hat\Sigma(w) = \sum_{m=1}^{M} w_m\,\hat\Sigma_m = \Gamma\,D\{\hat\mu(w)\}\,\Gamma^{\top},$$
where $\hat\mu(w) = \sum_{m=1}^{M} w_m\,\hat\mu_m$. Similarly, define $P(w) = \sum_{m=1}^{M} w_m P_m$. We then have
$$\hat\mu(w) = P(w)\,\mathbb{Z}^{(2)}.$$
We restrict the weight vector to the set $\mathcal{W}$, where
$$\mathcal{W} = \Big\{w \in [0, 1]^M : \sum_{m=1}^{M} w_m = 1 \text{ and } \hat\Sigma(w) \text{ is positive-definite}\Big\}. \qquad (2.6)$$
Obviously, $\mathcal{W}$ is not empty under our assumption that at least one candidate model is positive-definite. We measure the distance between two matrices using the Frobenius norm of their difference, and hence the Frobenius norm loss of $\hat\Sigma(w)$ is naturally defined as
$$L(w) = \big\|\hat\Sigma(w) - \Sigma\big\|_F^2,$$
where $\|\cdot\|_F$ denotes the Frobenius norm. Our purpose is to devise a weight choice criterion to minimize the expected loss $E\{L(w)\}$.
We first note that
$$L(w) = \big\|\hat\Sigma(w) - \Sigma\big\|_F^2 = \big\|\hat\mu(w) - \mu\big\|^2, \qquad (2.7)$$
where $\|\cdot\|$ denotes the $\ell_2$ norm. This means that the Frobenius norm loss of $\hat\Sigma(w)$ is the same as the squared error loss of $\hat\mu(w)$. The corresponding risk function, defined as the expected loss function, can then be calculated as
$$R(w) = E\{L(w)\} = \big\|\{P(w) - I_p\}\mu\big\|^2 + \mathrm{tr}\big\{P(w)\,\Omega\,P^{\top}(w)\big\}, \qquad (2.8)$$
where $\Omega = \mathrm{cov}(\mathbb{Z}^{(2)})$. Of course, neither $L(w)$ nor $R(w)$ can be directly minimized because they depend on $\mu$, which is unknown. We thus work around the difficulty by first replacing $\mu$ with $\mathbb{Z}^{(2)}$, and then adjusting for the offset. This leads us to construct an estimator of the risk function,
$$C(w) = \big\|\mathbb{Z}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\Omega\big\}. \qquad (2.9)$$
It can be readily shown that
$$E\{C(w)\} = R(w) + \mathrm{tr}(\Omega). \qquad (2.10)$$
This implies that $C(w)$ is an unbiased estimator of the risk function $R(w)$ plus a constant irrelevant to the weight $w$. Therefore, by minimizing $C(w)$, we expect that $L(w)$ and $R(w)$ are also minimized. The property in (2.10) is similar to that of the Mallows criterion proposed by Hansen (2007). We thus proceed to use the Mallows-type model averaging criterion. Alternatively, the jackknife model averaging criterion (Hansen and Racine, 2012; Zhang et al., 2013) can also be used for weight choice; Liu and Okui (2013) have shown that the Mallows-type and jackknife methods have a similar performance.
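To see (2.10) under the notation above, note that $E(e) = 0$ and $\mathrm{cov}(e) = \Omega$; a standard Mallows-type calculation, sketched here under these assumptions, gives
$$\begin{aligned} E\{C(w)\} &= E\big\|\{I_p - P(w)\}(\mu + e)\big\|^2 + 2\,\mathrm{tr}\{P(w)\Omega\} \\ &= \big\|\{I_p - P(w)\}\mu\big\|^2 + \mathrm{tr}\big[\{I_p - P(w)\}\,\Omega\,\{I_p - P(w)\}^{\top}\big] + 2\,\mathrm{tr}\{P(w)\Omega\} \\ &= \big\|\{P(w) - I_p\}\mu\big\|^2 + \mathrm{tr}\{P(w)\,\Omega\,P^{\top}(w)\} + \mathrm{tr}(\Omega) \;=\; R(w) + \mathrm{tr}(\Omega). \end{aligned}$$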
In practice, the covariance matrix $\Omega$ in $C(w)$ is unknown and needs to be estimated. Following Hansen (2007), Lan et al. (2018) and Zhang and Wang (2019), we estimate $\Omega$ based on a candidate model containing the largest number of covariates, indexed by
$$M^* = \arg\max_{1 \le m \le M} k_m. \qquad (2.11)$$
When several candidates have the same maximum number of covariates, we simply pick any one of such candidate models as the $M^*$th model. This leads to the estimator
$$\hat\Omega = D(\hat{e}_1^2, \dots, \hat{e}_p^2), \qquad (2.12)$$
where $\hat{e} = (I_p - P_{M^*})\,\mathbb{Z}^{(2)}$, and $\hat{e}_i$ is the $i$th component of $\hat{e}$. Here, we have restricted $\hat\Omega$ to be diagonal, while $\Omega$ itself may not be diagonal. We find that, despite this seemingly crude practice, the resulting model averaging estimator is always optimal in that it minimizes the expected Frobenius loss regardless of whether $\Omega$ is diagonal or not.
When $\Omega$ is replaced by $\hat\Omega$, $C(w)$ changes to
$$\hat{C}(w) = \big\|\mathbb{Z}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\hat\Omega\big\}, \qquad (2.13)$$
in which all quantities are known except the weight vector $w$. Minimizing $\hat{C}(w)$ with respect to $w$ leads to
$$\hat{w} = \arg\min_{w \in \mathcal{W}} \hat{C}(w). \qquad (2.14)$$
Substituting $\hat{w}$ into $\hat\Sigma(w)$ yields the model average estimator $\hat\Sigma(\hat{w})$, which we name the model average covariance (MAC) estimator. We point out that the minimization of (2.13) with respect to $w$ is a constrained quadratic programming problem, and hence the computation of the optimal weight vector is straightforward. For example, quadratic programming can be performed using the quadprog package in R, the quadprog command in MATLAB, or the qp command in SAS. Next, we present the asymptotic optimality of the MAC estimator $\hat\Sigma(\hat{w})$.
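As a concrete illustration of the weight selection step, the sketch below computes the candidate estimators, the plug-in criterion in (2.13), and the weights in (2.14) with a general-purpose SLSQP solver rather than the quadprog routines mentioned above; it imposes only the simplex constraint and leaves the positive-definiteness requirement of (2.6) to a separate check, and the function names and the diagonal residual-based estimate of $\Omega$ are our own assumptions rather than the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mac_weights(Z2, X_list, Gamma):
    """Sketch of the Mallows-type weight choice (2.13)-(2.14); notation is assumed.

    Z2     : (p,) vector of squared rotated observations Z_i^2;
    X_list : list of design matrices X_m for the M candidate models;
    Gamma  : (p, p) eigenvector matrix from (2.2).
    Returns (w_hat, Sigma_mac).
    """
    p, M = len(Z2), len(X_list)
    P_list = [X @ np.linalg.solve(X.T @ X, X.T) for X in X_list]   # projection matrices P_m
    mu_hat = np.column_stack([P @ Z2 for P in P_list])             # fitted means, one column per model

    # Omega is estimated from the residuals of the largest candidate model (cf. (2.12));
    # the diagonal matrix of squared residuals used here is an illustrative choice.
    m_star = int(np.argmax([X.shape[1] for X in X_list]))
    resid = Z2 - P_list[m_star] @ Z2
    Omega_hat = np.diag(resid**2)

    penalty = np.array([np.trace(P @ Omega_hat) for P in P_list])

    def criterion(w):                                              # Mallows-type criterion (2.13)
        return np.sum((Z2 - mu_hat @ w)**2) + 2.0 * penalty @ w

    w0 = np.full(M, 1.0 / M)
    cons = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    res = minimize(criterion, w0, bounds=[(0.0, 1.0)] * M, constraints=[cons], method="SLSQP")
    w_hat = res.x

    # MAC estimator: weighted average of the candidate covariance estimators.
    # Only the simplex constraint was imposed above; positive-definiteness of Sigma_mac,
    # required by the weight set in (2.6), should be verified separately.
    Sigma_mac = Gamma @ np.diag(mu_hat @ w_hat) @ Gamma.T
    return w_hat, Sigma_mac
```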
3. ASYMPTOTIC OPTIMALITY
As we have pointed out, in order to obtain a final covariance matrix estimator based on several candidate estimators, at least one of these candidate estimators needs to be valid, i.e., positive-definite. We state this formally as Condition (C.1). We then proceed to establish the optimality property of our procedure under scenarios of both independent and dependent error components.
Condition (C.1)
There exists an $m \in \{1, \dots, M\}$ such that $\hat\Sigma_m$ is positive-definite.
We first define some notation. Let $\mathcal{W}^* = \{w \in [0, 1]^M : \sum_{m=1}^{M} w_m = 1\}$, which is a larger space than $\mathcal{W}$. Write $\xi_p = \inf_{w \in \mathcal{W}^*} R(w)$. Let $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote respectively the maximum and minimum eigenvalues of a matrix. Let $w_m^0$ be an $M \times 1$ vector whose $m$th element is one and all the other elements are zero. We use $P^{(m)}_{ii}$ to denote the $i$th diagonal element of $P_m$ and let $\bar{P} = \max_{1 \le m \le M}\max_{1 \le i \le p} P^{(m)}_{ii}$. All limiting processes correspond to $p \to \infty$ unless stated otherwise.
3.1. Asymptotic optimality with independence
We first consider the situation where the elements of $\mathbb{Z}^{(2)}$ are independent of each other. This arises when, for example, $\mathbb{Y}$ is normally distributed: in this case, $\mathbb{Z} = \Gamma^{\top}\mathbb{Y}$ is normally distributed with the diagonal covariance matrix $D(\mu)$. This implies that the elements of $\mathbb{Z}$ are independent of each other; as a consequence, the elements of $\mathbb{Z}^{(2)}$ are also independent of each other. We assume the following regularity conditions, in which $e_i$ denotes the $i$th element of the error $e$ in (2.3).
Condition (C.2)
There exist a fixed integer $G$ and a constant $\bar{c} < \infty$ so that $E(e_i^{4G}) \le \bar{c}$ for $i = 1, \dots, p$.
Condition (C.3)
$M\,\xi_p^{-2G}\sum_{m=1}^{M}\{R(w_m^0)\}^{G} \to 0$ for the same constant $G$ as given in Condition (C.2).
Condition (C.4)
There exists a constant $c$ such that $\bar{P} \le c\,\bar{K}/p$, where, recall, $\bar{K}$ is the number of columns of $\widetilde{\mathbb{X}}$ in the largest model.
Condition (C.5)
$p^{-1}\sum_{i=1}^{p}\mu_i^2 = O(1)$, a.s.
Condition (C.6)
$\bar{K} \le c_K\,\xi_p$, where $c_K$ is a constant.
Remark 3.1
Condition (C.2) places a moment restriction on the error term. Condition (C.3) imposes a relation between the best model risk and the total risks from all the models, in that the best model should not be too much better than all other models. Specifically, a sufficient condition for Condition (C.3) can be stated in terms of the smallest and largest risks among the candidate models: the risk of the best model can increase more slowly than that of the worst model, but not too much more slowly. In addition, a consequence of Condition (C.3) is that $\xi_p \to \infty$, which implies that there is no correctly specified candidate model with a finite dimension. This can be seen by an argument of contradiction: if the $m$th candidate model were correctly specified with a finite $k_m$, then $R(w_m^0) = \mathrm{tr}\{P_m\Omega P_m^{\top}\} \le \lambda_{\max}(\Omega)\,k_m$, which is bounded if $k_m$ is bounded; then $\xi_p \le R(w_m^0)$ would also be bounded, a contradiction. Similar conditions are used in Wan et al. (2010), Liu and Okui (2013) and Ando and Li (2014). Condition (C.4) is commonly used to ensure the asymptotic optimality of cross-validation; see, for example, Andrews (1991) and Hansen and Racine (2012). Condition (C.5) concerns the sum of the squares of the elements of $\mu$ and is commonly used in the context of linear regression; see, for example, Wan et al. (2010) and Liang et al. (2011). Condition (C.6) means that $\bar{K}$ has order the same as or smaller than $\xi_p$. A similar requirement, with the sample size $n$ in place of $p$, has been used in Wan et al. (2010) and Liu et al. (2016).
Theorem 3.1
Assume that Conditions (C.1)–(C.6) hold. Then, as $p \to \infty$,
$$\frac{L(\hat{w})}{\inf_{w \in \mathcal{W}} L(w)} \to 1 \quad \text{in probability.} \qquad (3.1)$$
Theorem 3.1 shows that the MAC estimator is asymptotically optimal in that it leads to a squared error loss that is asymptotically identical to that of the infeasible best possible model average estimator. The proof of Theorem 3.1 is given in Online Appendix A.1.
3.2. Asymptotic optimality without independence
We now consider the situation where the elements of $\mathbb{Z}^{(2)}$ are dependent. Recall that $\Omega = \mathrm{cov}(\mathbb{Z}^{(2)})$, and thus we allow $\Omega$ to be nondiagonal. We can still establish the asymptotic optimality described in Theorem 3.1 with the following additional conditions.
Condition (C.7)
$\xi_p^{-2}\sum_{m=1}^{M} R(w_m^0) \to 0.$
Condition (C.8)
$\xi_p^{-1}\max_{1 \le m \le M}\mathrm{tr}(P_m\Omega) \to 0.$
Condition (C.9)
$c_0 \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le c$, where $c_0$ and $c$ are positive constants.
Remark 3.2
Condition (C.7) is similar to Condition (C.3) but is weaker; it is implied by Condition (C.3) with $G = 1$. Condition (C.8) is similar to condition (22) in Zhang et al. (2013). When all candidate models are nested within the largest candidate model $M^*$ defined in (2.11), $\mathrm{tr}(P_m\Omega) \le \lambda_{\max}(\Omega)\,k_{M^*}$, so Condition (C.8) is implied by $\bar{K}/\xi_p \to 0$, which is again a restriction on $\bar{K}$ and is similar to Condition (C.6). Condition (C.9) is a commonly used boundedness requirement on the minimum and maximum eigenvalues of $\Omega$.
Theorem 3.2
Assume that Conditions (C.1), (C.4), (C.5) and (C.7)–(C.9) hold. Then, as $p \to \infty$, the asymptotic optimality in (3.1) still holds.
Theorem 3.2 shows that Theorem 3.1 remains valid when $\Omega$ is a general matrix instead of a diagonal matrix. The proof of Theorem 3.2 is given in Online Appendix A.2 in the online Supporting Information. The reason why the optimality property is retained in Theorem 3.2 lies in Condition (C.8): this condition restricts the effect of the second term of the criterion $\hat{C}(w)$ to be negligible for the final weight choice compared with the first term, and thus the estimate of $\Omega$ has a negligible effect.
4. CONSISTENCY WITH CORRECTLY SPECIFIED CANDIDATE MODELS
As discussed in Remark 3.1, the optimality shown above essentially excludes the situation where at least one of the candidate models is correctly specified. Here, we show that when there is at least one correctly specified candidate model, the model averaging method described above achieves the same convergence rate as these correctly specified candidate models. Let $\Pi_m$ be a $k_m \times \bar{K}$ selection matrix so that $\mathbb{X}_m = \widetilde{\mathbb{X}}\,\Pi_m^{\top}$. Assume that the $m_0$th model is a correctly specified model, i.e.,
$$\mu = \mathbb{X}_{m_0}\theta_{(m_0)}. \qquad (4.1)$$
Thus,
$$\Sigma = \Gamma\,D\big(\mathbb{X}_{m_0}\theta_{(m_0)}\big)\,\Gamma^{\top}. \qquad (4.2)$$
Because the $M^*$th model has the largest number of covariates, for convenience we assume that the $m_0$th model is nested inside the $M^*$th model.
Under the $m$th candidate model, the full coefficient vector is $\Pi_m^{\top}\theta_{(m)}$, and thus the estimator of $\theta$ is $\Pi_m^{\top}\hat\theta_{(m)}$. Then, the model averaging estimator of $\theta$ is
$$\hat\theta(w) = \sum_{m=1}^{M} w_m\,\Pi_m^{\top}\hat\theta_{(m)}. \qquad (4.3)$$
To build the consistency of $\hat\theta(\hat{w})$, we assume the following regularity condition. Let $\widetilde{\Lambda} = p^{-1}\widetilde{\mathbb{X}}^{\top}\widetilde{\mathbb{X}}$.
Condition (C.10)
$c_3 \le \lambda_{\min}(\widetilde{\Lambda}) \le \lambda_{\max}(\widetilde{\Lambda}) \le c_4$, where $c_3$ and $c_4$ are constants.
Remark 4.1
Condition (C.10) requires the additional components in bigger models to contribute sufficiently different structures. Condition (C.10) is the same as condition (A1) of Zou and Zhang (2009). In their paper, the two bounds are constants, and we sometimes use the same $c$ to denote different constants.
Lemma 4.1
If Conditions (C.9) and (C.10) are satisfied, then
$$\big\|\hat\theta_{(m_0)} - \theta_{(m_0)}\big\| = O_p\big\{(\bar{K}/p)^{1/2}\big\}. \qquad (4.4)$$
Remark 4.2
Lemma 4.1 shows the $(p/\bar{K})^{1/2}$-consistency of the correctly specified model coefficient estimator $\hat\theta_{(m_0)}$, which is a very common convergence result when the dimension of the coefficients diverges; see, for example, He and Shao (2000) and Fan and Peng (2004). The proof of Lemma 4.1 is given in Online Appendix A.3.
Theorem 4.1
Theorem 4.1 shows the $(p/\bar{K})^{1/2}$-consistency of the optimal model averaging coefficient estimator $\hat\theta(\hat{w})$. The results in Theorem 4.1 may appear counter-intuitive at first glance, since they indicate that the large size of the unknown matrix $\Sigma$ is beneficial to us. This is a direct consequence of the assumption that the adjacency matrix $A$ is known, and hence a larger $p$ under such a setting does indeed represent more information. On the other hand, $\bar{K}$ is the number of parameters, and hence $(\bar{K}/p)^{1/2}$ indeed resembles the usual diverging-parameter convergence rate. The proof of Theorem 4.1 is given in Online Appendix A.4.
5. EXTENSIONS WITH N > 1
In this section, we extend the asymptotic optimality and consistency properties in Sections 3 and 4 to the case of $N > 1$. Suppose we have a sample $\mathbb{Y}_1, \dots, \mathbb{Y}_N$, which consists of independent and identically distributed copies of $\mathbb{Y}$. Denote $\mathbb{Z}_j = \Gamma^{\top}\mathbb{Y}_j$ and let $\mathbb{Z}^{(2)}_j$ be the vector whose $i$th component is the square of the $i$th component of $\mathbb{Z}_j$, for $j = 1, \dots, N$. Let $\bar{\mathbb{Z}}^{(2)} = N^{-1}\sum_{j=1}^{N}\mathbb{Z}^{(2)}_j$. Then, we know $E(\bar{\mathbb{Z}}^{(2)}) = \mu$ and $\mathrm{cov}(\bar{\mathbb{Z}}^{(2)}) = \Omega/N$. By model (2.3) and the $m$th candidate model (2.5), we have that
$$\bar{\mathbb{Z}}^{(2)} = \mathbb{X}_m\,\theta_{(m)} + \bar{e}_{(m)}. \qquad (5.1)$$
Then, the ordinary least squares estimator of $\theta_{(m)}$ is $\hat\theta_{(m)} = (\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}\bar{\mathbb{Z}}^{(2)}$, and the estimator of $\Sigma$ is $\hat\Sigma_m = \Gamma\,D(P_m\bar{\mathbb{Z}}^{(2)})\,\Gamma^{\top}$. For simplicity, we do not explicitly redefine the remaining quantities such as $\hat\mu_m$, $\hat\mu(w)$, $\hat\Sigma(w)$, $L(w)$ and $R(w)$, other than pointing out that they are the same as defined in Section 2 except with the $\mathbb{Z}^{(2)}$ in their corresponding expressions replaced by $\bar{\mathbb{Z}}^{(2)}$.
The loss function is $L_N(w) = \|\hat\Sigma(w) - \Sigma\|_F^2 = \|\hat\mu(w) - \mu\|^2$, and the corresponding risk function is
$$R_N(w) = E\{L_N(w)\} = \big\|\{P(w) - I_p\}\mu\big\|^2 + N^{-1}\mathrm{tr}\big\{P(w)\,\Omega\,P^{\top}(w)\big\}. \qquad (5.2)$$
The estimator of the risk function is
$$C_N(w) = \big\|\bar{\mathbb{Z}}^{(2)} - \hat\mu(w)\big\|^2 + 2N^{-1}\mathrm{tr}\big\{P(w)\,\Omega\big\},$$
which is an unbiased estimator of the risk $R_N(w)$ up to a constant, i.e.,
$$E\{C_N(w)\} = R_N(w) + N^{-1}\mathrm{tr}(\Omega).$$
In practice, we estimate $\Omega/N$ by
$$\hat\Omega_N = D(\hat{e}_1^2, \dots, \hat{e}_p^2), \qquad (5.3)$$
where $\hat{e}_i$ is the $i$th component of $\hat{e} = (I_p - P_{M^*})\bar{\mathbb{Z}}^{(2)}$. Then, $C_N(w)$ is changed to
$$\hat{C}_N(w) = \big\|\bar{\mathbb{Z}}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\hat\Omega_N\big\}, \qquad (5.4)$$
and the optimal weight vector is
$$\hat{w} = \arg\min_{w \in \mathcal{W}} \hat{C}_N(w). \qquad (5.5)$$
Next, we consider the asymptotic optimality of the model averaging estimator $\hat\Sigma(\hat{w})$ as $p \to \infty$ and $N \to \infty$ when no correct model is contained in the candidate model set.
Corollary 5.1
When $\Omega$ is diagonal, if Conditions (C.1)–(C.6) hold, then as $p \to \infty$ and $N \to \infty$, the asymptotic optimality in (3.1) still holds.
Corollary 5.2
When $\Omega$ is nondiagonal, if Conditions (C.1), (C.4), (C.5), (C.7)–(C.9) hold, then as $p \to \infty$ and $N \to \infty$, the asymptotic optimality in (3.1) still holds.
Corollaries 5.1 and 5.2 are the generalized versions of Theorems 3.1 and 3.2, where the sample size $N$ is larger than 1. The proofs of Corollaries 5.1 and 5.2 are given in Online Appendices A.5 and A.6. From these proofs, it is straightforward to see that when $N$ is fixed, these corollaries still hold.
Similarly, we also establish the consistency of the method when at least one candidate model is correct. Since $N$ can go to infinity, the convergence order in Lemma 4.1 will depend on $N$. Specifically, we have the following result.
Lemma 5.1
If Conditions (C.9) and (C.10) are satisfied, then
$$\big\|\hat\theta_{(m_0)} - \theta_{(m_0)}\big\| = O_p\big[\{\bar{K}/(Np)\}^{1/2}\big]. \qquad (5.6)$$
Remark 5.1
Lemma 5.1 is a generalization of Lemma 4.1 in which the sample size $N$ is allowed to diverge. The proof of Lemma 5.1 is given in Online Appendix A.7.
Corollary 5.3
From Corollary 5.3, the convergence rate of the model averaging coefficient estimator improves as $N$ grows, but only up to a point: once $N$ is sufficiently large, the rate no longer improves. Thus, as far as the convergence rate is concerned, there is no need to increase $N$ beyond that point, because no more gain can be obtained. The proof of Corollary 5.3 is given in Online Appendix A.8.
6. SIMULATION STUDY
This section is devoted to a comparison of the finite-sample performance of the MAC estimator with the AIC- and BIC-based model selection and averaging estimators. The AIC and BIC scores for the $m$th candidate model are $\mathrm{AIC}_m = p\log\hat\sigma_m^2 + 2k_m$ and $\mathrm{BIC}_m = p\log\hat\sigma_m^2 + k_m\log p$, respectively, where $\hat\sigma_m^2 = p^{-1}\|\mathbb{Z}^{(2)} - P_m\mathbb{Z}^{(2)}\|^2$. (When $N > 1$, we use $\bar{\mathbb{Z}}^{(2)}$ instead of $\mathbb{Z}^{(2)}$ and define $\hat\sigma_m^2$ accordingly.) Buckland et al. (1997) suggested the smoothed AIC (SAIC) and smoothed BIC (SBIC) model averaging methods, in which the weight for the $m$th model is simply set as $\exp(-\mathrm{AIC}_m/2)\,/\sum_{l=1}^{M}\exp(-\mathrm{AIC}_l/2)$, and similarly for SBIC. Owing to their ease of use, the SAIC and SBIC weight choice methods have been used extensively in the literature (Wan and Zhang, 2009; Millar et al., 2014).
In the following, we consider two kinds of experimental design. In the first design, all candidate models are misspecified, while in the second one, some candidate models are correctly specified.
6.1. Experimental designs
Design 1 (All candidate models are misspecified). We set the true covariance matrix $\Sigma$ to be a polynomial in the normalized powers of the adjacency matrix. The individual elements of the adjacency matrix $A$, $a_{ij}$ with $i < j$, are independently generated from a binary distribution whose success probability is governed by a network-density parameter set to 5 or 10, while $a_{ji} = a_{ij}$. We set $p = 200$ or $400$, and the maximum candidate order is an integer function of $p$ obtained with the floor function $\lfloor x\rfloor$, which returns the largest integer not greater than $x$. The density parameter and $p$ together control the dimension and the sparsity of the network structure. We let $\bar{K}$ denote the maximum order considered in the candidate models. Here we have chosen very small coefficients to ensure the positive-definiteness of $\Sigma$. In addition, the response vector is generated as $\mathbb{Y} = \Sigma^{1/2}\varepsilon$, where each component of $\varepsilon$ is independently and identically simulated from either a standard normal distribution (norm), a standardized exponential distribution (exp), or a mixture of two normal distributions (mix). For the 'mix' case, the first distribution has mean zero and variance 5/9, and the second one has mean zero and variance 5, with the mixture coefficient itself generated from a binomial distribution with probability 0.9; that is, each component is drawn from the first distribution when an independent Binomial(0.9) indicator equals one and from the second distribution otherwise. We repeat the simulation 500 times under each simulation setting.
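For concreteness, a small sketch of how the 'mix' errors described above could be generated is given below; our reading is that the Binomial(0.9) indicator selects the first, variance-5/9 component, which makes the mixture have unit variance (0.9 * 5/9 + 0.1 * 5 = 1), and the function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_errors(size, rng):
    """Two-component normal mixture: variance 5/9 with probability 0.9, variance 5 otherwise."""
    flag = rng.binomial(1, 0.9, size=size)                 # 1 -> first component, 0 -> second
    comp1 = rng.normal(0.0, np.sqrt(5.0 / 9.0), size=size)
    comp2 = rng.normal(0.0, np.sqrt(5.0), size=size)
    return np.where(flag == 1, comp1, comp2)

eps = mixture_errors(10_000, rng)
print(eps.mean(), eps.var())   # both close to 0 and 1, respectively
```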
In all candidate models, the intercept term is always included, while each of the other monomials may or may not be included; thus, we consider a total of $2^{\bar{K}}$ candidate models. Note that the true $\Sigma$ contains nonzero coefficients on monomials of higher order than any monomial included in the candidate models, for both $p = 200$ and $p = 400$, and hence all candidate models are misspecified.
Design 2 (Some candidate models are correctly specified). We set the covariance matrix $\Sigma$ to be a polynomial in the normalized adjacency powers whose nonzero terms are all contained in some of the candidate models. We consider three settings of the true coefficient vector. As in Design 1, the intercept term is always included, while all of the other components are allowed to be zero; thus, we again consider a total of $2^{\bar{K}}$ candidate models. All other aspects of Design 2 are the same as in Design 1.
We evaluate the performance of the various estimators based on the following mean Frobenius loss (MFL):
$$\mathrm{MFL} = \frac{1}{T}\sum_{t=1}^{T}\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (6.1)$$
where $\hat\Sigma^{(t)}$ is the estimator of $\Sigma$ obtained by a given method in the $t$th trial, and $T = 500$ is the number of replications. In Design 2, where some candidate models are correctly specified, in order to illustrate the consistency of the MAC method shown in Corollary 5.3 we also calculate the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\big\|\hat\theta^{(t)}(\hat{w}^{(t)}) - \theta\big\|^2, \qquad (6.2)$$
where $\hat{w}^{(t)}$ and $\hat\theta^{(t)}(\hat{w}^{(t)})$ are the chosen weight vector and the resulting estimator of $\theta$ obtained in the $t$th trial.
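A minimal sketch of how the two evaluation criteria could be computed from stored simulation output follows; whether the Frobenius norm in (6.1) is squared, and the exact normalization of (6.2), are assumptions on our part, as are the function names.

```python
import numpy as np

def mean_frobenius_loss(Sigma_hats, Sigma_true):
    """Average (squared) Frobenius distance between estimated and true covariance matrices."""
    return np.mean([np.linalg.norm(S - Sigma_true, ord="fro")**2 for S in Sigma_hats])

def mean_squared_error(theta_hats, theta_true):
    """Average squared Euclidean distance between model-averaged and true coefficients."""
    return np.mean([np.sum((t - theta_true)**2) for t in theta_hats])
```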
6.2. Results
The MFLs for Design 1 are presented in Tables 1–3 under the normal, exponential and mixture distributions, respectively. The MFLs for Design 2 are presented in Tables 4–6, again under the normal, exponential and mixture distributions, respectively. To facilitate comparisons, we flag the best, second best and worst estimators in each case in bold, bold italic and italic, respectively. The MSEs for Design 2 are shown in Table 7.
Table 1.
The MFL under a normal distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 5.780 | 6.467 | 6.992 | 7.383 | 8.923 |
|
3.571 | 3.991 | 4.072 | 4.639 | 4.736 | ||
|
1.463 | 1.638 | 1.775 | 1.908 | 2.144 | ||
| 400 | 1 | 3.086 | 3.456 | 3.637 | 3.966 | 4.271 | |
|
1.243 | 1.518 | 1.745 | 1.723 | 2.081 | ||
|
0.551 | 0.631 | 0.798 | 0.640 | 0.916 | ||
| 10 | 200 | 1 | 7.720 | 8.305 | 8.638 | 9.181 | 9.959 |
|
4.819 | 5.143 | 5.344 | 5.460 | 5.913 | ||
|
1.733 | 1.790 | 1.805 | 1.962 | 1.953 | ||
| 400 | 1 | 3.878 | 4.372 | 4.608 | 4.660 | 5.265 | |
|
1.331 | 1.429 | 1.417 | 1.572 | 1.566 | ||
|
0.637 | 0.726 | 0.821 | 0.787 | 0.946 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 3.
The MFL under a mixture distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 9.801 | 10.476 | 10.954 | 11.563 | 12.515 |
|
5.952 | 7.443 | 7.460 | 8.038 | 8.036 | ||
|
2.230 | 2.481 | 2.601 | 2.792 | 3.040 | ||
| 400 | 1 | 5.233 | 5.671 | 5.803 | 6.255 | 6.430 | |
|
1.861 | 2.328 | 2.548 | 2.500 | 2.891 | ||
|
0.769 | 0.864 | 1.023 | 0.885 | 1.143 | ||
| 10 | 200 | 1 | 10.421 | 10.994 | 11.285 | 11.908 | 12.716 |
|
5.737 | 5.915 | 6.124 | 6.135 | 6.754 | ||
|
2.556 | 2.698 | 2.741 | 2.916 | 2.922 | ||
| 400 | 1 | 5.543 | 5.921 | 6.274 | 6.148 | 7.025 | |
|
1.869 | 1.977 | 1.946 | 2.150 | 2.085 | ||
|
0.825 | 0.919 | 1.010 | 0.988 | 1.143 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 4.
The MFL under a normal distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 21.577 | 24.045 | 24.885 | 26.990 | 29.942 |
|
12.983 | 14.752 | 14.289 | 16.624 | 16.140 | |||
|
5.287 | 5.470 | 5.536 | 6.071 | 6.313 | |||
| 400 | 1 | 11.103 | 12.604 | 12.678 | 14.173 | 14.393 | ||
|
4.422 | 5.161 | 5.212 | 5.670 | 5.857 | |||
|
1.719 | 1.824 | 1.796 | 1.983 | 1.958 | |||
| 10 | 200 | 1 | 24.202 | 25.178 | 25.367 | 28.237 | 28.729 | |
|
15.434 | 16.109 | 16.480 | 17.396 | 18.228 | |||
|
5.180 | 5.289 | 5.417 | 5.617 | 6.059 | |||
| 400 | 1 | 13.505 | 13.953 | 14.699 | 15.180 | 16.458 | ||
|
4.382 | 4.447 | 4.481 | 4.739 | 4.905 | |||
|
1.878 | 1.903 | 1.905 | 1.969 | 2.037 | |||
|
5 | 200 | 1 | 17.819 | 19.836 | 20.807 | 22.336 | 24.616 |
|
10.138 | 11.543 | 12.006 | 12.506 | 13.587 | |||
|
3.987 | 4.110 | 4.131 | 4.321 | 4.381 | |||
| 400 | 1 | 8.332 | 9.742 | 10.687 | 10.623 | 12.495 | ||
|
3.521 | 4.295 | 4.157 | 4.545 | 4.407 | |||
|
1.238 | 1.359 | 1.287 | 1.477 | 1.392 | |||
| 10 | 200 | 1 | 21.224 | 21.724 | 21.570 | 23.124 | 23.196 | |
|
13.374 | 13.885 | 14.068 | 14.667 | 15.207 | |||
|
4.486 | 4.647 | 4.868 | 4.814 | 5.248 | |||
| 400 | 1 | 11.434 | 11.830 | 12.125 | 12.550 | 13.279 | ||
|
3.746 | 3.838 | 4.057 | 4.009 | 4.407 | |||
|
1.626 | 1.671 | 1.710 | 1.717 | 1.793 | |||
|
5 | 200 | 1 | 15.920 | 18.239 | 18.245 | 20.264 | 20.769 |
|
9.256 | 10.970 | 10.732 | 11.693 | 11.854 | |||
|
3.820 | 4.003 | 3.922 | 4.358 | 4.287 | |||
| 400 | 1 | 7.563 | 9.210 | 9.155 | 10.037 | 10.318 | ||
|
3.295 | 4.121 | 3.923 | 4.458 | 4.335 | |||
|
1.262 | 1.363 | 1.398 | 1.490 | 1.586 | |||
| 10 | 200 | 1 | 20.512 | 21.196 | 20.628 | 22.084 | 21.468 | |
|
12.942 | 13.595 | 13.606 | 14.150 | 14.329 | |||
|
4.314 | 4.560 | 4.682 | 4.775 | 4.983 | |||
| 400 | 1 | 10.950 | 11.458 | 11.306 | 12.039 | 11.805 | ||
|
3.454 | 3.652 | 3.748 | 3.810 | 3.982 | |||
|
1.535 | 1.630 | 1.675 | 1.687 | 1.777 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 6.
The MFL under a mixture distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 31.494 | 34.076 | 34.723 | 37.717 | 39.548 |
|
26.376 | 33.483 | 32.943 | 35.667 | 34.934 | |||
|
7.194 | 7.459 | 7.611 | 8.072 | 8.545 | |||
| 400 | 1 | 15.549 | 17.122 | 17.399 | 18.847 | 19.249 | ||
|
7.677 | 9.692 | 9.613 | 10.261 | 10.414 | |||
|
2.260 | 2.391 | 2.382 | 2.538 | 2.538 | |||
| 10 | 200 | 1 | 28.938 | 29.761 | 29.665 | 32.988 | 32.564 | |
|
15.583 | 16.243 | 16.769 | 17.558 | 18.745 | |||
|
6.903 | 7.142 | 7.372 | 7.525 | 7.956 | |||
| 400 | 1 | 15.148 | 15.589 | 16.181 | 16.905 | 17.920 | ||
|
5.323 | 5.427 | 5.542 | 5.735 | 6.076 | |||
|
2.210 | 2.265 | 2.260 | 2.378 | 2.406 | |||
|
5 | 200 | 1 | 22.950 | 25.278 | 26.137 | 27.904 | 30.076 |
|
21.712 | 29.250 | 29.787 | 30.200 | 31.647 | |||
|
5.266 | 5.571 | 5.599 | 5.758 | 5.916 | |||
| 400 | 1 | 10.800 | 12.232 | 13.236 | 13.167 | 15.212 | ||
|
6.479 | 8.413 | 8.338 | 8.642 | 8.639 | |||
|
1.557 | 1.681 | 1.607 | 1.803 | 1.713 | |||
| 10 | 200 | 1 | 24.239 | 24.780 | 24.694 | 26.198 | 26.343 | |
|
12.803 | 13.285 | 13.529 | 14.025 | 14.709 | |||
|
5.839 | 6.101 | 6.318 | 6.307 | 6.737 | |||
| 400 | 1 | 12.337 | 12.662 | 12.935 | 13.370 | 14.056 | ||
|
4.367 | 4.474 | 4.641 | 4.616 | 4.944 | |||
|
1.864 | 1.914 | 1.952 | 1.970 | 2.045 | |||
|
5 | 200 | 1 | 21.398 | 24.199 | 24.175 | 26.096 | 27.105 |
|
20.849 | 28.529 | 28.268 | 29.559 | 29.689 | |||
|
4.998 | 5.417 | 5.327 | 5.758 | 5.717 | |||
| 400 | 1 | 9.911 | 11.531 | 11.626 | 12.332 | 12.932 | ||
|
6.340 | 8.301 | 8.151 | 8.602 | 8.592 | |||
|
1.578 | 1.672 | 1.702 | 1.805 | 1.879 | |||
| 10 | 200 | 1 | 23.860 | 24.410 | 23.873 | 25.182 | 24.778 | |
|
12.298 | 12.975 | 12.941 | 13.641 | 13.765 | |||
|
5.570 | 5.935 | 6.026 | 6.129 | 6.233 | |||
| 400 | 1 | 11.712 | 12.230 | 11.979 | 12.738 | 12.441 | ||
|
4.071 | 4.337 | 4.412 | 4.542 | 4.673 | |||
|
1.771 | 1.877 | 1.912 | 1.955 | 2.007 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 7.
The MSE of the MAC estimator under Design 2.
| | | | Norm | Exp | Mix | Norm | Exp | Mix |
|---|---|---|---|---|---|---|---|---|
|
200 | 1 | 38.156 | 42.237 | 42.153 | 72.417 | 71.881 | 79.302 |
|
19.555 | 23.064 | 30.482 | 51.202 | 38.125 | 41.443 | ||
|
11.337 | 9.225 | 13.505 | 16.759 | 18.577 | 20.464 | ||
| 400 | 1 | 27.714 | 26.913 | 25.331 | 73.968 | 69.991 | 61.756 | |
|
7.020 | 11.045 | 14.087 | 23.588 | 25.602 | 24.684 | ||
|
4.684 | 5.221 | 5.467 | 10.584 | 11.597 | 11.795 | ||
|
200 | 1 | 40.017 | 38.634 | 40.506 | 60.281 | 59.040 | 64.643 |
|
19.370 | 22.159 | 28.908 | 37.361 | 28.449 | 32.889 | ||
|
10.276 | 8.056 | 12.130 | 12.684 | 13.218 | 15.312 | ||
| 400 | 1 | 24.519 | 24.133 | 24.501 | 58.768 | 59.701 | 54.037 | |
|
6.793 | 10.555 | 15.713 | 20.278 | 21.214 | 20.150 | ||
|
4.348 | 4.918 | 4.737 | 8.412 | 9.281 | 9.228 | ||
|
200 | 1 | 41.005 | 38.222 | 41.347 | 55.315 | 58.739 | 64.556 |
|
20.543 | 22.352 | 28.944 | 37.371 | 26.602 | 31.185 | ||
|
10.724 | 8.065 | 12.109 | 11.796 | 12.293 | 14.133 | ||
| 400 | 1 | 26.025 | 23.938 | 24.792 | 58.897 | 59.446 | 52.941 | |
|
6.714 | 10.575 | 15.647 | 19.404 | 20.189 | 18.872 | ||
|
4.396 | 4.902 | 4.765 | 8.020 | 8.883 | 8.598 | ||
Table 2.
The MFL under an exponential distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 9.828 | 10.631 | 11.169 | 11.581 | 13.106 |
|
5.036 | 5.245 | 5.316 | 5.736 | 5.817 | ||
|
2.512 | 2.784 | 2.937 | 3.068 | 3.374 | ||
| 400 | 1 | 5.120 | 5.542 | 5.726 | 5.997 | 6.258 | |
|
1.997 | 2.212 | 2.381 | 2.437 | 2.752 | ||
|
0.833 | 0.919 | 1.068 | 0.941 | 1.189 | ||
| 10 | 200 | 1 | 11.093 | 11.650 | 11.997 | 12.494 | 13.450 |
|
5.464 | 5.576 | 5.813 | 5.789 | 6.366 | ||
|
2.507 | 2.593 | 2.619 | 2.808 | 2.814 | ||
| 400 | 1 | 5.321 | 5.866 | 6.205 | 6.142 | 6.909 | |
|
2.028 | 2.148 | 2.135 | 2.338 | 2.295 | ||
|
0.856 | 0.948 | 1.043 | 1.015 | 1.150 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 5.
The MFL under an exponential distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 31.665 | 34.181 | 35.037 | 37.668 | 39.586 |
|
15.732 | 16.454 | 16.369 | 18.079 | 18.267 | |||
|
7.938 | 8.351 | 8.498 | 8.989 | 9.468 | |||
| 400 | 1 | 15.544 | 17.232 | 17.525 | 18.749 | 19.361 | ||
|
6.021 | 6.392 | 6.580 | 6.872 | 7.198 | |||
|
2.397 | 2.506 | 2.453 | 2.653 | 2.609 | |||
| 10 | 200 | 1 | 29.631 | 30.776 | 30.796 | 33.824 | 33.871 | |
|
14.371 | 14.837 | 15.341 | 16.025 | 17.127 | |||
|
6.630 | 6.789 | 6.967 | 7.218 | 7.547 | |||
| 400 | 1 | 15.451 | 15.858 | 16.652 | 17.158 | 18.424 | ||
|
5.627 | 5.718 | 5.850 | 6.017 | 6.307 | |||
|
2.242 | 2.266 | 2.290 | 2.327 | 2.427 | |||
|
5 | 200 | 1 | 22.577 | 24.649 | 25.606 | 27.176 | 29.480 |
|
11.560 | 12.278 | 13.001 | 13.189 | 14.650 | |||
|
5.390 | 5.704 | 5.729 | 5.994 | 6.129 | |||
| 400 | 1 | 10.597 | 12.065 | 13.314 | 12.916 | 15.328 | ||
|
4.035 | 4.387 | 4.381 | 4.600 | 4.570 | |||
|
1.631 | 1.751 | 1.698 | 1.866 | 1.804 | |||
| 10 | 200 | 1 | 24.457 | 25.007 | 24.816 | 26.269 | 26.431 | |
|
11.745 | 12.177 | 12.482 | 12.967 | 13.621 | |||
|
5.404 | 5.608 | 5.821 | 5.838 | 6.230 | |||
| 400 | 1 | 12.708 | 13.020 | 13.339 | 13.780 | 14.430 | ||
|
4.613 | 4.734 | 4.922 | 4.882 | 5.271 | |||
|
1.873 | 1.920 | 1.967 | 1.975 | 2.038 | |||
|
5 | 200 | 1 | 20.949 | 23.550 | 23.422 | 25.406 | 26.054 |
|
10.631 | 11.572 | 11.508 | 12.419 | 12.514 | | | |
|
5.136 | 5.518 | 5.399 | 5.958 | 5.812 | |||
| 400 | 1 | 9.642 | 11.283 | 11.661 | 12.048 | 13.004 | ||
|
3.916 | 4.299 | 4.249 | 4.602 | 4.578 | |||
|
1.661 | 1.754 | 1.811 | 1.880 | 1.997 | |||
| 10 | 200 | 1 | 23.945 | 24.523 | 23.774 | 25.440 | 24.640 | |
|
11.259 | 11.799 | 11.802 | 12.341 | 12.515 | |||
|
5.168 | 5.475 | 5.571 | 5.738 | 5.920 | |||
| 400 | 1 | 12.174 | 12.623 | 12.469 | 13.153 | 12.994 | ||
|
4.365 | 4.602 | 4.697 | 4.777 | 4.898 | |||
|
1.797 | 1.882 | 1.912 | 1.968 | 2.006 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
First, the different distributions of the error components have little quantitative effect on the performance of each method. Second, in most cases, evaluated by the MFL, MAC is the best, and SAIC/SBIC outperform AIC/BIC, as expected. Tables A.1–A.6 in the online Supporting Information show the standard errors of the MFLs, from which we can see that our method is also the most stable in all cases. Finally, we can see from Table 7 that as $p$ or $N$ increases, the MSE of the MAC coefficient estimator decreases, which reflects its consistency.
7. EMPIRICAL APPLICATION
To illustrate the usefulness of the proposed method, we now apply the MAC method to analyse data on passenger traffic volumes at airports. The dataset consists of yearly passenger traffic volumes at 227 airports in Mainland China in 2017,1 obtained from the Civil Aviation Administration of China2. The response variable contains the centralized values of the logarithms of the passenger traffic volumes at these airports. It is intuitive that the number of nonstop flights at an airport affects the passenger traffic volume at that airport, so we use the matrix of nonstop flights as the adjacency matrix $A$. Specifically, the $(i, j)$th entry of $A$ is nonzero if there is at least one nonstop flight between airports $i$ and $j$, and 0 otherwise, with the nonzero entries scaled using $n_i$, the number of airports that have a nonstop flight with airport $i$. We set the largest order of the monomials to be 3, and then the number of candidate models to be considered is 8.
We consider the five methods studied in Section 6. The values of AIC and BIC and the weights of SAIC, SBIC and MAC are reported in Table 8. We can see that the AIC and BIC methods select the same model, (0, 1, 2, 3), which means that the monomials of orders 0, 1, 2 and 3 are all in the model, and the SAIC and SBIC methods also put almost all of their weight on these models. However, the MAC method puts its weight on the models (0, 2), (0, 3) and (0, 1, 3).
Table 8.
Model selection criterion values and model averaging weights in the analysis of Chinese airports in 2017.
| Model selection criterion values | Weights | ||||
|---|---|---|---|---|---|
| Models | AIC | BIC | SAIC | SBIC | MAC |
| (0) | 1361.723 | 1365.148 | 0.000 | 0.000 | 0.000 |
| (0, 1) | 1357.781 | 1364.630 | 0.000 | 0.000 | 0.000 |
| (0,2) | 1195.696 | 1202.546 | 0.002 | 0.043 | 0.593 |
| (0,3) | 1266.882 | 1273.731 | 0.000 | 0.000 | 0.132 |
| (0, 1, 2) | 1198.360 | 1208.635 | 0.000 | 0.002 | 0.000 |
| (0, 1, 3) | 1242.146 | 1252.421 | 0.000 | 0.000 | 0.275 |
| (0,2,3) | 1190.737 | 1201.012 | 0.019 | 0.093 | 0.000 |
| (0, 1, 2, 3) | 1182.865 | 1196.565 | 0.979 | 0.861 | 0.000 |
Notes: Model (0, 2, 3) means the model including the monomials of orders 0, 2 and 3. The minimum AIC and BIC values are highlighted in bold.
To compare the different methods, we use the estimated covariance matrix obtained from each method as the true covariance matrix; that is, each method's estimate is taken in turn as the truth. Then we create 500 replications of the response vector randomly generated from a mean-zero distribution with this covariance matrix. Subsequently, we estimate the covariance matrix by the five methods used above and calculate the mean and median of the MFLs across the 500 replications. Specifically,
$$\mathrm{mean\ MFL} = \frac{1}{500}\sum_{t=1}^{500}\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (7.1)$$
and
$$\mathrm{median\ MFL} = \mathrm{median}_{1 \le t \le 500}\,\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (7.2)$$
where $\hat\Sigma^{(t)}$ is the estimator of $\Sigma$ obtained by a given method in the $t$th trial. The results are shown in Table 9, which also reports the standard deviation of the MFL for each method and the optimal rate of each method, defined as the proportion of times that the method yields the smallest MFL across the 500 replication trials.
Table 9.
MFL in the analysis of Chinese airports in 2017.
| Method | MAC | SAIC | SBIC | AIC | BIC | |
|---|---|---|---|---|---|---|
| MAC | Mean | 84.669 | 149.813 | 148.348 | 150.454 | 149.039 |
| Median | 63.676 | 75.379 | 70.190 | 77.443 | 73.255 | |
| Standard deviation | 94.413 | 304.467 | 304.698 | 304.360 | 304.712 | |
| Optimal rate | 0.526 | 0.064 | 0.140 | 0.172 | 0.098 | |
| SAIC | Mean | 137.931 | 354.814 | 353.701 | 355.525 | 354.004 |
| Median | 94.863 | 102.324 | 98.094 | 105.817 | 97.684 | |
| Standard deviation | 235.541 | 1403.513 | 1403.687 | 1403.396 | 1403.677 | |
| Optimal rate | 0.528 | 0.070 | 0.118 | 0.202 | 0.082 | |
| SBIC | Mean | 129.912 | 305.138 | 303.586 | 305.603 | 304.121 |
| Median | 94.681 | 103.598 | 99.498 | 103.875 | 101.281 | |
| Standard deviation | 271.782 | 1512.634 | 1512.853 | 1512.588 | 1512.798 | |
| Optimal rate | 0.538 | 0.064 | 0.120 | 0.208 | 0.070 | |
| AIC/BIC | Mean | 115.859 | 226.284 | 224.767 | 226.895 | 225.073 |
| Median | 89.564 | 98.179 | 95.068 | 100.434 | 96.556 | |
| Standard deviation | 150.260 | 650.444 | 650.764 | 650.334 | 650.732 | |
| Optimal rate | 0.524 | 0.074 | 0.106 | 0.212 | 0.084 |
Notes: The methods listed in the first column are used to estimate the covariance matrix, and the estimated covariance matrix is then used as the truth to generate the corresponding data. The minimum values in every row are highlighted in bold. Since AIC and BIC select the same model in this real data analysis, the last method is AIC/BIC.
The results show that the MAC method consistently outperforms all the other methods, regardless of which method's estimate is used as the true covariance matrix and which performance evaluation criterion is used. We find this quite remarkable. In terms of the mean and median of the MFLs, the MAC method performs the best among all five estimators; in particular, the mean MFL of the MAC method is about half of that of any other method in all cases. The standard deviation of the MFL of MAC is much smaller than those of the others in all cases, which means that the MAC performance is the most stable. In terms of the percentage of times that a method shows optimal performance (the optimal rate), the MAC estimator always attains the highest score among the five methods, often with a value of more than 50%, indicating that in over half of the 500 trials MAC is the champion. Judging by the means and medians, SAIC/SBIC perform slightly better than AIC/BIC.
8. CONCLUDING REMARKS
In this article, we used the covariance regression network model, which treats the covariance as a polynomial function of the symmetric adjacency matrix. The model averaging method was used to estimate the high-dimensional covariance matrix, and the candidate models were constructed through different orders of a polynomial function. The optimal weights were obtained by minimizing the newly proposed Mallows-type model averaging criterion. We proved the asymptotic optimality and consistency of the resulting MAC estimators for different situations. Both numerical simulations and a case study on Chinese airport network structure data were conducted to demonstrate the validity of the proposed approach.
It is worth noting that our method combines polynomials of the adjacency matrix. If the true covariance matrix is far from this polynomial form, the MAC method may not yield an accurate estimate. In addition, if one conjectures that the covariance matrix is a combination of banded and Toeplitz-type matrices, these matrices should be included in the candidate models; but then there is no single orthogonal matrix that diagonalizes them simultaneously, and in such a case MAC is not applicable. How to develop a model averaging method for estimating this kind of covariance matrix warrants further study.
ACKNOWLEDGEMENTS
We thank the referee, the associate editor, the co-editor Victor Chernozhukov and Prof. Hansheng Wang for many constructive comments and suggestions. We thank Prof. Wei Lan for providing his codes. Zhang, the corresponding author, was supported by the National Key R&D Program of China (2020AAA0105200), the National Natural Science Foundation of China (grant nos. 71925007, 11688101 and 71631008), the Beijing Academy of Artificial Intelligence, and the Youth Innovation Promotion Association of the Chinese Academy of Sciences. Ma was supported by grants from the National Science Foundation and the National Institutes of Health. Zou was supported by the National Natural Science Foundation of China (grant nos. 11971323 and 12031016). All errors remain the authors'.
Notes
Co-editor Victor Chernozhukov handled this manuscript.
Footnotes
In 2017, there were 229 civil airports in Mainland China, 227 of which had scheduled flights.
Contributor Information
Rong Zhu, School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Xinyu Zhang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Yanyuan Ma, Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA.
Guohua Zou, School of Mathematical Sciences, Capital Normal University, Beijing 100048, China.
Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s website:
Online Appendix
Replication Package
REFERENCES
- Ando T., Li K.-C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association. 109, 254–65.
- Andrews D. W. K. (1991). Asymptotic optimality of generalized $C_p$, cross-validation, and generalized cross-validation in regression with heteroskedastic errors. Journal of Econometrics. 47, 359–77.
- Bilmes J. A. (2000). Factored sparse inverse covariance matrices. IEEE International Conference. 2, 1009–12.
- Buckland S. T., Burnham K. P., Augustin N. H. (1997). Model selection: An integral part of inference. Biometrics. 53, 603–18.
- Campbell J. Y., Lo A. W., Mackinlay A. C., Whitelaw R. F. (1998). The econometrics of financial markets. Macroeconomic Dynamics. 2, 559–62.
- Chen X. H., Conley T. G. (2001). A new semi-parametric spatial model for panel time series. Journal of Econometrics. 105, 59–83.
- Fan J., Fan Y., Lv J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics. 147, 186–97.
- Fan J., Peng H. (2004). On nonconcave penalized likelihood with diverging number of parameters. Annals of Statistics. 32, 928–61.
- Friedman J. H., Hastie T., Tibshirani R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 9, 432–41.
- Gao Y., Zhang X., Wang S., Zou G. (2016). Model averaging based on leave-subject-out cross-validation. Journal of Econometrics. 192, 139–51.
- Hansen B. E. (2007). Least squares model averaging. Econometrica. 75, 1175–89.
- Hansen B. E. (2014). Model averaging, asymptotic risk, and regressor groups. Quantitative Economics. 5, 495–530.
- Hansen B. E., Racine J. S. (2012). Jackknife model averaging. Journal of Econometrics. 167, 38–46.
- He X., Shao Q. M. (2000). On parameters of increasing dimensions. Journal of Multivariate Analysis. 73, 120–35.
- Jagannathan R., Ma T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. Journal of Finance. 58, 1651–84.
- Lan W., Fang Z., Wang H., Tsai C. (2018). Covariance matrix estimation via network structure. Journal of Business and Economic Statistics. 36, 359–69.
- Leung G., Barron A. (2006). Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory. 52, 3396–410.
- Liang H., Zou G., Wan A. T. K., Zhang X. (2011). Optimal weight choice for frequentist model average estimators. Journal of the American Statistical Association. 106, 1053–66.
- Liu Q., Okui R. (2013). Heteroskedasticity-robust $C_p$ model averaging. Econometrics Journal. 16, 463–72.
- Liu Q., Okui R., Yoshimura A. (2016). Generalized least squares model averaging. Econometric Reviews. 35, 1692–752.
- Markowitz H. (1952). Portfolio selection. Journal of Finance. 7, 77–91.
- Millar C. P., Jardim E., Scott F., Osio G. C., Mosqueira I., Alzorriz N. (2014). Model averaging to streamline the stock assessment process. ICES Journal of Marine Science. 72, 93–98.
- Wan A. T. K., Zhang X. (2009). On the use of model averaging in tourism research. Annals of Tourism Research. 36, 525–32.
- Wan A. T. K., Zhang X., Zou G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics. 156, 277–83.
- Whittle P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and its Applications. 5, 331–5.
- Yuan Z., Yang Y. (2005). Combining linear regression models: When and how? Journal of the American Statistical Association. 100, 1202–14.
- Zhang X. (2010). Model averaging and its applications. Ph.D. Thesis, Academy of Mathematics and Systems Science, Chinese Academy of Sciences.
- Zhang X., Wan A. T. K., Zou G. (2013). Model averaging by jackknife criterion in models with dependent data. Journal of Econometrics. 174, 82–94.
- Zhang X., Wang W. (2019). Optimal model averaging estimation for partially linear models. Statistica Sinica. 29, 693–718.
- Zhu R., Zou G., Zhang X. (2018). Model averaging for multivariate multiple regression models. Statistics. 52, 205–27.
- Zou H., Zhang H. H. (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics. 37, 1733–51.