Summary
In this paper, we develop a model averaging method to estimate a high-dimensional covariance matrix, where the candidate models are constructed by different orders of polynomial functions. We propose a Mallows-type model averaging criterion and select the weights by minimizing this criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. Then, we prove the asymptotic optimality of the resulting model average covariance estimators. Finally, we conduct numerical simulations and a case study on Chinese airport network structure data to demonstrate the usefulness of the proposed approaches.
Keywords: asymptotic optimality, consistency, covariance regression network model, Mallows criterion, model averaging
1. INTRODUCTION
The covariance matrix is a familiar concept and has been widely used in many fields. For example, Markowitz (1952) illustrated geometrically the relationship between beliefs and the choice of portfolio according to the covariance matrix of asset returns. Campbell et al. (1998) and Jagannathan and Ma (2003) considered finance and risk management based on covariance matrices. Bilmes (2000) improved the parsimony of speech recognition systems by adjusting the type of covariance matrices used. Chen and Conley (2001) presented a semiparametric spatial model for high-dimensional panel time series that exploits the positive-definiteness of a covariance function. Friedman et al. (2008) considered the problem of estimating sparse graphs by incorporating the lasso penalty into the estimation of the inverse covariance matrix. Motivated by the arbitrage pricing theory in finance, Fan et al. (2008) used a multi-factor model to reduce dimensionality and estimate the covariance matrix.
Let $\Sigma$ be the $p \times p$ covariance matrix of a $p$-dimensional random vector $\mathbb{Y}$. The most commonly used estimator of $\Sigma$ is the classic sample covariance matrix. However, the sample covariance matrix estimator does not perform well when $p$ is large and the sample size is fixed or grows at a slower rate than $p$, because the number of unknown parameters is then much larger than the sample size. Inevitably, additional assumptions need to be made to ensure that the estimation of $\Sigma$ is feasible. Here, we consider the novel strategy proposed by Lan et al. (2018), which uses a covariance regression network model (CRNM) to estimate the high-dimensional covariance matrix. In this framework, the covariance matrix is regarded as a polynomial function of a symmetric adjacency matrix representing a given network structure. In this way, the estimation of a high-dimensional covariance matrix is converted into the estimation of the low-dimensional coefficients of the CRNM. These authors further developed a Bayesian information criterion (BIC) to select the order of the polynomial function and proved the consistency of the BIC. Their treatment can thus be classed as a model selection approach.
Model averaging can be viewed as a smooth extension of model selection and it can substantially reduce the risk relative to model selection (Hansen, 2014). Furthermore, model averaging is often more stable than model selection, in which a small change of the data may lead to a significant change in the model selected; see Yuan and Yang (2005) and Leung and Barron (2006) for further discussion. There has been much research into model averaging for regression models. For example, Buckland et al. (1997) proposed the smoothed AIC and smoothed BIC method, in which weights are assigned based on the information criterion scores obtained from different models. Hansen (2007) and Wan et al. (2010) developed a Mallows model averaging method for linear regression models. Other methods include jackknife model averaging (Hansen and Racine, 2012; Zhang et al., 2013), heteroscedasticity-robust Cp model averaging (Liu and Okui, 2013), leave-subject-out cross-validation (Gao et al., 2016), and Mahalanobis Mallows model averaging (Zhu et al., 2018).
In this paper, in order to improve the estimation of a high-dimensional covariance matrix with a network structure, we propose a model averaging method based on a normalized version of the CRNM (Lan et al., 2018). We select the averaging weights by minimizing a Mallows-type criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. We then establish the asymptotic optimality of the resulting model average covariance (MAC) estimator when none of the candidate models is correct. We also show that the MAC estimator is consistent in estimating the covariance matrix $\Sigma$ if at least one of the candidate models is in fact correct. Following Lan et al. (2018), for the sake of convenience, we first consider the case of a single observation ($N = 1$), and then discuss the extension when the sample size $N$ increases.
The remainder of this paper is organized as follows. In Section 2, we describe the estimation method for the normalized CRNM (nCRNM), introduce the Mallows-type weight choice criterion, and propose the MAC estimator based on this criterion. The asymptotic optimality of the model averaging estimator is established in Section 3, while the consistency of the method is shown in Section 4. We extend the theorems on asymptotic optimality and consistency to the case where the sample size $N$ is larger than one in Section 5. We compare the finite-sample properties of the MAC estimator with several information criterion-based model selection and averaging estimators in Section 6. A real data example is considered in Section 7. Section 8 contains some concluding remarks. Technical proofs are given in the Online Appendices.
2. MODEL SET-UP AND ESTIMATION
2.1. Covariance regression network model
Consider a graph with $p$ nodes, where the $i$th node represents the $i$th observation, for example the $i$th individual. Let $A = (a_{ij}) \in \mathbb{R}^{p \times p}$ denote the symmetric adjacency matrix of the graph, where $a_{ij} = 1$ if the two nodes $i$ and $j$ are directly related, for example if the $i$th and $j$th individuals know each other, and $a_{ij} = 0$ otherwise. We set the diagonal elements $a_{ii} = 0$. Let $A^k$ be the $k$th power of $A$, with the zeroth power defined as the identity, i.e., $A^0 = I_p$. The $(i, j)$th component of $A^k$ then counts the number of ways to connect the $i$th node and the $j$th node through a path with exactly $k$ steps. In other words, $A^k$ measures the $k$-path relationships of the $p$ nodes.
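To make the path-counting interpretation concrete, the following small numerical check (a toy graph of our own, not one taken from the paper) verifies that the entries of the matrix powers count the number of $k$-step connections between nodes.

```python
import numpy as np

# A toy undirected graph on 4 nodes (illustrative only): edges 1-2, 1-3, 2-3, 3-4.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# The (i, j)th entry of A^k counts the walks of exactly k steps from node i to node j.
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

print(A2[0, 3])  # 1: the single 2-step connection 1 -> 3 -> 4
print(A3[0, 0])  # 2: the 3-step closed walks 1 -> 2 -> 3 -> 1 and 1 -> 3 -> 2 -> 1
```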
Let $Y_i$ be a random variable that describes a certain property of the $i$th node, such as the $i$th person's social activities in a period of time. Let $\mathbb{Y} = (Y_1, \dots, Y_p)^{\top}$, and let the covariance matrix of $\mathbb{Y}$ be $\Sigma = \mathrm{cov}(\mathbb{Y})$. Naturally, $\Sigma$ can be linked to the adjacency matrix $A$. Lan et al. (2018) introduced the CRNM $\Sigma = \sum_{k=0}^{K}\theta_k A^k$, where $K$ is a positive integer and $\theta = (\theta_0, \dots, \theta_K)^{\top}$ is the vector of regression coefficients. To ensure that the parameters in the CRNM are comparable between models, we normalize the elements of $A^k$, $k = 1, \dots, K$, by dividing each power by a common scaling constant. The resulting normalized matrix is denoted $W_k$, whose $(i, j)$th component is the 'normalized' number of ways to connect the $i$th node and the $j$th node through a path with $k$ steps; we also write $W_0 = I_p$. Rather than using the CRNM of Lan et al. (2018), we assume the following normalized CRNM (nCRNM):
$$\Sigma = \sum_{k=0}^{K}\theta_k\,W_k, \qquad (2.1)$$
where $K$ is a positive integer that can increase to infinity with $p$, and $\theta = (\theta_0, \theta_1, \dots, \theta_K)^{\top}$ is the vector of regression coefficients.
To ensure that $\Sigma$ in (2.1) is positive-definite, constraints are imposed on $\theta$. Because the adjacency matrix $A$ is assumed known, an obvious benefit of the nCRNM (2.1) is that it reduces the number of parameters in $\Sigma$ from $p(p+1)/2$ to $K+1$. Note that (2.1) is not necessarily the smallest model, since it is possible that $\theta_k = 0$ for some $k$. In particular, we allow the 'wasteful' case where $\theta_k = 0$ for all $k$ larger than a certain value.
In estimating the covariance matrix $\Sigma$, it is common to assume that the components of $\mathbb{Y}$ have the same mean, which can be consistently estimated by their sample average. One can further centre the data by subtracting the sample mean and work directly with the centred vector. Hence, without loss of generality, we assume that $E(\mathbb{Y}) = 0$ in the rest of the paper. For convenience, we also perform a standard eigenvalue decomposition on the adjacency matrix to obtain
$$A = \Gamma\,\Lambda\,\Gamma^{\top}, \qquad (2.2)$$
where $\Gamma$ is an orthogonal matrix, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is a diagonal matrix, and $\lambda_i$ is the $i$th-largest eigenvalue of $A$. These preparations allow us to convert the model in (2.1) to a simple linear regression model, which facilitates straightforward estimation. Specifically, let $\mathbb{Z} = (Z_1, \dots, Z_p)^{\top} = \Gamma^{\top}\mathbb{Y}$, where $\Gamma$ is defined in (2.2). Then $\mathbb{Z}$ has mean $0$ and covariance matrix $\Gamma^{\top}\Sigma\,\Gamma = \sum_{k=0}^{K}\theta_k\,\Gamma^{\top}W_k\,\Gamma$, which is diagonal because every normalized power $W_k$ shares the eigenvectors $\Gamma$. Extracting the $i$th diagonal element of these relations, we obtain $E(Z_i^2) = \sum_{k=0}^{K}\theta_k\,x_{ik}$, where $x_{ik}$ is the $i$th diagonal element of $\Gamma^{\top}W_k\,\Gamma$; this is a multiple linear regression model if we view $Z_i^2$ as the response variable, $x_{ik}$ as the $k$th regressor, and $\theta_0, \dots, \theta_K$ as the regression coefficients. To fix the notation, we define $\mathbb{Z}^{(2)} = (Z_1^2, \dots, Z_p^2)^{\top}$ and $\mathbb{X} = (x_{ik}) \in \mathbb{R}^{p \times (K+1)}$. Then, the nCRNM (2.1) is equivalently written as
$$\mathbb{Z}^{(2)} = \mathbb{X}\,\theta + e, \qquad (2.3)$$
where $e = \mathbb{Z}^{(2)} - E(\mathbb{Z}^{(2)})$ is a mean-zero error vector. Let $\mu = E(\mathbb{Z}^{(2)}) = \mathbb{X}\theta$, and let $D(v)$ denote the diagonal matrix whose $i$th diagonal entry is the $i$th element of a vector $v$. With this notation,
$$\Sigma = \Gamma\,D(\mu)\,\Gamma^{\top} = \Gamma\,D(\mathbb{X}\theta)\,\Gamma^{\top}. \qquad (2.4)$$
Thus, for the nCRNM (2.1) with a pre-determined $K$, one can use the ordinary least squares method when $\mathbb{X}$ is of full column rank to obtain the estimator of $\theta$ in (2.3), namely $\hat\theta = (\mathbb{X}^{\top}\mathbb{X})^{-1}\mathbb{X}^{\top}\mathbb{Z}^{(2)}$, and then use (2.4) to obtain $\hat\Sigma = \Gamma\,D(\mathbb{X}\hat\theta)\,\Gamma^{\top}$.
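To fix ideas, here is a minimal numerical sketch of the estimation route just described: form normalized powers of the adjacency matrix, rotate the data with its eigenvectors, run the diagonal regression, and map the fitted values back to a covariance estimate. The function name, the Frobenius-norm scaling of the powers, and the single-observation input are illustrative assumptions of ours rather than the paper's exact specification.

```python
import numpy as np

def ncrnm_fit(Y, A, K):
    """Sketch of the nCRNM estimation described above (names and scaling are assumptions).

    Y : (p,) observed vector, assumed centred; A : (p, p) symmetric adjacency matrix;
    K : polynomial order.  Returns (theta_hat, Sigma_hat).
    """
    p = len(Y)
    # Powers A^0, ..., A^K; the powers with k >= 1 are rescaled by their Frobenius norm,
    # which is only an illustrative choice of normalization.
    powers = [np.eye(p)]
    for k in range(1, K + 1):
        powers.append(powers[-1] @ A)
    W = [powers[0]] + [P / max(np.linalg.norm(P), 1.0) for P in powers[1:]]

    # Eigendecomposition A = Gamma Lambda Gamma^T; all powers share the eigenvectors Gamma.
    _, Gamma = np.linalg.eigh(A)

    # Rotate the data and form the linear regression E(Z_i^2) = sum_k theta_k x_{ik},
    # where x_{ik} is the i-th diagonal element of Gamma^T W_k Gamma.
    Z = Gamma.T @ Y
    X = np.column_stack([np.diag(Gamma.T @ Wk @ Gamma) for Wk in W])
    theta_hat, *_ = np.linalg.lstsq(X, Z**2, rcond=None)

    # Map the fitted diagonal back to the covariance scale: Sigma_hat = Gamma D(X theta) Gamma^T.
    Sigma_hat = Gamma @ np.diag(X @ theta_hat) @ Gamma.T
    return theta_hat, Sigma_hat
```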
2.2. Candidate models and estimation
Each fixed polynomial order $K$ represents a different nCRNM (2.1). In practice, which value of $K$ is suitable is often unclear. We thus consider many possible values of $K$, which corresponds to a series of possible models. We index these models by $m = 1, \dots, M$, where the total number of models $M$ can be related to the total number of nodes $p$. To further increase flexibility, we allow the $m$th candidate nCRNM to contain $k_m$ arbitrary monomials of the normalized adjacency matrix, which do not have to be the first $k_m$ powers but must include the intercept term $W_0 = I_p$. Similar to (2.3), the resulting $m$th nCRNM is equivalently written as
$$\mathbb{Z}^{(2)} = \mathbb{X}_m\,\theta_{(m)} + e_{(m)}, \qquad (2.5)$$
where $\mathbb{X}_m$ is a $p \times k_m$ submatrix of $\widetilde{\mathbb{X}}$, $\theta_{(m)}$ is the coefficient vector, and $e_{(m)}$ is the error. Here we can assume that $\widetilde{\mathbb{X}}$ is a $p \times \bar{K}$ matrix with $\bar{K}$ sufficiently large so that $\mathbb{X}_m$ is a submatrix of $\widetilde{\mathbb{X}}$ for all $m = 1, \dots, M$ and $\widetilde{\mathbb{X}}$ is of full column rank. Although we consider $M$ models, we do not assume that any of these models are correct. Thus, we allow some or all of the $M$ models to be misspecified.
It is easy to see that the ordinary least squares estimator of $\theta_{(m)}$ is $\hat\theta_{(m)} = (\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}\mathbb{Z}^{(2)}$, and the estimator of $\mu$ is $\hat\mu_m = P_m\,\mathbb{Z}^{(2)}$, where $P_m = \mathbb{X}_m(\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}$ is the projection matrix. By (2.4), the estimator of $\Sigma$ based on the $m$th candidate model is
$$\hat\Sigma_m = \Gamma\,D(\hat\mu_m)\,\Gamma^{\top} = \Gamma\,D(P_m\,\mathbb{Z}^{(2)})\,\Gamma^{\top}.$$
We thus have $M$ estimators of $\Sigma$ based on the $M$ candidate models.
2.3. Model averaging and weight choice criterion
Our purpose is now to combine the $M$ estimators of $\Sigma$ obtained from the $M$ candidate models to achieve an optimal estimator of $\Sigma$ through a weighted average of the $\hat\Sigma_m$s. The optimality is reflected in the fact that the resulting estimator minimizes the distance to the true $\Sigma$, while it simultaneously ensures that the estimated covariance matrix is positive-definite. Note that the positive-definiteness property is not guaranteed by the estimators described in Subsections 2.1 and 2.2. Of course, to obtain a positive-definite matrix by forming a weighted average of several candidate matrices, we have to assume that at least one of these candidate matrices is positive-definite. This is reasonable and is certainly true when the set of candidates contains the trivial model, which contains only the intercept term $W_0 = I_p$.
Let $w = (w_1, \dots, w_M)^{\top}$ be a weight vector. A model average estimator of $\Sigma$ is of the form
$$\hat\Sigma(w) = \sum_{m=1}^{M} w_m\,\hat\Sigma_m = \Gamma\,D\{\hat\mu(w)\}\,\Gamma^{\top},$$
where $\hat\mu(w) = \sum_{m=1}^{M} w_m\,\hat\mu_m$. Similarly, define $P(w) = \sum_{m=1}^{M} w_m P_m$. We then have
$$\hat\mu(w) = P(w)\,\mathbb{Z}^{(2)}.$$
We restrict the weight vector to the set $\mathcal{W}$, where
$$\mathcal{W} = \Big\{w \in [0, 1]^M : \sum_{m=1}^{M} w_m = 1 \text{ and } \hat\Sigma(w) \text{ is positive-definite}\Big\}. \qquad (2.6)$$
Obviously, $\mathcal{W}$ is not empty under our assumption that at least one candidate model is positive-definite. We measure the distance between two matrices using the Frobenius norm of their difference, and hence the Frobenius norm loss of $\hat\Sigma(w)$ is naturally defined as
$$L(w) = \big\|\hat\Sigma(w) - \Sigma\big\|_F^2,$$
where $\|\cdot\|_F$ denotes the Frobenius norm. Our purpose is to devise a weight choice criterion to minimize the expected loss $E\{L(w)\}$.
We first note that
$$L(w) = \big\|\hat\Sigma(w) - \Sigma\big\|_F^2 = \big\|\hat\mu(w) - \mu\big\|^2, \qquad (2.7)$$
where $\|\cdot\|$ denotes the $\ell_2$ norm. This means that the Frobenius norm loss of $\hat\Sigma(w)$ is the same as the squared error loss of $\hat\mu(w)$. The corresponding risk function, defined as the expected loss function, can then be calculated as
$$R(w) = E\{L(w)\} = \big\|\{P(w) - I_p\}\mu\big\|^2 + \mathrm{tr}\big\{P(w)\,\Omega\,P^{\top}(w)\big\}, \qquad (2.8)$$
where $\Omega = \mathrm{cov}(\mathbb{Z}^{(2)})$. Of course, neither $L(w)$ nor $R(w)$ can be directly minimized because they depend on $\mu$, which is unknown. We thus work around the difficulty by first replacing $\mu$ with $\mathbb{Z}^{(2)}$, and then adjusting for the offset. This leads us to construct an estimator of the risk function,
$$C(w) = \big\|\mathbb{Z}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\Omega\big\}. \qquad (2.9)$$
It can be readily shown that
$$E\{C(w)\} = R(w) + \mathrm{tr}(\Omega). \qquad (2.10)$$
This implies that $C(w)$ is an unbiased estimator of the risk function $R(w)$ plus a constant irrelevant to the weight $w$. Therefore, by minimizing $C(w)$, we expect that $L(w)$ and $R(w)$ are also minimized. The property in (2.10) is similar to that of the Mallows criterion proposed by Hansen (2007). We thus proceed to use the Mallows-type model averaging criterion. Alternatively, the jackknife model averaging criterion (Hansen and Racine, 2012; Zhang et al., 2013) can also be used for weight choice; Liu and Okui (2013) have shown that the Mallows-type and jackknife methods have a similar performance.
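To see (2.10) under the notation above, note that $E(e) = 0$ and $\mathrm{cov}(e) = \Omega$; a standard Mallows-type calculation, sketched here under these assumptions, gives
$$\begin{aligned} E\{C(w)\} &= E\big\|\{I_p - P(w)\}(\mu + e)\big\|^2 + 2\,\mathrm{tr}\{P(w)\Omega\} \\ &= \big\|\{I_p - P(w)\}\mu\big\|^2 + \mathrm{tr}\big[\{I_p - P(w)\}\,\Omega\,\{I_p - P(w)\}^{\top}\big] + 2\,\mathrm{tr}\{P(w)\Omega\} \\ &= \big\|\{P(w) - I_p\}\mu\big\|^2 + \mathrm{tr}\{P(w)\,\Omega\,P^{\top}(w)\} + \mathrm{tr}(\Omega) \;=\; R(w) + \mathrm{tr}(\Omega). \end{aligned}$$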
In practice, the covariance matrix $\Omega$ in $C(w)$ is unknown and needs to be estimated. Following Hansen (2007), Lan et al. (2018) and Zhang and Wang (2019), we estimate $\Omega$ based on a candidate model containing the largest number of covariates, indexed by
$$M^* = \arg\max_{1 \le m \le M} k_m. \qquad (2.11)$$
When several candidates have the same maximum number of covariates, we simply pick any one of such candidate models as the $M^*$th model. This leads to the estimator
$$\hat\Omega = D(\hat{e}_1^2, \dots, \hat{e}_p^2), \qquad (2.12)$$
where $\hat{e} = (I_p - P_{M^*})\,\mathbb{Z}^{(2)}$, and $\hat{e}_i$ is the $i$th component of $\hat{e}$. Here, we have restricted $\hat\Omega$ to be diagonal, while $\Omega$ itself may not be diagonal. We find that, despite this seemingly crude practice, the resulting model averaging estimator is always optimal in that it minimizes the expected Frobenius loss regardless of whether $\Omega$ is diagonal or not.
When $\Omega$ is replaced by $\hat\Omega$, $C(w)$ changes to
$$\hat{C}(w) = \big\|\mathbb{Z}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\hat\Omega\big\}, \qquad (2.13)$$
in which all quantities are known except the weight vector $w$. Minimizing $\hat{C}(w)$ with respect to $w$ leads to
$$\hat{w} = \arg\min_{w \in \mathcal{W}} \hat{C}(w). \qquad (2.14)$$
Substituting $\hat{w}$ into $\hat\Sigma(w)$ yields the model average estimator $\hat\Sigma(\hat{w})$, which we name the model average covariance (MAC) estimator. We point out that the minimization of (2.13) with respect to $w$ is a constrained quadratic programming problem, and hence the computation of the optimal weight vector is straightforward. For example, quadratic programming can be performed using the quadprog package in R, the quadprog command in MATLAB, or the qp command in SAS. Next, we present the asymptotic optimality of the MAC estimator $\hat\Sigma(\hat{w})$.
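As a concrete illustration of the weight selection step, the sketch below computes the candidate estimators, the plug-in criterion in (2.13), and the weights in (2.14) with a general-purpose SLSQP solver rather than the quadprog routines mentioned above; it imposes only the simplex constraint and leaves the positive-definiteness requirement of (2.6) to a separate check, and the function names and the diagonal residual-based estimate of $\Omega$ are our own assumptions rather than the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mac_weights(Z2, X_list, Gamma):
    """Sketch of the Mallows-type weight choice (2.13)-(2.14); notation is assumed.

    Z2     : (p,) vector of squared rotated observations Z_i^2;
    X_list : list of design matrices X_m for the M candidate models;
    Gamma  : (p, p) eigenvector matrix from (2.2).
    Returns (w_hat, Sigma_mac).
    """
    p, M = len(Z2), len(X_list)
    P_list = [X @ np.linalg.solve(X.T @ X, X.T) for X in X_list]   # projection matrices P_m
    mu_hat = np.column_stack([P @ Z2 for P in P_list])             # fitted means, one column per model

    # Omega is estimated from the residuals of the largest candidate model (cf. (2.12));
    # the diagonal matrix of squared residuals used here is an illustrative choice.
    m_star = int(np.argmax([X.shape[1] for X in X_list]))
    resid = Z2 - P_list[m_star] @ Z2
    Omega_hat = np.diag(resid**2)

    penalty = np.array([np.trace(P @ Omega_hat) for P in P_list])

    def criterion(w):                                              # Mallows-type criterion (2.13)
        return np.sum((Z2 - mu_hat @ w)**2) + 2.0 * penalty @ w

    w0 = np.full(M, 1.0 / M)
    cons = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    res = minimize(criterion, w0, bounds=[(0.0, 1.0)] * M, constraints=[cons], method="SLSQP")
    w_hat = res.x

    # MAC estimator: weighted average of the candidate covariance estimators.
    # Only the simplex constraint was imposed above; positive-definiteness of Sigma_mac,
    # required by the weight set in (2.6), should be verified separately.
    Sigma_mac = Gamma @ np.diag(mu_hat @ w_hat) @ Gamma.T
    return w_hat, Sigma_mac
```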
3. ASYMPTOTIC OPTIMALITY
As we have pointed out, in order to obtain a final covariance matrix estimator based on several candidate estimators, at least one of these candidate estimators needs to be valid, i.e., positive-definite. We state this formally as Condition (C.1). We then proceed to establish the optimality property of our procedure under scenarios of both independent and dependent error components.
Condition (C.1)
There exists an $m \in \{1, \dots, M\}$ such that $\hat\Sigma_m$ is positive-definite.
We first define some notation. Let $\mathcal{W}^* = \{w \in [0, 1]^M : \sum_{m=1}^{M} w_m = 1\}$, which is a larger space than $\mathcal{W}$. Write $\xi_p = \inf_{w \in \mathcal{W}^*} R(w)$. Let $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote respectively the maximum and minimum eigenvalues of a matrix. Let $w_m^0$ be an $M \times 1$ vector whose $m$th element is one and all the other elements are zero. We use $P^{(m)}_{ii}$ to denote the $i$th diagonal element of $P_m$ and let $\bar{P} = \max_{1 \le m \le M}\max_{1 \le i \le p} P^{(m)}_{ii}$. All limiting processes correspond to $p \to \infty$ unless stated otherwise.
3.1. Asymptotic optimality with independence
We first consider the situation where the elements of $\mathbb{Z}^{(2)}$ are independent of each other. This arises when, for example, $\mathbb{Y}$ is normally distributed: in this case, $\mathbb{Z} = \Gamma^{\top}\mathbb{Y}$ is normally distributed with the diagonal covariance matrix $D(\mu)$. This implies that the elements of $\mathbb{Z}$ are independent of each other; as a consequence, the elements of $\mathbb{Z}^{(2)}$ are also independent of each other. We assume the following regularity conditions, in which $e_i$ denotes the $i$th element of the error $e$ in (2.3).
Condition (C.2)
There exist a fixed integer $G$ and a constant $\bar{c} < \infty$ so that $E(e_i^{4G}) \le \bar{c}$ for $i = 1, \dots, p$.
Condition (C.3)
$M\,\xi_p^{-2G}\sum_{m=1}^{M}\{R(w_m^0)\}^{G} \to 0$ for the same constant $G$ as given in Condition (C.2).
Condition (C.4)
There exists a constant $c$ such that $\bar{P} \le c\,\bar{K}/p$, where, recall, $\bar{K}$ is the number of columns of $\widetilde{\mathbb{X}}$ in the largest model.
Condition (C.5)
$p^{-1}\sum_{i=1}^{p}\mu_i^2 = O(1)$, a.s.
Condition (C.6)
$\bar{K} \le c_K\,\xi_p$, where $c_K$ is a constant.
Remark 3.1
Condition (C.2) places a moment restriction on the error term. Condition (C.3) imposes a relation between the best model risk and the total risks from all the models, in that the best model should not be too much better than all other models. Specifically, a sufficient condition for Condition (C.3) can be stated in terms of the smallest and largest risks among the candidate models: the risk of the best model can increase more slowly than that of the worst model, but not too much more slowly. In addition, a consequence of Condition (C.3) is that $\xi_p \to \infty$, which implies that there is no correctly specified candidate model with a finite dimension. This can be seen by an argument of contradiction: if the $m$th candidate model were correctly specified with a finite $k_m$, then $R(w_m^0) = \mathrm{tr}\{P_m\Omega P_m^{\top}\} \le \lambda_{\max}(\Omega)\,k_m$, which is bounded if $k_m$ is bounded; then $\xi_p \le R(w_m^0)$ would also be bounded, a contradiction. Similar conditions are used in Wan et al. (2010), Liu and Okui (2013) and Ando and Li (2014). Condition (C.4) is commonly used to ensure the asymptotic optimality of cross-validation; see, for example, Andrews (1991) and Hansen and Racine (2012). Condition (C.5) concerns the sum of the squares of the elements of $\mu$ and is commonly used in the context of linear regression; see, for example, Wan et al. (2010) and Liang et al. (2011). Condition (C.6) means that $\bar{K}$ has order the same as or smaller than $\xi_p$. A similar requirement, with the sample size $n$ in place of $p$, has been used in Wan et al. (2010) and Liu et al. (2016).
Theorem 3.1
Assume that Conditions (C.1)–(C.6) hold. Then, as $p \to \infty$,
$$\frac{L(\hat{w})}{\inf_{w \in \mathcal{W}} L(w)} \to 1 \quad \text{in probability.} \qquad (3.1)$$
Theorem 3.1 shows that the MAC estimator is asymptotically optimal in that it leads to a squared error loss that is asymptotically identical to that of the infeasible best possible model average estimator. The proof of Theorem 3.1 is given in Online Appendix A.1.
3.2. Asymptotic optimality without independence
We now consider the situation where the elements of $\mathbb{Z}^{(2)}$ are dependent. Recall that $\Omega = \mathrm{cov}(\mathbb{Z}^{(2)})$, and thus we allow $\Omega$ to be nondiagonal. We can still establish the asymptotic optimality described in Theorem 3.1 with the following additional conditions.
Condition (C.7)
$\xi_p^{-2}\sum_{m=1}^{M} R(w_m^0) \to 0.$
Condition (C.8)
$\xi_p^{-1}\max_{1 \le m \le M}\mathrm{tr}(P_m\Omega) \to 0.$
Condition (C.9)
$c_0 \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le c$, where $c_0$ and $c$ are positive constants.
Remark 3.2
Condition (C.7) is similar to Condition (C.3) but is weaker; it is implied by Condition (C.3) with $G = 1$. Condition (C.8) is similar to condition (22) in Zhang et al. (2013). When all candidate models are nested within the largest candidate model $M^*$ defined in (2.11), $\mathrm{tr}(P_m\Omega) \le \lambda_{\max}(\Omega)\,k_{M^*}$, so Condition (C.8) is implied by $\bar{K}/\xi_p \to 0$, which is again a restriction on $\bar{K}$ and is similar to Condition (C.6). Condition (C.9) is a commonly used boundedness requirement on the minimum and maximum eigenvalues of $\Omega$.
Theorem 3.2
Assume that Conditions (C.1), (C.4), (C.5) and (C.7)–(C.9) hold. Then, as $p \to \infty$, the asymptotic optimality in (3.1) still holds.
Theorem 3.2 shows that Theorem 3.1 remains valid when $\Omega$ is a general matrix instead of a diagonal matrix. The proof of Theorem 3.2 is given in Online Appendix A.2 in the online Supporting Information. The reason why the optimality property is retained in Theorem 3.2 lies in Condition (C.8): this condition restricts the effect of the second term of the criterion $\hat{C}(w)$ to be negligible for the final weight choice compared with the first term, and thus the estimate of $\Omega$ has a negligible effect.
4. CONSISTENCY WITH CORRECTLY SPECIFIED CANDIDATE MODELS
As discussed in Remark 3.1, the optimality shown above essentially excludes the situation where at least one of the candidate models is correctly specified. Here, we show that when there is at least one correctly specified candidate model, the model averaging method described above achieves the same convergence rate as these correctly specified candidate models. Let $\Pi_m$ be a $k_m \times \bar{K}$ selection matrix so that $\mathbb{X}_m = \widetilde{\mathbb{X}}\,\Pi_m^{\top}$. Assume that the $m_0$th model is a correctly specified model, i.e.,
$$\mu = \mathbb{X}_{m_0}\theta_{(m_0)}. \qquad (4.1)$$
Thus,
$$\Sigma = \Gamma\,D\big(\mathbb{X}_{m_0}\theta_{(m_0)}\big)\,\Gamma^{\top}. \qquad (4.2)$$
Because the $M^*$th model has the largest number of covariates, for convenience we assume that the $m_0$th model is nested inside the $M^*$th model.
Under the $m$th candidate model, the full coefficient vector is $\Pi_m^{\top}\theta_{(m)}$, and thus the estimator of $\theta$ is $\Pi_m^{\top}\hat\theta_{(m)}$. Then, the model averaging estimator of $\theta$ is
$$\hat\theta(w) = \sum_{m=1}^{M} w_m\,\Pi_m^{\top}\hat\theta_{(m)}. \qquad (4.3)$$
To build the consistency of $\hat\theta(\hat{w})$, we assume the following regularity condition. Let $\widetilde{\Lambda} = p^{-1}\widetilde{\mathbb{X}}^{\top}\widetilde{\mathbb{X}}$.
Condition (C.10)
$c_3 \le \lambda_{\min}(\widetilde{\Lambda}) \le \lambda_{\max}(\widetilde{\Lambda}) \le c_4$, where $c_3$ and $c_4$ are constants.
Remark 4.1
Condition (C.10) requires the additional components in bigger models to contribute sufficiently different structures. Condition (C.10) is the same as condition (A1) of Zou and Zhang (2009). In their paper, the two bounds are constants, and we sometimes use the same $c$ to denote different constants.
Lemma 4.1
If Conditions (C.9) and (C.10) are satisfied, then
$$\big\|\hat\theta_{(m_0)} - \theta_{(m_0)}\big\| = O_p\big\{(\bar{K}/p)^{1/2}\big\}. \qquad (4.4)$$
Remark 4.2
Lemma 4.1 shows the $(p/\bar{K})^{1/2}$-consistency of the correctly specified model coefficient estimator $\hat\theta_{(m_0)}$, which is a very common convergence result when the dimension of the coefficients diverges; see, for example, He and Shao (2000) and Fan and Peng (2004). The proof of Lemma 4.1 is given in Online Appendix A.3.
Theorem 4.1
Theorem 4.1 shows the $(p/\bar{K})^{1/2}$-consistency of the optimal model averaging coefficient estimator $\hat\theta(\hat{w})$. The results in Theorem 4.1 may appear counter-intuitive at first glance, since they indicate that the large size of the unknown matrix $\Sigma$ is beneficial to us. This is a direct consequence of the assumption that the adjacency matrix $A$ is known, and hence a larger $p$ under such a setting does indeed represent more information. On the other hand, $\bar{K}$ is the number of parameters, and hence $(\bar{K}/p)^{1/2}$ indeed resembles the usual diverging-parameter convergence rate. The proof of Theorem 4.1 is given in Online Appendix A.4.
5. EXTENSIONS WITH N > 1
In this section, we extend the asymptotic optimality and consistency properties in Sections 3 and 4 to the case of $N > 1$. Suppose we have a sample $\mathbb{Y}_1, \dots, \mathbb{Y}_N$, which consists of independent and identically distributed copies of $\mathbb{Y}$. Denote $\mathbb{Z}_j = \Gamma^{\top}\mathbb{Y}_j$ and let $\mathbb{Z}^{(2)}_j$ be the vector whose $i$th component is the square of the $i$th component of $\mathbb{Z}_j$, for $j = 1, \dots, N$. Let $\bar{\mathbb{Z}}^{(2)} = N^{-1}\sum_{j=1}^{N}\mathbb{Z}^{(2)}_j$. Then, we know $E(\bar{\mathbb{Z}}^{(2)}) = \mu$ and $\mathrm{cov}(\bar{\mathbb{Z}}^{(2)}) = \Omega/N$. By model (2.3) and the $m$th candidate model (2.5), we have that
$$\bar{\mathbb{Z}}^{(2)} = \mathbb{X}_m\,\theta_{(m)} + \bar{e}_{(m)}. \qquad (5.1)$$
Then, the ordinary least squares estimator of $\theta_{(m)}$ is $\hat\theta_{(m)} = (\mathbb{X}_m^{\top}\mathbb{X}_m)^{-1}\mathbb{X}_m^{\top}\bar{\mathbb{Z}}^{(2)}$, and the estimator of $\Sigma$ is $\hat\Sigma_m = \Gamma\,D(P_m\bar{\mathbb{Z}}^{(2)})\,\Gamma^{\top}$. For simplicity, we do not explicitly redefine the remaining quantities such as $\hat\mu_m$, $\hat\mu(w)$, $\hat\Sigma(w)$, $L(w)$ and $R(w)$, other than pointing out that they are the same as defined in Section 2 except with the $\mathbb{Z}^{(2)}$ in their corresponding expressions replaced by $\bar{\mathbb{Z}}^{(2)}$.
The loss function is $L_N(w) = \|\hat\Sigma(w) - \Sigma\|_F^2 = \|\hat\mu(w) - \mu\|^2$, and the corresponding risk function is
$$R_N(w) = E\{L_N(w)\} = \big\|\{P(w) - I_p\}\mu\big\|^2 + N^{-1}\mathrm{tr}\big\{P(w)\,\Omega\,P^{\top}(w)\big\}. \qquad (5.2)$$
The estimator of the risk function is
$$C_N(w) = \big\|\bar{\mathbb{Z}}^{(2)} - \hat\mu(w)\big\|^2 + 2N^{-1}\mathrm{tr}\big\{P(w)\,\Omega\big\},$$
which is an unbiased estimator of the risk $R_N(w)$ up to a constant, i.e.,
$$E\{C_N(w)\} = R_N(w) + N^{-1}\mathrm{tr}(\Omega).$$
In practice, we estimate $\Omega/N$ by
$$\hat\Omega_N = D(\hat{e}_1^2, \dots, \hat{e}_p^2), \qquad (5.3)$$
where $\hat{e}_i$ is the $i$th component of $\hat{e} = (I_p - P_{M^*})\bar{\mathbb{Z}}^{(2)}$. Then, $C_N(w)$ is changed to
$$\hat{C}_N(w) = \big\|\bar{\mathbb{Z}}^{(2)} - \hat\mu(w)\big\|^2 + 2\,\mathrm{tr}\big\{P(w)\,\hat\Omega_N\big\}, \qquad (5.4)$$
and the optimal weight vector is
$$\hat{w} = \arg\min_{w \in \mathcal{W}} \hat{C}_N(w). \qquad (5.5)$$
Next, we consider the asymptotic optimality of the model averaging estimator $\hat\Sigma(\hat{w})$ as $p \to \infty$ and $N \to \infty$ when no correct model is contained in the candidate model set.
Corollary 5.1
When $\Omega$ is diagonal, if Conditions (C.1)–(C.6) hold, then as $p \to \infty$ and $N \to \infty$, the asymptotic optimality in (3.1) still holds.
Corollary 5.2
When $\Omega$ is nondiagonal, if Conditions (C.1), (C.4), (C.5), (C.7)–(C.9) hold, then as $p \to \infty$ and $N \to \infty$, the asymptotic optimality in (3.1) still holds.
Corollaries 5.1 and 5.2 are the generalized versions of Theorems 3.1 and 3.2, where the sample size $N$ is larger than 1. The proofs of Corollaries 5.1 and 5.2 are given in Online Appendices A.5 and A.6. From these proofs, it is straightforward to see that when $N$ is fixed, these corollaries still hold.
Similarly, we also establish the consistency of the method when at least one candidate model is correct. Since $N$ can go to infinity, the convergence order in Lemma 4.1 will depend on $N$. Specifically, we have the following result.
Lemma 5.1
If Conditions (C.9) and (C.10) are satisfied, then
$$\big\|\hat\theta_{(m_0)} - \theta_{(m_0)}\big\| = O_p\big[\{\bar{K}/(Np)\}^{1/2}\big]. \qquad (5.6)$$
Remark 5.1
Lemma 5.1 is a generalization of Lemma 4.1 in which the sample size $N$ is allowed to diverge. The proof of Lemma 5.1 is given in Online Appendix A.7.
Corollary 5.3
From Corollary 5.3, the convergence rate of the model averaging coefficient estimator improves as $N$ grows, but only up to a point: once $N$ is sufficiently large, the rate no longer improves. Thus, as far as the convergence rate is concerned, there is no need to increase $N$ beyond that point, because no more gain can be obtained. The proof of Corollary 5.3 is given in Online Appendix A.8.
6. SIMULATION STUDY
This section is devoted to a comparison of the finite-sample performance of the MAC estimator with the AIC- and BIC-based model selection and averaging estimators. The AIC and BIC scores for the $m$th candidate model are $\mathrm{AIC}_m = p\log\hat\sigma_m^2 + 2k_m$ and $\mathrm{BIC}_m = p\log\hat\sigma_m^2 + k_m\log p$, respectively, where $\hat\sigma_m^2 = p^{-1}\|\mathbb{Z}^{(2)} - P_m\mathbb{Z}^{(2)}\|^2$. (When $N > 1$, we use $\bar{\mathbb{Z}}^{(2)}$ instead of $\mathbb{Z}^{(2)}$ and define $\hat\sigma_m^2$ accordingly.) Buckland et al. (1997) suggested the smoothed AIC (SAIC) and smoothed BIC (SBIC) model averaging methods, in which the weight for the $m$th model is simply set as $\exp(-\mathrm{AIC}_m/2)\,/\sum_{l=1}^{M}\exp(-\mathrm{AIC}_l/2)$, and similarly for SBIC. Owing to their ease of use, the SAIC and SBIC weight choice methods have been used extensively in the literature (Wan and Zhang, 2009; Millar et al., 2014).
In the following, we consider two kinds of experimental design. In the first design, all candidate models are misspecified, while in the second one, some candidate models are correctly specified.
6.1. Experimental designs
Design 1 (All candidate models are misspecified). We set the true covariance matrix $\Sigma$ to be a polynomial in the normalized powers of the adjacency matrix. The individual elements of the adjacency matrix $A$, $a_{ij}$ with $i < j$, are independently generated from a binary distribution whose success probability is governed by a network-density parameter set to 5 or 10, while $a_{ji} = a_{ij}$. We set $p = 200$ or $400$, and the maximum candidate order is an integer function of $p$ obtained with the floor function $\lfloor x\rfloor$, which returns the largest integer not greater than $x$. The density parameter and $p$ together control the dimension and the sparsity of the network structure. We let $\bar{K}$ denote the maximum order considered in the candidate models. Here we have chosen very small coefficients to ensure the positive-definiteness of $\Sigma$. In addition, the response vector is generated as $\mathbb{Y} = \Sigma^{1/2}\varepsilon$, where each component of $\varepsilon$ is independently and identically simulated from either a standard normal distribution (norm), a standardized exponential distribution (exp), or a mixture of two normal distributions (mix). For the 'mix' case, the first distribution has mean zero and variance 5/9, and the second one has mean zero and variance 5, with the mixture coefficient itself generated from a binomial distribution with probability 0.9; that is, each component is drawn from the first distribution when an independent Binomial(0.9) indicator equals one and from the second distribution otherwise. We repeat the simulation 500 times under each simulation setting.
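For concreteness, a small sketch of how the 'mix' errors described above could be generated is given below; our reading is that the Binomial(0.9) indicator selects the first, variance-5/9 component, which makes the mixture have unit variance (0.9 * 5/9 + 0.1 * 5 = 1), and the function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_errors(size, rng):
    """Two-component normal mixture: variance 5/9 with probability 0.9, variance 5 otherwise."""
    flag = rng.binomial(1, 0.9, size=size)                 # 1 -> first component, 0 -> second
    comp1 = rng.normal(0.0, np.sqrt(5.0 / 9.0), size=size)
    comp2 = rng.normal(0.0, np.sqrt(5.0), size=size)
    return np.where(flag == 1, comp1, comp2)

eps = mixture_errors(10_000, rng)
print(eps.mean(), eps.var())   # both close to 0 and 1, respectively
```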
In all candidate models, the intercept term is always included, while each of the other monomials may or may not be included; thus, we consider a total of $2^{\bar{K}}$ candidate models. Note that the true $\Sigma$ contains nonzero coefficients on monomials of higher order than any monomial included in the candidate models, for both $p = 200$ and $p = 400$, and hence all candidate models are misspecified.
Design 2 (Some candidate models are correctly specified). We set the covariance matrix $\Sigma$ to be a polynomial in the normalized adjacency powers whose nonzero terms are all contained in some of the candidate models. We consider three settings of the true coefficient vector. As in Design 1, the intercept term is always included, while all of the other components are allowed to be zero; thus, we again consider a total of $2^{\bar{K}}$ candidate models. All other aspects of Design 2 are the same as in Design 1.
We evaluate the performance of the various estimators based on the following mean Frobenius loss (MFL):
$$\mathrm{MFL} = \frac{1}{T}\sum_{t=1}^{T}\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (6.1)$$
where $\hat\Sigma^{(t)}$ is the estimator of $\Sigma$ obtained by a given method in the $t$th trial, and $T = 500$ is the number of replications. In Design 2, where some candidate models are correctly specified, in order to illustrate the consistency of the MAC method shown in Corollary 5.3 we also calculate the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\big\|\hat\theta^{(t)}(\hat{w}^{(t)}) - \theta\big\|^2, \qquad (6.2)$$
where $\hat{w}^{(t)}$ and $\hat\theta^{(t)}(\hat{w}^{(t)})$ are the chosen weight vector and the resulting estimator of $\theta$ obtained in the $t$th trial.
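A minimal sketch of how the two evaluation criteria could be computed from stored simulation output follows; whether the Frobenius norm in (6.1) is squared, and the exact normalization of (6.2), are assumptions on our part, as are the function names.

```python
import numpy as np

def mean_frobenius_loss(Sigma_hats, Sigma_true):
    """Average (squared) Frobenius distance between estimated and true covariance matrices."""
    return np.mean([np.linalg.norm(S - Sigma_true, ord="fro")**2 for S in Sigma_hats])

def mean_squared_error(theta_hats, theta_true):
    """Average squared Euclidean distance between model-averaged and true coefficients."""
    return np.mean([np.sum((t - theta_true)**2) for t in theta_hats])
```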
6.2. Results
The MFLs for Design 1 are presented in Tables 1–3 under the normal, exponential and mixture distributions, respectively. The MFLs for Design 2 are presented in Tables 4–6, again under the normal, exponential and mixture distributions, respectively. To facilitate comparisons, we flag the best, second best and worst estimators in each case in bold, bold italic and italic, respectively. The MSEs for Design 2 are shown in Table 7.
Table 1.
The MFL under a normal distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 5.780 | 6.467 | 6.992 | 7.383 | 8.923 |
|
3.571 | 3.991 | 4.072 | 4.639 | 4.736 | ||
|
1.463 | 1.638 | 1.775 | 1.908 | 2.144 | ||
| 400 | 1 | 3.086 | 3.456 | 3.637 | 3.966 | 4.271 | |
|
1.243 | 1.518 | 1.745 | 1.723 | 2.081 | ||
|
0.551 | 0.631 | 0.798 | 0.640 | 0.916 | ||
| 10 | 200 | 1 | 7.720 | 8.305 | 8.638 | 9.181 | 9.959 |
|
4.819 | 5.143 | 5.344 | 5.460 | 5.913 | ||
|
1.733 | 1.790 | 1.805 | 1.962 | 1.953 | ||
| 400 | 1 | 3.878 | 4.372 | 4.608 | 4.660 | 5.265 | |
|
1.331 | 1.429 | 1.417 | 1.572 | 1.566 | ||
|
0.637 | 0.726 | 0.821 | 0.787 | 0.946 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 3.
The MFL under a mixture distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 9.801 | 10.476 | 10.954 | 11.563 | 12.515 |
|
5.952 | 7.443 | 7.460 | 8.038 | 8.036 | ||
|
2.230 | 2.481 | 2.601 | 2.792 | 3.040 | ||
| 400 | 1 | 5.233 | 5.671 | 5.803 | 6.255 | 6.430 | |
|
1.861 | 2.328 | 2.548 | 2.500 | 2.891 | ||
|
0.769 | 0.864 | 1.023 | 0.885 | 1.143 | ||
| 10 | 200 | 1 | 10.421 | 10.994 | 11.285 | 11.908 | 12.716 |
|
5.737 | 5.915 | 6.124 | 6.135 | 6.754 | ||
|
2.556 | 2.698 | 2.741 | 2.916 | 2.922 | ||
| 400 | 1 | 5.543 | 5.921 | 6.274 | 6.148 | 7.025 | |
|
1.869 | 1.977 | 1.946 | 2.150 | 2.085 | ||
|
0.825 | 0.919 | 1.010 | 0.988 | 1.143 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 4.
The MFL under a normal distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 21.577 | 24.045 | 24.885 | 26.990 | 29.942 |
|
12.983 | 14.752 | 14.289 | 16.624 | 16.140 | |||
|
5.287 | 5.470 | 5.536 | 6.071 | 6.313 | |||
| 400 | 1 | 11.103 | 12.604 | 12.678 | 14.173 | 14.393 | ||
|
4.422 | 5.161 | 5.212 | 5.670 | 5.857 | |||
|
1.719 | 1.824 | 1.796 | 1.983 | 1.958 | |||
| 10 | 200 | 1 | 24.202 | 25.178 | 25.367 | 28.237 | 28.729 | |
|
15.434 | 16.109 | 16.480 | 17.396 | 18.228 | |||
|
5.180 | 5.289 | 5.417 | 5.617 | 6.059 | |||
| 400 | 1 | 13.505 | 13.953 | 14.699 | 15.180 | 16.458 | ||
|
4.382 | 4.447 | 4.481 | 4.739 | 4.905 | |||
|
1.878 | 1.903 | 1.905 | 1.969 | 2.037 | |||
|
5 | 200 | 1 | 17.819 | 19.836 | 20.807 | 22.336 | 24.616 |
|
10.138 | 11.543 | 12.006 | 12.506 | 13.587 | |||
|
3.987 | 4.110 | 4.131 | 4.321 | 4.381 | |||
| 400 | 1 | 8.332 | 9.742 | 10.687 | 10.623 | 12.495 | ||
|
3.521 | 4.295 | 4.157 | 4.545 | 4.407 | |||
|
1.238 | 1.359 | 1.287 | 1.477 | 1.392 | |||
| 10 | 200 | 1 | 21.224 | 21.724 | 21.570 | 23.124 | 23.196 | |
|
13.374 | 13.885 | 14.068 | 14.667 | 15.207 | |||
|
4.486 | 4.647 | 4.868 | 4.814 | 5.248 | |||
| 400 | 1 | 11.434 | 11.830 | 12.125 | 12.550 | 13.279 | ||
|
3.746 | 3.838 | 4.057 | 4.009 | 4.407 | |||
|
1.626 | 1.671 | 1.710 | 1.717 | 1.793 | |||
|
5 | 200 | 1 | 15.920 | 18.239 | 18.245 | 20.264 | 20.769 |
|
9.256 | 10.970 | 10.732 | 11.693 | 11.854 | |||
|
3.820 | 4.003 | 3.922 | 4.358 | 4.287 | |||
| 400 | 1 | 7.563 | 9.210 | 9.155 | 10.037 | 10.318 | ||
|
3.295 | 4.121 | 3.923 | 4.458 | 4.335 | |||
|
1.262 | 1.363 | 1.398 | 1.490 | 1.586 | |||
| 10 | 200 | 1 | 20.512 | 21.196 | 20.628 | 22.084 | 21.468 | |
|
12.942 | 13.595 | 13.606 | 14.150 | 14.329 | |||
|
4.314 | 4.560 | 4.682 | 4.775 | 4.983 | |||
| 400 | 1 | 10.950 | 11.458 | 11.306 | 12.039 | 11.805 | ||
|
3.454 | 3.652 | 3.748 | 3.810 | 3.982 | |||
|
1.535 | 1.630 | 1.675 | 1.687 | 1.777 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 6.
The MFL under a mixture distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 31.494 | 34.076 | 34.723 | 37.717 | 39.548 |
|
26.376 | 33.483 | 32.943 | 35.667 | 34.934 | |||
|
7.194 | 7.459 | 7.611 | 8.072 | 8.545 | |||
| 400 | 1 | 15.549 | 17.122 | 17.399 | 18.847 | 19.249 | ||
|
7.677 | 9.692 | 9.613 | 10.261 | 10.414 | |||
|
2.260 | 2.391 | 2.382 | 2.538 | 2.538 | |||
| 10 | 200 | 1 | 28.938 | 29.761 | 29.665 | 32.988 | 32.564 | |
|
15.583 | 16.243 | 16.769 | 17.558 | 18.745 | |||
|
6.903 | 7.142 | 7.372 | 7.525 | 7.956 | |||
| 400 | 1 | 15.148 | 15.589 | 16.181 | 16.905 | 17.920 | ||
|
5.323 | 5.427 | 5.542 | 5.735 | 6.076 | |||
|
2.210 | 2.265 | 2.260 | 2.378 | 2.406 | |||
|
5 | 200 | 1 | 22.950 | 25.278 | 26.137 | 27.904 | 30.076 |
|
21.712 | 29.250 | 29.787 | 30.200 | 31.647 | |||
|
5.266 | 5.571 | 5.599 | 5.758 | 5.916 | |||
| 400 | 1 | 10.800 | 12.232 | 13.236 | 13.167 | 15.212 | ||
|
6.479 | 8.413 | 8.338 | 8.642 | 8.639 | |||
|
1.557 | 1.681 | 1.607 | 1.803 | 1.713 | |||
| 10 | 200 | 1 | 24.239 | 24.780 | 24.694 | 26.198 | 26.343 | |
|
12.803 | 13.285 | 13.529 | 14.025 | 14.709 | |||
|
5.839 | 6.101 | 6.318 | 6.307 | 6.737 | |||
| 400 | 1 | 12.337 | 12.662 | 12.935 | 13.370 | 14.056 | ||
|
4.367 | 4.474 | 4.641 | 4.616 | 4.944 | |||
|
1.864 | 1.914 | 1.952 | 1.970 | 2.045 | |||
|
5 | 200 | 1 | 21.398 | 24.199 | 24.175 | 26.096 | 27.105 |
|
20.849 | 28.529 | 28.268 | 29.559 | 29.689 | |||
|
4.998 | 5.417 | 5.327 | 5.758 | 5.717 | |||
| 400 | 1 | 9.911 | 11.531 | 11.626 | 12.332 | 12.932 | ||
|
6.340 | 8.301 | 8.151 | 8.602 | 8.592 | |||
|
1.578 | 1.672 | 1.702 | 1.805 | 1.879 | |||
| 10 | 200 | 1 | 23.860 | 24.410 | 23.873 | 25.182 | 24.778 | |
|
12.298 | 12.975 | 12.941 | 13.641 | 13.765 | |||
|
5.570 | 5.935 | 6.026 | 6.129 | 6.233 | |||
| 400 | 1 | 11.712 | 12.230 | 11.979 | 12.738 | 12.441 | ||
|
4.071 | 4.337 | 4.412 | 4.542 | 4.673 | |||
|
1.771 | 1.877 | 1.912 | 1.955 | 2.007 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 7.
The MSE of the MAC estimator under Design 2.
| | | | Norm | Exp | Mix | Norm | Exp | Mix |
|---|---|---|---|---|---|---|---|---|
|
200 | 1 | 38.156 | 42.237 | 42.153 | 72.417 | 71.881 | 79.302 |
|
19.555 | 23.064 | 30.482 | 51.202 | 38.125 | 41.443 | ||
|
11.337 | 9.225 | 13.505 | 16.759 | 18.577 | 20.464 | ||
| 400 | 1 | 27.714 | 26.913 | 25.331 | 73.968 | 69.991 | 61.756 | |
|
7.020 | 11.045 | 14.087 | 23.588 | 25.602 | 24.684 | ||
|
4.684 | 5.221 | 5.467 | 10.584 | 11.597 | 11.795 | ||
|
200 | 1 | 40.017 | 38.634 | 40.506 | 60.281 | 59.040 | 64.643 |
|
19.370 | 22.159 | 28.908 | 37.361 | 28.449 | 32.889 | ||
|
10.276 | 8.056 | 12.130 | 12.684 | 13.218 | 15.312 | ||
| 400 | 1 | 24.519 | 24.133 | 24.501 | 58.768 | 59.701 | 54.037 | |
|
6.793 | 10.555 | 15.713 | 20.278 | 21.214 | 20.150 | ||
|
4.348 | 4.918 | 4.737 | 8.412 | 9.281 | 9.228 | ||
|
200 | 1 | 41.005 | 38.222 | 41.347 | 55.315 | 58.739 | 64.556 |
|
20.543 | 22.352 | 28.944 | 37.371 | 26.602 | 31.185 | ||
|
10.724 | 8.065 | 12.109 | 11.796 | 12.293 | 14.133 | ||
| 400 | 1 | 26.025 | 23.938 | 24.792 | 58.897 | 59.446 | 52.941 | |
|
6.714 | 10.575 | 15.647 | 19.404 | 20.189 | 18.872 | ||
|
4.396 | 4.902 | 4.765 | 8.020 | 8.883 | 8.598 | ||
Table 2.
The MFL under an exponential distribution for Design 1 ( 100).
| | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|
| 5 | 200 | 1 | 9.828 | 10.631 | 11.169 | 11.581 | 13.106 |
|
5.036 | 5.245 | 5.316 | 5.736 | 5.817 | ||
|
2.512 | 2.784 | 2.937 | 3.068 | 3.374 | ||
| 400 | 1 | 5.120 | 5.542 | 5.726 | 5.997 | 6.258 | |
|
1.997 | 2.212 | 2.381 | 2.437 | 2.752 | ||
|
0.833 | 0.919 | 1.068 | 0.941 | 1.189 | ||
| 10 | 200 | 1 | 11.093 | 11.650 | 11.997 | 12.494 | 13.450 |
|
5.464 | 5.576 | 5.813 | 5.789 | 6.366 | ||
|
2.507 | 2.593 | 2.619 | 2.808 | 2.814 | ||
| 400 | 1 | 5.321 | 5.866 | 6.205 | 6.142 | 6.909 | |
|
2.028 | 2.148 | 2.135 | 2.338 | 2.295 | ||
|
0.856 | 0.948 | 1.043 | 1.015 | 1.150 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
Table 5.
The MFL under an exponential distribution for Design 2 ( 100).
| | | | | MAC | SAIC | SBIC | AIC | BIC |
|---|---|---|---|---|---|---|---|---|
|
5 | 200 | 1 | 31.665 | 34.181 | 35.037 | 37.668 | 39.586 |
|
15.732 | 16.454 | 16.369 | 18.079 | 18.267 | |||
|
7.938 | 8.351 | 8.498 | 8.989 | 9.468 | |||
| 400 | 1 | 15.544 | 17.232 | 17.525 | 18.749 | 19.361 | ||
|
6.021 | 6.392 | 6.580 | 6.872 | 7.198 | |||
|
2.397 | 2.506 | 2.453 | 2.653 | 2.609 | |||
| 10 | 200 | 1 | 29.631 | 30.776 | 30.796 | 33.824 | 33.871 | |
|
14.371 | 14.837 | 15.341 | 16.025 | 17.127 | |||
|
6.630 | 6.789 | 6.967 | 7.218 | 7.547 | |||
| 400 | 1 | 15.451 | 15.858 | 16.652 | 17.158 | 18.424 | ||
|
5.627 | 5.718 | 5.850 | 6.017 | 6.307 | |||
|
2.242 | 2.266 | 2.290 | 2.327 | 2.427 | |||
|
5 | 200 | 1 | 22.577 | 24.649 | 25.606 | 27.176 | 29.480 |
|
11.560 | 12.278 | 13.001 | 13.189 | 14.650 | |||
|
5.390 | 5.704 | 5.729 | 5.994 | 6.129 | |||
| 400 | 1 | 10.597 | 12.065 | 13.314 | 12.916 | 15.328 | ||
|
4.035 | 4.387 | 4.381 | 4.600 | 4.570 | |||
|
1.631 | 1.751 | 1.698 | 1.866 | 1.804 | |||
| 10 | 200 | 1 | 24.457 | 25.007 | 24.816 | 26.269 | 26.431 | |
|
11.745 | 12.177 | 12.482 | 12.967 | 13.621 | |||
|
5.404 | 5.608 | 5.821 | 5.838 | 6.230 | |||
| 400 | 1 | 12.708 | 13.020 | 13.339 | 13.780 | 14.430 | ||
|
4.613 | 4.734 | 4.922 | 4.882 | 5.271 | |||
|
1.873 | 1.920 | 1.967 | 1.975 | 2.038 | |||
|
5 | 200 | 1 | 20.949 | 23.550 | 23.422 | 25.406 | 26.054 |
|
10.631 | 11.572 | 11.508 | 12.419 | 12.514 | | | |
|
5.136 | 5.518 | 5.399 | 5.958 | 5.812 | |||
| 400 | 1 | 9.642 | 11.283 | 11.661 | 12.048 | 13.004 | ||
|
3.916 | 4.299 | 4.249 | 4.602 | 4.578 | |||
|
1.661 | 1.754 | 1.811 | 1.880 | 1.997 | |||
| 10 | 200 | 1 | 23.945 | 24.523 | 23.774 | 25.440 | 24.640 | |
|
11.259 | 11.799 | 11.802 | 12.341 | 12.515 | |||
|
5.168 | 5.475 | 5.571 | 5.738 | 5.920 | |||
| 400 | 1 | 12.174 | 12.623 | 12.469 | 13.153 | 12.994 | ||
|
4.365 | 4.602 | 4.697 | 4.777 | 4.898 | |||
|
1.797 | 1.882 | 1.912 | 1.968 | 2.006 |
Notes: The smallest, second smallest and largest MFLs in each row are highlighted in bold, bold italic and italic, respectively.
First, the different distributions of the error components have little quantitative effect on the performance of each method. Second, in most cases, evaluated by the MFL, MAC is the best, and SAIC/SBIC outperform AIC/BIC, as expected. Tables A.1–A.6 in the online Supporting Information show the standard errors of the MFLs, from which we can see that our method is also the most stable in all cases. Finally, we can see from Table 7 that as $p$ or $N$ increases, the MSE of the MAC coefficient estimator decreases, which reflects its consistency.
7. EMPIRICAL APPLICATION
To illustrate the usefulness of the proposed method, we now apply the MAC method to analyse data on passenger traffic volumes at airports. The dataset consists of yearly passenger traffic volumes at 227 airports in Mainland China in 2017,1 obtained from the Civil Aviation Administration of China2. The response variable contains the centralized values of the logarithms of the passenger traffic volumes at these airports. It is intuitive that the number of nonstop flights at an airport affects the passenger traffic volume at that airport, so we use the matrix of nonstop flights as the adjacency matrix $A$. Specifically, the $(i, j)$th entry of $A$ is nonzero if there is at least one nonstop flight between airports $i$ and $j$, and 0 otherwise, with the nonzero entries scaled using $n_i$, the number of airports that have a nonstop flight with airport $i$. We set the largest order of the monomials to be 3, and then the number of candidate models to be considered is 8.
We consider the five methods studied in Section 6. The values of AIC and BIC and the weights of SAIC, SBIC and MAC are reported in Table 8. We can see that the AIC and BIC methods select the same model, (0, 1, 2, 3), which means that the monomials of orders 0, 1, 2 and 3 are all in the model, and the SAIC and SBIC methods also put almost all of their weight on these models. However, the MAC method puts its weight on the models (0, 2), (0, 3) and (0, 1, 3).
Table 8.
Model selection criterion values and model averaging weights in the analysis of Chinese airports in 2017.
| Model selection criterion values | Weights | ||||
|---|---|---|---|---|---|
| Models | AIC | BIC | SAIC | SBIC | MAC |
| (0) | 1361.723 | 1365.148 | 0.000 | 0.000 | 0.000 |
| (0, 1) | 1357.781 | 1364.630 | 0.000 | 0.000 | 0.000 |
| (0,2) | 1195.696 | 1202.546 | 0.002 | 0.043 | 0.593 |
| (0,3) | 1266.882 | 1273.731 | 0.000 | 0.000 | 0.132 |
| (0, 1, 2) | 1198.360 | 1208.635 | 0.000 | 0.002 | 0.000 |
| (0, 1, 3) | 1242.146 | 1252.421 | 0.000 | 0.000 | 0.275 |
| (0,2,3) | 1190.737 | 1201.012 | 0.019 | 0.093 | 0.000 |
| (0, 1, 2, 3) | 1182.865 | 1196.565 | 0.979 | 0.861 | 0.000 |
Notes: Model (0, 2, 3) means the model including the monomials of orders 0, 2 and 3. The minimum AIC and BIC values are highlighted in bold.
To compare the different methods, we use the estimated covariance matrix obtained from each method as the true covariance matrix; that is, each method's estimate is taken in turn as the truth. Then we create 500 replications of the response vector randomly generated from a mean-zero distribution with this covariance matrix. Subsequently, we estimate the covariance matrix by the five methods used above and calculate the mean and median of the MFLs across the 500 replications. Specifically,
$$\mathrm{mean\ MFL} = \frac{1}{500}\sum_{t=1}^{500}\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (7.1)$$
and
$$\mathrm{median\ MFL} = \mathrm{median}_{1 \le t \le 500}\,\big\|\hat\Sigma^{(t)} - \Sigma\big\|_F^2, \qquad (7.2)$$
where $\hat\Sigma^{(t)}$ is the estimator of $\Sigma$ obtained by a given method in the $t$th trial. The results are shown in Table 9, which also reports the standard deviation of the MFL for each method and the optimal rate of each method, defined as the proportion of times that the method yields the smallest MFL across the 500 replication trials.
Table 9.
MFL in the analysis of Chinese airports in 2017.
| Method | MAC | SAIC | SBIC | AIC | BIC | |
|---|---|---|---|---|---|---|
| MAC | Mean | 84.669 | 149.813 | 148.348 | 150.454 | 149.039 |
| Median | 63.676 | 75.379 | 70.190 | 77.443 | 73.255 | |
| Standard deviation | 94.413 | 304.467 | 304.698 | 304.360 | 304.712 | |
| Optimal rate | 0.526 | 0.064 | 0.140 | 0.172 | 0.098 | |
| SAIC | Mean | 137.931 | 354.814 | 353.701 | 355.525 | 354.004 |
| Median | 94.863 | 102.324 | 98.094 | 105.817 | 97.684 | |
| Standard deviation | 235.541 | 1403.513 | 1403.687 | 1403.396 | 1403.677 | |
| Optimal rate | 0.528 | 0.070 | 0.118 | 0.202 | 0.082 | |
| SBIC | Mean | 129.912 | 305.138 | 303.586 | 305.603 | 304.121 |
| Median | 94.681 | 103.598 | 99.498 | 103.875 | 101.281 | |
| Standard deviation | 271.782 | 1512.634 | 1512.853 | 1512.588 | 1512.798 | |
| Optimal rate | 0.538 | 0.064 | 0.120 | 0.208 | 0.070 | |
| AIC/BIC | Mean | 115.859 | 226.284 | 224.767 | 226.895 | 225.073 |
| Median | 89.564 | 98.179 | 95.068 | 100.434 | 96.556 | |
| Standard deviation | 150.260 | 650.444 | 650.764 | 650.334 | 650.732 | |
| Optimal rate | 0.524 | 0.074 | 0.106 | 0.212 | 0.084 |
Notes: The methods listed in the first column are used to estimate the covariance matrix, and the estimated covariance matrix is then used as the truth to generate the corresponding data. The minimum values in every row are highlighted in bold. Since AIC and BIC select the same model in this real data analysis, the last method is AIC/BIC.
The results show that the MAC method consistently outperforms all the other methods, regardless of which method's estimate is used as the true covariance matrix and which performance evaluation criterion is used. We find this quite remarkable. In terms of the mean and median of the MFLs, the MAC method performs the best among all five estimators; in particular, the mean MFL of the MAC method is about half of that of any other method in all cases. The standard deviation of the MFL of MAC is much smaller than those of the others in all cases, which means that the MAC performance is the most stable. In terms of the percentage of times that a method shows optimal performance (the optimal rate), the MAC estimator always attains the highest score among the five methods, often with a value of more than 50%, indicating that in over half of the 500 trials MAC is the champion. Judging by the means and medians, SAIC/SBIC perform slightly better than AIC/BIC.
8. CONCLUDING REMARKS
In this article, we used the covariance regression network model, which treats the covariance as a polynomial function of the symmetric adjacency matrix. The model averaging method was used to estimate the high-dimensional covariance matrix, and the candidate models were constructed through different orders of a polynomial function. The optimal weights were obtained by minimizing the newly proposed Mallows-type model averaging criterion. We proved the asymptotic optimality and consistency of the resulting MAC estimators for different situations. Both numerical simulations and a case study on Chinese airport network structure data were conducted to demonstrate the validity of the proposed approach.
It is worth noting that our method combines polynomials of the adjacency matrix. If the true covariance matrix is far from this polynomial form, the MAC method may not yield an accurate estimate. In addition, if one conjectures that the covariance matrix is a combination of banded and Toeplitz-type matrices, these matrices should be included in the candidate models; but then there is no single orthogonal matrix that diagonalizes them simultaneously, and in such a case MAC is not applicable. How to develop a model averaging method for estimating this kind of covariance matrix warrants further study.
ACKNOWLEDGEMENTS
We thank the referee, the associate editor, the co-editor Victor Chernozhukov and Prof. Hansheng Wang for many constructive comments and suggestions. We thank Prof. Wei Lan for providing his codes. Zhang, the corresponding author, was supported by the National Key R&D Program of China (2020AAA0105200), the National Natural Science Foundation of China (grant nos. 71925007, 11688101 and 71631008), the Beijing Academy of Artificial Intelligence, and the Youth Innovation Promotion Association of the Chinese Academy of Sciences. Ma was supported by grants from the National Science Foundation and the National Institutes of Health. Zou was supported by the National Natural Science Foundation of China (grant nos. 11971323 and 12031016). All errors remain the authors'.
Notes
Co-editor Victor Chernozhukov handled this manuscript.
Footnotes
In 2017, there were 229 civil airports in Mainland China, 227 of which had scheduled flights.
Contributor Information
Rong Zhu, School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Xinyu Zhang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Yanyuan Ma, Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA.
Guohua Zou, School of Mathematical Sciences, Capital Normal University, Beijing 100048, China.
Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s website:
Online Appendix
Replication Package
REFERENCES
- Ando T., Li K.-C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association. 109, 254–65.
- Andrews D. W. K. (1991). Asymptotic optimality of generalized $C_p$, cross-validation, and generalized cross-validation in regression with heteroskedastic errors. Journal of Econometrics. 47, 359–77.
- Bilmes J. A. (2000). Factored sparse inverse covariance matrices. IEEE International Conference. 2, 1009–12.
- Buckland S. T., Burnham K. P., Augustin N. H. (1997). Model selection: An integral part of inference. Biometrics. 53, 603–18.
- Campbell J. Y., Lo A. W., Mackinlay A. C., Whitelaw R. F. (1998). The econometrics of financial markets. Macroeconomic Dynamics. 2, 559–62.
- Chen X. H., Conley T. G. (2001). A new semi-parametric spatial model for panel time series. Journal of Econometrics. 105, 59–83.
- Fan J., Fan Y., Lv J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics. 147, 186–97.
- Fan J., Peng H. (2004). On nonconcave penalized likelihood with diverging number of parameters. Annals of Statistics. 32, 928–61.
- Friedman J. H., Hastie T., Tibshirani R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 9, 432–41.
- Gao Y., Zhang X., Wang S., Zou G. (2016). Model averaging based on leave-subject-out cross-validation. Journal of Econometrics. 192, 139–51.
- Hansen B. E. (2007). Least squares model averaging. Econometrica. 75, 1175–89.
- Hansen B. E. (2014). Model averaging, asymptotic risk, and regressor groups. Quantitative Economics. 5, 495–530.
- Hansen B. E., Racine J. S. (2012). Jackknife model averaging. Journal of Econometrics. 167, 38–46.
- He X., Shao Q. M. (2000). On parameters of increasing dimensions. Journal of Multivariate Analysis. 73, 120–35.
- Jagannathan R., Ma T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. Journal of Finance. 58, 1651–84.
- Lan W., Fang Z., Wang H., Tsai C. (2018). Covariance matrix estimation via network structure. Journal of Business and Economic Statistics. 36, 359–69.
- Leung G., Barron A. (2006). Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory. 52, 3396–410.
- Liang H., Zou G., Wan A. T. K., Zhang X. (2011). Optimal weight choice for frequentist model average estimators. Journal of the American Statistical Association. 106, 1053–66.
- Liu Q., Okui R. (2013). Heteroskedasticity-robust $C_p$ model averaging. Econometrics Journal. 16, 463–72.
- Liu Q., Okui R., Yoshimura A. (2016). Generalized least squares model averaging. Econometric Reviews. 35, 1692–752.
- Markowitz H. (1952). Portfolio selection. Journal of Finance. 7, 77–91.
- Millar C. P., Jardim E., Scott F., Osio G. C., Mosqueira I., Alzorriz N. (2014). Model averaging to streamline the stock assessment process. ICES Journal of Marine Science. 72, 93–98.
- Wan A. T. K., Zhang X. (2009). On the use of model averaging in tourism research. Annals of Tourism Research. 36, 525–32.
- Wan A. T. K., Zhang X., Zou G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics. 156, 277–83.
- Whittle P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and its Applications. 5, 331–5.
- Yuan Z., Yang Y. (2005). Combining linear regression models: When and how? Journal of the American Statistical Association. 100, 1202–14.
- Zhang X. (2010). Model averaging and its applications. Ph.D. Thesis, Academy of Mathematics and Systems Science, Chinese Academy of Sciences.
- Zhang X., Wan A. T. K., Zou G. (2013). Model averaging by jackknife criterion in models with dependent data. Journal of Econometrics. 174, 82–94.
- Zhang X., Wang W. (2019). Optimal model averaging estimation for partially linear models. Statistica Sinica. 29, 693–718.
- Zhu R., Zou G., Zhang X. (2018). Model averaging for multivariate multiple regression models. Statistics. 52, 205–27.
- Zou H., Zhang H. H. (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics. 37, 1733–51.