ABSTRACT
This paper develops a new method to estimate a large-dimensional covariance matrix when the variables have no natural ordering among themselves. The modified Cholesky decomposition technique is used to provide a set of estimates of the covariance matrix under multiple orderings of the variables. The proposed estimator takes the form of a linear combination of these available estimates and the identity matrix. It is positive definite and applicable in large dimensions. The merits of the proposed estimator are demonstrated through a numerical study and a real data example, in comparison with several existing methods.
KEYWORDS: Cholesky factor, ensemble estimate, large-dimensional, ordering of variables, positive definite
1. Introduction
Estimation of a covariance matrix is among the most fundamental problems in statistics, since the covariance matrix plays a key role in a broad range of data analyses and statistical applications, such as principal component analysis, portfolio optimization, linear discriminant analysis (LDA) and so forth. However, the estimation is difficult for two reasons. One is the positive definiteness requirement on a covariance matrix. The other is the fact that the number of unknown parameters grows quadratically with the dimension. Although the sample covariance matrix is an unbiased estimator and performs well in the traditional situation where the sample size n is far larger than the number of variables p, it performs poorly in large-dimensional settings where n is close to p or n<p. In particular, the eigenvalues of the sample covariance matrix are over-dispersed and its eigenvectors are inconsistent in large dimensions [15]. Additionally, classification by LDA breaks down and reduces to random guessing when p grows much faster than n [1]. Therefore, alternative estimators of large covariance matrices are called for.
The framework of covariance matrix estimation with large p can be loosely classified into two situations. One situation is that there exists a natural ordering among the variables. This is commonly the case in time series, longitudinal data, spatial data, etc. The estimation is then routinely based on the assumption that the variables are ordered and that those far apart in the ordering are only weakly correlated [10,14,26,30,33]. The other situation is that the variables do not have a natural ordering among themselves. This often happens with economic data, genetic data, industrial data, etc. In this case, one strategy is to first establish an ordering by some data-driven scheme, and then estimate the covariance matrix based on it [7,24,28]. However, the established ordering may make little sense for practical data, for example, stock market data or gene expression data. The other prevailing strategy, instead of establishing an ordering, aims at estimating the covariance matrix directly. Ledoit and Wolf [18] introduced a linear combination of the sample covariance matrix and a diagonal matrix. Bickel and Levina [2] imposed the hard thresholding technique on the small entries of the sample covariance matrix to form their estimator. A banded estimator for high-dimensional covariance matrices was also proposed by Bickel and Levina [3]. However, such methods, which are based on thresholding or banding, cannot guarantee the positive definiteness of the estimated covariance matrix. Later, Bien and Tibshirani [4] developed an estimator by adding a penalty to the log-likelihood function. The computation, however, is rather intensive due to the non-convexity of the objective function. More recent and related work on covariance matrix estimation can be found in [6,8,21,29,31], among others.
The modified Cholesky decomposition (MCD) [7,22,34] of a covariance matrix is another important technique, which provides an unconstrained and statistically interpretable parameterization of a covariance matrix. Zheng et al. [34] used the MCD to obtain a set of covariance matrix estimates under multiple orderings of the variables and introduced an averaging model to estimate the covariance matrix. In this paper, we adopt the idea of Zheng et al. [34] to propose an ensemble estimate of a large covariance matrix for the case where an ordering of the variables does not exist. The proposed method obtains multiple covariance matrix estimates by considering different orderings of the variables in the MCD. The proposed estimator is a linear combination of the average of these available estimates and the identity matrix, obtained by employing the loss function of Ledoit and Wolf [18]. The estimator of Ledoit and Wolf [18] was developed from the sample covariance matrix S, while our proposed estimator is built on the average of the available estimates, which is more stable than S in high dimensions. Hence, the proposed estimator is expected to perform better than that of Ledoit and Wolf [18]. This can be seen in the simulation study.
The rest of this paper is organized as follows. Section 2 briefly reviews the MCD approach to estimate the covariance matrix, and introduces the proposed estimator in detail. The numerical simulation results and a real data example are reported in Sections 3 and 4. We conclude our paper in Section 5.
2. Estimation method
2.1. The modified Cholesky decomposition
Let X = (X_1, …, X_p)ᵀ be a p-dimensional vector of random variables with an unknown covariance matrix Σ. Without loss of generality, we assume that E(X) = 0. The key idea of the modified Cholesky decomposition (MCD) is that Σ can be diagonalized by a lower triangular matrix constructed from the regression coefficients obtained when each X_j is regressed on its predecessors X_1, …, X_{j−1}. Specifically, for j = 2, …, p, define

X_j = a_{j1} X_1 + ⋯ + a_{j,j−1} X_{j−1} + ε_j = a_jᵀ X^{(j−1)} + ε_j,   (1)

where X^{(j−1)} = (X_1, …, X_{j−1})ᵀ, and a_j = (a_{j1}, …, a_{j,j−1})ᵀ represents the vector of regression coefficients. The error terms ε_j are independent of each other with E(ε_j) = 0 and var(ε_j) = d_j², where ε_1 is defined as ε_1 = X_1. Accordingly, a lower triangular matrix T is constructed with ones on its diagonal and (−a_jᵀ, 1, 0, …, 0) as its jth row. Then the MCD of the covariance matrix Σ is

T Σ Tᵀ = D,  or equivalently  Σ = T⁻¹ D (T⁻¹)ᵀ,

where D = diag(d_1², …, d_p²) is a diagonal matrix.
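For concreteness, the following minimal Python sketch constructs T and D from a centered data matrix by running the sequential least-squares regressions in (1). The function and variable names are ours and the ordinary-least-squares construction assumes n > p; it is an illustration of the decomposition, not code from the paper.

```python
import numpy as np

def mcd_factors(X):
    """Modified Cholesky decomposition via sequential regressions.

    X : (n, p) data matrix with columns already centered.
    Returns the unit lower triangular T and the vector d2 of residual
    variances, so that inv(T) @ diag(d2) @ inv(T).T is the MCD-based estimate.
    """
    n, p = X.shape
    T = np.eye(p)
    d2 = np.empty(p)
    d2[0] = np.mean(X[:, 0] ** 2)         # var(X_1) plays the role of d_1^2
    for j in range(1, p):
        Z = X[:, :j]                       # predecessors X_1, ..., X_{j-1}
        a, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        T[j, :j] = -a                      # jth row of T holds -a_j
        resid = X[:, j] - Z @ a
        d2[j] = np.mean(resid ** 2)        # residual variance d_j^2
    return T, d2

def mcd_covariance(X):
    """Covariance estimate implied by the MCD: Sigma = inv(T) D inv(T)'. """
    T, d2 = mcd_factors(X)
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d2) @ Tinv.T

# toy usage
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Sigma_hat = mcd_covariance(X)
```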
2.2. The proposed estimator
Note that the Cholesky-based covariance matrix estimate depends on the ordering of the variables X_1, …, X_p. This indicates that different orderings of the variables will generally lead to different estimates of the covariance matrix. Hence, when the variables are not accompanied by a natural ordering, multiple estimates of Σ can be obtained via the MCD by considering a set of random permutations of the ordering. Define a permutation mapping π of {1, …, p}, which gives a rearrangement (X_{π(1)}, …, X_{π(p)}) of the ordering of the variables. The corresponding permutation matrix P_π is formed with the entries in its ith column all 0 except for a 1 at position π(i). Suppose the data are independently and identically distributed from a multivariate normal distribution N_p(0, Σ), and let X = (x_1, …, x_p) be the n × p data matrix. Then the permuted data matrix is

X_π = X P_π = (x_{π(1)}, …, x_{π(p)}),

where x_i is the ith column of X. Note that the normality assumption is not necessary for estimating a covariance matrix using the MCD technique. However, much of the literature studying Cholesky-based covariance matrix estimation assumes that the data are sampled from a normal distribution, for the sake of using the normal likelihood as the loss function or of establishing theoretical properties of the estimators [7,14,19,20,22,23,32]. Here we keep the normality assumption to derive the proposed estimator in Theorem 2.3. Some discussion of Cholesky-based estimation for non-normal data is provided in Section 5.
In the situation where the sample size n is close to p or n<p, a regularization technique such as the Lasso [27] can be imposed on the regressions (1) to construct the Cholesky factor matrix T. Hence, given a permutation π, we have

â_j^π = argmin_{a_j} { ‖x_j^π − Z_{j−1}^π a_j‖² + λ_j ‖a_j‖_1 },  j = 2, …, p,   (2)

where x_j^π is the jth column of X_π, Z_{j−1}^π represents the first (j−1) columns of X_π, λ_j is the tuning parameter, ‖·‖ stands for the ℓ2 vector norm, and ‖·‖_1 for the ℓ1 norm. The optimization problem (2) can be solved by the coordinate descent algorithm [11]. The tuning parameters λ_j are determined by a 10-fold cross-validation scheme for each Lasso regression, where the mean squared error of the response variable is used as the loss measure. As a result, the corresponding estimates T̂_π and D̂_π can be constructed for the permutation π. Now we randomly generate M different permutations π_1, …, π_M, and build the estimates T̂_{π_i} and D̂_{π_i}, i = 1, …, M. Consequently, Σ̂_{π_i} = T̂_{π_i}⁻¹ D̂_{π_i} (T̂_{π_i}⁻¹)ᵀ is an estimate of the covariance matrix for the permutation π_i.
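A sketch of this permutation-and-Lasso step is given below, using scikit-learn's LassoCV as the 10-fold cross-validated coordinate-descent solver. The helper names (mcd_lasso, ensemble_average) and the explicit mapping of each estimate back to the original variable ordering are our own illustration under the setup above, not code from the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def mcd_lasso(Xp):
    """Cholesky factor and diagonal for one permuted data matrix Xp (n, p),
    with each regression in (2) fitted by 10-fold cross-validated Lasso."""
    n, p = Xp.shape
    T = np.eye(p)
    d2 = np.empty(p)
    d2[0] = np.mean(Xp[:, 0] ** 2)
    for j in range(1, p):
        Z = Xp[:, :j]
        fit = LassoCV(cv=10, fit_intercept=False).fit(Z, Xp[:, j])
        T[j, :j] = -fit.coef_
        d2[j] = np.mean((Xp[:, j] - Z @ fit.coef_) ** 2)
    return T, d2

def ensemble_average(X, M=30, seed=0):
    """Average of the MCD estimates over M random orderings (Sigma_bar)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma_bar = np.zeros((p, p))
    for _ in range(M):
        perm = rng.permutation(p)
        T, d2 = mcd_lasso(X[:, perm])
        Tinv = np.linalg.inv(T)
        S_perm = Tinv @ np.diag(d2) @ Tinv.T
        # map the estimate back to the original ordering of the variables
        inv = np.argsort(perm)
        Sigma_bar += S_perm[np.ix_(inv, inv)]
    return Sigma_bar / M
```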
To make it explicit that the variables and the Cholesky-based estimates depend on the sample size n, we use the subscript n to index a sequence of statistical models hereafter. For example, X_n is a matrix of n i.i.d. observations on p_n variables with mean zero and covariance matrix Σ_n. The notations Σ̂_{n,π_i} and T̂_{n,π_i}, i = 1, …, M, represent the MCD covariance matrix estimates and the Cholesky factor estimates, respectively, for permutation π_i associated with sample size n. The number of variables p_n can vary and even go to infinity with the sample size n. Now a natural question arises as to how to efficiently use the available Σ̂_{n,π_1}, …, Σ̂_{n,π_M} to form an ensemble covariance matrix estimate. Borrowing the idea from Ledoit and Wolf [18], we propose to consider
min_{ρ_1, ρ_2} E‖Σ_n* − Σ_n‖²  subject to  Σ_n* = ρ_1 I + ρ_2 Σ̄_n,   (3)

where Σ̄_n = (Σ̂_{n,π_1} + ⋯ + Σ̂_{n,π_M})/M is the average estimate proposed by Zheng et al. [34]. The coefficients ρ_1 and ρ_2 are nonrandom constants. The norm is defined as ‖A‖² = tr(AAᵀ)/p_n. Compared with the estimate of Ledoit and Wolf [18], we use Σ̄_n instead of the sample covariance matrix S_n in (3). Since Σ̄_n is more stable than S_n in large dimensions, the proposed estimate could be more accurate. On the other hand, the percentage relative improvement of the proposed estimate over the average estimate Σ̄_n is equal to β_n²/δ_n², as discussed in [18], where β_n² and δ_n² are defined in the next paragraph. Hence, our proposed method is expected to perform better than the two aforementioned approaches, which can be seen in the numerical study in Section 3.
To solve the optimization problem (3), we need to introduce some notation and assumptions. Denote the inner product of two matrices A and B as ⟨A, B⟩ = tr(ABᵀ)/p_n, so that ‖A‖² = ⟨A, A⟩. Define μ_n = ⟨Σ_n, I⟩, α_n² = ‖Σ_n − μ_n I‖², β_n² = E‖Σ̄_n − Σ_n‖², and δ_n² = E‖Σ̄_n − μ_n I‖². Let Σ_n denote the true covariance matrix of the observations x_1, …, x_n, and use the set J_{π_i} to represent the collection of nonzero entries in the lower triangular part of the Cholesky factor matrix T_{n,π_i} corresponding to the variable ordering π_i. Denote by s_n the maximum of the cardinalities of J_{π_1}, …, J_{π_M}. Define the singular values of a matrix A as σ_1(A) ≥ ⋯ ≥ σ_{p_n}(A) in decreasing order. We assume the following conditions:
(C1) there exists a constant τ > 0 such that the singular values of Σ_n are bounded as 1/τ ≤ σ_{p_n}(Σ_n) ≤ σ_1(Σ_n) ≤ τ;

(C2) the tuning parameters λ_j in (2) satisfy ;

(C3) .
Condition (C1) is also made in [12,17]; it guarantees that Σ_n is positive definite. We now present the following lemmas and theorem.
Lemma 2.1
Suppose x_1, …, x_n are independently and identically distributed observations from N_{p_n}(0, Σ_n). Under (C1)–(C2), we have
Proof of Lemma 2.1 —
The proof is similar to those of Theorem 1 in [25] and Theorem 2 in [16], and is thus omitted here.
Lemma 2.2
Assume all the assumptions in Lemma 2.1 hold. Under (C3), we have .
Proof of Lemma 2.2 —
We first derive the consistency of as follows:
where the fourth equality is provided by Lemma 2.1. Then we have
, which results from the consistency of , and establishes the lemma.
Theorem 2.3
Assume all the assumptions in Lemma 2.2 hold. Then the solution to (3) is

Σ_n* = (β_n²/δ_n²) μ_n I + (α_n²/δ_n²) Σ̄_n.
Proof of Theorem 2.3 —
The optimization problem (3) can be re-written as
Therefore, the objective can be decomposed into three parts as follows:
Under (C3), we have as n → ∞. Let and be the minimizers of and , respectively. According to Lemma 2 of Hjort and Pollard [13], we have . Now let us consider the minimization problem of
(4) It is easy to see that the optimal value of υ does not depend on ρ. Hence, the closed form for υ is obtained as
(5) Substituting (5) into (4) leads to the minimization problem of . The first derivative is . Hence, the optimal value of ρ is β_n²/δ_n². This completes the proof.
Computing the covariance estimate Σ_n* seems challenging since the four terms μ_n, α_n², β_n² and δ_n² all depend on the true covariance matrix Σ_n, which is unobservable. Ledoit and Wolf [18] showed that there exist asymptotically consistent estimators for μ_n, α_n², β_n² and δ_n², and hence for Σ_n*. Define

m_n = ⟨Σ̄_n, I⟩  and  d_n² = ‖Σ̄_n − m_n I‖².
Then m_n and d_n² are estimators for μ_n and δ_n², respectively. Note that Σ̄_n is the average of the matrices Σ̂_{n,π_1}, …, Σ̂_{n,π_M}; hence, we can estimate β_n² by seeing how far each Σ̂_{n,π_i} deviates from their average Σ̄_n. Define

b̄_n² = (1/M²) ∑_{i=1}^{M} ‖Σ̂_{n,π_i} − Σ̄_n‖²,  b_n² = min(b̄_n², d_n²),  a_n² = d_n² − b_n²,

where the truncation by d_n² is introduced to ensure that a_n² is nonnegative. Then b_n² and a_n² are estimators for β_n² and α_n², respectively. Now we replace the unobservable terms in the formula of Σ_n* in Theorem 2.3 with their estimators, yielding our proposed estimate of the covariance matrix

Σ̂_n* = (b_n²/d_n²) m_n I + (a_n²/d_n²) Σ̄_n.
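The plug-in construction can be summarized in a short sketch. The normalized norm tr(AAᵀ)/p and the 1/M² scaling of the spread term reflect our reading of the Ledoit–Wolf analogy and should be treated as assumptions rather than as the paper's exact formulas; the function names are ours.

```python
import numpy as np

def norm2(A, p):
    """Squared normalized Frobenius norm ||A||^2 = tr(A A')/p, as in [18]."""
    return np.trace(A @ A.T) / p

def ensemble_shrinkage(Sigma_list):
    """Combine M Cholesky-based estimates into rho1 * I + rho2 * Sigma_bar."""
    M = len(Sigma_list)
    p = Sigma_list[0].shape[0]
    I = np.eye(p)
    Sigma_bar = sum(Sigma_list) / M

    m = np.trace(Sigma_bar) / p              # estimates mu
    d2 = norm2(Sigma_bar - m * I, p)         # estimates delta^2
    # spread of the individual estimates around their average (estimates beta^2);
    # the 1/M^2 factor mirrors the Ledoit-Wolf construction and is an assumption
    b2_bar = sum(norm2(S - Sigma_bar, p) for S in Sigma_list) / M**2
    b2 = min(b2_bar, d2)                     # truncation keeps a2 >= 0
    a2 = d2 - b2                             # estimates alpha^2

    return (b2 / d2) * m * I + (a2 / d2) * Sigma_bar
```

Combined with ensemble_average above, the whole procedure is: build the M permuted Cholesky-based estimates, average them, and then shrink the average toward a scaled identity with the data-driven weights b2/d2 and a2/d2.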
3. Simulation study
This section examines the finite-sample performance of the proposed method (Prop) in comparison with several other approaches. The competing methods include the sample covariance matrix S as a benchmark, Ledoit and Wolf's estimate (LW) [18], Bien and Tibshirani's estimate (BT) [4], the minimax estimate (MX) [9], as well as the average estimate (AVE). The BT estimate minimizes the negative log-likelihood function with an ℓ1 penalty,

minimize_{Σ ≻ 0}  log det(Σ) + tr(S Σ⁻¹) + λ ‖Σ‖_1,
where λ is the tuning parameter selected by cross-validation, and Σ ≻ 0 is the positive definiteness constraint imposed on the objective. This optimization problem is solved by the majorization–minimization algorithm. The MX estimate preserves the sample eigenvectors but replaces the sample eigenvalues l_i by n l_i/(n + p + 1 − 2i), where l_1 ≥ ⋯ ≥ l_p stand for the sample eigenvalues in decreasing order.
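For reference, a sketch of the MX eigenvalue adjustment is given below, assuming the standard Dey–Srinivasan form n·l_i/(n + p + 1 − 2i); the exact form used in the paper is not reproduced above, so this should be read as an assumption. Note that the denominator can become non-positive when n < p, which is consistent with the estimate then failing to be positive definite.

```python
import numpy as np

def minimax_estimate(S, n):
    """Keep the sample eigenvectors of S, rescale the i-th largest sample
    eigenvalue l_i to n * l_i / (n + p + 1 - 2*i) (assumed Dey-Srinivasan form)."""
    p = S.shape[0]
    vals, vecs = np.linalg.eigh(S)       # ascending order
    vals = vals[::-1]                    # descending: l_1 >= ... >= l_p
    vecs = vecs[:, ::-1]
    i = np.arange(1, p + 1)
    new_vals = n * vals / (n + p + 1 - 2 * i)   # may be <= 0 when n < p
    return (vecs * new_vals) @ vecs.T
```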
We consider four covariance matrix structures as follows:
Σ₁⁻¹ = diag(1, 2, …, p). The inverse covariance matrix is a diagonal matrix with diagonal elements from 1 to p.
Σ₂ = AR(0.5). The covariance between any two variables X_i and X_j is fixed to be 0.5^{|i−j|}, 1 ≤ i, j ≤ p.
Σ₃ is the matrix design of [5,31], as described below.
Σ₄ is a block matrix with CS(0.5) as its upper left block, where CS(0.5) is a compound symmetry matrix with diagonal elements 1 and off-diagonal elements 0.5, and 0 represents the zero matrix.
Σ₁ is a diagonal matrix. Σ₂ is an autoregressive structure with homogeneous variances and correlations that decline with the distance between indices. Σ₃ is similar to the matrix design used in [5,31]. Σ₄ has a block compound symmetry structure in the upper left corner. For each model, data are independently generated from N_p(0, Σ) with sample size n=50 and three settings of the variable size, p = 30, 50 and 100. To measure the performance of each covariance matrix estimate Σ̂, we consider the squared Frobenius norm (F), the mean absolute error (MAE) and the quadratic loss (QL), defined as follows (up to some scale):
We also use the entropy loss (EN) and the Kullback–Leibler loss (KL) given by
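Since the exact scalings of these criteria are not reproduced above, the following sketch uses common forms of the five loss measures (relative squared Frobenius norm, entry-wise MAE, quadratic loss, entropy/Stein loss and a Kullback–Leibler-type loss); the constants and normalizations are assumptions, not the paper's definitions.

```python
import numpy as np

def loss_measures(Sigma_hat, Sigma):
    """Common forms of the five loss measures used for comparison."""
    p = Sigma.shape[0]
    diff = Sigma_hat - Sigma
    F = np.sum(diff**2) / np.sum(Sigma**2)               # relative squared Frobenius norm
    MAE = np.mean(np.abs(diff))                          # mean absolute error of the entries
    Sinv = np.linalg.inv(Sigma)
    QL = np.sum((Sigma_hat @ Sinv - np.eye(p))**2) / p   # quadratic loss (scaled by p, assumed)
    A = Sinv @ Sigma_hat
    _, logdetA = np.linalg.slogdet(A)
    EN = np.trace(A) - logdetA - p                       # entropy (Stein) loss
    B = np.linalg.inv(Sigma_hat) @ Sigma
    _, logdetB = np.linalg.slogdet(B)
    KL = np.trace(B) - logdetB - p                       # Kullback-Leibler-type loss
    return dict(F=F, MAE=MAE, QL=QL, EN=EN, KL=KL)
```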
Throughout this section, the number of permutations M is set to 30 for the proposed method, as suggested by Zheng et al. [34]. The loss measures averaged over 100 replicates, together with their standard errors in parentheses, are summarized in Tables 1–4 for each method. The best method, that is, the one with the lowest average for each loss measure, is shown in bold for each setting of the variable size. Dashes in the tables indicate that the corresponding values are not available due to matrix singularity.
Table 2. The averages and standard errors (in parentheses) of the estimates for Σ₂.

| p | Method | F | MAE | QL | EN | KL |
|---|---|---|---|---|---|---|
| 30 | S | 0.804 (0.005) | 3.485 (0.022) | 0.633 (0.006) | 12.35 (0.089) | 37.83 (0.075) |
| | BT | 0.855 (0.001) | 2.187 (0.003) | 0.197 (0.001) | 5.332 (0.032) | 11.45 (0.096) |
| | MX | 0.617 (0.003) | 2.589 (0.012) | 0.313 (0.003) | 6.946 (0.067) | 15.55 (0.316) |
| | AVE | 0.576 (0.005) | 1.917 (0.021) | 0.216 (0.004) | 3.307 (0.050) | 4.210 (0.102) |
| | LW | 0.593 (0.003) | 2.445 (0.016) | 0.475 (0.006) | 5.147 (0.039) | 4.860 (0.043) |
| | Prop | 0.519 (0.004) | 1.719 (0.013) | 0.195 (0.003) | 2.971 (0.041) | 3.771 (0.073) |
| 50 | S | 1.028 (0.005) | 5.769 (0.029) | 1.040 (0.006) | – | – |
| | BT | 0.855 (0.005) | 2.253 (0.011) | 0.208 (0.002) | 9.987 (0.177) | 25.003 (1.103) |
| | MX | 0.738 (0.002) | 3.735 (0.010) | 0.407 (0.001) | – | – |
| | AVE | 0.604 (0.004) | 2.229 (0.014) | 0.235 (0.004) | 5.977 (0.079) | 7.802 (0.13) |
| | LW | 0.668 (0.002) | 3.425 (0.021) | 0.632 (0.006) | 10.808 (0.051) | 10.231 (0.067) |
| | Prop | 0.563 (0.003) | 2.066 (0.010) | 0.211 (0.002) | 5.473 (0.051) | 7.246 (0.095) |
| 100 | S | 1.443 (0.004) | 11.45 (0.036) | 2.068 (0.010) | – | – |
| | BT | 0.866 (0.008) | 2.279 (0.015) | 0.222 (0.002) | 21.447 (0.312) | 52.644 (1.083) |
| | MX | 0.936 (0.001) | 5.560 (0.009) | 0.637 (0.001) | – | – |
| | AVE | 0.737 (0.003) | 2.957 (0.015) | 0.410 (0.006) | 15.085 (0.142) | 20.716 (0.270) |
| | LW | 0.757 (0.002) | 5.092 (0.028) | 0.875 (0.006) | 27.601 (0.102) | 25.466 (0.078) |
| | Prop | 0.616 (0.002) | 2.480 (0.010) | 0.242 (0.002) | 12.560 (0.081) | 17.186 (0.153) |
Table 3. The averages and standard errors (in parentheses) of the estimates for Σ₃.

| p | Method | F | MAE | QL | EN | KL |
|---|---|---|---|---|---|---|
| 30 | S | 1.043 (0.050) | 5.642 (0.277) | 0.627 (0.008) | 12.50 (0.091) | 40.03 (0.751) |
| | BT | 1.224 (0.095) | 6.219 (0.530) | 1.010 (0.013) | 8.732 (0.079) | 7.558 (0.187) |
| | MX | 1.671 (0.056) | 8.839 (0.319) | 0.363 (0.004) | 7.870 (0.066) | 18.53 (0.326) |
| | AVE | 1.056 (0.066) | 4.922 (0.356) | 0.207 (0.006) | 3.124 (0.067) | 4.233 (0.122) |
| | LW | 0.987 (0.053) | 4.830 (0.290) | 17.62 (1.038) | 60.29 (2.138) | 18.50 (0.365) |
| | Prop | 0.945 (0.048) | 4.681 (0.268) | 0.180 (0.003) | 2.751 (0.037) | 3.617 (0.069) |
| 50 | S | 1.286 (0.053) | 8.292 (0.357) | 1.049 (0.008) | – | – |
| | BT | 3.823 (0.093) | 25.70 (0.646) | 2.070 (0.164) | 22.25 (0.429) | 28.76 (1.180) |
| | MX | 2.501 (0.049) | 16.48 (0.359) | 0.469 (0.002) | – | – |
| | AVE | 1.215 (0.077) | 7.562 (0.532) | 0.224 (0.004) | 5.415 (0.074) | 6.551 (0.140) |
| | LW | 1.256 (0.053) | 7.725 (0.366) | 18.62 (0.914) | 10.53 (3.258) | 32.01 (0.518) |
| | Prop | 1.196 (0.050) | 7.397 (0.341) | 0.194 (0.002) | 4.760 (0.046) | 6.363 (0.099) |
| 100 | S | 1.751 (0.055) | 14.63 (0.488) | 2.045 (0.010) | – | – |
| | BT | 5.190 (0.045) | 40.21 (0.354) | 108.8 (8.656) | 568.9 (7.719) | 277.0 (6.078) |
| | MX | 3.562 (0.029) | 28.20 (0.252) | 0.695 (0.001) | – | – |
| | AVE | 1.665 (0.068) | 13.87 (0.594) | 0.764 (0.006) | 26.88 (0.132) | 29.40 (0.310) |
| | LW | 1.652 (0.050) | 13.67 (0.452) | 59.51 (2.068) | 396.7 (8.192) | 87.14 (0.791) |
| | Prop | 1.610 (0.044) | 13.10 (0.383) | 0.688 (0.004) | 24.24 (0.081) | 25.93 (0.170) |
Table 1. The averages and standard errors (in parentheses) of the estimates for Σ₁.

| p | Method | F | MAE | QL | EN | KL |
|---|---|---|---|---|---|---|
| 30 | S | 0.107 (0.001) | 0.353 (0.002) | 0.630 (0.007) | 12.42 (0.082) | 38.76 (0.652) |
| | BT | 0.132 (0.001) | 0.057 (0.001) | 0.082 (0.001) | 1.783 (0.021) | 2.775 (0.041) |
| | MX | 0.103 (0.001) | 0.250 (0.001) | 0.316 (0.003) | 7.037 (0.059) | 15.96 (0.275) |
| | AVE | 0.049 (0.002) | 0.059 (0.001) | 0.071 (0.002) | 1.062 (0.035) | 1.281 (0.055) |
| | LW | 0.048 (0.002) | 0.074 (0.004) | 0.064 (0.003) | 0.945 (0.040) | 1.013 (0.043) |
| | Prop | 0.045 (0.001) | 0.055 (0.001) | 0.064 (0.001) | 1.053 (0.024) | 1.265 (0.035) |
| 50 | S | 0.094 (0.001) | 0.374 (0.002) | 1.045 (0.007) | – | – |
| | BT | 0.112 (0.001) | 0.041 (0.001) | 0.078 (0.001) | 2.702 (0.031) | 4.192 (0.063) |
| | MX | 0.097 (0.001) | 0.227 (0.001) | 0.415 (0.002) | – | – |
| | AVE | 0.041 (0.002) | 0.045 (0.001) | 0.073 (0.002) | 1.870 (0.056) | 2.308 (0.090) |
| | LW | 0.039 (0.001) | 0.072 (0.004) | 0.080 (0.004) | 1.899 (0.095) | 2.004 (0.102) |
| | Prop | 0.036 (0.001) | 0.040 (0.001) | 0.066 (0.001) | 1.857 (0.036) | 2.300 (0.058) |
| 100 | S | 0.076 (0.001) | 0.394 (0.001) | 2.061 (0.009) | – | – |
| | BT | 0.082 (0.001) | 0.025 (0.001) | 0.162 (0.001) | 7.366 (0.048) | 8.134 (0.066) |
| | MX | 0.088 (0.001) | 0.178 (0.001) | 0.646 (0.001) | – | – |
| | AVE | 0.029 (0.001) | 0.029 (0.001) | 0.078 (0.001) | 4.212 (0.073) | 5.504 (0.117) |
| | LW | 0.027 (0.001) | 0.067 (0.003) | 0.104 (0.006) | 4.666 (0.225) | 4.786 (0.223) |
| | Prop | 0.025 (0.001) | 0.026 (0.001) | 0.069 (0.001) | 4.109 (0.058) | 5.354 (0.098) |
Table 4. The averages and standard errors (in parentheses) of the estimates for Σ₄.

| p | Method | F | MAE | QL | EN | KL |
|---|---|---|---|---|---|---|
| 30 | S | 0.823 (0.013) | 3.611 (0.060) | 0.626 (0.006) | 12.46 (0.086) | 39.78 (0.745) |
| | BT | 1.795 (0.001) | 6.619 (0.004) | 0.162 (0.001) | 4.888 (0.044) | 16.11 (0.121) |
| | MX | 0.847 (0.017) | 3.736 (0.076) | 0.268 (0.002) | 6.372 (0.065) | 14.79 (0.307) |
| | AVE | 0.868 (0.034) | 3.369 (0.135) | 0.188 (0.003) | 3.053 (0.036) | 4.311 (0.083) |
| | LW | 0.774 (0.013) | 3.431 (0.061) | 0.433 (0.006) | 5.763 (0.067) | 7.501 (0.132) |
| | Prop | 0.835 (0.023) | 3.228 (0.090) | 0.175 (0.002) | 2.863 (0.027) | 3.949 (0.056) |
| 50 | S | 1.045 (0.009) | 5.864 (0.050) | 1.041 (0.007) | – | – |
| | BT | 1.415 (0.002) | 4.166 (0.007) | 0.184 (0.002) | 8.753 (0.134) | 25.83 (0.724) |
| | MX | 0.923 (0.011) | 4.595 (0.037) | 0.366 (0.001) | – | – |
| | AVE | 0.880 (0.027) | 3.037 (0.080) | 0.144 (0.002) | 4.096 (0.059) | 5.910 (0.134) |
| | LW | 0.866 (0.009) | 4.724 (0.035) | 0.545 (0.011) | 10.94 (0.205) | 13.58 (0.355) |
| | Prop | 0.828 (0.017) | 2.858 (0.049) | 0.135 (0.001) | 3.816 (0.037) | 5.447 (0.084) |
| 100 | S | 1.441 (0.005) | 11.44 (0.039) | 2.063 (0.009) | – | – |
| | BT | 0.891 (0.016) | 2.003 (0.036) | 0.252 (0.004) | 21.52 (0.522) | 49.61 (2.412) |
| | MX | 1.016 (0.004) | 5.837 (0.011) | 0.615 (0.001) | – | – |
| | AVE | 0.757 (0.013) | 2.220 (0.026) | 0.108 (0.002) | 6.367 (0.101) | 9.309 (0.204) |
| | LW | 0.858 (0.003) | 5.713 (0.057) | 0.462 (0.012) | 16.57 (0.384) | 16.99 (0.394) |
| | Prop | 0.717 (0.009) | 2.094 (0.019) | 0.101 (0.001) | 5.903 (0.061) | 8.518 (0.123) |
Overall, Tables 1–4 show that the proposed estimate outperforms the other methods with respect to these loss measures. In the few settings where the proposed method is not the best, it is the second best, for example, the settings of p=100 under MAE and QL for Σ₂. The BT estimate is superior to some of the other approaches in a couple of settings regarding the MAE and QL measures. The MX method is inferior to the other approaches except S under these loss measures, possibly because it cannot guarantee the positive definiteness of the estimated covariance matrix. Notably, the LW estimate performs very well with respect to KL for Σ₁, a diagonal covariance matrix. However, it is not as good as the proposed method for the more general matrix structures Σ₂, Σ₃ and Σ₄. In addition, the AVE method generally shows relatively good performance compared with the S, BT, MX and LW methods, but the proposed method still improves on the AVE with respect to all loss measures. Furthermore, the proposed estimate continues to perform well as the number of variables p increases from 30 to 100. Thus, the proposed method is able to provide an accurate estimate of a large covariance matrix in these settings.
Notice that the MCD technique does not require the normality assumption on the data to form a covariance matrix estimate. It is therefore interesting to investigate the empirical performance of the proposed method when the data are not from a normal distribution. We now use Σ = AR(0.5) as the true covariance matrix, and consider the following two data-generating processes with sample size n=50 and variable sizes p = 30, 50 and 100:
(I) Independently generate data from a multivariate t distribution with mean 0 and Σ as its scale matrix.
(II) Independently generate data from a mixed distribution, where b is a random number from a Binomial distribution, and the second component is a distribution with mean 0 and the identity matrix as its scale matrix.
Table 5 summarizes the averages and corresponding standard errors for each method based on 100 replicates. The proposed method produces superior results for case I, and performs comparably with MX and LW for case II. It is worth pointing out that the proposed method becomes better for case II as the variable size increases from p=30 to p=100. Since the BT method relies heavily on normal data and uses the normal likelihood as its objective, it gives relatively high losses in Table 5, especially for the EN and KL measures. We also observe that the BT method has convergence issues for non-normal data, for example when p=100 in case II. Although the MX estimate seems to perform well for case II in low dimensions, it is not positive definite when n<p. The LW method shows good performance with respect to the loss functions EN and KL for case II. Overall, when the data are not normal, the proposed method appears to be superior or at least comparable to the other approaches. Examining Lemmas 2.1, 2.2 and Theorem 2.3, we see that the convergence rate of Σ̂_{n,π_i} would be different when the data are not normal. This rate may depend on the data distribution, the true covariance structure, the moment conditions of the variables and so forth. The results of Theorem 2.3 would still hold as long as such a rate converges to 0 as the sample size goes to infinity.
Table 5. The averages and standard errors (in parentheses) of the estimates for non-normal data.

| Case | p | Method | F | MAE | QL | EN | KL |
|---|---|---|---|---|---|---|---|
| I | 30 | S | 1.117 (0.024) | 4.860 (0.110) | 1.355 (0.057) | 19.43 (0.344) | 77.03 (2.390) |
| | | BT | 0.990 (0.033) | 3.679 (0.206) | 1.267 (0.060) | 14.83 (0.684) | 44.86 (3.905) |
| | | MX | 0.952 (0.023) | 4.116 (0.100) | 1.064 (0.045) | 11.90 (0.255) | 23.23 (0.766) |
| | | AVE | 0.713 (0.015) | 2.600 (0.072) | 0.598 (0.021) | 5.994 (0.149) | 5.980 (0.220) |
| | | LW | 0.655 (0.007) | 2.634 (0.039) | 0.651 (0.015) | 6.242 (0.104) | 5.214 (0.072) |
| | | Prop | 0.645 (0.014) | 2.361 (0.065) | 0.542 (0.018) | 5.421 (0.133) | 5.362 (0.194) |
| | 50 | S | 1.443 (0.021) | 8.146 (0.120) | 2.294 (0.079) | – | – |
| | | BT | 0.868 (0.035) | 3.061 (0.273) | 1.199 (0.082) | 21.58 (1.744) | 129.7 (28.97) |
| | | MX | 1.202 (0.020) | 6.741 (0.110) | 1.728 (0.060) | – | – |
| | | AVE | 0.739 (0.010) | 3.230 (0.062) | 0.670 (0.015) | 10.74 (0.174) | 10.04 (0.207) |
| | | LW | 0.724 (0.007) | 3.577 (0.065) | 0.844 (0.014) | 12.59 (0.136) | 10.33 (0.088) |
| | | Prop | 0.670 (0.009) | 2.923 (0.055) | 0.607 (0.014) | 9.712 (0.155) | 8.993 (0.180) |
| | 100 | S | 2.118 (0.067) | 17.11 (0.602) | 5.201 (0.489) | – | – |
| | | BT | 1.001 (0.091) | 4.792 (0.880) | 2.267 (0.540) | 54.87 (7.447) | 171.2 (35.10) |
| | | MX | 1.884 (0.065) | 15.15 (0.580) | 4.396 (0.443) | – | – |
| | | AVE | 0.849 (0.046) | 4.847 (0.395) | 0.985 (0.177) | 24.06 (0.858) | 23.54 (2.334) |
| | | LW | 0.813 (0.010) | 5.314 (0.138) | 1.057 (0.021) | 29.64 (0.245) | 24.34 (0.141) |
| | | Prop | 0.769 (0.042) | 4.374 (0.356) | 0.889 (0.159) | 21.73 (0.764) | 20.99 (2.019) |
| II | 30 | S | 3.319 (0.048) | 17.49 (0.268) | 1.966 (0.056) | 29.83 (0.568) | 174.8 (5.806) |
| | | BT | 3.248 (0.057) | 17.08 (0.326) | 1.916 (0.057) | 28.27 (0.618) | 148.6 (5.208) |
| | | MX | 3.032 (0.050) | 15.91 (0.277) | 1.622 (0.053) | 19.10 (0.461) | 57.03 (2.087) |
| | | AVE | 3.326 (0.065) | 17.48 (0.363) | 1.806 (0.054) | 18.67 (0.562) | 43.60 (1.919) |
| | | LW | 3.313 (0.048) | 17.42 (0.262) | 1.621 (0.042) | 14.95 (0.384) | 35.15 (1.370) |
| | | Prop | 3.167 (0.062) | 16.65 (0.346) | 1.719 (0.051) | 17.76 (0.536) | 41.42 (1.826) |
| | 50 | S | 4.503 (0.073) | 31.04 (0.531) | 3.126 (0.076) | – | – |
| | | BT | 4.296 (0.123) | 29.49 (0.913) | 2.981 (0.108) | 68.58 (1.981) | 581.7 (25.29) |
| | | MX | 4.059 (0.074) | 27.90 (0.539) | 2.541 (0.071) | – | – |
| | | AVE | 4.514 (0.102) | 31.08 (0.735) | 2.900 (0.090) | 34.08 (1.094) | 81.40 (3.863) |
| | | LW | 4.460 (0.081) | 30.77 (0.575) | 2.694 (0.077) | 27.79 (0.764) | 66.44 (2.889) |
| | | Prop | 4.298 (0.097) | 29.59 (0.700) | 2.760 (0.086) | 32.42 (1.043) | 77.35 (3.679) |
| | 100 | S | 6.444 (0.091) | 63.32 (0.947) | 5.922 (0.145) | – | – |
| | | BT | – | – | – | – | – |
| | | MX | 6.143 (0.095) | 61.29 (0.980) | 5.147 (0.137) | – | – |
| | | AVE | 6.348 (0.149) | 62.31 (1.523) | 5.285 (0.174) | 68.50 (2.270) | 161.8 (8.540) |
| | | LW | 6.413 (0.097) | 63.34 (0.968) | 5.076 (0.135) | 56.57 (1.354) | 128.0 (5.059) |
| | | Prop | 6.043 (0.142) | 59.31 (1.452) | 5.029 (0.166) | 65.14 (2.163) | 153.7 (8.125) |
4. Application
In this section, we apply the proposed method to an application in portfolio optimization. The problem is formulated as
min_w wᵀ Σ̂ w  subject to  wᵀ 1 = 1,   (6)

where w is a vector of portfolio weights, 1 is the vector of ones, and Σ̂ is an estimate of the covariance matrix of the asset returns. We expect that an accurate estimate will result in a better portfolio strategy.
Now we evaluate the performance of the proposed method by analyzing real data on 97 stocks from the S&P 100. This data set contains n=156 weekly stock returns recorded from 4 January 2010 to 12 December 2012. We use the first k observations to obtain the covariance matrix estimates from the proposed method, the sample covariance matrix S, the BT method, the MX method, the AVE method and the LW method. Then the realized return at time k+1 is computed, where the optimal portfolio is calculated with Σ̂ in (6) replaced by each estimate. We report the performance measures of the average annual realized return (AVR), its standard deviation (SD) and the information ratio AVR/SD. Note that the objective function (6) minimizes the variance rather than maximizing the realized returns. Hence, the performance of each method should be primarily evaluated by whether it produces a small SD. Large values of AVR and AVR/SD are also desirable, as a secondary consideration in evaluating each method.
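The evaluation scheme can be sketched as follows, under the usual global minimum-variance reading of (6). The closed-form weights, the weekly annualization factor of 52, the reporting in percent and the placeholder starting size k0 are our assumptions for illustration, not values stated in the paper.

```python
import numpy as np

def gmv_weights(Sigma_hat):
    """Minimum-variance portfolio from (6): minimize w' Sigma w subject to
    w' 1 = 1, with closed form Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma_hat.shape[0])
    x = np.linalg.solve(Sigma_hat, ones)
    return x / (ones @ x)

def rolling_evaluation(R, estimator, k0=52):
    """R: (T, p) matrix of weekly returns; estimator: data matrix -> covariance
    estimate; k0: first training size (a placeholder, since the paper does not
    state its range of k)."""
    T = R.shape[0]
    realized = []
    for k in range(k0, T):
        Sigma_hat = estimator(R[:k])         # estimate from the first k weeks
        w = gmv_weights(Sigma_hat)
        realized.append(w @ R[k])            # realized portfolio return at time k+1
    realized = np.array(realized)
    avr = realized.mean() * 52 * 100         # annualized average return, in percent (assumed convention)
    sd = realized.std(ddof=1) * np.sqrt(52) * 100
    return avr, sd, avr / sd                 # AVR, SD, information ratio
```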
Table 6 displays the portfolio performance results for each approach. We can see that the proposed method and the AVE produce the smallest SDs, while the proposed method also has a relatively large AVR. Moreover, the sample covariance matrix S, the BT method and the proposed method give comparable AVR values, but the proposed method is more reliable with a much smaller SD, hence leading to the largest information ratio AVR/SD. In addition, the AVE and LW methods also show good performance, indicated by smaller SDs and larger AVR/SD than the S, BT and MX methods. Overall, the proposed method performs well in portfolio selection for this set of 97 stocks. It is able to give a smaller SD and a slightly larger AVR than the other approaches.
Table 6. The comparison of the portfolio performance measures for 97 stock returns.

| | S | BT | MX | AVE | LW | Prop |
|---|---|---|---|---|---|---|
| AVR | 12.37 | 12.04 | 9.323 | 10.60 | 9.746 | 12.60 |
| SD | 19.64 | 15.20 | 11.34 | 8.474 | 9.303 | 8.473 |
| AVR/SD | 0.630 | 0.792 | 0.822 | 1.251 | 1.048 | 1.488 |
5. Conclusion
In this paper, we have proposed a new method for estimating the covariance matrix based on the MCD framework under the Frobenius-norm loss. The MCD enables us to obtain a set of estimates of the covariance matrix by permuting the variable orderings, based on which the proposed estimator is constructed following the idea of Ledoit and Wolf [18]. The simulation study indicates that the proposed estimator performs well with respect to several commonly used loss functions. It is worth noting that the proposed method does not provide a sparse estimate. In view of this, a penalty function may be imposed on the off-diagonal entries of the covariance matrix. In addition, the MCD technique does not need the normality assumption on the data to provide a covariance matrix estimate. Many papers on Cholesky-based estimation of the covariance matrix assume normal data for reasons of theoretical proof. It is an interesting topic to study our proposed estimate when the normality assumption is relaxed. We may need different assumptions in place of (C2) and (C3) to derive the convergence rate of Σ̂_{n,π_i} in Lemma 2.1, such as moment conditions on the variables X_j or a proper tail condition on the distribution of the data. Based on the rate for Σ̂_{n,π_i}, it is straightforward to derive the convergence rate of Σ̄_n as in Lemma 2.2. If this newly derived rate converges to zero, Theorem 2.3 would still hold, and the proposed estimate would remain valid. This is left as a topic for future research.
Acknowledgements
The authors thank the Associate Editor and two referees for their insightful and helpful comments that have significantly improved the original manuscript.
Funding Statement
Xiaoning Kang's research was supported by the Science Education Foundation of Liaoning Province (LN2019Q21), and the National Natural Science Foundation of China (71871047). Chaoping Xie's research was supported by the National Natural Science Foundation of China (71903090). Mingqiu Wang's research was supported by the National Natural Science Foundation of China (11771250), and the Natural Science Foundation of Shandong Province (ZR2019MA002).
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Xiaoning Kang http://orcid.org/0000-0003-0394-6240
References
1. Bickel P.J. and Levina E., Some theory for Fisher's linear discriminant function, naive Bayes, and some alternatives when there are many more variables than observations, Bernoulli 10 (2004), pp. 989–1010. doi: 10.3150/bj/1106314847
2. Bickel P.J. and Levina E., Covariance regularization by thresholding, Ann. Statist. 36 (2008), pp. 2577–2604. doi: 10.1214/08-AOS600
3. Bickel P.J. and Levina E., Regularized estimation of large covariance matrices, Ann. Statist. 36 (2008), pp. 199–227. doi: 10.1214/009053607000000758
4. Bien J. and Tibshirani R.J., Sparse estimation of a covariance matrix, Biometrika 98 (2011), pp. 807–820. doi: 10.1093/biomet/asr054
5. Cai T. and Liu W., Adaptive thresholding for sparse covariance matrix estimation, J. Am. Stat. Assoc. 106 (2011), pp. 672–684. doi: 10.1198/jasa.2011.tm10560
6. Chen J., Zhang Y., Li W. and Tian B., A supplement on CLT for LSS under a large dimensional generalized spiked covariance model, Statist. Probab. Lett. 138 (2018), pp. 57–65. doi: 10.1016/j.spl.2018.02.061
7. Dellaportas P. and Pourahmadi M., Cholesky-GARCH models with applications to finance, Stat. Comput. 22 (2012), pp. 849–855. doi: 10.1007/s11222-011-9251-2
8. Deng X. and Tsui K.W., Penalized covariance matrix estimation using a matrix-logarithm transformation, J. Comput. Graph. Stat. 22 (2013), pp. 494–512. doi: 10.1080/10618600.2012.715556
9. Dey D.K. and Srinivasan C., Estimation of a covariance matrix under Stein's loss, Ann. Stat. 13 (1985), pp. 1581–1591. doi: 10.1214/aos/1176349756
10. Engle R.F., Ledoit O. and Wolf M., Large dynamic covariance matrices, J. Bus. Econ. Statist. 37 (2019), pp. 363–375. doi: 10.1080/07350015.2017.1345683
11. Friedman J., Hastie T. and Tibshirani R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010), pp. 1–22. doi: 10.18637/jss.v033.i01
12. Guo J., Levina E., Michailidis G. and Zhu J., Joint estimation of multiple graphical models, Biometrika 98 (2011), pp. 1–15. doi: 10.1093/biomet/asq060
13. Hjort N. and Pollard D., Asymptotics for minimisers of convex processes, Statistical Research Report, Department of Mathematics, University of Oslo, 1993.
14. Huang J.Z., Liu N., Pourahmadi M. and Liu L., Covariance matrix selection and estimation via penalised normal likelihood, Biometrika 93 (2006), pp. 85–98. doi: 10.1093/biomet/93.1.85
15. Johnstone I.M., On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist. 29 (2001), pp. 295–327. doi: 10.1214/aos/1009210544
16. Kang X. and Deng X., Ensemble estimation of large sparse covariance matrix based on modified Cholesky decomposition (2018). Available at https://arxiv.org/abs/1801.00380.
17. Lam C. and Fan J., Sparsistency and rates of convergence in large covariance matrix estimation, Ann. Statist. 37 (2009), pp. 4254–4278. doi: 10.1214/09-AOS720
18. Ledoit O. and Wolf M., A well-conditioned estimator for large-dimensional covariance matrices, J. Multivar. Anal. 88 (2004), pp. 365–411. doi: 10.1016/S0047-259X(03)00096-4
19. Leng C. and Li B., Forward adaptive banding for estimating large covariance matrices, Biometrika 98 (2011), pp. 821–830. doi: 10.1093/biomet/asr045
20. Levina E., Rothman A. and Zhu J., Sparse estimation of large covariance matrices via a nested Lasso penalty, Ann. Appl. Stat. 2 (2008), pp. 245–263. doi: 10.1214/07-AOAS139
21. Peng J., Wang P., Zhou N. and Zhu J., Partial correlation estimation by joint sparse regression models, J. Am. Stat. Assoc. 104 (2009), pp. 735–746. doi: 10.1198/jasa.2009.0126
22. Pourahmadi M., Joint mean–covariance models with applications to longitudinal data: Unconstrained parameterisation, Biometrika 86 (1999), pp. 677–690. doi: 10.1093/biomet/86.3.677
23. Pourahmadi M., Daniels M.J. and Park T., Simultaneous modelling of the Cholesky decomposition of several covariance matrices, J. Multivar. Anal. 98 (2007), pp. 568–587. doi: 10.1016/j.jmva.2005.11.002
24. Rajaratnam B. and Salzman J., Best permutation analysis, J. Multivar. Anal. 121 (2013), pp. 193–223. doi: 10.1016/j.jmva.2013.03.001
25. Rothman A., Bickel P., Levina E. and Zhu J., Sparse permutation invariant covariance estimation, Electron. J. Stat. 2 (2008), pp. 494–515. doi: 10.1214/08-EJS176
26. Rothman A.J., Levina E. and Zhu J., A new approach to Cholesky-based covariance regularization in high dimensions, Biometrika 97 (2010), pp. 539–550. doi: 10.1093/biomet/asq022
27. Tibshirani R., Regression shrinkage and selection via the Lasso, J. R. Statist. Soc. Ser. B 58 (1996), pp. 267–288.
28. Wagaman A. and Levina E., Discovering sparse covariance structures with the Isomap, J. Comput. Graph. Stat. 18 (2009), pp. 551–572. doi: 10.1198/jcgs.2009.08021
29. Won J.H., Lim J., Kim S.J. and Rajaratnam B., Condition-number-regularized covariance estimation, J. R. Statist. Soc. Ser. B 75 (2013), pp. 427–450. doi: 10.1111/j.1467-9868.2012.01049.x
30. Wu W.B. and Pourahmadi M., Nonparametric estimation of large covariance matrices of longitudinal data, Biometrika 90 (2003), pp. 831–844. doi: 10.1093/biomet/90.4.831
31. Xue L., Ma S. and Zou H., Positive-definite ℓ1-penalized estimation of large covariance matrices, J. Am. Stat. Assoc. 107 (2012), pp. 1480–1491. doi: 10.1080/01621459.2012.725386
32. Zhang W. and Leng C., A moving average Cholesky factor model in covariance modelling for longitudinal data, Biometrika 99 (2011), pp. 141–150. doi: 10.1093/biomet/asr068
33. Zhang W., Leng C. and Tang C.Y., A joint modelling approach for longitudinal studies, J. R. Statist. Soc. Ser. B 77 (2015), pp. 219–238. doi: 10.1111/rssb.12065
34. Zheng H., Tsui K., Kang X. and Deng X., Cholesky-based model averaging for covariance matrix estimation, Statist. Theory Related Fields 1 (2017), pp. 48–58. doi: 10.1080/24754269.2017.1336831