Biometrika. 2018 Mar 27;105(2):271–284. doi: 10.1093/biomet/asy011

Robust estimation of high-dimensional covariance and precision matrices

Marco Avella-Medina, Heather S Battey, Jianqing Fan, Quefeng Li
PMCID: PMC6188670  NIHMSID: NIHMS958947  PMID: 30337763

SUMMARY

High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded $2+\epsilon$ moments for $\epsilon\in(0,2)$. The associated convergence rates depend on $\epsilon$.

Keywords: Constrained $\ell_1$-minimization, Leptokurtosis, Minimax rate, Robustness, Thresholding

1. Introduction

Covariance and precision matrices play a central role in summarizing linear relationships among variables. Our focus is on estimating these matrices when their dimension is large relative to the number of observations. Besides being of interest in themselves, estimates of covariance and precision matrices are used for numerous procedures from classical multivariate analysis, including linear regression.

Consistency is achievable under structural assumptions provided regularity conditions are met. For instance, under the assumption that all rows or columns of the covariance matrix belong to a sufficiently small $\ell_q$-ball around zero, thresholding (Bickel & Levina, 2008; Rothman et al., 2008) or its adaptive counterpart (Cai & Liu, 2011) gives consistent estimators of the covariance matrix in the spectral norm for data from a distribution with sub-Gaussian tails. For precision matrix estimation, the same sparsity assumption on the precision matrix motivates the use of the constrained $\ell_1$-minimizer of Cai et al. (2011) or its adaptive counterpart (Cai et al., 2016), both of which are consistent in spectral norm under the same sub-Gaussianity condition. Under sub-Gaussianity, Cai & Liu (2011) and Cai et al. (2016) showed that in high-dimensional regimes the adaptive thresholding estimator and the adaptive constrained $\ell_1$-minimization estimator are minimax optimal within the classes of covariance or precision matrices satisfying their sparsity constraint.

Since sub-Gaussianity is often too restrictive in practice, we seek new procedures that can achieve the same minimax optimality when data are leptokurtic. Inspection of the proofs of Bickel & Levina (2008), Cai & Liu (2011) and Cai et al. (2016) reveals that sub-Gaussianity is needed because their methods are built on the sample covariance matrix, which requires this assumption to guarantee its optimal performance. Here we show that minimax optimality is achievable within a larger class of distributions if the sample covariance matrix is replaced by a robust pilot estimator, thus providing a unified theory for covariance and precision matrix estimation based on general pilot estimators. We also show how to construct pilot estimators that have the required elementwise convergence rates of (1) and (2) below. Within a much larger class of distributions with bounded fourth moment, it is shown that an estimator obtained by regularizing a robust pilot estimator attains the minimax rate achieved by existing methods under sub-Gaussianity. The analysis is extended to show that when only bounded $2+\epsilon$ moments exist for $\epsilon\in(0,2)$, matrix estimators with satisfactory convergence rates are still attainable.

Some related work includes that of Liu et al. (2012) and Xue & Zou (2012), who considered robust estimation of graphical models when the underlying distribution is elliptically symmetric, Fan et al. (2015, 2016a,b), who studied robust matrix estimation in the context of factor models, and Chen et al. (2015) and Loh & Tan (2015), who investigated matrix estimation when the data are contaminated by outliers. The present paper is concerned with efficient estimation of general sparse covariance and precision matrices when only certain moment conditions are assumed.

For a $p$-dimensional random vector $X$ with mean $\mu$, let $\Sigma^*=E\{(X-\mu)(X-\mu)^{\mathrm T}\}$ and let $\tilde\Sigma=(\tilde\sigma_{uv})$ denote an arbitrary pilot estimator of $\Sigma^*=(\sigma^*_{uv})$, where $u,v\in[p]$ with $[p]$ standing for $\{1,\ldots,p\}$. The key requirement on $\tilde\Sigma$ for optimal covariance estimation is that

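$$\mathrm{pr}\Big(\max_{1\le u,v\le p}|\tilde\sigma_{uv}-\sigma^*_{uv}|\le C_0\{(\log p)/n\}^{1/2}\Big)\ge 1-\epsilon_{n,p}, \tag{1}$$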

where $C_0$ is a positive constant and $\epsilon_{n,p}$ is a deterministic sequence converging to zero as $n,p\to\infty$ such that $n^{-1}\log p\to 0$. This delivers rates of convergence that match the minimax rates of Cai & Liu (2011) even under violations of their sub-Gaussianity condition, which entails the existence of Inline graphic such that Inline graphic for every Inline graphic and every Inline graphic. Introduce the sample covariance matrix

$$\hat\Sigma=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)^{\mathrm T},$$

where $X_1,\ldots,X_n$ are independent and identically distributed copies of $X$ and $\bar X=n^{-1}\sum_{i=1}^{n}X_i$. Proposition 1 shows that $\hat\Sigma$ violates (1) when $X$ is not sub-Gaussian. In other words, the sample covariance matrix does not concentrate exponentially fast in an elementwise sense if sub-Gaussianity is violated.

Similarly, for estimation of the precision matrix $\Omega^*=(\Sigma^*)^{-1}$, the optimality of the adaptive constrained $\ell_1$-minimization estimator is retained under a pilot estimator satisfying

$$\mathrm{pr}\Big(\|\tilde\Sigma\Omega^*-I_p\|_{\max}\le C_0\{(\log p)/n\}^{1/2}\Big)\ge 1-\epsilon_{n,p}, \tag{2}$$

where $C_0$ and $\epsilon_{n,p}$ are as in (1) and $I_p$ denotes the $p\times p$ identity matrix. While (2) holds with $\tilde\Sigma=\hat\Sigma$ under sub-Gaussianity of $X$, it fails otherwise.

The following proposition provides a more formal illustration of the unsuitability of $\hat\Sigma$ as a pilot estimator in the absence of sub-Gaussianity.

Proposition 1.

Let Inline graphic for Inline graphic and some Inline graphic. For all distributions of Inline graphic satisfying this assumption, there is a distribution Inline graphic such that for some Inline graphic,

This implies that taking the sample covariance matrix as the pilot estimator $\tilde\Sigma$ yields only a polynomial rate of convergence, which is slower than the exponential rate of concentration in (1). Instead, in § 4 we introduce robust pilot estimators that satisfy conditions (1) and (2). These estimators only require Inline graphic.

Throughout the paper, for a vector Inline graphic, Inline graphic, Inline graphic and Inline graphic. For a matrix $A=(a_{uv})$, $\|A\|_{\max}=\max_{u,v}|a_{uv}|$ is the elementwise maximum norm, $\|A\|_2$ is the spectral norm, and $\|A\|_1=\max_v\sum_u|a_{uv}|$ is the matrix $\ell_1$-norm. We let $I_p$ denote the $p\times p$ identity matrix; $A\succ0$ and $A\succeq0$ mean that $A$ is positive definite and positive semidefinite, respectively. For a square matrix $A$, we denote its maximum and minimum eigenvalues by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively. We also assume that Inline graphic.
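As a quick illustration of the matrix notation, here is a minimal NumPy sketch that computes the three matrix norms. It assumes that the matrix $\ell_1$-norm is the maximum absolute column sum (NumPy's `ord=1`), and the helper names are hypothetical. It also checks numerically the elementary ordering $\|A\|_{\max}\le\|A\|_2\le\|A\|_1$, which holds for any symmetric $A$.

```python
import numpy as np

def max_norm(a):
    """Elementwise maximum norm: max_{u,v} |a_uv|."""
    return float(np.max(np.abs(a)))

def spectral_norm(a):
    """Spectral norm: the largest singular value of a."""
    return float(np.linalg.norm(a, 2))

def matrix_l1_norm(a):
    """Matrix l1-norm: the maximum absolute column sum, max_v sum_u |a_uv|."""
    return float(np.linalg.norm(a, 1))

rng = np.random.default_rng(0)
b = rng.standard_normal((5, 5))
a = (b + b.T) / 2  # a random symmetric matrix
# For symmetric a: max_norm(a) <= spectral_norm(a) <= matrix_l1_norm(a)
print(max_norm(a), spectral_norm(a), matrix_l1_norm(a))
```

The first inequality follows from $|a_{uv}|=|e_u^{\mathrm T}Ae_v|\le\|A\|_2$. The second holds because, for a symmetric matrix, the spectral norm equals the spectral radius, which is bounded by any induced norm.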

2. Broadening the scope of the adaptive thresholding estimator

Let $s_\lambda(z)$ be a general thresholding function for which:

  • (i) $|s_\lambda(z)|\le c|y|$ for all $z$ and $y$ that satisfy $|z-y|\le\lambda$, where $c>0$ is a constant;

  • (ii) $s_\lambda(z)=0$ for $|z|\le\lambda$;

  • (iii) $|s_\lambda(z)-z|\le\lambda$ for all $z\in\mathbb R$.

Similar properties are set forth in Antoniadis & Fan (2001) and were proposed in the context of covariance estimation via thresholding in Rothman et al. (2009) and Cai & Liu (2011). Some examples of thresholding functions satisfying these three conditions are the soft thresholding rule $s_\lambda(z)=\mathrm{sgn}(z)(|z|-\lambda)_+$, the adaptive lasso rule $s_\lambda(z)=z(1-|\lambda/z|^\eta)_+$ with $\eta\ge1$, and the smoothly clipped absolute deviation thresholding rule (Rothman et al., 2009). Although the hard thresholding rule $s_\lambda(z)=z\,1(|z|>\lambda)$ does not satisfy (i), the results presented in this section also hold for hard thresholding. The adaptive thresholding estimator is defined as

$$\hat\Sigma^{\mathcal T}=\big(s_{\lambda_{uv}}(\hat\sigma_{uv})\big)_{1\le u,v\le p},$$

where $\hat\sigma_{uv}$ is the $(u,v)$th entry of $\hat\Sigma$ and the threshold $\lambda_{uv}$ is entry-dependent. Equipped with these adaptive thresholds, Cai & Liu (2011) established optimal rates of convergence of the resulting estimator under sub-Gaussianity of $X$. To accommodate data drawn from distributions violating sub-Gaussianity, we replace the sample covariance matrix $\hat\Sigma$ by a pilot estimator $\tilde\Sigma$ satisfying (1). The resulting adaptive thresholding estimator is denoted by $\tilde\Sigma^{\mathcal T}$. As suggested by Fan et al. (2013), the entry-dependent threshold

$$\lambda_{uv}=\lambda\Big(\frac{\tilde\sigma_{uu}\tilde\sigma_{vv}\log p}{n}\Big)^{1/2} \tag{3}$$

is used, where $\lambda>0$ is a constant. This is simpler than the threshold used by Cai & Liu (2011), as it does not require estimation of $\theta_{uv}=\mathrm{var}\{(X_u-\mu_u)(X_v-\mu_v)\}$, and it achieves the same optimality.
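The thresholding rules and the adaptive thresholding step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical, the smoothly clipped absolute deviation rule is omitted, and the thresholds are assumed to take the form $\lambda_{uv}=\lambda\{\tilde\sigma_{uu}\tilde\sigma_{vv}(\log p)/n\}^{1/2}$, where $\lambda$ is user-chosen and the default of 2 is arbitrary.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft thresholding: sgn(z) * (|z| - lam)_+, elementwise."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold(z, lam):
    """Hard thresholding: z * 1(|z| > lam), elementwise."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > lam, z, 0.0)

def adaptive_lasso_threshold(z, lam, eta=1.0):
    """Adaptive lasso rule: z * (1 - |lam / z|^eta)_+, with eta >= 1."""
    z = np.asarray(z, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), z.shape)
    out = np.zeros_like(z)
    nz = z != 0  # the rule maps 0 to 0; skipping it avoids division by zero
    out[nz] = z[nz] * np.maximum(1.0 - np.abs(lam[nz] / z[nz]) ** eta, 0.0)
    return out

RULES = {"soft": soft_threshold, "hard": hard_threshold,
         "adaptive_lasso": adaptive_lasso_threshold}

def adaptive_threshold_estimator(pilot, n, lam=2.0, rule="hard"):
    """Threshold every entry of a p x p pilot estimate at its own level lambda_uv."""
    pilot = np.asarray(pilot, dtype=float)
    p = pilot.shape[0]
    var = np.clip(np.diag(pilot), 0.0, None)  # guard against negative pilot variances
    # assumed threshold form: lam * sqrt(var_u * var_v * log(p) / n)
    thresholds = lam * np.sqrt(np.outer(var, var) * np.log(p) / n)
    return RULES[rule](pilot, thresholds)

pilot = np.array([[1.0, 0.05, 0.5],
                  [0.05, 1.0, 0.0],
                  [0.5, 0.0, 1.0]])
est = adaptive_threshold_estimator(pilot, n=200)
print(est)
```

With $p=3$ and $n=200$, the off-diagonal thresholds here are about 0.148, so the entry 0.05 is set to zero while 0.5 survives. The diagonal is also thresholded, as in the definition, but it survives whenever $\lambda\{(\log p)/n\}^{1/2}<1$. With the adaptive lasso rule and $\eta=1$, the output coincides with soft thresholding.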

Let $\mathcal S^+_p$ denote the class of $p\times p$ positive-definite symmetric matrices with elements in $\mathbb R$. Theorem 1 relies on the following conditions on the pilot estimator and the sparsity of $\Sigma^*$.

Condition 1.

The pilot estimator $\tilde\Sigma$ satisfies (1).

Condition 2.

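The matrix $\Sigma^*$ belongs to the class

$$\mathcal U_q=\mathcal U_q\{s_0(p)\}=\Big\{\Sigma=(\sigma_{uv})\in\mathcal S^+_p:\ \max_{u}\sum_{v=1}^{p}(\sigma_{uu}\sigma_{vv})^{(1-q)/2}|\sigma_{uv}|^{q}\le s_0(p)\Big\},\qquad 0\le q<1.$$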

The class of weakly sparse matrices $\mathcal U_q$ was introduced by Cai & Liu (2011). The columns of a covariance matrix in $\mathcal U_q$ are required to lie in a weighted $\ell_q$-ball, where the weights are determined by the variances, that is, the diagonal entries of the population covariance.
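To make the weighted sparsity measure concrete, here is a small sketch that evaluates $\max_u\sum_v(\sigma_{uu}\sigma_{vv})^{(1-q)/2}|\sigma_{uv}|^q$. This is the quantity that Cai & Liu (2011) bound by $s_0(p)$, assuming the class takes that form. The function name is hypothetical. For $q=0$, the convention $0^0=0$ is used, so only nonzero entries contribute.

```python
import numpy as np

def weighted_lq_radius(sigma, q):
    """max_u sum_v (sigma_uu sigma_vv)^((1-q)/2) |sigma_uv|^q for 0 <= q < 1."""
    sigma = np.asarray(sigma, dtype=float)
    d = np.diag(sigma)
    weights = np.outer(d, d) ** ((1.0 - q) / 2.0)
    a = np.abs(sigma)
    # with q = 0 use the convention 0^0 = 0, i.e. count the nonzero entries
    powered = (a != 0).astype(float) if q == 0 else a ** q
    return float((weights * powered).sum(axis=1).max())

sigma = np.array([[4.0, 1.0],
                  [1.0, 1.0]])
print(weighted_lq_radius(sigma, 0.5))  # 4 + sqrt(2)
print(weighted_lq_radius(sigma, 0.0))  # 6.0
```

A matrix lies in the class exactly when this value does not exceed $s_0(p)$. Rescaling the variables changes the weights and the entries together.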

Theorem 1.

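Suppose that Conditions 1 and 2 hold, Inline graphic, and Inline graphic. There exists a positive constant $C$ such that

$$\mathrm{pr}\Big[\|\tilde\Sigma^{\mathcal T}-\Sigma^*\|_2\le C\,s_0(p)\{(\log p)/n\}^{(1-q)/2}\Big]\ge 1-\epsilon_{n,p},$$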

where $\epsilon_{n,p}$ is a deterministic sequence that decreases to zero as $n,p\to\infty$.

The constant $C$ in Theorem 1 depends on Inline graphic, Inline graphic and the unknown distribution of $X$, and so do the constants appearing in Theorem 2 and Propositions 2–4.

Our result generalizes Theorem 1 of Cai & Liu (2011): the rate in our Theorem 1 matches their minimax lower bound, so our procedure is minimax optimal over a wider class of distributions that contains the sub-Gaussian distributions.

Cai & Liu (2011) also give convergence rates under bounded moments in all components of $X$. In that case, a much more stringent scaling condition on $n$ and $p$ is required, as shown in Theorem 1(ii) of Cai & Liu (2011). Their result does not cover the high-dimensional case where Inline graphic when fewer than Inline graphic finite moments exist. If a larger number of finite moments is assumed, $p$ is allowed to increase polynomially with $n$. However, we allow Inline graphic.

For the three pilot estimators to be given in § 4, even universal thresholding can achieve the same minimax optimal rate given the bounded fourth moments assumed there. However, adaptive thresholding as formulated in (3) results in better numerical performance.

Unfortunately, $\tilde\Sigma^{\mathcal T}$ may not be positive semidefinite, but it can be projected onto the cone of positive-semidefinite matrices through the convex optimization

$$\tilde\Sigma^{\mathcal T}_+=\mathop{\arg\min}_{\Sigma\succeq0}\|\Sigma-\tilde\Sigma^{\mathcal T}\|_2. \tag{4}$$

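By definition, $\|\tilde\Sigma^{\mathcal T}_+-\tilde\Sigma^{\mathcal T}\|_2\le\|\Sigma^*-\tilde\Sigma^{\mathcal T}\|_2$, because $\Sigma^*$ is itself positive semidefinite and hence feasible in (4), so the triangle inequality yields

$$\|\tilde\Sigma^{\mathcal T}_+-\Sigma^*\|_2\le\|\tilde\Sigma^{\mathcal T}_+-\tilde\Sigma^{\mathcal T}\|_2+\|\tilde\Sigma^{\mathcal T}-\Sigma^*\|_2\le2\|\tilde\Sigma^{\mathcal T}-\Sigma^*\|_2.$$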

Hence, the price to pay for projection is at most a factor of two, which does not affect the convergence rate. The projection is a convex semidefinite program and is easily computed (Boyd & Vandenberghe, 2004).
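Here is a minimal NumPy sketch of the projection step, assuming (4) is the spectral-norm projection onto the positive-semidefinite cone. It uses eigenvalue clipping rather than a general convex solver. Setting the negative eigenvalues to zero moves the matrix by exactly $\max\{0,-\lambda_{\min}\}$ in spectral norm, which is the smallest possible distance to the cone, so clipping solves this particular problem. The function name and the simulated matrices are illustrative only.

```python
import numpy as np

def project_psd_spectral(a):
    """Clip negative eigenvalues of a symmetric matrix to zero.

    In spectral norm this moves `a` by exactly max(0, -lambda_min(a)), the
    smallest possible distance to the positive-semidefinite cone.
    """
    a = np.asarray(a, dtype=float)
    a = (a + a.T) / 2  # enforce exact symmetry before the eigendecomposition
    w, v = np.linalg.eigh(a)
    return (v * np.maximum(w, 0.0)) @ v.T  # V diag(max(w, 0)) V^T

rng = np.random.default_rng(1)
p = 6
b = rng.standard_normal((p, p))
sigma_star = b @ b.T / p                    # a positive-definite "truth"
noise = rng.standard_normal((p, p))
sigma_thr = sigma_star + (noise + noise.T)  # symmetric, possibly indefinite estimate
sigma_plus = project_psd_spectral(sigma_thr)

err_before = np.linalg.norm(sigma_thr - sigma_star, 2)
err_after = np.linalg.norm(sigma_plus - sigma_star, 2)
print(err_before, err_after)  # the triangle-inequality argument gives err_after <= 2 * err_before
```

Other minimizers exist, for example shifting by $|\lambda_{\min}|I_p$. Clipping has the advantage of leaving the nonnegative part of the spectrum unchanged.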

3. Broadening the scope of the adaptively constrained $\ell_1$-minimization estimator

We consider a robust modification of the adaptively constrained $\ell_1$-minimization estimator of Cai et al. (2016). In a spirit similar to that of § 2, our robust modification relies on the existence of a pilot estimator $\tilde\Sigma$ satisfying (2). Construction of the robust adaptive constrained $\ell_1$-minimizer relies on a preliminary projection, resulting in the positive-definite estimator

$$\tilde\Sigma_+=\mathop{\arg\min}_{\Sigma\succeq\epsilon I_p}\|\Sigma-\tilde\Sigma\|_{\max} \tag{5}$$

for an arbitrarily small positive number $\epsilon$. The minimization problem in (5) can be rewritten as minimizing $\delta$ subject to $\Sigma\succeq\epsilon I_p$ and $|\sigma_{uv}-\tilde\sigma_{uv}|\le\delta$ for all $u,v$. This problem can be solved in Matlab using the cvx solver (Grant & Boyd, 2014).

Given Inline graphic, our estimator of $\Omega^*$ is constructed by replacing Inline graphic with Inline graphic in the original constrained $\ell_1$-minimization procedure. For ease of reference, the steps are reproduced below. Define the first-stage estimator Inline graphic of $\Omega^*$ through the vectors

graphic file with name M144.gif (6)

with Inline graphic, Inline graphic for Inline graphic, and $e_v$ being the vector that has value 1 in the $v$th coordinate and zeros elsewhere. More specifically, define Inline graphic as an adjustment of Inline graphic such that the Inline graphicth entry is

graphic file with name M154.gif (7)

and define the first-stage estimator as Inline graphic. A second-stage adaptive estimator Inline graphic is defined by solving, for each column,

graphic file with name M157.gif (8)

where Inline graphic for Inline graphic. In practice, the optimal values of Inline graphic and Inline graphic are chosen by cross-validation. The final estimator, Inline graphic, of $\Omega^*$ is a symmetrized version of Inline graphic constructed as

graphic file with name M165.gif (9)
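Here is a sketch of a symmetrization step. It assumes that (9) follows the rule used for the original constrained $\ell_1$-minimizer of Cai et al. (2011): for each pair $(u,v)$, keep whichever of the entries $(u,v)$ and $(v,u)$ has the smaller magnitude, and use it in both positions. The function name is hypothetical.

```python
import numpy as np

def symmetrize_min_magnitude(omega):
    """Symmetrize by keeping, for each pair (u, v), the entry of smaller magnitude."""
    omega = np.asarray(omega, dtype=float)
    pick = np.where(np.abs(omega) <= np.abs(omega.T), omega, omega.T)
    # decide each pair once, from the upper triangle, then mirror it; this keeps
    # the result exactly symmetric even when |omega_uv| == |omega_vu| but signs differ
    return np.triu(pick) + np.triu(pick, 1).T

omega_tilde = np.array([[2.0, 0.3, 0.0],
                        [0.1, 1.5, -0.4],
                        [0.2, -0.6, 1.0]])
print(symmetrize_min_magnitude(omega_tilde))
```

Choosing the smaller entry shrinks toward zero, which is in keeping with the sparsity assumption on $\Omega^*$.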

The theoretical properties of Inline graphic are derived under Conditions 3 and 4.

Condition 3.

The pilot estimator $\tilde\Sigma$ satisfies (2).

Condition 4.

The matrix $\Omega^*$ belongs to the class

where Inline graphic, Inline graphic is a constant, and Inline graphic and Inline graphic are positive deterministic sequences that are bounded away from zero and allowed to diverge as $n$ and $p$ grow.

In this class of precision matrices, sparsity is imposed by restricting the columns of $\Omega^*$ to lie in an $\ell_q$-ball of radius Inline graphic (Inline graphic).

Theorem 2.

Suppose that Conditions 1, 3 and 4 are satisfied with Inline graphic. Under the scaling condition Inline graphic we have, for a positive constant Inline graphic,

where Inline graphic is a deterministic sequence that decreases to zero as Inline graphic and Inline graphic is the robust adaptively constrained $\ell_1$-minimization estimator described in (6)–(9).

Remark 1.

Our class of precision matrices is slightly more restrictive than that considered in Cai et al. (2016), since we require Inline graphic instead of Inline graphic. The difference is marginal since Inline graphic and Inline graphic implies that Inline graphic. We therefore only exclude precision matrices associated with either exploding or imploding covariance matrices, i.e., we exclude Inline graphic and Inline graphic for all Inline graphic. Ren et al. (2015) also require Inline graphic.

A positive-semidefinite estimator with the same convergence rate as Inline graphic can be constructed by projecting the symmetric matrix Inline graphic onto the cone of positive-semidefinite matrices, as in (4).

Next, we present three pilot estimators that compare favourably with the sample covariance matrix when the sub-Gaussianity assumption is violated. Condition 1 will be verified for all three pilot estimators. When Inline graphic is bounded, Condition 1 implies Condition 3 because Inline graphic. When Inline graphic, Condition 3 is verified for the adaptive Huber estimator. We emphasize that Condition 3 is only needed if the goal is to obtain a minimax optimal estimator of $\Omega^*$; a consistent estimator is still attainable if only Condition 1 holds when Inline graphic. A more thorough discussion appears in the Supplementary Material.
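To spell out the first implication, suppose the bounded quantity is the matrix $\ell_1$-norm $\|\Omega^*\|_1=\max_v\sum_w|\omega^*_{wv}|$; this is an assumption about the notation. Since $\Sigma^*\Omega^*=I_p$, a one-line bound gives:

```latex
\|\tilde\Sigma\Omega^* - I_p\|_{\max}
  = \|(\tilde\Sigma - \Sigma^*)\,\Omega^*\|_{\max}
  = \max_{u,v}\Big|\sum_{w=1}^{p}(\tilde\sigma_{uw}-\sigma^*_{uw})\,\omega^*_{wv}\Big|
  \le \|\tilde\Sigma - \Sigma^*\|_{\max}\,\max_{v}\sum_{w=1}^{p}|\omega^*_{wv}|
  = \|\tilde\Sigma - \Sigma^*\|_{\max}\,\|\Omega^*\|_1 .
```

Any elementwise rate for $\tilde\Sigma$ therefore transfers to $\tilde\Sigma\Omega^*-I_p$, with the constant multiplied by the bound on $\|\Omega^*\|_1$.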

4. Robust pilot estimators

4.1. A rank-based estimator

The rank-based estimator requires only the existence of second moments. However, it makes an arguably more restrictive assumption, as it requires the distribution of $X$ to be elliptically symmetric.

Definition 1.

A random vector $X$ follows an elliptically symmetric distribution if and only if $X=\mu+\xi AU$, where $\mu\in\mathbb R^p$, $\Sigma=AA^{\mathrm T}$ with $\mathrm{rank}(A)=q$, $U$ is uniformly distributed on the unit sphere in $\mathbb R^q$, and $\xi$ is a positive random variable independent of $U$.

Observe that $\Sigma^*=DRD$, where $R$ denotes the correlation matrix and $D=\mathrm{diag}\{(\sigma^*_{11})^{1/2},\ldots,(\sigma^*_{pp})^{1/2}\}$. Liu et al. (2012) and Xue & Zou (2012) both proposed rank-based estimation of $R$, exploiting a bijective mapping between Pearson correlation and Kendall’s tau or Spearman’s rho dependence measures that holds for elliptical distributions. More specifically, Kendall’s tau concordance between $X_u$ and $X_v$ is defined as

$$\tau_{uv}=E\big[\mathrm{sgn}\{(X_u-\tilde X_u)(X_v-\tilde X_v)\}\big],$$

where $\tilde X$ is an independent copy of $X$. With Inline graphic, the empirical analogue of $\tau_{uv}$ is

$$\hat\tau_{uv}=\frac{2}{n(n-1)}\sum_{1\le i<i'\le n}\mathrm{sgn}\{(X_{iu}-X_{i'u})(X_{iv}-X_{i'v})\}.$$

Since $R_{uv}=\sin(\pi\tau_{uv}/2)$, an estimator of $R_{uv}$ is $\hat R_{uv}=\sin(\pi\hat\tau_{uv}/2)$. An analogous bijection exists between Spearman’s rho and Pearson’s correlation; see Xue & Zou (2012) for details. We propose to estimate the elements of the diagonal matrix $D$ using a median absolute deviation estimator, Inline graphic, where Inline graphic. Here, Inline graphic denotes the median within the index set Inline graphic and Inline graphic is the Fisher consistency constant, where Inline graphic is the distribution function of Inline graphic and Inline graphic is the median of Inline graphic. Finally, the rank-based estimator is defined as $\hat D\hat R\hat D$, where $\hat R=(\hat R_{uv})$ and $\hat D$ is the diagonal matrix of the median absolute deviation estimates.
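Here is a compact NumPy sketch of the rank-based pilot estimator: Kendall's tau for every pair of coordinates, the sine transform to correlations, and median-absolute-deviation scales. For illustration only, the Fisher consistency constant is taken to be the Gaussian value $1/\Phi^{-1}(3/4)\approx1.4826$; for other elliptical families it would differ. The function names are hypothetical. The all-pairs computation needs memory proportional to $n^2p$, so it is meant for small examples.

```python
import numpy as np
from statistics import NormalDist

def kendall_tau_matrix(x):
    """Empirical Kendall's tau between every pair of columns of an n x p array."""
    n = x.shape[0]
    # s[i, i', j] = sgn(x_ij - x_i'j); memory grows like n^2 p
    s = np.sign(x[:, None, :] - x[None, :, :])
    # summing over ordered pairs (i, i') counts each unordered pair twice
    return np.einsum("abk,abl->kl", s, s) / (n * (n - 1))

def rank_based_covariance(x):
    """D_hat R_hat D_hat with R_hat = sin(pi * tau_hat / 2) and MAD-based scales."""
    x = np.asarray(x, dtype=float)
    r_hat = np.sin(np.pi * kendall_tau_matrix(x) / 2.0)
    np.fill_diagonal(r_hat, 1.0)
    c = 1.0 / NormalDist().inv_cdf(0.75)  # Gaussian consistency constant (assumed)
    med = np.median(x, axis=0)
    d_hat = c * np.median(np.abs(x - med), axis=0)
    return d_hat[:, None] * r_hat * d_hat[None, :]

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
print(rank_based_covariance(x))
```

Only ranks and medians enter the computation, which is why second moments suffice for this estimator. The price is that the sine link between $\tau_{uv}$ and $R_{uv}$ relies on elliptical symmetry.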

Proposition 2.

Let $X_1,\ldots,X_n$ be independent and identically distributed copies of the elliptically symmetric random vector $X$ with covariance matrix $\Sigma^*$. Assume that Inline graphic and Inline graphic, where Inline graphic for Inline graphic and Inline graphic. Then

with Inline graphic for positive constants Inline graphic, Inline graphic and Inline graphic.

In estimating marginal variances, we use median absolute deviation estimators to avoid higher moment assumptions. This assumes knowledge of Inline graphic, without which these marginal variances can be estimated by using the adaptive Huber estimator or the median of means estimator given in the next two subsections. This requires existence of a fourth moment; see Propositions 3 and 5.

4.2. An adaptive Huber estimator

The Huber-type M-estimator only requires the existence of fourth moments. Let $\mu_u=E(X_u)$ and $\mu_{uv}=E(X_uX_v)$ for $u,v\in\{1,\ldots,p\}$. Then $\sigma^*_{uv}=\mu_{uv}-\mu_u\mu_v$. We propose to estimate $\sigma^*_{uv}$ robustly through robust estimators of $\mu_{uv}$, $\mu_u$ and $\mu_v$. For independent and identically distributed copies $Z_1,\ldots,Z_n$ of a real random variable $Z$ with mean $\mu$, Huber’s (1964) M-estimator of $\mu$ is defined as the solution to

$$\sum_{i=1}^{n}\psi_\alpha(Z_i-\mu)=0, \tag{10}$$

where $\psi_\alpha(z)=\min\{\alpha,\max(-\alpha,z)\}$ is the Huber function. Replacing $Z_i$ in (10) by $X_{iu}X_{iv}$, $X_{iu}$ and $X_{iv}$ gives the Huber estimators $\hat\mu_{uv}$, $\hat\mu_u$ and $\hat\mu_v$ of $\mu_{uv}$, $\mu_u$ and $\mu_v$, respectively, from which the Huber-type estimator of $\sigma^*_{uv}$ is defined as $\tilde\sigma_{uv}=\hat\mu_{uv}-\hat\mu_u\hat\mu_v$.
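Here is a minimal sketch of the Huber-type pilot estimator. It assumes the standard Huber function $\psi_\alpha(z)=\min\{\alpha,\max(-\alpha,z)\}$. Because $\psi_\alpha$ is non-decreasing, the left-hand side of (10) is non-increasing in $\mu$. It is nonnegative at $\min_iZ_i$ and nonpositive at $\max_iZ_i$, so bisection finds a root. The truncation level $\alpha$ is simply an argument: the growth rate the theory requires is the one stated in Proposition 3, and nothing here tunes it. Function names are hypothetical.

```python
import numpy as np

def huber_mean(z, alpha, tol=1e-10, max_iter=200):
    """Huber M-estimator of the mean: a root of sum_i psi_alpha(z_i - mu) = 0."""
    z = np.asarray(z, dtype=float)
    lo, hi = z.min(), z.max()  # the score is >= 0 at lo and <= 0 at hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.clip(z - mid, -alpha, alpha).sum() > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def huber_covariance(x, alpha):
    """Huber-type pilot: sigma_uv = mu_hat_uv - mu_hat_u * mu_hat_v."""
    x = np.asarray(x, dtype=float)
    p = x.shape[1]
    mu = np.array([huber_mean(x[:, u], alpha) for u in range(p)])
    sigma = np.empty((p, p))
    for u in range(p):
        for v in range(u, p):
            mu_uv = huber_mean(x[:, u] * x[:, v], alpha)
            sigma[u, v] = sigma[v, u] = mu_uv - mu[u] * mu[v]
    return sigma

print(huber_mean([1.0, 2.0, 3.0, 10.0], alpha=1.0))  # the outlier 10 is truncated
```

With a very large $\alpha$, no observation is truncated and each Huber estimate reduces to a sample mean, so `huber_covariance` reproduces the sample covariance with divisor $n$. A small $\alpha$ gives more robustness but more bias, which is the trade-off discussed in the text.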

We depart from Huber (1964) by allowing $\alpha$ to grow to infinity as $n$ increases, as our objectives differ from those of Huber (1964) and of classical robust statistics (Huber & Ronchetti, 2009). There, the distribution generating the data is assumed to be a contaminated version of a given parametric model, where the contamination level is small, and the objective is to estimate features of the parametric model as if no contamination were present. Our goal is instead to estimate the mean of the underlying distribution, allowing departures from sub-Gaussianity. In related work, Fan et al. (2017) have shown that when $\alpha$ is allowed to diverge at an appropriate rate, the Huber estimator of the mean concentrates exponentially fast around the true mean when only a finite second moment exists. In a similar spirit, we allow $\alpha$ to grow with $n$ in order to alleviate the bias. An appropriate choice of $\alpha$ trades off bias and robustness. We build on Fan et al. (2017) and Catoni (2012), showing that our proposed Huber-type estimator satisfies Conditions 1 and 3.

Proposition 3.

Assume Inline graphic. Let Inline graphic be the Huber-type estimator with Inline graphic for Inline graphic and Inline graphic satisfying Inline graphic. Under the scaling condition Inline graphic we have, for large Inline graphic and a constant Inline graphic,

where Inline graphic for positive constants Inline graphic and Inline graphic.

Proposition 3 verifies Condition 1 for Inline graphic, provided $\alpha$ is chosen to diverge at the appropriate rate. As quantified in Proposition 4, Inline graphic also satisfies Condition 3 when $\alpha$ is of the same rate as in Proposition 3. The proof of this result entails extending a large deviation result of Petrov (1995).

Proposition 4.

Assume that Inline graphic and Inline graphic with Inline graphic. Let Inline graphic be the Huber-type estimator defined below (10) with Inline graphic for Inline graphic and Inline graphic. Assume that the truncated population covariance matrix Inline graphic satisfies Inline graphic. Under the scaling condition Inline graphic we have, for large Inline graphic and a constant Inline graphic,

where Inline graphic for positive constants Inline graphic and Inline graphic.

4.3. A median of means estimator

The median of means estimator was proposed by Nemirovsky & Yudin (1983) and has been further studied by Lerasle & Oliveira (2011), Bubeck et al. (2013) and Joly & Lugosi (2016). It is defined as the median of $k$ means obtained by partitioning the data into $k$ subsamples. A heuristic explanation for its success is that taking means within subsamples results in a more symmetric sample, while the median makes the solution concentrate faster.

Our median of means estimator for $\sigma^*_{uv}$ is constructed as $\tilde\sigma_{uv}=\hat\mu_{uv}-\hat\mu_u\hat\mu_v$, where $\hat\mu_{uv}$, $\hat\mu_u$ and $\hat\mu_v$ are median of means estimators of $\mu_{uv}=E(X_uX_v)$, $\mu_u=E(X_u)$ and $\mu_v=E(X_v)$, respectively; in each case, each of the $k$ means is computed on a regular partition $B_1,\ldots,B_k$ of $\{1,\ldots,n\}$. It is assumed that $k$ is a factor of $n$.

The value of $k$ is a tuning parameter that affects the accuracy of the median of means estimator. The choice of $k$ involves a compromise between bias and variance. For the extreme cases, $k=n$ and $k=1$, we obtain respectively the sample median and the sample mean. The latter is asymptotically unbiased but does not concentrate exponentially fast in the presence of heavy tails, while the former concentrates exponentially fast but not to the population mean under asymmetric distributions. Proposition 5 gives the range of $k$ for which both goals are achieved simultaneously.
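Here is a short sketch of the median of means estimator and the resulting pilot entry $\hat\mu_{uv}-\hat\mu_u\hat\mu_v$. It assumes that the regular partition consists of $k$ consecutive blocks of equal size $n/k$; in practice the observations might first be randomly permuted. The function names are hypothetical. The two extremes described above are easy to check: $k=1$ returns the sample mean and $k=n$ the sample median.

```python
import numpy as np

def median_of_means(z, k):
    """Median of the k block means over a regular partition into k consecutive blocks."""
    z = np.asarray(z, dtype=float)
    n = z.size
    if n % k != 0:
        raise ValueError("k must be a factor of n")
    block_means = z.reshape(k, n // k).mean(axis=1)  # row b holds block b
    return float(np.median(block_means))

def mom_covariance(x, k):
    """Median-of-means pilot: sigma_uv = mom(X_u X_v) - mom(X_u) * mom(X_v)."""
    x = np.asarray(x, dtype=float)
    p = x.shape[1]
    mu = np.array([median_of_means(x[:, u], k) for u in range(p)])
    sigma = np.empty((p, p))
    for u in range(p):
        for v in range(u, p):
            mu_uv = median_of_means(x[:, u] * x[:, v], k)
            sigma[u, v] = sigma[v, u] = mu_uv - mu[u] * mu[v]
    return sigma

z = [1.0, 2.0, 3.0, 10.0]
print(median_of_means(z, 1), median_of_means(z, 2), median_of_means(z, 4))
```

For this toy vector, $k=1$ gives the mean 4.0 and $k=4$ gives the median 2.5. With $k=2$, the block means are 1.5 and 6.5, whose median is again 4.0.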

Proposition 5.

Assume Inline graphic. Let Inline graphic be the median of means estimator described above based on a regular partition Inline graphic with Inline graphic for a positive constant Inline graphic. Under the scaling condition Inline graphic we have, for large Inline graphic and a constant Inline graphic,

where Inline graphic for positive constants Inline graphic and Inline graphic.

5. Infinite kurtosis

In the previous discussion we assumed the existence of fourth moments of Inline graphic for the Huber-type estimator in § 4. We now relax the condition of boundedness of Inline graphic to that of Inline graphic for some Inline graphic and all Inline graphic. The following proposition lays the foundations for the analysis of high-dimensional covariance or precision matrix estimation with infinite kurtosis. It extends Theorem 5 in Fan et al. (2017) and gives rates of convergence for Huber’s estimator of Inline graphic assuming a bounded Inline graphic moment for Inline graphic. The result is optimal in the sense that our rates match the minimax lower bound given in Theorem 3.1 of Devroye et al. (2016). The rates depend on Inline graphic, and when Inline graphic they match those of Catoni (2012) and Fan et al. (2017).

Proposition 6.

Let Inline graphic, Inline graphic and Inline graphic, and let Inline graphic be independent and identically distributed random variables with mean Inline graphic and bounded Inline graphic moment, i.e., Inline graphic. Take Inline graphic. Then, with probability at least Inline graphic,

where Inline graphic is as defined in § 4.2.

Corollary 1.

Under the conditions of Proposition 6, the Huber estimator satisfies

Corollary 1 allows us to generalize the upper bounds for the Huber-type estimator. The following two theorems establish rates of convergence for the adaptive thresholding and the adaptively constrained $\ell_1$-minimization estimators. While we do not prove that these rates are minimax optimal under Inline graphic finite moments, the proof builds on the elementwise maximum norm convergence of the pilot estimator, which is optimal by Theorem 3.1 of Devroye et al. (2016), and the resulting rates for adaptive thresholding match the minimax rates of Cai & Liu (2011) when Inline graphic. This is a strong indication that the rates are sharp.

Theorem 3.

Suppose that Condition 2 is satisfied and assume Inline graphic. Let Inline graphic be the adaptive thresholding estimator defined in § 2 based on the Huber pilot estimator Inline graphic with Inline graphic for Inline graphic and Inline graphic. Under the scaling condition Inline graphic and choosing Inline graphic for some Inline graphic, we have, for sufficiently large Inline graphic,

where Inline graphic for positive constants Inline graphic and Inline graphic.

Theorem 4.

Suppose that Condition 4 is satisfied, Inline graphic and Inline graphic. Let Inline graphic be the adaptively constrained $\ell_1$-minimization estimator defined in § 3 based on the Huber pilot estimator Inline graphic with Inline graphic for Inline graphic and Inline graphic. Assume that the truncated population covariance matrix Inline graphic satisfies Inline graphic. Under the scaling condition Inline graphic, we have, for sufficiently large Inline graphic,

where Inline graphic for positive constants Inline graphic and Inline graphic.

A result similar to Proposition 6 was obtained in Lemma 2 of Bubeck et al. (2013) for the median of means estimator. Expanding on it, we obtain a result analogous to Theorem 3 for the median of means matrix estimator.

6. Finite-sample performance

We illustrate the performance of the estimators discussed in §§ 2 and 3 under a range of data-generating scenarios and for every choice of pilot estimator discussed in § 4. For the adaptive thresholding estimator of $\Sigma^*$, we use a hard thresholding rule with the entry-dependent thresholds of (3). In each of 500 Monte Carlo replications, Inline graphic independent copies of a random vector Inline graphic of dimension Inline graphic are drawn from a model with either a sparse covariance matrix $\Sigma^*$ or a sparse precision matrix $\Omega^*$, depending on the experiment. We consider four different scenarios for the distribution of Inline graphic: the zero-mean multivariate normal distribution; the $t$ distribution with 3.5 degrees of freedom and infinite kurtosis; the skewed $t$ distribution with four degrees of freedom and skew parameter equal to 20; and the contaminated skewed $t$ distribution (Azzalini, 2005) with four degrees of freedom and skew parameter equal to 10. Data in the last scenario are generated as Inline graphic, where Inline graphic, Inline graphic and Inline graphic; here Inline graphic is the Inline graphic distribution generating most of the data, while Inline graphic is a normal distribution with a mean vector of Inline graphic and covariance matrix equal to the identity. Any unspecified tuning parameters of the adaptive thresholding estimator and the adaptively constrained $\ell_1$-minimization estimator are chosen by cross-validation to minimize the spectral norm error. Unspecified constants in the tuning parameters of the robust pilot estimators are conservatively set to the values that would be optimal if the true distribution were a Student $t$ distribution with five degrees of freedom. We consider the following two structures for $\Sigma^*$ and $\Omega^*$.

  • (i) Sparse covariance matrix: similar to Model 2 in the simulation section of Cai & Liu (2011), we take the true covariance model to be the block-diagonal matrix Inline graphic, where Inline graphic, Inline graphic, Inline graphic with independent Inline graphic and Inline graphic to ensure that Inline graphic is positive definite.

  • (ii) Banded precision matrix: following Cai et al. (2016), we take the true precision matrix to be of the banded form Inline graphic, where Inline graphicInline graphic, Inline graphic and Inline graphic for Inline graphic.

Table 1 shows that while the sample covariance estimator performs well for the normally distributed case, thresholding this estimator performs poorly when the true model departs from normality, as reflected by its elevated estimation error in both the maximum norm and the spectral norm. By contrast, thresholding one of our proposed robust pilot estimators is much less affected by these heavy-tailed distributions. Table 2 shows a similar pattern for the precision matrix estimators. The gains are apparent for all robust pilot estimators, as predicted by our theory.

Table 1.

Estimation errors Inline graphicwith standard errors in parenthesesInline graphic of the adaptive thresholding estimator of Inline graphic based on four different pilot estimators; values are averaged over Inline graphic replications

Distribution  Error           Sample covariance  Adaptive Huber  Median of means  Rank-based
MVN           Inline graphic  2.88 (0.04)        2.86 (0.04)     3.31 (0.05)      3.01 (0.07)
MVN           Inline graphic  0.98 (0.09)        0.92 (0.09)     1.50 (0.14)      1.61 (0.23)
T             Inline graphic  8.95 (0.53)        3.92 (0.06)     4.46 (0.24)      5.02 (0.06)
T             Inline graphic  8.72 (0.55)        1.87 (0.05)     3.35 (0.74)      2.54 (0.04)
ST            Inline graphic  7.12 (0.17)        4.88 (0.05)     4.96 (0.06)      5.16 (0.06)
ST            Inline graphic  6.89 (0.18)        2.41 (0.04)     2.43 (0.04)      2.57 (0.04)
CST           Inline graphic  5.47 (0.23)        4.14 (0.06)     4.60 (0.05)      5.13 (0.06)
CST           Inline graphic  5.07 (0.27)        2.02 (0.05)     2.27 (0.05)      2.56 (0.04)

MVN, the multivariate normal distribution; T, the t distribution; ST, the skewed t distribution; CST, the contaminated skewed t distribution.
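The heavy-tailed design labelled T can be mimicked with the standard normal/chi-squared mixture representation of the multivariate t distribution, sketched below. The degrees of freedom used in the paper are not recoverable here, so `df=10` in the example is hypothetical. The skewed and contaminated variants (see Azzalini, 2005, for skewed multivariate families) are not implemented. Because t marginals have finite fourth moments only when df > 4, the choice of df decides whether a bounded-fourth-moment assumption holds.

```python
import numpy as np


def multivariate_t(n, sigma, df, rng, standardize=False):
    """Draw n rows X = Z / sqrt(W / df) with Z ~ N(0, sigma) and W ~ chi-squared(df).

    For df > 2, Cov(X) = df / (df - 2) * sigma; with standardize=True the draws are
    rescaled so that Cov(X) = sigma.
    """
    p = sigma.shape[0]
    z = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    w = rng.chisquare(df, size=n)
    x = z / np.sqrt(w / df)[:, None]  # one shared chi-squared mixing variable per row
    if standardize:
        if df <= 2:
            raise ValueError("the covariance is infinite for df <= 2")
        x *= np.sqrt((df - 2) / df)
    return x


rng = np.random.default_rng(2)
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
x = multivariate_t(200_000, sigma, df=10, rng=rng, standardize=True)
print(np.round(np.cov(x, rowvar=False), 2))  # should be close to sigma for large n
```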

Table 2.

Estimation errors, with standard errors in parentheses, of the adaptively constrained ℓ1-minimizers of the precision matrix based on four different pilot estimators; values are averaged over Inline graphic replications

Distribution  Error           Sample covariance  Adaptive Huber  Median of means  Rank-based
MVN           Inline graphic  2.62 (0.01)        2.61 (0.01)     2.59 (0.01)      2.59 (0.01)
MVN           Inline graphic  1.05 (0.09)        1.02 (0.09)     1.85 (0.28)      2.90 (0.55)
T             Inline graphic  2.54 (0.03)        2.26 (0.02)     2.43 (0.02)      2.41 (0.02)
T             Inline graphic  2.66 (3.96)        0.81 (0.03)     1.02 (0.19)      1.01 (0.19)
ST            Inline graphic  2.27 (0.15)        1.97 (0.05)     2.08 (0.08)      2.12 (0.08)
ST            Inline graphic  1.40 (1.59)        0.97 (0.02)     1.05 (0.03)      0.96 (0.02)
CST           Inline graphic  2.65 (0.02)        2.01 (0.04)     2.12 (0.06)      2.10 (0.06)
CST           Inline graphic  9.65 (3.76)        0.97 (0.04)     2.16 (2.19)      0.92 (0.03)
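The rank-based pilot in Tables 1 and 2 is not defined in this excerpt. One common rank-based construction, in the spirit of Liu et al. (2012) and Xue & Zou (2012), is sketched below as an assumption, not as the paper's exact estimator. It maps Kendall's tau to a correlation through sin(πτ/2) and rescales by robust marginal scales. The use of the normal-consistent median absolute deviation for those scales is a hypothetical choice.

```python
import numpy as np


def kendall_tau_matrix(x):
    """Kendall's tau-a for every pair of columns of the n x p data matrix x."""
    n = x.shape[0]
    # signs[a, b, i] = sign(x[a, i] - x[b, i]); memory is O(n^2 p), fine for small n
    signs = np.sign(x[:, None, :] - x[None, :, :])
    return np.einsum("abi,abj->ij", signs, signs) / (n * (n - 1))


def rank_based_covariance(x):
    """Pilot D R D with R = sin(pi / 2 * tau) and D the diagonal of robust scales."""
    r = np.sin(0.5 * np.pi * kendall_tau_matrix(x))
    med = np.median(x, axis=0)
    scale = 1.4826 * np.median(np.abs(x - med), axis=0)  # MAD, consistent at the normal
    return scale[:, None] * r * scale[None, :]


rng = np.random.default_rng(3)
x = rng.standard_t(df=3, size=(150, 5))
s_rank = rank_based_covariance(x)
```

Because Kendall's tau depends on the data only through the ordering within each column, the correlation part is unaffected by monotone transformations and by extreme values. Only the marginal scales need a separate robust estimate.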

7. Real-data example

A gene regulatory network, also known as a pathway, is a set of genes that interact with each other to control a specific cell function. With recent advances in genomic research, many such networks have been discovered and their functions thoroughly studied, and certain pathways are now catalogued in public databases such as KEGG (Ogata et al., 2000). One popular way to infer a gene regulatory network is to estimate the precision matrix of the gene expression levels, whose nonzero off-diagonal entries are read as edges of the network (Wit & Abbruzzo, 2015). However, such data often contain outliers. To assess whether our robust estimator improves inference on gene regulatory networks, we analyse a microarray dataset and compare our findings with results widely accepted in the genomics literature.

The microarray data come from a study by Huang et al. (2011) on the inflammation process in cardiovascular disease, which identified the toll-like receptor signalling pathway as playing a key role in that process. The study involves Inline graphic patients, and the data are available from the Gene Expression Omnibus under accession number GSE20129. We consider 95 genes from the toll-like receptor signalling pathway and another 62 genes from the peroxisome proliferator-activated receptor signalling pathway, which is known to be unrelated to cardiovascular disease. A good method should therefore detect connections among genes within each pathway but not across the two pathways. We estimate the precision matrix, and hence the gene regulatory network, using both the original adaptively constrained ℓ1-minimization estimator, which is based on the sample covariance matrix, and our robustified version based on the Huber pilot estimator.
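To show how a pilot covariance estimate enters the precision matrix step, the sketch below implements the original, non-adaptive constrained ℓ1-minimization of Cai et al. (2011) with a single tuning parameter `lam`. Each column is solved as a linear program with scipy's `linprog`, and the result is symmetrized by keeping the entry of smaller magnitude. The adaptive version of Cai et al. (2016) used in this section has entry-dependent constraints and is not reproduced. The function name `clime` and the toy pilot matrix are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def clime(s, lam):
    """For each column j solve  min ||b||_1  subject to  ||s @ b - e_j||_inf <= lam
    as a linear program with b = u - v, u, v >= 0, then symmetrize the solution."""
    p = s.shape[0]
    c = np.ones(2 * p)  # objective sum(u) + sum(v) equals ||b||_1 at the optimum
    a_ub = np.vstack([np.hstack([s, -s]), np.hstack([-s, s])])
    columns = []
    for j in range(p):
        e_j = np.zeros(p)
        e_j[j] = 1.0
        # s b - e_j <= lam  and  -(s b - e_j) <= lam, componentwise
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"LP for column {j} failed: {res.message}")
        columns.append(res.x[:p] - res.x[p:])
    omega = np.column_stack(columns)
    # keep whichever of omega[i, j] and omega[j, i] is smaller in magnitude
    return np.where(np.abs(omega) <= np.abs(omega.T), omega, omega.T)


s_pilot = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]])  # toy pilot
omega_hat = clime(s_pilot, lam=0.1)
```

Swapping the sample covariance for a robust pilot changes only the argument `s`; the optimization itself is untouched. Larger values of `lam` give sparser estimates, and with `lam = 0` and an invertible pilot the solution is the exact inverse.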

We first choose, for each method, the tuning parameter that delivers its top 100 connections. The "Top 100 connections" columns of Table 3 report the selection results, which are also displayed in Fig. 1. Our robust method identifies more within-pathway connections (60 versus 55) and fewer between-pathway connections (40 versus 45).
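The within/between counts in Table 3 can be tallied from an estimated precision matrix and a vector of pathway labels, as in the sketch below. In the analysis the number of connections is controlled through the tuning parameter. Here the optional `k` instead keeps the k largest entries in magnitude, a hypothetical simplification. The function name, the tolerance and the toy labels are illustrative.

```python
import numpy as np


def count_connections(omega, pathway, k=None, tol=1e-8):
    """Return (within, between, total) edge counts of an estimated precision matrix.

    An edge (i, j), i < j, is present when |omega[i, j]| > tol. If k is given, only the
    k edges with the largest |omega[i, j]| are kept.
    """
    i, j = np.triu_indices(omega.shape[0], k=1)
    weight = np.abs(omega[i, j])
    present = weight > tol
    i, j, weight = i[present], j[present], weight[present]
    if k is not None:
        top = np.argsort(-weight, kind="stable")[:k]
        i, j = i[top], j[top]
    labels = np.asarray(pathway)
    within = int(np.sum(labels[i] == labels[j]))
    return within, len(i) - within, len(i)


omega_hat = np.array([
    [1.0, 0.9, 0.5, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
])
pathway = ["A", "A", "B", "B"]  # hypothetical labels for two pathways
print(count_connections(omega_hat, pathway))       # (2, 1, 3)
print(count_connections(omega_hat, pathway, k=2))  # (1, 1, 2)
```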

Table 3.

Number of within-pathway, between-pathway and total connections detected by the two methods

                    Top 100 connections        Equal tuning parameters
                    Within  Between  Total     Within  Between  Total
Huber estimator     60      40       100       27      15       42
Sample covariance   55      45       100       55      45       100

Fig. 1.

Connections estimated by the adaptively constrained ℓ1-minimization estimator using (a) the sample covariance and (b) the Huber-type pilot estimator; blue lines represent within-pathway connections and red lines between-pathway connections.

We also tried using the same tuning parameter in the constrained ℓ1-minimization step (8) for both procedures; the "Equal tuning parameters" columns of Table 3 give the results. Our estimator then detects fewer connections (42 versus 100); however, the proportion of within-pathway connections is much higher with the Huber pilot estimator than with the sample covariance estimator. If the genomics literature is correct, our results indicate that using the Huber pilot estimator improves inference in this example, in which heavy tails and skewness are present.
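The within-pathway proportions implied by the counts in Table 3 make the comparison explicit:

```latex
\text{Equal tuning parameters:}\quad
\frac{27}{42} \approx 0.643 \ \text{(Huber pilot)}, \qquad
\frac{55}{100} = 0.550 \ \text{(sample covariance)};
\\
\text{Top 100 connections:}\quad
\frac{60}{100} = 0.600 \ \text{(Huber pilot)}, \qquad
\frac{55}{100} = 0.550 \ \text{(sample covariance)}.
```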

Acknowledgement

This work was partially supported by the U.S. National Science Foundation and National Institutes of Health. Avella-Medina was partially supported by the Swiss National Science Foundation. Battey was partially supported by the U.K. Engineering and Physical Sciences Research Council. The authors thank the editor, the associate editor and three referees for valuable comments.

Supplementary material

Supplementary Material available at Biometrika online includes the proofs of all propositions and theorems and additional plots for the real-data example.

References

  1. Antoniadis, A. & Fan, J. (2001). Regularization of wavelet approximations. J. Am. Statist. Assoc. 96, 939–57.
  2. Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scand. J. Statist. 32, 159–200.
  3. Bickel, P. J. & Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36, 2577–604.
  4. Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.
  5. Bubeck, S., Cesa-Bianchi, N. & Lugosi, G. (2013). Bandits with heavy tail. IEEE Trans. Info. Theory 59, 7711–7.
  6. Cai, T. T. & Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Am. Statist. Assoc. 106, 672–84.
  7. Cai, T. T., Liu, W. & Luo, X. (2011). A constrained ℓ1-minimization approach to sparse precision matrix estimation. J. Am. Statist. Assoc. 106, 594–607.
  8. Cai, T. T., Liu, W. & Zhou, H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 44, 455–88.
  9. Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Prob. Statist. 48, 1148–85.
  10. Chen, M., Gao, C. & Ren, Z. (2015). Robust covariance matrix estimation via matrix depth. arXiv: 1506.00691.
  11. Devroye, L., Lerasle, M., Lugosi, G. & Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44, 2695–725.
  12. Fan, J., Han, F., Liu, H. & Vickers, B. (2016a). Robust inference of risks of large portfolios. J. Economet. 194, 298–308.
  13. Fan, J., Li, Q. & Wang, Y. (2017). Estimation of high-dimensional mean regression in absence of symmetry and light-tail assumptions. J. R. Statist. Soc. B 79, 247–65.
  14. Fan, J., Liao, Y. & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. B 75, 603–80.
  15. Fan, J., Liu, H. & Wang, W. (2015). Large covariance estimation through elliptical factor models. arXiv: 1507.08377.
  16. Fan, J., Wang, W. & Zhong, Y. (2016b). Robust covariance estimation for approximate factor models. arXiv: 1602.00719.
  17. Grant, M. & Boyd, S. (2014). CVX: Matlab software for disciplined convex programming, version 2.1.
  18. Huang, C.-C., Liu, K., Pope, R. M., Du, P., Lin, S., Rajamannan, N. M., Huang, Q.-Q., Jafari, N., Burke, G. L., Post, W. et al. (2011). Activated TLR signaling in atherosclerosis among women with lower Framingham risk score: The multi-ethnic study of atherosclerosis. PLoS ONE 6, e21067.
  19. Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.
  20. Huber, P. & Ronchetti, E. (2009). Robust Statistics. Hoboken, New Jersey: Wiley, 2nd edn.
  21. Joly, E. & Lugosi, G. (2016). Robust estimation of U-statistics. Stoch. Proces. Appl. 126, 3760–73.
  22. Lerasle, M. & Oliveira, R. (2011). Robust empirical mean estimators. arXiv: 1112.3914.
  23. Liu, H., Han, F., Yuan, M., Lafferty, J. & Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40, 2293–326.
  24. Loh, P. L. & Tan, X. L. (2015). High-dimensional robust precision matrix estimation: Cellwise corruption under ε-contamination. arXiv: 1509.07229.
  25. Nemirovsky, A. S. & Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. Hoboken, New Jersey: Wiley.
  26. Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. & Kanehisa, M. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.
  27. Petrov, V. (1995). Limit Theorems of Probability Theory. Oxford: Clarendon Press.
  28. Ren, Z., Sun, T., Zhang, C.-H. & Zhou, H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43, 991–1026.
  29. Rothman, A., Bickel, P., Levina, E. & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Statist. 2, 494–515.
  30. Rothman, A., Levina, E. & Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Am. Statist. Assoc. 104, 177–86.
  31. Wit, E. C. & Abbruzzo, A. (2015). Inferring slowly-changing dynamic gene-regulatory networks. BMC Bioinformatics 16, S5.
  32. Xue, L. & Zou, H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. 40, 2541–71.

Articles from Biometrika are provided here courtesy of Oxford University Press
