SUMMARY
High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded $2+\epsilon$ moments for $\epsilon \in (0,2)$. The associated convergence rates depend on $\epsilon$.
Keywords: Constrained ℓ1-minimization, Leptokurtosis, Minimax rate, Robustness, Thresholding
1. Introduction
Covariance and precision matrices play a central role in summarizing linear relationships among variables. Our focus is on estimating these matrices when their dimension is large relative to the number of observations. Besides being of interest in themselves, estimates of covariance and precision matrices are used for numerous procedures from classical multivariate analysis, including linear regression.
Consistency is achievable under structural assumptions provided regularity conditions are met. For instance, under the assumption that all rows or columns of the covariance matrix belong to a sufficiently small $\ell_q$-ball around zero, thresholding (Bickel & Levina, 2008; Rothman et al., 2008) or its adaptive counterpart (Cai & Liu, 2011) gives consistent estimators of the covariance matrix in the spectral norm for data from a distribution with sub-Gaussian tails. For precision matrix estimation, the same sparsity assumption on the precision matrix motivates the use of the constrained $\ell_1$-minimizer of Cai et al. (2011) or its adaptive counterpart (Cai et al., 2016), both of which are consistent in spectral norm under the same sub-Gaussianity condition. Under sub-Gaussianity, Cai & Liu (2011) and Cai et al. (2016) showed that in high-dimensional regimes the adaptive thresholding estimator and adaptive constrained $\ell_1$-minimization estimator are minimax optimal within the classes of covariance or precision matrices satisfying their sparsity constraint.
Since sub-Gaussianity is often too restrictive in practice, we seek new procedures that can achieve the same minimax optimality when data are leptokurtic. Inspection of the proofs of Bickel & Levina (2008), Cai & Liu (2011) and Cai et al. (2016) reveals that sub-Gaussianity is needed because their methods are built on the sample covariance matrix, which requires the assumption to guarantee its optimal performance. Here we show that minimax optimality is achievable within a larger class of distributions if the sample covariance matrix is replaced by a robust pilot estimator, thus providing a unified theory for covariance and precision matrix estimation based on general pilot estimators. We also show how to construct pilot estimators that have the required elementwise convergence rates of (1) and (2) below. Within a much larger class of distributions with bounded fourth moment, it is shown that an estimator obtained by regularizing a robust pilot estimator attains the minimax rate achieved by existing methods under sub-Gaussianity. The analysis is extended to show that when only bounded $2+\epsilon$ moments exist for $\epsilon \in (0,2)$, matrix estimators with satisfactory convergence rates are still attainable.
Some related work includes that of Liu et al. (2012) and Xue & Zou (2012), who considered robust estimation of graphical models when the underlying distribution is elliptically symmetric, Fan et al. (2015, 2016a,b), who studied robust matrix estimation in the context of factor models, and Chen et al. (2015) and Loh & Tan (2015), who investigated matrix estimation when the data are contaminated by outliers. The present paper is concerned with efficient estimation of general sparse covariance and precision matrices when only certain moment conditions are assumed.
For a $p$-dimensional random vector $X = (X_1, \ldots, X_p)^{\mathrm{T}}$ with mean $\mu = (\mu_1, \ldots, \mu_p)^{\mathrm{T}}$, let $\Sigma^* = (\sigma^*_{uv}) = E\{(X - \mu)(X - \mu)^{\mathrm{T}}\}$ and let $\hat{\Sigma} = (\hat{\sigma}_{uv})$ denote an arbitrary pilot estimator of $\Sigma^*$, where the superscript $\mathrm{T}$ stands for transposition. The key requirement on $\hat{\Sigma}$ for optimal covariance estimation is that

$$\operatorname{pr}\Bigl\{ \max_{u,v} |\hat{\sigma}_{uv} - \sigma^*_{uv}| \ge C \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} = O(\varepsilon_{n,p}), \tag{1}$$

where $C$ is a positive constant and $\varepsilon_{n,p}$ is a deterministic sequence converging to zero as $n, p \to \infty$. This delivers rates of convergence that match the minimax rates of Cai & Liu (2011) even under violations of their sub-Gaussianity condition, which entails the existence of $\eta > 0$ and $K > 0$ such that $E\{\exp(t X_u^2)\} \le K$ for every $|t| \le \eta$ and every $u = 1, \ldots, p$. Introduce the sample covariance matrix

$$\hat{\Sigma}^{\mathrm{sam}} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^{\mathrm{T}},$$

where $X_1, \ldots, X_n$ are independent and identically distributed copies of $X$ and $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$. Proposition 1 shows that $\hat{\Sigma}^{\mathrm{sam}}$ violates (1) when $X$ is not sub-Gaussian. In other words, the sample covariance does not concentrate exponentially fast in an elementwise sense if sub-Gaussianity is violated.
Similarly, for estimation of the precision matrix $\Omega^* = (\Sigma^*)^{-1}$, the optimality of the adaptive constrained $\ell_1$-minimization estimator is retained under a pilot estimator satisfying

$$\operatorname{pr}\Bigl\{ \| \hat{\Sigma} \Omega^* - I_p \|_{\max} \ge C \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} = O(\varepsilon_{n,p}), \tag{2}$$

where $C$ and $\varepsilon_{n,p}$ are as in (1) and $I_p$ denotes the $p \times p$ identity matrix. While (2) holds with $\hat{\Sigma} = \hat{\Sigma}^{\mathrm{sam}}$ under sub-Gaussianity of $X$, it fails otherwise.
The following proposition provides a more formal illustration of the unsuitability of $\hat{\Sigma}^{\mathrm{sam}}$ as a pilot estimator in the absence of sub-Gaussianity.
Proposition 1.
Let $E(X_u^4) \le B$ for $u = 1, \ldots, p$ and some constant $B > 0$. Within the class of distributions of $X$ satisfying this assumption, there is a distribution such that, for some $c > 0$, the elementwise deviation $\max_{u,v} |\hat{\sigma}^{\mathrm{sam}}_{uv} - \sigma^*_{uv}|$ exceeds $c\{(\log p)/n\}^{1/2}$ with probability decaying only polynomially in $n$.
This implies that the choice to take the sample covariance as the pilot estimator $\hat{\Sigma}$ results in a polynomial rate of convergence, which is slower than the exponential rate of concentration in (1). Instead, we introduce robust pilot estimators in § 4 that satisfy the conditions (1) and (2). These estimators require only that $\max_u E(X_u^4) < \infty$.
Throughout the paper, for a vector $a = (a_1, \ldots, a_p)^{\mathrm{T}}$, $|a|_1 = \sum_u |a_u|$, $|a|_2 = (\sum_u a_u^2)^{1/2}$ and $|a|_\infty = \max_u |a_u|$. For a matrix $A = (a_{uv})$, $\|A\|_{\max} = \max_{u,v} |a_{uv}|$ is the elementwise maximum norm, $\|A\|_2 = \sup_{|x|_2 \le 1} |Ax|_2$ is the spectral norm, and $\|A\|_{L_1} = \max_v \sum_u |a_{uv}|$ is the matrix $\ell_1$-norm. We let $I_p$ denote the $p \times p$ identity matrix; $A \succ 0$ and $A \succeq 0$ mean that $A$ is positive definite and positive semidefinite, respectively. For a square matrix $A$, we denote its maximum and minimum eigenvalues by $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$, respectively. We also assume that $\log p = o(n)$.
2. Broadening the scope of the adaptive thresholding estimator
Let $s_\lambda(\cdot)$ be a general thresholding function for which:
(i) $|s_\lambda(z)| \le c|y|$ for all $z$ and $y$ that satisfy $|z - y| \le \lambda$; (ii) $s_\lambda(z) = 0$ for $|z| \le \lambda$; (iii) $|s_\lambda(z) - z| \le \lambda$ for all $z$.
Similar properties are set forth in Antoniadis & Fan (2001) and were proposed in the context of covariance estimation via thresholding in Rothman et al. (2009) and Cai & Liu (2011). Some examples of thresholding functions satisfying these three conditions are the soft thresholding rule $s_\lambda(z) = \operatorname{sgn}(z)(|z| - \lambda)_+$, the adaptive lasso rule $s_\lambda(z) = z(1 - |\lambda/z|^\eta)_+$ with $\eta \ge 1$, and the smoothly clipped absolute deviation thresholding rule (Rothman et al., 2009). Although the hard thresholding rule $s_\lambda(z) = z\,1(|z| > \lambda)$ does not satisfy (i), the results presented in this section also hold for hard thresholding. The adaptive thresholding estimator is defined as

$$\hat{\Sigma}^{\mathrm{thr}} = (\hat{\sigma}^{\mathrm{thr}}_{uv}), \qquad \hat{\sigma}^{\mathrm{thr}}_{uv} = s_{\lambda_{uv}}(\hat{\sigma}_{uv}),$$

where $\hat{\sigma}_{uv}$ is the $(u,v)$th entry of $\hat{\Sigma}$ and the threshold $\lambda_{uv}$ is entry-dependent. Equipped with these adaptive thresholds, Cai & Liu (2011) established optimal rates of convergence of the resulting estimator under sub-Gaussianity of $X$. To accommodate data drawn from distributions violating sub-Gaussianity, we replace the sample covariance matrix $\hat{\Sigma}^{\mathrm{sam}}$ by a pilot estimator $\hat{\Sigma}$ satisfying (1). The resulting adaptive thresholding estimator is denoted by $\hat{\Sigma}^{\mathrm{thr}}$. As suggested by Fan et al. (2013), the entry-dependent threshold

$$\lambda_{uv} = \delta \Bigl( \frac{\hat{\sigma}_{uu} \hat{\sigma}_{vv} \log p}{n} \Bigr)^{1/2} \tag{3}$$

is used, where $\delta > 0$ is a constant. This is simpler than the threshold used by Cai & Liu (2011), as it does not require estimation of $\operatorname{var}\{(X_u - \mu_u)(X_v - \mu_v)\}$ and achieves the same optimality.
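As a concrete illustration, adaptive thresholding with the entry-dependent threshold (3) takes only a few lines. The sketch below uses the soft thresholding rule; the constant `delta = 2.0` and the function name are illustrative assumptions, not choices prescribed by the paper, and any pilot estimator satisfying (1) may be supplied in place of the sample covariance used in the example.

```python
import numpy as np

def soft_threshold(x, lam):
    # soft thresholding rule s_lambda(z) = sgn(z)(|z| - lambda)_+
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def adaptive_threshold_cov(sigma_pilot, n, delta=2.0):
    """Entry-adaptive thresholding of a pilot covariance estimate.

    Off-diagonal entry (u, v) is soft-thresholded at the level
    lambda_uv = delta * sqrt(sigma_uu * sigma_vv * log(p) / n),
    in the spirit of the threshold (3); diagonal entries are kept.
    """
    p = sigma_pilot.shape[0]
    d = np.sqrt(np.diag(sigma_pilot))          # marginal standard deviations
    lam = delta * np.outer(d, d) * np.sqrt(np.log(p) / n)
    out = soft_threshold(sigma_pilot, lam)
    np.fill_diagonal(out, np.diag(sigma_pilot))  # never threshold the diagonal
    return out
```

With a well-conditioned pilot, small spurious off-diagonal entries fall below $\lambda_{uv}$ and are set exactly to zero, while large entries are retained up to the soft-thresholding shrinkage.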
Let $\mathcal{S}^+_p$ denote the class of positive-definite symmetric matrices in $\mathbb{R}^{p \times p}$. Theorem 1 relies on the following conditions on the pilot estimator and the sparsity of $\Sigma^*$.
Condition 1.
The pilot estimator $\hat{\Sigma}$ satisfies (1).
Condition 2.
The matrix $\Sigma^*$ belongs to the class

$$\mathcal{U}_q\{s_0(p)\} = \Bigl\{ \Sigma \in \mathcal{S}^+_p : \max_u \sum_{v=1}^{p} (\sigma_{uu} \sigma_{vv})^{(1-q)/2} |\sigma_{uv}|^q \le s_0(p) \Bigr\}, \qquad 0 \le q < 1.$$

The class of weakly sparse matrices $\mathcal{U}_q\{s_0(p)\}$ was introduced by Cai & Liu (2011). The columns of a covariance matrix in $\mathcal{U}_q\{s_0(p)\}$ are required to lie in a weighted $\ell_q$-ball, where the weights are determined by the marginal variances.
Theorem 1.
Suppose that Conditions 1 and 2 hold, the threshold (3) is used with a sufficiently large constant $\delta$, and $\log p = o(n)$. There exists a positive constant $C$ such that

$$\operatorname{pr}\Bigl[ \| \hat{\Sigma}^{\mathrm{thr}} - \Sigma^* \|_2 \le C s_0(p) \Bigl( \frac{\log p}{n} \Bigr)^{(1-q)/2} \Bigr] \ge 1 - \varepsilon_{n,p},$$

where $\varepsilon_{n,p}$ is a deterministic sequence that decreases to zero as $n, p \to \infty$.
The constant $C$ in Theorem 1 depends on $\delta$, $q$ and the unknown distribution of $X$, and so do the constants appearing in Theorem 2 and Propositions 2–4.
Our result generalizes Theorem 1 of Cai & Liu (2011); the minimax lower bound of our Theorem 1 matches theirs, implying that our procedure is minimax optimal for a wider class of distributions containing the sub-Gaussian distributions.
Cai & Liu (2011) also give convergence rates under bounded moments in all components of $X$. In that case, a much more stringent scaling condition on $n$ and $p$ is required, as shown in Theorem 1(ii) of Cai & Liu (2011). Their result does not cover the high-dimensional case where $p \ge n$ when too few finite moments exist. If a larger number of finite moments is assumed, $p$ is allowed to increase polynomially with $n$. However, we allow $p$ to grow exponentially with $n$.
For the three pilot estimators to be given in § 4, even universal thresholding can achieve the same minimax optimal rate given the bounded fourth moments assumed there. However, adaptive thresholding as formulated in (3) results in better numerical performance.
Unfortunately, $\hat{\Sigma}^{\mathrm{thr}}$ may not be positive semidefinite, but it can be projected onto the cone of positive-semidefinite matrices through the convex optimization

$$\hat{\Sigma}^{\mathrm{thr}+} = \arg\min_{\Sigma \succeq 0} \| \Sigma - \hat{\Sigma}^{\mathrm{thr}} \|_{\max}. \tag{4}$$

By definition, $\| \hat{\Sigma}^{\mathrm{thr}+} - \hat{\Sigma}^{\mathrm{thr}} \|_{\max} \le \| \Sigma^* - \hat{\Sigma}^{\mathrm{thr}} \|_{\max}$, so the triangle inequality yields

$$\| \hat{\Sigma}^{\mathrm{thr}+} - \Sigma^* \|_{\max} \le 2 \| \hat{\Sigma}^{\mathrm{thr}} - \Sigma^* \|_{\max}.$$

Hence, the price to pay for projection is no more than a factor of two, which does not affect the convergence rate. The projection is a semidefinite program and is easily computed with standard convex optimization software (Boyd & Vandenberghe, 2004).
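Solving the max-norm projection exactly requires a semidefinite programming solver. A common lightweight surrogate, shown here purely for illustration and not the optimization used above, projects in the Frobenius norm instead by clipping negative eigenvalues at zero; it also returns a positive-semidefinite matrix, at the cost of optimizing a different norm.

```python
import numpy as np

def project_psd(A):
    """Frobenius-norm projection of a symmetric matrix onto the
    positive-semidefinite cone: eigendecompose and clip negative
    eigenvalues at zero. Exact fixed point when A is already psd.
    """
    A = (A + A.T) / 2.0                 # symmetrize against round-off
    w, V = np.linalg.eigh(A)            # eigenvalues ascending, V orthonormal
    return (V * np.maximum(w, 0.0)) @ V.T
```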
3. Broadening the scope of the adaptively constrained $\ell_1$-minimization estimator
We consider a robust modification of the adaptively constrained $\ell_1$-minimization estimator of Cai et al. (2016). In a spirit similar to § 2, our robust modification relies on the existence of a pilot estimator $\hat{\Sigma}$ satisfying (2). Construction of the robust adaptive constrained $\ell_1$-minimizer relies on a preliminary projection, resulting in the positive-definite estimator

$$\check{\Sigma} = \arg\min_{\Sigma \succeq \epsilon I_p} \| \Sigma - \hat{\Sigma} \|_{\max} \tag{5}$$

for an arbitrarily small positive number $\epsilon$. The minimization problem in (5) can be rewritten as minimizing $t$ such that $-t \le \sigma_{uv} - \hat{\sigma}_{uv} \le t$ for all $u, v$ and $\Sigma \succeq \epsilon I_p$. This problem can be solved in Matlab using the cvx solver (Grant & Boyd, 2014).
Given $\check{\Sigma} = (\check{\sigma}_{uv})$, our estimator of $\Omega^* = (\Sigma^*)^{-1}$ is constructed by replacing the sample covariance matrix with $\check{\Sigma}$ in the original constrained $\ell_1$-minimization procedure. For ease of reference, the steps are reproduced below. Define the first-stage estimator $\hat{\Omega}^1$ of $\Omega^*$ through the vectors

$$\hat{b}_{\cdot v} = \arg\min_{b \in \mathbb{R}^p} \bigl\{ |b|_1 : |(\check{\Sigma} b - e_v)_u| \le \lambda (\check{\sigma}_{uu} b_v)^{1/2},\ b_v > 0 \bigr\}, \tag{6}$$

with $\lambda = \delta (\log p / n)^{1/2}$, $(\check{\Sigma} b - e_v)_u$ denoting the $u$th entry of $\check{\Sigma} b - e_v$ for $u = 1, \ldots, p$, and $e_v$ being the vector that has value $1$ in the $v$th coordinate and zeros elsewhere. More specifically, define $\hat{\Omega}^1 = (\hat{\omega}^1_{uv})$ as an adjustment of $(\hat{b}_{\cdot 1}, \ldots, \hat{b}_{\cdot p})$ such that the $(v,v)$th entry is

$$\hat{\omega}^1_{vv} = \hat{b}_{vv}\, 1\{ \check{\sigma}_{vv} \le (n/\log p)^{1/2} \} + (\log p / n)^{1/2}\, 1\{ \check{\sigma}_{vv} > (n/\log p)^{1/2} \}, \tag{7}$$

and define the first-stage estimator as $\hat{\Omega}^1$. A second-stage adaptive estimator $\hat{\Omega}^2 = (\hat{\omega}^2_{uv})$ is defined by solving, for each column,

$$\hat{\omega}^2_{\cdot v} = \arg\min_{\omega \in \mathbb{R}^p} \bigl\{ |\omega|_1 : |(\check{\Sigma} \omega - e_v)_u| \le \lambda (\check{\sigma}_{uu}\, \hat{\omega}^1_{vv})^{1/2} \bigr\}, \tag{8}$$

where the constraint is imposed for $u = 1, \ldots, p$. In practice, the optimal values of $\delta$ and $\epsilon$ are chosen by crossvalidation. The final estimator, $\hat{\Omega} = (\hat{\omega}_{uv})$, of $\Omega^*$ is a symmetrized version of $\hat{\Omega}^2$ constructed as

$$\hat{\omega}_{uv} = \hat{\omega}_{vu} = \hat{\omega}^2_{uv}\, 1\{ |\hat{\omega}^2_{uv}| \le |\hat{\omega}^2_{vu}| \} + \hat{\omega}^2_{vu}\, 1\{ |\hat{\omega}^2_{vu}| < |\hat{\omega}^2_{uv}| \}. \tag{9}$$
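Each column solve in the constrained $\ell_1$-minimization step is a linear program. The sketch below recovers one column with a universal constraint level `lam` rather than the entry-adaptive level of the second stage (a deliberate simplification); splitting the coefficient vector into its positive and negative parts makes the $\ell_1$ objective linear.

```python
import numpy as np
from scipy.optimize import linprog

def clime_column(sigma_hat, j, lam):
    """One column of a CLIME-type precision estimate: minimize
    ||beta||_1 subject to ||sigma_hat @ beta - e_j||_inf <= lam,
    written as a linear program in beta = b_pos - b_neg with
    b_pos, b_neg >= 0. Adaptive weights are omitted here.
    """
    p = sigma_hat.shape[0]
    e = np.zeros(p)
    e[j] = 1.0
    c = np.ones(2 * p)                       # objective: sum(b_pos + b_neg)
    A = np.hstack([sigma_hat, -sigma_hat])   # sigma_hat @ (b_pos - b_neg)
    A_ub = np.vstack([A, -A])                # two-sided sup-norm constraint
    b_ub = np.concatenate([e + lam, lam - e])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    assert res.success
    return res.x[:p] - res.x[p:]
```

Running this for each column and then symmetrizing by keeping the smaller-magnitude entry of each $(u,v)$, $(v,u)$ pair mirrors the structure of the procedure described above.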
The theoretical properties of $\hat{\Omega}$ are derived under Conditions 3 and 4.
Condition 3.
The pilot estimator $\hat{\Sigma}$ satisfies (2).
Condition 4.
The matrix $\Omega^*$ belongs to the class

$$\mathcal{G}_q(c_{n,p}, M_{n,p}) = \Bigl\{ \Omega \succ 0 : \|\Omega\|_{L_1} \le M_{n,p},\ \max_u \sum_{v=1}^{p} |\omega_{uv}|^q \le c_{n,p},\ M^{-1} \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le M \Bigr\},$$

where $0 \le q < 1$, $M$ is a constant, and $M_{n,p}$ and $c_{n,p}$ are positive deterministic sequences that are bounded away from zero and allowed to diverge as $n$ and $p$ grow.
In this class of precision matrices, sparsity is imposed by restricting the columns of $\Omega$ to lie in an $\ell_q$-ball of radius $c_{n,p}$ ($0 \le q < 1$).
Theorem 2.
Suppose that Conditions 1, 3 and 4 are satisfied with $\varepsilon_{n,p} = O(p^{-1})$. Under the scaling condition

$$M_{n,p}^{1-q}\, c_{n,p} \Bigl( \frac{\log p}{n} \Bigr)^{(1-q)/2} = o(1),$$

we have, for a positive constant $C$,

$$\operatorname{pr}\Bigl[ \| \hat{\Omega} - \Omega^* \|_2 \le C M_{n,p}^{1-q}\, c_{n,p} \Bigl( \frac{\log p}{n} \Bigr)^{(1-q)/2} \Bigr] \ge 1 - \varepsilon_{n,p},$$

where $\varepsilon_{n,p}$ is a deterministic sequence that decreases to zero as $n, p \to \infty$ and $\hat{\Omega}$ is the robust adaptively constrained $\ell_1$-minimization estimator described in (6)–(9).
Remark 1.
Our class of precision matrices is slightly more restrictive than that considered in Cai et al. (2016), since we require $M^{-1} \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le M$ instead of a bound on $\|\Omega\|_{L_1}$ alone. The difference is marginal since $\Omega \succ 0$ and $\|\Omega\|_{L_1} \le M_{n,p}$ implies that $\lambda_{\max}(\Omega) \le M_{n,p}$. We therefore only exclude precision matrices associated with either exploding or imploding covariance matrices, i.e., we exclude $\lambda_{\max}(\Sigma^*) \to \infty$ and $\lambda_{\min}(\Sigma^*) \to 0$ as $n$ and $p$ grow. Ren et al. (2015) also require bounded eigenvalues.
A positive-semidefinite estimator with the same convergence rate as $\hat{\Omega}$ can be constructed by projecting the symmetric matrix $\hat{\Omega}$ onto the cone of positive-semidefinite matrices, as in (4).
Next, we present three pilot estimators whose performance is favourable with respect to the sample covariance matrix when the sub-Gaussianity assumption is violated. We verify Conditions 1 and 3 for these estimators. Condition 1 will be verified for all three pilot estimators. When $\|\Omega^*\|_{L_1}$ is bounded, Condition 1 implies Condition 3 because $\| \hat{\Sigma} \Omega^* - I_p \|_{\max} \le \| \hat{\Sigma} - \Sigma^* \|_{\max} \|\Omega^*\|_{L_1}$. When $M_{n,p}$ diverges, Condition 3 is verified for the adaptive Huber estimator. We emphasize that Condition 3 is only needed if the goal is to obtain a minimax optimal estimator of $\Omega^*$. A consistent estimator is still attainable if only Condition 1 holds when $M_{n,p}$ diverges. A more thorough discussion appears in the Supplementary Material.
4. Robust pilot estimators
4.1. A rank-based estimator
The rank-based estimator requires only the existence of the second moment. However, it makes arguably more restrictive assumptions, as it requires the distribution of $X$ to be elliptically symmetric.
Definition 1.
A random vector $X$ follows an elliptically symmetric distribution if and only if $X = \mu + \xi A U$, where $\mu \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times q}$ with $\operatorname{rank}(A) = q$, $U$ is uniformly distributed on the unit sphere in $\mathbb{R}^q$, and $\xi$ is a positive random variable independent of $U$.
Observe that $\Sigma^* = D R D$, where $R = (\rho_{uv})$ denotes the correlation matrix and $D = \operatorname{diag}\{(\sigma^*_{11})^{1/2}, \ldots, (\sigma^*_{pp})^{1/2}\}$. Liu et al. (2012) and Xue & Zou (2012) both proposed rank-based estimation of $R$, exploiting a bijective mapping between the Pearson correlation and Kendall's tau or Spearman's rho dependence measures that holds for elliptical distributions. More specifically, Kendall's tau concordance between $X_u$ and $X_v$ is defined as

$$\tau_{uv} = E\{ \operatorname{sgn}(X_u - \tilde{X}_u) \operatorname{sgn}(X_v - \tilde{X}_v) \},$$

where $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)^{\mathrm{T}}$ is an independent copy of $X$. With $X_i = (X_{i1}, \ldots, X_{ip})^{\mathrm{T}}$, the empirical analogue of $\tau_{uv}$ is

$$\hat{\tau}_{uv} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \operatorname{sgn}(X_{iu} - X_{i'u}) \operatorname{sgn}(X_{iv} - X_{i'v}).$$

Since $\rho_{uv} = \sin(\pi \tau_{uv} / 2)$ for elliptical distributions, an estimator of $R$ is $\hat{R} = \{ \sin(\pi \hat{\tau}_{uv} / 2) \}_{u,v}$. An analogous bijection exists between Spearman's rho and Pearson's correlation; see Xue & Zou (2012) for details. We propose to estimate the elements of the diagonal matrix $D$ using a median absolute deviation estimator, $\hat{D} = \operatorname{diag}(\kappa \hat{m}_1, \ldots, \kappa \hat{m}_p)$, where $\hat{m}_u = \operatorname{med}_i |X_{iu} - \operatorname{med}_i(X_{iu})|$. Here, $\operatorname{med}_i$ denotes the median within the index set $\{1, \ldots, n\}$ and $\kappa = 1/Q$ is the Fisher consistency constant, where $F$ is the distribution function of $Z = (X_u - \mu_u)/(\sigma^*_{uu})^{1/2}$ and $Q$ is the median of $|Z|$. Finally, the rank-based estimator is defined as $\hat{\Sigma}^{\mathrm{R}} = \hat{D} \hat{R} \hat{D}$.
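The construction above can be sketched directly. The Gaussian consistency constant $1/\Phi^{-1}(3/4)$ used for the median absolute deviation is an assumption made here for the demonstration; the paper's constant $\kappa$ depends on the underlying elliptical family.

```python
import numpy as np
from scipy.stats import kendalltau, norm

def rank_based_pilot(X):
    """Rank-based pilot covariance estimate for elliptical data.

    Correlations: the sine transform of Kendall's tau,
    r_uv = sin(pi * tau_uv / 2). Scales: median absolute deviation,
    rescaled by the Gaussian constant 1 / Phi^{-1}(3/4) (an
    illustrative choice of the Fisher consistency constant).
    """
    n, p = X.shape
    R = np.eye(p)
    for u in range(p):
        for v in range(u + 1, p):
            tau, _ = kendalltau(X[:, u], X[:, v])
            R[u, v] = R[v, u] = np.sin(np.pi * tau / 2.0)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) / norm.ppf(0.75)
    return R * np.outer(mad, mad)   # D R D with D = diag(mad)
```

Because only signs of pairwise differences enter $\hat{\tau}_{uv}$, heavy-tailed marginals do not inflate the variance of the correlation estimates.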
Proposition 2.
Let $X_1, \ldots, X_n$ be independent and identically distributed copies of the elliptically symmetric random vector $X$ with covariance matrix $\Sigma^*$. Assume that $\max_u \sigma^*_{uu} \le B$ for a constant $B$ and that each standardized component has a distribution function with positive density in a neighbourhood of the median of its absolute value. Then

$$\operatorname{pr}\Bigl\{ \max_{u,v} |\hat{\sigma}^{\mathrm{R}}_{uv} - \sigma^*_{uv}| \ge C_1 \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} \le \varepsilon_{n,p},$$

with $\varepsilon_{n,p} = C_2 p^{-C_3}$ for positive constants $C_1$, $C_2$ and $C_3$.
In estimating marginal variances, we use median absolute deviation estimators to avoid higher moment assumptions. This assumes knowledge of the Fisher consistency constant $\kappa$, without which these marginal variances can be estimated by using the adaptive Huber estimator or the median of means estimator given in the next two subsections. This requires existence of a fourth moment; see Propositions 3 and 5.
4.2. An adaptive Huber estimator
The Huber-type M-estimator only requires the existence of fourth moments. Let $Y_{uv} = X_u X_v$. Then $\sigma^*_{uv} = E(Y_{uv}) - \mu_u \mu_v$, where $\mu_u = E(X_u)$ and $\mu_v = E(X_v)$. We propose to estimate $\sigma^*_{uv}$ robustly through robust estimators of $E(Y_{uv})$, $\mu_u$ and $\mu_v$. For independent and identically distributed copies $Y_1, \ldots, Y_n$ of a real random variable $Y$ with mean $\mu_Y$, Huber's (1964) M-estimator of $\mu_Y$ is defined as the solution $\hat{\mu}$ to

$$\sum_{i=1}^{n} \psi_\tau(Y_i - \mu) = 0, \tag{10}$$

where $\psi_\tau(x) = \operatorname{sgn}(x) \min(|x|, \tau)$ is the Huber function. Replacing $Y_i$ in (10) by $X_{iu} X_{iv}$, $X_{iu}$ and $X_{iv}$ gives the Huber estimators $\hat{E}(Y_{uv})$, $\hat{\mu}_u$ and $\hat{\mu}_v$ of $E(Y_{uv})$, $\mu_u$ and $\mu_v$, respectively, from which the Huber-type estimator of $\sigma^*_{uv}$ is defined as $\hat{\sigma}^{\mathrm{H}}_{uv} = \hat{E}(Y_{uv}) - \hat{\mu}_u \hat{\mu}_v$.
We depart from Huber (1964) by allowing the tuning constant $\tau$ to grow to infinity as $n$ increases, as our objectives differ from those of Huber (1964) and of classical robust statistics (Huber & Ronchetti, 2009). There, the distribution generating the data is assumed to be a contaminated version of a given parametric model, where the contamination level is small, and the objective is to estimate features of the parametric model as if no contamination were present. Our goal is instead to estimate the mean of the underlying distribution, allowing departures from sub-Gaussianity. In related work, Fan et al. (2017) have shown that when $\tau$ is allowed to diverge at an appropriate rate, the Huber estimator of the mean concentrates exponentially fast around the true mean when only a finite second moment exists. In a similar spirit, we allow $\tau$ to grow with $n$ in order to alleviate the bias. An appropriate choice of $\tau$ trades off bias and robustness. We build on Fan et al. (2017) and Catoni (2012), showing that our proposed Huber-type estimator satisfies Conditions 1 and 3.
Proposition 3.
Assume $\max_u E(X_u^4) \le B$ for a constant $B$. Let $\hat{\Sigma}^{\mathrm{H}}$ be the Huber-type estimator with $\tau = K (n / \log p)^{1/2}$ for a sufficiently large constant $K$. Under the scaling condition $\log p = o(n)$ we have, for large $n$ and a constant $C$,

$$\operatorname{pr}\Bigl\{ \max_{u,v} |\hat{\sigma}^{\mathrm{H}}_{uv} - \sigma^*_{uv}| \ge C \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} \le \varepsilon_{n,p},$$

where $\varepsilon_{n,p} = C_1 p^{-C_2}$ for positive constants $C_1$ and $C_2$.
Proposition 3 verifies Condition 1 for $\hat{\Sigma}^{\mathrm{H}}$, provided $\tau$ is chosen to diverge at the appropriate rate. As quantified in Proposition 4, $\hat{\Sigma}^{\mathrm{H}}$ also satisfies Condition 3 when $\tau$ is of the same rate as in Proposition 3. The proof of this result entails extending a large deviation result of Petrov (1995).
Proposition 4.
Assume that $\max_u E(X_u^4) \le B$ and $\Omega^* \in \mathcal{G}_q(c_{n,p}, M_{n,p})$ with $M_{n,p}$ allowed to diverge. Let $\hat{\Sigma}^{\mathrm{H}}$ be the Huber-type estimator defined below (10) with $\tau = K (n / \log p)^{1/2}$ for a sufficiently large constant $K$. Assume that the truncated population covariance matrix $\Sigma^*_\tau$ satisfies $\| (\Sigma^*_\tau - \Sigma^*) \Omega^* \|_{\max} = O\{ (\log p / n)^{1/2} \}$. Under the scaling condition $\log p = o(n)$ we have, for large $n$ and a constant $C$,

$$\operatorname{pr}\Bigl\{ \| \hat{\Sigma}^{\mathrm{H}} \Omega^* - I_p \|_{\max} \ge C \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} \le \varepsilon_{n,p},$$

where $\varepsilon_{n,p} = C_1 p^{-C_2}$ for positive constants $C_1$ and $C_2$.
4.3. A median of means estimator
The median of means estimator was proposed by Nemirovsky & Yudin (1983) and has been further studied by Lerasle & Oliveira (2011), Bubeck et al. (2013) and Joly & Lugosi (2016). It is defined as the median of $k$ means obtained by partitioning the data into $k$ subsamples. A heuristic explanation for its success is that taking means within subsamples results in a more symmetric sample, while the median makes the solution concentrate faster.
Our median of means estimator for $\sigma^*_{uv}$ is constructed as $\hat{\sigma}^{\mathrm{M}}_{uv} = \hat{E}(Y_{uv}) - \hat{\mu}_u \hat{\mu}_v$, where $\hat{E}(Y_{uv})$, $\hat{\mu}_u$ and $\hat{\mu}_v$ are median of means estimators of $E(Y_{uv})$, $\mu_u$ and $\mu_v$, respectively; in each case, each of the $k$ means is computed on a regular partition $B_1, \ldots, B_k$ of $\{1, \ldots, n\}$. It is assumed that $k$ is a factor of $n$.
The value of $k$ is a tuning parameter that affects the accuracy of the median of means estimator. The choice of $k$ involves a compromise between bias and variance. For the extreme cases, $k = n$ and $k = 1$, we obtain respectively the sample median and the sample mean. The latter is asymptotically unbiased but does not concentrate exponentially fast in the presence of heavy tails, while the former concentrates exponentially fast but not to the population mean under asymmetric distributions. Proposition 5 gives the range of $k$ for which both goals are achieved simultaneously.
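The estimator itself is a few lines; the logarithmic default for the number of blocks below is an illustrative assumption in the spirit of the discussion above, not the tuned value of the paper.

```python
import numpy as np

def median_of_means(y, k=None):
    """Median of means: split the sample into k equal blocks,
    average within each block, and return the median of the block
    means. k = 1 recovers the sample mean and k = n the sample
    median; the default k ~ log n is an illustrative compromise.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if k is None:
        k = max(1, int(np.log(n)))
    m = n // k                       # regular partition; drop any remainder
    blocks = y[: m * k].reshape(k, m)
    return np.median(blocks.mean(axis=1))
```

A single outlier can corrupt at most one block mean, and the median of the $k$ block means is insensitive to a minority of corrupted blocks.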
Proposition 5.
Assume $\max_u E(X_u^4) \le B$ for a constant $B$. Let $\hat{\Sigma}^{\mathrm{M}}$ be the median of means estimator described above based on a regular partition $B_1, \ldots, B_k$ with $k = C_3 \log p$ for a positive constant $C_3$. Under the scaling condition $\log p = o(n)$ we have, for large $n$ and a constant $C$,

$$\operatorname{pr}\Bigl\{ \max_{u,v} |\hat{\sigma}^{\mathrm{M}}_{uv} - \sigma^*_{uv}| \ge C \Bigl( \frac{\log p}{n} \Bigr)^{1/2} \Bigr\} \le \varepsilon_{n,p},$$

where $\varepsilon_{n,p} = C_1 p^{-C_2}$ for positive constants $C_1$ and $C_2$.
5. Infinite kurtosis
In the previous discussion we assumed the existence of fourth moments of the components of $X$ for the Huber-type estimator in § 4. We now relax the condition of boundedness of $\max_u E(X_u^4)$ to that of $\max_u E(|X_u|^{2+\epsilon}) < \infty$ for some $\epsilon \in (0, 2)$ and all $u$. The following proposition lays the foundations for the analysis of high-dimensional covariance or precision matrix estimation with infinite kurtosis. It extends Theorem 5 in Fan et al. (2017) and gives rates of convergence for Huber's estimator of a mean assuming a bounded $1+\epsilon$ moment for the variable of interest. The result is optimal in the sense that our rates match the minimax lower bound given in Theorem 3.1 of Devroye et al. (2016). The rates depend on $\epsilon$, and when $\epsilon = 1$ they match those of Catoni (2012) and Fan et al. (2017).
Proposition 6.
Let $\delta \in (0, 1)$, $\epsilon \in (0, 1]$ and $v > 0$, and let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with mean $\mu_Y$ and bounded $1+\epsilon$ moment, i.e., $E(|Y_i - \mu_Y|^{1+\epsilon}) \le v$. Take $\tau = \{ v n / \log(\delta^{-1}) \}^{1/(1+\epsilon)}$. Then, with probability at least $1 - 2\delta$,

$$|\hat{\mu} - \mu_Y| \le C v^{1/(1+\epsilon)} \Bigl\{ \frac{\log(\delta^{-1})}{n} \Bigr\}^{\epsilon/(1+\epsilon)},$$

where $\hat{\mu}$ is the Huber estimator as defined in § 4.2.
Corollary 1.
Under the conditions of Proposition 6 with $\delta = p^{-c}$ for some $c > 0$, the Huber estimator satisfies

$$\operatorname{pr}\Bigl[ |\hat{\mu} - \mu_Y| \ge C v^{1/(1+\epsilon)} \Bigl( \frac{c \log p}{n} \Bigr)^{\epsilon/(1+\epsilon)} \Bigr] = O(p^{-c}).$$

Corollary 1 allows us to generalize the upper bounds of the Huber-type estimator. The following two theorems establish rates of convergence for the adaptive thresholding and the adaptively constrained $\ell_1$-minimization estimators. While we do not prove that these rates are minimax optimal under $2+\epsilon$ finite moments, the proof expands on the elementwise maximum norm convergence of the pilot estimator, which is optimal by Theorem 3.1 of Devroye et al. (2016), and the resulting rates for adaptive thresholding match the minimax rates of Cai & Liu (2011) when $\epsilon = 2$. This is a strong indication that the rates are sharp.
Theorem 3.
Suppose that Condition 2 is satisfied and assume $\max_u E(|X_u|^{2+\epsilon}) \le B$ for some $\epsilon \in (0, 2)$ and a constant $B$. Let $\hat{\Sigma}^{\mathrm{thr}}$ be the adaptive thresholding estimator defined in § 2 based on the Huber pilot estimator $\hat{\Sigma}^{\mathrm{H}}$ with $\tau = K (n / \log p)^{2/(2+\epsilon)}$ for a sufficiently large constant $K$. Under the scaling condition $\log p = o(n)$ and choosing the threshold constant $\delta \ge \delta_0$ for some sufficiently large $\delta_0$, we have, for sufficiently large $n$,

$$\operatorname{pr}\Bigl[ \| \hat{\Sigma}^{\mathrm{thr}} - \Sigma^* \|_2 \le C s_0(p) \Bigl( \frac{\log p}{n} \Bigr)^{(1-q)\epsilon/(2+\epsilon)} \Bigr] \ge 1 - \varepsilon_{n,p},$$

where $\varepsilon_{n,p} = C_1 p^{-C_2}$ for positive constants $C_1$ and $C_2$.
Theorem 4.
Suppose that Condition 4 is satisfied, $M_{n,p}$ is bounded and $\max_u E(|X_u|^{2+\epsilon}) \le B$ for some $\epsilon \in (0, 2)$. Let $\hat{\Omega}$ be the adaptively constrained $\ell_1$-minimization estimator defined in § 3 based on the Huber pilot estimator $\hat{\Sigma}^{\mathrm{H}}$ with $\tau = K (n / \log p)^{2/(2+\epsilon)}$ for a sufficiently large constant $K$. Assume that the truncated population covariance matrix $\Sigma^*_\tau$ satisfies $\| (\Sigma^*_\tau - \Sigma^*) \Omega^* \|_{\max} = O\{ (\log p / n)^{\epsilon/(2+\epsilon)} \}$. Under the scaling condition $c_{n,p} (\log p / n)^{(1-q)\epsilon/(2+\epsilon)} = o(1)$, we have, for sufficiently large $n$,

$$\operatorname{pr}\Bigl[ \| \hat{\Omega} - \Omega^* \|_2 \le C c_{n,p} \Bigl( \frac{\log p}{n} \Bigr)^{(1-q)\epsilon/(2+\epsilon)} \Bigr] \ge 1 - \varepsilon_{n,p},$$

where $\varepsilon_{n,p} = C_1 p^{-C_2}$ for positive constants $C_1$ and $C_2$.
A result similar to Proposition 6 was obtained in Lemma 2 of Bubeck et al. (2013) for the median of means estimator. Expanding on it, we obtain a result analogous to Theorem 3 for the median of means matrix estimator.
6. Finite-sample performance
We illustrate the performance of the estimators discussed in §§ 2 and 3 under a range of data-generating scenarios and for every choice of pilot estimator discussed in § 4. For the adaptive thresholding estimator of $\Sigma^*$, we use a hard thresholding rule with the entry-dependent thresholds of (3). In each of 500 Monte Carlo replications, $n$ independent copies of a random vector $X$ of dimension $p$ are drawn from a model with either a sparse covariance matrix $\Sigma^*$ or a sparse precision matrix $\Omega^*$, depending on the experiment. We consider four different scenarios for the distribution of $X$: the zero-mean multivariate normal distribution; the $t$ distribution with 3.5 degrees of freedom and infinite kurtosis; the skewed $t$ distribution with four degrees of freedom and skew parameter equal to 20; and the contaminated skewed $t$ distribution (Azzalini, 2005) with four degrees of freedom and skew parameter equal to 10. Data in the last scenario are generated from a mixture of distributions $F_1$ and $F_2$ with a small contamination proportion; here $F_1$ is the skewed $t$ distribution generating most of the data, while $F_2$ is a normal distribution with a shifted mean vector and covariance matrix equal to the identity. Any unspecified tuning parameters from the adaptive thresholding estimator and adaptively constrained $\ell_1$-minimization estimator are chosen by crossvalidation to minimize the spectral norm error. Unspecified constants in the tuning parameters of the robust pilot estimators are conservatively chosen to be those that would be optimal if the true distribution were a Student $t$ distribution with five degrees of freedom. We consider the following two structures for $\Sigma^*$ and $\Omega^*$.
(i) Sparse covariance matrix: similar to Model 2 in the simulation section of Cai & Liu (2011), we take the true covariance model to be the block-diagonal matrix $\Sigma^* = \operatorname{diag}(A_1, A_2)$, where $A_1 = B + \epsilon I_{p/2}$, $A_2 = 4 I_{p/2}$, $B = (b_{uv})$ with independent $b_{uv} \sim \operatorname{Un}(0.3, 0.8) \times \operatorname{Ber}(0.2)$, and $\epsilon = \max\{-\lambda_{\min}(B), 0\} + 0.05$ to ensure that $\Sigma^*$ is positive definite.
(ii) Banded precision matrix: following Cai et al. (2016), we take the true precision matrix to be of the banded form $\Omega^* = (\omega_{uv})$, where $\omega_{uu} = 1$, $\omega_{u,u+1} = \omega_{u+1,u} = 0.6$, and $\omega_{uv} = 0$ for $|u - v| \ge 2$.
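For concreteness, a block-diagonal sparse covariance of the type in structure (i) can be generated as follows; the sparsity level, uniform range and diagonal inflation used as defaults are illustrative assumptions patterned on this kind of design, not necessarily the exact values of the experiment.

```python
import numpy as np

def sparse_block_cov(p, rng, sparsity=0.2, lo=0.3, hi=0.8):
    """Block-diagonal sparse covariance in the spirit of model (i):
    Sigma = diag(A1, A2) with A1 = B + eps * I forced positive
    definite and A2 a multiple of the identity.
    """
    h = p // 2
    # sparse symmetric B: uniform magnitudes kept with given probability
    B = rng.uniform(lo, hi, (h, h)) * (rng.random((h, h)) < sparsity)
    B = np.triu(B, 1)
    B = B + B.T                                  # symmetric, zero diagonal
    eps = max(-np.linalg.eigvalsh(B).min(), 0.0) + 0.05
    A1 = B + eps * np.eye(h)
    A2 = 4.0 * np.eye(p - h)
    S = np.zeros((p, p))
    S[:h, :h] = A1
    S[h:, h:] = A2
    return S
```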
Table 1 shows that while the sample covariance estimator performs well in the normally distributed case, when the true model departs from normality, thresholding this estimator gives poor performance, reflected by its elevated estimation error in both the maximum norm and the spectral norm. By contrast, thresholding one of our proposed robust pilot estimators is not similarly degraded under these heavy-tailed distributions. Table 2 shows a similar pattern for the precision matrix estimators. The gains are apparent for all robust pilot estimators, as predicted by our theory.
Table 1.
Estimation errors (with standard errors in parentheses) of the adaptive thresholding estimator of $\Sigma^*$ based on four different pilot estimators; values are averaged over 500 replications

| Distribution | Error | Sample covariance | Adaptive Huber | Median of means | Rank-based |
|---|---|---|---|---|---|
| MVN | $\|\cdot\|_2$ | 2.88 (0.04) | 2.86 (0.04) | 3.31 (0.05) | 3.01 (0.07) |
| MVN | $\|\cdot\|_{\max}$ | 0.98 (0.09) | 0.92 (0.09) | 1.50 (0.14) | 1.61 (0.23) |
| T | $\|\cdot\|_2$ | 8.95 (0.53) | 3.92 (0.06) | 4.46 (0.24) | 5.02 (0.06) |
| T | $\|\cdot\|_{\max}$ | 8.72 (0.55) | 1.87 (0.05) | 3.35 (0.74) | 2.54 (0.04) |
| ST | $\|\cdot\|_2$ | 7.12 (0.17) | 4.88 (0.05) | 4.96 (0.06) | 5.16 (0.06) |
| ST | $\|\cdot\|_{\max}$ | 6.89 (0.18) | 2.41 (0.04) | 2.43 (0.04) | 2.57 (0.04) |
| CST | $\|\cdot\|_2$ | 5.47 (0.23) | 4.14 (0.06) | 4.60 (0.05) | 5.13 (0.06) |
| CST | $\|\cdot\|_{\max}$ | 5.07 (0.27) | 2.02 (0.05) | 2.27 (0.05) | 2.56 (0.04) |

MVN, the normal distribution; T, the $t$ distribution; ST, the skewed $t$ distribution; CST, the contaminated skewed $t$ distribution.
Table 2.
Estimation errors (with standard errors in parentheses) of the adaptively constrained $\ell_1$-minimizers of $\Omega^*$ based on four different pilot estimators; values are averaged over 500 replications

| Distribution | Error | Sample covariance | Adaptive Huber | Median of means | Rank-based |
|---|---|---|---|---|---|
| MVN | $\|\cdot\|_2$ | 2.62 (0.01) | 2.61 (0.01) | 2.59 (0.01) | 2.59 (0.01) |
| MVN | $\|\cdot\|_{\max}$ | 1.05 (0.09) | 1.02 (0.09) | 1.85 (0.28) | 2.90 (0.55) |
| T | $\|\cdot\|_2$ | 2.54 (0.03) | 2.26 (0.02) | 2.43 (0.02) | 2.41 (0.02) |
| T | $\|\cdot\|_{\max}$ | 2.66 (3.96) | 0.81 (0.03) | 1.02 (0.19) | 1.01 (0.19) |
| ST | $\|\cdot\|_2$ | 2.27 (0.15) | 1.97 (0.05) | 2.08 (0.08) | 2.12 (0.08) |
| ST | $\|\cdot\|_{\max}$ | 1.40 (1.59) | 0.97 (0.02) | 1.05 (0.03) | 0.96 (0.02) |
| CST | $\|\cdot\|_2$ | 2.65 (0.02) | 2.01 (0.04) | 2.12 (0.06) | 2.10 (0.06) |
| CST | $\|\cdot\|_{\max}$ | 9.65 (3.76) | 0.97 (0.04) | 2.16 (2.19) | 0.92 (0.03) |
7. Real-data example
A gene regulatory network, also known as a pathway, is a set of genes that interact with each other to control a specific cell function. With recent advances in genomic research, many such networks have been discovered and their functions thoroughly studied. Certain pathways are now known and available in public databases such as KEGG (Ogata et al., 2000). One popular way to infer a gene regulatory network is through estimation of the precision matrix associated with gene expression (Wit & Abbruzzo, 2015). However, such data often contain outliers. To assess whether our robust estimator can improve inference on gene regulatory networks, we use a microarray dataset and compare our findings with generally acknowledged scientific truth from the genomics literature. The microarray data come from a study by Huang et al. (2011) on the inflammation process of cardiovascular disease, which identified the toll-like receptor signalling pathway as playing a key role in that process. The patient data are available from the Gene Expression Omnibus via the accession name GSE20129. We consider 95 genes from the toll-like receptor signalling pathway and another 62 genes from the peroxisome proliferator-activated receptor signalling pathway, which is known to be unrelated to cardiovascular disease. A good method should discover connections for genes within each of the pathways but not across them. We use both the original version of the adaptively constrained $\ell_1$-minimization estimator and our robustified version via the Huber pilot estimator to estimate the precision matrix and thereby the gene regulatory network.
We first choose the tuning parameters that deliver the top 100 connections for each method. Table 3 reports the selection results, also displayed in Fig. 1. Our robust method identifies more connections within each pathway and fewer connections across the pathways.
Table 3.
Number of connections detected by two types of methods

| | Top 100 connections | | | Equal tuning parameters | | |
|---|---|---|---|---|---|---|
| Pilot | Within | Between | Total | Within | Between | Total |
| Huber estimator | 60 | 40 | 100 | 27 | 15 | 42 |
| Sample covariance | 55 | 45 | 100 | 55 | 45 | 100 |
Fig. 1.
Connections estimated by the adaptively constrained $\ell_1$-minimization estimator using (a) the sample covariance and (b) the Huber-type pilot estimator; blue lines represent within-pathway connections and red lines between-pathway connections.
We also tried taking the same tuning parameter in the constrained $\ell_1$-minimization step (8) for each procedure. Table 3 gives the results. Our estimator detects fewer connections; however, the percentage of within-pathway connections estimated using the Huber pilot estimator is much higher than that of the sample covariance estimator. If the genomics literature is correct, our results show that use of the Huber pilot estimator improves inference for this example, in which heavy tails and skewness are present.
Acknowledgement
This work was partially supported by the U.S. National Science Foundation and National Institutes of Health. Avella-Medina was partially supported by the Swiss National Science Foundation. Battey was partially supported by the U.K. Engineering and Physical Sciences Research Council. The authors thank the editor, the associate editor and three referees for valuable comments.
Supplementary material
Supplementary Material available at Biometrika online includes the proofs of all propositions and theorems and additional plots for the real-data example.
References
- Antoniadis, A. & Fan, J. (2001). Regularization of wavelet approximations. J. Am. Statist. Assoc. 96, 939–57.
- Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scand. J. Statist. 32, 159–200.
- Bickel, P. J. & Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36, 2577–604.
- Boyd, S. & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.
- Bubeck, S., Cesa-Bianchi, N. & Lugosi, G. (2013). Bandits with heavy tail. IEEE Trans. Info. Theory 59, 7711–7.
- Cai, T. T. & Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Am. Statist. Assoc. 106, 672–84.
- Cai, T. T., Liu, W. & Luo, X. (2011). A constrained $\ell_1$-minimization approach to sparse precision matrix estimation. J. Am. Statist. Assoc. 106, 594–607.
- Cai, T. T., Liu, W. & Zhou, H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 44, 455–88.
- Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Prob. Statist. 48, 1148–85.
- Chen, M., Gao, C. & Ren, Z. (2015). Robust covariance matrix estimation via matrix depth. arXiv:1506.00691.
- Devroye, L., Lerasle, M., Lugosi, G. & Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44, 2695–725.
- Fan, J., Han, F., Liu, H. & Vickers, B. (2016a). Robust inference of risks of large portfolios. J. Economet. 194, 298–308.
- Fan, J., Li, Q. & Wang, Y. (2017). Estimation of high-dimensional mean regression in absence of symmetry and light-tail assumptions. J. R. Statist. Soc. B 79, 247–65.
- Fan, J., Liao, Y. & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. B 75, 603–80.
- Fan, J., Liu, H. & Wang, W. (2015). Large covariance estimation through elliptical factor models. arXiv:1507.08377.
- Fan, J., Wang, W. & Zhong, Y. (2016b). Robust covariance estimation for approximate factor models. arXiv:1602.00719.
- Grant, M. & Boyd, S. (2014). CVX: Matlab software for disciplined convex programming, version 2.1.
- Huang, C.-C., Liu, K., Pope, R. M., Du, P., Lin, S., Rajamannan, N. M., Huang, Q.-Q., Jafari, N., Burke, G. L., Post, W. et al. (2011). Activated TLR signaling in atherosclerosis among women with lower Framingham risk score: The multi-ethnic study of atherosclerosis. PLoS One 6, e21067.
- Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.
- Huber, P. & Ronchetti, E. (2009). Robust Statistics. Hoboken, New Jersey: Wiley, 2nd edn.
- Joly, E. & Lugosi, G. (2016). Robust estimation of U-statistics. Stoch. Proces. Appl. 126, 3760–73.
- Lerasle, M. & Oliveira, R. (2011). Robust empirical mean estimators. arXiv:1112.3914.
- Liu, H., Han, F., Yuan, M., Lafferty, J. & Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40, 2293–326.
- Loh, P. L. & Tan, X. L. (2015). High-dimensional robust precision matrix estimation: Cellwise corruption under $\epsilon$-contamination. arXiv:1509.07229.
- Nemirovsky, A. S. & Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. Hoboken, New Jersey: Wiley.
- Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. & Kanehisa, M. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.
- Petrov, V. (1995). Limit Theorems of Probability Theory. Oxford: Clarendon Press.
- Ren, Z., Sun, T., Zhang, C.-H. & Zhou, H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43, 991–1026.
- Rothman, A., Bickel, P., Levina, E. & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Statist. 2, 494–515.
- Rothman, A., Levina, E. & Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Am. Statist. Assoc. 104, 177–86.
- Wit, E. C. & Abbruzzo, A. (2015). Inferring slowly-changing dynamic gene-regulatory networks. BMC Bioinformatics 16, S5.
- Xue, L. & Zou, H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann. Statist. 40, 2541–71.