Proc Natl Acad Sci U S A. 2013 Aug 16;110(36):14563–14568. doi: 10.1073/pnas.1307845110

Optimal M-estimation in high-dimensional regression

Derek Bean, Peter J. Bickel, Noureddine El Karoui, and Bin Yu
PMCID: PMC3767535  PMID: 23954907

Abstract

We consider, in the modern setting of high-dimensional statistics, the classic problem of optimizing the objective function in regression using M-estimates when the error distribution is assumed to be known. We propose an algorithm to compute this optimal objective function that takes into account the dimensionality of the problem. Although optimality is achieved under assumptions on the design matrix that will not always be satisfied, our analysis reveals generally interesting families of dimension-dependent objective functions.

Keywords: robust regression, prox function


In this article, we study a classical statistical problem: how to pick optimally (among regression M-estimates) the objective to be minimized in a parametric regression when we know the error distribution. Our study is carried out in the modern setting where the number of predictors is proportional to the number of observations.

The classical answer to the problem we posed at the beginning, maximum likelihood, was given by Fisher (1) in the specific case of multinomial models and then at succeeding levels of generality by Cramér (2), Hájek (3), and, above all, Le Cam (4). For instance, for $p$ fixed or $p/n \to 0$ fast enough, least squares is optimal for Gaussian errors, whereas least absolute deviations (LAD) is optimal for double-exponential errors. We shall show that this is no longer true in the regime we consider, with the answer depending, in general, on the limit of the ratio $p/n$ as well as on the form of the error distribution. Our analysis in this paper is carried out in the setting of Gaussian predictors, although, as we explain below, this assumption should be relaxable to a situation in which the distribution of the predictors satisfies certain concentration properties for quadratic forms.

We carry out our analysis in a regime that has been essentially unexplored, namely $p/n \to \kappa \in (0, 1)$, where $p$ is the number of predictor variables and $n$ is the number of independent observations. Because in most fields of application, situations in which $p$ as well as $n$ are large have become paramount, there has been a huge amount of literature on the case where $p$ may exceed $n$ but the number of "relevant" predictors is small. In this case, the objective function, quadratic (least squares) or otherwise ($\ell_1$ for LAD), has been modified to include a penalty (usually $\ell_1$) on the regression coefficients, which forces sparsity (5). The price paid for this modification is that estimates of individual coefficients are seriously biased, and statistical inference, as opposed to prediction, often becomes problematic.

In ref. 6, we showed that this price need not be paid if $p/n$ stays bounded away from 1, even if the regression does not have a sparse representation. We review the main theoretical results from this previous paper in Result 1 below. Some of our key findings were as follows: (i) Surprisingly, when $p/n$ is bounded away from 0, it is no longer true that LAD is necessarily better than least squares for heavy-tailed errors. This behavior is unlike that in the classical regime of $p$ bounded or $p/n \to 0$ fast enough studied, for instance, in ref. 7. (ii) Linear combinations of regression coefficients are unbiased and still asymptotically Gaussian at rate $n^{-1/2}$.

This article contains three main parts: Background and Main Results contains needed background and a description of our findings. In Computing the Optimal Objective, we give two examples of interest to statisticians: the case of Gaussian errors and the case of double-exponential errors. We present our derivations in the last section.

Background and Main Results

We consider a problem in which we observe $n$ independent, identically distributed pairs $(X_i, Y_i)$, where $X_i \in \mathbb{R}^p$ is a $p$-dimensional vector of predictors and $Y_i$ is a scalar response. We call the problem high-dimensional when the ratio $p/n$ is not close to 0. In effect, we are considering an asymptotic setting where $\lim p/n = \kappa$ is not 0. We also limit ourselves to the case in which $\kappa < 1$. As far as we know, all of the very large body of work developed in robust regression (following ref. 7) is concerned with situations in which $p/n$ tends to 0 as $n$ tends to infinity.

Let us briefly recall the details of the robust regression problem. We consider the estimator

\[
\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^n \rho(Y_i - X_i'\beta),
\]

where $\rho$ is a function from $\mathbb{R}$ to $\mathbb{R}$, which we will assume throughout is convex.§ Furthermore, we consider a linear regression model

\[
Y_i = X_i'\beta_0 + \epsilon_i,
\]

where $\beta_0 \in \mathbb{R}^p$ is unknown and the $\epsilon_i$ are random errors, independent of the $X_i$'s. Naturally, our aim is to estimate $\beta_0$ from our observations $\{(X_i, Y_i)\}_{i=1}^n$, and the question is therefore which $\rho$ we should choose. We can separate this into the following two questions: (i) What choice of $\rho$ minimizes the asymptotic error for estimating an individual regression coefficient (or a given linear form $v'\beta_0$)? (ii) What choice of $\rho$ minimizes the asymptotic prediction error for a new observation $(X_{\text{new}}, Y_{\text{new}})$ given the training data? The answers to (i) and (ii) turn out to be the same in the high-dimensional and Gaussian setting we are considering, just as in the low-dimensional case, but the extension is surprising.
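To fix ideas, here is a minimal numerical sketch of computing such an M-estimate for a generic convex $\rho$ (Huber's function below is only a stand-in example; the function names and simulation settings are our own, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def m_estimate(X, y, rho):
    """M-estimate: argmin_beta sum_i rho(y_i - X_i' beta)."""
    obj = lambda beta: np.sum(rho(y - X @ beta))
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares warm start
    return minimize(obj, beta_init, method="BFGS").x

def huber(t, k=1.345):
    """Huber's function, purely as an example of a convex rho."""
    return np.where(np.abs(t) <= k, t**2 / 2, k * np.abs(t) - k**2 / 2)

rng = np.random.default_rng(0)
n, p = 500, 250                                  # high-dimensional regime: p/n = 0.5
X = rng.standard_normal((n, p))
beta0 = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta0 + rng.laplace(size=n)              # double-exponential errors
beta_hat = m_estimate(X, y, huber)
print(np.linalg.norm(beta_hat - beta0))          # an estimate of r_rho(p, n)
```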

Some Recent High-Dimensional Results.

In a recent paper (6), we found heuristically the following.

Result 1.

[See El Karoui et al. (6).] Suppose the $X_i$ are independent identically distributed (i.i.d.) $\mathcal{N}(0, \Sigma)$, with $\Sigma$ positive definite. Suppose $Y_i = X_i'\beta_0 + \epsilon_i$, the $\epsilon_i$ are i.i.d., independent of $X_i$, $\beta_0$ is deterministic, and $p/n \to \kappa \in (0, 1)$. Call

\[
r_\rho(p, n) = \|\hat\beta - \beta_0\|_\Sigma = \|\Sigma^{1/2}(\hat\beta - \beta_0)\|.
\]

Then we have the stochastic representation

\[
\hat\beta - \beta_0 = r_\rho(p, n)\, \Sigma^{-1/2} u,
\]

where $u$ is uniform on $S^{p-1}$ (the unit sphere in $\mathbb{R}^p$) and independent of $r_\rho(p, n)$.

Let us call $\kappa = \lim p/n$. As $p$ and $n$ tend to infinity, while $p/n \to \kappa$ and $\kappa < 1$, $r_\rho(p, n) \to r_\rho(\kappa)$ in probability (under regularity conditions on $\rho$ and $\epsilon$), where $r_\rho(\kappa)$ is deterministic. Define $z_\epsilon = \epsilon + r_\rho(\kappa) Z$, where $Z \sim \mathcal{N}(0, 1)$ is independent of $\epsilon$, and $\epsilon$ has the same distribution as $\epsilon_i$. We can determine $r_\rho(\kappa)$ through the system

\[
(S)\quad
\begin{cases}
\mathbb{E}\big[\big(\mathrm{prox}_c(\rho)\big)'(z_\epsilon)\big] = 1 - \kappa,\\[2pt]
\mathbb{E}\big[\big(z_\epsilon - \mathrm{prox}_c(\rho)(z_\epsilon)\big)^2\big] = \kappa\, r_\rho^2(\kappa),
\end{cases}
\]

where c is a positive deterministic constant to be determined from the previous system.

The definition of, and details about, the prox mapping are given in Appendix: Reminders. This formulation is important because it shows that what matters about an objective function in high dimension is not really the objective itself but rather its prox, in connection with the distribution of the errors. We also note that our analysis in ref. 6 highlights the fact that the result concerning $r_\rho$ should hold when normality of the predictors is replaced by a concentration of quadratic forms assumption. System S is the basis of our analysis.
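Numerically, system S can be solved by Monte Carlo for any given convex $\rho$ and error distribution. Below is a minimal sketch; the log-parametrization, the generic scalar minimizer standing in for closed-form prox formulas, and all function names are our own choices, not the paper's:

```python
import numpy as np
from scipy.optimize import fsolve, minimize_scalar

def prox(rho, x, c):
    """prox_c(rho)(x) = argmin_z [rho(z) + (x - z)^2 / (2c)]."""
    return minimize_scalar(lambda z: rho(z) + (x - z) ** 2 / (2 * c),
                           bounds=(x - 10 * c - 10, x + 10 * c + 10),
                           method="bounded", options={"xatol": 1e-8}).x

def residuals(params, rho, eps, z, kappa, h=1e-3):
    c, r = np.exp(params)                    # log-parametrization keeps c, r > 0
    zhat = eps + r * z                       # Monte Carlo samples of z_eps
    g = np.array([prox(rho, x, c) for x in zhat])
    gp = np.array([(prox(rho, x + h, c) - prox(rho, x - h, c)) / (2 * h)
                   for x in zhat])           # numerical derivative of the prox
    return [np.mean(gp) - (1 - kappa),                   # first equation of S
            np.mean((zhat - g) ** 2) - kappa * r ** 2]   # second equation of S

# Example: rho = |.| (LAD), double-exponential errors, kappa = 0.5.
rng = np.random.default_rng(1)
eps, z = rng.laplace(size=5_000), rng.standard_normal(5_000)
sol = fsolve(residuals, x0=[0.0, 0.0], args=(np.abs, eps, z, 0.5))
print("r_rho(0.5) approx.", np.exp(sol[1]))
```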

Consequences for estimation of $\beta_0$.

If $v$ is a given deterministic vector, we see that $v'\hat\beta$ is unbiased for $v'\beta_0$ and

\[
\mathrm{var}(v'\hat\beta) = \mathbb{E}\big[r_\rho^2(p, n)\big]\, \frac{v'\Sigma^{-1}v}{p} \simeq r_\rho^2(\kappa)\, \frac{v'\Sigma^{-1}v}{p}.
\]

In other words, in high dimension, the simple estimator $v'\hat\beta$ is $\sqrt{n}$-consistent for $v'\beta_0$. We further note that $v'\hat\beta$ is asymptotically normal, and its variance can be estimated, so inference about $v'\beta_0$ is easy. More details are in SI Text, including, for instance, explicit confidence intervals. Picking $v$ to be the $k$th canonical basis vector, $e_k$, we also see that we can consistently estimate the $k$th coordinate of $\beta_0$, $\beta_0(k)$, at rate $n^{-1/2}$.
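The variance formula follows in one line from the stochastic representation in Result 1: $u$ is uniform on the unit sphere and independent of $r_\rho(p, n)$, so

\[
\mathbb{E}[uu'] = \frac{1}{p}\,\mathrm{Id}_p
\quad\Longrightarrow\quad
\mathrm{var}(v'\hat\beta) = \mathbb{E}\big[r_\rho^2(p, n)\big]\; v'\Sigma^{-1/2}\,\mathbb{E}[uu']\,\Sigma^{-1/2}v
= \mathbb{E}\big[r_\rho^2(p, n)\big]\;\frac{v'\Sigma^{-1}v}{p}.
\]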

A similar analysis can be performed to obtain unbiased (and consistent) estimators of quadratic forms in $\beta_0$, i.e., quantities of the form $\beta_0' C \beta_0$, where $C$ is a given covariance matrix.

The situation in which the vector $v$ is random (and, for instance, dependent on the design) is beyond the scope of this paper. We note that Huber (7) gives early, very partial results on this problem.

On expected prediction error.

In the case in which $(X_{\text{new}}, Y_{\text{new}})$ follows the model above and is independent of the training data $\{(X_i, Y_i)\}_{i=1}^n$, we immediately see (because, conditionally on $\hat\beta$, $X_{\text{new}}'(\hat\beta - \beta_0) \sim \mathcal{N}(0, \|\hat\beta - \beta_0\|_\Sigma^2)$) that

\[
\mathrm{EPE} = \mathbb{E}\big[(Y_{\text{new}} - X_{\text{new}}'\hat\beta)^2\big] = \mathbb{E}[\epsilon^2] + \mathbb{E}\big[\|\hat\beta - \beta_0\|_\Sigma^2\big] \simeq \mathbb{E}[\epsilon^2] + r_\rho^2(\kappa).
\]

Picking $\rho$ to minimize the quantity $r_\rho(\kappa)$ (viewed as a function of $\rho$) will allow us to get the best estimators (in the class we are considering) for both expected prediction error (EPE) and, as it turns out, linear forms in $\beta_0$.

Potential Limitations of the Model.

Assumptions concerning the predictors.

The impact of changing the distributional assumptions on the predictors on Result 1 is explored and explained in detail in our papers (6) and especially ref. 8. We present in ref. 8 design matrices for which Result 1 does not hold. For those designs, using the family of objective functions described in Theorem 1 below will not yield optimal results; however, using dimension-dependent objective functions may still yield an improvement over using standard, dimension-independent objective functions. We also note that parts of Result 1 hold under less restrictive assumptions, but one key limitation is that our generalizations do not include categorical predictors.

The question of the intercept.

The model and estimation problem described in Result 1 do not include an intercept. Furthermore, both $X_i$ and $\epsilon_i$ are assumed to have mean 0. However, if we recenter both the responses $Y_i$'s and the predictors $X_i$'s and fit a linear model (without intercept) on this recentered dataset, we have a solution to these problems that essentially fits into the framework of our analysis. The corresponding $\hat\beta$ has the same stochastic representation properties as the ones described in Result 1, and hence inference about $\beta_0$ is possible in this case, too. More details and justifications are provided in SI Text. We note that, as presented in this paper, our results do not allow us to address inferential questions involving the intercept. However, in separate and ongoing work, we have obtained results that suggest that we will be able to answer some of these inferential questions, as well as issues pertaining to the fitted values [see Huber (7), remarks 2 and 3, pp. 803–804, for early comments about the difficulty they pose in the high-dimensional setting].

The rest of the paper is focused on the model and statistical problem discussed in Result 1.

Main Result.

We propose an algorithm to determine the asymptotically optimal objective function to use in robust regression. As explained in SI Text, under the assumptions of Result 1, this optimal objective function is the same regardless of the norm chosen to assess the performance of our regression estimator. Just as in the classical case, determining the optimal objective function requires knowledge of the distribution of the errors, which we call $\epsilon$. We call the density of the errors $f_\epsilon$ and assume that $f_\epsilon$ is log-concave.

If $\phi_\sigma$ is the normal density with variance $\sigma^2$ and $\hat p = f_\epsilon \star \phi_\sigma$, where $\star$ is the usual convolution operation, then $\hat p$ is log-concave. As a matter of fact, it is well known (9, 10) that the convolution of two log-concave densities is log-concave.

We call $I(f) = \int (f')^2/f$ the (Fisher) information of a density $f$, which we assume exists for all the densities we consider. It is known that, when $\epsilon$ has a density, the map $r \mapsto I(f_{\epsilon + rZ})$ is continuous (11), where it is also explained that it is differentiable; see also ref. 12.

Throughout the paper, we denote by $\phi_r$ the Gaussian density taking value

\[
\phi_r(x) = \frac{1}{r\sqrt{2\pi}} \exp\left(-\frac{x^2}{2r^2}\right).
\]

Here is our theorem.

Theorem 1.

If $r_\rho(\kappa)$ is a solution of system S, we have $r_\rho(\kappa) \ge r_{\mathrm{opt}}(\kappa)$, where $r_{\mathrm{opt}}(\kappa) = \min\{r \ge 0 : r^2\, I(f_\epsilon \star \phi_r) = \kappa\}$.

Furthermore, $r_{\mathrm{opt}}(\kappa)$ is the solution of system S when $\rho = \rho_{\mathrm{opt}}$, and $\rho_{\mathrm{opt}}$ is the convex function

\[
\rho_{\mathrm{opt}} = \Big( \big({-}r_{\mathrm{opt}}^2(\kappa)\, \log(f_\epsilon \star \phi_{r_{\mathrm{opt}}(\kappa)})\big)^* - q \Big)^*, \qquad q(x) = \frac{x^2}{2}.
\]

[For a function $g$, $g^*$ is its (Fenchel–Legendre) conjugate, i.e., $g^*(x) = \sup_y\, (xy - g(y))$.]

We give an alternative representation of $\rho_{\mathrm{opt}}$ in Appendix: Reminders.

We propose the following algorithm for computing the optimal objective function under the assumptions of the theorem.

  • 1. Solve for r the equation

\[
r^2\, I(f_\epsilon \star \phi_r) = \kappa. \tag{1}
\]

Define $\hat p_{\mathrm{opt}} = f_\epsilon \star \phi_{r_{\mathrm{opt}}(\kappa)}$, where $r_{\mathrm{opt}}(\kappa)$ is the smallest solution of Eq. 1.

  • 2. Use the objective function

\[
\rho_{\mathrm{opt}} = \big( ({-}r_{\mathrm{opt}}^2(\kappa)\, \log \hat p_{\mathrm{opt}})^* - q \big)^*. \tag{2}
\]

The theorem and the algorithm raise a few questions: Is there a solution to the equation in step 1? Is the min well defined? Is the objective function in step 2 convex? We address all these questions in the course of the paper.

The significance of the algorithm lies in the fact that we are now able to incorporate dimensionality in our optimal choice of ρ. In other words, different objectives turn out to be optimal as the ratio of dimensions varies. Naturally, the optimality of the objective function is restricted to situations in which our assumptions on the design matrix are satisfied. However, more generally, our results provide practitioners with a “natural” family of objective functions that could be tried if one would like to use a dimension-dependent objective function in a high-dimensional regression problem. This family of functions is also qualitatively different from the ones that appear naturally in the small p setting, as our derivation (see below) shows.

Because our estimators are M-estimators, they are not immune to inadmissibility problems, as has been understood, in an even simpler context, since the appearance of James–Stein estimators. Picking $\rho_{\mathrm{opt}}$ instead of another $\rho$, although improving performance when our assumptions are satisfied, does not remove this potential problem.

It should also be noted that, at least numerically, computing $\rho_{\mathrm{opt}}$ is not very hard. Similarly, solving Eq. 1 is not hard numerically. Hence, the algorithm is effective as soon as we have information about the distribution of $\epsilon$.
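To make step 1 concrete, here is a minimal quadrature-based sketch of solving Eq. 1; the truncation limits, step sizes, tolerances, and function names are our own choices for illustration, not the paper's implementation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def p_hat(x, r, f_eps):
    """Density of eps + r*Z, i.e., (f_eps * phi_r)(x), by quadrature."""
    phi_r = lambda u: np.exp(-u**2 / (2 * r**2)) / (r * np.sqrt(2 * np.pi))
    return quad(lambda u: f_eps(u) * phi_r(x - u), -np.inf, np.inf)[0]

def fisher_information(r, f_eps, h=1e-4, lim=30.0):
    """I(p_hat_r) = int (p')^2 / p, with p' by central differences (slow but simple)."""
    def integrand(x):
        p = p_hat(x, r, f_eps)
        dp = (p_hat(x + h, r, f_eps) - p_hat(x - h, r, f_eps)) / (2 * h)
        return dp**2 / p if p > 1e-300 else 0.0
    return quad(integrand, -lim, lim, limit=200)[0]

def r_opt(kappa, f_eps, r_lo=1e-2, r_hi=20.0):
    """Solve Eq. 1, i.e., r^2 I(p_hat_r) = kappa, by bisection on [r_lo, r_hi]."""
    xi = lambda r: r**2 * fisher_information(r, f_eps) - kappa
    return brentq(xi, r_lo, r_hi)

laplace = lambda x: 0.5 * np.exp(-np.abs(x))     # double-exponential density
print(r_opt(0.5, laplace))
```

As a sanity check, replacing the Laplace density above by a standard Gaussian one should return approximately $\sqrt{\kappa/(1-\kappa)}$, consistent with the Gaussian computation in the next section.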

As the reader will have noticed, a crucial role is played by $f_\epsilon \star \phi_r$, the density of $\epsilon + rZ$. In the rest of the paper, we use the lighter notation

\[
\hat p_r = f_\epsilon \star \phi_r.
\]

The dependence of $r_\rho$ on $p$ and $n$ is left implicit in general but will be brought back when there is any risk of confusion.

Next, we illustrate our algorithm in a few special cases.

Computing the Optimal Objective

The Case of Gaussian Errors.

Corollary 1.

In the setting of i.i.d. Gaussian predictors, among all convex objective functions, least squares ($\rho(x) = x^2/2$) is optimal in regression when the errors are Gaussian.

In the case of Gaussian $\epsilon$, it is clear that $\hat p_r$ is a Gaussian density. Hence, $-\log \hat p_r$ is a multiple of $x^2$ (up to centering), and so is $\rho_{\mathrm{opt}}$. General arguments given later guarantee that this latter multiple is strictly positive. Therefore, $\rho_{\mathrm{opt}}$ is $x^2$, up to positive scaling and centering. Carrying out the computations detailed in the algorithm, we actually arrive at $r_{\mathrm{opt}}^2(\kappa) = \sigma^2\kappa/(1-\kappa)$ when $\mathrm{var}(\epsilon) = \sigma^2$. Details are in SI Text.
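Indeed, for $\epsilon \sim \mathcal{N}(0, \sigma^2)$ the computation behind step 1 takes one line: $\epsilon + rZ \sim \mathcal{N}(0, \sigma^2 + r^2)$, so

\[
I(\hat p_r) = \frac{1}{\sigma^2 + r^2},
\qquad \text{and Eq. 1 reads}\quad
\frac{r^2}{\sigma^2 + r^2} = \kappa
\ \Longrightarrow\
r_{\mathrm{opt}}^2(\kappa) = \frac{\sigma^2\kappa}{1 - \kappa}.
\]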

The Case of Double-Exponential Errors.

We recall that in low dimension (e.g., $p$ fixed, $n$ going to infinity), classic results show that the optimal objective is $\rho(x) = |x|$, i.e., LAD. As we will see, this is not at all the case when $p$ and $n$ grow in such a way that $p/n$ has a finite limit in $(0, 1)$. We recall that, in ref. 6, we observed that when $p/n$ was greater than 0.3 or so, $\ell_2$ actually performed better than $\ell_1$ for double-exponential errors.

Although there is no analytic form for the optimal objective, it can be computed numerically. We discuss how and present a picture to get a better understanding of the solution of our problem.

The Optimal Objective.

For $r > 0$, $x \in \mathbb{R}$, and $\Phi$ the Gaussian cumulative distribution function, let us define

\[
G_r(x) = e^{-x}\,\Phi\!\Big(\frac{x}{r} - r\Big) + e^{x}\,\Phi\!\Big(-\frac{x}{r} - r\Big).
\]

It is easy to verify that, when the errors are double exponential, $\hat p_r(x) = \tfrac{1}{2}e^{r^2/2}\, G_r(x)$. Hence, effectively, the optimal objective is the function taking values

\[
\rho_{\mathrm{opt}}(x) = \big( ({-}r_{\mathrm{opt}}^2(\kappa)\, \log G_{r_{\mathrm{opt}}(\kappa)})^* - q \big)^*(x).
\]

It is of course important to be able to compute this function and the estimate $\hat\beta$ based on it. We show below that $-r^2\psi_r$, where $\psi_r = \log \hat p_r$, is a smooth convex function for all $r$. Hence, in the case we are considering, $(-r^2\psi_r)'$ is increasing and therefore invertible. If we call $h_r = (-r^2\psi_r)^* - q$, we see that $\rho_{\mathrm{opt}} = h_r^*$ for $r = r_{\mathrm{opt}}(\kappa)$. We also need to be able to compute the derivative of $\rho_{\mathrm{opt}}$ (denoted $\rho_{\mathrm{opt}}'$) to implement a gradient descent algorithm to compute $\hat\beta$. For this, we can use a well-known result in convex analysis, which says that for a convex function $h$ (under regularity conditions), $(h^*)' = (h')^{-1}$ (see ref. 13, corollary 23.5.1).
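For the double-exponential case, this can be implemented in a few lines via the implicit representation $\rho_{\mathrm{opt}}'(x + r^2\psi_r'(x)) = -r^2\psi_r'(x)$ given in Appendix: Reminders. A minimal sketch (our own function names and step sizes, with $r$ taken to be $r_{\mathrm{opt}}(\kappa)$ from step 1):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def log_p_hat(x, r):
    """log p_hat_r(x) for Laplace errors, up to a constant, computed stably."""
    return np.logaddexp(-x + norm.logcdf(x / r - r),
                        x + norm.logcdf(-x / r - r))

def psi_prime(x, r, h=1e-5):
    """Numerical derivative of psi_r = log p_hat_r."""
    return (log_p_hat(x + h, r) - log_p_hat(x - h, r)) / (2 * h)

def rho_opt_prime(t, r):
    """Solve x + r^2 psi_r'(x) = t (monotone in x); then rho_opt'(t) = x - t."""
    m = lambda x: x + r**2 * psi_prime(x, r) - t
    width = 50 * (1 + r**2)            # |psi_r'| <= 1 here, so this brackets the root
    x = brentq(m, t - width, t + width)
    return x - t                       # equals -r^2 psi_r'(x)

print(rho_opt_prime(2.0, 1.0))
```

The function rho_opt_prime can then be fed to any first-order solver for the M-estimation problem, e.g., gradient steps on $\beta \mapsto \sum_i \rho_{\mathrm{opt}}(Y_i - X_i'\beta)$.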

We present a plot to get an intuitive feeling for how this function $\rho_{\mathrm{opt}}$ behaves (more can be found in SI Text). Fig. 1 compares $\rho_{\mathrm{opt}}$ to other objective functions of potential interest in the case $p/n = 0.5$. All of the functions we compare are normalized so that they take value 0 at 0 and 1 at 1.

Fig. 1. $p/n = 0.5$: comparison of $\rho_{\mathrm{opt}}$ (optimal objective) to $\ell_1$ and $\ell_2$; all functions are normalized to take value 0 at 0 and 1 at 1.

Comparison of Asymptotic Performance of $\rho_{\mathrm{opt}}$ Against Other Objective Functions.

We compare $\rho_{\mathrm{opt}}$ to the results we would get using other objective functions $\rho$ in the case of double-exponential errors. Recall that our system S allows us to compute the asymptotic value of $\|\hat\beta - \beta_0\|_\Sigma$, namely $r_\rho(\kappa)$, as $n$ and $p$ go to infinity for any convex (and sufficiently regular) $\rho$.

Comparison of $\rho_{\mathrm{opt}}$ to $\ell_2$.

We compare $r_{\mathrm{opt}}^2(\kappa)$ to $r_{\ell_2}^2(\kappa)$ in Fig. 2. Interestingly, $\rho_{\mathrm{opt}}$ yields a $\hat\beta$ that is twice as efficient as $\ell_2$ as $\kappa$ goes to 0. From classical results in robust regression ($p$ bounded), we know that this is optimal, because the $\ell_1$ objective is optimal in that setting and also yields estimators that are twice as efficient as $\ell_2$.

Fig. 2. Ratio $r_{\mathrm{opt}}^2(\kappa)/r_{\ell_2}^2(\kappa)$ for double-exponential errors: the ratio is always less than 1, showing the superiority of the objective we propose over $\ell_2$.

Comparison of $\rho_{\mathrm{opt}}$ to $\ell_1$.

We compare $r_{\mathrm{opt}}^2(\kappa)$ to $r_{\ell_1}^2(\kappa)$ in Fig. 3. Naturally, the ratio goes to 1 when $\kappa$ goes to 0, because $\ell_1$, as we just mentioned, is known to be the optimal objective function as $\kappa$ tends to 0.

Fig. 3. Ratio $r_{\mathrm{opt}}^2(\kappa)/r_{\ell_1}^2(\kappa)$ for double-exponential errors: the ratio is always less than 1, showing the superiority of the objective we propose over $\ell_1$. Naturally, the ratio goes to 1 at 0, because we know that $\ell_1$ is the optimal objective when $\kappa \to 0$ for double-exponential errors.

Simulations.

We investigate the empirical behavior of estimators computed under our proposed objective function. We call those estimators $\hat\beta_{\mathrm{opt}}$. Table 1 shows the mean of $\|\hat\beta_{\mathrm{opt}} - \beta_0\|^2/\|\hat\beta_{\ell_2} - \beta_0\|^2$ over 1,000 simulations when $n = 500$ for different ratios of dimensions and compares the empirical results to $r_{\mathrm{opt}}^2(\kappa)/r_{\ell_2}^2(\kappa)$, the theoretical values. We used $n = 500$, $\Sigma = \mathrm{Id}_p$, and double-exponential errors in our simulations.

Table 1.

$n = 500$: mean of $\|\hat\beta_{\mathrm{opt}} - \beta_0\|^2/\|\hat\beta_{\ell_2} - \beta_0\|^2$ over 1,000 independent simulations

p/n 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Observed mean ratio 0.6924 0.7732 0.8296 0.8862 0.9264 0.9614 0.9840 0.9959 0.9997
Predicted mean ratio 0.6842 0.7626 0.8224 0.8715 0.9124 0.9460 0.9721 0.9898 0.9986
|Relative error|, % 1.2 1.4 0.8 1.7 1.5 1.6 1.2 0.6 0.1

In SI Text, we also provide statistics concerning $\|\hat\beta_{\mathrm{opt}} - \beta_0\|^2$ computed over 1,000 simulations. We note that our predictions concerning $\mathbb{E}\|\hat\beta_{\mathrm{opt}} - \beta_0\|^2$ work very well in expectation when $p$ and $n$ are a few hundred, even though in these dimensions, $\|\hat\beta_{\mathrm{opt}} - \beta_0\|$ is not yet close to being deterministic (see SI Text for details; these remarks also apply to $\|\hat\beta_\rho - \beta_0\|$ for more general $\rho$).

Derivations

We prove Theorem 1 assuming the validity of Result 1.

Phrasing the Problem as a Convex Feasibility Problem.

Let us call $z_\epsilon = \epsilon + r_\rho(\kappa) Z$, where $Z \sim \mathcal{N}(0, 1)$ is independent of $\epsilon$. We now assume throughout that $\Sigma = \mathrm{Id}_p$ and call $r_\rho(\kappa)$ simply $r_\rho$ for notational simplicity. We recall that, for $c > 0$, $\mathrm{prox}_c(\rho) = \mathrm{prox}_1(c\rho)$ (Appendix: Reminders). From now on, we call $\mathrm{prox}_1$ just prox. If $r_\rho$ is feasible for our problem, there is a $\rho$ that realizes it, and the system S is therefore, with $z_\epsilon = \epsilon + r_\rho Z$,

\[
\begin{cases}
\mathbb{E}\big[(\mathrm{prox}(c\rho))'(z_\epsilon)\big] = 1 - \kappa,\\[2pt]
\mathbb{E}\big[(z_\epsilon - \mathrm{prox}(c\rho)(z_\epsilon))^2\big] = \kappa\, r_\rho^2.
\end{cases}
\]

Now it is clear that if we replace $\rho$ by $\lambda\rho$, $\lambda > 0$, we do not change $\hat\beta$. In particular, if we call $\rho_c = c\rho$, where $c$ is the real appearing in the system above, we have, if $r_\rho$ is feasible: there exists a convex $\rho_c$ such that

\[
\begin{cases}
\mathbb{E}\big[(\mathrm{prox}(\rho_c))'(z_\epsilon)\big] = 1 - \kappa,\\[2pt]
\mathbb{E}\big[(z_\epsilon - \mathrm{prox}(\rho_c)(z_\epsilon))^2\big] = \kappa\, r_\rho^2.
\end{cases}
\]

We can now rephrase this system using the fundamental equality (see ref. 14 and Appendix: Reminders) $\mathrm{prox}(\rho_c) + \mathrm{prox}(\rho_c^*) = \mathrm{Id}$, where $\rho_c^*$ is the (Fenchel–Legendre) conjugate of $\rho_c$. It becomes

\[
\begin{cases}
\mathbb{E}\big[(\mathrm{prox}(\rho_c^*))'(z_\epsilon)\big] = \kappa,\\[2pt]
\mathbb{E}\big[(\mathrm{prox}(\rho_c^*)(z_\epsilon))^2\big] = \kappa\, r_\rho^2.
\end{cases}
\]

Prox mappings are known to belong to subdifferentials of convex functions and to be contractive (see ref. 14, p. 292, corollaire 10.c). Let us call $g = \mathrm{prox}(\rho_c^*)$ and recall that $\hat p_r$ denotes the density of $z_\epsilon$. Because $g$ is contractive, $g(x) = O(|x|)$ as $|x| \to \infty$. Because $\hat p_r$ is a log-concave density (as a convolution of two log-concave densities; see refs. 9 and 10) with support $\mathbb{R}$, it goes to zero at infinity exponentially fast (see ref. 15, p. 332). We can therefore use integration by parts in the first equation to rewrite the previous system as (we now use $r$ instead of $r_\rho$ for simplicity)

\[
\begin{cases}
-\int g(x)\, \hat p_r'(x)\, dx = \kappa,\\[2pt]
\int g^2(x)\, \hat p_r(x)\, dx = \kappa r^2.
\end{cases}
\]

Because $\hat p_r(x) > 0$ for all $x$, we can multiply and divide by $\sqrt{\hat p_r(x)}$ inside the integral of the first equation and use the Cauchy–Schwarz inequality to get

\[
\kappa^2 = \left( \int g(x)\sqrt{\hat p_r(x)}\; \frac{\hat p_r'(x)}{\sqrt{\hat p_r(x)}}\, dx \right)^2
\le \int g^2(x)\, \hat p_r(x)\, dx \int \frac{(\hat p_r'(x))^2}{\hat p_r(x)}\, dx
= \kappa r^2\, I(\hat p_r).
\]

It follows that

\[
\kappa \le r^2\, I(\hat p_r). \tag{3}
\]

We now seek a $\rho$ to achieve this lower bound on $r$.

Achieving the Lower Bound.

It is clear that a good $g$ [which is $\mathrm{prox}(\rho_c^*)$] should saturate the Cauchy–Schwarz inequality above. Let $\psi_r = \log \hat p_r$. A natural candidate is

\[
g_{\mathrm{opt}}(x) = -r^2\, \psi_r'(x) = -r^2\, \frac{\hat p_r'(x)}{\hat p_r(x)}.
\]

It is easy to see that, for this function $g_{\mathrm{opt}}$, the two equations of the system are satisfied (the way we have chosen $r$, namely $r^2 I(\hat p_r) = \kappa$, is of course key here). However, we need to make sure that $g_{\mathrm{opt}}$ is a valid choice; in other words, it needs to be the prox of a certain (convex) function.
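Explicitly, substituting $g_{\mathrm{opt}} = -r^2\,\hat p_r'/\hat p_r$ and using $r^2 I(\hat p_r) = \kappa$:

\[
-\int g_{\mathrm{opt}}\, \hat p_r'\, dx = r^2 \int \frac{(\hat p_r')^2}{\hat p_r}\, dx = r^2\, I(\hat p_r) = \kappa,
\qquad
\int g_{\mathrm{opt}}^2\, \hat p_r\, dx = r^4\, I(\hat p_r) = \kappa r^2.
\]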

To verify that $g_{\mathrm{opt}}$ is indeed a prox mapping, we use ref. 14. By proposition 9.b on p. 289 in ref. 14, it suffices to establish that, for all $x$, the function

\[
x \mapsto -r^2\, \psi_r(x)
\]

is convex and less convex than $q(x) = x^2/2$. That is, there exists a convex function $\gamma$ such that $q = -r^2\psi_r + \gamma$.

When $\epsilon$ has a log-concave density, it is well known that $\hat p_r$ is log-concave. $-r^2\psi_r$ is therefore convex.

Furthermore, for a constant K,

\[
q(x) + r^2\, \psi_r(x) = r^2 \log\left( K \int \exp\!\left(\frac{ux}{r^2}\right) \exp\!\left(-\frac{u^2}{2r^2}\right) f_\epsilon(u)\, du \right).
\]

It is clear that the right-hand side is convex in $x$ (as $r^2$ times the logarithm of a moment generating function). Hence, $-r^2\psi_r$ is less convex than $q$. Thus, $g_{\mathrm{opt}}$ is a prox function and a valid choice for our problem.

Determining $\rho_{\mathrm{opt}}$ from $g_{\mathrm{opt}}$.

Let us now recall another result of ref. 14. We denote the inf-convolution operation by $\square$. More details about $\square$ are given in SI Text and Appendix: Reminders. If $\gamma$ is a (proper, closed) convex function and $q(x) = x^2/2$, we have (see ref. 14, p. 286)

\[
\nabla(\gamma \,\square\, q) = \mathrm{prox}(\gamma^*).
\]

Recall that $g_{\mathrm{opt}} = \mathrm{prox}(\rho_c^*) = \nabla(-r^2\psi_r)$. So, up to constants that do not matter, we have

\[
\rho_c \,\square\, q = -r^2\, \psi_r.
\]

It is easy to see (SI Text) that, for any function $f$,

\[
(f \,\square\, q)^* = f^* + q.
\]

So we have $\rho_c^* + q = (-r^2\psi_r)^*$. Now, for a proper, closed, convex function $\gamma$, we know that $(\gamma^*)^* = \gamma$. Hence,

\[
\rho_{\mathrm{opt}} = \rho_c = \big( ({-}r^2\psi_r)^* - q \big)^*.
\]

Convexity of $\rho_{\mathrm{opt}}$.

We still need to make sure that the function $\rho_{\mathrm{opt}}$ we have obtained is convex. We once again appeal to ref. 14, proposition 9.b. Because $-r^2\psi_r$ is convex and less convex than $q$, it is the Moreau envelope $\rho_c \,\square\, q$ of a convex function $\rho_c$, and therefore $(-r^2\psi_r)^* - q = \rho_c^*$ is convex. Because $\rho_{\mathrm{opt}}$ is the conjugate of this proper, closed, convex function, $\rho_{\mathrm{opt}}$ is convex.

Minimality of $r_{\mathrm{opt}}(\kappa)$.

The fundamental inequality we have obtained is Eq. 3, which says that, for any feasible $r$ [i.e., any $r = r_\rho(\kappa)$ arising from system S for some convex $\rho$], we have $\kappa \le r^2 I(\hat p_r)$. Our theorem requires solving the equation $r^2 I(\hat p_r) = \kappa$. Let us study the properties of the solutions of this equation.

Let us call $\xi$ the function such that $\xi(r) = r^2 I(\hat p_r)$. We note that $I(\hat p_r)$ is the information of $\epsilon + rZ$, where $Z \sim \mathcal{N}(0, 1)$ and is independent of $\epsilon$. Hence $\xi(r) \to 0$ as $r \to 0$ and $\xi(r) \to 1$ as $r \to \infty$. This is easily established using the information inequality $1/I(X + Y) \ge 1/I(X) + 1/I(Y)$ when $X$ and $Y$ are independent ($I$ is the Fisher information; see, e.g., ref. 16). As a matter of fact, $\xi(r) \le r^2 I(f_\epsilon) \to 0$ as $r \to 0.$ However, $\xi(r) \le r^2/(r^2 + 1/I(f_\epsilon)) < 1$. Finally, as $r \to \infty$, it is clear that $\xi(r) \to 1$ (see SI Text for details). Using the fact that $\xi$ is continuous (see, e.g., ref. 11), we see that the equation $\xi(r) = \kappa$ has at least one solution for all $\kappa \in (0, 1)$.

Let us recall that we defined our solution as $\inf\{r : \xi(r) = \kappa\}$. Denote this quantity by $r_{\mathrm{opt}}(\kappa)$. We need to show two facts to guarantee optimality of $r_{\mathrm{opt}}(\kappa)$: (i) the inf is really a min; (ii) $r_{\mathrm{opt}}(\kappa) \le r$, for all feasible $r$'s (i.e., $r$'s such that $\xi(r) \ge \kappa$). (i) follows easily from the continuity of $\xi$ and lower bounds detailed in SI Text.

We now show that, for all feasible $r$'s, $r_{\mathrm{opt}}(\kappa) \le r$. Suppose it is not the case. Then, there exists $r_0 < r_{\mathrm{opt}}(\kappa)$ that is feasible. Because $r_0$ is feasible, $\xi(r_0) \ge \kappa$. Clearly, $\xi(r_0) \ne \kappa$, for otherwise we would have $\xi(r_0) = \kappa$ with $r_0 < r_{\mathrm{opt}}(\kappa)$, which would violate the definition of $r_{\mathrm{opt}}(\kappa)$. Now recall that $\xi(r) \to 0$ as $r \to 0$. By continuity of $\xi$, because $\xi(r_0) > \kappa$, there exists $r_1 < r_0$ such that $\xi(r_1) = \kappa$. However, $r_1 < r_{\mathrm{opt}}(\kappa)$, which violates the definition of $r_{\mathrm{opt}}(\kappa)$.

Appendix: Reminders

Convex Analysis Reminders.

Inf-convolution and conjugation.

Recall the definition of the inf-convolution (see, e.g., ref. 13, p. 34). If f and g are two functions,

\[
(f \,\square\, g)(x) = \inf_y\, \big[ f(y) + g(x - y) \big].
\]

Recall also that the (Fenchel–Legendre) conjugate of a function f is

\[
f^*(x) = \sup_y\, \big( xy - f(y) \big).
\]

A standard result says that, when $f$ is closed, proper, and convex, $(f^*)^* = f$ (ref. 13, theorem 12.2).

We also need a simple remark about the relation between inf-convolution and conjugation. Recall that $q(x) = x^2/2$. Then (we give details in SI Text),

\[
(f \,\square\, q)^* = f^* + q.
\]
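This is the standard identity $(f \,\square\, g)^* = f^* + g^*$ specialized to $g = q$, using $q^* = q$; in one line,

\[
(f \,\square\, q)^*(x) = \sup_z \Big[ xz - \inf_y \big( f(y) + q(z - y) \big) \Big]
= \sup_{y, u} \big[ x(y + u) - f(y) - q(u) \big] = f^*(x) + q^*(x).
\]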

The prox function.

The prox function seems to have been introduced in convex analysis by Moreau (refs. 13, pp. 339–340, and 14). The definition follows. We assume that $f$ is a proper, closed, convex function. Then, when $f : \mathbb{R} \to \mathbb{R}$ and $c > 0$ is a scalar,

\[
\mathrm{prox}_c(f)(x) = \operatorname*{argmin}_y \left[ f(y) + \frac{(x - y)^2}{2c} \right] = (\mathrm{Id} + c\,\partial f)^{-1}(x).
\]

In the last equation, $\partial f$ is in general a subdifferential of $f$. Although this could be a multivalued mapping when $f$ is not differentiable, the prox is indeed well defined as a (single-valued) function.

A fundamental result connecting prox mapping and conjugation is the equality

\[
\mathrm{prox}(f) + \mathrm{prox}(f^*) = \mathrm{Id}.
\]
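A quick numerical illustration of this identity with $f = |\cdot|$ (a sketch of ours: soft-thresholding is $\mathrm{prox}(|\cdot|)$, and $f^*$ is the indicator of $[-1, 1]$, whose prox is the projection onto that interval):

```python
import numpy as np

x = np.linspace(-3, 3, 7)
soft = np.sign(x) * np.maximum(np.abs(x) - 1, 0)   # prox(|.|)(x)
proj = np.clip(x, -1, 1)                           # prox(indicator of [-1,1])(x)
assert np.allclose(soft + proj, x)                 # prox(f) + prox(f*) = Id
```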

An Alternative Representation for $\rho_{\mathrm{opt}}$.

We give an alternative representation for $\rho_{\mathrm{opt}}$. Recall that we had $\mathrm{prox}(\rho_c^*) = g_{\mathrm{opt}} = -r^2\psi_r'$. Using $\mathrm{prox}(\rho_c) + \mathrm{prox}(\rho_c^*) = \mathrm{Id}$, we see that $\mathrm{prox}(\rho_c)(x) = x + r^2\psi_r'(x)$. In the case in which $\rho_{\mathrm{opt}}$ is differentiable, this gives immediately

\[
\rho_{\mathrm{opt}}'\big( x + r^2\, \psi_r'(x) \big) = -r^2\, \psi_r'(x).
\]

Because $\rho_{\mathrm{opt}}$ is defined up to a positive scaling factor,

\[
\tilde\rho_{\mathrm{opt}}'\big( x + r^2\, \psi_r'(x) \big) = -\psi_r'(x)
\]

is an equally valid choice.

Interestingly, for $r$ near 0, $r^2\psi_r'$ will be near zero too, and the previous equation shows that $\tilde\rho_{\mathrm{opt}}$ will be essentially $-\log f_\epsilon$, corresponding to the objective derived from maximum-likelihood theory.

Supplementary Material

Supporting Information

Acknowledgments

We thank the referees for their interesting and constructive comments. We thank Prof. Steven Portnoy for several thought-provoking remarks and discussions. D.B. gratefully acknowledges support from National Science Foundation (NSF) Grant DMS-0636667 (Vertical Integration of Research and Education in the Mathematical Sciences). P.J.B. gratefully acknowledges support from NSF Grant DMS-0907362. N.E.K. gratefully acknowledges support from an Alfred P. Sloan Research Fellowship and NSF Grant DMS-0847647 (Faculty Early Career Development). B.Y. gratefully acknowledges support from NSF Grants SES-0835531 (Cyber-Enabled Discovery and Innovation), DMS-1107000, and CCF-0939370.

Footnotes

The authors declare no conflict of interest.

Heuristically.

Under conditions depending on the model and the linear combination; see details below.

§The properties of $\hat\beta$ naturally depend on $\rho$; this dependence will be made clear later.

Note that any $\rho_{\lambda,\xi} = \lambda\rho + \xi$, where $\lambda$ and $\xi$ are real valued with $\lambda > 0$, yields the same solution for $\hat\beta$.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1307845110/-/DCSupplemental.

References

1. Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans R Soc Lond A 222:309–368.
2. Cramér H (1946) Mathematical Methods of Statistics. Princeton Mathematical Series (Princeton Univ Press, Princeton), Vol 9.
3. Hájek J (1972) Local asymptotic minimax and admissibility in estimation. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ of California Press, Berkeley, CA), Vol I: Theory of Statistics, pp 175–194.
4. Le Cam L (1970) On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann Math Stat 41:802–828.
5. Bühlmann P, van de Geer S (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics (Springer, Heidelberg).
6. El Karoui N, Bean D, Bickel PJ, Lim C, Yu B (2013) On robust regression with high-dimensional predictors. Proc Natl Acad Sci USA 110:14557–14562.
7. Huber PJ (1973) Robust regression: Asymptotics, conjectures and Monte Carlo. Ann Stat 1:799–821.
8. El Karoui N, Bean D, Bickel PJ, Lim C, Yu B (2012) On robust regression with high-dimensional predictors (Dept of Statistics, Univ of California, Berkeley, CA), Technical Report 811.
9. Ibragimov IA (1956) On the composition of unimodal distributions. Teor Veroyatnost i Primenen 1:283–288.
10. Prékopa A (1973) On logarithmic concave measures and functions. Acta Sci Math (Szeged) 34:335–343.
11. Costa MHM (1985) A new entropy power inequality. IEEE Trans Inf Theory 31:751–760.
12. Guo D, Wu Y, Shamai S, Verdú S (2011) Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans Inf Theory 57:2371–2385.
13. Rockafellar RT (1997) Convex Analysis. Princeton Landmarks in Mathematics (Princeton Univ Press, Princeton); reprint of the 1970 original.
14. Moreau J-J (1965) Proximité et dualité dans un espace hilbertien. Bull Soc Math France 93:273–299. French.
15. Karlin S (1968) Total Positivity (Stanford Univ Press, Stanford, CA), Vol I.
16. Stam AJ (1959) Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf Control 2:101–112.
