Abstract
In situations where the predictors are correlated with the error term, we propose a bridge estimator for the two-stage least squares estimation. We apply this estimator to overcome multicollinearity and sparsity of the explanatory variables when the endogeneity problem is present. The proposed estimator is then applied to modify the Durbin-Wu-Hausman (DWH) test of endogeneity in the presence of multicollinearity. To compare our modified test with the existing DWH test for detecting endogeneity in multi-collinear data, some numerical assessments are carried out. The numerical results show that the proposed estimators and the suggested test perform better for multi-collinear data. Finally, a genetic data set is used to illustrate our results by estimating the regression coefficients in the presence of endogeneity and multicollinearity.
Keywords: Bridge estimator, Durbin-Wu-Hausman test, Endogeneity, Instrumental variable, Multicollinearity, Sparsity
1. Introduction
Under certain regularity conditions, the Ordinary Least Squares (OLS) estimator produces unbiased and consistent estimates. An important assumption is that the predictors in the model are uncorrelated with the error term. When this condition is not satisfied, the OLS estimator yields biased and inconsistent estimates; this is referred to as the endogeneity problem (Ebbes [10]; Bowden and Turkington [3]). An approach to overcome the endogeneity problem is to apply an instrumental variable estimator, or two-stage least squares, which assumes that there is a secondary predictor, termed an instrumental variable, that is correlated with the predictor but not with the error term.
Another important assumption in the classical regression model is that the predictors are uncorrelated. When this assumption is violated, the explanatory variables are nearly dependent, which is referred to as the multicollinearity problem, and the OLS yields poor estimates. Several approaches have been considered to combat this problem, among them the ridge regression estimator introduced by Hoerl and Kennard [17]. They considered a shrinkage method to overcome the problem of multicollinearity in the estimation of regression parameters; see ([12,21,28,32] and [22]).
Many authors have described the connection between endogeneity and multicollinearity in the literature. Recently, Phillips and Evans [26] discussed approximating and reducing bias in Two-Stage Least Squares (2SLS) estimation, while Hansen and Kozbur [15] proposed a jackknife instrumental variable estimator with regularization at each jackknife iteration that helps alleviate the bias. Interested readers may refer to [5,6,8] for other developments in this regard.
On the other hand, in the context of variable selection, it is well known that the least absolute shrinkage and selection operator (LASSO) of Tibshirani [29] produces sparse solutions. In the presence of endogenous regressors, directly applying the LASSO fails, since sparsity of the coefficients in the regression model does not correspond to sparsity of the linear projection coefficients [2].
This paper uses the bridge regression method to combat multicollinearity and sparsity in estimation of the instrumental variable models.
It is known that finding valid instruments is not easy. Moreover, using instrumental variable estimators to eliminate the inconsistency of the OLS causes a loss of efficiency. Thus, researchers need a good way of determining when they must apply an instrumental variable estimator and when they need not. Wu [31], Hausman [16] and Durbin [9] proposed tests for detecting endogenous regressors in instrumental analysis. The secondary purpose of this paper is to modify the Hausman test statistic for multi-collinear models. The main and preliminary results are given in the next section. In Section 3, we propose our test statistic for testing endogeneity under multicollinearity and provide some asymptotic results. Section 4 is devoted to a simulation study investigating the performance of the introduced test statistic. Finally, an application of our results to the analysis of a genetic data set is presented in Section 5.
2. Main results
Consider the multiple linear regression model
(1) y = Xβ + ε,
(2) X = ZΓ + U,
where y is the n×1 vector of response variables, X is the n×p matrix of endogenous explicative variables, β is the p×1 vector of parameters, and ε is the n×1 vector of random disturbances with E(ε) = 0 and Cov(ε) = σ^2 I_n. It is further assumed that ε follows the n-variate normal distribution. Moreover, Z is an n×q matrix of the exogenous instrumental variables, Γ is its corresponding q×p coefficient matrix, and U is an n×p matrix of errors. It is also assumed that the covariance matrix of U is Σ_U. It is known that the two-stage least squares (2SLS) estimator of β is obtained in two stages as follows: in the first stage, we estimate the first-stage coefficient matrix using the OLS approach as below:
Γ̂ = (Z′Z)^{-1} Z′X.
Then, using the predicted values X̂ = ZΓ̂ = P_Z X in the second stage, y is regressed on X̂ and the estimator of the coefficient vector is of the form
(3) β̂_2SLS = (X̂′X̂)^{-1} X̂′y = (X′P_Z X)^{-1} X′P_Z y,
where P_Z = Z(Z′Z)^{-1}Z′ is a symmetric and idempotent matrix.
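The two-stage construction above can be illustrated numerically. The following is a minimal NumPy sketch (our own illustration, not code from the paper) on synthetic endogenous data; the helper name `two_stage_ls` and the data-generating choices are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stage_ls(y, X, Z):
    """2SLS: regress X on the instruments Z, then y on the fitted X-hat."""
    # First stage: Gamma_hat = (Z'Z)^{-1} Z'X, so X_hat = Z Gamma_hat = P_Z X
    gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
    X_hat = Z @ gamma_hat
    # Second stage: beta_hat = (X_hat' X_hat)^{-1} X_hat' y
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Synthetic endogenous design: X is correlated with the error through u
n, p, q = 500, 2, 3
Z = rng.normal(size=(n, q))
u = rng.normal(size=n)
X = Z @ rng.normal(size=(q, p)) + 0.9 * u[:, None]
beta = np.array([0.7, 0.5])
y = X @ beta + u + rng.normal(size=n)

beta_2sls = two_stage_ls(y, X, Z)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

In this synthetic setup the OLS estimate is pulled away from the true β by the common shock u, while the 2SLS estimate, which uses only the variation in X explained by Z, stays close to it.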
Suppose now that in model (2) the matrix X̂ is ill-conditioned. The most common ill-conditioned situations are multicollinearity and sparsity. In such situations, we use the bridge penalty in the second stage of the 2SLS method to deal with these problems. The first-stage estimation is the same as in the traditional 2SLS, but in the second stage we apply the bridge estimator to the regression of y on X̂. Then, by shrinking the estimates under the constraint Σ_{j=1}^p |β_j|^q ≤ t for some q > 0, the bridge 2SLS estimator of β is obtained by solving the following optimization problem:
β̂_bridge = argmin_β { ‖y − X̂β‖^2 + λ Σ_{j=1}^p |β_j|^q },
where λ ≥ 0 is the tuning parameter. Using the local quadratic approximation (LQA) of Fan and Li [11], the penalty term can be approximated around a local vector β^0 and the bridge estimator has the following closed form
(4) β̂_bridge = (X̂′X̂ + (λq/2) Λ(β^0))^{-1} X̂′y,
where Λ(β^0) = diag(|β_1^0|^{q−2}, …, |β_p^0|^{q−2}). For more details one may refer to [24], [33], and [25]. Obviously, when q = 1, the LASSO of [30] can be approximated by
(5) β̂_lasso ≈ (X̂′X̂ + (λ/2) diag(1/|β_j^0|))^{-1} X̂′y.
However, for q = 2, the ridge estimator is obtained without approximation and is of the following form
(6) β̂_ridge(k) = (X̂′X̂ + k I_p)^{-1} X̂′y.
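Each LQA step is a ridge-type linear solve. The sketch below is our own illustration of this iteration (the fixed iteration count and the small floor `eps`, used to keep |β_j|^(q−2) finite near zero, are our assumptions, not choices stated in the paper):

```python
import numpy as np

def bridge_2sls(y, X, Z, lam, q=1.0, n_iter=50, eps=1e-8):
    """Bridge-penalized second stage via local quadratic approximation (LQA).

    Iterates beta = (Xh'Xh + (lam*q/2) * diag(|beta0|^(q-2)))^{-1} Xh'y.
    For q=2 this is the exact ridge solution in one step; q=1 approximates
    the LASSO.
    """
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # first-stage fitted values
    G = X_hat.T @ X_hat
    # Ridge-type starting value
    b = np.linalg.solve(G + lam * np.eye(X.shape[1]), X_hat.T @ y)
    if q == 2:
        return b
    for _ in range(n_iter):
        # LQA weight matrix around the current local vector b
        D = np.diag((q / 2.0) * np.abs(b).clip(eps) ** (q - 2.0))
        b = np.linalg.solve(G + lam * D, X_hat.T @ y)
    return b
```

For q = 2 this reduces to the ridge form without iteration; for 0 < q < 2 the weights D grow as a coefficient approaches zero, which is what drives small coefficients toward zero.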
In order to study the properties of these estimators, the following regularity conditions are assumed:
(A.1) The q×q and p×p matrices Z′Z and X′X are nonsingular and finite.
(A.2) E(Z′ε) = 0, and the matrix n^{-1}Z′X is finite and of full rank.
(A.3) By the law of large numbers, there exist nonnegative definite matrices Σ_ZZ, Σ_ZX and C such that, as n → ∞,
n^{-1}Z′Z → Σ_ZZ, n^{-1}Z′X → Σ_ZX, and n^{-1}X′P_Z X → C.
The following theorem discusses the asymptotic behavior of the 2SLS bridge estimator β̂_bridge.
Theorem 1
Under the assumptions (A.1)–(A.3), if λ_n/√n → λ_0 ≥ 0 and q ≥ 1, then
√n(β̂_bridge − β) →_d argmin_u V(u),
where
V(u) = u′Cu − 2u′W + λ_0 q Σ_{j=1}^p u_j sgn(β_j)|β_j|^{q−1},
and W follows the N_p(0, σ^2 C) distribution.
Proof.
Define
V_n(u) = Σ_{i=1}^n [(ε_i − u′x̂_i/√n)^2 − ε_i^2] + λ_n Σ_{j=1}^p |β_j + u_j/√n|^q,
where x̂_i denotes the ith row of X̂ and u ∈ R^p. Here, the convergence of the first term of V_n is proven; the convergence of the second term can be found similarly in Knight and Fu [19]. Note that
(7) Σ_{i=1}^n [(ε_i − u′x̂_i/√n)^2 − ε_i^2] = u′(n^{-1}X̂′X̂)u − 2u′(n^{-1/2}X̂′ε).
Using the central limit theorem and (A.1)–(A.3), the RHS of (7) converges in distribution to u′Cu − 2u′W, which completes the proof.
Corollary 1
β̂_ridge(k) has asymptotically normal distribution with mean
−λ_0 C^{-1} β
and variance-covariance matrix
σ^2 C^{-1}.
Proof.
Setting q = 2 and λ_n/√n → λ_0 in Theorem 1 implies that
V(u) = u′Cu − 2u′W + 2λ_0 u′β,
so,
√n(β̂_ridge(k) − β) →_d C^{-1}(W − λ_0 β) ∼ N_p(−λ_0 C^{-1}β, σ^2 C^{-1}).
The proof is complete.
3. The proposed test
In this section, we propose a test statistic to diagnose endogeneity when the multicollinearity problem exists among the explanatory variables. Suppose that the regression model between the outcome vector y and the predictor matrix X is expressed as in model (1).
Then, for the endogeneity testing hypotheses
(8) H_0: Cov(X, ε) = 0 versus H_1: Cov(X, ε) ≠ 0,
Hausman [16] introduced the following test statistic
(9) H = (β̂_2SLS − β̂_OLS)′ [Cov(β̂_2SLS) − Cov(β̂_OLS)]^{-1} (β̂_2SLS − β̂_OLS).
Under the null hypothesis H_0, it follows a central chi-square distribution with p degrees of freedom, where p is the number of unknown parameters.
Detection of multicollinearity has been studied in many fields of science, such as marketing, genomics, and the basic sciences; see e.g. Batterham et al. [1]. An important consequence of a high degree of multicollinearity is the large estimated variance of the OLS estimator. In this case, the Hausman test statistic tends to zero and the test always leads to accepting the null hypothesis. Therefore, the Hausman test in the presence of multicollinearity may lead to incorrect conclusions about the regressors' endogeneity. Guo et al. [14] introduced a test statistic in this context without considering the problem of multicollinearity.
We propose a modified version of the Hausman test statistic as follows:
(10) H*(k) = (β̂_2SLS − β̂_ridge(k))′ [Cov(β̂_2SLS) − Cov(β̂_ridge(k))]^{-1} (β̂_2SLS − β̂_ridge(k)),
where β̂_ridge(k) is the ordinary ridge estimator in model (1) and k is the ridge parameter. The null hypothesis of exogeneity is expected to be rejected when the statistic H*(k) differs significantly from zero.
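The contrast between the 2SLS and ridge estimators can be computed with plug-in covariance estimates. The sketch below is our own illustration, not the authors' code: the residual-based variance estimate, the pseudo-inverse guarding against an indefinite covariance difference, and the name `modified_dwh` are all our assumptions:

```python
import numpy as np

def modified_dwh(y, X, Z, k=1.0):
    """Modified DWH-type statistic contrasting 2SLS and ridge estimators.

    H* = d' (Cov(b_2sls) - Cov(b_ridge))^{-1} d with d = b_2sls - b_ridge;
    pinv is used because the plug-in covariance difference may be indefinite.
    """
    n, p = X.shape
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    R = np.linalg.inv(X.T @ X + k * np.eye(p))
    b_ridge = R @ X.T @ y
    sigma2 = np.sum((y - X @ b_ridge) ** 2) / (n - p)   # plug-in error variance
    cov_2sls = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    cov_ridge = sigma2 * R @ X.T @ X @ R
    d = b_2sls - b_ridge
    return float(d @ np.linalg.pinv(cov_2sls - cov_ridge) @ d)
```

Large values of the statistic indicate a substantial gap between the two estimators, which under the setup above only arises when the regressors are endogenous.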
To perform the hypothesis test, we need to find the null distribution of the statistic H*(k). To this end, we make the following assumptions:
(B.1) The matrices X′X and Z′Z are stochastic and nonsingular.
(B.2) The limits plim_{n→∞} n^{-1}X′X and plim_{n→∞} n^{-1}Z′Z exist and are nonsingular.
Theorem 2
Under (B.1)–(B.2), (A.3), and the null hypothesis in (8), the statistic H*(k) in (10) asymptotically follows the chi-square distribution with p degrees of freedom.
Proof.
Following the approach of Hausman [16], it suffices to prove the following three assertions:
(i) Under the null hypothesis (8), β̂_ridge(k) and β̂_2SLS are consistent estimators of β.
(ii) β̂_ridge(k) is an efficient estimator under the null hypothesis (8).
(iii) β̂_ridge(k) and β̂_2SLS have asymptotically normal distributions.
Under (B.1)–(B.2) we have plim_{n→∞} β̂_ridge(k) = β, which proves the consistency of β̂_ridge(k). Also, assumption (A.3) implies that plim_{n→∞} β̂_2SLS = β.
To establish the efficiency of β̂_ridge(k), the Cramér–Rao inequality implies that
Cov(β̂) ≥ I^{-1}(β),
where I(β) is the Fisher information matrix. Given the asymptotic covariance matrix of β̂_ridge(k), we conclude that it attains this bound asymptotically, which means that β̂_ridge(k) is an efficient estimator.
The asymptotic normality of β̂_ridge(k) is verified by Knight and Fu [19], who showed that it asymptotically follows a normal distribution with the mean and variance-covariance matrix given in Corollary 1. Finally, the asymptotic normality of β̂_2SLS follows from Theorem 1, and this completes the proof.
4. Simulation study
In this section, we carry out some numerical analyses of the two most applied special cases of the bridge estimator, namely the ridge (q = 2) and the LASSO (q = 1) penalties. First we compare the performance of the resulting 2SLS-R and 2SLS-L estimators with the traditional OLS, 2SLS, ridge, and Lasso estimators based on their absolute relative bias (R.Bias) and estimated mean squared errors (EMSE) under different degrees of endogeneity. Next, our proposed test statistic is compared to the Durbin-Wu-Hausman test in terms of estimated power.
We generate the data vectors from a multivariate normal distribution with mean zero and a specified variance-covariance matrix. In order to impose multicollinearity on the data, the variance-covariance matrix is chosen such that the correlation between the covariates equals the constant value 0.9 and there is a slight correlation between all instruments and the predictor variables, meaning that all instrumental variables are relevant. Moreover, for endogeneity, different levels of correlation ρ between the covariates and the error term are considered. Using the approach of principal component analysis, we take the eigenvector corresponding to the largest eigenvalue as the vector of coefficients ([18]). Then the response vector is generated following model (1). Specifically, in this experiment, we fix the number of covariates and the number of instruments. Then, by generating new error terms, the experiment is replicated 5000 times. The absolute R.Bias and EMSE for the standardized values of any estimator, say β̂, are calculated respectively as
R.Bias(β̂_j) = |(1/N) Σ_{l=1}^N (β̂_j^{(l)} − β_j)/β_j|, EMSE(β̂) = (1/N) Σ_{l=1}^N (β̂^{(l)} − β)′(β̂^{(l)} − β),
where β̂^{(l)} is the estimate of β in the lth replication of the simulation and N is the number of replications. The value of the ridge parameter k is obtained via the generalized cross-validation technique introduced by Golub et al. [13]; see also Liu and Jiang [20] and references therein. The results are summarized in Table 1 for n = 50.
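The per-replication summaries can be computed with a small helper; the function name and the vectorized aggregation over an (N, p) array of estimates are our own sketch of the definitions above:

```python
import numpy as np

def r_bias_emse(estimates, beta):
    """Absolute relative bias per coefficient and EMSE over replications.

    estimates: (N, p) array, one row of beta-hats per replication.
    beta: (p,) true coefficient vector (nonzero entries for R.Bias).
    """
    estimates = np.asarray(estimates, dtype=float)
    dev = estimates - beta
    r_bias = np.abs(dev.mean(axis=0) / beta)         # |mean deviation| / |beta_j|
    emse = np.mean(np.sum(dev ** 2, axis=1))         # average squared error norm
    return r_bias, emse
```

For example, two replications [1, 2] and [3, 4] against the true vector [2, 3] give zero relative bias in both coordinates and an EMSE of 2.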
Table 1. Estimated parameter values, absolute R.Bias and EMSE for n = 50 and different values of ρ.

| Coefficients | OLS | 2SLS | Ridge | 2SLS-R |
|---|---|---|---|---|
| β₁ = 0.7 | 0.658 | 0.671 | 0.678 | 0.624 |
| β₂ = 0.5 | 0.509 | 0.576 | 0.502 | 0.5548 |
| R.bias | 0.402 | 0.879 | 0.486 | 0.647 |
| EMSE | 0.126 | 1.019 | 0.116 | 0.343 |
| β₁ = 0.7 | 1.400 | 0.740 | 1.366 | 0.737 |
| β₂ = 0.5 | 0.044 | 0.484 | 0.066 | 0.476 |
| R.bias | 1.158 | 0.825 | 1.103 | 0.636 |
| EMSE | 0.812 | 0.777 | 0.706 | 0.327 |
| β₁ = 0.7 | 1.838 | 0.774 | 1.815 | 0.757 |
| β₂ = 0.5 | −0.252 | 0.459 | | 0.468 |
| R.bias | 1.891 | 0.805 | 1.851 | 0.691 |
| EMSE | 1.909 | 0.853 | 1.831 | 0.419 |
Table 1 shows that in the case where we only have the multicollinearity problem, the OLS estimator has the lowest R.bias value and the ridge estimator has the lowest EMSE value. In the presence of endogeneity, however, the EMSE and the absolute R.Bias values of the 2SLS-R estimator are smaller than those of the other three competitors.
Figure 1 depicts the absolute R.bias and EMSE values of the estimators for different values of ρ and n.
Figure 1.
Plot of absolute R.Bias and EMSE for n = 20, 100.
Figure 2.
Criteria plot of Bias and EMSE for n = 20, 100.
In view of Figure 1, for a sufficiently large sample size, if the endogeneity level is greater than about 0.2, then the new estimator is always better than the other three estimators.
In order to evaluate the performance of the 2SLS-L estimator, sparse and endogenous samples are generated. For this purpose, the data vectors are drawn from a multivariate normal distribution with mean zero and a variance-covariance matrix chosen such that all instruments are relevant, for different levels of endogeneity ρ. The coefficient vector is taken to be sparse, with most components close to zero, to create sparsity conditions. Here again, we fix the number of covariates and the number of instruments and compare the performance of the new 2SLS-L estimator with the three estimators OLS, 2SLS and Lasso. The results are summarized in Table 2 and plotted in Figure 2.
Table 2. Estimated parameter values, absolute R.Bias and EMSE for different values of ρ.

| Coefficients | OLS | 2SLS | Lasso | 2SLS-L |
|---|---|---|---|---|
| β₁ = 0.6 | 0.604 | 0.610 | 0.572 | 0.349 |
| β₂ = 0.001 | −0.0004 | 0.0001 | 0.0002 | 0.0004 |
| R.bias | 0.227 | 0.507 | 0.217 | 0.502 |
| EMSE | 0.041 | 0.354 | 0.040 | 0.308 |
| β₁ = 0.6 | 1.196 | 0.656 | 1.137 | 0.371 |
| β₂ = 0.001 | 0.003 | −0.0001 | 0.002 | 0.0003 |
| R.bias | 0.684 | 0.511 | 0.598 | 0.467 |
| EMSE | 0.386 | 1.096 | 0.319 | 0.261 |
| β₁ = 0.6 | 1.589 | 0.643 | 1.567 | 0.524 |
| β₂ = 0.001 | 0.0003 | −0.002 | 0.0001 | 0.0002 |
| R.bias | 1.042 | 0.524 | 0.996 | 0.454 |
| EMSE | 1.005 | 0.997 | 0.934 | 0.266 |
Table 3. Estimated parameter values and MSPEs for the OLS, the 2SLS, the Ridge and 2SLS-R methods.

| Estimation | OLS | 2SLS | Ridge | 2SLS-R |
|---|---|---|---|---|
| Intercept | 0.0055 | 0.0064 | 0.0055 | 0.0061 |
| SBP | −0.1747 | −5.868 | −0.1707 | −3.5670 |
| MAP | 0.2384 | 1.5041 | 0.2280 | 0.8182 |
| PP | 0.0565 | 4.1146 | 0.0502 | 2.6057 |
| Estimated MSPE | 0.9914 | 0.9882 | 0.9901 | 0.9832 |
As we can see in Table 2, for endogeneity levels greater than about 0.6, the 2SLS-L estimator is superior to the other three estimators.
At the end of this section, the power of our proposed test of endogeneity is studied for data suffering from both multicollinearity and endogeneity. We compare its power with that of the existing DWH test. For this purpose, we generate samples from model (1), simulating the covariates from a multivariate normal distribution with mean zero and a variance-covariance matrix chosen such that the correlation between the covariates equals 0.9 and all instruments are relevant. The numbers of instruments and covariates, as well as the coefficient vector β, are fixed as in the previous experiments. We consider nine different values of the endogeneity level and use the exact critical value based on the null distribution of the statistic at the 0.05 significance level. For each value of the endogeneity level, the power of the proposed test is estimated from N = 1000 simulated samples of size n = 100 drawn under the alternative hypothesis. As shown in Figure 3, the two test statistics have almost the same size under the null hypothesis, while under the alternative hypothesis our modified test behaves better than the DWH test in the presence of multicollinearity in terms of the power criterion.
Figure 3.
Comparison of the DWH test statistic and the new modified DWH test statistic (solid line: DWH test statistic; dashed line: new modified DWH test statistic).
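The empirical power estimation described above reduces to counting rejections over simulated samples. A generic Monte Carlo wrapper (names and interface are our own sketch, not the paper's code):

```python
import numpy as np

def estimate_power(stat_fn, sampler, crit, n_rep=200):
    """Monte Carlo power estimate.

    stat_fn: computes the test statistic from one simulated sample.
    sampler: returns one simulated sample as a tuple of arguments.
    crit:    critical value of the test at the chosen significance level.
    Returns the fraction of replications in which the test rejects.
    """
    rejections = [stat_fn(*sampler()) > crit for _ in range(n_rep)]
    return float(np.mean(rejections))
```

Sampling under the null with the null critical value estimates the size of the test; sampling under an alternative estimates its power at that alternative.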
5. Genetic data analysis
In this section, we apply our results to a regression analysis of a genetic data set that suffers from multicollinearity and endogeneity. Several authors have reported the presence or absence of a relationship between genetic variants and the outcome as the primary analysis result, rather than an instrumental analysis (Didelez and Sheehan [7]; Burgess and Dylan [4]). We use our method to overcome endogeneity and multicollinearity in the data set of the Health and Retirement Study (HRS), which is supported by the National Institute on Aging (NIA U01AG009740, RC2 AG036495, RC4 AG039029). These data contain genotype and phenotype variants which may affect outcomes such as educational attainment; see e.g. Rietveld et al. [27] and Okbay et al. [23].
We run a multiple regression to predict educational attainment (EA) using systolic blood pressure (SBP), mean arterial pressure (MAP) and pulse pressure (PP) as predictor variables. These predictors are extremely correlated (with a very high correlation between MAP and PP and VIF = 60.79 in the regression equation among these exposure variables). On the other hand, the dependent variable EA and these three predictors are correlated, the correlation between the residual of the regression equation and EA being about 0.9997; therefore, we choose another three variables, major depressive disorder (MDD), schizophrenia polygenic score (SPS) and subjective well-being (SWB), as instrumental variables, since their correlations with EA are 0.030, 0.015 and 0.012, respectively. We first assessed the fitted values at the first stage and observed that the previous multicollinearity still exists in the fitted design matrix; consequently, a cross-validation technique was incorporated to obtain the ridge parameter, which equals k = 1. Hence, using (6) we can estimate the parameters of the model. We also calculate the Mean Squared Prediction Error (MSPE) criterion to evaluate the performance of the estimators. To this end, we divide the real data into two subgroups, training and testing, such that 80 percent of the data set is assigned to the training set and the rest to the test group. Table 3 presents the estimated values of the coefficients of SBP, MAP, and PP using the four methods OLS, 2SLS, ridge and regularized 2SLS, along with the MSPE values for the test group.
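The hold-out MSPE evaluation can be sketched as follows; synthetic data stand in for the HRS variables, the 80/20 split matches the text, and everything else (names, OLS fit on the training set) is our own illustration:

```python
import numpy as np

def mspe(y_test, y_pred):
    """Mean squared prediction error on a hold-out set."""
    return float(np.mean((np.asarray(y_test) - np.asarray(y_pred)) ** 2))

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)

# 80/20 train/test split as described in the text
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]
b_ols = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
err = mspe(y[test], X[test] @ b_ols)
```

The same split and `mspe` call can be repeated for each competing estimator, so the four methods in Table 3 are compared on identical test observations.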
According to these results, the proposed new estimator offers a smaller estimated MSPE compared to the others. Furthermore, the value of the proposed test statistic H*(k) equals 5.401, while that of the DWH test is smaller. Note that both test statistics provide evidence in favor of the alternative hypothesis of endogeneity.
6. Conclusion
In this study we used the bridge method to combat multicollinearity and sparsity in two-stage least squares estimation. In particular, we studied ridge-2SLS and Lasso-2SLS estimation procedures. Some large-sample properties of the proposed estimator were investigated and compared with those of some other estimators using a simulation study. Furthermore, we proposed a modification of the DWH test of endogeneity. Using a simulation study, we observed that our new test has higher power than the existing DWH test for multi-collinear data.
Acknowledgments
The authors would like to thank the anonymous reviewers for the insightful comments which led to an improvement of this paper. The work of M. Arashi was based upon research supported in part by the National Research Foundation (NRF) of South Africa SARChI Research Chair UID: 71199. Opinions expressed and conclusions arrived at in this study are those of the author and are not necessarily to be attributed to the NRF.
Note
We use the R software and all code is available upon request.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Batterham A.M., Tolfrey K., and George K.P., Nevill's explanation of Kleiber's 0.75 mass exponent: an artifact of collinearity problems in least squares models?, J. Appl. Physiol. 82 (1997), pp. 693–697. doi: 10.1152/jappl.1997.82.2.693
- 2. Belloni A. and Chernozhukov V., L1-penalized quantile regression in high-dimensional sparse models, Ann. Statist. 39 (2011), pp. 82–130. doi: 10.1214/10-AOS827
- 3. Bowden R. and Turkington D., Instrumental Variables, Cambridge University Press, New York, 1984.
- 4. Burgess S. and Dylan S., Predicting the direction of causal effect based on an instrumental variable analysis: A cautionary tale, J. Causal Inference 4 (2016), pp. 49–59. doi: 10.1515/jci-2015-0024
- 5. Carrasco M., A regularization approach to the many instruments problem, J. Econometrics 170 (2012), pp. 383–398. doi: 10.1016/j.jeconom.2012.05.012
- 6. Carrasco M. and Tchuente G., Regularized LIML for many instruments, J. Econometrics 186 (2015), pp. 427–442. doi: 10.1016/j.jeconom.2015.02.018
- 7. Didelez V. and Sheehan N.A., Mendelian randomization as an instrumental variable approach to causal inference, Statist. Meth. Med. Res. 16 (2007), pp. 309–330. doi: 10.1177/0962280206077743
- 8. Dixon P., Davey Smith G., and von Hinke S., PharmacoEconomics 34 (2016), pp. 1075. doi: 10.1007/s40273-016-0432-x
- 9. Durbin J., Errors in variables, Revue de l'Institut International de Statistique 22 (1954), pp. 23–32. doi: 10.2307/1401917
- 10. Ebbes P., Latent instrumental variables – a new approach to solve for endogeneity, Ph.D. thesis, University of Groningen, 2004.
- 11. Fan J. and Li R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 96 (2001), pp. 1348–1360. doi: 10.1198/016214501753382273
- 12. García C.B., García J., López Martín M.M., and Salmerón R., Collinearity: revisiting the variance inflation factor in ridge regression, J. Appl. Stat. 42 (2015), pp. 648–661. doi: 10.1080/02664763.2014.980789
- 13. Golub G.H., Heath M., and Wahba G., Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (1979), pp. 215–223. doi: 10.1080/00401706.1979.10489751
- 14. Guo Z., Kang H., Cai T.T., and Small D.S., Testing endogeneity with possibly invalid instruments and high dimensional covariates, arXiv preprint arXiv:1609.06713, 2016.
- 15. Hansen C. and Kozbur D., Instrumental variables estimation with many weak instruments using regularized JIVE, J. Econometrics 182 (2014), pp. 290–308. doi: 10.1016/j.jeconom.2014.04.022
- 16. Hausman J., Specification tests in econometrics, Econometrica 46 (1978), pp. 1251–1271. doi: 10.2307/1913827
- 17. Hoerl A.E. and Kennard R.W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1970), pp. 55–67. doi: 10.1080/00401706.1970.10488634
- 18. Jolliffe I.T., Principal Component Analysis, Springer Series in Statistics, Springer, New York, 2002.
- 19. Knight K. and Fu W., Asymptotics for lasso-type estimators, Ann. Statist. 28 (2000), pp. 1356–1378. doi: 10.1214/aos/1015957397
- 20. Liu X.Q. and Jiang H.Y., Optimal generalized ridge estimator under the generalized cross-validation criterion in linear regression, Linear Algebra Appl. 436 (2012), pp. 1238–1245. doi: 10.1016/j.laa.2011.08.032
- 21. Lukman A.F., Ayinde K., Binuomote S., and Onate A.C., Modified ridge-type estimator to combat multicollinearity: application to chemical data, J. Chemom. (2019), e3125. doi: 10.1002/cem.3125
- 22. Lukman A.F., Ayinde K., Sek S.K., and Adewuyi E., A modified new two-parameter estimator in a linear regression model, Model. Simul. Eng. (2019). doi: 10.1155/2019/6342702
- 23. Okbay A., Beauchamp J., Fontana M., Lee J., Pers T., Rietveld C., Turley P., Chen G., Emilsson V., Meddens S., and Oskarsson S., Genome-wide association study identifies 74 loci associated with educational attainment, Nature 533 (2016), pp. 539–542. doi: 10.1038/nature17671
- 24. Park C. and Yoon Y.J., Bridge regression: adaptivity and group selection, J. Statist. Plann. Inference 141 (2011), pp. 3506–3519. doi: 10.1016/j.jspi.2011.05.004
- 25. Park C. and Yoon Y.J., Bridge regression: adaptivity and group selection, J. Statist. Plann. Inference 141 (2011), pp. 3506–3519. doi: 10.1016/j.jspi.2011.05.004
- 26. Phillips G. and Evans G., Approximating and reducing bias in 2SLS estimation of dynamic simultaneous equation models, Comput. Statist. Data Anal. 100 (2016), pp. 734–762. doi: 10.1016/j.csda.2015.11.011
- 27. Rietveld C., Medland S., Derringer J., Yang J., Esko T., Martin N., Westra H., Shakhbazov K., Abdellaoui A., Agrawal A., and Albrecht E., GWAS of 126,559 individuals identifies genetic variants associated with educational attainment, Science 340 (2013), pp. 1467–1471. doi: 10.1126/science.1235488
- 28. Saleh A.K.Md.E., Arashi M., and Kibria B.M.G., Theory of Ridge Regression Estimation with Applications, John Wiley, USA, 2019.
- 29. Tibshirani R., Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), pp. 267–288.
- 30. Tibshirani R., Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), pp. 267–288.
- 31. Wu D., Alternative tests of independence between stochastic regressors and disturbances, Econometrica 41 (1973), pp. 733–750. doi: 10.2307/1914093
- 32. Liu X.-Q., Gao F., and Yu Z.-F., Improved ridge estimators in a linear regression model, J. Appl. Stat. 40 (2013), pp. 209–220. doi: 10.1080/02664763.2012.740623
- 33. Yuzbasi B., Arashi M., and Akdeniz F., Penalized regression via the restricted bridge estimator, arXiv preprint arXiv:1910.03660, 2019.