Abstract
Bootstrap is a simple yet powerful method of estimation based on the concept of random sampling with replacement. Ridge regression, which uses a biasing parameter, has become a viable alternative to the ordinary least squares regression model for the analysis of data whose predictors are collinear. This paper develops a nonparametric bootstrap-quantile approach for estimating the ridge parameter in the linear regression model. The proposed method is illustrated using some popular and widely used ridge estimators, but the idea can be extended to any ridge estimator. Monte Carlo simulations are carried out to compare the performance of the proposed estimators with their baseline counterparts. It is demonstrated empirically that the MSEs obtained from the suggested bootstrap-quantile approach are substantially smaller than those of the baseline estimators, especially when collinearity is high. Applications to real data sets confirm the suitability of the idea.
1. Introduction
Multiple linear regression (MLR) is one of the most popular tools used by practitioners due to its simplicity and attractive properties. However, many complex and interesting situations arise in practice. For example, collinearity among predictors is a pervasive problem [1] faced by many researchers, and various methods have been developed to cope with it. Among these are ridge regression [2], principal component regression [3], partial least squares regression [4], and continuum regression [5]. Many researchers have also applied such methods to accelerated failure time models (see e.g. [6–8]).
Among these, ridge regression (RR) is the most popular and is used extensively in practice due to its relatively low computational cost, strong theoretical guarantees, and interpretability [9]. It is a useful technique for estimating the coefficients of the model, as it provides precise estimates by introducing some bias into the regression model. This shrinkage method allows all the considered covariates to be included in the model, but with shrunken coefficients.
Consider the classic MLR model:
$y = X\beta + \varepsilon, \quad (1)$
where y(n×1) is the vector of the dependent variable, X(n×p) is the design matrix, β(p×1) is the vector of unknown regression parameters, i.e. β = (β1, β2, …, βp)′ (without loss of generality, we assume β0 = 0), and ε(n×1) is a vector of random errors with mean vector 0 and variance-covariance matrix σ²In, where n denotes the number of observations, p represents the number of predictors, and In is the identity matrix of order n. The ordinary least squares (OLS) estimator of β is given as:

$\hat{\beta}_{OLS} = (X'X)^{-1}X'y,$
which mainly depends on the characteristics of the matrix X′X. When the predictors are collinear, i.e. at least one of the predictors is close to a linear combination of the other predictors, the matrix X′X becomes ill-conditioned: some of the eigenvalues of X′X become close to zero. The variance of the OLS estimator is then inflated, so that stable estimates cannot be obtained [10].
To circumvent this situation, [2] introduced the concept of RR, in which a non-negative constant k is added to the diagonal elements of X′X so that stable estimates can be obtained at the cost of some bias.
The RR estimator is defined as follows:

$\hat{\beta}_{RR} = (X'X + kI_p)^{-1}X'y, \quad k \ge 0,$
where k is known as the ridge parameter or the biasing parameter. RR has become an accepted alternative to OLS for ill-conditioned design matrices, but there is no theoretical agreement on the optimal choice of k; its selection remains one of the most challenging and interesting open problems. The main concern is to find a value of k such that the reduction in the variance term exceeds the increase in the squared bias. A striking diversity of choices for k has been advocated in the literature (see e.g. [2, 11–21], among others, and references therein), but the search for the optimal value of the ridge parameter remains open. To mitigate the effect of multicollinearity in different settings, researchers have also employed RR in beta regression [22], the Gaussian linear model [23], logistic regression [24], Poisson regression [25], and Tobit regression [26], to mention but a few.
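For concreteness, the RR estimator above can be computed in a few lines of R. The following is a minimal sketch, not the authors' code; the function name ridge_coef is ours, and X is assumed to be a centered and scaled design matrix, as is customary in RR:

```r
# Minimal sketch: ridge coefficients for a given biasing parameter k.
# X: (n x p) design matrix (typically standardized); y: response vector.
ridge_coef <- function(X, y, k) {
  p <- ncol(X)
  solve(crossprod(X) + k * diag(p), crossprod(X, y))  # (X'X + k I_p)^{-1} X'y
}
```

Setting k = 0 recovers the OLS estimator, while larger k shrinks the coefficients toward zero.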
Bootstrap, initiated by [27], is a generic statistical method for assessing the accuracy of an estimator. Its core mechanism is random sampling with replacement. Many bootstrap techniques have been developed to address a large variety of statistical problems (see e.g. [28–32]). Common uses of bootstrapping include the construction of confidence intervals, the approximation of critical points in hypothesis tests, and the calculation of standard errors and bias. Here, however, we will show that the performance of any existing ridge estimator can be improved by replacing it with an appropriate quantile of its bootstrap distribution. The basic step in bootstrap methods is to replace the true distribution function with its empirical estimator. A key appeal of the bootstrap is that it does not rely on distributional assumptions such as normality and can estimate the standard error of any complicated estimator without theoretical calculations [29]. Such a method keeps the evaluated statistics as accurate and unbiased as possible.
The work in this manuscript grew out of combining the ideas of [33], who argued that a bootstrap approach to the selection of the ridge parameter is justified because it is based on repeated and independent estimates from multiple predictions, and of [18], who calculated the γth quantile of the ridge estimator originally proposed by [2] and presented in Eq (4).
We combine and extend their ideas to improve the performance of existing ridge estimators while maintaining the same general framework. The objective of this manuscript is to propose a novel bootstrap-quantile ridge (BQR) estimator together with its computational algorithm, and to evaluate and compare its performance with the baseline estimators. The approach aims at an optimal choice of the ridge parameter that yields a mean squared error (MSE) substantially smaller than that of its baseline counterpart. The paper thereby contributes to the existing literature by offering new insights into the selection of the ridge parameter.
The remainder of this article is structured as follows: Statistical methodology along with a brief review of some popular and widely used existing ridge estimators and our suggested BQR versions of these estimators are described in Sec. 2. A Monte Carlo simulation study has been conducted and its results are provided and discussed in Sec. 3. A real-life application is provided in Sec. 4. Finally, Sec. 5 concludes the article.
2. Statistical methodology
Considering the model given in (1), suppose that U is a (p × p) orthogonal matrix such that U′U = Ip and U′X′XU = Λ, where Λ = diag(λ1, λ2, …, λp) contains the eigenvalues of the matrix X′X. The canonical form of model (1) is then expressed as y = Zα + ε, where Z = XU and α = U′β.
The RR estimator in the canonical form is given as $\hat{\alpha}_k = (Z'Z + kI_p)^{-1}Z'y = (\Lambda + kI_p)^{-1}Z'y$, where k ≥ 0. The OLS estimate of α is $\hat{\alpha} = \Lambda^{-1}Z'y$. The MSE of $\hat{\alpha}_k$ is given as follows:
$\mathrm{MSE}(\hat{\alpha}_k) = \sigma^2 \sum_{w=1}^{p} \frac{\lambda_w}{(\lambda_w + k)^2} + k^2 \sum_{w=1}^{p} \frac{\alpha_w^2}{(\lambda_w + k)^2}, \quad (2)$
where σ² represents the error variance of model (1), αw is the wth element of α, and λw is the wth eigenvalue of the matrix X′X. The first term in Eq (2) is the total variance, which decreases in k, and the second is the squared bias introduced by k, which increases in k.
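To make the bias-variance trade-off in Eq (2) concrete, the following R sketch evaluates the theoretical MSE as a function of k; the function name and the illustrative inputs are ours:

```r
# Sketch: theoretical MSE of the ridge estimator in canonical form, Eq (2).
# lambda: eigenvalues of X'X; alpha: canonical coefficients; sigma2: error variance.
ridge_mse <- function(k, lambda, alpha, sigma2) {
  sigma2 * sum(lambda / (lambda + k)^2) +  # variance term, decreasing in k
    k^2 * sum(alpha^2 / (lambda + k)^2)    # squared-bias term, increasing in k
}

# Example: with one near-zero eigenvalue the curve dips well below the OLS MSE (k = 0).
sapply(c(0, 0.05, 0.5), ridge_mse, lambda = c(5, 1, 0.01), alpha = c(1, 1, 1), sigma2 = 1)
```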
2.1. Existing estimators
This section reviews some of the popular existing estimators; our suggested BQR versions are provided in the subsequent section. [2] proposed the ridge estimator as an alternative to the OLS estimator in the presence of collinearity among predictors. They suggested kw as the ratio of the estimated error variance ($\hat{\sigma}^2$) to the squared wth OLS estimate of α, as follows:
$k_w = \frac{\hat{\sigma}^2}{\hat{\alpha}_w^2}, \quad w = 1, 2, \ldots, p, \quad (3)$
where $\hat{\sigma}^2 = (y - X\hat{\beta})'(y - X\hat{\beta})/(n - p)$ is the residual mean square, an unbiased estimator of σ², and $\hat{\alpha}_w$ is the wth element of $\hat{\alpha}$, which is an unbiased estimate of α.
Furthermore, they found that the best strategy for achieving an optimal estimate is to determine a single value for the p ridge parameters, i.e. to replace kw defined in Eq (3) with a common k for all w, and hence suggested the following estimator:
$\hat{k}_{HK} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2}, \quad (4)$
where $\hat{\alpha}_{max}^2 = \max(\hat{\alpha}_1^2, \hat{\alpha}_2^2, \ldots, \hat{\alpha}_p^2)$.
Later, [11] proposed a ridge estimator based on the harmonic mean of the kw in Eq (3); the harmonic mean guards against the small $\hat{\alpha}_w$, which have little predictive power. It is defined as
$\hat{k}_{HKB} = \frac{p\hat{\sigma}^2}{\sum_{w=1}^{p} \hat{\alpha}_w^2}, \quad (5)$
[12] suggested the following ridge estimator, which uses the eigenvalues as weights:
$\hat{k}_{HSL} = \hat{\sigma}^2 \frac{\sum_{w=1}^{p} (\lambda_w \hat{\alpha}_w)^2}{\left(\sum_{w=1}^{p} \lambda_w \hat{\alpha}_w^2\right)^2}, \quad (6)$
where λw is the wth eigenvalue.
[14] modified the ridge estimator in Eq (3) by taking its arithmetic mean and geometric mean, respectively, as follows:
$\hat{k}_{AM} = \frac{1}{p}\sum_{w=1}^{p} \frac{\hat{\sigma}^2}{\hat{\alpha}_w^2}, \quad (7)$
$\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{w=1}^{p} \hat{\alpha}_w^2\right)^{1/p}}. \quad (8)$
For the sake of simplicity, in this paper we refer to the estimators defined in Eqs (4)–(8) as HK, HKB, HSL, AM, and GM, respectively.
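The five baseline estimators can all be computed from the canonical form of the model. The R sketch below is our own illustration (the function name baseline_k and the internal variable names are ours); it assumes a standardized design matrix with no intercept column:

```r
# Sketch: the five baseline ridge-parameter estimators of Eqs (4)-(8).
baseline_k <- function(X, y) {
  n <- nrow(X); p <- ncol(X)
  eig    <- eigen(crossprod(X), symmetric = TRUE)   # spectral decomposition of X'X
  lambda <- eig$values
  Z      <- X %*% eig$vectors                       # canonical regressors Z = XU
  alpha  <- drop(solve(crossprod(Z), crossprod(Z, y)))   # OLS estimate of alpha
  sigma2 <- sum((y - Z %*% alpha)^2) / (n - p)           # residual mean square
  c(HK  = sigma2 / max(alpha^2),                                         # Eq (4)
    HKB = p * sigma2 / sum(alpha^2),                                     # Eq (5)
    HSL = sigma2 * sum((lambda * alpha)^2) / sum(lambda * alpha^2)^2,    # Eq (6)
    AM  = mean(sigma2 / alpha^2),                                        # Eq (7)
    GM  = sigma2 / prod(alpha^2)^(1 / p))                                # Eq (8)
}
```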
2.2. Novel bootstrap-quantile estimators
Consider any ridge estimator $\hat{k}$ and its estimates $\hat{k}^{*(1)}, \hat{k}^{*(2)}, \ldots, \hat{k}^{*(B)}$ obtained from B bootstrap samples. The estimates are ordered in magnitude as $\hat{k}^{*}_{(1)} \le \hat{k}^{*}_{(2)} \le \cdots \le \hat{k}^{*}_{(B)}$. Let $\hat{k}^{*}_{\gamma}$ be the 100γth quantile of the bootstrap estimates; then the proposed BQR estimator is

$\hat{k}_{BQR} = \hat{k}^{*}_{\gamma}$, such that 0 < γ < 1.
It can easily be noted that the performance of any ridge estimator can be improved by an appropriate selection of the quantile level. Moreover, we have observed that upper quantile levels are generally the appropriate choice in this regard. Thus, for any ridge estimator we obtain a BQR version whose quantile level is chosen so that it yields the smallest MSE.
Since the MSE of the ridge estimator in the canonical form is given in Eq (2), the classical argument of [2] applies: there exists a real number k in the interval $(0, \sigma^2/\alpha_{max}^2)$ such that

$\mathrm{MSE}(\hat{\alpha}_k) < \mathrm{MSE}(\hat{\alpha}),$

i.e. some positive amount of shrinkage always improves on OLS.
2.2.1. Theoretical justification of the proposed method
Let G be the cumulative distribution function (CDF) of the ridge estimator $\hat{k}$. The γth quantile of this distribution is defined as $k_\gamma = G^{-1}(\gamma) = \inf\{k : G(k) \ge \gamma\}$, with 0 < γ < 1. Drawing B bootstrap samples and estimating the ridge parameter on each yields $\hat{k}^{*(1)}, \ldots, \hat{k}^{*(B)}$, from which we can define the empirical distribution function

$\hat{G}_B(k) = \frac{1}{B}\sum_{u=1}^{B} I\left(\hat{k}^{*(u)} \le k\right).$
It is important to note that the target functional, the quantile, is not linear in G. Let θk be a point mass at location k; then we can define the statistical function Tk and the influence function (see e.g. [29]) as follows:

$T(G) = G^{-1}(\gamma), \qquad LF(k) = \lim_{\epsilon \to 0} \frac{T\left((1-\epsilon)G + \epsilon\theta_k\right) - T(G)}{\epsilon}.$
Provided that Tk is smooth, LF(k) is also smooth, which leads to the linear approximation

$T(\hat{G}_B) \approx T(G) + \frac{1}{B}\sum_{u=1}^{B} LF\left(\hat{k}^{*(u)}\right).$
The bootstrap is simply a plug-in estimate, i.e. $\hat{k}^{*}_{\gamma} = T(\hat{G}_B) = \hat{G}_B^{-1}(\gamma)$, and $\hat{G}_B(k) \to G(k)$ as B → ∞. This consequently leads to the validity of bootstrap consistency.
The influence function of $\hat{k}_{\gamma}$ is

$LF(k) = \frac{\gamma - I(k \le k_\gamma)}{g(k_\gamma)},$

where g is the probability density function corresponding to G. Thus, using the delta method (see e.g. [34]),

$\hat{k}^{*}_{\gamma} \approx N\left(k_\gamma,\; \frac{\gamma(1-\gamma)}{B\,g(k_\gamma)^2}\right).$
This gives the asymptotic distribution of the bootstrap quantile estimator and, in particular, its variance.
2.2.2. Computation of BQR
The proposed BQR estimators are computed using the methodology described in Algorithm-1.
Algorithm-1: Bootstrap-Quantile Estimator
1. Generate a random sample {(Xj, yj) : j = 1, 2, ⋯, n} under the specified DGP.
2. Draw B bootstrap samples from {(Xj, yj) : j = 1, 2, ⋯, n} by resampling the n cases with replacement.
3. For each bootstrap sample, compute the ridge estimate $\hat{k}^{*(u)}$, u = 1, 2, ⋯, B.
4. Order the estimates as $\hat{k}^{*}_{(1)} \le \cdots \le \hat{k}^{*}_{(B)}$ and calculate their γth quantile $\hat{k}^{*}_{\gamma}$, which gives the BQR estimate $\hat{k}_{BQR} = \hat{k}^{*}_{\gamma}$.
The proposed BQR estimators corresponding to the considered ridge estimators HK, HKB, HSL, AM, and GM are denoted by HK*, HKB*, HSL*, AM*, and GM*, respectively.
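A compact R sketch of Algorithm-1 follows. It is ours, not the authors' code: k_fun stands for any baseline ridge estimator (for instance a single component of the baseline_k() sketch above), and the default quantile level gamma = 0.95 is only an illustrative choice, since the paper selects upper quantile levels by their resulting MSE:

```r
# Sketch of Algorithm-1: nonparametric bootstrap-quantile ridge (BQR) estimator.
bqr <- function(X, y, k_fun, B = 200, gamma = 0.95) {
  n <- nrow(X)
  k_star <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # Step 2: resample cases with replacement
    k_fun(X[idx, , drop = FALSE], y[idx])     # Step 3: ridge parameter per sample
  })
  unname(quantile(k_star, probs = gamma))     # Step 4: gamma-th quantile of the B estimates
}
```

For example, `bqr(X, y, function(X, y) baseline_k(X, y)[["HK"]])` would return the HK* estimate under these settings.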
2.3. Performance evaluation criteria
The MSE criterion is used to compare the performance of our proposed estimators with their baseline counterparts. This evaluation criterion has been used previously in numerous studies, such as [14, 18, 35, 36], among many others.
The MSE is given as:

$\mathrm{MSE}(\hat{\alpha}^{*}) = E\left[(\hat{\alpha}^{*} - \alpha)'(\hat{\alpha}^{*} - \alpha)\right],$

where $\hat{\alpha}^{*}$ denotes any of the estimators under study.
Since a theoretical comparison of the ridge estimators presented in Eqs (4)–(8) with their BQR versions is not possible, we proceed empirically: Monte Carlo simulations are used to assess the performance of the considered and proposed estimators. To further quantify the strength of the proposed estimators over the baseline estimators, the percentage improvement in MSE due to BQR is calculated as

$\mathrm{PMSE} = 100 \times \frac{\mathrm{MSE}_{baseline} - \mathrm{MSE}_{BQR}}{\mathrm{MSE}_{baseline}},$
where PMSE indicates the percentage increase/decrease in MSE due to the BQR estimator in comparison with its baseline counterpart. Thus a positive PMSE indicates an improvement, while a negative PMSE indicates deterioration due to the use of BQR compared with the baseline estimator.
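In code, the indicator is a one-liner; the following R sketch simply mirrors the definition above:

```r
# Percentage reduction in MSE achieved by BQR relative to its baseline estimator.
pmse <- function(mse_baseline, mse_bqr) {
  100 * (mse_baseline - mse_bqr) / mse_baseline
}
pmse(0.718, 0.244)  # e.g. HK vs HK* at p = 4 in Table 6: about 66%
```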
3. Numerical evaluation
In this section, we briefly describe the data generating process (DGP) together with the important factors, namely sample size (n), error variance (σ²), dimension (p), collinearity level (ρ), and the error term distribution, which are varied in the simulation study to examine behavior in different settings. The results of the simulation study for the baseline estimators and their corresponding BQR versions are also presented.
3.1. Simulation design
Keeping in view previous studies, we adopt the DGP used by [14, 20, 21, 37, 38]:

$x_{jw} = (1 - \rho^2)^{1/2} z_{jw} + \rho z_{j(p+1)}, \quad j = 1, 2, \ldots, n, \quad w = 1, 2, \ldots, p,$

where zjw are independent pseudo-random numbers drawn from a standard normal distribution and ρ determines the degree of correlation between any two predictors (under this scheme their pairwise correlation is ρ²). Further, the dependent variable y is generated as:

$y_j = \beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \cdots + \beta_p x_{jp} + \varepsilon_j,$
where εj is a random error generated from a normal distribution with mean zero and variance σ². Here, without loss of generality, we assume β0 = 0, i.e. the intercept of the regression model is zero. The experiment is replicated M times and the MSE of the estimators is computed using the following formula:

$\mathrm{MSE}(\hat{\alpha}^{*}) = \frac{1}{M}\sum_{v=1}^{M}\sum_{w=1}^{p}\left(\hat{\alpha}^{*}_{wv} - \alpha_w\right)^2,$

where $\hat{\alpha}^{*}_{wv}$ represents one of the above-mentioned estimates of the wth true regression parameter αw in the vth replication.
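A sketch in R of this DGP follows. The function name and the slope coefficients beta are ours for illustration (the paper fixes the intercept at zero but does not restate the slope values here):

```r
# Sketch of the simulation DGP: collinear predictors and normal errors.
make_data <- function(n, p, rho, sigma2, beta = rep(1, p)) {
  Z <- matrix(rnorm(n * (p + 1)), n, p + 1)           # z_jw ~ N(0, 1), independent
  X <- sqrt(1 - rho^2) * Z[, 1:p] + rho * Z[, p + 1]  # x_jw; pairwise correlation rho^2
  y <- drop(X %*% beta) + rnorm(n, sd = sqrt(sigma2)) # beta_0 = 0
  list(X = X, y = y)
}
```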
The design matrix X is generated to investigate the effects of four levels of collinearity, i.e. ρ = 0.70, 0.80, 0.90, and 0.99; these values cover a wide range of low, moderate, and strong correlations among the variables. Three levels of error variance are considered (σ² = 0.5, 1, 5), whereas n and p are varied proportionally, i.e. we consider the following cases.
Case 1: n = 25, p = 4, σ2 = 0.5, 1, 5 and ρ = 0.70, 0.80, 0.90 and 0.99.
Case 2: n = 50, p = 8, σ2 = 0.5, 1, 5 and ρ = 0.70, 0.80, 0.90 and 0.99.
Case 3: n = 100, p = 10, σ2 = 0.5, 1, 5 and ρ = 0.70, 0.80, 0.90 and 0.99.
Further to explore the behavior of the proposed BQR for large sample size, as an illustration, we have considered,
Case 4: n = 200, σ² = 1, ρ = 0.99, with varying numbers of predictors, i.e. p = 4, 8, 10, 16, 32.
In this study, the number of bootstrap samples is taken to be 200 (i.e. B = 200). The estimated MSEs for different combinations of n, p, σ², and ρ, when the error term is normally distributed, are presented in Tables 1–3. Furthermore, to study the effect of a non-normal error term, we generated the error terms from a t-distribution with 2 degrees of freedom, i.e. εj ∼ t(2), and from an F-distribution with 4 and 16 degrees of freedom, i.e. εj ∼ F(4, 16). The resulting estimated MSEs of the classical ridge and our proposed BQR estimators are presented in Tables 4 and 5, respectively. All calculations were performed in the R programming language, version 4.1.0.
Table 1. Estimated MSE when the distribution of the error term is N(0, σ²) with n = 25, p = 4.
σ² | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
ρ = 0.7 | |||||||||||
0.5 | 0.105 | 0.097 | 0.064 | 0.082 | 0.051 | 0.097 | 0.079 | 0.305 | 0.045 | 0.080 | 0.046 |
1 | 0.401 | 0.342 | 0.172 | 0.255 | 0.168 | 0.351 | 0.277 | 0.397 | 0.145 | 0.185 | 0.127 |
5 | 11.232 | 5.024 | 1.168 | 4.341 | 1.763 | 4.797 | 2.502 | 1.084 | 0.661 | 1.783 | 0.604 |
ρ = 0.8 | |||||||||||
0.5 | 0.301 | 0.206 | 0.074 | 0.151 | 0.067 | 0.206 | 0.123 | 0.299 | 0.045 | 0.075 | 0.042 |
1 | 1.081 | 0.665 | 0.203 | 0.465 | 0.199 | 0.510 | 0.294 | 0.372 | 0.125 | 0.195 | 0.154 |
5 | 20.711 | 9.634 | 1.514 | 8.811 | 3.156 | 6.484 | 2.509 | 1.104 | 0.559 | 2.040 | 0.552 |
ρ = 0.9 | |||||||||||
0.5 | 0.398 | 0.259 | 0.081 | 0.190 | 0.074 | 0.260 | 0.146 | 0.187 | 0.052 | 0.137 | 0.035 |
1 | 1.138 | 0.793 | 0.422 | 0.585 | 0.228 | 0.631 | 0.324 | 0.296 | 0.115 | 0.317 | 0.094 |
5 | 35.32 | 17.89 | 2.497 | 15.110 | 5.038 | 6.771 | 2.277 | 1.285 | 0.525 | 2.534 | 0.585 |
ρ = 0.99 | |||||||||||
0.5 | 2.909 | 0.899 | 0.116 | 0.701 | 0.137 | 0.614 | 0.137 | 0.327 | 0.037 | 0.483 | 0.035 |
1 | 11.88 | 6.533 | 0.375 | 5.601 | 1.446 | 0.592 | 0.106 | 0.794 | 0.083 | 1.465 | 0.179 |
5 | 190.3 | 45.27 | 4.221 | 39.651 | 14.20 | 8.097 | 1.153 | 1.225 | 0.466 | 10.06 | 1.123 |
Table 3. Estimated MSE when the distribution of the error term is N(0, σ²) with n = 100, p = 10.
σ² | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
ρ = 0.7 | |||||||||||
0.5 | 0.087 | 0.077 | 0.0741 | 0.061 | 0.051 | 0.078 | 0.075 | 0.065 | 0.021 | 0.034 | 0.019 |
1 | 0.352 | 0.310 | 0.2890 | 0.178 | 0.154 | 0.311 | 0.295 | 0.112 | 0.048 | 0.062 | 0.047 |
5 | 7.502 | 4.200 | 2.304 | 2.438 | 1.873 | 3.483 | 3.258 | 0.583 | 0.262 | 0.661 | 0.339 |
ρ = 0.8 | |||||||||||
0.5 | 0.132 | 0.101 | 0.094 | 0.073 | 0.075 | 0.101 | 0.095 | 0.119 | 0.098 | 0.024 | 0.016 |
1 | 0.502 | 0.440 | 0.391 | 0.241 | 0.202 | 0.439 | 0.406 | 0.312 | 0.039 | 0.062 | 0.039 |
5 | 15.79 | 5.623 | 2.672 | 3.173 | 2.382 | 3.651 | 3.369 | 0.537 | 0.229 | 0.902 | 0.393 |
ρ = 0.9 | |||||||||||
0.5 | 0.249 | 0.169 | 0.130 | 0.107 | 0.074 | 0.169 | 0.149 | 0.207 | 0.113 | 0.029 | 0.013 |
1 | 1.088 | 0.591 | 0.479 | 0.303 | 0.240 | 0.573 | 0.512 | 0.254 | 0.035 | 0.078 | 0.036 |
5 | 31.38 | 10.726 | 3.901 | 6.056 | 4.255 | 3.656 | 3.158 | 0.578 | 0.183 | 1.505 | 0.626 |
ρ = 0.99 | |||||||||||
0.5 | 2.395 | 1.326 | 0.642 | 0.685 | 0.320 | 1.066 | 0.617 | 0.111 | 0.007 | 0.179 | 0.038 |
1 | 9.973 | 4.899 | 1.798 | 2.581 | 1.513 | 1.930 | 1.340 | 0.207 | 0.012 | 0.615 | 0.220 |
5 | 240.11 | 101.02 | 29.85 | 51.94 | 34.36 | 1.525 | 1.138 | 1.811 | 0.117 | 10.015 | 3.927 |
Table 4. Estimated MSE when the distribution of error term is standardized t–distribution with 2 degrees of freedom.
ρ | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
n = 25,p = 4 | |||||||||||
0.70 | 3.857 | 1.824 | 0.540 | 1.693 | 0.819 | 2.046 | 1.281 | 0.596 | 0.408 | 0.673 | 0.408 |
0.80 | 15.432 | 2.757 | 0.718 | 2.164 | 1.038 | 2.355 | 1.418 | 0.613 | 0.319 | 0.959 | 0.433 |
0.90 | 18.291 | 4.829 | 1.313 | 3.985 | 1.326 | 3.449 | 1.757 | 0.799 | 0.260 | 1.290 | 0.359 |
0.99 | 150.882 | 39.18 | 2.690 | 60.732 | 8.300 | 11.024 | 0.854 | 1.150 | 0.249 | 5.233 | 0.693 |
n = 50,p = 8 | |||||||||||
0.70 | 7.046 | 2.623 | 1.284 | 1.656 | 1.236 | 2.639 | 2.345 | 0.474 | 0.281 | 1.616 | 0.739 |
0.80 | 16.889 | 3.904 | 1.780 | 2.179 | 1.630 | 3.243 | 2.847 | 0.493 | 0.208 | 0.589 | 0.249 |
0.90 | 20.315 | 8.095 | 3.020 | 4.979 | 3.228 | 5.155 | 3.223 | 0.471 | 0.199 | 0.939 | 0.291 |
0.99 | 160.586 | 62.558 | 9.840 | 38.215 | 21.482 | 14.575 | 1.593 | 1.211 | 0.112 | 6.356 | 1.843 |
n = 100,p = 10 | |||||||||||
0.70 | 3.632 | 1.884 | 1.194 | 1.008 | 0.799 | 1.643 | 1.537 | 0.481 | 0.169 | 0.419 | 0.211 |
0.80 | 5.110 | 3.165 | 1.682 | 1.684 | 1.278 | 2.349 | 1.982 | 0.405 | 0.124 | 0.355 | 0.153 |
0.90 | 18.74 | 5.319 | 2.833 | 3.204 | 2.397 | 4.094 | 3.528 | 0.349 | 0.099 | 0.640 | 0.242 |
0.99 | 160.1 | 40.28 | 11.81 | 23.73 | 15.36 | 3.446 | 1.496 | 0.709 | 0.064 | 4.332 | 1.490 |
Table 5. Estimated MSE when the distribution of error term is standardized F(4,16) distribution.
ρ | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
n = 25,p = 4 | |||||||||||
0.70 | 0.498 | 0.188 | 0.165 | 0.130 | 0.119 | 0.196 | 0.146 | 0.485 | 0.119 | 0.215 | 0.112 |
0.80 | 0.752 | 0.269 | 0.123 | 0.175 | 0.107 | 0.300 | 0.154 | 0.468 | 0.123 | 0.199 | 0.109 |
0.90 | 2.607 | 0.333 | 0.110 | 0.194 | 0.099 | 0.274 | 0.122 | 0.347 | 0.109 | 0.151 | 0.085 |
0.99 | 21.481 | 3.351 | 0.095 | 2.078 | 0.521 | 0.171 | 0.054 | 0.268 | 0.054 | 0.529 | 0.054 |
n = 50,p = 8 | |||||||||||
0.70 | 0.406 | 0.305 | 0.198 | 0.127 | 0.083 | 0.311 | 0.234 | 0.553 | 0.072 | 0.137 | 0.067 |
0.80 | 0.677 | 0.366 | 0.228 | 0.147 | 0.091 | 0.356 | 0.255 | 0.518 | 0.070 | 0.123 | 0.066 |
0.90 | 1.467 | 0.531 | 0.269 | 0.211 | 0.115 | 0.463 | 0.308 | 0.333 | 0.047 | 0.081 | 0.044 |
0.99 | 16.212 | 3.199 | 0.448 | 1.421 | 0.502 | 0.486 | 0.160 | 0.182 | 0.019 | 0.417 | 0.071 |
n = 100,p = 10 | |||||||||||
0.70 | 0.288 | 0.222 | 0.190 | 0.100 | 0.081 | 0.223 | 0.201 | 0.580 | 0.065 | 0.112 | 0.047 |
0.80 | 0.364 | 0.257 | 0.214 | 0.106 | 0.082 | 0.258 | 0.224 | 0.503 | 0.047 | 0.087 | 0.039 |
0.90 | 1.072 | 0.505 | 0.320 | 0.184 | 0.111 | 0.471 | 0.346 | 0.330 | 0.026 | 0.045 | 0.025 |
0.99 | 9.553 | 2.469 | 0.687 | 0.984 | 0.456 | 0.805 | 0.378 | 0.138 | 0.018 | 0.189 | 0.043 |
3.2. Results and discussion
In this section, we discuss the results of extensive simulations under the considered scenarios. The performance of the proposed and existing estimators is assessed on the basis of the MSE criterion. As the MSE is affected by the error variance, the distribution of the error term, the dimensionality, the predictors' correlation structure, and the sample size, we have considered the various combinations of these factors discussed in the previous section. We used 1000 Monte Carlo runs and 200 bootstrap samples, and the MSE results are presented in Tables 1–5. For a clearer picture, Figs 1–3 exhibit the percentage reduction in MSE (PMSE) due to the proposed BQR method compared with the classical counterpart when the distribution of the error term is N(0, σ²), whereas Figs 4 and 5 show the PMSE when the error term is non-normal (following the t and F distributions, respectively).
Fig 1. Improvements by proposed ridge estimators with different values of ρ when n = 25 and p = 4.
Fig 3. Improvements by proposed ridge estimators with different values of ρ when n = 100 and p = 10.
Fig 4. Improvements by proposed ridge estimators when error term follows t distribution.
Fig 5. Improvements by proposed ridge estimators when error term follows F distribution.
To highlight the performance of the studied estimators, boldface in Tables 1–6 indicates the more efficient estimator. The proposed BQR estimators yield reduced MSE compared with the baseline estimators in almost every considered scenario; the results are therefore encouraging. Another key finding is the substantial reduction in MSE achieved by the BQR estimator when ρ = 0.99, which is evident from Figs 1–5. For instance, there is an 87%, 94%, and 91% reduction in the MSE of HK when ρ = 0.99 and σ² = 0.5, 1, 5, respectively, for n = 25, p = 4 and normally distributed errors (see Fig 1). The reductions in the MSE of HK for the same level of collinearity and error variances are 74%, 76%, and 81% when n = 50 and p = 8 (see Fig 2). These results indicate that bootstrapping can serve as an instrument for boosting the efficiency of existing ridge estimators.
Table 6. Estimated MSE when the distribution of the error term is N(0, σ²) with n = 200, σ² = 1, and ρ = 0.99. The values in parentheses indicate the reduction (%) in MSE due to the novel BQR method as compared to the baseline counterpart.
p | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
4 | 1.548 | 0.718 | 0.244 (66%) | 0.553 | 0.363 (34%) | 0.541 | 0.450 (17%) | 0.499 | 0.047 (91%) | 0.232 | 0.096 (58%)
8 | 3.396 | 1.537 | 1.014 (34%) | 1.406 | 0.604 (57%) | 1.178 | 1.000 (15%) | 0.145 | 0.008 (94%) | 0.257 | 0.098 (61%)
10 | 3.853 | 1.857 | 1.221 (34%) | 0.977 | 0.687 (30%) | 1.378 | 1.191 (14%) | 0.126 | 0.009 (93%) | 0.278 | 0.105 (62%)
16 | 7.879 | 3.784 | 2.268 (40%) | 1.740 | 1.360 (22%) | 2.417 | 2.048 (15%) | 0.128 | 0.008 (94%) | 0.405 | 0.326 (20%)
32 | 18.879 | 10.114 | 6.425 (36%) | 4.043 | 3.479 (14%) | 5.522 | 4.045 (27%) | 0.109 | 0.007 (94%) | 0.665 | 0.508 (24%)
Fig 2. Improvements by proposed ridge estimators with different values of ρ when n = 50 and p = 8.
When the probability distribution of the error term is t(2) or F(4,16), the BQR shows the same improved performance as in the case of normal errors (see Tables 4 and 5, Figs 4 and 5). Here the proposed BQR version of AM, i.e. the AM* estimator, turns out to be the most efficient, as it exhibits the maximum reduction in MSE in almost every considered scenario. When the error term is normally distributed, BQR can reduce the MSE of AM by up to 95%, whereas the maximum reduction in the MSE of HK is 93% and 97% when the error term follows the t-distribution and the F-distribution, respectively.
Moreover, the MSE tends to increase with the error variance. An increase in the degree of collinearity also generally increases the MSE of all estimators; however, in the case of severe collinearity, i.e. ρ = 0.99 and σ² = 1, some estimators show a decrease in the estimated MSE. For example, when n = 25 and p = 4, a decrease in the MSE of estimators such as HSL, HSL*, and AM* can be observed (see Table 1); similarly, when n = 50 and p = 8, the MSE of the AM* estimator decreases (see Table 2), and when n = 100 and p = 10, the MSE of the AM and AM* estimators decreases (see Table 3).
Table 2. Estimated MSE when the distribution of error term is N(0,σ2) with n = 50, p = 8.
σ² | OLS | HK | HK* | HKB | HKB* | HSL | HSL* | AM | AM* | GM | GM*
---|---|---|---|---|---|---|---|---|---|---|---
ρ = 0.7 | |||||||||||
0.5 | 0.242 | 0.111 | 0.037 | 0.086 | 0.061 | 0.111 | 0.102 | 0.109 | 0.030 | 0.052 | 0.030 |
1 | 0.675 | 0.455 | 0.385 | 0.270 | 0.210 | 0.458 | 0.413 | 0.148 | 0.080 | 0.111 | 0.078 |
5 | 10.37 | 5.780 | 2.182 | 3.485 | 2.417 | 4.862 | 4.335 | 0.702 | 0.411 | 1.139 | 0.492 |
ρ = 0.8 | |||||||||||
0.5 | 0.489 | 0.258 | 0.169 | 0.162 | 0.082 | 0.258 | 0.193 | 0.121 | 0.023 | 0.043 | 0.026 |
1 | 0.948 | 0.579 | 0.416 | 0.313 | 0.214 | 0.556 | 0.450 | 0.208 | 0.071 | 0.108 | 0.065 |
5 | 21.32 | 9.492 | 2.709 | 5.478 | 3.643 | 5.088 | 4.354 | 0.825 | 0.309 | 1.813 | 0.649 |
ρ = 0.9 | |||||||||||
0.5 | 0.520 | 0.282 | 0.168 | 0.168 | 0.083 | 0.282 | 0.196 | 0.205 | 0.025 | 0.053 | 0.021 |
1 | 1.509 | 0.869 | 0.595 | 0.480 | 0.322 | 0.836 | 0.654 | 0.278 | 0.048 | 0.159 | 0.058 |
5 | 57.24 | 14.531 | 3.375 | 9.063 | 5.337 | 4.920 | 3.923 | 0.795 | 0.277 | 1.982 | 0.672 |
ρ = 0.99 | |||||||||||
0.5 | 6.277 | 1.729 | 0.449 | 0.996 | 0.276 | 1.138 | 0.376 | 0.261 | 0.013 | 0.312 | 0.036 |
1 | 17.46 | 6.889 | 1.669 | 4.124 | 2.082 | 1.702 | 0.767 | 0.373 | 0.024 | 1.033 | 0.258 |
5 | 531.7 | 190.3 | 35.60 | 111.93 | 68.16 | 1.755 | 0.957 | 2.562 | 0.139 | 26.260 | 7.637 |
Nevertheless, the MSEs of the BQR estimators are appreciably lower in almost all the considered simulation scenarios, despite increases in the sample size or the number of predictors. The results in Table 6 show the same excellent performance of the BQR estimators as observed in Tables 1–3 in terms of minimum MSE, suggesting the superiority of the proposed estimators over their counterparts even when the sample size is large.
4. Real-life application
In this section, to illustrate the use of our proposed estimators and methodology, we consider two published real-life data sets: the Tobacco data and the Hospital manpower data [39]. These data sets are broadly aligned with the structure used in our simulation work in Section 3.1.
4.1. Tobacco data
This data set consists of 30 observations. The dependent variable is the heat evolved from tobacco during the smoking process, and the percentage concentrations of four important components are taken as independent variables. The linear regression model is given as:

$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon.$
The condition number (CN), the ratio of the maximum to the minimum eigenvalue, is calculated as 1855.526, and the variance inflation factors (VIF) of all predictors are greater than 10: for X1, X2, X3, and X4 the VIFs are 324.141, 45.173, 173.258, and 138.175, respectively. This indicates the presence of severe multicollinearity in the data. Fig 6 displays the correlations among the variables of the Tobacco data. Table 7 provides the estimated MSE values and regression coefficients for the proposed as well as the baseline estimators.
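The diagnostics quoted above can be reproduced along the following lines in R. This is a sketch under our own assumptions: `tobacco` denotes a hypothetical data frame with the response in the first column and the four predictors in the remaining columns.

```r
# Sketch: collinearity diagnostics for a design matrix X (layout of `tobacco` assumed).
X <- scale(as.matrix(tobacco[, -1]))  # predictors X1-X4, standardized
lambda <- eigen(crossprod(X))$values
cn  <- max(lambda) / min(lambda)      # condition number: ratio of extreme eigenvalues
vif <- diag(solve(cor(X)))            # VIF_w = 1 / (1 - R_w^2)
```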
Fig 6. Graphical display of the correlations between variables of the Tobacco data.
Table 7. Estimated MSEs and estimated regression coefficients of the Tobacco data.
Estimator | k | MSE | β̂1 | β̂2 | β̂3 | β̂4
---|---|---|---|---|---|---
OLS | 0 | 1.1206 | 1.5074 | -0.5211 | -0.8416 | 0.8217
HK | 0.02 | 0.7749 | 1.2083 | -0.4658 | -0.6110 | 0.8354
HK* | 0.08 | 0.6270 | 0.8344 | -0.3613 | -0.2856 | 0.7799
HKB | 0.05 | 0.6449 | 0.9877 | -0.4116 | -0.4254 | 0.8165
HKB* | 0.06 | 0.6365 | 0.9583 | -0.4030 | -0.3994 | 0.8113
HSL | 0.02 | 0.7753 | 1.2089 | -0.4660 | -0.6114 | 0.8354
HSL* | 0.11 | 0.6510 | 0.7422 | -0.3227 | -0.1963 | 0.7442
AM | 0.08 | 0.6279 | 0.8271 | -0.3585 | -0.2787 | 0.7775
AM* | 0.07 | 0.6254 | 0.8816 | -0.3783 | -0.3297 | 0.7938
GM | 0.06 | 0.6277 | 0.9030 | -0.3856 | -0.3494 | 0.7993
GM* | 0.06 | 0.6268 | 0.9109 | -0.3882 | -0.3566 | 0.8012
Note: Bold value represents the estimator with the smallest MSE.
It is evident from Table 7 that all the BQR estimators outperform OLS as well as their baseline estimators in terms of smallest MSE.
4.2. Hospital manpower data
The data consist of 17 observations on 5 explanatory variables: Load (average daily patient load), Xray (monthly X-ray exposures), BedDays (monthly occupied bed days), AreaPop (eligible population in the area, in thousands), and Stay (average length of patients' stay, in days). The dependent variable is Hours (monthly man hours).
The linear regression model is given as:

$\mathrm{Hours} = \beta_0 + \beta_1\,\mathrm{Load} + \beta_2\,\mathrm{Xray} + \beta_3\,\mathrm{BedDays} + \beta_4\,\mathrm{AreaPop} + \beta_5\,\mathrm{Stay} + \varepsilon.$
Fig 7 displays the correlations among the variables of the Hospital manpower data. The CN is calculated as 77769.66. Also, the VIFs of the predictors Load, BedDays, and AreaPop are 9597.57, 8933.09, and 23.29, all greater than 10; these numbers indicate a strong multicollinearity problem among the predictors. The estimated MSE for each estimator of the manpower data is given in Table 8. From this table it is evident that the proposed BQR estimators have smaller MSE than their baseline estimators, again suggesting the superiority of the proposed estimators.
Fig 7. Graphical display of the correlations between variables of the Hospital manpower data.
Table 8. Estimated MSEs and estimated regression coefficients of the Hospital manpower data.
Estimator | k | MSE | β̂1 | β̂2 | β̂3 | β̂4 | β̂5
---|---|---|---|---|---|---|---
OLS | 0 | 14.1831 | -0.4591 | 0.2140 | 1.4027 | -0.0819 | -0.1123
HK | 0.01 | 0.2483 | 0.3603 | 0.2161 | 0.6073 | -0.1053 | -0.1186
HK* | 0.08 | 0.0424 | 0.4400 | 0.2231 | 0.4811 | -0.0750 | -0.1041
HKB | 0.03 | 0.0522 | 0.4380 | 0.2185 | 0.5164 | -0.0972 | -0.1144
HKB* | 0.14 | 0.0487 | 0.4255 | 0.2285 | 0.4551 | -0.0477 | -0.0916
HSL | 0.04 | 0.0461 | 0.4422 | 0.2195 | 0.5048 | -0.0923 | -0.1121
HSL* | 0.05 | 0.0437 | 0.4431 | 0.2205 | 0.4971 | -0.0877 | -0.1100
AM | 0.62 | 0.1922 | 0.3519 | 0.2458 | 0.3661 | 0.0618 | -0.0402
AM* | 0.16 | 0.0519 | 0.4211 | 0.2298 | 0.4489 | -0.0405 | -0.0883
GM | 0.18 | 0.0539 | 0.4187 | 0.2305 | 0.4456 | -0.0366 | -0.0865
GM* | 0.10 | 0.0438 | 0.4348 | 0.2253 | 0.4699 | -0.0639 | -0.0991
Note: Bold value represents the estimator with the smallest MSE.
5. Conclusion
In this article, a bootstrap-quantile approach is suggested for improving existing ridge estimators and thus for efficient estimation of the regression coefficients. Since the ridge parameter controls the amount of shrinkage, its optimization is an important task for obtaining better regression estimates. Using the resampling mechanism inherent in bootstrapping, it is demonstrated that the proposed BQR method remarkably improves the performance of any ridge estimator. We have also presented applications to the Tobacco and Hospital manpower data to illustrate the use of the proposed method. Bootstrap methods, especially the wild bootstrap, can be further studied for regression models with multicollinear predictors and heteroscedastic errors.
Data Availability
Published data has been used. A complete reference is provided in the manuscript.
Funding Statement
The authors received no specific funding for this work.
References
- 1. Schroeder MA, Lander J, Levine-Silverman S. Diagnosing and dealing with multicollinearity. Western Journal of Nursing Research. 1990;12(2):175–87. doi: 10.1177/019394599001200204
- 2. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
- 3. Massy WF. Principal components regression in exploratory statistical research. Journal of the American Statistical Association. 1965;60(309):234–56.
- 4. Wold H. Estimation of principal components and related models by iterative least squares. Multivariate Analysis. 1966:391–420.
- 5. Stone M, Brooks RJ. Continuum regression: cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. Journal of the Royal Statistical Society: Series B (Methodological). 1990;52(2):237–58.
- 6. Khan MH, Shaw JE. Variable selection for survival data with a class of adaptive elastic net techniques. Statistics and Computing. 2016;26(3):725–41.
- 7. Park E, Ha ID. Penalized variable selection for accelerated failure time models. Communications for Statistical Applications and Methods. 2018;25(6):591–604.
- 8. Khan MH, Bhadra A, Howlader T. Stability selection for lasso, ridge and elastic net implemented with AFT models. Statistical Applications in Genetics and Molecular Biology. 2019;18(5). doi: 10.1515/sagmb-2017-0001
- 9. Belsley DA, Kuh E, Welsch RE. Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons; 2005.
- 10. Maddala GS, Lahiri K. Introduction to econometrics. New York: Macmillan; 1992.
- 11. Hoerl AE, Kannard RW, Baldwin KF. Ridge regression: some simulations. Communications in Statistics-Theory and Methods. 1975;4(2):105–23.
- 12. Hocking RR, Speed FM, Lynn MJ. A class of biased estimators in linear regression. Technometrics. 1976;18(4):425–37.
- 13. Lawless JF, Wang P. A simulation study of ridge and other regression estimators. Communications in Statistics-Theory and Methods. 1976;5(4):307–23.
- 14. Kibria BMG. Performance of some new ridge regression estimators. Communications in Statistics-Simulation and Computation. 2003;32(2):419–35.
- 15. Muniz G, Kibria BMG. On some ridge regression estimators: An empirical comparison. Communications in Statistics-Simulation and Computation. 2009;38(3):621–30.
- 16. Khalaf G, Månsson K, Shukur G. Modified ridge regression estimators. Communications in Statistics-Theory and Methods. 2013;42(8):1476–87.
- 17. Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: Simulations and applications. Scientifica. 2020;2020. doi: 10.1155/2020/9758378
- 18. Suhail M, Chand S, Kibria BMG. Quantile based estimation of biasing parameters in ridge regression model. Communications in Statistics-Simulation and Computation. 2020;49(10):2732–44.
- 19. Mermi S, Göktaş A, Akkuş Ö. Are most proposed ridge parameter estimators skewed and do they have any effect on MSE values? Journal of Statistical Computation and Simulation. 2021;91(10):2074–93.
- 20. Shabbir M, Chand S, Iqbal F. Bagging-based ridge estimators for a linear regression model with non-normal and heteroscedastic errors. Communications in Statistics-Simulation and Computation. 2022:1–5.
- 21. Dar IS, Chand S, Shabbir M, Kibria BMG. Condition-index based new ridge regression estimator for linear regression model with multicollinearity. Kuwait Journal of Science. 2023;50(2):91–6.
- 22. Abonazel MR, Taha IM. Beta ridge regression estimators: simulation and application. Communications in Statistics-Simulation and Computation. 2023;52(9):4280–92.
- 23. Dawoud I, Kibria BMG. A new biased estimator to combat the multicollinearity of the Gaussian linear regression model. Stats. 2020;3(4):526–41.
- 24. Hadia M, Amin M, Akram MN. Comparison of link functions for the estimation of logistic ridge regression: An application to urine data. Communications in Statistics-Simulation and Computation. 2022:1–7.
- 25. Yehia EG. On the restricted Poisson ridge regression estimator. Science Journal of Applied Mathematics and Statistics. 2021;9:106.
- 26. Dawoud I, Abonazel MR, Awwad FA, Tag Eldin E. A new Tobit Ridge-type estimator of the censored regression model with multicollinearity problem. Frontiers in Applied Mathematics and Statistics. 2022;8:952142.
- 27. Efron B. Bootstrap methods: another look at the jackknife. Annals of Statistics. 1979;7(1):1–26.
- 28. Efron B, Hastie T. Computer age statistical inference. Cambridge University Press; 2016.
- 29. Efron B, Tibshirani RJ. An introduction to the bootstrap. Chapman and Hall/CRC; 1994.
- 30. Hesterberg TC. What teachers should know about the bootstrap: Resampling in the undergraduate statistics curriculum. The American Statistician. 2015;69(4):371–86. doi: 10.1080/00031305.2015.1089789
- 31. Rousselet G, Pernet CR, Wilcox RR. An introduction to the bootstrap: a versatile method to make inferences by using data-driven simulations. Meta-Psychology. 2023;7.
- 32. Wilcox RR. Introduction to robust estimation and hypothesis testing. Academic Press; 2011.
- 33. Delaney NJ, Chatterjee S. Use of the bootstrap and cross-validation in ridge regression. Journal of Business & Economic Statistics. 1986;4(2):255–62.
- 34. Oehlert GW. A note on the delta method. The American Statistician. 1992;46(1):27–9.
- 35. Gibbons DG. A simulation study of some ridge estimators. Journal of the American Statistical Association. 1981;76(373):131–9.
- 36. Kibria BMG, Banik S. Some ridge regression estimators and their performances. Journal of Modern Applied Statistical Methods. 2020;15(1):12.
- 37. McDonald GC, Galarneau DI. A Monte Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association. 1975;70(350):407–16.
- 38. Dawoud I. A new improved estimator for reducing the multicollinearity effects. Communications in Statistics-Simulation and Computation. 2023;52(8):3581–92.
- 39. Myers RH. Classical and modern regression with applications. Boston: PWS-Kent Publishing; 1990.