Scientific Reports. 2023 Jun 5;13:9066. doi: 10.1038/s41598-023-36053-z

Robust-stein estimator for overcoming outliers and multicollinearity

Adewale F Lukman 1,2, Rasha A Farghali 3, B M Golam Kibria 4, Okunlola A Oluyemi 1,2
PMCID: PMC10241929  PMID: 37277421

Abstract

Linear regression models with correlated regressors can negatively impact the performance of ordinary least squares estimators. The Stein and ridge estimators have been proposed as alternative techniques to improve estimation accuracy. However, both methods are non-robust to outliers. In previous studies, the M-estimator has been used in combination with the ridge estimator to address both correlated regressors and outliers. In this paper, we introduce the robust Stein estimator to address both issues simultaneously. Our simulation and application results demonstrate that the proposed technique performs favorably compared to existing methods.

Subject terms: Environmental sciences, Mathematics and computing

Introduction

Linear regression models are popularly adopted to predict the response variable from a combination of regressors or predictors. The model is generally written as:

$$y = X\beta + \varepsilon, \tag{1.1}$$

where $y$ is an $n\times 1$ vector of responses, $X$ is an $n\times p$ full-rank matrix of regressors, $\beta$ is a $p\times 1$ vector of unknown regression coefficients, and $\varepsilon$ is an $n\times 1$ vector of errors. The errors are assumed to be normally distributed with mean zero and covariance matrix $\sigma^{2}I_{n}$, where $I_{n}$ is the $n\times n$ identity matrix. The parameter $\beta$ is often estimated using the ordinary least squares (OLS) estimator, which is defined as follows:

$$\hat{\beta} = (X'X)^{-1}X'y \tag{1.2}$$
$$\mathrm{Cov}(\hat{\beta}) = \sigma^{2}(X'X)^{-1} \tag{1.3}$$

where $\sigma^{2}$ is estimated by the residual mean square, $\hat{\sigma}^{2} = (y - X\hat{\beta})'(y - X\hat{\beta})/(n-p)$. The matrix mean squared error (MMSE) and the scalar mean squared error (SMSE) of $\hat{\beta}$ are calculated as:

$$\mathrm{MMSE}(\hat{\beta}) = \sigma^{2}(X'X)^{-1} \tag{1.4}$$
$$\mathrm{SMSE}(\hat{\beta}) = \sigma^{2}\,\mathrm{tr}(X'X)^{-1} = \sigma^{2}\sum_{j=1}^{p}\frac{1}{\lambda_{j}} \tag{1.5}$$

where $\lambda_{j}$ is the $j$th eigenvalue of $X'X$. The OLS estimator is known to be sensitive to the presence of correlated regressors (multicollinearity) and outliers, which can negatively impact its performance. Several alternative methods have been proposed to address the issue of correlated regressors, including the Stein estimator, ridge regression, the Liu estimator, the modified Liu estimator, the modified ridge-type estimator, the Kibria-Lukman estimator, the Dawoud-Kibria estimator, and others [1–7]. These methods aim to effectively account for the correlation among the regressors.
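To make the effect of multicollinearity in Eq. (1.5) concrete, the following minimal sketch evaluates the OLS SMSE for two hypothetical sets of eigenvalues of $X'X$. The eigenvalues and the function name `smse_ols` are illustrative assumptions, not values from this study.

```python
def smse_ols(sigma2, eigenvalues):
    """Eq. (1.5): SMSE of OLS = sigma^2 * sum_j 1/lambda_j."""
    return sigma2 * sum(1.0 / lam for lam in eigenvalues)


# Hypothetical eigenvalues of X'X (assumed for illustration only).
well_conditioned = [4.0, 2.0, 1.0]
near_collinear = [4.0, 2.0, 0.001]  # one eigenvalue close to zero

print(smse_ols(1.0, well_conditioned))  # 1.75
print(smse_ols(1.0, near_collinear))    # roughly 1000.75
```

A single near-zero eigenvalue dominates the sum, which is why the OLS SMSE grows so quickly as the regressors become more correlated.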

Outliers are data points that differ significantly from other observations and can have a substantial impact on model estimates [8,9]. They threaten the efficiency of the OLS estimator [8–11], and it is well known that robust estimators are preferred when dealing with outliers [12–19]. However, both multicollinearity and outliers can exist simultaneously in a model. To address both issues, some of the methods mentioned earlier have been combined. For example, ridge regression has been combined with the M-estimator to handle both correlated regressors and outliers in the y-direction [20].

Recently, the Stein estimator has gained popularity as an alternative to OLS and performs well in handling correlated regressors. A few researchers have extended the method to generalized linear models such as the Poisson, the zero-inflated negative binomial and the inverse Gaussian regression models [21–23]. However, it is sensitive to outliers in the y-direction. In this study, we propose a robust version of the Stein estimator that can handle both multicollinearity and outliers.

In Section “Theoretical comparisons among estimators”, we provide a theoretical comparison of the proposed and existing estimators. We then conduct a simulation study in Section “Simulation study” to evaluate their performance, and in Section “Real-life application”, we analyze real-life data for illustration purposes. Finally, we conclude our findings in Section “Some concluding remarks”.

Theoretical comparisons among estimators

For the biased estimators considered here, we employ the spectral decomposition of the information matrix $X'X$ to obtain explicit forms of the matrix mean squared error (MMSE) and the scalar mean squared error (SMSE). Assume that there exists a matrix $T$ such that:

$$T'X'XT = \Lambda = \mathrm{diag}(\lambda_{j}),\quad j = 1, 2, \dots, p^{*},\quad p^{*} = p + 1,$$

where $\lambda_{1} \ge \lambda_{2} \ge \dots \ge \lambda_{p}$ are the ordered eigenvalues of $X'X$ and $T$ is a $(p\times p)$ orthogonal matrix whose columns are the corresponding eigenvectors. Rewrite the linear regression model in Eq. (1.1) in canonical form:

$$y_{i} = \sum_{j=1}^{p}\alpha_{j}h_{ij} + \varepsilon_{i},\quad i = 1, 2, \dots, n, \tag{2.1}$$

where $H = XT$, $\alpha = T'\beta$, and $T'X'XT = H'H = \Lambda$. In the presence of correlated regressors (multicollinearity), the ordinary least squares estimator $\hat{\alpha}_{LS}$ is inefficient, and outliers further distort its parameter estimates. The M-estimator is efficient for handling outliers in the y-direction [15]. Let $\hat{\alpha}_{M}$ be the M-estimator of $\alpha$, obtained by solving the M-estimating equations. The effects of outliers in the y-direction are removed through the weights given to the residuals in the iteratively reweighted least-squares approach used to solve the M-estimating equations [10,15].

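$$\hat{\alpha}_{LS} = \Lambda^{-1}H'y \tag{2.2}$$
$$\hat{\alpha}_{M} = \arg\min_{\alpha}\sum_{i=1}^{n}\pi\!\left(\frac{\varepsilon_{i}}{\eta}\right) = \arg\min_{\alpha}\sum_{i=1}^{n}\pi\!\left(\frac{y_{i}-\sum_{j=1}^{p}\alpha_{j}h_{ij}}{\eta}\right), \tag{2.3}$$

where $\pi(\cdot)$ indicates a robust criterion function and $\eta$ is a scale parameter estimate. $\hat{\alpha}_{M}$ is obtained by solving the M-estimating equations $\sum_{i=1}^{n}\phi(e_{i}/\eta) = 0$ and $\sum_{i=1}^{n}\phi(e_{i}/\eta)\,x_{i} = 0$, where $e_{i} = y_{i} - \sum_{j=1}^{p}\hat{\alpha}_{j,M}h_{ij}$ and $\phi = \pi'$ is a suitably chosen function [10].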
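The iteratively reweighted least-squares idea can be sketched for the simplest case of one regressor and no intercept. The sketch assumes Huber's $\phi$ with tuning constant 1.345 and a normalized median absolute deviation for the scale $\eta$; the paper does not state which $\phi$ or scale estimate was used, so both choices, the function names, and the toy data are assumptions.

```python
import statistics


def huber_weight(u, c=1.345):
    """IRLS weight phi(u)/u for Huber's phi (an assumed choice of phi)."""
    a = abs(u)
    return 1.0 if a <= c else c / a


def m_estimate_slope(h, y, c=1.345, iters=50):
    """M-estimate of alpha in y_i = alpha * h_i + e_i via IRLS."""
    # Start from the least-squares slope.
    alpha = sum(hi * yi for hi, yi in zip(h, y)) / sum(hi * hi for hi in h)
    for _ in range(iters):
        resid = [yi - alpha * hi for hi, yi in zip(h, y)]
        # Scale estimate eta: normalized MAD of the residuals (assumption).
        eta = statistics.median([abs(r) for r in resid]) / 0.6745
        if eta == 0:
            break
        w = [huber_weight(r / eta, c) for r in resid]
        # Weighted least-squares update with the current weights.
        alpha = (sum(wi * hi * yi for wi, hi, yi in zip(w, h, y))
                 / sum(wi * hi * hi for wi, hi in zip(w, h)))
    return alpha


# Toy data: true slope 2, small deviations, one y-direction outlier.
h = [float(i) for i in range(1, 11)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0]
y = [2.0 * hi + d for hi, d in zip(h, noise)]
y[-1] += 100.0  # inflate the last response

ls_slope = sum(hi * yi for hi, yi in zip(h, y)) / sum(hi * hi for hi in h)
robust_slope = m_estimate_slope(h, y)
print(ls_slope, robust_slope)
```

On this toy data the least-squares slope is pulled far above 2 by the single outlier, while the reweighted estimate stays close to 2 because the outlying residual receives a very small weight.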

$$\mathrm{SMSE}(\hat{\alpha}_{M}) = \sum_{j=1}^{p}\Psi_{jj}, \tag{2.4}$$

where $\Psi_{jj}$ is the $j$th element of the main diagonal of the matrix $\mathrm{Var}(\hat{\alpha}_{M}) = \Psi$, which is finite.

The ridge regression estimator of α is defined as:

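$$\hat{\alpha}_{Ridge} = (\Lambda + kI)^{-1}\Lambda\,\hat{\alpha}_{LS} \tag{2.5}$$
$$\mathrm{Cov}(\hat{\alpha}_{Ridge}) = \sigma^{2}(\Lambda + kI)^{-1}\Lambda(\Lambda + kI)^{-1},\quad k \ge 0 \tag{2.6}$$
$$\mathrm{Bias}(\hat{\alpha}_{Ridge}) = E\!\left[(\Lambda + kI)^{-1}\Lambda\,\hat{\alpha}_{LS}\right] - \alpha = \left[(\Lambda + kI)^{-1}\Lambda - I\right]\alpha \tag{2.7}$$

The matrix mean squared error (MMSE) and the scalar mean squared error (SMSE) of $\hat{\alpha}_{Ridge}$ are calculated as:

$$\mathrm{MMSE}(\hat{\alpha}_{Ridge}) = \sigma^{2}(\Lambda + kI)^{-1}\Lambda(\Lambda + kI)^{-1} + \mathrm{Bias}(\hat{\alpha}_{Ridge})\,\mathrm{Bias}(\hat{\alpha}_{Ridge})' \tag{2.8}$$
$$\mathrm{SMSE}(\hat{\alpha}_{Ridge}) = \sigma^{2}\sum_{j=1}^{p}\frac{\lambda_{j}}{(\lambda_{j}+k)^{2}} + k^{2}\sum_{j=1}^{p}\frac{\alpha_{j}^{2}}{(\lambda_{j}+k)^{2}} \tag{2.9}$$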
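The variance and bias trade-off in Eq. (2.9) can be illustrated numerically. This is a minimal sketch; the eigenvalues, the coefficients $\alpha_j$, and the value of $k$ are hypothetical and chosen only to show the mechanics.

```python
def smse_ols(sigma2, lams):
    """Eq. (1.5): sigma^2 * sum_j 1/lambda_j."""
    return sigma2 * sum(1.0 / l for l in lams)


def smse_ridge(sigma2, lams, alphas, k):
    """Eq. (2.9): variance term plus squared-bias term."""
    variance = sigma2 * sum(l / (l + k) ** 2 for l in lams)
    bias_sq = k ** 2 * sum(a ** 2 / (l + k) ** 2 for l, a in zip(lams, alphas))
    return variance + bias_sq


# Hypothetical canonical-form quantities (assumptions for illustration).
lams = [4.0, 2.0, 0.001]
alphas = [1.0, 1.0, 1.0]

print(smse_ols(1.0, lams))
print(smse_ridge(1.0, lams, alphas, k=0.1))
```

Setting `k=0` recovers the OLS value from Eq. (1.5), and in this example a small positive `k` lowers the SMSE substantially because the variance contribution of the near-zero eigenvalue shrinks much faster than the bias grows.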

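The M-Ridge estimator is given by:

$$\hat{\alpha}_{M\text{-}Ridge} = (\Lambda + k_{m}I)^{-1}\Lambda\,\hat{\alpha}_{M} \tag{2.10}$$
$$\mathrm{Cov}(\hat{\alpha}_{M\text{-}Ridge}) = (\Lambda + k_{m}I)^{-1}\Lambda\Psi\Lambda(\Lambda + k_{m}I)^{-1},\quad k_{m} \ge 0 \tag{2.11}$$
$$\mathrm{Bias}(\hat{\alpha}_{M\text{-}Ridge}) = E\!\left[(\Lambda + k_{m}I)^{-1}\Lambda\,\hat{\alpha}_{M}\right] - \alpha \tag{2.12}$$

The matrix mean squared error (MMSE) and the scalar mean squared error (SMSE) of $\hat{\alpha}_{M\text{-}Ridge}$ are calculated as:

$$\mathrm{MMSE}(\hat{\alpha}_{M\text{-}Ridge}) = (\Lambda + k_{m}I)^{-1}\Lambda\Psi\Lambda(\Lambda + k_{m}I)^{-1} + \mathrm{Bias}(\hat{\alpha}_{M\text{-}Ridge})\,\mathrm{Bias}(\hat{\alpha}_{M\text{-}Ridge})' \tag{2.13}$$
$$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}Ridge}) = \sum_{j=1}^{p}\frac{\lambda_{j}^{2}}{(\lambda_{j}+k)^{2}}\Psi_{jj} + \sum_{j=1}^{p}\frac{\alpha_{j}^{2}k^{2}}{(\lambda_{j}+k)^{2}} \tag{2.14}$$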

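The James–Stein estimator (Stein, 1960) is given by:

$$\hat{\alpha}_{JSE} = c\,\hat{\alpha}_{LS} \tag{2.15}$$

where

$$c = \frac{\hat{\alpha}_{LS}'\hat{\alpha}_{LS}}{\hat{\alpha}_{LS}'\hat{\alpha}_{LS} + \sigma^{2}\,\mathrm{tr}(X'X)^{-1}} = \sum_{j=1}^{p}\frac{\lambda_{j}\alpha_{j}^{2}}{\sigma^{2} + \lambda_{j}\alpha_{j}^{2}} \tag{2.16}$$
$$\mathrm{Cov}(\hat{\alpha}_{JSE}) = c\,\mathrm{Cov}(\hat{\alpha}_{LS})\,c' = c\,\hat{\sigma}^{2}(X'X)^{-1}c' \tag{2.17}$$
$$\mathrm{Bias}(\hat{\alpha}_{JSE}) = E(c\,\hat{\alpha}_{LS}) - \alpha = (c-1)\alpha$$

The matrix mean squared error (MMSE) and the scalar mean squared error (SMSE) of $\hat{\alpha}_{JSE}$ are calculated as:

$$\mathrm{MMSE}(\hat{\alpha}_{JSE}) = c\,\sigma^{2}(X'X)^{-1}c' + \mathrm{Bias}(\hat{\alpha}_{JSE})\,\mathrm{Bias}(\hat{\alpha}_{JSE})' \tag{2.18}$$
$$\mathrm{SMSE}(\hat{\alpha}_{JSE}) = c^{2}\sigma^{2}\sum_{j=1}^{p}\frac{1}{\lambda_{j}} + (c-1)^{2}\sum_{j=1}^{p}\alpha_{j}^{2} \tag{2.19}$$
$$\mathrm{SMSE}(\hat{\alpha}_{JSE}) = \sum_{j=1}^{p}\frac{\sigma^{2}\lambda_{j}\alpha_{j}^{4}}{(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\sigma^{4}\alpha_{j}^{2}}{(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2}} \tag{2.20}$$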

M-Stein estimator

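The Stein estimator is sensitive to outliers in the y-direction. We therefore propose the robust Stein estimator, which is defined as follows:

$$\hat{\alpha}_{M\text{-}JSE} = c\,\hat{\alpha}_{M}, \tag{2.21}$$

where $\hat{\alpha}_{M}$ is the M-estimate of $\alpha$,

$$c = \sum_{j=1}^{p}\frac{\lambda_{j}\alpha_{j,M}^{2}}{\Psi_{jj} + \lambda_{j}\alpha_{j,M}^{2}} \tag{2.22}$$
$$\mathrm{Cov}(\hat{\alpha}_{M\text{-}JSE}) = c\,\mathrm{Cov}(\hat{\alpha}_{M})\,c' = c\,\Psi(X'X)^{-1}c' \tag{2.23}$$
$$\mathrm{Bias}(\hat{\alpha}_{M\text{-}JSE}) = E(c\,\hat{\alpha}_{M}) - \alpha = (c-1)\alpha$$

The matrix mean squared error (MMSE) and the scalar mean squared error (SMSE) of $\hat{\alpha}_{M\text{-}JSE}$ are calculated as:

$$\mathrm{MMSE}(\hat{\alpha}_{M\text{-}JSE}) = c\,\Psi(X'X)^{-1}c' + \mathrm{Bias}(\hat{\alpha}_{M\text{-}JSE})\,\mathrm{Bias}(\hat{\alpha}_{M\text{-}JSE})' \tag{2.24}$$
$$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) = c^{2}\,\Psi\sum_{j=1}^{p}\frac{1}{\lambda_{j}} + (c-1)^{2}\sum_{j=1}^{p}\alpha_{j}^{2} \tag{2.25}$$
$$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) = \sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}\alpha_{j}^{4}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\Psi_{jj}^{2}\alpha_{j}^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} \tag{2.26}$$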
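Equations (2.20) and (2.26) share one functional form and differ only in the variance term ($\sigma^{2}$ versus $\Psi_{jj}$). The sketch below evaluates that common form; the function name `smse_stein` and all numerical inputs are hypothetical.

```python
def smse_stein(var_terms, lams, alphas):
    """Componentwise Stein-type SMSE.

    var_terms[j] plays the role of sigma^2 in Eq. (2.20)
    or of Psi_jj in Eq. (2.26).
    """
    total = 0.0
    for v, l, a in zip(var_terms, lams, alphas):
        denom = (v + l * a ** 2) ** 2
        total += v * l * a ** 4 / denom   # variance part
        total += v ** 2 * a ** 2 / denom  # squared-bias part
    return total


# Hypothetical inputs (assumptions for illustration only).
lams = [4.0, 2.0, 0.001]
alphas = [1.0, 1.0, 1.0]
sigma2 = 4.0
psi = [0.5, 0.5, 0.5]  # assumed diagonal of Var(alpha_M)

smse_jse = smse_stein([sigma2] * 3, lams, alphas)   # Eq. (2.20)
smse_mjse = smse_stein(psi, lams, alphas)           # Eq. (2.26)
print(smse_jse, smse_mjse)
```

Adding the two fractions shows that each summand equals $v\alpha_j^{2}/(v+\lambda_j\alpha_j^{2})$, which is increasing in $v$; with these assumed inputs the robust version therefore has the smaller SMSE because each $\Psi_{jj}$ is below $\sigma^{2}$.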

We presume the following conditions hold to describe the major theorems:

  1. ϕ is skew-symmetric and non-decreasing.

  2. The errors are symmetric.

  3. Ψ is finite.

We now give the theoretical comparisons among the estimators based on the scalar mean squared errors presented in Eqs. (1.5), (2.4), (2.9), (2.14), (2.20), and (2.26).

Theorem 2.1

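$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) < \mathrm{SMSE}(\hat{\alpha}_{LS})$ if $\sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2}) > \Psi_{jj}\alpha_{j}^{2}\lambda_{j}$ for each $j$, where $\Psi_{jj}$ is the $j$th element of the main diagonal of the matrix $\mathrm{Var}(\hat{\alpha}_{M}) = \Psi$.

Proof:

The difference between $\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE})$ and $\mathrm{SMSE}(\hat{\alpha}_{LS})$ is given by:

$$\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}\alpha_{j}^{4}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\Psi_{jj}^{2}\alpha_{j}^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} - \sum_{j=1}^{p}\frac{\sigma^{2}}{\lambda_{j}} \tag{2.27}$$
$$\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}^{2}\alpha_{j}^{4} + \Psi_{jj}^{2}\lambda_{j}\alpha_{j}^{2} - \sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}}{\lambda_{j}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} < 0$$
$$(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})\,\Psi_{jj}\alpha_{j}^{2}\lambda_{j} < \sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}$$
$$\Psi_{jj}\alpha_{j}^{2}\lambda_{j} < \sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})$$
$$\Psi_{jj}\alpha_{j}^{2}\lambda_{j} - \sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2}) < 0 \tag{2.28}$$

Under the stated condition, $\sigma^{2}(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})$ is greater than $\Psi_{jj}\alpha_{j}^{2}\lambda_{j}$ for each $j$, so Eq. (2.28) holds and every term of the sum is negative. Thus, the difference is less than zero and the proof is completed.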
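The equivalence used in the proof of Theorem 2.1 can be checked numerically for a single component $j$. The sketch below draws random positive values (an arbitrary range chosen for illustration) and confirms that the sign of the per-component SMSE gap agrees with the condition of the theorem; it is a sanity check under these assumptions, not a substitute for the proof.

```python
import random


def component_gap(psi, lam, alpha, sigma2):
    """Per-component SMSE(M-JSE) minus SMSE(LS), from Eqs. (2.26) and (1.5)."""
    return psi * alpha ** 2 / (psi + lam * alpha ** 2) - sigma2 / lam


def condition_2_1(psi, lam, alpha, sigma2):
    """Condition of Theorem 2.1 for one component j."""
    return sigma2 * (psi + lam * alpha ** 2) > psi * alpha ** 2 * lam


random.seed(0)
trials = 10_000
agree = 0
for _ in range(trials):
    # Random positive inputs; the range (0.01, 10) is an arbitrary assumption.
    psi, lam, alpha, sigma2 = (random.uniform(0.01, 10.0) for _ in range(4))
    gap_negative = component_gap(psi, lam, alpha, sigma2) < 0
    if gap_negative == condition_2_1(psi, lam, alpha, sigma2):
        agree += 1

print(agree, "of", trials, "draws agree")
```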

Theorem 2.2

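$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) < \mathrm{SMSE}(\hat{\alpha}_{Ridge})$ if $\sigma^{2}\lambda_{j}(\alpha_{j}^{2}\lambda_{j}+\Psi_{jj}) + k^{2}\alpha_{j}^{4}\lambda_{j} > \Psi_{jj}\alpha_{j}^{2}\lambda_{j}(\lambda_{j}+2k)$ for each $j$, where $\Psi_{jj}$ is the $j$th element of the main diagonal of the matrix $\mathrm{Var}(\hat{\alpha}_{M}) = \Psi$.

Proof:

The difference between $\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE})$ and $\mathrm{SMSE}(\hat{\alpha}_{Ridge})$ is given by:

$$\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}\alpha_{j}^{4}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\Psi_{jj}^{2}\alpha_{j}^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} - \sum_{j=1}^{p}\frac{\sigma^{2}\lambda_{j}+k^{2}\alpha_{j}^{2}}{(\lambda_{j}+k)^{2}} \tag{2.29}$$
$$\sum_{j=1}^{p}\frac{\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})(\lambda_{j}+k)^{2} - (\sigma^{2}\lambda_{j}+k^{2}\alpha_{j}^{2})(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}(\lambda_{j}+k)^{2}}$$

$\hat{\alpha}_{M\text{-}JSE}$ is better than $\hat{\alpha}_{Ridge}$ in terms of SMSE if the difference is less than zero, i.e. if

$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})(\lambda_{j}+k)^{2} < (\sigma^{2}\lambda_{j}+k^{2}\alpha_{j}^{2})(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}$$
$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}+k)^{2} < (\sigma^{2}\lambda_{j}+k^{2}\alpha_{j}^{2})(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})$$
$$\Psi_{jj}\alpha_{j}^{2}\lambda_{j}^{2} + \Psi_{jj}\alpha_{j}^{2}k^{2} + 2k\lambda_{j}\Psi_{jj}\alpha_{j}^{2} < \sigma^{2}\alpha_{j}^{2}\lambda_{j}^{2} + \sigma^{2}\lambda_{j}\Psi_{jj} + k^{2}\alpha_{j}^{4}\lambda_{j} + k^{2}\alpha_{j}^{2}\Psi_{jj}$$
$$\Psi_{jj}\alpha_{j}^{2}\lambda_{j}^{2} + 2k\lambda_{j}\Psi_{jj}\alpha_{j}^{2} - \sigma^{2}\alpha_{j}^{2}\lambda_{j}^{2} - \sigma^{2}\lambda_{j}\Psi_{jj} - k^{2}\alpha_{j}^{4}\lambda_{j} < 0$$
$$\Psi_{jj}\alpha_{j}^{2}\lambda_{j}(\lambda_{j}+2k) - \sigma^{2}\lambda_{j}(\alpha_{j}^{2}\lambda_{j}+\Psi_{jj}) - k^{2}\alpha_{j}^{4}\lambda_{j} < 0 \tag{2.30}$$

Under the stated condition, $\sigma^{2}\lambda_{j}(\alpha_{j}^{2}\lambda_{j}+\Psi_{jj}) + k^{2}\alpha_{j}^{4}\lambda_{j}$ is greater than $\Psi_{jj}\alpha_{j}^{2}\lambda_{j}(\lambda_{j}+2k)$ for each $j$, so Eq. (2.30) holds. Thus, the difference is less than zero and the proof is completed.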

Theorem 2.3

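$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) < \mathrm{SMSE}(\hat{\alpha}_{M\text{-}Ridge})$ if $\Psi_{jj}^{2}\lambda_{j} + k^{2}\alpha_{j}^{2}\Psi_{jj} + k^{2}\alpha_{j}^{4}\lambda_{j} > \Psi_{jj}\alpha_{j}^{2}k(k+2\lambda_{j})$ for each $j$, where $\Psi_{jj}$ is the $j$th element of the main diagonal of the matrix $\mathrm{Var}(\hat{\alpha}_{M}) = \Psi$.

Proof:

The difference between $\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE})$ and $\mathrm{SMSE}(\hat{\alpha}_{M\text{-}Ridge})$ is given by:

$$\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}\alpha_{j}^{4}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\Psi_{jj}^{2}\alpha_{j}^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} - \left[\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}}{(\lambda_{j}+k)^{2}} + \sum_{j=1}^{p}\frac{k^{2}\alpha_{j}^{2}}{(\lambda_{j}+k)^{2}}\right] \tag{2.31}$$
$$\sum_{j=1}^{p}\frac{\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} - \sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}+k^{2}\alpha_{j}^{2}}{(\lambda_{j}+k)^{2}}$$
$$\sum_{j=1}^{p}\frac{\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})(\lambda_{j}+k)^{2} - (\Psi_{jj}\lambda_{j}+k^{2}\alpha_{j}^{2})(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}(\lambda_{j}+k)^{2}}$$

$\hat{\alpha}_{M\text{-}JSE}$ is better than $\hat{\alpha}_{M\text{-}Ridge}$ in terms of SMSE if the difference is less than zero, i.e. if

$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})(\lambda_{j}+k)^{2} < (\Psi_{jj}\lambda_{j}+k^{2}\alpha_{j}^{2})(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}$$
$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}+k)^{2} < (\Psi_{jj}\lambda_{j}+k^{2}\alpha_{j}^{2})(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})$$
$$\Psi_{jj}\alpha_{j}^{2}k^{2} + \Psi_{jj}\alpha_{j}^{2}\lambda_{j}^{2} + 2\Psi_{jj}\alpha_{j}^{2}\lambda_{j}k < \Psi_{jj}\alpha_{j}^{2}\lambda_{j}^{2} + \Psi_{jj}^{2}\lambda_{j} + k^{2}\alpha_{j}^{2}\Psi_{jj} + k^{2}\alpha_{j}^{4}\lambda_{j}$$
$$\Psi_{jj}\alpha_{j}^{2}k(k+2\lambda_{j}) - \left(\Psi_{jj}^{2}\lambda_{j} + k^{2}\alpha_{j}^{2}\Psi_{jj} + k^{2}\alpha_{j}^{4}\lambda_{j}\right) < 0 \tag{2.32}$$

Under the stated condition, $\Psi_{jj}^{2}\lambda_{j} + k^{2}\alpha_{j}^{2}\Psi_{jj} + k^{2}\alpha_{j}^{4}\lambda_{j}$ is greater than $\Psi_{jj}\alpha_{j}^{2}k(k+2\lambda_{j})$ for each $j$, so Eq. (2.32) holds. Thus, the difference is less than zero and the proof is completed.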

Theorem 2.4

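$\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE}) < \mathrm{SMSE}(\hat{\alpha}_{JSE})$ if $\sigma^{2}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj}) > \Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\sigma^{2})$ for each $j$, where $\Psi_{jj}$ is the $j$th element of the main diagonal of the matrix $\mathrm{Var}(\hat{\alpha}_{M}) = \Psi$.

Proof:

The difference between $\mathrm{SMSE}(\hat{\alpha}_{M\text{-}JSE})$ and $\mathrm{SMSE}(\hat{\alpha}_{JSE})$ is given by:

$$\sum_{j=1}^{p}\frac{\Psi_{jj}\lambda_{j}\alpha_{j}^{4}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\Psi_{jj}^{2}\alpha_{j}^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}} - \left[\sum_{j=1}^{p}\frac{\sigma^{2}\lambda_{j}\alpha_{j}^{4}}{(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2}} + \sum_{j=1}^{p}\frac{\sigma^{4}\alpha_{j}^{2}}{(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2}}\right] \tag{2.33}$$
$$\sum_{j=1}^{p}\frac{\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2} - \sigma^{2}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\sigma^{2})(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}}{(\Psi_{jj}+\lambda_{j}\alpha_{j}^{2})^{2}(\sigma^{2}+\lambda_{j}\alpha_{j}^{2})^{2}}$$

$\hat{\alpha}_{M\text{-}JSE}$ is better than $\hat{\alpha}_{JSE}$ in terms of SMSE if the difference is less than zero, i.e. if

$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\sigma^{2}) < \sigma^{2}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})$$
$$\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\sigma^{2}) - \sigma^{2}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj}) < 0 \tag{2.34}$$

Under the stated condition, $\sigma^{2}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj})$ is greater than $\Psi_{jj}\alpha_{j}^{2}(\lambda_{j}\alpha_{j}^{2}+\sigma^{2})$ for each $j$, so Eq. (2.34) holds. Thus, the difference is less than zero and the proof is completed.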
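The condition of Theorem 2.4 can be read more directly by expanding both sides. The following short derivation is an added worked step, assuming $\lambda_{j} > 0$ and $\alpha_{j} \neq 0$:

```latex
\begin{aligned}
\Psi_{jj}\alpha_{j}^{2}\left(\lambda_{j}\alpha_{j}^{2}+\sigma^{2}\right)
  &< \sigma^{2}\alpha_{j}^{2}\left(\lambda_{j}\alpha_{j}^{2}+\Psi_{jj}\right) \\
\Psi_{jj}\lambda_{j}\alpha_{j}^{4} + \Psi_{jj}\sigma^{2}\alpha_{j}^{2}
  &< \sigma^{2}\lambda_{j}\alpha_{j}^{4} + \sigma^{2}\Psi_{jj}\alpha_{j}^{2} \\
\Psi_{jj}\lambda_{j}\alpha_{j}^{4}
  &< \sigma^{2}\lambda_{j}\alpha_{j}^{4} \\
\Psi_{jj} &< \sigma^{2}
\end{aligned}
```

Under these assumptions the condition is equivalent to $\Psi_{jj} < \sigma^{2}$: the robust Stein estimator has the smaller SMSE component whenever the $j$th variance term of the M-estimator is below the error variance used by the Stein estimator in Eq. (2.20).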

Simulation study

This section provides a simulation study using the R programming language to compare the performance of the non-robust and robust estimators.

Simulation design

The design of this simulation study is based on specifying the factors that are expected to affect the properties of the suggested estimator and selecting a metric to assess the outcomes. Following [24–28], we generated the regressors as follows:

$$x_{ij} = (1-\rho^{2})^{1/2}m_{ij} + \rho\,m_{i,p+1},\quad i = 1, 2, \dots, n,\quad j = 1, 2, 3, \dots, p \tag{3.1}$$

where $m_{ij}$ are independent standard normal pseudo-random numbers, $p$ denotes the number of regressors ($p = 4, 8, 12$) and $\rho$ denotes the level of multicollinearity ($\rho = 0.7, 0.8, 0.9, 0.99$). The response variable is then given by:
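The generating scheme in Eq. (3.1) can be sketched as follows. The study itself used R; this Python version, its function names, the seed, and the sample size are illustrative assumptions. Under Eq. (3.1) each regressor has unit variance and any two regressors share the common term $\rho\,m_{i,p+1}$, so their theoretical correlation is $\rho^{2}$.

```python
import math
import random


def generate_regressors(n, p, rho, rng):
    """Eq. (3.1): x_ij = sqrt(1 - rho^2) * m_ij + rho * m_{i,p+1}."""
    scale = math.sqrt(1.0 - rho ** 2)
    rows = []
    for _ in range(n):
        # p + 1 independent standard normal draws per observation.
        m = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
        rows.append([scale * m[j] + rho * m[p] for j in range(p)])
    return rows


def correlation(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


rng = random.Random(1)  # fixed seed, chosen arbitrarily
X = generate_regressors(n=5000, p=4, rho=0.9, rng=rng)
col0 = [row[0] for row in X]
col1 = [row[1] for row in X]
sample_corr = correlation(col0, col1)
print(sample_corr)  # should be close to rho**2 = 0.81
```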

$$y_{i} = \beta_{0} + \beta_{1}x_{i1} + \dots + \beta_{p}x_{ip} + \varepsilon_{i},\quad i = 1, 2, \dots, n \tag{3.2}$$

where $\varepsilon_{i} \sim N(0, \sigma^{2})$, $\sigma = 5, 10$, $n = 30, 50, 100, 200$, and the regression parameters are chosen such that $\beta'\beta = 1$ [29–35]. The experiment is repeated 2000 times. We introduced outliers by increasing the magnitude of the response variable. Using Eq. (3.3), 10% and 20% contamination were added to the model.

$$y_{i} = h\,\max(y_{i}) + y_{i}, \tag{3.3}$$

where $h = 10$ is the factor used to inflate the response variable [36,37]. The ridge parameter $k$ is obtained using the following equation:
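The contamination step in Eq. (3.3) might be implemented along the following lines. The text does not say which observations are inflated, so this sketch simply inflates the first share of the sample; that choice, the rounding rule, and the function name `contaminate` are assumptions.

```python
def contaminate(y, fraction, h=10.0):
    """Eq. (3.3): replace y_i by h * max(y) + y_i for a share of the sample.

    Assumption: the first round(fraction * n) observations are inflated;
    the paper does not specify how the contaminated cases are selected.
    """
    n_out = round(fraction * len(y))
    y_max = max(y)  # maximum of the uncontaminated responses
    return [h * y_max + yi if i < n_out else yi for i, yi in enumerate(y)]


# Toy response vector (hypothetical values).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_out = contaminate(y, fraction=0.2)
print(y_out)  # first two entries become 101.0 and 102.0
```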

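$$k = \frac{p\,\hat{\sigma}^{2}}{\sum_{j=1}^{p}\alpha_{j,LS}^{2}}, \tag{3.4}$$

where $\hat{\sigma}^{2} = \sum_{i=1}^{n}e_{i}^{2}/(n-r)$, $e_{i} = y_{i} - \hat{y}_{i}$ and $r$ denotes the number of estimated parameters.

Asymptotically, an unbiased estimator of $\Psi_{jj}$ is $\hat{A}^{2}\lambda_{j}^{-1}$, where $\hat{A}^{2} = s^{2}(n-p)^{-1}\sum_{i=1}^{n}\left[\varphi(e_{i}/s)\right]^{2} \big/ \left[\frac{1}{n}\sum_{i=1}^{n}\varphi'(e_{i}/s)\right]^{2}$ and $s$ is the scale estimate. Thus, the parameter for M-Ridge is determined using the following equation:

$$k_{m} = \frac{p\,\hat{A}^{2}}{\sum_{j=1}^{p}\alpha_{j,M}^{2}} \tag{3.5}$$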
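The two shrinkage parameters in Eqs. (3.4) and (3.5) can be sketched as below. The sketch assumes Huber's $\varphi$ with tuning constant 1.345, which the text does not specify; the function names and the toy residuals are likewise assumptions.

```python
def ridge_k(p, sigma2_hat, alpha_ls):
    """Eq. (3.4): k = p * sigma_hat^2 / sum_j alpha_j^2."""
    return p * sigma2_hat / sum(a * a for a in alpha_ls)


def huber_phi(u, c=1.345):
    """Huber's phi: identity inside [-c, c], clipped outside (assumed choice)."""
    return max(-c, min(c, u))


def huber_phi_prime(u, c=1.345):
    """Derivative of Huber's phi: 1 inside [-c, c], 0 outside."""
    return 1.0 if abs(u) <= c else 0.0


def a2_hat(residuals, s, p, c=1.345):
    """A^2 = s^2 (n-p)^{-1} sum phi(e/s)^2 / [(1/n) sum phi'(e/s)]^2.

    Note: the denominator is zero if every scaled residual is clipped.
    """
    n = len(residuals)
    num = sum(huber_phi(e / s, c) ** 2 for e in residuals)
    den = (sum(huber_phi_prime(e / s, c) for e in residuals) / n) ** 2
    return s * s * num / ((n - p) * den)


def m_ridge_k(p, a2, alpha_m):
    """Eq. (3.5): k_m = p * A^2 / sum_j alpha_{j,M}^2."""
    return p * a2 / sum(a * a for a in alpha_m)


# Toy residuals (hypothetical): four regular values and one large outlier.
clean = [0.5, -0.5, 1.0, -1.0]
with_outlier = clean + [50.0]
print(a2_hat(clean, s=1.0, p=1))
print(a2_hat(with_outlier, s=1.0, p=1))
```

When no scaled residual is clipped, $\hat{A}^{2}$ coincides with the usual residual mean square; with the outlier present it stays moderate, whereas the residual mean square would be dominated by the single large residual.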

The estimated mean squared error (MSE) is computed as follows:

$$\mathrm{MSE} = \frac{1}{2000}\sum_{i=1}^{2000}\sum_{j=1}^{p}(\hat{\beta}_{ij} - \beta_{j})^{2} \tag{3.6}$$

where $\hat{\beta}_{ij}$ is the estimate of the $j$th parameter in the $i$th replication and $\beta_{j}$ is the true value of the $j$th parameter. The estimated MSE values of the proposed and other estimators are displayed in Tables 1, 2, 3, 4, 5 and 6 for p=4 with 10% outliers, p=8 with 10% outliers, p=12 with 10% outliers, p=4 with 20% outliers, p=8 with 20% outliers and p=12 with 20% outliers, respectively.
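The criterion in Eq. (3.6) is a plain average over replications of the squared estimation error. A minimal sketch, with a hypothetical function name and toy inputs in place of the 2000 replications:

```python
def estimated_mse(estimates, beta_true):
    """Eq. (3.6): mean over replications of sum_j (beta_hat_ij - beta_j)^2."""
    reps = len(estimates)
    total = 0.0
    for row in estimates:
        total += sum((b_hat - b) ** 2 for b_hat, b in zip(row, beta_true))
    return total / reps


# Two toy replications of a two-parameter model (hypothetical values).
estimates = [[1.0, 2.0], [3.0, 2.0]]
beta_true = [2.0, 2.0]
print(estimated_mse(estimates, beta_true))  # 1.0
```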

Table 1.

Estimated MSE values for p = 4 with 10% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 334.39 1269.12 161.47 603.63 93.26 351.43 63.83 240.57
α^Ridge 96.85 366.83 46.70 173.68 27.25 102.11 17.91 66.91
α^JSE 184.43 700.17 64.68 240.06 36.35 135.85 27.73 103.58
α^M 5.32 20.56 2.10 7.65 1.04 3.50 0.63 2.02
α^M-JSE 1.44 3.33 0.94 1.09 0.92 0.96 0.90 0.95
ρ=0.8
α^LS 448.39 1687.28 225.57 832.85 123.56 460.96 91.91 343.37
α^Ridge 126.51 475.31 63.19 232.75 35.01 130.31 24.90 92.68
α^JSE 276.00 1039.44 105.17 386.62 54.55 202.55 46.33 172.56
α^M 6.81 26.60 2.70 10.16 1.24 4.34 0.96 2.68
α^M-JSE 2.00 5.83 0.99 1.37 0.92 1.00 0.90 0.98
ρ=0.9
α^LS 800.34 2977.67 427.24 1556.29 217.49 800.67 178.59 660.46
α^Ridge 219.03 814.35 114.90 418.77 59.15 217.92 46.51 172.06
α^JSE 569.86 2122.70 246.97 898.20 117.37 431.58 109.51 405.46
α^M 11.51 45.45 4.69 18.17 1.91 7.05 1.33 4.79
α^M-JSE 4.65 17.39 1.37 3.17 0.97 1.28 0.94 1.25
ρ=0.99
α^LS 1511.38 5579.67 4162.02 14,948.83 1916.26 6939.70 1753.83 6423.20
α^Ridge 406.91 1501.78 1068.82 3847.27 493.89 1794.36 438.01 1609.34
α^JSE 1189.76 4395.97 3499.75 12,571.39 1557.44 5642.43 1494.22 5480.46
α^M 21.02 83.54 41.94 167.18 14.36 56.87 10.98 43.39
α^M-JSE 12.76 51.89 38.09 159.73 9.08 36.91 8.88 35.62

Table 2.

Estimated MSE values for p = 8 with 10% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 991.49 3547.59 367.15 1333.23 286.72 1020.75 156.87 562.54
α^Ridge 237.09 849.14 92.03 333.94 70.85 251.95 38.94 139.05
α^JSE 574.11 2053.58 135.14 490.23 124.55 442.87 63.65 227.09
α^M 16.54 65.49 4.61 17.71 2.46 9.18 1.19 4.16
α^M-JSE 3.43 12.54 0.984 1.24 0.954 1.13 0.928 0.974
ρ=0.8
α^LS 1409.59 4924.47 495.44 1758.07 429.76 1498.04 231.65 812.34
α^Ridge 334.20 1169.31 121.90 432.56 104.61 364.68 56.54 197.75
α^JSE 887.78 3103.49 205.21 728.04 215.46 751.23 108.97 381.09
α^M 22.79 90.53 5.87 22.79 3.44 13.11 1.61 5.79
α^M-JSE 5.93 24.07 1.04 1.57 1.01 1.49 0.931 1.05
ρ=0.9
α^LS 2656.92 9036.34 887.95 3059.53 886.05 2953.05 460.32 1575.24
α^Ridge 625.83 2132.56 214.23 739.22 207.93 709.76 110.58 378.01
α^JSE 1867.80 6360.33 448.73 1546.77 522.67 1784.46 264.46 904.02
α^M 41.69 166.13 9.82 38.64 6.46 25.17 2.86 10.80
α^M-JSE 16.40 71.37 1.41 3.58 1.45 3.84 1.02 1.69
ρ=0.99
α^LS 24,559.35 81,192.05 7951.18 26,482.60 8764.16 29,262.26 4611.21 15,395.13
α^Ridge 5778.64 19,129.54 1888.04 6305.34 2080.10 6959.70 1092.61 3649.50
α^JSE 21,413.86 70,814.03 6231.93 20,727.48 7366.58 24,626.64 3803.09 12,689.94
α^M 413.20 1759.63 81.87 326.48 61.04 243.37 25.63 101.84
α^M-JSE 377.51 1509.54 46.05 206.76 45.81 202.69 18.04 79.30

Table 3.

Estimated MSE values for p = 12 with 10% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 2604.94 8829.07 896.69 3051.38 460.96 1534.91 218.28 770.90
α^Ridge 543.53 1839.01 207.90 705.95 108.26 359.45 52.83 186.11
α^JSE 1640.62 5557.62 411.12 1396.73 186.55 618.17 84.12 296.52
α^M 67.04 267.59 12.19 48.06 3.67 13.96 1.52 5.42
α^M-JSE 21.63 96.12 1.53 4.07 0.965 1.14 0.937 0.971
ρ=0.8
α^LS 3998.75 13,189.59 1339.33 4417.40 692.69 2225.49 309.65 1062.11
α^Ridge 828.70 2730.26 308.39 1014.78 161.00 516.32 74.14 253.85
α^JSE 2709.64 8935.37 696.87 2296.04 323.37 1035.55 138.39 474.24
α^M 98.46 393.05 17.225 68.20 5.10 19.70 1.97 7.23
α^M-JSE 38.60 174.10 2.28 7.97 1.01 1.52 0.937 1.02
ρ=0.9
α^LS 8195.87 26,305.95 2686.94 8583.41 1400.06 4340.77 590.43 1953.99
α^Ridge 1686.83 5411.15 614.78 1959.70 322.48 999.12 140.00 462.84
α^JSE 6112.39 19,612.72 1662.91 5310.64 798.83 2473.15 325.43 1076.14
α^M 193.17 771.48 32.60 129.71 9.52 37.38 3.37 12.81
α^M-JSE 103.81 467.14 6.13 27.16 1.44 4.09 0.989 1.43
ρ=0.99
α^LS 83,344.68 260,825.92 27,061.36 84,006.10 14,253.48 42,889.05 5726.77 18,177.61
α^Ridge 16,956.42 53,080.29 6146.78 19,047.96 3260.76 9814.84 1347.36 4276.90
α^JSE 74,897.53 234,279.84 22,940.79 71,219.34 11,801.17 35,511.63 4651.08 14,761.87
α^M 2188.69 9287.41 310.45 1241.09 90.29 360.41 28.81 114.52
α^M-JSE 1886.15 7542.64 246.94 1104.34 57.14 271.60 14.09 65.28

Table 4.

Estimated MSE values for p = 4 with 20% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 568.65 2141.13 293.37 1091.45 168.92 636.72 116.32 435.23
α^Ridge 138.77 520.48 79.29 293.44 45.92 172.49 30.47 113.22
α^JSE 288.18 1081.39 112.26 415.11 62.39 233.65 47.35 175.88
α^M 11.32 44.30 3.51 13.26 1.65 5.91 0.893 3.01
α^M-JSE 2.75 8.54 0.996 1.24 0.942 0.982 0.935 0.966
ρ=0.8
α^LS 761.63 2842.37 411.19 1508.89 223.95 834.83 167.76 621.08
α^Ridge 180.69 672.05 107.77 393.73 59.09 219.85 42.53 156.83
α^JSE 439.51 1636.31 185.09 676.19 94.64 351.27 80.30 296.07
α^M 14.44 56.75 4.59 17.69 2.01 7.42 1.14 4.03
α^M-JSE 4.06 14.24 1.08 1.68 0.946 1.02 0.934 0.997
ρ=0.9
α^LS 1358.01 5011.71 781.63 2825.33 394.51 1450.39 326.54 1194.91
α^Ridge 311.67 1146.83 197.02 709.15 99.85 366.71 79.70 291.15
α^JSE 929.02 3423.97 442.20 1593.99 206.74 758.25 193.35 706.26
α^M 24.34 96.18 8.10 31.78 3.20 12.22 1.95 7.27
α^M-JSE 9.41 37.10 1.65 4.32 0.999 1.34 0.976 1.28
ρ=0.99
α^LS 10,198.91 12,876.85 7639.70 27,208.70 3484.80 12,593.20 3216.32 11,634.61
α^Ridge 2706.58 9834.81 1839.22 6520.47 835.04 3012.58 752.92 2724.06
α^JSE 10,876.85 39,605.41 6374.79 22,685.65 2820.07 10,189.28 2700.45 9761.21
α^M 267.48 1082.27 73.65 294.02 25.15 100.03 16.82 66.73
α^M-JSE 232.32 915.01 55.70 236.04 12.51 51.92 10.57 43.34

Table 5.

Estimated MSE values for p = 8 with 20% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 1680.29 6032.57 649.14 2347.64 519.84 1855.91 277.44 985.56
α^Ridge 338.58 1211.14 146.89 530.40 117.08 418.83 61.52 217.87
α^JSE 892.25 3197.52 220.07 795.30 215.90 771.14 107.57 380.48
α^M 77.05 295.95 9.04 35.45 4.09 15.66 1.73 6.28
α^M-JSE 50.56 191.36 1.15 1.97 0.975 1.16 0.949 0.986
ρ=0.8
α^LS 2386.06 8366.73 876.71 3095.33 779.41 2726.55 410.10 1423.33
α^Ridge 474.54 1657.42 194.86 687.38 172.99 606.78 89.31 309.44
α^JSE 1404.79 4918.57 338.48 1195.82 377.11 1319.87 187.49 649.08
α^M 107.42 406.95 11.64 45.89 5.79 22.48 2.36 8.79
α^M-JSE 77.90 292.56 1.35 2.96 1.04 1.60 0.953 1.05
ρ=0.9
α^LS 4497.41 15,343.62 1573.26 5386.61 1570.99 5376.51 815.20 2761.00
α^Ridge 884.37 3004.26 343.22 1174.22 343.98 1180.84 174.46 590.72
α^JSE 3030.80 10,326.90 755.01 2589.95 923.26 3161.07 462.72 1565.51
α^M 203.07 767.82 19.8 77.64 11.01 43.35 4.29 16.50
α^M-JSE 177.83 680.59 2.20 7.30 1.61 4.58 1.05 1.69
ρ=0.99
α^LS 41,563.66 137,956.01 14,127.25 46,680.78 15,899.19 53,246.48 8168.09 26,990.81
α^Ridge 8117.10 26,834.85 3032.17 10,016.38 3442.46 11,567.05 1722.93 5695.03
α^JSE 36,040.86 119,596.46 10,875.79 35,998.75 13,146.46 44,050.06 6718.96 22,180.61
α^M 2663.99 10,224.36 164.43 657.03 105.31 420.56 39.30 156.52
α^M-JSE 1932.41 7299.72 86.21 387.18 66.60 297.11 21.73 97.40

Table 6.

Estimated MSE values for p = 12 with 20% outlier.

n 30 30 50 50 100 100 200 200
σ 5 10 5 10 5 10 5 10
ρ=0.7
α^LS 4625.49 15,569.99 1592.00 5431.56 819.56 2736.17 384.88 1358.69
α^Ridge 795.44 2678.80 320.33 1093.17 171.78 574.02 81.58 287.60
α^JSE 2666.28 8958.67 657.99 2242.06 311.30 1040.14 135.03 475.13
α^M 704.82 2441.50 37.13 144.53 6.41 24.91 2.40 8.90
α^M-JSE 620.10 2178.46 10.24 37.24 0.987 1.21 0.955 0.987
ρ=0.8
α^LS 1358.69 23,312.47 2375.22 7861.05 1231.96 3968.68 545.98 1872.88
α^Ridge 1217.42 3984.70 474.01 1570.14 255.44 823.97 114.40 392.03
α^JSE 4490.75 14,662.86 1130.21 3735.51 547.12 1764.94 225.07 770.29
α^M 1173.73 3987.40 53.34 206.79 8.99 35.23 3.16 11.93
α^M-JSE 938.69 3239.22 17.50 65.90 1.06 1.71 0.958 1.04
ρ=0.9
α^LS 14,664.60 46,636.56 4759.12 15,267.63 2492.38 7745.98 1039.84 3447.79
α^Ridge 2489.93 7922.75 942.43 3028.36 512.12 1594.44 215.61 714.32
α^JSE 10,386.80 32,979.34 2750.90 8815.89 1376.18 4285.06 537.64 1780.84
α^M 2697.90 8905.19 103.67 398.83 16.94 67.04 5.50 21.27
α^M-JSE 1894.77 6371.29 46.18 179.03 1.67 5.36 1.02 1.51
ρ=0.99
α^LS 150,036.15 464,788.19 47,914.59 149,338.60 25,396.00 76,608.56 10,075.28 32,095.55
α^Ridge 25,236.70 78,230.32 9413.00 29,398.07 5186.20 15,674.68 2073.34 6598.45
α^JSE 132,438.10 410,155.40 39,657.19 123,665.60 20,860.67 62,998.27 7982.77 25,401.38
α^M 34,454.37 112,577.69 1015.75 3886.20 162.04 647.37 48.07 191.56
α^M-JSE 18,785.63 61,944.42 1004.59 4039.57 88.94 425.72 17.89 84.86

For a clear visualization of the simulated MSE values, we plotted MSE values vs sample size in Fig. 1 for p=4, σ = 5, 10% outliers and different ρ; in Fig. 2 for p = 4, σ = 5, 20% outliers and different ρ. The MSE values vs outliers are plotted in Fig. 3 for n = 30, p=4, σ = 5 and different ρ.

Figure 1.

MSE vs sample size, for p = 4, σ = 5, 10% outliers and different ρ.

Figure 2.

MSE vs sample size, for p = 4, σ = 5, 20% outliers and different ρ.

Figure 3.

MSE vs outliers, for n = 30, p = 4, σ = 5 and different ρ.

Simulation results discussions

Our conclusions are derived from the comprehensive review of the simulation results presented in Tables 1, 2, 3, 4, 5 and 6 and Figs. 1, 2 and 3. The key findings are outlined below:

Overall, the proposed estimator outperforms OLS in every scenario considered, with a substantially lower mean squared error (MSE). The estimated MSE values of all estimators generally decrease as the sample size grows, so a larger sample improves the performance of every estimator, including OLS.

The proposed estimator α^M-JSE has the lowest MSE in nearly all simulation settings, surpassing both the OLS estimator and the other biased estimators. The only exception in Tables 1, 2, 3, 4, 5 and 6 occurs for p = 12 with 20% outliers, ρ = 0.99, n = 50 and σ = 10, where α^M is slightly lower (3886.20 versus 4039.57). To investigate the impact of outliers on the estimated regression parameters, we considered two different percentages of outliers in the y-direction. As the percentage increases from 10 to 20%, the MSE of all estimators shows a corresponding increase. To assess the influence of multicollinearity on the regression parameter estimates, we varied the correlation coefficients between explanatory variables (ρ = 0.7, 0.8, 0.9, 0.99). Increasing the correlation between explanatory variables resulted in higher MSE values for all estimators. When evaluating the performance of the estimators relative to the sample size (n = 30, 50, 100, 200) while keeping p, the percentage of outliers, and σ fixed, a noticeable trend emerged: the MSE generally decreased as the sample size grew. Additionally, the parameter σ had a significant impact on the MSE, as its increase led to a corresponding rise in the MSE for all estimators. The total number of explanatory variables also influenced the MSE values for all estimators: a higher number of explanatory variables resulted in higher MSE values. Across the simulation conditions, the proposed estimator is the most effective choice for mitigating multicollinearity in the presence of outliers.

Real-life application

In this section, we adopted three examples to evaluate the performance of the estimators.

Example I

We utilized a pollution dataset that has been previously analyzed by various researchers [38,39]. The response variable is the total age-adjusted mortality rate per 100,000, which is modeled as a linear combination of 15 covariates. For a more detailed description of the data, refer to [38,39].

First, we employed the least squares method to fit model (1.1) and obtained the residuals. The diagnostic plots in Fig. 4, obtained from these residuals, indicated that certain observations were outliers. Specifically, the residual versus fitted plot identified data points 26, 31, and 37 as outliers, and the normal Q-Q plot indicated that data points 26, 32, and 37 were outliers. The residual versus leverage plot identified observations 18, 32, and 37 as outliers, while the scale-location plot picked observations 32 and 37. These observations reveal that there are outliers in the model. Additionally, the variance inflation factors for x12 and x13 were 98.64 and 104.98, respectively, indicating a high degree of correlation between the regressors.

Figure 4.

Graphical detection of outliers using pollution data.

To address the issues of correlated regressors and outliers, we estimated the model using the ridge regression, the Stein estimator, the M-ridge, and the proposed robust Stein estimator. We compared the performance of these estimators using the scalar mean squared error (SMSE), and the regression estimates and SMSE values are provided in Table 7.

Table 7.

Regression coefficients and SMSEs for the pollution data.

Coef α^LS α^Ridge α^JSE α^M-Ridge α^M-JSE
x1 1.175 1.459 1.175 1.579 1.349
x2 −1.516 −2.985 −1.516 −2.578 −1.342
x3 1.319 2.895 1.319 2.344 1.013
x4 11.184 6.302 11.183 6.952 11.095
x5 128.036 45.864 128.034 45.396 113.697
x6 −1.463 −2.593 −1.463 4.025 5.825
x7 1.221 3.041 1.221 3.169 1.600
x8 0.007 0.007 0.007 0.008 0.008
x9 4.130 3.810 4.130 3.666 3.937
x10 0.447 0.300 0.447 −0.681 −0.647
x11 1.886 4.875 1.886 5.528 3.044
x12 −0.373 −0.401 −0.373 −0.235 −0.203
x13 0.874 1.015 0.874 0.595 0.455
x14 0.160 0.145 0.160 0.215 0.232
x15 1.915 3.075 1.915 2.536 1.549
SMSE 2244.130 455.991 9.081 348.991 8.194

From Table 7, we observed that, owing to its sensitivity to correlated regressors (multicollinearity) and outliers, the OLS estimator exhibited the worst performance in terms of SMSE. The signs of the coefficients agreed across the estimators except for x6 and x10, where the robust estimators (M-ridge and M-Stein) had the opposite sign to the non-robust ones. As expected, the robust ridge dominated the ridge estimator since the ridge estimator is sensitive to outliers. However, the Stein estimator performed better than the ridge estimator, as reported in the literature. Most notably, the proposed robust version of the Stein estimator (M-JSE) outperformed every other estimator in the study.

Example II

The dataset was used to predict the value of a product in the manufacturing sector, based on three predictors: the value of imported intermediate (x1), imported capital commodities (x2) and the value of imported raw materials (x3) [14,40,41]. A linear regression model was fitted, and the variance inflation factors were computed for each predictor, resulting in values of 128.26, 103.43, and 70.87, respectively, indicating high correlation between the predictor variables. The residual plot in Fig. 5 revealed the presence of outliers in the dataset. Outliers were identified by both the residual plot against the fitted values and the scale-location plot, which detected observations 16, 30, and 31 as outliers. The normal Q-Q plot and residual versus leverage plot identified observations 31 and 30 as outliers. The residual versus leverage plot also detected observations 18, 32, and 37 as outliers, while the scale-location plot picked observations 32 and 37 as outliers. These findings indicate that the model contains both correlated regressors and outliers. The model was analyzed using several estimators, and the results are summarized in Table 8. The regression estimates of the Stein estimator were the same as those of OLS to the reported precision, with a computed value of c approximately equal to 1 (c = 0.9996761). However, the Stein estimator exhibited a lower mean squared error than the OLS estimator. According to Table 8, the Stein estimator also dominated the ridge estimator in this instance (SMSE 1.355 versus 1.837), while the M-Ridge outperformed both by accounting for multicollinearity and outliers together. The proposed M-JSE performed best, with the smallest SMSE.

Figure 5.

Graphical detection of outliers using import data.

Table 8.

Regression coefficients and SMSE for the import data.

Coef α^LS α^Ridge α^JSE α^M-Ridge α^M-JSE
x1 2.337 1.878 2.337 2.362 2.523
x2 0.573 0.617 0.573 0.518 0.511
x3 −1.515 −0.348 −1.515 −0.852 −1.317
SMSE 4.723 1.837 1.355 0.793 0.690

Example III

We analyzed the Longley data to predict the total derived employment, which is modeled as a linear function of the following predictors: gross national product implicit price deflator, gross national product, unemployment, size of armed forces, and non-institutional population 14 years of age and over [33,38–40,42,43]. The literature indicates that the model suffers from multicollinearity. Additionally, Fig. 6 shows that certain observations are anomalous, namely data points 9, 10, and 16.

Figure 6.

Graphical detection of outliers using Longley data.

We used both robust and non-robust estimators to analyze the data, and the results are presented in Table 9. The table indicates that the regression estimates of OLS and Stein are the same, with a value of c = 1. However, the Stein estimator has a lower SMSE than OLS. The Stein estimator dominates the ridge and robust ridge estimators in this instance. Furthermore, the proposed robust Stein estimator attains the lowest SMSE among the estimators considered.

Table 9.

Regression coefficients and SMSEs for the Longley data.

Coef α^LS α^Ridge α^JSE α^M-Ridge α^M-JSE
x1 217.217 99.388 217.217 144.229 180.799
x2 −0.010 −0.004 −0.010 −0.006 −0.008
x3 0.453 0.527 0.453 0.498 0.475
x4 −1.396 −1.335 −1.396 −1.324 −1.343
x5 −0.579 −0.409 −0.579 −0.540 −0.593
SMSE 11,188.401 2342.558 1.276 1055.014 1.030

In summary, the Longley data analysis indicates that the model suffers from multicollinearity and contains anomalous observations. However, using the robust Stein estimator provides the best performance among the estimators considered in this study.

Some concluding remarks

Linear regression models (LRMs) are widely used for predicting the response variable based on a combination of regressors. However, correlated regressors can decrease the efficiency of the ordinary least squares method. Alternative methods such as the Stein and the ridge estimators can provide better estimates in such situations. However, these methods can be sensitive to outlying observations, leading to unstable predictions.

To address this issue, researchers have previously combined the ridge estimator with robust estimators (such as M-estimators) to account for both correlated regressors and outliers.

In this study, we developed a new biased estimator that offers an alternative approach to handling multicollinearity in linear regression: a robust Stein estimator obtained by combining the M-estimator with the Stein estimator. Pseudo-random numbers were generated for both the independent and dependent variables in a Monte Carlo experiment. Different sample sizes, correlation strengths, and numbers of independent variables were taken into account. Our simulation and application results demonstrate that the robust Stein estimator outperforms the other estimators considered.

It is noted that, in the case of high multicollinearity, the suggested estimator showed its best performance in terms of the reduction of the estimated MSE values, and it is not affected by multicollinearity as much as the other estimators. According to the tables, the performance of the estimators differs somewhat with the shrinkage parameter that is used, and it may be concluded that km is the best shrinkage parameter in most cases.

The findings of this paper will be beneficial for practitioners who encounter the challenge of dealing with multicollinearity and outliers in their data. By using the Robust Stein estimator, they can obtain more stable and accurate predictions.

While this study has made substantial progress in addressing the challenges of LRMs, there are still avenues for further exploration. Future research should consider incorporating other robust estimators, including the robust Liu estimator, the robust Liu-type estimator, the robust linearized ridge estimator, the Jackknife Kibria-Lukman M-estimator, and the modified ridge-type M-estimator, to conduct a more comprehensive comparative analysis [13,14,45–47]. This will contribute to a deeper understanding of the strengths and limitations of different approaches in handling complex data scenarios.

Another potential direction for future research is the extension of the current study using neutrosophic statistics. Neutrosophic statistics is an extension of classical statistics that is particularly useful when dealing with data from complex processes or uncertain environments48–53. By incorporating neutrosophic statistics, we can account for additional sources of uncertainty and variability, which may further enhance the robustness and applicability of our proposed estimator.

Author contributions

A.L.: Conceptualization, Writing, Software, Review. R.F.: Conceptualization and Writing. G.K.: Supervision and Editing. O.O.: Writing and Review.

Data availability

All data analysed during this study are included as Supplementary Files.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-36053-z.

References

  • 1. Stein CM. Multiple regression. In: Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press; 1960.
  • 2. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67. doi: 10.1080/00401706.1970.10488634.
  • 3. Liu K. A new class of biased estimate in linear regression. Comm. Stat. Theory Meth. 1993;22:393–402. doi: 10.1080/03610929308831027.
  • 4. Dawoud I, Kibria BMG. A new biased estimator to combat the multicollinearity of the Gaussian linear regression model. Stats. 2020;3(4):526–541. doi: 10.3390/stats3040033.
  • 5. Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: Simulations and applications. Scientifica. 2020 doi: 10.1155/2020/9758378.
  • 6. Lukman AF, Ayinde K, Binuomote S, Onate AC. Modified ridge-type estimator to combat multicollinearity: Application to chemical data. J. Chemom. 2019;33:e3125. doi: 10.1002/cem.3125.
  • 7. Lukman AF, Kibria BMG, Ayinde K, Jegede SL. Modified one-parameter Liu estimator for the linear regression model. Modell. Simul. Eng. 2020 doi: 10.1155/2020/9574304.
  • 8. Chatterjee S, Hadi AS. Influential observations, high leverage points, and outliers in linear regression. Stat. Sci. 1986;1:379–416.
  • 9. Ayinde K, Lukman AF, Arowolo O. Robust regression diagnostics of influential observations in linear regression model. Open J. Stat. 2015;5:273–283. doi: 10.4236/ojs.2015.54029.
  • 10. Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis. 3. John Wiley and sons; 2006.
  • 11. Jadhav NH, Kashid DN. A jackknifed ridge M-estimator for regression model with multicollinearity and outliers. J. Stat. Theory Pract. 2011;5(4):659–673. doi: 10.1080/15598608.2011.10483737.
  • 12. Arum KC, Ugwuowo FI. Combining principal component and robust ridge estimators in linear regression model with multicollinearity and outlier. Concurr. Computat. Pract. Exper. 2022;34:e6803. doi: 10.1002/cpe.6803.
  • 13. Jegede SL, Lukman AF, Ayinde K. Jackknife Kibria-Lukman M-estimator: Simulation and application. J. Nig. Soc. Phys. Sci. 2022;4:250–263.
  • 14. Lukman AF, Ayinde K, Kibria BMG, Jegede SL. Two-parameter modified ridge-type M-estimator for linear regression model. Sci. World J. 2020 doi: 10.1155/2020/3192852.
  • 15. Huber PJ. Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Stat. 1973;1:799–821. doi: 10.1214/aos/1176342503.
  • 16. Rousseeuw PJ, Yohai V. Robust regression by means of S estimators in robust and nonlinear time series analysis. In: Franke J, Härdle W, Martin RD, editors. Lecture Notes in Statistics. Springer-Verlag; 1984. pp. 256–274.
  • 17. Rousseeuw PJ, Leroy AM. Robust Regression and Outlier Detection (Series in Applied Probability and Statistics) Wiley Interscience; 1987. p. 329.
  • 18. Yohai VJ. High breakdown point and high efficiency robust estimates for regression. Ann. Stat. 1987;15:642–656. doi: 10.1214/aos/1176350366.
  • 19. Rousseeuw PJ, van Driessen K. Computing LTS regression for large data sets. Data Min. Knowl. Disc. 2006;12:29–45. doi: 10.1007/s10618-005-0024-4.
  • 20. Silvapulle MJ. Robust ridge regression based on an M-estimator. Aust. J. Stat. 1991;33(3):319–333. doi: 10.1111/j.1467-842X.1991.tb00438.x.
  • 21. Amin M, Akram MN, Amanullah M. On the James-Stein estimator for the Poisson regression model. Commun. Stat. 2020 doi: 10.1080/03610918.2020.1775851.
  • 22. Akram MN, Abonazel MR, Amin M, Kibria BMG, Afzal N. A new Stein estimator for the zero-inflated negative binomial regression model. Concurr. Computat. Pract. Exper. 2022;34:e7045. doi: 10.1002/cpe.7045.
  • 23. Akram MN, Amin M, Amanullah M. James Stein estimator for the inverse Gaussian regression model. Iran J. Sci. Technol. Trans. Sci. 2021 doi: 10.1007/s40995-021-01133-0.
  • 24. Akram MN, Amin M, Kibria BMG, Arashi M, Lukman AF, Afzal N. A new improved Liu estimator for the QSAR model with inverse Gaussian response. Commun. Stat. 2022 doi: 10.1080/03610918.2022.2059088.
  • 25. Akram MN, Amin M, Lukman AF, Afzal S. Principal component ridge type estimator for the inverse Gaussian regression model. J. Stat. Comput. Simul. 2022;92(10):2060–2089. doi: 10.1080/00949655.2021.2020274.
  • 26. Abonazel MR, Dawoud I, Awwad FA, Lukman AF. Dawoud-Kibria estimator for beta regression model: Simulation and application. Front. Appl. Math. Stat. 2022;8:775068. doi: 10.3389/fams.2022.775068.
  • 27. Dawoud I, Lukman AF, Haadi A. A new biased regression estimator: Theory, simulation and application. Sci. Afr. 2022;15:e01100. doi: 10.1016/j.sciaf.2022.e01100.
  • 28. Kibria BMG. Performance of some new ridge regression estimators. Commun. Stat. 2003;32(2):419–435. doi: 10.1081/SAC-120017499.
  • 29. Kibria BMG. More than hundred (100) estimators for estimating the shrinkage parameter in a linear and generalized linear ridge regression models. J Econ Stat. 2022;2(2):233–252.
  • 30. Lukman AF, Kibria BMG, Nziku CK, Amin M, Adewuyi ET, Farghali R. K-L estimator: Dealing with multicollinearity in the logistic regression model. Mathematics. 2023;11:340. doi: 10.3390/math11020340.
  • 31. Kibria BMG. Some Liu and ridge-type estimators and their properties under the ill-conditioned Gaussian linear regression model. J. Stat. Comput. Simul. 2012;82(1):1–17. doi: 10.1080/00949655.2010.519705.
  • 32. Qasim M, Kibria BMG, Månsson K, Sjölander P. A new Poisson Liu regression estimator: Method and application. J. Appl. Stat. 2019 doi: 10.1080/02664763.2019.1707485.
  • 33. Lukman AF, Arashi M, Prokaj V. Robust biased estimators for Poisson regression model: Simulation and applications. Concurr. Computat. Pract. Exper. 2023;2022:e7594. doi: 10.1002/cpe.7594.
  • 34. Arum KC, Ugwuowo FI, Oranye HE, Alakija TO, Ugah TE, Asogwa OC. Combating outliers and multicollinearity in linear regression model using robust Kibria-Lukman mixed with principal component estimator, simulation and computation. Sci. Afr. 2023;19:e01566. doi: 10.1016/j.sciaf.2023.e01566.
  • 35. Ugwuowo FI, Oranye HE, Arum KC. On the Jackknifed Kibria-Lukman estimator for the linear regression model. Commun. Stat. 2021 doi: 10.1080/03610918.2021.2007401.
  • 36. Alao NA, Ayinde K, Solomon GS. A comparative study on sensitivity of multivariate tests of normality to outliers. A. SMSc J. 2019;12(5):65–71.
  • 37. Arum KC, Ugwuowo FI, Oranye HE. Robust modified jackknife ridge estimator for the Poisson regression model with multicollinearity and outliers. Sci. Afr. 2022;17(3):e01386. doi: 10.1016/j.sciaf.2022.e01386.
  • 38. McDonald GC, Schwing RC. Instabilities of regression estimates relating air pollution to mortality. Technometrics. 1973;15(3):463–481. doi: 10.1080/00401706.1973.10489073.
  • 39. Yüzbasi B, Arashi M, Ahmed SE. Shrinkage estimation strategies in generalised ridge regression models: Low/high-dimension regime. Int. Stat. Rev. 2020;88(1):229–251. doi: 10.1111/insr.12351.
  • 40. Eledum HYA, Alkhalifa AA. Generalized two stages ridge regression estimator for multicollinearity and autocorrelated errors. Can. J. Sci. Eng. Math. 2012;3(3):79–85.
  • 41. Lukman AF, Osowole OI, Ayinde K. Two stage robust ridge method in a linear regression model. J. Mod. Appl. Stat. Methods. 2015;14(2):53–67. doi: 10.22237/jmasm/1446350820.
  • 42. Longley JW. An appraisal of least squares programs for electronic computer from the point of view of the user. J. Am. Stat. Assoc. 1967;62:819–841. doi: 10.1080/01621459.1967.10500896.
  • 43. Walker E, Birch JB. Influence measures in ridge regression. Technometrics. 1988;30(2):221–227. doi: 10.1080/00401706.1988.10488370.
  • 44. Lukman AF, Ayinde K. Detecting influential observations in two-parameter Liu-ridge estimator. J. Data Sci. 2018;16(2):207–218.
  • 45. Arslan O, Billor N. Robust Liu estimator for regression based on an M-estimator. J. Appl. Stat. 2000;27(1):39–47. doi: 10.1080/02664760021817.
  • 46. Jadhav NH, Kashid DN. Robust linearized ridge M-estimator for linear regression model. Commun. Stat. 2016;45(3):1001–1024. doi: 10.1080/03610918.2014.911898.
  • 47. Ertaş H, Kaçıranlar S, Güler H. Robust Liu-type estimator for regression based on M-estimator. Commun. Stat. 2017;46(5):3907–3932.
  • 48. Aslam M. Neutrosophic analysis of variance: Application to university students. Complex Intell. Syst. 2019;5:403–407. doi: 10.1007/s40747-019-0107-2.
  • 49. Nagarajan D, Broumi S, Smarandache F, Kavikumar J. Analysis of neutrosophic multiple regression. Neutrosophic Sets Syst. 2021;43:43–45.
  • 50. Salama AA, Khaled OM, Mahfouz KM. Neutrosophic correlation and simple linear regression. Neutrosophic Sets Syst. 2014;5:3–8.
  • 51. Aslam M. A new sampling plan using neutrosophic process loss consideration. Symmetry. 2018;10:132. doi: 10.3390/sym10050132.
  • 52. Aslam M, Saleem M. Neutrosophic test of linearity with application. AIMS Math. 2023;8(4):7981–7989. doi: 10.3934/math.2023402.
  • 53. Aslam M, Al-Marshadi AH. Dietary fat and prostate cancer relationship using trimmed regression under uncertainty. Front. Nutr. 2022;9:799375. doi: 10.3389/fnut.2022.799375.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group