Scientific Reports. 2021 Feb 12;11:3732. doi: 10.1038/s41598-021-82582-w

A new estimator for the multicollinear Poisson regression model: simulation and application

Adewale F Lukman 1, Emmanuel Adewuyi 2, Kristofer Månsson 3, B M Golam Kibria 4
PMCID: PMC7881247  PMID: 33580148

Abstract

The maximum likelihood estimator (MLE) suffers from instability in the presence of multicollinearity in the Poisson regression model (PRM). In this study, we propose a new estimator with biasing parameters to estimate the regression coefficients of the PRM when there is a multicollinearity problem. Simulation experiments are conducted to compare the estimators' performance using the mean squared error (MSE) criterion. For illustration purposes, aircraft damage data are analyzed. Both the simulation results and the real-life application show that the proposed estimator performs better than the other estimators.

Subject terms: Mathematics and computing, Statistics

Introduction

The Poisson regression model (PRM) is often adopted for modelling count data. The PRM models the relationship between a response variable and one or more regressors, where the response is a count variable taking non-negative integer values, such as the number of defects in a unit of manufactured product, errors or bugs in software, road accidents, times a machine fails in a month, occurrences of a virus disease, or counts of particulate matter or other pollutants in the environment. The regression coefficients in the PRM are estimated using the maximum likelihood estimator (MLE).

In the linear regression model (LRM), estimator performance suffers from high instability when the regressors are correlated, i.e. under multicollinearity (see, for example,1,2). The effects of multicollinearity include inflated variances and covariances of the regression coefficients, wider confidence intervals, insignificant t-ratios and a high R-square. Multicollinearity also negatively influences the performance of the MLE in the PRM3,4. Alternative estimators to the MLE in the LRM include the ridge regression estimator by Hoerl and Kennard5, the Liu estimator by Liu6, the Liu-type estimator by Liu7, the two-parameter estimator by Özkale and Kaçıranlar8, the k-d class estimator by Sakallıoğlu and Kaçıranlar9, a two-parameter estimator by Yang and Chang10, the modified two-parameter estimator by Dorugade11 and, more recently, the modified ridge-type estimator by Lukman et al.12, the modified new two-parameter estimator by Lukman et al.13, another modified two-parameter estimator by Ahmad and Aslam14, and the K–L estimator by Kibria and Lukman15.

Researchers have applied several of these estimators to the Poisson regression model. Månsson and Shukur3 proposed the Poisson ridge regression estimator (PRRE), and Månsson et al.16 developed the Poisson Liu estimator (PLE), to mitigate the problem of multicollinearity in the PRM. Batah et al.17 proposed the modified jackknifed ridge regression estimator (MJRE) for the LRM, while Türkan and Özel18 adapted the MJRE to the Poisson regression model as a remedy for multicollinearity. Özkale and Kaçıranlar8 combined the Liu and ridge estimators to form the two-parameter estimator in the LRM, and Asar and Genç19 implemented this two-parameter estimator in the Poisson regression model. Rashad and Algamal20 developed a new ridge estimator for the Poisson regression model by modifying the Poisson modified jackknifed ridge regression, and Qasim et al.4 suggested some new shrinkage estimators for the PLE. These estimators can be classified into Poisson regression estimators with a single shrinkage parameter and with two parameters, respectively. Recently, Kibria and Lukman15 proposed another ridge-type estimator, the K–L estimator, with a single shrinkage parameter.

This study proposes an estimator that can handle multicollinearity in the Poisson regression model. We adapt the K–L estimator to the PRM and suggest some shrinkage estimators of its biasing parameter. We also compare the proposed estimator's performance with that of the MLE, PRRE and PLE in terms of the matrix mean squared error (MSEM) and the scalar mean squared error (MSE). The small-sample properties are investigated through a simulation experiment. Finally, the benefit of the new method is illustrated in an example using the aircraft damage data initially analyzed by Myers et al.21.

The paper is structured as follows: the Poisson regression model, several existing estimators, and the MSEM and MSE properties of the estimators are discussed in Sect. 2. A Monte Carlo simulation experiment is conducted in Sect. 3. To illustrate the findings, the aircraft damage data are analyzed in Sect. 4. Some concluding remarks are presented in Sect. 5.

Statistical methodology

Poisson regression model and maximum likelihood estimator

Suppose that the response variable $y_i$ takes the form of non-negative integers (count data); then its probability function is given as follows:

$$f(y_i)=\frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!},\qquad y_i=0,1,2,\ldots \qquad (2.1)$$

where $\mu_i>0$. The mean and variance of the Poisson distribution in Eq. (2.1) are equal (i.e. $E(y_i)=\mathrm{Var}(y_i)=\mu_i$). The model is written in terms of the mean of the response. According to Myers et al.21, we assume that there exists a function $g$ that relates the mean of the response to a linear predictor such that

$$g(\mu_i)=\eta_i=\beta_0+\beta_1 x_{i1}+\cdots+\beta_p x_{ip}=x_i'\beta, \qquad (2.2)$$

where $g(\cdot)$ is a monotone differentiable link function. The log link, $g(\mu_i)=\ln(\mu_i)=x_i'\beta$, so that $\mu_i=\exp(x_i'\beta)$, is generally adopted for the Poisson regression model because it ensures that all fitted values of the response variable are positive. The maximum likelihood estimator is popularly used to estimate the coefficients of the PRM, where the likelihood function is defined as:

$$l(\beta)=\prod_{i=1}^{n}\frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}=\frac{\left(\prod_{i=1}^{n}\mu_i^{y_i}\right)\exp\!\left(-\sum_{i=1}^{n}\mu_i\right)}{\prod_{i=1}^{n}y_i!} \qquad (2.3)$$

where $\mu_i=g^{-1}(x_i'\beta)$. The log-likelihood function is used to estimate the parameter vector $\beta$:

$$\ln l(\beta)=\sum_{i=1}^{n}y_i\ln\mu_i-\sum_{i=1}^{n}\mu_i-\sum_{i=1}^{n}\ln(y_i!) \qquad (2.4)$$

Since Eq. (2.4) is nonlinear in $\beta$, the solution is obtained iteratively. A common procedure is the Fisher scoring method, defined as:

$$\beta^{(t+1)}=\beta^{(t)}+I^{-1}(\beta^{(t)})\,S(\beta^{(t)}), \qquad (2.5)$$

where $S(\beta)=\partial l(\beta)/\partial\beta$ and $I(\beta)=-E\!\left(\partial^2 l(\beta)/\partial\beta\,\partial\beta'\right)$. At convergence, the estimated coefficients correspond to:

$$\hat\beta_{PMLE}=(X'\hat{W}X)^{-1}X'\hat{W}\hat{z} \qquad (2.6)$$

where $\hat{W}=\mathrm{diag}(\hat\mu_i)$ and $\hat{z}$ is the adjusted response variable with elements $\hat{z}_i=x_i'\hat\beta_{PMLE}+\dfrac{y_i-\hat\mu_i}{\hat\mu_i}$. Both $\hat{W}$ and $\hat{z}$ are obtained from the Fisher scoring iterations (see Hardin and Hilbe22). The covariance matrix and the mean squared error are given, respectively, as follows:

$$\mathrm{Cov}(\hat\beta_{PMLE})=(X'\hat{W}X)^{-1} \qquad (2.7)$$

and

$$\mathrm{MSE}(\hat\beta_{PMLE})=\sum_{i=1}^{p}\frac{1}{\lambda_i} \qquad (2.8)$$

where $\lambda_i$ is the $i$th eigenvalue of the matrix $X'\hat{W}X$.
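The Fisher scoring recursion in Eqs. (2.5)–(2.6) amounts to iteratively reweighted least squares. The following is a minimal NumPy sketch of this loop (the paper's own computations were done in R; function and variable names here are illustrative, not the authors' code):

```python
import numpy as np

def poisson_mle(X, y, tol=1e-8, max_iter=100):
    """Fisher scoring (IRLS) for the Poisson model with log link, Eqs. (2.5)-(2.6)."""
    n, p = X.shape
    beta = np.zeros(p)                      # starting value beta^(0)
    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor x_i' beta
        mu = np.exp(eta)                    # fitted means under the log link
        W = np.diag(mu)                     # weight matrix W-hat
        z = eta + (y - mu) / mu             # adjusted response z-hat
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)  # Eq. (2.6)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# quick check on synthetic Poisson data with known coefficients
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_mle(X, y)
```

With a moderate sample size the recovered coefficients sit close to the true values, and the converged weighted least-squares step is exactly the PMLE of Eq. (2.6).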

Poisson K–L estimator

Månsson and Shukur3 developed the Poisson ridge regression estimator (PRRE) to mitigate the problem of multicollinearity; it is defined as follows:

$$\hat\beta_{PRRE}=(X'\hat{W}X+kI)^{-1}X'\hat{W}X\,\hat\beta_{PMLE}, \qquad (2.9)$$

where $k>0$ is the biasing parameter and $I$ is a $p\times p$ identity matrix. The optimal value of $k$ is defined as:

$$k=\frac{1}{\hat\alpha_{\max}^{2}} \qquad (2.10)$$

where $\hat\alpha_i$ is the $i$th element of $\alpha=Q'\beta$ and $Q$ is the matrix whose columns are the eigenvectors of $X'\hat{W}X$.

Månsson et al.16 introduced the Poisson Liu estimator (PLE) as follows:

$$\hat\beta_{PLE}=(X'\hat{W}X+I)^{-1}(X'\hat{W}X+dI)\,\hat\beta_{PMLE},\quad 0<d<1, \qquad (2.11)$$

where, according to Månsson et al.16, $d$ may be estimated by the following formula:

$$d=\max\!\left(0,\ \min_i\left(\frac{\hat\alpha_i^{2}-1}{\frac{1}{\lambda_i}+\hat\alpha_i^{2}}\right)\right) \qquad (2.12)$$

Kibria and Lukman15 proposed a new single-parameter ridge-type estimator for the linear regression model, defined as follows:

$$\hat\beta_{KLE}=(X'X+kI_p)^{-1}(X'X-kI_p)\,\hat\beta_{MLE} \qquad (2.13)$$

Following Kibria and Lukman15, we propose the following new estimator for the Poisson regression model:

$$\hat\beta_{PKLE}=(X'\hat{W}X+kI_p)^{-1}(X'\hat{W}X-kI_p)\,\hat\beta_{PMLE} \qquad (2.14)$$

Suppose $\alpha=Q'\beta$ and $Q'X'\hat{W}XQ=\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)$, where $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p>0$; $\Lambda$ is the matrix of eigenvalues of $X'\hat{W}X$ and $Q$ is the matrix whose columns are its eigenvectors. The matrix mean squared errors and mean squared errors of the estimators PMLE, PRRE, PLE and PKLE are provided in Eqs. (2.15) to (2.22) as follows:

$$\mathrm{MSEM}(\hat\alpha_{PMLE})=Q\Lambda^{-1}Q' \qquad (2.15)$$
$$\mathrm{MSE}(\hat\alpha_{PMLE})=\sum_{i=1}^{p}\frac{1}{\lambda_i} \qquad (2.16)$$
$$\mathrm{MSEM}(\hat\alpha_{PRRE})=Q\Lambda_k\Lambda\Lambda_kQ'+k^2\Lambda_k\alpha\alpha'\Lambda_k \qquad (2.17)$$
$$\mathrm{MSE}(\hat\alpha_{PRRE})=\sum_{i=1}^{p}\frac{\lambda_i}{(\lambda_i+k)^2}+k^2\sum_{i=1}^{p}\frac{\alpha_i^2}{(\lambda_i+k)^2} \qquad (2.18)$$
$$\mathrm{MSEM}(\hat\alpha_{PLE})=Q\Lambda_d\Lambda^{-1}\Lambda_d'Q'+(\Lambda_d-I)\alpha\alpha'(\Lambda_d-I)' \qquad (2.19)$$

where $\Lambda_d=(\Lambda+I)^{-1}(\Lambda+dI)$.

$$\mathrm{MSEM}(\hat\alpha_{PKLE})=Q\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)Q'+4k^2\Lambda_k\alpha\alpha'\Lambda_k \qquad (2.20)$$

where $\Lambda_k=(\Lambda+kI_p)^{-1}$.

$$\mathrm{MSE}(\hat\alpha_{PKLE})=\sum_{i=1}^{p}\frac{(\lambda_i-k)^2}{\lambda_i(\lambda_i+k)^2}+4k^2\sum_{i=1}^{p}\frac{\alpha_i^2}{(\lambda_i+k)^2} \qquad (2.21)$$
$$\mathrm{MSE}(\hat\alpha_{PLE})=\sum_{i=1}^{p}\left[\frac{(\lambda_i+d)^2}{\lambda_i(\lambda_i+1)^2}+\frac{(d-1)^2\alpha_i^2}{(\lambda_i+1)^2}\right] \qquad (2.22)$$

where $\lambda_i$ is the $i$th eigenvalue of $X'\hat{W}X$ and $\alpha_i$ is the $i$th element of $\alpha$. For the theoretical comparisons, we adopt the following lemmas.
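The scalar MSE formulas in Eqs. (2.16), (2.18), (2.21) and (2.22) depend only on the eigenvalues and the transformed coefficients, so they are straightforward to evaluate numerically. The sketch below does so for an illustrative ill-conditioned spectrum (the values of `lam`, `alpha`, `k` and `d` are made up for illustration, not taken from the paper's data):

```python
import numpy as np

def mse_pmle(lam):
    # Eq. (2.16): MSE of the maximum likelihood estimator
    return np.sum(1.0 / lam)

def mse_prre(lam, alpha, k):
    # Eq. (2.18): variance term plus squared-bias term of the ridge estimator
    return np.sum(lam / (lam + k)**2) + k**2 * np.sum(alpha**2 / (lam + k)**2)

def mse_pkle(lam, alpha, k):
    # Eq. (2.21): variance term plus squared-bias term of the proposed estimator
    return (np.sum((lam - k)**2 / (lam * (lam + k)**2))
            + 4 * k**2 * np.sum(alpha**2 / (lam + k)**2))

def mse_ple(lam, alpha, d):
    # Eq. (2.22): MSE of the Poisson Liu estimator
    return np.sum((lam + d)**2 / (lam * (lam + 1)**2)
                  + (d - 1)**2 * alpha**2 / (lam + 1)**2)

# an ill-conditioned spectrum, chosen only for illustration
lam = np.array([2085.0, 375.0, 4.3, 0.05])
alpha = np.array([0.4, 0.3, 0.2, 0.1])
k, d = 0.5, 0.5
print(mse_pmle(lam), mse_prre(lam, alpha, k),
      mse_ple(lam, alpha, d), mse_pkle(lam, alpha, k))
```

A useful sanity check is that both Eq. (2.18) and Eq. (2.21) collapse to the MLE's MSE, Eq. (2.16), when $k=0$, since no shrinkage is then applied.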

Lemma 2.1

Let $A$ be a positive definite (pd) matrix, that is $A>0$, and let $a$ be a vector. Then $A-aa'\ge 0$ if and only if (iff) $a'A^{-1}a\le 1$.23

Lemma 2.2

$\mathrm{MSEM}(\hat\beta_1)-\mathrm{MSEM}(\hat\beta_2)=\sigma^2 D+b_1b_1'-b_2b_2'>0$ if and only if $b_2'(\sigma^2 D+b_1b_1')^{-1}b_2<1$, where $\mathrm{MSEM}(\hat\beta_j)=\mathrm{Cov}(\hat\beta_j)+b_jb_j'$.24

Theorem 2.1

$\hat\alpha_{PKLE}$ is superior to $\hat\alpha_{PMLE}$ iff $\mathrm{MSEM}(\hat\alpha_{PMLE})-\mathrm{MSEM}(\hat\alpha_{PKLE})>0$, provided $k>0$.

Proof.

$$\mathrm{MSEM}(\hat\alpha_{PMLE})-\mathrm{MSEM}(\hat\alpha_{PKLE})=Q\left[\Lambda^{-1}-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)\right]Q'-4k^2\Lambda_k\alpha\alpha'\Lambda_k=Q\,\mathrm{diag}\left\{\frac{1}{\lambda_i}-\frac{(\lambda_i-k)^2}{\lambda_i(\lambda_i+k)^2}\right\}_{i=1}^{p}Q'-4k^2\Lambda_k\alpha\alpha'\Lambda_k \qquad (2.23)$$

The matrix $\Lambda^{-1}-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)$ is pd since $\lambda_i(\lambda_i+k)^2-\lambda_i(\lambda_i-k)^2=4k\lambda_i^2>0$.

Theorem 2.2

$\hat\alpha_{PKLE}$ is superior to $\hat\alpha_{PRRE}$ iff $\mathrm{MSEM}(\hat\alpha_{PRRE})-\mathrm{MSEM}(\hat\alpha_{PKLE})>0$, provided $k>0$.

Proof.

$$D(\hat\alpha_{PRRE})-D(\hat\alpha_{PKLE})=Q\left[\Lambda_k\Lambda\Lambda_k-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)\right]Q'=Q\,\mathrm{diag}\left\{\frac{\lambda_i}{(\lambda_i+k)^2}-\frac{(\lambda_i-k)^2}{\lambda_i(\lambda_i+k)^2}\right\}_{i=1}^{p}Q' \qquad (2.24)$$

The matrix $\Lambda_k\Lambda\Lambda_k-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)$ is pd since $\lambda_i^2-(\lambda_i-k)^2=k(2\lambda_i-k)>0$ for $2\lambda_i-k>0$.

Theorem 2.3

$\hat\alpha_{PKLE}$ is superior to $\hat\alpha_{PLE}$ iff $\mathrm{MSEM}(\hat\alpha_{PLE})-\mathrm{MSEM}(\hat\alpha_{PKLE})>0$, provided $k>0$.

Proof.

$$D(\hat\alpha_{PLE})-D(\hat\alpha_{PKLE})=Q\left[\Lambda_d\Lambda^{-1}\Lambda_d'-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)\right]Q'=Q\,\mathrm{diag}\left\{\frac{(\lambda_i+d)^2}{\lambda_i(\lambda_i+1)^2}-\frac{(\lambda_i-k)^2}{\lambda_i(\lambda_i+k)^2}\right\}_{i=1}^{p}Q' \qquad (2.25)$$

The matrix $\Lambda_d\Lambda^{-1}\Lambda_d'-\Lambda_k(\Lambda-kI_p)\Lambda^{-1}\Lambda_k(\Lambda-kI_p)$ is pd since $(\lambda_i+k)^2(\lambda_i+d)^2-(\lambda_i+1)^2(\lambda_i-k)^2>0$.
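The positive-definiteness conditions behind Theorems 2.1–2.3 reduce to per-eigenvalue scalar inequalities, which are easy to verify numerically. The sketch below checks each diagonal difference from Eqs. (2.23)–(2.25) for an illustrative spectrum (all numeric values here are assumptions chosen so that the theorems' conditions, such as $2\lambda_i>k$, hold):

```python
import numpy as np

lam = np.array([120.0, 15.0, 2.5])   # illustrative eigenvalues of X'WX
k, d = 0.8, 0.4                      # biasing parameters with 2*lam - k > 0

# Theorem 2.1 diagonal terms: 1/lam - (lam-k)^2 / (lam*(lam+k)^2)
t1 = 1 / lam - (lam - k)**2 / (lam * (lam + k)**2)

# Theorem 2.2 diagonal terms: lam/(lam+k)^2 - (lam-k)^2 / (lam*(lam+k)^2)
t2 = lam / (lam + k)**2 - (lam - k)**2 / (lam * (lam + k)**2)

# Theorem 2.3 diagonal terms: (lam+d)^2/(lam*(lam+1)^2) - (lam-k)^2/(lam*(lam+k)^2)
t3 = (lam + d)**2 / (lam * (lam + 1)**2) - (lam - k)**2 / (lam * (lam + k)**2)

print(np.all(t1 > 0), np.all(t2 > 0), np.all(t3 > 0))
```

All three difference vectors are elementwise positive for these values, consistent with the diagonal matrices in Eqs. (2.23)–(2.25) being positive definite.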

Selection of biasing parameter

The biasing parameter is estimated by taking the first derivative of the MSE function of $\hat\alpha_{PKLE}$ in Eq. (2.21) with respect to $k$ and setting the derivative equal to zero. We obtain the following estimate of $k$:

$$k_i=\frac{\lambda_i}{1+2\lambda_i\hat\alpha_i^{2}} \qquad (2.26)$$

Following Månsson et al.16 and Lukman and Ayinde25, we propose the following operational forms of the shrinkage parameter in Eq. (2.26):

$$\hat{k}_1=\max\!\left(0,\ \min_i\left(\frac{\lambda_i}{1+2\lambda_i\hat\alpha_i^{2}}\right)\right) \qquad (2.27)$$
$$\hat{k}_2=\max\!\left(0,\ \mathrm{median}_i\left(\frac{\lambda_i}{1+2\lambda_i\hat\alpha_i^{2}}\right)\right) \qquad (2.28)$$
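Operationally, Eq. (2.26) yields one candidate $k$ per eigenvalue, and an aggregation operator turns the candidates into a single shrinkage parameter. The sketch below uses the min and median operators in the spirit of Eqs. (2.27)–(2.28); the specific operator pairing is an assumed convention from this literature (Månsson et al.16-style variants), and the numeric inputs are illustrative:

```python
import numpy as np

def kl_shrinkage_parameters(lam, alpha_hat):
    """Candidate k values from Eq. (2.26), combined with min/median operators
    (the operator choice is an assumption, in the spirit of Eqs. (2.27)-(2.28))."""
    k_i = lam / (1.0 + 2.0 * lam * alpha_hat**2)   # Eq. (2.26), one k per component
    k1 = max(0.0, np.min(k_i))      # conservative: smallest candidate, least shrinkage
    k2 = max(0.0, np.median(k_i))   # typically larger, hence more shrinkage
    return k1, k2

# illustrative eigenvalues and transformed coefficients
lam = np.array([2085.0, 375.0, 4.3])
alpha_hat = np.array([0.4, 0.2, 0.1])
k1, k2 = kl_shrinkage_parameters(lam, alpha_hat)
```

Because the median of the candidates is never smaller than their minimum, $\hat{k}_2\ge\hat{k}_1\ge 0$ always holds under this convention, matching the heavier shrinkage (and smaller simulated MSE under severe collinearity) observed for PKLE2.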

Simulation experiment

Simulation design

Since a theoretical comparison among the estimators is not sufficient, a simulation experiment is carried out in this section. We generate the response variable of the PRM from the Poisson distribution $Po(\mu_i)$, where $\mu_i=\exp(x_i'\beta)$, $i=1,2,\ldots,n$, $\beta=(\beta_0,\beta_1,\beta_2,\ldots,\beta_p)'$, and $x_i$ is the $i$th row of the design matrix $X$. Following Kibria1, we generate the $X$ matrix as follows:

$$x_{ij}=(1-\rho^2)^{1/2}w_{ij}+\rho w_{i,p+1},\qquad i=1,2,\ldots,n;\ j=1,2,\ldots,p+1, \qquad (3.1)$$

where the $w_{ij}$ are independent standard normal pseudo-random numbers and $\rho^2$ is the correlation between the explanatory variables. The values of $\rho$ are chosen as 0.80, 0.90, 0.95, 0.99 and 0.999. The mean function is obtained for $p=4$ and $p=7$ regressors, respectively. Following Kibria et al.26, the intercept values are chosen as −1, 0 and 1 to change the average intensity of the Poisson process. The slope coefficients are chosen so that $\sum_{j=1}^{p}\beta_j^2=1$ and $\beta_1=\beta_2=\cdots=\beta_p$, for sample sizes $n$ = 50, 75, 100 and 200. The simulation experiment is conducted in the R programming language27. The estimated MSE is calculated as

$$\mathrm{MSE}(\hat\beta)=\frac{1}{1000}\sum_{j=1}^{1000}(\hat\beta_j-\beta)'(\hat\beta_j-\beta), \qquad (3.2)$$

where $\hat\beta_j$ denotes the estimate in the $j$th replication and $\beta$ is the vector of true parameter values; the experiment is replicated 1000 times. The simulated MSE values of the estimators for $p=4$ with intercepts −1, 0 and 1 are presented in Tables 1, 2 and 3, and for $p=7$ with intercepts −1, 0 and 1 in Tables 4, 5 and 6, respectively.
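The design in Eqs. (3.1)–(3.2) can be sketched compactly. The following Python/NumPy version is a scaled-down illustration (the paper's actual experiment was written in R with 1000 replications; the replication count, seed and helper names here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_design(n, p, rho):
    """Correlated regressors following Eq. (3.1), plus an intercept column."""
    w = rng.normal(size=(n, p + 1))
    X = np.sqrt(1 - rho**2) * w[:, :p] + rho * w[:, [p]]
    return np.column_stack([np.ones(n), X])

def simulate_mse(n=100, p=4, rho=0.95, intercept=0.0, reps=200):
    """Monte Carlo MSE of the Poisson MLE, Eq. (3.2), with fewer replications."""
    slope = np.full(p, 1.0 / np.sqrt(p))      # sum of squared slopes equals 1
    beta = np.concatenate([[intercept], slope])
    sse = 0.0
    for _ in range(reps):
        X = make_design(n, p, rho)
        y = rng.poisson(np.exp(X @ beta))
        b = np.zeros(p + 1)
        for _ in range(50):                   # short Fisher-scoring (IRLS) loop
            eta = np.clip(X @ b, -30, 30)     # clip for numerical safety
            mu = np.exp(eta)
            z = eta + (y - mu) / mu
            b = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (z * mu))
        sse += np.sum((b - beta)**2)
    return sse / reps

mse = simulate_mse()
```

For $n=100$, $p=4$, $\rho=0.95$ and intercept 0, the resulting MLE MSE is of the same order of magnitude as the corresponding PMLE entry in Table 2; plugging a shrinkage step such as Eq. (2.14) into the inner loop would reproduce the PKLE columns.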

Table 1.

Simulated MSE when p = 4 and intercept = − 1.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
− 1 50 0.8 0.2688 0.2324 0.2668 0.2691 0.3194
0.9 0.3422 0.2780 0.3468 0.3445 0.4434
0.95 0.4729 0.3391 0.4902 0.4812 0.6854
0.99 1.5356 0.5282 1.5721 1.5470 2.8210
0.999 15.5580 2.9798 15.6323 15.3346 28.6901
75 0.8 0.1772 0.1635 0.1805 0.1777 0.2134
0.9 0.2477 0.2237 0.2528 0.2493 0.3067
0.95 0.3547 0.3028 0.3640 0.3593 0.4681
0.99 0.9971 0.5388 1.0494 1.0258 1.7516
0.999 8.3318 0.6540 8.3224 8.2732 15.8245
100 0.8 0.1644 0.1520 0.1623 0.1644 0.1763
0.9 0.2273 0.2076 0.2274 0.2278 0.2571
0.95 0.3323 0.2894 0.3366 0.3345 0.4074
0.99 1.0515 0.5912 1.1219 1.0801 1.7301
0.999 7.8180 0.7432 7.9173 7.7334 15.0254
200 0.8 0.0429 0.0420 0.0423 0.0429 0.0435
0.9 0.0535 0.0527 0.0535 0.0535 0.0557
0.95 0.0816 0.0800 0.0827 0.0817 0.0879
0.99 0.2728 0.2438 0.2763 0.2749 0.3274
0.999 1.7187 0.6548 1.7933 1.7500 3.1157

Table 2.

Simulated MSE when p = 4 and intercept = 0.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
0 50 0.8 0.0701 0.0683 0.0707 0.0702 0.0756
0.9 0.1003 0.0955 0.1012 0.1007 0.1138
0.95 0.1715 0.1561 0.1741 0.1735 0.2143
0.99 0.5111 0.3241 0.5317 0.5315 0.9181
0.999 4.6909 0.6275 4.6140 4.6207 9.0882
75 0.8 0.0546 0.0537 0.0547 0.0546 0.0570
0.9 0.0801 0.0780 0.0803 0.0802 0.0856
0.95 0.1303 0.1245 0.1308 0.1307 0.1456
0.99 0.3850 0.3168 0.3976 0.3972 0.5741
0.999 3.0237 0.6477 2.9929 3.0076 5.6671
100 0.8 0.0418 0.0413 0.0419 0.0418 0.0431
0.9 0.0690 0.0675 0.0691 0.0691 0.0727
0.95 0.1168 0.1122 0.1174 0.1171 0.1290
0.99 0.3912 0.3238 0.4055 0.4034 0.5806
0.999 2.8662 0.6173 2.8339 2.8511 5.4321
200 0.8 0.0102 0.0102 0.0102 0.0102 0.0103
0.9 0.0147 0.0146 0.0147 0.0147 0.0149
0.95 0.0265 0.0263 0.0265 0.0265 0.0270
0.99 0.1015 0.0978 0.1017 0.1017 0.1108
0.999 0.6370 0.4352 0.6620 0.6618 1.1017

Table 3.

Simulated MSE when p = 4 and intercept = 1.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
1 50 0.8 0.0411 0.0408 0.0411 0.0411 0.0416
0.9 0.0519 0.0511 0.0519 0.0519 0.0532
0.95 0.0822 0.0793 0.0823 0.0822 0.0868
0.99 0.2734 0.2298 0.2818 0.2784 0.3567
0.999 1.7816 0.4751 1.8210 1.7884 3.4578
75 0.8 0.0269 0.0268 0.0269 0.0269 0.0271
0.9 0.0390 0.0387 0.0391 0.0390 0.0396
0.95 0.0596 0.0586 0.0596 0.0596 0.0613
0.99 0.2104 0.1944 0.2118 0.2113 0.2389
0.999 1.2893 0.6147 1.3499 1.3232 2.3019
100 0.8 0.0235 0.0234 0.0235 0.0235 0.0236
0.9 0.0343 0.0340 0.0343 0.0343 0.0345
0.95 0.0519 0.0512 0.0519 0.0519 0.0528
0.99 0.2060 0.1900 0.2073 0.2066 0.2279
0.999 1.1375 0.5447 1.2443 1.1820 2.0549
200 0.8 0.0057 0.0056 0.0057 0.0057 0.0057
0.9 0.0076 0.0076 0.0076 0.0076 0.0076
0.95 0.0115 0.0115 0.0115 0.0115 0.0116
0.99 0.0421 0.0415 0.0421 0.0421 0.0430
0.999 0.3123 0.2704 0.3185 0.3159 0.3868

Table 4.

Simulated MSE when p = 7 and intercept = − 1.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
− 1 50 0.8 0.5026 0.4263 0.4988 0.5018 0.6064
0.9 0.7669 0.5998 0.7795 0.7688 1.0486
0.95 1.2641 0.8189 1.2873 1.2721 1.9715
0.99 6.3135 1.5820 6.1834 6.2140 10.7269
0.999 64.7209 21.1207 64.3927 63.0823 112.6646
75 0.8 0.2380 0.2162 0.2388 0.2385 0.2770
0.9 0.3137 0.2732 0.3264 0.3177 0.4234
0.95 0.4385 0.3349 0.4646 0.4486 0.7010
0.99 2.1826 0.8014 2.1642 2.1503 3.9059
0.999 23.5750 9.5019 22.9197 22.6216 42.8199
100 0.8 0.1463 0.1413 0.1478 0.1464 0.1610
0.9 0.2107 0.2019 0.2149 0.2113 0.2386
0.95 0.3366 0.3134 0.3505 0.3396 0.4186
0.99 1.2396 0.8224 1.2956 1.2697 1.9434
0.999 12.1942 1.9688 11.9290 11.9912 20.8725
200 0.8 0.0516 0.0506 0.0512 0.0516 0.0524
0.9 0.0757 0.0744 0.0759 0.0757 0.0791
0.95 0.1279 0.1240 0.1285 0.1281 0.1362
0.99 0.4777 0.4019 0.4916 0.4866 0.6330
0.999 4.0605 1.3613 3.9903 3.9785 6.9296

Table 5.

Simulated MSE when p = 7 and intercept = 0.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
0 50 0.8 0.1489 0.1447 0.1503 0.1492 0.1617
0.9 0.2452 0.2323 0.2483 0.2467 0.2835
0.95 0.4212 0.3767 0.4310 0.4288 0.5537
0.99 1.8687 0.9555 1.8648 1.8765 3.2182
0.999 20.4742 3.3127 19.9500 20.0905 35.4611
75 0.8 0.0658 0.0644 0.0661 0.0659 0.0699
0.9 0.1094 0.1044 0.1103 0.1099 0.1236
0.95 0.1797 0.1623 0.1833 0.1827 0.2302
0.99 0.7702 0.4389 0.7854 0.7866 1.3652
0.999 8.0166 2.3384 7.5503 7.6873 14.7306
100 0.8 0.0489 0.0484 0.0489 0.0489 0.0501
0.9 0.0755 0.0744 0.0755 0.0755 0.0782
0.95 0.1306 0.1272 0.1308 0.1307 0.1394
0.99 0.5228 0.4570 0.5342 0.5335 0.7070
0.999 4.4932 1.4207 4.3748 4.4380 7.5910
200 0.8 0.0134 0.0133 0.0134 0.0134 0.0135
0.9 0.0227 0.0226 0.0227 0.0227 0.0230
0.95 0.0401 0.0397 0.0401 0.0401 0.0412
0.99 0.1930 0.1819 0.1945 0.1943 0.2241
0.999 1.4410 0.7989 1.4362 1.4478 2.4561

Table 6.

Simulated MSE when p = 7 and intercept = 1.

Intercept n ρ PKLE1 PKLE2 PLE PRRE PMLE
1 50 0.8 0.0792 0.0784 0.0792 0.0792 0.0805
0.9 0.1247 0.1218 0.1248 0.1248 0.1296
0.95 0.2244 0.2126 0.2253 0.2249 0.2448
0.99 0.8773 0.6405 0.9176 0.9017 1.3239
0.999 8.8188 1.8730 8.4501 8.6064 15.0688
75 0.8 0.0347 0.0344 0.0347 0.0347 0.0351
0.9 0.0518 0.0508 0.0518 0.0518 0.0534
0.95 0.0983 0.0941 0.0985 0.0984 0.1055
0.99 0.3866 0.2998 0.4050 0.3981 0.5553
0.999 3.2853 1.0512 3.1833 3.1984 5.9012
100 0.8 0.0218 0.0217 0.0218 0.0218 0.0218
0.9 0.0322 0.0321 0.0322 0.0322 0.0325
0.95 0.0547 0.0541 0.0547 0.0547 0.0556
0.99 0.2388 0.2251 0.2398 0.2393 0.2612
0.999 1.7042 0.9834 1.7764 1.7417 2.7838
200 0.8 0.0072 0.0072 0.0072 0.0072 0.0072
0.9 0.0103 0.0103 0.0103 0.0103 0.0104
0.95 0.0171 0.0170 0.0171 0.0171 0.0172
0.99 0.0814 0.0793 0.0814 0.0814 0.0846
0.999 0.6326 0.4903 0.6622 0.6495 0.9012

Simulation results discussion

The simulation results in Tables 1, 2, 3, 4, 5 and 6 show that the following factors affect the estimators' performance: the degree of correlation, the number of explanatory variables, the sample size and the value of the intercept. Increasing the sample size decreases the MSE of all the estimators, as expected of any consistent estimator. The proposed estimator PKLE2 consistently attains the minimum MSE. Increasing the degree of correlation increases the simulated MSE of every estimator. The Poisson ridge (PRRE) and Liu (PLE) estimators compete favorably with the proposed estimator; for instance, their MSEs are very similar to that of the proposed estimator when multicollinearity is moderate (ρ = 0.80–0.95). The performance of PMLE is the worst of all the estimators, especially when the correlation among the regressors is 0.90 or higher. Increasing the number of explanatory variables from 4 to 7 raises the MSE of all estimators. The MSE of all estimators decreases as the intercept changes from −1 to +1. We also plotted the MSE against sample size for different values of ρ and the intercept in Figs. 1, 2, 3, 4 and 5. These figures show that PKLE2 consistently attains the minimum MSE across sample sizes, followed by PKLE1, while PMLE performs worst. They also reveal that the estimators' performance becomes similar for large n (200) or low correlation (0.80); even then, the proposed estimator PKLE2 performs best.

Figure 1. Intercept = − 1; ρ = 0.95; p = 4.

Figure 2. Intercept = 0; ρ = 0.99; p = 4.

Figure 3. Intercept = 0; ρ = 0.999; p = 7.

Figure 4. Intercept = 1; n = 200; p = 4.

Figure 5. Intercept = 1; n = 200; p = 7.

Real life application

In this section, we examine the effectiveness of the new estimator using real-life data. We adopt the aircraft damage data to evaluate the performance of the proposed estimator and the other estimators considered in this study. The dataset was initially used by Myers et al.21 and recently by Asar and Genc19, among others. It provides information about two types of aircraft, the McDonnell Douglas A-4 Skyhawk and the Grumman A-6 Intruder, and describes 30 strike missions flown by these two aircraft. The explanatory variables are as follows: x1 is a binary variable representing the aircraft type (A-4 coded as 0 and A-6 coded as 1), while x2 and x3 denote bomb load in tons and total months of aircrew experience, respectively. The response variable y represents the number of locations with damage on the aircraft, which follows a Poisson distribution19,21. Amin et al.28 examined whether the model follows a Poisson regression model using the Pearson chi-square goodness-of-fit test; the test confirms that the response variable is well fitted by the Poisson distribution, with test statistic (p-value) 6.89812 (0.07521).

According to Myers et al.21, there is evidence of a multicollinearity problem in the data. The eigenvalues of the $X'\hat{W}X$ matrix are 4.3333, 374.8961 and 2085.2251. The condition number, $CN=\sqrt{\text{max eigenvalue}/\text{min eigenvalue}}=219.365$, also indicates multicollinearity in the dataset2,12. The estimators' performances are assessed through the mean squared error (MSE). The MSE values of the estimators are computed using Eqs. (2.16), (2.18), (2.22) and (2.21), and the biasing parameters are determined using Eqs. (2.10), (2.12), (2.27) and (2.28), respectively. The regression coefficients and the MSE values are provided in Table 7. From Table 7, we observe that all the coefficients have the same sign across estimators. PMLE possesses the highest mean squared error, owing to the presence of multicollinearity, while the proposed estimator PKLE2 has the lowest MSE, which establishes its superiority. The ridge and Liu estimators also perform well under multicollinearity. We note that the performance of the proposed estimator is a function of the biasing parameter k.

Table 7.

Regression coefficients and MSE.

Coef. α^PMLE α^PRRE α^PLE α^PKLE1 α^PKLE2
α^0 − 0.4060 − 0.1676 − 0.2555 − 0.1085 − 0.1068
α^1 0.5688 0.3799 0.4789 0.3921 0.3906
α^2 0.1654 0.1705 0.1665 0.1675 0.1675
α^3 − 0.0135 − 0.0153 − 0.0147 − 0.0158 − 0.0158
MSE 1.0290 0.2727 0.4320 0.2251 0.2249

Some concluding remarks

The K–L estimator is an estimator with a single biasing parameter, which removes the computational burden of selecting two biasing parameters that two-parameter estimators entail. It falls in the class of ridge and Liu estimators that mitigate multicollinearity in the linear regression model. According to Kibria and Lukman15, the K–L estimator outclasses the ordinary least squares estimator, the ridge estimator and the Liu estimator in the linear regression model. As stated earlier, multicollinearity degrades the performance of the maximum likelihood estimator (MLE) in both the linear regression model and the Poisson regression model (PRM). The ridge and Liu estimators have at different times been adapted to the PRM to address multicollinearity. In this study, we developed a new estimator, established its statistical properties, and carried out theoretical comparisons with the estimators mentioned above. Furthermore, we conducted a simulation experiment and analyzed a real-life application to show the proposed estimator's effectiveness. Both the simulation and the application results show that the proposed estimator outperforms the existing estimators, while PMLE has the worst performance.

Author contributions

A.F.L.: Conceptualization, Methodology, Writing—original draft. E.A.: Conceptualization, Software. K.M.: Supervision, Editing, Review. B.M.G.K.: Conceptualization, Supervision, Editing, Review.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Kibria BMG. Performance of some new ridge regression estimators. Commun. Stat. Simul. Comput. 2003;32(2):419–435. doi: 10.1081/SAC-120017499.
  • 2. Lukman AF, Ayinde K, Aladeitan BB, Rasak B. An unbiased estimator with prior information. Arab J. Basic Appl. Sci. 2020;27(1):45–55. doi: 10.1080/25765299.2019.1706799.
  • 3. Månsson K, Shukur G. A Poisson ridge regression estimator. Econ. Model. 2011;28:1475–1481. doi: 10.1016/j.econmod.2011.02.030.
  • 4. Qasim M, Kibria BMG, Månsson K, Sjölander P. A new Poisson Liu regression estimator: method and application. J. Appl. Stat. 2019. doi: 10.1080/02664763.2019.1707485.
  • 5. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67. doi: 10.1080/00401706.1970.10488634.
  • 6. Liu K. A new class of biased estimate in linear regression. Commun. Stat. 1993;22(2):393–402. doi: 10.1080/03610929308831027.
  • 7. Liu K. Using Liu-type estimator to combat collinearity. Commun. Stat. 2003;32(5):1009–1020. doi: 10.1081/STA-120019959.
  • 8. Özkale MR, Kaçıranlar S. The restricted and unrestricted two-parameter estimators. Commun. Stat. Theory Methods. 2007;36:2707–2725. doi: 10.1080/03610920701386877.
  • 9. Sakallıoğlu S, Kaçıranlar S. A new biased estimator based on ridge estimation. Stat. Papers. 2008;49(4):669–689. doi: 10.1007/s00362-006-0037-0.
  • 10. Yang H, Chang X. A new two-parameter estimator in linear regression. Commun. Stat. Theory Methods. 2010;39(6):923–934. doi: 10.1080/03610920902807911.
  • 11. Dorugade AV. Modified two parameter estimator in linear regression. Stat. Transit. New Ser. 2014;15(1):23–36.
  • 12. Lukman AF, Ayinde K, Binuomote S, Onate AC. Modified ridge-type estimator to combat multicollinearity: application to chemical data. J. Chemom. 2019. doi: 10.1002/cem.3125.
  • 13. Lukman AF, Ayinde K, Sek SK, Adewuyi E. A modified new two-parameter estimator in a linear regression model. Model. Simul. Eng. 2019. doi: 10.1155/2019/6342702.
  • 14. Ahmad S, Aslam M. Another proposal about the new two-parameter estimator for linear regression model with correlated regressors. Commun. Stat. Simul. Comput. 2020. doi: 10.1080/03610918.2019.1705975.
  • 15. Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: simulations and applications. Scientifica. 2020. doi: 10.1155/2020/9758378.
  • 16. Månsson K, Kibria BMG, Sjölander P, Shukur G. Improved Liu estimators for the Poisson regression model. Int. J. Stat. Probab. 2012;1(1).
  • 17. Batah FSM, Ramanathan TV, Gore SD. The efficiency of modified jackknife and ridge type regression estimators: a comparison. Surv. Math. Appl. 2008;3:111–122.
  • 18. Türkan S, Özel G. A new modified Jackknifed estimator for the Poisson regression model. J. Appl. Stat. 2016;43:1892–1905. doi: 10.1080/02664763.2015.1125861.
  • 19. Asar Y, Genç A. A new two-parameter estimator for the Poisson regression model. Iran. J. Sci. Technol. Trans. Sci. 2017. doi: 10.1007/s40995-017-0174-4.
  • 20. Rashad NK, Algamal ZY. A new ridge estimator for the Poisson regression model. Iran. J. Sci. Technol. Trans. Sci. 2019. doi: 10.1007/s40995-019-00769-3.
  • 21. Myers RH, Montgomery DC, Vining GG, Robinson TJ. Generalized Linear Models: With Applications in Engineering and the Sciences. New York: Wiley; 2012.
  • 22. Hardin JW, Hilbe JM. Generalized Linear Models and Extensions. College Station: Stata Press; 2012.
  • 23. Farebrother RW. Further results on the mean square error of ridge regression. J. R. Stat. Soc. B. 1976;38:248–250.
  • 24. Trenkler G, Toutenburg H. Mean squared error matrix comparisons between biased estimators—an overview of recent results. Stat. Papers. 1990;31(1):165–179. doi: 10.1007/BF02924687.
  • 25. Lukman AF, Ayinde K. Review and classifications of the ridge parameter estimation techniques. Hacet. J. Math. Stat. 2017;46(5):953–967.
  • 26. Kibria BMG, Månsson K, Shukur G. A simulation study of some biasing parameters for the ridge type estimation of Poisson regression. Commun. Stat. Simul. Comput. 2015;44:943–957. doi: 10.1080/03610918.2013.796981.
  • 27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2020. https://www.R-project.org.
  • 28. Amin M, Akram MN, Amanullah M. On the James-Stein estimator for the Poisson regression model. Commun. Stat. Simul. Comput. 2020. doi: 10.1080/03610918.2020.1775851.
