Abstract
In the presence of multicollinearity in the logistic model, serious problems may arise in the analysis, such as an unstable maximum likelihood estimator with very high standard errors and false inferences. The Liu-type logistic estimator was proposed as a two-parameter estimator to overcome the multicollinearity problem in the logistic model. In previous studies, the (k, d) pair of this shrinkage estimator is estimated by two-phase methods. However, since different estimators can be utilized in the estimation of d, the choice of the (k, d) pair provided by the two-phase approaches is not guaranteed to be optimal or to overcome multicollinearity. In this article, a new alternative method based on particle swarm optimization is suggested to estimate the (k, d) pair of the Liu-type logistic estimator simultaneously. For this purpose, an objective function is developed that eliminates the multicollinearity problem, minimizes the bias of the model, and improves the model's predictive performance. A Monte Carlo simulation study is conducted to show the performance of the proposed method by comparing it with existing methods. The performance of the proposed method is also demonstrated on a real dataset related to the collapse of commercial banks in Turkey during the Asian financial crisis.
KEYWORDS: Logistic Liu-type estimator, ridge logistic estimator, maximum likelihood estimator, multicollinearity, particle swarm optimization
1. Introduction
The logistic model is the most commonly used generalized linear model in many fields of science, from medicine to economics, education, and engineering; it models the relationship between a dichotomous response variable and covariates. The parameters of the logistic regression model are estimated by the maximum likelihood (ML) method. The reliability of the estimates obtained by the ML method depends on various assumptions. One of these assumptions is that there is no linear dependency between the independent variables in the model. If there is no linear relationship between the covariates, they are said to be orthogonal. However, the covariates are not orthogonal in almost all applications. When there are near-linear dependencies between the covariates, non-orthogonality arises, meaning that the multicollinearity (ill-conditioning) problem is present. This problem may cause instability of the ML estimates of the regression coefficients and inaccurate inferences based on the model. It also inflates the variances and covariances of the ML estimator, which leads to wide confidence intervals for the coefficients. In the presence of multicollinearity, the signs of the ML estimates may differ from those theoretically expected. As a consequence, the interpretation of the relationship between the dependent and independent variables in terms of odds ratios becomes erroneous. These harmful consequences of multicollinearity make ML estimators unreliable.
Applying biased shrinkage estimators is a good way to manage the variance and instability of ML estimators. One of these estimators is the ridge logistic estimator, introduced by Schaefer et al. [25], which is the logistic form of the ridge estimator proposed by Hoerl and Kennard [10,11]. Månsson and Shukur [20] studied various types of ridge logistic estimators. Moreover, Månsson et al. [21] proposed a new shrinkage estimator which is a logistic generalization of the estimator defined by Liu [18]. Also, Huang [13] proposed a shrinkage estimator which combines the ridge logistic estimator and the Liu logistic estimator as a solution to multicollinearity.
In ridge regression, it is widely accepted that the value of the shrinkage parameter k is chosen as a relatively small constant between 0 and 1, but there is an inverse relationship between the condition number of $X'\hat{W}X$ and the shrinkage parameter k [19]. In order to ensure that the condition number is small, the selected value of k must be sufficiently large. Based on the fact that ridge regression does not completely overcome the ill-conditioning problem, the Liu-type logistic estimator was defined by Inan and Erdogan [14] as the logistic form of the Liu-type estimator proposed by Liu [19]. In the Liu-type logistic estimator, by adapting the biasing parameter d to ensure a good fit of the model, a large k value can be selected to solve the multicollinearity problem. In this way, the ill-conditioning problem is eliminated. In other words, the value of k can be chosen large enough to reduce the condition number of $X'\hat{W}X$ to an acceptable extent, while the value of d makes the model fit well.
In the existing studies on the Liu-type logistic estimator, the (k, d) pair is obtained in two phases [1,5,14]. The biasing parameter d is identified after the shrinkage parameter k, such that the mean squared error of the parameter estimates is minimized. Since it is not feasible to evaluate the mean squared error numerically, an optimum value of the biasing parameter is expressed in terms of the unknown parameters, and thus estimates of those parameters are plugged in to obtain the optimum value of d. However, the pair of parameters k and d chosen in this way is not optimal since it depends on the choice of the parameter estimates. If the estimated (k, d) pair were the actual optimum, the minimization of the MSE would ensure a good fit of the model. However, this two-phase process does not ensure good performance of the model, and the selected (k, d) pair is not chosen purposefully so as to guarantee a solution of the multicollinearity problem. For these reasons, simultaneous estimation of the (k, d) pair, instead of the two-phase process, can be considered a more appropriate approach to solve the multicollinearity problem and to ensure a good fit of the model. In this study, a simultaneous estimation method for the optimal selection of the k and d parameters of the Liu-type logistic estimator is proposed using the Particle Swarm Optimization (PSO) algorithm. In accordance with this aim, an appropriate objective function is developed to solve the multicollinearity problem and to ensure good model performance simultaneously.
This article is organized as follows: In Section 2, the logistic Liu-type estimator is described. Section 3 includes information about the process of the PSO algorithm. An alternative method based on PSO for the optimal selection of (k, d) pair is proposed in Section 4. In Section 5, the performance of the proposed method is illustrated by simulation study and a real data set application by comparing various estimators and finally, the conclusion is given in Section 6.
2. Methodology
2.1. Logistic model and maximum likelihood estimator
The logistic model predicts the probability of occurrence of a binary event using the logit link function. Consider the logistic regression model in Equation (1), where the dependent variable $y_i$ is Bernoulli distributed with success probability $\pi_i$,

$$\pi_i = \frac{\exp(x_i'\beta)}{1+\exp(x_i'\beta)}, \qquad i = 1, \dots, n \qquad (1)$$
where $x_i$ is the $i$-th row of the design matrix $X$ with $n$ data points and $p$ explanatory variables, and $\beta$ is the $p \times 1$ coefficient vector. The most commonly used method of estimating $\beta$ is the maximum likelihood (ML) method, which maximizes the log-likelihood $\ell(\beta)$:

$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i x_i'\beta - \ln\!\left(1+\exp(x_i'\beta)\right)\right] \qquad (2)$$
The ML estimator of $\beta$ is computed by setting the first derivative of Equation (2) with respect to $\beta$ to zero. Therefore, the ML estimator is obtained by solving the following equation:

$$\frac{\partial \ell(\beta)}{\partial \beta} = X'(y - \pi) = 0 \qquad (3)$$

where $y = (y_1, \dots, y_n)'$ and $\pi = (\pi_1, \dots, \pi_n)'$.
The iteratively weighted least squares (IWLS) algorithm is applied to obtain the solution to Equation (3). The ML estimator of $\beta$ is computed by applying the IWLS algorithm as in Equation (4):

$$\hat{\beta}_{ML} = (X'\hat{W}X)^{-1}X'\hat{W}\hat{z} \qquad (4)$$

where $\hat{z}$ is the adjusted dependent variable vector with elements $\hat{z}_i = x_i'\hat{\beta} + (y_i - \hat{\pi}_i)/(\hat{\pi}_i(1-\hat{\pi}_i))$ and $\hat{W} = \mathrm{diag}\left(\hat{\pi}_i(1-\hat{\pi}_i)\right)$. The covariance matrix of the asymptotically normally distributed $\hat{\beta}_{ML}$ is given by the inverse of the Hessian matrix, shown in Equation (5):

$$\mathrm{Cov}(\hat{\beta}_{ML}) = (X'\hat{W}X)^{-1} \qquad (5)$$
The mean squared error (MSE) of $\hat{\beta}_{ML}$ is also given by Equation (6):

$$\mathrm{MSE}(\hat{\beta}_{ML}) = \mathrm{tr}\left[(X'\hat{W}X)^{-1}\right] = \sum_{j=1}^{p}\frac{1}{\lambda_j} \qquad (6)$$

where $\lambda_j$ is the $j$-th eigenvalue of $X'\hat{W}X$.
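The IWLS update of Equation (4) can be sketched in a few lines of code. The sketch below is a minimal NumPy implementation; the function name `iwls_logistic` and the stopping rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def iwls_logistic(X, y, tol=1e-8, max_iter=100):
    """ML estimate via Equation (4): beta = (X'WX)^{-1} X'W z with
    z = X beta + W^{-1}(y - pi) and W = diag(pi_i (1 - pi_i))."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))
        w = pi * (1.0 - pi)                      # diagonal of W-hat
        z = eta + (y - pi) / w                   # adjusted dependent variable
        XtW = X.T * w                            # X'W without forming diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score equation (3), $X'(y-\pi)=0$, is satisfied, which gives a simple correctness check for the fit.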
2.2. Ridge logistic estimator and Liu-type logistic estimator
When the Hessian matrix is not invertible, this clearly leads to problems. As a result of the multicollinearity problem in the design matrix, inverting the Hessian matrix becomes difficult or impossible, and some of the $\lambda_j$'s become zero or very close to zero, which causes the ill-conditioning problem. Hence, a very small eigenvalue blows up the variance of $\hat{\beta}_{ML}$, which produces an unstable estimator and inaccurate inferences. Applying biased shrinkage estimators is a powerful remedy for the deleterious effects of multicollinearity. Many authors have studied biased shrinkage estimators in logistic regression. Schaefer et al. [25] proposed the ridge logistic estimator, Aguilera et al. [3] proposed the principal component logistic regression (PCLR) estimator, and Månsson et al. [21] introduced the Liu logistic estimator to deal with multicollinearity. Moreover, Inan and Erdogan [14] proposed the Liu-type logistic estimator, and Asar [5] studied various properties of this estimator. As in linear regression, the ridge estimator in Equation (7) is a frequently used biased estimator in logistic regression as a solution to the negative consequences of the multicollinearity problem.
$$\hat{\beta}_{R} = (X'\hat{W}X + kI)^{-1}X'\hat{W}X\,\hat{\beta}_{ML}, \qquad k > 0 \qquad (7)$$
The ridge logistic estimator alleviates multicollinearity by adding a small constant k to the diagonal of $X'\hat{W}X$ in order to decrease the condition number, which is defined by

$$\mathrm{CN} = \frac{\lambda_{\max}}{\lambda_{\min}} \qquad (8)$$

where $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of $X'\hat{W}X$, respectively. A high CN indicates an ill-conditioning problem. CN < 100 implies that there is no multicollinearity; 100 < CN < 1000 implies moderate to strong multicollinearity; and CN > 1000 indicates severe multicollinearity [8]. There is an inverse relationship between CN and k, so a large k must be selected to ensure that the CN is small. On the other hand, a small value of k is mostly chosen in applications since the bias of the estimator inflates as the value of k increases. Thus, in practice, the very small selected value of k is not sufficient to solve the ill-conditioning problem; in this case, the problem still exists and the parameter estimates are still unstable. To overcome this problem of the ridge estimator, Inan and Erdogan [14] proposed the logistic Liu-type estimator as the logistic form of the Liu-type estimator proposed by Liu [19]:
$$\hat{\beta}_{LT} = (X'\hat{W}X + kI)^{-1}(X'\hat{W}X - dI)\,\hat{\beta}_{ML} \qquad (9)$$
where $k > 0$ and $d \in \mathbb{R}$. The Liu-type logistic estimator is a two-parameter estimator, and its two parameters (k, d) have distinct tasks: k is a shrinkage parameter whose purpose is to decrease the condition number of the matrix $X'\hat{W}X + kI$ to a desired level, and d is a biasing parameter that improves the model fit and the statistical properties of the estimator. Consider the spectral decomposition of the matrix $X'\hat{W}X$. Let $X'\hat{W}X = Q\Lambda Q'$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p$ the ordered eigenvalues of $X'\hat{W}X$, and $Q$ is the matrix whose columns are the eigenvectors of $X'\hat{W}X$. The bias and variance of $\hat{\beta}_{LT}$ are obtained as in Equation (10) and Equation (11), respectively:
$$\mathrm{Bias}(\hat{\beta}_{LT}) = -(d+k)(X'\hat{W}X + kI)^{-1}\beta \qquad (10)$$

$$\mathrm{Var}(\hat{\beta}_{LT}) = (X'\hat{W}X + kI)^{-1}(X'\hat{W}X - dI)(X'\hat{W}X)^{-1}(X'\hat{W}X - dI)(X'\hat{W}X + kI)^{-1} \qquad (11)$$
Since $\mathrm{MSE} = \mathrm{tr}(\mathrm{Var}) + \mathrm{Bias}'\,\mathrm{Bias}$, the MSE of $\hat{\beta}_{LT}$ is obtained as:

$$\mathrm{MSE}(\hat{\beta}_{LT}) = \sum_{j=1}^{p}\frac{(\lambda_j - d)^2}{\lambda_j(\lambda_j + k)^2} + (d+k)^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + k)^2} \qquad (12)$$

where $\alpha = Q'\beta$.
The purpose of this estimator is to select appropriate values of the (k, d) pair such that the decrease in variance exceeds the increase in the square of the bias. $\mathrm{MSE}(\hat{\beta}_{LT})$ is a quadratic function of the parameter d when k is fixed. Thus, differentiating Equation (12) with respect to d and equating to zero yields:

$$d_{opt} = \frac{\sum_{j=1}^{p}(1 - k\alpha_j^2)/(\lambda_j + k)^2}{\sum_{j=1}^{p}(1 + \lambda_j\alpha_j^2)/\left(\lambda_j(\lambda_j + k)^2\right)} \qquad (13)$$
It is evident that the optimal value $d_{opt}$ of the parameter d in Equation (13) can only be obtained after estimating k. There is no definite rule for selecting k. Numerous estimators are used for the estimation of k in ridge logistic regression. The following are the most commonly used estimators of k for the Liu-type estimator:
| (14) |
| (15) |
| (16) |
| (17) |
| (18) |
where Equations (14)–(18) are given in [25], [10], and [19]. After k is estimated by using one of these estimators, $d_{opt}$ obtained in Equation (13) is used to choose d. Since $d_{opt}$ depends on the unknown parameter $\beta$ (through $\alpha = Q'\beta$), we replace it by its ML estimator $\hat{\alpha} = Q'\hat{\beta}_{ML}$. Thus, we obtain the following estimator of d for the selected k:

$$\hat{d} = \frac{\sum_{j=1}^{p}(1 - k\hat{\alpha}_j^2)/(\lambda_j + k)^2}{\sum_{j=1}^{p}(1 + \lambda_j\hat{\alpha}_j^2)/\left(\lambda_j(\lambda_j + k)^2\right)} \qquad (19)$$
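To make the estimator concrete, the Liu-type logistic estimator of Equation (9) and the condition number of Equation (8) can be computed as below. This is a hedged sketch assuming a prior ML fit supplying `beta_ml` and the diagonal `w` of $\hat{W}$; the function names are illustrative.

```python
import numpy as np

def liu_type_logistic(X, w, beta_ml, k, d):
    """Equation (9): (X'WX + kI)^{-1} (X'WX - dI) beta_ML."""
    S = (X.T * w) @ X                      # S = X'WX, with w the diagonal of W
    I = np.eye(S.shape[0])
    return np.linalg.solve(S + k * I, (S - d * I) @ beta_ml)

def condition_number(X, w, k=0.0):
    """Equation (8): largest over smallest eigenvalue of X'WX + kI
    (k = 0 gives the CN of X'WX itself)."""
    S = (X.T * w) @ X
    lam = np.linalg.eigvalsh(S + k * np.eye(S.shape[0]))
    return lam.max() / lam.min()
```

Note that for k = d = 0 the estimator reduces to the ML estimator, while increasing k shrinks the condition number of $X'\hat{W}X + kI$, which is the mechanism the two-parameter estimator exploits.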
This method allows the model to operate with larger k values by balancing the bias caused by k through d. In previous studies, the (k, d) pair is estimated in a two-phase process: after assigning the shrinkage parameter k, the biasing parameter d is estimated such that the mean squared error of $\hat{\beta}_{LT}$ is minimized. However, since different estimators can be utilized in the estimation of d, and $\hat{d}$ depends on the selected parameter estimates, an optimal choice of the (k, d) pair is not guaranteed in this way. Therefore, in this study, instead of using such two-phase estimation methods, it is considered more appropriate to develop a method that estimates k and d simultaneously by optimizing an appropriate objective function that considers both the ill-conditioning problem and model fitting performance. To achieve this, a metaheuristic algorithm, Particle Swarm Optimization (PSO), has been employed.
3. Particle swarm optimization
Particle Swarm Optimization (PSO) is a swarm intelligence-based evolutionary computation algorithm developed by Kennedy and Eberhart [16]. The algorithm was developed by simulating the social behavior of bird flocks and fish schools. Each candidate solution in PSO is called a particle; a particle is similar to a bird or fish flying across the search area, and the positions of the particles represent potential solutions of the problem. Each particle possesses a velocity vector that allows it to explore the area in search of the optimum. At each generation, each particle adjusts its trajectory based on its own best position (Pbest) and the best position found by the swarm, the global best (Gbest). This process exploits the stochastic structure of the particles and converges rapidly to an optimal solution.
Each particle of the swarm changes its position at each iteration. Let $x_j^t = (x_{j1}^t, \dots, x_{jd}^t)$ be the position vector of particle j at iteration t, where d is the number of positions (dimensions) in a particle. Each particle j keeps track of the position vector $P_{best,j}^t$, which represents the current best solution obtained by particle j, while the position vector $G_{best}^t$ keeps the global best solution obtained by all of the particles. The velocity and position of particle j at iteration t + 1 are updated by the following dynamic equations:

$$v_{js}^{t+1} = w\,v_{js}^{t} + c_1 r_1\left(P_{best,js}^{t} - x_{js}^{t}\right) + c_2 r_2\left(G_{best,s}^{t} - x_{js}^{t}\right) \qquad (20)$$

$$x_{js}^{t+1} = x_{js}^{t} + v_{js}^{t+1} \qquad (21)$$

where w is the inertia weight; $c_1$ and $c_2$ are the cognitive and social coefficients, respectively; $r_1$ and $r_2$ are independent uniformly distributed random numbers between 0 and 1; and s indexes the components of the search space. The steps of the optimization process are shown in the following algorithm:
Step 1. Determine the tuning parameters: number of particles (np), maximum number of iterations (maxit), w, $c_1$, and $c_2$.
Step 2. Initialize the particles in the swarm by defining positions and velocities, randomly.
Step 3. Find the value of objective function for each particle.
Step 4. Define the best experienced position, local best solution by each particle (Pbest) and the swarm’s best position so far, global best solution (Gbest).
Step 5. Calculate and update the velocity and positions for each particle according to Equation (20) and Equation (21).
Step 6. Repeat steps 3–5 until maxit is reached.
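The six steps above can be sketched as a compact PSO routine for minimizing an arbitrary objective. The function name `pso` and the clipping of positions to the search bounds are implementation assumptions not spelled out in the text.

```python
import numpy as np

def pso(objective, bounds, n_particles=15, maxit=200, w=0.9, c1=2.0, c2=2.0, seed=0):
    """Minimize `objective` over a box given by `bounds` (list of (lo, hi))."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    x = rng.uniform(lo, hi, size=(n_particles, len(bounds)))  # Step 2: positions
    v = rng.uniform(0.0, 4.0, size=x.shape)                   # Step 2: velocities
    pbest = x.copy()                                          # Step 4: local bests
    pbest_f = np.array([objective(p) for p in x])             # Step 3
    gbest = pbest[pbest_f.argmin()].copy()                    # Step 4: global best
    for _ in range(maxit):                                    # Step 6: repeat
        r1 = rng.uniform(size=x.shape)
        r2 = rng.uniform(size=x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (20)
        x = np.clip(x + v, lo, hi)                                  # Eq. (21)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()
```

With convergent tuning values (e.g. a smaller inertia weight) the routine quickly locates the minimum of a smooth test function such as the sphere function.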
4. Implementation of PSO for estimating optimal values of the shrinkage and biasing parameters in Liu-type logistic regression
PSO has many good characteristics and advantages over other heuristic algorithms, such as easy implementation, few parameter requirements, short computation time, stable convergence behavior, and less dependence on initial points, and it is therefore more robust [2,6]. Furthermore, PSO has been successfully applied in regression analysis; see the references [27,9,15,26,24]. Taking these characteristics and advantages of PSO over other heuristic algorithms into consideration, PSO has been applied to develop a new algorithm for estimating the (k, d) pair of the Liu-type logistic estimator.
In the presence of a severe multicollinearity problem, the Liu-type logistic estimator, which is a two-parameter biased estimator, was introduced as an alternative to the ridge logistic estimator. The existing conventional methods for estimating the (k, d) pair in Liu-type logistic regression are two-phase methods. However, as mentioned in the previous sections, estimation of (k, d) by the existing two-phase methods does not provide a precisely optimal solution to the ill-conditioning problem, since it is based on the selection of the parameter estimators. In short, when the (k, d) pair is selected by two-phase methods, it becomes very difficult to obtain an optimal value of this pair that both adjusts the resulting bias and improves prediction performance. For these reasons, the aim is to develop a simultaneous PSO-based approach that selects the (k, d) pair in a single step instead of a two-phase process, both to solve multicollinearity and to ensure good performance and fit of the model.
In line with the purpose of the study, an objective function for our PSO-based algorithm is formulated that solves the multicollinearity problem, minimizes the bias of the model, and improves the predictive performance of the model. Thus, the objective function in Equation (22) has the following three parts:
| (22) |
where
| (23) |
and
| (24) |
The FPR, or (1 − specificity), is the false positive rate at a given threshold value, which is used in the objective function to ensure good prediction performance of the model. The second component is based on the condition number (CN), which is utilized to solve the ill-conditioning problem in the model. Since utilizing the CN directly in the objective function results in a much larger k value than required, a transformed CN function is constructed instead. Additionally, the residual sum of squares (RSS) is an approach to assessing model fit in terms of the deviation of the model from the observations [4]. A large value of the MRSS (mean residual sum of squares) indicates failure of the logistic model to fit the observations; the MRSS is used in the objective function to improve model fit. The aim is to determine the optimum (k, d) pair that eliminates the multicollinearity problem and minimizes the bias of the model, as well as improving the predictive performance and fit of the model. The optimization problem defined in Equation (22) is solved by utilizing PSO in the suggested method. RStudio has been used for the analyses in the study. The algorithm of the proposed method is as follows:
Step 1. The tuning parameters of the proposed algorithm are chosen as follows:
np: 15, maxit: 1000, w: 0.9, $c_1$: 2, $c_2$: 2
Step 2. The positions of each particle j (j = 1, 2, …, 15) are randomly generated from uniform distributions. The first and second positions of a particle represent the shrinkage parameter k and the biasing parameter d, respectively. The first positions of the particles are generated from a uniform distribution on (0, 500), and the second positions are generated from a uniform distribution on (−500, 500).
Step 3. The velocities are generated from a uniform distribution on (0, 4).
Step 4. The objective function is chosen as in Equation (22). The objective function values of all particles are obtained.
Step 5. According to objective function values, Pbest and Gbest particles are determined.
Step 6. The velocities and positions of the particles are updated by using the equations given in Equation (20) and Equation (21), respectively.
Step 7. Steps 4 to 6 are iterated until maxit is reached.
Step 8. The optimum (k, d) pair is obtained as Gbest.
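The exact forms of Equations (22)–(24) are not reproduced above, so the evaluation of a candidate (k, d) can only be sketched under assumptions. The function below combines the three stated ingredients (FPR on a test split, a condition-number penalty, and the training MRSS); the specific CN transform, the 0.5 threshold, and the equal weighting are illustrative guesses rather than the paper's definitions, as are the function and argument names.

```python
import numpy as np

def objective_kd(kd, X, y, X_test, y_test, beta_ml, w, cn_target=100.0):
    """Assumed objective: FPR(test) + CN penalty + MRSS(train) for one (k, d)."""
    k, d = kd
    S = (X.T * w) @ X                               # S = X'WX from the ML fit
    I = np.eye(S.shape[0])
    beta = np.linalg.solve(S + k * I, (S - d * I) @ beta_ml)   # Equation (9)
    # Part 1: false positive rate on the test set at a 0.5 threshold
    p_test = 1.0 / (1.0 + np.exp(-X_test @ beta))
    neg = y_test == 0
    fpr = float(np.mean(p_test[neg] >= 0.5)) if neg.any() else 0.0
    # Part 2: assumed condition-number penalty, zero once CN <= cn_target
    lam = np.linalg.eigvalsh(S + k * I)
    cn_penalty = max(0.0, lam.max() / lam.min() / cn_target - 1.0)
    # Part 3: mean residual sum of squares on the training set
    p_train = 1.0 / (1.0 + np.exp(-X @ beta))
    mrss = float(np.mean((y - p_train) ** 2))
    return fpr + cn_penalty + mrss
```

A PSO routine would then minimize `objective_kd` over the box (0, 500) × (−500, 500) used in Step 2.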
5. Monte Carlo simulation study
A Monte Carlo simulation study under various degrees of multicollinearity, number of explanatory variables and sample sizes has been designed to show the performance of the suggested PSO-based method for estimating optimal values for the shrinkage and biasing parameters pair (k, d) in Liu-Type logistic regression. The proposed method has been compared with the ML estimator, the various ridge logistic estimators with different estimators of k and various Liu-type logistic estimators with different estimators (k, d) according to various judgment criteria.
In the study, four and eight explanatory variables were considered. The explanatory variables were generated for p = 4 by the following formulas:
| (25) |
| (26) |
The explanatory variables were generated for p = 8 by the following formulas:
| (27) |
| (28) |
| (29) |
| (30) |
where $z_{ij}$ are pseudo-random numbers generated from the standard normal distribution, and $\rho$ and $\gamma$ are the correlations between any two independent variables [22]. Three different sets of $\rho$ and $\gamma$ values were considered: $(\rho, \gamma) = (0.999, 0.99)$, $(0.999, 0.95)$, and $(0.99, 0.95)$.
As a common restriction in many simulation studies, the parameter vector $\beta$ is chosen so that $\beta'\beta = 1$ [23]. The experiment is repeated 1000 times for sample sizes of n = 60, n = 90, and n = 150, and at each iteration the data were divided into training and testing datasets. The condition number (CN) and mean residual sum of squares (MRSS) were calculated on the training dataset; the area under the ROC curve (AUC), accuracy (Acc), and sensitivity (Sens) were calculated on the testing dataset. The estimators used in the study are as follows:
MLE: Maximum likelihood estimator.
R1: Ridge logistic estimator; k is estimated by Equation (16).
R2: Ridge logistic estimator; k is estimated by Equation (14).
R3: Ridge logistic estimator; k is estimated by 10-fold cross-validation [17].
LT1: Liu-type logistic estimator; k and d are estimated by Equation (14) and Equation (19), respectively.
LT2: Liu-type logistic estimator; k and d are estimated by Equation (18) and Equation (19), respectively.
LT3: Liu-type logistic estimator; k and d are estimated by Equation (16) and Equation (19), respectively.
PSO-based LT: Liu-type logistic estimator, (k, d) pair is simultaneously estimated by the proposed PSO-based method.
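One simulation replicate of the setup above can be sketched as follows. The exact generating formulas in Equations (25)–(30) are not reproduced in this excerpt, so the sketch assumes the common McDonald–Galarneau-type scheme $x_{ij} = (1-r^2)^{1/2} z_{ij} + r\, z_{i,p+1}$, with $\rho$ applied to the first half of the covariates and $\gamma$ to the second half; this split, like the function names, is an assumption.

```python
import numpy as np

def gen_collinear(n, p, rho, gamma, rng):
    """Generate n x p collinear covariates sharing a common component z_{p+1}."""
    z = rng.normal(size=(n, p + 1))
    X = np.empty((n, p))
    for j in range(p):
        r = rho if j < p // 2 else gamma           # assumed split of rho / gamma
        X[:, j] = np.sqrt(1.0 - r ** 2) * z[:, j] + r * z[:, p]
    return X

def simulate_replicate(n, p, rho, gamma, seed=0):
    """One replicate: collinear X, beta with beta'beta = 1, Bernoulli response."""
    rng = np.random.default_rng(seed)
    X = gen_collinear(n, p, rho, gamma, rng)
    beta = np.ones(p) / np.sqrt(p)                 # satisfies beta'beta = 1
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    y = (rng.uniform(size=n) < pi).astype(float)
    return X, y, beta
```

With $\rho$ and $\gamma$ close to 1, the pairwise correlations of the generated covariates approach $\rho^2$ or $\gamma^2$, which reproduces the severe ill-conditioning studied in the tables.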
The simulation was repeated 1000 times, and the MSEs of the parameter estimates and the means of the CN, MRSS, AUC, Sens, and Acc values were calculated for each estimator.
The simulation study results are presented in Tables 1–6. It is observed that in the case of moderate and severe multicollinearity, the MSE values of the MLE are inflated in every simulation scenario. In all cases, all of the existing biased estimators and the proposed estimator perform better than the MLE with respect to every judgment criterion; in particular, the proposed estimator, like all the other estimators, has a lower MSE than the MLE. When the proposed estimator and the other estimators are examined in terms of MSE and CN, the proposed method shows superior performance by completely eliminating the multicollinearity problem in every simulation scenario and significantly reducing the MSE of the parameter estimates. The MSE values of the proposed estimator are the lowest in every simulation scenario. The other conventional biased estimators also generally reduce the MSE compared to the MLE, although not to the same extent as the suggested method. Furthermore, the compared conventional estimators do not completely solve the multicollinearity problem in every simulation scenario. It is also observed that an increase in the number of explanatory variables has a negative effect on the performance of the estimators: it increases the MSE values of both the MLE and the other estimators.
Table 2. Simulation results for ρ = 0.999 and γ = 0.95 when p = 4.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40ntest = 20 | MSE | 122.235 | 2.487 | 10.732 | 4.089 | 2.966 | 11.248 | 20.634 | 13.901 |
| MRSS | 26.234 | 5.021 | 18.789 | 16.853 | 13.058 | 12.978 | 17.963 | 19.098 | |
| CN | 1274.788 | 1.985 | 198.633 | 96.991 | 90.963 | 92.856 | 189.997 | 255.476 | |
| AUC | 0.730 | 0.788 | 0.739 | 0.748 | 0.750 | 0.749 | 0.749 | 0.750 | |
| Sens | 0.709 | 0.807 | 0.749 | 0.746 | 0.751 | 0.749 | 0.753 | 0.775 | |
| Acc | 0.650 | 0.681 | 0.677 | 0.673 | 0.672 | 0.682 | 0.675 | 0.670 | |
| ntrain = 60ntest = 30 | MSE | 79.547 | 2.003 | 18.654 | 8.735 | 9.847 | 27.965 | 34.985 | 17.178 |
| MRSS | 8.985 | 4.457 | 7.963 | 9.145 | 7.632 | 8.003 | 8.236 | 9.653 | |
| CN | 1221.856 | 2.568 | 241.961 | 100.895 | 201.236 | 225.857 | 235.789 | 305.598 | |
| AUC | 0.721 | 0.789 | 0.744 | 0.739 | 0.743 | 0.744 | 0.742 | 0.739 | |
| Sens | 0.739 | 0.855 | 0.750 | 0.753 | 0.759 | 0.756 | 0.751 | 0.754 | |
| Acc | 0.663 | 0.687 | 0.669 | 0.675 | 0.678 | 0.690 | 0.680 | 0.679 | |
| ntrain = 100ntest = 50 | MSE | 40.856 | 1.564 | 9.987 | 2.799 | 7.632 | 5.653 | 10.367 | 8.963 |
| MRSS | 14.106 | 4.562 | 10.635 | 12.905 | 10.560 | 10.157 | 11.744 | 12.954 | |
| CN | 1156.345 | 2.896 | 400.693 | 102.455 | 165.323 | 148.810 | 201.563 | 634.058 | |
| AUC | 0.703 | 0.711 | 0.705 | 0.707 | 0.708 | 0.706 | 0.703 | 0.706 | |
| Sens | 0.695 | 0.787 | 0.728 | 0.730 | 0.727 | 0.728 | 0.711 | 0.709 | |
| Acc | 0.659 | 0.663 | 0.660 | 0.659 | 0.662 | 0.665 | 0.655 | 0.657 |
Table 3. Simulation results for ρ = 0.99 and γ = 0.95 when p = 4.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40ntest = 20 | MSE | 101.236 | 6.109 | 19.523 | 12.633 | 19.635 | 31.145 | 35.699 | 19.235 |
| MRSS | 14.873 | 4.782 | 8.222 | 8.796 | 8.969 | 9.104 | 8.967 | 9.254 | |
| CN | 1106.564 | 1.850 | 142.321 | 95.487 | 123.665 | 129.123 | 163.364 | 193.457 | |
| AUC | 0.734 | 0.803 | 0.775 | 0.759 | 0.766 | 0.763 | 0.741 | 0.752 | |
| Sens | 0.763 | 0.799 | 0.749 | 0.762 | 0.752 | 0.739 | 0.767 | 0.758 | |
| Acc | 0.680 | 0.695 | 0.669 | 0.660 | 0.680 | 0.671 | 0.677 | 0.675 | |
| ntrain = 60ntest = 30 | MSE | 69.854 | 2.443 | 7.879 | 5.852 | 8.156 | 24.877 | 21.523 | 14.118 |
| MRSS | 8.966 | 2.963 | 7.145 | 7.966 | 7.564 | 7.632 | 7.520 | 7.883 | |
| CN | 1053.215 | 1.324 | 135.633 | 100.665 | 145.247 | 152.633 | 169.224 | 182.562 | |
| AUC | 0.749 | 0.789 | 0.757 | 0.763 | 0.759 | 0.767 | 0.741 | 0.764 | |
| Sens | 0.715 | 0.769 | 0.749 | 0.719 | 0.727 | 0.733 | 0.725 | 0.728 | |
| Acc | 0.675 | 0.689 | 0.681 | 0.680 | 0.675 | 0.683 | 0.675 | 0.683 | |
| ntrain = 100ntest = 50 | MSE | 45.720 | 0.998 | 20.563 | 8.569 | 21.335 | 18.455 | 17.999 | 23.007 |
| MRSS | 9.378 | 3.966 | 8.889 | 6.214 | 7.851 | 8.006 | 8.257 | 8.229 | |
| CN | 1009.434 | 2.009 | 149.766 | 101.858 | 121.960 | 119.651 | 179.641 | 600.568 | |
| AUC | 0.730 | 0.740 | 0.731 | 0.736 | 0.711 | 0.735 | 0.734 | 0.733 | |
| Sens | 0.761 | 0.832 | 0.773 | 0.759 | 0.776 | 0.768 | 0.780 | 0.770 | |
| Acc | 0.650 | 0.657 | 0.667 | 0.668 | 0.643 | 0.670 | 0.660 | 0.655 |
Table 4. Simulation results for ρ = 0.999 and γ = 0.99 when p = 8.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40ntest = 20 | MSE | 269.567 | 30.947 | 120.523 | 66.134 | 108.8246 | 201.993 | 207.617 | 182.307 |
| MRSS | 14.934 | 4.233 | 7.431 | 7.523 | 10.372 | 9.212 | 15.278 | 9.194 | |
| CN | 2165.341 | 1.634 | 210.309 | 99.588 | 207.169 | 313.901 | 321.198 | 297.52 | |
| AUC | 0.689 | 0.701 | 0.698 | 0.695 | 0.690 | 0.698 | 0.691 | 0.694 | |
| Sens | 0.580 | 0.647 | 0.595 | 0.577 | 0.586 | 0.600 | 0.594 | 0.582 | |
| Acc | 0.621 | 0.627 | 0.626 | 0.623 | 0.63 | 0.633 | 0.625 | 0.624 | |
| ntrain = 60ntest = 30 | MSE | 97.344 | 6.684 | 52.788 | 34.634 | 54.962 | 72.896 | 69.295 | 64.064 |
| MRSS | 12.372 | 4.142 | 8.597 | 17.965 | 14.692 | 9.705 | 9.190 | 12.186 | |
| CN | 1674.565 | 1.874 | 190.081 | 98.093 | 197.250 | 200.194 | 194.484 | 189.027 | |
| AUC | 0.703 | 0.709 | 0.710 | 0.711 | 0.707 | 0.710 | 0.703 | 0.707 | |
| Sens | 0.592 | 0.665 | 0.584 | 0.590 | 0.571 | 0.586 | 0.574 | 0.591 | |
| Acc | 0.647 | 0.646 | 0.649 | 0.645 | 0.643 | 0.647 | 0.648 | 0.647 | |
| ntrain = 100ntest = 50 | MSE | 55.895 | 3.790 | 15.651 | 35.274 | 18.399 | 40.675 | 43.178 | 40.262 |
| MRSS | 8.186 | 1.992 | 7.457 | 8.085 | 7.700 | 8.400 | 7.264 | 7.284 | |
| CN | 1356.413 | 2.202 | 133.546 | 97.145 | 130.769 | 130.535 | 133.785 | 123.774 | |
| AUC | 0.717 | 0.726 | 0.725 | 0.720 | 0.725 | 0.721 | 0.715 | 0.723 | |
| Sens | 0.633 | 0.700 | 0.623 | 0.624 | 0.619 | 0.621 | 0.623 | 0.626 | |
| Acc | 0.655 | 0.648 | 0.662 | 0.654 | 0.659 | 0.655 | 0.6520 | 0.659 |
Table 5. Simulation results for ρ = 0.999 and γ = 0.95 when p = 8.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40ntest = 20 | MSE | 217.535 | 8.387 | 80.297 | 24.856 | 72.379 | 181.959 | 183.737 | 152.769 |
| MRSS | 18.814 | 4.328 | 10.323 | 19.523 | 9.091 | 15.633 | 20.529 | 18.175 | |
| CN | 2135.366 | 1.629 | 274.044 | 93.399 | 270.921 | 286.353 | 273.410 | 201.051 | |
| AUC | 0.694 | 0.700 | 0.698 | 0.699 | 0.690 | 0.701 | 0.694 | 0.687 | |
| Sens | 0.632 | 0.691 | 0.625 | 0.616 | 0.614 | 0.629 | 0.615 | 0.621 | |
| Acc | 0.642 | 0.648 | 0.632 | 0.642 | 0.633 | 0.641 | 0.635 | 0.637 | |
| ntrain = 60ntest = 30 | MSE | 75.914 | 4.693 | 40.003 | 24.037 | 39.801 | 57.475 | 57.912 | 53.335 |
| MRSS | 9.610 | 4.123 | 8.284 | 11.979 | 8.001 | 8.700 | 12.973 | 9.184 | |
| CN | 1476.719 | 1.753 | 160.790 | 99.283 | 174.711 | 162.100 | 175.7 | 168.103 | |
| AUC | 0.712 | 0.730 | 0.723 | 0.722 | 0.721 | 0.719 | 0.712 | 0.719 | |
| Sens | 0.606 | 0.703 | 0.610 | 0.596 | 0.599 | 0.607 | 0.597 | 0.612 | |
| Acc | 0.657 | 0.665 | 0.668 | 0.657 | 0.658 | 0.660 | 0.650 | 0.667 | |
| ntrain = 100ntest = 50 | MSE | 50.908 | 1.648 | 22.641 | 23.957 | 22.998 | 30.505 | 29.865 | 27.135 |
| MRSS | 8.953 | 3.982 | 8.721 | 9.086 | 7.915 | 8.054 | 10.056 | 8.777 | |
| CN | 1314.917 | 2.142 | 150.332 | 99.743 | 162.474 | 150.833 | 163.457 | 145.823 | |
| AUC | 0.717 | 0.726 | 0.725 | 0.720 | 0.725 | 0.721 | 0.715 | 0.723 | |
| Sens | 0.633 | 0.701 | 0.619 | 0.624 | 0.623 | 0.622 | 0.623 | 0.626 | |
| Acc | 0.656 | 0.658 | 0.659 | 0.654 | 0.662 | 0.652 | 0.652 | 0.659 |
Table 1. Simulation results for ρ = 0.999 and γ = 0.99 when p = 4.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40ntest = 20 | MSE | 130.693 | 3.001 | 17.578 | 7.023 | 9.645 | 41.102 | 49.125 | 32.634 |
| MRSS | 15.149 | 4.102 | 8.698 | 12.855 | 6.108 | 8.401 | 9.923 | 13.563 | |
| CN | 1389.569 | 2.123 | 296.529 | 97.065 | 182.155 | 191.364 | 235.658 | 201.896 | |
| AUC | 0.723 | 0.799 | 0.740 | 0.748 | 0.745 | 0.736 | 0.741 | 0.744 | |
| Sens | 0.613 | 0.703 | 0.631 | 0.640 | 0.629 | 0.623 | 0.630 | 0.636 | |
| Acc | 0.608 | 0.686 | 0.641 | 0.621 | 0.659 | 0.661 | 0.639 | 0.644 | |
| ntrain = 60ntest = 30 | MSE | 81.753 | 2.998 | 14.873 | 10.451 | 13.063 | 27.214 | 29.956 | 13.568 |
| MRSS | 7.562 | 3.562 | 6.153 | 6.998 | 5.823 | 6.274 | 5.952 | 7.045 | |
| CN | 1223.673 | 2.319 | 250.124 | 102.457 | 219.376 | 265.278 | 296.36 | 358.169 | |
| AUC | 0.711 | 0.783 | 0.731 | 0.729 | 0.731 | 0.737 | 0.729 | 0.717 | |
| Sens | 0.698 | 0.759 | 0.708 | 0.722 | 0.725 | 0.723 | 0.731 | 0.725 | |
| Acc | 0.649 | 0.714 | 0.689 | 0.683 | 0.678 | 0.692 | 0.685 | 0.693 | |
| ntrain = 100ntest = 50 | MSE | 49.231 | 1.314 | 20.963 | 10.897 | 20.689 | 10.205 | 12.002 | 26.719 |
| MRSS | 6.731 | 3.574 | 5.638 | 6.025 | 5.741 | 6.087 | 5.209 | 6.677 | |
| CN | 1213.850 | 3.024 | 270.903 | 103.954 | 240.687 | 278.234 | 305.562 | 702.006 | |
| AUC | 0.721 | 0.745 | 0.733 | 0.733 | 0.728 | 0.730 | 0.727 | 0.725 | |
| Sens | 0.689 | 0.758 | 0.703 | 0.710 | 0.709 | 0.708 | 0.713 | 0.711 | |
| Acc | 0.633 | 0.653 | 0.627 | 0.630 | 0.631 | 0.625 | 0.621 | 0.623 |
Table 6. Simulation results for ρ = 0.99 and γ = 0.95 when p = 8.
| n | Judgment criteria | MLE | PSO-based LT | LT1 | LT2 | LT3 | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|---|
| ntrain = 40, ntest = 20 | MSE | 102.270 | 6.268 | 45.458 | 19.507 | 38.465 | 89.265 | 80.256 | 80.399 |
| | MRSS | 34.916 | 4.175 | 8.588 | 18.583 | 7.059 | 39.147 | 17.540 | 20.837 |
| | CN | 1235.327 | 1.666 | 158.841 | 99.230 | 164.888 | 157.844 | 168.451 | 178.542 |
| | AUC | 0.706 | 0.708 | 0.717 | 0.711 | 0.718 | 0.696 | 0.701 | 0.702 |
| | Sens | 0.621 | 0.648 | 0.627 | 0.624 | 0.613 | 0.608 | 0.634 | 0.619 |
| | Acc | 0.643 | 0.643 | 0.646 | 0.642 | 0.647 | 0.632 | 0.647 | 0.642 |
| ntrain = 60, ntest = 30 | MSE | 72.463 | 5.274 | 23.199 | 39.485 | 20.11 | 45.200 | 50.774 | 49.941 |
| | MRSS | 10.480 | 4.107 | 7.296 | 9.132 | 6.637 | 13.121 | 11.524 | 9.445 |
| | CN | 1160.661 | 1.809 | 139.710 | 96.863 | 135.324 | 145.619 | 141.198 | 139.887 |
| | AUC | 0.693 | 0.702 | 0.703 | 0.702 | 0.703 | 0.668 | 0.681 | 0.695 |
| | Sens | 0.616 | 0.998 | 0.618 | 0.620 | 0.599 | 0.537 | 0.580 | 0.616 |
| | Acc | 0.628 | 0.628 | 0.640 | 0.640 | 0.635 | 0.591 | 0.617 | 0.630 |
| ntrain = 100, ntest = 50 | MSE | 45.125 | 1.070 | 19.140 | 25.777 | 16.333 | 38.221 | 40.856 | 41.124 |
| | MRSS | 7.571 | 4.030 | 6.093 | 7.333 | 5.714 | 9.456 | 7.451 | 6.896 |
| | CN | 1148.112 | 3.498 | 134.152 | 97.288 | 130.574 | 147.519 | 148.648 | 139.288 |
| | AUC | 0.746 | 0.757 | 0.760 | 0.747 | 0.755 | 0.722 | 0.717 | 0.750 |
| | Sens | 0.598 | 0.635 | 0.607 | 0.598 | 0.601 | 0.511 | 0.550 | 0.602 |
| | Acc | 0.679 | 0.687 | 0.681 | 0.679 | 0.681 | 0.620 | 0.638 | 0.673 |
When the MRSS is used as a performance criterion to evaluate the goodness of the model fit, the MRSS values of the MLE are quite high in every simulation scenario, whereas those of the proposed method are considerably lower than those of the MLE and all the other biased estimators. The MRSS values of the conventional ridge and Liu-type estimators decrease slightly relative to the MLE, but not to the extent achieved by the suggested method. Comparing the AUC, Sens and Acc values computed on the testing sets as measures of prediction or classification performance, the proposed method increases these values relative to the MLE in every simulation scenario. The other estimators also tend to increase these values, but in several scenarios the increases achieved by the proposed method are markedly larger.
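As an illustration, the test-set judgment criteria above (Sens, Acc and AUC) can be computed directly from predicted class-1 probabilities. The function below is a minimal sketch, not the authors' code; the 0.5 classification cut-off is an assumption:

```python
import numpy as np

def judgment_criteria(y_true, p_hat, threshold=0.5):
    """Sensitivity, accuracy and AUC for a binary classifier (sketch)."""
    y_true = np.asarray(y_true)
    p_hat = np.asarray(p_hat)
    y_pred = (p_hat >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    sens = tp / (tp + fn)              # true-positive rate
    acc = np.mean(y_pred == y_true)    # overall accuracy
    # AUC via the Mann-Whitney rank statistic (ties counted as 1/2)
    pos = p_hat[y_true == 1]
    neg = p_hat[y_true == 0]
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))
    return sens, acc, auc

# Toy test set: three events, three non-events
y = np.array([1, 1, 1, 0, 0, 0])
p = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.1])
sens, acc, auc = judgment_criteria(y, p)
```

Any logistic-type estimator in the comparison only changes how `p_hat` is produced; the criteria themselves are computed identically for all of them.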
When the ridge and Liu-type logistic estimators are compared among themselves, the Liu-type logistic estimators generally give better MSE values than the ridge estimators, especially for the total sample sizes of n = 60 and n = 90 (i.e., ntrain = 40 and ntrain = 60). However, the performance of the Liu-type logistic estimators differs considerably depending on which shrinkage parameter is applied. The LT1 and LT3 estimators perform almost equivalently, while the LT2 estimator is the most robust of the Liu-type estimators. Moreover, when the ridge logistic estimators are compared among themselves, the R3 estimator based on cross-validation gives better results than the other ridge estimators.
Consequently, the simulation results show that the existing conventional methods do not choose the (k, d) pair of the Liu-type logistic estimator optimally, and the proposed method outperforms all of the compared conventional methods. The proposed method performs well on every judgment criterion relative to the ridge and Liu-type logistic estimators, both in small samples, where multicollinearity is more influential, and in large samples, where it is less so. The proposed PSO-based estimator exhibits its best performance in terms of resolving the ill-conditioning problem, significantly reducing the MSE values and increasing the classification measures simultaneously; in other words, it improves the model fit. If the estimate of the (k, d) pair coincides with the true optimum value, the mean square error of the parameter estimates is guaranteed to be minimized and the fitting performance of the model improves. As the results show, the proposed estimator minimizes the MSE, resolves the multicollinearity problem and improves the fitting performance of the model at the same time. Consequently, the optimum value of the (k, d) pair can be obtained by the proposed method.
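The simultaneous search for the (k, d) pair can be sketched with a plain global-best PSO. The objective below is only a hypothetical stand-in, built from toy eigenvalues of X'ŴX: a condition-number term plus a crude shrinkage-bias proxy. The paper's actual objective combines multicollinearity elimination, bias minimization and predictive performance, so this sketch illustrates the mechanics, not the authors' exact function:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Plain global-best PSO; returns the best position found."""
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    x = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                 # keep particles in bounds
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g

# Hypothetical objective: trade off conditioning of (X'WX + kI)
# against a bias proxy driven by d.  Toy eigenvalues are assumed.
eig = np.array([50.0, 1.0, 0.01])

def objective(p):
    k, d = p
    cn = (eig.max() + k) / (eig.min() + k)   # conditioning after adding kI
    bias = (k + d) ** 2 / (eig.min() + k)    # crude shrinkage-bias proxy
    return cn + bias

k_opt, d_opt = pso(objective, bounds=[(0.0, 100.0), (-10.0, 10.0)])
```

Searching k and d jointly like this avoids the two-phase weakness discussed above, where a d estimated conditionally on a pre-chosen k is not guaranteed to be jointly optimal.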
6. Application
In the application part of this study, we used the real dataset from the article in which Inan and Erdogan [14] proposed the Liu-type logistic estimator. The proposed method was evaluated on a dataset taken from the website of the Banks Association of Turkey [7]. This dataset concerns the collapse of banks in Turkey during the Asian financial crisis (www.bddk.org.tr, 2002). The goal of this application was to develop a logistic regression model to estimate the probability that a Turkish commercial bank would be transferred to the Savings Deposit Insurance Fund (SDIF), where the dependent variable is defined as the financial status of the bank:
yᵢ = 1 if bank i was transferred to the SDIF, and yᵢ = 0 otherwise. (31)
The dataset (n = 41) was randomly divided into training and testing sets with ntrain = 30 and ntest = 11. Following the proposal of Hosmer and Lemeshow [12], a variable with a univariate test p-value below 0.025 was selected as a necessary variable for the logistic model. Based on this criterion, the predictors used in the study were determined as follows:
Tables 7 and 8 show the results of the real dataset application. The condition number of the data matrix was calculated as 1472 × 10^7, which indicates quite severe multicollinearity. Furthermore, the variance inflation factor (VIF) values of the explanatory variables were calculated as 34.63, 2.67, 235.63, 357.77, 9.28, 3.79 and 7.40, which also indicate a multicollinearity problem in the dataset. The PSO-based LT, LT1, LT2, LT3, R1, R2, R3 and ML estimates, together with the goodness-of-fit and classification measures related to them (MRSS and CN for the training set; Sens, Acc and AUC for the testing set), were computed. Since relatively severe collinearity exists, the estimation results of the MLE are not reliable. According to the judgment criteria values of the MLE, the MRSS is quite high and the values of the classification measures are quite low. Second, the ML estimate of the coefficient of the capital adequacy indicator is positive. This coefficient should be negative, since good capital adequacy prevents transfer to the SDIF, so the positive ML estimate contradicts this interpretation. The proposed estimator clearly corrects these problems: the sign of this coefficient becomes negative, and the ill-conditioning problem is solved by reducing the CN to below 100. Furthermore, the proposed estimator has a lower MRSS value than the MLE and the compared biased estimators. The other conventional biased estimators neither solve the ill-conditioning problem nor give good results on the performance judgment criteria. Although they decrease the MRSS values and increase the values of the classification measures, they do not change these values as substantially as the PSO-based LT does. It should also be mentioned that the proposed method estimated a large k value, since relatively severe multicollinearity exists in the dataset and only a sufficiently large k value can resolve such severe ill-conditioning.
Accordingly, the improvement in the model fit and in the statistical properties shows that the optimal d value is obtained so as to reduce and offset the bias caused by the large selected k value. This is because, if the estimate of the optimum (k, d) pair equals the true optimum value, minimizing the mean square error of the parameters improves the model fit.
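The collinearity diagnostics used above can be reproduced with a short sketch. The eigenvalue-ratio condition number and correlation-based VIFs below follow one common convention and may differ from the paper's exact scaling:

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition number and VIFs of a design matrix (illustrative sketch).

    CN is the ratio of the largest to smallest eigenvalue of Z'Z for the
    standardized design Z; VIF_j = 1 / (1 - R_j^2) is read off the diagonal
    of the inverse correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.linalg.eigvalsh(Z.T @ Z)
    cn = eig.max() / eig.min()
    vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
    return cn, vif

# Toy design with one nearly collinear pair of covariates
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # almost a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
cn, vif = collinearity_diagnostics(X)   # cn and vif[0], vif[1] are very large
```

A CN of the order reported for the bank dataset, together with several VIFs far above 10, is exactly the pattern such a check would flag as severe multicollinearity.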
Table 7. Real dataset application results.
| Estimator | β̂0 | β̂1 | β̂2 | β̂3 | β̂4 | β̂5 | β̂6 | β̂7 | CN |
|---|---|---|---|---|---|---|---|---|---|
| MLE | 46.984 | 0.453 | −2.122 | 1.089 | −1.349 | −0.476 | 0.323 | −0.212 | 1472 × 10^7 |
| R1 | 0.437 | 1.027 | −1.179 | 2.211 | −1.929 | −0.226 | 0.164 | −0.042 | 1657.54 × 10^4 |
| R2 | 0.384 | 0.821 | −0.983 | 1.862 | −1.630 | −0.212 | 0.140 | −0.031 | 4147.961 × 10^3 |
| R3 | 15.772 | −0.047 | −0.475 | −0.08 | −0.065 | −0.026 | 0.086 | −0.092 | 1378.849 × 10^1 |
| LT1 | 0.382 | 0.797 | −0.960 | 1.804 | −1.585 | −0.216 | 0.136 | −0.028 | 4099.506 × 10^2 |
| LT2 | 7.038 | −0.011 | −0.229 | 0.041 | −0.054 | −0.020 | 0.048 | −0.052 | 1019.234 |
| LT3 | 0.434 | 1.027 | −1.178 | 2.216 | −1.933 | −0.221 | 0.164 | −0.042 | 1627.54 × 10^4 |
| PSO-based LT | 4.811 | −0.021 | −0.143 | 0.013 | −0.003 | −0.010 | 0.038 | −0.045 | 93.23 |
Table 8. Judgment criteria values for the real dataset application.
| Estimator | MRSS | AUC | Spec | Acc | Sens | k | d |
|---|---|---|---|---|---|---|---|
| MLE | 11.231 | 0.427 | 0.285 | 0.363 | 0.49 | – | – |
| R1 | 8.578 | 0.601 | 0.571 | 0.545 | 0.5 | 0.0008 | – |
| R2 | 8.500 | 0.602 | 0.571 | 0.545 | 0.5 | 0.0031 | – |
| R3 | 6.751 | 0.601 | 0.798 | 0.703 | 0.5 | 0.98 | – |
| LT1 | 8.534 | 0.602 | 0.571 | 0.545 | 0.5 | 0.0031 | −4 × 10^−10 |
| LT2 | 5.626 | 0.686 | 0.812 | 0.712 | 0.51 | 33.131 | −0.000042 |
| LT3 | 8.509 | 0.602 | 0.574 | 0.545 | 0.5 | 0.0008 | −0.009 |
| PSO-based LT | 3.440 | 0.689 | 0.857 | 0.795 | 0.51 | 74.513 | −2.123 |
7. Conclusion
In this study, a simultaneous estimation method based on PSO has been proposed for the optimal selection of the k and d parameters of the Liu-type logistic estimator. The performance of the proposed estimator has been compared with the existing conventional ridge and Liu-type logistic estimators according to various judgment criteria. A Monte Carlo simulation study with different scenarios has revealed that the proposed method, through the proper objective function used in the optimization process, outperforms the compared conventional biased estimators. The proposed estimator shows the best performance in terms of solving the multicollinearity problem, significantly reducing the MSE values and improving the model fit, simultaneously. In contrast, the existing biased estimators fail to solve the multicollinearity problem, to improve the model fit and to significantly reduce the MSE value at the same time. The real dataset application has also confirmed the suitability of the proposed method. Furthermore, when the conventional ridge and Liu-type logistic estimators were compared, the Liu-type estimators were generally more successful than the ridge estimators. However, the compared conventional Liu-type estimators do not give good enough results because they are two-phase methods, whereas the single-phase PSO-based method proposed here gives very good results in every respect. Consequently, we strongly recommend the PSO-based algorithm for the optimal selection of the (k, d) pair in the Liu-type logistic estimator.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Abonazel M.R. and Farghali R.A., Liu-type multinomial logistic estimator. Sankhya B 81 (2019), pp. 203–225. doi: 10.1007/s13571-018-0171-4
- 2. Abuella M.A., Study of particle swarm for optimal power flow in IEEE benchmark systems including wind power generators, Ph.D. diss., Southern Illinois University Carbondale, 2012.
- 3. Aguilera A.M., Manuel E., and Mariano J.V., Using principal components for estimating logistic regression with high-dimensional multicollinear data. Comput. Stat. Data Anal. 50 (2006), pp. 1905–1924. doi: 10.1016/j.csda.2005.03.011
- 4. Aitkin M.A., Aitkin M., Francis B., and Hinde J., Statistical Modelling in GLIM 4, Vol. 32, OUP, Oxford, 2005.
- 5. Asar Y., Some new methods to solve multicollinearity in logistic regression. Commun. Stat. Simul. Comput. 46 (2017), pp. 2576–2586. doi: 10.1080/03610918.2015.1053925
- 6. Bai Q., Analysis of particle swarm optimization algorithm. Comput. Inf. Sci. 3 (2010), pp. 180–184.
- 7. The Banks Association of Turkey, Banks in Turkey (1998–2002). Available at www.tbb.org.tr.
- 8. Belsley D.A., Kuh E., and Welsch R.E., Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley & Sons, New York, 1980.
- 9. Cagcag O., Yolcu U., and Egrioglu E., A new robust regression method based on particle swarm optimization. Commun. Stat. Theory Methods 44 (2015), pp. 1270–1280. doi: 10.1080/03610926.2012.718843
- 10. Hoerl A.E. and Kennard R.W., Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 (1970), pp. 55–67. doi: 10.1080/00401706.1970.10488634
- 11. Hoerl A.E. and Kennard R.W., Ridge regression: Applications to nonorthogonal problems. Technometrics 12 (1970), pp. 69–82. doi: 10.1080/00401706.1970.10488635
- 12. Hosmer D.W. and Lemeshow S., Applied Logistic Regression, Wiley, New York, 1989.
- 13. Huang J., A simulation research on a biased estimator in logistic regression model, in International Symposium on Intelligence Computation and Applications, Springer, Berlin, 2012, pp. 389–395.
- 14. Inan D. and Erdogan B.E., Liu-type logistic estimator. Commun. Stat. Simul. Comput. 42 (2013), pp. 1578–1586. doi: 10.1080/03610918.2012.667480
- 15. Inan D., Egrioglu E., Sarica B., Askin O.E., and Tez M., Particle swarm optimization based Liu-type estimator. Commun. Stat. Theory Methods 46 (2017), pp. 11358–11369. doi: 10.1080/03610926.2016.1267759
- 16. Kennedy J. and Eberhart R., Particle swarm optimization. Proc. IEEE Int. Conf. Neural Netw. 4 (1995), pp. 1942–1948. doi: 10.1109/ICNN.1995.488968
- 17. Le Cessie S. and Van Houwelingen J.C., Ridge estimators in logistic regression. J. R. Stat. Soc. C Appl. Stat. 41 (1992), pp. 191–201.
- 18. Liu K., A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 22 (1993), pp. 393–402. doi: 10.1080/03610929308831027
- 19. Liu K., Using Liu-type estimator to combat collinearity. Commun. Stat. Theory Methods 32 (2003), pp. 1009–1020. doi: 10.1081/STA-120019959
- 20. Månsson K. and Shukur G., On ridge parameters in logistic regression. Commun. Stat. Theory Methods 40 (2011), pp. 3366–3381. doi: 10.1080/03610926.2010.500111
- 21. Månsson K., Kibria B.G., and Shukur G., On Liu estimators for the logit regression model. Econ. Model. 29 (2012), pp. 1483–1488. doi: 10.1016/j.econmod.2011.11.015
- 22. McDonald G.C. and Galarneau D.I., A Monte Carlo evaluation of some ridge-type estimators. J. Am. Stat. Assoc. 70 (1975), pp. 407–416. doi: 10.1080/01621459.1975.10479882
- 23. Newhouse J.P. and Oman S.D., An Evaluation of Ridge Estimators, Rand Corporation, P-716-PR, 1971.
- 24. Sancar N. and Inan D., Identification of influential observations based on binary particle swarm optimization in the Cox PH model. Commun. Stat. Simul. Comput. 49 (2019), pp. 1–24.
- 25. Schaefer R., Roi L., and Wolfe R., A ridge logistic estimator. Commun. Stat. Theory Methods 13 (1984), pp. 99–113. doi: 10.1080/03610928408828664
- 26. Tak N., Evren A.A., Tez M., and Egrioglu E., Recurrent type-1 fuzzy functions approach for time series forecasting. Appl. Intell. 48 (2018), pp. 68–77. doi: 10.1007/s10489-017-0962-8
- 27. Uslu V.R., Egrioglu E., and Bas E., Finding optimal value for the shrinkage parameter in ridge regression via particle swarm optimization. Am. J. Intell. Syst. 4 (2014), pp. 142–147.
