A new regression model for bounded response variable: An alternative to the beta and unit-Lindley regression models

Emrah Altun; M El-Morshedy; M S Eliwa

doi:10.1371/journal.pone.0245627

PLoS One. 2021 Jan 22;16(1):e0245627. doi: 10.1371/journal.pone.0245627

Editor: Feng Chen

PMCID: PMC7822343 PMID: 33481884

Abstract

A new distribution defined on the interval (0,1) is introduced. Its probability density and cumulative distribution functions have simple forms, which allows the moments, incomplete moments and quantile function to be derived explicitly. Four estimation methods are used to estimate the unknown parameter of the distribution, and a simulation study compares their efficiencies. More importantly, the proposed distribution yields an alternative regression model for a bounded response variable. The proposed regression model is compared with the beta and unit-Lindley regression models on two real data sets.

1 Introduction

In the last decade, the modeling of bounded data sets has grown in popularity. Such data sets appear in many fields, such as finance, actuarial science and medicine. The statistics literature offers relatively few distributions defined on (0,1). The best known are the beta, Topp-Leone (Topp and Leone [1]) and Kumaraswamy (Kumaraswamy [2]) distributions. To increase the modeling accuracy for data sets on (0,1), researchers have proposed several further distributions, for instance the unit-Lindley by Mazucheli et al. [3], unit-inverse Gaussian by Ghitany et al. [4], unit-Birnbaum-Saunders by Mazucheli et al. [5], exponentiated Topp-Leone by Pourdarvish et al. [6], transmuted Kumaraswamy by Khan et al. [7], log-xgamma by Altun and Hamedani [8], log-weighted exponential by Altun [9] and unit-improved second-degree Lindley by Altun and Cordeiro [10].

Although the beta distribution is widely used to model data on a bounded interval, it is deficient for extremely left-skewed and leptokurtic data sets. The moments of the Topp-Leone distribution do not have explicit forms, which are needed to re-parametrize the density in terms of its mean for regression modeling. The moments of the Kumaraswamy distribution are explicit, but they contain the gamma function, which prevents a closed-form re-parametrization of the density. We aim to introduce a new distribution on (0,1) that removes these deficiencies when modeling extremely skewed data sets. The Bilal distribution introduced by Abd-Elrahman [11] is used to generate the new distribution through an appropriate transformation. The resulting distribution is called the log-Bilal distribution since we use the transformation Y = exp(−X). After obtaining the log-Bilal distribution, we derive its statistical properties, such as the moments, incomplete moments and quantile function. An important question is whether this distribution is needed. To answer it, we summarize the importance of the log-Bilal distribution: (i) the log-Bilal distribution has simple and closed-form expressions for its statistical functions; (ii) the properties of the log-Bilal distribution are derived in explicit forms without any special mathematical functions; (iii) the proposed distribution provides more flexibility than existing distributions for the shapes of the hazard rate function; (iv) thanks to its simple mathematical functions, we introduce a new regression model based on the log-Bilal density to model extremely skewed dependent variables with associated covariates.

The remaining sections are organized as follows. The moments, incomplete moments, quantile function, and exponential family property of the log-Bilal distribution are obtained in Section 2. Section 3 is devoted to the parameter estimation methods. The efficiencies of these methods are compared in Section 4. The log-Bilal regression model is introduced in Section 5. Section 6 contains the results of the data analysis. The paper ends with concluding remarks in Section 7.

2 The log-Bilal distribution

Let the random variable (rv) X follow the Bilal distribution, which has the probability density function (pdf)

\begin{matrix} f (x) = \frac{6}{θ} exp (- \frac{2 x}{θ}) (1 - exp (- \frac{x}{θ})), x > 0, \end{matrix}

(1)

where θ > 0 is the scale parameter. The cumulative distribution function (cdf) of X is

\begin{matrix} F (x) = 1 - exp (- \frac{2 x}{θ}) (3 - 2 exp (- \frac{x}{θ})) . \end{matrix}

(2)

Following the idea of Altun and Hamedani [8] and Altun [9] and using the Y = exp(−X) transformation on the Bilal distribution, the pdf of the log-Bilal distribution is

\begin{matrix} f (y; θ) = \frac{6}{θ} y^{2 / θ - 1} (1 - y^{1 / θ}), 0 < y < 1, \end{matrix}

(3)

where θ > 0. Here, in contrast to the Bilal distribution, the parameter θ behaves like a shape parameter. From now on, an rv Y having density (3) is denoted by Y ∼ log-Bilal(θ). The cdf of Y (for 0 ≤ y ≤ 1) is

\begin{matrix} F (y; θ) = 3 y^{2 / θ} - 2 y^{3 / θ} . \end{matrix}

(4)

Some possible pdf shapes of the log-Bilal distribution are displayed in Fig 1. These plots show that the proposed distribution can model different types of data sets on the unit interval, including right-skewed, left-skewed and nearly symmetric data.

The survival function (sf) and hazard rate function (hrf) of Y are, respectively,

\begin{matrix} S (y) = 1 - 3 y^{2 / θ} + 2 y^{3 / θ}, \end{matrix}

(5)

\begin{matrix} h (y) = \frac{6 y^{2 / θ - 1} (1 - y^{1 / θ})}{θ (1 - 3 y^{2 / θ} + 2 y^{3 / θ})} . \end{matrix}

(6)

Fig 2 displays hrf shapes of the log-Bilal distribution. As seen from these plots, the hrf of the log-Bilal distribution can be increasing or bathtub-shaped. The right panel of Fig 2 shows the hrf regions of the log-Bilal distribution for different values of the parameter θ.
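As a minimal sketch of the functions in (3) to (6), the following Python code evaluates the pdf, cdf, sf and hrf and checks numerically that the pdf integrates to the cdf. The function names, the midpoint integration rule and the values θ = 1.7 and t = 0.6 are assumptions chosen for illustration only.

```python
import math

def pdf(y, theta):
    """Density (3) of the log-Bilal distribution on 0 < y < 1."""
    return 6.0 / theta * y ** (2.0 / theta - 1.0) * (1.0 - y ** (1.0 / theta))

def cdf(y, theta):
    """Distribution function (4)."""
    return 3.0 * y ** (2.0 / theta) - 2.0 * y ** (3.0 / theta)

def sf(y, theta):
    """Survival function (5)."""
    return 1.0 - cdf(y, theta)

def hrf(y, theta):
    """Hazard rate function (6), the ratio of (3) to (5)."""
    return pdf(y, theta) / sf(y, theta)

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule; it never evaluates f at the end points."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

theta = 1.7  # illustrative parameter value (assumption)
t = 0.6      # illustrative upper limit (assumption)
area = midpoint_integral(lambda y: pdf(y, theta), 0.0, t)
print(area, cdf(t, theta))                 # the two values should agree closely
print(hrf(0.2, theta), hrf(0.8, theta))    # hazard at two points
```

The same functions can be evaluated on a grid of y values to reproduce plots of the kind described for Fig 1 and Fig 2.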

The quantile function of Y is given by

\begin{matrix} Q (u) = {[\frac{1}{2} + \sin (\frac{1}{3} \arcsin (2 u - 1))]}^{θ}, \end{matrix}

(7)

where 0 < u < 1. Using (7), we have the following algorithm to generate random variables from the log-Bilal distribution.

Algorithm 1 Generating random variables from the log-Bilal(θ) distribution

1. Set the parameter θ,

2. Generate u_i ∼ U(0, 1),

3. Calculate $y_{i} = Q (u_{i})$ using (7),

4. Repeat steps 2 and 3 n times.
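The steps of Algorithm 1 can be sketched in Python as follows. The sketch inverts the cdf (4) through the real root in (0,1) of 3t² − 2t³ = u with t = y^{1/θ}, written in trigonometric form as t = 1/2 + sin(arcsin(2u − 1)/3). This form, the function names `quantile` and `rlogbilal`, and the seed are choices made for this illustration and are not taken from the original implementation.

```python
import math
import random

def cdf(y, theta):
    """Distribution function (4)."""
    return 3.0 * y ** (2.0 / theta) - 2.0 * y ** (3.0 / theta)

def quantile(u, theta):
    """Inverse of (4): solve 3t^2 - 2t^3 = u for t in (0, 1), then y = t^theta.

    The trigonometric root is an assumption of this sketch; it follows from
    sin(3x) = 3 sin(x) - 4 sin(x)^3 with t = 1/2 + sin(x).
    """
    t = 0.5 + math.sin(math.asin(2.0 * u - 1.0) / 3.0)
    return max(t, 0.0) ** theta

def rlogbilal(n, theta, rng):
    """Steps 2 to 4 of Algorithm 1: draw n uniforms and map them through Q."""
    return [quantile(rng.random(), theta) for _ in range(n)]

rng = random.Random(2021)      # fixed seed, for reproducibility only
sample = rlogbilal(5, 1.7, rng)
print(sample)

# Round-trip check: F(Q(u)) should return u.
for u in (0.01, 0.3, 0.5, 0.9):
    print(u, cdf(quantile(u, 1.7), 1.7))
```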

2.1 Moments

The kth raw moment of Y is

\begin{matrix} E (Y^{k}) & = & \int_{0}^{1} \frac{6}{θ} y^{k + 2 / θ - 1} (1 - y^{1 / θ}) d y \\ & = & \frac{6}{(k θ + 2) (k θ + 3)} . \end{matrix}

(8)

Using (8), the first and second raw moments of Y are given, respectively, by

\begin{matrix} E (Y) = \frac{6}{(θ + 2) (θ + 3)} and E (Y^{2}) = \frac{3}{(θ + 1) (2 θ + 3)} . \end{matrix}

The variance of Y is obtained from its first and second raw moments as

\begin{matrix} Var (Y) = \frac{3 θ^{2} (θ^{2} + 10 θ + 13)}{{(θ^{2} + 5 θ + 6)}^{2} (2 θ^{2} + 5 θ + 3)} . \end{matrix}

The mean of the log-Bilal distribution decreases as the parameter θ increases. The variance is not monotone in θ: it tends to zero both as θ → 0 and as θ → ∞, so it first increases and then decreases.
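The closed forms above can be checked against each other with a short Python sketch. It evaluates the raw moment formula (8) and the variance expression, and confirms that the variance equals E(Y²) − E(Y)². The function names and the grid of θ values are assumptions made for illustration.

```python
def raw_moment(k, theta):
    """k-th raw moment from (8)."""
    return 6.0 / ((k * theta + 2.0) * (k * theta + 3.0))

def variance(theta):
    """Closed-form variance given in the text."""
    num = 3.0 * theta ** 2 * (theta ** 2 + 10.0 * theta + 13.0)
    den = (theta ** 2 + 5.0 * theta + 6.0) ** 2 * (2.0 * theta ** 2 + 5.0 * theta + 3.0)
    return num / den

# Illustrative grid of parameter values (assumption).
for theta in (0.5, 1.0, 1.7, 3.0, 6.0):
    m1 = raw_moment(1, theta)
    m2 = raw_moment(2, theta)
    # Columns: theta, mean, closed-form variance, variance from raw moments.
    print(theta, m1, variance(theta), m2 - m1 ** 2)
```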

2.2 Incomplete moments

The rth incomplete moment of Y is

\begin{matrix} m_{r} (t) & = & \int_{0}^{t} y^{r} f (y; θ) d y = \int_{0}^{t} \frac{6}{θ} y^{r + 2 / θ - 1} (1 - y^{1 / θ}) d y \\ & = & \frac{6 t^{2 / θ + r}}{r θ + 2} - \frac{6 t^{3 / θ + r}}{r θ + 3} . \end{matrix}

(9)

Incomplete moments are important tools for measuring inequality, for example through the Gini measure (see Butler and McDonald [12] for details).
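A minimal Python sketch of the incomplete moment in (9) is given below. It compares the closed form with a numerical integral of y^r f(y; θ) over (0, t) and checks that m_r(1) reduces to the raw moment in (8). The function names, the midpoint rule and the values r = 1, t = 0.4 and θ = 1.7 are assumptions for illustration.

```python
def pdf(y, theta):
    """Density (3)."""
    return 6.0 / theta * y ** (2.0 / theta - 1.0) * (1.0 - y ** (1.0 / theta))

def incomplete_moment(r, t, theta):
    """Closed form (9) for the r-th incomplete moment up to t."""
    return (6.0 * t ** (2.0 / theta + r) / (r * theta + 2.0)
            - 6.0 * t ** (3.0 / theta + r) / (r * theta + 3.0))

def raw_moment(r, theta):
    """Raw moment (8), the limit of (9) at t = 1."""
    return 6.0 / ((r * theta + 2.0) * (r * theta + 3.0))

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

r, t, theta = 1, 0.4, 1.7   # illustrative values (assumption)
numeric = midpoint_integral(lambda y: y ** r * pdf(y, theta), 0.0, t)
print(numeric, incomplete_moment(r, t, theta))
print(incomplete_moment(r, 1.0, theta), raw_moment(r, theta))
```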

2.3 Exponential family

A one-parameter pdf belongs to the exponential family if it can be expressed in the form

\begin{matrix} f (y; θ) = exp [Q (θ) T (y) + D (θ) + S (y)], \end{matrix}

where S(y) does not depend on θ. The pdf of the log-Bilal distribution can be written as

\begin{matrix} f (y; θ) = exp [(2 / θ - 1) log (y)] exp [log (6 / θ)] exp [log (1 - y^{1 / θ})], \end{matrix}

with Q(θ) = (2/θ − 1), T(y) = log (y) and D(θ) = log(6/θ). The remaining term, log (1 − y^{1/θ}), depends on both y and θ, so it cannot play the role of S(y). Therefore, the factorization above matches the exponential family form only up to this term, and $\sum_{i = 1}^{n} log (y_{i})$ alone is not a sufficient statistic for the parameter θ.

3 Estimation

We use four estimation methods to estimate the parameter of the log-Bilal distribution: maximum likelihood estimation (MLE), method of moments (MM), least squares estimation (LSE) and weighted least squares estimation (WLSE). Details of these methods are given in the rest of this section.

3.1 Maximum likelihood

Let y₁, …, y_n be a random sample from the log-Bilal distribution. The log-likelihood function of the log-Bilal distribution is

\begin{matrix} ℓ (θ) = n ln (6 / θ) + (2 / θ - 1) \sum_{i = 1}^{n} ln (y_{i}) + \sum_{i = 1}^{n} ln (1 - y_{i}^{1 / θ}) . \end{matrix}

(10)

Differentiating (10) with respect to θ gives

\begin{matrix} \frac{\partial ℓ}{\partial θ} = - \frac{n}{θ} - \frac{2}{θ^{2}} \sum_{i = 1}^{n} ln (y_{i}) + \frac{1}{θ^{2}} \sum_{i = 1}^{n} \frac{y_{i}^{1 / θ} ln (y_{i})}{(1 - y_{i}^{1 / θ})} . \end{matrix}

(11)

The MLE of θ, say $\hat{θ}$, is the root of (11) set equal to zero. This equation has no explicit solution, so it must be solved iteratively; alternatively, (10) can be maximized directly. Here, (10) is maximized directly using the optim function of the R software.
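The direct maximization can be sketched in Python as follows. The log-likelihood is obtained by taking logs of density (3) term by term, and the score is its derivative with respect to θ. A golden-section search stands in for the optim function of R; this substitution, the search interval [0.05, 20], the simulated data, the seed and all function names are assumptions of this illustration. The log-likelihood is concave in 1/θ, so a one-dimensional bracketing search is adequate here.

```python
import math
import random

def loglik(theta, ys):
    """Log-likelihood implied by density (3)."""
    n = len(ys)
    sum_log_y = sum(math.log(y) for y in ys)
    tail = sum(math.log(1.0 - y ** (1.0 / theta)) for y in ys)
    return n * math.log(6.0 / theta) + (2.0 / theta - 1.0) * sum_log_y + tail

def score(theta, ys):
    """Derivative of loglik with respect to theta."""
    n = len(ys)
    sum_log_y = sum(math.log(y) for y in ys)
    tail = sum(y ** (1.0 / theta) * math.log(y) / (1.0 - y ** (1.0 / theta)) for y in ys)
    return -n / theta - 2.0 * sum_log_y / theta ** 2 + tail / theta ** 2

def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def quantile(u, theta):
    """Inverse cdf in trigonometric form (assumption of this sketch)."""
    return (0.5 + math.sin(math.asin(2.0 * u - 1.0) / 3.0)) ** theta

rng = random.Random(1)                     # fixed seed, illustration only
theta_true = 1.7
ys = [quantile(rng.random(), theta_true) for _ in range(1000)]
theta_hat = golden_max(lambda th: loglik(th, ys), 0.05, 20.0)
print(theta_hat, score(theta_hat, ys))     # the score should be close to zero
```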

3.2 Method of moments

The MM estimation method is a popular method when the raw moments of the distribution have simple forms. The MM estimator of θ can be easily obtained by equating the first theoretical moment of the log-Bilal distribution to the sample mean, which gives

\begin{matrix} {\hat{θ}}_{M M} = \frac{1}{2} ({(\frac{\bar{y}}{\bar{y} + 24})}^{- 1 / 2} - 5), \end{matrix}

where $\bar{y} = \sum_{i = 1}^{n} y_{i} / n$ .
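As a quick check of the MM estimator, the Python sketch below plugs the theoretical mean 6/((θ + 2)(θ + 3)) into the estimator and recovers θ. The function names and the test values of θ are assumptions for illustration.

```python
import math

def mean_logbilal(theta):
    """First raw moment E(Y) of the log-Bilal distribution."""
    return 6.0 / ((theta + 2.0) * (theta + 3.0))

def mm_estimate(ybar):
    """MM estimator of theta; requires 0 < ybar < 1 for a positive result."""
    return 0.5 * ((ybar / (ybar + 24.0)) ** -0.5 - 5.0)

# Round trip on illustrative parameter values (assumption).
for theta in (0.3, 1.7, 4.7):
    print(theta, mm_estimate(mean_logbilal(theta)))
```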

3.3 Least squares

Let y₍₁₎, …, y_(n) be the ordered sample of y₁, …, y_n from the log-Bilal distribution. The LSE of θ is obtained by minimizing

\begin{matrix} \sum_{i = 1}^{n} {[F (y_{(i)}; θ) - \frac{i}{n + 1}]}^{2}, \end{matrix}

(12)

where F(y_(i);θ) is given in (4). Substituting (4), the function to be minimized is

\begin{matrix} \sum_{i = 1}^{n} {[3 y_{(i)}^{2 / θ} - 2 y_{(i)}^{3 / θ} - \frac{i}{n + 1}]}^{2} . \end{matrix}

3.4 Weighted least squares

The WLSE of the parameter θ is obtained by minimizing

\begin{matrix} \sum_{i = 1}^{n} \frac{{(n + 1)}^{2} (n + 2)}{i (n - i + 1)} {[3 y_{(i)}^{2 / θ} - 2 y_{(i)}^{3 / θ} - \frac{i}{n + 1}]}^{2} . \end{matrix}
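The two least squares criteria can be sketched in Python as below. Both objectives are evaluated on the ordered sample and minimized with a golden-section search, assuming that each objective is unimodal on the search interval [0.05, 20]. The search method, the interval, the simulated data, the seed and the function names are assumptions of this illustration.

```python
import math
import random

def cdf(y, theta):
    """Distribution function (4)."""
    return 3.0 * y ** (2.0 / theta) - 2.0 * y ** (3.0 / theta)

def lse_objective(theta, ordered):
    """Sum of squared distances between F(y_(i)) and i / (n + 1)."""
    n = len(ordered)
    return sum((cdf(y, theta) - i / (n + 1.0)) ** 2
               for i, y in enumerate(ordered, start=1))

def wlse_objective(theta, ordered):
    """Same distances, weighted by (n + 1)^2 (n + 2) / (i (n - i + 1))."""
    n = len(ordered)
    return sum((n + 1.0) ** 2 * (n + 2.0) / (i * (n - i + 1.0))
               * (cdf(y, theta) - i / (n + 1.0)) ** 2
               for i, y in enumerate(ordered, start=1))

def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def quantile(u, theta):
    """Inverse cdf in trigonometric form (assumption of this sketch)."""
    return (0.5 + math.sin(math.asin(2.0 * u - 1.0) / 3.0)) ** theta

rng = random.Random(7)                     # fixed seed, illustration only
theta_true = 1.7
ordered = sorted(quantile(rng.random(), theta_true) for _ in range(500))
theta_lse = golden_min(lambda th: lse_objective(th, ordered), 0.05, 20.0)
theta_wlse = golden_min(lambda th: wlse_objective(th, ordered), 0.05, 20.0)
print(theta_lse, theta_wlse)
```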

4 Simulation

We compare the efficiencies of the MLE, MM, LSE and WLSE methods in estimating the parameter of the log-Bilal distribution. The algorithm given in Section 2 is used to generate random variables from the log-Bilal distribution. The simulation results are interpreted based on the following quantities.

\begin{matrix} \begin{matrix} Bias = \sum_{j = 1}^{N} \frac{{\hat{θ}}_{j} - θ}{N}, MRE = \sum_{j = 1}^{N} \frac{{\hat{θ}}_{j} / θ}{N}, \\ MSE = \sum_{j = 1}^{N} \frac{{({\hat{θ}}_{j} - θ)}^{2}}{N} . \end{matrix} \end{matrix}

Statistical measures such as mean square errors (MSEs) and mean relative errors (MREs) are commonly used to compare competing approaches under pre-determined scenarios (see Zeng et al. [13, 14]). The statistical software R is used to obtain the numerical results of the simulation study. We choose the parameter value θ = 1.7, the number of simulation replications N = 10,000 and the sample sizes n = 20, 25, 30, …, 300. If an estimation method yields an asymptotically unbiased and consistent estimator of θ, we expect the biases and MSEs to approach zero and the MREs to approach one as n increases. The simulation results are displayed in Fig 3. As seen from these plots, the MLE method approaches the desired values of the biases, MSEs and MREs faster than the other estimation methods. Therefore, the MLE method is more appropriate than the other methods for estimating the parameter of the log-Bilal distribution.
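A reduced version of this simulation design can be sketched in Python. To keep the run time short, the sketch uses only the MM estimator, N = 500 replications and two sample sizes; these reductions, the trigonometric inverse cdf, the seed and the function names are assumptions of this illustration and do not reproduce the study reported in Fig 3.

```python
import math
import random

def quantile(u, theta):
    """Inverse cdf in trigonometric form (assumption of this sketch)."""
    return (0.5 + math.sin(math.asin(2.0 * u - 1.0) / 3.0)) ** theta

def mm_estimate(ybar):
    """MM estimator of theta from the sample mean."""
    return 0.5 * (math.sqrt((ybar + 24.0) / ybar) - 5.0)

def simulate(n, theta, reps, rng):
    """Return (Bias, MRE, MSE) of the MM estimator over reps replications."""
    estimates = []
    for _ in range(reps):
        ybar = sum(quantile(rng.random(), theta) for _ in range(n)) / n
        estimates.append(mm_estimate(ybar))
    bias = sum(e - theta for e in estimates) / reps
    mre = sum(e / theta for e in estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, mre, mse

rng = random.Random(123)       # fixed seed, illustration only
theta = 1.7                    # parameter value used in the text
results = {n: simulate(n, theta, 500, rng) for n in (20, 200)}
for n, (bias, mre, mse) in results.items():
    print(n, bias, mre, mse)
```

Note that Bias and MRE carry the same information, since Bias = θ(MRE − 1).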

5 The log-Bilal regression model

Now, we introduce a new regression model for a bounded response variable as an alternative to the beta and unit-Lindley regression models. Let θ = 2⁻¹({μ/(μ + 24)}^(−1/2) − 5), which is obtained by solving E(Y) = 6/((θ + 2)(θ + 3)) = μ for θ. Then the pdf of the log-Bilal distribution takes the form

\begin{matrix} f (y; μ) = \frac{12}{({μ / (μ + 24)}^{- 1 / 2} - 5)} y^{4 / ({μ / (μ + 24)}^{- 1 / 2} - 5) - 1} (1 - y^{2 / ({μ / (μ + 24)}^{- 1 / 2} - 5)}) \end{matrix}

(13)

where 0 < y < 1, 0 < μ < 1 and E(Y|μ) = μ. The logit link function is used to link the covariates to the mean of response variable, as follows,

\begin{matrix} μ_{i} = \frac{exp (x_{i}^{T} β)}{1 + exp (x_{i}^{T} β)}, i = 1, \dots, n, \end{matrix}

(14)

where $x_{i}^{T} = (1, x_{i 1}, x_{i 2}, \dots, x_{i k})$ is the vector of covariates, including the constant term, and β = (β₀, β₁, β₂, …, β_k)^T is the vector of unknown regression coefficients. Substituting (14) for μ_i in (13), the log-likelihood function of the log-Bilal regression model is

\begin{matrix} \begin{matrix} ℓ (β) = n ln (12) - \sum_{i = 1}^{n} ln ({μ_{i} / (μ_{i} + 24)}^{- 1 / 2} - 5) + \sum_{i = 1}^{n} ln (y_{i}) [\frac{4}{({μ_{i} / (μ_{i} + 24)}^{- 1 / 2} - 5)} - 1] \\ + \sum_{i = 1}^{n} ln (1 - y_{i}^{2 / ({μ_{i} / (μ_{i} + 24)}^{- 1 / 2} - 5)}), \end{matrix} \end{matrix}

(15)

where μ_i is given by (14). The unknown vector of regression parameters, β, is estimated by minimizing the negative of (15), which is equivalent to maximizing (15). The standard errors of the estimated parameters are obtained from the observed information matrix, whose elements can be computed numerically with the fdHess function of the R software.
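The objective function of the regression model can be sketched in Python as below. The negative log-likelihood is built by taking logs of density (13) observation by observation, written through θ_i as a function of μ_i, with μ_i given by the logit link (14). The toy design matrix, responses and coefficient values are hypothetical, and the function names are assumptions of this illustration; in practice the function would be passed to a numerical optimizer.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link in (14)."""
    return 1.0 / (1.0 + math.exp(-eta))

def theta_from_mu(mu):
    """Re-parametrization theta = ({mu / (mu + 24)}^(-1/2) - 5) / 2."""
    return 0.5 * (math.sqrt((mu + 24.0) / mu) - 5.0)

def negloglik(beta, X, y):
    """Negative log-likelihood of the log-Bilal regression model."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))
        theta = theta_from_mu(inv_logit(eta))
        total += (math.log(6.0 / theta)
                  + (2.0 / theta - 1.0) * math.log(y_i)
                  + math.log(1.0 - y_i ** (1.0 / theta)))
    return -total

# Hypothetical toy data: an intercept column plus one covariate.
X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0], [1.0, 0.0]]
y = [0.70, 0.40, 0.90, 0.55]
beta = [0.5, 0.3]              # hypothetical coefficient values
value = negloglik(beta, X, y)
print(value)

# The re-parametrization should give E(Y | mu) = mu.
for mu in (0.1, 0.5, 0.9):
    th = theta_from_mu(mu)
    print(mu, 6.0 / ((th + 2.0) * (th + 3.0)))
```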

5.1 Residuals analysis

To check the adequacy of the fitted log-Bilal regression model, the randomized quantile residuals introduced by Dunn and Smyth [15] are used. The randomized quantile residuals are given by

\begin{matrix} {\hat{r}}_{i} = Φ^{- 1} ({\hat{u}}_{i}), \end{matrix}

where ${\hat{u}}_{i} = F (y_{i}; \hat{β})$ is the fitted cdf evaluated at y_i and Φ⁻¹(z) is the inverse of the standard normal cdf. When the fitted model is valid for the data, the residuals ${\hat{r}}_{i}$ are approximately normally distributed with zero mean and unit variance.
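A minimal Python sketch of these residuals follows. The fitted cdf is the cdf (4) evaluated at θ_i obtained from the fitted mean μ_i, and the standard normal quantile comes from `statistics.NormalDist` in the Python standard library. The responses and fitted means are hypothetical values, and the function names are assumptions of this illustration.

```python
import math
from statistics import NormalDist

def theta_from_mu(mu):
    """Re-parametrization theta = ({mu / (mu + 24)}^(-1/2) - 5) / 2."""
    return 0.5 * (math.sqrt((mu + 24.0) / mu) - 5.0)

def cdf(y, theta):
    """Distribution function (4)."""
    return 3.0 * y ** (2.0 / theta) - 2.0 * y ** (3.0 / theta)

def quantile_residuals(ys, mus):
    """Map each fitted cdf value through the standard normal quantile function."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(cdf(y, theta_from_mu(mu))) for y, mu in zip(ys, mus)]

ys = [0.5, 0.8, 0.2]    # hypothetical responses
mus = [0.5, 0.6, 0.4]   # hypothetical fitted means
residuals = quantile_residuals(ys, mus)
print(residuals)
```

A response equal to the median of its fitted distribution gives a residual of zero, as in the first hypothetical observation.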

6 Empirical studies

In this section, the log-Bilal distribution and the log-Bilal regression model are compared with existing models. Two real data sets are analyzed to illustrate the usefulness of the proposed distribution.

6.1 Dwellings without basic facilities

The Better Life Index (BLI) is calculated for the OECD countries as well as Brazil, Russia and South Africa to compare countries on 12 indicators that affect the quality of life. Here, we use one of the BLI variables measured in 2017, dwellings without basic facilities, which is defined as the percentage of the population living in a dwelling without an indoor flushing toilet. The data set is available at https://stats.oecd.org/index.aspx?DataSetCode=BLI. This data set is used to compare the modeling performance of the log-Bilal distribution with the following competing models: beta, Kumaraswamy, Topp-Leone and unit-Lindley.

The competing distributions and the log-Bilal distribution are fitted to the data using the R software. For each fitted distribution, the MLEs of the parameters and their standard errors (SEs) are obtained. In addition, the Kolmogorov-Smirnov (K-S), Cramér-von Mises (W*) and Anderson-Darling (A*) goodness-of-fit tests are applied to assess the suitability of the distributions for the data. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are widely used to choose the best statistical model. These statistics are used to compare the fitted models and select the best one (see Chen et al. [16, 17]).

Table 1 shows the MLEs of the parameters of the models fitted to the dwellings without basic facilities data, the corresponding SEs, the goodness-of-fit statistics and the AIC and BIC values. According to the K-S tests and their p-values, all fitted distributions except the unit-Lindley provide adequate fits. However, the log-Bilal distribution has the lowest values of the AIC, BIC, A* and W* statistics, which indicates that the proposed distribution is the best choice for these data.
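The information criteria can be illustrated with a short Python sketch using the usual definitions AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ, where k is the number of parameters and ℓ the maximized log-likelihood. The maximized log-likelihood and the sample size are not stated in the text; the values below are back-calculated from the AIC and BIC reported for the log-Bilal fit and should be read as inferred quantities, not as reported ones.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2.0 * loglik

# Values reported for the log-Bilal fit (one parameter, k = 1).
aic_reported, bic_reported, k = -118.9374, -117.2998, 1

# Back-calculated quantities (inferred here, not stated in the text).
loglik_implied = (2.0 * k - aic_reported) / 2.0
n_implied = math.exp((bic_reported - aic_reported) / k + 2.0)
print(loglik_implied, n_implied)
print(aic(loglik_implied, k), bic(loglik_implied, k, round(n_implied)))
```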

Table 1. The estimated parameters of the fitted models (SEs are on the second line).

Models	Parameter estimates		AIC	BIC	A*	W*	K-S	p-value
Beta(α, β)	0.2847	1.4017	-114.1408	-110.8657	1.8818	0.2546	0.2032	0.0868
	0.0518	0.3917
Kumaraswamy(α, β)	0.3367	1.6076	-117.0740	-113.7988	1.7423	0.2317	0.1610	0.2785
	0.0599	0.3519
Topp-Leone(θ)	0.3069		-112.9418	-111.3042	2.2026	0.3074	0.1867	0.1414
	0.0498
unit-Lindley(λ)	0.0732		492.8384	494.4760	7.9700	1.4892	0.9699	<0.001
	0.0084
log-Bilal(θ)	4.7063		-118.9374	-117.2998	1.7032	0.2254	0.1504	0.3567
	0.5491

Fig 4 displays the estimated densities of the fitted models superimposed on the histogram of the data, together with the estimated functions of the log-Bilal distribution. The right panel of Fig 4 provides further visual support for the log-Bilal distribution.

6.2 Education attainment

Here, the performance of the log-Bilal regression model is compared with the beta and unit-Lindley regression models. The used data set comes from the BLI of OECD countries, measured in the year of 2017. The data source is https://stats.oecd.org/index.aspx?DataSetCode=BLI.

The educational attainment values of the OECD countries (y) are considered as the response (dependent) variable. The goal is to explore the effects of the following covariates on the conditional mean of the response variable: homicide rate (HR), dwellings without basic facilities (DWBF), and labor market insecurity (LMI). The logit link function, which ensures that the estimated mean lies between 0 and 1, is used for all fitted regression models. The fitted regression model is

\begin{matrix} logit (μ_{i}) = β_{0} + β_{1} {H R}_{i} + β_{2} {D W B F}_{i} + β_{3} {L M I}_{i} . \end{matrix}

Table 2 lists the MLEs, SEs, corresponding p-values, AIC and BIC for the beta, unit-Lindley, and log-Bilal regression models. The parameter φ represents the dispersion parameter of the beta regression model. Based on the results in Table 2, all estimated regression parameters are statistically significant at the 5% level for the beta and log-Bilal regression models. Based on the estimated regression parameters of the log-Bilal regression model, we conclude that educational attainment in the OECD countries decreases as the homicide rate and labor market insecurity increase. On the other hand, educational attainment increases as the share of dwellings without basic facilities increases.

Table 2. MLEs, SEs, corresponding p-values, AIC and BIC values for the fitted models.

Parameters	Beta			unit-Lindley			log-Bilal
Parameters	Estimate	S.E.	p-value	Estimate	S.E.	p-value	Estimate	S.E.	p-value
β₀	1.9208	0.1570	<0.0001	1.6263	0.1887	<0.0001	2.1136	0.2122	<0.0001
β₁	-0.0674	0.0173	<0.0001	-0.0543	0.0304	0.0739	-0.0705	0.0270	0.0089
β₂	0.0434	0.0182	0.0172	0.0521	0.0263	0.0477	0.0724	0.0340	0.0334
β₃	-10.9688	2.1804	<0.0001	-10.8607	2.6421	<0.0001	-14.8182	4.4554	0.0009
φ	15.6120	3.5320	<0.0001	-	-	-	-	-	-
AIC	-63.2794			-61.7153			-64.5549
BIC	-55.0915			-55.1649			-58.0045

The information criteria AIC and BIC are used to select the best model for the data. Since the log-Bilal regression model has the lowest AIC and BIC values, we conclude that it is the best of the three models. Additionally, a residual analysis is carried out to evaluate the suitability of the fitted models. Fig 5 displays the quantile-quantile plots of the randomized quantile residuals. As seen from these plots, all fitted regression models provide adequate fits, but the plotted points for the log-Bilal regression model are closer to the diagonal line than those of the beta and unit-Lindley regression models.

7 Conclusion

A new one-parameter unit distribution is introduced for modeling extremely left-skewed data sets on the unit interval. The new model provides a better fit than other one- and two-parameter unit distributions, such as the Topp-Leone, unit-Lindley, Kumaraswamy and beta distributions, when the data are extremely skewed to the left (or right). The newly defined regression model is compared with the well-known beta regression model as well as the recently proposed unit-Lindley regression model. The results of the data analysis show that the proposed models fit better than the existing models considered. As future work, we plan to introduce a quantile regression model based on the log-Bilal distribution. Additionally, we plan to extend our model to longitudinal data sets as an alternative to the longitudinal beta regression model.

Appendix

Beta distribution:
$\begin{matrix} f (y; α, β) = \frac{Γ (α + β)}{Γ (α) Γ (β)} y^{α - 1} {(1 - y)}^{β - 1}, α > 0, β > 0, 0 < y < 1 . \end{matrix}$
Kumaraswamy distribution:
$\begin{matrix} f (y; α, β) = α β y^{α - 1} {(1 - y^{α})}^{β - 1}, α > 0, β > 0, 0 < y < 1 . \end{matrix}$
Topp-Leone distribution:
$\begin{matrix} f (y; θ) = θ (2 - 2 y) {(2 y - y^{2})}^{θ - 1}, θ > 0, 0 < y < 1 . \end{matrix}$
Unit-Lindley distribution:
$\begin{matrix} f (y; θ) = \frac{θ^{2}}{1 + θ} {(1 - y)}^{- 3} exp (- \frac{θ y}{1 - y}), θ > 0, 0 < y < 1 . \end{matrix}$

Data Availability

The data sets are available from OECD (https://stats.oecd.org/Index.aspx?DataSetCode=BLI).

Funding Statement

The author(s) received no specific funding for this work.

References

1. Topp C. W. and Leone F. C. (1955). A family of J–shaped frequency functions. Journal of the American Statistical Association, 50, 209–219. doi: 10.1080/01621459.1955.10501259
2. Kumaraswamy P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46, 79–88. doi: 10.1016/0022-1694(80)90036-0
3. Mazucheli J., Menezes A. F. B. and Chakraborty S. (2019). On the one parameter unit-Lindley distribution and its associated regression model for proportion data. Journal of Applied Statistics, 46, 700–714. doi: 10.1080/02664763.2018.1511774
4. Ghitany M. E., Mazucheli J., Menezes A. F. B. and Alqallaf F. (2018). The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Communications in Statistics-Theory and Methods, 1–19.
5. Mazucheli J., Menezes A. F. and Dey S. (2018). The unit-Birnbaum-Saunders distribution with applications. Chilean Journal of Statistics (ChJS), 9, 47–57.
6. Pourdarvish A., Mirmostafaee S. M. T. K. and Naderi K. (2015). The exponentiated Topp-Leone distribution: Properties and application. Journal of Applied Environmental and Biological Sciences, 5, 251–6.
7. Khan M. S., King R. and Hudson I. L. (2016). Transmuted kumaraswamy distribution. Statistics in Transition new series, 17, 183–210. doi: 10.21307/stattrans-2016-013
8. Altun E. and Hamedani G. G. (2018). The log-xgamma distribution with inference and application. Journal de la Société Française de Statistique, 159, 40–55.
9. Altun E. (2019). The log-weighted exponential regression model: alternative to the beta regression model. Communications in Statistics-Theory and Methods. Forthcoming.
10. Altun E. and Cordeiro G. M. (2020). The unit-improved second-degree Lindley distribution: inference and regression modeling. Computational Statistics, 35(1), 259–279. doi: 10.1007/s00180-019-00921-y
11. Abd-Elrahman A. M. (2013). Utilizing ordered statistics in lifetime distributions production: a new lifetime distribution and applications. Journal of Probability and Statistical Science, 11, 153–164.
12. Butler R. J. and McDonald J. B. (1989). Using incomplete moments to measure inequality. Journal of Econometrics, 42, 109–119. doi: 10.1016/0304-4076(89)90079-1
13. Zeng Q., Wen H., Huang H., Pei X. and Wong S. C. (2017). A multivariate random-parameters Tobit model for analyzing highway crash rates by injury severity. Accident Analysis & Prevention, 99, 184–191. doi: 10.1016/j.aap.2016.11.018
14. Zeng Q., Wen H., Wong S. C., Huang H., Guo Q. and Pei X. (2020). Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model. Journal of Transportation Safety & Security, 12(4), 566–585. doi: 10.4271/2016-01-1439
15. Dunn P. K. and Smyth G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236–244. doi: 10.1080/10618600.1996.10474708
16. Chen F., Chen S. and Ma X. (2018). Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research, 65, 153–159. doi: 10.1016/j.jsr.2018.02.010
17. Chen F., Chen S. and Ma X. (2016). Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. International Journal of Environmental Research and Public Health, 13(6), 609. doi: 10.3390/ijerph13060609

PLoS One. doi: 10.1371/journal.pone.0245627.r001

Decision Letter 0

Feng Chen

9 Dec 2020

PONE-D-20-33907

A new regression model for bounded response variable: an alternative to the beta and unit-Lindley regression models

PLOS ONE

Dear Dr. El-Morshedy,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 15 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Feng Chen

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure that all existing datasets used are referenced both in the main text and the Data availability statement. We note that the second dataset does not seem to be referenced.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper proposes a log-Bilal regression model for analyzing bounded response variable. Some statistical properties of the log-Bilal distribution, such as moments and quantiles are derived. The outperformance of the proposed model over the unit-Lindley and beta regression models in terms of model fit is demonstrated by the empirical studies using two real-world datasets. While the proposed model sounds reasonably, the English writing is poor. There are a number of grammar errors and improper expressions in the manuscript, where the language requires professional proofreading.

In the second paragraph, the authors stated that “Our aim is... to remove the deficiencies of the existing distributions...” What are the deficiencies? They should be illustrated explicitly, as they reveal the research gap and imply the potential contributions of this research.

More references on the model comparison criteria (e.g., MSE and BIC) should be added, such as:

A multivariate random parameters Tobit model for analyzing highway crash rate by injury severity. Accident Analysis and Prevention, 2017, 99: 184-191.

Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence. Accident Analysis and Prevention, 2019, 132: 1-6.

Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model. Journal of Transportation Safety and Security, 2020, 12(4): 566-585.

Besides, the Conclusion is too short. The limitations of the current research or some directions for future research should be presented in this section.

Reviewer #2: The topic of this paper is interesting. The methods are sound. The results are meaningful and useful. There are several suggestions to improve this paper.

1. The English of this paper needs to be polished.

2. When discussing maximum likelihood, AIC and BIC, references are needed, for example the following ones.

[1] Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. 2018, JOURNAL OF SAFETY RESEARCH. 65: 153-159.

[2] “Crash Frequency Modeling Using Real-Time Environmental and Traffic Data and Unbalanced Panel Data Models”, International Journal of Environmental Research and Public Health, 2016, 13(6), 609.

AIC, BIC, log-likelihood

[3] “Investigating the Differences of Single- and Multi-vehicle Accident Probability Using Mixed Logit Model”, Journal of Advanced Transportation, 2018, UNSP 2702360.

3. The conclusion part is too simple. At least, the future direction of similar studies could be added.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jan 22;16(1):e0245627. doi: 10.1371/journal.pone.0245627.r002

Author response to Decision Letter 0

25 Dec 2020

Dear Professor Feng Chen,

We have prepared the revision of our paper "A new regression model for bounded response variable: an alternative to the beta and unit-Lindley regression models" taking into account all comments of the reviewers. We thank the reviewers for their time and important suggestions and criticisms, which greatly improved our manuscript.

It will make your task substantially easier if we itemize the changes made to the manuscript during the revision. We now answer the questions/comments in the order they appeared in the reports and also outline some important changes made in the paper.

We do think that the revised manuscript represents an improved version as compared to the previous version.

Reviewer 1:

1. While the proposed model sounds reasonable, the English writing is poor. There are a number of grammar errors and improper expressions in the manuscript, and the language requires professional proofreading.

Answer: Thank you for the comment. The language of the manuscript has been corrected by the English Editing Service.

2. In the second paragraph, the authors stated that “Our aim is... to remove the deficiencies of the existing distributions...” What are the deficiencies? They should be illustrated explicitly, as they reveal the research gap and imply the potential contributions of this research.

Answer: Thank you for the comment. We clarified the deficiencies of the existing models and emphasized the contribution of the proposed model.

3. More references on the model comparison criteria (e.g., MSE and BIC) should be added, such as:

A multivariate random parameters Tobit model for analyzing highway crash rate by injury severity. Accident Analysis and Prevention, 2017, 99: 184-191.

Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence. Accident Analysis and Prevention, 2019, 132: 1-6.

Answer: Thank you for the comment. Done.

4. Besides, the Conclusion is too short. The limitations of the current research or some directions for future research should be presented in this section.

Answer: Thank you for the comment. The future research plan has been added.

Reviewer 2:

1. The English of this paper needs to be polished.

Answer: Thank you for your comment. The language of the manuscript has been corrected by the English Editing Service.

2. When talking about the Maximum likelihood, AIC and BIC, references are needed. For example, the following ones:

1. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. 2018, JOURNAL OF SAFETY RESEARCH. 65: 153-159.

2. Crash Frequency Modeling Using Real-Time Environmental and Traffic Data and Unbalanced Panel Data Models, International Journal of Environmental Research and Public Health, 2016, 13(6), 609.

3. Investigating the Differences of Single- and Multi-vehicle Accident Probability Using Mixed Logit Model, Journal of Advanced Transportation, 2018, UNSP 2702360.

Answer: Thank you for your comment. Done.

3. The conclusion part is too simple. At least, the future direction of similar studies could be added.

Answer: Thank you for your comment. The future research plan has been added in the conclusion section.

All minor corrections have been considered and acted upon. All typos have been corrected. We thank you, the associate editor and the reviewers again for the constructive comments and hope that the revision is now appropriate for PLOS ONE.

Please do not hesitate to contact me at the address above or by e-mail if you have any questions.

I look forward to hearing from you on this revised version.

Yours Sincerely,

Attachment

Submitted filename: reply letter.tex


PLoS One. doi: 10.1371/journal.pone.0245627.r003

Decision Letter 1

Feng Chen

5 Jan 2021

A new regression model for bounded response variable: an alternative to the beta and unit-Lindley regression models

PONE-D-20-33907R1

Dear Dr. El-Morshedy,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Feng Chen

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. doi: 10.1371/journal.pone.0245627.r004

Acceptance letter

Feng Chen

11 Jan 2021

PONE-D-20-33907R1

A new regression model for bounded response variable: an alternative to the beta and unit-Lindley regression models

Dear Dr. El-Morshedy:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Feng Chen

Academic Editor

PLOS ONE

Associated Data


Data Availability Statement

The data sets are available from OECD (https://stats.oecd.org/Index.aspx?DataSetCode=BLI).

1. Topp C. W. and Leone F. C. (1955). A family of J-shaped frequency functions. Journal of the American Statistical Association, 50, 209–219. doi:10.1080/01621459.1955.10501259

2. Kumaraswamy P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46, 79–88. doi:10.1016/0022-1694(80)90036-0

3. Mazucheli J., Menezes A. F. B. and Chakraborty S. (2019). On the one parameter unit-Lindley distribution and its associated regression model for proportion data. Journal of Applied Statistics, 46, 700–714. doi:10.1080/02664763.2018.1511774

4. Ghitany M. E., Mazucheli J., Menezes A. F. B. and Alqallaf F. (2018). The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Communications in Statistics-Theory and Methods, 1–19.

5. Mazucheli J., Menezes A. F. and Dey S. (2018). The unit-Birnbaum-Saunders distribution with applications. Chilean Journal of Statistics (ChJS), 9, 47–57.

6. Pourdarvish A., Mirmostafaee S. M. T. K. and Naderi K. (2015). The exponentiated Topp-Leone distribution: Properties and application. Journal of Applied Environmental and Biological Sciences, 5, 251–6.

7. Khan M. S., King R. and Hudson I. L. (2016). Transmuted Kumaraswamy distribution. Statistics in Transition new series, 17, 183–210. doi:10.21307/stattrans-2016-013

8. Altun E. and Hamedani G. G. (2018). The log-xgamma distribution with inference and application. Journal de la Société Française de Statistique, 159, 40–55.

9. Altun E. (2019). The log-weighted exponential regression model: alternative to the beta regression model. Communications in Statistics-Theory and Methods. Forthcoming.

10. Altun E. and Cordeiro G. M. (2020). The unit-improved second-degree Lindley distribution: inference and regression modeling. Computational Statistics, 35(1), 259–279. doi:10.1007/s00180-019-00921-y

11. Abd-Elrahman A. M. (2013). Utilizing ordered statistics in lifetime distributions production: a new lifetime distribution and applications. Journal of Probability and Statistical Science, 11, 153–164.

12. Butler R. J. and McDonald J. B. (1989). Using incomplete moments to measure inequality. Journal of Econometrics, 42, 109–119. doi:10.1016/0304-4076(89)90079-1

13. Zeng Q., Wen H., Huang H., Pei X. and Wong S. C. (2017). A multivariate random-parameters Tobit model for analyzing highway crash rates by injury severity. Accident Analysis & Prevention, 99, 184–191. doi:10.1016/j.aap.2016.11.018

14. Zeng Q., Wen H., Wong S. C., Huang H., Guo Q. and Pei X. (2020). Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model. Journal of Transportation Safety & Security, 12(4), 566–585. doi:10.4271/2016-01-1439

15. Dunn P. K. and Smyth G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236–244. doi:10.1080/10618600.1996.10474708

16. Chen F., Chen S. and Ma X. (2018). Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research, 65, 153–159. doi:10.1016/j.jsr.2018.02.010

17. Chen F., Chen S. and Ma X. (2016). Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. International Journal of Environmental Research and Public Health, 13(6), 609. doi:10.3390/ijerph13060609
