Journal of Applied Statistics. 2020 Aug 3;48(16):3174–3192. doi: 10.1080/02664763.2020.1796936

The unit extended Weibull families of distributions and its applications

Renata Rojas Guerra, Fernando A Peña-Ramírez, Marcelo Bourguignon
PMCID: PMC9041710  PMID: 35707261

Abstract

In this paper, two new general families of distributions supported on the unit interval are introduced. The proposed families include several known models as special cases and define at least twenty (each one) new special models. Since the list of well-being indicators may include several double bounded random variables, the applicability for modeling those is the major practical motivation for introducing the distributions on those families. We propose a parametrization of the new families in terms of the median and develop a shiny application to provide interactive density shape illustrations for some special cases. Various properties of the introduced families are studied. Some special models in the new families are discussed. In particular, the complementary unit Weibull distribution is studied in some detail. The method of maximum likelihood for estimating the model parameters is discussed. An extensive Monte Carlo experiment is conducted to evaluate the performances of these estimators in finite samples. Applications to the literacy rate in Brazilian and Colombian municipalities illustrate the usefulness of the two new families for modeling well-being indicators.

Keywords: Extended Weibull distribution, maximum likelihood estimation, moments, unit distributions, well-being indicators

AMS CLASSIFICATION: 60E05

1. Introduction

When analyzing the elements of the standard of living, Sen [25] claimed that a ‘good life’ could be measured through indicators that consider the actual outcomes of people's decisions and also their capabilities (the opportunities they have). The gross domestic product is usually used as a proxy for capabilities, and social indicators are used to measure actual outcomes (see Royuela and García [23]). The list of well-being indicators may include several double-bounded random variables, such as infant mortality, literacy and murder rates, telephone, television, and internet availability, and the human development index. Those indicators also represent essential aspects of the international development agenda [28]. In this context, it is necessary to consider probability distributions that take those characteristics into account.

Some double-bounded distributions that have been widely studied in the literature are the classical beta and Kumaraswamy (Kw) distributions. Much theoretical work has concentrated on the use of those models. However, the beta and Kw distributions are not always suitable for modeling well-being indicators (see Section 7). On the other hand, only a few papers have dealt with other distributions supported on the unit interval. Gómez-Déniz et al. [6] studied the log-Lindley distribution. Mazucheli et al. [14] discussed the unit Birnbaum-Saunders distribution. Mazucheli et al. [12] defined the unit-Lindley distribution, and Ghitany et al. [5] the unit inverse-Gaussian distribution. Recently, Altun and Cordeiro [2] introduced the unit-improved second-degree Lindley distribution.

In this paper, we introduce a new family of distributions for modeling random variables with support on the unit interval. The so-called unit extended Weibull (UEW) family may also be considered to model double-bounded variates. We provide a comprehensive account of the mathematical properties of the proposed family of distributions. The new family provides a rich source of alternative distributions for analyzing bounded data. Additionally, the complementary unit extended Weibull (CUEW) family is also derived. We note five motivations for the proposed families of distributions:

  1. The proposed families define at least forty new special models;

  2. Some distributions commonly used for parametric models on the unit interval are special cases of the proposed families, such as the unit Weibull [15,16] distribution;

  3. The expected value of the proposed models can be obtained in closed form;

  4. The proposed family of distributions is median-parametrized, facilitating the interpretation of its location parameter;

  5. Simulations and real data sets show the good performance of these new models (see Section 7).

The proposed families are obtained from a variable transformation in the extended Weibull (EW) class of distributions, pioneered by [8]. Its cumulative distribution function (cdf) is given by

$$G(x) = 1 - \exp\{-\alpha H(x;\xi)\}, \tag{1}$$

where $x>0$, $\alpha>0$ and $H(x;\xi)$ is a non-negative monotonically increasing function which depends on the parameter vector $\xi$. The corresponding probability density function (pdf) is given by

$$g(x) = \alpha\, h(x;\xi)\exp\{-\alpha H(x;\xi)\}, \tag{2}$$

where $h(x;\xi)$ is the derivative of $H(x;\xi)$ with respect to $x$. We emphasize that several well-known distributions can be obtained from different expressions of $H(x;\xi)$ and refer the reader to [24] for a detailed survey on the EW special models, with corresponding $H(x;\xi)$ and $h(x;\xi)$ functions. Nadarajah and Kotz [21] and Pham and Lai [22] also give more details on this family.

The applicability for modeling well-being indicators is the major practical motivation for introducing the distributions in those families. Section 7 illustrates their relevance by means of applications to the literacy rates of Brazilian and Colombian municipalities. Literacy levels are related to quality education and have been an international concern. At the World Education Forum, in Dakar, Senegal, 2000, 164 governments signed a global commitment to provide quality basic education for all children, youth, and adults. One of the six Dakar goals was to halve the illiteracy rate by 2015. By adopting these goals, Brazil and Colombia joined the group of countries committed to this achievement [30]. The literacy rate is also a useful indicator of poverty, often considered when evaluating the overall standard of living in a country. Messias [17] verified that this variable is strongly associated with life expectancy in Brazil. Ahnen [1] emphasizes that the literacy rate can be a control variable for Brazilian police violence. Massa [11] found statistically significant associations between self-rated health and area-level literacy rates in adults from the 27 Brazilian capitals. Finally, Royuela and García [23] consider it to examine quality-of-life convergence in Colombia.

The rest of the paper is organized as follows. In Section 2, we define the new families. Section 3 provides some general properties, and Section 4 the parameter estimation by maximum likelihood. Section 5 presents some special distributions obtained from the UEW and CUEW classes. Some general properties and a simulation study of a special model, the complementary unit Weibull distribution, are presented in Section 6. Empirical applications and concluding remarks are presented in Sections 7 and 8, respectively.

2. The unit extended Weibull family

Let $X$ be a non-negative random variable which follows an EW distribution with cdf and pdf in (1) and (2), respectively, denoted $X\sim\mathrm{EW}(\alpha,\xi)$. By considering the transformation $Y=e^{-X}$, we derive the UEW class of distributions. Since $F_Y(y)=P(e^{-X}\le y)=1-G(-\log(y))$, the cdf of the new family is

$$F_Y(y) = \exp\{-\alpha H(-\log(y);\xi)\}, \tag{3}$$

where $y\in(0,1)$, $\alpha>0$ is a shape parameter, and $H(-\log(y);\xi)$ is obtained by applying this transformation to the random variable $X$. In the supplementary material, we prove that $\alpha$ is a location and scale-invariant measure. The vector $\xi$ is a vector of shape parameters that depends on the chosen $H(-\log(y);\xi)$ function. We develop a shiny application to provide interactive plots and illustrate the density shapes of some special cases upon variations in $\xi$.

The pdf corresponding to (3) is

$$f_Y(y) = \frac{\alpha}{y}\,h(-\log(y);\xi)\exp\{-\alpha H(-\log(y);\xi)\}, \tag{4}$$

where $h(-\log(y);\xi)$ is the derivative of $H(x;\xi)$ with respect to $x$ evaluated at $x=-\log(y)$. Hereafter, let $Y$ be a random variable having the UEW pdf (4) with parameters $\alpha$ and $\xi$, say $Y\sim\mathrm{UEW}(\alpha,\xi)$. The quantile function (qf) of the UEW family can be expressed in terms of $H^{-1}(\cdot)$, which is the inverse function of $H(\cdot)$. Therefore, the qf of $Y$ has the form

$$Q_Y(u;\alpha,\xi) = \exp\left\{-H^{-1}\left(-\frac{\log(u)}{\alpha};\xi\right)\right\}. \tag{5}$$

The quantities $H(-\log(y);\xi)$, $h(-\log(y);\xi)$, and the corresponding parameter vectors for some EW special models are presented in the paper supplementary material. By replacing these quantities in (4), we obtain a new unit distribution in the UEW family.
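Equations (3)–(5) can be sketched generically in code. The following is a minimal Python illustration, not the authors' implementation: the helper name `make_uew` is hypothetical, and the Weibull-type choice $H(x)=x^{\gamma}$ with the parameter values shown is only an assumed example of a parent model.

```python
import math

def make_uew(H, h, H_inv, alpha):
    """Build (cdf, pdf, qf) of a UEW member from H, its derivative h and its inverse H_inv.

    Hypothetical helper: it simply transcribes Equations (3), (4) and (5).
    """
    def cdf(y):
        return math.exp(-alpha * H(-math.log(y)))

    def pdf(y):
        x = -math.log(y)
        return alpha / y * h(x) * math.exp(-alpha * H(x))

    def qf(u):
        return math.exp(-H_inv(-math.log(u) / alpha))

    return cdf, pdf, qf

# Assumed parent model for illustration: Weibull-type H(x) = x**gamma_
gamma_ = 2.0
H = lambda x: x ** gamma_
h = lambda x: gamma_ * x ** (gamma_ - 1.0)
H_inv = lambda v: v ** (1.0 / gamma_)

cdf, pdf, qf = make_uew(H, h, H_inv, alpha=1.5)
y = qf(0.3)            # quantile of order 0.3
round_trip = cdf(y)    # should recover 0.3 up to rounding error
```

Passing `H`, `h` and `H_inv` as functions mirrors the way the family is indexed by the choice of $H(x;\xi)$; any other parent model from the EW class could be plugged in the same way.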

Next, we provide a different parameterization of the UEW class distributions, in which one of its parameters corresponds to the median. The proposition below refers to the UEW median-based parameterization.

Proposition 2.1

Let Y be a random variable with cdf given by

$$F_Y(y) = 2^{-H(-\log(y);\xi)/H(-\log(\mu);\xi)}, \tag{6}$$

where $y\in(0,1)$. Then $Y$ belongs to the UEW class of distributions, and $\mu\in(0,1)$ is a location parameter which corresponds to the median of $Y$.

Proof.

The result holds by setting $\alpha=\log(2)/H(-\log(\mu);\xi)$ in Equation (3) to get (6). Therefore, the qf of $Y$ is given by

$$Q_Y(u;\mu,\xi) = \exp\left\{-H^{-1}\left(-\frac{\log(u)}{\log(2)}\,H(-\log(\mu);\xi)\right)\right\}, \tag{7}$$

and it follows that, by taking $u=0.5$ in Equation (7), $\mu$ is the median of $Y$. This completes the proof.

The pdf of $Y$ can be written in the median-based parameterization as

$$f_Y(y) = \frac{\log(2)\,h(-\log(y);\xi)}{y\,H(-\log(\mu);\xi)}\,2^{-H(-\log(y);\xi)/H(-\log(\mu);\xi)},$$

where $y\in(0,1)$.
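Proposition 2.1 can be checked numerically. The sketch below is an illustration under assumptions: the Gompertz-type function $H(x)=e^{\beta x}-1$ is inferred from the unit Gompertz cdf of Section 5.1, the helper name `alpha_from_median` is hypothetical, and the values of `beta` and `mu` are arbitrary.

```python
import math

def alpha_from_median(mu, H):
    """alpha = log(2) / H(-log(mu)), the substitution used in the proof of Proposition 2.1."""
    return math.log(2.0) / H(-math.log(mu))

beta = 1.5                                   # illustrative shape value
H = lambda x: math.exp(beta * x) - 1.0       # assumed Gompertz-type H
H_inv = lambda v: math.log1p(v) / beta       # inverse of H

mu = 0.35
alpha = alpha_from_median(mu, H)

# Equation (3) evaluated at y = mu should give 1/2
cdf_at_mu = math.exp(-alpha * H(-math.log(mu)))

# Equation (5) evaluated at u = 0.5 should return mu
median = math.exp(-H_inv(-math.log(0.5) / alpha))
```

Both checks hold for any admissible $H$, which is what makes $\mu$ directly interpretable as the median.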

Analogously, let $X\sim\mathrm{EW}(\alpha,\xi)$ and now consider the transformation $Z=1-e^{-X}$.

Under the above transformation, we may derive the CUEW class of distributions. A similar approach was considered by [7] for obtaining a second kind of unit-Gamma distribution. Note that the CUEW family can also be derived by taking the transformation $Z=1-Y$. Thus, the cdf of this alternative unit family is

$$F_Z(z) = 1 - F_Y(1-z) = 1 - \exp\{-\alpha H[-\log(1-z);\xi]\}, \tag{8}$$

and the corresponding pdf reduces to

$$f_Z(z) = f_Y(1-z) = \frac{\alpha}{1-z}\,h(-\log(1-z);\xi)\exp\{-\alpha H(-\log(1-z);\xi)\}, \tag{9}$$

where $h(-\log(1-z);\xi)$ is the derivative of $H(x;\xi)$ with respect to $x$ evaluated at $x=-\log(1-z)$. Hereafter, let $Z$ be a random variable having the CUEW pdf (9) with parameters $\alpha$ and $\xi$, say $Z\sim\mathrm{CUEW}(\alpha,\xi)$. The qf of $Z$ has the form

$$Q_Z(u;\alpha,\xi) = 1 - \exp\left\{-H^{-1}\left(-\frac{\log(1-u)}{\alpha};\xi\right)\right\}. \tag{10}$$

The quantities $H(-\log(1-z);\xi)$, $h(-\log(1-z);\xi)$, and the corresponding parameter vectors for some EW special models are presented in the paper supplementary material. By replacing these quantities in (9), we obtain a new unit distribution in the CUEW family.

The following result shows an alternative parametrization of the CUEW class of distributions, in which one of its parameters corresponds to the median of the random variable Z.

Proposition 2.2

Let Z be a random variable with cdf given by

$$F_Z(z) = 1 - 2^{-H[-\log(1-z);\xi]/H[-\log(1-\mu);\xi]}, \tag{11}$$

where $z\in(0,1)$. Then $Z$ belongs to the CUEW class of distributions, and $\mu\in(0,1)$ is a location parameter which corresponds to the median of $Z$.

Proof.

The result holds by setting $\alpha=\log(2)/H(-\log(1-\mu);\xi)$ in Equation (8) to get (11). Therefore, the qf of $Z$ is given by

$$Q_Z(u;\mu,\xi) = 1 - \exp\left\{-H^{-1}\left[-\frac{\log(1-u)}{\log(2)}\,H(-\log(1-\mu);\xi)\right]\right\}, \tag{12}$$

and it follows that, by taking $u=0.5$ in Equation (12), $\mu$ is the median of $Z$. This completes the proof.

The pdf of $Z$ can be written in the median-based parameterization, by substituting this value of $\alpha$ in (9), as

$$f_Z(z) = \frac{\log(2)\,h(-\log(1-z);\xi)}{(1-z)\,H(-\log(1-\mu);\xi)}\,2^{-H(-\log(1-z);\xi)/H(-\log(1-\mu);\xi)},$$

where $z\in(0,1)$.

3. General mathematical properties

In this section, we derive some useful statistical quantities for both introduced families, including the raw and incomplete moments. All the results of this section can be easily extended to the UEW and CUEW median-based parametrizations by replacing $\alpha$ with $\log(2)/H(-\log(\mu);\xi)$ and $\log(2)/H(-\log(1-\mu);\xi)$, respectively.

3.1. Moments

Many of the important characteristics and features of a distribution are obtained through ordinary moments. The $s$th moment of $Y$, with $f_Y(y)$ from (4), is given by

$$E(Y^s) = E(e^{-sX}) = M_X(-s), \tag{13}$$

where $M_X(t)=\int_0^{\infty}e^{tx}f_X(x)\,dx$ is the moment generating function of $X\sim\mathrm{EW}(\alpha,\xi)$. Similar computations can be done for the CUEW family. Note that the $s$th moment of $Z$ can be written as $E(Z^s)=E[(1-Y)^s]$. Using the binomial theorem and the result in (13), it reduces to

$$E(Z^s) = \sum_{j=0}^{s}\binom{s}{j}(-1)^j M_X(-j). \tag{14}$$
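Equation (14) can be verified in a case where everything is available in closed form. The sketch below assumes the simplest parent, $H(x)=x$ (an exponential $X$ with rate $\alpha$), for which $M_X(t)=\alpha/(\alpha-t)$ and $Z=1-e^{-X}$ follows a Beta$(1,\alpha)$ distribution; this special case and the function names are chosen only for illustration and are not taken from the paper.

```python
import math

def mgf_exponential(t, alpha):
    """M_X(t) for X exponential with rate alpha (the case H(x) = x); valid for t < alpha."""
    return alpha / (alpha - t)

def cuew_moment(s, mgf):
    """E(Z^s) from Equation (14): sum_j C(s, j) (-1)^j M_X(-j)."""
    return sum(math.comb(s, j) * (-1) ** j * mgf(-j) for j in range(s + 1))

alpha = 2.5
m1 = cuew_moment(1, lambda t: mgf_exponential(t, alpha))
m2 = cuew_moment(2, lambda t: mgf_exponential(t, alpha))

# Known Beta(1, alpha) moments for comparison
beta_m1 = 1.0 / (1.0 + alpha)
beta_m2 = 2.0 / ((alpha + 1.0) * (alpha + 2.0))
```

The values `m1` and `m2` agree with the Beta$(1,\alpha)$ moments, which is a quick sanity check on the alternating binomial sum.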

3.2. Incomplete moments

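The $s$th incomplete moment of $Y$ is defined as $T_Y(r,s)=\int_0^r y^s f_Y(y)\,dy$. Taking $f_Y(y)$ from (4), we have that

$$T_Y(r,s) = \alpha\int_0^r y^{s-1}h(-\log(y);\xi)\exp\{-\alpha H(-\log(y);\xi)\}\,dy.$$

Setting $u=\alpha H(-\log(y);\xi)$, we have $du/dy=-\alpha\,y^{-1}h(-\log(y);\xi)$ and $y=\exp\{-H^{-1}(u/\alpha;\xi)\}$, and then

$$T_Y(r,s) = \int_{\alpha H(-\log(r);\xi)}^{\infty}\exp\{-[u+sH^{-1}(u/\alpha;\xi)]\}\,du. \tag{15}$$

Using the relationship between the introduced families, the $s$th incomplete moment of $Z$ is given by $T_Z(r,s)=\int_{1-r}^{1}(1-y)^s f_Y(y)\,dy$. Using the binomial theorem, and after some algebra, we can write

$$T_Z(r,s) = E(Z^s) - \sum_{j=0}^{s}\binom{s}{j}(-1)^j\,T_Y(1-r,j). \tag{16}$$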

4. Maximum likelihood estimation

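The conventional likelihood estimation techniques can be applied to estimate the parameters of the UEW and CUEW families. Let $Y_1,\ldots,Y_n$ be a random sample of size $n$ from a UEW family distribution, with observed values $y_1,\ldots,y_n$. Thus, the log-likelihood function for the parameter vector $\Theta=(\alpha,\xi)$ can be written as

$$\ell(\Theta) = n\log\alpha - \sum_{i=1}^{n}\log(y_i) + \sum_{i=1}^{n}\log[h(-\log(y_i);\xi)] - \alpha\sum_{i=1}^{n}H(-\log(y_i);\xi). \tag{17}$$

The components of the score function $U(\Theta)=(U_\alpha,U_{\xi_k})$ are given by

$$U_\alpha = \frac{n}{\alpha} - \sum_{i=1}^{n}H(-\log(y_i);\xi)$$

and

$$U_{\xi_k} = \sum_{i=1}^{n}\frac{1}{h(-\log(y_i);\xi)}\frac{\partial h(-\log(y_i);\xi)}{\partial\xi_k} - \alpha\sum_{i=1}^{n}\frac{\partial H(-\log(y_i);\xi)}{\partial\xi_k}.$$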

Setting $U_\alpha$ and $U_{\xi_k}$ equal to zero and solving these equations simultaneously yields the maximum likelihood estimator (MLE), $\hat\Theta=(\hat\alpha,\hat\xi)$, of $\Theta$. These equations cannot be solved analytically. Statistical software can be used to solve them numerically using iterative methods such as Newton-Raphson type algorithms.

However, for fixed $\xi$, it is possible to obtain a semi-closed MLE of $\alpha$. From $U_\alpha=0$, the estimator of $\alpha$ is given by

$$\hat\alpha(\xi) = \frac{n}{\sum_{i=1}^{n}H(-\log(y_i);\xi)}.$$

Replacing $\alpha$ with $\hat\alpha(\xi)$ in Equation (17) yields the profile log-likelihood for the parameter vector $\xi$. Maximizing the profile log-likelihood may be simpler since it involves one less parameter. The log-likelihood function for the parameter vector $\Theta=(\mu,\xi)$, of the UEW median-based parameterization, is obtained just by setting $\alpha=\log(2)/H(-\log(\mu);\xi)$ in Equation (17). The component $U_{\xi_k}$ of the score function remains unchanged, and $U_\mu$ is given by

$$U_\mu = \frac{h(-\log(\mu);\xi)}{\mu\,H(-\log(\mu);\xi)}\left[n - \frac{\log(2)}{H(-\log(\mu);\xi)}\sum_{i=1}^{n}H(-\log(y_i);\xi)\right].$$

For the special models where $\xi=\emptyset$, it is possible to obtain a closed-form MLE for $\mu$. The results for the CUEW family can be derived through similar computations. It is easy to note that the log-likelihood of $Z$ is obtained by replacing $y_i$ with $1-z_i$ and $\mu$ with $1-\mu$ in (17).
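The semi-closed estimator of $\alpha$ for fixed $\xi$ can be sketched as follows. This is only an illustration: the data values are made up, the Weibull-type $H(x)=x^{\gamma}$ with fixed $\gamma$ plays the role of a known $\xi$, and the function names are hypothetical.

```python
import math

def alpha_hat(ys, H):
    """Semi-closed MLE of alpha for fixed xi: n / sum_i H(-log(y_i))."""
    return len(ys) / sum(H(-math.log(y)) for y in ys)

def score_alpha(alpha, ys, H):
    """Score component U_alpha = n / alpha - sum_i H(-log(y_i))."""
    return len(ys) / alpha - sum(H(-math.log(y)) for y in ys)

ys = [0.12, 0.35, 0.48, 0.61, 0.77, 0.90]   # made-up illustrative observations in (0, 1)
gamma_ = 1.7                                 # xi held fixed (assumed value)
H = lambda x: x ** gamma_                    # assumed Weibull-type H

a_hat = alpha_hat(ys, H)
u_at_hat = score_alpha(a_hat, ys, H)         # should vanish up to rounding error
```

In a profile-likelihood routine, `alpha_hat` would be recomputed at every trial value of $\xi$, so the numerical search only runs over $\xi$.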

5. Some special models

In this section, we provide a few examples of unit distributions that arise as special models of the proposed families and have not yet been defined in the literature. Specifically, the Gompertz and Lomax models are considered as parent distributions in both introduced families. Those models are introduced under the median parametrization given in Section 2. The unit Weibull (UW) distribution was pioneered by [15] using a transformation of a Weibull random variable. We also note that it arises by considering the Weibull distribution as the parent model in the UEW family.

5.1. Unit Gompertz distribution

By considering the Gompertz distribution as the parent model in the UEW family, we obtain the unit Gompertz (UGo) distribution, whose cdf and pdf take the form

$$F_{UGo}(y) = 2^{(y^{-\beta}-1)/(1-\mu^{-\beta})}$$

and

$$f_{UGo}(y) = \frac{\beta\log(2)}{\mu^{-\beta}-1}\,y^{-(\beta+1)}\,2^{(y^{-\beta}-1)/(1-\mu^{-\beta})}, \tag{18}$$

respectively, where $y\in(0,1)$, $\beta>0$ is a shape parameter and $\mu\in(0,1)$ is the median parameter. For the pdf (18), it is easy to verify that

$$\lim_{y\to 0}f_{UGo}(y) = 0 \quad\text{and}\quad \lim_{y\to 1}f_{UGo}(y) = \frac{\beta\log(2)}{\mu^{-\beta}-1}.$$

The shape of the UGo pdf is described by the following proposition.

Proposition 5.1

Let $Y$ be a random variable following the UGo distribution. Then, its density is unimodal with mode at

$$y = \left[\frac{\beta\log(2)}{(\beta+1)(\mu^{-\beta}-1)}\right]^{1/\beta}.$$

The proof of Proposition 5.1 can be found in the paper supplementary material. From Equation (5), the UGo qf is obtained as

$$Q_{UGo}(u) = \left[1+(1-\mu^{-\beta})\frac{\log(u)}{\log(2)}\right]^{-1/\beta}.$$

The UGo first raw moment reduces to

$$E_{UGo}(Y) = 2^{1/(\mu^{-\beta}-1)}\left(\frac{\log(2)}{\mu^{-\beta}-1}\right)^{1/\beta}\Gamma\left(1-\frac{1}{\beta},\frac{\log(2)}{\mu^{-\beta}-1}\right),$$

where $\Gamma(a,b)=\int_b^{\infty}x^{a-1}e^{-x}\,dx$ is the upper incomplete gamma function.
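The UGo cdf, pdf, qf and mode can be transcribed directly into code. The following Python sketch is an illustration only; the function names and the parameter values $\mu=0.6$, $\beta=2$ are arbitrary choices, not taken from the paper.

```python
import math

def ugo_cdf(y, mu, beta):
    """UGo cdf in the median parametrization."""
    return 2.0 ** ((y ** -beta - 1.0) / (1.0 - mu ** -beta))

def ugo_pdf(y, mu, beta):
    """UGo pdf, Equation (18)."""
    const = beta * math.log(2.0) / (mu ** -beta - 1.0)
    return const * y ** -(beta + 1.0) * ugo_cdf(y, mu, beta)

def ugo_qf(u, mu, beta):
    """UGo quantile function."""
    return (1.0 + (1.0 - mu ** -beta) * math.log(u) / math.log(2.0)) ** (-1.0 / beta)

def ugo_mode(mu, beta):
    """Stationary point of the UGo density given in Proposition 5.1."""
    return (beta * math.log(2.0) / ((beta + 1.0) * (mu ** -beta - 1.0))) ** (1.0 / beta)

mu, beta = 0.6, 2.0                # illustrative values
median_check = ugo_cdf(mu, mu, beta)   # 0.5 by construction
mode = ugo_mode(mu, beta)
```

For these values the mode lies inside $(0,1)$, and the density at `mode` exceeds the density at nearby points on either side.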

5.2. Unit Lomax distribution

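By considering the Lomax distribution as the parent model in the UEW family, we derive the unit Lomax (UL) distribution, whose cdf and pdf take the form

$$F_{UL}(y) = \left(1-\beta^{-1}\log(y)\right)^{-\log(2)/\log(1-\beta^{-1}\log(\mu))}$$

and

$$f_{UL}(y) = \frac{\log(2)}{\beta y}\left[\log\left(1-\beta^{-1}\log(\mu)\right)\right]^{-1}\left(1-\beta^{-1}\log(y)\right)^{-\log(2)/\log(1-\beta^{-1}\log(\mu))-1}, \tag{19}$$

respectively, where $y\in(0,1)$, $\beta>0$ is a shape parameter and $\mu\in(0,1)$ is the median parameter. For the pdf (19), it can be verified that

$$\lim_{y\to 0}f_{UL}(y) = \infty \quad\text{and}\quad \lim_{y\to 1}f_{UL}(y) = \frac{\log(2)}{\beta}\left[\log\left(1-\beta^{-1}\log(\mu)\right)\right]^{-1}.$$

Proposition 5.2

Let $Y$ be a random variable following the UL distribution. Then, for $\log(\mu)>\beta[1-2^{1/(\beta-1)}]$, its density is bathtub shaped with minimum at

$$y = \exp\left\{\beta-1-\frac{\log(2)}{\log(1-\beta^{-1}\log(\mu))}\right\}.$$

The proof of Proposition 5.2 can be found in the paper supplementary material. From Equation (5), the UL qf is obtained as

$$Q_{UL}(u) = \exp\left\{\beta\left(1-u^{-\log(1-\beta^{-1}\log(\mu))/\log(2)}\right)\right\}$$

and its first raw moment as

$$E_{UL}(Y) = \frac{\log(2)\,e^{\beta}\,\beta^{\log(2)/\log(1-\beta^{-1}\log(\mu))}}{\log(1-\beta^{-1}\log(\mu))}\,\Gamma\left(-\frac{\log(2)}{\log(1-\beta^{-1}\log(\mu))},\beta\right).$$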
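The UL formulas can be checked in the same way. The sketch below is an illustration with hypothetical function names and arbitrary values $\mu=0.5$, $\beta=0.5$; the shorthand `a` stands for $\log(2)/\log(1-\beta^{-1}\log(\mu))$, the value of $\alpha$ implied by the median parametrization.

```python
import math

def ul_alpha(mu, beta):
    """alpha implied by the median parametrization of the UL model."""
    return math.log(2.0) / math.log(1.0 - math.log(mu) / beta)

def ul_cdf(y, mu, beta):
    return (1.0 - math.log(y) / beta) ** -ul_alpha(mu, beta)

def ul_pdf(y, mu, beta):
    """UL pdf, Equation (19)."""
    a = ul_alpha(mu, beta)
    return a / (beta * y) * (1.0 - math.log(y) / beta) ** (-a - 1.0)

def ul_qf(u, mu, beta):
    a = ul_alpha(mu, beta)
    return math.exp(beta * (1.0 - u ** (-1.0 / a)))

def ul_minimum(mu, beta):
    """Location of the density minimum from Proposition 5.2."""
    return math.exp(beta - 1.0 - ul_alpha(mu, beta))

mu, beta = 0.5, 0.5                 # illustrative values
y_min = ul_minimum(mu, beta)
median_check = ul_cdf(mu, mu, beta)  # 0.5 by construction
```

With these values the density decreases up to `y_min` and increases afterwards, which is the bathtub shape described in Proposition 5.2.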

5.3. Complementary unit Gompertz distribution

By considering the Gompertz distribution as the parent model in the CUEW family, we obtain the complementary unit Gompertz (CUGo) distribution, whose cdf and pdf take the form

$$F_{CUGo}(z) = 1 - 2^{[(1-z)^{-\beta}-1]/[1-(1-\mu)^{-\beta}]}$$

and

$$f_{CUGo}(z) = \frac{\beta\log(2)}{(1-\mu)^{-\beta}-1}\,(1-z)^{-(\beta+1)}\,2^{[(1-z)^{-\beta}-1]/[1-(1-\mu)^{-\beta}]}, \tag{20}$$

respectively, where $z\in(0,1)$, $\beta>0$ is a shape parameter and $\mu\in(0,1)$ is the median parameter.

For the pdf (20), it is easy to verify that

$$\lim_{z\to 0}f_{CUGo}(z) = \frac{\beta\log(2)}{(1-\mu)^{-\beta}-1} \quad\text{and}\quad \lim_{z\to 1}f_{CUGo}(z) = 0.$$

The shape of the CUGo pdf is described by the following proposition.

Proposition 5.3

Let $Z$ be a random variable following the CUGo distribution. Then its density is unimodal with mode at

$$z = 1 - \left[\frac{\beta\log(2)}{(\beta+1)[(1-\mu)^{-\beta}-1]}\right]^{1/\beta}.$$

The proof of Proposition 5.3 can be found in the paper supplementary material. From Equation (10), the CUGo qf is obtained as

$$Q_{CUGo}(u) = 1 - \left[1+\frac{\log(1-u)}{\log(2)}\left[1-(1-\mu)^{-\beta}\right]\right]^{-1/\beta},$$

and its first raw moment as

$$E_{CUGo}(Z) = 1 - 2^{1/[(1-\mu)^{-\beta}-1]}\left(\frac{\log(2)}{(1-\mu)^{-\beta}-1}\right)^{1/\beta}\Gamma\left(1-\frac{1}{\beta},\frac{\log(2)}{(1-\mu)^{-\beta}-1}\right).$$

5.4. Complementary unit Lomax distribution

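By considering the Lomax distribution as the parent model in the CUEW family, we derive the complementary unit Lomax (CUL) distribution, whose cdf and pdf take the form

$$F_{CUL}(z) = 1 - \left[1-\beta^{-1}\log(1-z)\right]^{-\log(2)/\log[1-\beta^{-1}\log(1-\mu)]}$$

and

$$f_{CUL}(z) = \frac{\log(2)}{\beta(1-z)}\left\{\log\left[1-\beta^{-1}\log(1-\mu)\right]\right\}^{-1}\left[1-\beta^{-1}\log(1-z)\right]^{-\log(2)/\log[1-\beta^{-1}\log(1-\mu)]-1}, \tag{21}$$

respectively, where $z\in(0,1)$, $\beta>0$ is a shape parameter and $\mu\in(0,1)$ is the median parameter. For the pdf (21), it is easy to verify that

$$\lim_{z\to 0}f_{CUL}(z) = \frac{\log(2)}{\beta}\left\{\log\left[1-\beta^{-1}\log(1-\mu)\right]\right\}^{-1} \quad\text{and}\quad \lim_{z\to 1}f_{CUL}(z) = \infty.$$

Proposition 5.4

Let $Z$ be a random variable following the CUL distribution. Then, for $\log(1-\mu)>\beta[1-2^{1/(\beta-1)}]$, its density is bathtub shaped with minimum at

$$z = 1 - \exp\left\{\beta-1-\frac{\log(2)}{\log[1-\beta^{-1}\log(1-\mu)]}\right\}.$$

The proof of Proposition 5.4 can be found in the paper supplementary material. From Equation (10), the CUL qf is obtained as

$$Q_{CUL}(u) = 1 - \exp\left\{\beta\left[1-(1-u)^{-\log[1-\beta^{-1}\log(1-\mu)]/\log(2)}\right]\right\}$$

and its first raw moment as

$$E_{CUL}(Z) = 1 - \frac{\log(2)\,e^{\beta}\,\beta^{\log(2)/\log[1-\beta^{-1}\log(1-\mu)]}}{\log[1-\beta^{-1}\log(1-\mu)]}\,\Gamma\left(-\frac{\log(2)}{\log[1-\beta^{-1}\log(1-\mu)]},\beta\right).$$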

6. The complementary unit Weibull distribution and its properties

The two-parameter Weibull distribution [31] is one of the most popular models for non-negative random processes. It has applications in reliability engineering, survival analysis in biomedical sciences, mortality studies, insurance, and social sciences, among others. In this section, we describe some mathematical properties of the complementary unit Weibull (CUW) distribution under the median parametrization given in Section 2. Parameter estimation by the maximum likelihood method is presented, and a simulation study is carried out.

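The median-based CUW cdf and pdf are

$$F_{CUW}(z) = 1 - 2^{-[\log(1-z)/\log(1-\mu)]^{\gamma}}$$

and

$$f_{CUW}(z) = \frac{\gamma\log(2)}{1-z}\,[-\log(1-\mu)]^{-\gamma}\,[-\log(1-z)]^{\gamma-1}\,2^{-[\log(1-z)/\log(1-\mu)]^{\gamma}}, \tag{22}$$

respectively, where $z\in(0,1)$, $\mu\in(0,1)$ is the median and $\gamma>0$ is a shape parameter. Figure 1 illustrates its pdf shapes for some parameter values. From Equation (12), the qf of the CUW distribution is obtained as

$$Q_{CUW}(u) = 1 - (1-\mu)^{[-\log(1-u)/\log(2)]^{1/\gamma}}. \tag{23}$$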
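The CUW cdf, pdf (22) and qf (23) translate into a few lines of code, and the qf gives an inverse-transform sampler. The sketch below is illustrative: the function names, the seed and the values $\mu=0.3$, $\gamma=2$ (one of the scenarios in Table 1) are choices made here, not code from the paper.

```python
import math
import random

def cuw_cdf(z, mu, gamma_):
    ratio = math.log(1.0 - z) / math.log(1.0 - mu)   # positive for z, mu in (0, 1)
    return 1.0 - 2.0 ** (-(ratio ** gamma_))

def cuw_pdf(z, mu, gamma_):
    """CUW pdf, Equation (22)."""
    w = -math.log(1.0 - z)
    scale = -math.log(1.0 - mu)
    kernel = 2.0 ** (-((w / scale) ** gamma_))
    return gamma_ * math.log(2.0) / (1.0 - z) * scale ** -gamma_ * w ** (gamma_ - 1.0) * kernel

def cuw_qf(u, mu, gamma_):
    """CUW quantile function, Equation (23)."""
    expo = (-math.log(1.0 - u) / math.log(2.0)) ** (1.0 / gamma_)
    return 1.0 - (1.0 - mu) ** expo

def cuw_sample(n, mu, gamma_, rng):
    """Inverse-transform sampling: apply the qf to uniform draws."""
    return [cuw_qf(rng.random(), mu, gamma_) for _ in range(n)]

rng = random.Random(2020)                      # arbitrary seed for reproducibility
sample = cuw_sample(1000, 0.3, 2.0, rng)
sample_median = sorted(sample)[len(sample) // 2]   # should be close to mu = 0.3
```

Because $\mu$ is the median, the sample median of a large simulated sample gives an immediate check on the sampler.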

Figure 1. Plots of the CUW pdf.

Based on Equation (14), the $s$th moment of the CUW model is given by

$$E_{CUW}(Z^s) = \sum_{j=0}^{s}\binom{s}{j}(-1)^j\sum_{k=0}^{\infty}\frac{j^k\log^k(1-\mu)}{(\log 2)^{k/\gamma}\,k!}\,\Gamma\left(\frac{k}{\gamma}+1\right). \tag{24}$$

The $h$th cumulant ($\kappa_h$) of the CUW model can be obtained from (24) using well-known relationships. We have that

$$\kappa_h = E_{CUW}(Z^h) - \sum_{k=1}^{h-1}\binom{h-1}{k-1}\kappa_k\,E_{CUW}(Z^{h-k}),$$

where $\kappa_1=E_{CUW}(Z)$. Note that $\kappa_2$ is the variance of the CUW model. The skewness $\gamma_1=\kappa_3/\kappa_2^{3/2}$ and kurtosis $\gamma_2=\kappa_4/\kappa_2^{2}$ follow from the third and fourth standardized cumulants, respectively.
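The cumulant recursion above is generic and can be sketched independently of the CUW model. In the Python illustration below, the function name is hypothetical, and the check uses the raw moments of a unit-rate exponential distribution ($1, 2, 6, 24$), whose cumulants are known to be $1, 1, 2, 6$; this test case is chosen here only for verification.

```python
import math

def cumulants_from_moments(moments):
    """Return [kappa_1, ..., kappa_H] from raw moments [E(Z), ..., E(Z^H)].

    Implements kappa_h = E(Z^h) - sum_{k=1}^{h-1} C(h-1, k-1) kappa_k E(Z^{h-k}).
    """
    raw = [1.0] + list(moments)          # raw[0] = E(Z^0) = 1
    kappa = []
    for h in range(1, len(raw)):
        value = raw[h]
        for k in range(1, h):
            value -= math.comb(h - 1, k - 1) * kappa[k - 1] * raw[h - k]
        kappa.append(value)
    return kappa

# Check on a case with known cumulants (unit-rate exponential)
k1, k2, k3, k4 = cumulants_from_moments([1.0, 2.0, 6.0, 24.0])
skewness = k3 / k2 ** 1.5
kurtosis = k4 / k2 ** 2

# First row of Table 1 (mu = 0.10, gamma = 0.5), using the rounded moments reported there
table_row = cumulants_from_moments([0.2245, 0.1250, 0.0889, 0.0704])
```

Applied to the rounded moments in the first row of Table 1, the second cumulant `table_row[1]` reproduces the reported variance of about 0.0746; the skewness and kurtosis computed from rounded moments agree with the table only approximately.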

Table 1 provides a numerical study by computing the first four moments, variance, $\gamma_1$, and $\gamma_2$ for ten different scenarios. Note that the parameter values chosen are the same as those presented in Figure 1. All the quantities computed are in agreement with the behavior in those plots. Also, these illustrations indicate that the CUW distribution is quite flexible not only for density shapes but also regarding the moments, skewness, and kurtosis. It can accommodate positive and negative values for both skewness and kurtosis coefficients. Combining (15) and (16), and using the proposed median parametrization, the $s$th incomplete moment of the CUW distribution is

$$T_{CUW}(r,s) = E_{CUW}(Z^s) - \sum_{j=0}^{s}\binom{s}{j}(-1)^j\int_{\log 2\,[\log(1-r)/\log(1-\mu)]^{\gamma}}^{\infty}\exp\{-[u+j(u/\alpha)^{1/\gamma}]\}\,du,$$

where $\alpha=\log(2)/[-\log(1-\mu)]^{\gamma}$. Using the exponential expansion of $\exp\{-j(u/\alpha)^{1/\gamma}\}$, and after some algebra, it can be determined as

$$T_{CUW}(r,s) = E_{CUW}(Z^s) - \sum_{j=0}^{s}\binom{s}{j}(-1)^j\sum_{k=0}^{\infty}\frac{j^k\log^k(1-\mu)}{(\log 2)^{k/\gamma}\,k!}\,\Gamma\left(\frac{k}{\gamma}+1,\ \log 2\left[\frac{\log(1-r)}{\log(1-\mu)}\right]^{\gamma}\right).$$

Table 1. First four moments, variance, skewness and kurtosis coefficients for some scenarios of the CUW distribution.

$\mu$ $\gamma$ $E_{CUW}(Z)$ $E_{CUW}(Z^2)$ $E_{CUW}(Z^3)$ $E_{CUW}(Z^4)$ $\kappa_2$ $\gamma_1$ $\gamma_2$
0.10 0.5 0.2245 0.1250 0.0889 0.0704 0.0746 1.3435 0.7287
0.10 0.9 0.1406 0.0374 0.0136 0.0060 0.0177 1.4362 2.1116
0.15 1.5 0.1644 0.0370 0.0101 0.0031 0.0100 0.6986 0.1574
0.30 2.0 0.3029 0.1089 0.0437 0.0190 0.0172 0.1441 −0.5192
0.50 3.0 0.4871 0.2538 0.1389 0.0788 0.0166 −0.4517 −0.1373
0.70 0.7 0.6150 0.4992 0.4387 0.4000 0.1211 −0.4071 −1.3292
0.80 0.9 0.6897 0.5713 0.5040 0.4589 0.0956 −0.7420 −0.7841
0.80 2.5 0.7596 0.6026 0.4919 0.4098 0.0257 −1.1871 1.2709
0.85 2.7 0.8095 0.6757 0.5757 0.4980 0.0204 −1.4738 2.3962
0.90 1.2 0.7897 0.6859 0.6197 0.5722 0.0622 −1.3059 0.7269

6.1. Maximum likelihood estimation

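Let $z_1,\ldots,z_n$ be a random sample of size $n$ from the CUW$(\mu,\gamma)$ distribution. Let $\Theta=(\mu,\gamma)$ be the parameter vector. The log-likelihood function for $\Theta$ can be expressed as

$$\ell(\Theta) = n\log\left\{\gamma\log(2)[-\log(1-\mu)]^{-\gamma}\right\} - \sum_{i=1}^{n}\log(1-z_i) + (\gamma-1)\sum_{i=1}^{n}\log[-\log(1-z_i)] - \frac{\log(2)}{[-\log(1-\mu)]^{\gamma}}\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}. \tag{25}$$

The maximum likelihood estimates can be obtained by directly maximizing Equation (25). Alternatively, we can obtain the score vector $U(\Theta)=(U_\mu,U_\gamma)$, set its components to zero and solve these equations simultaneously. For the CUW distribution, the components of $U(\Theta)$ are given by

$$U_\mu = \frac{\gamma}{(1-\mu)\log(1-\mu)}\left[n - \frac{\log(2)}{[-\log(1-\mu)]^{\gamma}}\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}\right]$$

and

$$U_\gamma = \frac{n}{\gamma} + \sum_{i=1}^{n}\log[-\log(1-z_i)] - n\log[-\log(1-\mu)] - \frac{\log(2)}{[-\log(1-\mu)]^{\gamma}}\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}\log\left[\frac{\log(1-z_i)}{\log(1-\mu)}\right].$$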

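As reported in Section 4, we note that for fixed $\gamma$, the semi-closed MLE of $\mu$ is given by

$$\hat\mu(\gamma) = 1 - \exp\left\{-\left(\frac{\log(2)}{n}\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}\right)^{1/\gamma}\right\}. \tag{26}$$

By replacing $\mu$ by $\hat\mu(\gamma)$ in Equation (25), we obtain the profile log-likelihood function for $\gamma$, expressed as

$$\ell(\gamma) = n\log(\gamma n) - n\log\left\{\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}\right\} - \sum_{i=1}^{n}\log(1-z_i) + (\gamma-1)\sum_{i=1}^{n}\log[-\log(1-z_i)] - n. \tag{27}$$

The score function for (27), $U_\gamma$, is given by

$$U_\gamma = \frac{n}{\gamma} + \sum_{i=1}^{n}\log[-\log(1-z_i)] - n\,\frac{\sum_{i=1}^{n}\log[-\log(1-z_i)]\,[-\log(1-z_i)]^{\gamma}}{\sum_{i=1}^{n}[-\log(1-z_i)]^{\gamma}}.$$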
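The profile approach can be sketched as a one-dimensional root search on the profile score, followed by Equation (26). The Python illustration below swaps in plain bisection for the BFGS routine used in the paper, and the data, function names, search bracket and tolerance are assumptions made for this sketch.

```python
import math

def profile_score(gamma_, w):
    """Profile score for gamma, with w_i = -log(1 - z_i)."""
    n = len(w)
    wg = [wi ** gamma_ for wi in w]
    logs = [math.log(wi) for wi in w]
    return n / gamma_ + sum(logs) - n * sum(g * l for g, l in zip(wg, logs)) / sum(wg)

def fit_cuw(z, lo=1e-3, hi=50.0, tol=1e-10):
    """Bisection on the profile score, then mu from Equation (26)."""
    w = [-math.log(1.0 - zi) for zi in z]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, w) > 0.0:
            lo = mid
        else:
            hi = mid
    g = 0.5 * (lo + hi)
    s = sum(wi ** g for wi in w)
    mu = 1.0 - math.exp(-((math.log(2.0) / len(w)) * s) ** (1.0 / g))
    return mu, g

def cuw_scores(mu, gamma_, z):
    """Full score vector (U_mu, U_gamma) of the log-likelihood (25)."""
    n = len(z)
    scale = -math.log(1.0 - mu)
    w = [-math.log(1.0 - zi) for zi in z]
    c = math.log(2.0) / scale ** gamma_
    u_mu = gamma_ / ((1.0 - mu) * math.log(1.0 - mu)) * (n - c * sum(wi ** gamma_ for wi in w))
    u_gamma = (n / gamma_ + sum(math.log(wi) for wi in w) - n * math.log(scale)
               - c * sum(wi ** gamma_ * math.log(wi / scale) for wi in w))
    return u_mu, u_gamma

z = [0.05, 0.12, 0.20, 0.27, 0.33, 0.41, 0.50, 0.62]   # made-up illustrative data
mu_hat, gamma_hat = fit_cuw(z)
u_mu, u_gamma = cuw_scores(mu_hat, gamma_hat, z)        # both should be close to zero
```

Bisection is sufficient here because the profile score decreases monotonically in $\gamma$, so it has a single root whenever the observations are not all equal; evaluating the full score vector at the result confirms that the profile solution also solves the original score equations.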

6.2. Simulation study

We now present the results of Monte Carlo simulation studies conducted to evaluate the performance of the MLEs of the parameters that index the CUW distribution. The simulations are carried out in the R programming language, using the optim routine with the BFGS quasi-Newton nonlinear optimization algorithm. The inverse transform method is employed to generate a sample of size $n$ from a CUW distribution using (23). We simulate 10,000 Monte Carlo replications, with sample sizes $n\in\{20,50,100,300\}$. Ten different combinations of the parameter vector $\Theta$ are considered. The scenarios are those illustrated in Figure 1 and Table 1. Thus, the chosen parameter values cover different density shapes and also various combinations of skewness and kurtosis coefficients.

The mean estimates, percentage relative bias (RB%), and root mean squared errors (RMSE) are computed by maximizing (27) and taking the MLE of $\mu$ from (26). One advantage of using the profile log-likelihood is that the maximization of (27) is simpler than that of (25), since it involves only one parameter.
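The simulation design can be sketched in a few lines. The Python illustration below is not the authors' R code: it replaces optim/BFGS with bisection on the profile score, uses only 200 replications with $n=50$ for scenario 5 ($\mu=0.5$, $\gamma=3$) to keep the run short, and all function names and the seed are assumptions.

```python
import math
import random

def cuw_qf(u, mu, gamma_):
    """CUW quantile function, Equation (23)."""
    expo = (-math.log(1.0 - u) / math.log(2.0)) ** (1.0 / gamma_)
    return 1.0 - (1.0 - mu) ** expo

def profile_score(gamma_, w):
    n = len(w)
    wg = [wi ** gamma_ for wi in w]
    logs = [math.log(wi) for wi in w]
    return n / gamma_ + sum(logs) - n * sum(g * l for g, l in zip(wg, logs)) / sum(wg)

def fit_cuw(z, lo=1e-3, hi=50.0, tol=1e-8):
    """Profile MLE: bisection for gamma, then mu from Equation (26)."""
    w = [-math.log(1.0 - zi) for zi in z]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, w) > 0.0:
            lo = mid
        else:
            hi = mid
    g = 0.5 * (lo + hi)
    s = sum(wi ** g for wi in w)
    mu = 1.0 - math.exp(-((math.log(2.0) / len(w)) * s) ** (1.0 / g))
    return mu, g

def monte_carlo(mu, gamma_, n, reps, seed=1):
    """Return {name: (mean estimate, RB%, RMSE)} for mu and gamma."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = []
        while len(z) < n:
            u = rng.random()
            if u > 0.0:                      # avoid z = 0, where log(-log(1 - z)) is undefined
                z.append(cuw_qf(u, mu, gamma_))
        estimates.append(fit_cuw(z))
    summary = {}
    for name, true, idx in (("mu", mu, 0), ("gamma", gamma_, 1)):
        vals = [e[idx] for e in estimates]
        mean = sum(vals) / reps
        rb = 100.0 * (mean - true) / true
        rmse = math.sqrt(sum((v - true) ** 2 for v in vals) / reps)
        summary[name] = (mean, rb, rmse)
    return summary

results = monte_carlo(mu=0.5, gamma_=3.0, n=50, reps=200)
```

With so few replications the summaries are noisier than those in Table 2, but the same qualitative pattern is expected: a small bias for $\hat\mu$ and a positive bias for $\hat\gamma$ that shrinks as $n$ grows.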

The results for each generation scheme are reported in Table 2. As expected, the RMSEs tend to decrease as the sample size increases. We also observe that the overall performance of the MLEs is appropriate. Note that when $n=300$, the absolute RB% is less than 1% for both parameter estimates in all the scenarios. In general, the $\hat\mu$ estimates are more accurate than the $\hat\gamma$ estimates. They also tend to present smaller RMSEs, especially when $n=20$.

Table 2. Monte Carlo results for the mean estimates, RB%, and RMSEs of the CUW distribution with 10,000 replications.

Scenario $\mu$ $\gamma$ $n$ Mean($\hat\mu$) Mean($\hat\gamma$) RB%($\hat\mu$) RB%($\hat\gamma$) RMSE($\hat\mu$) RMSE($\hat\gamma$)
1 0.1 0.5 20 0.1094 0.5381 9.3663 7.6129 0.0568 0.1094
      50 0.1032 0.5145 3.1661 2.8967 0.0353 0.0606
      100 0.1011 0.5067 1.0804 1.3404 0.0247 0.0408
      300 0.1000 0.5023 0.0246 0.4585 0.0150 0.0229
2 0.1 0.9 20 0.1014 0.9694 1.3847 7.7074 0.0303 0.1980
      50 0.0999 0.9246 −0.0722 2.7344 0.0199 0.1096
      100 0.1000 0.9140 −0.0176 1.5596 0.0143 0.0738
      300 0.0993 0.9037 −0.6941 0.4075 0.0105 0.0418
3 0.15 1.5 20 0.1500 1.6142 0.0273 7.6133 0.0271 0.3288
      50 0.1494 1.5479 −0.3772 3.1947 0.0190 0.1848
      100 0.1489 1.5195 −0.7447 1.3028 0.0157 0.1221
      300 0.1495 1.5075 −0.3650 0.4983 0.0100 0.0691
4 0.3 2.0 20 0.2982 2.1500 −0.6097 7.5018 0.0390 0.4364
      50 0.2983 2.0574 −0.5622 2.8685 0.0264 0.2427
      100 0.2988 2.0295 −0.3891 1.4747 0.0201 0.1632
      300 0.2983 2.0087 −0.5669 0.4353 0.0188 0.0921
5 0.5 3.0 20 0.4981 3.2285 −0.3823 7.6171 0.0349 0.6670
      50 0.4979 3.0818 −0.4291 2.7282 0.0281 0.3580
      100 0.4975 3.0376 −0.5095 1.2544 0.0242 0.2443
      300 0.4989 3.0145 −0.2177 0.4846 0.0151 0.1379
6 0.7 0.7 20 0.6847 0.7588 −2.1872 8.4058 0.1455 0.1571
      50 0.6884 0.7214 −1.6610 3.0541 0.1104 0.0847
      100 0.6889 0.7114 −1.5920 1.6234 0.0917 0.0579
      300 0.6938 0.7056 −0.8787 0.7933 0.0641 0.0321
7 0.8 0.9 20 0.7808 0.9725 −2.4046 8.0553 0.1215 0.2028
      50 0.7887 0.9273 −1.4069 3.0367 0.0852 0.1085
      100 0.7922 0.9119 −0.9774 1.3259 0.0686 0.0738
      300 0.7946 0.9043 −0.6741 0.4759 0.0566 0.0410
8 0.8 2.5 20 0.7950 2.6931 −0.6225 7.7256 0.0475 0.5538
      50 0.7953 2.5720 −0.5871 2.8793 0.0407 0.3062
      100 0.7976 2.5369 −0.3048 1.4757 0.0291 0.2035
      300 0.7987 2.5123 −0.1610 0.4934 0.0181 0.1154
9 0.85 2.7 20 0.8456 2.9057 −0.5143 7.6168 0.0402 0.5885
      50 0.8459 2.7742 −0.4882 2.7496 0.0328 0.3268
      100 0.8473 2.7411 −0.3166 1.5238 0.0255 0.2206
      300 0.8488 2.7101 −0.1457 0.3753 0.0165 0.1239
10 0.9 1.2 20 0.8849 1.2906 −1.6780 7.5480 0.0833 0.2609
      50 0.8878 1.2325 −1.3589 2.7100 0.0704 0.1464
      100 0.8930 1.2164 −0.7791 1.3626 0.0536 0.0993
      300 0.8965 1.2053 −0.3938 0.4416 0.0394 0.0550

In the paper supplementary material, we provide boxplots that illustrate the convergence of $\hat\mu(\hat\gamma)$ and $\hat\gamma$ for the first 100 replications at selected scenarios from Table 2. The outcome indicates that the precision of the MLEs improves for larger sample sizes. In addition, both $\hat\mu(\hat\gamma)$ and $\hat\gamma$ exhibit high accuracy and precision when $n=300$. We note the presence of outliers that overestimate the true value of $\gamma$ for the small sample size $n=20$. This effect is attenuated as $n$ increases.

7. Applications

In what follows, we apply some UEW and CUEW special models to two data sets related to the literacy rate, which is defined as the proportion of people aged 15 years old or more who can read or write a simple note. The first data set contains the literacy rates of 5565 cities in Brazil. It was measured during the census in 2010 and is available at http://datasus.saude.gov.br/. The second application models the literacy rates of 1107 cities in Colombia. It was measured during the census in 2005 and is available at www.http://microdatos.dane.gov.co/. The analysis is carried out using the AdequacyModel package [10] in the R programming language.

For modeling those data, we fit the classical beta and Kw distributions and five special models of the introduced families: the UGo, UL, CUGo, CUL and CUW distributions (the UL and CUL models are labelled ULo and CULo in Table 4), whose densities are given by (18), (19), (20), (21) and (22), respectively. We also consider the unit gamma (UG) distribution, introduced by [7] and considered by [27] for hydrological applications. Mazucheli et al. [13] proposed two bias-corrected maximum likelihood estimators (MLEs) for both shape parameters of the UG distribution.

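The UG, beta and Kw densities (for $0<y<1$) are given by

$$f_{UG}(y) = \left[\frac{\mu^{1/\phi}}{1-\mu^{1/\phi}}\right]^{\phi}\frac{1}{\Gamma(\phi)}\,y^{\mu^{1/\phi}/(1-\mu^{1/\phi})-1}\,[-\log(y)]^{\phi-1},$$

$$f_{Beta}(y) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\,y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1},$$

and

$$f_{Kw}(y) = \frac{\phi\log 0.5}{\log(1-\mu^{\phi})}\,y^{\phi-1}\left(1-y^{\phi}\right)^{\log 0.5/\log(1-\mu^{\phi})-1}, \tag{28}$$

respectively. For the UG and beta models, $\mu\in(0,1)$ is the mean of $Y$, and $\phi$ is a precision parameter. Those parametrizations are presented by [4,19], respectively. For the Kw model, $\mu\in(0,1)$ is the distribution median, and $\phi$ is a precision parameter. The pdf in (28) was previously presented by [18].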
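The role of $\mu$ as the median in (28) can be checked numerically. In the sketch below, the pdf is a transcription of (28), while the cdf $1-(1-y^{\phi})^{d}$ with $d=\log 0.5/\log(1-\mu^{\phi})$ is the standard Kumaraswamy cdf and is not stated in the text; the function names are hypothetical, and the parameter values are the Brazilian Kw estimates from Table 4, used only as an example.

```python
import math

def kw_pdf(y, mu, phi):
    """Median-parametrized Kw pdf, Equation (28)."""
    d = math.log(0.5) / math.log(1.0 - mu ** phi)
    return phi * d * y ** (phi - 1.0) * (1.0 - y ** phi) ** (d - 1.0)

def kw_cdf(y, mu, phi):
    """Kw cdf under the same parametrization (standard result, assumed here)."""
    d = math.log(0.5) / math.log(1.0 - mu ** phi)
    return 1.0 - (1.0 - y ** phi) ** d

mu, phi = 0.861, 8.955              # Kw estimates for Brazil in Table 4
cdf_at_mu = kw_cdf(mu, mu, phi)     # equals 1/2 when mu is the median
```

Having a median parameter in both the Kw and the CUW models is what makes their $\mu$ estimates in Table 4 directly comparable with the observed medians in Table 3.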

The descriptive summary of the literacy rates for Brazilian and Colombian municipalities is given in Table 3. In both countries, the mean and median lie well below the mode, and the variance is about 0.01. Brazil exhibits higher values for all the central tendency measures considered, and Colombia has the larger range. Both countries present negative skewness for this variable. These descriptive measures indicate that the mass of observations is concentrated on the right. This pattern is desirable, since the variable is positively oriented: the higher the literacy rate, the better the country's education development. According to UNESCO [29], ‘literate societies enable the free exchange of text based information and provide an array of opportunities for lifelong learning’. In addition, for Sen [26], basic education can be considered a semi-public good, which benefits not only the literate person but also society in general. We develop an interactive map dashboard as a tool for data visualization on the literacy rate. Interested readers can refer to the website https://newdists.shinyapps.io/UEWfamilies/#section-literacy-rates.

Table 3. Descriptive statistics for literacy rates in both countries.

Country Mean Median Mode Variance Skewness Kurtosis Min. Max.
Brazil 0.8419 0.8708 0.93 0.01 −0.61 −0.72 0.53 0.99
Colombia 0.8039 0.8341 0.88 0.01 −1.57 3.67 0.18 0.98

The parameter estimates obtained by the maximum likelihood method, and the corresponding standard errors (SEs), for all those models are listed in Table 4 for the Brazilian and Colombian data sets. The corrected Cramér-von Mises statistic [3] (W) is also presented to evaluate the goodness-of-fit. The lower the value of W, the better the fit to the data. The SEs of the estimates are quite small for most fitted models. Among all fitted models, the figures in Table 4 indicate that, for both data sets, the CUW model has the lowest value of W. Further, the other distributions in the proposed families are competitive with the classical models. This illustrates the relevance of the new families for modeling social indicators, such as the literacy rate.

Table 4. MLEs of the parameters from fitted models to literacy rates for Brazilian municipalities in 2010, and Colombian municipalities in 2005.

  Brazil Colombia
Distribution Parameter estimates W Parameter estimates W
Beta (μ,ϕ) 0.8420 13.3490 7.3663 0.8020 14.0180 1.8003
  (0.0013) (0.2520)   (0.0031) (0.5864)  
Kw (μ,ϕ) 0.8610 8.9550 6.8732 0.8190 8.7040 1.1929
  (0.0012) (0.1308)   (0.0029) (0.2712)  
UG (μ,ϕ) 0.8420 2.1080 7.3735 0.8020 2.7690 1.8138
  (0.0013) (0.0372)   (0.0031) (0.1113)  
UGo (μ,β) 0.8510 3.7220 13.9384 0.8340 1.2030 3.6124
  (0.0017) (0.1096)   (0.0046) (0.1184)  
ULo (μ,β) 0.8920 0.6940 6.4610 0.8600 1.3700 1.2318
  (0.0015) (0.0291)   (0.0042) (0.1444)  
CUGo (μ,β) 0.8780 1.2270 10.6856 0.8360 1.6040 1.4484
  (0.0015) (0.0149)   (0.0034) (0.0397)  
CULo (μ,β) 0.7600 365.6600 6.1925 0.7080 183.819 1.4064
  (0.0046) (69.7698)   (0.0109) (59.528)  
CUW (μ,γ) 0.8720 3.2070 5.0898 0.8300 3.5548 0.2798
  (0.0013) (0.0329)   (0.0030) (0.0808)  

Note: The corresponding SEs are given in parentheses below each estimate; W is the goodness-of-fit statistic.

Figure 2 shows the data histograms with the fitted density functions of some of the competitive models introduced here and of the beta distribution. Visual inspection indicates that the CUW distribution fits the Brazilian and Colombian literacy rates adequately. This plot also agrees with the results in Table 3: for both samples, the CUW median MLE is very close to the observed median, and the CUGo provides the second-closest median estimate. The quantile-quantile plots (Figures 3 and 4) also show the CUW distribution's superiority for modeling these data sets. Therefore, we can conclude that the CUW distribution, a particular case of the CUEW family, provides a good fit to the Brazilian and Colombian literacy rates, and that other distributions in the introduced families are also quite competitive. Finally, these results illustrate that the new models can be effective alternatives to the classical distributions for modeling bounded data.

Figure 2.

Boxplot, histogram and estimated densities of the beta, CUGo and CUW models for the literacy rates. (a) Brazilian data and (b) Colombian data.

Figure 3.

Quantile-quantile plot of the beta, CUGo and CUW models for the literacy rates for Brazilian municipalities in 2010.

Figure 4.

Quantile-quantile plot of the beta, CUGo and CUW models for the literacy rates for Colombian municipalities in 2005.
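
The points of a quantile-quantile plot such as those in Figures 3 and 4 can be obtained as in the minimal Python sketch below. It rests on an assumption that should be checked against the definition given earlier in the paper: the CUW law is taken to arise as X = 1 − exp(−Y) with Y Weibull distributed, reparametrized by its median μ, which gives F(x) = 1 − 2^{−[log(1−x)/log(1−μ)]^γ}. The function names and the plotting positions (i − 0.5)/n are also choices made here for illustration.

```python
import math

def cuw_cdf(x, mu, gamma):
    """Assumed CUW CDF with median mu in (0, 1) and shape gamma > 0."""
    t = math.log1p(-x) / math.log1p(-mu)
    return 1.0 - 2.0 ** (-(t ** gamma))

def cuw_quantile(p, mu, gamma):
    """Inverse of cuw_cdf; returns mu at p = 0.5 by construction."""
    t = (-math.log2(1.0 - p)) ** (1.0 / gamma)
    return 1.0 - (1.0 - mu) ** t

def qq_pairs(data, quantile):
    """Pairs (theoretical quantile, ordered observation) for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    return [(quantile((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Illustrative inputs: the CUW estimates reported for Brazil in Table 4.
mu_hat, gamma_hat = 0.8720, 3.2070
for p in (0.25, 0.50, 0.75):
    print(p, cuw_quantile(p, mu_hat, gamma_hat))
```

When the fitted model is adequate, the pairs returned by `qq_pairs` lie close to the identity line, which is the pattern the text reports for the CUW model.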

8. Concluding remarks

We define two new classes of distributions with bounded domain, constructed by a simple and intuitive variable transformation in the extended Weibull family of distributions. The main properties of the families are derived, such as the quantile function, moments, and incomplete moments. Five special models in the families are described in some detail. The maximum likelihood procedure is used for estimating the model parameters, and a Monte Carlo simulation study is performed to assess the performance of the maximum likelihood estimates. Applications to real data illustrate the importance and potential of the new families. In conclusion, we define a general approach for generating new unit interval distributions, which yields at least forty distributions, some of them known and the great majority new. All computational codes are available as supplementary material. We hope these families of distributions may attract wider applications in statistics. Future work should explore a regression structure for the median and a zero-augmented UEW family, which assumes that the variable has a mixed continuous-discrete distribution, to model data that are observed on [0,1), (0,1] or [0,1].

Supplementary Material

Section_7_Application.R
Section_6_Simulation.R
Section_6_Figure1.R
supplementarydata2.xlsx
supplementarydata1.xlsx
supp_UEW_JAS.pdf

Acknowledgements

We gratefully acknowledge partial financial support from CAPES.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Ahnen R.E., The politics of police violence in democratic Brazil, Lat. Am. Polit. Soc. 49 (2007), pp. 141–164. doi: 10.1111/j.1548-2456.2007.tb00377.x
  • 2. Altun E. and Cordeiro G.M., The unit-improved second-degree Lindley distribution: Inference and regression modeling, Comput. Stat. 35 (2019), pp. 1–21.
  • 3. Chen G. and Balakrishnan N., A general purpose approximate goodness-of-fit test, J. Qual. Technol. 27 (1995), pp. 154–161. doi: 10.1080/00224065.1995.11979578
  • 4. Ferrari S.L.P. and Cribari-Neto F., Beta regression for modelling rates and proportions, J. Appl. Stat. 31 (2004), pp. 799–815. doi: 10.1080/0266476042000214501
  • 5. Ghitany M.E., Mazucheli J., Menezes A.F.B., and Alqallaf F., The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval, Comm. Statist. Theory Methods 48 (2019), pp. 3423–3438. doi: 10.1080/03610926.2018.1476717
  • 6. Gómez-Déniz E., Sordo M.A., and Calderín-Ojeda E., The log-Lindley distribution as an alternative to the beta regression model with applications in insurance, Insurance Math. Econom. 54 (2014), pp. 49–57. doi: 10.1016/j.insmatheco.2013.10.017
  • 7. Grassia A., On a family of distributions with argument between 0 and 1 obtained by transformation of the gamma distribution and derived compound distributions, Aust. J. Statist. 19 (1977), pp. 108–114. doi: 10.1111/j.1467-842X.1977.tb01277.x
  • 8. Gurvich M., DiBenedetto A., and Ranade S., A new statistical distribution for characterizing the random strength of brittle materials, J. Mater. Sci. 32 (1997), pp. 2559–2564. doi: 10.1023/A:1018594215963
  • 10. Marinho P.R.D., Bourguignon M., and Dias C.R.B., AdequacyModel: Adequacy of probabilistic models and general purpose optimization, R package version 2.0.0, 2016. Available at https://CRAN.R-project.org/package=AdequacyModel.
  • 11. Massa K.H.C., Pabayo R., and Chiavegatto Filho A.D.P., Income inequality and self-reported health in a representative sample of 27 017 residents of state capitals of Brazil, J. Public Health 40 (2018), pp. e440–e446. doi: 10.1093/pubmed/fdy022
  • 12. Mazucheli J., Menezes A.F.B., and Chakraborty S., On the one parameter unit-Lindley distribution and its associated regression model for proportion data, J. Appl. Stat. 46 (2019), pp. 700–714. doi: 10.1080/02664763.2018.1511774
  • 13. Mazucheli J., Menezes A.F.B., and Dey S., Improved maximum likelihood estimators for the parameters of the unit-gamma distribution, Comm. Statist. Theory Methods 47 (2017), pp. 3767–3778. doi: 10.1080/03610926.2017.1361993
  • 14. Mazucheli J., Menezes A.F.B., and Dey S., The unit-Birnbaum-Saunders distribution with applications, Chil. J. Stat. 9 (2018), pp. 47–57.
  • 15. Mazucheli J., Menezes A.F.B., Fernandes L.B., de Oliveira R.P., and Ghitany M.E., The unit-Weibull distribution and associated inference, J. Appl. Probab. Stat. 13 (2019), pp. 1–22. doi: 10.18576/amis/13S101
  • 16. Mazucheli J., Menezes A.F.B., Fernandes L.B., de Oliveira R.P., and Ghitany M.E., The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates, J. Appl. Stat. 47 (2019), pp. 954–974. doi: 10.1080/02664763.2019.1657813
  • 17. Messias E., Income inequality, illiteracy rate, and life expectancy in Brazil, Amer. J. Public Health 93 (2003), pp. 1294–1296. doi: 10.2105/AJPH.93.8.1294
  • 18. Mitnik P.A. and Baek S., The Kumaraswamy distribution: Median-dispersion re-parameterizations for regression modeling and simulation-based estimation, Statist. Papers 54 (2013), pp. 177–192. doi: 10.1007/s00362-011-0417-y
  • 19. Mousa A.M., El-Sheikh A.A., and Abdel-Fattah M.A., A gamma regression for bounded continuous variables, Adv. Appl. Stat. 49 (2016), pp. 305–326.
  • 21. Nadarajah S. and Kotz S., On some recent modifications of Weibull distribution, IEEE Trans. Reliab. 54 (2005), pp. 561–562. doi: 10.1109/TR.2005.858811
  • 22. Pham H. and Lai C.D., On recent generalizations of the Weibull distribution, IEEE Trans. Reliab. 56 (2007), pp. 454–458. doi: 10.1109/TR.2007.903352
  • 23. Royuela V. and García G.A., Economic and social convergence in Colombia, Reg. Stud. 49 (2015), pp. 219–239. doi: 10.1080/00343404.2012.762086
  • 24. Santos-Neto M., Bourguignon M., Zea L.M., Nascimento A.D., and Cordeiro G.M., The Marshall-Olkin extended Weibull family of distributions, J. Stat. Distrib. Appl. 1 (2014), pp. 9. doi: 10.1186/2195-5832-1-9
  • 25. Sen A., The standard of living: Lecture I, concepts and critiques, The Standard of Living, Cambridge: Cambridge University Press, 1987, pp. 1–19.
  • 26. Sen A., Development as Freedom, Alfred A. Knopf, New York, 1999.
  • 27. Tadikamalla P.R., On a family of distributions obtained by the transformation of the gamma distribution, J. Stat. Comput. Simul. 13 (1981), pp. 209–214. doi: 10.1080/00949658108810497
  • 28. U.G. Assembly, Work of the statistical commission pertaining to the 2030 agenda for sustainable development (A/RES/71/313), UN General Assembly, New York, NY, USA 2017.
  • 29. UNESCO, Education for All Global Monitoring Report 2006: Education for All. Literacy for life, Oxford University Press, 2005.
  • 30. UNESCO, Education for All Global Monitoring Report 2015, UNESCO, Paris, 2015.
  • 31. Weibull W., A statistical distribution of wide applicability, J. Appl. Mech. 18 (1951), pp. 293–297.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
