Entropy. 2019 May 20;21(5):510. doi: 10.3390/e21050510

The Exponentiated Lindley Geometric Distribution with Applications

Bo Peng 1, Zhengqiu Xu 1, Min Wang 2,*
PMCID: PMC7514999  PMID: 33267224

Abstract

We introduce a new three-parameter lifetime distribution, the exponentiated Lindley geometric distribution, which exhibits increasing, decreasing, unimodal, and bathtub shaped hazard rates. We provide statistical properties of the new distribution, including shape of the probability density function, hazard rate function, quantile function, order statistics, moments, residual life function, mean deviations, Bonferroni and Lorenz curves, and entropies. We use maximum likelihood estimation of the unknown parameters, and an Expectation-Maximization algorithm is also developed to find the maximum likelihood estimates. The Fisher information matrix is provided to construct the asymptotic confidence intervals. Finally, two real-data examples are analyzed for illustrative purposes.

Keywords: compounding, Lindley distribution, geometric distribution, maximum likelihood estimation, Expectation-Maximization algorithm, lifetime distribution

1. Introduction

Suppose that a company has N systems functioning independently and producing a certain product at a given time, where N is a random variable determined by the economy, customer demand, etc. Treating N as a random variable is natural from a practical viewpoint, because failure (of a device, for example) often occurs due to the presence of an unknown number of initial defects in the system. In this paper, we consider the case in which N is a geometric random variable with the probability mass function

$$P(N=n)=(1-p)\,p^{\,n-1},$$

for 0<p<1 and positive integer n. We may take N to follow other discrete distributions, such as the binomial or Poisson, but these need to be truncated at 0 because one must have N ≥ 1. Another rationale for taking N to be a geometric random variable is that the “optimum” number can be interpreted as “number to event”, matching the definition of a geometric random variable, as noted by [1]. The geometric distribution has been widely used for the number of “systems” in the literature; see, for example, [2,3]. It has also been adopted to obtain new classes of distributions; see [4] for the exponential geometric (EG) distribution, [5] for the exponentiated exponential geometric (EEG) distribution, [6] for the Weibull geometric distribution, and [1] for the geometric exponential Poisson (GEP) distribution, to name just a few.

On the other hand, we assume that each of the N systems is made of α parallel components, so that a system shuts down completely only if all of its components fail. We further assume that the failure times of the components of the ith system, denoted by $Z_{i1},\ldots,Z_{i\alpha}$, are independent and identically distributed (iid) with cumulative distribution function (cdf) G(z) and probability density function (pdf) g(z). For simplicity of notation, let $Y_i$ stand for the failure time of the ith system; because the components are in parallel, $Y_i=\max(Z_{i1},\ldots,Z_{i\alpha})$ has cdf $G(y)^{\alpha}$. Let X denote the time to failure of the first out of the N functioning systems, i.e., $X=\min(Y_1,\ldots,Y_N)$. Then it can be seen from [5] that the conditional cdf of X given N is

$$G(x\mid N)=1-P(X>x\mid N)=1-\left[1-G(x)^{\alpha}\right]^{N},$$

and the unconditional cdf of X can thus be written as

$$F(x)=\sum_{n=1}^{\infty}G(x\mid N=n)\,P(N=n)=\frac{G(x)^{\alpha}}{1-p+p\,G(x)^{\alpha}}. \tag{1}$$

The new class of distributions in (1) depends on the cdf of the failure times of the components in the system, which may follow a continuous probability distribution such as the exponential, Lindley, or Weibull distribution. As an illustration, if the failure times of the components of the ith system are iid exponential random variables with rate parameter λ, i.e., $G(z)=1-e^{-\lambda z}$, then we obtain the EEG distribution of [5]. Its cdf is given by

$$F(x)=\frac{\left(1-e^{-\lambda x}\right)^{\alpha}}{1-p+p\left(1-e^{-\lambda x}\right)^{\alpha}}. \tag{2}$$

Please note that in reliability engineering and lifetime analysis, the failure times of the components within each system are often assumed to be exponentially distributed; see, for example, [4,5,7], among others. This assumption may be unreasonable because the hazard rate of the exponential distribution is constant, whereas some real-life systems may not have constant hazard rates, and the components of a system are often more rigid than the system itself. Accordingly, it becomes reasonable to let the components of a system follow a distribution whose hazard function is non-constant and can take flexible shapes.

In this paper, we propose a new three-parameter lifetime distribution by compounding the Lindley and geometric distributions based on the new class of distributions in (1). The Lindley distribution was first proposed by [8] in the context of Bayesian statistics, as a counterexample to fiducial statistics. It has recently received considerable attention as an appropriate model for lifetime data, especially in applications modeling stress-strength reliability; see, for example, [9,10,11]. Ghitany et al. [12] argue through a numerical example that the Lindley distribution could be a better lifetime model than the exponential distribution and show that the hazard function of the Lindley distribution is not constant, indicating the flexibility of the Lindley distribution over the exponential distribution. These observations motivate us to study the structural properties of the distribution in (1) when the failure times of the units of the ith system are iid Lindley random variables with parameter θ, i.e.,

$$G(z)=1-\frac{\theta+1+\theta z}{\theta+1}\,e^{-\theta z},\quad z>0, \tag{3}$$

where θ>0. Substituting (3) into (1) gives the cdf

$$F(x)=\frac{\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}}{1-p+p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}},\quad x>0, \tag{4}$$
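As a quick numerical check of the compounding argument, the sketch below evaluates the closed form in (4) and compares it with a truncated version of the geometric sum that leads to (1). The function names, the parameter values, and the cutoff `n_max` are illustrative assumptions and are not part of the paper.

```python
import numpy as np


def lindley_cdf(x, theta):
    """Lindley cdf G(x) from Eq. (3)."""
    return 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)


def elg_cdf(x, alpha, theta, p):
    """ELG cdf from Eq. (4): G^alpha / (1 - p + p * G^alpha)."""
    g_alpha = lindley_cdf(x, theta) ** alpha
    return g_alpha / (1.0 - p + p * g_alpha)


def elg_cdf_by_compounding(x, alpha, theta, p, n_max=2000):
    """Truncated geometric sum behind Eq. (1); n_max is an illustrative cutoff."""
    g_alpha = lindley_cdf(x, theta) ** alpha
    n = np.arange(1, n_max + 1)
    conditional_cdf = 1.0 - (1.0 - g_alpha) ** n   # cdf of X given N = n
    pmf = (1.0 - p) * p ** (n - 1)                 # P(N = n)
    return float(np.sum(conditional_cdf * pmf))


# Hypothetical parameter values chosen only for illustration.
closed_form = float(elg_cdf(2.0, alpha=1.5, theta=0.8, p=0.6))
truncated_sum = elg_cdf_by_compounding(2.0, alpha=1.5, theta=0.8, p=0.6)
print(closed_form, truncated_sum)
```

The two numbers should agree up to floating-point error, because the neglected tail of the sum is bounded by p raised to the power `n_max`.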

where the parameters α>0, θ>0, and 0<p<1. We call this distribution the exponentiated Lindley geometric (ELG) distribution. Indeed, it is necessary to compute the entropy measure of the ELG distribution when the errors are assumed to be non-Gaussian (e.g., [13]). Other motivations for the ELG distribution are briefly summarized as follows. (i) It contains several lifetime distributions as special cases, such as the Lindley-geometric (LG) distribution of [14] when α=1. (ii) It can be viewed as a mixture of the exponentiated Lindley distributions introduced by [15]. (iii) The ELG distribution is a flexible model that can be widely used for modeling lifetime data in reliability and survival analysis. (iv) It exhibits monotonically increasing, monotonically decreasing, unimodal (upside-down bathtub), and bathtub-shaped hazard rates but not a constant hazard rate, which makes it more flexible than lifetime distributions that exhibit only monotonically increasing, monotonically decreasing, or constant hazard rates.

The remainder of the paper is organized as follows. In Section 2, we discuss various statistical properties of the new distribution. The maximum-likelihood estimation is considered in Section 3, and an EM algorithm is proposed to find the maximum likelihood estimates because they cannot be obtained in closed form. The maximum-likelihood estimation for censored data is also discussed briefly. In Section 4, two real-data applications are provided for illustrative purposes. Some concluding remarks are given in Section 5.

2. Properties of the ELG distribution

We provide statistical properties of the ELG distribution. These include the pdf and its shape (Section 2.1), hazard rate function and its shape (Section 2.2), quantile function (Section 2.3), order statistics (Section 2.4), expressions for the nth moments (Section 2.5), residual life function (Section 2.6), mean deviations (Section 2.7), Bonferroni and Lorenz curves (Section 2.8), and entropies (Section 2.9).

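2.1. Probability Density Function

The pdf of the ELG distribution corresponding to the cdf in (4) is given by

$$f(x)=\frac{\alpha\theta^{2}(1-p)(1+x)e^{-\theta x}\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha-1}}{(\theta+1)\left\{1-p+p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}^{2}}, \tag{5}$$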

for x>0, α>0, θ>0, and 0<p<1.

It should be noted that the pdf in (5) remains a well-defined density function when p ≤ 0. Thus, we can extend the ELG distribution in (5) to any p<1. As mentioned in Section 1, the ELG distribution includes several special submodels. When α=1, it becomes the LG distribution of [14]. When p=0 and α=1, it reduces to the Lindley distribution of [8]. It converges to a distribution degenerate at the point 0 as p → 1.

Figure 1 displays the pdf of the ELG distribution in (5) for selected values of α, θ, and p. We observe from Figure 1 that the pdf is monotonically decreasing with its mode at x=0 when α<1 and is upside-down bathtub shaped (unimodal) for α>1. When α=1, the pdf can be either monotonically decreasing or unimodal. This observation coincides with Theorem 1 of [14], which states that the density function of the LG distribution is (i) decreasing for all values of p and θ for which $p>(1-\theta^{2})/(1+\theta^{2})$, and (ii) unimodal for all values of p and θ for which $p\le(1-\theta^{2})/(1+\theta^{2})$.
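The claim that (5) is a proper density for any p<1 can be checked numerically. The following sketch integrates the pdf over (0, ∞) for a positive, a zero, and a negative value of p; the function name and the parameter values are assumptions made only for this illustration.

```python
import numpy as np
from scipy.integrate import quad


def elg_pdf(x, alpha, theta, p):
    """ELG pdf from Eq. (5); intended for alpha > 0, theta > 0, p < 1."""
    g = 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    numerator = (alpha * theta**2 * (1.0 - p) * (1.0 + x)
                 * np.exp(-theta * x) * g ** (alpha - 1.0))
    denominator = (theta + 1.0) * (1.0 - p + p * g**alpha) ** 2
    return numerator / denominator


# alpha = 2 and theta = 1 are hypothetical values; p ranges over both signs.
totals = {}
for p in (0.5, 0.0, -1.0):
    totals[p], _ = quad(elg_pdf, 0.0, np.inf, args=(2.0, 1.0, p))
    print(p, totals[p])
```

Each integral should be close to 1, which is consistent with extending the parameter range from 0<p<1 to p<1.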

Figure 1. Plots of the pdf of the ELG distribution for different values of α, θ, and p.

2.2. Hazard Rate Function

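The failure rate function, also known as the hazard function (hf), is an important characteristic for lifetime modeling. For a continuous distribution with the cdf F(x) and the pdf f(x), the failure rate function is defined as

$$h(x)=\lim_{\Delta x\to 0}\frac{P(X<x+\Delta x\mid X>x)}{\Delta x}=\frac{f(x)}{S(x)},$$

where $S(x)=1-F(x)$ is the survival function of X. The hf of the ELG distribution is given by

$$h(x)=\frac{\alpha\theta^{2}(1+x)e^{-\theta x}\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha-1}}{(\theta+1)\left\{1-\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}\left\{1-p+p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}} \tag{6}$$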

for x>0, α>0, θ>0, and p<1.

Figure 2 depicts shapes of the hf for selected values of α, θ, and p. We observe that the hf of the ELG distribution is quite flexible. For example, the hf is monotonically decreasing if α is sufficiently small and p is not too large, and monotonically increasing for small p and large α. For α=1, the hf is either bathtub shaped or first increasing and then bathtub shaped. We may conclude that the ELG distribution exhibits increasing, decreasing, upside-down bathtub, and bathtub-shaped hazard rates, but does not exhibit a constant hazard rate.

Figure 2. Plots of the hf of the ELG distribution for different values of α, θ, and p.

Note also that as $x\to 0$, the hf behaves as $h(x)\sim\{\alpha\theta^{2\alpha}/[(\theta+1)^{\alpha}(1-p)]\}\,x^{\alpha-1}$, which implies that $h(0)=\infty$ for α<1, $h(0)=\theta^{2}/[(\theta+1)(1-p)]$ for α=1, and $h(0)=0$ for α>1.
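The small-x behaviour stated above can be illustrated by comparing the hf in (6) with its leading-order approximation at a point close to zero. This is only a sketch: the function names, the evaluation point, and the parameter values are assumptions.

```python
import numpy as np


def elg_hazard(x, alpha, theta, p):
    """ELG hazard function from Eq. (6)."""
    g = 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    g_alpha = g**alpha
    numerator = alpha * theta**2 * (1.0 + x) * np.exp(-theta * x) * g ** (alpha - 1.0)
    return numerator / ((theta + 1.0) * (1.0 - g_alpha) * (1.0 - p + p * g_alpha))


def hazard_near_zero(x, alpha, theta, p):
    """Leading-order behaviour of h(x) as x -> 0."""
    scale = alpha * theta ** (2.0 * alpha) / ((theta + 1.0) ** alpha * (1.0 - p))
    return scale * x ** (alpha - 1.0)


# Hypothetical settings: theta = 1, p = 0.5, evaluated at x = 1e-6.
x_small = 1e-6
ratios = {}
for alpha in (0.5, 1.0, 2.0):
    ratios[alpha] = (elg_hazard(x_small, alpha, 1.0, 0.5)
                     / hazard_near_zero(x_small, alpha, 1.0, 0.5))
    print(alpha, ratios[alpha])
```

Ratios close to 1 for α<1, α=1, and α>1 are consistent with the three limiting cases of h(0).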

2.3. Quantile Function

Let Z denote a Lindley random variable with the cdf in (3). We observe from [16] that the quantile function of the Lindley distribution is

$$G^{-1}(u)=-1-\frac{1}{\theta}-\frac{1}{\theta}\,W_{-1}\!\left(-\frac{\theta+1}{e^{\theta+1}}(1-u)\right), \tag{7}$$

where 0<u<1 and $W_{-1}(\cdot)$ denotes the negative branch of the Lambert W function (i.e., the solution of the equation $W(z)e^{W(z)}=z$), which can be computed with the R package lamW; see [17] for details.

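Let X be an ELG random variable with the cdf F(x) in (4). By inverting F(x)=u for 0<u<1, we obtain

$$\left(\frac{u-up}{1-up}\right)^{1/\alpha}=1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}=G(x).$$

It follows from Equation (7) that the quantile function of the ELG distribution is given by

$$F^{-1}(u)=-1-\frac{1}{\theta}-\frac{1}{\theta}\,W_{-1}\!\left(-\frac{\theta+1}{e^{\theta+1}}\left[1-\left(\frac{u-up}{1-up}\right)^{1/\alpha}\right]\right). \tag{8}$$

Please note that $-\frac{1}{e}<-\frac{\theta+1}{e^{\theta+1}}\left[1-\left(\frac{u-up}{1-up}\right)^{1/\alpha}\right]<0$, so $W_{-1}(\cdot)$ is unique, which implies that $F^{-1}(u)$ is also unique. Thus, one can use Equation (8) to generate random data from the ELG distribution. In particular, the quartiles of the ELG distribution are given by

$$\begin{aligned}
Q_1&=F^{-1}\!\left(\tfrac14\right)=-1-\frac{1}{\theta}-\frac{1}{\theta}\,W_{-1}\!\left(-\frac{\theta+1}{e^{\theta+1}}\left[1-\left(\frac{1-p}{4-p}\right)^{1/\alpha}\right]\right),\\
Q_2&=F^{-1}\!\left(\tfrac12\right)=-1-\frac{1}{\theta}-\frac{1}{\theta}\,W_{-1}\!\left(-\frac{\theta+1}{e^{\theta+1}}\left[1-\left(\frac{1-p}{2-p}\right)^{1/\alpha}\right]\right),\\
Q_3&=F^{-1}\!\left(\tfrac34\right)=-1-\frac{1}{\theta}-\frac{1}{\theta}\,W_{-1}\!\left(-\frac{\theta+1}{e^{\theta+1}}\left[1-\left(\frac{3-3p}{4-3p}\right)^{1/\alpha}\right]\right).
\end{aligned}$$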
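Equation (8) translates directly into an inverse-transform sampler. The sketch below uses `scipy.special.lambertw` with `k=-1` for the negative branch (the paper itself refers to the R package lamW), checks the round trip through the cdf in (4), and draws a small sample. Function names, parameter values, and the seed are illustrative assumptions.

```python
import numpy as np
from scipy.special import lambertw


def elg_cdf(x, alpha, theta, p):
    """ELG cdf from Eq. (4)."""
    g = 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    return g**alpha / (1.0 - p + p * g**alpha)


def elg_quantile(u, alpha, theta, p):
    """ELG quantile function from Eq. (8), using the W_{-1} branch."""
    u = np.asarray(u, dtype=float)
    g_level = ((u - u * p) / (1.0 - u * p)) ** (1.0 / alpha)   # required value of G(x)
    argument = -(theta + 1.0) / np.exp(theta + 1.0) * (1.0 - g_level)
    w = lambertw(argument, k=-1).real
    return -1.0 - 1.0 / theta - w / theta


# Hypothetical parameter values for the round-trip check.
u_grid = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
round_trip = elg_cdf(elg_quantile(u_grid, 1.5, 0.8, 0.6), 1.5, 0.8, 0.6)
print(round_trip)

# Inverse-transform sampling: feed uniform draws through the quantile function.
rng = np.random.default_rng(0)
sample = elg_quantile(rng.uniform(size=5), 1.5, 0.8, 0.6)
print(sample)
```

Applying the cdf to the quantiles should return the original probabilities, and the three quartiles correspond to `u` equal to 0.25, 0.5, and 0.75.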

2.4. Order Statistics

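Suppose $X_1,\ldots,X_n$ is a random sample from the ELG distribution. Let $X_{(1)}<X_{(2)}<\cdots<X_{(n)}$ be the corresponding order statistics. The pdf of the rth order statistic of the ELG distribution, say $Y=X_{(r)}$, is given by

$$\begin{aligned}
f_Y(y)&=\frac{n!}{(r-1)!(n-r)!}F^{r-1}(y)\left[1-F(y)\right]^{n-r}f(y)\\
&=\frac{n!}{(r-1)!(n-r)!}\sum_{\tau=0}^{n-r}\binom{n-r}{\tau}(-1)^{\tau}F^{r-1+\tau}(y)f(y)\\
&=\frac{\alpha\theta^{2}(1-p)(1+y)e^{-\theta y}\,n!}{(\theta+1)(r-1)!(n-r)!}\sum_{\tau=0}^{n-r}\binom{n-r}{\tau}(-1)^{\tau}\frac{G(y)^{\alpha(r+\tau)-1}}{\left[1-p+pG(y)^{\alpha}\right]^{r+\tau+1}},
\end{aligned}$$

where G is the Lindley cdf in (3). The corresponding cdf of Y is given by

$$\begin{aligned}
F_Y(y)&=\sum_{j=r}^{n}\binom{n}{j}F^{j}(y)\left[1-F(y)\right]^{n-j}
=\sum_{j=r}^{n}\sum_{\tau=0}^{n-j}\binom{n}{j}\binom{n-j}{\tau}(-1)^{\tau}F^{j+\tau}(y)\\
&=\sum_{j=r}^{n}\sum_{\tau=0}^{n-j}\binom{n}{j}\binom{n-j}{\tau}(-1)^{\tau}\frac{G(y)^{\alpha(j+\tau)}}{\left[1-p+pG(y)^{\alpha}\right]^{j+\tau}}.
\end{aligned}$$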

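In practice, we may be interested in studying the asymptotic distribution of the extreme values $X_{(1)}$ and $X_{(n)}$. By using L’Hospital’s rule, we have

$$\lim_{t\to\infty}\frac{1-F(t+x/\theta)}{1-F(t)}=\lim_{t\to\infty}\frac{f(t+x/\theta)}{f(t)}=\lim_{t\to\infty}e^{-x}\,\frac{1+t+x/\theta}{1+t}\left[\frac{G(t+x/\theta)}{G(t)}\right]^{\alpha-1}\left[\frac{1-p+pG(t)^{\alpha}}{1-p+pG(t+x/\theta)^{\alpha}}\right]^{2}=e^{-x},$$

where G is the Lindley cdf in (3). In addition, by using L’Hospital’s rule, it can be easily shown that

$$\lim_{t\to 0}\frac{F(tx)}{F(t)}=\lim_{t\to 0}\frac{x f(xt)}{f(t)}=\lim_{t\to 0}\left[\frac{1-\frac{\theta+1+\theta tx}{\theta+1}e^{-\theta tx}}{1-\frac{\theta+1+\theta t}{\theta+1}e^{-\theta t}}\right]^{\alpha}=x^{\alpha}.$$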

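By Theorem 1.6.2 in [18], there exist normalizing constants $a_n>0$, $b_n$, $c_n>0$, and $d_n$ such that

$$\Pr\left\{a_n\left(X_{(n)}-b_n\right)\le x\right\}\to\exp\left(-e^{-x}\right)$$

and

$$\Pr\left\{c_n\left(X_{(1)}-d_n\right)\le x\right\}\to 1-\exp\left(-x^{\alpha}\right)$$

as $n\to\infty$. The first limit concerns the sample maximum and follows from the upper-tail limit above; the second concerns the sample minimum and follows from the behavior of F near zero. The form of the normalizing constants can be determined by using Corollary 1.6.3 in [18]. As an illustration, one can see that $a_n=\theta$ and $b_n=F^{-1}(1-1/n)$, where $F^{-1}(\cdot)$ denotes the inverse function of $F(\cdot)$.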

2.5. Moment Properties

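Many important features of a distribution, such as dispersion, skewness, and kurtosis, can be characterized through its moments. To derive the nth moment of the ELG distribution, we consider the Taylor series expansion of the form

$$(1+x)^{a}=\sum_{k=0}^{\infty}\binom{a}{k}x^{k}, \tag{9}$$

which converges for |x|<1. This gives

$$\left[1-p+pG^{\alpha}(x)\right]^{-1}=\sum_{k=0}^{\infty}\binom{-1}{k}\left\{-p\left[1-G^{\alpha}(x)\right]\right\}^{k}=\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-1}{k}\binom{k}{j}(-1)^{j+k}p^{k}G(x)^{\alpha j}, \tag{10}$$

where $\binom{-1}{k}$ is the generalized binomial coefficient. Therefore, we can rewrite Equation (4) as

$$F(x)=\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-1}{k}\binom{k}{j}(-1)^{j+k}p^{k}G(x)^{\alpha j+\alpha}=\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-1}{k}\binom{k}{j}(-1)^{j+k}p^{k}\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha j+\alpha}. \tag{11}$$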

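We observe that the ELG distribution is a mixture of the exponentiated Lindley distributions introduced by [15], i.e.,

$$\left[1-p+pG^{\alpha}(x)\right]^{-1}=\frac{1}{1-p}\left[1+\frac{p}{1-p}G^{\alpha}(x)\right]^{-1}=\frac{1}{1-p}\sum_{k=0}^{\infty}\binom{-1}{k}\left[\frac{p}{1-p}G^{\alpha}(x)\right]^{k},$$

which is convergent for $\left|\frac{p}{1-p}G^{\alpha}(x)\right|<1$. The authors of [15] show that if $Y_{\theta,\beta}$ is an exponentiated Lindley random variable with parameters θ and β, the nth moment and the moment generating function of $Y_{\theta,\beta}$ are, respectively, given by

$$\mathbb{E}\left(Y_{\theta,\beta}^{\,n}\right)=\frac{\beta\theta^{2}}{1+\theta}K(\beta,\theta,n,\theta)$$

and

$$M_{Y_{\theta,\beta}}(t)=\frac{\beta\theta^{2}}{1+\theta}K(\beta,\theta,0,\theta-t)$$

for t<θ, where

$$K(a,b,c,\delta)=\int_{0}^{\infty}x^{c}(1+x)\left[1-\frac{1+b+bx}{1+b}e^{-bx}\right]^{a-1}e^{-\delta x}\,dx=\sum_{i=0}^{\infty}\sum_{j=0}^{i}\sum_{k=0}^{j+1}\binom{a-1}{i}\binom{i}{j}\binom{j+1}{k}\frac{(-1)^{i}\,b^{j}\,\Gamma(c+k+1)}{(1+b)^{i}(bi+\delta)^{c+k+1}}.$$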

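By using Equation (11), the nth moment of X, denoted by $\mu_n$, can be written as

$$\begin{aligned}
\mu_n=\mathbb{E}(X^{n})&=\sum_{k=0}^{\infty}\binom{-1}{k}(-p)^{k}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\mathbb{E}\left(Y_{\theta,\alpha j+\alpha}^{\,n}\right)\\
&=\sum_{k=0}^{\infty}\binom{-1}{k}(-p)^{k}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\frac{(\alpha j+\alpha)\theta^{2}}{1+\theta}K(\alpha j+\alpha,\theta,n,\theta)\\
&=\frac{\alpha\theta^{2}}{1+\theta}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-1}{k}\binom{k}{j}(-1)^{j+k}p^{k}(j+1)K(\alpha j+\alpha,\theta,n,\theta)
\end{aligned} \tag{12}$$

for n=1,2,…. Equation (12) can be used to compute the third and fourth central moments of the ELG distribution, which in turn define skewness and kurtosis. For instance, based on the first four moments of the ELG distribution, the measures of skewness γ and kurtosis κ are, respectively, given by

$$\gamma=\frac{\mu_3-3\mu_1\mu_2+2\mu_1^{3}}{\left(\mu_2-\mu_1^{2}\right)^{3/2}}$$

and

$$\kappa=\frac{\mu_4-4\mu_1\mu_3+6\mu_1^{2}\mu_2-3\mu_1^{4}}{\left(\mu_2-\mu_1^{2}\right)^{2}}.$$

The moment generating function of the ELG distribution, denoted by $M_X(t)$, is given by

$$M_X(t)=\sum_{k=0}^{\infty}\binom{-1}{k}(-p)^{k}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}M_{Y_{\theta,\alpha j+\alpha}}(t)=\frac{\alpha\theta^{2}}{1+\theta}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-1}{k}\binom{k}{j}(-1)^{j+k}p^{k}(j+1)K(\alpha j+\alpha,\theta,0,\theta-t).$$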
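Because the series in (12) is infinite, a direct numerical integration of $x^n f(x)$ is a convenient cross-check. The sketch below computes raw moments by quadrature and then applies the skewness and kurtosis formulas above; it does not implement the series itself. Function names and parameter values are assumptions, and the Lindley special case (α=1, p=0) is used as a sanity check because its mean is $(\theta+2)/[\theta(\theta+1)]$.

```python
import numpy as np
from scipy.integrate import quad


def elg_pdf(x, alpha, theta, p):
    """ELG pdf from Eq. (5)."""
    g = 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    numerator = (alpha * theta**2 * (1.0 - p) * (1.0 + x)
                 * np.exp(-theta * x) * g ** (alpha - 1.0))
    return numerator / ((theta + 1.0) * (1.0 - p + p * g**alpha) ** 2)


def raw_moment(n, alpha, theta, p):
    """n-th raw moment E(X^n) by one-dimensional quadrature."""
    value, _ = quad(lambda x: x**n * elg_pdf(x, alpha, theta, p), 0.0, np.inf)
    return value


def skewness_kurtosis(alpha, theta, p):
    """Skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(n, alpha, theta, p) for n in (1, 2, 3, 4))
    variance = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / variance**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / variance**2
    return skew, kurt


lindley_mean = raw_moment(1, 1.0, 1.0, 0.0)      # Lindley case: alpha = 1, p = 0
skew, kurt = skewness_kurtosis(2.0, 1.0, 0.5)    # hypothetical ELG parameters
print(lindley_mean, skew, kurt)
```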

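Thereafter, we can use $M_X(t)$ to obtain the nth moment about zero of the ELG distribution. In particular, if $\left|\frac{p}{1-p}G^{\alpha}(x)\right|<1$, then Equation (11) can be simplified to

$$F(x)=\frac{1}{1-p}\sum_{k=0}^{\infty}\frac{(-p)^{k}}{(1-p)^{k}}\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha k+\alpha}. \tag{13}$$

The corresponding nth moment of X can be simplified as

$$\mu_n=\mathbb{E}(X^{n})=\frac{1}{1-p}\sum_{k=0}^{\infty}\frac{(-p)^{k}}{(1-p)^{k}}\,\mathbb{E}\left(Y_{\theta,\alpha k+\alpha}^{\,n}\right)=\frac{\alpha\theta^{2}}{(1+\theta)(1-p)}\sum_{k=0}^{\infty}\frac{(-p)^{k}(k+1)}{(1-p)^{k}}K(\alpha k+\alpha,\theta,n,\theta) \tag{14}$$

for n=1,2,…, and the moment generating function of the ELG distribution is given by

$$M_X(t)=\frac{1}{1-p}\sum_{k=0}^{\infty}\frac{(-p)^{k}}{(1-p)^{k}}M_{Y_{\theta,\alpha k+\alpha}}(t)=\frac{\alpha\theta^{2}}{(1+\theta)(1-p)}\sum_{k=0}^{\infty}\frac{(-p)^{k}(k+1)}{(1-p)^{k}}K(\alpha k+\alpha,\theta,0,\theta-t).$$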

2.6. Residual Life Function

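Given that a component of a system survives up to time t ≥ 0, the residual life is the period beyond t until failure occurs and is thus defined by the conditional random variable $X-t\mid X>t$. The mean residual life plays an important role in survival analysis and reliability because it uniquely determines the corresponding lifetime distribution. The rth moment of the residual life of the ELG distribution can be obtained by the general formula

$$m_r(t)=\mathbb{E}\left[(X-t)^{r}\mid X>t\right]=\frac{1}{S(t)}\int_{t}^{\infty}(x-t)^{r}f(x)\,dx, \tag{15}$$

where $S(t)=1-F(t)$ is the survival function defined before. Noting that the ELG distribution is a mixture of exponentiated Lindley distributions, we may calculate $m_r(t)$ by using the expression in Lemma 2 of [15], which is given by

$$L(a,b,c,t)=\int_{t}^{\infty}x^{c}(1+x)\left[1-\frac{b+1+bx}{b+1}e^{-bx}\right]^{a-1}e^{-bx}\,dx=\sum_{i=0}^{\infty}\sum_{j=0}^{i}\sum_{k=0}^{j+1}\binom{a-1}{i}\binom{i}{j}\binom{j+1}{k}\frac{(-1)^{i}\,b^{j}\,\Gamma\left(c+k+1,(bi+b)t\right)}{(1+b)^{i}(bi+b)^{c+k+1}},$$

where $\Gamma(a,x)=\int_{x}^{\infty}t^{a-1}e^{-t}\,dt$ is the upper (complementary) incomplete gamma function. Let X be an ELG random variable. By using the Taylor series expansion in (9), it can be easily shown that

$$\begin{aligned}
\int_{t}^{\infty}x^{r}f(x)\,dx&=\frac{\alpha\theta^{2}(1-p)}{\theta+1}\int_{t}^{\infty}x^{r}(1+x)e^{-\theta x}G(x)^{\alpha-1}\left[1-p+pG(x)^{\alpha}\right]^{-2}dx\\
&=\frac{\alpha\theta^{2}(1-p)}{\theta+1}\int_{t}^{\infty}\sum_{l=0}^{\infty}\binom{-2}{l}(-p)^{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^{j}x^{r}(1+x)e^{-\theta x}G(x)^{\alpha+\alpha j-1}dx\\
&=\frac{\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\binom{-2}{l}(-p)^{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^{j}\int_{t}^{\infty}x^{r}(1+x)e^{-\theta x}G(x)^{\alpha+\alpha j-1}dx\\
&=\frac{\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\sum_{j=0}^{l}\binom{-2}{l}\binom{l}{j}(-1)^{j+l}p^{l}L(\alpha+\alpha j,\theta,r,t),
\end{aligned}$$

where G is the Lindley cdf in (3). From the binomial expansion of $(x-t)^{r}$, the rth moment of the residual life of the ELG distribution is given by

$$\begin{aligned}
m_r(t)&=\frac{1}{S(t)}\int_{t}^{\infty}(x-t)^{r}f(x)\,dx=\frac{1}{S(t)}\int_{t}^{\infty}\sum_{k=0}^{r}\binom{r}{k}x^{r-k}(-t)^{k}f(x)\,dx=\frac{1}{S(t)}\sum_{k=0}^{r}\binom{r}{k}(-t)^{k}\int_{t}^{\infty}x^{r-k}f(x)\,dx\\
&=\frac{1}{S(t)}\frac{\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\sum_{j=0}^{l}\sum_{k=0}^{r}\binom{r}{k}\binom{-2}{l}\binom{l}{j}(-1)^{j+l+k}t^{k}p^{l}L(\alpha+\alpha j,\theta,r-k,t).
\end{aligned}$$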

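The mean and variance of the residual life of the ELG distribution can be easily obtained from $m_1(t)$ and $m_2(t)$ and are omitted here for brevity. In a similar way as done for Equation (13), expanding $\left[1-p+pG^{\alpha}(x)\right]^{-2}=(1-p)^{-2}\sum_{l=0}^{\infty}\binom{-2}{l}\left[\frac{p}{1-p}G^{\alpha}(x)\right]^{l}$ shows that if $\left|\frac{p}{1-p}G^{\alpha}(x)\right|<1$, then

$$\int_{t}^{\infty}x^{r}f(x)\,dx=\frac{\alpha\theta^{2}}{(\theta+1)(1-p)}\sum_{l=0}^{\infty}\binom{-2}{l}\frac{p^{l}}{(1-p)^{l}}L(\alpha+\alpha l,\theta,r,t), \tag{16}$$

and the rth moment of the residual life of the ELG distribution can be written as

$$m_r(t)=\frac{1}{S(t)}\frac{\alpha\theta^{2}}{(\theta+1)(1-p)}\sum_{l=0}^{\infty}\sum_{k=0}^{r}\binom{r}{k}\binom{-2}{l}\frac{(-1)^{k}p^{l}t^{k}}{(1-p)^{l}}L(\alpha+\alpha l,\theta,r-k,t).$$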

2.7. Mean Deviations

We consider the mean deviation about the mean and the mean deviation about the median, which are often used to measure the amount of scatter in a population. Both are measures of statistical dispersion that are more robust to outliers than the sample variance or standard deviation.

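Let X denote a random variable with the pdf f(x), the cdf F(x), mean μ, and median M. The mean deviation about the mean and the mean deviation about the median are defined by

$$\begin{aligned}
\delta_1(X)&=\int_{0}^{\infty}|x-\mu|f(x)\,dx=\int_{0}^{\mu}(\mu-x)f(x)\,dx+\int_{\mu}^{\infty}(x-\mu)f(x)\,dx\\
&=\mu F(\mu)-\int_{0}^{\mu}xf(x)\,dx+\int_{\mu}^{\infty}xf(x)\,dx-\mu\left[1-F(\mu)\right]=2\mu F(\mu)-2\mu+2\int_{\mu}^{\infty}xf(x)\,dx\\
&=2\mu F(\mu)-2\mu+\frac{2\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\sum_{j=0}^{l}\binom{-2}{l}\binom{l}{j}(-1)^{j+l}p^{l}L(\alpha+\alpha j,\theta,1,\mu)
\end{aligned}$$

and

$$\begin{aligned}
\delta_2(X)&=\int_{0}^{\infty}|x-M|f(x)\,dx=\int_{0}^{M}(M-x)f(x)\,dx+\int_{M}^{\infty}(x-M)f(x)\,dx\\
&=MF(M)-\int_{0}^{M}xf(x)\,dx+\int_{M}^{\infty}xf(x)\,dx-M\left[1-F(M)\right]=-\mu+2\int_{M}^{\infty}xf(x)\,dx\\
&=-\mu+\frac{2\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\sum_{j=0}^{l}\binom{-2}{l}\binom{l}{j}(-1)^{j+l}p^{l}L(\alpha+\alpha j,\theta,1,M),
\end{aligned}$$

respectively, where the simplification for $\delta_2(X)$ uses F(M)=1/2. Of particular note is that when $\left|\frac{p}{1-p}G^{\alpha}(x)\right|<1$, the mean deviations above can be further simplified as

$$\delta_1(X)=2\mu F(\mu)-2\mu+\frac{2\alpha\theta^{2}}{(1-p)(\theta+1)}\sum_{j=0}^{\infty}\binom{-2}{j}\left(\frac{p}{1-p}\right)^{j}L(\alpha+\alpha j,\theta,1,\mu)$$

and

$$\delta_2(X)=-\mu+\frac{2\alpha\theta^{2}}{(1-p)(\theta+1)}\sum_{j=0}^{\infty}\binom{-2}{j}\left(\frac{p}{1-p}\right)^{j}L(\alpha+\alpha j,\theta,1,M).$$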

2.8. Bonferroni and Lorenz Curves

The Bonferroni and Lorenz curves (Bonferroni 1930) have many practical applications not only in economics and the study of poverty, but also in other fields such as reliability, lifetime testing, insurance, and medicine. For a random variable X with cdf F(·) and a point q>0, the Bonferroni and Lorenz curves are defined by

$$B[F(q)]=\frac{1}{\mu F(q)}\int_{0}^{q}xf(x)\,dx, \tag{17}$$

where $\mu=\mathbb{E}(X)$, and

$$L[F(q)]=\frac{1}{\mu}\int_{0}^{q}xf(x)\,dx, \tag{18}$$

respectively. If X is an ELG random variable with the pdf in (5), Equation (17) can be written as

$$B[F(q)]=\frac{1}{\mu F(q)}\left[\int_{0}^{\infty}xf(x)\,dx-\int_{q}^{\infty}xf(x)\,dx\right]=\frac{1}{\mu F(q)}\left[\mu-\frac{\alpha\theta^{2}(1-p)}{\theta+1}\sum_{l=0}^{\infty}\sum_{j=0}^{l}\binom{-2}{l}\binom{l}{j}(-1)^{j+l}p^{l}L(\alpha+\alpha j,\theta,1,q)\right],$$

which is obtained from the series expansion of $\int_{t}^{\infty}x^{r}f(x)\,dx$ in Section 2.6 with t=q and r=1. By Equation (18), the Lorenz curve of the ELG distribution is given by $L[F(q)]=F(q)\,B[F(q)]$.

2.9. Entropies

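It is well known that the entropy of a random variable X is a measure of the variation of uncertainty. The Rényi entropy is defined as

$$I_R(\gamma)=\frac{1}{1-\gamma}\log\int_{0}^{\infty}f^{\gamma}(x)\,dx,$$

where γ>0 and γ≠1. The Shannon entropy is defined as $\mathbb{E}[-\log f(X)]$, which is the limiting case of the Rényi entropy as γ→1. We first observe that

$$\begin{aligned}
\int_{0}^{\infty}f^{\gamma}(x)\,dx&=\left[\frac{\alpha\theta^{2}(1-p)}{1+\theta}\right]^{\gamma}\int_{0}^{\infty}(1+x)^{\gamma}e^{-\theta\gamma x}G(x)^{\alpha\gamma-\gamma}\left[1-p+pG(x)^{\alpha}\right]^{-2\gamma}dx\\
&=\left[\frac{\alpha\theta^{2}(1-p)}{1+\theta}\right]^{\gamma}\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-2\gamma}{k}\binom{k}{j}(-1)^{k+j}p^{k}\int_{0}^{\infty}(1+x)^{\gamma}e^{-\theta\gamma x}G(x)^{\alpha\gamma-\gamma+\alpha j}dx,
\end{aligned}$$

where G is the Lindley cdf in (3). This shows that the Rényi entropy of the ELG distribution is given by

$$I_R(\gamma)=\frac{\gamma}{1-\gamma}\log\left[\frac{\alpha\theta^{2}(1-p)}{1+\theta}\right]+\frac{1}{1-\gamma}\log\left\{\sum_{k=0}^{\infty}\sum_{j=0}^{k}\binom{-2\gamma}{k}\binom{k}{j}(-1)^{k+j}p^{k}\int_{0}^{\infty}(1+x)^{\gamma}e^{-\theta\gamma x}G(x)^{\alpha\gamma-\gamma+\alpha j}dx\right\}.$$

It can be shown that the Shannon entropy of the ELG distribution is given by

$$H(X)=\mathbb{E}[-\log f(X)]=-\log\left[\frac{\alpha\theta^{2}(1-p)}{1+\theta}\right]-\mathbb{E}[\log(1+X)]+\theta\,\mathbb{E}[X]-(\alpha-1)\mathbb{E}[\log G(X)]+2\,\mathbb{E}\left[\log\left(1-p+pG^{\alpha}(X)\right)\right],$$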

which can be easily evaluated using a unidimensional integral. Figure 3 depicts shapes of the Shannon entropy of the ELG distribution with several selected values of α,θ, and p. It deserves mentioning that the entropy measure of the ELG distribution can be estimated by using numerical integration methods with the (plug-in) estimators found in the following section.
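The one-dimensional integrals mentioned above are straightforward to evaluate numerically. The sketch below computes the Shannon entropy in two ways, directly as E[−log f(X)] and through the five-term decomposition, so that the two results can be compared. All function names and parameter values are illustrative assumptions, and the log-density is assembled term by term to keep the tails numerically stable.

```python
import numpy as np
from scipy.integrate import quad


def log_terms(x, alpha, theta, p):
    """Return log(1+x), log G(x), and log(1 - p + p G(x)^alpha)."""
    survival = (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    log_g = np.log1p(-survival)
    log_d = np.log(1.0 - p + p * np.exp(alpha * log_g))
    return np.log1p(x), log_g, log_d


def log_pdf(x, alpha, theta, p):
    """Logarithm of the ELG pdf in Eq. (5)."""
    log_c = np.log(alpha * theta**2 * (1.0 - p) / (1.0 + theta))
    log_1px, log_g, log_d = log_terms(x, alpha, theta, p)
    return log_c + log_1px - theta * x + (alpha - 1.0) * log_g - 2.0 * log_d


def expect(func, alpha, theta, p):
    """E[func(X)] by one-dimensional quadrature."""
    def integrand(x):
        density = np.exp(log_pdf(x, alpha, theta, p))
        return density * func(x) if density > 0.0 else 0.0
    return quad(integrand, 0.0, np.inf)[0]


def shannon_direct(alpha, theta, p):
    return expect(lambda x: -log_pdf(x, alpha, theta, p), alpha, theta, p)


def shannon_decomposed(alpha, theta, p):
    log_c = np.log(alpha * theta**2 * (1.0 - p) / (1.0 + theta))
    e_log_1px = expect(lambda x: log_terms(x, alpha, theta, p)[0], alpha, theta, p)
    e_x = expect(lambda x: x, alpha, theta, p)
    e_log_g = expect(lambda x: log_terms(x, alpha, theta, p)[1], alpha, theta, p)
    e_log_d = expect(lambda x: log_terms(x, alpha, theta, p)[2], alpha, theta, p)
    return -log_c - e_log_1px + theta * e_x - (alpha - 1.0) * e_log_g + 2.0 * e_log_d


h_direct = shannon_direct(2.0, 1.0, 0.5)          # hypothetical parameter values
h_decomposed = shannon_decomposed(2.0, 1.0, 0.5)
print(h_direct, h_decomposed)
```

Replacing the parameter values by their estimates gives the plug-in entropy estimate described in the text.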

Figure 3. Plots of the Shannon entropy of the ELG distribution for different values of α, θ, and p.

3. Estimation of Parameters

We use maximum likelihood estimation for the unknown parameters (Section 3.1) and develop an Expectation-Maximization (EM) algorithm to find the maximum likelihood estimates (MLEs) (Section 3.2). We also discuss the MLEs of the unknown parameters when the data are censored (Section 3.3).

3.1. Maximum Likelihood Estimation

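The MLE is often used to estimate the unknown parameters of a distribution because of its attractive properties, such as consistency and asymptotic normality. Let $X_1,\ldots,X_n$ be a random sample from the ELG distribution with unknown parameter vector ϕ=(θ,α,p). Then the log-likelihood function l=l(ϕ;x) is given by

$$\begin{aligned}
l={}&n\log\alpha+2n\log\theta-n\log(\theta+1)+n\log(1-p)+\sum_{i=1}^{n}\log(1+x_i)-\theta\sum_{i=1}^{n}x_i\\
&+(\alpha-1)\sum_{i=1}^{n}\log\left[1-\frac{\theta+1+\theta x_i}{\theta+1}e^{-\theta x_i}\right]-2\sum_{i=1}^{n}\log\left\{1-p+p\left[1-\frac{\theta+1+\theta x_i}{\theta+1}e^{-\theta x_i}\right]^{\alpha}\right\}.
\end{aligned} \tag{19}$$

For notational convenience, let

$$\tau_i(\theta)=1-\frac{\theta+1+\theta x_i}{\theta+1}e^{-\theta x_i},$$

for i=1,…,n. The MLEs of the unknown parameters can be obtained by taking the first partial derivatives of Equation (19) with respect to α, θ, and p and setting them equal to 0. The partial derivatives are

$$\frac{\partial l}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{n}\log\tau_i(\theta)-2p\sum_{i=1}^{n}\frac{\tau_i^{\alpha}(\theta)\log\tau_i(\theta)}{1-p+p\tau_i^{\alpha}(\theta)}, \tag{20}$$

$$\frac{\partial l}{\partial\theta}=\frac{2n}{\theta}-\frac{n}{\theta+1}-\sum_{i=1}^{n}x_i+\frac{(\alpha-1)\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}\frac{x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}}{\tau_i(\theta)}-\frac{2\alpha p\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}\frac{\tau_i^{\alpha-1}(\theta)\,x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}}{1-p+p\tau_i^{\alpha}(\theta)}, \tag{21}$$

$$\frac{\partial l}{\partial p}=-\frac{n}{1-p}+2\sum_{i=1}^{n}\frac{1-\tau_i^{\alpha}(\theta)}{1-p+p\tau_i^{\alpha}(\theta)}. \tag{22}$$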

Please note that the MLEs $\hat\alpha$, $\hat\theta$, and $\hat p$ of α, θ, and p cannot be obtained analytically. Numerical iteration techniques, such as the Newton-Raphson algorithm, are required to solve these equations, and they need the second derivatives of the log-likelihood at every iteration. We therefore also develop an EM algorithm to compute the MLEs of the unknown parameters.
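As a minimal sketch of direct numerical maximization, the code below codes the log-likelihood in (19) and maximizes it with a bounded quasi-Newton routine from SciPy instead of Newton-Raphson. The choice of optimizer, the bounds (which restrict p to (0, 1)), the starting values, and the synthetic gamma-distributed data are all assumptions made for illustration; they are not the authors' implementation or data.

```python
import numpy as np
from scipy.optimize import minimize


def elg_loglik(params, x):
    """Log-likelihood in Eq. (19) for params = (alpha, theta, p)."""
    alpha, theta, p = params
    n = x.size
    tau = 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    return (
        n * np.log(alpha) + 2.0 * n * np.log(theta) - n * np.log(theta + 1.0)
        + n * np.log(1.0 - p) + np.sum(np.log1p(x)) - theta * np.sum(x)
        + (alpha - 1.0) * np.sum(np.log(tau))
        - 2.0 * np.sum(np.log(1.0 - p + p * tau**alpha))
    )


def fit_elg(x, start=(1.0, 1.0, 0.5)):
    """Maximize Eq. (19) numerically; the bounds are illustrative choices."""
    bounds = [(1e-3, 50.0), (1e-3, 20.0), (1e-6, 1.0 - 1e-6)]
    result = minimize(lambda q: -elg_loglik(q, x), start,
                      method="L-BFGS-B", bounds=bounds)
    return result.x, -result.fun


rng = np.random.default_rng(1)
x_demo = rng.gamma(shape=2.0, scale=1.5, size=200)   # hypothetical data, not from the paper
estimate, loglik_hat = fit_elg(x_demo)
print(estimate, loglik_hat)
```

Trying several starting values is advisable in practice, because the likelihood surface can be flat in p.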

For interval estimation of the parameters, we consider suitable pivotal quantities based on the asymptotic properties of the MLEs and approximate the distributions of these quantities by the normal distribution. We observe that

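$$\frac{\partial^{2}l}{\partial\alpha^{2}}=-\frac{n}{\alpha^{2}}-2p(1-p)\sum_{i=1}^{n}\frac{\tau_i^{\alpha}(\theta)\left[\log\tau_i(\theta)\right]^{2}}{\left[1-p+p\tau_i^{\alpha}(\theta)\right]^{2}},$$

$$\begin{aligned}
\frac{\partial^{2}l}{\partial\theta^{2}}={}&-\frac{2n}{\theta^{2}}+\frac{n}{(\theta+1)^{2}}-\frac{\alpha-1}{(\theta+1)^{4}}\sum_{i=1}^{n}x_ie^{-\theta x_i}\,\frac{e^{-\theta x_i}(x_i+2x_i\theta+2+2\theta)+(\theta+1)\kappa_i}{\tau_i^{2}(\theta)}\\
&+\frac{2\alpha p\theta^{2}}{(\theta+1)^{4}}\sum_{i=1}^{n}x_i^{2}(2+\theta+\theta x_i+x_i)^{2}e^{-2\theta x_i}\tau_i^{\alpha-2}(\theta)\,\frac{(1-\alpha)\left[1-p+p\tau_i^{\alpha}(\theta)\right]+\alpha p\tau_i^{\alpha}(\theta)}{\left[1-p+p\tau_i^{\alpha}(\theta)\right]^{2}}\\
&+\frac{2\alpha p}{(\theta+1)^{3}}\sum_{i=1}^{n}\frac{x_i\kappa_i\tau_i^{\alpha-1}(\theta)e^{-\theta x_i}}{1-p+p\tau_i^{\alpha}(\theta)},
\end{aligned}$$

$$\frac{\partial^{2}l}{\partial p^{2}}=-\frac{n}{(1-p)^{2}}+2\sum_{i=1}^{n}\left[\frac{1-\tau_i^{\alpha}(\theta)}{1-p+p\tau_i^{\alpha}(\theta)}\right]^{2},$$

$$\frac{\partial^{2}l}{\partial\alpha\,\partial\theta}=\frac{\partial^{2}l}{\partial\theta\,\partial\alpha}=\frac{\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}\frac{x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}}{\tau_i(\theta)}-\frac{2p\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}\frac{\left[\alpha(1-p)\log\tau_i(\theta)+1-p+p\tau_i^{\alpha}(\theta)\right]x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}\tau_i^{\alpha-1}(\theta)}{\left[1-p+p\tau_i^{\alpha}(\theta)\right]^{2}},$$

$$\frac{\partial^{2}l}{\partial\alpha\,\partial p}=\frac{\partial^{2}l}{\partial p\,\partial\alpha}=-2\sum_{i=1}^{n}\frac{\tau_i^{\alpha}(\theta)\log\tau_i(\theta)}{\left[1-p+p\tau_i^{\alpha}(\theta)\right]^{2}},$$

$$\frac{\partial^{2}l}{\partial\theta\,\partial p}=\frac{\partial^{2}l}{\partial p\,\partial\theta}=-\frac{2\alpha\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}\frac{x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}\tau_i^{\alpha-1}(\theta)}{\left[1-p+p\tau_i^{\alpha}(\theta)\right]^{2}},$$

where $\kappa_i=(\theta^{3}+\theta)(x_i+x_i^{2})+\theta^{2}(3x_i+2x_i^{2})-x_i-2$ for i=1,…,n. The last two expressions follow by differentiating (20) and (22) with respect to p and θ, respectively. The observed Fisher information matrix of α, θ, and p can be written as

$$I=-\begin{pmatrix}
\dfrac{\partial^{2}l}{\partial\alpha^{2}} & \dfrac{\partial^{2}l}{\partial\alpha\,\partial\theta} & \dfrac{\partial^{2}l}{\partial\alpha\,\partial p}\\
\dfrac{\partial^{2}l}{\partial\theta\,\partial\alpha} & \dfrac{\partial^{2}l}{\partial\theta^{2}} & \dfrac{\partial^{2}l}{\partial\theta\,\partial p}\\
\dfrac{\partial^{2}l}{\partial p\,\partial\alpha} & \dfrac{\partial^{2}l}{\partial p\,\partial\theta} & \dfrac{\partial^{2}l}{\partial p^{2}}
\end{pmatrix},$$

so the variance-covariance matrix of the MLEs $\hat\alpha$, $\hat\theta$, and $\hat p$ may be approximated by inverting the matrix I and is thus given by

$$V=I^{-1}=\begin{pmatrix}
\operatorname{var}(\hat\alpha) & \operatorname{cov}(\hat\alpha,\hat\theta) & \operatorname{cov}(\hat\alpha,\hat p)\\
\operatorname{cov}(\hat\theta,\hat\alpha) & \operatorname{var}(\hat\theta) & \operatorname{cov}(\hat\theta,\hat p)\\
\operatorname{cov}(\hat p,\hat\alpha) & \operatorname{cov}(\hat p,\hat\theta) & \operatorname{var}(\hat p)
\end{pmatrix}.$$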

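The asymptotic joint distribution of the MLEs $\hat\alpha$, $\hat\theta$, and $\hat p$ can be treated as approximately multivariate normal and is given by

$$\begin{pmatrix}\hat\alpha\\ \hat\theta\\ \hat p\end{pmatrix}\sim N\left[\begin{pmatrix}\alpha\\ \theta\\ p\end{pmatrix},\;V\right]. \tag{23}$$

Since V involves the unknown parameters α, θ, and p, we replace these parameters by their corresponding MLEs to obtain an estimate of V denoted by

$$\hat V=\begin{pmatrix}
\widehat{\operatorname{var}}(\hat\alpha) & \widehat{\operatorname{cov}}(\hat\alpha,\hat\theta) & \widehat{\operatorname{cov}}(\hat\alpha,\hat p)\\
\widehat{\operatorname{cov}}(\hat\theta,\hat\alpha) & \widehat{\operatorname{var}}(\hat\theta) & \widehat{\operatorname{cov}}(\hat\theta,\hat p)\\
\widehat{\operatorname{cov}}(\hat p,\hat\alpha) & \widehat{\operatorname{cov}}(\hat p,\hat\theta) & \widehat{\operatorname{var}}(\hat p)
\end{pmatrix}.$$

The asymptotic 100(1−γ)% confidence intervals of α, θ, and p are determined by

$$\hat\alpha\pm z_{\gamma/2}\sqrt{\widehat{\operatorname{var}}(\hat\alpha)},\qquad \hat\theta\pm z_{\gamma/2}\sqrt{\widehat{\operatorname{var}}(\hat\theta)},\qquad \hat p\pm z_{\gamma/2}\sqrt{\widehat{\operatorname{var}}(\hat p)},$$

respectively, where $z_p$ is the upper pth percentile of the standard normal distribution.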

The likelihood ratio (LR) test can be used to evaluate the difference between the ELG distribution and its special submodels. We partition the parameters of the ELG distribution into $(\phi_1,\phi_2)$, where $\phi_1$ is the parameter of interest and $\phi_2$ contains the remaining parameters. Consider the hypotheses

$$H_0:\phi_1=\phi_1^{(0)}\quad\text{versus}\quad H_1:\phi_1\neq\phi_1^{(0)}. \tag{24}$$

The LR statistic for the test of the null hypothesis in (24) is given by

$$\omega=-2\left[l(\hat\phi;x)-l(\hat\phi^{*};x)\right], \tag{25}$$

where $\hat\phi$ and $\hat\phi^{*}$ are the restricted and unrestricted maximum likelihood estimators under $H_0$ and $H_1$, respectively. Under $H_0$, it follows that

$$\omega\xrightarrow{D}\chi_{\kappa}^{2}, \tag{26}$$

where $\xrightarrow{D}$ denotes convergence in distribution as $n\to\infty$ and κ is the dimension of the subset $\phi_1$ of interest. For instance, we can compare the ELG and LG distributions by testing $H_0:\alpha=1$ versus $H_1:\alpha\neq 1$. The ELG and Lindley distributions are compared by testing $H_0:(\alpha,p)=(1,0)$ versus $H_1:(\alpha,p)\neq(1,0)$.

3.2. Expectation-Maximization Algorithm

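Dempster et al. [19] introduce an EM algorithm to estimate the parameters when some observations are treated as incomplete data. Suppose that $X=(X_1,X_2,\ldots,X_n)$ and $Z=(Z_1,Z_2,\ldots,Z_n)$ represent the observed and hypothetical data, respectively. Here, the hypothetical data can be thought of as missing data because $Z_1,Z_2,\ldots,Z_n$ are not observable. We formulate the problem of finding the MLEs as an incomplete data problem, and thus, the EM algorithm is applicable to determine the MLEs of the ELG distribution. Let W=(X,Z) denote the complete data. To start this algorithm, define the pdf of each $(X_i,Z_i)$ for i=1,…,n as

$$g(x,z;\alpha,\theta,p)=\frac{\alpha(1-p)\theta^{2}z(1+x)}{\theta+1}e^{-\theta x}\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha-1}\left\{p-p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}^{z-1}.$$

The E-step of an EM cycle requires the conditional expectation of $(Z\mid X,\alpha^{(r)},\theta^{(r)},p^{(r)})$, where $(\alpha^{(r)},\theta^{(r)},p^{(r)})$ is the current estimate of (α,θ,p) in the rth iteration. Please note that the pdf of Z given X, say $g(z\mid x)$, is given by

$$g(z\mid x)=z\left\{p-p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}^{z-1}\left\{1-p+p\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}^{2}.$$

Thus, the conditional expectation is given by

$$\mathbb{E}[Z\mid X,\alpha,\theta,p]=\frac{1+p\left\{1-\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}}{1-p\left\{1-\left[1-\frac{\theta+1+\theta x}{\theta+1}e^{-\theta x}\right]^{\alpha}\right\}}.$$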

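The log-likelihood function $l_c(W;\alpha,\theta,p)$ of the complete data, up to an additive constant, can be written as

$$\begin{aligned}
l_c(W;\alpha,\theta,p)\propto{}&\sum_{i=1}^{n}\log z_i+n\log\alpha+\sum_{i=1}^{n}\log(1+x_i)+2n\log\theta-n\log(\theta+1)-\theta\sum_{i=1}^{n}x_i+n\log(1-p)\\
&+(\alpha-1)\sum_{i=1}^{n}\log\tau_i(\theta)+\sum_{i=1}^{n}(z_i-1)\log\left[p-p\,\tau_i^{\alpha}(\theta)\right],
\end{aligned} \tag{27}$$

where $\tau_i(\theta)=1-\frac{\theta+1+\theta x_i}{\theta+1}e^{-\theta x_i}$. Next, the M-step involves the maximization of the pseudo log-likelihood function in (27). The components of the score function are given by

$$\frac{\partial l_c}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{n}\log\tau_i(\theta)-\sum_{i=1}^{n}(z_i-1)\frac{\tau_i^{\alpha}(\theta)\log\tau_i(\theta)}{1-\tau_i^{\alpha}(\theta)},$$

$$\frac{\partial l_c}{\partial\theta}=\frac{2n}{\theta}-\frac{n}{\theta+1}-\sum_{i=1}^{n}x_i+(\alpha-1)\sum_{i=1}^{n}\frac{\theta x_ie^{-\theta x_i}\left(1+x_i+\frac{1}{\theta+1}\right)}{(\theta+1)\tau_i(\theta)}-\frac{\alpha\theta}{(\theta+1)^{2}}\sum_{i=1}^{n}(z_i-1)\frac{x_i(2+\theta+\theta x_i+x_i)e^{-\theta x_i}\tau_i^{\alpha-1}(\theta)}{1-\tau_i^{\alpha}(\theta)},$$

$$\frac{\partial l_c}{\partial p}=-\frac{n}{1-p}+\sum_{i=1}^{n}\frac{z_i-1}{p}.$$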

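For notational convenience, let

$$\tau_i^{(r)}(\theta)=1-\frac{\theta^{(r)}+1+\theta^{(r)}x_i}{\theta^{(r)}+1}e^{-\theta^{(r)}x_i},$$

for i=1,…,n. By replacing the missing Z’s with their conditional expectations $\mathbb{E}[Z\mid X,\alpha^{(r)},\theta^{(r)},p^{(r)}]$, we obtain an iterative procedure of the EM algorithm given by the following equations:

$$0=\frac{n}{\alpha^{(r+1)}}+\sum_{i=1}^{n}\log\tau_i^{(r+1)}(\theta)-\sum_{i=1}^{n}(z_i-1)\frac{\left[\tau_i^{(r+1)}(\theta)\right]^{\alpha^{(r+1)}}\log\tau_i^{(r+1)}(\theta)}{1-\left[\tau_i^{(r+1)}(\theta)\right]^{\alpha^{(r+1)}}}, \tag{28}$$

$$\begin{aligned}
0={}&\frac{2n}{\theta^{(r+1)}}-\frac{n}{\theta^{(r+1)}+1}-\sum_{i=1}^{n}x_i+\left(\alpha^{(r+1)}-1\right)\sum_{i=1}^{n}\frac{\theta^{(r+1)}x_ie^{-\theta^{(r+1)}x_i}\left(1+x_i+\frac{1}{\theta^{(r+1)}+1}\right)}{\left(\theta^{(r+1)}+1\right)\tau_i^{(r+1)}(\theta)}\\
&-\frac{\alpha^{(r+1)}\theta^{(r+1)}}{\left(\theta^{(r+1)}+1\right)^{2}}\sum_{i=1}^{n}(z_i-1)\frac{x_i\left(2+\theta^{(r+1)}+\theta^{(r+1)}x_i+x_i\right)e^{-\theta^{(r+1)}x_i}\left[\tau_i^{(r+1)}(\theta)\right]^{\alpha^{(r+1)}-1}}{1-\left[\tau_i^{(r+1)}(\theta)\right]^{\alpha^{(r+1)}}},
\end{aligned} \tag{29}$$

$$p^{(r+1)}=1-\frac{n}{\sum_{i=1}^{n}z_i},$$

where

$$z_i=\frac{1+p^{(r)}\left\{1-\left[\tau_i^{(r)}(\theta)\right]^{\alpha^{(r)}}\right\}}{1-p^{(r)}\left\{1-\left[\tau_i^{(r)}(\theta)\right]^{\alpha^{(r)}}\right\}},$$

for i=1,…,n. Note that numerical methods, such as the Newton-Raphson algorithm, are needed only for solving Equations (28) and (29); the update for p is available in closed form.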
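The EM cycle above can be sketched as follows. The E-step and the closed-form update of p follow the equations in this section. For α and θ, the sketch swaps in a generic bounded optimizer applied to the complete-data log-likelihood (27) in place of solving (28) and (29) by Newton-Raphson; this substitution, the bounds, the starting values, the number of iterations, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def log_tau_terms(x, alpha, theta):
    """Return log(tau_i) and log(1 - tau_i^alpha), computed stably."""
    survival = (theta + 1.0 + theta * x) / (theta + 1.0) * np.exp(-theta * x)
    log_tau = np.log1p(-survival)
    return log_tau, np.log(-np.expm1(alpha * log_tau))


def observed_loglik(x, alpha, theta, p):
    """Observed-data log-likelihood, Eq. (19)."""
    log_tau, log_q = log_tau_terms(x, alpha, theta)
    return (x.size * np.log(alpha * theta**2 * (1.0 - p) / (theta + 1.0))
            + np.sum(np.log1p(x)) - theta * np.sum(x)
            + (alpha - 1.0) * np.sum(log_tau)
            - 2.0 * np.sum(np.log1p(-p * np.exp(log_q))))


def em_elg(x, start=(1.0, 1.0, 0.5), n_iter=30):
    alpha, theta, p = start
    n = x.size
    bounds = [(1e-3, 50.0), (1e-3, 20.0)]     # illustrative bounds for (alpha, theta)
    history = [observed_loglik(x, alpha, theta, p)]
    for _ in range(n_iter):
        # E-step: conditional expectation of Z_i given x_i and current estimates.
        q = p * np.exp(log_tau_terms(x, alpha, theta)[1])
        z = (1.0 + q) / (1.0 - q)

        # M-step for p (closed form).
        p = 1.0 - n / np.sum(z)

        # M-step for (alpha, theta): numerical maximization of Eq. (27).
        def neg_q(par):
            a, t = par
            log_tau, log_q = log_tau_terms(x, a, t)
            return -(n * np.log(a) + 2.0 * n * np.log(t) - n * np.log(t + 1.0)
                     - t * np.sum(x) + (a - 1.0) * np.sum(log_tau)
                     + np.sum((z - 1.0) * log_q))

        res = minimize(neg_q, [alpha, theta], method="L-BFGS-B", bounds=bounds)
        if res.fun <= neg_q([alpha, theta]):  # keep the step only if it improves Eq. (27)
            alpha, theta = res.x
        history.append(observed_loglik(x, alpha, theta, p))
    return (alpha, theta, p), history


rng = np.random.default_rng(2)
x_demo = rng.gamma(shape=2.0, scale=1.5, size=150)   # hypothetical data, not from the paper
estimates, history = em_elg(x_demo)
print(estimates, history[0], history[-1])
```

The recorded observed-data log-likelihood should not decrease from one iteration to the next, which is the usual diagnostic for an EM implementation.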

3.3. Censored Maximum Likelihood Estimation

Censored data often occur in lifetime data analysis. Several popular censoring mechanisms, such as type-I and type-II censoring, have received much attention in the literature. The survival function of the ELG distribution has a simple closed-form expression, and therefore, it can be used in analyzing lifetime data in the presence of censoring. We briefly discuss the general case of multicensored data. Suppose that there are $n=n_0+n_1+n_2$ subjects, of which

  • $n_0$ are known to have failed at the times $t_1,\ldots,t_{n_0}$,

  • $n_1$ are known to have failed in the interval $[s_{i-1},s_i]$ for $i=1,\ldots,n_1$,

  • $n_2$ are known to have survived to a time $r_i$ for $i=1,\ldots,n_2$ but were not observed any longer.

Please note that type-I censoring and type-II censoring are contained as particular cases of the multicensoring above. The log-likelihood function of ϕ=(θ,α,p) of the ELG distribution for this multicensoring takes the form

$$\begin{aligned}
l(\phi;x)={}&\sum_{i=1}^{n_0}\log(1+t_i)+n_0\log\left[\frac{\alpha\theta^{2}(1-p)}{\theta+1}\right]-\theta\sum_{i=1}^{n_0}t_i+(\alpha-1)\sum_{i=1}^{n_0}\log G(t_i)-2\sum_{i=1}^{n_0}\log\left[1-p+pG(t_i)^{\alpha}\right]\\
&+\sum_{i=1}^{n_1}\log\left\{\frac{G(s_i)^{\alpha}}{1-p+pG(s_i)^{\alpha}}-\frac{G(s_{i-1})^{\alpha}}{1-p+pG(s_{i-1})^{\alpha}}\right\}+\sum_{i=1}^{n_2}\log\left\{1-\frac{G(r_i)^{\alpha}}{1-p+pG(r_i)^{\alpha}}\right\},
\end{aligned}$$

where $G(z)=1-\frac{\theta+1+\theta z}{\theta+1}e^{-\theta z}$ is the Lindley cdf in (3).

It is straightforward to derive the first derivatives of the log-likelihood function with respect to the three unknown parameters α, θ, and p. Thereafter, the MLEs of the unknown parameters can be obtained by setting the first derivatives equal to zero, i.e.,

$$\frac{\partial l(\phi;x)}{\partial\theta}=\frac{\partial l(\phi;x)}{\partial\alpha}=\frac{\partial l(\phi;x)}{\partial p}=0.$$

Please note that the Newton-Raphson algorithm or other optimization algorithms may be employed to solve the above system of equations, because the MLEs of the unknown parameters cannot be obtained in closed form. Finally, the corresponding information matrix for ϕ is too complicated to be presented here.

4. Two Real-Data Applications

In this section, we illustrate the applicability of the ELG distribution using two real-data examples. We use the same data sets to compare the ELG distribution with the Gamma, Weibull, Lindley geometric (LG), and Weibull geometric (WG) distributions, whose densities are given by

  • (i) 
    Gamma(β,α)
    $f_1(x)=\frac{1}{\Gamma(\beta)}\alpha^{\beta}x^{\beta-1}e^{-\alpha x},\quad \beta>0,\ \alpha>0;$
  • (ii) 
    Weibull(α,β)
    $f_2(x)=\frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1}e^{-(x/\beta)^{\alpha}},\quad \beta>0,\ \alpha>0;$
  • (iii) 
    LG(θ,p)
    $f_3(x)=\frac{\theta^{2}}{\theta+1}(1-p)(1+x)e^{-\theta x}\left[1-\frac{p(\theta+1+\theta x)}{\theta+1}e^{-\theta x}\right]^{-2},\quad \theta>0,\ p<1;$
  • (iv) 
    WG(α,β,p)
    $f_4(x)=\alpha\beta^{\alpha}(1-p)x^{\alpha-1}e^{-(\beta x)^{\alpha}}\left[1-pe^{-(\beta x)^{\alpha}}\right]^{-2},\quad \alpha>0,\ \beta>0,\ p<1,$

for x>0, respectively. To compare the ELG distribution with the four distributions listed above, we use the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the AIC with a correction (AICc) for the two real data sets. In addition, we apply two formal goodness-of-fit statistics, the Cramér-von Mises (W*) and Anderson-Darling (A*) statistics, to further verify which distribution fits the data better; see, for example, [5,20], among others. The smaller the value of the considered criterion, the better the fit to the data.

The first data set is the remission time (in months) of a random sample of 128 bladder cancer patients. This data set, presented in Table 1, was studied by [21] in fitting the extended Lomax distribution and by [22] for the modified Weibull geometric distribution. Table 2 shows the MLEs of the parameters, AIC, BIC, and AICc for the ELG, Gamma, Weibull, LG, and WG distributions for the first data set. We observe from Table 2 that the ELG distribution and its special case LG provide an improved fit over other distributions that are commonly used for fitting lifetime data. The plots of the fitted probability density and survival functions are shown in Figure 4. Please note that the fitted density and survival functions of the ELG distribution seem to follow the data better than those of the Gamma, Weibull, and WG distributions. In addition, we observe from the values of the goodness-of-fit statistics in Table 3 that the ELG distribution fits the current data better than the other distributions under consideration.

Table 1. The first data set: the remission time (in months) of a random sample of 128 bladder cancer patients.

0.08 2.09 3.48 4.87 6.94 8.66 13.11 23.63 0.20 2.23
3.52 4.98 6.97 9.02 13.29 0.40 2.26 3.57 5.06 7.09
9.22 13.80 25.74 0.50 2.46 3.64 5.09 7.26 9.47 14.24
25.82 0.51 2.54 3.70 5.17 7.28 9.74 14.76 26.31 0.81
2.62 3.82 5.32 7.32 10.06 14.77 32.15 2.64 3.88 5.32
7.39 10.34 14.83 34.26 0.90 2.69 4.18 5.34 7.59 10.66
15.96 36.66 1.05 2.69 4.23 5.41 7.62 10.75 16.62 43.01
1.19 2.75 4.26 5.41 7.63 17.12 46.12 1.26 2.83 4.33
5.49 7.66 11.25 17.14 79.05 1.35 2.87 5.62 7.87 11.64
17.36 1.40 3.02 4.34 5.71 7.93 11.79 18.10 1.46 4.40
5.85 8.26 11.98 19.13 1.76 3.25 4.50 6.25 8.37 12.02
2.02 3.31 4.51 6.54 8.53 12.03 20.28 2.02 3.36 6.76
12.07 21.73 2.07 3.36 6.93 8.65 12.63 22.69

Table 2.

MLEs of the fitted models, AIC, BIC, and AICc for the first data set.

Model Parameters AIC BIC AICc
Gamma α^ = 0.1252, β^ = 1.1726 830.7356 836.4396 830.8316
Weibull α^ = 1.0478, β^ = 9.5607 832.1738 837.8778 832.2698
LG θ^ = 0.0742, p^ = 0.8898 823.1859 833.742 823.2819
WG α^ = 1.6042, β^ = 0.0286, p^ = 0.9362 826.1842 834.7403 826.3777
ELG α^ = 1.0792, θ^ = 0.0699, p^ = 0.9204 824.6214 833.1775 824.8149
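As a rough illustration of how the three criteria relate, the sketch below assumes the standard definitions AIC = 2k − 2 log L, BIC = k log n − 2 log L, and AICc = AIC + 2k(k+1)/(n − k − 1), where k is the number of parameters and n the sample size. The function name `information_criteria` is hypothetical, and the value of −2 log L is backed out from the tabulated AIC rather than taken from the paper. Under these assumptions, the Gamma and ELG rows of Table 2 are reproduced to within rounding.

```python
from math import log


def information_criteria(neg2_loglik, k, n):
    """Return (AIC, BIC, AICc) from -2*log-likelihood, k parameters, n observations.

    Uses the standard textbook definitions (an assumption; the paper does not
    restate them in this section).
    """
    aic = neg2_loglik + 2 * k
    bic = neg2_loglik + k * log(n)
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return aic, bic, aicc


n_patients = 128  # sample size of the first data set

# Gamma row of Table 2: AIC = 830.7356 with k = 2, so -2*logL = AIC - 2k.
gamma_aic, gamma_bic, gamma_aicc = information_criteria(830.7356 - 2 * 2, k=2, n=n_patients)

# ELG row of Table 2: AIC = 824.6214 with k = 3.
elg_aic, elg_bic, elg_aicc = information_criteria(824.6214 - 2 * 3, k=3, n=n_patients)

print(f"Gamma: BIC = {gamma_bic:.4f}, AICc = {gamma_aicc:.4f}")
print(f"ELG:   BIC = {elg_bic:.4f}, AICc = {elg_aicc:.4f}")
```

Because BIC penalizes each parameter by log n rather than 2, the three-parameter models pay a larger penalty under BIC than under AIC at this sample size.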

Figure 4.

Plots of the estimated density and survival function of the fitted models for the first data set.

Table 3.

Goodness-of-fit tests for the first data set.

Statistic
Model W* A*
Gamma 0.11988 0.71928
Weibull 0.13136 0.78643
LG 0.05374 0.33827
WG 0.01493 0.09939
ELG 0.01389 0.09498

As mentioned in Section 3.1, we can adopt the LR statistic to compare the ELG distribution with its special submodels. For example, the LR statistic for testing the LG distribution against the ELG distribution (i.e., H0: α = 1 versus H1: α ≠ 1) is ω = 0.5645, and the corresponding p-value is 0.4525. Thus, we fail to reject H0 and conclude that there is no statistically significant difference between the fits of the ELG distribution and its submodel LG to this data set. This is quite reasonable because the estimate of α in the ELG model is α^ = 1.0792, which is close to the value α = 1 that defines the LG model.
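The arithmetic behind this test can be checked with a short sketch. It assumes that the LR statistic is referred to a chi-square distribution with one degree of freedom (one restricted parameter) and that −2 log L for each model can be recovered from the AIC values in Table 2 as AIC − 2k. The function name `lr_pvalue_one_df` is hypothetical. For one degree of freedom the chi-square tail probability equals erfc(sqrt(ω/2)), so only the standard library is needed.

```python
from math import erfc, sqrt


def lr_pvalue_one_df(omega):
    """Upper-tail probability of a chi-square(1) variable at omega.

    If Z is standard normal, P(Z**2 > omega) = erfc(sqrt(omega / 2)).
    """
    return erfc(sqrt(omega / 2.0))


# -2*logL recovered from Table 2 (assumption: AIC = 2k - 2*logL).
neg2_loglik_lg = 823.1859 - 2 * 2    # LG has k = 2 parameters
neg2_loglik_elg = 824.6214 - 2 * 3   # ELG has k = 3 parameters

# LR statistic: drop in -2*logL when alpha is freed from the value 1.
omega = neg2_loglik_lg - neg2_loglik_elg
p_value = lr_pvalue_one_df(omega)

print(f"omega = {omega:.4f}, p-value = {p_value:.4f}")
```

Under these assumptions the statistic recovered from the AIC values agrees with the reported ω = 0.5645.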

In the second data set, we consider the waiting time (in minutes) before service of 100 bank customers. The data are presented in Table 4. This data set was used by [12] in fitting the Lindley distribution. Table 5 shows the MLEs of the parameters, AIC, BIC, and AICc for the ELG, Gamma, Weibull, LG, and WG distributions for the second data set. Table 5 indicates that the ELG distribution remains a strong competitor to the other lifetime distributions. In addition, the plots of the fitted probability density and survival functions are shown in Figure 5. The ELG and WG distributions perform almost identically, and the empirical survival curve and the five fitted survival curves almost overlap for this data set, which supports the conclusion that the ELG distribution fits this data set at least as well as the four alternative distributions. In addition, we observe from the values of the goodness-of-fit statistics in Table 6 that the ELG distribution fits the current data better than the Gamma, Weibull, and LG distributions and is comparable with the WG distribution.

Table 4.

The second data set: the waiting time (in minutes) before service of 100 bank customers.

0.8 0.8 1.3 1.5 1.8 1.9 1.9 2.1 2.6 2.7
2.9 3.1 3.2 3.3 3.5 3.6 4.0 4.1 4.2 4.2
4.3 4.3 4.4 4.4 4.6 4.7 4.7 4.8 4.9 4.9
5.0 5.3 5.5 5.7 5.7 6.1 6.2 6.2 6.2 6.3
6.7 6.9 7.1 7.1 7.1 7.1 7.4 7.6 7.7 8.0
8.2 8.6 8.6 8.6 8.8 8.8 8.9 8.9 9.5 9.6
9.7 9.8 10.7 10.9 11.0 11.0 11.1 11.2 11.2 11.5
11.9 12.4 12.5 12.9 13.0 13.1 13.3 13.6 13.7 13.9
14.1 15.4 15.4 17.3 17.3 18.1 18.2 18.4 18.9 19.0
19.9 20.6 21.3 21.4 21.9 23.0 27.0 31.6 33.1 38.5

Table 5.

MLEs of the fitted models, AIC, BIC, and AICc for the second data set.

Model Parameters AIC BIC AICc
Gamma α^ = 0.2033, β^ = 2.0089 638.6002 643.8106 638.724
Weibull α^ = 1.4585, β^ = 10.9553 641.4614 646.6717 641.5851
LG θ^ = 0.2027, p^ = 0.2427 641.8269 647.0372 641.9506
WG α^ = 1.9789, β^ = 0.0501, p^ = 0.82132 639.9084 647.7239 640.1584
ELG α^ = 1.4602, θ^ = 0.1725, p^ = 0.5385 640.3108 648.1263 640.5608

Figure 5.

Plots of the estimated density and survival function of the fitted models for the second data set.

Table 6.

Goodness-of-fit tests for the second data set.

Statistic
Model W* A*
Gamma 0.02761 0.18225
Weibull 0.06294 0.39624
LG 0.05374 0.33827
WG 0.01706 0.12365
ELG 0.01801 0.12665
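For readers who wish to compute statistics of this kind, the following is a minimal sketch of the modified Cramér-von Mises and Anderson-Darling statistics, assuming they are obtained by the transformation procedure usually attributed to [20]: the fitted distribution function is applied to the ordered sample, the result is mapped to normal scores, re-standardized, and compared with the uniform plotting positions. The function name `chen_balakrishnan` is hypothetical. The Weibull distribution function F(x) = 1 − exp(−(x/β)^α), with α as shape and β as scale, is an assumed parametrization used only for illustration, so the output is not claimed to match Table 6.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def chen_balakrishnan(data, cdf):
    """Return (W*, A*) for `data` under the fitted distribution function `cdf`."""
    n = len(data)
    std_normal = NormalDist()
    # Probability-integral transform of the ordered sample, mapped to normal scores.
    y = [std_normal.inv_cdf(cdf(x)) for x in sorted(data)]
    y_bar, s_y = mean(y), stdev(y)
    # Re-standardize the scores and map them back to (0, 1).
    v = [std_normal.cdf((yi - y_bar) / s_y) for yi in y]
    w2 = sum((vi - (2 * i - 1) / (2 * n)) ** 2 for i, vi in enumerate(v, 1)) + 1 / (12 * n)
    a2 = -n - sum((2 * i - 1) * (log(v[i - 1]) + log(1 - v[n - i]))
                  for i in range(1, n + 1)) / n
    # Small-sample modifications giving W* and A*.
    return w2 * (1 + 0.5 / n), a2 * (1 + 0.75 / n + 2.25 / n ** 2)


# Waiting times (in minutes) from Table 4.
waiting_times = [
    0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7,
    2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2,
    4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9,
    5.0, 5.3, 5.5, 5.7, 5.7, 6.1, 6.2, 6.2, 6.2, 6.3,
    6.7, 6.9, 7.1, 7.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8.0,
    8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 9.5, 9.6,
    9.7, 9.8, 10.7, 10.9, 11.0, 11.0, 11.1, 11.2, 11.2, 11.5,
    11.9, 12.4, 12.5, 12.9, 13.0, 13.1, 13.3, 13.6, 13.7, 13.9,
    14.1, 15.4, 15.4, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19.0,
    19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27.0, 31.6, 33.1, 38.5,
]


def weibull_cdf(x, shape=1.4585, scale=10.9553):
    """Assumed Weibull parametrization; estimates taken from Table 5."""
    return 1.0 - exp(-((x / scale) ** shape))


w_star, a_star = chen_balakrishnan(waiting_times, weibull_cdf)
print(f"W* = {w_star:.5f}, A* = {a_star:.5f}")
```

Any other fitted model can be assessed in the same way by passing its distribution function, evaluated at the MLEs, as the `cdf` argument.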

5. Concluding Remarks

In this paper, we introduced the exponentiated Lindley geometric distribution, which generalizes the LG distribution due to [14] and the Lindley distribution proposed by [23]. We have studied various statistical properties of the new distribution. Estimation of the unknown parameters is discussed based on maximum likelihood, and an EM algorithm is provided for computing the estimates. In an ongoing project, we study Bayesian inference for these parameters, and the results will be reported elsewhere.

Acknowledgments

We are very grateful to the three anonymous reviewers for their constructive comments and suggestions, which led to significant improvements of this paper. An early version of this paper is available in [24].

Author Contributions

M.W. and B.P. initiated and carried out the study. M.W. drafted the manuscript. Z.X. and B.P. participated in the data analysis and discussion. All authors read and approved the final manuscript.

Funding

The first author was partially supported by the Scientific Innovation Program of Sichuan Province (Major Engineering Project: 2018RZ0093) and the Nanchong Scientific Council (Strategic Cooperation Program between University and City: NC17SY4020).

Conflicts of Interest

The authors declare that they have no competing interests.

References

  • 1. Nadarajah S., Cancho V., Ortega E.M. The geometric exponential Poisson distribution. Stat. Methods Appl. 2013;22:355–380. doi: 10.1007/s10260-013-0230-y.
  • 2. Conti M., Gregori E., Panzieri F. Load distribution among replicated web servers: A QoS-based approach. SIGMETRICS Perform. Eval. Rev. 2000;27:12–19. doi: 10.1145/346000.346004.
  • 3. Fricker C., Gast N., Mohamed H. Mean field analysis for inhomogeneous bike sharing systems; Proceedings of the DMTCS; Montreal, QC, Canada. 18–22 June 2012; pp. 365–376.
  • 4. Adamidis K., Loukas S. A lifetime distribution with decreasing failure rate. Stat. Probab. Lett. 1998;39:35–42. doi: 10.1016/S0167-7152(98)00012-1.
  • 5. Rezaei S., Nadarajah S., Tahghighnia N. A new three-parameter lifetime distribution. Statistics. 2013;47:835–860. doi: 10.1080/02331888.2011.627587.
  • 6. Barreto-Souza W., de Morais A.L., Cordeiro G.M. The Weibull-geometric distribution. J. Stat. Comput. Simul. 2011;81:645–657. doi: 10.1080/00949650903436554.
  • 7. Pararai M., Warahena-Liyanage G., Oluyede B.O. Exponentiated power Lindley-Poisson distribution: Properties and applications. Commun. Stat. Theory Methods. 2017;46:4726–4755. doi: 10.1080/03610926.2015.1076473.
  • 8. Lindley D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. B. Methodol. 1958;20:102–107. doi: 10.1111/j.2517-6161.1958.tb00278.x.
  • 9. Gupta P., Singh B. Parameter estimation of Lindley distribution with hybrid censored data. Int. J. Syst. Assur. Eng. Manag. 2012;4:378–385. doi: 10.1007/s13198-012-0120-y.
  • 10. Mazucheli J., Achcar J.A. The Lindley distribution applied to competing risks lifetime data. Comput. Methods Progr. Biomed. 2011;104:188–192. doi: 10.1016/j.cmpb.2011.03.006.
  • 11. Zakerzadeh Y., Dolati A. Generalized Lindley distribution. J. Math. Ext. 2009;3:13–25.
  • 12. Ghitany M.E., Atieh B., Nadarajah S. Lindley distribution and its application. Math. Comput. Simul. 2008;78:493–506. doi: 10.1016/j.matcom.2007.06.007.
  • 13. Arellano-Valle R.B., Contreras-Reyes J.E., Stehlík M. Generalized skew-normal negentropy and its application to fish condition factor time series. Entropy. 2017;19:528. doi: 10.3390/e19100528.
  • 14. Zakerzadeh H., Mahmoudi E. A new two parameter lifetime distribution: Model and properties. arXiv. 2012. arXiv:1204.4248.
  • 15. Nadarajah S., Bakouch H.S., Tahmasbi R. A generalized Lindley distribution. Sankhya B. 2011;73:331–359. doi: 10.1007/s13571-011-0025-9.
  • 16. Jodrá P. Computer generation of random variables with Lindley or Poisson-Lindley distribution via the Lambert W function. Math. Comput. Simul. 2010;81:851–859. doi: 10.1016/j.matcom.2010.09.006.
  • 17. Adler A. lamW: Lambert-W Function; R package version 1.3.0. 2017. Available online: https://cran.r-project.org/web/packages/lamW/lamW.pdf (accessed on 18 May 2019).
  • 18. Leadbetter M.R., Lindgren G., Rootzén H. Extremes and Related Properties of Random Sequences and Processes. Springer; New York, NY, USA: 1983. (Springer Series in Statistics).
  • 19. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977;39:1–38. doi: 10.1111/j.2517-6161.1977.tb01600.x.
  • 20. Chen G., Balakrishnan N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995;27:154–161. doi: 10.1080/00224065.1995.11979578.
  • 21. Lemonte A.J., Cordeiro G.M. An extended Lomax distribution. Statistics. 2013;47:800–816. doi: 10.1080/02331888.2011.568119.
  • 22. Wang M., Elbatal I. The modified Weibull geometric distribution. METRON. 2015;73:303–315. doi: 10.1007/s40300-014-0052-1.
  • 23. Biçer C. Statistical inference for geometric process with the power Lindley distribution. Entropy. 2018;20:723. doi: 10.3390/e20100723.
  • 24. Wang M. A new three-parameter lifetime distribution and associated inference. arXiv. 2013. arXiv:1308.4128.

Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
