Modeling Cancer Remission Time Data by Means of the Max Erlang Binomial Distribution

Bogdan Gheorghe Munteanu

doi:10.1155/2021/9932729

. 2021 Sep 24;2021:9932729. doi: 10.1155/2021/9932729

PMCID: PMC8487378 PMID: 34608400

Abstract

In this paper, a statistical simulation algorithm for a power series distribution, called the Max Erlang Binomial distribution, is proposed, analyzed, and tested on bladder cancer remission time data. In addition to the simulation technique, the EM algorithm used to estimate the model parameters is described.

1. Introduction

The introduction of this new (generalized) distribution addresses reliability problems in which the lifetime can be expressed as the maximum or minimum of a sequence of independent and identically distributed (iid) random variables representing the risk times of the system components. In recent years, several new distributions for the maximum and minimum of a sequence of iid random variables have been proposed. For example, Adamidis and Loukas [1], Kus [2], Tahmasbi and Rezaei [3], Louzada et al. [4], and Cancho et al. [5] determined the distribution of the maximum or minimum when the components of the sequence are exponentially distributed and the number of components is a discrete random variable. Next, Flores et al. [6] treated the distribution of the maximum of exponentially distributed components whose random number follows a power series distribution. This type of distribution is called the complementary exponential power series (CEPS) distribution. Morais and Barreto-Souza [7] studied the class of Weibull distributions compounded with the power series distribution class (WPS). More recently, Louzada et al. [8] developed a mathematical model that unifies the procedure for obtaining the distribution of the maximum and minimum of a sequence of absolutely continuous iid random variables taken in a random number N characterized by its generating function. However, the problem of determining the general formula when the random variable N follows a power series distribution remains unsolved.

In this paper, simulation algorithms for this family of distributions are proposed. This study is intended to complement the research of Balkema and de Haan (1974), Bryson (1974), Ahsanullah (1991), Balakrishnan and Ahsanullah (1994), Childs and others (2001), Al Awadhi and Ghitany (2001, 2007), Zahrani and Harbi (2013), Al-Zahrani and Sagor (2014), Tahir and Cordeiro ([9], 2016), Hassan and Abd-Elfattah (2016), and Munteanu ([10], 2013). The simulation algorithm was implemented in the Eclipse SDK programming environment.

This work has the following structure: Section 2 defines the mathematical properties of the Max Erlang Binomial power series distribution (the cumulative distribution function, the probability density function, the mean, and the variance). The simulation technique for the Max Erlang Binomial distribution is formulated and analyzed in Section 3, with the results validated by the Pearson test. In Section 4, the EM algorithm for estimating the parameters of the Max Erlang Binomial distribution is described and tested on the basis of maximum likelihood estimation. Section 5 discusses an application of the proposed distribution using a real-life dataset. Lastly, in Section 6, some conclusions are drawn.

2. Development of the Mathematical Model

In [11], the properties of a new power series type distribution, called the Max Erlang Binomial (MaxErlB), are introduced and studied. As a mathematical model, this distribution describes the probabilistic behavior of lifetimes and is widely used in the study of system reliability. In [11], this distribution is presented as the distribution of the maximum value in a sample of random size Z drawn from an Erlang distributed statistical population, where Z is a zero-truncated, binomially distributed random variable. Formally, this is stated as follows.

Let us consider the random variable Z such that ℙ(Z ∈ {1, 2, ⋯}) = 1.

Definition 1 ([12]). —

We say that the random variable Z has a power series distribution if

$\begin{matrix} ℙ (Z = z) = \frac{a_{z} Θ^{z}}{A (Θ)}, z = 1, 2, \dots; Θ \in (0, τ), \end{matrix}$ (1)

where a₁, a₂, ⋯ are nonnegative real numbers, τ is a positive number bounded by the convergence radius of power series (series function) A(Θ) = ∑_z≥1 a_zΘ^z, ∀Θ ∈ (0, τ), and Θ is the power parameter of the distribution (Table 1).

Table 1.

The representative elements of the PSD families for various truncated distributions.

Distribution	a_z	Θ	A(Θ)	τ
Binom^∗(n, p)	$(\begin{matrix} n \\ z \end{matrix})$	$\frac{p}{1 - p}$	(1 + Θ)ⁿ − 1	∞
Poisson^∗(α)	$\frac{1}{z!}$	α	e^Θ − 1	∞
Log(p)	$\frac{1}{z}$	p	−ln(1 − Θ)	1
Geom^∗(p)	1	1 − p	$\frac{Θ}{1 - Θ}$	1
Pascal(k, p)	$(\begin{matrix} z - 1 \\ k - 1 \end{matrix})$	1 − p	${(\frac{Θ}{1 - Θ})}^{k}$	1
Bineg^∗(k, p)	$(\begin{matrix} z + k - 1 \\ z \end{matrix})$	p	(1 − Θ)^−k − 1	1


PSD denotes the power series distribution function families. If the random variable Z has the distribution in Equation (1), then we write Z ∈ PSD.
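As an illustration of Definition 1, the following minimal Python sketch evaluates Equation (1) for the zero-truncated binomial row of Table 1 and compares it with the ordinary binomial probabilities renormalised to exclude zero. The function names and the values n = 3, p = 0.2 are assumptions chosen for this example only.

```python
from math import comb

def psd_pmf(z, theta, a, A):
    """Equation (1): P(Z = z) = a(z) * theta**z / A(theta)."""
    return a(z) * theta**z / A(theta)

def zero_truncated_binomial_pmf(z, n, p):
    """Binom*(n, p) expressed through the first row of Table 1."""
    theta = p / (1.0 - p)
    return psd_pmf(z, theta, lambda s: comb(n, s), lambda t: (1.0 + t)**n - 1.0)

n, p = 3, 0.2
psd_probs = [zero_truncated_binomial_pmf(z, n, p) for z in range(1, n + 1)]

# Direct form: ordinary binomial probabilities renormalised to exclude z = 0.
direct_probs = [comb(n, z) * p**z * (1 - p)**(n - z) / (1 - (1 - p)**n)
                for z in range(1, n + 1)]
```

Both lists should agree up to rounding, since substituting Θ = p/(1 − p) into a_zΘ^z/A(Θ) gives the renormalised binomial probability.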

We consider that X_i ~ Erlang(k, λ), k ∈ ℕ, k ≥ 1, λ > 0, where (X_i)_{i≥1} are iid random variables with the distribution function $F_{X_{i}} (x) ≡ F_{Erl} (x) = 1 - \sum_{i = 0}^{k - 1} \frac{{(λ x)}^{i}}{i!} e^{- λ x}$, x > 0, and the probability density function $f_{X_{i}} (x) ≡ f_{Erl} (x) = \frac{λ^{k} x^{k - 1} e^{- λ x}}{(k - 1)!}$, x > 0.

We denote U_Erl = max {X₁, X₂, ⋯, X_Z}.

The results in this section are obtained using the general framework in [13], for which reason some proofs are not presented.

Proposition 1 (see [11]). —

If U_ErlB = max{X₁, X₂, ⋯, X_Z}, where (X_i)_{i≥1} are nonnegative iid random variables, X_i ~ Erlang(k, λ), k ∈ ℕ, k ≥ 1, λ > 0, and Z ~ Binom^∗(n, p), Z ∈ PSD, n ∈ {1, 2, ⋯}, with A(Θ) = (Θ + 1)ⁿ − 1, Θ = p/(1 − p), p ∈ (0, 1), Θ ∈ (0, τ), τ > 0, the random variables (X_i)_{i≥1} and Z being independent, then the cumulative distribution function and the probability density function of the random variable U_ErlB are the following:

$\begin{matrix} U_{ErlB} (x) = \frac{{(1 - p e^{- λ x} \sum_{i = 0}^{k - 1} ({(λ x)}^{i} / i!))}^{n} - {(1 - p)}^{n}}{1 - {(1 - p)}^{n}}, x > 0, \end{matrix}$ (2)

$\begin{matrix} u_{ErlB} (x) = \frac{n p λ^{k} x^{k - 1} e^{- λ x} {(1 - p e^{- λ x} \sum_{i = 0}^{k - 1} ({(λ x)}^{i} / i!))}^{n - 1}}{(k - 1)! [1 - {(1 - p)}^{n}]}, x > 0 . \end{matrix}$ (3)

Definition 2 (see [11]). —

We say that the random variable U_ErlB has a Max Erlang Binomial power series distribution with parameters k, λ, n, and p (U_ErlB ~ MaxErlB (k, λ, n, p)), if it has the cumulative distribution function (cdf) defined by Equation (2) and probability density function (pdf) defined by Equation (3).
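Equations (2) and (3) translate directly into code. The sketch below is a minimal Python version; the function names are assumptions of this example. The density is written as the derivative of Equation (2), so it carries the Erlang normalising constant 1/(k − 1)! from f_Erl, which equals 1 in the case k = 2 used throughout the paper.

```python
from math import exp, factorial

def erlang_tail(x, k, lam):
    """1 - F_Erl(x) = exp(-lam*x) * sum_{i=0}^{k-1} (lam*x)**i / i!."""
    return exp(-lam * x) * sum((lam * x)**i / factorial(i) for i in range(k))

def maxerlb_cdf(x, k, lam, n, p):
    """Equation (2): cdf of MaxErlB(k, lam, n, p)."""
    if x <= 0.0:
        return 0.0
    return ((1.0 - p * erlang_tail(x, k, lam))**n - (1.0 - p)**n) / (1.0 - (1.0 - p)**n)

def maxerlb_pdf(x, k, lam, n, p):
    """Derivative of Equation (2); coincides with Equation (3) for k = 2."""
    if x <= 0.0:
        return 0.0
    erlang_pdf = lam**k * x**(k - 1) * exp(-lam * x) / factorial(k - 1)
    return (n * p * erlang_pdf * (1.0 - p * erlang_tail(x, k, lam))**(n - 1)
            / (1.0 - (1.0 - p)**n))

# Consistency check at one point: a central difference of the cdf
# should be close to the pdf.
x0, step = 0.2, 1e-6
numeric_slope = (maxerlb_cdf(x0 + step, 2, 10.0, 3, 0.2)
                 - maxerlb_cdf(x0 - step, 2, 10.0, 3, 0.2)) / (2.0 * step)
density = maxerlb_pdf(x0, 2, 10.0, 3, 0.2)
```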

The numerical characteristics (mean, variance) of a random variable with a MaxErlB distribution, in a particular case (k = 2), are presented in the following result:

Proposition 2 . —

The mean and variance of the random variable U_ErlB ~ MaxErlB(2, λ, n, p), λ > 0, n ∈ {1, 2⋯}, p ∈ (0, 1), are characterized by the following relations:

$\begin{matrix} E U_{ErlB} = \frac{n}{λ [1 - {(1 - p)}^{n}]} \sum_{z = 1}^{n} {(- 1)}^{z - 1} (\begin{matrix} n \\ z - 1 \end{matrix}) \frac{(z + 1)! p^{z}}{z^{z + 2}}, \end{matrix}$ (4)

$\begin{matrix} V a r U_{ErlB} = \frac{n}{λ^{2} [1 - {(1 - p)}^{n}]} [\sum_{z = 1}^{n} {(- 1)}^{z - 1} (\begin{matrix} n \\ z - 1 \end{matrix}) \frac{(z + 2)! p^{z}}{z^{z + 3}} - \frac{n}{1 - {(1 - p)}^{n}} {(\sum_{z = 1}^{n} {(- 1)}^{z - 1} (\begin{matrix} n \\ z - 1 \end{matrix}) \frac{(z + 1)! p^{z}}{z^{z + 2}})}^{2}] . \end{matrix}$ (5)

Proof —

From Equation (3) and the definition of the mean, we obtain

$\begin{matrix} E U_{ErlB} = \frac{n p λ^{2}}{1 - {(1 - p)}^{n}} \int_{0}^{\infty} x^{2} e^{- λ x} {(1 - p e^{- λ x} (1 + λ x))}^{n - 1} d x, \end{matrix}$ (6)

where ${(1 - p e^{- λ x} (1 + λ x))}^{n - 1} = \sum_{z = 1}^{n} {(- 1)}^{z - 1} (\begin{matrix} n \\ z - 1 \end{matrix}) p^{z - 1} {(1 + λ x)}^{z - 1} e^{- (z - 1) λ x}$, as expanded by the binomial theorem. The resulting sum of n integrals can then be evaluated by elementary methods (integration by parts), which leads to Equation (4).

Similarly, evaluating the second-order moment

$\begin{matrix} E U_{ErlB}^{2} = \frac{n p λ^{2}}{1 - {(1 - p)}^{n}} \int_{0}^{\infty} x^{3} e^{- λ x} {(1 - p e^{- λ x} (1 + λ x))}^{n - 1} d x, \end{matrix}$ (7)

together with the definition of variance, leads us to Equation (5).
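The integrals in Equations (6) and (7) can also be evaluated numerically, which gives a check that does not depend on the closed forms. The following Python sketch does this for k = 2 with composite Simpson's rule; the quadrature rule, the truncation point 60/λ, and the function names are assumptions of this example, not part of the original derivation.

```python
from math import exp

def maxerlb2_pdf(x, lam, n, p):
    """Equation (3) with k = 2."""
    g = exp(-lam * x) * (1.0 + lam * x)
    return (n * p * lam**2 * x * exp(-lam * x) * (1.0 - p * g)**(n - 1)
            / (1.0 - (1.0 - p)**n))

def simpson(f, a, b, steps=20000):
    """Composite Simpson's rule on [a, b]; steps is forced to be even."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def maxerlb2_moments(lam, n, p):
    """Mean and variance from the integrals in Equations (6) and (7)."""
    upper = 60.0 / lam  # the integrand carries a factor exp(-60) beyond this point
    m1 = simpson(lambda x: x * maxerlb2_pdf(x, lam, n, p), 0.0, upper)
    m2 = simpson(lambda x: x * x * maxerlb2_pdf(x, lam, n, p), 0.0, upper)
    return m1, m2 - m1**2

mean, var = maxerlb2_moments(10.0, 3, 0.2)
```

For MaxErlB(2, 10, 3, 0.2), the resulting values can be compared with the empirical columns of Table 2.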

Remark 1 . —

We notice that for k = 1, we obtain the complementary exponential distribution introduced by Flores et al. [6].

3. Statistical Simulation for the MaxErlB Distribution

The simulation exploits the fact that the random variable U_ErlB ~ MaxErlB(k, λ, n, p), λ > 0, k, n ∈ {1, 2, ⋯}, p ∈ (0, 1), has the same distribution as the random variable max_{1≤i≤Z} X_i, where (X_i)_{i≥1} are iid random variables, X_i ~ Erlang(k, λ), and Z ~ Binom^∗(n, p) is independent of them. A value of Z can be obtained by generating a value of an ordinary binomially distributed random variable with the same parameters and accepting it only if it is nonzero. We can then briefly describe the following algorithm.

3.1. Statistical Simulation Algorithm for the MaxErlB Distribution

Step 1 . —

We generate a value z^⋆ of the random variable Z^⋆ ~ Binom(n, p), p ∈ (0, 1), n ∈ {1, 2, ⋯}

Step 2 . —

If z^⋆ = 0, then GO TO Step 1; otherwise, set z = z^⋆

Step 3 . —

For the value z of the random variable Z (generated in Steps 1 and 2), simulate the values x_i, i = 1, 2, ⋯, z, as values of z iid random variables with distribution Erlang(k, λ), k ∈ ℕ, k ≥ 1, λ > 0

Step 4 . —

Set u_ErlB = max_{1≤i≤z} x_i, STOP.
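Steps 1 to 4 can be sketched in Python as follows. The original implementation used the Eclipse SDK environment; this version, its function names, the seed, and the sample size are assumptions made for illustration. An Erlang(k, λ) value is generated as the sum of k independent exponential values with rate λ.

```python
import random

def sample_zero_truncated_binomial(n, p, rng):
    """Steps 1 and 2: draw Binom(n, p) values until a nonzero one appears."""
    while True:
        z = sum(1 for _ in range(n) if rng.random() < p)
        if z > 0:
            return z

def sample_erlang(k, lam, rng):
    """Erlang(k, lam) as the sum of k independent Exp(lam) values."""
    return sum(rng.expovariate(lam) for _ in range(k))

def sample_maxerlb(k, lam, n, p, rng):
    """Steps 3 and 4: the maximum of z iid Erlang(k, lam) values."""
    z = sample_zero_truncated_binomial(n, p, rng)
    return max(sample_erlang(k, lam, rng) for _ in range(z))

rng = random.Random(2021)  # fixed seed so that the run is reproducible
sample = [sample_maxerlb(2, 10.0, 3, 0.2, rng) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
```

Rejection in Steps 1 and 2 is cheap here because the probability of drawing zero is (1 − p)ⁿ, which is 0.512 for n = 3 and p = 0.2.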

Following the simulation, we can apply the Chi-square goodness-of-fit test. Based on the simulated results (u_ErlB¹, u_ErlB², ⋯, u_ErlB^m), the Chi-square criterion (Pearson's criterion) is applied to test the null hypothesis against the alternative hypothesis, respectively:

H₀: sample values (u_ErlB¹, u_ErlB², ⋯, u_ErlB^m) are values of the random variable distributed MaxErlB(2, λ, n, p)

H₁: sample values (u_ErlB¹, u_ErlB², ⋯, u_ErlB^m) are not the values of the random variable distributed MaxErlB(2, λ, n, p).

The null hypothesis H₀ is not rejected if the empirical value of χ_c² is less than the upper critical value of the Chi-square distribution with (r − 1) − L = (12 − 1) − 4 = 7 degrees of freedom (χ_0.05;7² = 14.067), where r = 12 is the number of intervals and L = 4 is the number of parameters of the distribution. Pearson's test statistic is calculated using the following relation:

$\begin{matrix} χ_{c}^{2} = \sum_{j = 1}^{r} \frac{{(n_{j} - n_{0} p_{j})}^{2}}{n_{0} p_{j}}, \end{matrix}$ (8)

where $n_{j}$, $j = \overline{1, r}$, represents the number of observed values in the interval $[t_{j - 1}, t_{j})$ and $n_{0} = \sum_{j = 1}^{r} n_{j}$.

The probabilities p_j that the random variable U_ErlB takes the values in the interval [t_j−1, t_j) are calculated using the following relation:

$\begin{matrix} p_{j} = U_{ErlB} (t_{j}) - U_{ErlB} (t_{j - 1}) \overset{(2)}{=} \frac{1}{1 - {(1 - p)}^{n}} [{(1 - p e^{- λ t_{j}} \sum_{i = 0}^{k - 1} \frac{{(λ t_{j})}^{i}}{i!})}^{n} - {(1 - p e^{- λ t_{j - 1}} \sum_{i = 0}^{k - 1} \frac{{(λ t_{j - 1})}^{i}}{i!})}^{n}], \end{matrix}$ (9)

where $t_{j}$, $j = \overline{0, r - 1}$, represent the ends of the intervals after they have been merged.
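Equations (8) and (9) can be sketched in Python as below. The interval ends in `edges`, the open last interval ending at infinity, and all function names are assumptions of this example; the paper does not list the intervals it used.

```python
from math import exp, factorial, inf

def maxerlb_cdf(x, k, lam, n, p):
    """Equation (2), extended by U(x) = 0 for x <= 0 and U(inf) = 1."""
    if x <= 0.0:
        return 0.0
    if x == inf:
        return 1.0
    tail = exp(-lam * x) * sum((lam * x)**i / factorial(i) for i in range(k))
    return ((1.0 - p * tail)**n - (1.0 - p)**n) / (1.0 - (1.0 - p)**n)

def bin_probabilities(edges, k, lam, n, p):
    """Equation (9): p_j = U(t_j) - U(t_{j-1}) for consecutive interval ends."""
    cdf = [maxerlb_cdf(t, k, lam, n, p) for t in edges]
    return [high - low for low, high in zip(cdf, cdf[1:])]

def bin_counts(sample, edges):
    """Number of sample values n_j falling in each interval [t_{j-1}, t_j)."""
    counts = [0] * (len(edges) - 1)
    for u in sample:
        for j in range(len(counts)):
            if edges[j] <= u < edges[j + 1]:
                counts[j] += 1
                break
    return counts

def pearson_statistic(counts, probs):
    """Equation (8)."""
    n0 = sum(counts)
    return sum((nj - n0 * pj)**2 / (n0 * pj) for nj, pj in zip(counts, probs))

edges = [0.0, 0.1, 0.2, 0.3, inf]  # hypothetical interval ends
probs = bin_probabilities(edges, 2, 10.0, 3, 0.2)
```

Given a simulated sample, `pearson_statistic(bin_counts(sample, edges), probs)` would then be compared with the critical value of the Chi-square distribution for the corresponding number of degrees of freedom.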

Based on the algorithm presented above, we can notice (see Table 2) that the empirical mean and variance of the simulation results are well approximated by the theoretical mean and variance of the random variable U_ErlB ~ MaxErlB(2, 10, 3, 0.2) (Proposition 2), and that the Chi-square criterion does not reject, for any sample size, the null hypothesis according to which the simulated values are governed by this distribution.

Table 2.

The validation of the MaxErlB simulation results with the application of the Chi-square test.

Sample size	Theoretical mean	Empirical mean	Theoretical variance	Empirical variance	Chi-square
100	0.2197	0.2080	0.0202	0.0182	2.061
1000	0.2197	0.2141	0.0202	0.0228	6.952
10000	0.2197	0.2151	0.0202	0.0217	5.317
100000	0.2197	0.2170	0.0202	0.0216	4.833
1000000	0.2197	0.2168	0.0202	0.0215	7.636
10000000	0.2197	0.2167	0.0202	0.0214	2.901


Moreover, the validation is confirmed for all sample sizes m ∈ {100, 1000, 10000, 100000, 1000000, 10000000}.

The histogram of the simulated data and the plot of the probability density function of the simulated distribution (Figure 1) confirm the null hypothesis visually as well.

Figure 1. Histograms of relative frequencies for samples of size m = 1000, 10000, 100000, 1000000 and the probability density function of the simulated values that are governed by the MaxErlB(2, 10, 3, 0.2) distribution.

4. EM Algorithm for the MaxErlB Distribution

The EM algorithm, introduced in 1977 in [14], refines the maximum likelihood method, which becomes practically unusable when processing incomplete statistical data. Next, the algorithm is implemented for the MaxErlB(2, λ, 3, p), λ > 0, p ∈ (0, 1) distribution.

We consider the values (x₁, x₂, ⋯, x_m) of a sample of size m from a statistical population governed by a MaxErlB distribution with the probability density function u_ErlB(x, Ψ), x > 0, which depends on the parameter vector Ψ = (λ, p), the parameter n of the zero-truncated binomial distribution and the parameter k of the Erlang distribution being known (here n = 3 and k = 2). According to the definition of the likelihood function and Equation (3), we have

$\begin{matrix} L (x_{1}, x_{2}, \dots, x_{m}; Ψ) = \prod_{j = 1}^{m} \frac{3 p λ^{2} x_{j} e^{- λ x_{j}} {(1 - p e^{- λ x_{j}} - p λ x_{j} e^{- λ x_{j}})}^{2}}{1 - {(1 - p)}^{3}} = \frac{{(3 p λ^{2})}^{m} e^{- λ \sum_{j = 1}^{m} x_{j}}}{{(1 - {(1 - p)}^{3})}^{m}} \prod_{j = 1}^{m} x_{j} {(1 - p e^{- λ x_{j}} - p λ x_{j} e^{- λ x_{j}})}^{2} . \end{matrix}$ (10)

To obtain the maximum likelihood equations for the estimates $\hat{λ}, \hat{p}$ of the parameters λ, p of the MaxErlB distribution, we consider

$\begin{matrix} \ln L (x_{1}, x_{2}, \dots, x_{m}; λ, p) = m (\ln 3 + \ln p + 2 \ln λ) - λ \sum_{j = 1}^{m} x_{j} - m \ln (1 - {(1 - p)}^{3}) + \sum_{j = 1}^{m} [\ln x_{j} + 2 \ln (1 - p e^{- λ x_{j}} - p λ x_{j} e^{- λ x_{j}})] . \end{matrix}$ (11)
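The log-likelihood can be checked term by term against the density. The following Python sketch evaluates the logarithm of Equation (10) and compares it with the sum of the log-densities from Equation (3) for k = 2, n = 3; the function names and the sample values are made up for illustration.

```python
from math import exp, log

def maxerlb23_pdf(x, lam, p):
    """Equation (3) with k = 2 and n = 3."""
    g = exp(-lam * x) * (1.0 + lam * x)
    return (3.0 * p * lam**2 * x * exp(-lam * x) * (1.0 - p * g)**2
            / (1.0 - (1.0 - p)**3))

def log_likelihood(xs, lam, p):
    """Logarithm of Equation (10), written term by term."""
    m = len(xs)
    total = (m * (log(3.0) + log(p) + 2.0 * log(lam))
             - lam * sum(xs)
             - m * log(1.0 - (1.0 - p)**3))
    for x in xs:
        total += log(x) + 2.0 * log(1.0 - p * exp(-lam * x)
                                    - p * lam * x * exp(-lam * x))
    return total

xs = [0.8, 1.5, 2.2, 3.1, 4.7]  # made-up values, for illustration only
value = log_likelihood(xs, 1.0, 0.5)
check = sum(log(maxerlb23_pdf(x, 1.0, 0.5)) for x in xs)
```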

Since the parameters n and k are considered known, the maximum likelihood estimation (MLE) equations are given by the nonlinear system S(Ψ) = 0, where S(Ψ) = ((∂lnL/∂λ), (∂lnL/∂p)). Writing out the system S(Ψ) = 0, we notice that it is difficult to solve for the unknowns λ and p. This calls for the EM algorithm, explained and analyzed by Dempster et al. [14] and later expanded by McLachlan and Krishnan [15]. In this algorithm, the random variable Z is treated as a latent random variable, that is, a random variable which cannot be observed directly.

For this, we consider, formally, the following sample:

$\begin{matrix} ((x_{1}, z_{1}), (x_{2}, z_{2}), \dots, (x_{m}, z_{m})), \end{matrix}$ (12)

of m observations of the random variable (U_ErlB, Z).

The pairs ((x₁, z₁), (x₂, z₂), ⋯, (x_m, z_m)) can be interpreted as the complete data, whereas the observed sample (x₁, x₂, ⋯, x_m) alone is a sample of incomplete data. The description of the EM algorithm requires the conditional mean 𝔼(Z|U_ErlB; Ψ), where Ψ = (λ, p).

The probability density function u_ErlB(x, Ψ), x > 0, of the random variable U_ErlB, which corresponds to the observed data, follows from the definition of the probability density of the maximum (see [13], Consequence 2.2):

$\begin{matrix} u_{ErlB} (x) = \frac{Θ λ^{2} x e^{- λ x} \{A^{'} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]\}}{A (Θ)}, x > 0 . \end{matrix}$ (13)

In these conditions, the probability density function u_ErlB(x, z) of the random variable (U_ErlB, Z) which corresponds to a complete set of data is given by

$\begin{matrix} u_{ErlB} (x, z; Ψ) = z f_{Erl} (x) {(F_{Erl} (x))}^{z - 1} ℙ (Z = z) = \frac{z a_{z} Θ^{z} λ^{2} x e^{- λ x} {(1 - e^{- λ x} - λ x e^{- λ x})}^{z - 1}}{A (Θ)}, \end{matrix}$ (14)

where A(Θ) = (1 + Θ)ⁿ − 1, Θ = p/(1 − p), p ∈ (0, 1), $a_{z} = (\begin{matrix} n \\ z \end{matrix})$, z ≤ n, and f_Erl(x) and F_Erl(x), x > 0, are, respectively, the probability density function and the cumulative distribution function of the Erlang(k, λ) distribution, here with k = 2 and λ > 0.

Then, the probability density function of the random variable Z conditioned by the random variable U_ErlB has the following expression:

$\begin{matrix} u_{ErlB} (z |x) = \frac{u_{ErlB} (x, z)}{u_{ErlB} (x)} = \frac{z a_{z} Θ^{z - 1} {(1 - e^{- λ x} - λ x e^{- λ x})}^{z - 1}}{A^{'} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]} . \end{matrix}$ (15)

Therefore, using the relation $\sum_{z \geq 1} z^{2} a_{z} Θ^{z - 2} = A^{''} (Θ) + \frac{1}{Θ} A^{'} (Θ)$, the conditional mean becomes

$\begin{matrix} E (Z |U_{ErlB}; Ψ) = \sum_{z = 1}^{n} z \cdot u_{ErlB} (z |x; Ψ) = \frac{\sum_{z = 1}^{n} z^{2} a_{z} Θ^{z - 1} {(1 - e^{- λ x} - λ x e^{- λ x})}^{z - 1}}{A^{'} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]} = \frac{Θ (1 - e^{- λ x} - λ x e^{- λ x})}{A^{'} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]} \sum_{z = 1}^{n} z^{2} a_{z} Θ^{z - 2} {(1 - e^{- λ x} - λ x e^{- λ x})}^{z - 2} = \frac{Θ (1 - e^{- λ x} - λ x e^{- λ x}) \cdot A^{''} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]}{A^{'} [Θ (1 - e^{- λ x} - λ x e^{- λ x})]} + 1 . \end{matrix}$ (16)

Since Z ~ Binom^⋆(n, p) ∈ PSD with A(Θ) = (1 + Θ)ⁿ − 1, Θ = p/(1 − p) ∈ (0, +∞), p ∈ (0, 1), for n = 3 we have

$\begin{matrix} E (Z |U_{ErlB}; Ψ) = \frac{2 p (1 - e^{- λ x} - λ x e^{- λ x})}{1 - p e^{- λ x} - p λ x e^{- λ x}} + 1 . \end{matrix}$ (17)
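The closed form in Equation (17) can be compared numerically with the direct sum over the conditional distribution in Equation (15). The Python sketch below does this for the zero-truncated binomial case, using A′(t) = n(1 + t)^{n−1}; the function names and the evaluation points are assumptions of this example.

```python
from math import comb, exp

def erlang2_cdf(x, lam):
    """F_Erl(x) for k = 2: 1 - exp(-lam*x) * (1 + lam*x)."""
    return 1.0 - exp(-lam * x) * (1.0 + lam * x)

def posterior_z(x, lam, n, p):
    """Equation (15) for Z ~ Binom*(n, p): P(Z = z | U_ErlB = x), z = 1..n."""
    t = p / (1.0 - p) * erlang2_cdf(x, lam)  # argument of A' in Equation (15)
    a_prime = n * (1.0 + t)**(n - 1)         # A'(t) for A(t) = (1 + t)**n - 1
    return [z * comb(n, z) * t**(z - 1) / a_prime for z in range(1, n + 1)]

def conditional_mean_direct(x, lam, n, p):
    """First equality of Equation (16): sum of z * P(Z = z | x)."""
    weights = posterior_z(x, lam, n, p)
    return sum(z * w for z, w in zip(range(1, n + 1), weights))

def conditional_mean_closed(x, lam, p):
    """Equation (17), valid for n = 3 and k = 2."""
    F = erlang2_cdf(x, lam)
    return 2.0 * p * F / (1.0 - p * (1.0 - F)) + 1.0

points = [0.1, 1.0, 5.0]
direct = [conditional_mean_direct(x, 1.0, 3, 0.5) for x in points]
closed = [conditional_mean_closed(x, 1.0, 0.5) for x in points]
```

The conditional mean always lies between 1 and n, and it increases with x: a large observed maximum makes a large latent sample size more plausible.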

We describe the EM algorithm for the MaxErlB(2, λ, 3, p) distribution as an iterative process of estimating the unknown parameter Ψ = (λ, p) through Ψ^(h) = (λ^(h), p^(h)), calculated for successive steps h ≥ 1 until the following condition is satisfied:

$\begin{matrix} \max (|λ^{(h)} - λ^{(h - 1)}|, |p^{(h)} - p^{(h - 1)}|) \leq ε, \end{matrix}$ (18)

or until h = K, where ε > 0 is a preset tolerance and K represents the preset maximum number of iterations.

The steps of the EM algorithm for MaxErlB distribution are the following:

Step 5 . —

We take λ = λ⁽⁰⁾, p = p⁽⁰⁾, λ⁽⁰⁾ > 0, p⁽⁰⁾ ∈ (0, 1)

Step 6 . (Expectation). —

At iteration h, h ≥ 1, we calculate the conditional mean z_j^{(h − 1)}, $j = \overline{1, m}$, according to Equation (17) for k = 2:

$\begin{matrix} z_{j}^{(h - 1)} = \frac{2 p^{(h - 1)} (1 - e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j}))}{1 - p^{(h - 1)} e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j})} + 1 \end{matrix}$ (19)

Step 7 . (Maximization). —

Through the maximum likelihood estimation (MLE) method, we take into consideration the following sample:

$\begin{matrix} ((x_{1}, z_{1}^{(h - 1)}), (x_{2}, z_{2}^{(h - 1)}), \dots, (x_{m}, z_{m}^{(h - 1)})), \end{matrix}$ (20)

with the maximum likelihood function:

$\begin{matrix} L (x_{1}, \dots, x_{m}, z_{1}^{(h - 1)}, \dots, z_{m}^{(h - 1)}; Ψ^{(h - 1)}) = \prod_{j = 1}^{m} u_{ErlB} (x_{j}, z_{j}^{(h - 1)}; Ψ^{(h - 1)}) = \prod_{j = 1}^{m} [z_{j}^{(h - 1)} x_{j} (\begin{matrix} 3 \\ z_{j}^{(h - 1)} \end{matrix}) {(p^{(h - 1)})}^{z_{j}^{(h - 1)}} {(1 - p^{(h - 1)})}^{3 - z_{j}^{(h - 1)}} \cdot \frac{{(λ^{(h - 1)})}^{2} e^{- λ^{(h - 1)} x_{j}} {(1 - e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j}))}^{z_{j}^{(h - 1)} - 1}}{1 - {(1 - p^{(h - 1)})}^{3}}] = {[\frac{{(λ^{(h - 1)})}^{2}}{1 - {(1 - p^{(h - 1)})}^{3}}]}^{m} \prod_{j = 1}^{m} [z_{j}^{(h - 1)} x_{j} (\begin{matrix} 3 \\ z_{j}^{(h - 1)} \end{matrix}) {(p^{(h - 1)})}^{z_{j}^{(h - 1)}} {(1 - p^{(h - 1)})}^{3 - z_{j}^{(h - 1)}} \cdot e^{- λ^{(h - 1)} x_{j}} {(1 - e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j}))}^{z_{j}^{(h - 1)} - 1}] . \end{matrix}$ (21)

Maximizing this function gives the iterate Ψ^(h) = (λ^(h), p^(h)), which estimates the parameters Ψ = (λ, p)

Step 8 . —

We examine condition (18). If it is not satisfied, then GO TO Step 6; otherwise, Ψ≔Ψ^(h), STOP.

Given the function

$\begin{matrix} \ln L (x_{1}, x_{2}, \dots, x_{m}, z_{1}^{(h - 1)}, z_{2}^{(h - 1)}, \dots, z_{m}^{(h - 1)}; Ψ^{(h - 1)}) = 2 m \ln λ^{(h - 1)} - m \ln [1 - {(1 - p^{(h - 1)})}^{3}] + \sum_{j = 1}^{m} \{\ln (\begin{matrix} 3 \\ z_{j}^{(h - 1)} \end{matrix}) + \ln z_{j}^{(h - 1)} + z_{j}^{(h - 1)} \ln p^{(h - 1)} + (3 - z_{j}^{(h - 1)}) \ln (1 - p^{(h - 1)}) + \ln x_{j} - λ^{(h - 1)} x_{j} + (z_{j}^{(h - 1)} - 1) \ln [1 - e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j})]\}, \end{matrix}$ (22)

the maximum likelihood equations are given by the nonlinear system S(Ψ^{(h − 1)}) = 0, where S(Ψ^{(h − 1)}) = ((∂lnL/∂λ^{(h − 1)}), (∂lnL/∂p^{(h − 1)})), namely

$\begin{matrix} S (Ψ^{(h - 1)}) : \left\{ \begin{matrix} \frac{2 m}{λ^{(h - 1)}} + \sum_{j = 1}^{m} (\frac{x_{j}^{2} λ^{(h - 1)} (z_{j}^{(h - 1)} - 1) e^{- λ^{(h - 1)} x_{j}}}{1 - e^{- λ^{(h - 1)} x_{j}} (1 + λ^{(h - 1)} x_{j})} - x_{j}) = 0 \\ - \frac{3 m {(1 - p^{(h - 1)})}^{2}}{1 - {(1 - p^{(h - 1)})}^{3}} - \frac{3 m}{1 - p^{(h - 1)}} + \frac{\sum_{j = 1}^{m} z_{j}^{(h - 1)}}{p^{(h - 1)} (1 - p^{(h - 1)})} = 0 . \end{matrix} \right. \end{matrix}$ (23)
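The whole iteration can be sketched in Python as follows. The original implementation used Octave; here the two equations of system (23) are each solved by bisection, which is an assumption of this sketch, since the paper does not state which solver was used. The function names, starting values, tolerance, seed, and sample size are likewise chosen only for illustration.

```python
import random
from math import exp, log

def erl2_cdf(x, lam):
    """Erlang(2, lam) cdf: 1 - exp(-lam*x) * (1 + lam*x)."""
    return 1.0 - exp(-lam * x) * (1.0 + lam * x)

def e_step(xs, lam, p):
    """Equation (19): conditional mean of Z given each x_j."""
    return [2.0 * p * erl2_cdf(x, lam) / (1.0 - p * (1.0 - erl2_cdf(x, lam))) + 1.0
            for x in xs]

def score_lam(lam, xs, zs):
    """Left-hand side of the first equation in system (23)."""
    total = 2.0 * len(xs) / lam
    for x, z in zip(xs, zs):
        total += (z - 1.0) * lam * x * x * exp(-lam * x) / erl2_cdf(x, lam) - x
    return total

def score_p(p, zs):
    """Left-hand side of the second equation in system (23)."""
    m, q = len(zs), 1.0 - p
    return -3.0 * m * q * q / (1.0 - q**3) - 3.0 * m / q + sum(zs) / (p * q)

def bisect(f, lo, hi, iters=40):
    """Root of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def m_step(xs, zs, lam):
    """Solve system (23); the bracket for lambda grows from the current value."""
    lo = hi = lam
    while score_lam(lo, xs, zs) <= 0.0:
        lo /= 2.0
    while score_lam(hi, xs, zs) >= 0.0:
        hi *= 2.0
    return (bisect(lambda v: score_lam(v, xs, zs), lo, hi),
            bisect(lambda v: score_p(v, zs), 1e-9, 1.0 - 1e-9))

def em(xs, lam, p, eps=1e-4, max_iter=200):
    """Alternate E and M steps until condition (18) holds or h = max_iter."""
    for h in range(1, max_iter + 1):
        new_lam, new_p = m_step(xs, e_step(xs, lam, p), lam)
        converged = max(abs(new_lam - lam), abs(new_p - p)) <= eps
        lam, p = new_lam, new_p
        if converged:
            break
    return lam, p, h

def log_likelihood(xs, lam, p):
    """Observed-data log-likelihood, the logarithm of Equation (10)."""
    return sum(log(3.0 * p * lam**2 * x * exp(-lam * x)
                   * (1.0 - p * (1.0 - erl2_cdf(x, lam)))**2 / (1.0 - (1.0 - p)**3))
               for x in xs)

def sample_maxerlb23(lam, p, rng):
    """One draw from MaxErlB(2, lam, 3, p) by the scheme of Section 3.1."""
    z = 0
    while z == 0:
        z = sum(rng.random() < p for _ in range(3))
    return max(rng.expovariate(lam) + rng.expovariate(lam) for _ in range(z))

rng = random.Random(7)  # fixed seed so that the run is reproducible
xs = [sample_maxerlb23(1.0, 0.5, rng) for _ in range(300)]
lam_hat, p_hat, iterations = em(xs, lam=0.5, p=0.3)
```

Bisection is used because both left-hand sides change sign exactly once on their domains, so no derivative information is needed. A useful sanity check is that the observed-data log-likelihood at the returned estimate is not lower than at the starting point, as expected for EM iterations.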

Table 3 shows the results obtained from the implementation of the EM algorithm described above in the Octave 1.5.4 GUI programming environment. For the sample sizes m ∈ {100, 1000, 10000, 100000, 1000000}, we obtain very good approximations of the parameters λ and p that characterize the MaxErlB distribution when the parameters k and n are known.

Table 3.

The estimate of the parameter vector Ψ = (λ, p) of MaxErlB(2, λ, 3, p) distribution by $\hat{Ψ} = (\hat{λ}, \hat{p})$ .

Sample size	(λ, p)	$\hat{λ}$	$\hat{p}$	Iterations h
100	(1, 0.5)	1.017	0.494	134
1000	(1, 0.5)	1.030	0.518	144
10000	(1, 0.5)	0.999	0.496	152
100000	(1, 0.5)	0.999	0.503	150
1000000	(1, 0.5)	0.998	0.497	152


5. Application

We will now consider a dataset which represents the remission times (in months) of a random sample of 128 bladder cancer patients. The dataset itself has previously been used in [16–18]. It is summarized as follows: 0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.52, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 10.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51, 6.54, 8.53, 12.03, 20.28, 2.02, 3.36, 6.76, 12.07, 21.73, 2.07, 3.36, 6.93, 8.65, 12.63, and 22.69.

Figure 2 shows the histogram of relative frequencies of the bladder cancer remission times, where the curve represents the pdf of the random variable U_ErlB ~ MaxErlB(2, 10, 3, 0.2) defined by Equation (3).

Figure 2. A histogram and probability density function plot for remission times of bladder cancer.

6. Conclusion

The conclusions of the present research concern power series type distributions of the maximum of a sequence of iid random variables taken in a random number.

The distribution of the maximum of a random number of iid random variables, where the random number belongs to the PSD family, was presented in a compact, coherent approach.

For this purpose, programs for the statistical simulation of the MaxErlB power series type distribution were developed. The simulated values were validated using Pearson's goodness-of-fit test, and the results are reflected in Table 2. The EM algorithm was implemented in the GUI Octave 1.5.4 programming environment to estimate the parameters of the MaxErlB distribution, and its results are presented in Table 3.

A real dataset of bladder cancer remission times was used to compare the histogram of the relative frequencies of the remission times with the plot of the probability density function of the MaxErlB distribution (Figure 2).

Data Availability

All data are fully available without restriction.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Adamidis K., Loukas S. A lifetime distribution with decreasing failure rate. Statistics and Probability Letters. 1998;39(1):35–42. doi: 10.1016/S0167-7152(98)00012-1.
2. Kus C. A new lifetime distribution. Computational Statistics and Data Analysis. 2007;51(9):4497–4509. doi: 10.1016/j.csda.2006.07.017.
3. Tahmasbi R., Rezaei S. A two-parameter lifetime distribution with decreasing failure rate. Computational Statistics and Data Analysis. 2008;52(8):3889–3901. doi: 10.1016/j.csda.2007.12.002.
4. Louzada F., Roman M., Cancho V. G. The complementary exponential geometric distribution: model, properties, and a comparison with its counterpart. Computational Statistics and Data Analysis. 2011;55(8):2516–2524. doi: 10.1016/j.csda.2011.02.018.
5. Cancho V. G., Louzada-Neto F., Barriga G. The Poisson-exponential lifetime distribution. Computational Statistics and Data Analysis. 2011;55(1):677–686. doi: 10.1016/j.csda.2010.05.033.
6. Flores D. J., Borges P., Cancho V. G., Louzada F. The complementary exponential power series distribution. Brazilian Journal of Probability and Statistics. 2013;27(4):565–584. doi: 10.1214/11-BJPS182.
7. Morais A. L., Barreto-Souza W. A. A compound class of Weibull and power series distributions. Computational Statistics and Data Analysis. 2011;55(3):1410–1425. doi: 10.1016/j.csda.2010.09.030.
8. Louzada F., Bereta M. P. E., Franco M. A. P. On the distribution of the minimum or maximum of a random number of i.i.d. lifetime random variables. Applied Mathematics. 2012;3(4):350–353. doi: 10.4236/am.2012.34054.
9. Tahir M. H., Cordeiro G. M. Compounding of distributions: a survey and new generalized classes. Journal of Statistical Distributions and Applications. 2016;3:2–35. doi: 10.1186/s40488-016-0052-1.
10. Munteanu B. G. The Min-Pareto power series distributions of lifetime. Applied Mathematics & Information Sciences. 2016;10(5):1673–1679. doi: 10.18576/amis/100505.
11. Leahu A., Munteanu B. G., Cataranciuc S. Max-Erlang and Min-Erlang power series distributions as two new families of lifetime distribution. Buletinul Academiei de Stiinte a Republicii Moldova. Matematica. 2014;2(75):60–73.
12. Johnson N. L., Kemp A. W., Kotz S. Univariate Discrete Distributions. Hoboken, NJ, USA: Wiley; 2005.
13. Leahu A., Munteanu B. G., Cataranciuc S. On the lifetime as the maximum or minimum of the sample with power series distributed size. Romai Journal. 2013;9(2):119–128.
14. Dempster A. P., Laird N. M., Rubin D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B. 1977;39:1–38. doi: 10.1111/j.2517-6161.1977.tb01600.x.
15. McLachlan G. J., Krishnan T. The EM Algorithm and Extensions. New York, NY, USA: Wiley; 1997.
16. Ieren T. G., Chukwu A. U. Bayesian estimation of a shape parameter of the Weibull-Frechet distribution. Asian Journal of Probability and Statistics. 2018;2(1):1–19. doi: 10.9734/ajpas/2018/v2i124562.
17. Lee E. T., Wang J. W. Statistical Methods for Survival Data Analysis. 3rd ed. Hoboken, NJ, USA: Wiley; 2003.
18. Rady E. A., Hassanein W. A., Elhaddad T. A. The power Lomax distribution with an application to bladder cancer data. SpringerPlus. 2016;5(1):p. 1838. doi: 10.1186/s40064-016-3464-y.
