Abstract
A new flexible univariate probability distribution is defined in this paper. The new distribution, called the ‘exponentiated Gumbel–Weibull {logistic} distribution’, arises by using the exponentiated Gumbel distribution to generate a generalized Weibull distribution, with the logit function (the quantile function of the logistic distribution) as the link. The new distribution can be unimodal or bimodal and exhibits various shape and tail properties consistent with data arising from several real-life phenomena. A detailed study of its statistical properties is carried out, and the maximum likelihood method is used to estimate its parameters. The new distribution is applied to the reported daily number of COVID-19 infections in Nigeria. Five other data sets are further used to ascertain the flexibility of the new distribution in fitting data sets with different statistical properties.
Keywords: T–R {Y} family, Gumbel distribution, Weibull distribution, Maximum likelihood estimation, Monte Carlo Simulations
Introduction
Data science draws on methodologies from disparate fields to extract information from data, usually for policy purposes. These include statistical and scientific methodologies, artificial intelligence, and data analysis methodologies [1–3]. They come in handy for aggregating, cleaning and preparing data for analysis, for manipulating data, and for finding the specific patterns or trajectories that data follow. In statistical modeling, the usual practice is to find the stochastic model that best describes the behavior of a given data set. Such models are usually completely specified as probability distribution functions, from which other desirable properties of the data are obtained, either for policy making or for further investigation. The need for distribution functions that can adequately describe the stochastic behavior of data sets arising from real-life situations is one of the major drivers of the development of new and more flexible families of probability distributions. In applications, the classical probability distributions have been found in many studies to be unable to adequately fit data sets with varying shape and tail properties, hence the increasing volume of research devoted to generalizing them and thereby increasing their flexibility. Several methods have been put forward in the literature for generalizing a probability distribution [4–12], each with its attendant benefits and shortcomings.
The COVID-19 pandemic has ravaged the entire world, bringing with it economic, social and behavioral challenges and responses. Several studies using mathematical, statistical and behavioral models, as well as artificial intelligence frameworks, have already been put forward to explain the evolution, transmission and impact of the pandemic in several countries, using data on the daily, weekly or monthly number of infections [13–21]. However, data of this nature tend to possess one or more characteristics that the classical probability distributions used in statistical modeling may not be able to capture. For example, such data tend to be highly skewed, either to the right or to the left, possibly with some outlying observations. A classical distribution such as the normal distribution cannot adequately fit such data, and it becomes imperative to use a very flexible distribution, such as a member of a generalized family of distributions. In this paper, a new probability distribution that generalizes the classical Weibull distribution is developed and used to fit the daily number of COVID-19 infections in Nigeria. The new distribution is further used to fit five other data sets in order to demonstrate its flexibility.
The rest of the paper is organized thus. In Sect. 2, the new distribution is presented. A discussion on some of the statistical properties of the distribution is contained in Sect. 3. The process of using the maximum likelihood method for the estimation of the parameters of the distribution is contained in Sect. 4 while application of the distribution to real data sets is carried out in Sect. 5. The paper closes in Sect. 6 with summary and conclusion.
The New Distribution
Suppose T is a random variable following the exponentiated Gumbel distribution defined by [22], with cumulative distribution function (cdf), probability density function (pdf) and quantile function given respectively by
Suppose also that R is a Weibull random variable with cdf, pdf and quantile function given respectively by
Let Y be a standard logistic random variable with cdf, pdf and quantile function given respectively by
The cdf
| 1 |
is a valid cdf and from (1) we have the cdf of the 5-parameter exponentiated Gumbel–Weibull {logistic} (EGuWL) distribution given as
| 2 |
The pdf corresponding to (2) is expressed as
| 3 |
where the parameters and control the shape of the distribution and is a scale parameter. The graphs of the pdf in (3) are shown in Figs. 1, 2 and 3 for various combinations of parameter values. The quantile function corresponding to the cdf in (1) is given by
| 4 |
Fig. 1.
EGuWL density showing skewness to the right
Fig. 2.
EGuWL density showing symmetry and bimodality
Fig. 3.
EGuWL density showing bimodality and left skewness
In Fig. 1, for fixed values of the parameters and we observe that the EGuWL density is highly skewed to the right when the parameters are varied. In fact, for decreasing (increasing) values of the parameter the density falls exponentially. This behavior shows that the EGuWL distribution can be very effective in fitting highly right-skewed data sets with possible outliers, or reverse-J shaped data sets. In Fig. 2, for fixed values of and and varied values of the EGuWL density can be bimodal or almost symmetric. For negative values of and increasing (decreasing) values of the parameter , the EGuWL density is bimodal, while for non-negative values of the parameter and increasing (decreasing) values of the parameter , the EGuWL density is almost symmetric. The EGuWL distribution can therefore be used for fitting bimodal and near-symmetric data sets. In Fig. 3, the EGuWL density is also observed to be left-skewed. In fact, for fixed values of and the density is skewed to the left when the value of is decreasing and when the value of is increasing, so the EGuWL distribution can also be used to fit left-skewed data sets. Observe that in Figs. 1, 2 and 3 the value of the parameter is always fixed. This is because is a scale parameter and its value does not affect the shape of the density.
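The construction behind (1)–(4) can be sketched numerically. The following is a minimal illustration only, under stated assumptions: the exponentiated Gumbel cdf is taken as 1 − [1 − exp(−exp(−(t − mu)/sigma))]^a, the Weibull cdf as 1 − exp(−(x/lam)^c), and the EGuWL cdf as the exponentiated Gumbel cdf evaluated at the logit of the Weibull cdf. The parameter names `a`, `sigma`, `mu`, `c` and `lam` are hypothetical labels chosen for this sketch and need not match the symbols used in the paper.

```python
import math

def logit_weibull(x, c, lam):
    """log(F_R(x) / (1 - F_R(x))) for the Weibull cdf F_R(x) = 1 - exp(-(x/lam)**c)."""
    z = (x / lam) ** c
    if z == 0.0:
        return -math.inf
    # log(exp(z) - 1); for large z this equals z up to rounding error
    return z if z > 30.0 else math.log(math.expm1(z))

def eguwl_cdf(x, a, sigma, mu, c, lam):
    """Assumed EGuWL cdf: exponentiated Gumbel cdf evaluated at logit_weibull(x)."""
    if x <= 0.0:
        return 0.0
    w = -(logit_weibull(x, c, lam) - mu) / sigma
    if w > 700.0:                      # exp(w) would overflow; the cdf is 0 here
        return 0.0
    one_minus_g = -math.expm1(-math.exp(w))   # 1 - Gumbel cdf, computed stably
    return 1.0 - one_minus_g ** a

def eguwl_pdf(x, a, sigma, mu, c, lam):
    """Density obtained by differentiating eguwl_cdf with the chain rule."""
    if x <= 0.0:
        return 0.0
    w = -(logit_weibull(x, c, lam) - mu) / sigma
    if w > 700.0 or w < -700.0:        # density is 0 to machine precision
        return 0.0
    z = (x / lam) ** c
    e = math.exp(w)
    g, one_minus_g = math.exp(-e), -math.expm1(-e)
    f_t = (a / sigma) * one_minus_g ** (a - 1.0) * g * e   # exponentiated Gumbel pdf
    dt_dx = (c / x) * z / (-math.expm1(-z))                # derivative of the logit
    return f_t * dt_dx

def eguwl_quantile(p, a, sigma, mu, c, lam):
    """Inverse of eguwl_cdf for 0 < p < 1."""
    t = mu - sigma * math.log(-math.log(1.0 - (1.0 - p) ** (1.0 / a)))
    softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # log(1 + exp(t))
    return lam * softplus ** (1.0 / c)
```

Evaluating `eguwl_pdf` on a grid of positive x values for different parameter choices is one way to explore the right-skewed, bimodal and left-skewed shapes described above.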
Proposition 1:
Suppose X is an EGuWL random variable, and U and T are a uniform random variable defined on (0, 1) and an exponentiated Gumbel random variable, respectively. Then
-
(i)
-
(ii)
Proof:
The proofs of (i) and (ii) follow from (1) and (4) respectively. Proposition 1 is very useful for simulating random samples from the EGuWL distribution: one first simulates from the exponentiated Gumbel distribution or the uniform distribution and then applies the corresponding transformation. The relation in (i) can also be used to determine the moments of the EGuWL distribution.
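The simulation recipe described in the proof can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the exponentiated Gumbel quantile function mu − sigma·log(−log(1 − (1 − u)^(1/a))) and the transformation X = lam·[log(1 + exp(T))]^(1/c), which is what Proposition 1(i) reduces to if the Weibull cdf is 1 − exp(−(x/lam)^c) and the link is the logit. All parameter and function names are hypothetical.

```python
import math
import random

def egu_quantile(u, a, sigma, mu):
    """Assumed exponentiated Gumbel quantile function for 0 < u < 1."""
    return mu - sigma * math.log(-math.log(1.0 - (1.0 - u) ** (1.0 / a)))

def egu_to_eguwl(t, c, lam):
    """Assumed transformation of an exponentiated Gumbel draw t to an EGuWL draw."""
    softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # log(1 + exp(t))
    return lam * softplus ** (1.0 / c)

def sample_eguwl(n, a, sigma, mu, c, lam, seed=None):
    """Draw n values: uniform -> exponentiated Gumbel -> EGuWL."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # keep u strictly inside (0, 1) so that both logarithms are defined
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        draws.append(egu_to_eguwl(egu_quantile(u, a, sigma, mu), c, lam))
    return draws
```

Composing the two steps is the inverse-transform method applied to the EGuWL quantile function, so the route through the uniform distribution and the route through the exponentiated Gumbel distribution coincide in this sketch.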
Statistical Properties of the New Distribution
Here we present some essential statistical properties of the EGuWL distribution, beginning with the hazard function.
Hazard Function
The hazard function of the EGuWL distribution is expressed as
| 5 |
Figures 4, 5 and 6 display the EGuWL hazard function for various combinations of parameter values. They show that the hazard can be decreasing, increasing or upside-down bathtub shaped, which makes the distribution useful in lifetime data analysis.
Fig. 4.
EGuWL hazard showing decreasing and upside down bathtub shapes
Fig. 5.
EGuWL hazard showing increasing shapes
Fig. 6.
EGuWL hazard showing increasing shapes
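A hazard curve of the kind shown in Figs. 4, 5 and 6 can be evaluated as the ratio of the density to the survival function. The sketch below is illustrative only and rests on the same assumed forms as before (exponentiated Gumbel cdf 1 − [1 − G(t)]^a with G the Gumbel cdf, Weibull cdf 1 − exp(−(x/lam)^c), logit link), under which the survival function is (1 − G)^a. The parameter names are hypothetical, and the functions are intended for moderate positive x, with no overflow guards.

```python
import math

def _logit_weibull(x, c, lam):
    """log(exp(z) - 1) with z = (x/lam)**c, the logit of the Weibull cdf."""
    z = (x / lam) ** c
    return z if z > 30.0 else math.log(math.expm1(z))

def eguwl_survival(x, a, sigma, mu, c, lam):
    """Assumed EGuWL survival function 1 - F(x) = (1 - G)**a for x > 0."""
    e = math.exp(-(_logit_weibull(x, c, lam) - mu) / sigma)
    return (-math.expm1(-e)) ** a

def eguwl_hazard(x, a, sigma, mu, c, lam):
    """h(x) = f(x) / (1 - F(x)) under the assumed EGuWL form, for x > 0."""
    z = (x / lam) ** c
    e = math.exp(-(_logit_weibull(x, c, lam) - mu) / sigma)
    g, one_minus_g = math.exp(-e), -math.expm1(-e)   # Gumbel cdf and complement
    dt_dx = (c / x) * z / (-math.expm1(-z))          # derivative of the logit
    return (a / sigma) * e * g / one_minus_g * dt_dx
```

Evaluating `eguwl_hazard` over a grid of x values for several parameter choices gives a quick check of whether a particular combination yields a decreasing, increasing or upside-down bathtub shape.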
Mode
Proposition 2:
The mode(s) of the EGuWL distribution is either at or it will satisfy the equation.
| 6 |
where
Proof:
As observed from the graphs of the EGuWL density, the distribution can be both unimodal and bimodal. On differentiating the EGuWL density w.r.t , one obtains.
The derivative does not exist when . Other critical point(s) satisfy , hence the EGuWL distribution mode(s) will either be at or it will satisfy the equation
Remark 1:
Observe that the expression is a factor of and has the same sign as . Analytical solution of (6) for is not possible. However, (6) can be solved numerically in order to obtain the desired mode(s).
Moments
An expression for computing the non-central moments of the EGuWL distribution can easily be obtained by making use of the relationship between the EGuWL random variable and the exponentiated Gumbel random variable specified in Proposition 1(i). In particular, the relation implies that
Since the EGuWL random variable is, by Proposition 1(i), a transformation of an exponentiated Gumbel random variable, its moments can be obtained by integrating against the density of the exponentiated Gumbel distribution rather than the more complex density of the EGuWL distribution. This is a major result of this paper. It follows that
| 7 |
The th non-central moments of the EGuWL distribution are computed from the relation in (7). The mean , variance , skewness and kurtosis of the EGuWL distribution are given respectively as
The quantile function can also be used in computing the skewness and kurtosis of a distribution, especially when it exists in a simple analytic form. Galton [23] proposed a quantile-based measure of skewness, while Moors [24] did the same for kurtosis. Galton’s skewness and Moors’ kurtosis are evaluated using the relations
Since the quantile function of the EGuWL distribution exists in a simple analytic form as expressed in (4), the above expressions can be used in computing the skewness and kurtosis of the EGuWL distribution. 3-D plots of Galton’s skewness and Moors’ kurtosis of the EGuWL distribution for some selected parameter values are presented in Fig. 7.
Fig. 7.
Galton’s skewness (S) and Moors’ kurtosis (K) for the EGuWL distribution (k = 0, λ = 1, β = 0.5)
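Both the moment-based and the quantile-based shape measures can be computed from a quantile function alone. The sketch below is a generic illustration rather than the paper's formulas: it approximates the raw moments by the integral of Q(u)^r over (0, 1), which is the moment integral after the change of variable u = F(x), and it uses the standard quartile form of Galton's skewness and the standard octile form of Moors' kurtosis. The function names are hypothetical, and `q` may be any quantile function, such as an implementation of (4).

```python
import math
from statistics import NormalDist

def raw_moments(q, orders=(1, 2, 3, 4), n=200_000):
    """Approximate E[X^r] = integral over (0, 1) of q(u)^r du by the midpoint rule."""
    sums = {r: 0.0 for r in orders}
    for i in range(n):
        x = q((i + 0.5) / n)
        for r in orders:
            sums[r] += x ** r
    return {r: s / n for r, s in sums.items()}

def summary_from_moments(m):
    """Mean, variance, skewness and (non-excess) kurtosis from raw moments m[1..4]."""
    mean = m[1]
    var = m[2] - mean ** 2
    skew = (m[3] - 3 * mean * m[2] + 2 * mean ** 3) / var ** 1.5
    kurt = (m[4] - 4 * mean * m[3] + 6 * mean ** 2 * m[2] - 3 * mean ** 4) / var ** 2
    return mean, var, skew, kurt

def galton_skewness(q):
    """Quartile-based skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = q(0.25), q(0.5), q(0.75)
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)

def moors_kurtosis(q):
    """Octile-based kurtosis: (O7 - O5 + O3 - O1) / (O6 - O2)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))

# Usage with the standard normal quantile function as a reference case
normal_q = NormalDist().inv_cdf
print(galton_skewness(normal_q), moors_kurtosis(normal_q))
```

For the standard normal distribution the quantile measures are 0 and about 1.23, which gives a convenient reference point when reading surfaces such as those in Fig. 7. The midpoint rule is a deliberately simple choice; heavy-tailed cases may need a finer grid or a dedicated quadrature routine.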
Entropy
Shannon [25] offered a probabilistic definition of entropy. The Shannon entropy of a random variable following a known probability distribution is a measure of variation of uncertainty.
Proposition 3:
The Shannon entropy of a random variable following the EGuWL distribution can be expressed as.
| 8 |
where
and are respectively the Shannon entropy and mean of the exponentiated Gumbel distribution,
Proof:
For a random variable with density function , the Shannon Entropy of is defined as.
The pdf corresponding to the cdf in (1) can be written as
and hence
Observe that from (1), and hence . It follows that
and
It follows that
and consequently
From Proposition 1(i) we have that and thus
It follows that
Thus
where
| 9 |
| 10 |
| 11 |
The integrals in (9)–(11) exist because
and
Hence
where
Remark 2:
It can be easily verified that.
where is the cdf of the Gumbel distribution, is the Euler’s constant and
An expression for was given in [22] as
where is the complete gamma function.
Estimation
Here the maximum likelihood method is presented for estimating the parameters of the EGuWL distribution.
Maximum Likelihood Method of Estimation of the Parameters of the EGuWL Distribution
For a complete random independent sample of size , the log-likelihood function of the EGuWL distribution is
| 12 |
Let be the unknown parameter vector. The associated score function is given by
where are the partial derivatives of the log-likelihood function w.r.t. to each parameter and are given by
The maximum likelihood estimate of is obtained by solving the non-linear system of equations . Since this system has no closed-form solution, the solutions can be found numerically using a Newton-type algorithm.
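As a hedged illustration of the objective that such a numerical routine would maximize, the log-likelihood can be written as a sum of log-densities. The sketch assumes the same EGuWL form used for illustration elsewhere (exponentiated Gumbel cdf 1 − [1 − G(t)]^a composed with the logit of a Weibull cdf 1 − exp(−(x/lam)^c)); the parameter names are hypothetical and the expression is not claimed to be identical to (12).

```python
import math

def eguwl_loglik(data, a, sigma, mu, c, lam):
    """Sum of log-densities of the assumed EGuWL form; every x in data must be > 0."""
    total = 0.0
    for x in data:
        z = (x / lam) ** c
        t = z if z > 30.0 else math.log(math.expm1(z))   # logit of the Weibull cdf
        w = -(t - mu) / sigma
        e = math.exp(w)
        log_one_minus_g = math.log(-math.expm1(-e))      # log(1 - Gumbel cdf)
        log_dt_dx = math.log(c / x) + math.log(z) - math.log(-math.expm1(-z))
        # log f = log(a/sigma) + (a - 1) log(1 - G) + log G + w + log(dt/dx), log G = -e
        total += (math.log(a / sigma) + (a - 1.0) * log_one_minus_g
                  - e + w + log_dt_dx)
    return total
```

The negative of this function can be handed to any general-purpose numerical minimizer. Working on the log scale avoids multiplying many small density values, which matters for samples of the sizes considered later in the paper.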
The Fisher information matrix (FIM) of the EGuWL distribution is the symmetric matrix given by
where the elements Thus, the elements of the FIM can be obtained from the second-order partial derivatives of the log-likelihood function with respect to the parameters. These elements can be obtained numerically using the R software. The total FIM, can be approximated by
For real data, is obtained after the maximum likelihood estimate of has been found, which requires convergence of the iterative numerical procedure used to find the estimate.
Suppose is the maximum likelihood estimate of . Under the usual regularity conditions, and provided that the parameters lie in the interior of the parameter space and not on its boundary, we have: where is the inverse of the expected FIM, which also corresponds to the variance–covariance matrix of the parameters. The asymptotic behavior remains valid if is replaced by the inverse of the observed information matrix evaluated at , that is . The multivariate normal distribution with mean vector and covariance matrix can be used to construct confidence intervals for the EGuWL parameters. The approximate two-sided confidence intervals for the parameters are given by
respectively, where are diagonal elements of and is the upper percentile of a standard normal distribution.
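The interval construction described above, an estimate plus or minus a standard normal percentile times the square root of the corresponding diagonal element of the inverse information matrix, can be sketched as follows. The function name and its arguments are hypothetical; the variances are assumed to have been extracted already from the inverse observed information matrix.

```python
from statistics import NormalDist

def wald_intervals(estimates, variances, level=0.95):
    """Approximate two-sided intervals: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)   # upper percentile
    return [(est - z * var ** 0.5, est + z * var ** 0.5)
            for est, var in zip(estimates, variances)]

# Usage: one parameter with estimate 5.0 and asymptotic variance 0.04
print(wald_intervals([5.0], [0.04]))
```

Such intervals are symmetric about the estimate and can extend outside the parameter space for parameters restricted to be positive, which is worth keeping in mind for small samples.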
Monte Carlo Simulations
Here we conduct a Monte Carlo simulation study to assess the performance and efficiency of the maximum likelihood estimators of the parameters of the EGuWL distribution. The performance of the estimators is examined for different sample sizes and different combinations of parameter values. The simulation is repeated times using the sample sizes and the parameter combinations and . Random samples are simulated from the EGuWL distribution using Proposition 1(i), and five quantities are computed in the simulations:
- Mean estimate (ME) of the maximum likelihood estimator of the parameter where
- Average bias (AVB) of the maximum likelihood estimator of the parameter where
- Root mean squared error (RMSE) of the maximum likelihood estimator of the parameter where
- Coverage probability (CP) of the 95% confidence intervals of the parameter , i.e., the proportion of intervals that contain the true value of the parameter
- Average width (AW) of the 95% confidence intervals of the parameter .
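Given the estimates and confidence intervals from the simulation replicates, the five summary quantities can be computed as in the sketch below. The definitions used are the conventional ones and are an assumption of this illustration: the bias is averaged with its sign, and the coverage is reported as a proportion. The function name and argument layout are hypothetical.

```python
import math

def simulation_summary(estimates, intervals, true_value):
    """Summarise replicate estimates and (lower, upper) intervals for one parameter."""
    n = len(estimates)
    me = sum(estimates) / n                                   # mean estimate
    avb = sum(e - true_value for e in estimates) / n          # average (signed) bias
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    cp = sum(lo <= true_value <= hi for lo, hi in intervals) / n   # coverage
    aw = sum(hi - lo for lo, hi in intervals) / n             # average width
    return {"ME": me, "AVB": avb, "RMSE": rmse, "CP": cp, "AW": aw}

# Usage with three toy replicates of a parameter whose true value is 2.0
print(simulation_summary([1.0, 2.0, 3.0],
                         [(0.0, 2.5), (1.0, 3.0), (2.5, 4.0)],
                         true_value=2.0))
```

In a full study this function would be called once per parameter and sample size, with the replicate estimates coming from repeated simulation and maximum likelihood fitting.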
Tables 1 and 2 contain the results for the quantities ME, AVB, RMSE, AW and CP. In both tables, the ME of each parameter moves toward its true value as the sample size increases. The AVB of every parameter is positive and decreases as the sample size increases. The RMSE and the AW of all the parameters also decrease as the sample size increases.
Table 1.
Results of Monte Carlo simulations
| Parameter | Sample size | ME | AVB | RMSE | AW | CP |
|---|---|---|---|---|---|---|
| | n = 25 | 2.3662 | 2.8721 | 9.5568 | 92.1923 | 0.92 |
| | n = 80 | 1.9902 | 2.8001 | 9.2332 | 43.7943 | 0.94 |
| | n = 150 | 1.8189 | 1.6123 | 4.9977 | 19.0392 | 0.92 |
| | n = 400 | 1.7231 | 0.6956 | 1.8794 | 6.2785 | 0.95 |
| | n = 800 | 1.6070 | 0.2885 | 0.8791 | 3.1967 | 0.96 |
| | n = 1500 | 1.5158 | 0.1722 | 0.5319 | 2.1018 | 0.95 |
| | n = 25 | 6.6151 | 1.6151 | 2.8147 | 22.4692 | 0.99 |
| | n = 80 | 5.9234 | 0.9234 | 2.1205 | 11.6721 | 0.98 |
| | n = 150 | 5.6044 | 0.6044 | 1.7176 | 7.9471 | 0.97 |
| | n = 400 | 5.3479 | 0.3479 | 1.0629 | 4.2522 | 0.97 |
| | n = 800 | 5.1483 | 0.1483 | 0.6956 | 2.7024 | 0.97 |
| | n = 1500 | 5.0368 | 0.0368 | 0.4752 | 1.8867 | 0.96 |
| | n = 25 | 1.5495 | 0.0495 | 0.2718 | 2.2593 | 0.99 |
| | n = 80 | 1.5594 | 0.0594 | 0.2367 | 1.3777 | 0.97 |
| | n = 150 | 1.5397 | 0.0397 | 0.2124 | 1.0284 | 0.95 |
| | n = 400 | 1.5396 | 0.0396 | 0.1434 | 0.5947 | 0.94 |
| | n = 800 | 1.5163 | 0.0163 | 0.1006 | 0.4082 | 0.94 |
| | n = 1500 | 1.5080 | 0.0081 | 0.0734 | 0.2959 | 0.95 |
| | n = 25 | −0.5616 | 1.4384 | 6.9987 | 61.4781 | 1 |
| | n = 80 | −0.6047 | 1.8336 | 5.4006 | 28.2781 | 1 |
| | n = 150 | −0.6947 | 1.3053 | 3.8439 | 17.0276 | 0.99 |
| | n = 400 | −1.3415 | 0.6585 | 2.0352 | 7.9190 | 0.98 |
| | n = 800 | −1.6746 | 0.3254 | 1.1400 | 4.7119 | 0.99 |
| | n = 1500 | −1.7674 | 0.2326 | 0.7742 | 3.2086 | 0.95 |
| | n = 25 | 6.0365 | 2.0365 | 5.2700 | 41.5927 | 0.98 |
| | n = 80 | 5.7615 | 1.7615 | 4.2515 | 21.6274 | 0.97 |
| | n = 150 | 5.2363 | 1.2363 | 3.2890 | 14.0813 | 0.95 |
| | n = 400 | 4.7095 | 0.7095 | 1.9105 | 7.0457 | 0.98 |
| | n = 800 | 4.3271 | 0.3271 | 1.1555 | 4.2893 | 0.97 |
| | n = 1500 | 4.1631 | 0.1631 | 0.7403 | 2.9448 | 0.96 |
Table 2.
Results of Monte Carlo simulations
| Parameter | Sample size | ME | AVB | RMSE | AW | CP |
|---|---|---|---|---|---|---|
| | n = 25 | 2.9043 | 28.2944 | 33.9312 | 398.123 | 0.97 |
| | n = 80 | 2.8575 | 19.1266 | 26.1717 | 333.010 | 0.99 |
| | n = 150 | 2.8018 | 14.6778 | 20.2345 | 271.750 | 0.99 |
| | n = 400 | 2.2214 | 5.9548 | 18.6956 | 99.3318 | 0.99 |
| | n = 800 | 2.0757 | 3.6372 | 10.4671 | 55.8275 | 1 |
| | n = 1500 | 2.0091 | 2.3331 | 5.7627 | 28.5450 | 0.99 |
| | n = 25 | 5.3867 | 3.3870 | 4.5350 | 18.6944 | 1 |
| | n = 80 | 4.0405 | 2.0405 | 3.2464 | 10.4061 | 0.97 |
| | n = 150 | 3.1945 | 1.1945 | 2.3676 | 7.1001 | 0.97 |
| | n = 400 | 2.2856 | 0.2856 | 0.8372 | 2.9608 | 0.96 |
| | n = 800 | 2.0716 | 0.0716 | 0.4053 | 1.8213 | 0.98 |
| | n = 1500 | 2.0002 | 0.0002 | 0.2412 | 1.2770 | 0.98 |
| | n = 25 | 4.8666 | 1.8660 | 2.5276 | 10.7823 | 0.77 |
| | n = 80 | 4.7721 | 1.7721 | 2.4828 | 9.6358 | 0.78 |
| | n = 150 | 4.4782 | 1.4783 | 2.2842 | 9.1945 | 0.95 |
| | n = 400 | 3.8396 | 0.8396 | 1.5297 | 6.9524 | 0.95 |
| | n = 800 | 3.5479 | 0.5479 | 1.0833 | 5.0864 | 0.97 |
| | n = 1500 | 3.4106 | 0.4106 | 0.8283 | 3.5243 | 0.99 |
| | n = 25 | 4.0900 | 2.0900 | 8.7545 | 90.8129 | 0.98 |
| | n = 80 | 4.6124 | 2.6123 | 7.9091 | 49.9287 | 0.97 |
| | n = 150 | 4.5187 | 2.5187 | 7.5799 | 36.1367 | 0.86 |
| | n = 400 | 2.8963 | 0.8963 | 2.8955 | 14.2318 | 0.99 |
| | n = 800 | 2.5846 | 0.5846 | 1.6062 | 8.4314 | 0.99 |
| | n = 1500 | 2.4372 | 0.4373 | 1.0703 | 5.5644 | 0.99 |
| | n = 25 | 6.2989 | 3.7989 | 7.0401 | 47.6068 | 0.99 |
| | n = 80 | 5.5586 | 3.0586 | 6.0645 | 27.1930 | 0.99 |
| | n = 150 | 4.7446 | 2.2446 | 5.2725 | 18.9548 | 0.99 |
| | n = 400 | 3.2178 | 0.7178 | 1.9232 | 6.8753 | 1 |
| | n = 800 | 2.8741 | 0.3741 | 0.9428 | 3.7113 | 1 |
| | n = 1500 | 2.7351 | 0.2351 | 0.5228 | 2.3270 | 0.99 |
Remark 3
The simulations were also conducted for other combinations of parameter values, namely , and , and the results followed a pattern similar to that in Tables 1 and 2. To conserve space, they are not reported.
Applications
The EGuWL distribution will be applied to fit the daily number of reported infections from the COVID-19 pandemic in Nigeria. Five other data sets will also be used to demonstrate its flexibility. The fit of the EGuWL distribution will be compared with those of other models in its class.
(i) Application to Nigeria’s COVID-19 data
For the first application, the EGuWL is used to fit the daily number of reported infections from the COVID-19 pandemic in Nigeria over a seven-month period (20th March–19th October, 2020). The data set was obtained from the website of the National Center for Disease Control (NCDC) at http://covid19.ncdc.gov.ng/. The data set is unimodal, right-skewed and platykurtic (skewness = 0.4671, excess kurtosis = −0.8916). The data set is contained in Table 3.
The Weibull (W), exponentiated Gumbel (EGu) [22], beta exponential (BE) [26], beta generalized exponential (BGE) [27] and Gumbel–Weibull {logistic} (GuWL) [28] distributions are also used to fit the data, and their fits are compared with that of the EGuWL distribution. The BE, BGE and GuWL densities are given respectively by
Table 4 reports the results of fitting the COVID-19 data: the parameter estimates with their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistics with their p values for all the fitted distributions. Figure 8 shows the fitted densities alongside the histogram of the data. The results in Table 4 show that the EGuWL distribution provides the best fit to the data, with the smallest AIC value and the highest K–S p value.
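The two comparison criteria used throughout the applications can be sketched in a few lines. This is a generic illustration with hypothetical function names: the AIC is computed as twice the number of parameters minus twice the log-likelihood, and the K–S statistic as the largest distance between the empirical cdf and a fitted cdf. The computation of the K–S p value is not covered here.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik

def ks_statistic(data, cdf):
    """Largest distance between the empirical cdf of data and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # compare the fitted cdf with the empirical cdf just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For instance, a log-likelihood of −455.59 with five parameters gives an AIC of 921.18, which agrees with the EGuWL entry reported in Table 6.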
Table 3.
Daily number of infections from COVID-19 (20th March–19th October, 2020)
| 4, 4, 10, 8, 10, 4, 7, 14, 5, 19, 22, 20, 8, 35, 10, 25, 5, 18, 6, 16, 22, 14, 17, 13, 5, 20, 30, 34, 35, 51, 48, 86, 38, 117, 91, 108, 114, 87, 91, 64, 195, 196, 204, 238, 220, 170, 245,148, 195, 381, 386, 239, 248, 242, 146, 184, 193, 288, 176, 388, 216, 226, 284, 339, 245, 265, 313, 229, 276, 389, 182, 387, 553, 307, 416, 241, 348, 350, 328, 389, 260, 315, 663, 409, 681, 627, 501, 403, 573, 490, 587, 745, 667, 661, 436, 675, 452, 649, 594, 684, 779, 490, 566, 561, 790, 626, 454, 603, 544, 575, 503, 460, 499, 575, 664, 571, 595, 463, 643, 595, 600, 653, 556, 562, 576, 543, 604, 591, 438, 555, 648, 624, 404, 481, 462, 386, 304, 288, 304, 457, 354, 443, 453, 437, 290, 423, 453, 373, 329, 325, 298, 417, 410, 593, 476, 340, 601, 322, 321, 252, 221, 296, 160, 250, 138, 143, 239, 216, 125, 162, 100, 155, 296, 176, 197, 188, 160, 79, 132, 90, 126, 131, 221, 189, 97, 195, 176, 111, 125, 213, 136, 126, 136, 187, 201, 153, 126, 160, 58, 120, 118, 155, 103, 151, 111, 163, 164, 225, 179, 148, 212, 113, 133, 118 |
Table 4.
Maximum likelihood fit of the COVID-19 data
| Distribution | W | BE | EGu | BGE | GuWL | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | | | | | | |
| AIC | | | | | | |
| K–S (p value) | | | | | | |
(Standard errors of estimates in parentheses)
Fig. 8.
Graph of the fitted densities for the COVID-19 data
(ii) Application to Aluminum Coupons data
For the second application, the EGuWL distribution is used to fit the fatigue time of 101 6061-T6 Aluminum Coupons cut parallel to the direction of rolling and oscillated at 18 cycles per second (cps). The data set was reported in [29] and presented in Table 5. The data set is unimodal, right-skewed and leptokurtic (Skewness = 0.3355 and excess kurtosis = 1.1687). The beta normal (BN) [6], the beta Weibull (BW) [30], the beta Burr XII (BBXII) [31], Gumbel–Burr XII {logistic} (GuBXIIL) [32] and the GuWL distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The BN, BW, BBXII and the GuBXIIL densities are given respectively by
Table 5.
Fatigue time of 101 6061-T6 Aluminum Coupons
| 70, 90, 96, 97, 99, 100, 103, 104,104,105,107,108, 108, 108,109, 109, 112, 112,113, 114, 114, 114, 116, 119, 120, 120,120, 121, 121, 123, 124, 124, 124, 124, 124, 128, 128, 129,129, 130, 130, 130, 131, 131, 131, 131, 131, 132, 132, 132,133, 134, 134, 134, 134, 134, 136, 136, 137, 138, 138, 138,139, 139, 141, 141, 142, 142, 142, 142, 142, 142, 144, 144,145, 146, 148, 148, 149, 151, 151, 152, 155, 156, 157, 157,157, 157, 158, 159, 162, 163, 163, 164, 166, 166, 168, 170,174, 196, 212 |
and are the pdf and cdf of the normal distribution respectively,
Table 6 reports the results of fitting the Aluminum Coupons data: the parameter estimates with their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistics with their p values for all the fitted distributions. Figure 9 shows the fitted densities alongside the histogram of the data. The results in Table 6 show that the EGuWL distribution provides the best fit to the data, with the smallest AIC value and the highest K–S p value.
Table 6.
Maximum likelihood fit of the Aluminum Coupons data
| Distribution | BW | BN | GuWL | BBXII | GuBXIIL | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | −456.67 | −456.88 | −456.61 | −457.90 | −475.18 | −455.59 |
| AIC | 921.34 | 921.75 | 921.21 | 925.80 | 960.35 | 921.18 |
| K–S | 0.0654 | 0.0647 | 0.0750 | 0.0913 | 0.1329 | 0.0611 |
| p value | 0.7550 | 0.7673 | 0.5936 | 0.3482 | 0.0514 | 0.8222 |
(Standard errors of estimates in parentheses)
Fig. 9.
Graph of the fitted densities for the Aluminum Coupons data
(iii) Application to the Kevlar 49/epoxy strands failure times data (pressure at 70%)
For the third application, the EGuWL distribution is used to fit the Kevlar 49/epoxy strands failure times data (pressure at 70%). The data set was reported in [28]. The data set is multimodal, platykurtic and approximately symmetric (skewness = 0.0998, excess kurtosis = −0.79). The data set is presented in Table 7. The BN, BW, GuWL, beta exponentiated Weibull (BEW) [33] and Gumbel–Weibull {logistic} Poisson (GuWLP) [12] distributions are also used to fit the data set, and their fits are compared with that of the EGuWL distribution. The BEW and the GuWLP densities are given respectively by
Table 7.
Kevlar 49/epoxy strands failure times data (pressure at 70%)
| 1051, 1337, 1389, 1921, 1942, 2322, 3629, 4006, 4012, 4063, 4921, 5445, 5620, 5817, 5905, 5956, 6068, 6121, 6473, 7501, 7886, 8108, 8546, 8666, 8831, 9106, 9711, 9806, 10,205, 10,396, 10,861, 11,026, 11,214, 11,362, 11,604, 11,608, 11,745, 11,762, 11,895, 12,044, 13,520, 13,670, 14,110, 14,496, 15,395, 16,179, 17,092, 17,568, 17,568 |
Table 8 reports the results of fitting the Kevlar 49/epoxy strands failure times data (pressure at 70%): the parameter estimates with their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistics with their p values for all the fitted distributions. Figure 10 shows the fitted densities alongside the histogram of the data. The results in Table 8 show that the EGuWL distribution provided the best fit for the data by possessing the smallest AIC value as well as the highest p value of the K–S statistic.
Table 8.
Maximum likelihood fit of the Kevlar 49/epoxy strands failure times data (pressure at 70%)
| Distribution | BW | BN | GuWL | BEW | GuWLP | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | −479.49 | −480.41 | −479.49 | −480.0 | −478.86 | −478.40 |
| AIC | 966.97 | 968.81 | 966.97 | 970.0 | 960.35 | 966.80 |
| K–S | 0.0764 | 0.0832 | 0.0742 | 0.0755 | 0.0701 | 0.0607 |
| p value | 0.9165 | 0.8590 | 0.9316 | 0.9227 | 0.9556 | 0.9888 |
(Standard errors of estimates in parentheses)
Fig. 10.
Graph of the fitted densities for the Kevlar 49/epoxy strands failure times data (pressure at 70%)
(iv) Application to the Kevlar 49/epoxy strands failure times data (pressure at 90%)
For the fourth application, the EGuWL distribution is used to fit the Kevlar 49/epoxy strands failure times data (pressure at 90%). The data set was reported in [28]. The data set is unimodal, leptokurtic, and highly skewed to the right (reverse J-shape) (skewness = 3.0472, excess kurtosis = 14.4745). The data set is presented in Table 9. The BN, BW, GuWL, exponentiated Weibull (EW) [5] and the GuWLP distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The EW density is given by
Table 9.
Kevlar 49/epoxy strands failure times data (pressure at 90%)
| 0.01, 0.01, 0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.07, 0.07, 0.08, 0.09, 0.09, 0.10, 0.10, 0.11, 0.11, 0.12, 0.13, 0.18, 0.19, 0.20, 0.23, 0.24, 0.24, 0.29, 0.34, 0.35, 0.36, 0.38, 0.40, 0.42, 0.43, 0.52, 0.54, 0.56, 0.60, 0.60, 0.63, 0.65, 0.67, 0.68, 0.72, 0.72, 0.72, 0.73, 0.79, 0.79, 0.80, 0.80, 0.83, 0.85, 0.90, 0.92, 0.95, 0.99, 1.00, 1.01, 1.02, 1.03, 1.05, 1.10, 1.10, 1.11, 1.15, 1.18, 1.20, 1.29, 1.31, 1.33, 1.34, 1.40, 1.43, 1.45, 1.50, 1.51, 1.52, 1.53, 1.54, 1.54, 1.55, 1.58, 1.60, 1.63, 1.64, 1.80, 1.80, 1.81, 2.02, 2.05, 2.14, 2.17, 2.33, 3.03, 3.03, 3.34, 4.20, 4.69, 7.89 |
Table 10 reports the results of fitting the Kevlar 49/epoxy strands failure times data (pressure at 90%): the parameter estimates with their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistics with their p values for all the fitted distributions. Figure 11 shows the fitted densities alongside the histogram of the data. The results in Table 10 show that the EGuWL distribution provides the best fit to the data, with the smallest AIC value and the highest K–S p value.
Table 10.
Maximum likelihood fit of the Kevlar 49/epoxy strands failure times data (pressure at 90%)
| Distribution | BW | BN | GuWL | EW | GuWLP | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | −102.17 | −129.81 | −100.94 | −102.79 | −100.16 | −99.80 |
| AIC | 212.34 | 267.62 | 209.88 | 211.57 | 210.31 | 209.59 |
| K–S | 0.0784 | 0.1219 | 0.0689 | 0.0844 | 0.0683 | 0.0629 |
| p value | 0.5385 | 0.0913 | 0.6983 | 0.4433 | 0.7078 | 0.7953 |
(Standard errors of estimates in parentheses)
Fig. 11.
Graph of the fitted densities for the Kevlar 49/epoxy strands failure times data (pressure at 90%)
(v) Application to the Australian Athletes' Height Data
For the fifth application, the EGuWL distribution is used to fit the heights (in centimeters) of 100 female Australian athletes. The data set was collected by the Australian Institute of Sport and reported in [28]. The data set is unimodal, leptokurtic, and left-skewed (skewness = − 0.5684, excess kurtosis = 1.3212). The data set is presented in Table 11. The BN, GuWL, EW, Weibull–Pareto {exponential} (WPE) [34] and the beta skew normal (BSN) [35] distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The WPE and the BSN densities are given by
Table 11.
Australian Athletes' Height Data
| 148.9, 149.0, 156.0, 156.9, 157.9, 158.9, 162.0, 162.0, 162.5, 163.0, 163.9, 165.0, 166.1, 166.7, 167.3, 167.9, 168.0, 168.6, 169.1, 169.8, 169.9, 170.0, 170.0, 170.3, 170.8, 171.1, 171.4, 171.4, 171.6, 171.7, 172.0, 172.2, 172.3, 172.5, 172.6, 172.7, 173.0, 173.3, 173.3, 173.5, 173.6, 173.7, 173.8, 174.0, 174.0, 174.0, 174.1, 174.1, 174.4, 175.0, 175.0, 175.0, 175.3, 175.6, 176.0, 176.0, 176.0, 176.0, 176.8, 177.0, 177.3, 177.3, 177.5, 177.5, 177.8, 177.9, 178.0, 178.2, 178.7, 178.9, 179.3, 179.5, 179.6, 179.6, 179.7, 179.7, 179.8, 179.9, 180.2, 180.2, 180.5, 180.5, 180.9, 181.0, 181.3, 182.1, 182.7, 183.0, 183.3, 183.3, 184.6, 184.7, 185.0, 185.2, 186.2, 186.3, 188.7, 189.7, 193.4, 195.9 |
and are the pdf and cdf of the normal distribution respectively, is the Owen’s T function.
Table 12 reports the results of fitting the Heights data: the parameter estimates with their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistics with their p values for all the fitted distributions. Figure 12 shows the fitted densities alongside the histogram of the data. The results in Table 12 show that the EGuWL distribution provided the best fit for the data by possessing the smallest AIC value as well as the highest p value of the K–S statistic.
(vi) Application to Australian Athletes’ sum of skin folds data
Table 12.
Maximum likelihood fit of the Heights data
| Distribution | WPE | BN | GuWL | EW | BSN | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | −351.49 | −350.30 | −350.14 | −351.44 | −350.30 | −349.02 |
| AIC | 708.97 | 708.60 | 708.28 | 708.89 | 210.31 | 708.04 |
| K–S | 0.0801 | 0.0721 | 0.0587 | 0.0711 | 0.0722 | 0.0534 |
| p value | 0.5171 | 0.6489 | 0.8607 | 0.6662 | 0.6472 | 0.9230 |
(Standard errors of estimates in parentheses)
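The AIC row of Table 12 is tied to the log-likelihood row through AIC = 2k − 2·loglik, where k is the number of estimated parameters. The sketch below is illustrative only: the parameter counts are assumptions backed out of the tabulated values rather than taken from the text, and the BSN column is left out because its tabulated AIC does not back out to a valid parameter count. The names `aic`, `heights_fits` and `heights_aic` are hypothetical.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik


# (log-likelihood from Table 12, ASSUMED number of free parameters)
heights_fits = {
    "WPE": (-351.49, 3),
    "BN": (-350.30, 4),
    "GuWL": (-350.14, 4),
    "EW": (-351.44, 3),
    "EGuWL": (-349.02, 5),
}

heights_aic = {name: aic(ll, k) for name, (ll, k) in heights_fits.items()}

# List the models from smallest (preferred) to largest AIC.
for name, value in sorted(heights_aic.items(), key=lambda item: item[1]):
    print(f"{name:6s} AIC = {value:.2f}")
```

Under these assumed parameter counts the recomputed values agree with the tabulated AIC entries to within rounding of the reported log-likelihoods.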
Fig. 12.
Graph of the fitted densities for the Heights data
For the last application, the EGuWL distribution is used to fit the sum of skin folds of 100 female Australian athletes. The data set was collected by the Australian Institute of Sport and reported in [28]. The data set is unimodal, leptokurtic, and right-skewed (skewness = 0.7878, excess kurtosis = 0.7320). The data set is presented in Table 13. The BN, GuWL, WPE, EW and BW distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The results from fitting the sum of skin folds data, which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov–Smirnov (K–S) statistic values (with the corresponding p values), are reported in Table 14 for all the fitted distributions. Figure 13 shows the graph of all the fitted densities alongside the histogram of the data. The results in Table 14 show that the EGuWL distribution attains the highest log-likelihood, the smallest K–S statistic and the highest p value of the K–S statistic among the fitted distributions. Its AIC value (980.47) is close to, but not below, that of the WPE distribution (978.13), which has the smallest AIC in the table.
Table 13.
Australian Athletes' sum of skin folds data
| 33.8, 36.8, 38.2, 41.1, 41.6, 42.3, 43.5, 43.5, 46.1, 46.2, 46.3, 47.5, 47.6, 48.4, 49.0, 49.9, 50.0, 52.5, 52.6, 54.6, 54.6, 55.6, 56.8, 57.9, 58.9, 59.4, 61.9, 62.6, 62.9, 65.1, 67.0, 68.3, 68.9, 69.9, 70.0, 71.3, 71.6, 73.9, 74.7, 74.9, 75.1, 75.2, 76.2, 76.8, 77.0, 80.1, 80.3, 80.3, 80.3, 80.6, 83.0, 87.2, 88.2, 89.0, 90.2, 90.4, 91.0, 91.2, 95.4, 96.8, 97.2, 97.9, 98.0, 98.1, 98.3, 98.5, 99.8, 99.9, 101.1, 102.8, 102.8, 103.6, 103.6, 104.6, 106.9, 109.0, 109.1, 109.5, 109.6, 110.2, 110.7, 111.1, 113.5, 114.0, 115.9, 117.8, 122.1, 123.6, 125.9, 126.4, 126.4, 131.9, 136.3, 143.5, 148.9, 156.6, 156.6, 171.1, 181.7, 200.8 |
Table 14.
Maximum likelihood fit of the sum of skin folds data
| Distribution | BW | BN | GuWL | EW | WPE | EGuWL |
|---|---|---|---|---|---|---|
| Parameter estimates | | | | | | |
| Log Likelihood | −486.25 | −487.06 | −486.28 | −487.27 | −486.07 | −485.23 |
| AIC | 980.50 | 982.10 | 980.55 | 980.54 | 978.13 | 980.47 |
| K–S | 0.0725 | 0.0711 | 0.0704 | 0.0809 | 0.0825 | 0.0598 |
| p value | 0.6424 | 0.6925 | 0.6778 | 0.5042 | 0.4782 | 0.8463 |
(Standard errors of estimates in parentheses)
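The K–S statistic reported in Tables 12 and 14 is the largest vertical distance between the empirical cdf of the data and the fitted cdf. A minimal sketch of that computation is given below, using only the Python standard library. Because the fitted parameter values are not reproduced here, a normal distribution matched to the sample mean and standard deviation is used purely as a stand-in reference cdf; it is not one of the models compared in Table 14, and the helper name `ks_statistic` is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

# Sum of skin folds of 100 female Australian athletes, copied from Table 13.
skinfolds = [
    33.8, 36.8, 38.2, 41.1, 41.6, 42.3, 43.5, 43.5, 46.1, 46.2,
    46.3, 47.5, 47.6, 48.4, 49.0, 49.9, 50.0, 52.5, 52.6, 54.6,
    54.6, 55.6, 56.8, 57.9, 58.9, 59.4, 61.9, 62.6, 62.9, 65.1,
    67.0, 68.3, 68.9, 69.9, 70.0, 71.3, 71.6, 73.9, 74.7, 74.9,
    75.1, 75.2, 76.2, 76.8, 77.0, 80.1, 80.3, 80.3, 80.3, 80.6,
    83.0, 87.2, 88.2, 89.0, 90.2, 90.4, 91.0, 91.2, 95.4, 96.8,
    97.2, 97.9, 98.0, 98.1, 98.3, 98.5, 99.8, 99.9, 101.1, 102.8,
    102.8, 103.6, 103.6, 104.6, 106.9, 109.0, 109.1, 109.5, 109.6, 110.2,
    110.7, 111.1, 113.5, 114.0, 115.9, 117.8, 122.1, 123.6, 125.9, 126.4,
    126.4, 131.9, 136.3, 143.5, 148.9, 156.6, 156.6, 171.1, 181.7, 200.8,
]


def ks_statistic(sample, cdf):
    """One-sample K-S statistic: sup |F_n(x) - F(x)| for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare F with the empirical cdf just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Stand-in reference model (an assumption, not a model from Table 14).
reference = NormalDist(fmean(skinfolds), pstdev(skinfolds))
d_normal = ks_statistic(skinfolds, reference.cdf)
print(f"K-S distance to the moment-matched normal: {d_normal:.4f}")
```

Replacing `reference.cdf` with the cdf of any fitted model, evaluated at its maximum likelihood estimates, gives the corresponding K–S statistic; smaller values indicate a closer agreement between the model and the data.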
Fig. 13.
Graph of the fitted densities for the sum of skin folds data
Summary and Conclusion
A new flexible probability distribution called the exponentiated Gumbel–Weibull {logistic} distribution has been defined and studied in this paper. The new distribution has been applied in modeling the daily number of infections from the novel COVID-19 pandemic in Nigeria. Five other data sets which exhibit various shape and tail behaviors have been further used to demonstrate the flexibility of the new distribution. The performance of the distribution in fitting the various data sets has been compared with that of other probability distributions in its class, and the results obtained showed that the new distribution gave the best fits. We hope the new distribution will attract further usage in fitting data sets from other fields.
Author Contributions
The first draft of the manuscript was written by Patrick Osatohanmwen and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
No funding was received for conducting this study.
Declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval
Ethical standards as recommended by the journal and in line with global best practices have been followed in the course of writing the article as well as in the reporting of the results contained therein.
Data Availability
All data used in the article and in the generation of results are contained in the body of the article and, where necessary, URL addresses have been provided to access them.
Code Availability
The code used in the article can be obtained upon request from the corresponding author.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Olson DL, Shi Y (2007) Introduction to business data mining. McGraw-Hill/Irwin, New York
- 2. Shi Y, Tian YJ, Kou G, Peng Y, Li JP (2011) Optimization based data mining: theory and applications. Springer, Berlin
- 3. Tien JM (2017) Internet of things, real-time decision making, and artificial intelligence. Ann Data Sci 4(2):149–178
- 4. Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
- 5. Mudholkar GS, Srivastava DK (1993) Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans Reliab 42:299–302. 10.1109/24.229504
- 6. Eugene N, Lee C, Famoye F (2002) Beta-normal distribution and its applications. Commun Stat Theory Methods 31:497–512. 10.1081/STA-120003130
- 7. Shaw WT, Buckley IR (2009) The alchemy of probability distributions: beyond Gram-Charlier expansions and a skew-kurtotic-normal distribution from a rank transmutation map. arXiv:0901.0434
- 8. Cordeiro GM, de Castro M (2011) A new family of generalized distributions. J Stat Comput Simul 81:883–898. 10.1080/00949650903530745
- 9. Cordeiro GM, Ortega GM, da Cunha DCC (2013) The exponentiated generalized class of distributions. J Data Sci 11:1–27
- 10. Alzaatreh A, Lee C, Famoye F (2014) T – normal family of distributions: a new approach to generalize the normal distribution. J Stat Distrib Appl 1:16
- 11. Osatohanmwen P, Oyegue FO, Ajibade B, Ewere F (2020) A new generalized family of distributions on the unit interval: the T - kumaraswamy family of distributions. J Data Sci 18(2):218–236
- 12. Osatohanmwen P, Oyegue FO, Ogbonmwan SM (2020) The T – R Y power series family of probability distributions. J Egypt Math Soc 28:29. 10.1186/s42787-020-00083-7
- 13. Liu Z, Magal P, Seydi O, Webb G (2020) Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data. arXiv:2002.12298v1
- 14. Roosa K, Lee Y, Luo R, Kirpich A, Rothenberg A, Hyman JM, Yan P, Chowell G (2020) Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020. Infect Dis Model 5:256–263. 10.1016/j.idm.2020.02.002
- 15. Tang B, Bragazzi NL, Li Q, Tang S, Xiao Y, Wu J (2020) An updated estimation of the risk of transmission of the novel corona virus (2019-nCov). Infect Dis Model 5:248–255. 10.1016/j.idm.2020.02.001
- 16. Tang B, Wang X, Li Q, Bragazzi NL, Tang S, Xiao Y, Wu J (2020) Estimation of the transmission risk of the 2019-nCov and its implication for public health intervention. J Clin Med 9(2):462
- 17. Wu JT, Leung K, Leung GM (2020) Nowcasting and forecasting the potential domestic and international spread of the 2019-nCov outbreak originating in Wuhan, China: a modelling study. Lancet 395:689–697. 10.1016/s0140-6736(20)30260-9
- 18. Osatohanmwen P, Oyegue FO, Ogbonmwan SM (2020) Modeling the daily number of reported cases of infection from the COVID-19 Pandemic in Nigeria: a stochastic approach. Earthline J Math Sci 5(2):217–235. 10.34198/ejms.5221.217235
- 19. Guan C, Liu W, Cheng JYC (2021) Using social media to predict the stock market crash and rebound amid the pandemic: the digital ‘Haves’ and ‘Have-mores.’ Ann Data Sci. 10.1007/s40745-021-00353-w
- 20. Li J, Guo K, Herrera Viedma E, Lee H, Liu J, Zhong Z, Gomes L, Filip FG, Fang SC, Özdemir MS, Liu XH, Lu G, Sh Y (2020) Culture vs policy: more global collaboration to effectively combat COVID-19. Innovation. 10.1016/j.xinn.2020.100023
- 21. Liu Y, Gu Z, Xia S, Shi B, Zhou X, Shi Y, Liu J (2020) What are the underlying transmission patterns of COVID-19 outbreak? An age-specific social contact characterization. EClinicalMedicine 22:100354
- 22. Nadarajah S (2006) The exponentiated Gumbel distribution with climate application. Environmetrics 17:13–23. 10.1002/env.739
- 23. Galton F (1883) Enquiries into human faculty and its development. Macmillan and Company, London
- 24. Moor JJ (1988) A quantile alternative for Kurtosis. Statistician 37:25–32
- 25. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–432
- 26. Nadarajah S, Kotz S (2006) The beta exponential distribution. Reliab Eng Syst Saf 91:689–697. 10.1016/j.ress.2005.05.008
- 27. Barreto-Souza W, Santos AHS, Cordeiro GM (2010) The beta generalized exponential distribution. J Stat Comput Simul 80:159–172. 10.1080/00949650802552402
- 28. Al-Aqtash R, Lee C, Famoye F (2014) Gumbel - Weibull distribution: properties and application. J Mod App Stat Method 13:201–225. 10.22237/jmasm/1414815000
- 29. Birnbaum ZW, Saunders SC (1969) A new family of life distributions. J App Prob 6:637–652. 10.2307/3212003
- 30. Famoye F, Lee C, Olumolade O (2005) The beta-Weibull distribution. J Stat Theory Appl 4:121–136
- 31. Paranaiba PF, Ortega EMM, Cordeiro GM, Pescim R (2013) The beta Burr XII distribution with application to lifetime data. Comput Stat Data Anal 55:1118–1136. 10.1016/j.csda.2010.09.009
- 32. Osatohanmwen P, Oyegue FO, Ogbonmwan SM (2019) A new Member from the T-X family of distributions: the Gumbel-Burr XII distribution and its properties. Sankhya A 81:298–322. 10.1007/s13171-017-0110-x
- 33. Cordeiro GM, Gomes AE, da-Silva CQ, Ortega EMM (2013) The beta exponentiated Weibull distribution. J Stat Comput Simul 83(1):114–138. 10.1080/00949655.2011.615838
- 34. Alzaatreh A, Lee C, Famoye F (2013) Weibull-pareto distribution and its applications. Commun Stat Theory Methods 42:1673–1691. 10.1080/03610926.2011.599002
- 35. Mameli V, Musio M (2013) A generalization of the skew-normal distribution: the beta skew-normal. Commun Stat Theory Methods 42:2229–2244. 10.1080/03610926.2011.607530