Abstract
We propose a new flexible generalized family (NFGF) for constructing many families of distributions. The importance of the NFGF is that any baseline distribution can be chosen and it does not involve any additional parameters. Some useful statistical properties of the NFGF are determined, such as a linear representation for the family density, analytical shapes of the density and hazard rate, random variable generation, moments and generating function. Further, the structural properties of a special model, named the new flexible Kumaraswamy (NFKw) distribution, are investigated, and the model parameters are estimated by the maximum-likelihood method. A simulation study is carried out to assess the performance of the estimates. The usefulness of the NFKw model is demonstrated empirically by means of three real-life data sets. In fact, the two-parameter NFKw model performs better than the three-parameter transmuted-Kumaraswamy, the three-parameter exponentiated-Kumaraswamy and the well-known two-parameter Kumaraswamy models.
Keywords: Flexible G-family, generalized family, Kumaraswamy distribution, maximum-likelihood method, new flexible family, T–X family
2010 Mathematics Subject Classifications: 60E05, 60E10, 62E10, 62P12
1. Introduction
Both data and models are equally important in applied research. One class of researchers prefers to understand a phenomenon first, while the other is interested in testing models by fitting them to real data. We do not enter into this endless debate, but prefer to follow the latter option in order to check the suitability of the proposed family, its derived sub-families and the special models obtained from the generator. This is one of the main objectives of modern distribution theory, where new families and models are proposed and then adopted or tested to tackle problems encountered in different fields, such as reliability and survival studies, engineering, actuarial science, sports sciences and agriculture. This development helps data analysts to cope with data sets arising from different phenomena. In this way: (i) well-established parent models are extended by adding shape parameter(s); (ii) the functional forms of the parent models are modified; (iii) inverted and weighted forms are adopted; (iv) generalized (G) classes are proposed through transformations, mixtures, composition, copulas, convolution and compounding methods; and (v) the special models of generalized classes are investigated, among other methods. All such proposals, and the increasing interest in them, have led to new and alternative ways of problem solving that allow one to reach clear and conclusive results, and have kept this research area active. For a detailed study and discussion, the reader is referred to [9,13,15].
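Alzaatreh et al. [2] introduced a general method for constructing G-families by using the transformed-transformer (T–X) approach. Let $r(t)$ be the probability density function (pdf) and $R(t)$ be the cumulative distribution function (cdf) of a random variable (rv) $T \in [a, b]$ for $-\infty \le a < b \le \infty$, and let $W[G(x)]$ be a function of the cdf or survival function (sf) of any baseline rv ($W[G(x)]$ is known as the generator) such that $W[G(x)]$ satisfies three conditions:
(i) $W[G(x)] \in [a, b]$,
(ii) $W[G(x)]$ is differentiable and monotonically non-decreasing, and
(iii) $W[G(x)] \to a$ as $x \to -\infty$ and $W[G(x)] \to b$ as $x \to \infty$.
The cdf of the T–X family is
$$F(x) = \int_a^{W[G(x)]} r(t)\,dt, \tag{1}$$
where $W[G(x)]$ satisfies the conditions (i)–(iii).
The pdf corresponding to Equation (1) is
$$f(x) = \left\{\frac{d}{dx} W[G(x)]\right\} r\{W[G(x)]\}. \tag{2}$$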
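As a rough illustration of how the T–X construction composes its three ingredients, the following Python sketch builds a cdf as the cdf of T evaluated at the generator applied to the baseline cdf. All concrete choices are assumptions made only for this example: T is taken as standard exponential or Weibull, the generator is W(G) = −log(1 − G), the baseline is uniform on (0, 1), and the helper names are hypothetical.

```python
import math

def tx_cdf(R, W, G):
    """Return the T-X cdf x -> R(W(G(x))) built from the cdf R of T,
    the generator W and the baseline cdf G."""
    return lambda x: R(W(G(x)))

# Illustrative ingredients (assumptions, not taken from the paper's Table 1).
def W(g):
    return -math.log(1.0 - g)        # maps [0, 1) onto [0, inf)

def G(x):
    return min(max(x, 0.0), 1.0)     # uniform(0, 1) baseline cdf

def R_exp(t):
    return 1.0 - math.exp(-t)        # cdf of T ~ Exp(1)

def R_weibull(c):
    return lambda t: 1.0 - math.exp(-t ** c)   # cdf of T ~ Weibull(shape c)

F_identity = tx_cdf(R_exp, W, G)             # collapses back to the baseline G
F_weibull_x = tx_cdf(R_weibull(2.0), W, G)   # a Weibull-X type cdf

# Evaluate only at interior points, since W is unbounded as G -> 1.
for x in (0.1, 0.5, 0.9):
    print(x, F_identity(x), F_weibull_x(x))
```

With T standard exponential and this generator, the construction returns the baseline itself, which is a convenient sanity check before trying other choices of T.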
In Table 1, we give an update on pioneer generators which are natural models of the T–X family. Here T could be , , . To the best of our knowledge, there is no other generator that can be included in Table 1.
Table 1.
Pioneer generators as functions of (W[G(x)]) from the T–X family.
There exist some G-classes such as Marshall-Olkin-G (MO-G) [11], exponentiated-G (exp-G) which includes the Lehmann alternative of type 1 (LA1) and Lehmann alternative of type 2 (LA2) [7], transmuted-G (Tr-G) [12], cubic rank transmuted-G (CRTr-G) [6] and exponentiated-generalized-G (EG-G) [3]. These six G-classes (MO-G, LA1, LA2, Tr-G, CRTr-G, EG-G) have not been developed from any existing parent model.
The main motivations for the new flexible generalized family (for short, NFGF) of distributions are:
- The NFGF is not developed from any well-known parent model, similar to the MO-G, LA1, LA2, Tr-G, CRTr-G and EG-G classes;
- The NFGF does not include any extra parameter;
- Any baseline model can be chosen for the NFGF;
- The special models generated from the NFGF are free from non-identifiability issues, and either exponentiated or inverted models may be chosen;
- The new special models based on the NFGF have the ability to compete with existing parent models or other competitive models;
- The new special models based on the NFGF can produce flexible shapes of the density and hazard rate (in some cases) in comparison to existing parent models;
- The new special models of the NFGF can provide consistently better fits than other corresponding models.
The paper is organized as follows. In Section 2, we propose a new generator called the NFGF. In Section 3, we obtain some of its mathematical properties such as a linear representation for the family density, analytical shapes of the density and hazard rate, random variable generation, moments and generating function. In Section 4, we define a new flexible Kumaraswamy (NFKw) distribution and investigate some structural properties. Its model parameters are estimated by the maximum-likelihood method. In Section 5, a simulation study is carried out to check the precision of the estimates of the NFKw distribution. In Section 6, the potential of this model is illustrated by means of three real-life data sets. We show that it performs better than some well-known models. The last section offers some concluding remarks.
2. The proposed flexible G-family
Let T be a baseline rv having cdf , sf and pdf , where ξ is the baseline parameter vector. We define the cdf and pdf of the NFGF by
| (3) |
and
| (4) |
respectively.
Henceforth, let X be a rv having the density (4). The sf and hazard rate function (hrf) of X are, respectively,
and
| (5) |
We consider below some special distributions for different supports of rvs , namely for Kumaraswamy (Kw), beta, Weibull (W), Burr XII (Br), Gumbel (Gu), logistic (Lo), power function (or generalized uniform) (PF) and Pareto (Pa) models:
- If has the cdf and pdf , then the cdf and pdf of the NFKw model are, respectively, given by
| (6) |
and
| (7) |
- If has the cdf and pdf , then the cdf and pdf of the new flexible beta (NFB) model are, respectively, given by
and
where , and are the beta function, incomplete beta function and incomplete beta function ratio, respectively.
- If has the cdf and pdf , then the cdf and pdf of the new flexible Weibull (NFW) model are, respectively, given by
and
- If has the cdf and pdf , then the cdf and pdf of the new flexible Burr XII (NFBr) model are, respectively, given by
and
- If has the cdf and pdf , then the cdf and pdf of the new flexible Gumbel (NFGu) model are, respectively, given by
and
- If has the cdf and pdf , then the cdf and pdf of the new flexible logistic (NFLo) model are, respectively, given by
and
- If has the cdf and pdf , then the cdf and pdf of the new flexible power function (NFPF) model are, respectively, given by
and
Figure 1.
Plots for the NFKw model for some parameter values: (a) density and (b) hazard rate.
Figure 2.
Plots for the NFB model for some parameter values: (a) density and (b) hazard rate.
Figure 3.
Plots for the NFW model for some parameter values: (a) density and (b) hazard rate.
Figure 4.
Plots for the NFBr model for some parameter values: (a) density and (b) hazard rate.
Figure 5.
Plots for the NFGu model for some parameter values: (a) density and (b) hazard rate.
Figure 6.
Plots for the NFLo model for some parameter values: (a) density and (b) hazard rate.
Figure 7.
Plots for the NFPF model for some parameter values: (a) density and (b) hazard rate.
Figure 8.
Plots for the NFPa model for some parameter values: (a) density and (b) hazard rate.
Our proposed generator defined by (3) can extend several well-known G-classes of distributions, which are listed in SupplementaryDataANFGF.pdf. The new flexible G-families derived from these G-classes are given in SupplementaryDataBNFGF.pdf.
3. Properties of the NFGF
3.1. Linear representation
For an arbitrary baseline cdf $G(x)$ with pdf $g(x)$, the exponentiated-G (exp-G) distribution with power parameter $a>0$ has cdf and pdf in the forms $G(x)^a$ and $a\,g(x)\,G(x)^{a-1}$, respectively. Recalling (3) and using Mathematica, the following power series holds
| (8) |
where , , and , which can be expressed as
| (9) |
where (for ).
By differentiating (9), the density of X takes the form
| (10) |
where is the exp-G density with power parameter . Equation (10) reveals that the NFGF density function is a linear combination of exp-G densities. Thus, some mathematical properties of the NFGF can be determined directly from those of the exp-G distributions, which are known for several baseline distributions.
3.2. Analytical shapes
The shapes of the density and hrf of X can be described analytically. The critical points of the density of X are the roots of the equation:
The critical points of the hrf of X are obtained from the equation:
3.3. Quantile function
The simplest method for generating rvs is based on the inverse cdf. For an arbitrary cdf, the quantile function (qf) is defined as . The qf of the NFGF can be determined by inverting (3) and solving two nonlinear equations numerically. We can use the following procedure:
Set ;
Find numerically in using any Newton-Raphson algorithm;
Solving numerically for x in gives the qf of X.
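The two-stage inversion described in the steps above can be sketched as follows. Because Equation (3) is not reproduced here, the sketch uses a stand-in family map g ↦ g² (the exp-G map with power 2) and a Kumaraswamy baseline; both are assumptions chosen only because they have a closed-form quantile to check against, and bisection is swapped in for Newton–Raphson to keep the example short. All function names are hypothetical.

```python
def bisect(func, target, lo, hi, tol=1e-12, max_iter=200):
    """Solve func(z) = target on [lo, hi] for an increasing function func."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def family_map(g, power=2.0):
    # Stand-in for the family cdf written as a function of G (assumption):
    # the exp-G map g -> g**power, NOT the NFGF cdf of Equation (3).
    return g ** power

def kw_cdf(x, a, b):
    # Kumaraswamy baseline cdf on (0, 1).
    return 1.0 - (1.0 - x ** a) ** b

def quantile(u, a, b):
    g = bisect(family_map, u, 0.0, 1.0)                     # solve for the value of G
    return bisect(lambda x: kw_cdf(x, a, b), g, 0.0, 1.0)   # then solve G(x) = g

def closed_form(u, a, b, power=2.0):
    # Exact quantile for this particular stand-in, used only as a check.
    g = u ** (1.0 / power)
    return (1.0 - (1.0 - g) ** (1.0 / b)) ** (1.0 / a)

for u in (0.05, 0.5, 0.95):
    print(u, quantile(u, 2.0, 3.0), closed_form(u, 2.0, 3.0))
```

Replacing `family_map` with the family's own cdf (as a function of G) and `kw_cdf` with any baseline cdf keeps the same structure.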
3.4. Moments and generating function
The nth ordinary moment of X, say , can be expressed from Equation (10) as
| (11) |
where , and is the qf of the baseline G.
The first four moments can be used to describe some characteristics of a distribution. Clearly, the central moments and cumulants of X can be determined from Equation (11) using well-known results.
The nth lower incomplete moment of X, say , is
| (12) |
The last two integrals can be evaluated numerically for most G distributions.
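As a hedged illustration of evaluating such integrals numerically, the sketch below approximates the nth moment of an exp-G rv with power parameter `power` through the standard quantile-based form, power × ∫₀¹ Q_G(u)ⁿ u^(power−1) du, using a plain midpoint rule. The function names, the midpoint rule and the two check cases (uniform and Kumaraswamy baselines) are assumptions for this example and are not taken from the paper.

```python
import math

def exp_g_moment(qf, power, n, m=20000):
    """Approximate E[Y^n] for Y ~ exp-G(power) by the midpoint rule applied to
    power * integral_0^1 qf(u)^n * u^(power - 1) du."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        total += qf(u) ** n * u ** (power - 1.0)
    return power * h * total

def uniform_qf(u):
    return u                                   # baseline G(x) = x on (0, 1)

def kw_qf(a, b):
    # Kumaraswamy(a, b) baseline quantile function.
    return lambda u: (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def beta_fn(p, q):
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

# Check 1: uniform baseline, exact value power / (power + n).
print(exp_g_moment(uniform_qf, 2.5, 2), 2.5 / 4.5)
# Check 2: Kumaraswamy baseline with power = 1, exact value b * B(1 + n/a, b).
print(exp_g_moment(kw_qf(2.0, 3.0), 1.0, 1), 3.0 * beta_fn(1.5, 3.0))
```

A weighted sum of such exp-G moments, with the coefficients of the linear representation, would then give the family moment.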
The total deviations from the mean and median are and , where comes from Equation (3).
The moment generating function (mgf) of X follows from Equation (10) as
| (13) |
where is the mgf of and . Hence, can be obtained from the exp-G generating function.
3.5. Estimation
Here, we consider the estimation of the unknown parameters of the NFGF by the maximum-likelihood method. The maximum-likelihood estimates (MLEs) enjoy desirable properties that can be used when constructing confidence intervals, and they deliver simple approximations that work well in finite samples. The normal approximation for the MLEs in distribution theory can easily be handled either analytically or numerically.
The log-likelihood function for the vector of parameters from n observations has the form
| (14) |
The MLE of θ can be evaluated by maximizing . There are several routines for numerical maximization of in the R program (optim function), SAS (PROC NLMIXED), Ox (sub-routine MaxBFGS), among others. All distributions in the NFGF can also be fitted to data sets using the AdequacyModel package in R (see https://www.r-project.org/). An important advantage of this package is that it is not necessary to define the log-likelihood function and that it computes the MLEs, their standard errors (SEs) and some goodness-of-fit (GoF) statistics. We only need to provide the pdf and cdf of the distribution to be fitted to a data set.
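To make the idea of numerical likelihood maximization concrete, here is a small self-contained sketch. Since the NFGF log-likelihood (14) is not reproduced here, it fits the baseline two-parameter Kumaraswamy model instead, which is an assumption made purely for illustration. For fixed a, the Kumaraswamy log-likelihood is maximized in closed form at b = −n / Σ log(1 − xᵢᵃ), so the sketch runs a golden-section search over a on the profile log-likelihood rather than calling a general optimizer; all function names are hypothetical.

```python
import math
import random

def kw_profile_b(a, xs):
    """For fixed a, the Kumaraswamy log-likelihood is maximized at this b."""
    return -len(xs) / sum(math.log1p(-x ** a) for x in xs)

def kw_loglik(a, b, xs):
    n = len(xs)
    return (n * math.log(a * b)
            + (a - 1.0) * sum(math.log(x) for x in xs)
            + (b - 1.0) * sum(math.log1p(-x ** a) for x in xs))

def golden_max(func, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - invphi * (hi - lo)
    d = lo + invphi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = func(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)

def kw_mle(xs, lo=0.05, hi=30.0):
    # The search bounds for a are arbitrary choices for this illustration.
    a_hat = golden_max(lambda a: kw_loglik(a, kw_profile_b(a, xs), xs), lo, hi)
    return a_hat, kw_profile_b(a_hat, xs)

def kw_sample(n, a, b, rng):
    # Inverse-cdf sampling from Kumaraswamy(a, b).
    return [(1.0 - (1.0 - rng.random()) ** (1.0 / b)) ** (1.0 / a) for _ in range(n)]

rng = random.Random(2021)
data = kw_sample(2000, 2.0, 3.0, rng)
a_hat, b_hat = kw_mle(data)
print(a_hat, b_hat)
```

For a model without a closed-form profile step, the same log-likelihood function would instead be passed to a general-purpose optimizer such as the routines listed above.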
Alternatively, we can differentiate the log-likelihood and solve the resulting nonlinear likelihood equations. The score components with respect to are
where and are column vectors of the same dimension of ξ.
Setting the score components to zero and solving them simultaneously yields the MLEs of the parameters of the NFGF. The resulting equations cannot be solved analytically, but statistical software can be used to solve them numerically through iterative Newton–Raphson type algorithms.
We can obtain the elements of the observed information matrix (p is the dimension of ξ) by numerical differentiation. Further, the approximate multivariate normal distribution for , where the observed information matrix is evaluated at , can be used to construct confidence intervals for the parameters of the NFGF.
4. The NFKw model and its properties
The sf and hrf of the NFKw rv are, respectively, given by
and
4.1. Quantile function
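The qf of the NFKw distribution cannot be obtained explicitly. However, we can use the Newton–Raphson algorithm to generate NFKw variates as follows:
- Step 1: Set n, a, b and an initial value $x_0$.
- Step 2: Generate $U \sim \text{Uniform}(0, 1)$.
- Step 3: Update $x_0$ by the Newton–Raphson formula $x^{*} = x_0 - [F(x_0) - U]/f(x_0)$, where $F$ and $f$ are the NFKw cdf (6) and pdf (7).
- Step 4: If $|x_0 - x^{*}| \le \epsilon$ ($\epsilon > 0$, a very small tolerance limit), then store $x = x^{*}$ as a variate from the NFKw(a, b) distribution.
- Step 5: If $|x_0 - x^{*}| > \epsilon$, then set $x_0 = x^{*}$ and go to step 3.
- Step 6: Repeat steps (2)–(5) n times to generate $x_1, \ldots, x_n$.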
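The steps above can be sketched in Python as follows. Because the NFKw cdf (6) and pdf (7) are not reproduced here, the sketch plugs in the baseline Kumaraswamy cdf and pdf as stand-ins (an assumption); the function names, the starting value 0.5 and the clamping of iterates to (0, 1) are likewise choices made for this illustration. Plain Newton–Raphson has no general convergence guarantee, so a more careful version might fall back to bisection.

```python
import random

def kw_cdf(x, a, b):
    return 1.0 - (1.0 - x ** a) ** b

def kw_pdf(x, a, b):
    return a * b * x ** (a - 1.0) * (1.0 - x ** a) ** (b - 1.0)

def newton_variate(u, cdf, pdf, x0=0.5, eps=1e-10, max_iter=100):
    """Solve cdf(x) = u by Newton-Raphson, stopping when the step is <= eps."""
    x = x0
    for _ in range(max_iter):
        x_new = x - (cdf(x) - u) / pdf(x)              # Newton-Raphson update
        x_new = min(max(x_new, 1e-12), 1.0 - 1e-12)    # keep the iterate in (0, 1)
        if abs(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

def generate(n, a, b, rng):
    """Draw n variates by inverting the (stand-in) cdf at uniform draws."""
    sample = []
    for _ in range(n):
        u = rng.random()
        sample.append(newton_variate(u,
                                     lambda x: kw_cdf(x, a, b),
                                     lambda x: kw_pdf(x, a, b)))
    return sample

print(generate(5, 2.0, 3.0, random.Random(1)))
```

Swapping `kw_cdf` and `kw_pdf` for the model's own cdf and pdf leaves the rest of the routine unchanged.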
4.2. Properties
First, the linear representation for cdf of the NFKw model follows from Equation (3) as
| (15) |
By expanding binomial series and noting that , we can write
and then by changing k by k + 1
Let for k=0,1,2 and for k ≥ 3. We can change conveniently the double sums
The last expression can be written as
where .
By differentiating the last expression, the NFKw density can be written as
| (16) |
where denotes the Kumaraswamy density with shape parameters a and .
It is clear from Equation (16) that the NFKw density is a linear combination of Kumaraswamy densities. So, several of the NFKw properties can be obtained from those of the Kumaraswamy distribution.
Let be a rv with density . Then, several properties of X can follow from those of . First, the nth ordinary moment of X takes the form
| (17) |
The cumulants ( ) of X can be determined recursively from (17) as , respectively, where .
The skewness and kurtosis plots of the NFKw distribution are displayed in Figure 9. These plots reveal that the parameters a and b play a significant role in modeling the skewness and kurtosis behaviors of X.
Figure 9.
Plots for the NFKw model: (a) skewness (b) kurtosis.
The nth incomplete moment of X is , which is easily found by changing variables from the lower incomplete beta function when calculating the corresponding moment of . Then, we obtain
| (18) |
The total deviations from the mean and median M of X have the forms and , where M can be determined from .
The first incomplete moment is also used to construct the Bonferroni and Lorenz curves (popular measures in economics, reliability, demography, insurance and medicine). The Bonferroni and Lorenz curves of X for a given probability π are given by and , respectively, where is the qf of X discussed in Section 4.1.
4.3. Estimation
Let be a sample of size n from the NFKw distribution given in Equation (7). The log-likelihood function for the vector of parameters reduces to
The components of the score vector are
Setting these equations to zero and solving them simultaneously yields the MLEs of the model parameters.
5. Simulation study
In this section, we evaluate the accuracy of the MLEs of the NFKw parameters using Monte Carlo simulations. The simulation is replicated N = 1000 times for each sample size n = 50, 100, 200, 300, 400 and each parameter scenario: I: a = 0.5 and b = 1.5; II: a = 1.1 and b = 3.5; and III: , and b = 1.5. The precision of the MLEs is investigated in terms of the biases and mean square errors (MSEs), namely:
$$\text{Bias}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i - \theta) \quad\text{and}\quad \text{MSE}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i - \theta)^2, \qquad \theta = a, b.$$
We display plots of the biases and MSEs for the estimates of NFKw parameters a and b in Figures 10 and 11. These plots reveal that the values of biases and MSEs decrease as the sample size n increases. Thus, the MLEs perform well in estimating the parameters of the NFKw distribution.
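For readers who want to reproduce this kind of summary, the usual Monte Carlo bias and MSE of an estimator (the average error and the average squared error over the replications) can be computed as in the short sketch below. The function names and the toy estimates are made up for illustration and are not results from the study.

```python
def bias(estimates, true_value):
    """Average of (estimate - true value) over the replications."""
    return sum(e - true_value for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Average of (estimate - true value)^2 over the replications."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Toy replication results (made-up numbers, for illustration only).
a_hats = [0.52, 0.47, 0.55, 0.49]
print(bias(a_hats, 0.5), mse(a_hats, 0.5))
```

In a full study, `a_hats` would hold the N estimates obtained by refitting the model to N simulated samples of a given size n.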
Figure 10.
Plots of the biases for the estimated values of NFKw parameters.
Figure 11.
Plots of the MSEs for the estimated values of NFKw parameters.
6. Empirical illustration of the NFKw model
We compare the proposed two-parameter NFKw model (a special model of the NFGF) with the three-parameter transmuted-Kumaraswamy (TrKw) [8], three-parameter exponentiated-Kumaraswamy (EKw) [10] and two-parameter Kumaraswamy (Kw) models on three real-life data sets (flood data, leaves data and glass fiber data), which can be accessed from SupplementaryDataCNFGF.pdf. The pdfs of these models are, respectively, given by:
and
The parameters of the models are estimated by the maximum-likelihood method and the log-likelihood function is evaluated at the MLEs ($\hat{\ell}$). Well-known goodness-of-fit (GoF) statistics, namely the Akaike information criterion (AIC), Bayesian information criterion (BIC), Hannan–Quinn information criterion (HQIC), Anderson–Darling ($A^*$), Cramér–von Mises ($W^*$) and Kolmogorov–Smirnov (K-S) statistics, are used for model comparisons. Lower values of the GoF statistics and higher p-values of the K-S test indicate a good fit.
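The three information criteria have simple closed forms in terms of the maximized log-likelihood, the number of parameters k and the sample size n: AIC = 2k − 2ℓ̂, BIC = k log n − 2ℓ̂ and HQIC = 2k log(log n) − 2ℓ̂. The sketch below evaluates them; the function name and the input values are hypothetical and do not correspond to any of the fitted models.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, BIC and HQIC for a fit with maximized log-likelihood
    `loglik`, `k` estimated parameters and `n` observations."""
    aic = 2.0 * k - 2.0 * loglik
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "BIC": bic, "HQIC": hqic}

# Hypothetical inputs, for illustration only.
ic = information_criteria(loglik=-10.0, k=2, n=20)
print(ic)
```

Because each criterion adds a penalty that grows with k, a two-parameter model can rank ahead of a three-parameter one even when its maximized log-likelihood is slightly lower.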
Tables 2, 4 and 6 give the MLEs and their standard errors for the NFKw model and the competing TrKw, EKw and Kw models for these data sets. The values of the GoF statistics in Tables 3, 5 and 7 indicate that the NFKw model has small values of the GoF statistics, and hence the proposed model provides the best fit as compared to the other models. The plots in Figures 12–14 also support our claim.
Table 2.
MLEs and their SEs (in parentheses) for data set 1.
| Distribution | a | b | α | λ |
|---|---|---|---|---|
| NFKw | 2.3455 | 7.7175 | – | – |
| | (0.4556) | (3.2113) | – | – |
| TrKw | 3.7259 | 10.9645 | – | 0.6141 |
| | (0.6490) | (6.0368) | – | (0.3752) |
| EKw | 3.3633 | 45.8805 | 0.2570 | – |
| | (0.6021) | (9.4457) | (0.1269) | – |
| Kw | 3.3631 | 11.7886 | – | – |
| | (0.6033) | (5.3594) | – | – |
Table 4.
MLEs and their standard errors (in parentheses) for data set 2.
| Distribution | a | b | α | λ |
|---|---|---|---|---|
| NFKw | 1.8585 | 43.1739 | – | – |
| | (0.1319) | (11.0563) | – | – |
| TrKw | 2.9292 | 177.1790 | – | 0.3571 |
| | (0.2128) | (62.1210) | – | (0.3535) |
| EKw | 2.8099 | 85.9558 | 2.0498 | – |
| | (0.1940) | (509.7923) | (12.1569) | – |
| Kw | 2.8104 | 176.3490 | – | – |
| | (0.1941) | (59.9656) | – | – |
Table 6.
MLEs and their standard errors (in parentheses) for data set 3.
| Distribution | a | b | α | λ |
|---|---|---|---|---|
| NFKw | 1.7263 | 19.4685 | – | – |
| | (0.2661) | (8.7742) | – | – |
| TrKw | 2.7037 | 45.8129 | – | 0.4748 |
| | (0.3944) | (27.9858) | – | (0.3956) |
| EKw | 2.4966 | 105.2523 | 0.4157 | – |
| | (0.3691) | (4066.8131) | (16.0628) | – |
| Kw | 2.4998 | 43.9672 | – | – |
| | (0.3700) | (23.6081) | – | – |
Table 3.
The statistics AIC, BIC, HQIC, $A^*$, $W^*$, K-S and p-values for data set 1.
| Distribution | $-\hat{\ell}$ | AIC | BIC | HQIC | $A^*$ | $W^*$ | K-S | p-value |
|---|---|---|---|---|---|---|---|---|
| NFKw | | | | | 0.6966 | 0.1147 | 0.1797 | 0.5380 |
| TrKw | | | | | 0.8409 | 0.1409 | 0.1930 | 0.4455 |
| EKw | | | | | 1.4830 | 0.2626 | 0.7960 | 1.972e−11 |
| Kw | | | | | 0.9722 | 0.1658 | 0.2109 | 0.3360 |
Table 5.
The statistics AIC, BIC, HQIC, $A^*$, $W^*$, K-S and p-values for data set 2.
| Distribution | $-\hat{\ell}$ | AIC | BIC | HQIC | $A^*$ | $W^*$ | K-S | p-value |
|---|---|---|---|---|---|---|---|---|
| NFKw | | | | | 0.9281 | 0.1657 | 0.1005 | 0.1509 |
| TrKw | | | | | 1.0578 | 0.1906 | 0.1098 | 0.0913 |
| EKw | | | | | 0.9207 | 0.1670 | 0.5341 | 2.2e−16 |
| Kw | | | | | 1.1612 | 0.2078 | 0.1181 | 0.0563 |
Table 7.
The statistics AIC, BIC, HQIC, $A^*$, $W^*$, K-S and p-values for data set 3.
| Distribution | $-\hat{\ell}$ | AIC | BIC | HQIC | $A^*$ | $W^*$ | K-S | p-value |
|---|---|---|---|---|---|---|---|---|
| NFKw | | | | | 1.1518 | 0.1835 | 0.1755 | 0.3766 |
| TrKw | | | | | 1.3400 | 0.2187 | 0.1837 | 0.3222 |
| EKw | | | | | 1.7763 | 0.3018 | 0.6172 | 2.325e−09 |
| Kw | | | | | 1.4409 | 0.2379 | 0.1915 | 0.2756 |
It is clear that the NFKw model provides a better fit than the other tested models because it generally has the smallest values of $-\hat{\ell}$, AIC, BIC, HQIC, $A^*$, $W^*$ and K-S. Figures 12–14 also support our claim about the NFKw model.
Figure 12.
Estimated plots (a) density (b) sf (c) hazard rate, and (d) Box-plot for data set 1.
Figure 13.
Estimated plots (a) density (b) sf (c) hazard rate, and (d) Box-plot for data set 2.
Figure 14.
Estimated plots (a) density (b) sf (c) hazard rate, and (d) Box-plot for data set 3.
7. Concluding remarks
We introduce a new cumulative distribution without extra parameters defined from a baseline cumulative distribution which serves as a flexible generator of generalized classes of distributions. The function defines the new flexible generalized family (NFGF) of distributions. We present many sub-families of the NFGF. We obtain some mathematical properties of the NFGF and also study some properties of the special model called the new flexible Kumaraswamy (NFKw) distribution. We compare this distribution with the transmuted-Kumaraswamy, exponentiated-Kumaraswamy, and Kumaraswamy models by considering six popular GoF statistics. We find that the new distribution provides better estimates and minimum GoF values. The NFKw model outperforms these three models on the basis of numerical and graphical analysis. We expect that this new generator of G-families will be able to attract readers and applied statisticians.
Acknowledgments
The authors would like to thank two anonymous reviewers for comments and suggestions which improved the earlier version of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Ahmad Z., Elgarhy M., and Hamedani G.G., A new Weibull-X family of distributions: properties, characterizations and applications, J. Stat. Distrib. Appl. 5 (2018), Art. 5, pp. 18.
- 2. Alzaatreh A., Famoye F., and Lee C., A new method for generating families of continuous distributions, Metron 71 (2013), pp. 63–79.
- 3. Cordeiro G.M., Ortega E.M.M., and Cunha D.C.C., The exponentiated generalized class of distributions, J. Data Sci. 11 (2013), pp. 1–27.
- 4. Eugene N., Lee C., and Famoye F., Beta-normal distribution and its applications, Commun. Stat. Theory Methods 31 (2002), pp. 497–512.
- 5. Gleaton J.U. and Lynch J.D., Properties of generalized log-logistic families of lifetime distributions, J. Probab. Stat. Sci. 4 (2006), pp. 51–64.
- 6. Granzotto D.C.T., Louzada F., and Balakrishnan N., Cubic rank transmuted distributions: inferential issues and applications, J. Stat. Comput. Simul. 87 (2016), pp. 2760–2778.
- 7. Gupta R.C., Gupta P.L., and Gupta R.D., Modeling failure time data by Lehman alternatives, Commun. Stat. Theory Methods 27 (1999), pp. 887–904.
- 8. Khan M.S., King R., and Hudson L.I., Transmuted Kumaraswamy distribution, Stat. Trans. 17 (2016), pp. 183–210.
- 9. Lee C., Famoye F., and Alzaatreh A., Methods for generating families of univariate continuous distributions in the recent decades, WIREs Comput. Stat. 5 (2013), pp. 219–238.
- 10. Lemonte A.J., Barreto-Souza W., and Cordeiro G.M., The exponentiated Kumaraswamy distribution and its log-transform, Braz. J. Probab. Stat. 27 (2013), pp. 31–53.
- 11. Marshall A.W. and Olkin I., A new method for adding parameters to a family of distributions with application to the exponential and Weibull families, Biometrika 84 (1997), pp. 641–652.
- 12. Shaw W.T. and Buckley I.R., The alchemy of probability distributions: beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map, UCL discovery repository (2007). Available at http://discovery.ucl.ac.uk/id/eprint/643923.
- 13. Tahir M.H. and Cordeiro G.M., Compounding of distributions: a survey and new generalized classes, J. Stat. Distrib. Appl. 3 (2016), pp. 13.
- 14. Tahir M.H., Cordeiro G.M., Alizadeh M., Mansoor M., and Zubair M., The logistic-X family of distributions and its applications, Commun. Stat. Theory Methods 45 (2016), pp. 7326–7349.
- 15. Tahir M.H. and Nadarajah S., Parameter induction in continuous univariate distributions: well-established G families, An. Acad. Bras. Ciênc. 87 (2015), pp. 539–568.
- 16. Torabi H. and Montazari N.H., The logistic-uniform distribution and its application, Commun. Stat. Simul. Comput. 43 (2014), pp. 2551–2569.
- 17. Zografos K. and Balakrishnan N., On families of beta- and generalized gamma generated distributions and associated inference, Stat. Methodol. 6 (2009), pp. 344–362.