Abstract
In recent years, trigonometric families of continuous distributions have gained a prominent place in the theory and practice of statistics, with the Sin-G family leading the way. In this paper, we contribute to the subject by introducing a flexible extension of the Sin-G family, called the transformed Sin-G family. It is constructed from a new polynomial-trigonometric function that has, among other features, a desirable “versatile concave/convex” property. The modelling possibilities of the former Sin-G family are thus multiplied. This potential is also highlighted by a complete theoretical study, which establishes stochastic ordering results, examines the analytical properties of the main functions, derives several kinds of moments, and discusses the reliability parameter. Then, the applied side of the proposed family is investigated, with numerical results and applications of the related models. In particular, the unknown model parameters are estimated by the maximum likelihood method. Two real-life data sets are then analyzed with a new extended Weibull model derived from the considered trigonometric mechanism. We show that it outperforms seven comparable models, illustrating the importance of the findings.
1 Introduction
Recent advances in probability distribution theory and applications have seen the rise of various general families of distributions, successfully applied to different statistical problems. A useful survey can be found in [1]. Here, we focus on the trigonometric families of continuous distributions, i.e., those defined by a cumulative distribution function (cdf) involving trigonometric functions (sine, cosine, tangent, cotangent, and various combinations of these). The pioneering work concerns the Sin-G family developed in [2–5]. As indicated by its name, it is built around the sine function; the corresponding cdf is given by
$$F(x;\zeta) = \sin\left[\frac{\pi}{2} G(x;\zeta)\right], \quad x \in \mathbb{R}, \qquad (1)$$
where G(x;ζ) is a baseline cdf of a continuous distribution with parameter vector ζ. It has been demonstrated that the Sin-G family can provide flexible statistical models for data of various kinds. It is also a simple alternative to the model derived from the baseline distribution, with no additional parameter. For instance, in [2], the exponential distribution is used as a baseline to construct the SinE model, which proves to fit suitably the well-known bladder cancer patients data of [6]. It also fits better than some classical models, such as the exponential model itself, with better Akaike information criterion (AIC), Bayesian information criterion (BIC) and Kolmogorov-Smirnov (KS) test values. In addition, based on the inverse Weibull distribution (see [7]), the SinIW model was introduced in [4], with an application to the so-called Guinea pigs data of [8], providing a better BIC than some other solid models. A freely available R package on the SinIW model is provided in [9]. The qualities of the models derived from the Sin-G family have inspired other general families of continuous distributions also centered around trigonometric functions, such as the Cos-G family by [5], CS-G family by [10], NSin-G family by [11], TransSC-G family by [12], SinTL-G family by [13], SinKum-G family by [14], and SinEOF-G family by [15]. The majority of these families are based on the Sin-G structure, with no additional tuning parameters or transformations.
In this paper, we go beyond the Sin-G family by proposing a new extended version of it, called the transformed Sin-G (TS-G) family. The corresponding cdf is derived from (1), with the use of a simple one-parameter polynomial-trigonometric transformation. This transformation has the following features: (i) it is analytically simple and includes the non-transformed case, (ii) it has the properties of a continuous cdf, that is, it takes its values in the unit interval, and is continuous, almost everywhere differentiable and increasing, and (iii) it can be convex, concave, or neither, for well-identified values of the parameter. Thanks to its versatility, this transformation significantly enhances the flexibility of (1), and of the baseline cdf as well. Thus, the TS-G family distinguishes itself from other modified Sin-G families by its overall simplicity, its original polynomial-trigonometric functions, and the advantage of flexible kurtosis, skewness, versatile distribution tails, and various hazard rate shapes, as a result of the considered transformation. The TS-G family can therefore provide interesting models for diverse fitting purposes. This practical aspect, along with important theoretical results, is developed in this study.
The rest of the paper is organized as follows. The basics on the TS-G family are presented in Section 2. Also, an emphasis is put on a special distribution of the family based on the Weibull distribution, motivated by its desirable shapes characteristics in the modelling sense. In Section 3, interesting properties of the TS-G family are studied, including stochastic ordering results, equivalence properties, critical points analysis, series expansion involving known exponentiated functions, moments, and reliability parameter. In Section 4, by adopting a statistical approach, the TS-G model parameters are estimated with the maximum likelihood method, supported by a simulation study. Then, applications of this special model are addressed in Section 5, showing how the new family can be of interest to fit various data sets, outperforming seven other solid extended or modified Weibull models of the literature. Section 6 formulates concluding remarks.
2 Basics on the TS-G family
In this section, the TS-G family is defined, with motivations and discussions.
2.1 On a special polynomial-trigonometric function
The following result presents some interesting features of a simple polynomial-trigonometric function, which will be at the basis of the TS-G family.
Proposition 1 Let λ ∈ [0, 1] and Tλ(x) be the following parametric function:
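$$T_\lambda(x) = \sin\left(\frac{\pi}{2}x\right) - \lambda\,\frac{\pi}{2}\,x\cos\left(\frac{\pi}{2}x\right), \quad x \in [0, 1], \qquad (2)$$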
with Tλ(x) = 0 if x < 0 and Tλ(x) = 1 if x > 1. Then, the following properties hold:
Tλ(x) has the properties of a continuous cdf,
Tλ(x) can be convex or concave according to the values of λ. In particular, for λ ∈ [0, 1/3], Tλ(x) is concave and, for λ ∈ [1/2, 1], Tλ(x) is convex.
For λ ∈ (1/3, 1/2), Tλ(x) can be neither convex nor concave.
Proof. First of all, the following inequality holds: for y ∈ [0, π/2], we have
$$y\cos(y) \le \sin(y) \qquad (3)$$
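(see [16]). Let us now prove the first point of the proposition. Since λ ∈ [0, 1], it follows from (3) that 0 ≤ T1(x) ≤ Tλ(x) ≤ sin[(π/2)x] ≤ 1 for x ∈ [0, 1]. Also, Tλ(x) satisfies Tλ(0) = 0 and Tλ(1) = 1, it is continuous and differentiable on (0, 1) and, by differentiating with respect to x, we have
$$\frac{dT_\lambda(x)}{dx} = \frac{\pi}{2}(1-\lambda)\cos\left(\frac{\pi}{2}x\right) + \lambda\left(\frac{\pi}{2}\right)^2 x\sin\left(\frac{\pi}{2}x\right).$$
As a sum of non-negative functions, we have dTλ(x)/dx ≥ 0, so Tλ(x) is increasing. We conclude that Tλ(x) has the properties of a continuous cdf. For the second point of the proposition, let us notice that, by differentiating once more with respect to x, we have
$$\frac{d^2T_\lambda(x)}{dx^2} = \left(\frac{\pi}{2}\right)^2(2\lambda-1)\sin\left(\frac{\pi}{2}x\right) + \lambda\left(\frac{\pi}{2}\right)^3 x\cos\left(\frac{\pi}{2}x\right).$$
Therefore, if λ ∈ [0, 1/3], it follows from 2λ − 1 ≤ −1/3, λ ≤ 1/3 and (3) that
$$\frac{d^2T_\lambda(x)}{dx^2} \le -\frac{1}{3}\left(\frac{\pi}{2}\right)^2\left[\sin\left(\frac{\pi}{2}x\right) - \frac{\pi}{2}x\cos\left(\frac{\pi}{2}x\right)\right] \le 0.$$
That is, Tλ(x) is concave. On the other hand, if λ ∈ [1/2, 1], we have d²Tλ(x)/dx² ≥ 0 as a sum of non-negative functions, implying that Tλ(x) is convex.
Now, for λ = 2/5 ∈ (1/3, 1/2), we have
$$\frac{d^2T_{2/5}(x)}{dx^2} = \frac{1}{5}\left(\frac{\pi}{2}\right)^2\left[\pi x\cos\left(\frac{\pi}{2}x\right) - \sin\left(\frac{\pi}{2}x\right)\right],$$
which is positive at x = 1/2 and negative at x = 1, implying that Tλ(x) can be neither convex nor concave. As a visual approach, if we set Uℓ(x) = d²Tλℓ(x)/dx², with λℓ = ℓ/2 + (1 − ℓ)/3 and ℓ ∈ {0.1, 0.2, …, 0.9}, so that λℓ ∈ (1/3, 1/2), Fig 1 shows that Uℓ(x) can be positive and negative, implying that Tλℓ(x) is neither convex nor concave for the considered values of λ. This concludes the proof of Proposition 1.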
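The curvature claims of Proposition 1 can also be checked numerically. The following Python sketch assumes that (2) reads Tλ(x) = sin[(π/2)x] − λ(π/2)x cos[(π/2)x] on [0, 1], which is the form implied by the proof and by the definition of the family; the function names and the grid size are illustrative choices of ours.

```python
import math

HALF_PI = math.pi / 2


def T(x, lam):
    """T_lambda(x) as assumed for Eq. (2), extended by 0 below 0 and by 1 above 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return math.sin(HALF_PI * x) - lam * HALF_PI * x * math.cos(HALF_PI * x)


def d2T(x, lam):
    """Second derivative of T_lambda on (0, 1)."""
    return HALF_PI ** 2 * (
        (2 * lam - 1) * math.sin(HALF_PI * x)
        + lam * HALF_PI * x * math.cos(HALF_PI * x)
    )


def curvature(lam, n=999):
    """Classify T_lambda from the sign of its second derivative on an interior grid."""
    values = [d2T(i / (n + 1), lam) for i in range(1, n + 1)]
    if all(v <= 0 for v in values):
        return "concave"
    if all(v >= 0 for v in values):
        return "convex"
    return "neither"


if __name__ == "__main__":
    for lam in (0.0, 0.2, 0.4, 0.7, 1.0):
        print(lam, curvature(lam))
```

A grid check of this kind is only a sanity test: it cannot replace the analytical argument, but it quickly reveals a sign error in a derivative.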
Fig 1. Plots of the function Uℓ(x) for ℓ ∈ {0.1, 0.2, …, 0.9}.
One can remark that the function Tλ(x) defined by (2) can be written as , where
One can establish that the function has the properties of a cdf, which is not mentioned in the existing literature.
In view of Proposition 1, the transformation makes it possible to “convexify (or not)” the concave cdf s(x) = sin[(π/2)x], x ∈ [0, 1], while keeping its cdf properties. This ability is not shared by some other simple transformations, such as the power transformation with exponent γ > 0, for instance. This aspect is the driving force behind the TS-G family, which aims to expand the Sin-G family in a straightforward manner to open new statistical perspectives. We show the convex/concave properties of the function Tλ(x) given by (2) in Fig 2, by considering several values for λ.
Fig 2. Plots of the function Tλ(x) for λ ∈ {0.1, 0.2, …, 1}.
2.2 Definition
By taking the benefits of the flexibility of Tλ(x) given by (2) as described in Proposition 1, the proposed TS-G family of continuous distributions is defined by the following cdf:
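$$F(x;\lambda,\zeta) = \sin\left[\frac{\pi}{2}G(x;\zeta)\right] - \lambda\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right], \quad x \in \mathbb{R}, \qquad (4)$$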
where λ ∈ [0, 1] and, as usual, G(x;ζ) is a baseline cdf of a continuous distribution with parameter(s) vector denoted by ζ.
That is, by considering the transformations Tλ(x) and discussed above, we have F(x;λ, ζ) = Tλ[G(x;ζ)] or, equivalently, , motivating the name of “transformed Sin-G family”. One can notice that the cdf of the former Sin-G family is derived by taking λ = 0. Also, based on Proposition 1 and the convex/concave properties of , we argue that the overall flexibility of the cdf of the former Sin-G family provided by (1) is enhanced. This is concretized by the addition of the modulating polynomial-cosine term λ(π/2)G(x;ζ)cos[(π/2)G(x;ζ)], which opens up a whole new world of possibilities.
Also, one can write F(x;λ, ζ) as a simple mixture of two cdfs of the TS-G family itself: F(x;0, ζ) and F(x;1, ζ), with the weights 1 − λ and λ, respectively, i.e.,
$$F(x;\lambda,\zeta) = (1-\lambda)F(x;0,\zeta) + \lambda F(x;1,\zeta).$$
Hence, the role of λ is to balance F(x;0, ζ) and F(x;1, ζ), each reaching different targets in terms of statistical modelling.
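Among the other functions of interest, the survival function (sf) of the TS-G family is given by
$$S(x;\lambda,\zeta) = 1 - \sin\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right].$$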
Upon an almost everywhere differentiation of F(x;λ, ζ) with respect to x, the corresponding probability density function (pdf) is given by
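$$f(x;\lambda,\zeta) = \frac{\pi}{2}\,g(x;\zeta)\left\{(1-\lambda)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\,\frac{\pi}{2}\,G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]\right\}, \qquad (5)$$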
where g(x;ζ) is the pdf of the baseline distribution, i.e., obtained by an almost everywhere differentiation of G(x;ζ).
Another important function of the TS-G family, specially when the support of the baseline distribution is (0, + ∞), is the hazard rate function (hrf) defined by
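$$h(x;\lambda,\zeta) = \frac{f(x;\lambda,\zeta)}{S(x;\lambda,\zeta)} = \frac{\frac{\pi}{2}\,g(x;\zeta)\left\{(1-\lambda)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\,\frac{\pi}{2}\,G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]\right\}}{1 - \sin\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right]}. \qquad (6)$$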
For the importance of the sf and hrf, in reliability analysis mainly, we may refer the reader to [17], and the references therein.
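To make the definitions of this subsection concrete, here is a minimal Python sketch of the cdf, pdf and hrf of the TS-G family, written in terms of the baseline values G = G(x;ζ) and g = g(x;ζ). It assumes the forms of (4), (5) and (6) obtained by composing Tλ with the baseline cdf; the function names and the exponential baseline used in the demonstration are illustrative choices, not part of the paper.

```python
import math

HALF_PI = math.pi / 2


def tsg_cdf(G, lam):
    """TS-G cdf evaluated at a baseline cdf value G in [0, 1]."""
    u = HALF_PI * G
    return math.sin(u) - lam * u * math.cos(u)


def tsg_pdf(G, g, lam):
    """TS-G pdf from the baseline cdf value G and baseline pdf value g."""
    u = HALF_PI * G
    return HALF_PI * g * ((1 - lam) * math.cos(u) + lam * u * math.sin(u))


def tsg_hrf(G, g, lam):
    """TS-G hazard rate: pdf divided by the survival function (requires cdf < 1)."""
    return tsg_pdf(G, g, lam) / (1.0 - tsg_cdf(G, lam))


if __name__ == "__main__":
    # Illustrative baseline: unit exponential, G(x) = 1 - exp(-x), g(x) = exp(-x).
    x, lam = 0.8, 0.3
    G, g = 1.0 - math.exp(-x), math.exp(-x)
    mixture = (1 - lam) * tsg_cdf(G, 0.0) + lam * tsg_cdf(G, 1.0)
    print(tsg_cdf(G, lam), mixture)      # the two values should agree
    print(tsg_pdf(G, g, lam), tsg_hrf(G, g, lam))
```

Writing the functions in terms of G and g keeps the sketch independent of the baseline: any distribution for which the cdf and pdf can be evaluated can be plugged in.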
2.3 A special distribution: The TSW distribution
Naturally, each choice for G(x;ζ) gives a new TS-G distribution. Here, we focus our attention on the Weibull distribution as baseline, i.e., defined by the following cdf:
$$G(x;\alpha,\beta) = 1 - e^{-\alpha x^{\beta}}, \quad x > 0, \qquad (7)$$
and G(x;α, β) = 0 if x ≤ 0, where α > 0 and β > 0 are scale and shape parameters, respectively. As a main interest, the Weibull distribution is known to be an alternative to the exponential distribution, offering more flexible hazard rate shapes; decreasing and increasing shapes can be observed. It has been involved with success in a plethora of applications requiring the analysis of lifetime and reliability data. In this regard, we may refer the reader to [18–20].
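We thus aim to extend the Weibull distribution, along with its properties, via the use of the TS-G family. That is, by inserting (7) into (4), we introduce the TSW distribution defined by the following cdf:
$$F(x;\lambda,\alpha,\beta) = \sin\left[\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\right] - \lambda\,\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\cos\left[\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\right] = \cos\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right) - \lambda\,\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\sin\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right), \quad x > 0,$$
and F(x;λ, α, β) = 0 if x ≤ 0, where the second expression is obtained after some trigonometric manipulations.
Also, the corresponding sf, pdf and hrf are, respectively, given by
$$S(x;\lambda,\alpha,\beta) = 1 - \cos\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right) + \lambda\,\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\sin\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right), \quad x > 0,$$
and S(x;λ, α, β) = 1 if x ≤ 0,
$$f(x;\lambda,\alpha,\beta) = \frac{\pi}{2}\,\alpha\beta x^{\beta-1}e^{-\alpha x^{\beta}}\left\{(1-\lambda)\sin\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right) + \lambda\,\frac{\pi}{2}\left(1-e^{-\alpha x^{\beta}}\right)\cos\left(\frac{\pi}{2}e^{-\alpha x^{\beta}}\right)\right\}, \quad x > 0,$$
and f(x;λ, α, β) = 0 if x ≤ 0, and
$$h(x;\lambda,\alpha,\beta) = \frac{f(x;\lambda,\alpha,\beta)}{S(x;\lambda,\alpha,\beta)}, \quad x > 0,$$
and h(x;λ, α, β) = 0 if x ≤ 0.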
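The TSW functions are straightforward to code. The Python sketch below assumes the Weibull baseline G(x;α, β) = 1 − exp(−αx^β) for x > 0 and uses the second, cosine-based form of the cdf; the function names are ours, and the parameter values in the demonstration are the estimates reported for the first data set in Table 2.

```python
import math

HALF_PI = math.pi / 2


def tsw_cdf(x, lam, alpha, beta):
    """TSW cdf, with e = exp(-alpha * x**beta) = 1 - G(x)."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    return math.cos(HALF_PI * e) - lam * HALF_PI * (1 - e) * math.sin(HALF_PI * e)


def tsw_pdf(x, lam, alpha, beta):
    """TSW pdf, i.e. the derivative of tsw_cdf with respect to x."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    g = alpha * beta * x ** (beta - 1) * e          # Weibull pdf
    return HALF_PI * g * (
        (1 - lam) * math.sin(HALF_PI * e)
        + lam * HALF_PI * (1 - e) * math.cos(HALF_PI * e)
    )


def tsw_hrf(x, lam, alpha, beta):
    """TSW hazard rate; far in the right tail the survival value may underflow to 0."""
    if x <= 0:
        return 0.0
    return tsw_pdf(x, lam, alpha, beta) / (1.0 - tsw_cdf(x, lam, alpha, beta))


if __name__ == "__main__":
    lam, alpha, beta = 0.7896, 0.4807, 2.4826
    for x in (0.5, 1.0, 1.5, 2.0, 2.5):
        print(x, tsw_cdf(x, lam, alpha, beta), tsw_pdf(x, lam, alpha, beta),
              tsw_hrf(x, lam, alpha, beta))
```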
After some graphical investigations, the curvature properties of the functions of the TSW distribution reveal to be desirably versatile. Evidence can be seen in Fig 3, which displays some plots of the corresponding pdf and hrf for various values of the parameters.
Fig 3. Selection of plots for (a) the pdf and (b) the hrf of the TSW distribution.
In particular, Fig 3(a) indicates that the pdf of the TSW distribution has various skewness shapes (near symmetrical, left, right, bathtub, reversed-J shapes, mainly), along with different kurtosis properties. Fig 3(b) reveals that the corresponding hrf possesses versatile shapes, such as decreasing, increasing, bathtub (classic and upside-down) and reversed-J shapes. These observations imply that the TSW distribution is adequate to fit heterogeneous data sets. In our study, this aspect will be developed in Section 5, where the TSW distribution is used to fit two real life data sets. Also, it will be compared with other extended or modified Weibull models, and the results will be quite favorable to the TSW model.
3 Notable mathematical properties
Here, we explore some mathematical properties of interest satisfied by the TS-G family.
3.1 Stochastic ordering results
Stochastic ordering results are crucial to understand a certain hierarchy existing between the distributions, with consequence on their comparison from the modelling point of view. In the framework of the TS-G family, the following result presents some relations involving the cdf of the TS-G family (beyond the following immediate stochastic ordering property: F(x;λ, ζ)≤F(x;0, ζ)).
Proposition 2 The following inequalities hold:
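- If λ2 ≥ λ1 ≥ 0, we have F(x;λ2, ζ) ≤ F(x;λ1, ζ).
- For λ ∈ [0, 2/π], we have F(x;λ, ζ) ≥ F*(x;ζ), where
$$F^{*}(x;\zeta) = G(x;\zeta)\left\{1 - \cos\left[\frac{\pi}{2}G(x;\zeta)\right]\right\}$$
is a valid cdf.
Proof. Based on (4), since λ2 ≥ λ1 and the involved functions are non-negative, we have
$$F(x;\lambda_2,\zeta) = \sin\left[\frac{\pi}{2}G(x;\zeta)\right] - \lambda_2\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] \le \sin\left[\frac{\pi}{2}G(x;\zeta)\right] - \lambda_1\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] = F(x;\lambda_1,\zeta),$$
implying the desired inequality.
For the second point, the following inequality holds: for y ∈ [0, π/2], we have sin(y) ≥ y(2/π) (see [16]). Hence, based on (4), since λ ∈ [0, 2/π], we have
$$F(x;\lambda,\zeta) \ge G(x;\zeta) - \frac{2}{\pi}\,\frac{\pi}{2}\,G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] = G(x;\zeta)\left\{1 - \cos\left[\frac{\pi}{2}G(x;\zeta)\right]\right\} = F^{*}(x;\zeta).$$
Then, one can remark that F*(x;ζ) is the cdf of the rv Z = max(X, Y), where X is a rv having the (baseline) cdf G(x;ζ) and Y is a rv having the cdf of the Cos-G family (see [5]), with X and Y independent.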
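Both inequalities of Proposition 2 depend on x only through the value G = G(x;ζ), so they can be verified on a grid of G values in [0, 1]. The Python sketch below does this, assuming that the lower bound is F*(x;ζ) = G(x;ζ){1 − cos[(π/2)G(x;ζ)]}, i.e., the cdf of the maximum of a baseline rv and an independent Cos-G rv as stated in the proof; the function names and the small numerical tolerance are our own choices.

```python
import math

HALF_PI = math.pi / 2
GRID = [i / 1000 for i in range(1001)]   # values of the baseline cdf G
TOL = 1e-15                              # slack for floating-point rounding


def tsg_cdf(G, lam):
    """TS-G cdf as a function of the baseline cdf value G."""
    u = HALF_PI * G
    return math.sin(u) - lam * u * math.cos(u)


def lower_bound_cdf(G):
    """Assumed form of F*: cdf of max(X, Y), X ~ baseline, Y ~ Cos-G, independent."""
    return G * (1 - math.cos(HALF_PI * G))


def is_ordered(lam1, lam2):
    """Check F(.; lam2) <= F(.; lam1) on the grid (first point, with lam2 >= lam1)."""
    return all(tsg_cdf(G, lam2) <= tsg_cdf(G, lam1) + TOL for G in GRID)


def dominates_bound(lam):
    """Check F(.; lam) >= F* on the grid (second point, for lam in [0, 2/pi])."""
    return all(tsg_cdf(G, lam) >= lower_bound_cdf(G) - TOL for G in GRID)


if __name__ == "__main__":
    print(is_ordered(0.2, 0.9), dominates_bound(0.0), dominates_bound(2 / math.pi))
```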
The following result is about a likelihood stochastic ordering of the TS-G family. We refer the reader to [21] for the details on the concept of likelihood stochastic order.
Proposition 3 Let X1 be a rv having the cdf F(x;λ1, ζ) and X2 be a rv having the cdf F(x;λ2, ζ). Then, if λ2 ≥ λ1, we have X1 ≤ X2 in the likelihood stochastic ordering sense.
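Proof. Following [21], we have X1 ≤ X2 in the likelihood stochastic ordering sense if and only if the following ratio function is decreasing with respect to x:
$$\frac{f(x;\lambda_1,\zeta)}{f(x;\lambda_2,\zeta)},$$
where f(x;λ1, ζ) and f(x;λ2, ζ) are the corresponding pdfs of F(x;λ1, ζ) and F(x;λ2, ζ), respectively. That is, by using (5), we have
$$\frac{f(x;\lambda_1,\zeta)}{f(x;\lambda_2,\zeta)} = \frac{(1-\lambda_1)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda_1\frac{\pi}{2}G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]}{(1-\lambda_2)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda_2\frac{\pi}{2}G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]}.$$
Upon an almost everywhere differentiation with respect to x, after some developments, we get
$$\frac{d}{dx}\left[\frac{f(x;\lambda_1,\zeta)}{f(x;\lambda_2,\zeta)}\right] = -(\lambda_2-\lambda_1)\,\frac{\pi}{2}\,g(x;\zeta)\,\frac{\frac{\pi}{2}G(x;\zeta) + \sin\left[\frac{\pi}{2}G(x;\zeta)\right]\cos\left[\frac{\pi}{2}G(x;\zeta)\right]}{\left\{(1-\lambda_2)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda_2\frac{\pi}{2}G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]\right\}^2},$$
which is non-positive if and only if λ2 ≥ λ1, implying the desired result.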
3.2 Equivalence properties
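Here, some equivalence properties of crucial functions of the TS-G family are discussed, which can be helpful to find their limits and to understand the tail properties of the distribution. As G(x;ζ)→0, for λ ∈ [0, 1), we establish that
$$F(x;\lambda,\zeta) \sim \frac{\pi}{2}(1-\lambda)G(x;\zeta), \quad f(x;\lambda,\zeta) \sim \frac{\pi}{2}(1-\lambda)g(x;\zeta), \quad h(x;\lambda,\zeta) \sim \frac{\pi}{2}(1-\lambda)g(x;\zeta).$$
Also, as G(x;ζ)→1, for λ ∈ (0, 1], we have
$$S(x;\lambda,\zeta) \sim \lambda\,\frac{\pi^2}{4}\left[1-G(x;\zeta)\right], \quad f(x;\lambda,\zeta) \sim \lambda\,\frac{\pi^2}{4}\,g(x;\zeta), \quad h(x;\lambda,\zeta) \sim \frac{g(x;\zeta)}{1-G(x;\zeta)}.$$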
In each case, we see how the new parameter λ modulates the limits; it has a strong effect in this regard, except for the hrf when G(x;ζ)→1.
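In the case of the TSW distribution as described in Subsection 2.3, the following equivalences hold. As x → 0, we have
$$f(x;\lambda,\alpha,\beta) \sim \frac{\pi}{2}(1-\lambda)\alpha\beta x^{\beta-1}$$
and
$$h(x;\lambda,\alpha,\beta) \sim \frac{\pi}{2}(1-\lambda)\alpha\beta x^{\beta-1}.$$
Therefore, we obtain limx → 0 f(x;λ, α, β) = limx → 0 h(x;λ, α, β) = ℓ with ℓ = + ∞ if β < 1, ℓ = (π/2)(1 − λ)α if β = 1 and ℓ = 0 if β > 1.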
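Also, as x → + ∞, we have
$$f(x;\lambda,\alpha,\beta) \sim \lambda\,\frac{\pi^2}{4}\,\alpha\beta x^{\beta-1}e^{-\alpha x^{\beta}}$$
and
$$h(x;\lambda,\alpha,\beta) \sim \alpha\beta x^{\beta-1}.$$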
Hence, we have limx → + ∞ f(x;λ, α, β) = 0 in all the situations, and limx → + ∞ h(x;λ, α, β) = ℓ with ℓ = 0 if β < 1, ℓ = α if β = 1 and ℓ = + ∞ if β > 1.
3.3 Critical points
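Some analytical facts about the critical points of functions of the TS-G family are now presented. First of all, the study of the critical point(s), i.e., mode(s), of f(x;λ, ζ) informs us about the possible singularities of the related model. A critical point of f(x;λ, ζ), say x*, is a solution of the non-linear equation df(x;λ, ζ)/dx = 0, which is equivalent to the more tractable non-linear equation d{log[f(x;λ, ζ)]}/dx = 0, i.e.,
$$\frac{g'(x;\zeta)}{g(x;\zeta)} + \frac{\pi}{2}\,g(x;\zeta)\,\frac{(2\lambda-1)\sin\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\frac{\pi}{2}G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right]}{(1-\lambda)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\frac{\pi}{2}G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]} = 0,$$
where g'(x;ζ) = dg(x;ζ)/dx.
Then, the nature of x* depends on the sign of η = d²{log[f(x;λ, ζ)]}/dx² evaluated at x = x*. More specifically, x* is designated as a local maximum point if η < 0, an inflection point if η = 0, and a local minimum point if η > 0.
The same methodology can be applied to study the critical points of h(x;λ, ζ), which can be useful to identify specific hazard rate shapes (monotonic, bathtub, S…) for modelling purposes. Let us just mention that a critical point of h(x;λ, ζ) is a solution of the non-linear equation d{log[h(x;λ, ζ)]}/dx = 0, i.e.,
$$\frac{g'(x;\zeta)}{g(x;\zeta)} + \frac{\pi}{2}\,g(x;\zeta)\,\frac{(2\lambda-1)\sin\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\frac{\pi}{2}G(x;\zeta)\cos\left[\frac{\pi}{2}G(x;\zeta)\right]}{(1-\lambda)\cos\left[\frac{\pi}{2}G(x;\zeta)\right] + \lambda\frac{\pi}{2}G(x;\zeta)\sin\left[\frac{\pi}{2}G(x;\zeta)\right]} + h(x;\lambda,\zeta) = 0.$$
Clearly, in most cases, the critical points of f(x;λ, ζ) and h(x;λ, ζ) have no closed form. They can, however, be determined numerically with mathematical software such as Mathematica, Python, R or MATLAB.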
Concerning the TSW distribution, numerical investigations, supported by Fig 3 as well, show that it is unimodal, with a corresponding hrf that can have one critical point.
3.4 A series expansion
The following result establishes a new representation of the pdf of the TS-G family involving exponentiated baseline pdfs. Such results are common for the pdfs of modern general families of continuous distributions (see, e.g., [4, 11, 22]).
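Proposition 4 For any x such that G(x;ζ) < 1, the following series expansion holds:
$$f(x;\lambda,\zeta) = \sum_{k=0}^{+\infty} a_k\,\upsilon_{2k+1}(x;\zeta),$$
where ak = (π/2)^(2k+1) (−1)^k [1 − λ(2k + 1)]/(2k + 1)! and υγ(x;ζ) = γ g(x;ζ) G(x;ζ)^(γ−1), with γ = 2k + 1.
Proof. Owing to the series expansions of the sine and cosine functions, after some developments, we get
$$F(x;\lambda,\zeta) = \sum_{k=0}^{+\infty}\frac{(-1)^k}{(2k+1)!}\left[\frac{\pi}{2}G(x;\zeta)\right]^{2k+1} - \lambda\sum_{k=0}^{+\infty}\frac{(-1)^k}{(2k)!}\left[\frac{\pi}{2}G(x;\zeta)\right]^{2k+1} = \sum_{k=0}^{+\infty} a_k\,G(x;\zeta)^{2k+1}.$$
We end the proof of Proposition 4 by differentiating the above function with respect to x.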
Proposition 4 is of interest because the properties of most of the exponentiated standard distributions are well known, and thus, can be used to determine those of the TS-G family. Also, from the practical point of view, it allows us to define some integral terms by the means of (infinite) sums, which sometimes give less error than compute the integral directly. In this regard, we refer to the discussion in [22].
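In the setting of the TSW distribution, we have
$$f(x;\lambda,\alpha,\beta) = \sum_{k=0}^{+\infty} a_k\,\upsilon_{2k+1}(x;\alpha,\beta),$$
where υ2k+1(x;α, β) denotes the pdf of the exponentiated Weibull distribution, defined with power parameter 2k + 1 (see [23]), i.e.,
$$\upsilon_{2k+1}(x;\alpha,\beta) = (2k+1)\,\alpha\beta x^{\beta-1}e^{-\alpha x^{\beta}}\left(1-e^{-\alpha x^{\beta}}\right)^{2k}, \quad x > 0,$$
and υ2k+1(x;α, β) = 0 if x ≤ 0. Further details about the exponentiated Weibull distribution can also be found in [24, 25].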
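The expansion can be tested by comparing a truncated sum with the closed-form pdf. The Python sketch below does so for the TSW distribution, assuming the Weibull baseline G(x;α, β) = 1 − exp(−αx^β) and the coefficients ak given in Proposition 4; the truncation level K = 20 and the function names are illustrative choices.

```python
import math

HALF_PI = math.pi / 2


def coeff(k, lam):
    """Coefficient a_k of Proposition 4."""
    n = 2 * k + 1
    return HALF_PI ** n * (-1) ** k * (1 - lam * n) / math.factorial(n)


def exp_weibull_pdf(x, power, alpha, beta):
    """Exponentiated Weibull pdf with power parameter `power`, for x > 0."""
    e = math.exp(-alpha * x ** beta)
    return power * alpha * beta * x ** (beta - 1) * e * (1 - e) ** (power - 1)


def tsw_pdf_series(x, lam, alpha, beta, K=20):
    """Series of Proposition 4 truncated after the term of index K."""
    return sum(coeff(k, lam) * exp_weibull_pdf(x, 2 * k + 1, alpha, beta)
               for k in range(K + 1))


def tsw_pdf_closed(x, lam, alpha, beta):
    """Closed-form TSW pdf obtained from (5) with the Weibull baseline."""
    e = math.exp(-alpha * x ** beta)
    G = 1 - e
    g = alpha * beta * x ** (beta - 1) * e
    u = HALF_PI * G
    return HALF_PI * g * ((1 - lam) * math.cos(u) + lam * u * math.sin(u))


if __name__ == "__main__":
    lam, alpha, beta = 0.7896, 0.4807, 2.4826   # estimates from Table 2
    for x in (0.5, 1.2, 2.0):
        print(x, tsw_pdf_series(x, lam, alpha, beta), tsw_pdf_closed(x, lam, alpha, beta))
```

Because the coefficients decay factorially, a small number of terms already reproduces the pdf to high accuracy for G(x;ζ) in [0, 1].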
3.5 Generalities on the moments
Let X be a rv having the cdf F(x;λ, ζ) given by (4) (and the pdf f(x;λ, ζ) given by (5)) and ϕ(x) be a function. Then, assuming that it makes mathematical sense, the expectation of ϕ(X) is obtained as
where, by denoting QG(u;ζ) the inverse function of G(x;ζ),
and
These two integrals can be determined analytically, depending on the complexity of the function ϕ[QG(u;ζ)]. In all the situations, for given baseline cdf and λ, Θϕ(X) can be calculated by the means of numerical techniques, implemented in any mathematical software.
Also, for an alternative analytical treatment, Proposition 4 implies that
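$$\Theta_\phi(X) = \sum_{k=0}^{+\infty} a_k \int_{-\infty}^{+\infty}\phi(x)\,\upsilon_{2k+1}(x;\zeta)\,dx. \qquad (8)$$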
For practical purposes, the sum can be truncated to a large enough integer K, providing a suitable approximation of Θϕ(X). Some derivations of Θϕ(X) are presented in Table 1, which follow from several specific choices of ϕ(x). As an example of application, the mth raw moments of a rv X following the TSW distribution can be derived from (8) and the mth raw moments of the exponentiated Weibull distribution with power parameter 2k + 1 as established in [26].
Table 1. Specific measures and functions derived to Θϕ(X) according to the choice of ϕ(x).
| Θϕ(X) | ϕ(x) |
|---|---|
| mean (μ*) | x |
| variance | (x − μ*)2 |
| mth raw moment | xm |
| mth central moment | (x − μ*)m |
| mth inverse moment | x−m |
| mth logarithmic moment | [log(x)]m |
| mth descending factorial moment | x(x − 1)(x − 2)…(x − m + 1) |
| mth incomplete moment with respect to t | xm if x ≤ t, and 0 elsewhere |
| (m, q)th probability weighted moment | xm F(x;λ, ζ)q |
| moment generating function with respect to t | etx |
| characteristic function with respect to t | eitx |
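As an illustration of the numerical route mentioned above, the expectation of ϕ(X) can be written, after the change of variable u = G(x;ζ), as the integral over (0, 1) of ϕ[QG(u;ζ)] multiplied by the derivative of Tλ at u. The Python sketch below approximates this integral with a midpoint rule; it assumes this change of variable and the Weibull quantile function QG(u;α, β) = [−log(1 − u)/α]^(1/β), and the function names and number of nodes are our own choices.

```python
import math

HALF_PI = math.pi / 2


def t_derivative(u, lam):
    """Derivative of T_lambda at u in (0, 1)."""
    v = HALF_PI * u
    return HALF_PI * ((1 - lam) * math.cos(v) + lam * v * math.sin(v))


def expectation(phi, quantile, lam, n=20000):
    """Midpoint-rule approximation of E[phi(X)] for a TS-G rv with baseline quantile."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += phi(quantile(u)) * t_derivative(u, lam)
    return h * total


def weibull_quantile(alpha, beta):
    """Quantile function of the Weibull baseline 1 - exp(-alpha * x**beta)."""
    return lambda u: (-math.log1p(-u) / alpha) ** (1.0 / beta)


if __name__ == "__main__":
    lam, alpha, beta = 0.7896, 0.4807, 2.4826   # estimates from Table 2
    Q = weibull_quantile(alpha, beta)
    mean = expectation(lambda x: x, Q, lam)
    variance = expectation(lambda x: (x - mean) ** 2, Q, lam)
    print(mean, variance)
```

Any row of Table 1 can be obtained in the same way by changing the function passed as `phi`. For a uniform baseline on (0, 1) the mean has the closed form (1 + λ)(1 − 2/π), which gives a convenient check of the routine.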
3.6 Reliability parameter
The general definition of the reliability parameter can be formulated as follows. Let X1 and X2 be two continuous rvs that can be compared based on a scenario that makes sense in a random system. Then, the corresponding reliability parameter can be defined as
where f(x, y;ξ) denotes the joint pdf of (X1, X2), with ξ as parameter(s) vector. Details and applications of R in a concrete setting can be found in [27, 28], and the references therein.
The following result concerns the expression of R for the TS-G family in a specific setting.
Proposition 5 Let X1 and X2 be two independent rvs having the cdfs F(x;λ1, ζ) and F(x;λ2, ζ), respectively. Then, we have
Proof. Owing to the independence of X1 and X2, and (4) and (5), and after some integral calculus, we arrive at
This ends the proof of Proposition 5.
In Proposition 5, when X1 and X2 are identically distributed, i.e., λ1 = λ2, we get R = 1/2. Also, Proposition 5 is useful to have a simple estimate of R based on estimates of λ1 and λ2. Indeed, if $\hat{\lambda}_1$ and $\hat{\lambda}_2$ are estimates of λ1 and λ2, respectively, then the plug-in approach suggests the following estimate for R:
However, more research into the application of this formula to real-world data is needed.
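The reliability parameter in the setting of Proposition 5 can be evaluated numerically without specifying the baseline, since the change of variable u = G(x;ζ) removes it from the integral. The Python sketch below assumes the convention R = P(X2 < X1), so that R is the integral over (0, 1) of Tλ2(u) multiplied by the derivative of Tλ1 at u; with the opposite convention the roles of λ1 and λ2 are exchanged. Under this assumed convention, evaluating the integral by hand gives 1/2 + (λ1 − λ2)(π² − 4)/16; this is our own calculation, included only as a cross-check, and it should be compared with the expression stated in Proposition 5.

```python
import math

HALF_PI = math.pi / 2


def T(u, lam):
    """T_lambda(u) on [0, 1]."""
    v = HALF_PI * u
    return math.sin(v) - lam * v * math.cos(v)


def t_derivative(u, lam):
    """Derivative of T_lambda at u in (0, 1)."""
    v = HALF_PI * u
    return HALF_PI * ((1 - lam) * math.cos(v) + lam * v * math.sin(v))


def reliability_numeric(lam1, lam2, n=20000):
    """Midpoint rule for P(X2 < X1) = int_0^1 T_{lam2}(u) T'_{lam1}(u) du (assumed convention)."""
    h = 1.0 / n
    return h * sum(T((i + 0.5) * h, lam2) * t_derivative((i + 0.5) * h, lam1)
                   for i in range(n))


def reliability_closed(lam1, lam2):
    """Hand-evaluated value of the same integral (our own calculation)."""
    return 0.5 + (lam1 - lam2) * (math.pi ** 2 - 4) / 16


if __name__ == "__main__":
    for lam1, lam2 in ((0.5, 0.5), (0.8, 0.2), (0.1, 0.9)):
        print(lam1, lam2, reliability_numeric(lam1, lam2), reliability_closed(lam1, lam2))
```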
4 Maximum likelihood estimation
Here, an inferential study of the TS-G family is proposed, estimating the parameters of the TS-G model by the maximum likelihood method.
4.1 The basics
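The maximum likelihood method is commonly employed in parametric estimation because of its overall simplicity and the theoretical guarantees ensuring strong convergence properties of the obtained estimates. A comprehensive treatment can be found in [29]. We may also refer to [30–32] for modern applications of this method. In the context of the TS-G family, the theoretical basics of the maximum likelihood method are described below. Let x1, …, xn be n observed values of a rv having the cdf given by (4). Then, the log-likelihood function for the parameters λ and ζ, supposed to be unknown, is defined as
$$\ell(\lambda,\zeta) = n\log\left(\frac{\pi}{2}\right) + \sum_{i=1}^{n}\log[g(x_i;\zeta)] + \sum_{i=1}^{n}\log\left\{(1-\lambda)\cos\left[\frac{\pi}{2}G(x_i;\zeta)\right] + \lambda\,\frac{\pi}{2}\,G(x_i;\zeta)\sin\left[\frac{\pi}{2}G(x_i;\zeta)\right]\right\}.$$
Then, the maximum likelihood method suggests the estimates given by $\hat{\lambda}$ and $\hat{\zeta}$, which is possibly a vector of estimates, making ℓ(λ, ζ) maximal among all the possible values for λ and ζ. They are called maximum likelihood estimates (MLEs). From the ideal mathematical point of view, they are the solutions of the following system of equations: ∂ℓ(λ, ζ)/∂λ = 0 and ∂ℓ(λ, ζ)/∂ζ = 0, with
$$\frac{\partial\ell(\lambda,\zeta)}{\partial\lambda} = \sum_{i=1}^{n}\frac{\frac{\pi}{2}G(x_i;\zeta)\sin\left[\frac{\pi}{2}G(x_i;\zeta)\right] - \cos\left[\frac{\pi}{2}G(x_i;\zeta)\right]}{(1-\lambda)\cos\left[\frac{\pi}{2}G(x_i;\zeta)\right] + \lambda\frac{\pi}{2}G(x_i;\zeta)\sin\left[\frac{\pi}{2}G(x_i;\zeta)\right]}$$
and
$$\frac{\partial\ell(\lambda,\zeta)}{\partial\zeta} = \sum_{i=1}^{n}\frac{\partial g(x_i;\zeta)/\partial\zeta}{g(x_i;\zeta)} + \frac{\pi}{2}\sum_{i=1}^{n}\frac{\partial G(x_i;\zeta)}{\partial\zeta}\,\frac{(2\lambda-1)\sin\left[\frac{\pi}{2}G(x_i;\zeta)\right] + \lambda\frac{\pi}{2}G(x_i;\zeta)\cos\left[\frac{\pi}{2}G(x_i;\zeta)\right]}{(1-\lambda)\cos\left[\frac{\pi}{2}G(x_i;\zeta)\right] + \lambda\frac{\pi}{2}G(x_i;\zeta)\sin\left[\frac{\pi}{2}G(x_i;\zeta)\right]}.$$
In most cases, closed-form expressions for $\hat{\lambda}$ and $\hat{\zeta}$ do not seem to be available. However, for a given baseline cdf, they can be approximated numerically by iterative techniques. Common routines are the optim function of the R software or PROC NLMIXED of the SAS (Statistical Analysis System) software. Also, one can determine the standard errors (SEs) of the MLEs, which follow from the inverse of the observed information matrix. By assuming that ζ contains several parameters, say m, this observed information matrix is defined by $J = \{-\partial^2\ell(\lambda,\zeta)/\partial\psi_i\partial\psi_j\}_{i,j=1,\ldots,m+1}$ evaluated at the MLEs, where ψ1 = λ, and ψ1+r denotes the rth component of ζ. From the SEs, one can construct asymptotic confidence intervals of the parameters, among others.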
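The objective function handed to a numerical optimizer can be sketched as follows for the TSW model. The Python code assumes the Weibull baseline G(x;α, β) = 1 − exp(−αx^β) and the log-likelihood obtained by summing the logarithm of the pdf (5) over the observations; the function names and the small data vector in the demonstration are illustrative. The optimization step itself (performed in the paper with the R software) is not reproduced here; instead, the analytic score in λ is compared with a finite difference.

```python
import math

HALF_PI = math.pi / 2


def tsw_loglik(data, lam, alpha, beta):
    """Log-likelihood of the TSW model (sum of log-pdf values); data must be positive."""
    total = 0.0
    for x in data:
        e = math.exp(-alpha * x ** beta)
        g = alpha * beta * x ** (beta - 1) * e       # Weibull pdf
        u = HALF_PI * (1 - e)                        # (pi/2) * G(x)
        total += (math.log(HALF_PI) + math.log(g)
                  + math.log((1 - lam) * math.cos(u) + lam * u * math.sin(u)))
    return total


def tsw_score_lambda(data, lam, alpha, beta):
    """Partial derivative of the log-likelihood with respect to lambda."""
    total = 0.0
    for x in data:
        u = HALF_PI * (1 - math.exp(-alpha * x ** beta))
        c, w = math.cos(u), u * math.sin(u)
        total += (w - c) / ((1 - lam) * c + lam * w)
    return total


if __name__ == "__main__":
    data = [0.5, 1.0, 1.5, 2.0]                     # toy values, for illustration only
    lam, alpha, beta, h = 0.4, 0.5, 2.0, 1e-6
    finite_diff = (tsw_loglik(data, lam + h, alpha, beta)
                   - tsw_loglik(data, lam - h, alpha, beta)) / (2 * h)
    print(finite_diff, tsw_score_lambda(data, lam, alpha, beta))
```

In practice, the negative of `tsw_loglik` would be minimized over λ ∈ [0, 1], α > 0 and β > 0 with a bounded or reparameterized optimizer.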
A statistical aspect of the TS-G family that is not investigated in this study is the identifiability. Numerical experiments show no particular problem on this property, but the rigorous theory remains to be developed with precise mathematical tools.
The rest of the study is devoted to the empirical and real life applications of the TS-G model with the consideration of the MLEs of the parameters.
4.2 Simulation
Here, we illustrate the practical aspect of the MLEs in the setting of the TSW model, i.e., based on the TSW distribution presented in Subsection 2.3. More precisely, we propose a graphical simulation approach illustrating the numerical behavior of the MLEs $\hat{\lambda}$, $\hat{\alpha}$ and $\hat{\beta}$ of λ, α and β, respectively. The R software is used in this regard.
We proceed as follows. We generate N = 3000 samples (x1, …, xn) of size n = 10 to 50 from a rv following the TSW distribution, for two distinct sets of values of the parameters (λ, α, β). We also calculate the empirical mean squared errors (MSEs) of the MLEs defined as, for h = λ, α, β,
$$\mathrm{MSE}_h = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{h}_i - h\right)^2,$$
where the index i refers to the ith generated sample. The results of this simulation study are presented in Figs 4 and 5 for the first and second sets of parameters, respectively.
Fig 4. Plots of the empirical MSEs of the TSW model parameters for the first set of parameters: (a) λ, (b) α and (c) β.
Fig 5. Plots of the empirical MSEs of the TSW model parameters for the second set of parameters: (a) λ, (b) α and (c) β.
As a prime observation, we see that, in all the situations, when the sample size increases, the empirical MSEs approach the axis y = 0. This illustrates the “numerical convergence” of the MLEs to the true values of the parameters.
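The sample generation step of such a simulation can be sketched with the inverse transform method. The paper does not state how the samples were generated, so the approach below is an assumption: since the TSW quantile function has no simple closed form, the cdf is inverted numerically by bisection. The Weibull baseline G(x;α, β) = 1 − exp(−αx^β) is assumed, the function names are ours, and the parameter values in the demonstration are the estimates of Table 2, used only as an example.

```python
import math
import random

HALF_PI = math.pi / 2


def tsw_cdf(x, lam, alpha, beta):
    """TSW cdf with the Weibull baseline 1 - exp(-alpha * x**beta)."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    return math.cos(HALF_PI * e) - lam * HALF_PI * (1 - e) * math.sin(HALF_PI * e)


def tsw_quantile(p, lam, alpha, beta, tol=1e-12):
    """Numerical inverse of the TSW cdf at p in [0, 1), by bisection."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while tsw_cdf(hi, lam, alpha, beta) < p:        # expand the bracket to the right
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if tsw_cdf(mid, lam, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tsw_sample(n, lam, alpha, beta, rng):
    """Draw n values by applying the numerical quantile function to uniform draws."""
    return [tsw_quantile(rng.random(), lam, alpha, beta) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)                        # fixed seed for reproducibility
    sample = tsw_sample(10, 0.7896, 0.4807, 2.4826, rng)
    print([round(x, 3) for x in sample])
```

The empirical MSEs then follow by fitting the model to each generated sample and averaging the squared deviations of the estimates from the true parameter values.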
5 Applications
Thanks to its desirable flexibility, the TSW model is intended to be applied in concrete scenarios, such as the fitting of real-life data. We illustrate this by considering the following two well-referenced real-life data sets.
“The first data set”. The first data set finds its source in [28]. It contains the tensile strength (with unit in GPa) for single carbon fibers. This data set is given by: {0.312, 0.314, 0.479, 0.552, 0.700, 0.803, 0.861, 0.865, 0.944, 0.958, 0.966, 0.997, 1.006, 1.021, 1.027, 1.055, 1.063, 1.098, 1.140, 1.179, 1.224, 1.240, 1.253, 1.270, 1.272, 1.274, 1.301, 1.301, 1.359, 1.382, 1.382, 1.426, 1.434, 1.435, 1.478, 1.490, 1.511, 1.514, 1.535, 1.554, 1.566, 1.570, 1.586, 1.629, 1.633, 1.642, 1.648, 1.684, 1.697, 1.726, 1.770, 1.773, 1.800, 1.809, 1.818, 1.821, 1.848, 1.880, 1.954, 2.012, 2.067, 2.084, 2.090, 2.096, 2.128, 2.233, 2.433, 2.585, 2.585}.
“The second data set”. The second data set, often called breaking stress of carbon fibers data set, was used by [33]. This data set is given by: {3.70, 2.74, 2.73, 2.50, 3.60, 3.11, 3.27, 2.87, 1.47, 3.11, 3.56, 4.42, 2.41, 3.19, 3.22, 1.69, 3.28, 3.09, 1.87, 3.15, 4.90, 1.57, 2.67, 2.93, 3.22, 3.39, 2.81, 4.20, 3.33, 2.55, 3.31, 3.31, 2.85, 1.25, 4.38, 1.84, 0.39, 3.68, 2.48, 0.85, 1.61, 2.79, 4.70, 2.03, 1.89, 2.88, 2.82, 2.05, 3.65, 3.75, 2.43, 2.95, 2.97, 3.39, 2.96, 2.35, 2.55, 2.59, 2.03, 1.61, 2.12, 3.15, 1.08, 2.56, 1.80, 2.53}.
In addition, seven successful models are considered for comparison, also defined as extended or modified versions of the Weibull model and having two, or three or four tuning parameters. Namely, we consider the four-parameter generalized modified Weibull (GMW) model by [34], four-parameter Kumaraswamy Weibull (KW) model by [35], four-parameter beta Weibull (BW) model by [6], four-parameter odd log-logistic modified Weibull (OLLMW) model by [36], three-parameter transmuted Weibull (TW) model by [37], three-parameter modified Weibull (MW) model by [38] and the former two-parameter sine Weibull (SW) model by [3].
As goodness-of-fit criteria to compare these models, we chose the Cramér-Von Mises (CVM), Anderson-Darling (AD) and KS statistics, with the corresponding KS p-values. Also, the AIC is calculated. For the use of the AIC in applied frameworks, one may refer to [39–41]. The general rule is the following: the smaller the values of the CVM, AD and KS statistics and of the AIC, and the larger the KS p-value, the better the fit of the corresponding model to the considered data. The R software is used.
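Two of these criteria are simple enough to sketch directly. The Python code below computes the AIC from a minimized negative log-likelihood and the number of parameters k, using the standard definition AIC = 2k − 2ℓ, and the KS statistic as the largest distance between the empirical cdf and a fitted cdf. The function names are ours; the CVM and AD statistics and the KS p-value are not covered by this sketch.

```python
def aic(neg_loglik, k):
    """Akaike information criterion from the minimized negative log-likelihood."""
    return 2 * k + 2 * neg_loglik


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


if __name__ == "__main__":
    # Negative log-likelihood of the TSW model for the first data set (Table 4), k = 3.
    print(aic(48.4389, 3))
    # Toy check against the uniform cdf on [0, 1].
    print(ks_statistic([0.25, 0.75], lambda x: x))
```

With the TSW value of Table 4, the first call returns approximately 102.8778, which agrees with the tabulated AIC up to rounding.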
Tables 2 and 3 list the values of the CVM, AD, KS with p-value, and the MLEs and their corresponding SEs of the models parameters for the first and second data sets, respectively.
Table 2. CVM, AD, KS with p-value, MLEs and SEs for the first data set.
| Model (parameters) | CVM | AD | KS | p-value | MLEs (SEs) | | | |
|---|---|---|---|---|---|---|---|---|
| TSW | 0.0152 | 0.1322 | 0.0385 | 1.0000 | 0.7896 | 0.4807 | 2.4826 | - |
| (λ, α, β) | | | | | (0.1687) | (0.1702) | (0.4224) | - |
| GMW | 0.0184 | 0.1649 | 0.0421 | 0.9960 | 4.5031 | 0.4927 | 0.3401 | 0.8561 |
| (a, α, γ, λ) | | | | | (9.1201) | (0.7698) | (1.0690) | (0.2833) |
| KW | 0.0226 | 0.1984 | 0.0475 | 0.9977 | 0.7268 | 0.1621 | 1.0308 | 3.5369 |
| (α, β, γ, θ) | | | | | (0.0052) | (0.0186) | (0.0218) | (0.0086) |
| BW | 0.0256 | 0.2217 | 0.0480 | 0.9973 | 0.3585 | 3.7827 | 0.7813 | 5.7953 |
| (α, β, γ, θ) | | | | | (1.9772) | (1.2906) | (0.4103) | (18.9342) |
| OLLW | 0.0228 | 0.1664 | 0.04675 | 0.9982 | 0.0729 | 0.5845 | 0.0146 | 22.3637 |
| (α, β, γ, θ) | | | | | (0.1025) | (0.1487) | (0.0384) | (29.5884) |
| TW | 0.0428 | 0.3266 | 0.3145 | 0.0000 | 2.7732 | 1.4508 | -0.5636 | - |
| (α, β, λ) | | | | | (0.4919) | (0.1420) | (0.4269) | - |
| MW | 0.0195 | 0.1733 | 0.0431 | 0.9995 | 0.0180 | 0.1892 | 3.3740 | - |
| (α, β, θ) | | | | | (0.0609) | (0.0780) | (0.5227) | - |
| SW | 0.0236 | 0.2076 | 0.0442 | 0.9902 | 0.1291 | 3.0852 | - | - |
| (α, β) | | | | | (0.0275) | (0.2951) | - | - |
Table 3. CVM, AD, KS with p-value, MLEs and SEs for the second data set.
| Model (parameters) | CVM | AD | KS | p-value | MLEs (SEs) | | | |
|---|---|---|---|---|---|---|---|---|
| TSW | 0.0554 | 0.3538 | 0.0694 | 0.9079 | 0.7440 | 0.0679 | 2.7694 | - |
| (λ, α, β) | | | | | (0.1915) | (0.0504) | (0.5096) | - |
| GMW | 0.0653 | 0.3939 | 0.0760 | 0.8394 | 5.4737 | 0.4343 | 0.1493 | 0.5167 |
| (a, α, γ, λ) | | | | | (7.9525) | (0.6457) | (0.5395) | (0.1722) |
| KW | 0.0703 | 0.4501 | 0.0825 | 0.7591 | 0.6536 | 0.1738 | 0.0664 | 3.8782 |
| (α, β, γ, θ) | | | | | (0.0230) | (0.0416) | (0.0142) | (0.0171) |
| BW | 0.0846 | 0.5041 | 0.0812 | 0.7761 | 0.1864 | 4.0715 | 0.7592 | 6.9449 |
| (α, β, γ, θ) | | | | | (0.4201) | (1.2708) | (0.3673) | (2.7517) |
| OLLW | 0.1032 | 0.54558 | 0.0780 | 0.8160 | 0.0729 | 0.5845 | 0.0146 | 22.3637 |
| (α, β, γ, θ) | | | | | (0.0301) | (0.0828) | (0.0256) | (23.3981) |
| TW | 0.1260 | 0.6700 | 0.3669 | 0.0000 | 2.9256 | 2.7531 | -0.5906 | - |
| (α, β, λ) | | | | | (0.4828) | (0.2294) | (0.3744) | - |
| MW | 0.0640 | 0.40134 | 0.0795 | 0.7974 | 0.0165 | 0.0144 | 3.7146 | - |
| (α, β, θ) | | | | | (0.0206) | (0.0079) | (0.4069) | - |
| SW | 0.0863 | 0.4937 | 0.09003 | 0.6584 | 0.0165 | 3.1733 | - | - |
| (α, β) | | | | | (0.0064) | (0.3016) | - | - |
Tables 2 and 3 indicate that the smallest CVM, AD and KS statistics and the largest KS p-value are obtained for the TSW model; it is the best model according to the considered criteria. In particular, it outperforms the former SW model, which corresponds to λ = 0. Indeed, the parameter λ of the TSW model is estimated “far from zero”: the corresponding MLEs are $\hat{\lambda}$ = 0.7896 and $\hat{\lambda}$ = 0.7440 for the first and second data sets, respectively. This points out the importance of the transformed sine technique to obtain suitable fits of these data, in comparison to the former SW model.
Tables 4 and 5 present the minus estimated log-likelihood, i.e., $-\hat{\ell} = -\ell(\hat{\lambda}, \hat{\alpha}, \hat{\beta})$ for the TSW model, and the AIC values of the models for the first and second data sets, respectively.
Table 4. The $-\hat{\ell}$ and AIC for the first data set.
| Distribution | $-\hat{\ell}$ | AIC |
|---|---|---|
| TSW | 48.4389 | 102.8779 |
| GMW | 48.7195 | 106.0641 |
| KW | 48.7684 | 105.5368 |
| BW | 48.8954 | 105.7908 |
| OLLW | 49.3799 | 106.7600 |
| TW | 48.7059 | 103.4118 |
| MW | 48.9583 | 103.9166 |
| SW | 49.5012 | 103.0024 |
Table 5. The $-\hat{\ell}$ and AIC for the second data set.
| Distribution | $-\hat{\ell}$ | AIC |
|---|---|---|
| TSW | 85.1358 | 176.2717 |
| GMW | 85.3731 | 178.7462 |
| KW | 85.60939 | 179.2188 |
| BW | 85.9184 | 179.8368 |
| OLLW | 85.5593 | 179.1187 |
| TW | 85.5453 | 177.0907 |
| MW | 85.52304 | 177.0461 |
| SW | 86.6910 | 177.3820 |
According to Tables 4 and 5, since it has the lowest AIC for the two data sets, the TSW model can be considered as the best one.
We now provide a graphical visualization of the fitting results of the TSW model. That is, Figs 6 and 7 display several fits of the TSW model. In particular, the histograms of both data sets are plotted along with the curves of the corresponding estimated pdfs, i.e., $f(x;\hat{\lambda},\hat{\alpha},\hat{\beta})$; the curves of the estimated cdfs, i.e., $F(x;\hat{\lambda},\hat{\alpha},\hat{\beta})$, are plotted over those of the corresponding empirical cdfs of the data; the curves of the estimated sfs, i.e., $S(x;\hat{\lambda},\hat{\alpha},\hat{\beta})$, are plotted over those of the corresponding empirical sfs of the data; and Probability-Probability (P-P) plots are provided.
Fig 6. Several fits of the TSW model for the first data set: (a) estimated pdf over the histogram, (b) estimated cdf over the empirical cdf, (c) estimated sf over the empirical sf and (d) P-P plot.
Fig 7. Several fits of the TSW model for the second data set: (a) estimated pdf over the histogram, (b) estimated cdf over the empirical cdf, (c) estimated sf over the empirical sf and (d) P-P plot.
In all the graphics, the red curves closely follow the corresponding black curves, attesting to the efficiency of the TSW model in this data fitting exercise.
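The quantities behind panels (c) and (d) can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the fitted cdf is the Sin-G cdf with an exponential baseline (the SinE model recalled in the introduction), used as a stand-in for the TSW cdf; the plotting positions (i − 0.5)/n, the rate `theta` and the sample are assumptions.

```python
import math

def sin_exp_cdf(x, theta):
    """Sin-G cdf with exponential baseline: F(x) = sin((pi/2) * (1 - exp(-theta * x)))."""
    if x <= 0:
        return 0.0
    return math.sin(0.5 * math.pi * (1.0 - math.exp(-theta * x)))

def pp_points(data, cdf):
    """Return (empirical probability, fitted probability) pairs for a P-P plot."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5)/n are one common convention (an assumption here).
    return [((i - 0.5) / n, cdf(x)) for i, x in enumerate(xs, start=1)]

def empirical_sf(data, t):
    """Empirical survival function: share of observations strictly greater than t."""
    return sum(1 for x in data if x > t) / len(data)

# Synthetic values for illustration only (not the data analysed in the paper).
sample = [0.4, 0.9, 1.1, 1.6, 2.3, 2.8, 3.5]
# Hypothetical rate parameter, not an estimate from the paper.
points = pp_points(sample, lambda x: sin_exp_cdf(x, theta=0.3))

# In a good fit, the points lie close to the diagonal, so this gap is small.
max_gap = max(abs(p - q) for p, q in points)
print(round(max_gap, 4), empirical_sf(sample, 2.0))
```

Plotting the pairs in `points` against the diagonal gives the P-P plot; the closer the points are to the diagonal, the better the fit.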
6 Concluding remarks
Based on a new one-parameter transformation function, we provide an original extension of the Sin-G family of continuous distributions, called the transformed Sin-G (TS-G) family. We discuss how the additional parameter λ enhances the flexibility of the cdf of the former Sin-G family, with valuable consequences for modelling purposes. An emphasis is put on the transformed Sin Weibull (TSW) distribution, which shows a high potential in the analysis and modelling of lifetime data. Some general mathematical features of the TS-G family are established. Then, a statistical approach is adopted; the maximum likelihood estimates (MLEs) of the TS-G model parameters are discussed. The TSW model is highlighted: it fits the two considered data sets better than seven rival models, some of which have more parameters. As a result of these qualities, the TS-G family can find a broader use in all areas dealing with modern data. For example, it can be used to construct models in multivariate analysis, regression, classification, and other statistical fields of importance. In addition, the transformation Tλ(x) can be used to efficiently extend other existing families of distributions. These perspectives require additional developments, which we plan to address in future works.
Acknowledgments
The authors thank the two reviewers for their detailed and constructive comments.
Data Availability
All relevant data are within the manuscript.
Funding Statement
The author(s) received no specific funding for this work.
References
- 1. Brito CR, Rêgo LC, Oliveira WR and Gomes-Silva F. Method for generating distributions and classes of probability distributions: The univariate case, Hacettepe Journal of Mathematics and Statistics. 2019, 48, 897–930.
- 2. Kumar D, Singh U and Singh SK. A new distribution using sine function: its application to bladder cancer patients data, Journal of Statistics Applications and Probability. 2015, 4, 417–427.
- 3. Souza L. New trigonometric classes of probabilistic distributions, Thesis, Universidade Federal Rural de Pernambuco, 2015.
- 4. Souza L, Junior WRO, de Brito CCR, Chesneau C, Ferreira TAE and Soares L. On the Sin-G class of distributions: theory, model and application, Journal of Mathematical Modeling. 2019, 7, 3, 357–379.
- 5. Souza L, Junior WRO, de Brito CCR, Chesneau C, Ferreira TAE and Soares L. General properties for the Cos-G class of distributions with applications, Eurasian Bulletin of Mathematics. 2019, 2, 2, 63–79.
- 6. Lee C, Famoye F and Olumolade O. Beta-Weibull distribution: Some properties and applications to censored data, Journal of Modern Applied Statistical Methods. 2007, 6, 1, 173–186. doi:10.22237/jmasm/1177992960
- 7. Nelson W. Applied life data analysis, John Wiley and Sons, New York, 1982.
- 8. Bjerkedal T. Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli, American Journal of Hygiene. 1960, 72, 130–148.
- 9. Souza L, Gallindo L and Serafim-de-Souza L. SinIW: The SinIW distribution. R package version 0.2. 2016, Available at https://CRAN.R-project.org/package=SinIW.
- 10. Chesneau C, Bakouch HS and Hussain T. A new class of probability distributions via cosine and sine functions with applications, Communications in Statistics—Simulation and Computation. 2019, 48, 8, 2287–2300. doi:10.1080/03610918.2018.1440303
- 11. Mahmood Z, Chesneau C and Tahir MH. A new sine-G family of distributions: properties and applications, Bulletin of Computational Applied Mathematics. 2019, 7, 1, 53–81.
- 12. Jamal F and Chesneau C. A new family of polyno-expo-trigonometric distributions with applications, Infinite Dimensional Analysis, Quantum Probability and Related Topics. 2019, 22, 04, 1950027, 1–15. doi:10.1142/S0219025719500279
- 13. Al-Babtain AA, Elbatal I, Chesneau C and Elgarhy M. Sine Topp-Leone-G family of distributions: Theory and applications, Open Physics. 2020, 18, 1, 574–593. doi:10.1515/phys-2020-0180
- 14. Jamal F and Chesneau C. The sine Kumaraswamy-G family of distributions, 2020, preprint: https://hal.archives-ouvertes.fr/hal-02120197.
- 15. Jamal F, Chesneau C and Aidi K. The sine extended odd Fréchet-G family of distribution with applications to complete and censored data, Mathematica Slovaca. 2021, to appear.
- 16. Mitrinović DS. Analytic Inequalities, Springer-Verlag, Berlin, 1970.
- 17. Klein JP and Moeschberger ML. Survival analysis: Techniques for censored and truncated data, 2nd ed., Springer, Berlin, Germany, 2003.
- 18. Wang Y, Chan Y, Gui Z, Webb D and Li L. Application of Weibull distribution analysis to the dielectric failure of multilayer ceramic capacitors, Materials Science and Engineering: B. 1997, 47, 3, 197–203. doi:10.1016/S0921-5107(97)00041-X
- 19. Corzo O and Bracho N. Application of Weibull distribution model to describe the vacuum pulse osmotic dehydration of sardine sheets, LWT-Food Science and Technology. 2008, 41, 6, 1108–1115. doi:10.1016/j.lwt.2007.06.018
- 20. Aslam M, Arif OH and Jun CH. An attribute control chart for a Weibull distribution under accelerated hybrid censoring, PLoS ONE. 2017, 12, 3, e0173406. doi:10.1371/journal.pone.0173406
- 21. Shaked M and Shanthikumar JG. Stochastic orders, Wiley, New York, 2007.
- 22. Cordeiro GM and de Castro M. A new family of generalized distributions, Journal of Statistical Computation and Simulation. 2011, 81, 883–893. doi:10.1080/00949650903530745
- 23. Mudholkar GS and Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data, IEEE Transactions on Reliability. 1993, 42, 2, 299–302. doi:10.1109/24.229504
- 24. Mudholkar GS, Srivastava DK and Freimer M. The exponentiated Weibull family; a reanalysis of the bus motor failure data, Technometrics. 1995, 37, 4, 436–445. doi:10.1080/00401706.1995.10484376
- 25. Mudholkar GS and Hutson AD. The exponentiated Weibull family: some properties and a flood data application, Communications in Statistics—Theory and Methods. 1996, 25, 3059–3083. doi:10.1080/03610929608831886
- 26. Choudhury A. A simple derivation of moments of the exponentiated Weibull distribution, Metrika. 2005, 62, 1, 17–22. doi:10.1007/s001840400351
- 27. Kotz S, Lumelskii Y and Pensky M. The stress-strength model and its generalizations and applications, World Scientific, Singapore, 2003.
- 28. Raqab M, Madi T and Debasis K. Estimation of P(Y < X) for the 3-parameter generalized exponential distribution, Communications in Statistics—Theory and Methods. 2008, 37, 2854–2864. doi:10.1080/03610920802162664
- 29. Casella G and Berger RL. Statistical inference, Duxbury Advanced Series, Thomson Learning, Pacific Grove, 2002.
- 30. Dong B, Ma X, Chen F and Chen S. Investigating the differences of single- and multi-vehicle accident probability using mixed logit model, Journal of Advanced Transportation. 2018, Article ID 2702360, 9 pages.
- 31. Chen F and Chen S. Injury severities of truck drivers in single- and multi-vehicle accidents on rural highway, Accident Analysis and Prevention. 2011, 43, 1677–1688. doi:10.1016/j.aap.2011.03.026
- 32. Chen F, Chen S and Ma X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data, Journal of Safety Research. 2018, 65, 153–159. doi:10.1016/j.jsr.2018.02.010
- 33. Cordeiro GM and Lemonte A. The β-Birnbaum-Saunders distribution: An improved distribution for fatigue life modeling, Computational Statistics and Data Analysis. 2011, 55, 1445–1461. doi:10.1016/j.csda.2010.10.007
- 34. Carrasco JMF, Ortega EMM and Cordeiro GM. A generalized modified Weibull distribution for lifetime modeling, Computational Statistics and Data Analysis. 2008, 53, 450–462. doi:10.1016/j.csda.2008.08.023
- 35. Cordeiro GM, Ortega EMM and Nadarajah S. The Kumaraswamy Weibull distribution with application to failure data, Journal of the Franklin Institute. 2010, 347, 8, 1399–1429. doi:10.1016/j.jfranklin.2010.06.010
- 36. Saboor A, Alizadeh M, Khan MN, Gosh I and Cordeiro GM. Odd Log-Logistic modified Weibull distribution, Mediterranean Journal of Mathematics. 2017, 14, 96. doi:10.1007/s00009-017-0880-3
- 37. Aryal GR and Tsokos CP. Transmuted Weibull distribution: A generalization of the Weibull probability distribution, European Journal of Pure and Applied Mathematics. 2011, 4, 2, 89–102.
- 38. Sarhan AM and Zain-din M. Modified Weibull distribution, Applied Sciences. 2009, 11, 123–136.
- 39. Huang H, Song B, Xu P, Zeng Q, Lee J and Abdel-Aty M. Macro and micro models for zonal crash prediction with application in hot zones identification, Journal of Transport Geography. 2016, 54, 248–256. doi:10.1016/j.jtrangeo.2016.06.012
- 40. Wen H, Zhang X, Zeng Q and Sze NN. Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence, Accident Analysis and Prevention. 2019, 132, 105249. doi:10.1016/j.aap.2019.07.025
- 41. Zeng Q, Wang X, Wen H and Yuan Q. An empirical investigation of the factors contributing to local-vehicle and non-local-vehicle crashes on freeway, Journal of Transportation Safety & Security. 2020, 1–15. doi:10.1080/19439962.2020.1779422