Beyond the Sin-G family: The transformed Sin-G family

Farrukh Jamal; Christophe Chesneau; Dalal Lala Bouali; Mahmood Ul Hassan

PLoS One. 2021 May 11;16(5):e0250790. doi: 10.1371/journal.pone.0250790


Editor: Alan D Hutson

PMCID: PMC8112721 PMID: 33974643

Abstract

In recent years, the trigonometric families of continuous distributions have found a place of choice in the theory and practice of statistics, with the Sin-G family as leader. In this paper, we contribute to the subject by introducing a flexible extension of the Sin-G family, called the transformed Sin-G family. It is constructed from a new polynomial-trigonometric function presenting a desirable “versatile concave/convex” property, among others. The modelling possibilities of the former Sin-G family are thus multiplied. This potential is also highlighted by a complete theoretical work, showing stochastic ordering results, studying the analytical properties of the main functions, deriving several kinds of moments, and discussing the reliability parameter as well. Then, the applied side of the proposed family is investigated, with numerical results and applications on the related models. In particular, the unknown model parameters are estimated by the maximum likelihood method. Then, two real life data sets are analyzed by a new extended Weibull model derived from the considered trigonometric mechanism. We show that it performs the best among seven comparable models, illustrating the importance of the findings.

1 Introduction

Recent advances in probability distribution theory and applications have seen the rise of various general families of distributions, successfully applied to different statistical problems. In this regard, a comprehensive survey can be found in [1]. Here, we focus on the trigonometric families of continuous distributions, i.e., those defined by a cumulative distribution function (cdf) involving trigonometric functions (sine, cosine, tangent, cotangent, and various combinations of these). The pioneering work concerns the Sin-G family developed by [2–5]. As indicated by its name, it is built around the sine function; the corresponding cdf is given by

\begin{matrix} F (x; ζ) = sin [\frac{π}{2} G (x; ζ)], x \in R, \end{matrix}

(1)

where G(x;ζ) is a baseline cdf of a continuous distribution with parameter(s) vector denoted by ζ. It is now established that the Sin-G family can provide flexible statistical models to fit data of various nature. Also, it is a simple alternative to the model derived from the baseline distribution, without any additional parameter. For instance, in [2], the exponential distribution is used as a baseline to construct the SinE model, which proves to suitably fit the famous bladder cancer patients data of [6]. Also, it provides a better fit than some classical models such as the former exponential one, with better Akaike information criterion (AIC), Bayesian information criterion (BIC) and Kolmogorov-Smirnov (KS) test values. On the other side, based on the inverse Weibull distribution (see [7]), the SinIW model was introduced by [4], with application to the so-called Guinea pigs data by [8], providing a better BIC in comparison to some other solid models. A “free for all” R package on the SinIW model is provided in [9]. As a matter of fact, the qualities of the models derived from the Sin-G family have inspired other general families of continuous distributions also centered around trigonometric functions, such as the Cos-G family by [5], CS-G family by [10], NSin-G family by [11], TransSC-G family by [12], SinTL-G family by [13], SinKum-G family by [14], and SinEOF-G family by [15]. The majority of these families are based on the Sin-G structure, with no additional tuning parameters or transformations.

In this paper, we go beyond the Sin-G family by proposing a new extended version of it, called the transformed Sin-G (TS-G) family. The corresponding cdf is derived from (1), with the use of a simple one-parameter polynomial-trigonometric transformation. This transformation has the following features: (i) it is analytically simple and includes the non-transformed case, (ii) it has the properties of a continuous cdf, that is, it takes its values in the unit interval, and is continuous, almost everywhere differentiable and increasing, and (iii) it can be convex or concave, or neither, for well-identified values of the parameter. Thanks to its versatility, this transformation significantly enhances the flexibility of (1), and of the baseline cdf as well. Thus, the TS-G family distinguishes itself from other modified Sin-G families by its overall simplicity, original polynomial-trigonometric functions, and the advantage of flexible kurtosis, skewness, versatile distribution tails, and various hazard rate shapes, as a result of the considered transformation. Thus, the TS-G family can provide interesting models for diverse fitting purposes. This practical aspect, along with important theoretical results, is developed in this study.

The rest of the paper is organized as follows. The basics on the TS-G family are presented in Section 2. Also, an emphasis is put on a special distribution of the family based on the Weibull distribution, motivated by its desirable shape characteristics in the modelling sense. In Section 3, interesting properties of the TS-G family are studied, including stochastic ordering results, equivalence properties, critical points analysis, series expansion involving known exponentiated functions, moments, and reliability parameter. In Section 4, by adopting a statistical approach, the TS-G model parameters are estimated with the maximum likelihood method, supported by a simulation study. Then, applications of this special model are addressed in Section 5, showing how the new family can be of interest to fit various data sets, outperforming seven other solid extended or modified Weibull models of the literature. Section 6 formulates concluding remarks.

2 Basics on the TS-G family

In this section, the TS-G family is defined, with motivations and discussions.

2.1 On a special polynomial-trigonometric function

The following result presents some interesting features of a simple polynomial-trigonometric function, which will be at the basis of the TS-G family.

Proposition 1 Let λ ∈ [0, 1] and T_λ(x) be the following parametric function:

\begin{matrix} T_{λ} (x) = sin (\frac{π}{2} x) - λ \frac{π}{2} x cos (\frac{π}{2} x), x \in [0, 1], \end{matrix}

(2)

with T_λ(x) = 0 if x < 0 and T_λ(x) = 1 if x > 1. Then, the following properties hold:

T_λ(x) has the properties of a continuous cdf,
T_λ(x) can be convex or concave according to the values of λ. In particular, for λ ∈ [0, 1/3], T_λ(x) is concave and, for λ ∈ [1/2, 1], T_λ(x) is convex.
For λ ∈ (1/3, 1/2), T_λ(x) can be neither convex nor concave.

Proof. First of all, the following inequality holds: for y ∈ [0, π/2], we have

\begin{matrix} sin (y) \geq y cos (y), \end{matrix}

(3)

(see [16]). Let us now prove the first point of the proposition. Since λ ∈ [0, 1], it follows from (3) that 0 ≤ T₁(x)≤T_λ(x)≤sin[(π/2)x] ≤ 1. Also, T_λ(x) satisfies T_λ(0) = 0 and T_λ(1) = 1, it is continuous, differentiable and, by differentiating on x, we have

\begin{matrix} \frac{d}{d x} T_{λ} (x) = \frac{π}{2} [(1 - λ) cos (\frac{π}{2} x) + λ \frac{π}{2} x sin (\frac{π}{2} x)] . \end{matrix}

As a sum of positive functions, we have dT_λ(x)/dx ≥ 0, so T_λ(x) is increasing. We conclude that T_λ(x) has the properties of a continuous cdf. For the second point of the proof, let us notice that, by differentiating on x, we have

\begin{matrix} \frac{d^{2}}{d x^{2}} T_{λ} (x) = \frac{π^{2}}{4} [(2 λ - 1) sin (\frac{π}{2} x) + λ \frac{π}{2} x cos (\frac{π}{2} x)] . \end{matrix}

Therefore, if λ ∈ [0, 1/3], it follows from 2λ − 1 ≤ −1/3 and (3) that

\begin{matrix} \frac{d^{2}}{d x^{2}} T_{λ} (x) \leq - \frac{π^{2}}{12} [sin (\frac{π}{2} x) - \frac{π}{2} x cos (\frac{π}{2} x)] \leq 0 . \end{matrix}

That is, T_λ(x) is concave. On the other hand, if λ ∈ [1/2, 1], we have d² T_λ(x)/dx² ≥ 0 as a sum of positive functions, implying that T_λ(x) is convex.

Now, for λ = 2/5 ∈ (1/3, 1/2), we have

\begin{matrix} \frac{d^{2}}{d x^{2}} T_{λ} (0.1) = 0.07592538 > 0, \frac{d^{2}}{d x^{2}} T_{λ} (0.8) = - 0.08606892 < 0, \end{matrix}

implying that T_λ(x) can be neither convex nor concave. As a visual approach, if we set $U_{ℓ} (x) = d^{2} T_{λ_{ℓ}} (x) / d x^{2}$ , with λ_ℓ = ℓ/2 + (1 − ℓ)/3 and ℓ ∈ {0.1, 0.2, …, 0.9}, so that λ_ℓ ∈ (1/3, 1/2), Fig 1 shows that U_ℓ(x) can be positive and negative, implying that $T_{λ} (x)$ is neither convex nor concave for the considered values of λ. This concludes the proof of Proposition 1.
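As a numerical companion to the proof, the following Python sketch evaluates T_λ(x) and its second derivative on a grid and reports whether the second derivative takes positive and/or negative values. The grid resolution and the helper names (T, d2T, sign_pattern) are illustrative assumptions; such a check supports the analytical argument but does not replace it.

```python
import math

HALF_PI = math.pi / 2.0

def T(x, lam):
    """Function T_lambda(x) from (2), extended by 0 below 0 and 1 above 1."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return math.sin(HALF_PI * x) - lam * HALF_PI * x * math.cos(HALF_PI * x)

def d2T(x, lam):
    """Second derivative of T_lambda on [0, 1], as obtained in the proof."""
    y = HALF_PI * x
    return (math.pi ** 2 / 4.0) * ((2.0 * lam - 1.0) * math.sin(y) + lam * y * math.cos(y))

# Illustrative grid on [0, 1] (assumed resolution).
grid = [i / 200.0 for i in range(201)]

def sign_pattern(lam):
    """Return (has_positive, has_negative) for d2T on the interior grid points."""
    values = [d2T(x, lam) for x in grid[1:-1]]
    return any(v > 0 for v in values), any(v < 0 for v in values)

for lam in (0.0, 0.4, 1.0):
    print(lam, sign_pattern(lam))

# The two values quoted in the proof for lambda = 2/5.
print(d2T(0.1, 0.4), d2T(0.8, 0.4))
```

A pattern with both signs present, as for λ = 2/5, corresponds to the "neither convex nor concave" case.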

One can remark that the function T_λ(x) defined by (2) can be written as $T_{λ} (x) = T_{λ}^{*} {sin [(π / 2) x]}$ , where

\begin{matrix} T_{λ}^{*} (x) = x - λ arcsin (x) \sqrt{1 - x^{2}}, x \in [0, 1] . \end{matrix}

One can establish that the function $T_{λ}^{*} (x)$ has the properties of a cdf, which is not mentioned in the existing literature.

In view of Proposition 1, the transformation function $T_{λ}^{*} (x)$ makes it possible to “convexify (or not)” the concave cdf s(x) = sin[(π/2)x], x ∈ [0, 1], while keeping its cdf properties. This ability is not shared by some other simple transformation functions, such as the power transformation, i.e., $T_{γ}^{* *} (x) = x^{γ}$ with γ > 0. This aspect is the driving force behind the TS-G family, which aims to expand the Sin-G family in a straightforward manner to open new statistical perspectives. We show the convex/concave properties of the function T_λ(x) given by (2) in Fig 2, by considering several values for λ.

2.2 Definition

By taking the benefits of the flexibility of T_λ(x) given by (2) as described in Proposition 1, the proposed TS-G family of continuous distributions is defined by the following cdf:

\begin{matrix} F (x; λ, ζ) = sin [\frac{π}{2} G (x; ζ)] - λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)], x \in R, \end{matrix}

(4)

where λ ∈ [0, 1] and, as usual, G(x;ζ) is a baseline cdf of a continuous distribution with parameter(s) vector denoted by ζ.

That is, by considering the transformations T_λ(x) and $T_{λ}^{*} (x)$ discussed above, we have F(x;λ, ζ) = T_λ[G(x;ζ)] or, equivalently, $F (x; λ, ζ) = T_{λ}^{*} {sin [(π / 2) G (x; ζ)]}$ , motivating the name of “transformed Sin-G family”. One can notice that the cdf of the former Sin-G family is obtained by taking λ = 0. Also, based on Proposition 1 and the convex/concave properties of $T_{λ}^{*} (x)$ , we argue that the overall flexibility of the cdf of the former Sin-G family provided by (1) is enhanced. This is achieved through the additional modulating polynomial-cosine term λ(π/2)G(x;ζ)cos[(π/2)G(x;ζ)], whose weight λ controls the departure from the Sin-G cdf.

Also, one can write F(x;λ, ζ) as a simple mixture of two cdfs of the TS-G family itself: F(x;0, ζ) and F(x;1, ζ), with the weights 1 − λ and λ, respectively, i.e.,

\begin{matrix} F (x; λ, ζ) = (1 - λ) F (x; 0, ζ) + λ F (x; 1, ζ) . \end{matrix}

Hence, the role of λ is to balance F(x;0, ζ) and F(x;1, ζ), each reaching different targets in terms of statistical modelling.

Among the other functions of interest, the survival function (sf) of the TS-G family is given by

\begin{matrix} S (x; λ, ζ) = 1 - sin [\frac{π}{2} G (x; ζ)] + λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)], x \in R . \end{matrix}

Upon an almost everywhere differentiation of F(x;λ, ζ) with respect to x, the corresponding probability density function (pdf) is given by

\begin{matrix} f (x; λ, ζ) = \frac{π}{2} g (x; ζ) {λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]}, \end{matrix}

(5)

where g(x;ζ) is the pdf of the baseline distribution, i.e., obtained by an almost everywhere differentiation of G(x;ζ).

Another important function of the TS-G family, especially when the support of the baseline distribution is (0, + ∞), is the hazard rate function (hrf) defined by

\begin{matrix} h (x; λ, ζ) = \frac{\frac{π}{2} g (x; ζ) {λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]}}{1 - sin [\frac{π}{2} G (x; ζ)] + λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)]}, x \in R . \end{matrix}

(6)

For the importance of the sf and hrf, in reliability analysis mainly, we may refer the reader to [17], and the references therein.
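The four functions of this subsection can be sketched in Python for an arbitrary baseline. In the block below, the baseline is passed as two callables G and g; the unit-rate exponential baseline and the function names are assumptions chosen only for illustration. The sketch compares the pdf (5) with a finite-difference derivative of the cdf (4) and evaluates the mixture representation.

```python
import math

HALF_PI = math.pi / 2.0

def ts_cdf(x, lam, G):
    """cdf (4): T_lambda applied to the baseline cdf G."""
    u = HALF_PI * G(x)
    return math.sin(u) - lam * u * math.cos(u)

def ts_sf(x, lam, G):
    """Survival function, 1 - cdf."""
    return 1.0 - ts_cdf(x, lam, G)

def ts_pdf(x, lam, G, g):
    """pdf (5), with g the baseline pdf."""
    u = HALF_PI * G(x)
    return HALF_PI * g(x) * (lam * u * math.sin(u) + (1.0 - lam) * math.cos(u))

def ts_hrf(x, lam, G, g):
    """hrf (6), i.e., pdf divided by sf."""
    return ts_pdf(x, lam, G, g) / ts_sf(x, lam, G)

# Illustrative baseline (an assumption): unit-rate exponential distribution.
def G_exp(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def g_exp(x):
    return math.exp(-x) if x > 0 else 0.0

lam, x, h = 0.4, 0.7, 1e-6
# Central finite difference of the cdf, to be compared with the pdf.
numeric_pdf = (ts_cdf(x + h, lam, G_exp) - ts_cdf(x - h, lam, G_exp)) / (2.0 * h)
# Mixture of the lambda = 0 and lambda = 1 members with weights 1 - lam and lam.
mixture = (1.0 - lam) * ts_cdf(x, 0.0, G_exp) + lam * ts_cdf(x, 1.0, G_exp)

print(ts_pdf(x, lam, G_exp, g_exp), numeric_pdf)
print(ts_cdf(x, lam, G_exp), mixture)
```

Passing the baseline as callables keeps the sketch independent of any particular distribution, in line with the generality of the family.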

2.3 A special distribution: The TSW distribution

Naturally, each choice for G(x;ζ) gives a new TS-G distribution. Here, we focus our attention on the Weibull distribution as baseline, i.e., defined by the following cdf:

\begin{matrix} G (x; α, β) = 1 - e^{- α x^{β}}, x > 0, \end{matrix}

(7)

and G(x;α, β) = 0 if x ≤ 0, where α > 0 and β > 0 are scale and shape parameters, respectively. As a main interest, the Weibull distribution is known to be an alternative to the exponential distribution, offering more flexible hazard rate shapes; decreasing and increasing shapes can be observed. It has been involved with success in a plethora of applications requiring the analysis of lifetime and reliability data. In this regard, we may refer the reader to [18–20].

We thus aim to extend the Weibull distribution, along with its properties, via the use of the TS-G family. That is, by inserting (7) into (4), we introduce the TSW distribution defined by the following cdf:

\begin{matrix} F (x; λ, α, β) & = sin [\frac{π}{2} (1 - e^{- α x^{β}})] - λ \frac{π}{2} (1 - e^{- α x^{β}}) cos [\frac{π}{2} (1 - e^{- α x^{β}})] \\ = cos [\frac{π}{2} e^{- α x^{β}}] - λ \frac{π}{2} (1 - e^{- α x^{β}}) sin [\frac{π}{2} e^{- α x^{β}}], x > 0, \end{matrix}

and F(x;λ, α, β) = 0 if x ≤ 0, where the second expression is obtained after some trigonometric manipulations.

Also, the corresponding sf, pdf and hrf are, respectively, given by

\begin{matrix} S (x; λ, α, β) = 1 - cos [\frac{π}{2} e^{- α x^{β}}] + λ \frac{π}{2} (1 - e^{- α x^{β}}) sin [\frac{π}{2} e^{- α x^{β}}], x > 0, \end{matrix}

and S(x;λ, α, β) = 1 if x ≤ 0,

\begin{matrix} f (x; λ, α, β) = \frac{π}{2} α β x^{β - 1} e^{- α x^{β}} {λ \frac{π}{2} (1 - e^{- α x^{β}}) cos [\frac{π}{2} e^{- α x^{β}}] + (1 - λ) sin [\frac{π}{2} e^{- α x^{β}}]}, \\ x > 0, \end{matrix}

and f(x;λ, α, β) = 0 if x ≤ 0, and

\begin{matrix} h (x; λ, α, β) = \frac{\frac{π}{2} α β x^{β - 1} e^{- α x^{β}} {λ \frac{π}{2} (1 - e^{- α x^{β}}) cos [\frac{π}{2} e^{- α x^{β}}] + (1 - λ) sin [\frac{π}{2} e^{- α x^{β}}]}}{1 - cos [\frac{π}{2} e^{- α x^{β}}] + λ \frac{π}{2} (1 - e^{- α x^{β}}) sin [\frac{π}{2} e^{- α x^{β}}]}, \\ x > 0, \end{matrix}

and h(x;λ, α, β) = 0 if x ≤ 0.
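The TSW functions above translate directly into code. The following Python sketch implements the cdf (in both of its expressions), the sf, the pdf and the hrf, and checks that a midpoint-rule integral of the pdf reproduces the cdf. The parameter values are those of the set S₁ used in the simulation study of Section 4; the integration bound and the number of nodes are assumptions.

```python
import math

HALF_PI = math.pi / 2.0

def tsw_cdf(x, lam, alpha, beta):
    """TSW cdf, second expression (in terms of exp(-alpha x^beta))."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    return math.cos(HALF_PI * e) - lam * HALF_PI * (1.0 - e) * math.sin(HALF_PI * e)

def tsw_cdf_first_form(x, lam, alpha, beta):
    """TSW cdf, first expression (in terms of the Weibull cdf G)."""
    if x <= 0:
        return 0.0
    G = 1.0 - math.exp(-alpha * x ** beta)
    return math.sin(HALF_PI * G) - lam * HALF_PI * G * math.cos(HALF_PI * G)

def tsw_sf(x, lam, alpha, beta):
    """TSW survival function."""
    if x <= 0:
        return 1.0
    e = math.exp(-alpha * x ** beta)
    return 1.0 - math.cos(HALF_PI * e) + lam * HALF_PI * (1.0 - e) * math.sin(HALF_PI * e)

def tsw_pdf(x, lam, alpha, beta):
    """TSW pdf."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    weibull_pdf = alpha * beta * x ** (beta - 1.0) * e
    bracket = (lam * HALF_PI * (1.0 - e) * math.cos(HALF_PI * e)
               + (1.0 - lam) * math.sin(HALF_PI * e))
    return HALF_PI * weibull_pdf * bracket

def tsw_hrf(x, lam, alpha, beta):
    """TSW hrf, pdf divided by sf."""
    if x <= 0:
        return 0.0
    return tsw_pdf(x, lam, alpha, beta) / tsw_sf(x, lam, alpha, beta)

lam, alpha, beta = 0.3, 3.0, 5.0   # parameter set S1 of the simulation study
upper, n = 1.2, 20000              # assumed integration bound and node count
step = upper / n
integral = step * sum(tsw_pdf((i + 0.5) * step, lam, alpha, beta) for i in range(n))
print(integral, tsw_cdf(upper, lam, alpha, beta))
```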

After some graphical investigations, the curvature properties of the functions of the TSW distribution prove to be desirably versatile. Evidence can be seen in Fig 3, which displays some plots of the corresponding pdf and hrf for various values of the parameters.

In particular, Fig 3(a) indicates that the pdf of the TSW distribution has various skewness shapes (near symmetrical, left, right, bathtub, reversed-J shapes, mainly), along with different kurtosis properties. Fig 3(b) reveals that the corresponding hrf possesses versatile shapes, such as decreasing, increasing, bathtub (classic and upside-down) and reversed-J shapes. These observations imply that the TSW distribution is adequate to fit heterogeneous data sets. In our study, this aspect will be developed in Section 5, where the TSW distribution is used to fit two real life data sets. Also, it will be compared with other extended or modified Weibull models, and the results will be quite favorable to the TSW model.

3 Notable mathematical properties

Here, we explore some mathematical properties of interest satisfied by the TS-G family.

3.1 Stochastic ordering results

Stochastic ordering results are crucial to understand a certain hierarchy existing between the distributions, with consequence on their comparison from the modelling point of view. In the framework of the TS-G family, the following result presents some relations involving the cdf of the TS-G family (beyond the following immediate stochastic ordering property: F(x;λ, ζ)≤F(x;0, ζ)).

Proposition 2 The following inequalities hold:

If λ₂ ≥ λ₁ ≥ 0, we have F(x;λ₂, ζ)≤F(x;λ₁, ζ).
For λ ∈ [0, 2/π], we have F(x;λ, ζ)≥F_*(x;ζ), where
$\begin{matrix} F_{*} (x; ζ) = G (x; ζ) {1 - cos [\frac{π}{2} G (x; ζ)]} \end{matrix}$
is a valid cdf.

Proof. Based on (4), since λ₂ ≥ λ₁ and the involved functions are positive, we have

\begin{matrix} sin [\frac{π}{2} G (x; ζ)] - λ_{2} \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)] \leq sin [\frac{π}{2} G (x; ζ)] - λ_{1} \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)], \end{matrix}

implying the desired inequality.

For the second point, the following inequality holds: for y ∈ [0, π/2], we have sin(y)≥y(2/π) (see [16]). Hence, based on (4), since λ ∈ [0, 2/π], we have

\begin{matrix} F (x; λ, ζ) & \geq G (x; ζ) - λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)] \\ \geq G (x; ζ) - G (x; ζ) cos [\frac{π}{2} G (x; ζ)] = F_{*} (x; ζ) . \end{matrix}

Then, one can remark that F_*(x;ζ) is the cdf of the rv Z = max(X, Y), where X is a rv having the (baseline) cdf G(x;ζ) and Y is a rv having the cdf of the Cos-G family (see [5]), with X and Y independent.

The following result is about a likelihood stochastic ordering of the TS-G family. We refer the reader to [21] for the details on the concept of likelihood stochastic order.

Proposition 3 Let X₁ be a rv having the cdf F(x;λ₁, ζ) and X₂ be a rv having the cdf F(x;λ₂, ζ). Then, if λ₂ ≥ λ₁, we have X₁ ≤ X₂ in the likelihood stochastic ordering sense.

Proof. Following [21], we have X₁ ≤ X₂ in the likelihood stochastic ordering sense if and only if the following ratio function is decreasing with respect to x:

\begin{matrix} r (x; λ_{1}, λ_{2}, ζ) = \frac{f (x; λ_{1}, ζ)}{f (x; λ_{2}, ζ)}, x \in R, \end{matrix}

where f(x;λ₁, ζ) and f(x;λ₂, ζ) are the corresponding pdfs of F(x;λ₁, ζ) and F(x;λ₂, ζ), respectively. That is, by using (5), we have

\begin{matrix} r (x; λ_{1}, λ_{2}, ζ) = \frac{λ_{1} \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ_{1}) cos [\frac{π}{2} G (x; ζ)]}{λ_{2} \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ_{2}) cos [\frac{π}{2} G (x; ζ)]}, x \in R . \end{matrix}

Upon an almost everywhere differentiation with respect to x, after some developments, we get

\begin{matrix} \frac{d}{d x} r (x; λ_{1}, λ_{2}, ζ) = (λ_{1} - λ_{2}) \frac{π g (x; ζ) [π G (x; ζ) + sin [π G (x; ζ)]]}{4 {λ_{2} \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ_{2}) cos [\frac{π}{2} G (x; ζ)]}^{2}}, \end{matrix}

which is negative if and only if λ₂ ≥ λ₁, implying the desired result.
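The monotonicity of the ratio can also be checked numerically. The sketch below works on the scale u = G(x), which amounts to taking the uniform baseline G(x) = x on (0, 1) with g(x) = 1; this baseline, the values of λ₁ and λ₂, and the function names are assumptions made for illustration. It compares the closed-form derivative of the proof with a finite difference.

```python
import math

HALF_PI = math.pi / 2.0

def bracket(u, lam):
    """Lambda-dependent factor of the pdf (5), written in terms of u = G(x)."""
    y = HALF_PI * u
    return lam * y * math.sin(y) + (1.0 - lam) * math.cos(y)

def ratio(u, lam1, lam2):
    """Ratio r of the two pdfs (the common factor (pi/2) g cancels)."""
    return bracket(u, lam1) / bracket(u, lam2)

def ratio_derivative(u, lam1, lam2):
    """Closed-form derivative from the proof, with G(x) = u and g(x) = 1."""
    numerator = math.pi * (math.pi * u + math.sin(math.pi * u))
    return (lam1 - lam2) * numerator / (4.0 * bracket(u, lam2) ** 2)

lam1, lam2 = 0.2, 0.7   # illustrative values with lam2 >= lam1
grid = [i / 100.0 for i in range(1, 100)]
values = [ratio(u, lam1, lam2) for u in grid]
is_decreasing = all(a > b for a, b in zip(values, values[1:]))

h = 1e-6
finite_difference = (ratio(0.5 + h, lam1, lam2) - ratio(0.5 - h, lam1, lam2)) / (2.0 * h)
print(is_decreasing, finite_difference, ratio_derivative(0.5, lam1, lam2))
```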

3.2 Equivalence properties

Here, some equivalence properties of crucial functions of the TS-G family are discussed, which can be helpful to find their limits and also, understand the tails properties of the distribution. As G(x;ζ)→0, we establish that

\begin{matrix} F (x; λ, ζ) \sim \frac{π}{2} (1 - λ) G (x; ζ), f (x; λ, ζ) \sim \frac{π}{2} (1 - λ) g (x; ζ), h (x; λ, ζ) \sim \frac{π}{2} (1 - λ) g (x; ζ) . \end{matrix}

Also, as G(x;ζ)→1, we have

\begin{matrix} F (x; λ, ζ) \sim 1 - \frac{π^{2}}{4} λ [1 - G (x; ζ)], f (x; λ, ζ) \sim \frac{π^{2}}{4} λ g (x; ζ), h (x; λ, ζ) \sim \frac{g (x; ζ)}{1 - G (x; ζ)} . \end{matrix}

In each case, we see how the new parameter λ modulates the limits; it has a strong effect in this regard, except for the hrf when G(x;ζ)→1.

In the case of the TSW distribution as described in Subsection 2.3, the following equivalences hold. As x → 0, we have

\begin{matrix} F (x; λ, α, β) \sim \frac{π}{2} (1 - λ) α x^{β}, f (x; λ, α, β) \sim \frac{π}{2} (1 - λ) α β x^{β - 1} \end{matrix}

and

\begin{matrix} h (x; λ, α, β) \sim \frac{π}{2} (1 - λ) α β x^{β - 1} . \end{matrix}

Therefore, we obtain lim_{x → 0} f(x;λ, α, β) = lim_{x → 0} h(x;λ, α, β) = ℓ with ℓ = + ∞ if β < 1, ℓ = (π/2)(1 − λ)α if β = 1 and ℓ = 0 if β > 1.

Also, as x → + ∞, we have

\begin{matrix} F (x; λ, α, β) \sim 1 - \frac{π^{2}}{4} λ e^{- α x^{β}}, f (x; λ, α, β) \sim \frac{π^{2}}{4} λ α β x^{β - 1} e^{- α x^{β}} \end{matrix}

and

\begin{matrix} h (x; λ, α, β) \sim α β x^{β - 1} . \end{matrix}

Hence, we have lim_{x → + ∞} f(x;λ, α, β) = 0 in all the situations, and lim_{x → + ∞} h(x;λ, α, β) = ℓ with ℓ = 0 if β < 1, ℓ = α if β = 1 and ℓ = + ∞ if β > 1.
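These equivalences can be illustrated numerically by checking that the ratio of each function to its claimed equivalent is close to 1. The Python sketch below does so for the TSW cdf near 0 and the TSW hrf for a large x; the parameter values (set S₁ of Section 4) and the evaluation points are assumptions, and the sf is computed from its explicit expression to limit cancellation in the upper tail.

```python
import math

HALF_PI = math.pi / 2.0

def tsw_cdf(x, lam, alpha, beta):
    """TSW cdf for x > 0."""
    e = math.exp(-alpha * x ** beta)
    return math.cos(HALF_PI * e) - lam * HALF_PI * (1.0 - e) * math.sin(HALF_PI * e)

def tsw_sf(x, lam, alpha, beta):
    """TSW sf for x > 0, from its explicit expression."""
    e = math.exp(-alpha * x ** beta)
    return 1.0 - math.cos(HALF_PI * e) + lam * HALF_PI * (1.0 - e) * math.sin(HALF_PI * e)

def tsw_pdf(x, lam, alpha, beta):
    """TSW pdf for x > 0."""
    e = math.exp(-alpha * x ** beta)
    bracket = (lam * HALF_PI * (1.0 - e) * math.cos(HALF_PI * e)
               + (1.0 - lam) * math.sin(HALF_PI * e))
    return HALF_PI * alpha * beta * x ** (beta - 1.0) * e * bracket

lam, alpha, beta = 0.3, 3.0, 5.0   # parameter set S1 of the simulation study

# Lower tail: F(x) compared with (pi/2)(1 - lam) alpha x^beta.
x_small = 0.05
cdf_ratio = tsw_cdf(x_small, lam, alpha, beta) / (HALF_PI * (1.0 - lam) * alpha * x_small ** beta)

# Upper tail: h(x) compared with the Weibull hrf alpha beta x^(beta - 1).
x_large = 1.3
hrf = tsw_pdf(x_large, lam, alpha, beta) / tsw_sf(x_large, lam, alpha, beta)
hrf_ratio = hrf / (alpha * beta * x_large ** (beta - 1.0))

print(cdf_ratio, hrf_ratio)
```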

3.3 Critical points

Some analytical facts about the critical points of functions of the TS-G family are now presented. First of all, the study of the critical point(s) of f(x;λ, ζ), which include its mode(s), informs us on the possible singularities of the related model. A critical point of f(x;λ, ζ), say x_*, is a solution of the following non-linear equation: df(x;λ, ζ)/dx = 0, which is equivalent to solving the following more tractable non-linear equation: d{log[f(x;λ, ζ)]}/dx = 0, i.e.,

\begin{matrix} \frac{d g (x; ζ) / d x}{g (x; ζ)} + \frac{π}{2} g (x; ζ) \frac{(2 λ - 1) sin [\frac{π}{2} G (x; ζ)] + \frac{π}{2} λ G (x; ζ) cos [\frac{π}{2} G (x; ζ)]}{λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]} = 0 . \end{matrix}

Then, the nature of x_* depends on the values of $η = d^{2} {log [f (x; λ, ζ)]} / d x^{2} ∣_{x = x_{*}}$ . More specifically, x_* is designated as a local maximum point if η < 0, an inflection point if η = 0, and a local minimum point if η > 0.

The same methodology can be applied to study the critical points of h(x;λ, ζ), which can be useful to identify specific hazard rate shapes (monotonic, bathtub, S…) for a modelling aim. Let us just mention that a critical point of h(x;λ, ζ) is a solution of the following non-linear equation: d{log[h(x;λ, ζ)]}/dx = 0, i.e.,

\begin{matrix} \frac{d g (x; ζ) / d x}{g (x; ζ)} + \frac{π}{2} g (x; ζ) \frac{(2 λ - 1) sin [\frac{π}{2} G (x; ζ)] + \frac{π}{2} λ G (x; ζ) cos [\frac{π}{2} G (x; ζ)]}{λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]} \\ + \frac{\frac{π}{2} g (x; ζ) {λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]}}{1 - sin [\frac{π}{2} G (x; ζ)] + λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)]} = 0 . \end{matrix}

Clearly, in most cases, the critical points of f(x;λ, ζ) and h(x;λ, ζ) have no closed forms. They can however be determined numerically by using mathematical software, such as Mathematica, Python, R or Matlab.

Concerning the TSW distribution, numerical investigations, supported by Fig 3 as well, show that it is unimodal, with a corresponding hrf that can have one critical point.
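As an illustration of such a numerical determination, the following Python sketch locates the mode of the TSW pdf by a golden-section search on its logarithm. The search method, the bracketing interval and the tolerance are assumptions made here (the search presumes a single maximum on the bracket), and the parameters are those of the set S₁ of Section 4.

```python
import math

HALF_PI = math.pi / 2.0
GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0

def tsw_log_pdf(x, lam, alpha, beta):
    """Logarithm of the TSW pdf for x > 0."""
    e = math.exp(-alpha * x ** beta)
    bracket = (lam * HALF_PI * (1.0 - e) * math.cos(HALF_PI * e)
               + (1.0 - lam) * math.sin(HALF_PI * e))
    return (math.log(HALF_PI * alpha * beta) + (beta - 1.0) * math.log(x)
            - alpha * x ** beta + math.log(bracket))

def golden_section_max(func, lo, hi, tol=1e-10):
    """Maximize a function assumed unimodal on [lo, hi]."""
    a, b = lo, hi
    c = b - GOLDEN * (b - a)
    d = a + GOLDEN * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + GOLDEN * (b - a)
            fd = func(d)
        else:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - GOLDEN * (b - a)
            fc = func(c)
    return 0.5 * (a + b)

lam, alpha, beta = 0.3, 3.0, 5.0   # parameter set S1 of the simulation study

def objective(x):
    return tsw_log_pdf(x, lam, alpha, beta)

mode = golden_section_max(objective, 0.05, 1.5)   # assumed bracket
print(mode)
```

The same routine applied to the logarithm of the hrf, or to its negative, gives the critical points of the hazard rate when the bracket isolates a single extremum.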

3.4 A series expansion

The following result establishes a new representation of the pdf of the TS-G family involving exponentiated baseline pdfs. Such results are common for the pdfs of modern general families of continuous distributions (see, e.g., [4, 11, 22]).

Proposition 4 For any x such that G(x;ζ)<1, the following series expansion holds:

\begin{matrix} f (x; λ, ζ) = \sum_{k = 0}^{+ \infty} a_{k} υ_{2 k + 1} (x; ζ), \end{matrix}

where a_k = (π/2)^{2k+1}(−1)^k[1 − λ(2k + 1)]/(2k + 1)! and υ_γ(x;ζ) = γg(x;ζ)G(x;ζ)^{γ−1}, with γ = 2k + 1.

Proof. Owing to the series expansions of the sine and cosine functions, after some developments, we get

\begin{matrix} F (x; λ, ζ) & = sin [\frac{π}{2} G (x; ζ)] - λ \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)], \\ = \sum_{k = 0}^{+ \infty} \frac{{(- 1)}^{k}}{(2 k + 1)!} {[\frac{π}{2} G (x; ζ)]}^{2 k + 1} - λ \frac{π}{2} G (x; ζ) \sum_{k = 0}^{+ \infty} \frac{{(- 1)}^{k}}{(2 k)!} {[\frac{π}{2} G (x; ζ)]}^{2 k} \\ = \sum_{k = 0}^{+ \infty} a_{k} {[G (x; ζ)]}^{2 k + 1} . \end{matrix}

We end the proof of Proposition 4 by differentiating the above function with respect to x.

Proposition 4 is of interest because the properties of most of the exponentiated standard distributions are well known, and thus can be used to determine those of the TS-G family. Also, from the practical point of view, it allows us to express some integral terms as (infinite) sums, which sometimes yields a smaller error than computing the integral directly. In this regard, we refer to the discussion in [22].

In the setting of the TSW distribution, we have

\begin{matrix} f (x; λ, α, β) = \sum_{k = 0}^{+ \infty} a_{k} υ_{2 k + 1} (x; α, β), \end{matrix}

where υ_{2k+1}(x;α, β) denotes the pdf of the exponentiated Weibull distribution, defined with power parameter 2k + 1 (see [23]), i.e.,

\begin{matrix} υ_{2 k + 1} (x; α, β) = (2 k + 1) α β x^{β - 1} e^{- α x^{β}} {(1 - e^{- α x^{β}})}^{2 k}, x > 0, \end{matrix}

and υ_{2k+1}(x;α, β) = 0 if x ≤ 0. Further details about the exponentiated Weibull distribution can also be found in [24, 25].
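The series representation can be compared with the closed-form TSW pdf by truncating the sum. In the Python sketch below, the truncation index K = 15, the evaluation points and the function names are assumptions; the coefficients a_k and the exponentiated Weibull pdfs follow Proposition 4 and the expression above.

```python
import math

HALF_PI = math.pi / 2.0

def coefficient(k, lam):
    """Coefficient a_k of Proposition 4."""
    return (HALF_PI ** (2 * k + 1) * (-1) ** k * (1.0 - lam * (2 * k + 1))
            / math.factorial(2 * k + 1))

def exp_weibull_pdf(x, power, alpha, beta):
    """Exponentiated Weibull pdf with integer power parameter `power`."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    return power * alpha * beta * x ** (beta - 1.0) * e * (1.0 - e) ** (power - 1)

def tsw_pdf_series(x, lam, alpha, beta, K=15):
    """Series of Proposition 4 truncated at index K (assumed value)."""
    return sum(coefficient(k, lam) * exp_weibull_pdf(x, 2 * k + 1, alpha, beta)
               for k in range(K + 1))

def tsw_pdf_closed(x, lam, alpha, beta):
    """Closed-form TSW pdf of Subsection 2.3."""
    if x <= 0:
        return 0.0
    e = math.exp(-alpha * x ** beta)
    bracket = (lam * HALF_PI * (1.0 - e) * math.cos(HALF_PI * e)
               + (1.0 - lam) * math.sin(HALF_PI * e))
    return HALF_PI * alpha * beta * x ** (beta - 1.0) * e * bracket

lam, alpha, beta = 0.3, 3.0, 5.0   # parameter set S1 of the simulation study
for x in (0.3, 0.8, 1.1):
    print(x, tsw_pdf_series(x, lam, alpha, beta), tsw_pdf_closed(x, lam, alpha, beta))
```

Because of the factorial in a_k, the terms decay very quickly, so a small K is typically sufficient in double precision.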

3.5 Generalities on the moments

Let X be a rv having the cdf F(x;λ, ζ) given by (4) (and the pdf f(x;λ, ζ) given by (5)) and ϕ(x) be a function. Then, assuming that it makes mathematical sense, the expectation of ϕ(X) is obtained as

\begin{matrix} Θ_{ϕ} (X) & = E [ϕ (X)] = \int_{- \infty}^{+ \infty} ϕ (x) f (x; λ, ζ) d x \\ = \int_{- \infty}^{+ \infty} ϕ (x) \frac{π}{2} g (x; ζ) {λ \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ) cos [\frac{π}{2} G (x; ζ)]} d x \\ = I_{ϕ}^{(1)} + I_{ϕ}^{(2)}, \end{matrix}

where, by denoting Q_G(u;ζ) the inverse function of G(x;ζ),

\begin{matrix} I_{ϕ}^{(1)} & = λ \frac{π^{2}}{4} \int_{- \infty}^{+ \infty} ϕ (x) g (x; ζ) G (x; ζ) sin [\frac{π}{2} G (x; ζ)] d x \\ = λ \frac{π^{2}}{4} \int_{0}^{1} ϕ [Q_{G} (u; ζ)] u sin (\frac{π}{2} u) d u \end{matrix}

and

\begin{matrix} I_{ϕ}^{(2)} & = (1 - λ) \frac{π}{2} \int_{- \infty}^{+ \infty} ϕ (x) g (x; ζ) cos [\frac{π}{2} G (x; ζ)] d x \\ = (1 - λ) \frac{π}{2} \int_{0}^{1} ϕ [Q_{G} (u; ζ)] cos (\frac{π}{2} u) d u . \end{matrix}

Depending on the complexity of the function ϕ[Q_G(u;ζ)], these two integrals may be determined analytically. In all the situations, for given baseline cdf and λ, Θ_ϕ(X) can be calculated by numerical techniques, implemented in any mathematical software.

Also, for an alternative analytical treatment, Proposition 4 implies that

\begin{matrix} Θ_{ϕ} (X) = \sum_{k = 0}^{+ \infty} a_{k} \int_{- \infty}^{+ \infty} ϕ (x) υ_{2 k + 1} (x; ζ) d x . \end{matrix}

(8)

For practical purposes, the sum can be truncated to a large enough integer K, providing a suitable approximation of Θ_ϕ(X). Some derivations of Θ_ϕ(X) are presented in Table 1, which follow from several specific choices of ϕ(x). As an example of application, the m^th raw moments of a rv X following the TSW distribution can be derived from (8) and the m^th raw moments of the exponentiated Weibull distribution with power parameter 2k + 1 as established in [26].

Table 1. Specific measures and functions derived from Θ_ϕ(X) according to the choice of ϕ(x).

Θ_ϕ(X)	ϕ(x)
mean (μ_*)	x
variance	(x − μ_*)²
m^th raw moment	x^m
m^th central moment	(x − μ_*)^m
m^th inverse moment	x^{−m}
m^th logarithmic moment	[log(x)]^m
m^th descending factorial moment	x(x − 1)(x − 2)…(x − m + 1)
m^th incomplete moment with respect to t	x^m if x ≤ t, and 0 elsewhere
(m, q)^th probability weighted moment	x^m F(x;λ, ζ)^q
moment generating function with respect to t	e^{tx}
characteristic function with respect to t	e^{itx}
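The integral representation Θ_ϕ(X) = I_ϕ^{(1)} + I_ϕ^{(2)} lends itself to a simple numerical evaluation on (0, 1). The following Python sketch does this for the TSW distribution, using the Weibull quantile function Q_G(u) = [−log(1 − u)/α]^{1/β} and a midpoint rule; the quadrature rule, the number of nodes and the parameter values (set S₁ of Section 4) are assumptions.

```python
import math

HALF_PI = math.pi / 2.0

def weibull_quantile(u, alpha, beta):
    """Inverse Q_G(u) of the Weibull cdf (7)."""
    return (-math.log1p(-u) / alpha) ** (1.0 / beta)

def expectation(phi, lam, alpha, beta, n=100000):
    """Theta_phi(X) = I1 + I2, each approximated by the midpoint rule on (0, 1)."""
    step = 1.0 / n
    i1 = 0.0
    i2 = 0.0
    for j in range(n):
        u = (j + 0.5) * step
        value = phi(weibull_quantile(u, alpha, beta))
        i1 += value * u * math.sin(HALF_PI * u)
        i2 += value * math.cos(HALF_PI * u)
    i1 *= lam * (math.pi ** 2 / 4.0) * step
    i2 *= (1.0 - lam) * HALF_PI * step
    return i1 + i2

lam, alpha, beta = 0.3, 3.0, 5.0   # parameter set S1 of the simulation study

total_mass = expectation(lambda x: 1.0, lam, alpha, beta)   # should be close to 1
mean = expectation(lambda x: x, lam, alpha, beta)
second_moment = expectation(lambda x: x * x, lam, alpha, beta)
variance = second_moment - mean ** 2
print(total_mass, mean, variance)
```

Other rows of Table 1 are obtained by changing the function passed as `phi`, for instance `lambda x: x ** m` for the m^th raw moment.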


3.6 Reliability parameter

The general definition of the reliability parameter can be formulated as follows. Let X₁ and X₂ be two continuous rvs that can be compared based on a scenario that makes sense in a random system. Then, the corresponding reliability parameter can be defined as

\begin{matrix} R = P (X_{2} < X_{1}) = \int \int_{{y < x}} f (x, y; ξ) d x d y, \end{matrix}

where f(x, y;ξ) denotes the joint pdf of (X₁, X₂), with ξ as parameter(s) vector. Details and applications of R in a concrete setting can be found in [27, 28], and the references therein.

The following result concerns the expression of R for the TS-G family in a specific setting.

Proposition 5 Let X₁ and X₂ be two independent rvs having the cdfs F(x;λ₁, ζ) and F(x;λ₂, ζ), respectively. Then, we have

\begin{matrix} R = \frac{1}{2} + \frac{1}{16} (λ_{1} - λ_{2}) (π^{2} - 4) . \end{matrix}

Proof. Owing to the independence of X₁ and X₂, and (4) and (5), and after some integral calculus, we arrive at

\begin{matrix} R & = P (X_{2} < X_{1}) = \int_{- \infty}^{+ \infty} F (x; λ_{2}, ζ) f (x; λ_{1}, ζ) d x \\ = \int_{- \infty}^{+ \infty} {sin [\frac{π}{2} G (x; ζ)] - λ_{2} \frac{π}{2} G (x; ζ) cos [\frac{π}{2} G (x; ζ)]} \times \\ \frac{π}{2} g (x; ζ) {λ_{1} \frac{π}{2} G (x; ζ) sin [\frac{π}{2} G (x; ζ)] + (1 - λ_{1}) cos [\frac{π}{2} G (x; ζ)]} d x \\ = \frac{π}{2} \int_{0}^{1} {sin (\frac{π}{2} u) - λ_{2} \frac{π}{2} u cos (\frac{π}{2} u)} {λ_{1} \frac{π}{2} u sin (\frac{π}{2} u) + (1 - λ_{1}) cos (\frac{π}{2} u)} d u \\ = \frac{1}{2} + \frac{1}{16} (λ_{1} - λ_{2}) (π^{2} - 4) . \end{matrix}

This ends the proof of Proposition 5.

In Proposition 5, when X₁ and X₂ are identically distributed, i.e., λ₁ = λ₂, we get R = 1/2. Also, Proposition 5 is useful to have a simple estimate of R based on estimates of λ₁ and λ₂. Indeed, if ${\hat{λ}}_{1}$ and ${\hat{λ}}_{2}$ are estimates of λ₁ and λ₂, respectively, then the plug-in approach suggests the following estimate for R:

\begin{matrix} \hat{R} = \frac{1}{2} + \frac{1}{16} ({\hat{λ}}_{1} - {\hat{λ}}_{2}) (π^{2} - 4) . \end{matrix}

However, more research into the application of this formula to real-world data is needed.
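The closed form of Proposition 5 can be checked against a direct numerical evaluation of the integral over u = G(x) appearing in its proof. The Python sketch below uses a midpoint rule; the quadrature, the number of nodes and the values of λ₁ and λ₂ are assumptions made for illustration.

```python
import math

HALF_PI = math.pi / 2.0

def reliability_closed_form(lam1, lam2):
    """R of Proposition 5."""
    return 0.5 + (lam1 - lam2) * (math.pi ** 2 - 4.0) / 16.0

def reliability_numeric(lam1, lam2, n=20000):
    """Midpoint-rule approximation of the integral over u in the proof."""
    step = 1.0 / n
    total = 0.0
    for j in range(n):
        y = HALF_PI * (j + 0.5) * step
        cdf2 = math.sin(y) - lam2 * y * math.cos(y)                 # cdf with lam2
        dens1 = lam1 * y * math.sin(y) + (1.0 - lam1) * math.cos(y) # pdf factor with lam1
        total += cdf2 * dens1
    return HALF_PI * total * step

lam1, lam2 = 0.3, 0.1   # illustrative values
print(reliability_closed_form(lam1, lam2), reliability_numeric(lam1, lam2))
```

Since the integral does not involve the baseline, the same value is obtained whatever the choice of G(x;ζ).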

4 Maximum likelihood estimation

Here, an inferential study of the TS-G family is proposed, estimating the parameters of the TS-G model by the maximum likelihood method.

4.1 The basics

The maximum likelihood method is commonly employed in parametric estimation because of its overall simplicity and the theoretical guarantees ensuring strong convergence properties of the obtained estimates. A comprehensive account can be found in [29]. We may also refer to [30–32] for modern applications of this method. In the context of the TS-G family, the theoretical basics of the maximum likelihood method are described below. Let x₁, …, x_n be n observed values of a rv having the cdf given by (4). Then, the log-likelihood function for the parameters λ and ζ, supposed to be unknown, is defined as

\begin{matrix} ℓ (λ, ζ) & = \sum_{i = 1}^{n} log [f (x_{i}; λ, ζ)] = n log (\frac{π}{2}) + \sum_{i = 1}^{n} log [g (x_{i}; ζ)] \\ + \sum_{i = 1}^{n} log {λ \frac{π}{2} G (x_{i}; ζ) sin [\frac{π}{2} G (x_{i}; ζ)] + (1 - λ) cos [\frac{π}{2} G (x_{i}; ζ)]} . \end{matrix}

Then, the maximum likelihood method suggests the estimates given by $\hat{λ}$ and $\hat{ζ}$ , which is possibly a vector of estimates, making $ℓ (\hat{λ}, \hat{ζ})$ maximal, among all the possible values for λ and ζ. They are called maximum likelihood estimates (MLEs). From the ideal mathematical point of view, they are the solutions of the following system of equations: ∂ℓ(λ, ζ)/∂λ = 0 and ∂ℓ(λ, ζ)/∂ζ = 0, with

\begin{matrix} \frac{\partial ℓ (λ, ζ)}{\partial λ} = \sum_{i = 1}^{n} \frac{\frac{π}{2} G (x_{i}; ζ) sin [\frac{π}{2} G (x_{i}; ζ)] - cos [\frac{π}{2} G (x_{i}; ζ)]}{λ \frac{π}{2} G (x_{i}; ζ) sin [\frac{π}{2} G (x_{i}; ζ)] + (1 - λ) cos [\frac{π}{2} G (x_{i}; ζ)]} \end{matrix}

and

\begin{matrix} \frac{\partial ℓ (λ, ζ)}{\partial ζ} & = \sum_{i = 1}^{n} \frac{\partial g (x_{i}; ζ)}{\partial ζ} \frac{1}{g (x_{i}; ζ)} + \\ \frac{π}{2} \sum_{i = 1}^{n} \frac{\partial G (x_{i}; ζ)}{\partial ζ} \frac{(2 λ - 1) sin [\frac{π}{2} G (x_{i}; ζ)] + \frac{π}{2} λ G (x_{i}; ζ) cos [\frac{π}{2} G (x_{i}; ζ)]}{λ \frac{π}{2} G (x_{i}; ζ) sin [\frac{π}{2} G (x_{i}; ζ)] + (1 - λ) cos [\frac{π}{2} G (x_{i}; ζ)]} . \end{matrix}

In most cases, closed-form expressions for $\hat{λ}$ and $\hat{ζ}$ are not available. However, for a given baseline cdf and data, they can be approximated numerically by iterative techniques. Common routines are the optim function of the R software or PROC NLMIXED of the SAS (Statistical Analysis System) software. Also, one can determine the standard errors (SEs) of the MLEs, which follow from the inverse of the observed information matrix. By assuming that ζ contains several parameters, say m, this observed information matrix is defined by $J = {- \partial^{2} ℓ (\hat{λ}, \hat{ζ}) / \partial ψ_{i} \partial ψ_{j}}_{i, j = 1, \dots, m + 1}$ , where ψ₁ = λ, and ψ_{1+r} denotes the r^th component of ζ. From the SEs, one can construct asymptotic confidence intervals of the parameters, among others.
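To make the log-likelihood and the score with respect to λ concrete, the following Python sketch codes both for the Weibull baseline (7) and compares the score with a finite-difference derivative. The data are the first ten observations of the first data set of Section 5; the evaluation point (λ, α, β) is an arbitrary assumption and not an estimate, and no optimization is performed here.

```python
import math

HALF_PI = math.pi / 2.0

def tsw_loglik(data, lam, alpha, beta):
    """Log-likelihood of the TS-G model with the Weibull baseline (7)."""
    total = len(data) * math.log(HALF_PI)
    for x in data:
        e = math.exp(-alpha * x ** beta)
        y = HALF_PI * (1.0 - e)                      # (pi/2) G(x)
        g = alpha * beta * x ** (beta - 1.0) * e     # baseline pdf
        total += math.log(g) + math.log(lam * y * math.sin(y) + (1.0 - lam) * math.cos(y))
    return total

def score_lambda(data, lam, alpha, beta):
    """Partial derivative of the log-likelihood with respect to lambda."""
    total = 0.0
    for x in data:
        y = HALF_PI * (1.0 - math.exp(-alpha * x ** beta))
        total += ((y * math.sin(y) - math.cos(y))
                  / (lam * y * math.sin(y) + (1.0 - lam) * math.cos(y)))
    return total

# First ten observations of the first data set of Section 5.
data = [0.312, 0.314, 0.479, 0.552, 0.700, 0.803, 0.861, 0.865, 0.944, 0.958]
lam, alpha, beta = 0.3, 1.0, 2.0   # arbitrary evaluation point, not estimates

h = 1e-6
finite_difference = (tsw_loglik(data, lam + h, alpha, beta)
                     - tsw_loglik(data, lam - h, alpha, beta)) / (2.0 * h)
print(score_lambda(data, lam, alpha, beta), finite_difference)
```

In practice, the negative of `tsw_loglik` would be handed to a numerical optimizer under the constraints λ ∈ [0, 1], α > 0 and β > 0.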

A statistical aspect of the TS-G family that is not investigated in this study is identifiability. Numerical experiments show no particular problem with this property, but the rigorous theory remains to be developed with precise mathematical tools.

The rest of the study is devoted to the empirical and real life applications of the TS-G model with the consideration of the MLEs of the parameters.

4.2 Simulation

Here, we illustrate the practical aspect of the MLEs in the setting of the TSW model, i.e., based on the TSW distribution presented in Subsection 2.3. More precisely, we propose a graphical simulation approach illustrating the numerical behavior of the MLEs $\hat{λ}$ , $\hat{α}$ and $\hat{β}$ , of λ, α and β, respectively. The R software is used in this regard.

We proceed as follows. We generate N = 3000 samples (x₁, …, x_n) of size n, with n ranging from 10 to 50, from a rv following the TSW distribution under the following two sets of parameters: $S_{1} : (λ = 0.3, α = 3, β = 5)$ and $S_{2} : (λ = 0.1, α = 3.5, β = 4.5)$. We then calculate the empirical mean squared errors (MSEs) of the MLEs defined as, for h = λ, α, β,

\begin{matrix} \widehat{MSE}_{h} = \frac{1}{N} \sum_{i = 1}^{N} {({\hat{h}}_{i} - h)}^{2}, \end{matrix}

where the index i refers to the ith generated sample. The results of this simulation study are presented in Figs 4 and 5 for $S_{1}$ and $S_{2}$, respectively.

Fig 4. Plots of the empirical MSEs of the TSW model parameters for $S_{1} : (λ = 0.3, α = 3, β = 5)$ for (a) λ, (b) α and (c) β.

Fig 5. Plots of the empirical MSEs of the TSW model parameters for $S_{2} : (λ = 0.1, α = 3.5, β = 4.5)$ for (a) λ, (b) α and (c) β.

As a first observation, we see that, in all the situations, the empirical MSEs decrease toward zero when the sample size increases. This illustrates the “numerical convergence” of the MLEs to the true values of the parameters.
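Two ingredients of such a simulation study can be sketched in Python: drawing values from the TSW distribution and computing the empirical MSE. The sketch assumes the parametrization F(x) = sin(πG/2) - λ(π/2) G cos(πG/2) with Weibull baseline G(x) = 1 - exp(-α x^β); since this cdf has no closed-form inverse, the quantile is obtained here by bisection, which is a choice of this sketch and not necessarily the sampling method used in the study. All function names are hypothetical.

```python
import math
import random

HALF_PI = 0.5 * math.pi

def tsg_cdf_of_baseline(g, lam):
    """TS-G cdf as a function of the baseline cdf value g; increasing on [0, 1]."""
    u = HALF_PI * g
    return math.sin(u) - lam * u * math.cos(u)

def tsw_quantile(p, lam, alpha, beta, iters=80):
    """Solve F(x) = p by bisection on the baseline value, then invert the Weibull cdf."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tsg_cdf_of_baseline(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    g = min(0.5 * (lo + hi), 1.0 - 1e-15)    # keep log1p(-g) finite
    return (-math.log1p(-g) / alpha) ** (1.0 / beta)

def tsw_sample(n, lam, alpha, beta, rng):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    return [tsw_quantile(rng.random(), lam, alpha, beta) for _ in range(n)]

def empirical_mse(estimates, true_value):
    """(1 / N) * sum over i of (estimate_i - true_value)^2."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

rng = random.Random(2021)                    # arbitrary seed, for reproducibility
xs = tsw_sample(50, 0.3, 3.0, 5.0, rng)      # one sample of size n = 50 under S1
print(min(xs), max(xs))
```

In a full study, each of the N generated samples would be fitted by maximizing the likelihood, and the resulting N estimates of a given parameter would be passed to `empirical_mse` together with its true value.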

5 Applications

Thanks to its desirable flexible properties, the TSW model aims to be applied in concrete scenarios, such as the fit of real life data. We illustrate this by considering the following two well-referenced real life data sets.

“The first data set”. The first data set finds its source in [28]. It contains the tensile strength (with unit in GPa) for single carbon fibers. This data set is given by: {0.312, 0.314, 0.479, 0.552, 0.700, 0.803, 0.861, 0.865, 0.944, 0.958, 0.966, 0.997, 1.006, 1.021, 1.027, 1.055, 1.063, 1.098, 1.140, 1.179, 1.224, 1.240, 1.253, 1.270, 1.272, 1.274, 1.301, 1.301, 1.359, 1.382, 1.382, 1.426, 1.434, 1.435, 1.478, 1.490, 1.511, 1.514, 1.535, 1.554, 1.566, 1.570, 1.586, 1.629, 1.633, 1.642, 1.648, 1.684, 1.697, 1.726, 1.770, 1.773, 1.800, 1.809, 1.818, 1.821, 1.848, 1.880, 1.954, 2.012, 2.067, 2.084, 2.090, 2.096, 2.128, 2.233, 2.433, 2.585, 2.585}.
“The second data set”. The second data set, often called breaking stress of carbon fibers data set, was used by [33]. This data set is given by: {3.70, 2.74, 2.73, 2.50, 3.60, 3.11, 3.27, 2.87, 1.47, 3.11, 3.56, 4.42, 2.41, 3.19, 3.22, 1.69, 3.28, 3.09, 1.87, 3.15, 4.90, 1.57, 2.67, 2.93, 3.22, 3.39, 2.81, 4.20, 3.33, 2.55, 3.31, 3.31, 2.85, 1.25, 4.38, 1.84, 0.39, 3.68, 2.48, 0.85, 1.61, 2.79, 4.70, 2.03, 1.89, 2.88, 2.82, 2.05, 3.65, 3.75, 2.43, 2.95, 2.97, 3.39, 2.96, 2.35, 2.55, 2.59, 2.03, 1.61, 2.12, 3.15, 1.08, 2.56, 1.80, 2.53}.

In addition, seven successful models are considered for comparison, also defined as extended or modified versions of the Weibull model and having two, three or four tuning parameters. Namely, we consider the four-parameter generalized modified Weibull (GMW) model by [34], four-parameter Kumaraswamy Weibull (KW) model by [35], four-parameter beta Weibull (BW) model by [6], four-parameter odd log-logistic modified Weibull (OLLMW) model by [36] (abbreviated OLLW in the tables below), three-parameter transmuted Weibull (TW) model by [37], three-parameter modified Weibull (MW) model by [38] and the former two-parameter sine Weibull (SW) model by [3].

As goodness-of-fit criteria to compare these models, we chose the Cramér-von Mises (CVM), Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) statistics, with the corresponding KS p-values. The Akaike information criterion (AIC) is also calculated. For the use of the AIC in applied frameworks, one may refer to [39–41]. The general rule is as follows: the smaller the values of the CVM, AD and KS statistics and of the AIC, and the larger the KS p-value, the better the fit of the corresponding model to the considered data. The R software is used.
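For orientation, the three distance statistics can be computed from the fitted cdf evaluated at the ordered data. The sketch below uses the textbook definitions of the KS, CVM and AD statistics; the text does not state which variant or R routine was used, so the values it produces need not coincide with the reported ones (some packages report modified versions of CVM and AD). The TSW cdf with Weibull baseline G(x) = 1 - exp(-α x^β) is an assumed parametrization, and all function names are hypothetical.

```python
import math

def ks_statistic(u):
    """Kolmogorov-Smirnov distance for sorted fitted-cdf values u_(1) <= ... <= u_(n)."""
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))

def cvm_statistic(u):
    """Cramer-von Mises W^2 = 1 / (12 n) + sum_i (u_(i) - (2 i - 1) / (2 n))^2."""
    n = len(u)
    return 1.0 / (12 * n) + sum((v - (2 * i + 1) / (2 * n)) ** 2 for i, v in enumerate(u))

def ad_statistic(u):
    """Anderson-Darling A^2 = -n - (1 / n) sum_i (2 i - 1)[log u_(i) + log(1 - u_(n+1-i))]."""
    n = len(u)
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i])) for i in range(n))
    return -n - s / n

def gof(xs, cdf):
    """Return (KS, CVM, AD) for data xs under a fully specified cdf."""
    u = sorted(cdf(x) for x in xs)
    return ks_statistic(u), cvm_statistic(u), ad_statistic(u)

def tsw_cdf(x, lam, alpha, beta):
    """Assumed TSW cdf: sin(pi G / 2) - lam (pi / 2) G cos(pi G / 2)."""
    G = -math.expm1(-alpha * x ** beta)
    u = 0.5 * math.pi * G
    return math.sin(u) - lam * u * math.cos(u)

# A small subset of the first data set, for illustration only.
xs = [0.312, 0.861, 1.140, 1.382, 1.570, 1.800, 2.128, 2.585]
ks, cvm, ad = gof(xs, lambda x: tsw_cdf(x, 0.7896, 0.4807, 2.4826))
print(ks, cvm, ad)
```

The loops use 0-based indices, so the weight 2i - 1 of the formulas appears as `2 * i + 1` in the code.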

Tables 2 and 3 list the values of the CVM, AD, KS with p-value, and the MLEs and their corresponding SEs of the model parameters for the first and second data sets, respectively.

Table 2. CVM, AD, KS with p-value, MLEs and SEs for the first data set.

Model (parameters)	CVM	AD	KS	p-value	MLEs (SEs), in parameter order
TSW (λ, α, β)	0.0152	0.1322	0.0385	1.0000	0.7896 (0.1687)	0.4807 (0.1702)	2.4826 (0.4224)	-
GMW (a, α, γ, λ)	0.0184	0.1649	0.0421	0.9960	4.5031 (9.1201)	0.4927 (0.7698)	0.3401 (1.0690)	0.8561 (0.2833)
KW (α, β, γ, θ)	0.0226	0.1984	0.0475	0.9977	0.7268 (0.0052)	0.1621 (0.0186)	1.0308 (0.0218)	3.5369 (0.0086)
BW (α, β, γ, θ)	0.0256	0.2217	0.0480	0.9973	0.3585 (1.9772)	3.7827 (1.2906)	0.7813 (0.4103)	5.7953 (18.9342)
OLLW (α, β, γ, θ)	0.0228	0.1664	0.04675	0.9982	0.0729 (0.1025)	0.5845 (0.1487)	0.0146 (0.0384)	22.3637 (29.5884)
TW (α, β, λ)	0.0428	0.3266	0.3145	0.0000	2.7732 (0.4919)	1.4508 (0.1420)	-0.5636 (0.4269)	-
MW (α, β, θ)	0.0195	0.1733	0.0431	0.9995	0.0180 (0.0609)	0.1892 (0.0780)	3.3740 (0.5227)	-
SW (α, β)	0.0236	0.2076	0.0442	0.9902	0.1291 (0.0275)	3.0852 (0.2951)	-	-

Table 3. CVM, AD, KS with p-value, MLEs and SEs for the second data set.

Model (parameters)	CVM	AD	KS	p-value	MLEs (SEs), in parameter order
TSW (λ, α, β)	0.0554	0.3538	0.0694	0.9079	0.7440 (0.1915)	0.0679 (0.0504)	2.7694 (0.5096)	-
GMW (a, α, γ, λ)	0.0653	0.3939	0.0760	0.8394	5.4737 (7.9525)	0.4343 (0.6457)	0.1493 (0.5395)	0.5167 (0.1722)
KW (α, β, γ, θ)	0.0703	0.4501	0.0825	0.7591	0.6536 (0.0230)	0.1738 (0.0416)	0.0664 (0.0142)	3.8782 (0.0171)
BW (α, β, γ, θ)	0.0846	0.5041	0.0812	0.7761	0.1864 (0.4201)	4.0715 (1.2708)	0.7592 (0.3673)	6.9449 (2.7517)
OLLW (α, β, γ, θ)	0.1032	0.54558	0.0780	0.8160	0.0729 (0.0301)	0.5845 (0.0828)	0.0146 (0.0256)	22.3637 (23.3981)
TW (α, β, λ)	0.1260	0.6700	0.3669	0.0000	2.9256 (0.4828)	2.7531 (0.2294)	-0.5906 (0.3744)	-
MW (α, β, θ)	0.0640	0.40134	0.0795	0.7974	0.0165 (0.0206)	0.0144 (0.0079)	3.7146 (0.4069)	-
SW (α, β)	0.0863	0.4937	0.09003	0.6584	0.0165 (0.0064)	3.1733 (0.3016)	-	-

Tables 2 and 3 indicate that the smallest CVM, AD and KS and the largest KS p-value are for the TSW model; it is the best model according to the considered criteria. In particular, it outperforms the former SW model, which corresponds to λ = 0. Indeed, we see that the parameter λ of the TSW model is estimated “far from zero”, i.e., the corresponding MLEs are $\hat{λ} = 0.7896$ and $\hat{λ} = 0.7440$, for the first and second data sets, respectively. This points out the importance of the transformed sine technique to obtain suitable fits of these data, in comparison to the former SW model.
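The numerical maximization behind the TSW row of Table 2 can be sketched with a small hand-written Nelder-Mead simplex search, the same kind of derivative-free method that R's optim uses by default. The code below is a simplified illustration and not the authors' script: the Weibull parametrization G(x) = 1 - exp(-α x^β), the starting point, the simplex step and the stopping rule are all assumptions of this sketch, and the function names are hypothetical. If the assumed parametrization matches the one used in the study, the result should land near the reported estimates.

```python
import math

# First data set (tensile strength of single carbon fibers, in GPa), as listed above.
DATA1 = [
    0.312, 0.314, 0.479, 0.552, 0.700, 0.803, 0.861, 0.865, 0.944, 0.958,
    0.966, 0.997, 1.006, 1.021, 1.027, 1.055, 1.063, 1.098, 1.140, 1.179,
    1.224, 1.240, 1.253, 1.270, 1.272, 1.274, 1.301, 1.301, 1.359, 1.382,
    1.382, 1.426, 1.434, 1.435, 1.478, 1.490, 1.511, 1.514, 1.535, 1.554,
    1.566, 1.570, 1.586, 1.629, 1.633, 1.642, 1.648, 1.684, 1.697, 1.726,
    1.770, 1.773, 1.800, 1.809, 1.818, 1.821, 1.848, 1.880, 1.954, 2.012,
    2.067, 2.084, 2.090, 2.096, 2.128, 2.233, 2.433, 2.585, 2.585,
]

def tsw_negloglik(theta, xs):
    """Minus log-likelihood of the assumed TSW model; inf outside the parameter space."""
    lam, alpha, beta = theta
    if not (0.0 <= lam <= 1.0 and alpha > 0.0 and beta > 0.0):
        return math.inf
    total = 0.0
    try:
        for x in xs:
            z = alpha * x ** beta
            G = -math.expm1(-z)
            g = alpha * beta * x ** (beta - 1.0) * math.exp(-z)
            u = 0.5 * math.pi * G
            total += math.log(0.5 * math.pi * g * (lam * u * math.sin(u) + (1.0 - lam) * math.cos(u)))
    except (ValueError, OverflowError):
        return math.inf                      # numerically degenerate point
    return -total

def nelder_mead(f, x0, step=0.1, max_iter=2000, tol=1e-10):
    """Minimal Nelder-Mead minimizer (reflection, expansion, contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[v + (step if j == k else 0.0) for j, v in enumerate(x0)] for k in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                   # try to expand further
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:                # accept the reflected point
            simplex[-1], values[-1] = xr, fr
        else:                                # inside or outside contraction
            xc = along(-0.5) if fr >= values[-1] else along(0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                            # shrink toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (q - b) for b, q in zip(best, p)] for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda k: values[k])
    return simplex[k], values[k]

start = [0.5, 0.5, 2.0]                      # arbitrary admissible starting point
theta_hat, nll_hat = nelder_mead(lambda t: tsw_negloglik(t, DATA1), start)
print("lambda, alpha, beta:", theta_hat)
print("minus log-likelihood:", nll_hat)
```

Returning infinity outside the parameter space is a simple way to keep a derivative-free search within the constraints 0 ≤ λ ≤ 1, α > 0 and β > 0. As with any local search, the result may depend on the starting point, so trying several starting values is advisable.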

Tables 4 and 5 present the minus estimated log-likelihood, i.e., $- \hat{ℓ} = - ℓ (\hat{λ}, \hat{α}, \hat{β})$ for the TSW model, and the AIC values of the models for the first and second data sets, respectively.

Table 4. The $- \hat{ℓ}$ and AIC for the first data set.

Distribution	$- \hat{ℓ}$	AIC
TSW	48.4389	102.8779
GMW	48.7195	106.0641
KW	48.7684	105.5368
BW	48.8954	105.7908
OLLW	49.3799	106.7600
TW	48.7059	103.4118
MW	48.9583	103.9166
SW	49.5012	103.0024

Table 5. The $- \hat{ℓ}$ and AIC for the second data set.

Distribution	$- \hat{ℓ}$	AIC
TSW	85.1358	176.2717
GMW	85.3731	178.7462
KW	85.60939	179.2188
BW	85.9184	179.8368
OLLW	85.5593	179.1187
TW	85.5453	177.0907
MW	85.52304	177.0461
SW	86.6910	177.3820

According to Tables 4 and 5, since it has the lowest AIC for the two data sets, the TSW model can be considered as the best one.
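The AIC values can be related to the minus log-likelihoods through the standard definition AIC = 2k - 2ℓ̂, where k is the number of estimated parameters. The short Python sketch below applies it to a few rows of Table 4, with k read from Table 2; the function name `aic` is hypothetical.

```python
def aic(minus_loglik, n_params):
    """Akaike information criterion: 2 k - 2 l_hat = 2 k + 2 * (minus log-likelihood)."""
    return 2.0 * n_params + 2.0 * minus_loglik

# (minus log-likelihood from Table 4, number of parameters from Table 2)
models = {"TSW": (48.4389, 3), "TW": (48.7059, 3), "MW": (48.9583, 3), "SW": (49.5012, 2)}
for name, (nll, k) in models.items():
    print(name, round(aic(nll, k), 4))
best = min(models, key=lambda name: aic(*models[name]))
print("lowest AIC:", best)
```

Up to rounding in the last digit, this reproduces the tabulated AIC values of these rows. Applying the same formula to the GMW row of Table 4 (four parameters) gives 105.4390 rather than the tabulated 106.0641, so one of the two GMW entries may contain a typo.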

We now provide a graphical visualization of the fitting results of the TSW model in Figs 6 and 7. For both data sets, the histogram is plotted along with the curve of the corresponding estimated pdf, i.e., $f (x; \hat{λ}, \hat{α}, \hat{β})$; the curve of the estimated cdf, i.e., $F (x; \hat{λ}, \hat{α}, \hat{β})$, is plotted over the empirical cdf of the data; the curve of the estimated sf, i.e., $S (x; \hat{λ}, \hat{α}, \hat{β})$, is plotted over the empirical sf of the data; and the Probability-Probability (P-P) plot is provided.

In all the graphics, we see that the red curves fit the corresponding black curves well, attesting to the efficiency of the TSW model in this data fitting exercise.

Fig 6. Several fits of the TSW model for the first data set: (a) estimated pdf over the histogram, (b) estimated cdf over the empirical cdf, (c) estimated sf over the empirical sf and (d) P-P plot.

Fig 7. Several fits of the TSW model for the second data set: (a) estimated pdf over the histogram, (b) estimated cdf over the empirical cdf, (c) estimated sf over the empirical sf and (d) P-P plot.

6 Concluding remarks

Based on a new one-parameter transformation function, we provide an original extension of the Sin-G family of continuous distributions, introducing the transformed Sin-G (TS-G) family. We discuss how an additional parameter λ can enhance the flexibility of the cdf of the former Sin-G family, with nice consequences for modelling purposes. An emphasis is put on the transformed Sin Weibull (TSW) distribution, showing a high potential in the analysis and modelling of lifetime data. Some general mathematical features of the TS-G family are established. Then, a statistical approach is adopted; the maximum likelihood estimates (MLEs) for the TS-G model parameters are discussed. The TSW model is highlighted, demonstrating that it is more capable of fitting data than seven rival models, some of which have more parameters. The TS-G family can find a broader use in all areas dealing with modern data as a result of its qualities. For example, it can be used to construct models in multivariate analysis, regression, classification, and other statistical fields of importance. In addition, the transformation T_λ(x) or $T_{λ}^{*} (x)$ can be used to efficiently extend other existing families of distributions. These viewpoints necessitate additional developments, which we plan to incorporate in future works.

Acknowledgments

The authors thank the two reviewers for their detailed and constructive comments.

Data Availability

All relevant data are within the manuscript.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Brito CR, Rêgo LC, Oliveira WR and Gomes-Silva F. Method for generating distributions and classes of probability distributions: The univariate case, Hacettepe Journal of Mathematics and Statistics. 2019, 48, 897–930.
2. Kumar D, Singh U and Singh SK. A new distribution using sine function: its application to bladder cancer patients data, Journal of Statistics Applications and Probability. 2015, 4, 417–427.
3. Souza L. New trigonometric classes of probabilistic distributions, Thesis, Universidade Federal Rural de Pernambuco, 2015.
4. Souza L, Junior WRO, de Brito CCR, Chesneau C, Ferreira TAE and Soares L. On the Sin-G class of distributions: theory, model and application, Journal of Mathematical Modeling. 2019, 7, 3, 357–379.
5. Souza L, Junior WRO, de Brito CCR, Chesneau C, Ferreira TAE and Soares L. General properties for the Cos-G class of distributions with applications, Eurasian Bulletin of Mathematics. 2019, 2, 2, 63–79.
6. Lee C, Famoye F and Olumolade O. Beta-Weibull distribution: Some properties and applications to censored data, Journal of modern applied statistical methods. 2007, 6, 1, 173–186. doi:10.22237/jmasm/1177992960
7. Nelson W. Applied life data analysis, John Wiley and Sons, New York, 1982.
8. Bjerkedal T. Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli, American Journal of Hygiene. 1960, 72, 130–148.
9. Souza L, Gallindo L and Serafim-de-Souza L. SinIW: The SinIW distribution. R package version 0.2. 2016, Available at https://CRAN.R-project.org/package=SinIW.
10. Chesneau C, Bakouch HS and Hussain T. A new class of probability distributions via cosine and sine functions with applications, Communications in Statistics - Simulation and Computation. 2019, 48, 8, 2287–2300. doi:10.1080/03610918.2018.1440303
11. Mahmood Z, Chesneau C and Tahir MH. A new sine-G family of distributions: properties and applications, Bulletin of Computational Applied Mathematics. 2019, 7, 1, 53–81.
12. Jamal F and Chesneau C. A new family of polyno-expo-trigonometric distributions with applications, Infinite Dimensional Analysis, Quantum Probability and Related Topics. 2019, 22, 04, 1950027, 1–15. doi:10.1142/S0219025719500279
13. Al-Babtain AA, Elbatal I, Chesneau C and Elgarhy M. Sine Topp-Leone-G family of distributions: Theory and applications, Open Physics. 2020, 18, 1, 574–593. doi:10.1515/phys-2020-0180
14. Jamal F and Chesneau C. The sine Kumaraswamy-G family of distributions, 2020, preprint: https://hal.archives-ouvertes.fr/hal-02120197.
15. Jamal F, Chesneau C and Aidi K. The sine extended odd Fréchet-G family of distribution with applications to complete and censored data, Mathematica Slovaca. (2021), (to appear).
16. Mitrinović DS. Analytic Inequalities, Springer-Verlag, Berlin, 1970.
17. Klein JP and Moeschberger ML. Survival analysis: Techniques for censored and truncated data, 2nd ed.; Springer: Berlin, Germany, 2003.
18. Wang Y, Chan Y, Gui Z, Webb D and Li L. Application of Weibull distribution analysis to the dielectric failure of multilayer ceramic capacitors, Materials Science and Engineering: B. 1997, 47, 3, 197–203. doi:10.1016/S0921-5107(97)00041-X
19. Corzo O and Bracho N. Application of Weibull distribution model to describe the vacuum pulse osmotic dehydration of sardine sheets, LWT-Food Science and Technology. 2008, 41, 6, 1108–1115. doi:10.1016/j.lwt.2007.06.018
20. Aslam M, Arif OH and Jun CH. An attribute control chart for a Weibull distribution under accelerated hybrid censoring, PloS one. 2017, 12, 3, e0173406. doi:10.1371/journal.pone.0173406
21. Shaked M and Shanthikumar JG. Stochastic orders, Wiley, New York, 2007.
22. Cordeiro GM and de Castro M. A new family of generalized distributions, Journal of Statistical Computation and Simulation. 2011, 81, 883–893. doi:10.1080/00949650903530745
23. Mudholkar GS and Srivastava DK. Exponentiated Weibull family for analyzing bathtub failure-rate data, IEEE Transactions on Reliability. 1993, 42, 2, 299–302. doi:10.1109/24.229504
24. Mudholkar GS, Srivastava DK and Freimer M. The exponentiated Weibull family; a reanalysis of the bus motor failure data, Technometrics. 1995, 37, 4, 436–445. doi:10.1080/00401706.1995.10484376
25. Mudholkar GS and Hutson AD. The exponentiated Weibull family: some properties and a flood data application, Communications in Statistics - Theory and Methods. 1996, 25, 3059–3083. doi:10.1080/03610929608831886
26. Choudhury A. A simple derivation of moments of the exponentiated Weibull distribution, Metrika. 2005, 62, 1, 17–22. doi:10.1007/s001840400351
27. Kotz S, Lumelskii Y and Penskey M. The stress-strength model and its generalizations and applications, World Scientific, Singapore, 2003.
28. Raqab M, Madi T and Debasis K. Estimation of P (Y < X) for the 3-parameter generalized exponential distribution, Communications in Statistics - Theory and Methods. 2008, 37, 2854–2864. doi:10.1080/03610920802162664
29. Casella G and Berger RL. Statistical inference, Duxbury Advanced Series Thomson Learning: Pacific Grove, 2002.
30. Dong B, Ma X, Chen F and Chen S. Investigating the differences of single- and multi-vehicle accident probability using mixed logit model. Journal of Advanced Transportation. 2018, Article ID 2702360, 9 pages.
31. Chen F and Chen S. Injury severities of truck drivers in single- and multi-vehicle accidents on rural highway. Accident Analysis and Prevention. 2011, 43, 1677–1688. doi:10.1016/j.aap.2011.03.026
32. Chen F, Chen S and Ma X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research. 2018, 65, 153–159. doi:10.1016/j.jsr.2018.02.010
33. Cordeiro GM and Lemonte A. The β-Birnbaum-Saunders distribution: An improved distribution for fatigue life modeling, Computational Statistics and Data Analysis. 2011, 55, 1445–1461. doi:10.1016/j.csda.2010.10.007
34. Carrasco JMF, Ortega EMM and Cordeiro GM. A generalized modified Weibull distribution for lifetime modeling, Computational Statistics and Data Analysis. 2008, 53, 450–462. doi:10.1016/j.csda.2008.08.023
35. Cordeiro GM, Ortega EMM and Nadarajah S. The Kumaraswamy Weibull distribution with application to failure data, Journal of the Franklin Institute. 2010, 347, 8, 1399–1429. doi:10.1016/j.jfranklin.2010.06.010
36. Saboor A, Alizadeh M, Khan MN, Gosh I and Cordeiro GM. Odd Log-Logistic modified Weibull distribution, Mediterranean Journal of Mathematics. 2017, 14, 96. doi:10.1007/s00009-017-0880-3
37. Aryal GR and Tsokos CP. Transmuted Weibull distribution: A generalization of the Weibull probability distribution, European Journal of Pure and Applied Mathematics. 2011, 4, 2, 89–102.
38. Sarhan AM and Zain-din M. Modified Weibull distribution, Applied Sciences. 2009, 11, 123–136.
39. Huang H, Song B, Xu P, Zeng Q, Lee J and Abdel-Aty M. Macro and micro models for zonal crash prediction with application in hot zones identification. Journal of Transport Geography. 2016, 54, 248–256. doi:10.1016/j.jtrangeo.2016.06.012
40. Wen H, Zhang X, Zeng Q and Sze NN. Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence. Accident Analysis and Prevention. 2019, 132, 105249. doi:10.1016/j.aap.2019.07.025
41. Zeng Q, Wang X, Wen H and Yuan Q. An empirical investigation of the factors contributing to local-vehicle and non-local-vehicle crashes on freeway. Journal of Transportation Safety & Security. 2020, 1–15. doi:10.1080/19439962.2020.1779422
