Abstract
We introduce a new survival distribution, of Pareto type, that arises from a cure-mixture frailty model. We describe its properties and demonstrate connections with familiar distributions including the Pareto and exponential. We derive its characteristic function and moments.
Keywords: Characteristic function, Moments, Pareto distribution, Survival distribution
Mathematics Subject Classification: 62E99
1. Introduction
Recently, Leemis and McQueston (2008) presented a graphic describing the relationships among a large number of known univariate distributions. A distribution that we encountered in our applied work, which we denote the Complementary Mixture Pareto II (CMPII), was not included in the summary, nor does it appear in a popular compendium of univariate distributions (Johnson et al., 1995). Therefore, we believe that CMPII is new to the statistical literature and in this note we present our derivation of it, describe connections with known distributions, and derive its characteristic function and moments.
We came upon CMPII while developing a model to accommodate the unique characteristics of data from smoking cessation clinical trials (Li et al., 2010). As the durations of quit attempts in such trials are characterized by heterogeneity and the possibility of permanent success, an appropriate statistical model should incorporate both latent frailties and the probability of cure. Our construction of such a model led us directly to CMPII, as we demonstrate below.
Let $T$ denote an event time and $S(t)$ its survival function. We assume that with probability $p$ the event time is infinite, so that $S(t)$ takes the form
$$S(t) = p + (1 - p)\,S_0(t),$$
where $S_0(t)$ is a proper survival function. We assume moreover that, conditional on a latent frailty $u$, the cure probability and survival function given not cured take the forms
| (1.1) |
where is a linear predictor for the cure probability and is a linear predictor for event rate given not cured. That is, we assume that: (i) conditionally on the frailty the probability of cure follows a generalized linear model with a complementary log-log (cloglog) link; (ii) survival given not cured is exponential with a proportional hazard regression; and (iii) the natural log of is an offset in both linear predictors. Moreover, we assume that is a random draw from a gamma distribution with shape and scale , ensuring that and . If the cure probability were fixed at 0, this would be the standard model of a gamma-distributed exponential hazard (Duchateau and Janssen, 2008).
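A minimal simulation sketch of this data-generating mechanism may help fix ideas. All names are hypothetical: `a` is the exponentiated linear predictor for the event rate given not cured, `b` is the corresponding multiplier in the cure model, and `v` is the frailty variance. The conditional cure probability is taken to be exp(-b u), which is one reading of the cloglog specification with log u as an offset; this is an assumption rather than a transcription of (1.1). Under it a larger `b` lowers the cure probability, so `b` may run inversely to the article's own cure parameter.

```python
import math
import random

def simulate_event_time(a, b, v, rng):
    """Draw one event time; math.inf marks a cured subject.

    Assumed (hypothetical) conditional forms given frailty u:
      cure probability       exp(-b * u)
      non-cured survival     exp(-a * u * t)
    """
    u = rng.gammavariate(1.0 / v, v)    # shape 1/v, scale v: mean 1, variance v
    if rng.random() < math.exp(-b * u):
        return math.inf                 # cured: the event never occurs
    return rng.expovariate(a * u)       # exponential with rate a * u

rng = random.Random(2010)               # fixed seed for reproducibility
draws = [simulate_event_time(a=1.0, b=0.5, v=0.5, rng=rng) for _ in range(100_000)]
cure_fraction = sum(math.isinf(t) for t in draws) / len(draws)
print(round(cure_fraction, 3))
```

Under these assumptions the empirical cure fraction should be close to the gamma Laplace transform evaluated at `b`, (1 + v b)^(-1/v), which is 0.64 for the values used here.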
This specification is unique among cure-mixture frailty models in using a cloglog rather than a logistic link for the cure probability and in allowing a common frailty to affect both the cure probability and the survival given not cured. Integrating out the frailty, we obtain the marginal survival function
where
and
is a proper survival function in that it is monotone decreasing with and . Therefore, the marginal cure probability is , and the marginal survival function given not cured is . Note that the marginal cure probability differs from the conditional cure probability , and the marginal non-cured survival differs from , the integral of the conditional non-cured survival over the frailty distribution. The distribution is the subject of this article.
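The integration over the frailty can be sketched as follows. The notation is assumed for illustration only: frailty $u$ with mean 1 and variance $\nu$, conditional cure probability $e^{-bu}$, and conditional non-cured survival $e^{-aut}$. The symbols and the exact form of the conditional cure probability are assumptions and need not match the parameterization of (1.1).

```latex
% Laplace transform of a gamma frailty with mean 1 and variance \nu
\mathcal{L}(s) = E\left[e^{-su}\right] = (1+\nu s)^{-1/\nu}.

% Marginal survival: average the conditional cure-mixture survival over u
S(t) = E\left[e^{-bu} + \left(1-e^{-bu}\right)e^{-aut}\right]
     = \mathcal{L}(b) + \mathcal{L}(at) - \mathcal{L}(b+at).

% Marginal cure probability and marginal survival given not cured
p = \lim_{t\to\infty} S(t) = (1+\nu b)^{-1/\nu}, \qquad
S_0(t) = \frac{S(t)-p}{1-p}
       = \frac{(1+\nu a t)^{-1/\nu} - (1+\nu b+\nu a t)^{-1/\nu}}
              {1-(1+\nu b)^{-1/\nu}}.
```

In this sketch $S_0(0)=1$ and $S_0(t)\to 0$ as $t\to\infty$, and $S_0(t)$ is not equal to $E\left[e^{-aut}\right] = (1+\nu a t)^{-1/\nu}$, which illustrates the distinction between marginal and integrated conditional quantities drawn in the text.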
As we indicated, our research suggests that has not previously been identified as a distribution. We therefore devote the remainder of this note to an investigation of its properties and its relationship to other distributions. Note that our purpose here is simply to elucidate basic theoretical properties; we refer readers to Li et al. (2010) for an application.
Our argument invokes the following densities.
Exponential :
Pareto I :
Pareto II :
2. General Characteristics
For simplicity, we denote ; then,
| (2.1) |
Here, is a parameter associated with the conditional hazard (conditional on the frailty) given not cured; a larger implies a higher conditional hazard of experiencing the event. Parameter is associated with the conditional cure probability, where a larger value implies a larger conditional cure probability. The parameter is the variance of the frailty, and is a normalizing constant. The corresponding marginal density is
| (2.2) |
and the hazard function is
| (2.3) |
We rewrite (2.2) as
| (2.4) |
where
and
Because the density is a weighted difference of Pareto II densities, we denote our model as the Complementary Mixture Pareto II (CMPII) distribution.
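The weighted-difference structure can be illustrated with a short sketch. It assumes a hypothetical parameterization in which the non-cured survival is c[(1 + v a t)^(-1/v) - (1 + v b + v a t)^(-1/v)], with `a` a rate multiplier, `b` a cure-model multiplier, `v` the frailty variance, and c the normalizing constant; these names are not taken from the article.

```python
def pareto2_pdf(t, scale, shape):
    """Pareto II (Lomax) density whose survival is (1 + t/scale) ** -shape."""
    return (shape / scale) * (1.0 + t / scale) ** (-shape - 1.0)

def cmpii_pdf(t, a, b, v):
    """Assumed CMPII density written as w1 * f1(t) - w2 * f2(t)."""
    k = 1.0 / v                               # shape shared by both components
    c = 1.0 / (1.0 - (1.0 + v * b) ** -k)     # normalizing constant
    w1, w2 = c, c * (1.0 + v * b) ** -k       # weights; w1 - w2 equals 1
    f1 = pareto2_pdf(t, scale=1.0 / (v * a), shape=k)
    f2 = pareto2_pdf(t, scale=(1.0 + v * b) / (v * a), shape=k)
    return w1 * f1 - w2 * f2

def cmpii_sf(t, a, b, v):
    """Assumed CMPII survival function (marginal survival given not cured)."""
    k = 1.0 / v
    c = 1.0 / (1.0 - (1.0 + v * b) ** -k)
    return c * ((1.0 + v * a * t) ** -k - (1.0 + v * b + v * a * t) ** -k)

print(cmpii_pdf(1.0, a=1.0, b=0.5, v=0.5))
```

Because the two weights differ by exactly 1, the combination integrates to 1 even though the second component enters with a negative sign.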
Note that the first derivative of the density is
so that the density function is always decreasing with time .
Figure 1 plots the log density and log hazard for different values of and when . Figure 2 depicts how the density and hazard vary with when and are held fixed. Note that the hazard increases with and appears to converge to a distribution as . In fact, by L’Hôpital’s rule,
Figure 1.

Log density and hazard function when .
Figure 2.

Log density and hazard function when and .
As the hazard function (2.3) is analytically complicated, we explored permissible shapes by computing its derivative numerically over a wide range of values of , and . In all cases the derivative was negative for all , suggesting that the hazard is non-increasing with time under a wide range of parameter values. This is not surprising, as the special case of the Pareto II can be shown to have a non-increasing hazard (Johnson et al., 1995).
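A numerical exploration of this kind might look like the sketch below. It assumes a hypothetical parameterization (rate multiplier `a`, cure-model multiplier `b`, frailty variance `v`) and replaces the derivative with first differences on a grid. The grid and parameter values are illustrative and are not those used by the authors.

```python
def cmpii_hazard(t, a, b, v):
    """Hazard f(t) / S(t) under the assumed parameterization."""
    k = 1.0 / v
    lo = 1.0 + v * a * t           # base of the first Pareto II term
    hi = lo + v * b                # base of the second Pareto II term
    return a * (lo ** (-k - 1.0) - hi ** (-k - 1.0)) / (lo ** -k - hi ** -k)

def hazard_is_nonincreasing(a, b, v, t_max=50.0, steps=2000):
    """Check h(t[i+1]) <= h(t[i]) on an even grid over [0, t_max]."""
    grid = [t_max * i / steps for i in range(steps + 1)]
    h = [cmpii_hazard(t, a, b, v) for t in grid]
    return all(later <= earlier + 1e-12 for earlier, later in zip(h, h[1:]))

# Sweep a small, arbitrary grid of parameter values.
checks = {
    (a, b, v): hazard_is_nonincreasing(a, b, v)
    for a in (0.1, 1.0, 5.0)
    for b in (0.1, 1.0, 10.0)
    for v in (0.2, 1.0, 3.0)
}
print(all(checks.values()))
```

A grid check like this can only suggest monotonicity over the values tried; it is not a proof.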
3. The Special Case of
Letting in (2.1), the marginal non-cured survival function becomes
the Pareto II that arises from the standard frailty model with exponential baseline hazard (Duchateau and Janssen, 2008). This is because as , the conditional cure probability in (1.1) goes to 0, so that our model reduces to a standard frailty model with constant hazard conditional on the frailty. The density is then
with first derivative
so that the density function is always decreasing with time .
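The reduction to the Pareto II can be checked numerically with a sketch like the following. It assumes a hypothetical parameterization in which the conditional cure probability is exp(-b u), so that the cure probability vanishes as `b` grows; the article's own cure parameter may be scaled differently, and the names `a`, `b`, and `v` are illustrative.

```python
def cmpii_sf(t, a, b, v):
    """Assumed CMPII survival function (marginal survival given not cured)."""
    k = 1.0 / v
    c = 1.0 / (1.0 - (1.0 + v * b) ** -k)
    return c * ((1.0 + v * a * t) ** -k - (1.0 + v * b + v * a * t) ** -k)

def pareto2_sf(t, a, v):
    """Pareto II survival from a gamma frailty with variance v and hazard a."""
    return (1.0 + v * a * t) ** (-1.0 / v)

a, v = 1.0, 0.5                      # illustrative values
times = (0.1, 1.0, 5.0, 20.0)
gaps = {}
for b in (1.0, 10.0, 100.0, 1e4):
    # Largest absolute gap between the two survival curves at the chosen times.
    gaps[b] = max(abs(cmpii_sf(t, a, b, v) - pareto2_sf(t, a, v)) for t in times)
    print(f"b={b:<8g} max gap = {gaps[b]:.2e}")
```

The gap shrinks as the assumed cure probability goes to 0, which is the behavior the special case describes.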
Figures 3 and 4 depict the log density and hazard at various values of and for . We see that when is fixed and decreases, the density declines at a slower rate, and the hazard is smaller at any given time. This is because a smaller value of implies a smaller individual-level conditional hazard, which manifests itself as a smaller population-level marginal hazard. In particular, when , the marginal hazard also goes to 0. When , the marginal non-cured survival becomes the Pareto I :
Figure 3.

Log density function when .
Figure 4.

Hazard function when .
Similarly, the density and hazard function vary with for fixed . When is small, the log density approaches a linear function of time and the hazard approaches a constant. This is because means that the variance of the frailty approaches 0, which implies that there is no between-subject heterogeneity, and therefore the population survival function equals the individual survival function, which is exponential:
We summarize the relationships of CMPII with other known distributions in Fig. 5.
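The small-variance limit described above, in which the Pareto II approaches the exponential, can be checked with a brief sketch. The names `a` (conditional hazard) and `v` (frailty variance) are illustrative, and the Pareto II is written in the gamma-frailty form (1 + v a t)^(-1/v), which is an assumed parameterization.

```python
import math

def pareto2_sf(t, a, v):
    """Pareto II survival (1 + v*a*t) ** (-1/v) from a gamma frailty with variance v."""
    return (1.0 + v * a * t) ** (-1.0 / v)

a = 1.0                               # illustrative conditional hazard
times = (0.5, 1.0, 2.0, 4.0)
gaps = {}
for v in (1.0, 0.1, 0.01, 0.001):
    # Largest absolute gap from the exponential survival exp(-a * t).
    gaps[v] = max(abs(pareto2_sf(t, a, v) - math.exp(-a * t)) for t in times)
    print(f"v={v:<6g} max |Pareto II - exponential| = {gaps[v]:.2e}")
```

The gap falls roughly in proportion to `v`, consistent with the survival function becoming exponential as between-subject heterogeneity disappears.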
Figure 5.

Relationships of CMPII to other distributions.
4. Characteristic Function and Moments
We first derive the expectation and variance of CMPII.
Theorem 4.1. The expectation of a variate X following the CMPII distribution is
the second moment is
and therefore the variance is
See Appendix A for a proof. We note that the second factor in the first moment formula is always smaller than 1, giving
We next derive the characteristic function .
Theorem 4.2. The characteristic function of CMPII is
Here, is the upper incomplete Gamma function with complex arguments:
(Abramowitz and Stegun, 1965; DiDonato and Morris, 1986; Temme, 1996). See Appendix B for a proof. Repeated differentiation gives
Theorem 4.3. The nth moment of the CMPII distribution is
See Appendix C for a proof. One can readily verify that Theorem 4.3 implies Theorem 4.1.
5. Discussion
We have described CMPII, a three-parameter family that arises as the marginal distribution of survival given not cured in a cure-mixture frailty model. Several known distributions are limiting cases.
The CMPII has the curious property that its density equals a linear combination of two Pareto II densities, which would appear to offer a clear path to its characteristic function and moments. Ironically, we were unable to find formulas for the Pareto II characteristic function, so we derived terms for CMPII directly (see Appendices B and C). The derivations are tedious but may serve as templates for similar derivations of other distributions.
Pareto distributions have been known for several decades (Arnold and Laguna, 1977; Arnold, 1983) and have found application in various areas of science, especially in economics to describe income and in medical statistics to describe survival. Naturally, one can apply CMPII in these areas as well. Our applied experience (Li et al., 2010) suggests its usefulness specifically in smoking cessation research.
Appendix A: Proof of Theorem 4.1
Johnson et al. (1995) showed that if is Pareto II , then
and
Recall that the density of a CMPII variate is
where
is the density of a Pareto II , and is the density of a Pareto II . One can therefore derive and by taking linear combinations of the moments of Pareto II, and by subtraction.
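The linear-combination argument can be sketched in code. The parameterization is assumed for illustration (rate multiplier `a`, cure-model multiplier `b`, frailty variance `v`, none taken from the article), and the Pareto II moments are the standard Lomax formulas for a survival function of the form (1 + t/scale)^(-shape).

```python
def lomax_moment(n, scale, shape):
    """First or second raw moment of a Pareto II with survival (1 + t/scale) ** -shape."""
    if n == 1:
        return scale / (shape - 1.0)                              # requires shape > 1
    if n == 2:
        return 2.0 * scale ** 2 / ((shape - 1.0) * (shape - 2.0))  # requires shape > 2
    raise ValueError("only n = 1 or n = 2 is handled in this sketch")

def cmpii_moment(n, a, b, v):
    """Assumed CMPII raw moment as a weighted difference of Pareto II moments."""
    k = 1.0 / v
    c = 1.0 / (1.0 - (1.0 + v * b) ** -k)     # normalizing constant
    w1, w2 = c, c * (1.0 + v * b) ** -k       # same weights as in the density
    m1 = lomax_moment(n, 1.0 / (v * a), k)
    m2 = lomax_moment(n, (1.0 + v * b) / (v * a), k)
    return w1 * m1 - w2 * m2

a, b, v = 1.0, 0.5, 0.2                       # illustrative values with 1/v > 2
mean = cmpii_moment(1, a, b, v)
second = cmpii_moment(2, a, b, v)
variance = second - mean ** 2                 # variance by subtraction
print(mean, variance)
```

In this sketch the mean stays below 1 / (a (1 - v)), the mean of the first Pareto II component alone, which is the kind of bound noted after Theorem 4.1.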
Appendix B: Proof of Theorem 4.2
Here, we use the fact that
where are complex numbers with ; and is the upper incomplete Gamma function (Gradshteyn and Ryzhik, 1980).
Equation (2.4) implies that
Now
and similarly,
Therefore,
Appendix C: Proof of Theorem 4.3
We first note a series of Lemmas.
Lemma A.1.
Proof. See Gradshteyn and Ryzhik (1980). □
Lemma A.2.
Proof.
□
Lemma A.3. Define . Then,
Proof.
□
Lemma A.4.
Proof. Denoting , we prove the lemma by induction. Supposing it is true for , then for ,
so it is true for as well. □
Lemma A.5.
Proof.
□
Lemma A.6.
Proof. Setting , we again prove the lemma using induction. If it is true for , then for ,
so it is true for as well. □
Based on these Lemmas we can prove Theorem 4.3 easily. Denote , and , then
References
- Abramowitz M, Stegun IA (1965). Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. New York: Dover Publications, Inc.
- Arnold BC (1983). Pareto Distributions. Fairland, MD: International Cooperative Publishing House.
- Arnold BC, Laguna L (1977). On Generalized Pareto Distributions with Applications to Income Data. International Studies in Economics, Monograph #10, Dept. of Economics, Iowa State University, Ames, IA.
- Duchateau L, Janssen P (2008). The Frailty Model. New York: Springer.
- DiDonato AR, Morris AH (1986). Computation of the incomplete gamma function ratios and their inverse. ACM Trans. Mathemat. Software 12:377–393.
- Gradshteyn IS, Ryzhik IM (1980). Tables of Integrals, Series, and Products. New York: Academic Press.
- Johnson NL, Kotz S, Balakrishnan N (1995). Continuous Univariate Distributions. New York: Wiley.
- Leemis LM, McQueston JT (2008). Univariate distribution relationships. The Amer. Statistician 62:45–53.
- Li Y, Wileyto EP, Heitjan DF (2010). Modeling smoking cessation data with alternating states and a cure fraction using frailty models. Statist. Med 29:627–638.
- Temme NM (1996). Uniform asymptotics for the incomplete gamma functions starting from negative values of the parameters. Meth. Applic. Anal 3:335–344.
