Author manuscript; available in PMC: 2014 Feb 4.
Published in final edited form as: Biometrics. 2011 Mar;67(1):8–17. doi: 10.1111/j.1541-0420.2010.01444.x

A Positive Stable Frailty Model for Clustered Failure Time Data with Covariate-Dependent Frailty

Dandan Liu 1,*, John D Kalbfleisch 1, Douglas E Schaubel 1
PMCID: PMC3913567  NIHMSID: NIHMS540112  PMID: 20528861

Summary

In this article, we propose a positive stable shared frailty Cox model for clustered failure time data where the frailty distribution varies with cluster-level covariates. The proposed model accounts for covariate-dependent intracluster correlation and permits both conditional and marginal inferences. We obtain marginal inference directly from a marginal model, then use a stratified Cox-type pseudo-partial likelihood approach to estimate the regression coefficient for the frailty parameter. The proposed estimators are consistent and asymptotically normal and a consistent estimator of the covariance matrix is provided. Simulation studies show that the proposed estimation procedure is appropriate for practical use with a realistic number of clusters. Finally, we present an application of the proposed method to kidney transplantation data from the Scientific Registry of Transplant Recipients.

Keywords: Bridge distribution, Clustered failure times, Covariate-dependent frailty, Cox model, Positive stable frailty, Shared frailty

1. Introduction

Clustered failure time data are frequently observed in biomedical studies. For example, in the kidney transplantation setting, transplant failure times are of interest and can be taken as clustered failure times with transplant facilities as clusters. In family disease studies, time to disease onset is of interest and families are natural clusters. Subjects within a cluster are correlated, with the intracluster dependence possibly due to shared environmental and/or genetic conditions.

Several methods have been proposed for clustered failure time data. In general, these can be categorized into two broad strategies. In marginal models, the cluster structure is usually ignored when estimating the population-averaged covariate effect, but is used to derive valid standard error estimates. Marginal models can be used when the comparison of lifetimes across clusters is of interest. Examples include Wei, Lin, and Weissfeld (1989); Lee, Wei, and Amato (1992); and Spiekerman and Lin (1998). These authors used generalized estimating equations with an independence working assumption and the intracluster correlation structure left unspecified. As a result, some efficiency loss may occur, potentially affecting the significance of estimated covariate effects.

When the comparison of lifetimes within the same cluster is of interest, frailty models may be more appropriate. In this case, the correlation structure is specified by incorporating a random effect (frailty) that is common to subjects within the same cluster. The covariate effect is then interpreted as being conditional on the frailties and is cluster specific. One can also obtain marginal covariate effects by making additional assumptions about the frailty distribution as was done by Glidden and Self (1999) and Pipper and Martinussen (2003) under the Clayton-Oakes model. In frailty models, it is usually assumed that the frailty variables follow the same distribution across clusters, which implies equal intracluster dependence as well as between-cluster heterogeneity. This assumption may be violated in practice.

In studies comparing U.S. kidney transplant centers to the national average, the ratio of observed to expected deaths, known as the standardized mortality ratio (SMR), is used, with the expected deaths obtained from a marginal Cox model. An SMR > 1 indicates a mortality rate above the national average. In the shared frailty model, this statistic is actually a nonparametric Poisson-type estimator (Glidden and Vittinghoff, 2004) for the corresponding frailty, given the observed data in the center. An investigation of the SMRs suggests that there may be greater heterogeneity for smaller facilities, since SMRs for smaller centers are more frequently seen at either the top or at the bottom of the ordered list. Although this is partly due to sampling variance of the SMR estimator, it is also possible that an unequal degree of heterogeneity across centers results from varying cluster characteristics. This suggests a shared frailty model, but with the frailty distribution allowed to depend on cluster size. Other cluster-level covariates may also have an effect on the frailty distribution. For example, urban transplant facilities may exhibit more uniform practices than rural transplant hospitals, corresponding to less heterogeneity (smaller variance) for frailties of urban centers. In these examples of clustered failure time data, the population averaged effect is of primary interest. At the same time, however, the incorporation of cluster-level covariate effects on the frailty distribution is of practical interest and should be considered.

Similar situations exist for other types of clustered data. Prentice (1986) proposed a regression model for clustered binary data, in which the correlation between pairs of binary observations within clusters was assumed to depend on cluster-level covariates. Lin, Raz, and Harlow (1997) proposed a linear mixed model with heterogeneous within-cluster variances, where the within-cluster errors were assumed to follow a normal distribution with cluster-specific covariance matrix. Specifically, the variance of the measurement error was assumed to follow an inverse gamma distribution, where the mean depends on some linear combination of cluster-level covariates through a log link. Heagerty (1999) proposed a marginally specified logistic-normal model for longitudinal binary data in which the marginal mean, rather than the conditional mean, was regressed on covariates. In addition, a conditional model on a Gaussian latent variable is specified, where the random effect additively influences the logit of the conditional mean. Wang and Louis (2004) further extended this method to clustered binary data, allowing the distribution parameters of the random effect to depend on some cluster-level covariates. Their approach used a “bridge” distribution previously identified by Wang and Louis (2003) for the random effect to unify the form of the marginal and the conditional models. As a result, the conditional regression parameters can be expressed as functions of the marginal regression parameters and a parameter in the bridge distribution. Under this model, the regression parameter estimates have a direct marginal interpretation, while the conditional regression parameter estimates can easily be obtained. Moreover, the influence of the cluster-level covariates on the random effect can be estimated.

The positive stable distribution (Hougaard, 1986) serves as a bridge distribution for clustered failure time data under a Cox proportional hazards shared frailty model in the same sense as Wang and Louis (2003) since the resulting marginal regression parameter is a product of the conditional regression parameters and the frailty parameter. This relationship allows both marginal and conditional inference, while accounting for intracluster dependence. The shared positive stable frailty model has attracted renewed attention recently (e.g., Fine, Glidden, and Lee, 2003; Martinussen and Pipper, 2005).

In this article, we propose a covariate-dependent positive stable shared frailty model. The bridge-type frailties are allowed to depend on cluster-level covariates and so to follow different distributions across clusters. Under this unified framework, the marginal regression parameters and the covariate effects on the frailty distribution can be consistently estimated. The major contributions of this paper are the methods proposed for modeling the effects of the cluster-level covariates on the frailty distribution and the corresponding estimation of the marginal regression effects.

The remainder of this article is organized as follows. In Section 2, we introduce the proposed covariate-dependent frailty model and describe the estimation procedures. We obtain the large sample properties of the model parameter estimators in Section 3 and Section 4 presents simulation studies. The proposed method is then applied to kidney transplant data from the Scientific Registry of Transplant Recipients (SRTR) in Section 5. In Section 6, we provide some concluding remarks and discussion. Proofs of the results are provided in the Appendix.

2. Model Specification and Estimation

2.1 The Positive Stable Shared Frailty Cox Proportional Hazards Model with Covariate-Dependent Frailty

In this section, we specify a positive stable shared frailty Cox model, with the frailty distribution depending on cluster-level covariates and the corresponding marginal hazard having a proportional hazards form. Our ultimate purpose is to estimate cluster-level covariate effects on the frailty distribution, as well as the correlation within clusters and heterogeneity between clusters. We first define the Cox-type conditional and marginal hazard functions through the “bridge” property of the positive stable distribution. The relationship between the conditional hazard parameters, marginal hazard parameters, and frailty distribution parameter can be obtained accordingly. Cluster-level covariates are related to the frailty distribution parameter through a link function. Finally, we derive the individual intensity process given the observed history of all the individuals with the parameters of interest. We begin this section by establishing the requisite notation.

Suppose we have measurements from subjects in K clusters and that the cluster sizes nk (k = 1, 2, …, K) are independent and identically distributed bounded random variables. Given nk, let Dik and Cik be the failure and censoring times for the ith individual (i = 1, …, nk) in the kth cluster; let Tik = Dik ∧ Cik be the follow-up time and Δik = I (Dik ≤ Cik) the observed death indicator. Let Wk denote the positive stable distributed frailty with dependence parameter αk for the kth cluster that we use to describe within-cluster dependence possibly due to unobserved covariate information. Let Zik be a p-vector of time-independent covariates measured on individual (i, k). In addition, let Xk be a q-vector of time-independent cluster-level covariates that may influence αk. Let Dk = (D1k, …, Dnkk), with Ck and Zk defined similarly. We assume that (Dk, Ck, Zk, Xk, nk, Wk) are independent and identically distributed for k = 1, …, K. Define the at-risk process Yik (t) = I (Tik ≥ t) and the individual counting process Nik (t) = Δik I (Tik ≤ t). We define the filtrations

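$$
\mathcal{F}_t = \sigma\{N_{ik}(s), Y_{ik}(s), Z_{ik}, X_k, n_k : k = 1, \ldots, K,\ i = 1, \ldots, n_k,\ 0 \le s \le t\}
$$

and

$$
\mathcal{H}_t = \sigma\{N_{ik}(s), Y_{ik}(s), Z_{ik}, X_k, n_k, W_k : k = 1, \ldots, K,\ i = 1, \ldots, n_k,\ 0 \le s \le t\}.
$$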

Similar to Martinussen and Pipper (2005), we term ℱt the observed filtration and ℋt the conditional filtration.

We assume that Wk follows a positive stable distribution with shape parameter αk (0 < αk ≤ 1). The positive stable distribution has been used by Hougaard (1986) for multivariate failure time data; its density function and Laplace transform are given by

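$$
f_{\alpha_k}(w) = -\frac{1}{\pi w} \sum_{i=1}^{\infty} \frac{\Gamma(i\alpha_k + 1)}{i!} \left(-w^{-\alpha_k}\right)^{i} \sin(\alpha_k i \pi),
$$

and

$$
L(s) = \mathcal{E}\{\exp(-sW_k)\} = \exp(-s^{\alpha_k}) \quad (s \ge 0),
$$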

respectively.

Given (Zk, Xk, Wk, nk), the failure times Dik, i = 1, …, nk, are assumed to be independent with hazard function

$$
\lim_{h \to 0^+} P(t \le D_{ik} < t + h \mid D_{ik} \ge t, Z_k, X_k, n_k, W_k)/h = W_k \lambda_{0k}(t)\, e^{\beta_k^T Z_{ik}}, \tag{1}
$$

where λ0k (t)(k = 1, …, K) are unknown cluster-specific baseline hazard functions and βk (k = 1, …, K) are p-vectors of unknown cluster-specific regression parameters, all of which rely on αk through the derived marginal hazard function below.

Since Wk has the positive stable distribution, the marginal hazard function of Dik is given by

$$
\lim_{h \to 0^+} P(t \le D_{ik} < t + h \mid D_{ik} \ge t, Z_k, X_k, n_k)/h = h_0(t)\, e^{\gamma^T Z_{ik}}, \tag{2}
$$

where h0 (t) is an unspecified baseline hazard and γ is a p-vector of unknown marginal regression parameters. In this, we have assumed a constant marginal log hazard ratio γ, which, given (1) and (2), imposes the restriction γ = αkβk, k = 1, …, K. Note also that $\Lambda_{0k}(t) = H_0(t)^{1/\alpha_k}$, where $\Lambda_{0k}(t) = \int_0^t \lambda_{0k}(s)\,ds$ and $H_0(t) = \int_0^t h_0(s)\,ds$.
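The restriction linking the conditional and marginal parameters can be checked from the Laplace transform of the positive stable distribution. The following is a short sketch of that step, not a reproduction of the authors' argument; it takes the expectation over Wk of the conditional survivor function implied by (1):

$$
\begin{aligned}
P(D_{ik} > t \mid Z_k, X_k, n_k)
&= \mathcal{E}\left[\exp\{-W_k \Lambda_{0k}(t)\, e^{\beta_k^T Z_{ik}}\}\right] \\
&= \exp\left[-\{\Lambda_{0k}(t)\, e^{\beta_k^T Z_{ik}}\}^{\alpha_k}\right] \\
&= \exp\{-\Lambda_{0k}(t)^{\alpha_k}\, e^{\alpha_k \beta_k^T Z_{ik}}\}.
\end{aligned}
$$

Comparing with the survivor function $\exp\{-H_0(t)\, e^{\gamma^T Z_{ik}}\}$ implied by (2) gives $H_0(t) = \Lambda_{0k}(t)^{\alpha_k}$ and $\gamma = \alpha_k \beta_k$, so the marginal hazard keeps the proportional hazards form with regression coefficients attenuated by the factor αk.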

We further relate Xk and αk through a link function and write $\alpha_k^{-1} = g(\eta; X_k)$, where η is a (q + 1)-vector of unknown parameters. Here, we assume that g (·) is monotone and twice differentiable with respect to η. Since αk ∈ (0, 1], a natural choice is the logit link for αk, and we set

$$
g(\eta; X_k) = 1 + e^{-\eta^T \tilde{X}_k}, \tag{3}
$$

with $\tilde{X}_k = (1, X_k^T)^T$ and $\eta = (\eta_1, \eta_2^T)^T$, where η1 is a scalar intercept and η2 is a q-vector of regression parameters.
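As a small numerical illustration of the logit link, the sketch below evaluates αk and g(η; Xk) for one cluster-level covariate. The function name `alpha_from_link` is hypothetical, and the parameter values (η1 = 0.5, η2 = 0.5, cluster size in units of 100 subjects) are borrowed from the first simulation setting in Section 4 purely as an example.

```python
import math

def alpha_from_link(eta, x):
    """Logit link: alpha_k = 1 / g, with g = 1 + exp(-(eta_1 + eta_2^T x)).

    eta is the list [eta_1, eta_2...]; x is the list of cluster-level covariates.
    Returns the pair (alpha_k, g).
    """
    lin = eta[0] + sum(e * v for e, v in zip(eta[1:], x))
    g = 1.0 + math.exp(-lin)
    return 1.0 / g, g

# Smallest and largest cluster sizes used in the first simulation study.
for size in (5, 200):
    alpha, g = alpha_from_link([0.5, 0.5], [size / 100.0])
    print(size, round(alpha, 2), round(g, 3))
```

Because g is always larger than 1, the implied αk stays inside (0, 1) for any real η, which is the reason a logit-type link is convenient here.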

In addition, we assume that the Dik and Cik are independent given Zik for i = 1, …, nk. Under this conditional independent censoring assumption, model (1) implies that the individual intensity process with respect to the conditional filtration ℋt is

$$
\lambda_{ik}(t \mid \mathcal{H}_{t-}) = Y_{ik}(t)\, W_k \lambda_{0k}(t)\, e^{\beta_k^T Z_{ik}}. \tag{4}
$$

By applying the innovation theorem (Andersen et al., 1993) to (4) and inserting the link function (3), the individual intensity process with respect to the observed filtration ℱt is

$$
\lambda_{ik}(t \mid \mathcal{F}_{t-}) = Y_{ik}(t)\, f_k(t)\, \lambda_{0k}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}, \tag{5}
$$

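where fk (t) = E (Wk | ℱt) has the explicit form

$$
f_k(t) = \frac{E_{W_k}\left[ W_k^{N_{\cdot k}(t)+1} \exp\left\{ -W_k \sum_{i=1}^{n_k} \int_0^t Y_{ik}(s)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, dH_0^{g(\eta; X_k)}(s) \right\} \right]}{E_{W_k}\left[ W_k^{N_{\cdot k}(t)} \exp\left\{ -W_k \sum_{i=1}^{n_k} \int_0^t Y_{ik}(s)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, dH_0^{g(\eta; X_k)}(s) \right\} \right]}, \tag{6}
$$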

with “.” denoting summation over a subscript.

2.2 Estimation

Model (4) differs from the existing positive stable shared frailty Cox proportional hazards model in that it allows the frailty distribution parameter αk to depend on cluster-level covariates, which induces the cluster-specific conditional regression parameter $\beta_k = \alpha_k^{-1}\gamma$ and the cluster-specific conditional baseline hazard λ0k (t). It can be easily seen that when η2 = 0, αk is a constant and the proposed model reduces to the common positive stable shared frailty model, for which several estimation procedures have been developed. For example, Wang, Klein, and Moeschberger (1995) applied the EM algorithm for parameter estimation. Fine et al. (2003) presented a simple estimation procedure that fitted a marginal model and a stratified model separately and utilized the relationship α = γ/β. Martinussen and Pipper (2005) proposed a likelihood-based estimation procedure based on the individual intensity process with respect to an observed filtration similar to (5), but with αk = α and βk = β. However, we are not able to extend these estimation procedures to the proposed model, since the regression parameter βk in the conditional hazard is cluster specific.

As can be seen in the existing literature, simulations and applications of the positive stable shared frailty model are usually based on small clusters, such as twin or family studies, especially when the estimation of frailties is needed. In order to apply the positive stable frailty model to studies with large clusters, it is useful to avoid the estimation of fk (t) in (6). We notice that model (5) can be written as

$$
\lambda_{ik}(t \mid \mathcal{F}_{t-}) = Y_{ik}(t)\, \tilde{\lambda}_{0k}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}},
$$

where λ̃0k (t) = λ0k (t)fk (t), which is actually a stratified Cox model, except that the covariate effect is cluster specific and depends on a function of cluster-level covariates. The stratified partial likelihood approach (Cox, 1975; Kalbfleisch and Prentice, 2002) can be directly applied here. Due to the loss of information in fk (t) and the multiplicative relationship between g and γ, we cannot estimate the intercept term η1 and the remaining parameters simultaneously. Therefore, our estimation procedure is actually based on two results from models (2) and (5), respectively.

Before proceeding, it is convenient to introduce the following two sets of notation for k = 1, …, K and r = 0, 1, 2,

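$$
\begin{aligned}
S^{(r)}(\gamma, t) &= K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{\gamma^T Z_{ik}} Z_{ik}^{\otimes r}, \\
E(\gamma, t) &= S^{(1)}(\gamma, t) / S^{(0)}(\gamma, t), \\
V(\gamma, t) &= S^{(2)}(\gamma, t) / S^{(0)}(\gamma, t) - \{E(\gamma, t)\}^{\otimes 2},
\end{aligned}
$$

where a⊗0 = 1, a⊗1 = a, and a⊗2 = aaT, and

$$
\begin{aligned}
S_k^{(r)}(\eta; \gamma, t) &= \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}} \{g_1(\eta; X_k)\gamma^T Z_{ik}\}^{\otimes r}, \\
S_k^{(3)}(\eta; \gamma, t) &= \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, g_2(\eta; X_k)\gamma^T Z_{ik}, \\
S_k^{(4)}(\eta; \gamma, t) &= \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, g_1(\eta; X_k) Z_{ik}^T, \\
S_k^{(5)}(\eta; \gamma, t) &= \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, g(\eta; X_k) Z_{ik}^T, \\
S_k^{(6)}(\eta; \gamma, t) &= \sum_{i=1}^{n_k} Y_{ik}(t)\, e^{g(\eta; X_k)\gamma^T Z_{ik}}\, g_1(\eta; X_k)\gamma^T Z_{ik}^{\otimes 2}\, g(\eta; X_k), \\
E_{k1}(\eta; \gamma, t) &= S_k^{(1)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t), \\
E_{k3}(\eta; \gamma, t) &= S_k^{(3)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t), \\
E_{k4}(\eta; \gamma, t) &= S_k^{(4)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t), \\
E_{k5}(\eta; \gamma, t) &= S_k^{(5)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t), \\
V_{k1}(\eta; \gamma, t) &= S_k^{(2)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t) - \{E_{k1}(\eta; \gamma, t)\}^{\otimes 2}, \\
V_{k2}(\eta; \gamma, t) &= S_k^{(6)}(\eta; \gamma, t) / S_k^{(0)}(\eta; \gamma, t) - E_{k1}(\eta; \gamma, t) E_{k5}(\eta; \gamma, t), \\
g_1(\eta; X) &= \partial g(\eta; X) / \partial \eta, \\
g_2(\eta; X) &= \partial g_1(\eta; X) / \partial \eta^T.
\end{aligned}
$$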

We first estimate γ from model (2) by maximizing the pseudo partial log-likelihood

$$
\ell_1(\gamma) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{\gamma^T Z_{ik} - \log S^{(0)}(\gamma, t)\}\, dN_{ik}(t)
$$

under the working independence assumption (Wei et al., 1989). The corresponding estimating equation can be written as

$$
U_1(\gamma) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{Z_{ik} - E(\gamma, t)\}\, dN_{ik}(t).
$$
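To make the first-stage estimating function concrete, the sketch below evaluates U1(γ) for pooled data under working independence. It is an illustrative O(n²) implementation only: the function name `marginal_score` and the record layout `(time, delta, z)` are assumptions made for this example, and no root-finding step is included.

```python
import math

def marginal_score(gamma, records):
    """Evaluate U_1(gamma) for pooled records (time, delta, z).

    Cluster membership is ignored here, as under working independence;
    z is a list of covariates with the same length as gamma.
    """
    p = len(gamma)
    score = [0.0] * p
    for t_i, d_i, z_i in records:
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        s0, s1 = 0.0, [0.0] * p
        for t_j, _, z_j in records:
            if t_j >= t_i:  # subject j is at risk at time t_i
                w = math.exp(sum(c * v for c, v in zip(gamma, z_j)))
                s0 += w
                for l in range(p):
                    s1[l] += w * z_j[l]
        for l in range(p):
            score[l] += z_i[l] - s1[l] / s0  # Z_i minus the weighted mean E(gamma, t_i)
    return score

# Three subjects: events at t = 1 and t = 2, one censored at t = 3.
records = [(1.0, 1, [1.0]), (2.0, 1, [0.0]), (3.0, 0, [1.0])]
print(marginal_score([0.0], records))
```

Setting this score to zero gives the point estimate of γ; the clustering only enters later, through the robust variance estimator.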

Given an estimator γ̂ of γ from model (2), we estimate η from model (5) by maximizing the pseudo-stratified partial log-likelihood

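$$
\ell_2(\eta; \hat{\gamma}) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{g(\eta; X_k)\hat{\gamma}^T Z_{ik} - \log S_k^{(0)}(\eta; \hat{\gamma}, t)\}\, dN_{ik}(t),
$$

with corresponding score function,

$$
U_2(\eta; \hat{\gamma}) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{g_1(\eta; X_k)\hat{\gamma}^T Z_{ik} - E_{k1}(\eta; \hat{\gamma}, t)\}\, dN_{ik}(t).
$$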

Solving U2 (η; γ̂) = 0, we can obtain the estimator η̂ for η.

3. Asymptotic Properties

Denote γ0 and η0 as the true values of the parameters γ and η, respectively. In this section, we emphasize the large sample results for η̂. We list the assumed conditions, state a previously derived result, and then state the theorems for our estimators. Proofs are provided in the Appendix.

The following conditions are assumed throughout this article for all k = 1, …, K and some constant τ > 0:

  (a) (Dk, Ck, Zk, Xk, nk, Wk) are independent and identically distributed;

  (b) P {Yik (τ) = 1} > 0 for i = 1, …, nk ;

  (c) |Zikl | < BZ < ∞ and |Xkj | < BX < ∞ for all l = 1, …, p and j = 1, …, q and some constants BZ and BX ;

  (d) g (·) is twice continuously differentiable with respect to η;

  (e) γ0 and η0 are interior to the parameter space;

  (f) The following matrices are positive definite,

$$
A_1 = \mathcal{E}\left\{ \int_0^{\tau} V(\gamma_0, t)\, S^{(0)}(\gamma_0, t)\, dH_0(t) \right\}, \qquad
A_2 = \mathcal{E}\left\{ \int_0^{\tau} V_{k1}(\eta_0; \gamma_0, t)\, S_k^{(0)}(\eta_0; \gamma_0, t)\, f_k(t)\, d\Lambda_{0k}(t) \right\}.
$$

Large sample results for γ̂ have been provided by Lee et al. (1992), who showed that $K^{1/2}(\hat{\gamma} - \gamma_0)$ is asymptotically mean zero normal with variance $\Sigma_1 = A_1^{-1} B_1 A_1^{-1}$, where A1 and B1 can be consistently estimated by $\hat{A}_1 = K^{-1}\hat{I}$ and $\hat{B}_1 = K^{-1} \sum_{k=1}^{K} \hat{\psi}_k^{\otimes 2}$, with

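$$
\hat{I} = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} V(\hat{\gamma}, t)\, dN_{ik}(t), \qquad
\hat{\psi}_k = \sum_{i=1}^{n_k} \int_0^{\tau} \{Z_{ik} - E(\hat{\gamma}, t)\} \times \{dN_{ik}(t) - Y_{ik}(t)\, e^{\hat{\gamma}^T Z_{ik}}\, d\hat{H}_0(t)\},
$$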

where

$$
\hat{H}_0(t) = K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{t} dN_{ik}(u) / S^{(0)}(\hat{\gamma}, u).
$$

Theorem 1

Under conditions (a)−(f), η̂ is unique and converges almost surely to η0 as K → ∞.

The proof of the consistency of η̂ is similar to that of Prentice and Self (1983) and Lemma 3.1 in Andersen and Gill (1982).

Theorem 2

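Under conditions (a)−(f), the random vector $K^{1/2}(\hat{\eta} - \eta_0)$ converges weakly to a (q + 1)-variate normal vector with mean 0 and covariance matrix

$$
\Sigma_2 = A_2^{-1} \left( A_2 + B_2 \Sigma_1 B_2^T - 2 C B_2^T \right) A_2^{-1},
$$

where A2 is defined in condition (f) and

$$
B_2 = \mathcal{E}\left\{ \int_0^{\tau} V_{k2}(\eta_0; \gamma_0, t)\, S_k^{(0)}(\eta_0; \gamma_0, t)\, f_k(t)\, d\Lambda_{0k}(t) \right\}, \qquad
C = \mathcal{E}\{u_k \psi_k^T\} A_1^{-1},
$$

with

$$
\begin{aligned}
u_k &= \sum_{i=1}^{n_k} \int_0^{\tau} \{g_1(\eta_0; X_k)\gamma_0^T Z_{ik} - E_{k1}(\eta_0; \gamma_0, t)\}\, dN_{ik}(t), \\
\psi_k &= \sum_{i=1}^{n_k} \int_0^{\tau} \{Z_{ik} - e(\gamma_0, t)\} \{dN_{ik}(t) - Y_{ik}(t)\, e^{\gamma_0^T Z_{ik}}\, dH_0(t)\},
\end{aligned}
$$

where

$$
e(\gamma, t) = \frac{\mathcal{E}\{S^{(1)}(\gamma, t)\}}{\mathcal{E}\{S^{(0)}(\gamma, t)\}}.
$$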

Using the proof of Theorems 1 and 2, together with the results from Lee et al. (1992), we can show that Σ2 can be consistently estimated by

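$$
\hat{\Sigma}_2 = \hat{A}_2^{-1} \left( \hat{A}_2 + \hat{B}_2 \hat{\Sigma}_1 \hat{B}_2^T - 2 \hat{C} \hat{B}_2^T \right) \hat{A}_2^{-1}
$$

with

$$
\begin{aligned}
\hat{A}_2 &= -K^{-1}\, \partial U_2(\eta; \gamma) / \partial \eta^T \big|_{\eta = \hat{\eta}, \gamma = \hat{\gamma}}
= K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{V_{k1}(\hat{\eta}; \hat{\gamma}, t) - g_2(\hat{\eta}; X_k)\hat{\gamma}^T Z_{ik} + E_{k3}(\hat{\eta}; \hat{\gamma}, t)\}\, dN_{ik}(t), \\
\hat{B}_2 &= -K^{-1}\, \partial U_2(\eta; \gamma) / \partial \gamma^T \big|_{\eta = \hat{\eta}, \gamma = \hat{\gamma}}
= K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^{\tau} \{V_{k2}(\hat{\eta}; \hat{\gamma}, t) - g_1(\hat{\eta}; X_k) Z_{ik}^T + E_{k4}(\hat{\eta}; \hat{\gamma}, t)\}\, dN_{ik}(t), \\
\hat{C} &= K^{-1} \sum_{k=1}^{K} \hat{u}_k \hat{\psi}_k^T \hat{A}_1^{-1},
\end{aligned}
$$

where

$$
\hat{u}_k = \sum_{i=1}^{n_k} \int_0^{\tau} \{g_1(\hat{\eta}; X_k)\hat{\gamma}^T Z_{ik} - E_{k1}(\hat{\eta}; \hat{\gamma}, t)\}\, dN_{ik}(t).
$$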
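The integrand of Â2 can be traced back to the notation of Section 2.2 by differentiating the score term by term. The lines below are a sketch of that calculation under the definitions of g1 and g2 given there, with arguments (η; γ, t) suppressed; they are offered as a reading aid rather than as part of the formal proofs.

$$
\begin{aligned}
\partial S_k^{(0)} / \partial \eta^T &= \{S_k^{(1)}\}^T, \qquad
\partial S_k^{(1)} / \partial \eta^T = S_k^{(2)} + S_k^{(3)}, \\
\partial E_{k1} / \partial \eta^T &= \frac{S_k^{(2)} + S_k^{(3)}}{S_k^{(0)}} - E_{k1} E_{k1}^T = V_{k1} + E_{k3}, \\
-\,\partial \{g_1(\eta; X_k)\gamma^T Z_{ik} - E_{k1}\} / \partial \eta^T &= V_{k1} - g_2(\eta; X_k)\gamma^T Z_{ik} + E_{k3}.
\end{aligned}
$$

Since $E_{k3}$ is the risk-set weighted average of $g_2(\eta; X_k)\gamma^T Z_{ik}$, the last two terms are centered within each cluster, which is consistent with only $V_{k1}$ appearing in the limit A2.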

4. Numerical Studies

Simulation studies were conducted to assess the finite sample behavior of η̂. We also compare our method to that of Fine et al. (2003) under the special case where αk is common among clusters.

In the first simulation study, clustered failure time data were simulated from models (3) and (4) with K = 50, 100; H0 (t) = t; γ = (0.5, 1)T; η1 = −0.5, −0.25, 0, 0.25, 0.5; and η2 = 0.5. Cluster sizes were simulated from a discrete uniform distribution in the following four intervals [5, 20], [21, 50], [51, 100], and [101, 200] with approximately equal number of clusters in each interval. The cluster-level covariate Xk was the cluster size measured in units of 100 subjects. The positive stable frailties, Wk, were simulated following the method in Chambers, Mallows, and Stuck (1976),

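$$
W_k = \frac{\sin(\alpha_k W_{1k})}{\{\sin(W_{1k})\}^{1/\alpha_k}} \left[ \frac{\sin\{(1 - \alpha_k) W_{1k}\}}{W_{2k}} \right]^{(1 - \alpha_k)/\alpha_k},
$$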

where W1k and W2k are independent, with W1k following a uniform distribution U (0, π) and W2k following an exponential distribution with mean 1. The individual-level covariate Zik = (Zik1, Zik2)T was independently generated, with Zik1 from a Bernoulli distribution with p = 0.5 and Zik2 from N (0, 1) distribution. The censoring times were simulated from the uniform distribution, U (0.25, 1), yielding censoring probabilities of approximately 46%. For each scenario, 1000 replicates were carried out.
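The data-generating steps can be sketched as follows. The frailty draw implements the representation above; the failure-time draw inverts the conditional survivor function under H0 (t) = t, using Λ0k (t) = t^(1/αk) and βk = γ/αk, which is one natural way to simulate from the model and not necessarily the authors' code. The function names and the seed are arbitrary choices for this example.

```python
import math
import random

def positive_stable(alpha, rng):
    """One draw of W with Laplace transform exp(-s**alpha), 0 < alpha <= 1."""
    u = math.pi * (1.0 - rng.random())   # uniform on (0, pi], avoids u = 0
    e = rng.expovariate(1.0)             # exponential with mean 1
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b

def failure_time(alpha, gamma, z, w, rng):
    """Draw D given frailty w by inverting exp(-w * t**(1/alpha) * exp(beta^T z))."""
    beta_z = sum(c * v for c, v in zip(gamma, z)) / alpha  # beta_k^T z = gamma^T z / alpha_k
    e = rng.expovariate(1.0)
    return (e / (w * math.exp(beta_z))) ** alpha

rng = random.Random(2011)
alpha = 0.5
# One subject per simulated cluster with z = 0, so marginally P(D > 1) = exp(-1).
draws = [failure_time(alpha, [0.5, 1.0], [0.0, 0.0], positive_stable(alpha, rng), rng)
         for _ in range(100000)]
print(sum(d > 1.0 for d in draws) / len(draws), math.exp(-1.0))
```

The printed empirical proportion should be close to exp(−1), which is a quick check that the simulated data satisfy the marginal model (2) even though they are generated from the conditional model.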

The results are summarized in Table 1. We report the bias of the sampling mean of the estimators (BIAS), the mean of the standard error estimators (ASE), the empirical standard deviation of the estimators (ESD), and the 95% empirical coverage probability (CP). In the last column, we present the approximate range of αk for the simulated data. We also present the results for γ̂. We can see that the estimator η̂ is nearly unbiased. The ASE is generally fairly close to the ESD and, correspondingly, 95% empirical coverage probabilities are generally close to the nominal values. As the number of clusters increases from K = 50 to K = 100, the coverage probability is generally closer to the nominal value. In addition, as the value of αk decreases, the coverage probability becomes lower. This may partly be due to the fact that, for a fixed sample size, the amount of independent information decreases as αk decreases; that is, a smaller value of αk corresponds to stronger association within clusters.

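Table 1. Summary of results for the first simulation study with η2 = 0.5, γ1 = 0.5, and γ2 = 1 based on 1000 replicates.

| K | Parameter (η̂) | True | BIAS | ASE | ESD | CP | Parameter (γ̂) | BIAS | ASE | ESD | CP | Range of αk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | η1 | 0.5 | 0.01 | 0.27 | 0.27 | 0.96 | γ1 | 0.01 | 0.06 | 0.06 | 0.92 | 0.63–0.82 |
| | η2 | | 0.03 | 0.22 | 0.23 | 0.94 | γ2 | 0.00 | 0.08 | 0.08 | 0.90 | |
| | η1 | 0.25 | 0.02 | 0.24 | 0.24 | 0.96 | γ1 | 0.01 | 0.06 | 0.06 | 0.91 | 0.57–0.78 |
| | η2 | | 0.01 | 0.18 | 0.19 | 0.94 | γ2 | 0.00 | 0.08 | 0.09 | 0.90 | |
| | η1 | 0 | 0.03 | 0.22 | 0.23 | 0.94 | γ1 | 0.01 | 0.06 | 0.07 | 0.92 | 0.51–0.73 |
| | η2 | | 0.00 | 0.15 | 0.16 | 0.93 | γ2 | 0.00 | 0.09 | 0.10 | 0.91 | |
| | η1 | −0.25 | 0.04 | 0.21 | 0.22 | 0.93 | γ1 | 0.01 | 0.07 | 0.07 | 0.91 | 0.44–0.68 |
| | η2 | | −0.01 | 0.12 | 0.13 | 0.92 | γ2 | 0.00 | 0.10 | 0.11 | 0.91 | |
| | η1 | −0.5 | 0.07 | 0.20 | 0.22 | 0.91 | γ1 | 0.01 | 0.07 | 0.07 | 0.92 | 0.38–0.62 |
| | η2 | | −0.04 | 0.10 | 0.12 | 0.87 | γ2 | 0.01 | 0.11 | 0.11 | 0.91 | |
| 100 | η1 | 0.5 | 0.01 | 0.20 | 0.19 | 0.95 | γ1 | 0.01 | 0.04 | 0.04 | 0.94 | 0.63–0.82 |
| | η2 | | 0.02 | 0.15 | 0.15 | 0.95 | γ2 | 0.00 | 0.06 | 0.06 | 0.92 | |
| | η1 | 0.25 | 0.02 | 0.17 | 0.18 | 0.95 | γ1 | 0.01 | 0.05 | 0.05 | 0.93 | 0.57–0.78 |
| | η2 | | 0.00 | 0.13 | 0.12 | 0.95 | γ2 | 0.00 | 0.06 | 0.06 | 0.93 | |
| | η1 | 0 | 0.02 | 0.16 | 0.16 | 0.93 | γ1 | 0.01 | 0.05 | 0.05 | 0.93 | 0.51–0.73 |
| | η2 | | 0.00 | 0.10 | 0.10 | 0.94 | γ2 | 0.00 | 0.07 | 0.07 | 0.93 | |
| | η1 | −0.25 | 0.03 | 0.15 | 0.16 | 0.93 | γ1 | 0.01 | 0.05 | 0.05 | 0.93 | 0.44–0.68 |
| | η2 | | −0.02 | 0.09 | 0.09 | 0.92 | γ2 | 0.00 | 0.07 | 0.07 | 0.94 | |
| | η1 | −0.5 | 0.04 | 0.14 | 0.15 | 0.91 | γ1 | 0.01 | 0.05 | 0.05 | 0.94 | 0.38–0.62 |
| | η2 | | −0.03 | 0.07 | 0.08 | 0.87 | γ2 | 0.00 | 0.08 | 0.08 | 0.93 | |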

To assess the asymptotic normality of the regression parameter estimates, we study quantile-quantile (Q-Q) plots of the standardized η̂ against the standard normal distribution. In Figure 1, we show the Q-Q plots of η̂1 and η̂2 when K = 100 and η1 = −0.5, 0, and 0.5. In all six plots the points lie close to the diagonal line, which suggests that the asymptotic normal approximation is reasonable.

Figure 1. Q-Q plots for η̂1 and η̂2 when K = 100 and η1 = −0.5, 0, and 0.5.

In the second simulation study, we compare the proposed method (LKS) with that of Fine et al. (2003) (FGL) when αk is fixed for all clusters. We keep the same setting for H0, γ, and K. The individual-level covariates and the censoring variable follow the same distributions as in the first study. We fix αk = 0.5 or αk = 0.75 for all clusters. When using our method, we let η2 = 0 and estimate η1 only. For the FGL method, α is estimated by averaging the truncated ratio of the marginal and conditional regression parameter estimators. The results are displayed in Table 2. In order to facilitate the comparison, we show the results for α̂ rather than η̂.

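Table 2. Summary of results for the second simulation study comparing the proposed method (LKS) with FGL (see Fine et al., 2003) in the special case of constant αk = α, k = 1, …, K with γ1 = 0.5, γ2 = 1, and 1000 replicates.

| K | Parameter | True | LKS BIAS | LKS ASE | LKS ESD | LKS CP | FGL BIAS | FGL ASE | FGL ESD | FGL CP |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | α | 0.5 | 0.01 | 0.06 | 0.06 | 0.92 | 0.01 | 0.06 | 0.06 | 0.91 |
| | γ1 | | 0.01 | 0.07 | 0.08 | 0.92 | | | | |
| | γ2 | | 0.01 | 0.11 | 0.12 | 0.91 | | | | |
| | α | 0.75 | 0.01 | 0.06 | 0.06 | 0.91 | 0.01 | 0.06 | 0.06 | 0.87 |
| | γ1 | | 0.02 | 0.06 | 0.06 | 0.91 | | | | |
| | γ2 | | 0.01 | 0.08 | 0.09 | 0.88 | | | | |
| 100 | α | 0.5 | 0.00 | 0.04 | 0.04 | 0.94 | 0.00 | 0.04 | 0.05 | 0.94 |
| | γ1 | | 0.01 | 0.05 | 0.05 | 0.94 | | | | |
| | γ2 | | 0.00 | 0.08 | 0.09 | 0.94 | | | | |
| | α | 0.75 | 0.00 | 0.05 | 0.05 | 0.93 | 0.00 | 0.04 | 0.05 | 0.90 |
| | γ1 | | 0.01 | 0.04 | 0.04 | 0.94 | | | | |
| | γ2 | | 0.00 | 0.06 | 0.06 | 0.92 | | | | |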

Both methods give an almost unbiased estimator for α, and the estimated standard error and coverage probability are reasonable. Similar to the results in Table 1, when the number of clusters increases from 50 to 100, the asymptotic standard errors of the estimators decrease and the coverage probability tends to be closer to the nominal value. The asymptotic standard error estimators from the two methods are very close. The LKS method gives somewhat better coverage probability than FGL.

Simulations have been done under covariate-dependent frailty and common frailty settings. Since there is no existing method to compare with under the covariate-dependent frailty setting, we only make a comparison under the common frailty setting. For this, three methods are available. Both the traditional EM method (Wang et al., 1995) and the Martinussen and Pipper (2005) method (MP) involve estimation of the frailties as missing data, which is computationally very slow when a large number of deaths is observed for some clusters and does not readily yield standard errors. On the other hand, FGL, like the LKS method, does not involve estimation of the frailties. Since our primary application of interest has clusters with large numbers of observed deaths, we have compared our method to FGL only.

5. Application

We applied the proposed methods to data on deceased donor kidney transplants performed between 2000 and 2004 in the United States. Data were obtained from the SRTR. Failure time (recorded in days) was defined as the time from transplantation to graft failure, retransplantation, or death, whichever occurred first. There were 224 facilities and a total of 23,027 transplants included in the study. The facility size varied from 1 to 708. We fitted the proposed covariate-dependent frailty model to the data with the logit link function for the dependence parameter αk. A total of 12 patient-level covariates and four cluster-level covariates were considered in the proportional hazards model. The same cluster-level covariates were included in the link function for αk. Patient-level covariates included age at transplantation (by decade), race (African-American, Other), gender, time on dialysis (2 dummy variables), body mass index (BMI; 3 dummy variables), and primary cause of renal disease (4 dummy variables). Cluster-level covariates included the percentage of female patients, the percentage of African-American patients, the percentage of patients whose renal disease was caused by diabetes, and center size (per 100 patients).

We expect that any covariate that is associated with the between-cluster variability may also be related to within-cluster variation. Moreover, it is easier to interpret a covariate's effect on the frailty variance after adjusting for its effect on the hazard function itself. Therefore, as a modeling strategy, covariates included in the logit link function should also be represented in the marginal hazards model. Naturally, such cluster-level covariates will not be used in the second stage of the estimation procedures, due to the stratification.

Results of our analysis are shown in Table 3. Percentage of female patients has a significant effect (p = 0.0063) on the frailty parameter. Facilities with fewer female patients tend to have a smaller value of αk, which corresponds to greater heterogeneity in facility performance. The percentage of female patients also influences the hazard significantly. Upon examining the point estimates, one could interpret these results as being in the same direction, as a higher percentage of female patients implies a lower graft failure hazard and lower variation, both desirable outcomes.

Table 3. Analysis of SRTR kidney transplant data.

| Covariate | Estimate | SE | p-value |
|---|---|---|---|
| γ (Patient level) | | | |
| Age (in decades) | 0.1541 | 0.0104 | <.0001 |
| African-American | 0.2738 | 0.0293 | <.0001 |
| Female | −0.0957 | 0.0254 | 0.0002 |
| Time on dialysis (in years): ≤1 | −0.1379 | 0.0372 | 0.0002 |
| Time on dialysis (in years): >3 | 0.1153 | 0.0277 | <.0001 |
| Recipient BMI: <20 | 0.0732 | 0.0502 | 0.1450 |
| Recipient BMI: [25, 30) | 0.0391 | 0.0298 | 0.1904 |
| Recipient BMI: ≥30 | 0.1369 | 0.0320 | <.0001 |
| Cause of ESRD: Diabetes | 0.2970 | 0.0359 | <.0001 |
| Cause of ESRD: Hypertension | 0.1646 | 0.0375 | <.0001 |
| Cause of ESRD: Polycystic | −0.3106 | 0.0571 | <.0001 |
| Cause of ESRD: Other | 0.1156 | 0.0392 | 0.0032 |
| γ (Cluster level) | | | |
| Percent of female (pct) | −0.0063 | 0.0022 | 0.0035 |
| Percent of African-American (pct) | 0.0039 | 0.0007 | <.0001 |
| Percent of diabetes (pct) | 0.0084 | 0.0017 | <.0001 |
| Center size (in 100 patients) | 0.0097 | 0.0068 | 0.1548 |
| η | | | |
| Intercept | −2.3192 | 2.0945 | 0.2682 |
| Percent of female (pct) | 0.1046 | 0.0382 | 0.0063 |
| Percent of African-American (pct) | 0.0389 | 0.0325 | 0.2316 |
| Percent of diabetes (pct) | 0.0298 | 0.0531 | 0.5745 |
| Center size (in 100 patients) | −0.2288 | 0.2705 | 0.3977 |

6. Discussion

Covariate-dependent frailty models for clustered failure time data have rarely been studied previously. Wassell and Moeschberger (1993) proposed a bivariate survival model with the gamma frailty parameter depending on a pairwise covariate. Their approach only considered paired survival times in each cluster and cannot be applied to studies with larger cluster sizes. Wassell, Kulczycki, and Moyer (1995) also pointed out the increasing complexity of the application of a frailty model to clustered failure time data with larger group sizes. The model proposed in this article enables one to adjust for covariate effects on the frailty distribution and permits both marginal and conditional inference for clustered failure time data regardless of the group size. Further consideration of the proposed method reveals two additional advantages. First, model (5), on which we make inference, allows for covariate-by-cluster interaction. The covariate effect is multiplicatively influenced by clusters through the cluster-level covariate-dependent frailty parameter αk. Second, with the rapid development of various methods for frailty models, researchers have begun to consider more carefully issues of ease of implementation and computation time (e.g., Fine et al., 2003; Liu and Huang, 2007). The proposed method performs well in both aspects. The method can be implemented using SAS IML. When we evaluated the computation time in the simulation study, it took approximately 4 hours for 1000 runs, with approximately one-third of the time spent on the PROC PHREG call.

Recalling that $\Lambda_{0k}(t) = H_0(t)^{1/\alpha_k}$, we can estimate Λ0k (t) with $\hat{H}_0(t)^{g(\hat{\eta}; X_k)}$, k = 1, …, K, where the estimator Ĥ0 (t) of H0 (t) can be obtained from model (2) (see Spiekerman and Lin, 1998). Since the joint distribution of Ĥ0 (t) and η̂ is complicated, we have not been able to obtain the asymptotic distribution of the resulting estimators of the Λ0k's.

We noted that when a cluster-level covariate is included in the conditional proportional hazard model, its effect is nearly nonidentifiable and does not interfere with the estimation of other covariate effects. This is due to the use of the stratified partial likelihood approach in the estimation. Since the motivation of the proposed method is to model cluster-level covariate effects on between-cluster heterogeneity and within-cluster association, the inclusion of a cluster-level covariate in the conditional hazard is not needed. On the other hand, one is able to obtain the marginal effect of a cluster-level covariate due to the proportional hazard in the marginal model.

For ease of computation and to avoid the estimation of the fk (t) (which is difficult for studies with large clusters), we first attempted using a stratified partial likelihood approach based on model (5) only. We found that this approach does not lead to useful estimators for the parameter η1. As an alternative, we estimate γ from model (2), then use the estimator γ̂ in model (5) to obtain a consistent estimator for η. The proposed estimation procedure is therefore a two-step procedure. Such an approach has been employed previously in the context of maximum likelihood by, for example, Gong and Samaniego (1981), and for the Clayton-Oakes model with a proportional hazards model for the margins by Glidden (2000). It should be noted that some efficiency is lost under the stratified partial likelihood approach in the second stage, as exemplified by the fact that the same estimates would be obtained if we let fk (t) = 1.
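A minimal sketch of the second stage may help fix ideas. It assumes that γ̂ has already been obtained from a marginal Cox fit (not shown), takes γ and Z to be scalar, uses a hypothetical logistic link for α, and replaces a Newton-type maximization with a crude grid search. The data layout, function names, and toy numbers are all invented for illustration.

```python
import math


def inverse_alpha_link(eta, x):
    """Assumed link: g(eta; x) = 1/alpha with alpha = expit(eta[0] + eta[1]*x)."""
    return 1.0 + math.exp(-(eta[0] + eta[1] * x))


def stratified_pseudo_loglik(eta, gamma_hat, clusters, link=inverse_alpha_link):
    """Stage 2 objective: cluster-stratified log partial likelihood in eta,
    with the marginal coefficient held fixed at gamma_hat."""
    total = 0.0
    for c in clusters:
        g = link(eta, c["x"])
        for t_i, d_i, z_i in zip(c["time"], c["event"], c["z"]):
            if not d_i:
                continue  # censored observations contribute only through risk sets
            # Risk set is restricted to the same cluster (stratum).
            s0 = sum(math.exp(g * gamma_hat * z_j)
                     for t_j, z_j in zip(c["time"], c["z"]) if t_j >= t_i)
            total += g * gamma_hat * z_i - math.log(s0)
    return total


def fit_eta_on_grid(gamma_hat, clusters, grid, link=inverse_alpha_link):
    """Crude stand-in for a Newton-type maximizer: pick the best grid point."""
    return max(grid, key=lambda eta: stratified_pseudo_loglik(eta, gamma_hat, clusters, link))


# Toy (made-up) data: two clusters with a cluster-level covariate x.
clusters = [
    {"x": 0.0, "time": [1.0, 2.0, 3.5], "event": [1, 1, 0], "z": [1.0, 0.0, 1.0]},
    {"x": 1.0, "time": [0.5, 1.5, 4.0], "event": [1, 0, 1], "z": [0.0, 1.0, 1.0]},
]
gamma_hat = 0.5  # pretend output of stage 1
grid = [(a / 2.0, b / 2.0) for a in range(-4, 5) for b in range(-4, 5)]
print(fit_eta_on_grid(gamma_hat, clusters, grid))
```

Because each risk set is formed within a single cluster, any factor common to all members of a cluster at time t cancels from the likelihood ratio. This is why the fk (t) never need to be estimated in this stage.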

Several areas of future research are possible. The proposed method relies on the specification of a link function, and model checking on this function is of potential interest. Future research on this method may also include the extension to other frailty distributions.

Acknowledgments

The authors thank the Scientific Registry of Transplant Recipients (SRTR) for access to the kidney transplant data. The SRTR is funded by a contract from the Health Resources and Services Administration (HRSA), U.S. Department of Health and Human Services. This research was supported in part by National Institutes of Health grant R01 DK-70869 (DES). The authors are also grateful to the coordinating editor, associate editor, and a referee for suggestions that resulted in considerable improvement of the manuscript. The authors also thank Tempie Shearon of the University of Michigan Kidney Epidemiology and Cost Center for assistance with assembling the analysis files.

Appendix.

Proof of Theorem 1

The individual counting process martingale for the observed filtration is

$$M_{ik}(t) = N_{ik}(t) - \int_0^t Y_{ik}(s) f_k(s) e^{g(\eta_0; X_k)\gamma_0^T Z_{ik}}\, d\Lambda_{0k}(s).$$

The proof of the consistency of η̂ considers the following two processes,

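$$G(\eta, \hat\gamma) = K^{-1}\{l_2(\eta, \hat\gamma, t) - l_2(\eta_0, \gamma_0, t)\} = K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^\tau \left[ \{g(\eta; X_k)\hat\gamma^T - g(\eta_0; X_k)\gamma_0^T\} Z_{ik} - \log \frac{S_k^{(0)}(\eta, \hat\gamma, t)}{S_k^{(0)}(\eta_0, \gamma_0, t)} \right] dN_{ik}(t),$$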

and

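$$\Xi(\eta) = K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^\tau \left[ \{g(\eta; X_k)\gamma_0^T - g(\eta_0; X_k)\gamma_0^T\} Z_{ik} - \log \frac{S_k^{(0)}(\eta, \gamma_0, t)}{S_k^{(0)}(\eta_0, \gamma_0, t)} \right] Y_{ik}(t) f_k(t) e^{g(\eta_0; X_k)\gamma_0^T Z_{ik}}\, d\Lambda_{0k}(t).$$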

The difference between them can be decomposed into two parts,

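$$\begin{aligned} G(\eta, \hat\gamma) - \Xi(\eta) &= \{G(\eta, \hat\gamma) - G(\eta, \gamma_0)\} + \{G(\eta, \gamma_0) - \Xi(\eta)\} \\ &= K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^\tau \left\{ g(\eta; X_k) Z_{ik}^T(\hat\gamma - \gamma_0) - \log \frac{S_k^{(0)}(\eta, \hat\gamma, t)}{S_k^{(0)}(\eta, \gamma_0, t)} \right\} dN_{ik}(t) \\ &\quad + K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^\tau \left[ \{g(\eta; X_k) - g(\eta_0; X_k)\} \gamma_0^T Z_{ik} - \log \frac{S_k^{(0)}(\eta, \gamma_0, t)}{S_k^{(0)}(\eta_0, \gamma_0, t)} \right] dM_{ik}(t). \end{aligned}$$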

For each η, the first term on the right-hand side of the equation converges almost surely to zero by the consistency of γ̂. Under conditions (a) to (f), the second term is K−1 times a sum of K independent and identically distributed zero-mean random variables. By the Strong Law of Large Numbers (SLLN), as K → ∞, G (η, γ̂) converges almost surely to the same limiting function of η as Ξ(η).

By conditions (d) to (f), we can evaluate the first and second derivatives of this limiting function by taking the partial derivatives inside the integral of Ξ(η). The first derivative is thus

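$$\mathcal{E}\left[ \sum_{i=1}^{n_k} \int_0^\tau \{g_1(\eta; X_k)\gamma_0^T Z_{ik} - E_{k1}(\eta; \gamma_0, t)\} Y_{ik}(t) f_k(t) e^{g(\eta_0; X_k)\gamma_0^T Z_{ik}}\, d\Lambda_{0k}(t) \right].$$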

It is 0 at η = η0. The second derivative

$$-\mathcal{E}\left[ \sum_{i=1}^{n_k} \int_0^\tau V_{k1}(\eta; \gamma_0, t) S_k^{(0)}(\eta; \gamma_0, t) f_k(t)\, d\Lambda_{0k}(t) \right]$$

is minus a positive definite matrix at η = η0 by condition (f). Therefore, G (η, γ̂) converges almost surely to a concave function of η with a unique maximum at η = η0. Since η̂ maximizes G (η, γ̂), it follows that η̂ → η0 almost surely as K → ∞.

Proof of Theorem 2

The first-order Taylor series expansion of $K^{-1/2}U_2(\hat\eta; \hat\gamma)$ about η = η0 and γ = γ0 gives

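$$K^{-1/2}U_2(\hat\eta; \hat\gamma) = K^{-1/2}U_2(\eta_0; \gamma_0) - \hat B_2(\eta_0; \gamma^*) K^{1/2}(\hat\gamma - \gamma_0) - \hat A_2(\eta^*; \hat\gamma) K^{1/2}(\hat\eta - \eta_0),$$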

where η* is on the line segment between η̂ and η0 and γ* is on the line segment between γ̂ and γ0. Thus, we have

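$$K^{1/2}(\hat\eta - \eta_0) = \hat A_2^{-1}(\eta^*; \hat\gamma) \left\{ K^{-1/2}U_2(\eta_0; \gamma_0) - \hat B_2(\eta_0; \gamma^*) K^{1/2}(\hat\gamma - \gamma_0) \right\}.$$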

With the consistency of η̂ and γ̂ and the SLLN, we can show that $\hat A_2(\eta^*; \hat\gamma) \to_p A_2$ and $\hat B_2(\eta_0; \gamma^*) \to_p B_2$, and that $A_2$ and $B_2$ can be consistently estimated by $\hat A_2$ and $\hat B_2$, respectively.

It has been noted in Section 3 that $K^{1/2}(\hat\gamma - \gamma_0)$ converges in distribution to $N(0, \Sigma_1)$. We will prove that $K^{-1/2}U_2(\eta_0; \gamma_0)$ converges in distribution to $N(0, A_2)$. It can be easily seen that the process $K^{-1/2}U_2(\eta_0; \gamma_0, t)$ can be written as a sum of orthogonal martingales,

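$$K^{-1/2}U_2(\eta_0; \gamma_0, t) = K^{-1/2} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^t \{g_1(\eta_0; X_k)\gamma_0^T Z_{ik} - E_{k1}(\eta_0; \gamma_0, s)\}\, dM_{ik}(s),$$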

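with predictable variation process

$$\begin{aligned} \langle K^{-1/2}U_2(\eta_0; \gamma_0) \rangle(t) &= K^{-1} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \int_0^t \{g_1(\eta_0; X_k)\gamma_0^T Z_{ik} - E_{k1}(\eta_0; \gamma_0, s)\}^{\otimes 2} \\ &\qquad \times Y_{ik}(s) f_k(s) e^{g(\eta_0; X_k)\gamma_0^T Z_{ik}}\, d\Lambda_{0k}(s) \\ &= K^{-1} \sum_{k=1}^{K} \int_0^t V_{k1}(\eta_0; \gamma_0, s) S_k^{(0)}(\eta_0; \gamma_0, s) f_k(s)\, d\Lambda_{0k}(s). \end{aligned}$$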

From Rebolledo's Theorem, the Weak Law of Large Numbers (WLLN) and condition (f), we can easily show that $K^{-1/2}U_2(\eta_0; \gamma_0, \tau)$ converges in distribution to a zero mean Gaussian vector with covariance matrix

$$\lim_{K \to \infty} \langle K^{-1/2}U_2(\eta_0; \gamma_0) \rangle(\tau) = A_2.$$

Finally, we need to obtain the asymptotic covariance matrix of $K^{-1/2}U_2(\eta_0; \gamma_0)$ and $K^{1/2}(\hat\gamma - \gamma_0)$. We can see that both items can be written as a summation of K i.i.d. zero mean random vectors,

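$$K^{-1/2}U_2(\eta_0; \gamma_0) = K^{-1/2} \sum_{k=1}^{K} u_k, \qquad K^{1/2}(\hat\gamma - \gamma_0) = K^{-1/2} \hat A_1(\gamma^*)^{-1} \sum_{k=1}^{K} \psi_k + o_p(1),$$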

with

$$u_k(\eta_0; \gamma_0) = \sum_{i=1}^{n_k} \int_0^\tau \{g_1(\eta_0; X_k)\gamma_0^T Z_{ik} - E_{k1}(\eta_0; \gamma_0, t)\}\, dN_{ik}(t),$$

and

$$\psi_k(\gamma_0, H_0) = \sum_{i=1}^{n_k} \int_0^\tau \{Z_{ik} - e(\gamma_0, t)\} \times \{dN_{ik}(t) - Y_{ik}(t) e^{\gamma_0^T Z_{ik}}\, dH_0(t)\}.$$

With the consistency of γ̂ and the WLLN, the asymptotic covariance matrix of $K^{-1/2}U_2(\eta_0; \gamma_0)$ and $K^{1/2}(\hat\gamma - \gamma_0)$ is $C = \mathcal{E}\{u_k \psi_k^T\} A_1^{-1}$.

In summary, $K^{1/2}(\hat\eta - \eta_0)$ converges in distribution to $N(0, \Sigma_2)$, where

$$\Sigma_2 = A_2^{-1} \left( A_2 + B_2 \Sigma_1 B_2^T - 2 C B_2^T \right) A_2^{-1},$$

which can be consistently estimated by replacing each quantity with its corresponding estimator.
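As a brief sketch of where the terms in this covariance come from, write U for K−1/2U2 (η0; γ0) and D for K1/2 (γ̂ − γ0). Both are shorthand introduced here only for this illustration. The sketch uses only the expansion and the limits stated in the proof above.

```latex
\begin{aligned}
K^{1/2}(\hat\eta - \eta_0) &\approx A_2^{-1}\,(U - B_2 D), \\
\operatorname{Var}(U - B_2 D)
  &= \operatorname{Var}(U) + B_2 \operatorname{Var}(D) B_2^T
     - \operatorname{Cov}(U, D) B_2^T - B_2 \operatorname{Cov}(D, U) \\
  &\to A_2 + B_2 \Sigma_1 B_2^T - C B_2^T - B_2 C^T .
\end{aligned}
```

When C B2T is symmetric, for instance when η is scalar, the two cross terms combine into 2 C B2T. Pre- and post-multiplying by the inverse of the symmetric matrix A2 then gives the sandwich form.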

References

  1. Andersen PK, Gill RD. Cox's regression model for counting process: A large sample study. Annals of Statistics. 1982;10:1100–1120.
  2. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993.
  3. Chambers JM, Mallows CL, Stuck BW. A method for simulating stable random variables. Journal of the American Statistical Association. 1976;71:340–344.
  4. Cox DR. Partial likelihood. Biometrika. 1975;62:269–276.
  5. Fine JP, Glidden DV, Lee KE. A simple estimator for a shared frailty regression model. Journal of the Royal Statistical Society, Series B. 2003;65:317–329.
  6. Glidden DV. A two-stage estimator of the dependence parameter for the Clayton-Oakes model. Lifetime Data Analysis. 2000;6:141–156. doi: 10.1023/a:1009664011060.
  7. Glidden DV, Self S. Semiparametric likelihood estimation in the Clayton-Oakes failure time model. Scandinavian Journal of Statistics. 1999;26:363–372.
  8. Glidden DV, Vittinghoff E. Modelling clustered survival data from multicentre clinical trials. Statistics in Medicine. 2004;23:369–388. doi: 10.1002/sim.1599.
  9. Gong G, Samaniego FJ. Pseudo maximum likelihood estimation: Theory and applications. Annals of Statistics. 1981;9:861–869.
  10. Heagerty PJ. Marginally specified logistic-normal models for longitudinal binary data. Biometrics. 1999;55:688–698. doi: 10.1111/j.0006-341x.1999.00688.x.
  11. Hougaard P. A class of multivariate failure time distributions. Biometrika. 1986;73:671–678.
  12. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd. New York: Wiley; 2002.
  13. Lee EW, Wei LJ, Amato DA. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In: Klein JP, Goel PK, editors. Survival Analysis: State of the Art. Dordrecht: Kluwer Academic Publishers; 1992. pp. 237–247.
  14. Lin X, Raz J, Harlow S. Linear mixed models with heterogeneous within-cluster variances. Biometrics. 1997;53:910–923.
  15. Liu L, Huang X. The use of Gaussian quadrature for estimation in frailty proportional hazards models. Statistics in Medicine. 2007;27:2665–2683. doi: 10.1002/sim.3077.
  16. Martinussen T, Pipper CB. Estimation in the positive stable shared frailty Cox proportional hazards model. Lifetime Data Analysis. 2005;11:99–115. doi: 10.1007/s10985-004-5642-4.
  17. Pipper CB, Martinussen T. A likelihood based estimating equation for the Clayton-Oakes model with marginal proportional hazards. Scandinavian Journal of Statistics. 2003;30:509–522.
  18. Prentice RL. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association. 1986;81:321–327.
  19. Prentice RL, Self SG. Asymptotic distribution theory for Cox-type regression models with general relative risk form. Annals of Statistics. 1983;81:804–813.
  20. Spiekerman CF, Lin DY. Marginal regression models for multivariate failure time data. Journal of the American Statistical Association. 1998;93:1164–1175.
  21. Wang ST, Klein JP, Moeschberger ML. Semiparametric estimation of covariate effects using the positive stable frailty model. Applied Stochastic Models and Data Analysis. 1995;11:121–133.
  22. Wang Z, Louis TA. Matching conditional and marginal shapes in binary mixed-effect models using a bridge distribution function. Biometrika. 2003;90:765–775.
  23. Wang Z, Louis TA. Marginalized binary mixed-effects models with covariate-dependent random effects and likelihood inference. Biometrics. 2004;60:884–891. doi: 10.1111/j.0006-341X.2004.00243.x.
  24. Wassell JT, Moeschberger ML. A bivariate survival model with modified gamma frailty for assessing the impact of interventions. Statistics in Medicine. 1993;12:241–248. doi: 10.1002/sim.4780120308.
  25. Wassell JT, Kulczycki GW, Moyer ES. Frailty models of manufacturing effects. Lifetime Data Analysis. 1995;1:161–170. doi: 10.1007/BF00985767.
  26. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association. 1989;84:1065–1073.
