A Gaussian Copula Model for Multivariate Survival Data

Megan Othus; Yi Li

doi:10.1007/s12561-010-9026-x

Author manuscript; available in PMC: 2011 Dec 6.

Published in final edited form as: Stat Biosci. 2010 Dec;2(2):154–179. doi: 10.1007/s12561-010-9026-x

PMCID: PMC3232005 NIHMSID: NIHMS336462 PMID: 22162742

Abstract

We consider a Gaussian copula model for multivariate survival times. Estimation of the copula association parameter is easily implemented with existing software using a two-stage estimation procedure. Using the Gaussian copula, we are able to test whether the association parameter is equal to zero. When the association term is positive, the model can be extended to incorporate cluster-level frailty terms. Asymptotic properties are derived under the two-stage estimation scheme. Simulation studies verify finite sample utility. We apply the method to a Children’s Oncology Group multi-center study of acute lymphoblastic leukemia. The analysis estimates marginal treatment effects and examines potential clustering within treatment institution.

Keywords: Copula model, Correlated survival data, Proportional hazards model, Semiparametric normal transformation

1 Introduction

In some correlated survival data settings, practitioners have two primary interests: determining the effect of treatment and assessing potential dependence between subjects. For example, in many multi-center clinical trials, data are clustered within treatment center. Although institutions participating in clinical trials follow trial-specific protocols, differences can still exist in outcomes between institutions. The dependence among patients treated at the same institution is an important component of a multi-center clinical trial analysis (Fleiss, 1986; Gray, 1994; Jones, Teather, Wang, and Lewis, 1998; Senn, 1998; Anello, O’Neill, and Dubey, 2005; Vierron and Giraudeau, 2007; Logan, Nelson, and Klein, 2008; Zheng and Zelen, 2008).

Our motivating data arise from a large multi-center clinical trial for children with acute lymphoblastic leukemia. The goal of the clinical trial was to test whether either an increase in the strength or an increase in the duration of the standard chemotherapy regimen was associated with improved survival. We are interested in evaluating whether there exists correlation between survival outcomes within an institution while concurrently assessing the efficacy of the new treatment regimens.

A variety of statistical models are available for correlated survival data. Marginal models treat within-cluster correlations as a nuisance (Wei, Lin, and Weissfeld, 1989; Prentice and Cai, 1992; Cai and Prentice, 1995; Cai, Wei, and Wilcox, 2000, among others). Parameters from marginal models have population-average interpretations. Frailty models are used when within-cluster inferences are desired because the parameters from these models have interpretations conditional on the value of the frailty (Clayton, 1978; Oakes, 1989; Murphy, 1994, 1995; Parner, 1998; Cai, Cheng, and Wei, 2002; Lam, Lee, and Leung, 2002; Glidden and Vittinghoff, 2004; Zeng and Lin, 2007, among others). The positive stable frailty model (Fine, Glidden, and Lee, 2003) allows for marginal proportional hazards interpretation of parameters within a frailty model framework. Copula models (Shih and Louis, 1995; Glidden, 2000; Li, Prentice, and Lin, 2008, among others) embed marginal survival functions within a copula function parametrized by an association term.

For our application, we would like to: (1) test whether either of the two new chemotherapy schedules is associated with improved marginal survival and (2) test whether there is a non-zero correlation between survival outcomes within institutions while controlling for known prognostic factors. Few existing models for multivariate survival data accommodate both semiparametric marginal distributions and unrestricted pairwise dependence.

Consider, for example, Clayton’s (1978) model for a pair of survival times. The dependence term for this model, θ, takes values in (0, ∞). When θ = 1 the model reduces to the independence model, while θ > 1 induces positive association and θ < 1 induces negative association. When θ > 0.5, the distribution is absolutely continuous, but for θ ≤ 0.5, a singular distribution is concentrated on a curve (Oakes, 1989). Hougaard (2000) noted that frailty models cannot yield unrestricted marginal distributions with unrestricted pairwise parameters. Hence, it is of substantial interest to specify a semiparametric likelihood model that allows for arbitrary modeling of the marginal survival functions and a flexible and interpretable correlation structure.

The goal of this project is to develop a model for multivariate survival data that addresses points (1) and (2) above. To this end we use a semiparametric normal transformation that establishes a Gaussian copula for survival data. The marginal survival function follows a proportional hazards model. The Gaussian copula includes a parameter that summarizes the within-cluster correlation. The correlation parameter can take positive and negative values, which allows for straightforward testing of whether the correlation parameter is equal to zero.

We note that there have been two previous articles using the semiparametric normal transformation model, but neither is applicable to our setting and the proposed model extends the range of data to which the idea can be applied. In contrast to Li et al (2008), our model can accommodate varying cluster sizes and allows for covariates. Li and Lin (2006) assume a specific spatial correlation structure on the entire dataset. In contrast, our method explicitly allows for correlated survival times within independent clusters.

The rest of the paper is structured as follows: in Section 2 we define notation and describe the model; Section 3 summarizes inference procedures; Section 4 outlines an extension of the model when the correlation term is positive; we provide a summary of asymptotic results in Section 5; simulations are presented in Section 6; Section 7 contains an analysis of a Children’s Oncology Group multi-center clinical trial; and we finish with a brief discussion in Section 8. Regularity conditions and proofs of theorems are contained in the Appendix.

2 Model Specification

Let T_ij and C_ij denote potentially unobserved failure and censoring times for subject j in cluster i, where j = 1, …, n_i and i = 1, …, m. The observed data are X_ij = min(T_ij, C_ij) and Δ_ij = I(T_ij ≤ C_ij). Let Z_ij(t) denote an external time-dependent covariate vector (Kalbfleisch and Prentice, 2002, page 197) of length p and write its covariate path up to time t as Z̄_ij(t) = {Z_ij(s) | 0 ≤ s ≤ t}. Assume that T_ij, conditional on the covariate process Z̄_ij(T_ij), is independent of C_ij. Also, assume that, conditional on each individual’s covariate path, the hazard of T_ij, denoted λ{t | Z̄_ij(t)}, follows a proportional hazards model:

$$\lim_{h \to 0} h^{-1} P\{t \le T_{ij} < t + h \mid T_{ij} \ge t, \bar{Z}_{ij}(t)\} = \lambda(t) \exp\{\beta' Z_{ij}(t)\}.$$

(1)

Here β is a vector of regression coefficients and λ(t) is an unspecified baseline hazard function with cumulative hazard function Λ. Equation (1) is a marginal model for each T_ij; hence β has a population-average interpretation rather than a cluster-specific interpretation.

To model the clustering of the T_ij, consider the semiparametric normal transformation:

$$\tilde{T}_{ij} = \Phi^{-1}[1 - S\{T_{ij} \mid \bar{Z}_{ij}(T_{ij})\}],$$

(2)

where Φ is the standard normal distribution function and S is the survival function associated with Equation (1). By the probability integral transform, 1 – S{T_ij | Z̄_ij(T_ij)} has a Uniform(0, 1) distribution. It necessarily follows that T̃_ij ~ Normal(0, 1). The transformation takes T_ij with support on (0, ∞) and transforms it to a standard normal random variable, T̃_ij.
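As a minimal sketch of Equation (2), assume a constant baseline hazard and a single time-fixed covariate, so that S(t | z) = exp{−t exp(βz)}. The function names and the exponential baseline are illustrative assumptions, not part of the model, which leaves λ(t) unspecified.

```python
import math
from statistics import NormalDist


def survival_exponential_ph(t, z, beta, baseline_rate=1.0):
    """S(t | z) under proportional hazards with a constant baseline hazard (assumed)."""
    return math.exp(-baseline_rate * t * math.exp(beta * z))


def normal_transform(t, z, beta):
    """Equation (2): map a failure time to the standard normal scale."""
    u = 1.0 - survival_exponential_ph(t, z, beta)  # Uniform(0, 1) by the probability integral transform
    return NormalDist().inv_cdf(u)


# The conditional median time for a subject with z = 1 and beta = log(0.5).
beta = math.log(0.5)
median_time = math.log(2.0) / math.exp(beta * 1.0)
print(normal_transform(median_time, z=1.0, beta=beta))
```

The conditional median maps to (numerically) zero on the normal scale, and later failure times map to larger values, which is the monotonicity used later to carry censoring through the transformation.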

Denote the correlation matrix of (T̃_i₁, …, T̃_{in_i}) by Σ_i. We consider an exchangeable correlation structure for Σ_i, where the diagonal terms are equal to 1 and off-diagonal terms are equal to σ. In this model σ can take positive and negative values. The value zero is an interior point of the parameter space for σ, so this model can be used to test whether potentially clustered survival data have non-zero correlation. The Gaussian copula model of Li et al (2008) is a special case of this model with no covariates and cluster size fixed at two.

The term σ can be considered a summary measure for the correlation between two subjects within the same cluster after controlling for the covariates included in model (1). Known prognostic factors can be included in the proportional hazards model, and the estimate of the correlation will be based on Cox-Snell type residuals as defined with Equation (2). The term σ can be viewed as a generalization of Kendall’s τ and Spearman’s ρ to allow for covariates. For bivariate data, a direct relationship between σ and Kendall’s τ and Spearman’s ρ is straightforward to establish (Li et al, 2008). We can relate σ to the original time scale using the cross-ratio, a local dependence measure (Kalbfleisch and Prentice, 2002). A derivation of this result can be found in Li and Lin (2006, Section 3.1).
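For orientation, the textbook identities for a bivariate normal pair with correlation σ are sketched below. They are stated as standard Gaussian copula results applied to the transformed pair (T̃_i₁, T̃_i₂), not as formulas quoted from Li et al (2008); the symbol ρ_S for Spearman’s ρ is notation introduced here.

```latex
\tau = \frac{2}{\pi} \arcsin(\sigma), \qquad
\rho_S = \frac{6}{\pi} \arcsin\!\left(\frac{\sigma}{2}\right)
```

Under these identities, σ = 0.5 corresponds to τ = 1/3 and ρ_S ≈ 0.48, so the three measures share a sign and are zero together.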

3 Inference

3.1 Likelihood Development

Let Y_ij be a potentially censored version of T̃_ij (Equation (2)). The semiparametric normal transformation is monotone and thus preserves censoring patterns. To simplify the presentation, define $\Delta_i = \sum_{j=1}^{n_i} \Delta_{ij}$ and order the observations such that Δ_i₁ = … = Δ_{iΔ_i} = 1. First consider 1 ≤ Δ_i ≤ n_i − 1. Let $Y_i^{\Delta_i} = (Y_{i1}, \dots, Y_{i\Delta_i})$ and $Y_i^{n_i - \Delta_i} = (Y_{i,\Delta_i + 1}, \dots, Y_{in_i})$.

Let Σ_i be the covariance matrix for the transformed failure times. Write Σ_i as a partitioned matrix:

$$\Sigma_i = \begin{pmatrix} \Sigma_{i11} & \Sigma_{i12} \\ \Sigma_{i21} & \Sigma_{i22} \end{pmatrix},$$

where Σ_i₁₁ has dimension Δ_i × Δ_i. The vector $Y_i^{\Delta_i}$ follows a multivariate normal distribution with mean 0 and covariance matrix Σ_i₁₁. It follows that $Y_i^{n_i - \Delta_i} \mid Y_i^{\Delta_i}$ is a censored observation from a normal distribution with mean $\Sigma_{i21} \Sigma_{i11}^{-1} (Y_i^{\Delta_i})'$ and covariance matrix $\Sigma_{i22} - \Sigma_{i21} \Sigma_{i11}^{-1} \Sigma_{i12}$.
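Under the exchangeable structure of Section 2, these conditional moments take a simple form. The following is a worked sketch based on writing Σ_i₁₁ = (1 − σ)I + σ11′; the identity matrix I and the vectors of ones 1 (of conformable dimension) are notation introduced here for illustration.

```latex
\begin{aligned}
\Sigma_{i11}^{-1} &= \frac{1}{1-\sigma}\left\{ I - \frac{\sigma}{1 + (\Delta_i - 1)\sigma}\, 1 1' \right\}, \\
\Sigma_{i21}\Sigma_{i11}^{-1} (Y_i^{\Delta_i})' &= \frac{\sigma \sum_{j=1}^{\Delta_i} Y_{ij}}{1 + (\Delta_i - 1)\sigma}\, 1, \\
\Sigma_{i22} - \Sigma_{i21}\Sigma_{i11}^{-1}\Sigma_{i12}
 &= (1-\sigma) I + \left\{ \sigma - \frac{\Delta_i \sigma^{2}}{1 + (\Delta_i - 1)\sigma} \right\} 1 1'.
\end{aligned}
```

With Δ_i = 1 and a single censored subject, the conditional mean is σ times the observed transformed failure time and the conditional variance is 1 − σ², which is the case worked through in the size-two example of this section.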

Because the semiparametric normal transformation is monotonic, the likelihood for the observed data, X_ij, can be written in terms of the transformed terms, Y_ij. To do so, we use the fact that P(X_ij < x) = P(Y_ij < y), where y is the semiparametric normal transformation of x. Write the likelihood based on the observed data as L(σ, β, Λ). The likelihood contribution from cluster i is

$$\varphi^{\Delta_i}(Y_i^{\Delta_i})\, \tilde{\Phi}^{n_i - \Delta_i}(Y_i^{n_i - \Delta_i} \mid Y_i^{\Delta_i}) \prod_{j=1}^{n_i} \left[ f\{X_{ij} \mid \bar{Z}_{ij}(X_{ij})\} / \varphi(Y_{ij}) \right]^{\Delta_{ij}},$$

(3)

where φ^Δ_i is the multivariate normal density corresponding to its argument, Φ̃^n_i−Δ_i is the multivariate normal survival function corresponding to its argument, f is the density corresponding to Equation (1), and φ is the standard normal density function. If σ = 0, L(σ, β, Λ) reduces to the usual proportional hazards likelihood. A derivation of this likelihood is provided in the Appendix.

When Δ_i = 0 (all subjects in cluster i are censored) define φ^Δ_i = 1 and $\tilde{\Phi}^{n_i - \Delta_i}(Y_i^{n_i - \Delta_i} \mid Y_i^{\Delta_i}) = \tilde{\Phi}^{n_i - \Delta_i}(Y_i^{n_i - \Delta_i})$. When Δ_i = n_i (all subjects in cluster i have been observed to fail) define Φ̃^n_i−Δ_i = 1. With these conventions, Equation (3) also holds for Δ_i = 0 and Δ_i = n_i. L(σ, β, Λ) is the product of Equation (3) over i = 1, …, m.

To make the likelihood L(σ, β, Λ) more transparent, we consider an example likelihood contribution from a cluster of size two where one subject is observed to be censored at time C_A₁ and one subject is observed to fail at time T_A₂. The covariate process for each subject is denoted Z̄_A₁(C_A₁) and Z̄_A₂(T_A₂). The normally transformed observed failure times are Y_A₁ = Φ⁻¹[1– S{C_A₁ | Z̄_A₁(C_A₁)}] and Y_A₂ = Φ⁻¹[1 − S{T_A₂ | Z̄_A₂(T_A₂)}]. In this case

$$\Sigma = \begin{pmatrix} 1 & \sigma \\ \sigma & 1 \end{pmatrix}.$$

The first term of L(σ, β, Λ) can be written

$$\varphi^{1}_{Y_{A2}}(y_{A2}) = (2\pi)^{-1/2} \exp(-y_{A2}^{2}/2)$$

(4)

while the second term can be written

$$\tilde{\Phi}^{1}_{Y_{A1} \mid Y_{A2}}(y_{A1} \mid Y_{A2} = y_{A2}) = \int_{y_{A1}}^{\infty} \{2\pi(1 - \sigma^{2})\}^{-1/2} \exp[-(x - \sigma y_{A2})^{2} / \{2(1 - \sigma^{2})\}]\, dx.$$

(5)

Equation (4) is the density of a standard normal random variable, while Equation (5) corresponds to, conditional on Y_A₂ = y_A₂, the probability that a Normal(σy_A₂, 1−σ²) random variable is greater than y_A₁.
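The two terms in this example can be evaluated numerically as follows. This is a minimal sketch using only the Python standard library; the function name and the input values are hypothetical, and the transformed values are taken as given rather than computed from a fitted marginal model.

```python
import math
from statistics import NormalDist


def size_two_terms(y_censored, y_failed, sigma):
    """Return Equations (4) and (5) for one censored and one failed subject."""
    density_failed = NormalDist().pdf(y_failed)  # Equation (4)
    # Conditional law of the censored subject given the failed one:
    # Normal(sigma * y_failed, 1 - sigma^2).
    conditional = NormalDist(sigma * y_failed, math.sqrt(1.0 - sigma ** 2))
    survival_censored = 1.0 - conditional.cdf(y_censored)  # Equation (5)
    return density_failed, survival_censored


density, survival = size_two_terms(y_censored=0.2, y_failed=-0.5, sigma=0.3)
print(density * survival)
```

Setting sigma to zero makes the second term the marginal standard normal survival probability, so the censored subject then carries no information about the failed one.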

3.2 Estimation

We propose a two-stage method to estimate (σ, β, Λ). First we estimate β̂ and Λ̂ from the marginal proportional hazards model. We then solve max_σ L(σ, β̂, Λ̂) for σ̂. Formulas for the standard errors of β̂ and Λ̂ that account for clustering can be found using a sandwich formula (Spiekerman and Lin, 1998). The formula for β has been implemented in many statistical programs. The analytic standard error of σ̂ is complicated because it needs to account for the variability from β̂ and Λ̂. In practice the standard error can be estimated using a resampling scheme. To maintain the correlated structure of the failure times, the clusters should be the unit of removal for the resampling calculations (Cai et al, 1997; Cai and Shen, 2000). We choose to use the jackknife for resampling because theoretical validation of the method exists (Lipsitz, Dear, and Zhao, 1994; Lipsitz and Parzen, 1996).
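The cluster-level resampling described above can be sketched as a delete-one-cluster jackknife. The code below assumes the textbook (m − 1)/m scaling, which may differ by a finite-sample factor from the versions in the cited references; `jackknife_se`, `pooled_mean`, and the toy data are hypothetical names and values used only for illustration. In the proposed method the estimator would rerun both stages on each reduced dataset.

```python
import math


def jackknife_se(clusters, estimator):
    """Delete-one-cluster jackknife standard error of estimator(clusters)."""
    m = len(clusters)
    # Recompute the estimate with each whole cluster removed in turn.
    leave_one_out = [estimator(clusters[:i] + clusters[i + 1:]) for i in range(m)]
    center = sum(leave_one_out) / m
    variance = (m - 1) / m * sum((est - center) ** 2 for est in leave_one_out)
    return math.sqrt(variance)


def pooled_mean(clusters):
    """Toy estimator standing in for the two-stage estimate of sigma."""
    values = [x for cluster in clusters for x in cluster]
    return sum(values) / len(values)


toy_clusters = [[1.0], [2.0], [3.0], [4.0]]
print(jackknife_se(toy_clusters, pooled_mean))
```

For this toy estimator with one observation per cluster, the result coincides with the usual standard error of a sample mean, which is a convenient sanity check on the scaling.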

This estimation procedure is computationally straightforward. Marginal estimates of the survival function are available in all standard computing programs, while the likelihood for σ, L(σ, β̂, Λ̂), is proportional to a product of multivariate normal terms, quickly computable using existing software (e.g., the R package mvtnorm).
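The second stage can be sketched for the special case of clusters of size two. The code assumes the transformed values Y_ij have already been computed from stage-one estimates, drops factors of Equation (3) that do not involve σ, evaluates the doubly censored term by a simple trapezoid rule, and uses a coarse grid search in place of a proper optimizer. All function names are hypothetical, and larger clusters would need multivariate normal probabilities from dedicated software.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def pair_loglik(sigma, y1, d1, y2, d2, n_grid=100):
    """Sigma-dependent log-likelihood of one cluster of size two (normal scale)."""
    s2 = 1.0 - sigma ** 2
    s = math.sqrt(s2)
    if d1 and d2:  # two failures: bivariate normal density
        q = (y1 * y1 - 2.0 * sigma * y1 * y2 + y2 * y2) / s2
        return -math.log(2.0 * math.pi) - 0.5 * math.log(s2) - 0.5 * q
    if d1 or d2:  # one failure, one censored: Equations (4) and (5)
        y_fail, y_cens = (y1, y2) if d1 else (y2, y1)
        surv = 1.0 - STD.cdf((y_cens - sigma * y_fail) / s)
        return math.log(STD.pdf(y_fail)) + math.log(max(surv, 1e-300))
    # Two censored: P(Z1 > y1, Z2 > y2), integrating over Z1 by the trapezoid rule.
    upper = max(y1 + 1.0, 8.0)
    h = (upper - y1) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        x = y1 + k * h
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * STD.pdf(x) * (1.0 - STD.cdf((y2 - sigma * x) / s))
    return math.log(max(total * h, 1e-300))


def estimate_sigma(pairs, grid_step=0.05):
    """Grid-search maximizer of the stage-two log-likelihood over [-0.9, 0.9]."""
    n = int(round(0.9 / grid_step))
    grid = [k * grid_step for k in range(-n, n + 1)]

    def loglik(sigma):
        return sum(pair_loglik(sigma, *pair) for pair in pairs)

    return max(grid, key=loglik)
```

Each element of `pairs` is a tuple (y1, d1, y2, d2) of transformed times and failure indicators. Because zero lies inside the search grid, the same routine can return negative, zero, or positive estimates.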

4 A Frailty Model Extension

4.1 A marginalized frailty model

When σ > 0, the model for T̃_ij can be extended to allow for a frailty term:

$$\tilde{T}_{ij} = \sqrt{\sigma}\, b_i + \varepsilon_{ij},$$

(6)

where b_i is a cluster-level frailty and ε_ij is an error term. We assume that cluster-level frailties b_i have a standard normal distribution and the error terms ε_ij are independent and identically distributed N(0, 1 – σ) random variables that are independent of b_i. The cluster-level frailties b_i can be used to assess cluster-level differences. The β parameters in Equation (1) have marginal interpretation, while σ and b_i from Equation (6) characterize the cluster effect.

Equation (6) merges elements of frailty models with the marginal model, so we take a moment to review the interpretation of the components of this model. Larger values of σ imply that the frailty terms explain a larger portion of the variance in the T̃_ij compared to smaller values for σ. Larger values of σ provide evidence for a stronger cluster effect compared to smaller values for σ. In the context of a multi-center clinical trial, the cluster-level frailties (b_i) characterize the center effect. For a fixed set of covariates, smaller or more negative values for b_i are associated with shorter survival times (on the original untransformed scale) compared to larger or more positive values for b_i.
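A short moment calculation, using only the distributional assumptions stated for Equation (6), shows that the frailty representation reproduces the exchangeable structure of Section 2:

```latex
\begin{aligned}
\mathrm{Var}(\tilde{T}_{ij}) &= \sigma\,\mathrm{Var}(b_i) + \mathrm{Var}(\varepsilon_{ij}) = \sigma + (1 - \sigma) = 1, \\
\mathrm{Cov}(\tilde{T}_{ij}, \tilde{T}_{ik}) &= \sigma\,\mathrm{Var}(b_i) = \sigma, \qquad j \neq k.
\end{aligned}
```

The transformed times therefore remain standard normal with pairwise correlation σ, and σ is the share of their variance attributable to the cluster. The representation requires σ > 0 because it involves √σ.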

4.2 Prediction of the frailty terms

If σ̂ > 0, prediction of b_i and the associated standard error can be found using Laplace approximations to the b_i’s first two moments. Denote the observed data for the i^th cluster, (X_i₁, …, X_{in_i}, Δ_i₁, …, Δ_{in_i}, Z̄_i₁(X_i₁), …, Z̄_{in_i} (X_{in_i})), with Ψ_i. The conditional density of b_i given the observed data Ψ_i, denoted g(b_i | Ψ_i; σ, β, Λ), can be written

$$L_i^{-1} (2\pi)^{-1/2} \exp(-b_i^{2}/2) \prod_{j=1}^{n_i} \left[ f\{X_{ij} \mid \bar{Z}_{ij}(X_{ij})\} / \varphi(Y_{ij}) \right]^{\Delta_{ij}} \times \varphi_{\sigma}(Y_{ij} - \sqrt{\sigma}\, b_i)^{\Delta_{ij}}\, \tilde{\Phi}_{\sigma}(Y_{ij} - \sqrt{\sigma}\, b_i)^{1 - \Delta_{ij}},$$

where L_i is the likelihood for Ψ_i | σ, β, Λ. Define k_i such that $g(b_i \mid \Psi_i; \sigma, \beta, \Lambda) = L_i^{-1} \exp\{k_i(b_i \mid \Psi_i; \sigma, \beta, \Lambda)\}$. Using the Laplace approximations to the first two moments of g(b_i | Ψ_i; σ, β, Λ) (Booth and Hobert, 1998), the predicted estimate and variance of b_i are taken to be:

$$\hat{b}_i = E(b_i \mid \Psi_i) \approx \arg\max_{b_i} k_i(b_i \mid \Psi_i; \hat{\sigma}, \hat{\beta}, \hat{\Lambda})$$

(7)

$$V(b_i \mid \Psi_i) \approx -\ddot{k}_i(\hat{b}_i \mid \Psi_i; \hat{\sigma}, \hat{\beta}, \hat{\Lambda})^{-1},$$

(8)

where double superscript dots denote second derivatives.

The prediction of the shared frailties is straightforward. The expression for k_i(b_i | σ̂, β̂, Λ̂, Ψ_i) involves n_i normal terms and can be maximized using any standard optimization routine. The estimate of the variance of b_i has a closed form expression and can be found by plugging in relevant estimated quantities.
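Equations (7) and (8) can be sketched as follows. The code assumes that φ_σ and Φ̃_σ denote the density and survival function of the N(0, 1 − σ) error term, keeps only the parts of k_i that depend on b_i, maximizes by golden-section search, and approximates the second derivative by a central difference rather than the closed form mentioned above. Function names and tuning constants are hypothetical.

```python
import math
from statistics import NormalDist


def k_i(b, y, delta, sigma):
    """b-dependent part of log g(b_i | data): normal prior plus n_i normal terms."""
    var_eps = 1.0 - sigma
    resid = NormalDist(0.0, math.sqrt(var_eps))
    total = -0.5 * b * b
    for y_ij, d_ij in zip(y, delta):
        r = y_ij - math.sqrt(sigma) * b
        if d_ij:  # observed failure: log N(0, 1 - sigma) density
            total += -0.5 * math.log(2.0 * math.pi * var_eps) - r * r / (2.0 * var_eps)
        else:  # censored: log N(0, 1 - sigma) survival function
            total += math.log(max(1.0 - resid.cdf(r), 1e-300))
    return total


def predict_frailty(y, delta, sigma, lo=-6.0, hi=6.0, tol=1e-8, h=1e-4):
    """Equations (7) and (8): mode of k_i and minus its inverse second derivative."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:  # golden-section search; k_i is concave in b
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if k_i(c, y, delta, sigma) > k_i(d, y, delta, sigma):
            b = d
        else:
            a = c
    b_hat = 0.5 * (a + b)
    second = (k_i(b_hat + h, y, delta, sigma) - 2.0 * k_i(b_hat, y, delta, sigma)
              + k_i(b_hat - h, y, delta, sigma)) / (h * h)
    return b_hat, -1.0 / second
```

When every subject in a cluster is observed to fail, k_i is quadratic in b_i and the Laplace approximation is exact, giving mean √σ Σ_j Y_ij / {1 + (n_i − 1)σ} and variance (1 − σ) / {1 + (n_i − 1)σ}; this provides a check on any implementation.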

5 Theoretical Results

The following theorems establish the theoretical properties of (σ̂, β̂, Λ̂) where their true values are denoted with (σ₀, β₀, Λ₀).

Theorem 1

Under Conditions C.1 – C.6 in the Appendix, (σ̂, β̂, Λ̂) converges in probability to (σ₀, β₀, Λ₀) as m → ∞.

Theorem 2

Under Conditions C.1 – C.7 in the Appendix, as m → ∞, $\sqrt{m}(\hat{\sigma} - \sigma_0)$ and $\sqrt{m}(\hat{\beta} - \beta_0)$ converge to zero-mean normal distributions and $\sqrt{m}\{\hat{\Lambda}(t) - \Lambda_0(t)\}$ converges to a zero-mean Gaussian process.

Proofs of Theorems 1 and 2 can be found in the Appendix. The proofs of both theorems for σ̂ adjust for the two-stage estimating procedure. These theorems verify that σ̂ is consistent and asymptotically normal when plug-in estimates of β and Λ are used in the likelihood function.

6 Simulation Results

Simulations were conducted to evaluate the efficacy of the proposed method. The presented simulations have marginal survival times from a proportional hazards model with a constant baseline hazard function equal to 1 and with two covariates: one Bernoulli(0.5) covariate with parameter equal to log(0.5) (denoted β₁) and one Uniform(0,1) covariate with parameter equal to 0.75 (denoted β₂). Censoring times were taken from the Exponential(median=3) distribution and produced about a 25% censoring rate. Correlated survival times were created by first generating random correlated multivariate normal values. The normal values were transformed to the survival scale using Equation (2). Each simulation is based on 250 replications.
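The data-generating steps above can be sketched as follows. The code builds the correlated normal values from a shared cluster term, which is an assumption that only covers σ ≥ 0 (negative correlations would need a direct multivariate normal draw), and inverts Equation (2) under the constant baseline hazard of 1 used in the simulations. The function name and the tuple layout are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_cluster(n_i, sigma, rng, beta1=math.log(0.5), beta2=0.75):
    """One cluster of (X, Delta, Z1, Z2) rows with exchangeable correlation sigma >= 0."""
    std = NormalDist()
    censor_rate = math.log(2.0) / 3.0  # exponential censoring with median 3
    shared = rng.gauss(0.0, 1.0)       # cluster-level term shared by all members
    rows = []
    for _ in range(n_i):
        z1 = 1.0 if rng.random() < 0.5 else 0.0  # Bernoulli(0.5) covariate
        z2 = rng.random()                        # Uniform(0, 1) covariate
        t_tilde = math.sqrt(sigma) * shared + rng.gauss(0.0, math.sqrt(1.0 - sigma))
        # Invert Equation (2) with Lambda(t) = t: S(t | z) = 1 - Phi(t_tilde).
        surv = max(1.0 - std.cdf(t_tilde), 1e-300)
        t = -math.log(surv) / math.exp(beta1 * z1 + beta2 * z2)
        c = rng.expovariate(censor_rate)
        rows.append((min(t, c), int(t <= c), z1, z2))
    return rows


rng = random.Random(2010)
print(simulate_cluster(3, 0.15, rng))
```

Each call returns one cluster; repeating it over clusters of varying size gives a dataset with the marginal proportional hazards structure of Equation (1) and within-cluster correlation σ on the normal scale.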

In order to focus on the novel elements of the method, we present results for β₁ and β₂ in the Appendix. As has been shown by other authors, our simulations verify that estimates of β₁ and β₂ have little bias and appropriate coverage probability.

For our results for σ, we summarize scenarios with 45, 60, and 90 clusters. Within each replication, clusters varied in size between 2 and 7 units. Standard errors (SEs) for σ were found using the jackknife. We chose to use the jackknife for resampling because theoretical validation of the method exists for multivariate survival data (Lipsitz, Dear, and Zhao, 1994; Lipsitz and Parzen, 1996). Bias, SEs, and coverage probabilities are summarized in Table 1. Power results are summarized in Figure 1.

Table 1.

Simulation results for the correlation terms

	Estimate	Jackknife SE	Monte Carlo SE	Coverage Probability
45 clusters
σ = 0	−0.001	0.091	0.091	0.904
σ = 0.05	0.058	0.104	0.099	0.924
σ = 0.10	0.108	0.110	0.100	0.932
σ = 0.15	0.160	0.112	0.118	0.888
60 clusters
σ = 0	0.000	0.076	0.078	0.942
σ = 0.05	0.049	0.086	0.082	0.928
σ = 0.10	0.104	0.094	0.083	0.944
σ = 0.15	0.154	0.096	0.091	0.964
90 clusters
σ = 0	0.002	0.060	0.059	0.952
σ = 0.05	0.049	0.067	0.067	0.900
σ = 0.10	0.108	0.072	0.068	0.960
σ = 0.15	0.151	0.075	0.077	0.944
σ = 0.50	0.523	0.045	0.044	0.924
σ = −0.10	−0.109	0.067	0.068	0.930

Fig. 1 — Power curves as a function of σ and number of clusters.

The estimates of σ have little bias across the simulations. The jackknife SE is close to the Monte Carlo SE across the scenarios. With 45 clusters, parameter estimates remain unbiased, but the coverage probability drops below nominal levels. As expected, as the sample size and correlation increase, the power for testing whether σ ≠ 0 increases.

We also conducted simulations to assess the performance of our frailty estimation method (Equations (7) and (8)). In these simulations, we took σ = 0.5 to ensure all estimates of σ were positive so that frailties could be predicted. Data were generated using Equation (6), to provide true values for the frailty terms. The parameters σ and β were estimated using our proposed two-stage method and their estimates were used in Equations (7) and (8) to predict frailty values and calculate standard errors. We summarize results for simulations with 90 clusters and a range of cluster sizes: cluster sizes varying between 3 and 10 with median size 5, cluster sizes varying between 7 and 20 with median size 10, cluster sizes varying between 10 and 33 with median size 15, and cluster sizes varying between 13 and 50 with median size 20. Results are summarized in Table 2.

Table 2.

Summary of frailty simulations. Relative bias=bias/predicted value

	Median relative bias	SE
Median cluster size = 5	0.06	1.07
Median cluster size = 10	−0.05	1.02
Median cluster size = 15	−0.05	1.01
Median cluster size = 20	−0.06	1.01

The method performs well even with small cluster sizes. The relative bias is small. As the cluster sizes increase, the likelihood SEs approach 1, their true value.

7 Data Application: Children’s Oncology Group Study 1961

7.1 Is there evidence of correlation between the survival times of patients within the same institution?

We applied our method to a Children’s Oncology Group (COG) study (protocol number 1961) (Seibel et al, 2008). We analyzed 460 children with enlarged livers from 104 institutions. The goal of the clinical trial was to test whether either an increase in the strength or an increase in the duration of the standard chemotherapy regimen was associated with improved survival for “higher risk” acute lymphoblastic leukemia patients. A 2×2 factorial design was used. The distribution of subjects with enlarged livers among the four arms is presented in Table 3. The number of subjects in each of the 104 institutions is summarized in the histogram in Figure 2. We analyzed the overall survival endpoint.

Table 3.

Distribution of patients with enlarged livers across treatment arms

Strength	Duration	Number of Patients
Standard	Standard	119
Standard	Double	104
Increased	Standard	117
Increased	Double	120

Fig. 2 — A histogram of the number of patients in each institution.

We present regression results in Table 4. Standard errors for the marginal survival covariates were found using three methods: a sandwich formula, the jackknife, and a naïve estimate ignoring potential clustering. Standard errors for σ were found using the jackknife. P-values were found using the jackknife standard errors.

Table 4.

Data analysis results using proposed model

Parameter	Estimate	SE_J	P-value	SE_S	Naive SE
Treatment Only Model
Correlation	0.173	0.159	0.279	-	-
Increased Strength	−0.430	0.231	0.063	0.225	0.247
Increased Duration	0.419	0.231	0.070	0.226	0.245
Larger Model
Correlation	0.207	0.200	0.300	-	-
Increased Strength	−0.249	0.482	0.605	0.440	0.374
Increased Duration	0.551	0.387	0.154	0.360	0.326
Interaction	−0.321	0.704	0.648	0.646	0.498
Age
1–9	(ref)
10–15	0.353	0.234	0.132	0.227	0.256
16+	0.485	0.739	0.512	0.565	0.451
Platelets (×10³/mm³)
1–49	(ref)
50–150	0.397	0.305	0.193	0.290	0.258
150+	−0.145	0.651	0.824	0.574	0.528

Estimate = log hazard ratios and estimates of σ

SE_J = jackknife based standard errors

P-value = two-sided p-value using SE_J

SE_S = sandwich-formula based standard errors (only for β)

Naive SE = SE assuming independence (only for β)
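The two-sided p-values in Table 4 appear consistent with a Wald test that refers the estimate divided by its jackknife standard error to a standard normal distribution. The reference distribution is an inference from the reported numbers rather than something stated in the text, and the function name below is hypothetical.

```python
from statistics import NormalDist


def wald_p_value(estimate, se):
    """Two-sided p-value for H0: parameter = 0 under a normal approximation."""
    z = abs(estimate / se)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Rounded entries from Table 4: increased strength (treatment only model)
# and the correlation term (larger model).
print(wald_p_value(-0.430, 0.231))
print(wald_p_value(0.207, 0.200))
```

Because the table entries are rounded, the recomputed p-values can differ from the reported ones in the third decimal place.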

In a model with two covariates for treatment, there was some evidence that increased duration of chemotherapy was associated with worse overall survival compared to the standard duration. There was also evidence that overall survival was improved for patients with increased strength of chemotherapy compared to standard strength. The estimate for the correlation of the transformed failure times was 0.173 with a standard error of 0.159.

We also considered a larger model including the two treatment covariates, their interaction, and the prognostic factors of age and platelet count at registration. In this larger model, none of the covariates appeared to be associated with overall survival. The correlation term of this larger model was of similar magnitude as the treatment only model: the estimate was 0.207 with a standard error of 0.200.

7.2 Is there further information that can be gained by considering frailty terms?

The standard errors for the correlation term (σ̂) were large in both models summarized in Table 4, so we felt it would be useful to investigate whether there were any outliers contributing to the large standard error. In both regression models presented, the estimate of the correlation (σ̂) was positive, so we were able to predict frailty terms for each institution. The predicted frailty values were very similar between the two models, so we only present the frailties from the larger model.

A qq-plot for frailty values standardized by their standard errors indicated several potential outliers (Figure 3). The largest positive standardized frailties were from institutions with only one patient, so we disregarded those institutions as potential outliers. The largest negative frailties were from institutions with 10 and 22 patients, so we investigated these institutions further.

Fig. 3 — A qq-plot of frailties from proposed model standardized by their standard errors. Two potential outlier frailties are marked with crosses.

A plot of the predicted frailty values by institution size is provided in Figure 4. The two institutions identified as potential outliers in the qq-plot are marked in Figure 4, and the institutions still appear to be outliers. Among the other institutions there appears to be a positive trend, where institutions that contributed more patients had larger frailty values. Neither of the two potential outlier institutions follows this trend.

Fig. 4 — A plot of institution size by frailty value from proposed model. The two frailties marked with crosses in Figure 3 are marked with crosses in this Figure.

Given these results, we were interested in whether these trends could be explained by the type of patients at the two institutions. We used Fisher’s Exact test to compare known prognostic factors between each of the potential outlying institutions and all other patients. Given the limited sample sizes, we recognized that the power of the tests may be limited. We compared the following baseline characteristics: spleen enlargement, race, nodes (normal, moderately enlarged, significantly enlarged), mediastinal mass, white blood cell count (less than 50, 50–199, 200 or more; all ×10³/mm³), hemoglobin count (1 to 7.9, 8.0 to 10.9, 11.0 or more; all g/dL), platelet count (1 to 49, 50 to 149, 150 or more; all ×10³/mm³), and age (1–9, 10–15, 16 and older). Only one test had a p-value < 0.05: the comparison of age with the institution with ten patients.

The larger regression model in Table 4 controls for age, and age was not significantly associated with survival in the model. Given this result, there may be other factors beyond the available patient data that explain why these institutions have unusual frailty values.

To evaluate the impact of these two institutions on the estimate of the correlation, we refit our models excluding patients from the two institutions marked in Figures 3 and 4. A summary of the original and refit correlation estimates (σ̂) is provided in Table 5. The correlation estimate was smaller in the subset compared to the full dataset, though the change was more pronounced in the treatment only model.

Table 5.

Correlation estimates from proposed model for the full data and for data excluding two potential outliers.

	Treatment Only Model		Larger Model
	All Data	Subset	All Data	Subset
Correlation	0.17	0.03	0.21	0.12
SE	0.16	0.13	0.20	0.16
P-value	0.28	0.81	0.30	0.44

The correlation estimates for the subset data are positive, so frailty terms could be predicted for each of the institutions. A qq-plot of the frailties standardized by their standard errors and a plot of the frailty terms versus the institution size are provided in Figure 5. There do not appear to be any outliers in the model fit with the subset of the data. The frailty results look very similar to the results in Figures 3 and 4 excluding the two marked values.

Fig. 5 — A qq-plot of standardized frailties from proposed model (left) and a plot of institution size by frailty value from proposed model (right) for the subset data.

In this subset of the data, there is no evidence that survival times are correlated within institution. Our available data do not explain why two institutions may be outliers.

7.3 An alternative analysis using a gamma frailty model

We applied the gamma frailty model to the COG dataset. Results from an analysis with the two treatment covariates and an analysis including a treatment interaction and controlling for age and platelet count are provided in Table 6. The estimate for covariates is the log hazard ratio.

Table 6.

Data analysis results using gamma frailty model

Parameter	Estimate	SE	P-value
Treatment Only Model
Frailty variance	0.440	0.374	0.12
Increased Strength	−0.455	0.251	0.071
Increased Duration	0.498	0.252	0.048
Larger Model
Frailty variance	0.443	0.361	0.11
Increased Strength	−0.324	0.383	0.40
Increased Duration	0.594	0.337	0.08
Interaction	−0.256	0.512	0.62
Age
1–9	(ref)
10–15	0.418	0.264	0.11
16+	0.558	0.457	0.24
Platelets (×10³/mm³)
1–49	(ref)
50–150	0.361	0.266	0.17
150+	−0.087	0.541	0.11

In the gamma frailty model, the strength of the correlation is measured by the variance of the gamma frailties. In both the treatment only and larger models, the estimate for the frailty variance is not significant. As with our results, these two gamma frailty models do not show evidence of clustering within institution.

The covariate parameter values from the proposed model are the same as would be found using a marginal proportional hazards model treating the potential clustering as a nuisance. Therefore the parameters from the proposed model have population interpretation as an average effect across all the patients in the study.

In contrast, due to the functional specification of the gamma frailty hazard function, covariate parameter values from a gamma frailty model are interpreted as conditional on the value of the frailty and do not represent an average effect. When there is no correlation within clusters, the proposed model and the gamma frailty model both reduce to the proportional hazards model. Given the low evidence of clustering in this dataset, it is not surprising that the covariate parameter estimates are close in Tables 4 and 6. Given that the COG study was interested in population averaged hazard ratios, the parameters in Table 4 provide a more appropriate interpretation.

As with the proposed method, we plotted the log frailty values from the gamma frailty model versus the institution size in Figure 6. We note that with the proposed method, large positive frailties are associated with improved survival, in contrast to the gamma frailty model, in which large frailties are associated with increased hazards. The two institutions identified as potential outliers with the proposed method are marked with crosses. In Figure 6 there does not appear to be a relationship between log frailty value and institution size. The two marked institutions do not appear to have unusual frailty values given the trends in the data.

Fig. 6. A plot of institution size against the log frailty value from the gamma frailty model. Institutions identified as outliers by the proposed method are marked with crosses.

8 Discussion

There is a need for flexible survival regression models that allow for marginal interpretations of treatment or exposure, while concurrently evaluating potential clustering. The method proposed here establishes a general likelihood framework for this type of analysis. Marginal treatment or exposure effects are modeled with a proportional hazards model, while the correlation between survival times is described by a Gaussian copula. When the correlation between the transformed survival times is positive, the model can be extended to incorporate frailties. We believe this model can provide an easy graphical method to identify potentially “abnormal” clusters.

When the model was applied to a Children’s Oncology Group dataset, we were able to identify two potential outlying institutions. Baseline patient characteristics were unable to explain the observed trends. More information on patients or on the institutions may be needed to understand the underlying issues.

One area of future research is investigating the relative efficiency of using a non-parametric maximum likelihood approach in contrast to the two-stage approach used in this paper. Also, it might be useful to develop a more flexible frailty model that allows for covariates so that factors influencing correlation can be examined within a regression framework.

Acknowledgments

This work was supported in part by National Cancer Institute grants R01 CA95747 and CA09337-25. The authors thank Jim Anderson for providing the Children’s Oncology Group dataset and advice he gave for the data application.

A Regularity Conditions and Notation

Assume the following regularity conditions where τ > 0 is a constant (for example, study duration):

C.1
β is in a compact subset of ℝ^p
C.2
Λ(τ) < ∞
C.3
σ ∈ ν, where ν is a compact subset of (−1, 1)
C.4
P(C_ij ≥ t ∀t ∈ [0, τ] | Z_ij) > δ_c > 0 for j = 1, …, n_i and i = 1, …, m
C.5
Write Z_ij(t) = {Z_ij₁(t), …, Z_ijp(t)}. $∣ Z_{ijk} (0) ∣ + \int_{0}^{τ} ∣ {d Z}_{ijk} (t) ∣ \leq B_{Z} < \infty$ almost surely for some constant B_Z and i = 1, …, m, j = 1, …, n_i, k = 1, …, p
C.6
E[log{L(σ₁; β, Λ)/L(σ₂; β, Λ)}] exists for all σ₁, σ₂ ∈ (−1, 1)
C.7
Let Y_ij(t) = I(X_ij ≥ t), K = max_i n_i, a^⊗0 = 1, a^⊗1 = a, a^⊗2 = a′a,
$\begin{matrix} Q_{j}^{(κ)} (β, t) = m^{- 1} \sum_{i = 1}^{m} Y_{i j} (t) \exp \{β^{'} Z_{i j} (t)\} Z_{i j} {(t)}^{\otimes κ}, \quad q_{j}^{(κ)} (β, t) = E \{Q_{j}^{(κ)} (β, t)\}, \\ η_{j} (β, t) = \frac{q_{j}^{(1)} (β, t)}{q_{j}^{(0)} (β, t)}, \quad ϱ_{j} (β, t) = \frac{q_{j}^{(2)} (β, t)}{q_{j}^{(0)} (β, t)} - η_{j} {(β, t)}^{\otimes 2} \quad \text{for } j = 1, \dots, K . \end{matrix}$

Assume $\sum_{j = 1}^{K} \int_{0}^{τ} ϱ_{j} (β_{0}, t) q_{j}^{(0)} (β_{0}, t) λ_{0} (t) d t$ is positive definite

Condition C.3 allows us to avoid boundary issues. Condition C.5 assumes that all the covariates are of bounded variation, which is necessary to ensure the Hadamard differentiability of the likelihood and score function. Condition C.6 is useful to help prove that the expected likelihood is maximized at σ₀. Condition C.7 is a technical condition from Spiekerman and Lin (1998) that is needed for the results for β̂ and Λ̂.

Before providing technical results, we will provide a brief description of how the likelihood in Equation (3) can be derived. Let S_ij (x) = S{x | Z̄_ij (X_ij)}. We consider a cluster of size n_i with Δ_i subjects who are not censored. The contribution to the likelihood will be the density function for X_i₁, …, X_{iΔ_i} and the survival function for X_{iΔ_i+1}, …, X_{in_i}. We have specified a copula structure, so we can write the survival function for (X_i₁, …, X_{in_i}) as follows:

\begin{array}{l} P (X_{i 1} > x_{i 1}, \dots, X_{{i n}_{i}} > x_{{i n}_{i}}) = P {1 - S_{i 1} (X_{i 1}) > 1 - S_{i 1} (x_{i 1}), \dots, 1 - S_{{i n}_{i}} (X_{{i n}_{i}}) > 1 - S_{{i n}_{i}} (x_{{i n}_{i}})} \\ = P [Φ^{- 1} {1 - S_{i 1} (X_{i 1})} > Φ^{- 1} {1 - S_{i 1} (x_{i 1})}, \dots, Φ^{- 1} {1 - S_{{i n}_{i}} (X_{{i n}_{i}})} > Φ^{- 1} {1 - S_{{i n}_{i}} (x_{{i n}_{i}})}] \\ = P (Y_{i 1} > y_{i 1}, \dots Y_{{i n}_{i}} > y_{{i n}_{i}}) \\ = P (Y^{Δ_{i}} > y^{Δ_{i}}, Y^{n_{i} - Δ_{i}} > y^{n_{i} - Δ_{i}}) \end{array}
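As a rough illustration of the copula construction above, the following Python sketch draws clustered survival times whose probit-transformed values Y_ij = Φ⁻¹{1 − S_ij(X_ij)} are exchangeable normal with correlation σ. The exponential margin, the rate value, the absence of covariates and censoring, and all function names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def simulate_cluster_times(m, n, sigma, rate=1.0, seed=0):
    """Draw m clusters of n survival times under a Gaussian copula.

    Assumptions (illustrative only): exponential margins with a common
    rate, no covariates, no censoring, exchangeable correlation sigma.
    """
    rng = np.random.default_rng(seed)
    cov = (1.0 - sigma) * np.eye(n) + sigma * np.ones((n, n))
    y = rng.multivariate_normal(np.zeros(n), cov, size=m)
    # Y = Phi^{-1}{1 - S(X)}  <=>  S(X) = 1 - Phi(Y) = norm.sf(Y)
    surv = norm.sf(y)
    # Exponential margin: S(x) = exp(-rate * x)  =>  x = -log(S) / rate
    x = -np.log(surv) / rate
    return x, y

x, y = simulate_cluster_times(m=5000, n=3, sigma=0.4)
corr_y = np.corrcoef(y, rowvar=False)  # empirical correlation of the Y's
```

Because the transformation from Y to X is increasing, larger values of Y correspond to longer survival times, and the within-cluster correlation of the transformed values should be close to the chosen σ.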

We can write the density of (X_i₁, …, X_{iΔ_i}), denoted f_{Δ_i} (x_i₁, …, x_{iΔ_i}) = f(x_i₁, …, x_{iΔ_i} | Z̄_i₁(x_i₁), …, Z̄_{in_i} (x_{in_i})), in terms of Y^Δ_i using the chain-rule:

\begin{array}{l} f_{Δ_{i}} (x_{i 1}, \dots, x_{i Δ_{i}}) = {(- 1)}^{Δ_{i}} \partial^{Δ_{i}} P (X_{i 1} > x_{i 1}, \dots, X_{i Δ_{i}} > x_{i Δ_{i}}) / \partial x_{i 1} \cdots \partial x_{i Δ_{i}} \\ = {(- 1)}^{Δ_{i}} \partial^{Δ_{i}} P (Y^{Δ_{i}} > y^{Δ_{i}}) / \partial x_{i 1} \cdots \partial x_{i Δ_{i}} \\ = {(- 1)}^{Δ_{i}} \frac{\partial^{Δ_{i}} P (Y^{Δ_{i}} > y^{Δ_{i}})}{\partial y_{i 1} \cdots \partial y_{i Δ_{i}}} \frac{\partial y_{i 1}}{\partial x_{i 1}} \cdots \frac{\partial y_{i Δ_{i}}}{\partial x_{i Δ_{i}}} \\ = φ^{Δ_{i}} (y^{Δ_{i}}) \frac{\partial y_{i 1}}{\partial x_{i 1}} \cdots \frac{\partial y_{i Δ_{i}}}{\partial x_{i Δ_{i}}} . \end{array}

For j = 1, …, Δ_i:

\begin{array}{l} \partial y_{j} / \partial x_{j} = \partial Φ^{- 1} \{1 - S_{j} (x_{j})\} / \partial x_{j} \\ = \frac{1}{φ [Φ^{- 1} \{1 - S_{j} (x_{j})\}]} \frac{\partial \{1 - S_{j} (x_{j})\}}{\partial x_{j}} \\ = \frac{1}{φ (y_{j})} \frac{\partial \{1 - S_{j} (x_{j})\}}{\partial x_{j}} \\ = \frac{1}{φ (y_{j})} f_{j} (x_{j}) . \end{array}

The rest of Equation (3) can be found by combining these calculations with survival terms for subjects who are censored.
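The Jacobian term ∂y_j/∂x_j = f_j(x_j)/φ(y_j) derived above can be checked numerically. The sketch below does so for a single exponential margin; the hazard rate, the evaluation point, and the function names are assumptions chosen only for the check.

```python
import numpy as np
from scipy.stats import norm

rate = 0.7  # hypothetical exponential hazard, chosen only for the check

def surv(x):
    # Marginal survival function S(x)
    return np.exp(-rate * x)

def dens(x):
    # Marginal density f(x) = -dS/dx
    return rate * np.exp(-rate * x)

def probit(x):
    # y = Phi^{-1}{1 - S(x)}
    return norm.ppf(1.0 - surv(x))

x0 = 1.3
h = 1e-6
# Central finite difference of the transformation y(x)
numeric = (probit(x0 + h) - probit(x0 - h)) / (2.0 * h)
# Closed form from the derivation: f(x) / phi(y)
analytic = dens(x0) / norm.pdf(probit(x0))
```

The two quantities should agree up to finite-difference error, which supports the last line of the derivation.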

To simplify the presentation of the proofs we define several terms. Define

L^{*} (σ, β, Λ) = \prod_{i = 1}^{m} φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}}),

where L(σ, β, Λ) = c^* L^*(σ, β, Λ) and c^* does not depend on σ. Let

\begin{array}{l} l_{m 0} (σ) = m^{- 1} \log L^{*} (σ, β_{0}, Λ_{0}), \quad l_{m} (σ) = m^{- 1} \log L^{*} (σ, β, Λ), \\ {\hat{l}}_{m} (σ) = m^{- 1} \log L^{*} (σ, \hat{β}, \hat{Λ}), \quad U_{m 0} (σ) = \partial l_{m 0} (σ) / \partial σ, \quad \text{and} \quad U_{m} (σ) = \partial l_{m} (σ) / \partial σ . \end{array}

Expectations are with respect to the true distributions of all random variables involved. Let || · || denote the Euclidean norm and let || · ||_∞ denote the supremum norm on [0, τ]. Let BV [0, τ] denote the class of functions with bounded total variation on [0, τ]. Let single superscript dots denote first derivatives and double superscript dots denote second derivatives.

B Proof and Associated Lemmas for Theorem 1

For ease of presentation we state several lemmas used in the proof of Theorem 1, but defer their proof until Appendix D.

To account for the fact that plug-in estimates of β and Λ are used in the likelihood for σ, we will need to take a Taylor series expansion of the likelihood of σ around β₀ and Λ₀. Since Λ₀ is an unspecified function, this expansion will need to include a functional expansion term. An expansion using Hadamard derivatives is appropriate for this situation. In order to use the functional expansion, we first need to verify that the log-likelihood is Hadamard differentiable with respect to Λ, which is done in Lemma 1.

Lemma 1

Under conditions C.1–C.5, the log-likelihood l_m(σ) is Hadamard differentiable with respect to Λ.

After we have an expansion of the log-likelihood we will need the first order terms to be bounded by a random variable with finite expectation. We provide this verification in Lemma 2.

Lemma 2

Write the Hadamard derivative of l_m(σ) with respect to Λ at ϒ ∈ BV [0, τ] as $\int_{0}^{τ} ζ_{m} (σ, Λ) (u) d ϒ (u)$ and let ζ_m(β, σ) = ∂l_m(σ)/∂β. Under conditions C.1–C.5, ||ζ_m(σ, Λ)||_∞ and ||ζ_m(β, σ)|| are bounded. Expressions for ζ_m(β, σ) and ζ_m(σ, Λ) are provided in the proof.

In order to prove that σ̂ is consistent we will need to verify the uniform convergence of the log-likelihood with plug-in estimates of β and Λ to E{l_m₀(σ)}, the expected value of the log-likelihood evaluated at the true values of β and Λ. We accomplish this, using the results of Lemmas 1 and 2, in Lemma 3.

Lemma 3

Under conditions C.1–C.5, as m → ∞,

sup_{σ \in ν} ∣ {\hat{l}}_{m} (σ) - E {l_{m 0} (σ)} ∣ = o_{p} (1) .

Finally, in order to verify that σ̂ is consistent, we will need to show that the expected log-likelihood is maximized at the truth, which is done in Lemma 4.

Lemma 4

Under conditions C.1–C.6, for any σ ≠ σ₀,

E {l_{m 0} (σ)} - E {l_{m 0} (σ_{0})} < 0.

Proof of Theorem 1

The results for β̂ and Λ̂ follow from arguments along the lines of Spiekerman and Lin (1998). We use the results of Lemmas 3 and 4 to prove the result for σ̂.

Since σ̂ maximizes l̂_m(σ), Lemma 3 implies that

\begin{array}{c} 0 \leq {\hat{l}}_{m} (\hat{σ}) - {\hat{l}}_{m} (σ_{0}) = {\hat{l}}_{m} (\hat{σ}) - {\hat{l}}_{m} (σ_{0}) + E {l_{m 0} (σ_{0})} - E {l_{m 0} (σ_{0})} \\ = {\hat{l}}_{m} (\hat{σ}) - E {l_{m 0} (σ_{0})} + o_{p} (1) . \end{array}

Therefore E{l_m₀(σ₀)} ≤ l̂_m(σ̂) + o_p(1). Subtract E{l_m₀(σ̂)} from each side of the inequality to write

E {l_{m 0} (σ_{0})} - E {l_{m 0} (\hat{σ})} \leq {\hat{l}}_{m} (\hat{σ}) - E {l_{m 0} (\hat{σ})} + o_{p} (1) \leq sup_{σ \in ν} ∣ {\hat{l}}_{m} (σ) - E {l_{m 0} (σ)} ∣ + o_{p} (1) = o_{p} (1),

(9)

where the last equality comes from Lemma 3.

Take σ such that |σ − σ₀| ≥ ε for any fixed ε > 0. By Lemma 4 there must exist some γ_ε > 0 such that E{l_m₀(σ)} + γ_ε < E{l_m₀(σ₀)}. It follows that P(|σ̂ − σ₀| ≥ ε) ≤ P[E{l_m₀(σ̂)} + γ_ε < E{l_m₀(σ₀)}]. Equation (9) implies that P[E{l_m₀(σ̂)} + γ_ε < E{l_m₀(σ₀)}] converges to 0 as m → ∞. Therefore P(|σ̂ − σ₀| ≥ ε) converges to 0 as m → ∞.
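The consistency result can be illustrated with a small simulation. The sketch below is a simplified stand-in for the two-stage procedure, not the estimator analyzed in the proof: the first stage uses a rank-based empirical distribution function in place of the Cox-model estimates of β and Λ, there are no covariates or censoring, all clusters have the same size, and the margins are unit-rate exponential. All function names, the seed, and the parameter values are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata

def neg_loglik(sigma, y):
    """Negative exchangeable-normal log-likelihood (constants dropped)."""
    n = y.shape[1]
    a = 1.0 + (n - 1) * sigma
    # log det(Sigma) for Sigma = (1 - sigma) I + sigma J
    log_det = (n - 1) * np.log(1.0 - sigma) + np.log(a)
    # y' Sigma^{-1} y, using the closed-form inverse of Sigma
    quad = (np.sum(y ** 2, axis=1) - sigma * np.sum(y, axis=1) ** 2 / a) / (1.0 - sigma)
    return 0.5 * np.sum(log_det + quad)

def two_stage_sigma(x):
    """Stage 1: rank-based margin estimate; stage 2: maximize over sigma."""
    m, n = x.shape
    # Stage 1 (stand-in for the Cox fit): empirical distribution function,
    # rescaled by N + 1 so the probit transform stays finite.
    u = rankdata(x.ravel()).reshape(m, n) / (m * n + 1.0)
    y_hat = norm.ppf(u)
    # Stage 2: plug the transformed times into the copula likelihood.
    lower = -1.0 / (n - 1) + 1e-3  # keeps Sigma positive definite
    res = minimize_scalar(neg_loglik, bounds=(lower, 0.99),
                          args=(y_hat,), method="bounded")
    return res.x

def simulate(m, n, sigma0, rng):
    cov = (1.0 - sigma0) * np.eye(n) + sigma0 * np.ones((n, n))
    y = rng.multivariate_normal(np.zeros(n), cov, size=m)
    return -np.log(norm.sf(y))  # unit-rate exponential margins

rng = np.random.default_rng(2010)
sigma0 = 0.3
estimates = {m: two_stage_sigma(simulate(m, 4, sigma0, rng))
             for m in (50, 500, 5000)}
```

Under these assumptions the entries of `estimates` should settle near σ₀ = 0.3 as the number of clusters m grows, which is the behavior Theorem 1 describes.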

C Proof and Associated Lemmas for Theorem 2

For ease of presentation we state several lemmas used in the proof of Theorem 2, but defer their proof until Appendix D.

To account for the fact that plug-in estimates of β and Λ are used in the likelihood and score function for σ, we will need to take a Taylor series expansion of the score function for σ around β₀ and Λ₀. To do so we first need to verify that the score function is Hadamard differentiable with respect to Λ, which is done in Lemma 5.

Lemma 5

Under conditions C.1–C.5, the score function U_m(σ) is Hadamard differentiable with respect to Λ.

After we have an expansion of the score function for σ, we will need the first order terms to be bounded by a random variable with finite expectation. We provide this verification in Lemma 6.

Lemma 6

Write the Hadamard derivative of U_m(σ) with respect to Λ at ϒ ∈ BV [0, τ] as $\int_{0}^{τ} ξ_{m} (σ, Λ) (u) d ϒ (u)$ and let ξ_m(σ, β) = ∂U_m(σ)/∂β. Under conditions C.1–C.5, ||ξ_m(σ, Λ)||_∞ and ||ξ_m(σ, β)|| are bounded. Expressions for ξ_m(σ, β) and ξ_m(σ, Λ) are provided in the proof.

Proof of Theorem 2

The results that $\sqrt{m} (\hat{β} - β_{0})$ converges to a mean-zero normal distribution and that $\sqrt{m} (\hat{Λ} - Λ_{0})$ converges to a mean-zero Gaussian process follow from arguments along the lines of Spiekerman and Lin (1998). This proof needs to verify that $\sqrt{m} (\hat{σ} - σ_{0})$ converges to a normal distribution with mean zero after accounting for the extra variance induced by the two-stage estimation procedure. Compared with a model in which the true values β₀ and Λ₀ are used, the variance of σ̂ must be adjusted to account for the estimation of β and Λ.

First we will show that the score equation associated with l̂_m evaluated at σ₀ follows a normal distribution. This result coupled with a first order expansion of the score equation associated with l̂_m around σ₀ will finish the proof.

Using Lemma 5, a Taylor series expansion of Û_m(σ) around β₀ and Λ₀ gives

{\hat{U}}_{m} (σ_{0}) = U_{m} (σ_{0}) + \int_{0}^{τ} ξ_{m} (σ_{0}, Λ) (t) \, d \{\hat{Λ} (t) - Λ_{0} (t)\} + ξ_{m} (σ_{0}, β) (\hat{β} - β_{0}) + G_{m},

where G_m is a remainder term for the Taylor series. Since Λ̂ and β̂ are $\sqrt{m}$-consistent it can be shown that G_m = o_p(m^{−1/2}). Define the pointwise limit of ξ_m(σ, Λ)(t) as ξ(σ, Λ)(t) and let ξ(σ, β) = E{ξ_m(σ, β)}. From Lemma 6, ||ξ(σ₀, Λ)||_∞ and ||ξ(σ, β)|| are bounded. It follows that

\sqrt{m} {\hat{U}}_{m} (σ_{0}) = \sqrt{m} U_{m} (σ_{0}) + \sqrt{m} \int_{0}^{τ} ξ (σ_{0}, Λ) (t) \, d \{\hat{Λ} (t) - Λ_{0} (t)\} + \sqrt{m} ξ (σ_{0}, β) (\hat{β} - β_{0}) + o_{p} (1) .

(10)

Using the results of Spiekerman and Lin (1998), we can write Equation (10) as a normalized sum of independent and identically distributed random variables, $m^{- 1 / 2} \sum_{i = 1}^{m} Ξ_{i}$ , plus an o_p(1) term, where E(Ξ₁) = 0 and V(Ξ₁) < ∞. The central limit theorem implies that $\sqrt{m} {\hat{U}}_{m} (σ_{0})$ converges to a normally distributed random variable with mean zero and variance equal to the variance of Ξ₁.

Next, we take a first order Taylor series expansion of Û_m(σ̂) around σ₀:

{\hat{U}}_{m} (\hat{σ}) = {\hat{U}}_{m} (σ_{0}) + (\hat{σ} - σ_{0}) {\hat{W}}_{m} (σ^{*}),

where Ŵ_m(σ) = ∂Û_m(σ)/∂σ and σ* is between σ̂ and σ₀. It must be the case that Û_m(σ̂) = 0 since σ̂ was taken to be the maximizer of L(σ, β̂, Λ̂). Theorem 1 showed that σ̂ consistently estimates σ₀, so the law of large numbers implies that Ŵ_m(σ*) converges in probability to W(σ₀) = lim_{m→∞} W_m(σ₀). Finally, using the central limit theorem and Slutsky’s theorem, $\sqrt{m} (\hat{σ} - σ_{0})$ converges to a normal distribution with mean zero and variance equal to W(σ₀)⁻²V(Ξ₁).
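To make the form W(σ₀)⁻²V(Ξ₁) of the limiting variance concrete, the following sketch estimates both pieces by finite differences in a deliberately simplified setting: the transformed times are treated as known, so Ξ_i reduces to the cluster-level score and the extra terms arising from the estimation of β and Λ are ignored. The exchangeable-normal likelihood, the equal cluster sizes, the step size h, the seed, and all names are assumptions made for the example.

```python
import numpy as np

def cluster_loglik(sigma, y):
    """Per-cluster exchangeable-normal log-likelihood (constants dropped)."""
    n = y.shape[1]
    a = 1.0 + (n - 1) * sigma
    quad = (np.sum(y ** 2, axis=1) - sigma * np.sum(y, axis=1) ** 2 / a) / (1.0 - sigma)
    return -0.5 * ((n - 1) * np.log(1.0 - sigma) + np.log(a) + quad)

def sandwich_pieces(sigma, y, h=1e-4):
    """Finite-difference estimates of V(Xi_1) and W(sigma)."""
    up, mid, down = (cluster_loglik(s, y) for s in (sigma + h, sigma, sigma - h))
    score = (up - down) / (2.0 * h)                   # per-cluster score
    v_hat = np.mean(score ** 2)                       # estimate of V(Xi_1)
    w_hat = np.mean(up - 2.0 * mid + down) / h ** 2   # estimate of W(sigma)
    return v_hat, w_hat

rng = np.random.default_rng(1998)
m, n, sigma0 = 4000, 4, 0.3
cov = (1.0 - sigma0) * np.eye(n) + sigma0 * np.ones((n, n))
y = rng.multivariate_normal(np.zeros(n), cov, size=m)

v_hat, w_hat = sandwich_pieces(sigma0, y)
se_sigma = np.sqrt(v_hat / (m * w_hat ** 2))  # sqrt{ W^{-2} V / m }
```

In this simplified case the model is correctly specified, so `v_hat` and `-w_hat` should be close to each other. In the two-stage setting of Theorem 2, V(Ξ₁) additionally contains the contributions from β̂ and Λ̂, which this sketch does not compute.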

D Proofs of Lemmas

Proof of Lemma 1

Define Y_ij(t) = I(X_ij ≥ t). The log-likelihood can be written

l_{m} (σ) = m^{- 1} \sum_{i = 1}^{m} log φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) + log {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})

where ${\tilde{X}}_{i j} = {\tilde{Φ}}^{- 1} (exp [- \int_{0}^{τ} Y_{i j} (u) exp {β^{'} Z_{i j} (u)} d Λ (u)])$ . By condition C.5 the term

\int_{0}^{τ} Y_{i j} (u) exp {β^{'} Z_{i j} (u)} d Λ (u)

is Hadamard differentiable. Using multiple iterations of the chain rule for Hadamard derivatives (van der Vaart, 1998, Theorem 20.9), we conclude that l_m(σ) is Hadamard differentiable.

Proof of Lemma 2

First we find expressions for ζ_m(σ, Λ) and ζ_m(β, σ), starting with ζ_m(σ, Λ). To make the argument more concrete express l_m(σ) as a function of Λ by writing l_m(σ, Λ) = l_m(σ). Let Γ ∈ BV[0, τ]. Denote

H_{i j} = exp [- \int_{0}^{τ} Y_{i j} (u) exp {β^{'} Z_{i j} (u)} d Λ (u)] .

By conditions C.1 and C.2, for j = 1, …, n_i and i = 1, …, m, H_ij > 0 and |X̃_ij| < B* < ∞ for some constant B*.

To find the expression for the derivative, take a Taylor series expansion of l_m{σ, Λ + t(Γ − Λ)} around t = 0 and evaluate the result at t = 1. The final expression is

l_{m} (σ, Γ) = l_{m} (σ, Λ) + \int_{0}^{τ} ζ_{m} (σ, Λ) (u) d (Λ - Γ) (u),

where ζ_m(σ, Λ)(u) is equal to $m^{- 1} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} D_{i j}^{l} Y_{i j} (u) exp {β^{'} Z_{i j} (u)} H_{i j}$ and $D_{i j}^{l}$ is equal to

(Δ_{i j} [φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} {\partial φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j}}] + (1 - Δ_{i j}) [{\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} \times {\partial {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j}}]) \sum_{j = 1}^{n_{i}} \partial Φ^{- 1} (H_{i j}) / \partial H_{i j} .

Therefore the Hadamard derivative for ϒ ∈ BV[0, τ] is $\int_{0}^{τ} ζ_{m} (σ, Λ) (u) d ϒ (u)$ . Direct calculation verifies that ζ_m(σ, β) is equal to

m^{- 1} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} D_{i j}^{l} [\int_{0}^{τ} Y_{i j} (u) Z_{i j} (u) exp {β^{'} Z_{i j} (u)} d Λ (u)] H_{i j} .

We need to check whether each of the terms in $D_{i j}^{l}$ is bounded and also that the terms unique to ζ_m(σ, β) and ζ_m(σ, Λ) are bounded. First,

φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) = {(2 π)}^{- Δ_{i} / 2} \det {(\Sigma_{i 11})}^{- 1 / 2} \exp (- {\tilde{X}}_{i}^{Δ_{i} \prime} \Sigma_{i 11}^{- 1} {\tilde{X}}_{i}^{Δ_{i}} / 2) > 1 / B_{1} > 0

for some constant B₁ since for ${\tilde{X}}_{i j} \in {\tilde{X}}_{i}^{Δ_{i}}$ , |X̃_ij| < B^*. Therefore, for i =1, …, m, $φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} < B_{1} < \infty$ .

Let w_α(j) denote the vector of length α where the j^th element is 1 and the rest of the vector is 0. Using the chain rule, for j = 1, …, Δ_i and i = 1, …, m,

\partial φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j} = - {\tilde{X}}_{i}^{Δ_{i} \prime} \Sigma_{i 11}^{- 1} w_{Δ_{i}} (j) φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) .

The multivariate normal density $φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}})$ is bounded and for ${\tilde{X}}_{i j} \in {\tilde{X}}_{i}^{Δ_{i}}$ , |X̃_ij| < B^*. Hence, for j = 1, …, Δ_i and i =1, …, m, $∣ \partial φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j} ∣ < B_{2} < \infty$ for some constant B₂.
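The gradient formula for the multivariate normal density used above can be verified numerically. The sketch below compares it with central finite differences for an exchangeable correlation matrix; the dimension, correlation value, and evaluation point are hypothetical values chosen for the check.

```python
import numpy as np
from scipy.stats import multivariate_normal

sigma = 0.3   # hypothetical correlation value
d = 3         # hypothetical number of uncensored subjects in the cluster
cov = (1.0 - sigma) * np.eye(d) + sigma * np.ones((d, d))
x = np.array([0.4, -1.1, 0.7])
mvn = multivariate_normal(mean=np.zeros(d), cov=cov)

def analytic_grad(x):
    # d phi / d x_j = -(x' Sigma^{-1} w(j)) phi(x), stacked over j;
    # x' Sigma^{-1} w(j) is the j-th element of Sigma^{-1} x.
    return -np.linalg.solve(cov, x) * mvn.pdf(x)

def numeric_grad(x, h=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (mvn.pdf(x + step) - mvn.pdf(x - step)) / (2.0 * h)
    return grad

grad_a = analytic_grad(x)
grad_n = numeric_grad(x)
```

Both the density and the vector Σ⁻¹x are bounded whenever the transformed times are bounded, which is the property the proof relies on.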

Next consider ${\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})$ , which for i = 1, …, m is equal to

\int_{M_{i}} {(2 π)}^{- (n_{i} - Δ_{i}) / 2} \det {(\tilde{\Sigma}_{i})}^{- 1 / 2} \exp \{- {(t^{n_{i} - Δ_{i}} - {\tilde{μ}}_{i})}^{\prime} \tilde{\Sigma}_{i}^{- 1} (t^{n_{i} - Δ_{i}} - {\tilde{μ}}_{i}) / 2\} \, d t^{n_{i} - Δ_{i}}

where M_i = {t_{(Δ_i+1)} > X̃_{i, (Δ_i+1)}, …, t_{n_i} > X̃_{i,n_i}}, t^{n_i − Δ_i} = (t_{(Δ_i+1)}, …, t_{n_i}), $\tilde{\Sigma}_{i} = \Sigma_{i 22} - \Sigma_{i 21} \Sigma_{i 11}^{- 1} \Sigma_{i 12}$ , and ${\tilde{μ}}_{i} = \Sigma_{i 21} \Sigma_{i 11}^{- 1} {\tilde{X}}_{i}^{Δ_{i}}$ . Since |X̃_ij| < B* for ${\tilde{X}}_{i j} \in {\tilde{X}}_{i}^{n_{i} - Δ_{i}}$ , it must be the case that, for i = 1, …, m, $∣ {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} ∣ < B_{3} < \infty$ for some constant B₃.
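The conditional survival term can be computed from the standard formulas for the conditional mean and covariance of a partitioned normal vector. The sketch below assumes an exchangeable correlation matrix with unit variances and uses SciPy's multivariate normal distribution function; the function name and the argument layout are assumptions made for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def conditional_survival(x_cens, x_unc, sigma):
    """P(T > x_cens | uncensored block = x_unc) under exchangeable normal."""
    x_cens = np.atleast_1d(np.asarray(x_cens, dtype=float))
    x_unc = np.atleast_1d(np.asarray(x_unc, dtype=float))
    n_u, n_c = x_unc.size, x_cens.size
    n = n_u + n_c
    cov = (1.0 - sigma) * np.eye(n) + sigma * np.ones((n, n))
    # Partition: block 1 = uncensored subjects, block 2 = censored subjects
    s11, s12 = cov[:n_u, :n_u], cov[:n_u, n_u:]
    s21, s22 = cov[n_u:, :n_u], cov[n_u:, n_u:]
    mu = s21 @ np.linalg.solve(s11, x_unc)            # conditional mean
    cond_cov = s22 - s21 @ np.linalg.solve(s11, s12)  # conditional covariance
    # P(T > a) = P(-T < -a), and -T ~ N(-mu, cond_cov)
    return float(multivariate_normal(mean=-mu, cov=cond_cov).cdf(-x_cens))

# With sigma = 0 the censored subjects are independent of everything else
p_indep = conditional_survival([0.2, -0.5], [1.0], 0.0)
# One censored and one uncensored subject with correlation 0.5
p_pair = conditional_survival([0.3], [1.0], 0.5)
```

With σ = 0 the result should factor into a product of univariate normal survival probabilities, and with one censored and one uncensored subject it should match the univariate conditional normal with mean σ x and variance 1 − σ².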

Let $t_{j}^{n_{i} - Δ_{i}}$ be equal to t^{n_i − Δ_i} but with the (j − Δ_i)^th component replaced by X̃_ij. Let $t_{- j}^{n_{i} - Δ_{i}}$ be equal to t^{n_i − Δ_i} but with the (j − Δ_i)^th element removed. Let M_{i,−j} denote M_i but with the (j − Δ_i)^th inequality removed. Consider $∣ \partial {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j} ∣$ , which, for j = Δ_i + 1, …, n_i and i = 1, …, m, can be written

∣ \int_{M_{i, - j}} - {(2 π)}^{- (n_{i} - Δ_{i}) / 2} \det {(\tilde{\Sigma}_{i})}^{- 1 / 2} \exp \{- {(t_{j}^{n_{i} - Δ_{i}} - {\tilde{μ}}_{i})}^{\prime} \tilde{\Sigma}_{i}^{- 1} (t_{j}^{n_{i} - Δ_{i}} - {\tilde{μ}}_{i}) / 2\} \, d t_{- j}^{n_{i} - Δ_{i}} ∣ < B_{4}

for some constant B₄ < ∞ since |X̃_ij| < B* for ${\tilde{X}}_{i j} \in {\tilde{X}}_{i}^{n_{i} - Δ_{i}}$ .

Using the definition of the derivative of an inverse function,

\partial Φ^{- 1} (H_{i j}) / \partial H_{i j} = - {[φ {Φ^{- 1} (H_{i j})}]}^{- 1},

where φ is the density of the standard normal distribution and Φ⁻¹ is the inverse of the distribution function of the standard normal distribution. Since |X̃_ij| < B*, 0 < B₅ < H_ij < B₆ < 1 for some constants B₅ and B₆. Therefore, for j = 1, …, n_i and i = 1, …, m, |∂Φ⁻¹(H_ij)/∂H_ij| < B₇ < ∞ for some constant B₇. By condition C.5, for j = 1, …, n_i and i = 1, …, m, ||Y_ij exp(β′Z_ij)||_∞ < B₈ < ∞ and $| | \int_{0}^{τ} Y_{i j} (u) Z_{i j} (u) \exp \{β^{'} Z_{i j} (u)\} d Λ (u) | | < B_{9} < \infty$ for some constants B₈ and B₉. Hence ||ζ_m(σ, Λ)||_∞ and ||ζ_m(β, σ)|| are bounded by (B₁B₂ + B₃B₄)B₇(B₈ + B₉) < ∞.

Proof of Lemma 3

An expansion of l̂_m(σ) around Λ₀ and β₀ can be written:

{\hat{l}}_{m} (σ) = l_{m 0} (σ) + ζ_{m} (β, σ) (\hat{β} - β_{0}) + \int_{0}^{τ} ζ_{m} (σ, Λ) (t) \, d (\hat{Λ} - Λ_{0}) (t) + R,

where R is a remainder term of order o_p{max(||Λ̂ − Λ₀||_∞, ||β̂ − β₀||)} and ζ_m(β, σ) and ζ_m(σ, Λ)(t) are defined in Lemma 2. Since Λ̂ is uniformly consistent and β̂ is consistent (Spiekerman and Lin, 1998), R = o_p(1). The result follows from the law of large numbers, the uniform consistency of Λ̂, the consistency of β̂, and the fact that ||ζ_m(β, σ)|| and ||ζ_m(σ, Λ)||_∞ are bounded (Lemma 2).

Proof of Lemma 4

The log-likelihood, l_m(σ), can be written as a sum of independent and identically distributed random variables, $m^{- 1} \sum_{i = 1}^{m} ϕ_{i} (σ)$ . Take σ ≠ σ₀. The law of large numbers and Jensen’s inequality imply that E{l_m₀(σ)} − E{l_m₀(σ₀)} = lim_{m→∞} {l_m₀(σ) − l_m₀(σ₀)}, which is strictly less than log[E{L*(σ, β₀, Λ₀)/L*(σ₀, β₀, Λ₀)}] = 0.

Proof of Lemma 5

Let N(t, d, μ, Σ^†) be defined as (2π)^{−d/2} det(Σ^†)^{−1/2} exp{−(t − μ)′(Σ^†)⁻¹(t − μ)/2} [tr{(Σ^†)⁻¹ W̃_d} − {−(t − μ)′(Σ^†)⁻¹ W̃_d(Σ^†)⁻¹(t − μ)/2}]/2, where W̃_d is the d-dimensional square matrix with zeros along the diagonal and ones off the diagonal. Let 0^d denote a vector of zeros of length d. The score function can be written

U_{m} (σ) = m^{- 1} \sum_{i = 1}^{m} φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \Sigma_{i 11}) + {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} \int_{M_{i}} N (t^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t^{n_{i} - Δ_{i}} .

Using the results of Lemma 1 and multiple iterations of the chain rule for Hadamard derivatives (van der Vaart, 1998, Theorem 20.9), we conclude that U_m(σ) is Hadamard differentiable.

Proof of Lemma 6

First we find expressions for ξ_m(σ, Λ) and ξ_m(σ, β), starting with ξ_m(σ, Λ). To make the argument more concrete express U_m(σ) as a function of Λ by writing U_m(σ, Λ) = U_m(σ). Let Γ ∈ BV [0, τ].

To find the expression for the derivative, take a Taylor series expansion of U_m{σ, Λ + t(Γ − Λ)} around t = 0 and evaluate the result at t = 1. The final expression is $U_{m} (σ, Γ) = U_{m} (σ, Λ) + \int_{0}^{τ} ξ_{m} (σ, Λ) (u) d (Λ - Γ) (u)$ , where ξ_m(σ, Λ)(u) is equal to

m^{- 1} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} D_{i j}^{U} Y_{i j} (u) exp {β^{'} Z_{i j} (u)} H_{i j}

and

\begin{array}{l} D_{i j}^{U} = (Δ_{i j} [\{\partial φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} / \partial {\tilde{X}}_{i j}\} N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \Sigma_{i 11}) + φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} \\ \times \{\partial N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \Sigma_{i 11}) / \partial {\tilde{X}}_{i j}\}] + (1 - Δ_{i j}) [\{\partial {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} / \partial {\tilde{X}}_{i j}\} \\ \times \int_{M_{i}} N (t^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t^{n_{i} - Δ_{i}} + {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} \\ \times \{\partial \int_{M_{i}} N (t^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t^{n_{i} - Δ_{i}} / \partial {\tilde{X}}_{i j}\}]) \sum_{j = 1}^{n_{i}} \partial Φ^{- 1} (H_{i j}) / \partial H_{i j} \end{array}

Therefore the Hadamard derivative for ϒ ∈ BV [0, τ] is $\int_{0}^{τ} ξ_{m} (σ, Λ) (u) d ϒ (u)$ . Direct calculation verifies that ξ_m(σ, β) is equal to

m^{- 1} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} D_{i j}^{U} [\int_{0}^{τ} Y_{i j} (u) Z_{i j} (u) exp {β^{'} Z_{i j} (u)} d Λ (u)] H_{i j} .

In Lemma 2 we showed that, for i = 1, …, m, $∣ φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} ∣ < B_{1} < \infty$ and $∣ {\tilde{Φ}}_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} ∣ < B_{3} < \infty$ . Also, for j = 1, …, n_i and i = 1, …, m, |∂Φ⁻¹(H_ij)/∂H_ij| < B₇ < ∞, ||Y_ij exp(β′Z_ij)||_∞ < B₈ < ∞ and $| | \int_{0}^{τ} Y_{i j} (u) Z_{i j} (u) \exp \{β^{'} Z_{i j} (u)\} d Λ (u) | | < B_{9} < \infty$ .

We tackle each of the remaining terms. First, using results from Lemma 2, for j = 1, …, Δ_i, i = 1, …, m, $∣ \partial φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 1} / \partial {\tilde{X}}_{i j} ∣$ is equal to $∣ - φ_{u}^{Δ_{i}} {({\tilde{X}}_{i}^{Δ_{i}})}^{- 2} {\partial φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j}} ∣ < B_{11} = B_{1}^{2} B_{2} < \infty$ for some constant B₁₁.

Since Σ_i₁₁ has an exchangeable structure, $\mathrm{tr} \{\Sigma_{i 11}^{- 1} {\tilde{W}}_{Δ_{i}}\}$ and det(Σ_i₁₁)^{−1/2} are both bounded by some constant B₁₀ < ∞. Therefore for i = 1, …, m, $∣ N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \Sigma_{i 11}) ∣ < B_{12} < \infty$ for some constant B₁₂.

Next, we consider $∣ \partial N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \sum_{i 11}) / \partial {\tilde{X}}_{i j} ∣$ for j = 1, …, Δ_i and i = 1, …, m, which is equal to

∣ {\tilde{X}}_{i}^{Δ_{i} \prime} \Sigma_{i 11}^{- 1} w_{Δ_{i}} (j) N ({\tilde{X}}_{i}^{Δ_{i}}, Δ_{i}, 0^{Δ_{i}}, \Sigma_{i 11}) + {\tilde{X}}_{i}^{Δ_{i} \prime} \Sigma_{i 11}^{- 1} {\tilde{W}}_{Δ_{i}} \Sigma_{i 11}^{- 1} w_{Δ_{i}} (j) φ_{u}^{Δ_{i}} ({\tilde{X}}_{i}^{Δ_{i}}) ∣,

and, by the results of the previous paragraph and the results of Lemma 2, is bounded by some constant B₁₃ < ∞.

Using results from Lemma 2, for j = Δ_i + 1, …, n_i and i = 1, …, m, $∣ \partial Φ_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 1} / \partial {\tilde{X}}_{i j} ∣$ is equal to

∣ - Φ_{c}^{n_{i} - Δ_{i}} {({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}})}^{- 2} {\partial Φ_{c}^{n_{i} - Δ_{i}} ({\tilde{X}}_{i}^{n_{i} - Δ_{i}} ∣ {\tilde{X}}_{i}^{Δ_{i}}) / \partial {\tilde{X}}_{i j}} ∣ < B_{14} = B_{3}^{2} B_{4} < \infty

for some constant B₁₄.

Using similar arguments as above one can directly show that for i = 1, …, m,

\int_{M_{i}} N (t^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t^{n_{i} - Δ_{i}} < B_{15} < \infty

for some constant B₁₅.

Also, for j = Δ_i + 1, …, n_i and i = 1, …, m,

∣ \partial \int_{M_{i}} N (t^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t^{n_{i} - Δ_{i}} / \partial {\tilde{X}}_{i j} ∣ = ∣ \int_{M_{i, - j}} N (t_{j}^{n_{i} - Δ_{i}}, n_{i} - Δ_{i}, {\tilde{μ}}_{i}, \tilde{\Sigma}_{i}) \, d t_{- j}^{n_{i} - Δ_{i}} ∣ < B_{16} < \infty

for some constant B₁₆.

Hence ||ξ_m(σ, Λ)||_∞ and ||ξ_m(β, σ)|| are bounded by (B₁₁B₁₂+B₁B₁₃+B₁₄B₁₅+B₃B₁₆)B₇(B₈+B₉) < ∞.

E Extended simulation results

In the interest of space, the simulation results in the main body of the manuscript focus on the novel results for σ. We summarize the performance of β in Table 7. The true values for β₁ and β₂ are log(0.5) and 0.75, respectively. In all scenarios investigated, both the robust sandwich standard error and the jackknife standard error perform well. The jackknife standard error (SE) appears to match the Monte Carlo SE more closely compared to the robust SE, but the coverage probabilities are very similar and indicate appropriate coverage. Power for a Wald-type test with the jackknife SE is high across all the scenarios.

Table 7.

Simulation results for marginal parameters from the proportional hazards model. CP denotes coverage probability and SE denotes standard error.

Scenario	Parameter	Estimate	Jackknife SE	Monte Carlo SE	Robust SE	Jackknife CP	Robust CP	Power
90 clusters
σ = 0	β₁	−0.700	0.140	0.136	0.182	0.964	0.960	0.996
	β₂	0.765	0.237	0.236	0.185	0.956	0.944	0.912
σ = 0.05	β₁	−0.711	0.138	0.124	0.183	0.964	0.964	1.000
	β₂	0.800	0.237	0.212	0.198	0.956	0.952	0.948
σ = 0.10	β₁	−0.696	0.140	0.140	0.185	0.952	0.944	1.000
	β₂	0.769	0.241	0.241	0.197	0.948	0.944	0.884
σ = 0.15	β₁	−0.710	0.141	0.127	0.182	0.972	0.960	1.000
	β₂	0.756	0.240	0.236	0.178	0.960	0.948	0.884
150 clusters
σ = 0	β₁	−0.689	0.106	0.108	0.139	0.944	0.936	1.000
	β₂	0.743	0.182	0.173	0.141	0.956	0.956	0.988
σ = 0.05	β₁	−0.679	0.106	0.101	0.138	0.952	0.952	1.000
	β₂	0.770	0.182	0.172	0.141	0.968	0.968	0.996
σ = 0.10	β₁	−0.699	0.106	0.107	0.146	0.956	0.948	1.000
	β₂	0.753	0.183	0.178	0.144	0.964	0.952	0.988
σ = 0.15	β₁	−0.702	0.107	0.109	0.143	0.940	0.936	1.000
	β₂	0.742	0.183	0.178	0.138	0.948	0.940	0.964
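The estimates in Table 7 can be compared directly with the true values log(0.5) and 0.75. The short sketch below computes the implied bias for each scenario; the point estimates are copied from Table 7, while the dictionary layout and variable names are choices made for this example.

```python
import math

true_beta = {"beta1": math.log(0.5), "beta2": 0.75}

# (clusters, sigma) -> (beta1 estimate, beta2 estimate), copied from Table 7
table7 = {
    (90, 0.00): (-0.700, 0.765),
    (90, 0.05): (-0.711, 0.800),
    (90, 0.10): (-0.696, 0.769),
    (90, 0.15): (-0.710, 0.756),
    (150, 0.00): (-0.689, 0.743),
    (150, 0.05): (-0.679, 0.770),
    (150, 0.10): (-0.699, 0.753),
    (150, 0.15): (-0.702, 0.742),
}

# Bias = estimate minus true value, for each parameter and scenario
bias = {
    key: (b1 - true_beta["beta1"], b2 - true_beta["beta2"])
    for key, (b1, b2) in table7.items()
}
max_abs_bias = max(abs(v) for pair in bias.values() for v in pair)
```

The largest absolute bias, 0.05, occurs for β₂ with 90 clusters and σ = 0.05, and is small relative to the corresponding standard errors in the table.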


Contributor Information

Megan Othus, Email: mothus@fhcrc.org, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, Tel.: 206-667-5749.

Yi Li, Harvard University and Dana Farber Cancer Institute, Boston, MA 02115.

References

Anello C, O’Neill R, Dubey S. Multicentre Trials: A US Regulatory Perspective. Statistical Methods in Medical Research. 2005;14(3):303–318. doi: 10.1191/0962280205sm398oa. [DOI] [PubMed] [Google Scholar]
Booth J, Hobert J. Standard Errors of Prediction in Generalized Linear Mixed Models. Journal of the American Statistical Association. 1998;93(441):262–272. [Google Scholar]
Cai J, Prentice R. Estimating Equations for Hazard Ratio Parameters Based on Correlated Failure Time Data. Biometrika. 1995;82(1):151–164. [Google Scholar]
Cai J, Shen Y. Permutation Tests for Comparing Marginal Survival Functions with Clustered Failure Time Data. Statistics in Medicine. 2000;19(21):2963–2973. doi: 10.1002/1097-0258(20001115)19:21<2963::aid-sim593>3.0.co;2-h. [DOI] [PubMed] [Google Scholar]
Cai J, Zhou H, Davis C. Estimating the Mean Hazard Ratio Parameters for Clustered Survival Data with Random Clusters. Statistics in Medicine. 1997;16(17):2009–2020. doi: 10.1002/(sici)1097-0258(19970915)16:17<2009::aid-sim606>3.0.co;2-r. [DOI] [PubMed] [Google Scholar]
Cai T, Wei L, Wilcox M. Semiparametric Regression Analysis for Clustered Failure Time Data. Biometrika. 2000;87(4):867–878. [Google Scholar]
Cai T, Cheng S, Wei L. Semiparametric Mixed-effects Models for Clustered Failure Time Data. Journal of the American Statistical Association. 2002;97(458):514–522. [Google Scholar]
Chen X, Fan Y, Tsyrennikov V. Efficient Estimation of Semiparametric Multivariate Copula Models. Journal of the American Statistical Association. 2006;101(475):1228–1240. [Google Scholar]
Clayton D. A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence. Biometrika. 1978:141–151. [Google Scholar]
Fine J, Glidden D, Lee K. A Simple Estimator for a Shared Frailty Regression Model. Journal of the Royal Statistical Society Series B, Statistical Methodology. 2003;65(1):317–329. [Google Scholar]
Fleiss J. Analysis of Data from Multiclinic Trials. Controlled Clinical Trials. 1986;7(4):267–275. doi: 10.1016/0197-2456(86)90034-6. [DOI] [PubMed] [Google Scholar]
Glidden D. A Two-Stage Estimator of the Dependence Parameter for the Clayton-Oakes Model. Lifetime Data Analysis. 2000;6(2):141–156. doi: 10.1023/a:1009664011060. [DOI] [PubMed] [Google Scholar]
Glidden D, Vittinghoff E. Modelling Clustered Survival Data from Multicentre Clinical Trials. Statistics in medicine. 2004;23(3):369–388. doi: 10.1002/sim.1599. [DOI] [PubMed] [Google Scholar]
Gray R. A Bayesian Analysis of Institutional Effects in a Multicenter Cancer Clinical Trial. Biometrics. 1994;50(1):244–253. [PubMed] [Google Scholar]
Hougaard P. Analysis of Multivariate Survival Data. Springer Verlag; 2000. [Google Scholar]
Jones B, Teather D, Wang J, Lewis J. A Comparison of Various Estimators of a Treatment Difference for a Multi-centre Clinical Trial. Statistics in Medicine. 1998;17:1767–1777. doi: 10.1002/(sici)1097-0258(19980815/30)17:15/16<1767::aid-sim978>3.0.co;2-h. [DOI] [PubMed] [Google Scholar]
Kalbfleisch J, Prentice R. The Statistical Analysis of Failure Time Data. 2. Wiley; 2002. [Google Scholar]
Klaassen C, Wellner J. Efficient Estimation in the Bivariate Normal Copula Model: Normal Margins are Least Favourable. Bernoulli. 1997;3(1):55–77. [Google Scholar]
Lam K, Lee Y, Leung T. Modeling Multivariate Survival Data by a Semipara-metric Random Effects Proportional Odds Model. Biometrics. 2002;58(2):316–323. doi: 10.1111/j.0006-341x.2002.00316.x. [DOI] [PubMed] [Google Scholar]
Li Y, Lin X. Semiparametric Normal Transformation Models for Spatially Correlated Survival Data. Journal of the American Statistical Association. 2006;101(474):591–603.
Li Y, Prentice R, Lin X. Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data. Biometrika. 2008;95(4):947–960. doi: 10.1093/biomet/asn049.
Lipsitz S, Parzen M. A Jackknife Estimator of Variance for Cox Regression for Correlated Survival Data. Biometrics. 1996;52:291–298.
Lipsitz S, Dear K, Zhao L. Jackknife Estimators of Variance for Parameter Estimates from Estimating Equations with Applications to Clustered Survival Data. Biometrics. 1994;50:842–846.
Logan B, Nelson G, Klein J. Analyzing Center Specific Outcomes in Hematopoietic Cell Transplantation. Lifetime Data Analysis. 2008;14:389–404. doi: 10.1007/s10985-008-9100-6.
Murphy S. Consistency in a Proportional Hazards Model Incorporating a Random Effect. Annals of Statistics. 1994;22(2):712–731.
Murphy S. Asymptotic Theory for the Frailty Model. Annals of Statistics. 1995;23:182–198.
Oakes D. Bivariate Survival Models Induced by Frailties. Journal of the American Statistical Association. 1989;84(406):487–493.
Parner E. Asymptotic Theory for the Correlated Gamma-frailty Model. Annals of Statistics. 1998;26(2):183–214.
Prentice R, Cai J. Covariance and Survivor Function Estimation Using Censored Multivariate Failure Time Data. Biometrika. 1992;79(3):495–512.
Seibel N, Steinherz P, Sather H, Nachman J, Delaat C, Ettinger L, Freyer D, Mattano L Jr, Hastings C, Rubin C, Bertolone K, Franklin J, Heerema N, Mitchell T, Pysemany A, La M, Edens C, Gaynon P. Early Post-induction Intensification Therapy Improves Survival for Children and Adolescents with High-risk Acute Lymphoblastic Leukemia: A Report from the Children’s Oncology Group. Blood. 2008;111(5):2548–2555. doi: 10.1182/blood-2007-02-070342.
Senn S. Some Controversies in Planning and Analysing Multi-centre Trials. Statistics in Medicine. 1998;17(15–16):1753–1765. doi: 10.1002/(sici)1097-0258(19980815/30)17:15/16<1753::aid-sim977>3.0.co;2-x.
Shih J, Louis T. Assessing Gamma Frailty Models for Clustered Failure Time Data. Lifetime Data Analysis. 1995;1(2):205–220. doi: 10.1007/BF00985771.
Spiekerman C, Lin D. Marginal Regression Models for Multivariate Failure Time Data. Journal of the American Statistical Association. 1998;93(443):1164–1175.
Therneau T, Grambsch P. Modeling Survival Data: Extending the Cox Model. Springer Verlag; 2000.
van der Vaart A. Asymptotic Statistics. Cambridge University Press; 1998.
Vierron E, Giraudeau B. Sample Size Calculation for Multicenter Randomized Trial: Taking the Center Effect into Account. Contemporary Clinical Trials. 2007;28(4):451–458. doi: 10.1016/j.cct.2006.11.003.
Wei L, Lin D, Weissfeld L. Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association. 1989;84(408):1065–1073.
Zeng D, Lin D. Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(4):507–564.
Zheng L, Zelen M. Multi-center Clinical Trials: Randomization and Ancillary Statistics. Annals of Applied Statistics. 2008;2(2):582–600. doi: 10.1214/07-AOAS151.
