Abstract
In this work we deal with correlated failure time (age at onset) data arising from population-based case-control studies, where case and control probands are selected by population-based sampling and an array of risk factor measures is collected for both cases and controls and their relatives. Parameters of interest are effects of risk factors on the failure time hazard function and within-family dependencies among failure times after adjusting for the risk factors. Due to the retrospective sampling scheme, large sample theory for existing methods has not been established. We develop a novel technique for estimating the parameters of interest under a general semiparametric shared frailty model. We also present a simple, easily computed, and non-iterative nonparametric estimator for the cumulative baseline hazard function. We provide rigorous large sample theory for the proposed method. We also present simulation results and a real data example for illustrating the utility of the proposed method.
Keywords: Case-control study, Correlated failure times, Family study, Frailty model, Multivariate survival model
1. Introduction
Clustered failure times arise often in medical and epidemiologic studies. Examples include disease onset times of twins (with time expressed in terms of age), multiple recurrence times of infections on an individual, or time to blindness for the two eyes within an individual. A typical case-control family study includes a random sample of independent diseased individuals (cases) and non-diseased individuals (controls), along with their family members. An array of genetic and environmental risk factor measures is collected on these individuals. Integration of genetic and environmental data is a central problem of modern observational epidemiology. Case-control family studies are powerful because they provide an efficient way to assess the effect of risk factors on the occurrence of a rare disease, and furthermore allow researchers to dissect genetic and environmental contributions to the disease based on the familial aggregation pattern of disease clusters. Hopper [16] suggested that such study designs may be the future of epidemiology in general, not just genetic epidemiology. Hence the need for statistical methods that can exploit such data is acute.
In this work we focus on population-based matched case-control family studies, where a number of case and control probands are randomly sampled from a well-defined population. The probands are the index subjects because of whom the families are ascertained. Here we use the term proband in a broad sense to refer to both cases and controls, in contrast with the traditional usage in which probands refer only to cases.
Relative to classical case-control methods, analysis of these studies is complicated in two major ways. First, comparisons are no longer solely between subjects with and without the disease under study, but rather between collections of the case probands and their relatives and the control probands and their relatives, with each collection typically including many subjects both with and without the disease under study. Second, data are clustered within families, and hence reflect intra-familial correlation due to unmeasured genetic and environmental factors.
Our work is motivated by a recent breast cancer study [23], [24]. In this study, the cases were incident breast cancer cases ascertained from the population-based Surveillance, Epidemiology, and End Results (SEER) cancer registry. The controls were selected by random digit dialing, and were matched with cases based on age at diagnosis and county of residence. Female relatives of case and control probands were identified, and the risk factor and outcome information was collected on these relatives. The primary goals of the study are (a) to determine the degree of familial dependence with respect to age at breast cancer diagnosis; and (b) to assess the effects of covariates on breast cancer risk.
Two modeling approaches, marginal and conditional, are typically used for accounting for the correlation within a cluster. In the conditional model, the correlation is explicitly induced by a cluster-specific random effect, with the outcomes of the cluster members being conditionally independent given the random effect. In the survival context, this model is known as the frailty model, with the random frailty term typically assumed to act multiplicatively on the hazard. Many frailty models have been considered, including gamma, positive stable, inverse Gaussian, compound Poisson, and lognormal. See [17] for a broad review. Under a frailty model, the regression coefficients are cluster-specific log-hazard ratios. In the marginal model, by contrast, the correlation is modelled through a multivariate distribution, often involving a copula function, with a specified model for the marginal hazard functions. The regression coefficients in the marginal model represent the log-hazard ratios at the population level regardless of which cluster an individual comes from. The effect is thus “population-averaged.” A comparison of the conditional and marginal modelling approaches can be found in [34].
Methods exist for case-control family studies under both modelling approaches. Shih and Chatterjee [31] proposed a semi-parametric quasi-partial-likelihood approach for estimating the regression coefficients in a bivariate copula model. Their cumulative hazard estimator requires an iterative solution, and thus, so far, the properties of their estimators have been investigated only by simulation. Moreover, in the case of multiple relatives for each proband, the relatives were treated in the estimation process as if they were independent, which may lead to loss of efficiency in the baseline hazard function estimator. By contrast, a quasi-EM algorithm method for the popular gamma frailty model is presented in [18], where the baseline hazard function estimator naturally accommodates multiple relatives in a family [19]. However, with this method as well, the properties of the proposed estimators were studied only by simulation. The Shih-Chatterjee method can be adapted to the family-specific frailty setting [27], but with the same limitation as for the marginal model: the lack of large sample theory.
In this work, we develop a new estimation technique for the general semiparametric shared frailty model, where the parameters of interest are the regression coefficients and the frailty parameters. Our method covers any frailty distribution with finite moments. The estimation procedure for the baseline hazard function leads to an estimator whose asymptotic properties can be derived and expressed in a tractable way. Thus, this paper is the first to present an estimation procedure with rigorous asymptotic theory for a frailty survival model in the case-control family study setting.
Section 2 presents our model, and Section 3 describes our estimation procedure. Section 4 gives the consistency and asymptotic normality results. In Section 5, we describe an extension of our method for the case where the proband observation times are subject to a certain restriction that can arise in some studies. Section 6 presents simulation results. In Section 7 we illustrate our method with a case-control family study of early onset breast cancer. Section 8 provides a short discussion. Section 9 provides the details of the asymptotic theory. Throughout the paper, certain details have been omitted for brevity. These details are given in an expanded version of this paper which is available at the Front for the Mathematics ArXiv under Statistics, publication number: math.ST/0703300.
2. Notation and Model Formulation
We consider a matched case-control family study where one case proband is age-matched with one control proband, and an array of risk factors is measured on the case and control probands and their relatives. Each matched set contains one case family and one control family, and there are n i.i.d. matched sets. Let $T^0_{ij}$ and $C_{ij}$ denote the age of disease onset and age at censoring, respectively, for individual j of family i, i = 1, . . . , 2n, j = 0, 1, . . . , mi, where j = 0 corresponds to the proband. Following [28] (p. 187), we regard mi as a random variable over {1, . . . , m} for some m, and build up the remainder of the model conditional on mi. Define $\delta_{ij} = I(T^0_{ij} \le C_{ij})$ to be the failure indicator and $T_{ij} = \min(T^0_{ij}, C_{ij})$ to be the observed follow-up time for individual ij. We assume that a p-vector of covariates is observed on all subjects, and let Zij denote the value of the time-independent covariate vector for individual ij. In addition, we associate with family i an unobservable family-level covariate ωi, the “frailty”, which induces dependence among family members. The conditional hazard function for proband i, given the family frailty ωi, is assumed to take the form
$$\lambda_{i0}(t \mid Z_{i0}, \omega_i) = \omega_i\,\lambda_0(t)\exp(\beta^T Z_{i0}). \tag{1}$$
The conditional hazard function for relative ij, j = 1, . . . , mi, given the family frailty ωi, is assumed to take the form
$$\lambda_{ij}(t \mid Z_{ij}, \omega_i) = \omega_i\,\lambda_0(t)\exp(\beta^T Z_{ij}). \tag{2}$$
Here β is a p-vector of unknown regression coefficients, and λ0 is a conditional baseline hazard of unspecified form. The above model implies that the proband and the relatives have a common conditional baseline hazard function λ0, and that all the dependence between the proband and the relatives in a given family is due to the frailty factor ωi. This is the standard formulation of the frailty survival model; see, for example, [9], [10], [17] and [26]. Hence, the conditional hazard function of a relative, given Zij, ωi and the proband data, is a function of only Zij and ωi. The random variable ωi is assumed to have a density f(ω) ≡ f(ω; θ), where θ is an unknown parameter. For simplicity, we assume that θ is a scalar, though the vector case could be developed in a similar manner.
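To make the model concrete, the following is a minimal sketch of how one family could be generated under a shared frailty model of this form. It assumes a gamma frailty with expectation 1 and variance θ, a baseline Λ0(t) = t, a single U[0, 1] covariate, and U[0, 1] censoring; these choices, and the function name `simulate_family`, are illustrative assumptions. The sketch draws families prospectively and does not implement the case-control sampling of probands.

```python
import math
import random

def simulate_family(m, beta, theta, rng, cens_max=1.0):
    """Draw one family (proband j = 0 plus m relatives) under a shared
    gamma frailty model with Lambda0(t) = t (an illustrative assumption).
    Returns a list of (observed_time, failure_indicator, covariate)."""
    # Gamma frailty with mean 1 and variance theta: shape 1/theta, scale theta.
    omega = rng.gammavariate(1.0 / theta, theta)
    members = []
    for _ in range(m + 1):
        z = rng.random()                       # covariate Z ~ U[0, 1]
        rate = omega * math.exp(beta * z)      # conditional hazard, constant in t
        # With Lambda0(t) = t the failure time is exponential given omega.
        t0 = rng.expovariate(rate) if rate > 0.0 else math.inf
        c = rng.uniform(0.0, cens_max)         # censoring time
        members.append((min(t0, c), int(t0 <= c), z))
    return members

rng = random.Random(2024)
family = simulate_family(m=2, beta=math.log(2.0), theta=2.0, rng=rng)
for j, (t, delta, z) in enumerate(family):
    print(f"member {j}: T = {t:.3f}, delta = {delta}, Z = {z:.3f}")
```

All members of the family share the same draw of `omega`, which is the only source of within-family dependence in the sketch, mirroring the role of ωi in the model.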
We put $\gamma = (\beta^T, \theta)^T$, and let $\gamma^o = (\beta^{oT}, \theta^o)^T$ denote the true value of γ. The objective is to estimate γ and the cumulative baseline hazard $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. Let $\Lambda_0^o$ denote the true value of Λ0. Further, let δiR = (δi1, . . . , δimi), TiR = (Ti1, . . . , Timi), and ZiR = (Zi1, . . . , Zimi).
We assume that Zij is bounded, and that the parameter γ lies in a compact subset of IRp+1 containing an open neighborhood of γo. These two assumptions imply that βTZij is bounded. In addition, we assume the following.
Conditional on $(Z_{i0}, Z_{iR})$ and ωi, the censoring times are independent and noninformative for ωi and (β, Λ0). In addition, the frailty ωi is independent of $(Z_{i0}, Z_{iR})$.
The effect of the covariates on the observed time is subject-specific, i.e. Pr(Tij, δij|Zi0, ZiR, ωi) = Pr(Tij, δij|Zij, ωi). This implies Pr(Tij, δij|Zi0, ZiR) = Pr(Tij, δij|Zij) even without frailty.
A number of additional technical assumptions are listed in Section 9.
Following [31] and [18], the likelihood function for the data can be written as
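$$L = \prod_{i=1}^{2n} f(T_{i0}, \delta_{i0}, Z_{i0})\, f(Z_{iR} \mid Z_{i0})\, f(T_{iR}, \delta_{iR} \mid T_{i0}, \delta_{i0}, Z_{i0}, Z_{iR}). \tag{3}$$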
Since f(ZiR|Zi0) does not depend on the parameters of interest (β, Λ0, θ), this term will be ignored. In the following subsections we consider the other two terms in (3).
2.1. The likelihood for the proband data
To account for the matching of age of onset, as in [31] and [18], the likelihood function of the proband data, $L^{(1)}$, is constructed based on the retrospective likelihood for the standard matched case-control study [29]. We express this likelihood in terms of the marginal survival function Si0(t) = Pr(Ti0 > t|Zi0) = ∫ Pr(Ti0 > t|Zi0,ω)f(ω)dω. In our setting we have n one-to-one matched sets. Based on the marginal survivor function, the marginal hazard function can be written as
$$\lambda_{i0}(t) = \lambda_0(t)\exp(\beta^T Z_{i0})\,\frac{\mu_{1i}(t;\gamma,\Lambda_0)}{\mu_{0i}(t;\gamma,\Lambda_0)},$$
where
$$\mu_{ki}(t;\gamma,\Lambda_0) = \int \omega^{k}\exp\{-\omega H_{i0}(t)\}\,f(\omega)\,d\omega, \qquad k = 0, 1, 2,$$
and Hi0(t) = Λ0(t)exp(βTZi0). We arrange the notation so that the first n families are the case families and the rth case family, r = 1, . . . , n, is matched with the (n + r)th control family. The likelihood for the proband data is then replaced by the following conditional likelihood:
(4) |
where ξkk’i(t; γ, Λ0) = μki(t; γ, Λ0)/μk’i(t; γ, Λ0) for k, k’ = 0, 1, 2. Given (4), the likelihood score functions , l = 1, . . . , p, and can be obtained by straightforward differentiation. The detailed formulas are presented in the expanded paper.
Under the gamma frailty model, we have $\mu_{1i}(t;\gamma,\Lambda_0)/\mu_{0i}(t;\gamma,\Lambda_0) = \{\theta H_{i0}(t) + 1\}^{-1}$, and so the likelihood function (4) corresponds to that presented in [18] in the case of one-to-one matching. Extensions to matching of multiple cases or multiple controls are straightforward; see, e.g., [4].
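The gamma identity above can be checked numerically. The sketch below assumes that μk denotes the integral of ω^k exp{-ωH} f(ω) over the frailty distribution, which is the reading under which the stated ratio holds, and compares a midpoint-rule approximation of μ1/μ0 with the closed form 1/(θH + 1). The values θ = 0.5 and H = 0.8, the integration grid, and the function names are arbitrary illustrative choices.

```python
import math

def gamma_pdf(w, theta):
    """Density of a gamma frailty with expectation 1 and variance theta."""
    shape = 1.0 / theta  # the scale parameter equals theta
    return w ** (shape - 1.0) * math.exp(-w / theta) / (math.gamma(shape) * theta ** shape)

def mu(k, h, theta, upper=30.0, steps=60_000):
    """Midpoint-rule approximation of the integral of
    w^k * exp(-w * h) * f(w) over (0, upper)."""
    dw = upper / steps
    total = 0.0
    for s in range(steps):
        w = (s + 0.5) * dw
        total += w ** k * math.exp(-w * h) * gamma_pdf(w, theta)
    return total * dw

theta, h = 0.5, 0.8
ratio_numeric = mu(1, h, theta) / mu(0, h, theta)
ratio_closed = 1.0 / (theta * h + 1.0)
print(f"numeric: {ratio_numeric:.6f}, closed form: {ratio_closed:.6f}")
```

The same routine with a different density in place of `gamma_pdf` would give the ratio for other frailty distributions, for which no closed form may be available.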
2.2. The likelihood for the data from the relatives
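Let Nij(t) = δijI(Tij ≤ t), j = 1, . . . , mi, $N_{i.}(t) = \sum_{j=1}^{m_i} N_{ij}(t)$, Hij(t) = Λ0(Tij ∧ t) exp(βTZij), j = 1, . . . , mi, and $H_{i.}(t) = \sum_{j=1}^{m_i} H_{ij}(t)$, and let τ be the maximum follow-up time. The likelihood of the data from the relatives then can be written as
$$L^{(2)} = \prod_{i=1}^{2n}\left[\prod_{j=1}^{m_i}\{\lambda_0(T_{ij})\exp(\beta^T Z_{ij})\}^{\delta_{ij}}\right]\int \omega^{N_{i.}(\tau)}\exp\{-\omega H_{i.}(\tau)\}\,f(\omega \mid T_{i0},\delta_{i0},Z_{i0})\,d\omega. \tag{5}$$
Here, by a Bayes theorem argument,
$$f(\omega \mid T_{i0},\delta_{i0},Z_{i0}) = \frac{\omega^{\delta_{i0}}\exp\{-\omega H_{i0}(T_{i0})\}\,f(\omega)}{\int \tilde\omega^{\delta_{i0}}\exp\{-\tilde\omega H_{i0}(T_{i0})\}\,f(\tilde\omega)\,d\tilde\omega}. \tag{6}$$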
Given (5), the likelihood score functions , l = 1, . . . , p, and can be obtained by straightforward differentiation. The detailed formulas are presented in the expanded paper.
3. The Proposed Approach
We focus first on estimating the baseline cumulative hazard function Λ0(t). Let Yij(t) = I(Tij ≥ t), and let $\mathcal{F}_t$ denote the σ-algebra generated by (Ti0, δi0, Zi0) plus the entire observed history of the relatives up to time t:
$$\mathcal{F}_t = \sigma\{T_{i0}, \delta_{i0}, Z_{i0}, N_{ij}(u), Y_{ij}(u), Z_{ij};\ i = 1, \dots, 2n;\ j = 1, \dots, m_i;\ 0 \le u \le t\}.$$
It is worth noting that the observational times Ti0 for probands can be greater than time t and thus the filtration $\mathcal{F}_t$ may include probands’ failure times or censoring times that are beyond t, a feature that is unique to case-control family data. In regard to the relatives, however, $\mathcal{F}_t$ includes only information up to time t.
By the innovation theorem ([3], Thm. 3.4), the stochastic intensity process for Nij(t), i = 1, . . . , 2n, j = 1, . . . , mi, with respect to $\mathcal{F}_t$ is given as follows [11], [28]:
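$$\lambda_0(t)\exp(\beta^T Z_{ij})\,Y_{ij}(t)\,\psi_i(t,\gamma,\Lambda_0), \tag{7}$$
where, using (6),
$$\psi_i(t,\gamma,\Lambda_0) = \frac{\int \omega^{N_{i.}(t)+\delta_{i0}+1}\exp[-\omega\{H_{i.}(t)+H_{i0}(T_{i0})\}]\,f(\omega)\,d\omega}{\int \omega^{N_{i.}(t)+\delta_{i0}}\exp[-\omega\{H_{i.}(t)+H_{i0}(T_{i0})\}]\,f(\omega)\,d\omega}.$$
Define (for 0 ≤ r ≤ m and h ≥ 0)
$$\psi^*(r,h) = \frac{\int \omega^{r+1}e^{-h\omega}f(\omega)\,d\omega}{\int \omega^{r}e^{-h\omega}f(\omega)\,d\omega}. \tag{8}$$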
Some salient properties of ψ*(r, h) are noted in Sec. 9.1. With this definition, we have ψi(t, γ, Λ0) = ψ*(Ni.(t) + δi0, Hi.(t) + Hi0(Ti0)).
Let τg, g = 1, . . . , G, denote the gth ordered failure time of the relatives and assume that dg failures were observed at time τg. In theory, since we are dealing with continuous survival distributions, dg = 1 for all g, but we express the following estimators in a form that allows for a modest level of ties. A Breslow-type estimator of the cumulative baseline hazard function, with a jump at each observed failure time among the relatives, can be formulated in a natural way similarly to Shih and Chatterjee [31], with the g-th jump given by
(9) |
However, ψi(t, γ, Λ0) could be a function of Λ0(Ti0) and Ti0 could be greater than t. Consequently, the above Breslow formula for the jump in the baseline hazard estimator at time t will often involve values of Λ0 for times beyond time t. For example, under the gamma frailty model with expectation 1 and variance θ, $\psi_i(t,\gamma,\Lambda_0) = \{\theta^{-1} + N_{i.}(t) + \delta_{i0}\}\{\theta^{-1} + H_{i.}(t) + H_{i0}(T_{i0})\}^{-1}$. An iterative procedure is thus required to compute the estimator. In addition, because of this estimator’s complicated structure, its asymptotic properties have not been established.
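The dependence of ψi on Λ0(Ti0) is easy to see in the gamma case. The sketch below codes the closed form quoted above and evaluates it for a hypothetical family observed at t = 0.3 whose proband is observed at Ti0 = 0.3 or at Ti0 = 0.9, taking Λ0(t) = t and zero covariates purely for illustration; the function and argument names are assumptions of the sketch.

```python
def psi_gamma(theta, n_rel_events, delta_proband, h_relatives, h_proband):
    """Gamma-frailty form of psi_i: (1/theta + N_i.(t) + delta_i0)
    divided by (1/theta + H_i.(t) + H_i0(T_i0))."""
    numerator = 1.0 / theta + n_rel_events + delta_proband
    denominator = 1.0 / theta + h_relatives + h_proband
    return numerator / denominator

theta = 2.0
# One relative still at risk at t = 0.3 with no event so far: H_i.(t) = 0.3.
for proband_time in (0.3, 0.9):
    value = psi_gamma(theta, n_rel_events=0, delta_proband=1,
                      h_relatives=0.3, h_proband=proband_time)
    print(f"T_i0 = {proband_time}: psi = {value:.4f}")
```

When Ti0 = 0.9 the value of ψ at t = 0.3 already requires Λ0(0.9), which is why a Breslow-type estimator built directly on ψi has to be solved iteratively.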
We propose instead to estimate the baseline hazard function using a noniterative two-stage procedure. Let Λmax be some known (possibly large) upper bound for $\Lambda_0^o(\tau)$. Define $\bar\psi(r,h) = \psi^*(r, h \wedge h_{\max})$, with $h_{\max} = m e^{\nu}\Lambda_{\max}$, where ν is an upper bound on |βTZij|. Further, define $\bar\psi_i(t,\gamma,\Lambda) = \bar\psi(N_i(t), H_{i.}(t,\gamma,\Lambda))$. The first-stage estimator is then defined as a step function whose g-th jump is given by
(10) |
with
The formula (10) is of the same form as (9), with the following changes: (a) ψi is replaced by ψ‾i for technical reasons, (b) more substantively, in computing the jump at each failure time τg, we include only relatives whose proband’s observation time is less than τg. We thereby avoid the problem with (9) that was described above, and hence avoid the need for an iterative optimization process. Since (10) excludes some of the available data, these benefits are attained at the expense of a loss in efficiency. We therefore follow up with a second stage in order to recoup efficiency.
The second-stage estimator is defined as a step function whose g-th jump is given by
(11) |
where is defined analogously to ψ‾(t, γ, Λ0), with Λ0(Ti0) replaced by if Ti0 ≥ t and by Λ̂0(Ti0) otherwise. The large-sample properties of Λ̂0(t) will be determined by those of . The estimator is not necessarily bounded by Λmax, but if desired, we can replace it by without affecting the asymptotics.
For estimating (β, θ) we use a pseudo-likelihood approach: in the score functions based on L(1) and L(2), we replace the unknown Λ0 by Λ̂0. Thus, the score function corresponding to βl (for l = 1, . . . , p) is given by , and the estimating function for θ is given by . To summarize, our proposed estimation procedure is as follows. (1) Provide an initial value for γ. (2) For the given value of γ, estimate Λ0 using (10) and (11). (3) For the given value of Λ0, estimate γ. (4) Repeat Steps 2 and 3 until convergence is reached with respect to Λ̂0 and γ̂. Hence, instead of two nested iterative processes (an outer iteration between γ̂ and Λ̂0 and, within each outer step, an inner iteration for estimating Λ0 itself, as in [31]), we propose a single iterative process between γ̂ and Λ̂0. By eliminating the iterative process for estimating the baseline hazard function we are able to provide the asymptotic theory of our estimators, in contrast to [18] and [31].
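Steps (1)-(4) amount to alternating two updates until they stabilize. The skeleton below is a minimal sketch of that control flow only: `estimate_hazard` and `estimate_gamma` are hypothetical callables standing in for the two-stage hazard estimator and the pseudo-score equations, γ is treated as a scalar for simplicity, convergence is monitored on γ alone, and the toy linear maps at the end are chosen merely so that the loop has a known fixed point.

```python
def alternate(gamma_init, estimate_hazard, estimate_gamma, tol=1e-10, max_iter=200):
    """Alternate between a hazard update (Step 2) and a parameter update
    (Step 3) until the parameter changes by less than tol (Step 4)."""
    gamma = gamma_init
    for iteration in range(1, max_iter + 1):
        hazard = estimate_hazard(gamma)      # Step 2: hazard for the current gamma
        gamma_new = estimate_gamma(hazard)   # Step 3: gamma for the current hazard
        if abs(gamma_new - gamma) < tol:
            return gamma_new, estimate_hazard(gamma_new), iteration
        gamma = gamma_new
    raise RuntimeError("alternating procedure did not converge")

# Toy stand-ins (not the estimators of this paper): the fixed point is
# gamma = 2/3, hazard = 4/3.
gamma_hat, hazard_hat, n_iter = alternate(
    gamma_init=0.0,
    estimate_hazard=lambda g: 0.5 * g + 1.0,
    estimate_gamma=lambda lam: 0.5 * lam,
)
print(gamma_hat, hazard_hat, n_iter)
```

Because the hazard update is a direct calculation rather than an inner loop, each pass through the outer loop costs one hazard evaluation and one parameter update.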
4. Asymptotic Properties
We show that γ̂ is a consistent estimator of γo and that √n(γ̂ - γo) is asymptotically mean-zero multivariate normal. In this section, we present a broad outline of the argument. In Section 9, we provide the details of the proofs, including a detailed list of the technical conditions assumed. The arguments are patterned after those of [14] and [36], but with considerable expansion, as will be elaborated in Section 9.
Consistency is shown through the following steps.
Claim A1. converges in pr. to some function uniformly in t and γ. The function satisfies .
Claim A2. Λ̂0(t, γ) converges in pr. to some function Λ0(t, γ) uniformly in t and γ. The function Λ0(t, γ) satisfies .
Claim B. U(γ, Λ̂0(·, γ)) converges in pr. uniformly in t and γ to a limit u(γ, Λ0(·, γ)).
Claim C. There exists a unique consistent (in pr.) root to U(γ̂, Λ̂0(·, γ̂)) = 0.
It should be emphasized that in Claims A1 and A2 the limits coincide at the true parameter value γo. The proofs of Claims A1, A2, and B involve empirical process and function-space compactness arguments, while Claim C is shown using Foutz’s [8] theorem on consistency of maximum likelihood type estimators.
Asymptotic normality is shown by decomposing U(γ̂, Λ̂0(·, γ̂)) = 0 as
In Section 9 we analyze each of the above three terms and prove that √n(γ̂-γo) has an asymptotic mean-zero multivariate normal distribution. Although it is possible to develop a consistent closed-form sandwich estimator for the asymptotic covariance matrix, we do not present this estimator because it is too complicated to be practically useful. Instead, as discussed in Section 6, we recommend bootstrap standard-error estimates.
In Section 9 we also show the uniform consistency and weak convergence of the cumulative baseline hazard function estimator Λ̂0(t, γ̂). Such results were not presented in [14] or [36].
5. Extension to Restricted Sampling of Probands
A key assumption in our procedure for estimating Λ0 is that the support of the probands’ observation times and that of the relatives’ observation times have the same lower limit, which is designated (without loss of generality) as time zero. In some applications, however, the probands’ observed times are restricted to some range [s0, s1] with s0 > 0. For example, [25] presents a multi-center case-control breast cancer study in which the ages of cases and controls are restricted to 35-64 years. In a design of this form, where the probands’ observed times are left-restricted by s0 and the relatives’ failure times are unrestricted, Λ0 will be underestimated by our two-stage procedure. This bias can be corrected easily by first estimating Λ0(s0).
We present here the resulting three-stage estimator for the left-restricted design. Let , and let ΔΛ̂0{τg, Λ0(s0)} be defined analogously to and ΔΛ̂0(τg), with Λ0(Ti0) = Λ0(s0) + Στg∈[s0, Ti0]Δ0(τg). The estimator Λ̂0(s0) is defined as the root of
(12) |
The root can be found by simple univariate Newton-Raphson iteration. This completes the first stage. The second stage involves calculating , g = 1, . . . , G, using the formula (10). In the third stage, we use the results of the second stage and the formula (11) to calculate the final estimate ΔΛ̂0(τg), g = 1, . . . , G. In applying (11), we replace Λ0(Ti0) by if Ti0 ≥ τg and by Λ̂0(Ti0) otherwise.
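For readers unfamiliar with the first-stage root search, the following is a generic sketch of univariate Newton-Raphson iteration with a central-difference derivative. The estimating function of (12) is not coded here; the function `g_toy` is a hypothetical stand-in with a known root, and the names `newton_root` and `g_toy` are assumptions of the sketch.

```python
import math

def newton_root(g, x0, tol=1e-10, max_iter=50, eps=1e-6):
    """Find a root of the scalar function g by Newton-Raphson iteration,
    approximating the derivative by a central difference."""
    x = x0
    for _ in range(max_iter):
        slope = (g(x + eps) - g(x - eps)) / (2.0 * eps)
        if slope == 0.0:
            raise ZeroDivisionError("flat estimating function; choose another start")
        step = g(x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

def g_toy(x):
    """Hypothetical stand-in: the root solves 1 - exp(-x) = 0.1."""
    return 1.0 - math.exp(-x) - 0.1

root = newton_root(g_toy, x0=0.5)
print(f"root = {root:.8f}")
```

In practice the starting value matters little for a smooth monotone estimating function, but a bracketing method such as bisection is a safe fallback if the iteration leaves the admissible range.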
In Section 6 below, we present simulation results for this estimator. In theory, the asymptotic properties of the three-stage procedure could be worked out via an extension of the arguments for the two-stage procedure, but the algebra becomes very complicated. We hope to develop asymptotic theory for the left-restricted design in future work.
6. Simulation Results - Gamma Frailty
We have performed a simulation study to evaluate the finite sample performance of the proposed method and compare it with existing methods. One of the most extensively used frailty distributions is the gamma distribution: customarily, the gamma distribution with expectation 1 and variance θ. Under this model, the variance parameter θ quantifies the heterogeneity of risk among families, with larger values of θ corresponding to stronger within-family dependence. In addition, the gamma frailty model can be re-expressed in terms of the Clayton-Oakes copula-type model [6, 27]. Moreover, the cross-ratio, introduced by Oakes [27] as a local measure of association between survival times, is constant on the support of the failure time distribution and equals 1+θ. Finally, the gamma frailty model is convenient mathematically, because it admits a closed-form representation of the marginal survival distributions. These features make the gamma frailty model very popular. We therefore chose the gamma frailty model as the framework for our simulation study.
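The constancy of the cross-ratio under gamma frailty can be verified numerically. The sketch below uses the bivariate survival function implied by a gamma frailty with expectation 1 and variance θ, here with unit conditional cumulative hazards Λ0(t) = t and no covariates (illustrative assumptions), and approximates the cross-ratio S · ∂²S/∂t1∂t2 / (∂S/∂t1 · ∂S/∂t2) by finite differences at several points.

```python
def joint_survival(t1, t2, theta):
    """Bivariate survival function under a shared gamma frailty with
    mean 1 and variance theta, with Lambda0(t) = t for both members."""
    return (1.0 + theta * (t1 + t2)) ** (-1.0 / theta)

def cross_ratio(t1, t2, theta, eps=1e-3):
    """Finite-difference approximation of S * S_12 / (S_1 * S_2)."""
    s = joint_survival
    d1 = (s(t1 + eps, t2, theta) - s(t1 - eps, t2, theta)) / (2.0 * eps)
    d2 = (s(t1, t2 + eps, theta) - s(t1, t2 - eps, theta)) / (2.0 * eps)
    d12 = (s(t1 + eps, t2 + eps, theta) - s(t1 + eps, t2 - eps, theta)
           - s(t1 - eps, t2 + eps, theta) + s(t1 - eps, t2 - eps, theta)) / (4.0 * eps ** 2)
    return s(t1, t2, theta) * d12 / (d1 * d2)

theta = 2.0
for t1, t2 in [(0.1, 0.1), (0.3, 0.7), (0.9, 0.2)]:
    print(f"({t1}, {t2}): cross-ratio = {cross_ratio(t1, t2, theta):.4f}")
```

Up to finite-difference error, each evaluation returns 1 + θ regardless of the time point, in line with the property noted above.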
Simulation results are based on 500 control probands matched to 500 case probands, with one relative sampled for each proband. We considered a single U[0, 1] distributed covariate with β = ln(2), Λ0(t) = t, θ = 2, and a U[0, 1] censoring variable, yielding a censoring rate among the relatives of approximately 60%. In Table 1 we compare the following three estimates: the proposed estimate with the two-stage procedure for Λ0, the estimate of [18], and a modified version of the estimate of [31], with their method adapted to the gamma frailty model. Results are based on 500 simulated data sets. The efficiency difference between our two-stage estimator and that of Shih and Chatterjee is very small.
Table 1.
Simulation results: 500 control probands matched with 500 case probands; one relative for each proband; β = 0.693, Λ0(t) = t, θ = 2.0, 500 samples
| Estimator | Proposed Method: Mean | Proposed Method: Empirical Standard Error | Hsu et al.: Mean | Hsu et al.: Empirical Standard Error | Shih and Chatterjee: Mean | Shih and Chatterjee: Empirical Standard Error |
|---|---|---|---|---|---|---|
| β̂ | 0.706 | 0.197 | 0.697 | 0.201 | 0.698 | 0.182 |
| θ̂ | 2.003 | 0.312 | 1.986 | 0.302 | 1.992 | 0.303 |
| Λ̂0(0.2) | 0.201 | 0.034 | 0.204 | 0.030 | 0.202 | 0.029 |
| Λ̂0(0.4) | 0.402 | 0.063 | 0.407 | 0.058 | 0.403 | 0.054 |
| Λ̂0(0.6) | 0.603 | 0.095 | 0.612 | 0.090 | 0.605 | 0.084 |
| Λ̂0(0.8) | 0.809 | 0.136 | 0.820 | 0.131 | 0.811 | 0.122 |
For our method, we also performed simulations for two additional settings. In both settings, we took β = 0 and θ = 3. The first setting involved a censoring distribution of U[0, 4] and a covariate with a U[0, 4] distribution; the second involved a censoring distribution of U[0, 0.1] and a covariate with a U[0, 1] distribution. The respective overall censoring rates in these two settings were approximately 30% and 90%. To construct confidence intervals, we used a bootstrap approach. In the setting of censored survival data, the usual nonparametric bootstrap is problematic because it leads to a substantial proportion of tied survival times. We therefore used the weighted bootstrap approach of [22]. For the weighted bootstrap, a sample of 2n independent and identically distributed weights from the unit exponential distribution was generated for each bootstrap sample. Let ξ1, . . . , ξ2n be the standardized weights after dividing each weight by the average weight. Then, in the estimating functions, for any given function h the empirical mean is replaced by its corresponding weighted empirical mean . This weighted bootstrap procedure gives valid inference for all parameters under right-censored univariate failure times [22].
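The weight-generation step can be sketched as follows. The code draws unit exponential weights, standardizes them by their average, and forms a weighted empirical mean; the per-family values in `values` are placeholders, and the function names are assumptions of the sketch rather than part of the procedure of [22].

```python
import random

def standardized_exp_weights(n_families, rng):
    """Draw i.i.d. unit exponential weights and divide by their average,
    so that the standardized weights average exactly 1."""
    raw = [rng.expovariate(1.0) for _ in range(n_families)]
    mean_raw = sum(raw) / n_families
    return [w / mean_raw for w in raw]

def weighted_mean(values, weights):
    """Weighted empirical mean: each term is multiplied by its weight
    before averaging over families."""
    return sum(w * v for w, v in zip(weights, values)) / len(values)

rng = random.Random(7)
n_families = 1000                                       # 2n in the notation of the text
values = [rng.random() for _ in range(n_families)]      # placeholder h-values
weights = standardized_exp_weights(n_families, rng)
print(sum(values) / n_families, weighted_mean(values, weights))
```

Because the weights are continuous, reweighting perturbs every family's contribution without duplicating observations, which avoids the ties produced by resampling with replacement.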
Results for the two-stage procedure for Λ0 are presented in Table 2 for various levels of censoring. We present the mean, the empirical standard error, and the coverage rate of the 95% weighted bootstrap confidence interval. The results are based on 50 bootstrap samples for each of the 2000 simulated data sets of each configuration. Our estimates perform well in terms of bias and coverage probability.
Table 2.
Simulation results for the proposed estimators: 500 control probands matched with 500 case probands; one relative for each proband; Λ0(t) = t; 2000 samples
| Censoring rate | θ | Estimator | β = 0.0: Mean | β = 0.0: Empirical Standard Error | β = 0.0: Coverage Rate (%) | β = 0.693: Mean | β = 0.693: Empirical Standard Error | β = 0.693: Coverage Rate (%) |
|---|---|---|---|---|---|---|---|---|
| 90% | 2.0 | β̂ | -0.013 | 0.217 | 93.5 | 0.694 | 0.200 | 96.0 |
| 90% | 2.0 | θ̂ | 2.127 | 0.872 | 96.0 | 2.082 | 0.667 | 94.8 |
| 90% | 2.0 | Λ̂0(0.02) | 0.020 | 0.006 | 94.2 | 0.020 | 0.005 | 95.2 |
| 90% | 2.0 | Λ̂0(0.04) | 0.041 | 0.010 | 94.7 | 0.040 | 0.010 | 95.2 |
| 90% | 2.0 | Λ̂0(0.06) | 0.061 | 0.015 | 94.8 | 0.060 | 0.014 | 96.1 |
| 90% | 2.0 | Λ̂0(0.08) | 0.081 | 0.020 | 95.0 | 0.080 | 0.019 | 96.1 |
| 90% | 3.0 | β̂ | -0.025 | 0.226 | 91.7 | 0.689 | 0.206 | 95.4 |
| 90% | 3.0 | θ̂ | 3.126 | 1.142 | 94.2 | 3.172 | 0.964 | 95.7 |
| 90% | 3.0 | Λ̂0(0.02) | 0.020 | 0.005 | 95.7 | 0.020 | 0.005 | 94.8 |
| 90% | 3.0 | Λ̂0(0.04) | 0.041 | 0.012 | 95.8 | 0.040 | 0.010 | 95.9 |
| 90% | 3.0 | Λ̂0(0.06) | 0.062 | 0.016 | 96.1 | 0.060 | 0.014 | 96.5 |
| 90% | 3.0 | Λ̂0(0.08) | 0.082 | 0.021 | 95.9 | 0.080 | 0.019 | 95.7 |
| 30% | 2.0 | β̂ | 0.007 | 0.047 | 95.5 | 0.703 | 0.063 | 96.5 |
| 30% | 2.0 | θ̂ | 2.013 | 0.247 | 95.3 | 1.993 | 0.196 | 95.5 |
| 30% | 2.0 | Λ̂0(0.2) | 0.200 | 0.037 | 95.5 | 0.197 | 0.045 | 94.5 |
| 30% | 2.0 | Λ̂0(0.4) | 0.397 | 0.073 | 95.0 | 0.394 | 0.085 | 94.0 |
| 30% | 2.0 | Λ̂0(0.6) | 0.596 | 0.110 | 95.1 | 0.591 | 0.125 | 94.0 |
| 30% | 2.0 | Λ̂0(0.8) | 0.794 | 0.147 | 95.5 | 0.788 | 0.166 | 94.1 |
| 30% | 3.0 | β̂ | 0.006 | 0.048 | 97.3 | 0.703 | 0.061 | 97.2 |
| 30% | 3.0 | θ̂ | 3.009 | 0.370 | 95.3 | 2.999 | 0.314 | 96.0 |
| 30% | 3.0 | Λ̂0(0.2) | 0.200 | 0.040 | 94.0 | 0.197 | 0.047 | 94.4 |
| 30% | 3.0 | Λ̂0(0.4) | 0.399 | 0.078 | 94.1 | 0.392 | 0.091 | 94.0 |
| 30% | 3.0 | Λ̂0(0.6) | 0.597 | 0.116 | 95.0 | 0.586 | 0.133 | 94.9 |
| 30% | 3.0 | Λ̂0(0.8) | 0.796 | 0.155 | 95.6 | 0.792 | 0.176 | 95.0 |
To study the case of left-restricted data, we considered a configuration similar to that of Table 1, but with the probands’ observation times restricted to exceed s0 = 0.1. In Table 3, we present results for our three-stage estimator as well as for the methods of [18] and [31]. We see that estimating Λ0(s0) leads to modest efficiency loss in Λ̂0 relative to the other two methods.
Table 3.
Simulation results of left-restricted data: 500 control probands matched with 500 case probands; one relative for each proband; s0 = 0.1, β = 0.693, Λ0(t) = t, θ = 2.0, 500 samples
| Estimator | Proposed Method: Mean | Proposed Method: Empirical Standard Error | Hsu et al.: Mean | Hsu et al.: Empirical Standard Error | Shih and Chatterjee: Mean | Shih and Chatterjee: Empirical Standard Error |
|---|---|---|---|---|---|---|
| β̂ | 0.735 | 0.214 | 0.698 | 0.234 | 0.694 | 0.170 |
| θ̂ | 2.040 | 0.336 | 2.080 | 0.338 | 2.080 | 0.337 |
| Λ̂0(0.2) | 0.195 | 0.049 | 0.198 | 0.034 | 0.198 | 0.031 |
| Λ̂0(0.4) | 0.392 | 0.090 | 0.402 | 0.068 | 0.401 | 0.062 |
| Λ̂0(0.6) | 0.589 | 0.129 | 0.604 | 0.102 | 0.603 | 0.092 |
| Λ̂0(0.8) | 0.786 | 0.172 | 0.813 | 0.143 | 0.810 | 0.128 |
| Λ̂0(s0) | 0.098 | 0.025 | - | - | - | - |
7. Example
We apply our method to the breast cancer study mentioned in the introduction. Various risk factors were measured on probands and their relatives. For illustrative purposes we consider age at first full-term pregnancy with the relatives of the probands being the mothers. The following analysis is based on 437 breast cancer case probands matched with 437 control probands and a total of 874 mothers. The number of mothers who had breast cancer was 70 among the case families and 35 among the control families. The number of women whose first live birth occurred before age 20 was 142 among the probands and 181 among the mothers. We use the gamma frailty model with expectation 1 and variance θ. Three estimation procedures are considered: our proposed method, the method of Hsu et al. [18], and a modified version of the estimate of Shih and Chatterjee [31]. For our proposed method, the two-stage procedure for Λ0 is used since the age range of the mothers with breast cancer was 20-76 and the age range of the probands was 22-44. Table 4 presents the regression coefficient estimate β̂, the dependency parameter estimate θ̂, and Λ̂0 at ages 40, 50, 60 and 70 years, along with their respective bootstrap standard errors. The proposed approach and that of Shih and Chatterjee yielded similar dependency estimates, with the proposed approach being moderately more efficient. Hsu et al.’s approach gave a slightly lower dependence estimate. The regression coefficient estimates of Hsu et al. and of Shih and Chatterjee are similar, with the latter being slightly more efficient. The proposed approach yielded a slightly lower covariate effect. The cumulative baseline hazard estimates are similar under the three estimation techniques.
The results of all the three methods imply that women who had their first full-term pregnancy before age 20 have a reduced risk of developing breast cancer, supporting the observation that breast cancer risk is reduced by early first full-term pregnancy (e.g. [7], among others). The estimates of the dependency parameter imply that after adjusting for the first full-term pregnancy, there remains a significant dependency between the ages of onset for mothers and daughters with cross ratio (1 + θ) close to 2.
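As a small worked calculation on the reported estimates for the proposed method (β̂ = -0.440 with bootstrap standard error 0.158, and θ̂ = 0.952), the sketch below converts β̂ into a conditional hazard ratio and θ̂ into a cross-ratio. The 95% interval uses a normal approximation with the bootstrap standard error, which is an assumption of this sketch and not necessarily the interval construction used in the analysis.

```python
import math

# Estimates reported in Table 4 for the proposed method.
beta_hat, se_beta = -0.440, 0.158
theta_hat = 0.952

# Conditional (family-specific) hazard ratio for first full-term
# pregnancy before age 20, with a normal-approximation 95% interval.
hazard_ratio = math.exp(beta_hat)
ci_low = math.exp(beta_hat - 1.96 * se_beta)
ci_high = math.exp(beta_hat + 1.96 * se_beta)

# Cross-ratio under the gamma frailty model.
cross_ratio_hat = 1.0 + theta_hat

print(f"hazard ratio = {hazard_ratio:.3f} ({ci_low:.3f}, {ci_high:.3f})")
print(f"cross-ratio  = {cross_ratio_hat:.3f}")
```

Under these assumptions the interval for the hazard ratio lies below 1, consistent with the protective association described above.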
Table 4.
Analysis of a case-control family study of breast cancer
| Estimator | Proposed Method: Estimate | Proposed Method: Bootstrap Standard Error | Hsu et al.: Estimate | Hsu et al.: Bootstrap Standard Error | Shih and Chatterjee: Estimate | Shih and Chatterjee: Bootstrap Standard Error |
|---|---|---|---|---|---|---|
| β̂ | -0.440 | 0.158 | -0.484 | 0.216 | -0.476 | 0.168 |
| θ̂ | 0.952 | 0.443 | 0.889 | 0.443 | 0.944 | 0.460 |
| Λ̂0(40) | 0.005 | 0.002 | 0.005 | 0.002 | 0.005 | 0.002 |
| Λ̂0(50) | 0.022 | 0.006 | 0.023 | 0.006 | 0.023 | 0.006 |
| Λ̂0(60) | 0.048 | 0.010 | 0.051 | 0.010 | 0.049 | 0.010 |
| Λ̂0(70) | 0.091 | 0.016 | 0.095 | 0.016 | 0.092 | 0.016 |
8. Discussion
In this work we have presented a new estimator for matched case-control family study survival data under a frailty model, allowing an arbitrary frailty distribution with finite moments. Rigorous large sample theory has been provided. Simulation results under the popular gamma frailty model indicate that the proposed procedure provides estimates with minimal bias and confidence intervals with the appropriate coverage rate. Moreover, our estimators were seen to be essentially identical in efficiency to estimators based on the more complex approach of [31].
Rigorous large sample theory has been provided for age-unrestricted sampling of cases and controls. For age-restricted sampling, the asymptotic theory could be worked out largely following the arguments for the two-stage estimator but the algebra becomes very complicated. This development is a potential topic for future work.
Having suggested a model with an arbitrary frailty distribution with finite moments, we offer some remarks on how to choose the frailty distribution and the effect of this choice on the parameter estimates. Hougaard [17] provides a comprehensive discussion of the theoretical properties and the fit of the following distributions: gamma, positive stable, power variance function (PVF), inverse Gaussian and lognormal. Hsu et al. [20] show by simulation that the biases in the marginal regression estimates and the marginal hazard function are generally 10% or lower when a gamma distribution is assumed and the frailty distribution is mis-specified. This suggests that the gamma frailty model can be a practical choice if the marginal parameters are of primary interest. However, when the dependence function is also of interest, a correct specification of the frailty distribution is crucial. A general diagnostic approach to check the bivariate association structure of clustered failure times is given in [13]. Additional tests and graphical procedures for checking the dependence structure of clustered failure-time data can be found in [5, 12, 30, 32, 33]. These procedures, however, are not directly applicable to the retrospective setting we deal with here; extension of the procedures to this setting will be needed.
9. Asymptotic Theory: Conditions and Proofs
This section presents the technical conditions we assume for the asymptotic results and the proofs of these results. The pattern of the argument is as in [14], [35] and [36], but considerable extension is required, mainly because of the two-stage cumulative baseline hazard function estimator for the case-control family data. A pseudo full likelihood estimation procedure for prospective survival data with a general semiparametric shared frailty model is given by [14] and [35]. A pseudo partial likelihood method for a semiparametric survival model with errors in covariates is provided by [36]. We focus here on the added arguments needed for the present setting, and refer back to [35] and [36] for the other derivations. Briefly, the main extensions required in this work are as follows: (i) In showing consistency of , the proof of Claim A in [36] cannot be applied directly since the quantity
which is the denominator of the term for the jumps in , tends to 0 as t → 0. (ii) For the asymptotic normality of , we need a workable representation of the baseline hazard function estimators (10) and (11). The approach of [36] cannot be applied directly since involves the “vanishing-denominator” problem mentioned above. For we use the representation of along with a recursive solution only for the relatives’ failure times.
9.1. Assumptions, Background, and Preliminaries
For the asymptotic theory, we make a number of assumptions. Several of these assumptions have already been listed in the main text. Below we list the additional assumptions.
There is a maximum follow-up time τ ∈ (0, ∞) with .
The frailty random variable ωi has finite moments up to order (m+2).
The baseline hazard function is bounded over [0, τ] by some fixed (but not necessarily known) constant λmax.
The function f’(w; θ) = (d/dθ)f(w; θ) is absolutely integrable.
For any given family, there is a positive probability of at least two failures.
Defining , we have
This assumption is needed in the analysis of the first-stage estimator. For r = 1, it parallels Assumption (5.4) of [21].
(13) |
The matrix is invertible with probability going to 1 as n → ∞. It should be noted that a general proof of invertibility is intractable, but given the data, one can easily check that numerically the matrix is invertible.
With ψ*(r, h) as in (8), we define and . It is easily seen that is finite and is strictly positive. The two lemmas below correspond to Lemmas 1 and 3 of [36].
Lemma 1. The function ψ*(r, h) is decreasing in h. Hence for all and all t, and .
Lemma 2. For any ∊ > 0, we have as n → ∞.
9.2. Consistency
As indicated in Sec. 4, the consistency proof proceeds in several stages.
Claim A1. converges in probability to some function uniformly in t and γ. The function satisfies .
Proof. We can write as
(14) |
The proof builds on that of the corresponding Claim A in [36]. The main point needing attention here is the fact that, because of the indicators I(Ti0 < s), the denominator of (14) tends to 0 as s → 0.
Define, in parallel with [36],
and
We write Ξn(t, γ, Λ) for Ξn(t, γ, Λ, 0) and Ξ(t, γ, Λ) for Ξ(t, γ, Λ, 0). By definition, satisfies the equation .
Remark
In [36], we had Ξn(t, γ, Λ) → Ξ(t, γ, Λ) a.s. as n → ∞, uniformly over t ∈ [0, τ], , and Λ in a certain set. We could not obtain such a result here; the argument of [2] fails in the neighborhood of zero because of the “vanishing denominator” problem. This is why we give only an in pr. consistency result rather than an a.s. result.
Next, define
This function qγ(s, Λ) has the same properties as its counterpart in [36]: these properties are not interfered with by the insertion of the indicator function I(Ti0 < s). In particular, from Lemma 1 we have
This leads to a bound on qγ(s, Λ). In addition, the function qγ(s, Λ) has the following Lipschitz-like property:
Hence, by [15] (Thm. 1.1), the equation Λ(t) = Ξ(t, γ, Λ) has a unique solution, which we denote by . The claim then is that converges in pr. (uniformly in t and γ) to .
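The uniqueness argument above rests on a Lipschitz condition, which is the setting in which successive (Picard) approximation converges to a single solution. The following Python sketch illustrates that mechanism for a scalar integral equation Λ(t) = ∫₀ᵗ q(s, Λ(s)) ds. The function `q_example` is a hypothetical stand-in, chosen only because it is decreasing and Lipschitz in Λ; it is not the qγ(s, Λ) of this paper, and the grid size and tolerances are likewise assumptions of the sketch.

```python
import math


def q_example(s, lam):
    """Hypothetical integrand: decreasing and Lipschitz (constant 1) in lam >= 0."""
    return 1.0 / (1.0 + lam)


def picard_solve(q, t_max, n_grid=2000, max_iter=60, tol=1e-12):
    """Successive approximation for Lambda(t) = int_0^t q(s, Lambda(s)) ds.

    Each sweep plugs the current iterate into the right-hand side and
    integrates with the trapezoid rule on a uniform grid.
    """
    h = t_max / n_grid
    grid = [k * h for k in range(n_grid + 1)]
    lam = [0.0] * (n_grid + 1)  # starting guess Lambda = 0
    for _ in range(max_iter):
        vals = [q(t, l) for t, l in zip(grid, lam)]
        new = [0.0]
        for k in range(1, n_grid + 1):
            new.append(new[-1] + 0.5 * h * (vals[k - 1] + vals[k]))
        change = max(abs(a - b) for a, b in zip(new, lam))
        lam = new
        if change < tol:
            break
    return grid, lam


if __name__ == "__main__":
    grid, lam = picard_solve(q_example, t_max=1.0)
    # For this q the exact solution is sqrt(1 + 2t) - 1.
    print(lam[-1], math.sqrt(3.0) - 1.0)
```

For this particular choice of q the equation has the closed-form solution Λ(t) = √(1 + 2t) − 1, so the numerical iterate can be checked directly.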
We now define to be the solution of , starting from . For t < ∊ we set . Similarly, we define to be the solution of Λ0(t, γ, ∊) = Ξ(t, γ, Λ0(·, γ), ∊), starting from Λ0(∊, γ, ∊) = 0, and set for t < ∊. An induction argument similar to that in the proof of [15] (Thm. 1.1) shows that , where K is the Lipschitz constant for qγ(s, Λ). We thus have
(15) |
Now, for any given ∊ > 0, there is no “vanishing denominator” problem on the interval [∊, τ]. Hence, the argument in [36] goes through as is, and we get the following result: for any ∊ > 0,
(16) |
(and hence in probability). In fact, in the supremum above, we can replace [∊, τ] by [0, τ], since for t < ∊.
Our aim now is to show that in pr. as n → ∞. Now
(17) |
The second and third terms are easily dealt with using (15) and (16). It remains to deal with the first term.
Define
We can then write
(18) |
where
We deal with the two terms on the right side of (18) in turn. In what follows, we let R denote a “generic” constant which may vary from one appearance to another, but does not depend on the unknown parameters or ∊.
Denote Π(s) = C*(s, 0). It is clear that , where
We can write
where Mij is the martingale process corresponding to Nij:
(19) |
The first term in Υ(t, γ) is clearly bounded by . Thus, denoting the second term by M*(t), we have
(20) |
We next examine A(t, ∊). We can restrict to t ≥ ∊, since A(t, ∊) = 0 for t < ∊. Denote . Bearing in mind the Lipschitz property of we find that
Note that, for t ≥ ∊, dA(t, ∊) = dΔ(t). Thus, a simple induction and some additional simple manipulations lead to the following, where we employ the symbol to denote product integral and use the fact that :
In view of the analysis above of Υ(t), we get
(21) |
Putting (18), (20), and (21) together, we get
(22) |
for suitable absolute constants R1 and R2.
The last main step is to deal with the martingale process
By an argument using Lenglart’s and Markov’s inequalities, as in [21] (p. 595), we obtain (with ξ1 as in Assumption 6)
(23) |
Given (22) and (23), we have control over the first term of (17), and the proof is thus complete.
Claim A2. converges in probability to some function Λ0(t, γ) uniformly in t and γ. The function Λ0(t, γ) satisfies .
Proof. We can write Λ̂0(t, γ) as
In view of Claim A1 above, up to a uniform error of oP (1) we can replace all instances of in the definition of by . The desired result then can be obtained using the argument used to prove Claim A of [36].
Claim A3. We have
Proof. By appeal to Claims A1 and A2, and to continuity of and .
Claim B. U(γ, Λ̂0(·, γ)) converges in probability uniformly in t and γ to a limit u(γ, Λ0(·, γ)).
Proof. As in Claim B of [36].
Claim C. There exists a unique consistent (in pr.) root to U(γ̂, Λ̂0(·, γ̂)) = 0.
Proof. By appeal to Foutz’s theorem [8], as in Claim C of [36].
9.3. A Workable Representation of
To develop our asymptotic normality result, we need a workable representation of . The first step is to develop a suitable representation of . Then, building on this, we develop our representation of .
9.3.1. Representation of
Our starting point is the following simple lemma.
Lemma 3. Let and be stochastic processes, and let An(t, ∊) and Bn(t, ∊) be quantities that are bounded in pr. uniformly in t and ∊. Define and . Suppose that:
as n → ∞ for any fixed ∊ > 0.
lim∊↓0 lim supn→∞ Pr(supt∈[0,∊] √n|Rn(t)| > δ) = 0 for all δ > 0.
lim∊↓0 lim supn→∞ Pr(supt∈[0,∊] √n|Sn(t)| > δ) = 0 for all δ > 0.
lim∊↓0 supt∈[0,τ] |Bn(t, ∊) - Bn(t, 0)| = 0 with probability going to 1 as n → ∞.
Then .
We apply this lemma with . We have to check the four conditions enumerated in the lemma.
Condition 1
Arguments along the lines of [36] yield the result of Condition 1, with
(24) |
and An(t, ∊) = Bn(t, ∊) = p̃(t, ∊)-1 where
Here
In the above, η1i(s) is defined as
In Sec. 9.3.2 below, we present in detail a similar argument for .
Appealing to Assumption 6 and using arguments similar to those used in the consistency proof, we find that the Ω quantities converge in probability uniformly in s and t, so that p̃(t, ∊) converges in probability to a deterministic limit uniformly in t and ∊.
Conditions 2, 3, and 4
In regard to Condition 2, we have , where
where
and Mij(t) is defined as in (19). We will deal with Δ1(t) and Δ2(t) in turn, starting with Δ2(t). In the development below, R denotes a “generic” absolute constant.
The quadratic variation process of Δ2(t) is given by
By arguments similar to those of [21] (p. 595), we find that E[n〈Δ2〉(t)] ≤ Rξ1(t). An application of Lenglart’s inequality then gives
Assumption 6 implies that ξ1(∊) ↓ 0 as ∊ ↓ 0, and this takes care of Δ2(t).
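For reference, the standard form of Lenglart's inequality and the way it combines with Markov's inequality and the moment bound stated above can be written out as follows. This is a sketch of the generic argument, assuming that √n Δ2 is a local square-integrable martingale (as the use of its quadratic variation suggests); the constants η and δ are introduced here for illustration and the display is not necessarily the exact bound obtained in the text.

```latex
% Lenglart's inequality for a local square-integrable martingale M with
% predictable variation <M>: for any \eta > 0 and \delta > 0,
\[
  \Pr\Bigl( \sup_{t \in [0,\epsilon]} |M(t)| > \eta \Bigr)
  \;\le\; \frac{\delta}{\eta^{2}}
  + \Pr\bigl( \langle M \rangle(\epsilon) > \delta \bigr).
\]
% Taking M = \sqrt{n}\,\Delta_2 and bounding the last term by Markov's inequality,
\[
  \Pr\Bigl( \sup_{t \in [0,\epsilon]} \sqrt{n}\,|\Delta_2(t)| > \eta \Bigr)
  \;\le\; \frac{\delta}{\eta^{2}}
  + \frac{E\bigl[\, n \langle \Delta_2 \rangle(\epsilon) \,\bigr]}{\delta}
  \;\le\; \frac{\delta}{\eta^{2}} + \frac{R\,\xi_1(\epsilon)}{\delta}.
\]
```

With the choice δ = ξ1(∊)^{1/2}, for example, the right-hand side becomes ξ1(∊)^{1/2}(η^{-2} + R), which tends to 0 as ∊ ↓ 0 for every fixed η > 0.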
We now turn to Δ1(t). Denote J(s) = I(Π(s) > 0). We can write Δ1(t) = Δ1a(t) + Δ1b(t), with
The term Δ1b(t) can be shown to be uniformly Op(n-½) by the argument in the middle of page 595 in [21]. As for Δ1a(t), we have
Thus, for t small, |Δ1a(t)| ≤ [Rt/(1-Rt)][Δ1b(t) + Δ2(t)], and the terms on the right hand side have already been taken care of.
The proof of Condition 3 is similar to that given above for Δ2(t). Condition 4 follows easily from the uniform convergence of the Ω quantities.
9.3.2. Representation of
Define
and
where in we take if Ti0 ≥ s and Λ̂0(Tij) if Tij < s, j ≥ 0. By Claim A3, we have sups∈[0,τ] |Λ̂0 (s, γo) - Λ̂0 (s-, γo)| converges to zero. Thus, we obtain the following approximation, uniformly over t ∈ [0, τ]:
Now let with or according to the estimator being used. Define and as the first and second derivative of with respect to r, respectively. Then, by a first order Taylor expansion of (s, r) we get
The justification for ignoring the remainder term in the Taylor expansion is as in the parallel argument in [36]. Note that in the above approximation probands’ data are involved since the derivative involves an estimator of Λ0 for the probands (either Λ or ) and not only an estimator for the relatives.
The second, third and fifth terms of the above equation can be written, by interchanging the order of integration, as
where Ñij(s) = I(Tij ≤ t),
and for j ≥ 1
The fourth term can be written, by plugging in the representation for , as
where
and . Given all the above, we get
By solving the above approximation recursively, for the relatives’ failure times, we get
(25) |
where
and
9.4. Asymptotic normality of n1/2(γ̂-γo)
To show that γ̂ is asymptotically normally distributed, we expand as
We examine in turn each of the terms on the left-hand side of the above equation.
Step I
We can write where , i = 1, . . . , n, are iid mean-zero random (p + 1)-vectors stemming from the likelihood of the proband data, while , i = 1, . . . , 2n, are iid mean-zero random (p + 1)-vectors stemming from the likelihood of the relatives’ data. It follows immediately from the classical central limit theorem that is asymptotically mean-zero multivariate normal.
Step II
Let r = 1, . . . , p, and (in this segment of the proof, when we write () the intent is to signify ()). Further, denote , i = 1, . . . , 2n, j = 0, . . . , mi, r = 1, . . . , p + 1. First order Taylor expansion of about , r = 1, . . . , p + 1, then gives
Although probands are involved in the above stochastic integral, its integrand is predictable since, by definitions (10)-(11), depends only on data up to time t-. Using the representation in Sec. 9.3.2 for and replacing certain empirical sums by their limiting values (see the expanded paper for details), we obtain a representation of the form
Thus, we have represented r = 1, . . . , p + 1 as the average of mean zero iid random variables. Hence, asymptotic normality follows from the classical central limit theorem.
Step III
First order Taylor expansion of about γo = (βoT, θo)T gives
where for l, s = 1, . . . , p + 1.
Combining the results of Steps I-III above we get that n1/2(γ̂-γo) is asymptotically zero-mean normally distributed with a covariance matrix that can be consistently estimated by a sandwich-type estimator.
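To make the phrase "sandwich-type estimator" concrete, the following Python sketch computes the generic form A⁻¹BA⁻ᵀ/n from per-cluster estimating-function contributions, and raises an error when the derivative matrix is numerically singular. It is only a schematic under stated assumptions: in the present setting the per-family contributions would also have to include the terms arising from the estimation of the cumulative baseline hazard (Steps I and II), which are not reproduced here, and the names `invert_matrix` and `sandwich_covariance` are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert_matrix(a, eps=1e-12):
    """Gauss-Jordan inverse with partial pivoting.

    Raises ValueError when a pivot is numerically zero, which serves as a
    simple numerical check of invertibility.
    """
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sandwich_covariance(contributions, derivative):
    """Generic sandwich covariance A^{-1} B A^{-T} / n.

    contributions: list of n mean-zero vectors (one per independent cluster).
    derivative:    p x p matrix A, the average derivative of the estimating
                   function with respect to the parameter.
    """
    n = len(contributions)
    p = len(contributions[0])
    meat = [[sum(c[a] * c[b] for c in contributions) / n for b in range(p)]
            for a in range(p)]
    a_inv = invert_matrix(derivative)
    cov = matmul(matmul(a_inv, meat), transpose(a_inv))
    return [[v / n for v in row] for row in cov]


if __name__ == "__main__":
    # Toy check: estimating function x_i - mu for a mean, so A = -1.
    data = [1.0, 2.0, 3.0, 4.0]
    mu_hat = sum(data) / len(data)
    contrib = [[x - mu_hat] for x in data]
    print(sandwich_covariance(contrib, [[-1.0]]))
```

In the toy example the result reduces to the empirical variance of the data divided by n, the familiar variance estimate for a sample mean.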
9.5. Asymptotic properties of
We can write
(26) |
In (25), we have a representation of the first term above in terms of integrals with respect to the martingale processes Mij. Weak convergence of the first term can thus be established using the martingale central limit theorem, as in [1]. In particular, the first term is tight. In regard to the second term, Taylor expansion yields
where
The limiting value of W(t, γ) is
Since the integrand is bounded, the function ω(t, γ) is Lipschitz in t. We just showed that √n(γ̂ - γo) converges to a mean-zero normal variate. Hence the second term in (26) is tight. Accordingly, the entire expression (26) is tight.
Now, as seen in the normality proof of Sec. 9.4, both terms in (26) can be represented in terms of i.i.d. sums over i of functions of the data on family i. Hence, asymptotic normality of the finite dimensional distributions of the entire expression (26) follows from the classical central limit theorem. This, together with the tightness just shown, establishes weak convergence of to a Gaussian process. A fortiori,
Acknowledgements
The authors would like to thank Dr. Kathleen Malone for sharing the data from the case-control family study of breast cancer, which motivated the development of this work. The research was supported in part by grants from the National Institutes of Health and the United States-Israel Binational Science Foundation (BSF).
Footnotes
AMS 2000 subject classifications: Primary 62N01, 62N02, 62H12
Contributor Information
Malka Gorfine, Faculty of Industrial Engineering and Management, Technion City, Haifa 32000, Israel, E-mail: gorfinm@ie.technion.ac.il.
David M. Zucker, Department of Statistics, Hebrew University, Mt. Scopus, Jerusalem 91905, Israel, E-mail: mszucker@mscc.huji.ac.il
Li Hsu, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA, E-mail: lih@fhcrc.org.
References
[1] Andersen PK, Gill RD. Cox’s regression model for counting processes: A large sample study. Ann. Statist. 1982;10:1100–1120. MR0673646.
[2] Aalen OO. Nonparametric inference in connection with multiple decrement models. Scand. J. Statist. 1976;3:15–27. MR0400529.
[3] Aalen O. Nonparametric inference for a family of counting processes. Ann. Statist. 1978;6:701–726. MR0491547.
[4] Breslow NE, Day NE. Statistical methods in cancer research: Vol. 1 - The analysis of case-control studies. IARC Scientific Publication; Lyon, France: 1980.
[5] Chen MC, Bandeen-Roche K. A diagnostic for association in bivariate survival models. Lifetime Data Anal. 2005;11:245–264. doi: 10.1007/s10985-004-0386-8. MR2158784.
[6] Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–151. MR0501698.
[7] Colditz GA, Rosner BA, Speizer FE, for the Nurses’ Health Study Research Group. Risk factors for breast cancer according to family history of breast cancer. Journal of the National Cancer Institute. 1996;88:365–371. doi: 10.1093/jnci/88.6.365.
[8] Foutz RV. On the unique consistent solution to the likelihood equation. J. Amer. Statist. Assoc. 1977;72:147–148. MR0445686.
[9] Gill RD. Discussion of the paper by D. Clayton and J. Cuzick. J. R. Statist. Soc. A. 1985;148:108–109.
[10] Gill RD. Non- and semi-parametric maximum likelihood estimators and the Von Mises method (Part 1). Scand. J. Statist. 1989;16:97–128.
[11] Gill RD. Marginal partial likelihood. Scand. J. Statist. 1992;19:133–137. MR1173595.
[12] Glidden DV. Checking the adequacy of the gamma frailty model for multivariate failure times. Biometrika. 1999;86:381–393. MR1705406.
[13] Glidden DV. Pairwise dependence diagnostics for clustered failure time data. Biometrika. 2007;94:371–385.
[14] Gorfine M, Zucker DM, Hsu L. Prospective survival analysis with a general semiparametric shared frailty model - a pseudo full likelihood approach. Biometrika. 2006;93:735–741. doi: 10.1901/jaba.2009.37-1489. MR2261454.
[15] Hartman P. Ordinary Differential Equations. 2nd ed. Birkhäuser; Boston: 1973. Reprinted, 1982.
[16] Hopper JL. Commentary: Case-control-family design: a paradigm for future epidemiology research? International Journal of Epidemiology. 2003;32:48–50. doi: 10.1093/ije/dyg114.
[17] Hougaard P. Analysis of Multivariate Survival Data. Springer; New York: 2000.
[18] Hsu L, Chen L, Gorfine M, Malone K. Semiparametric estimation of marginal hazard function from case-control family studies. Biometrics. 2004;60:936–944. doi: 10.1111/j.0006-341X.2004.00249.x. MR2133546.
[19] Hsu L, Gorfine M. Multivariate survival analysis for case-control family data. Biostatistics. 2006;7:387–398. doi: 10.1093/biostatistics/kxj014.
[20] Hsu L, Gorfine M, Malone K. On robustness of marginal regression coefficient estimates and hazard functions in multivariate survival analysis of family data when the frailty distribution is misspecified. To appear in Stat. Med. 2007. doi: 10.1002/sim.2870.
[21] Keiding N, Gill R. Random truncation models and Markov processes. Ann. Statist. 1990;18:582–602. MR1056328.
[22] Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards regression models. Ann. Statist. 2004;32:1448–1491. MR2089130.
[23] Malone KE, Daling JR, Thompson JD, O’Brien CA, Francisco LV, Ostrander EA. BRCA1 mutations and breast cancer in the general population. Journal of the American Medical Association. 1998;279:922–929. doi: 10.1001/jama.279.12.922.
[24] Malone KE, Daling JR, Neal C, Suter NM, O’Brien C, Cushing-Haugen K, Jonasdottir TJ, Thompson JD, Ostrander EA. Frequency of BRCA1/BRCA2 mutations in a population-based sample of young breast carcinoma cases. Cancer. 2000;88:1393–1402. doi: 10.1002/(sici)1097-0142(20000315)88:6<1393::aid-cncr17>3.0.co;2-p.
[25] Malone KE, Daling JR, Doody DR, Hsu L, Bernstein L, Coates RJ, Marchbanks PA, Simon MS, McDonald JA, Norman SA, Strom BL, Burkman RT, Ursin G, Deapen D, Weiss LK, Folger S, Madeoy JJ, Friedrichsen DM, Suter NM, Humphrey MC, Spirtas R, Ostrander EA. Prevalence and predictors of BRCA1 and BRCA2 mutations in a population-based study of breast cancer in white and black American women aged 35-64 years. Cancer Research. 2006;66(16):8297–8308. doi: 10.1158/0008-5472.CAN-06-0503.
[26] Nielsen GG, Gill RD, Andersen PK, Sorensen TI. A counting process approach to maximum likelihood estimation of frailty models. Scand. J. Statist. 1992;19:25–43.
[27] Oakes D. Bivariate survival models induced by frailties. J. Amer. Statist. Assoc. 1989;84:487–493. MR1010337.
[28] Parner E. Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 1998;26:183–214. MR1611788.
[29] Prentice RL, Breslow NE. Retrospective studies and failure time models. Biometrika. 1978;65:153–158.
[30] Shih JH. A goodness-of-fit test for association in a bivariate survival model. Biometrika. 1998;85:189–200. MR1627281.
[31] Shih JH, Chatterjee N. Analysis of survival data from case-control family studies. Biometrics. 2002;58:502–509. doi: 10.1111/j.0006-341x.2002.00502.x. MR1925547.
[32] Shih JH, Louis TA. Inference on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–1399. MR1381050.
[33] Viswanathan B, Manatunga AK. Diagnostic plots for assessing the frailty distribution in multivariate survival data. Lifetime Data Anal. 2001;7:143–155. doi: 10.1023/a:1011348823081. MR1842324.
[34] Zeger S, Liang K-Y, Albert PS. Models for longitudinal data: A generalized estimating equation approach. Biometrics. 1988;44:1049–1060. MR0980999.
[35] Zucker DM. A pseudo partial likelihood method for semi-parametric survival regression with covariate errors. J. Amer. Statist. Assoc. 2005;100:1264–1277. MR2236440.
[36] Zucker DM, Gorfine M, Hsu L. Pseudo full likelihood estimation for prospective survival analysis with a general semiparametric shared frailty model: asymptotic theory. To appear in J. Statist. Plann. Inference. 2007.