Abstract
In stratified case-cohort designs, the case-cohort sample is selected by stratified random sampling based on covariate information available for all cohort members. In this paper, we extend the work of Kang & Cai (2009) to a generalized stratified case-cohort study design for failure time data with multiple disease outcomes. Under this design, we develop weighted estimating procedures for the regression parameters in marginal multiplicative intensity models and for the cumulative baseline hazard functions. The asymptotic properties of the estimators are studied using martingales, modern empirical process theory, and results for finite population sampling.
Keywords: marginal hazards model, multivariate failure times, stratified case-cohort design, survival analysis, weighted estimating equations
1 Introduction
The case-cohort study design (Prentice, 1986) was proposed to reduce the cost and effort of large cohort studies with time-to-event data. The reduction can be substantial, especially when the disease of interest is rare and the main covariate of interest (the exposure) is expensive to measure, because the case-cohort design requires measuring the exposure only on a subset of the cohort. Specifically, sampling in the case-cohort design consists of two steps. First, a random subset of the entire cohort, called the subcohort, is sampled regardless of failure status. Second, all remaining cases outside the subcohort are added. The exposure is measured only on these sampled subjects.
Although the exposure is available only for the case-cohort sample, some less expensive covariates, such as age, gender, or a correlate of the exposure, may be readily obtained for all cohort members. In such cases, the subcohort can be selected by stratified simple random sampling, with strata defined by some of these covariates. This stratified case-cohort design can yield a large efficiency gain over its unstratified counterpart, which ignores the available information.
For a single disease outcome with independent failure times across subjects, many statistical methods based on Cox models have been proposed and studied for data from unstratified case-cohort studies (Prentice, 1986; Self & Prentice, 1988; Barlow, 1994; Chen & Lo, 1999; Chen, 2001) and from stratified case-cohort studies (Borgan et al., 2000; Kulich & Lin, 2004).
Among the study designs proposed for similar purposes, the case-cohort design has the advantage that the same subcohort can be used for different disease outcomes (Langholz & Thomas, 1990; Wacholder et al., 1991). When more than one disease outcome per subject is of interest, the failure times from the same subject constitute multivariate failure time data, in which the correlation among failure times within a subject must be accounted for. Such data are frequently encountered in biomedical studies. One example is a study of the relationship between serum ferritin and coronary heart disease and stroke events in the Busselton Health Study (Cullen, 1972), in which case-cohort sampling was used to reduce costs and preserve stored serum. It is of scientific interest to compare the effects of serum ferritin on coronary heart disease and on stroke. A subject can experience both events, so the times to coronary heart disease and to stroke observed on the same subject are not independent. In this case, methods developed for a single disease outcome with univariate failure time data cannot be applied directly.
Statistical methods that address this problem have been limited. Recently, we proposed weighted estimating equation methods for failure time data with multiple disease outcomes from case-cohort studies under marginal hazards models (Kang & Cai, 2009). That paper considered a generalized case-cohort design that allows only a sample of the cases outside the subcohort to be selected. Such a design is more realistic and flexible for multiple disease outcomes, since not all outcomes need to be rare and the numbers of cases need not be small (Breslow & Wellner, 2007).
The main purpose of this article is to extend the study design considered in Kang & Cai (2009) to a stratified case-cohort design, to propose estimation procedures under such designs, and to provide detailed derivations of the asymptotic properties of the proposed estimators. The model and the estimating procedures for the regression coefficients and the cumulative baseline hazard functions are presented in Section 2. The corresponding asymptotic properties are stated and proven in Section 3. A brief summary and discussion are provided in Section 4.
2 Model, study design, and estimating procedure
2.1 Model
Suppose a cohort of n subjects can be divided into L mutually exclusive strata using information available for all cohort members. Let Tlik be the failure time for the kth type of disease outcome (k = 1, …, K) of the ith subject (i = 1, …, nl) within the lth stratum (l = 1, …, L). Because of right censoring, what one actually observes for the kth type of disease outcome is Xlik = min(Tlik, Clik), where Clik is the potential censoring time. Tlik and Clik are assumed to be conditionally independent given the p-vector of covariates Zlik(t). We assume that all time-dependent covariates in Zlik(t) are “external”, i.e., they are not affected by the disease processes, as described by Kalbfleisch & Prentice (2002). Let Δlik = I(Tlik ≤ Clik), Nlik(t) = ΔlikI(Xlik ≤ t), and Ylik(t) = I(Xlik ≥ t), where I(·) is the indicator function. Let λlik(t) denote the corresponding marginal hazard function and let τ denote the study end time.
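For concreteness, the observed-data quantities just defined can be computed for a single (l, i, k) triple as in the following minimal Python sketch; the function names are illustrative and are not part of the paper.

```python
def observed_data(T, C):
    """Observed time X = min(T, C) and failure indicator Delta = I(T <= C)."""
    return min(T, C), int(T <= C)


def counting_process(t, X, delta):
    """N(t) = Delta * I(X <= t): jumps from 0 to 1 at X if a failure is observed."""
    return delta * int(X <= t)


def at_risk(t, X):
    """Y(t) = I(X >= t): equals 1 while the subject is still under observation."""
    return int(X >= t)


# One (l, i, k) triple: failure at T = 3.0 before censoring at C = 5.0.
X, delta = observed_data(3.0, 5.0)
print(X, delta)                                                          # 3.0 1
print(counting_process(2.9, X, delta), counting_process(3.0, X, delta))  # 0 1
print(at_risk(3.0, X), at_risk(3.1, X))                                  # 1 0
```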
For the kth type of disease outcome of the ith subject within the lth stratum, the marginal hazards function λlik(t) is assumed to be associated with the covariate Zlik(t) by
$$\lambda_{lik}(t) = \lambda_{0k}(t)\exp\{\beta_0^{\mathrm T} Z_{lik}(t)\}, \qquad (1)$$
where λ0k(t) is an unspecified baseline hazard function for the kth disease outcome and β0 = (β01, …, β0p)T is a p × 1 vector of regression parameters.
2.2 Study designs
Let V denote a discrete random variable indicating the stratum. We consider sampling procedures that depend on V. We assume that Tlik is independent of Vik given Zlik(·), i.e., Vik affects the failure time only through the covariates (Kulich & Lin, 2004).
First, we consider a direct extension of the stratified case-cohort design for a single disease outcome to multiple disease outcomes and refer to it as the “original” stratified case-cohort design. Under this design, the subcohort is selected by stratified random sampling. Specifically, for the lth stratum, we select a fixed number ñl of subjects from the nl subjects in the cohort by simple random sampling without replacement. Thus, each subject in the lth stratum has the same probability pr(ξli = 1) = α̃l = ñl/nl of being selected into the subcohort, where ξli is the subcohort sampling indicator for the ith subject in the lth stratum. Covariates are measured only on the subcohort members and on all the remaining cases outside the subcohort. For the kth type of disease outcome, complete data {Xlik, Δlik, Zlik(t), 0 ≤ t ≤ Xlik, Vik} are available for the subcohort members (ξli = 1) and for the cases (Δlik = 1). Note that Vik need not be available for cases. For the non-subcohort controls (ξli = 0 and Δlik = 0), only partial data {Xlik, Δlik, Vik} are available.
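The subcohort selection just described can be sketched in Python as follows. This is an illustrative simulation, not the authors' code: the data layout (a dictionary of subject ids per stratum) and the function names are hypothetical, and Python's `random.sample` is used for simple random sampling without replacement.

```python
import random


def stratified_subcohort(strata, n_tilde, seed=None):
    """Stratified subcohort: SRSWOR of a fixed size n_tilde[l] within each stratum l.

    strata  -- dict: stratum label l -> list of subject ids in that stratum
    n_tilde -- dict: stratum label l -> subcohort size (fixed in advance)
    Returns (xi, alpha): xi[i] is the subcohort indicator of subject i and
    alpha[l] = n_tilde[l] / n_l is the common inclusion probability in stratum l.
    """
    rng = random.Random(seed)
    xi, alpha = {}, {}
    for l, ids in strata.items():
        chosen = set(rng.sample(ids, n_tilde[l]))  # without replacement
        for i in ids:
            xi[i] = int(i in chosen)
        alpha[l] = n_tilde[l] / len(ids)
    return xi, alpha


def complete_data_pairs(xi, delta):
    """(i, k) pairs with complete covariate data under the original design:
    subcohort members (xi = 1) or cases of outcome k (delta = 1)."""
    return {(i, k) for (i, k), d in delta.items() if xi[i] == 1 or d == 1}


strata = {1: list(range(10)), 2: list(range(10, 30))}
xi, alpha = stratified_subcohort(strata, {1: 3, 2: 4}, seed=2024)
print(alpha)  # {1: 0.3, 2: 0.2}
```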
Since we consider more than one disease outcome, it is realistic that some disease outcomes are not rare or that the numbers of cases are not small. In this situation, obtaining covariate information on all the cases outside the subcohort may not be feasible. We therefore consider a stratified case-cohort design in which the sampling of cases outside the subcohort may differ across strata, and we refer to it as the “generalized” stratified case-cohort design.
Under the generalized stratified case-cohort design for multiple disease outcomes, the subcohort is sampled as before: within the lth stratum, a fixed number ñl of subjects is selected from the nl subjects in the cohort by simple random sampling without replacement. After the subcohort is sampled, instead of including all the cases outside the subcohort, we sample only a fraction of them for each disease outcome. Specifically, for the kth type of disease (k = 1, …, K) within the lth stratum (l = 1, …, L), we select a fixed number of cases outside the subcohort by simple random sampling without replacement. Then each case outside the subcohort has the same probability q̃lk = pr(ηlik = 1 | Δlik = 1, ξli = 0) of being sampled, where ηlik is the case sampling indicator and q̃lk equals the fixed number of sampled cases divided by the number of kth-disease cases in the lth stratum that lie outside the subcohort, $\sum_{i=1}^{n_l}\Delta_{lik} - \sum_{i=1}^{n_l}\xi_{li}\Delta_{lik}$. Here $\sum_{i=1}^{n_l}\Delta_{lik}$ is the number of the kth type of disease cases within the lth stratum in the cohort and $\sum_{i=1}^{n_l}\xi_{li}\Delta_{lik}$ is the number of kth disease cases within the lth stratum in the subcohort.
Note that, because of the sampling scheme, (ηl1k, …, ηlnlk) are correlated; however, (ηl1k, …, ηlnlk) and (ηl′1k′, …, ηl′nl′k′) are uncorrelated for k ≠ k′ or l ≠ l′. Covariates are measured only on the sampled subjects. Thus, complete data {Xlik, Δlik, Zlik(t), 0 ≤ t ≤ Xlik, Vik} are available for the subcohort members (ξli = 1) and for the sampled cases outside the subcohort (ηlik = 1), while only partial data {Xlik, Δlik, Vik} are available for all others (ξli = 0 and ηlik = 0). The generalized stratified case-cohort design includes the original stratified case-cohort design as a special case: if q̃lk = 1 for all k and l, all the cases outside the subcohort are sampled. Also, if the cohort is not stratified, i.e., L = 1, the design reduces to the generalized case-cohort design considered by Kang & Cai (2009).
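A matching sketch of the case-subsampling step of the generalized design follows, under the same hypothetical data layout. Here `m[(l, k)]` stands for the fixed number of non-subcohort cases drawn for outcome k in stratum l (a name introduced only for this sketch), and setting q̃lk = 1 when there are no cases to sample is a convention chosen for the sketch.

```python
import random


def sample_outside_cases(strata, xi, delta, m, K, seed=None):
    """Generalized design: for each stratum l and outcome k, draw a fixed number
    m[(l, k)] of the cases outside the subcohort by SRSWOR.

    delta -- dict: (i, k) -> failure indicator; xi -- dict: i -> subcohort indicator
    Returns (eta, q): eta[(i, k)] is the case-sampling indicator and
    q[(l, k)] = m[(l, k)] / (number of kth-outcome cases in stratum l outside
    the subcohort), i.e. sum of Delta minus sum of xi * Delta.
    """
    rng = random.Random(seed)
    eta, q = {}, {}
    for l, ids in strata.items():
        for k in range(1, K + 1):
            pool = [i for i in ids if xi[i] == 0 and delta[(i, k)] == 1]
            chosen = set(rng.sample(pool, m[(l, k)]))  # raises if m > len(pool)
            for i in ids:
                eta[(i, k)] = int(i in chosen)
            # Convention (an assumption): q = 1 when there is nothing to sample.
            q[(l, k)] = m[(l, k)] / len(pool) if pool else 1.0
    return eta, q


strata = {1: [0, 1, 2, 3, 4, 5]}
xi = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
delta = {(i, 1): int(i <= 3) for i in range(6)}  # subjects 0 to 3 are cases
eta, q = sample_outside_cases(strata, xi, delta, m={(1, 1): 2}, K=1, seed=1)
print(q)  # {(1, 1): 0.6666666666666666}
```

Setting `m[(l, k)]` equal to the whole pool size gives q̃lk = 1 and recovers the original design.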
2.3 Estimation of regression parameters under the original stratified case-cohort design
For full cohort data, Wei et al. (1989) proposed the following pseudo-likelihood score equations for the estimation of the hazards regression parameter β0:
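$$U(\beta) = \sum_{k=1}^{K}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\int_0^{\tau}\left\{Z_{lik}(t) - \frac{S_k^{(1)}(\beta,t)}{S_k^{(0)}(\beta,t)}\right\}dN_{lik}(t) = 0, \qquad (2)$$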
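where $S_k^{(d)}(\beta,t) = n^{-1}\sum_{l=1}^{L}\sum_{i=1}^{n_l} Y_{lik}(t)Z_{lik}(t)^{\otimes d}\exp\{\beta^{\mathrm T}Z_{lik}(t)\}$ for d = 0 and 1, and a⊗2 = aaT, a⊗1 = a, a⊗0 = 1 for a vector a. Since these estimating equations do not have closed-form solutions, they must be solved iteratively, for example by the Newton-Raphson method (Thisted, 1988).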
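As an illustration of the iterative solution, the following Python sketch applies Newton-Raphson to the full-cohort estimating equation (2) in the simplest setting: a single time-fixed scalar covariate and no tied event times (both assumptions made only for the sketch). Risk sets are formed separately for each outcome k, and the normalizing factor 1/n cancels in the ratios of risk-set sums. Function names are hypothetical.

```python
import math


def wlw_score(beta, data):
    """Return (U(beta), -dU/dbeta) for a scalar, time-fixed covariate.

    data -- dict: outcome k -> list of (X, delta, Z) over all cohort subjects.
    """
    U, info = 0.0, 0.0
    for records in data.values():            # sum over outcomes k
        for X_i, d_i, Z_i in records:        # integral against dN_ik
            if d_i == 0:
                continue
            s0 = s1 = s2 = 0.0
            for X_j, _, Z_j in records:      # risk set {j : Y_jk(X_i) = 1}
                if X_j >= X_i:
                    r = math.exp(beta * Z_j)
                    s0 += r
                    s1 += r * Z_j
                    s2 += r * Z_j * Z_j
            zbar = s1 / s0
            U += Z_i - zbar
            info += s2 / s0 - zbar * zbar
    return U, info


def newton_raphson(data, beta=0.0, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson: beta <- beta + U / info."""
    for _ in range(max_iter):
        U, info = wlw_score(beta, data)
        step = U / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


data = {
    1: [(1.0, 1, 0.0), (2.0, 1, 1.0), (3.0, 0, 0.0), (4.0, 1, 1.0)],
    2: [(1.5, 1, 1.0), (0.5, 0, 0.0), (2.5, 1, 0.0), (3.5, 0, 1.0)],
}
beta_hat = newton_raphson(data)
print(beta_hat, -math.log(2) / 2)  # both approximately -0.346574
```

In this toy data set the score reduces to U(β) = 2/(1 + 2e^β) − 2e^β/(1 + e^β), whose root is β = −(log 2)/2, so the Newton-Raphson output can be checked analytically.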
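For data from the original stratified case-cohort studies, $S_k^{(d)}(\beta,t)$ in (2) cannot be calculated because the data are incomplete. To handle this problem, we weight the incomplete data by the inverse of the selection probability (Horvitz & Thompson, 1951). Specifically, we use
$$\hat S_k^{(d)}(\beta,t) = n^{-1}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\rho_{lik}(t)Y_{lik}(t)Z_{lik}(t)^{\otimes d}\exp\{\beta^{\mathrm T}Z_{lik}(t)\}$$
in place of $S_k^{(d)}(\beta,t)$ for d = 0 and 1, where ρlik(t) is a possibly time-varying weight function that accounts for the sampling scheme and has the following form:
$$\rho_{lik}(t) = \Delta_{lik} + (1-\Delta_{lik})\frac{\xi_{li}}{\hat\alpha_{lk}(t)}, \qquad \hat\alpha_{lk}(t) = \frac{\sum_{j=1}^{n_l}\xi_{lj}(1-\Delta_{ljk})Y_{ljk}(t)}{\sum_{j=1}^{n_l}(1-\Delta_{ljk})Y_{ljk}(t)}.$$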
Then, for the estimation of β0, we propose the following pseudo-partial-likelihood score equations Û(β) = 0p×1, where
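$$\hat U(\beta) = \sum_{k=1}^{K}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\int_0^{\tau}\left\{Z_{lik}(t) - \frac{\hat S_k^{(1)}(\beta,t)}{\hat S_k^{(0)}(\beta,t)}\right\}dN_{lik}(t). \qquad (3)$$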
A time-invariant version of the weight function which uses α̃l in place of α̂lk(t) may also be considered.
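The weights can be computed from the observed data within one stratum and one outcome as sketched below. The sketch assumes the Horvitz-Thompson form ρlik(t) = Δlik + (1 − Δlik)ξli/α̂lk(t), with α̂lk(t) the proportion of subcohort members among the non-cases still at risk at t, and uses α̃l = ñl/nl for the time-invariant version; the function names and tuple layout are hypothetical.

```python
def alpha_tilde(stratum):
    """Time-invariant subcohort fraction n_tilde_l / n_l for one stratum.

    stratum -- list of (X, delta, xi) for outcome k in stratum l
    """
    return sum(xi for _, _, xi in stratum) / len(stratum)


def alpha_hat(t, stratum):
    """Time-varying fraction: subcohort members among non-cases still at risk at t.
    Returns None when no non-case is at risk (the weight is then never used)."""
    at_risk = [xi for X, d, xi in stratum if d == 0 and X >= t]
    return sum(at_risk) / len(at_risk) if at_risk else None


def rho(d, xi, alpha):
    """Weight Delta + (1 - Delta) * xi / alpha: cases get 1, subcohort non-cases
    get 1 / alpha, and non-subcohort non-cases get 0."""
    if d == 1:
        return 1.0
    return 1.0 / alpha if xi == 1 else 0.0


# Toy stratum for one outcome k: (X, Delta, xi)
stratum = [(5.0, 0, 1), (4.0, 0, 0), (3.0, 1, 0), (6.0, 0, 1), (2.0, 0, 0)]
print(alpha_tilde(stratum))                              # 0.4
print(alpha_hat(1.0, stratum), alpha_hat(4.5, stratum))  # 0.5 1.0
print(rho(0, 1, alpha_hat(1.0, stratum)))                # 2.0
```

Passing `alpha_tilde(stratum)` instead of `alpha_hat(t, stratum)` to `rho` gives the time-invariant weight.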
The solution to Û(β) = 0p×1 is defined to be the estimator of the hazards regression parameter β0. We denote the estimator that uses the time-invariant weight function by β̂I and the one that uses the time-varying weight function by β̂II. The corresponding pseudo-partial-likelihood score functions are denoted by ÛI(β) and ÛII(β), respectively.
2.4 Estimation of regression parameters under the generalized stratified case-cohort design
For the generalized stratified case-cohort design, the weight function must be modified to account for the sampling of cases outside the subcohort. Specifically, the sampled cases outside the subcohort are weighted by 1/q̂lk(t), where q̂lk(t) is the proportion of sampled cases among the non-subcohort cases with the kth type of disease outcome in the lth stratum that remain in the risk set at time t. The weight function ωlik(t) then has the following form:
$$\omega_{lik}(t) = \Delta_{lik}\left\{\xi_{li} + (1-\xi_{li})\frac{\eta_{lik}}{\hat q_{lk}(t)}\right\} + (1-\Delta_{lik})\frac{\xi_{li}}{\hat\alpha_{lk}(t)}.$$
Note that the proposed weight function reduces to the one for the original stratified case-cohort design when all cases outside the subcohort are sampled, since then ηlik = 1 for every non-subcohort case and q̂lk(t) = q̃lk = 1 for all k and l.
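A companion sketch for the generalized design follows. It assumes ωlik(t) = Δlik{ξli + (1 − ξli)ηlik/q̂lk(t)} + (1 − Δlik)ξli/α̂lk(t), with q̂lk(t) the proportion of sampled cases among the non-subcohort cases of outcome k in stratum l still at risk at t. Names and data layout are hypothetical.

```python
def q_hat(t, stratum):
    """Time-varying case-sampling fraction for one stratum l and outcome k:
    sampled cases (eta = 1) among non-subcohort cases still at risk at t.

    stratum -- list of (X, delta, xi, eta); returns None if the pool is empty.
    """
    pool = [eta for X, d, xi, eta in stratum if d == 1 and xi == 0 and X >= t]
    return sum(pool) / len(pool) if pool else None


def omega(d, xi, eta, alpha, q):
    """Weight Delta * {xi + (1 - xi) * eta / q} + (1 - Delta) * xi / alpha."""
    if d == 1:
        if xi == 1:
            return 1.0                       # subcohort case
        return 1.0 / q if eta == 1 else 0.0  # sampled / unsampled outside case
    return 1.0 / alpha if xi == 1 else 0.0   # subcohort / other non-case


stratum = [(2.0, 1, 0, 1), (3.0, 1, 0, 0), (4.0, 1, 0, 1), (5.0, 1, 1, 0), (6.0, 0, 1, 0)]
print(q_hat(1.0, stratum), q_hat(3.5, stratum))  # 0.6666666666666666 1.0
print(omega(1, 0, 1, alpha=0.5, q=q_hat(1.0, stratum)))
```

Setting ηlik = 1 for every non-subcohort case makes q̂lk(t) = 1, and the case weight returned by `omega` becomes 1, which is the reduction to the original design.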
For the estimation of β0 under the generalized stratified case-cohort design, we consider the following weighted estimating functions with weight function ωlik(t):
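$$\tilde U(\beta) = \sum_{k=1}^{K}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\int_0^{\tau}\omega_{lik}(t)\left\{Z_{lik}(t) - \frac{\tilde S_k^{(1)}(\beta,t)}{\tilde S_k^{(0)}(\beta,t)}\right\}dN_{lik}(t), \qquad (4)$$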
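where $\tilde S_k^{(d)}(\beta,t) = n^{-1}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\omega_{lik}(t)Y_{lik}(t)Z_{lik}(t)^{\otimes d}\exp\{\beta^{\mathrm T}Z_{lik}(t)\}$ for d = 0 and 1.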
The solution to the equations Ũ(β) = 0p×1 is the estimator for the hazards regression parameter β0. We will denote the estimator which uses time-invariant weight functions as β̃I and time-varying weight functions as β̃II, respectively. The corresponding weighted estimating functions are ŨI (β) and ŨII (β), respectively.
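The weighted estimating function can be sketched in the same simplified setting used earlier for the full cohort (scalar time-fixed covariate, no ties). The weight is supplied as a user-defined callable `weight(k, j, t)`, a hypothetical interface standing in for ωljk(t). Subjects with zero weight are skipped, so their covariate values may be placeholders, which mimics the fact that Z is unmeasured for them.

```python
import math


def weighted_score(beta, data, weight):
    """Weighted estimating function (scalar, time-fixed covariate) and its
    negative derivative.

    data   -- dict: outcome k -> list of (X, delta, Z) for the whole cohort;
              Z may be a placeholder wherever the weight is zero
    weight -- callable weight(k, j, t) returning omega_jk(t) for subject j
    """
    U, info = 0.0, 0.0
    for k, recs in data.items():
        for i, (X_i, d_i, Z_i) in enumerate(recs):
            w_i = weight(k, i, X_i) if d_i == 1 else 0.0
            if w_i == 0.0:
                continue                     # no jump, or case not sampled
            s0 = s1 = s2 = 0.0
            for j, (X_j, _, Z_j) in enumerate(recs):
                if X_j >= X_i:
                    w_j = weight(k, j, X_i)
                    if w_j == 0.0:
                        continue             # unmeasured subject: skip its Z
                    r = w_j * math.exp(beta * Z_j)
                    s0 += r
                    s1 += r * Z_j
                    s2 += r * Z_j * Z_j
            zbar = s1 / s0
            U += w_i * (Z_i - zbar)
            info += w_i * (s2 / s0 - zbar * zbar)
    return U, info


def solve(data, weight, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for the weighted estimating equation."""
    for _ in range(max_iter):
        U, info = weighted_score(beta, data, weight)
        step = U / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


data = {
    1: [(1.0, 1, 0.0), (2.0, 1, 1.0), (3.0, 0, 0.0), (4.0, 1, 1.0)],
    2: [(1.5, 1, 1.0), (0.5, 0, 0.0), (2.5, 1, 0.0), (3.5, 0, 1.0)],
}
beta_one = solve(data, lambda k, j, t: 1.0)
beta_two = solve(data, lambda k, j, t: 2.0)
print(beta_one, beta_two)  # both approximately -0.346574
```

Because the weights enter both the case contributions and the risk-set sums, multiplying every weight by the same constant leaves the root unchanged, which the two calls above illustrate.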
2.5 Estimation of the cumulative baseline hazard function
Let Λ0k(t) = ∫0t λ0k(u) du denote the cumulative baseline hazard function for the kth type of disease outcome at time t. To estimate Λ0k(t), we consider the Breslow-Aalen type estimators Λ̂0k(β̂, t) for the original stratified case-cohort design and Λ̃0k(β̃, t) for the generalized stratified case-cohort design, where
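$$\hat\Lambda_{0k}(\hat\beta,t) = \sum_{l=1}^{L}\sum_{i=1}^{n_l}\int_0^t \frac{dN_{lik}(u)}{n\,\hat S_k^{(0)}(\hat\beta,u)}, \qquad (5)$$
$$\tilde\Lambda_{0k}(\tilde\beta,t) = \sum_{l=1}^{L}\sum_{i=1}^{n_l}\int_0^t \frac{\omega_{lik}(u)\,dN_{lik}(u)}{n\,\tilde S_k^{(0)}(\tilde\beta,u)}, \qquad (6)$$
respectively.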
As before, we use the superscripts I and II to denote the estimators based on time-invariant and time-varying weight functions, respectively, e.g., Λ̃I0k(β̃I, t) and Λ̃II0k(β̃II, t).
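A sketch of the weighted Breslow-Aalen estimator for one outcome follows, again for a scalar time-fixed covariate. The callable `weight(j, u)` is a hypothetical stand-in for ωljk(u) (or ρljk(u)), and the factor n cancels between numerator and denominator. With β = 0 and unit weights the estimator reduces to the Nelson-Aalen estimator, which gives an easy check.

```python
import math


def breslow_aalen(t, beta, recs, weight):
    """Weighted Breslow-Aalen estimate of Lambda_0k(t) for one outcome k:
    the sum over observed failures at u = X_i <= t of weight(i, u) divided by
    the weighted risk-set sum of weight(j, u) * exp(beta * Z_j) over X_j >= u.

    recs   -- list of (X, delta, Z) for the whole cohort (scalar covariate)
    weight -- callable weight(j, u); return 1.0 everywhere for the full cohort
    """
    total = 0.0
    for i, (X_i, d_i, _) in enumerate(recs):
        if d_i == 0 or X_i > t:
            continue
        w_i = weight(i, X_i)
        if w_i == 0.0:
            continue                          # unsampled case contributes nothing
        denom = 0.0
        for j, (X_j, _, Z_j) in enumerate(recs):
            w_j = weight(j, X_i)
            if X_j >= X_i and w_j != 0.0:
                denom += w_j * math.exp(beta * Z_j)
        total += w_i / denom
    return total


def unit(j, u):
    return 1.0


recs = [(1.0, 1, 0.0), (2.0, 0, 1.0), (3.0, 1, 0.0), (4.0, 1, 1.0)]
# With beta = 0 and unit weights this is the Nelson-Aalen estimator.
print(breslow_aalen(3.5, 0.0, recs, unit))  # 0.75
print(breslow_aalen(4.0, 0.0, recs, unit))  # 1.75
```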
3 Asymptotic properties
We focus on the asymptotic properties of the estimators under the generalized stratified case-cohort design, β̃I, β̃II, Λ̃I0k(β̃I, t), and Λ̃II0k(β̃II, t). The estimators under the original stratified case-cohort design, β̂I, β̂II, Λ̂I0k(β̂I, t), and Λ̂II0k(β̂II, t), are special cases of these, so their asymptotic properties follow directly from those under the generalized stratified case-cohort design.
3.1 Conditions
To establish the consistency and asymptotic normality of the estimators under the generalized stratified case-cohort design, we need the following conditions:
- (A) (Tli, Cli, Zli), i = 1, …, nl and l = 1, …, L, are independent and identically distributed, where Tli = (Tli1, …, TliK)T, Cli = (Cli1, …, CliK)T and Zli = (Zli1, …, ZliK)T.
- (B) pr{Ylik(τ) > 0} > 0 for all i = 1, …, nl, k = 1, …, K and l = 1, …, L.
- (C) $|Z_{likj}(0)| + \int_0^\tau |dZ_{likj}(t)| < C_z$ almost surely for all i = 1, …, nl, k = 1, …, K, l = 1, …, L and j = 1, …, p, where Zlikj is the jth component of Zlik and Cz is some constant.
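- (D) The matrix $A(\beta_0) = \sum_{k=1}^{K}\int_0^\tau v_k(\beta_0,t)\,s_k^{(0)}(\beta_0,t)\,\lambda_{0k}(t)\,dt$ is positive definite, where $v_k(\beta,t) = s_k^{(2)}(\beta,t)/s_k^{(0)}(\beta,t) - \{s_k^{(1)}(\beta,t)/s_k^{(0)}(\beta,t)\}^{\otimes 2}$.
- (E) $\int_0^\tau \lambda_{0k}(t)\,dt < \infty$ for all k = 1, …, K.
- (F) There exists a neighborhood ℬ of β0 such that, for all k = 1, …, K and d = 0, 1, 2, as n → ∞,
$$\sup_{\beta\in\mathcal B,\; t\in[0,\tau]}\big\|S_k^{(d)}(\beta,t) - s_k^{(d)}(\beta,t)\big\| \to 0 \quad\text{in probability},$$
where the $s_k^{(d)}(\beta,t)$ are continuous functions of β ∈ ℬ, uniformly in t ∈ [0, τ], and are bounded on ℬ × [0, τ], and $s_k^{(0)}(\beta,t)$ is bounded away from zero on ℬ × [0, τ].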
The following additional conditions on the case-cohort sampling are also needed:
- (G) As n → ∞,
  (i) for all l = 1, …, L, α̃l converges to a constant αl ∈ (0, 1]; and
  (ii) for all k = 1, …, K and l = 1, …, L, q̃lk converges to a constant qlk ∈ (0, 1].
- (H) For all l = 1, …, L, nl/n converges to a constant pl ∈ [0, 1] as n → ∞.
Here and hereafter, the norms of a vector a, a matrix A, and a function f are defined as follows: $\|a\| = \max_j |a_j|$, $\|A\| = \max_{j,j'} |A_{jj'}|$, and $\|f\| = \sup_{t\in[0,\tau]} |f(t)|$.
3.2 Asymptotic properties of β̃I and Λ̃I0k(β̃I, t)
The key step in deriving the asymptotic results is a decomposition of the proposed estimating function into three asymptotically uncorrelated pieces plus negligible terms. The three pieces correspond, respectively, to the full-cohort counterpart, the sampling of the subcohort, and the sampling of cases outside the subcohort.
We first state some lemmas that are used repeatedly in the proofs of the theorems.
Lemma 1
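Let Wn(t) and Gn(t) be two sequences of bounded processes. For some constant τ, assume that the following conditions (a) – (c) hold:
- (a) $\sup_{t\in[0,\tau]}|W_n(t) - W(t)| \to 0$ in probability for some bounded process W(t);
- (b) Wn(t) is monotone on [0, τ]; and
- (c) Gn(t) converges to a zero-mean process with continuous sample paths.
Under Conditions (a) – (c), $\sup_{t\in[0,\tau]}\big|\int_0^t \{W_n(s) - W(s)\}\,dG_n(s)\big| \to 0$ in probability.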
PROOF
Lemma 1 is stated as a lemma in Lin (2000). Its proof follows from the strong embedding theorem (Shorack & Wellner, 1986, pp. 47–48), lemma 1 of Lin et al. (2000), and the triangle inequality.
Lemma 2 is an extension of the proposition in Kulich & Lin (2000).
Lemma 2
Let ξ = (ξ1, …, ξn) be a random vector containing ñ ones and n − ñ zeros, with each permutation equally likely. Let Bi(t), i = 1, …, n, be i.i.d. real-valued random processes on [0, τ] with E{Bi(t)} = μB(t), Var{Bi(0)} < ∞ and Var{Bi(τ)} < ∞. Let B(t) = {B1(t), …, Bn(t)} and ξ be independent. Suppose that almost all paths of Bi(t) have finite variation. Then, with α̃ = ñ/n, $n^{-1/2}\sum_{i=1}^{n}(\xi_i - \tilde\alpha)B_i(t)$ converges weakly in ℓ∞[0, τ] to a zero-mean Gaussian process, and therefore $n^{-1}\sum_{i=1}^{n}(\xi_i - \tilde\alpha)B_i(t)$ converges in probability to 0 uniformly in t.
PROOF
The proof of this lemma follows from Hájek's (1960) central limit theorem for finite population sampling and Example 3.6.14 of van der Vaart & Wellner (1996). Specifically, suppose first that the Bi(t)'s have nondecreasing sample paths. Then the finite-dimensional convergence follows from Hájek's (1960) central limit theorem, while tightness follows from Example 3.6.14 of van der Vaart & Wellner (1996). In the general case, since almost every path b(t) of Bi(t) has finite variation, b(t) can be written as b+(t) − b−(t), where b+(t) and b−(t) are nonnegative and nondecreasing in t. Hence $n^{-1/2}\sum_{i}(\xi_i - \tilde\alpha)B_i(t) = n^{-1/2}\sum_{i}(\xi_i - \tilde\alpha)B_i^{+}(t) - n^{-1/2}\sum_{i}(\xi_i - \tilde\alpha)B_i^{-}(t)$, where the two terms on the right are marginally tight since they meet the condition of Example 3.6.14 of van der Vaart & Wellner (1996). This implies that they are jointly tight. The joint finite-dimensional convergence of the two normalized sums follows again from Hájek's (1960) central limit theorem. Therefore, $n^{-1/2}\sum_{i}(\xi_i - \tilde\alpha)B_i(t)$ converges weakly in ℓ∞[0, τ] to a zero-mean Gaussian process, and it follows that $n^{-1}\sum_{i}(\xi_i - \tilde\alpha)B_i(t)$ converges to 0 in probability uniformly in t.
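The setting of Lemma 2 (a vector ξ with ñ ones, every arrangement equally likely) is simple random sampling without replacement. The small exact computation below, a sketch that uses Python's `fractions` and `itertools` modules rather than anything from the paper, enumerates all subcohorts of a toy population. It confirms two facts behind the lemma. First, the Horvitz-Thompson total Σi ξi bi/α̃ is unbiased, so the centered sum Σi (ξi − α̃)bi has mean zero. Second, its variance matches the finite-population formula with the correction factor (1 − α̃) that appears in Hájek's (1960) central limit theorem.

```python
from fractions import Fraction
from itertools import combinations


def ht_moments(b, n_tilde):
    """Exact mean and variance of the Horvitz-Thompson total sum_i xi_i b_i / alpha,
    alpha = n_tilde / n, when xi has n_tilde ones and every arrangement is
    equally likely (simple random sampling without replacement)."""
    n = len(b)
    alpha = Fraction(n_tilde, n)
    totals = [sum(Fraction(b[i]) for i in chosen) / alpha
              for chosen in combinations(range(n), n_tilde)]
    mean = sum(totals) / len(totals)
    var = sum((x - mean) ** 2 for x in totals) / len(totals)
    return mean, var


def srswor_variance(b, n_tilde):
    """Finite-population formula n^2 (1 - n_tilde/n) S^2 / n_tilde, where
    S^2 = sum_i (b_i - mean(b))^2 / (n - 1)."""
    n = len(b)
    bbar = Fraction(sum(b), n)
    s2 = sum((Fraction(x) - bbar) ** 2 for x in b) / (n - 1)
    return n * n * (1 - Fraction(n_tilde, n)) * s2 / n_tilde


b = [1, 4, 2, 7, 3]
mean, var = ht_moments(b, n_tilde=2)
print(mean, var, srswor_variance(b, 2))  # 17 159/4 159/4
```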
Here ξli is the subcohort membership indicator, and ηlik is the sampling indicator for the ith subject in the lth stratum who has the kth disease and lies outside the subcohort; both the subcohort and the cases outside the subcohort are sampled by simple random sampling without replacement. Thus, within each sampling stratum, the ξli's and ηlik's satisfy the conditions of lemma 2.
Theorem 1
Under Conditions (A) – (H), β̃I solving ŨI (β) = 0 is a consistent estimator of β0. Also, n1/2(β̃I – β0) is asymptotically normally distributed with mean zero and with variance matrix ΣI (β0) of the following form
where
Note that El, Varl, and Covl denote the expectation, the variance and the covariance within the lth stratum, respectively.
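We now study the asymptotic properties of Λ̃I0k(β̃I, t). Let WIk(t) = n1/2{Λ̃I0k(β̃I, t) − Λ0k(t)} and WI(t) = {WI1(t), …, WIK(t)}T, and let 𝒲I(t) = {𝒲I1(t), …, 𝒲IK(t)}T be a zero-mean Gaussian process whose covariance function between 𝒲Ij(t1) and 𝒲Ik(t2) (1 ≤ j, k ≤ K and 0 ≤ t1, t2 ≤ τ) is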
where
Also, let D[0, τ]K be the space of functions f(t) = {f1(t), …, fK(t)}T whose components fk: [0, τ] → ℝ are right-continuous with left-hand limits. This space is equipped with the uniform metric d(f, g) = sup{|fk(t) − gk(t)|: 1 ≤ k ≤ K, t ∈ [0, τ]} for f, g ∈ D[0, τ]K.
Theorem 2
Under Conditions (A) – (H), for each k = 1, …, K, Λ̃I0k(β̃I, t) converges in probability to Λ0k(t) uniformly in t ∈ [0, τ]. Also, WI(t) converges weakly to 𝒲I(t) in D[0, τ]K.
Proofs of Theorems 1 and 2 can be derived from those of their time-varying counterparts, which are given in the next subsection. Details are deferred to Section 3.4.
3.3 Asymptotic properties of β̃II and Λ̃II0k(β̃II, t)
To establish the asymptotic properties of the proposed estimators with time-varying weight functions, we need the following lemma on the time-varying sampling probability estimators α̂lk(t) and q̂lk(t).
Lemma 3
For all l = 1, …, L and k = 1, …, K,
- (i) α̂lk(t) and α̃l converge to the same limit uniformly in t and
- (ii) q̂lk(t) and q̃lk converge to the same limit uniformly in t and
PROOF
For each l and k, a first-order Taylor expansion of α̂lk(t)−1 around α̃l gives α̂lk(t)−1 = α̃l−1 − {α*(t)}−2{α̂lk(t) − α̃l}, where α*(t) lies on the line segment between α̂lk(t) and α̃l. Then,
By the Glivenko-Cantelli theorem, $n_l^{-1}\sum_{i=1}^{n_l}(1-\Delta_{lik})Y_{lik}(t)$ converges to El{(1 − Δl1k)Yl1k(t)} in probability uniformly in t. In view of lemma 2, $n_l^{-1/2}\sum_{i=1}^{n_l}(\xi_{li}-\tilde\alpha_l)(1-\Delta_{lik})Y_{lik}(t)$ converges to a zero-mean Gaussian process, since (1 − Δlik)Ylik(t) is bounded and monotone in t. This implies that $n_l^{-1}\sum_{i=1}^{n_l}(\xi_{li}-\tilde\alpha_l)(1-\Delta_{lik})Y_{lik}(t)$ converges to 0 in probability uniformly in t, and consequently α̂lk(t) and α̃l converge to the same limit uniformly in t. Hence α*(t) also converges to the same limit as α̃l. Combining these results, it follows from Slutsky's theorem and Condition (H) that
Part (ii) can be shown by similar arguments.
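For the reciprocal function, the intermediate point in the expansion used in the proof of Lemma 3 can be written explicitly, which makes the convergence of α*(t) immediate. The following identity holds whenever α̂lk(t) > 0 and α̃l > 0:

```latex
\frac{1}{\hat\alpha_{lk}(t)} - \frac{1}{\tilde\alpha_l}
  = \frac{\tilde\alpha_l - \hat\alpha_{lk}(t)}{\hat\alpha_{lk}(t)\,\tilde\alpha_l}
  = -\frac{\hat\alpha_{lk}(t) - \tilde\alpha_l}{\{\alpha^*(t)\}^{2}},
\qquad
\alpha^*(t) = \{\hat\alpha_{lk}(t)\,\tilde\alpha_l\}^{1/2}.
```

Since a geometric mean lies between its two arguments, this α*(t) is on the line segment between α̂lk(t) and α̃l, and it converges to the common limit αl whenever α̂lk(t) and α̃l do. The same identity applies to q̂lk(t)−1 around q̃lk in part (ii).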
Now, we state the asymptotic behavior of the regression parameter estimator β̃II in the following theorem:
Theorem 3
Under Conditions (A) – (H), β̃II solving ŨII (β) = 0 is a consistent estimator of β0. Also, n1/2(β̃II − β0) is asymptotically normally distributed with mean zero and with variance matrix ΣII (β0) of the following form
where
PROOF
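The proof of the consistency of β̃II is based on the Inverse Function Theorem of Foutz (1977). One can show that β̃II is consistent for β0 provided that:
- (i) $\partial\tilde U^{II}(\beta)/\partial\beta^{\mathrm T}$ exists and is continuous in an open neighborhood ℬ of β0;
- (ii) $n^{-1}\partial\tilde U^{II}(\beta)/\partial\beta^{\mathrm T}$ evaluated at β0 is negative definite with probability going to one as n → ∞;
- (iii) $-n^{-1}\partial\tilde U^{II}(\beta)/\partial\beta^{\mathrm T}$ converges to A(β0) in probability uniformly for β in an open neighborhood about β0;
- (iv) n−1ŨII(β0) → 0 in probability.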
One can write
| (7) |
Then, (i) is clearly satisfied on the basis of (7) and Condition (F). Now, following Andersen & Gill (1982),
| (8) |
Each of the terms on the right side of the above inequality can be shown to converge to zero, uniformly in β ∈ ℬ. To show that the first term on the right side of (8) converges to zero, we first show that
It suffices to show that as n → ∞ for d = 0, 1 and 2. One can write
| (9) |
Then by lemma 3,
| (10) |
It follows from lemma 2 that, for d = 0, 1 and 2, and converge to plEl{(1 − Δl1k)Yl1k(t)Zl1k(t)⊗d exp{βTZl1k(t)}} and plpr(Δl1k = 1)El{Yl1k(t)Zl1k(t)⊗d exp{βTZl1k(t)} | Δl1k = 1, ξl1 = 0}, respectively, in probability uniformly in t under Condition (G). Thus, from (10)
| (11) |
It then follows from lemma 2 that, for d = 0, 1 and 2, converges weakly to zero-mean Gaussian processes under Condition (G). Consequently, together with condition (F),
| (12) |
Since $s_k^{(0)}(\beta,t)$ is bounded away from zero on ℬ × [0, τ] by Condition (F), it follows from the above convergence results that, for k = 1, …, K, Ũk(β, t) converges to vk(β, t) in probability uniformly in t and β.
By combining these results with Lenglart's inequality (Andersen & Gill, 1982, p. 1115), it follows that the first term on the right side of (8) converges to zero in probability, uniformly in β ∈ ℬ, as n → ∞.
The second and fourth terms on the right side of (8) can be shown to converge to zero by applying lemma 2. The third term can be shown to converge to zero by Lenglart's inequality (Andersen & Gill, 1982, p. 1115).
Conditions (D), (E) and (F) ensure the boundedness of supt,β{vk(β, t)}jj′ and Λ0k(τ) for k = 1, …, K and j, j′ = 1, …, p. Thus, together with the uniform convergence of
to
in probability, the last term on the right side of (8) converges to zero in probability, uniformly in β ∈ ℬ, as n → ∞. Hence,
and, thus, (ii) and (iii) are satisfied.
For (iv), we will show that n−1/2ŨII (β0) is asymptotically equivalent to
| (13) |
Specifically, one can decompose n−1/2ŨII (β0) into the following four parts:
| (14) |
The second term on the right-hand side of (14) can be shown to converge to zero uniformly in t. Note that, for fixed t,
is a sum of i.i.d. zero-mean random variables. Based on Conditions (C) and (E), Mlik(β0, t) is of bounded variation and can therefore be written as a difference of two monotone functions in t. It then follows from Example 2.11.16 of van der Vaart & Wellner (1996, p. 215) that
converges weakly to a zero-mean Gaussian process, say 𝒢k(t). It can be shown that E[{𝒢k(t) − 𝒢k(s)}4] ≤ C{Λ0k(t) − Λ0k(s)}2 for some constant C > 0. Specifically, E[{𝒢k(t) − 𝒢k(s)}4] = 3(E[{𝒢k(t) − 𝒢k(s)}2])2, since 𝒢k(t) − 𝒢k(s) is a zero-mean normal random variable for fixed s and t. Then, for s ≤ t, E[{𝒢k(t) − 𝒢k(s)}2] = E{𝒢k(t)2} + E{𝒢k(s)2} − 2E{𝒢k(t)𝒢k(s)} = E{𝒢k(t)2} − E{𝒢k(s)2}. Since
by the boundedness condition (C). Since Λ0k(·) is differentiable and λ0k(·) is bounded on [0, τ], there exists a constant M such that Λ0k(t) − Λ0k(s) ≤ M(t − s) for s ≤ t. Therefore,
and
for some constant
. Then, by the Kolmogorov-Čentsov theorem (Karatzas & Shreve, 1988, p. 53), 𝒢k(t) has continuous sample paths. In addition, since
is of bounded variation based on (12) and Conditions (C) and (F), we can write
where both
and
are nonnegative, monotone in t, and bounded. Therefore,
is a sum of two monotone functions. Hence, it follows from lemma 1 that the second term on the right-hand side of (14) converges to 0 uniformly in t.
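The moment bound used above for the Kolmogorov-Čentsov argument can be written out as follows. Write 𝒢(t) for the zero-mean Gaussian limit (notation introduced here), and assume, as the argument above states, that E{𝒢(t)𝒢(s)} = E{𝒢(s)2} for s ≤ t and that E{𝒢(t)2} − E{𝒢(s)2} ≤ C1{Λ0k(t) − Λ0k(s)} for some constant C1 (a label introduced here); M is the Lipschitz constant of Λ0k on [0, τ]:

```latex
\begin{aligned}
E\big[\{\mathcal G(t)-\mathcal G(s)\}^4\big]
  &= 3\Big(E\big[\{\mathcal G(t)-\mathcal G(s)\}^2\big]\Big)^2
   = 3\big[E\{\mathcal G(t)^2\}-E\{\mathcal G(s)^2\}\big]^2 \\
  &\le 3C_1^2\{\Lambda_{0k}(t)-\Lambda_{0k}(s)\}^2
   \le 3C_1^2M^2(t-s)^2 .
\end{aligned}
```

This is the Kolmogorov-Čentsov condition E|X(t) − X(s)|a ≤ C|t − s|1+b with a = 4 and b = 1, so 𝒢 has a continuous modification, which is in fact locally Hölder continuous of every order γ < b/a = 1/4.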
By similar arguments, the fourth term on the right-hand side of (14) can be shown to converge to 0 uniformly in t.
The third term on the right-hand side of (14) can be further decomposed as
| (15) |
The second term on the right side of (15) is asymptotically equivalent to
| (16) |
by (i) in lemma 3 and applying lemma 2. Likewise, by (ii) in lemma 3 and applying lemma 2, the last term on the right side of (15) is asymptotically equivalent to
| (17) |
By combining (16) and (17), (15) is asymptotically equivalent to
| (18) |
Combining the above results, we have shown that n−1/2ŨII(β0) is asymptotically equivalent to (13). Under the regularity conditions, the first term on the right-hand side of (13) is asymptotically zero-mean normal with covariance matrix where by Spiekerman & Lin (1998). The second and the third terms on the right-hand side of (13) can be shown to be asymptotically zero-mean normal with covariance matrices and by lemma 2, respectively. It follows from conditional expectation arguments that these three terms are asymptotically independent. Therefore, n−1/2ŨII(β0) is asymptotically normally distributed with mean zero and with finite variance
Hence n−1ŨII(β0) converges to zero in probability, and (iv) is satisfied.
By (i)–(iv), it follows from the theorem of Foutz (1977) that there exists a unique sequence β̃II such that n−1ŨII(β̃II) = 0 with probability converging to one as n → ∞, and that β̃II converges in probability to β0.
The asymptotic normality of β̃II follows from the consistency of β̃II and a Taylor series expansion of ŨII (β). This completes the proof.
The asymptotic properties of Λ̃II0k(β̃II, t) (k = 1, …, K) are summarized in the following theorem.
Theorem 4
Under Conditions (A) – (H), for each k = 1, …, K, Λ̃II0k(β̃II, t) converges in probability to Λ0k(t) uniformly in t ∈ [0, τ]. Also, WII(t) = {WII1(t), …, WIIK(t)}T, where WIIk(t) = n1/2{Λ̃II0k(β̃II, t) − Λ0k(t)}, converges weakly to a zero-mean Gaussian process 𝒲II(t) in D[0, τ]K. The covariance function between 𝒲IIj(t1) and 𝒲IIk(t2) is
where
PROOF
One can make the following decomposition
| (19) |
By the Taylor expansion of around β0, the first term on the right-hand side of (19) is equivalent to
| (20) |
where β* lies on the line segment between β̃II and β0. Then, as n → ∞, (20) converges to 0 in probability uniformly in t by lemma 1, since is of bounded variation, β̃II is consistent for β0, and converges weakly to a zero-mean Gaussian process with continuous sample paths. The second term can be shown to converge to 0 in probability uniformly in t by similar arguments.
Again, it follows from the Taylor expansion of around β0, the uniform convergence of and , the consistency of β̃II for β0, and the boundedness of Λ0k(t) on [0, τ] that the third term is asymptotically equivalent to
The fourth term can be shown to be asymptotically equivalent to
by lemma 1, since converges to uniformly in t and converges weakly to a zero-mean Gaussian process with continuous sample paths. For the last term on the right-hand side of (19), it follows from lemma 3 and the uniform convergence of to , where is bounded away from 0, that
| (21) |
Now by combining the above results and using the asymptotic expansion of n1/2(β̃II − β0) where
we have
where
Now, let
where
where
, and
where
for k = 1, …, K. Then, W(1)(t) converges weakly to a zero-mean Gaussian process 𝒲(1)(t) in D[0, τ]K, where the covariance function between 𝒲(1)j(t1) and 𝒲(1)k(t2) is
by Spiekerman & Lin (1998, Theorem 2). W(2)(t) can also be shown to converge weakly to a zero-mean Gaussian process 𝒲(2)(t). For any finite set of time points (t1, …, tD), the finite-dimensional distributions of W(2)(t) are asymptotically the same as those of 𝒲(2)(t) by lemma 2 and the Cramér-Wold device. Since D[0, τ]K is equipped with the uniform metric, it suffices to show the marginal tightness of W(2)k(t) for each k. The marginal tightness follows directly by applying lemma 2 to
. Thus, W(2)(t) converges weakly to the zero-mean Gaussian process 𝒲(2)(t), where the covariance function between 𝒲(2)j(t1) and 𝒲(2)k(t2) is
. The weak convergence of W(3)(t) to a zero-mean Gaussian process 𝒲(3)(t) follows from similar arguments, with the covariance function between 𝒲(3)j(t1) and 𝒲(3)k(t2) being
It follows from conditional expectation arguments that these three terms are asymptotically independent. Therefore, WII(t) = W(1)(t) + W(2)(t) + W(3)(t) converges weakly to the zero-mean Gaussian process 𝒲II(t) = 𝒲(1)(t) + 𝒲(2)(t) + 𝒲(3)(t), where the covariance function between 𝒲IIj(t1) and 𝒲IIk(t2) is
. This completes the proof.
3.4 Proofs of Theorems 1 and 2
Proofs for Theorems 1 and 2 basically follow the same steps used for those for Theorems 3 and 4. However, the steps involving the asymptotic expansions of α̂lk(t)−1 and q̂lk(t)−1 around α̃l and q̃lk (lemma 3) can now be omitted. Specifically, the third and fourth terms in (9) and (10), and the second and fourth terms in (15) and (21) vanish.
4 Discussion
In this paper, we considered fitting marginal hazards models to failure time data with multiple disease outcomes from two types of stratified case-cohort designs: the original and the generalized stratified case-cohort designs. In both designs, subcohort members are sampled by stratified random sampling, with possibly different sampling proportions across strata, where the strata are constructed from information available for all cohort members. After the subcohort is selected, all the remaining cases outside the subcohort are sampled under the original design, whereas under the generalized design only a fraction of them is selected by stratified random sampling. For estimation, we proposed a weighted estimating equation approach for the regression parameters and Breslow-Aalen type estimators for the cumulative baseline hazard functions. We also provided detailed proofs of the asymptotic properties of the proposed estimators, which were shown to be consistent and asymptotically normal.
One modification of the generalized case-cohort design in the current paper may be worth considering. Instead of sampling cases outside the subcohort separately within each stratum, one might sample cases outside the subcohort from the whole cohort regardless of stratum. Our proposed methods can be easily adapted to this design simply by redefining the strata, i.e., treating the cases outside the subcohort as a single separate stratum.
Acknowledgments
This research is partly supported by National Institutes of Health NHLBI Grant R01 HL-57444.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Contributor Information
Sangwook Kang, Email: skang@uga.edu, Department of Epidemiology and Biostatistics, University of Georgia, Athens, Georgia 30602, United States.
Jianwen Cai, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States.
References
- Andersen P, Gill R. Cox's regression model for counting processes: A large sample study. The Annals of Statistics. 1982;10:1100–1120.
- Barlow W. Robust variance estimation for the case-cohort design. Biometrics. 1994;50:1064–1072.
- Borgan O, Langholz B, Samuelsen SO, Goldstein L, Pogoda J. Exposure stratified case-cohort designs. Lifetime Data Analysis. 2000;6:39–58. doi: 10.1023/a:1009661900674.
- Breslow NE, Wellner JA. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scandinavian Journal of Statistics. 2007;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x.
- Chen K, Lo S. Case-cohort and case-control analysis with Cox's model. Biometrika. 1999;86:755–764.
- Chen K. Generalized case-cohort sampling. Journal of the Royal Statistical Society, Series B. 2001;63:791–809.
- Cullen KJ. Mass health examinations in the Busselton population, 1966 to 1970. Medical Journal of Australia. 1972;2:714–718. doi: 10.5694/j.1326-5377.1972.tb103506.x.
- Foutz RV. On the unique consistent solution to the likelihood equations. Journal of the American Statistical Association. 1977;72:147–148.
- Hájek J. Limiting distributions in simple random sampling from a finite population. Pub Math Inst Hungar Acad Sci. 1960;5:361–374.
- Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47:663–685.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: John Wiley & Sons; 2002.
- Kang S, Cai J. Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika. 2009;96:887–901. doi: 10.1093/biomet/asp059.
- Karatzas I, Shreve SE. Brownian Motion and Stochastic Calculus. 2nd ed. New York: Springer-Verlag; 1988.
- Kulich M, Lin DY. Additive hazards regression for case-cohort studies. Biometrika. 2000;87:73–87.
- Kulich M, Lin DY. Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association. 2004;99:832–844.
- Langholz B, Thomas D. Nested case-control and case-cohort methods of sampling from a cohort: A critical comparison. American Journal of Epidemiology. 1990;131:169–176. doi: 10.1093/oxfordjournals.aje.a115471.
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society, Series B. 2000;62:711–730.
- Lin DY. On fitting Cox's proportional hazards models to survey data. Biometrika. 2000;87:37–47.
- Prentice R. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
- Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. The Annals of Statistics. 1988;16:64–81.
- Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. New York: Wiley; 1986.
- Spiekerman CF, Lin DY. Marginal regression models for multivariate failure time data. Journal of the American Statistical Association. 1998;93:1164–1175.
- Thisted R. Elements of Statistical Computing. New York: Chapman & Hall; 1988.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer-Verlag; 1996.
- Wacholder S, Gail M, Pee D. Efficient design for assessing exposure-disease relationships in an assembled cohort. Biometrics. 1991;47:63–76.
- Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association. 1989;84:1065–1073.
