Statistical inference for the additive hazards model under outcome-dependent sampling

Jichang Yu; Yanyan Liu; Dale P Sandler; Haibo Zhou

doi:10.1002/cjs.11257

Author manuscript; available in PMC: 2015 Sep 14.

Published in final edited form as: Can J Stat. 2015 Jul 30;43(3):436–453. doi: 10.1002/cjs.11257


PMCID: PMC4569173 NIHMSID: NIHMS689640 PMID: 26379363

Abstract

Cost-effective study designs and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design, for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters under the proposed design and derive its asymptotic properties. We also provide guidance on using the proposed method by evaluating its relative efficiency against the simple random sampling design, and we derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and that the proposed estimator is more efficient than other estimators. We apply our method to a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the effect of radon exposure on cancer risk.

Keywords: additive hazards model, inverse probability weight, outcome-dependent sampling; MSC: Primary 62D05, secondary 62N01

1. INTRODUCTION

Epidemiologic studies often require a long follow-up of subjects in order to observe meaningful outcomes. Following a large number of subjects over a long period can be prohibitively expensive. Efficient statistical designs that reduce the overall cost and improve the study power under a fixed budget are therefore always desired. For example, in the Cancer Incidence and Mortality of Uranium Miners Study conducted at the National Institute of Environmental Health Sciences (Řeřicha et al., 2006), assembly of the lifelong record of radon exposure for a miner is a challenging and costly process. Investigators would like to maximize the study power for a given budget by strategically selecting the most informative study subjects.

The proposed ODS design for failure time data is a biased sampling scheme. Biased sampling schemes have long been recognized as cost-effective designs to improve the power of studies. Such biased designs include Case–Control designs for binary outcomes (e.g., Prentice & Pyke, 1979; Breslow & Cain, 1988; Weinberg & Wacholder, 1993; Breslow & Holubskov, 1997; Wang & Zhou, 2010), two-stage designs (e.g., White, 1982; Weaver & Zhou, 2005; Song, Zhou & Kosorok, 2009), and ODS for continuous outcomes (e.g., Zhou et al., 2002, 2007; Zhou, Qin & Longnecker, 2011).

The proposed ODS design is closely related to the well-known Case–Cohort design (Prentice, 1986) for the failure time data. The Case–Cohort design first samples a simple random sample (SRS) from the underlying population and in addition collects all remaining failures. This design is particularly effective when the failure rate is low and the number of failures is small (e.g., Self & Prentice, 1988; Cai & Zeng, 2004; Scheike & Martinussen, 2004; Sun et al., 2004; Pan & Schaubel, 2008). Variations of the Prentice (1986) Case–Cohort sampling scheme that further improve the efficiency of the designs include the stratified Case–Cohort design (e.g., Borgan et al., 2000), and generalized Case–Cohort design (e.g., Chen, 2001; Cai & Zeng, 2007; Samuelsen et al., 2007; Kang & Cai, 2009). In many studies where the failure rate may not be low and the number of failures is large, investigators may not have enough budget to sample all failures. Under these situations, it is still desirable to have a design that assembles covariates information for a subset of the failures that will increase the power of the study for a given overall budget.

The Cox proportional hazards model, which assumes the hazard ratio is constant, is commonly used in survival analysis and almost all of the aforementioned works are done under a Cox proportional hazards model framework. When the hazards ratio is varying as the study progresses, the additive hazards model, which assumes the hazards difference is constant, is a useful alternative to the Cox proportional hazards model (Cox & Oakes, 1984; Lin & Ying, 1994; Yip et al., 1999). Buckley (1984) demonstrated that the additive hazards model is biologically more plausible than the Cox proportional hazards model. In this paper, we propose an outcome-dependent sampling scheme for survival data under the additive hazards model and develop a weighted estimating equation approach to estimate the regression parameters for data generated under the proposed ODS design. The proposed design includes a SRS from the underlying cohort, as well as two supplemental samples: one from those who failed early and one from those who failed late. The intention of this sampling method is that if the exposure is related to the failure, then those who failed early and late will be more informative about the exposure-failure relationship. The Case–Cohort design can be viewed as a special case of the proposed ODS design with the selection probability of supplemental failure equal to 1. We show that parameter estimators have closed forms and are easy to compute. We provide theoretical formulas and computing software to help investigators to compute and design an optimal ODS study with the same sample size.

The rest of the paper is organized as follows. In Section 2 we introduce the proposed ODS design for failure time data and discuss suitable weights for constructing the pseudo-score function to estimate the regression parameters. A Breslow-type estimator for the cumulative baseline hazard function is also given. The asymptotic properties of the proposed estimator are presented in Section 3. In Section 4 the asymptotic relative efficiency of the proposed estimator is compared to the pseudo-score estimator under the SRS with the same sample size. A formula for calculating the optimal allocation of subsamples is provided. Section 5 presents a simulation study to evaluate the performance of the proposed methods. Section 6 provides a real data analysis. Section 7 provides some concluding remarks and discussions. The proofs of the theoretical results are outlined in the Appendix.

2. DATA STRUCTURE AND PSEUDO-SCORE EQUATION

2.1. ODS Design and Data Structure

Suppose that there are N independent subjects in a large study cohort. Let T be the failure time and C be the potential censoring time for T . With right-censoring, we observe the vector (X, δ) with X = min(T, C) and δ = I(T ≤ C), where I(·) is the indicator function. Let Z(t) be a possibly time-dependent p-vector of covariates. We assume that T and C are independent conditional on Z(·). Suppose the hazard function of the failure time T conditional on Z(t) follows the additive hazards model:

λ (t ∣ Z (t)) = λ_{0} (t) + β_{0}^{'} Z (t),

(1)

where λ₀(t) is the unspecified baseline hazard and β₀ denotes the p-vector of unknown regression parameters.

We propose the following general failure time ODS design, which is a retrospective design in which the covariates are measured only for the selected subjects. First, we draw a simple random subcohort (SRS) from the original cohort. Let ξ_i indicate, by the values 1 or 0, whether or not the i-th subject is selected into the SRS. Assume the sample size of the SRS is n₀ and n₀/N → p. Second, we partition the domain of the failure time T into a union of K̃ mutually exclusive intervals, Ã_k = (a_{k–1}, a_k], k = 1,···, K̃, where {a_k : k = 0, 1,···, K̃} are known constants satisfying a₀ = 0 < a₁ < ··· < a_K̃ = +∞. Among these, we select K ≤ K̃ intervals that are believed to be more informative, from which the supplemental samples are drawn. Let A_l, l = 1,..., K, denote the selected intervals. The supplemental samples are then selected from the subjects who experienced failure, are outside the SRS, and fall in each stratum A_k, k = 1,..., K. Let η_ik indicate whether or not the i-th subject from the stratum A_k is selected into the supplemental sample. Assume the size of the supplemental sample selected from the k-th stratum is n_k, k = 1,···, K. The above ODS design is applicable whether or not the disease rate is low and the number of failures is small.

Let N_k and n_{0,k} denote the sizes of the full cohort sample and the SRS sample falling into the k-th stratum, respectively, and assume n_k/(N_k – n_{0,k}) → r_k, k = 1,···, K. Denote $n = \sum_{k = 0}^{K} n_{k}$ , i.e., n is the total size of the SRS and supplemental samples. Let n/N → ρ_V (validation fraction), n₀/n → ρ₀ (SRS fraction) and n_k/n → ρ_k, k = 1,···, K (supplemental fractions). Let $π_{k} = \Pr (X \in A_{k}, δ = 1)$ , k = 1,···, K. Then, by simple calculation, the relationship between (p, r_k) and (ρ_V, ρ₀, ρ_k) can be expressed as follows:

\begin{matrix} p & = ρ_{0} \times ρ_{V}, \\ r_{k} & = \frac{ρ_{k} \times ρ_{V}}{π_{k} (1 - ρ_{0} \times ρ_{V})}, k = 1, \dots, K . \end{matrix}

(2)

The collection of samples from these two steps whose Z(·) value is observed is referred to as the validation sample. We refer to the collection of remaining subjects whose Z(·) value is not observed as the nonvalidation sample. Hence, the observable data structure of our proposed failure time ODS design is:

\begin{matrix} \text{Validation sample:} & \text{SRS:} & (X_{i}, δ_{i}, Z_{i} (t)), & i \in V_{0}, \\ & \text{Supplemental:} & (X_{i}, δ_{i}, Z_{i} (t) ∣ X_{i} \in A_{k}, δ_{i} = 1), & i \in V_{k}, k = 1, \dots, K; \\ \text{Nonvalidation sample:} & & (X_{j}, δ_{j}), & j \in \overset{‒}{V}; \end{matrix}

where V₀, V_k and V̄ are the index for the SRS, supplemental sample from the stratum A_k and the nonvalidation sample, respectively. Note that (i) when K̃ = 1 and r₁ = 1, our proposed failure time ODS design is the traditional Case–Cohort design. (ii) When K̃ = 1 and r₁ ∈ (0, 1), our proposed failure time ODS design is the generalized Case–Cohort design by Cai & Zeng (2007).
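The two-step selection described in Section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_ods_sample`, its arguments, and the coding of strata as integers 1,..., K̃ are hypothetical and not part of the original design description, and the supplemental draw simply takes all eligible cases when fewer than n_k are available.

```python
import numpy as np

def draw_ods_sample(X, delta, n0, cutpoints, n_supp, rng):
    """Return (xi, eta, stratum) for one ODS draw (illustrative sketch).

    X, delta  : observed times and failure indicators for the full cohort
    n0        : size of the simple random subcohort (SRS)
    cutpoints : increasing interior cut points a_1 < ... < a_{K~-1}
    n_supp    : dict {selected stratum k (1-based): supplemental size n_k}
    rng       : numpy.random.Generator
    """
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta)
    N = len(X)

    # Step 1: SRS indicator xi_i.
    xi = np.zeros(N, dtype=int)
    xi[rng.choice(N, size=n0, replace=False)] = 1

    # Stratum k means X in (a_{k-1}, a_k]; right=True gives right-closed bins.
    stratum = np.digitize(X, cutpoints, right=True) + 1

    # Step 2: supplemental indicator eta_i, drawn among failures outside the SRS.
    eta = np.zeros(N, dtype=int)
    for k, n_k in n_supp.items():
        pool = np.flatnonzero((delta == 1) & (xi == 0) & (stratum == k))
        take = min(n_k, len(pool))
        if take > 0:
            eta[rng.choice(pool, size=take, replace=False)] = 1
    return xi, eta, stratum
```

Covariates would then be assembled only for subjects with ξ_i = 1 or η_i = 1, which together form the validation sample.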

2.2. Weighted Pseudo-Score Estimator

Define N_i(t) = I(X_i ≤ t, δ_i = 1) and Y_i(t) = I(X_i ≥ t). Let τ denote the study end time. If the data are completely observed, β₀ of model (1) can be estimated by ${\hat{β}}_{F}$ , the root of the following pseudo-score equation

U_{F} (β) = \sum_{i = 1}^{N} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\} \{d N_{i} (t) - β^{'} Z_{i} (t) Y_{i} (t) d t\} = 0,

(3)

where $\overset{‒}{Z} (t) = \sum_{i = 1}^{N} Z_{i} (t) Y_{i} (t) ∕ \sum_{i = 1}^{N} Y_{i} (t)$ . Since not all subjects have a complete covariate history observed, we propose to apply the following inverse probability weight (IPW) (e.g., Horvitz & Thompson, 1951) for inference with data from an ODS design:

w_{i} = ξ_{i} (1 - δ_{i}) {(ρ_{0} ρ_{V})}^{- 1} + ξ_{i} δ_{i} (1 - ζ_{i}) {(ρ_{0} ρ_{V})}^{- 1} + ξ_{i} δ_{i} ζ_{i} + (1 - ξ_{i}) δ_{i} \sum_{k = 1}^{K} \frac{π_{k} (1 - ρ_{0} ρ_{V}) ζ_{i k} η_{i k}}{ρ_{k} ρ_{V}},

(4)

where $ζ_{i} = \sum_{k = 1}^{K} ζ_{i k}$ and $ζ_{i k} = I (T_{i} \in A_{k})$ . We do not sample the nonvalidation sample to observe their covariates, so the sampling probability of the nonvalidation sample is zero. The sampling probability of the supplemental sample in A_k is ρ_kρ_V/[π_k(1 – ρ₀ρ_V)], k = 1,..., K. In the SRS, the sampling probability of a censored subject is ρ₀ρ_V, and the sampling probability of a failure is 1 if it belongs to a stratum A_k and ρ₀ρ_V otherwise. The inverse probability weight (4) therefore achieves the following goals: (i) nonvalidation samples are eliminated by setting w = 0; (ii) the sampled censored subjects have the inverse of the sampling probability, (ρ₀ρ_V)^{–1}, as their weight; (iii) the sampled supplemental cases are weighted by π_k(1 – ρ₀ρ_V)/(ρ_kρ_V); (iv) the sampled subcohort cases are weighted by 1 if they belong to A_k (k = 1,···,K), and by (ρ₀ρ_V)^{–1} otherwise.
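As an illustration of how the weight (4) could be evaluated, the following Python sketch assigns w_i case by case. The function name `ods_weights`, the integer stratum labels, and the use of dictionaries for ρ_k and π_k are assumptions made for this example only; a single indicator `eta` stands in for η_ik, since each subject belongs to at most one stratum.

```python
import numpy as np

def ods_weights(xi, delta, eta, stratum, rho0, rhoV, rho, pi):
    """Inverse probability weights following equation (4) (illustrative sketch).

    xi, delta, eta : 0/1 arrays (SRS indicator, failure indicator,
                     supplemental-sample indicator)
    stratum        : integer stratum label of each subject's observed time
    rho, pi        : dicts {selected stratum k: rho_k} and {k: pi_k}
    """
    xi, delta, eta, stratum = map(np.asarray, (xi, delta, eta, stratum))
    p = rho0 * rhoV                          # SRS sampling probability
    zeta = np.isin(stratum, list(rho))       # zeta_i: in a selected stratum
    w = np.zeros(len(xi), dtype=float)       # nonvalidation subjects keep w = 0

    srs = xi == 1
    w[srs & (delta == 0)] = 1.0 / p                # censored subjects in the SRS
    w[srs & (delta == 1) & ~zeta] = 1.0 / p        # SRS cases outside selected strata
    w[srs & (delta == 1) & zeta] = 1.0             # SRS cases inside selected strata
    for k in rho:                                  # supplemental cases, stratum k
        sel = (xi == 0) & (delta == 1) & (stratum == k) & (eta == 1)
        w[sel] = pi[k] * (1.0 - p) / (rho[k] * rhoV)
    return w
```

In practice π_k would be replaced by its empirical estimate from the full cohort, since (X, δ) is observed for every subject.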

We propose to estimate the true regression coefficients, β₀, by solving the following weighted pseudo-score equation:

U_{W} (β) = \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\} \{d N_{i} (t) - β^{T} Z_{i} (t) Y_{i} (t) d t\} = 0,

(5)

where ${\overset{‒}{Z}}_{w} (t) = \sum_{i = 1}^{N} w_{i} Z_{i} (t) Y_{i} (t) ∕ \sum_{i = 1}^{N} w_{i} Y_{i} (t)$ . The resultant estimator has a closed form:

{\hat{β}}_{O D S} = {[\sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\}^{\otimes 2} Y_{i} (t) d t]}^{- 1} [\sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\} d N_{i} (t)],

(6)

where $a^{\otimes 2} = a a^{T}$ for a vector a.
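For time-independent covariates the closed form (6) reduces to finite sums, because Z̄_w(t) is piecewise constant between consecutive observed times. The Python sketch below illustrates this under that assumption; the function name `beta_ods` and its interface are hypothetical, and time-dependent covariates would need a numerical integral instead.

```python
import numpy as np

def beta_ods(X, delta, Z, w, tau=np.inf):
    """Closed-form weighted estimator (6), time-independent covariates only.

    Z is an (N, p) array; rows with w == 0 are never used and may be NaN.
    """
    X, delta, Z, w = (np.asarray(a) for a in (X, delta, Z, w))
    keep = w > 0
    order = np.argsort(X[keep], kind="stable")
    X, delta = X[keep][order], delta[keep][order]
    Z, w = Z[keep][order].astype(float), w[keep][order].astype(float)

    # Reverse cumulative sums over the risk set {j : X_j >= X_(m)}.
    S0 = np.cumsum(w[::-1])[::-1]
    S1 = np.cumsum((w[:, None] * Z)[::-1], axis=0)[::-1]
    ZZ = w[:, None, None] * Z[:, :, None] * Z[:, None, :]
    S2 = np.cumsum(ZZ[::-1], axis=0)[::-1]

    # On (X_(m-1), X_(m)] the weighted sum of {Z_i - Zbar_w}^{(x)2} Y_i
    # equals S2 - S1 S1' / S0, so the dt-integral is a finite sum.
    t = np.minimum(X, tau)
    length = np.diff(np.concatenate(([0.0], t)))
    centred = S2 - S1[:, :, None] * S1[:, None, :] / S0[:, None, None]
    A = np.einsum("m,mjk->jk", length, centred)

    # Counting-process part: jumps at failure times; risk set includes ties.
    first = np.searchsorted(X, X, side="left")
    zbar = S1[first] / S0[first, None]
    jump = (delta == 1) & (X <= tau)
    b = ((w * jump)[:, None] * (Z - zbar)).sum(axis=0)
    return np.linalg.solve(A, b)
```

With all weights equal to one the function returns the full-cohort pseudo-score estimator from (3), which gives a simple way to check an implementation on a small example.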

For the cumulative baseline hazard function $Λ_{0} (t) = \int_{0}^{t} λ_{0} (s) d s$ , it is natural to use the following estimator:

{\hat{Λ}}_{O D S} (t) = \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d N_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - \int_{0}^{t} {\hat{β}}_{O D S}^{T} {\overset{‒}{Z}}_{w} (s) d s .

(7)

To ensure its monotonicity, we make a minor modification, which still preserves the asymptotic properties, that is ${\hat{Λ}}_{O D S}^{*} (t) = \sup_{s \leq t} {\hat{Λ}}_{O D S} (s)$ . Following similar arguments as Lin & Ying (1994), we can show that ${\hat{Λ}}_{O D S}^{*} (t)$ and ${\hat{Λ}}_{O D S} (t)$ are asymptotically equivalent in the sense that ${\hat{Λ}}_{O D S}^{*} (t) - {\hat{Λ}}_{O D S} (t) = o_{p} (N^{- \frac{1}{2}})$ .
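On a finite grid of time points, the monotone modification amounts to a running maximum. The short Python sketch below assumes the unmodified estimate has already been evaluated on an increasing grid; the function name `monotone_cumhaz` is hypothetical, and the grid version only approximates the supremum over all s ≤ t.

```python
import numpy as np

def monotone_cumhaz(lam_hat):
    """Running supremum of an estimated cumulative hazard on a time grid.

    lam_hat[m] is the unmodified estimate at the m-th grid point; the
    result at m is the maximum of lam_hat[0], ..., lam_hat[m].
    """
    return np.maximum.accumulate(np.asarray(lam_hat, dtype=float))
```

The output is non-decreasing by construction and coincides with the input wherever the input was already monotone.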

3. ASYMPTOTIC PROPERTIES

To develop large sample theory for the proposed estimators, we first introduce the following notations:

Let e(t) = E[Y(t)Z(t)]/E[Y(t)]. For i = 1, ···, N, define

\begin{matrix} M_{i} (t) & = N_{i} (t) - \int_{0}^{t} Y_{i} (s) d Λ_{0} (s) - \int_{0}^{t} β_{0}^{T} Z_{i} (s) Y_{i} (s) d s, \\ S_{i} (β_{0}) & = \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t), \\ A_{N} & = \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\}^{\otimes 2} Y_{i} (t) d t . \end{matrix}

We impose the following regularity conditions:

(C1) Λ₀(τ) < ∞.
(C2) Pr(Y(t) = 1) > 0 for t ∈ (0, τ].
(C3) $E [\sup_{0 \leq t \leq τ} ∣ Y (t) Z^{\otimes 2} (t) β_{0}^{'} Z (t) ∣] < \infty$ .
(C4) $Σ_{A} = E [\int_{0}^{τ} \{Z (t) - e (t)\}^{\otimes 2} Y (t) d t]$ is positive definite.

The conditions are similar to those in Theorem 4.1 of Andersen & Gill (1982). The asymptotic properties of ${\hat{β}}_{O D S}$ are stated in the following theorem:

Theorem 1

Under the conditions (C1)-(C4), (i) (consistency) ${\hat{β}}_{O D S} \to_{p} β_{0}$ ; (ii) (asymptotic normality) $N^{1 ∕ 2} ({\hat{β}}_{O D S} - β_{0})$ is asymptotically normally distributed with mean zero and variance matrix $Σ_{O D S} (β_{0}) = Σ_{A}^{- 1} (Σ_{F} + Σ_{B} (β_{0})) {(Σ_{A}^{- 1})}^{'}$ , where $Σ_{A}$ is defined as in condition (C4) and

\begin{matrix} Σ_{F} & = E [\int_{0}^{τ} \{Z_{1} (t) - e (t)\}^{\otimes 2} d N_{1} (t)]; \\ Σ_{B} (β_{0}) & = E [{(w_{1} - 1)}^{2} S_{1}^{\otimes 2} (β_{0})] \\ & = \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})] + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} E [δ_{1} (1 - ζ_{1}) S_{1}^{\otimes 2} (β_{0})] + \sum_{k = 1}^{K} \frac{(1 - ρ_{0} ρ_{V}) (π_{k} (1 - ρ_{0} ρ_{V}) - ρ_{k} ρ_{V})}{ρ_{k} ρ_{V}} E [δ_{1} ζ_{1 k} S_{1}^{\otimes 2} (β_{0})] . \end{matrix}

Remark 1

The asymptotic variance of ${\hat{β}}_{O D S}$ consists of the variance term $Σ_{F}$ of the full-data pseudo-score estimator plus an extra term $Σ_{B} (β_{0})$ due to the ODS design.

Remark 2

For the Case–Cohort sampling design, K̃ = 1 and r₁ = 1,

Σ_{B} (β_{0}) = \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})],

and this result is the same as the variance derived by Kulich & Lin (2004).

Remark 3

For the generalized Case–Cohort design, K̃ = 1 and r₁ ∈ (0, 1),

Σ_{B} (β_{0}) = \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})] + \frac{(1 - ρ_{0} ρ_{V}) (1 - ρ_{V})}{ρ_{1} ρ_{V}} E [δ_{1} S_{1}^{\otimes 2} (β_{0})],

and this result is the same as the variance derived by Cai & Zeng (2007).

Theorem 2

Under the conditions (C2)-(C4), the estimated variance matrices satisfy ${\hat{Σ}}_{A} \to_{p} Σ_{A}$ , ${\hat{Σ}}_{F} \to_{p} Σ_{F}$ , ${\hat{Σ}}_{B} ({\hat{β}}_{O D S}) \to_{p} Σ_{B} (β_{0})$ and ${\hat{Σ}}_{O D S} ({\hat{β}}_{O D S}) \to_{p} Σ_{O D S} (β_{0})$ , where

\begin{matrix} {\hat{Σ}}_{A} & = \frac{1}{N} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\}^{\otimes 2} Y_{i} (t) d t, \\ {\hat{Σ}}_{F} & = \frac{1}{N} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\}^{\otimes 2} d N_{i} (t), \end{matrix}

{\hat{Σ}}_{B} ({\hat{β}}_{O D S}) = \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} \frac{1}{N} \sum_{i = 1}^{N} w_{i} (1 - δ_{i}) {\hat{S}}_{i}^{\otimes 2} ({\hat{β}}_{O D S}) + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0} ρ_{V}} \frac{1}{N} \sum_{i = 1}^{N} w_{i} δ_{i} (1 - ζ_{i}) {\hat{S}}_{i}^{\otimes 2} ({\hat{β}}_{O D S}) + \sum_{k = 1}^{K} \frac{(1 - ρ_{0} ρ_{V}) ({\hat{π}}_{k} (1 - ρ_{0} ρ_{V}) - ρ_{k} ρ_{V})}{ρ_{k} ρ_{V}} \frac{1}{N} \sum_{i = 1}^{N} w_{i} δ_{i} ζ_{i k} {\hat{S}}_{i}^{\otimes 2} ({\hat{β}}_{O D S}),

with

\begin{matrix} {\hat{S}}_{i} (β) = \int_{0}^{τ} \{Z_{i} (t) - {\overset{‒}{Z}}_{w} (t)\} \{d N_{i} (t) - Y_{i} (t) d {\hat{Λ}}_{O D S} (t) - β^{'} Z_{i} (t) Y_{i} (t) d t\}, \\ {\hat{Σ}}_{O D S} ({\hat{β}}_{O D S}) = {\hat{Σ}}_{A}^{- 1} ({\hat{Σ}}_{F} + {\hat{Σ}}_{B} ({\hat{β}}_{O D S})) {({\hat{Σ}}_{A}^{- 1})}^{'}, \end{matrix}

and

{\hat{π}}_{k} = \frac{1}{N} \sum_{i = 1}^{N} I (X_{i} \in A_{k}, δ_{i} = 1), k = 1, \dots, K .

Proof

The consistency follows from the law of large numbers, the uniform consistency of ${\hat{Λ}}_{O D S} (t)$ in Theorem 3, and the uniform convergence of Z̄_w(t) to e(t), which are established in the Appendix.

Define $h (t) = \int_{0}^{t} e (u) d u$ and Ψ₀(t) = E[Y(t)]. The following theorem establishes the asymptotic properties of the estimated cumulative baseline hazard function ${\hat{Λ}}_{O D S} (t)$ .

Theorem 3

Under the assumptions (C1)-(C4), (i) (uniform consistency) $\sup_{t \in [0, τ]} ∣ {\hat{Λ}}_{O D S} (t) - Λ_{0} (t) ∣ \to_{p} 0$ ; (ii) (asymptotic normality of ${\hat{Λ}}_{O D S} (t)$ ) $\sqrt{N} ({\hat{Λ}}_{O D S} (t) - Λ_{0} (t))$ , where ${\hat{Λ}}_{O D S} (t)$ is defined in (7), converges weakly on [0, τ] to a zero-mean Gaussian process whose covariance function at (s, t) is

h^{'} (s) Σ_{A}^{- 1} (Σ_{F} + Σ_{B} (β_{0})) Σ_{A}^{- 1} h (t) + R_{1} (s, t) - h^{'} (s) Σ_{A}^{- 1} R_{2} (t) - h^{'} (t) Σ_{A}^{- 1} R_{2} (s),

where

\begin{matrix} R_{1} (s, t) & = E [\{\frac{1}{ρ_{0} ρ_{V}} - \frac{{(ρ_{0} ρ_{V})}^{2} - 1}{ρ_{0} ρ_{V}} δ_{1} ζ_{1} + (1 - ρ_{0} ρ_{V}) δ_{1} \sum_{k = 1}^{K} \frac{ζ_{1 k} π_{k} (1 - ρ_{0} ρ_{V})}{ρ_{k} ρ_{V}}\} \times \int_{0}^{t} Ψ_{0}^{- 1} (u) d M_{1} (u) \int_{0}^{s} Ψ_{0}^{- 1} (v) d M_{1} (v)], \\ R_{2} (t) & = E [\{\frac{1}{ρ_{0} ρ_{V}} - \frac{{(ρ_{0} ρ_{V})}^{2} - 1}{ρ_{0} ρ_{V}} δ_{1} ζ_{1} + (1 - ρ_{0} ρ_{V}) δ_{1} \sum_{k = 1}^{K} \frac{ζ_{1 k} π_{k} (1 - ρ_{0} ρ_{V})}{ρ_{k} ρ_{V}}\} \times \int_{0}^{t} \{Z (u) - e (u)\} Ψ_{0}^{- 1} (u) d N (u)] . \end{matrix}

Outlines of the proofs of Theorems 1 and 3 are provided in the Appendix.

4. ASYMPTOTIC RELATIVE EFFICIENCY AND OPTIMAL ODS DESIGN

4.1. Asymptotic Relative Efficiency with SRS Design with Same Sample Size

In this section, we investigate the relative efficiency of the proposed estimator ${\hat{β}}_{O D S}$ to the competing estimator ${\hat{β}}_{S R S}$ , where ${\hat{β}}_{S R S}$ is the pseudo-score estimator from the equation (3) based on the SRS design with the same sample size. We then use those results to derive an optimal sample size allocation for future study designs.

By Theorem 1, the asymptotic relative efficiency of ${\hat{β}}_{S R S}$ versus ${\hat{β}}_{O D S}$ is

A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S}) = \frac{n}{N} Σ_{A}^{- 1} [Σ_{F} + Σ_{B} (β_{0})] Σ_{F}^{- 1} Σ_{A},

(8)

where $n = \sum_{k = 0}^{K} n_{k}$ is the total size of ODS sample. The formula of $A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S})$ can be re-written as:

A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S}) = ρ_{V} I_{p} + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0}} Σ_{A}^{- 1} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A} + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0}} Σ_{A}^{- 1} E [δ_{1} (1 - ζ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A} + \sum_{k = 1}^{K} \frac{(1 - ρ_{0} ρ_{V}) (π_{k} (1 - ρ_{0} ρ_{V}) - ρ_{k} ρ_{V})}{ρ_{k}} \times Σ_{A}^{- 1} E [δ_{1} ζ_{1 k} S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A} .

(9)

4.2. Optimal ODS Design

We consider the optimal subcohort allocation problem in the failure time ODS design under a fixed underlying cohort population and a fixed total budget. By optimality, we mean an allocation of n₀, n₁,...,n_K such that the trace of the matrix $A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S})$ achieves its minimum. Recall that n = n₀ + n₁ + ··· + n_K is the total validation sample size for which Z is observed. Let N denote the total sample size of the underlying cohort population and B denote the total budget (in dollars) at the disposal of the study investigators. Assume that the unit cost is C₁ to observe (X, δ) and C₂ to observe Z. For given B and N, the simple random sampling design can afford to sample n_SRS = (B – N × C₁)/C₂ subjects to assess exposure Z. The ODS design, on the other hand, can afford to sample n₀, n₁,..., n_K subjects to assess exposure Z, where n₀, n₁,...,n_K are bounded by the condition

N \times C_{1} + (n_{0} + n_{1} + \dots + n_{K}) \times C_{2} = B,

(10)

Our goal is to find the allocation n₀, n₁,...,n_K that satisfies (10) and minimizes the trace of $A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S})$ . We assume that N, B, C₁, C₂ are all fixed, which is equivalent to the condition that ρ_V = (n₀ + n₁ + ··· + n_K)/N is fixed.

From formula (9), we know that the trace of the asymptotic relative efficiency, denoted by $T A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S})$ , can be written as:

T A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S}) = ρ_{V} p + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0}} trace (Σ_{A}^{- 1} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A}) + \frac{1 - ρ_{0} ρ_{V}}{ρ_{0}} trace (Σ_{A}^{- 1} E [δ_{1} (1 - ζ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A}) + \sum_{k = 1}^{K} \frac{(1 - ρ_{0} ρ_{V}) (π_{k} (1 - ρ_{0} ρ_{V}) - ρ_{k} ρ_{V})}{ρ_{k}} \times trace (Σ_{A}^{- 1} E [δ_{1} ζ_{1 k} S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A}),

(11)

where trace $(Σ_{A}^{- 1} E [(1 - δ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A})$ , trace $(Σ_{A}^{- 1} E [δ_{1} (1 - ζ_{1}) S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A})$ , and trace $(Σ_{A}^{- 1} E [δ_{1} ζ_{1 k} S_{1}^{\otimes 2} (β_{0})] Σ_{F}^{- 1} Σ_{A})$ , k = 1,...,K, are constants that can be consistently estimated by replacing the expectations with their empirical counterparts from Theorem 2. Therefore, $T A R E ({\hat{β}}_{S R S}, {\hat{β}}_{O D S})$ is a function of ρ_V, ρ₀ and ρ_k, 1 ≤ k ≤ K, which depend on our sampling scheme. It is desirable to choose values that minimize the trace of the asymptotic relative efficiency. For most ODS applications, the K̃ = 3 case has been shown to be a practical and sufficient setting (Zhou et al., 2007), and the Newton-Raphson algorithm can be used to obtain the optimal allocation of the subsamples. We will be happy to provide interested readers with the program code we wrote for this purpose.
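To make the minimization of (11) concrete, the following Python sketch evaluates the trace for a given allocation and searches a grid of allocations with two selected strata. It is not the authors' program: a plain grid search is substituted here for the Newton-Raphson iteration mentioned above, the function names `tare` and `best_allocation` are hypothetical, and the trace constants (`c_cens`, `c_out`, `c_k`) are assumed to have been estimated beforehand. Allocations with r_k > 1 in (2) are treated as infeasible.

```python
import numpy as np

def tare(rho0, rho, rhoV, pi, c_cens, c_out, c_k, p_dim):
    """Trace of the ARE, equation (11).

    rho, pi, c_k : arrays over the K selected strata
    c_cens, c_out: trace constants for censored subjects and for cases
                   outside the selected strata
    p_dim        : dimension p of the covariate vector
    """
    q = 1.0 - rho0 * rhoV
    base = rhoV * p_dim + q / rho0 * (c_cens + c_out)
    supp = np.sum(q * (pi * q - rho * rhoV) / rho * c_k)
    return base + supp

def best_allocation(rhoV, pi, c_cens, c_out, c_k, p_dim, step=0.01):
    """Grid search over (rho0, rho1, rho2) with rho0 + rho1 + rho2 = 1."""
    best = (np.inf, None)
    grid = np.arange(step, 1.0, step)
    for rho0 in grid:
        for rho1 in grid:
            rho2 = 1.0 - rho0 - rho1
            if rho2 < step / 2:
                continue
            rho = np.array([rho1, rho2])
            # Feasibility: r_k <= 1, with r_k given by equation (2).
            if np.any(rho * rhoV > pi * (1.0 - rho0 * rhoV)):
                continue
            val = tare(rho0, rho, rhoV, pi, c_cens, c_out, c_k, p_dim)
            if val < best[0]:
                best = (val, (rho0, rho1, rho2))
    return best
```

A grid search is slower than Newton-Raphson but needs no derivatives and handles the feasibility constraint directly; the grid minimizer could also serve as a starting value for a gradient-based refinement.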

4.3. Optimal ODS Example

We consider the following additive hazards model:

λ (t ∣ E, Z) = λ_{0} (t) + β_{1} E + β_{2} Z,

where E ~ N(0, 1), Z ~ Bern(1, 0.5), λ₀(t) = 0.6, β₁ = 0 and β₂ = 0.5. We consider the situations where the censoring rates are 70% and 60%, and the cutpoints are the (30%, 70%) quantiles of the failure time. We select the supplemental samples from the high and low intervals of the failure time. Let ρ₂ = 0 (ρ_i = n_i/n and n = n₀ + n₁ + n₃). We fix ρ_V and consider the trace of the asymptotic relative efficiency between ${\hat{β}}_{S R S}$ and ${\hat{β}}_{O D S}$ under different settings of ρ₀, ρ₁ and ρ₃. The simulation results (Figure 1) are based on the total sample size N = 600 and 1000 simulated data sets.

FIGURE 1. The trace of asymptotic relative efficiency between ${\hat{β}}_{S R S}$ and ${\hat{β}}_{O D S}$

In Figure 1, the X-axis represents the range of the corresponding ρ₀ and the Y-axis represents the trace of the asymptotic relative efficiency. From Figure 1, it can be seen that: (i) the trace of the asymptotic relative efficiency decreases as ρ_V increases. (ii) In Figure 1.a, when ρ_V = 0.2, 0.4, the smallest ρ₀ is equal to 0.33 and 0.66, respectively. In Figure 1.b, when ρ_V = 0.4, 0.5, the smallest ρ₀ is equal to 0.49 and 0.63, respectively. (iii) In Figure 1.a, when ρ_V = 0.2, 0.4, the corresponding optimal ρ₀ are equal to 0.67 and 0.73, respectively. (iv) When the censoring rate is 60% and ρ_V = 0.4, 0.5, the corresponding optimal ρ₀ are 0.75 and 0.73 in Figure 1.b. The above results suggest that: (1) when the censoring rate is high, e.g., 70%, sampling a smaller SRS subcohort (smaller ρ₀) will increase the study efficiency; (2) when the censoring rate is moderate, e.g., 60%, one can find an optimal ρ₀ that may be around 0.73.

5. SIMULATION STUDIES

In this section, we examine the finite sample performance of the proposed approach via simulation studies. For all simulation studies, we generated 1000 simulated datasets, each with N = 600 independent subjects. The failure times are generated from the additive hazards model:

λ (t ∣ E, Z) = λ_{0} (t) + β_{1} E + β_{2} Z,

where the exposure E follows a standard normal distribution and Z follows a Bernoulli distribution with Pr(Z = 1) = 0.5, λ₀(t) = 0.6, β₁ = 0 and β₂ = 0.5. The censoring times are generated from the mixture of uniform distributions c₀unif[c₁, c₂] + (1 – c₀)unif[c₃, c₄] with 0 < c₀ < 1, where c₀, c₁, c₂, c₃ and c₄ are chosen to generate around 60% and 70% censoring, respectively. All the failures are partitioned into three strata with cutpoints at the (30%, 70%) quantiles of the failure times. Our proposed ODS design consists of different sizes of SRS and supplemental samples (presented in Table 1).
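A data-generating step of this kind might look as follows in Python. Because the hazard λ₀ + β₁E + β₂Z does not depend on t, each failure time is exponential with that rate. The function name `simulate_cohort` and the default values of c₀,..., c₄ are assumptions chosen only for illustration; they are not the constants used in the study and would need tuning to reach a target censoring rate.

```python
import numpy as np

def simulate_cohort(N, rng, beta1=0.0, beta2=0.5, lam0=0.6,
                    c0=0.5, c1=0.0, c2=0.5, c3=0.5, c4=1.5):
    """Simulate (X, delta, E, Z) from an additive hazards model with a
    constant baseline hazard and mixture-uniform censoring (sketch).

    c0..c4 are hypothetical defaults, not values reported in the paper.
    """
    E = rng.standard_normal(N)
    Z = rng.binomial(1, 0.5, size=N)
    rate = lam0 + beta1 * E + beta2 * Z      # hazard is constant in time
    if np.any(rate <= 0):
        raise ValueError("the additive hazard must be positive")
    T = rng.exponential(1.0 / rate)          # exponential failure times

    # Censoring: unif[c1, c2] with probability c0, else unif[c3, c4].
    first = rng.random(N) < c0
    C = np.where(first, rng.uniform(c1, c2, N), rng.uniform(c3, c4, N))

    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return X, delta, E, Z
```

The stratum cutpoints can then be taken as `np.quantile(X[delta == 1], [0.3, 0.7])`, the empirical 30% and 70% quantiles of the observed failure times.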

TABLE 1.

Simulation results based on 1000 simulations with full cohort size N = 600 and cutpoints being (0.3, 0.7). The hazard model is λ(t) = 0.6 + β₁E + β₂Z with β₁ = 0, β₂ = 0.5 and E ~ N(0, 1), Z ~ Bernoulli(0.5).

			β ₁				β ₂
Censoring	(n₀,n₁,n₃)	Method	Mean	SE	$\hat{SE}$	95%CI	Mean	SE	$\hat{SE}$	95%CI
70%	(300,65,7)	${\hat{β}}_{F u l l}$	−0.002	0.065	0.062	0.940	0.502	0.132	0.130	0.950
		${\hat{β}}_{R}$	−0.003	0.089	0.089	0.950	0.512	0.190	0.185	0.946
		${\hat{β}}_{S R S}$	−0.002	0.085	0.080	0.941	0.501	0.174	0.165	0.946
		${\hat{β}}_{G C C}$	−0.001	0.076	0.078	0.960	0.509	0.160	0.161	0.949
		${\hat{β}}_{O D S}$	−0.001	0.075	0.077	0.958	0.508	0.157	0.159	0.947
	(360,51,6)	${\hat{β}}_{F u l l}$	−0.001	0.062	0.062	0.942	0.501	0.131	0.129	0.949
		${\hat{β}}_{R}$	−0.001	0.081	0.080	0.950	0.508	0.173	0.168	0.941
		${\hat{β}}_{S R S}$	−0.002	0.073	0.074	0.954	0.501	0.156	0.155	0.946
		${\hat{β}}_{G C C}$	0.001	0.071	0.073	0.945	0.507	0.149	0.151	0.952
		${\hat{β}}_{O D S}$	−0.000	0.069	0.071	0.942	0.505	0.144	0.148	0.956
60%	(300,69,16)	${\hat{β}}_{F u l l}$	−0.001	0.054	0.053	0.955	0.500	0.113	0.113	0.947
		${\hat{β}}_{R}$	0.002	0.074	0.075	0.954	0.498	0.170	0.160	0.938
		${\hat{β}}_{S R S}$	0.005	0.067	0.067	0.943	0.500	0.142	0.142	0.954
		${\hat{β}}_{G C C}$	0.003	0.065	0.069	0.962	0.506	0.142	0.145	0.958
		${\hat{β}}_{O D S}$	0.003	0.064	0.067	0.957	0.503	0.137	0.140	0.961
	(360,55,13)	${\hat{β}}_{F u l l}$	−0.001	0.053	0.052	0.950	0.501	0.115	0.113	0.951
		${\hat{β}}_{R}$	0.002	0.069	0.069	0.944	0.505	0.148	0.146	0.948
		${\hat{β}}_{S R S}$	−0.000	0.064	0.062	0.948	0.502	0.138	0.134	0.939
		${\hat{β}}_{G C C}$	0.001	0.063	0.064	0.952	0.501	0.132	0.135	0.953
		${\hat{β}}_{O D S}$	0.002	0.060	0.062	0.952	0.500	0.129	0.131	0.953


- ${\hat{β}}_{F u l l}, {\hat{β}}_{R}$ and ${\hat{β}}_{S R S}$ are the standard pseudo-score estimators based on the full cohort, the SRS subcohort, and an SRS sample with the same size as the ODS design, respectively.

- ${\hat{β}}_{G C C}$ and ${\hat{β}}_{O D S}$ denote the proposed estimator based on the GCC design and our proposed failure time ODS, respectively.

For each setting, we compare the proposed estimator ( ${\hat{β}}_{O D S}$ ) with four competing estimators: (1) ${\hat{β}}_{G C C}$ , the estimator based on the generalized Case-Cohort design, which randomly selects an SRS of size n₀ and a supplemental sample of size n₁ + n₃ from the cases outside the SRS. (2) ${\hat{β}}_{F u l l}$ , the pseudo-score estimator based on the full cohort. (3) ${\hat{β}}_{R}$ , the pseudo-score estimator based on the SRS sample. (4) ${\hat{β}}_{S R S}$ , the pseudo-score estimator based on an SRS sample with the same sample size as the ODS design. We study different scenarios, including different censoring rates and different sizes of supplemental samples. The sample standard deviation of the 1000 estimates is given in the corresponding SE column. The $\hat{SE}$ column gives the average of the estimated standard errors, and “95% CI” is the empirical coverage of the nominal 95% confidence interval for the true parameter, computed using the estimated standard error. The simulation results are summarized in Table 1.
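The four summary columns described above can be computed from the replicate estimates as in the following Python sketch. The function name `summarize` and the returned keys are hypothetical, and the normal quantile 1.96 is an assumption about how the nominal 95% intervals were formed.

```python
import numpy as np

def summarize(estimates, se_hats, truth, z=1.96):
    """Summaries over simulation replicates for one coefficient.

    estimates : point estimates, one per simulated data set
    se_hats   : estimated standard errors, one per simulated data set
    truth     : true parameter value
    """
    estimates = np.asarray(estimates, dtype=float)
    se_hats = np.asarray(se_hats, dtype=float)
    covered = np.abs(estimates - truth) <= z * se_hats
    return {
        "Mean": estimates.mean(),        # average of the estimates
        "SE": estimates.std(ddof=1),     # sample standard deviation
        "SE_hat": se_hats.mean(),        # average estimated standard error
        "CP": covered.mean(),            # empirical coverage of the interval
    }
```

Close agreement between "SE" and "SE_hat", together with "CP" near 0.95, is what indicates that the variance estimator is performing adequately.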

First, under all of the situations considered here, all five estimators are approximately unbiased. The proposed variance estimator provides a good estimate of the sample standard errors, and the confidence intervals attain coverage close to the nominal 95% level. Second, ${\hat{β}}_{F u l l}$ is the best estimator among the five estimators, because it is based on the full cohort data. Third, the proposed estimator ${\hat{β}}_{O D S}$ is more efficient than the estimator ${\hat{β}}_{G C C}$ , which indicates that sampling the supplemental samples from the high and low intervals of the failure time is more efficient than simple random sampling. Finally, the proposed estimator ${\hat{β}}_{O D S}$ is also more efficient than ${\hat{β}}_{S R S}$ under all the situations.

6. URANIUM MINERS STUDY DATA ANALYSIS

In this section, we illustrate the proposed method using a data set from the Cancer Incidence and Mortality of Uranium Miners Study. Uranium miners are chronically exposed to ionizing radiation, which is a known carcinogen. Therefore, miners are at risk of developing radiation-related cancer because they are chronically exposed to alpha particles emitted by radon and its progeny (referred to as radon), which will increase the risk of cancer through the resulting biological damage. Lung cancer has long been acknowledged as an occupational disease in uranium miners (BEIR VI, 1999). Furthermore, most studies investigated mortality rather than cancer incidence (Tirmarche et al., 1993; Vacquier et al., 2008; Kreuzer et al., 2008, 2010). However, such studies miss a substantial number of cases when the cancers have low fatality rates (Řeřicha et al., 2006; Kulich et al., 2011). We therefore investigate the incidence, rather than mortality, of various types of cancer excluding lung cancer, and evaluate associations of occupational exposure to radon with the incidence of non-lung solid cancers.

To illustrate our methods, we consider the following ODS design. The full cohort used for cancer incidence follow-up includes 16,434 miners. The follow-up period for case ascertainment was January 1, 1977 to December 31, 1996. A total of 2,506 subjects with incident cancers were identified, of which 1,575 had a cancer type of interest. The cohort was classified according to age on 1/1/1977 (5-year age groups). The subcohort was drawn by simple random sampling from each of the resulting strata so that the number of subcohort members sampled from a stratum was approximately equal to the total number of all cancer cases in the stratum. Therefore, we used the bootstrap method to obtain the variance estimates, with 300 bootstrap replicates. The size of the SRS, n₀, is 1,930. Let C₃, C₇ denote the 30% and 70% quantiles of the incidence time, respectively. We sample n₁ = 236 and n₃ = 236 supplemental samples from the intervals (0, C₃] and (C₇, ∞), respectively. The total size of the ODS sample is 2,402. We observe the following four covariates: total radon exposure (Trad), measured in working level months (WLM, 1 WLM = 3.5 × 10^–3 J h m^–3); Age (years); period of entering the workforce (Dummy₁ = 1 if the subject started work between 1957 and 1966, and 0 otherwise; Dummy₂ = 1 if the subject started work between 1967 and 1976, and 0 otherwise); and Smoking (0 denotes non-smokers and light smokers who smoked less than 10 cigarettes a day for a period not exceeding 5 years; 1 denotes moderate and heavy smokers).

We consider the following additive hazards model:

λ (t ∣ Z) = λ_{0} (t) + β_{1} Trad + β_{2} Age + β_{3} Smoking + β_{4} {Dummy}_{1} + β_{5} {Dummy}_{2} .

The three methods, SRS ( ${\hat{β}}_{S R S}$ ), GCC ( ${\hat{β}}_{G C C}$ ) and ODS ( ${\hat{β}}_{O D S}$ ), are applied with the same sample size to evaluate the association between cancer incidence and the above covariates. The results for the Cancer Incidence and Mortality of Uranium Miners Study are summarized in Table 2.

Table 2.

Analysis results for the Cancer Incidence and Mortality of Uranium Miners Study. Listed values are in units of 10^–5; multiply each entry by 10^–5 to obtain the original value.

Method / covariate	$\hat{β}$	$SE (\hat{β})$	95% CI
${\hat{β}}_{S R S}$
Trad	0.358	0.080	(0.201, 0.516)
Age	11.400	0.774	(9.900, 12.900)
Smoking	115.300	12.600	(90.700, 140.000)
Dummy₁	17.900	16.800	(−15.000, 50.800)
Dummy₂	27.500	21.900	(−15.000, 70.500)
${\hat{β}}_{G C C}$
Trad	0.401	0.063	(0.277, 0.525)
Age	7.940	0.721	(6.530, 9.350)
Smoking	125.500	11.500	(103.000, 148.000)
Dummy₁	3.380	13.600	(−23.000, 29.900)
Dummy₂	6.590	21.600	(−36.000, 48.900)
${\hat{β}}_{O D S}$
Trad	0.367	0.059	(0.251, 0.483)
Age	10.200	0.709	(8.840, 11.600)
Smoking	129.800	10.500	(109.200, 150.400)
Dummy₁	5.680	13.300	(−20.000, 31.800)
Dummy₂	7.330	20.500	(−33.000, 47.400)

Note: Trad is the total radon exposure. ${\hat{β}}_{S R S}$ : the estimator obtained by simple random sampling; ${\hat{β}}_{G C C}$ : the estimator obtained by generalized case-cohort sampling; ${\hat{β}}_{O D S}$ : the estimator obtained by ODS sampling. The three methods are based on the same sample size.

Results in Table 2 show that, under all three methods, Trad is significantly related to the incidence of non-lung solid cancers. The narrowest 95% confidence interval for the coefficient of Trad, (0.251 × 10^–5, 0.483 × 10^–5), is obtained with ${\hat{β}}_{O D S}$ . The estimated standard errors for Trad are 0.802 × 10^–6, 0.634 × 10^–6 and 0.590 × 10^–6 for ${\hat{β}}_{S R S}$ , ${\hat{β}}_{G C C}$ and ${\hat{β}}_{O D S}$ , respectively. For the remaining covariates, the pattern across the three methods is similar to that for Trad. All the methods considered confirm that Trad has a positive effect on the incidence of non-lung solid cancers.
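As a quick arithmetic check, the Wald-type intervals for Trad can be recomputed from the point estimates and standard errors reported in Table 2. The sketch below assumes the intervals are of the form estimate ± 1.96 × SE; because the table entries are rounded, the recomputed limits may differ from the tabulated ones in the last digit. The squared ratio of standard errors is shown only as a rough empirical indication of relative efficiency and is not a quantity reported by the authors.

```python
# Estimates and standard errors for Trad, copied from Table 2 (units: 1e-5).
trad = {"SRS": (0.358, 0.080), "GCC": (0.401, 0.063), "ODS": (0.367, 0.059)}
Z975 = 1.96  # approximate 97.5% quantile of the standard normal distribution

def wald_ci(beta_hat, se, z=Z975):
    """Two-sided Wald interval: estimate plus or minus z standard errors."""
    return beta_hat - z * se, beta_hat + z * se

for method, (b, se) in trad.items():
    lo, hi = wald_ci(b, se)
    print(f"{method}: ({lo:.3f}, {hi:.3f})  width = {hi - lo:.3f}")

# Squared ratio of standard errors, SRS relative to ODS (illustrative only).
rel_eff = (trad["SRS"][1] / trad["ODS"][1]) ** 2
print(f"SE^2 ratio SRS/ODS: {rel_eff:.2f}")
```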

7. CONCLUDING REMARKS AND DISCUSSIONS

We proposed an ODS design for right-censored failure time data under the additive hazards model. With a right-censored response variable, the ODS sampling scheme depends not only on the value of the observed failure time but also on the failure indicator. Under the framework of the additive hazards model, we introduced inverse probability weights (IPW) into the standard pseudo-score equation to estimate the regression coefficients. The proposed estimators have a closed form and are easy to compute, and they are shown to be consistent and asymptotically normal. Simulation studies show that the proposed estimator and design are more efficient than both the SRS estimator and the generalized case-cohort estimator with the same sample size.
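To make the closed form concrete, the following is a minimal sketch of a weighted pseudo-score estimator for time-independent covariates, written from the expression used in the proof of Theorem 1 in the Appendix. It is not the authors' implementation. The function name `weighted_additive_hazards`, the assumption of no tied observation times, the synthetic data, and the toy selection rule (keep every case, keep each non-case with probability 0.3) are all assumptions of this example; in particular, the toy weights are simple inverse selection probabilities and not the ODS weights defined in the paper.

```python
import numpy as np

def weighted_additive_hazards(time, delta, Z, w):
    """Solve A beta = b, where
    A = sum_i w_i int Y_i(t) {Z_i - Zbar(t)}^{(x)2} dt,
    b = sum_i w_i int {Z_i - Zbar(t)} dN_i(t),
    and Zbar(t) is the w-weighted covariate mean over the risk set at t."""
    order = np.argsort(time)
    x, d, Z, w = time[order], delta[order], Z[order], w[order]
    wZ = w[:, None] * Z
    # Reverse cumulative sums give totals over R_k = {j : x_j >= x_k} (no ties assumed).
    S0 = np.cumsum(w[::-1])[::-1]
    S1 = np.cumsum(wZ[::-1], axis=0)[::-1]
    S2 = np.cumsum((wZ[:, :, None] * Z[:, None, :])[::-1], axis=0)[::-1]
    Zbar = S1 / S0[:, None]
    gaps = np.diff(x, prepend=0.0)        # length of (x_{k-1}, x_k], on which R_k is at risk
    centered = S2 - S1[:, :, None] * Zbar[:, None, :]
    A = np.einsum("k,kab->ab", gaps, centered)
    b = ((w * d)[:, None] * (Z - Zbar)).sum(axis=0)
    return np.linalg.solve(A, b)

# Synthetic additive hazards data: lambda(t | Z) = 0.5 + 1.0 * Z1 + 0.5 * Z2.
rng = np.random.default_rng(2015)
N = 40000
Z = np.column_stack([rng.integers(0, 2, N), rng.uniform(0, 1, N)]).astype(float)
beta_true = np.array([1.0, 0.5])
T = rng.exponential(1.0 / (0.5 + Z @ beta_true))
C = rng.uniform(0.0, 3.0, N)
time = np.minimum(T, C)
delta = (T <= C).astype(float)

# Toy outcome-dependent selection with known probabilities.
p_select = np.where(delta == 1, 1.0, 0.3)
selected = rng.uniform(size=N) < p_select
w = 1.0 / p_select[selected]

beta_full = weighted_additive_hazards(time, delta, Z, np.ones(N))
beta_ipw = weighted_additive_hazards(time[selected], delta[selected], Z[selected], w)
print("full cohort:", beta_full)
print("IPW sample :", beta_ipw)
```

Because the risk-set sums are obtained with reverse cumulative sums over the sorted times, the whole computation is a single pass over the data followed by one linear solve, which is what makes the estimator easy to compute in practice.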

We investigated the asymptotic relative efficiency and the optimal allocation of the subsamples by evaluating the trace of the asymptotic relative efficiency between our proposed estimator and the standard pseudo-score estimator from an SRS design with the same sample size, under a fixed total sample size and under a fixed total budget. We found that the proposed method performs well and is more efficient than the SRS design. When the censoring rate is high, sampling a smaller SRS subcohort increases the study efficiency. The simulation study suggests that greater efficiency can be gained in estimating the effect of the exposure on the outcome by using our proposed ODS design. A real data analysis is provided to illustrate the proposed method.

Throughout this study, we have assumed Bernoulli sampling for the subcohort and for the cases outside the subcohort. Borgan et al. (2000) and Samuelsen et al. (2007) found that a stratified sampling design could improve the study efficiency. Future work on developing efficient analysis methods for stratified outcome-dependent sampling is warranted.

ACKNOWLEDGEMENTS

The authors are grateful for the valuable comments and suggestions from the associate editor and the referees which drastically improved the article. This work is supported by the Fundamental Research Fund for the Central Universities 31541311216 (for Yu), National Science Foundation of China grant 11171263 (for Liu), and NIH R01 ES021900, P01 CA142538 (for Zhou).

APPENDIX

We first introduce the following lemmas which will be useful in proving the asymptotic properties of our estimators.

Lemma 1

Under the conditions (C2) to (C4), we have

\sup_{t \in [0, τ]} ‖ \overset{‒}{Z} (t) - e (t) ‖ = o_{p} (1),

and

\sup_{t \in [0, τ]} ‖ \frac{1}{N} \sum_{i = 1}^{N} w_{i} Y_{i} (t) {(Z_{i} (t))}^{k} - π_{k} (t) ‖ = o_{p} (1), k = 0, 1 .

Proof

The result holds by application of the law of large numbers and Corollary III.2 of Andersen and Gill (1982).

Lemma 2

Let A_n(t), $A_{n}^{*} (t)$ and B_n(t) be three sequences of bounded processes on [0, τ]. Suppose that (a) B_n(t) converges weakly to a tight limit B(t) with almost surely continuous sample paths; (b) A_n(t) and $A_{n}^{*} (t)$ are monotone in t; and (c) there exist processes A(t) and A*(t), both right continuous at 0 and left continuous at τ, such that $\sup_{t \in [0, τ]} ∣ A_{n} (t) - A (t) ∣ \to_{p} 0$ and $\sup_{t \in [0, τ]} ∣ A_{n}^{*} (t) - A^{*} (t) ∣ \to_{p} 0$ . Then

\sup_{t \in [0, τ]} ∣ \int_{0}^{t} \{A_{n} (s) A_{n}^{*} (s) - A (s) A^{*} (s)\} d B_{n} (s) ∣ \to_{p} 0 .

The proof of this lemma can be found in Kulich and Lin (2000).

Proof of Theorem 1

From (6) and simple algebraic manipulation, we obtain

\sqrt{N} ({\hat{β}}_{O D S} - β_{0}) = \sqrt{N} {[\sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\}^{\otimes 2} Y_{i} (t) d t]}^{- 1} \times [\sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\} d N_{i} (t) - \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\}^{\otimes 2} β_{0} Y_{i} (t) d t] = {[\frac{1}{N} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\}^{\otimes 2} Y_{i} (t) d t]}^{- 1} [\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\} d M_{i} (t)] .

We can show

\frac{1}{N} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\}^{\otimes 2} Y_{i} (t) d t \to_{p} E [\int_{0}^{τ} {[Z (t) - e (t)]}^{\otimes 2} Y (t) d t],

by application of the law of large numbers and Lemma 1.

We have

\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - \overset{‒}{Z} (t)\} d M_{i} (t) = \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t) + \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{e (t) - \overset{‒}{Z} (t)\} d M_{i} (t) .

(1)

First, we show that the second part of (1) is asymptotically negligible, that is,

\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{e (t) - \overset{‒}{Z} (t)\} d M_{i} (t) = o_{p} (1) .

(2)

Without loss of generality, assume that Z_i(t) ≥ 0 for all t; otherwise, decompose each Z_i(·) into its positive and negative parts. For each i, the process w_iM_i(t) has mean zero and can be expressed as the sum of two monotone processes on [0, τ]. Thus, by van der Vaart and Wellner (1996, Example 2.11.16), $B_{n} (t) ≔ \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} M_{i} (t)$ converges weakly to a tight Gaussian process B(t) with continuous sample paths on [0, τ]. Since Z̄(t) is a product of two monotone processes which converge uniformly in probability to π₁(t) and $π_{0}^{- 1} (t)$ , where $π_{1} (t) π_{0}^{- 1} (t) = e (t)$ , we can prove (2) by Lemma 2.

Second, the first part of (1) is equal to

\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t) = \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (w_{i} - 1) \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t) + \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t) .

(3)

By the definition of S_i(β₀), (3) is equal to

\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} S_{i} (β_{0}) + \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (w_{i} - 1) S_{i} (β_{0}),

(4)

and both parts have mean zero. The two parts of (4) are also uncorrelated. We rewrite the second part of (4) as:

\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (w_{i} - 1) S_{i} (β_{0}) = \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (\frac{ξ_{i}}{ρ_{0} ρ_{V}} - 1) (1 - δ_{i}) S_{i} (β_{0}) + \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} (\frac{ξ_{i}}{ρ_{0} ρ_{V}} - 1) (1 - ζ_{i}) δ_{i} S_{i} (β_{0}) + \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} [\sum_{k = 1}^{K} (\frac{π_{k} (1 - ρ_{0} ρ_{V}) η_{i k}}{ρ_{k} ρ_{V}} - 1) ζ_{i k}] (1 - ξ_{i}) δ_{i} S_{i} (β_{0}) .

(5)

It is easy to prove that the three parts of (5) are uncorrelated. The asymptotic normality of ${\hat{β}}_{O D S}$ then follows from the multivariate central limit theorem, and the consistency of ${\hat{β}}_{O D S}$ follows immediately.
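The second equality in the expansion of $\sqrt{N} ({\hat{β}}_{O D S} - β_{0})$ in the proof of Theorem 1 can be unpacked as follows. This is a sketch that assumes the usual counting-process martingale for the additive hazards model, $M_{i} (t) = N_{i} (t) - \int_{0}^{t} Y_{i} (s) \{d Λ_{0} (s) + β_{0}^{'} Z_{i} (s) d s\}$ , and the weighted risk-set average $\overset{‒}{Z} (t) = \sum_{i} w_{i} Y_{i} (t) Z_{i} (t) / \sum_{i} w_{i} Y_{i} (t)$ ; both definitions belong to the main text and are restated here as assumptions.

```latex
% Substitute dN_i(t) = dM_i(t) + Y_i(t) \{ d\Lambda_0(t) + \beta_0' Z_i(t)\,dt \}:
\sum_{i=1}^{N} w_i \int_0^\tau \{Z_i(t) - \bar{Z}(t)\}\, dN_i(t)
  = \sum_{i=1}^{N} w_i \int_0^\tau \{Z_i(t) - \bar{Z}(t)\}\, dM_i(t)
  + \int_0^\tau \Big[ \sum_{i=1}^{N} w_i Y_i(t) \{Z_i(t) - \bar{Z}(t)\} \Big] d\Lambda_0(t)
  + \sum_{i=1}^{N} w_i \int_0^\tau \{Z_i(t) - \bar{Z}(t)\}\, Z_i(t)' \beta_0\, Y_i(t)\, dt .

% By the definition of \bar{Z}(t):
\sum_{i=1}^{N} w_i Y_i(t) \{Z_i(t) - \bar{Z}(t)\} = 0,
\qquad
\sum_{i=1}^{N} w_i Y_i(t) \{Z_i(t) - \bar{Z}(t)\}\, Z_i(t)'
  = \sum_{i=1}^{N} w_i Y_i(t) \{Z_i(t) - \bar{Z}(t)\}^{\otimes 2} .
```

Under these assumptions the baseline term vanishes and the last term cancels the term involving β₀, so only the martingale integral remains in the numerator.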

Proof of Theorem 3

From (7) and simple algebraic manipulation, we have

\begin{aligned} {\hat{Λ}}_{O D S} (t) - Λ_{0} (t) & = \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d N_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} Y_{i} (s) d Λ_{0} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - \int_{0}^{t} {\hat{β}}_{O D S}^{'} \overset{‒}{Z} (s) d s \\ & = \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - {({\hat{β}}_{O D S} - β_{0})}^{'} \int_{0}^{t} \overset{‒}{Z} (s) d s \\ & = \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - {({\hat{β}}_{O D S} - β_{0})}^{'} h (t) - {({\hat{β}}_{O D S} - β_{0})}^{'} \int_{0}^{t} \{\overset{‒}{Z} (s) - e (s)\} d s . \end{aligned}

(6)

The third term is obviously o_p (1) uniformly in t. So,

\sup_{t \in [0, τ]} ∣ {\hat{Λ}}_{O D S} (t) - Λ_{0} (t) ∣ \leq \sup_{t \in [0, τ]} ∣ \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} ∣ + \sup_{t \in [0, τ]} ∣ {({\hat{β}}_{O D S} - β_{0})}^{'} h (t) ∣

(7)

From the consistency of ${\hat{β}}_{O D S}$ and the boundedness of h(t) on [0, τ], we obtain $\sup_{t \in [0, τ]} ∣ {({\hat{β}}_{O D S} - β_{0})}^{'} h (t) ∣ = o_{p} (1)$ . We also have $\sup_{t \in [0, τ]} ∣ \int_{0}^{t} \frac{\sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} ∣ = o_{p} (1)$ by the argument used to prove (2). Therefore, $\sup_{t \in [0, τ]} ∣ {\hat{Λ}}_{O D S} (t) - Λ_{0} (t) ∣ \to_{p} 0$ .

From (6), we obtain

\sqrt{N} ({\hat{Λ}}_{O D S} (t) - Λ_{0} (t)) = \int_{0}^{t} \frac{\sqrt{N} \sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} - \sqrt{N} {({\hat{β}}_{O D S} - β_{0})}^{'} h (t) - \sqrt{N} {({\hat{β}}_{O D S} - β_{0})}^{'} \int_{0}^{t} \{\overset{‒}{Z} (s) - e (s)\} d s .

By the argument used in the proof of Theorem 1, we have

\sqrt{N} ({\hat{β}}_{O D S} - β_{0}) = Σ_{A}^{- 1} \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (t) - e (t)\} d M_{i} (t) + o_{p} (1)

and

\int_{0}^{t} \frac{\sqrt{N} \sum_{i = 1}^{N} w_{i} d M_{i} (s)}{\sum_{i = 1}^{N} w_{i} Y_{i} (s)} = \int_{0}^{t} \frac{\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} d M_{i} (s)}{Ψ_{0} (s)} + o_{p} (1) .

(8)

Obviously, M_i(t) is the difference of two monotone functions in t and Ψ₀(·) > 0. Thus, $\int_{0}^{t} \frac{w_{i} d M_{i} (s)}{Ψ_{0} (s)}$ is also a difference of two monotone functions in t. Because monotone functions have pseudo-dimension 1 (Pollard, 1990; page 15), the process $\int_{0}^{t} \frac{w_{i} d M_{i} (s)}{Ψ_{0} (s)}$ is manageable (Pollard, 1990; page 38). It then follows from the functional central limit theorem (Pollard, 1990; page 53) that $N^{- 1 ∕ 2} \sum_{i = 1}^{N} w_{i} \int_{0}^{t} \frac{d M_{i} (s)}{Ψ_{0} (s)}$ is tight and thus converges weakly to a Gaussian process with mean zero. This weak convergence also follows from van der Vaart and Wellner (1996, Example 2.11.16, page 215). The tightness of $\sqrt{N} {({\hat{β}}_{O D S} - β_{0})}^{'} h (t)$ follows from Theorem 1.

Obviously, ${({\hat{β}}_{O D S} - β_{0})}^{'} \int_{0}^{t} \{\overset{‒}{Z} (s) - e (s)\} d s$ is $o_{p} (N^{- 1 ∕ 2})$ uniformly in t. Therefore, we have

\sqrt{N} ({\hat{Λ}}_{O D S} (t) - Λ_{0} (t)) = \int_{0}^{t} \frac{\frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} d M_{i} (s)}{Ψ_{0} (s)} - h {(t)}^{'} Σ_{A}^{- 1} \frac{1}{\sqrt{N}} \sum_{i = 1}^{N} w_{i} \int_{0}^{τ} \{Z_{i} (s) - e (s)\} d M_{i} (s) + o_{p} (1),

which converges weakly to a zero-mean Gaussian process. Thus, we prove Theorem 3.
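The estimator of the cumulative baseline hazard analysed above, written in the first line of (6) as $\int_{0}^{t} \sum_{i} w_{i} d N_{i} (s) / \sum_{i} w_{i} Y_{i} (s) - \int_{0}^{t} β^{'} \overset{‒}{Z} (s) d s$ , can be evaluated numerically at the observed times. The sketch below is an illustration under simplifying assumptions and not the authors' code: covariates are time-independent, there are no tied times, the data are synthetic with unit weights, and the true β is plugged in where ${\hat{β}}_{O D S}$ would be used in practice. The function name `weighted_baseline_cumhaz` is hypothetical.

```python
import numpy as np

def weighted_baseline_cumhaz(time, delta, Z, w, beta):
    """Evaluate the weighted cumulative baseline hazard at the sorted observed times."""
    order = np.argsort(time)
    x, d, Z, w = time[order], delta[order], Z[order], w[order]
    # Risk-set totals over R_k = {j : x_j >= x_k} via reverse cumulative sums.
    S0 = np.cumsum(w[::-1])[::-1]
    S1 = np.cumsum((w[:, None] * Z)[::-1], axis=0)[::-1]
    Zbar = S1 / S0[:, None]
    gaps = np.diff(x, prepend=0.0)
    jumps = w * d / S0                  # increments of int sum w dN / sum w Y
    drift = gaps * (Zbar @ beta)        # increments of int beta' Zbar(s) ds
    return x, np.cumsum(jumps - drift)

# Synthetic data with constant baseline hazard 0.5, so Lambda_0(t) = 0.5 * t.
rng = np.random.default_rng(7)
N = 20000
Z = rng.uniform(0.0, 1.0, size=(N, 1))
beta = np.array([1.0])
T = rng.exponential(1.0 / (0.5 + Z @ beta))
C = rng.uniform(0.0, 3.0, N)
time, delta = np.minimum(T, C), (T <= C).astype(float)

x, cumhaz = weighted_baseline_cumhaz(time, delta, Z, np.ones(N), beta)
k = np.searchsorted(x, 1.0)             # first observed time at or after t = 1
print(x[k], cumhaz[k])                  # the estimate should be close to Lambda_0(1) = 0.5
```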

BIBLIOGRAPHY

Andersen PK, Gill RD. Cox's regression model for counting processes: A large sample study. Annals of Statistics. 1982;10:1100–1120.
Borgan O, Langholz B, Samuelsen SO, Goldstein L, Pogoda J. Exposure stratified case-cohort designs. Lifetime Data Analysis. 2000;6:39–58. doi: 10.1023/a:1009661900674.
Breslow NE, Cain KC. Logistic regression for two-stage case-control data. Biometrika. 1988;75:11–20.
Breslow NE, Holubkov R. Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling. Journal of the Royal Statistical Society, Series B. 1997;59:447–461.
Buckley JD. Additive and multiplicative models for relative survival data. Biometrics. 1984;40:51–62.
Cai J, Zeng D. Sample size/power calculation for case-cohort studies. Biometrics. 2004;60:1015–1024. doi: 10.1111/j.0006-341X.2004.00257.x.
Cai J, Zeng D. Power calculation for case-cohort studies with nonrare events. Biometrics. 2007;63:1288–1295. doi: 10.1111/j.1541-0420.2007.00838.x.
Chen K. Generalized case-cohort sampling. Journal of the Royal Statistical Society, Series B. 2001;63:791–809.
Cox DR, Oakes D. Analysis of Survival Data. Chapman & Hall; London: 1984.
Horvitz D, Thompson D. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1951;47:663–685.
Kang S, Cai J. Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika. 2009;96:887–901. doi: 10.1093/biomet/asp059.
Kreuzer M, Grosche B, Schnelzer M, Tschense A, Dufey F, Walsh L. Radon and risk of death from cancer and cardiovascular diseases in the German uranium miners cohort study: follow-up 1946–2003. Radiation and Environmental Biophysics. 2010;49:177–185. doi: 10.1007/s00411-009-0249-5.
Kreuzer M, Walsh L, Schnelzer M, Tschense A, Grosche B. Radon and risk of extrapulmonary cancers: results of the German uranium miners' cohort study. British Journal of Cancer. 2008;99:1945–1953. doi: 10.1038/sj.bjc.6604776.
Kulich M, Řeřicha V, Řeřicha R, Shore DL, Sandler D. Incidence of non-lung solid cancers in Czech uranium miners: a case-cohort study. Environmental Research. 2011;111:400–405. doi: 10.1016/j.envres.2011.01.008.
Kulich M, Lin DY. Additive hazards regression with covariate measurement error. Journal of the American Statistical Association. 2000;95:238–248.
Kulich M, Lin DY. Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association. 2004;99:832–844.
Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71.
National Research Council. Committee on the Biological Effects of Ionizing Radiation (BEIR VI), Health effects of exposure to radon. National Academy Press; Washington, DC: 1999.
Pan Q, Schaubel DE. Proportional hazards models based on biased samples and estimated selection probabilities. The Canadian Journal of Statistics. 2008;36:111–127.
Pollard D. Empirical processes: theory and applications. Institute of Mathematical Statistics; Hayward: 1990.
Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
Prentice RL, Pyke R. Logistic disease incidence models and case-control studies. Biometrika. 1979;66:403–412.
Řeřicha V, Kulich M, Řeřicha R, Shore DL, Sandler D. Incidence of leukemia, lymphoma, and multiple myeloma in Czech uranium miners: a case-cohort study. Environmental Health Perspectives. 2006;114:818–822. doi: 10.1289/ehp.8476.
Samuelsen S, Ånestad H, Skrondal A. Stratified case-cohort analysis of general cohort sampling designs. Scandinavian Journal of Statistics. 2007;34:103–119.
Scheike T, Martinussen T. Maximum likelihood estimation in Cox's regression model under case-cohort sampling. Scandinavian Journal of Statistics. 2004;31:283–293.
Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Annals of Statistics. 1988;16:64–81.
Song R, Zhou H, Kosorok M. A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome. Biometrika. 2009;96:221–228. doi: 10.1093/biomet/asn073.
Sun J, Sun L, Flournoy N. Additive hazards model for competing risks analysis of the case-cohort design. Communications in Statistics – Theory and Methods. 2004;33:351–366.
Tirmarche M, Raphalen A, Allin F, Bredon P. Mortality of a cohort of French uranium miners exposed to relatively low radon concentrations. British Journal of Cancer. 1993;67:1090–1097. doi: 10.1038/bjc.1993.200.
Vacquier B, Caer S, Rogel A, Feurprier M, Tirmarche M, Luccioni C, Quesne B, Acker A, Laurier D. Mortality risk in the French cohort of uranium miners: extended follow-up 1946–1999. Occupational and Environmental Medicine. 2008;65:597–604. doi: 10.1136/oem.2007.034959.
van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer-Verlag; New York: 1996.
Wang X, Zhou H. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics. 2010;66:502–511. doi: 10.1111/j.1541-0420.2009.01280.x.
Weaver MA, Zhou H. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association. 2005;100:459–469.
Weinberg CR, Wacholder S. Prospective analysis of case-control data under general multiplicative intercept risk models. Biometrika. 1993;80:461–465.
White J. A two stage design for the study of the relationship between a rare exposure and a rare disease. American Journal of Epidemiology. 1982;115:119–128. doi: 10.1093/oxfordjournals.aje.a113266.
Yip PF, Zhou Y, Lin D, Fang X. Estimation of population size based on additive hazards models for continuous-time recapture experiments. Biometrics. 1999;55:904–908. doi: 10.1111/j.0006-341x.1999.00904.x.
Zhou H, Chen J, Rissanen T, Korrick S, Hu H, Salonen J, Longnecker MP. Outcome-dependent sampling: an efficient sampling and inference procedure for studies with a continuous outcome. Epidemiology. 2007;18:461–468. doi: 10.1097/EDE.0b013e31806462d3.
Zhou H, Qin G, Longnecker M. A partial linear model in the outcome-dependent sampling setting to evaluate the effect of prenatal PCB exposure on cognitive function in children. Biometrics. 2011;67:876–885. doi: 10.1111/j.1541-0420.2010.01500.x.
Zhou H, Weaver M, Qin J, Longnecker M, Wang MC. A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics. 2002;58:413–421. doi: 10.1111/j.0006-341x.2002.00413.x.
