Abstract
In this article we study a semiparametric additive risks model (McKeague and Sasieni (1994)) for two-stage design survival data where accurate information is available only on second stage subjects, a subset of the first stage sample. We derive two-stage estimators by combining data from both stages. Large sample inferences are developed. As a by-product, we also obtain asymptotic properties of the single stage estimators of McKeague and Sasieni (1994) when the semiparametric additive risks model is misspecified. The proposed two-stage estimators are shown to be asymptotically more efficient than the second stage estimators. They also demonstrate smaller bias and variance for finite samples. The developed methods are illustrated using small intestine cancer data from the SEER (Surveillance, Epidemiology, and End Results) Program.
Key words and phrases: Censored data, correlation, efficiency, measurement errors, missing covariates
1. Introduction
Two-stage designs are useful in medical studies and other fields of research. The first stage sample of a two-stage design consists of a set of subjects under study with surrogate, inaccurate, or missing information. The second stage sample is a subset of individuals from the first stage with accurate and complete data. Typical scenarios include measurement error and missing covariates problems. For example, when a complete survey is complicated, expensive, and time consuming, researchers often use a simplified version for all study subjects in the first stage. The complete version is taken only by a small subset of study subjects. Two-stage data also arise in applications where certain information from more recently available technology, such as a genome-wide scan, is collected only for newly-diagnosed patients. A medical device postmarket surveillance example was given by Li and Tseng (2008): St Jude Medical conducted a postmarket surveillance study to evaluate the safety and efficacy of five pacing electrodes by collecting information on adverse events or failures. The database maintained by the medical device company, which contains all the marketed devices, has serious under-reporting problems that might lead to underestimated failure and adverse event rates. To offset the under-reporting bias, St Jude Medical drew an active follow-up sample and collected accurate and complete information on this sample. This typical two-stage survival study consists of the company administrative data (first stage data) and the active follow-up data (second stage data). In general, analysis based on the first stage data alone could be biased. On the other hand, analysis based on the second stage data alone would not be the most efficient since it does not utilize information from the first stage. It would be desirable to combine data from both stages to increase the efficiency of the second stage data analysis.
The two-stage design has been studied extensively for complete data; see, e.g., White (1982), Schill et al. (1993), Breslow and Holubkov (1997), and references therein. However, there are relatively few methods available for analysis of two-stage censored survival data, especially when both the outcome variable and covariates are subject to error in the first stage sample. Among others, Zhou and Pepe (1995) and Wang et al. (1997) studied the surrogate covariates problem for a multiplicative semiparametric hazard model using regression calibration techniques. Kulich and Lin (2000) proposed a corrected pseudo-score estimator for the additive risks model of Lin and Ying (1994) with measurement errors on covariates. Based on the work of Chen and Chen (2000) on regression models in two-stage designs, Chen (2002) and Tseng (2004) studied the Cox model for two-stage survival data, where both survival time and covariates are subject to measurement error. Li and Tseng (2008) studied nonparametric estimation of survival functions for two-stage survival data. Jiang and Zhou (2007) studied a two-stage design problem for Lin and Ying's (1994) model.
In this paper we study the semiparametric additive risks model of McKeague and Sasieni (1994) (referred to as MS hereafter) for analysis of two-stage survival data where both the survival time and covariates are subject to measurement errors. Let h(t|x, z) denote the conditional hazard function of a survival time given x and z. The MS model postulates that

h(t|x, z) = α(t)′x + β′z,

where α(t) is a vector of unspecified time-varying coefficient functions and β is a vector of constant regression coefficients.
The MS model provides a useful alternative to the Cox (1972) model when the proportional hazards assumption is violated. Including Lin and Ying's (1994) model as a special case, the MS model is more parsimonious than Aalen's (1978) additive risks model.
We derive two-stage estimates for the parametric and nonparametric regression coefficients by bridging the first stage and second stage estimates through their asymptotic joint distribution. The estimators introduced in this paper take the form

β̂ = β̂2 + Σ21Σ11^{−1}(β̂1 − β̂1^V),

where β̂2 is the second stage estimator, β̂1 and β̂1^V are first stage estimators based on all N first stage subjects and on the n validation subjects, respectively, Σ21 is the covariance matrix between β̂2 and β̂1^V, and Σ11 is the variance of β̂1^V. The second stage estimator is thus improved by incorporating the information from the first stage through the correction term. The use of information from both stages allows us to fit a model with full information. A major challenge in establishing the asymptotic joint distribution of the first-stage and second-stage estimates in our model is the loss of the martingale property, which is the key to the theoretical development of the MS model for the single stage estimates. Moreover, we need to derive the properties of the MS estimators under a misspecified model. We therefore use a different approach to study the asymptotic joint distribution by deriving i.i.d. representations. The same approach is then used to establish large sample properties of the proposed two-stage estimates and to develop large sample inferences.
Our methods are developed under a very general setting that incorporates measurement errors on both covariates and the survival outcome without requiring specific model specifications for the errors. No assumption is needed for the relationship between surrogate variables and target variables. We allow misspecified models for the first stage data and derive a robust sandwich variance estimate for the MS model.
The paper is organized as follows. In Section 2, we study the properties of the single stage MS estimators under misspecified models, and propose two-stage estimators for the regression coefficients and the conditional survival function. Large sample properties of our proposed estimators are given in Section 3. Point-wise and simultaneous confidence intervals for the conditional survival function are derived. Section 4 presents a simulation study to evaluate the performance of our methods. In Section 5, we illustrate our method using small intestine cancer data from the Surveillance, Epidemiology, and End Results (SEER) Program supported by the National Cancer Institute (NCI). Section 6 provides some concluding remarks. The proofs are provided in the appendix.
2. Two-Stage Estimators
2.1. Notation and assumptions
Suppose there are N subjects in the first stage and only coarse measurements, denoted as (x1i, z1i, T1i, δ1i), i = 1, …, N, are available. Here x1i ∊ Rp1, z1i ∊ Rq1 are the observed surrogate covariates that might depend on time, , is a survival time, C1i is a censoring time conditionally independent of given the covariates, and is the censoring indicator. In the second stage, accurate data (x2i, z2i, T2i, δ2i), i ∊ V (n), are collected for a random validation subsample V (n) of n subjects from the first stage, where x2i ∊ Rp2, z2i ∊ Rq2, , is the true survival time, C2i is a censoring time conditionally independent of given the covariates x2i and z2i, and is the censoring indicator.
Assume the following MS model for the second stage survival time:

h2(t|x2i, z2i) = α2(t)′x2i + β2′z2i

for i ∊ V (n). For the second stage sample, let
be the weighted least squares estimators of β2 and , respectively (McKeague and Sasieni (1994)), where , , is the at-risk process, , N2i(t) = I(T2i ≤ t, δ2i = 1) is the counting process,
with , is a uniformly consistent estimate of the weight function h2i(t) ≡ h2(t|x2i, z2i) for subject i, and τ is the last time point in the study (see a more rigorous definition in the appendix). Similarly, we define the first stage estimators [, and , ] using the first stage data based on all N subjects and the n subjects in the validation sample, respectively.
2.2. Asymptotic properties of the MS estimators under misspecified models
Our theorem gives large sample properties of and without making any model assumption for the first stage data. It is a nontrivial generalization of the result of MS by allowing the model to be misspecified. Scheike (2002) considered a particular misspecification of the MS model, where the form holds only for the rate function and not the intensity, while our results work for general misspecification.
It is shown in the appendix that is equivalent to a sum of independent and identically distributed random variables with mean zero,
where is the unknown working parameter, and
with
The variance of is therefore and can be consistently estimated by with the unknown quantities replaced by their estimates.
Similarly, we prove in the appendix that
where
| (2.1) |
| (2.2) |
The pointwise variance of the asymptotic Gaussian process can be estimated by , with the unknown quantities replaced by the estimates.
The asymptotic results for a misspecified MS model are summarized in the following theorem, whose proof is in the appendix.
Theorem 1 Under the regularity conditions (C1)–(C3) stated in the appendix, in a misspecified model
where with a⊗2 = aa′. The variance Σβ,11 can be consistently estimated by , with dt and defined in (A.2). Moreover,
where is the standard Skorohod space on [0, τ], τ = sup{t : S1(t|x, z) S2(t|x, z)SC(t|x, z) > 0 for all x, z} (see regularity assumption (C1) in the appendix), and is a zero-mean Gaussian process with covariance function . The variance function of is given by , which can be consistently estimated by with the unknown quantities replaced by the estimates.
The estimators and , based on all the N first stage subjects, have the same asymptotic properties as stated in Theorem 1. Notice that no model is assumed for the first stage data in Theorem 1. If the MS model holds for the first stage data, then β1 and A1(t) coincide with the regression parameters in the true MS model.
2.3. Two-stage estimators of β and A(t)
To develop the two-stage estimator for β2, we first give the joint distribution of .
Lemma 1. Assume the regularity conditions (C1)–(C3) given in the appendix, then
| (2.3) |
where , , and g = 1, 2 indicates the stage. The covariance matrix can be estimated by , and . Here β2, Φ2, w2i(t), , and are defined similarly to β1, Φ1, w1i(t), , and .
It follows from (2.3) that . This suggests that β2 be estimated by
| (2.4) |
Next, we consider the joint distribution of and .
Lemma 2. Let A2(t) and v2i(t) be defined similarly to (2.1) and (2.2), respectively, based on the second stage sample. Under the regularity conditions (C1)–(C3), as n, N → ∞ with n/N → ρ for some constant 0 < ρ < 1,
in , where is a zero-mean Gaussian random field, with variance-covariance function
where ΣA,kl(t1, t2) = E [vki(t1)·vli(t2)′] for k, l ∈ {1, 2}, and ΣA,gg(t) = ΣA,gg(t, t). The variance and covariance functions can be consistently estimated by as defined in (A.4) in the appendix.
By Lemma 2 and the argument leading to (2.4), we take
| (2.5) |
where .
Our two-stage estimators possess some appealing properties. In particular, if the first stage data are barely correlated with the second stage data, then the proposed estimate is close to the second stage estimate . This is a desirable property since the first stage data are not expected to contribute much useful information for estimating β2. The same comment applies to . It can also be easily verified that when the first stage sample contains precise and complete information, the proposed estimates and are identical to the estimates and . This means that we should use all the first stage data to estimate the parameters and make statistical inference when no bias is present in the first stage sample.
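As a concrete illustration, the combination step can be sketched numerically. This is a minimal sketch in our own notation: the covariance matrices are hypothetical inputs standing in for the estimates of Lemma 1, and `two_stage_estimate` is not a function from any established package.

```python
import numpy as np

def two_stage_estimate(beta2_hat, beta1_full, beta1_val, Sigma21, Sigma11):
    """Combine the second stage estimate with the discrepancy between the
    first stage estimator computed on all N subjects (beta1_full) and on
    the n validation subjects (beta1_val)."""
    adjustment = Sigma21 @ np.linalg.solve(Sigma11, beta1_full - beta1_val)
    return beta2_hat + adjustment

# When Sigma21 = 0 (first and second stage estimates uncorrelated),
# the two-stage estimate reduces to the second stage estimate.
b2 = np.array([1.0, 2.0])
b1_full = np.array([0.5, 0.5])
b1_val = np.array([0.6, 0.4])
print(two_stage_estimate(b2, b1_full, b1_val, np.zeros((2, 2)), np.eye(2)))
```

The limiting behaviors discussed above follow directly from this form: a zero covariance Σ21 leaves the second stage estimate untouched, while a highly informative first stage makes the correction term do most of the work.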
This method is general enough to allow variables to have different types of coefficients in the two stages, as well as different sets of covariates for the first- and second-stage models. However, we recommend using the same type of coefficients for a variable in both stages whenever possible, based on the intuition that the effects of a variable are expected to be similar for both stages. For example, we use constant coefficients for age and gender for both the first- and second-stage models in our data example in Section 5.
3. Asymptotic Properties and Inferences
3.1. Asymptotic properties of
The following result states the weak convergence property of the joint distribution of the proposed estimators and .
Theorem 2 Under conditions (C1)–(C3), we have
| (3.1) |
where with , and is a zero-mean Gaussian process with covariance function
with ΣA,kl(s, t) defined in Lemma 2. The variance of is . The covariance between Z2 and is
where n/N → ρ for some constant 0 < ρ < 1 as n, N → ∞, and for k, l ∈ {1, 2}.
Furthermore, can be consistently estimated by , and can be consistently estimated by for any t ε [0, τ]. The covariance function can be estimated by
with .
Obviously, (i.e. is nonnegative definite). Hence our proposed two-stage estimators are asymptotically more efficient than the estimators using the second stage data alone. We will compare their finite sample performance in Section 4.
3.2. Estimation of the conditional survival function
We consider the problem of estimating the conditional cumulative hazard function H2(t) = H2(t|x0, z0) and the conditional survival function S2(t) = S2(t|x0, z0), for some given covariates x0, z0. Let and , where .
Theorem 3 Assume that n/N → ρ for some constant 0 < ρ < 1 as n, N → ∞. Under the regularity conditions (C1)–(C3),
where is a zero-mean Gaussian process with covariance function (t1, t2) equal to ; this can be consistently estimated by replacing each term by its estimate.
Thus at any t ∈ [0, τ], 100(1 − α)% pointwise confidence intervals for H2(t) and S2(t) are given by , and , where z1−α/2 is the (1 − α/2)th percentile of the standard normal distribution.
Notice that the proposed estimator is not necessarily monotonically non-increasing in t. As mentioned by Li and Tseng (2008), this problem is local and minor, especially when the sample size is large. In practice, one can improve the estimates by the Pool-Adjacent-Violators algorithm (Barlow et al. (1972)) or some simpler modification (cf. Lin and Ying (1994)).
Theorem 3 cannot be readily applied to construct simultaneous confidence bands for S2(t) over a given interval [τ1, τ2] since the distribution of the supremum statistic is intractable. Using an idea similar to that in Lin and Ying (1994), we develop a Monte Carlo method for constructing simultaneous confidence bands for H2(t) and S2(t). It can be shown that the process En(t) in Theorem 3 is asymptotically equivalent to a sum of i.i.d. random variables. Specifically,
To approximate the distribution of En(t), we define another process as
where Gi are i.i.d. N(0, 1). We prove in the appendix that En(t) and have the same limiting distribution.
Theorem 4 Conditioned on the data (x1i, z1i, T1i, δ1i), i = 1,…, N, and (x2j, z2j, T2j, δ2j), j ε V (n), the random process converges weakly to in .
Theorem 4 suggests that the limiting distribution of En(t) can be approximated by that of . The latter can be obtained by generating a large number of independent Monte Carlo random samples G1,…,GN from the standard normal distribution. Similar to Lin and Ying (1994), the confidence bands for H2(t) and S2(t) can be obtained as and , where qα is the critical value of , g is a weight function, and ϕ is a known transformation function with non-zero and continuous first derivative ϕ′. Specifically, we consider to get an equal-precision band, and set ϕ(t) = log(t) to obtain bands on meaningful ranges and to attain better coverage probabilities.
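The resampling step can be sketched as follows. Here `eta` is a hypothetical (N × grid) array of the i.i.d. summand processes evaluated on a time grid; each Monte Carlo replicate multiplies the summands by fresh N(0, 1) variables Gi and records the supremum, whose empirical quantile gives the critical value qα:

```python
import numpy as np

rng = np.random.default_rng(0)

def simultaneous_band_critical_value(eta, alpha=0.05, n_mc=2000):
    """eta: (N, T) array of the i.i.d. summand processes on a time grid
    (stand-ins for the terms in the i.i.d. representation). Returns the
    (1 - alpha) quantile of the sup of the multiplier process."""
    N = eta.shape[0]
    sups = np.empty(n_mc)
    for b in range(n_mc):
        G = rng.standard_normal(N)          # multipliers G_i ~ N(0, 1)
        process = G @ eta / np.sqrt(N)      # approximated process on the grid
        sups[b] = np.max(np.abs(process))
    return np.quantile(sups, 1 - alpha)
```

Conditioning on the data, only the multipliers Gi are redrawn, so the whole band costs one pass over the stored summand processes per replicate.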
4. Simulation Studies
We present a small simulation study to illustrate and evaluate the finite sample performance of the proposed two-stage estimators. The two-stage estimators are compared with the second stage estimators in terms of bias, variance, mean square error (MSE), and achieved confidence interval coverage probabilities.
The weight matrices Wg(t), g = 1, 2, in Section 2.1 require consistent estimates of the conditional hazard functions hgi(t|xgi, zgi); they are given by , where
K(t) is a kernel function, and b is the bandwidth. In the simulation study, we used the Epanechnikov kernel, K(x) = 3(1 − x2)/4 for |x| < 1, with bandwidth b = 0.4τ around each given time point t. The boundary effects are corrected by the modified asymmetric kernel proposed by Gasser and Müller (1979). In a more comprehensive simulation, Wu (2006) observed that both the second stage and two-stage estimators are not very sensitive to the choice of the smoothing parameter.
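As one concrete version of this smoothing step, an Epanechnikov-kernel smoother of jump increments can be written as below. The function names and inputs are our own illustration (in the paper, the increments of the estimated cumulative coefficient functions are smoothed to estimate the hazards that enter the weights):

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel K(x) = 3(1 - x^2)/4 on |x| < 1, zero elsewhere."""
    return np.where(np.abs(x) < 1, 0.75 * (1.0 - x ** 2), 0.0)

def smoothed_rate(t, event_times, increments, b):
    """Kernel-smooth jump sizes at the event times into a rate estimate
    at time t, with bandwidth b (no boundary correction in this sketch)."""
    w = epanechnikov((t - event_times) / b) / b
    return np.sum(w * increments)
```

Boundary correction near 0 and τ, as in Gasser and Müller (1979), would replace the symmetric kernel by an asymmetric one; we omit that here for brevity.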
The second stage survival time was generated from h2(t|x2i, z2i,1, z2i,2) = α2,0(t) + α2,1(t)x2i + β2,1 z2i,1 + β2,2 z2i,2, where α2,0(t) = 1, α2,1(t) = t, β2,1 = β2,2 = 1 (q2 = 2), and x2i, z2i,1, z2i,2 ~ i.i.d. Unif[0, 1]. The random censoring times C2i, i = 1,…,n, were generated from h(c|x2i, z2i) = 0.1 + 0.1cx2i + 0.5z2i,1 + 0.5z2i,2. About 20 percent of the subjects were censored under this model. In the first stage, we considered a general situation by incorporating both the measurement error problem and the missing covariate problem. The working model for the first stage data was h1(t|x1i, z1i,1) = α1,0(t) + α1,1(t)x1i + β1,1z1i,1, where x1i = x2i + Unif[0, 0.1], z1i,1 = z2i,1 + Unif[0, 0.5], and the covariate z2i,2 is missing for all subjects. We generated 1,000 Monte Carlo samples for each size (n, N) = (100, 1,000), (500, 1,000), and (500, 2,000). With different sample sizes at both the first and second stages, we can evaluate: (1) the performance of the variance estimator under a misspecified model at finite sample sizes; and (2) the improvement of the second stage estimator when incorporating different amounts of information from the first stage.
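Survival times under this additive hazard can be generated by inversion: with h2(t|x, z) = 1 + t·x + z1 + z2, the cumulative hazard is H(t) = (1 + z1 + z2)t + x t²/2, and solving H(T) = −log U for U ~ Unif(0, 1) is a quadratic in T. A sketch of this step (our own helper, mirroring the simulation model above):

```python
import numpy as np

def draw_survival_time(x, z1, z2, u):
    """Invert H(t) = (1 + z1 + z2) t + x t^2 / 2, the cumulative hazard
    implied by h(t|x,z) = 1 + t*x + z1 + z2, at the target -log(u)."""
    c = 1.0 + z1 + z2            # constant part of the hazard
    a = x                        # slope from alpha_{2,1}(t) = t
    target = -np.log(u)
    if a == 0:
        return target / c        # hazard is constant in t
    # positive root of a*t^2/2 + c*t - target = 0
    return (-c + np.sqrt(c * c + 2.0 * a * target)) / a
```

Censoring times can be drawn the same way from their own additive hazard, after which T2i = min of the two and δ2i indicates which came first.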
Table 1 presents the bias, variance, estimated variance, and MSE of , , , and together with the achieved coverage probabilities of the respective confidence intervals. When the second stage sample was small (n = 100), the variance of the second stage estimate (0.69) was underestimated (0.58). As a comparison, we can see that the first stage estimates had large bias, which indicates that the information from the first stage was biased. Even so, there was gain from combining the first stage data of z1i,1. The estimated variance of (0.23) was close to the variance (0.24), which was much smaller than the variance of the second stage estimate. We also observed a better coverage probability for the proposed estimator than for the second stage estimator. On the other hand, the estimator of the coefficient of z2i,2 showed little improvement over its second stage counterpart. This can be explained by the fact that the first stage data contained no information on z2i,2. As n increased to 500, the second stage estimates were improved with better variance estimates and higher coverage probabilities – the variance estimator for a misspecified model worked well when the sample size got bigger. Our proposed estimates had smaller variances, better variance estimates, and better coverage probabilities. As N increased to 2,000 and n remained at 500, the variance of was further reduced since more information from stage one was obtained. For comparison, Table 1 also shows the “ideal” estimates, denoted as and , using the complete and accurate information for all N subjects. Figure 1 depicts the mean and variance of two-stage and second stage estimates of A2,0(t), A2,1(t), and S2(t|z0 = (0.5, 0.5), x0 = 0.5), respectively, for sample size (n, N) = (100, 1,000). The two-stage estimates (thick solid line) show less bias at the right tail than the second stage estimates (dashed line). Moreover, the two-stage estimates have much smaller variances throughout. Table 2 presents the simulated coverage probabilities of the pointwise 95% confidence intervals of A(t) and S(t|z0, x0) for (n, N) = (100, 1,000).
The coverage probabilities are quite satisfactory for most time points. As sample size increases, the performance is improved in the far right tail. The results for (n, N) = (500, 1, 000) and (500, 2, 000) are similar and thus not reported here.
Table 1.
Simulated bias, variance, mean square error (MSE), and 95% coverage probability (CP) of , , and . The true parameter value was .
| (n,N) | Estimate | Bias | Var | Est. Var | MSE | 95% CP |
|---|---|---|---|---|---|---|
| (100, 1,000) | (Stage 1) | −0.290 | 0.380 | 0.310 | NA | NA |
| | (Stage 1) | −0.320 | 0.037 | 0.033 | NA | NA |
| | (Stage 2) | 0.040 | 0.690 | 0.580 | 0.690 | 0.910 |
| | (Proposed) | < 0.001 | 0.240 | 0.230 | 0.240 | 0.940 |
| | (Ideal) | < 0.001 | 0.060 | 0.061 | 0.061 | 0.950 |
| | (Stage 2) | −0.060 | 0.650 | 0.660 | 0.660 | 0.920 |
| | (Proposed) | −0.040 | 0.670 | 0.570 | 0.680 | 0.910 |
| | (Ideal) | −0.020 | 0.058 | 0.061 | 0.058 | 0.950 |
| (500, 1,000) | (Stage 1) | −0.320 | 0.075 | 0.075 | NA | NA |
| | (Stage 1) | −0.310 | 0.038 | 0.039 | NA | NA |
| | (Stage 2) | −0.010 | 0.130 | 0.130 | 0.130 | 0.940 |
| | (Proposed) | −0.010 | 0.078 | 0.077 | 0.078 | 0.950 |
| | (Ideal) | < 0.001 | 0.060 | 0.061 | 0.060 | 0.950 |
| | (Stage 2) | 0.010 | 0.130 | 0.130 | 0.130 | 0.940 |
| | (Proposed) | 0.010 | 0.130 | 0.130 | 0.130 | 0.940 |
| | (Ideal) | −0.010 | 0.058 | 0.060 | 0.058 | 0.950 |
| (500, 2,000) | (Stage 1) | −0.310 | 0.074 | 0.075 | NA | NA |
| | (Stage 1) | −0.320 | 0.019 | 0.020 | NA | NA |
| | (Stage 2) | < 0.001 | 0.120 | 0.120 | 0.120 | 0.950 |
| | (Proposed) | < 0.001 | 0.054 | 0.054 | 0.054 | 0.950 |
| | (Ideal) | < 0.001 | 0.030 | 0.031 | 0.031 | 0.950 |
| | (Stage 2) | < 0.001 | 0.130 | 0.120 | 0.130 | 0.940 |
| | (Proposed) | < 0.001 | 0.130 | 0.120 | 0.130 | 0.940 |
| | (Ideal) | < 0.001 | 0.030 | 0.030 | 0.030 | 0.950 |
Figure 1.

Comparison of the two-stage estimates (thick solid line) and second stage estimates (dashed line) with the true coefficients (solid line) for sample size (n, N) = (100, 1,000). The top panel gives the estimates and variances of and ; the middle panel shows the estimates and variances of and ; the bottom panel gives the estimates and variances of and . The first column plots the point estimates against the true values (solid line), and the second column plots the variance estimates.
Table 2.
Simulated coverage probabilities of the nonparametric two-stage estimators at nominal level 95%.
| Time | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 | 1.2 |
|---|---|---|---|---|---|---|
| A2,0(t) | 0.91 | 0.92 | 0.91 | 0.93 | 0.93 | 0.90 |
| A2,1(t) | 0.95 | 0.96 | 0.96 | 0.95 | 0.92 | 0.90 |
| S2(t \| z0, x0) | 0.95 | 0.95 | 0.95 | 0.94 | 0.93 | 0.92 |
5. An Example
5.1. Data description
We illustrate our method using a data set on small intestine cancer from the SEER program supported by NCI. Surgery and radiation therapy are the most commonly used treatments for small intestine cancer. In this study we wanted to know how these treatments affect both survival time and the development of subsequent tumors. Therefore, we defined the survival time as the time from the diagnosis of the first primary small intestine cancer to the diagnosis of the second primary cancer or death. We considered eleven covariates: surgery status (1 if yes, 0 if no), radiation therapy (1 if yes, 0 if no), age at the first primary cancer diagnosis (1 if age < 60, 0 if age ≥ 60), gender (1 if male, 0 if female), dummy variables for race (black and other races, with white as the reference group), dummy variables for stage (regional and distant stages, with local stage as the reference group), and dummy variables for tumor grade (grades II, III, and IV, with grade I as the reference group).
To illustrate our method, we constructed a two-stage design data set as follows. The second stage sample consists of 300 patients (censoring rate 33.7%) randomly chosen from the 2,669 patients (censoring rate 26.5%) in the data set with all eleven covariates and survival information. The first stage data include all 2,669 patients; however, the variable tumor grade is missing. The 2,669 patients with all variables were used as the reference population.
5.2. Analysis results
With all eleven covariates as time-dependent variables, we plotted the cumulative hazard function for each variable based on the Aalen (1978) nonparametric additive risks model (Wu (2006)). The linear trends for age and gender (Figure 2) suggest that these two variables have time-independent effects and might be used as Z (q2 = 2) in the MS model. The nonlinear trends of the other nine covariates (radiation, race, tumor grade, stage, and surgery) suggest that they be used as X (p2 = 9). For more information on how to assign x and z, refer to Martinussen and Scheike (2006).
Figure 2.
Plots of cumulative hazard function for age and gender show linear trends.
With age and gender as covariate Z (q2 = 2) with time-independent effects, and the other nine covariates (radiation therapy, black, other races, grade II, grade III, grade IV, regional stage, distant stage, and surgery status) as X (p2 = 9) with time-dependent effects, Table 3 compares the proposed and second stage estimates for age and gender. Both methods show significantly higher risks associated with being male and older (≥ 60 years). We note that both confidence intervals cover the reference parameter values obtained from the complete data (“ideal” estimates), but the interval based on the two-stage estimate is much narrower than that based on the second stage estimate.
Table 3.
Comparison of the second stage and the proposed two-stage estimators for age and gender.
| Covariate | Point Est. (Stage 2) | 95% Conf. Interval (Stage 2) | Point Est. (Proposed) | 95% Conf. Interval (Proposed) |
|---|---|---|---|---|
| age | 0.053 | (0.017, 0.089) | 0.062 | (0.046, 0.078) |
| gender | 0.036 | (0.001, 0.072) | 0.031 | (0.016, 0.047) |
Figure 3 displays the estimates and confidence bands of the cumulative regression functions for surgery and radiation therapy, which are of most interest in this study. It is seen from Figure 3(b) that, after adjusting for other factors, surgery significantly reduces the risk of second primary cancer or death during the first 2.5 years; after this period, the efficacy of surgery decreases. However, the effects of surgery are inconclusive based on the second stage estimate (Figure 3(a)), since its confidence band is very wide and contains zero. Radiation therapy, as opposed to surgery, seems to have no significant impact on subsequent cancer development. We note that the proposed two-stage estimators are usually closer to the reference estimate from the complete data, with much narrower confidence bands than the second stage estimator.
Figure 3.

95% confidence bands (dashed line) of the second stage and two-stage estimators (solid line) of Aj(t) for surgery and radiation therapy. The thick solid line is obtained from the complete data.
Figure 4 depicts a 95% simultaneous confidence band for the conditional survival function for a white male patient who is diagnosed as cancer grade IV, at distant stage, younger than 60, and treated by both surgery and radiation therapy. The proposed two-stage estimate is more accurate with a much narrower confidence band. For example, the survival probabilities at year 1 and year 3 are estimated to be 0.613 (with variance 0.0016) and 0.332 (0.0019), respectively, using the two-stage estimator, and 0.625 (0.0065) and 0.340 (0.0088), respectively, using the second stage data only.
Figure 4.
95% simultaneous confidence bands (dashed line) for the conditional survival function (solid line) for a white male patient who is diagnosed as grade IV, at distant stage, younger than 60, and has both surgery and radiation therapy. The thick solid line is obtained from the complete data.
6. Discussion
We propose two-stage estimators for the partial linear semiparametric hazard model introduced by McKeague and Sasieni (1994) in a two-stage design setting. We allow measurement errors for survival time, censoring time, and covariates in the first stage data. We also allow missing covariates in the first stage. The proposed estimators are consistent and asymptotically normal. Confidence bands are developed to assess time-varying covariate effects, and to predict conditional survival probabilities. By utilizing information from the first stage, our estimators are more efficient than the second stage estimators for both large and small samples. Reduction in bias is also observed for small samples.
The estimators introduced in this paper take a form that is similar to that of a trick used for variance reduction in the theory of Monte Carlo methods (sometimes called a “control variate”):

m* = m + ρmt(σm/σt)(τ − t),

where m is an unbiased estimate of a parameter of interest, say μ, t is a control variate with known expectation E(t) = τ, σm and σt are the standard deviations of m and t, respectively, and ρmt = corr(m, t). In the control variate theory, σm, σt, and ρmt can be estimated across the Monte Carlo replicates if they are unknown. The estimator m is improved by incorporating the control variate t. Since E(t) = τ, the improved estimator m* is unbiased and has the same expectation as m. In our method, since β1 and A1(t) (which play the role of τ) are unknown, we use the first stage estimates and to replace their expected values. On the other hand, and are more efficient than and due to larger sample sizes, and therefore can serve as good estimates for β1 and A1(t). Another difference between our method and the control variate method is that we use instead of , since the subjects outside the validation set are independent of the subjects in the validation set.
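The control-variate analogy can be seen in a toy Monte Carlo experiment (entirely illustrative data; m plays the role of the estimate to be improved and t the control variate with known mean τ = 0):

```python
import numpy as np

rng = np.random.default_rng(2)

def control_variate_mean(m, t, tau):
    """Improve the mean of m using control variate t with known mean tau.
    The coefficient cov(m, t)/var(t) equals rho_mt * sigma_m / sigma_t."""
    c = np.cov(m, t)[0, 1] / np.var(t, ddof=1)
    return np.mean(m) - c * (np.mean(t) - tau)

z = rng.standard_normal(10_000)
m = 5.0 + z + 0.3 * rng.standard_normal(10_000)  # noisy estimates of mu = 5
t = z                                            # control variate, E[t] = 0
est = control_variate_mean(m, t, 0.0)
```

Because the correction term has mean zero, the adjusted estimator keeps the same expectation while its variance shrinks by the factor 1 − ρ²mt, which mirrors how the proposed two-stage estimator improves on the second stage estimator.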
It is worth noting that our method also works with a working model other than the MS model. For any working model, one can define an estimator of the same form as our proposed estimator and obtain its asymptotic properties following similar steps, provided that the parameter estimator has a similar i.i.d. representation. An interesting question is then how to select the best possible working model, if one exists, for the first stage data in order to maximize the benefit of combining information from the two stages. We do not have a definite answer to this question so far. The issue can be very complicated since the answer may depend on many factors, such as the form of the working model, the parameters to be estimated, the surrogate variables involved, and the criterion for optimality. Future research is warranted. However, as implied by the expression for the asymptotic variance of the proposed estimator, for a given second stage model the first stage working model should be chosen so that the first and second stage estimators are highly correlated. In practice, it is convenient to use the same type of model in the two stages, a strategy that works well when the first stage data are similar to (or highly correlated with) the second stage data. Similarly, we suggest assigning the same type of coefficients to a variable in the two stages. For instance, age and gender have constant coefficients in both stages in our data example.
We only consider a simple design where the second stage is a simple random sample from the first stage sample. In many studies, the second-stage subjects are chosen with different selection probabilities depending on the outcomes of the first stage, the covariates, or both. The unequal-selection-probability sampling scheme is of special importance in medical and epidemiological studies, especially for rare diseases. In a sequel, we will extend our methods to incorporate biased-sampling problems.
Acknowledgement
Gang Li's research was supported in part by the U.S. National Institute of Health grants CA016042 and P01AT003960. The authors thank an associate editor and the referees for their helpful comments. Tong Tong Wu's research is partly supported by NSF CCF-0926194.
Appendix
A.1. Regularity assumptions
Let g ∈ {1, 2} be the stage, and let
The following assumptions are made for the theoretical development.
-
(C1)
Finite intervals. Let τ = sup {t : S1(t|x, z)S2(t|x, z)SC(t|x, z) > 0 for all x, z} be a finite constant, where Sg(t|x, z) is the survival function of stage g and SC(t|x, z) is the survival function of the censoring time. The covariates xgi and zgi are restricted to a bounded set.
-
(C2)
Limiting bounds. The hazard functions hg(t|x, z), g = 1, 2, are bounded uniformly below and above in t, x, and z by some constants b and B, respectively.
-
(C3)
Asymptotic limits. For the subjects in V(n), , , and converge, in the L∞ norm (defined as ||M||∞ = max_{i,j} |m_{ij}| for a matrix M = (m_{ij}) of size m × n), uniformly in time t ∈ [0, τ], in probability, to some deterministic functions Ug(t), Vg(t), and Rg(t), respectively. The functions , , and have limiting functions U1(t), V1(t), and R1(t), respectively. These functions are uniformly continuous on [0, τ] and bounded in absolute value by the constant matrices KU, KV, and KR, respectively. All the matrices are of full rank.
A.2. Proof of Theorem 1
To simplify the notation, we omit the superscript V for the validation set. Let , , , and . Let , where . We have
. It can be shown that and
and I3 = op(1) and I4 = op(1) in L∞ norm in probability; see Wu (2006) for more details. Thus
| (A.1) |
where
It can be shown that E[w1i] = 0 and ||w1i||∞ ≤ K for some constant K. By (A.1) and the Multivariate Central Limit Theorem, converges in distribution to a zero-mean multivariate normal distribution with variance E[w1i w1i′].
Define
where
| (A.2) |
The consistency of the variance estimator can be established by the consistency of , , , , and .
Similarly, it can be shown that
| (A.3) |
It follows from (A.3) and the Multivariate Central Limit Theorem that the finite dimensional distributions of converge to those of a multivariate normal distribution with covariance κ(t1, t2) = E{v1i(t1) v1i(t2)′}. Tightness can be checked using Theorem 13.5 of Billingsley (1999). Thus converges to a zero-mean Gaussian process.
The variance ΣA,11(t) = κ1(t, t) can be estimated by
where is obtained by replacing U1(t), V1(t), h1i(t), β1, and w1i in (2.2) by their respective sample estimates , , , , and . The consistency of the variance estimator follows immediately from the consistency of , , , , and .
A.3. Proof of Lemma 1
Similar to (A.1), it can be shown that
This, together with the Multivariate Central Limit Theorem, proves (2.3). The uniform convergence of the variance and covariance estimators can be easily verified using Theorem 1 of Rao (1963).
A.4. Proof of Lemma 2
Similar to the proof of Lemma 1, we can show that the joint distributions of converge to those of a zero-mean multivariate normal distribution. It can be verified that tightness holds. Therefore converges to a zero-mean Gaussian random field with the variance-covariance function
The covariance function can be consistently estimated by
| (A.4) |
The uniform consistency of , can be verified using Theorem 1 of Rao (1963).
A.5. Proof of Theorem 2
As in Lemma 1, we can show that
, where ρ = lim n/N as n, N → ∞. The proposed estimator can then be written as
We have
with variance-covariance function among , , being
We can write
The joint distribution of the proposed estimators and is
where , the variance of can be derived easily by Slutsky's Theorem and the delta-method:
and is a zero-mean Gaussian process with covariance function given by
The covariance of the two-stage estimators and converges to
The variance and covariance matrices can be estimated by replacing each term by its respective sample estimate. Consistency can be established by Slutsky's Theorem.
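The combination underlying Theorem 2 can be illustrated numerically. The following Monte Carlo sketch is our own toy construction, not the paper's estimator: it uses simple means of an accurate variable and a surrogate in place of the model-based estimators, combines the second stage estimator with the mean-zero difference between the full-sample and subsample first stage estimators, and checks that the combination reduces variance.

```python
import numpy as np

# Toy two-stage setting (illustrative assumptions throughout):
#   x — accurate measurement, conceptually available only at the second stage;
#   w — surrogate for x, available on the whole first stage sample;
#   the second stage is a simple random subsample of size n from N subjects.
rng = np.random.default_rng(0)
N, n, reps = 2000, 400, 4000   # first stage size, second stage size, replications
theta = 1.0                    # true parameter (a mean, for simplicity)

est2, est1_sub, est1_full = [], [], []
for _ in range(reps):
    x = rng.normal(theta, 1.0, size=N)          # accurate measurements
    w = x + rng.normal(0.0, 0.5, size=N)        # surrogate measurements
    sub = rng.choice(N, size=n, replace=False)  # simple random validation subsample
    est2.append(x[sub].mean())                  # second stage estimator
    est1_sub.append(w[sub].mean())              # working-model estimator, subsample
    est1_full.append(w.mean())                  # working-model estimator, full sample
est2, est1_sub, est1_full = map(np.asarray, (est2, est1_sub, est1_full))

d = est1_sub - est1_full                        # mean-zero correction term
c = np.cov(est2, d, bias=True)[0, 1] / np.var(d)  # empirically optimal coefficient
combined = est2 - c * d                         # two-stage (combined) estimator
```

Because `c` is the variance-minimizing coefficient, the Monte Carlo variance of `combined` is strictly below that of `est2` whenever the two are correlated, mirroring the efficiency gain of the proposed two-stage estimator over the second stage estimator alone.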
A.6. Proof of Theorem 4
To show that En(t) and have the same limiting distribution, we define an intermediate process as
It follows that, conditionally on the data, converges weakly in probability to , which is the limiting Gaussian distribution of En(t), according to Theorem 2.9.6 in van der Vaart and Wellner (1996). To complete the proof, we need to show that . We write
Then . We first check that
which converges to zero in probability uniformly in t. It can be similarly shown that the other terms converge to zero in probability uniformly in t.
References
- Aalen OO. Nonparametric inference for a family of counting processes. Ann. Statist. 1978;6:701–726. [Google Scholar]
- Barlow RE, Bartholomew DJ, Bremner JM, Brunk H. Statistical Inference under Order Restrictions. Wiley; New York: 1972. [Google Scholar]
- Billingsley P. Convergence of Probability Measures. Wiley; New York: 1999. [Google Scholar]
- Breslow NE, Holubkov R. Weighted likelihood, pseudo-likelihood and maximum likelihood methods for logistic regression analysis of two-stage data. Statist. Medicine. 1997;16:103–116. doi: 10.1002/(sici)1097-0258(19970115)16:1<103::aid-sim474>3.0.co;2-p. [DOI] [PubMed] [Google Scholar]
- Chen Y-H. Cox regression in cohort studies with validation sampling. J. Roy. Statist. Soc. Ser. B. 2002;64:51–62. [Google Scholar]
- Chen Y-H, Chen H. A unified approach to regression analysis under double sampling design. J. Roy. Statist. Soc. Ser. B. 2000;64:449–460. [Google Scholar]
- Cox DR. Regression models and life tables (with discussion) J. Roy. Statist. Soc. Ser. B. 1972;34:187–220. [Google Scholar]
- Gasser T, Muller H. Kernel Estimation of Regression Functions, Smoothing Techniques for Curve Estimation. Lecture Notes in Mathematics 757. Springer-Verlag; Berlin: 1979. [Google Scholar]
- Jiang J, Zhou H. Additive hazard regression with auxiliary covariates. Biometrika. 2007;94:359–369. [Google Scholar]
- Kulich M, Lin DY. Additive hazards regression with covariate measurement error. J. Amer. Statist. Assoc. 2000;95:238–248. [Google Scholar]
- Li G, Tseng C. Non-parametric estimation of a survival function with two-stage design studies. Scand. J. Statist. 2008;35:193–211. doi: 10.1111/j.1467-9469.2007.00581.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin DY, Ying ZL. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71. [Google Scholar]
- Martinussen T, Scheike TH. Dynamic Regression Models for Survival Data. Springer; New York: 2006. [Google Scholar]
- McKeague I, Sasieni P. A partly parametric additive risk model. Biometrika. 1994;81:501–514. [Google Scholar]
- Rao R. The law of large numbers for D[0,1]-valued random variables. Theory Probab. Appl. 1963;8:70–74. [Google Scholar]
- Scheike TH. The additive nonparametric and semiparametric Aalen model as the rate function for a counting process. Lifetime Data Analysis. 2002;8:247–262. doi: 10.1023/a:1015849821021. [DOI] [PubMed] [Google Scholar]
- Schill W, Jockel K-H, Drescher K, Timm J. Logistic analysis in case-control studies under validation sampling. Biometrika. 1993;80:339–352. [Google Scholar]
- Tseng CH. Ph. D. thesis. University of California; Los Angeles: 2004. Survival analysis with two stage design studies. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer Verlag; New York: 1996. [Google Scholar]
- Wang C-Y, Hsu L, Feng ZD, Prentice RL. Regression calibration in failure time regression. Biometrics. 1997;53:131–145. [PubMed] [Google Scholar]
- White JE. A two stage design for the study of the relationship between a rare exposure and a rare disease. Amer. J. Epidemiology. 1982;115:119–128. doi: 10.1093/oxfordjournals.aje.a113266. [DOI] [PubMed] [Google Scholar]
- Wu TT. Ph. D. thesis. University of California; Los Angeles: 2006. A Partial Linear Semiparametric Additive Risks Model for Two-Stage Design Survival Studies. [Google Scholar]
- Zhou H, Pepe M. Auxiliary covariate data in failure time regression. Biometrika. 1995;82:139–149. [Google Scholar]


