Efficient Estimation of the Cox Model With Auxiliary Subgroup Survival Information

Chiung-Yu Huang; Jing Qin; Huei-Ting Tsai

doi:10.1080/01621459.2015.1044090

Author manuscript; available in PMC: 2017 Aug 18.

Published in final edited form as: J Am Stat Assoc. 2016 Aug 18;111(514):787–799. doi: 10.1080/01621459.2015.1044090

PMCID: PMC5157123 NIHMSID: NIHMS801688 PMID: 27990035

Abstract

With the rapidly increasing availability of data in the public domain, combining information from different sources to infer about associations or differences of interest has become an emerging challenge to researchers. This paper presents a novel approach to improve efficiency in estimating the survival time distribution by synthesizing information from the individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both choice of treatment and clinical outcomes usually are not available in the registry database. To combine with the published information, we propose to summarize the external survival information via a system of nonlinear population moments and estimate the survival time model using empirical likelihood methods. The proposed approach is more flexible than the conventional meta-analysis in the sense that it can automatically combine survival information for different subgroups and the information may be derived from different studies. Moreover, an extended estimator that allows for a different baseline risk in the aggregate data is also studied. Empirical likelihood ratio tests are proposed to examine whether the auxiliary survival information is consistent with the individual-level data. Simulation studies show that the proposed estimators yield a substantial gain in efficiency over the conventional partial likelihood approach. Two sets of data analysis are conducted to illustrate the methods and theory.

Keywords: Information synthesis, Meta-analysis, SEER cancer registries, Subgroup analysis

1. Introduction

Combining information from different sources to infer about associations or differences of interest is an important area of research known as meta-analysis. Results of meta-analyses have been used to guide the design of future studies, aid the development of regulatory recommendations, and even modify clinical practice. A PubMed search of the word “meta-analysis” in article titles found 7231 articles in just 2013. With the rapidly increasing availability of data in the public domain, taking full advantage of available information while saving considerable resources has become an emerging challenge to researchers.

An ideal meta-analysis would be an analysis of pooled individual-level data, where the raw data from each study are obtained and analyzed directly. In clinical studies, the use of pooled individual-level data enables researchers to conduct subgroup analysis to investigate whether patient characteristics are related to treatment effects. In most applications, however, synthesis of the information is conducted by analyzing summary statistics, such as means, standard deviations, proportions, odds ratios, and relative risks, from each individual study. Specifically, meta-analysis calculates a weighted average of the summary statistics across studies to provide an overall measure of the association or difference of interest. The major drawback of this approach is that covariate-treatment interactions are usually not provided in the reports of primary analysis findings, thus making subgroup analysis difficult to perform (Simmonds and Higgins, 2007).

On the other hand, methods for combining information from both individual-level data and published aggregate data have drawn much attention (Kovalchick, 2013; Liu et al., 2014). This research is particularly inspired by the growing interest in exploiting the population-based cancer survival statistics made available by the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). The SEER program consists of 18 cancer registries covering approximately 28% of the U.S. population. The registries began collecting demographics and cancer factors on all types of incident cancer patients in 1973, and the database links to state death certificates for patient survival information, including cause of death. The SEER program updates its data annually, and the data have been used by thousands of researchers, clinicians, public health officials, policy makers, community groups, and members of the public for cancer incidence and survival statistics in the United States. The SEER Cancer Statistics Review 1973-2010 (http://seer.cancer.gov/csr/1975_2010/) reports the 5-year survival after cancer diagnosis by race, sex, age, and year of diagnosis for the major cancer sites and for all cancers combined using data from the population-based cancer registry. For example, the 5-year survival among ovarian cancer patients diagnosed before age 65 is 56%, and is 27.7% among patients diagnosed after age 65. This paper presents a novel approach for combining data from clinical studies, that is, individual-level data, and the published subgroup t-year survival probabilities, that is, aggregate data. Individual-level data provide estimates of the treatment-covariate interactions or effects of biomarkers that are not reported in existing publications, but the sample size may be too small to provide accurate estimates. Properly combining the individual-level data with survival information available from external sources is expected to yield more efficient estimates of the effects of interest as well as more accurate prediction of the risk of the failure event.

Our approach to synthesize information from different sources is motivated by the empirical likelihood method that was first introduced by Thomas and Grunkemeier (1975) to obtain confidence intervals for the Kaplan-Meier estimator. Later, Owen (1988, 1990) studied empirical likelihood-based confidence regions for the mean or other functions. It has been shown that the empirical likelihood ratio has a limiting chi-squared distribution under mild regularity conditions, and that the empirical likelihood is Bartlett correctable in many applications (DiCiccio et al., 1991), leading to an advantage over the bootstrap method in the construction of confidence regions. Many researchers have applied the empirical likelihood method to more general settings. In particular, Qin and Lawless (1994) made connections to estimating equations and demonstrated that the empirical likelihood method can yield the most efficient estimator by making optimal use of the estimating equations. Although the use of empirical likelihood methods has been very popular in fields such as survey sampling, the main interest is usually to improve the estimation of the mean or other functions of the distribution function. For example, Chen and Qin (1993), Chen and Wu (2002), Chen et al. (2002), and Wu and Sitter (2001) applied the empirical likelihood method to incorporate auxiliary covariate information to improve efficiency of estimation. Imbens (2002) provided a very nice discussion on how the empirical likelihood methods can be used as an alternative for the generalized method of moments.

Application of the empirical likelihood approach to survival data has received much attention because variances of statistical estimates can be very difficult to estimate in the presence of right censoring. The empirical likelihood method can efficiently establish joint confidence regions without directly estimating the corresponding asymptotic variances, and can significantly improve coverage accuracy. In a review paper, Li et al. (2005) summarized the results of empirical likelihood analysis for censored survival time data. Ren and Zhou (2011) investigated the properties of the maximum empirical likelihood estimator and compared it with the maximum partial likelihood estimator. Zhou (2006) applied the empirical likelihood method to improve estimation of the Cox model with partial information on the baseline hazard function.

The proposed application of the empirical likelihood method deals with a nonconventional scenario. Under the proportional hazards model, the auxiliary t-year survival probabilities amount to a system of nonlinear estimating equations that involve the regression parameters, the infinite-dimensional baseline hazard function, and the infinite-dimensional marginal distribution function of the covariate X, making it difficult to derive the constrained maximum likelihood estimator. To tackle this difficulty, two empirical likelihoods, one based on the conditional likelihood of the survival time T given X and another based on the marginal likelihood of X, are constructed to combine information from different sources. In the same spirit as Breslow's estimator (Breslow, 1972), the baseline hazard function and the marginal distribution function of X are profiled out of the conditional likelihood and the marginal likelihood, respectively. To the best of our knowledge, this is the first paper that considers a double empirical likelihood approach.

This paper is organized as follows. In Section 2 we introduce notation and summarize the landmark survival information as unbiased estimating equations. The main results are presented in Section 3, where a double empirical likelihood approach is proposed to synthesize the auxiliary survival information. Because the auxiliary survival information may not be consistent with the individual-level data due to inclusion/exclusion criteria of the clinical study, in Section 4 we extend the proposed double empirical likelihood method to allow the population from which the aggregate survival information is derived to have a different baseline risk. We present results of simulation studies and illustrate the proposed methods with two sets of data analysis in Section 5. Some concluding remarks are given in Section 6.

2. Model Setup

Let T denote the time from disease onset to a failure event or an event of interest. Assume that T is absolutely continuous, that is, T has a probability density. Let X denote a p × 1 vector of baseline covariates. Denote by f(t | x) and S(t | x) the conditional density function and the conditional survival function of T given X = x. We assume that the survival time T follows the proportional hazards model (Cox, 1972)

λ (t | x) = λ (t) exp (β^{'} x),

where β is a vector of p × 1 regression parameters and λ(t) is an unspecified baseline hazard function. Let $Λ (t) = \int_{0}^{t} λ (u) d u$ be the corresponding baseline cumulative hazard function. The observation of the survival time is usually subject to right censoring due to study end or premature dropout. Thus, instead of observing the actual value of the survival time T, we observe the possibly censored survival time Y = min(T, C), where C is the time of censoring. In many applications, it is reasonable to assume that C is independent of T given the observed covariates X.

Our goal is to derive an efficient estimator of the Cox model by incorporating published t-year survival probabilities. To express the auxiliary information on survival at the time point t*, we use Ω_k, k = 1, …, K to denote the kth subgroup whose t*-year survival is provided. In the aforementioned ovarian cancer example, we set Z to be the age of diagnosis, where Z is a subset of the covariate of interest X, which may include biomarkers and other risk factors of ovarian cancer. Write X = (Z, W). Thus the two subgroups of ovarian cancer patients are Ω₁ = {(Z, W) : Z < 65} and Ω₂ = {(Z, W) : Z ≥ 65}, and the auxiliary 5-year survival probabilities obtained from the SEER Cancer Statistics Review 1973-2010 are given by pr(T > 5 | X ∈ Ω₁) = 0.56 and pr(T > 5 | X ∈ Ω₂) = 0.277.

A general expression of the auxiliary survival information for subgroup k at the time point t* is

pr (T > t * | X \in Ω_{k}) = ϕ_{k}, k = 1, \dots, K,

or, equivalently, pr(T > t*, X ∈ Ω_k) – ϕ_k × pr(X ∈ Ω_k) = 0. By double expectation and under the assumed Cox model, we can derive

E [I (X \in Ω_{k}) {S (t * | X) - ϕ_{k}}] = E {I (X \in Ω_{k}) [exp {- Λ (t *) exp (β^{'} X)} - ϕ_{k}]} = 0 .

Define the estimating function

ψ_{k} (X, β, Λ) = I (X \in Ω_{k}) [exp {- Λ (t *) exp (β^{'} X)} - ϕ_{k}] .

Then the subgroup survival information at t* is summarized by

E {ψ_{k} (X, β, Λ)} = 0, k = 1, \dots, K,

(1)

where the random function ψ_k(X, β, Λ), k = 1, …, K, is bounded by 2. Note that the estimating equations only involve the regression parameter β and the baseline cumulative hazard function evaluated at t*. Hence, by setting α = Λ(t*), equation (1) can be reexpressed as E{ψ_k(X, β, α)} = 0, k = 1, …, K.
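As an illustration only, the estimating function ψ_k can be sketched in a few lines of Python. The representation of each subgroup Ω_k as a membership function, the function name `psi`, and the numerical values chosen for β and α are assumptions made for this example; only the two survival probabilities 0.56 and 0.277 come from the ovarian cancer example above.

```python
import math

def psi(x, beta, alpha, subgroups, phi):
    """Return [psi_1(x), ..., psi_K(x)] for one covariate vector x.

    subgroups : list of K membership functions (hypothetical encoding of Omega_k)
    phi       : list of K published t*-year survival probabilities
    alpha     : Lambda(t*), the baseline cumulative hazard at t*
    """
    lin = sum(b * xj for b, xj in zip(beta, x))
    surv = math.exp(-alpha * math.exp(lin))  # S(t* | x) under the Cox model
    return [(surv - ph) if in_k(x) else 0.0
            for in_k, ph in zip(subgroups, phi)]

# Ovarian cancer example: the first covariate is Z, the age at diagnosis.
subgroups = [lambda x: x[0] < 65, lambda x: x[0] >= 65]
phi = [0.56, 0.277]
beta = [0.02, 0.5]   # illustrative values only, not estimates
alpha = 0.3          # illustrative value only, not an estimate
vals = psi([50.0, 1.0], beta, alpha, subgroups, phi)
```

A subject contributes a nonzero value only to the subgroup it belongs to, so here `vals[1]` is exactly zero.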

3. Method

In this section, we introduce a double empirical likelihood method to synthesize information from different sources. Under the Cox model, the density of the bivariate random variable (T, X), relative to the product of the Lebesgue measure and the marginal distribution of X, is given by exp(β′x)λ(t) exp{−Λ(t) exp(β′x)}dG(x), where G is the distribution function of X. Let Δ = I(T ≤ C) denote the event indicator. Assume that the observed data (Y_i, Δ_i, X_i), i = 1, …, n, on n subjects are independent and identically distributed realizations of (Y, Δ, X). Dropping the factors involving the censoring time distribution, the log full likelihood based on the observed data is

l_{F} = \sum_{i = 1}^{n} (Δ_{i} [β^{'} X_{i} + log {d Λ (Y_{i})}] - Λ (Y_{i}) exp (β^{'} X_{i}) + log {d G (X_{i})}) .

Define the functions $S^{(k)} (t, β) = n^{- 1} \sum_{j = 1}^{n} I (Y_{j} \geq t) exp (β^{'} X_{j}) X_{j}^{\otimes k}$ , k = 0, 1, 2, where x^⊗0 = 1, x^⊗1 = x, and x^⊗2 = xx′. Following the empirical likelihood method of Owen (1988) and Qin and Lawless (1994), we denote by λ_i the jump of Λ at Y_i and by p_i the jump of G at X_i. The full empirical likelihood can then be decomposed as the product of the conditional likelihood of (Y, Δ) given X and the marginal likelihood of X, where the log conditional likelihood is

l_{C} = \sum_{i = 1}^{n} Δ_{i} (β^{'} X_{i} + log λ_{i}) - n \sum_{i = 1}^{n} λ_{i} S^{(0)} (Y_{i}, β),

and the log marginal likelihood is

l_{M} = \sum_{i = 1}^{n} log (p_{i}) .

For a fixed β, differentiating the log conditional likelihood ℓ_C with respect to λ_i and setting the derivative to 0 yields

λ_{i} = \frac{1}{n} \times \frac{Δ_{i}}{S^{(0)} (Y_{i}, β)},

which leads to the Breslow (1972) estimator for the baseline cumulative hazard function

{\hat{Λ}}_{B} (t, β) = \frac{1}{n} \sum_{i = 1}^{n} \frac{Δ_{i} I (Y_{i} \leq t)}{S^{(0)} (Y_{i}, β)} .

Another well-known result is that replacing Λ with Λ̂_B in ℓ_C yields the (log) partial likelihood function.
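The following is a minimal Python sketch of S^(0) and the Breslow estimator as defined above. The function names `s0` and `breslow` and the four-subject toy dataset are hypothetical and serve only to make the formulas concrete.

```python
import math

def s0(t, beta, Y, X):
    """S^(0)(t, beta) = n^{-1} sum_j I(Y_j >= t) exp(beta'X_j)."""
    n = len(Y)
    return sum(math.exp(sum(b * xj for b, xj in zip(beta, x)))
               for y, x in zip(Y, X) if y >= t) / n

def breslow(t, beta, Y, D, X):
    """Breslow estimator of Lambda(t): the sum of the jumps lambda_i with Y_i <= t."""
    n = len(Y)
    return sum(d / (n * s0(y, beta, Y, X))
               for y, d in zip(Y, D) if d == 1 and y <= t)

# Toy data, made up for illustration: observed times, event indicators, one covariate.
Y = [1.0, 2.0, 3.0, 4.0]
D = [1, 0, 1, 1]
X = [[0.0], [1.0], [0.0], [1.0]]

# With beta = 0 the Breslow estimator reduces to the Nelson-Aalen estimator:
# jumps of 1/4 at time 1 and 1/2 at time 3.
na = breslow(3.5, [0.0], Y, D, X)
```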

Assume that the auxiliary survival information is consistent with the individual-level data, that is, individuals in the clinical study are a representative sample of the population from which the aggregate survival information is derived. A simple way to incorporate the auxiliary information is to apply the empirical likelihood method and maximize the full likelihood subject to the constraints

p_{i} \geq 0, \sum_{i = 1}^{n} p_{i} = 1, and \sum_{i = 1}^{n} p_{i} ψ_{k} (X_{i}; β, Λ) = 0, k = 1, \dots, K .

Because the estimating function ψ_k only involves the value of Λ(t) at t = t*, intuitively, one may replace Λ(t*) in ψ_k with its Breslow-type estimator Λ̂_B(t*, β). However, simulation studies suggest that this simple approach yields biased estimation, because the Breslow-type estimator involves the unknown parameter β.

We propose to combine the auxiliary subgroup survival information to estimate the Cox model by formulating two empirical likelihoods – one of which is derived from the conditional likelihood and the other is derived from the marginal likelihood. Our idea is to treat α = Λ(t*) as a nuisance parameter and construct an empirical likelihood for α. We then formulate the usual empirical likelihood for β and Λ using the auxiliary survival information which depends on β and α.

The steps to construct the double empirical likelihood are described below. By definition, $α = Λ (t^{*}) = \int_{0}^{\infty} I (u \leq t^{*}) d Λ (u)$ . We propose to maximize the log conditional likelihood ℓ_C subject to the constraint

\sum_{i = 1}^{n} λ_{i} I (Y_{i} \leq t^{*}) - α = 0 .

Introducing a Lagrange multiplier ν, the objective function to be maximized is

\sum_{i = 1}^{n} Δ_{i} (β^{'} X_{i} + log λ_{i}) - n \sum_{i = 1}^{n} λ_{i} S^{(0)} (Y_{i}, β) - n ν {\sum_{i = 1}^{n} λ_{i} I (Y_{i} \leq t^{*}) - α} .

Taking derivative of the objective function with respect to λ_i and setting the derivative to 0 yields

λ_{i} = \frac{1}{n} \times \frac{Δ_{i}}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})},

(2)

where the Lagrange multiplier is determined by

\frac{1}{n} \sum_{i = 1}^{n} \frac{Δ_{i} I (Y_{i} \leq t^{*})}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})} - α = 0 .

Substituting (2) back to the objective function yields, up to a constant,

\sum_{i = 1}^{n} Δ_{i} [β^{'} X_{i} - log {S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}] + n ν α .

Hence the marginal empirical score function for β is

\sum_{i = 1}^{n} Δ_{i} {X_{i} - \frac{S^{(1)} (Y_{i}, β)}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}} .
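For fixed β and α, the Lagrange multiplier ν in (2) solves a one-dimensional equation whose left-hand side is decreasing in ν, so a simple bisection suffices. The sketch below is one possible way to do this, not the authors' implementation; the function names, the bracketing strategy, and the toy data are assumptions.

```python
import math

def s0(t, beta, Y, X):
    """S^(0)(t, beta) = n^{-1} sum_j I(Y_j >= t) exp(beta'X_j)."""
    n = len(Y)
    return sum(math.exp(sum(b * xj for b, xj in zip(beta, x)))
               for y, x in zip(Y, X) if y >= t) / n

def solve_nu(alpha, t_star, beta, Y, D, X, iters=200):
    """Solve n^{-1} sum_i D_i I(Y_i <= t*) / {S0(Y_i, beta) + nu} = alpha for nu."""
    n = len(Y)
    s = [s0(y, beta, Y, X) for y, d in zip(Y, D) if d == 1 and y <= t_star]
    g = lambda nu: sum(1.0 / (si + nu) for si in s) / n - alpha
    lo = -min(s) + 1e-10      # keeps every jump lambda_i positive; g(lo) > 0
    hi = 1.0
    while g(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):    # g is decreasing in nu on (lo, infinity)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data, made up for illustration.
Y = [1.0, 2.0, 3.0, 4.0]
D = [1, 0, 1, 1]
X = [[0.0], [1.0], [0.0], [1.0]]

nu_at_breslow = solve_nu(0.75, 3.5, [0.0], Y, D, X)  # alpha equals the Breslow value
nu_shifted = solve_nu(0.50, 3.5, [0.0], Y, D, X)     # alpha below the Breslow value
```

When α equals the Breslow estimate at t*, the constraint is not binding and ν is zero up to numerical error; a smaller α forces ν > 0, which shrinks the jumps before t*.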

Next, we maximize the log marginal likelihood ℓ_M subject to the constraints p_i ≥ 0, $\sum_{i = 1}^{n} p_{i} = 1$ , and $\sum_{i = 1}^{n} p_{i} ψ_{k} (X_{i}, β, α) = 0$ for k = 1, …, K. Write ψ(x, β, α) = {ψ₁(x, β, α), …, ψ_K(x, β, α)}′. Given β and α, a unique maximum exists provided 0 lies in the convex hull of ψ(X₁, β, α), …, ψ(X_n, β, α). Applying the classic empirical likelihood argument, we have

p_{i} = \frac{1}{n} \times \frac{1}{1 + ξ^{'} ψ (X_{i}, β, α)}

and the constrained log marginal likelihood, up to a constant,

- \sum_{i = 1}^{n} log {1 + ξ^{'} ψ (X_{i}, β, α)}

where the Lagrange multipliers ξ = (ξ₁, …, ξ_K)′ are determined by

\frac{1}{n} \sum_{i = 1}^{n} \frac{ψ (X_{i}, β, α)}{1 + ξ^{'} ψ (X_{i}, β, α)} = 0 .
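To make the marginal step concrete, the sketch below computes ξ and the weights p_i for the special case of a single constraint (K = 1), where the equation for ξ is monotone and can be solved by bisection. The restriction to K = 1, the function name `el_weights`, and the input values are assumptions made for illustration; a general K would require a multivariate solver.

```python
def el_weights(psi_vals, iters=200):
    """Empirical likelihood weights for one constraint sum_i p_i * psi_i = 0.

    psi_vals : list of scalar values psi(X_i, beta, alpha), i = 1, ..., n
    Returns (xi, p) with p_i = 1 / {n (1 + xi * psi_i)}.
    """
    n = len(psi_vals)
    lo_psi, hi_psi = min(psi_vals), max(psi_vals)
    if not (lo_psi < 0.0 < hi_psi):
        raise ValueError("0 must lie strictly inside the range of psi")
    h = lambda xi: sum(v / (1.0 + xi * v) for v in psi_vals)
    # Positivity of every p_i requires 1 + xi * psi_i > 0 for all i.
    lo, hi = -1.0 / hi_psi, -1.0 / lo_psi
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(iters):    # h is decreasing in xi on (lo, hi)
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    p = [1.0 / (n * (1.0 + xi * v)) for v in psi_vals]
    return xi, p

# Made-up values of psi; zeros correspond to subjects outside the subgroup.
psi_vals = [0.3, -0.2, 0.1, 0.0, -0.4]
xi, p = el_weights(psi_vals)
```

At the solution the weights are positive, sum to one, and satisfy the moment constraint, which is the content of the three displayed conditions.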

Combining the two constrained log likelihoods yields the constrained log full likelihood function ℓ, where, up to a constant,

l (β, ξ, ν, α) = \sum_{i = 1}^{n} Δ_{i} [β^{'} X_{i} - log {S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}] + n ν α - \sum_{i = 1}^{n} log {1 + ξ^{'} ψ (X_{i}, β, α)} .
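A direct, unoptimized evaluation of ℓ(β, ξ, ν, α) might look as follows. This is a sketch under stated assumptions: subgroups are encoded as membership functions, the function name `double_el` is hypothetical, and the toy data are made up. It is meant to show how the conditional and marginal pieces enter the objective, not to serve as an estimation routine.

```python
import math

def double_el(beta, xi, nu, alpha, t_star, Y, D, X, subgroups, phi):
    """Evaluate the constrained log full likelihood l(beta, xi, nu, alpha)."""
    n = len(Y)
    lin = [sum(b * xj for b, xj in zip(beta, x)) for x in X]
    risk = [math.exp(v) for v in lin]
    total = n * nu * alpha
    for i in range(n):
        if D[i] == 1:
            # Conditional part: beta'X_i - log{S0(Y_i, beta) + nu I(Y_i <= t*)}
            s0_i = sum(r for y, r in zip(Y, risk) if y >= Y[i]) / n
            total += lin[i] - math.log(s0_i + nu * (Y[i] <= t_star))
        # Marginal part: -log{1 + xi' psi(X_i, beta, alpha)}
        surv = math.exp(-alpha * risk[i])
        psi_i = [(surv - ph) if in_k(X[i]) else 0.0
                 for in_k, ph in zip(subgroups, phi)]
        total -= math.log(1.0 + sum(a * b for a, b in zip(xi, psi_i)))
    return total

# Toy data and a single hypothetical subgroup, made up for illustration.
Y = [1.0, 2.0, 3.0, 4.0]
D = [1, 0, 1, 1]
X = [[0.0], [1.0], [0.0], [1.0]]
subgroups = [lambda x: x[0] > 0.5]
phi = [0.5]

# With xi = 0 and nu = 0 the objective reduces to the log partial likelihood
# (up to the same constant); for beta = 0 it equals log 8 on these data.
val = double_el([0.0], [0.0], 0.0, 0.7, 3.5, Y, D, X, subgroups, phi)
```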

The procedure described above enables us to reduce an infinite-dimensional problem to a finite-dimensional problem at the expense of introducing an additional (K + 2)-dimensional parameter (ξ, ν, α).

To estimate β, we solve a system of empirical score equations:

\begin{array}{l}
U_{1} (β, ξ, ν, α) = \sum_{i = 1}^{n} Δ_{i} {X_{i} - \frac{S^{(1)} (Y_{i}, β)}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}} - \sum_{i = 1}^{n} \frac{ξ^{'} \frac{\partial ψ}{\partial β} (X_{i}, β, α)}{1 + ξ^{'} ψ (X_{i}, β, α)}, \\
U_{2} (β, ξ, ν, α) = \sum_{i = 1}^{n} \frac{ψ (X_{i}, β, α)}{1 + ξ^{'} ψ (X_{i}, β, α)}, \\
U_{3} (β, ξ, ν, α) = \sum_{i = 1}^{n} {\frac{Δ_{i} I (Y_{i} \leq t^{*})}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})} - α}, \\
U_{4} (β, ξ, ν, α) = \sum_{i = 1}^{n} {\frac{ξ^{'} \frac{\partial ψ}{\partial α} (X_{i}, β, α)}{1 + ξ^{'} ψ (X_{i}, β, α)} - ν},
\end{array}

with the usual convention 0/0 = 0. These empirical score functions are derived by taking the derivative of the empirical likelihood ℓ with respect to θ = (β, ξ, ν, α). Let β₀, Λ₀, and α₀ be the true parameter values, and denote θ₀ = (β₀, 0, 0, α₀). Define U_0k = U_k(β₀, 0, 0, α₀), k = 1, 2, 3, 4, that is, U_0k is the value of U_k evaluated at θ₀. Define $U = {(U_{1}^{'}, U_{2}^{'}, U_{3}, U_{4})}^{'}$ and $U_{0} = {(U_{01}^{'}, U_{02}^{'}, U_{03}, U_{04})}^{'}$ , where U₀₄ = 0. We show in the Appendix that, under some regularity conditions, n^{−1/2}U₀ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix

Ω = (\begin{matrix} Σ & 0 & 0 & 0 \\ 0 & J & 0 & 0 \\ 0 & 0 & K & 0 \\ 0 & 0 & 0 & 0 \end{matrix}),

where Σ, J, K are defined in the appendix. Note, for convenience, we use 0 and I to denote a matrix of 0's and an identity matrix with proper dimensions.

Denote by θ̂ = (β̂, ξ̂, ν̂, α̂) the solution to U(θ) = 0. The large-sample properties of θ̂ are presented in Theorem 1, the proof of which is given in the Appendix.

Theorem 1 Assume that X is bounded, the true regression parameter β₀ lies in a compact set, and both T and C are absolutely continuous. Moreover, assume that E{ψ(X, β₀, α₀)ψ(X, β₀, α₀)′} is positive definite and α₀ = Λ₀(t*) < ∞. Then n^1/2(β̂ – β₀) converges in distribution to a zero mean multivariate normal distribution with variance-covariance matrix Γ⁻¹ = (Σ+BQ⁻¹B′)⁻¹, provided Γ is non-singular, where B and Q are specified in the Appendix.

Interestingly, Σ⁻¹ is the asymptotic variance-covariance matrix of n^1/2(β̂_PL − β₀), where β̂_PL is the maximum partial likelihood estimator. Hence Theorem 1 implies that the proposed estimator β̂ is asymptotically more efficient than the maximum partial likelihood estimator β̂_PL.

When the subgroups involved in the auxiliary survival information are determined by a subset of covariates, the efficiency gain in the estimated coefficients for the other covariates is expected to be minimal. To see this, consider a simple case where X = (X₁, X₂) and the subgroups are determined only by X₁. Then the auxiliary survival information for the kth subgroup can be reexpressed as

E {ψ_{k} (X, β, α)} = \int \int {I (x_{1} \in Ω_{k}) [exp {- α exp (β_{1} x_{1} + β_{2} x_{2})} - ϕ_{k}]} d G (x_{1}, x_{2}) = \int \int {I (x_{1} \in Ω_{k}) [exp {- α exp (β_{1} x_{1} + u)} - ϕ_{k}]} d G (x_{1}, β_{2}^{- 1} u) .

Because the proposed estimation procedure allows for an arbitrary distribution function G for (X₁, X₂), after reparameterization, it is equivalent to maximizing the log marginal likelihood $\sum_{i = 1}^{n} log (p_{i}^{*})$ subject to the constraints

p_{i}^{*} \geq 0, \sum_{i = 1}^{n} p_{i}^{*} = 1, and \sum_{i = 1}^{n} p_{i}^{*} {I (X_{i 1} \in Ω_{k}) [exp {- α exp (β_{1} X_{i 1} + X_{i 2})} - ϕ_{k}]} = 0, k = 1, \dots, K,

where X_i = (X_i1, X_i2) and $p_{i}^{*}$ is the jump of G at (X_i1, X_i2/β₂). It is easy to see that, after profiling out $p_{i}^{*}$ , the auxiliary survival information does not involve β₂. Thus the proposed estimation procedure is expected to significantly improve the efficiency in the estimation of β₁ but has only limited impact on the estimation of β₂.

To estimate the baseline cumulative hazard function Λ(t), we consider the following empirical likelihood-based estimator to incorporate the auxiliary survival information:

{\hat{Λ}}_{EL} (t) = \frac{1}{n} \sum_{i = 1}^{n} \frac{Δ_{i} I (Y_{i} \leq t)}{S^{(0)} (Y_{i}, \hat{β}) + \hat{ν} I (Y_{i} \leq t^{*})} .

Applying the functional delta method, we can show that n^1/2{Λ̂_EL(t) – Λ(t)} converges to a zero-mean Gaussian process on [0, τ]. A sketch of the proof is given in the Appendix.

The validity of the proposed method holds when the t-year survival probabilities summarized by (1) are consistent with the individual-level data. To test the conformity of the auxiliary survival information, an empirical likelihood ratio test statistic can be constructed in the spirit of Corollary 4 of Qin and Lawless (1994) and Qin and Lawless (1995). Specifically, we consider the test statistic

R = 2 {sup_{β, ξ, ν, α} l (β, ξ, ν, α) - sup_{β, ν, α} l (β, 0, ν, α)} .

Note that when the conformity assumption holds, that is, ξ = 0, the likelihood ℓ(β, 0, ν, α) is maximized by (β, ν, α) = (β̂_PL, 0, α̂_PL), where α̂_PL = Λ̂_B(t*, β̂_PL) is the Breslow-type estimator of the baseline cumulative hazard function at time t*. Theorem 2 summarizes the asymptotic properties of the empirical log-likelihood ratio statistic R.

Theorem 2 Under the regularity conditions specified in Theorem 1 and the null hypothesis that ξ = 0, the empirical log-likelihood ratio R converges in distribution to a χ² random variable with K degrees of freedom as n → ∞.

4. An Extension

As discussed before, a major limitation of the estimation procedure described in Section 3 is that the auxiliary information must be consistent with the individual-level data. However, due to study inclusion/exclusion criteria, subjects enrolled in the clinical study may not be a representative sample of the population from which the aggregate survival information is derived. Hence it is desirable to allow the aggregate data to have a different survival time model. To this end, we propose to accommodate the inconsistency by assuming that the hazard function of the survival time in the aggregate data follows the Cox model λ*(t)exp(β′ x), where

λ^{*} (t) = ρ λ (t), ρ > 0,

(3)

so that the potential differences in the two data sources are characterized by a scale factor ρ. Of note, ρ = 1 indicates that survival time model for the aggregate data is the same as that for the individual-level data.

Similar to the discussions in Section 3, the auxiliary survival information pr(T > t* | X ∈ Ω_k) = ϕ_k, k = 1, …, K, can be summarized by the estimating equations

E {{\tilde{ψ}}_{k} (X, β, α, ρ)} = 0, k = 1, \dots, K,

where, under model (3), ψ̃_k(X, β, α, ρ) = I(X ∈ Ω_k)[exp{−ραexp(β′X)} − ϕ_k]. The constrained log full likelihood function is given by, up to a constant,

\tilde{l} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} Δ_{i} [β^{'} X_{i} - log {S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}] + n ν α - \sum_{i = 1}^{n} log {1 + ξ^{'} \tilde{ψ} (X_{i}, β, α, ρ)} .

Taking derivative of the empirical likelihood ℓ̃ with respect to θ̃ = (β, ξ, ν, α, ρ), we reach a system of empirical score equations:

\begin{array}{l}
{\tilde{U}}_{1} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} Δ_{i} {X_{i} - \frac{S^{(1)} (Y_{i}, β)}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})}} - \sum_{i = 1}^{n} \frac{ξ^{'} \frac{\partial \tilde{ψ}}{\partial β} (X_{i}, β, α, ρ)}{1 + ξ^{'} \tilde{ψ} (X_{i}, β, α, ρ)}, \\
{\tilde{U}}_{2} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} \frac{\tilde{ψ} (X_{i}, β, α, ρ)}{1 + ξ^{'} \tilde{ψ} (X_{i}, β, α, ρ)}, \\
{\tilde{U}}_{3} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} {\frac{Δ_{i} I (Y_{i} \leq t^{*})}{S^{(0)} (Y_{i}, β) + ν I (Y_{i} \leq t^{*})} - α}, \\
{\tilde{U}}_{4} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} {\frac{ξ^{'} \frac{\partial \tilde{ψ}}{\partial α} (X_{i}, β, α, ρ)}{1 + ξ^{'} \tilde{ψ} (X_{i}, β, α, ρ)} - ν}, \\
{\tilde{U}}_{5} (β, ξ, ν, α, ρ) = \sum_{i = 1}^{n} \frac{ξ^{'} \frac{\partial \tilde{ψ}}{\partial ρ} (X_{i}, β, α, ρ)}{1 + ξ^{'} \tilde{ψ} (X_{i}, β, α, ρ)},
\end{array}

with the usual convention 0/0 = 0. Let ρ₀ be the true parameter value for ρ, and denote ${\tilde{θ}}_{0} = {(θ_{0}^{'}, ρ_{0})}^{'} = (β_{0}, 0, 0, α_{0}, ρ_{0})$ . Define Ũ_0k = Ũ_k(θ̃₀), k = 1, …, 5, that is, Ũ_0k is the value of Ũ_k evaluated at θ̃₀. Define $\tilde{U} = {({\tilde{U}}_{1}^{'}, {\tilde{U}}_{2}^{'}, {\tilde{U}}_{3}, {\tilde{U}}_{4}, {\tilde{U}}_{5})}^{'}$ and ${\tilde{U}}_{0} = {({\tilde{U}}_{01}^{'}, {\tilde{U}}_{02}^{'}, {\tilde{U}}_{03}, {\tilde{U}}_{04}, {\tilde{U}}_{05})}^{'}$ , where Ũ₀₄ = Ũ₀₅ = 0. Let θ̂_ρ be the solution to Ũ(θ̃) = 0 and β̂_ρ be the corresponding estimated regression coefficient. The large-sample properties of θ̂_ρ are presented in Theorem 3, with the proof given in the Appendix.

Theorem 3 Assume that the matrix E{ψ̃(X, β₀, α₀, ρ₀)ψ̃(X, β₀, α₀, ρ₀)′} is positive definite. Then, under the same regularity conditions in Theorem 1, n^1/2(β̂_ρ – β₀) converges in distribution to a zero mean multivariate normal distribution with variance-covariance matrix Γ̃⁻¹ = (Σ + B̃Q̃⁻¹B̃′)⁻¹, provided Γ̃ is non-singular, where B̃ and Q̃ are specified in the Appendix.

Theorem 3 implies that the extended double empirical likelihood estimator β̂_ρ is asymptotically more efficient than the maximum partial likelihood estimator β̂_PL. Moreover, it is easy to see that θ̂ = (β̂, ξ̂, ν̂, α̂) is the maximizer of ℓ̃(β, ξ, ν, α, ρ ≡ 1). In the proof of Theorem 3, we also show that β̂_ρ is less efficient than β̂. To test whether the same baseline hazard function is shared by the individual-level data and the aggregate data, we consider the empirical log-likelihood ratio statistic

\tilde{R} = 2 {sup_{β, ξ, ν, α, ρ} \tilde{l} (β, ξ, ν, α, ρ) - sup_{β, ξ, ν, α} \tilde{l} (β, ξ, ν, α, 1)} .

Under mild regularity conditions and the null hypothesis that ρ₀ = 1, the empirical log-likelihood ratio R̃ converges in distribution to a χ² random variable with 1 degree of freedom as n → ∞. The proof closely follows that for Theorem 2, and thus is omitted.

5. Numerical Studies

5.1 Monte Carlo Simulations

We conducted two sets of Monte Carlo simulations to examine the finite-sample performance of the proposed methods. In both simulation studies, we generated X₁ from the standard normal distribution and X₂ from a Bernoulli distribution with pr(X₂ = 1) = pr(X₂ = 0) = 0.5. The survival time T in the individual-level data was generated from the proportional hazards models (A) λ(t | X₁, X₂) = λ(t) exp(β₁X₁ + β₂X₂) with (β₁, β₂) = (−0.5, 0.5), and (B) λ(t | X₁, X₂) = λ(t) exp(β₁X₁ + β₂X₂ + β₃X₁X₂) with (β₁, β₂, β₃) = (−0.5, 1, −0.5), where we set λ(t) = 2t for both models. The censoring time C was generated from a uniform distribution so that the censoring rate was approximately 0%, 30%, or 50%. For each scenario, we generated 1000 datasets with a sample size of n = 100 or n = 400.
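The data-generating mechanism for Model (A) can be sketched by inverse-transform sampling: since Λ(t) = t², the conditional survival function is exp{−t² exp(β′X)}, so T = [−log U / exp(β′X)]^{1/2} for U uniform on (0, 1). In the sketch below, the function name `simulate_model_a`, the choice of a Uniform(0, c_max) censoring distribution, and the value of `c_max` are assumptions; the text does not report the censoring bounds used to reach the 30% and 50% rates.

```python
import math
import random

def simulate_model_a(n, beta=(-0.5, 0.5), c_max=None, seed=0):
    """Simulate (Y, Delta, X) under Model (A) with baseline hazard 2t.

    c_max : upper bound of a Uniform(0, c_max) censoring time (hypothetical);
            None means no censoring.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)               # standard normal covariate
        x2 = 1 if rng.random() < 0.5 else 0    # Bernoulli(0.5) covariate
        r = math.exp(beta[0] * x1 + beta[1] * x2)
        u = 1.0 - rng.random()                 # in (0, 1], avoids log(0)
        t = math.sqrt(-math.log(u) / r)        # inverts S(t | x) = exp(-t^2 r)
        c = rng.uniform(0.0, c_max) if c_max is not None else math.inf
        data.append((min(t, c), int(t <= c), (x1, x2)))
    return data

uncensored = simulate_model_a(100)
censored = simulate_model_a(100, c_max=1.0)
```

In practice `c_max` would be tuned, for example by a pilot simulation, until the empirical censoring proportion matches the target rate.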

In the first set of simulations, we considered the case where the individual-level and the aggregate data share the same survival time model, that is, ρ = 1 in model (3). We derived the auxiliary survival information at t* = 0.5 for subgroups Ω₁ = {(X₁, X₂) : X₁ ≤ 0, X₂ = 0} and Ω₂ = {(X₁, X₂) : X₁ > 0, X₂ = 0} under the assumed Cox model. This specification aims to mimic the situation where in a randomized clinical trial we exploit the information about the 6-month survival probabilities in the standard-of-care control group (X₂ = 0) and where X₁ is a baseline risk factor. The 6-month survival probabilities for the two subgroups are approximately 0.68 and 0.84 under both Models (A) and (B).
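The two subgroup survival probabilities quoted above can be checked by one-dimensional numerical integration, using only quantities stated in the simulation design: X₁ standard normal, Λ(t) = t², β₁ = −0.5, and X₂ = 0. The sketch below uses a midpoint rule; the function name `subgroup_survival` and the truncation of the normal range at ±8 are assumptions of this illustration.

```python
import math

def subgroup_survival(t_star, beta1, lower, upper, m=20000):
    """pr(T > t* | lower < X1 <= upper, X2 = 0) for X1 ~ N(0, 1), Lambda(t) = t^2."""
    alpha = t_star ** 2
    h = (upper - lower) / m
    num = den = 0.0
    for j in range(m):
        x = lower + (j + 0.5) * h          # midpoint of the j-th cell
        dens = math.exp(-0.5 * x * x)      # N(0, 1) density up to a constant
        num += math.exp(-alpha * math.exp(beta1 * x)) * dens
        den += dens
    return num / den                       # normalizing constant cancels

phi1 = subgroup_survival(0.5, -0.5, -8.0, 0.0)   # subgroup with X1 <= 0
phi2 = subgroup_survival(0.5, -0.5, 0.0, 8.0)    # subgroup with X1 > 0
```

Because both subgroups have X₂ = 0, the interaction term in Model (B) drops out, which is why the same two values apply to both models.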

Tables 1 and 2 summarize the empirical bias and empirical standard deviation of the maximum partial likelihood estimator β̂_PL, the double empirical likelihood estimator β̂, and the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data. All three estimators show little bias under Models (A) and (B). Compared with the maximum partial likelihood estimator β̂_PL, the two double empirical likelihood estimators β̂ and β̂_ρ enjoy substantial efficiency gains. Under Model (A), the relative efficiency ranges from 4.86 to 9.66 for the estimated coefficient of the baseline risk factor and from 1.02 to 1.84 for the estimated treatment effect. Under Model (B), the relative efficiency in the treatment-covariate interaction ranges from 1.50 to 2.13, suggesting that the use of the proposed methods can reduce the sample size requirement by about 33%–53% (that is, 1 − 1/RE) in investigating treatment heterogeneity. As expected, β̂_ρ is slightly less efficient than β̂ and the estimated value of ρ is close to 1 using the extended double empirical likelihood approach. We also report the estimated baseline cumulative hazard function at t = 0.3 and 0.7. As expected, the double empirical likelihood approach enjoys a substantial gain in efficiency in the estimation of Λ(t), while the efficiency gain for the extended approach is minimal because it allows for a different baseline risk in the aggregate data. Finally, our simulations (results not shown) also show that the efficiency gain increases with the number of constraints.

Table 1.

Summary statistics for the estimation of Model (A) λ(t | X₁, X₂) = λ₀(t)exp(β₁X₁ + β₂X₂) when ρ = 1.

Proportion censored	n	Estimator	β₁ Bias	β₁ ES	β₁ RE	β₂ Bias	β₂ ES	β₂ RE	ρ Bias	ρ ES	Λ(0.3) Bias	Λ(0.3) ES	Λ(0.3) RE	Λ(0.7) Bias	Λ(0.7) ES	Λ(0.7) RE
0%	100	PL	-2	12	–	3	22	–	–	–	0	3	–	0	10	–
		DEL	-1	5	4.96	0	18	1.48	–	–	0	2	1.67	0	6	2.59
		DEL_ρ	-1	5	4.86	2	22	1.03	7	26	0	3	1.03	0	10	1.05
	400	PL	0	6	–	1	11	–	–	–	0	1	–	0	5	–
		DEL	0	2	5.62	0	9	1.47	–	–	0	1	1.86	0	3	2.35
		DEL_ρ	0	2	5.52	0	11	1.02	1	11	0	1	1.05	0	5	1.05
30%	100	PL	-1	14	–	3	26	–	–	–	0	3	–	0	11	–
		DEL	-1	5	6.84	1	20	1.67	–	–	0	2	1.76	0	7	2.50
		DEL_ρ	-1	5	6.68	2	25	1.04	7	31	0	3	1.05	0	11	1.05
	400	PL	-1	7	–	1	13	–	–	–	0	2	–	0	5	–
		DEL	0	2	7.10	0	10	1.60	–	–	0	1	1.87	0	4	2.36
		DEL_ρ	0	2	6.98	0	13	1.02	1	13	0	1	1.05	0	5	1.04
50%	100	PL	-2	17	–	0	31	–	–	–	0	3	–	1	13	–
		DEL	-1	5	9.64	-2	23	1.84	–	–	0	2	1.81	1	8	2.54
		DEL_ρ	-1	5	9.61	-1	30	1.04	6	32	0	3	1.01	2	13	1.03
	400	PL	-1	8	–	1	15	–	–	–	0	2	–	0	6	–
		DEL	1	2	9.66	0	12	1.72	–	–	0	1	1.87	0	4	2.50
		DEL_ρ	1	2	9.56	0	15	1.02	1	14	0	2	1.05	0	6	1.04

NOTE: β₁ and β₂ are the regression coefficients, where the true parameter values are (−0.5, 0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂_PL; DEL, the double empirical likelihood estimator β̂; DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.
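
The summary statistics defined in the note can be computed from the replicated estimates as in the minimal sketch below. The function names are hypothetical, and the use of the sample (n − 1) divisor for the standard deviation and variance is an assumption; the note does not say which divisor was used.

```python
import statistics

def summarize(estimates, truth):
    """Return (bias x 100, ES x 100) for replicated estimates of one parameter."""
    bias = statistics.fmean(estimates) - truth
    es = statistics.stdev(estimates)  # ASSUMPTION: sample SD with n - 1 divisor
    return 100.0 * bias, 100.0 * es

def relative_efficiency(pl_estimates, del_estimates):
    """Empirical variance of the PL estimates divided by that of the competing estimates."""
    return statistics.variance(pl_estimates) / statistics.variance(del_estimates)

if __name__ == "__main__":
    # Toy replicates for illustration only; these are not the simulation output.
    pl = [-0.7, -0.3, -0.5, -0.6, -0.4]
    dl = [-0.55, -0.45, -0.5, -0.52, -0.48]
    print(summarize(pl, truth=-0.5))
    print(relative_efficiency(pl, dl))
```

A value of RE above 1 means the competing estimator varies less across replicates than the partial likelihood estimator.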

Table 2.

Summary statistics for the estimation of Model (B) λ(t | X₁, X₂) = λ₀(t) exp(β₁X₁ + β₂X₂ + β₃X₁X₂) when ρ = 1.

Proportion censored			β₁			β₂			β₃			ρ		Λ(0.3)			Λ(0.7)
Proportion censored	n	Coef	Bias	ES	RE	Bias	ES	RE	Bias	ES	RE	Bias	ES	Bias	ES	RE	Bias	ES	RE
0%	100	PL	-1	17	–	2	24	–	-2	24	–	–	–	0	3	–	0	10	–
		DEL	0	6	9.45	0	18	1.80	-1	17	1.84	–	–	0	2	1.84	0	6	2.54
		DEL_ρ	0	6	9.38	2	23	1.02	-2	18	1.72	6	26	0	3	1.01	0	10	1.04
	400	PL	-1	8	–	1	12	–	0	11	–	–	–	0	1	–	0	5	–
		DEL	0	3	9.44	1	9	1.77	0	8	1.64	–	–	0	1	2.01	0	3	2.35
		DEL_ρ	0	3	9.41	1	12	1.01	0	9	1.50	1	12	0	1	1.03	0	5	1.03
30%	100	PL	-2	21	–	3	29	–	-2	28	–	–	–	0	3	–	0	12	–
		DEL	-1	6	13.8	0	21	1.92	-1	20	1.93	–	–	0	2	1.96	0	8	2.44
		DEL_ρ	-1	6	13.7	2	28	1.04	-2	21	1.84	8	31	0	3	1.03	0	12	1.05
	400	PL	-1	9	–	1	14	–	0	13	–	–	–	0	2	–	0	6	–
		DEL	0	3	13.7	0	10	1.88	-1	10	1.78	–	–	0	1	2.04	0	4	2.42
		DEL_ρ	0	3	13.6	1	13	1.04	-1	10	1.70	2	14	0	1	1.04	0	5	1.06
50%	100	PL	-2	27	–	4	35	–	-2	36	–	–	–	0	3	–	0	14	–
		DEL	0	6	23.2	-1	24	2.12	-4	25	2.13	–	–	0	2	2.05	1	9	2.45
		DEL_ρ	0	6	23.0	2	33	1.09	-4	25	2.11	9	35	0	3	1.05	0	14	1.08
	400	PL	-1	12	–	1	16	–	0	16	–	–	–	0	2	–	0	7	–
		DEL	1	3	21.0	0	12	1.94	-2	11	1.95	–	–	0	1	2.04	0	4	2.35
		DEL_ρ	1	3	21.0	0	16	1.03	-2	11	1.91	2	15	0	2	1.04	0	6	1.06

NOTE: β₁, β₂, and β₃ are the regression coefficients, where the true parameter values are (−0.5, 1, −0.5). Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂_PL; DEL, the double empirical likelihood estimator β̂; DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

In the second set of simulations, we assume that the auxiliary survival information is derived from the Cox model with a different baseline hazard function λ*(t) = 1.5λ(t), that is, we set ρ = 1.5 in model (3). The 6-month survival probabilities for the same subgroups are approximately 0.57 and 0.77 under both Model (A) and Model (B). Tables 3 and 4 give the summary statistics of the simulation results. As expected, the double empirical likelihood estimator β̂ which assumes consistency between the individual-level data and the aggregate data yields biased estimates. On the other hand, the extended double empirical likelihood estimator β̂_ρ performs very well in terms of bias and efficiency gain, and the estimated value of ρ is very close to its true value 1.5.

Table 3.

Summary statistics for the estimation of Model (A) λ(t | X₁, X₂) = λ₀(t) exp(β₁X₁ + β₂X₂) when ρ = 1.5.

Proportion censored			β₁			β₂			ρ		Λ(0.3)			Λ(0.7)
Proportion censored	n	Coef	Bias	ES	RE	Bias	ES	RE	Bias	ES	Bias	ES	RE	Bias	ES	RE
0%	100	PL	-1	12	–	2	22	–	–	–	0	3	–	0	10	–
		DEL	0	5	5.99	-22	15	1.88	–	–	4	3	0.76	15	7	1.92
		DEL_ρ	-1	5	4.71	1	22	1.02	8	38	0	3	1.02	0	10	1.03
	400	PL	0	6	–	1	11	–	–	–	0	1	–	0	5	–
		DEL	2	2	6.26	-21	8	1.81	–	–	4	2	0.88	15	3	1.90
		DEL_ρ	1	2	5.68	0	11	1.02	2	17	0	1	1.04	0	5	1.05
30%	100	PL	-2	14	–	2	27	–	–	–	0	3	–	0	12	–
		DEL	1	5	8.39	-25	19	1.99	–	–	4	3	0.82	16	8	2.01
		DEL_ρ	0	2	5.20	0	11	1.01	2	17	0	1	1.03	0	5	1.04
	400	PL	-1	7	–	1	13	–	–	–	0	2	–	0	5	–
		DEL	1	2	7.59	-23	9	1.92	–	–	4	2	0.90	15	4	1.88
		DEL_ρ	0	2	7.01	0	13	1.02	2	19	0	1	1.04	0	5	1.04
50%	100	PL	-2	16	–	2	32	–	–	–	0	3	–	0	13	–
		DEL	1	5	8.84	-28	22	2.09	–	–	4	3	0.97	16	10	1.84
		DEL_ρ	0	6	8.29	1	31	1.04	12	50	0	3	1.02	1	13	1.03
	400	PL	-1	8	–	1	15	–	–	–	0	2	–	0	6	–
		DEL	2	2	9.99	-27	11	2.01	–	–	4	2	0.90	16	4	1.96
		DEL_ρ	1	2	9.39	0	15	1.02	2	21	0	2	1.05	0	6	1.04

NOTE: β₁ and β₂ are the regression coefficients, where the true parameter values are (−0.5, 0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂_PL; DEL, the double empirical likelihood estimator β̂; DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.
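
Because RE compares variances only, it does not reflect the bias of the double empirical likelihood estimator when ρ ≠ 1. One way to see this is to combine bias and standard deviation into a root mean squared error. The sketch below does so for the β₂ entries of Table 3 with n = 400 and no censoring; the helper `rmse` is hypothetical, and RMSE is not a quantity reported in the tables.

```python
import math

def rmse(bias, es):
    """Root mean squared error from a bias and a standard deviation in the same units."""
    return math.sqrt(bias ** 2 + es ** 2)

# (Bias x 100, ES x 100) for beta_2 in Table 3, n = 400, 0% censoring.
rows = {"PL": (1, 11), "DEL": (-21, 8), "DEL_rho": (0, 11)}

if __name__ == "__main__":
    for name, (bias, es) in rows.items():
        print(f"{name}: RMSE x 100 = {rmse(bias, es):.1f}")
```

On this scale the DEL entry is roughly twice the PL entry despite its smaller ES, whereas DEL_ρ is on par with PL.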

Table 4.

Summary statistics for the estimation of Model (B) λ(t | X₁, X₂) = λ₀(t)exp(β₁X₁ + β₂X₂ + β₃X₁X₂) when ρ = 1.5.

Proportion censored			β₁			β₂			β₃			ρ		Λ(0.3)			Λ(0.7)
Proportion censored	n	Coef	Bias	ES	RE	Bias	ES	RE	Bias	ES	RE	Bias	ES	Bias	ES	RE	Bias	ES	RE
0%	100	PL	-1	17	–	2	24	–	-2	24	–	–	–	0	3	–	0	10	–
		DEL	0	5	9.70	-24	16	2.10	6	17	2.04	–	–	4	3	0.88	15	7	2.02
		DEL_ρ	0	6	9.38	2	23	1.02	-2	18	1.73	9	40	0	3	1.01	0	10	1.04
	400	PL	-1	8	–	1	12	–	0	11	–	–	–	0	1	–	0	5	–
		DEL	2	2	10.1	-24	8	2.08	6	8	1.82	–	–	4	1	0.95	15	4	1.90
		DEL_ρ	1	2	9.75	1	12	1.02	-1	9	1.51	2	19	0	1	1.02	0	5	1.03
30%	100	PL	-2	21	–	3	29	–	-2	28	–	–	–	0	3	–	0	12	–
		DEL	0	6	14.1	-27	19	2.20	4	20	2.09	–	–	4	3	0.95	16	9	1.93
		DEL_ρ	-1	6	13.6	2	28	1.04	-2	21	1.84	12	46	0	3	1.03	0	12	1.05
	400	PL	-1	9	–	1	14	–	0	13	–	–	–	0	2	–	0	6	–
		DEL	1	3	14.2	-26	9	2.17	5	10	1.92	–	–	4	2	0.98	15	4	1.94
		DEL_ρ	0	3	13.7	1	13	1.04	-1	10	1.70	3	21	0	1	1.04	0	6	1.05
50%	100	PL	-2	27	–	4	35	–	-2	36	–	–	–	0	3	–	0	14	–
		DEL	0	6	23.5	-30	23	2.36	0	24	2.23	–	–	4	3	1.00	16	10	1.93
		DEL_ρ	0	6	22.6	2	33	1.09	-4	25	2.11	14	53	0	3	1.06	0	14	1.08
	400	PL	-1	12	–	1	16	–	0	16	–	–	–	0	2	–	0	7	–
		DEL	2	3	21.3	-29	11	2.19	2	11	2.05	–	–	4	2	0.99	16	5	1.88
		DEL_ρ	1	3	20.6	0	16	1.03	-2	11	1.90	3	23	0	2	1.04	0	6	1.06

NOTE: β₁, β₂, and β₃ are the regression coefficients, where the true parameter values are (−0.5, 1, −0.5). Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂_PL; DEL, the double empirical likelihood estimator β̂; DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

5.2 Data Example

5.2.1 Example 1: Prostate cancer study

To demonstrate how the proposed methods can improve efficiency by incorporating auxiliary survival information with the individual-level data, we investigated the comparative effectiveness of two modes of androgen deprivation therapy (ADT), intermittent and continuous ADT (IADT and CADT), on survival outcomes in men with advanced prostate cancer. Continuous ADT has been the conventional palliative approach in the U.S. for the control of advanced prostate cancer, and intermittent ADT has been proposed as an alternative to CADT for the potential advantages of improved quality of life, reduced cost, and reduced risk of side effects. However, it remains unclear whether IADT has a survival benefit comparable to CADT. While age and prostate specific antigen (PSA) level have been shown to be important prognostic factors in advanced prostate cancer, it remains unclear whether the comparative effectiveness of IADT versus CADT differs by PSA level at diagnosis after adjustment for men's age at diagnosis. Although clinical trials and meta-analyses have been conducted to investigate this issue, the answer remains inconclusive owing to the lack of statistical power needed for subgroup analysis (Tsai et al., 2013; Hussain et al., 2013).

We used the linked Surveillance, Epidemiology, and End Results (SEER)-Medicare dataset in this example. The SEER-Medicare dataset matches incident cancer patients identified from SEER registries to their data from Medicare, the major insurer in the U.S. for people 65 years and older, to obtain longitudinal inpatient and outpatient treatment information and determine patients' receipt of ADT. Our study population was defined as men 66 years and older diagnosed with advanced prostate cancer anytime from January 1, 2004 to December 31, 2009 who received ADT anytime during 2004 to 2010. After excluding men who did not have continuous Medicare coverage from 2004 to 2010, who did not receive either CADT or IADT, or who were missing diagnosis age or PSA measures, a total of 4548 patients were included in this illustrative data example. Among these patients, 71.7% received continuous ADT treatment and 45.0% died before December 31, 2010. The median age at diagnosis was 75 years and the median PSA level was 16.2 ng/mL. To illustrate the proposed estimation procedure, we randomly selected 300 cases from the complete dataset, that is, 6.6% of the available 4548 cases. The selected subset has a median age at diagnosis of 75 years and a median PSA level of 17.1 ng/mL. Additionally, 68.7% of the cases in the subset received continuous ADT treatment and 46% died before December 31, 2010.

We fitted a Cox model to analyze survival after diagnosis of prostate cancer using diagnosis age (66-75, 76-80, and > 80) and PSA level (0-40 and > 40) as categorical covariates. Table 5 shows the maximum partial likelihood estimators and their standard errors by using the complete dataset with 4548 cases and the subset with 300 selected cases. We also applied the two double empirical likelihood estimators to synthesize information from the subset data with three sets of auxiliary survival information at the 5-year landmark: (I) the five-year survival probabilities were 70.5% and 34.5% for IADT-treated patients whose PSA levels were below and above 40 ng/mL at diagnosis; (II) the five-year survival probabilities were 64.2% and 20.1% for CADT-treated patients whose PSA levels were below and above 40 ng/mL at diagnosis; (III) the aforementioned survival information is available in both the IADT and CADT groups. Note that these auxiliary survival probabilities were derived by applying the Kaplan-Meier estimator to the corresponding subgroups from the complete dataset. To obtain the standard errors for the estimated regression coefficients, we adopted a nonparametric bootstrap method by sampling 300 subjects with replacement from the subset data. The resampling procedure was repeated 1000 times, and the standard errors were estimated with the standard deviation of the 1000 estimates.
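
For readers who wish to derive landmark survival probabilities of this kind from their own registry extract, a minimal sketch of the Kaplan-Meier (product-limit) estimator evaluated at a single time point is given below. The function name and the toy data are hypothetical; this is not the code used for the analysis.

```python
def kaplan_meier_at(times, events, t_star):
    """Kaplan-Meier estimate of S(t_star).

    times  : observed follow-up times (event or censoring)
    events : 1 if the observed time is a death, 0 if it is censored
    """
    surv = 1.0
    # Distinct death times up to the landmark, in increasing order.
    death_times = sorted({t for t, d in zip(times, events) if d == 1 and t <= t_star})
    for u in death_times:
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, d in zip(times, events) if t == u and d == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

if __name__ == "__main__":
    # Toy subgroup: follow-up in years, with two censored subjects.
    toy_times = [1.0, 2.0, 3.0, 4.0, 5.0]
    toy_events = [1, 0, 1, 1, 0]
    print(kaplan_meier_at(toy_times, toy_events, t_star=3.5))
```

Applying such a function separately to each treatment-by-PSA subgroup at t* = 5 years would give one auxiliary probability per subgroup.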

Table 5.

Estimated regression coefficients of the Cox model for the prostate cancer study.

	cADT		PSA		cADT*PSA		76-80 years		>80 years		ρ
	Coef	SE	Coef	SE	Coef	SE	Coef	SE	Coef	SE	Coef	SE
Complete	0.193	0.066	0.905	0.084	0.359	0.099	0.316	0.058	0.764	0.051	–	–
Subset	0.433	0.251	1.121	0.298	0.117	0.377	0.268	0.254	0.822	0.205	–	–
DEL I	0.309	0.156	1.076	0.083	0.158	0.251	0.266	0.248	0.804	0.198	–	–
DEL_ρ I	0.414	0.213	1.081	0.084	0.155	0.252	0.270	0.251	0.823	0.204	1.132	0.212
DEL II	0.382	0.220	1.125	0.300	0.193	0.309	0.259	0.251	0.812	0.199	–	–
DEL_ρ II	0.398	0.229	1.125	0.299	0.192	0.308	0.255	0.249	0.805	0.199	0.977	0.115
DEL III	0.236	0.059	1.076	0.083	0.240	0.106	0.259	0.248	0.797	0.194	–	–
DEL_ρ III	0.237	0.060	1.077	0.084	0.241	0.107	0.263	0.249	0.806	0.199	1.022	0.111

NOTE: Coef, the estimated coefficient; SE, the bootstrap standard error given by the standard deviation of the 1000 estimates. SE for the complete dataset with 4548 cases is the asymptotic standard error estimate. DEL, the double empirical likelihood estimator β̂; DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data.
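
The bootstrap standard error described in the note can be sketched generically as follows. The function `bootstrap_se` and its arguments are hypothetical, and the sample mean stands in for the estimator purely for illustration; in the analysis the estimator would be the fitted regression coefficient.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Standard deviation of `estimator` over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n subjects with replacement from the observed data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)

if __name__ == "__main__":
    toy = [float(i) for i in range(1, 301)]  # 300 toy observations
    print(bootstrap_se(toy, statistics.fmean))
```

Each element of `data` should represent one subject (for example, a tuple of follow-up time, event indicator, and covariates) so that subjects are resampled as whole units.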

As expected, incorporating auxiliary survival information in the data analysis using the proposed methods yields smaller standard errors than the conventional partial likelihood approach using the subset data, and the largest efficiency gains are observed in Scenario III where a greater amount of information is incorporated. Of note, the estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data gives results almost identical to those of the estimator that assumes consistency between the two populations. As expected, the estimated value of ρ is very close to 1 in all scenarios because the selected subset is a random sample of the complete data. Interestingly, the analysis with the complete data shows that the comparative effectiveness of IADT versus CADT differs significantly by patients' PSA level after adjusting for age at diagnosis. This significance disappears when the conventional analysis is applied to the subset data but is recovered by the proposed approaches with auxiliary survival information, highlighting the practical value of a more efficient methodology.
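
The statements about significance can be checked against Table 5 with Wald-type statistics. The sketch below assumes a two-sided test of a zero coefficient under a normal approximation, which the text does not state explicitly; the helper `wald_p_value` is hypothetical.

```python
import math

def wald_p_value(coef, se):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# cADT*PSA interaction column of Table 5: (coefficient, standard error).
interaction = {
    "Complete": (0.359, 0.099),
    "Subset": (0.117, 0.377),
    "DEL III": (0.240, 0.106),
    "DEL_rho III": (0.241, 0.107),
}

if __name__ == "__main__":
    for label, (coef, se) in interaction.items():
        print(f"{label}: z = {coef / se:.2f}, p = {wald_p_value(coef, se):.3f}")
```

Under this assumption the interaction is significant at the 5% level for the complete data and for Scenario III, but not for the partial likelihood fit to the subset.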

5.2.2 Example 2: Pancreatic cancer study

We now apply the extended double empirical likelihood estimator to another example where the auxiliary survival information may be inconsistent with individual-level data. We analyzed data from a pancreatic cancer study conducted at the Johns Hopkins Hospital to study risk factors affecting survival following pancreatectomy. Pancreatic ductal adenocarcinoma (PDAC), the most common histological subtype of pancreatic malignancy, is a very resilient disease with a very poor prognosis. To date, radical surgical resection remains the only treatment for PDAC that offers clinical benefit in terms of overall survival. Unfortunately, less than 20% of pancreatic cancer patients have surgically resectable disease at the time of diagnosis, and the majority of resected pancreatic cancer recurs within 5 years. Despite advances in the treatment of cancer during the past few decades, improvement in long-term survival of PDAC patients has been modest. The major risk factors influencing survival after pancreatic cancer surgery are tumor characteristics. Favorable prognostic factors include negative resection margin, negative lymph node, and absence of perineural invasion.

This data example is from a retrospective cohort study of 209 consecutive patients who had surgical resection of PDAC and follow-up at the Johns Hopkins Hospital from January 9, 1998 to June 13, 2007. Thorough chart reviews were conducted to ascertain patient's demographics and results of laboratory test, clinical and pathological exams. Treatment data were collected from the electronic medical records. Disease recurrence was determined clinically through imaging studies (computed tomography, positron emission tomography) or pathological diagnosis (CT-guided biopsy, wedge resection or lobectomy). All-cause and cancer-specific deaths and dates of death were determined by a combined review of clinical follow-up information, Social Security Death Index, and the National Cancer Database.

We fitted a Cox model to evaluate the effects of presence of lymph nodes, positive resection margins, presence of perineural invasion (PNI), age at surgery (≤ 65 and > 65), and gender on overall survival. Table 6 shows the estimated covariate effects using the partial likelihood method. We also applied the extended double empirical likelihood estimator β̂_ρ to synthesize three sets of auxiliary survival information reported in Cameron et al. (2006): (I) the three-year survival probabilities for node-negative and node-positive patients were 40% and 26%, respectively; (II) the three-year survival probabilities for margin-negative and margin-positive patients were 35% and 20%, respectively; (III) all four survival probabilities given in (I) and (II). These survival probabilities were estimated from 1000 consecutive pancreatectomies performed by a single surgeon between March 1969 and May 2003.

Table 6.

Estimated regression coefficients of the Cox model for the pancreatic cancer study.

	Nodes		Margins		PNI		> 65 years		Male		ρ
	Coef	SE	Coef	SE	Coef	SE	Coef	SE	Coef	SE	Coef	SE
PL	0.37	0.22	0.41	0.17	1.09	0.42	0.28	0.16	-0.29	0.15	–	–
DEL_ρ I	0.25	0.10	0.42	0.17	1.09	0.40	0.29	0.16	-0.29	0.16	0.80	0.08
DEL_ρ II	0.37	0.23	0.34	0.06	1.10	0.40	0.27	0.16	-0.30	0.16	0.80	0.08
DEL_ρ III	0.27	0.08	0.36	0.05	1.10	0.41	0.28	0.16	-0.30	0.15	0.80	0.08

NOTE: Coef, the estimated coefficient; SE, the bootstrap standard error given by the standard deviation of the 1000 estimates. SE for the maximum partial likelihood estimator is the asymptotic standard error estimate. DEL_ρ, the extended double empirical likelihood estimator β̂_ρ that allows for a different baseline hazard function for the aggregate data.

We only reported the results of β̂_ρ in Table 6 because the baseline hazard function in the aggregate data is expected to be different due to potential differences in patients' characteristics. As before, incorporating auxiliary survival information in the data analysis using the proposed methods yields smaller standard errors than the conventional partial likelihood approach, and the largest efficiency gains are observed in Scenario III where a greater amount of information is incorporated. Note that the effect of positive lymph nodes is only marginally significant when applying the partial likelihood method, but becomes statistically significant when combining with the auxiliary survival information. The estimated value of ρ is 0.8 and is significantly lower than 1, indicating that the baseline risk in patients in the study reported by Cameron et al. (2006) is lower than that in patients in our clinical study.
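
As a rough check, the Table 6 entries can be turned into Wald-type z statistics. The normal approximation and the two-sided p-values below are assumptions made for illustration; the text does not state which test underlies the reported significance.

```latex
\begin{aligned}
\text{Nodes (PL):} \quad & z = 0.37 / 0.22 \approx 1.68, & p &\approx 0.09, \\
\text{Nodes (DEL}_{\rho}\text{ I):} \quad & z = 0.25 / 0.10 = 2.50, & p &\approx 0.012, \\
H_{0} : \rho = 1 : \quad & z = (0.80 - 1) / 0.08 = -2.50, & p &\approx 0.012 .
\end{aligned}
```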

6. Remarks

In this paper, we have proposed two double empirical likelihood approaches to synthesize information from both patient-level right-censored survival data and the auxiliary survival information. We first construct an efficient estimation procedure by imposing consistency between the individual-level data and the aggregate data, and then extend the estimation procedure to allow for potential differences in the survival models for the two data sources. Many researchers, including Imbens and Lancaster (1994), Hellerstein and Imbens (1999), and Chaudhuri et al. (2008), have considered estimation of the general/generalized linear models by imposing additional moment restrictions. Most of the existing work deals with complete data, while this paper considers estimation of the semiparametric Cox proportional hazards model using right-censored survival data by incorporating a system of nonlinear constraints. The simulation studies show large gains in efficiency by incorporating marginal moments from published survival probability information. We believe that the proposed methodologies will have a significant impact on the practice of meta-analysis.

The proposed approaches are more flexible than the conventional meta-analysis in the sense that they can automatically combine survival information for different subgroups and the information may be derived from different studies. For example, one study may publish survival probabilities for different age groups, while the other study may publish survival probabilities for different disease stages. The proposed double empirical likelihood methods provide a unified framework to incorporate all the available information in the form of non-linear constraints.

The sample size of the external data source, such as disease registries, is in general much larger than that of the individual-level data. As a result, the variability in the published survival information is usually negligible compared to the variability in the parameter estimates using the individual-level data. In the case where the variability is not negligible, a higher-order Taylor expansion needs to be employed to summarize the auxiliary survival information as estimating equations. This will be explored elsewhere.

Acknowledgments

The authors thank the editor, the associate editor, and two referees for their constructive comments. Thanks are due to Dr. Lei Zheng for kindly sharing the pancreatic cancer data. The authors also acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development and Information, CMS; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. This work is supported by the National Institutes of Health.

Appendix: Large-Sample Properties

Proof of Asymptotic Normality for U₀. Define N_i(t) = Δ_iI(Y_i ≤ t) and $M_{i} (t) = N_{i} (t) - \int_{0}^{t} I (Y_{i} \geq u) \exp (β_{0}^{'} X_{i}) d Λ_{0} (u)$. Thus we have

\begin{aligned} U_{01} &= \sum_{i = 1}^{n} \int_{0}^{\infty} \left\{ X_{i} - \frac{S^{(1)} (u, β_{0})}{S^{(0)} (u, β_{0})} \right\} d M_{i} (u), \\ U_{02} &= \sum_{i = 1}^{n} ψ (X_{i}, β_{0}, α_{0}), \\ U_{03} &= \sum_{i = 1}^{n} \int_{0}^{\infty} \frac{I (u \leq t^{*})}{S^{(0)} (u, β_{0})} d M_{i} (u), \\ U_{04} &= 0 . \end{aligned}

For convenience, let $s^{(k)} (t, β)$ denote the limit of $S^{(k)} (t, β)$, that is, $s^{(k)} (t, β) = \lim_{n \to \infty} S^{(k)} (t, β)$ for k ∈ {0, 1, 2}. Because M_i(t) is a local square-integrable martingale and X_i − S⁽¹⁾(u, β₀)/S⁽⁰⁾(u, β₀) and I(u ≤ t*)/S⁽⁰⁾(u, β₀) are both predictable processes, we have E(U₀₁) = 0, E(U₀₃) = 0,

Σ \equiv \operatorname{var} (n^{- 1 / 2} U_{01}) = \int_{0}^{\infty} \left[ s^{(2)} (u, β_{0}) - \{s^{(0)} (u, β_{0})\}^{- 1} \{s^{(1)} (u, β_{0})\}^{\otimes 2} \right] d Λ_{0} (u),

K \equiv \operatorname{var} (n^{- 1 / 2} U_{03}) = \int_{0}^{t^{*}} \{s^{(0)} (u, β_{0})\}^{- 1} d Λ_{0} (u),

and

\operatorname{cov} (U_{01}, U_{03}) = \sum_{i = 1}^{n} \int_{0}^{\infty} E \left[ \left\{ X_{i} - \frac{S^{(1)} (u, β_{0})}{S^{(0)} (u, β_{0})} \right\} \frac{I (u \leq t^{*})}{S^{(0)} (u, β_{0})} \exp (β_{0}^{'} X_{i}) I (Y_{i} \geq u) \right] d Λ_{0} (u) = 0 .

Moreover, by double expectation, it can be shown that $E (U_{01}^{'} U_{02}) = 0$ and E(U₀₃U₀₂) = 0. Define $J = \operatorname{var} (n^{- 1 / 2} U_{02})$. It is easy to see that |ψ_k(X, β₀, α₀)| ≤ 2 for k = 1, …, K. Thus, by the martingale central limit theorem and the classic central limit theorem, U₀ converges in distribution to a zero mean multivariate normal distribution with the variance-covariance matrix Ω as n → ∞.

Proof of Theorem 1. Arguing as in the proof of Lemma 1 in Qin and Lawless (1994), under some regularity conditions, we can show that the full constraint empirical likelihood attains the maximum at some point (β̂, α̂) in the interior of the ball $\{(β, α) : \| (β, α) - (β_{0}, α_{0}) \| \leq n^{- 1 / 3}\}$ with probability 1. Next, straightforward algebra yields

- n^{- 1} E \left( \frac{\partial U (θ)}{\partial θ} \Big|_{θ = θ_{0}} \right) = \begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix},

(4)

where

\begin{aligned} B &= - n^{- 1} \left\{ E \left( \frac{\partial U_{1}}{\partial ξ} \right) \Big|_{θ = θ_{0}}, E \left( \frac{\partial U_{1}}{\partial ν} \right) \Big|_{θ = θ_{0}}, E \left( \frac{\partial U_{1}}{\partial α} \right) \Big|_{θ = θ_{0}} \right\} \\ &= \left[ E \left\{ \frac{\partial ψ}{\partial β} (X_{1}, β_{0}, α_{0}) \right\}, - \int_{0}^{t^{*}} \frac{s^{(1)} (u, β_{0})}{s^{(0)} (u, β_{0})} d Λ_{0} (u), 0 \right] \\ &:= (B_{1}, B_{2}, 0), \end{aligned}

and, similarly,

Q = \begin{pmatrix} J & 0 & H \\ 0 & K & 1 \\ H^{'} & 1 & 0 \end{pmatrix},

with H = −n⁻¹E {∂ψ/∂α(X₁, β₀, α₀)}. It follows from H′KH + J being positive definite that Q is negative definite. By singular value decomposition, one can derive

Q^{- 1} = \begin{pmatrix} I & 0 & - J^{- 1} H \\ 0 & 1 & - K^{- 1} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} J^{- 1} & 0 & 0 \\ 0 & K^{- 1} & 0 \\ 0 & 0 & - (H^{'} J^{- 1} H + K^{- 1})^{- 1} \end{pmatrix} \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ - H^{'} J^{- 1} & - K^{- 1} & 1 \end{pmatrix} .
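
The factorization of Q⁻¹ can be verified numerically. The sketch below does so in the simplest case of a single constraint, so that J, K, and H are scalars; the numerical values and function names are arbitrary and purely illustrative.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def q_and_inverse(j, k, h):
    """Return Q and the three-factor expression for its inverse (scalar J, K, H)."""
    g = h * h / j + 1.0 / k  # H' J^{-1} H + K^{-1}
    q = [[j, 0.0, h], [0.0, k, 1.0], [h, 1.0, 0.0]]
    left = [[1.0, 0.0, -h / j], [0.0, 1.0, -1.0 / k], [0.0, 0.0, 1.0]]
    middle = [[1.0 / j, 0.0, 0.0], [0.0, 1.0 / k, 0.0], [0.0, 0.0, -1.0 / g]]
    right = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-h / j, -1.0 / k, 1.0]]
    return q, matmul(matmul(left, middle), right)

if __name__ == "__main__":
    q, q_inv = q_and_inverse(j=2.0, k=3.0, h=0.5)
    for row in matmul(q, q_inv):  # should be numerically the identity
        print([round(x, 12) for x in row])
```

The three factors have the form of a block triangular, block diagonal, block triangular decomposition of Q.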

Define Γ = Σ + BQ⁻¹B′, then

\begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix}^{- 1} = \begin{pmatrix} Γ^{- 1} & - Γ^{- 1} B Q^{- 1} \\ Q^{- 1} B^{'} Γ^{- 1} & (Q + B^{'} Σ^{- 1} B)^{- 1} \end{pmatrix} .

By definition, θ̂ is the solution to U(θ̂) = 0. Write $U = {(U_{01}^{'}, U^{*'})}^{'}$ with $U^{*} = {(U_{02}^{'}, U_{03}, 0)}^{'}$ . Then, by (4) and a Taylor series expansion, we have

n^{1 / 2} (\hat{θ} - θ_{0}) = \begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix}^{- 1} \begin{pmatrix} n^{- 1 / 2} U_{01} \\ n^{- 1 / 2} U^{*} \end{pmatrix} + o_{p} (1) .

(5)

Thus we establish the asymptotic representation $n^{1 / 2} (\hat{β} - β_{0}) = Γ^{- 1} (n^{- 1 / 2} U_{01}) - Γ^{- 1} B Q^{- 1} (n^{- 1 / 2} U^{*}) + o_{p} (1)$. Because U₀₁ and U* are orthogonal, the asymptotic variance of $n^{1 / 2} (\hat{β} - β_{0})$ is given by

\begin{aligned} & Γ^{- 1} Σ Γ^{- 1} + Γ^{- 1} B Q^{- 1} \begin{pmatrix} J & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & 0 \end{pmatrix} Q^{- 1} B^{'} Γ^{- 1} \\ &= Γ^{- 1} (Γ - B Q^{- 1} B^{'}) Γ^{- 1} + Γ^{- 1} B Q^{- 1} \begin{pmatrix} J & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & 0 \end{pmatrix} Q^{- 1} B^{'} Γ^{- 1} \\ &= Γ^{- 1} - Γ^{- 1} B Q^{- 1} \begin{pmatrix} 0 & 0 & H \\ 0 & 0 & 1 \\ H^{'} & 1 & 0 \end{pmatrix} Q^{- 1} B^{'} Γ^{- 1} . \end{aligned}

Let G = H′J⁻¹H + K⁻¹. It can be verified that

Q^{- 1} \begin{pmatrix} 0 & 0 & H \\ 0 & 0 & 1 \\ H^{'} & 1 & 0 \end{pmatrix} Q^{- 1} = \begin{pmatrix} 0 & 0 & J^{- 1} H G^{- 1} \\ 0 & 0 & K^{- 1} G^{- 1} \\ G^{- 1} H^{'} J^{- 1} & G^{- 1} K^{- 1} & - 2 G^{- 1} \end{pmatrix} .

Straightforward algebra yields

B Q^{- 1} \begin{pmatrix} 0 & 0 & H \\ 0 & 0 & 1 \\ H^{'} & 1 & 0 \end{pmatrix} Q^{- 1} B^{'} = 0 .

(6)

Hence the asymptotic variance of $n^{1 / 2} (\hat{β} - β_{0})$ is given by Γ⁻¹ = (Σ + BQ⁻¹B′)⁻¹ ≤ Σ⁻¹.
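
The final inequality rests on BQ⁻¹B′ being positive semidefinite. One way to see this, sketched here with the auxiliary symbols A, c, P, and B̄ introduced only for illustration and assuming that J and K are positive definite, is to note that B = (B₁, B₂, 0) picks out the leading block of Q⁻¹:

```latex
\begin{aligned}
A &= \begin{pmatrix} J & 0 \\ 0 & K \end{pmatrix}, \quad
c = \begin{pmatrix} H \\ 1 \end{pmatrix}, \quad
Q = \begin{pmatrix} A & c \\ c' & 0 \end{pmatrix}, \quad
G = c' A^{-1} c = H' J^{-1} H + K^{-1}, \\
[Q^{-1}]_{11} &= A^{-1} - A^{-1} c \, G^{-1} c' A^{-1}
  = A^{-1/2} (I - P) A^{-1/2}, \quad
P = A^{-1/2} c \, (c' A^{-1} c)^{-1} c' A^{-1/2}, \\
B Q^{-1} B' &= \bar{B} \, A^{-1/2} (I - P) A^{-1/2} \, \bar{B}' \succeq 0, \quad
\bar{B} = (B_{1}, B_{2}).
\end{aligned}
```

Since P is an orthogonal projection, I − P is positive semidefinite, so Γ = Σ + BQ⁻¹B′ ⪰ Σ and therefore Γ⁻¹ ⪯ Σ⁻¹.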

Proof of Asymptotic Normality for Λ̂(t). To establish the asymptotic normality of Λ̂(t), we first note that the double empirical likelihood-based estimator for the baseline cumulative hazard function can be reexpressed as Λ̂(t, β̂, ν̂), where

\hat{Λ} (t, β, ν) = \int_{0}^{t} \frac{n^{- 1} \sum_{i = 1}^{n} d N_{i} (u)}{S^{(0)} (u, β) + ν I (u \leq t^{*})} .

Note that Λ̂ defines a functional of two empirical processes $n^{- 1} \sum_{i = 1}^{n} d N_{i} (u)$ and S⁽⁰⁾(u, β) + νI(u ≤ t*), and the mapping defined by Λ̂ is compactly differentiable. Define F^u(t) = E{N₁(t)} and let Λ₀(t) be the true baseline cumulative hazard function. It can be shown that Λ̂(t, β₀, 0) converges almost surely to $\int_{0}^{t} d F^{u} (u) / s^{(0)} (u, β_{0}) = Λ_{0} (t)$ .

A Taylor series expansion of Λ̂(t, β̂, ν̂) about (β₀, 0) yields

\hat{Λ} (t, \hat{β}, \hat{ν}) = \hat{Λ} (t, β_{0}, 0) + \left\{ \frac{\partial \hat{Λ} (t, β, ν)}{\partial β} \Big|_{(β_{t}^{*}, 0)} \right\}^{'} (\hat{β} - β_{0}) + \frac{\partial \hat{Λ} (t, β, ν)}{\partial ν} \Big|_{(β_{0}, ν_{t}^{*})} (\hat{ν} - 0),

(7)

where $(β_{t}^{*}, ν_{t}^{*})$ lies between (β̂, ν̂) and (β₀, 0). Given the consistency of θ̂ and by the Glivenko-Cantelli theorem, we can show that almost surely

\frac{\partial \hat{Λ} (t, β, ν)}{\partial β} \Big|_{(β_{t}^{*}, 0)} \to E \left\{ \frac{\partial \hat{Λ} (t, β, ν)}{\partial β} \Big|_{(β_{0}, 0)} \right\} = - \int_{0}^{t} \frac{s^{(1)} (u, β_{0})}{\{s^{(0)} (u, β_{0})\}^{2}} d F^{u} (u)

and

\frac{\partial \hat{Λ} (t, β, ν)}{\partial ν} \Big|_{(β_{0}, ν_{t}^{*})} \to E \left\{ \frac{\partial \hat{Λ} (t, β, ν)}{\partial ν} \Big|_{(β_{0}, 0)} \right\} = - \int_{0}^{t} \frac{I (u \leq t^{*})}{\{s^{(0)} (u, β_{0})\}^{2}} d F^{u} (u)

as n → ∞. Moreover, applying the functional delta method to Λ̂(t, β₀, 0) yields

\hat{Λ} (t, β_{0}, 0) - Λ_{0} (t) = \int_{0}^{t} \frac{n^{- 1} \sum_{i = 1}^{n} d N_{i} (u)}{s^{(0)} (u, β_{0})} - \int_{0}^{t} \frac{S^{(0)} (u, β_{0})}{\{s^{(0)} (u, β_{0})\}^{2}} d F^{u} (u) + o_{p} (n^{- 1 / 2}) .

(8)

It follows from (5), (7), and (8) that $n^{1 / 2} \{\hat{Λ} (t, \hat{β}, \hat{ν}) - Λ_{0} (t)\}$ is asymptotically equivalent to a sum of i.i.d. monotone processes with bounded second moments, and thus, following Example 2.11.16 of van der Vaart and Wellner (1996), converges weakly to a mean zero Gaussian process.

Proof of Theorem 2. Note that (β̃, α̃) is the solution to U₁(β, 0, 0, α) = 0, and U₃(β, 0, 0, α) = 0. Hence (β̃, α̃) has the following asymptotic representation

n^{1 / 2} \begin{pmatrix} \tilde{β} - β_{0} \\ \tilde{α} - α_{0} \end{pmatrix} = n^{- 1 / 2} \begin{pmatrix} Σ & 0 \\ - B_{2}^{'} & 1 \end{pmatrix}^{- 1} \begin{pmatrix} U_{01} \\ U_{03} \end{pmatrix} + o_{p} (1) = n^{- 1 / 2} \begin{pmatrix} Σ^{- 1} & 0 \\ B_{2}^{'} Σ^{- 1} & 1 \end{pmatrix} \begin{pmatrix} U_{01} \\ U_{03} \end{pmatrix} + o_{p} (1) .

It follows from (5) and $n^{1 / 2} (H^{'} \hat{ξ} + \hat{ν}) = n^{- 1 / 2} U_{04} + o_{p} (1) = o_{p} (1)$ that

n^{1 / 2} \begin{pmatrix} \hat{β} - \tilde{β} \\ \hat{ξ} \\ \hat{ν} \\ \hat{α} - \tilde{α} \end{pmatrix} = n^{1 / 2} \begin{pmatrix} Σ^{- 1} (- B_{1} + B_{2} H^{'}) \\ I \\ - H^{'} \\ - B_{2}^{'} Σ^{- 1} B_{1} + (B_{2}^{'} Σ^{- 1} B_{2} + K) H^{'} \end{pmatrix} \hat{ξ} + o_{p} (1) .

Expanding the partial likelihood ℓ(β̃, 0, 0, α̃) at θ̂ yields

\begin{aligned} R &= 2 \{ℓ (\tilde{β}, 0, 0, \tilde{α}) - ℓ (\hat{β}, \hat{ξ}, \hat{ν}, \hat{α})\} \\ &= n (\hat{β}^{'} - \tilde{β}^{'}, \hat{ξ}^{'}, \hat{ν}, \hat{α} - \tilde{α}) \begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix} (\hat{β}^{'} - \tilde{β}^{'}, \hat{ξ}^{'}, \hat{ν}, \hat{α} - \tilde{α})^{'} + o_{p} (1) \\ &= n^{1 / 2} \hat{ξ}^{'} \left\{ (B_{1}^{'} - H B_{2}^{'}) Σ^{- 1} (B_{1} - B_{2} H^{'}) + J + H K H^{'} \right\} n^{1 / 2} \hat{ξ} + o_{p} (1) . \end{aligned}

Define $W = (B_{1}^{'} - H B_{2}^{'}) Σ^{- 1} (B_{1} - B_{2} H^{'}) + J + H K H^{'}$ and denote the asymptotic variance of $n^{1 / 2} \hat{ξ}$ by

V_{ξ} = (0, I, 0, 0) \begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix}^{- 1} Ω \begin{pmatrix} Σ & B \\ - B^{'} & Q \end{pmatrix}^{- 1} (0, I, 0, 0)^{'} .

By tedious algebra, we can show that VWVWV = VWV and rank(WV) = K, that is, the quadratic form of the asymptotically normally distributed random variable ξ̂ satisfies the conditions of the Ogasawara-Takahashi Theorem (Rao, 1973, page 188). Thus we prove that R converges in distribution to a χ² distribution with K degrees of freedom as n → ∞.

Proof of Theorem 3. The proof of Theorem 3 closely follows that of Theorem 1. Hence we only highlight their differences. It is easy to see that Ũ₀₁ = U₀₁,

{\tilde{U}}_{02} = \sum_{i = 1}^{n} \tilde{ψ} (X_{i}, β_{0}, α_{0}, ρ_{0}),

Ũ₀₃ = U₀₃, Ũ₀₄ = U₀₄ = 0, Ũ₀₅ = U₀₅ = 0. Arguing as before, ${\tilde{U}}_{0} = {({\tilde{U}}_{01}^{'}, {\tilde{U}}_{02}^{'}, {\tilde{U}}_{03}, {\tilde{U}}_{04}, {\tilde{U}}_{05})}^{'}$ converges in distribution to a zero mean multivariate normal distribution with the variance-covariance matrix Ω̃ as n → ∞, where

\tilde{Ω} = \begin{pmatrix} Σ & 0 & 0 & 0 & 0 \\ 0 & \tilde{J} & 0 & 0 & 0 \\ 0 & 0 & K & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},

where $\tilde{J} = \operatorname{var} (n^{- 1 / 2} {\tilde{U}}_{02})$.

Define the matrices

Φ = \begin{pmatrix} \tilde{Q} & L \\ L^{'} & 0 \end{pmatrix},

where

L = - n^{- 1} \left\{ E \left( \frac{\partial {\tilde{U}}_{5}}{\partial ξ} \right) \Big|_{θ = {\tilde{θ}}_{0}}, E \left( \frac{\partial {\tilde{U}}_{5}}{\partial ν} \right) \Big|_{θ = {\tilde{θ}}_{0}}, E \left( \frac{\partial {\tilde{U}}_{5}}{\partial α} \right) \Big|_{θ = {\tilde{θ}}_{0}} \right\}^{'} = \left[ - E \left\{ \frac{\partial \tilde{ψ}}{\partial ρ} (X_{1}, β_{0}, α_{0}, ρ_{0}) \right\}, 0, 0 \right]^{'} := {(L_{1}^{'}, 0, 0)}^{'},

\tilde{Q} = (\begin{matrix} J & 0 & \tilde{H} \\ 0 & K & 1 \\ {\tilde{H}}^{'} & 1 & 0 \end{matrix}),

and H̃ = −n⁻¹E {∂ψ̃/∂α(X₁, β₀, α₀, ρ₀)}. Straightforward algebra yields

- n^{- 1} E (\frac{\partial \tilde{U}}{\partial θ} |_{θ = {\tilde{θ}}_{0}}) = (\begin{matrix} Σ & κ \\ - κ^{'} & Φ \end{matrix}),

where κ = (B̃, 0) with $\tilde{B} = [E \{ \frac{\partial \tilde{ψ}}{\partial β} (X_{1}, β_{0}, α_{0}, ρ_{0}) \}, B_{2}, 0]$.

By a Taylor series expansion, we have

n^{1 / 2} ({\hat{θ}}_{ρ} - {\tilde{θ}}_{0}) = {(\begin{matrix} Σ & κ \\ - κ^{'} & Φ \end{matrix})}^{- 1} (\begin{matrix} n^{- 1 / 2} {\tilde{U}}_{01} \\ n^{- 1 / 2} {\tilde{U}}^{*} \end{matrix}) + o_{p} (1),

where ${\tilde{U}}^{*} = {({\tilde{U}}_{02}^{'}, {\tilde{U}}_{03}, 0, 0)}^{'}$ . Define Γ̃ = Σ + κΦ⁻¹κ′. Arguing as before, we establish the asymptotic representation n^1/2(β̂_ρ − β₀) = Γ̃⁻¹(n^−1/2Ũ₀₁) − Γ̃⁻¹κΦ⁻¹(n^−1/2Ũ*) + o_p(1). Because Ũ₀₁ and Ũ* are orthogonal, the asymptotic variance of n^1/2(β̂_ρ − β₀) is given by

\begin{aligned} & {\tilde{Γ}}^{- 1} Σ {\tilde{Γ}}^{- 1} + {\tilde{Γ}}^{- 1} κ Φ^{- 1} (\begin{matrix} \tilde{J} & 0 & 0 & 0 \\ 0^{'} & K & 0 & 0 \\ 0^{'} & 0 & 0 & 0 \\ 0^{'} & 0 & 0 & 0 \end{matrix}) Φ^{- 1} κ^{'} {\tilde{Γ}}^{- 1} \\ & = {\tilde{Γ}}^{- 1} (\tilde{Γ} - κ Φ^{- 1} κ^{'}) {\tilde{Γ}}^{- 1} + {\tilde{Γ}}^{- 1} κ Φ^{- 1} (\begin{matrix} \tilde{J} & 0 & 0 & 0 \\ 0^{'} & K & 0 & 0 \\ 0^{'} & 0 & 0 & 0 \\ 0^{'} & 0 & 0 & 0 \end{matrix}) Φ^{- 1} κ^{'} {\tilde{Γ}}^{- 1} \\ & = {\tilde{Γ}}^{- 1} - {\tilde{Γ}}^{- 1} κ Φ^{- 1} (\begin{matrix} 0 & 0 & \tilde{H} & L_{1} \\ 0^{'} & 0 & 1 & 0 \\ {\tilde{H}}^{'} & 1 & 0 & 0 \\ L_{1}^{'} & 0 & 0 & 0 \end{matrix}) Φ^{- 1} κ^{'} {\tilde{Γ}}^{- 1} . \end{aligned}
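The asymptotic representation of n^1/2(β̂_ρ − β₀) is the first block of the solution of a partitioned linear system. The sketch below checks that block-elimination step numerically. All matrices are random hypothetical stand-ins (in particular, `Phi` is taken positive definite here only to guarantee invertibility, whereas the Φ in the proof is not), so it illustrates the algebra rather than the paper's actual quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 4   # hypothetical dimensions of beta and of the remaining parameters

# Hypothetical stand-ins for Sigma, Phi and kappa.
S = rng.standard_normal((p, p))
Sigma = S @ S.T + np.eye(p)          # positive definite
P = rng.standard_normal((q, q))
Phi = P @ P.T + np.eye(q)            # invertible (assumption: positive definite)
kappa = rng.standard_normal((p, q))

# Right-hand side, playing the role of (n^{-1/2} U_01, n^{-1/2} U*).
a = rng.standard_normal(p)
b = rng.standard_normal(q)

# Solve the full partitioned system [[Sigma, kappa], [-kappa', Phi]] x = (a, b).
M = np.block([[Sigma, kappa], [-kappa.T, Phi]])
full_solution = np.linalg.solve(M, np.concatenate([a, b]))

# Closed form for the first block: Gamma^{-1} a - Gamma^{-1} kappa Phi^{-1} b,
# with Gamma = Sigma + kappa Phi^{-1} kappa'.
Gamma = Sigma + kappa @ np.linalg.solve(Phi, kappa.T)
beta_block = np.linalg.solve(Gamma, a - kappa @ np.linalg.solve(Phi, b))

representation_ok = np.allclose(full_solution[:p], beta_block)
print(representation_ok)
```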

Arguing as in the proof of Theorem 1, we can show that

L^{'} {\tilde{Q}}^{- 1} (\begin{matrix} 0 & 0 & \tilde{H} \\ 0^{'} & 0 & 1 \\ {\tilde{H}}^{'} & 1 & 0 \end{matrix}) {\tilde{Q}}^{- 1} L = 0 \quad \text{and} \quad \tilde{B} {\tilde{Q}}^{- 1} (\begin{matrix} 0 & 0 & \tilde{H} \\ 0^{'} & 0 & 1 \\ {\tilde{H}}^{'} & 1 & 0 \end{matrix}) {\tilde{Q}}^{- 1} L = 0 .

Together with (7) and

Φ^{- 1} = {(\begin{matrix} \tilde{Q} & L \\ L^{'} & 0 \end{matrix})}^{- 1} = (\begin{matrix} {\tilde{Q}}^{- 1} - {\tilde{Q}}^{- 1} L {(L^{'} {\tilde{Q}}^{- 1} L)}^{- 1} L^{'} {\tilde{Q}}^{- 1} & {\tilde{Q}}^{- 1} L {(L^{'} {\tilde{Q}}^{- 1} L)}^{- 1} \\ {(L^{'} {\tilde{Q}}^{- 1} L)}^{- 1} L^{'} {\tilde{Q}}^{- 1} & - {(L^{'} {\tilde{Q}}^{- 1} L)}^{- 1} \end{matrix})

we have

κ Φ^{- 1} (\begin{matrix} 0 & 0 & \tilde{H} & L_{1} \\ 0^{'} & 0 & 1 & 0 \\ {\tilde{H}}^{'} & 1 & 0 & 0 \\ L_{1}^{'} & 0 & 0 & 0 \end{matrix}) Φ^{- 1} κ^{'} = (\tilde{B}, 0) {(\begin{matrix} \tilde{Q} & L \\ L^{'} & 0 \end{matrix})}^{- 1} (\begin{matrix} 0 & 0 & \tilde{H} & L_{1} \\ 0^{'} & 0 & 1 & 0 \\ {\tilde{H}}^{'} & 1 & 0 & 0 \\ L_{1}^{'} & 0 & 0 & 0 \end{matrix}) {(\begin{matrix} \tilde{Q} & L \\ L^{'} & 0 \end{matrix})}^{- 1} (\begin{matrix} {\tilde{B}}^{'} \\ 0 \end{matrix}) = 0 .
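The partitioned-inverse expression for Φ⁻¹ used in this step is a standard identity for a bordered matrix with a zero lower-right block. A minimal numerical sketch follows. The matrices `Q` and `L` are random hypothetical stand-ins (with `Q` positive definite so that L′Q⁻¹L is invertible), not the Q̃ and L of the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 4, 2   # hypothetical sizes: Q is m x m, L is m x r

# Hypothetical stand-ins: Q symmetric positive definite, L of full column rank.
G = rng.standard_normal((m, m))
Q = G @ G.T + np.eye(m)
L = rng.standard_normal((m, r))

# Bordered matrix Phi = [[Q, L], [L', 0]].
Phi = np.block([[Q, L], [L.T, np.zeros((r, r))]])

Qi = np.linalg.inv(Q)
S = L.T @ Qi @ L                 # L' Q^{-1} L
Si = np.linalg.inv(S)

# Block formula for Phi^{-1}, written exactly as in the display above.
Phi_inv = np.block([
    [Qi - Qi @ L @ Si @ L.T @ Qi, Qi @ L @ Si],
    [Si @ L.T @ Qi,               -Si],
])

formula_ok = np.allclose(Phi @ Phi_inv, np.eye(m + r))
print(formula_ok)
```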

Hence we prove that var{n^1/2(β̂_ρ − β₀)} = Γ̃⁻¹ = (Σ + κΦ⁻¹κ′)⁻¹ ≤ Σ⁻¹, where Σ⁻¹ is the variance-covariance matrix of the maximum partial likelihood estimator β̂_PL. Moreover, because κΦ⁻¹κ′ = B̃Q̃⁻¹B̃′ − (B̃Q̃⁻¹L)(L′Q̃⁻¹L)⁻¹(B̃Q̃⁻¹L)′ ≤ B̃Q̃⁻¹B̃′, we also prove that β̂_ρ is less efficient than β̂, which is the maximum empirical likelihood estimator for β when the true parameter value ρ₀ = 1 is known.
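The inequality (Σ + κΦ⁻¹κ′)⁻¹ ≤ Σ⁻¹ is an ordering of symmetric matrices: the difference Σ⁻¹ − (Σ + κΦ⁻¹κ′)⁻¹ has no negative eigenvalues whenever the added term is positive semidefinite. The sketch below illustrates this with a hypothetical positive semidefinite matrix `A` standing in for κΦ⁻¹κ′. That stand-in is an assumption made for illustration, not a quantity computed from the model.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2   # hypothetical dimensions

S = rng.standard_normal((p, p))
Sigma = S @ S.T + np.eye(p)      # positive definite information matrix (stand-in)
C = rng.standard_normal((p, q))
A = C @ C.T                      # positive semidefinite stand-in for kappa Phi^{-1} kappa'

# Difference of the two asymptotic variances: partial likelihood minus combined estimator.
diff = np.linalg.inv(Sigma) - np.linalg.inv(Sigma + A)
eigs = np.linalg.eigvalsh((diff + diff.T) / 2)   # symmetrize before eigen-decomposition

# All eigenvalues should be non-negative up to rounding error.
ordering_ok = eigs.min() > -1e-10
print(ordering_ok, eigs)
```

Because `A` has rank q < p in this sketch, one eigenvalue of the difference is numerically zero: the auxiliary term leaves the variance unchanged in that direction and reduces it in the others.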

Contributor Information

Chiung-Yu Huang, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, Maryland 21205.

Jing Qin, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.

Huei-Ting Tsai, Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20007.

References

Breslow NE. Discussion of the paper by D. R. Cox. Journal of the Royal Statistical Society, Series B. 1972;34:216–217.
Cameron JL, Riall TS, Coleman J, Belcher KA. One Thousand Consecutive Pancreaticoduodenectomies. Annals of Surgery. 2006;244:10. doi: 10.1097/01.sla.0000217673.04165.ea.
Chaudhuri S, Handcock MS, Rendall MS. Generalized Linear Models Incorporating Population Level Information: An Empirical-Likelihood-Based Approach. Journal of the Royal Statistical Society, Series B. 2008;70:311–328. doi: 10.1111/j.1467-9868.2007.00637.x.
Chen J, Qin J. Empirical Likelihood Estimation for Finite Populations and the Effective Usage of Auxiliary Information. Biometrika. 1993;80:107–116.
Chen J, Sitter RR, Wu C. Using Empirical Likelihood Methods to Obtain Range Restricted Weights in Regression Estimators for Surveys. Biometrika. 2002;89:230–237.
Chen J, Wu C. Estimation of Distribution Function and Quantiles Using the Model-Calibrated Pseudo Empirical Likelihood Method. Statistica Sinica. 2002;12:1223–1239.
Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
DiCiccio T, Hall P, Romano J. Empirical Likelihood is Bartlett-Correctable. The Annals of Statistics. 1991;19:1053–1061.
Hellerstein JK, Imbens GW. Imposing Moment Restrictions from Auxiliary Data by Weighting. Review of Economics and Statistics. 1999;81:1–14.
Hussain M, Tangen CM, Berry DL, Higano CS, Crawford ED, Liu G, Wilding G, Prescott S, Kanaga Sundaram S, Small EJ, Dawson NA, Donnelly BJ, Venner PM, Vaishampayan UN, Schellhammer PF, Quinn DI, Raghavan D, Ely B, Moinpour CM, Vogelzang NJ, Thompson IM. Intermittent Versus Continuous Androgen Deprivation in Prostate Cancer. New England Journal of Medicine. 2013;368:1314–1325. doi: 10.1056/NEJMoa1212299.
Imbens GW. Generalized Method of Moments and Empirical Likelihood. Journal of Business & Economic Statistics. 2002;20:493–506.
Imbens GW, Lancaster T. Combining Micro and Macro Data in Microeconometric Models. The Review of Economic Studies. 1994;61:655–680.
Kovalchick SA. Aggregate-Data Estimation of an Individual Patient Data Linear Random Effects Meta-Analysis With a Patient Covariate-Treatment Interaction Term. Biostatistics. 2013;14:273–283. doi: 10.1093/biostatistics/kxs035.
Li G, Li R, Zhou M. Empirical Likelihood in Survival Analysis. In: Fang K, Fan J, Li G, editors. Contemporary Multivariate Analysis and Design of Experiments. Singapore: World Scientific Publishing Co. Inc.; 2005. pp. 337–350.
Liu D, Zheng Y, Prentice RL, Hsu L. Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative. Journal of the American Statistical Association. 2014;109:514–524. doi: 10.1080/01621459.2014.881739.
Owen AB. Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika. 1988;75:237–249.
Owen AB. Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics. 1990;18:90–120.
Qin J, Lawless J. Estimating Equations, Empirical Likelihood and Constraints on Parameters. Canadian Journal of Statistics. 1995;23:145–159.
Qin J, Lawless JF. Empirical Likelihood and General Estimating Equations. The Annals of Statistics. 1994;22:300–325.
Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley; 1973.
Ren JJ, Zhou M. Full Likelihood Inferences in the Cox Model: An Empirical Likelihood Approach. Annals of the Institute of Statistical Mathematics. 2011;63:1005–1018.
Simmonds MC, Higgins JP. Covariate Heterogeneity in Meta-Analysis: Criteria for Deciding Between Meta-Regression and Individual Patient Data. Statistics in Medicine. 2007;26:2982–2999. doi: 10.1002/sim.2768.
Thomas DR, Grunkemeier GL. Confidence Interval Estimation of Survival Probabilities for Censored Data. Journal of the American Statistical Association. 1975;70:865–871.
Tsai HT, Penson DF, Makambi KH, Lynch JH, Van Den Eeden SK, Potosky AL. Efficacy of Intermittent Androgen Deprivation Therapy vs Conventional Continuous Androgen Deprivation Therapy for Advanced Prostate Cancer: a Meta-Analysis. Urology. 2013;82:327–334. doi: 10.1016/j.urology.2013.01.078.
van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
Wu C, Sitter RR. A Model-Calibration Approach to Using Complete Auxiliary Information From Survey Data. Journal of the American Statistical Association. 2001;96:185–193.
Zhou M. The Cox Proportional Hazards Model With Partially Known Baseline. In: Hsiung AC, Ying Z, Zhang CH, editors. Random Walk, Sequential Analysis and Related Topics. Singapore: World Scientific Publishing Co.; 2006. pp. 215–232.
