Author manuscript; available in PMC: 2017 Aug 18.
Published in final edited form as: J Am Stat Assoc. 2016 Aug 18;111(514):787–799. doi: 10.1080/01621459.2015.1044090

Efficient Estimation of the Cox Model With Auxiliary Subgroup Survival Information

Chiung-Yu Huang 1, Jing Qin 2, Huei-Ting Tsai 3
PMCID: PMC5157123  NIHMSID: NIHMS801688  PMID: 27990035

Abstract

With the rapidly increasing availability of data in the public domain, combining information from different sources to draw inferences about associations or differences of interest has become an emerging challenge to researchers. This paper presents a novel approach to improve efficiency in estimating the survival time distribution by synthesizing information from the individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both choice of treatment and clinical outcomes usually are not available in the registry database. To incorporate the published information, we propose to summarize the external survival information via a system of nonlinear population moments and estimate the survival time model using empirical likelihood methods. The proposed approach is more flexible than conventional meta-analysis in the sense that it can automatically combine survival information for different subgroups, and the information may be derived from different studies. Moreover, an extended estimator that allows for a different baseline risk in the aggregate data is also studied. Empirical likelihood ratio tests are proposed to examine whether the auxiliary survival information is consistent with the individual-level data. Simulation studies show that the proposed estimators yield a substantial gain in efficiency over the conventional partial likelihood approach. Two data analyses are conducted to illustrate the methods and theory.

Keywords: Information synthesis, Meta-analysis, SEER cancer registries, Subgroup analysis

1. Introduction

Combining information from different sources to draw inferences about associations or differences of interest is an important area of research known as meta-analysis. Results of meta-analyses have been used to guide the design of future studies, aid the development of regulatory recommendations, and even modify clinical practice. A PubMed search for the word “meta-analysis” in article titles found 7231 articles in 2013 alone. With the rapidly increasing availability of data in the public domain, taking full advantage of available information while saving considerable resources has become an emerging challenge to researchers.

An ideal meta-analysis would be an analysis of pooled individual-level data, where the raw data from each study are obtained and analyzed directly. In clinical studies, the use of pooled individual-level data enables researchers to conduct subgroup analysis to investigate whether patient characteristics are related to treatment effects. In most applications, however, synthesis of the information is conducted by analyzing summary statistics, such as means, standard deviations, proportions, odds ratios, and relative risks, from each individual study. Specifically, meta-analysis calculates a weighted average of the summary statistics across studies to provide an overall measure of the association or difference of interest. The major drawback of this approach is that covariate-treatment interactions are usually not provided in the reports of primary analysis findings, thus making subgroup analysis difficult to perform (Simmonds and Higgins, 2007).

On the other hand, methods for combining information from both individual-level data and published aggregate data have drawn much attention (Kovalchick, 2013; Liu et al., 2014). This research is particularly inspired by the growing interest in exploiting the population-based cancer survival statistics made available by the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). The SEER program consists of 18 cancer registries covering approximately 28% of the U.S. population. The registries began collecting demographic and cancer information on patients with all types of incident cancer in 1973, and the database links to state death certificates for patient survival information, including cause of death. The SEER program updates its data annually, and the data have been used by thousands of researchers, clinicians, public health officials, policy makers, community groups, and members of the public for cancer incidence and survival statistics in the United States. The SEER Cancer Statistics Review 1973-2010 (http://seer.cancer.gov/csr/1975_2010/) reports the 5-year survival after cancer diagnosis by race, sex, age, and year of diagnosis for the major cancer sites and for all cancers combined, using data from the population-based cancer registries. For example, the 5-year survival among ovarian cancer patients diagnosed before age 65 is 57%, and is 27.7% among patients diagnosed after age 65. This paper presents a novel approach for combining data from clinical studies, that is, individual-level data, with the published subgroup t-year survival probabilities, that is, aggregate data. Individual-level data provide estimates of the treatment-covariate interactions or effects of biomarkers that are not reported in existing publications, but the sample size may be too small to provide accurate estimates. Properly combining these data with survival information available from external sources is expected to yield more efficient estimates of the effects of interest as well as more accurate prediction of the risk of the failure event.

Our approach to synthesizing information from different sources is motivated by the empirical likelihood method, which was first introduced by Thomas and Grunkemeier (1975) to obtain confidence intervals for the Kaplan-Meier estimator. Later, Owen (1988, 1990) studied empirical likelihood-based confidence regions for the mean and other functionals. It has been shown that the empirical likelihood ratio has a limiting chi-squared distribution under mild regularity conditions, and that the empirical likelihood is Bartlett correctable in many applications (DiCiccio et al., 1991), which gives it an advantage over the bootstrap in the construction of confidence regions. Many researchers have applied the empirical likelihood method to more general settings. In particular, Qin and Lawless (1994) made connections to estimating equations and demonstrated that the empirical likelihood method can yield the most efficient estimator by making optimal use of the estimating equations. Although empirical likelihood methods have been very popular in fields such as survey sampling, the main interest there is usually to improve the estimation of the mean or other functionals of the distribution function. For example, Chen and Qin (1993), Chen and Wu (2002), Chen et al. (2002), and Wu and Sitter (2001) applied the empirical likelihood method to incorporate auxiliary covariate information to improve the efficiency of estimation. Imbens (2002) discussed in detail how empirical likelihood methods can be used as an alternative to the generalized method of moments.

Application of the empirical likelihood approach to survival data has received much attention because the variances of statistical estimates can be very difficult to estimate in the presence of right censoring. The empirical likelihood method can efficiently establish joint confidence regions without directly estimating the corresponding asymptotic variances, and can substantially improve coverage accuracy. In a review paper, Li et al. (2005) summarized results on empirical likelihood analysis of censored survival data. Ren and Zhou (2011) investigated the properties of the maximum empirical likelihood estimator and compared it with the maximum partial likelihood estimator. Zhou (2006) applied the empirical likelihood method to improve estimation of the Cox model with partial information on the baseline hazard function.

The proposed application of the empirical likelihood method deals with a nonconventional scenario. Under the proportional hazards model, the auxiliary t-year survival probabilities amount to a system of nonlinear estimating equations that involve the regression parameters, the infinite-dimensional baseline hazard function, and the infinite-dimensional marginal distribution function of the covariate X, making it difficult to derive the constrained maximum likelihood estimator. To tackle this difficulty, two empirical likelihoods, one based on the conditional likelihood of the survival time T given X and another based on the marginal likelihood of X, are constructed to combine information from different sources. In the spirit of Breslow's estimator (Breslow, 1972), the baseline hazard function and the marginal distribution function of X are profiled out of the conditional likelihood and the marginal likelihood, respectively. To the best of our knowledge, this is the first paper that considers a double empirical likelihood approach.

This paper is organized as follows. In Section 2 we introduce notation and summarize the landmark survival information as unbiased estimating equations. The main results are presented in Section 3, where a double empirical likelihood approach is proposed to synthesize the auxiliary survival information. Because the auxiliary survival information may not be consistent with the individual-level data due to the inclusion/exclusion criteria of the clinical study, in Section 4 we extend the proposed double empirical likelihood method to allow the population from which the aggregate survival information is derived to have a different baseline risk. We present results of simulation studies in Section 5 and illustrate the proposed methods with two data analyses in Section 6. Some concluding remarks are given in Section 6.

2. Model Setup

Let T denote the time from disease onset to a failure event or an event of interest. Assume that T is absolutely continuous, that is, T has a probability density. Let X denote a p × 1 vector of baseline covariates. Denote by f(t | x) and S(t | x) the conditional density function and the conditional survival function of T given X = x. We assume that the survival time T follows the proportional hazards model (Cox, 1972)

$$\lambda(t\mid x)=\lambda(t)\exp(\beta'x),$$

where β is a p × 1 vector of regression parameters and λ(t) is an unspecified baseline hazard function. Let $\Lambda(t)=\int_0^t\lambda(u)\,du$ be the corresponding baseline cumulative hazard function. The observation of the survival time is usually subject to right censoring due to study end or premature dropout. Thus, instead of observing the actual value of the survival time T, we observe the possibly censored survival time Y = min(T, C), where C is the censoring time, together with the failure indicator Δ = I(T ≤ C). In many applications, it is reasonable to assume that C is independent of T given the observed covariates X.

Our goal is to derive an efficient estimator of the Cox model by incorporating published t-year survival probabilities. To express the auxiliary information on survival at the time point t*, we use Ωk, k = 1, …, K, to denote the kth subgroup whose t*-year survival is provided. In the aforementioned ovarian cancer example, we set Z to be the age at diagnosis, where Z is a subset of the covariates of interest X, which may include biomarkers and other risk factors for ovarian cancer. Write X = (Z, W). Thus the two subgroups of ovarian cancer patients are Ω1 = {(Z, W) : Z < 65} and Ω2 = {(Z, W) : Z ≥ 65}, and the auxiliary 5-year survival probabilities obtained from the SEER Cancer Statistics Review 1973-2010 are given by pr(T > 5 | X ∈ Ω1) = 0.56 and pr(T > 5 | X ∈ Ω2) = 0.277.

A general expression of the auxiliary survival information for subgroup k at the time point t* is

$$\mathrm{pr}(T>t^*\mid X\in\Omega_k)=\phi_k,\qquad k=1,\ldots,K,$$

or, equivalently, $\mathrm{pr}(T>t^*,X\in\Omega_k)-\phi_k\,\mathrm{pr}(X\in\Omega_k)=0$. By double expectation and under the assumed Cox model, we can derive

$$E[I(X\in\Omega_k)\{S(t^*\mid X)-\phi_k\}]=E\{I(X\in\Omega_k)[\exp\{-\Lambda(t^*)\exp(\beta'X)\}-\phi_k]\}=0.$$

Define the estimating function

$$\psi_k(X,\beta,\Lambda)=I(X\in\Omega_k)[\exp\{-\Lambda(t^*)\exp(\beta'X)\}-\phi_k].$$

Then the subgroup survival information at t* is summarized by

$$E\{\psi_k(X,\beta,\Lambda)\}=0,\qquad k=1,\ldots,K.\tag{1}$$

where the random function ψk(X, β, Λ), k = 1, …, K, is bounded by 2. Note that the estimating equations involve only the regression parameter β and the baseline cumulative hazard function evaluated at t*. Hence, by setting α = Λ(t*), equation (1) can be re-expressed as $E\{\psi_k(X,\beta,\alpha)\}=0$, k = 1, …, K.
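To make the estimating functions concrete, the Python sketch below evaluates ψk(Xi, β, α) for a simulated covariate sample. It is illustrative only: the helper name `psi_matrix`, the two subgroups (chosen to mimic the simulation design of Section 5), and the use of NumPy are our assumptions. Because ϕk is set to the within-sample average of S(t* | Xi) over each subgroup, the sample analogue of E{ψk(X, β, α)} is zero by construction.

```python
import numpy as np

def psi_matrix(X, beta, alpha, subgroups, phi):
    # Hypothetical helper: column k holds psi_k(X_i, beta, alpha), i = 1..n,
    # where alpha = Lambda(t*) and subgroups[k](X) returns I(X_i in Omega_k).
    X = np.asarray(X, dtype=float)
    surv = np.exp(-alpha * np.exp(X @ beta))  # S(t* | X_i) = exp{-alpha exp(beta'X_i)}
    return np.column_stack([ind(X) * (surv - p) for ind, p in zip(subgroups, phi)])

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([rng.standard_normal(n), rng.integers(0, 2, n)])
beta = np.array([-0.5, 0.5])
alpha = 0.5 ** 2  # Lambda(t*) = t*^2 at t* = 0.5
subgroups = [lambda x: (x[:, 0] <= 0) & (x[:, 1] == 0),
             lambda x: (x[:, 0] > 0) & (x[:, 1] == 0)]

# phi_k taken as the within-sample subgroup survival, so each column of psi
# has sample mean zero (the empirical version of equation (1)).
surv = np.exp(-alpha * np.exp(X @ beta))
phi = [surv[g(X)].mean() for g in subgroups]
Psi = psi_matrix(X, beta, alpha, subgroups, phi)
print(Psi.mean(axis=0))  # approximately [0, 0]
```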

3. Method

In this section, we introduce a double empirical likelihood method to synthesize information from different sources. Under the Cox model, the density of the bivariate random variable (T, X), relative to the product of the Lebesgue measure and the marginal distribution of X, is given by $\exp(\beta'x)\lambda(t)\exp\{-\Lambda(t)\exp(\beta'x)\}\,dG(x)$, where G is the distribution function of X. Assume that the observed data (Yi, Δi, Xi), i = 1, …, n, on n subjects are independent and identically distributed realizations of (Y, Δ, X). Dropping the factors involving the censoring time distribution, the log full likelihood based on the observed data is

$$\ell_F=\sum_{i=1}^n\Big(\Delta_i[\beta'X_i+\log\{d\Lambda(Y_i)\}]-\Lambda(Y_i)\exp(\beta'X_i)+\log\{dG(X_i)\}\Big).$$

Define the functions $S^{(k)}(t,\beta)=n^{-1}\sum_{j=1}^nI(Y_j\ge t)\exp(\beta'X_j)X_j^{\otimes k}$, k = 0, 1, 2, with $x^{\otimes0}=1$, $x^{\otimes1}=x$, and $x^{\otimes2}=xx'$. Following the empirical likelihood method of Owen (1988) and Qin and Lawless (1994), we denote by λi the jump of Λ at Yi and by pi the jump of G at Xi. The full empirical likelihood can then be decomposed as the product of the conditional likelihood of (Y, Δ) given X and the marginal likelihood of X, where the log conditional likelihood is

$$\ell_C=\sum_{i=1}^n\Delta_i(\beta'X_i+\log\lambda_i)-n\sum_{i=1}^n\lambda_iS^{(0)}(Y_i,\beta),$$

and the log marginal likelihood is

$$\ell_M=\sum_{i=1}^n\log(p_i).$$

For a fixed β, differentiating the log conditional likelihood $\ell_C$ with respect to λi and setting the derivative to 0 yields

$$\lambda_i=\frac{1}{n}\times\frac{\Delta_i}{S^{(0)}(Y_i,\beta)},$$

which leads to the Breslow (1972) estimator for the baseline cumulative hazard function

$$\hat\Lambda_B(t,\beta)=\frac{1}{n}\sum_{i=1}^n\frac{\Delta_iI(Y_i\le t)}{S^{(0)}(Y_i,\beta)}.$$

Another well-known result is that replacing Λ with Λ̂B in $\ell_C$ yields, up to a constant, the (log) partial likelihood function.
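A minimal NumPy sketch of S^(k) and the Breslow estimator follows. The function names `s_k` and `breslow` are hypothetical, only k = 0 and k = 1 are implemented, and the four-observation dataset is made up for illustration. With β = 0, 1/{n S^(0)(Yi, 0)} is the reciprocal of the risk-set size, so the estimator reduces to the Nelson-Aalen estimator, which gives an easy hand check.

```python
import numpy as np

def s_k(t, Y, X, beta, k):
    """S^{(k)}(t, beta) for k = 0 (scalar) or k = 1 (p-vector)."""
    weights = (Y >= t) * np.exp(X @ beta)  # I(Y_j >= t) exp(beta'X_j)
    if k == 0:
        return weights.mean()
    if k == 1:
        return (weights[:, None] * X).mean(axis=0)
    raise ValueError("only k = 0 and k = 1 are implemented in this sketch")

def breslow(t, Y, D, X, beta):
    """Breslow estimator: n^{-1} sum_i D_i I(Y_i <= t) / S^{(0)}(Y_i, beta)."""
    n = len(Y)
    return sum(1.0 / s_k(y, Y, X, beta, 0) for y, d in zip(Y, D) if d == 1 and y <= t) / n

# Toy data: four subjects, the third one censored, a single covariate.
Y = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([1, 1, 0, 1])
X = np.array([[0.5], [-1.0], [2.0], [0.0]])
beta0 = np.zeros(1)

print(breslow(4.0, Y, D, X, beta0))  # 1/4 + 1/3 + 1/1 = 19/12 (Nelson-Aalen)
print(s_k(3.0, Y, X, beta0, 1))      # (2.0 + 0.0) / 4 over the risk set {Y_j >= 3}
```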

Assume that the auxiliary survival information is consistent with the individual-level data, that is, individuals in the clinical study are a representative sample of the population from which the aggregate survival information is derived. A simple way to incorporate the auxiliary information is to apply the empirical likelihood method to maximize the full likelihood subject to the constraints

$$p_i\ge0,\qquad\sum_{i=1}^np_i=1,\qquad\text{and}\qquad\sum_{i=1}^np_i\psi_k(X_i;\beta,\Lambda)=0,\quad k=1,\ldots,K.$$

Because the estimating function ψk involves Λ only through its value at t*, one may intuitively replace Λ(t*) in ψk with its Breslow-type estimator Λ̂B(t*, β). However, simulation studies suggest that this simple approach yields biased estimates, because the Breslow-type estimator involves the unknown parameter β.

We propose to combine the auxiliary subgroup survival information to estimate the Cox model by formulating two empirical likelihoods, one derived from the conditional likelihood and the other from the marginal likelihood. Our idea is to treat α = Λ(t*) as a nuisance parameter and construct an empirical likelihood for α. We then formulate the usual empirical likelihood for β and Λ using the auxiliary survival information, which depends on β and α.

The steps to construct the double empirical likelihood are described below. By definition, $\alpha=\Lambda(t^*)=\int_0^\infty I(u\le t^*)\,d\Lambda(u)$. We propose to maximize the log conditional likelihood $\ell_C$ subject to the constraint

$$\sum_{i=1}^n\lambda_iI(Y_i\le t^*)-\alpha=0.$$

Introducing a Lagrange multiplier ν, the objective function to be maximized is

$$\sum_{i=1}^n\Delta_i(\beta'X_i+\log\lambda_i)-n\sum_{i=1}^n\lambda_iS^{(0)}(Y_i,\beta)-n\nu\Big\{\sum_{i=1}^n\lambda_iI(Y_i\le t^*)-\alpha\Big\}.$$

Taking the derivative of the objective function with respect to λi and setting it to 0 yields

$$\lambda_i=\frac{1}{n}\times\frac{\Delta_i}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)},\tag{2}$$

where the Lagrange multiplier is determined by

$$\frac{1}{n}\sum_{i=1}^n\frac{\Delta_iI(Y_i\le t^*)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}-\alpha=0.$$
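On the range where all denominators are positive, the left-hand side of this equation is strictly decreasing in ν, so for given (β, α) the multiplier can be found by a one-dimensional root search. The sketch below uses bisection; the helper `solve_nu`, the bracketing strategy, and the toy data are our assumptions, and it presumes at least one observed failure at or before t*. When α equals the Breslow value Λ̂B(t*, β), the solution is ν = 0, in agreement with (2) reducing to the Breslow jumps.

```python
import numpy as np

def solve_nu(Y, D, X, beta, t_star, alpha, tol=1e-12):
    """Bisection for nu in n^{-1} sum_i D_i I(Y_i<=t*) / {S0(Y_i,beta) + nu I(Y_i<=t*)} = alpha."""
    n = len(Y)
    risk = np.exp(X @ beta)
    # Only uncensored observations with Y_i <= t* contribute; for them I(Y_i <= t*) = 1.
    event_times = Y[(D == 1) & (Y <= t_star)]
    s0 = np.array([risk[Y >= y].sum() / n for y in event_times])

    def f(nu):
        return np.sum(1.0 / (s0 + nu)) / n - alpha  # strictly decreasing in nu

    lo = -s0.min() * (1.0 - 1e-12)  # just right of the pole, where f is large and positive
    hi = 1.0
    while f(hi) > 0:                # f(nu) -> -alpha < 0 as nu grows
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Y = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([1, 1, 0, 1])
X = np.zeros((4, 1))
beta0 = np.zeros(1)
nu_breslow = solve_nu(Y, D, X, beta0, t_star=2.5, alpha=7 / 12)  # Breslow value: nu close to 0
nu_small = solve_nu(Y, D, X, beta0, t_star=2.5, alpha=0.5)       # smaller alpha: nu > 0
print(nu_breslow, nu_small)  # nu_small is the positive root of 2 nu^2 + 1.5 nu - 0.25 = 0
```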

Substituting (2) back into the objective function yields, up to a constant,

$$\sum_{i=1}^n\Delta_i[\beta'X_i-\log\{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)\}]+n\nu\alpha.$$
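The phrase “up to a constant” can be checked directly from (2). Multiplying (2) by its denominator and summing gives the first identity below, and taking logarithms gives the second (terms with Δi = 0 drop out):

$$\begin{aligned}
n\sum_{i=1}^n\lambda_i\{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)\}&=\sum_{i=1}^n\Delta_i,\\
\sum_{i=1}^n\Delta_i\log\lambda_i&=-\Big(\sum_{i=1}^n\Delta_i\Big)\log n-\sum_{i=1}^n\Delta_i\log\{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)\}.
\end{aligned}$$

Hence the objective equals the displayed expression plus $-(1+\log n)\sum_{i=1}^n\Delta_i$, a term that does not depend on (β, ν, α).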

Hence the marginal empirical score function for β is

$$\sum_{i=1}^n\Delta_i\left\{X_i-\frac{S^{(1)}(Y_i,\beta)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}\right\}.$$

Next, we maximize the log marginal likelihood $\ell_M$ subject to the constraints $p_i\ge0$, $\sum_{i=1}^np_i=1$, and $\sum_{i=1}^np_i\psi_k(X_i,\beta,\alpha)=0$ for k = 1, …, K. Write ψ(x, β, α) = {ψ1(x, β, α), …, ψK(x, β, α)}′. Given β and α, a unique maximum exists provided 0 lies in the interior of the convex hull of ψ(X1, β, α), …, ψ(Xn, β, α). Applying the classic empirical likelihood argument, we have

$$p_i=\frac{1}{n}\times\frac{1}{1+\xi'\psi(X_i,\beta,\alpha)}$$

and the constrained log marginal likelihood is, up to a constant,

$$-\sum_{i=1}^n\log\{1+\xi'\psi(X_i,\beta,\alpha)\},$$

where the Lagrange multipliers ξ = (ξ1, …, ξK)′ are determined by

$$\frac{1}{n}\sum_{i=1}^n\frac{\psi(X_i,\beta,\alpha)}{1+\xi'\psi(X_i,\beta,\alpha)}=0.$$
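Computing ξ for given (β, α) is the standard inner problem of empirical likelihood. A minimal sketch is shown below, assuming NumPy and using a damped Newton iteration (our choice; other solvers work as well). The helper `solve_xi` is hypothetical: it takes the n × K matrix whose rows are ψ(Xi, β, α)′ and returns ξ together with the weights pi, which should then satisfy Σ pi = 1 and Σ pi ψ(Xi, β, α) = 0.

```python
import numpy as np

def solve_xi(Psi, max_iter=100, tol=1e-10):
    """Solve n^{-1} sum_i psi_i / (1 + xi'psi_i) = 0 for xi; return (xi, p)."""
    n, K = Psi.shape
    xi = np.zeros(K)
    for _ in range(max_iter):
        denom = 1.0 + Psi @ xi
        grad = (Psi / denom[:, None]).mean(axis=0)        # the estimating equation
        if np.max(np.abs(grad)) < tol:
            break
        jac = -(Psi / denom[:, None] ** 2).T @ Psi / n    # its Jacobian in xi
        step = np.linalg.solve(jac, -grad)                 # Newton step
        # Halve the step until every 1 + xi'psi_i exceeds 1/n, i.e. 0 < p_i < 1.
        while np.any(1.0 + Psi @ (xi + step) <= 1.0 / n):
            step = step / 2.0
        xi = xi + step
    p = 1.0 / (n * (1.0 + Psi @ xi))
    return xi, p

rng = np.random.default_rng(0)
Psi = rng.normal(size=(500, 2)) + np.array([0.2, -0.1])  # sample mean of psi is not zero
xi, p = solve_xi(Psi)
print(p.sum(), p @ Psi)  # approximately 1 and [0, 0]
```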

Combining the two constrained log likelihoods yields the constrained log full likelihood function ℓ, where, up to a constant,

$$\ell(\beta,\xi,\nu,\alpha)=\sum_{i=1}^n\Delta_i[\beta'X_i-\log\{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)\}]+n\nu\alpha-\sum_{i=1}^n\log\{1+\xi'\psi(X_i,\beta,\alpha)\}.$$

The procedure described above converts an infinite-dimensional problem into a finite-dimensional one, at the expense of introducing K + 2 additional parameters, namely ξ, ν, and α.

To estimate β, we solve a system of empirical score equations:

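$$\begin{aligned}
U_1(\beta,\xi,\nu,\alpha)&=\sum_{i=1}^n\Delta_i\left\{X_i-\frac{S^{(1)}(Y_i,\beta)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}\right\}-\sum_{i=1}^n\frac{\xi'\psi_\beta(X_i,\beta,\alpha)}{1+\xi'\psi(X_i,\beta,\alpha)},\\
U_2(\beta,\xi,\nu,\alpha)&=\sum_{i=1}^n\frac{\psi(X_i,\beta,\alpha)}{1+\xi'\psi(X_i,\beta,\alpha)},\\
U_3(\beta,\xi,\nu,\alpha)&=\sum_{i=1}^n\left\{\frac{\Delta_iI(Y_i\le t^*)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}-\alpha\right\},\\
U_4(\beta,\xi,\nu,\alpha)&=\sum_{i=1}^n\left\{\frac{\xi'\psi_\alpha(X_i,\beta,\alpha)}{1+\xi'\psi(X_i,\beta,\alpha)}-\nu\right\},
\end{aligned}$$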

with the usual convention 0/0 = 0, where $\psi_\beta$ and $\psi_\alpha$ denote the derivatives of ψ with respect to β and α. These empirical score functions are obtained by differentiating the empirical likelihood with respect to θ = (β, ξ, ν, α). Let β0, Λ0, and α0 be the true parameter values, and denote θ0 = (β0, 0, 0, α0). Define U0k = Uk(β0, 0, 0, α0), k = 1, 2, 3, 4; that is, U0k is the value of Uk evaluated at θ0. Define $U=(U_1',U_2',U_3,U_4)'$ and $U_0=(U_{01}',U_{02}',U_{03},U_{04})'$, where U04 = 0. We show in the Appendix that, under some regularity conditions, $n^{-1/2}U_0$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix

$$\Omega=\begin{pmatrix}\Sigma&0&0&0\\0&J&0&0\\0&0&K&0\\0&0&0&0\end{pmatrix},$$

where Σ, J, and K are defined in the Appendix. For convenience, we use 0 and I to denote a matrix of zeros and an identity matrix of appropriate dimensions.

Denote by θ̂ = (β̂, ξ̂, ν̂, α̂) the solution to U(θ) = 0. The large-sample properties of θ̂ are presented in Theorem 1, the proof of which is given in the Appendix.

Theorem 1 Assume that X is bounded, the true regression parameter β0 lies in a compact set, and both T and C are absolutely continuous. Moreover, assume that $E\{\psi(X,\beta_0,\alpha_0)\psi(X,\beta_0,\alpha_0)'\}$ is positive definite and $\alpha_0=\Lambda_0(t^*)<\infty$. Then $n^{1/2}(\hat\beta-\beta_0)$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix $\Gamma^{-1}=(\Sigma+BQ^{-1}B')^{-1}$, provided Γ is nonsingular, where B and Q are specified in the Appendix.

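Interestingly, $\Sigma^{-1}$ is the asymptotic variance-covariance matrix of $n^{1/2}(\hat\beta_{PL}-\beta_0)$, where $\hat\beta_{PL}$ is the maximum partial likelihood estimator. Because $BQ^{-1}B'$ is nonnegative definite whenever Q is positive definite, $\Gamma^{-1}=(\Sigma+BQ^{-1}B')^{-1}\le\Sigma^{-1}$ in the positive semidefinite ordering. Hence Theorem 1 implies that the proposed estimator β̂ is asymptotically more efficient than the maximum partial likelihood estimator β̂PL.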
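The ordering can be illustrated numerically. The sketch below draws arbitrary positive definite stand-ins for Σ and Q and an arbitrary B; these are random placeholders, not the matrices defined in the Appendix, and Q is assumed positive definite. It checks that Σ^{-1} − Γ^{-1} has no negative eigenvalues, so no linear combination of β̂ has larger asymptotic variance than under partial likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
p_dim, K = 3, 2
A = rng.normal(size=(p_dim, p_dim))
Sigma = A @ A.T + p_dim * np.eye(p_dim)  # stand-in for Sigma (positive definite)
C = rng.normal(size=(K, K))
Q = C @ C.T + K * np.eye(K)              # stand-in for Q (assumed positive definite)
B = rng.normal(size=(p_dim, K))          # stand-in for B

var_pl = np.linalg.inv(Sigma)                  # Sigma^{-1}: partial likelihood
Gamma = Sigma + B @ np.linalg.solve(Q, B.T)    # Sigma + B Q^{-1} B'
var_del = np.linalg.inv(Gamma)                 # Gamma^{-1}: double empirical likelihood
gap = np.linalg.eigvalsh(var_pl - var_del)     # eigenvalues of the difference
print(gap)                                     # all >= 0 up to rounding error
print(np.diag(var_del) / np.diag(var_pl))      # componentwise variance ratios, each <= 1
```

Because $BQ^{-1}B'$ has rank at most K, the difference can be zero in some directions, which is consistent with the discussion below of coefficients that receive little efficiency gain.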

When the subgroups involved in the auxiliary survival information are determined by a subset of covariates, the efficiency gain in the estimated coefficients for the other covariates is expected to be minimal. To see this, consider a simple case where X = (X1, X2) and the subgroups are determined only by X1. Then the auxiliary survival information for the kth subgroup can be re-expressed as

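$$\begin{aligned}
E\{\psi_k(X,\beta,\alpha)\}&=\int I(x_1\in\Omega_k)[\exp\{-\alpha\exp(\beta_1'x_1+\beta_2x_2)\}-\phi_k]\,dG(x_1,x_2)\\
&=\int I(x_1\in\Omega_k)[\exp\{-\alpha\exp(\beta_1'x_1+u)\}-\phi_k]\,dG(x_1,\beta_2^{-1}u).
\end{aligned}$$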

Because the proposed estimation procedure allows for an arbitrary distribution function G for (X1, X2), after reparameterization it is equivalent to maximizing the log marginal likelihood $\sum_{i=1}^n\log(p_i)$ subject to the constraints

$$p_i\ge0,\qquad\sum_{i=1}^np_i=1,\qquad\text{and}\qquad\sum_{i=1}^np_i\,I(X_{i1}\in\Omega_k)[\exp\{-\alpha\exp(\beta_1'X_{i1}+X_{i2})\}-\phi_k]=0,$$

where Xi = (Xi1, Xi2) and pi is the jump of G at (Xi1, Xi2/β2). It is easy to see that, after profiling out pi, the auxiliary survival information does not involve β2. Thus the proposed estimation procedure is expected to significantly improve the efficiency in the estimation of β1 but has only limited impact on the estimation of β2.

To estimate the baseline cumulative hazard function Λ(t), we consider the following empirical likelihood-based estimator to incorporate the auxiliary survival information:

$$\hat\Lambda_{EL}(t)=\frac{1}{n}\sum_{i=1}^n\frac{\Delta_iI(Y_i\le t)}{S^{(0)}(Y_i,\hat\beta)+\hat\nu I(Y_i\le t^*)}.$$

Applying the functional delta method, we can show that $n^{1/2}\{\hat\Lambda_{EL}(t)-\Lambda(t)\}$ converges to a zero-mean Gaussian process on [0, τ]. A sketch of the proof is given in the Appendix.
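The estimator differs from the Breslow estimator only through the term ν̂ I(Yi ≤ t*) in the denominator. The sketch below uses a hypothetical helper `lambda_el`, toy data, and a value of ν̂ computed by hand for this toy example (the one that satisfies the ν equation above with α = 0.5). It shows that ν̂ = 0 recovers the Breslow estimator, that Λ̂EL(t*) reproduces the constrained value α, and that jumps after t* are left unchanged.

```python
import numpy as np

def lambda_el(t, Y, D, X, beta_hat, nu_hat, t_star):
    """n^{-1} sum_i D_i I(Y_i <= t) / {S0(Y_i, beta_hat) + nu_hat I(Y_i <= t*)}."""
    n = len(Y)
    risk = np.exp(X @ beta_hat)
    total = 0.0
    for y, d in zip(Y, D):
        if d == 1 and y <= t:
            s0 = risk[Y >= y].sum() / n                 # S^{(0)}(Y_i, beta_hat)
            total += 1.0 / (s0 + nu_hat * (y <= t_star))
    return total / n

Y = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([1, 1, 0, 1])
X = np.zeros((4, 1))
beta_hat = np.zeros(1)
nu_hat = (-1.5 + np.sqrt(4.25)) / 4  # solves the nu equation with alpha = 0.5 for these data

print(lambda_el(4.0, Y, D, X, beta_hat, 0.0, 2.5))     # nu = 0: Breslow, about 19/12
print(lambda_el(2.5, Y, D, X, beta_hat, nu_hat, 2.5))  # at t = t*: about alpha = 0.5
print(lambda_el(4.0, Y, D, X, beta_hat, nu_hat, 2.5))  # about 0.5 + 1.0: later jump unchanged
```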

The validity of the proposed method holds when the t-year survival probabilities summarized by (1) are consistent with the individual-level data. To test the conformity of the auxiliary survival information, an empirical likelihood ratio test statistic can be constructed in the spirit of Corollary 4 of Qin and Lawless (1994) and Qin and Lawless (1995). Specifically, we consider the test statistic

$$R=2\Big\{\sup_{\beta,\xi,\nu,\alpha}\ell(\beta,\xi,\nu,\alpha)-\sup_{\beta,\nu,\alpha}\ell(\beta,0,\nu,\alpha)\Big\}.$$

Note that when the conformity assumption holds, that is, ξ = 0, the likelihood ℓ(β, 0, ν, α) is maximized by (β, ν, α) = (β̂PL, 0, α̂PL), where α̂PL = Λ̂B(t*, β̂PL) is the Breslow-type estimator of the baseline cumulative hazard function at time t*. Theorem 2 summarizes the asymptotic properties of the empirical log-likelihood ratio statistic R.

Theorem 2 Under the regularity conditions specified in Theorem 1 and the null hypothesis that ξ = 0, the empirical log-likelihood ratio R converges in distribution to a χ² random variable with K degrees of freedom as n → ∞.
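In practice R is referred to the upper tail of the χ² distribution with K degrees of freedom. If SciPy is available, `scipy.stats.chi2.sf(R, K)` gives the p-value. The dependency-free sketch below computes the same tail probability for integer degrees of freedom using the standard recurrence Q(a + 1, h) = Q(a, h) + h^a e^{-h}/Γ(a + 1) for the regularized upper incomplete gamma function; the helper name `chi2_sf` and the example value of R are hypothetical.

```python
import math

def chi2_sf(x, k):
    """Upper tail pr(chi^2_k > x) for integer k >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-h)              # chi^2_2 tail: Q(1, h) = e^{-h}
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # chi^2_1 tail: Q(1/2, h) = erfc(sqrt(h))
    while a < k / 2.0:                        # step a up by 1 until a = k/2
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical observed value of R with K = 2 auxiliary subgroups.
R_obs, K = 5.991464547107979, 2
print(chi2_sf(R_obs, K))  # about 0.05: R_obs is the 5% critical value for 2 df
```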

4. An Extension

As discussed before, a major limitation of the estimation procedure described in Section 3 is that the auxiliary information must be consistent with the individual-level data. However, due to study inclusion/exclusion criteria, subjects enrolled in the clinical study may not be a representative sample of the population from which the aggregate survival information is derived. Hence it is desirable to allow the aggregate data to follow a different survival time model. To this end, we propose to accommodate the inconsistency by assuming that the hazard function of the survival time in the aggregate data follows the Cox model λ*(t)exp(β′x), where

$$\lambda^*(t)=\rho\lambda(t),\qquad\rho>0,\tag{3}$$

so that the potential difference between the two data sources is characterized by a scale factor ρ. Note that ρ = 1 indicates that the survival time model for the aggregate data is the same as that for the individual-level data.

Similar to the discussion in Section 3, the auxiliary survival information pr(T > t* | X ∈ Ωk) = ϕk, k = 1, …, K, can be summarized by the estimating equations

$$E\{\tilde\psi_k(X,\beta,\alpha,\rho)\}=0,\qquad k=1,\ldots,K,$$

where, under model (3), $\tilde\psi_k(X,\beta,\alpha,\rho)=I(X\in\Omega_k)[\exp\{-\rho\alpha\exp(\beta'X)\}-\phi_k]$. The constrained log full likelihood function is given by, up to a constant,

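$$\tilde\ell(\beta,\xi,\nu,\alpha,\rho)=\sum_{i=1}^n\Delta_i[\beta'X_i-\log\{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)\}]+n\nu\alpha-\sum_{i=1}^n\log\{1+\xi'\tilde\psi(X_i,\beta,\alpha,\rho)\},$$

where $\tilde\psi=(\tilde\psi_1,\ldots,\tilde\psi_K)'$.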
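Under model (3), ψ̃k depends on α and ρ only through the product ρα, so the auxiliary constraints alone cannot separate the two. In ℓ̃, α is additionally tied to the individual-level data through the terms involving ν, which is our reading of how ρ becomes estimable. The sketch below (hypothetical helper `psi_tilde`, arbitrary subgroups and ϕk values) checks the product structure numerically.

```python
import numpy as np

def psi_tilde(X, beta, alpha, rho, subgroups, phi):
    """Column k holds I(X_i in Omega_k)[exp{-rho alpha exp(beta'X_i)} - phi_k]."""
    X = np.asarray(X, dtype=float)
    surv = np.exp(-rho * alpha * np.exp(X @ beta))  # aggregate-population survival at t*
    return np.column_stack([ind(X) * (surv - p) for ind, p in zip(subgroups, phi)])

rng = np.random.default_rng(3)
X = np.column_stack([rng.standard_normal(200), rng.integers(0, 2, 200)])
beta = np.array([-0.5, 0.5])
subgroups = [lambda x: x[:, 0] <= 0, lambda x: x[:, 0] > 0]
phi = [0.6, 0.8]

a = psi_tilde(X, beta, alpha=0.25, rho=1.5, subgroups=subgroups, phi=phi)
b = psi_tilde(X, beta, alpha=0.375, rho=1.0, subgroups=subgroups, phi=phi)
print(np.allclose(a, b))  # True: only the product rho * alpha enters
```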

Differentiating the empirical likelihood ℓ̃ with respect to θ̃ = (β, ξ, ν, α, ρ), we obtain a system of empirical score equations:

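$$\begin{aligned}
\tilde U_1(\beta,\xi,\nu,\alpha,\rho)&=\sum_{i=1}^n\Delta_i\left\{X_i-\frac{S^{(1)}(Y_i,\beta)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}\right\}-\sum_{i=1}^n\frac{\xi'\tilde\psi_\beta(X_i,\beta,\alpha,\rho)}{1+\xi'\tilde\psi(X_i,\beta,\alpha,\rho)},\\
\tilde U_2(\beta,\xi,\nu,\alpha,\rho)&=\sum_{i=1}^n\frac{\tilde\psi(X_i,\beta,\alpha,\rho)}{1+\xi'\tilde\psi(X_i,\beta,\alpha,\rho)},\\
\tilde U_3(\beta,\xi,\nu,\alpha,\rho)&=\sum_{i=1}^n\left\{\frac{\Delta_iI(Y_i\le t^*)}{S^{(0)}(Y_i,\beta)+\nu I(Y_i\le t^*)}-\alpha\right\},\\
\tilde U_4(\beta,\xi,\nu,\alpha,\rho)&=\sum_{i=1}^n\left\{\frac{\xi'\tilde\psi_\alpha(X_i,\beta,\alpha,\rho)}{1+\xi'\tilde\psi(X_i,\beta,\alpha,\rho)}-\nu\right\},\\
\tilde U_5(\beta,\xi,\nu,\alpha,\rho)&=\sum_{i=1}^n\frac{\xi'\tilde\psi_\rho(X_i,\beta,\alpha,\rho)}{1+\xi'\tilde\psi(X_i,\beta,\alpha,\rho)},
\end{aligned}$$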

with the usual convention 0/0 = 0. Let ρ0 be the true value of ρ, and denote $\tilde\theta_0=(\theta_0,\rho_0)=(\beta_0,0,0,\alpha_0,\rho_0)$. Define Ũ0k = Ũk(θ̃0), k = 1, …, 5; that is, Ũ0k is the value of Ũk evaluated at θ̃0. Define $\tilde U=(\tilde U_1',\tilde U_2',\tilde U_3,\tilde U_4,\tilde U_5)'$ and $\tilde U_0=(\tilde U_{01}',\tilde U_{02}',\tilde U_{03},\tilde U_{04},\tilde U_{05})'$, where Ũ04 = Ũ05 = 0. Let θ̂ρ be the solution to Ũ(θ̃) = 0 and β̂ρ be the corresponding estimated regression coefficient. The large-sample properties of θ̂ρ are presented in Theorem 3, with the proof given in the Appendix.

Theorem 3 Assume that the matrix E{ψ̃(X, β0, α0, ρ0)ψ̃(X, β0, α0, ρ0)′} is positive definite. Then, under the same regularity conditions as in Theorem 1, $n^{1/2}(\hat\beta_\rho-\beta_0)$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix $\tilde\Gamma^{-1}=(\Sigma+\tilde B\tilde Q^{-1}\tilde B')^{-1}$, provided Γ̃ is nonsingular, where B̃ and Q̃ are specified in the Appendix.

Theorem 3 implies that the extended double empirical likelihood estimator β̂ρ is asymptotically more efficient than the maximum partial likelihood estimator β̂PL. Moreover, it is easy to see that θ̂ = (β̂, ξ̂, ν̂, α̂) is the maximizer of ℓ̃(β, ξ, ν, α, 1). In the proof of Theorem 3, we also show that β̂ρ is less efficient than β̂. To test whether the individual-level data and the aggregate data share the same baseline hazard function, we consider the empirical log-likelihood ratio statistic

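$$\tilde R=2\Big\{\sup_{\beta,\xi,\nu,\alpha,\rho}\tilde\ell(\beta,\xi,\nu,\alpha,\rho)-\sup_{\beta,\xi,\nu,\alpha}\tilde\ell(\beta,\xi,\nu,\alpha,1)\Big\}.$$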

Under mild regularity conditions and the null hypothesis that ρ0 = 1, the empirical log-likelihood ratio R̃ converges in distribution to a χ² random variable with 1 degree of freedom as n → ∞. The proof closely follows that of Theorem 2 and is thus omitted.

5. Numerical Studies

5.1 Monte Carlo Simulations

We conducted two sets of Monte Carlo simulations to examine the finite-sample performance of the proposed methods. In both sets, we generated X1 from the standard normal distribution and X2 from a Bernoulli distribution with pr(X2 = 1) = pr(X2 = 0) = 0.5. The survival time T in the individual-level data was generated from the proportional hazards models (A) λ(t | X1, X2) = λ(t) exp(β1X1 + β2X2) with (β1, β2) = (−0.5, 0.5), and (B) λ(t | X1, X2) = λ(t) exp(β1X1 + β2X2 + β3X1X2) with (β1, β2, β3) = (−0.5, 1, −0.5), where we set λ(t) = 2t for both models. The censoring time C was generated from a uniform distribution so that the censoring rate was approximately 0%, 30%, or 50%. For each scenario, we generated 1000 datasets with sample size n = 100 or n = 400.
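A sketch of how one dataset from Model (A) can be generated is given below. Under the Cox model, Λ(T)exp(β′X) follows a unit exponential distribution, so with Λ(t) = t² the survival time can be drawn by inversion as T = {E/exp(β′X)}^{1/2}, E ~ Exp(1). The upper limit `c_max` of the uniform censoring distribution is not reported in the text, so it is a hypothetical tuning parameter here (`None` means no censoring), and the helper name `simulate_model_a` is ours. Model (B) would only add the β3X1X2 term to the linear predictor.

```python
import numpy as np

def simulate_model_a(n, c_max, rng, beta=(-0.5, 0.5)):
    """One dataset from Model (A): hazard 2t exp(beta1 X1 + beta2 X2), i.e. Lambda(t) = t^2."""
    X = np.column_stack([rng.standard_normal(n), rng.integers(0, 2, n)])
    # Lambda(T) exp(beta'X) ~ Exp(1), so T = sqrt(E / exp(beta'X)) with E ~ Exp(1).
    T = np.sqrt(rng.exponential(size=n) / np.exp(X @ np.asarray(beta)))
    if c_max is None:
        C = np.full(n, np.inf)                  # no censoring
    else:
        C = rng.uniform(0.0, c_max, size=n)     # C ~ Uniform(0, c_max)
    Y = np.minimum(T, C)
    D = (T <= C).astype(int)
    return Y, D, X

rng = np.random.default_rng(4)
Y, D, X = simulate_model_a(400, None, rng)
Y2, D2, X2 = simulate_model_a(400, 2.0, rng)
print(D.mean(), 1 - D2.mean())  # 1.0, then the censoring fraction when c_max = 2.0
```

In practice `c_max` would be tuned by trial runs until the empirical censoring fraction is close to the target of 30% or 50%.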

In the first set of simulations, we considered the case where the individual-level and the aggregate data share the same survival time model, that is, ρ = 1 in model (3). We derived the auxiliary survival information at t* = 0.5 for subgroups Ω1 = {(X1, X2) : X1 ≤ 0, X2 = 0} and Ω2 = {(X1, X2) : X1 > 0, X2 = 0} under the assumed Cox model. This specification aims to mimic the situation in a randomized clinical trial where we exploit the information about the 6-month survival probabilities in the standard-of-care control group (X2 = 0) and where X1 is a baseline risk factor. The 6-month survival probabilities for the two subgroups are approximately 0.68 and 0.84 under both Models (A) and (B), since the interaction term in Model (B) vanishes when X2 = 0.
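Under the stated design, these subgroup probabilities can be reproduced by one-dimensional numerical integration over X1 ~ N(0, 1) with X2 = 0, using pr(T > t* | X1 ≤ 0, X2 = 0) = E{S(t* | X1, 0) | X1 ≤ 0}. The sketch below uses the trapezoidal rule on a truncated grid (our choice; the helper `subgroup_survival` is hypothetical). The `rho` argument also covers the aggregate-data model λ*(t) = ρλ(t) used in the second set of simulations.

```python
import numpy as np

def subgroup_survival(t_star, beta1, rho=1.0, lower=True, m=200_001):
    """pr(T > t* | X1 <= 0, X2 = 0) if lower else pr(T > t* | X1 > 0, X2 = 0)."""
    x = np.linspace(-10.0, 0.0, m) if lower else np.linspace(0.0, 10.0, m)
    dens = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)     # N(0, 1) density of X1
    surv = np.exp(-rho * t_star ** 2 * np.exp(beta1 * x))  # S(t* | x1, X2 = 0), Lambda(t) = t^2
    f = surv * dens
    h = x[1] - x[0]
    integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))        # trapezoidal rule
    return integral / 0.5                                   # divide by pr(X1 <= 0) = pr(X1 > 0) = 1/2

for rho in (1.0, 1.5):
    print(rho,
          round(subgroup_survival(0.5, -0.5, rho, lower=True), 2),
          round(subgroup_survival(0.5, -0.5, rho, lower=False), 2))
# expected to match the approximate values quoted in the text:
# 0.68 and 0.84 for rho = 1, and 0.57 and 0.77 for rho = 1.5
```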

Tables 1 and 2 summarize the empirical bias and empirical standard deviation of the maximum partial likelihood estimator β̂PL, the double empirical likelihood estimator β̂, and the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data. All three estimators have negligible bias under Models (A) and (B). Compared with the maximum partial likelihood estimator β̂PL, the two double empirical likelihood estimators β̂ and β̂ρ enjoy substantial efficiency gains. Under Model (A), the relative efficiency ranges from 4.86 to 9.66 for the estimated coefficient of the baseline risk factor and from 1.02 to 1.84 for the estimated treatment effect. Under Model (B), the relative efficiency for the treatment-covariate interaction ranges from 1.50 to 2.13, suggesting that the proposed methods can substantially reduce the sample size required to investigate treatment heterogeneity, by about 18%–58%. As expected, β̂ρ is slightly less efficient than β̂, and the value of ρ estimated by the extended double empirical likelihood approach is close to 1. We also report the estimated baseline cumulative hazard function at t = 0.3 and 0.7. As expected, the double empirical likelihood approach enjoys a substantial gain in efficiency in the estimation of Λ(t), while the efficiency gain for the extended approach is minimal because it allows for a different baseline risk in the aggregate data. Finally, our simulations (results not shown) also show that the efficiency gain increases with the number of constraints.

Table 1.

Summary statistics for the estimation of Model (A) λ(t | X1, X2) = λ0(t)exp(β1X1 + β2X2) when ρ = 1.

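| Proportion censored | n | Estimator | β1 Bias | β1 ES | β1 RE | β2 Bias | β2 ES | β2 RE | ρ Bias | ρ ES | Λ(0.3) Bias | Λ(0.3) ES | Λ(0.3) RE | Λ(0.7) Bias | Λ(0.7) ES | Λ(0.7) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 100 | PL | -2 | 12 | | 3 | 22 | | | | 0 | 3 | | 0 | 10 | |
| | | DEL | -1 | 5 | 4.96 | 0 | 18 | 1.48 | | | 0 | 2 | 1.67 | 0 | 6 | 2.59 |
| | | DELρ | -1 | 5 | 4.86 | 2 | 22 | 1.03 | 7 | 26 | 0 | 3 | 1.03 | 0 | 10 | 1.05 |
| | 400 | PL | 0 | 6 | | 1 | 11 | | | | 0 | 1 | | 0 | 5 | |
| | | DEL | 0 | 2 | 5.62 | 0 | 9 | 1.47 | | | 0 | 1 | 1.86 | 0 | 3 | 2.35 |
| | | DELρ | 0 | 2 | 5.52 | 0 | 11 | 1.02 | 1 | 11 | 0 | 1 | 1.05 | 0 | 5 | 1.05 |
| 30% | 100 | PL | -1 | 14 | | 3 | 26 | | | | 0 | 3 | | 0 | 11 | |
| | | DEL | -1 | 5 | 6.84 | 1 | 20 | 1.67 | | | 0 | 2 | 1.76 | 0 | 7 | 2.50 |
| | | DELρ | -1 | 5 | 6.68 | 2 | 25 | 1.04 | 7 | 31 | 0 | 3 | 1.05 | 0 | 11 | 1.05 |
| | 400 | PL | -1 | 7 | | 1 | 13 | | | | 0 | 2 | | 0 | 5 | |
| | | DEL | 0 | 2 | 7.10 | 0 | 10 | 1.60 | | | 0 | 1 | 1.87 | 0 | 4 | 2.36 |
| | | DELρ | 0 | 2 | 6.98 | 0 | 13 | 1.02 | 1 | 13 | 0 | 1 | 1.05 | 0 | 5 | 1.04 |
| 50% | 100 | PL | -2 | 17 | | 0 | 31 | | | | 0 | 3 | | 1 | 13 | |
| | | DEL | -1 | 5 | 9.64 | -2 | 23 | 1.84 | | | 0 | 2 | 1.81 | 1 | 8 | 2.54 |
| | | DELρ | -1 | 5 | 9.61 | -1 | 30 | 1.04 | 6 | 32 | 0 | 3 | 1.01 | 2 | 13 | 1.03 |
| | 400 | PL | -1 | 8 | | 1 | 15 | | | | 0 | 2 | | 0 | 6 | |
| | | DEL | 1 | 2 | 9.66 | 0 | 12 | 1.72 | | | 0 | 1 | 1.87 | 0 | 4 | 2.50 |
| | | DELρ | 1 | 2 | 9.56 | 0 | 15 | 1.02 | 1 | 14 | 0 | 2 | 1.05 | 0 | 6 | 1.04 |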

NOTE: β1 and β2 are the regression coefficients, where the true parameter values are (−0.5, 0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂PL; DEL, the double empirical likelihood estimator β̂; DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

Table 2.

Summary statistics for the estimation of Model (B) λ(t | X1, X2) = λ0(t) exp(β1X1 + β2X2 + β3X1X2) when ρ = 1.

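| Proportion censored | n | Estimator | β1 Bias | β1 ES | β1 RE | β2 Bias | β2 ES | β2 RE | β3 Bias | β3 ES | β3 RE | ρ Bias | ρ ES | Λ(0.3) Bias | Λ(0.3) ES | Λ(0.3) RE | Λ(0.7) Bias | Λ(0.7) ES | Λ(0.7) RE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 100 | PL | -1 | 17 | | 2 | 24 | | -2 | 24 | | | | 0 | 3 | | 0 | 10 | |
| | | DEL | 0 | 6 | 9.45 | 0 | 18 | 1.80 | -1 | 17 | 1.84 | | | 0 | 2 | 1.84 | 0 | 6 | 2.54 |
| | | DELρ | 0 | 6 | 9.38 | 2 | 23 | 1.02 | -2 | 18 | 1.72 | 6 | 26 | 0 | 3 | 1.01 | 0 | 10 | 1.04 |
| | 400 | PL | -1 | 8 | | 1 | 12 | | 0 | 11 | | | | 0 | 1 | | 0 | 5 | |
| | | DEL | 0 | 3 | 9.44 | 1 | 9 | 1.77 | 0 | 8 | 1.64 | | | 0 | 1 | 2.01 | 0 | 3 | 2.35 |
| | | DELρ | 0 | 3 | 9.41 | 1 | 12 | 1.01 | 0 | 9 | 1.50 | 1 | 12 | 0 | 1 | 1.03 | 0 | 5 | 1.03 |
| 30% | 100 | PL | -2 | 21 | | 3 | 29 | | -2 | 28 | | | | 0 | 3 | | 0 | 12 | |
| | | DEL | -1 | 6 | 13.8 | 0 | 21 | 1.92 | -1 | 20 | 1.93 | | | 0 | 2 | 1.96 | 0 | 8 | 2.44 |
| | | DELρ | -1 | 6 | 13.7 | 2 | 28 | 1.04 | -2 | 21 | 1.84 | 8 | 31 | 0 | 3 | 1.03 | 0 | 12 | 1.05 |
| | 400 | PL | -1 | 9 | | 1 | 14 | | 0 | 13 | | | | 0 | 2 | | 0 | 6 | |
| | | DEL | 0 | 3 | 13.7 | 0 | 10 | 1.88 | -1 | 10 | 1.78 | | | 0 | 1 | 2.04 | 0 | 4 | 2.42 |
| | | DELρ | 0 | 3 | 13.6 | 1 | 13 | 1.04 | -1 | 10 | 1.70 | 2 | 14 | 0 | 1 | 1.04 | 0 | 5 | 1.06 |
| 50% | 100 | PL | -2 | 27 | | 4 | 35 | | -2 | 36 | | | | 0 | 3 | | 0 | 14 | |
| | | DEL | 0 | 6 | 23.2 | -1 | 24 | 2.12 | -4 | 25 | 2.13 | | | 0 | 2 | 2.05 | 1 | 9 | 2.45 |
| | | DELρ | 0 | 6 | 23.0 | 2 | 33 | 1.09 | -4 | 25 | 2.11 | 9 | 35 | 0 | 3 | 1.05 | 0 | 14 | 1.08 |
| | 400 | PL | -1 | 12 | | 1 | 16 | | 0 | 16 | | | | 0 | 2 | | 0 | 7 | |
| | | DEL | 1 | 3 | 21.0 | 0 | 12 | 1.94 | -2 | 11 | 1.95 | | | 0 | 1 | 2.04 | 0 | 4 | 2.35 |
| | | DELρ | 1 | 3 | 21.0 | 0 | 16 | 1.03 | -2 | 11 | 1.91 | 2 | 15 | 0 | 2 | 1.04 | 0 | 6 | 1.06 |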

NOTE: β1, β2, and β3 are the regression coefficients, where the true parameter values are (−0.5, 1, −0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂PL; DEL, the double empirical likelihood estimator β̂; DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

In the second set of simulations, we assumed that the auxiliary survival information is derived from the Cox model with a different baseline hazard function λ*(t) = 1.5λ(t), that is, we set ρ = 1.5 in model (3). The 6-month survival probabilities for the same subgroups are approximately 0.57 and 0.77 under both Model (A) and Model (B). Tables 3 and 4 give the summary statistics of the simulation results. As expected, the double empirical likelihood estimator β̂, which assumes consistency between the individual-level data and the aggregate data, yields biased estimates. On the other hand, the extended double empirical likelihood estimator β̂ρ performs very well in terms of bias and efficiency gain, and the estimated value of ρ is very close to its true value 1.5.

Table 3.

Summary statistics for the estimation of Model (A) λ(t | X1, X2) = λ0(t) exp(β1X1 + β2X2) when ρ = 1.5.

Proportion censored β1 β2 ρ Λ(0.3) Λ(0.7)
n Coef Bias ES RE Bias ES RE Bias ES Bias ES RE Bias ES RE
0% 100 PL -1 12 2 22 0 3 0 10
DEL 0 5 5.99 -22 15 1.88 4 3 0.76 15 7 1.92
DELρ -1 5 4.71 1 22 1.02 8 38 0 3 1.02 0 10 1.03
400 PL 0 6 1 11 0 1 0 5
DEL 2 2 6.26 -21 8 1.81 4 2 0.88 15 3 1.90
DELρ 1 2 5.68 0 11 1.02 2 17 0 1 1.04 0 5 1.05
30% 100 PL -2 14 2 27 0 3 0 12
DEL 1 5 8.39 -25 19 1.99 4 3 0.82 16 8 2.01
DELρ 0 2 5.20 0 11 1.01 2 17 0 1 1.03 0 5 1.04
400 PL -1 7 1 13 0 2 0 5
DEL 1 2 7.59 -23 9 1.92 4 2 0.90 15 4 1.88
DELρ 0 2 7.01 0 13 1.02 2 19 0 1 1.04 0 5 1.04
50% 100 PL -2 16 2 32 0 3 0 13
DEL 1 5 8.84 -28 22 2.09 4 3 0.97 16 10 1.84
DELρ 0 6 8.29 1 31 1.04 12 50 0 3 1.02 1 13 1.03
400 PL -1 8 1 15 0 2 0 6
DEL 2 2 9.99 -27 11 2.01 4 2 0.90 16 4 1.96
DELρ 1 2 9.39 0 15 1.02 2 21 0 2 1.05 0 6 1.04

NOTE: β1 and β2 are the regression coefficients, where the true parameter values are (−0.5, 0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂PL; DEL, the double empirical likelihood estimator β̂; DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

Table 4.

Summary statistics for the estimation of Model (B) λ(t | X1, X2) = λ0(t)exp(β1X1 + β2X2 + β3X1X2) when ρ = 1.5.

Proportion censored β1 β2 β3 ρ Λ(0.3) Λ(0.7)
n Coef Bias ES RE Bias ES RE Bias ES RE Bias ES Bias ES RE Bias ES RE
0% 100 PL -1 17 2 24 -2 24 0 3 10
DEL 0 5 9.70 -24 16 2.10 6 17 2.04 4 3 0.88 15 7 2.02
DELρ 0 6 9.38 2 23 1.02 -2 18 1.73 9 40 0 3 1.01 0 10 1.04
400 PL -1 8 1 12 0 11 0 1 0 5
DEL 2 2 10.1 -24 8 2.08 6 8 1.82 4 1 0.95 15 4 1.90
DELρ 1 2 9.75 1 12 1.02 -1 9 1.51 2 19 0 1 1.02 0 5 1.03
30% 100 PL -2 21 3 29 -2 28 0 3 0 12
DEL 0 6 14.1 -27 19 2.20 4 20 2.09 4 3 0.95 16 9 1.93
DELρ -1 6 13.6 2 28 1.04 -2 21 1.84 12 46 0 3 1.03 0 12 1.05
400 PL -1 9 1 14 0 13 0 2 0 6
DEL 1 3 14.2 -26 9 2.17 5 10 1.92 4 2 0.98 15 4 1.94
DELρ 0 3 13.7 1 13 1.04 -1 10 1.70 3 21 0 1 1.04 0 6 1.05
50% 100 PL -2 27 4 35 -2 36 0 3 0 14
DEL 0 6 23.5 -30 23 2.36 0 24 2.23 4 3 1.00 16 10 1.93
DELρ 0 6 22.6 2 33 1.09 -4 25 2.11 14 53 0 3 1.06 0 14 1.08
400 PL -1 12 1 16 0 16 0 2 0 7
DEL 2 3 21.3 -29 11 2.19 2 11 2.05 4 2 0.99 16 5 1.88
DELρ 1 3 20.6 0 16 1.03 -2 11 1.90 3 23 0 2 1.04 0 6 1.06

NOTE: β1, β2, and β3 are the regression coefficients, where the true parameter values are (−0.5, 1, −0.5); Λ(t) = t² is the baseline cumulative hazard function evaluated at t; PL, the maximum partial likelihood estimator β̂PL; DEL, the double empirical likelihood estimator β̂; DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data; Bias and ES, empirical bias (×100) and empirical standard deviation (×100) of 1,000 regression parameter estimates; RE, the empirical variance of the maximum partial likelihood estimator divided by that of the double empirical likelihood estimators.

5.2 Data Example

5.2.1 Example 1: Prostate cancer study

To demonstrate how the proposed methods can improve efficiency by combining auxiliary survival information with individual-level data, we investigated the comparative effectiveness of two modes of androgen deprivation therapy (ADT), intermittent and continuous ADT (IADT and CADT), on survival outcomes in men with advanced prostate cancer. Continuous ADT has been the conventional palliative approach in the U.S. for the control of advanced prostate cancer, and IADT has been proposed as an alternative because of its potential to improve quality of life, reduce cost, and reduce the risk of side effects. However, it remains unclear whether IADT offers a survival benefit comparable to that of CADT. While age and prostate-specific antigen (PSA) level have been shown to be important prognostic factors in advanced prostate cancer, it is also unclear whether the comparative effectiveness of IADT versus CADT differs by PSA level at diagnosis after adjustment for age at diagnosis. Although clinical trials and meta-analyses have been conducted to investigate this issue, the answer remains inconclusive owing to a lack of the statistical power needed for subgroup analysis (Tsai et al., 2013; Hussain et al., 2013).

We used the linked Surveillance, Epidemiology, and End Results (SEER)-Medicare dataset in this example. The SEER-Medicare dataset links incident cancer patients identified from SEER registries to their Medicare data (Medicare is the major insurer in the U.S. for people 65 years and older), providing longitudinal inpatient and outpatient treatment information from which patients' receipt of ADT can be determined. Our study population was defined as men 66 years and older who were diagnosed with advanced prostate cancer between January 1, 2004 and December 31, 2009 and who received ADT at any time during 2004 to 2010. After excluding men who did not have continuous Medicare coverage from 2004 to 2010, who received neither CADT nor IADT, or who were missing age at diagnosis or PSA measurements, a total of 4548 patients were included in this illustrative example. Among these patients, 71.7% received continuous ADT and 45.0% died before December 31, 2010. The median age at diagnosis was 75 years and the median PSA level was 16.2 ng/mL. To illustrate the proposed estimation procedure, we randomly selected 300 cases (6.6% of the 4548 available cases) from the complete dataset. The selected subset has a median age at diagnosis of 75 years and a median PSA level of 17.1 ng/mL; 68.7% of the cases in the subset received continuous ADT and 46% died before December 31, 2010.

We fitted a Cox model to survival after diagnosis of prostate cancer, with treatment (CADT versus IADT), PSA level (0-40 and > 40 ng/mL), their interaction, and age at diagnosis (66-75, 76-80, and > 80 years) as categorical covariates. Table 5 shows the maximum partial likelihood estimates and their standard errors based on the complete dataset with 4548 cases and on the subset of 300 selected cases. We also applied the two double empirical likelihood estimators to combine the subset data with three sets of auxiliary survival information at the 5-year landmark: (I) the five-year survival probabilities were 70.5% and 34.5% for IADT-treated patients whose PSA levels at diagnosis were below and above 40 ng/mL, respectively; (II) the five-year survival probabilities were 64.2% and 20.1% for CADT-treated patients whose PSA levels at diagnosis were below and above 40 ng/mL, respectively; (III) the survival information in (I) and (II) is available for both the IADT and CADT groups. These auxiliary survival probabilities were derived by applying the Kaplan-Meier estimator to the corresponding subgroups of the complete dataset. To obtain standard errors for the estimated regression coefficients, we used a nonparametric bootstrap, sampling 300 subjects with replacement from the subset data. The resampling procedure was repeated 1000 times, and the standard errors were estimated by the standard deviation of the 1000 estimates.
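The auxiliary probabilities in scenarios (I) to (III) were obtained by applying the Kaplan-Meier estimator within subgroups of the complete dataset. A minimal NumPy sketch of that step, computing a t-year product-limit estimate for each subgroup label, is below; the function names, the 0/1 event coding, and the toy data are our own illustrative assumptions, not the original analysis code.

```python
import numpy as np

def km_survival_at(time, event, t):
    """Kaplan-Meier (product-limit) estimate of S(t) from right-censored data.

    time: observed times Y_i; event: 1 if the death was observed, 0 if censored.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv = 1.0
    # Multiply (1 - deaths / number at risk) over distinct death times u <= t.
    for u in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        surv *= 1.0 - deaths / at_risk
    return float(surv)

def subgroup_survival(time, event, groups, t):
    """t-year Kaplan-Meier survival probability within each subgroup label."""
    time, event, groups = np.asarray(time), np.asarray(event), np.asarray(groups)
    return {g.item(): km_survival_at(time[groups == g], event[groups == g], t)
            for g in np.unique(groups)}

# Toy data (not study data): two hypothetical subgroups, 5-year landmark.
years = [1.0, 2.0, 3.0, 6.0, 4.0, 6.0, 7.0, 8.0]
died = [1, 0, 1, 0, 1, 0, 1, 0]
label = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(subgroup_survival(years, died, label, 5.0))  # {'A': 0.375, 'B': 0.75}
```

In the prostate cancer example, the labels would be formed from treatment and PSA group, and only the resulting probabilities (not the complete data) enter the double empirical likelihood constraints.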

Table 5.

Estimated regression coefficients of the Cox model for the prostate cancer study.

cADT PSA cADT*PSA 76-80 years >80 years ρ
Coef SE Coef SE Coef SE Coef SE Coef SE Coef SE
Complete 0.193 0.066 0.905 0.084 0.359 0.099 0.316 0.058 0.764 0.051
Subset 0.433 0.251 1.121 0.298 0.117 0.377 0.268 0.254 0.822 0.205
DEL I 0.309 0.156 1.076 0.083 0.158 0.251 0.266 0.248 0.804 0.198
DELρ I 0.414 0.213 1.081 0.084 0.155 0.252 0.270 0.251 0.823 0.204 1.132 0.212
DEL II 0.382 0.220 1.125 0.300 0.193 0.309 0.259 0.251 0.812 0.199
DELρ II 0.398 0.229 1.125 0.299 0.192 0.308 0.255 0.249 0.805 0.199 0.977 0.115
DEL III 0.236 0.059 1.076 0.083 0.240 0.106 0.259 0.248 0.797 0.194
DELρ III 0.237 0.060 1.077 0.084 0.241 0.107 0.263 0.249 0.806 0.199 1.022 0.111

NOTE: Coef, the estimated coefficient; SE, the bootstrap standard error given by the standard deviation of the 1000 bootstrap estimates, except for the complete dataset with 4548 cases, for which SEs are asymptotic standard error estimates. DEL, the double empirical likelihood estimator β̂; DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data.
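The bootstrap standard errors in Table 5 resample subjects from the 300-case subset and take the standard deviation of the 1000 refitted estimates. A generic sketch follows; `estimator` is a placeholder for refitting the partial likelihood or double empirical likelihood estimator, and we assume that the auxiliary survival probabilities are held fixed across resamples.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error.

    data: array whose first axis indexes subjects; each resample draws n subjects
    with replacement and keeps each subject's row intact.
    estimator: function mapping a resampled data array to a scalar or 1-D estimate.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = [estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    # Standard error = standard deviation of the bootstrap estimates.
    return np.std(np.asarray(estimates), axis=0, ddof=1)

# Sanity check: for a sample mean, the bootstrap SE should be close to s / sqrt(n).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
print(bootstrap_se(x, np.mean), x.std(ddof=1) / np.sqrt(x.size))
```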

As expected, incorporating auxiliary survival information through the proposed methods generally yields smaller standard errors than the conventional partial likelihood analysis of the subset data, and the largest efficiency gains are observed in Scenario III, where the most auxiliary information is incorporated. The estimator β̂ρ, which allows for a different baseline hazard function for the aggregate data, gives results almost identical to those of β̂, which assumes consistency between the two populations. The estimated value of ρ is close to 1, relative to its standard error, in all scenarios, as it should be because the selected subset is a random sample of the complete data. Interestingly, the analysis of the complete data shows that the comparative effectiveness of IADT versus CADT differs significantly by PSA level after adjusting for age at diagnosis (cADT*PSA coefficient 0.359, SE 0.099). This significance disappears in the conventional analysis of the subset data (0.117, SE 0.377) but is recovered by the proposed approach in Scenario III (0.240, SE 0.106), highlighting the practical value of a more efficient methodology.
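To make the significance statements concrete, the sketch below computes two-sided Wald statistics for the cADT*PSA interaction from the Table 5 estimates. The use of a Wald test with a normal reference distribution is our assumption; the text does not name the test. The function name `wald_test` is illustrative.

```python
import math

def wald_test(coef, se):
    """Two-sided Wald test of H0: coefficient = 0 with a normal reference."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * {1 - Phi(|z|)}
    return z, p

# cADT*PSA interaction (Coef, SE) taken from Table 5.
interaction = {
    "Complete": (0.359, 0.099),
    "Subset": (0.117, 0.377),
    "DEL I": (0.158, 0.251),
    "DEL II": (0.193, 0.309),
    "DEL III": (0.240, 0.106),
}
for name, (coef, se) in interaction.items():
    z, p = wald_test(coef, se)
    print(f"{name:<8} z = {z:5.2f}   p = {p:.3f}")
```

Under this reading, only the complete-data fit and Scenario III reach p < 0.05 (z ≈ 3.63 and 2.26); the subset fit and Scenarios I and II do not (z ≈ 0.31, 0.63, and 0.62).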

5.2.2 Example 2: Pancreatic cancer study

We now apply the extended double empirical likelihood estimator to another example in which the auxiliary survival information may be inconsistent with the individual-level data. We analyzed data from a pancreatic cancer study conducted at the Johns Hopkins Hospital to study risk factors affecting survival following pancreatectomy. Pancreatic ductal adenocarcinoma (PDAC), the most common histological subtype of pancreatic malignancy, is a highly treatment-resistant disease with a very poor prognosis. To date, radical surgical resection remains the only treatment for PDAC that offers a clinical benefit in terms of overall survival. Unfortunately, fewer than 20% of pancreatic cancer patients have surgically resectable disease at the time of diagnosis, and most resected pancreatic cancers recur within 5 years. Despite advances in cancer treatment during the past few decades, improvement in the long-term survival of PDAC patients has been modest. The major factors influencing survival after pancreatic cancer surgery are tumor characteristics; favorable prognostic factors include a negative resection margin, negative lymph nodes, and the absence of perineural invasion.

This data example is from a retrospective cohort study of 209 consecutive patients who underwent surgical resection of PDAC and were followed up at the Johns Hopkins Hospital from January 9, 1998 to June 13, 2007. Thorough chart reviews were conducted to ascertain patients' demographics and the results of laboratory tests and clinical and pathological examinations. Treatment data were collected from the electronic medical records. Disease recurrence was determined clinically through imaging studies (computed tomography, positron emission tomography) or pathological diagnosis (CT-guided biopsy, wedge resection, or lobectomy). All-cause and cancer-specific deaths and dates of death were determined by a combined review of clinical follow-up information, the Social Security Death Index, and the National Cancer Database.

We fitted a Cox model to evaluate the effects of positive lymph nodes, positive resection margins, presence of perineural invasion (PNI), age at surgery (≤ 65 and > 65 years), and gender on overall survival. Table 6 shows the estimated covariate effects using the partial likelihood method. We also applied the extended double empirical likelihood estimator β̂ρ to incorporate three sets of auxiliary survival information reported in Cameron et al. (2006): (I) the three-year survival probabilities for node-negative and node-positive patients were 40% and 26%, respectively; (II) the three-year survival probabilities for margin-negative and margin-positive patients were 35% and 20%, respectively; (III) all four survival probabilities given in (I) and (II). These survival probabilities were estimated from 1000 consecutive pancreaticoduodenectomies performed by a single surgeon between March 1969 and May 2003.

Table 6.

Estimated regression coefficients of the Cox model for the pancreatic cancer study.

Nodes Margins PNI > 65 years Male ρ
Coef SE Coef SE Coef SE Coef SE Coef SE Coef SE
PL 0.37 0.22 0.41 0.17 1.09 0.42 0.28 0.16 -0.29 0.15
DELρ I 0.25 0.10 0.42 0.17 1.09 0.40 0.29 0.16 -0.29 0.16 0.80 0.08
DELρ II 0.37 0.23 0.34 0.06 1.10 0.40 0.27 0.16 -0.30 0.16 0.80 0.08
DELρ III 0.27 0.08 0.36 0.05 1.10 0.41 0.28 0.16 -0.30 0.15 0.80 0.08

NOTE: Coef, the estimated coefficient; SE, the bootstrap standard error given by the standard deviation of the 1000 bootstrap estimates, except for the maximum partial likelihood estimator, for which SEs are asymptotic standard error estimates. DELρ, the extended double empirical likelihood estimator β̂ρ that allows for a different baseline hazard function for the aggregate data.

We report only the results of β̂ρ in Table 6 because the baseline hazard function for the aggregate data is expected to differ owing to potential differences in patient characteristics. As before, incorporating auxiliary survival information through the proposed method yields smaller standard errors than the conventional partial likelihood approach for the covariates targeted by the auxiliary information (nodes in Scenario I, margins in Scenario II, and both in Scenario III), and the largest efficiency gains are observed in Scenario III, where the most information is incorporated. Note that the effect of positive lymph nodes is only marginally significant under the partial likelihood method but becomes statistically significant when combined with the auxiliary node-specific survival information in Scenarios I and III. The estimated value of ρ is 0.80 (SE 0.08), significantly lower than 1, indicating that the baseline risk among patients in the study reported by Cameron et al. (2006) is lower than that among patients in our clinical study.
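As a rough check on these significance statements, and assuming two-sided Wald tests that use the bootstrap SEs in Table 6 with a normal reference distribution (the paper does not name the test), the arithmetic is:

$$z_\rho = \frac{0.80 - 1}{0.08} = -2.5 \;(p \approx 0.012), \qquad z_{\text{nodes}}^{\text{PL}} = \frac{0.37}{0.22} \approx 1.68 \;(p \approx 0.09), \qquad z_{\text{nodes}}^{\text{DEL}\rho\,\text{III}} = \frac{0.27}{0.08} = 3.375 \;(p < 0.001).$$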

6. Remarks

In this paper, we have proposed two double empirical likelihood approaches to synthesize information from patient-level right-censored survival data and auxiliary survival information. We first construct an efficient estimation procedure by imposing consistency between the individual-level data and the aggregate data, and then extend the procedure to allow for a different baseline hazard in the two data sources. Many researchers, including Imbens and Lancaster (1994), Hellerstein and Imbens (1999), and Chaudhuri et al. (2008), have considered estimation of general and generalized linear models under additional moment restrictions. Most of the existing work deals with complete data, whereas this paper considers estimation of the semiparametric Cox proportional hazards model from right-censored survival data by incorporating a system of nonlinear constraints. The simulation studies show large gains in efficiency from incorporating published survival probabilities as marginal moment constraints. We expect the proposed methodology to be useful in the practice of meta-analysis.

The proposed approaches are more flexible than conventional meta-analysis in that they can automatically combine survival information for different subgroups, possibly derived from different studies. For example, one study may publish survival probabilities for different age groups, while another may publish survival probabilities for different disease stages. The proposed double empirical likelihood methods provide a unified framework for incorporating all of the available information in the form of nonlinear constraints.
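The basic device behind such constraints is the empirical likelihood of Qin and Lawless (1994): weights p_i are chosen to maximize Σ log p_i subject to Σ p_i = 1 and Σ p_i g_i = 0, which gives p_i = 1/{n(1 + λ′g_i)}. The sketch below solves for λ by damped Newton iterations for a generic constraint matrix g. It is not the authors' double empirical likelihood procedure, which also profiles over β and the baseline hazard; in that setting the rows of g would be the auxiliary constraint functions ψ(X_i, β, α), which we do not reconstruct here. The function name `el_weights` is illustrative.

```python
import numpy as np

def el_weights(g, tol=1e-10, max_iter=100):
    """Empirical likelihood weights under the moment constraints sum_i p_i g_i = 0.

    Maximizes sum_i log p_i subject to sum_i p_i = 1 and sum_i p_i g_i = 0,
    whose solution is p_i = 1 / {n (1 + lam' g_i)}.
    g: array of shape (n,) or (n, K), one row of constraint values per subject.
    Requires 0 to lie in the interior of the convex hull of the rows of g.
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    n, k = g.shape
    lam = np.zeros(k)

    def dual(l):
        # Concave dual objective sum_i log(1 + l' g_i); -inf outside the domain.
        d = 1.0 + g @ l
        return np.sum(np.log(d)) if np.all(d > 1.0 / n) else -np.inf

    for _ in range(max_iter):
        d = 1.0 + g @ lam
        grad = g.T @ (1.0 / d)                 # sum_i g_i / (1 + lam' g_i)
        if np.max(np.abs(grad)) < tol:
            break
        info = (g / d[:, None] ** 2).T @ g     # minus the Hessian of the dual
        step = np.linalg.solve(info, grad)     # Newton ascent direction
        t = 1.0
        while dual(lam + t * step) < dual(lam) and t > 1e-12:
            t /= 2.0                           # step-halving keeps all weights positive
        lam = lam + t * step
    return 1.0 / (n * (1.0 + g @ lam)), lam

# Two pieces of "external" information: the mean of x is 3.5 and P(x > 3) is 0.5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p, lam = el_weights(np.column_stack([x - 3.5, (x > 3) - 0.5]))
print(p.round(4), p @ x, p[x > 3].sum())
```

Each column of g plays the role of one published summary, so adding information from another study amounts to appending a column.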

The sample size of the external data source, such as disease registries, is in general much larger than that of the individual-level data. As a result, the variability in the published survival information is usually negligible compared to the variability in the parameter estimates using the individual-level data. In the case where the variability is not negligible, a higher-order Taylor expansion needs to be employed to summarize the auxiliary survival information as estimating equations. This will be explored elsewhere.

Acknowledgments

The authors thank the editor, the associate editor, and two referees for their constructive comments. Thanks are due to Dr. Lei Zheng for kindly sharing the pancreatic cancer data. The authors also acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development and Information, CMS; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. This work is supported by the National Institutes of Health.

Appendix: Large-Sample Properties

Proof of Asymptotic Normality for U0. Define $N_i(t) = \Delta_i I(Y_i \le t)$ and $M_i(t) = N_i(t) - \int_0^t I(Y_i \ge u)\exp(\beta_0' X_i)\,d\Lambda_0(u)$. Thus we have

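$$U_{01} = \sum_{i=1}^n \int_0^\infty \left\{ X_i - \frac{S^{(1)}(u,\beta_0)}{S^{(0)}(u,\beta_0)} \right\} dM_i(u), \quad U_{02} = \sum_{i=1}^n \psi(X_i,\beta_0,\alpha_0), \quad U_{03} = \sum_{i=1}^n \int_0^\infty \frac{I(u \le t^*)}{S^{(0)}(u,\beta_0)}\, dM_i(u), \quad U_{04} = 0.$$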

For convenience, let $s^{(k)}(t,\beta)$ denote the limit of $S^{(k)}(t,\beta)$, that is, $s^{(k)}(t,\beta) = \lim_{n\to\infty} S^{(k)}(t,\beta)$ for $k \in \{0,1,2\}$. Because $M_i(t)$ is a local square-integrable martingale and $X_i - S^{(1)}(u,\beta_0)/S^{(0)}(u,\beta_0)$ and $I(u \le t^*)/S^{(0)}(u,\beta_0)$ are both predictable processes, we have $E(U_{01}) = 0$, $E(U_{03}) = 0$,

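$$\Sigma \equiv \operatorname{var}(n^{-1/2}U_{01}) = \int_0^\infty \left[ s^{(2)}(u,\beta_0) - \{s^{(0)}(u,\beta_0)\}^{-1}\{s^{(1)}(u,\beta_0)\}^{\otimes 2} \right] d\Lambda_0(u),$$
$$K \equiv \operatorname{var}(n^{-1/2}U_{03}) = \int_0^{t^*} \{s^{(0)}(u,\beta_0)\}^{-1}\, d\Lambda_0(u),$$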

and

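$$\operatorname{cov}(U_{01},U_{03}) = \sum_{i=1}^n \int_0^\infty E\left[\left\{X_i - \frac{S^{(1)}(u,\beta_0)}{S^{(0)}(u,\beta_0)}\right\}\frac{I(u \le t^*)}{S^{(0)}(u,\beta_0)}\exp(\beta_0'X_i)\,I(Y_i \ge u)\right] d\Lambda_0(u) = 0.$$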

Moreover, by double expectation, it can be shown that $E(U_{01}U_{02}') = 0$ and $E(U_{03}U_{02}') = 0$. Define $J = \operatorname{var}(n^{-1/2}U_{02})$. It is easy to see that $|\psi_k(X,\beta_0,\alpha_0)| \le 2$ for $k = 1, \ldots, K$. Thus, by the martingale central limit theorem and the classical central limit theorem, $n^{-1/2}U_0$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix $\Omega = \operatorname{diag}(\Sigma, J, K, 0)$ as $n \to \infty$.

Proof of Theorem 1. Arguing as in the proof of Lemma 1 in Qin and Lawless (1994), under some regularity conditions we can show that, as $n \to \infty$, the fully constrained empirical likelihood attains its maximum, with probability 1, at some point $(\hat\beta,\hat\alpha)$ in the interior of the ball $\{(\beta,\alpha): \|(\beta,\alpha) - (\beta_0,\alpha_0)\| \le n^{-1/3}\}$. Next, straightforward algebra yields

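$$n^{-1}E\left(\frac{\partial U(\theta)}{\partial\theta}\bigg|_{\theta=\theta_0}\right) = \begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}, \qquad (4)$$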

where

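$$B = n^{-1}\left\{ E\left(\frac{\partial U_1}{\partial\xi}\right)\bigg|_{\theta=\theta_0}, E\left(\frac{\partial U_1}{\partial\nu}\right)\bigg|_{\theta=\theta_0}, E\left(\frac{\partial U_1}{\partial\alpha}\right)\bigg|_{\theta=\theta_0} \right\} = \left[ -E\left\{\frac{\partial\psi'(X_1,\beta_0,\alpha_0)}{\partial\beta}\right\}, \int_0^{t^*}\frac{s^{(1)}(u,\beta_0)}{s^{(0)}(u,\beta_0)}\,d\Lambda_0(u), 0 \right] := (B_1, B_2, 0),$$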

and, similarly,

$$Q = \begin{pmatrix} J & 0 & H' \\ 0 & K & 1 \\ H & 1 & 0 \end{pmatrix},$$

with $H = -n^{-1}E\{\partial\psi(X_1,\beta_0,\alpha_0)/\partial\alpha\}$. Because J and K are positive definite, so is $HJ^{-1}H' + K^{-1}$, and hence Q is nonsingular. By a block (LDU) decomposition, one can derive

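$$Q^{-1} = \begin{pmatrix} I & 0 & -J^{-1}H' \\ 0 & 1 & -K^{-1} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} J^{-1} & 0 & 0 \\ 0 & K^{-1} & 0 \\ 0 & 0 & -(HJ^{-1}H' + K^{-1})^{-1} \end{pmatrix} \begin{pmatrix} I & 0 & 0 \\ 0 & 1 & 0 \\ -HJ^{-1} & -K^{-1} & 1 \end{pmatrix}.$$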

Define Γ = Σ + BQ−1B′, then

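$$\begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}^{-1} = \begin{pmatrix} -\Gamma^{-1} & \Gamma^{-1}BQ^{-1} \\ Q^{-1}B'\Gamma^{-1} & (Q + B'\Sigma^{-1}B)^{-1} \end{pmatrix}.$$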

By definition, $\hat\theta$ is the solution to $U(\hat\theta) = 0$. Write $U_0 = (U_{01}, U^*)$ with $U^* = (U_{02}, U_{03}, 0)$. Then, by (4) and a Taylor series expansion, we have

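$$n^{1/2}(\hat\theta - \theta_0) = -\begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}^{-1}\begin{pmatrix} n^{-1/2}U_{01} \\ n^{-1/2}U^* \end{pmatrix} + o_p(1). \qquad (5)$$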

Thus we establish the asymptotic representation $n^{1/2}(\hat\beta - \beta_0) = \Gamma^{-1}(n^{-1/2}U_{01}) - \Gamma^{-1}BQ^{-1}(n^{-1/2}U^*) + o_p(1)$. Because $U_{01}$ and $U^*$ are orthogonal, the asymptotic variance of $n^{1/2}(\hat\beta - \beta_0)$ is given by

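$$\begin{aligned}
&\Gamma^{-1}\Sigma\Gamma^{-1} + \Gamma^{-1}BQ^{-1}\begin{pmatrix} J & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & 0 \end{pmatrix}Q^{-1}B'\Gamma^{-1} \\
&= \Gamma^{-1}(\Gamma - BQ^{-1}B')\Gamma^{-1} + \Gamma^{-1}BQ^{-1}\begin{pmatrix} J & 0 & 0 \\ 0 & K & 0 \\ 0 & 0 & 0 \end{pmatrix}Q^{-1}B'\Gamma^{-1} \\
&= \Gamma^{-1} - \Gamma^{-1}BQ^{-1}\begin{pmatrix} 0 & 0 & H' \\ 0 & 0 & 1 \\ H & 1 & 0 \end{pmatrix}Q^{-1}B'\Gamma^{-1}.
\end{aligned}$$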

Let $G = HJ^{-1}H' + K^{-1}$. It can be verified that

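$$Q^{-1}\begin{pmatrix} 0 & 0 & H' \\ 0 & 0 & 1 \\ H & 1 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} 0 & 0 & J^{-1}H'G^{-1} \\ 0 & 0 & K^{-1}G^{-1} \\ G^{-1}HJ^{-1} & G^{-1}K^{-1} & -2G^{-1} \end{pmatrix}.$$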

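Because the leading block of this matrix corresponding to (ξ, ν) is zero and the last component of B = (B1, B2, 0) is zero, it follows that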

$$BQ^{-1}\begin{pmatrix} 0 & 0 & H' \\ 0 & 0 & 1 \\ H & 1 & 0 \end{pmatrix}Q^{-1}B' = 0. \qquad (6)$$

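Hence the asymptotic variance of $n^{1/2}(\hat\beta - \beta_0)$ is $\Gamma^{-1} = (\Sigma + BQ^{-1}B')^{-1} \le \Sigma^{-1}$, where $\Sigma^{-1}$ is the asymptotic variance of the maximum partial likelihood estimator.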
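The matrix identities used in this proof can be checked numerically. The sketch below draws arbitrary positive definite Σ and J, a positive scalar K, and arbitrary B1, B2, and H, taking ν and α to be scalar so that H is a 1 × K row (our assumption). It forms Q = (J, 0, H′; 0, K, 1; H, 1, 0) and B = (B1, B2, 0), sets P = Q − blockdiag(J, K, 0), and confirms that BQ^{-1}PQ^{-1}B′ = 0, that the sandwich variance reduces to Γ^{-1} with Γ = Σ + BQ^{-1}B′, and that Σ^{-1} − Γ^{-1} is positive semidefinite. It checks only the algebra under this assumed block layout and sign convention, not the probabilistic arguments.

```python
import numpy as np

rng = np.random.default_rng(2016)
p, K = 3, 4                      # dim(beta) and number of auxiliary constraints (illustrative)

def random_pd(m):
    """Random symmetric positive definite m x m matrix."""
    a = rng.normal(size=(m, m))
    return a @ a.T + m * np.eye(m)

Sigma = random_pd(p)             # partial likelihood information
J = random_pd(K)                 # var(n^{-1/2} U_02)
k_var = 0.7                      # var(n^{-1/2} U_03), scalar because nu is scalar
H = rng.normal(size=(1, K))      # H as a 1 x K row (alpha scalar, assumed)
B1 = rng.normal(size=(p, K))
B2 = rng.normal(size=(p, 1))

z1, zK, one = np.zeros((1, 1)), np.zeros((K, 1)), np.ones((1, 1))
# Q = [[J, 0, H'], [0, K, 1], [H, 1, 0]] for the blocks (xi, nu, alpha).
Q = np.block([[J, zK, H.T],
              [zK.T, k_var * one, one],
              [H, one, z1]])
# P = Q - blockdiag(J, K, 0) = [[0, 0, H'], [0, 0, 1], [H, 1, 0]].
P = np.block([[np.zeros((K, K)), zK, H.T],
              [zK.T, z1, one],
              [H, one, z1]])
B = np.hstack([B1, B2, np.zeros((p, 1))])

Qi = np.linalg.inv(Q)
mid = Qi @ P @ Qi
G = (H @ np.linalg.solve(J, H.T)).item() + 1.0 / k_var

Gamma = Sigma + B @ Qi @ B.T
Gamma_inv = np.linalg.inv(Gamma)
# Sandwich: Gamma^{-1} Sigma Gamma^{-1} + Gamma^{-1} B Q^{-1} blockdiag(J, K, 0) Q^{-1} B' Gamma^{-1}.
sandwich = Gamma_inv @ Sigma @ Gamma_inv + Gamma_inv @ B @ Qi @ (Q - P) @ Qi @ B.T @ Gamma_inv

print("identity (6) holds:", np.allclose(B @ mid @ B.T, 0.0))
print("corner entry equals -2/G:", np.isclose(mid[-1, -1], -2.0 / G))
print("sandwich equals Gamma^{-1}:", np.allclose(sandwich, Gamma_inv))
print("min eigenvalue of Sigma^{-1} - Gamma^{-1}:",
      np.linalg.eigvalsh(np.linalg.inv(Sigma) - Gamma_inv).min())
```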

Proof of Asymptotic Normality for Λ̂(t). To establish the asymptotic normality of Λ̂(t), we first note that the double empirical likelihood-based estimator for the baseline cumulative hazard function can be reexpressed as Λ̂(t, β̂, ν̂), where

$$\hat\Lambda(t,\beta,\nu) = \int_0^t \frac{n^{-1}\sum_{i=1}^n dN_i(u)}{S^{(0)}(u,\beta) + \nu I(u \le t^*)}.$$

Note that $\hat\Lambda$ defines a functional of the two empirical processes $n^{-1}\sum_{i=1}^n dN_i(u)$ and $S^{(0)}(u,\beta) + \nu I(u \le t^*)$, and the mapping defined by $\hat\Lambda$ is compactly differentiable. Define $F_u(t) = E\{N_1(t)\}$ and let $\Lambda_0(t)$ be the true baseline cumulative hazard function. It can be shown that $\hat\Lambda(t,\beta_0,0)$ converges almost surely to $\int_0^t dF_u(u)/s^{(0)}(u,\beta_0) = \Lambda_0(t)$.

A Taylor series expansion of Λ̂(t, β̂, ν̂) about (β0, 0) yields

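$$\hat\Lambda(t,\hat\beta,\hat\nu) = \hat\Lambda(t,\beta_0,0) + \left\{\frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\beta}\bigg|_{(\beta_t,0)}\right\}'(\hat\beta - \beta_0) + \frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\nu}\bigg|_{(\beta_0,\nu_t)}(\hat\nu - 0), \qquad (7)$$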

where (βt,νt) lies between (β̂, ν̂) and (β0, 0). Given the consistency of θ̂ and by the Glivenko-Cantelli theorem, we can show that almost surely

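$$\frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\beta}\bigg|_{(\beta_t,0)} \to E\left\{\frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\beta}\bigg|_{(\beta_0,0)}\right\} = -\int_0^t \frac{s^{(1)}(u,\beta_0)}{\{s^{(0)}(u,\beta_0)\}^2}\,dF_u(u)$$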

and

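$$\frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\nu}\bigg|_{(\beta_0,\nu_t)} \to E\left\{\frac{\partial\hat\Lambda(t,\beta,\nu)}{\partial\nu}\bigg|_{(\beta_0,0)}\right\} = -\int_0^t \frac{I(u \le t^*)}{\{s^{(0)}(u,\beta_0)\}^2}\,dF_u(u)$$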

as n → ∞. Moreover, applying the functional delta method to Λ̂(t, β0, 0) yields

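$$\hat\Lambda(t,\beta_0,0) - \Lambda_0(t) = \int_0^t \frac{n^{-1}\sum_{i=1}^n dN_i(u)}{s^{(0)}(u,\beta_0)} - \int_0^t \frac{S^{(0)}(u,\beta_0)}{\{s^{(0)}(u,\beta_0)\}^2}\,dF_u(u) + o_p(n^{-1/2}). \qquad (8)$$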

It follows from (5), (7), and (8) that $n^{1/2}\{\hat\Lambda(t,\hat\beta,\hat\nu) - \Lambda_0(t)\}$ is asymptotically equivalent to a normalized sum of i.i.d. monotone processes with bounded second moments, and thus, following Example 2.11.16 of van der Vaart and Wellner (1996), converges weakly to a mean-zero Gaussian process.
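For concreteness, the estimator Λ̂(t, β, ν) = ∫0^t n^{−1}Σi dNi(u)/{S(0)(u, β) + νI(u ≤ t*)} used in this proof can be transcribed directly. The sketch assumes S(0)(u, β) = n^{−1}Σj I(Yj ≥ u)exp(β′Xj), the usual Cox risk-set average; the function name and argument layout are illustrative. With ν = 0 it reduces to the Breslow estimator, and a positive ν shrinks the increments at times u ≤ t*.

```python
import numpy as np

def lambda_hat(t, time, event, X, beta, nu=0.0, t_star=np.inf):
    """Lambda_hat(t, beta, nu) = int_0^t n^{-1} sum_i dN_i(u) / {S0(u, beta) + nu I(u <= t_star)}.

    Assumes S0(u, beta) = n^{-1} sum_j I(Y_j >= u) exp(beta' X_j).
    With nu = 0 this is the Breslow estimator of the baseline cumulative hazard.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = time.size
    X = np.asarray(X, dtype=float).reshape(n, -1)
    risk = np.exp(X @ np.atleast_1d(np.asarray(beta, dtype=float)))
    total = 0.0
    for u in np.unique(time[(event == 1) & (time <= t)]):
        dN = np.sum((time == u) & (event == 1)) / n    # n^{-1} sum_i dN_i(u)
        S0 = np.sum(risk[time >= u]) / n              # risk-set average at u
        total += dN / (S0 + nu * (u <= t_star))
    return total

time, event = [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1]
print(lambda_hat(4.0, time, event, [0, 0, 0, 0], [0.0]))                    # Nelson-Aalen when beta = 0
print(lambda_hat(4.0, time, event, [0, 0, 0, 0], [0.0], nu=0.5, t_star=2.0))
```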

Proof of Theorem 2. Note that $(\tilde\beta,\tilde\alpha)$ is the solution to $U_1(\beta,0,0,\alpha) = 0$ and $U_3(\beta,0,0,\alpha) = 0$. Hence $(\tilde\beta,\tilde\alpha)$ has the following asymptotic representation:

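$$n^{1/2}\begin{pmatrix} \tilde\beta - \beta_0 \\ \tilde\alpha - \alpha_0 \end{pmatrix} = -n^{-1/2}\begin{pmatrix} -\Sigma & 0 \\ B_2' & 1 \end{pmatrix}^{-1}\begin{pmatrix} U_{01} \\ U_{03} \end{pmatrix} + o_p(1) = n^{-1/2}\begin{pmatrix} \Sigma^{-1} & 0 \\ -B_2'\Sigma^{-1} & -1 \end{pmatrix}\begin{pmatrix} U_{01} \\ U_{03} \end{pmatrix} + o_p(1).$$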

It follows from (5) and $n^{1/2}(H\hat\xi + \hat\nu) = n^{-1/2}U_{04} + o_p(1) = o_p(1)$ that

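$$n^{1/2}\begin{pmatrix} \hat\beta - \tilde\beta \\ \hat\xi \\ \hat\nu \\ \hat\alpha - \tilde\alpha \end{pmatrix} = \begin{pmatrix} \Sigma^{-1}(B_1 - B_2H) \\ I \\ -H \\ -B_2'\Sigma^{-1}B_1 + (B_2'\Sigma^{-1}B_2 + K)H \end{pmatrix} n^{1/2}\hat\xi + o_p(1).$$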

Expanding the partial likelihood ℓ(β̃, 0, 0, α̃) at θ̂ yields

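$$\begin{aligned}
R &= 2\{\ell(\tilde\beta,0,0,\tilde\alpha) - \ell(\hat\beta,\hat\xi,\hat\nu,\hat\alpha)\} \\
&= n(\hat\beta - \tilde\beta, \hat\xi, \hat\nu, \hat\alpha - \tilde\alpha)\begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}(\hat\beta - \tilde\beta, \hat\xi, \hat\nu, \hat\alpha - \tilde\alpha)' + o_p(1) \\
&= n^{1/2}\hat\xi'\{(B_1 - B_2H)'\Sigma^{-1}(B_1 - B_2H) + J + H'KH\}n^{1/2}\hat\xi + o_p(1).
\end{aligned}$$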

Define $W = (B_1 - B_2H)'\Sigma^{-1}(B_1 - B_2H) + J + H'KH$ and denote the asymptotic variance of $n^{1/2}\hat\xi$ by

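$$V_\xi = (0, I, 0, 0)\begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}^{-1}\Omega\begin{pmatrix} -\Sigma & B \\ B' & Q \end{pmatrix}^{-1}(0, I, 0, 0)'.$$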

By tedious algebra, we can show that $V_\xi W V_\xi W V_\xi = V_\xi W V_\xi$ and $\operatorname{rank}(WV_\xi) = K$; that is, the quadratic form in the asymptotically normal random vector $n^{1/2}\hat\xi$ satisfies the conditions of the Ogasawara-Takahashi theorem (Rao, 1973, page 188). Thus R converges in distribution to a $\chi^2$ distribution with K degrees of freedom as $n \to \infty$.
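The last step reduces to a matrix condition: for symmetric W and z ~ N(0, V), the quadratic form z′Wz has a χ² distribution if and only if VWVWV = VWV, in which case its degrees of freedom equal tr(WV). The sketch below checks that condition for given numerical matrices and confirms it by simulation in a rank-deficient example; the function name `chi2_df` is illustrative, and in the proof V would be Vξ and W the matrix defined above.

```python
import numpy as np

def chi2_df(V, W, tol=1e-8):
    """For symmetric W and z ~ N(0, V), return the chi-square degrees of freedom
    of z' W z if V W V W V = V W V holds, and None otherwise."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    VWV = V @ W @ V
    if not np.allclose(VWV @ W @ V, VWV, atol=tol):
        return None
    return int(round(np.trace(W @ V)))

# Monte Carlo check with a singular (rank-2) covariance and W = pinv(V).
rng = np.random.default_rng(0)
Lmat = rng.normal(size=(3, 2))
V = Lmat @ Lmat.T
W = np.linalg.pinv(V)
df = chi2_df(V, W)
z = rng.multivariate_normal(np.zeros(3), V, size=20000)
R = np.einsum("ij,jk,ik->i", z, W, z)   # z_i' W z_i for each draw
print(df, R.mean(), R.var())            # chi-square(df) has mean df and variance 2 df
```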

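Proof of Theorem 3. The proof of Theorem 3 closely follows that of Theorem 1, so we highlight only the differences. It is easy to see that $\tilde U_{01} = U_{01}$,

$$\tilde U_{02} = \sum_{i=1}^n \tilde\psi(X_i,\beta_0,\alpha_0,\rho_0),$$

$\tilde U_{03} = U_{03}$, $\tilde U_{04} = U_{04} = 0$, and $\tilde U_{05} = U_{05} = 0$. Arguing as before, $n^{-1/2}\tilde U_0$ with $\tilde U_0 = (\tilde U_{01}, \tilde U_{02}, \tilde U_{03}, \tilde U_{04}, \tilde U_{05})$ converges in distribution to a zero-mean multivariate normal distribution with variance-covariance matrix $\tilde\Omega$ as $n \to \infty$, where

$$\tilde\Omega = \begin{pmatrix} \Sigma & 0 & 0 & 0 & 0 \\ 0 & \tilde J & 0 & 0 & 0 \\ 0 & 0 & K & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

and $\tilde J = \operatorname{var}(n^{-1/2}\tilde U_{02})$.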

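Define the matrices

$$\Phi = \begin{pmatrix} \tilde Q & L \\ L' & 0 \end{pmatrix},$$

where

$$L = n^{-1}\left\{E\left(\frac{\partial U_5}{\partial\xi}\right), E\left(\frac{\partial U_5}{\partial\nu}\right), E\left(\frac{\partial U_5}{\partial\alpha}\right)\right\}'\bigg|_{\theta=\theta_0} = \left[-E\left\{\frac{\partial\tilde\psi'(X_1,\beta_0,\alpha_0,\rho_0)}{\partial\rho}\right\}, 0, 0\right]' := (L_1', 0, 0)',$$
$$\tilde Q = \begin{pmatrix} \tilde J & 0 & \tilde H' \\ 0 & K & 1 \\ \tilde H & 1 & 0 \end{pmatrix},$$

and $\tilde H = -n^{-1}E\{\partial\tilde\psi(X_1,\beta_0,\alpha_0,\rho_0)/\partial\alpha\}$. Straightforward algebra yields

$$n^{-1}E\left(\frac{\partial\tilde U}{\partial\theta}\bigg|_{\theta=\theta_0}\right) = \begin{pmatrix} -\Sigma & \kappa \\ \kappa' & \Phi \end{pmatrix},$$

where $\kappa = (\tilde B, 0)$ with $\tilde B = [-E\{\partial\tilde\psi'(X_1,\beta_0,\alpha_0,\rho_0)/\partial\beta\}, B_2, 0]$.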

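By a Taylor series expansion, we have

$$n^{1/2}(\hat\theta_\rho - \theta_0) = -\begin{pmatrix} -\Sigma & \kappa \\ \kappa' & \Phi \end{pmatrix}^{-1}\begin{pmatrix} n^{-1/2}\tilde U_{01} \\ n^{-1/2}\tilde U^* \end{pmatrix} + o_p(1),$$

where $\tilde U^* = (\tilde U_{02}, \tilde U_{03}, 0, 0)$. Define $\tilde\Gamma = \Sigma + \kappa\Phi^{-1}\kappa'$. Arguing as before, we establish the asymptotic representation $n^{1/2}(\hat\beta_\rho - \beta_0) = \tilde\Gamma^{-1}(n^{-1/2}\tilde U_{01}) - \tilde\Gamma^{-1}\kappa\Phi^{-1}(n^{-1/2}\tilde U^*) + o_p(1)$. Because $\tilde U_{01}$ and $\tilde U^*$ are orthogonal, the asymptotic variance of $n^{1/2}(\hat\beta_\rho - \beta_0)$ is given by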

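$$\begin{aligned}
&\tilde\Gamma^{-1}\Sigma\tilde\Gamma^{-1} + \tilde\Gamma^{-1}\kappa\Phi^{-1}\begin{pmatrix} \tilde J & 0 & 0 & 0 \\ 0 & K & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\Phi^{-1}\kappa'\tilde\Gamma^{-1} \\
&= \tilde\Gamma^{-1}(\tilde\Gamma - \kappa\Phi^{-1}\kappa')\tilde\Gamma^{-1} + \tilde\Gamma^{-1}\kappa\Phi^{-1}\begin{pmatrix} \tilde J & 0 & 0 & 0 \\ 0 & K & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\Phi^{-1}\kappa'\tilde\Gamma^{-1} \\
&= \tilde\Gamma^{-1} - \tilde\Gamma^{-1}\kappa\Phi^{-1}\begin{pmatrix} 0 & 0 & \tilde H' & L_1 \\ 0 & 0 & 1 & 0 \\ \tilde H & 1 & 0 & 0 \\ L_1' & 0 & 0 & 0 \end{pmatrix}\Phi^{-1}\kappa'\tilde\Gamma^{-1}.
\end{aligned}$$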

Arguing as in the proof of Theorem 1, we can show that

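$$L'\tilde Q^{-1}\begin{pmatrix} 0 & 0 & \tilde H' \\ 0 & 0 & 1 \\ \tilde H & 1 & 0 \end{pmatrix}\tilde Q^{-1}L = 0, \quad\text{and}\quad \tilde B\tilde Q^{-1}\begin{pmatrix} 0 & 0 & \tilde H' \\ 0 & 0 & 1 \\ \tilde H & 1 & 0 \end{pmatrix}\tilde Q^{-1}L = 0.$$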

Together with the analogue of (6) for $\tilde B$ and $\tilde Q$, and

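$$\Phi^{-1} = \begin{pmatrix} \tilde Q & L \\ L' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \tilde Q^{-1} - \tilde Q^{-1}L(L'\tilde Q^{-1}L)^{-1}L'\tilde Q^{-1} & \tilde Q^{-1}L(L'\tilde Q^{-1}L)^{-1} \\ (L'\tilde Q^{-1}L)^{-1}L'\tilde Q^{-1} & -(L'\tilde Q^{-1}L)^{-1} \end{pmatrix},$$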

we have

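$$\kappa\Phi^{-1}\begin{pmatrix} 0 & 0 & \tilde H' & L_1 \\ 0 & 0 & 1 & 0 \\ \tilde H & 1 & 0 & 0 \\ L_1' & 0 & 0 & 0 \end{pmatrix}\Phi^{-1}\kappa' = (\tilde B, 0)\begin{pmatrix} \tilde Q & L \\ L' & 0 \end{pmatrix}^{-1}\begin{pmatrix} 0 & 0 & \tilde H' & L_1 \\ 0 & 0 & 1 & 0 \\ \tilde H & 1 & 0 & 0 \\ L_1' & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} \tilde Q & L \\ L' & 0 \end{pmatrix}^{-1}\begin{pmatrix} \tilde B' \\ 0 \end{pmatrix} = 0.$$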

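Hence we prove that $\operatorname{var}\{n^{1/2}(\hat\beta_\rho - \beta_0)\} = \tilde\Gamma^{-1} = (\Sigma + \kappa\Phi^{-1}\kappa')^{-1} \le \Sigma^{-1}$, where $\Sigma^{-1}$ is the asymptotic variance-covariance matrix of the maximum partial likelihood estimator $\hat\beta_{PL}$. Moreover, because $\kappa\Phi^{-1}\kappa' = \tilde B\tilde Q^{-1}\tilde B' - (\tilde B\tilde Q^{-1}L)(L'\tilde Q^{-1}L)^{-1}(\tilde B\tilde Q^{-1}L)' \le \tilde B\tilde Q^{-1}\tilde B'$, we also prove that $\hat\beta_\rho$ is no more efficient than $\hat\beta$, the maximum empirical likelihood estimator of β when the true parameter value $\rho_0 = 1$ is known.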

Contributor Information

Chiung-Yu Huang, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, Maryland 21205.

Jing Qin, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.

Huei-Ting Tsai, Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20007.

References

  1. Breslow NE. Discussion of the paper by D. R. Cox. Journal of the Royal Statistical Society, Series B. 1972;34:216–217.
  2. Cameron JL, Riall TS, Coleman J, Belcher KA. One Thousand Consecutive Pancreaticoduodenectomies. Annals of Surgery. 2006;244:10. doi: 10.1097/01.sla.0000217673.04165.ea.
  3. Chaudhuri S, Handcock MS, Rendall MS. Generalized Linear Models Incorporating Population Level Information: An Empirical-Likelihood-Based Approach. Journal of the Royal Statistical Society, Series B. 2008;70:311–328. doi: 10.1111/j.1467-9868.2007.00637.x.
  4. Chen J, Qin J. Empirical Likelihood Estimation for Finite Populations and the Effective Usage of Auxiliary Information. Biometrika. 1993;80:107–116.
  5. Chen J, Sitter RR, Wu C. Using Empirical Likelihood Methods to Obtain Range Restricted Weights in Regression Estimators for Surveys. Biometrika. 2002;89:230–237.
  6. Chen J, Wu C. Estimation of Distribution Function and Quantiles Using the Model-Calibrated Pseudo Empirical Likelihood Method. Statistica Sinica. 2002;12:1223–1239.
  7. Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
  8. DiCiccio T, Hall P, Romano J. Empirical Likelihood is Bartlett-Correctable. The Annals of Statistics. 1991;19:1053–1061.
  9. Hellerstein JK, Imbens GW. Imposing Moment Restrictions from Auxiliary Data by Weighting. Review of Economics and Statistics. 1999;81:1–14.
  10. Hussain M, Tangen CM, Berry DL, Higano CS, Crawford ED, Liu G, Wilding G, Prescott S, Kanaga Sundaram S, Small EJ, Dawson NA, Donnelly BJ, Venner PM, Vaishampayan UN, Schellhammer PF, Quinn DI, Raghavan D, Ely B, Moinpour CM, Vogelzang NJ, Thompson IM. Intermittent Versus Continuous Androgen Deprivation in Prostate Cancer. New England Journal of Medicine. 2013;368:1314–1325. doi: 10.1056/NEJMoa1212299.
  11. Imbens GW. Generalized Method of Moments and Empirical Likelihood. Journal of Business & Economic Statistics. 2002;20:493–506.
  12. Imbens GW, Lancaster T. Combining Micro and Macro Data in Microeconometric Models. The Review of Economic Studies. 1994;61:655–680.
  13. Kovalchick SA. Aggregate-Data Estimation of an Individual Patient Data Linear Random Effects Meta-Analysis With a Patient Covariate-Treatment Interaction Term. Biostatistics. 2013;14:273–283. doi: 10.1093/biostatistics/kxs035.
  14. Li G, Li R, Zhou M. Empirical Likelihood in Survival Analysis. In: Fang K, Fan J, Li G, editors. Contemporary Multivariate Analysis and Design of Experiments. Singapore: World Scientific Publishing Co. Inc.; 2005. pp. 337–350.
  15. Liu D, Zheng Y, Prentice RL, Hsu L. Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative. Journal of the American Statistical Association. 2014;109:514–524. doi: 10.1080/01621459.2014.881739.
  16. Owen AB. Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika. 1988;75:237–249.
  17. Owen AB. Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics. 1990;18:90–120.
  18. Qin J, Lawless J. Estimating Equations, Empirical Likelihood and Constraints on Parameters. Canadian Journal of Statistics. 1995;23:145–159.
  19. Qin J, Lawless JF. Empirical Likelihood and General Estimating Equations. The Annals of Statistics. 1994;22:300–325.
  20. Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. New York: Wiley; 1973.
  21. Ren JJ, Zhou M. Full Likelihood Inferences in the Cox Model: An Empirical Likelihood Approach. Annals of the Institute of Statistical Mathematics. 2011;63:1005–1018.
  22. Simmonds MC, Higgins JP. Covariate Heterogeneity in Meta-Analysis: Criteria for Deciding Between Meta-Regression and Individual Patient Data. Statistics in Medicine. 2007;26:2982–2999. doi: 10.1002/sim.2768.
  23. Thomas DR, Grunkemeier GL. Confidence Interval Estimation of Survival Probabilities for Censored Data. Journal of the American Statistical Association. 1975;70:865–871.
  24. Tsai HT, Penson DF, Makambi KH, Lynch JH, Van Den Eeden SK, Potosky AL. Efficacy of Intermittent Androgen Deprivation Therapy vs Conventional Continuous Androgen Deprivation Therapy for Advanced Prostate Cancer: a Meta-Analysis. Urology. 2013;82:327–334. doi: 10.1016/j.urology.2013.01.078.
  25. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
  26. Wu C, Sitter RR. A Model-Calibration Approach to Using Complete Auxiliary Information From Survey Data. Journal of the American Statistical Association. 2001;96:185–193.
  27. Zhou M. The Cox Proportional Hazards Model With Partially Known Baseline. In: Hsiung AC, Ying Z, Zhang CH, editors. Random Walk, Sequential Analysis and Related Topics. Singapore: World Scientific Publishing Co.; 2006. pp. 215–232.
