Author manuscript; available in PMC: 2022 Mar 11.
Published in final edited form as: J Comput Graph Stat. 2021 Mar 8;30(4):1209–1223. doi: 10.1080/10618600.2020.1870481

Online Updating of Survival Analysis

Jing Wu 1, Ming-Hui Chen 2,*, Elizabeth D Schifano 2, Jun Yan 2
PMCID: PMC8916746  NIHMSID: NIHMS1722634  PMID: 35280977

Abstract

When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible since they require access to all observations at each accumulation point. We develop online updating methods for carrying out survival analysis under the Cox proportional hazards model in an online-update framework. Our methods are also applicable with time-dependent covariates. Specifically, we propose online-updating estimators as well as their standard errors for both the regression coefficients and the baseline hazard function. Extensive simulation studies are conducted to investigate the empirical performance of the proposed estimators. A large colon cancer data set from the Surveillance, Epidemiology, and End Results (SEER) program and a large venture capital (VC) data set with time-dependent covariates are analyzed to demonstrate the utility of the proposed methodologies.

Keywords: Cox model, Data compression, Piecewise constant baseline hazard, SEER, Streaming Survival Data

1. Introduction

Survival analysis, or the analysis of time-to-event data (Kalbfleisch and Prentice 2011), has been widely applied in diverse fields such as biostatistics, economics, education, and sociology, among others (e.g., Ibrahim et al. 2001; Giot and Schwienbacher 2007; Plank et al. 2008). The advancement in computer technology has made possible the collection of “big survival data”, which brings opportunities as well as challenges towards new discoveries since most of the traditional survival analysis methods become computationally infeasible in the presence of large-scale survival data. For example, the Cox maximum partial likelihood estimation (MPLE) approach (e.g., Cox 1972, 1975), which has long been used for survival analysis, involves summations over risk sets requiring access to all observations, and is thus computationally challenging.

The modern statistical methodologies for big data can be roughly grouped into three categories (Wang et al. 2016): resampling-based (e.g., Wang et al. 2019; Ai et al. 2018; Kleiner et al. 2014; Maclaurin and Adams 2014; Ma et al. 2013; Liang et al. 2013), divide-and-conquer (e.g., Barbian and Assunção 2017; Chang et al. 2017; Song and Liang 2015; Chen and Xie 2014; Neiswanger et al. 2013; Lin and Xi 2011), and online-updating (e.g., Luo and Song 2020; Kong and Xia 2019; Wang et al. 2018a; Schifano et al. 2016). Recent developments in advanced survival analysis have focused on high-dimensional survival data (e.g., Kawaguchi et al. 2017; Mittal et al. 2013), while less attention has been paid to survival data with huge sample size as we focus here. Wang et al. (2018b) proposed a divide-and-conquer algorithm to handle high-dimensional and huge sample-size survival data using the Cox model. They first obtained a maximum partial likelihood estimator based on a subset, then updated the estimator via one-step linear approximations based on the entire data, and finally obtained the LASSO penalized estimator by applying a least-square approximation to the partial likelihood. Xue et al. (2020) proposed a cumulatively updated estimating equation (CUEE) estimator for the regression coefficients under the online-updating setting. Notably, however, none of these works provide an estimator for the baseline hazard function, which is essential for better understanding the survival process and for prediction. Furthermore, the literature on online-updating in the streaming survival data setting, where survival data arrive sequentially in large chunks and the full access to the entire data is infeasible, is still sparse.

We develop new methodologies to carry out survival analysis in the streaming survival data setting, which is not uncommon in real life. The Surveillance, Epidemiology, and End Results (SEER) program, for example, has been updating its database on cancer cases throughout the United States annually since 1973, to better understand the survival of cancer patients and reduce the cancer burden of the society. In the venture capital (VC) investing industry starting from 1946, time to successful exits, such as initial public offerings (IPOs) of the funded companies, is of significant importance for both VC investors and the companies. Real estate companies such as Zillow (Zillow 2016) also receive huge amounts of streaming data every second from various public sources, where time on market until a house is sold is of huge practical interest. Under such settings, most of the conventional estimation methods for survival analysis are computationally challenging since they require access to immense amounts of data. To overcome such computational challenges and inspired by the observations that (i) Cox partial likelihood function can be approximated by the likelihood function of piecewise exponential model (Johansen 1983) and that (ii) the maximum likelihood estimators of piecewise exponential model are consistent under mild conditions for big data (Friedman 1982), we propose four online-updating methods for survival analysis in the Cox proportional hazards model framework by assuming a piecewise constant baseline hazard function. By carrying out analysis in a parametric manner, we are able to estimate the baseline hazard function simultaneously with the regression coefficients, in contrast to the coefficients alone as in Xue et al. (2020). Furthermore, our methods, with minimal storage requirement, are computationally efficient and allow for online-updating of estimation and inference for both the regression coefficients and the baseline hazard function. 
Other novelties of the proposed methods include a characterization of crucial but not trivial conditions of the expansion matrices P and Q to correct the bias under the adaptive partition approach, and flexibility in including time-dependent covariates which relaxes the proportional hazard assumption.

Derivations of the formulae for the online-updating estimators and the standard errors for both regression coefficients and baseline hazards are provided in detail. Extensive simulation studies show that the proposed methods are competitive in comparison with the method using the entire data in terms of bias and standard errors for both the regression coefficients and baseline hazards. The analyses of the SEER colon cancer data and the VC data further demonstrate that the estimates under the proposed methods are similar to estimates obtained using the entire data simultaneously.

The remainder of this article is organized as follows. In Section 2, we briefly review the Cox proportional hazards model with piecewise constant baseline hazard function. We then propose four online-updating methods to carry out survival analysis in the streaming survival data setting, with the derivation of the algorithms and estimators presented in detail. We report extensive simulation studies in Section 3 and the real data analyses in Section 4. A discussion concludes in Section 5.

2. Online Updating Algorithms and Inference

2.1. Notation and Preliminaries

Suppose there are N independent observations D = {(ti, δi, xi), i = 1,2, …, N} of interest, where ti is the observed time (either censoring or event time), δi is the indicator function with δi = 1 representing the event and δi = 0 indicating censored, and xi is the p × 1-dimensional baseline covariate vector. Write t = (t1, t2, …, tN), δ = (δ1, δ2, …, δN), and X = (x1, x2, …, xN).

The logarithm of the partial likelihood function of Cox model is given by

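$$\ell(\beta \mid D) = \sum_{i=1}^{N} \delta_i \log\left\{ \frac{\exp(x_i^\top \beta)}{\sum_{j \in R(t_i)} \exp(x_j^\top \beta)} \right\}, \tag{2.1}$$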

where $R(t) = \{\ell : t_\ell \ge t\}$ is the set of subjects at risk at time t and β is a p × 1-dimensional vector of regression coefficients. Unlike in Schifano et al. (2016), $\ell(\beta \mid D)$ in (2.1) cannot be written as the summation of independent partial likelihood functions, since the term $\sum_{j \in R(t_i)} \exp(x_j^\top \beta)$ involved in each function depends on the entire data, for any i. Thus, this approach would require full access to the entire data at each accumulation point or data block, which is not feasible under the streaming survival data setting. In addition, the MPLE approach does not allow us to estimate the baseline hazard function, which is essential if one is interested in prediction.

To address these problems, we consider a proportional hazards model with piecewise constant baseline hazard function. Note that the piecewise constant hazard is not a strong assumption since any continuous function on a closed interval can be uniformly approximated by a step (piecewise constant) function (Carothers 2000). In fact, the piecewise constant hazard allows us to approximate reasonably well almost any baseline hazard if the partition is fine enough (relative to the true baseline hazard). Suppose we partition [0, ∞) into J intervals with cut points 0 = a0 < a1 < … < aJ = ∞. The piecewise constant hazard function is then given by

$$\lambda_i(t) = \sum_{j=1}^{J} \lambda_j 1_{[a_{j-1}, a_j)}(t) \exp(x_i^\top \beta), \tag{2.2}$$

where {λj | j = 1, …, J} are constant, λ = (λ1, …, λJ), and β is a p × 1-dimensional vector of regression coefficients corresponding to covariates xi. We write the cumulative piecewise linear hazard function as follows,

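$$\Lambda_i(t) = \sum_{j=1}^{J} \lambda_j \Delta_j(t) \exp(x_i^\top \beta), \quad \text{where} \quad \Delta_j(t) = \begin{cases} 0, & t < a_{j-1}, \\ t - a_{j-1}, & a_{j-1} \le t < a_j, \\ a_j - a_{j-1}, & t \ge a_j. \end{cases} \tag{2.3}$$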

After some algebra, the logarithm of likelihood function for model (2.2) is given by

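$$\ell(\beta, \lambda \mid D) = \sum_{i=1}^{N} \left\{ \delta_i \sum_{j=1}^{J} \log\lambda_j \, 1_{[a_{j-1}, a_j)}(t_i) + \delta_i x_i^\top \beta - \sum_{j=1}^{J} \lambda_j \Delta_j(t_i) \exp(x_i^\top \beta) \right\}. \tag{2.4}$$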

Under this formulation, $\ell(\beta, \lambda \mid D)$ can be written as the summation of several independent log partial likelihood functions and thus the online-updating algorithm idea can be naturally applied.

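Note that we can also write (2.4) as

$$\ell(\beta, \lambda \mid D) = \sum_{j=1}^{J} d_j \log\lambda_j + \sum_{i=1}^{N} \delta_i x_i^\top \beta - \sum_{j=1}^{J} \lambda_j \left\{ \sum_{i=1}^{N} \Delta_j(t_i) \exp(x_i^\top \beta) \right\}, \tag{2.5}$$

where $d_j = \sum_{i=1}^{N} \delta_i 1_{[a_{j-1}, a_j)}(t_i)$ is the number of events observed in the jth interval.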
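The quantities in (2.3) and (2.5) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation; the function names `exposure`, `event_counts`, and `loglik` are hypothetical, and the cut points are assumed to be passed as the finite values a_0 = 0 < a_1 < … < a_{J−1}, with the last interval extending to infinity.

```python
import numpy as np

def exposure(t, cuts):
    """Delta_j(t_i) from (2.3): time subject i spends in [a_{j-1}, a_j).

    `cuts` = (a_0, ..., a_{J-1}) with a_0 = 0; the last interval runs to infinity.
    Returns an (n, J) array.
    """
    t = np.asarray(t, dtype=float)[:, None]
    lower = np.asarray(cuts, dtype=float)
    upper = np.append(lower[1:], np.inf)
    return np.clip(t - lower, 0.0, upper - lower)

def event_counts(t, delta, cuts):
    """d_j: number of events whose observed time falls in interval j."""
    j = np.searchsorted(np.asarray(cuts, dtype=float), t, side="right") - 1
    return np.bincount(j, weights=delta, minlength=len(cuts))

def loglik(beta, lam, t, delta, X, cuts):
    """Log-likelihood (2.5) of the piecewise constant hazard model."""
    risk = np.exp(X @ beta)                      # exp(x_i' beta)
    d = event_counts(t, delta, cuts)
    return d @ np.log(lam) + delta @ (X @ beta) - lam @ (exposure(t, cuts).T @ risk)

# Toy data (made up for illustration only).
cuts = [0.0, 1.0, 2.0]
t = np.array([0.5, 1.5, 3.0])
delta = np.array([1.0, 0.0, 1.0])
X = np.zeros((3, 1))
print(exposure(t, cuts))
print(loglik(np.zeros(1), np.array([0.2, 0.3, 0.4]), t, delta, X, cuts))
```

Because (2.5) depends on the data only through sums over subjects, the same functions can be applied block by block, which is what makes the online-updating idea workable.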

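Let $\theta = (\beta^\top, \lambda_J^\top)^\top$ denote the collection of all the parameters and $M(\theta)$ the score function, which is the first-order partial derivative of the logarithm of the likelihood function in (2.5). Let $\hat\theta = (\hat\beta^\top, \hat\lambda_J^\top)^\top$ denote the solution to the score equation

$$M(\theta) = \begin{bmatrix} \sum_{i=1}^{N} \delta_i x_{i1} - \sum_{j=1}^{J} \lambda_j \sum_{i=1}^{N} \Delta_j(t_i) \exp(x_i^\top \beta) x_{i1} \\ \vdots \\ \sum_{i=1}^{N} \delta_i x_{ip} - \sum_{j=1}^{J} \lambda_j \sum_{i=1}^{N} \Delta_j(t_i) \exp(x_i^\top \beta) x_{ip} \\ \frac{d_1}{\lambda_1} - \sum_{i=1}^{N} \Delta_1(t_i) \exp(x_i^\top \beta) \\ \vdots \\ \frac{d_J}{\lambda_J} - \sum_{i=1}^{N} \Delta_J(t_i) \exp(x_i^\top \beta) \end{bmatrix} = 0. \tag{2.6}$$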

After taking the second-order partial derivatives of the log partial likelihood function, elements of the negated (p + J) × (p + J) Hessian matrix H = (Hi,j) are given by

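$$\begin{aligned}
H_{r,s} &= -\frac{\partial^2 \ell}{\partial\beta_r \partial\beta_s} = \sum_{j=1}^{J} \lambda_j \left\{ \sum_{i=1}^{N} \Delta_j(t_i) x_{ir} x_{is} \exp(x_i^\top \beta) \right\}, \\
H_{p+m,r} = H_{r,p+m} &= -\frac{\partial^2 \ell}{\partial\lambda_m \partial\beta_r} = \sum_{i=1}^{N} \Delta_m(t_i) x_{ir} \exp(x_i^\top \beta), \\
H_{p+m,p+n} &= -\frac{\partial^2 \ell}{\partial\lambda_m \partial\lambda_n} = 1(m = n) \frac{d_m}{\lambda_m^2},
\end{aligned} \tag{2.7}$$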

for r, s = 1, …, p and m, n = 1, …, J.
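As a sketch of how (2.6) and (2.7) translate into array operations, the following function evaluates the score and the negated Hessian for one block. The name `score_and_neg_hessian` and its argument layout are assumptions made for illustration: `D` is taken to be the n × J matrix of exposures Δ_j(t_i) and `d` the J-vector of event counts d_j.

```python
import numpy as np

def score_and_neg_hessian(beta, lam, delta, X, D, d):
    """Score M(theta) in (2.6) and negated Hessian H in (2.7).

    D: (n, J) exposures Delta_j(t_i); d: (J,) event counts d_j.
    """
    p, J = X.shape[1], len(lam)
    risk = np.exp(X @ beta)                  # exp(x_i' beta)
    DW = D * risk[:, None]                   # Delta_j(t_i) exp(x_i' beta)
    cum = DW @ lam                           # sum_j lambda_j Delta_j(t_i) exp(x_i' beta)

    score = np.concatenate([X.T @ (delta - cum), d / lam - DW.sum(axis=0)])

    H = np.zeros((p + J, p + J))
    H[:p, :p] = X.T @ (X * cum[:, None])     # beta-beta block
    H[:p, p:] = X.T @ DW                     # beta-lambda block
    H[p:, :p] = H[:p, p:].T
    H[p:, p:] = np.diag(d / lam**2)          # lambda-lambda block is diagonal
    return score, H
```

The λ-λ block is diagonal, so the cost of forming H for a block is dominated by the two matrix products involving X.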

2.2. Time-dependent Covariates

The above results can also be extended to non-proportional hazards models with time-dependent covariates. Let $z_i(t)$ denote the time-dependent covariates and γ be a q × 1-dimensional vector of regression coefficients. Without loss of generality (WLOG), we assume $0 = b_{i0} < b_{i1} < \dots < b_{iL} = \infty$ and $z_i(t) = z_{i\ell}$ for $t \in [b_{i,\ell-1}, b_{i\ell})$, which is equivalent to

$$z_i(t) = \sum_{\ell=1}^{L} z_{i\ell} 1_{[b_{i,\ell-1}, b_{i\ell})}(t).$$

The corresponding hazard function is given by

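$$\begin{aligned}
\lambda_i(t) &= \sum_{j=1}^{J} \lambda_j 1_{[a_{j-1}, a_j)}(t) \exp\left(x_i^\top \beta + z_i(t)^\top \gamma\right) \\
&= \sum_{j=1}^{J} \lambda_j 1_{[a_{j-1}, a_j)}(t) \exp\left( \sum_{\ell=1}^{L} z_{i\ell}^\top \gamma \, 1_{[b_{i,\ell-1}, b_{i\ell})}(t) \right) \exp(x_i^\top \beta),
\end{aligned}$$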

and the cumulative hazard function can be simplified as follows

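$$\begin{aligned}
\Lambda_i(t) &= \int_0^t \sum_{j=1}^{J} \lambda_j 1_{[a_{j-1}, a_j)}(u) \exp\left( \sum_{\ell=1}^{L} z_{i\ell}^\top \gamma \, 1_{[b_{i,\ell-1}, b_{i\ell})}(u) \right) \exp(x_i^\top \beta) \, du \\
&= \int_0^t \sum_{j=1}^{J} \sum_{\ell=1}^{L} \lambda_j 1_{[a_{j-1}, a_j) \cap [b_{i,\ell-1}, b_{i\ell})}(u) \exp\left( \sum_{\ell=1}^{L} z_{i\ell}^\top \gamma \, 1_{[b_{i,\ell-1}, b_{i\ell})}(u) \right) du \, \exp(x_i^\top \beta).
\end{aligned}$$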

Let $\Omega_{ij\ell}$ denote the set $[a_{j-1}, a_j) \cap [b_{i,\ell-1}, b_{i\ell})$, j = 1, …, J, ℓ = 1, …, L. Note that the $\Omega_{ij\ell}$'s are disjoint sets and can be empty. Additionally, $\bigcup_{j,\ell} \Omega_{ij\ell} = [0, \infty)$.

Then

$$\Lambda_i(t) = \sum_{j=1}^{J} \lambda_j \sum_{\ell=1}^{L} \Delta_{ij\ell}(t) \exp(z_{i\ell}^\top \gamma) \exp(x_i^\top \beta), \tag{2.8}$$

where

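$$\Delta_{ij\ell}(t) = \begin{cases} 0, & \Omega_{ij\ell} = \emptyset \ \text{ or } \ t < L(\Omega_{ij\ell}), \\ t - L(\Omega_{ij\ell}), & L(\Omega_{ij\ell}) \le t < U(\Omega_{ij\ell}), \\ U(\Omega_{ij\ell}) - L(\Omega_{ij\ell}), & t \ge U(\Omega_{ij\ell}), \end{cases}$$

and $L(\Omega_{ij\ell})$ and $U(\Omega_{ij\ell})$ are respectively the lower and upper bounds of the interval set $\Omega_{ij\ell}$. The logarithm of the likelihood is given by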

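$$\ell(\beta, \gamma, \lambda \mid D) = \sum_{j=1}^{J} d_j \log\lambda_j + \sum_{i=1}^{N} \delta_i \left( x_i^\top \beta + z_i^{*\top} \gamma \right) - \sum_{j=1}^{J} \lambda_j \left\{ \sum_{i=1}^{N} \sum_{\ell=1}^{L} \Delta_{ij\ell}(t_i) \exp\left( x_i^\top \beta + z_{i\ell}^\top \gamma \right) \right\}, \tag{2.9}$$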

where $z_i^* = \sum_{\ell=1}^{L} z_{i\ell} 1_{[b_{i,\ell-1}, b_{i\ell})}(t_i)$. Elements of the negated (p + q + J) × (p + q + J) Hessian matrix H = (Hi,j) are given by

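$$\begin{aligned}
H_{r,s} &= \sum_{j=1}^{J} \lambda_j \left\{ \sum_{i=1}^{N} \sum_{\ell=1}^{L} \Delta_{ij\ell}(t_i) x_{ir}^* x_{is}^* \exp\left( x_i^\top \beta + z_{i\ell}^\top \gamma \right) \right\}, \\
H_{p+q+m,r} = H_{r,p+q+m} &= \sum_{i=1}^{N} \sum_{\ell=1}^{L} \Delta_{im\ell}(t_i) x_{ir}^* \exp\left( x_i^\top \beta + z_{i\ell}^\top \gamma \right), \\
H_{p+q+m,p+q+n} &= 1(m = n) \frac{d_m}{\lambda_m^2},
\end{aligned} \tag{2.10}$$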

where $x_i^* = (x_i, Z_i)$, for i = 1, …, N, r, s = 1, …, p + q, and m, n = 1, …, J.

The logarithm of the likelihood function in (2.9) and the negated Hessian matrix in (2.10) with time-dependent covariates have the same formats as the corresponding functions with only time-independent covariates in (2.5) and (2.7). Therefore, the following proposed methods with only time-independent covariates can be directly applied to the online-updating of survival analysis with time-dependent covariates.
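The exposure term Δ_{ijℓ}(t) in (2.8) is the length of the overlap between [0, t) and Ω_{ijℓ}. A minimal sketch for a single subject is given below; the function name `exposure_td` is hypothetical, and both grids are assumed to be passed as their finite cut points, with the last interval of each grid extending to infinity.

```python
import numpy as np

def exposure_td(t, a_cuts, b_cuts):
    """Delta_{ijl}(t) for one subject, returned as a (J, L) array.

    a_cuts = (a_0, ..., a_{J-1}) and b_cuts = (b_{i0}, ..., b_{i,L-1}),
    both starting at 0; the last interval of each grid runs to infinity.
    """
    a_lo = np.asarray(a_cuts, dtype=float)
    a_hi = np.append(a_lo[1:], np.inf)
    b_lo = np.asarray(b_cuts, dtype=float)
    b_hi = np.append(b_lo[1:], np.inf)

    lo = np.maximum(a_lo[:, None], b_lo[None, :])   # L(Omega_{ijl})
    hi = np.minimum(a_hi[:, None], b_hi[None, :])   # U(Omega_{ijl})

    # Overlap of [lo, hi) with [0, t); empty sets and t < lo give a
    # non-positive length, which is clipped to zero.
    return np.clip(np.minimum(hi, t) - lo, 0.0, None)

# Toy example: hazard grid {0, 2}, covariate change points {0, 1, 3}, t = 2.5.
print(exposure_td(2.5, [0.0, 2.0], [0.0, 1.0, 3.0]))
```

Since the Ω_{ijℓ} partition [0, ∞), the entries of the returned array always sum to t, which gives a quick sanity check.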

Remark 2.2.1. A more general form of (2.9) can be written as

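$$\ell(\beta, \gamma, \lambda \mid D) = \sum_{j=1}^{J} d_j \log\lambda_j + \sum_{i=1}^{N} \delta_i \left( x_i^\top \beta + z_i(t_i)^\top \gamma \right) - \sum_{j=1}^{J} \lambda_j \left\{ \sum_{i=1}^{N} \int_0^{t_i} 1_{[a_{j-1}, a_j)}(u) \exp\left( z_i(u)^\top \gamma \right) du \, \exp(x_i^\top \beta) \right\}, \tag{2.11}$$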

which can be approximated by the Riemann sum in (2.9) as L goes to infinity.

2.3. Fixed Partition

In the streaming data setting, we suppose that the N observations are not available all at once, but rather arrive in chunks from a large data stream. Suppose at each accumulation point k, for the $n_k$ observations, we observe $t_k$, $\delta_k$, and $X_k$, which are the $n_k$-dimensional vector of observed times, the $n_k$-dimensional vector of event indicators, and the $n_k \times p$ matrix of baseline covariates, respectively, for k = 1, …, K, such that $t = (t_1^\top, t_2^\top, \dots, t_K^\top)^\top$, $\delta = (\delta_1^\top, \delta_2^\top, \dots, \delta_K^\top)^\top$, and $X = (X_1^\top, X_2^\top, \dots, X_K^\top)^\top$.

For now, we let the partition for the piecewise hazard function be fixed through the entire updating process. WLOG, we assume each interval has at least one event for each block of data. Otherwise, for that particular block, we temporarily merge the consecutive intervals to ensure each new wider interval has at least one event. After that, we still return to the pre-specified fixed partition and set the constant hazard estimates of each problematic original interval the same as the estimate of the corresponding combined interval.

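Let $\hat\theta_{n_k,k} = (\hat\beta_{n_k,k}^\top, \hat\lambda_{n_k,k}^\top)^\top$ and $H_{n_k,k} = H_{n_k,k}(\hat\theta_{n_k,k})$ denote the current estimators of θ and H, which are obtained in a similar way as in (2.6) and (2.7) but based on the kth subset. The online-updating estimator of θ based on the cumulative data $D_k = \{(t_\ell, X_\ell, \delta_\ell), \ell = 1, \dots, k\}$ is given by

$$\hat\theta_k = \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} \right)^{-1} \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} \hat\theta_{n_\ell,\ell} \right), \tag{2.12}$$

which is equivalent to

$$\hat\theta_k = \left( H_{k-1} + H_{n_k,k} \right)^{-1} \left( H_{k-1} \hat\theta_{k-1} + H_{n_k,k} \hat\theta_{n_k,k} \right), \tag{2.13}$$

where $\hat\theta_0 = 0$, $H_k = \sum_{\ell=1}^{k} H_{n_\ell,\ell}$ is the cumulative negated Hessian matrix, and $H_0 = 0_{p+J}$ is a (p + J) × (p + J) matrix of zeros.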

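A natural variance estimator of $\hat\theta_k$ is given by

$$V_k = \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} \right)^{-1} \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} V_{n_\ell,\ell} H_{n_\ell,\ell}^\top \right) \left[ \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} \right)^{-1} \right]^\top, \tag{2.14}$$

where $V_{n_k,k} = (H_{n_k,k})^{-1}$ is the variance estimator of $\hat\theta_{n_k,k}$ from the subset k. Equation (2.14) can thus be simplified as

$$V_k = \left( \sum_{\ell=1}^{k} H_{n_\ell,\ell} \right)^{-1}. \tag{2.15}$$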
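The recursion (2.13) only needs the previous cumulative summaries and the summaries of the new block. The sketch below illustrates this with made-up block summaries (random positive definite matrices standing in for the block Hessians); the function name `online_update` is hypothetical. It also checks numerically that the sequential form agrees with the batch form (2.12).

```python
import numpy as np

def online_update(H_prev, theta_prev, H_blk, theta_blk):
    """One step of (2.13). Returns (H_k, theta_k, V_k), with V_k = H_k^{-1} as in (2.15)."""
    H_new = H_prev + H_blk
    theta_new = np.linalg.solve(H_new, H_prev @ theta_prev + H_blk @ theta_blk)
    return H_new, theta_new, np.linalg.inv(H_new)

# Hypothetical stream of block summaries (H_{n_k,k}, theta_hat_{n_k,k}).
rng = np.random.default_rng(1)
dim = 3
blocks = []
for _ in range(4):
    A = rng.normal(size=(dim, dim))
    blocks.append((A @ A.T + dim * np.eye(dim), rng.normal(size=dim)))

# Sequential updating, starting from H_0 = 0 and theta_0 = 0.
H, theta = np.zeros((dim, dim)), np.zeros(dim)
for H_blk, theta_blk in blocks:
    H, theta, V = online_update(H, theta, H_blk, theta_blk)

# Batch formula (2.12) for comparison.
H_sum = sum(Hb for Hb, _ in blocks)
theta_batch = np.linalg.solve(H_sum, sum(Hb @ tb for Hb, tb in blocks))
print(np.allclose(theta, theta_batch))
```

Only a (p + J) × (p + J) matrix and a (p + J)-vector are carried between blocks, which is the minimal-storage property the method relies on.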

Remark 2.3.1. A conventional online-updating algorithm for $\hat\theta_k$ is given by

$$\hat\theta_k = \gamma_k \hat\theta_{k-1} + (1 - \gamma_k) \hat\theta_{n_k,k},$$

where $\hat\theta_k$ is a weighted average between the cumulative estimator $\hat\theta_{k-1}$ at the last update and the current estimator $\hat\theta_{n_k,k}$, with the scalar weight functions satisfying $\sum_{\ell=1}^{\infty} \gamma_\ell = \infty$ and $\sum_{\ell=1}^{\infty} \gamma_\ell^2 < \infty$. Cappé (2011), for example, proposed $\gamma_k = k^{-\alpha}$, which only depends on k and α ∈ (0.5, 1]. Our proposed estimator in (2.13) is also a weighted average between $\hat\theta_{k-1}$ and $\hat\theta_{n_k,k}$, but with different weight functions ($H_{n_k,k}$). The second-order bias of $\hat\theta_k$ can be reduced with the carefully selected weight functions and the bias correction terms introduced in Sections 2.5 and 2.6.

2.4. Adaptive Partition

In the previous section, we select the partition based on the first block of data, and assume the partition is fixed during the entire online-updating procedure. However, this assumption may not be ideal as more and more data accumulate. For the following blocks of data, the number of events within each interval may vary tremendously. Some intervals may contain zero events (as mentioned in the previous section) while some intervals may contain an overwhelming number of events. Thus, instead of sticking with the initial partition, it would be desirable to allow for an adaptive partition, i.e., splitting the interval with too many events into subintervals. As more and more data arrive, the partition of the piecewise constant hazard function becomes finer and finer. With the increasingly fine partition, the fitted baseline hazard function should be able to capture the true baseline hazard function.

Assume for the (k − 1)th block, we have J intervals, (p + J)-dimensional vector of the cumulative estimator θk−1, and (p + J) × (p + J)-dimensional matrix of the cumulative negated Hessian matrix Hk−1. For the kth block, if each interval has similar event size (see Remark 2.4.4), we will continue using the same partition from the (k − 1)th block. Otherwise, WLOG, assume the Jth interval has the maximum number of events and we thus partition this interval into two subintervals.

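The estimator $\hat\theta_{n_k,k}$ of the current block is of length (p + J + 1), and the negated Hessian matrix $H_{n_k,k}$ of the current block is of dimension (p + J + 1) × (p + J + 1). However, $\hat\theta_{k-1}$ and $\hat\theta_{n_k,k}$, as well as $H_{k-1}$ and $H_{n_k,k}$, do not have the same dimensions and therefore are not additive. To resolve this problem, we need to expand both $\hat\theta_{k-1}$ and $H_{k-1}$. For this purpose, let the symbol * denote the expansion of a matrix or a vector, and denote by $\lambda_J^*$ and $\lambda_{J+1}^*$ the corresponding unknown constants for the two new subintervals of the Jth interval at accumulation point k − 1. Since $\lambda_J^*$ and $\lambda_{J+1}^*$ are unknown and are closely related to $\lambda_J$, we assume

$$\lambda_J = f(\lambda_J^*, \lambda_{J+1}^*), \tag{2.16}$$

where f is a certain function. Further assume

$$\lambda_J = w_J^p \lambda_J^* + w_{J+1}^p \lambda_{J+1}^*, \tag{2.17}$$

where $w_J^p$ and $w_{J+1}^p$ are constants.

The expanded cumulative negated Hessian matrix $H_{k-1}^*$ at block k − 1 is obtained by the chain rule,

$$\frac{\partial^2 \ell}{\partial\lambda_l^* \partial\lambda_m} = \frac{\partial^2 \ell}{\partial\lambda_J \partial\lambda_m} \frac{\partial\lambda_J}{\partial\lambda_l^*}, \qquad \frac{\partial^2 \ell}{\partial\lambda_l^* \partial\beta_r} = \frac{\partial^2 \ell}{\partial\lambda_J \partial\beta_r} \frac{\partial\lambda_J}{\partial\lambda_l^*}, \qquad \text{for } l = J, J+1. \tag{2.18}$$

To be specific, we introduce the (p + J + 1) × (p + J) expansion matrix $P_{k-1}$, where $P_{k-1}(i, i) = 1$ for i = 1, …, (p + J − 1), $P_{k-1}(p+J, p+J) = w_J^p$, $P_{k-1}(p+J+1, p+J) = w_{J+1}^p$, and 0 elsewhere. Then,

$$H_{k-1}^* = \begin{cases} H_{k-1} & \text{if no interval added at block } k, \\ P_{k-1} H_{k-1} P_{k-1}^\top & \text{otherwise.} \end{cases}$$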

Let $\hat\theta_{k-1}^*$ and $V_{k-1}^*$ denote the expanded cumulative estimator for θ and the corresponding variance, respectively. We further introduce the (p + J + 1) × (p + J) expansion matrix $Q_{k-1}$, where $Q_{k-1}(i, i) = 1$ for i = 1, …, (p + J − 1), $Q_{k-1}(p+J, p+J) = w_J^q$, $Q_{k-1}(p+J+1, p+J) = w_{J+1}^q$, and 0 elsewhere. We discuss the choice of constants $w_J^q$ and $w_{J+1}^q$, as well as $w_J^p$ and $w_{J+1}^p$, in Remark 2.4.1 and Section 2.6. We thus have

$$\hat\theta_{k-1}^* = \begin{cases} \hat\theta_{k-1} & \text{if no interval added at block } k, \\ Q_{k-1} \hat\theta_{k-1} & \text{otherwise,} \end{cases}$$

and

$$V_{k-1}^* = \begin{cases} V_{k-1} & \text{if no interval added at block } k, \\ Q_{k-1} V_{k-1} Q_{k-1}^\top & \text{otherwise.} \end{cases}$$

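Finally, the online-updating estimator of θ based on the cumulative data $D_k$ and the finer partition of the piecewise baseline hazard function is given by

$$\hat\theta_k = \left( H_{k-1}^* + H_{n_k,k} \right)^{-1} \left( H_{k-1}^* \hat\theta_{k-1}^* + H_{n_k,k} \hat\theta_{n_k,k} \right). \tag{2.19}$$

An approximate variance estimator is given by

$$V_k = \left( H_{k-1}^* + H_{n_k,k} \right)^{-1} \left( H_{k-1}^* V_{k-1}^* H_{k-1}^{*\top} + H_{n_k,k} \right) \left[ \left( H_{k-1}^* + H_{n_k,k} \right)^{-1} \right]^\top. \tag{2.20}$$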

Remark 2.4.1. We further impose constraints on $w_J^p$, $w_{J+1}^p$, $w_J^q$, and $w_{J+1}^q$ (namely $w_J^p w_J^q + w_{J+1}^p w_{J+1}^q = 1$ and all are positive) to reduce the bias of the new estimator in Section 2.6. The choice of $w_J^q$ and $w_{J+1}^q$ depends on the underlying baseline hazard. If the baseline hazard function is strictly increasing (decreasing), we set $0 < w_J^q < 1 < w_{J+1}^q$ ($0 < w_{J+1}^q < 1 < w_J^q$). If the baseline hazard function is not strictly monotone or we do not know the true baseline hazard function, we set $w_J^q = w_{J+1}^q = 1$ as in the simulation studies and real data analyses. To satisfy $w_J^p w_J^q + w_{J+1}^p w_{J+1}^q = 1$, we further set $w_J^p = w_{J+1}^p = 0.5$.
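The expansion step can be illustrated with a small numerical sketch. The helper `expansion_matrix` is a hypothetical name, the cumulative summaries are made up, and the weights follow the default choice in Remark 2.4.1 (0.5 and 0.5 for P, 1 and 1 for Q), with the split applied to the last interval as in the WLOG convention of this section.

```python
import numpy as np

def expansion_matrix(dim, w_first, w_second):
    """(dim + 1) x dim matrix: identity on the first dim - 1 coordinates,
    and the last coordinate mapped to two rows with the given weights."""
    E = np.zeros((dim + 1, dim))
    E[:dim - 1, :dim - 1] = np.eye(dim - 1)
    E[dim - 1, dim - 1] = w_first
    E[dim, dim - 1] = w_second
    return E

p, J = 2, 3
dim = p + J
P = expansion_matrix(dim, 0.5, 0.5)   # w_J^p = w_{J+1}^p = 0.5
Q = expansion_matrix(dim, 1.0, 1.0)   # w_J^q = w_{J+1}^q = 1

# Made-up cumulative summaries from block k - 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(dim, dim))
H_prev = A @ A.T + np.eye(dim)
theta_prev = np.array([0.3, -0.1, 0.2, 0.4, 0.6])
V_prev = np.linalg.inv(H_prev)

H_star = P @ H_prev @ P.T             # expanded negated Hessian
theta_star = Q @ theta_prev           # last hazard level is duplicated
V_star = Q @ V_prev @ Q.T             # expanded variance

print(np.allclose(P.T @ Q, np.eye(dim)))   # constraint of Remark 2.4.1
print(theta_star)
```

With these weights the product of the transpose of P with Q is the identity, so the expanded product H_star @ theta_star equals P @ H_prev @ theta_prev; the cumulative information carried forward is redistributed over the two new subintervals rather than altered.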

Remark 2.4.2. More complicated baseline hazard functions will require more pieces in the partition (larger J) in order to guarantee the consistency of the estimators. However, the gain from increasing the number of pieces in the partition should be balanced with the issues of power and ease of computation (Holford 1976). Thus, we may stop increasing the number of pieces in the partition once J reaches a certain value $J_{\max}$, which depends on the true baseline hazard function if given. If the true baseline hazard function is unknown, we may stop increasing J once the relative "distance" between the estimated baseline hazard functions at the previous and the current accumulation points is small enough, i.e., for a given small ϵ > 0, there exists k such that $\sup_t \left| \frac{\hat\Lambda_0^{k}(t) - \hat\Lambda_0^{k-1}(t)}{\hat\Lambda_0^{k-1}(t)} \right| < \epsilon$, where $\Lambda_0(t) = \sum_{j=1}^{J} \lambda_j 1_{[a_{j-1}, a_j)}(t)$, and set the number of pieces at accumulation point k as $J_{\max}$.

Remark 2.4.3. WLOG, we can split more than one interval at a given block k. However, if we increase J only by 1 at block k, then the cumulative negated Hessian matrix at the current block ($H_k = H_{k-1}^* + H_{n_k,k}$) is most likely positive definite (p.d.). This can be shown by mathematical induction.

When k = 1, $H_1 = H_{n_1,1}$, which is always p.d. Assume $H_{k-1}$ is p.d., and further assume that for the kth update, we partition the Jth interval into two subintervals. We write $H_{k-1}^*$ in terms of block matrices, $H_{k-1}^* = \begin{pmatrix} A_{11} & a_{12} \\ a_{12}^\top & a_{22} \end{pmatrix}$, where we know that the leading (p + J) × (p + J) principal submatrix $A_{11}$ is p.d. given that $H_{k-1}$ is p.d. Similarly, we write $H_{n_k,k}$ in terms of block matrices, $H_{n_k,k} = \begin{pmatrix} B_{11} & b_{12} \\ b_{12}^\top & b_{22} \end{pmatrix}$, where the leading (p + J) × (p + J) principal submatrix $B_{11}$ is at least semi-positive definite (s.p.d.) or even p.d. since $H_{n_k,k}$ is always (s.)p.d. We then have $H_k = \begin{pmatrix} A_{11} + B_{11} & a_{12} + b_{12} \\ (a_{12} + b_{12})^\top & a_{22} + b_{22} \end{pmatrix}$, where $A_{11} + B_{11}$ is p.d. After some elementary transformations which preserve the rank of the matrix, we have $\begin{pmatrix} A_{11} + B_{11} & 0_{p+J} \\ 0_{p+J}^\top & m \end{pmatrix}$, where $m = (a_{22} + b_{22}) - (a_{12} + b_{12})^\top (A_{11} + B_{11})^{-1} (a_{12} + b_{12}) \ge 0$. $H_k$ is thus p.d. if $m \neq 0$. The result is further confirmed by both simulation studies and real data analyses.

Remark 2.4.4. For the kth block of data, we partition the $j_{\max}$th interval into two subintervals, where $j_{\max} = \arg\max_j \left\{ j \mid d_j > r_k \sum_{\ell=1}^{J} d_\ell / J \right\}$ with expansion rate $r_k \ge 1$, for k = 1, …, K. In this paper, we set the expansion rate $r_k = 1$ throughout. The expansion rate, however, does not need to be the same throughout the updates. Similar to the idea of simulated annealing, we can set $r_k$ relatively small at early stages (when k is small) to speed up the partition and quickly approximate the underlying baseline hazard function. Once the estimated baseline hazard function is relatively "stable" (see Remark 2.4.2), we then gradually increase $r_k$ to slow down the partition.

2.5. Fixed Partition and Bias Correction

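Denote by $-M_{n_\ell,\ell}(\theta)$ the negated score function for the current block, where $M_{n_\ell,\ell}(\theta)$ is defined in the same way as the score function in Section 2.1. In order to reduce the finite-sample bias introduced by (2.13), where the total number of intervals of the piecewise hazard function is fixed, we consider, similar to Schifano et al. (2016), the Taylor expansion of $-M_{n_\ell,\ell}(\theta)$ around a vector $\check\theta_{n_\ell,\ell}$ to be defined later. Then

$$-M_{n_\ell,\ell}(\theta) = -M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) + \left[ H_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \right] (\theta - \check\theta_{n_\ell,\ell}) + \check R_{n_\ell,\ell} \tag{2.21}$$

with $\check R_{n_\ell,\ell}$ denoting the remainder. Denote by $\tilde\theta_k$ the solution of

$$-\sum_{\ell=1}^{k} M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) + \sum_{\ell=1}^{k} \left[ H_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \right] (\theta - \check\theta_{n_\ell,\ell}) = 0. \tag{2.22}$$

Defining $\tilde H_{n_\ell,\ell} = H_{n_\ell,\ell}(\check\theta_{n_\ell,\ell})$ and $\tilde H_k = \sum_{\ell=1}^{k} \tilde H_{n_\ell,\ell}$, we have

$$\tilde\theta_k = \left\{ \sum_{\ell=1}^{k} \tilde H_{n_\ell,\ell} \right\}^{-1} \left\{ \sum_{\ell=1}^{k} \tilde H_{n_\ell,\ell} \check\theta_{n_\ell,\ell} + \sum_{\ell=1}^{k} M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \right\}, \tag{2.23}$$

which can be written sequentially as

$$\tilde\theta_k = \left\{ \tilde H_{k-1} + \tilde H_{n_k,k} \right\}^{-1} \left\{ \tilde H_{k-1} \tilde\theta_{k-1} + \tilde H_{n_k,k} \check\theta_{n_k,k} + M_{n_k,k}(\check\theta_{n_k,k}) \right\} \tag{2.24}$$

with $\tilde H_0 = 0_{p+J}$ and $\tilde\theta_0 = 0$.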

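We observe that

$$0 = -M_{n_\ell,\ell}(\hat\theta_{n_\ell,\ell}) \approx -M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) + \tilde H_{n_\ell,\ell} (\hat\theta_{n_\ell,\ell} - \check\theta_{n_\ell,\ell}).$$

Thus, we have $\tilde H_{n_\ell,\ell} \check\theta_{n_\ell,\ell} + M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \approx \tilde H_{n_\ell,\ell} \hat\theta_{n_\ell,\ell}$. Using the above approximation, the variance formula is given by

$$\begin{aligned}
V_k &= \left( \tilde H_{k-1} + \tilde H_{n_k,k} \right)^{-1} \left( \sum_{\ell=1}^{k} \tilde H_{n_\ell,\ell} V_{n_\ell,\ell} \tilde H_{n_\ell,\ell}^\top \right) \left[ \left( \tilde H_{k-1} + \tilde H_{n_k,k} \right)^{-1} \right]^\top \\
&= \left( \tilde H_{k-1} + \tilde H_{n_k,k} \right)^{-1} \left( \tilde H_{k-1} V_{k-1} \tilde H_{k-1}^\top + \tilde H_{n_k,k} V_{n_k,k} \tilde H_{n_k,k}^\top \right) \left[ \left( \tilde H_{k-1} + \tilde H_{n_k,k} \right)^{-1} \right]^\top.
\end{aligned} \tag{2.25}$$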

Remark 2.5.1. If we choose $\check\theta_{n_k,k} = \hat\theta_{n_k,k}$, then $\tilde\theta_k$ in (2.23) reduces to the estimator in (2.13), and bias is not corrected. Ideally, we should choose the intermediary estimator in a small neighborhood of the true θ, which is unknown but can be best approximated by utilizing all the cumulative information. Thus, we consider the intermediary estimator for a fixed number of intervals given by

$$\check\theta_{n_k,k} = \left( \tilde H_{k-1} + H_{n_k,k} \right)^{-1} \left( \tilde H_{k-1} \tilde\theta_{k-1} + H_{n_k,k} \hat\theta_{n_k,k} \right), \tag{2.26}$$

for k = 1, 2, …, with $\tilde H_0 = 0_{p+J}$ and $\tilde\theta_0 = 0$.
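One bias-corrected step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the names `bias_corrected_update` and `exponential_block` are hypothetical, M denotes the block score (not its negation), the block Hessian in the intermediary estimator (2.26) is assumed to be evaluated at the block estimator, and the toy model is a single constant hazard without covariates, for which the score and negated Hessian have closed forms.

```python
import numpy as np

def bias_corrected_update(H_prev, theta_prev, theta_blk, score, neg_hess):
    """One step of (2.24), using the intermediary estimator (2.26).

    H_prev, theta_prev: cumulative summaries from block k - 1.
    theta_blk: estimator from the current block alone.
    score(theta), neg_hess(theta): block score and negated Hessian.
    """
    H_blk = neg_hess(theta_blk)
    # Intermediary estimator (2.26): pools cumulative and block information.
    theta_check = np.linalg.solve(H_prev + H_blk,
                                  H_prev @ theta_prev + H_blk @ theta_blk)
    # Re-evaluate the block Hessian and score at the intermediary estimator.
    H_tilde = neg_hess(theta_check)
    H_new = H_prev + H_tilde
    rhs = H_prev @ theta_prev + H_tilde @ theta_check + score(theta_check)
    return H_new, np.linalg.solve(H_new, rhs)

def exponential_block(d, E):
    """Toy block: d events over total exposure E, constant hazard theta."""
    score = lambda th: np.array([d / th[0] - E])
    neg_hess = lambda th: np.array([[d / th[0] ** 2]])
    return np.array([d / E]), score, neg_hess

# Made-up (events, exposure) summaries for three blocks.
H, theta = np.zeros((1, 1)), np.zeros(1)
path = []
for d, E in [(10, 20.0), (30, 50.0), (5, 12.0)]:
    theta_blk, score, neg_hess = exponential_block(d, E)
    H, theta = bias_corrected_update(H, theta, theta_blk, score, neg_hess)
    path.append(theta[0])
print(path)
```

At the first block the cumulative Hessian is zero, so the intermediary estimator coincides with the block estimator, the score term vanishes, and the update returns the block estimator unchanged.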

2.6. Adaptive Partition and Bias Correction

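Following Section 2.5, we propose a new estimator, which allows for the adaptive partition of the piecewise hazard function and has less bias. To make the presentation cleaner, assume the number of pieces of the hazard function is not increased until the kth block, and that there are J pieces in the partition in block k − 1. Denote by $-M_{n_\ell,\ell}(\theta_{p+J})$ the negated score function for block ℓ, where $\theta_{p+J}$ is a vector of length p + J, for ℓ = 1, 2, …, k − 1.

We start with (2.21) by summing over ℓ from one to k − 1:

$$-\sum_{\ell=1}^{k-1} M_{n_\ell,\ell}(\theta_{p+J}) = -\sum_{\ell=1}^{k-1} \left\{ M_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) + H_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \check\theta_{n_\ell,\ell} \right\} + \left\{ \sum_{\ell=1}^{k-1} H_{n_\ell,\ell}(\check\theta_{n_\ell,\ell}) \right\} \theta_{p+J} + \sum_{\ell=1}^{k-1} \check R_{n_\ell,\ell}.$$

Based on (2.23), we have

$$-\sum_{\ell=1}^{k-1} M_{n_\ell,\ell}(\theta_{p+J}) = -\tilde H_{k-1} \tilde\theta_{k-1} + \tilde H_{k-1} \theta_{p+J} + \sum_{\ell=1}^{k-1} \check R_{n_\ell,\ell}. \tag{2.27}$$

Note that (2.27) still holds if we multiply both sides by $P_{k-1}$,

$$-P_{k-1} \sum_{\ell=1}^{k-1} M_{n_\ell,\ell}(\theta_{p+J}) = -P_{k-1} \tilde H_{k-1} \tilde\theta_{k-1} + P_{k-1} \tilde H_{k-1} \theta_{p+J} + P_{k-1} \sum_{\ell=1}^{k-1} \check R_{n_\ell,\ell}. \tag{2.28}$$

We set $P_{k-1}$ and $Q_{k-1}$ as described in Section 2.4 such that $P_{k-1}^\top Q_{k-1} = I_{p+J}$. This is the same as putting constraints on $w_J^p$, $w_{J+1}^p$, $w_J^q$, and $w_{J+1}^q$ in (2.17) such that $w_J^p w_J^q + w_{J+1}^p w_{J+1}^q = 1$. Denote the expanded cumulative score function $P_{k-1} \sum_{\ell=1}^{k-1} M_{n_\ell,\ell}(\theta_{p+J})$ as $M_{k-1}^*(\theta_{p+J+1})$, where $\theta_{p+J+1} = Q_{k-1} \theta_{p+J}$. We thus have an equivalent representation of (2.28) as

$$-M_{k-1}^*(\theta_{p+J+1}) = -\left\{ P_{k-1} \tilde H_{k-1} P_{k-1}^\top \right\} Q_{k-1} \tilde\theta_{k-1} + \left\{ P_{k-1} \tilde H_{k-1} P_{k-1}^\top \right\} Q_{k-1} \theta_{p+J} + P_{k-1} \sum_{\ell=1}^{k-1} \check R_{n_\ell,\ell}. \tag{2.29}$$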

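Additionally, for the kth block, we have

$$-M_{n_k,k}(\theta_{p+J+1}) = -M_{n_k,k}(\check\theta_{n_k,k}) + \tilde H_{n_k,k} \theta_{p+J+1} - \tilde H_{n_k,k} \check\theta_{n_k,k} + \check R_{n_k,k}. \tag{2.30}$$

Denote by $\tilde\theta_k$, of length (p + J + 1), the solution obtained by setting the sum of (2.29) and (2.30) to zero, ignoring the remainder terms. Then we have

$$\tilde\theta_k = \left\{ \tilde H_{k-1}^* + \tilde H_{n_k,k} \right\}^{-1} \left\{ \tilde H_{k-1}^* \tilde\theta_{k-1}^* + \tilde H_{n_k,k} \check\theta_{n_k,k} + M_{n_k,k}(\check\theta_{n_k,k}) \right\}, \tag{2.31}$$

where

$$\tilde\theta_{k-1}^* = \begin{cases} \tilde\theta_{k-1} & \text{if no interval added at block } k, \\ Q_{k-1} \tilde\theta_{k-1} & \text{otherwise,} \end{cases}$$

and

$$\tilde H_{k-1}^* = \begin{cases} \tilde H_{k-1} & \text{if no interval added at block } k, \\ P_{k-1} \tilde H_{k-1} P_{k-1}^\top & \text{otherwise.} \end{cases}$$

An approximate variance estimator of $\tilde\theta_k$ is given by

$$V_k = \left( \tilde H_{k-1}^* + \tilde H_{n_k,k} \right)^{-1} \left( \tilde H_{k-1}^* V_{k-1}^* \tilde H_{k-1}^{*\top} + \tilde H_{n_k,k} V_{n_k,k} \tilde H_{n_k,k}^\top \right) \left[ \left( \tilde H_{k-1}^* + \tilde H_{n_k,k} \right)^{-1} \right]^\top, \tag{2.32}$$

where

$$V_{k-1}^* = \begin{cases} V_{k-1} & \text{if no interval added at block } k, \\ Q_{k-1} V_{k-1} Q_{k-1}^\top & \text{otherwise.} \end{cases}$$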

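Remark 2.6.1. If we allow for an increasing number of intervals, $\check\theta_{n_k,k}$ in (2.26) becomes

$$\check\theta_{n_k,k} = \left( \tilde H_{k-1}^* + H_{n_k,k} \right)^{-1} \left( \tilde H_{k-1}^* \tilde\theta_{k-1}^* + H_{n_k,k} \hat\theta_{n_k,k} \right).$$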

Remark 2.6.2. We assume for simplicity of notation that nk = n for all k = 1, 2, …, K. Let $\beta_0$ denote the true value of β in the multiplicative intensity model. Denote by $\hat\beta_n$ and $\hat\beta_N$ the MPLEs of the Cox model based on the n current observations and the N entire observations, respectively. Under Conditions A-D of Friedman (1982) and $P(t_i \ge T_{\max}) > 0$ (Fleming and Harrington 2011) for i = 1, …, n, where $T_{\max}$ is a finite stopping time, we have, for each block, $\hat\beta_{n,k} \xrightarrow{p} \beta_0$ and $\hat\beta_{n,k} - \hat\beta_n \xrightarrow{p} 0$.

It is well known that the estimated cumulative baseline hazard function $\sum_{j=1}^{J} \hat\lambda_{n,k,j} \Delta_j(t)$ is consistent under Conditions A-D of Friedman (1982) (Whittemore and Keller 1986). Furthermore, according to (2.6), $\hat\beta_{n,k} \xrightarrow{p} \beta_0$, the width of each interval goes to 0 (Condition B of Friedman (1982)), and by Chebyshev's weak law of large numbers, $\hat\lambda_{n,k,j} = \frac{\sum_{i=1}^{n} 1_{[a_{j-1}, a_j)}(t_i) \delta_i}{\sum_{i=1}^{n} \Delta_j(t_i) \exp(x_i^\top \hat\beta_{n,k})} \xrightarrow{p} \lambda_0(a_{j-1})$, for all j. Given that $a_j$ is dense in $(0, T_{\max}]$, for any $t \in (0, T_{\max}]$, $\sum_{j=1}^{J} \hat\lambda_{n,k,j} 1_{[a_{j-1}, a_j)}(t)$ converges in probability to $\lambda_0(t)$.
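The closed-form expression for the hazard levels given the regression coefficients, which follows from the last J rows of the score equation (2.6), can be sketched as below. The function name `lambda_hat` is hypothetical, and the cut points are assumed to be the finite values a_0 = 0 < … < a_{J−1}, with the last interval open-ended.

```python
import numpy as np

def lambda_hat(beta, t, delta, X, cuts):
    """Hazard level per interval for fixed beta: events divided by
    risk-weighted exposure, as in the ratio displayed in Remark 2.6.2."""
    t = np.asarray(t, dtype=float)
    lower = np.asarray(cuts, dtype=float)
    upper = np.append(lower[1:], np.inf)
    Delta = np.clip(t[:, None] - lower, 0.0, upper - lower)   # Delta_j(t_i)
    j = np.searchsorted(lower, t, side="right") - 1           # interval of t_i
    d = np.bincount(j, weights=delta, minlength=len(lower))   # d_j
    return d / (Delta.T @ np.exp(X @ beta))

# Toy data (made up): three subjects, no covariate effect.
print(lambda_hat(np.zeros(1), [0.5, 1.5, 3.0], [1.0, 0.0, 1.0],
                 np.zeros((3, 1)), [0.0, 1.0, 2.0]))
```

An interval in which no subject accrues exposure would give a zero denominator; the sketch does not guard against that case, which is one reason the text merges intervals without events within a block.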

Remark 2.6.3. Results from both the simulation studies and the real data analyses show that, under standard regularity conditions (Conditions (2.1)–(2.6) of Fleming and Harrington (2011)), the online-updating estimator $\tilde\beta_K$ in (2.31) has a similar convergence rate to the MPLE based on the entire data, $\hat\beta_N$, as the number of blocks K and the number of pieces J increase, but not too fast (Condition C of Friedman (1982)). Furthermore, the estimated baseline hazard function converges to the true baseline hazard function pointwise as K and J increase.

2.7. Cumulative Inference

With the advantage of online-updating estimators for the baseline hazard function, the proposed methods allow us to conduct cumulative statistical inference as a by-product. For example, comparisons between groups of survival rates at certain time points, as well as estimates of the entire survival curve are routinely reported in the clinical literature.

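The cumulative estimated survival function at the kth block is given by

$$\hat S_k(t \mid x) = \exp\left\{ -\sum_{j=1}^{J} \tilde\lambda_{k,j} \Delta_{k,j}(t) \exp(x^\top \tilde\beta_k) \right\},$$

where $0 = a_{k0} < a_{k1} < \dots < a_{kJ} = \infty$ is the updated partition at the kth block and $\Delta_{k,j}(t)$ is defined in the same way as in (2.3) but corresponds to the new partition. By the delta method, the approximate variance estimator of $\hat S_k(t \mid x)$ is given by $V(\hat S_k(t \mid x)) = S_k^\top \tilde H_k^{-1} S_k$, where

$$S_k = -\hat S_k(t \mid x) \exp(x^\top \tilde\beta_k) \left( \sum_{j=1}^{J} \tilde\lambda_{k,j} \Delta_{k,j}(t) x^\top, \ \Delta_{k,1}(t), \ \dots, \ \Delta_{k,J}(t) \right)^\top.$$

The 100(1−α)% pointwise confidence interval for the survival function is thus given by

$$\left( \hat S_k(t \mid x)^{1/\phi(t)}, \ \hat S_k(t \mid x)^{\phi(t)} \right),$$

where $\phi(t) = \exp\left\{ \frac{z_{1-\alpha/2} \sqrt{V(\hat S_k(t \mid x))}}{\log[\hat S_k(t \mid x)] \, \hat S_k(t \mid x)} \right\}$ and $z_{1-\alpha/2}$ is the (1 − α/2)th quantile of the standard normal N(0, 1) distribution.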
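A sketch of the survival estimate and its pointwise confidence interval is given below. The function name `survival_ci` is hypothetical; `V` is assumed to be an estimated covariance matrix of the cumulative estimator with coordinates ordered as (β, λ), for example the one obtained from (2.15) or (2.32); and t is assumed to be strictly positive so that the estimated survival probability is below one.

```python
import numpy as np
from statistics import NormalDist

def survival_ci(t, x, beta, lam, cuts, V, alpha=0.05):
    """Survival estimate at time t for covariates x, with the interval
    (S^(1/phi), S^phi) obtained by the delta method."""
    cuts = np.asarray(cuts, dtype=float)
    upper = np.append(cuts[1:], np.inf)
    Delta = np.clip(t - cuts, 0.0, upper - cuts)     # Delta_{k,j}(t)
    risk = np.exp(x @ beta)
    cum_base = lam @ Delta                           # sum_j lambda_j Delta_j(t)
    S = np.exp(-cum_base * risk)

    # Gradient of S with respect to (beta, lambda).
    grad = -S * risk * np.concatenate([cum_base * x, Delta])
    var = grad @ V @ grad

    z = NormalDist().inv_cdf(1 - alpha / 2)
    phi = np.exp(z * np.sqrt(var) / (S * np.log(S)))  # phi < 1 because log S < 0
    return S, S ** (1 / phi), S ** phi

# Made-up inputs: one covariate, two intervals split at t = 1.
S, lower, upper_ci = survival_ci(2.0, np.array([0.0]), np.array([0.0]),
                                 np.array([0.1, 0.2]), [0.0, 1.0],
                                 1e-4 * np.eye(3))
print(S, lower, upper_ci)
```

Because the interval is built on the log(−log) scale, both limits stay inside (0, 1) regardless of how large the variance is.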

3. Simulation Studies

3.1. Simulation I

In Simulation I (censoring rate ≈ 45%), we first investigate the impact of initial number of intervals J0 on the performances of the four approaches: fixed/adaptive partition and bias/no bias correction estimators. We then focus on the “optimal” approaches: fixed/adaptive partition and bias correction, and compare them with MPLE estimator based on fitting the entire dataset simultaneously by SAS procedure PHREG and the cumulatively updated estimating equation (CUEE) estimator (Xue et al. 2020).

Simulation Setting

We generate B = 500 datasets of survival times ti independently from a Cox proportional hazards model, for i = 1, …, N, with the baseline hazard function given by

$$h(t) = h_0(t) + 0.1 \exp(-0.35t) \sin(5t). \tag{3.1}$$

If we assume a Weibull distribution for h0(t), then we have $h(t) = 1.2 t^{0.2} \exp(-2) + 0.1 \exp(-0.35t) \sin(5t)$. For the linear predictor in the Cox model, let β = (1, 0.5, −2.0), xki[1] ~ N(0, 1), xki[2] ~ Bernoulli(0.5), and xki[3] ~ Bernoulli(0.6) independently. After sampling the survival time for each subject, we generate their corresponding censoring time as min(Tmax, Uniform(0.7 Tmax, 1.5 Tmax)), where Tmax = 10. Let the variable "event" be 1 if the survival time is smaller than the censoring time, and 0 otherwise. We set the total sample size N = 5,000,000, and the number of blocks K = 200 with nk = 2,500. The average event rates are 15.7%, 17.5%, 9.0%, and 12.4% for arms with (xki[2], xki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.
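One possible way to draw a block of data from this design is sketched below. The sampling mechanism is an assumption (the text does not say how survival times were generated): the cumulative baseline hazard is integrated numerically on a grid over [0, Tmax] and inverted, and subjects whose event would fall beyond Tmax are censored. The names `hazard` and `simulate_block` and the grid size are hypothetical.

```python
import numpy as np

def hazard(t):
    """Baseline hazard (3.1) with h0(t) = 1.2 t^0.2 exp(-2)."""
    return 1.2 * t**0.2 * np.exp(-2.0) + 0.1 * np.exp(-0.35 * t) * np.sin(5.0 * t)

def simulate_block(n, beta, rng, t_max=10.0, grid_size=20001):
    """Draw (observed time, event indicator, covariates) for one block."""
    grid = np.linspace(0.0, t_max, grid_size)
    h = hazard(grid)
    # Trapezoidal cumulative baseline hazard on the grid.
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(grid))])

    X = np.column_stack([rng.normal(size=n),
                         rng.binomial(1, 0.5, size=n),
                         rng.binomial(1, 0.6, size=n)])
    # Inversion: the cumulative baseline hazard at T equals Exp(1) / exp(x' beta).
    target = rng.exponential(size=n) / np.exp(X @ beta)
    T = np.interp(target, cum, grid, right=np.inf)   # inf: event after t_max

    C = np.minimum(t_max, rng.uniform(0.7 * t_max, 1.5 * t_max, size=n))
    delta = (T < C).astype(float)
    return np.minimum(T, C), delta, X

rng = np.random.default_rng(0)
t_obs, event, X = simulate_block(2000, np.array([1.0, 0.5, -2.0]), rng)
print(t_obs[:5], event.mean())
```

The grid only needs to cover [0, Tmax] because no observed time can exceed Tmax under this censoring scheme.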

To examine the effect of the initial number of intervals J0 on the performances of the four proposed estimators, we let J0 vary from 1, 5, 10, to 15. Note that for the fixed partition estimators, J0 will stay the same (at 1, 5, 10 or 15) throughout the updates. For the adaptive estimators, we partition the interval with maximum number of events into two subintervals for each update until the total number of intervals reaches 50 (Jmax).

The Impact of J0

In Table 1, we report the average of the bias (Bias), the average of the standard errors (ASE), the simulation errors (SE), the root mean squared error (RMSE), the coverage probability (CP) of the 95% confidence intervals, and the computation time in minutes. The MPLE has little bias since it is obtained based on the entire dataset. The computation time of the MPLE is not reported because the MPLE is computed in SAS, while CUEE and the proposed approaches are implemented in FORTRAN using IMSL subroutines with double-precision accuracy on an Intel i7-4770 machine with 16 GB of RAM running a GNU/Linux operating system; the computation times are therefore not comparable. As expected, the computation time of the fixed partition approaches increases as J0 increases, and the computation times of the adaptive approaches are similar under different values of J0. All proposed approaches had shorter computation times than CUEE.
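The summary measures can be computed from the replications as sketched below. The function name `summarize` is hypothetical, and two conventions are assumptions: SE is taken to be the empirical standard deviation of the estimates across replications, and CP is based on Wald intervals of the form estimate ± 1.96 × standard error.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, ASE, SE, RMSE, and CP for one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "Bias": est.mean() - truth,
        "ASE": se.mean(),
        "SE": est.std(ddof=1),
        "RMSE": np.sqrt(np.mean((est - truth) ** 2)),
        "CP": covered.mean(),
    }

# Made-up replications for illustration.
print(summarize([1.0, 1.2, 0.8], [0.1, 0.1, 0.1], truth=1.0))
```

A large gap between ASE and SE, or an RMSE dominated by the bias, is what drives the low coverage seen for coarse fixed partitions.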

Table 1.

Estimates from 500 replications and computation time in minutes under the MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J0, in Simulation I.

Variable Method J 0 Bias ASE SE RMSE CP Time
β 1 MPLE 0.0001 0.0023 0.0022 0.0022 0.958
CUEE 0.0000 0.0023 0.0022 0.0022 0.958 876.1
Fixed & BC 1 −0.0621 0.0021 0.0019 0.0622 0.000 44.8
Fixed & BC 5 −0.0085 0.0023 0.0043 0.0095 0.186 62.4
Fixed & BC 10 −0.0027 0.0023 0.0023 0.0036 0.774 68.8
Fixed & BC 15 −0.0013 0.0023 0.0023 0.0026 0.916 73.1
Adapt & BC 1 −0.0021 0.0023 0.0063 0.0066 0.856 82.6
Adapt & BC 5 0.0001 0.0023 0.0023 0.0023 0.962 103.3
Adapt & BC 10 0.0002 0.0023 0.0022 0.0023 0.964 102.9
Adapt & BC 15 0.0002 0.0023 0.0022 0.0023 0.962 104.6
β2 MPLE 0.0002 0.0039 0.0037 0.0037 0.960
CUEE 0.0002 0.0039 0.0037 0.0037 0.958
Fixed & BC 1 −0.0309 0.0039 0.0035 0.0311 0.000
Fixed & BC 5 −0.0038 0.0039 0.0041 0.0056 0.834
Fixed & BC 10 −0.0008 0.0039 0.0038 0.0038 0.954
Fixed & BC 15 0.0001 0.0040 0.0037 0.0037 0.960
Adapt & BC 1 0.0014 0.0039 0.0049 0.0051 0.904
Adapt & BC 5 0.0024 0.0041 0.0038 0.0045 0.930
Adapt & BC 10 0.0023 0.0041 0.0038 0.0044 0.932
Adapt & BC 15 0.0021 0.0041 0.0037 0.0043 0.932
β3 MPLE −0.0001 0.0044 0.0043 0.0043 0.954
CUEE 0.0000 0.0044 0.0043 0.0043 0.954
Fixed & BC 1 0.1202 0.0040 0.0037 0.1204 0.000
Fixed & BC 5 0.0140 0.0044 0.0067 0.0155 0.198
Fixed & BC 10 0.0039 0.0044 0.0044 0.0058 0.868
Fixed & BC 15 0.0023 0.0044 0.0043 0.0049 0.926
Adapt & BC 1 0.0061 0.0044 0.0121 0.0136 0.774
Adapt & BC 5 0.0018 0.0045 0.0043 0.0046 0.942
Adapt & BC 10 0.0015 0.0045 0.0043 0.0046 0.944
Adapt & BC 15 0.0014 0.0045 0.0043 0.0045 0.944

We first focus on the fixed partition and bias correction approach. As shown in Table 1, the estimator with J0 = 1 tends to be the most biased, particularly in the coefficients corresponding to binary covariates (β2 and β3). As expected, bias decreases with a larger J0 and the estimator with J0 = 5 already performs quite well. In addition, the ASEs, SEs, and RMSEs with J0 > 1 are close to those of MPLE for all parameters. As expected, the CPs increase as J0 increases and are close to 95% with J0 = 15. Similar results are observed in Figures S4 and S5. Figure S4 shows boxplots of the biases of the MPLE from SAS and of the fixed partition and bias correction estimator of βj, j = 1, …, 3, under different values of J0. The corresponding standard errors are shown in Figure S5. Figure S6 shows the fitted baseline hazard function of the fixed partition and bias correction approach under different values of J0. Again, as the initial partition becomes finer (J0 increases), the estimated piecewise baseline hazard function aligns better with the true baseline hazard function. However, even with J0 = 15, the fixed partition and bias correction approach still cannot fully recover the complicated true baseline hazard function.

Comparison between the MPLE and the Four Proposed Estimators (J0 = 5)

Fixing J0 = 5, we next compare the MPLE with the four proposed approaches (Figures 1–3). A comparison of the biases with J0 = 5 is shown in Figure 1. Among the four proposed approaches, the adaptive partition and bias correction approach has the least biased estimates of βj for all j = 1, …, 3. According to Figure 2, the standard errors of the fixed partition approaches (with and without bias correction) and of the adaptive partition and bias correction approach are similar to the standard error of MPLE, while the standard error of the adaptive partition and no bias correction approach is slightly smaller. As shown in Figure 3, even with J0 = 5, the adaptive partition and bias correction approach successfully recovers the true baseline hazard, with the fitted baseline hazard function (blue) almost overlapping the true baseline hazard function (red). The adaptive partition and no bias correction approach captures the shape of the true function but is biased. The two fixed partition approaches, with and without bias correction, fail to estimate the true baseline hazard function for small values of J0. Additional comparison figures are given in the Supplementary Materials.

Fig. 1.

Boxplots of bias for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J0 = 5, in Simulation I.

Fig. 3.

Estimated baseline hazard functions for (a) fixed partition and no bias correction, (b) fixed partition and bias correction, (c) adaptive partition and no bias correction, and (d) adaptive partition and bias correction, with J0 = 5. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation I.

Fig. 2.

Boxplots of standard error for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J0 = 5, in Simulation I.

Comparison between MPLE, CUEE, and the Bias-corrected Estimators

We now compare the MPLE, CUEE, and the “optimal” bias correction methods, i.e., the fixed and adaptive partition approaches with bias correction, for varying J0 (Table 1). Similar to the fixed partition and bias correction approach, as J0 increases, the bias of the adaptive partition and bias correction approach decreases, with J0 = 1 already performing quite well. The ASEs, SEs, and RMSEs are close to those of MPLE and the CPs are also close to 95% under all values of J0.

Both the adaptive partition and bias correction approach and CUEE perform well in terms of bias, ASE, SE, RMSE, and CP. One advantage of the proposed method is that it provides good estimates of the baseline hazard function, which CUEE cannot provide. As shown in Figure S12, even with J0 = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach successfully captures the shape of the true baseline hazard. As J0 increases, the fitted and true baseline hazard curves almost overlap, which further justifies our proposed method in Section 2.6. Another advantage of the proposed method over CUEE is the computation time, especially when the censoring rate is low (Simulation I).

3.2. Simulation II

In Simulation II (censoring rate ≈ 76%), we further compare the performances between MPLE, CUEE, and the proposed fixed/adaptive and bias correction estimators with varying J0.

Simulation Setting

We generate B = 500 datasets of survival times ti independently from a Cox proportional hazards model, for i = 1, …, N, with the baseline hazard function given in (3.1). The censoring time and the variable “event” are generated as in Simulation I. To achieve a high censoring rate, which is frequently encountered in practice, we let β = (1, −4.0, −4.0), xki[1] ~ N(−1, 1), xki[2] ~ Bernoulli(0.5), and xki[3] ~ Bernoulli(0.1 + 0.8xki[2]), so that the binary covariates are highly correlated. We set the total sample size N = 5,000,000, and the number of blocks K = 1000 with nk = 500. The average event rates are 24.3%, 0.1%, 0.1%, and 0.0% for arms with (xki[2], xki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.
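The covariate design of Simulation II can be sketched as below for a single block of nk = 500 subjects. This is only an illustration under stated assumptions: the function name `simulate_covariates` and the seed are hypothetical, NumPy is used in place of the authors' FORTRAN code, and survival times are not generated because the baseline hazard (3.1) and the censoring mechanism are defined elsewhere in the paper.

```python
import numpy as np


def simulate_covariates(n, seed=0):
    """Draw (x[1], x[2], x[3]) for n subjects following the Simulation II design."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(loc=-1.0, scale=1.0, size=n)   # x[1] ~ N(-1, 1)
    x2 = rng.binomial(1, 0.5, size=n)              # x[2] ~ Bernoulli(0.5)
    x3 = rng.binomial(1, 0.1 + 0.8 * x2)           # x[3] ~ Bernoulli(0.1 + 0.8 x[2])
    return np.column_stack([x1, x2, x3])


beta = np.array([1.0, -4.0, -4.0])
x_block = simulate_covariates(500)      # one block with n_k = 500
log_relative_hazard = x_block @ beta    # linear predictor x'beta in the Cox model
```

Under this design P(x[3] = 1 | x[2] = 1) = 0.9 and P(x[3] = 1 | x[2] = 0) = 0.1, so the two binary covariates have correlation 0.8, and the large negative coefficients make events rare in every arm other than (0, 0).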

To examine the effect of the initial number of intervals J0 on the performance of the two proposed bias correction estimators, we let J0 take the values 1, 5, and 10. Because of the high censoring rate and rare events (average event rates around 0.0% in certain arms), for the adaptive estimators we split the interval with the largest number of events into two subintervals at each update, until the total number of intervals reaches Jmax = 30.

Comparison between MPLE, CUEE, and the Bias-corrected Estimators

We now compare the MPLE, CUEE, and the “optimal” bias correction methods, i.e., the fixed and adaptive partition approaches with bias correction, for varying J0 (Table 2).

Table 2.

Estimates from 500 replications and computation time in minutes under MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J0, in Simulation II.

Variable Method J0 Bias ASE SE RMSE CP Time (min)
β1 MPLE −0.0007 0.0034 0.0036 0.0036 0.942
CUEE −0.0008 0.0034 0.0037 0.0038 0.920 192.7
Fixed & BC 1 −0.0530 0.0032 0.0032 0.0532 0.000 77.5
Fixed & BC 5 0.0005 0.0034 0.0042 0.0043 0.874 83.2
Fixed & BC 10 −0.0008 0.0034 0.0037 0.0038 0.930 90.2
Adapt & BC 1 −0.0155 0.0034 0.0037 0.0160 0.004 111.8
Adapt & BC 5 0.0002 0.0034 0.0037 0.0037 0.932 114.1
Adapt & BC 10 −0.0001 0.0034 0.0036 0.0036 0.946 113.8
β2 MPLE −0.0014 0.0402 0.0391 0.0391 0.944
CUEE 0.2953 0.0269 0.2145 0.3653 0.080
Fixed & BC 1 0.0826 0.0395 0.0394 0.0916 0.442
Fixed & BC 5 0.0064 0.0395 0.0397 0.0402 0.936
Fixed & BC 10 0.0069 0.0395 0.0397 0.0403 0.930
Adapt & BC 1 0.0295 0.0394 0.0397 0.0495 0.884
Adapt & BC 5 0.0060 0.0396 0.0397 0.0401 0.936
Adapt & BC 10 0.0062 0.0396 0.0396 0.0401 0.938
β3 MPLE −0.0007 0.0402 0.0424 0.0424 0.932
CUEE 0.3050 0.0268 0.2244 0.3789 0.086
Fixed & BC 1 0.0836 0.0395 0.0425 0.0938 0.424
Fixed & BC 5 0.0076 0.0394 0.0425 0.0432 0.930
Fixed & BC 10 0.0080 0.0395 0.0425 0.0433 0.926
Adapt & BC 1 0.0305 0.0394 0.0425 0.0523 0.850
Adapt & BC 5 0.0071 0.0396 0.0425 0.0431 0.926
Adapt & BC 10 0.0073 0.0396 0.0425 0.0431 0.928

First, we note that CUEE does not perform well for the coefficients of the two binary covariates (β2 and β3) in the presence of a high censoring rate and rare events. The corresponding biases are large (about 0.30), the ASEs, SEs, and RMSEs differ markedly from those of MPLE, and the CPs are low (below 9%).

The bias correction methods outperform CUEE under this setting. For both the fixed and the adaptive partition approaches with bias correction, the bias of each parameter decreases as J0 increases, with J0 = 5 already performing quite well. The ASEs, SEs, and RMSEs are close to those of MPLE and the CPs are also close to 95% with J0 > 1. Additionally, the adaptive partition and bias correction approach provides good estimates of the baseline hazard function, which CUEE cannot provide. As shown in Figure 4, even with J0 = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach successfully captures the shape of the true baseline hazard. The computation time of the proposed method is also shorter than that of CUEE, even when the censoring rate is high (Simulation II).

Fig. 4.

Estimated baseline hazard functions for the adaptive partition and bias correction approach, for (a) J0 = 1, (b) J0 = 5, and (c) J0 = 10. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation II.

4. Analyses of Real Data

4.1. Analysis of the SEER Colon Cancer Data

We examine the SEER colon cancer statistics from 1973 to 2013, available at https://seer.cancer.gov/data/. The data comprise 315,120 observations, after deleting all subjects with missing covariates or survival time less than three months. For illustration purposes, we treat the survival time in the SEER data as continuous. We set the maximum censoring time (Tmax) to 18 months, which means that a subject is considered censored if the event occurs after 18 months. We are interested in the early stage (≤ 18 months) as colon cancer is highly curable. Under this scenario, the total number of events, including both colon cancer deaths and deaths from other causes, is 67,798, with a censoring rate of 78.49%. The covariates considered in our analysis are year of diagnosis (Year) and surgery treatment indicator (RP). The covariate Year is continuous and the covariate RP is binary. Among the 67,798 patients, 4,586 underwent surgery treatment. The data satisfy the proportional hazards assumption according to the test of Grambsch and Therneau (1994).
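The truncation of follow-up at Tmax can be sketched as follows. This is a minimal illustration with hypothetical function names (`censor_at_tmax`, `censoring_rate`), and the treatment of an event occurring exactly at 18 months as observed is an assumption, since the text only specifies events after 18 months.

```python
import numpy as np


def censor_at_tmax(time, event, t_max=18.0):
    """Apply administrative censoring: events after t_max become censored at t_max."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    late = time > t_max                      # ASSUMPTION: an event exactly at t_max is kept
    obs_time = np.where(late, t_max, time)   # follow-up is truncated at t_max
    obs_event = np.where(late, 0, event)     # late events are recorded as censored
    return obs_time, obs_event


def censoring_rate(event):
    """Fraction of subjects without an observed event."""
    return 1.0 - np.mean(event)
```

With the counts reported above, the censoring rate is 1 − 67,798/315,120 ≈ 0.7849, in agreement with the stated 78.49%.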

We use a subset sample size nk = 2,500 for k = 1, …, 127 to analyze the data in the online-updating framework. To examine the effect of the initial number of intervals J0 on the performance of the proposed estimators, we let J0 take the values 1, 3, and 5, given the high censoring rate. For the adaptive partition approaches, we allow the number of pieces to increase (by at most one piece at a time) at each update until the total number of pieces reaches Jmax = 14. We set w_J^p = w_{J+1}^p = 0.5 and w_J^q = w_{J+1}^q = 1 since we do not know the underlying baseline hazard function.

The Impact of J0

As shown in Table 3, for the continuous covariate Year, the estimates and standard errors of the four approaches, i.e., fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction, are similar under different values of J0.

Table 3.

Estimates and standard errors for the SEER colon cancer data.

Method J0 Est (RP) SE (RP) Est (Year) SE (Year)
MPLE 0.14552 0.03285 −0.17462 0.00393
CUEE 0.14798 0.03535 −0.17469 0.00392
Fixed & NBC 1 0.22358 0.03285 −0.17518 0.00394
Fixed & NBC 3 0.22697 0.03284 −0.17585 0.00393
Fixed & NBC 5 0.23128 0.03284 −0.17650 0.00393
Fixed & BC 1 0.14774 0.03327 −0.17482 0.00394
Fixed & BC 3 0.14755 0.03327 −0.17479 0.00394
Fixed & BC 5 0.14769 0.03327 −0.17477 0.00394
Adapt & NBC 1 0.26512 0.03283 −0.18206 0.00393
Adapt & NBC 3 0.25180 0.03284 −0.17967 0.00393
Adapt & NBC 5 0.25177 0.03284 −0.17967 0.00393
Adapt & BC 1 0.16063 0.03371 −0.17688 0.00400
Adapt & BC 3 0.14834 0.03331 −0.17477 0.00395
Adapt & BC 5 0.14853 0.03328 −0.17480 0.00395

For the binary covariate RP, the estimates under the adaptive partition approaches (with and without bias correction) tend to be closer to the MPLE as J0 increases, with J0 = 3 already performing quite well for the adaptive partition and bias correction approach. Estimates under the fixed partition approaches (with and without bias correction) are robust to the value of J0. All standard errors are similar to the standard errors of MPLE under different values of J0.

Comparison between MPLE, CUEE, and the Four Proposed Estimators (J0 = 5)

Fixing J0 = 5, we compare the estimates and standard errors of MPLE, CUEE, and the four proposed approaches. In Table 3, the bias correction approaches (both fixed and adaptive partition), MPLE, and CUEE are the most similar in terms of both regression coefficients and standard errors, except that CUEE has a slightly larger SE for the binary covariate RP. The two approaches without bias correction (both fixed and adaptive partition) give similar regression coefficients for the continuous covariate Year, but very different results for the binary covariate RP.

SAS PHREG does not directly provide the baseline hazard function without any assumption on the baseline hazard. We obtain the baseline hazard function based on the entire dataset by assuming that the baseline hazard is piecewise constant with all the distinct event times as cutoff points, i.e., the Breslow estimator (Breslow 1972). We again compare the estimated baseline hazard functions of the four proposed methods with the result based on the entire data (Figure 5). The estimated baseline hazard function of the adaptive partition and bias correction approach (brown) nearly overlaps with the estimated baseline hazard function based on the entire data (black). With so few pieces, the fixed partition approaches (with and without bias correction) fail to provide satisfactory estimates of the baseline hazard function. Similar results were also observed in the previous simulation study.
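For readers unfamiliar with the Breslow estimator, the sketch below computes its cumulative form, in which the cumulative baseline hazard jumps at each distinct event time by the number of events divided by the sum of exp(x'β) over the subjects still at risk. It is a small NumPy illustration with a hypothetical function name (`breslow_cumulative_hazard`), written for clarity rather than speed, and it is not the SAS procedure used in the paper.

```python
import numpy as np


def breslow_cumulative_hazard(time, event, x, beta):
    """Breslow estimate of the cumulative baseline hazard at each distinct event time.

    time  : observed times (event or censoring), length n
    event : 1 if the event was observed, 0 if censored
    x     : covariate matrix of shape (n, p)
    beta  : regression coefficients of length p
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.exp(np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float))
    event_times = np.unique(time[event == 1])   # sorted distinct event times
    increments = np.array([
        # (number of events at t) / (sum of relative risks over the risk set at t)
        np.sum((time == t) & (event == 1)) / np.sum(risk[time >= t])
        for t in event_times
    ])
    return event_times, np.cumsum(increments)
```

With β = 0 and no censoring, the increments reduce to 1/n, 1/(n − 1), …, 1, i.e., the Nelson–Aalen estimator, which is a convenient sanity check. Note that the risk sets require access to all observations, which is exactly what the online-updating estimators avoid.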

Fig. 5.

Estimated baseline hazard functions for the SEER colon cancer data for (i) all data (solid and black), (ii) fixed partition and no bias correction (dashed and green), (iii) fixed partition and bias correction (dotted and blue), (iv) adaptive partition and no bias correction (dot dash and orange), and (v) adaptive partition and bias correction (long dash and brown), with J0 = 5.

Figure 6 shows the plots of the estimated survival functions and the corresponding pointwise 95% confidence intervals stratified by the treatments (RP and no RP) evaluated at Year=1994, which corresponds to the average year of diagnosis. With the average year of diagnosis (Year), the estimates (95% CIs) of the survival rates were 0.892 (95% CI 0.891 – 0.893) for the subjects treated with RP and 0.873 (95% CI 0.872 – 0.874) for the subjects without surgery (no RP) treatment at 10 months after diagnosis; and 0.842 (95% CI 0.840 – 0.843) for the subjects treated with RP and 0.814 (95% CI 0.813 – 0.816) for the subjects without surgery (no RP) treatment at 15 months after diagnosis.
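Survival estimates of this kind follow from the piecewise constant baseline hazard through S(t | x) = exp{−H0(t) exp(x'β)}, where H0 is the cumulative baseline hazard. The sketch below illustrates the calculation; the function names and the cut points and hazard values in the usage note are hypothetical and are not the fitted SEER values.

```python
import numpy as np


def piecewise_cumulative_hazard(t, cuts, lam):
    """Cumulative hazard at time t for a piecewise constant hazard.

    cuts : increasing boundaries [a_0, ..., a_J]; valid for a_0 <= t <= a_J
    lam  : hazard level on each of the J intervals
    """
    cuts = np.asarray(cuts, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # Time spent in each interval up to t: 0 before the interval starts,
    # the full interval length once t has passed its right end point.
    exposure = np.clip(t - cuts[:-1], 0.0, np.diff(cuts))
    return float(np.sum(lam * exposure))


def survival(t, cuts, lam, x, beta):
    """S(t | x) = exp(-H0(t) * exp(x'beta)) under the Cox model."""
    h0 = piecewise_cumulative_hazard(t, cuts, lam)
    return float(np.exp(-h0 * np.exp(np.dot(x, beta))))
```

For example, with hypothetical cut points [0, 6, 18] and hazards [0.01, 0.02], the cumulative baseline hazard at t = 10 is 0.01 × 6 + 0.02 × 4 = 0.14, so a subject with x'β = 0 has survival probability exp(−0.14).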

Fig. 6.

Estimated survival functions under the adaptive partition and bias correction approach for the SEER colon cancer data, with J0 = 5. The blue and red curves represent the arms with and without RP treatment, respectively, evaluated at the average year of diagnosis. The two black curves represent the pointwise 95% confidence intervals.

4.2. Analysis of Successful Exit of Venture Capital (VC) Investing

We investigate the U.S.-based VC-backed companies that received their first round of VC funding between 1946 and 2019. The data, from the VentureXpert database by Thomson Financial, comprise 33,268 companies after deleting all the companies with missing round dates or initial public offering (IPO) dates. We consider a successful exit (IPO) as the event and the logarithm of the number of days from the first round of VC funding to the IPO (event time) or to the last round of investment (censoring time) as the continuous survival time. Under this scenario, the total number of events is 3,717, with a high censoring rate of 88.83%. The covariates considered in our analysis are the number of funds received (NumFunds) and the cumulative amount of investment received at each round (CumAmounts), with the total number of rounds ranging from 1 to 46; CumAmounts is time-dependent. Both covariates are continuous and, for numerical stability, are standardized by subtracting 9.07 and 48308.36 and dividing by 6.70 and 245076.64, respectively.

We start with a subset sample size nk = 1000 (k ≤ 5) to obtain enough cumulative events for analysis and then set nk = 500 for all subsequent blocks (k = 6, …, 62). Similar to Section 4.1, we let J0 take the values 1, 3, and 5. For the adaptive partition approaches, because of the rare events, we allow the number of pieces to increase (by at most one piece at a time) at each update until the total number of pieces reaches Jmax = 15. We again set w_J^p = w_{J+1}^p = 0.5 and w_J^q = w_{J+1}^q = 1 since we do not know the underlying baseline hazard function.
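The block schedule above can be checked with a short sketch. The function name `block_sizes` and its arguments are hypothetical, and letting the final block hold whatever observations remain is an assumption, as the text does not state how the last block is formed.

```python
def block_sizes(n_total, block_size, warmup_blocks=0, warmup_size=0):
    """Split n_total observations into sequential blocks.

    The first `warmup_blocks` blocks have size `warmup_size`; later blocks have
    size `block_size`. ASSUMPTION: the final block holds the remainder.
    """
    sizes = []
    remaining = n_total
    while remaining > 0:
        target = warmup_size if len(sizes) < warmup_blocks else block_size
        take = min(target, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


vc_blocks = block_sizes(33268, 500, warmup_blocks=5, warmup_size=1000)
seer_blocks = block_sizes(315120, 2500)
```

Under this assumption the VC data yield 62 blocks and the SEER data in Section 4.1 yield 127 blocks, matching the block counts stated in the text.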

The Impact of J0

As shown in Table 4, all the estimates and standard errors tend to be closer to the estimate and standard error of MPLE as J0 increases. Among all the methods, the adaptive and bias correction approach with J0 = 5 yields the closest results to MPLE.

Table 4.

Estimates and standard errors for the VC data.

Method J0 Est (NumFunds) SE (NumFunds) Est (CumAmounts) SE (CumAmounts)
MPLE 0.13467 0.01504 0.03051 0.00336
CUEE 0.13547 0.01537 0.03559 0.00683
Fixed & BC 1 0.40769 0.01333 0.05122 0.00279
Fixed & BC 3 0.18673 0.01464 0.04002 0.00397
Fixed & BC 5 0.15157 0.01488 0.03454 0.00385
Adapt & BC 1 0.36694 0.01677 0.05214 0.00344
Adapt & BC 3 0.15680 0.01571 0.04016 0.00448
Adapt & BC 5 0.13361 0.01555 0.03387 0.00395

Comparison between MPLE, CUEE, and the Bias-corrected Estimators (J0 = 5)

Fixing J0 = 5, we compare the estimates and standard errors of MPLE, CUEE, and the “optimal” bias correction approaches. The bias correction approaches have estimates and standard errors similar to those of MPLE for both the time-independent covariate (NumFunds) and the time-dependent covariate (CumAmounts). CUEE also has estimates and standard errors similar to those of MPLE for the time-independent covariate, but a larger standard error for the time-dependent covariate.

We compare the estimated baseline hazard functions of the bias correction methods with the result based on the entire data (Figure 7). Similar to Section 4.1, the estimated baseline hazard function of the adaptive partition and bias correction approach (brown) nearly overlaps with the estimated baseline hazard function based on the entire data (black). The fixed partition approach (blue) fails to provide a satisfactory estimate of the baseline hazard function. Note that CUEE does not automatically provide estimates of the baseline hazard function.

Fig. 7.

Estimated baseline hazard functions for VC data for (i) all data (solid and black), (ii) fixed partition and bias correction (dotted and blue), and (iii) adaptive partition and bias correction (long dash and brown), with J0 = 5.

This example shows that the adaptive and bias correction approach performs as well as the full data approach based on the MPLE, even in the presence of time-dependent covariates and rare events.

5. Discussion

We developed online-updating algorithms and inference procedures for survival data under the proportional hazards assumption. Among the four approaches we proposed, the adaptive partition and bias correction approach is minimally storage-intensive and compares favorably with the existing method, which requires access to the entire data, for both the regression coefficients and the baseline hazard function. Our methods are also applicable to time-dependent covariates, which relaxes the proportional hazards assumption. In this paper, we focus on large-scale survival data where the event is induced by a single risk. One future direction would be to extend the current methods to other types of big survival data, including but not limited to competing-risks streaming data (Fine and Gray 1999), such as the SEER data, and recurrent event data (Pena et al. 2001), such as the Zillow real estate data.

Supplementary Material

Supp 1

Acknowledgement

We would like to thank the Editor, the Associate Editor, and the two anonymous reviewers for their very helpful comments and suggestions, which have led to a much improved version of the paper. Dr. M.-H. Chen’s research was partially supported by NIH grants #GM70335 and #P01CA142538.

References

  1. Ai M, Yu J, Zhang H, and Wang H (2018). Optimal subsampling algorithms for big data generalized linear models. arXiv preprint arXiv:1806.06761.
  2. Barbian MH and Assunção RM (2017). Spatial subsemble estimator for large geostatistical data. Spatial Statistics, 22, 68–88.
  3. Breslow NE (1972). Discussion of the paper by D.R. Cox. Journal of the Royal Statistical Society: Series B, 34, 216–217.
  4. Cappé O (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.
  5. Carothers NL (2000). Real analysis. Cambridge University Press.
  6. Chang X, Lin S-B, Wang Y, et al. (2017). Divide and conquer local average regression. Electronic Journal of Statistics, 11(1), 1326–1350.
  7. Chen X and Xie M. g. (2014). A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica, 24(4), 1655–1684.
  8. Cox DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society B, 34, 187–220.
  9. Cox DR (1975). Partial likelihood. Biometrika, 62(2), 269–276.
  10. Fine JP and Gray RJ (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496–509.
  11. Fleming TR and Harrington DP (2011). Counting Processes and Survival Analysis. John Wiley & Sons.
  12. Friedman M (1982). Piecewise exponential models for survival data with covariates. The Annals of Statistics, 10(1), 101–113.
  13. Giot P and Schwienbacher A (2007). IPOs, trade sales and liquidations: Modelling venture capital exits using survival analysis. Journal of Banking & Finance, 31(3), 679–702.
  14. Grambsch PM and Therneau TM (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81(3), 515–526.
  15. Holford TR (1976). Life tables with concomitant information. Biometrics, 32(3), 587–597.
  16. Ibrahim JG, Chen M-H, and Sinha D (2001). Bayesian Survival Analysis. Springer Science & Business Media.
  17. Johansen S (1983). An extension of Cox’s regression model. International Statistical Review/Revue Internationale de Statistique, 51(2), 165–174.
  18. Kalbfleisch JD and Prentice RL (2011). The statistical analysis of failure time data. John Wiley & Sons.
  19. Kawaguchi ES, Suchard MA, Liu Z, and Li G (2017). Scalable sparse Cox’s regression for large-scale survival data via broken adaptive ridge. arXiv preprint arXiv:1712.00561.
  20. Kleiner A, Talwalkar A, Sarkar P, and Jordan MI (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 795–816.
  21. Kong E and Xia Y (2019). On the efficiency of online approach to nonparametric smoothing of big data. Statistica Sinica, 29(1), 185–201.
  22. Liang F, Cheng Y, Song Q, Park J, and Yang P (2013). A resampling-based stochastic approximation method for analysis of large geostatistical data. Journal of the American Statistical Association, 108(501), 325–339.
  23. Lin N and Xi R (2011). Aggregated estimating equation estimation. Statistics and Its Interface, 4, 73–83.
  24. Luo L and Song PX-K (2020). Renewable estimation and incremental inference in generalized linear models with streaming data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1), 69–97.
  25. Ma P, Mahoney MW, and Yu B (2013). A statistical perspective on algorithmic leveraging. arXiv preprint arXiv:1306.5362.
  26. Maclaurin D and Adams RP (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. arXiv preprint arXiv:1403.5693.
  27. Mittal S, Madigan D, Burd RS, and Suchard MA (2013). High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis. Biostatistics, 15(2), 207–221.
  28. Neiswanger W, Wang C, and Xing E (2013). Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780.
  29. Pena EA, Strawderman RL, and Hollander M (2001). Nonparametric estimation with recurrent event data. Journal of the American Statistical Association, 96(456), 1299–1315.
  30. Plank SB, DeLuca S, and Estacion A (2008). High school dropout and the role of career and technical education: A survival analysis of surviving high school. Sociology of Education, 81(4), 345–370.
  31. Schifano ED, Wu J, Wang C, Yan J, and Chen M-H (2016). Online updating of statistical inference in the big data setting. Technometrics, 58(3), 393–403.
  32. Song Q and Liang F (2015). A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(5), 947–972.
  33. Wang C, Chen M-H, Schifano E, Wu J, and Yan J (2016). Statistical methods and computing for big data. Statistics and Its Interface, 9(4), 399–414.
  34. Wang C, Chen M-H, Wu J, Yan J, Zhang Y, and Schifano E (2018a). Online updating method with new variables for big data streams. Canadian Journal of Statistics, 46(1), 123–146.
  35. Wang H, Yang M, and Stufken J (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association, 114(525), 393–405.
  36. Wang Y, Palmer N, Di Q, Schwartz J, Kohane I, and Cai T (2018b). A fast divide-and-conquer sparse Cox regression. Forthcoming in Biostatistics.
  37. Whittemore AS and Keller JB (1986). Survival estimation using splines. Biometrics, 42(3), 495–506.
  38. Xue Y, Wang H, Yan J, and Schifano ED (2020). An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics, 76(1), 171–182.
  39. Zillow (2016). Zillow Transitions to Streaming Data Architecture. https://www.zillow.com/data-science/streaming-data-architecture.
