Online Updating of Survival Analysis

Jing Wu; Ming-Hui Chen; Elizabeth D Schifano; Jun Yan

doi:10.1080/10618600.2020.1870481

Author manuscript; available in PMC: 2022 Mar 11.

Published in final edited form as: J Comput Graph Stat. 2021 Mar 8;30(4):1209–1223. doi: 10.1080/10618600.2020.1870481


PMCID: PMC8916746 NIHMSID: NIHMS1722634 PMID: 35280977

Abstract

When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible since they require access to all observations at each accumulation point. We develop online updating methods for carrying out survival analysis under the Cox proportional hazards model in an online-update framework. Our methods are also applicable with time-dependent covariates. Specifically, we propose online-updating estimators as well as their standard errors for both the regression coefficients and the baseline hazard function. Extensive simulation studies are conducted to investigate the empirical performance of the proposed estimators. A large colon cancer data set from the Surveillance, Epidemiology, and End Results (SEER) program and a large venture capital (VC) data set with time-dependent covariates are analyzed to demonstrate the utility of the proposed methodologies.

Keywords: Cox model, Data compression, Piecewise constant baseline hazard, SEER, Streaming Survival Data

1. Introduction

Survival analysis, or the analysis of time-to-event data (Kalbfleisch and Prentice 2011), has been widely applied in diverse fields such as biostatistics, economics, education, and sociology, among others (e.g., Ibrahim et al. 2001; Giot and Schwienbacher 2007; Plank et al. 2008). The advancement in computer technology has made possible the collection of “big survival data”, which brings opportunities as well as challenges towards new discoveries since most of the traditional survival analysis methods become computationally infeasible in the presence of large-scale survival data. For example, the Cox maximum partial likelihood estimation (MPLE) approach (e.g., Cox 1972, 1975), which has long been used for survival analysis, involves summations over risk sets requiring access to all observations, and is thus computationally challenging.

The modern statistical methodologies for big data can be roughly grouped into three categories (Wang et al. 2016): resampling-based (e.g., Wang et al. 2019; Ai et al. 2018; Kleiner et al. 2014; Maclaurin and Adams 2014; Ma et al. 2013; Liang et al. 2013), divide-and-conquer (e.g., Barbian and Assunção 2017; Chang et al. 2017; Song and Liang 2015; Chen and Xie 2014; Neiswanger et al. 2013; Lin and Xi 2011), and online-updating (e.g., Luo and Song 2020; Kong and Xia 2019; Wang et al. 2018a; Schifano et al. 2016). Recent developments in advanced survival analysis have focused on high-dimensional survival data (e.g., Kawaguchi et al. 2017; Mittal et al. 2013), while less attention has been paid to survival data with huge sample size as we focus here. Wang et al. (2018b) proposed a divide-and-conquer algorithm to handle high-dimensional and huge sample-size survival data using the Cox model. They first obtained a maximum partial likelihood estimator based on a subset, then updated the estimator via one-step linear approximations based on the entire data, and finally obtained the LASSO penalized estimator by applying a least-square approximation to the partial likelihood. Xue et al. (2020) proposed a cumulatively updated estimating equation (CUEE) estimator for the regression coefficients under the online-updating setting. Notably, however, none of these works provide an estimator for the baseline hazard function, which is essential for better understanding the survival process and for prediction. Furthermore, the literature on online-updating in the streaming survival data setting, where survival data arrive sequentially in large chunks and the full access to the entire data is infeasible, is still sparse.

We develop new methodologies to carry out survival analysis in the streaming survival data setting, which is not uncommon in real life. The Surveillance, Epidemiology, and End Results (SEER) program, for example, has been updating its database on cancer cases throughout the United States annually since 1973, to better understand the survival of cancer patients and reduce the cancer burden of the society. In the venture capital (VC) investing industry starting from 1946, time to successful exits, such as initial public offerings (IPOs) of the funded companies, is of significant importance for both VC investors and the companies. Real estate companies such as Zillow (Zillow 2016) also receive huge amounts of streaming data every second from various public sources, where time on market until a house is sold is of huge practical interest. Under such settings, most of the conventional estimation methods for survival analysis are computationally challenging since they require access to immense amounts of data. To overcome such computational challenges and inspired by the observations that (i) Cox partial likelihood function can be approximated by the likelihood function of piecewise exponential model (Johansen 1983) and that (ii) the maximum likelihood estimators of piecewise exponential model are consistent under mild conditions for big data (Friedman 1982), we propose four online-updating methods for survival analysis in the Cox proportional hazards model framework by assuming a piecewise constant baseline hazard function. By carrying out analysis in a parametric manner, we are able to estimate the baseline hazard function simultaneously with the regression coefficients, in contrast to the coefficients alone as in Xue et al. (2020). Furthermore, our methods, with minimal storage requirement, are computationally efficient and allow for online-updating of estimation and inference for both the regression coefficients and the baseline hazard function. 
Other novelties of the proposed methods include a characterization of crucial but nontrivial conditions on the expansion matrices P and Q to correct the bias under the adaptive partition approach, and flexibility in including time-dependent covariates, which relaxes the proportional hazards assumption.

Derivations of the formulae for the online-updating estimators and the standard errors for both regression coefficients and baseline hazards are provided in detail. Extensive simulation studies show that the proposed methods are competitive in comparison with the method using the entire data in terms of bias and standard errors for both the regression coefficients and baseline hazards. The analyses of the SEER colon cancer data and the VC data further demonstrate that the estimates under the proposed methods are similar to estimates obtained using the entire data simultaneously.

The remainder of this article is organized as follows. In Section 2, we briefly review the Cox proportional hazards model with piecewise constant baseline hazard function. We then propose four online-updating methods to carry out survival analysis in the streaming survival data setting, with the derivation of the algorithms and estimators presented in detail. We report extensive simulation studies in Section 3 and the real data analyses in Section 4. A discussion concludes in Section 5.

2. Online Updating Algorithms and Inference

2.1. Notation and Preliminaries

Suppose there are N independent observations D = {(t_i, δ_i, x_i), i = 1,2, …, N} of interest, where t_i is the observed time (either censoring or event time), δ_i is the indicator function with δ_i = 1 representing the event and δ_i = 0 indicating censored, and x_i is the p × 1-dimensional baseline covariate vector. Write t = (t₁, t₂, …, t_N)^⊤, δ = (δ₁, δ₂, …, δ_N)^⊤, and X = (x₁, x₂, …, x_N)^⊤.

The logarithm of the partial likelihood function of Cox model is given by

ℓ (β ∣ D) = \sum_{i = 1}^{N} δ_{i} log {\frac{exp (x_{i}^{⊤} β)}{\sum_{j \in R (t_{i})} exp (x_{j}^{⊤} β)}},

(2.1)

where $R (t) = \{ℓ : t_{ℓ} \geq t\}$ is the set of subjects at risk at time t and β is a p × 1-dimensional vector of regression coefficients. Unlike in Schifano et al. (2016), ℓ(β | D) in (2.1) cannot be written as the summation of independent partial likelihood functions since the term $\sum_{j \in R (t_{i})} exp (x_{j}^{⊤} β)$ involved in each function depends on the entire data, for any i. Thus, this approach would require full access to the entire data at each accumulation point or data block, which is not feasible under the streaming survival data setting. In addition, the MPLE approach does not allow us to estimate the baseline hazard function, which is essential if one is interested in prediction.

To address these problems, we consider a proportional hazards model with piecewise constant baseline hazard function. Note that the piecewise constant hazard is not a strong assumption since any continuous function on a closed interval can be uniformly approximated by a step (piecewise constant) function (Carothers 2000). In fact, the piecewise constant hazard allows us to approximate reasonably well almost any baseline hazard if the partition is fine enough (relative to the true baseline hazard). Assume we partition [0, ∞) into J intervals (0 = a₀ < a₁ < … < a_J = ∞). The piecewise constant hazard function is then given by

λ_{i} (t) = \sum_{j = 1}^{J} λ_{j} 1_{[a_{j - 1}, a_{j})} (t) exp (x_{i}^{⊤} β),

(2.2)

where {λ_j | j = 1, …, J} are constant, λ = (λ₁, …, λ_J)^⊤, and β is a p × 1-dimensional vector of regression coefficients corresponding to covariates x_i. We write the cumulative piecewise linear hazard function as follows,

Λ_{i} (t) = \sum_{j = 1}^{J} λ_{j} Δ_{j} (t) exp (x_{i}^{⊤} β), where Δ_{j} (t) = \begin{cases} 0 & t < a_{j - 1} \\ t - a_{j - 1} & a_{j - 1} \leq t < a_{j} \\ a_{j} - a_{j - 1} & t \geq a_{j} \end{cases} .

(2.3)

After some algebra, the logarithm of likelihood function for model (2.2) is given by

ℓ (β, λ ∣ D) = \sum_{i = 1}^{N} \{ δ_{i} \sum_{j = 1}^{J} log λ_{j} 1_{[a_{j - 1}, a_{j})} (t_{i}) + δ_{i} x_{i}^{⊤} β - \sum_{j = 1}^{J} λ_{j} Δ_{j} (t_{i}) exp (x_{i}^{⊤} β) \} .

(2.4)

Under this formulation, ℓ(β, λ | D) can be written as the summation of independent log-likelihood contributions, one per observation and hence one per data block, and thus the online-updating algorithm idea can be naturally applied.

Note that we can also write (2.4) as

ℓ (β, λ ∣ D) = \sum_{j = 1}^{J} d_{j} log λ_{j} + \sum_{i = 1}^{N} δ_{i} x_{i}^{⊤} β - \sum_{j = 1}^{J} λ_{j} \{ \sum_{i = 1}^{N} Δ_{j} (t_{i}) exp (x_{i}^{⊤} β) \},

(2.5)

where $d_{j} = \sum_{i = 1}^{N} δ_{i} 1_{[a_{j - 1}, a_{j})} (t_{i})$ .
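As an illustration only, the quantities in (2.3) and (2.5) could be computed as in the following NumPy sketch. The function names (`exposure`, `event_counts`, `loglik`) and the convention that `cuts` stores the finite cut points a₀ = 0 < a₁ < … < a_{J−1} are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def exposure(t, cuts):
    """Delta_j(t_i) from (2.3): time observation i spends in [a_{j-1}, a_j).

    `cuts` holds a_0 = 0 < a_1 < ... < a_{J-1}; the last interval is
    [a_{J-1}, inf). Returns an (N, J) array.
    """
    lower = np.asarray(cuts, dtype=float)        # a_{j-1}
    upper = np.append(lower[1:], np.inf)         # a_j
    t = np.asarray(t, dtype=float)[:, None]
    return np.clip(np.minimum(t, upper) - lower, 0.0, None)

def event_counts(t, delta, cuts):
    """d_j: number of events whose observed time falls in interval j."""
    idx = np.searchsorted(cuts, t, side="right") - 1   # interval index of t_i
    return np.bincount(idx, weights=delta, minlength=len(cuts))

def loglik(beta, lam, t, delta, X, cuts):
    """Log-likelihood in the aggregated form (2.5)."""
    D = exposure(t, cuts)
    d = event_counts(t, delta, cuts)
    eta = X @ beta                                # x_i' beta
    return d @ np.log(lam) + delta @ eta - lam @ (D.T @ np.exp(eta))
```

Because each row of the exposure matrix sums to the observed time t_i, a quick sanity check is `exposure(t, cuts).sum(axis=1)` against `t`.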

Let $θ = {(β^{⊤}, λ_{J}^{⊤})}^{⊤}$ denote the collection of all the parameters and M(θ) the score function, which is the first-order partial derivative of the logarithm of the likelihood function in (2.5). Let $\hat{θ} = {({\hat{β}}^{⊤}, {\hat{λ}}_{J}^{⊤})}^{⊤}$ denote the solution to the score equation

M (θ) = [\begin{matrix} \sum_{i = 1}^{N} δ_{i} x_{i 1} - \sum_{j = 1}^{J} λ_{j} \sum_{i = 1}^{N} Δ_{j} (t_{i}) exp (x_{i}^{⊤} β) x_{i 1} \\ ⋮ \\ \sum_{i = 1}^{N} δ_{i} x_{i p} - \sum_{j = 1}^{J} λ_{j} \sum_{i = 1}^{N} Δ_{j} (t_{i}) exp (x_{i}^{⊤} β) x_{i p} \\ \frac{d_{1}}{λ_{1}} - \sum_{i = 1}^{N} Δ_{1} (t_{i}) exp (x_{i}^{⊤} β) \\ ⋮ \\ \frac{d_{J}}{λ_{J}} - \sum_{i = 1}^{N} Δ_{J} (t_{i}) exp (x_{i}^{⊤} β) \end{matrix}] = 0 .

(2.6)

After taking the second-order partial derivatives of the log-likelihood function in (2.5), the elements of the negated (p + J) × (p + J) Hessian matrix H = (H_{i,j}) are given by

H_{r, s} = - \frac{\partial^{2} ℓ}{\partial β_{r} \partial β_{s}} = \sum_{j = 1}^{J} λ_{j} {\sum_{i = 1}^{N} Δ_{j} (t_{i}) x_{i r} x_{i s} exp (x_{i}^{⊤} β)},

H_{p + m, r} = H_{r, p + m} = - \frac{\partial^{2} ℓ}{\partial λ_{m} \partial β_{r}} = \sum_{i = 1}^{N} Δ_{m} (t_{i}) x_{i r} exp (x_{i}^{⊤} β),

(2.7)

H_{p + m, p + n} = - \frac{\partial^{2} ℓ}{\partial λ_{m} \partial λ_{n}} = 1_{(m = n)} \frac{d_{m}}{λ_{m}^{2}},

for r, s = 1, …, p and m, n = 1, …, J.
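The score (2.6) and the negated Hessian (2.7) translate directly into matrix operations. The sketch below is one possible way to evaluate them; it assumes the N × J exposure matrix `D` with entries Δ_j(t_i) and the event counts `d` have already been computed, and the function name `score_and_hessian` is hypothetical.

```python
import numpy as np

def score_and_hessian(theta, D, d, delta, X):
    """Score (2.6) and negated Hessian (2.7) at theta = (beta, lambda).

    D: (N, J) exposures Delta_j(t_i); d: (J,) event counts;
    delta: (N,) event indicators; X: (N, p) covariates.
    """
    p = X.shape[1]
    beta, lam = theta[:p], theta[p:]
    E = D * np.exp(X @ beta)[:, None]      # Delta_j(t_i) exp(x_i' beta)
    cumhaz = E @ lam                       # Lambda_i(t_i) as in (2.3)

    score = np.concatenate([X.T @ (delta - cumhaz),
                            d / lam - E.sum(axis=0)])

    H = np.empty((p + lam.size, p + lam.size))
    H[:p, :p] = X.T @ (X * cumhaz[:, None])   # beta-beta block
    H[:p, p:] = X.T @ E                       # beta-lambda block
    H[p:, :p] = H[:p, p:].T
    H[p:, p:] = np.diag(d / lam**2)           # lambda-lambda block (diagonal)
    return score, H
```

A block-level estimate could then be obtained by iterating Newton-type steps on this pair until the score is numerically zero; the choice of solver is left open here.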

2.2. Time-dependent Covariates

The above results can also be extended to non-proportional hazards models with time-dependent covariates. Let z_i(t) denote the time-dependent covariates and γ be a q × 1-dimensional vector of regression coefficients. Without loss of generality (WLOG), we assume 0 = b_i0 < b_i1 < … < b_iL = ∞ and z_i(t) = z_iℓ for t ∈ [b_iℓ−1, b_iℓ), which is equivalent to

z_{i} (t) = \sum_{ℓ = 1}^{L} z_{i ℓ} 1_{[b_{i ℓ - 1}, b_{i ℓ})} (t) .

The corresponding hazard function is given by

λ_{i} (t) = \sum_{j = 1}^{J} λ_{j} 1_{[a_{j - 1}, a_{j})} (t) exp (x_{i}^{⊤} β + z_{i} {(t)}^{⊤} γ) = \sum_{j = 1}^{J} λ_{j} 1_{[a_{j - 1}, a_{j})} (t) exp (\sum_{ℓ = 1}^{L} z_{i ℓ}^{⊤} γ 1_{[b_{i ℓ - 1}, b_{i ℓ})} (t)) exp (x_{i}^{⊤} β),

and the cumulative hazard function can be simplified as follows

Λ_{i} (t) = \int_{0}^{t} \sum_{j = 1}^{J} λ_{j} 1_{[a_{j - 1}, a_{j})} (u) exp (\sum_{ℓ = 1}^{L} z_{i ℓ}^{⊤} γ 1_{[b_{i ℓ - 1}, b_{i ℓ})} (u)) exp (x_{i}^{⊤} β) d u = \int_{0}^{t} \sum_{j = 1}^{J} \sum_{ℓ = 1}^{L} λ_{j} 1_{[a_{j - 1}, a_{j}) \cap [b_{i ℓ - 1}, b_{i ℓ})} (u) exp (\sum_{ℓ = 1}^{L} z_{i ℓ}^{⊤} γ 1_{[b_{i ℓ - 1}, b_{i ℓ})} (u)) d u exp (x_{i}^{⊤} β) .

Let Ω_ijℓ denote the set [a_j−1, a_j) ⋂ [b_iℓ−1, b_iℓ), j = 1, …, J, ℓ = 1, …, L. Note that the Ω_ijℓ’s are disjoint sets and can be empty. Additionally, $\underset{j, ℓ}{\cup} Ω_{i j ℓ} = [0, \infty)$ .

Then

Λ_{i} (t) = \sum_{j = 1}^{J} λ_{j} \sum_{ℓ = 1}^{L} Δ_{i j ℓ} (t) exp (z_{i ℓ}^{⊤} γ) exp (x_{i}^{⊤} β),

(2.8)

where

Δ_{i j ℓ} (t) = \begin{cases} 0 & Ω_{i j ℓ} = \emptyset \text{ or } t < L (Ω_{i j ℓ}) \\ t - L (Ω_{i j ℓ}) & L (Ω_{i j ℓ}) \leq t < U (Ω_{i j ℓ}) \\ U (Ω_{i j ℓ}) - L (Ω_{i j ℓ}) & t \geq U (Ω_{i j ℓ}), \end{cases}

and L(Ω_ijℓ) and U(Ω_ijℓ) are, respectively, the lower and upper bounds of the interval Ω_ijℓ. The logarithm of the likelihood is given by

ℓ (β, γ, λ ∣ D) = \sum_{j = 1}^{J} d_{j} log λ_{j} + \sum_{i = 1}^{N} δ_{i} (x_{i}^{⊤} β + z_{i}^{* ⊤} γ) - \sum_{j = 1}^{J} λ_{j} \{ \sum_{i = 1}^{N} \sum_{ℓ = 1}^{L} Δ_{i j ℓ} (t_{i}) exp (x_{i}^{⊤} β + z_{i ℓ}^{⊤} γ) \},

(2.9)

where $z_{i}^{*} = \sum_{ℓ = 1}^{L} z_{i ℓ} 1_{[b_{i ℓ - 1}, b_{i ℓ})} (t_{i})$ . Elements of the negated (p + q + J) × (p + q + J) Hessian matrix H = (H_{i,j}) are given by

H_{r, s} = \sum_{j = 1}^{J} λ_{j} {\sum_{i = 1}^{N} \sum_{ℓ = 1}^{L} Δ_{i j ℓ} (t_{i}) x_{i ℓ r}^{*} x_{i ℓ s}^{*} exp (x_{i}^{⊤} β + z_{i ℓ}^{⊤} γ)},

H_{p + q + m, r} = H_{r, p + q + m} = \sum_{i = 1}^{N} \sum_{ℓ = 1}^{L} Δ_{i m ℓ} (t_{i}) x_{i ℓ r}^{*} exp (x_{i}^{⊤} β + {z_{i ℓ}}^{⊤} γ),

(2.10)

H_{p + q + m, p + q + n} = 1_{(m = n)} \frac{d_{m}}{λ_{m}^{2}},

where $x_{i ℓ}^{*} = {(x_{i}^{⊤}, z_{i ℓ}^{⊤})}^{⊤}$ , for i = 1, …, N, r, s = 1, …, p + q, and m, n = 1, …, J.

The logarithm of the likelihood function in (2.9) and the negated Hessian matrix in (2.10) with time-dependent covariates have the same formats as the corresponding functions with only time-independent covariates in (2.5) and (2.7). Therefore, the following proposed methods with only time-independent covariates can be directly applied to the online-updating of survival analysis with time-dependent covariates.
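To make the exposure terms Δ_ijℓ(t) and the cumulative hazard (2.8) concrete, the following sketch evaluates them for a single subject. The function names and the argument layout (`a_cuts` for the baseline partition, `b_cuts` for the subject's covariate change points, `Z` as an L × q array of covariate values z_iℓ) are assumptions made for this example.

```python
import numpy as np

def exposure_td(t, a_cuts, b_cuts):
    """Delta_{ijl}(t): time spent in [a_{j-1}, a_j) ∩ [b_{l-1}, b_l) up to t.

    a_cuts = (a_0, ..., a_{J-1}) and b_cuts = (b_0, ..., b_{L-1}), both
    starting at 0; the last interval of each partition extends to infinity.
    Returns a (J, L) array; empty intersections contribute 0.
    """
    a_lo = np.asarray(a_cuts, dtype=float)
    a_hi = np.append(a_lo[1:], np.inf)
    b_lo = np.asarray(b_cuts, dtype=float)
    b_hi = np.append(b_lo[1:], np.inf)
    lo = np.maximum(a_lo[:, None], b_lo[None, :])   # L(Omega_ijl)
    hi = np.minimum(a_hi[:, None], b_hi[None, :])   # U(Omega_ijl)
    return np.clip(np.minimum(t, hi) - lo, 0.0, None)

def cumhaz_td(t, lam, beta, gamma, x, Z, a_cuts, b_cuts):
    """Cumulative hazard (2.8) for one subject with covariate path Z (L x q)."""
    D = exposure_td(t, a_cuts, b_cuts)
    return float(lam @ D @ np.exp(Z @ gamma) * np.exp(x @ beta))
```

With γ = 0 the result reduces to the time-independent cumulative hazard in (2.3), which gives a simple consistency check.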

Remark 2.2.1. A more general form of (2.9) can be written as

ℓ (β, γ, λ ∣ D) = \sum_{j = 1}^{J} d_{j} log λ_{j} + \sum_{i = 1}^{N} δ_{i} (x_{i}^{⊤} β + z_{i} {(t_{i})}^{⊤} γ) - \sum_{j = 1}^{J} λ_{j} {\sum_{i = 1}^{N} \int_{0}^{t_{i}} 1_{[a_{j - 1}, a_{j})} (u) exp (z_{i} {(u)}^{⊤} γ) d u exp (x_{i}^{⊤} β)},

(2.11)

which can be approximated by the Riemann sum in (2.9) as L goes to infinity.

2.3. Fixed Partition

In the streaming data setting, we suppose that the N observations are not available all at once, but rather arrive in chunks from a large data stream. Suppose at each accumulation point k, for the n_k observations, we observe t_k, δ_k, and X_k, which are the n_k-dimensional vector of observed times, the n_k-dimensional vector of event indicators, and the n_k × p matrix of baseline covariates, respectively, for k = 1, …, K such that $t = {(t_{1}^{⊤}, t_{2}^{⊤}, \dots, t_{K}^{⊤})}^{⊤}$ , $δ = {(δ_{1}^{⊤}, δ_{2}^{⊤}, \dots, δ_{K}^{⊤})}^{⊤}$ , $X = {(X_{1}^{⊤}, X_{2}^{⊤}, \dots, X_{K}^{⊤})}^{⊤}$ .

For now, we let the partition for the piecewise hazard function be fixed through the entire updating process. WLOG, we assume each interval has at least one event for each block of data. Otherwise, for that particular block, we temporarily merge the consecutive intervals to ensure each new wider interval has at least one event. After that, we still return to the pre-specified fixed partition and set the constant hazard estimates of each problematic original interval the same as the estimate of the corresponding combined interval.

Let $θ_{n_{k}, k} = {(β_{n_{k}, k}^{⊤}, λ_{n_{k}, k}^{⊤})}^{⊤}$ and $H_{n_{k}, k} = H_{n_{k}, k} (θ_{n_{k}, k})$ denote the current estimators of θ and H, which are obtained in a similar way as in (2.6) and (2.7) but based on the k^th subset. The online-updating estimator of θ based on the cumulative data D_k = {(t_ℓ, X_ℓ, δ_ℓ), ℓ = 1, …, k} is given by

θ_{k} = {(\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ})}^{- 1} (\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ} θ_{n_{ℓ}, ℓ}),

(2.12)

which is equivalent to

θ_{k} = {(H_{k - 1} + H_{n_{k}, k})}^{- 1} (H_{k - 1} θ_{k - 1} + H_{n_{k}, k} θ_{n_{k}, k}),

(2.13)

where θ₀ = 0, $H_{k} = \sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ}$ is the cumulative negated Hessian matrix, and H₀ = 0_p+J is a (p + J) × (p + J) matrix of zeros.

A natural variance estimator of θ_k is given by

V_{k} = {(\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ})}^{- 1} \sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ} V_{n_{ℓ}, ℓ} H_{n_{ℓ}, ℓ}^{⊤} {[{(\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ})}^{- 1}]}^{⊤},

(2.14)

where $V_{n_{k}, k} = {(H_{n_{k}, k})}^{- 1}$ is the variance estimator of $θ_{n_{k}, k}$ , from the subset k. Equation (2.14) can thus be simplified as

V_{k} = {(\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ})}^{- 1} .

(2.15)
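The recursion (2.13) and the variance (2.15) only require two running summaries: the cumulative negated Hessian and the cumulative product of each block's Hessian with its estimate. A minimal sketch follows, assuming each block supplies its own estimate and negated Hessian; the class name `OnlineFixedPartition` is hypothetical.

```python
import numpy as np

class OnlineFixedPartition:
    """Online update (2.13) with variance (2.15) under a fixed partition."""

    def __init__(self, dim):
        self.H = np.zeros((dim, dim))   # H_0 = 0
        self.Htheta = np.zeros(dim)     # equals H_{k-1} theta_{k-1}

    def update(self, theta_block, H_block):
        # Accumulate H_{n_k,k} theta_{n_k,k} and H_{n_k,k}.
        self.Htheta += H_block @ theta_block
        self.H += H_block
        theta_k = np.linalg.solve(self.H, self.Htheta)   # (2.13)
        V_k = np.linalg.inv(self.H)                      # (2.15)
        return theta_k, V_k
```

Storing H_{k−1}θ_{k−1} as a running sum rather than recomputing it is a design choice of this sketch; it yields the same value as the right-hand side of (2.12).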

Remark 2.3.1. A conventional online-updating algorithm for θ_k is given by

θ_{k} = γ_{k} θ_{k - 1} + (1 - γ_{k}) θ_{n_{k}, k},

where θ_k is a weighted average between the cumulative estimators θ_k−1 at last update and the current estimators $θ_{n_{k}, k}$ , with the scalar weight functions satisfying $\sum_{ℓ \geq 1} γ_{ℓ} = \infty$ and $\sum_{ℓ \geq 1} γ_{ℓ}^{2} < \infty$ . Cappé (2011), for example, proposed γ_k = k^−α, which only depends on k and α ∈ (0.5, 1]. Our proposed estimator in (2.13) is also a weighted average between θ_k−1 and $θ_{n_{k}, k}$ , but with different weight functions $(H_{n_{k}, k})$ . The second order bias of θ_k can be reduced with the carefully selected weight functions and the bias correction terms introduced in Sections 2.5 and 2.6.

2.4. Adaptive Partition

In the previous section, we select the partition based on the first block of data, and assume the partition is fixed during the entire online-updating procedure. However, this assumption may not be ideal as more and more data accumulate. For the following blocks of data, the number of events within each interval may vary tremendously. Some intervals may contain zero events (as mentioned in the previous section) while some intervals may contain an overwhelming number of events. Thus, instead of sticking with the initial partition, it would be desirable to allow for an adaptive partition, i.e., splitting the interval with too many events into subintervals. As more and more data arrive, the partition of the piecewise constant hazard function becomes finer and finer. With the increasingly fine partition, the fitted baseline hazard function should be able to capture the true baseline hazard function.

Assume for the (k − 1)^th block, we have J intervals, (p + J)-dimensional vector of the cumulative estimator θ_k−1, and (p + J) × (p + J)-dimensional matrix of the cumulative negated Hessian matrix H_k−1. For the k^th block, if each interval has similar event size (see Remark 2.4.4), we will continue using the same partition from the (k − 1)^th block. Otherwise, WLOG, assume the J^th interval has the maximum number of events and we thus partition this interval into two subintervals.

The estimator $θ_{n_{k}, k}$ of the current block is of length (p + J + 1), and the negated Hessian matrix $H_{n_{k}, k}$ of the current block is of dimension (p + J + 1) × (p + J + 1). However, θ_k−1 and $θ_{n_{k}, k}$ , as well as H_k−1 and $H_{n_{k}, k}$ , do not have the same dimensions and therefore are not additive. To resolve this problem, we need to expand both θ_k−1 and H_k−1. For this purpose, let the symbol * denote the expansion of a matrix or a vector, and denote by $λ_{J}^{*}$ and $λ_{J + 1}^{*}$ the corresponding unknown constants for the two new subintervals of the J^th interval at accumulation point k − 1. Since $λ_{J}^{*}$ and $λ_{J + 1}^{*}$ are unknown and are closely related to λ_J, we assume

λ_{J} = f (λ_{J}^{*}, λ_{J + 1}^{*}),

(2.16)

where f is a certain function. Further assume

λ_{J} = w_{J}^{p} λ_{J}^{*} + w_{J + 1}^{p} λ_{J + 1}^{*},

(2.17)

where $w_{J}^{p}$ and $w_{J + 1}^{p}$ are constants.

The expanded cumulative negated Hessian matrix $H_{k - 1}^{*}$ at block k − 1 is obtained by the chain rule,

- \frac{\partial^{2} ℓ}{\partial λ_{l}^{*} \partial λ_{m}} = - \frac{\partial^{2} ℓ}{\partial λ_{J} \partial λ_{m}} \frac{\partial λ_{J}}{\partial λ_{l}^{*}}, - \frac{\partial^{2} ℓ}{\partial λ_{l}^{*} \partial β_{r}} = - \frac{\partial^{2} ℓ}{\partial λ_{J} \partial β_{r}} \frac{\partial λ_{J}}{\partial λ_{l}^{*}}, for l = J, J + 1.

(2.18)

To be specific, we introduce the (p + J + 1) × (p + J) expansion matrix P_k−1, where P_k−1(i, i) = 1, i = 1, …, (p + J − 1), $P_{k - 1} (p + J, p + J) = w_{J}^{p}$ , $P_{k - 1} (p + J + 1, p + J) = w_{J + 1}^{p}$ , and 0 elsewhere. Then,

H_{k - 1}^{*} = \begin{cases} H_{k - 1} & \text{if no interval added at block } k, \\ P_{k - 1} H_{k - 1} P_{k - 1}^{⊤} & \text{otherwise} . \end{cases}

Let $θ_{k - 1}^{*}$ and $V_{k - 1}^{*}$ denote the expanded cumulative estimator for θ and the corresponding variance, respectively. We further introduce the (p + J + 1) × (p + J) expansion matrix Q_k−1, where Q_k−1(i, i) = 1, i = 1, …, (p + J − 1), $Q_{k - 1} (p + J, p + J) = w_{J}^{q}$ , $Q_{k - 1} (p + J + 1, p + J) = w_{J + 1}^{q}$ , and 0 elsewhere. We discuss the choice of constants $w_{J}^{q}$ and $w_{J + 1}^{q}$ , as well as $w_{J}^{p}$ and $w_{J + 1}^{p}$ , in Remark 2.4.1 and Section 2.6. We thus have

θ_{k - 1}^{*} = \begin{cases} θ_{k - 1} & \text{if no interval added at block } k, \\ Q_{k - 1} θ_{k - 1} & \text{otherwise}, \end{cases}

and

V_{k - 1}^{*} = \begin{cases} V_{k - 1} & \text{if no interval added at block } k, \\ Q_{k - 1} V_{k - 1} Q_{k - 1}^{⊤} & \text{otherwise} . \end{cases}

Finally, the online-updating estimator of θ based on cumulative data D_k and finer partition of piecewise baseline hazard function is given by

θ_{k} = {(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1} (H_{k - 1}^{*} θ_{k - 1}^{*} + H_{n_{k}, k} θ_{n_{k}, k}) .

(2.19)

An approximate variance estimator is given by

V_{k} = {(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1} (H_{k - 1}^{*} V_{k - 1}^{*} {H_{k - 1}^{*}}^{⊤} + H_{n_{k}, k}) {[{(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1}]}^{⊤} .

(2.20)

Remark 2.4.1. We further impose constraints on $w_{J}^{p}$ , $w_{J + 1}^{p}$ , $w_{J}^{q}$ , and $w_{J + 1}^{q}$ ( $w_{J}^{p} w_{J}^{q} + w_{J + 1}^{p} w_{J + 1}^{q} = 1$ and all are positive) to reduce the bias of the new estimator in Section 2.6. The choice of $w_{J}^{q}$ and $w_{J + 1}^{q}$ depends on the underlying baseline hazard. If the baseline hazard function is strictly increasing (decreasing), we set $0 < w_{J}^{q} < 1 < w_{J + 1}^{q}$ ( $0 < w_{J + 1}^{q} < 1 < w_{J}^{q}$ ). If the baseline hazard function is not strictly monotone or we do not know the true baseline hazard function, we set $w_{J}^{q} = w_{J + 1}^{q} = 1$ as in the simulation studies and real data analyses. To satisfy $w_{J}^{p} w_{J}^{q} + w_{J + 1}^{p} w_{J + 1}^{q} = 1$ , we further set $w_{J}^{p} = w_{J + 1}^{p} = 0.5$ .
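The expansion matrices P_k−1 and Q_k−1 share the same sparsity pattern and differ only in the two weights placed in the last column. The sketch below builds either matrix for the case where the J^th (last) interval is split; the helper name `expansion_matrix` is hypothetical, and the default weights in the usage lines are the values 0.5 and 1 quoted in Remark 2.4.1.

```python
import numpy as np

def expansion_matrix(p, J, w_first, w_second):
    """(p+J+1) x (p+J) expansion matrix for splitting the J-th interval.

    Identity on the first p+J-1 coordinates; the last column carries
    w_first in row p+J and w_second in row p+J+1 (1-based indexing).
    """
    M = np.zeros((p + J + 1, p + J))
    M[:p + J - 1, :p + J - 1] = np.eye(p + J - 1)
    M[p + J - 1, p + J - 1] = w_first
    M[p + J, p + J - 1] = w_second
    return M

# Weights from Remark 2.4.1: w^p = 0.5 for P and w^q = 1 for Q.
P = expansion_matrix(p=2, J=3, w_first=0.5, w_second=0.5)
Q = expansion_matrix(p=2, J=3, w_first=1.0, w_second=1.0)
```

With these weights, `P.T @ Q` is the identity of order p + J, and `Q @ theta` simply copies the old estimate of λ_J into both new subintervals. Note that `P @ H @ P.T` has rank at most p + J, so on its own it is singular; it becomes usable only after the current block's Hessian is added.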

Remark 2.4.2. More complicated baseline hazard functions will require more pieces in the partition (larger J) in order to guarantee the consistency of the estimators. However, the gain from increasing the number of pieces in the partition should be balanced with the issues of power and ease of computation (Holford 1976). Thus, we may stop increasing the number of pieces in the partition once J reaches a certain value J_max, which depends on the true baseline hazard function if given. If the true baseline hazard function is unknown, we may stop increasing J once the relative “distance” between the estimated baseline hazard functions at the previous and the current accumulation points is small enough, i.e., for a given small ϵ > 0, there exists k such that $sup_{t} | \frac{{\hat{Λ}}_{0_{k}} (t) - {\hat{Λ}}_{0_{k - 1}} (t)}{{\hat{Λ}}_{0_{k - 1}} (t)} | < ϵ$ , where $Λ_{0} (t) = \sum_{j = 1}^{J} λ_{j} 1_{[a_{j - 1}, a_{j})} (t)$ , and set the number of pieces at accumulation point k as J_max.

Remark 2.4.3. WLOG, we can split more than one interval at a given block k. However, if we increase J only by 1 at block k, then the cumulative negated Hessian matrix at the current block $(H_{k} = H_{k - 1}^{*} + H_{n_{k}, k})$ is most likely positive definite (p.d.). This can be shown by mathematical induction.

When k = 1, $H_{1} = H_{n_{1}, 1}$ , which is always p.d. Assume H_k−1 is p.d., and further assume for the k^th update, we partition the J^th interval into two subintervals. We write $H_{k - 1}^{*}$ in terms of block matrices $H_{k - 1}^{*} = (\begin{matrix} A_{11} & a_{12} \\ a_{12}^{⊤} & a_{22} \end{matrix})$ , where we know that the leading (p + J) × (p + J) principal submatrix A₁₁ is p.d. given that H_k−1 is p.d. Similarly, we write $H_{n_{k}, k}$ in terms of block matrices $H_{n_{k}, k} = (\begin{matrix} B_{11} & b_{12} \\ b_{12}^{⊤} & b_{22} \end{matrix})$ , where the leading (p + J) × (p + J) principal submatrix B₁₁ is at least semi-positive definite (s.p.d.) or even p.d. since $H_{n_{k}, k}$ is always (s.)p.d. We then have $H_{k} = (\begin{matrix} A_{11} + B_{11} & a_{12} + b_{12} \\ {(a_{12} + b_{12})}^{⊤} & a_{22} + b_{22} \end{matrix})$ , where A₁₁ + B₁₁ is p.d. After some elementary transformations which preserve the rank of the matrix, we have $(\begin{matrix} A_{11} + B_{11} & 0_{p + J} \\ 0_{p + J}^{⊤} & m \end{matrix})$ , where m = (a₂₂ + b₂₂) − (a₁₂ + b₁₂)^⊤ (A₁₁ + B₁₁)⁻¹ (a₁₂ + b₁₂) ≥ 0. H_k is thus p.d. if m ≠ 0. The result is further confirmed by both simulation studies and real data analyses.
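The scalar m in the argument above is the Schur complement of the leading block, so the condition can be checked numerically at each update. The following sketch is illustrative only; the helper names `schur_last` and `is_positive_definite` are assumptions of this example.

```python
import numpy as np

def schur_last(H):
    """m = a22 - a12' A11^{-1} a12 for H partitioned on its last row/column."""
    A11, a12, a22 = H[:-1, :-1], H[:-1, -1], H[-1, -1]
    return float(a22 - a12 @ np.linalg.solve(A11, a12))

def is_positive_definite(H):
    """A symmetric matrix is p.d. exactly when its Cholesky factorization exists."""
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False
```

Applied to the expanded matrix P H P^⊤ alone, `schur_last` returns a value that is zero up to rounding, which reflects its rank deficiency; after adding a p.d. block Hessian the value becomes strictly positive.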

Remark 2.4.4. For the k^th block of data, we partition the j_max^th interval into two subintervals, where $j_{max} = {argmax}_{j} \{ j ∣ d_{j} > r_{k} \cdot \frac{\sum_{ℓ = 1}^{J} d_{ℓ}}{J} \}$ with expansion rate r_k ≥ 1, for k = 1, …, K. In this paper, we set the expansion rate r_k = 1 throughout. The expansion rate, however, does not need to be the same throughout the updates. Similar to the idea of simulated annealing, we can set r_k relatively small at early stages (when k is small) to speed up the partition and quickly approximate the underlying baseline hazard function. Once the estimated baseline hazard function is relatively “stable” (see Remark 2.4.2), we then gradually increase r_k to slow down the partition.
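One way to read the selection rule in Remark 2.4.4 is: split the interval with the most events, provided its event count exceeds r_k times the average count per interval. The sketch below implements that reading; this interpretation, the function name `interval_to_split`, and the 0-based return index are assumptions of the example.

```python
import numpy as np

def interval_to_split(d, r_k=1.0):
    """Return the 0-based index of the interval to split, or None.

    d: event counts d_1, ..., d_J for the current block.
    The interval with the largest count is chosen only if that count
    exceeds r_k times the mean count per interval.
    """
    d = np.asarray(d, dtype=float)
    threshold = r_k * d.mean()
    j = int(np.argmax(d))
    return j if d[j] > threshold else None
```

For instance, `interval_to_split([3, 10, 2])` selects index 1, whereas perfectly balanced counts or a large r_k leave the partition unchanged.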

2.5. Fixed Partition and Bias Correction

Denote $- M_{n_{ℓ}, ℓ} (θ)$ as the negated score function for the current block, which is defined in the same way as the score function in Section 2.1. In order to reduce the finite-sample bias introduced by (2.13) where the total number of intervals of the piecewise hazard function is fixed, similar to Schifano et al. (2016), we consider the Taylor expansion of $- M_{n_{ℓ}, ℓ} (θ)$ around a vector ${\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}$ to be defined later. Then

- M_{n_{ℓ}, ℓ} (θ) = - M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) + [H_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ})] (θ - {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) + {\overset{ˇ}{R}}_{n_{ℓ}, ℓ}

(2.21)

with ${\overset{ˇ}{R}}_{n_{ℓ}, ℓ}$ denoting the remainder. Denote θ_k as the solution of

\sum_{ℓ = 1}^{k} - M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) + \sum_{ℓ = 1}^{k} [H_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ})] (θ - {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) = 0 .

(2.22)

Defining $H_{n_{ℓ}, ℓ} = [H_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ})]$ and $H_{k} = \sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ}$ , then we have

θ_{k} = \{ \sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ} \}^{- 1} \{ \sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ} {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ} + \sum_{ℓ = 1}^{k} M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) \},

(2.23)

which can be written sequentially as

θ_{k} = \{ H_{k - 1} + H_{n_{k}, k} \}^{- 1} \{ H_{k - 1} θ_{k - 1} + H_{n_{k}, k} {\overset{ˇ}{θ}}_{n_{k}, k} + M_{n_{k}, k} ({\overset{ˇ}{θ}}_{n_{k}, k}) \}

(2.24)

with H₀ = 0_p+J and θ₀ = 0.

We observe that

0 = - M_{n_{ℓ}, ℓ} (θ_{n_{ℓ}, ℓ}) \approx - M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) + H_{n_{ℓ}, ℓ} (θ_{n_{ℓ}, ℓ} - {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) .

Thus, we have $H_{n_{ℓ}, ℓ} {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ} + M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) \approx H_{n_{ℓ}, ℓ} θ_{n_{ℓ}, ℓ}$ . Using the above approximation, the variance formula is given by

V_{k} = {(H_{k - 1} + H_{n_{k}, k})}^{- 1} (\sum_{ℓ = 1}^{k} H_{n_{ℓ}, ℓ} V_{n_{ℓ}, ℓ} H_{n_{ℓ}, ℓ}^{⊤}) {[{(H_{k - 1} + H_{n_{k}, k})}^{- 1}]}^{⊤} = {(H_{k - 1} + H_{n_{k}, k})}^{- 1} (H_{k - 1} V_{k - 1} H_{k - 1}^{⊤} + H_{n_{k}, k} V_{n_{k}, k} H_{n_{k}, k}^{⊤}) {[{(H_{k - 1} + H_{n_{k}, k})}^{- 1}]}^{⊤} .

(2.25)

Remark 2.5.1. If we choose ${\overset{ˇ}{θ}}_{n_{k}, k} = θ_{n_{k}, k}$ , then θ_k in (2.23) reduces to the estimator in (2.13), and bias is not corrected. Ideally, we should choose the intermediary estimator in a small neighborhood of the true θ, which is unknown but can be best approximated by utilizing all the cumulative information. Thus, we consider the intermediary estimator for fixed number of intervals given by

{\overset{ˇ}{θ}}_{n_{k}, k} = {(H_{k - 1} + H_{n_{k}, k})}^{- 1} (H_{k - 1} θ_{k - 1} + H_{n_{k}, k} θ_{n_{k}, k}),

(2.26)

for k = 1, 2, …, with H₀ = 0_p+J and θ₀ = 0.
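The bias-corrected recursion (2.24) with the intermediary estimator (2.26) can be sketched as follows. The example assumes each block provides its own estimate together with a callable returning the block score and negated Hessian at any θ; it further assumes that the block Hessian entering (2.26) is evaluated at the block estimate, while the accumulated Hessians are evaluated at the intermediary estimators. The class name `OnlineBiasCorrected` and this interface are hypothetical.

```python
import numpy as np

class OnlineBiasCorrected:
    """Sketch of the bias-corrected online update (2.24) using (2.26)."""

    def __init__(self, dim):
        self.H = np.zeros((dim, dim))   # H_{k-1}, Hessians at intermediaries
        self.rhs = np.zeros(dim)        # H_{k-1} theta_{k-1}

    def update(self, theta_block, score_hess):
        # score_hess(theta) -> (M_{n_k,k}(theta), H_{n_k,k}(theta))
        _, H_hat = score_hess(theta_block)
        # Intermediary estimator (2.26).
        theta_check = np.linalg.solve(self.H + H_hat,
                                      self.rhs + H_hat @ theta_block)
        M_check, H_check = score_hess(theta_check)
        # Accumulate the terms in (2.23) and solve (2.24).
        self.rhs += H_check @ theta_check + M_check
        self.H += H_check
        return np.linalg.solve(self.H, self.rhs)
```

When the block score is exactly linear in θ, the correction term makes the update coincide with the solution of the pooled score equation, which is a convenient check of the bookkeeping.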

2.6. Adaptive Partition and Bias Correction

Following Section 2.5, we propose a new estimator that allows for the adaptive partition of the piecewise hazard function and has less bias. To make the presentation cleaner, assume the number of pieces of the hazard function is not increased until the k^th block, and that there are J pieces in the partition in block k − 1. Denote by $- M_{n_{ℓ}, ℓ} (θ^{p + J})$ the negated score function for block ℓ, where θ^p+J is a vector of length p + J, for ℓ = 1, 2, …, k − 1.

We start with (2.21) by summing over ℓ from one to k − 1:

- \sum_{ℓ = 1}^{k - 1} M_{n_{ℓ}, ℓ} (θ^{p + J}) = - \sum_{ℓ = 1}^{k - 1} \{ M_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) + H_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) {\overset{ˇ}{θ}}_{n_{ℓ}, ℓ} \} + \{ \sum_{ℓ = 1}^{k - 1} H_{n_{ℓ}, ℓ} ({\overset{ˇ}{θ}}_{n_{ℓ}, ℓ}) \} θ^{p + J} + \sum_{ℓ = 1}^{k - 1} {\overset{ˇ}{R}}_{n_{ℓ}, ℓ} .

Based on (2.23), we have

- \sum_{ℓ = 1}^{k - 1} M_{n_{ℓ}, ℓ} (θ^{p + J}) = - H_{k - 1} θ_{k - 1} + H_{k - 1} θ^{p + J} + \sum_{ℓ = 1}^{k - 1} {\overset{ˇ}{R}}_{n_{ℓ}, ℓ} .

(2.27)

Note that (2.27) still holds if we multiply both sides by P_k−1,

- P_{k - 1} \sum_{ℓ = 1}^{k - 1} M_{n_{ℓ}, ℓ} (θ^{p + J}) = - P_{k - 1} H_{k - 1} θ_{k - 1} + P_{k - 1} H_{k - 1} θ^{p + J} + P_{k - 1} \sum_{ℓ = 1}^{k - 1} {\overset{ˇ}{R}}_{n_{ℓ}, ℓ} .

(2.28)

We set P_k−1 and Q_k−1 as described in Section 2.4 such that $P_{k - 1}^{⊤} Q_{k - 1} = I_{p + J}$ . This is the same as putting constraints on $w_{J}^{p}$ , $w_{J + 1}^{p}$ , $w_{J}^{q}$ , and $w_{J + 1}^{q}$ in (2.17) such that $w_{J}^{p} w_{J}^{q} + w_{J + 1}^{p} w_{J + 1}^{q} = 1$ . Denote the expanded cumulative score function $P_{k - 1} \sum_{ℓ = 1}^{k - 1} M_{n_{ℓ}, ℓ} (θ^{p + J})$ as $M_{k - 1}^{*} (θ^{p + J + 1})$ , where $θ^{p + J + 1} = Q_{k - 1} θ^{p + J}$ . We thus have an equivalent representation of (2.28) as

- M_{k - 1}^{*} (θ^{p + J + 1}) = - {P_{k - 1} H_{k - 1} P_{k - 1}^{⊤}} Q_{k - 1} θ_{k - 1} + {P_{k - 1} H_{k - 1} P_{k - 1}^{⊤}} Q_{k - 1} θ^{p + J} + P_{k - 1} \sum_{ℓ = 1}^{k - 1} {\overset{ˇ}{R}}_{n_{ℓ}, ℓ} .

(2.29)
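The constraint $P_{k - 1}^{⊤} Q_{k - 1} = I_{p + J}$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `expansion_matrices`, the 0-based piece index `j`, and the row layout (the first p coordinates hold β, followed by the J hazard pieces) are hypothetical and are not taken from (2.17). It builds a pair of matrices that duplicate one hazard coordinate with weights w^p and w^q and verifies the identity.

```python
import numpy as np


def expansion_matrices(p, J, j, w_p=(0.5, 0.5), w_q=(1.0, 1.0)):
    """Build hypothetical P and Q of shape (p+J+1, p+J) that split hazard piece j (0-based)."""
    if abs(w_p[0] * w_q[0] + w_p[1] * w_q[1] - 1.0) > 1e-12:
        raise ValueError("need w_p[0]*w_q[0] + w_p[1]*w_q[1] == 1")
    if not 0 <= j < J:
        raise ValueError("j must index an existing piece")
    d = p + J
    split = p + j  # old coordinate of the piece being split (assumed layout)
    P = np.zeros((d + 1, d))
    Q = np.zeros((d + 1, d))
    for new in range(d + 1):
        # New coordinates split and split+1 both map back to the old coordinate split.
        old = new if new <= split else new - 1
        half = new - split  # 0 or 1 for the two new pieces, anything else otherwise
        P[new, old] = w_p[half] if half in (0, 1) else 1.0
        Q[new, old] = w_q[half] if half in (0, 1) else 1.0
    return P, Q


P, Q = expansion_matrices(p=3, J=5, j=2)
print(np.allclose(P.T @ Q, np.eye(8)))
```

With the default weights, Q copies the old hazard value into both new pieces, while P divides the accumulated information equally between them. Note that $P_{k - 1} H_{k - 1} P_{k - 1}^{⊤}$ built this way has rank p + J, so it is invertible only after the information from the new block is added.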

Additionally, for the k^th block, we have

- M_{n_{k}, k} (θ^{p + J + 1}) = - M_{n_{k}, k} ({\overset{ˇ}{θ}}_{n_{k}, k}) + H_{n_{k}, k} θ^{p + J + 1} - H_{n_{k}, k} {\overset{ˇ}{θ}}_{n_{k}, k} + {\overset{ˇ}{R}}_{n_{k}, k} .

(2.30)

Denote θ_k of length (p + J + 1) as the solution for the sum of (2.29) and (2.30), then we have

θ_{k} = {H_{k - 1}^{*} + H_{n_{k}, k}}^{- 1} {H_{k - 1}^{*} θ_{k - 1}^{*} + H_{n_{k}, k} {\overset{ˇ}{θ}}_{n_{k}, k} + M_{n_{k}, k} ({\overset{ˇ}{θ}}_{n_{k}, k})},

(2.31)

where

θ_{k - 1}^{*} = \begin{cases} θ_{k - 1} & \text{if no interval is added at block } k, \\ Q_{k - 1} θ_{k - 1} & \text{otherwise}, \end{cases}

and

H_{k - 1}^{*} = \begin{cases} H_{k - 1} & \text{if no interval is added at block } k, \\ P_{k - 1} H_{k - 1} P_{k - 1}^{⊤} & \text{otherwise}. \end{cases}

An approximate variance estimator of θ_k is given by

V_{k} = {(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1} (H_{k - 1}^{*} V_{k - 1}^{*} H_{k - 1}^{* ⊤} + H_{n_{k}, k} V_{n_{k}, k} H_{n_{k}, k}^{⊤}) {[{(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1}]}^{⊤},

(2.32)

where

V_{k - 1}^{*} = \begin{cases} V_{k - 1} & \text{if no interval is added at block } k, \\ Q_{k - 1} V_{k - 1} Q_{k - 1}^{⊤} & \text{otherwise}. \end{cases}
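As a minimal sketch of how one update step in (2.31) and (2.32) might be coded, the function below takes the cumulative quantities from block k − 1 and the block-k quantities and returns the updated estimate and variance. The function and argument names are hypothetical, the block-level inputs are assumed to have been computed elsewhere, and returning H*_{k−1} + H_{n_k,k} as the new cumulative matrix is an assumption rather than something stated in this section.

```python
import numpy as np


def online_update(theta_prev, H_prev, V_prev, theta_blk, H_blk, M_blk, V_blk, P=None, Q=None):
    """One step of (2.31)-(2.32); pass P and Q only when an interval is added at this block."""
    if P is not None and Q is not None:
        theta_star = Q @ theta_prev      # expanded previous estimate
        H_star = P @ H_prev @ P.T        # expanded cumulative matrix
        V_star = Q @ V_prev @ Q.T        # expanded variance
    else:
        theta_star, H_star, V_star = theta_prev, H_prev, V_prev
    A = H_star + H_blk
    rhs = H_star @ theta_star + H_blk @ theta_blk + M_blk
    theta_k = np.linalg.solve(A, rhs)    # equation (2.31)
    A_inv = np.linalg.inv(A)
    middle = H_star @ V_star @ H_star.T + H_blk @ V_blk @ H_blk.T
    V_k = A_inv @ middle @ A_inv.T       # equation (2.32)
    return theta_k, A, V_k               # A as the next cumulative H is an assumption


theta_k, H_k, V_k = online_update(
    np.array([1.0]), np.array([[2.0]]), np.array([[0.5]]),
    np.array([3.0]), np.array([[2.0]]), np.array([0.0]), np.array([[0.5]]),
)
print(theta_k, V_k)
```

In this scalar example the two blocks carry equal weight, so the update is the midpoint of the previous estimate and the block estimate.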

Remark 2.6.1. If we allow for an increasing number of intervals, ${\overset{ˇ}{θ}}_{n_{k}, k}$ in (2.26) becomes

{\overset{ˇ}{θ}}_{n_{k}, k} = {(H_{k - 1}^{*} + H_{n_{k}, k})}^{- 1} (H_{k - 1}^{*} θ_{k - 1}^{*} + H_{n_{k}, k} θ_{n_{k}, k}) .

Remark 2.6.2. We assume for simplicity of notation that n_k = n for all k = 1, 2, …, K. Let β₀ denote the true value of β in the multiplicative intensity model. Denote by β_n and β_N the MPLEs of the Cox model based on the n current observations and on all N observations, respectively. Under Conditions A-D of Friedman (1982) and P (t_i ≥ T_max) > 0 (Fleming and Harrington 2011) for i = 1, …, n, where T_max is a finite stopping time, we have, for each block, $β_{n, k} \overset{p}{\to} β_{0}$ and $‖ β_{n, k} - β_{n} ‖ \overset{p}{\to} 0$ .

It is well known that the estimated cumulative baseline hazard function $\sum_{j = 1}^{J} {\hat{λ}}_{n, k, j} Δ_{j} (t)$ is consistent under Conditions A-D of Friedman (1982) (Whittemore and Keller 1986). Furthermore, according to (2.6), $β_{n, k} \overset{p}{\to} β_{0}$ , the width of each interval goes to 0 (Condition B of Friedman (1982)), and by Chebyshev’s weak law of large numbers, ${\hat{λ}}_{n, k, j} = \frac{\sum_{i = 1}^{n} 1_{[a_{j - 1}, a_{j})} (t_{i}) δ_{i}}{\sum_{i = 1}^{n} Δ_{j} (t_{i}) exp (x_{i}^{⊤} β_{n, k})} \overset{p}{\to} λ_{0} (a_{j - 1})$ , for all j. Given that a_j is dense in (0, T_max], for any $t \in (0, T_{max}], \sum_{j = 1}^{J} {\hat{λ}}_{n, k, j} 1_{[a_{j - 1}, a_{j})} (t)$ converges in probability to λ₀(t).
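The closed-form expression for the piecewise hazard estimate above (events in a piece divided by the risk-weighted exposure in that piece) can be sketched as follows. This is an illustration only: the helper names are hypothetical, and Δ_j(t) is assumed to be the usual time spent in [a_{j−1}, a_j) up to t, which is how (2.3) is read here.

```python
import numpy as np


def exposure(times, cuts):
    """Assumed Delta_j(t): time spent in each piece [a_{j-1}, a_j) up to t; shape (n, J)."""
    t = np.asarray(times, dtype=float)[:, None]
    cuts = np.asarray(cuts, dtype=float)
    lower, upper = cuts[:-1][None, :], cuts[1:][None, :]
    return np.clip(np.minimum(t, upper) - lower, 0.0, None)


def piecewise_hazard(times, events, X, beta, cuts):
    """Events per piece divided by sum_i Delta_j(t_i) * exp(x_i' beta)."""
    times = np.asarray(times, dtype=float)
    cuts = np.asarray(cuts, dtype=float)
    J = len(cuts) - 1
    # Index of the piece [a_{j-1}, a_j) containing each observed time.
    piece = np.searchsorted(cuts, times, side="right") - 1
    d = np.bincount(piece, weights=np.asarray(events, dtype=float), minlength=J)
    risk = np.exp(np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float))
    denom = (exposure(times, cuts) * risk[:, None]).sum(axis=0)
    return d / denom


lam = piecewise_hazard([1.0, 2.0, 3.0], [1, 0, 1], np.zeros((3, 1)), [0.0], [0.0, 2.0, np.inf])
print(lam)
```

With β = 0 the estimate reduces to the familiar events-over-exposure rate within each piece.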

Remark 2.6.3. Both the simulation studies and the real data analyses show that, under standard regularity conditions (Conditions (2.1)–(2.6) of Fleming and Harrington (2011)), the online-updating estimator β_K in (2.31) has a convergence rate similar to that of the MPLE β_N based on the entire data, as the number of blocks K and the number of pieces J increase, but not too fast (Condition C of Friedman (1982)). Furthermore, the estimated baseline hazard function converges pointwise to the true baseline hazard function as K and J increase.

2.7. Cumulative Inference

With the advantage of online-updating estimators for the baseline hazard function, the proposed methods allow us to conduct cumulative statistical inference as a by-product. For example, comparisons between groups of survival rates at certain time points, as well as estimates of the entire survival curve are routinely reported in the clinical literature.

The cumulative estimated survival function at the k^th block is given by

{\hat{S}}_{k} (t ∣ x) = exp {- \sum_{j = 1}^{J} {\tilde{λ}}_{k, j} Δ_{k, j} (t) exp (x^{⊤} β_{k})},

where 0 = a_k0 < a_k1 < … < a_kJ = ∞ is the updated partition at k^th block and Δ_k,j(t) is defined in the same way as in (2.3) but corresponds to the new partition. By the delta method, the approximated variance estimator of ${\hat{S}}_{k} (t ∣ x)$ is given by $V ({\hat{S}}_{k} (t ∣ x)) = \nabla S_{k}^{⊤} H_{k} \nabla S_{k}$ , where

\nabla S_{k} = - {\hat{S}}_{k} (t ∣ x) exp (x^{⊤} β_{k}) {(\sum_{j = 1}^{J} {\tilde{λ}}_{k, j} Δ_{k, j} (t) x^{⊤}, Δ_{k, 1} (t), \dots, Δ_{k, J} (t))}^{⊤} .

The 100(1−α)% pointwise confidence interval for the survival function is thus given by

({\hat{S}}_{k} {(t ∣ x)}^{\frac{1}{ϕ (t)}}, {\hat{S}}_{k} {(t ∣ x)}^{ϕ (t)}),

where $ϕ (t) = exp {\frac{z_{1 - α / 2} \sqrt{V ({\hat{S}}_{k} (t ∣ x))}}{log [{\hat{S}}_{k} (t ∣ x)] S_{k} (t ∣ x)}}$ and z_1−α/2 is the (1 − α / 2) th quantile of the standard normal N(0, 1) distribution.
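The survival estimate, its delta-method variance, and the confidence interval above can be sketched in a few lines. The function below is an illustration under assumptions: the name `survival_with_ci` is hypothetical, the parameter vector is assumed to be ordered as (β, λ_1, …, λ_J), Δ_{k,j}(t) is assumed to be the time spent in piece j up to t, the matrix used in the quadratic form is passed in as `cov` by the caller, and the estimate Ŝ_k is used wherever the survival function appears in ϕ(t).

```python
import math
import numpy as np


def survival_with_ci(t, x, beta, lam, cuts, cov, z=1.959963984540054):
    """Return S_hat(t | x), its delta-method variance, and the (lower, upper) interval."""
    x, beta, lam, cuts = (np.asarray(a, dtype=float) for a in (x, beta, lam, cuts))
    delta = np.clip(np.minimum(t, cuts[1:]) - cuts[:-1], 0.0, None)  # assumed Delta_{k,j}(t)
    risk = math.exp(float(x @ beta))
    cum = float(lam @ delta)                 # cumulative baseline hazard at t
    S = math.exp(-cum * risk)
    # Gradient with respect to (beta, lambda_1, ..., lambda_J).
    grad = -S * risk * np.concatenate([cum * x, delta])
    var = float(grad @ np.asarray(cov, dtype=float) @ grad)
    phi = math.exp(z * math.sqrt(var) / (math.log(S) * S))
    return S, var, (S ** (1.0 / phi), S ** phi)


S, var, (lo, hi) = survival_with_ci(
    t=2.0, x=[0.0], beta=[0.0], lam=[0.1], cuts=[0.0, np.inf], cov=np.diag([0.01, 0.0001])
)
print(S, lo, hi)
```

Because log Ŝ_k is negative, ϕ(t) is below one, so the first endpoint is the lower limit and the interval always stays inside (0, 1). The function requires t > 0 so that log Ŝ_k is nonzero.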

3. Simulation Studies

3.1. Simulation I

In Simulation I (censoring rate ≈ 45%), we first investigate the impact of initial number of intervals J₀ on the performances of the four approaches: fixed/adaptive partition and bias/no bias correction estimators. We then focus on the “optimal” approaches: fixed/adaptive partition and bias correction, and compare them with MPLE estimator based on fitting the entire dataset simultaneously by SAS procedure PHREG and the cumulatively updated estimating equation (CUEE) estimator (Xue et al. 2020).

Simulation Setting

We generate B = 500 datasets of survival time t_i independently from a Cox proportional hazards model, for i = 1, …, N, with the baseline hazard function given by

h (t) = h_{0} (t) + 0.1 exp (- 0.35 t) sin (5 t) .

(3.1)

If we assume a Weibull distribution for h₀(t), then we have h(t) = 1.2t^0.2 exp(−2) + 0.1exp(−0.35t) sin(5t). For the linear predictor in the Cox model, let β = (1, 0.5, −2.0)^⊤, x_ki[1] ~ N(0, 1), x_ki[2] ~ Bernoulli(0.5), and x_ki[3] ~ Bernoulli(0.6) independently. After sampling the survival time for each subject, we generate their corresponding censoring time as min(T_max, Uniform(0.7 T_max, 1.5T_max)), where T_max = 10. Let the variable “event” be 1 if survival time is smaller than censoring time, and 0 otherwise. We set the total sample size N = 5,000,000, and the number of blocks K = 200 with n_k = 2,500. The average event rates are 15.7%, 17.5%, 9.0%, and 12.4% for arms with (x_ki[2], x_ki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.
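One way to draw a block of data from this setting is inverse-transform sampling on the cumulative baseline hazard. The sketch below is not the authors' FORTRAN code: the function names, the trapezoidal grid used to integrate h(t), and the grid size are assumptions made for illustration, while the hazard, covariate distributions, and censoring rule follow the description above.

```python
import numpy as np


def baseline_hazard(t):
    """h(t) = 1.2 t^0.2 exp(-2) + 0.1 exp(-0.35 t) sin(5 t), as in (3.1) with Weibull h0."""
    return 1.2 * t ** 0.2 * np.exp(-2.0) + 0.1 * np.exp(-0.35 * t) * np.sin(5.0 * t)


def simulate_block(n, beta=(1.0, 0.5, -2.0), t_max=10.0, rng=None, grid_size=20001):
    """Draw one block of (time, event, X); the grid-based inversion is an assumed shortcut."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.asarray(beta, dtype=float)
    X = np.column_stack([
        rng.normal(0.0, 1.0, n),
        rng.binomial(1, 0.5, n),
        rng.binomial(1, 0.6, n),
    ])
    # Cumulative baseline hazard on a grid (trapezoidal rule), up to the largest censoring time.
    grid = np.linspace(0.0, 1.5 * t_max, grid_size)
    h = baseline_hazard(grid)
    H0 = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(grid))])
    # Solve H0(T) * exp(x' beta) = E with E ~ Exp(1); times beyond the grid are clamped.
    target = rng.exponential(1.0, n) / np.exp(X @ beta)
    surv = np.interp(target, H0, grid)
    cens = np.minimum(t_max, rng.uniform(0.7 * t_max, 1.5 * t_max, n))
    event = (surv < cens).astype(int)
    return np.minimum(surv, cens), event, X


time, event, X = simulate_block(2500, rng=np.random.default_rng(1))
print(time.max(), event.mean())
```

Clamping at the end of the grid is harmless here because every censoring time is at most T_max, so any clamped survival time is recorded as censored.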

To examine the effect of the initial number of intervals J₀ on the performances of the four proposed estimators, we let J₀ vary from 1, 5, 10, to 15. Note that for the fixed partition estimators, J₀ will stay the same (at 1, 5, 10 or 15) throughout the updates. For the adaptive estimators, we partition the interval with maximum number of events into two subintervals for each update until the total number of intervals reaches 50 (J_max).
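The adaptive rule described above (split the interval with the most events until J_max is reached) can be sketched as follows. The text does not say where inside the chosen interval the new cut point is placed, so the sketch uses the median event time within that interval; this choice, the function name, and the decision to count events from whatever array the caller supplies are all assumptions.

```python
import numpy as np


def split_busiest_interval(cuts, event_times, j_max):
    """Return the cut points after splitting the piece with the most events, if allowed."""
    cuts = np.asarray(cuts, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    J = len(cuts) - 1
    if J >= j_max or event_times.size == 0:
        return cuts
    # Piece index of each event time, for pieces [a_{j-1}, a_j).
    piece = np.searchsorted(cuts, event_times, side="right") - 1
    counts = np.bincount(piece, minlength=J)
    busiest = int(np.argmax(counts))
    # Assumed split point: median event time inside the busiest piece.
    new_cut = float(np.median(event_times[piece == busiest]))
    if not (cuts[busiest] < new_cut < cuts[busiest + 1]):
        return cuts  # a degenerate split would create an empty piece
    return np.insert(cuts, busiest + 1, new_cut)


print(split_busiest_interval([0.0, np.inf], [1.0, 2.0, 3.0, 4.0], j_max=50))
```

Using a data-driven split point keeps the rule well defined even when the last interval is unbounded, where a midpoint would not exist.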

The Impact of J₀

In Table 1, we report the average of the bias (Bias), the average of the standard errors (ASE), the simulation errors (SE), the root of the mean squared error (RMSE), the coverage probability (CP) of the 95% confidence intervals, and the computation time in minutes. The MPLE has little bias since it is obtained based on the entire dataset. The computation time of MPLE is not reported because MPLE is computed in SAS, whereas CUEE and the proposed approaches are implemented in FORTRAN using IMSL subroutines with double-precision accuracy on an Intel i7-4770 machine with 16 GB of RAM running a GNU/Linux operating system; the computation times are therefore not comparable. As expected, the computation time of the fixed partition approaches increases as J₀ increases, while the computation time of the adaptive approaches is similar under different values of J₀. All proposed approaches had shorter computation times than CUEE.
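For readers who want to reproduce summaries of this kind, the sketch below computes the five accuracy measures from a set of replicate estimates. It reflects one common reading of the definitions: SE is taken to be the empirical standard deviation of the estimates across replications, and CP uses Wald intervals with the normal quantile. Both are assumptions, and the function name is hypothetical.

```python
import numpy as np


def summarize(estimates, std_errors, truth, z=1.959963984540054):
    """Bias, ASE, SE, RMSE, and CP of 95% Wald intervals over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    err = est - truth
    return {
        "Bias": err.mean(),
        "ASE": se.mean(),
        "SE": est.std(ddof=1),                    # assumed: empirical SD across replicates
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "CP": np.mean(np.abs(err) <= z * se),     # assumed: Wald interval coverage
    }


print(summarize([0.9, 1.1], [0.1, 0.1], truth=1.0))
```

Under these definitions RMSE² is approximately Bias² + SE², which agrees with the tabulated values: for example, a bias of −0.0621 with an SE of 0.0019 gives an RMSE of about 0.0621, matching the reported 0.0622 up to rounding.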

Table 1.

Estimates from 500 replications and computation time in minutes under the MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J₀, in Simulation I.

Variable	Method	J₀	Bias	ASE	SE	RMSE	CP	Time (min)
β ₁	MPLE	—	0.0001	0.0023	0.0022	0.0022	0.958	—
	CUEE	—	0.0000	0.0023	0.0022	0.0022	0.958	876.1
	Fixed & BC	1	−0.0621	0.0021	0.0019	0.0622	0.000	44.8
	Fixed & BC	5	−0.0085	0.0023	0.0043	0.0095	0.186	62.4
	Fixed & BC	10	−0.0027	0.0023	0.0023	0.0036	0.774	68.8
	Fixed & BC	15	−0.0013	0.0023	0.0023	0.0026	0.916	73.1
	Adapt & BC	1	−0.0021	0.0023	0.0063	0.0066	0.856	82.6
	Adapt & BC	5	0.0001	0.0023	0.0023	0.0023	0.962	103.3
	Adapt & BC	10	0.0002	0.0023	0.0022	0.0023	0.964	102.9
	Adapt & BC	15	0.0002	0.0023	0.0022	0.0023	0.962	104.6
β ₂	MPLE	—	0.0002	0.0039	0.0037	0.0037	0.960	—
	CUEE	—	0.0002	0.0039	0.0037	0.0037	0.958	—
	Fixed & BC	1	−0.0309	0.0039	0.0035	0.0311	0.000	—
	Fixed & BC	5	−0.0038	0.0039	0.0041	0.0056	0.834	—
	Fixed & BC	10	−0.0008	0.0039	0.0038	0.0038	0.954	—
	Fixed & BC	15	0.0001	0.0040	0.0037	0.0037	0.960	—
	Adapt & BC	1	0.0014	0.0039	0.0049	0.0051	0.904	—
	Adapt & BC	5	0.0024	0.0041	0.0038	0.0045	0.930	—
	Adapt & BC	10	0.0023	0.0041	0.0038	0.0044	0.932	—
	Adapt & BC	15	0.0021	0.0041	0.0037	0.0043	0.932	—
β ₃	MPLE	—	−0.0001	0.0044	0.0043	0.0043	0.954	—
	CUEE	—	0.0000	0.0044	0.0043	0.0043	0.954	—
	Fixed & BC	1	0.1202	0.0040	0.0037	0.1204	0.000	—
	Fixed & BC	5	0.0140	0.0044	0.0067	0.0155	0.198	—
	Fixed & BC	10	0.0039	0.0044	0.0044	0.0058	0.868	—
	Fixed & BC	15	0.0023	0.0044	0.0043	0.0049	0.926	—
	Adapt & BC	1	0.0061	0.0044	0.0121	0.0136	0.774	—
	Adapt & BC	5	0.0018	0.0045	0.0043	0.0046	0.942	—
	Adapt & BC	10	0.0015	0.0045	0.0043	0.0046	0.944	—
	Adapt & BC	15	0.0014	0.0045	0.0043	0.0045	0.944	—

We first focus on the fixed partition and bias correction approach. As shown in Table 1, the estimator with J₀ = 1 tends to be the most biased, particularly in the coefficients corresponding to binary covariates (β₂ and β₃). As expected, bias decreases with a larger J₀ and the estimator with J₀ = 5 already performs quite well. In addition, the ASEs, SEs, and RMSEs with J₀ > 1 for all parameters are close to those of MPLE. As expected, the CPs increase as J₀ increases and are close to 95% with J₀ = 15. Similar results are observed in Figures S4 and S5. Figure S4 shows boxplots of the biases in MPLE from SAS and the fixed partition and bias correction estimator of β_j, j = 1, …, 3, under different values of J₀. The corresponding standard errors are shown in Figure S5. Figure S6 shows the fitted baseline hazard function of the fixed partition and bias correction approach under different values of J₀. Again, as the initial partition becomes finer (J₀ increases), the estimated piecewise baseline hazard function aligns better with the true baseline hazard function. However, even with J₀ = 15, the fixed partition and bias correction approach still cannot fully recover the complicated true baseline hazard function.

Comparison between the MPLE and the Four Proposed Estimators (J₀ = 5)

Controlling for J₀(=5), we next show the comparison between MPLE and the four proposed approaches (Figures 1–3). A comparison of the biases with J₀ = 5 is shown in Figure 1. Among the four proposed approaches, the adaptive partition and bias correction approach has the least biased estimates for β_j, for all j = 1, …, 3. According to Figure 2, the standard errors of fixed partition approaches (both bias correction and no bias correction) and that of the adaptive approach and bias correction are similar to the standard error of MPLE, while the standard error of the adaptive partition and no bias correction approach is slightly smaller. As shown in Figure 3, even with J₀ = 5, the adaptive partition and bias correction approach successfully recovers the true baseline hazard, with the fitted baseline hazard function in blue almost overlapping with the true baseline hazard function in red. The adaptive partition with no bias correction approach can capture the shape of the true function, but is biased. The two fixed partition approaches, for both bias and no bias correction, fail to estimate the true baseline hazard function given small values of J₀. Additional figures on comparisons are given in the Supplementary Materials.

Fig. 1 — Boxplots of bias for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J₀ = 5, in Simulation I.

Fig. 2 — Boxplots of standard error for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J₀ = 5, in Simulation I.

Fig. 3 — Estimated baseline hazard functions for (a) fixed partition and no bias correction, (b) fixed partition and bias correction, (c) adaptive partition and no bias correction, and (d) adaptive partition and bias correction, with J₀ = 5. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation I.

Comparison between MPLE, CUEE, and the Bias-corrected Estimators

We now compare MPLE, CUEE, and the “optimal” bias correction methods: the fixed/adaptive partition and bias correction approaches with varying J₀ (Table 1). Similar to the fixed partition and bias correction approach, as J₀ increases, the bias of the adaptive partition and bias correction approach decreases, with J₀ = 1 already performing quite well. The ASEs, SEs, and RMSEs are close to those of MPLE and the CPs are also close to 95% under all values of J₀.

Both adaptive partition and bias correction approach and CUEE perform well in terms of bias, ASE, SE, RMSE, and CP. One advantage of the proposed method is that it provides good estimates of the baseline hazard functions, which cannot be achieved by CUEE approach. As shown in Figure S12, even with J₀ = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach successfully captures the shape of the true baseline hazard. As J₀ increases, the fitted and true baseline hazard curves almost overlap, which further justifies our proposed method in Section 2.6. Another advantage of the proposed method over CUEE is the computation time, especially when censoring rate is low (Simulation I).

3.2. Simulation II

In Simulation II (censoring rate ≈ 76%), we further compare the performances between MPLE, CUEE, and the proposed fixed/adaptive and bias correction estimators with varying J₀.

Simulation Setting

We generate B = 500 datasets of survival time t_i independently from a Cox proportional hazards model, for i = 1, …, N, with the baseline hazard function given in (3.1). The censoring time and the variable “event” are generated as in Simulation I. To achieve a high censoring rate, which is frequently encountered in practice, let β = (1, −4.0, −4.0)^⊤, x_ki[1] ~ N(−1, 1), x_ki[2] ~ Bernoulli(0.5), and x_ki[3] ~ Bernoulli(0.1 + 0.8x_ki[2]), so that the binary covariates are highly correlated. We set the total sample size N = 5,000,000, and the number of blocks K = 1000 with n_k = 500. The average event rates are 24.3%, 0.1%, 0.1%, and 0.0% for arms with (x_ki[2], x_ki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.

To examine the effect of the initial number of intervals J₀ on the performances of the two proposed bias correction estimators, we let J₀ vary from 1, 5, to 10. Due to the high censoring rate and rare event issues (average event rates around 0.0% in certain arms), for the adaptive estimators, we partition the interval with maximum number of events into two subintervals for each update until the total number of intervals reaches 30 (J_max).

Comparison between MPLE, CUEE, and the Bias-corrected Estimators

We now compare MPLE, CUEE, and the “optimal” bias correction methods: the fixed/adaptive partition and bias correction approaches with varying J₀ (Table 2).

Table 2.

Estimates from 500 replications and computation time in minutes under MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J₀, in Simulation II.

Variable	Method	J₀	Bias	ASE	SE	RMSE	CP	Time (min)
β ₁	MPLE	—	−0.0007	0.0034	0.0036	0.0036	0.942	—
	CUEE	—	−0.0008	0.0034	0.0037	0.0038	0.920	192.7
	Fixed & BC	1	−0.0530	0.0032	0.0032	0.0532	0.000	77.5
	Fixed & BC	5	0.0005	0.0034	0.0042	0.0043	0.874	83.2
	Fixed & BC	10	−0.0008	0.0034	0.0037	0.0038	0.930	90.2
	Adapt & BC	1	−0.0155	0.0034	0.0037	0.0160	0.004	111.8
	Adapt & BC	5	0.0002	0.0034	0.0037	0.0037	0.932	114.1
	Adapt & BC	10	−0.0001	0.0034	0.0036	0.0036	0.946	113.8
β ₂	MPLE	—	−0.0014	0.0402	0.0391	0.0391	0.944	—
	CUEE	—	0.2953	0.0269	0.2145	0.3653	0.080	—
	Fixed & BC	1	0.0826	0.0395	0.0394	0.0916	0.442	—
	Fixed & BC	5	0.0064	0.0395	0.0397	0.0402	0.936	—
	Fixed & BC	10	0.0069	0.0395	0.0397	0.0403	0.930	—
	Adapt & BC	1	0.0295	0.0394	0.0397	0.0495	0.884	—
	Adapt & BC	5	0.0060	0.0396	0.0397	0.0401	0.936	—
	Adapt & BC	10	0.0062	0.0396	0.0396	0.0401	0.938	—
β ₃	MPLE	—	−0.0007	0.0402	0.0424	0.0424	0.932	—
	CUEE	—	0.3050	0.0268	0.2244	0.3789	0.086	—
	Fixed & BC	1	0.0836	0.0395	0.0425	0.0938	0.424	—
	Fixed & BC	5	0.0076	0.0394	0.0425	0.0432	0.930	—
	Fixed & BC	10	0.0080	0.0395	0.0425	0.0433	0.926	—
	Adapt & BC	1	0.0305	0.0394	0.0425	0.0523	0.850	—
	Adapt & BC	5	0.0071	0.0396	0.0425	0.0431	0.926	—
	Adapt & BC	10	0.0073	0.0396	0.0425	0.0431	0.928	—

First, we note that CUEE does not perform well for the coefficients of the two binary covariates (β₂ and β₃) in the presence of a high censoring rate and rare events. The corresponding biases are large, the ASEs, SEs, and RMSEs are not comparable to those of MPLE, and the CPs are low.

The bias correction methods outperform CUEE under this setting. For both fixed/adaptive and bias correction approaches, as J₀ increases, the bias of each parameter decreases, with J₀ = 5 already performing quite well. ASEs, SEs, and RMSEs are close to those of MPLE and the CPs are also close to 95% with J₀ > 1. Additionally, the adaptive partition and bias correction approach provides good estimates of the baseline hazard functions, which cannot be achieved by CUEE approach. As shown in Figure 4, even with J₀ = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach successfully captures the shape of the true baseline hazard. The computation time of the proposed method is also smaller than the computation time of CUEE even when censoring rate is high (Simulation II).

Fig. 4 — Estimated baseline hazard functions for the adaptive partition and bias correction approach, for (a) J₀ = 1, (b) J₀ = 5, and (c) J₀ = 10. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation II.

4. Analyses of Real Data

4.1. Analysis of the SEER Colon Cancer Data

We examine the SEER colon cancer statistics from 1973 to 2013, available at https://seer.cancer.gov/data/. The data comprise 315,120 observations, after deleting all the subjects with missing covariates and survival time less than three months. For illustration purposes, we treat the survival time in the SEER data as continuous. We set the maximum censoring time (T_max) as 18 months, which means that a subject is considered censored if the event occurs after 18 months. We are interested in the early stage (≤ 18 months) as colon cancer is highly curable. Under this scenario, the total number of events including both colon cancer death and other causes death is 67,798, with a censoring rate of 78.49%. The covariates considered in our analysis are year of diagnosis (Year) and surgery treatment indicator (RP). The covariate Year is continuous and the covariate RP is binary. Among the 67,798 patients, 4,586 underwent surgery treatment. The data satisfy the proportional hazards assumption by the test of Grambsch and Therneau (1994).

We use a subset sample size n_k = 2,500 for k = 1, …, 127 to analyze the data in the online-updating framework. To examine the effect of the initial number of intervals J₀ on the performances of the proposed estimators, we let J₀ vary from 1, 3, to 5 given the high censoring rate. For the adaptive partition approaches, we allow the number of pieces to grow (by at most one piece at a time) at each update until the total number of pieces reaches 14 (J_max). We set $w_{J}^{p} = w_{J + 1}^{p} = 0.5$ and $w_{J}^{q} = w_{J + 1}^{q} = 1$ since we do not know the underlying baseline hazard function.

The Impact of J₀

As shown in Table 3, for the continuous covariate Year, the estimates and standard errors of the four approaches, i.e., fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction, are similar under different values of J₀.

Table 3.

Estimates and standard errors for the SEER colon cancer data.

Method	J ₀	Variable	Est	SE	Variable	Est	SE
MPLE	—	RP	0.14552	0.03285	Year	−0.17462	0.00393
CUEE	—		0.14798	0.03535		−0.17469	0.00392
Fixed & NBC	1		0.22358	0.03285		−0.17518	0.00394
Fixed & NBC	3		0.22697	0.03284		−0.17585	0.00393
Fixed & NBC	5		0.23128	0.03284		−0.17650	0.00393
Fixed & BC	1		0.14774	0.03327		−0.17482	0.00394
Fixed & BC	3		0.14755	0.03327		−0.17479	0.00394
Fixed & BC	5		0.14769	0.03327		−0.17477	0.00394
Adapt & NBC	1		0.26512	0.03283		−0.18206	0.00393
Adapt & NBC	3		0.25180	0.03284		−0.17967	0.00393
Adapt & NBC	5		0.25177	0.03284		−0.17967	0.00393
Adapt & BC	1		0.16063	0.03371		−0.17688	0.00400
Adapt & BC	3		0.14834	0.03331		−0.17477	0.00395
Adapt & BC	5		0.14853	0.03328		−0.17480	0.00395

For the binary covariate RP, the estimates under the adaptive partition approaches (both bias and no bias correction) tend to be closer to the estimate of MPLE as J₀ increases, with J₀ = 3 already performing quite well for the adaptive partition and bias correction approach. Estimates under the fixed partition approaches (both bias and no bias correction) are robust under different values of J₀. All standard errors are similar to the standard errors of MPLE under different values of J₀.

Comparison between MPLE, CUEE, and the Four Proposed Estimators (J₀ = 5)

Controlling for J₀ (= 5), we compare the estimates and standard errors between MPLE, CUEE, and the four approaches. In Table 3, the bias correction approaches (both fixed and adaptive partition), MPLE, and CUEE tend to be the most similar in terms of both regression coefficients and standard errors, except that CUEE has slightly larger SE for binary covariate RP. The regression coefficients for the other two approaches without bias correction (both fixed and adaptive partition) have similar results for the continuous covariate Year, but very different results for the binary covariate RP.

SAS PHREG does not directly provide the baseline hazard function without any assumption on the baseline hazard. We obtain the baseline hazard function based on the entire dataset by assuming the baseline hazard is piecewise constant with all the distinct event times as cutoff points, i.e., the Breslow estimator (Breslow 1972). We again compare the estimated baseline hazard functions of the four proposed methods with the result based on the entire data (Figure 5). The estimated baseline hazard function of the adaptive partition and bias correction approach in brown nearly overlaps with the estimated baseline hazard function based on the entire data in black. With so few pieces, the fixed partition approaches (both bias and no bias correction) fail to provide satisfactory estimates of the baseline hazard function. Similar results were also observed in the previous simulation study.

Fig. 5 — Estimated baseline hazard functions for the SEER colon cancer data for (i) all data (solid and black), (ii) fixed partition and no bias correction (dashed and green), (iii) fixed partition and bias correction (dotted and blue), (iv) adaptive partition and no bias correction (dot dash and orange), and (v) adaptive partition and bias correction (long dash and brown), with J₀ = 5.

Figure 6 shows the plots of the estimated survival functions and the corresponding pointwise 95% confidence intervals stratified by the treatments (RP and no RP) evaluated at Year=1994, which corresponds to the average year of diagnosis. With the average year of diagnosis (Year), the estimates (95% CIs) of the survival rates were 0.892 (95% CI 0.891 – 0.893) for the subjects treated with RP and 0.873 (95% CI 0.872 – 0.874) for the subjects without surgery (no RP) treatment at 10 months after diagnosis; and 0.842 (95% CI 0.840 – 0.843) for the subjects treated with RP and 0.814 (95% CI 0.813 – 0.816) for the subjects without surgery (no RP) treatment at 15 months after diagnosis.

Fig. 6 — Estimated survival functions under the adaptive partition and bias correction approach for the SEER colon cancer data. The blue and red curves represent the arms with average years of treatment, and with/without RP treatment, respectively. The two black curves represent the pointwise 95% confidence intervals, with J₀ = 5.

4.2. Analysis of Successful Exit of Venture Capital (VC) Investing

We investigate the U.S.-based VC-backed companies that received their first round of VC funding from 1946 to 2019. The data, from the VentureXpert database by Thomson Financial, comprise 33,268 companies after deleting all the companies with missing round dates or initial public offering (IPO) dates. We consider successful exit (IPO) as the event and the logarithm of the number of days from the first round of VC funding to IPO (event time) or the last round of investment (censoring time) as the continuous survival time. Under this scenario, the total number of events is 3,717, with a high censoring rate of 88.83%. The covariates considered in our analysis are the number of funds received (NumFunds) and the cumulative amount of investment received at each round (CumAmounts), with the total number of rounds ranging from 1 to 46, of which CumAmounts is time-dependent. Both covariates are continuous and are standardized for numerical stability by subtracting 9.07 and 48308.36 and dividing by 6.70 and 245076.64, respectively.

We start with a subset sample size n_k = 1000 (k ≤ 5) to obtain enough cumulative events for analysis and then set n_k = 500 for all subsequent blocks (k = 6, …, 62). Similar to Section 4.1, we let J₀ vary from 1, 3, to 5. For the adaptive partition approaches, we allow the number of pieces to grow (by at most one piece at a time) at each update until the total number of pieces reaches 15 (J_max) due to the rare events. We again set $w_{J}^{p} = w_{J + 1}^{p} = 0.5$ and $w_{J}^{q} = w_{J + 1}^{q} = 1$ since we do not know the underlying baseline hazard function.

The Impact of J₀

As shown in Table 4, all the estimates and standard errors tend to be closer to the estimate and standard error of MPLE as J₀ increases. Among all the methods, the adaptive and bias correction approach with J₀ = 5 yields the closest results to MPLE.

Table 4.

Estimates and standard errors for the VC data.

Method	J ₀	Variable	Est	SE	Variable	Est	SE
MPLE	—	NumFunds	0.13467	0.01504	CumAmounts	0.03051	0.00336
CUEE	—		0.13547	0.01537		0.03559	0.00683
Fixed & BC	1		0.40769	0.01333		0.05122	0.00279
Fixed & BC	3		0.18673	0.01464		0.04002	0.00397
Fixed & BC	5		0.15157	0.01488		0.03454	0.00385
Adapt & BC	1		0.36694	0.01677		0.05214	0.00344
Adapt & BC	3		0.15680	0.01571		0.04016	0.00448
Adapt & BC	5		0.13361	0.01555		0.03387	0.00395

Comparison between MPLE, CUEE, and the Bias-corrected Estimators (J₀ = 5)

Controlling for J₀ (= 5), we compare the estimates and standard errors between MPLE, CUEE, and the “optimal” bias correction approaches. The bias correction approaches have similar estimates and standard errors as MPLE for both time-independent (NumFunds) and time-dependent covariates (CumAmounts). CUEE also has similar estimates and standard errors as MPLE for the time-independent covariate, but larger standard errors for the time-dependent covariate.

We compare the estimated baseline hazard functions of the bias correction methods with the result based on the entire data (Figure 7). Similar to Section 4.1, the estimated baseline hazard function of the adaptive partition and bias correction approach in brown nearly overlaps with the estimated baseline hazard function based on the entire data in black. The fixed partition approach in blue fails to provide a satisfactory estimate of the baseline hazard function. Note that CUEE does not automatically provide estimates of the baseline hazard function.

Fig. 7 — Estimated baseline hazard functions for VC data for (i) all data (solid and black), (ii) fixed partition and bias correction (dotted and blue), and (iii) adaptive partition and bias correction (long dash and brown), with J₀ = 5.

This example shows that the adaptive and bias correction approach performs as well as the full data approach based on the MPLE, even in the presence of time-dependent covariates and rare events.

5. Discussion

We developed online-updating algorithms and inference procedures for survival data under the proportional hazards assumption. Among the four approaches we proposed, the adaptive partition and bias correction approach is minimally storage-intensive and compares favorably with the existing method that requires access to the entire data, for both the regression coefficients and the baseline hazard function. Our methods are also applicable to time-dependent covariates, which relaxes the proportional hazards assumption. In this paper, we focus on large-scale survival data where the event is induced by a single risk. One future direction would be to extend the current methods to other types of big survival data, including but not limited to competing-risks streaming data (Fine and Gray 1999), such as the SEER data, and recurrent event data (Pena et al. 2001), such as Zillow real estate data.

Supplementary Material

Supp 1: NIHMS1722634-supplement-Supp_1.zip (11 MB, zip)

Acknowledgement

We would like to thank the Editor, the Associate Editor, and the two anonymous reviewers for their very helpful comments and suggestions, which have led to a much improved version of the paper. Dr. M.-H. Chen’s research was partially supported by NIH grants #GM70335 and #P01CA142538.

References

Ai M, Yu J, Zhang H, and Wang H (2018). Optimal subsampling algorithms for big data generalized linear models. arXiv preprint arXiv:1806.06761.
Barbian MH and Assunção RM (2017). Spatial subsemble estimator for large geostatistical data. Spatial Statistics, 22, 68–88.
Breslow NE (1972). Discussion of the paper by D.R. Cox. Journal of the Royal Statistical Society: Series B, 34, 216–217.
Cappé O (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.
Carothers NL (2000). Real analysis. Cambridge University Press.
Chang X, Lin S-B, Wang Y, et al. (2017). Divide and conquer local average regression. Electronic Journal of Statistics, 11(1), 1326–1350.
Chen X and Xie M-G (2014). A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica, 24(4), 1655–1684.
Cox DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–220.
Cox DR (1975). Partial likelihood. Biometrika, 62(2), 269–276.
Fine JP and Gray RJ (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496–509.
Fleming TR and Harrington DP (2011). Counting Processes and Survival Analysis. John Wiley & Sons.
Friedman M (1982). Piecewise exponential models for survival data with covariates. The Annals of Statistics, 10(1), 101–113.
Giot P and Schwienbacher A (2007). IPOs, trade sales and liquidations: Modelling venture capital exits using survival analysis. Journal of Banking & Finance, 31(3), 679–702.
Grambsch PM and Therneau TM (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81(3), 515–526.
Holford TR (1976). Life tables with concomitant information. Biometrics, 32(3), 587–597.
Ibrahim JG, Chen M-H, and Sinha D (2001). Bayesian Survival Analysis. Springer Science & Business Media.
Johansen S (1983). An extension of Cox’s regression model. International Statistical Review/Revue Internationale de Statistique, 51(2), 165–174.
Kalbfleisch JD and Prentice RL (2011). The statistical analysis of failure time data. John Wiley & Sons.
Kawaguchi ES, Suchard MA, Liu Z, and Li G (2017). Scalable sparse Cox’s regression for large-scale survival data via broken adaptive ridge. arXiv preprint arXiv:1712.00561.
Kleiner A, Talwalkar A, Sarkar P, and Jordan MI (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 795–816.
Kong E and Xia Y (2019). On the efficiency of online approach to nonparametric smoothing of big data. Statistica Sinica, 29(1), 185–201.
Liang F, Cheng Y, Song Q, Park J, and Yang P (2013). A resampling-based stochastic approximation method for analysis of large geostatistical data. Journal of the American Statistical Association, 108(501), 325–339.
Lin N and Xi R (2011). Aggregated estimating equation estimation. Statistics and Its Interface, 4, 73–83.
Luo L and Song PX-K (2020). Renewable estimation and incremental inference in generalized linear models with streaming data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1), 69–97.
Ma P, Mahoney MW, and Yu B (2013). A statistical perspective on algorithmic leveraging. arXiv preprint arXiv:1306.5362.
Maclaurin D and Adams RP (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. arXiv preprint arXiv:1403.5693.
Mittal S, Madigan D, Burd RS, and Suchard MA (2013). High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis. Biostatistics, 15(2), 207–221.
Neiswanger W, Wang C, and Xing E (2013). Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780.
Pena EA, Strawderman RL, and Hollander M (2001). Nonparametric estimation with recurrent event data. Journal of the American Statistical Association, 96(456), 1299–1315.
Plank SB, DeLuca S, and Estacion A (2008). High school dropout and the role of career and technical education: A survival analysis of surviving high school. Sociology of Education, 81(4), 345–370.
Schifano ED, Wu J, Wang C, Yan J, and Chen M-H (2016). Online updating of statistical inference in the big data setting. Technometrics, 58(3), 393–403.
Song Q and Liang F (2015). A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(5), 947–972.
Wang C, Chen M-H, Schifano E, Wu J, and Yan J (2016). Statistical methods and computing for big data. Statistics and Its Interface, 9(4), 399–414.
Wang C, Chen M-H, Wu J, Yan J, Zhang Y, and Schifano E (2018a). Online updating method with new variables for big data streams. Canadian Journal of Statistics, 46(1), 123–146.
Wang H, Yang M, and Stufken J (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association, 114(525), 393–405.
Wang Y, Palmer N, Di Q, Schwartz J, Kohane I, and Cai T (2018b). A fast divide-and-conquer sparse Cox regression. Forthcoming in Biostatistics.
Whittemore AS and Keller JB (1986). Survival estimation using splines. Biometrics, 42(3), 495–506.
Xue Y, Wang H, Yan J, and Schifano ED (2020). An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics, 76(1), 171–182.
Zillow (2016). Zillow Transitions to Streaming Data Architecture. https://www.zillow.com/data-science/streaming-data-architecture.

Associated Data

Supplementary Materials

Supp 1

NIHMS1722634-supplement-Supp_1.zip (11 MB, zip)
