Abstract
When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible because they require access to all observations at each accumulation point. We develop online-updating methods for survival analysis under the Cox proportional hazards model, and these methods also accommodate time-dependent covariates. Specifically, we propose online-updating estimators, together with their standard errors, for both the regression coefficients and the baseline hazard function. Extensive simulation studies are conducted to investigate the empirical performance of the proposed estimators. A large colon cancer data set from the Surveillance, Epidemiology, and End Results (SEER) program and a large venture capital (VC) data set with time-dependent covariates are analyzed to demonstrate the utility of the proposed methodologies.
Keywords: Cox model, Data compression, Piecewise constant baseline hazard, SEER, Streaming Survival Data
1. Introduction
Survival analysis, or the analysis of time-to-event data (Kalbfleisch and Prentice 2011), has been widely applied in diverse fields such as biostatistics, economics, education, and sociology (e.g., Ibrahim et al. 2001; Giot and Schwienbacher 2007; Plank et al. 2008). Advances in computing technology have made it possible to collect “big survival data”, which bring opportunities for new discoveries but also challenges, since most traditional survival analysis methods become computationally infeasible for large-scale survival data. For example, the Cox maximum partial likelihood estimation (MPLE) approach (e.g., Cox 1972, 1975), which has long been a standard tool for survival analysis, involves summations over risk sets that require access to all observations, and is thus computationally challenging.
The modern statistical methodologies for big data can be roughly grouped into three categories (Wang et al. 2016): resampling-based (e.g., Wang et al. 2019; Ai et al. 2018; Kleiner et al. 2014; Maclaurin and Adams 2014; Ma et al. 2013; Liang et al. 2013), divide-and-conquer (e.g., Barbian and Assunção 2017; Chang et al. 2017; Song and Liang 2015; Chen and Xie 2014; Neiswanger et al. 2013; Lin and Xi 2011), and online-updating (e.g., Luo and Song 2020; Kong and Xia 2019; Wang et al. 2018a; Schifano et al. 2016). Recent developments in survival analysis have focused on high-dimensional survival data (e.g., Kawaguchi et al. 2017; Mittal et al. 2013), while less attention has been paid to survival data with a huge sample size, which is our focus here. Wang et al. (2018b) proposed a divide-and-conquer algorithm to handle high-dimensional survival data with huge sample sizes under the Cox model. They first obtained a maximum partial likelihood estimator based on a subset, then updated the estimator via one-step linear approximations based on the entire data, and finally obtained the LASSO-penalized estimator by applying a least-squares approximation to the partial likelihood. Xue et al. (2020) proposed a cumulatively updated estimating equation (CUEE) estimator for the regression coefficients in the online-updating setting. Notably, however, none of these works provides an estimator of the baseline hazard function, which is essential for understanding the survival process and for prediction. Furthermore, the literature on online updating for streaming survival data, where survival data arrive sequentially in large chunks and full access to the entire data is infeasible, is still sparse.
We develop new methodologies to carry out survival analysis in the streaming survival data setting, which is not uncommon in practice. The Surveillance, Epidemiology, and End Results (SEER) program, for example, has updated its database of cancer cases throughout the United States annually since 1973, to better understand the survival of cancer patients and reduce the cancer burden on society. In the venture capital (VC) industry, which dates back to 1946, the time to a successful exit, such as an initial public offering (IPO) of a funded company, is of great importance to both VC investors and the companies. Real estate companies such as Zillow (Zillow 2016) also receive huge amounts of streaming data every second from various public sources, where the time on market until a house is sold is of great practical interest. In such settings, most conventional estimation methods for survival analysis are computationally challenging because they require access to immense amounts of data. To overcome these computational challenges, and inspired by the observations that (i) the Cox partial likelihood can be approximated by the likelihood of a piecewise exponential model (Johansen 1983) and (ii) the maximum likelihood estimators of the piecewise exponential model are consistent under mild conditions as the sample size grows (Friedman 1982), we propose four online-updating methods for survival analysis in the Cox proportional hazards framework by assuming a piecewise constant baseline hazard function. By carrying out the analysis in a parametric manner, we are able to estimate the baseline hazard function simultaneously with the regression coefficients, rather than the coefficients alone as in Xue et al. (2020). Furthermore, our methods have minimal storage requirements, are computationally efficient, and allow online updating of both estimation and inference for the regression coefficients and the baseline hazard function.
Other novelties of the proposed methods include a characterization of crucial, nontrivial conditions on the expansion matrices P and Q that correct the bias under the adaptive partition approach, and the flexibility to include time-dependent covariates, which relaxes the proportional hazards assumption.
Detailed derivations of the formulae for the online-updating estimators and their standard errors, for both the regression coefficients and the baseline hazards, are provided. Extensive simulation studies show that the proposed methods are competitive with the method using the entire data in terms of bias and standard errors for both the regression coefficients and the baseline hazards. The analyses of the SEER colon cancer data and the VC data further demonstrate that the estimates from the proposed methods are similar to those obtained by analyzing the entire data at once.
The remainder of this article is organized as follows. In Section 2, we briefly review the Cox proportional hazards model with piecewise constant baseline hazard function. We then propose four online-updating methods to carry out survival analysis in the streaming survival data setting, with the derivation of the algorithms and estimators presented in detail. We report extensive simulation studies in Section 3 and the real data analyses in Section 4. A discussion concludes in Section 5.
2. Online Updating Algorithms and Inference
2.1. Notation and Preliminaries
Suppose there are N independent observations D = {(ti, δi, xi), i = 1, 2, …, N}, where ti is the observed time (either the censoring or the event time), δi is the event indicator, with δi = 1 if the event is observed and δi = 0 if the observation is censored, and xi is the p × 1-dimensional baseline covariate vector. Write t = (t1, t2, …, tN)⊤, δ = (δ1, δ2, …, δN)⊤, and X = (x1, x2, …, xN)⊤.
The logarithm of the partial likelihood function of the Cox model is given by
$$\ell(\beta \mid D) = \sum_{i=1}^{N} \delta_i \Big[ x_i^\top \beta - \log \Big\{ \sum_{j \in R(t_i)} \exp(x_j^\top \beta) \Big\} \Big], \tag{2.1}$$
where R(t) = {j : tj ≥ t} is the set of subjects at risk at time t and β is a p × 1-dimensional vector of regression coefficients. Unlike in Schifano et al. (2016), ℓ(β | D) in (2.1) cannot be written as a sum of independent block-specific log-likelihood functions, because the risk-set term Σj∈R(ti) exp(xj⊤β) depends on the entire data for every i. Thus, this approach requires full access to the entire data at each accumulation point or data block, which is not feasible in the streaming survival data setting. In addition, the MPLE approach does not directly provide an estimate of the baseline hazard function, which is essential if one is interested in prediction.
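To make this obstacle concrete, the following Python sketch evaluates the log partial likelihood with a naive O(N²) loop; the function name `cox_log_partial_likelihood` is our own. Its only purpose is to show that the risk-set sum for each event ranges over every subject, so adding up values computed on separate blocks does not reproduce the full-data value.

```python
import math

def cox_log_partial_likelihood(beta, times, events, X):
    """Cox log partial likelihood for lists of times, 0/1 events, and covariate rows.

    Ties are handled Breslow-style: every tied subject stays in the risk set.
    """
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]  # linear predictors x_i' beta
    loglik = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di == 1:
            # Risk set R(t_i) = {j : t_j >= t_i} ranges over *all* subjects.
            risk_sum = sum(math.exp(eta[j]) for j, tj in enumerate(times) if tj >= ti)
            loglik += eta[i] - math.log(risk_sum)
    return loglik

# With beta = 0 and no censoring, each event contributes -log|R(t_i)|.
full = cox_log_partial_likelihood([0.0], [1, 2, 3, 4], [1, 1, 1, 1], [[0.0]] * 4)
halves = (cox_log_partial_likelihood([0.0], [1, 2], [1, 1], [[0.0]] * 2)
          + cox_log_partial_likelihood([0.0], [3, 4], [1, 1], [[0.0]] * 2))
print(full, halves)  # -log(24) versus -log(4): the blockwise sum is not the full-data value
```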
To address these problems, we consider a proportional hazards model with a piecewise constant baseline hazard function. Note that the piecewise constant hazard is not a strong assumption, since any continuous function on a closed interval can be uniformly approximated by a step (piecewise constant) function (Carothers 2000). In fact, a piecewise constant hazard can approximate almost any baseline hazard reasonably well if the partition is fine enough relative to the true baseline hazard. Partitioning [0, ∞) into J intervals, 0 = a0 < a1 < … < aJ = ∞, the hazard function with a piecewise constant baseline is
$$h(t \mid x_i) = \sum_{j=1}^{J} \lambda_j \, I(a_{j-1} \le t < a_j) \exp(x_i^\top \beta), \tag{2.2}$$
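where the λj, j = 1, …, J, are constants, λ = (λ1, …, λJ)⊤, and β is a p × 1-dimensional vector of regression coefficients corresponding to the covariates xi. The corresponding cumulative hazard function, which is piecewise linear in t, is
$$H(t \mid x_i) = \exp(x_i^\top \beta) \sum_{j=1}^{J} \lambda_j \Delta_j(t), \qquad \Delta_j(t) = \max\{0, \min(t, a_j) - a_{j-1}\}. \tag{2.3}$$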
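After some algebra, the log-likelihood function for model (2.2) is
$$\ell(\beta, \lambda \mid D) = \sum_{i=1}^{N} \Big[ \delta_i \Big\{ x_i^\top \beta + \sum_{j=1}^{J} I(a_{j-1} \le t_i < a_j) \log \lambda_j \Big\} - \exp(x_i^\top \beta) \sum_{j=1}^{J} \lambda_j \Delta_j(t_i) \Big]. \tag{2.4}$$
Under this formulation, ℓ(β, λ | D) is a sum of independent subject-level contributions, so it can be written as the sum of the log-likelihoods of disjoint data blocks, and the online-updating idea applies naturally.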
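A minimal Python sketch of the piecewise exponential log-likelihood, written directly from model (2.2) and its cumulative hazard: an observed event in interval j contributes log λj + xi⊤β, and every subject contributes minus its cumulative hazard. The helper names and the cut-point convention (`cuts` holds a0 = 0, …, aJ−1, with aJ = ∞ implicit) are our own assumptions. Because each subject's term involves only its own data, the full-data value equals the sum over blocks.

```python
import math
from bisect import bisect_right

def exposure(t, cuts):
    """Delta_j(t): time spent in [a_{j-1}, a_j) up to t; cuts = [a_0 = 0, ..., a_{J-1}], a_J = inf."""
    bounds = list(cuts) + [math.inf]
    return [max(0.0, min(t, bounds[j + 1]) - bounds[j]) for j in range(len(cuts))]

def pwe_loglik(beta, lam, cuts, times, events, X):
    """Sum over subjects of delta_i (log lambda_{j(i)} + x_i'beta)
    minus exp(x_i'beta) * sum_j lambda_j Delta_j(t_i)."""
    total = 0.0
    for ti, di, xi in zip(times, events, X):
        eta = sum(b * x for b, x in zip(beta, xi))
        j = bisect_right(cuts, ti) - 1            # index of the interval containing t_i
        cum = sum(lam_j * d_j for lam_j, d_j in zip(lam, exposure(ti, cuts)))
        total += di * (math.log(lam[j]) + eta) - math.exp(eta) * cum
    return total

# Additivity across blocks: the full-data value equals the sum of block values.
times, events, X = [0.4, 1.3, 2.2, 0.9], [1, 0, 1, 1], [[0.5], [-1.0], [0.2], [1.5]]
args = ([0.3], [0.8, 1.2], [0.0, 1.0])            # (beta, lambda, cuts)
full = pwe_loglik(*args, times, events, X)
blocks = pwe_loglik(*args, times[:2], events[:2], X[:2]) + pwe_loglik(*args, times[2:], events[2:], X[2:])
print(abs(full - blocks) < 1e-12)  # True
```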
Note that (2.4) can also be written as
| (2.5) |
where .
Let θ = (β⊤, λ⊤)⊤ denote the collection of all the parameters and M(θ) the score function, i.e., the vector of first-order partial derivatives of the log-likelihood function in (2.5). Let θ̂ denote the solution to the score equation
$$M(\theta) = 0. \tag{2.6}$$
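Taking second-order partial derivatives of the log-likelihood function, the elements of the negated (p + J) × (p + J) Hessian matrix H = (Hi,j) are
$$\begin{aligned} H_{r,s} &= \sum_{i=1}^{N} x_{ir} x_{is} \exp(x_i^\top\beta) \sum_{j=1}^{J} \lambda_j \Delta_j(t_i), \\ H_{r,p+m} = H_{p+m,r} &= \sum_{i=1}^{N} x_{ir} \exp(x_i^\top\beta) \Delta_m(t_i), \\ H_{p+m,p+n} &= I(m = n)\, \lambda_m^{-2} \sum_{i=1}^{N} \delta_i I(a_{m-1} \le t_i < a_m), \end{aligned} \tag{2.7}$$
for r, s = 1, …, p and m, n = 1, …, J.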
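The per-block estimate of θ and its negated Hessian can be computed by Newton-Raphson. The NumPy sketch below is one possible implementation, not the authors' code: to keep λ positive and the objective concave, it iterates on (β, log λ) with step-halving, and only at convergence evaluates the negated Hessian on the (β, λ) scale from the second derivatives of the log-likelihood. It assumes at least one event per interval; `exposure_matrix` and `fit_block` are hypothetical names.

```python
import numpy as np

def exposure_matrix(times, cuts):
    """N x J matrix with entries Delta_j(t_i); cuts = [a_0 = 0, ..., a_{J-1}], a_J = inf."""
    lower = np.asarray(cuts, dtype=float)
    upper = np.append(lower[1:], np.inf)
    t = np.asarray(times, dtype=float)[:, None]
    return np.clip(np.minimum(t, upper) - lower, 0.0, None)

def fit_block(times, events, X, cuts, max_iter=100, tol=1e-10):
    """MLE of theta = (beta, lambda) for one block, plus the negated Hessian on that scale."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(events, dtype=float)
    D = exposure_matrix(times, cuts)
    idx = np.searchsorted(cuts, times, side="right") - 1     # interval index of each t_i
    dj = np.bincount(idx, weights=d, minlength=len(cuts))    # events per interval (assumed > 0)
    p = X.shape[1]

    def loglik(beta, alpha):                                  # alpha = log(lambda)
        eta = X @ beta
        return d @ (alpha[idx] + eta) - np.exp(eta) @ (D @ np.exp(alpha))

    beta, alpha = np.zeros(p), np.log(dj / D.sum(axis=0))
    for _ in range(max_iter):
        w, lam = np.exp(X @ beta), np.exp(alpha)
        cum = D @ lam                                         # sum_j lambda_j Delta_j(t_i)
        XDw = X.T @ (D * w[:, None])                          # p x J cross term
        score = np.concatenate([X.T @ (d - w * cum), dj - lam * (D.T @ w)])
        info = np.block([[X.T @ (X * (w * cum)[:, None]), XDw * lam],
                         [(XDw * lam).T, np.diag(lam * (D.T @ w))]])
        step = np.linalg.solve(info, score)
        s, old = 1.0, loglik(beta, alpha)
        while loglik(beta + s * step[:p], alpha + s * step[p:]) < old and s > 1e-8:
            s /= 2.0                                          # step-halving safeguard
        beta, alpha = beta + s * step[:p], alpha + s * step[p:]
        if np.max(np.abs(s * step)) < tol:
            break
    lam, w = np.exp(alpha), np.exp(X @ beta)
    Hbb = X.T @ (X * (w * (D @ lam))[:, None])                # beta-beta block
    Hbl = X.T @ (D * w[:, None])                              # beta-lambda block
    Hll = np.diag(dj / lam**2)                                # lambda-lambda block (diagonal)
    return np.concatenate([beta, lam]), np.block([[Hbb, Hbl], [Hbl.T, Hll]])
```

For a block with times `t`, indicators `e`, covariates `X`, and cut points `cuts`, `theta_hat, H_hat = fit_block(t, e, X, cuts)` gives the two quantities that the online updates below combine.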
2.2. Time-dependent Covariates
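The above results can also be extended to non-proportional hazards models with time-dependent covariates. Let zi(t) denote the q × 1 vector of time-dependent covariates of subject i and γ the corresponding q × 1-dimensional vector of regression coefficients. Without loss of generality (WLOG), we assume 0 = bi0 < bi1 < … < biL = ∞ and zi(t) = ziℓ for t ∈ [biℓ−1, biℓ), which is equivalent to
$$z_i(t) = \sum_{\ell=1}^{L} z_{i\ell} \, I(b_{i,\ell-1} \le t < b_{i\ell}).$$
The corresponding hazard function is given by
$$h(t \mid x_i, z_i(t)) = \sum_{j=1}^{J} \lambda_j \, I(a_{j-1} \le t < a_j) \exp\{x_i^\top \beta + z_i(t)^\top \gamma\},$$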
and the cumulative hazard function can be simplified as follows
Let Ωijℓ denote the set [aj−1, aj)⋂[biℓ−1, biℓ), j = 1, …, J, ℓ = 1, …, L. Note that, the Ωijℓ’s represent disjoint sets and can be empty. Additionally, .
Then
| (2.8) |
where
L(Ωijℓ) and U(Ωijℓ) are respectively the lower and upper bounds of the interval set Ωijℓ. The logarithm of the likelihood is given by
| (2.9) |
where . Elements of the negated (p + q + J) × (p + q + J) Hessian matrix H = (Hi,j) are given by
| (2.10) |
where , for i = 1, …, N, r, s = 1, …, p + q, and m, n = 1, …, J.
The log-likelihood function in (2.9) and the negated Hessian matrix in (2.10) with time-dependent covariates have the same form as the corresponding functions with only time-independent covariates in (2.5) and (2.7). Therefore, the methods proposed below for time-independent covariates can be applied directly to online-updating survival analysis with time-dependent covariates.
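With piecewise constant time-dependent covariates, a subject accrues exposure on each nonempty overlap Ωijℓ of a hazard interval [aj−1, aj) with a covariate interval [biℓ−1, biℓ). The Python sketch below (hypothetical helper names) computes these overlap lengths, truncated at the observed time ti, and the resulting cumulative hazard Σj Σℓ λj exp(xi⊤β + ziℓ⊤γ) × length; the inputs `x_beta` and `z_gamma` are assumed to be the precomputed linear predictors xi⊤β and ziℓ⊤γ.

```python
import math

def overlap_exposure(t_i, cuts, zbreaks):
    """Length of Omega_{ijl} ∩ [0, t_i] for every (j, l) pair with positive length.

    cuts = [a_0 = 0, ..., a_{J-1}] (a_J = inf); zbreaks = [b_{i0} = 0, ..., b_{i,L-1}] (b_{iL} = inf).
    """
    A = list(cuts) + [math.inf]
    B = list(zbreaks) + [math.inf]
    out = {}
    for j in range(len(cuts)):
        for l in range(len(zbreaks)):
            lo = max(A[j], B[l])
            hi = min(A[j + 1], B[l + 1], t_i)
            if hi > lo:                      # empty overlaps are skipped
                out[(j, l)] = hi - lo
    return out

def cum_hazard_td(t_i, x_beta, z_gamma, lam, cuts, zbreaks):
    """Cumulative hazard at t_i: sum over overlaps of lambda_j exp(x_i'beta + z_il'gamma) * length."""
    return sum(lam[j] * math.exp(x_beta + z_gamma[l]) * length
               for (j, l), length in overlap_exposure(t_i, cuts, zbreaks).items())

print(overlap_exposure(2.5, [0.0, 1.0, 2.0], [0.0, 1.5]))
# {(0, 0): 1.0, (1, 0): 0.5, (1, 1): 0.5, (2, 1): 0.5}
```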
Remark 2.2.1. A more general form of (2.9) can be written as
| (2.11) |
which can be approximated by the Riemann sum in (2.9) as L goes to infinity.
2.3. Fixed Partition
In the streaming data setting, we suppose that the N observations are not available all at once, but rather arrive in chunks from a large data stream. Suppose that at each accumulation point k, k = 1, …, K, we observe nk observations: tk, δk, and Xk, which are the nk-dimensional vector of observed times, the nk-dimensional vector of event indicators, and the nk × p matrix of baseline covariates, respectively, such that Σk nk = N and stacking (tk, δk, Xk) over k = 1, …, K recovers (t, δ, X).
For now, we let the partition for the piecewise hazard function be fixed throughout the entire updating process. WLOG, we assume that each interval contains at least one event in each block of data. Otherwise, for that particular block, we temporarily merge consecutive intervals so that each new, wider interval contains at least one event. After fitting, we return to the pre-specified fixed partition and set the constant hazard estimate of each problematic original interval equal to the estimate of the corresponding merged interval.
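The temporary merging step can be sketched as follows, under an assumption the text leaves open: an interval without events is merged forward into the next interval that has events, and trailing empty intervals are merged backward into the last group. The function names are hypothetical. After the block is fitted on the merged partition, each original interval inherits the estimate of its group.

```python
def merge_empty_intervals(event_counts):
    """Group consecutive intervals so that every group has at least one event.

    Assumes the block contains at least one event overall.
    """
    groups, current = [], []
    for j, count in enumerate(event_counts):
        current.append(j)
        if count > 0:                 # close the group at the first interval with events
            groups.append(current)
            current = []
    if current:                       # trailing intervals without events
        groups[-1].extend(current)
    return groups

def expand_to_original(group_estimates, groups, J):
    """Give every original interval the hazard estimate of its merged group."""
    lam = [0.0] * J
    for est, group in zip(group_estimates, groups):
        for j in group:
            lam[j] = est
    return lam

cuts = [0.0, 0.5, 1.0, 2.0, 4.0]
counts = [3, 0, 0, 2, 0]
groups = merge_empty_intervals(counts)
merged_cuts = [cuts[g[0]] for g in groups]   # left endpoints of the wider intervals
print(groups, merged_cuts)   # [[0], [1, 2, 3, 4]] [0.0, 0.5]
```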
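Let θ̂k and Ĥk denote the current estimators of θ and H, which are obtained as in (2.6) and (2.7) but based only on the kth subset. The online-updating estimator of θ based on the cumulative data Dk = {(tℓ, Xℓ, δℓ), ℓ = 1, …, k} is given by
$$\theta_k = \Big( \sum_{\ell=1}^{k} \hat H_\ell \Big)^{-1} \sum_{\ell=1}^{k} \hat H_\ell \hat\theta_\ell, \tag{2.12}$$
which is equivalent to
$$\theta_k = (H_{k-1} + \hat H_k)^{-1} (H_{k-1} \theta_{k-1} + \hat H_k \hat\theta_k), \tag{2.13}$$
where Hk = Σℓ=1..k Ĥℓ = Hk−1 + Ĥk is the cumulative negated Hessian matrix and H0 = 0p+J is the (p + J) × (p + J) matrix of zeros.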
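Assuming (2.13) takes the standard cumulative form θk = (Hk−1 + Ĥk)^{-1}(Hk−1θk−1 + Ĥkθ̂k) of Schifano et al. (2016), with θ̂k and Ĥk the current-block estimate and negated Hessian, one update needs only the previous pair (θk−1, Hk−1) and the current block's summary. The NumPy sketch below uses a hypothetical function name, `online_update`, and synthetic block summaries, and checks that the sequential form agrees with the one-shot weighted average over all blocks.

```python
import numpy as np

def online_update(theta_prev, H_prev, theta_hat_k, H_hat_k):
    """H_k = H_{k-1} + H_hat_k and theta_k = H_k^{-1} (H_{k-1} theta_{k-1} + H_hat_k theta_hat_k).

    Only (theta_{k-1}, H_{k-1}) is carried forward; earlier raw data are not needed.
    """
    H_k = H_prev + H_hat_k
    theta_k = np.linalg.solve(H_k, H_prev @ theta_prev + H_hat_k @ theta_hat_k)
    return theta_k, H_k

rng = np.random.default_rng(0)
blocks = []
for _ in range(4):
    A = rng.normal(size=(3, 3))
    blocks.append((rng.normal(size=3), A @ A.T + 3.0 * np.eye(3)))  # (theta_hat_k, H_hat_k)

theta, H = np.zeros(3), np.zeros((3, 3))   # theta_0 = 0, H_0 = 0
for theta_hat, H_hat in blocks:
    theta, H = online_update(theta, H, theta_hat, H_hat)

# One-shot weighted form using all blocks at once.
batch = np.linalg.solve(sum(Hh for _, Hh in blocks), sum(Hh @ th for th, Hh in blocks))
print(np.allclose(theta, batch))  # True
```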
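A natural variance estimator of θk is
$$\hat V(\theta_k) = H_k^{-1} \Big\{ \sum_{\ell=1}^{k} \hat H_\ell \, \hat V(\hat\theta_\ell) \, \hat H_\ell \Big\} H_k^{-1}, \tag{2.14}$$
where θ̂ℓ and Ĥℓ are the estimators from subset ℓ and V̂(θ̂ℓ) = Ĥℓ^{-1} is the variance estimator of θ̂ℓ from that subset. Equation (2.14) thus simplifies to
$$\hat V(\theta_k) = H_k^{-1}. \tag{2.15}$$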
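Assuming the natural sandwich form for (2.14), Hk^{-1}{Σℓ Ĥℓ V̂(θ̂ℓ) Ĥℓ}Hk^{-1} with Hk = Σℓ Ĥℓ, the NumPy sketch below confirms numerically that it collapses to Hk^{-1} when each block variance is taken to be V̂(θ̂ℓ) = Ĥℓ^{-1}. The matrices are synthetic stand-ins for block negated Hessians, and `sandwich_variance` is a hypothetical name.

```python
import numpy as np

def sandwich_variance(H_hats, V_hats):
    """H_k^{-1} (sum_l H_hat_l V_hat_l H_hat_l) H_k^{-1} with H_k = sum_l H_hat_l."""
    H_inv = np.linalg.inv(sum(H_hats))
    meat = sum(Hh @ V @ Hh for Hh, V in zip(H_hats, V_hats))
    return H_inv @ meat @ H_inv

rng = np.random.default_rng(1)
H_hats = []
for _ in range(5):
    A = rng.normal(size=(4, 4))
    H_hats.append(A @ A.T + np.eye(4))
V_hats = [np.linalg.inv(Hh) for Hh in H_hats]     # block variances V_hat_l = H_hat_l^{-1}

V_k = sandwich_variance(H_hats, V_hats)
print(np.allclose(V_k, np.linalg.inv(sum(H_hats))))  # True: the sandwich collapses to H_k^{-1}
std_errors = np.sqrt(np.diag(V_k))                # standard errors of the components of theta_k
```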
Remark 2.3.1. A conventional online-updating algorithm for θk is given by
$$\theta_k = (1 - \gamma_k)\,\theta_{k-1} + \gamma_k \hat\theta_k,$$
where θk is a weighted average of the cumulative estimator θk−1 from the last update and the current estimator θ̂k, with scalar weights γk satisfying Σk γk = ∞ and Σk γk² < ∞. Cappé (2011), for example, proposed γk = k^{−α}, which depends only on k and α ∈ (0.5, 1]. Our proposed estimator in (2.13) is also a weighted average of θk−1 and θ̂k, but with matrix weights Hk^{-1}Hk−1 and Hk^{-1}Ĥk. The second-order bias of θk can be further reduced through carefully chosen weights and the bias correction terms introduced in Sections 2.5 and 2.6.
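For contrast, here is a minimal sketch of the conventional scalar-weight update θk = (1 − γk)θk−1 + γkθ̂k with Cappé's choice γk = k^{−α}; the estimates are plain Python lists and the function name is hypothetical. With α = 1 the recursion reduces to the running mean of the block estimates, so every block receives the same weight regardless of how much information it carries, whereas Hessian-based matrix weights weight each block by its information.

```python
def scalar_weight_update(estimates, alpha=1.0):
    """theta_k = (1 - gamma_k) theta_{k-1} + gamma_k theta_hat_k with gamma_k = k ** (-alpha).

    gamma_1 = 1, so theta_1 = theta_hat_1.
    """
    theta = None
    for k, est in enumerate(estimates, start=1):
        gamma = k ** (-alpha)
        if theta is None:
            theta = list(est)
        else:
            theta = [(1.0 - gamma) * a + gamma * b for a, b in zip(theta, est)]
    return theta

# With alpha = 1 the update is the running mean of the block estimates.
print(scalar_weight_update([[1.0], [2.0], [3.0], [6.0]], alpha=1.0))  # approximately [3.0]
```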
2.4. Adaptive Partition
In the previous section, we select the partition based on the first block of data, and assume the partition is fixed during the entire online-updating procedure. However, this assumption may not be ideal as more and more data accumulate. For the following blocks of data, the number of events within each interval may vary tremendously. Some intervals may contain zero events (as mentioned in the previous section) while some intervals may contain an overwhelming number of events. Thus, instead of sticking with the initial partition, it would be desirable to allow for an adaptive partition, i.e., splitting the interval with too many events into subintervals. As more and more data arrive, the partition of the piecewise constant hazard function becomes finer and finer. With the increasingly fine partition, the fitted baseline hazard function should be able to capture the true baseline hazard function.
Assume that for the (k − 1)th block we have J intervals, the (p + J)-dimensional cumulative estimator θk−1, and the (p + J) × (p + J) cumulative negated Hessian matrix Hk−1. For the kth block, if all intervals contain a similar number of events (see Remark 2.4.4), we continue using the partition from the (k − 1)th block. Otherwise, WLOG, assume that the Jth interval has the maximum number of events; we then split this interval into two subintervals.
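The choice of which interval to split can be sketched in Python as below. Picking the interval with the most events follows the text; placing the split at the median event time within that interval is our own hypothetical choice, since the split point is not specified here, and it has the convenient property of also working for the unbounded last interval [aJ−1, ∞).

```python
from bisect import bisect_right

def split_busiest_interval(cuts, event_times):
    """Split the interval with the most events into two.

    cuts = [a_0 = 0, ..., a_{J-1}] with a_J = inf. The split point (median event time
    inside the chosen interval) is a hypothetical rule for illustration only.
    Returns (new_cuts, index_of_split_interval), or (cuts, None) if no split is possible.
    """
    buckets = [[] for _ in cuts]
    for t in event_times:
        buckets[bisect_right(cuts, t) - 1].append(t)
    j = max(range(len(cuts)), key=lambda m: len(buckets[m]))
    pts = sorted(buckets[j])
    if not pts:                          # no events at all
        return list(cuts), None
    split = pts[len(pts) // 2]
    if split <= cuts[j]:                 # all events sit on the left endpoint
        return list(cuts), None
    return sorted(list(cuts) + [split]), j

print(split_busiest_interval([0.0, 1.0], [0.1, 0.2, 1.5, 2.0, 3.0, 4.0]))  # ([0.0, 1.0, 3.0], 1)
```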
The estimator θ̂k of the current block has length p + J + 1, and the negated Hessian matrix Ĥk of the current block has dimension (p + J + 1) × (p + J + 1). However, θk−1 and θ̂k, as well as Hk−1 and Ĥk, do not have the same dimensions and therefore cannot be added. To resolve this problem, we need to expand both θk−1 and Hk−1. For this purpose, let the superscript * denote the expansion of a matrix or a vector, and denote by λ*J1 and λ*J2 the unknown constant hazards for the two new subintervals of the Jth interval at accumulation point k − 1. Since λ*J1 and λ*J2 are unknown but closely related to λJ, we assume
| (2.16) |
where f is a certain function. Further assume
| (2.17) |
where and are constants.
The expanded cumulative negated Hessian matrix at block k − 1 is obtained by the chain rule,
| (2.18) |
To be specific, we introduce the (p + J + 1) × (p + J) expansion matrix Pk−1, where Pk−1(i, i) = 1, i = 1, …, (p + J − 1), , , and 0 elsewhere. Then,
Let and denote the expanded cumulative estimator for θ and the corresponding variance, respectively. We further introduce the (p + J + 1) × (p + J) expansion matrix Qk−1, where Qk−1(i, i) = 1, i = 1, …, (p + J − 1), , , and 0 elsewhere. We discuss the choice of constants and , as well as and , in Remark 2.4.1 and Section 2.6. We thus have
and
Finally, the online-updating estimator of θ based on cumulative data Dk and finer partition of piecewise baseline hazard function is given by
| (2.19) |
An approximate variance estimator is given by
| (2.20) |
Remark 2.4.1. We further impose constraints on , , , and ( and all are positive) to reduce the bias of the new estimator in Section 2.6. The choice of and depends on the underlying baseline hazard. If the baseline hazard function is strictly increasing (decreasing), we set (). If the baseline hazard function is not strictly monotone or we do not know the true baseline hazard function, we set as in the simulation studies and real data analyses. To satisfy , we further set .
Remark 2.4.2. More complicated baseline hazard functions will require more pieces in the partition (larger J) in order to guarantee the consistency of the estimators. However, the gain from increasing the number of pieces in the partition should be balanced with the issues of power and ease of computation (Holford 1976). Thus, we may stop increasing the number of pieces in the partition once J reaches certain value Jmax, which depends on the true baseline hazard function if given. If the true baseline hazard function is unknown, we may stop increasing J such that the relative “distance” between the estimated baseline hazard functions at the previous and the current accumulation points is small enough, i.e., for a given small ϵ > 0, there exists k, such that , where, , and set the number of pieces at accumulation point k as Jmax.
Remark 2.4.3. More than one interval could be split at a given block k. However, if J is increased by only 1 at block k, the cumulative negated Hessian matrix at the current block is positive definite (p.d.) except in a degenerate case. This can be shown by mathematical induction.
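When k = 1, H1 = Ĥ1, which is p.d. Assume that Hk−1 is p.d., and that at the kth update the Jth interval is split into two subintervals. Partition the expanded matrix as
$$H^*_{k-1} = \begin{pmatrix} A_{11} & a_{12} \\ a_{12}^\top & a_{22} \end{pmatrix},$$
where the leading (p + J) × (p + J) principal submatrix A11 is p.d. given that Hk−1 is p.d. Similarly, partition
$$\hat H_k = \begin{pmatrix} B_{11} & b_{12} \\ b_{12}^\top & b_{22} \end{pmatrix},$$
where the leading (p + J) × (p + J) principal submatrix B11 is at least positive semidefinite (p.s.d.), since Ĥk is always at least p.s.d. We then have
$$H_k = H^*_{k-1} + \hat H_k = \begin{pmatrix} A_{11} + B_{11} & a_{12} + b_{12} \\ (a_{12} + b_{12})^\top & a_{22} + b_{22} \end{pmatrix},$$
where A11 + B11 is p.d. A congruence transformation, which preserves definiteness, reduces Hk to
$$\begin{pmatrix} A_{11} + B_{11} & 0 \\ 0 & m \end{pmatrix}, \qquad m = (a_{22} + b_{22}) - (a_{12} + b_{12})^\top (A_{11} + B_{11})^{-1} (a_{12} + b_{12}) \ge 0,$$
where m ≥ 0 because Hk is the sum of two p.s.d. matrices. Hk is thus p.d. if m ≠ 0. The result is further confirmed by both simulation studies and real data analyses.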
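The induction step can be checked numerically. The NumPy sketch below assumes the expansion takes the form H*k−1 = Pk−1 Hk−1 Pk−1⊤, builds Pk−1 with the layout described in Section 2.4, and sets its two unspecified entries to hypothetical values c1 = c2 = 1. It shows that H*k−1 alone is singular, while adding a full-rank current-block matrix gives a positive Schur complement m and a p.d. Hk.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4                                   # p + J before the split
A = rng.normal(size=(n, n))
H_prev = A @ A.T + np.eye(n)            # cumulative negated Hessian H_{k-1}, p.d.

# Expansion matrix with the layout of Section 2.4; the two free entries
# (c1, c2) are hypothetical values chosen only for this check.
c1, c2 = 1.0, 1.0
P = np.zeros((n + 1, n))
P[:n - 1, :n - 1] = np.eye(n - 1)
P[n - 1, n - 1], P[n, n - 1] = c1, c2
H_star = P @ H_prev @ P.T               # expanded matrix: p.s.d. with rank n, not p.d.

B = rng.normal(size=(n + 1, n + 1))
H_hat = B @ B.T                         # current-block negated Hessian
H_k = H_star + H_hat

M11, m12, m22 = H_k[:n, :n], H_k[:n, n], H_k[n, n]
m = m22 - m12 @ np.linalg.solve(M11, m12)    # Schur complement of the leading block
print(np.linalg.eigvalsh(H_star).min())      # close to 0: the expanded matrix alone is singular
print(m > 0, np.all(np.linalg.eigvalsh(H_k) > 0))
```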
Remark 2.4.4. For the kth block of data, we partition the jmaxth interval into two subintervals, where with expansion rate rk ≥ 1, for k = 1, …, K. In this paper, we set the expansion rate rk = 1 throughout. The expansion rate, however, does not need to be the same throughout the updates. Similar to the idea of simulated annealing, we can set rk relatively small at early stages (when k is small) to speed up the partition and quickly approximate the underlying baseline hazard function. Once the estimated baseline hazard function is relatively “stable” (see Remark 2.4.2), we then gradually increase rk to slow down the partition.
2.5. Fixed Partition and Bias Correction
Denote as the negated score function for the current block, which is defined in the same way as the score function in Section 2.1. In order to reduce the finite-sample bias introduced by (2.13) where the total number of intervals of the piecewise hazard function is fixed, similar to Schifano et al. (2016), we consider the Taylor expansion of around a vector to be defined later. Then
| (2.21) |
with denoting the remainder. Denote θk as the solution of
| (2.22) |
Defining and , then we have
| (2.23) |
which can be written sequentially as
| (2.24) |
with H0 = 0p+J and θ0 = 0.
We observe that
Thus, we have . Using the above approximation, the variance formula is given by
| (2.25) |
Remark 2.5.1. If we choose , then θk in (2.23) reduces to the estimator in (2.13), and bias is not corrected. Ideally, we should choose the intermediary estimator in a small neighborhood of the true θ, which is unknown but can be best approximated by utilizing all the cumulative information. Thus, we consider the intermediary estimator for fixed number of intervals given by
| (2.26) |
for k = 1, 2, …, H0 = 0p+J, and θ0 = 0.
2.6. Adaptive Partition and Bias Correction
Following Section 2.5, we propose a new estimator that allows for an adaptive partition of the piecewise hazard function and has less bias. To simplify the presentation, assume that the number of pieces of the hazard function is not increased until the kth block, and that there are J pieces in the partition in block k − 1. Denote by the negated score function for block ℓ, where θp+J is a vector of length p + J, for ℓ = 1, 2, …, k − 1.
We start with (2.21) by summing over ℓ from one to k − 1:
Based on (2.23), we have
| (2.27) |
Note that (2.27) still holds if we multiply both sides by Pk−1,
| (2.28) |
We set Pk−1 and Qk−1 as described in Section 2.4 such that . This is the same as putting constraints on , , , and in (2.17) such that . Denote the expanded cumulative score function as , where . We thus have an equivalent representation of (2.28) as
| (2.29) |
Additionally, for the kth block, we have
| (2.30) |
Denote θk of length (p + J + 1) as the solution for the sum of (2.29) and (2.30), then we have
| (2.31) |
where
and
An approximate variance estimator of θk is given by
| (2.32) |
where
Remark 2.6.1. If we allow for increasing number of intervals, in (2.26) becomes
Remark 2.6.2. We assume for simplicity of notation that nk = n for all k = 1, 2, …, K. Let β0 denote the true value of β in the multiplicative intensity model. Denote by βn and βN the MPLEs of the Cox model based on the n current observations and the N entire observations, respectively. Under Conditions A-D of Friedman (1982) and P (ti ≥ Tmax) > 0 (Fleming and Harrington 2011) for i = 1, …, n, where Tmax is a finite stopping time, then for each block, and .
It is well known that the estimated cumulative baseline hazard function is consistent under Conditions A-D of Friedman (1982) (Whittemore and Keller 1986). Furthermore, according to (2.6), , the width of each interval goes to 0 (Condition B of Friedman (1982)), and by Chebyshev’s weak law of large numbers, , for all j. Given that aj is dense in (0, Tmax], for any converges in probability to λ0(t).
Remark 2.6.3. Both simulation studies and real data analyses show that, under standard regularity conditions (Conditions (2.1)–(2.6) of Fleming and Harrington (2011)), the online-updating estimator βK in (2.31) has a convergence rate similar to that of the MPLE βN based on the entire data as the number of blocks K and the number of pieces J increase, provided J does not increase too fast (Condition C of Friedman (1982)). Furthermore, the estimated baseline hazard function converges pointwise to the true baseline hazard function as K and J increase.
2.7. Cumulative Inference
Because the baseline hazard function is also updated online, the proposed methods allow us to conduct cumulative statistical inference on survival probabilities as a by-product. For example, comparisons of survival rates between groups at certain time points, as well as estimates of the entire survival curve, are routinely reported in the clinical literature.
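The cumulative estimated survival function at the kth block is given by
$$\hat S_k(t \mid x) = \exp\Big\{ -\exp(x^\top \beta_k) \sum_{j=1}^{J} \lambda_{k,j} \Delta_{k,j}(t) \Big\},$$
where 0 = ak0 < ak1 < … < akJ = ∞ is the updated partition at the kth block, βk and λk = (λk,1, …, λk,J)⊤ are the components of θk, and Δk,j(t) is defined in the same way as in (2.3) but corresponds to the new partition. By the delta method, the approximate variance estimator of Ŝk(t | x) is g⊤ V̂(θk) g, where g = ∂Ŝk(t | x)/∂θ has components
$$\frac{\partial \hat S_k(t \mid x)}{\partial \beta} = -\hat S_k(t \mid x) \exp(x^\top \beta_k) \Big\{ \sum_{j=1}^{J} \lambda_{k,j} \Delta_{k,j}(t) \Big\} x, \qquad \frac{\partial \hat S_k(t \mid x)}{\partial \lambda_j} = -\hat S_k(t \mid x) \exp(x^\top \beta_k) \Delta_{k,j}(t).$$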
The 100(1−α)% pointwise confidence interval for the survival function is thus given by
where and z1−α/2 is the (1 − α / 2) th quantile of the standard normal N(0, 1) distribution.
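A minimal NumPy sketch of the survival estimate and its delta-method standard error for a single covariate vector x. The gradient follows from differentiating exp{−exp(x⊤β) Σj λjΔj(t)} with respect to β and λ. Because the exact form of the interval above is not reproduced here, the sketch uses a plain Wald interval Ŝ ± z1−α/2 × SE clipped to [0, 1], which is our assumption; the function name and the values of θ and V in the usage lines are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def survival_with_ci(t, x, theta, V, cuts, level=0.95):
    """S(t | x), its delta-method SE, and a clipped Wald interval (interval form assumed)."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    beta, lam = theta[:x.size], theta[x.size:]
    lower = np.asarray(cuts, dtype=float)
    upper = np.append(lower[1:], np.inf)
    delta = np.clip(np.minimum(t, upper) - lower, 0.0, None)    # Delta_j(t)
    w = np.exp(x @ beta)
    cum0 = delta @ lam                                          # baseline cumulative hazard at t
    S = float(np.exp(-w * cum0))
    grad = np.concatenate([-S * w * cum0 * x, -S * w * delta])  # dS/dbeta, dS/dlambda
    se = float(np.sqrt(grad @ V @ grad))
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return S, se, (max(0.0, S - z * se), min(1.0, S + z * se))

theta = np.array([0.3, 0.5, 1.0])        # hypothetical (beta_1, lambda_1, lambda_2)
V = 1e-4 * np.eye(3)                     # hypothetical variance estimate of theta
print(survival_with_ci(2.0, [1.0], theta, V, cuts=[0.0, 1.0]))
```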
3. Simulation Studies
3.1. Simulation I
In Simulation I (censoring rate ≈ 45%), we first investigate the impact of the initial number of intervals J0 on the performance of the four approaches: fixed/adaptive partition with/without bias correction. We then focus on the “optimal” approaches, fixed/adaptive partition with bias correction, and compare them with the MPLE obtained by fitting the entire dataset at once using the SAS procedure PHREG and with the cumulatively updated estimating equation (CUEE) estimator (Xue et al. 2020).
Simulation Setting
We generate B = 500 datasets, each with survival times ti, i = 1, …, N, generated independently from a Cox proportional hazards model with baseline hazard function given by
| (3.1) |
If we assume a Weibull distribution for h0(t), then h(t) = 1.2 t^0.2 exp(−2) + 0.1 exp(−0.35t) sin(5t). For the linear predictor in the Cox model, let β = (1, 0.5, −2.0)⊤, xki[1] ~ N(0, 1), xki[2] ~ Bernoulli(0.5), and xki[3] ~ Bernoulli(0.6), independently. After sampling the survival time for each subject, we generate the corresponding censoring time as min(Tmax, Uniform(0.7 Tmax, 1.5 Tmax)), where Tmax = 10. The variable “event” equals 1 if the survival time is smaller than the censoring time, and 0 otherwise. We set the total sample size N = 500,000 and the number of blocks K = 200, with nk = 2,500. The average event rates are 15.7%, 17.5%, 9.0%, and 12.4% for arms with (xki[2], xki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.
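The Simulation I design for a single block can be reproduced roughly as below. Because this baseline hazard has no closed-form inverse cumulative hazard, the sketch integrates h(t) numerically (trapezoid rule on a fine grid) and inverts it by interpolation; this inverse-transform approach, the grid settings, and the function names are our own choices, not necessarily those used for the reported results.

```python
import numpy as np

def baseline_hazard(t):
    """h(t) = 1.2 t^0.2 exp(-2) + 0.1 exp(-0.35 t) sin(5 t); positive on (0, 15]."""
    return 1.2 * t**0.2 * np.exp(-2.0) + 0.1 * np.exp(-0.35 * t) * np.sin(5.0 * t)

def simulate_block(n, rng, beta=(1.0, 0.5, -2.0), t_max=10.0, grid_size=20001):
    """One block (times, events, X) from the Simulation I design via inverse-transform sampling."""
    X = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.5, n), rng.binomial(1, 0.6, n)])
    grid = np.linspace(0.0, 1.5 * t_max, grid_size)
    h = baseline_hazard(grid)
    H0 = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(grid))])  # trapezoid rule
    # Solve H0(T) exp(x'beta) = -log(U) by interpolating the inverse of H0;
    # draws beyond the grid become inf and are always censored.
    target = -np.log(rng.uniform(size=n)) / np.exp(X @ np.asarray(beta))
    T = np.interp(target, H0, grid, right=np.inf)
    C = np.minimum(t_max, rng.uniform(0.7 * t_max, 1.5 * t_max, size=n))
    return np.minimum(T, C), (T < C).astype(int), X

rng = np.random.default_rng(2024)
times, events, X = simulate_block(2500, rng)      # one block with n_k = 2,500
print(f"censoring rate: {1 - events.mean():.3f}")
```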
To examine the effect of the initial number of intervals J0 on the performance of the four proposed estimators, we let J0 take the values 1, 5, 10, and 15. Note that for the fixed partition estimators, J0 stays the same (at 1, 5, 10, or 15) throughout the updates. For the adaptive estimators, at each update we split the interval with the maximum number of events into two subintervals until the total number of intervals reaches Jmax = 50.
The Impact of J0
In Table 1, we report the average bias (Bias), the average of the estimated standard errors (ASE), the standard deviation of the estimates across replications (SE), the root mean squared error (RMSE), the coverage probability (CP) of the 95% confidence intervals, and the computation time in minutes. The MPLE has little bias since it is obtained from the entire dataset. The computation time of MPLE is not reported because MPLE was computed in SAS, whereas CUEE and the proposed approaches were implemented in FORTRAN using IMSL subroutines with double precision on an Intel i7-4770 machine with 16 GB of RAM running GNU/Linux; the computation times are therefore not comparable. As expected, the computation time of the fixed partition approaches increases with J0, while that of the adaptive approaches is similar across values of J0. All proposed approaches had shorter computation times than CUEE.
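The summary measures in Table 1 can be computed from the replicate estimates of one parameter as sketched below. We read SE as the Monte Carlo standard deviation of the estimates and CP as the proportion of Wald intervals (estimate ± z × its own estimated standard error) that cover the true value; both readings, and the helper name `replication_summary`, are assumptions.

```python
import math
from statistics import NormalDist, stdev

def replication_summary(estimates, std_errors, truth, level=0.95):
    """Bias, ASE, SE, RMSE, and Wald coverage over Monte Carlo replications of one parameter."""
    B = len(estimates)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    covered = sum(abs(e - truth) <= z * s for e, s in zip(estimates, std_errors))
    return {
        "Bias": sum(estimates) / B - truth,
        "ASE": sum(std_errors) / B,
        "SE": stdev(estimates),            # sample SD (divisor B - 1) of the estimates
        "RMSE": math.sqrt(sum((e - truth) ** 2 for e in estimates) / B),
        "CP": covered / B,
    }

print(replication_summary([0.9, 1.1, 1.0, 1.3], [0.1, 0.1, 0.1, 0.1], truth=1.0))
```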
Table 1.
Estimates from 500 replications and computation time in minutes under the MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J0, in Simulation I.
| Variable | Method | J0 | Bias | ASE | SE | RMSE | CP | Time (min) |
|---|---|---|---|---|---|---|---|---|
| β1 | MPLE | — | 0.0001 | 0.0023 | 0.0022 | 0.0022 | 0.958 | — |
| | CUEE | — | 0.0000 | 0.0023 | 0.0022 | 0.0022 | 0.958 | 876.1 |
| | Fixed & BC | 1 | −0.0621 | 0.0021 | 0.0019 | 0.0622 | 0.000 | 44.8 |
| | Fixed & BC | 5 | −0.0085 | 0.0023 | 0.0043 | 0.0095 | 0.186 | 62.4 |
| | Fixed & BC | 10 | −0.0027 | 0.0023 | 0.0023 | 0.0036 | 0.774 | 68.8 |
| | Fixed & BC | 15 | −0.0013 | 0.0023 | 0.0023 | 0.0026 | 0.916 | 73.1 |
| | Adapt & BC | 1 | −0.0021 | 0.0023 | 0.0063 | 0.0066 | 0.856 | 82.6 |
| | Adapt & BC | 5 | 0.0001 | 0.0023 | 0.0023 | 0.0023 | 0.962 | 103.3 |
| | Adapt & BC | 10 | 0.0002 | 0.0023 | 0.0022 | 0.0023 | 0.964 | 102.9 |
| | Adapt & BC | 15 | 0.0002 | 0.0023 | 0.0022 | 0.0023 | 0.962 | 104.6 |
| β2 | MPLE | — | 0.0002 | 0.0039 | 0.0037 | 0.0037 | 0.960 | — |
| | CUEE | — | 0.0002 | 0.0039 | 0.0037 | 0.0037 | 0.958 | — |
| | Fixed & BC | 1 | −0.0309 | 0.0039 | 0.0035 | 0.0311 | 0.000 | — |
| | Fixed & BC | 5 | −0.0038 | 0.0039 | 0.0041 | 0.0056 | 0.834 | — |
| | Fixed & BC | 10 | −0.0008 | 0.0039 | 0.0038 | 0.0038 | 0.954 | — |
| | Fixed & BC | 15 | 0.0001 | 0.0040 | 0.0037 | 0.0037 | 0.960 | — |
| | Adapt & BC | 1 | 0.0014 | 0.0039 | 0.0049 | 0.0051 | 0.904 | — |
| | Adapt & BC | 5 | 0.0024 | 0.0041 | 0.0038 | 0.0045 | 0.930 | — |
| | Adapt & BC | 10 | 0.0023 | 0.0041 | 0.0038 | 0.0044 | 0.932 | — |
| | Adapt & BC | 15 | 0.0021 | 0.0041 | 0.0037 | 0.0043 | 0.932 | — |
| β3 | MPLE | — | −0.0001 | 0.0044 | 0.0043 | 0.0043 | 0.954 | — |
| | CUEE | — | 0.0000 | 0.0044 | 0.0043 | 0.0043 | 0.954 | — |
| | Fixed & BC | 1 | 0.1202 | 0.0040 | 0.0037 | 0.1204 | 0.000 | — |
| | Fixed & BC | 5 | 0.0140 | 0.0044 | 0.0067 | 0.0155 | 0.198 | — |
| | Fixed & BC | 10 | 0.0039 | 0.0044 | 0.0044 | 0.0058 | 0.868 | — |
| | Fixed & BC | 15 | 0.0023 | 0.0044 | 0.0043 | 0.0049 | 0.926 | — |
| | Adapt & BC | 1 | 0.0061 | 0.0044 | 0.0121 | 0.0136 | 0.774 | — |
| | Adapt & BC | 5 | 0.0018 | 0.0045 | 0.0043 | 0.0046 | 0.942 | — |
| | Adapt & BC | 10 | 0.0015 | 0.0045 | 0.0043 | 0.0046 | 0.944 | — |
| | Adapt & BC | 15 | 0.0014 | 0.0045 | 0.0043 | 0.0045 | 0.944 | — |
We first focus on the fixed partition and bias correction approach. As shown in Table 1, the estimator with J0 = 1 is severely biased for all three coefficients (a relative bias of about 6%), with coverage probabilities of zero. As expected, the bias decreases as J0 increases; with J0 = 5, however, noticeable bias remains for β1 and β3 (CPs of 0.186 and 0.198). The ASEs are close to those of MPLE for all J0 > 1, and the SEs and RMSEs approach those of MPLE as J0 increases, with CPs between 0.916 and 0.960 at J0 = 15. Similar results are observed in Figures S4 and S5. Figure S4 shows boxplots of the biases of the MPLE from SAS and of the fixed partition and bias correction estimator of βj, j = 1, …, 3, under different values of J0. The corresponding standard errors are shown in Figure S5. Figure S6 shows the fitted baseline hazard function of the fixed partition and bias correction approach under different values of J0. Again, as the initial partition becomes finer (J0 increases), the estimated piecewise baseline hazard function aligns better with the true baseline hazard function. However, even with J0 = 15, the fixed partition and bias correction approach still cannot fully recover the complicated true baseline hazard function.
Comparison between the MPLE and the Four Proposed Estimators (J0 = 5)
Fixing J0 = 5, we next compare MPLE with the four proposed approaches (Figures 1–3). A comparison of the biases is shown in Figure 1. Among the four proposed approaches, the adaptive partition and bias correction approach yields the least biased estimates of βj for all j = 1, …, 3. According to Figure 2, the standard errors of the fixed partition approaches (with and without bias correction) and of the adaptive partition and bias correction approach are similar to that of MPLE, while the standard error of the adaptive partition and no bias correction approach is slightly smaller. As shown in Figure 3, even with J0 = 5, the adaptive partition and bias correction approach successfully recovers the true baseline hazard, with the fitted baseline hazard function (blue) almost overlapping the true baseline hazard function (red). The adaptive partition and no bias correction approach captures the shape of the true function but is biased. The two fixed partition approaches, with and without bias correction, fail to recover the true baseline hazard function for small values of J0. Additional comparison figures are given in the Supplementary Materials.
Fig. 1.

Boxplots of bias for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J0 = 5, in Simulation I.
Fig. 2.

Boxplots of standard error for the 5 types of estimators (MPLE, fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction), with J0 = 5, in Simulation I.
Fig. 3.

Estimated baseline hazard functions for (a) fixed partition and no bias correction, (b) fixed partition and bias correction, (c) adaptive partition and no bias correction, and (d) adaptive partition and bias correction, with J0 = 5. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation I.
Comparison between MPLE, CUEE, and the Bias-corrected Estimators
We now compare MPLE, CUEE, and the “optimal” bias correction methods, i.e., the fixed and adaptive partition approaches with bias correction, for varying J0 (Table 1). As with the fixed partition and bias correction approach, the bias of the adaptive partition and bias correction approach decreases as J0 increases, although J0 = 1 already performs quite well. Its ASEs, SEs, and RMSEs are close to those of MPLE, and its CPs are close to 95% for all values of J0.
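The summary measures Bias, ASE, SE, RMSE, and CP are not defined in this section. A minimal sketch under the usual simulation-study definitions (an assumption here): Bias is the mean estimate minus the true value, ASE the average of the estimated standard errors, SE the empirical standard deviation of the estimates across the B replications, RMSE the root mean squared error, and CP the proportion of nominal 95% Wald intervals that cover the true value.

```python
import numpy as np


def summarize_replicates(est, se, true_value, z=1.959963984540054):
    """Bias, ASE, SE, RMSE, and CP for one coefficient over B replications."""
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    return {
        "Bias": est.mean() - true_value,
        "ASE": se.mean(),                 # average model-based standard error
        "SE": est.std(ddof=1),            # empirical standard deviation
        "RMSE": np.sqrt(np.mean((est - true_value) ** 2)),
        "CP": covered.mean(),             # coverage of the 95% Wald interval
    }
```

Under these definitions RMSE² ≈ SE² + Bias², which is why the RMSE is close to |Bias| whenever the bias dominates.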
Both the adaptive partition and bias correction approach and CUEE perform well in terms of bias, ASE, SE, RMSE, and CP. One advantage of the proposed method is that it provides good estimates of the baseline hazard function, which CUEE does not provide. As shown in Figure S12, even with J0 = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach captures the shape of the true baseline hazard. As J0 increases, the fitted and true baseline hazard curves almost overlap, which further supports the method proposed in Section 2.6. Another advantage of the proposed method over CUEE is computation time, especially when the censoring rate is low (Simulation I).
3.2. Simulation II
In Simulation II (censoring rate ≈ 76%), we further compare the performance of MPLE, CUEE, and the proposed fixed and adaptive partition estimators with bias correction for varying J0.
Simulation Setting
We generate B = 500 datasets, each with survival times ti, i = 1, …, N, drawn independently from a Cox proportional hazards model with the baseline hazard function given in (3.1). The censoring times and the event indicator (“event”) are generated as in Simulation I. To achieve a high censoring rate, which is frequently encountered in practice, we let β = (1, −4.0, −4.0)⊤, xki[1] ~ N(−1, 1), xki[2] ~ Bernoulli(0.5), and xki[3] ~ Bernoulli(0.1 + 0.8xki[2]), so that the two binary covariates are highly correlated. We set the total sample size N = 5,000,000 and the number of blocks K = 1000 with nk = 500. The average event rates are 24.3%, 0.1%, 0.1%, and 0.0% for the arms with (xki[2], xki[3]) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively.
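The covariate design of Simulation II can be sketched as follows. The baseline hazard (3.1) and the censoring mechanism of Simulation I are not reproduced in this section, so the sketch assumes a hypothetical constant baseline hazard (`baseline_rate`) and omits censoring; under that assumption, T = E / {λ0 exp(x⊤β)} with E ~ Exp(1) is a draw from the Cox model.

```python
import numpy as np


def simulate_block(n, rng, beta=(1.0, -4.0, -4.0), baseline_rate=1.0):
    """Covariates as in Simulation II; survival times under an ASSUMED
    constant baseline hazard (the paper's hazard (3.1) is not used here)."""
    x1 = rng.normal(-1.0, 1.0, size=n)
    x2 = rng.binomial(1, 0.5, size=n)
    # x3 depends on x2, so the two binary covariates are highly correlated.
    x3 = rng.binomial(1, 0.1 + 0.8 * x2)
    x = np.column_stack([x1, x2, x3]).astype(float)
    rate = baseline_rate * np.exp(x @ np.asarray(beta, dtype=float))
    t = rng.exponential(1.0 / rate)  # hazard is constant in t for each subject
    return x, t
```

For example, `simulate_block(500, np.random.default_rng(1))` produces one block of size nk = 500.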
To examine the effect of the initial number of intervals J0 on the performance of the two proposed bias correction estimators, we let J0 take the values 1, 5, and 10. Because of the high censoring rate and rare events (the average event rate is about 0.0% in some arms), for the adaptive estimators we split the interval with the maximum number of events into two subintervals at each update until the total number of intervals reaches 30 (Jmax).
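The adaptive rule above (split the interval with the most events into two at each update, up to Jmax) can be sketched as follows. Where the new cut point is placed is not stated here; the sketch assumes the median of the event times inside the busiest interval, and it assumes the events counted are all events accumulated so far. `split_busiest_interval` is a hypothetical name.

```python
import numpy as np


def split_busiest_interval(cuts, event_times, j_max):
    """Split the interval holding the most events into two, unless J == j_max.

    cuts: increasing cut points (s_0, ..., s_J), intervals (s_{j-1}, s_j];
    the last cut must be at least the largest event time (np.inf is allowed).
    The new cut is the median event time in that interval (an assumption).
    """
    cuts = np.asarray(cuts, dtype=float)
    if len(cuts) - 1 >= j_max:
        return cuts
    event_times = np.asarray(event_times, dtype=float)
    idx = np.searchsorted(cuts[1:], event_times, side="left")
    counts = np.bincount(idx, minlength=len(cuts) - 1)
    j = int(np.argmax(counts))
    inside = event_times[idx == j]
    if inside.size == 0:
        return cuts
    new_cut = float(np.median(inside))
    if not cuts[j] < new_cut < cuts[j + 1]:
        return cuts  # cannot split strictly inside the interval
    return np.insert(cuts, j + 1, new_cut)
```

Calling this once per incoming block, with the cumulative event times, adds at most one piece per update.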
Comparison between MPLE, CUEE, and the Bias-corrected Estimators
We now compare MPLE, CUEE, and the “optimal” bias correction methods, i.e., the fixed and adaptive partition approaches with bias correction, for varying J0 (Table 2).
Table 2.
Estimates from 500 replications and computation time in minutes under MPLE, CUEE, fixed partition and bias correction (Fixed & BC), and adaptive partition and bias correction approaches (Adapt & BC), with varying J0, in Simulation II.
| Variable | Method | J0 | Bias | ASE | SE | RMSE | CP | Time (min) |
|---|---|---|---|---|---|---|---|---|
| β1 | MPLE | — | −0.0007 | 0.0034 | 0.0036 | 0.0036 | 0.942 | — |
| | CUEE | — | −0.0008 | 0.0034 | 0.0037 | 0.0038 | 0.920 | 192.7 |
| | Fixed & BC | 1 | −0.0530 | 0.0032 | 0.0032 | 0.0532 | 0.000 | 77.5 |
| | Fixed & BC | 5 | 0.0005 | 0.0034 | 0.0042 | 0.0043 | 0.874 | 83.2 |
| | Fixed & BC | 10 | −0.0008 | 0.0034 | 0.0037 | 0.0038 | 0.930 | 90.2 |
| | Adapt & BC | 1 | −0.0155 | 0.0034 | 0.0037 | 0.0160 | 0.004 | 111.8 |
| | Adapt & BC | 5 | 0.0002 | 0.0034 | 0.0037 | 0.0037 | 0.932 | 114.1 |
| | Adapt & BC | 10 | −0.0001 | 0.0034 | 0.0036 | 0.0036 | 0.946 | 113.8 |
| β2 | MPLE | — | −0.0014 | 0.0402 | 0.0391 | 0.0391 | 0.944 | — |
| | CUEE | — | 0.2953 | 0.0269 | 0.2145 | 0.3653 | 0.080 | — |
| | Fixed & BC | 1 | 0.0826 | 0.0395 | 0.0394 | 0.0916 | 0.442 | — |
| | Fixed & BC | 5 | 0.0064 | 0.0395 | 0.0397 | 0.0402 | 0.936 | — |
| | Fixed & BC | 10 | 0.0069 | 0.0395 | 0.0397 | 0.0403 | 0.930 | — |
| | Adapt & BC | 1 | 0.0295 | 0.0394 | 0.0397 | 0.0495 | 0.884 | — |
| | Adapt & BC | 5 | 0.0060 | 0.0396 | 0.0397 | 0.0401 | 0.936 | — |
| | Adapt & BC | 10 | 0.0062 | 0.0396 | 0.0396 | 0.0401 | 0.938 | — |
| β3 | MPLE | — | −0.0007 | 0.0402 | 0.0424 | 0.0424 | 0.932 | — |
| | CUEE | — | 0.3050 | 0.0268 | 0.2244 | 0.3789 | 0.086 | — |
| | Fixed & BC | 1 | 0.0836 | 0.0395 | 0.0425 | 0.0938 | 0.424 | — |
| | Fixed & BC | 5 | 0.0076 | 0.0394 | 0.0425 | 0.0432 | 0.930 | — |
| | Fixed & BC | 10 | 0.0080 | 0.0395 | 0.0425 | 0.0433 | 0.926 | — |
| | Adapt & BC | 1 | 0.0305 | 0.0394 | 0.0425 | 0.0523 | 0.850 | — |
| | Adapt & BC | 5 | 0.0071 | 0.0396 | 0.0425 | 0.0431 | 0.926 | — |
| | Adapt & BC | 10 | 0.0073 | 0.0396 | 0.0425 | 0.0431 | 0.928 | — |
First, CUEE does not perform well for the coefficients of the two binary covariates (β2 and β3) in the presence of a high censoring rate and rare events: the corresponding biases are large, the ASEs, SEs, and RMSEs are not comparable to those of MPLE, and the CPs are low.
The bias correction methods outperform CUEE in this setting. For both the fixed and adaptive partition approaches with bias correction, the bias of each parameter decreases as J0 increases, and J0 = 5 already performs quite well. For J0 > 1, the ASEs, SEs, and RMSEs are close to those of MPLE and the CPs are close to 95%. In addition, the adaptive partition and bias correction approach provides good estimates of the baseline hazard function, which CUEE does not provide. As shown in Figure 4, even with J0 = 1, the fitted baseline hazard function under the adaptive partition and bias correction approach captures the shape of the true baseline hazard. The computation time of the proposed methods (between 77.5 and 114.1 minutes in Table 2) is also shorter than that of CUEE (192.7 minutes), even when the censoring rate is high (Simulation II).
Fig. 4.

Estimated baseline hazard functions for the adaptive partition and bias correction approach, for (a) J0 = 1, (b) J0 = 5, and (c) J0 = 10. The red curve is the true baseline hazard function, the blue curve is the estimated baseline hazard function, and the two black curves represent the pointwise 95% confidence intervals in Simulation II.
4. Analyses of Real Data
4.1. Analysis of the SEER Colon Cancer Data
We examine the SEER colon cancer data from 1973 to 2013, available at https://seer.cancer.gov/data/. After deleting all subjects with missing covariates and all subjects with survival time less than three months, the data set contains 315,120 observations. For illustration purposes, we treat the survival time in the SEER data as continuous. We set the maximum censoring time (Tmax) to 18 months, meaning that a subject is considered censored if the event occurs after 18 months. We focus on this early stage (≤ 18 months) because colon cancer is highly curable. Under this scenario, the total number of events, including both deaths from colon cancer and deaths from other causes, is 67,798, giving a censoring rate of 78.49%. The covariates in our analysis are year of diagnosis (Year), which is continuous, and the surgery treatment indicator (RP), which is binary. Among the 67,798 patients, 4,586 underwent surgery treatment. The proportional hazards assumption is supported by the test of Grambsch and Therneau (1994).
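The truncation at Tmax = 18 months amounts to administrative censoring: an event observed after Tmax is recoded as a censored observation at Tmax. A minimal sketch (the function name is hypothetical), together with the censoring-rate arithmetic reported above:

```python
import numpy as np


def censor_at(time, event, t_max=18.0):
    """Recode follow-up so that nothing after t_max counts as an event."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    new_event = event & (time <= t_max)   # events after t_max become censored
    new_time = np.minimum(time, t_max)    # follow-up is cut at t_max
    return new_time, new_event


# Censoring rate in the SEER analysis: 1 - 67,798 / 315,120.
censoring_rate = 1.0 - 67_798 / 315_120
print(f"{100 * censoring_rate:.2f}%")  # prints 78.49%
```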
We use blocks of size nk = 2,500, k = 1, …, 127, to analyze the data in the online-updating framework. To examine the effect of the initial number of intervals J0 on the performance of the proposed estimators, we let J0 take the values 1, 3, and 5, given the high censoring rate. For the adaptive partition approaches, we allow the number of pieces to increase by at most one at each update until the total number of pieces reaches 14 (Jmax). We set and since we do not know the underlying baseline hazard function.
The Impact of J0
As shown in Table 3, for the continuous covariate Year, the estimates and standard errors of the four approaches, i.e., fixed partition and no bias correction, fixed partition and bias correction, adaptive partition and no bias correction, and adaptive partition and bias correction, are similar across values of J0.
Table 3.
Estimates (Est) and standard errors (SE) for the SEER colon cancer data. Fixed/Adapt: fixed/adaptive partition; BC/NBC: bias correction/no bias correction.
| Method | J0 | RP Est | RP SE | Year Est | Year SE |
|---|---|---|---|---|---|
| MPLE | — | 0.14552 | 0.03285 | −0.17462 | 0.00393 |
| CUEE | — | 0.14798 | 0.03535 | −0.17469 | 0.00392 |
| Fixed & NBC | 1 | 0.22358 | 0.03285 | −0.17518 | 0.00394 |
| Fixed & NBC | 3 | 0.22697 | 0.03284 | −0.17585 | 0.00393 |
| Fixed & NBC | 5 | 0.23128 | 0.03284 | −0.17650 | 0.00393 |
| Fixed & BC | 1 | 0.14774 | 0.03327 | −0.17482 | 0.00394 |
| Fixed & BC | 3 | 0.14755 | 0.03327 | −0.17479 | 0.00394 |
| Fixed & BC | 5 | 0.14769 | 0.03327 | −0.17477 | 0.00394 |
| Adapt & NBC | 1 | 0.26512 | 0.03283 | −0.18206 | 0.00393 |
| Adapt & NBC | 3 | 0.25180 | 0.03284 | −0.17967 | 0.00393 |
| Adapt & NBC | 5 | 0.25177 | 0.03284 | −0.17967 | 0.00393 |
| Adapt & BC | 1 | 0.16063 | 0.03371 | −0.17688 | 0.00400 |
| Adapt & BC | 3 | 0.14834 | 0.03331 | −0.17477 | 0.00395 |
| Adapt & BC | 5 | 0.14853 | 0.03328 | −0.17480 | 0.00395 |
For the binary covariate RP, the estimates under the adaptive partition approaches (with and without bias correction) move closer to the MPLE estimate as J0 increases, and J0 = 3 already performs quite well for the adaptive partition and bias correction approach. The estimates under the fixed partition approaches (with and without bias correction) are stable across values of J0. All standard errors are similar to those of MPLE for every value of J0.
Comparison between MPLE, CUEE, and the Four Proposed Estimators (J0 = 5)
Holding J0 fixed at 5, we compare the estimates and standard errors of MPLE, CUEE, and the four approaches. In Table 3, the bias correction approaches (fixed and adaptive partition), MPLE, and CUEE give the most similar regression coefficients and standard errors, except that CUEE has a slightly larger SE for the binary covariate RP. The two approaches without bias correction (fixed and adaptive partition) give similar results for the continuous covariate Year but very different results for the binary covariate RP.
SAS PHREG does not directly provide the baseline hazard function without an assumption about its form. We therefore obtain the baseline hazard function from the entire data set by assuming that the baseline hazard is piecewise constant with all distinct event times as cut points, i.e., the Breslow estimator (Breslow 1972). We then compare the estimated baseline hazard functions of the four proposed methods with this full-data result (Figure 5). The estimated baseline hazard function of the adaptive partition and bias correction approach (brown) nearly overlaps with the full-data estimate (black). With so few pieces, the fixed partition approaches (with and without bias correction) fail to estimate the baseline hazard function satisfactorily. Similar results were observed in the simulation studies.
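For readers working outside SAS, the Breslow estimator of the cumulative baseline hazard, Λ̂0(t) = Σ_{ti ≤ t} di / Σ_{j ∈ R(ti)} exp(xj⊤β̂), with β̂ the MPLE, di the number of events at the distinct event time ti, and R(ti) the risk set at ti, can be sketched in NumPy as follows. This is an illustrative sketch, not the SAS PHREG computation.

```python
import numpy as np


def breslow_cumulative_hazard(time, event, x, beta):
    """Breslow estimate of the cumulative baseline hazard at distinct event times."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.exp(np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float))
    order = np.argsort(time)
    t_sorted = time[order]
    # tail_sum[k]: sum of exp(x'beta) over subjects k, k+1, ... in time order,
    # i.e. the risk-set total for anyone whose time is >= t_sorted[k].
    tail_sum = np.cumsum(risk[order][::-1])[::-1]
    event_times, d = np.unique(time[event], return_counts=True)
    denom = tail_sum[np.searchsorted(t_sorted, event_times, side="left")]
    return event_times, np.cumsum(d / denom)  # tied events share one denominator
```

Dividing each increment by the length of the preceding inter-event interval gives the corresponding piecewise constant hazard level, and with β = 0 the estimator reduces to the Nelson–Aalen estimator.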
Fig. 5.

Estimated baseline hazard functions for the SEER colon cancer data for (i) all data (solid and black), (ii) fixed partition and no bias correction (dashed and green), (iii) fixed partition and bias correction (dotted and blue), (iv) adaptive partition and no bias correction (dot dash and orange), and (v) adaptive partition and bias correction (long dash and brown), with J0 = 5.
Figure 6 shows the estimated survival functions and the corresponding pointwise 95% confidence intervals, stratified by treatment (RP and no RP) and evaluated at Year = 1994, the average year of diagnosis. At this average year of diagnosis, the estimated survival rates at 10 months after diagnosis were 0.892 (95% CI 0.891–0.893) for subjects treated with RP and 0.873 (95% CI 0.872–0.874) for subjects without surgery (no RP); at 15 months after diagnosis they were 0.842 (95% CI 0.840–0.843) and 0.814 (95% CI 0.813–0.816), respectively.
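Under the Cox model the survival function for a covariate vector x is S(t | x) = exp{−Λ0(t) exp(x⊤β)}, and with a piecewise constant baseline hazard, Λ0(t) = Σj λj × (length of (s_{j−1}, s_j] ∩ (0, t]). The sketch below plugs in given cut points and hazard levels; the numbers in the usage line are hypothetical, not the SEER estimates, and the pointwise confidence intervals shown in Figure 6 are not reproduced.

```python
import numpy as np


def piecewise_survival(t, cuts, lam, x, beta):
    """S(t | x) under a Cox model with a piecewise constant baseline hazard."""
    cuts = np.asarray(cuts, dtype=float)
    lam = np.asarray(lam, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    lower, upper = cuts[:-1], cuts[1:]
    # Time spent in each interval up to t (len(t) x J), then Lambda_0(t).
    overlap = np.clip(np.minimum(t[:, None], upper) - lower, 0.0, None)
    cum_baseline = overlap @ lam
    return np.exp(-cum_baseline * np.exp(np.dot(x, beta)))


# Hypothetical inputs: two pieces, one covariate with coefficient 0.
print(piecewise_survival([5.0, 10.0], [0.0, 5.0, np.inf], [0.1, 0.2], [1.0], [0.0]))
```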
Fig. 6.

Estimated survival functions under the adaptive partition and bias correction approach for the SEER colon cancer data, evaluated at the average year of diagnosis. The blue and red curves represent the arms with and without RP treatment, respectively. The two black curves represent the pointwise 95% confidence intervals, with J0 = 5.
4.2. Analysis of Successful Exit of Venture Capital (VC) Investing
We investigate U.S.-based VC-backed companies that received their first round of VC funding between 1946 and 2019. The data, from the VentureXpert database by Thomson Financial, contain 33,268 companies after deleting all companies with missing round dates or initial public offering (IPO) dates. We treat a successful exit (IPO) as the event and use the logarithm of the number of days from the first round of VC funding to the IPO (event time) or to the last round of investment (censoring time) as the continuous survival time. Under this scenario, the total number of events is 3,717, a high censoring rate of 88.83%. The covariates in our analysis are the number of funds received (NumFunds) and the cumulative amount of investment received at each round (CumAmounts), where the total number of rounds ranges from 1 to 46; CumAmounts is time-dependent. Both covariates are continuous and, for numerical stability, are standardized by subtracting 9.07 and 48,308.36 and dividing by 6.70 and 245,076.64, respectively.
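Time-dependent covariates such as CumAmounts are commonly handled in the counting-process (start, stop] format, where each company contributes one row per interval between funding rounds and the covariate is held at its current value on that interval. The sketch below assumes per-round investment amounts and round times measured from the first round on the analysis time scale, and uses the standardization constants stated above; the data layout and function names are hypothetical, since the exact encoding used in the analysis is not described here.

```python
import numpy as np

# Standardization constants from the text: (value - center) / scale.
NUMFUNDS_CENTER, NUMFUNDS_SCALE = 9.07, 6.70
CUMAMOUNTS_CENTER, CUMAMOUNTS_SCALE = 48308.36, 245076.64


def standardize(value, center, scale):
    return (np.asarray(value, dtype=float) - center) / scale


def to_counting_process(round_times, round_amounts, end_time, event):
    """One company -> rows (start, stop, cumulative amount, event indicator).

    Assumes the IPO, if any, occurs strictly after the last funding round.
    """
    starts = np.asarray(round_times, dtype=float)
    stops = np.append(starts[1:], float(end_time))
    cum_amounts = np.cumsum(np.asarray(round_amounts, dtype=float))
    events = np.zeros(len(starts))
    events[-1] = float(event)  # only the last interval can end in an IPO
    rows = np.column_stack([starts, stops, cum_amounts, events])
    return rows[stops > starts]  # drop zero-length intervals (censored at last round)
```

Applying `standardize` to the third column with the CumAmounts constants gives the scaled time-dependent covariate.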
We start with blocks of size nk = 1,000 (k ≤ 5) to accumulate enough events for analysis and then set nk = 500 for all subsequent blocks (k = 6, …, 62). As in Section 4.1, we let J0 take the values 1, 3, and 5. Because events are rare, for the adaptive partition approaches we allow the number of pieces to increase by at most one at each update until the total number of pieces reaches 15 (Jmax). We again set and since we do not know the underlying baseline hazard function.
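The block schedules used in both analyses can be checked with a few lines of arithmetic. The helper below (hypothetical) assumes the final block simply holds whatever observations remain; under that assumption it reproduces K = 127 blocks for the SEER data (nk = 2,500 throughout) and K = 62 blocks for the VC data (five blocks of 1,000 followed by blocks of 500).

```python
def block_sizes(n_total, first_size, n_first, later_size):
    """Sizes of successive blocks: n_first blocks of first_size, then later_size."""
    sizes = []
    remaining = n_total
    while remaining > 0:
        size = first_size if len(sizes) < n_first else later_size
        sizes.append(min(size, remaining))  # last block holds the remainder
        remaining -= sizes[-1]
    return sizes


seer = block_sizes(315_120, 2_500, 0, 2_500)
vc = block_sizes(33_268, 1_000, 5, 500)
print(len(seer), seer[-1], len(vc), vc[-1])  # prints 127 120 62 268
```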
The Impact of J0
As shown in Table 4, all estimates and standard errors tend to move closer to those of MPLE as J0 increases. Among all methods, the adaptive partition and bias correction approach with J0 = 5 yields the results closest to MPLE.
Table 4.
Estimates and standard errors for the VC data.
| Method | J0 | NumFunds Est | NumFunds SE | CumAmounts Est | CumAmounts SE |
|---|---|---|---|---|---|
| MPLE | — | 0.13467 | 0.01504 | 0.03051 | 0.00336 |
| CUEE | — | 0.13547 | 0.01537 | 0.03559 | 0.00683 |
| Fixed & BC | 1 | 0.40769 | 0.01333 | 0.05122 | 0.00279 |
| Fixed & BC | 3 | 0.18673 | 0.01464 | 0.04002 | 0.00397 |
| Fixed & BC | 5 | 0.15157 | 0.01488 | 0.03454 | 0.00385 |
| Adapt & BC | 1 | 0.36694 | 0.01677 | 0.05214 | 0.00344 |
| Adapt & BC | 3 | 0.15680 | 0.01571 | 0.04016 | 0.00448 |
| Adapt & BC | 5 | 0.13361 | 0.01555 | 0.03387 | 0.00395 |
Comparison between MPLE, CUEE, and the Bias-corrected Estimators (J0 = 5)
Holding J0 fixed at 5, we compare the estimates and standard errors of MPLE, CUEE, and the “optimal” bias correction approaches. The bias correction approaches give estimates and standard errors similar to those of MPLE for both the time-independent covariate (NumFunds) and the time-dependent covariate (CumAmounts). CUEE also gives an estimate and standard error similar to those of MPLE for the time-independent covariate, but a larger standard error for the time-dependent covariate (0.00683 versus 0.00336).
We compare the estimated baseline hazard functions of the bias correction methods with the full-data result (Figure 7). As in Section 4.1, the estimated baseline hazard function of the adaptive partition and bias correction approach (brown) nearly overlaps with the full-data estimate (black), whereas the fixed partition approach (blue) fails to estimate the baseline hazard function satisfactorily. Note that CUEE does not provide estimates of the baseline hazard function.
Fig. 7.

Estimated baseline hazard functions for VC data for (i) all data (solid and black), (ii) fixed partition and bias correction (dotted and blue), and (iii) adaptive partition and bias correction (long dash and brown), with J0 = 5.
This example shows that the adaptive partition and bias correction approach performs as well as the full-data MPLE approach, even in the presence of time-dependent covariates and rare events.
5. Discussion
We developed online-updating algorithms and inference procedures for survival data under the proportional hazards assumption. Among the four proposed approaches, the adaptive partition and bias correction approach requires minimal storage and compares favorably with the existing full-data method, which requires access to the entire data, for both the regression coefficients and the baseline hazard function. Our methods also apply to time-dependent covariates, which relaxes the proportional hazards assumption. In this paper, we focus on large-scale survival data in which the event is caused by a single risk. One future direction is to extend the current methods to other types of big survival data, including but not limited to competing-risks streaming data (Fine and Gray 1999), such as the SEER data, and recurrent event data (Pena et al. 2001), such as the Zillow real estate data.
Supplementary Material
Acknowledgement
We would like to thank the Editor, the Associate Editor, and the two anonymous reviewers for their very helpful comments and suggestions, which have led to a much improved version of the paper. Dr. M.-H. Chen’s research was partially supported by NIH grants #GM70335 and #P01CA142538.
References
- Ai M, Yu J, Zhang H, and Wang H (2018). Optimal subsampling algorithms for big data generalized linear models. arXiv preprint arXiv:1806.06761.
- Barbian MH and Assunção RM (2017). Spatial subsemble estimator for large geostatistical data. Spatial Statistics, 22, 68–88.
- Breslow NE (1972). Discussion of the paper by D.R. Cox. Journal of the Royal Statistical Society: Series B, 34, 216–217.
- Cappé O (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics, 20(3), 728–749.
- Carothers NL (2000). Real Analysis. Cambridge University Press.
- Chang X, Lin S-B, Wang Y, et al. (2017). Divide and conquer local average regression. Electronic Journal of Statistics, 11(1), 1326–1350.
- Chen X and Xie M-g (2014). A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica, 24(4), 1655–1684.
- Cox DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–220.
- Cox DR (1975). Partial likelihood. Biometrika, 62(2), 269–276.
- Fine JP and Gray RJ (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496–509.
- Fleming TR and Harrington DP (2011). Counting Processes and Survival Analysis. John Wiley & Sons.
- Friedman M (1982). Piecewise exponential models for survival data with covariates. The Annals of Statistics, 10(1), 101–113.
- Giot P and Schwienbacher A (2007). IPOs, trade sales and liquidations: Modelling venture capital exits using survival analysis. Journal of Banking & Finance, 31(3), 679–702.
- Grambsch PM and Therneau TM (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81(3), 515–526.
- Holford TR (1976). Life tables with concomitant information. Biometrics, 32(3), 587–597.
- Ibrahim JG, Chen M-H, and Sinha D (2001). Bayesian Survival Analysis. Springer Science & Business Media.
- Johansen S (1983). An extension of Cox's regression model. International Statistical Review/Revue Internationale de Statistique, 51(2), 165–174.
- Kalbfleisch JD and Prentice RL (2011). The Statistical Analysis of Failure Time Data. John Wiley & Sons.
- Kawaguchi ES, Suchard MA, Liu Z, and Li G (2017). Scalable sparse Cox's regression for large-scale survival data via broken adaptive ridge. arXiv preprint arXiv:1712.00561.
- Kleiner A, Talwalkar A, Sarkar P, and Jordan MI (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), 795–816.
- Kong E and Xia Y (2019). On the efficiency of online approach to nonparametric smoothing of big data. Statistica Sinica, 29(1), 185–201.
- Liang F, Cheng Y, Song Q, Park J, and Yang P (2013). A resampling-based stochastic approximation method for analysis of large geostatistical data. Journal of the American Statistical Association, 108(501), 325–339.
- Lin N and Xi R (2011). Aggregated estimating equation estimation. Statistics and Its Interface, 4, 73–83.
- Luo L and Song PX-K (2020). Renewable estimation and incremental inference in generalized linear models with streaming data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1), 69–97.
- Ma P, Mahoney MW, and Yu B (2013). A statistical perspective on algorithmic leveraging. arXiv preprint arXiv:1306.5362.
- Maclaurin D and Adams RP (2014). Firefly Monte Carlo: Exact MCMC with subsets of data. arXiv preprint arXiv:1403.5693.
- Mittal S, Madigan D, Burd RS, and Suchard MA (2013). High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis. Biostatistics, 15(2), 207–221.
- Neiswanger W, Wang C, and Xing E (2013). Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780.
- Pena EA, Strawderman RL, and Hollander M (2001). Nonparametric estimation with recurrent event data. Journal of the American Statistical Association, 96(456), 1299–1315.
- Plank SB, DeLuca S, and Estacion A (2008). High school dropout and the role of career and technical education: A survival analysis of surviving high school. Sociology of Education, 81(4), 345–370.
- Schifano ED, Wu J, Wang C, Yan J, and Chen M-H (2016). Online updating of statistical inference in the big data setting. Technometrics, 58(3), 393–403.
- Song Q and Liang F (2015). A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(5), 947–972.
- Wang C, Chen M-H, Schifano E, Wu J, and Yan J (2016). Statistical methods and computing for big data. Statistics and Its Interface, 9(4), 399–414.
- Wang C, Chen M-H, Wu J, Yan J, Zhang Y, and Schifano E (2018a). Online updating method with new variables for big data streams. Canadian Journal of Statistics, 46(1), 123–146.
- Wang H, Yang M, and Stufken J (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association, 114(525), 393–405.
- Wang Y, Palmer N, Di Q, Schwartz J, Kohane I, and Cai T (2018b). A fast divide-and-conquer sparse Cox regression. Forthcoming in Biostatistics.
- Whittemore AS and Keller JB (1986). Survival estimation using splines. Biometrics, 42(3), 495–506.
- Xue Y, Wang H, Yan J, and Schifano ED (2020). An online updating approach for testing the proportional hazards assumption with streams of survival data. Biometrics, 76(1), 171–182.
- Zillow (2016). Zillow Transitions to Streaming Data Architecture. https://www.zillow.com/data-science/streaming-data-architecture.