Abstract
We introduce a novel procedure to perform Bayesian non-parametric inference with right-censored data, the beta-Stacy bootstrap. This approximates the posterior law of summaries of the survival distribution (e.g. the mean survival time). More precisely, our procedure approximates the joint posterior law of functionals of the beta-Stacy process, a non-parametric process prior that generalizes the Dirichlet process and that is widely used in survival analysis. The beta-Stacy bootstrap generalizes and unifies other common Bayesian bootstraps for complete or censored data based on non-parametric priors. It is defined by an exact sampling algorithm that does not require tuning of Markov Chain Monte Carlo steps. We illustrate the beta-Stacy bootstrap by analyzing survival data from a real clinical trial.
Keywords: Censored data, Bayesian bootstrap, Bayesian non-parametric, Beta-Stacy process
1. Introduction
Survival data are often censored, complicating statistical inference (Kalbfleisch and Prentice, 2002). In this setting, the goal is often to perform inference on specific summaries of the cumulative distribution function $G(t)$ (defined for $t \ge 0$) that generated the observed survival times, e.g. the expected survival time or the probability of surviving past a landmark time-point.
We introduce the beta-Stacy bootstrap, a new method to perform Bayesian non-parametric inference for functionals of the distribution function $G$ using censored data. Specifically, the proposed approach generates approximate samples from the posterior law of such functionals obtained by assuming that $G$ is a beta-Stacy process (Walker and Muliere, 1997). This process defines a non-parametric prior law for distribution functions that is widely used with censored data (Walker and Damien, 1998; Al Labadi and Zarepour, 2013; Arfè et al., 2018). The beta-Stacy process extends the classical Dirichlet process of Ferguson (1973) and is conjugate to both complete and right-censored data (Walker and Muliere, 1997). It is also closely related to the beta process of Hjort (1990): $G$ is a beta-Stacy process if and only if its cumulative hazard function is a beta process (Walker and Muliere, 1997).
The proposed approach belongs to the family of Bayesian bootstrap procedures pioneered by Rubin (1981). In addition to Rubin's, this family includes the proper Bayesian bootstrap of Muliere and Secchi (1996), the Bayesian bootstrap for censored data of Lo (1993), and others (Lo, 1991; Kim and Lee, 2003; Lyddon et al., 2019). Like Efron's classical bootstrap (Efron and Tibshirani, 1986), Bayesian bootstraps repeatedly re-sample and/or re-weight the observed data, here to induce a probability distribution for $G$. More precisely, Bayesian bootstraps generate approximate samples from the posterior distribution of $G$ associated with some non-parametric prior (for connections with Efron's frequentist procedure, see Lo, 1987, 1991, 1993; Muliere and Secchi, 1996). Interest in these sampling algorithms has recently increased thanks to their scalability and computational simplicity—e.g. they do not require tuning of Markov Chain Monte Carlo steps (Lyddon et al., 2018; Barrientos and Peña, 2020).
We show that the beta-Stacy bootstrap generalizes other common Bayesian bootstrap procedures. These include those of Rubin (1981) and Muliere and Secchi (1996), which are at the core of other recent proposals (Lyddon et al., 2019; Barrientos and Peña, 2020) but cannot be applied in the presence of censoring. They also include Lo's procedure (1993), which can handle censored observations but cannot incorporate prior information on the functional form of $G$. We characterize each of these methods as a special or limiting case of the beta-Stacy bootstrap (cf. Fig. 1), which, in comparison, can be applied to censored data and allows prior information on the data-generating distribution to be incorporated.
Fig. 1.
Relations between different Bayesian bootstraps: BSB (in red), beta-Stacy Bootstrap (cf. Section 4); PBB, Proper Bayesian Bootstrap (Muliere and Secchi, 1996); BBC, Bayesian Bootstrap for Censored data (Lo, 1993); BB, classical Bayesian Bootstrap (Rubin, 1981). The prior precision of the BSB is controlled by a function $c(\cdot)$, while that of the PBB is controlled by a constant $k$. (a) The BSB and PBB coincide when there is no censoring and $c(t) = k$ for every $t$; (b) the BSB reduces to the BBC if $c(t) \to 0$ for every $t$; (c) the PBB reduces to the BB if $k \to 0$; (d) the BBC and BB coincide when there is no censoring. See Section 5 for details.
We note that, when $G$ has a beta-Stacy prior distribution, posterior inferences for functionals of $G$ could also be based on algorithms for the simulation of Lévy processes (cf. Damien et al., 1995, Walker and Damien, 1998, Ferguson and Klass, 1972, and Wolpert and Ickstadt, 1998; see also Ghosal and van der Vaart, 2017, Section 13.3.3 and Blasi, 2014 for reviews and applications to the beta-Stacy process). With these methods, it is possible to generate approximate samples from the posterior law of $G$, and so also from the posterior law of its functionals. However, some algorithms (e.g. Damien et al., 1995; Walker and Damien, 1998) can only generate approximate sample paths over some bounded interval $[0, \tau]$. Hence, they may be difficult to apply to summaries that depend on all values of $G$, such as the expected survival time. These cases are not problematic for the beta-Stacy bootstrap. Other approaches (e.g. Ferguson and Klass, 1972; Wolpert and Ickstadt, 1998) can approximately sample full paths from the posterior law of $G$, but they are computationally more involved than the beta-Stacy bootstrap (e.g. they may require auxiliary algorithms to sample from unnormalized distributions).
The rest of the paper is structured as follows. In Section 2, we introduce notations and assumptions used throughout the manuscript. In Section 3, we review the definition and properties of the beta-Stacy process. In Section 4, we introduce the beta-Stacy bootstrap and study its approximation properties (most technical proofs are provided in Appendix A). In Section 5, we describe the connections of the beta-Stacy bootstrap with other Bayesian bootstrap algorithms. In Section 6, we briefly describe a generalization of the beta-Stacy bootstrap to the $k$-sample setting. In Section 7, we describe a computational approach for implementing the beta-Stacy bootstrap. Using data from a clinical trial in hepatology (Dickson et al., 1989), in Section 8 we illustrate the beta-Stacy bootstrap and contrast it with an algorithm that generates approximate beta-Stacy sample paths. We describe this algorithm in the Supplementary Material, where we also report on additional comparative simulation studies (cf. Section 8.4). Finally, Section 9 provides concluding remarks and discusses potential avenues for future research. Code to replicate our analyses is available online at https://github.com/andreaarfe/ or by request to the first author.
2. Basic notations and assumptions
If $A : [0, +\infty) \to [0, +\infty)$ is a non-decreasing, right-continuous function with left-hand limits, we let $A(t^-) = \lim_{s \uparrow t} A(s)$ and $\Delta A(t) = A(t) - A(t^-)$ for every $t \ge 0$ (where $A(0^-) = 0$). We also identify $A$ with its induced measure, writing $\int f\, dA$ for any function $f$, and $A(B) = \int_B dA$ for any Borel set $B \subseteq [0, +\infty)$. A function $f$ is $A$-integrable if $\int |f|\, dA < +\infty$. We will denote by $D_f$ the set of discontinuity points of a function $f$, and say that $f$ is $A$-almost everywhere continuous if $A(D_f) = 0$ (this is true when $f$ is continuous, and it implies that $f$ must be continuous at every atom of $A$). If $A$ is random, then its distribution is fully characterized by its Laplace functional, i.e. the map $f \mapsto E[\exp(-\int f\, dA)]$, where $f$ is any non-negative function (Kallenberg, 2017, Chapter 2).
We assume that $X_1, \dots, X_n$ are independent survival times, each with the same cumulative distribution function $G$ (with $G(t) = P(X_i \le t)$). In survival analysis applications, it is common for the $X_i$ to be (right) censored. In such cases, the observed dataset is formed by $(T_1, \delta_1), \dots, (T_n, \delta_n)$, where, for each $i = 1, \dots, n$, $T_i = \min(X_i, C_i)$ is the censored version of $X_i$, $C_i$ is its censoring time, and $\delta_i = \mathbb{1}\{X_i \le C_i\}$ is its censoring indicator. As is common in this setting, we assume that censoring is independent—i.e. that $C_1, \dots, C_n$ are independent of $X_1, \dots, X_n$ (Kalbfleisch and Prentice, 2002, Section 3.2)—and ignorable—which essentially means that the $C_i$ can be treated as known constants when computing posterior distributions (Heitjan and Rubin, 1991; Heitjan, 1993). We will also use the same notations when there is no censoring, in which case we simply define $T_i = X_i$ and $\delta_i = 1$ for every $i$. To refer to either of these situations, we will simply say that the (potentially censored) survival times $(T_1, \delta_1), \dots, (T_n, \delta_n)$ are generated by $G$.
Let $(T_1, \delta_1), \dots, (T_n, \delta_n)$ be (possibly censored) survival times generated from a distribution function $G$. Our aim is to make inferences on $\theta = \psi(\int \phi\, dG)$, a summary of $G$ defined by the real-valued functions $\phi$ and $\psi$ (later we consider vectors of such summaries). Examples include the mean ($\phi(t) = t$, $\psi(x) = x$), the variance ($\phi_1(t) = t$, $\phi_2(t) = t^2$, $\psi(x_1, x_2) = x_2 - x_1^2$, using the vector extension), or the restricted mean survival time up to a horizon $\tau > 0$ ($\phi(t) = \min(t, \tau)$, $\psi(x) = x$; Royston and Parmar, 2013). From the Bayesian non-parametric perspective, any inference on $\theta$ can be accomplished first by assuming that $G$ is distributed according to some prior process, then computing or approximating the posterior distribution of $G$, and so of $\theta$, conditional on the observed data $(T_1, \delta_1), \dots, (T_n, \delta_n)$.
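To fix ideas, the following minimal R sketch evaluates these three summaries for a discrete distribution function represented by the hypothetical vectors `atoms` (support points) and `weights` (probability masses), the form produced by the bootstrap of Section 4.

```r
# Summaries of a discrete distribution function with support `atoms`
# and probability masses `weights` (hypothetical illustrative inputs).
theta_mean <- function(atoms, weights) sum(weights * atoms)    # phi(t) = t, psi(x) = x
theta_var  <- function(atoms, weights)                         # phi = (t, t^2), psi(x1, x2) = x2 - x1^2
  sum(weights * atoms^2) - sum(weights * atoms)^2
theta_rmst <- function(atoms, weights, tau)                    # phi(t) = min(t, tau), psi(x) = x
  sum(weights * pmin(atoms, tau))

# Toy example: a three-point distribution of survival times (in years)
atoms   <- c(1, 4, 12)
weights <- c(0.5, 0.3, 0.2)
c(mean   = theta_mean(atoms, weights),
  var    = theta_var(atoms, weights),
  rmst10 = theta_rmst(atoms, weights, tau = 10))
```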
Let $k > 0$ and let $F$ be a distribution function over $[0, +\infty)$. We say that $G$ is a Dirichlet process with precision $k$ and mean $F$, and write $G \sim \mathrm{DP}(k, F)$, if, for all measurable partitions $B_1, \dots, B_d$ of $[0, +\infty)$, $(G(B_1), \dots, G(B_d))$ has Dirichlet distribution $\mathrm{Dir}(k F(B_1), \dots, k F(B_d))$. If $G \sim \mathrm{DP}(k, F)$ and there is no censoring, the posterior law of $G$ conditional on the data is $\mathrm{DP}(k + n, F_n)$, with $F_n = (k F + n \hat{G}_n)/(k + n)$ and $\hat{G}_n$ the empirical distribution function of the data (Ferguson, 1973). However, if any observation is censored (i.e. $\delta_i = 0$ for some $i$), then this posterior distribution is no longer a Dirichlet process (Ferguson and Phadia, 1979; Walker and Muliere, 1997). In contrast, the beta-Stacy process is conjugate with respect to censored data, allowing simple posterior computations (Walker and Muliere, 1997). (Later, we will also discuss the use of prior processes other than the beta-Stacy in the considered setting.)
3. The beta-Stacy process prior
The beta-Stacy process is the law of a random cumulative distribution function $G$ with support on $[0, +\infty)$ (Walker and Muliere, 1997). It is neutral-to-the-right, a type of non-parametric prior widely used with censored data (Doksum, 1974; Ferguson and Phadia, 1979). This means that, if $G$ follows a beta-Stacy law, then the normalized increments $\{G(t_j) - G(t_{j-1})\}/\{1 - G(t_{j-1})\}$, $j = 1, \dots, d$, are independent for every $0 = t_0 < t_1 < \cdots < t_d$ (Ghosal and van der Vaart, 2017, Chapter 13).
Let $F$ be a cumulative distribution function on $[0, +\infty)$ with jumps at the locations $t_1, t_2, \dots$ (so that $\Delta F(t_k) > 0$ for every $k$). Also let $c(t)$ be a positive function defined for every $t \ge 0$.
Definition 3.1 (Walker and Muliere, 1997). The cumulative distribution function $G$ is a beta-Stacy process $\mathrm{BS}(c, F)$ if its Laplace functional satisfies
(1)
for every non-negative function $f$, where
(2)
and for , .
The sample paths of $G$ are discrete, as $G$ can only increase through an at most countable number of jumps (Walker and Muliere, 1997). A jump always occurs at each atom $t_k$ of $F$; its size is $\{1 - G(t_k^-)\}\, W_k$ for independent $W_k \sim \mathrm{Beta}(c(t_k)\, \Delta F(t_k),\, c(t_k)\{1 - F(t_k)\})$. When $F$ is discrete, $G$ can only jump at the $t_k$, so $G(t) = \sum_{k : t_k \le t} \{1 - G(t_k^-)\}\, W_k$ for all $t \ge 0$. Otherwise, some jumps also occur at random positions. Their locations and sizes are determined by the two coordinates of the points of a non-homogeneous Poisson process, which is independent of the $W_k$ and whose intensity measure (involving an exponential term) is determined by $c$ and by $F_c$, the continuous part of $F$.
If $G \sim \mathrm{BS}(c, F)$, then, infinitesimally speaking, $dG(t)/\{1 - G(t^-)\} \sim \mathrm{Beta}(c(t)\, dF(t),\, c(t)\{1 - F(t)\})$ (Walker and Muliere, 1997). Hence, $E[dG(t)] = dF(t)$, and so $E[G(t)] = F(t)$, for all $t \ge 0$. Moreover, the variance of $G(t)$ is a decreasing function of $c(t)$, with $\mathrm{Var}[G(t)] \to 0$ as $c(t) \to +\infty$. The function $c$ thus controls the dispersion of the distribution $\mathrm{BS}(c, F)$ around its mean $F$. Throughout, we will assume that (i) $F(t) < 1$ for all $t \ge 0$ and (ii) $\epsilon \le c(t) \le 1/\epsilon$ for all $t \ge 0$ and some $\epsilon \in (0, 1)$. The former condition implies that $-\log\{1 - G(t)\}$ is finite (and so $G(t) < 1$) with probability 1 for every $t \ge 0$. The latter instead rules out extreme cases in which $G(t)$ has null or arbitrarily large variance for some $t$. (Both are technical requirements needed to prove Lemma A.2 in Appendix A.)
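As a heuristic illustration of this mean–precision interpretation, consider a single Beta-distributed increment with parameters of the above form, where $a \in (0, 1)$ plays the role of the prior mean increment:
\[
W \sim \mathrm{Beta}\bigl(c\,a,\; c\,(1 - a)\bigr)
\;\Longrightarrow\;
E[W] = \frac{c\,a}{c\,a + c\,(1 - a)} = a,
\qquad
\mathrm{Var}(W) = \frac{a(1 - a)}{c + 1},
\]
so the mean does not depend on $c$, while the variance decreases to $0$ as $c \to +\infty$.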
As previously mentioned, the classical Dirichlet process is a special case of the beta-Stacy process. In fact, Walker and Muliere (1997) show that if $c(t) = k$ for all $t \ge 0$, then $\mathrm{BS}(c, F) = \mathrm{DP}(k, F)$. Contrary to the Dirichlet process, however, the beta-Stacy process is conjugate with respect to right-censored data. Specifically, assume that (i) $(T_1, \delta_1), \dots, (T_n, \delta_n)$ are generated by $G \sim \mathrm{BS}(c, F)$; (ii) $N(t)$ is the number of uncensored survival times less than or equal to $t$; and (iii) $M(t)$ is the number of observations, censored or not, that are greater than or equal to $t$, for all $t \ge 0$. Then we have the following result:
Theorem 3.1 (Theorem 4, Walker and Muliere, 1997). The posterior distribution of $G$ conditional on $(T_1, \delta_1), \dots, (T_n, \delta_n)$ is the beta-Stacy process $\mathrm{BS}(c^*, F^*)$, where
(3)
(4)
and $\prod$ denotes the product-integral operator of Gill and Johansen (1990).
The posterior mean $F^*$ from Eq. (3) converges to $\hat{G}_n$, the standard Kaplan–Meier estimator of the distribution function, as $c(t) \to 0$ for all $t \ge 0$ (Walker and Muliere, 1997).
In practice, $F^*$ can be computed as $1 - F^*(t) = \{1 - F^*_d(t)\}\{1 - F^*_c(t)\}$, where, respectively, $F^*_d$ and $F^*_c$ are the following discrete and continuous distribution functions (Gill and Johansen, 1990). First,
(5)
where the product ranges over all positive $s \le t$ such that $\Delta F^*(s) > 0$ (which are at most countable). Second,
(6)
where $F^*_c$ is based on $F^*$ with the discontinuities removed.
4. The beta-Stacy bootstrap
We now introduce the beta-Stacy bootstrap. Let $(T_1, \delta_1), \dots, (T_n, \delta_n)$ be (possibly censored) survival times generated by $G \sim \mathrm{BS}(c, F)$. The proposed procedure approximately samples from the law of $\theta = \psi(\int \phi\, dG)$ conditional on the data. More precisely, it samples from an approximation to the law of $\psi(\int \phi\, dG^*)$, where $G^* \sim \mathrm{BS}(c^*, F^*)$ and $c^*$, $F^*$ are from Eqs. (3) and (4).
Algorithm 4.1. The beta-Stacy bootstrap is defined by the following steps:
1. Sample $x_1, \dots, x_m$ from $F^*$ and determine the corresponding number $r$ of distinct values $x_{(1)} < \cdots < x_{(r)}$ (later we describe how to implement this step in practice and provide guidance on how to choose $m$).
2. Compute $a_{(j)} = c^*(x_{(j)})\, p_{(j)}$ and $b_{(j)} = c^*(x_{(j)})\, q_{(j)}$ for every $j = 1, \dots, r$, where $p_{(j)} = \hat{F}_m(x_{(j)}) - \hat{F}_m(x_{(j)}^-)$, $q_{(j)} = 1 - \hat{F}_m(x_{(j)})$, and $\hat{F}_m$ is the empirical distribution function of $x_1, \dots, x_m$.
3. For all $j = 1, \dots, r$, generate independent $W_{(j)} \sim \mathrm{Beta}(a_{(j)}, b_{(j)})$ (with $W_{(r)} = 1$, as $b_{(r)} = 0$) and let $V_{(j)} = W_{(j)} \prod_{l < j}\{1 - W_{(l)}\}$.
4. Let $G^*_m$ be the distribution function defined by $G^*_m(t) = \sum_{j : x_{(j)} \le t} V_{(j)}$ for all $t \ge 0$, and compute $\theta^*_m = \psi(\int \phi\, dG^*_m)$, where $\int \phi\, dG^*_m = \sum_{j=1}^{r} \phi(x_{(j)})\, V_{(j)}$.
5. Output $\theta^*_m$ as an approximate sample from the posterior distribution of $\theta$.
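To make the steps concrete, the following minimal R sketch implements one iteration of Algorithm 4.1 with the Beta parameters written as in steps 2–3 above; `sample_Fstar` and `cstar` are hypothetical placeholders for routines that draw from $F^*$ (cf. Section 7) and evaluate the posterior precision $c^*$ of Theorem 3.1.

```r
# Minimal sketch of one beta-Stacy bootstrap iteration (steps 1-5 of Algorithm 4.1).
# `sample_Fstar(m)` and `cstar(t)` are hypothetical placeholders for routines that
# draw m values from F* and evaluate the posterior precision c*, respectively.
bsb_iteration <- function(m, sample_Fstar, cstar, phi, psi) {
  x  <- sample_Fstar(m)                        # step 1: resample from the posterior mean F*
  xs <- sort(unique(x))                        # distinct values x_(1) < ... < x_(r)
  r  <- length(xs)
  p  <- sapply(xs, function(v) mean(x == v))   # proportion of resampled values equal to x_(j)
  q  <- sapply(xs, function(v) mean(x >  v))   # proportion strictly greater than x_(j)
  a  <- cstar(xs) * p                          # step 2: Beta parameters a_(j), b_(j)
  b  <- cstar(xs) * q
  W  <- rbeta(r, a, b)                         # step 3: independent Beta draws
  W[r] <- 1                                    # last variable equals 1 since b_(r) = 0
  V  <- W * cumprod(c(1, 1 - W[-r]))           # V_(j) = W_(j) * prod_{l<j} (1 - W_(l))
  psi(sum(phi(xs) * V))                        # steps 4-5: theta*_m = psi(integral of phi dG*_m)
}
```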
We note that, in step 2 above, $p_{(j)}$ and $q_{(j)}$ are just the proportions of the values $x_1, \dots, x_m$ that are equal to and strictly greater than $x_{(j)}$, respectively. We also note that the law of $G^*_m$ in step 4 is the mixture of the beta-Stacy process $\mathrm{BS}(c^*, \hat{F}_m)$ with mixing measure the joint law of $x_1, \dots, x_m$. This generalizes the Dirichlet-multinomial process, which is a mixture of Dirichlet processes with mean $\hat{F}_m$ (Ishwaran and Zarepour, 2002; Muliere and Secchi, 2003).
Some of the values $x_1, \dots, x_m$ sampled in step 1 can be equal to one of the observed uncensored event times among $T_1, \dots, T_n$. This is because every observed uncensored event time is an atom of $F^*$, as shown by Eq. (5). However, some of the values can also be new observations sampled from the support of the prior mean $F$ (e.g. these may come from the continuous component of $F^*$ in Eq. (6)). This deviates from other Bayesian bootstrap procedures, which typically re-sample only the observed data (Rubin, 1981; Lo, 1993).
The following result shows that, if $\phi$ is $F^*$-integrable (so that the posterior mean of $\int \phi\, dG$ is finite) and $F^*$-almost everywhere continuous (a technical condition needed to prove the result; cf. Appendix A), then the law of $\theta^*_m$ generated by Algorithm 4.1 approximates the posterior law of $\theta$ conditional on the data for large $m$. More precisely, it shows that $\int \phi\, dG^*_m$ converges in law to $\int \phi\, dG$ conditional on the data, i.e. $E[f(\int \phi\, dG^*_m) \mid \text{data}] \to E[f(\int \phi\, dG) \mid \text{data}]$ as $m \to +\infty$ for any bounded continuous function $f$ (note that the sample size $n$ is fixed, and only the number of resamples $m$ varies).
Proposition 4.1. If $\phi$ is $F^*$-integrable and $F^*$-almost everywhere continuous, then $\int \phi\, dG^*_m \to \int \phi\, dG$ in law as $m \to +\infty$, conditional on the data.
Proof. The proof is provided in Appendix A, as it relies on multiple lemmas.
The following corollary implies that the beta-Stacy bootstrap can also approximate the joint distribution of vectors of the form $(\int \phi_1\, dG, \dots, \int \phi_d\, dG)$. This is useful to approximate the joint distribution of multiple summaries of $G$, e.g. the joint distribution of its first $d$ moments ($\phi_i(t) = t^i$ for $i = 1, \dots, d$).
Corollary 4.1. Let $\phi_1, \dots, \phi_d$ be $F^*$-integrable and $F^*$-almost everywhere continuous. Then, $(\int \phi_1\, dG^*_m, \dots, \int \phi_d\, dG^*_m) \to (\int \phi_1\, dG, \dots, \int \phi_d\, dG)$ in law conditional on the data as $m \to +\infty$, i.e. $E[f(\int \phi_1\, dG^*_m, \dots, \int \phi_d\, dG^*_m) \mid \text{data}] \to E[f(\int \phi_1\, dG, \dots, \int \phi_d\, dG) \mid \text{data}]$ as $m \to +\infty$ for any bounded continuous function $f$.
Proof. Take $\lambda_1, \dots, \lambda_d \in \mathbb{R}$ and define $\phi = \sum_{i=1}^{d} \lambda_i \phi_i$, which is itself $F^*$-integrable and $F^*$-almost everywhere continuous. By Proposition 4.1, $\int \phi\, dG^*_m \to \int \phi\, dG$ in law as $m \to +\infty$. This implies that the joint characteristic function of $(\int \phi_1\, dG^*_m, \dots, \int \phi_d\, dG^*_m)$ converges to that of $(\int \phi_1\, dG, \dots, \int \phi_d\, dG)$ for every $\lambda_1, \dots, \lambda_d$, which implies the thesis. □
Consequently, for large $m$ the law of the sample $\theta^*_m$ generated by the beta-Stacy bootstrap is approximately the same as that of $\theta$. In fact, if $\psi$ is continuous (as is the case for all examples considered in this paper), then by Corollary 4.1 and the continuous mapping theorem it holds that $\theta^*_m \to \theta$ in law as $m \to +\infty$. Hence, if $m$ is sufficiently large (see Section 8 for guidance), by repeating steps 1–4 above independently, it is possible to generate an approximate sample of arbitrary size from the posterior law of $\theta$. More generally, the joint law of $(\psi_1(\int \phi_1\, dG^*_m), \dots, \psi_d(\int \phi_d\, dG^*_m))$ converges to that of $(\psi_1(\int \phi_1\, dG), \dots, \psi_d(\int \phi_d\, dG))$, where each $\phi_i$ satisfies the conditions of Corollary 4.1 and each $\psi_i$ is continuous. Thus the beta-Stacy bootstrap can also be used to approximate the joint posterior law of vectors of functionals of $G$.
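For instance, reusing the `bsb_iteration` sketch given after Algorithm 4.1 (with its hypothetical `sample_Fstar` and `cstar` routines), an approximate posterior sample for the 10-year restricted mean survival time could be generated and summarized as follows.

```r
# Draw B approximate posterior samples of the 10-year restricted mean survival time
B <- 10000
theta_draws <- replicate(B, bsb_iteration(m = 1000, sample_Fstar, cstar,
                                          phi = function(t) pmin(t, 10),  # phi(t) = min(t, 10)
                                          psi = identity))
quantile(theta_draws, probs = c(0.025, 0.5, 0.975))  # posterior median and 95% credible interval
```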
5. Connection with other Bayesian bootstraps
The proposed procedure is a Bayesian analogue of Efron's classical bootstrap (Efron, 1981). When censoring is possible, the latter is based on repeated sampling from the Kaplan–Meier estimator (Efron and Tibshirani, 1986). Similarly, the beta-Stacy bootstrap samples from $F^*$ (cf. step 1 of Algorithm 4.1), the beta-Stacy posterior mean from Theorem 3.1.
The beta-Stacy bootstrap generalizes several Bayesian variants of the classical bootstrap: the Bayesian bootstrap of Rubin (1981), the proper Bayesian bootstrap of Muliere and Secchi (1996), and the Bayesian bootstrap for censored data of Lo (1993). The first two assume that there is no censoring, while the last allows for censored data. Their relationships are summarized in Fig. 1.
Given uncensored observations $T_1, \dots, T_n$, the Bayesian bootstrap of Rubin (1981) assigns to $G$ the same law as $\sum_{i=1}^{n} w_i\, \mathbb{1}\{T_i \le \cdot\}$, where $(w_1, \dots, w_n)$ has a uniform Dirichlet distribution (and thus it is an exchangeably weighted bootstrap; cf. Praestgaard and Wellner, 1993). Consequently, Rubin's bootstrap approximates the posterior law of $\theta$ induced by the improper Dirichlet process prior obtained as the prior precision vanishes, i.e. the law of $\psi(\int \phi\, dG^*)$, where $G^* \sim \mathrm{DP}(n, \hat{G}_n)$ and $\hat{G}_n$ is the empirical distribution function (Ghosal and van der Vaart, 2017, Section 4.7).
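As a point of comparison, Rubin's scheme admits a very short implementation; the following is a minimal sketch for the posterior of the mean survival time with uncensored data, where `times` is a hypothetical input vector and the uniform Dirichlet weights are generated by normalizing independent standard exponential draws.

```r
# Rubin's (1981) Bayesian bootstrap for the mean of uncensored survival times
rubin_bb_mean <- function(times, B = 10000) {
  n <- length(times)
  replicate(B, {
    w <- rexp(n); w <- w / sum(w)   # (w_1, ..., w_n) ~ uniform Dirichlet
    sum(w * times)                  # one draw from the approximate posterior of the mean
  })
}
```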
In contrast, the proper Bayesian bootstrap of Muliere and Secchi (1996) is defined by a procedure akin to Algorithm 4.1. In detail, step 1 is the same, since $F^*$ coincides with the Dirichlet posterior mean $F_n$ (there is no censoring); in step 2, take $c^*(t) = k + n$ for all $t \ge 0$; finally, steps 3 and 4 are the same. Hence, when there is no censoring, the procedure of Muliere and Secchi (1996) is a special case of the beta-Stacy bootstrap (in general, neither is exchangeably weighted; cf. Praestgaard and Wellner, 1993). Their relation is illustrated in Fig. 1 by arrow (a).
As a consequence, if there is no censoring the proper Bayesian bootstrap approximates (for large $m$) the posterior law of $\theta$ induced by a proper Dirichlet process prior $\mathrm{DP}(k, F)$ with $k > 0$. More precisely, it approximates the law of $\psi(\int \phi\, dG^*)$ with $G^* \sim \mathrm{DP}(k + n, F_n)$ and $F_n = (k F + n \hat{G}_n)/(k + n)$. Thus, as $k \to 0$ (i.e. as the prior precision of the Dirichlet process vanishes), the proper Bayesian bootstrap will approximate the same posterior distribution as the procedure of Rubin (1981)—cf. Muliere and Secchi (1996). This is illustrated by arrow (c) in Fig. 1.
Lo's procedure (1993) extends Rubin's bootstrap (1981) to the case where censoring is possible—the two coincide when there is no censoring; cf. arrow (d) in Fig. 1. Specifically, Lo's Bayesian bootstrap for censored data approximates the posterior law of $\theta$ obtained from an improper beta-Stacy prior or, equivalently, an improper beta process (Lo, 1993). More precisely, Lo's bootstrap (1993) approximates the law of $\psi(\int \phi\, dG)$ with that of $\psi(\int \phi\, dG^*)$, where $G^*$ follows the limit of the beta-Stacy posterior law of Theorem 3.1 as $c(t) \to 0$ for all $t \ge 0$, whose mean is $\hat{G}_n$, the Kaplan–Meier estimator (cf. Section 3). Thus, Lo's procedure (1993) is obtained from ours in the limit $c(t) \to 0$ for all $t \ge 0$ (cf. arrow (b) in Fig. 1).
In addition to the ones mentioned above, the beta-Stacy bootstrap also generalizes the Bayesian bootstrap for finite populations of Lo (1988) and the Pólya urn bootstrap of Muliere and Walker (1998). These are obtained from the beta-Stacy bootstrap along the same lines, assuming that the data-generating distribution is discrete with finite support.
6. Generalization to the $k$-sample case
We now consider the setting where censored observations are available from $k$ independent groups. Specifically, for each group $j = 1, \dots, k$ we observe a sample of (possibly censored) time-to-event data generated by a cumulative distribution function $G_j$. A similar setting arises, for example, in randomized trials with $k$ treatment arms and a survival end-point. Without loss of generality, we suppose that $k = 2$.
In this setting, the goal is often to compare summary measures of survival across groups. These correspond to joint functionals of the form $\theta = \psi(\int \phi_1\, dG_1, \int \phi_2\, dG_2)$, where $\phi_1$, $\phi_2$, and $\psi$ are real-valued functions. Examples include the difference in expected survival times ($\phi_1(t) = \phi_2(t) = t$, $\psi(x_1, x_2) = x_2 - x_1$) or the ratio of survival probabilities at a landmark time $t_0$ ($\phi_1(t) = \phi_2(t) = \mathbb{1}\{t > t_0\}$, $\psi(x_1, x_2) = x_2/x_1$). Similarly to Section 4, we assume that each $\phi_j$ is $F^*_j$-integrable and $F^*_j$-almost everywhere continuous, and that $\psi$ is continuous.
If $G_1 \sim \mathrm{BS}(c_1, F_1)$ and $G_2 \sim \mathrm{BS}(c_2, F_2)$ independently, we can use the beta-Stacy bootstrap to approximate the posterior law of $\theta$ given the censored data from the two groups. From Theorem 3.1, this is the law of $\psi(\int \phi_1\, dG^*_1, \int \phi_2\, dG^*_2)$, where: $G^*_1$ and $G^*_2$ are independent; $G^*_j \sim \mathrm{BS}(c^*_j, F^*_j)$ for each $j = 1, 2$; and $c^*_j$, $F^*_j$ are computed from the $j$th group's data using Eqs. (3)–(4).
In more detail, let $G^*_{j,m}$ be the distribution function generated by one iteration of the beta-Stacy bootstrap in group $j = 1, 2$ (cf. step 4 of Algorithm 4.1). Then, for large $m$, $\theta^*_m = \psi(\int \phi_1\, dG^*_{1,m}, \int \phi_2\, dG^*_{2,m})$ will be an approximate sample from the law of $\theta$, as shown by the following proposition.
Proposition 6.1. $\psi(\int \phi_1\, dG^*_{1,m}, \int \phi_2\, dG^*_{2,m}) \to \psi(\int \phi_1\, dG_1, \int \phi_2\, dG_2)$ in law as $m \to +\infty$, conditional on the data from the two groups.
Proof. Since $G^*_{1,m}$ and $G^*_{2,m}$ are independent conditional on the data, Corollary 4.1 implies that $(\int \phi_1\, dG^*_{1,m}, \int \phi_2\, dG^*_{2,m})$ converges in law to $(\int \phi_1\, dG_1, \int \phi_2\, dG_2)$ as $m \to +\infty$. The thesis now follows from the continuous mapping theorem. □
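As an illustration of the two-sample case, and assuming hypothetical group-specific routines `sample_Fstar1`/`cstar1` and `sample_Fstar2`/`cstar2` together with the `bsb_iteration` sketch given after Algorithm 4.1, the posterior of the difference in mean survival times could be approximated as follows.

```r
# Two-sample beta-Stacy bootstrap: difference in mean survival times.
# Each group is bootstrapped independently and the draws are paired.
diff_means <- replicate(10000, {
  mu1 <- bsb_iteration(1000, sample_Fstar1, cstar1, phi = identity, psi = identity)
  mu2 <- bsb_iteration(1000, sample_Fstar2, cstar2, phi = identity, psi = identity)
  mu2 - mu1   # e.g. treatment arm minus control arm
})
quantile(diff_means, probs = c(0.025, 0.5, 0.975))
```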
7. Implementing the beta-Stacy bootstrap
To implement the beta-Stacy bootstrap, we use the following procedure to generate observations from $F^*$ (step 1 of Algorithm 4.1). To be concrete, we assume that the prior mean $F$ is continuous (so $\Delta F(t) = 0$ for all $t \ge 0$) with density $f$, but a similar method can also be used when $F$ is discrete.
Our approach is based on the relationship $1 - F^*(t) = \{1 - F^*_d(t)\}\{1 - F^*_c(t)\}$ described in Section 3. This implies that if $u_d$ and $u_c$ are sampled independently from $F^*_d$ and $F^*_c$, respectively, then $x = \min(u_d, u_c)$ is a sample from $F^*$. We implement step 1 of Algorithm 4.1 by iterating this process $m$ times.
In detail, we sample $u_d$ from $F^*_d$ as follows. First, we note that, since $F$ is continuous, Eq. (5) implies that $\Delta F^*_d(t) > 0$ only if $t$ is an observed uncensored event time. Consequently, we can sample $u_d$ by defining it equal to the $j$th distinct uncensored event time with probability $\Delta F^*_d(t_j)$ for all $j$, or equal to $+\infty$ with the remaining probability $1 - \sum_j \Delta F^*_d(t_j)$. We do this using the inverse probability transform algorithm (Robert and Casella, 2004, Chapter 3).
Instead, we generate $u_c$ from the continuous component $F^*_c$ using the inverse probability transform approach (Robert and Casella, 2004, Chapter 3). Specifically, we first sample $v$ from the uniform distribution over $[0, 1]$, and then define $u_c$ as the solution of the equation $F^*_c(u_c) = v$.
We approximate the integral appearing in this equation using Gaussian quadrature and compute $u_c$ using the bisection root-finding method (Quarteroni et al., 2010).
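As an illustration of this inverse-transform step, the sketch below assumes a user-supplied function `lambda_c` such that $F^*_c(t) = 1 - \exp\{-\int_0^t \lambda_c(s)\, ds\}$ (an assumed representation of the continuous component); the truncation point `upper` and the use of base R's `integrate` and `uniroot` in place of Gaussian quadrature and bisection are also illustrative choices.

```r
# Sketch: sample u_c from the continuous component F*_c by inverse transform.
sample_uc <- function(lambda_c, upper = 100) {
  Fc <- function(t) 1 - exp(-integrate(lambda_c, lower = 0, upper = t)$value)  # numerical integration
  v <- runif(1)
  if (v >= Fc(upper)) return(Inf)                                # mass beyond the search interval
  uniroot(function(t) Fc(t) - v, lower = 0, upper = upper)$root  # root finding (cf. bisection)
}
```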
8. Empirical illustration
We illustrate our procedure using survival data (freely available as part of the R dataset survival::pbc) from a randomized clinical trial of D-penicillamine for primary biliary cirrhosis of the liver (Dickson et al., 1989). In this trial, 312 cirrhosis patients were randomized to receive either D-penicillamine (158 patients) or placebo (154 patients). Patients in the D-penicillamine (respectively: placebo) arm accumulated a total of about 872 (842) person-years of follow-up, during which 65 (60) deaths were observed. Overall, 187 (59.9%) survival times were censored across study arms. Arm-specific Kaplan–Meier curves are shown in Fig. 2, panel a.
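These descriptive results can be reproduced from the publicly available data; the following is a minimal R sketch, assuming the standard coding of survival::pbc (status == 2 indicates death, and trt is missing for the non-randomized patients).

```r
library(survival)

# Restrict to the 312 randomized patients and code death as the event
pbc_trial <- subset(pbc, !is.na(trt))
pbc_trial$years <- pbc_trial$time / 365.25
pbc_trial$death <- as.integer(pbc_trial$status == 2)

# Arm-specific Kaplan-Meier curves (trt = 1: D-penicillamine, trt = 2: placebo)
km <- survfit(Surv(years, death) ~ trt, data = pbc_trial)
summary(km, times = 10)   # estimated 10-year survival probabilities by arm
plot(km, xlab = "Years since randomization", ylab = "Survival probability")
```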
Fig. 2.
Panel a: Kaplan–Meier curves for the Mayo Clinic primary biliary cirrhosis trial (cf. Section 8). Panels b and c: density estimates and box-plots of 10,000 posterior samples of the 10-year survival probability (panel b) and the 10-year restricted mean survival time (panel c) in the placebo arm; samples were obtained either with the beta-Stacy bootstrap (separately for three increasing values of $m$ up to $m = 1000$) or using the reference GvdVa algorithm (cf. Section 8). Panel d: density estimates and box-plots of 10,000 beta-Stacy bootstrap samples of the difference in mean survival times across arms (for the same three values of $m$).
Using these data, we compare the beta-Stacy bootstrap with another approach based on Algorithm a of Ghosal and van der Vaart (2017, Section 13.3.3)—which we will denote as GvdVa. For any beta-Stacy process, algorithm GvdVa can simulate approximate sample paths over a prespecified bounded interval $[0, \tau]$. This algorithm is based on a discretization of $[0, \tau]$ by means of $N$ equally-spaced points, so that larger values of $N$ provide a better approximation to the beta-Stacy process (the reference value of $N$ used in our analyses is described later). We have chosen this algorithm as comparator because, compared to the others mentioned in the introduction, algorithm GvdVa is simpler to implement (like the beta-Stacy bootstrap, it is based on exact simulation steps and does not require sampling from unnormalized distributions; cf. Blasi, 2014). Details are provided in the Supplementary Section S1.
8.1. Prior and posterior distributions
Denote with $G_0$ and $G_1$ the cumulative distribution functions of survival times in the placebo and D-penicillamine arms, respectively. We assigned each $G_j$ ($j = 0, 1$) an independent beta-Stacy prior $\mathrm{BS}(c, F)$, where $F$ is the cumulative distribution function of an exponential random variable with median equal to 10 years. For simplicity, we assumed $c(t)$ to be constant for all $t \ge 0$. These prior distributions are fairly non-informative, since they are very diffuse around their expected values (Supplementary Figure S1).
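In particular, an exponential distribution with a 10-year median has rate $\log(2)/10 \approx 0.069$ per year, so the prior mean can be coded with a small illustrative snippet.

```r
# Prior mean: exponential distribution function with median 10 years
rate0 <- log(2) / 10                      # median = log(2) / rate, so rate ~ 0.069 per year
F0 <- function(t) pexp(t, rate = rate0)   # prior mean distribution function F
f0 <- function(t) dexp(t, rate = rate0)   # its density
F0(10)                                    # equals 0.5 by construction
```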
With these priors, the posterior means of $G_0$ and $G_1$ are practically indistinguishable from the corresponding Kaplan–Meier curves (Supplementary Figure S2). This is also confirmed by the Kolmogorov–Smirnov distances $d_j$ ($j = 0, 1$), which compare the Kaplan–Meier estimate of $G_j$ and the corresponding posterior mean over the period from 0 to 12 years from randomization. We estimated these distances to be negligible for both the placebo and the D-penicillamine arms.
8.2. Inference for single-sample summaries
Using the beta-Stacy bootstrap and the GvdVa algorithm, we approximate the posterior distribution of two summaries of $G_0$: (i) the 10-year survival probability in the placebo arm, i.e. $\theta_1 = 1 - G_0(10)$, and (ii) the 10-year restricted mean survival time in the placebo arm, i.e. $\theta_2 = \int_0^{10} \{1 - G_0(t)\}\, dt$.
In each case, we obtain 10,000 posterior samples. For the beta-Stacy bootstrap, we separately consider three increasing values of $m$ (including $m = 100$ and $m = 1000$). To provide a reference against which to compare the beta-Stacy bootstrap, we implemented the GvdVa algorithm using a discretization of the time interval $[0, 10]$ based on $N$ equally-spaced points. We chose this value of $N$ by iteratively increasing it until the corresponding approximate posterior distributions of $\theta_1$ and $\theta_2$ seemed to stabilize (cf. Supplementary Section S1). Note that algorithm GvdVa can be applied to $\theta_1$ and $\theta_2$ because they depend only on the values of $G_0(t)$ for $t \in [0, 10]$.
We use Kolmogorov–Smirnov statistics to compare the distributions obtained from the beta-Stacy bootstrap and algorithm GvdVa. Specifically, for each of the two summary measures separately, we compute the statistics $D_m = \sup_x |\hat{H}_m(x) - \hat{H}_{\mathrm{GvdVa}}(x)|$, where: $m$ ranges over the three values considered; $\hat{H}_m$ is the empirical distribution of the corresponding beta-Stacy bootstrap sample; and $\hat{H}_{\mathrm{GvdVa}}$ is the empirical distribution of the GvdVa samples.
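Given two vectors of posterior draws, this statistic can be computed directly; the following is a minimal sketch, where `samples_bsb` and `samples_gvdva` are hypothetical input vectors.

```r
# Kolmogorov-Smirnov distance between two empirical distributions of posterior draws
ks_distance <- function(samples_bsb, samples_gvdva) {
  grid <- sort(unique(c(samples_bsb, samples_gvdva)))
  max(abs(ecdf(samples_bsb)(grid) - ecdf(samples_gvdva)(grid)))
}
# Equivalently, ks.test(samples_bsb, samples_gvdva)$statistic returns the same value.
```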
Results are shown in Fig. 2b–c. For the 10-year survival probability (panel b), the distribution of beta-Stacy bootstrap samples approaches that obtained from algorithm GvdVa as $m$ increases, as reflected by the associated Kolmogorov–Smirnov statistics, which decrease with $m$. Similar results were also obtained for the 10-year restricted mean survival time (panel c). The largest value of $m$ considered thus seems to have provided a good approximation to the posterior laws of interest.
8.3. Difference in mean survival times
We now consider the posterior law of the two-sample summary defined by $\theta_3 = \int_0^{+\infty} \{1 - G_1(t)\}\, dt - \int_0^{+\infty} \{1 - G_0(t)\}\, dt$, i.e. the difference in mean survival times between the D-penicillamine arm and the placebo arm. In this case, it is hard to use the GvdVa algorithm to approximate the beta-Stacy posterior, because $\theta_3$ depends on the values of $G_0$ and $G_1$ over their entire, unbounded support. In contrast, we can still use the beta-Stacy bootstrap directly to generate approximate samples from the posterior law of $\theta_3$.
In Fig. 2d, we show the distribution of 10,000 posterior samples of the difference in mean survival times obtained with the beta-Stacy bootstrap, separately using the same three values of $m$ as before. Consistently with the previous results, the distribution of posterior samples stabilizes as $m$ increases. In particular, the density estimates and quartiles of the distributions for the two largest values of $m$ are almost indistinguishable (the Kolmogorov–Smirnov distance between the two sample distributions was 0.007). These results again suggest that the largest value of $m$ considered provided a good approximation to the relevant posterior distribution.
8.4. Additional simulation study
In Supplementary Section S2, we report a simulation study aimed at assessing how the proportion of censored observations may impact the beta-Stacy bootstrap approximation of the beta-Stacy posterior distribution. Results suggest that the proportion of censored observations does not impact the quality of the approximation in comparison to the reference GvdVa algorithm, provided that $m$ is sufficiently large. Compared to a scenario with no censoring, simulation scenarios with higher censoring rates (up to 75% of censored data) did not require larger values of $m$ to obtain the same quality of approximation (the same value of $m$ was acceptable in all considered scenarios).
9. Concluding remarks
The beta-Stacy bootstrap is an algorithm to perform Bayesian non-parametric inference with censored data. The procedure generates approximate samples from a beta-Stacy process posterior (Walker and Muliere, 1997) without the need to tune Markov Chain Monte Carlo methods. The quality of the approximation is controlled by $m$, the number of samples drawn from the posterior mean distribution $F^*$ (cf. step 1 of Algorithm 4.1). Our simulations suggest that a moderately large $m$ may generally provide a good approximation, independently of the proportion of event times that are affected by censoring.
In place of the beta-Stacy process, many other non-parametric prior processes could be used to estimate summaries of the survival distribution function. Examples include piece-wise hazard processes (Arjas and Gasbarra, 1994), the gamma and extended gamma processes (Kalbfleisch, 1978; Dykstra and Laud, 1981), Pólya trees (Mauldin et al., 1992; Muliere and Walker, 1997), mixture models driven by random measures (Kottas, 2006; Riva-Palacio et al., 2021), and Bayesian Additive Regression Trees (Sparapani et al., 2016). In comparison to the beta-Stacy process, computations using alternative prior processes may require the use of Markov Chain Monte Carlo samplers due to lack of conjugacy. Whether new Bayesian bootstraps could be derived for other conjugate processes (e.g. Muliere and Walker, 1997) is a question for future research.
Inference using the beta-Stacy process requires specification of both the precision function $c$ and the prior mean distribution function $F$. To avoid having to specify these in full, we might instead define $c$ and/or $F$ as a function of a scalar or multi-dimensional parameter $\gamma$ (e.g. we might take $c(t) = \gamma$ for all $t \ge 0$). Then, instead of specifying a single value of $\gamma$, we could assign it a prior distribution $\pi(\gamma)$. This approach leads to the specification of a mixture of beta-Stacy processes as the prior distribution for $G$, i.e. $G \mid \gamma \sim \mathrm{BS}(c_\gamma, F_\gamma)$ with $\gamma \sim \pi$, in the same spirit as Antoniak (1974). In future work, we will evaluate the use of the beta-Stacy bootstrap in Monte Carlo schemes for such mixtures and their generalization to competing risks data (Arfè et al., 2018).
Acknowledgments
We thank Alejandra Avalos-Pacheco, Massimiliano Russo, and Giovanni Parmigiani for their useful comments. Part of this work was developed while the first author was supported by a post-doctoral fellowship at the Harvard-MIT Center for Regulatory Science, Harvard Medical School, United States. Analyses were conducted in R (version 4.1.2) using the libraries mvQuad, Rcpp, and ggplot2.
Appendix A. Technical lemmas and proofs
To prove Proposition 4.1, we will use results related to convergence in law of random measures—cf. Daley and Vere-Jones (2007), Section 11.1; see also Kallenberg (2017), Chapter 4. Let $\mu$ and $\mu_m$ be random measures over $[0, +\infty)$ that are finite on bounded intervals for every integer $m \ge 1$. Then, $\mu_m$ converges in law to $\mu$ if and only if $\int f\, d\mu_m \to \int f\, d\mu$ in law (as real-valued random variables) for every bounded continuous function $f$ with bounded support. This happens if and only if $E[\exp(-\int f\, d\mu_m)] \to E[\exp(-\int f\, d\mu)]$ as $m \to +\infty$ for every such function (Daley and Vere-Jones, 2007, Proposition 11.1.VIII).
Let $\mathcal{D}$ be the space of all right-continuous functions on $[0, +\infty)$ with left-hand limits, equipped with the Skorokhod topology (Jacod and Shiryaev, 2003, Chapter VI, Section 1b). The following result links convergence in law of $\mu_m$ to $\mu$ to convergence in law of their cumulative distribution functions as random elements of $\mathcal{D}$.
Lemma A.1. The random measure $\mu_m$ converges in law to $\mu$ if and only if the function $t \mapsto \mu_m([0, t])$ converges in law to $t \mapsto \mu([0, t])$ as random elements of $\mathcal{D}$ with the Skorokhod topology.
Proof. This result can be shown using an argument similar to the one presented before Lemma 11.1.XI of Daley and Vere-Jones (2007). □
We now prove the following lemma, which implies that the random measure induced by $Z^*_m = -\log(1 - G^*_m)$ converges in law, conditionally on the data, to that induced by $Z = -\log(1 - G)$, i.e. $E_n[\exp(-\int f\, dZ^*_m)] \to E_n[\exp(-\int f\, dZ)]$ as $m \to +\infty$ for every bounded continuous $f$ with bounded support. For simplicity of notation, $E_n[\cdot]$ denotes the conditional expectation with respect to the data $(T_1, \delta_1), \dots, (T_n, \delta_n)$. Also let $\hat{F}_m$ be the empirical distribution function of $x_1, \dots, x_m$, where $x_1, \dots, x_m$ are the variables from step 1 of Algorithm 4.1.
For any given $m$ and $t \ge 0$, we define $\gamma_m$ (respectively: $\gamma$) as the function in Eq. (2), but with $c^*$ and $\hat{F}_m$ (respectively: $c^*$ and $F^*$) in place of $c$ and $F$. With these notations, by Lemma 1 of Ferguson (1974), the conditional Laplace functional of $Z^*_m$ given $x_1, \dots, x_m$ can be expressed in terms of $\gamma_m$ and, similarly, that of $Z$ in terms of $\gamma$.
Lemma A.2. (i) If $f$ is a non-negative, bounded, measurable function with bounded support (but not necessarily continuous), then, conditionally on the data, $\int f\, dZ^*_m \to \int f\, dZ$ in law as $m \to +\infty$. (ii) The previous statement also holds for every bounded measurable $f$ with bounded support.
Proof. First we prove point (i). By dominated convergence, it suffices to show that as . with probability 1 for all functions such that and for all for some . To do so, we note that uniformly in , and so , for all fixed with probability 1. This follows from the Glivenko–Cantelli theorem, the fact that is bounded, and because the functions and are bounded and Lipschitz over . Now, fix such that (this is possible because for all ). With probability 1, for all and large . In such case, since and min(, 1), it is for and some . Since , , and , the thesis follows by dominated convergence.
To prove point (ii), let $f$ be a bounded measurable function with bounded support. Define $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, which are both bounded non-negative measurable functions with bounded support. Now, by point (i), $E_n[\exp(-s_1 \int f^+\, dZ^*_m - s_2 \int f^-\, dZ^*_m)] \to E_n[\exp(-s_1 \int f^+\, dZ - s_2 \int f^-\, dZ)]$ as $m \to +\infty$ for every $s_1, s_2 \ge 0$. Consequently, $(\int f^+\, dZ^*_m, \int f^-\, dZ^*_m) \to (\int f^+\, dZ, \int f^-\, dZ)$ in law as a random vector, by convergence of the corresponding joint Laplace transform (Kallenberg, 1997, Theorem 4.3). Since $\int f\, dZ^*_m = \int f^+\, dZ^*_m - \int f^-\, dZ^*_m$, the thesis follows from the continuous mapping theorem.
Using Lemmas A.1 and A.2, we can now prove that, conditionally on the data, $\int f\, dG^*_m$ converges in distribution to $\int f\, dG$ for every bounded continuous function $f$ with bounded support.
Lemma A.3. The random measure $dG^*_m$ converges in law to $dG$ as $m \to +\infty$, conditionally on the data.
Proof. Let $h : \mathcal{D} \to \mathcal{D}$ be defined by $h(x)(t) = 1 - \exp\{-\max(x(t), 0)\}$ for every $t \ge 0$. Since the map $u \mapsto 1 - \exp\{-\max(u, 0)\}$, defined for every real $u$, is Lipschitz-continuous, $h$ is also continuous with respect to the Skorokhod topology on $\mathcal{D}$. Since $G^*_m(t) = h(Z^*_m)(t)$ and $G(t) = h(Z)(t)$ for every $t \ge 0$, the thesis now follows from Lemmas A.1 and A.2 and the continuous mapping theorem. □
We are now ready to prove Proposition 4.1.
Proof of Proposition 4.1. From Lemma A.3 and Proposition 4.19 of Kallenberg (2017), it follows that $\int \phi\, dG^*_m \to \int \phi\, dG$ in law conditionally on the data for every bounded continuous function $\phi$ (not necessarily with bounded support). Then, using an argument like the one in the proof of Lemma 4.12 of Kallenberg (2017), it follows that the same is true for every bounded measurable function $\phi$ (not necessarily continuous) that is $F^*$-almost everywhere continuous. We now show that the thesis holds for any $F^*$-integrable $\phi$ (not necessarily bounded), provided that it is $F^*$-almost everywhere continuous. In fact, by an argument like the one used to prove point (ii) of Lemma A.2, it suffices to show that this is true for any such non-negative $\phi$.
Consequently, suppose that for every . By the Portmanteau theorem, it suffices to show that as for any real-valued function such that and for some . To do this, let and define . Then, , where , , and . Now, for , because is bounded, measurable, and . Consequently, lim . Since for every and , we also have that . By the Markov inequality, for every it holds that —where the last equality follows from (cf. Section 4). As a consequence, lim . However, by the dominated convergence theorem, as . Hence, the thesis follows by first letting and then from above.
Appendix B. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jspi.2022.07.001. Code to implement the beta-Stacy bootstrap and reproduce our results is available at https://github.com/andreaarfe/.
References
- Al Labadi L, Zarepour M, 2013. A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. Canad. J. Statist 41 (3), 466–487.
- Antoniak CE, 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist 2 (6), 1152–1174.
- Arfè A, Peluso P, Muliere P, 2018. Reinforced urns and the subdistribution beta-Stacy process prior for competing risks analysis. Scand. J. Stat 46, 706–734.
- Arjas E, Gasbarra D, 1994. Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statist. Sinica 505–524.
- Barrientos AF, Peña V, 2020. Bayesian bootstraps for massive data. Bayesian Anal. 15 (2).
- Blasi P, 2014. Simulation of the beta-Stacy process. In: Wiley StatsRef: Statistics Reference Online. John Wiley and Sons, Ltd, URL https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat03869.
- Daley D, Vere-Jones D, 2007. An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure. Springer, New York.
- Damien P, Laud PW, Smith AF, 1995. Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference. J. R. Stat. Soc. Ser. B Stat. Methodol 547–563.
- Dickson ER, Grambsch PM, Fleming TR, Fisher LD, Langworthy A, 1989. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 10 (1), 1–7.
- Doksum K, 1974. Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab 183–201.
- Dykstra RL, Laud P, 1981. A Bayesian nonparametric approach to reliability. Ann. Statist 9 (2), 356–367.
- Efron B, 1981. Censored data and the bootstrap. J. Amer. Statist. Assoc 76 (374), 312–319.
- Efron B, Tibshirani R, 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci 54–75.
- Ferguson TS, 1973. A Bayesian analysis of some nonparametric problems. Ann. Statist 1, 209–230.
- Ferguson T, 1974. Prior distributions on spaces of probability measures. Ann. Statist 2 (4), 615–629.
- Ferguson TS, Klass MJ, 1972. A representation of independent increment processes without Gaussian components. Ann. Math. Stat 43 (5), 1634–1643.
- Ferguson TS, Phadia EG, 1979. Bayesian nonparametric estimation based on censored data. Ann. Statist 163–186.
- Ghosal S, van der Vaart A, 2017. Fundamentals of Nonparametric Bayesian Inference. In: Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.
- Gill RD, Johansen S, 1990. A survey of product-integration with a view toward application in survival analysis. Ann. Statist 18 (4), 1501–1555.
- Heitjan DF, 1993. Ignorability and coarse data: Some biomedical examples. Biometrics 49, 1099–1109.
- Heitjan DF, Rubin DB, 1991. Ignorability and coarse data. Ann. Statist 19, 2244–2253.
- Hjort NL, 1990. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist 18, 1259–1294.
- Ishwaran H, Zarepour M, 2002. Dirichlet prior sieves in finite normal mixtures. Statist. Sinica 12 (3), 941–963.
- Jacod J, Shiryaev AN, 2003. Limit Theorems for Stochastic Processes. Springer Berlin Heidelberg, Berlin, Heidelberg.
- Kalbfleisch JD, 1978. Non-parametric Bayesian analysis of survival time data. J. R. Stat. Soc. Ser. B Stat. Methodol 40 (2), 214–221.
- Kalbfleisch JD, Prentice RL, 2002. The Statistical Analysis of Failure Time Data, 2nd ed. John Wiley & Sons, Hoboken, New Jersey.
- Kallenberg O, 1997. Foundations of Modern Probability. Springer, New York.
- Kallenberg O, 2017. Random Measures, Theory and Applications. In: Probability Theory and Stochastic Modelling, Springer International Publishing.
- Kim Y, Lee J, 2003. Bayesian bootstrap for proportional hazards models. Ann. Statist 31 (6), 1905–1922.
- Kottas A, 2006. Nonparametric Bayesian survival analysis using mixtures of Weibull distributions. J. Statist. Plann. Inference 136 (3), 578–596.
- Lo AY, 1987. A large sample study of the Bayesian bootstrap. Ann. Statist 15 (1).
- Lo AY, 1988. A Bayesian bootstrap for a finite population. Ann. Statist 16 (4), 1684–1695.
- Lo AY, 1991. Bayesian bootstrap clones and a biometry function. Sankhyā: Indian J. Stat 53 (3), 320–333.
- Lo AY, 1993. A Bayesian bootstrap for censored data. Ann. Statist 21 (1), 100–123.
- Lyddon SP, Holmes CC, Walker SG, 2019. General Bayesian updating and the loss-likelihood bootstrap. Biometrika 106 (2), 465–478.
- Lyddon S, Walker S, Holmes CC, 2018. Nonparametric learning from Bayesian models with randomized objective functions. In: NeurIPS. pp. 2075–2085.
- Mauldin RD, Sudderth WD, Williams S, 1992. Polya trees and random distributions. Ann. Statist 20 (3), 1203–1221.
- Muliere P, Secchi P, 1996. Bayesian nonparametric predictive inference and bootstrap techniques. Ann. Inst. Statist. Math 48 (4), 663–673.
- Muliere P, Secchi P, 2003. Weak convergence of a Dirichlet-multinomial process. Georgian Math. J 10 (2), 319–324.
- Muliere P, Walker S, 1997. A Bayesian non-parametric approach to survival analysis using Polya trees. Scand. J. Stat 24 (3), 331–340.
- Muliere P, Walker S, 1998. Extending the family of Bayesian bootstraps and exchangeable urn schemes. J. R. Stat. Soc. Ser. B Stat. Methodol 60 (1), 175–182.
- Praestgaard J, Wellner JA, 1993. Exchangeably weighted bootstraps of the general empirical process. Ann. Probab 21 (4), 2053–2086.
- Quarteroni A, Sacco R, Saleri F, 2010. Numerical Mathematics. In: Texts in Applied Mathematics, (37), Springer-Verlag Berlin Heidelberg.
- Riva-Palacio A, Leisen F, Griffin J, 2021. Survival regression models with dependent Bayesian nonparametric priors. J. Amer. Statist. Assoc 1–10.
- Robert C, Casella G, 2004. Monte Carlo Statistical Methods, 2nd ed. In: Springer Texts in Statistics, Springer, New York.
- Royston P, Parmar MK, 2013. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med. Res. Methodol 13 (1), 152.
- Rubin DB, 1981. The Bayesian bootstrap. Ann. Statist 130–134.
- Sparapani RA, Logan BR, McCulloch RE, Laud PW, 2016. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Stat. Med 35 (16), 2741–2753.
- Walker S, Damien P, 1998. A full Bayesian non-parametric analysis involving a neutral to the right process. Scand. J. Stat 25 (4), 669–680.
- Walker S, Muliere P, 1997. Beta-Stacy processes and a generalization of the Pólya-urn scheme. Ann. Statist 25, 1762–1780.
- Wolpert RL, Ickstadt K, 1998. Simulation of Lévy random fields. In: Practical Nonparametric and Semiparametric Bayesian Statistics. Springer, New York, pp. 227–242.