Author manuscript; available in PMC: 2024 Jan 1.
Published in final edited form as: J Stat Plan Inference. 2022 Jul 14;222:241–251. doi: 10.1016/j.jspi.2022.07.001

A general Bayesian bootstrap for censored data based on the beta-Stacy process

Andrea Arfè a,*, Pietro Muliere b
PMCID: PMC10347888  NIHMSID: NIHMS1908487  PMID: 37457239

Abstract

We introduce a novel procedure to perform Bayesian non-parametric inference with right-censored data, the beta-Stacy bootstrap. This approximates the posterior law of summaries of the survival distribution (e.g. the mean survival time). More precisely, our procedure approximates the joint posterior law of functionals of the beta-Stacy process, a non-parametric process prior that generalizes the Dirichlet process and that is widely used in survival analysis. The beta-Stacy bootstrap generalizes and unifies other common Bayesian bootstraps for complete or censored data based on non-parametric priors. It is defined by an exact sampling algorithm that does not require tuning of Markov Chain Monte Carlo steps. We illustrate the beta-Stacy bootstrap by analyzing survival data from a real clinical trial.

Keywords: Censored data, Bayesian bootstrap, Bayesian non-parametric, Beta-Stacy process

1. Introduction

Survival data is often censored, hindering statistical inferences (Kalbfleisch and Prentice, 2002). In this setting, the goal is often to perform inference on specific summaries ϕ(G) of the cumulative distribution function G(x) (defined for x ≥ 0) that generated the observed survival times Y1, …, Yn, e.g. the expected survival time, or the probability of surviving past a landmark time-point.

We introduce the beta-Stacy bootstrap, a new method to perform Bayesian non-parametric inference for functionals ϕ(G) of the distribution function G using censored data. Specifically, the proposed approach generates approximate samples from the posterior law of ϕ(G) obtained by assuming that G is a beta-Stacy process (Walker and Muliere, 1997). This process defines a non-parametric prior law for distribution functions widely used with censored data (Walker and Damien, 1998; Al Labadi and Zarepour, 2013; Arfè et al., 2018). The beta-Stacy process extends the classical Dirichlet process of Ferguson (1973) and is conjugate to both complete and right-censored data (Walker and Muliere, 1997). It is also closely related to the beta process of Hjort (1990): G is a beta-Stacy process if and only if its cumulative hazard function is a beta process (Walker and Muliere, 1997).

The proposed approach belongs to the family of Bayesian bootstrap procedures pioneered by Rubin (1981). In addition to Rubin’s, this family includes the proper Bayesian bootstrap of Muliere and Secchi (1996), the Bayesian bootstrap for censored data of Lo (1993), and others (Lo, 1991; Kim and Lee, 2003; Lyddon et al., 2019). Similarly to Efron’s classical bootstrap (Efron and Tibshirani, 1986), Bayesian bootstraps repeatedly re-sample and/or re-weight the observed data to induce a probability distribution for ϕ(G). More precisely, Bayesian bootstraps generate approximate samples from the posterior distribution of ϕ(G) associated with some non-parametric prior for G (for connections with Efron’s frequentist procedure, see Lo, 1987, 1991, 1993; Muliere and Secchi, 1996). Interest in these sampling algorithms has recently increased thanks to their scalability and computational simplicity—e.g. they do not require tuning of Markov Chain Monte Carlo steps (Lyddon et al., 2018; Barrientos and Peña, 2020).

We show that the beta-Stacy bootstrap generalizes other common Bayesian bootstrap procedures. These include those of Rubin (1981) and Muliere and Secchi (1996), which are at the core of other recent proposals (Lyddon et al., 2019; Barrientos and Peña, 2020), but cannot be applied in the presence of censoring. They also include Lo’s procedure (1993), which can incorporate censored observations, but cannot incorporate prior information on the functional form of G. We characterize each of these methods as a special or limiting case of the beta-Stacy bootstrap (cf. Fig. 1), which, in comparison, can be applied with censored data and makes it possible to incorporate prior information on the data-generating distribution.

Fig. 1.

Relations between different Bayesian bootstraps: BSB (in red), beta-Stacy Bootstrap (cf. Section 4); PBB, Proper Bayesian Bootstrap (Muliere and Secchi, 1996); BBC, Bayesian Bootstrap for Censored data (Lo, 1993); BB, classical Bayesian Bootstrap (Rubin, 1981). The prior precision of the BSB is controlled by a function c(x), while that of the PBB is controlled by a constant k. (a) the BSB and PBB coincide when there is no censoring and c(x)=k; (b) the BSB reduces to the BBC if c(x)→0 for every x; (c) the PBB reduces to the BB if k→0; (d) the BBC and BB coincide when there is no censoring. See Section 5 for details.

We note that, when G has a beta-Stacy prior distribution, posterior inferences for ϕ(G) could also be based on algorithms for the simulation of Lévy processes (cf. Damien et al., 1995, Walker and Damien, 1998, Ferguson and Klass, 1972, and Wolpert and Ickstadt, 1998; see also Ghosal and van der Vaart, 2017, Section 13.3.3 and Blasi, 2014 for reviews and applications to the beta-Stacy process). With these methods, it is possible to generate approximate samples from the posterior law of G, and so also from the posterior law of ϕ(G). However, some algorithms (e.g. Damien et al., 1995; Walker and Damien, 1998) can only generate approximate sample paths {G(x) : x ∈ [0, T]} over some bounded interval [0, T]. Hence, they may be difficult to apply to summaries ϕ(G) that depend on all values of G, such as the expected survival time. These cases are not problematic for the beta-Stacy bootstrap. Other approaches (e.g. Ferguson and Klass, 1972; Wolpert and Ickstadt, 1998) can approximately sample full paths {G(x) : x ∈ [0, +∞)} from the posterior law of G, but they are computationally more complicated than the beta-Stacy bootstrap (e.g. they may require auxiliary algorithms to sample from unnormalized distributions).

The rest of the paper is structured as follows. In Section 2, we introduce notations and assumptions used throughout the manuscript. In Section 3, we review the definition and properties of the beta-Stacy process. In Section 4, we introduce the beta-Stacy bootstrap and study its approximation properties (most technical proofs are provided in Appendix A). In Section 5, we describe the connections of the beta-Stacy bootstrap with other Bayesian bootstrap algorithms. In Section 6, we briefly describe a generalization of the beta-Stacy bootstrap to the k-sample setting. In Section 7, we describe a computational approach for implementing the beta-Stacy bootstrap. Using data from a clinical trial in hepatology (Dickson et al., 1989), in Section 8 we illustrate the beta-Stacy bootstrap and contrast it to an algorithm that generates approximate beta-Stacy sample paths. We describe this algorithm in the Supplementary Material, where we also report on additional comparative simulation studies (cf. Section 8.4). Finally, Section 9 provides concluding remarks and discusses potential avenues for future research. Code to replicate our analyses is available online at https://github.com/andreaarfe/ or by request to the first author.

2. Basic notations and assumptions

If Z : [0, +∞) → R is a non-decreasing, right-continuous function with left-hand limits, we let Z̄(x) = 1 − Z(x) and ΔZ(x) = Z(x) − Z(x−) for every x ≥ 0 (where Z(0−) = 0). We also identify Z with its induced measure, writing Zh = ∫ h(x) dZ(x) for any function h(x), and Z(S) = ∫ I{x ∈ S} dZ(x) for any S ⊆ [0, ∞). A function h(x) is Z-integrable if Zh < +∞. We will denote with Dh ⊆ [0, ∞) the set of discontinuity points of h, and say that h is Z-almost everywhere continuous if Z(Dh) = 0 (this is true when h is continuous, and it implies that h must be continuous at every atom of Z). If Z is random, then its distribution is fully characterized by its Laplace functional, i.e. the map h(x) ↦ E[exp(−Zh)], where h(x) is any non-negative function (Kallenberg, 2017, Chapter 2).

We assume that T1, …, Tn are independent survival times, each with the same cumulative distribution function G (with G(0) = 0). In survival analysis applications, it is common for T1, …, Tn to be (right) censored. In such cases, the observed dataset is formed by Y1 = (T1^c, δ1), …, Yn = (Tn^c, δn), where, for each i = 1, …, n, Ti^c = min(Ti, Ci) is the censored version of Ti, Ci its censoring time, and δi = I{Ti ≤ Ci} its censoring indicator. As common in this setting, we assume that censoring is independent—i.e. that C1, …, Cn are independent of T1, …, Tn (Kalbfleisch and Prentice, 2002, Section 3.2)—and ignorable—which essentially means that C1, …, Cn can be treated as known constants when computing posterior distributions (Heitjan and Rubin, 1991; Heitjan, 1993). We will also use the same notations when there is no censoring, in which case we simply define Ti^c = Ti and δi = 1 for every i = 1, …, n. To refer to either of these situations, we will simply say that the (potentially censored) survival times Y1, …, Yn are generated by G.

Let Y1, …, Yn be (possibly censored) survival times generated from a distribution function G. Our aim is to make inferences on ϕ(G) = f(Gh1, …, Ghk), a summary of G defined by the real-valued functions f(x1, …, xk) and h1(x), …, hk(x) (later we consider vectors of such summaries). Examples include the mean (h1(x) = x, f(x1) = x1), the variance (h1(x) = x², h2(x) = x, f(x1, x2) = x1 − x2²), or the restricted mean survival time (h1(x) = min(x, τ), f(x1) = x1, τ > 0; Royston and Parmar, 2013). From the Bayesian non-parametric perspective, any inference on ϕ(G) can be accomplished first by assuming that G is distributed according to some prior process, then computing or approximating the posterior distribution of G, and so of ϕ(G), conditional on the observed data Y1, …, Yn.
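When G is discrete with finitely many atoms (as are the distributions produced by the bootstrap of Section 4), such summaries can be evaluated exactly as Gh = ∑i h(xi) zi, where G places probability zi on the atom xi. The following sketch, with illustrative function names of our own choosing, computes the three examples above:

```python
# Sketch: evaluating summaries phi(G) = f(Gh1, ..., Ghk) for a discrete G
# placing probability z_i on atom x_i. Function names are illustrative.

def integral(h, atoms, probs):
    """Gh = sum_i h(x_i) z_i for a discrete G."""
    return sum(h(x) * z for x, z in zip(atoms, probs))

def mean(atoms, probs):                   # h1(x) = x, f(x1) = x1
    return integral(lambda x: x, atoms, probs)

def variance(atoms, probs):               # h1(x) = x^2, h2(x) = x, f(x1, x2) = x1 - x2^2
    return integral(lambda x: x * x, atoms, probs) - mean(atoms, probs) ** 2

def restricted_mean(atoms, probs, tau):   # h1(x) = min(x, tau), f(x1) = x1
    return integral(lambda x: min(x, tau), atoms, probs)
```

For example, a distribution with atoms 1, 2, 3 and probabilities 0.2, 0.3, 0.5 has mean 2.3 and 2-year restricted mean 1.8.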

Let k > 0 and F be a distribution function over [0, +∞). We say that G is a Dirichlet process DP(k, F), and write G ∼ DP(k, F), if, for all 0 = x0 < x1 < ⋯ < xk < xk+1 = +∞, (G(x1) − G(x0), …, G(xk+1) − G(xk)) has Dirichlet distribution Dir(α1, α2, …, αk+1), where αi = k(F(xi) − F(xi−1)). If G ∼ DP(k, F) and there is no censoring, the posterior law of G conditional on Y1, …, Yn is DP(k + n, F*), with F*(x) = (k/(k+n)) F(x) + (1/(k+n)) ∑_{i=1}^n I{Ti ≤ x} (Ferguson, 1973). However, if any Ti is censored (i.e. δi = 0), then this posterior distribution is not a Dirichlet process anymore (Ferguson and Phadia, 1979; Walker and Muliere, 1997). In contrast, the beta-Stacy process is conjugate with respect to censored data, allowing simple posterior computations (Walker and Muliere, 1997). (Later, we will also discuss the use of prior processes different from the beta-Stacy in the considered setting.)

3. The beta-Stacy process prior

The beta-Stacy process is the law of a random cumulative distribution function G(x) with support in [0, +∞) (Walker and Muliere, 1997). It is a neutral-to-the-right process, a type of non-parametric prior widely used with censored data (Doksum, 1974; Ferguson and Phadia, 1979). This means that if Z(x) = −log(1 − G(x)), then the increments Z(t1) − Z(t0), Z(t2) − Z(t1), …, Z(tk) − Z(tk−1) are independent for every 0 = t0 < t1 < ⋯ < tk (Ghosal and van der Vaart, 2017, Chapter 13).

Let F(x) be a cumulative distribution function with F(0) = 0 and jumps at locations x1 < x2 < ⋯ (so ΔF(xj) > 0 for every xj). Also let c(x) > 0 for every x ≥ 0.

Definition 3.1 (Walker and Muliere, 1997). The cumulative distribution function G is a beta-Stacy process BS(c, F) if the Laplace functional of Z satisfies

−log E[exp(−Zh)] = ∫0^∞ ∫0^∞ (1 − e^{−u h(x)}) ρ(x, u) dF(x) du (1)

for every h(x)0, where

ρ(x, u) = (1 − e^{−u})^{−1} c(x) exp(−u c(x) F̄(x)) r(u c(x) ΔF(x)) (2)

and r(u) = (1 − e^{−u})/u for u > 0, r(0) = 1.

The sample paths of G(x) are discrete, as Z(x) can only increase by an at most countable number of jumps (Walker and Muliere, 1997). A jump always occurs at each atom xj of F(x); its size is ΔG(xj) = Uj ∏_{xi < xj} (1 − Ui) for independent Uj = 1 − exp(−ΔZ(xj)) ∼ Beta(c(xj) ΔF(xj), c(xj) F̄(xj)). When F is discrete, G can only jump at each xj, so Ḡ(x) = ∏_{xj ≤ x} (1 − Uj) for x > 0. Otherwise, some jumps also occur at random positions. Their locations and sizes are determined by the x- and u-coordinates of the points (x, u) of a non-homogeneous Poisson process on (0, +∞)²; this is independent of each Uj and has intensity measure (1 − e^{−u})^{−1} c(x) exp(−u c(x) F̄(x)) dFc(x) du, where Fc(x) = F(x) − ∑_{xj ≤ x} ΔF(xj) is the continuous part of F.

If G ∼ BS(c, F), then, infinitesimally speaking, dG(x)/Ḡ(x−) ∼ Beta(c(x) dF(x), c(x) F̄(x)) (Walker and Muliere, 1997). Hence, E[dG(x)] = dF(x), and so E[G(x)] = F(x), for all x > 0. Moreover, the variance of dG(x) is a decreasing function of c(x), with Var(dG(x)) → 0 as c(x) → +∞. The function c(x) thus controls the dispersion of the distribution BS(c, F) around its mean F. Throughout, we will assume that (i) F(x) < 1 for all x > 0 and (ii) ε ≤ c(x) ≤ ε^{−1} for all x > 0 and some ε ∈ (0, 1). The former condition implies that Z(x) is finite (and so G(x) < 1) with probability 1 for every x > 0. The latter instead rules out extreme cases in which dG(x) has null or arbitrarily large variance for some x > 0. (Both are technical requirements needed to prove Lemma A.2 in Appendix A.)

As previously mentioned, the classical Dirichlet process is a special case of the beta-Stacy process. In fact, Walker and Muliere (1997) show that if c(x) = k for all x > 0, then BS(c, F) = DP(k, F). Contrary to the Dirichlet process, however, the beta-Stacy process is conjugate with respect to right-censored data. Specifically, assume that (i) Y1, …, Yn are generated by G ∼ BS(c, F); (ii) N(x) = ∑_{i=1}^n I{Ti^c ≤ x, δi = 1} is the number of uncensored survival times less than or equal to x ≥ 0; and (iii) M(x) = ∑_{i=1}^n I{Ti^c ≥ x} is the number of subjects still at risk at x ≥ 0. Then we have the following result:

Theorem 3.1 (Theorem 4, Walker and Muliere, 1997). The posterior distribution of G conditional on Y1, …, Yn is the beta-Stacy process BS(c*, F*), where

F*(x) = 1 − ∏_{t ∈ [0,x]} [1 − (c(t) dF(t) + dN(t)) / (c(t) F̄(t) + M(t))], (3)
c*(x) = (c(x) F̄(x) + M(x) − ΔN(x)) / (1 − F*(x)), (4)

and ∏_{t ∈ [0,x]} is the product integral operator of Gill and Johansen (1990).

The posterior mean F*(x) = E[G(x) | Y1, …, Yn] from Eq. (3) converges to Ĝ(x) = 1 − ∏_{t ∈ [0,x]} [1 − dN(t)/M(t)], the standard Kaplan–Meier estimator of the distribution function, as c(x) → 0 for all x > 0 (Walker and Muliere, 1997).
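For concreteness, this Kaplan–Meier limit can be computed directly from the censored pairs (Ti^c, δi). The helper below is an illustrative sketch of the product-limit formula, not the paper's code:

```python
# Sketch of the Kaplan-Meier product-limit estimator
# G^(x) = 1 - prod_{t <= x} [1 - dN(t)/M(t)], the limit of the posterior
# mean F*(x) as c(x) -> 0. Illustrative helper, not the paper's code.

def kaplan_meier(times, events, x):
    """Return G^(x) from pairs (t_i, delta_i), with delta_i = 1 for events."""
    surv = 1.0
    for t in sorted(set(t for t, d in zip(times, events) if d == 1)):
        if t > x:
            break
        at_risk = sum(1 for u in times if u >= t)                            # M(t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)  # dN(t)
        surv *= 1.0 - deaths / at_risk
    return 1.0 - surv
```

For instance, with times (1, 2, 3) and event indicators (1, 0, 1), Ĝ(1) = 1/3.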

In practice, F*(x) can be computed as F*(x) = 1 − (1 − Fd*(x))(1 − Fc*(x)), where, respectively, Fd* and Fc* are the following discrete and continuous distribution functions (Gill and Johansen, 1990). First,

Fd*(x) = 1 − ∏ [1 − (c(t) ΔF(t) + ΔN(t)) / (c(t) F̄(t) + M(t))], (5)

where the product ranges over all positive tx such that ΔF(t)+ΔN(t)>0 (which are at most countable). Second,

Fc*(x) = 1 − exp(−∫0^x c(t) dFc(t) / (c(t)(1 − F(t)) + M(t))), (6)

where Fc(x) = F(x) − ∑_{xj ≤ x} ΔF(xj) is F(x) with the discontinuities removed.

4. The beta-Stacy bootstrap

We now introduce the beta-Stacy bootstrap. Let Y1, …, Yn be (possibly censored) survival times generated by G ∼ BS(c, F). The proposed procedure approximately samples from the law of ϕ(G) = f(Gh1, …, Ghk) conditional on Y1, …, Yn. More precisely, it samples from an approximation to the law of ϕ(G*), where G* ∼ BS(c*, F*) and F*, c* are from Eqs. (3) and (4).

Algorithm 4.1. The beta-Stacy bootstrap is defined by the following steps:

  1. Sample X1, …, Xm from F* and determine the corresponding number D of distinct values X1,m* < ⋯ < XD,m* (later we describe how to implement this step in practice and provide guidance on how to choose m).

  2. Compute αi = c*(Xi,m*) ΔFm(Xi,m*) and βi = c*(Xi,m*) F̄m(Xi,m*) for every i = 1, …, D, where Fm(x) = (1/m) ∑_{i=1}^m I{Xi ≤ x} is the empirical distribution function of X1, …, Xm.

  3. For all i = 1, …, D, generate Ui ∼ Beta(αi, βi) (with UD = 1, as βD = 0) and let Zi = Ui ∏_{j=1}^{i−1} (1 − Uj).

  4. Let Gm(x) = ∑_{i=1}^D I{Xi,m* ≤ x} Zi and compute ϕ(Gm) = f(Gmh1, …, Gmhk), where Gmhj = ∑_{i=1}^D hj(Xi,m*) Zi for all j = 1, …, k.

  5. Output ϕ(Gm) as an approximate sample from the distribution of ϕ(G).
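Assuming step 1 has already produced the samples X1, …, Xm from F* and that the posterior precision c* is available as a function, steps 2–4 can be sketched as follows; all names are ours, and for simplicity only a single functional Gmh is returned:

```python
import random

# Sketch of steps 2-4 of Algorithm 4.1, given samples xs from F* (step 1)
# and the posterior precision function c_star. Illustrative names only.

def beta_stacy_bootstrap_draw(xs, c_star, h):
    m = len(xs)
    atoms = sorted(set(xs))                  # distinct values X*_{1,m} < ... < X*_{D,m}
    remaining = 1.0                          # running product prod_{j<i} (1 - U_j)
    total = 0.0
    for i, x in enumerate(atoms):
        delta_fm = sum(1 for v in xs if v == x) / m   # Delta F_m(X*_{i,m})
        fm_bar = sum(1 for v in xs if v > x) / m      # F-bar_m(X*_{i,m})
        alpha = c_star(x) * delta_fm
        beta = c_star(x) * fm_bar
        # U_D = 1 for the largest atom, since beta_D = 0
        u = 1.0 if i == len(atoms) - 1 else random.betavariate(alpha, beta)
        z = u * remaining                              # Z_i
        remaining *= 1.0 - u
        total += h(x) * z                              # accumulate G_m h
    return total
```

With h ≡ 1 the output is exactly 1, since the stick-breaking weights Zi always sum to one when UD = 1.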

We note that, in step 2 above, ΔFm(Xi,m*) and F̄m(Xi,m*) are just the proportions of values X1, …, Xm that are equal to or strictly greater than Xi,m*, respectively. We also note that the law of Gm in step 4 is the mixture of the beta-Stacy process BS(c*, Fm) with mixing measure ∏_{i=1}^m F*(dxi), the joint law of X1, …, Xm. This generalizes the Dirichlet-multinomial process, which is a mixture of Dirichlet processes with mean Fm (Ishwaran and Zarepour, 2002; Muliere and Secchi, 2003).

Some of the X1, …, Xm sampled in step 1 can be equal to one of the observed uncensored event times among Y1, …, Yn. This is because every observed event time is an atom of F*, as shown by Eq. (5). However, some of the values X1, …, Xm can also be new observations sampled from the support of the prior mean F (e.g. these may come from the continuous component of F* in Eq. (6)). This deviates from other Bayesian bootstrap procedures, which typically only resample the observed data (Rubin, 1981; Lo, 1993).

The following result shows that, if h is F*-integrable (so that the posterior mean of Gh is finite) and F*-almost everywhere continuous (a necessary technical condition to prove this result; cf. Appendix A), then the law of Gmh generated by Algorithm 4.1 using data Y1, …, Yn approximates the posterior law of Gh conditional on Y1, …, Yn for large m. More precisely, it shows that Gmh converges in law to Gh conditional on Y1, …, Yn, i.e. E[H(Gmh) | Y1, …, Yn] → E[H(Gh) | Y1, …, Yn] as m → +∞ for any bounded continuous function H (note that the sample size n is fixed, and only the number of resamples m varies).

Proposition 4.1. If h : [0, +∞) → R is F*-integrable and F*-almost everywhere continuous, then Gmh → Gh in law as m → +∞, conditional on Y1, …, Yn.

Proof. The proof is provided in Appendix A, as it relies on multiple lemmas.

The following corollary implies that the beta-Stacy bootstrap can also approximate the joint distribution of vectors of the form (Gh1, …, Ghk). This is useful to approximate the joint distribution of multiple summaries of G, e.g. the joint distribution of its first k moments (hj(x) = x^j for j = 1, …, k).

Corollary 4.1. Let h1, …, hk be F*-integrable and F*-almost everywhere continuous. Then, (Gmh1, …, Gmhk) → (Gh1, …, Ghk) in law conditional on Y1, …, Yn for m → +∞, i.e. E[H(Gmh1, …, Gmhk) | Y1, …, Yn] → E[H(Gh1, …, Ghk) | Y1, …, Yn] as m → +∞ for any bounded continuous function H.

Proof. Take λ1, …, λk ∈ R and define h = λ1h1 + ⋯ + λkhk. By Proposition 4.1, Gmh → Gh in law for m → +∞. This implies that the joint characteristic function of (Gmh1, …, Gmhk) converges to that of (Gh1, …, Ghk): E[exp(i{λ1Gmh1 + ⋯ + λkGmhk}) | Y1, …, Yn] = E[exp(i Gmh) | Y1, …, Yn] → E[exp(i Gh) | Y1, …, Yn] = E[exp(i{λ1Gh1 + ⋯ + λkGhk}) | Y1, …, Yn] for m → +∞. □

Consequently, for large m the law of the sample ϕ(Gm) = f(Gmh1, …, Gmhk) generated by the beta-Stacy bootstrap is approximately the same as that of ϕ(G) = f(Gh1, …, Ghk). In fact, if f is continuous (as is the case for all examples considered in this paper), then by Corollary 4.1 and the continuous mapping theorem it holds that ϕ(Gm) → ϕ(G) in law as m → +∞. Hence, if m is sufficiently large (e.g. m ≥ 1000; see Section 8), by repeating steps 1–4 above independently, it is possible to generate an approximate sample of arbitrary size from the posterior law of ϕ(G). More generally, the joint law of (ϕ1(Gm), …, ϕp(Gm)) converges to that of (ϕ1(G), …, ϕp(G)), where ϕj(G) = fj(Gh1, …, Ghk) and fj(x1, …, xk) is continuous for all j = 1, …, p. Thus the beta-Stacy bootstrap can also be used to approximate the joint posterior law of vectors of functionals of G.

5. Connection with other Bayesian bootstraps

The proposed procedure is a Bayesian analogue of Efron’s classical bootstrap (Efron, 1981). When censoring is possible, the latter is based on repeated sampling from the Kaplan–Meier estimator Ĝ (Efron and Tibshirani, 1986). Similarly, the beta-Stacy bootstrap samples from F* (cf. step 1 of Algorithm 4.1), the beta-Stacy posterior mean from Theorem 3.1.

The beta-Stacy bootstrap generalizes several Bayesian variants of the classical bootstrap: the Bayesian bootstrap of Rubin (1981), the proper Bayesian bootstrap of Muliere and Secchi (1996), and the Bayesian bootstrap for censored data of Lo (1993). The first two assume that there is no censoring, while the last allows for censored data. Their relationships are summarized in Fig. 1.

Given uncensored observations Y1, …, Yn, the Bayesian bootstrap of Rubin (1981) assigns ϕ(G) the same law as ϕ(∑_{i=1}^n Wi I{Yi ≤ x}), where (W1, …, Wn) has a uniform Dirichlet distribution (and thus it is an exchangeably weighted bootstrap; cf. Praestgaard and Wellner, 1993). Consequently, Rubin’s bootstrap approximates the posterior law of ϕ(G) induced by the improper Dirichlet process G ∼ DP(0, F), i.e. the law of ϕ(G*), where G* ∼ DP(n, n^{−1} ∑_{i=1}^n I{Yi ≤ x}) (Ghosal and van der Vaart, 2017, Section 4.7).
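For illustration, one replicate of Rubin's scheme only requires a draw of uniform Dirichlet weights, which can be obtained by normalizing independent standard exponentials; the sketch below uses hypothetical names:

```python
import random

# Sketch of one Rubin (1981) Bayesian bootstrap replicate for uncensored data:
# uniform Dirichlet weights via normalized Exp(1) variables. Illustrative only.

def rubin_bootstrap_replicate(ys, h):
    gaps = [random.expovariate(1.0) for _ in ys]   # Exp(1) draws
    s = sum(gaps)
    weights = [g / s for g in gaps]                # (W_1, ..., W_n) ~ Dir(1, ..., 1)
    # Gh with G(x) = sum_i W_i I{Y_i <= x}
    return sum(w * h(y) for w, y in zip(weights, ys))
```

Since the weights sum to one, taking h ≡ 1 always returns 1, and any weighted mean lies between the smallest and largest h(Yi).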

In contrast, the proper Bayesian bootstrap of Muliere and Secchi (1996) is defined according to a procedure akin to Algorithm 4.1. In detail, step 1 is the same, since F* = F̂* when there is no censoring; in step 2, take c(x) = k for all x; finally, steps 3 and 4 are the same. Hence, when there is no censoring, the procedure of Muliere and Secchi (1996) is a special case of the beta-Stacy bootstrap (in general, neither is exchangeably weighted; cf. Praestgaard and Wellner, 1993). Their relation is illustrated in Fig. 1 by arrow (a).

As a consequence, if there is no censoring the proper Bayesian bootstrap approximates (for large m) the posterior law of ϕ(G) induced by a proper Dirichlet process G ∼ DP(k, F) with k > 0. More precisely, it approximates the law of ϕ(G*) with G* ∼ DP(k + n, F̂*) and F̂*(x) = (k/(k+n)) F(x) + (1/(k+n)) ∑_{i=1}^n I{Ti ≤ x}. Thus, as k → 0 (i.e. as the prior precision of the Dirichlet process vanishes), the proper Bayesian bootstrap will approximate the same posterior distribution as the procedure of Rubin (1981)—cf. Muliere and Secchi (1996). This is illustrated by arrow (c) in Fig. 1.

Lo’s procedure (1993) extends Rubin’s bootstrap (1981) to the case where censoring is possible—they coincide when there is no censoring; cf. arrow (d) in Fig. 1. Specifically, Lo’s Bayesian bootstrap for censored data approximates the posterior law of ϕ(G) obtained from the improper beta-Stacy prior BS(0, F) or, equivalently, an improper beta process (Lo, 1993). More precisely, Lo’s bootstrap (1993) approximates the law of ϕ(G) with that of ϕ(G*), where G* ∼ BS(ĉ(x), Ĝ(x)), ĉ(x) = M(x)/(1 − Ĝ(x)), and Ĝ(x) is the Kaplan–Meier estimator (cf. Section 3). This is the limit of the beta-Stacy posterior law from Theorem 3.1 as c(x) → 0 for all x > 0. Thus, Lo’s procedure (1993) is obtained from ours in the limit of c(x) → 0 for all x > 0 (cf. arrow (b) in Fig. 1).

In addition to the ones mentioned above, the beta-Stacy bootstrap also generalizes the Bayesian bootstrap for finite populations of Lo (1988) and the Pólya urn bootstrap of Muliere and Walker (1998). These are obtained from the beta-Stacy bootstrap as previously done, assuming that F is discrete and of finite support.

6. Generalization to the k-sample case

We now consider the setting where censored observations are available from k independent groups. Specifically, we observe a sample of time-to-event data Yj,1, …, Yj,nj generated by the cumulative distribution function Gj for each j = 1, …, k. A similar setting arises, for example, in randomized trials with k treatment arms and a survival end-point. Without loss of generality, we suppose that k = 2.

In this setting, the goal is often to compare summary measures of survival across groups. These correspond to joint functionals of the form ϕ(G1, G2) = f(G1h1, …, G1hp, G2h1, …, G2hp), where f(x1, …, xp, y1, …, yp) and h1(x), …, hp(x) are real-valued functions. Examples include the difference in expected survival times (p = 1, h1(x) = x, f(x1, y1) = x1 − y1) or the ratio of survival probabilities (p = 1, h1(x) = I{x > t}, f(x1, y1) = x1/y1). Similarly as in Section 4, we assume that each hi is Fj*-integrable and Fj*-almost everywhere continuous for j = 1, 2, as well as that f(x1, …, xp, y1, …, yp) is continuous.

If G1 ∼ BS(c1, F1) and G2 ∼ BS(c2, F2) independently, we can use the beta-Stacy bootstrap to approximate the posterior law of ϕ(G1, G2) given the censored data Y1,1, …, Y1,n1 and Y2,1, …, Y2,n2. From Theorem 3.1, this is the law of ϕ(G1*, G2*), where: G1* and G2* are independent; Gj* ∼ BS(cj*, Fj*) for each j = 1, 2; and cj*, Fj* are computed from the jth group’s data using Eqs. (3)–(4).

In more detail, let Gj,m be the distribution function generated by one iteration of the beta-Stacy bootstrap in group j=1, 2 (cf. step 4 of Algorithm 4.1). Then, for large m, ϕ(G1,m,G2,m) will be an approximate sample from the law of ϕ(G1,G2), as shown by the following proposition.

Proposition 6.1. ϕ(G1,m, G2,m) → ϕ(G1, G2) in law for m → +∞, conditional on Y1,1, …, Y1,n1, Y2,1, …, Y2,n2.

Proof. Since G1,m and G2,m are independent conditional on Y1,1, …, Y1,n1, Y2,1, …, Y2,n2, Corollary 4.1 implies that (G1,mh1, …, G1,mhp, G2,mh1, …, G2,mhp) converges in law to (G1h1, …, G1hp, G2h1, …, G2hp) as m → +∞. The thesis now follows from the continuous mapping theorem. □
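As an illustration of the two-sample scheme, posterior draws for a difference in means can be built by pairing independent single-arm replicates. In the sketch below the single-arm draw is a simplified stand-in (uniform Dirichlet weights, i.e. no censoring and a vanishing prior); in practice each draw would be one run of Algorithm 4.1 per arm, and all names here are ours:

```python
import random

# Sketch of the k-sample extension (k = 2) for the difference in means:
# pair independent single-arm replicates. The single-arm draw below is a
# simplified stand-in (Dirichlet-weighted mean of uncensored samples).

def draw_mean(samples):
    """Stand-in for one single-arm posterior draw of the mean."""
    gaps = [random.expovariate(1.0) for _ in samples]
    s = sum(gaps)
    return sum(g / s * x for g, x in zip(gaps, samples))

def difference_in_means(arm1, arm2, replicates=1000):
    """Posterior draws of phi(G1, G2) = G1h - G2h with h(x) = x."""
    return [draw_mean(arm1) - draw_mean(arm2) for _ in range(replicates)]
```

The replicates concentrate around the difference of the two arms' sample means, with a spread that reflects posterior uncertainty.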

7. Implementing the beta-Stacy bootstrap

To implement the beta-Stacy bootstrap, we use the following procedure to generate observations from F* (step 1 of Algorithm 4.1). To be concrete, we assume that F is continuous (so ΔF(x) = 0 for all x > 0) with density f(x), but a similar method can also be used when F is discrete.

Our approach is based on the relationship F*(x) = 1 − (1 − Fd*(x))(1 − Fc*(x)) described in Section 3. This implies that if Xd and Xc are sampled independently from Fd* and Fc*, respectively, then X = min(Xd, Xc) is a sample from F*. We implement step 1 of Algorithm 4.1 by iterating this process m times.

In detail, we sample Xd from Fd* as follows. First, we note that, since F is continuous, Eq. (5) implies that ΔFd*(x) > 0 only if ΔN(x) > 0. Consequently, we can sample Xd by setting it equal to Yj with probability ΔFd*(Yj) for all j = 1, …, n, or to +∞ with probability 1 − ∑_{j=1}^n ΔFd*(Yj). We do this using the inverse probability transform algorithm (Robert and Casella, 2004, Chapter 3).
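This discrete step amounts to the usual inverse-probability-transform draw from a finite distribution with a defective tail; a minimal sketch with illustrative names:

```python
import math
import random

# Sketch of the inverse-probability-transform draw of X_d from the discrete
# component Fd*: return atom y_j with probability p_j = Delta Fd*(y_j), or
# +infinity with the leftover mass 1 - sum_j p_j. Names are illustrative.

def sample_discrete(atoms, jumps):
    u, cum = random.random(), 0.0
    for y, p in zip(atoms, jumps):
        cum += p
        if u < cum:
            return y
    return math.inf          # leftover mass: no jump point is selected
```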

Instead, we generate Xc from Fc* in Eq. (6) using the inverse probability transform approach (Robert and Casella, 2004, Chapter 3). Specifically, first we sample U from the uniform distribution over [0, 1], then we define Xc as the solution to the equation

∫0^{Xc} c(x) f(x) / (c(x)(1 − F(x)) + M(x)) dx = −log(1 − U).

We approximate the above integral using Gaussian quadrature and compute Xc using the bisection root-finding method (Quarteroni et al., 2010).
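A simplified version of this root-finding step can be sketched as follows, assuming for illustration an exponential prior mean F with rate parameter, c(x) = 1, and a user-supplied at-risk function M(x); a crude midpoint rule replaces Gaussian quadrature, and all names are ours rather than the paper's implementation:

```python
import math

# Sketch of the inverse-transform step for X_c, under illustrative choices:
# F exponential with the given rate, c(x) = 1, and M(x) supplied by the user.
# A midpoint rule stands in for Gaussian quadrature.

def cumulative_integral(xc, rate, M, steps=2000):
    """Midpoint-rule approximation of the integral on the left-hand side."""
    total, dx = 0.0, xc / steps
    for i in range(steps):
        x = (i + 0.5) * dx
        sbar = math.exp(-rate * x)       # 1 - F(x) for the exponential prior
        f = rate * sbar                  # density of F
        if f == 0.0:                     # guard against 0/0 deep in the tail
            continue
        total += f / (sbar + M(x)) * dx  # c(x) = 1
    return total

def sample_xc(u, rate, M, hi=1000.0, tol=1e-8):
    """Solve for X_c such that the integral equals -log(1 - U), by bisection."""
    target, lo = -math.log(1.0 - u), 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumulative_integral(mid, rate, M) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, with M ≡ 0 the integral reduces to the cumulative hazard of F, so sample_xc inverts the exponential distribution itself.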

8. Empirical illustration

We illustrate our procedure using survival data (freely available as part of the R dataset survival::pbc) from a randomized clinical trial of D-penicillamine for primary biliary cirrhosis of the liver (Dickson et al., 1989). In this trial, 312 cirrhosis patients were randomized to receive either D-penicillamine (158 patients) or placebo (154 patients). Patients in the D-penicillamine (respectively: placebo) arm accumulated a total of about 872 (842) person-years of follow-up, during which 65 (60) deaths were observed. Overall, 187 (59.9%) survival times were censored across study arms. Arm-specific Kaplan–Meier curves are shown in Fig. 2, panel a.

Fig. 2.

Panel a: Kaplan–Meier curves for the Mayo Clinic primary biliary cirrhosis trial (cf. Section 8). Panels b and c: density estimates and box-plots of 10,000 posterior samples of the 10-year survival probability (panel b) and the 10-year restricted mean survival time (panel c) in the placebo arm; samples were obtained either with the beta-Stacy bootstrap (separately for m=10, 100, and 1000) or using the reference GvdVa algorithm (cf. Section 8). Panel d: density estimates and box-plots of 10,000 beta-Stacy bootstrap samples of the difference in mean survival times across arms (for m=10, 100, and 1000).

Using these data, we compare the beta-Stacy bootstrap with another approach based on Algorithm (a) of Ghosal and van der Vaart (2017, Section 13.3.3), which we will denote as GvdVa. For any beta-Stacy process G, algorithm GvdVa can simulate approximate sample paths {G(x) : x ∈ [0, T]} over a prespecified bounded interval [0, T]. This algorithm is based on a discretization of [0, T] by means of N equally-spaced points, so that larger values of N provide a better approximation to the beta-Stacy process (as we explain later, we use N=5000 as reference in our analyses). We have chosen this algorithm as comparator because, compared to the others mentioned in the introduction, algorithm GvdVa is simpler to implement (like the beta-Stacy bootstrap, it is based on exact simulation steps and does not require sampling from unnormalized distributions; cf. Blasi, 2014). Details are provided in the Supplementary Section S1.

8.1. Prior and posterior distributions

Denote with G0 and G1 the cumulative distribution functions of survival times in the placebo and D-penicillamine arms, respectively. We assigned each Gi (i=0, 1) an independent beta-Stacy prior BS(ci, Fi), where Fi is the cumulative distribution function of an exponential random variable with median equal to 10 years. For simplicity, we assumed ci(x) = 1 for all x ≥ 0. These prior distributions are fairly non-informative, since they are very diffuse around their expected values (Supplementary Figure S1).

With these priors, the posterior means of G0 and G1 are practically indistinguishable from the corresponding Kaplan–Meier curves (Supplementary Figure S2). This is also confirmed by the Kolmogorov–Smirnov distances Di = sup_{x ∈ [0,12]} |Fi*(x) − Ĝi(x)| (i=0, 1), which compare the Kaplan–Meier estimate Ĝi of Gi and the corresponding posterior mean Fi* over the period from 0 to 12 years from randomization. We estimated that D0 = 0.004 for the placebo arm, and D1 = 0.005 for the D-penicillamine arm.

8.2. Inference for single-sample summaries

Using the beta-Stacy bootstrap and the GvdVa algorithm, we approximate the posterior distribution of two summaries of G0: (i) the 10-year survival probability in the placebo arm, i.e. ϕ1(G0) = 1 − G0(10) = G0h with h(x) = I{x > 10}, and (ii) the 10-year restricted mean survival time in the placebo arm, i.e. ϕ2(G0) = ∫0^10 [1 − G0(x)] dx = G0h with h(x) = min(x, 10).

In each case, we obtain 10,000 posterior samples. For the beta-Stacy bootstrap, we use m=10, 100, and 1000, separately. To provide a reference against which to compare the beta-Stacy bootstrap, we implemented the GvdVa algorithm using a discretization of the time interval [0, 10] based on N=5000 equally-spaced points. We chose this value by iteratively increasing N until the corresponding approximate posterior distributions of ϕ1(G0) and ϕ2(G0) seemed to stabilize (cf. Supplementary Section S1). Note that algorithm GvdVa can be applied to ϕ1(G0) and ϕ2(G0) because they depend only on the values of G0(x) for x ∈ [0, 10].

We use Kolmogorov–Smirnov statistics to compare the distributions obtained from the beta-Stacy bootstrap and algorithm GvdVa. Specifically, for both summary measures ϕ1(G0) and ϕ2(G0) separately, we compute the statistics Δm = sup_{x > 0} |F̂1(x) − F̂0,m(x)|, where: m=10, 100, or 1000; F̂0,m(x) is the empirical distribution of the corresponding beta-Stacy bootstrap sample; and F̂1 is the empirical distribution of the GvdVa samples.
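For two finite samples, the statistic Δm can be computed exactly by evaluating both empirical distribution functions at the pooled sample points, where the supremum is attained; a minimal sketch:

```python
# Sketch of the two-sample Kolmogorov-Smirnov statistic
# sup_x |F^_1(x) - F^_2(x)|, evaluated over the pooled sample points.

def ks_statistic(a, b):
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

This brute-force version is quadratic in the sample sizes; library routines (e.g. sorted-merge implementations) achieve the same result in near-linear time.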

Results are shown in Figs. 2b-c. For the 10-year survival probability (panel b), the distribution of beta-Stacy bootstrap samples approaches that obtained from algorithm GvdVa as m increases. Indeed, the associated Kolmogorov–Smirnov statistics are Δ10=0.24, Δ100=0.06, and Δ1000=0.02. Similar results were also obtained for the 10-year restricted mean survival (panel c), for which we computed Δ10=0.32, Δ100=0.11, and Δ1000=0.02. The choice m=1000 thus seems to have provided a good approximation to the posterior laws of interest.

8.3. Difference in mean survival times

We now consider the posterior law of the two-sample summary ϕ(G1, G0) = G1h − G0h defined by h(x) = x, i.e. the difference in mean survival times between the D-penicillamine arm and the placebo arm. In this case, it is hard to use the GvdVa algorithm to approximate the beta-Stacy posterior, because h(x) has unbounded support. In contrast, we can still use the beta-Stacy bootstrap directly to generate approximate samples from the posterior law of ϕ(G1, G0).

In Fig. 2d, we show the distribution of 10,000 posterior samples of the difference in mean survival times obtained with the beta-Stacy bootstrap, separately using m=10, 100, or 1000. Consistent with the previous results, the distribution of posterior samples stabilizes as m increases. In particular, the density estimates and quartiles of the distributions for m=100 and m=1000 are almost indistinguishable (the Kolmogorov–Smirnov distance between the two sample distributions was 0.007). These results again suggest that m=1000 provided a good approximation to the relevant posterior distribution.

8.4. Additional simulation study

In Supplementary Section S2, we report a simulation study assessing how the proportion of censored observations may affect the beta-Stacy bootstrap approximation of the beta-Stacy posterior distribution. Results suggest that, provided m is sufficiently large, the proportion of censored observations does not degrade the quality of the approximation relative to the reference GvdVa algorithm. Compared to a scenario with no censoring, scenarios with higher censoring rates (up to 75% of censored data) did not require larger values of m to achieve the same approximation quality (m = 1000 seemed acceptable in all considered scenarios).

9. Concluding remarks

The beta-Stacy bootstrap is an algorithm to perform Bayesian non-parametric inference with censored data. The procedure generates approximate samples from a beta-Stacy process posterior (Walker and Muliere, 1997) without the need to tune Markov Chain Monte Carlo methods. The quality of the approximation is controlled by the number m of samples drawn from the posterior mean distribution (cf. step 1 of Algorithm 4.1). Our simulations suggest that m = 1000 may generally provide a good approximation, independently of the proportion of event times affected by censoring.

In place of the beta-Stacy process, many other non-parametric prior processes could be used to estimate summaries ϕ(G) of the survival distribution function. Examples include piece-wise hazard processes (Arjas and Gasbarra, 1994), the gamma and extended gamma processes (Kalbfleisch, 1978; Dykstra and Laud, 1981), Pólya trees (Mauldin et al., 1992; Muliere and Walker, 1997), mixture models driven by random measures (Kottas, 2006; Riva-Palacio et al., 2021), and Bayesian Additive Regression Trees (Sparapani et al., 2016). In comparison to the beta-Stacy process, computations with alternative prior processes may require Markov Chain Monte Carlo samplers due to lack of conjugacy. Whether new Bayesian bootstraps could be derived for other conjugate processes (e.g. Muliere and Walker, 1997) is a question for future research.

Inference using the beta-Stacy process BS(c,F) requires specifying both the precision function c and the prior mean distribution function F. To avoid having to specify these in full, we might instead define c_θ and/or F_θ as functions of a scalar or multi-dimensional parameter θ (e.g. we might take c(x) = θ for all x ≥ 0). Then, instead of specifying a single value of θ, we could assign it a prior distribution π(θ). This approach leads to a mixture of beta-Stacy processes as the prior distribution for G, i.e. G ∼ ∫ BS(c_θ, F_θ) π(dθ), in the same spirit as the mixtures of Dirichlet processes of Antoniak (1974). In future work, we will evaluate the use of the beta-Stacy bootstrap in Monte Carlo schemes for such mixtures and for their generalization to competing risks data (Arfè et al., 2018).
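The mixture idea can be sketched as a simple hierarchical Monte Carlo scheme: draw θ from its prior, then draw from the bootstrap conditional on θ. Everything in the sketch below is an assumption for illustration: `bootstrap_draw` is a hypothetical stand-in for one run of the beta-Stacy bootstrap with precision θ (here it just returns a Dirichlet-weighted sample mean whose concentration depends on θ), and the gamma prior π(θ) is our own choice.

```python
import numpy as np

def bootstrap_draw(times, theta, rng):
    """One hypothetical posterior draw of the mean survival time given
    precision theta (illustrative stand-in, NOT Algorithm 4.1)."""
    weights = rng.dirichlet(np.full(len(times), 1.0 + theta / len(times)))
    return float(weights @ times)

def mixture_draws(times, n_draws, rng):
    """Monte Carlo over the mixture prior: theta ~ pi(theta), then one
    conditional bootstrap draw given theta."""
    draws = []
    for _ in range(n_draws):
        theta = rng.gamma(shape=2.0, scale=1.0)  # assumed prior pi(theta)
        draws.append(bootstrap_draw(times, theta, rng))
    return np.array(draws)

rng = np.random.default_rng(2)
times = rng.exponential(10.0, size=100)  # synthetic survival times
draws = mixture_draws(times, 2000, rng)
print(draws.mean(), draws.std())
```

Marginalizing over θ in this way requires no extra machinery beyond repeated conditional runs, which is why such mixtures combine naturally with an exact sampling algorithm.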

Acknowledgments

We thank Alejandra Avalos-Pacheco, Massimiliano Russo, and Giovanni Parmigiani for their useful comments. Part of this work was developed while the first author was supported by a post-doctoral fellowship at the Harvard-MIT Center for Regulatory Science, Harvard Medical School, United States. Analyses were conducted in R (version 4.1.2) using the libraries mvQuad, Rcpp, and ggplot2.

Appendix A. Technical lemmas and proofs

To prove Proposition 4.1, we will use results related to convergence in law of random measures—cf. Daley and Vere-Jones (2007), Section 11.1; see also Kallenberg (2017), Chapter 4. Let W and W_m be random measures over [0,+∞) that are finite on bounded intervals for every integer m ≥ 1. Then W_m converges in law to W if and only if W_m h → W h in law (as real-valued random variables) for every bounded continuous function h : [0,+∞) → ℝ with bounded support. This happens if and only if E[exp(−W_m h)] → E[exp(−W h)] as m → +∞ for every such function h (Daley and Vere-Jones, 2007, Proposition 11.1.VIII).

Let D[0,+) be the space of all right-continuous functions with left-hand limits with the Skorokhod topology (Jacod and Shiryaev, 2003, Chapter VI, Section 1b). The following result links convergence in law of Wm to W to convergence in law of their cumulative distribution functions as random elements of D[0,+).

Lemma A.1. The random measure W_m converges in law to W if and only if the function W_m(x) = W_m([0,x]) converges in law to W(x) = W([0,x]) (x ≥ 0) as a random element of D[0,+∞) with the Skorokhod topology.

Proof. This result can be shown using an argument similar to the one presented before Lemma 11.1.XI of Daley and Vere-Jones (2007).

We now prove the following lemma, which implies that the measure Z_m converges in law to Z conditionally on Y_1,…,Y_n, i.e. that E[exp(−Z_m h) | Y_1,…,Y_n] → E[exp(−Z h) | Y_1,…,Y_n] as m → +∞ for every bounded continuous h with bounded support. For simplicity of notation, let E_n[·] = E[· | Y_1,…,Y_n] be the conditional expectation with respect to Y_1,…,Y_n. Also let E_{n,m}[·] = E_n[· | X_1,…,X_m], where X_1,…,X_m are the variables from Step 1 of Algorithm 4.1.

For any given Y_1,…,Y_n and X_1,…,X_m, we define ρ*(x,u) (respectively, ρ*_m(x,u)) as the function ρ(x,u) in Eq. (2), but with c* and F* (respectively, c* and F*_m) in place of c and F. With these notations, by Lemma 1 of Ferguson (1974) it is −log E_n[exp(−Z h)] = ∫_{(0,+∞)²} (1 − exp(−u h(x))) ρ*(x,u) dF*(x) du and, similarly, −log E_{n,m}[exp(−Z_m h)] = ∫_{(0,+∞)²} (1 − exp(−u h(x))) ρ*_m(x,u) dF*_m(x) du.
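For readability, the two Laplace-functional identities above (as reconstructed from the surrounding definitions, with ρ*, F* the posterior quantities and ρ*_m, F*_m their sample counterparts) can be written in display form as:

```latex
-\log E_n\!\left[\exp(-Zh)\right]
  = \int_{(0,+\infty)^2} \left(1 - e^{-u\,h(x)}\right) \rho^{*}(x,u)\, dF^{*}(x)\, du,
\qquad
-\log E_{n,m}\!\left[\exp(-Z_m h)\right]
  = \int_{(0,+\infty)^2} \left(1 - e^{-u\,h(x)}\right) \rho^{*}_{m}(x,u)\, dF^{*}_{m}(x)\, du.
```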

Lemma A.2. (i) If h : [0,+∞) → [0,+∞) is a bounded measurable function with bounded support (but not necessarily continuous), then, conditionally on Y_1,…,Y_n, Z_m h → Z h in law as m → +∞. (ii) The previous statement also holds for every bounded measurable h : [0,+∞) → ℝ with bounded support.

Proof. First we prove point (i). By dominated convergence, it suffices to show that E_{n,m}[exp(−Z_m h)] → E_n[exp(−Z h)] as m → +∞ with probability 1 for all functions h such that 0 ≤ h(x) ≤ H and h(x) = 0 for all x > l, for some H, l > 0. To do so, we note that (1 − e^{−u h(x)}) ρ*_m(x,u) → (1 − e^{−u h(x)}) ρ*(x,u) uniformly in x, and so g_m(u) = ∫_0^l (1 − e^{−u h(x)}) ρ*_m(x,u) dF*_m(x) → g(u) = ∫_0^l (1 − e^{−u h(x)}) ρ*(x,u) dF*(x) for all fixed u, with probability 1. This follows from the Glivenko–Cantelli theorem, the fact that c*(x) is bounded, and because the functions x ↦ e^{−x} and r(x) are bounded and Lipschitz over (0,+∞). Now fix δ > 0 such that F̄*(l) > δ > 0 (this is possible because F*(x) < 1 for all x > 0). With probability 1, F̄*_m(x) > δ for all x ≤ l and all sufficiently large m. In this case, since ε ≤ c*(x) ≤ ε^{−1} and 1 − exp(−u h(x)) ≤ min(uH, 1), it is g_m(u) ≤ w(u) = γ min(uH, 1) exp(−uγδ) / (1 − e^{−u}) for u > 0 and some γ > 0. Since −log E_{n,m}[e^{−Z_m h}] = ∫_0^{+∞} g_m(u) du, −log E_n[e^{−Z h}] = ∫_0^{+∞} g(u) du, and ∫_0^{+∞} w(u) du < +∞, the thesis follows by dominated convergence.

To prove point (ii), let h : [0,+∞) → ℝ be a bounded measurable function with bounded support. Define h⁺(x) = max(0, h(x)) and h⁻(x) = −min(0, h(x)), which are both bounded non-negative measurable functions with bounded support. Now, by point (i), E_n[exp(−λ_1 G_m h⁺ − λ_2 G_m h⁻)] = E_n[exp(−G_m(λ_1 h⁺ + λ_2 h⁻))] → E_n[exp(−G(λ_1 h⁺ + λ_2 h⁻))] = E_n[exp(−λ_1 G h⁺ − λ_2 G h⁻)] as m → +∞ for every λ_1, λ_2 ≥ 0. Consequently, (G_m h⁺, G_m h⁻) → (G h⁺, G h⁻) in law as a random vector, by convergence of the corresponding joint Laplace transforms (Kallenberg, 1997, Theorem 4.3). Since h = h⁺ − h⁻, the thesis follows from the continuous mapping theorem.

Using Lemmas A.1 and A.2, we can now prove that, conditionally on Y_1,…,Y_n, G_m h converges in distribution to G h for every bounded continuous function h with bounded support.

Lemma A.3. G_m converges in law to G as m → +∞, conditionally on Y_1,…,Y_n.

Proof. Let ϕ : D[0,+∞) → D[0,+∞) be defined by ϕ(z)(x) = 1 − exp(−z(x)) (x ≥ 0) for every z ∈ D[0,+∞). Since the map x ↦ 1 − exp(−x), defined for every real x ≥ 0, is Lipschitz-continuous, ϕ is also continuous with respect to the Skorokhod topology on D[0,+∞). Since G_m(x) = ϕ(Z_m)(x) and G(x) = ϕ(Z)(x) for every x ≥ 0, the thesis now follows from Lemmas A.1 and A.2 and the continuous mapping theorem.

We are now ready to prove Proposition 4.1.

Proof of Proposition 4.1. From Lemma A.3 and Proposition 4.19 of Kallenberg (2017), it follows that G_m h → G h in law conditionally on Y_1,…,Y_n for every bounded continuous function h (not necessarily with bounded support). Then, using an argument like the one in the proof of Lemma 4.12 of Kallenberg (2017), the same is true for every bounded measurable function h (not necessarily continuous) such that F*(D_h) = 0, where D_h is the set of discontinuity points of h. We now show that the thesis holds for any F*-integrable h (not necessarily bounded), provided that F*(D_h) = 0. In fact, by an argument like the one used to prove point (ii) of Lemma A.2, it suffices to show that this is true for any such non-negative h.

Consequently, suppose that h(x) ≥ 0 for every x ≥ 0. By the Portmanteau theorem, it suffices to show that |E_n[f(G_m h)] − E_n[f(G h)]| → 0 as m → +∞ for any real-valued function f(x) such that |f(x)| ≤ K and |f(x) − f(y)| ≤ L|x − y| for some K, L ≥ 0. To do this, let M ≥ 0 and define h_M(x) = min(M, h(x)). Then |E_n[f(G_m h)] − E_n[f(G h)]| ≤ Δ_1 + Δ_2 + Δ_3, where Δ_1 = sup_m |E_n[f(G_m h)] − E_n[f(G_m h_M)]|, Δ_2 = |E_n[f(G_m h_M)] − E_n[f(G h_M)]|, and Δ_3 = |E_n[f(G h_M)] − E_n[f(G h)]|. Now, Δ_2 → 0 as m → +∞, because h_M is bounded, measurable, and D_{h_M} ⊆ D_h. Consequently, lim sup_{m→+∞} |E_n[f(G_m h)] − E_n[f(G h)]| ≤ Δ_1 + Δ_3. Since 0 ≤ h_M(x) ≤ h(x) for every x ≥ 0 and, conditionally on the data, G ∼ BS(c*, F*), we also have that Δ_3 ≤ L E_n[G(h − h_M)] = L F*(h − h_M). By the Markov inequality, for every δ > 0 it holds that Δ_1 ≤ sup_m (δL + 2δ^{−1}KL E_n[G_m(h − h_M)]) = δL + 2δ^{−1}KL F*(h − h_M), where the last equality follows from E_n[G_m(h − h_M)] = E_n[E_{n,m}{G_m(h − h_M)}] = E_n[F*_m(h − h_M)] = F*(h − h_M) (cf. Section 4). As a consequence, lim sup_{m→+∞} |E_n[f(G_m h)] − E_n[f(G h)]| ≤ δL + (L + 2δ^{−1}KL) F*(h − h_M). However, by the dominated convergence theorem, F*(h − h_M) → 0 as M → +∞. Hence, the thesis follows by first letting M → +∞ and then δ → 0 from above.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jspi.2022.07.001. Code to implement the beta-Stacy bootstrap and reproduce our results is available at https://github.com/andreaarfe/.

References

  1. Al Labadi L, Zarepour M, 2013. A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. Canad. J. Statist. 41 (3), 466–487.
  2. Antoniak CE, 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 (6), 1152–1174.
  3. Arfè A, Peluso P, Muliere P, 2018. Reinforced urns and the subdistribution beta-Stacy process prior for competing risks analysis. Scand. J. Stat. 46, 706–734.
  4. Arjas E, Gasbarra D, 1994. Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statist. Sinica 505–524.
  5. Barrientos AF, Peña V, 2020. Bayesian bootstraps for massive data. Bayesian Anal. 15 (2).
  6. Blasi P, 2014. Simulation of the beta-Stacy process. In: Wiley StatsRef: Statistics Reference Online. John Wiley and Sons, Ltd. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat03869.
  7. Daley D, Vere-Jones D, 2007. An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure. Springer, New York.
  8. Damien P, Laud PW, Smith AF, 1995. Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference. J. R. Stat. Soc. Ser. B Stat. Methodol. 547–563.
  9. Dickson ER, Grambsch PM, Fleming TR, Fisher LD, Langworthy A, 1989. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 10 (1), 1–7.
  10. Doksum K, 1974. Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 183–201.
  11. Dykstra RL, Laud P, 1981. A Bayesian nonparametric approach to reliability. Ann. Statist. 9 (2), 356–367.
  12. Efron B, 1981. Censored data and the bootstrap. J. Amer. Statist. Assoc. 76 (374), 312–319.
  13. Efron B, Tibshirani R, 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci. 54–75.
  14. Ferguson TS, 1973. A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209–230.
  15. Ferguson TS, 1974. Prior distributions on spaces of probability measures. Ann. Statist. 2 (4), 615–629.
  16. Ferguson TS, Klass MJ, 1972. A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43 (5), 1634–1643.
  17. Ferguson TS, Phadia EG, 1979. Bayesian nonparametric estimation based on censored data. Ann. Statist. 163–186.
  18. Ghosal S, van der Vaart A, 2017. Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.
  19. Gill RD, Johansen S, 1990. A survey of product-integration with a view toward application in survival analysis. Ann. Statist. 18 (4), 1501–1555.
  20. Heitjan DF, 1993. Ignorability and coarse data: Some biomedical examples. Biometrics 49, 1099–1109.
  21. Heitjan DF, Rubin DB, 1991. Ignorability and coarse data. Ann. Statist. 19, 2244–2253.
  22. Hjort NL, 1990. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18, 1259–1294.
  23. Ishwaran H, Zarepour M, 2002. Dirichlet prior sieves in finite normal mixtures. Statist. Sinica 12 (3), 941–963.
  24. Jacod J, Shiryaev AN, 2003. Limit Theorems for Stochastic Processes. Springer, Berlin, Heidelberg.
  25. Kalbfleisch JD, 1978. Non-parametric Bayesian analysis of survival time data. J. R. Stat. Soc. Ser. B Stat. Methodol. 40 (2), 214–221.
  26. Kalbfleisch JD, Prentice RL, 2002. The Statistical Analysis of Failure Time Data, 2nd ed. John Wiley & Sons, Hoboken, New Jersey.
  27. Kallenberg O, 1997. Foundations of Modern Probability. Springer, New York.
  28. Kallenberg O, 2017. Random Measures, Theory and Applications. Probability Theory and Stochastic Modelling, Springer International Publishing.
  29. Kim Y, Lee J, 2003. Bayesian bootstrap for proportional hazards models. Ann. Statist. 31 (6), 1905–1922.
  30. Kottas A, 2006. Nonparametric Bayesian survival analysis using mixtures of Weibull distributions. J. Statist. Plann. Inference 136 (3), 578–596.
  31. Lo AY, 1987. A large sample study of the Bayesian bootstrap. Ann. Statist. 15 (1).
  32. Lo AY, 1988. A Bayesian bootstrap for a finite population. Ann. Statist. 16 (4), 1684–1695.
  33. Lo AY, 1991. Bayesian bootstrap clones and a biometry function. Sankhyā: Indian J. Stat. 53 (3), 320–333.
  34. Lo AY, 1993. A Bayesian bootstrap for censored data. Ann. Statist. 21 (1), 100–123.
  35. Lyddon SP, Holmes CC, Walker SG, 2019. General Bayesian updating and the loss-likelihood bootstrap. Biometrika 106 (2), 465–478.
  36. Lyddon S, Walker S, Holmes CC, 2018. Nonparametric learning from Bayesian models with randomized objective functions. In: NeurIPS, pp. 2075–2085.
  37. Mauldin RD, Sudderth WD, Williams S, 1992. Polya trees and random distributions. Ann. Statist. 20 (3), 1203–1221.
  38. Muliere P, Secchi P, 1996. Bayesian nonparametric predictive inference and bootstrap techniques. Ann. Inst. Statist. Math. 48 (4), 663–673.
  39. Muliere P, Secchi P, 2003. Weak convergence of a Dirichlet-multinomial process. Georgian Math. J. 10 (2), 319–324.
  40. Muliere P, Walker S, 1997. A Bayesian non-parametric approach to survival analysis using Polya trees. Scand. J. Stat. 24 (3), 331–340.
  41. Muliere P, Walker S, 1998. Extending the family of Bayesian bootstraps and exchangeable urn schemes. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 (1), 175–182.
  42. Praestgaard J, Wellner JA, 1993. Exchangeably weighted bootstraps of the general empirical process. Ann. Probab. 21 (4), 2053–2086.
  43. Quarteroni A, Sacco R, Saleri F, 2010. Numerical Mathematics. Texts in Applied Mathematics, vol. 37, Springer-Verlag, Berlin, Heidelberg.
  44. Riva-Palacio A, Leisen F, Griffin J, 2021. Survival regression models with dependent Bayesian nonparametric priors. J. Amer. Statist. Assoc. 1–10.
  45. Robert C, Casella G, 2004. Monte Carlo Statistical Methods, 2nd ed. Springer Texts in Statistics, Springer, New York.
  46. Royston P, Parmar MK, 2013. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med. Res. Methodol. 13 (1), 152.
  47. Rubin DB, 1981. The Bayesian bootstrap. Ann. Statist. 130–134.
  48. Sparapani RA, Logan BR, McCulloch RE, Laud PW, 2016. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Stat. Med. 35 (16), 2741–2753.
  49. Walker S, Damien P, 1998. A full Bayesian non-parametric analysis involving a neutral to the right process. Scand. J. Stat. 25 (4), 669–680.
  50. Walker S, Muliere P, 1997. Beta-Stacy processes and a generalization of the Pólya-urn scheme. Ann. Statist. 25, 1762–1780.
  51. Wolpert RL, Ickstadt K, 1998. Simulation of Lévy random fields. In: Practical Nonparametric and Semiparametric Bayesian Statistics. Springer, New York, pp. 227–242.
