Author manuscript; available in PMC: 2020 Sep 7.
Published in final edited form as: Stat Med. 2008 Feb 20;27(4):529–542. doi: 10.1002/sim.3002

A Bayesian analysis of doubly censored data using a hierarchical Cox model

Wei Zhang 1,2,*, Kathryn Chaloner 1,3, Mary Kathryn Cowles 1,3, Ying Zhang 1, Jack T Stapleton 4
PMCID: PMC7476730  NIHMSID: NIHMS1622708  PMID: 17694594

SUMMARY

Two common statistical problems in pooling survival data from several studies are addressed. The first problem is that the data are doubly censored in that the origin is interval censored and the endpoint event may be right censored. Two approaches to incorporate the uncertainty of interval-censored origins are developed, and then compared with more usual analyses using imputation of a single fixed value for each origin. The second problem is that the data are collected from multiple studies and it is likely that heterogeneity exists among the study populations. A random-effects hierarchical Cox proportional hazards model is therefore used.

The scientific problem motivating this work is a pooled survival analysis of data sets from three studies to examine the effect of GB virus type C (GBV-C) coinfection on survival of HIV-infected individuals. The time of HIV infection is the origin and for each subject this time is unknown, but is known to lie later than the last time at which the subject was known to be HIV negative, and earlier than the first time the subject was known to be HIV positive. The use of an approximate Bayesian approach using the partial likelihood as the likelihood is recommended because it more appropriately incorporates the uncertainty of interval-censored HIV infection times.

Keywords: GBV-C, human immunodeficiency virus, interval censoring, MCMC, multicenter AIDS cohort study, partial likelihood

1. INTRODUCTION

Infection with GB virus type C (GBV-C) in humans is common, but no association between the virus and any known disease state has been demonstrated [1–3]. Individuals infected with human immunodeficiency virus (HIV) are commonly coinfected with GBV-C, since GBV-C shares the same modes of transmission as HIV. The prevalence of coinfection with GBV-C in HIV-infected individuals ranges from 14 to 43 per cent [4]. Several recent studies of data from an early period in the epidemic, before the availability of effective therapy, suggest that coinfection with GBV-C is associated with prolonged survival among HIV-infected people [5–8]; other studies have concluded that there is no association [9, 10]. A meta-analysis of summary statistics by Zhang et al. [11] indicates that persistent GBV-C coinfection is associated with prolonged survival when GBV-C is measured 5–6 years after seroconversion. To further investigate this conclusion, which remains controversial [12, 13], individual-level data from separate studies are modeled here.

Original data sets from three published studies [6–8] are obtained. These data sets are doubly censored. First, the origin (HIV infection time) Y is interval censored in that it is known to lie in an interval Y ∈ [L, U]. Second, the endpoint (death) time E is possibly right censored. Denote survival time T = E − Y. The dependence of T on covariates and, in particular, on the indicator of GBV-C infection is of interest. In this paper, approaches are developed for the pooled survival analysis of doubly censored data from multiple studies and applied to the pooled data from the three studies.

A random-effects model for the indicator of GBV-C coinfection incorporates the heterogeneity of patient characteristics between the studies. The most popular modeling method in survival analysis, the Cox proportional hazards model [14], avoids making any assumptions about the baseline hazard function λ0(t). Several authors have considered Cox survival models with random effects [15–19]. Our approach builds on that of Sargent [20] and Gustafson [21] and treats the Cox partial likelihood [22] as the likelihood.

The remainder of this paper is organized as follows. Section 2 gives an introduction to the hierarchical Cox proportional hazards model. Section 3 presents different approaches to incorporating the interval censoring. In Section 4, these approaches are applied to the case study, and the results are summarized and compared. Section 5 concludes with a discussion.

2. HIERARCHICAL COX PROPORTIONAL HAZARDS MODEL

The standard Cox regression represents the relationship between the covariates of interest and the hazard of event at time t through a proportional hazards model:

λ(t;xi)=λ0(t)exp(xiβ)

where λ(t; xi) is the hazard function, xi is a 1 × p covariate vector for individual i, and β is a p × 1 vector of coefficients corresponding to fixed effects.

2.1. The hierarchical model

Suppose there are q covariates to be modeled as random effects in addition to p covariates with fixed effects. For each of the q random covariates, there are rl levels (e.g. trials, centers, studies), for l = 1, … , q, with parameter vector γl = (γl1, … , γlrl)T corresponding to the lth random covariate. Let wil be a scalar indicating the ith subject’s value of the lth random-effect covariate. Let zil = (zil1, … , zilrl), with zilj = Iiljwil, where Iilj = 1 if subject i falls in the jth level of the lth random covariate, 0 otherwise. Define γ = (γ1T, … , γqT)T and zi = (zi1, … , ziq). Using this notation, the Cox model with random effects, a reparameterized frailty model [20], can be written as λ(t; xi, zi) = λ0(t) exp(xiβ + ziγ). Let D denote a right-censored data set, D = {(ti, δi, xi, zi) : i = 1, 2, … , n}, where ti is subject i’s survival time, δi is the indicator for censoring, with δi = 0 if censored and δi = 1 otherwise, and n is the number of observations. Let R(ti) be the set of subjects at risk at time ti. If there are no ties, the partial likelihood incorporating random effects is given by

$$
L(\beta, \gamma \mid D) \propto \prod_{i=1}^{n} \left[ \frac{\exp(x_i\beta + z_i\gamma)}{\sum_{j \in R(t_i)} \exp(x_j\beta + z_j\gamma)} \right]^{\delta_i} \tag{1}
$$

The partial likelihood in (1) serves as the first stage of the hierarchical model and can be treated as a likelihood for computing a posterior density. Kalbfleisch [23] demonstrates that treating the partial likelihood as a likelihood leads to a limiting marginal posterior distribution of the regression parameters, assuming an independent increments gamma process prior distribution for the baseline cumulative hazard and independently a uniform distribution on the regression parameters. This result is shown with a different proof in Sinha et al. [24], which extends the results to situations with time-dependent covariates, time-varying regression parameters, and grouped survival data, and presents a Bayesian justification of a modified partial likelihood for handling ties. See also Chapter 4 of [25]. Chen et al. [26] carry out an in-depth theoretical investigation of Bayesian inference for the Cox regression model and discuss posterior propriety and computation based on Cox’s partial likelihood. Sargent [20] and Gustafson [21] present methods for Bayesian analysis of multivariate survival data using (1).
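To make the first stage concrete, the following is a minimal Python sketch of the logarithm of the partial likelihood in (1). It is an illustration only, not the authors' R implementation: the function name `log_partial_likelihood` and its argument layout are assumptions, the linear predictor xiβ + ziγ is taken as precomputed, and tied event times are not handled.

```python
import math

def log_partial_likelihood(times, events, eta):
    """Log of the partial likelihood in (1), assuming no tied event times.

    times  : observed survival times t_i
    events : delta_i (1 = event observed, 0 = right censored)
    eta    : linear predictors x_i*beta + z_i*gamma, one per subject
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects enter only through the risk sets
        # Risk set at t_i: every subject still under observation at t_i.
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(r - m) for r in risk))
        total += eta[i] - log_denom
    return total
```

With three subjects, all events observed and all linear predictors equal to zero, the risk sets have sizes 3, 2, and 1, so the partial likelihood is 1/3 × 1/2 × 1 = 1/6.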

The level-specific parameters γlj are modeled as draws from a distribution gl with mean μl and variance νl. Let g denote the joint density for γ, and assume γlj’s are independent of each other given μl and νl, for l = 1, … , q and j = 1, … , rl. Then

$$
g(\gamma \mid \mu, \nu) = \prod_{l=1}^{q} \prod_{j=1}^{r_l} g_l(\gamma_{lj} \mid \mu_l, \nu_l) \tag{2}
$$

where μ = (μ1, … , μq)T and ν = (ν1, … , νq)T.

For the final stage of the hierarchical model, prior distributions need to be specified. A proper prior distribution for the variance component is typically essential for a proper posterior distribution and computational stability. Let f (μ, ν|ω) represent this prior distribution, where ω is the vector of hyperparameters and is taken to be known. Also let β have, independent of γ, a uniform prior distribution.

An approximate posterior distribution for the model parameters is then assumed to be

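$$
\pi(\beta, \gamma, \mu, \nu \mid D, \omega) \propto L(\beta, \gamma \mid D)\, g(\gamma \mid \mu, \nu)\, f(\mu, \nu \mid \omega) \tag{3}
$$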

where D are right-censored data.

2.2. Estimation of parameters using MCMC methods

The approximate posterior distribution (3) can be estimated with Markov chain Monte Carlo (MCMC) methods. The Metropolis–Hastings algorithm [27, 28] is a general term for a family of MCMC methods that are useful for drawing samples from Bayesian posterior distributions. Let θ denote the set of parameters involved in the hierarchical model. The parameter vector θ is divided into components corresponding to the hierarchy and the single-component Metropolis–Hastings algorithm is used [29].
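A single-component Metropolis–Hastings sampler can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: it uses a symmetric Gaussian random-walk proposal for each coordinate, and the names `single_component_mh`, `log_post`, and `step` are hypothetical. Here `log_post` stands for the logarithm of an unnormalized target such as (3).

```python
import math
import random

def single_component_mh(log_post, theta, step, n_iter, rng):
    """Update one coordinate at a time with a Gaussian random-walk proposal.

    log_post : function returning the log of the unnormalized target density
    theta    : starting values (list of floats)
    step     : proposal standard deviation for each coordinate
    """
    theta = list(theta)
    current = log_post(theta)
    draws = []
    for _ in range(n_iter):
        for k in range(len(theta)):
            proposal = list(theta)
            proposal[k] += rng.gauss(0.0, step[k])
            candidate = log_post(proposal)
            # Symmetric proposal, so the acceptance ratio is the target ratio.
            # 1 - random() lies in (0, 1], which keeps the logarithm finite.
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
        draws.append(list(theta))
    return draws
```

In practice the proposal scales in `step` would be tuned so that a reasonable share of proposals is accepted for each component.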

3. ANALYSIS OF DOUBLY CENSORED DATA

Denote doubly censored data by C = {([li, ui], ei, δi, xi, zi) : i = 1, 2, … , n}, where [li, ui] is the interval within which origin yi falls and ei is the endpoint time, which is possibly right censored. Further, define a function D(·, ·) mapping a doubly censored data set C and a set of origin times y = (y1, y2, … , yn)T to a right-censored data set; specifically, D(C, y) = {(ti, δi, xi, zi) : i = 1, 2, … , n}, where ti = ei − yi. A common approach in medical applications [8, 30, 31] is to impute each interval-censored origin by the midpoint of its censoring interval, ŷi = (li + ui)/2, and then to analyze t̂i = ei − ŷi as right-censored data. In this case, D(C, ŷ) = {(t̂i, δi, xi, zi) : i = 1, 2, … , n}, where ŷ = (ŷ1, ŷ2, … , ŷn)T. Law and Brookmeyer [32] demonstrate that in HIV studies the Kaplan–Meier estimate based on this method is notably biased when origin intervals are longer than two years.
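The mapping D(C, y) and the midpoint imputation can be sketched in a few lines of Python. The record layout (a tuple holding the interval, endpoint, censoring indicator, and covariates) and the function names are assumptions made for illustration.

```python
def midpoint_origins(intervals):
    """Midpoint imputation: (l_i + u_i) / 2 for each censoring interval."""
    return [(l + u) / 2.0 for l, u in intervals]

def to_right_censored(records, origins):
    """D(C, y): replace ([l_i, u_i], e_i) by the survival time t_i = e_i - y_i.

    records : list of ((l_i, u_i), e_i, delta_i, x_i, z_i)
    origins : one origin time y_i per record
    """
    return [
        (e - y, delta, x, z)
        for (_interval, e, delta, x, z), y in zip(records, origins)
    ]

# Hypothetical data: two subjects, times in years from a common calendar origin.
records = [((2.0, 6.0), 12.0, 1, (5.1,), (1, 0, 0)),
           ((0.0, 10.0), 15.0, 0, (4.7,), (0, 0, 0))]
y_mid = midpoint_origins([r[0] for r in records])
right_censored = to_right_censored(records, y_mid)
```

Any other set of origins, for example values sampled within each interval, can be passed to the same mapping in place of the midpoints.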

Three alternative methods are proposed below, which can be implemented using MCMC. If the partial likelihood is used as a likelihood in estimating the posterior distribution using MCMC, the assumption has to be made that this is valid under interval censoring of the origin: an assumption which has not been proved, but that seems reasonable given the results in [23, 24, 26].

3.1. MCMC for imputed data (MCMCid) approach

As a potential improvement to using the midpoint of a censoring interval, the MCMCid is proposed here, which samples a value of each interval-censored origin from an estimated distribution of origins. Let G denote the distribution function of origins y. The estimate of G, Ĝ, can be obtained either parametrically using maximum likelihood estimation based on a known distribution (e.g. Weibull, log normal) or non-parametrically using Turnbull’s self-consistency algorithm [33]. To perform an analysis for doubly censored survival data using MCMCid, for each subject i, i = 1, … , n, a value of yi, denoted ŷi, is randomly sampled from Ĝ, conditional on the interval [li, ui] within which yi falls. The doubly censored data set C and imputed origins ŷ = (ŷ1, ŷ2, … , ŷn)T are then mapped to a right-censored data set D(C, ŷ) = {(t̂i, δi, xi, zi) : i = 1, 2, … , n}, where t̂i = ei − ŷi. The hierarchical Cox proportional hazards model can then be fit to the right-censored data D(C, ŷ) using MCMC methods.

The MCMCid approach is straightforward to implement and understand, but underestimates the variability of parameter estimates because the uncertainty of imputed origins ŷ is not incorporated.
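Sampling an origin from a parametric estimate of G, conditional on its censoring interval, can be done by inverting the distribution function. The sketch below assumes a Weibull distribution with shape α and scale λ, so that the survival function is exp[−(y/λ)^α]; the function names and the parameter values in the usage lines are hypothetical and are not estimates from the paper. It also assumes that each interval has positive probability under the fitted distribution.

```python
import math
import random

def weibull_survival(y, alpha, lam):
    """S(y) = exp(-(y / lam) ** alpha) for shape alpha and scale lam."""
    return math.exp(-((y / lam) ** alpha))

def truncated_weibull_quantile(p, lower, upper, alpha, lam):
    """Quantile of the Weibull restricted to [lower, upper], for p in [0, 1]."""
    s_lower = weibull_survival(lower, alpha, lam)
    s_upper = weibull_survival(upper, alpha, lam)
    # Move a fraction p of the way from S(lower) down to S(upper).
    s = s_lower - p * (s_lower - s_upper)
    return lam * (-math.log(s)) ** (1.0 / alpha)

def sample_origins(intervals, alpha, lam, rng):
    """One draw of each origin y_i, conditional on y_i in [l_i, u_i]."""
    return [truncated_weibull_quantile(rng.random(), l, u, alpha, lam)
            for l, u in intervals]

# Hypothetical shape, scale, and intervals, for illustration only.
rng = random.Random(2008)
y_hat = sample_origins([(2.0, 5.0), (0.0, 10.0)], alpha=1.5, lam=10.0, rng=rng)
```

Working on the survival scale rather than the distribution-function scale avoids computing 1 − H(y) when H(y) is close to one.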

3.2. Imputation-embedded MCMC (ieMCMC) approach

The imputation-embedded MCMC (ieMCMC) approach is developed as an alternative. For each MCMC iteration step m, m = 1, … , M (e.g. M = 20 000), origins ym = (y1m, … , ynm)T are randomly sampled based on their distribution Ĝ, conditional on the intervals {[li, ui] : i = 1, 2, … , n} within which they fall. A right-censored data set, D(C, ym) = {(tim, δi, xi, zi) : i = 1, 2, … , n}, where tim = ei − yim, is generated at each iteration step m. In the ieMCMC approach, the origins ym vary at each MCMC sampler iteration with a distribution based on Ĝ, but the estimate Ĝ is still fixed. The uncertainty in estimating G by Ĝ is therefore not taken into account, although the uncertainty in ym conditional on Ĝ is considered.
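The structural difference between MCMCid and ieMCMC is only where the imputation sits relative to the sampler loop. The following schematic Python sketch makes this explicit; `draw_origins` and `update_params` are hypothetical callables standing for a draw of all origins from the fixed estimate of G and for one sweep of parameter updates given the resulting right-censored data.

```python
def run_mcmcid(draw_origins, update_params, theta, n_iter):
    """MCMCid: impute the origins once, then run the sampler on fixed data."""
    y_hat = draw_origins()
    draws = []
    for _ in range(n_iter):
        theta = update_params(theta, y_hat)
        draws.append(theta)
    return draws

def run_iemcmc(draw_origins, update_params, theta, n_iter):
    """ieMCMC: redraw the origins from the fixed estimate at every iteration."""
    draws = []
    for _ in range(n_iter):
        y_m = draw_origins()  # a new right-censored data set at each step m
        theta = update_params(theta, y_m)
        draws.append(theta)
    return draws
```

In both functions the distribution behind `draw_origins` never changes during the run, which is why neither approach reflects the uncertainty in estimating G.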

3.3. Bayesian partial likelihood (Bayesian-PL) approach

Finally, an approach is proposed with a parametric assumption on origins, which is more complete. Denote h(y|ξ) as the probability density function of the origins, with parameter vector ξ. Let p(ξ|ϖ) represent the prior distribution for ξ, where ϖ is the vector of hyperparameters governing p(·). For a doubly censored data set C = {([li, ui], ei, δi, xi1, zi) : i = 1, 2, … , n}, define Iyi as the indicator function which is equal to 1 if yi is in [li, ui] and 0 otherwise. The approximate posterior distribution, based on using the partial likelihood as the likelihood, for all model parameters including the interval-censored unknown origins y can be expressed as

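$$
\pi(\beta, \gamma, \mu, \nu, \xi, y \mid C, \omega, \varpi) \propto L(\beta, \gamma \mid D(C, y))\, g(\gamma \mid \mu, \nu) \times \left\{ \prod_{i} \frac{h(y_i \mid \xi)\, I_{y_i}}{H(u_i \mid \xi) - H(l_i \mid \xi)} \right\} f(\mu, \nu \mid \omega)\, p(\xi \mid \varpi) \tag{4}
$$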

where H(·|ξ) is the cumulative distribution function of y. At each iteration of the MCMC sampler, the origins y are drawn from their full conditional distribution, given all other model quantities. The resulting analysis provides an updated estimate of the distribution of origins and correctly captures the effect of the uncertainty in origins on the estimation of the parameters of primary interest. Because (4) is an approximation of the posterior distribution, in what follows we refer to this as the Bayesian-PL method.

4. CASE STUDY

The pooled data set consists of doubly censored data sets from three studies [6–8], and each study corresponds to a different population. In studies [6, 7], GBV-C is measured at entry into the study, which is presumed to be late in HIV disease based on the cohorts’ overall CD4+ cell count. In study [8], GBV-C infection is measured at 5–6 years after documented seroconversion to HIV infection. The ‘late’ data set of the Williams study [8] is used, from the Multicenter AIDS Cohort Study (MACS), which has a documented HIV seroconversion window of approximately 6 months on average. In the MACS study, HIV infection is based on the retrospective testing of stored blood samples obtained on a regular basis, and HIV tests are negative before seroconversion and positive after that. In the other two studies, the date of subject’s first-known HIV positive test is used as the right limit of the interval within which seroconversion is assumed to occur, and 1 January 1978 (or date of birth for subjects born after 1 January 1978) is taken to be the left limit of the interval. 1 January 1978 is chosen because an analysis of stored blood samples from a study in San Francisco indicates an extremely low prevalence of HIV infection before this date in a population at risk for HIV [34].

For subjects in all the three studies, the covariate CD4+ cell count is the measurement at the time of the first-known HIV-positive test. The covariate indicating GBV-C infection in [6, 7] is the only GBV-C measurement available for the subjects in these studies. In [8] two GBV-C test results are available, one at 12–18 months after seroconversion and the other at 5–6 years after seroconversion. The measurement 5–6 years after seroconversion is deemed to be most similar to the measurement in the other two studies and the measurement that could best address the question of whether persistent GBV-C infection is associated with prolonged survival. Although the duration of GBV-C infection is not well characterized, GBV-C is shown to persist in approximately 80 per cent of HIV-infected individuals tested at both time points in [8], and acquisition of GBV-C following HIV infection is rare (see [8] and the discussion in Section 5).

The sample sizes of [6–8] are 362, 197, and 138, respectively. A summary of these three studies can be found in [11]. All studies follow subjects through the time period before the advent of highly effective therapy for HIV in 1996.

Fitting the regular Cox model to the three imputed data sets separately, the estimated log hazard ratio of GBV-C coinfection, controlling for baseline log(CD4+ count in cells/mL) and age at HIV infection, is −1.23, −1.62, and −0.97 for [6–8], respectively. For both Xiang and Tillmann studies [6, 7], HIV infection time is heavily interval censored, with a mean interval width of about 10 years. In contrast, HIV infection time in the Williams study [8] has much narrower intervals. The subjects in the Tillmann study are from Germany, and the subjects in the other two studies are from the U.S.A. The differences among the studies could be due to several reasons, including the fact that GBV-C testing is not standardized and each study used a different primer for a qualitative test. A recent study [35] indicates that the sensitivity and specificity of each test vary and that the sensitivity of one particular test depends on GBV-C RNA levels. This motivates the need for an analysis that has the ability to account for the possibly differing effects of GBV-C infection within each population. The primary endpoint for this pooled analysis is the overall survival. All 695 eligible patients are included in the analysis.

Let xi1, xi2, and wi denote log(CD4+ count in cells/mL), age at HIV infection, and GBV-C coinfection status for subject i, respectively. Let Iij = 1 if subject i is from study j, 0 otherwise, with j = 1, 2, 3 corresponding to the three studies [6–8]. Define zi = (zi1, zi2, zi3), where zij = Iijwi. Let β = (β1, β2)T denote the fixed effects of covariates xi1 and xi2. Let γ = (γ1, γ2, γ3)T denote the random effects of covariates zi = (zi1, zi2, zi3), with constraint γj ∼ N(μ, σ²), where μ is the population effect of GBV-C infection and σ² the population variance. Specifically, the hazard function for individual i at time tk is given by

λ(tk;xi1,xi2,zi)=λ0(tk)exp(xi1β1+xi2β2+ziγ) (5)

Note that age at HIV infection xi2 is a known deterministic function of infection time yi. Fixed effects, rather than random effects, are used for age and CD4+ cell count as the effects from each data set of the three studies are very similar. This is unlike the effect of GBV-C in each study, which shows more variation.

In what follows, approximate posterior density, approximate posterior mean, and corresponding approximate highest posterior density region (HDR) are calculated using the approximation based on the partial likelihood. These will be referred to as posterior density, mean, and HDR for simplicity, without explicitly qualifying that they are approximations.

4.1. Prior distributions

Other than the three studies for which we have individual-level data, there are four additional studies [5, 9, 10, 36] providing only summary statistics (hazard ratio and corresponding 95 per cent confidence interval). A meta-analysis of summary statistics for these four studies is done, similarly to the meta-analysis of all summary statistics in [11]. The estimated combined effect of GBV-C in these four studies is −0.41, with an estimated standard error of 0.42. This result helps to postulate the prior distribution for μ. To be conservative, the standard error is multiplied by 2, so that μ is normally distributed as N[−0.41, (0.42 × 2)²].

A proper prior distribution for σ² is used for the sake of computational stability of the MCMC methods and to generate a proper approximate posterior distribution. A gamma prior distribution is placed on the precision τ = σ⁻², with mean 50 and variance 10 000, denoted as Γ(0.25, 0.005). This distribution is specified by considering what values for the random effects are reasonable. For example, an assumption representing moderate heterogeneity between the studies would be that γj (j = 1, 2, 3) vary around μ by ±0.10. Using 0.10 as an estimate of the standard deviation σ leads to a prior estimate of τ = 100. A prior belief that represents substantial heterogeneity might be that the γj vary around μ by ±1.0. Using 1.0 as the prior standard deviation of γj leads to a prior estimate of τ = 1. The prior distribution Γ(0.25, 0.005), with mean 50 and standard deviation 100, gives reasonable weight to these extremes. A flat prior distribution is used for β.
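As a quick check of the stated moments, assume that Γ(a0, b0) is written in the shape-rate parameterization. This is an assumption, but it is the reading that reproduces the mean and variance quoted above:

$$
\mathrm{E}(\tau) = \frac{a_0}{b_0} = \frac{0.25}{0.005} = 50, \qquad
\mathrm{Var}(\tau) = \frac{a_0}{b_0^{2}} = \frac{0.25}{0.005^{2}} = 10\,000, \qquad
\sqrt{\mathrm{Var}(\tau)} = 100 .
$$

Since σ = τ^{-1/2}, the two reference values translate as

$$
\sigma = 0.10 \;\Longleftrightarrow\; \tau = 0.10^{-2} = 100, \qquad
\sigma = 1.0 \;\Longleftrightarrow\; \tau = 1.0^{-2} = 1 .
$$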

4.2. Joint posterior distributions

A parametric model for the distribution of y is implemented in the example instead of the non-parametric method. Reasons for this choice include that the non-parametric maximum likelihood estimator Ĝ given by Turnbull [33] is only unique up to an equivalence class and also has discrete components. The parametric model gives a smoother distribution for the times of infection, which is thought to model the spread of HIV infection more realistically. In addition, for the pooled data set, Turnbull’s estimate of the survival function is not very different from the maximum likelihood estimate based on assuming a Weibull distribution (see Figure 1 of a technical report [37]).

Figure 1.

Estimated hazard ratio and 95 per cent HDR from the MCMCid and ieMCMC approaches (above the dashed line), and from sensitivity analysis with different prior distributions for τ and HIV infection times y using the Bayesian-PL approach (below the dashed line). The primary analysis is Tau1Y1.

4.2.1. MCMCid and ieMCMC approaches

The distribution function G of HIV infection times y is assumed to be a Weibull(α, λ). Conditional on the maximum likelihood estimates (α̂, λ̂) and intervals for infection times, infection times ŷ are randomly sampled. Let C denote the doubly censored data C = {([li, ui], ei, δi, xi1, zi) : i = 1, 2, … , n}. Following the prior distributions in Section 4.1, the joint approximate posterior density of all parameters for the MCMCid approach is given by

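$$
\pi(\beta, \gamma, \mu, \tau \mid D(C, \hat{y})) \propto L(\beta, \gamma \mid D(C, \hat{y})) \prod_{j=1}^{3} \tau^{1/2} \exp\left[ -\frac{(\gamma_j - \mu)^2 \tau}{2} \right] \times \exp\left[ -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right] \tau^{a_0 - 1} \exp(-\tau b_0) \tag{6}
$$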

where D(C, ŷ) = {(t̂i, δi, xi1, x̂i2, zi) : i = 1, 2, … , n}, t̂i = ei − ŷi, x̂i2 is a function of ŷi, and (a0, b0, μ0, σ0) = (0.25, 0.005, −0.41, 0.42 × 2).

The joint posterior density of all parameters for the ieMCMC approach is the same as (6), except that D(C, ŷ) is replaced by D(C, ym), where D(C, ym) = {(tim, δi, xi1, xi2m, zi) : i = 1, 2, … , n}, tim = ei − yim and xi2m is a function of yim. In the ieMCMC approach, D(C, ym) changes at each MCMC iteration m, while D(C, ŷ) is fixed during the process of the MCMCid approach.

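For both MCMCid and ieMCMC approaches, given data and other parameters in the model, the full conditional posterior distribution for τ is a gamma distribution and the full conditional posterior distribution for μ is a normal distribution:

$$
\tau \mid \cdot \sim \Gamma\left[ a_0 + \frac{3}{2},\; b_0 + \sum_{j=1}^{3} \frac{(\gamma_j - \mu)^2}{2} \right], \qquad
\mu \mid \cdot \sim N\left[ \lambda \mu_0 + (1 - \lambda)\bar{\gamma},\; (1 - \lambda)(3\tau)^{-1} \right],
$$

where

$$
\lambda = \frac{(3\tau)^{-1}}{(3\tau)^{-1} + \sigma_0^2}, \qquad \bar{\gamma} = \frac{1}{3} \sum_{j=1}^{3} \gamma_j .
$$

Here λ is a shrinkage weight and is distinct from the Weibull parameter λ used for the infection times.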
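The two Gibbs updates follow directly from these full conditionals and can be sketched in Python as below. The function names are hypothetical, and the sketch assumes the shape-rate form of the gamma distribution; because `random.gammavariate` takes a shape and a scale, the rate is inverted before the call.

```python
import random

def gibbs_update_tau(gamma, mu, a0, b0, rng):
    """Draw tau from Gamma(a0 + J/2, b0 + sum((gamma_j - mu)^2) / 2), J = len(gamma)."""
    shape = a0 + len(gamma) / 2.0
    rate = b0 + sum((g - mu) ** 2 for g in gamma) / 2.0
    return rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale

def mu_conditional(gamma, tau, mu0, sigma0):
    """Mean and variance of the normal full conditional of mu."""
    n_levels = len(gamma)
    data_var = 1.0 / (n_levels * tau)             # (3 tau)^(-1) when J = 3
    weight = data_var / (data_var + sigma0 ** 2)  # shrinkage weight on mu0
    gamma_bar = sum(gamma) / n_levels
    return weight * mu0 + (1.0 - weight) * gamma_bar, (1.0 - weight) * data_var

def gibbs_update_mu(gamma, tau, mu0, sigma0, rng):
    mean, var = mu_conditional(gamma, tau, mu0, sigma0)
    return rng.gauss(mean, var ** 0.5)

# Hypothetical current values of gamma; hyperparameters as in Section 4.1.
rng = random.Random(0)
gamma = [-1.0, -1.5, -0.5]
tau = gibbs_update_tau(gamma, mu=-0.9, a0=0.25, b0=0.005, rng=rng)
mu = gibbs_update_mu(gamma, tau, mu0=-0.41, sigma0=0.84, rng=rng)
```

The variance returned by `mu_conditional` equals 1/(1/σ0² + 3τ), the usual precision-weighted form, which gives a convenient check on the algebra.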

It is straightforward to perform the Gibbs sampling on μ and τ, but there is no direct way to draw from parameters β and γ, and the single-component Metropolis–Hastings algorithm [29] for the sampling of these two parameter components is used.

4.2.2. Bayesian-PL approach

The distribution of y is also assumed to be a Weibull(α, λ), and the prior distributions for α and λ are specified independently as log normal distributions: LogNorm(μα, σα²) and LogNorm(μλ, σλ²), respectively. For yi ∈ [li, ui], i = 1, … , n, the joint approximate posterior density of all parameters for the Bayesian-PL approach is then given by

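$$
\begin{aligned}
\pi(\beta, \gamma, \mu, \tau, \alpha, \lambda, y \mid C) \propto{} & L(\beta, \gamma \mid D(C, y)) \prod_{j=1}^{3} \tau^{1/2} \exp\left[ -\frac{(\gamma_j - \mu)^2 \tau}{2} \right] \\
& \times \exp\left[ -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right] \tau^{a_0 - 1} \exp(-\tau b_0) \\
& \times \prod_{i=1}^{n} \left\{ \alpha \lambda^{-\alpha} y_i^{\alpha - 1} \exp\left[ -\left( \frac{y_i}{\lambda} \right)^{\alpha} \right] \right\} \left\{ \exp\left[ -\left( \frac{l_i}{\lambda} \right)^{\alpha} \right] - \exp\left[ -\left( \frac{u_i}{\lambda} \right)^{\alpha} \right] \right\}^{-1} \\
& \times \alpha^{-1} \exp\left[ -\frac{(\log(\alpha) - \mu_\alpha)^2}{2\sigma_\alpha^2} \right] \lambda^{-1} \exp\left[ -\frac{(\log(\lambda) - \mu_\lambda)^2}{2\sigma_\lambda^2} \right]
\end{aligned} \tag{7}
$$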

where (μα, σα, μλ, σλ) = (1.31, 0.4, 3.51, 0.5).
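The origin-related factors of (7) can be written as two small log-density functions. This is a sketch with hypothetical function names, not the authors' code: the first function is the log of the product over subjects of the Weibull density divided by the probability of the censoring interval, and the second is the log normal prior kernel, up to an additive constant. Origins are assumed to be strictly positive.

```python
import math

def log_origin_density(y, intervals, alpha, lam):
    """Log of the third line of (7): interval-truncated Weibull densities."""
    total = 0.0
    for y_i, (l_i, u_i) in zip(y, intervals):
        if y_i <= 0.0 or not (l_i <= y_i <= u_i):
            return float("-inf")  # origin outside its censoring interval
        log_h = (math.log(alpha) - alpha * math.log(lam)
                 + (alpha - 1.0) * math.log(y_i) - (y_i / lam) ** alpha)
        mass = math.exp(-((l_i / lam) ** alpha)) - math.exp(-((u_i / lam) ** alpha))
        total += log_h - math.log(mass)
    return total

def log_lognormal_kernel(value, mean_log, sd_log):
    """Log of value^(-1) * exp[-(log(value) - mean_log)^2 / (2 sd_log^2)]."""
    return -math.log(value) - (math.log(value) - mean_log) ** 2 / (2.0 * sd_log ** 2)
```

For α = λ = 1 the first function reduces to a truncated exponential, so an origin y = 1 in the interval [0, 2] has log density −1 − log(1 − e⁻²).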

4.3. Results of primary analysis

All methods are implemented in R [38], using the Metropolis-within-Gibbs algorithm. The posterior full conditional distributions of μ and τ are normal and gamma, respectively, so these parameters are drawn using Gibbs sampling. The approximate posterior full conditional distributions for the parameters updated with the single-component Metropolis–Hastings algorithm [29] in the Bayesian-PL approach are given in the Appendix of the technical report [37]. Code is available from the first author. The WinBUGS software package [39] could not be used easily because of the doubly censored data complicated by an interval-censored covariate. Three independent chains are generated for each of the three approaches (MCMCid, ieMCMC, and Bayesian-PL). Each chain consists of 14 000 iterations after a series of 6000 burn-in iterations. The Brooks and Gelman convergence diagnostic [40] indicates that there is no evidence against the convergence of the sampler for each parameter in all approaches.
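For orientation, a basic potential scale reduction factor for one scalar parameter across several chains can be computed as in the sketch below. This is the simple textbook form of the diagnostic, given here as an assumption; it is not necessarily the exact variant of the Brooks and Gelman diagnostic used in the analysis, and the function name is hypothetical.

```python
import math
import statistics

def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])  # iterations per chain, after burn-in
    chain_means = [statistics.mean(c) for c in chains]
    # W: average within-chain variance; B: n times the variance of chain means.
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Values close to 1 give no evidence against convergence, whereas values well above 1 suggest that the chains have not yet mixed over the same distribution.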

Table I summarizes results from the MCMCid, ieMCMC, and Bayesian-PL approaches based on the hierarchical Cox proportional hazards model. The point estimates from the three approaches are similar for each parameter except for σ. For each parameter estimate, the standard error from the Bayesian-PL approach is, appropriately, the largest among three approaches, while the standard error from the MCMCid approach is the smallest one. Consequently, the 95 per cent HDR from the Bayesian-PL approach is generally wider than the one from the MCMCid or ieMCMC approach; the 95 per cent HDR from the MCMCid approach tends to be the narrowest. The differences are substantial, especially for the parameters of most interest: μ and σ.

Table I.

Comparison of results from different approaches for pooled analysis using the hierarchical Cox proportional hazards model.

| Parameter | MCMCid* Mean (SE) | MCMCid* 95 per cent HDR | ieMCMC* Mean (SE) | ieMCMC* 95 per cent HDR | Bayesian-PL† Mean (SE) | Bayesian-PL† 95 per cent HDR |
| --- | --- | --- | --- | --- | --- | --- |
| μ | −0.797 (0.174) | (−1.136, −0.475) | −0.846 (0.208) | (−1.248, −0.436) | −0.891 (0.277) | (−1.423, −0.335) |
| σ | 0.327 (0.413) | (0.027, 1.016) | 0.425 (0.519) | (0.029, 1.292) | 0.727 (0.783) | (0.028, 1.947) |
| γ1 | −0.779 (0.145) | (−1.067, −0.494) | −0.845 (0.158) | (−1.162, −0.544) | −0.998 (0.185) | (−1.360, −0.623) |
| γ2 | −1.073 (0.359) | (−1.812, −0.486) | −1.227 (0.416) | (−2.123, −0.596) | −1.585 (0.476) | (−2.497, −0.775) |
| γ3 | −0.636 (0.274) | (−1.111, −0.039) | −0.623 (0.336) | (−1.190, 0.109) | −0.463 (0.412) | (−1.189, 0.339) |
| β1 | −0.538 (0.086) | (−0.707, −0.371) | −0.569 (0.098) | (−0.756, −0.377) | −0.645 (0.101) | (−0.842, −0.444) |
| β2 | 0.013 (0.006) | (0.001, 0.027) | 0.017 (0.007) | (0.002, 0.030) | 0.008 (0.007) | (−0.006, 0.021) |

* Random imputation for infection times y based on the estimated Weibull(α̂, λ̂).

† Prior distribution for infection times y: Weibull(α, λ), where α ∼ LogNorm(1.31, 0.4) and λ ∼ LogNorm(3.51, 0.5).

Results from all three approaches indicate that GBV-C infection is associated with prolonged survival. From the Bayesian-PL approach, the estimated hazard ratio for GBV-C viremia is exp(μ̂) = exp(−0.891) = 0.41, with 95 per cent probability of falling into the interval (exp(−1.423), exp(−0.335)) = (0.24, 0.72) after adjusting for baseline log(CD4+ count in cells/mL) and age at HIV infection. Baseline log(CD4+ count in cells/mL) is also associated with prolonged survival: the estimated hazard ratio is exp(β̂1) = exp(−0.645) = 0.52, with 95 per cent probability of falling into the interval (exp(−0.842), exp(−0.444)) = (0.43, 0.64).
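The conversion from the log hazard ratio scale of Table I to the hazard ratio scale is a direct exponentiation of the posterior mean and of the HDR limits, as the short Python check below illustrates. The helper name `hazard_ratio` is hypothetical; the inputs are the Bayesian-PL values for μ and β1 from Table I.

```python
import math

def hazard_ratio(log_hr, lower, upper):
    """Exponentiate a log hazard ratio and the limits of its interval."""
    return math.exp(log_hr), math.exp(lower), math.exp(upper)

# Bayesian-PL estimates from Table I.
hr_gbvc = hazard_ratio(-0.891, -1.423, -0.335)
hr_cd4 = hazard_ratio(-0.645, -0.842, -0.444)
print([round(v, 2) for v in hr_gbvc])  # [0.41, 0.24, 0.72]
print([round(v, 2) for v in hr_cd4])   # [0.52, 0.43, 0.64]
```

Because exponentiation is monotone, the transformed limits bound the same 95 per cent posterior probability on the hazard ratio scale.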

4.4. Results of sensitivity analysis

To examine the behavior of the estimate of μ from the Bayesian-PL approach when the distribution for infection times y is modified, the hierarchical Cox proportional hazards model is fit using different distributions for y. Specifically, we use a flat distribution for y, a single Weibull(α, λ) for y, and a set of three Weibull distributions, one for each of the three studies, Weibull(αj, λj) for yj, j = 1, 2, 3. The choice of Γ(0.25, 0.005) as the prior for τ is also examined by using Γ(0.001, 0.001). Overall, there are 3 × 2 = 6 scenarios in the sensitivity analyses, with the first one corresponding to the primary analysis (see Table II and Figure 1). The value of μ̂, the estimate for the logarithm of hazard ratio of GBV-C infection, changes only slightly, as do the standard error and 95 per cent HDR. This analysis suggests that the results are insensitive to the choice of prior distribution.

Table II.

Sensitivity analysis for the Bayesian-PL approach using different prior distributions for τ and y.

| Analysis | Prior for τ | Infection times y | μ̂ (SE) | 95 per cent HDR |
| --- | --- | --- | --- | --- |
| Tau1Y1 | Γ(0.25, 0.005) | Weibull(α, λ) | −0.891 (0.277) | (−1.423, −0.335) |
| Tau1Y2 | Γ(0.25, 0.005) | Weibull(αj, λj), j = 1, 2, 3 | −0.866 (0.288) | (−1.398, −0.263) |
| Tau1Y3 | Γ(0.25, 0.005) | Flat prior | −0.924 (0.272) | (−1.417, −0.350) |
| Tau2Y1 | Γ(0.001, 0.001) | Weibull(α, λ) | −0.876 (0.271) | (−1.385, −0.326) |
| Tau2Y2 | Γ(0.001, 0.001) | Weibull(αj, λj), j = 1, 2, 3 | −0.878 (0.283) | (−1.392, −0.279) |
| Tau2Y3 | Γ(0.001, 0.001) | Flat prior | −0.911 (0.250) | (−1.360, −0.412) |

5. DISCUSSION

The imputation-embedded MCMC (ieMCMC) and the Bayesian-PL approaches are developed to deal with doubly censored survival data and compared with an MCMC analysis of a right-censored data set constructed by imputing a single value for each interval-censored origin (MCMCid). The MCMCid approach considerably underestimates the variability of the estimates. The ieMCMC allows for some uncertainty of imputed origins to play a role, but again results in underestimation of the variability of the parameter estimates. In comparison, the Bayesian-PL approach treats unobservable origins y as unknown quantities with a parametric distribution G. Prior distributions are then assigned for the hyperparameters of G. Interval censoring is treated by data augmentation [41] with ŷ drawn from their posterior predictive distribution. The results from the Bayesian-PL approach more appropriately reflect uncertainty than the MCMCid and ieMCMC approaches. This paper demonstrates the ability of the Bayesian-PL approach to incorporate the uncertainty of imputed origins in doubly censored survival data. Our sensitivity study shows that the results from the Bayesian-PL approach are reasonably insensitive to the specification of the parametric form of G (Figure 1).

Härkänen et al. [42] presented a non-parametric Bayesian intensity model for doubly censored data in the fully Bayesian framework, which treats unobservable origins y as unknown quantities with piece-wise constant hazard functions. Hazard functions are assigned gamma prior distributions. Komárek et al. [43] applied a modified version of this approach to doubly censored dental data to examine the effect of fluoride intake on the time to caries development in children. However, these approaches are complex to implement and computationally demanding.

It should be noted that a formal justification of using the Bayesian-PL approach has not been provided for interval-censored origins and random effects. Given the results in [23, 24, 26] this assumption is not unreasonable, but additional work needs to be done.

In the case study, a completely parametric model could have been used, in which case an estimate of the survival curve would be available. The primary interest in this data analysis is, however, the question of whether or not co-infection with GBV-C is associated with prolonged survival of individuals infected with HIV disease. All three studies and the other studies in the meta-analysis [11] use data before the advent of the highly effective therapy for HIV infection, and so the survival curve itself is not of current interest.

Our analysis examines the association between GBV-C infection late in HIV disease and survival, from the time of HIV seroconversion, of HIV-infected individuals. The data available are not ideal to address the association of persistent GBV-C infection with prolonged survival, but GBV-C infection is typically persistent in that the possibility of GBV-C infection clearance or acquisition of GBV-C infection after HIV infection is small [8, 44]. The results of both [8, 13] indicate that the loss of GBV-C infection late in the disease may be predictive of shorter survival after the loss. Additional testing for GBV-C of stored samples and additional modeling are being planned to examine the joint distribution of survival and GBV-C infection.

In summary, the Bayesian hierarchical Cox model has accommodated random study-specific effects and has therefore incorporated between-study heterogeneity. Through the specification of prior distributions, the prior information relevant to the parameters of interest has been taken into account. The methods for doubly censored survival data developed in this paper have enabled the analysis of these data sets and have led to the conclusion that the hazard of death with GBV-C infection is approximately 40 per cent of that without GBV-C infection, and that the hypothesis of no difference in hazard can be ruled out with high probability. The pooled analysis of the individual subject data therefore augments and supports the meta-analysis result of the summary statistics previously reported in [11]. Biological plausibility for a beneficial mechanism and in vitro evidence of inhibition of HIV replication are provided in [6, 45–49]. However, as in all observational data, the results of our analyses do not provide evidence that GBV-C is causally related to improved survival, and it is possible that GBV-C infection is not the reason that HIV-positive individuals coinfected with GBV-C live longer but, rather, that it serves as a biological marker of a different factor related to HIV disease progression. This warrants further investigation and is a subject of debate in the scientific literature [9, 12, 13, 49, 50].

ACKNOWLEDGEMENTS

This manuscript was considerably improved following input and comments from an Associate Editor and two reviewers. The authors also wish to thank Dr Hans Tillmann from the University of Leipzig, Germany, and the Multicenter AIDS Cohort Study (MACS) for providing data. The MACS has centers located at: The Johns Hopkins Bloomberg School of Public Health (Joseph Margolick); Howard Brown Health Center and Northwestern University Medical School (John Phair); University of California, Los Angeles (Roger Detels); University of Pittsburgh (Charles Rinaldo); and Data Analysis Center (Lisa Jacobson). This research was supported by NIH/NIAID (R01 058740) and National Security Agency (H98230-04-1-0042).

Contract/grant sponsor: NIH/NIAID; contract/grant number: R01 058740

Contract/grant sponsor: National Security Agency; contract/grant number: H98230-04-1-0042

REFERENCES

  • 1. Alter HJ. The cloning and clinical implications of HGV and HGBV-C. The New England Journal of Medicine 1996; 334:1536–1537.
  • 2. Rambusch EG, Wedemeyer H, Tillmann HL, Heringlake S, Manns MP. Significance of coinfection with hepatitis G virus for chronic hepatitis C: a review of the literature. Zeitschrift fur Gastroenterologie 1998; 36:41–53.
  • 3. Tillmann HL, Heringlake S, Trautwein C, Meissner D, Nashan B, Schlitt HJ, Kratochvil J, Hunt J, Qiu X, Lou SC, Pichlmayr R, Manns MP. Antibodies against the GB virus C envelope 2 protein before liver transplantation protect against GB virus C de novo infection. Hepatology 1998; 28:379–384.
  • 4. Stapleton JT. GB virus type C/hepatitis G virus. Seminars in Liver Disease 2003; 23:137–148.
  • 5. Lefrère JJ, Roudot-Thoraval F, Morand-Joubert L, Petit JC, Lerable J, Thauvin M, Mariotti M. Carriage of GB virus C/hepatitis G virus RNA is associated with a slower immunologic, virologic, and clinical progression of human immunodeficiency virus disease in coinfected persons. Journal of Infectious Diseases 1999; 179:783–789.
  • 6. Xiang J, Wünschmann S, Diekema DJ, Klinzman D, Patrick KD, George SL, Stapleton JT. Effect of coinfection with GB virus C on survival among patients with HIV infection. The New England Journal of Medicine 2001; 345:707–714.
  • 7. Tillmann HL, Heiken H, Knapik-Botor A, Heringlake S, Ockenga J, Wilber JC, Goergen B, Detmer J, McMorrow M, Stoll M, Schmidt RE, Manns MP. Infection with GB virus C and reduced mortality among HIV-infected patients. The New England Journal of Medicine 2001; 345:715–724.
  • 8. Williams CF, Klinzman D, Yamashita TE, Xiang J, Polgreen PM, Rinaldo C, Liu C, Phair J, Margolick JB, Zdunek D, Hess G, Stapleton JT. Persistent GB virus C infection and survival in HIV-infected men. The New England Journal of Medicine 2004; 350:981–990.
  • 9. Björkman P, Flamholc L, Nauclér A, Molnegren V, Wallmark E, Widell A. GB virus C during the natural course of HIV-1 infection: viremia at diagnosis does not predict mortality. AIDS 2004; 18:877–886.
  • 10. Birk M, Lindback S, Lidman C. No influence of GB virus C replication on the prognosis in a cohort of HIV-1-infected patients. AIDS 2002; 16:2482–2485.
  • 11. Zhang W, Chaloner K, Tillmann HL, Williams CF, Stapleton JT. Effect of early and late GBV-C viremia on survival of HIV infected individuals: a meta-analysis. HIV Medicine 2006; 7:173–180.
  • 12. Stapleton JT, Chaloner K, Williams CF. GB virus C infection and survival in the Amsterdam cohort study. Journal of Infectious Diseases 2005; 191:2157–2158.
  • 13. Van der Bij AK, Kloosterboer N, Prins M, Boeser-Nunnink B, Geskus RB, Lange JM, Coutinho RA, Schuitemaker H. Reply to George and Stapleton et al. Journal of Infectious Diseases 2005; 191:2158–2160.
  • 14. Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34:187–200.
  • 15. Clayton DG. A model for association in bivariate life tables and its applications in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65:141–151.
  • 16. Clayton DG, Cuzick J. Multivariate generalizations of the proportional hazards model. Journal of the Royal Statistical Society, Series A 1985; 148:82–117.
  • 17. Gustafson P. A Bayesian analysis of bivariate survival data from a multicenter cancer clinical trial. Statistics in Medicine 1995; 14:2523–2535.
  • 18. Stangl D. Prediction and decision making using Bayesian hierarchical models. Statistics in Medicine 1995; 14:2173–2190.
  • 19. Stangl D, Greenhouse J. Assessing placebo response using Bayesian hierarchical survival models. Lifetime Data Analysis 1998; 4:5–28.
  • 20. Sargent DJ. A general framework for random effects survival analysis in the Cox proportional hazards setting. Biometrics 1998; 54:1486–1497.
  • 21. Gustafson P. Large hierarchical Bayesian analysis of multivariate survival data. Biometrics 1997; 53:230–242.
  • 22. Cox DR. Partial likelihood. Biometrika 1975; 62:269–275.
  • 23. Kalbfleisch JD. Nonparametric Bayesian analysis of survival time data. Journal of the Royal Statistical Society, Series B 1978; 40:214–221.
  • 24. Sinha D, Ibrahim JG, Chen M. A Bayesian justification of Cox’s partial likelihood. Biometrika 2003; 90:629–641.
  • 25. Ibrahim JG, Chen M-H, Sinha D. Bayesian Survival Analysis. Springer: New York, 2001.
  • 26. Chen M-H, Ibrahim JG, Shao Q-M. Posterior propriety and computation for the Cox regression model with applications to missing covariates. Biometrika 2006; 93:791–807.
  • 27. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equations of state calculations by fast computing machines. Journal of Chemical Physics 1953; 21:1087–1092.
  • 28. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970; 57:97–109.
  • 29. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. Chapman & Hall: London, 1996.
  • 30. Liu KJ, Darrow WW, Rutherford GW. A model-based estimate of the mean incubation period for AIDS in homosexual men. Science 1988; 240:1333–1335.
  • 31. Mariotto AB, Mariotti S, Pezzotti P, Rezza G, Verdecchia A. Estimation of the acquired immunodeficiency syndrome incubation period in intravenous drug users: a comparison with male homosexuals. American Journal of Epidemiology 1992; 135:428–437.
  • 32. Law CG, Brookmeyer R. Effects of mid-point imputation on the analysis of doubly censored data. Statistics in Medicine 1992; 11:1569–1578.
  • 33. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B 1976; 38:290–295.
  • 34. Jaffe HW, Darrow WW, Echenberg DF, O’Malley PM, Getchell JP, Kalyanaraman VS, Byers RH, Drennan DP, Braff EH, Curran JW et al. The acquired immunodeficiency syndrome in a cohort of homosexual men: a six year follow-up study. Annals of Internal Medicine 1985; 103:210–214.
  • 35. Souza IE, Allen JB, Xiang J, Klinzman D, Diaz R, Zhang S, Chaloner K, Zdunek D, Hess G, Williams CF, Benning L, Stapleton JT. Effect of primer selection on estimates of GB virus C (GBV-C) prevalence and response to antiretroviral therapy for optimal testing for GBV-C viremia. Journal of Clinical Microbiology 2006; 44:3105–3113.
  • 36. Toyoda H, Fukuda Y, Hayakawa T, Takamatsu, Saito H. Effect of GB virus C/hepatitis G virus coinfection on the course of HIV infection in hemophilia patients in Japan. Journal of Acquired Immune Deficiency Syndrome and Human Retrovirology 1998; 17:209–213.
  • 37. Zhang W, Chaloner K, Cowles MK, Zhang Y, Stapleton JT. A Bayesian analysis of doubly censored data using a hierarchical Cox model. Technical Report, Department of Biostatistics at University of Iowa, 2007. (Available from: http://www.public-health.uiowa.edu/biostat/research/reports.html).
  • 38. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2005. ISBN 3-900051-07-0. (Available from: http://www.r-project.org).
  • 39. Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS User Manual, 2003. (Available from: http://www.mrc-bsu.cam.ac.uk/bugs).
  • 40. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 1998; 7:434–455.
  • 41. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 1987; 82:528–540.
  • 42. Härkänen T, Virtanen JI, Arjas E. Caries on permanent teeth: a non-parametric Bayesian analysis. Scandinavian Journal of Statistics 2000; 27:577–588.
  • 43. Komárek A, Lesaffre E, Härkänen T, Declerck D, Virtanen JI. A Bayesian analysis of multivariate doubly–interval-censored dental data. Biostatistics 2005; 6:145–155.
  • 44. Bisson GP, Strom BL, Gross R, Weissman D, Klinzman D, Hwang WT, Kostman JR, Metzger D, Stapleton JT, Frank I. Effect of GB virus C viremia on HIV acquisition and HIV set-point. AIDS 2005; 19:1910–1912.
  • 45. Xiang J, George SL, Wünschmann S, Chang Q, Klinzman D, Stapleton JT. Inhibition of HIV-1 replication by GB virus C infection through increases in RANTES, MIP-1alpha, MIP-1beta, and SDF-1. The Lancet 2004; 363:2040–2046.
  • 46. Jung S, Knauer O, Donhauser N, Eichenmüller M, Helm M, Fleckenstein B, Reil H. Inhibition of HIV strains by GB virus C in cell culture can be mediated by CD4 and CD8 T-lymphocyte derived soluble factors. AIDS 2005; 19:1267–1272.
  • 47. Nattermann J, Nischalke HD, Kupfer B, Rockstroh J, Hess L, Sauerbruch T, Spengler U. Regulation of CC chemokine receptor 5 in hepatitis G virus infection. AIDS 2003; 17:1457–1462.
  • 48. Xiang J, Sathar MA, McLinden JH, Klinzman D, Chang Q, Stapleton JT. South African GB virus C isolates: interactions between genotypes 1 and 5 isolates and HIV. Journal of Infectious Diseases 2005; 192:2147–2151.
  • 49. George SL, Varmaz D, Stapleton JT. GB virus C replicates in primary T and B lymphocytes. Journal of Infectious Diseases 2006; 193:451–454.
  • 50. Stiehm ER. Disease versus disease: how one disease may ameliorate another. Pediatrics 2006; 117:184–191.
