A Bayesian Framework for Estimating Vaccine Efficacy per Infectious Contact

Yang Yang; Peter Gilbert; Ira M Longini, Jr; M Elizabeth Halloran

doi:10.1214/08-AOAS193

Author manuscript; available in PMC: 2009 Jan 23.

Published in final edited form as: Ann Appl Stat. 2008;2(4):1409–1431. doi: 10.1214/08-AOAS193

PMCID: PMC2630256 NIHMSID: NIHMS76672 PMID: 19169384

Abstract

In vaccine studies for infectious diseases such as human immunodeficiency virus (HIV), the frequency and type of contacts between study participants and infectious sources are among the most informative risk factors, but are often not adequately adjusted for in standard analyses. Such adjustment can improve the assessment of vaccine efficacy as well as the assessment of risk factors. It can be attained by modeling transmission per contact with infectious sources. However, information about contacts that relies on self-reporting by study participants is subject to nontrivial measurement error in many studies. We develop a Bayesian hierarchical model fitted using Markov chain Monte Carlo (MCMC) sampling to estimate the vaccine efficacy controlled for exposure to infection, while adjusting for measurement error in contact-related factors. Our method is used to re-analyze two recent HIV vaccine studies, and the results are compared with the published primary analyses that used standard methods. The proposed method could also be used for other vaccines where contact information is collected, such as human papilloma virus vaccines.

1 Introduction

Two randomized multi-center Phase III preventive HIV vaccine trials were conducted to evaluate the efficacy of two versions of AIDSVAX, a recombinant glycoprotein 120 (rgp120) vaccine developed by VaxGen and designed to provide protective immunity by inducing antibody response. One trial (VAX004) was conducted in adults at risk of sexual transmission in North America and the Netherlands, launched in June, 1998, and the other (VAX003) in injecting drug users (IDUs) in Bangkok, Thailand, started in March, 1999. In analyses using Cox proportional hazards models, the vaccine has been shown to be non-effective in Gurwith et al. (2005) for VAX004 and in Pitisuttithum et al. (2006) for VAX003.

A general definition of vaccine efficacy is VE = 1 - RR, where RR is the relative risk of infection for a vaccinated subject compared to that for a control subject. Depending on how risk is defined, various VE measures can be derived. The most frequently used measures were classified by Halloran, Struchiner and Longini (1997) into two categories: conditional on exposure to infection and unconditional, that is, whether the measure is controlling for the frequency and type of contacts that lead to transmission. A contact can be defined as one sexual act of a certain type in the context of VAX004 and as one act of sharing a needle for drug injection in VAX003. The VE measure used in Gurwith et al. (2005) and in Pitisuttithum et al. (2006) falls in the unconditional category. It is of public health interest to re-analyze the two vaccine trials using a VE measure conditional on exposure to infection.

For proper inference conditional on exposure to infection, measurement error in exposure factors should be taken into account. For example, the numbers of needle-sharing acts are often under-reported when IDUs are interviewed (Hudgens et al., 2002). Thus, methods depending solely on reported exposure information could be inappropriate. To handle the problem of measurement error, many methods have been introduced (Carroll, Ruppert and Stefanski, 1995). In the non-parametric setting, Fan and Truong (1993) explored the properties of globally consistent non-parametric regression using deconvolution kernels. Cook and Stefanski (1994) and Carroll et al. (1996) developed the simulation extrapolation method that imposes no assumption on the covariates measured with error and uses resampling to detect the trend of measurement error. Richardson and Green (1997) discussed the use of mixture priors for covariates measured with error in the Bayesian framework, and this method was extended to epidemiological studies with a validation set (Richardson et al., 2002). In these two vaccine trials, the exposure factors that are subject to measurement error and that are most vital to parameter estimation are the frequencies and the types of contacts.

In this paper, we develop a Bayesian framework under the simple assumption of conditional independence (Richardson and Gilks, 1993) for infectious disease incidence data with contact frequency and type recorded for each observation. Using this Bayesian model, we re-analyze the data from the two AIDSVAX trials. Our primary focus is to estimate the transmission probability and vaccine efficacy per infectious contact, while adjusting for measurement error in contact frequency and type. In addition, these studies provide information to address the following questions that are useful for understanding HIV transmission:

Is VE modified by the baseline behavioral risk profile?
Is the use of condoms in sexual contacts protective?
Is sharing needles more risky in prison compared to in the general public?
Is one subtype of HIV more infectious than another subtype via shared needle injection?

The results are compared to those obtained in Gurwith et al. (2005), Pitisuttithum et al. (2006) and Hudgens et al. (2002).

2 Data description

Basic characteristics of the two trials are presented in Table 1. The two trials had similar designs except the ratio of vaccine to placebo recipients. Each subject was enrolled free of HIV infection and received seven injections (study vaccine or placebo) at months 0, 1, 6 and every six months thereafter up to month 30. At each immunization visit and the final visit at month 36, antibody assays of blood samples were performed, and exposure factors, adverse events and social harm events for each participant in the past six months were collected. The primary endpoint of the trials was the detection of HIV-1 infection that is defined as both a positive HIV-1 enzyme immunoassay antibody test and the development of at least two new non-vaccine bands on confirmatory HIV immunoblot.

Table 1.

Two randomized multi-center trials conducted for evaluating the efficacy of AIDSVAX, a recombinant glycoprotein 120 HIV-1 vaccine.

	VAX004	VAX003
Time of trial	1998-2002	1999-2003
Location	North America and The Netherlands	Bangkok, Thailand
Type of transmission	Sexual acts	Sharing needles for drug injection
Population size	5403	2527
Male	5095 (94%)	2361 (93%)
Female	308 (6%)	166 (7%)
Randomization ratio (vaccine:placebo)	2:1	1:1
Infected/Randomized
Placebo	127/1805	105/1260
Male	123/1704	101/1170
Female	4/101	4/90
Vaccine	241/3598	106/1267
Male	239/3391	100/1191
Female	2/207	6/76
HIV-1 subtypes
B	100%	33 (16%)
E	0	164 (78%)
Untypeable	0	14 (6%)
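
The counts in Table 1 permit a quick check of the unconditional definition VE = 1 − RR. The following is a minimal Python sketch, assuming RR is taken as the ratio of crude attack rates (infected/randomized). It is not the Cox-based estimate used in the primary analyses, and the function name `crude_ve` is ours.

```python
def crude_ve(infected_vaccine, n_vaccine, infected_placebo, n_placebo):
    """Return 1 - RR, with RR the ratio of crude attack rates."""
    relative_risk = (infected_vaccine / n_vaccine) / (infected_placebo / n_placebo)
    return 1.0 - relative_risk


# Infected/randomized counts taken from Table 1.
ve_vax004 = crude_ve(241, 3598, 127, 1805)
ve_vax003 = crude_ve(106, 1267, 105, 1260)
print(f"VAX004 crude VE: {ve_vax004:.3f}")
print(f"VAX003 crude VE: {ve_vax003:.3f}")
```

Both crude values are close to zero, which is in line with the published conclusion that the vaccine showed no overall effect.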


For trial VAX004, in addition to vaccine status, exposure factors were collected in the form of sexual contact frequencies categorized by the behavioral type of the contact (vaginal, oral or anal), gender of the partner, the infection status of partners reported by the subject (HIV-positive, HIV-negative or unknown), and condom use. To reduce the dimension of parameters, we ignore the effects of behavioral type and gender on transmission probabilities by summing the frequencies over the corresponding categories. As the study participants were mostly men who have sex with men (MSM), with females accounting for only 6% of the population and 1.6% of the infections, we are largely assessing transmission via MSM contacts.

For trial VAX003, the exposure factors of interest are the frequency of injections, the fraction of injections using needles shared with other people, the history of injection in jail or prison (incarceration injection), and the vaccine status. Since one of two HIV-1 subtypes (E and B) was found for most infections, it is possible to estimate the transmission probability and vaccine efficacy for each of the two subtypes, given that reasonable estimates of the prevalences of these subtypes among the IDUs in Bangkok, Thailand, are available. Contact information collected in this study is not as detailed as in VAX004. Both the injection frequency and the fraction using shared needles were reported as a few categories instead of numbers. There are four categories for the injection frequency (none, < 1/week, ≥ 1/week but < 1/day, and ≥ 1/day), to which we assign values 10⁻¹⁰/day, 0.5/week, 4/week and 1/day respectively. There are five categories for the fraction of injections using shared needles (none, occasionally, half of the time, most, and always), to which we assign values 0.5%, 15%, 50%, 85% and 99.5% respectively.
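
The category-to-value assignment described above can be written as two lookup tables. The sketch below is illustrative only: the category labels, the conversion of weekly rates to daily rates, and the interval length of 182.5 days are our assumptions, not part of the trial data.

```python
DAYS_PER_WEEK = 7.0

# Assigned injection rates from the text, converted to injections per day.
INJECTION_RATE_PER_DAY = {
    "none": 1e-10,
    "<1/week": 0.5 / DAYS_PER_WEEK,
    ">=1/week and <1/day": 4.0 / DAYS_PER_WEEK,
    ">=1/day": 1.0,
}

# Assigned fractions of injections that used shared needles.
SHARED_FRACTION = {
    "none": 0.005,
    "occasionally": 0.15,
    "half of the time": 0.50,
    "most": 0.85,
    "always": 0.995,
}


def expected_shared_injections(freq_category, share_category, interval_days=182.5):
    """Expected number of shared-needle injections in one interval.

    interval_days is a hypothetical six-month length used for illustration.
    """
    rate = INJECTION_RATE_PER_DAY[freq_category]
    return rate * interval_days * SHARED_FRACTION[share_category]


example = expected_shared_injections(">=1/day", "half of the time")
```

In the full model these assigned values enter as reported quantities that are themselves subject to measurement error, so the product above is only a naive plug-in.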

3 Methods

3.1 Model Structure

Following Richardson and Gilks (1993), we specify three submodels for our Bayesian analysis of the measurement error problem: the regression submodel, the measurement error submodel, and the prior submodel. In the type of study we are considering, risk factors and infection status are obtained for each subject over consecutive six-month intervals. Let N be the total number of study participants, and T_i be the number of intervals of subject i, i = 1, … , N. We use data collected from month 6 to month 36, excluding month 0 as an adjustment for left truncation. Visits after the first visit at which HIV infection was detected are also excluded from the analysis. For notational convenience, we identify the t^th interval of subject i by (i, t).

3.1.1 The Regression Submodel

Let p₀ be the baseline transmission probability per infectious contact. An infectious contact refers to a contact with an infectious source. Let n_it be the number of contacts and x_itj = (x_itj₁, … , x_itjK)^τ be the vector of K covariates associated with the j^th contact in interval (i, t), j = 1, … , n_it. The covariates associated with a contact may include characteristics of the subject (e.g., vaccine status), the partner (e.g., infection status) and the contact itself (e.g., condom use, incarceration, etc.). To associate the transmission probability with covariates, we consider a logit model:

$$p(x_{itj}) = \mathrm{logit}^{-1}\left(\mathrm{logit}(p_{0}) + x_{itj}^{τ} θ\right), \tag{1}$$

where θ = (θ₁, … , θ_K)^τ is the coefficient vector, with the interpretation that exp(θ_k) is the multiplicative change in the odds of transmission per unit increase in x_itjk, or the odds ratio (OR) for x_itjk = 1 relative to x_itjk = 0 if x_itjk is binary. Other regression submodels such as the complementary log-log could also be used. Also frequently used is the multiplicative submodel $p(x_{itj}) = p_{0} \exp(x_{itj}^{τ} θ)$. However, under the multiplicative submodel it is sometimes difficult to guarantee p(x_itj) < 1 when p₀ and θ are simultaneously sampled. In the context of the two AIDSVAX trials, we use OR_vac, OR_con and OR_inc to denote the odds ratios of transmission per infectious contact for vaccination, condom use and incarceration, respectively. The probability of escaping infection in interval (i, t) is

$$Q_{it} = \prod_{j=1}^{n_{it}} \left(1 - p(x_{itj})\, π(x_{itj})\right), \tag{2}$$

where π(x_itj) is the prevalence of infectious contacts among all contacts with covariates x_itj. As p(x_itj) and π(x_itj) always appear as a product, they are not estimable at the same time, and π(x_itj) is often assumed known and evaluated from either literature or the data.
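
Expressions (1) and (2) translate directly into code. The following Python sketch is a minimal illustration under our own naming (`transmission_prob`, `escape_prob`); it treats the prevalence as a single known constant, which is the simplification adopted later for VAX004.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def transmission_prob(p0, theta, x):
    """Expression (1): per-contact transmission probability for covariates x."""
    linear = sum(t * v for t, v in zip(theta, x))
    return inv_logit(logit(p0) + linear)


def escape_prob(p0, theta, contacts, prevalence):
    """Expression (2): probability of escaping infection over all contacts.

    contacts is a list of covariate vectors, one per contact.
    """
    q = 1.0
    for x in contacts:
        q *= 1.0 - transmission_prob(p0, theta, x) * prevalence
    return q


# Two contacts with all covariates at zero, hypothetical p0 and prevalence.
q_example = escape_prob(0.01, [0.0], [[0], [0]], prevalence=0.5)
```

With no contacts the product is empty and the escape probability is 1, as expected.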

As mentioned in the introduction, different measures can be used for vaccine efficacy, depending on the definition of relative risks. A natural choice is the VE per infectious contact with the risks being transmission probabilities per infectious contact as given in (1). However, the relative risk obtained from transmission probabilities per infectious contact depends on not only the vaccine status but also other covariates. Such dependency may not exist in other models. For example, if we assume a multiplicative model $p(x_{itj}) = p_{0} \exp(x_{itj}^{τ} θ)$, the VE per infectious contact will depend solely on the vaccine status. For the logit model, the dependency could also be minimal if p(x_itj) is small, where we have VE per infectious contact ≈ 1 − OR_vac. The approximation holds for the contact types we consider here, and thus we report 1 − OR_vac as the VE per infectious contact for the data analysis.
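
The quality of the approximation VE ≈ 1 − OR_vac can be checked numerically. The sketch below converts an odds ratio into the implied relative risk at a given baseline probability. The odds ratio of 0.5 and the large baseline of 0.3 are hypothetical values chosen for contrast; 0.0056 is the order of magnitude of the per-contact probability reported for VAX004 in Section 4.1.

```python
def relative_risk_from_odds_ratio(p0, odds_ratio):
    """Relative risk implied by an odds ratio at baseline probability p0."""
    odds_vaccinated = odds_ratio * p0 / (1.0 - p0)
    p_vaccinated = odds_vaccinated / (1.0 + odds_vaccinated)
    return p_vaccinated / p0


# Small baseline: relative risk is nearly equal to the odds ratio.
rr_small = relative_risk_from_odds_ratio(0.0056, 0.5)
# Large baseline: the two measures separate noticeably.
rr_large = relative_risk_from_odds_ratio(0.3, 0.5)
print(f"p0=0.0056: RR={rr_small:.4f}, so VE={1 - rr_small:.4f} vs 1-OR=0.5")
print(f"p0=0.3:    RR={rr_large:.4f}, so VE={1 - rr_large:.4f} vs 1-OR=0.5")
```

At per-contact probabilities of well under 1% the discrepancy is in the third decimal place, which supports reporting 1 − OR_vac as the VE per infectious contact.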

Expressions (1) and (2) provide a general form for the regression submodel. The exact form is specific to each study, depending on the covariates under consideration, and is described below.

The North America and Netherlands Trial (VAX004)

For trial VAX004, we are interested in the effects of vaccine and condom usage. Let υ_i indicate the vaccine status (1:yes, 0:no) and c_itj indicate the condom use (1: yes, 0: no) for the j^th sexual contact in interval (i, t). Let p₀ be the transmission probability for a sexual contact without a condom between a placebo recipient and an infected partner. We assume the prevalence, π, of HIV in contacts is identical for all intervals and is known. The escape probability for interval (i, t) is given by

$$Q_{it} = \prod_{j=1}^{n_{it}} \left(1 - p(υ_{i}, c_{itj})\, π\right) = \left(1 - p(υ_{i}, 1)\, π\right)^{m_{it}} \left(1 - p(υ_{i}, 0)\, π\right)^{n_{it} - m_{it}}, \tag{3}$$

where p(υ_i, c_itj) = logit⁻¹(logit(p₀) + θ_υ υ_i + θ_c c_itj), θ_υ and θ_c are the effects of the vaccine and condom use, and $m_{it} = \sum_{j=1}^{n_{it}} c_{itj}$ is the total number of contacts with a condom. The probability distribution of the final transmission status, y_it (1: infection, 0: escape), is then

$$\Pr(y_{it} \mid n_{it}, m_{it}, υ_{i}; p_{0}, θ_{υ}, θ_{c}) = Q_{it}^{1 - y_{it}} \left(1 - Q_{it}\right)^{y_{it}}. \tag{4}$$
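
As an illustration of (3) and (4), the contribution of one interval to the VAX004 likelihood might be computed as follows. This is a sketch with our own function name and argument order, assuming the true counts n_it and m_it are known; in the full model they are latent.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def vax004_interval_prob(y, n, m, vaccinated, p0, theta_v, theta_c, prevalence):
    """Probability of outcome y (1: infection, 0: escape) for one interval.

    n is the number of contacts and m the number of those with a condom.
    """
    base = math.log(p0 / (1.0 - p0)) + theta_v * vaccinated
    p_condom = inv_logit(base + theta_c)
    p_no_condom = inv_logit(base)
    q = (1.0 - p_condom * prevalence) ** m * (1.0 - p_no_condom * prevalence) ** (n - m)
    return q if y == 0 else 1.0 - q


# Placebo recipient, 100 contacts without condoms, null covariate effects.
q_escape = vax004_interval_prob(0, 100, 0, 0, 0.0056, 0.0, 0.0, 0.06)
q_infect = vax004_interval_prob(1, 100, 0, 0, 0.0056, 0.0, 0.0, 0.06)
```

The values 0.0056 and 0.06 are the baseline transmission probability and prevalence reported in Section 4.1; the count of 100 contacts is hypothetical.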

The Thai Trial (VAX003)

For this trial, we consider vaccine status, incarceration history of the subject and needle-sharing as covariates. Let p₀ be the baseline probability of infection by an injection using a needle shared with an HIV-infected person. Let u_i denote whether the subject had incarceration injection (1: yes, 0: no) during the study, and s_itj denote whether the injection was using a shared needle (0: yes, 1: no). Also define θ_υ, θ_u and θ_s as the effects of the covariates, respectively. We assume that injections using non-shared needles were not infectious. That is, θ_s = −∞, and the regression submodel is built solely on the $m_{i t} = \sum_{j = 1}^{n_{i t}} (1 - s_{itj})$ contacts using shared needles. The probability of escaping infection in interval (i, t) is given by

$$Q_{it} = \prod_{j=1}^{n_{it}} \left(1 - p(υ_{i}, u_{i}, s_{itj})\, π\right) = \left(1 - p(υ_{i}, u_{i}, 0)\, π\right)^{m_{it}}, \tag{5}$$

where p(υ_i, u_i, s_itj) = logit⁻¹(logit(p₀) + θ_υ υ_i + θ_u u_i + θ_s s_itj). The probability distribution of the final transmission status is the same as (4).

As the HIV subtype was determined for most infected subjects, it is possible to estimate the transmission probability and vaccine efficacy for each subtype. Let $p_{0}^{(e)}$ ($p_{0}^{(b)}$) be the baseline probability of infection by an injection using a needle shared with somebody infected with HIV of subtype E (B), $θ_{υ}^{(e)}$ ($θ_{υ}^{(b)}$) be the vaccine effects against transmission of subtype E (B), and $π^{(e)}$ ($π^{(b)}$) be the prevalence of people infected with subtype E (B) among the IDU population. The probabilities of escaping infection from injections using needles shared with infected partners of subtype E and subtype B, respectively, are given by

$$Q_{it}^{(e)} = \left(1 - \mathrm{logit}^{-1}\left(\mathrm{logit}(p_{0}^{(e)}) + θ_{υ}^{(e)} υ_{i} + θ_{u} u_{i}\right) π^{(e)}\right)^{m_{it}}$$

and

$$Q_{it}^{(b)} = \left(1 - \mathrm{logit}^{-1}\left(\mathrm{logit}(p_{0}^{(b)}) + θ_{υ}^{(b)} υ_{i} + θ_{u} u_{i}\right) π^{(b)}\right)^{m_{it}}.$$

We assume transmission of subtype E is independent of transmission of subtype B. As infection by both subtypes is rare, we assume an infected subject typed as E (B) must have escaped transmission from infectious contacts of subtype B (E). The probability distribution of the final transmission status can be expressed as

$$\Pr(y_{it}, \text{subtype} \mid m_{it}, υ_{i}, u_{i}; p_{0}, θ_{υ}, θ_{u}) = \begin{cases} Q_{it}^{(e)} Q_{it}^{(b)}, & y_{it} = 0, \\ Q_{it}^{(b)} \left(1 - Q_{it}^{(e)}\right), & y_{it} = 1, \text{subtype} = E, \\ Q_{it}^{(e)} \left(1 - Q_{it}^{(b)}\right), & y_{it} = 1, \text{subtype} = B, \\ 1 - Q_{it}^{(e)} Q_{it}^{(b)}, & y_{it} = 1, \text{subtype} = U, \end{cases} \tag{6}$$

where “U” stands for “Untypeable”.
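
The four cases in (6) can be sketched as a single function. The code below is illustrative: the dictionary-based interface and the baseline probabilities of 0.01 are our assumptions, while the prevalences 0.225 and 0.075 are the values adopted in Section 4.2.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def vax003_subtype_prob(y, subtype, m, vaccinated, incarcerated,
                        p0, theta_v, theta_u, prevalence):
    """Expression (6). p0, theta_v and prevalence are dicts keyed by "E" and "B"."""
    q = {}
    for s in ("E", "B"):
        linear = math.log(p0[s] / (1.0 - p0[s])) + theta_v[s] * vaccinated + theta_u * incarcerated
        q[s] = (1.0 - inv_logit(linear) * prevalence[s]) ** m
    if y == 0:
        return q["E"] * q["B"]
    if subtype == "E":
        return q["B"] * (1.0 - q["E"])  # infected by E, escaped B
    if subtype == "B":
        return q["E"] * (1.0 - q["B"])  # infected by B, escaped E
    return 1.0 - q["E"] * q["B"]        # untypeable: infected by either


args = dict(m=50, vaccinated=1, incarcerated=0,
            p0={"E": 0.01, "B": 0.01},            # hypothetical baselines
            theta_v={"E": 0.0, "B": 0.0}, theta_u=0.0,
            prevalence={"E": 0.225, "B": 0.075})  # values from Section 4.2
p_escape = vax003_subtype_prob(0, None, **args)
p_e = vax003_subtype_prob(1, "E", **args)
p_b = vax003_subtype_prob(1, "B", **args)
p_u = vax003_subtype_prob(1, "U", **args)
```

The escape and untypeable cases sum to one, and the gap between the untypeable case and the two typed cases equals the probability of infection by both subtypes, which the model treats as negligible.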

3.1.2 The Measurement Error Submodel

We consider two types of exposure information that are measured with error: the total number of contacts, n_it, and the number of a particular subset of contacts, m_it. Let ñ_it and m̃_it be the measured values of n_it and m_it, respectively. As data in the form of counts over time periods often arise from a Poisson process, we assume a Poisson distribution for the true number of contacts n_it and an over-dispersed Poisson distribution for the measured number ñ_it during a time interval of length l_it, given the contact rate λ_it. The reason for an over-dispersion structure is that we want some correction for potentially under- or over-reported numbers of contacts; e.g., several subjects in trial VAX004 reported thousands of sexual contacts in a single interval. The histograms of reported contact rates in Figure 1(a) for VAX004 and Figure 1(c) for VAX003 suggested either gamma or log-normal distributions. We use the log-normal distribution for illustration, but compare both in the data analyses. Define n_i = (n_i₁, … , n_{iT_i})^τ, m_i = (m_i₁, … , m_{iT_i})^τ, ñ_i = (ñ_i₁, …, ñ_{iT_i})^τ, m̃_i = (m̃_i₁, …, m̃_{iT_i})^τ, and λ_i = (λ_i₁, … , λ_{iT_i})^τ. Let 1 and J denote the vector and matrix, respectively, with all elements being 1, and let I denote the identity matrix. The dimensions of 1, J and I are clear from the context and are thus suppressed. We choose the following measurement error structure for n_it:

$$\begin{aligned} λ_{i} &\sim \text{Log-Normal}\left(μ \mathbf{1},\ σ^{2} (ρ J + (1 - ρ) I)\right), \\ n_{it} &\sim \text{Poisson}(λ_{it} l_{it}), \\ δ_{it} &\sim \text{Gamma}(φ,\ λ_{it} l_{it} / φ), \text{ and} \\ \tilde{n}_{it} &\sim \text{Poisson}(δ_{it}). \end{aligned} \tag{7}$$

Figure 1. (a) Reported sexual contact rates in VAX004. Values larger than 5/day (< 0.1%) are truncated in the graph but not in the analysis. The vertical line segments indicate the location of values between 1 and 5. (b) Reported proportion of condom use in VAX004. (c) Reported injection rates in VAX003. (d) Reported proportion of shared needles in VAX003.

An exchangeable within-subject correlation structure is assumed for the contact rates, λ_i, but other correlation structures could be considered. The magnitude of correlation among elements of λ_i is measured by ρ, 0 ≤ ρ ≤ 1, the correlation coefficient for log(λ_i). We assume unbiasedness for the measurement error, as E(ñ_it|λ_it) = λ_it l_it = E(n_it|λ_it). The over-dispersion is reflected by VAR(ñ_it|λ_it) = λ_it l_it(1 + λ_it l_it/φ) and is generated by adding the layer of δ_i = (δ_i₁, … , δ_{iT_i})^τ. The degree of over-dispersion decreases as φ increases and vanishes as φ goes to infinity. By our assumption, n_it is conditionally independent of ñ_it given the contact rate λ_it. Zero values of ñ_it are allowed for intervals in which infections happened since only n_it is required to be non-zero.
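
The stated conditional mean and variance of the reported count can be checked by simulation. The sketch below draws from the gamma and Poisson layers of (7) for a fixed contact rate, using only the Python standard library. The rate, interval length and seed are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen here for brevity rather than efficiency.

```python
import math
import random


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_reported_count(rng, rate, length, phi):
    """Draw a reported count given the true contact rate, following (7)."""
    expected = rate * length
    delta = rng.gammavariate(phi, expected / phi)  # shape phi, scale expected/phi
    return sample_poisson(rng, delta)


def reported_moments(rate, length, phi):
    """Conditional mean and variance of the reported count stated in the text."""
    expected = rate * length
    return expected, expected * (1.0 + expected / phi)


rng = random.Random(2008)
draws = [sample_reported_count(rng, rate=0.5, length=10.0, phi=1.66) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
theory_mean, theory_var = reported_moments(0.5, 10.0, 1.66)
```

The value φ = 1.66 is the posterior median reported for VAX004 in Table 3; at that value the variance of the reported count is roughly four times its mean in this example.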

Given n_it and ñ_it, it is natural to choose binomial distributions for both the true number m_it and the measured number m̃_it based on a beta-distributed proportion ξ_it, which is also suggested by the histograms of reported proportions of contacts with condom use in Figure 1(b) for VAX004 and contacts with needle-sharing in Figure 1(d) for VAX003. Define Φ(·) as the standard normal cumulative distribution function (CDF) and Ψ(·|α, β) as the beta CDF. We have

$$\begin{aligned} ξ_{it} &\sim \text{Beta}(α, β), \\ m_{it} &\sim \text{Binomial}(n_{it}, ξ_{it}), \\ \tilde{m}_{it} &\sim \text{Binomial}(\tilde{n}_{it}, ξ_{it}), \\ Φ(ε_{it}) &= Ψ(ξ_{it} \mid α, β), \text{ and} \\ ε_{i} &\sim N\left(0,\ γ J + (1 - γ) I\right), \end{aligned} \tag{8}$$

where ε_i = (ε_i₁, … , ε_{iT_i})^τ. We use a standard normal copula to model the within-subject correlation among ξ_i = (ξ_i₁, … , ξ_{iT_i})^τ, the proportions of contacts in a subcategory (condom use or needle-sharing). This copula is formed by generating a standard normal random vector ε_i with an exchangeable correlation structure, the correlation coefficient being γ, and transforming it to a uniform random vector using Φ on each component. The uniform random vector is then transformed to ξ_i using Ψ⁻¹ on each element. The ξ_i generated in this way has marginal CDF Ψ(·|α, β) and an exchangeable correlation structure. While the correlation coefficient for ξ_i is not the same as that for ε_i, they share the same rank correlation because the CDFs are monotonic. Note that the log-normal distribution can be viewed as a special case utilizing the standard normal copula. Conditional on n_it, ñ_it and ξ_it, m_it and m̃_it are independent.
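
The copula construction can be sketched with the Python standard library. To keep the example short we restrict the marginal to Beta(α, 1), whose inverse CDF has the closed form u^(1/α); this restriction is our simplification, and a general Beta(α, β) marginal would require a numerical inverse CDF. The exchangeable normal vector is generated from one shared and several independent standard normal draws.

```python
import math
import random
from statistics import NormalDist


def exchangeable_normal(rng, size, gamma):
    """Standard normal vector with pairwise correlation gamma."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(gamma) * shared + math.sqrt(1.0 - gamma) * rng.gauss(0.0, 1.0)
            for _ in range(size)]


def copula_proportions(rng, size, gamma, alpha):
    """Correlated proportions with Beta(alpha, 1) marginals (simplified case)."""
    standard_normal = NormalDist()
    epsilon = exchangeable_normal(rng, size, gamma)
    uniforms = [standard_normal.cdf(e) for e in epsilon]  # apply Phi
    return [u ** (1.0 / alpha) for u in uniforms]         # inverse CDF of Beta(alpha, 1)


rng = random.Random(1)
subjects = [copula_proportions(rng, size=5, gamma=0.65, alpha=0.5) for _ in range(5000)]
```

Each inner list plays the role of ξ_i for one subject; the correlation parameter 0.65 matches the posterior median of γ in Table 3, whereas α = 0.5 is hypothetical.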

3.1.3 The Prior Submodel

We use the following priors for p₀, θ and hyperparameters:

$$\begin{aligned} μ &\sim 1, \\ σ^{2} &\sim \frac{1}{σ^{2}}, \\ ρ &\sim \text{Uniform}(0, 1), \\ φ &\sim \left[\ln Γ''(φ) - \frac{1}{φ}\right]^{1/2}, \\ (α, β) &\sim \left[\ln Γ''(α) \ln Γ''(β) - \ln Γ''(α + β) \left(\ln Γ''(α) + \ln Γ''(β)\right)\right]^{1/2}, \\ γ &\sim \text{Uniform}(0, 1), \\ θ_{k} &\sim \text{Normal}(0, d_{k}^{2}), \quad k = 1, \dots, K, \text{ and} \\ p_{0} &\sim \text{Uniform}(a_{p}, b_{p}), \end{aligned} \tag{9}$$

where {d_k : k = 1, …, K}, a_p and b_p are assumed known, and ln Γ″(·) is the trigamma function. Jeffreys' non-informative priors are used for μ, σ², φ and (α, β).

Our choice of a relatively wide range (a_p, b_p) is guided by the maximum likelihood estimate (MLE) of p₀ obtained solely from the regression submodel. To use this simple likelihood method, we assume n_it = ñ_it, and m_it is estimated by n_it × Σ_i,t m̃_it/Σ_i,t ñ_it for VAX003. The same assumption of a common proportion of shared needles was employed in Hudgens et al. (2002). However, one will not be able to differentiate the condom effect with a common proportion of condom use, and thus we assume m_it = m̃_it additionally to obtain the MLE of p₀ for VAX004.

A normal prior $N (0, d_{k}^{2})$ is reasonable for covariate effects because we let the data drive the 95% credible sets away from the null value if strong effects exist. The values of {d_k : k = 1, … , K} are set relatively large, e.g., 2, to provide a wide domain for the odds ratios.

3.2 Posterior Distributions

Bayesian inferences are based on posterior distributions of all unknown parameters and latent variables given the data and known parameters, which are derived from the prior and conditional distributions stated in the previous section. Let $y = (y_{1}^{τ}, \dots, y_{N}^{τ})^{τ}$ be the vector of observed infection status, where y_i = (y_i₁, … , y_{iT_i})^τ, and let $x = (x_{1}^{τ}, \dots, x_{N}^{τ})^{τ}$, where $x_{i} = (x_{i1}^{τ}, \dots, x_{iT_{i}}^{τ})^{τ}$ and x_it = (x_it₁, … , x_{itn_it})^τ, be the observed covariate matrix for all intervals. Similarly, define n, ñ, m, m̃, λ, δ, ξ and ε as the vectors of n_it, ñ_it, m_it, m̃_it, λ_it, δ_it, ξ_it and ε_it, t = 1, … , T_i, i = 1, … , N. Let f(·) denote the probability density function (PDF) for continuous variables and the probability mass function (PMF) for discrete variables. The joint posterior distribution of all unknown parameters and latent variables is proportional to the joint full probability of the unknown parameters, latent variables and the data:

$$\begin{aligned} & f(n, m, δ, λ, ε, ξ, p_{0}, θ, φ, μ, σ^{2}, ρ, α, β, γ \mid y, x, \tilde{n}, \tilde{m}) \\ &\quad \propto f(y, n, \tilde{n}, m, \tilde{m}, δ, λ, ε, ξ, p_{0}, θ, φ, μ, σ^{2}, ρ, α, β, γ \mid x) \\ &\quad = f(y \mid n, m, p_{0}, θ, x) \times f(\tilde{m} \mid \tilde{n}, ξ) \times f(\tilde{n} \mid δ) \\ &\qquad \times f(m \mid n, ξ) \times f(n \mid λ) \times f(δ \mid λ, φ) \times f(λ \mid μ, σ^{2}, ρ) \times f(ε \mid γ) \\ &\qquad \times f(μ) \times f(σ^{2}) \times f(ρ) \times f(φ) \times f(α) \times f(β) \times f(γ) \times f(p_{0}) \times f(θ), \end{aligned} \tag{10}$$

where ξ is a function of ε as given in (8), and known hyper-parameters are suppressed.

To illustrate the Markov chain Monte Carlo (MCMC) algorithm used to obtain the joint posterior distribution of all parameters, we use VAX004 as an example and give the technical details in the appendix. In summary, we use the following strategies:

n, m, δ, μ, and σ² are sampled directly from their full conditional distributions.
For λ, ξ and ε, the full conditional distribution is a product of several regular density functions, and we use Metropolized independence sampling with each density sequentially serving as the proposal distribution.
The random-walk style Metropolis-Hastings algorithm is used for sampling all other parameters.
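
The third strategy can be illustrated with a generic random-walk Metropolis sampler. The sketch below is not the sampler described in the appendix: as a stand-in target it uses only the N(0, d_k²) prior on a single coefficient θ_k with d_k = 2, and the step size, chain length and seed are hypothetical.

```python
import math
import random


def random_walk_metropolis(log_target, start, step, n_iter, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    current, current_lp = start, log_target(start)
    draws = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


def log_normal_prior(theta, d=2.0):
    """Log density of N(0, d^2) up to an additive constant."""
    return -0.5 * (theta / d) ** 2


draws = random_walk_metropolis(log_normal_prior, start=0.0, step=2.0,
                               n_iter=20000, rng=random.Random(0))
chain_mean = sum(draws) / len(draws)
chain_var = sum((v - chain_mean) ** 2 for v in draws) / (len(draws) - 1)
```

In an actual analysis the log target would be the log of the full conditional density implied by (10) for the parameter being updated, and the step size would be tuned to the acceptance rate.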

4 Application

In the following, we report the posterior medians followed by the 95% credible sets (CS) for parameters in the Bayesian model, and make comparisons with point estimates followed by the 95% confidence intervals (CI) from the literature when appropriate.

4.1 VAX004: HIV transmission by sexual contacts

At each semiannual follow-up visit in trial VAX004, subjects were asked to classify the sexual contacts by the infection status of their partners, i.e., positive, negative or unknown, based on their knowledge. HIV prevalence among partners reported as HIV-negative may be less than that among partners reported as HIV-positive. However, an exploratory analysis using a simple likelihood method showed that the probability of infection per contact was not different across the three types of partner infection status reported by the study participants. Hence, we assume a common prevalence π of infection among all partners and estimate it by 0.06, the proportion of reported contacts with positive partners among all contacts in the study population. In addition to the analysis for the overall study population, we performed a stratified analysis by classifying the study population into three subgroups corresponding to low, medium and high baseline (month 0) risk levels. We allow the transmission probability and vaccine effect to vary across, but assume that other parameters are not affected by, risk levels. The baseline risk levels are determined by a behavioral risk score ranging from 0 to 7, with 0 as low, 1-3 as medium, and 4-7 as high. The behavioral score is derived from nine baseline risk factors that are highly predictive of HIV infection (Gurwith et al., 2005).

Table 2 gives the results regarding transmission probabilities and VEs for VAX004. The vaccine did not show a significant effect, reducing the risk of infection per infectious contact by about 7% for the overall study population, which is not statistically different from 0. Neither did the low-risk and medium-risk subgroups show any significant vaccine effect. However, we do observe a significant VE of 0.56 (95% CS:0.22, 0.75) in the high-risk subgroup, as the associated 95% CS excludes 0. The pattern that higher baseline risk tends to be associated with higher vaccine efficacy was also identified in Gurwith et al. (2005) via a Cox proportional hazards model for grouped times, where they reported an estimate of 0.06 (95% CI:-0.17, 0.24) for VE per six-month interval for the overall study population and 0.43 (95% CI:0.04, 0.66) for the high-risk subgroup, fairly close to our estimates.

Table 2.

VAX004: Summary of the posterior distributions of the transmission probability and the vaccine efficacy per infectious sexual contact for the overall study population and by baseline risk level, compared to the standard analysis

Risk level	Total^b	Infected	p: Median	p: 95% C.S.	VE (Bayesian): Median	VE (Bayesian): 95% C.S.	VE (Cox^a): Estimate	VE (Cox^a): 95% C.I.
Overall	8772	368	0.0056	0.0044, 0.0071	0.069	-0.15, 0.26	0.06	-0.17, 0.24
Low	3605	57	0.0020	0.0010, 0.0036	-0.23	-1.48, 0.35	-0.48	-1.93, 0.26
Middle	4546	229	0.0054	0.0041, 0.0071	0.02	-0.28, 0.25	0.03	-0.25, 0.25
High	621	82	0.020	0.013, 0.030	0.56	0.22, 0.75	0.43	0.04, 0.66

^a Results based on Cox proportional hazards model in Gurwith et al. (2005).

^b Total number of six-month intervals.

The baseline transmission probability per infectious sexual contact for the overall study population is 0.0056 (95% CS:0.0044, 0.0071), suggesting that 1000 sexual contacts with HIV-positive partners produce about six infections on average, without intervention of vaccine or condoms. This probability increases across risk levels, with the value for the high risk level 10 times that for the low risk level. A possible reason for the increase in transmission probability across risk levels is that subjects in higher risk levels might be more likely to under-report the number of contacts.

Results for all other parameters are presented in Table 3. Surprisingly, the reported use of condoms did not seem to be protective, with OR_con estimated as 1.44 (95% CS:1.06, 1.94), suggesting that it increased the odds of transmission by about 44%. A possible explanation is that the reporting of condom use might be correlated with certain types of sexual behavior. A more specific speculation is that monogamous subjects tended to use condoms much less frequently and yet had lower risk of infection as compared to those with multiple partners. We included an indicator for monogamy (on average < 2 partners over the study period), but the estimate of OR_con did not change much (results not shown).

Table 3.

VAX004: Summary of the posterior distributions of other parameters for the overall study population

Posterior Quantiles	OR_con	φ	μ	σ²	ρ	α	β	γ
Median	1.44	1.66	-2.54	1.95	0.92	0.30	0.29	0.65
2.5%	1.06	1.61	-2.58	1.87	0.91	0.29	0.28	0.64
97.5%	1.94	1.71	-2.50	2.04	0.92	0.31	0.30	0.67


High within-subject correlation is found among the contact rates and proportions of condom use, with ρ and γ estimated as 0.92 (95% CS:0.91, 0.92) and 0.65 (95% CS:0.64, 0.67) respectively. These correlation parameters indicate the magnitude of, but do not directly measure, the correlation coefficients among λ_i and among ξ_i. Based on posterior medians of μ, σ², α and β, we found that the mean contact rate in this cohort is 0.21 (95% CS:0.20, 0.22) times per day, and the mean proportion of condom use is 0.51 (95% CS:0.50, 0.52).
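
The two population means quoted above follow from the standard formulas for the log-normal mean, exp(μ + σ²/2), and the beta mean, α/(α + β). The short sketch below plugs in the posterior medians from Table 3; it reproduces the point values only, not the credible sets, which require the full posterior sample.

```python
import math

# Posterior medians from Table 3.
mu, sigma2 = -2.54, 1.95
alpha, beta = 0.30, 0.29

mean_contact_rate = math.exp(mu + sigma2 / 2.0)  # log-normal mean, contacts per day
mean_condom_proportion = alpha / (alpha + beta)  # beta mean

print(f"mean contact rate: {mean_contact_rate:.2f} per day")
print(f"mean proportion of condom use: {mean_condom_proportion:.2f}")
```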

If a marginal gamma distribution is assumed for λ_i, we use the same copula technique used for ξ_i to introduce within-subject correlation. Changing the distribution of the contact rate from log-normal to gamma does not affect the estimates appreciably except for a slight increase in φ and decrease in ρ. We compare predicted population-level means and variances of the reported number of contacts yielded by the two distributions to the observed values, shown in Figure 2(a)-(c). While the gamma distribution gives a predicted mean closer to the observed mean, the log-normal distribution gives a more realistic standard deviation. The heavier tail of the log-normal distribution can better catch extreme reported values. We choose not to ignore the extreme reported values, and therefore all above results for VAX004 are based on the log-normal distribution for the contact rate.

Figure 2. (a) Reported number of sexual contacts in VAX004. Values larger than 1000 are truncated. The vertical line segments indicate the location of values between 200 and 1000. (b) Predicted number of sexual contacts in VAX004, assuming gamma distribution for contact rate. (c) Predicted number of sexual contacts in VAX004, assuming log-normal distribution for contact rate. (d) Reported number of injections in VAX003. (e) Predicted number of injections in VAX003, assuming gamma distribution for injection rate. (f) Predicted number of injections in VAX003, assuming log-normal distribution for injection rate.

While we believe that our prior assumptions over most parameters are non-informative or towards-null, we performed a brief sensitivity analysis by changing the prior distribution of p₀. We impose a strong beta prior with mean 0.0073 and standard deviation 0.001, instead of Uniform(0.0001, 0.1), on p₀. The posterior estimates increase to 0.0063 (0.0052, 0.0075) for p₀ and 0.12 (-0.08, 0.29) for VE, and decrease to 1.28 (0.99, 1.68) for OR_con, all changes being mild. A higher prior mean of p₀ will cause more substantial changes in the same directions.
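
The text specifies the informative beta prior by its mean and standard deviation rather than by its shape parameters. One way to recover the shape parameters, assuming the usual moment-matching formulas for the beta distribution, is sketched below; the resulting values are our derivation and are not reported in the analysis.

```python
import math


def beta_from_mean_sd(mean, sd):
    """Shape parameters (a, b) of a beta distribution with the given mean and sd."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    if common <= 0.0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * common, (1.0 - mean) * common


a, b = beta_from_mean_sd(0.0073, 0.001)
implied_mean = a / (a + b)
implied_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
print(f"a = {a:.2f}, b = {b:.2f}")
```

The large sum a + b indicates how concentrated this prior is compared with Uniform(0.0001, 0.1).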

4.2 VAX003: HIV transmission among IDUs using shared needles

In the Bayesian probability structure for trial VAX003, the over-dispersion structure and the related parameters, φ and δ_it, are dropped, i.e., we assume ñ_it ∼ Poisson(λ_it). The reason is that there is not sufficient information about over-dispersion with only four categories for the contact rate. We stratify the shape and scale parameters by incarceration injection history (u_i) for both the injection rate (λ_it) and the proportion of needle sharing (ξ_it), in an attempt to control for confounding factors when we evaluate the effect of incarceration injection history on the transmission probability. The prevalence of HIV among IDUs in Bangkok was around 30% (Kitayaporn et al., 1998). It was estimated that the relative prevalence between subtypes E and B was growing at a decreasing rate between 1998 and 2000, and reached 70%:30% in 2000 (Kitayaporn et al., 1998; Hudgens et al., 2002). Based on this information, the average relative prevalence most likely is between 0.7:0.3 and 0.8:0.2. We use $π^{(e)}$ = 0.75 × 0.3 = 0.225 and $π^{(b)}$ = 0.25 × 0.3 = 0.075 for analyses stratified by subtype.

We performed additional analyses stratified by two baseline behavioral risk levels defined in Pitisuttithum et al. (2006). A subject (and all his six-month intervals) is classified into the high baseline risk level if 2 or more of the following risk factors were present at visit 0: use of injection drugs regularly, use of injection drugs daily or weekly, use of injection drugs with shared needles, history of incarceration during the past 6 months, partner was an IDU, or shared needles with partner. Otherwise, the subject is classified into the low risk level.
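The classification rule above can be sketched in a few lines of Python. This is only an illustration: the field names in `RISK_FACTORS` are hypothetical labels for the six baseline risk factors listed in the text, not variable names from the trial data.

```python
# Hypothetical field names for the six visit-0 risk factors listed in the text.
RISK_FACTORS = (
    "injects_regularly",
    "injects_daily_or_weekly",
    "shares_needles",
    "incarcerated_past_6_months",
    "partner_is_idu",
    "shares_needles_with_partner",
)


def baseline_risk_level(visit0: dict) -> str:
    """Return 'high' if two or more risk factors are present at visit 0, else 'low'."""
    count = sum(bool(visit0.get(name, False)) for name in RISK_FACTORS)
    return "high" if count >= 2 else "low"


# The level is assigned once per subject and then applied to all of
# that subject's six-month intervals.
subject = {"injects_regularly": True, "partner_is_idu": True}
print(baseline_risk_level(subject))
```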

The results for transmission probabilities and vaccine efficacies are presented in Table 4. None of the VE estimates are significantly different from 0. We estimate the VE per infectious needle-sharing act as -0.08 (95% CS:-0.43, 0.20) for overall transmission and as -0.12 (95% CS:-0.52, 0.17) for subtype E. Although subtype B tends to have a better VE than subtype E, the difference is not significant. Pitisuttithum et al. (2006) reported similar VE estimates, 0.001 (95% CI:-0.31, 0.24) for the overall IDU cohort and -0.014 (95% CI:-0.38, 0.25) for subtype E, based on a Cox proportional hazards model for grouped times.

Table 4.

VAX003: Summary of the posterior distributions of the transmission probability and the vaccine efficacy per infectious needle-sharing act for the overall study population and by baseline risk level and HIV subtype, compared to the standard analysis

				p		VE (Bayesian)		VE (Cox^a)

Risk Level	Subtype	Total^b	Infected^c	Median	95% C.S.	Median	95% C.S.	Estimate	95% C.I.
Overall		13797	206	0.026	0.021, 0.031	-0.08	-0.43, 0.20	0.001	-0.31, 0.24
	E		160	0.028	0.022, 0.034	-0.12	-0.52, 0.17	-0.014	-0.38, 0.25
	B		32	0.019	0.012, 0.029	0.18	-0.57, 0.60
	E/B			1.45	0.91, 2.39
Low		6622	80	0.033	0.024, 0.045	0.06	-0.49, 0.41
	E		55	0.034	0.022, 0.048	0.04	-0.66, 0.42
	B		16	0.032	0.015, 0.058	0.18	-1.33, 0.67
	E/B			1.06	0.51, 2.54
High		7175	126	0.023	0.017, 0.029	-0.10	-0.60, 0.23
	E		105	0.025	0.019, 0.032	-0.21	-0.77, 0.19
	B		16	0.015	0.008, 0.026	0.34	-0.63, 0.77
	E/B			1.68	0.92, 3.31


^a Results based on the Cox proportional hazards model in Pitisuttithum et al. (2006).

^b Total number of six-month intervals.

^c Intervals for 5 subjects infected by visit 0 (E:4, B:1) are excluded. The 14 untypeable infections are not shown.

The baseline transmission probability per injection using a needle shared with an HIV-positive IDU is 0.026 (95% CS:0.021, 0.031), suggesting that, out of 100 such injections, 2.6 on average will transmit the virus. The subtype-specific baseline transmission probabilities are estimated as 0.028 (95% CS:0.022, 0.034) for $p_{0}^{(e)}$ and 0.019 (95% CS:0.012, 0.029) for $p_{0}^{(b)}$ , higher than the values of 0.016 (95% CI:0.012, 0.02) and 0.0063 (95% CI:0.0041, 0.0092) estimated in Hudgens et al. (2002) based on a likelihood method. It is interesting that the transmission probability per injection is somewhat higher for the low than for the high baseline risk level, opposite to the direction observed in VAX004. The ratio of $p_{0}^{(e)}$ to $p_{0}^{(b)}$ , with a posterior median of 1.45 (95% CS:0.91, 2.39), is only marginally different from 1 and is lower than the 2.48 (95% CI:1.63, 3.88) reported in Hudgens et al. (2002).
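To make the per-act interpretation concrete, the following sketch contrasts the expected number of transmissions with the probability of at least one transmission over n infectious needle-sharing acts. It assumes that acts are independent and share the same per-act probability, which mirrors the escape-probability form 1 − q^n used in the Appendix. The function name `prob_infected` is introduced here for illustration only.

```python
def prob_infected(n_contacts: int, p: float) -> float:
    """Probability of at least one transmission in n independent infectious contacts."""
    return 1.0 - (1.0 - p) ** n_contacts


p0 = 0.026  # posterior median of the baseline per-act probability reported above
for n in (1, 10, 100):
    expected_transmissions = n * p0
    print(n, round(expected_transmissions, 3), round(prob_infected(n, p0), 3))
```

The expected count grows linearly in n, whereas the probability of infection saturates below 1. This is why the two quantities should not be read interchangeably.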

Table 5 summarizes estimates for all other parameters. The odds ratio for incarceration injection is estimated as 0.47 (95% CS:0.30, 0.72). Hudgens et al. (2002) reported a much higher value, 4.47 (95% CI:2.63, 7.19), where a time-varying prevalence ratio with an average of about 0.55:0.45 between subtypes E and B and a common proportion of 4% for needle sharing across the whole population were assumed. Among subjects with incarceration injection history, the mean injection rate is 0.45 (95% CS:0.37, 0.54) times per day and 14% (95% CS:12%, 17%) of injections involved shared needles. In contrast, among those without incarceration history, the mean injection rate is 0.25 (95% CS:0.24, 0.27) times per day and 4.2% (95% CS:4.0%, 4.5%) of injections involved shared needles. The assumption of a common proportion of needle-sharing in Hudgens et al. (2002) lowers the injection frequency and proportion of needle-sharing to the overall level, and consequently increases the adjusted transmission probability for subjects with incarceration history. In addition, the incarceration injection indicator is defined for each interval in Hudgens et al. (2002), whereas we define it for each individual. Posterior estimates of ρ, 0.50 (95% CS:0.48, 0.52), and γ, 0.47 (95% CS:0.44, 0.51), suggest substantial within-subject correlation, though not as high as those in VAX004.

Table 5.

VAX003: Summary of the posterior distributions of other parameters for the overall study population

				With Incarceration Injection History				Without Incarceration Injection History

Posterior Quantiles	OR_inc	ρ	γ	α_λ^a	β_λ^b	α	β	α_λ^a	β_λ^b	α	β
Median	0.47	0.50	0.47	0.24	1.87	0.23	1.36	0.20	1.25	0.23	5.28
2.5%	0.30	0.48	0.44	0.21	1.60	0.20	1.12	0.19	1.18	0.22	4.85
97.5%	0.72	0.52	0.51	0.27	2.24	0.26	1.66	0.21	1.32	0.25	5.75


^a Shape of the gamma distribution for contact rate.

^b Scale of the gamma distribution for contact rate.

Similar to VAX004, log-normal and gamma distributions for the injection rate lead to similar results, with a slight difference in ρ. In Figure 2(d)-(f), we see that the heavy tail of the log-normal distribution yields extremely large predicted moments for reported number of injections and thus makes it less competitive than the gamma distribution for modeling injection rates reported in a few categories. Consequently, all results presented for VAX003 are based on the gamma distribution for injection rate.

We performed sensitivity analyses by changing the relative prevalence $π^{(e)} : π^{(b)}$ to 0.7 : 0.3 and 0.8 : 0.2. As expected, the transmission probability tends to decrease for subtype E but to increase for subtype B, as the relative prevalence of subtype E increases. For each subtype and risk level, the VE estimate changes in the direction opposite to that of the corresponding transmission probability, but none of the VE estimates differ significantly from 0. The magnitude of all these changes is relatively small, especially for subtype E. The estimated transmission probability ratio of subtype E to subtype B decreases as the relative prevalence of subtype E increases. In particular, subtype E becomes significantly more infectious than subtype B, with an estimate of 1.88 (95% CS: 1.18, 3.21) for $p_{0}^{(e)} / p_{0}^{(b)}$ , if the prevalence of subtype E is as low as 70% among the IDUs.

5 Discussion

We established a Bayesian hierarchical model for analyzing clinical studies of infectious disease with transmission and exposure data observed over discrete time intervals. This model provides assessment of the transmission probability and vaccine efficacy conditioning on an infectious contact, whereas standard methods of analyzing vaccine trials do not. Assuming conditional independence between observed and true but unobserved quantities, this model provides an approach to adjustment for the measurement error in some key risk factors. We used the method to re-analyze two HIV-1 vaccine trials on populations who are at high risk of HIV-transmission via sexual contacts or sharing needles for drug injection. The proposed method could be applied to studies of other vaccines, such as human papilloma virus vaccines, where contact information is collected.

We obtained estimates of vaccine efficacy similar to the primary study results, especially for VAX004, confirming the findings of no protective efficacy. Two factors may contribute to this similarity in VE estimates. First, the measurement error might be relatively small for the majority of the study population. Second, our model assumes unbiasedness, i.e., E(ñ_it|λ_it) = E(n_it|λ_it) and E(m̃_it|λ_it, ξ_it) = E(m_it|λ_it, ξ_it). However, if the bias trend is similar in both treatment groups, even a model with bias correction will likely yield a similar VE estimate. Despite the similarity, our hierarchical model provides joint inference on not only the transmission probability and VE but also the population-level behavioral characteristics such as the contact rate and proportion of condom use (needle-sharing).

We have assumed an exchangeable structure for within-subject correlation among contact (injection) rates and proportions of condom use (needle-sharing), using the copula method. A more sophisticated structure may be considered given sufficient data. Within-subject sample correlation coefficients among the logarithm of reported contact rates, {log(ñ_it/l_it) : t = 1, … , T_i}, and among reported proportions of condom use, {m̃_it/ñ_it : t = 1, … , T_i}, in VAX004 do indicate that correlation wanes as two intervals are further apart, but the range of variation is relatively small, 0.3-0.5 for the former and 0.45-0.67 for the latter. Therefore, an exchangeable structure is a reasonable assumption, although an autoregressive structure such as an ARMA(p, q) model (Chib and Greenberg, 1994) may be more realistic. The range of 0.3-0.5 for {log(ñ_it/l_it) : t = 1, … , T_i} may seem contradictory to the Bayesian estimate of ρ around 0.9. A plausible explanation is that the addition of δ_it to reflect the over-dispersion may attenuate the true correlation among the elements of λ_i, as the elements of δ_i are independent given λ_i. Consequently, a high correlation among the elements of λ_i is needed to yield a moderate marginal correlation among the elements of δ_i. In fact, the parameter estimates, especially for transmission probabilities and VEs, do not change much if we assume intervals within the same subject are independent. A possible reason is that only the overall magnitudes of n_i and m_i matter in the estimation of p₀ and the VE, and these magnitudes mainly depend on the observed ñ_i and m̃_i and are much less affected by the correlation. However, we do see that correlation adjustment changes the shape and scale of the distributions of the contact rate λ_it and the proportion ξ_it in a more noticeable way. For example, without incarceration injection history, the estimate of the shape parameter β for the proportion of needle-sharing in VAX003 changes from 5.28 (95% CS:4.85, 5.75) to 6.3 (95% CS:5.92, 6.69) when within-subject independence is assumed.

To adjust for error in self-reported contact information, we assumed a Poisson process for the true number of contacts and an over-dispersed Poisson process for the reported one, and that the two processes are conditionally independent given the underlying contact rate. Ideally, validation data would be available so that the measurement error could be modeled parametrically or without parametric assumptions as in Golm, Halloran and Longini (1999). The collection of validation data would be useful in future vaccine trials. In this Bayesian framework, a more general bivariate distribution could be modeled between n_it and ñ_it given λ_it or between λ_it and a latent rate λ̃_it that determines the distribution of ñ_it, had validation data been available on contact frequency. Another form of additional data, replication of ñ_it and m̃_it in all or some of the intervals, can also improve model precision (Carroll et al., 1995), but the assumption of unbiasedness of ñ_it for the true n_it has to be retained. A possible parametric utilization of replication data in our model is to allow for within-interval correlation.
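The measurement-error assumption can be illustrated with a small simulation. The sketch below assumes, as we read the full conditional for δ_it in the Appendix, that the reported count is Poisson(δ_it) with δ_it drawn from a gamma distribution with shape φ and mean λ_it l_it, while the true count is Poisson(λ_it l_it). The values of `lam`, `l` and `phi` are illustrative and are not estimates from either trial. The helper `poisson` is written here because Python's standard library has no Poisson sampler.

```python
import random
import statistics


def poisson(mu: float, rng: random.Random) -> int:
    """Draw from Poisson(mu) by counting exponential(rate=mu) arrivals in [0, 1]."""
    if mu <= 0.0:
        return 0
    count, t = 0, rng.expovariate(mu)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(mu)
    return count


def simulate(lam: float, l: float, phi: float, size: int, seed: int = 1):
    """Simulate true and reported contact counts that share the same rate lam."""
    rng = random.Random(seed)
    mean = lam * l
    true_n = [poisson(mean, rng) for _ in range(size)]
    # gammavariate(shape, scale): shape phi and scale mean/phi give E(delta) = mean.
    reported = [poisson(rng.gammavariate(phi, mean / phi), rng) for _ in range(size)]
    return true_n, reported


true_n, reported = simulate(lam=0.1, l=180.0, phi=2.0, size=20000)
print(statistics.mean(true_n), statistics.variance(true_n))
print(statistics.mean(reported), statistics.variance(reported))
```

Under these assumptions both counts have mean λ_it l_it, which is the unbiasedness condition, while the reported count has the larger variance λ_it l_it + (λ_it l_it)²/φ.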

Other than log-normal and gamma distributions, a more flexible option for modeling the contact rate may be mixture prior densities (Richardson et al., 2002). It is likely that the true number of contacts also comes from an over-dispersed Poisson process, but whether such a model is identifiable needs further investigation. When the number of contacts is given as K categories and K is small, e.g., in trial VAX003, the Poisson and over-dispersed Poisson structure may not be realistic. In that case, a more flexible probability structure is to assume that n_it and ñ_it independently follow a discrete distribution indexed by p_it = (p_it₁, … , p_itK)^τ, where p_itk is the probability of falling in the k^th category for interval (i, t), and p_it ∼ Dirichlet (α) for some random or known vector α.

The model is sensitive to the contact-related information when such information is limited. For instance, when the value assigned to the “None” category of the reported proportion of needle-sharing was increased from 0.5% to 5% or higher, we were unable to obtain convergence, likely due to the lack of curvature supporting the estimation of a beta density. We emphasize for future studies that, in terms of contact frequency, numbers are more informative than categories, and more categories are preferred to fewer. Another factor to which the analyses are sensitive is the prevalence of infections among partners. While it is impossible to obtain the infection status of all partners, a validation set of partners randomly selected for verification of infection would help improve the inference. To alleviate under- or over-reporting of contact frequency, it is also important to ensure that study participants understand the definition of a contact, especially when the study involves multiple contact types. Extremely high frequencies, e.g., the more than one thousand sexual contacts per six-month interval reported by several participants in VAX004, may indicate misunderstanding of the definition, and should be verified with the participants during the follow-up visits. The underlying mechanism of measurement error in contact-related factors in real studies may never be known, and the best way to improve the VE estimation is to reduce the error at the data collection step.

Acknowledgments

This work was supported by the National Institute of Allergy and Infectious Diseases grant R01-AI32042.

Appendix: MCMC methods and related sampling issues

MCMC Sampling schemes

We use f_dist(·|·) to denote the PDF for continuous variables or the PMF for discrete variables, and F_dist(·|·) to denote the CDF of a random variable given parameters. The subscript “dist” could be “Bin” for binomial, “Pois” for Poisson, “Beta” for beta, “G” for gamma, “IG” for inverse gamma, “N” for normal, and “LN” for log-normal distributions. Whether the distribution is univariate or multivariate is determined by the parameter input.

Sampling n_it

Define q_i₁ = 1 − p(v_i, 1) as the probability of escaping infection from a contact protected by condom use, and similarly define q_i₀ = 1 − p(v_i, 0) for an un-protected contact. The conditional probability of n_it is given by

\Pr (n_{i t} = n | \cdot) = \begin{cases} \frac{{(λ_{i t} l_{i t} (1 - ξ_{i t}) q_{i 0})}^{n - m_{i t}} \exp \{- λ_{i t} l_{i t} (1 - ξ_{i t}) q_{i 0}\}}{(n - m_{i t})!}, & y_{i t} = 0 \\ \frac{{(λ_{i t} l_{i t} (1 - ξ_{i t}))}^{n - m_{i t}} \exp \{- λ_{i t} l_{i t} (1 - ξ_{i t})\}}{(n - m_{i t})!} \times [1 - q_{i 1}^{m_{i t}} q_{i 0}^{n - m_{i t}}] / C_{i t}, & y_{i t} = 1 \end{cases}

where $C_{i t} = 1 - q_{i 1}^{m_{i t}} \exp \{λ_{i t} l_{i t} (1 - ξ_{i t}) (q_{i 0} - 1)\}$ . When y_it = 0, we sample n_it − m_it directly from Poisson(λ_it l_it (1 − ξ_it) q_i₀). When y_it = 1, note that the conditional CDF of n_it is

\begin{array}{l} \Pr (n_{i t} \leq n | \cdot, y_{i t} = 1) \\ = \frac{\exp \{λ_{i t} l_{i t} (1 - ξ_{i t})\} F_{Pois} (n - m_{i t} | λ_{i t} l_{i t} (1 - ξ_{i t})) - q_{i 1}^{m_{i t}} \exp \{λ_{i t} l_{i t} (1 - ξ_{i t}) q_{i 0}\} F_{Pois} (n - m_{i t} | λ_{i t} l_{i t} (1 - ξ_{i t}) q_{i 0})}{\exp \{λ_{i t} l_{i t} (1 - ξ_{i t})\} - q_{i 1}^{m_{i t}} \exp \{λ_{i t} l_{i t} (1 - ξ_{i t}) q_{i 0}\}} . \end{array}

(11)

As the CDF is a non-decreasing function, we use direct sampling in combination with binary search. For example, to sample n_it, we generate a value z from Uniform(0, 1); then, the smallest n satisfying Pr(n_it ≤ n|·, y_it = 1) ≥ z is the sampled value of n_it and can be found using binary search or other advanced search methods.
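A minimal sketch of this inverse-CDF step is given below. It is an illustration and not the authors' implementation. The Poisson CDFs are evaluated at k = n − m_it, the number of unprotected contacts, since that is the quantity with a Poisson kernel in the conditional PMF of n_it. The CDF in (11) is divided through by exp{λ_it l_it (1 − ξ_it)} to avoid overflow. The argument `mu` stands for λ_it l_it (1 − ξ_it); the function names, the doubling bracket and the cap `max_extra` are choices made for this example.

```python
import math
import random


def poisson_cdf(k: int, mu: float) -> float:
    """F_Pois(k | mu) by summing PMF terms; adequate for moderate mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for j in range(1, k + 1):
        term *= mu / j
        total += term
        if j > mu and term < 1e-17 * total:  # remaining terms are negligible
            break
    return min(total, 1.0)


def cdf_n_given_infected(n, m, mu, q1, q0):
    """Pr(n_it <= n | y_it = 1), i.e. (11) divided through by exp(mu)."""
    k = n - m  # number of unprotected contacts
    w = q1 ** m * math.exp(mu * (q0 - 1.0))  # 1 - w is C_it; requires w < 1
    return (poisson_cdf(k, mu) - w * poisson_cdf(k, mu * q0)) / (1.0 - w)


def sample_n_given_infected(m, mu, q1, q0, rng, max_extra=1 << 20):
    """Smallest n >= m whose CDF value is at least a Uniform(0, 1) draw z."""
    z = rng.random()
    lo, hi = m, m + 1
    # Double the bracket until the CDF at hi reaches z (or the cap is hit).
    while hi - m < max_extra and cdf_n_given_infected(hi, m, mu, q1, q0) < z:
        lo, hi = hi + 1, m + 2 * (hi - m)
    # Binary search for the smallest n in [lo, hi] with CDF >= z.
    while lo < hi:
        mid = (lo + hi) // 2
        if cdf_n_given_infected(mid, m, mu, q1, q0) >= z:
            hi = mid
        else:
            lo = mid + 1
    return lo


rng = random.Random(0)
print(sample_n_given_infected(m=2, mu=6.0, q1=0.999, q0=0.99, rng=rng))
```

Because the CDF is non-decreasing, each draw needs only a logarithmic number of CDF evaluations in the size of the bracket.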

Sampling m_it

The conditional probability of m_it is

\Pr (m_{i t} = m | \cdot) = \begin{cases} \binom{n_{i t}}{m} {\left(\frac{ξ_{i t} q_{i 1}}{ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0}}\right)}^{m} {\left(1 - \frac{ξ_{i t} q_{i 1}}{ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0}}\right)}^{n_{i t} - m}, & y_{i t} = 0 \\ \binom{n_{i t}}{m} ξ_{i t}^{m} {(1 - ξ_{i t})}^{n_{i t} - m} \times [1 - q_{i 1}^{m} q_{i 0}^{n_{i t} - m}] / D_{i t}, & y_{i t} = 1 \end{cases}

where $D_{i t} = 1 - {[ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0}]}^{n_{i t}}$ . When y_it = 0, we sample m_it directly from Binomial $(n_{i t}, \frac{ξ_{i t} q_{i 1}}{ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0}})$ . When y_it = 1, we have

\Pr (m_{i t} \leq m | \cdot, y_{i t} = 1) = \frac{F_{Bin} (m | n_{i t}, ξ_{i t}) - {(ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0})}^{n_{i t}} F_{Bin} (m | n_{i t}, P)}{1 - {(ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0})}^{n_{i t}}},

(12)

where $P = \frac{ξ_{i t} q_{i 1}}{ξ_{i t} q_{i 1} + (1 - ξ_{i t}) q_{i 0}}$ . We use the same technique as for sampling n_it, i.e., direct sampling in combination with binary search.
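As a numerical check, the closed form (12) can be compared with a direct summation of the y_it = 1 branch of the conditional PMF of m_it. The sketch below does this for one illustrative parameter set; the values of `n`, `xi`, `q1` and `q0` are arbitrary and chosen only for the example.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def binom_cdf(m: int, n: int, p: float) -> float:
    return sum(binom_pmf(k, n, p) for k in range(m + 1))


def cdf_m_closed_form(m, n, xi, q1, q0):
    """Right-hand side of (12)."""
    s = xi * q1 + (1.0 - xi) * q0
    big_p = xi * q1 / s
    return (binom_cdf(m, n, xi) - s ** n * binom_cdf(m, n, big_p)) / (1.0 - s ** n)


def cdf_m_direct(m, n, xi, q1, q0):
    """Sum of the y_it = 1 branch of Pr(m_it = k | .) over k = 0, ..., m."""
    d = 1.0 - (xi * q1 + (1.0 - xi) * q0) ** n  # D_it
    terms = (
        binom_pmf(k, n, xi) * (1.0 - q1 ** k * q0 ** (n - k)) for k in range(m + 1)
    )
    return sum(terms) / d


n, xi, q1, q0 = 12, 0.4, 0.998, 0.98
for m in range(n + 1):
    a = cdf_m_closed_form(m, n, xi, q1, q0)
    b = cdf_m_direct(m, n, xi, q1, q0)
    assert math.isclose(a, b, rel_tol=1e-9)
print("closed form (12) agrees with direct summation")
```

The agreement follows from the binomial theorem, since ξ^k q₁^k (1 − ξ)^(n−k) q₀^(n−k) summed with binomial coefficients is a rescaled Binomial(n, P) PMF.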

Sampling λ_it

Define $μ_{i} = μ 1_{T_{i} \times 1}$ and $Σ_{i} = σ^{2} (ρ J_{T_{i} \times T_{i}} + (1 - ρ) I_{T_{i} \times T_{i}})$ . The likelihood part concerning the contact rate vector λ_i is given by

L_{i} (λ_{i} | \cdot) \propto f_{L N} (λ_{i} | μ_{i}, Σ_{i}) (\prod_{t = 1}^{T_{i}} f_{G} (λ_{i t} | n_{i t} + 1, l_{i t}^{- 1})) (\prod_{t = 1}^{T_{i}} f_{I G} (λ_{i t} | φ, \frac{l_{i t}}{φ δ_{i t}})) (\prod_{t = 1}^{T_{i}} λ_{i t}) .

To sample λ_i, we take the following steps:

1. First sample $λ_{i}^{⋆}$ from Log-Normal(μ_i, Σ_i), and accept it with the probability
$\min (1, \frac{\prod_{t = 1}^{T_{i}} {f_{G} (λ_{i t}^{⋆} | n_{i t} + 1, l_{i t}^{- 1}) f_{I G} (λ_{i t}^{⋆} | φ, \frac{l_{i t}}{φ δ_{i t}}) λ_{i t}^{⋆}}}{\prod_{t = 1}^{T_{i}} {f_{G} (λ_{i t} | n_{i t} + 1, l_{i t}^{- 1}) f_{I G} (λ_{i t} | φ, \frac{l_{i t}}{φ δ_{i t}}) λ_{i t}}}) .$
Update λ_i with $λ_{i}^{⋆}$ if the new sample is accepted;

2. Sample a new $λ_{i}^{⋆}$ from $\prod_{t = 1}^{T_{i}} f_{G} (λ_{i t}^{⋆} | n_{i t} + 1, l_{i t}^{- 1})$ , and accept it with the probability
$\min (1, \frac{f_{L N} (λ_{i}^{⋆} | μ_{i}, Σ_{i}) \prod_{t = 1}^{T_{i}} {λ_{i t}^{⋆} f_{I G} (λ_{i t}^{⋆} | φ, \frac{l_{i t}}{φ δ_{i t}})}}{f_{L N} (λ_{i} | μ_{i}, Σ_{i}) \prod_{t = 1}^{T_{i}} {λ_{i t} f_{I G} (λ_{i t} | φ, \frac{l_{i t}}{φ δ_{i t}})}}) .$
Update λ_i with $λ_{i}^{⋆}$ if the new sample is accepted;

3. Sample a new $λ_{i}^{⋆}$ from $\prod_{t = 1}^{T_{i}} f_{I G} (λ_{i t}^{⋆} | φ, \frac{l_{i t}}{φ δ_{i t}})$ , and accept it with the probability
$\min (1, \frac{f_{L N} (λ_{i}^{⋆} | μ_{i}, Σ_{i}) \prod_{t = 1}^{T_{i}} {λ_{i t}^{⋆} f_{G} (λ_{i t}^{⋆} | n_{i t} + 1, l_{i t}^{- 1})}}{f_{L N} (λ_{i} | μ_{i}, Σ_{i}) \prod_{t = 1}^{T_{i}} {λ_{i t} f_{G} (λ_{i t} | n_{i t} + 1, l_{i t}^{- 1})}}) .$
Update λ_i with $λ_{i}^{⋆}$ if the new sample is accepted.

This cross-sampling procedure is a generalization of the Metropolized independence sampling algorithm (Chib and Greenberg, 1995). Liu (1996) showed that Metropolized independence sampling is superior to rejection sampling with respect to asymptotic efficiency and ease of computation, given that the proposal density provides a reasonable coverage over the domain of the posterior density. In this case, we have a composite full likelihood L(x) ∝ f(x)g(x) in which f(x) and g(x) are both ready for sampling. Using f(x) and g(x) alternately as the proposal density can better cover the reasonable range of x as compared to using either f(x) or g(x) alone as the proposal density.
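The cross-sampling idea can be illustrated on a toy target where the answer is known. In the sketch below, which is not the sampler used for λ_i, the target is L(x) ∝ f(x)g(x) with f a Normal(0, 1) density and g a Normal(2, 1) density, so the target is Normal(1, 1/2). Each factor is used in turn as an independence proposal, and the acceptance ratio then involves only the other factor. The function names and the toy densities are assumptions made for this example.

```python
import math
import random
import statistics


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def cross_sample(n_iter: int, seed: int = 0) -> list:
    """Alternate independence proposals from f and g for the target f(x) g(x)."""
    rng = random.Random(seed)
    f = (0.0, 1.0)  # (mean, sd) of factor f
    g = (2.0, 1.0)  # (mean, sd) of factor g
    x = 0.0
    draws = []
    for _ in range(n_iter):
        for proposal, other in ((f, g), (g, f)):
            x_new = rng.gauss(*proposal)
            # The proposal density cancels against the matching factor of the
            # target, leaving only the ratio of the other factor.
            log_ratio = log_normal_pdf(x_new, *other) - log_normal_pdf(x, *other)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x = x_new
        draws.append(x)
    return draws


draws = cross_sample(20000)
print(statistics.mean(draws), statistics.variance(draws))
```

Each sub-step is a valid Metropolis-Hastings update for the same target, so applying them in sequence leaves the target invariant.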

Sampling ε_i and ξ_i

Define $ϒ_{i} = γ J_{T_{i} \times T_{i}} + (1 - γ) I_{T_{i} \times T_{i}}$ . The likelihood part concerning ε_i is given by

L_{i} (ε_{i} | \cdot) \propto f_{N} (ε_{i} | 0, ϒ_{i}) \times \prod_{t = 1}^{T_{i}} ξ_{i t}^{m_{i t} + {\tilde{m}}_{i t}} {(1 - ξ_{i t})}^{n_{i t} - m_{i t} + {\tilde{n}}_{i t} - {\tilde{m}}_{i t}} .

(13)

The above likelihood is expressed in terms of ε_i, and ξ_i exists through ξ_it = Ψ⁻¹(Φ(ε_it) | α, β). To express the likelihood in terms of ξ_i, (13) becomes

L_{i} (ξ_{i} | \cdot) \propto \exp \{- \frac{1}{2} ε_{i}^{τ} ϒ_{i}^{- 1} ε_{i} + \frac{1}{2} ε_{i}^{τ} ε_{i}\} \times \prod_{t = 1}^{T_{i}} f_{Beta} (ξ_{i t} | α + m_{i t} + {\tilde{m}}_{i t}, β + n_{i t} - m_{i t} + {\tilde{n}}_{i t} - {\tilde{m}}_{i t})

(14)

where ε_i exists via $ε_{i t} = Φ^{- 1} (Ψ (ξ_{i t} | α, β))$ , $| \prod_{t = 1}^{T_{i}} {Φ^{- 1 \prime} (Ψ (ξ_{i t})) Ψ^{\prime} (ξ_{i t})} |$ is the Jacobian term, and $Φ^{- 1 \prime} (x) = {[f_{N} (Φ^{- 1} (x) | 0, 1)]}^{- 1}$ .

The sampling of ε_i and ξ_i proceeds as follows:

1. Based on (13), sample $ε_{i}^{⋆}$ from Normal(0, ϒ_i), and accept it with the probability
$\min (1, \prod_{t = 1}^{T_{i}} \frac{{(ξ_{i t}^{⋆})}^{m_{i t} + {\tilde{m}}_{i t}} {(1 - ξ_{i t}^{⋆})}^{n_{i t} - m_{i t} + {\tilde{n}}_{i t} - {\tilde{m}}_{i t}}}{ξ_{i t}^{m_{i t} + {\tilde{m}}_{i t}} {(1 - ξ_{i t})}^{n_{i t} - m_{i t} + {\tilde{n}}_{i t} - {\tilde{m}}_{i t}}}),$
where $ξ_{i t}^{⋆} = Ψ^{- 1} (Φ (ε_{i t}^{⋆}))$ . Update ε_i and ξ_i if the new sample is accepted;

2. Based on (14), sample $ξ_{i}^{⋆}$ from $\prod_{t = 1}^{T_{i}} f_{Beta} (ξ_{i t}^{⋆} | α + m_{i t} + {\tilde{m}}_{i t}, β + (n_{i t} - m_{i t}) + ({\tilde{n}}_{i t} - {\tilde{m}}_{i t}))$ , and accept it with the probability
$\min (1, \frac{\exp \{- \frac{1}{2} ε_{i}^{⋆ τ} ϒ_{i}^{- 1} ε_{i}^{⋆} + \frac{1}{2} ε_{i}^{⋆ τ} ε_{i}^{⋆}\}}{\exp \{- \frac{1}{2} ε_{i}^{τ} ϒ_{i}^{- 1} ε_{i} + \frac{1}{2} ε_{i}^{τ} ε_{i}\}}),$
where $ε_{i t}^{⋆} = Φ^{- 1} (Ψ (ξ_{i t}^{⋆}))$ .
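For readers who wish to check how (14) follows from (13), the change of variables can be written out as below. This is a sketch of the algebra under the stated definitions, with Ψ the Beta(α, β) CDF and Φ the standard normal CDF; it introduces no new assumptions.

```latex
\begin{aligned}
ε_{it} = Φ^{-1}\bigl(Ψ(ξ_{it} \mid α, β)\bigr)
 \;&\Longrightarrow\;
 \frac{dε_{it}}{dξ_{it}}
 = \frac{Ψ'(ξ_{it} \mid α, β)}{f_{N}(ε_{it} \mid 0, 1)}
 = \sqrt{2π}\, \exp\{\tfrac{1}{2} ε_{it}^{2}\}\, f_{Beta}(ξ_{it} \mid α, β), \\
f_{N}(ε_{i} \mid 0, ϒ_{i}) \prod_{t=1}^{T_{i}} \frac{dε_{it}}{dξ_{it}}
 \;&\propto\;
 \exp\{-\tfrac{1}{2} ε_{i}^{τ} ϒ_{i}^{-1} ε_{i} + \tfrac{1}{2} ε_{i}^{τ} ε_{i}\}
 \prod_{t=1}^{T_{i}} f_{Beta}(ξ_{it} \mid α, β), \\
f_{Beta}(ξ_{it} \mid α, β)\, ξ_{it}^{m_{it} + \tilde{m}_{it}} (1 - ξ_{it})^{n_{it} - m_{it} + \tilde{n}_{it} - \tilde{m}_{it}}
 \;&\propto\;
 f_{Beta}(ξ_{it} \mid α + m_{it} + \tilde{m}_{it},\; β + n_{it} - m_{it} + \tilde{n}_{it} - \tilde{m}_{it}).
\end{aligned}
```

Multiplying the second line by the binomial kernel in (13) and applying the third line to each t gives (14). The beta factor is what makes the second proposal in the scheme above easy to draw from.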

Sampling other parameters

Let $\log λ_{i} = {(\log λ_{i 1}, … , \log λ_{i T_{i}})}^{τ}$ , $μ_{i} = μ 1_{T_{i} \times 1}$ , and let $R_{i} = ρ J_{T_{i} \times T_{i}} + (1 - ρ) I_{T_{i} \times T_{i}}$ such that $Σ_{i} = σ^{2} R_{i}$ .

The following parameters are sampled directly from their full conditional distributions:

\begin{array}{l} δ_{i t} | \cdot \sim Gamma ({\tilde{n}}_{i t} + φ, {(1 + \frac{φ}{λ_{i t} l_{i t}})}^{- 1}), \\ μ | \cdot \sim Normal (\frac{\sum_{i = 1}^{N} 1^{τ} Σ_{i}^{- 1} \log λ_{i}}{\sum_{i = 1}^{N} 1^{τ} Σ_{i}^{- 1} 1}, {(\sum_{i = 1}^{N} 1^{τ} Σ_{i}^{- 1} 1)}^{- 1}), \text{ and} \\ σ^{2} | \cdot \sim Inverse Gamma (\frac{1}{2} \sum_{i = 1}^{N} T_{i}, {[\frac{1}{2} \sum_{i = 1}^{N} {(\log λ_{i} - μ_{i})}^{τ} R_{i}^{- 1} (\log λ_{i} - μ_{i})]}^{- 1}) . \end{array}
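As an example of how these full conditionals arise, the one for δ_it can be derived as follows. The sketch assumes that ñ_it given δ_it is Poisson(δ_it) and that δ_it given λ_it is gamma with shape φ and scale λ_it l_it/φ; this is our reading of the measurement error submodel as implied by the inverse-gamma factor in L_i(λ_i | ·), and should be checked against Section 3.1.2.

```latex
\begin{aligned}
f(δ_{it} \mid \cdot)
 &\propto f_{Pois}(\tilde{n}_{it} \mid δ_{it})\;
          f_{G}\!\left(δ_{it} \,\Big|\, φ, \frac{λ_{it} l_{it}}{φ}\right) \\
 &\propto δ_{it}^{\tilde{n}_{it}} \exp\{-δ_{it}\}
          \times δ_{it}^{φ - 1} \exp\!\left\{-\frac{φ\, δ_{it}}{λ_{it} l_{it}}\right\} \\
 &= δ_{it}^{\tilde{n}_{it} + φ - 1}
    \exp\!\left\{-δ_{it}\left(1 + \frac{φ}{λ_{it} l_{it}}\right)\right\},
\end{aligned}
```

which is the kernel of a gamma density with shape ñ_it + φ and scale (1 + φ/(λ_it l_it))⁻¹, matching the first line of the display above.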

A random-walk style Metropolis-Hastings algorithm is used to sample ρ, φ, α, β, γ, p₀ and θ, i.e., a new value is sampled from a normal density with the current value as its mean. The variance of each proposal normal density is dynamically adapted to reach an acceptance rate of 0.3-0.4. To apply this sampling scheme, an appropriate transformation may be necessary so that the domain of the transformed parameter is (−∞, ∞), e.g., a logit transformation for the transmission probability.
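A minimal sketch of such a sampler for a single probability parameter is shown below. It is not the authors' code: the target is a stand-in Beta(a, b) density for p, the values a = 8 and b = 992 are hypothetical, and the adaptation rule (rescale the proposal standard deviation by a factor of 1.2 after each batch of 100 iterations) is one simple choice among many. Sampling is done on η = logit(p), so the log target includes the Jacobian term log p + log(1 − p).

```python
import math
import random


def log_target(eta: float, a: float, b: float) -> float:
    """Log density (up to a constant) of eta = logit(p) when p ~ Beta(a, b)."""
    log_p = -math.log1p(math.exp(-eta))
    log_1mp = -math.log1p(math.exp(eta))
    # (a - 1) log p + (b - 1) log(1 - p) plus the Jacobian log p + log(1 - p).
    return a * log_p + b * log_1mp


def rw_metropolis(a, b, n_adapt=5000, n_keep=5000, batch=100, seed=0):
    """Random-walk Metropolis on the logit scale with a tuned proposal sd."""
    rng = random.Random(seed)
    eta, sd = 0.0, 1.0
    accepted, draws = 0, []
    for it in range(1, n_adapt + n_keep + 1):
        proposal = rng.gauss(eta, sd)
        log_ratio = log_target(proposal, a, b) - log_target(eta, a, b)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            eta = proposal
            accepted += 1
        if it % batch == 0:
            rate = accepted / batch
            accepted = 0
            # Adapt only before sampling starts; the sd is fixed afterwards.
            if it <= n_adapt:
                if rate > 0.4:
                    sd *= 1.2
                elif rate < 0.3:
                    sd /= 1.2
        if it > n_adapt:
            draws.append(1.0 / (1.0 + math.exp(-eta)))  # back-transform to p
    return draws, sd


draws, sd = rw_metropolis(a=8.0, b=992.0)
print(sum(draws) / len(draws), sd)
```

Fixing the proposal variance before the retained draws are collected keeps the retained part of the chain a standard Metropolis-Hastings sampler.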

Diagnostics for convergence

We run three chains simultaneously and use the scale reduction factor to monitor the convergence of the chains. The scale reduction factor is defined as

\sqrt{\hat{R}} = \sqrt{\frac{M - 1}{M} + \frac{1}{M} \frac{B}{W}} ,

where M is the number of runs, and B and W are the between-sequence and within-sequence variances, respectively. Gelman and Rubin (1992) showed that the factor $\sqrt{\hat{R}}$ will approach 1 as M → ∞, and recommended that convergence be considered reached if $\sqrt{\hat{R}}$ < 1.2 for all parameters. We let the chains run through a burn-in period of 30000 iterations and calculate $\sqrt{\hat{R}}$ every 5000 iterations afterwards. The criterion $\sqrt{\hat{R}}$ < 1.2 is adopted as the stopping rule.
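The factor can be computed from parallel chains as sketched below. The sketch takes M to be the number of retained iterations per chain and B to be M times the sample variance of the chain means, following the usual Gelman and Rubin construction; both readings are assumptions on our part, and the degrees-of-freedom correction of the original paper is omitted. The function name `sqrt_r_hat` is introduced here for illustration.

```python
import math
import random
import statistics


def sqrt_r_hat(chains) -> float:
    """Scale reduction factor for equal-length chains of one scalar parameter."""
    m_len = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    b = m_len * statistics.variance(chain_means)  # between-sequence variance
    w = statistics.fmean([statistics.variance(c) for c in chains])  # within-sequence
    return math.sqrt((m_len - 1) / m_len + b / (m_len * w))


rng = random.Random(3)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 2.0, 4.0)]
print(sqrt_r_hat(mixed), sqrt_r_hat(stuck))
```

Chains that sample the same distribution give a value close to 1, whereas chains centred at different values give a value well above the 1.2 threshold.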

The results of analyzing the two AIDSVAX trials are based on the last 5000 iterations of three parallel chains. A burn-in period of 5000 runs is enforced after the variances of proposal normal densities are fixed. To reduce the correlation within each successive chain, we loop over the last 5000 runs of the three parallel chains, and at each loop we randomly pick one chain to read in the samples.

References

Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. London: Chapman and Hall; 1995.
Carroll RJ, Kuchenhoff H, Lombard F, Stefanski LA. Asymptotics for the SIMEX estimator in nonlinear measurement error models. Journal of the American Statistical Association. 1995;91:242–250.
Chib SC, Greenberg E. Bayes inference in regression models with ARMA(p,q) errors. Journal of Econometrics. 1994;64:183–206.
Chib SC, Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician. 1995;49:327–335.
Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association. 1994;89:1314–1328.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39:1–38.
Fan J, Truong YK. Nonparametric regression with errors in variables. The Annals of Statistics. 1993;21:1900–1925.
Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science. 1992;7:457–511.
Golm GT, Halloran ME, Longini IM. Semiparametric methods for multiple exposure mismeasurement and a bivariate outcome in HIV vaccine trials. Biometrics. 1999;55:94–101. doi: 10.1111/j.0006-341x.1999.00094.x.
Gurwith M, rgp120 HIV Vaccine Study Group. Placebo-controlled phase 3 trial of a recombinant glycoprotein 120 vaccine to prevent HIV-1 infection. Journal of Infectious Diseases. 2005;191:654–665. doi: 10.1086/428404.
Halloran ME, Struchiner CJ, Longini IM. Study designs for evaluating different efficacy and effectiveness aspects of vaccines. American Journal of Epidemiology. 1997;146:789–803. doi: 10.1093/oxfordjournals.aje.a009196.
Hudgens MG, Longini IM, Vanichseni S, Hu DJ, Kitayaporn D, Mock PA, Halloran ME, Satten G, Choopanya K, Mastro TD. Subtype-specific transmission probabilities for human immunodeficiency virus type 1 among injecting drug users in Bangkok, Thailand. American Journal of Epidemiology. 2002;155:159–168. doi: 10.1093/aje/155.2.159.
Kitayaporn D, Vanichseni S, Mastro TD, Raktham S, Vaniyapongs T, Des Jarlais DC, Wasi C, Young N, Sujarita S, Heyward WL, Esparza J. Infection with HIV-1 subtypes B and E in injecting drug users screened for enrollment into a prospective cohort in Bangkok, Thailand. Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology. 1998;19:289–295. doi: 10.1097/00042560-199811010-00012.
Liu JS. Metropolized independence sampling with comparison to rejection sampling and importance sampling. Statistics and Computing. 1996;6:113–119.
Longini IM, Hudgens MG, Halloran ME, Sagatelian K. A Markov model for measuring vaccine efficacy for both susceptibility to infection and reduction in infectiousness for prophylactic HIV-1 vaccines. Statistics in Medicine. 1999;18:53–68. doi: 10.1002/(sici)1097-0258(19990115)18:1<53::aid-sim996>3.0.co;2-0.
Pitisuttithum P, Gilbert P, Gurwith M, Heyward W, Martin M, van Griensven F, Hu DJ, Tappero JW, Choopanya K. Randomized, double-blinded, placebo-controlled efficacy trial of a bivalent recombinant glycoprotein 120 HIV-1 vaccine among injection drug users in Bangkok, Thailand. Journal of Infectious Diseases. 2006;194:1661–1671. doi: 10.1086/508748.
Richardson S, Gilks WR. A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology. 1993;138:430–442. doi: 10.1093/oxfordjournals.aje.a116875.
Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B. 1997;59:731–792.
Richardson S, Leblond L, Jaussent I, Green PJ. Mixture models in measurement error problems, with reference to epidemiological studies. Journal of the Royal Statistical Society, Series A. 2002;165:549–566.

[R1] Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. London: Chapman and Hall; 1995. [Google Scholar]

[R2] Carroll RJ, Kuchenhoff H, Lombard F, Stefanski LA. Asymptotics for the SIMEX estimator in nonlinear measurement error models. Journal of the American Statistical Association. 1995;91:242–250. [Google Scholar]

[R3] Chib SC, Greenberg E. Bayes inference in regression models with ARMA(p,q) errors. Journal of Economics. 1994;64:183–206. [Google Scholar]

[R4] Chib SC, Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician. 1995;49:327–335. [Google Scholar]

[R5] Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association. 1994;89:1314–1328. [Google Scholar]

[R6] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. (B).Journal of the Royal Statistical Society. 1977;39:1–38. [Google Scholar]

[R7] Fan J, Truong YK. Nonparametric regression with errors in variables. The Annals of Statistics. 1993;21:1900–1925. [Google Scholar]

[R8] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences (with discussion) Statistical Science. 1992;7:457–511. [Google Scholar]

[R9] Golm GT, Halloran ME, Longini ML. Semiparametric methods for multiple exposure mismeasurement and a bivariate outcome in HIV vaccine trials. Biometrics. 1999;55:94–101. doi: 10.1111/j.0006-341x.1999.00094.x. [DOI] [PubMed] [Google Scholar]

[R10] Gurwith M, rgp120 HIV Vaccine Study Group Placebo-Controlled phase 3 trial of a recombinant glycoprotein 120 vaccine to prevent HIV-1 infection. Journal of Infectious Diseases. 2005;191:654–665. doi: 10.1086/428404. [DOI] [PubMed] [Google Scholar]

[R11] Halloran ME, Struchiner CJ, Longini ML. Study designs for evaluating different efficacy and effectiveness aspects of vaccines. American Journal of Epidemiology. 1997;146:789–803. doi: 10.1093/oxfordjournals.aje.a009196. [DOI] [PubMed] [Google Scholar]

[R12] Hudgens MG, Longini ML, Vanichseni S, Hu DJ, Kitayaporn D, Mock PA, Halloran ME, Satten G, Choopanya K, Mastro TD. Subtype-specific transmission probabilities for human immunodeficiency virus type 1 among injecting drug users in Bangkok, Thailand. American Journal of Epidemiology. 2002;155:159–168. doi: 10.1093/aje/155.2.159. [DOI] [PubMed] [Google Scholar]

[R13] Kitayaporn D, Vanichseni S, Mastro TD, Raktham S, Vaniyapongs T, Des Jarlais DC, Wasi C, Young N, Sujarita S, Heyward WL, Esparza J. Infection with HIV-1 subtypes B and E in injecting drug users screened for enrollment into a prospective cohort in Bangkok, Thailand. Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology. 1998;19:289–295. doi: 10.1097/00042560-199811010-00012. [DOI] [PubMed] [Google Scholar]

[R14] Liu JS. Metropolized independence sampling with comparison to rejection sampling and importance sampling. Statistics and Computing. 1996;6:113–119.

[R15] Longini IM, Hudgens MG, Halloran ME, Sagatelian K. A Markov model for measuring vaccine efficacy for both susceptibility to infection and reduction in infectiousness for prophylactic HIV-1 vaccines. Statistics in Medicine. 1999;18:53–68. doi: 10.1002/(sici)1097-0258(19990115)18:1<53::aid-sim996>3.0.co;2-0.

[R16] Pitisuttithum P, Gilbert P, Gurwith M, Heyward W, Martin M, van Griensven F, Hu DJ, Tappero JW, Choopanya K. Randomized, double-blinded, placebo-controlled efficacy trial of a bivalent recombinant glycoprotein 120 HIV-1 vaccine among injection drug users in Bangkok, Thailand. Journal of Infectious Diseases. 2006;194:1661–1671. doi: 10.1086/508748.

[R17] Richardson S, Gilks WR. A Bayesian approach to measurement error problems in epidemiology using conditional independence models. American Journal of Epidemiology. 1993;138:430–442. doi: 10.1093/oxfordjournals.aje.a116875.

[R18] Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B. 1997;59:731–792.

[R19] Richardson S, Leblond L, Jaussent I, Green PJ. Mixture models in measurement error problems, with reference to epidemiological studies. Journal of the Royal Statistical Society, Series A. 2002;165:549–566.
