Abstract
This article develops hypothesis testing procedures for the stratified mark-specific proportional hazards model in the presence of missing marks. The motivating application is preventive HIV vaccine efficacy trials, where the mark is the genetic distance of an infecting HIV sequence to an HIV sequence represented inside the vaccine. The test statistics are constructed based on two-stage efficient estimators, which utilize auxiliary predictors of the missing marks. The asymptotic properties and finite-sample performances of the testing procedures are investigated, demonstrating double-robustness and effectiveness of the predictive auxiliaries to recover efficiency. The methods are applied to the RV144 vaccine trial.
Keywords: Auxiliary marks, competing risks failure time data, proportional hazards model, genetic data, augmented inverse probability weighting, semiparametric model
1 Introduction
The primary objective of a preventive HIV vaccine efficacy trial is to assess vaccine efficacy (VE) to prevent HIV infection, where VE is typically defined as one minus the hazard ratio (vaccine/placebo) of HIV infection diagnosis. However, the great genetic variability of HIV poses a central challenge to developing a highly efficacious vaccine (Fauci et al., 2008). The trial population is exposed to many HIV genotypes but the vaccine contains only a few, and the vaccine is less likely to protect against HIVs with greater genetic distance from the sequences inside the vaccine (Gilbert et al., 1999). The trial objectives are to assess whether and how the vaccine affects the rate of infection with any HIV genotype, and whether and how the vaccine effect varies by HIV genotype; assessment of these objectives has been named ‘sieve analysis’ (Gilbert et al., 1998). Gilbert et al. (2008), Sun et al. (2009), and Sun and Gilbert (2012) developed sieve analysis methods using the competing risks failure time framework (Prentice et al., 1978), which attach to each HIV-infected subject a continuous ‘mark’ variable that measures the genetic distance of the infecting HIV sequence to a sequence inside the vaccine. The goal of these methods is to evaluate mark-specific vaccine efficacy, here defined as one minus the mark-specific hazard ratio (vaccine/placebo) of infection. Beyond HIV, the methods apply generally to any preventive vaccine efficacy trial in which the pathogen targeted by the vaccine is genetically diverse, including influenza, malaria, tuberculosis, dengue, Streptococcus pneumoniae, human papillomavirus, and hepatitis C virus.
Gilbert et al. (2008) and Sun et al. (2009) assumed no missing mark data in infected subjects, whereas Sun and Gilbert (2012) allowed marks to be missing at random (MAR). In practice marks are often missing. For example, in the Vax004 trial 32 of 368 infected subjects had no HIV sequence data (Gilbert et al., 2008), owing to drop-out or to the inability of the sequencing technology to measure the infecting HIV sequence, and in the ‘Step’ trial 22 of 88 infected subjects had no HIV sequence data (Rolland et al., 2011). While it is of scientific interest to evaluate a mark defined from the earliest available HIV sequence, a mark of particular interest is one defined from an HIV sequence measured near the time of acquisition, and this mark is missing in a much larger fraction of infected subjects because HIV infection is diagnosed by periodic (typically 6-monthly) tests. Specifically, HIV sequences are measured from the earliest available post-infection blood sample, and a ‘near acquisition’ or ‘early’ sample may be defined as one documented to be sufficiently near acquisition. In the Step trial, only 23 of the 66 infected subjects with sequence data had an early mark measured, defined as sampling within 3 weeks. Sun and Gilbert (2012) provide details on the HIV testing algorithm used to define an early mark.
Sun and Gilbert (2012) is currently the only paper on sieve analysis that accommodates missing continuous marks. It develops two valid estimation approaches based on the stratified mark-specific proportional hazards model. The first uses inverse probability weighting (IPW) of the complete-case estimator, which leverages auxiliary predictors of whether the mark is observed, whereas the second, adapting Robins et al. (1994), augments the IPW complete-case estimator with auxiliary predictors of the missing marks. Sun and Gilbert (2012) restricted attention to estimation; this article is a sequel that develops the corresponding inferential (hypothesis testing) methods based on the augmented IPW estimator. A new component of this work is that it is centered on the sieve analysis of the RV144 Thai trial, which recently delivered the landmark result that a prime-boost HIV vaccine appeared to provide partial protection against HIV infection (estimated VE = 31%, 95% CI 1% to 51%; Rerks-Ngarm et al., 2009). This result has stimulated intense interest in the sieve analysis, for two reasons. First, there is controversy about whether the vaccine truly confers partial protection or the result is a false positive (Gilbert et al., 2011), and the sieve analysis of HIV sequences can help resolve this question. In particular, if evidence is found that vaccine efficacy declines with genetic distance, and the distance is defined on known parts of HIV that contain putatively protective antibody epitopes, then an interpretation of real vaccine efficacy is supported. Second, the HIV vaccine field is grappling with how to modify the tested vaccine to increase its efficacy for the next efficacy trial, and understanding the relationship between vaccine efficacy and genetic distance provides direct guidance on which HIV sequences to include in next-generation vaccines.
This article is organized as follows. Notation, assumptions, and the stratified mark-specific proportional hazards model are introduced in Section 2. Background on the estimation procedures needed for the testing procedures is given in Section 3. The testing procedures are developed, and their asymptotic properties described, in Section 4. The finite-sample performance of the tests is evaluated via simulations in Section 5. The application to the Thai trial is given in Section 6, and the asymptotic results and their proofs are placed in the Appendix.
2 Model and missing mark data
2.1 Stratified mark-specific proportional hazards (PH) model
Let T be the failure time, V a continuous mark variable with bounded support [0, 1], and Z(t) a possibly time-dependent p-dimensional covariate. The mark V is only observable when T is observed. Suppose that the conditional mark-specific hazard function at time t given the covariate history Z(s), for s ≤ t, only depends on the current value Z(t). We consider the stratified mark-specific proportional hazards (PH) model
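$$\lambda_k\{t, v \mid z(t)\} = \lambda_{0k}(t, v)\exp\{\beta(v)^{\mathsf T} z(t)\}, \qquad k = 1, \ldots, K, \qquad (1)$$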
where λk(t, v|z(t)) is the conditional mark-specific hazard function given covariate z(t) for an individual in the kth stratum, λ0k(t, v) = λk(t, v|z(t) = 0) is the unspecified baseline hazard function for the kth stratum, β(v) is the p-dimensional unknown regression coefficient function of v, and K is the number of strata. Model (1) allows different baseline functions for different strata and flexibly allows arbitrary mark-specific infection hazards over time in the placebo group. In practice, key subgroups (e.g., men and women in the Thai trial) can be placed in separate strata and thereby assigned different baseline mark-specific hazards of HIV infection.
Writing Z(t) = (Z1(t), Z2(t)T)T, where Z1(t) is the vaccination indicator (1 = vaccine, 0 = placebo) and Z2(t) contains the other covariates, and correspondingly β(v) = (β1(v), β2(v)T)T, so that β1(v) is the coefficient for vaccination status and β2(v) for the other covariates, the covariate- and stratum-adjusted mark-specific vaccine efficacy is VE(v) = 1 − exp{β1(v)}. Sun et al. (2009) developed statistical procedures for model (1) with K = 1 based on observations of the random variables (X, Z(·), V) for δ = 1 and (X, Z(·)) for δ = 0, where X = min{T, C}, δ = I(T ≤ C), and C is a censoring random variable. Sun and Gilbert (2012) developed estimation procedures for model (1) with general K allowing V to be missing for some subjects with δ = 1; these methods incorporate auxiliary covariates and/or auxiliary mark variables that inform about the probability that V is observed and about the distribution of V. This article develops parallel hypothesis testing procedures for assessing VE(v). As summarized in the Introduction, the two objectives are to assess whether vaccine efficacy ever deviates from 0 [i.e., test VE(v) = 0] and whether vaccine efficacy changes with the mark [i.e., test whether VE(v) is constant in v].
2.2 Missing data assumptions
Let R be the indicator of whether all possible data are observed for a subject; R = 1 if either δ = 0 (right-censored) or if δ = 1 and V is observed; and R = 0 otherwise. Auxiliary variables A may be helpful for predicting missing marks. Since the mark can only be missing for failures, supplemental information is potentially useful only for failures, for predicting missingness and for informing about the distribution of missing marks. For example, if V is defined based on the early virus, then V*, the auxiliary mark information, may include sequences of later sampled viruses, and can be considered a subset of A. In general, A could include multiple viral sequences per infected subject at multiple time-points, giving information on intra-subject HIV evolution. The relationship between A and V can be modelled to help predict V (see Section 5 for a simulated example).
We assume C is conditionally independent of (T, V ) given Z(·) and the stratum. We also assume V is MAR (Rubin, 1976); that is, given δ = 1 and W = (T,Z(T), A), the probability V is missing depends only on the observed W, not on the value of V; this assumption is expressed as
$$P(R = 1 \mid \delta = 1, V, W) = P(R = 1 \mid \delta = 1, W). \qquad (2)$$
Let πk(Q) = P(R = 1|Q) where Q = (δ, W). Then πk(Q) = δrk(W) + (1 − δ). The MAR assumption (2) also implies that V is independent of R given Q:
$$P(V \le v \mid R = 1, \delta = 1, W) = P(V \le v \mid \delta = 1, W). \qquad (3)$$
Define rk(w) = P(R = 1|δ = 1, W = w) and ρk(w, v) = P(V ≤ v|δ = 1, W = w). The stratum-specific definitions of rk(w) and ρk(w, v) allow the models for the probability of a complete case and for the mark distribution to differ across strata.
Let τ be the end of the follow-up period, and nk be the number of subjects in the kth stratum; the total sample size is n = n1 + ⋯ + nK. Let {Xki, Zki(·), δki, Rki, Vki, Aki; i = 1,…, nk} be iid replicates of {X, Z(·), δ, R, V, A} from the kth stratum. The observed data are {Oki; i = 1,…, nk, k = 1,…, K}, where Oki = {Xki, Zki(·), Rki, Rki Vki, Aki} for δki = 1 and Oki = {Xki, Zki(·), Rki = 1} for δki = 0. We assume the Oki are independent across all subjects.
2.3 Hypotheses to test
We develop procedures for testing the following two sets of hypotheses. Let [a, b] ⊂ (0, 1). The first set of hypotheses is
H10 : VE(v) = 0 for v ∈ [a, b]
versus H1a : VE(v) ≠ 0 for some v (general alternative)
or H1m : VE(v) ≥ 0 with strict inequality for some v (monotone alternative).
The second set of hypotheses is
H20 : VE(v) does not depend on v ∈ [a, b]
versus H2a : VE(v) depends on v (general alternative)
or H2m : VE(v) decreases as v increases (monotone alternative).
The null hypothesis H10 implies the vaccine affords no protection (nor increased risk) against any HIV genotype. The ordered alternative H1m indicates that the vaccine provides protection for at least some of the HIV genotypes, while H1a indicates that the vaccine provides protection and/or increased risk for some HIV genotypes. The null hypothesis H20 implies there is no difference in vaccine protection against different HIV genotypes. The ordered alternative H2m indicates that vaccine efficacy decreases with v and H2a indicates that the vaccine efficacy changes with v. With β1 (v) the first component of β(v), the first set of hypotheses is equivalent to H10 : β1 (v) = 0 for v ∈ [a, b] versus H1a : β1 (v) ≠ 0 for some v or H1m : β1 (v) ≤ 0 with strict inequality for some v. The second set of hypotheses is equivalent to H20 : β1 (v) does not depend on v ∈ [a, b] versus H2a : β1 (v) depends on v or H2m : β1 (v) increases as v increases. We develop testing procedures for detecting departures from H10 in the direction of H1a and H1m and for detecting departures from H20 in the direction of H2a and H2m. The procedures are developed based on the augmented IPW complete-case estimator developed by Sun and Gilbert (2012).
3 Estimation procedure with missing marks
The augmented IPW estimator for model (1) is obtained in two stages. First the IPW complete-case estimator is derived and second the augmented IPW estimator is obtained, which improves efficiency by accounting for information in the conditional distribution of V given the auxiliaries.
Let rk(Wki, ψk) be a parametric model for the probability of a complete case, rk(Wki), defined in Section 2.2, where Wki = (Tki, Zki(Tki), Aki) and ψk is a q-dimensional parameter. For example, one can assume the logistic model rk(Wki, ψk) = exp(ψkT W̃ki)/{1 + exp(ψkT W̃ki)} with W̃ki = (1, WkiT)T for those with δki = 1. Under the MAR assumption (2), the maximum likelihood estimator ψ̂ = (ψ̂1,…, ψ̂K)T of ψ = (ψ1,…, ψK)T is obtained by maximizing the observed data likelihood
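$$\prod_{k=1}^{K}\prod_{i:\,\delta_{ki}=1} r_k(W_{ki}, \psi_k)^{R_{ki}}\{1 - r_k(W_{ki}, \psi_k)\}^{1 - R_{ki}}. \qquad (4)$$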
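As a rough illustration of maximizing (4) under the logistic example, the sketch below fits r(w) = expit(ψ0 + ψ1 w) by Newton-Raphson for a single stratum. It assumes a single scalar predictor w in place of Wki, and the function name `fit_logistic` is hypothetical. Only failures (δ = 1) enter, because censored subjects have πk = 1 and contribute nothing to (4).

```python
import math


def fit_logistic(w, r, iters=50, tol=1e-12):
    """Newton-Raphson MLE of (psi0, psi1) in r(w) = expit(psi0 + psi1 * w).

    w: scalar predictor values for the failures (delta = 1) only.
    r: complete-case indicators R (1 = mark observed, 0 = mark missing).
    """
    psi0, psi1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score vector of the log of (4)
        h00 = h01 = h11 = 0.0    # information matrix sum p(1 - p) x x^T
        for wi, ri in zip(w, r):
            p = 1.0 / (1.0 + math.exp(-(psi0 + psi1 * wi)))
            g0 += ri - p
            g1 += (ri - p) * wi
            v = p * (1.0 - p)
            h00 += v
            h01 += v * wi
            h11 += v * wi * wi
        det = h00 * h11 - h01 * h01
        # Newton step: information^{-1} times score (explicit 2 x 2 inverse).
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        psi0 += step0
        psi1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return psi0, psi1
```

With a binary predictor the MLE has a closed form (the intercept is the logit of the observed proportion at w = 0, the slope the difference of logits), which gives a quick check of the iteration.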
Let K(x) be a kernel function with support [−1, 1] and let h = hn be a bandwidth. Let Nki (t, v) = I(Xki ≤ t, δki = 1, Vki ≤ v) and Yki (t) = I(Xki ≥ t). Let Qki = (δki, Wki) and πk (Qki, ψk) = δki rk (Wki, ψk) + (1 − δki). The first-stage estimator is the IPW estimator β̂ipw (v), which solves the following estimating equation for β: Uipw (v, β, ψ̂) = 0, where
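$$U^{\rm ipw}(v, \beta, \psi) = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\int_0^\tau\!\!\int_0^1 K_h(u - v)\,\frac{R_{ki}}{\pi_k(Q_{ki}, \psi_k)}\,\{Z_{ki}(t) - \bar Z_k(t, \beta, \psi_k)\}\,N_{ki}(dt, du), \qquad (5)$$
where Kh(x) = h−1K(x/h), Z̄k(t, β, ψk) = Sk(1)(t, β, ψk)/Sk(0)(t, β, ψk), and
$$S_k^{(j)}(t, \beta, \psi_k) = n^{-1}\sum_{i=1}^{n_k}\frac{R_{ki}}{\pi_k(Q_{ki}, \psi_k)}\,Y_{ki}(t)\exp\{\beta^{\mathsf T}Z_{ki}(t)\}\,Z_{ki}(t)^{\otimes j}$$
for j = 0, 1, with z⊗0 = 1 and z⊗1 = z for any z ∈ ℝp. The score function (5) can be viewed as an extension of the score function used for the cause-specific Cox model (Prentice et al., 1978) for a particular failure cause J = j, for which the counting process counts only events of type j. It borrows strength from observations with marks in a neighborhood of v: the kernel gives greater weight to observations with marks near v than to those further away.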
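The IPW score (5) can be sketched in Python for a single stratum with a scalar, time-fixed covariate (for example, the vaccine indicator). This is a minimal illustration under those assumptions, not the authors' implementation: the Epanechnikov kernel, the tuple layout of `data`, and the names `ipw_score` and `solve_beta` are hypothetical choices. The risk-set sums are weighted by R/π as well, an assumption of this sketch that matches Z̄k depending on ψ̂k for the IPW estimator. For a scalar covariate the score is non-increasing in β, so bisection finds its root.

```python
import math


def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def k_h(x, h):
    """Rescaled kernel K_h(x) = K(x / h) / h."""
    return epanechnikov(x / h) / h


def ipw_score(beta, v, h, data, pi):
    """Kernel-weighted IPW complete-case score U_ipw(v, beta), one stratum.

    data: list of (x, delta, r, mark, z) with scalar time-fixed z;
          mark is None whenever it is unobserved.
    pi:   list of pi_i = delta_i * r_hat_i + (1 - delta_i).
    """
    w = [r / p for (_, _, r, _, _), p in zip(data, pi)]  # R_i / pi_i
    total = 0.0
    for (x_i, d_i, r_i, m_i, z_i), w_i in zip(data, w):
        if d_i != 1 or r_i != 1:
            continue  # only complete-case failures have a jump in N_i(dt, du)
        s0 = s1 = 0.0
        for (x_j, _, _, _, z_j), w_j in zip(data, w):
            if x_j >= x_i:  # subject j is at risk at t = x_i
                e = w_j * math.exp(beta * z_j)
                s0 += e
                s1 += e * z_j
        total += k_h(m_i - v, h) * w_i * (z_i - s1 / s0)
    return total


def solve_beta(v, h, data, pi, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection root of U_ipw(v, .), which is non-increasing in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ipw_score(mid, v, h, data, pi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this toy setting, β̂ipw(v) = `solve_beta(v, h, data, pi)`; repeating the call over a grid of v traces out the estimated coefficient curve.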
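The baseline function λ0k(t, v) can be estimated by λ̂0k(t, v), obtained by smoothing the increments of the following estimator of the doubly cumulative baseline function Λ0k(t, v) = ∫0t ∫0v λ0k(s, u) du ds:
$$\hat\Lambda_{0k}(t, v) = \sum_{i=1}^{n_k}\int_0^t\!\!\int_0^v \frac{R_{ki}/\pi_k(Q_{ki}, \hat\psi_k)}{n\,S_k^{(0)}\{s, \hat\beta^{\rm ipw}(u), \hat\psi_k\}}\,N_{ki}(ds, du). \qquad (6)$$
For example, one can use the following kernel smoothing:
$$\hat\lambda_{0k}(t, v) = \int_0^\tau\!\!\int_0^1 K^{(1)}_{h_1}(t - s)\,K^{(2)}_{h_2}(v - u)\,\hat\Lambda_{0k}(ds, du), \qquad (7)$$
where Khj(j)(x) = hj−1K(j)(x/hj) for j = 1, 2, with K(1)(·) and K(2)(·) the kernel functions and h1 and h2 the bandwidths.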
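The smoothing step (7) places a product kernel on each jump of the doubly cumulative estimator. The sketch below assumes Epanechnikov kernels for both K(1) and K(2) and takes the jump locations and sizes of (6) as precomputed input; the name `smooth_baseline` and the input format are hypothetical.

```python
def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def smooth_baseline(t, v, jumps, h1, h2):
    """Kernel-smoothed baseline hazard lambda_0(t, v), as in (7).

    jumps: list of (t_i, v_i, d_i), the jump points (t_i, v_i) and jump
           sizes d_i of the doubly cumulative estimator from (6).
    """
    return sum(
        epanechnikov((t - ti) / h1) / h1      # K^(1)_{h1}(t - t_i)
        * epanechnikov((v - vi) / h2) / h2    # K^(2)_{h2}(v - v_i)
        * d
        for ti, vi, d in jumps
    )
```

Because each jump is spread over a rectangle of half-widths h1 and h2, the estimate is zero at points farther than h1 in time or h2 in mark from every observed jump.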
Following Robins et al. (1994), Sun and Gilbert (2012) proposed a more efficient procedure for estimating model (1) that incorporates the conditional mark distribution ρk(w, v). Let w = (t, z, a), and let gk(a|t, v, z) denote the conditional density of Aki at a given Tki = t, Vki = v, Zki = z and δki = 1. Then
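$$\rho_k(w, v) = \frac{\int_0^v \lambda_{0k}(t, u)\exp\{\beta(u)^{\mathsf T} z\}\,g_k(a \mid t, u, z)\,du}{\int_0^1 \lambda_{0k}(t, u)\exp\{\beta(u)^{\mathsf T} z\}\,g_k(a \mid t, u, z)\,du}. \qquad (8)$$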
If no auxiliary variables are available, or if Aki is conditionally independent of Vki given (Tki, Zki, δki = 1), then gk(a|t, u, z) does not depend on u and cancels from (8), so that
$$\rho_k(w, v) = \frac{\int_0^v \lambda_{0k}(t, u)\exp\{\beta(u)^{\mathsf T} z\}\,du}{\int_0^1 \lambda_{0k}(t, u)\exp\{\beta(u)^{\mathsf T} z\}\,du}.$$
In this case, ρk(w, v) can be estimated by replacing λ0k(t, u) and β(u) with λ̂0k(t, u) and β̂ipw(u). When the auxiliary marks Aki are correlated with Vki conditional on Tki, Zki and δki = 1, the conditional distribution ρk(w, v) involves the function gk(a|t, u, z), for which a parametric or semiparametric model may be developed to describe the dependence between Aki and Vki. Let ĝk(a|t, u, z) be an estimator of gk(a|t, u, z) with a convergence rate of at least (nh)−1/2. Then ρk(w, v) can be estimated by
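$$\hat\rho_k(w, v) = \frac{\int_0^v \hat\lambda_{0k}(t, u)\exp\{\hat\beta^{\rm ipw}(u)^{\mathsf T} z\}\,\hat g_k(a \mid t, u, z)\,du}{\int_0^1 \hat\lambda_{0k}(t, u)\exp\{\hat\beta^{\rm ipw}(u)^{\mathsf T} z\}\,\hat g_k(a \mid t, u, z)\,du}. \qquad (9)$$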
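Formula (9) is a ratio of two one-dimensional integrals over the mark, so it can be evaluated numerically once λ̂0k, β̂ipw and ĝk are available. The sketch below uses the trapezoidal rule for a scalar covariate; the three plug-in estimates are passed as caller-supplied functions, and `rho_hat` is a hypothetical name.

```python
import math


def rho_hat(v, t, z, a, lam0, beta, g, grid=2000):
    """Estimated P(V <= v | delta = 1, T = t, Z = z, A = a), as in (9).

    lam0(t, u):    estimated baseline hazard lambda_0k(t, u).
    beta(u):       estimated coefficient function (scalar covariate z).
    g(a, t, u, z): estimated density of A given (T, V, Z) = (t, u, z).
    """
    def f(u):
        return lam0(t, u) * math.exp(beta(u) * z) * g(a, t, u, z)

    def trapezoid(upper):
        n = max(1, int(grid * upper))
        step = upper / n
        s = 0.5 * (f(0.0) + f(upper))
        s += sum(f(m * step) for m in range(1, n))
        return s * step

    if v <= 0.0:
        return 0.0
    if v >= 1.0:
        return 1.0
    return trapezoid(v) / trapezoid(1.0)
```

For instance, with λ0(t, u) = exp(u), β(u) = −0.6 + 0.6u, a constant ĝ and z = 0, the result should be close to (e0.5 − 1)/(e − 1) ≈ 0.378.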
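Let Nki(t) = I(Xki ≤ t, δki = 1) count failures regardless of the mark. The augmented IPW (AIPW) estimating equation for β is Uaug(v, β, ψ̂, ρ̂(·)) = 0, where
$$U^{\rm aug}(v, \beta, \hat\psi, \hat\rho) = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\Big[\int_0^\tau\!\!\int_0^1 \frac{R_{ki}}{\pi_k(Q_{ki}, \hat\psi_k)}\,K_h(u - v)\{Z_{ki}(t) - \bar Z_k(t, \beta)\}\,N_{ki}(dt, du) + \int_0^\tau\!\!\int_0^1 \Big\{1 - \frac{R_{ki}}{\pi_k(Q_{ki}, \hat\psi_k)}\Big\}K_h(u - v)\{Z_{ki}(t) - \bar Z_k(t, \beta)\}\,\hat\rho_k(W_{ki}, du)\,N_{ki}(dt)\Big], \qquad (10)$$
Z̄k(t, β) = Sk(1)(t, β)/Sk(0)(t, β), and Sk(j)(t, β) = n−1 Σi=1nk Yki(t) exp{βTZki(t)} Zki(t)⊗j for j = 0, 1. The AIPW estimator of β(v) solves the above equation and is denoted by β̂aug(v). The cumulative coefficient function B(v) = ∫0v β(u) du is estimated by B̂aug(v) = ∫0v β̂aug(u) du. Note that Z̄k(t, β) does not involve ψ̂k; this is a difference between the IPW and AIPW estimators.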
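Assuming (10) has the standard AIPW form of Robins et al. (1994), the complete-case and augmentation terms for a failure share the factor {Zki(t) − Z̄k(t, β)} at t = Xki, so each failure's contribution reduces to replacing Kh(Vki − v) by a combined kernel weight. The sketch below computes that weight, assuming ρ̂k(Wki, ·) is supplied through a density on [0, 1]; the names are hypothetical.

```python
def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def k_h(x, h):
    """Rescaled kernel K_h(x) = K(x / h) / h."""
    return epanechnikov(x / h) / h


def aipw_kernel_weight(v, h, r, pi, mark, cond_density, grid=2000):
    """Combined kernel weight of one failure in the AIPW score at mark v.

    Returns (R/pi) K_h(V - v) + (1 - R/pi) E[K_h(U - v)], where U follows the
    estimated conditional mark distribution with density cond_density(u).
    """
    step = 1.0 / grid
    # Midpoint rule for the conditional expectation of K_h(U - v).
    expected = step * sum(
        k_h((m + 0.5) * step - v, h) * cond_density((m + 0.5) * step)
        for m in range(grid)
    )
    observed = k_h(mark - v, h) if r == 1 else 0.0  # mark is None if r == 0
    return (r / pi) * observed + (1.0 - r / pi) * expected
```

Averaging the weights for R = 1 and R = 0 when π = 0.5 recovers Kh(V − v) exactly, which illustrates why the augmentation term has mean zero when the missingness model is correct.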
To implement the estimation procedures in practice, one can use arbitrary auxiliaries for estimating ψ̂k; these may include covariates measured pre-infection and marks measured at multiple post-infection time-points. In contrast, although arbitrary auxiliaries may in principle also be used for the terms ĝk(a|t, u, z) in (9), because of the curse of dimensionality the method is expected to perform best in practice with a univariate auxiliary; incorporating multivariate auxiliaries would require semiparametric or fully parametric models for gk(a|t, u, z).
Sun and Gilbert (2012) proved that the estimators β̂ipw(v) and β̂aug(v) are consistent and that β̂aug(v) is more efficient than β̂ipw(v). In the next section, we develop hypothesis testing procedures for assessing mark-specific vaccine efficacy based on the cumulative AIPW estimator B̂aug(v).
4 Testing of mark-specific vaccine efficacy
The covariate-adjusted vaccine efficacy VE(v) is defined through β1(v), the first component of β(v). Let B1(v) be the first component of the cumulative coefficient function B(v) = ∫0v β(u) du. The hypothesis tests concerning VE(v) are constructed from the first component of the AIPW estimator B̂aug(v). The cumulative estimator B̂aug(v) has more stable large-sample behavior than β̂aug(v): it converges at the n1/2 rate, whereas β̂aug(v) converges at the slower (nh)1/2 rate.
Let WB(v) = n1/2{B̂aug(v) − B̂aug(a)} − n1/2{B(v) − B(a)} for v ∈ [a, b]. In the Appendix we show that WB(v), v ∈ [a, b], converges weakly to a p-dimensional mean-zero Gaussian process with continuous sample paths on [a, b]. Further, the distribution of WB(v), v ∈ [a, b], can be approximated with the Gaussian multiplier resampling method of Lin et al. (1993) by the conditional distribution, given the observed data, of
$$W^*_B(v) = n^{-1/2}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\xi_{ki}\,\hat H_{ki}(v), \qquad v \in [a, b],$$
where {ξki, i = 1,…, nk, k = 1,…, K} are iid standard normal random variables independent of the data and Ĥki(v) is defined in (22) in the Appendix. Let WB1(v) and W*B1(v) be the first components of WB(v) and W*B(v), respectively. With the Gaussian multiplier method, the variance of WB1(v) can be consistently estimated by σ̂12(v), the first diagonal element of the covariance estimator given in (23) in the Appendix.
4.1 Testing the null hypothesis H10
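Consider the test process $Q^{(1)}(v) = n^{1/2}\{\hat B^{\rm aug}_1(v) - \hat B^{\rm aug}_1(a)\}$, $v \in [a, b]$, where $\hat B^{\rm aug}_1(v)$ is the first component of $\hat B^{\rm aug}(v)$. Then $Q^{(1)}(v) = W_{B1}(v) + n^{1/2}\{B_1(v) - B_1(a)\}$, $v \in [a, b]$. Under H10, $B_1(v) - B_1(a) = 0$ for $v \in [a, b]$, which motivates the following test statistics for testing H10:
$$T^{(1)}_1 = \sup_{v \in [a, b]}|Q^{(1)}(v)|, \quad T^{(1)}_2 = \int_a^b \{Q^{(1)}(v)\}^2\,dv, \quad T^{(1)}_3 = \inf_{v \in [a, b]} Q^{(1)}(v), \quad T^{(1)}_4 = \int_a^b Q^{(1)}(v)\,dv.$$
The test statistics $T^{(1)}_1$ and $T^{(1)}_2$ capture the general departures H1a, while $T^{(1)}_3$ and $T^{(1)}_4$ are sensitive to the monotone departures H1m, under which $B_1(v) - B_1(a) \le 0$ and $Q^{(1)}(v)$ drifts toward $-\infty$. It is straightforward to show that all the test statistics are consistent against their respective alternative hypotheses, and the Appendix derives their limiting distributions under H10.
Under H10, $Q^{(1)}(v) = W_{B1}(v)$, so the distribution of $Q^{(1)}(v)$, $v \in [a, b]$, can be approximated by the conditional distribution of $W^*_{B1}(v)$, $v \in [a, b]$, given the observed data sequence. Hence, the null distributions of $T^{(1)}_j$, $j = 1, \ldots, 4$, can be approximated by the conditional distributions of $T^{(1)*}_j$, obtained by replacing $Q^{(1)}(v)$ with $W^*_{B1}(v)$ in their definitions. The critical values $c^{(1)}_{1\alpha}$ and $c^{(1)}_{2\alpha}$ of $T^{(1)}_1$ and $T^{(1)}_2$ can be approximated by the $(1 - \alpha)$-quantiles of $T^{(1)*}_1$ and $T^{(1)*}_2$, which can be obtained by repeatedly generating a large number, say 500, of independent sets of standard normal samples $\{\xi_{ki}, i = 1, \ldots, n_k, k = 1, \ldots, K\}$ while holding the observed data sequence fixed. Similarly, the critical values $c^{(1)}_{3\alpha}$ and $c^{(1)}_{4\alpha}$ of $T^{(1)}_3$ and $T^{(1)}_4$ can be approximated by the $\alpha$-quantiles of $T^{(1)*}_3$ and $T^{(1)*}_4$. At significance level $\alpha$, the tests based on $T^{(1)}_1$ and $T^{(1)}_2$ reject H10 in favor of H1a if $T^{(1)}_1 > c^{(1)}_{1\alpha}$ and $T^{(1)}_2 > c^{(1)}_{2\alpha}$, respectively, and the tests based on $T^{(1)}_3$ and $T^{(1)}_4$ reject H10 in favor of H1m if $T^{(1)}_3 < c^{(1)}_{3\alpha}$ and $T^{(1)}_4 < c^{(1)}_{4\alpha}$, respectively.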
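The multiplier recipe for the critical values can be sketched as follows. The sketch assumes the first components of the terms Ĥki(v) from (22) have already been evaluated on a grid of marks (the input `H` is a hypothetical precomputed array, since (22) is not reproduced here), pools all strata into one list of subjects, and assumes the four statistics are the supremum, squared-integral, infimum and integral functionals of the process. Integrals use the trapezoidal rule.

```python
import math
import random


def trapezoid(values, grid):
    """Trapezoidal-rule integral of values sampled on an increasing grid."""
    return sum((grid[m + 1] - grid[m]) * (values[m] + values[m + 1]) / 2.0
               for m in range(len(grid) - 1))


def empirical_quantile(xs, p):
    """Order statistic at 1-based position ceil(p * B) of the B values in xs."""
    xs = sorted(xs)
    k = min(len(xs) - 1, max(0, math.ceil(p * len(xs)) - 1))
    return xs[k]


def multiplier_critical_values(H, grid, alpha=0.05, draws=500, seed=1):
    """Gaussian-multiplier critical values for the four tests of H10.

    H[i][m]: first component of H_i(v_m) for subject i at grid mark v_m.
    Returns (1 - alpha)-quantiles for sup|W*| and the integral of W*^2, and
    alpha-quantiles for inf W* and the integral of W*.
    """
    n = len(H)
    rng = random.Random(seed)
    sup_abs, int_sq, inf_, int_ = [], [], [], []
    for _ in range(draws):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # multipliers xi_i
        # W*_B1(v_m) = n^{-1/2} * sum_i xi_i * H_i(v_m)
        w = [sum(x * row[m] for x, row in zip(xi, H)) / math.sqrt(n)
             for m in range(len(grid))]
        sup_abs.append(max(abs(x) for x in w))
        int_sq.append(trapezoid([x * x for x in w], grid))
        inf_.append(min(w))
        int_.append(trapezoid(w, grid))
    return {
        "sup_abs": empirical_quantile(sup_abs, 1.0 - alpha),
        "int_sq": empirical_quantile(int_sq, 1.0 - alpha),
        "inf": empirical_quantile(inf_, alpha),
        "int": empirical_quantile(int_, alpha),
    }
```

Multiplying every Ĥki by a constant c with the same seed rescales the supremum, infimum and integral critical values by c and the squared-integral one by c², a quick sanity check on an implementation.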
4.2 Testing the null hypothesis H20
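Let $Q^{(2)}(v) = n^{1/2}\,\Gamma(v, \hat B^{\rm aug}_1)$ for $v \in (a, b]$, where $\hat B^{\rm aug}_1(v)$ is the first component of $\hat B^{\rm aug}(v)$. Then
$$Q^{(2)}(v) = \Gamma(v, W_{B1}) + n^{1/2}\,\Gamma(v, B_1), \qquad (11)$$
where Γ(v, F1) = (v − a)−1{F1(v) − F1(a)} − (b − a)−1{F1(b) − F1(a)} is a linear transformation of a function F1(·); (11) follows from this linearity and WB1(a) = 0. We note that Γ(·, B1) = 0 under H20, because a constant β1(v) ≡ β1 gives B1(v) − B1(a) = β1(v − a), and Γ(·, B1) ≠ 0 under the alternatives, motivating Q(2)(v) as the test process and the following test statistics for testing H20:
$$T^{(2)}_1 = \sup_{v \in [a', b]}|Q^{(2)}(v)|, \quad T^{(2)}_2 = \int_{a'}^{b} \{Q^{(2)}(v)\}^2\,dv, \quad T^{(2)}_3 = \inf_{v \in [a', b]} Q^{(2)}(v), \quad T^{(2)}_4 = \int_{a'}^{b} Q^{(2)}(v)\,dv,$$
where a < a′ < b. We choose a′ > a to avoid a zero denominator in Q(2)(v). In practice, one can choose a′ close to a to make use of the available data and to ensure the tests are consistent.
By the asymptotic results shown in the Appendix and the continuous mapping theorem, under H20 the distribution of $Q^{(2)}(v)$, $v \in [a', b]$, can be approximated by the conditional distribution of $\Gamma(v, W^*_{B1})$, $v \in [a', b]$, given the observed data sequence. Hence, the null distributions of $T^{(2)}_j$, $j = 1, \ldots, 4$, can be approximated by the conditional distributions of $T^{(2)*}_j$, obtained by replacing $Q^{(2)}(v)$ with $\Gamma(v, W^*_{B1})$ in their definitions. As in Section 4.1, the critical values $c^{(2)}_{1\alpha}$ and $c^{(2)}_{2\alpha}$ of $T^{(2)}_1$ and $T^{(2)}_2$ can be approximated by the $(1 - \alpha)$-quantiles of the conditional distributions of $T^{(2)*}_1$ and $T^{(2)*}_2$, obtained by repeatedly generating independent sets of normal samples $\{\xi_{ki}, i = 1, \ldots, n_k, k = 1, \ldots, K\}$ while holding the observed data sequence fixed. The critical values $c^{(2)}_{3\alpha}$ and $c^{(2)}_{4\alpha}$ of $T^{(2)}_3$ and $T^{(2)}_4$ can be approximated similarly by the $\alpha$-quantiles of $T^{(2)*}_3$ and $T^{(2)*}_4$. At significance level $\alpha$, the tests based on $T^{(2)}_1$ and $T^{(2)}_2$ reject H20 in favor of H2a if $T^{(2)}_1 > c^{(2)}_{1\alpha}$ and $T^{(2)}_2 > c^{(2)}_{2\alpha}$, respectively, and the tests based on $T^{(2)}_3$ and $T^{(2)}_4$ reject H20 in favor of H2m if $T^{(2)}_3 < c^{(2)}_{3\alpha}$ and $T^{(2)}_4 < c^{(2)}_{4\alpha}$, respectively.
The tests based on $T^{(2)}_1$ and $T^{(2)}_2$ capture general departures H2a, while the tests based on $T^{(2)}_3$ and $T^{(2)}_4$ are sensitive to the monotone departure H2m. Note that the derivative dΓ(v, B1)/dv = (v − a)−1[β1(v) − (v − a)−1{B1(v) − B1(a)}] ≥ 0 under H2m, with strict inequality for at least some v ∈ [a, b], because an increasing β1(v) is at least its average over [a, v]. Together with Γ(b, B1) = 0, this implies that Γ(v, B1) is non-decreasing and non-positive on (a, b] and strictly negative for some v, so the tests based on $T^{(2)}_3$ and $T^{(2)}_4$ are consistent against H2m and the tests based on $T^{(2)}_1$ and $T^{(2)}_2$ are consistent against H2a. The proofs are given in the second paragraph following Theorem 1 in the Appendix.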
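The behavior of the transformation Γ(v, F1) can be checked numerically on [a, b] = [0, 1]. The sketch evaluates Γ for a linear cumulative function (constant β1, as under H20), for F1(v) = v² (β1(v) = 2v increasing, as under H2m), and for v² plus a linear term c(v − a), which corresponds to adding a constant c to β1. The function name is hypothetical.

```python
def gamma_transform(F, v, a, b):
    """Gamma(v, F) = (F(v) - F(a)) / (v - a) - (F(b) - F(a)) / (b - a), a < v <= b."""
    return (F(v) - F(a)) / (v - a) - (F(b) - F(a)) / (b - a)


a, b = 0.0, 1.0
grid = [0.1 * m for m in range(1, 11)]          # marks v in (a, b]

constant = lambda v: 2.0 * v                    # beta_1(v) = 2: H20 holds
increasing = lambda v: v * v                    # beta_1(v) = 2v: H2m holds
shifted = lambda v: v * v + 0.7 * (v - a)       # constant 0.7 added to beta_1

g_const = [gamma_transform(constant, v, a, b) for v in grid]
g_incr = [gamma_transform(increasing, v, a, b) for v in grid]
g_shift = [gamma_transform(shifted, v, a, b) for v in grid]
```

Here Γ vanishes for the linear function, equals v − 1 (non-decreasing, non-positive, zero at b) for v², and is unchanged by the added linear term; the last property is why a bias in β(v) that does not depend on v leaves the H20 test process centered.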
In Sections 4.1 and 4.2, we considered two types of test statistics for each pair of hypotheses, namely integration-based and supremum-based statistics. The former generalize the Cramér-von Mises test statistic and integrate deviations over the whole range of the mark, whereas the latter extend the classic Kolmogorov-Smirnov test statistic for testing goodness-of-fit of a distribution function and take the supremum of such deviations. In light of the comprehensive comparison of the powers of the classic Kolmogorov-Smirnov and Cramér-von Mises tests by Stephens (1974), we expect the two types of test statistics to have different powers for different true alternatives. The integration-based test statistics are best suited to situations where the true alternative deviates slightly over the whole support of the mark, and the supremum-based test statistics may have more power when the true alternative has large deviations over a small part of the support. For example, for testing differential VE(v), H20, the supremum-based tests will tend to be relatively more powerful if VE(v) is very high for a small range of marks near a, declines sharply to zero, and is constant at zero for all other marks.
5 Simulation study
5.1 Numerical assessment of the tests under correctly specified models
We conduct a simulation study to evaluate the finite-sample performance of the proposed testing procedures. The empirical sizes and powers of the test statistics are assessed for various models, sample sizes (500 and 800) and choices of bandwidths. The powers of the tests are evaluated in both situations where a correlated auxiliary variable is used and where it is absent.
We consider K = 1 stratum. Let Zki be the treatment indicator with P(Zki = 1) = 0.5. The (Tki, Vki) are generated from the following mark-specific proportional hazards model:
(12) |
where α, β and γ are constants. Under model (12), λ0 (t, v) = exp(v) and VE(v) = 1 − exp (α + βv). For α = 0 and β = 0, VE(v) = 0, indicating no vaccine efficacy, and for β = 0, VE(v) = VE, indicating mark-invariant vaccine efficacy; whereas β > 0 indicates VE(v) decreasing in v. We examine the hypothesis testing procedures for the following specific models:
(M1) (α, β, γ) = (0, 0, 0.3), implying VE(v) = 0;
(M2) (α, β, γ) = (−0.69, 0, 0.3), implying VE(v) does not depend on v;
(M3) (α, β, γ) = (−0.6, 0.6, 0.3), implying VE(v) decreases;
(M4) (α, β, γ) = (−1.2, 1.2, 0.3), implying VE(v) decreases;
(M5) (α, β, γ) = (−1.5, 1.5, 0.3), implying VE(v) decreases.
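The settings (M1)–(M5) translate directly into VE(v) curves through VE(v) = 1 − exp(α + βv). The short sketch below tabulates VE(v) at a few marks for each setting; γ is omitted because it does not enter VE(v).

```python
import math

# (alpha, beta) for settings (M1)-(M5); gamma = 0.3 throughout and does not affect VE(v).
MODELS = {
    "M1": (0.0, 0.0),
    "M2": (-0.69, 0.0),
    "M3": (-0.6, 0.6),
    "M4": (-1.2, 1.2),
    "M5": (-1.5, 1.5),
}


def ve(v, alpha, beta):
    """Mark-specific vaccine efficacy VE(v) = 1 - exp(alpha + beta * v)."""
    return 1.0 - math.exp(alpha + beta * v)


for name, (alpha, beta) in MODELS.items():
    print(name, [round(ve(v, alpha, beta), 3) for v in (0.0, 0.5, 1.0)])
```

Settings (M3)–(M5) all start from a positive VE at v = 0 and decline to VE(1) = 0, with steeper declines for larger β, while (M2) gives a constant VE of about 50%.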
We generate the censoring times from an exponential distribution, independent of (T, V), with censoring rates ranging from 20% to 30%. We take τ = 2.0. The complete-case indicator Rki is generated with conditional probability rk(Wki) = P(Rki = 1|δki = 1, Wki), where
(13) |
With ψk0 = 0.2 and ψk1 = −0.2 about 50% of observed failures are missing marks.
Conditional on (Tki, Zki, Vki), we assume that the auxiliary marks follow the model
(14) |
for i = 1,…, nk, k = 1,…, K, where Vki are the possibly missing marks, Uki is uniformly distributed on [0, 1] independent of Vki, and θ ≥ 0 is an association parameter between Aki and Vki. The correlation coefficient ρ between Aki and Vki is 1 for θ = 0; since Aki is observed for all observed failure times, the AIPW estimator in this case is the full-data estimator. As θ → ∞, Aki and Vki become independent, yielding ρ = 0. The θ values 0.8, 0.4 and 0.2 correspond to ρ = 0.78, 0.92 and 0.98, respectively.
Under model (14), the conditional density of Aki given (Tki, Zki, Vki) is
(15) |
The likelihood function for θ is
It is easy to show that the maximum likelihood estimator equals
The density estimator gk(a|t, v, z; θ̂) is plugged into (9) to obtain ρ̂k(w, v), which is used to construct the AIPW estimator of β in (10).
The performance of the proposed test procedures is evaluated through simulations for the models described in (12), (13) and (14) under the settings (M1)–(M5), where (M1) is a setting under the null hypothesis H10 and (M2) is a setting under the null hypothesis H20. We consider the situations where no auxiliary information is provided and where the correlation between the auxiliary mark and the mark of interest is ρ = 0.92 [under model (14) with θ = 0.4]. Table 1 presents the empirical sizes and powers of the tests of H10 at the nominal level 0.05, and Table 2 presents those of the tests of H20 at the nominal level 0.05. The results are presented for n = 500 with h1 = 0.1 and h = h2 = 0.15 and 0.2, and for n = 800 with h1 = 0.1 and h = h2 = 0.1 and 0.15. We take a = 0, b = 1 and a′ = 0.5 for the tests. The Epanechnikov kernel K(x) = 0.75(1 − x²)I{|x| ≤ 1} is used throughout the numerical analysis.
Table 1. Empirical sizes and powers (%) of the tests of H10 at the nominal level 0.05, without (ρ = 0) and with (ρ = 0.92) an auxiliary mark
Model | (α, β, γ) | n | h | ρ = 0: test 1 | ρ = 0: test 2 | ρ = 0: test 3 | ρ = 0: test 4 | ρ = 0.92: test 1 | ρ = 0.92: test 2 | ρ = 0.92: test 3 | ρ = 0.92: test 4
---|---|---|---|---|---|---|---|---|---|---|---
M1 | (0, 0, 0.3) | 500 | 0.15 | 5.4 | 4.0 | 4.0 | 5.0 | 4.6 | 4.2 | 3.8 | 4.2
 | | | 0.20 | 5.0 | 4.4 | 4.6 | 5.2 | 4.8 | 4.0 | 4.2 | 3.6
 | | 800 | 0.10 | 3.8 | 3.6 | 4.2 | 4.2 | 3.8 | 3.8 | 5.4 | 4.8
 | | | 0.15 | 4.0 | 3.8 | 4.6 | 4.6 | 5.0 | 4.4 | 5.4 | 5.6
M3 | (−0.6, 0.6, 0.3) | 500 | 0.15 | 68.2 | 67.0 | 79.4 | 76.0 | 73.2 | 74.6 | 83.2 | 85.4
 | | | 0.20 | 63.2 | 65.0 | 75.8 | 74.2 | 69.2 | 71.4 | 79.8 | 82.6
 | | 800 | 0.10 | 88.2 | 86.2 | 94.6 | 90.4 | 92.0 | 93.0 | 95.0 | 97.2
 | | | 0.15 | 87.4 | 86.6 | 92.8 | 90.8 | 89.2 | 90.6 | 93.4 | 95.2
M4 | (−1.2, 1.2, 0.3) | 500 | 0.15 | 99.6 | 99.4 | 99.8 | 99.8 | 99.8 | 100 | 99.8 | 100
 | | | 0.20 | 99.4 | 99.0 | 99.6 | 99.8 | 99.6 | 99.8 | 99.8 | 100
 | | 800 | 0.10 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
 | | | 0.15 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
M2 | (−0.69, 0, 0.3) | 500 | 0.15 | 100 | 100 | 100 | 99.8 | 100 | 100 | 100 | 100
 | | | 0.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
 | | 800 | 0.10 | 100 | 99.8 | 100 | 100 | 99.8 | 99.8 | 99.8 | 99.8
 | | | 0.15 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Table 2. Empirical sizes and powers (%) of the tests of H20 at the nominal level 0.05, without (ρ = 0) and with (ρ = 0.92) an auxiliary mark
Model | (α, β, γ) | n | h | ρ = 0: test 1 | ρ = 0: test 2 | ρ = 0: test 3 | ρ = 0: test 4 | ρ = 0.92: test 1 | ρ = 0.92: test 2 | ρ = 0.92: test 3 | ρ = 0.92: test 4
---|---|---|---|---|---|---|---|---|---|---|---
M2 | (−0.69, 0, 0.3) | 500 | 0.15 | 5.6 | 4.8 | 5.8 | 5.8 | 7.6 | 7.2 | 7.4 | 7.0
 | | | 0.20 | 5.8 | 4.8 | 5.4 | 5.2 | 6.6 | 6.6 | 6.6 | 7.4
 | | 800 | 0.10 | 6.4 | 5.0 | 5.6 | 5.8 | 6.2 | 5.8 | 7.2 | 7.0
 | | | 0.15 | 6.6 | 5.2 | 5.8 | 5.6 | 6.0 | 5.6 | 6.0 | 6.6
M3 | (−0.6, 0.6, 0.3) | 500 | 0.15 | 16.8 | 17.0 | 22.4 | 25.2 | 20.6 | 25.8 | 32.6 | 37.4
 | | | 0.20 | 14.2 | 15.8 | 22.2 | 24.8 | 19.4 | 24.2 | 31.8 | 34.6
 | | 800 | 0.10 | 26.0 | 25.8 | 35.2 | 36.4 | 36.0 | 38.0 | 46.0 | 49.2
 | | | 0.15 | 25.4 | 25.8 | 34.8 | 35.6 | 34.0 | 36.0 | 45.4 | 47.4
M4 | (−1.2, 1.2, 0.3) | 500 | 0.15 | 44.4 | 46.2 | 59.0 | 63.2 | 63.6 | 68.4 | 76.4 | 80.2
 | | | 0.20 | 42.2 | 44.0 | 57.2 | 59.6 | 61.4 | 65.8 | 73.2 | 75.8
 | | 800 | 0.10 | 66.2 | 67.6 | 75.2 | 78.0 | 82.8 | 86.6 | 90.6 | 91.8
 | | | 0.15 | 64.6 | 66.2 | 74.0 | 77.0 | 80.6 | 84.4 | 88.4 | 91.2
M5 | (−1.5, 1.5, 0.3) | 500 | 0.15 | 64.5 | 66.5 | 75.0 | 76.5 | 81.0 | 85.6 | 88.8 | 90.4
 | | | 0.20 | 61.0 | 62.6 | 72.2 | 72.2 | 77.8 | 82.4 | 86.8 | 89.4
 | | 800 | 0.10 | 80.8 | 85.6 | 87.6 | 91.4 | 94.6 | 96.2 | 97.6 | 98.4
 | | | 0.15 | 78.6 | 84.8 | 87.8 | 91.4 | 94.4 | 95.6 | 95.8 | 97.8
Tables 1 and 2 show that the empirical sizes of the tests are reasonably close to the nominal level 0.05: they range from 3.6% to 5.6% for the tests of H10 and from 4.8% to 7.6% for the tests of H20, with the largest values for the tests of H20 that use the auxiliary mark. The powers of the tests increase with sample size and are not overly sensitive to the choice of bandwidth. The powers of the tests of H10 increase as the model moves in the direction M1 → M3 → M4 → M2, representing increasing departure from H10, and the powers of the tests of H20 increase as the model moves in the direction M2 → M3 → M4 → M5, representing increasing departure from H20. The tests that use the auxiliary marks have higher power than those that do not.
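To judge which empirical sizes are compatible with the nominal level, one can compute a binomial Monte Carlo band around 5%. The sketch below assumes each table entry is based on 500 simulated data sets; that number is an assumption here and is not stated in this section.

```python
import math


def size_band(alpha=0.05, reps=500, z=1.96):
    """Approximate 95% Monte Carlo band for an empirical size when the true size is alpha.

    reps is the number of simulated data sets (an assumed value here).
    """
    se = math.sqrt(alpha * (1.0 - alpha) / reps)  # binomial standard error
    return alpha - z * se, alpha + z * se


low, high = size_band()
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Under that assumption the band is roughly 3.1% to 6.9%, so the sizes of the H10 tests fall inside it, whereas the largest sizes of the H20 tests with ρ = 0.92 (up to 7.6%) suggest mild inflation.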
As with any nonparametric smoothing procedure, one needs to carefully select bandwidths. In practice, the appropriate bandwidth selection can be based on a 𝒦-fold cross-validation method [e.g., Efron and Tibshirani (1993), Hoover et al. (1998), Cai et al. (2000) and Tian et al. (2005)].
The proposed testing procedures handle missing marks under MAR with asymptotically correct significance levels. However, if only the observations with complete information are used (the complete-case analysis), the testing procedures may fail to control the type I error rate. We conduct a simulation study to evaluate the observed sizes of the proposed tests applied to the complete cases under two different models for the missing-mark indicator Rki: model (13) and the following model:
(16) |
For K = 1, both models (13) and (16) yield about 50% missing marks among the observed failures. The sizes of the four complete-case tests of H10 are evaluated under model (M1), and those of the four complete-case tests of H20 under model (M2) (Table 3). Under model (13), the observed sizes for testing H10 are inflated (7–15%), whereas those for testing H20 range from 4% to 10%. Under model (16), the observed sizes for testing H10 are at least 37% for all tests, whereas two of the tests of H20 reach 12% and 14% when n = 800.
Table 3. Empirical sizes of the tests applied to the complete cases only, at the nominal level 0.05
Model | Missing-data model | n | h | Size: test 1 | Size: test 2 | Size: test 3 | Size: test 4
---|---|---|---|---|---|---|---
Testing H10 | | | | | | |
M1 | (13) | 500 | 0.20 | 0.14 | 0.10 | 0.12 | 0.15
 | | 800 | 0.15 | 0.10 | 0.07 | 0.11 | 0.11
 | (16) | 500 | 0.20 | 0.39 | 0.37 | 0.50 | 0.42
 | | 800 | 0.15 | 0.50 | 0.46 | 0.63 | 0.55
Testing H20 | | | | | | |
M2 | (13) | 500 | 0.20 | 0.08 | 0.04 | 0.08 | 0.05
 | | 800 | 0.15 | 0.06 | 0.09 | 0.06 | 0.10
 | (16) | 500 | 0.20 | 0.08 | 0.07 | 0.08 | 0.05
 | | 800 | 0.15 | 0.07 | 0.06 | 0.12 | 0.14
These simulation results show that the testing procedures applied to complete cases generally do not have the nominal size, although in some scenarios the sizes are close to nominal. To explain this, it can be shown that, under MAR, λk(t, v|z, Rki = 1) = λk(t, v|z)hk(t, z), where hk(t, z) = P(Rki = 1|Tki = t, Zki = z)/P(Rki = 1|Tki ≥ t, Zki = z). If hk(t, z) does not depend on z and MAR holds, then the observations for individuals with observed marks can be viewed as a random sample from a mark-specific proportional hazards model with a different baseline hazard function but the same regression function β(v). In this case, the complete-case tests of both H10 and H20 are valid. If hk(t, z) depends on z but not on t and MAR holds [the scenario under model (13)], then the complete-case tests of H10 will be biased. However, the complete-case tests of H20 remain valid, because the resulting bias in the estimation of β(v) does not depend on v: it adds a linear term c(v − a) to B1(v) − B1(a), which the transformation Γ(v, ·) removes, so the test process Q(2)(v) is still asymptotically a mean-zero process. In general, if hk(t, z) depends on both z and t and MAR holds, which is the scenario under the missing-data model (16), then Q(2)(v) is not asymptotically a mean-zero process, and the departure of the asymptotic sizes of the tests of H20 from the nominal level depends on hk(t, z) in a complicated manner.
5.2 Numerical assessment of the tests under mis-specified models
This subsection evaluates robustness of the proposed test procedures to mis-specifications of rk (w) and/or gk (a|t, v, z), and to violation of the MAR assumption. The Zki, (Tki, Vki), and Cki are generated using the same models as above, again with approximately 30% censoring.
Robustness of the tests to mis-specification of rk (w) is examined by assuming model (13) while the actual complete-case indicator Rki is generated with the conditional probability rk (Wki) = P(Rki = 1|δki = 1, Wki), where
(17) |
This model yields approximately 50% missing marks among observed failures under (M1)–(M5).
Robustness of the tests is also examined when gk (a|t, v, z) is mis-specified. This is carried out by assuming model (14) for the auxiliary mark, or, equivalently, model (15) for gk (a|t, v, z), while the actual mark for λki = 1 is generated from
(18) |
for i = 1,…, nk. Here Uki is uniformly distributed on [0, 1] and is independent of Vki.
Robustness of the tests to violation of the MAR assumption (2) is examined by assuming model (13), while the actual Rki depends on Vki through the model
(19) |
The proportion of missing marks among the observed failures is kept around 50% in all scenarios.
The models (17), (18) and (19) are similar to those used in Sun and Gilbert (2012) for examining robustness of the AIPW estimator. However, instead of examining biases and standard errors of the estimators, here we check whether the empirical sizes of the tests are close to their nominal level 0.05 and how the powers of the tests are affected by these mis-specifications. For sample size n = 500 and bandwidths h1 = 0.1 and h = h2 = 0.20, Table 4 shows the empirical sizes and powers of the tests of H10, and Table 5 shows those of the tests of H20. In both tables, the first block shows the results when rk(w) is mis-specified following (17) and gk(a|t, v, z) is correctly specified by (15) with λ = 0.4; the second block shows the results when gk(a|t, v, z) is mis-specified following (18) and rk(w) is correctly specified by (13) with ψk0 = 0.2 and ψk1 = −0.2; the third block shows the results when rk(w) is mis-specified following (17) and gk(a|t, v, z) is mis-specified following (18); and the fourth block shows the results when rk(w) depends on Vki following (19) and gk(a|t, v, z) is correctly specified by (15) with λ = 0.4.
Table 4.
Empirical size/power (%) for testing H10.

| Model | (α, β, γ) | Stat. 1 | Stat. 2 | Stat. 3 | Stat. 4 |
|---|---|---|---|---|---|
| rk(w) is misspecified | | | | | |
| M1 | (0, 0, 0.3) | 4.2 | 5.2 | 3.6 | 4.2 |
| M3 | (−0.6, 0.6, 0.3) | 62.0 | 74.4 | 74.0 | 81.8 |
| M4 | (−1.2, 1.2, 0.3) | 99.6 | 99.8 | 99.8 | 99.8 |
| M2 | (−0.69, 0, 0.3) | 100 | 100 | 100 | 100 |
| gk(a\|t, v, z) is misspecified | | | | | |
| M1 | (0, 0, 0.3) | 3.4 | 4.2 | 5.8 | 4.6 |
| M3 | (−0.6, 0.6, 0.3) | 59.6 | 64.4 | 72.8 | 74.4 |
| M4 | (−1.2, 1.2, 0.3) | 99.2 | 99.4 | 99.6 | 99.6 |
| M2 | (−0.69, 0, 0.3) | 100 | 99.8 | 100 | 99.8 |
| rk(w) and gk(a\|t, v, z) are misspecified | | | | | |
| M1 | (0, 0, 0.3) | 4.0 | 4.0 | 3.8 | 3.4 |
| M3 | (−0.6, 0.6, 0.3) | 61.8 | 61.8 | 71.8 | 73.8 |
| M4 | (−1.2, 1.2, 0.3) | 99.6 | 98.6 | 99.8 | 99.8 |
| M2 | (−0.69, 0, 0.3) | 100 | 100 | 100 | 100 |
| Missing-at-random assumption is violated | | | | | |
| M1 | (0, 0, 0.3) | 3.4 | 3.8 | 3.6 | 5.0 |
| M3 | (−0.6, 0.6, 0.3) | 60.6 | 67.0 | 73.0 | 77.8 |
| M4 | (−1.2, 1.2, 0.3) | 99.2 | 99.6 | 99.8 | 99.6 |
| M2 | (−0.69, 0, 0.3) | 100 | 100 | 100 | 100 |
Table 5.
Empirical size/power (%) for testing H20.

| Model | (α, β, γ) | Stat. 1 | Stat. 2 | Stat. 3 | Stat. 4 |
|---|---|---|---|---|---|
| rk(w) is misspecified | | | | | |
| M2 | (−0.69, 0, 0.3) | 5.0 | 3.8 | 5.4 | 6.2 |
| M3 | (−0.6, 0.6, 0.3) | 24.0 | 25.2 | 34.2 | 36.0 |
| M4 | (−1.2, 1.2, 0.3) | 60.8 | 66.6 | 72.8 | 78.4 |
| M5 | (−1.5, 1.5, 0.3) | 76.6 | 82.0 | 85.8 | 88.8 |
| gk(a\|t, v, z) is misspecified | | | | | |
| M2 | (−0.69, 0, 0.3) | 4.8 | 6.6 | 6.0 | 5.8 |
| M3 | (−0.6, 0.6, 0.3) | 17.2 | 18.0 | 28.4 | 28.2 |
| M4 | (−1.2, 1.2, 0.3) | 44.8 | 47.2 | 56.4 | 61.0 |
| M5 | (−1.5, 1.5, 0.3) | 58.0 | 60.4 | 68.6 | 73.2 |
| rk(w) and gk(a\|t, v, z) are misspecified | | | | | |
| M2 | (−0.69, 0, 0.3) | 4.0 | 4.8 | 4.4 | 4.4 |
| M3 | (−0.6, 0.6, 0.3) | 16.6 | 19.6 | 26.8 | 26.6 |
| M4 | (−1.2, 1.2, 0.3) | 43.2 | 46.6 | 55.6 | 60.6 |
| M5 | (−1.5, 1.5, 0.3) | 53.8 | 58.8 | 67.4 | 71.4 |
| Missing-at-random assumption is violated | | | | | |
| M2 | (−0.69, 0, 0.3) | 6.8 | 6.0 | 7.6 | 7.8 |
| M3 | (−0.6, 0.6, 0.3) | 28.6 | 33.6 | 39.6 | 42.0 |
| M4 | (−1.2, 1.2, 0.3) | 61.8 | 67.0 | 74.0 | 78.4 |
| M5 | (−1.5, 1.5, 0.3) | 77.4 | 81.6 | 85.4 | 89.2 |
Tables 4 and 5 show that the empirical sizes of the tests are very close to the nominal level 0.05 when either rk(w) or gk(a|t, v, z) is mis-specified, reflecting the double robustness of the AIPW estimator. Somewhat unexpectedly, the empirical sizes are also close to 0.05 (between 3.4% and 7.8%) when both rk(w) and gk(a|t, v, z) are mis-specified and when the MAR assumption is violated. When only rk(w) is mis-specified and MAR holds, the empirical powers in Tables 4 and 5 closely track the corresponding powers in Tables 1 and 2 under correct model specification. The empirical powers are lower than those in Tables 1 and 2 when gk(a|t, v, z) is mis-specified or when both rk(w) and gk(a|t, v, z) are mis-specified, whereas they are very close to those in Tables 1 and 2 when MAR is violated. Apparently, in this particular simulation the bias due to the MAR violation counterbalances the bias due to mis-specification of both rk(w) and gk(a|t, v, z); in general, however, these violations could distort sizes and powers.
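Whether an empirical size is "close" to 0.05 can be judged against Monte Carlo error. The entries in Tables 4 and 5 are multiples of 0.2%, which is consistent with (but does not confirm) 500 simulation replicates. The sketch below assumes R = 500 replicates and computes the binomial standard error of an empirical rejection rate and a normal-approximation 95% band around the nominal level; the function names are hypothetical.

```python
import math


def mc_standard_error(p: float, n_reps: int) -> float:
    """Binomial standard error of an empirical rejection rate p from n_reps replicates."""
    return math.sqrt(p * (1.0 - p) / n_reps)


def nominal_band(alpha: float = 0.05, n_reps: int = 500, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation band for the empirical size when the true size equals alpha."""
    se = mc_standard_error(alpha, n_reps)
    return alpha - z * se, alpha + z * se


# Empirical sizes (%) from Table 5 when the MAR assumption is violated (model M2).
sizes_pct = [6.8, 6.0, 7.6, 7.8]

lo, hi = nominal_band(0.05, 500)  # R = 500 is an assumption, not stated in the text
print(f"95% Monte Carlo band for size 0.05 with R = 500: [{lo:.3f}, {hi:.3f}]")
for s in sizes_pct:
    inside = lo <= s / 100 <= hi
    print(f"size {s:.1f}%: {'inside' if inside else 'outside'} the band")
```

Under this assumption the band is roughly [0.031, 0.069], so 6.0% and 6.8% fall inside it while 7.6% and 7.8% lie slightly outside; "close to 0.05" is a fair summary, not an exact match.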
5.3 Simulation study for the Thai trial
We conduct a simulation of the Thai trial to gain insight into the power available for this real trial. Specifically, we simulated data to yield approximately the observed numbers of infections (74 in the placebo group and 51 in the vaccine group), an overall vaccine efficacy from the proportional hazards model of about 31%, and a true VE(v) curve that decreases in v from around 65–70% for v close to zero to around 0% for v close to 1. The actual placebo-group infection rate was only about 0.3% per year over 3.5 years of follow-up; to speed up the simulations we use a 20% placebo infection rate and retain 74 placebo infections on average.
Again with K = 1 stratum, the (Tki, Vki) are generated from the following model:
$$\lambda_k(t, v \mid z) = \gamma \exp\{(\alpha + \beta v)\, z\}, \qquad 0 \le v \le 1, \qquad (20)$$
where α, β and γ are constants. Under model (20), VE(v) = 1 − exp(α + βv), the marginal hazards are λ0(t) = γ for z = 0 and λ1(t) = γ exp(α)(exp(β) − 1)/β for z = 1, and the Cox proportional hazards vaccine efficacy equals VEC = 1 − λ1(t)/λ0(t) = 1 − exp(α)(exp(β) − 1)/β. We choose (α, β, γ) = (−1.1, 1.3, 0.068), yielding VEC = 0.32, VE(0) = 0.67, and VE(0.85) = 0. We study 400 subjects in each of the vaccine and placebo groups. Matching the actual trial, the censoring rate before τ is kept very low, just under 5%. The missing-mark indicator is generated from model (13), with (ψk0, ψk1) set to yield about 0%, 25% (−1.2, −0.2), 50% (0.2, −0.2), and 75% (−1.0, −0.2) missing marks among observed failures. We assume the auxiliary variable Aki follows model (14) given in Section 5.1, where the θ values ∞, 0.8, 0.4 and 0.2 correspond to correlation coefficients ρ = 0, 0.78, 0.92 and 0.98 between Aki and Vki.
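The closed-form quantities above can be checked numerically, reading model (20) as λ(t, v|z) = γ exp{(α + βv)z} on v ∈ [0, 1], the form implied by the stated marginal hazards. The sketch below uses (α, β, γ) = (−1.1, 1.3, 0.068). The function names are hypothetical, and the follow-up horizon τ = 3.5 used for the placebo infection probability is an assumption (chosen to match the 3.5-year trial, with γ treated as a yearly rate), not a value stated for the simulation.

```python
import math

ALPHA, BETA, GAMMA = -1.1, 1.3, 0.068


def ve_mark(v: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Mark-specific vaccine efficacy VE(v) = 1 - exp(alpha + beta * v)."""
    return 1.0 - math.exp(alpha + beta * v)


def ve_cox(alpha: float = ALPHA, beta: float = BETA) -> float:
    """Marginal (Cox) VE = 1 - lambda1/lambda0 = 1 - exp(alpha) * (exp(beta) - 1) / beta."""
    return 1.0 - math.exp(alpha) * (math.exp(beta) - 1.0) / beta


def zero_crossing(alpha: float = ALPHA, beta: float = BETA) -> float:
    """Mark value at which VE(v) = 0, i.e. alpha + beta * v = 0."""
    return -alpha / beta


tau = 3.5  # assumed follow-up horizon in years; not stated for the simulation
p_placebo = 1.0 - math.exp(-GAMMA * tau)  # P(T <= tau | z = 0) under the constant hazard GAMMA

print(f"VE_C = {ve_cox():.3f}")
print(f"VE(0) = {ve_mark(0.0):.3f}")
print(f"VE(0.85) = {ve_mark(0.85):.3f}")
print(f"VE(v) = 0 at v = {zero_crossing():.3f}")
print(f"placebo infection probability by tau = {p_placebo:.3f}")
```

This reproduces VEC ≈ 0.32, VE(0) ≈ 0.67 and a zero crossing of VE(v) near v = 0.85, and, under the assumed τ, gives a placebo infection probability of about 21%, in line with the 20% rate quoted above.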
To compensate for the information lost on the mark, we use larger bandwidths for higher percentages of missing marks: h = 0.4 with 75% missing marks, h = 0.3 with 50%, h = 0.2 with 25%, and h = 0.15 with 0%. The bandwidths in (7) are taken to be h1 = 0.50 and h2 = h in each case. The powers of the proposed tests at the nominal level 0.05 in the simulations based on the Thai trial are reported in Table 6. The tests perform similarly to those in the simulation study of Section 5.1. Because only about 10% of infected subjects had missing marks in RV144 and the auxiliary was very weakly predictive, we focus on the entries with 0% or 25% missing marks and ρ = 0. These entries give 67%–95% power to reject H10 and 33%–60% power to reject H20. These results show that a fairly strong sieve effect, with VE(v) declining from 67% to 0%, could readily be missed in the Thai trial because of limited power. The only modest gain in power even with an excellent auxiliary (ρ = 0.98) shows that larger numbers of events would be needed to achieve high power for testing H20.
Table 6.
Power (%) at the nominal level 0.05.

| ρ | % missing marks | h | H10: stat. 1 | H10: stat. 2 | H10: stat. 3 | H10: stat. 4 | H20: stat. 1 | H20: stat. 2 | H20: stat. 3 | H20: stat. 4 |
|---|---|---|---|---|---|---|---|---|---|---|
| – | 0 | 0.15 | 77 | 85 | 86 | 95 | 48 | 48 | 59 | 60 |
| 0 | 25 | 0.2 | 67 | 76 | 79 | 85 | 36 | 33 | 50 | 47 |
| 0 | 50 | 0.3 | 63 | 71 | 71 | 82 | 29 | 27 | 37 | 42 |
| 0 | 75 | 0.4 | 41 | 51 | 59 | 58 | 21 | 18 | 35 | 31 |
| 0.78 | 25 | 0.2 | 67 | 79 | 82 | 89 | 36 | 39 | 46 | 50 |
| 0.78 | 50 | 0.3 | 60 | 71 | 74 | 84 | 28 | 28 | 41 | 39 |
| 0.78 | 75 | 0.4 | 49 | 53 | 63 | 65 | 25 | 25 | 34 | 34 |
| 0.92 | 25 | 0.2 | 70 | 80 | 84 | 91 | 37 | 41 | 50 | 56 |
| 0.92 | 50 | 0.3 | 61 | 71 | 73 | 87 | 35 | 39 | 50 | 51 |
| 0.92 | 75 | 0.4 | 54 | 58 | 62 | 71 | 30 | 33 | 40 | 44 |
| 0.98 | 25 | 0.2 | 71 | 81 | 82 | 91 | 39 | 47 | 53 | 55 |
| 0.98 | 50 | 0.3 | 66 | 76 | 75 | 86 | 44 | 42 | 50 | 52 |
| 0.98 | 75 | 0.4 | 56 | 66 | 68 | 76 | 41 | 43 | 51 | 49 |
6 Analysis of the RV144 Thai trial
In the RV144 Thai trial, 125 subjects (51 of 8197 in the vaccine group and 74 of 8198 in the placebo group) were diagnosed with HIV infection over a 42-month follow-up period. Full-length HIV genomes were measured from 121 of them; 3 had no sequence data because their HIV viral load was too low for Sanger sequencing to work, and 1 dropped out [Rerks-Ngarm et al. (2009), Rolland et al. (2012)]. We focus on the gp120 region of the HIV Env protein because this region stimulates anti-HIV antibody responses, which are the putative cause of the observed partial vaccine efficacy. Three gp120 sequences were included in the vaccine: 92TH023 in the ALVAC canarypox vector prime component, and CM244 and MN in the AIDSVAX gp120 protein boost component. 92TH023 and CM244 are subtype E HIVs, whereas MN is subtype B, and 110 of the 121 sequenced subjects were infected with subtype E sequences. The subtype E vaccine-insert sequences are genetically much closer than MN to the infecting (and regionally circulating) sequences, and thus are more likely to stimulate protective immune responses. Accordingly, the analysis focuses on the 92TH023 and CM244 reference sequences and right-censors the 15 subjects infected with subtype B HIV or with HIV of unknown subtype. One subject who acquired HIV infection during the trial was documented to have acquired it from another trial participant who had previously become HIV infected; the analysis excludes this subject because his/her inclusion would violate the independent observations assumption. In the context of our model set-up, T is the time to diagnosis of HIV infection with subtype E HIV, and the time to diagnosis of infection with subtype B or unknown-subtype HIV is treated as censoring.
We define V based on HIV sequence data measured from a blood sample drawn at or before the HIV diagnosis date. (The trial documented acute-phase/pre-seroconversion infection in only a few subjects, which precluded defining the mark from acute-phase sequences.) Eleven of the 109 (10%) subtype E infected subjects have sequences measured only from a post-diagnosis sample and hence are missing V. To maximize biological relevance and statistical power, we restrict the gp120 distances to the published set of gp120 sites in contact with known broadly neutralizing monoclonal antibodies (Moore et al., 2009; Wei et al., 2003). For each HIV sequence from a subject and each of the two reference vaccine sequences, V is computed as a weighted Hamming distance using the PAM-between scoring matrix (Nickle et al., 2007). Between 2 and 13 sequences (1030 in total) were measured per infected subject, and V is defined using the subject's sequence closest to his or her consensus sequence (the consensus sequence comprises the majority amino acid at each site, taken one site at a time). Finally, the distances are rescaled to lie between 0 and 1. In total, 109 infected subjects (43 vaccine, 66 placebo) are included in the analysis, of whom 98 (39 vaccine, 59 placebo) have an observed mark V; Figure 1 displays the observed values of V.
To predict the probability of observing V among the 109 infected subjects, we use all-subsets logistic regression model selection, considering demographics, host genetics, and post-infection biomarker data. The best model by BIC includes only the years from entry until HIV infection diagnosis (X1), with fitted model logit(P̂(R = 1|δ = 1, X1)) = 1.17 + 0.70X1 for the CM244 reference sequence. The model was very similar for the 92TH023 reference sequence (not shown). In addition, we consider linear and logistic regression models relating the mean of various potential auxiliary variables (A) to V, X1, and the treatment indicator Z. Model selection did not reveal any significantly predictive auxiliary variables; we expect that HIV sequence information measured after V is defined would be a good predictor, but these data were not collected. Nevertheless, to implement the AIPW method we select the best available auxiliary variable, gender (A = X2; 1 = male, 0 = female), and use the resulting logistic regression model; for CM244 the fitted model ĝ(A = a|V, X1, Z) is logit(P̂(X2 = 1|δ = 1, V, X1, Z)) = 0.24 − 0.33V + 0.16X1 + 0.38Z, and the model was very similar for 92TH023 (not shown).
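The fitted observation model enters the AIPW estimator through inverse-probability weights for subjects with observed marks. As an illustration only, the sketch below evaluates the reported CM244 fit, logit r̂ = 1.17 + 0.70X1, at a few hypothetical values of X1 (years to diagnosis) together with the weights 1/r̂; condition (A.5) in the Appendix requires these probabilities to stay bounded away from zero. The helper names and the X1 values are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_mark_observed(x1: float, b0: float = 1.17, b1: float = 0.70) -> float:
    """Fitted P(R = 1 | delta = 1, X1) from the reported CM244 logistic model."""
    return expit(b0 + b1 * x1)


# Hypothetical years from entry to HIV diagnosis (follow-up lasted 3.5 years).
for x1 in [0.5, 1.0, 2.0, 3.0]:
    r_hat = prob_mark_observed(x1)
    print(f"X1 = {x1:.1f}: r_hat = {r_hat:.3f}, IPW weight 1/r_hat = {1.0 / r_hat:.3f}")
```

Because the coefficient of X1 is positive, marks of later-diagnosed subjects are more likely to be observed, so observed marks from early diagnoses receive the larger weights.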
The AIPW estimation and testing procedures are applied to the Thai trial data set with bandwidths h1 = 0.5 and h2 = h = 0.3, a = 0.05, b = 1 and a′ = a + 0.01 (a and a′ are near the minimum observed marks). As in the simulation study, 500 sets of simulated Gaussian multipliers are used. Because the results are nearly identical with and without the auxiliary variable, only the results without it are presented. Figure 2 shows the estimated VE(v) with pointwise 95% confidence bands, indicating that vaccine efficacy appears to be high against HIVs near the 92TH023 reference sequence [estimated VE(0.01) = 56%] and declines to zero against HIVs farthest from it [estimated VE(1.0) = 2.4%]. The decline is similar for the CM244 reference sequence, with estimated VE(0.01) = 45% and estimated VE(0.95) = −9.1%.
Figures 3(a) and 3(b) show the test process Q(1)(v) together with 20 realizations of the Gaussian multiplier process given the observed data, and Figures 3(c) and 3(d) show the corresponding results for the test process Q(2)(v); each suggests departures from the null hypotheses H10 and H20 for each reference sequence. The p-values of the tests based on the test statistics for testing H10 against the monotone alternative over v ∈ [0, 1] are 0.032 and 0.008 for 92TH023, and 0.014 and 0.010 for CM244. The p-values of the test statistics for testing H10 against the general alternative are 0.054 and 0.018 for 92TH023, and 0.030 and 0.010 for CM244. For testing H20 over v ∈ [0, 1], the p-values of the supremum-type tests are 0.53 and 0.27 for 92TH023, and 0.37 and 0.18 for CM244. The p-values of the integrated-square-type tests are 0.35 and 0.14 for 92TH023, and 0.44 and 0.19 for CM244.
These analyses provide more evidence that the vaccine had some protective efficacy than the original primary analysis, which did not account for the mark information (Rerks-Ngarm et al., 2009): the primary-analysis test for any vaccine efficacy yielded p = 0.04, whereas the tests for any vaccine efficacy against any mark reported here yielded a median p-value of 0.016 across the four test statistics and two reference sequences. The analyses also showed a nonsignificant trend (p-values around 0.14–0.19) that the vaccine protected better against HIVs closely matched to the vaccine-strain HIVs at the monoclonal antibody contact sites, but had less or no protection against HIVs with many mismatches at these sites. Although these significance levels are not compelling, the Section 5.3 simulation of the power available for detecting a vaccine sieve effect in the Thai trial showed that the study is well powered only to detect large sieve effects [with a steeper decline of VE(v) in v than observed in the estimated VE(v) curves]; thus a moderate-to-large sieve effect is consistent with the observed results. These results may guide future vaccine research by suggesting that future vaccine candidates include HIV sequences more closely matched to circulating HIVs at the monoclonal antibody contact sites. They may also motivate the design of future experiments to understand the functional effects of amino acid mutations at these sites.
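The median p-value quoted above can be checked directly from the eight p-values for H10 reported in the preceding paragraph (two statistics for each of the monotone and general alternatives, for each of the two reference sequences). The dictionary layout below is only a convenient way to hold those reported values.

```python
from statistics import median

# p-values for testing H10, as reported in the text.
p_values = {
    ("92TH023", "monotone"): [0.032, 0.008],
    ("CM244", "monotone"): [0.014, 0.010],
    ("92TH023", "general"): [0.054, 0.018],
    ("CM244", "general"): [0.030, 0.010],
}

all_p = [p for ps in p_values.values() for p in ps]
print(f"{len(all_p)} p-values, median = {median(all_p):.3f}")
```

With an even number of values, the median is the average of the fourth and fifth smallest p-values, (0.014 + 0.018)/2 = 0.016.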
Acknowledgements
The authors thank Hasan Ahmed and Paul Edlefsen for generating the HIV sequence distances, and thank the participants, investigators, and sponsors of the RV144 Thai trial, including the U.S. Military HIV Research Program (MHRP); U.S. Army Medical Research and Materiel Command; NIAID; U.S. and Thai Components, Armed Forces Research Institute of Medical Sciences; Ministry of Public Health, Thailand; Mahidol University; Sanofi Pasteur; and Global Solutions for Infectious Diseases. The authors thank the Editor, Associate Editor, and two referees for their helpful suggestions. The research of Yanqing Sun was partially supported by NSF grants DMS-0905777 and DMS-1208978, and the research of Drs. Sun and Peter Gilbert was partially supported by NIH NIAID grant R37AI054165. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Appendix: Asymptotic results
The following regularity conditions from Sun and Gilbert (2012) are assumed.
Condition A
- (A.1) β(v) has componentwise continuous second derivatives on [0, 1]. For each k = 1,…, K, the second partial derivative of λ0k(t, v) with respect to v exists and is continuous on [0, τ] × [0, 1]. The covariate process Zk(t) has paths that are left-continuous and of bounded variation, and satisfies the moment condition E[‖Zk(t)‖^4 exp(2M‖Zk(t)‖)] < ∞, where M is a constant such that (v, β(v)) ∈ [0, 1] × (−M, M)^p for all v, and ‖A‖ = max_{k,l} |akl| for a matrix A = (akl).
- (A.2) Each component of is continuous on [0, τ] × [−M, M]^p, is continuous on [0, τ] × [−M, M]^p × [−L, L]^q for some M, L > 0 and j = 0, 1, 2.
- (A.3) The limit pk = lim_{n→∞} nk/n exists and 0 < pk < ∞. on [0, τ] × [−M, M]^p and the matrix is positive definite, where .
- (A.4) The kernel function K(·) is symmetric with support [−1, 1] and of bounded variation. The bandwidth h satisfies nh^2 → ∞ and nh^4 → 0 as n → ∞.
- (A.5) There is a σ > 0 such that rk(Wki) ≥ σ for all k, i with δki = 1.
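Condition (A.4) restricts how fast the bandwidth may shrink. For a power-law choice h = c n^{−κ} with c > 0 (a parametrization assumed here purely for illustration):

$$
n h^2 = c^2 n^{1-2\kappa} \to \infty \iff \kappa < \tfrac{1}{2}, \qquad
n h^4 = c^4 n^{1-4\kappa} \to 0 \iff \kappa > \tfrac{1}{4},
$$

so any $h = c\,n^{-\kappa}$ with $\tfrac{1}{4} < \kappa < \tfrac{1}{2}$ satisfies (A.4); for example, $\kappa = 1/3$ gives $nh^2 = c^2 n^{1/3} \to \infty$ and $nh^4 = c^4 n^{-1/3} \to 0$.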
Let ℱt = σ{I(Xki ≤ s, δki = 1), I(Xki ≤ s, δki = 0), Vki I(Xki ≤ s, δki = 1), Zki(s); 0 ≤ s ≤ t, i = 1,…, nk, k = 1,…, K} be the (right-continuous) filtration generated by the full data processes {Nki(s, v), Yki(s), Zki(s); 0 ≤ s ≤ t, 0 ≤ v ≤ 1, i = 1,…, nk, k = 1,…, K}. Assume E(Nki(dt, dv)|ℱt−) = E(Nki(dt, dv)|Yki(t), Zki(t)); that is, the mark-specific instantaneous failure rate at time t, given the information up to time t, depends only on the at-risk status and the current covariate value. By the definition of the conditional mark-specific hazard function, E(Nki(dt, dv)|ℱt−) = Yki(t)λk(t, v|Zki(t)) dt dv. Hence, the mark-specific intensity of Nki(t, v) with respect to ℱt equals Yki(t)λk(t, v|Zki(t)). Let $M_{ki}(t, v) = N_{ki}(t, v) - \int_0^t \int_0^v Y_{ki}(s)\,\lambda_k(s, u \mid Z_{ki}(s))\,du\,ds$. By Aalen and Johansen (1978), Mki(·, v1) and Mki(·, v2) − Mki(·, v1) are orthogonal square integrable martingales with respect to ℱt for any 0 ≤ v1 ≤ v2 ≤ 1.
The weak convergence of WB(v) = n^{1/2}{B̂aug(v) − B(v)} − n^{1/2}{B̂aug(a) − B(a)} for v ∈ [a, b] is given in Theorem 1 below.
Theorem 1. Under conditions (A.1)–(A.5), , uniformly in v ∈ [a, b], where
(21) |
The process WB(v) converges weakly to a p-dimensional mean-zero Gaussian process with continuous sample paths on v ∈ [a, b], where .
Theorem 1 provides the basis for obtaining asymptotically correct critical values for the testing procedures for H10 and for H20. In particular, let G(v) be the limiting Gaussian process of WB1(v), v ∈ [a, b], as n → ∞. Then under H10, , v ∈ [a, b], as n → ∞. By Theorem 1 and the continuous mapping theorem, under H10 as n → ∞. Under H20, , v ∈ [a, b], as n → ∞. Applying the continuous mapping theorem, under H20, , as n → ∞.
The proof of consistency of the tests of H10 is straightforward. To show consistency of the tests of H20, note that the derivative dΓ(v, B1)/dv = (v − a)^{−1}[β1(v) − (v − a)^{−1}B1(v)] ≥ 0 under H2m, with strict inequality for at least some v ∈ [a, b]. Hence the function Γ(v, B1) is non-decreasing with Γ(b, B1) = 0, so under H2m, Γ(v, B1) ≤ 0 with strict inequality for at least some v ∈ [a, b]. Let v0 ∈ [a, b] be such that Γ(v0, B1) < 0. Then Γ(v, B1) < 0 for v ≤ v0. Now defining , we have Γ(v, B1) < 0 for and Γ(v, B1) = 0 for v* ≤ v < b. It follows from (11) and Theorem 1 that under H2m as n → ∞ for . Thus the tests based on are consistent against H2m. Similarly, let . Then under H2a, |Γ(v, B1)| > 0 for , and |Γ(v, B1)| = 0 for . Hence under H2a as n → ∞ for , yielding tests that are consistent against H2a.
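The monotonicity argument can be visualised numerically. The sketch below assumes Γ(v, B1) = B1(v)/(v − a) − B1(b)/(b − a) with B1(v) = ∫_a^v β1(u) du; this form is inferred here from the stated derivative and from Γ(b, B1) = 0, not copied from the definition of the test. Under that assumption Γ is the running average of β1 over [a, v] minus its average over [a, b]. With an increasing β1 (the linear log hazard ratio α + βv of model (20) with the Section 5.3 values, used purely as an example) and a = 0.05, b = 1 as in Section 6, Γ should be non-positive, non-decreasing, and zero at v = b. The function name is hypothetical.

```python
import numpy as np


def gamma_process(v_grid, beta1, a, b, n_quad=2001):
    """Gamma(v, B1) = B1(v)/(v - a) - B1(b)/(b - a), with B1(v) = int_a^v beta1(u) du.

    B1 is approximated by the cumulative trapezoidal rule on a fine grid over [a, b]
    and linearly interpolated at the requested mark values (which must exceed a).
    """
    u = np.linspace(a, b, n_quad)
    f = beta1(u)
    B1_u = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(u))))
    B1_v = np.interp(v_grid, u, B1_u)
    return B1_v / (v_grid - a) - B1_u[-1] / (b - a)


def beta1(v):
    """Increasing log hazard ratio, i.e. VE(v) decreasing in v (illustrative choice)."""
    return -1.1 + 1.3 * v


a, b = 0.05, 1.0
v = np.linspace(a + 0.01, b, 200)  # start just above a to avoid dividing by zero
g = gamma_process(v, beta1, a, b)

print("max Gamma:", g.max())
print("Gamma non-decreasing:", bool(np.all(np.diff(g) >= -1e-12)))
print("Gamma(b):", g[-1])
```

For this linear β1, the exact value is Γ(v) = 0.65(v − b), which the numerical version matches up to quadrature error.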
We use the Gaussian multiplier resampling method [Lin et al. (1993)] to approximate the distribution of WB (v), v ∈ [a, b]. Let {ξki, i = 1,…, nk, k = 1,…, K} be iid standard normal random variables. Replacing each term of (26), which is asymptotically equivalent to (21), by its empirical counterpart and multiplying by ξki, we obtain , where
(22) |
where .
Following an application of Lemma 1 of Sun and Wu (2005), the distribution of WB(v), v ∈ [a, b], can be approximated by the conditional distribution of , v ∈ [a, b], given the observed data sequence, which can be obtained by repeatedly generating independent sets of {ξki, i = 1,…, nk, k = 1,…, K}. Hence, the distribution of Q(1)(v), v ∈ [a, b], under H10, can be approximated by the conditional distribution of , v ∈ [a, b], given the observed data sequence. By the continuous mapping theorem, the distribution of Q(2)(v), v ∈ [a, b], under H20, can be approximated by the conditional distribution of Γ (v, ), v ∈ [a, b], given the observed data sequence.
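The resampling step can be sketched generically. Suppose the representation (22) has been evaluated as an n × G array psi whose row i holds subject i's estimated influence contribution on a grid of G mark values in [a, b]; this input is hypothetical here, since computing it requires the estimators defined earlier in the paper. Each set of multipliers then gives a draw W*(v) = n^{−1/2} Σi ξi ψi(v), and a sup-type p-value is the proportion of draws whose supremum is at least the observed statistic. The function name, the single-stratum indexing, the supremum of absolute values, and the toy input are illustrative assumptions.

```python
import numpy as np


def multiplier_sup_pvalue(psi, observed_sup, n_draws=500, seed=0):
    """Gaussian multiplier p-value for a sup-type statistic.

    psi: (n, G) array of per-subject influence terms on a mark grid (hypothetical input).
    observed_sup: observed value of sup_v |W(v)| on the same grid.
    Returns the p-value and the (n_draws, G) array of resampled processes.
    """
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    xi = rng.standard_normal((n_draws, n))   # iid N(0, 1) multipliers, one set per draw
    w_star = xi @ psi / np.sqrt(n)           # resampled processes W*(v) on the grid
    sup_star = np.abs(w_star).max(axis=1)    # supremum over the mark grid for each draw
    return float(np.mean(sup_star >= observed_sup)), w_star


# Toy input: simulated influence terms, not data from the trial.
rng = np.random.default_rng(1)
psi = rng.standard_normal((200, 50)).cumsum(axis=1) / np.sqrt(50)
observed = np.abs(psi.sum(axis=0) / np.sqrt(200)).max()

p, w_star = multiplier_sup_pvalue(psi, observed)
print(f"multiplier p-value: {p:.3f}")
print("pointwise variance estimate at the last grid point:", w_star[:, -1].var())
```

The empirical variance of the draws at a fixed v is one way to obtain the multiplier-based variance estimate mentioned next; conditional on the data its expectation is n^{−1} Σi ψi(v)^2.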
With the Gaussian multiplier method, the variance can be consistently estimated by is the first component on the diagonal of
(23) |
Proof of Theorem 1
Let
(24) |
Following the proof of Theorem 4 of Sun and Gilbert (2012, the web Appendix (W.19)) and under nh^4 → 0,
(25) |
Hence
which, by exchanging the order of integration, equals
(26) |
Let
It follows that
(27) |
Since the kernel function K(·) has compact support on [−1, 1], (27) equals
(28) |
It can be shown that J̃n(x) converges weakly to a mean-zero Gaussian process with continuous paths. Under assumption (A.4), has bounded variation and converges uniformly to Σ(x)^{−1} for x ∈ (h, v − h). By Lemma 2 of Gilbert et al. (2008), the first term in (28) is equal to . Similar arguments show that the second and third terms in (28) are op(1). Hence,
which converges weakly to a p-dimensional mean-zero Gaussian process on v ∈ [a, b] with continuous sample paths by Lemma 1 of Sun and Wu (2005). Theorem 1 follows since WB(v) = n^{1/2}{B̂aug(v) − B(v)} − n^{1/2}{B̂aug(a) − B(a)} is a linear transformation of n^{1/2}(B̂aug(·) − B(·)).
Contributor Information
Peter B. Gilbert, Department of Biostatistics, University of Washington and Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Yanqing Sun, Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
References
- Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics. 1978;5:141–150.
- Cai Z, Fan J, Li R. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association. 2000;95:888–902.
- Cleveland WS. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 1979;74:829–836.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall; New York: 1993.
- Fauci AS, Johnston MI, Dieffenbach CW, Burton DR, Hammer SM, Hoxie JA, Martin M, Overbaugh J, Watkins DI, Mahmoud A, Greene WC. HIV vaccine research: the way forward. Science. 2008;321:530–532. doi: 10.1126/science.1161000.
- Gilbert PB, Self SG, Ashby MA. Statistical methods for assessing differential vaccine protection against human immunodeficiency virus types. Biometrics. 1998;54:799–814.
- Gilbert PB, Lele S, Vardi Y. Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika. 1999;86:27–43.
- Gilbert PB, McKeague IW, Sun Y. The two-sample problem for failure rates depending on a continuous mark: An application to vaccine efficacy. Biostatistics. 2008;9:263–276. doi: 10.1093/biostatistics/kxm028.
- Gilbert PB, Berger JO, Stablein D, Becker S, Essex M, Hammer SM, Kim JH, Degruttola VG. Statistical interpretation of the RV144 HIV vaccine efficacy trial in Thailand: A case study for statistical issues in efficacy trials. Journal of Infectious Diseases. 2011;203:969–975. doi: 10.1093/infdis/jiq152.
- Hoover DR, Rice JA, Wu CO, Yang P-L. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85:809–822.
- Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572.
- Moore PL, Ranchobe N, Lambson BE, Gray ES, Cave E, Abrahams M-R, Bandawe G, Mlisana K, Abdool Karim SS, Williamson C, Morris L; the CAPRISA 002 study and the NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI). Limited neutralizing antibody specificities drive neutralization escape in early HIV-1 subtype C infection. PLoS Pathogens. 2009;5:e1000598. doi: 10.1371/journal.ppat.1000598.
- Nickle DC, Heath L, Jensen MA, Gilbert PB, Mullins JI, Kosakovsky Pond SL. HIV-specific probabilistic models of protein evolution. PLoS ONE. 2007;2(6):e503. doi: 10.1371/journal.pone.0000503.
- Prentice RL, Kalbfleisch JD, Peterson AV, Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554.
- Rerks-Ngarm S, Pitisuttithum P, Nitayaphan S, Kaewkungwal J, Chiu J, Paris R, Premsri N, Namwat C, de Souza M, Adams E, Benenson M, Gurunathan S, Tartaglia J, McNeil JG, Francis DP, Stablein D, Birx DL, Chunsuttiwat S, Khamboonruang C, Thongcharoen P, Robb ML, Michael NL, Kunasol P, Kim JH. Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. New England Journal of Medicine. 2009;361:2209–2220. doi: 10.1056/NEJMoa0908492.
- Rolland M, Tovanabutra S, Decamp AC, Frahm N, Gilbert PB, Sanders-Buell E, Heath L, Magaret CA, Bose M, Bradfield A, O'Sullivan A, Crossler J, Jones T, Nau M, Wong K, Zhao H, Raugi DN, Sorensen S, Stoddard JN, Maust B, Deng W, Hural J, Dubey S, Michael NL, Shiver J, Corey L, Li F, Self SG, Kim J, Buchbinder S, Casimiro DR, Robertson MN, Duerr A, McElrath MJ, McCutchan FE, Mullins JI. Genetic impact of vaccination on breakthrough HIV-1 sequences from the STEP trial. Nature Medicine. 2011;17:366–371. doi: 10.1038/nm.2316.
- Rolland M, Edlefsen PT, Larsen BB, Tovanabutra S, Sanders-Buell E, Hertz T, deCamp AC, Carrico C, Menis S, Magaret CA, Ahmed H, Juraska M, Chen L, Konopa P, Nariya S, Stoddard JN, Wong K, Zhao H, Deng W, Maust BS, Bose M, Howell S, Bates A, Lazzaro M, O'Sullivan A, Lei E, Bradfield A, Ibitamuno G, Assawadarachai V, O'Connell RJ, deSouza MS, Nitayaphan S, Rerks-Ngarm S, Robb ML, McLellan JS, Georgiev I, Kwong PD, Carlson JM, Michael NL, Schief WR, Gilbert PB, Mullins JI, Kim JH. Increased HIV-1 vaccine efficacy against viruses with genetic signatures in Env V2. Nature. 2012;490:417–420. doi: 10.1038/nature11519.
- Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
- Stephens MA. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association. 1974;69:730–737.
- Sun Y, Gilbert PB. Estimation of stratified mark-specific proportional hazards models with missing marks. Scandinavian Journal of Statistics. 2012;39:34–52. doi: 10.1111/j.1467-9469.2011.00746.x.
- Sun Y, Gilbert PB, McKeague IW. Proportional hazards models with continuous marks. The Annals of Statistics. 2009;37:394–426. doi: 10.1214/07-AOS554.
- Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scandinavian Journal of Statistics. 2005;32:21–47.
- Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. Journal of the American Statistical Association. 2005;100:172–183.
- Wei X, Decker JM, Wang S, Hui H, Kappes JC, Wu X, Salazar-Gonzalez JF, Salazar MG, Kilby JM, Saag MS, Komarova NL, Nowak MA, Hahn BH, Kwong PD, Shaw GM. Antibody neutralization and escape by HIV-1. Nature. 2003;422:307–312. doi: 10.1038/nature01470.