Biostatistics (Oxford, England). 2020 Sep 24;23(2):507–521. doi: 10.1093/biostatistics/kxaa037

Immune correlates analysis using vaccinees from test negative designs

Dean A Follmann, Lori Dodd
PMCID: PMC9216615  PMID: 32968765

Summary

Determining the effect of vaccine-induced immune response on disease risk is an important goal of vaccinology. Typically, immune correlates analyses are conducted prospectively with immune response measured shortly after vaccination and subsequent disease status regressed on immune response. In outbreaks and rare disease settings, collecting samples from all vaccinees is not feasible. The test negative design is a retrospective design used to measure vaccine efficacy where symptomatic individuals who present at a clinic are assessed for relevant disease (cases) or some other disease (controls) and vaccination status ascertained. This article proposes that test negative vaccinees have immune response to vaccine assessed both for relevant (e.g., Ebola) and irrelevant (e.g., vector) proteins. If the latter immune response is unaffected by active (Ebola) infection, and is correlated with the relevant immune response, it can serve as a proxy for the immune response of interest proximal to infection. We show that logistic regression using imputed immune response as the covariate and case disease as outcome can estimate the prospective immune response slope and detail the assumptions needed for unbiased inference. The method is evaluated by simulation under various scenarios including constant and decaying immune response. A simulated dataset motivated by ring vaccination for an ongoing Ebola outbreak is analyzed.

Keywords: Imputation; Likelihood; Logistic regression; Regression calibration

1. Introduction

The test negative design is a kind of case–control study used to estimate vaccine efficacy (Fukushima and Hirota, 2017). Originally proposed for pneumococcal vaccination (Broome and others, 1980), it has been used extensively for influenza vaccine (see Jackson and Nelson, 2013) as well as rotavirus vaccine (Boom and others, 2010). The basic design uses symptomatic subjects who come to a clinic with symptoms that could be due either to the vaccine-preventable case disease (say Ebola) or to a different control disease (non-Ebola). Vaccination status is ascertained and vaccine efficacy (VE) is calculated as one minus the odds of vaccination among cases divided by the odds of vaccination among controls. This equals the prospective vaccine efficacy because the odds ratio is invariant to prospective versus case–control sampling. The test negative design requires that all symptomatic vaccinees have the same probability of a clinic visit, that an analogous condition hold for the symptomatic unvaccinated subjects, and that exposure be the same for all vaccinated and unvaccinated subjects. Numerous authors have investigated the sensitivity of the test negative design to these assumptions (Sullivan and others, 2016; Westreich and Hudgens, 2016; Fukushima and Hirota, 2017; Lewnard and others, 2018).
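As a minimal illustration of the calculation just described, the sketch below computes VE as one minus the ratio of the odds of vaccination in cases to the odds in controls. The function name and the counts are hypothetical and are not taken from any study.

```python
def vaccine_efficacy(case_vax, case_unvax, control_vax, control_unvax):
    """VE = 1 - (odds of vaccination among cases) / (odds among controls)."""
    odds_cases = case_vax / case_unvax
    odds_controls = control_vax / control_unvax
    return 1.0 - odds_cases / odds_controls


# Hypothetical counts: 10 of 100 test-positive cases and 50 of 100
# test-negative controls were vaccinated.
ve = vaccine_efficacy(case_vax=10, case_unvax=90, control_vax=50, control_unvax=50)
print(f"Estimated VE: {ve:.3f}")
```

Only the four cell counts among clinic arrivals are needed, which is what makes the design attractive when a prospective cohort is impractical.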

An important goal in vaccine development is to evaluate how vaccine-induced immune response correlates with risk of disease, known as a “correlate of risk” (Qin and others, 2007). Identification of a specific immune response that is associated with low disease risk provides a target for vaccine development. A common approach to evaluating immune correlates is to measure immune response to the vaccine shortly after vaccination, determine who gets infected during prospective follow-up, and then use logistic regression to model the probability of disease as a function of measured immune response. Ideally, such an approach would be used in outbreak settings (see Halloran and others, 2020), but at times it may not be feasible to collect samples prospectively. In addition, for rare diseases in non-outbreak settings, an enormous number of samples may need to be collected for a traditional correlates analysis. In other settings, the durability of a vaccine is of interest, and thus the immune response at the time of exposure is of interest even though this may be many years after vaccination.

This article details an approach to immune correlates analysis under a test negative design using samples collected from vaccinees proximal to exposure. To fix ideas, consider the rVSV vaccine for Ebola, which induces an immune response to Ebola antigens, say X, as well as to the vesicular stomatitis virus (VSV) vector, say W. Naively, one might measure X among the symptomatic vaccinees who visit a clinic and fit a logistic regression with Ebola as the outcome and X as the covariate. The problem is that the Ebola immune response is likely massive in those with Ebola infection. What we really want is the pure vaccine-induced X just prior to exposure in both Ebola cases and controls. As a proxy, we can use W at the clinic visit, which should be similar to W at exposure. Provided X and W are correlated, we can use W to impute the vaccine-induced X at exposure. We can then fit a logistic regression with Ebola as the outcome using the expectation of X conditional on W, E(X | W), as the covariate; this approach is known as regression calibration. If the time interval from vaccination to exposure is relatively long, the substantial decay of the immune response over time may need to be addressed. This can be accomplished either by stratification or by flexible parametric modeling of risk over study time, provided the disease is rare. We evaluate the performance of the method via simulation and illustrate its use with a simulated dataset meant to reflect the recent Ebola outbreak in the Democratic Republic of the Congo, where the rVSV Ebola vaccine was deployed using ring vaccination.

2. Formulation and models

2.1. Constant immune response

An important goal in vaccine development is to assess how immune response to the vaccine, Inline graphic, is associated with the risk of infection/disease Inline graphic. In a randomized trial, vaccinees Inline graphic have an immune response Inline graphic measured shortly after vaccination. For now, assume that Inline graphic does not change. Volunteers are followed for a period of time Inline graphic and disease status is recorded. With a test negative design, we need both the case disease and a different (control) disease for reference while the undiseased are not used. We thus define the disease of interest (case), a different disease (control), or undiseased which we denote by Inline graphic respectively. We need three possibilities for later use in a test negative design. If both diseases are acquired, we use the case disease. We assume that among vaccinees

graphic file with name Equation1.gif (2.1)

and that

graphic file with name Equation2.gif (2.2)

where Inline graphic is the conditional probability that a vaccinee who avoids the case disease develops the control disease. Note that Inline graphic is assumed free of Inline graphic as it should not have an effect on acquiring the control disease. An immune correlates analysis relating risk of disease to immune response can be conducted by fitting (2.1) in vaccinees using as outcome the indicator that Inline graphic (versus Inline graphic). For developmental simplicity, we do not incorporate baseline covariates in (2.1) though one can.

For certain settings, X may not be measured shortly after vaccination. For rare diseases it may be prohibitively costly to collect samples from all vaccinees, and in outbreak settings it may not be logistically feasible to draw samples. In other settings, the durability of a vaccine is unknown and thus X at exposure is of interest even though it may be many years since vaccination. In all these situations, a correlates analysis using the immune response proximal to exposure would be of great interest.

A design that collects information proximal to exposure is the test negative which takes symptomatic patients (Inline graphic or Inline graphic) who present at a clinic and tests them for the disease of interest and records vaccination status. Those testing positive are classified as cases (Inline graphic) while those testing negative are classified as controls (Inline graphic). Formally, Inline graphic if Inline graphic and Inline graphic, while Inline graphic if Inline graphic, and Inline graphic, where Inline graphic is the indicator a person with either disease (i.e., Inline graphic or 1) arrives at the clinic. Overall vaccine efficacy is estimated as

graphic file with name Equation3.gif (2.3)

where Inline graphic is the vaccine indicator; see Guolo (2008). The second equality follows from the invariance of the odds ratio to prospective and retrospective sampling. The test negative design can be much more efficient than a prospective study, but it does require more assumptions.

Suppose we somehow knew Inline graphic in all vaccinees who arrived at the clinic. Let Inline graphic be the arrival probability for a symptomatic vaccinee with disease status Inline graphic. Based on (2.1) and (2.2), we can derive the probability that a symptomatic vaccinee who arrives at the clinic has the case disease:

graphic file with name Equation4.gif (2.4)
graphic file with name Equation5.gif (2.5)
graphic file with name Equation6.gif (2.6)

where Inline graphic. Thus, if we fit a simple logistic regression model with Inline graphic as covariate among vaccinees in the clinic, we can recover the slope Inline graphic from (2.1). Note that this obtains even if we allow cases and controls to have different arrival probabilities. The usual test negative design requires Inline graphic for all vaccinees. We next consider the actual setting where Inline graphic must be derived from measurements at arrival. Figuring out how to get Inline graphic from test negative data is the major contribution of this article.
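The intercept-shift argument can be checked numerically. The sketch below assumes a logistic prospective model for the case disease, a constant probability of control disease among those who avoid the case disease, and separate arrival probabilities for cases and controls. All names (`beta0`, `beta1`, `pi_control`, `p_case`, `p_control`) and numeric values are assumptions of this sketch, not the article's notation. Under these assumptions the log odds of being a case among arrivals equals the prospective log odds plus the constant log(p_case / (p_control * pi_control)), so the slope is unchanged.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def prob_case_given_arrival(x, beta0, beta1, pi_control, p_case, p_control):
    """P(case | symptomatic vaccinee arrives, immune response x), from first principles."""
    risk_case = expit(beta0 + beta1 * x)               # prospective case risk
    risk_control = (1.0 - risk_case) * pi_control      # control disease among non-cases
    arrive_case = p_case * risk_case
    arrive_control = p_control * risk_control
    return arrive_case / (arrive_case + arrive_control)


# Hypothetical values chosen only for illustration.
beta0, beta1 = -4.05, -0.60
pi_control, p_case, p_control = 0.01, 1.0, 0.5

shifted_intercept = beta0 + math.log(p_case / (p_control * pi_control))
for x in (2.0, 3.0, 4.0):
    direct = logit(prob_case_given_arrival(x, beta0, beta1, pi_control, p_case, p_control))
    shifted = shifted_intercept + beta1 * x
    print(f"x = {x}: first principles {direct:.6f}, shifted intercept form {shifted:.6f}")
```

The two columns agree even though `p_case` and `p_control` differ, which mirrors the remark that the slope is recovered when cases and controls have different arrival probabilities.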

In practice, we can measure Inline graphic when subjects arrive at the clinic, say Inline graphic where Inline graphic is the arrival time. Naively fitting (2.6) using the clinic Inline graphic is problematic. For vaccinees who present with the control disease Inline graphic, Inline graphic should be close to Inline graphic where Inline graphic is just prior to infection as control disease should have little effect on the vaccine-induced immune response to the relevant vaccine antigen. But the vaccinated who present with the disease of interest Inline graphic are likely to have quite high Inline graphic due to active infection. So Inline graphic at arrival is different from Inline graphic for vaccinees with Inline graphic.

However, certain vaccines (e.g., vector based vaccines) also induce an immune response to irrelevant antigens (e.g., the vector), say Inline graphic, which should be relatively unaffected by active infection. To fix ideas, suppose that Inline graphic achieve stable values shortly after vaccination and remain constant in Inline graphic diseased controls. If Inline graphic are correlated prior to exposure, then one could predict the unadulterated Inline graphic at exposure using the Inline graphic at presentation. One could then use Inline graphic, the imputed immune response to relevant vaccine antigens at exposure, in lieu of Inline graphic. This works if Inline graphic is unaffected by case or control disease and Inline graphic are unaffected by control disease. But these requirements can be weakened. Suppose that for both groups, we observe Inline graphic and for those with control disease we observe Inline graphic where Inline graphic are errors (which can be correlated and have non-zero means). Thus Inline graphic are the pre-infection values while Inline graphic are the values observed at the clinic when subjects are diseased. If Inline graphic differs for case and control disease, however, the control model fitted on Inline graphic does not apply to cases and a more complex approach would be required.

This strategy is displayed in Figure 1. The diamonds are actual X, while the two solid bent lines are linear interpolations of the IgG antibody response to the outside of the Ebola virus, i.e., Ebola glycoprotein (GP), following rVSV vaccination for two randomly selected subjects in Prevail 1, an immunogenicity trial of the rVSV vaccine (Kennedy and others, 2016). Immune response was measured at baseline, 1 week, 1 month, 6 months, and 1 year (diamonds). While the IgG response to vector, W, was not measured, we illustrate interpolated hypothetical values by the dashed bent lines. We pretend that these two subjects were infected with malaria and Ebola, respectively. They became symptomatic and arrived at a clinic a few days after productive exposure. The X and W values for the malaria patient at arrival are similar to the values at exposure, but for the Ebola patient, only W is similar at exposure and arrival, while X (dotted line) is massive from active Ebola infection. The bottom panel illustrates hypothetical data with a correlation of 0.70 between X and W that might have been obtained from a sample of vaccinated controls who arrived at the clinic. From this relationship, we can impute X at infection for the Ebola patient.

Fig. 1.


Top panel: IgG immune responses over time for two vaccinated volunteers from Prevail 1. The solid bent line is the linear interpolation of X = IgG to Ebola GP measured at weeks 0, 1, 4, 26, and 52 (red diamonds). The dotted red line is hypothetical X reflecting Ebola infection. The dashed bent line is hypothetical W = IgG to rVSV vector. The subject infected with malaria arrives with X and W similar to those at infection. For the Ebola-infected volunteer, only W is similar to its value prior to infection. Bottom panel: a scatter plot of X and W for the controls (circles) with imputation of X for the Ebola case (solid square).

More formally, from the controls we can fit the model

graphic file with name Equation7.gif (2.7)

where Inline graphic is an error term. Using the estimated parameters, we impute Inline graphic,

graphic file with name Equation8.gif

and fit the logistic regression model in the symptomatic vaccinees

graphic file with name Equation9.gif (2.8)

This imputation is known as regression calibration and is a simple way to correct for measurement error in a covariate. Note that we impute Inline graphic even in the vaccinated controls where Inline graphic is known and thus treat both cases and controls in the same way. Using imputation in all those with Inline graphic and Inline graphic directly in all those with Inline graphic leads to substantial bias. Because we use Inline graphic, which is estimated instead of Inline graphic, which is fixed, standard errors from standard software fitting (2.8) are likely too small. A simple remedy is to use the bootstrap or use derived standard errors, see Rosner and others (1990).
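The two-step procedure can be sketched on simulated data as follows. This is a minimal illustration under assumed values, not the authors' code: the parameter values are hypothetical and chosen so that a small simulation yields many cases, the calibration model is taken to be a straight line fitted by least squares, and the logistic fit uses a hand-written Newton-Raphson routine to keep the example self-contained.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(w, x):
    """Least squares fit of x = a + b * w; returns (a, b)."""
    n = len(w)
    mw, mx = sum(w) / n, sum(x) / n
    b = sum((wi - mw) * (xi - mx) for wi, xi in zip(w, x)) / sum((wi - mw) ** 2 for wi in w)
    return mx - b * mw, b


def fit_logistic(z, y, iters=25):
    """Newton-Raphson fit of logit P(y = 1) = a + b * z; returns (a, b)."""
    mz = sum(z) / len(z)
    zc = [zi - mz for zi in z]                   # centre the covariate for stability
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(zc, y):
            p = expit(a + b * zi)
            v = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * zi
            h00 += v
            h01 += v * zi
            h11 += v * zi * zi
        det = h00 * h11 - h01 * h01
        a, b = a + (h11 * g0 - h01 * g1) / det, b + (h00 * g1 - h01 * g0) / det
    return a - b * mz, b


# Hypothetical parameters (assumptions of this sketch).
rng = random.Random(2020)
beta0, beta1, rho, pi_control = 2.0, -1.5, 0.7, 0.10
arrivals = []                                    # (w, x, case indicator) at the clinic
for _ in range(20000):
    x = rng.gauss(3.0, 0.5)                      # pre-infection response to the relevant antigen
    w = 3.0 + rho * (x - 3.0) + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 0.5)
    if rng.random() < expit(beta0 + beta1 * x):
        arrivals.append((w, x, 1))               # case disease
    elif rng.random() < pi_control:
        arrivals.append((w, x, 0))               # control disease, unrelated to x

# Step 1: fit the calibration model in controls only.
w_ctrl = [w for w, _, d in arrivals if d == 0]
x_ctrl = [x for _, x, d in arrivals if d == 0]
a_hat, b_hat = fit_line(w_ctrl, x_ctrl)

# Step 2: impute for cases and controls alike, then fit the logistic model.
y = [d for _, _, d in arrivals]
x_imputed = [a_hat + b_hat * w for w, _, _ in arrivals]
_, slope_calibrated = fit_logistic(x_imputed, y)
_, slope_direct_w = fit_logistic([w for w, _, _ in arrivals], y)
_, slope_true_x = fit_logistic([x for _, x, _ in arrivals], y)
print(f"imputed X: {slope_calibrated:.2f}, W directly: {slope_direct_w:.2f}, "
      f"true X (unobservable in practice): {slope_true_x:.2f}")
```

Because the imputed covariate is a linear function of W, the calibrated slope equals the direct-W slope divided by the fitted calibration slope. A bootstrap for standard errors would need to repeat both steps within each resample.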

Instead of using an imputed Inline graphic for regression calibration, one can perform a correlates analysis using Inline graphic directly. While Inline graphic is not part of the mechanism of protection, “correlates” analyses are often done with immune responses that may not be causal but are presumably related to the causal mechanism. As an example, smallpox vaccination involves scraping the arm with antigen. A vaccine “take” is recorded if a pox forms. While the scar itself does not protect, it is a proxy for a robust relevant immune response to the vaccine and risk of disease by scar or “take” is of interest even though it is clearly a non-mechanistic correlate of risk using the nomenclature of Plotkin and Gilbert (2012). Use of Inline graphic directly is simple, avoids the measurement error issue, requires no imputation, and has slightly better power than use of imputed Inline graphic, as we will see. Use of Inline graphic by itself, however, is not easily interpretable unless as a proxy for vaccine “take,” i.e., Inline graphic is binary and Inline graphic is like being unvaccinated.

2.2. Waning immune response

The above development is appropriate if the immune responses Inline graphic are stable during the follow-up period for the cohort study. In general, antibodies decay over time and the above approach can lead to bias. To see this, suppose that nearly everyone was vaccinated prior to an outbreak that exploded and then waned, so that the risk of the case disease decreased over time, coinciding with the antibody decay. Further suppose that the control disease rate was constant over time. Then even if Inline graphic was unrelated to Inline graphic, we would associate low Inline graphic with low risk and would tend to estimate a positive Inline graphic. The problem is even more complicated because, in general, people will be vaccinated at various times and the case disease rate might vary with time.

Let Inline graphic be the time since the start of the test negative study. To develop the time varying Inline graphic setting more formally, we will assume that the hazard for a case disease arriving at time Inline graphic with covariate Inline graphic is given by

graphic file with name Equation10.gif (2.9)

where Inline graphic is the hazard for case disease a little before Inline graphic and Inline graphic is the probability of arriving at the clinic at time Inline graphic, given a case infection just prior to Inline graphic.

We analogously assume that the hazard for the control disease is independent of Inline graphic and arbitrary, as is the instantaneous probability of arriving. Thus, the hazard for a person with control disease arriving at the clinic is

graphic file with name Equation11.gif

Recall that the probability of an event in a small interval [t,t+Inline graphic), given no event prior to Inline graphic, is approximately Inline graphic for an arbitrary hazard function Inline graphic. Under a “rare” disease assumption, we can calculate the probability that a given vaccinee who showed up at time Inline graphic was a case as

graphic file with name Equation12.gif (2.10)

where Inline graphic is the indicator of case disease for an arrival at time Inline graphic and

graphic file with name Equation13.gif

Now Inline graphic could be constant. One way this can happen is if Inline graphic and Inline graphic so Inline graphic. If so, we can simply fit logistic regression to the Inline graphic data points Inline graphic. In general, Inline graphic will not be constant so that

graphic file with name Equation14.gif (2.11)

We might specify Inline graphic where Inline graphic is say the median followup time. Or we could fit a quadratic function of log(t). In practice, different approaches to specify Inline graphic could be tried and the best one selected. Note that we only need know (or impute) Inline graphic at the time of arrival to the clinic for each vaccinee.

3. Assumptions

The above approaches recover the prospective slope for Inline graphic or Inline graphic if the model assumptions are met. In this section, we delineate the assumptions when Inline graphic is known for all, constant after time Inline graphic, the logistic-type model Inline graphic is correctly specified, and there is independence between exposure and Inline graphic. The major issue is thus whether vaccinated cases (controls) arrive with fixed probability Inline graphic (Inline graphic). Or equivalently, whether those who arrive are a representative sample of cases and controls respectively. We allow Inline graphic.

1. The vaccinees who arrive with the control disease provide a representative sample of the distribution of immune response among those vaccinated without the case disease.

This follows if Inline graphic has no impact on the probability of acquiring the control disease and no impact on the probability of arriving at the clinic so Inline graphic. Note we can allow Inline graphic to depend on latent or measured disease status or baseline covariates as long as they are independent of Inline graphic.

2. The vaccinees who arrive with the case disease provide a representative sample of the distribution of immune response among those vaccinated with the case disease.

graphic file with name Equation15.gif (3.12)

The last equality follows if Inline graphic has no additional impact on Inline graphic given severity which seems a reasonable assumption. Using the last equality, we can see that Inline graphic does not depend on Inline graphic if either Inline graphic is free of Inline graphic or Inline graphic is free of Inline graphic. We discuss each in turn.

Now Inline graphic if severity does not impact health seeking behavior. This might occur if all cases are equally encouraged to arrive at a clinic and all cases are equally compliant, or if the severity gradient is modest enough that it does not change behavior.

Next consider when Inline graphic i.e., Inline graphic has no impact on severity given a vaccinee is infected. While this obtains if the vaccine’s mechanism of action is all-or-none, it is a weaker condition. An all-or-none vaccine implies that the distribution of severity is unchanged by vaccination so that Inline graphic, where Inline graphic is the vaccine indicator. But we only require Inline graphic which allows Inline graphic.

If neither assumption is plausible but case disease severity is observed, there is a remedy provided disease follows the prospective logistic regression model:

graphic file with name Equation16.gif (3.13)

for Inline graphic Then even if Inline graphic affects the arrival probability through the observable disease severity, recovery of Inline graphic using data Inline graphic follows from arguments analogous to those used to derive (2.6). One can show that (3.13) implies the distribution of disease severity among the vaccinated infecteds, Inline graphic, varies with Inline graphic reflecting a kind of disease modifying vaccine.

These conditions can be relaxed using baseline covariates Inline graphic. If the prospective model (2.1) controls for confounding by specifying Inline graphic, then the above arguments go through, even if the arrival probabilities satisfy Inline graphic. One can show that Inline graphic with Inline graphic, and thus Inline graphic remains the coefficient for Inline graphic.

To summarize, assuming the verity of (3.12) and a correctly specified model for Inline graphic, we can recover Inline graphic even if Inline graphic depend on Inline graphic unless disease severity is unobservable, Inline graphic depends on Inline graphic, and Inline graphic depends on Inline graphic.

4. Simulations

4.1. Constant immune response

We illustrate performance for the simple case where antibody does not decay. We prospectively generate data for vaccinees which are then sampled retrospectively as in a test negative design. We assume a population of size Inline graphic vaccinees and generate the case disease according to

graphic file with name Equation17.gif

and sequentially generate control disease among the case uninfecteds as

graphic file with name Equation18.gif

We specify X as Gaussian with mean = 3.00 and standard deviation = 0.50, as in the Prevail 1 vaccine trial (Kennedy and others, 2016). We generate

graphic file with name Equation19.gif

where Inline graphic is standard normal. Thus, X and W are bivariate normal with common mean 3.00, common standard deviation 0.50, and correlation ρ. We set ρ = 0.40, 0.70, and 1.00. For reference, a study estimated a correlation of 0.70 between ELISA OD immune response readouts for VSV proteins (W) and Ebola proteins (X) at 56 days post vaccination (Poetsch and others, 2018).
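One construction that yields the stated joint distribution is sketched below. It is an assumption of this sketch rather than a transcription of the article's generating equation: W is formed from X plus independent noise so that both have mean 3.00, standard deviation 0.50, and the target correlation. The names `draw_pair` and `correlation` are hypothetical helpers.

```python
import math
import random


def draw_pair(rho, rng, mean=3.0, sd=0.5):
    """Draw (X, W) bivariate normal with common mean and SD and correlation rho."""
    x = rng.gauss(mean, sd)
    e = rng.gauss(0.0, 1.0)                      # independent standard normal
    w = mean + rho * (x - mean) + sd * math.sqrt(1.0 - rho ** 2) * e
    return x, w


def correlation(pairs):
    """Sample Pearson correlation of a list of (x, w) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    mw = sum(w for _, w in pairs) / n
    sxw = sum((x - mx) * (w - mw) for x, w in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sww = sum((w - mw) ** 2 for _, w in pairs)
    return sxw / math.sqrt(sxx * sww)


rng = random.Random(0)
for rho in (0.40, 0.70, 1.00):
    pairs = [draw_pair(rho, rng) for _ in range(50000)]
    print(f"target correlation {rho:.2f}, sample correlation {correlation(pairs):.3f}")
```

With a correlation of 1.00 the noise term vanishes and W coincides with X, which is the sense in which that setting serves as a benchmark.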

We specify Inline graphic as (−5.51, 0.00), (−4.05, −0.60), and (−1.12, −1.81). These correspond to no, modest, and strong effects of X. To help interpret Inline graphic, the ratios of risk of case disease for the 1st versus 8th octile of X are 1, 2, and 10, respectively, for Inline graphic. We set the conditional risk of control disease as Inline graphic, Inline graphic and Inline graphic so that vaccinated cases and controls arrive with equal probability.

We evaluate estimation of the model

graphic file with name Equation20.gif (4.14)

where IR, the immune response, is Inline graphic, Inline graphic, or Inline graphic

Table 1 presents the results using 10 000 simulated studies per scenario. Columns 4 and 5 provide the sample mean and variance of Inline graphic. Column 6 reports the rejection rate for the two-sided Inline graphic Wald test using the Monte Carlo standard error over the 10 000 simulated studies. We see that we have power of around 100% to test the effect of IR when Inline graphic for all scenarios, while for Inline graphic we need a correlation of 0.70 to approach 80% power. The type I error rate seems consistent with the Inline graphic used for testing.

Table 1.

Simulated performance of logistic regression using different covariates for the immune response (IR): X, the imputed X, or W. Sample statistics for the parameter estimates are presented. The last two columns give the average number of symptomatic vaccinees who arrive and the average number of vaccinees who arrive with the case disease. In the last row, vaccinated controls arrive with half the probability of vaccinated cases; in all other rows, the two arrival probabilities are equal. Each row is based on 10 000 simulated studies.

True slope  Correlation  IR         Mean est.  Var est.  Rejection rate  Avg. arrivals  Avg. cases
 0.000      1.000        X          −0.002     0.019     0.053           1049           302
−0.600      1.000        X          −0.602     0.024     0.973            972           225
−1.810      1.000        X          −1.821     0.040     1.000            908           160
 0.000      0.700        imputed X  −0.001     0.040     0.048           1048           302
−0.600      0.700        imputed X  −0.605     0.049     0.786            973           225
−1.810      0.700        imputed X  −1.826     0.076     1.000            909           160
 0.000      0.400        imputed X   0.003     0.122     0.049           1049           302
−0.600      0.400        imputed X  −0.603     0.152     0.332            973           225
−1.810      0.400        imputed X  −1.827     0.232     0.974            909           160
 0.000      0.700        W          −0.001     0.020     0.048           1048           302
−0.600      0.700        W          −0.423     0.024     0.790            973           225
−1.810      0.700        W          −1.274     0.035     1.000            909           160
 0.000      0.400        W           0.001     0.019     0.048           1049           302
−0.600      0.400        W          −0.239     0.023     0.344            973           225
−1.810      0.400        W          −0.725     0.032     0.984            909           160
−0.600      1.000        X          −0.604     0.031     0.938            599           225

Regression calibration recovers the true Inline graphic even with Inline graphic. In the Appendix, we derive the correct (marginalized) probability of Inline graphic given Inline graphic under our assumptions. We show that this curve as a function of Inline graphic (with slope Inline graphic) is virtually identical to the regression calibration curve (with the same slope Inline graphic), irrespective of Inline graphic. This fidelity explains the excellent recovery of Inline graphic with regression calibration. Regression calibration does become more biased as Inline graphic increases, but for these simulations, Inline graphic is relatively small (Follmann and others, 1999).

Intuitively, as Inline graphic decreases estimation becomes more difficult. For Inline graphic, the variance for Inline graphic under regression calibration is 0.024, 0.049, and 0.152 for Inline graphic = 1.00, 0.70, and 0.40, respectively. So the variance roughly doubles as Inline graphic goes from 1.00 to 0.70 and triples as Inline graphic goes from 0.70 to 0.40. In contrast, with direct use of Inline graphic the variance of Inline graphic stays constant at about 0.023 or 0.024 for all Inline graphic with Inline graphic. The use of Inline graphic in lieu of Inline graphic, of course, leads to a smaller estimate of Inline graphic but this is counterbalanced by the smaller variance. The upshot is that the power for testing Inline graphic is nearly identical for regression calibration and direct use of Inline graphic for all scenarios.
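The pattern in these numbers can be checked with simple arithmetic. The sketch below uses values transcribed from Table 1 for the rows with the modest slope (magnitude 0.600). The comparison targets are a heuristic added here, not a result stated in the article: when X and W are bivariate normal with a common standard deviation, the imputed X is a linear function of W with slope equal to the correlation, so one would expect the direct-W estimate to be roughly the correlation times the true slope, and the regression calibration variance to scale roughly with one over the squared correlation.

```python
# Variances of the slope estimate under regression calibration, by correlation (Table 1).
var_calibration = {1.00: 0.024, 0.70: 0.049, 0.40: 0.152}
# Magnitudes of the mean slope estimate when W is used directly (Table 1).
mean_direct_w = {0.70: 0.423, 0.40: 0.239}
true_slope_magnitude = 0.600

ratio_100_to_070 = var_calibration[0.70] / var_calibration[1.00]
ratio_070_to_040 = var_calibration[0.40] / var_calibration[0.70]
print(f"variance ratio, 1.00 to 0.70: {ratio_100_to_070:.2f} "
      f"(heuristic {(1.00 / 0.70) ** 2:.2f})")
print(f"variance ratio, 0.70 to 0.40: {ratio_070_to_040:.2f} "
      f"(heuristic {(0.70 / 0.40) ** 2:.2f})")

for rho, observed in mean_direct_w.items():
    expected = rho * true_slope_magnitude
    print(f"correlation {rho:.2f}: direct-W mean {observed:.3f}, heuristic {expected:.3f}")
```

The smaller direct-W estimate and its smaller variance offset each other, which is consistent with the near-identical power reported for the two approaches.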

In the Appendix, we provide additional simulations with sample sizes 1/10th and 10 times that of Table 1. The three approaches have less power and more variability with the smaller sample size and the opposite with larger sample size. Regression calibration continues to be nearly unbiased for the larger sample size, though it has some bias for the smaller sample size.

The above simulations were conducted under the standard assumptions of the test negative design. In the final row, we set Inline graphic while Inline graphic so that vaccinees with the control disease have half the arrival probability of the vaccinated cases, which violates the standard test negative assumption. We set Inline graphic and evaluate Inline graphic so that the second line of Table 1, which has Inline graphic provides a reference. As expected, we see the estimate of Inline graphic is still unbiased, though the variance for Inline graphic is slightly increased due to fewer control vaccinees.

4.2. Waning immune response

In this subsection, we provide a brief evaluation of the model (2.11). To focus on the issue of antibody decay, we assume that Inline graphic is measured without error, and that everyone with either disease arrives at a clinic. We specify the instantaneous conditional risk for control disease using a Weibull hazard,

graphic file with name Equation21.gif

where Inline graphic is the time since start of the test negative study. When Inline graphic (Inline graphic) the hazard is increasing (decreasing). The instantaneous conditional risk of case disease is specified as

graphic file with name Equation22.gif

For Inline graphic, we specify linear antibody decay according to a random effects model. For person Inline graphic the decay is

graphic file with name Equation23.gif

where Inline graphic are independent normals with mean 0 and standard deviations Inline graphic, respectively. The parameter Inline graphic reflects both natural variation in immune response to vaccination as well as variation in vaccination times as those vaccinated long ago would tend to have smaller Inline graphic.

We generate times to case and control disease by the inverse cumulative distribution function method where we generate Inline graphic, a uniform [0,1] random variable and then determine the disease arrival time Inline graphic that solves Inline graphic for Inline graphic = 0,1. For the case disease, this is solved by numerically minimizing Inline graphic. For subjects who experience both case and control events, we use the first and discard the second. We set Inline graphic vaccinees and follow subjects until we observe 1000 first events.
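A minimal sketch of inverse cumulative distribution function sampling follows. The Weibull parameterization (cumulative hazard `scale * t ** shape`) and the parameter values are assumptions of this sketch, since the article's hazard equations are not reproduced here. For the general case the sketch swaps in bisection root-finding in place of the numerical minimization mentioned in the text; both target the same solution.

```python
import math
import random


def weibull_time(u, scale, shape):
    """Closed-form solution of 1 - exp(-scale * t ** shape) = u."""
    return (-math.log(1.0 - u) / scale) ** (1.0 / shape)


def solve_time(u, cum_hazard, t_max=1e6, tol=1e-10):
    """Bisection for t with 1 - exp(-cum_hazard(t)) = u, for any increasing cum_hazard."""
    target = -math.log(1.0 - u)
    if cum_hazard(t_max) < target:
        return math.inf                          # no event within the horizon
    lo, hi = 0.0, t_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if cum_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(1)
u = rng.random()                                 # uniform draw on [0, 1)
scale, shape = 0.01, 1.25                        # hypothetical Weibull parameters
t_closed = weibull_time(u, scale, shape)
t_numeric = solve_time(u, lambda t: scale * t ** shape)
print(f"u = {u:.4f}, closed form t = {t_closed:.6f}, bisection t = {t_numeric:.6f}")
```

The closed form is available when the cumulative hazard can be inverted analytically; a cumulative hazard that depends on a decaying immune response generally needs the numerical route.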

For all scenarios, we set Inline graphic, (Inline graphic, and Inline graphic. Our reference case is Inline graphic, or constant risk of each disease and Inline graphic, Inline graphic. We then vary these parameters to see their impact on performance. The hazards and distribution of arrival times are graphed in the Supplementary material available at Biostatistics online.

We evaluate two different modeling strategies for (2.11). The first is to treat Inline graphic as fixed with no dependence on time. The second is to evaluate five models and pick the one with the best Akaike Information Criterion (AIC). The five models treat Inline graphic as constant (1), linear in Inline graphic (2), quadratic in Inline graphic (3), linear in Inline graphic (4), and quadratic in Inline graphic (5). This second strategy is meant to mimic how a data analyst might address a non-constant Inline graphic. To summarize performance, we calculate the sample mean and variance of the estimate of Inline graphic under modeling strategies 1 and 2. We also report the modal model choice of strategy 2 and the number of times that modal choice was selected. Additionally, we report the number of cases and controls for the last data set.
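The second strategy can be sketched as below. Several details are assumptions of this sketch: the time terms are taken to be polynomials in t (models 2 and 3) and in log(t / m) with m the median arrival time (models 4 and 5), the data are simulated from a hypothetical truth in which the time effect is linear in log t, and the logistic fit is a small hand-written Newton-Raphson routine rather than any particular statistics package.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=12):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def time_terms(model, t, m):
    """Time covariates beyond the intercept for candidate models 1 to 5."""
    s = math.log(t / m)
    return {1: [], 2: [t], 3: [t, t * t], 4: [s], 5: [s, s * s]}[model]


# Hypothetical data: arrival time t, immune response x, case indicator.
rng = random.Random(7)
data = []
for _ in range(1500):
    t = rng.uniform(0.1, 2.0)
    x = rng.gauss(3.0, 0.5)
    eta = 4.5 + 1.5 * math.log(t) - 1.5 * x      # assumed truth: linear in log t
    data.append((t, x, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))

m = sorted(t for t, _, _ in data)[len(data) // 2]    # median arrival time
y = [d for _, _, d in data]
aic, slope = {}, {}
for model in range(1, 6):
    rows = [[1.0, x] + time_terms(model, t, m) for t, x, _ in data]
    beta, loglik = fit_logistic(rows, y)
    aic[model] = 2 * len(beta) - 2 * loglik
    slope[model] = beta[1]                       # coefficient of the immune response
best = min(aic, key=aic.get)
print(f"AIC by model: { {k: round(v, 1) for k, v in aic.items()} }")
print(f"selected model {best}, immune response slope {slope[best]:.2f}")
```

Selecting by AIC and then reporting the slope from the chosen model ignores model-selection uncertainty, so this should be read as a description of the workflow rather than a recommendation for inference.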

Table 2 presents the summary results. We see that both modeling strategies 1 and 2 recover Inline graphic very well for scenario 1, our reference, and scenario 2, where Inline graphic has more variability and a steeper mean decline. Both scenarios 1 and 2 have constant hazards for both case and control disease. For non-constant hazards, modeling strategy 1 is biased, with bias of around Inline graphic10%. Modeling strategy 2 has minimal bias for non-constant hazards and picks Inline graphic as a linear function of Inline graphic in about 70–80% of the simulations using AIC.

Table 2.

Performance of logistic regression based strategies for dealing with the nuisance function Inline graphic. Strategy 1: Fix Inline graphic at Inline graphic. Strategy 2: Choose Inline graphic based on AIC. The sample average and variance over the simulations are given for Inline graphic under the two strategies. AIC choice reports the modal choice for strategy 2 while Inline graphic are the number of cases and controls for the last simulated dataset. The true value of Inline graphic. Each row is based on 1000 simulated studies.

| Scenario | Inline graphic | Inline graphic | Inline graphic | Inline graphic | Inline graphic | Strategy 1: Inline graphic | Strategy 1: Inline graphic | Strategy 2: Inline graphic | Strategy 2: Inline graphic | AIC choice | # times chosen | Inline graphic | Inline graphic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 0.5 | 0.1 | Inline graphic 0.5 | Inline graphic 1.47 | 0.0204 | Inline graphic 1.47 | 0.0209 | 1 | 735 | 426 | 574 |
| 2 | 1.00 | 1.00 | 1.0 | 0.5 | Inline graphic 1.0 | Inline graphic 1.48 | 0.0100 | Inline graphic 1.48 | 0.0102 | 1 | 693 | 629 | 371 |
| 3 | 1.25 | 0.75 | 0.5 | 0.1 | Inline graphic 0.5 | Inline graphic 1.59 | 0.0268 | Inline graphic 1.48 | 0.0271 | 4 | 738 | 235 | 765 |
| 4 | 0.75 | 1.25 | 0.5 | 0.1 | Inline graphic 0.5 | Inline graphic 1.35 | 0.0213 | Inline graphic 1.48 | 0.0239 | 4 | 780 | 618 | 382 |
| 5 | 1.50 | 0.50 | 0.5 | 0.1 | Inline graphic 0.5 | Inline graphic 1.61 | 0.0858 | Inline graphic 1.49 | 0.0927 | 4 | 770 | 66 | 934 |
| 6 | 0.50 | 1.50 | 0.5 | 0.1 | Inline graphic 0.5 | Inline graphic 1.29 | 0.0324 | Inline graphic 1.48 | 0.0392 | 4 | 828 | 829 | 171 |

Strictly speaking, (2.10) holds under a “rare” disease assumption. In the above simulations, 10% of the vaccinees acquired disease with no noticeable bias. Further simulations showed about 7% bias for scenario 2 with 30% disease acquisition.

5. Example

In the 2018–2020 outbreak of Ebola in the Democratic Republic of the Congo, a ring vaccination campaign was conducted. Local surveillance teams kept track of new Ebola cases, which were then relayed to a WHO vaccination team that vaccinated the contacts and contacts of contacts of the index case. Several hundred thousand at risk people were thus vaccinated. Following vaccination, the vaccinees were monitored and encouraged to visit an Ebola transit/treatment center once any symptoms developed. The ring campaign used the rVSV vaccine, which is estimated to have quite high efficacy though breakthrough infections occur (WHO, 2019; Henao-Restrepo and others, 2015). The durability of the vaccine and the impact of immune response on risk of disease are unknown. Prospectively collecting samples in all vaccinees for correlates analysis has not been done.

To illustrate our approach, we simulate a dataset meant to loosely approximate a large Ebola ring vaccination campaign with a vaccine whose substantial efficacy is modulated by immune response. We assume that each person’s immune response is relatively stable over time. We assume 150 000 at risk subjects are vaccinated and generate case and control infections as in the previous section with Inline graphic. We set Inline graphic, the conditional risk of control disease (e.g., malaria) equal to 0.005 and assume that vaccinated subjects arrive at the transit center with fixed probability Inline graphic, as one might expect from regular monitoring and encouragement. Note that with regular monitoring of all subjects in a ring, the assumption that Inline graphic for vaccinees may hold, thus even if rVSV does modify disease, the estimated risk of disease as a function of immune response should be unbiased.

To provide additional context, we assume 300 000 unvaccinated subjects are at risk (e.g., have first or second degree contact with an Ebola case) and generate case infections with probability 0.01. We generate control infections from this set with conditional probability 0.005 just as for the vaccinated. We assume that the unvaccinated contacts arrive with probability 0.25 for both case and control disease, irrespective of severity. Finally, we assume that 20% of the arrivals occur within 35 days of contact with an Ebola case. For vaccinees, this should ensure a relatively stable vaccine-induced immune response at arrival.

For this single simulated dataset, a total of 114 unvaccinated contacts of Ebola cases arrived at least 35 days post contact. The proportion with Ebola Virus Disease (EVD) was 0.69. In contrast, 127 vaccinated contacts of an Ebola case arrived at least 35 days post contact with an EVD proportion of 0.28. Thus the overall test negative VE for these late arrivals is estimated as

$$\widehat{VE} = 1 - \frac{0.28/(1-0.28)}{0.69/(1-0.69)} \approx 0.83.$$

The above VE estimate is unbiased under the standard test negative assumptions: vaccinees (unvaccinated) have fixed probability Inline graphic (Inline graphic) of arriving at a transit center, irrespective of true disease status and disease severity, and vaccinees and unvaccinated have equal exposure. This latter assumption may be questioned in a ring study where the vaccinees are all known contacts of a case.

Figure 2 displays the data, jittered for the unvaccinated for whom we set Inline graphic to 0, along with the fitted logistic regression model Inline graphic where Inline graphic is the immune response covariate, Inline graphic, Inline graphic, or Inline graphic. For the unvaccinated, we draw a dashed line at the observed Ebola disease rate of 0.69 for reference. Table 3 provides the estimated parameters. We see that, as expected, using Inline graphic results in a smaller Inline graphic than use of Inline graphic and that the Wald statistic for testing Inline graphic for Inline graphic is identical to the Wald statistic based on Inline graphic, when we use the naive standard error. We also provide the bootstrap standard error and Wald statistic for regression calibration based on 1000 resamples. The resultant standard errors are slightly larger than the naive standard errors. The two-sided p-values for testing an effect of Inline graphic and Inline graphic on disease are, respectively, 0.02 and 0.01. Recall that the estimates of Inline graphic are unbiased even if the arrival probability for vaccinees depends on latent severity, provided the vaccine has an all-or-none mechanism. Estimates of Inline graphic are also unbiased even if the vaccine modifies latent disease severity, provided the arrival probability is the same for all vaccinees with the case disease.

Fig. 2.

Simulated data to illustrate an immune correlates analysis based on a test negative design for the 2018–2020 Ebola outbreak in the DRC. Unvaccinated subjects’ data are shown as jittered open circles near 0 on the x-axis; their Ebola disease rate of 0.69 is given by the dashed line. Vaccinated subjects’ data are given by diamonds. The estimated probability of disease for vaccinees is given by the solid curve. The three panels correspond to using Inline graphic, Inline graphic, and Inline graphic as the immune response, respectively.

Table 3.

Parameter estimates for a simulated data set with 114 unvaccinated and 127 vaccinated subjects. We evaluate Inline graphic, Inline graphic, and Inline graphic as the immune response covariate.

| parm | IR = Inline graphic: est | se | Wald | IR = Inline graphic: est | se Inline graphic | Wald Inline graphic | se Inline graphic | Wald Inline graphic | IR = Inline graphic: est | se | Wald |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Inline graphic | 3.68 | 1.28 | 2.86 | 3.28 | 1.68 | 1.95 | 1.70 | 1.93 | 2.07 | 1.20 | 1.72 |
| Inline graphic | Inline graphic 1.64 | 0.46 | Inline graphic 3.54 | Inline graphic 1.45 | 0.58 | Inline graphic 2.49 | 0.60 | Inline graphic 2.40 | Inline graphic 1.04 | 0.42 | Inline graphic 2.49 |

Inline graphic Naive standard errors; Inline graphic Bootstrap standard errors.
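The Wald statistics in Table 3 are the estimate divided by its standard error, and the two-sided p-values quoted in the text follow from the standard normal distribution. A minimal check is sketched below; the function names are illustrative, and the slope is taken as negative, an assumption made here because higher immune response is associated with lower disease risk. Small differences from the tabulated Wald values reflect rounding of the reported estimates.

```python
import math

def wald(est, se):
    """Wald statistic: estimate divided by its standard error."""
    return est / se

def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Slope for the regression calibration column, sign assumed negative
slope = -1.45
z_naive = wald(slope, 0.58)   # naive standard error
z_boot = wald(slope, 0.60)    # bootstrap standard error

# p-values from the tabulated Wald statistics (2.40 bootstrap, 2.49 third column)
p_boot = two_sided_p(2.40)
p_third = two_sided_p(2.49)
```

Rounded to two decimals, `p_boot` and `p_third` agree with the reported p-values of 0.02 and 0.01.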

To help interpret the magnitude of the effect of the imputed immune response on risk of disease, we can compute the ratio of the risk of disease at the 10th versus 90th percentile of the observed distribution of Inline graphic. These percentiles are 2.51 and 3.39, respectively, with associated probabilities of disease of 0.42 and 0.16 for these late vaccinated arrivals. Thus the risk is roughly 2.5 times greater at the 10th versus 90th percentile of Inline graphic, indicating a substantial effect of antibody on EVD risk.
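The percentile comparison can be reproduced approximately from the fitted logistic model. The sketch below plugs in the rounded regression calibration estimates from Table 3 (intercept 3.28, slope magnitude 1.45); the slope is taken as negative, an assumption consistent with the lower disease probability at the higher percentile, and the variable names are illustrative.

```python
import math

def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

b0, b1 = 3.28, -1.45          # rounded estimates; negative slope is an assumption
q10, q90 = 2.51, 3.39         # 10th and 90th percentiles of the imputed response

p10 = expit(b0 + b1 * q10)    # disease probability at the 10th percentile
p90 = expit(b0 + b1 * q90)    # disease probability at the 90th percentile
risk_ratio = p10 / p90
```

Because the inputs are rounded, `p10` comes out near 0.41 rather than the reported 0.42, while `p90` is near 0.16 and the ratio is close to the quoted value of roughly 2.5.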

6. Discussion

This research arose from the design of an immune correlates study using test negative data for the 2018–2020 Democratic Republic of the Congo (DRC) Ebola outbreak. The main contribution of this article is the idea that the effect of the relevant immune response Inline graphic on the risk of disease can be estimated by using an irrelevant immune response Inline graphic. We use the proxy Inline graphic to impute Inline graphic based on a prediction model that is estimated in vaccinated controls using samples collected upon arrival at a clinic. The method extends to settings where antibody substantially decays over time. Simulations demonstrated the feasibility of the proposed methods.

This approach applies beyond Ebola. The key requirement is that the vaccine of interest produce irrelevant immune responses that are unaffected by the disease of interest, and that these immune responses are correlated with the immune response of interest. One such application is to extremely rare outcomes, such as Zika-induced congenital birth defects. Suppose Puerto Rico begins Zika vaccination in 2025 and an outbreak sweeps the island in 2040. Since Zika-induced birth defects are rare, a prospective immune correlates design would require long-term storage of samples from hundreds of thousands of women just after vaccination. Furthermore, the values collected from 2025 onward would not address the question of the immune response at Zika exposure in 2040. But using the methods of this article we could sample cases (mothers of children with birth defects who are antibody positive for non-vaccine Zika antigens) and controls (mothers of children with birth defects who are antibody negative for non-vaccine Zika antigens). We could then apply a test negative design to assess overall VE and also perform an immune correlates analysis with Inline graphic and Inline graphic, provided that a Inline graphic that predicts Inline graphic were available.

We focused on Inline graphic or the immune response proximal to infection. However, traditional immune correlates analyses focus on Inline graphic shortly after vaccination. Our methods for time constant immune response directly apply as here Inline graphic. If antibody decay occurs, one could in principle apply (2.10) with Inline graphic replaced by Inline graphic, the predicted value of Inline graphic shortly after vaccination. This would require the development of models to predict Inline graphic.

The biggest challenge in this approach is whether the biology is amenable. Some vaccines won’t induce a Inline graphic at all. Other times, even if Inline graphic is induced the correlation may be too weak for this method to work. This may be more of a problem with pre-existing immunity to either the antigens of the vaccine or the vector. On the other hand, it is not really required that Inline graphic be an immune response to the vector. What is required is the existence of covariates Inline graphic that predict Inline graphic among vaccinees and that Inline graphic. Development of such methods is left to future work.

Acknowledgments

This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). We thank Jing Qin and Michael Fay for helpful comments.

Conflict of Interest: None declared.

7. Software

The code to reproduce the example is given at https://github.com/follmand/Test-Negative-Immune-Correlates.

Supplementary materials

Supplementary material is available at http://biostatistics.oxfordjournals.org.

References

  1. Boom, J. A., Tate, J. E., Sahni, L. C., Rench, M. A., Hull, J. J., Gentsch, J. R., Patel, M. M., Baker, C. J. and Parashar, U. D. (2010). Effectiveness of pentavalent rotavirus vaccine in a large urban population in the United States. Pediatrics 125, e199–e207.
  2. Broome, C. V., Facklam, R. R. and Fraser, D. W. (1980). Pneumococcal disease after pneumococcal vaccination: an alternative method to estimate the efficacy of pneumococcal vaccine. New England Journal of Medicine 303, 549–552.
  3. Follmann, D. A., Hunsberger, S. A. and Albert, P. S. (1999). Repeated probit regression when covariates are measured with error. Biometrics 55, 403–409.
  4. Fukushima, W. and Hirota, Y. (2017). Basic principles of test-negative design in evaluating influenza vaccine effectiveness. Vaccine 35, 4796–4800.
  5. Guolo, A. (2008). A flexible approach to measurement error correction in case–control studies. Biometrics 64, 1207–1214.
  6. Halloran, M. E., Longini, I. M. and Gilbert, P. B. (2020). Designing a study of correlates of risk for Ebola vaccination. American Journal of Epidemiology 189, 747–754.
  7. Henao-Restrepo, A. M., Longini, I. M., Egger, M., Dean, N. E., Edmunds, W. J., Camacho, A., Carroll, M. W., Doumbia, M., Draguez, B., Duraffour, S. et al. (2015). Efficacy and effectiveness of an rVSV-vectored vaccine expressing Ebola surface glycoprotein: interim results from the Guinea ring vaccination cluster-randomised trial. The Lancet 386, 857–866.
  8. Jackson, M. L. and Nelson, J. C. (2013). The test-negative design for estimating influenza vaccine effectiveness. Vaccine 31, 2165–2168.
  9. Kennedy, S. B., Neaton, J. D., Lane, H. C., Kieh, M. W. S., Massaquoi, M. B. F., Touchette, N. A., Nason, M. C., Follmann, D. A., Boley, F. K., Johnson, M. P. et al. (2016). Implementation of an Ebola virus disease vaccine clinical trial during the Ebola epidemic in Liberia: design, procedures, and challenges. Clinical Trials 13, 49–56.
  10. Lewnard, J. A., Tedijanto, C., Cowling, B. J. and Lipsitch, M. (2018). Measurement of vaccine direct effects under the test-negative design. American Journal of Epidemiology 187, 2686–2697.
  11. Plotkin, S. A. and Gilbert, P. B. (2012). Nomenclature for immune correlates of protection after vaccination. Clinical Infectious Diseases 54, 1615–1617.
  12. Poetsch, J. H., Dahlke, C., Zinser, M. E., Kasonta, R., Lunemann, S., Rechtien, A., Ly, M. L., Stubbe, H. C., Krähling, V., Biedenkopf, N. et al. (2018). Detectable vesicular stomatitis virus (VSV)-specific humoral and cellular immune responses following VSV-Ebola virus vaccination in humans. The Journal of Infectious Diseases 219, 556–561.
  13. Qin, L., Gilbert, P. B., Corey, L., McElrath, M. J. and Self, S. G. (2007). A framework for assessing immunological correlates of protection in vaccine trials. The Journal of Infectious Diseases 196, 1304–1312.
  14. Rosner, B., Spiegelman, D. and Willett, W. C. (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 132, 734–745.
  15. Sullivan, S. G., Tchetgen Tchetgen, E. J. and Cowling, B. J. (2016). Theoretical basis of the test-negative study design for assessment of influenza vaccine effectiveness. American Journal of Epidemiology 184, 345–353.
  16. Westreich, D. and Hudgens, M. G. (2016). Invited commentary: beware the test-negative design. American Journal of Epidemiology 184, 354–356.
  17. WHO. (2019). Preliminary Results on the Efficacy of rVSV-ZEBOV-GP Ebola Vaccine using the Ring Vaccination Strategy in the Control of an Ebola Outbreak in the Democratic Republic of the Congo: An Example of Integration of Research into Epidemic Response. https://reliefweb.int/report/democratic-republic-congo/preliminary-results-efficacy-rvsv-zebov-gp-ebola-vaccine-using-ring.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
