Summary
Determining the effect of vaccine-induced immune response on disease risk is an important goal of vaccinology. Typically, immune correlates analyses are conducted prospectively with immune response measured shortly after vaccination and subsequent disease status regressed on immune response. In outbreaks and rare disease settings, collecting samples from all vaccinees is not feasible. The test negative design is a retrospective design used to measure vaccine efficacy where symptomatic individuals who present at a clinic are assessed for relevant disease (cases) or some other disease (controls) and vaccination status ascertained. This article proposes that test negative vaccinees have immune response to vaccine assessed both for relevant (e.g., Ebola) and irrelevant (e.g., vector) proteins. If the latter immune response is unaffected by active (Ebola) infection, and is correlated with the relevant immune response, it can serve as a proxy for the immune response of interest proximal to infection. We show that logistic regression using imputed immune response as the covariate and case disease as outcome can estimate the prospective immune response slope and detail the assumptions needed for unbiased inference. The method is evaluated by simulation under various scenarios including constant and decaying immune response. A simulated dataset motivated by ring vaccination for an ongoing Ebola outbreak is analyzed.
Keywords: Imputation; Likelihood; Logistic regression; Regression calibration
1. Introduction
The test negative design is a kind of case–control study used to estimate vaccine efficacy (Fukushima and Hirota, 2017). Originally proposed for pneumococcal vaccination (Broome and others, 1980), it has been used extensively for influenza vaccine (see Jackson and Nelson, 2013) as well as rotavirus vaccine (Boom and others, 2010). The basic design uses symptomatic subjects who come to a clinic with symptoms that could be either the vaccine preventable case disease (say Ebola) or a different control disease (non-Ebola). Vaccination status is ascertained and vaccine efficacy (VE) is calculated as one minus the odds of vaccination among cases divided by the odds of vaccination among controls. This equals the prospective vaccine efficacy due to the invariance of the odds ratio for prospective or case–control sampling. The test negative study requires that all symptomatic vaccinees have the same probability of a clinic visit, an analogous condition for the symptomatic unvaccinated subjects, and that exposure be the same for all vaccinated and unvaccinated subjects. Numerous authors have investigated the sensitivity of the test negative design to these assumptions (Sullivan and others, 2016; Westreich and Hudgens, 2016; Fukushima and Hirota, 2017; Lewnard and others, 2018).
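As a minimal sketch of the odds-ratio calculation just described, the Python function below computes the test negative VE from a 2×2 table of clinic arrivals. The function name, argument names, and the counts in the usage lines are hypothetical and are not taken from any study.

```python
def tnd_vaccine_efficacy(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Return 1 - (odds of vaccination among cases) / (odds among controls)."""
    odds_vacc_cases = vacc_cases / unvacc_cases
    odds_vacc_controls = vacc_controls / unvacc_controls
    return 1.0 - odds_vacc_cases / odds_vacc_controls


# Hypothetical clinic counts, for illustration only.
ve = tnd_vaccine_efficacy(vacc_cases=20, unvacc_cases=60,
                          vacc_controls=80, unvacc_controls=40)
print(round(ve, 3))
```

With these illustrative counts the odds of vaccination are 1/3 among cases and 2 among controls, so the estimate is 1 - 1/6.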
An important goal in vaccine development is to evaluate how vaccine-induced immune response correlates with risk of disease, known as a “correlate of risk” (Qin and others, 2007). Identification of a specific immune response that is associated with low disease risk provides a target for vaccine development. A common approach to evaluate immune correlates is to measure immune response to the vaccine shortly after vaccination, determine who gets infected during prospective follow-up, and then use logistic regression to model the probability of disease as a function of measured immune response. Ideally, such an approach would be used in outbreak settings (see Halloran and others, 2020), but at times it may not be feasible to prospectively collect samples. In addition, for rare diseases in non-outbreak settings, an enormous number of samples may need to be collected for a traditional correlates analysis. In other settings, the durability of a vaccine is of interest, and thus so is the immune response at the time of exposure, though this may be many years after vaccination.
This article details an approach to immune correlates analysis under a test negative design using samples collected from vaccinees proximal to exposure. To fix ideas, consider the rVSV vaccine for Ebola, which induces an immune response to Ebola antigens, say $X$, as well as to the vesicular stomatitis virus (VSV) vector, say $W$. Naively, one might measure $X$ among the symptomatic vaccinees who visit a clinic and fit a logistic regression with Ebola as the outcome and $X$ as the covariate. The problem is that the Ebola immune response is likely massive in those with Ebola infection. What we really want is the pure vaccine-induced $X$ just prior to exposure in both Ebola cases and controls. As a proxy, we can use $W$ at the clinic visit, which should be similar to $W$ at exposure. Provided $X$ and $W$ are correlated, we can use $W$ to impute the vaccine-induced $X$ at exposure. We can then fit a logistic regression with Ebola as the outcome using the expectation of $X$ conditional on $W$, say $\hat{X}$, as the covariate; this is also known as regression calibration. If the time interval from vaccination to exposure is relatively long, the substantial decay of the immune response over time may need to be addressed. This can be accomplished either by stratification or by flexible parametric modeling of risk over study time, provided the disease is rare. We evaluate the performance of the method via simulation and illustrate its use with a simulated dataset meant to reflect the recent Ebola outbreak in the Democratic Republic of the Congo, where the rVSV Ebola vaccine was deployed using ring vaccination.
2. Formulation and models
2.1. Constant immune response
An important goal in vaccine development is to assess how immune response to the vaccine, , is associated with the risk of infection/disease . In a randomized trial, vaccinees have an immune response measured shortly after vaccination. For now, assume that does not change. Volunteers are followed for a period of time and disease status is recorded. With a test negative design, we need both the case disease and a different (control) disease for reference while the undiseased are not used. We thus define the disease of interest (case), a different disease (control), or undiseased which we denote by respectively. We need three possibilities for later use in a test negative design. If both diseases are acquired, we use the case disease. We assume that among vaccinees
(2.1) |
and that
(2.2) |
where is the conditional probability that a vaccinee who avoids the case disease develops the control disease. Note that is assumed free of as it should not have an effect on acquiring the control disease. An immune correlates analysis relating risk of disease to immune response can be conducted by fitting (2.1) in vaccinees using as outcome the indicator that (versus ). For developmental simplicity, we do not incorporate baseline covariates in (2.1) though one can.
For certain settings, $X$ may not be measured shortly after vaccination. For rare diseases it may be prohibitively costly, and in outbreak settings it may not be logistically feasible to draw samples. In other settings, the durability of a vaccine is unknown and thus $X$ at exposure is of interest even though it may be many years since vaccination. In all these situations, a correlates analysis using the immune response proximal to exposure would be of great interest.
A design that collects information proximal to exposure is the test negative design, which takes symptomatic patients ( or ) who present at a clinic, tests them for the disease of interest, and records vaccination status. Those testing positive are classified as cases () while those testing negative are classified as controls (). Formally, if and , while if , and , where is the indicator a person with either disease (i.e., or 1) arrives at the clinic. Overall vaccine efficacy is estimated as
(2.3) |
where is the vaccine indicator see Guolo (2008). The second equality follows from the invariance of the odds to prospective and retrospective sampling. The test negative design can be much more efficient than a prospective study, but does require more assumptions.
Suppose we somehow knew $X$ in all vaccinees who arrived at the clinic. Let be the arrival probability for a symptomatic vaccinee with disease status . Based on (2.1) and (2.2), we can derive the probability that a symptomatic vaccinee who arrives at the clinic has the case disease:
(2.4) |
(2.5) |
(2.6) |
where . Thus, if we fit a simple logistic regression model with as covariate among vaccinees in the clinic, we can recover the slope from (2.1). Note that this obtains even if we allow cases and controls to have different arrival probabilities. The usual test negative design requires for all vaccinees. We next consider the actual setting where must be derived from measurements at arrival. Figuring out how to get from test negative data is the major contribution of this article.
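The claim that clinic sampling changes only the intercept can be checked numerically. The sketch below assumes a prospective case model of the form P(case | X) = 1/(1 + exp(b0 + b1 X)) and a constant conditional probability `theta` of control disease among those without the case disease. The names `b0`, `b1`, `theta`, `pi_case`, `pi_control`, this sign convention, and all numeric values are assumptions made for illustration.

```python
import math


def p_case_prospective(x, b0, b1):
    # Assumed form of the prospective case model.
    return 1.0 / (1.0 + math.exp(b0 + b1 * x))


def p_case_given_arrival(x, b0, b1, theta, pi_case, pi_control):
    # Bayes' rule over the two symptomatic groups that can arrive.
    p1 = p_case_prospective(x, b0, b1)
    p0 = (1.0 - p1) * theta  # control disease among those without case disease
    return pi_case * p1 / (pi_case * p1 + pi_control * p0)


def p_case_clinic_logistic(x, b0, b1, theta, pi_case, pi_control):
    # Same logistic form and same slope, with a shifted intercept.
    b0_star = b0 + math.log(pi_control * theta / pi_case)
    return 1.0 / (1.0 + math.exp(b0_star + b1 * x))


# Hypothetical parameter values; arrival probabilities differ by disease.
params = dict(b0=4.05, b1=0.60, theta=0.01, pi_case=0.8, pi_control=0.4)
max_gap = max(
    abs(p_case_given_arrival(x, **params) - p_case_clinic_logistic(x, **params))
    for x in [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
)
print(max_gap < 1e-12)
```

Under these assumptions the two expressions agree to floating-point precision even though cases and controls arrive with different probabilities, which is the sense in which the slope is recoverable from clinic data.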
In practice, we can measure when subjects arrive at the clinic, say where is the arrival time. Naively fitting (2.6) using the clinic is problematic. For vaccinees who present with the control disease , should be close to where is just prior to infection as control disease should have little effect on the vaccine-induced immune response to the relevant vaccine antigen. But the vaccinated who present with the disease of interest are likely to have quite high due to active infection. So at arrival is different from for vaccinees with .
However, certain vaccines (e.g., vector based vaccines) also induce an immune response to irrelevant antigens (e.g., the vector), say , which should be relatively unaffected by active infection. To fix ideas, suppose that achieve stable values shortly after vaccination and remain constant in diseased controls. If are correlated prior to exposure, then one could predict the unadulterated at exposure using the at presentation. One could then use , the imputed immune response to relevant vaccine antigens at exposure, in lieu of . This works if is unaffected by case or control disease and are unaffected by control disease. But these requirements can be weakened. Suppose that for both groups, we observe and for those with control disease we observe where are errors (which can be correlated and have non-zero means). Thus are the pre-infection values while are the values observed at the clinic when subjects are diseased. If differs for case and control disease, however, the control model fitted on does not apply to cases and a more complex approach would be required.
This strategy is displayed in Figure 1. The diamonds are actual measurements while the two solid bent lines are linear interpolations of the IgG antibody response to the outside of the Ebola virus, i.e., Ebola glycoprotein (GP), following rVSV vaccination from two randomly selected subjects in Prevail 1, an immunogenicity trial of the rVSV vaccine (Kennedy and others, 2016). Immune response was measured at baseline, 1 week, 1 month, 6 months, and 1 year (diamonds). While IgG response to vector was not measured, we illustrate interpolated hypothetical values by the dashed bent lines. We pretend that these two subjects were infected, with, respectively, malaria and Ebola. They became symptomatic and arrived at a clinic a few days after productive exposure. The values for the malaria patient at arrival are similar to the values at exposure, but for the Ebola patient, only $W$ is similar at exposure and arrival while $X$ (dashed line) is massive from active Ebola infection. The bottom panel illustrates hypothetical data with a correlation of 0.70 between $X$ and $W$ that might have been obtained from a sample of vaccinated controls who arrived at the clinic. From this relationship, we can impute $X$ at infection for the Ebola patient.
More formally, from the controls we can fit the model
(2.7) |
where is an error term. Using the estimated parameters, we impute ,
and fit the logistic regression model in the symptomatic vaccinees
(2.8) |
This imputation is known as regression calibration and is a simple way to correct for measurement error in a covariate. Note that we impute even in the vaccinated controls where is known and thus treat both cases and controls in the same way. Using imputation in all those with and directly in all those with leads to substantial bias. Because we use , which is estimated instead of , which is fixed, standard errors from standard software fitting (2.8) are likely too small. A simple remedy is to use the bootstrap or use derived standard errors, see Rosner and others (1990).
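The regression calibration steps can be sketched in a small simulation. The Python code below uses only the standard library and assumes a prospective case model of the form P(case | X) = 1/(1 + exp(b0 + b1 X)), a constant conditional control risk `theta`, and arrival of every symptomatic vaccinee. The population size, seed, parameter values, sign convention, and all function and variable names are hypothetical choices for illustration, not the settings behind the reported results.

```python
import math
import random


def fit_logistic(x, y, iters=30):
    """Newton-Raphson fit of logit P(y = 1) = a + b * x; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            v = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_line(w, x):
    """Least-squares fit of x = c0 + c1 * w; returns (c0, c1)."""
    n = len(w)
    mw, mx = sum(w) / n, sum(x) / n
    sxy = sum((wi - mw) * (xi - mx) for wi, xi in zip(w, x))
    sww = sum((wi - mw) ** 2 for wi in w)
    return mx - (sxy / sww) * mw, sxy / sww


random.seed(2020)
N, b0, b1, theta, rho = 150_000, 1.12, 1.81, 0.014, 0.70  # hypothetical values

w_all, case_all, x_controls, w_controls = [], [], [], []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = 3.0 + 0.5 * z1
    w = 3.0 + 0.5 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    if random.random() < 1.0 / (1.0 + math.exp(b0 + b1 * x)):  # case disease
        w_all.append(w)
        case_all.append(1)
    elif random.random() < theta:  # control disease among the rest
        w_all.append(w)
        case_all.append(0)
        x_controls.append(x)  # pre-infection X is usable only in controls
        w_controls.append(w)

c0, c1 = fit_line(w_controls, x_controls)  # calibration model, controls only
x_hat = [c0 + c1 * w for w in w_all]       # impute for cases AND controls
_, slope_rc = fit_logistic(x_hat, case_all)
_, slope_w = fit_logistic(w_all, case_all)
b1_rc, b1_w = -slope_rc, -slope_w  # sign flip matches the assumed 1/(1+exp(.)) form
print(round(b1_rc, 2), round(b1_w, 2), round(c1, 2))
```

Because the imputed value is a linear function of $W$, the slope on the imputed response equals the slope on $W$ divided by the calibration slope `c1`, so the two fits give the same Wald statistic when naive standard errors are used.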
Instead of using an imputed for regression calibration, one can perform a correlates analysis using directly. While is not part of the mechanism of protection, “correlates” analyses are often done with immune responses that may not be causal but are presumably related to the causal mechanism. As an example, smallpox vaccination involves scraping the arm with antigen. A vaccine “take” is recorded if a pox forms. While the scar itself does not protect, it is a proxy for a robust relevant immune response to the vaccine and risk of disease by scar or “take” is of interest even though it is clearly a non-mechanistic correlate of risk using the nomenclature of Plotkin and Gilbert (2012). Use of directly is simple, avoids the measurement error issue, requires no imputation, and has slightly better power than use of imputed , as we will see. Use of by itself, however, is not easily interpretable unless as a proxy for vaccine “take,” i.e., is binary and is like being unvaccinated.
2.2. Waning immune response
The above development is appropriate if the immune responses are stable during the follow-up period for the cohort study. In general, antibodies decay over time and the above approach can lead to bias. To see this, suppose that nearly everyone was vaccinated prior to an outbreak that exploded and then waned, so that the risk of the case disease decreased over time, coinciding with the antibody decay. Further suppose that the control disease rate was constant over time. Then even if was unrelated to we would associate low with low risk and we would tend to estimate a positive . The problem is even more complicated as, in general, people will be vaccinated at various times and the case disease rate might vary with time.
Let be the time since the start of the test negative study. To develop the time varying setting more formally, we will assume that the hazard for a case disease arriving at time with covariate is given by
(2.9) |
where is the hazard for case disease a little before and is the probability of arriving at the clinic at time , given a case infection just prior to .
We analogously assume that the hazard for the control disease is independent of and arbitrary, as is the instantaneous probability of arriving. Thus, the hazard for a person with control disease arriving at the clinic is
Recall that the probability of an event in a small interval [t,t+), given no event prior to , is approximately for an arbitrary hazard function . Under a “rare” disease assumption, we can calculate the probability that a given vaccinee who showed up at time was a case as
(2.10) |
where is the indicator of case disease for an arrival at time and
Now could be constant. One way this can happen is if and so . If so, we can simply fit logistic regression to the data points . In general, will not be constant so that
(2.11) |
We might specify where is say the median followup time. Or we could fit a quadratic function of log(t). In practice, different approaches to specify could be tried and the best one selected. Note that we only need know (or impute) at the time of arrival to the clinic for each vaccinee.
3. Assumptions
The above approaches recover the prospective slope for or if the model assumptions are met. In this section, we delineate the assumptions when is known for all, constant after time , the logistic-type model is correctly specified, and there is independence between exposure and . The major issue is thus whether vaccinated cases (controls) arrive with fixed probability (). Or equivalently, whether those who arrive are a representative sample of cases and controls respectively. We allow .
1. The vaccinees who arrive with the control disease provide a representative sample of the distribution of immune response among those vaccinated without the case disease.
This follows if has no impact on the probability of acquiring the control disease and no impact on the probability of arriving at the clinic so . Note we can allow to depend on latent or measured disease status or baseline covariates as long as they are independent of .
2. The vaccinees who arrive with the case disease provide a representative sample of the distribution of immune response among those vaccinated with the case disease.
(3.12) |
The last equality follows if has no additional impact on given severity which seems a reasonable assumption. Using the last equality, we can see that does not depend on if either is free of or is free of . We discuss each in turn.
Now if severity does not impact health seeking behavior. This might occur if all cases are equally encouraged to arrive at a clinic and all cases are equally compliant, or if the severity gradient is modest enough that it does not change behavior.
Next consider when i.e., has no impact on severity given a vaccinee is infected. While this obtains if the vaccine’s mechanism of action is all-or-none, it is a weaker condition. An all-or-none vaccine implies that the distribution of severity is unchanged by vaccination so that , where is the vaccine indicator. But we only require which allows .
If neither assumption is plausible but case disease severity is observed, there is a remedy provided disease follows the prospective logistic regression model:
(3.13) |
for Then even if affects the arrival probability through the observable disease severity, recovery of using data follows from arguments analogous to those used to derive (2.6). One can show that (3.13) implies the distribution of disease severity among the vaccinated infecteds, , varies with reflecting a kind of disease modifying vaccine.
These conditions can be relaxed using baseline covariates . If the prospective model (2.1) controls for confounding by specifying , then the above arguments go through, even if the arrival probabilities satisfy . One can show that with , and thus remains the coefficient for .
To summarize, assuming the verity of (3.12) and a correctly specified model for , we can recover even if depend on unless disease severity is unobservable, depends on , and depends on .
4. Simulations
4.1. Constant immune response
We illustrate performance for the simple case where antibody does not decay. We prospectively generate data for vaccinees which are then sampled retrospectively as in a test negative design. We assume a population of size vaccinees and generate the case disease according to
and sequentially generate control disease among the case uninfecteds as
We specify $X$ as Gaussian with mean 3.00 and standard deviation 0.50, as in the Prevail 1 vaccine trial (Kennedy and others, 2016). We generate
where is standard normal. Thus, $X$ and $W$ are bivariate normal with common mean 3.00, common standard deviation 0.50, and a correlation that we set to 0.40, 0.70, and 1.00. For reference, a study estimated a correlation of 0.70 between ELISA OD immune response readouts for VSV proteins ($W$) and Ebola proteins ($X$) at 56 days post vaccination (Poetsch and others, 2018).
We specify as (5.51, 0.00), (4.05, 0.60), and (1.12, 1.81). These correspond to no, modest, and strong effects of . To help interpret , the ratios of risk of case disease for the 1st versus 8th octile of are 1, 2, and 10, respectively, for . We set the conditional risk of control disease as , and so that vaccinated cases and controls arrive with equal probability.
We evaluate estimation of the model
(4.14) |
where IR, the immune response, is , , or
Table 1 presents the results using 10 000 simulated studies per scenario. Columns 4 and 5 provide the sample mean and variance of the estimated slope. Column 6 reports the rejection rate for the two-sided Wald test using the Monte Carlo standard error over the 10 000 simulated studies. We see that we have power of around 100% to test the effect of IR when the true slope is 1.81 for all scenarios, while for a true slope of 0.60 we need a correlation of 0.70 to approach 80% power. The type I error rate seems consistent with the nominal level used for testing.
Table 1.
IR | True slope | Correlation of $X$ and $W$ | Mean of estimated slope | Variance of estimated slope | Rejection rate | Controls | Cases |
---|---|---|---|---|---|---|---|
$X$ | 0.000 | 1.000 | 0.002 | 0.019 | 0.053 | 1049 | 302 |
$X$ | 0.600 | 1.000 | 0.602 | 0.024 | 0.973 | 972 | 225 |
$X$ | 1.810 | 1.000 | 1.821 | 0.040 | 1.000 | 908 | 160 |
Imputed $X$ | 0.000 | 0.700 | 0.001 | 0.040 | 0.048 | 1048 | 302 |
Imputed $X$ | 0.600 | 0.700 | 0.605 | 0.049 | 0.786 | 973 | 225 |
Imputed $X$ | 1.810 | 0.700 | 1.826 | 0.076 | 1.000 | 909 | 160 |
Imputed $X$ | 0.000 | 0.400 | 0.003 | 0.122 | 0.049 | 1049 | 302 |
Imputed $X$ | 0.600 | 0.400 | 0.603 | 0.152 | 0.332 | 973 | 225 |
Imputed $X$ | 1.810 | 0.400 | 1.827 | 0.232 | 0.974 | 909 | 160 |
$W$ | 0.000 | 0.700 | 0.001 | 0.020 | 0.048 | 1048 | 302 |
$W$ | 0.600 | 0.700 | 0.423 | 0.024 | 0.790 | 973 | 225 |
$W$ | 1.810 | 0.700 | 1.274 | 0.035 | 1.000 | 909 | 160 |
$W$ | 0.000 | 0.400 | 0.001 | 0.019 | 0.048 | 1049 | 302 |
$W$ | 0.600 | 0.400 | 0.239 | 0.023 | 0.344 | 973 | 225 |
$W$ | 1.810 | 0.400 | 0.725 | 0.032 | 0.984 | 909 | 160 |
$X$, reduced control arrival probability | 0.600 | 1.000 | 0.604 | 0.031 | 0.938 | 599 | 225 |
Regression calibration recovers the true even with . In the Appendix, we derive the correct (marginalized) probability of given under our assumptions. We show that this curve as a function of (with slope ) is virtually identical to the regression calibration curve (with the same slope ), irrespective of . This fidelity explains the excellent recovery of with regression calibration. Regression calibration does become more biased as increases, but for these simulations, is relatively small, Follmann and others (1999).
Intuitively, as decreases estimation becomes more difficult. For , the variance for under regression calibration is 0.024, 0.049, and 0.152 for = 1.00, 0.70, and 0.40, respectively. So the variance roughly doubles as goes from 1.00 to 0.70 and triples as goes from 0.70 to 0.40. In contrast, with direct use of the variance of stays constant at about 0.023 or 0.024 for all with . The use of in lieu of , of course, leads to a smaller estimate of but this is counterbalanced by the smaller variance. The upshot is that the power for testing is nearly identical for regression calibration and direct use of for all scenarios.
In the Appendix, we provide additional simulations with sample sizes 1/10th and 10 times that of Table 1. The three approaches have less power and more variability with the smaller sample size and the opposite with larger sample size. Regression calibration continues to be nearly unbiased for the larger sample size, though it has some bias for the smaller sample size.
The above simulations were conducted under the standard assumptions of the test negative design. In the final row, we set while so that vaccinees with the control disease have half the arrival probability of the vaccinated cases, which violates the standard test negative assumption. We set and evaluate so that the second line of Table 1, which has provides a reference. As expected, we see the estimate of is still unbiased, though the variance for is slightly increased due to fewer control vaccinees.
4.2. Waning immune response
In this subsection, we provide a brief evaluation of the model (2.11). To focus on the issue of antibody decay, we assume that is measured without error, and that everyone with either disease arrives at a clinic. We specify the instantaneous conditional risk for control disease using a Weibull hazard,
where is the time since start of the test negative study. When () the hazard is increasing (decreasing). The instantaneous conditional risk of case disease is specified as
For , we specify linear antibody decay according to a random effects model. For person the decay is
where are independent normals with mean 0 and standard deviations , respectively. The parameter reflects both natural variation in immune response to vaccination as well as variation in vaccination times as those vaccinated long ago would tend to have smaller .
We generate times to case and control disease by the inverse cumulative distribution function method where we generate , a uniform [0,1] random variable and then determine the disease arrival time that solves for = 0,1. For the case disease, this is solved by numerically minimizing . For subjects who experience both case and control events, we use the first and discard the second. We set vaccinees and follow subjects until we observe 1000 first events.
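The inverse cumulative distribution function step can be sketched as follows. The code assumes a Weibull cumulative hazard of the form H(t) = scale * t**shape, which is one common parameterization and may differ from the one used here, and it uses bisection as a stand-in for the numerical minimization mentioned above. The function names and parameter values are hypothetical.

```python
import math


def weibull_time(u, scale, shape):
    """Closed-form inverse CDF for cumulative hazard H(t) = scale * t**shape."""
    return (-math.log(1.0 - u) / scale) ** (1.0 / shape)


def event_time_numeric(u, cum_hazard, t_max=1e6, tol=1e-10):
    """Solve 1 - exp(-H(t)) = u for t by bisection, for any increasing H.

    Returns a value close to t_max if the solution lies beyond t_max.
    """
    target = -math.log(1.0 - u)
    lo, hi = 0.0, t_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if cum_hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


scale, shape = 0.05, 1.25  # hypothetical Weibull parameters (shape > 1: rising hazard)
u = 0.30                   # in a simulation this would be a uniform [0,1] draw
t_closed = weibull_time(u, scale, shape)
t_numeric = event_time_numeric(u, lambda t: scale * t ** shape)
print(abs(t_closed - t_numeric) < 1e-6)
```

The numeric solver only needs a cumulative hazard function, so the same routine could be applied when the hazard depends on a decaying immune response and no closed-form inverse is available.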
For all scenarios, we set , (, and . Our reference case is , or constant risk of each disease and , . We then vary these parameters to see their impact on performance. The hazards and distribution of arrival times are graphed in the Supplementary material available at Biostatistics online.
We evaluate two different modeling strategies for (2.11). The first is to treat as fixed with no dependence on time. The second is to evaluate five models and pick the one with the best Akaike Information Criterion (AIC). The five models are to treat as constant (1), linear in (2), quadratic in (3), linear in (4), and quadratic in . This second strategy is meant to mimic how a data analyst might address a non-constant . To summarize performance, we calculate the sample mean and variance for the estimate of under modeling strategies 1 and 2. We also report the modal model choice of strategy 2 and the number of times that modal choice was chosen. Additionally, we report the number of cases and controls for the last data set.
Table 2 presents the summary results. We see that both modeling strategies 1 and 2 recover very well for scenario 1, our reference, and scenario 2, where ) has more variability and a steeper mean decline. Both scenarios 1 and 2 have constant hazards for both case and control disease. For non-constant hazards, modeling strategy 1 is biased with bias of around 10%. Modeling strategy 2 has minimal bias for non-constant hazards and picks as a linear function of in about 70–80% of the simulations using the AIC criterion.
Table 2.
Scenario | | | | | | Strategy 1 mean | Strategy 1 variance | Strategy 2 mean | Strategy 2 variance | AIC choice | # times chosen | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1.00 | 1.00 | 0.5 | 0.1 | 0.5 | 1.47 | 0.0204 | 1.47 | 0.0209 | 1 | 735 | 426 | 574 |
2 | 1.00 | 1.00 | 1.0 | 0.5 | 1.0 | 1.48 | 0.0100 | 1.48 | 0.0102 | 1 | 693 | 629 | 371 |
3 | 1.25 | 0.75 | 0.5 | 0.1 | 0.5 | 1.59 | 0.0268 | 1.48 | 0.0271 | 4 | 738 | 235 | 765 |
4 | 0.75 | 1.25 | 0.5 | 0.1 | 0.5 | 1.35 | 0.0213 | 1.48 | 0.0239 | 4 | 780 | 618 | 382 |
5 | 1.50 | 0.50 | 0.5 | 0.1 | 0.5 | 1.61 | 0.0858 | 1.49 | 0.0927 | 4 | 770 | 66 | 934 |
6 | 0.50 | 1.50 | 0.5 | 0.1 | 0.5 | 1.29 | 0.0324 | 1.48 | 0.0392 | 4 | 828 | 829 | 171 |
Strictly speaking, (2.10) obtains under a “rare” disease assumption. In the above simulations, 10% of the vaccinees acquired disease with no noticeable bias. We did further simulations and did observe about 7% bias for scenario 2 with 30% disease acquisition.
5. Example
In the 2018–2020 outbreak of Ebola in the Democratic Republic of the Congo, a ring vaccination campaign was conducted. Local surveillance teams kept track of new Ebola cases, which were then relayed to a WHO vaccination team that vaccinated the contacts and contacts of contacts of the index case. Several hundred thousand at-risk people were thus vaccinated. Following vaccination, the vaccinees were monitored and encouraged to visit an Ebola transit/treatment center once any symptoms developed. The ring campaign used the rVSV vaccine, which is estimated to have quite high efficacy though breakthrough infections occur (WHO, 2019; Henao-Restrepo and others, 2015). The durability of the vaccine and the impact of immune response on risk of disease are unknown. Prospectively collecting samples in all vaccinees for correlates analysis has not been done.
To illustrate our approach, we simulate a dataset meant to loosely approximate a large Ebola ring vaccination campaign with a vaccine whose substantial efficacy is modulated by immune response. We assume that each person’s immune response is relatively stable over time. We assume 150 000 at risk subjects are vaccinated and generate case and control infections as in the previous section with . We set , the conditional risk of control disease (e.g., malaria) equal to 0.005 and assume that vaccinated subjects arrive at the transit center with fixed probability , as one might expect from regular monitoring and encouragement. Note that with regular monitoring of all subjects in a ring, the assumption that for vaccinees may hold, thus even if rVSV does modify disease, the estimated risk of disease as a function of immune response should be unbiased.
To provide additional context, we assume 300 000 unvaccinated subjects are at risk (e.g., have first or second degree contact with an Ebola case) and generate case infections with probability 0.01. We generate control infections from this set with conditional probability 0.005 just as for the vaccinated. We assume that the unvaccinated contacts arrive with probability 0.25 for both case and control disease, irrespective of severity. Finally, we assume that 20% of the arrivals occur within 35 days of contact with an Ebola case. For vaccinees, this should ensure a relatively stable vaccine-induced immune response at arrival.
For this single simulated dataset, a total of 114 unvaccinated contacts of Ebola cases arrived at least 35 days post contact. The proportion with Ebola Virus Disease (EVD) was 0.69. In contrast, 127 vaccinated contacts of an Ebola case arrived at least 35 days post contact, with an EVD proportion of 0.28. Thus the overall test negative VE for these late arrivals is estimated as $1 - \frac{0.28/0.72}{0.69/0.31} \approx 0.83$, using the rounded proportions above.
The above VE estimate is unbiased under the standard test negative assumptions that vaccinees (unvaccinated) have fixed probability () of arriving at a transit center, irrespective of true disease status, disease severity, and equal exposure of vaccinees and unvaccinated. This latter assumption may be questioned in a ring study where the vaccinees are all known contacts of a case.
Figure 2 displays the data, jittered for the unvaccinated for whom we set to 0, along with the fitted logistic regression model where is the immune response covariate, , , or . For the unvaccinated, we draw a dashed line at the observed Ebola disease rate of 0.69 for reference. Table 3 provides the estimated parameters. We see that, as expected, using results in a smaller than use of and that the Wald statistic for testing for is identical to the Wald statistic based on , when we use the naive standard error. We also provide the bootstrap standard error and Wald statistic for regression calibration based on 1000 resamples. The resultant standard errors are slightly larger than the naive standard errors. The two-sided p-values for testing an effect of and on disease are, respectively, 0.02 and 0.01. Recall that the estimates of are unbiased even if the arrival probability for vaccinees depends on latent severity, provided the vaccine has an all-or-none mechanism. Estimates of are also unbiased even if the vaccine modifies latent disease severity, provided the arrival probability is the same for all vaccinees with the case disease.
Table 3.
Parameter | IR = $X$: est | se | Wald | IR = imputed $X$: est | naive se | Wald (naive) | bootstrap se | Wald (bootstrap) | IR = $W$: est | se | Wald |
---|---|---|---|---|---|---|---|---|---|---|---|
Intercept | 3.68 | 1.28 | 2.86 | 3.28 | 1.68 | 1.95 | 1.70 | 1.93 | 2.07 | 1.20 | 1.72 |
Slope | 1.64 | 0.46 | 3.54 | 1.45 | 0.58 | 2.49 | 0.60 | 2.40 | 1.04 | 0.42 | 2.49 |

For imputed $X$, both naive and bootstrap standard errors are reported, each with its Wald statistic.
To help interpret the magnitude of the effect of the imputed immune response on risk of disease, we can compare the risk of disease at the 10th versus 90th percentile of the observed distribution of the imputed $X$. These percentiles are 2.51 and 3.39, respectively, with associated probabilities of disease of 0.42 and 0.16 for these late vaccinated arrivals. Thus the risk is roughly 2.5 times greater at the 10th versus 90th percentile of the imputed $X$, indicating a substantial effect of antibody on EVD risk.
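The percentile comparison can be reproduced approximately from the rounded Table 3 estimates for the imputed response (3.28 and 1.45). The sketch below assumes the fitted curve is oriented so that risk falls as the imputed response rises, i.e., logit of risk equals intercept minus slope times the imputed value. This orientation is an inference from the reported probabilities rather than something stated explicitly, and the function name is hypothetical.

```python
import math


def fitted_risk(x_imputed, intercept=3.28, slope=1.45):
    # Assumed orientation: risk decreases as the imputed response increases.
    return 1.0 / (1.0 + math.exp(-(intercept - slope * x_imputed)))


risk_p10 = fitted_risk(2.51)  # 10th percentile of the imputed response
risk_p90 = fitted_risk(3.39)  # 90th percentile of the imputed response
risk_ratio = risk_p10 / risk_p90
print(round(risk_p10, 2), round(risk_p90, 2), round(risk_ratio, 1))
```

With the rounded coefficients this gives risks of about 0.41 and 0.16 and a ratio of about 2.5; the small difference from the reported 0.42 is consistent with rounding of the published estimates.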
6. Discussion
This research arose from the design of an immune correlates study using test negative data for the 2018–2020 Democratic Republic of the Congo (DRC) Ebola outbreak. The main contribution of this article is the idea that the effect of the relevant immune response on the risk of disease can be estimated by using an irrelevant immune response as a proxy. We use the proxy to impute the relevant immune response based on a prediction model that is estimated in vaccinated controls using samples collected upon arrival at a clinic. The method extends to settings where antibody substantially decays over time. Simulations demonstrated the feasibility of the proposed methods.
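The workflow summarized above can be illustrated with a minimal sketch on simulated data. Everything numeric here is an assumption chosen for illustration (sample size, the distribution of the responses, the true intercept 3.5 and slope -1.5), and the names `x` (relevant response), `w` (proxy), and `x_hat` (imputed response) are hypothetical. The fitting routines are a plain least-squares fit and a hand-written Newton-Raphson logistic regression using only the standard library; this is not the authors' code, which is available at the repository listed in the Software section.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(u, v):
    """Least-squares fit of v on u; returns (intercept, slope)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    slope = suv / suu
    return mv - slope * mu, slope

def score_and_information(b0, b1, x, y):
    """Score vector and information matrix entries for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = expit(b0 + b1 * xi)
        resid, var = yi - p, p * (1.0 - p)
        g0 += resid
        g1 += resid * xi
        h00 += var
        h01 += var * xi
        h11 += var * xi * xi
    return g0, g1, h00, h01, h11

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson fit; returns (intercept, slope, naive se of slope)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = score_and_information(b0, b1, x, y)
    se_slope = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_slope

# Simulated vaccinees arriving at a clinic (all parameter values are assumptions).
random.seed(0)
n = 2000
x = [random.gauss(3.0, 0.5) for _ in range(n)]              # relevant response
w = [1.0 + 0.8 * xi + random.gauss(0.0, 0.3) for xi in x]   # irrelevant proxy
y = [1.0 if random.random() < expit(3.5 - 1.5 * xi) else 0.0 for xi in x]

# Step 1: prediction model for x given w, estimated in vaccinated controls only,
# because the relevant response measured in cases may be altered by infection.
w_ctrl = [wi for wi, yi in zip(w, y) if yi == 0.0]
x_ctrl = [xi for xi, yi in zip(x, y) if yi == 0.0]
a, c = fit_line(w_ctrl, x_ctrl)

# Step 2: impute the relevant response for every vaccinee from the proxy.
x_hat = [a + c * wi for wi in w]

# Step 3: logistic regression of case status on the imputed response.
_, slope_imputed, se_imputed = fit_logistic(x_hat, y)
_, slope_proxy, se_proxy = fit_logistic(w, y)
wald_imputed = slope_imputed / se_imputed
wald_proxy = slope_proxy / se_proxy

print(f"slope on imputed response: {slope_imputed:.2f} (naive se {se_imputed:.2f})")
print(f"naive Wald, imputed vs proxy: {wald_imputed:.3f} vs {wald_proxy:.3f}")
```

Because the imputed response in this sketch is a linear function of the proxy, the naive Wald statistics for the two covariates coincide. The naive standard error ignores the uncertainty in the fitted prediction model, which is the motivation for a bootstrap standard error in practice.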
This approach applies beyond Ebola. The key requirement is that the vaccine of interest produce irrelevant immune responses that are unaffected by the disease of interest and that are correlated with the immune response of interest. One such application is to extremely rare outcomes, such as the prevention of Zika-induced congenital birth defects. Suppose Puerto Rico begins Zika vaccination in 2025 and an outbreak sweeps the island in 2040. Since Zika-induced birth defects are rare, a prospective immune correlates design would require long-term storage of samples from hundreds of thousands of women just after vaccination. Furthermore, the immune response values collected from 2025 onward would not address the question of the immune response at Zika exposure in 2040. But using the methods of this article, we could sample cases (mothers of children with birth defects who are antibody positive for non-vaccine Zika antigens) and controls (mothers of children with birth defects who are antibody negative for non-vaccine Zika antigens). We could then apply a test negative design to assess overall VE and also perform an immune correlates analysis with and , provided that an irrelevant immune response that predicts the relevant immune response were available.
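We focused on the immune response proximal to infection. However, traditional immune correlates analyses focus on the immune response shortly after vaccination. Our methods for time-constant immune response apply directly, since in that case the immune response shortly after vaccination equals the immune response proximal to infection. If antibody decay occurs, one could in principle apply (2.10) with the imputed immune response proximal to infection replaced by the predicted value of the immune response shortly after vaccination. This would require the development of models to predict that early response.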
The biggest challenge in this approach is whether the biology is amenable. Some vaccines will not induce an irrelevant immune response at all. In other cases, even if one is induced, its correlation with the relevant immune response may be too weak for this method to work. This may be more of a problem with pre-existing immunity to either the antigens of the vaccine or the vector. On the other hand, it is not strictly required that the proxy be an immune response to the vector. What is required is the existence of covariates that predict the relevant immune response among vaccinees and that . Development of such methods is left to future work.
Acknowledgments
This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). We thank Jing Qin and Michael Fay for helpful comments.
Conflict of Interest: None declared.
7. Software
The code to reproduce the example is given at https://github.com/follmand/Test-Negative-Immune-Correlates.
Supplementary materials
Supplementary material is available at http://biostatistics.oxfordjournals.org.
References
- Boom, J. A., Tate, J. E., Sahni, L. C., Rench, M. A., Hull, J. J., Gentsch, J. R., Patel, M. M., Baker, C. J. and Parashar, U. D. (2010). Effectiveness of pentavalent rotavirus vaccine in a large urban population in the United States. Pediatrics 125, e199–e207.
- Broome, C. V., Facklam, R. R. and Fraser, D. W. (1980). Pneumococcal disease after pneumococcal vaccination: an alternative method to estimate the efficacy of pneumococcal vaccine. New England Journal of Medicine 303, 549–552.
- Follmann, D. A., Hunsberger, S. A. and Albert, P. S. (1999). Repeated probit regression when covariates are measured with error. Biometrics 55, 403–409.
- Fukushima, W. and Hirota, Y. (2017). Basic principles of test-negative design in evaluating influenza vaccine effectiveness. Vaccine 35, 4796–4800.
- Guolo, A. (2008). A flexible approach to measurement error correction in case–control studies. Biometrics 64, 1207–1214.
- Halloran, M. E., Longini, I. M. and Gilbert, P. B. (2020). Designing a study of correlates of risk for Ebola vaccination. American Journal of Epidemiology 189, 747–754.
- Henao-Restrepo, A. M., Longini, I. M., Egger, M., Dean, N. E., Edmunds, W. J., Camacho, A., Carroll, M. W., Doumbia, M., Draguez, B., Duraffour, S. et al. (2015). Efficacy and effectiveness of an rVSV-vectored vaccine expressing Ebola surface glycoprotein: interim results from the Guinea ring vaccination cluster-randomised trial. The Lancet 386, 857–866.
- Jackson, M. L. and Nelson, J. C. (2013). The test-negative design for estimating influenza vaccine effectiveness. Vaccine 31, 2165–2168.
- Kennedy, S. B., Neaton, J. D., Lane, H. C., Kieh, M. W. S., Massaquoi, M. B. F., Touchette, N. A., Nason, M. C., Follmann, D. A., Boley, F. K., Johnson, M. P. et al. (2016). Implementation of an Ebola virus disease vaccine clinical trial during the Ebola epidemic in Liberia: design, procedures, and challenges. Clinical Trials 13, 49–56.
- Lewnard, J. A., Tedijanto, C., Cowling, B. J. and Lipsitch, M. (2018). Measurement of vaccine direct effects under the test-negative design. American Journal of Epidemiology 187, 2686–2697.
- Plotkin, S. A. and Gilbert, P. B. (2012). Nomenclature for immune correlates of protection after vaccination. Clinical Infectious Diseases 54, 1615–1617.
- Poetsch, J. H., Dahlke, C., Zinser, M. E., Kasonta, R., Lunemann, S., Rechtien, A., Ly, M. L., Stubbe, H. C., Krähling, V., Biedenkopf, N. et al. (2018). Detectable vesicular stomatitis virus (VSV)-specific humoral and cellular immune responses following VSV-Ebola virus vaccination in humans. The Journal of Infectious Diseases 219, 556–561.
- Qin, L., Gilbert, P. B., Corey, L., McElrath, M. J. and Self, S. G. (2007). A framework for assessing immunological correlates of protection in vaccine trials. The Journal of Infectious Diseases 196, 1304–1312.
- Rosner, B., Spiegelman, D. and Willett, W. C. (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 132, 734–745.
- Sullivan, S. G., Tchetgen Tchetgen, E. J. and Cowling, B. J. (2016). Theoretical basis of the test-negative study design for assessment of influenza vaccine effectiveness. American Journal of Epidemiology 184, 345–353.
- Westreich, D. and Hudgens, M. G. (2016). Invited commentary: beware the test-negative design. American Journal of Epidemiology 184, 354–356.
- WHO. (2019). Preliminary Results on the Efficacy of rVSV-ZEBOV-GP Ebola Vaccine using the Ring Vaccination Strategy in the Control of an Ebola Outbreak in the Democratic Republic of the Congo: An Example of Integration of Research into Epidemic Response. https://reliefweb.int/report/democratic-republic-congo/preliminary-results-efficacy-rvsv-zebov-gp-ebola-vaccine-using-ring.