Summary:
Trials of candidate agents for HIV pre-exposure prophylaxis (PrEP) may randomise between a new agent (nPrEP) and oral co-formulated emtricitabine plus tenofovir disoproxil fumarate (F/TDF). This approach presents unique challenges in design and interpretation. First, with two active arms, HIV incidence may be low. Second, F/TDF effectiveness varies across populations; thus, similar HIV incidence between arms could be consistent with a wide range of effectiveness for the nPrEP. We propose a two-part approach to trial results. First, we use Bayesian methods to incorporate assumptions about the background trial HIV incidence in the absence of PrEP, possibly augmented by external data. Second, based on this, we estimate and compare the number of averted (or prevented) HIV infections in each of the two trial arms, calculating the averted infections ratio (AIR). We apply these methods to a recently completed trial of tenofovir alafenamide with emtricitabine (F/TAF) for PrEP. Our framework demonstrates that leveraging external information to estimate averted infections and the AIR enhances the efficiency and interpretation of active-controlled PrEP trials.
1. The DISCOVER Trial Design
There is an urgent need for innovative trial approaches which can ethically and rigorously assess new HIV prevention methods.1–4 Randomized active controlled trials, with an oral emtricitabine plus tenofovir disoproxil fumarate (F/TDF) control group, have been the preferred design for assessing new pre-exposure prophylaxis (nPrEP) agents. To date, active controlled nPrEP trials have been large and expensive, with power calculations requiring observation of at least 100 incident HIV infections.5–7 Observing this number of infections is a particular challenge in active controlled trials because at least one, and possibly both, arms may be receiving effective prevention. There has been robust discussion about designing cost-effective trials, identifying rigorous sources of evidence, and modifying regulatory standards for nPrEP. These dilemmas are well illustrated by the design, results, and interpretation of the DISCOVER study.
The DISCOVER study was a randomized double-blind double-dummy active-controlled trial comparing co-formulated emtricitabine and tenofovir alafenamide (F/TAF; Descovy®) vs. F/TDF for PrEP in men (MSM) and transgender women (TW) who have sex with men.5 The study met its pre-specified margin for non-inferiority (NI) of F/TAF compared to F/TDF.8 Based on DISCOVER, the US Food and Drug Administration approved F/TAF for PrEP in MSM and TW.9
In this paper, we present a reanalysis of the DISCOVER trial data using an alternative framework. Our approach compares the numbers of prevented infections on each arm which are calculated based on explicit assumptions about the counterfactual background HIV incidence,10 and allows incorporation of expert opinion and/or external data through Bayesian inference. Our analyses suggest that F/TDF and F/TAF have more similar effectiveness than might be apparent.
2. DISCOVER Results
Table 1 summarises the results of DISCOVER. There were 22 total HIV infections: 5 suspected baseline infections and 17 post-baseline infections. By PrEP group, there were 11 F/TDF vs. 6 F/TAF post-baseline HIV infections. The F/TAF:F/TDF relative rate (RR) was 0·55 with a 95% confidence interval (CI) of 0·20–1·48 (Table 1). The CI upper bound was within a pre-specified NI margin5 of <1·62 and met the criterion for NI, providing affirmative evidence that at least 50% of the control (F/TDF) effectiveness was preserved by the investigational treatment (F/TAF). The NI margin was derived based on the results of completed placebo-controlled trials of F/TDF and a set of assumptions11 (further details in Supplementary Appendix, page S1). However, a comparison of the RR to a fixed margin has important limitations; it does not address the strength of evidence for non-inferiority or estimate the effectiveness of F/TAF.
Table 1:
Study Arm | No. Randomised | Person-Years | Total HIV+ | Post-Baseline HIV+ | Post-Baseline HIV Rate (per 100 PY) |
---|---|---|---|---|---|
F/TAF | 2694 | 4370 | 7 | 6 | 0·137 |
F/TDF | 2693 | 4386 | 15 | 11 | 0·251 |
Relative rate (post-baseline events): 0·55 (95% CI: 0·20–1·48)
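The relative rate and its interval in Table 1 can be checked from the event counts and person-years alone. The sketch below assumes a Wald interval on the log scale; the source does not state which interval method was used, so this is an assumption, although it does reproduce the tabulated limits. The function name `rate_ratio_ci` is ours.

```python
import math

def rate_ratio_ci(events_new, py_new, events_ctl, py_ctl, z=1.959964):
    """Rate ratio (new / control) with a Wald interval on the log scale.

    ASSUMPTION: the Wald log-scale interval is our choice of method.
    """
    rr = (events_new / py_new) / (events_ctl / py_ctl)
    se_log = math.sqrt(1 / events_new + 1 / events_ctl)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Post-baseline infections and person-years from Table 1.
rr, lower, upper = rate_ratio_ci(6, 4370, 11, 4386)

NI_MARGIN = 1.62  # pre-specified DISCOVER non-inferiority margin
non_inferior = upper < NI_MARGIN

print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}); NI shown: {non_inferior}")
```

The decision rests entirely on where the upper limit falls relative to the margin, which is why it is sensitive to single events when counts are this small.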
We examined the robustness of the NI conclusion (using the 1·62 margin) by hypothetical addition or subtraction of HIV infections from the F/TAF arm (Table 2). The trial would have failed to demonstrate NI with the addition of a single HIV infection in the F/TAF arm. Conversely, subtracting three infections from the F/TAF arm would have led to the conclusion of superiority of F/TAF. The qualitatively different conclusions supported by small changes to the number of infections present a challenge for interpretation of the strength of DISCOVER’s evidence.
Table 2:
Scenario | HIV+ cases, F/TAF | HIV+ cases, F/TDF | Rate ratio (95% CI) | AIR+ (95% CI) | AIR in Bayesian analysis (95% CrI) |
---|---|---|---|---|---|
A | 6 | 11 | 0·55 (0·20–1·48) | 1·10 (0·94–1·27) | 1·03 (0·98–1·11) |
B | 7 | 11 | 0·64 (0·25–1·65) | 1·08 (0·92–1·26) | 1·02 (0·97–1·10) |
C | 3 | 11 | 0·27 (0·08–0·98) | 1·15 (1·02–1·31) | 1·05 (1·00–1·14) |
Scenario A. Observed results, non-inferiority shown (since 1.48<1.62)
Scenario B. One additional event on F/TAF, non-inferiority not shown (since 1.65>1.62)
Scenario C. Three fewer events on F/TAF, superiority shown (since 0.98<1.00)
Assuming a background HIV incidence of 1·44 per 100 PY
The AIR unequivocally demonstrates non-inferiority under each of these scenarios since the lower confidence limit of the AIR indicates preservation of effect higher than 90%.
The FDA briefing document alludes to the low number of HIV infections: “ … the similarity between F/TAF and F/TDF can mean either that both drugs were effective or neither drug was effective because the population was not at substantial risk.”12 This distinction is crucial: if the former is true then the conclusion of NI is justified, whereas if the latter is true then the trial had no possibility of generating information about the relative effectiveness of the two drugs. The briefing document further outlined why the former hypothesis was more plausible, citing the high effectiveness of F/TDF in previous placebo-controlled trials of MSM and an apparent high underlying risk of HIV infection in the trial population based on self-reported condomless anal sex and high STI rates.
3. Averted Infections and the Averted Infections Ratio
The conventional approach11 to NI trials is based on comparing the numbers of diagnosed HIV infections between study arms. We applied the framework of averted (or prevented) infections to the DISCOVER results, which shifts focus to estimation and comparison of the unobserved numbers of HIV infections that were prevented by the study drug in each arm.10 We contend that evidence of the effectiveness of a PrEP product accumulates when it prevents infections that would otherwise have happened. If an overwhelmingly effective preventative is used in a high-risk population, there will be few or no events; proof of prevention is nonetheless abundant, though unseen, because a large number of infections are averted. If a 100% effective preventative is delivered in a population without risk, there will also be zero events; however, no infections are averted and no evidence of effectiveness is provided. The averted infections scale is particularly useful for trials with few HIV infections because it distinguishes between these two scenarios.
Estimation and comparison of the number of infections averted by F/TDF and by F/TAF requires specification of the HIV rate that would have occurred in the trial in the absence of PrEP; we termed this the “background HIV incidence rate”. Initially, we considered the rate of 1·44 per 100 PY, the value assumed in the design of DISCOVER for the F/TDF arm (hence, this estimate was conservatively low). With this rate, in the absence of PrEP, approximately 126 HIV infections would have been expected in the trial population, 63 in each arm. Far fewer HIV infections were observed in each arm of DISCOVER: 52 (=63–11) fewer on F/TDF and 57 (=63–6) fewer on F/TAF (Fig 1). Under this background rate, there were many more averted infections (n=109) than observed infections (n=17) in DISCOVER.
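The arithmetic in this paragraph can be traced with a few lines of code. This is a minimal sketch using only the person-years and event counts from Table 1 and the assumed background rate of 1·44 per 100 PY; the variable names are ours.

```python
BACKGROUND_PER_100PY = 1.44  # assumed background HIV incidence (DISCOVER design value)

# (post-baseline infections, person-years) per arm, from Table 1.
arms = {"F/TDF": (11, 4386), "F/TAF": (6, 4370)}

expected = {}
averted = {}
for arm, (events, person_years) in arms.items():
    # Infections expected in this arm had nobody received PrEP.
    expected[arm] = BACKGROUND_PER_100PY / 100 * person_years
    # Averted infections = expected without PrEP minus observed.
    averted[arm] = expected[arm] - events
    print(f"{arm}: expected {expected[arm]:.1f}, observed {events}, averted {averted[arm]:.1f}")

total_expected = sum(expected.values())
total_averted = sum(averted.values())
print(f"Total expected {total_expected:.0f}, total averted {total_averted:.0f}")
```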
Our preferred metric of effect preservation is the averted infections ratio10 (AIR) between the arms, 57/52 = 1·10, 95% CI: 0·94–1·27. In other words, we estimate that in DISCOVER F/TAF prevented from 27% more to 6% fewer infections than did F/TDF. Hence, F/TAF preserved at least 94% of the effect of F/TDF – far above 50% effect preservation. The AIR estimate of 1·10 is based on an assumed background HIV incidence rate of 1·44 per 100 PY. Fig 2 shows a graphical exploration of the effect of varying this assumption for our interpretation of the DISCOVER data, displaying the number of averted infections, the AIR and the lower limit of the 95% CI for the AIR according to assumed background HIV incidence. The horizontal line at 0.5 demarcates the region for which F/TAF averted at least 50% of infections — a measure of 50% effect preservation and thus evidence of NI. If we assume that the background HIV rate is at least 0.5 per 100 PY, a very low rate for a study in any reasonable setting, then we have confidence in NI. Confidence grows rapidly with higher assumed background incidence.
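The dependence of the AIR on the assumed background rate can be explored numerically. The sketch below is illustrative only: it treats the background rate as fixed and known, treats the arm counts as Poisson, and uses a delta-method interval on the log scale. The interval method is our assumption, not taken from the source, and it need not match published limits to the last digit.

```python
import math

def air_with_ci(background, events_new, py_new, events_ctl, py_ctl, z=1.959964):
    """AIR (new / control) with an approximate 95% CI.

    `background` is the assumed incidence per person-year without PrEP.
    ASSUMPTION: delta-method interval on log(AIR), background treated as fixed.
    """
    averted_new = background - events_new / py_new
    averted_ctl = background - events_ctl / py_ctl
    if averted_new <= 0 or averted_ctl <= 0:
        raise ValueError("background rate must exceed both observed rates")
    air = averted_new / averted_ctl
    # Var(rate) = events / py**2 for a Poisson count.
    var_log = (events_new / py_new**2) / averted_new**2 \
            + (events_ctl / py_ctl**2) / averted_ctl**2
    half_width = z * math.sqrt(var_log)
    return air, air * math.exp(-half_width), air * math.exp(half_width)

# Sweep a range of assumed background rates (per 100 PY), in the spirit of Fig 2.
for per_100py in (0.5, 1.0, 1.44, 3.0, 4.5):
    air, lower, upper = air_with_ci(per_100py / 100, 6, 4370, 11, 4386)
    print(f"background {per_100py:>4}: AIR {air:.2f} ({lower:.2f} to {upper:.2f})")
```

Under these assumptions the lower limit stays above 0·5 even at a background rate of 0·5 per 100 PY and rises as the assumed rate increases.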
The AIR can also be derived in terms of the assumed effectiveness for F/TDF (relative to background incidence).10 The DISCOVER protocol5 (see the Supplementary Appendix, p. S1) derived a working estimate of F/TDF effectiveness of 62% based on previous trials. Under this assumption, the estimated number of averted infections would be approximately 18 (F/TDF) and 23 (F/TAF) for an AIR of 1·28 with 95% CI: 0·71 to 1·49 — still far above 50% effect preservation.
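Writing θ for the assumed F/TDF effectiveness relative to background, the background rate is the F/TDF rate divided by (1 − θ), which gives AIR = (1 − RR·(1 − θ))/θ. The sketch below applies this identity with θ = 0·62; the function and variable names are ours, and only the point estimates are computed because the source does not give the interval method.

```python
def air_from_effectiveness(rate_ratio, control_effectiveness):
    """AIR implied by a rate ratio and an assumed control-arm effectiveness."""
    return (1 - rate_ratio * (1 - control_effectiveness)) / control_effectiveness

# Observed post-baseline rates per person-year (Table 1).
rate_taf = 6 / 4370
rate_tdf = 11 / 4386

theta = 0.62  # working estimate of F/TDF effectiveness from the DISCOVER protocol

# Background rate implied by the F/TDF arm under this assumption.
background = rate_tdf / (1 - theta)

averted_tdf = (background - rate_tdf) * 4386
averted_taf = (background - rate_taf) * 4370
air = air_from_effectiveness(rate_taf / rate_tdf, theta)

print(f"Averted: F/TDF {averted_tdf:.0f}, F/TAF {averted_taf:.0f}; AIR {air:.2f}")
```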
In Table 2, we demonstrate that small changes to the data would have had a large effect on whether NI, or even superiority, was supported by conventional methods of analysis. The results using the AIR were considerably more resilient. The addition of one HIV infection to the F/TAF arm caused the RR confidence limit to fall outside the NI boundary. However, such a change would only move the AIR from 1·10 (95% CI: 0·94 to 1·27) to 1·08 (95% CI: 0·92 to 1·26). Subtracting three infections from the F/TAF arm implied the superiority of F/TAF (as well as NI) using the RR. Notably, the AIR could also conclude superiority of F/TAF even though the point estimate (1·15) and the 95% CI (1·02 to 1·31) would not be considerably altered. This reflects the fact that when the number of observed infections is small, yet the plausible number of averted infections is large, the between-arm ratios of the latter will be much more stable than the former.
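The contrast in stability can be reproduced for the three scenarios of Table 2. As a hedged sketch, the code below uses log-scale Wald and delta-method intervals (our assumption about method, with the background rate fixed at 1·44 per 100 PY), so the AIR limits may differ slightly from the tabulated values.

```python
import math

Z = 1.959964
PY_TAF, PY_TDF = 4370, 4386
BACKGROUND = 1.44 / 100   # assumed background rate per person-year
NI_MARGIN = 1.62

def log_scale_ci(estimate, se_log):
    """Symmetric interval on the log scale."""
    return estimate * math.exp(-Z * se_log), estimate * math.exp(Z * se_log)

def summarise(events_taf, events_tdf):
    rate_taf, rate_tdf = events_taf / PY_TAF, events_tdf / PY_TDF
    rr = rate_taf / rate_tdf
    rr_ci = log_scale_ci(rr, math.sqrt(1 / events_taf + 1 / events_tdf))
    averted_taf, averted_tdf = BACKGROUND - rate_taf, BACKGROUND - rate_tdf
    air = averted_taf / averted_tdf
    # ASSUMPTION: delta-method standard error for log(AIR).
    se_air = math.sqrt(events_taf / (PY_TAF * averted_taf) ** 2
                       + events_tdf / (PY_TDF * averted_tdf) ** 2)
    return {"rr": rr, "rr_ci": rr_ci, "air": air, "air_ci": log_scale_ci(air, se_air)}

results = {}
for label, events_taf in {"A": 6, "B": 7, "C": 3}.items():
    res = results[label] = summarise(events_taf, 11)
    verdict = "NI shown" if res["rr_ci"][1] < NI_MARGIN else "NI not shown"
    print(f"{label}: RR {res['rr']:.2f} ({res['rr_ci'][0]:.2f}-{res['rr_ci'][1]:.2f}), "
          f"{verdict}; AIR {res['air']:.2f} "
          f"({res['air_ci'][0]:.2f}-{res['air_ci'][1]:.2f})")
```

The RR verdict changes across scenarios, whereas the AIR lower limit remains well above 0·5 in all three.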
4. Incorporation of Counterfactual Placebo Evidence – A Bayesian Approach
External evidence suggests that the background HIV incidence in DISCOVER is likely to have been well above 1 per 100 PY. Other MSM/TW PrEP trials with similar inclusion criteria reported background HIV incidence ranging from 3·9 to 9·0 per 100 PY.13–15 DISCOVER participants had a high incidence of sexually transmitted infections (STIs): the rectal gonorrhoea rate was 21·0 per 100 person-years,8 suggesting sexual practices that facilitate HIV transmission. Within a month of enrolment, five suspected baseline infections were diagnosed in DISCOVER. Assuming an eclipse period of 2 weeks among the participants initiating PrEP at enrolment (n=4498), this would correspond to 173 PY of follow-up, implying a pre-PrEP incidence rate of 2·9 per 100 PY (95% CI: 0·9 to 6·7). Note that 17% of DISCOVER participants enrolled into the study directly from daily TDF/FTC; they do not count as PrEP “initiators” and are therefore not included in the background incidence calculation.
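The pre-PrEP incidence estimate can be reconstructed from the five baseline infections and the assumed two-week eclipse period. The sketch below assumes an exact (Garwood-type) Poisson interval, found here by bisection; the source does not name the interval method, so this is an assumption, though it is consistent with the quoted limits. Helper names are ours.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

def solve_decreasing(func, target, lo, hi, iterations=100):
    """Bisection for func(x) = target where func is decreasing in x."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid   # CDF still too large, so the mean must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given k observed events."""
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda m: poisson_cdf(k - 1, m), 1 - alpha / 2, 0.0, 100.0)
    upper = solve_decreasing(lambda m: poisson_cdf(k, m), alpha / 2, 0.0, 100.0)
    return lower, upper

initiators = 4498
eclipse_weeks = 2
person_years = initiators * eclipse_weeks / 52
baseline_infections = 5

rate = baseline_infections / person_years * 100
lower, upper = (limit / person_years * 100 for limit in exact_poisson_ci(baseline_infections))
print(f"{person_years:.0f} PY; {rate:.1f} per 100 PY (95% CI {lower:.1f} to {upper:.1f})")
```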
We used a Bayesian approach to generate inferences regarding the hypothetical background HIV incidence rate among the DISCOVER participants had they not been receiving PrEP, based on both the observed background HIV incidence and the incidence of rectal gonorrhoea during the study period. For the latter, we used data from a systematic review by Mullick and Murray16 that evaluated the correlation between HIV and rectal gonorrhoea incidence rates among MSM not using PrEP. The linear regression formula from this study yields an estimated background HIV incidence of 6·6 per 100 PY for the DISCOVER participants. We built on this approach, directly using the raw data from the systematic review in our analysis. Additionally, the HIV incidence inferred from the baseline prevalent infections in DISCOVER was included in the model, assuming this was consistent with the background incidence during the trial. We applied a Bayesian analysis to estimate the AIR using a weakly informative prior (to restrict values to a plausible range) for background HIV incidence, combined with the trial results augmented with data on rectal gonorrhoea and baseline infections. We used Stan17 software to estimate the posterior mean and associated 95% credible interval (CrI) for the AIR and other parameters of interest.18 The complete details of our methods are provided in the Supplementary Appendix (pp. S2–S3).
Fig 3 shows the posterior density for the background HIV incidence in the DISCOVER trial based on this analysis, with a posterior mean of 4·5 (95% CrI 2·0–7·3) per 100 person-years. Fig 4 displays the posterior density for the associated AIR, with posterior mean 1·03 (95% CrI 0·98–1·11). The posterior is a flexible tool for describing uncertainty, for example allowing us to calculate that the probability that the AIR lies between 0·95 and 1·05 (effectiveness preservation within +/− 5%) equals 80%. While the posterior probability of superiority (AIR > 1) is 88%, the probability that the effect of F/TAF was more than 10% higher (AIR > 1·1) was 3%. Hence, the analysis strongly suggested that the posterior effectiveness of F/TDF (93%, 95% CrI 85 to 97) and F/TAF (96%, 95% CrI 91 to 99) were very similar and that any potential superiority of F/TAF is modest.
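How uncertainty in the background incidence propagates to the AIR can be illustrated with a simple Monte Carlo sketch. This is not the Stan model described in the Supplementary Appendix. It assumes, purely for illustration, a gamma distribution for the background incidence matched to the reported posterior mean and credible interval, independent gamma posteriors for the arm rates under a Jeffreys prior, and a fixed random seed; all names and distributional choices are ours.

```python
import random
import statistics

random.seed(2020)
N_DRAWS = 20000

# ASSUMPTION: gamma approximation to the reported posterior for background
# incidence (mean 4.5 per 100 PY, SD taken from the 95% CrI width).
bg_mean = 4.5
bg_sd = (7.3 - 2.0) / 3.92
bg_shape = (bg_mean / bg_sd) ** 2
bg_scale = bg_sd**2 / bg_mean

def draw_rate(events, person_years):
    """Draw a rate per 100 PY from Gamma(events + 0.5, 1 / PY) (Jeffreys prior)."""
    return random.gammavariate(events + 0.5, 1.0 / person_years) * 100

airs = []
for _ in range(N_DRAWS):
    background = random.gammavariate(bg_shape, bg_scale)
    rate_taf = draw_rate(6, 4370)
    rate_tdf = draw_rate(11, 4386)
    # Keep only draws where averted infections are positive in both arms.
    if background > max(rate_taf, rate_tdf):
        airs.append((background - rate_taf) / (background - rate_tdf))

airs.sort()
mean_air = statistics.fmean(airs)
lower = airs[int(0.025 * len(airs))]
upper = airs[int(0.975 * len(airs))]
p_superior = sum(a > 1 for a in airs) / len(airs)
p_within_5pct = sum(0.95 <= a <= 1.05 for a in airs) / len(airs)

print(f"AIR mean {mean_air:.2f} ({lower:.2f} to {upper:.2f}); "
      f"P(AIR > 1) = {p_superior:.2f}; P(0.95 <= AIR <= 1.05) = {p_within_5pct:.2f}")
```

Because the sketch omits the gonorrhoea regression and the joint model, its summaries should be read as qualitative only.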
5. Implications for DISCOVER and beyond
Based on HIV incidence, the evidence for F/TAF’s non-inferiority from DISCOVER might appear to be weak; the trial observed just 15% (22/144) of its planned endpoints. While the trial met the pre-specified margins for non-inferiority, conclusions are sensitive to relatively small changes to the number of infections in each arm. However, our Bayesian analysis estimates an 80% posterior probability that the ratio of averted infections lies between 0·95 and 1·05 (effectiveness preservation within +/− 5%); hence, we are confident not just that F/TAF is non-inferior to F/TDF, but that the effectiveness of F/TDF and F/TAF (relative to the background HIV incidence) are highly similar.
Nevertheless, evaluation of the clinical and public health utility of F/TAF must incorporate many issues beyond effectiveness, including safety, cost, and access. Because DISCOVER only enrolled MSM/TW, clinical data on F/TAF effectiveness for PrEP are lacking in many key populations, including cisgender women, transgender men, heterosexual men, and people who inject drugs.19 Although F/TDF is associated with small decreases in renal and bone health, F/TAF is associated with small increases in weight and cholesterol. However, these small changes may not be clinically significant.20–22 From a societal perspective, generic F/TDF will be available in 2020, bringing discounted prices and making it unlikely that F/TAF will be cost-effective.22
Beyond its specific results, the DISCOVER trial is highly instructive for future active controlled PrEP trials. A nPrEP agent is most effectively assessed in a trial enrolling people at substantial risk of HIV infection. If there are high levels of adherence to efficacious agents, the number of observed infections will be low, but the number of averted infections will be high. This fact underpins the rationale for our proposed framework for analysis which places the emphasis on the number of averted infections. The inference on averted infections requires explicit assumptions about the background HIV rate and/or control arm effectiveness.
Our method contrasts with the fixed margin NI method11 which was used to power the DISCOVER and HPTN083 trials. This approach develops a pre-specified margin which is compared to the confidence interval for the RR. The fixed margin is derived from two components: assumptions about control group effectiveness and a minimum standard for effect preservation. The conflation of assumed control effectiveness and the set standard for NI in the fixed margin method makes it difficult to compare evidence for effectiveness and NI across studies. Note that the DISCOVER results would not meet the NI margin of 1·23 set for the HPTN083 study (cf. 1·62 for DISCOVER). The difference in margins in this instance does not arise from differences in standards for effect preservation, but is due to HPTN083 assuming lower F/TDF adherence than DISCOVER (reflecting the former’s participant recruitment strategy). The development of these NI margins is contrasted in the Supplementary Appendix (Tables S1 and S2, pp S4–S5). Another problem with the fixed margin approach is that it is entirely pre-specified. If data arise during the trial (e.g., higher than assumed adherence) which contradict the assumed control arm effectiveness, the NI margin can be excessively conservative.23 There is an incentive to default to conservative assumptions in the design phase; this conservatism costs power and requires larger sample sizes.24
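To make the two components of a fixed margin concrete, the sketch below uses one common construction: the assumed control effectiveness is converted to a placebo-versus-control rate ratio, and the margin preserves a stated fraction of that effect on the log scale. The Supplementary Appendix is not reproduced here, so treating this as the construction behind either trial's margin is an assumption on our part; it does, however, return 1·62 for the 62% working estimate and 50% preservation. Function names are hypothetical.

```python
import math

def fixed_margin(control_effectiveness, preserved_fraction=0.5):
    """Rate-ratio NI margin preserving a fraction of the control effect (log scale).

    ASSUMPTION: log-scale effect preservation, one common fixed-margin construction.
    """
    placebo_vs_control = 1 / (1 - control_effectiveness)
    return math.exp((1 - preserved_fraction) * math.log(placebo_vs_control))

def implied_effectiveness(margin, preserved_fraction=0.5):
    """Control effectiveness implied by a margin under the same construction."""
    placebo_vs_control = math.exp(math.log(margin) / (1 - preserved_fraction))
    return 1 - 1 / placebo_vs_control

discover_margin = fixed_margin(0.62)
print(f"Margin for 62% effectiveness, 50% preservation: {discover_margin:.2f}")

# Hypothetical illustration: a smaller margin with the same preservation
# standard corresponds to a lower assumed control effectiveness.
print(f"Effectiveness implied by a margin of 1.23: {implied_effectiveness(1.23):.2f}")
```

The example shows why margins from different trials are hard to compare directly: the same preservation standard yields different margins whenever the assumed control effectiveness differs.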
Our strategy, by contrast, combines inference and decision on NI based on a pre-specified standard (e.g. AIR ≥ 0·50) with assumptions on the background HIV incidence and/or the control arm effectiveness. The uncertainty in these parameters is handled transparently and formally through sensitivity analysis (e.g., Fig 2) or Bayesian analysis. The latter approach can incorporate data which inform these key parameters. Sources can include HIV infections during screening, on-trial STI rates and/or drug levels. One might choose a sceptical prior distribution on the background HIV incidence as a way to require a higher standard of evidence. An advantage of this approach is that it deals with key assumptions in a way which is flexible and transparent while enforcing a pre-specified standard for effect preservation. Our hope is that the incorporation of external information can allow for smaller trial sizes than those determined by the fixed margin approach. This is an area of active investigation.
This strategy, combining Bayesian inference with the AIR, will be particularly advantageous for active controlled trials with low event rates, which can be strongly informative if there is confidence that they have been well-conducted in cohorts at substantial risk for HIV in the absence of PrEP.
Supplementary Material
Acknowledgments
Authors were supported by US National Institutes of Health grant R01AI143357 (to DVG). DTD was supported by UK Medical Research Council grant MR_UU_12023/23. We thank Sheena McCormack and Julia Marcus for their insights and critical reading of the manuscript.
Declaration of interests
DTD reports personal fees from ViiV Healthcare and Gilead Sciences, outside of the submitted work. DVG has accepted fees from Gilead Sciences and Merck.
Contributor Information
Prof. David V. Glidden, Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, USA.
Dr. Oliver T. Stirrup, Institute for Global Health, University College London, London, UK.
Prof. David T. Dunn, MRC Clinical Trials Unit at University College London, London, UK.
Citations
- 1. Sugarman J, Celum CL, Donnell D, Mayer KH. Ethical considerations for new HIV prevention trials. Lancet HIV. 2019; 6:e489–91.
- 2. Janes H, Donnell D, Gilbert PB, et al. Taking stock of the present and looking ahead: envisioning challenges in the design of future HIV prevention efficacy trials. Lancet HIV. 2019; 6:e475–82.
- 3. Donnell D. Current and future challenges in trial design for pre-exposure prophylaxis in HIV prevention. Stat Comm Inf Dis. 2019; 11(1).
- 4. Glidden DV, Mehrotra ML, Dunn DT, Geng EH. Mosaic effectiveness: measuring the impact of novel PrEP methods. Lancet HIV. 2019; 6:e800–6.
- 5. Safety and efficacy of emtricitabine and tenofovir alafenamide (F/TAF) fixed-dose combination once daily for pre-exposure prophylaxis in men and transgender women who have sex with men and are at risk of HIV-1 infection (DISCOVER). https://clinicaltrials.gov/ct2/show/NCT02842086 Retrieved on 30 April 2020.
- 6. Safety and efficacy study of injectable cabotegravir compared to daily oral tenofovir disoproxil fumarate/emtricitabine (TDF/FTC), for pre-exposure prophylaxis in HIV-uninfected cisgender men and transgender women who have sex with men. https://clinicaltrials.gov/ct2/show/NCT02720094 Retrieved on 30 April 2020.
- 7. Evaluating the safety and efficacy of long-acting injectable cabotegravir compared to daily oral TDF/FTC for pre-exposure prophylaxis in HIV-uninfected women. https://clinicaltrials.gov/ct2/show/NCT03164564 Retrieved on 30 April 2020.
- 8. Hare CB, Coll J, Ruane P, et al. The phase 3 DISCOVER study: Daily F/TAF or F/TDF for HIV pre-exposure prophylaxis. In Congress of Retroviruses and Opportunistic Infections, Seattle, Feb 2019. http://www.croiwebcasts.org/p/2019croi/104 Retrieved on 30 April 2020.
- 9. Voelker R. PrEP drug is approved for some patients but not for others. JAMA. 2019; 322:1644.
- 10. Dunn DT, Glidden DV, Stirrup OT, McCormack S. The averted infections ratio: a novel measure of effectiveness of experimental HIV pre-exposure prophylaxis agents. Lancet HIV. 2018; 5:e329–34.
- 11. Fleming TR, Odem-Davis K, Rothmann MD, Li Shen Y. Some essential considerations in the design and conduct of non-inferiority trials. Clin Trials. 2011; 8:432–39.
- 12. DESCOVY® for HIV pre-exposure prophylaxis: antimicrobial drugs advisory committee meeting briefing document. https://www.fda.gov/media/129609/download Retrieved on 30 April 2020.
- 13. Grant RM, Lama JR, Anderson PL, et al. Preexposure chemoprophylaxis for HIV prevention in men who have sex with men. N Engl J Med. 2010; 363:2587–99.
- 14. McCormack S, Dunn DT, Desai M, et al. Pre-exposure prophylaxis to prevent the acquisition of HIV-1 infection (PROUD): effectiveness results from the pilot phase of a pragmatic open-label randomised trial. Lancet. 2016; 387:53–60.
- 15. Molina JM, Capitant C, Spire B, et al. On-demand preexposure prophylaxis in men at high risk for HIV-1 infection. N Engl J Med. 2015; 373:2237–46.
- 16. Mullick C, Murray J. Correlations between HIV infection and rectal gonorrhea incidence in men who have sex with men: Implications for future HIV pre-exposure prophylaxis trials. J Infect Dis. 2020; 221:214–7.
- 17. Carpenter B, Lee D, Brubaker MA, Riddell A, Gelman A, Goodrich B, Guo J, Hoffman M, Betancourt M, Li P. Stan: A Probabilistic Programming Language. J Stat Softw. 2017; 76:1–32.
- 18. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation (Vol. 13). John Wiley & Sons; 2004.
- 19. Goldstein RH, Walensky RP. Where were the women? Gender parity in clinical trials. N Engl J Med. 2015; doi: 10.1056/NEJMp1913547
- 20. Gupta SK, Post FA, Arribas JR, et al. Renal safety of tenofovir alafenamide vs. tenofovir disoproxil fumarate: A pooled analysis of 26 clinical trials. AIDS. 2019; 33:1455–65.
- 21. Walensky RP, Horn TH, Paltiel AD. The Epi-TAF for tenofovir disoproxil fumarate? Clin Infect Dis. 2015; 62:915–8.
- 22. Hill A, Hughes SL, Gotham D, Pozniak AL. Tenofovir alafenamide versus tenofovir disoproxil fumarate: is there a true difference in efficacy and safety? J Virus Erad. 2018; 4:72–9.
- 23. Hanscom B, Hughes JP, Williamson BD, Donnell D. Adaptive non-inferiority margins under observable non-constancy. Stat Methods Med Res. 2019; 28:3318–32.
- 24. Dunn DT, Glidden DV. The connection between the averted infections ratio and the rate ratio in active-control trials of pre-exposure prophylaxis agents. Stat Comm Inf Dis. 2019; 11(1).