Abstract
Clinical trials continue to be the gold standard for evaluating new medical technologies. Advances in modern computing power have led to increasing interest in Bayesian methods. Despite the multiple benefits of Bayesian approaches, their application to clinical trials has been limited. Based on insights from a survey of clinical researchers in drug development conducted by the Drug Information Association Bayesian Scientific Working Group (DIA BSWG), insufficient knowledge of Bayesian approaches was ranked as the most important perceived barrier to implementing Bayesian methods. Results of the same survey indicate that clinical researchers may find the interpretation of results from a Bayesian analysis to be more useful than conventional interpretations. In this article, we illustrate key concepts tied to Bayesian methods, starting with familiar concepts widely used in clinical practice before advancing in complexity, and we use practical illustrations from clinical development.
Keywords: Bayesian methods, Bayesian statistics, Clinical development, Clinical trials, Drug development
Introduction
A survey was conducted to determine the barriers to implementing Bayesian methods in clinical trials and to gain insight into current levels of understanding of common conventional and Bayesian statistics among medical researchers. The survey was conducted by the Medical Outreach Team of the Drug Information Association Bayesian Scientific Working Group (DIA BSWG) [1, 2]. Over half of respondents expressed little to no comfort in interpreting Bayesian analyses, though respondents indicated an interest in gaining exposure to Bayesian methods.
This tutorial aims to provide clinical researchers working in drug or device development with an introduction to key Bayesian concepts. We assume the reader has a basic understanding of more traditional frequentist approaches to clinical trial design and analysis.
We start with an intuitive introduction to Bayesian reasoning. Then we introduce Bayes’ theorem, which is essential to the Bayesian framework. Using key concepts and definitions within both the frequentist and Bayesian frameworks, we illustrate how Bayesian methods facilitate the interpretation of statistical results. Next, we introduce the key concepts of Bayesian modeling and illustrate the importance of selecting a “prior.” One particular appeal of Bayesian methods is the use of sequential learning and Bayesian interim decision-making. In this article, we provide a simple and intuitive introduction to key Bayesian concepts, which can be applied to any new medical intervention, including drugs, devices, and diagnostics. We discuss a recent example of a Bayesian pivotal trial that resulted in the approval of a well-known COVID-19 vaccine. We conclude with a discussion of how Bayesian methods allow us to leverage data external to the clinical trial.
Introduction to the Bayesian Framework
Though the term Bayesian theory may sound new to many readers, most clinicians and clinical researchers are already familiar with Bayesian thinking. Bayesian reasoning aligns naturally with medical education and practice [3]. Bayesian methods allow for sequential learning and provide plausible values for a treatment effect or diagnosis that are compatible with both the observed data and prior knowledge or beliefs [4]. Bayesian thinking provides for the formal incorporation of what one knows before collecting data and then updating what is known with the acquired data. The prior probability is the probability of an outcome of interest before the data are collected. The posterior probability is the updated probability after incorporating the data collected during the trial.
In diagnostic medicine, data are collected on one patient. In a clinical trial, data are collected on numerous patients enrolled in the trial. In either instance, there is the option of reassessing probabilities of diagnoses or outcomes of interest after some or all of the data come in. Because the diagnostic example is simpler and perhaps more intuitive, we illustrate Bayesian thinking using the diagnostic example first.
In diagnostic testing, in the absence of data specific to the patient being assessed, the prior probability may correspond to the chance that the patient has one of two or more diagnoses. The simplest categorization could be that the patient does or does not have a particular disease, so the prior probability is p for having the target condition and 1-p for the subject not having the condition, where p is the general prevalence of the condition. With multiple diagnoses possible, there may be a prior distribution (a probability for each possible diagnosis having a value and the probabilities adding to one). After collecting assessments on the subject, one updates the probability of each diagnosis considered and stops when there is sufficient information to establish a diagnosis.
To illustrate, let’s assume that, based on a patient’s history and presentation, there are a small number of possible diagnoses (e.g., 4) that may appear equally likely at first. Then, as additional testing and assessments come into play, some diagnoses become more likely (their probabilities increasing) and others less likely (their probabilities decreasing). In some instances, certain diagnoses may be ruled out entirely. During the COVID-19 pandemic, someone with respiratory symptoms might be evaluated for COVID-19 before moving on to any additional testing because COVID-19 is so common (and our assumption of equally likely diagnoses would not be very good). If the COVID-19 test is negative, additional tests may be used to decide whether there is an infection (bacterial or viral) or no infection at all. Based on the results, one can exclude diagnosis A (COVID-19) because it is not compatible with the results (e.g., a very good diagnostic test ruled out a COVID-19 infection). This means that our posterior probability for diagnosis A is zero or very close to zero (few diagnostic tests are right 100% of the time). The clinician may want to rule out other infections (B, a bacterial infection, and C, a non-COVID viral infection) before moving to other explanations such as D, an inflammatory or allergic condition. Each time additional tests are performed, the clinician updates the probability of each of the remaining diagnoses (Fig. 1). In Bayes’ terminology, the clinician combines their prior with the results of their assessments to obtain a new posterior distribution. Before a clinician can recommend how to treat a patient, the number of likely diagnoses needs to be sufficiently small, and the best diagnosis is apt to be the one with the highest posterior probability.
Figure 1.
Reallocation of credibility.
Adapted from J. Kruschke [5].
Illustration of Bayes’ Theorem
Bayes’ theorem (also referred to as Bayes’ rule) is a mathematical relationship between the prior probability and the posterior probability conditional on data [5]. In general terms, Bayes’ theorem is expressed as follows:

P(A|B) = P(B|A) × P(A) / P(B)
where A and B are events of interest, and P(A) and P(B) are probabilities of event A and B, respectively. P (A|B) denotes the probability of event A happening on the condition that event B has already happened. Similarly, P(B|A) denotes the probability that the event B happens conditional on information that event A has already happened.
We will continue with a more realistic diagnostic example to illustrate the application of this theorem, using the OraQuick In-Home HIV Test, the only FDA-approved HIV test for self-testing at home [6]. Assume you perform the test and get a positive result. What you really want to know is: what is your probability of truly having HIV if you test positive? Using the mathematical expression above, B is the event of testing positive, and A is the event of having the disease. P(B) is the probability of testing positive, and P(A) is the probability of having the disease (i.e., HIV). P(B|A) is the same as sensitivity (P(test + | disease +)), i.e., the probability that someone will test positive given they have the disease.
The OraQuick label notes that in a prospective clinical study, this test was observed to have 92% sensitivity and 99.98% specificity. This means that one false negative result would be expected out of every 12 test results in truly HIV-infected individuals, and one false positive result would be expected out of every 5,000 test results in truly uninfected individuals [6].
One may be tempted to conclude that if you test positive (event B occurred), you have a 92% probability of having HIV (event A). This logic is wrong, though unfortunately not uncommon. The probability of testing positive if you have the disease (P(B|A), or 92% in this case) is not the same as the probability of having the disease if you test positive (P(A|B)). The test’s sensitivity and specificity do not answer the key question: what is the probability that an individual indeed has HIV if their test is positive (P(A|B))? This is known as the positive predictive value (PPV) of the test.
The answer to this question depends on another very important parameter: the prevalence of the disease in the population. To figure out the probability of having the disease if you test positive, we need to know the overall prevalence of HIV in the population (P(A)), or prior information, using Bayesian terminology. Figure 2 below illustrates how we can arrive at the answer to this question using the HIV test example. The PPV of OraQuick is about 65%, which means that even after a positive test result, one is not sure whether the patient really has HIV, and one would want to obtain an additional confirmatory laboratory-based test.
Figure 2.
Illustration of PPV and Bayes’ Theorem.
Taking into consideration the prevalence of a disease condition is very important for understanding the value of a diagnostic test in clinical practice. The rarer the disease, the higher the specificity required for a test to have a high PPV. For illustration, if the specificity of an HIV test were 99%, rather than the 99.98% that appears on the label, the positive predictive value (i.e., the probability of having HIV if you test positive) would decrease from 65% to approximately 4%. If the test is used in a high-risk population, the PPV in this population would increase due to the increased prevalence. Conversely, if the prevalence of HIV were as low as 6.3 cases per 100,000, the PPV would decrease from 65% to 22.5%.
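The PPV arithmetic above is a direct application of Bayes’ theorem and can be verified in a few lines. In this sketch, `ppv` is a helper we define ourselves, and the baseline prevalence of 0.04% is a hypothetical value chosen because it reproduces the roughly 65% PPV quoted above.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem:
    P(disease | test+) = P(test+ | disease) * P(disease) / P(test+)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

prevalence = 0.0004  # assumed ~0.04% background prevalence (hypothetical)
print(round(ppv(0.92, 0.9998, prevalence), 2))  # -> 0.65
print(round(ppv(0.92, 0.99, prevalence), 2))    # -> 0.04 with 99% specificity
print(round(ppv(0.92, 0.9998, 6.3e-5), 3))      # -> 0.225 at 6.3 per 100,000
```

Lowering specificity from 99.98% to 99% multiplies the false-positive rate fifty-fold, which is why the PPV collapses even though sensitivity is unchanged.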
Bayesian Methods: Facilitating Interpretations
Having illustrated how Bayes’ theorem relates the prior and the posterior probabilities, we move to therapeutics studies to illustrate how Bayesian methods can facilitate the statistical interpretation and, ultimately, help with the clinical interpretation.
In a clinical study evaluating a new treatment against a control (denoted T for Treatment and C for Control), the prior could be the chance that the new treatment outperforms the control with respect to a particular endpoint. Denote the probability that Treatment is better than Control as P(T > C). Assuming equipoise before starting the trial, the chance of either treatment being better is 50:50, i.e., P(T > C) = 0.5 and P(T ≤ C) = 1 − P(T > C) = 0.5. After the trial is over, one may combine the data from the trial with the prior to determine whether one is convinced that Treatment is better than Control overall (e.g., P(T > C) exceeds 0.975). If results from the trial lead to a conclusion that the treatments are not all that different, it would be unlikely that the new treatment would be advanced for further consideration. An alternative approach is to express the prior, and the subsequent calculation of a posterior, in terms of the treatment effect (e.g., the difference in means between treatments). Initially at equipoise, the prior assumes the average treatment effect is zero, but one can add uncertainty by making assumptions about the variability around that average of zero.
When we evaluate a new treatment using more traditional frequentist methods, we start by formulating a hypothesis that we want to test against (null hypothesis), typically that there is no difference between the study drug and control, i.e., the drug does not work. In general, the null hypothesis is posed and then potentially refuted by the evidence presented. Based on the data collected from the trial, frequentist methods estimate the probability (p value) that data as or more extreme are observed assuming the null hypothesis (no effect) is true. Using a conventionally accepted p value threshold of 0.05, we reject or don’t reject the null hypothesis. If we reject the null hypothesis, we have a statistically significant (positive) study. A failure to reject does not mean there is no effect, and we are often left with an inconclusive result in that setting.
Bayesian thinking facilitates a common-sense interpretation of statistical conclusions [8]. Bayesian methods can help answer our research questions because they allow one to estimate the probability that the hypothesis is true in light of the data. In the Bayesian framework, we estimate the probability of the effect based on the observed data. Consequently, a 95% credible interval, by definition, contains the true parameter with 95% probability. By comparison, a frequentist 95% confidence interval means that the true response would be contained within 95% of the confidence intervals produced by repeating the same experiment. Those learning statistics often confuse the definition of a frequentist confidence interval with that of a Bayesian credible interval.
Bayesian methods allow us to estimate the probability of different magnitudes of treatment effect, replacing the p value with easier and more clinically relevant interpretations. Table 1 below summarizes the key concepts used for results interpretation in each framework.
Table 1.
Key concepts within frequentist and Bayesian frameworks.
| Frequentist | Bayesian |
|---|---|
| Estimates the probability of the observed data given (assuming) that the null hypothesis is true | Estimates the probability that the hypothesis is true given the data |
| P value = probability(data as or more extreme than observed \| null hypothesis) | Posterior probability of the hypothesis of interest = probability(hypothesis \| data) |
| Provides the probability that the observed effect could have arisen by chance if the null is true | Provides the probability of the treatment effect based on the observed data and the prior |
| The parameter of interest is fixed but unknown, and the data are random; the parameter is estimated from the data along with a measure of estimation error (standard error) | Parameters are random; the posterior distribution given the data describes the distribution of the parameter once the data are observed |
| 95% confidence interval: the true effect size would be contained within 95% of the confidence intervals produced over repeated experiments | 95% credible interval: contains the true parameter with 95% probability, given the observed data |
Several recently published articles have reported post hoc Bayesian analyses of previously published frequentist trials that did not meet their primary endpoints [3, 9]. The null hypotheses could not be rejected using p values, and frequentist methods could not answer questions about the probability that these investigational approaches were efficacious. For example, the EOLIA trial evaluated the efficacy of early venovenous extracorporeal membrane oxygenation (ECMO) vs. conventional treatment in patients with severe acute respiratory distress syndrome (ARDS) [10]. The EOLIA trial reported 60-day mortality of 35% in the ECMO group vs. 46% in the control group, with a relative risk of 0.76, a corresponding frequentist 95% confidence interval of 0.55–1.04, and a p value of 0.09. The null hypothesis could not be rejected, and the frequentist analysis did not provide insight into the probability of ECMO efficacy. Subsequently, researchers conducted post hoc Bayesian analyses, which showed that across a broad range of prior assumptions about the probability of benefit from early ECMO, the posterior probability of any mortality benefit (relative risk < 1) ranged between 88 and 99%. An absolute risk reduction (ARR) of 2% was considered a reasonable minimum clinically important effect because it would translate into an estimated 500 lives saved every year in the United States. For an ARR of 2% or more, the posterior probability of benefit ranged between 78 and 98%, depending on the prior [3].
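A simplified version of such a reanalysis can be sketched using a normal approximation on the log relative risk, with the standard error backed out of the reported 95% confidence interval. The prior means and standard deviations below are illustrative assumptions (not the priors used in the published reanalysis), so the resulting probabilities only roughly track the 88–99% range reported above.

```python
from math import log
from statistics import NormalDist

# EOLIA reported RR 0.76 with 95% CI 0.55-1.04 on 60-day mortality.
log_rr = log(0.76)
se = (log(1.04) - log(0.55)) / (2 * 1.96)  # back out the SE from the CI width

def posterior_prob_benefit(prior_mean, prior_sd):
    """Posterior P(RR < 1) from a conjugate normal-normal update on log RR."""
    w_data, w_prior = 1 / se ** 2, 1 / prior_sd ** 2
    post_mean = (w_data * log_rr + w_prior * prior_mean) / (w_data + w_prior)
    post_sd = (w_data + w_prior) ** -0.5
    return NormalDist(post_mean, post_sd).cdf(0.0)  # P(log RR < 0)

# Near-flat prior: the data dominate (probability of benefit ~0.95)
print(round(posterior_prob_benefit(0.0, 10.0), 3))
# Skeptical prior centred on no effect pulls the probability down somewhat
print(round(posterior_prob_benefit(0.0, 0.35), 3))
```

Note that even though the frequentist p value (0.09) did not reach significance, the posterior probability of any benefit is high under a wide range of priors.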
Post hoc Bayesian analysis provided useful information for the studies’ interpretation by estimating the posterior distribution of effect sizes of investigational therapies. However, it is important to highlight that although Bayesian methods could help interpret failed studies to guide future research, Bayesian methods cannot rescue failed trials. To fully leverage the benefits of the Bayesian framework, the Bayesian analysis needs to be pre-specified in the study protocol.
ISCHEMIA was a frequentist trial with pre-specified supportive Bayesian analysis [11]. It was designed to evaluate the effect of adding an invasive strategy—cardiac catheterization and revascularization—to medical therapy in 5,179 patients with stable coronary disease and moderate or severe ischemia. Table 2 illustrates side-by-side information we can derive from frequentist vs. Bayesian analysis.
Table 2.
Frequentist and Bayesian reporting of selected outcomes of ISCHEMIA trial [11].
The primary endpoint was a composite of death from cardiovascular causes, myocardial infarction, or hospitalization for unstable angina, heart failure, or resuscitated cardiac arrest.

| Frequentist | Bayesian |
|---|---|
| At 5 years, the cumulative event rate was 16.4% and 18.2%, respectively; the difference was −1.8 percentage points (95% confidence interval −4.7 to 1.0) | The posterior probability that the difference in group-specific 5-year cumulative rates of the primary outcome is greater than 3 absolute percentage points was estimated to be 24.5% for a difference favoring the invasive strategy and less than 0.1% for a difference favoring the conservative strategy |
| At 5 years, the cumulative event rate for death from any cause was 9.0% and 8.3% (95% confidence interval −1.6 to 3.1) | The probability that the difference in the 5-year rate of death from any cause is greater than 1 absolute percentage point was estimated to be 10.7% for a difference favoring the invasive strategy and 32.1% for a difference favoring the conservative strategy |
Introduction to the Components of a Bayesian Approach
To design and interpret Bayesian trials, it is crucial to understand key components of a Bayesian model: prior, likelihood, posterior distribution, and predictive probability. To illustrate the first three components, let’s assume that we are developing a new therapy for patients with acute ST-segment elevation myocardial infarction (STEMI). For our proof of concept, we plan to evaluate myocardial infarct size, which is expressed as a percentage of left ventricular volume or mass. We assume that infarct size is normally distributed with parameters µ (population mean) and σ (population standard deviation). As a reminder, a probability distribution is a list (for categorical variables) or a mathematical expression (for continuous variables) for all possible outcomes and their corresponding probabilities [5]. The approaches to model infarct size with these parameters under Bayesian and frequentist frameworks are illustrated in Table 3 below.
Table 3.
Model components within Frequentist and Bayesian frameworks.
| Frequentist framework | Bayesian framework |
|---|---|
| The parameters µ and σ are fixed but unknown, and we estimate them from the collected data | The uncertainty about µ and σ is handled directly with probability, i.e., these parameters are treated as random with their own probability distributions |
| Uncertainty is addressed by taking a sample of N patients and evaluating the sampling distribution; uncertainty around the observed estimates is quantified as if the sampling of N patients were repeated over and over | Before collecting the data, a prior distribution is assumed for the parameters, characterizing the uncertainty about µ and σ; once the data are obtained, the prior is updated via Bayes’ theorem to yield the posterior distribution, from which updated probability statements and credible intervals are calculated |
| In other words, the parameters are considered fixed and the observed data are treated as random | In other words, the parameters of interest are viewed as random variables |
Priors play a critical role in Bayesian methodology. The prior distribution is a key part of Bayesian inference and represents the information about an uncertain parameter before the data are collected [12]. Conceptually, the prior specifies plausible values for the parameters and the uncertainty around them. Priors can have different degrees of informativeness, ranging from a non-informative prior (if we have no prior data) to a weakly informative prior to an informative prior. The prior is constructed before the start of a clinical trial. Methods for choosing the prior range from expert opinion to data-driven approaches. Data-driven methods (e.g., using data from completed clinical trials) are usually preferred. We illustrate the impact of this choice and provide an example of prior selection in the next section.
The likelihood is the probabilistic model for the data. It describes the data-generation process as a function of the parameters of interest. When we assume infarct size follows a normal distribution, the parameters are the mean µ and variance σ². The likelihood is an expression involving both the observed data and the parameters of interest.
When we obtain data from our trial, Bayes’ theorem gives us a mathematical way to combine the prior and the likelihood to get the posterior distribution. The posterior distribution is the probability distribution of the unobserved parameters of ultimate interest, given the observed data [8]. It describes the distribution of the parameters (µ and σ²) in light of the newly obtained data.
After we obtain the posterior distribution, we can do predictive checking by simulating future data to assess if the model provides a good fit to the original data. Posterior predictive distributions are used to extrapolate beyond the observed data and to make predictions [13]. They can be used to predict the probability of the success of a trial (or some other aspect of the trial) given data at an interim time. They can also be used as input in an adaptive design where treatment arms may be dropped in a preplanned fashion because the arms are not predicted to be successful or possibly only a subset of subjects may be studied going forward according to a plan because a particular group is unlikely to generate a difference in treatment response. Similarly, historical data can be used to construct priors that may only be used for design purposes and not for the analysis, and the prior predictive distribution can be used to guide the sample size of the next trial.
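As a concrete sketch of predictive checking, the snippet below assumes a hypothetical Beta(8, 19) posterior for a response rate (a uniform prior updated with 7 responders out of 25 patients), simulates replicated datasets from the posterior predictive distribution, and asks whether the observed count looks typical. All numbers are illustrative.

```python
import random

random.seed(0)

# Hypothetical posterior for a response rate: Beta(8, 19), i.e., a uniform
# Beta(1, 1) prior updated with 7 responders out of n = 25 patients.
a, b, n, observed = 8, 19, 25, 7

# Draw replicated datasets from the posterior predictive distribution
replicates = []
for _ in range(10_000):
    p = random.betavariate(a, b)                        # parameter draw
    x_rep = sum(random.random() < p for _ in range(n))  # new data given p
    replicates.append(x_rep)

predictive_mean = sum(replicates) / len(replicates)  # close to 25 * 8/27 = 7.4
tail = sum(x >= observed for x in replicates) / len(replicates)
print(round(predictive_mean, 1), round(tail, 2))  # tail not near 0 or 1: no red flag
```

The same machinery, pointed at not-yet-observed patients rather than replicated data, is what underlies the predictive probabilities used for interim decisions later in this article.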
Mathematically, the prior distribution, likelihood, posterior distribution, and posterior predictive distribution are used when developing a Bayesian approach for a clinical trial. The complexity of this prior to posterior update depends on the variables and distributions of interest (binomial, normal, etc.). Often, the development of posterior distributions that combine priors with the data likelihood includes some complex integration steps and cannot be written down in a closed form. In these cases, computational techniques such as Markov Chain Monte Carlo (MCMC) methods are used, but these are computationally intensive. For decades, computational complexity hindered the application of Bayesian methods in clinical development; however, this is no longer a hurdle due to modern computing power and available software.
To illustrate the impact of the choice of prior on the result, let us continue with the example of a new therapy in STEMI. Our hypothetical randomized controlled trial (RCT) aims at comparing a new treatment to a placebo where each is added to a current standard of care (SOC). The primary outcome of interest is myocardial infarct size at 90 days. Assume that the experimental treatment belongs to a new drug class, and there is no available prior knowledge on the efficacy of the new treatment. Therefore, a non-informative prior (sometimes called a diffuse prior) can be used for the experimental treatment arm. For the control arm, we consider the following three possible prior choices outlined in Table 4 below.
Table 4.
Prior choices for the control arm.
| Choice A | If there are no reliable prior data or if we would like to minimize the impact of the prior on the posterior, a non-informative prior can be placed on the population average outcome under the control. Here, we consider as a prior a normal distribution with a large variance (e.g., 1000) |
| Choice B | The SOC is well-established, and a lot of data are available from previously completed RCTs [14]. Therefore, one may construct a prior based on the SOC outcomes in the historical trials. For example, a meta-analysis of the control arm data from several historical trials can be performed. A prior based on such a meta-analysis is often referred to as a meta-analytic-predictive (MAP) prior [15, 16]. As can be seen in Fig. 3 below, due to some heterogeneity among historical trials, this procedure results in a less informative prior |
| Choice C | Suppose the current clinical trial is restricted to patients with occlusion of the left anterior descending (LAD) artery. Then, one may construct an informative prior based on the selection of trials that included only patients with LAD artery occlusion [14] (selected trials are highlighted in red in Fig. 3 below). With such a restriction, there is less between-trial heterogeneity, and the MAP approach in this case gives an informative normal mixture prior |
The prior choices B and C in Table 4 demonstrate how prior selection can be done: a meta-analysis combines information from several historical trials to construct a MAP prior for the SOC outcome. Figure 3 below shows a forest plot (the most common graph in meta-analysis reports) and the corresponding MAP priors for choices B and C described in Table 4.
Figure 3.
Meta-analysis of previous trials in STEMI and meta-analytic-predictive (MAP) priors.
The choice of prior has an important impact on the posterior. Figure 4 below illustrates the three prior choices (left panel), the likelihood function arising from hypothetical trial outcomes (middle panel, which is the same for all scenarios), and the posterior distributions obtained by combining the prior and likelihood through Bayes’ rule (right panel). When the non-informative prior is used, the posterior is completely determined by the data. As the prior becomes more informative, the peaks of the prior and posterior move closer together, reflecting a larger impact of the prior on the posterior. Given the importance of prior selection, it is usually recommended to explore sensitivity to the choice of prior distribution by using alternative specifications of the prior parameters. The prior has less influence on the posterior as more data accumulate in the trial.
Figure 4.
Prior, likelihood, and posterior.
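The qualitative behavior in Fig. 4 can be reproduced with the closed-form normal-normal conjugate update, where the posterior mean is a precision-weighted average of the prior mean and the observed mean. All numbers below (the hypothetical trial data and the three prior specifications standing in for choices A–C) are illustrative assumptions, not values from the actual meta-analysis.

```python
def normal_posterior(prior_mean, prior_var, data_mean, data_se2):
    """Conjugate normal-normal update: precisions (1/variance) add, and the
    posterior mean is the precision-weighted average of prior and data."""
    w_prior, w_data = 1 / prior_var, 1 / data_se2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_var = 1 / (w_prior + w_data)
    return post_mean, post_var

# Hypothetical control-arm data: mean infarct size 18% of LV mass from
# n = 30 patients with SD 10, so the squared standard error is 100/30.
y_bar, se2 = 18.0, 100 / 30

priors = {
    "A (non-informative)":    (20.0, 1000.0),  # huge variance: data dominate
    "B (weakly informative)": (21.0, 25.0),
    "C (informative)":        (21.0, 4.0),
}
for label, (m0, v0) in priors.items():
    m, v = normal_posterior(m0, v0, y_bar, se2)
    print(f"{label}: posterior mean {m:.2f}, sd {v ** 0.5:.2f}")
```

As the prior variance shrinks from A to C, the posterior mean moves from the data mean (18) toward the prior mean (21), mirroring the right panel of Fig. 4.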
Illustration of the Sequential Nature of Bayesian Learning
The Bayesian prior-to-posterior paradigm offers a sequential learning and decision-making framework that can be applied to clinical trials across all development phases. To introduce the concept of Bayesian sequential learning, we will build on the previous example but focus on an easier-to-illustrate binary endpoint: response to therapy. Let’s assume that, in addition to infarct size, we want to evaluate the response to our new therapy for STEMI, with response defined as an improvement of at least 10 points on the KCCQ (Kansas City Cardiomyopathy Questionnaire) scale from baseline to 12 months. Note that in the Bayesian framework, the response rate is considered random rather than a fixed but unknown parameter as it is in the frequentist framework.
Let’s assume that we do not know much about the response rate. Hence, we assume a non-informative prior distribution, i.e., the response rate (parameter) can take on any value between 0 and 1 with uniform probability as shown in the first row of Fig. 5 below.
Figure 5.
Illustration of sequential prior and posterior updates.
An adaptive design, frequentist or Bayesian, allows for prospectively planned modifications to the study design based on accumulating data from patients in the study [17]. The most frequently used planned modifications include early stop for futility, early stop for efficacy, and/or sample size re-estimation. We will illustrate how the posterior distribution can be used to establish a criterion for trial success.
At the first unblinded interim analysis, we observe outcomes for the first 50 patients, randomized in a 1:1 ratio to the control and treatment arms. Let’s assume we observed 2 and 7 responders out of 25 patients in the control and treatment arms, respectively. The posterior distributions based on the outcomes of these 50 patients are shown in the second row of Fig. 5; they are calculated from the prior and the data (likelihood) using Bayes’ theorem. These posteriors then become the priors before the next batch of data is collected. The prior-to-posterior updates at each interim analysis are shown in Fig. 5 in a row-wise fashion.
At the end of the study, with 75 patients in the control arm and 75 in the treatment arm, the posterior distributions for the response rates are centered around 0.07 and 0.29 in the control and treatment arms, respectively. The posterior probability that the response rate in the treatment arm is greater than that in the control arm exceeds a pre-defined probability threshold (e.g., 97.5%), and the study is deemed successful. Were data from more patients collected, we would obtain narrower credible intervals for the response rates.
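The row-wise updates in Fig. 5 are beta-binomial conjugate updates: with a Beta prior on a response rate and binomial data, the posterior is again a Beta distribution, and today’s posterior becomes tomorrow’s prior. The sketch below uses the interim counts given above plus hypothetical remaining counts chosen to land near the reported final posterior centers; the success criterion is evaluated by Monte Carlo.

```python
import random

def update(prior, responders, non_responders):
    """Beta-binomial conjugate update: add successes and failures to (a, b)."""
    a, b = prior
    return a + responders, b + non_responders

# Uniform Beta(1, 1) priors on each arm's response rate
control, treatment = (1, 1), (1, 1)

# First interim look: 2/25 responders on control, 7/25 on treatment
control = update(control, 2, 23)
treatment = update(treatment, 7, 18)

# Hypothetical remaining data (bringing each arm to 75 patients) chosen so the
# final posteriors centre near the 0.07 and 0.29 reported in the text
control = update(control, 3, 47)
treatment = update(treatment, 15, 35)

posterior_mean = lambda ab: ab[0] / (ab[0] + ab[1])
print(round(posterior_mean(control), 2), round(posterior_mean(treatment), 2))

# Success criterion: posterior P(rate_T > rate_C) estimated by Monte Carlo
random.seed(1)
draws = 20_000
wins = sum(random.betavariate(*treatment) > random.betavariate(*control)
           for _ in range(draws))
print(wins / draws)  # comfortably above a 0.975 threshold
```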
This example illustrates the sequential learning mechanism that the Bayesian framework offers—today’s posterior becomes tomorrow’s prior. A Bayesian sequential approach can be applied to early phase trials as well as confirmatory trials. For example, a sequential design referred to as a Goldilocks design [18] has been used in pivotal medical device trials. Examples of Bayesian adaptive design features are discussed in the article “Bayesian Methods in Human Drugs and Biological Products Development in CDER and CBER” that is part of this special section on Bayesian clinical trials [19].
Introduction to Predictive Power for Interim Decision-Making
The concepts of interim analysis and adaptive trial design are not new and have been widely used in the frequentist framework [17]. To leverage the sequential learning and decision-making nature of the Bayesian paradigm, we need a framework to enable interim decision-making. In the previous section, we described how a posterior distribution can be used to form criteria for trial success. In this section, we will introduce predictive probabilities and discuss how they can help inform interim decision-making.
Bayesian methods can help to improve interim decision-making in the context of a development program or a single trial [20–22], for example, dose selection, population enrichment, and early stopping for futility or efficacy. Bayesian computation may allow better use of interim data and provide more adaptive options at the interim analysis (IA). For example, the calculation of predictive probability can include partial follow-up times of patients who have been enrolled but have not yet experienced the event of interest or completed the follow-up period. In this case, efficacy stopping based on predictive probability is consistent with a decision-making process in which a trial may be stopped early if the existing (interim) data demonstrate superiority at the IA and that superiority is likely to be maintained once the remaining data are collected [20]. Importantly, Bayesian IA methods do not require the trial to be fully Bayesian and can be applied to trials designed with a frequentist final analysis.
Predictive probability of success is defined as the chance that a trial succeeds at the final analysis given the interim data. It can provide clinically meaningful interpretations to inform interim decisions. IA questions in principle represent prediction problems [20]. In most studies, an IA is planned at a point when the accumulated data are still insufficient to draw a conclusion. The goal of an IA, however, is not to draw a conclusion but to assess whether the trial is likely to reach a definitive conclusion at its end. Such prediction questions are better addressed by predictive probabilities than by p-values or even by posterior probabilities.
Building on the previous example, let’s assume that the response rate in the control arm is 10% and that we require a minimum clinically relevant response rate of 30% in the treatment arm. Figure 6 below illustrates how predictive probability can inform futility decisions at IAs. The numbers correspond to an adaptive design with a planned interim analysis at approximately 50% of the total sample size, with the objective of assessing, and possibly stopping for, futility at the interim look. If there are no responders in the control arm and 5 responders in the treatment arm, the predictive probability of success is 0.924 (approximately 92%). The numbers highlighted in white text show the predictive probability of success with 1 and 0, 2 and 1, etc., responders in the treatment vs. control arm, respectively. Moving down from the white number within the same column are the cases favoring the control arm (cells in red). The figure shows that 0.25 is a reasonable futility threshold based on the number of responders in the treatment and control arms: if the predictive probability is less than 0.25 (the area in red), the study can be stopped for futility. Similarly, predictive probability can be used to determine whether sufficient data to establish efficacy have been accumulated.
Figure 6.
Illustration of predictive probability. Note: The values inside the figure represent the predictive probability of success of a trial at the final analysis given the number of responders in the treatment and control arms at the interim analysis.
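The numbers in a figure like this can be produced by simulation. The sketch below, under assumed design parameters (Beta(1, 1) priors, 20 patients per arm at the final analysis, a 50% interim look, and a hypothetical final success criterion of P(p_trt > p_ctl | data) > 0.975), computes the predictive probability of success for one interim responder split:

```python
import numpy as np

rng = np.random.default_rng(2024)

def prob_trt_beats_ctl(a_t, b_t, a_c, b_c, n_mc=4000):
    """Monte Carlo estimate of P(p_trt > p_ctl) under independent Beta posteriors."""
    return (rng.beta(a_t, b_t, n_mc) > rng.beta(a_c, b_c, n_mc)).mean()

def predictive_prob_success(x_t, x_c, n_int, n_fin, thresh=0.975, n_sim=1000):
    """Predictive probability that the trial succeeds at the final analysis.

    x_t, x_c: interim responders per arm; n_int, n_fin: per-arm interim and
    final sample sizes. Beta(1, 1) priors; final-analysis success is defined
    here as P(p_trt > p_ctl | final data) > thresh.
    """
    m = n_fin - n_int                          # patients still to be observed
    wins = 0
    for _ in range(n_sim):
        # draw response rates from the interim posteriors ...
        p_t = rng.beta(1 + x_t, 1 + n_int - x_t)
        p_c = rng.beta(1 + x_c, 1 + n_int - x_c)
        # ... simulate the remaining patients, then re-check the final criterion
        y_t, y_c = rng.binomial(m, p_t), rng.binomial(m, p_c)
        post = prob_trt_beats_ctl(1 + x_t + y_t, 1 + n_fin - (x_t + y_t),
                                  1 + x_c + y_c, 1 + n_fin - (x_c + y_c))
        wins += post > thresh
    return wins / n_sim

# 5 vs. 0 responders at a 50% interim look of a 20-per-arm trial
pp = predictive_prob_success(x_t=5, x_c=0, n_int=10, n_fin=20)
```

Repeating this calculation over a grid of interim responder splits and shading cells below the futility threshold reproduces the structure of Figure 6; the exact values depend on the priors and the final success criterion chosen.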
Beyond interpretability, Bayesian methods may offer better precision for interim decisions in some situations. For example, in studies with a time lag between enrollment and the observed outcome, Bayesian predictive probability can use data from all enrolled patients. In this case, the final primary outcome can be modeled using earlier information. For example, if the primary outcome is measured at 12 months, many of the enrolled patients will not have 12 months of follow-up by the time of an IA. If data for the endpoint of interest are available at earlier timepoints, e.g., at 3, 6, and 9 months (and presumably are correlated with the endpoint at 12 months), this correlation can be incorporated into the calculation of the predictive probability of success.
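As a toy illustration of this idea (all counts and transition probabilities below are hypothetical), the expected number of 12-month responders in one arm can be projected by combining patients with complete follow-up with imputed outcomes for those observed only to 6 months:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical interim snapshot of one arm (12-month binary endpoint)
complete_resp = 6                         # 12-month responders already observed
only6_resp, only6_nonresp = 5, 10         # patients observed only to 6 months

# Assumed probabilities linking 6- and 12-month responder status
P12_GIVEN_6R = 0.80                       # 6-month responder stays a responder
P12_GIVEN_6N = 0.10                       # 6-month non-responder converts

n_sim = 10_000
draws = np.empty(n_sim)
for i in range(n_sim):
    y_r = rng.binomial(only6_resp, P12_GIVEN_6R)
    y_n = rng.binomial(only6_nonresp, P12_GIVEN_6N)
    draws[i] = complete_resp + y_r + y_n  # projected 12-month responders

expected_responders = draws.mean()
```

A full design would place posteriors on the transition probabilities rather than fixing them; the point is simply that earlier timepoints carry usable information about the final endpoint.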
Bayesian Phase 3 Case Study: BNT162b2 mRNA COVID-19 Vaccine Development
In this section, we illustrate how Bayesian methods introduced in the previous sections enable more flexible trial designs and accelerated learning from accumulating data by using the example of BNT162b2 mRNA COVID-19 vaccine development.
The high sense of urgency due to the COVID-19 pandemic called for accelerated but scientifically rigorous vaccine efficacy and safety evaluation. The Pfizer and BioNTech COVID-19 vaccine operationally seamless phase 1/2/3 trial, moving from dose selection through confirmatory evidence, was designed using a Bayesian framework. It provided flexibility by incorporating early and frequent interim analyses [23, 24].
The first primary endpoint for the phase 2/3 part was the efficacy of BNT162b2 against confirmed COVID-19 with onset at least 7 days after the second dose in participants with no serologic or virologic evidence of SARS-CoV-2 infection up to 7 days after the second dose. Vaccine efficacy (VE) is defined as VE = 100 × (1 − IRR), where IRR is the ratio of the confirmed COVID-19 illness rate in the vaccine group to the corresponding illness rate in the placebo group. Based on FDA guidance on Development and Licensure of Vaccines to Prevent COVID-19 [25, 26], sponsors were required to demonstrate that the lower bound of an appropriately alpha-adjusted confidence interval (or its Bayesian equivalent) exceeded a VE of 30%, with a point estimate of at least 50%. Due to high public health demand, there was strong interest in bringing the vaccine to market as soon as possible if there were evidence of safety and overwhelming efficacy. Moreover, the COVID-19 illness rate varied over time due to different waves of the disease. Therefore, frequent interim looks at vaccine efficacy were necessary. The phase 2/3 part of the study planned for 4 interim analyses (after the accrual of at least 32, 62, 92, and 120 cases) and one final analysis after 164 confirmed cases were accrued.
The assessment for the primary analysis was based on posterior probabilities using a Bayesian model. A weakly informative prior centered around VE = 30% was used, aligned with the minimum required VE of 30%. Figure 7 below describes the criteria established for futility and efficacy at the IAs and the final analysis. Planning for several IAs was especially useful in case accrual turned out to be slower than expected and/or VE higher than expected.
Figure 7.
Boundaries for efficacy and futility at the IAs and final analysis on different scales. Note: The numbers in parentheses represent the required COVID-19 case split between the BNT162b2 and placebo arms at the IAs and final analysis for claiming efficacy or futility, assuming the same exposure time in both arms. The numbers in green represent the VE and case-split thresholds for success at the respective IA. The numbers in red represent the VE and case-split thresholds for futility at the respective IA. Success threshold: IA: P(VE > 30%|data) > 0.995; Final: P(VE > 30%|data) > 0.986. Futility threshold: Probability of success < 5%.
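The posterior probability criterion can be computed in closed form. Following the published protocol [24], a Beta(0.700102, 1) prior is placed on θ = (1 − VE)/(2 − VE), so that VE > 30% corresponds to θ < 0.7/1.7 ≈ 0.412; the interim case split used below is hypothetical.

```python
from scipy import stats

# Beta(0.700102, 1) prior on theta = (1 - VE) / (2 - VE), centered near VE = 30%
A_PRIOR, B_PRIOR = 0.700102, 1.0

def posterior_prob_ve_above(x_vax, n_total, ve_min=0.30):
    """P(VE > ve_min | data): x_vax confirmed cases in the vaccine arm out of
    n_total confirmed cases, assuming equal surveillance time in both arms."""
    theta_cut = (1 - ve_min) / (2 - ve_min)   # VE > ve_min  <=>  theta < cut
    a = A_PRIOR + x_vax
    b = B_PRIOR + (n_total - x_vax)
    return stats.beta.cdf(theta_cut, a, b)

# Hypothetical interim split: 8 vaccine cases among 94 total confirmed cases
p_success = posterior_prob_ve_above(8, 94)    # compare with 0.995 threshold
```

Under equal surveillance time, the count of vaccine-arm cases among all confirmed cases is binomial with parameter θ, which is why the posterior is again a Beta distribution and the success criterion reduces to a single CDF evaluation.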
There were 43,548 adults in the trial; these subjects were healthy or had stable chronic medical conditions [23]. For operational reasons, the first two planned IAs were not performed. The first IA took place after 94 cases were accrued. The BNT162b2 vaccine met the primary efficacy endpoint with more than a 99.5% probability (the efficacy threshold) that the true vaccine efficacy was greater than 30% [27]. The IA outcome was released by Pfizer on November 9, 2020 [28]. Moreover, the final analysis of the primary efficacy endpoint took place soon after the IA due to the fast accrual of COVID-19 cases caused by the second wave [29]. The final analysis also met the prespecified success criterion, i.e., a probability above 98.6% (observed value > 99.99%) that the true VE was greater than 30%; this greatly exceeded the minimum FDA criteria for authorization. The EUA (Emergency Use Authorization) was based on the final analysis, which took place only one week after the IA. Note that for final approval, the FDA asked for longer-term safety data.
The Bayesian study design provided flexibility without compromising the quality of evidence. It directly quantified the probability that the study hypothesis (VE > 30%) was true given the trial data. This was particularly useful for early interim analyses, where traditional adaptive frequentist designs could be overly conservative. The Bayesian design provided a straightforward way to set up a meaningful interim analysis plan, and the primary analysis provided strong evidence of vaccine efficacy.
Use of Informative Priors in Clinical Trials
Informative priors can be leveraged when there are relevant data external to the trial being planned. In such cases, Bayesian methods may result in a reduced sample size. Informative priors allow Bayesian methods to incorporate, or borrow, information from clinically relevant external evidence without compromising the statistical validity of the trial. Bayesian borrowing can reduce the sample size if external data, such as previous RCTs or natural history studies (NHS), are available and these data agree with the data accumulating in the concurrent trial in terms of the population characteristics and outcomes of interest. Several important considerations for the selection of external data are outlined in Fig. 8.
Figure 8.
Considerations for selection and use of external data.
As illustrated before, careful construction of a prior distribution is very important for the scientific rigor and regulatory acceptance of a Bayesian design or analysis. Prior-data conflict, a mismatch between the informative prior and the observed data likelihood, can arise from multiple sources, such as heterogeneity in baseline characteristics between the historical and current study populations or an evolving improvement in the background standard of care. For example, to use an informative prior based on historical data for a control arm, the distribution of the outcomes of interest in the concurrent control arm should reasonably match the informative prior. In the presence of prior-data conflict, methods that discount the informative prior should be considered to minimize bias and inflation of type-I or type-II error rates. Since the FDA Center for Devices and Radiological Health (CDRH) issued its Guidance for the Use of Bayesian Statistics [32], Bayesian methods have been increasingly applied to the development of new medical technologies. Readers may wish to consult the dedicated article on Bayesian methods applied to medical device development that is part of this special section on Bayesian clinical trials [33]. In drug development, Bayesian borrowing and external control arms have mainly been used in rare diseases, pediatric trials, and/or when there are no previously approved drugs for the same indications. The application of Bayesian methods in rare disease drug development is discussed in a dedicated article on “Bayesian Strategies in Rare Diseases” [34]. A review of Bayesian methods in drug and biologics development in CDER and CBER [19] outlines the areas with emerging use and discusses examples of Bayesian methods in CDER- and CBER-reviewed trials.
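One simple discounting scheme is the fixed-weight power prior, sketched below for a binomial control rate; the response counts and weight are hypothetical, and adaptive schemes such as the meta-analytic-predictive prior [15, 16] instead let the data determine the degree of discounting.

```python
def power_prior_posterior(x_cur, n_cur, x_hist, n_hist, a0):
    """Beta posterior parameters for a control response rate when historical
    data are down-weighted by a fixed power-prior weight a0 in [0, 1]:
    a0 = 0 ignores the historical data; a0 = 1 pools them fully."""
    a = 1 + a0 * x_hist + x_cur
    b = 1 + a0 * (n_hist - x_hist) + (n_cur - x_cur)
    return a, b

# Current control 4/40; historical control 12/120; half-weight borrowing
a, b = power_prior_posterior(4, 40, 12, 120, a0=0.5)
post_mean = a / (a + b)           # historical data pull the estimate
eff_n = a + b - 2                 # patients' worth of information used
```

Here half-weight borrowing contributes the equivalent of 60 historical patients on top of the 40 concurrent controls, illustrating both the sample-size benefit and why prior-data conflict matters: a discrepant historical rate would pull the estimate in the wrong direction.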
Conclusion
There are many Bayesian methods applicable to the analysis of clinical trials, and both the examples and the methodology continue to grow; we have chosen to highlight some key concepts likely to be of broad interest. Bayesian methods enable direct clinical interpretation of data and inference through their ability to estimate the probability that a treatment effect exceeds a threshold, and the probabilities of different magnitudes of treatment effect, based on the observed data. The prior-to-posterior paradigm offers a sequential learning and decision-making framework that enables flexible trial design and accelerated learning. Bayesian interim analysis can optimize interim monitoring by enabling more efficient and informative decision-making, for both Bayesian and frequentist trials. For example, the predictive probability of success can provide clinically meaningful interpretations to inform interim decisions. In trials with a time lag between enrollment and the observed outcome, predictive probability allows us to incorporate partially completed data, i.e., data from patients who have been enrolled but have not yet completed the follow-up period. An approach using predictive probabilities therefore provides information not only on the superiority of the novel treatment at the interim analysis but also on the likelihood that superiority will be maintained after the remaining data are collected. Another important advantage of the Bayesian framework is the possibility of formally incorporating prior knowledge (i.e., data external to the current trial) into the analysis of the current trial. The choice of prior is very important because of its impact on the posterior, though, as indicated earlier, this impact diminishes as more data are accrued in the current trial. Use of informative priors can increase the precision of key quantities used in decision-making and potentially reduce the sample size. Supported by the FDA CDRH guidance, Bayesian methods have more than a 10-year track record in medical device development.
With the number of initiatives led by regulatory agencies, such as the FDA Complex Innovative Design program, we can expect increasing use of Bayesian methods in drug development.
Acknowledgements
We would like to thank Jennifer Clark, Food and Drug Administration, Silver Spring, Maryland, and Prof. em Martin Fey, University of Bern, Switzerland, for their helpful comments and insightful suggestions. The reviewers and the associate editor have been extremely helpful and we believe have improved the paper considerably.
Declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Clark J, Muhlemann N, Natanegara F, Hartley A, Wenkert D, Wang F, Harrell FE, Jr, Bray R. Why are not there more Bayesian clinical trials? Perceived barriers and educational preferences among medical researchers involved in drug development. Ther Innov Regul Sci. 2022;3:1–9. doi: 10.1007/s43441-021-00357-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bray R, Hartley A, Wenkert D, Muehlemann N, Natanegara F, Harrell FE, Wang F, Clark J. Why are there not more Bayesian clinical trials? Perceptions and interpretation of Bayesian and conventional statistics among medical researchers involved in drug development. Ther Innov Regul Sci. 2022. 10.1007/s43441-022-00482-1
- 3.Goligher E, Tomlinson G, Hajage D, Wijeysundera D, Fan E, Jüni P, Brodie D, Slutsky A, Combes A. Extracorporeal membrane oxygenation for severe acute respiratory distress syndrome and posterior probability of mortality benefit in a post hoc bayesian analysis of a randomized clinical trial. JAMA. 2018;320(21):2251–2259. doi: 10.1001/jama.2018.14276. [DOI] [PubMed] [Google Scholar]
- 4.Zampieri F, Casey J, Shankar-Hari M, Harrell FE, Harhay MO. An overview of theory and example reanalysis of the alveolar recruitment for acute respiratory distress syndrome trial. Am J Respir Crit Care Med. 2021;203:543–552. doi: 10.1164/rccm.202006-2381CP. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Kruschke J. Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan. 2nd ed. Academic Press; 2015. [Google Scholar]
- 6.https://www.fda.gov/vaccines-blood-biologics/approved-blood-products/information-regarding-oraquick-home-hiv-test. Accessed 2 Sept 2022
- 7.CDC. Estimated HIV Incidence and Prevalence in the United States, 2015–2019. https://www.cdc.gov/hiv/pdf/library/reports/surveillance/cdc-hiv-surveillance-supplemental-report-vol-26-1.pdf. Accessed 10 Aug 2022
- 8.Gelman A, Carlin J, Stern H, Rubin D, Dunson D, Vehtari A. Bayesian data analysis. Third edition. http://www.stat.columbia.edu/~gelman/book. Accessed 2 Sept 2022
- 9.Zampieri F, Damiani L, Bakker J, Ospina-Tascón G, Castro R, Cavalcanti R, Hernandez G. Effects of a resuscitation strategy targeting peripheral perfusion status versus serum lactate levels among patients with septic shock. A Bayesian reanalysis of the Andromeda-shock trial. Am J Respir Crit Care Med. 2020;201(4):423–429. [DOI] [PubMed]
- 10.Combes A, Hajage D, Capellier D et al. for the EOLIA Trial Group, REVA, and ECMONet*. Extracorporeal membrane oxygenation for severe acute respiratory distress syndrome. N Engl J Med. 2018;378:1965–75. [DOI] [PubMed]
- 11.Maron DJ, Hochman JS, Reynolds HR, Bangalore S, O’Brien SM, Boden WE, Chaitman BR, Senior R, Lopez‑Sendon J, Alexander KP, Lopes RD, Shaw LJ, Berger JS, Newman JD, Sidhu MS, Goodman SG, Ruzyllo W, Gosselin G, Maggioni AP, White HD, Bhargava B, Min JK, Mancini GBJ, Berman DS, Picard MH, Kwong RY, Ali ZA, Mark DB, Spertus JA, Krishnan MN, Elghamaz A, Moorthy N, Hueb WA, Demkow M, Mavromatis K, Bockeria O, Peteiro J, Miller TD. Szwed H, Doerr R, Keltai M, Selvanayagam JB, Steg PG, Held C, Kohsaka S, Mavromichalis S, Kirby R, Jeffries NO, Harrell FE Jr, Rockhold FW, Broderick S, Ferguson TB Jr, Williams DO, Harrington RA, Stone GW, Rosenberg Y, for the ISCHEMIA Research Group. Initial Invasive or Conservative Strategy for Stable Coronary Disease. N Engl J Med. 2020; 382:1395–1407
- 12.Gelman A. Prior distribution. Chichester: Wiley; 2002. [Google Scholar]
- 13.van de Schoot R, Depaoli S, King R, Kramer B, Märtens K, Tadesse MG, Vannucci M, Gelman A, Veen D, Willemsen J, Yau C. Bayesian statistics and modelling. Nat Rev Methods Primers. 2021;1:1. doi: 10.1038/s43586-020-00001-2. [DOI] [Google Scholar]
- 14.Bulluck H, Hammond-Haley M, Weinmann S, Martinez-Macias R, Hausenloy DJ. Myocardial infarct size by CMR in clinical cardioprotection studies: insights from randomized controlled trials. JACC. 2017;10(3):230–240. doi: 10.1016/j.jcmg.2017.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5–18. doi: 10.1177/1740774509356002. [DOI] [PubMed] [Google Scholar]
- 16.Weber S, Li Y, Seaman JW, III, Kakizume T, Schmidli H. Applying meta-analytic-predictive priors with the R Bayesian evidence synthesis tools. J Stat Softw. 2021;100:1–32. doi: 10.18637/jss.v100.i19. [DOI] [Google Scholar]
- 17.U.S. Food and Drug Administration. Guidance for industry on adaptive designs for clinical trials of drugs and biologics. Silver Spring, MD: Office of Communication, Outreach and Development, U.S. Food and Drug Administration 2019.
- 18.Broglio K, Connor J, Berry S. Not too big, not too small: a Goldilocks approach to sample size selection. J Biopharm Stat. 2014;24:685–705. doi: 10.1080/10543406.2014.888569. [DOI] [PubMed] [Google Scholar]
- 19.Ionan AC, Clark J, Travis J, Amatya A, Scott J, Smith JP, Chattopadhyay S, Salerno MJ, Rothmann M. Bayesian methods in human drug and biological products development in CDER and CBER. Ther Innov Regul Sci. 2022;1–9. [DOI] [PMC free article] [PubMed]
- 20.Saville B, Connor J, Ayers G, Alvarez J. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clin Trials. 2014;11(4):485–493. doi: 10.1177/1740774514531352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Mukherjee R, Yajnik P, Muhlemann N, Morgan BC. A sequential predictive power design for a COVID vaccine trial. Stat Biopharm Res. 2022;14(1):42–51. doi: 10.1080/19466315.2021.1979641. [DOI] [Google Scholar]
- 22.Wu X, Xu Y, Carlin B. Optimizing interim analysis timing for Bayesian adaptive commensurate designs. Stat Med. 2019;1–14. [DOI] [PubMed]
- 23.Polack FP, Thomas SJ, Kitchin N, Absalon J, Gurtman A, Lockhart S, Perez JL, Pérez Marc G, Moreira ED, Zerbini C, Bailey R, Swanson KA, Roychoudhury S, Koury K, Li P, Kalina WV, Cooper D, Frenck RW, Hammitt LL, Türeci Ö, Nell H, Schaefer A, Ünal S, Tresnan DB, Mather S. Safety and efficacy of the BNT162b2mRNA Covid-19 vaccine. N Engl J Med. 2020;383:2603–2615. doi: 10.1056/NEJMoa2034577. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.https://cdn.pfizer.com/pfizercom/2020-11/C4591001_Clinical_Protocol_Nov2020.pdf Accessed 10 Aug 2022
- 25.https://www.fda.gov/regulatory-information/search-fda-guidance-documents/emergency-use-authorization-vaccines-prevent-covid-19. Accessed 10 Aug 2022
- 26.Development and Licensure of Vaccines to Prevent COVID-19. Guidance for Industry. FDA CBER, 2020.
- 27.https://www.nature.com/articles/d41586-020-03166-8. Accessed 25 Feb 2023
- 28.https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-vaccine-candidate-against. Accessed 25 Feb 2023
- 29.Coccia M. (2021). The impact of first and second wave of the COVID-19 pandemic in society: comparative analysis to support control measures to cope with negative effects of future infectious diseases. Environ Res. 197, 111099. 10.1016/j.envres.2021.111099 [DOI] [PMC free article] [PubMed]
- 30.Rare Diseases: Natural History Studies for Drug Development. Draft Guidance. FDA CDER, CBER & OOPD, 2019.
- 31.Interacting with the FDA on Complex Innovative Trial Designs for Drugs and Biological Products. Guidance for Industry, FDA CDER & CBER, 2020. https://www.fda.gov/drugs/development-resources/complex-innovative-trial-design-pilot-meeting-program. Accessed 2 Sept 2022
- 32.Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. FDA CDRH, 2010
- 33.Campbell G, Irony T, Pennello G, Thompson L. Bayesian statistics in medical devices: progress since 2010. Ther Innov Regul Sci. [DOI] [PMC free article] [PubMed]
- 34.Garczarek A, Muehlemann N, Richard F, Yajnik P, Russek-Cohen E. Bayesian strategies in rare diseases. Ther Innov Regul Sci. 2022;1–8 [DOI] [PMC free article] [PubMed]