Abstract
Purpose:
The primary results of phase III oncology trials may be challenging to interpret, given that results are generally based on P-value thresholds. The probability of whether a treatment is beneficial, although more intuitive, is not usually provided. Here we developed and released a user-friendly tool that calculates the probability of treatment benefit using trial summary statistics.
Methods:
We curated 415 phase III randomized trials enrolling 338,600 patients published between 2004 and 2020. A phase III prior probability distribution for the treatment effect was developed based on a three-component zero-mean mixture distribution of the observed z-scores. Using this prior, we computed the probability of clinically meaningful benefit (hazard ratio < 0.8). The distribution of signal-to-noise ratios and power of phase III oncology trials was compared with that of 23,551 randomized trials from the Cochrane Database.
Results:
The signal-to-noise ratios of phase III oncology trials tended to be much larger than those of randomized trials from the Cochrane database. Still, the median power of phase III oncology trials was only 49% (IQR, 14% to 95%), and the power was less than 80% in 65% of trials. Using the phase III oncology-specific prior, only 53% of trials claiming superiority (114 of 216) had a ≥ 90% probability of clinically meaningful benefits. Conversely, the probability that the experimental arm was superior to the control arm (HR < 1) exceeded 90% in 17% of trials interpreted as having no benefit (34 of 199).
Conclusion:
By enabling computation of contextual probabilities for the treatment effect from summary statistics, our robust, highly practical tool, now posted on a user-friendly webpage, can aid the wider oncology community in the interpretation of phase III trials.
Keywords: Bayesian statistics, shrinkage, publication bias, posterior probability, phase III, interpretation, signal-to-noise ratio, oncology
INTRODUCTION
The interpretation of modern phase III randomized trials in oncology is a considerable challenge.1 The standard approach for estimating comparative survival advantages is to compute hazard ratios (HRs) and their 95% CIs, with most trials declaring superiority of an experimental intervention based on P-value thresholds.2 However, 95% CIs and P values are widely misinterpreted. A 95% CI is often misunderstood as having a 95% probability of containing the true effect, and a P value is often mistaken for the probability of no difference.3–5 P-value thresholds may lead to both overestimation and underestimation of effects, particularly in scenarios where power is lower than planned.6–9 For example, a significant P value (e.g., P < 0.05) in a trial designed with 80% power does not imply an 80% probability that the experimental treatment was beneficial. Therefore, novel tools to improve the interpretation of primary outcomes in oncology trials are sorely needed. Some have proposed directly computing the probability of benefit (e.g., HR < 1) using Bayesian approaches, because the probability of whether an intervention is helpful or harmful is more intuitive to oncologists and patients than interpreting P values.10–13 However, such calculations require specification of prior knowledge, which may appear controversial due to apparent subjectivity even when guided by domain expertise.11,14,15
There is a considerable need for a straightforward, data-driven approach to estimating the probability of benefit in phase III trials, including in the absence of individual-level patient data, which are often difficult for clinicians to access. Here, we propose a user-friendly and evidence-based solution for estimating the probability of whether new oncology treatments tested in phase III trials are effective using standard trial-level summary statistics. The purpose of the present meta-epidemiological study was to develop an informative, oncology-specific default prior, derived from the distribution of the z-scores obtained from 415 contemporary phase III oncology trials. This default prior can be used by practicing oncologists to interpret historical, current, and future phase III oncology trials.
METHODS
Institutional review board approval for this meta-epidemiological study was not needed due to the public availability of the trial data. Trials were screened from ClinicalTrials.gov in February 2020 using the advanced search terms “cancer,” phase “Phase 3,” study results “With Results,” and status “excluded: Not yet recruiting.” Trials were required to be phase III with two-arm, superiority designs that tested anti-cancer interventional strategies (Figure S1). Trials that had not published their primary endpoint were excluded, as were trials that did not use time-to-event primary endpoints.
Figure 1.
Distribution of the signal-to-noise ratio (SNR) and power of the primary outcomes of phase III randomized clinical trials (RCTs) in oncology. (A) The SNR in phase III oncology trials tended to be much larger than that of randomized trials in the Cochrane database. (B) Estimated distribution of power against the true effect in phase III oncology RCTs.
Primary endpoint summary statistics (HRs and 95% CIs) were recorded. The control arm was taken as the reference for all comparisons, such that HR < 1 always favored the experimental arm; where a trial had set the experimental arm as the reference, the reciprocal of the HR was used. For trials with multiple co-primary endpoints, the time-to-event primary endpoint with a reported 95% CI was used. If 95% CIs were reported for all time-to-event co-primary endpoints, overall survival was used due to its intrinsic value and potential advantages compared with surrogate endpoints.2,16,17
In previous work, an informative prior for addressing treatment effect exaggeration was developed using 23,551 randomized trials in the Cochrane Database of Systematic Reviews.9,18–21 In the present study, we applied this methodology for the creation of a phase III, oncology-specific prior distribution of the treatment effect.18–21 In brief, the z-score for each RCT was computed as the point estimate for the log hazard ratio (log HR) divided by its standard error. The standard error was computed as the difference between the logarithms of the upper and lower bounds of the 95% CI for the HR, divided by twice the 0.975 quantile of the standard normal distribution (qnorm(0.975) in R, approximately 1.96).20 Recall that the two-sided P value is less than 0.05 when the z-score is greater than 1.96 or less than −1.96. Similar to the Cochrane-based prior distribution, we then fit a zero-mean three-component normal mixture to the z-scores obtained from the log HR.21 We used three components, rather than the four used for the Cochrane-derived prior distribution, because of the smaller number of phase III oncology trials. We chose a zero-mean mixture so that the prior probabilities of benefit and of harm were each 50%. In this way, neither treatment arm was favored in the prior distribution. While the z-score is the ratio of the estimated treatment effect to the standard error, the signal-to-noise ratio (SNR) is defined as the ratio of the true treatment effect to the standard error. Because the z-score represents the sum of the SNR and a standard normal error term, the “deconvolution trick” was applied to obtain the distribution of the SNR by subtracting 1 from the variances of each mixture component.19 Notably, this makes it possible to obtain the distribution of the SNR even though the true effect cannot be observed.
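The two computations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in the Supplement: the function names are hypothetical, the example z-score variances are invented for illustration, and the only trial data used are the GEMPAX overall survival summary statistics quoted in the Discussion.

```python
import math
from statistics import NormalDist

# 0.975 quantile of the standard normal; the same value as R's qnorm(0.975)
Z975 = NormalDist().inv_cdf(0.975)

def z_score_from_hr(hr, ci_low, ci_high):
    """Return (log HR, standard error, z-score) from a HR and its 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z975)
    return log_hr, se, log_hr / se

def deconvolve(z_variances):
    """Map z-score mixture variances to SNR mixture variances.

    Because z = SNR + N(0, 1) noise, each zero-mean component of z has
    variance (SNR variance + 1), so subtracting 1 recovers the SNR variance.
    """
    if any(v < 1 for v in z_variances):
        raise ValueError("a z-score component variance below 1 cannot be deconvolved")
    return [v - 1 for v in z_variances]

# GEMPAX overall survival: HR 0.87, 95% CI 0.63 to 1.20
log_hr, se, z = z_score_from_hr(0.87, 0.63, 1.20)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, z = {z:.2f}")

# HYPOTHETICAL z-score component variances, for illustration only
print(deconvolve([1.5, 4.0, 20.0]))
```

For GEMPAX this gives a standard error of about 0.16 on the log HR scale, in agreement with the value quoted in the Discussion.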
The distribution of the power was then obtained as a transformation of the SNR.9 Note that this is the power against the (unobserved) true effect, and not the so-called “post hoc” power, or the power against the effect that was assumed in the sample size calculation.22 Lastly, the prior for the treatment effect was obtained by scaling the distribution of the SNR by the standard error.18 The underlying dataset and code for the development of the prior are provided in the Supplement.
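As a rough sketch of this step, the power of a two-sided test at the 0.05 level can be written as a function of the SNR, and the distribution of the power follows by transforming draws from the SNR mixture. The mixture weights and standard deviations below are hypothetical placeholders, not the Table S1 estimates, so the printed summaries are not expected to reproduce the reported results; the function names are likewise assumptions.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()
Z975 = STD_NORMAL.inv_cdf(0.975)

def power_from_snr(snr):
    """Two-sided power at alpha = 0.05 against a true effect of `snr` standard errors."""
    return STD_NORMAL.cdf(snr - Z975) + STD_NORMAL.cdf(-snr - Z975)

def sample_snr(weights, snr_sds, n, seed=0):
    """Draw n SNRs from a zero-mean normal mixture."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(0.0, snr_sds[k]) for k in components]

# HYPOTHETICAL mixture parameters (placeholders, not the fitted values)
weights = [0.3, 0.4, 0.3]
snr_sds = [0.7, 1.7, 4.4]

powers = [power_from_snr(s) for s in sample_snr(weights, snr_sds, 100_000)]
print(f"median power: {median(powers):.2f}")
print(f"share of trials below 80% power: {sum(p < 0.8 for p in powers) / len(powers):.2f}")

# Scaling the SNR mixture by a trial's standard error gives the
# component standard deviations of the prior on the log HR scale.
se = 0.16
prior_sds_log_hr = [se * sd for sd in snr_sds]
print([round(sd, 3) for sd in prior_sds_log_hr])
```

Because the SNR is scaled by each trial's own standard error, the same fitted mixture yields a different prior for every trial.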
Based on guidelines from the American Society of Clinical Oncology, the minimum clinically important difference (MCID) was defined by HR < 0.8.23–26 The probability of a detectable effect was defined by HR < 1. The probability for both hypotheses was computed for each trial. Analyses and plots were completed in R v.4.3.2 (Vienna, Austria) and Prism v10 (La Jolla, CA).27
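One way to compute these probabilities from summary statistics is sketched below. It assumes the estimated log HR is normally distributed around the true log HR and relies on the conjugacy of a normal mixture prior with a normal likelihood. The function name and the mixture parameters are hypothetical placeholders (the fitted values are in Table S1), so the output will not match the probabilities reported in this article or produced by the webpage.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)

# HYPOTHETICAL mixture for the SNR (placeholders, not the fitted values)
WEIGHTS = [0.3, 0.4, 0.3]
SNR_SDS = [0.7, 1.7, 4.4]

def posterior_prob_hr_below(hr, ci_low, ci_high, threshold, weights, snr_sds):
    """P(true HR < threshold | data) under a zero-mean normal mixture prior on the SNR."""
    if any(tau <= 0 for tau in snr_sds):
        raise ValueError("component standard deviations must be positive")
    b = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z975)
    post_weights, probs = [], []
    for w, tau in zip(weights, snr_sds):
        prior_var = (se * tau) ** 2              # prior variance of the log HR
        marginal_sd = math.sqrt(prior_var + se ** 2)
        # Updated component weight: prior weight times marginal likelihood of b
        post_weights.append(w * NormalDist(0.0, marginal_sd).pdf(b))
        shrink = prior_var / (prior_var + se ** 2)
        post_mean = shrink * b                   # estimate shrunk toward zero
        post_sd = math.sqrt(shrink) * se
        probs.append(NormalDist(post_mean, post_sd).cdf(math.log(threshold)))
    total = sum(post_weights)
    return sum(pw * p for pw, p in zip(post_weights, probs)) / total

# Summary statistics of the form reported by a trial (HR 0.87, 95% CI 0.63 to 1.20)
p_any = posterior_prob_hr_below(0.87, 0.63, 1.20, 1.0, WEIGHTS, SNR_SDS)
p_mcid = posterior_prob_hr_below(0.87, 0.63, 1.20, 0.8, WEIGHTS, SNR_SDS)
print(f"P(HR < 1) = {p_any:.2f}, P(HR < 0.8) = {p_mcid:.2f}")
```

Each mixture component shrinks the observed log HR toward zero, and narrower components shrink it more; the posterior is a reweighted mixture of these shrunken normal distributions.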
RESULTS
After screening 785 phase III randomized trials from ClinicalTrials.gov, we included 415 two-arm, superiority-design, interventional, therapeutic, time-to-event, phase III trials (Figure S1). Publication dates of the primary endpoint ranged from 2004 to 2020, with a total of 338,600 patients enrolled. Most trials studied metastatic solid tumors (n=263, 63%), and most utilized surrogate primary endpoints (n=250, 60%) (Table 1). Superiority was claimed for the experimental arm in 216 trials (52%) and was not claimed in 191 trials (46%); in 8 trials, inferiority of the experimental arm was claimed (2%).
Table 1.
Characteristics of trials included in the analysis.
Characteristic | No. (%) |
---|---|
Total trials | 415 |
Disease stage | |
Solid, non-metastatic | 87 (21) |
Solid, metastatic | 263 (63) |
Hematologic | 65 (16) |
Disease site | |
Breast | 76 (18) |
Gastrointestinal | 66 (16) |
Genitourinary | 58 (14) |
Hematologic | 65 (16) |
Thoracic | 79 (19) |
Other* | 71 (17) |
Treatment modality | |
Systemic therapy | 404 (97) |
Local therapy | 11 (3) |
Cooperative group study | 88 (21) |
Industry sponsored | 361 (87) |
Median number of enrolled patients (interquartile range) | 596 (377 to 903) |
Median publication year (interquartile range) | 2015 (2012 to 2017) |
Primary endpoint | |
Overall survival | 165 (40) |
Surrogate | 250 (60) |
Primary outcome | |
Superiority shown for experimental arm | 216 (52) |
Superiority not shown for experimental arm | 191 (46) |
Inferiority of experimental arm | 8 (2) |
*Other disease sites included: central nervous system, endocrine, gynecologic, head and neck, pediatric, sarcoma, and skin.
The absolute z-scores of phase III oncology trials tended to be much larger than those of 23,551 RCTs from the Cochrane database (P < 0.0001, Mann-Whitney test) (Figure S2). This implies that both the SNR and the power of phase III oncology RCTs also tend to be much larger than those of general RCTs (Figure 1A). Z-scores appeared to be stable over time, where time was defined as the trial publication year (Figure S3). A zero-mean mixture of 3 normal distributions provided a reasonable fit to the z-scores of the phase III trials (Figure S4). The proportions and variances of each subcomponent are reported in Table S1. We derived a zero-mean mixture of 3 normal distributions for the SNR by subtracting 1 from each of the variances (Table S1). From the distribution of the SNR, we derived the distribution of the power. We found that the median power of phase III oncology RCTs was only 49% (IQR, 14% to 95%), with an average power of 52% (Figure 1B). An estimated 65% of trials had power less than 80%, and 71% had power less than 90%. The power was > 95% in an estimated 25% of trials. As previously reported, the power of RCTs from the Cochrane database tended to be even lower (median power: 13%, with 78% of trials < 80% power).9 The SNR distribution was scaled by the observed standard error to derive the prior for the treatment effect in a particular trial (Figure S5). Examples of resulting priors are plotted in Figure S6.
The probabilities of a detectable effect (HR < 1) exceeded 90% for all 216 trials that claimed superiority (Figure 2). However, only 53% of trials with superiority claims (114 of 216) had ≥ 90% probability of achieving the MCID (Table 2). Conversely, for the 199 trials that did not claim superiority, the median probability that the experimental arm had an HR less than 1 was 63% (IQR, 32% to 86%) (Figure 2). In 17% of trials that did not claim superiority (34 of 199), the probability that HR was less than 1 exceeded 90% (Table 2). Consistent with the differences in the SNR between phase III oncology RCTs and RCTs from the Cochrane database, posterior probabilities computed by the Cochrane database prior appeared to be over-corrected compared with the phase III oncology-specific prior (Figure 3).
Figure 2.
Posterior probabilities of primary endpoints of 415 phase III trials computed using the phase III oncology prior. Probabilities are grouped according to endpoint type (overall survival [OS] or surrogate survival) and the trial result interpretation (claim for superiority or not). (A) Probability that the hazard ratio (HR) is less than 1 in favor of the experimental arm. (B) Probability that the experimental arm shows superiority according to the minimum clinically important difference (MCID), defined as HR < 0.8.
Table 2.
Probabilities of a detectable effect (hazard ratio [HR] < 1) and achieving a minimum clinically important difference (MCID) (HR < 0.8) in phase III oncology randomized clinical trials, computed by a phase III, oncology-specific prior.
Posterior probability | Superiority claimed for the experimental arm (n=216), no. (%) | Superiority not claimed for the experimental arm (n=199), no. (%) |
---|---|---|
MCID (HR < 0.8) | | |
≥ 90% | 114 (53) | 0 (0) |
≥ 75% | 149 (69) | 1 (0.5) |
≥ 50% | 180 (83) | 6 (3) |
Detectable effect (HR < 1) | | |
≥ 90% | 216 (100) | 34 (17) |
≥ 75% | 216 (100) | 82 (41) |
≥ 50% | 216 (100) | 130 (65) |
Figure 3.
Comparison of posterior probabilities computed by the phase III oncology-specific prior versus the prior from the Cochrane database of RCTs for the minimum clinically important difference (hazard ratio < 0.8).
A webpage has been created for users to compute hypothesis probabilities at a given level of HR based on the trial’s summary statistics (https://alexandersherry.shinyapps.io/shinyapp/), with an illustration shown in Figure 4.
Figure 4.
Illustration of the standalone webpage (https://alexandersherry.shinyapps.io/shinyapp/) facilitating a user-friendly approach to estimating the posterior distribution from a phase III oncology trial of interest using the proposed informative prior. After entering the summary statistics from the published trial’s Cox regression (hazard ratio and its 95% confidence interval), the user can calculate the posterior mean estimate and its 95% credible interval, visualize the posterior distribution, and estimate the posterior probability of effects above or below a threshold of interest as well as between a range of interest. Posterior probabilities for HR < 1, HR > 1, and HR < 0.8 (which is often considered as a minimum clinically important difference) are provided as a default. For the posterior distribution plot, a dashed line indicates HR = 1, and the area under the curve for HR < 0.8 is shaded in blue.
DISCUSSION
Using the distribution of the z-scores of the primary endpoints of 415 phase III oncology randomized controlled trials, the present study is the first to compute an evidence-based default prior specifically designed to estimate the effects of phase III oncology trials. This prior has been deployed in a standalone webpage application that allows users to input the summary Cox regression statistics of a phase III trial and compute the posterior probabilities of benefit at any level of hazard ratio. By providing oncologists with the means to directly compute probabilities of interest, the present study provides a robust, highly practical tool to immediately enhance the interpretation of phase III trials throughout the wider oncology community.
Consistent with previous work, we found that the actual power of most phase III trials is low relative to the power specified during trial design.9,28 Lower power increases the risk of false negative findings. When an underpowered trial does reach “statistical significance,” the effect is usually overestimated, which can lead to replication failure. Directly computing the probability of benefit using our phase III prior provides a more intuitive method of understanding and interpreting the uncertainty associated with underpowered trials. The consequences of relying on P-value thresholds in underpowered trials are directly manifested in our finding that the experimental arms of 17% of trials interpreted as negative or inconclusive had greater than 90% probability of superiority to the control arm.
Rather than specifying the beliefs of an effect for a single unique treatment, this study offers an objective approach to specifying the prior probability distribution by using signal-to-noise ratios that have been observed in phase III oncology trials. In the absence of specific beliefs regarding a unique treatment, this prior probability distribution for the treatment effect may serve as a reasonable default prior. The utility of this prior may be especially pertinent when Bayesian posteriors are not computed in the trial publication, which is the case for the vast majority of trials at present. Furthermore, this prior may be particularly helpful when individual patient-level data are not made readily available for re-analysis. In a separate study, we manually reconstructed individual patient-level data for the primary outcomes of 230 trials and found that the posteriors computed from individual patient-level data using conventional priors had a high degree of concordance with those computed using the prior proposed in the present study.29 Notwithstanding these advantages, in some cases or in sensitivity analyses, using priors unique to a certain treatment, rather than a default prior as proposed in the present study, may be desirable. However, the selection of a prior under such circumstances has inherent subjectivity, representing one of the primary criticisms of Bayesian methods, and may be less robust or less informative compared with the prior proposed in the present study, which was derived from the data of 415 trials. Nonetheless, it is important to note that current tools using the methods of Wijeysundera et al. are available to facilitate this evaluation with published trial summary statistics: https://benjamin-andrew.shinyapps.io/bayesian_trials/.30,31 Taken together, the present study represents a considerable advance and adds robust, objective, and unique value to the interpretative tools available to oncologists.
Previously, an informative prior, based on thousands of clinical trials in the Cochrane database, was proposed.18–21 However, we found that the SNRs of RCTs in the Cochrane database tend to be lower than those of phase III oncology trials, consistent with the observation that phase III oncology trials are often larger than general medical RCTs, leading to more power. The use of a phase III-specific, oncology-specific prior improves the robustness of computed posterior probabilities by reducing the risk of over-correction, as suggested by our comparison of posterior probabilities for the MCID. These findings thus support the stated rationale for using a separate, dedicated phase III oncology prior, namely that phase III trials, compared with general medical RCTs, are more typically large-scale, multi-center, multinational, and most likely to change practice and lead to regulatory approvals. That being said, low SNRs were observed even for phase III oncology trials, which can lead to upward bias in the estimate of HR.21 Application of the phase III oncology prior distribution of treatment effect to trials, especially those with low SNRs, may partially reduce this bias, and posterior mean estimates for HR are computed as part of the provided webpage.
To illustrate the potential value of estimating the probability of benefit, consider the results of two example phase III trials, the GEMPAX trial and CALGB 30610 (Alliance) / RTOG 0538, neither of which was included in the development of the phase III prior as both trials were recently published.32,33 The GEMPAX trial compared second-line gemcitabine with paclitaxel versus gemcitabine alone for patients with metastatic pancreatic ductal adenocarcinoma.32 GEMPAX showed an improvement in progression-free survival (HR 0.64, 95% CI 0.47 to 0.89) and overall response rate, but interpreted the primary endpoint of overall survival as “statistically negative” (HR 0.87, 95% CI 0.63 to 1.20) on the basis of a large P value (0.41). Importantly, large P values do not support the null hypothesis, and in fact provide little information.4 Using the phase III prior developed in the present study, the probability that gemcitabine plus paclitaxel is associated with better overall survival (HR < 1) than gemcitabine alone is 78%, which is similar to the 75% probability that there is no clinically meaningful difference (0.8 < HR < 1.25) between the two treatments (Figure 4). This highlights that the overall survival results lacked the power and precision to make reliable assertions regarding treatment efficacy or lack thereof. Conversely, the CALGB 30610 (Alliance) / RTOG 0538 RCT noted an HR of 0.94 (95% CI 0.76 to 1.17, P = 0.594) for OS among patients with limited-stage small-cell lung cancer who underwent once-daily radiation compared with those who underwent standard-of-care twice-daily radiation.33 Using our phase III prior, we can see that the probability that the two treatments yielded no clinically meaningful difference (0.8 < HR < 1.25) was 95%. Therefore, although both RCTs yielded large, non-significant P values, only RTOG 0538 had adequate power to conclusively determine no clinically meaningful difference.
Notably, the standard error of the natural logarithm of HR for RTOG 0538, 0.11, was less than that of GEMPAX, 0.16. A trial with a smaller standard error is expected to have a narrower prior for the treatment effect, because treatment effect and standard error are inversely related. For example, to obtain 80% power in a trial, the treatment effect must be 2.8 times the standard error. Accordingly, in this example, GEMPAX was powered for a larger treatment effect (HR of 0.625) compared with RTOG 0538 (HR of 0.77).32,33 Trials expecting small treatment effects and choosing large sample sizes to reduce the standard error will result in narrower priors for the treatment effect, because the trialists themselves suspected smaller treatment effects. This distinction, made manifest in the computation of posterior probabilities by our prior distribution, highlights an example of the importance of comprehensively evaluating RCT results using our provided webtool. Importantly, probabilities of benefit do not represent rules for decision-making.1 Inference, obtained from data such as this study, must be applied by the oncologist to each clinical scenario in light of the risks, alternatives, patient characteristics, and patient values.25,34 Nonetheless, the additional information provided by posterior computation can facilitate a more informed and data-driven approach to clinical care.
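The factor of 2.8 follows from adding the two-sided critical value at the 0.05 level (about 1.96) to the standard normal quantile for 80% power (about 0.84). The short sketch below reproduces this arithmetic and, as an assumption-laden illustration, converts it into the HR that a trial with a given standard error could detect with 80% power. The function names are hypothetical, the usual approximation ignoring rejections in the wrong direction is used, and the standard errors are the observed values quoted above rather than the values assumed in each trial's design, so the results need not match the design HRs.

```python
import math
from statistics import NormalDist

def required_snr(power=0.80, alpha=0.05):
    """Effect size, in standard-error units, needed for the target two-sided power.

    Uses the usual approximation that ignores rejections in the wrong direction.
    """
    std = NormalDist()
    return std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)

def hr_detectable(se_log_hr, power=0.80, alpha=0.05):
    """HR below 1 whose log lies required_snr standard errors from zero."""
    return math.exp(-required_snr(power, alpha) * se_log_hr)

print(f"required SNR for 80% power: {required_snr():.2f}")  # 2.80

# Observed standard errors of the log HR quoted in the text
for name, se in [("GEMPAX", 0.16), ("RTOG 0538", 0.11)]:
    print(f"{name}: HR detectable with 80% power is about {hr_detectable(se):.2f}")
```

With these observed standard errors the sketch gives roughly 0.64 for GEMPAX and 0.73 for RTOG 0538, which illustrates why the trial with the smaller standard error is sensitive to smaller treatment effects.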
There are some important limitations to consider in the present study. First, we assumed that the Cox regressions that formed the basis of the primary analysis of each phase III trial met their underlying statistical assumptions, including proportional hazards; however, this may not have always been the case.35,36 In general, we suggest that trials should report summary statistics that are interpretable and make as few assumptions as possible.37 Second, there may have been underlying variation in the approach of each trial towards computing the primary endpoint; Cox models fit with prognostic covariates are likely more precise than univariable models and this may have influenced the resultant SNRs that were used to create the prior.38 Third, because only published studies were included, phase III trials that were not published due to non-significant results, a proportion estimated in one study to be as high as 7% of trials, may have resulted in a file drawer effect and influenced our treatment effect distribution.39 Fourth, our prior was specifically fit to the primary endpoints of phase III trials, which is both a strength and a limitation. Importantly, this prior does not assume homogeneity of treatment effects, but represents general information about the signal-to-noise ratios of phase III oncology trials conducted between 2004 and 2020. Although the z-score distribution appeared stable during the time period these trials were published, if the signal-to-noise ratios of phase III oncology trials were to meaningfully change in the coming decades, the robustness of the prior may be lessened. However, the foundation provided by the present study, including public availability of the key data and methods, facilitates ready updates to the distribution of the informative prior if the underlying basis of the prior were to meaningfully change, consistent with the general principles of Bayesian epistemology.
Furthermore, the distribution of SNRs of phase III oncology trials appeared meaningfully different from the distribution of SNRs from general medical RCTs in the Cochrane database, as did the subsequent posterior probability estimations, thus establishing the need for a separate, phase III oncology-specific prior. Similarly, other trials or endpoints, such as phase II trials or even secondary endpoints of phase III trials, are expected to have lower SNRs. Our prior is not appropriate for these cases because they lack exchangeability with the trials included in our analysis. Likewise, the prior is not as relevant to phase III trials conducted in other fields of medicine because it was derived and developed exclusively from oncology trials. Fifth, although we defined the MCID as HR < 0.8 as an illustration of the output of this tool, the MCID may also be context-specific.24 Using the provided webpage, any interested user can compute the probability of MCID according to their own definition, or likewise compute the probability of a detectable effect based on an alternative definition (such as MCID/2, rather than HR < 1). Lastly, although this tool allows any user to obtain inferences from published trial data, we encourage consideration of these limitations and the fact that inference and decision-making are distinct.25,34 Bayesian methodology is not strictly superior to frequentism in all settings, but rather complementary. An understanding of, and sensitivity to, the advantages and tradeoffs of both may ultimately best serve the stakeholders of clinical oncology in the analyses of landmark trials. For trial primary endpoint analysis, clinical trial biostatisticians and principal investigators are the best equipped to select statistical approaches for their trial, including frequentist approaches and/or priors based on the specific beliefs unique to their proposed treatment for posterior calculation based on the complete underlying individual patient-level data.
In summary, the present study provides an evidence-based off-the-shelf prior that can be used to improve trial interpretation by enabling computation of the probabilities of any benefit and a clinically meaningful benefit based on summary statistics from published Cox regression analyses. This tool is freely available online, without the need for coding. Practicing oncologists, patients, scientists, and students may find estimation of posterior probabilities to be valuable for placing trial results in context. We encourage clinical trial principal investigators, regulators, and other stakeholders to consider computing and reporting the probability that a treatment provides a clinically meaningful benefit, and not just the P value, when weighing the merits and drawbacks of a new treatment.
Supplementary Material
CONTEXT SUMMARY.
Key objective:
We sought to derive an informative prior distribution for the treatment effect of phase III oncology trials to facilitate an objective means to calculate posterior probabilities for the primary outcomes of randomized oncology trials.
Knowledge generated:
Using the summary statistics of 415 phase III trials, we fit a mixture model for the true (unobserved) treatment effect distribution by deconvoluting the distribution of z-scores into the signal-to-noise ratio distribution followed by scaling of the standard error. We used this mixture model as a prior distribution to estimate the probabilities of clinically relevant effect sizes from this dataset of phase III trials.
Relevance:
Oncologists, researchers, and other stakeholders may readily estimate the probability of treatment effects at various effect sizes of interest with the derived prior by means of a user-friendly webpage (https://alexandersherry.shinyapps.io/shinyapp/).
Acknowledgments:
We thank Erica Goodoff, Senior Scientific Editor in the Research Medical Library at The University of Texas MD Anderson Cancer Center, for editing this article.
Funding:
Supported in part by Cancer Center Support (Core) grant P30CA016672 from the National Cancer Institute to The University of Texas MD Anderson Cancer Center and by the Sabin Family Fellowship Foundation (E.B.L. and P.M.).
Disclosures:
A.D.S. reports honoraria from Sermo. P.M. reports honoraria for scientific advisory board membership for Mirati Therapeutics, Bristol-Myers Squibb, and Exelixis; consulting fees from Axiom Healthcare; non-branded educational programs supported by DAVA oncology, Exelixis, and Pfizer; leadership or fiduciary roles as a Medical Steering Committee Member for the Kidney Cancer Association and a Kidney Cancer Scientific Advisory Board Member for KCCure; and research funding from Takeda, Bristol-Myers Squibb, Mirati Therapeutics, and Gateway for Cancer Research (all unrelated to this manuscript’s content). Z.R.M. reports employment at Insitro (unrelated to this manuscript’s content). No other authors report any conflicts of interest.
Data availability:
The underlying dataset, code for the development of the prior, and code for computing posterior probability are provided in the Supplement. A freely available webpage has been created for users to compute the probabilities at a given level of hazard ratio based on the trial’s summary statistics: https://alexandersherry.shinyapps.io/shinyapp/.
REFERENCES
1. Msaouel P, Lee J, Thall PF. Interpreting Randomized Controlled Trials. Cancers (Basel) 2023; 15(19): 4674.
2. Lin TA, Sherry AD, Ludmir EB. Challenges, Complexities, and Considerations in the Design and Interpretation of Late-Phase Oncology Trials. Semin Radiat Oncol 2023; 33(4): 429–37.
3. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 2016; 31(4): 337–50.
4. Rafi Z, Greenland S. Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology 2020; 20(1): 244.
5. Greenland S. Invited Commentary: The Need for Cognitive Science in Methodology. Am J Epidemiol 2017; 186(6): 639–45.
6. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature 2019; 567(7748): 305–7.
7. Wasserstein RL, Lazar NA. The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician 2016; 70(2): 129–33.
8. Wasserstein RL, Schirm AL, Lazar NA. Moving to a World Beyond “p < 0.05”. The American Statistician 2019; 73(sup1): 1–19.
9. van Zwet E, Gelman A, Greenland S, Imbens G, Schwab S, Goodman SN. A New Look at P Values for Randomized Clinical Trials. NEJM Evidence 2024; 3(1): EVIDoa2300003.
10. Adamina M, Tomlinson G, Guller U. Bayesian statistics in oncology: a guide for the clinical investigator. Cancer 2009; 115(23): 5371–81.
11. Good IJ. The Bayes/Non-Bayes Compromise: A Brief Review. Journal of the American Statistical Association 1992; 87(419): 597–606.
12. Senn S. You may believe you are a Bayesian but you are probably wrong. RMM 2011; 2: 48–66.
13. Siddique J. Bayesian (re)-Analyses of Clinical Trial Data. NEJM Evidence 2023; 2(1): EVIDe2200297.
14. Diamond GA, Kaul S. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol 2004; 43(11): 1929–39.
15. Fornacon-Wood I, Mistry H, Johnson-Hart C, Faivre-Finn C, O’Connor JPB, Price GJ. Understanding the Differences Between Bayesian and Frequentist Statistics. Int J Radiat Oncol Biol Phys 2022; 112(5): 1076–82.
16. Booth CM, Eisenhauer EA, Gyawali B, Tannock IF. Progression-Free Survival Should Not Be Used as a Primary End Point for Registration of Anticancer Drugs. J Clin Oncol 2023; 41(32): 4968–72.
17. Kemp R, Prasad V. Surrogate endpoints in oncology: when are they acceptable for regulatory and clinical decisions, and are they currently overused? BMC Med 2017; 15(1): 134.
18. van Zwet E, Gelman A. A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates. The American Statistician 2022; 76(1): 1–9.
19. van Zwet E, Schwab S, Greenland S. Addressing exaggeration of effects from single RCTs. Significance 2021; 18(6): 16–21.
20. van Zwet E, Schwab S, Senn S. The statistical properties of RCTs and a proposal for shrinkage. Statistics in Medicine 2021; 40(27): 6107–17.
21. van Zwet E, Tian L, Tibshirani R. Evaluating a shrinkage estimator for the treatment effect in clinical trials. Statistics in Medicine 2023; 43(5): 855–68.
22. Hoenig JM, Heisey DM. The Abuse of Power. The American Statistician 2001; 55(1): 19–24.
23. Hahn AW, Dizman N, Msaouel P. Missing the trees for the forest: most subgroup analyses using forest plots at the ASCO annual meeting are inconclusive. Ther Adv Med Oncol 2022; 14: 17588359221103199.
24. Ellis LM, Bernstein DS, Voest EE, et al. American Society of Clinical Oncology perspective: Raising the bar for clinical trials by defining clinically meaningful outcomes. J Clin Oncol 2014; 32(12): 1277–80.
25. Msaouel P, Lee J, Thall PF. Making Patient-Specific Treatment Decisions Using Prognostic Variables and Utilities of Clinical Outcomes. Cancers (Basel) 2021; 13(11): 2741.
26. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 1989; 10(4): 407–15.
27. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2023.
28. Shen C, Xu H. Randomized Phase III Oncology Trials: A Survey and Empirical Bayes Inference. Journal of Statistical Theory and Practice 2019; 13: 1–13.
29. Sherry AD, Msaouel P, Kupferman G, et al. Towards Treatment Effect Interpretability: A Bayesian Re-analysis of 194,129 Patient Outcomes Across 230 Oncology Trials. medRxiv 2024: 2024.07.23.24310891.
30. Wijeysundera DN, Austin PC, Hux JE, Beattie WS, Laupacis A. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials. J Clin Epidemiol 2009; 62(1): 13–21.e5.
31. Lane D, Andrew B. Bayesian re-analysis of clinical trials. 2021. https://benjamin-andrew.shinyapps.io/bayesian_trials/ (accessed May 21, 2024).
32. Fouchardière CDL, Malka D, Cropet C, et al. Gemcitabine and Paclitaxel Versus Gemcitabine Alone After 5-Fluorouracil, Oxaliplatin, and Irinotecan in Metastatic Pancreatic Adenocarcinoma: A Randomized Phase III PRODIGE 65-UCGI 36-GEMPAX UNICANCER Study. Journal of Clinical Oncology; 0(0): JCO.23.00795.
33. Bogart J, Wang X, Masters G, et al. High-Dose Once-Daily Thoracic Radiotherapy in Limited-Stage Small-Cell Lung Cancer: CALGB 30610 (Alliance)/RTOG 0538. J Clin Oncol 2023; 41(13): 2394–402.
34. Msaouel P, Lee J, Karam JA, Thall PF. A Causal Framework for Making Individualized Treatment Decisions in Oncology. Cancers (Basel) 2022; 14(16).
35. Lin T, Koong A, Lin C, et al. Incidence and impact of proportional hazards violations in phase 3 cancer clinical trials. J Clin Oncol 2022; 40(16_suppl): 1561.
36. Rahman R, Fell G, Ventz S, et al. Deviation from the Proportional Hazards Assumption in Randomized Phase 3 Clinical Trials in Oncology: Prevalence, Associated Factors, and Implications. Clinical Cancer Research 2019; 25(21): 6339–45.
37. McCaw ZR, Tian L, Wei J, et al. Choosing clinically interpretable summary measures and robust analytic procedures for quantifying the treatment difference in comparative clinical studies. Stat Med 2021; 40(28): 6235–42.
38. Senn S. Seven myths of randomisation in clinical trials. Stat Med 2013; 32(9): 1439–50.
39. Pasalic D, Fuller CD, Mainwaring W, et al. Detecting the Dark Matter of Unpublished Clinical Cancer Studies: An Analysis of Phase 3 Randomized Controlled Trials. Mayo Clin Proc 2021; 96(2): 420–6.