Considerable progress in understanding cancer biology and advances in technology bring us closer to making precision oncology a reality, as demonstrated by the volume of novel molecularly targeted and immune-modulating agents entering clinical development. However, integrating these agents into clinical practice requires validation of companion diagnostic (predictive) markers that can accurately separate patients into a biomarker-positive subgroup with a favorable benefit-risk profile and a biomarker-negative subgroup where the benefit-risk profile is unfavorable. Biomarker-driven randomized clinical trials (RCTs) play a critical role in this process. The importance of ensuring that these trials are properly and efficiently designed cannot be overstated. When the magnitude of the treatment effect for a new therapy is expected to depend on a predictive biomarker, the most straightforward and transparent approach to evaluating the therapy is to use a biomarker subgroup-specific design (also called biomarker-stratified design) that evaluates the treatment effect in each relevant biomarker subgroup separately (eg, in biomarker-positive and biomarker-negative subgroups). An often-used, yet less transparent and more indirect, approach is the so-called biomarker-positive/overall design that (in its simplest form) is based on evaluating the treatment effect in the biomarker-positive and the overall population (with a treatment recommendation for the biomarker-negative subgroup based on the overall population results). It is well known that the biomarker-positive/overall design may result in misguided treatment recommendations, specifically recommending ineffective treatments to the biomarker-negative patients, as the overall population results may be driven solely by the biomarker-positive patients (1-5).
It is important to note that although some biomarkers are inherently binary (eg, reflecting presence or absence of a mutation of interest), others are ordinal or continuous in nature (eg, reflecting level of expression or tumor-mutation burden) with the magnitude of treatment effect expected to be positively correlated with biomarker values. To use a continuous biomarker as a companion diagnostic, a cutoff value that separates patients with favorable vs unfavorable benefit-risk profiles must be identified to guide treatment decisions. The identification and validation of the cutoff typically requires examination of several plausible cutoffs. For example, for a programmed cell death 1 (PD-1) inhibitor with plausible combined positive score (CPS) cutoffs of 1% and 50%, the biomarker-stratified design would partition the study population into 3 nonoverlapping subgroups (<1%, 1%-49%, and ≥50%) and evaluate the treatment effect in each subgroup separately, and the corresponding biomarker-positive/overall design would evaluate the treatment effect in the 3 nested subpopulations (≥50%, ≥1%, and the overall population).
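The difference between the two ways of grouping patients in the PD-1 example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, scores are assumed to be whole-number percentages, and the 1% and 50% cutoffs are the ones used in the example above.

```python
from bisect import bisect_right

def stratified_subgroup(score, cutoffs=(1, 50)):
    """Label of the single nonoverlapping subgroup containing `score` (whole percent)."""
    bounds = sorted(cutoffs)
    idx = bisect_right(bounds, score)  # number of cutoffs that are <= score
    if idx == 0:
        return f"<{bounds[0]}%"
    if idx == len(bounds):
        return f">={bounds[-1]}%"
    return f"{bounds[idx - 1]}%-{bounds[idx] - 1}%"

def nested_populations(score, cutoffs=(1, 50)):
    """Labels of every nested population (highest cutoff first) that includes `score`."""
    labels = [f">={c}%" for c in sorted(cutoffs, reverse=True) if score >= c]
    labels.append("overall")
    return labels

# A patient with a score of 60 sits in exactly one stratified subgroup
# but contributes to all three nested populations.
print(stratified_subgroup(60), nested_populations(60))
print(stratified_subgroup(10), nested_populations(10))
print(stratified_subgroup(0), nested_populations(0))
```

In the stratified grouping each patient informs exactly one treatment-effect estimate, whereas in the nested grouping high-score patients contribute to every population, including the overall one.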
In this issue of the Journal, Liang et al. (6) present a review of designs and reporting practices for biomarker-validation RCTs in oncology. The authors searched 7 high-impact medical journals for biomarker-validation phase III RCTs and identified 45 biomarker RCTs published between 2011 and 2020. Liang et al. report that 40 (88.9%) of these 45 biomarker RCTs used the flawed biomarker-positive/overall design. Of these biomarker-positive/overall trials, 17 reported positive results for the overall population, with 8 showing modest-to-no effect in the biomarker-negative subgroups (hazard ratio [HR] > 0.8); these 8 trials recommended treatment for biomarker-negative patients. In discussing one of these trials, KEYNOTE-042 (7), Liang et al. (6) point out that, because the benefit of pembrolizumab in patients with non-small cell lung cancer (NSCLC) and a tumor proportion score (TPS) ≥50% had already been demonstrated in KEYNOTE-024, “the real clinically relevant question is whether NSCLC patients with PD-L1 [PD-ligand 1] expression between 1% and 49% could benefit from pembrolizumab monotherapy.” KEYNOTE-042 enrolled patients with a TPS≥1% and used the biomarker-positive/overall design to report a beneficial effect in the ≥50%, ≥20%, and ≥1% subpopulations and recommended pembrolizumab for patients with a TPS≥1%. However, the results of the subgroup with a TPS≥1% appear to be driven by the strong effect in the population with a TPS≥50%: in the population with a TPS of 1%-49%, the hazard ratio was 0.92 with a 95% confidence interval (CI) of 0.77 to 1.11, raising questions about the study conclusions (8,9).
Another of the biomarker trials, KEYNOTE-048, followed KEYNOTE-040, demonstrating that a more favorable benefit from pembrolizumab in head and neck squamous cell carcinoma was associated with higher CPS, with no benefit observed in patients whose tumors had a CPS<1% (5,10). KEYNOTE-048 used the biomarker-positive/overall design to compare the pembrolizumab-with-chemotherapy and pembrolizumab-alone arms against the cetuximab-with-chemotherapy control (testing treatment effect in the CPS ≥ 20%, CPS ≥ 1%, and overall study populations). The study concluded that 1) pembrolizumab with chemotherapy was superior to cetuximab with chemotherapy in all patients and 2) pembrolizumab was noninferior to cetuximab with chemotherapy in all patients. It has been argued that in this application of the biomarker-positive/overall design, recommendations for patients with a CPS<1% were driven by results from patients with high CPS (5,9). Although the original paper (10) did not report results for the subgroup with a CPS<1%, the follow-up paper (11) reported potentially detrimental overall survival (OS) in the population with a CPS<1% (HR = 1.51, 95% CI = 0.96 to 2.37, and HR = 1.21, 95% CI = 0.76 to 1.94, for pembrolizumab vs cetuximab with chemotherapy and for pembrolizumab with chemotherapy vs cetuximab with chemotherapy, respectively).
Although the majority (64.4%) of biomarker trials identified in the Liang et al. (6) manuscript were testing immunotherapy, 35.6% were evaluating molecularly targeted agents. For example, in advanced breast cancer, SOLAR-1 used the biomarker-stratified design to evaluate the PI3Kα-specific inhibitor alpelisib vs placebo separately in PIK3CA-mutant and PIK3CA-nonmutant subgroups (12). The observed progression-free survival (PFS) hazard ratio was 0.65 (95% CI = 0.50 to 0.85) in the PIK3CA-mutant subgroup and 0.85 (95% CI = 0.58 to 1.25) in the PIK3CA-nonmutant subgroup. This trial concluded that the benefit of alpelisib was shown only in PIK3CA-mutant patients, thus validating PIK3CA as a predictive biomarker for guiding alpelisib therapy. This can be contrasted with the BELLE-2 trial, which used the biomarker-positive/overall design to evaluate the pan-PI3K inhibitor buparlisib vs placebo in breast cancer patients (by formally evaluating the buparlisib effect in PIK3CA-mutant patients and in the overall population that combined PIK3CA-mutant and PIK3CA-nonmutant patients) (13). The trial concluded that buparlisib is effective regardless of PIK3CA status based on a 1.9-month increase in median PFS in the overall population (HR = 0.78, 95% CI = 0.67 to 0.89), even though there was no improvement for PIK3CA-nonmutant patients (PFS HR = 1.02, 95% CI = 0.79 to 1.30).
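How a combined estimate can mask a weak subgroup can be illustrated with the SOLAR-1 hazard ratios quoted above. The sketch below is a hypothetical exercise, not an analysis reported by SOLAR-1: it assumes the two subgroup estimates are independent, recovers standard errors from the reported 95% confidence intervals, and uses simple fixed-effect inverse-variance pooling as a stand-in for a real overall-population analysis. All function and variable names are illustrative.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)

def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (hr, lower, upper) tuples.

    Returns the pooled HR with its 95% CI limits.
    """
    total_weight, weighted_sum = 0.0, 0.0
    for hr, lower, upper in estimates:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * log_hr
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - Z95 * pooled_se),
            math.exp(pooled + Z95 * pooled_se))

# Subgroup PFS hazard ratios as quoted in the text (SOLAR-1).
solar1_mutant = (0.65, 0.50, 0.85)
solar1_nonmutant = (0.85, 0.58, 1.25)

# Hypothetical pooling: SOLAR-1 itself analyzed the two subgroups separately.
pooled_hr, pooled_lo, pooled_hi = pool_fixed_effect([solar1_mutant, solar1_nonmutant])
print(f"hypothetical pooled HR = {pooled_hr:.2f} "
      f"(95% CI {pooled_lo:.2f} to {pooled_hi:.2f})")
```

Under these assumptions the pooled interval lies entirely below 1 even though the nonmutant interval includes 1, which is the pattern that a biomarker-positive/overall design can turn into a recommendation for all patients.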
Liang et al. (6) reported an increasing trend over time in the use of biomarker-validation designs, with the field dominated by the biomarker-positive/overall designs. This trend persists beyond 2020, as demonstrated by KEYNOTE-590 (14), CHECKMATE-648 (15), and CHECKMATE-649 (16) evaluating PD-1 inhibitors in gastric and esophageal cancers. These trials used the biomarker-positive/overall design and recommended treatment for all patients regardless of PD-L1 expression levels based on the positive results in the overall populations. For example, CHECKMATE-648 reported longer OS in the overall population on nivolumab with chemotherapy vs chemotherapy alone (HR = 0.74, 99.1% CI = 0.58 to 0.96) and recommended nivolumab with chemotherapy to patients regardless of PD-L1 level. Yet patients with a PD-L1<1% derived no benefit from the addition of nivolumab (OS HR = 0.98, 95% CI = 0.76 to 1.28).
The suboptimal designs of the biomarker-validation RCTs reported by Liang et al. (6) can possibly be attributed to these trials being initiated before the issues with the biomarker-positive/overall designs became widely appreciated. Going forward, however, it is imperative, and should be strongly encouraged by regulatory agencies, that future biomarker-validation studies use biomarker-stratified designs to properly validate the predictive biomarkers by providing reliable treatment-effect estimates in each relevant biomarker subgroup separately. The need to make the trials as small as possible (to accelerate drug development) is often used to argue that biomarker-stratified designs would require unfeasibly large sample sizes. However, many of the 45 biomarker trials have enrolled a sufficient number of patients in relevant biomarker subgroups to use the proper biomarker-stratified design. For example, KEYNOTE-042 (which did not include direct evaluation of the TPS 1%-49% subgroup) enrolled 675 and 599 patients in the TPS 1%-49% and the TPS≥50% subgroups, respectively. As pointed out by Liang et al. (6), this is sufficient for a biomarker-stratified design to reliably assess treatment effect in the TPS 1%-49% and TPS≥50% subgroups separately. Similarly, BELLE-2 enrolled 372 PIK3CA-mutant and 387 PIK3CA-nonmutant patients, sufficient for a biomarker-stratified design adequately powered for PFS evaluation in each PIK3CA subgroup (13).
Implementation of the biomarker-stratified designs for biomarker validation is not without challenges. Ensuring proper power requires prespecifying minimal sample size for each biomarker subgroup. Depending on the specific setting, this may be achieved by keeping enrollment open for some of the subgroup(s) after the higher prevalence subgroups have finished enrollment (3). Adaptive design strategies that open accrual to additional subgroups based on early results from the biomarker-positive patients could also be used to improve design efficiency (2). This is particularly relevant to continuous biomarkers (eg, PD-L1) where validation may require examining several plausible cutoffs. For example, in KEYNOTE-042 to examine the 20% cutoff (in addition to the 50% cutoff), a biomarker-stratified design powered to evaluate 3 subgroups—TPS 1%-19%, 20%-49%, and ≥50%, separately—could have been designed: KEYNOTE-042 enrolled 456, 219, and 599 patients in these subgroups, respectively. Therefore, a biomarker-stratified trial could be designed to continue enrollment to the TPS 20%-49% subgroup after the other 2 subgroups reached their required sample sizes. In some rare and/or molecularly defined patient subgroups, it may not be feasible to enroll enough patients in a timely manner. In these settings, relaxing the design evidentiary levels to reduce the required sample size may be acceptable (eg, use of 1-sided significance levels in the 0.05-0.15 range instead of the standard 0.025 and/or use of larger target treatment effects; additional efficiency can be achieved with the use of adaptive methods like interim monitoring) (2).
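The effect of relaxing the significance level or enlarging the target treatment effect on the size of a subgroup analysis can be sketched with Schoenfeld's approximation for the number of events needed by a log-rank comparison with 1:1 randomization. This is a rough illustration rather than a design recommendation: the 90% power and the target hazard ratios of 0.70 and 0.60 are assumptions chosen for the example, the function name is hypothetical, and the result is a count of events, not of enrolled patients.

```python
import math
from statistics import NormalDist

def required_events(target_hr, one_sided_alpha=0.025, power=0.90):
    """Approximate events needed for a 1:1 log-rank test (Schoenfeld approximation).

    events = 4 * (z_{1-alpha} + z_{power})^2 / (ln target_hr)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - one_sided_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(target_hr) ** 2)

# Events needed per subgroup as the 1-sided significance level is relaxed
# (target HR of 0.70 and 90% power are illustrative assumptions).
for alpha in (0.025, 0.05, 0.10, 0.15):
    print(f"alpha = {alpha:<5} events = {required_events(0.70, alpha)}")

# A larger target effect (smaller HR) also lowers the requirement.
print("target HR 0.60:", required_events(0.60))
```

Because the formula is applied to each biomarker subgroup separately, the same function can be used to set the minimal number of events that each subgroup must reach before its analysis.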
In addition to design issues, Liang et al. (6) note the inadequate reporting of biomarker RCTs (eg, for the biomarker-negative patients, 24.4% of trials did not report the treatment effect, and 40.0% did not provide the primary outcome by arm), thereby “preventing physicians from robust benefit-risk assessment in biomarker negative subgroups.” This concern is especially problematic for settings with continuous biomarkers where, regardless of the regulatory approval, optimal treatment decisions for a given patient may depend on their biomarker value. For example, the treatment choice for a patient with NSCLC and a TPS of 40% may differ from the treatment choice for one with a TPS of 10%. Therefore, to assist patients and clinicians in making treatment decisions, biomarker RCTs should make treatment effect estimates and Kaplan-Meier curves available for each clinically relevant subgroup (regardless of whether the subgroup is specified in the primary analysis). For example, for PD-1 and PD-L1 inhibitors, the PD-L1 score range should be partitioned into subgroups with 20% (or even 10%) increments, if sufficient patient numbers were enrolled, and outcomes reported for each subgroup separately.
Liang et al. reported that the majority of biomarker trials are industry sponsored, illustrating the decentralized and pharma-influenced nature of drug development. It is therefore essential for the future success of precision medicine in oncology that regulatory authorities provide clear guidance for the design of biomarker RCTs. It is also important for public health and academia leaders to ensure that these trials are reported in a way that is most informative for choice of treatment for individual patients—defining the future paradigm of oncology precision medicine therapeutics.
Contributor Information
Patricia M LoRusso, Department of Internal Medicine, Yale Cancer Center, New Haven, CT, USA.
Boris Freidlin, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
Funding
None.
Notes
Role of the funder: Not applicable.
Disclosures: None exist.
Author contributions: Conceptualization: PML, BF. Writing—original draft: PML, BF. Writing—review & editing: PML, BF.
Data availability
No new data were generated or analyzed in this editorial.
References
- 1. Rothmann MD, Zhang JJ, Lu L, et al. Testing in a prespecified subgroup and the intent-to-treat population. Drug Inf J. 2012;46(2):175-179.
- 2. Freidlin B, Korn EL. Biomarker enrichment strategies: matching trial design to biomarker credentials. Nat Rev Clin Oncol. 2014;11(2):81-90.
- 3. Freidlin B, Korn EL. A problematic biomarker trial design. J Natl Cancer Inst. 2022;114(2):187-190.
- 4. Tannock IF, Templeton AJ. Flawed trials for cancer. Ann Oncol. 2020;31(3):331-333.
- 5. Kim MS, Prasad V. Nested and adjacent subgroups in cancer clinical trials: when the best interests of companies and patients diverge. Eur J Cancer. 2021;155:163-167.
- 6. Liang F, Peng L, Wu Z, et al. Design and reporting of phase III oncology trials with prospective biomarker validation. J Natl Cancer Inst. 2022; doi:10.1093/jnci/djac210.
- 7. Mok TSK, Wu YL, Kudaba I, et al.; for the KEYNOTE-042 Investigators. Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet. 2019;393(10183):1819-1830.
- 8. Smit EF, de Langen AJ. Pembrolizumab for all PD-L1-positive NSCLC. Lancet. 2019;393(10183):1776-1778.
- 9. Fundytus A, Booth CM, Tannock IF. How low can you go? PD-L1 expression as a biomarker in trials of cancer immunotherapy. Ann Oncol. 2021;32(7):833-836.
- 10. Burtness B, Harrington KJ, Greil R, et al.; for the KEYNOTE-048 Investigators. Pembrolizumab alone or with chemotherapy versus cetuximab with chemotherapy for recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-048): a randomised, open-label, phase 3 study. Lancet. 2019;394(10212):1915-1928.
- 11. Burtness B, Rischin D, Greil R, et al. Pembrolizumab alone or with chemotherapy for recurrent/metastatic head and neck squamous cell carcinoma in KEYNOTE-048: subgroup analysis by programmed death ligand-1 combined positive score. J Clin Oncol. 2022;40(21):2321-2332.
- 12. Andre F, Ciruelos E, Rubovszky G, et al.; for the SOLAR-1 Study Group. Alpelisib for PIK3CA-mutated, hormone receptor-positive advanced breast cancer. N Engl J Med. 2019;380(20):1929-1940.
- 13. Baselga J, Im SA, Iwata H, et al. Buparlisib plus fulvestrant versus placebo plus fulvestrant in postmenopausal, hormone receptor-positive, HER2-negative, advanced breast cancer (BELLE-2): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Oncol. 2017;18(7):904-916.
- 14. Sun JM, Shen L, Shah MA, et al.; for the KEYNOTE-590 Investigators. Pembrolizumab plus chemotherapy versus chemotherapy alone for first-line treatment of advanced oesophageal cancer (KEYNOTE-590): a randomised, placebo-controlled, phase 3 study. Lancet. 2021;398(10302):759-771.
- 15. Doki Y, Ajani JA, Kato K, et al.; CheckMate 648 Trial Investigators. Nivolumab combination therapy in advanced esophageal squamous-cell carcinoma. N Engl J Med. 2022;386(5):449-462.
- 16. Janjigian YY, Shitara K, Moehler M, et al. First-line nivolumab plus chemotherapy versus chemotherapy alone for advanced gastric, gastro-oesophageal junction, and oesophageal adenocarcinoma (CheckMate 649): a randomised, open-label, phase 3 trial. Lancet. 2021;398(10294):27-40.