Abstract
Real-world data are emerging as a source of evidence for evaluating drug effects that may provide unique benefits given the highly selected enrollment and participation of adult patients in clinical trials. This commentary considers the potential utility of real-world data in light of the recently reported results of the FLOWER study, which explored the safety and efficacy of osimertinib using a prospective single-arm design with real-world data sourced from clinical practice.
Real-world data (RWD) are data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources such as electronic health records (EHRs), administrative health claims, registries, and patient-generated data.1 While drug development has historically relied on prospective randomized controlled trials as the evidentiary gold standard, RWD are emerging as another source of evidence for evaluating drug effects that may provide unique benefits given the highly selected enrollment and participation of adult patients in clinical trials.2,3 Most RWD studies to date have been retrospective and limited by a lack of randomization; regulatory use has therefore been considered in the rare circumstances when a prospective clinical trial is infeasible or unethical, or when there is a clear lack of clinical equipoise.
In FLOWER (First-Line Osimertinib in the real-World: an intER-regional prospective study), Lorenzi et al. sought to explore the safety and effectiveness of osimertinib using a prospective single-arm design with RWD sourced from routine clinical practice. The ability to include populations previously excluded by the eligibility criteria of the FLAURA trial and to evaluate drug safety in routine clinical settings are examples of benefits of RWD studies that can increase generalizability. The prospective design of this study was a significant strength compared with more traditional retrospective observational studies, demonstrating that designs employing RWD need not be retrospective. Data can be prospectively collected via RWD sources, incorporating randomization in a pragmatic or hybrid approach that may employ techniques such as embedding and decentralization. Prospective designs using RWD offer benefits including randomization when testing an intervention and careful attention to key covariates to mitigate missing data, an Achilles heel of many retrospective RWD studies.
Interestingly, FLOWER reported tumor response rate using Response Evaluation Criteria in Solid Tumors (RECIST), and safety data graded using Common Terminology Criteria for Adverse Events (CTCAE), in a manner similar to a clinical trial. Reporting RECIST is likely less common in US healthcare, and it would be interesting to understand how often clinicians in Italy use RECIST and CTCAE in routine clinical practice. Unfortunately, the authors did not provide a supplemental protocol or detailed methodology on how these safety and response data were obtained, including the fraction of patients for whom CTCAE and RECIST data were not directly reported by clinicians. Where RECIST was not directly reported, the methodology Lorenzi et al. used to derive real-world response would be useful for interpretation given the distinct challenges of accurately and reliably collecting response endpoints using RWD. The FDA has relied on objective response rate (ORR) as an interpretable efficacy endpoint in single-arm trials because a causal treatment effect can be attributed given that tumors do not typically regress spontaneously. In fact, a key limitation in the use of RWD single-arm cohorts has been the inability to collect the level of tumor response data necessary for adequate interpretation of effectiveness, as RECIST is not often recorded in routine US clinical practice. Indeed, the prospective, defined capture of response using CTCAE- and RECIST-based assessments is notable and suggests the FLOWER design may be more consistent with a pragmatic prospective single-arm clinical trial than with an observational study.
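The FDA's reliance on ORR rests on a simple computation: the fraction of evaluable patients whose best overall response is complete or partial response. As a minimal sketch (the CR/PR labels follow the standard RECIST convention; the function name and input format are hypothetical illustrations, not taken from FLOWER):

```python
def objective_response_rate(best_responses):
    """Compute ORR as the fraction of evaluable patients whose best
    overall response is complete response (CR) or partial response (PR).
    `best_responses` is a list of one best-response label per patient,
    e.g. "CR", "PR", "SD" (stable disease), or "PD" (progressive disease).
    """
    responders = sum(r in ("CR", "PR") for r in best_responses)
    return responders / len(best_responses)


# Illustrative cohort of four evaluable patients:
# one CR and one PR out of four -> ORR of 0.5.
print(objective_response_rate(["CR", "PR", "SD", "PD"]))
```

The difficulty with RWD is not this arithmetic but the numerator and denominator themselves: without protocolized imaging, both the best-response label and the set of "evaluable" patients must be reconstructed from routine records.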
The study reported time-to-event endpoints including progression-free survival (PFS), time to treatment discontinuation (TTD), and overall survival. To adequately interpret an effect on a time-to-event endpoint such as overall survival, randomization is essential. Furthermore, the methodological construction of TTD was not addressed in the manuscript, which again highlights the importance of reporting detailed methodology in RWD studies. While TTD has been associated with PFS using aggregated data from lung cancer clinical trials, further validation of TTD as an endpoint in the routine care setting is needed.4 As noted by Lorenzi et al., the longer TTD observed in FLOWER relative to FLAURA may reflect hesitancy on the part of providers in routine clinical settings to switch therapy at the time of radiographic progression in patients who are felt to be benefiting clinically or who have progression at a limited number of sites (oligoprogression). However, it is not clear whether the continuation of therapy in such cases translates into improved outcomes. Potential differences in the reasons for treatment discontinuation between clinical trial and clinical care settings limit the utility of TTD as an endpoint for an external control arm compared with a clinical trial result. Additionally, the dichotomization of outcomes may introduce methodologic bias and complicate interpretability. Nonetheless, further evaluation of the strengths and limitations of TTD as an efficacy endpoint is warranted in a prospective pragmatic trial conducted in the clinical care setting, where the assessment and definition of the endpoint are consistent between arms.
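Because the manuscript does not report how TTD was constructed, any reimplementation is necessarily an assumption. One common approach in RWD studies derives TTD from drug administration or refill dates, declaring discontinuation when the gap between consecutive doses exceeds a prespecified threshold. The sketch below illustrates that approach only; the 60-day threshold, function name, and data shape are hypothetical, not the FLOWER definition:

```python
from datetime import date


def time_to_discontinuation(admin_dates, gap_days=60):
    """Derive a TTD (in days) from a list of drug administration or
    refill dates, treating any gap longer than `gap_days` between
    consecutive doses as discontinuation. Both the gap threshold and
    this construction are illustrative assumptions, since the FLOWER
    manuscript did not specify how TTD was defined."""
    admin_dates = sorted(admin_dates)
    start = admin_dates[0]
    last = start
    for d in admin_dates[1:]:
        if (d - last).days > gap_days:
            # Gap exceeds threshold: the prior dose marks discontinuation.
            break
        last = d
    return (last - start).days


# A hypothetical patient with monthly refills, then a long gap:
# the 106-day gap after March 1 is treated as discontinuation.
print(time_to_discontinuation(
    [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1), date(2023, 6, 15)]
))
```

The sensitivity of the resulting TTD to the gap threshold is exactly why the endpoint's construction needs to be published: two reasonable thresholds can yield materially different discontinuation dates for the same patient.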
The prospective multicenter design of FLOWER is a strength, but the study was restricted to a single country (Italy), limiting extrapolation of the results to the more diverse U.S. and global patient population. A study examining outcomes in routine care settings should be broader, including racial and ethnic populations that are frequently underrepresented in clinical trials. Geographic limitations also preclude wider interpretation, as variance in health care systems and standards of care may not easily generalize to other settings or countries. Due to the small sample size of FLOWER (n = 126), there are also limitations in the ability to interpret results in specific subgroups of patients who may have been excluded from FLAURA. Although 12.7% of patients in FLOWER had an ECOG PS >2, this translated to a sample size of only 16 patients; similarly, there were only 7 patients with NSCLC harboring rare or complex EGFR mutations. While Lorenzi et al. suggest the higher proportion of patients in FLOWER with brain metastases at baseline (30%) may be due to the inclusion of patients with unstable and symptomatic brain metastases, no data are provided regarding the actual proportion of such patients.
The potential utility of RWD extends beyond evaluating effectiveness. Observational studies have a longstanding role in the evaluation of drug safety and pharmacovigilance monitoring. Lorenzi et al. report an increased incidence of venous thromboembolic (VTE) events in their cohort compared with the FLAURA trial (FLOWER 7.9% vs FLAURA 3%). This finding should be interpreted with caution: the incidence of VTE reported in FLOWER falls within the range reported for patients with metastatic NSCLC in retrospective and observational studies5 and, like the results supporting efficacy, suffers from the well-known limitations of cross-trial comparison, including potential imbalance in other risk factors associated with an increased risk of VTE. In addition to safety and effectiveness, there are other important potential uses of RWD, such as the evaluation of global healthcare delivery. While not explored in detail in the manuscript by Lorenzi et al., an RWD study could examine the types of EGFR testing used in clinical practice, particularly how often the approved companion diagnostic used to confirm EGFR mutation status in the FLAURA trial (the cobas EGFR Mutation Test) is used relative to other assays. Additionally, the authors discuss differences in the median time to treatment initiation based on labeling and regulations, suggesting a potential role for RWD in evaluating the impact of approved product labeling on healthcare delivery. These are all areas where RWD may afford the opportunity for further implementation research, as well as for addressing questions that may arise around health equity.
Unfortunately, it is relatively common for published RWD studies to provide insufficient information on the ascertainment of source data variables, algorithms, and methods necessary to appropriately analyze and interpret the results. Guidelines supporting increased observational study transparency have encouraged authors to publish protocols or include supplementary materials when publishing RWD analyses, given the heterogeneity of RWD sources.6–9 Methodological details can assist in evaluating measurement biases associated with exposure misclassification, clinical care variance, and outcome misclassification. Direct comparison of FLOWER to FLAURA was limited by uncertainties around data granularity; differences in clinical assessment and in exposure and outcome measurement; differences in baseline measured and unmeasured prognostic clinical factors; and the residual confounding that occurs in non-randomized studies. While the time-to-event endpoints are not interpretable in the single-arm setting, the data on tumor response and the safety of osimertinib in a broader patient population provide additional complementary information.
Several factors may maximize the utility of the data generated in RWD studies. Study design elements that would strengthen the utility of the data include a larger sample size, multinational sites, diverse racial and ethnic representation, and eligibility criteria selected to ensure the inclusion of patients traditionally excluded from clinical trials. With the increasing availability of RWD from registries, EHRs, and other sources, it is paramount to focus on the development of high-quality data with sufficient reliability and accuracy to support evidence generation. Careful design of RWD studies is necessary for interpretable evidence and requires a fit-for-purpose data source, an a priori protocol and statistical analysis plan, and a multidisciplinary team of clinicians, statisticians, epidemiologists, and data experts with a high level of data familiarity. Again, transparency around detailed methods (e.g., endpoint definitions, algorithms) is fundamental to the replication and interpretation of RWD studies.
The study by Lorenzi et al. provides an interesting look at the use of osimertinib in an Italian cohort of patients treated in routine clinical care. The prospective design and the reporting of RECIST and CTCAE in this RWD study suggest a continued blurring of the lines between what could be considered a prospective observational study and a pragmatic clinical trial. While FLOWER was a single-arm study, RWD need not be relegated to single-arm designs, and randomized pragmatic trial approaches would allow interpretation of time-to-event endpoints such as survival while delivering care in a community setting with fewer trial-based assessments. The hope is that a transition from bringing the patient to the trial to bringing the trial to the patient—whether through decentralizing clinical trials or designing pragmatic studies in the routine care setting—will decrease the burden of evidence generation and inform both regulatory and clinical decision making.
Acknowledgment
We thank Harpreet Singh, MD, for her careful review and for providing additional lung cancer-specific expertise.
Conflict of Interest
The authors indicated no financial relationships.
Author Contributions
Manuscript writing: All authors. Final approval of manuscript: All authors.
References
- 1. U.S. Food and Drug Administration. The FDA Framework for Real World Evidence. FDA; 2018.
- 2. Unger JM, Cook E, Tai E, Bleyer A. The role of clinical trial participation in cancer research: barriers, evidence, and strategies. Am Soc Clin Oncol Educ Book. 2016;35:185-198. doi:10.1200/EDBK_156686
- 3. Unger JM, Hershman DL, Till C, Minasian LM, Osarogiagbon RU, Fleury ME, Vaidya R. "When offered to participate": a systematic review and meta-analysis of patient agreement to participate in cancer clinical trials. JNCI. 2021;113(3):244-257. doi:10.1093/jnci/djaa155
- 4. Blumenthal GM, Gong Y, Kehl K, et al. Analysis of time-to-treatment discontinuation of targeted therapy, immunotherapy, and chemotherapy in clinical trials of patients with non-small-cell lung cancer. Ann Oncol. 2019;30(5):830-838.
- 5. Vitale C, D'Amato M, Calabrò P, Stanziola AA, Mormile M, Molino A. Venous thromboembolism and lung cancer: a review. Multidiscip Respir Med. 2015;10(1):28.
- 6. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M; STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4(10):e297. doi:10.1371/journal.pmed.0040297
- 7. Berger ML, Mamdani M, Atkins D, Johnson ML. Good research practices for comparative effectiveness research: defining, reporting and interpreting nonrandomized studies of treatment effects using secondary data sources: the ISPOR Good Research Practices for Retrospective Database Analysis Task Force Report—Part I. Value Health. 2009;12(8):1044-1052.
- 8. Benchimol EI, Smeeth L, Guttmann A, et al.; RECORD Working Committee. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12(10):e1001885.
- 9. Wang SV, Schneeweiss S, Berger ML, et al.; joint ISPE-ISPOR Special Task Force on Real World Evidence in Health Care Decision Making. Reporting to improve reproducibility and facilitate validity assessment for healthcare database studies V1.0. Pharmacoepidemiol Drug Saf. 2017;26(9):1018-1032. doi:10.1002/pds.4295
