Abstract
As advanced non-small cell lung cancer (aNSCLC) is being increasingly divided into rare oncogene-driven subsets, conducting randomised trials becomes challenging. Using real-world data (RWD) to construct control arms for single-arm trials provides an option for comparative data. However, non-randomised treatment comparisons have the potential to be biased and cause concern for decision-makers. Using the example of pralsetinib from a RET fusion-positive aNSCLC single-arm trial (NCT03037385), we demonstrate a relative survival benefit when compared to pembrolizumab monotherapy and pembrolizumab with chemotherapy RWD cohorts. Quantitative bias analyses show that results for the RWD-trial comparisons are robust to data missingness, potential poorer outcomes in RWD and residual confounding. Overall, the study provides evidence in favour of pralsetinib as a first-line treatment for RET fusion-positive aNSCLC. The quantification of potential bias performed in this study can be used as a template for future studies of this nature.
Subject terms: Non-small-cell lung cancer, Outcomes research, Lung cancer
Real-world data (RWD)-based control arms offer a way to obtain comparative effectiveness evidence for single-arm trials. By performing multiple quantitative bias analyses to alleviate concerns about trial-RWD comparability, here the authors show that the RET inhibitor pralsetinib provides survival benefit in patients with RET fusion-positive non-small cell lung cancer from the ARROW single-arm trial (NCT03037385) when compared to pembrolizumab monotherapy and pembrolizumab with chemotherapy RWD cohorts.
Introduction
The advent of immune checkpoint inhibitors and molecularly targeted therapy has altered the landscape of non-small cell lung cancer1 (NSCLC). Randomised trials have shown the benefit of targeted therapy in EGFR- and ALK-driven NSCLC over standard-of-care immunotherapy and chemo-immunotherapy. As NSCLC is being increasingly divided into rare oncogene-driven subsets, it is becoming challenging, and in some cases infeasible, to conduct well-powered randomised trials. In some cases, there is a lack of clinical equipoise when randomising to standard traditional therapies once a rationally designed targeted therapy produces high response rates with impressive durability in single-arm studies.
ARROW is a multi-cohort, open-label, phase I/II study (NCT03037385) that demonstrated that pralsetinib, a highly potent selective RET inhibitor, was efficacious when administered to treatment-naïve patients with advanced RET fusion-positive NSCLC2,3. Given the promising results of pralsetinib demonstrated in ARROW, the comparative effectiveness of pralsetinib relative to other therapies amongst patients with advanced NSCLC (aNSCLC) in terms of time-to-treatment discontinuation (TTD), overall survival (OS) and progression-free survival (PFS) is currently unknown and of interest. Whilst a front-line randomized phase III trial is ongoing, the feasibility of a definitive outcome from this strategy for a rare-molecular subset remains uncertain due to recruitment challenges with an efficacious intervention on a background of significant COVID infections. To fill this evidence gap, one of the two goals of this study was to investigate the relative effectiveness of pralsetinib by comparing outcomes for RET fusion-positive patients receiving first-line (1 L) pralsetinib in the ARROW trial with synthetic control arms (SCAs) derived from real-world data (RWD).
Drawing upon RWD to construct SCAs for comparison is used for situations where running a randomised clinical trial (RCT) is impractical or infeasible, or where RCT data is currently unavailable. However, non-randomised treatment comparisons have the potential to be biased due to unmeasured confounding, missing data in RWD and potential poorer performance of an RWD SCA as compared to the pivotal trial the comparator medicine was approved on. These issues have all caused concern for decision-makers evaluating SCA comparisons4 and suggestions have been made by both regulators and health technology assessment (HTA) agencies that the validity of any conclusions should be supported by analyses that quantify the impact of potential sources of bias5–7. Thus, the second of the two goals of this study was to demonstrate how we quantitatively assessed the robustness of our findings to potential sources of bias in a comprehensive and systematic fashion to act as a guide for future SCA studies using RWD8,9.
For the first goal, we had to consider that the low prevalence of RET fusions in NSCLC (1–2%)10 makes RWD studies restricted to patients with RET fusion-positive status challenging. Because we expected a limited number of RET fusion-positive patients to be available in RWD sources, and because the prognostic value of RET fusion status appears to be limited based on the evidence currently available11–14, we also assessed comparisons with aNSCLC patients of unknown RET fusion status. This allowed us to maximise the sample sizes of the RWD cohorts, which increased statistical power and improved our ability to adjust for imbalances in patient characteristics between cohorts.
Results
Demographics and clinical characteristics
The Clinico-Genomic Database (CGDB)15 RET fusion-positive 1 L best-available therapy (BAT) cohort (definition in the Supplementary – 1 L BAT regimens) had 10 patients in total, and baseline patient characteristics are shown in Table 1. Notable imbalances between the pralsetinib and CGDB cohorts were observed for sex, Eastern Cooperative Oncology Group (ECOG) Performance Status (PS) score, and race (standardized mean difference [SMD] >0.6).
Table 1.
Baseline characteristics | Best available therapy | Pralsetinib | SMD |
---|---|---|---|
Sample size – n | 10 | 116 | – |
Age ≥ 65 – n (%) | 5 (50.0) | 49 (42.2) | 0.156 |
Male – n (%) | 2 (20.0) | 55 (47.4) | 0.606 |
Stage IV – n (%) | 7 (70.0) | 95 (81.9) | 0.281 |
Smoking status – n (%) | |||
History of smoking | 4 (40.0) | 45 (38.8) | 0.23 |
No history of smoking | 6 (60.0) | 68 (58.6) | |
Unknown | 0 (0.0) | 3 (2.6) | |
ECOG – n (%) | |||
0 | 5 (50.0) | 35 (30.2) | 1.017 |
1 | 3 (30.0) | 80 (69.0) | |
2 | 0 (0.0) | 1 (0.9) | |
Missing | 2 (20.0) | 0 (0.0) | |
Non-squamous histology – n (%) | 10 (100.0) | 115 (99.3) | 0.132 |
Time since diagnosis – median (IQR) | 1.50 (0.77, 8.77) | 1.76 (1.25, 2.51) | 0.069 |
Metastases: Brain/CNS site – n (%) | 1 (10.0) | 31 (26.7) | 0.442 |
Race – n (%) | |||
Other | 2 (20.0) | 53 (45.7) | 0.664 |
Unknown | 2 (20.0) | 6 (5.2) | |
White | 6 (60.0) | 57 (49.1) |
The comparison between the pralsetinib and enhanced data-mart (EDM) 1 L pembrolizumab cohorts included 795 patients in total, and the comparison between the pralsetinib and EDM 1 L pembrolizumab with chemotherapy cohorts included 1379 patients in total: 109 in the pralsetinib trial cohort, 686 in the pembrolizumab EDM cohort, and 1270 in the pembrolizumab with chemotherapy EDM cohort. Clinical and demographic characteristics of patients are shown in Table 2. As expected, the proportion of patients with a history of smoking was higher in the RWD cohort than in the pralsetinib cohort for both comparisons (Table 2).
Table 2.
| Characteristic | Level | Pembrolizumab | Pralsetinib | SMD | Pembrolizumab with chemotherapy | Pralsetinib | SMD |
|---|---|---|---|---|---|---|---|
| N | | 686 | 109 | | 1270 | 109 | |
| Age (%) | <65 | 197 (28.7) | 65 (59.6) | 0.655 | 508 (40.0) | 65 (59.6) | 0.4 |
| | >=65 | 489 (71.3) | 44 (40.4) | | 762 (60.0) | 44 (40.4) | |
| Sex (%) | F | 375 (54.7) | 59 (54.1) | 0.011 | 569 (44.8) | 59 (54.1) | 0.187 |
| | M | 311 (45.3) | 50 (45.9) | | 701 (55.2) | 50 (45.9) | |
| Smoking history at baseline (%) | History of smoking | 628 (91.5) | 43 (39.4) | 1.31 | 1144 (90.1) | 43 (39.4) | 1.25 |
| | No history of smoking | 58 (8.5) | 66 (60.6) | | 126 (9.9) | 66 (60.6) | |
| ECOG (%) | 0 | 230 (33.5) | 34 (31.2) | 0.05 | 512 (40.3) | 34 (31.2) | 0.191 |
| | 1 | 456 (66.5) | 75 (68.8) | | 758 (59.7) | 75 (68.8) | |
| Time from initial diagnosis to first dose (months) (median [IQR]) | | 1.41 [0.92, 2.85] | 1.74 [1.25, 2.30] | 0.054 | 1.18 [0.76, 1.84] | 1.74 [1.25, 2.30] | 0.148 |
| Stage at initial diagnosis (%) | STAGE I, II, or III | 192 (28.0) | 17 (15.6) | 0.304 | 204 (16.1) | 17 (15.6) | 0.013 |
| | STAGE IV | 494 (72.0) | 92 (84.4) | | 1066 (83.9) | 92 (84.4) | |
| Race (%) | White | 493 (71.9) | 54 (49.5) | 0.612 | 883 (69.5) | 54 (49.5) | 0.573 |
| | Other | 123 (17.9) | 49 (45.0) | | 248 (19.5) | 49 (45.0) | |
| | Unknown | 70 (10.2) | 6 (5.5) | | 139 (10.9) | 6 (5.5) | |
| Brain/CNS metastasis only (%) | 0 | 597 (87.0) | 79 (72.5) | 0.368 | 1090 (85.8) | 79 (72.5) | 0.333 |
| | 1 | 89 (13.0) | 30 (27.5) | | 180 (14.2) | 30 (27.5) | |
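
The SMDs reported for two-level covariates in Table 2 can be checked by hand. The sketch below assumes the usual pooled-variance definition of the SMD for a binary covariate; the article does not state which formula was used, so the function name and the formula are assumptions made for illustration. With the counts from Table 2, it gives values that round to the reported 0.011 for sex and 1.31 for smoking history in the pembrolizumab comparison.

```python
import math


def smd_binary(p_a: float, p_b: float) -> float:
    """Absolute standardized mean difference for a two-level covariate.

    Assumed formula: |p_a - p_b| divided by the square root of the
    average of the two binomial variances.
    """
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd


# Counts taken from Table 2 (pembrolizumab n = 686, pralsetinib n = 109).
sex_smd = smd_binary(311 / 686, 50 / 109)      # proportion male
smoking_smd = smd_binary(628 / 686, 43 / 109)  # proportion with smoking history

print(round(sex_smd, 3), round(smoking_smd, 2))
```

Covariates with more than two levels, such as race, need a multivariate version of this distance, which is not shown here.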
Comparative effectiveness
CGDB RET fusion-positive comparison
Given the small sample size, only an unadjusted comparison was performed between the ARROW and CGDB cohorts. This comparison between the pralsetinib and CGDB RET fusion-positive 1 L BAT cohorts showed that pralsetinib was associated with higher TTD, OS, and PFS. The hazard ratios (HRs) were TTD 0.71 (95% CI [confidence interval], 0.34–1.48), OS 0.45 (95% CI, 0.16–1.25), and PFS 0.71 (95% CI, 0.32–1.55); however, the confidence intervals all included 1, and these associations were limited by the small size of the BAT cohort (N = 10).
Pembrolizumab monotherapy EDM comparison
Following inverse probability of treatment weighting (IPTW)16, sufficient balance based on a conservative cut-off of SMD <0.1 was achieved for sex, ECOG PS, time from initial diagnosis, and stage at diagnosis for the comparison between the pralsetinib and pembrolizumab cohorts (Table 3). Age, smoking history, and race demonstrated residual imbalance, though all had SMD <0.25, which has also been suggested as a reasonable threshold for balance17. The central nervous system (CNS) metastases variable remained imbalanced (SMD = 0.241), but recording of metastases differs between ARROW and the EDM. The effective sample size (ESS) of the pembrolizumab group was 115.
Table 3.
| Characteristic | Level | Pembrolizumab | Pralsetinib | SMD | Pembrolizumab with chemotherapy | Pralsetinib | SMD | Adjusted |
|---|---|---|---|---|---|---|---|---|
| ESS/n | | 115/683 | 109/109 | | 217/1270 | 109/109 | | |
| Age (%) | <65 | 48.3 | 59.6 | 0.23 | 58.9 | 59.6 | 0.015 | Y |
| | >=65 | 51.7 | 40.4 | | 41.1 | 40.4 | | |
| Sex (%) | F | 50.6 | 54.1 | 0.072 | 54.5 | 54.1 | 0.007 | Y |
| | M | 49.4 | 45.9 | | 45.5 | 45.9 | | |
| Smoking history at baseline (%) | History of smoking | 48.9 | 39.4 | 0.192 | 40.3 | 39.4 | 0.017 | Y |
| | No history of smoking | 51.1 | 60.6 | | 59.7 | 60.6 | | |
| ECOG (%) | 0 | 27.8 | 31.2 | 0.075 | 32.9 | 31.2 | 0.037 | Y |
| | 1 | 72.2 | 68.8 | | 67.1 | 68.8 | | |
| Time from initial diagnosis to first dose (months) (median [IQR]) | | 1.45 [0.92, 2.45] | 1.74 [1.25, 2.30] | 0.078 | 1.32 [0.92, 2.24] | 1.74 [1.25, 2.30] | 0.042 | Y |
| Stage at initial diagnosis (%) | STAGE I, II, or III | 17 | 15.6 | 0.038 | 16.6 | 15.6 | 0.028 | Y |
| | STAGE IV | 83 | 84.4 | | 83.4 | 84.4 | | |
| Race (%) | White | 56.7 | 49.5 | 0.199 | 52.3 | 49.5 | 0.061 | Y |
| | Other | 35.6 | 45 | | 41.9 | 45 | | |
| | Unknown | 7.7 | 5.5 | | 5.8 | 5.5 | | |
| CNS metastases only (%) | 0 | 82.5 | 72.5 | 0.241 | 87.5 | 72.5 | 0.383 | N |
| | 1 | 17.5 | 27.5 | | 12.5 | 27.5 | | |
ESS, effective sample size: the size of an unweighted sample that would provide the same precision as the given weighted sample; n, number of patients remaining in the IPTW-trimmed sample.
For the comparison between the pralsetinib trial cohort and the EDM 1 L pembrolizumab cohort, post-IPTW-adjustment, pralsetinib was associated with significantly higher TTD, OS, and PFS. The adjusted HRs for the comparison were TTD 0.49 (95% CI, 0.33–0.73), OS 0.33 (95% CI, 0.18–0.61), and PFS 0.47 (95% CI, 0.31–0.7).
Pembrolizumab and chemotherapy EDM comparison
For the comparison between the pralsetinib and EDM pembrolizumab with chemotherapy groups, following IPTW-adjustment, balance was achieved for age, smoking history, race, sex, ECOG PS, time from initial diagnosis, and stage at diagnosis based on a threshold of SMD <0.1 (Table 3). Only CNS metastases appeared to have residual imbalance. The ESS of the pembrolizumab and chemotherapy group was 217.
For the comparison between the pralsetinib trial cohort and the EDM 1 L pembrolizumab with chemotherapy cohort, post-IPTW-adjustment, pralsetinib was associated with significantly higher TTD, OS, and PFS. The adjusted HRs for the comparison were TTD 0.5 (95% CI, 0.36–0.7), OS 0.36 (95% CI, 0.21–0.64), and PFS 0.5 (95% CI, 0.36–0.7), as shown in Fig. 1.
Sensitivity analyses corresponding to comparisons with the CGDB RET fusion-positive 1 L BAT cohort were not executed due to sample size considerations. Thus, we present the key results from the comparisons with the EDM cohorts in the following sections.
Sensitivity analysis – Quantitative Bias Analysis (QBA) for missing data assumptions about baseline covariates
In the pembrolizumab cohort, ECOG PS was missing for 294 patients (30%), and in the pembrolizumab with chemotherapy cohort, for 449 patients (26%). Following multiple imputation of ECOG PS scores, pralsetinib was still associated with significantly higher OS in the 1 L comparisons with EDM pembrolizumab and EDM pembrolizumab with chemotherapy; the adjusted HRs were 0.38 (95% CI 0.21–0.67) and 0.37 (95% CI 0.21–0.64), respectively.
Tipping point-based bias analysis assuming non-random missingness for ECOG PS was executed. As no tipping points could be identified for either comparison of pralsetinib with pembrolizumab or pembrolizumab with chemotherapy for OS, this indicated that the adjusted HRs are robust to extreme deviations from random missingness for baseline ECOG PS. The MAR (data missing at random) and MNAR (data missing not at random) analyses showed our results were also robust in general to missingness assumptions for measured baseline covariates under standard multiple imputation compared to the main analyses.
Sensitivity analysis – Impact of metastases
The EDM pembrolizumab cohort had 365 patients (53.2%) without recorded metastases, and the EDM pembrolizumab with chemotherapy cohort also had a large proportion of 582 patients (45.8%) with no record of metastases. IPTW-based analyses including metastases in the propensity score model still yielded significantly better adjusted HRs in favour of pralsetinib for TTD 0.59 (95% CI, 0.38–0.93), OS 0.29 (95% CI, 0.15–0.57), and PFS 0.45 (95% CI, 0.29–0.71) in comparisons with the pembrolizumab cohort, and significantly better TTD 0.42 (95% CI, 0.30–0.60), OS 0.31 (95% CI 0.17–0.54), and PFS 0.38 (95% CI, 0.26–0.54) in the comparisons using the pembrolizumab with chemotherapy cohort.
Sensitivity analysis – QBA of unmeasured confounding
In Fig. 2, we plotted bias curves for 1 L pralsetinib vs EDM 1 L pembrolizumab and 1 L pralsetinib vs EDM 1 L pembrolizumab with chemotherapy comparisons. The black curve at the point estimate of 0.38 (95% CI 0.21–0.67; ARR 0.51) in Fig. 2A plots the range of values for the association of a confounder with survival and treatment assignment that would be needed to nullify our conclusions, i.e., that the resulting unconfounded effect estimate would equal 1 on the risk ratio (RR) scale for the pralsetinib versus pembrolizumab comparison. In Fig. 2B, for the comparison between pralsetinib and pembrolizumab with chemotherapy, the black curve was plotted at the point estimate 0.37 (95% CI 0.21–0.67).
The E-value on the RR scale was 3.31 for the comparison of 1 L pralsetinib with EDM 1 L pembrolizumab, and 3.37 for the comparison with EDM 1 L pembrolizumab and chemotherapy. Amongst measured covariates, the highest association with the outcome OS was observed for age, and the highest association with exposure was observed for smoking history. Therefore, consistent with the bias plots, we expect our results to be robust to plausible unmeasured confounding, since the QBA suggested it would be implausible for systematic differences in unmeasured prognostic variables to be large enough to reverse our findings.
Sensitivity analysis – QBA of hazard ratio robustness for poorer RWD performance
For the comparison between 1 L pralsetinib and EDM 1 L pembrolizumab cohorts, at the transformation threshold, the EDM OS curve is well above that of KEYNOTE-42 (Fig. 3A), which has a median OS of 16.7 months (95% CI 13.9–19.7)18. The median OS of the untransformed EDM cohort was 19.17 months (95% CI 10.22–NA) with an IPTW-adjusted HR of 0.35 (95% CI 0.19–0.64), and at the transformation threshold, the median OS was 32.58 months (95% CI 17.38–NA), with an IPTW-adjusted HR of 0.53 (95% CI 0.29–0.96).
For the comparison between 1 L pralsetinib and EDM 1 L pembrolizumab with chemotherapy cohorts, at the transformation threshold, the EDM OS curve is above that of KEYNOTE-189 (Fig. 3B), which has a median OS of 22.0 months (95% CI 19.5–25.2)19. The median OS of the untransformed EDM cohort was 15.75 months (95% CI 12.46–31.36) with an IPTW-adjusted HR of 0.37 (95% CI 0.21–0.65), and at the transformation threshold, the median OS was 25.20 months (95% CI 19.94–NA), with an IPTW-adjusted HR of 0.56 (95% CI 0.32–0.99).
Discussion
This study directly compares OS, PFS, and TTD outcomes for pralsetinib versus other first-line treatments for aNSCLC used in the real world. The two goals of this study were to investigate the effectiveness of pralsetinib by constructing an SCA for the ARROW study from RWD, and to demonstrate the application of multiple QBA methods to quantify a number of potential sources of bias. This demonstration is motivated by the current landscape: although propensity-score based methods are commonly used for indirect comparisons and can mitigate the effects of selection bias, many such studies do not seek to quantify the effects of other types of bias, making it difficult to assess the robustness of the resulting estimates, as highlighted by regulators and HTA agencies20–27.
Because RET fusions are rare, occurring in 1–2% of NSCLC10, and testing uptake has been limited over time, we expected that the number of RET fusion-positive patients available in RWD sources would be prohibitively small. Thus, this study involved comparisons of RET fusion-positive patients from the ARROW trial with two types of RWD patient groups: 1) the subset of RET fusion-positive patients from the CGDB, and 2) patients of unknown RET fusion status from the EDM, which has many more patients than the CGDB. The assumption, based on currently available evidence, that RET fusion status is not distinctly prognostic allowed for flexibility in using the EDM for cohort development11–14.
The comparisons involving the CGDB RET fusion-positive 1 L BAT and 1 L pralsetinib cohorts showed that pralsetinib was associated with higher TTD, OS, and PFS, though these associations were limited by sample size (N = 10). The comparisons using cohorts drawn from the EDM showed significant associations in favour of pralsetinib, as well as far greater precision of treatment effect estimates. All comparisons between the pralsetinib trial and RWD EDM cohorts showed that pralsetinib was significantly associated with higher TTD, OS, and PFS over pembrolizumab, and pembrolizumab with chemotherapy. The comparison with 1 L pembrolizumab had some residual imbalance in three patient characteristics. Nonetheless, the extent of the imbalance for all of these variables (SMD < 0.25) has been suggested to be reasonable based on a prior study17. In addition, covariates that remained imbalanced after weighting were included in the Cox outcome regression models to account for residual confounding that could not be addressed by IPTW and the subsequent weighted analyses alone. Further, the sensitivity analyses using QBA methods showed that our results from the comparisons with the EDM cohorts for OS were robust against all types of bias tested.
We performed multiple QBA-type sensitivity analyses to alleviate concerns about trial-RWD comparability by quantifying the effects of missing ECOG PS, unmeasured confounding, and reduced survival of patients from RWD relative to that seen in pivotal clinical trials. An advantage of the way we conducted QBA for missing data is that the researcher does not need to know the true mechanism of missingness, since the treatment effect is estimated under multiple different missing data assumptions. Tipping point analyses, which were used when working under the assumption that ECOG PS is MNAR, provide a similar advantage. That is, researchers do not need to make assumptions about how the missingness occurs, but rather only consider whether the tipping point, if one exists, is a plausible scenario.
The QBA of unmeasured confounding also has practical and clear advantages in the context of studies involving RWD, where data limitations are common. Bias plots offer a visual representation of how the method adjusts for a hypothetical unmeasured confounder over a range of confounder-exposure and confounder-outcome associations. This allows for a nuanced assessment of how robust a treatment effect estimate would be against unmeasured confounders.
We also sought to quantify how far real-world outcomes would need to improve before our conclusion that pralsetinib is associated with significantly better OS would no longer hold, given that the performance of a treatment in the real world is often worse than that seen in its corresponding pivotal clinical trial. Without requiring any assumptions as to why the discrepancy occurs, the results of the analysis show how much better the real-world performance needs to be before the observed association is no longer significant. By comparing the survival of this hypothetical group with that of the corresponding clinical trial via Kaplan–Meier curves and median values, researchers can judge whether their conclusions would be robust against meaningful non-concordance between real-world and clinical trial outcomes. To our knowledge, this approach to assessing how robust a treatment effect estimate is against any advantage conferred by poorer performance of comparator treatments in real-world settings relative to trial settings has not been used previously. Such discrepancies are often observed, with multiple cases in the context of immunotherapy treatments in NSCLC alone28.
Concerns that metastases are recorded differently between trial and RWD led to the decision not to use these variables for weight estimation, since their inclusion may introduce bias. These variables were also not used to judge whether the groups being compared were balanced. Nevertheless, assessing the robustness of the main analysis estimates when adjusting for metastases resulted in adjusted HRs that support the conclusions from the main comparisons with the EDM cohorts. Possible sources of bias not addressed in this study include the inconsistent characterization of PFS between the ARROW trial and RWD. While we did exclude ALK and EGFR mutations, an additional limitation to be considered for future work is the lack of reporting across the ARROW trial and EDM for other uncommon/uncharacterised oncogene mutations. The effect of these mutations may additionally be exacerbated by differences in smoking status between cohorts.
Overall, this study provides evidence in favour of pralsetinib over pembrolizumab and pembrolizumab with chemotherapy as an effective 1 L treatment for RET fusion-positive aNSCLC. The study also demonstrates multiple sensitivity analyses performed to quantify the effect of multiple sources of bias. In the context of this study, we show that the results of these bias assessments reinforced our findings and can be used as a template for future trial-RWD comparisons.
Methods
This comparative effectiveness research study adheres to the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) reporting guideline and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies29. Approval for this study was granted by the WIRB-Copernicus Group institutional review board. Informed consent was waived because the data were deidentified, in accordance with 45 CFR §46.
Study populations
The ARROW study (NCT03037385) is a registrational, non-randomized Phase 1 and 2 trial of pralsetinib in patients with advanced non-resectable NSCLC and other tumours. The trial was conducted at multiple study sites across the US, Asia, and Europe. The pralsetinib cohort used for the comparisons included patients with RET fusion-positive aNSCLC in the ARROW trial.
The RWD study cohorts were selected from two Flatiron Health databases. The first was the Flatiron Health-Foundation Medicine (FMI) Clinico-Genomic Database (CGDB)15, a US nationwide, longitudinal database of electronic health records linked to genomic data derived from FMI comprehensive genomic profiling (CGP) tests by deidentified, deterministic matching. An advantage of the CGDB is that it contains test results for RET fusion status. RWD cohorts were also drawn from a second Flatiron Health database, the enhanced data-mart (EDM). The EDM’s strength is its large number of patients, though it does not have genomic testing information on patients’ RET fusion status15,30. Because the prognostic value of RET fusion status appears limited based on the evidence currently available, we assumed that RET fusion status is not prognostic, which allowed the sample size of the RWD cohorts to be maximised by using the EDM11–14.
The flowcharts in Fig. 4 describe the patient selection process for the (A) ARROW trial, (B) Flatiron Health CGDB, and (C) Flatiron Health EDM cohorts. For the CGDB and EDM study cohorts, patients missing a date of death were censored at their last recorded visit in the database or March 1, 2020 (data cut and study cut-off date), whichever was earlier. The main cohort was from the CGDB: RET fusion-positive patients receiving 1 L best-available therapy (BAT) (definition in the Supplementary – 1 L BAT regimens). 1 L treatments for this cohort were pooled as the sample size was small. Two other cohorts, used in the head-to-head comparisons where we assumed RET fusion status is not prognostic, were selected from the EDM: patients receiving 1 L pembrolizumab, and patients receiving 1 L pembrolizumab with chemotherapy; the chemotherapy was carboplatin and pemetrexed.
Patients from the pralsetinib cohort and the RWD study cohorts had unresectable, locally advanced, or metastatic NSCLC diagnosed between January 1, 2011 and September 1, 2019, had non-squamous histology, and had an Eastern Cooperative Oncology Group (ECOG) Performance Status (PS) score of 0 or 1 at time of 1 L treatment initiation. The following criteria were also applied to the RWD cohorts: patients did not have EGFR, ALK, ROS1, or BRAF mutations at the date of initiation of the 1 L regimen (“index date”), were aged 18 years or older, had a gap of <90 days between aNSCLC diagnosis and first visit or medication administration, had an index date >6 months prior to the administrative cut-off date of March 1, 2020, had a 1 L start date between 2017 and 2019 in order to align with the ARROW trial, and could not have received pralsetinib, selpercatinib, or clinical study drugs in any line of treatment. Identical eligibility criteria were used to select patients for all treatment regimens of interest. The eligibility criteria for the CGDB RET fusion-positive cohort were largely similar to those for the EDM cohorts (Fig. 4).
Digitised approximations of the Kaplan–Meier curves corresponding to two final phase-3 KEYNOTE trial arms (KEYNOTE-42 and KEYNOTE-189)21,22 for pembrolizumab monotherapy and pembrolizumab with chemotherapy were used for sensitivity analyses. The purpose was to assess, for each of the two regimens, the impact of any discrepancy in overall survival between the corresponding clinical trial and RWD cohorts.
Statistical analysis – Comparative effectiveness
Inverse probability of treatment weighting (IPTW)16 was used to adjust for differences in patient characteristics between the ARROW trial and RWD cohorts. Because the relative treatment effect for a population of patients with characteristics similar to those of patients from the ARROW trial was of interest, the chosen estimand was the average treatment effect among the treated (ATT). Unadjusted and IPTW-adjusted hazard ratios (HRs) for time to treatment discontinuation (TTD), overall survival (OS) and progression-free survival (PFS) were estimated using Cox proportional-hazards models. Covariates with residual imbalance after IPTW (standardized mean difference [SMD] > 0.1)17 were controlled for by including them as covariates in the Cox model. Missing data were assumed to be missing completely at random (MCAR), and the significance level was set at 5% for all analyses. Unadjusted and IPTW-adjusted Kaplan–Meier (KM) curves were used to estimate median values of TTD, OS and PFS. When IPTW was used, the 95% confidence intervals (CIs) were derived using a robust variance estimator. The proportional hazards assumption was assessed for all models using the Schoenfeld test and examination of KM plots and log-negative-log (LNL) plots, and was judged to hold. The effective sample size (ESS)31 was used to represent sample size post-IPTW.
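As an illustration of the weighting step, the following minimal sketch shows how ATT weights and an effective sample size could be computed from estimated propensity scores. It assumes the standard ATT weighting scheme (weight 1 for trial patients, propensity odds for RWD patients) and Kish's approximation for the ESS; the article does not spell out either formula, and the function names and toy data are hypothetical.

```python
def att_weights(treated, propensity):
    """ATT weights: 1 for trial (treated) patients, e / (1 - e) for RWD patients,
    where e is the estimated probability of being in the trial cohort."""
    return [1.0 if t else e / (1.0 - e) for t, e in zip(treated, propensity)]


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical toy data: 2 trial patients followed by 4 RWD patients.
treated = [1, 1, 0, 0, 0, 0]
propensity = [0.6, 0.5, 0.5, 0.2, 0.2, 0.1]

weights = att_weights(treated, propensity)
rwd_ess = effective_sample_size(weights[2:])  # ESS of the weighted RWD arm
print(round(rwd_ess, 2))
```

Under this scheme the trial arm keeps its original size, while the ESS of the RWD arm shrinks as the weights become more unequal, which is the pattern seen in the ESS/n row of Table 3.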
Statistical analysis – Sensitivity analyses
Quantitative Bias Analysis (QBA) for missing data assumptions about baseline covariates
To assess the sensitivity of our results to missing data assumptions, hazard ratios (HRs) were computed under three scenarios:
1) Baseline confounder data missing completely at random (MCAR); these correspond to the results from the primary analysis
2) Baseline confounder data missing at random (MAR)
3) ECOG PS missing not at random (MNAR)
Multiple imputation by chained equations was used to impute ECOG PS scores under MAR and MNAR. Consistent with the eligibility criteria for this study, patients with imputed ECOG PS > 1 were then excluded, and the comparisons of interest were executed8.
Tipping point bias analysis evaluates the robustness of study results by varying the assumed scenario for missingness or unmeasured confounding until the conclusion changes. Evaluating the sensitivity of treatment effects under these scenarios is relevant because uncollected or missing data are common in real-world studies. It is therefore essential to establish relevant thresholds for common sources of bias in real-world data and to better specify the conditions under which treatment effect conclusions hold. For this study, tipping point-based bias analysis assuming non-random missingness (MNAR) for ECOG PS was also used: the distribution of imputed baseline ECOG PS within the RWD groups was shifted to be poorer than expected under MAR, to assess whether the corresponding adjusted HRs remained significant.
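One way such a shift could be implemented is a delta adjustment on the log-odds scale, sketched below. This is an assumption made for illustration, not the procedure described in the Supplementary: the example is simplified to ECOG PS 0 versus 1, and `upper_ci_at` is a hypothetical callback that would re-impute, re-weight and refit the Cox model for a given shift, returning the upper 95% confidence bound of the adjusted HR.

```python
import math


def shift_probability(p: float, delta: float) -> float:
    """Shift the imputation probability of the poorer ECOG PS category
    by delta on the log-odds scale (delta = 0 corresponds to MAR)."""
    logit = math.log(p / (1 - p)) + delta
    return 1 / (1 + math.exp(-logit))


def find_tipping_point(upper_ci_at, deltas):
    """Return the smallest shift at which the upper 95% CI bound of the
    adjusted HR reaches 1, or None if no shift in the grid does so."""
    for delta in sorted(deltas):
        if upper_ci_at(delta) >= 1.0:
            return delta
    return None


# Hypothetical stand-in for the refitting step: the upper bound never reaches 1.
no_tipping_point = find_tipping_point(lambda delta: 0.7, [0.0, 1.0, 2.0, 3.0])
print(no_tipping_point)
```

A result of `None` over a wide grid corresponds to the situation reported in the Results, where no tipping point could be identified.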
Impact of adjusting for metastases
Metastases were not explicitly adjusted for in the analyses as they are known to be under-recorded in the EDM. To evaluate the effect of metastases on the comparisons between the ARROW trial and EDM RWD cohorts, a sensitivity analysis was performed including a categorical metastases variable in the propensity score model.
QBA of unmeasured confounding
This analysis was used to assess the robustness of the study by estimating the E-value8,9,32. The E-value represents the minimum association of a hypothetical unmeasured confounder with treatment assignment and outcome of interest (OS) on the risk ratio (RR) scale to nullify our estimated HRs. HRs were converted to approximate RRs using a square-root transformation33. Bias plots graph unconfounded treatment effect estimates as fully adjusted risk ratios (ARR) after adjusting for a hypothetical unmeasured binary confounder over a range of confounder-exposure and confounder-outcome associations on the RR scale. Technical details are available in the Supplementary.
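The E-value calculation can be sketched as follows. The sketch assumes that the HR-to-RR conversion is the approximation for common outcomes proposed by VanderWeele and Ding, which involves square roots of the HR; whether this is exactly the transformation used here is an assumption, although applying it to the reported HR of 0.37 gives an approximate RR of 0.51 and an E-value of about 3.37, in line with the reported values.

```python
import math


def hr_to_rr(hr: float) -> float:
    """Approximate risk ratio from a hazard ratio for a common outcome
    (assumed conversion: VanderWeele-Ding approximation)."""
    return (1 - 0.5 ** math.sqrt(hr)) / (1 - 0.5 ** math.sqrt(1 / hr))


def e_value(rr: float) -> float:
    """E-value for a risk ratio; protective estimates (RR < 1) are inverted first."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


approx_rr = hr_to_rr(0.37)  # pralsetinib vs pembrolizumab with chemotherapy
print(round(approx_rr, 2), round(e_value(approx_rr), 2))
```

Applying the same functions to the rounded HR of 0.38 gives an E-value close to, but slightly below, the reported 3.31; a small gap of this kind would be expected if the published HR was rounded before reporting.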
QBA of hazard ratio robustness
In order to quantitatively assess whether the adjusted HR estimates for the comparisons are robust against systematically poorer OS in RWD as compared to pivotal trials, we used a tipping point analysis to assess how far the OS in the RWD arms can be improved using a multiplicative constant before the IPTW-adjusted HR value loses statistical significance—we call this the “transformation threshold”. To maintain a fixed maximum follow-up time, patients were censored if their transformed time to event was greater than the maximum follow-up time in the original data for the reference/untransformed group.
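The transformation and the search for the threshold can be sketched as below. The sketch makes two assumptions that are not stated in the text: patients whose transformed time exceeds the maximum follow-up are censored at that maximum, and the threshold is taken as the largest multiplier in a grid for which the upper 95% confidence bound of the HR is still below 1. The callback `upper_ci_at` is hypothetical and stands in for refitting the IPTW-weighted Cox model on the transformed data.

```python
def transform_rwd_survival(times, events, factor, max_followup):
    """Multiply RWD survival times by `factor`, censoring at `max_followup`
    any patient whose transformed time exceeds it."""
    transformed = []
    for time, event in zip(times, events):
        scaled = time * factor
        if scaled > max_followup:
            transformed.append((max_followup, 0))  # administratively censored
        else:
            transformed.append((scaled, event))
    return transformed


def transformation_threshold(upper_ci_at, factors):
    """Largest factor, scanning upward, whose HR upper 95% CI bound is below 1."""
    threshold = None
    for factor in sorted(factors):
        if upper_ci_at(factor) >= 1.0:
            break
        threshold = factor
    return threshold


# Hypothetical toy data: two deaths at 10 and 20 months, 24 months of follow-up.
print(transform_rwd_survival([10, 20], [1, 1], 1.5, 24))
```

In this toy example the first patient's event moves to 15 months, while the second would move to 30 months and is therefore censored at 24 months.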
All analyses were performed and figures generated in R statistical software version 3.3.6 (R Project for Statistical Computing). Further details on the techniques are provided in the Supplementary Methods.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
This study was funded by F. Hoffmann-La Roche. The funder of the study was involved in the writing of the manuscript and the decision to submit it for publication.
Author contributions
S.P. - Conceptualization, Formal Analysis, Writing - review & editing. S.V.R. - Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Methodology, Project Administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. A.L. - Conceptualization, Data Curation, Formal Analysis, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing. G.H. - Conceptualization, Project Administration, Resources, Supervision, Formal Analysis, Validation, Visualization, Writing - original draft, Writing - review & editing. S.V.L. - Conceptualization, Formal Analysis, Writing - review & editing. N.S. - Conceptualization, Formal Analysis, Writing - review & editing. F.G. - Conceptualization, Formal Analysis, Writing - review & editing. V.S. - Conceptualization, Formal Analysis, Writing - review & editing.
Peer review
Peer review information
Nature Communications thanks Alastair Greystoke and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The Flatiron Health data used in this study were licensed from Flatiron Health (https://flatiron.com/real-world-evidence/). The databases used were the Clinico-Genomic Database (CGDB) and the enhanced data-mart (EDM). These deidentified data may be made available upon request; interested researchers can contact DataAccess@flatiron.com. The clinical data from the ARROW trial were not generated for the purpose of this study. Researchers may request access to individual patient data from the ARROW trial through Roche’s data sharing platforms in accordance with the Global Policy on Sharing of Clinical Study Information: http://www.roche.com/research_and_development/who_we_are_how_we_work/clinical_trials/our_commitment_to_data_sharing.htm. Since at the time of publication the ARROW trial is ongoing and covers multiple indications, the study data will be accessible at https://vivli.org/ when the trial is completed for all indications (expected to be in 2024). In the meantime, requests to access individual patient data from the non-small cell lung cancer arm of the ARROW trial described in the current manuscript can be submitted through https://vivli.org/members/enquiries-about-studies-not-listed-on-the-vivli-platform/. The remaining data are available within the Article and Supplementary Information.
Competing interests
S.P. receives honoraria from Boehringer Ingelheim, AstraZeneca, Roche, Takeda and Chugai Pharma; and has a consulting or advisory role for Boehringer Ingelheim, AstraZeneca, Roche, Takeda, Novartis, Pfizer, Bristol-Myers Squibb, MSD, Guardant Health, AbbVie and EMD Serono. Dr Liu has received research funding (to institution) from Alkermes, AstraZeneca, Bayer, Blueprint Medicines Corporation, Bristol-Myers Squibb, Corvus, Debiopharm, Elevation Oncology, Genentech, Lilly, Lycera, Merck, Merus, Pfizer, Rain Therapeutics, RAPT, and Turning Point Therapeutics; and has served as consultant or advisory board member to Amgen, AstraZeneca, BeiGene, Blueprint Medicines Corporation, BMS, Catalyst, Daiichi Sankyo, G1 Therapeutics, Genentech/Roche, Guardant Health, Inivata, Janssen, Jazz Pharmaceuticals, Lilly, Merck/MSD, PharmaMar, Pfizer, Regeneron, and Takeda. Mr Scheuer reported receiving personal fees from Roche, receiving shares from Roche as an employee during the conduct of the study, and reported being an employee of and receiving shares from Novartis outside the submitted work. G.G.H. and A.L. reported receiving funding from Roche during the conduct of the study. S.V.R. reported receiving personal fees from Roche during the conduct of the study. F.G. has consulted or provided expert opinion for Amgen, AstraZeneca, Bayer, BMS, Boehringer Ingelheim, Celgene, GSK, Lilly, MSD, Novartis, Pfizer, Roche, Siemens, and Takeda; has received fees from Amgen, AstraZeneca, Bayer, Boehringer Ingelheim, BMS, Celgene, GSK, Lilly, MSD, Novartis, Pfizer, Roche, Siemens, and Takeda; and has received funding for scientific research from Amgen, AstraZeneca, Boehringer Ingelheim, BMS, Celgene, GSK, Lilly, MSD, Novartis, Pfizer, Roche, Siemens, and Takeda. V.S. reports research funding/grant support for clinical trials from AbbVie, Agensys, Alfa-sigma, Altum, Amgen, Bayer, Berg Health, Biotherapeutics, Blueprint Medicines Corporation, Boston Biomedical, Boston Pharmaceuticals, Celgene, D3, Dragonfly Therapeutics, Exelixis, Fujifilm, GSK, Idera Pharma, Incyte, Inhibrx, Loxo Oncology, Medimmune, MultiVir, Nanocarrier, National Comprehensive Cancer Network, NCI-CTEP, Novartis, Northwest Biotherapeutics, Pfizer, PharmaMar, Roche/Genentech, Takeda, Turning Point Therapeutics, UT MD Anderson Cancer Center, and Vegenics; travel support from ASCO, ESMO, Helsinn, Incyte, Novartis, and PharmaMar; consultancy/advisory board participation for Helsinn, Incyte, Loxo Oncology/Eli Lilly, Medimmune, Novartis, R-Pharma US, and QED Pharma; and other relationship with Medscape. V.S. is also an Andrew Sabin Family Foundation Fellow at The University of Texas MD Anderson Cancer Center, acknowledges support of The Jacquelyn A. Brady Fund, and is supported by NIH grant R01CA242845. MD Anderson Cancer Center Department of Investigational Cancer Therapeutics is supported by the Cancer Prevention and Research Institute of Texas (RP1100584), the Sheikh Khalifa Bin Zayed Al Nahyan Institute for Personalized Cancer Therapy (1U01 CA180964), NCATS Grant UL1 TR000371 (Center for Clinical and Translational Sciences), and the MD Anderson Cancer Center Support Grant (P30 CA016672).
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Sreeram V. Ramagopalan, Email: sreeram.ramagopalan@roche.com
Vivek Subbiah, Email: vsubbiah@mdanderson.org
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-30908-1.
References
- 1. Addeo A, et al. Immunotherapy in non-small cell lung cancer harbouring driver mutations. Cancer Treat. Rev. 2021;96:102179. doi: 10.1016/j.ctrv.2021.102179.
- 2. Gainor JF, et al. Pralsetinib for RET fusion-positive non-small-cell lung cancer (ARROW): a multi-cohort, open-label, phase 1/2 study. Lancet Oncol. 2021;22:959–969. doi: 10.1016/S1470-2045(21)00247-3.
- 3. Subbiah V, et al. Pralsetinib for patients with advanced or metastatic RET-altered thyroid cancer (ARROW): a multi-cohort, open-label, registrational, phase 1/2 study. Lancet Diabetes Endocrinol. 2021;9:491–501. doi: 10.1016/S2213-8587(21)00120-0.
- 4. IQWiG. IQWiG reports: commission No. A17-19—Alectinib (non-small cell lung cancer). https://www.iqwig.de/download/a17-19_alectinib_extract-of-dossier-assessment_v1-0.pdf?rev=185028 (2017).
- 5. Kent S, et al. The use of nonrandomized evidence to estimate treatment effects in health technology assessment. J. Comp. Eff. Res. 2021;10:1035–1043. doi: 10.2217/cer-2021-0108.
- 6. Mishra-Kalyani PS, et al. External control arms in oncology: current use and future directions. Ann. Oncol. 2022. doi: 10.1016/j.annonc.2021.12.015.
- 7. Phillippo DM, et al. Methods for population-adjusted indirect comparisons in health technology appraisal. Med. Decis. Mak. 2018;38:200–211. doi: 10.1177/0272989X17725740.
- 8. Wilkinson S, et al. Assessment of alectinib vs ceritinib in ALK-positive non–small cell lung cancer in phase 2 trials and in real-world data. JAMA Netw. Open. 2021;4:e2126306. doi: 10.1001/jamanetworkopen.2021.26306.
- 9. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann. Intern. Med. 2017;167:268. doi: 10.7326/M16-2607.
- 10. Takeuchi K, et al. RET, ROS1 and ALK fusions in lung cancer. Nat. Med. 2012;18:378–381. doi: 10.1038/nm.2658.
- 11. Cascetta P, et al. RET inhibitors in non-small-cell lung cancer. Cancers. 2021;13:4415. doi: 10.3390/cancers13174415.
- 12. Cong X-F, Yang L, Chen C, Liu Z. KIF5B-RET fusion gene and its correlation with clinicopathological and prognosis features in lung cancer: a meta-analysis. OncoTargets Ther. 2019;12:4533–4542. doi: 10.2147/OTT.S186361.
- 13. Song Z, Yu X, Zhang Y. Clinicopathologic characteristics, genetic variability and therapeutic options of RET rearrangements patients in lung adenocarcinoma. Lung Cancer. 2016;101:16–21. doi: 10.1016/j.lungcan.2016.09.002.
- 14. Hess LM, Han Y, Zhu YE, Bhandari NR, Sireci A. Characteristics and outcomes of patients with RET-fusion positive non-small lung cancer in real-world practice in the United States. BMC Cancer. 2021;21:28. doi: 10.1186/s12885-020-07714-3.
- 15. Singal G, et al. Development and validation of a real-world clinicogenomic database. J. Clin. Oncol. 2017;35:2514–2514. doi: 10.1200/JCO.2017.35.15_suppl.2514.
- 16. Williamson E, Morley R, Lucas A, Carpenter J. Propensity scores: from naïve enthusiasm to intuitive understanding. Stat. Methods Med. Res. 2012;21:273–293. doi: 10.1177/0962280210394483.
- 17. Austin PC. Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Commun. Stat. Simul. Comput. 2009;38:1228–1234. doi: 10.1080/03610910902859574.
- 18. Mok TSK, et al. Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet. 2019;393:1819–1830. doi: 10.1016/S0140-6736(18)32409-7.
- 19. Gandhi L, et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 2018;378:2078–2092. doi: 10.1056/NEJMoa1801005.
- 20. De Giglio A, Di Federico A, Gelsomino F, Ardizzoni A. Prognostic relevance of pleural invasion for resected NSCLC patients undergoing adjuvant treatments: a propensity score-matched analysis of SEER database. Lung Cancer. 2021;161:18–25. doi: 10.1016/j.lungcan.2021.08.017.
- 21. Gao J, et al. UniPortal thoracoscopic pneumonectomy does not compromise perioperative and long-term survival in patients with NSCLC: a retrospective, multicenter, and propensity score matching study. Lung Cancer. 2021;159:135–144. doi: 10.1016/j.lungcan.2021.07.013.
- 22. Zhang R, et al. Radiotherapy improves the survival of patients with stage IV NSCLC: a propensity score matched analysis of the SEER database. Cancer Med. 2018;7:5015–5026. doi: 10.1002/cam4.1776.
- 23. Mokhles S, et al. Comparison of clinical outcome of stage I non-small cell lung cancer treated surgically or with stereotactic radiotherapy: results from propensity score analysis. Lung Cancer. 2015;87:283–289. doi: 10.1016/j.lungcan.2015.01.005.
- 24. Hishida T, et al. Lobe-specific nodal dissection for clinical stage I and II NSCLC: Japanese multi-institutional retrospective study using a propensity score analysis. J. Thorac. Oncol. 2016;11:1529–1537. doi: 10.1016/j.jtho.2016.05.014.
- 25. Chiang A, et al. A comparison between accelerated hypofractionation and stereotactic ablative radiotherapy (SABR) for early-stage non-small cell lung cancer (NSCLC): results of a propensity score-matched analysis. Radiother. Oncol. 2016;118:478–484. doi: 10.1016/j.radonc.2015.12.026.
- 26. Chen S, et al. Prognostic significance of pre-resection albumin/fibrinogen ratio in patients with non-small cell lung cancer: a propensity score matching analysis. Clin. Chim. Acta. 2018;482:203–208. doi: 10.1016/j.cca.2018.04.012.
- 27. Adachi H, et al. Lobe-specific lymph node dissection as a standard procedure in surgery for non–small cell lung cancer: a propensity score matching study. J. Thorac. Oncol. 2017;12:85–93. doi: 10.1016/j.jtho.2016.08.127.
- 28. Hsu GG, MacKay E, Scheuer N, Ramagopalan SV. Keeping it real: implications of real-world treatment outcomes for first-line immunotherapy in metastatic non-small cell lung cancer. Immunotherapy. 2021. doi: 10.2217/imt-2021-0237.
- 29. Vandenbroucke JP, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4:e297. doi: 10.1371/journal.pmed.0040297.
- 30. Ma X, Long L, Moon S, Adamson BJS, Baxi SS. Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR. 2020. doi: 10.1101/2020.03.16.20037143.
- 31. Kish L. Survey sampling. Wiley; 1995.
- 32. Haneuse S, VanderWeele TJ, Arterburn D. Using the E-value to assess the potential effect of unmeasured confounding in observational studies. JAMA. 2019;321:602. doi: 10.1001/jama.2018.21554.
- 33. VanderWeele TJ. On a square-root transformation of the odds ratio for a common outcome. Epidemiology. 2017;28:e58–e60. doi: 10.1097/EDE.0000000000000733.