This study evaluates the utility of quantitative bias analysis for exploring the sensitivity to unmeasured confounding of nonrandomized analyses using external control arms.
Key Points
Question
How can quantitative bias analysis (QBA) be used to address unmeasured confounding in external control arms (ECA) analyses?
Findings
In this study including 14 randomized clinical trials, QBA to adjust for known unmeasured confounders was feasible and helped mitigate bias in ECA analyses in this setting, which is characterized by high adherence to the assigned interventions, few losses to follow-up, and no competing events.
Meaning
The findings of this study are encouraging for investigators and decision-makers who consider QBA to complement ECA analyses in health technology assessments.
Abstract
Importance
Unmeasured confounding is a key concern for decision-makers when observational datasets are used to assemble external control arms (ECAs) for single-arm trials.
Objective
To investigate the utility of quantitative bias analysis (QBA) for exploring the sensitivity to unmeasured confounding of nonrandomized analyses using ECAs.
Design, Setting, and Participants
This study emulated 15 treatment comparisons using experimental arms from existing randomized trials in advanced non–small cell lung cancer (aNSCLC) that randomized patients in 2011 or later, and ECAs derived from observational data. Participants were eligible individuals diagnosed with aNSCLC between January 1, 2011, and March 1, 2020. After adjustment for measured baseline confounders, a prespecified QBA was conducted to address potential bias by known unmeasured and mismeasured confounders. The QBA relied on a synthesis of external evidence from a targeted literature search, randomized trial data, and clinician input. Hazard ratios from the original randomized trials were compared with those from their emulation based on ECA analyses. Analyses were completed from February 2022 to October 2023.
Exposure
Initiation of systemic therapies for aNSCLC.
Main Outcomes and Measures
Hazard ratios for all-cause death.
Results
Sample sizes varied from 52 to 830 depending on the treatment group. The mean difference in the log hazard ratio estimates when using the original control arm vs the ECA for each trial was 0.247 in unadjusted analyses (ratio of hazard ratios, 1.36), 0.139 when adjusted for measured confounders (ratio of hazard ratios, 1.22), and 0.098 when adding external adjustment for unmeasured and mismeasured confounders (ratio of hazard ratios, 1.17).
Conclusions and Relevance
QBA was feasible and informative in ECA analyses in which residual confounding was expected to be the most important source of bias. These findings encourage further exploration of how QBA can help quantify the impact of bias in other settings and when using other data sources.
Introduction
Randomized trials for cancer and rare diseases can be challenging due to difficulties1,2 in patient recruitment, the choice of an appropriate comparator, and ethical concerns.3,4 Even when randomized trials are performed, their results may be difficult to interpret because of the evolving standard of care or variations in the local standard of care across populations or geographies. In some settings, assessment of comparative effectiveness is based on single-arm trials supplemented with external control arms (ECA) selected from health care databases. Although these so-called ECA analyses are increasingly being submitted to regulatory and health technology assessment agencies, particularly for oncology drugs, they raise several methodological concerns. Prominent among them is the possibility of confounding bias due to lack of randomization. Substantial confounding will exist if important prognostic factors are not adequately measured and harmonized between the trial arm and the ECA.
The risk of confounding bias in ECA analyses is often discussed only qualitatively, which can make it difficult for decision-makers to evaluate the available evidence. Quantitative bias analysis (QBA) can be used to quantify the likely direction and magnitude of confounding.5 Health technology assessment agencies in the United Kingdom, France, and Canada have recommended the use of QBA for mitigating the risk of bias in studies using observational data from routine clinical care (see, eg, the National Institute for Health and Care Excellence real-world evidence framework6). The US Food and Drug Administration is also considering QBA in observational analyses.7,8 However, little information is available on the feasibility and interpretability of QBA for ECA analyses, and on its practical relevance to regulators and health technology assessment agencies.9
Some previous studies10,11 have used QBA to address confounding in ECA analyses, but they have focused mostly on tipping point analyses. For example, Wilkinson et al10 used QBA to show that only an implausibly strong unmeasured confounder would be able to nullify the results from the ECA analysis comparing alectinib and ceritinib in pretreated ALK-positive aNSCLC. To our knowledge, there are no published ECA studies that have attempted external adjustment for unmeasured confounders. This study evaluated challenges in the implementation, reporting, and interpretation of QBA for unmeasured confounding in ECA analyses of treatment effectiveness. We demonstrate the application of QBA in a setting without competing events, with high adherence to the assigned interventions and few losses to follow-up, where residual confounding is expected to be the most important source of bias. Building on previous benchmarking studies12,13 involving ECAs, our demonstration study combined 15 experimental arms from 14 randomized trials with ECAs constructed using observational data from a representative nationwide network of cancer centers in the US. We then compared effect estimates from the original randomized trials with those from ECA analyses before and after QBA. We also describe a method for synthesizing external evidence and workflows for implementing QBA in ECA analyses in this setting.
Methods
The Quantitative Bias Analysis for the Assessment of Bias in Comparisons between Synthetic Control Arms from External Data and Lung Cancer Trials (Q-BASEL) project is a demonstration study of QBA for unmeasured confounding in ECA analyses. Informed consent was waived under an umbrella approval by the Copernicus Group institutional review board because the data used for the analysis were deidentified. We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Randomized Trial Data
We obtained individual-level patient data from 15 experimental arms in 14 randomized trials12,14,15,16,17,18,19,20,21,22,23,24 sponsored by F. Hoffmann-La Roche involving patients with locally advanced or metastatic non–small cell lung cancer (aNSCLC). All trials met the following criteria: patients were randomized in 2011 or later; overall survival was an end point; and patient-level data and clinical study reports were available. Study characteristics, including trial calendar period, sample size, geographical location, treatments in each arm, length of follow-up, and proportion of patients who initiated their assigned treatments, are summarized in Table 1. All trials had active comparator arms.
Table 1. Randomized Trials of Advanced Non–Small Cell Lung Cancer Selected for Comparison With External Control Arm Analysis.
| Trial, y | Recruitment period, yᵃ | Sample size, experimental arm, No. | Sample size, control arm, No. | Experimental arm | Control arm | Maximum follow-up, mo | Patients who initiated their assigned treatment, %ᵃ | Mortality HR (95% CI) |
|---|---|---|---|---|---|---|---|---|
| OAK, 202118 | 2014-2015 | 613 | 612 | Atezolizumab | Docetaxel | 33.4 | 96.81 | 0.80 (0.70-0.91) |
| POPLAR, 202118 | 2013-2014 | 144 | 143 | Atezolizumab | Docetaxel | 19.6 | 96.51 | 0.72 (0.54-0.98) |
| Impower110, 202015 | 2015-2018 | 285 | 287 | Atezolizumab | Carboplatin or cisplatin plus pemetrexed or gemcitabine | 36.6 | 95.80 | 0.85 (0.67-1.09) |
| Impower130, 201922 | 2015-2017 | 483 | 240 | Atezolizumab plus carboplatin plus nab-paclitaxel | Carboplatin plus nab-paclitaxel | 54.2 | 97.37 | 0.83 (0.69-0.99) |
| Impower131, 202017 | 2015-2017 | 681 | 340 | Atezolizumab plus carboplatin plus nab-paclitaxel | Carboplatin plus nab-paclitaxel | 29.7 | 97.94 | 0.98 (0.83-1.16) |
| Impower132, 202120 | 2016-2017 | 292 | 286 | Atezolizumab plus carboplatin or cisplatin plus pemetrexed | Carboplatin or cisplatin plus pemetrexed | 22.5 | 95.84 | 0.80 (0.63-1.01) |
| Impower150, 202121 | 2015-2016 | 400 | 400 | Atezolizumab plus bevacizumab plus carboplatin plus paclitaxel | Bevacizumab plus carboplatin plus paclitaxel | 52.2 | 98.66 | 0.81 (0.69-0.95) |
| ALEX, 202019,24 | 2014-2016 | 152 | 151 | Alectinib | Crizotinib | 65.8 | 100 | 0.70 (0.49-1.01) |
| ALESIA, 201923 | 2016-2017 | 125 | 62 | Alectinib | Crizotinib | 21.5 | 100 | 0.27 (0.11-0.65) |
| J-ALEX, 201716 | 2013-2015 | 103 | 104 | Alectinib | Crizotinib | 70.6 | 100 | 0.52 (0.33-0.82) |
| GO27820, 202012 | 2012-2013 | 55 | 54 | metMAb plus carboplatin or cisplatin plus paclitaxel | Placebo plus carboplatin or cisplatin plus paclitaxel | 34.7 | 100 | 0.94 (0.58-1.53) |
| GO27821 (A), 202012 | 2012-2013 | 69 | 70 | metMAb plus bevacizumab plus carboplatin or cisplatin plus paclitaxel | Placebo plus bevacizumab plus carboplatin or cisplatin plus paclitaxel | 31.9 | 99.61 | 1.31 (0.79-2.19) |
| GO27821 (B), 202012 | 2012-2013 | 59 | 61 | metMAb plus carboplatin or cisplatin plus pemetrexed | Placebo plus carboplatin or cisplatin plus pemetrexed | 32.2 | 99.61 | 1.17 (0.74-1.86) |
| AvaALL, 201814 | 2011-2015 | 245 | 240 | Bevacizumab plus 1 of erlotinib, docetaxel, or pemetrexed | 1 of erlotinib, docetaxel, or pemetrexed | 52.8 | 97.31 | 0.88 (0.72-1.07) |
| NILE, 202012 | 2011-2012 | 52 | 52 | MEGF0444A (parsatuzumab) plus bevacizumab plus carboplatin plus paclitaxel | Placebo plus bevacizumab plus carboplatin plus paclitaxel | 24.4 | 99.03 | 1.31 (0.57-2.99) |
Abbreviation: HR, hazard ratio.
ᵃ Median time from randomization to initiation was 1 day or less.
In the trials, approximately 7.5% of patients were lost to follow-up or dropped out. Because the outcome for the current study was all-cause mortality, there were no competing events. Between 95.8% and 100% of patients initiated their assigned treatment. The median time from randomization to initiation was 1 day, and more than 95% of patients initiated treatment within 7 days of randomization. Therefore, assignment date and initiation date were used interchangeably in the analyses described subsequently. Less than 1% of patients deviated from the protocol during follow-up.
Observational Data
We constructed ECAs using the deidentified Flatiron Health database, which is derived from structured and unstructured electronic health record data from cancer clinics in the US.25,26 The data used in this study included 43 302 patients who received treatment for aNSCLC. We used these data to emulate the control arms of target trials that mimic, as closely as possible, the original randomized trials. The target trial specifications, based on publications, clinical study reports, and ClinicalTrials.gov entries, are outlined in eAppendix 1 in Supplement 1.
The control arms included adults (aged 18 years or older) with a diagnosis of aNSCLC (American Joint Committee on Cancer stage IIIB or higher), with an Eastern Cooperative Oncology Group performance status between 0 and 2, and who initiated systemic therapy for cancer between January 1, 2011, and March 1, 2020. The outcome of interest was all-cause mortality. eAppendix 2 in Supplement 1 includes the definitions of baseline variables used in the analysis. eAppendix 3 in Supplement 1 contains a flowchart showing an overview of steps in patient selection. eTable 1 in Supplement 1 shows patient selection numbers by study. Given the advanced disease stage, we assumed that adherence among patients who initiated treatment in routine practice settings was similar to that of patients in the randomized trials.27,28
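The general (non-trial-specific) eligibility criteria above can be expressed as a simple filter. The following Python sketch assumes a hypothetical record layout (`Patient` and its fields are illustrative, not Flatiron Health field names), reads "stage IIIB or higher" as the stage labels listed in `ADVANCED_STAGES`, treats the date window as inclusive, and assumes that a missing ECOG score does not by itself exclude a patient (consistent with ECAs that include patients with missing confounder values). Trial-specific criteria from eAppendix 1 would be layered on top.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record layout; field names are illustrative only.
@dataclass
class Patient:
    age: int                # age in years at therapy initiation
    stage: str              # AJCC stage, eg "IIIB" or "IV"
    ecog: Optional[int]     # ECOG performance status; None if not recorded
    therapy_start: date     # initiation of systemic therapy


# Assumed reading of "stage IIIB or higher" (AJCC 7th and 8th editions).
ADVANCED_STAGES = {"IIIB", "IIIC", "IV", "IVA", "IVB"}
WINDOW_START, WINDOW_END = date(2011, 1, 1), date(2020, 3, 1)


def eligible_for_eca(p: Patient) -> bool:
    """Apply the general ECA eligibility criteria stated in the text (sketch)."""
    # Assumption: a missing ECOG score is handled later by imputation,
    # so it does not exclude the patient here.
    ecog_ok = p.ecog is None or 0 <= p.ecog <= 2
    return (
        p.age >= 18
        and p.stage in ADVANCED_STAGES
        and ecog_ok
        and WINDOW_START <= p.therapy_start <= WINDOW_END
    )
```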
Prognostic Variables at Baseline
The following measured baseline prognostic factors were adjusted: age; sex; race (self-reported for the observational dataset); performance score; cancer stage at diagnosis; tumor histologic results; history of smoking; laboratory values for hemoglobin; serum creatinine; white blood cell count; liver function markers (eg, alanine aminotransferase); time since diagnosis; and time since January 1, 2011 (details can be found in eAppendix 2 in Supplement 1). For race, the available categories were typically dichotomized into White and Other/non-White for the purposes of confounding adjustment, with the Other category including African American or Black, Asian, Hispanic or Other unspecified categories; a separate Asian category was permitted at the discretion of the analyst on a case-by-case basis. Race was assessed in this study because it has been found to be independently associated with lung cancer incidence and mortality, in part as a result of differences in genetic susceptibility.29
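As a minimal sketch of the race recoding described above, the function below collapses categories into White and Other, with an optional separate Asian category chosen by the analyst. The input labels are hypothetical; actual category labels differ between the trial and observational datasets.

```python
def recode_race(category: str, keep_asian: bool = False) -> str:
    """Collapse a recorded race category for confounding adjustment (sketch).

    Category labels are hypothetical placeholders, not dataset codes.
    """
    if category == "White":
        return "White"
    if keep_asian and category == "Asian":
        return "Asian"
    # Black or African American, Hispanic, other/unspecified, and Asian
    # (when not kept as a separate category) all map to "Other".
    return "Other"
```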
The validity of our ECA estimates relies on the assumption of no unmeasured confounding, that is, the assumption that the physicians and patients chose a particular treatment using only prognostic information that was recorded in the database, and that measured prognostic variables were sufficient to account for differences in outcomes between patients treated in clinical trials and those treated in routine practice settings. If, on the other hand, important prognostic information was not available in the database, adjustment for the measured variables is insufficient and our estimates are biased. Adjustment may be insufficient because some confounders are unmeasured (ie, not included as variables in the database), or because some measured confounders are mismeasured.
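To make the consequence of an unmeasured confounder concrete, a classic closed-form bias formula for a single binary confounder U (often attributed to Bross and to Schlesselman) divides the observed ratio estimate by (1 + p1(RR_U - 1)) / (1 + p0(RR_U - 1)), where p1 and p0 are the prevalences of U in the trial arm and the ECA and RR_U is the effect of U on the outcome. This is not the simulation-based QBA used in this study: it is exact for risk ratios when U does not modify the treatment effect and only approximate for hazard ratios. The numbers in the example are hypothetical.

```python
def bias_factor(p_trial: float, p_eca: float, rr_u: float) -> float:
    """Confounding bias factor for one binary unmeasured confounder U.

    p_trial, p_eca: prevalence of U in the trial arm and in the ECA.
    rr_u: ratio effect of U on the outcome (assumed constant across arms).
    """
    return (1 + p_trial * (rr_u - 1)) / (1 + p_eca * (rr_u - 1))


def externally_adjusted(observed_ratio: float, p_trial: float, p_eca: float,
                        rr_u: float) -> float:
    """Remove the bias factor from an observed ratio estimate."""
    return observed_ratio / bias_factor(p_trial, p_eca, rr_u)


# Hypothetical: U lowers the hazard of death (0.7) and is 10% (relative)
# more prevalent in the trial arm (33%) than in the ECA (30%).
adjusted = externally_adjusted(0.75, p_trial=0.33, p_eca=0.30, rr_u=0.7)
```

Because U favors survival and is more common in the trial arm, the observed HR is too favorable to the experimental therapy, and the adjusted value (about 0.757) moves slightly toward 1.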
In consultation with a lung oncologist (S.P.), we considered 4 types of baseline prognostic factors that could be confounders. The first type of prognostic factor was measured variables with complete data and with reasonably similar variable definitions between trial and observational data: age, sex, race, tumor histologic results, history of smoking, time since diagnosis, and time since January 1, 2011. The second type of factor was measured variables with missing values but with reasonably similar definitions between trial and observational data: performance score, cancer stage at diagnosis, laboratory values for hemoglobin, serum creatinine, white blood cell count, and liver function markers (eg, alanine aminotransferase). The third type of factor was measured variables with missing values that were known to be differently measured between trial and observational data: site-specific metastases and presence of biomarkers of interest (both considered dichotomous variables). We refer to these variables as mismeasured because their derivation in the observational data relied on International Statistical Classification of Diseases and Related Health Problems, Ninth and Tenth Revisions (ICD-9 and ICD-10) codes that were not recorded for all individuals and that did not correspond exactly to the information collected in the trials. A comorbidity index was not considered because performance score and age were assumed to capture most of the confounding due to disease burden.30 The final type of factor was unmeasured variables in the observational data, which were identified from a combination of a literature search based on search terms such as prognostic plus lung plus cancer and discussions with a lung oncologist (S.P.). These variables include presence of central nervous system metastases, presence of EGFR variant, tumor mutational burden, programmed cell death ligand-1 expression, presence of bone metastases, and presence of liver metastases. 
The cost of the drug was not included because it was difficult to quantify in the US health care system.
Statistical Analysis
Each of the 15 ECA treatment comparisons was analyzed separately: we compared overall survival between patients in the experimental arm of the randomized trial and patients in the corresponding observational ECA. We initially assumed that both groups were comparable conditional on the measured variables (types 1 and 2 described previously).
The causal contrast of interest was the intention-to-treat effect in a population with the same adherence to the initiated treatment (close to 100%) and the same distribution of risk factors as in the experimental arm of the trial. We estimated absolute risks using Kaplan-Meier curves and, given the small variability in duration of follow-up and the low proportion of losses to follow-up,31 estimated hazard ratios (HRs) using a Cox model. For each trial, we calculated the difference in log HR estimates between the analysis using the original trial control arm and the analysis using the corresponding ECA. Importantly, the difference in log HRs is used solely to compare HRs across analyses within this study and is not interpreted in terms of its absolute value. Findings were not interpreted on the basis of statistical significance.
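The per-trial comparison metric can be written as d = log HR(original control arm) - log HR(ECA). Judging from Tables 2 and 3, positive values mean that the ECA analysis produced a lower HR than the original trial, and exp(d) is the corresponding ratio of HRs. The sketch below recomputes d for the unadjusted OAK comparison from the rounded HRs in Table 2; it gives about 0.177, slightly different from the 0.170 in Table 3, which was presumably computed from unrounded estimates.

```python
import math


def log_hr_difference(hr_trial_control: float, hr_eca: float) -> float:
    """d = log HR with the original control arm minus log HR with the ECA."""
    return math.log(hr_trial_control) - math.log(hr_eca)


# OAK, unadjusted, using the rounded point estimates from Table 2
d_oak = log_hr_difference(0.80, 0.67)
ratio_of_hrs_oak = math.exp(d_oak)  # equals 0.80 / 0.67
```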
Analyses were adjusted for baseline prognostic factors using nonstabilized inverse probability weights, as described elsewhere.32 Estimated weights were truncated at the 99th percentile to limit the influence of extreme weights. Covariate balance before and after weighting was assessed using standardized mean differences, with values less than 0.1 interpreted as negligible imbalance. We conducted separate analyses under 2 assumptions: (1) missing values in all baseline variables were missing completely at random (complete-case analysis), and (2) missing values in all confounders were missing at random conditional on the observed data (multiple imputation).
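Two of the weighting details above, truncation at the 99th percentile and the standardized mean difference (SMD) balance check, can be sketched in plain Python. The sketch assumes the weights have already been computed (their derivation follows reference 32 and is not reproduced), uses the inclusive percentile definition of `statistics.quantiles` (software packages differ in this choice), and uses the common pooled-SD form of the SMD with weighted means and variances.

```python
import math
import statistics


def truncate_at_percentile(weights, pct=99):
    """Cap weights at their pct-th percentile (inclusive percentile definition)."""
    cut = statistics.quantiles(weights, n=100, method="inclusive")[pct - 1]
    return [min(w, cut) for w in weights]


def _weighted_mean_var(x, w):
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def standardized_mean_difference(x1, w1, x0, w0):
    """Weighted SMD between trial arm (1) and ECA (0) for one covariate."""
    m1, v1 = _weighted_mean_var(x1, w1)
    m0, v0 = _weighted_mean_var(x0, w0)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    return 0.0 if pooled_sd == 0 else (m1 - m0) / pooled_sd
```

An absolute SMD below 0.1 after weighting would be read as negligible imbalance, as in the text.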
A goal of QBA is to provide effect estimates that incorporate not only the confounder information available in the data but also external information. In our application, we estimate the effect when adjusting for both the measured confounders that are recorded in the database and the unmeasured and mismeasured confounders that are not recorded (or not accurately recorded) in the database but that we try to reconstruct under our assumptions. Therefore, we organized our simplified QBA for confounding in 3 steps: (1) quantification of the distribution of the unmeasured confounders using external information (eg, their prevalence); (2) simulation of individual values of the unmeasured and mismeasured confounders; and (3) repetition of the ECA analysis with the simulated values incorporated into the data. See eAppendix 3 in Supplement 1 for details.
In step 1, we estimated the unmeasured confounders’ prevalence and association with overall survival (using data from the 14 trials and previous observational studies of aNSCLC) as well as the difference between their prevalence in the trial’s treatment arm and in published observational studies. When no quantitative external information was available, we assumed that factors associated with longer survival were 10% more prevalent in patients selected into clinical trials (and between 5% and 20% in sensitivity analyses). Results were fairly insensitive to this parameter because known unmeasured confounders had low prevalence. We assumed that the recording of an ICD-10 code corresponded to presence of metastasis and that a random subset of individuals without a documented ICD-9 or ICD-10 code had a metastatic site that had not been captured in the data. We also assumed that a random subset of patients without a recorded genetic test or whose results were deemed uninterpretable by the testing facility had an unrecorded positive test result. We specified these random subsets in such a way that the resulting prevalence of the confounder and its association with outcome and treatment were equal to the ones estimated from randomized trials and observational studies of patients with aNSCLC (see eAppendix 3 in Supplement 1 for details).
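Once step 1 fixes the target prevalence of a mismeasured confounder, the size of the random subset described above follows from simple arithmetic. If a fraction p_rec of patients has a recorded code and the target prevalence is p, each patient without a code must be assigned the confounder with probability q = (p - p_rec) / (1 - p_rec), since p_rec + (1 - p_rec)q = p. The sketch below assumes that "10% more prevalent" means a relative increase (the text does not say whether it is relative or absolute) and, unlike the study, calibrates only the prevalence, not the associations with outcome and treatment.

```python
def trial_prevalence(p_observational: float, relative_excess: float = 0.10) -> float:
    """Assumed trial prevalence of a factor associated with longer survival.

    Assumption: "10% more prevalent" is read as a 10% relative increase.
    """
    return min(1.0, p_observational * (1 + relative_excess))


def fill_in_probability(p_target: float, p_recorded: float) -> float:
    """Probability q that a patient without a recorded code has the confounder.

    Solves p_recorded + (1 - p_recorded) * q = p_target for q.
    """
    if not 0.0 <= p_recorded <= p_target <= 1.0:
        raise ValueError("requires 0 <= p_recorded <= p_target <= 1")
    if p_recorded == 1.0:
        return 0.0  # everyone already has a recorded code
    return (p_target - p_recorded) / (1 - p_recorded)
```

For example, with 20% of patients coded and a target prevalence of 30%, q = 0.125.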
In step 2, we simulated the individual values of the unmeasured and mismeasured confounders using the previously described parameters and added them to the database.33 In step 3, we estimated the effects from the ECA analyses described previously as if the unmeasured and mismeasured confounders had actually been recorded in the database.
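Step 2 can then be sketched for one mismeasured binary confounder: patients with a recorded code keep the value 1, and each remaining patient is assigned 1 with the probability q from step 1. This is a simplified illustration with a fixed seed; the study's simulation (reference 33 and eAppendix 3) also reproduces the confounder's associations with outcome and treatment, which this sketch does not.

```python
import random


def simulate_confounder(recorded, q, seed=2022):
    """Return simulated 0/1 values for a mismeasured binary confounder.

    recorded: iterable of booleans, True if a code or positive test was recorded.
    q: probability that an unrecorded patient truly has the confounder.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [1 if rec else int(rng.random() < q) for rec in recorded]


# 20% recorded, q = 0.125: simulated prevalence should be close to 30%
flags = [True] * 20_000 + [False] * 80_000
u = simulate_confounder(flags, q=0.125)
```

The simulated column would then be added to the analysis dataset and the weighting model refitted (step 3).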
For comparability across analyses, we used multiple imputation for all baseline variables. A particularly important confounder is performance score, which had a large amount of missingness and whose assessment is partly subjective. To assess the sensitivity of results to missingness for this variable, we also conducted the ECA analysis under the assumption that missing values in performance status were missing at random conditional on both the observed data and the unobserved confounders, that is, not at random given the observed data alone (missing values for all other measured confounders were assumed to be missing at random conditional on the observed data only). We used delta-adjusted pattern imputation of performance score for this analysis. Analyses were completed in R version 4.0.0 (R Project for Statistical Computing) from February 2022 to October 2023.
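Delta adjustment shifts the imputation model for an incomplete variable by a fixed offset δ that is applied only to imputed values; δ = 0 recovers the missing-at-random imputation, and a range of δ values traces a sensitivity curve such as the one in Figure 2. The study's imputation model for performance status is not described here, so the sketch below assumes, for illustration only, a binary indicator of poorer performance status imputed from a logistic model, with δ added on the logit scale.

```python
import math


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def delta_adjusted_probability(p_mar: float, delta: float) -> float:
    """Imputation probability after shifting the MAR model by delta (logit scale).

    p_mar: probability of poorer performance status under the MAR model.
    delta: 0 reproduces MAR; delta > 0 makes imputed values poorer.
    """
    return expit(logit(p_mar) + delta)
```

With p_mar = 0.5 and δ = log 3, the imputation probability rises from 0.5 to 0.75; observed values are never altered.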
Results
A summary of baseline characteristics for the 15 ECA analyses is shown in eTable 2 in Supplement 1. Patients in the trials were younger and more likely to be White than patients in the observational data. Adjustment for measured covariates considerably reduced these imbalances. The estimated propensity scores are shown in eFigure 1 in Supplement 1.
HRs estimated from the original randomized trials and from the ECA analyses are shown in Figure 1 and Table 2 (see eTable 4 in Supplement 1 for results that include complete-case analysis). Adjustment for measured confounders reduced the mean difference between the log HR estimates obtained with the original control arm and with the ECA to 0.139 (ratio of HRs, 1.22), an improvement over the unadjusted comparison (0.247; ratio of HRs, 1.36). This difference decreased further after additional adjustment for simulated confounders (0.098; ratio of HRs, 1.17) (Table 3). If the HRs estimated using data from the randomized trials had been 1.0, these differences would correspond to HRs of 0.78, 0.87, and 0.91 from the unadjusted, adjusted, and externally adjusted ECA analyses, respectively.
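The last sentence of the paragraph above is a direct transformation of the mean log HR differences: if the trial HR were 1.0, an ECA HR lying d below it on the log scale would equal exp(-d). The following sketch reproduces the 3 values.

```python
import math

# Mean log HR differences (original control arm minus ECA) reported in the text
MEAN_LOG_HR_DIFF = {"unadjusted": 0.247, "adjusted": 0.139, "externally adjusted": 0.098}

# ECA HR implied when the trial HR is 1.0: log HR_ECA = log(1.0) - d
implied_eca_hr = {name: round(math.exp(-d), 2) for name, d in MEAN_LOG_HR_DIFF.items()}
```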
Figure 1. Hazard Ratio (HR) Estimates and 95% CIs.
Data from analyses using the original control arms (dark blue) of the randomized trials and the ECAs without confounding adjustment (orange), with adjustment for measured confounders only after multiple imputation (light blue), and with adjustment for measured and simulated confounders (gray). ECA indicates external control arm.
Table 2. Results Based on Adjustment Strategy by External Control Arm (ECA) Analysis.
| Target trial | RCT: HR estimate (95% CI) | RCT: control arm/experimental arm sample size, No. | Unadjusted: HR estimate (95% CI) | Unadjusted: ECA/experimental arm sample size, No. | Adjusted for measured variables (multiple imputation): HR estimate (95% CI) | Adjusted for measured variables (multiple imputation): ECA/experimental arm sample size, No. | Adjusted for measured variables (multiple imputation) plus external adjustment: HR estimate (95% CI) | Adjusted for measured variables (multiple imputation) plus external adjustment: ECA/experimental arm sample size, No. |
|---|---|---|---|---|---|---|---|---|
| OAK18 | 0.80 (0.70-0.91) | 612/613 | 0.67 (0.56-0.81) | 225/613 | 0.71 (0.54-0.94) | 225/613 | 0.75 (0.55-1.01) | 224/613 |
| POPLAR18 | 0.72 (0.54-0.98) | 143/144 | 0.60 (0.46-0.78) | 266/144 | 0.58 (0.39-0.87) | 266/144 | 0.60 (0.38-0.94) | 266/144 |
| Impower11015 | 0.85 (0.67-1.09) | 287/285 | 0.71 (0.58-0.86) | 830/285 | 0.81 (0.60-1.10) | 830/285 | 0.88 (0.57-1.36) | 830/285 |
| Impower13022 | 0.83 (0.69-0.99) | 240/483 | 0.64 (0.53-0.76) | 269/483 | 0.69 (0.54-0.89) | 269/483 | 0.76 (0.55-1.04) | 269/483 |
| Impower13117 | 0.98 (0.83-1.16) | 340/681 | 0.78 (0.67-0.90) | 443/681 | 0.92 (0.75-1.13) | 443/681 | 0.98 (0.75-1.12) | 443/681 |
| Impower13220 | 0.80 (0.63-1.01) | 286/292 | 0.70 (0.57-0.86) | 456/292 | 0.76 (0.57-1.01) | 456/292 | 0.88 (0.59-1.31) | 456/292 |
| Impower15021 | 0.81 (0.69-0.95) | 400/400 | 0.59 (0.49-0.72) | 215/400 | 0.61 (0.44-0.84) | 215/400 | 0.64 (0.44-0.94) | 215/400 |
| ALEX19 | 0.70 (0.49-1.01) | 151/152 | 0.47 (0.32-0.68) | 114/152 | 0.63 (0.34-1.17) | 114/152 | 0.64 (0.34-1.21) | 114/152 |
| ALESIA23 | 0.27 (0.11-0.65) | 62/125 | 0.14 (0.06-0.34) | 43/125 | 0.20 (0.06-0.70) | 43/125 | 0.20 (0.06-0.67) | 43/125 |
| J-ALEX16 | 0.52 (0.33-0.82) | 104/103 | 0.52 (0.34-0.78) | 112/103 | 0.55 (0.32-0.93) | 112/103 | 0.58 (0.32-1.05) | 112/103 |
| GO2782012 | 0.94 (0.58-1.53) | 54/55 | 1.18 (0.82-1.70) | 321/55 | 1.14 (0.75-1.74) | 321/55 | 1.18 (0.77-1.81) | 321/55 |
| GO27821 (A)12 | 1.31 (0.79-2.19) | 70/69 | 0.97 (0.65-1.46) | 159/69 | 1.28 (0.77-2.13) | 159/69 | 1.32 (0.78-2.24) | 159/69 |
| GO27821 (B)12 | 1.17 (0.74-1.86) | 61/59 | 1.17 (0.82-1.65) | 433/59 | 1.43 (0.93-2.18) | 433/59 | 1.47 (0.95-2.28) | 433/59 |
| AvaALL14 | 0.88 (0.72-1.07) | 240/245 | 0.95 (0.79-1.15) | 319/245 | 0.87 (0.65-1.18) | 319/245 | 0.82 (0.59-1.13) | 319/245 |
| NILE12 | 1.31 (0.57-2.99) | 52/52 | 0.39 (0.22-0.70) | 229/52 | 0.41 (0.20-0.85) | 229/52 | 0.42 (0.20-0.88) | 229/52 |
Abbreviations: HR, hazard ratio; RCT, randomized clinical trial.
Table 3. Sample Sizes and Difference in Log Hazard Ratio (HR) Point Estimates Between Analyses That Use the Original Control From the Randomized Trials and the External Control Arm (ECA) From the Observational Flatiron Data.
| Trial | Trial experimental arm sample size, No. | ECA sample size, No.ᵇ | Log HR differenceᵃ, unadjusted | Log HR differenceᵃ, adjusted for measured variables | Log HR differenceᵃ, adjusted for measured variables plus external adjustment |
|---|---|---|---|---|---|
| OAK18 | 613 | 225 | 0.170 | 0.108 | 0.061 |
| POPLAR18 | 144 | 266 | 0.191 | 0.219 | 0.195 |
| Impower11015 | 285 | 830 | 0.185 | 0.044 | −0.037 |
| Impower13022 | 483 | 269 | 0.263 | 0.18 | 0.092 |
| Impower13117 | 681 | 443 | 0.234 | 0.066 | −0.004 |
| Impower13220 | 292 | 456 | 0.131 | 0.054 | −0.092 |
| Impower15021 | 400 | 215 | 0.305 | 0.279 | 0.229 |
| ALEX19 | 152 | 114 | 0.401 | 0.108 | 0.093 |
| ALESIA23 | 125 | 43 | 0.614 | 0.293 | 0.303 |
| J-ALEX16 | 103 | 112 | 0.006 | −0.056 | −0.110 |
| GO2782012 | 55 | 321 | −0.229 | −0.196 | −0.227 |
| GO27821 (A)12 | 69 | 159 | 0.301 | 0.026 | −0.004 |
| GO27821 (B)12 | 59 | 433 | 0.004 | −0.200 | −0.229 |
| AvaALL14 | 245 | 319 | −0.076 | 0.007 | 0.074 |
| NILE12 | 52 | 229 | 1.205 | 1.155 | 1.132 |
| Average | 251 | 296 | 0.247 (0.161) | 0.139 (0.080) | 0.098 (−0.006) |
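ᵃ Log HR estimated with the original trial control arm minus log HR estimated with the ECA. In the Average row, the values in parentheses are differences in inverse variance-weighted HRs.
ᵇ All eligible patients, including those with missing values for confounders.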
Survival curves for the 15 ECA analyses are shown in eFigures 2 and 3 in Supplement 1. Examination of the curves showed immediate survival divergence at the start of follow-up only for the NILE trial, which is clinically implausible and highly suggestive of unmeasured confounding by factors not foreseen in the analysis. After excluding this trial, the mean log HR difference was 0.179 in the unadjusted analysis, 0.067 in the analysis adjusted for measured confounders, and 0.025 in the analysis further adjusted for simulated confounders, suggesting that external adjustment improved agreement of HRs from ECA analyses with those from randomized trials regardless of this outlier.
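The averages reported in Table 3 and in the text can be recomputed from the per-trial log HR differences in Table 3, with and without NILE (the last row). In this recomputation, the reported ratios of HRs (1.36, 1.22, and 1.17) match the arithmetic mean of the per-trial exp(d) rather than exp of the mean d (exp(0.247) is about 1.28, not 1.36); that reading is an inference from the numbers, not a definition stated in the text.

```python
import math
from statistics import mean

# Per-trial log HR differences from Table 3 (table order; NILE is last)
LOG_HR_DIFFS = {
    "unadjusted": [0.170, 0.191, 0.185, 0.263, 0.234, 0.131, 0.305, 0.401,
                   0.614, 0.006, -0.229, 0.301, 0.004, -0.076, 1.205],
    "measured": [0.108, 0.219, 0.044, 0.180, 0.066, 0.054, 0.279, 0.108,
                 0.293, -0.056, -0.196, 0.026, -0.200, 0.007, 1.155],
    "measured + external": [0.061, 0.195, -0.037, 0.092, -0.004, -0.092, 0.229,
                            0.093, 0.303, -0.110, -0.227, -0.004, -0.229,
                            0.074, 1.132],
}


def summarize(diffs):
    return {
        "mean_d": mean(diffs),  # Table 3, Average row
        # appears to match the "ratio of HRs" reported in the text
        "mean_ratio_of_hrs": mean(math.exp(d) for d in diffs),
        "mean_d_without_nile": mean(diffs[:-1]),  # NILE (last entry) excluded
    }


summaries = {name: summarize(d) for name, d in LOG_HR_DIFFS.items()}
```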
Analyses with a larger proportion of missing values in measured confounders produced results that were more sensitive to assumptions about the missingness mechanism (Figure 2 and eTable 5 in Supplement 1). The log HR difference was greater if the ECA had greater levels of missing values (eTable 3 in Supplement 1).
Figure 2. Sensitivity of Results to Deviations From Missing at Random (MAR) Assumption for Performance Status in Each External Control Arm (ECA) Analysis.
Hazard ratios (HRs) adjusted for measured confounders for a range of deviations from MAR (δ ≠ 0) in the imputation model for performance status. Colors represent difference in log HR estimated from ECA analysis vs that from the corresponding randomized trial. For reference, a change of Eastern Cooperative Oncology Group performance status of 1.0 approximately corresponds to a δ equal to 3.
Discussion
In this study, we applied a simplified QBA for confounding to ECA analyses that estimated treatment effectiveness in patients with aNSCLC. We found that the effect estimates from ECA analyses adjusted for measured confounders were close to those from corresponding randomized trials and that additional adjustment based on external information further improved agreement with results from randomized trials, though the improvement may be considered small.
Our analysis can be viewed as a best-case scenario for evaluating QBA for several reasons. First, we studied a disease with well-known treatment indications and used a database with extensive information about them. Therefore, our ECA analyses are likely to adjust for most differences between the experimental and control arms, and the QBA is likely to adequately address the bias due to the remaining differences. Second, we studied clinical settings with high adherence to treatment and few losses to follow-up. In these conditions, the intention-to-treat effect targeted in this study is similar to the per-protocol effect, allowing us to focus on baseline confounding. In more complex clinical settings, the analysis would also have to consider postbaseline (time-varying) confounding and selection bias due to loss to follow-up. Third, our outcome of interest was all-cause mortality, so we did not have to handle competing events. Finally, because we had access to data from many randomized trials with relatively similar durations of follow-up, simplified comparisons based on HRs were appropriate.
A previous investigation of the agreement between randomized trials (8 of the 14 trials12 included in our study) and ECA analyses reported, on average, poorer agreement than we observed. Although the differences could be explained by multiple factors, ranging from variation in data quality to differences in implementation techniques, in this demonstration we extended the number of trials, used the target trial framework to closely align the design of the randomized trials and the observational analyses, and outlined the steps to implement a simplified QBA for ECA analysis. Details of our external evidence synthesis method and analytical workflows can be found in eAppendix 3 in Supplement 1. We hope that the workflows presented here will complement existing methods for combining and using information about unmeasured confounders from external sources, such as the published literature, for QBA.
Limitations
This study has a few limitations. Our findings are based on clinical trial data from a single sponsor and a single health care database. Although we included a diverse set of clinical trials in NSCLC that varied in eligibility criteria, sample sizes, geographical locations, and effect sizes, and a high-quality oncology database, we do not claim that the findings will apply to all ECA comparisons or trial settings within aNSCLC. The relevance and utility of our proposed QBA framework and the extent of residual bias will need to be rigorously assessed on a case-by-case basis, in addition to data provenance, rationale for fitness for purpose, and a discussion of other biases. We focused on NSCLC, where single-arm trials are often conducted, particularly for therapies targeting rare genetic variants and for novel immunotherapies, settings characterized by small sample sizes and rapidly evolving therapeutic landscapes. Although focusing on NSCLC provided coherence to the study, we did not assess other disease areas, outcomes other than overall survival, or differential loss to follow-up.
Conclusions
Our study illustrates the feasibility of QBA in ECA analyses when a sufficiently rich database is available. In our simplified setting, QBA can inform decision-makers about the expected impact of confounding in ECA analyses. More research is needed to examine the utility of QBA in settings with poor adherence, high losses to follow-up, competing events, and more uncertainty about treatment indications.
Supplement 1.
eAppendix 1. Target Trial Specifications
eAppendix 2. Variable Descriptions
eAppendix 3. Methods for a Simplified QBA for Confounding Bias
eTable 1. Patient Selection by Study for the External Control Group
eTable 2. Baseline Covariate Balance Tables
eTable 3. Association Between Selected Study Characteristics and the Log Hazard Ratio Difference
eTable 4. Results Based on Adjustment Strategy by ECA Analysis
eTable 5. Results for Delta-Adjusted Pattern Imputation Plotted in Figure 1
eFigure 1. Treatment Propensity Distribution
eFigure 2. Unadjusted Kaplan-Meier Curves
eFigure 3. Adjusted Kaplan-Meier Curves
eReferences.
Data Sharing Statement
References
1. Anderson M, Naci H, Morrison D, Osipenko L, Mossialos E. A review of NICE appraisals of pharmaceuticals 2000-2016 found variation in establishing comparative clinical effectiveness. J Clin Epidemiol. 2019;105:50-59. doi: 10.1016/j.jclinepi.2018.09.003
2. Beaver JA, Howie LJ, Pelosof L, et al. A 25-year experience of US Food and Drug Administration accelerated approval of malignant hematology and oncology drugs and biologics: a review. JAMA Oncol. 2018;4(6):849-856. doi: 10.1001/jamaoncol.2017.5618
3. Mishra-Kalyani PS, Amiri Kordestani L, Rivera DR, et al. External control arms in oncology: current use and future directions. Ann Oncol. 2022;33(4):376-383. doi: 10.1016/j.annonc.2021.12.015
4. Thorlund K, Dron L, Park JJH, Mills EJ. Synthetic and external controls in clinical trials—a primer for researchers. Clin Epidemiol. 2020;12:457-467. doi: 10.2147/CLEP.S242097
5. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. Springer; 2009. doi: 10.1007/978-0-387-87959-8
6. National Institute for Health and Care Excellence. NICE real-world evidence framework. Accessed July 2023. https://www.nice.org.uk/corporate/ecd9/chapter/appendix-2-reporting-on-methods-used-to-minimise-risk-of-bias
7. Gopalakrishnan C, et al. Quantitative bias analysis methodology development: sequential bias adjustment for outcome misclassification. Accessed July 2023. https://www.sentinelinitiative.org/sites/default/files/Methods/Sentinel_Methods_Sequential_bias.pdf
8. Desai RJ, Matheny ME, Johnson K, et al. Broadening the reach of the FDA Sentinel system: a roadmap for integrating electronic health record data in a causal analysis framework. NPJ Digit Med. 2021;4(1):170. doi: 10.1038/s41746-021-00542-0
9. Jaksa A, Louder A, Maksymiuk C, et al. A comparison of seven oncology external control arm case studies: critiques from regulatory and health technology assessment agencies. Value Health. 2022;25(12):1967-1976. doi: 10.1016/j.jval.2022.05.016
10. Wilkinson S, Gupta A, Scheuer N, et al. Assessment of alectinib vs ceritinib in ALK-positive non-small cell lung cancer in phase 2 trials and in real-world data. JAMA Netw Open. 2021;4(10):e2126306. doi: 10.1001/jamanetworkopen.2021.26306
11. Popat S, Liu SV, Scheuer N, et al. Addressing challenges with real-world synthetic control arms to demonstrate the comparative effectiveness of Pralsetinib in non-small cell lung cancer. Nat Commun. 2022;13(1):3500. doi: 10.1038/s41467-022-30908-1
12. Carrigan G, Whipple S, Capra WB, et al. Using electronic health records to derive control arms for early phase single-arm lung cancer trials: proof-of-concept in randomized controlled trials. Clin Pharmacol Ther. 2020;107(2):369-377. doi: 10.1002/cpt.1586
13. Davies J, Martinec M, Delmar P, et al. Comparative effectiveness from a single-arm trial and real-world data: alectinib versus ceritinib. J Comp Eff Res. 2018;7(9):855-865. doi: 10.2217/cer-2018-0032
14. Gridelli C, de Castro Carpeno J, Dingemans AC, et al. Safety and efficacy of bevacizumab plus standard-of-care treatment beyond disease progression in patients with advanced non-small cell lung cancer: the AvaALL randomized clinical trial. JAMA Oncol. 2018;4(12):e183486. doi: 10.1001/jamaoncol.2018.3486
15. Herbst RS, Giaccone G, de Marinis F, et al. Atezolizumab for first-line treatment of PD-L1-selected patients with NSCLC. N Engl J Med. 2020;383(14):1328-1339. doi: 10.1056/NEJMoa1917346
16. Hida T, Nokihara H, Kondo M, et al. Alectinib versus crizotinib in patients with ALK-positive non-small-cell lung cancer (J-ALEX): an open-label, randomised phase 3 trial. Lancet. 2017;390(10089):29-39. doi: 10.1016/S0140-6736(17)30565-2
17. Jotte R, Cappuzzo F, Vynnychenko I, et al. Atezolizumab in combination with carboplatin and nab-paclitaxel in advanced squamous NSCLC (IMpower131): results from a randomized phase III trial. J Thorac Oncol. 2020;15(8):1351-1360. doi: 10.1016/j.jtho.2020.03.028
18. Mazieres J, Rittmeyer A, Gadgeel S, et al. Atezolizumab versus docetaxel in pretreated patients with NSCLC: final results from the randomized phase 2 POPLAR and phase 3 OAK clinical trials. J Thorac Oncol. 2021;16(1):140-150. doi: 10.1016/j.jtho.2020.09.022
19. Mok T, Camidge DR, Gadgeel SM, et al. Updated overall survival and final progression-free survival data for patients with treatment-naive advanced ALK-positive non-small-cell lung cancer in the ALEX study. Ann Oncol. 2020;31(8):1056-1064. doi: 10.1016/j.annonc.2020.04.478
20. Nishio M, Barlesi F, West H, et al. Atezolizumab plus chemotherapy for first-line treatment of nonsquamous NSCLC: results from the randomized phase 3 IMpower132 trial. J Thorac Oncol. 2021;16(4):653-664. doi: 10.1016/j.jtho.2020.11.025
21. Socinski MA, Nishio M, Jotte RM, et al. IMpower150 final overall survival analyses for atezolizumab plus bevacizumab and chemotherapy in first-line metastatic nonsquamous NSCLC. J Thorac Oncol. 2021;16(11):1909-1924. doi: 10.1016/j.jtho.2021.07.009
22. West H, McCleod M, Hussein M, et al. Atezolizumab in combination with carboplatin plus nab-paclitaxel chemotherapy compared with chemotherapy alone as first-line treatment for metastatic non-squamous non-small-cell lung cancer (IMpower130): a multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 2019;20(7):924-937. doi: 10.1016/S1470-2045(19)30167-6
23. Zhou C, Kim SW, Reungwetwattana T, et al. Alectinib versus crizotinib in untreated Asian patients with anaplastic lymphoma kinase-positive non-small-cell lung cancer (ALESIA): a randomised phase 3 study. Lancet Respir Med. 2019;7(5):437-446. doi: 10.1016/S2213-2600(19)30053-0
24. Peters S, Camidge DR, Shaw AT, et al; ALEX Trial Investigators. Alectinib in untreated ALK-positive non-small-cell lung cancer. N Engl J Med. 2017;377(9):829-838. doi: 10.1056/NEJMoa1704795
25. Birnbaum B, Nussbaum N, Seidl-Rathkopf K, et al. Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research. arXiv. Preprint posted online January 13, 2020. doi: 10.48550/arXiv.2001.09765
26. Ma X, Long L, Moon S, Adamson Blythe JS, Baxi SS. Comparison of population characteristics in real-world clinical oncology databases in the US: Flatiron Health, SEER, and NPCR. medRxiv. Preprint posted online June 7, 2023. doi: 10.1101/2020.03.16.20037143
27. Souliotis K, Peppou L, Economou M, et al. Treatment adherence in patients with lung cancer from prospects of patients and physicians. Asian Pac J Cancer Prev. 2021;22(6):1891-1898. doi: 10.31557/APJCP.2021.22.6.1891
28. Hess LM, Louder A, Winfree K, Zhu YE, Oton AB, Nair R. Factors associated with adherence to and treatment duration of erlotinib among patients with non–small cell lung cancer. J Manag Care Spec Pharm. 2017;23(6):643-652. doi: 10.18553/jmcp.2017.16389
29. Schabath MB, Cress WD, Muñoz-Antonia T. Racial and ethnic differences in the epidemiology of lung cancer and the lung cancer genome. Cancer Control. 2016;23(4):338-346. doi: 10.1177/107327481602300405
30. Sehgal K, Gill RR, Widick P, et al. Association of performance status with survival in patients with advanced non-small cell lung cancer treated with pembrolizumab monotherapy. JAMA Netw Open. 2021;4(2):e2037120. doi: 10.1001/jamanetworkopen.2020.37120
31. Stensrud MJ, Hernán MA. Why test for proportional hazards? JAMA. 2020;323(14):1401-1402. doi: 10.1001/jama.2020.1267
32. Hernán MA, Robins JM. Causal Inference: What If. CRC Press; 2023.
33. Huang R, Xu R, Dulai PS. Sensitivity analysis of treatment effect to unmeasured confounding in observational studies with survival and competing risks outcomes. Stat Med. 2020;39(24):3397-3411. doi: 10.1002/sim.8672