JAMA Netw Open. 2024 Sep 30;7(9):e2436535. doi: 10.1001/jamanetworkopen.2024.36535

Calibrating Observational Health Record Data Against a Randomized Trial

David Merola 1, Ulka Campbell 1, David Lenis 1, Sebastian Schneeweiss 2, Shirley Wang 2, Ann Madsen 1, Gillis Carrigan 3, Victoria Chia 3, Osayi E Ovbiosa 4, Simone Pinheiro 4, Nelson Pace 4, Amanda Bruno 5, Mark Stewart 6, Sajan Khosla 7, Yiduo Zhang 7, Mothaffar Rimawi 8, Rachele Hendricks-Sturrup 9, Jenny Huang 10, Aliki Taylor 10, XiaoLong Jiao 11, Lauren Becnel 11, Lynn McRoy 11, Joy Eckert 12, Carla Rodriguez 12, Orsolya Lunacsek 13, Raymond Harvey 14, Joel Greshock 14, Khaled Sarsour 14, Andrew Belli 15, C K Wang 15, Laura Fernandes 15, James Chen 16, Brian San Francisco 16, Chithra Sangli 16, Yana Natanzon 17, K Arnold Chan 18, Neil Dhopeshwarkar 18, Mark Shapiro 19, Asher Wasserman 19, Jameson Quinn 19, Megan Rees 20, Travis Robinson 20, Ben Taylor 1, Jennifer R Rider 1
PMCID: PMC11443351  PMID: 39348118

Key Points

Question

Can findings from a pilot nonrandomized cohort study emulate the KEYNOTE-189 trial findings?

Findings

In this cohort study of 1854 patients, the hazard ratio for mortality (0.95) was incongruous with that of the benchmark trial (0.49). The 12-month survival probabilities in the exposed (0.60) and comparator (0.58) groups were also not aligned with those of the exposed (0.69) and comparator (0.49) groups of the trial.

Meaning

These findings suggest population and health record data source selection are critical to cohort study designs for oncology treatment effectiveness questions.


This cohort study attempts to emulate findings from the KEYNOTE-189 randomized clinical trial using electronic health record data.

Abstract

Importance

The conditions required for health record data sources to accurately assess treatment effectiveness remain unclear. Emulation of randomized clinical trials (RCTs) with health record data and subsequent calibration of the results can help elucidate this.

Objective

To pilot an emulation of the KEYNOTE-189 RCT using a commercially available electronic health record (EHR) data source.

Design, Setting, and Participants

This retrospective cohort study used an EHR database spanning from April 2007 to February 2023. Follow-up began on treatment initiation and proceeded until an outcome event, loss to follow-up, end of data, or end of study period (640 days). The population-based cohort was ascertained from EHRs provided by 52 health systems across the US. Eligibility criteria were defined as closely as possible to the benchmark RCT. Patients with non–small cell lung cancer initiating first-line treatment for metastatic disease were included. Patients with evidence of squamous non–small cell lung cancer, primary nonlung malignant neoplasms, or identified EGFR/ALK variations were excluded. Data were analyzed from June to October 2023.

Exposures

Initiation of first-line pembrolizumab and chemotherapy and chemotherapy alone. Chemotherapy in both groups was defined as a combination of pemetrexed and platinum-based (carboplatin or cisplatin) therapy.

Main Outcomes and Measures

Outcomes of interest were 12-month survival probability and mortality hazard ratio (HR).

Results

A total of 1854 patients (mean [SD] age, 63.7 [9.6] years; 971 [52.4%] men) were eligible, including 589 patients who initiated pembrolizumab and chemotherapy and 1265 patients who initiated chemotherapy only. The cohort included 364 Black patients (19.6%) and 1445 White patients (77.9%). The 12-month survival probabilities were 0.60 (95% CI, 0.54-0.65) in the pembrolizumab group and 0.58 (95% CI, 0.55-0.62) in the chemotherapy-only group, compared with 0.69 (95% CI, 0.64-0.74) in the KEYNOTE-189 pembrolizumab group and 0.49 (95% CI, 0.42-0.56) in the KEYNOTE-189 chemotherapy-only group. The mortality HR was 0.95 (95% CI, 0.78-1.16), compared with 0.49 (95% CI, 0.38-0.64) in the KEYNOTE-189 RCT.

Conclusions and Relevance

In this cohort study piloting an RCT emulation, results were incongruous with the benchmark trial. Differences in patient treatment and data capture between the RCT and EHR populations, confounding by indication, treatment crossover, and accuracy of captured diagnoses may explain these findings. Future feasibility assessments will require data sources to have important oncology-specific measures curated.

Introduction

Evidence generated from routinely collected health care data or extracted electronic health record (EHR) data may support decision-making among clinicians, regulators, payers, and patients. In the oncology setting, EHR data may be particularly useful for studying patients with high unmet need, rare malignant neoplasms, and populations often underrepresented in randomized clinical trials (RCTs), including elderly patients, individuals belonging to racial and ethnic minority groups (eg, Black or African American, Hispanic or Latino, Indigenous and American Indian, Asian, and Native Hawaiian and Other Pacific Islander), and those with comorbid illness.

Despite its advantages, analyzing data from health care practice to study drug effectiveness remains challenging. Such nonrandomized studies are susceptible to many biases, including channeling bias, immortal time bias, and unmeasured confounding.1,2 Furthermore, given that these data are not primarily collected for research purposes, limitations, including missing data, may arise due to health care delivery or data reporting practices.

To mitigate potential biases and enhance validity, many resources aim to guide decision-making about cohort study design and data source selection.3,4,5,6,7 To better understand when such studies can provide valid conclusions about cancer treatment effectiveness, including characteristics of data sources, the Coalition to Advance Real-World Evidence (CARE) created a framework to systematically emulate RCTs in oncology modeled after a prior initiative.8,9,10 Herein, we report pilot study findings from an emulation study of the KEYNOTE-189 trial11 using an EHR database. The purpose of this pilot was to evaluate whether the feasibility assessment process and study design used for this emulation should be applied to the remaining 49 potential data source-RCT emulation pairs under consideration for the CARE initiative or whether the approach should be refined.

Methods

Protocol Registration

This cohort study was reviewed by the WCG institutional review board and classified as exempt from approval and informed consent because the study satisfied the 3 elements of waiver of authorization, including use or disclosure of the personal health information involving no more than minimal risk to the individuals. A study protocol containing operational definitions and their justifications for all variables was registered on ClinicalTrials.gov.12 This study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies.

Data Source

The primary data source was the TriNetX Dataworks deidentified EHR database (spanning April 1, 2007, to February 28, 2023, encompassing the KEYNOTE-189 enrollment period of February 26, 2016, to March 6, 2017), which derives deidentified information from 52 health care organizations across the US, including acute care hospitals, outpatient oncology clinics, and academic medical centers. The data were drawn from a combination of structured health records fields (demographics, including race, date-indexed encounters, diagnoses, procedures, and medications), natural language processing of free-text clinician notes, linked tumor registry data, and mortality data ascertained via EHRs of contributing health systems, obituaries, and the Social Security death registry. Race was self-reported and classified as African American or Black, White, or other (eg, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, unknown). Race was included as a potential confounder to be weighted in the analysis.

Eligibility Criteria

Eligibility criteria were operationalized to reflect those of the KEYNOTE-189 trial. Patients with records indicating non–small cell lung cancer (NSCLC) and without EGFR or ALK variations identified prior to treatment initiation were included. Patients with records indicating central nervous system metastases within 14 days prior to treatment initiation, use of systemic steroids or other immunosuppressive agents within 90 days prior to treatment initiation, treatment for metastatic NSCLC any time prior to treatment initiation, evidence of squamous NSCLC, or other primary malignant neoplasms any time prior to treatment initiation were excluded. Several RCT enrollment criteria, such as underlying health conditions, performance status, life expectancy, and adequate organ function, were unavailable in the EHR source or not relevant in the RCT setting (eg, life expectancy of ≥3 months). The operationalization of protocol elements can be found in eTable 1 in Supplement 1.

Treatment Ascertainment and Contrast of Interest

The main intention-to-treat (ITT) analysis compared 2 treatment strategies. The exposure was initiation with pembrolizumab, pemetrexed, and platinum therapy (cisplatin or carboplatin) within 14 days of one another as first-line metastatic treatment. The comparator was initiation with pemetrexed and platinum therapy (cisplatin or carboplatin) within 14 days of one another as first-line metastatic treatment.

In the KEYNOTE-189 RCT, study drugs were administered on the same day; this emulation study used a 14-day exposure regimen ascertainment window, incident with respect to metastatic disease diagnosis date, to allow for treatment administration variability that often occurs in routine practice (Figure 1). Additionally, treatment initiation after the first metastatic record was required to capture each patient’s first line of therapy.
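
The exposure and comparator definitions above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in R), and the record layout, the function name `classify_regimen`, and the choice to anchor the 14-day window at the first study-drug record on or after the first metastatic record are all assumptions made for illustration.

```python
from datetime import date, timedelta

PLATINUM = {"carboplatin", "cisplatin"}
STUDY_DRUGS = PLATINUM | {"pembrolizumab", "pemetrexed"}
WINDOW = timedelta(days=14)  # exposure regimen ascertainment window


def classify_regimen(records, first_metastasis):
    """Classify one patient from (drug_name, date) records (hypothetical layout).

    Returns "exposed", "comparator", or None when neither regimen is met.
    """
    # Assumption: only study-drug records on or after the first metastatic
    # record count toward first-line treatment.
    study = sorted(
        (d, name) for name, d in records
        if d >= first_metastasis and name in STUDY_DRUGS
    )
    if not study:
        return None
    start = study[0][0]
    # Drugs recorded within 14 days of the first study-drug record.
    in_window = {name for d, name in study if d - start <= WINDOW}
    has_chemo = "pemetrexed" in in_window and bool(in_window & PLATINUM)
    if not has_chemo:
        return None
    return "exposed" if "pembrolizumab" in in_window else "comparator"
```

Under this sketch, a patient whose pembrolizumab record falls outside the window is classified as a comparator, which corresponds to the treatment crossover that the intention-to-treat analysis does not censor.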

Figure 1. Example Patient Timelines for the Primary (Intention-to-Treat) and Secondary (Per-Protocol) Analysis.

A, All patients shown are distinct and were classified on the basis of their treatment initiation within a 14-day exposure regimen ascertainment window that is incident with respect to each patient’s first record of metastasis. Notably, this analysis is agnostic to adherence and patients were followed up from the end of the exposure regimen ascertainment window until occurrence of an outcome or censor event. Additionally, multiple records for each study drug could occur within the ascertainment window, as observed in patients D and H. B, Patients 1 to 4 are cloned and assigned to each treatment strategy on initiation of a study drug. Patient 1 adhered perfectly to the exposure treatment strategy and experienced the outcome event; therefore, all person-time observations were contributed to the exposed group. Patient 1 also contributed a small amount of person-time to the comparator group, as the patient’s treatment history was aligned with the comparator treatment strategy for some follow-up time prior to initiation of D3. Patient 2 in the exposed group was censored on initiation of a cancer treatment that is not a part of their assigned regimen (Dx) and did not contribute person-time to the comparator group, since D3 is not a component of the comparator regimen. Patient 3 in the exposed group was censored at the end of the 21-day interval because that is the point at which the patient’s treatment history was no longer adherent with the exposure strategy. Patient 4 in the comparator group had a treatment history that was compatible with the comparator strategy and experienced an outcome event but was censored in the exposed group due to a lack of adherence with the exposed treatment strategy. Notably, if an event occurred during an interval of follow-up time that was compatible with both treatment strategies, the outcome was attributed to both groups.

While assessing treatment as a time-fixed exposure (ie, treatment initiation) parallels the KEYNOTE-189 trial analytic approach, it ignores adherence and does not censor patients who discontinue or switch. Because adherence may explain effect estimate differences between randomized and nonrandomized study designs, a per-protocol (PP) analysis was conducted accounting for adherence and treatment variability over time.13,14

Drug exposure ascertainment used a combination of standardized codes found in structured fields of the database (eTable 2 in Supplement 1). Codes used to ascertain exposures are shown in eTable 3 in Supplement 1. Medication orders, administrations, and dispensing were indistinguishable in the data source, other than standardized billing codes for treatment administration. Medication exposure definitions for a particular agent used occurrence of any code and code type.

Analysis Time Zero and Follow-Up Period

In the main ITT analysis, follow-up began after the 14-day exposure regimen ascertainment window (ie, time zero), and ended at the first of death from any cause, loss to follow-up (defined as a >90-day period with no laboratory results, treatment records, or vitals recording), last date recorded in data, or end of study period (day 640 of follow-up). To avoid immortal time bias, follow-up for all patients began after the exposure regimen ascertainment window.
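
The follow-up rule above (first of death, loss to follow-up, end of data, or day 640) can be sketched as follows. This is an illustration, not the study's implementation: the function names and arguments are hypothetical, and censoring a patient lost to follow-up at the last recorded activity before the >90-day gap is an assumption, since the text does not state the exact censoring date.

```python
from datetime import date, timedelta

STUDY_DAYS = 640    # end of study period, in days after time zero
MAX_GAP_DAYS = 90   # a longer gap without activity counts as loss to follow-up


def last_activity_before_gap(time_zero, activity_dates, horizon):
    """Return the last activity date before a >90-day gap, or None if no gap."""
    last = time_zero
    for d in sorted(x for x in activity_dates if time_zero < x <= horizon):
        if (d - last).days > MAX_GAP_DAYS:
            return last
        last = d
    # Also check the stretch between the final activity and the horizon.
    if (horizon - last).days > MAX_GAP_DAYS:
        return last
    return None


def end_of_follow_up(time_zero, death_date, activity_dates, data_end):
    """Return (end_date, reason) for one patient; death_date may be None."""
    study_end = time_zero + timedelta(days=STUDY_DAYS)
    # Earliest of death, end of data, and end of study period.
    horizon = min(d for d in (death_date, data_end, study_end) if d is not None)
    gap_start = last_activity_before_gap(time_zero, activity_dates, horizon)
    if gap_start is not None:
        return gap_start, "lost_to_follow_up"
    if death_date is not None and death_date == horizon:
        return horizon, "death"
    if data_end == horizon:
        return horizon, "end_of_data"
    return horizon, "end_of_study"
```

Here `activity_dates` stands in for the dates of laboratory results, treatment records, and vitals recordings. Gaps are only evaluated up to the earliest competing end point, so a death shortly after the last recorded activity is counted as an outcome and not as loss to follow-up.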

Outcome Ascertainment

Overall survival was estimated as the probability of survival beyond 12 months from time zero and as the marginal mortality hazard ratio (HR) comparing the pembrolizumab-containing (exposure) group with the comparator group. Because only month and year of death were available, analyses assumed the first day of the month of death.

Baseline Patient Characteristics

We identified 23 potential confounders, which were subsequently ascertained in the data source. Variables were assessed at times prior to or at time zero, except programmed death ligand 1 (PD-L1) tumor proportion score. Confounding variables, their operationalization, and justification have been described elsewhere,12 and relevant codes are provided in eTable 3 in Supplement 1.

Nonrandomized and Randomized Study Agreement

Agreement between the KEYNOTE-189 trial and this nonrandomized emulation study was assessed using 3 prespecified metrics from a prior trial emulation initiative.9 The first metric, regulatory agreement, is met when the emulation estimate has the same direction and statistical significance as the RCT estimate. The second metric, estimate agreement, is met when the emulation point estimate lies within the bounds of the 95% CI of the estimate from the RCT. The third metric is the standardized difference between the RCT and emulation estimates on the log scale, which evaluates the magnitude and direction of differences between the trial and emulation study.15
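
The first two agreement metrics can be written down directly from their definitions. The following Python sketch is illustrative only: the function names and dictionary layout are hypothetical, and the HR values are those reported in the Results of this article. The standardized difference is left out because its exact formula is not given in the text.

```python
def direction(est):
    """-1 for HR < 1, +1 for HR > 1, 0 for HR == 1."""
    return (est["hr"] > 1) - (est["hr"] < 1)


def significant(est):
    """True when the 95% CI excludes the null value of 1."""
    return est["hi"] < 1 or est["lo"] > 1


def regulatory_agreement(emulation, rct):
    """Same direction and same statistical significance as the RCT estimate."""
    return (direction(emulation) == direction(rct)
            and significant(emulation) == significant(rct))


def estimate_agreement(emulation, rct):
    """Emulation point estimate lies within the RCT 95% CI."""
    return rct["lo"] <= emulation["hr"] <= rct["hi"]


# Mortality HRs (95% CI) as reported in this article.
rct = {"hr": 0.49, "lo": 0.38, "hi": 0.64}
ehr_itt = {"hr": 0.95, "lo": 0.78, "hi": 1.16}
ehr_pp = {"hr": 1.15, "lo": 0.96, "hi": 1.37}

for label, est in (("ITT", ehr_itt), ("PP", ehr_pp)):
    print(label, regulatory_agreement(est, rct), estimate_agreement(est, rct))
```

With these inputs, neither metric is met for the ITT or the PP estimate: the ITT interval includes 1 while the RCT interval does not, the PP point estimate lies on the opposite side of 1, and both point estimates fall outside the RCT interval of 0.38 to 0.64.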

Statistical Analysis

ITT and PP analyses were performed (Figure 1). The main ITT analysis used a weighted marginal structural proportional hazards model and a weighted Kaplan-Meier estimator to estimate the marginal HR of mortality and 12-month survival probability, respectively. Derived weights assumed a time-fixed, stabilized inverse probability of treatment (IPT).16 The variable distribution was visually inspected before and after IPT weighting to assess potential positivity violations. Potential confounders used in the generation of weights included time between NSCLC diagnosis and time zero, time between the first indicator of metastatic disease and time zero, age, sex, PD-L1 tumor proportion score, performance status, marital status, body mass index, creatinine clearance, race, patient’s geographic region, corticosteroid use, smoking status, radiotherapy, venous thromboembolism, frequency of vitals recordings, and Charlson comorbidity index. Potential confounding introduced by very rare observations (<1% of patients) was limited; therefore, lung surgery, neoadjuvant treatment, and dementia were not included in IPT weighting to improve efficiency.17 Methods used to conduct the PP analysis are described elsewhere.12
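
As a rough illustration of the weighting step, the sketch below computes time-fixed stabilized IPT weights from already-estimated propensity scores and uses them in a weighted Kaplan-Meier estimate. It is a simplified sketch in plain Python, not the study's R code: the function names are hypothetical, the propensity scores are assumed to come from a separately fitted model (for example, logistic regression on the listed confounders), and the marginal structural proportional hazards model and multiple imputation are not shown.

```python
def stabilized_weights(treated, propensity):
    """Stabilized IPT weights.

    treated: list of 0/1 treatment indicators.
    propensity: estimated P(treated = 1 | confounders) for each patient.
    """
    p_treat = sum(treated) / len(treated)  # marginal probability of treatment
    return [
        p_treat / ps if a == 1 else (1 - p_treat) / (1 - ps)
        for a, ps in zip(treated, propensity)
    ]


def weighted_km(times, events, weights, horizon):
    """Weighted Kaplan-Meier survival probability at `horizon`.

    times: follow-up time per patient (eg, days from time zero).
    events: 1 if the patient died at `times`, 0 if censored.
    """
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        # Weighted number still at risk just before t, and weighted deaths at t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        deaths = sum(w for ti, e, w in zip(times, events, weights) if e and ti == t)
        surv *= 1 - deaths / at_risk
    return surv
```

Applied within each treatment group with `horizon=365`, `weighted_km` would give a 12-month survival probability of the kind reported here. Because the weights are stabilized, their sum within each group stays close to the observed group size, which is why weighted sample sizes can differ slightly from the raw counts.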

Subgroup analyses were conducted in the ITT and PP cohorts. The mortality HR was evaluated within strata defined by age, sex, performance status, smoking status, PD-L1 tumor proportion score, and de novo metastatic disease status. IPT weights were not re-estimated in subgroups to maintain representativeness of the main analytic population. Analyses were conducted using R version 4.2.1 (R Project for Statistical Computing). Data were analyzed from June to October 2023.

Missing Data

Seven potential confounders with missingness remained in the final analytic dataset: age (8 patients [0.4%]), race (60 patients [3.2%]), marital status (1019 patients [55.0%]), body mass index (538 patients [29.0%]), performance status (1722 patients [92.9%]), PD-L1 tumor proportion score (1622 patients [87.5%]), and creatinine clearance (319 patients [17.2%]) (eTable 4 in Supplement 1). Missing values were believed to be attributable to varied reporting by health systems contributing to the database, given an observed dependence of missingness on geographic region (a proxy for the individual health systems). Therefore, missingness for all of these variables was assumed to follow a missing at random mechanism and was accounted for with multiple imputation by chained equations (eMethods 1 in Supplement 2).18

Sensitivity, Exploratory, and Post Hoc Analyses

To assess the robustness of the main ITT analysis to various assumptions, additional analyses were conducted. Sensitivity and exploratory analyses were prespecified in the study protocol, whereas post-hoc analyses were not specified a priori.

Three sensitivity analyses were conducted. First, a contemporaneous cohort restricted cohort entry to January 1, 2018, or later (eMethods 2 in Supplement 2). Second, treatment definition used only standardized billing codes to indicate treatment administration (eMethods 3 in Supplement 2). Third, the interval between the initial record indicating metastasis and treatment initiation was restricted to 6 months or less (eMethods 4 in Supplement 2). Two exploratory analyses were conducted. Rate of radiotherapy encounters and rate of vitals recordings were assessed over the follow-up period by treatment group (eMethods 5 in Supplement 2). Additionally, a density plot of censoring events was visually inspected to assess the potential for informative censoring (eMethods 6 in Supplement 2). Several post hoc analyses were conducted, including expansion of the 14-day exposure regimen ascertainment window to 45 days (eMethods 7 in Supplement 2), assessment of treatment group crossover (eMethods 8 in Supplement 2), assessment of de novo metastatic disease distribution (eMethods 9 in Supplement 2), and assessment of follow-up time distribution (eMethods 10 in Supplement 2).

Results

Study Population

The final analytic cohort included 1854 patients (mean [SD] age, 63.7 [9.6] years; 971 [52.4%] men), with 589 receiving pembrolizumab and chemotherapy and 1265 receiving chemotherapy only (Figure 2). There were 364 African American or Black patients (19.6%) and 1445 White patients (77.9%). During a median (IQR) follow-up of 380.50 (169.25-640.00) days, there were 959 mortality events. IPT-weighted study population baseline characteristics are shown alongside the RCT’s in Table 1. Region, sex, smoking status, and performance status differed between the EHR data and RCT populations. Race, reported for the emulation population in Table 1, was considered a potential confounder in the emulation.

Figure 2. Attrition Flowchart Describing Study Eligibility of Patients in the Data Source.

Table 1. Patient Characteristics in the KEYNOTE-189 RCT Population and Emulation Inverse Probability Weighted, Post-Imputation EHR-Based Study Population.

Values are patients, No. (%).

Characteristic | RCT: Pembrolizumab + chemotherapy (n = 410) | RCT: Placebo + chemotherapy (n = 206) | EHR(a): Pembrolizumab + chemotherapy (n = 571)(b) | EHR(a): Chemotherapy only (n = 1274)(b)
Age <65 y | 197 (48.0) | 115 (55.8) | 266 (46.7) | 672 (52.7)
Sex: Female | 156 (38.0) | 97 (47.1) | 296 (51.2) | 618 (48.5)
Sex: Male | 254 (62.0) | 109 (52.9) | 275 (48.3) | 656 (51.5)
Region: Europe | 243 (59.3) | 131 (63.6) | 0 | 0
Region: North America | 111 (27.1) | 46 (22.3) | 571 (100) | 1274 (100)
Region: East Asia | 4 (1.0) | 6 (2.9) | 0 | 0
Region: Other region | 52 (12.7) | 23 (11.2) | 0 | 0
Performance status: 0 | 186 (45.4) | 80 (38.8) | 170 (29.8) | 342 (26.8)
Performance status: 1 | 221 (53.9) | 125 (60.7) | 207 (36.3) | 466 (36.6)
Performance status: 2 | 1 (0.2) | 0 (0.0) | 112 (19.5) | 270 (21.2)
Performance status: 3 | 0 (0.0) | 0 (0.0) | 82 (14.4) | 196 (15.4)
Smoking status: Current or former | 362 (88.3) | 181 (87.9) | 177 (31.0) | 433 (34.0)
Smoking status: Never or unknown | 48 (11.7) | 25 (12.1) | 394 (69.0) | 841 (66.0)
Histologic features: Adenocarcinoma | 394 (96.1) | 198 (96.1) | 511 (89.5) | 1140 (89.5)
Histologic features: Other | 16 (3.9) | 8 (3.9) | 60 (10.5) | 134 (10.5)
Brain metastases | 73 (17.8) | 35 (17.0) | 15 (2.6) | 59 (4.6)
PD-L1 tumor proportion score, %: <1 | 127 (31.0) | 63 (30.6) | 187 (32.7) | 409 (32.1)
PD-L1 tumor proportion score, %: ≥1 | 260 (63.4) | 128 (62.1) | 383 (67.1) | 865 (67.9)
PD-L1 tumor proportion score, %: 1-49 | 128 (31.2) | 58 (28.2) | 239 (41.9) | 533 (41.8)
PD-L1 tumor proportion score, %: ≥50 | 132 (32.2) | 70 (34.0) | 144 (25.3) | 332 (26.1)
PD-L1 tumor proportion score, %: Unavailable or missing | 23 (5.6) | 15 (7.3) | NA | NA
Previous therapy for nonmetastatic disease: Thoracic radiotherapy | 28 (6.8) | 20 (9.7) | 163 (28.6) | 380 (29.8)
Previous therapy for nonmetastatic disease: Neoadjuvant or adjuvant therapy | 30 (7.3) | 20 (9.7) | 2 (0.3) | 8 (0.7)
Previous therapy for nonmetastatic disease: None | 352 (85.9) | 166 (80.6) | 406 (71.1) | 886 (69.5)
Race(c): African American or Black | NR | NR | 115 (20.2) | 249 (19.5)
Race(c): White | NR | NR | 447 (78.4) | 998 (78.3)
Race(c): Other | NR | NR | 8 (1.4) | 27 (2.2)

Abbreviations: EHR, electronic health record; NA, not applicable; NR, not reported; PD-L1, programmed death ligand 1; RCT, randomized clinical trial.

(a) Age, race, marital status, body mass index, performance status, PD-L1 tumor proportion score, and creatinine clearance were imputed. Results from 1 imputed dataset are shown; results were similar across the other 9 imputed datasets.

(b) Sample sizes shown are the sum of the inverse probability of treatment weights.

(c) Race was not reported in KEYNOTE-189 but was considered as a potential confounder for the emulation study and is reported for transparency. In this study, other race included American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, and unknown.

Comparison of Overall Survival in Emulation Study and KEYNOTE-189 Trial

The mortality HR was 0.95 (95% CI, 0.78-1.16) in the ITT analysis and 1.15 (95% CI, 0.96-1.37) in the PP analysis, compared with 0.49 (95% CI, 0.38-0.64) in the RCT (Table 2). Neither regulatory agreement nor estimate agreement was reached in the ITT or PP analyses. The standardized difference between the ITT HR and the RCT HR was 5.50 (a value of <0.1 is conventionally considered to be a threshold for similarity).15 The standardized difference for the PP HR was 7.41. Results were similar across subgroups, except for the de novo metastatic disease subgroup in the ITT cohort (HR, 0.52; 95% CI, 0.28-0.96; weighted n = 468), which closely aligned with the main results of the RCT (eFigure 1 and eFigure 2 in Supplement 2).

Table 2. Estimates of Overall Survival in the RCT and EHR.

Estimate | Mortality HR (95% CI) | 12-mo survival probability (95% CI): Pembrolizumab combination | 12-mo survival probability (95% CI): Chemotherapy only
KEYNOTE-189 (ITT) | 0.49 (0.38-0.64) | 0.69 (0.64-0.74) | 0.49 (0.42-0.56)
EHR (ITT) | 0.95 (0.78-1.16) | 0.60 (0.54-0.65) | 0.58 (0.55-0.62)
EHR (PP) | 1.15 (0.96-1.37) | 0.72 (0.42-1.00) | 0.58 (0.18-1.00)

Abbreviations: EHR, electronic health record; HR, hazard ratio; ITT, intention to treat; PP, per protocol.

Sensitivity and Exploratory Analyses

ITT analysis results were largely robust to sensitivity analyses restricting to a contemporaneous cohort (eTable 5 in Supplement 1), ascertaining exposure with the use of procedure codes only (eTable 6 in Supplement 1), and restricting time between initial metastasis to treatment initiation to 6 months or less (eTable 7 in Supplement 1). Exploratory analyses did not indicate unmeasured confounding (eTable 8 in Supplement 1) or informative censoring (eFigure 3 in Supplement 2). Results did not depend on the length of the exposure regimen ascertainment window (eTable 9 in Supplement 1) in this cohort. A post hoc analysis revealed variable follow-up time in the ITT cohort (eFigure 4 in Supplement 2) and short follow-up in the PP cohort (eFigure 5 in Supplement 2), in which nonadherence within 42 days resulted in censoring. Additionally, approximately 11.1% of patients in the ITT comparator group received pembrolizumab during the follow-up period (eTable 10 in Supplement 1). Lastly, there was a differential distribution of patients with de novo metastatic disease in the ITT cohorts (eTable 11 in Supplement 1).

Discussion

This cohort study designed as a nonrandomized pilot emulation of the KEYNOTE-189 trial and using the TriNetX Dataworks deidentified EHR database did not align with the RCT results per the examined metrics. While results were robust to an array of a priori specified sensitivity and exploratory analyses, this pilot did not incorporate some factors associated with survival that were missing from the data source. Additionally, this study evaluated US patients, whereas KEYNOTE-189 was global. The findings of this KEYNOTE-189 trial pilot were similar to those of a 2022 study19 that addressed a similar research question using several health record data sources.

Because the ITT results were robust to the sensitivity analyses, post hoc analyses were conducted to understand the discordance. We found that the PP cohort experienced many censoring events within 42 days due to treatment nonadherence. Based on findings from the KEYNOTE-189 RCT, 42 days was not long enough to observe a treatment effect on mortality. In the ITT cohort, approximately 11.1% of patients in the comparator group received pembrolizumab later in follow-up (ie, experienced treatment crossover), potentially biasing toward the null. PP analyses account for adherence and treatment discontinuation. Future CARE emulations will prioritize PP analyses and consider expanding the exposure definition, eg, by allowing a gap of 1 treatment cycle or expanding the window for permitting medications. Such changes may alter the research question.

The ITT HR estimated in a subgroup of patients with de novo metastatic disease aligned with the RCT, perhaps because tumor registry data predominantly informed initial metastasis date in this subgroup. In the main analysis, metastasis was predominantly identified with diagnosis codes. If diagnosis codes were reported after initial diagnosis of metastasis, our study population may have included patients in later lines of treatment, as suggested by differences in performance status and radiotherapy compared with the RCT population. This may explain the null main finding if pembrolizumab plus chemotherapy has less benefit in later lines of therapy.20

This study had several strengths, including a larger sample size than the KEYNOTE-189 trial. The EHR data source included a linked tumor registry and multiple sources of mortality data for outcome assessment, a strength relative to other EHR data sources. Both treatment arms required that combination therapies be initiated within 14 days of one another, and extending this window to 45 days to account for potential delays due to insurance coverage determination did not influence the findings. The use of IPT weighting produced exposure groups that were comparable on measured confounders and characteristics. A high level of transparency and reproducibility was maintained through oversight by a multistakeholder steering committee and public release of the protocol prior to study execution, ensuring that study results could not inform any aspect of the study design. Additionally, by using ITT and PP analyses and a variety of sensitivity and post hoc analyses, greater context was provided to facilitate interpretation of the results.

Limitations

This study has some limitations. Our pilot emulation did not produce results aligned with the RCT findings for several reasons, including inability to operationalize important study eligibility criteria, such as performance status; reliance on International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) codes to determine date of metastatic disease; uncontrolled confounding by indication and other unmeasured or inadequately measured confounders, such as PD-L1; and the different calendar time periods of study for the exposed and comparator groups. Other limitations include misclassification of exposure and line of therapy. Furthermore, the EHR data source included US patients, whereas the KEYNOTE-189 trial was global. While sensitivity analyses supported the assumption that adjustment for observed characteristics rendered treatment groups comparable, such analyses cannot reveal bias resulting from unmeasured confounding. Additionally, in accordance with the prespecified study protocol, de novo metastatic disease was not considered in the propensity score model used to determine IPT weighting. However, given the worse prognosis for de novo metastatic disease diagnosis than initial metastasis diagnosis, this should bias away from the null, contrary to what was observed.21 Reliance on ICD-10 codes to determine date of metastatic diagnosis and prior RCT participation may mean that the cohorts included patients who would have been ineligible to participate in clinical trials. In addition, a moderate to high degree of uncertainty about the distribution of strong effect modifiers, such as smoking status and PD-L1 tumor proportion score, makes comparison of the EHR and RCT populations challenging. For example, patients with no smoking history have been reported to have a diminished response to pembrolizumab.22 The inability to distinguish between nonsmoker and unknown smoking status in EHR records leaves this uncertain. Other differences between the EHR and RCT populations in variables with no missing data (eg, geographic region) further complicate their comparability. This study included patients who initiated comparator therapy after the approved indication for pembrolizumab. The underlying, unmeasured reasons that indicated patients would or would not receive a newly approved beneficial therapy may be related to prognosis in unknown ways.

While a statistically principled approach to address missing data was applied, measurement error could have resulted from the extent of missingness and from incorrect assumptions about the randomness of missing values. This could have affected the success of confounding control. Additionally, the data source is not validated against the National Death Index; furthermore, owing to privacy considerations, we only had data on the month of death, so all death dates were assumed to be the first of the month. These factors may have biased findings in either direction.

Conclusions

In this pilot cohort study, we identified potential reasons for the incongruous results that will inform future CARE initiative studies. These included underlying differences in accuracy and completeness of clinical data between the RCT and emulation populations, as well as differences in crossover rates within the emulation study. As a result, we will refine the data feasibility assessment process to more critically evaluate the ability to operationalize key design elements and focus on data sources that include reliable oncology-specific measures, including curated lines of therapy, metastatic diagnosis date, and performance status. This will help to simplify potential explanations for any observed differences in future emulations.

Supplement 1.

eTable 1. Comparison of Protocol Element Definitions in the Real-World Evidence Study and the Randomized Controlled Trial

eTable 2. Key Variables Used in the TriNetX Dataworks Database for KEYNOTE-189 Emulation

eTable 3. Standardized Codes Used to Define Key Variables

eTable 4. Missing Data Distribution Stratified by Exposure Group in the Intention-To-Treat Cohort

eTable 5. Results From the Contemporaneous Cohort Sensitivity Analysis in the Intention-to-Treat Cohort

eTable 6. Results From the Exposure Definition Sensitivity Analysis in the Intention-to-Treat Cohort

eTable 7. Results From the Treatment Start Time Restriction Sensitivity Analysis in the Intention-to-Treat Cohort

eTable 8. Results From the Differential Treatment Intensity Exploratory Analysis in the Intention-to-Treat Cohort

eTable 9. Results From the Exposure Window Expansion Post-Hoc Analysis in the Intention-to-Treat Cohort

eTable 10. Results From the Comparator Percent Crossover Post-Hoc Analysis in the Intention-to-Treat Cohort

eTable 11. Results From the De Novo Metastatic Disease Post-Hoc Analysis in the Intention-to-Treat Cohort

Supplement 2.

eMethods 1. Missing Data: Mechanisms, Assumptions, and Methodological Approach

eMethods 2. Contemporaneous Cohort Sensitivity Analysis

eMethods 3. Exposure Definition Sensitivity Analysis

eMethods 4. Treatment Start Time Restriction Sensitivity Analysis

eMethods 5. Differential Treatment Intensity Exploratory Analysis

eMethods 6. Censoring Event Distribution Exploratory Analysis

eMethods 7. Exposure Regimen Ascertainment Window Expansion Post Hoc Analysis

eMethods 8. Comparator Percent Crossover Post Hoc Analysis

eMethods 9. De Novo Metastatic Disease Post Hoc Analysis

eMethods 10. Follow-up Time Distribution Post Hoc Analysis

eFigure 1. Subgroup Analysis of the ITT Cohort

eFigure 2. Subgroup Analysis of the PP Cohort

eFigure 3. Censoring Times Histogram and Density Plot in Study Cohort

eFigure 4. Density of Follow-Up Time in the Intention-to-Treat Analyses

eFigure 5. Density of Follow-Up Time in the Per-Protocol Analyses

eReferences.

Supplement 3.

Data Sharing Statement



Articles from JAMA Network Open are provided here courtesy of American Medical Association
