Abstract
Objective:
To present an approach for using electronic health record (EHR) data to assess how different eligibility criteria, either individually or in combination, affect patient count and safety (exemplified by all-cause hospitalization risk), and thereby to assist with criteria selection for prospective clinical trials.
Materials and Methods:
Trials in three disease domains – relapsed/refractory (r/r) lymphoma/leukemia; hepatitis C virus (HCV); stages 3 and 4 chronic kidney disease (CKD) – were analyzed as case studies for this approach. For each disease domain, criteria were identified and all criteria combinations were used to create EHR cohorts. Per combination, two values were derived: (1) number of eligible patients meeting the selected criteria; (2) hospitalization risk, measured as the hazard ratio between those that qualified and those that did not. From these values, k-means clustering was applied to derive which criteria combinations maximized patient counts but minimized hospitalization risk.
Results:
Criteria combinations that reduced hospitalization risk without substantial reductions in patient counts were as follows: for r/r lymphoma/leukemia (23 trials; 9 criteria; 623 patients), applying no infection and adequate absolute neutrophil count while forgoing no prior malignancy; for HCV (15; 7; 751), applying no human immunodeficiency virus and no hepatocellular carcinoma while forgoing no decompensated liver disease/cirrhosis; for CKD (10; 9; 23893), applying no congestive heart failure.
Conclusions:
Within each disease domain, the more drastic effects were generally driven by a few criteria. Similar criteria across different disease domains introduce different changes. Although results are contingent on the trial sample and the EHR data used, this approach demonstrates how EHR data can inform the impact on safety and available patients when exploring different criteria combinations for designing clinical trials.
Keywords: Clinical trials (as topic), electronic health records, clinical research informatics, patient recruitment, outcome assessment
1. INTRODUCTION
Clinical trials are the long-time gold standard for generating robust medical evidence. A crucial consideration when pursuing a clinical trial is defining a representative, clinically meaningful, and safe population to study, usually defined through eligibility criteria. Selection of criteria is important for defining a more homogeneous population to better understand an intervention’s effects, and to safeguard against undue harm [1,2]. However, suboptimal criteria selection can lead to low accrual, resulting in trial incompletion [3]. Furthermore, overly strict criteria may reduce the potential relevance for patients outside the trial that could otherwise benefit from the intervention [4]. Balancing between these parameters remains a difficult challenge when planning future clinical trials.
For clinical trials focusing on the same disease domain, common eligibility criteria often emerge [5–8]. Some of these are expected and inherent for defining the disease of interest, such as glycated hemoglobin levels for diabetes or blood pressure for hypertension [9–11]. In contrast, other criteria may be routinely applied with minimal or no justification [12–15]. Such instances have led to calls for either less stringent requirements, or at least more carefully defined criteria [1,16]. This has been explicitly acknowledged in the oncology domain, where evidence shows prior trials’ criteria were potentially too rigid, leading to difficulty in completing such trials [17–19]. Subsequently, concentrated efforts have been undertaken to better define criteria, such as how to address potential participants with human immunodeficiency virus (HIV) and prior malignancies [20–22]. Across disease domains, commonly excluded groups, such as pregnant women and older adults, have prompted similar calls for more careful consideration of how to involve them [23,24].
To better understand eligibility criteria effects, real-world data (RWD), particularly electronic health records (EHRs), can be leveraged [25]. Many prior explorations have demonstrated using RWD to inform criteria selection for recruitment purposes, providing a promising avenue for criteria evaluation [21,26–29]. Likewise, RWD can be used to examine how criteria selection can affect outcomes, which subsequently can inform power analyses and safety concerns [29–31]. Although prior work has been promising, it is concentrated on just a few diseases, which calls for additional strategies to assess eligibility criteria and how robust those strategies are across different disease domains.
The focus of this study is on eligibility criteria selection that optimizes safety (i.e., defines a healthier sample) while balancing against the expense of removing too many patients for prospective clinical trials. Thus, the primary objective is to demonstrate an approach that evaluates how different criteria, either individually or in combination, affect changes in patient count and all-cause hospitalization risk using EHR data across a variety of disease domains. Patient count is defined as the available number of patients that could potentially be recruited under a given set of criteria. Hospitalization risk is measured as a hazard ratio (HR) that compares those that qualify for a given criteria combination versus those that do not. Because hospitalization is considered a serious adverse event, this measure is meant to provide a safety indication in which a lower HR is preferred in order to reduce the chance for unintended events [32]. Ideally, the approach presented in this study should provide guidance within any disease domain as to which criteria can maximize patient count while balancing against hospitalization risk.
2. MATERIALS AND METHODS
Figure 1 provides a study overview. This study used 3 data sources: (1) trial enrollment data from trials conducted at Columbia University Irving Medical Center (CUIMC); (2) ClinicalTrials.gov data; and (3) CUIMC EHR data. The trial enrollment data provided the sample of clinical trials for analysis, with 297 interventional medication trials available up to January 2019. ClinicalTrials.gov data were leveraged via two databases: (1) the Aggregate Analysis of ClinicalTrials.gov (AACT) database to derive the conditions of focus for the clinical trial sample; and (2) the Clinical Trial Knowledge Base (CTKB), which stores eligibility criteria mapped into a standardized format from a natural language processing tool [33,34]. Finally, the EHR data were used to create cohorts of interest and assess different eligibility criteria combinations. The EHR data contain over 6.5 million patient records from October 1985 to September 2020 and were stored in a common data model (OMOP v05) maintained by the Observational Health Data Sciences and Informatics collaborative [35,36]. This study was approved by the Columbia University IRB.
Figure 1:
Study overview
2.1. Trial Sample and Criteria Identification
Trials were selected based on the most common diseases studied within the trial enrollment data, and needed to focus on an adult population with their earliest recruitment date beginning on or after January 2010. Diseases were identified based on the condition descriptions from AACT. The 3 most common diseases found were selected: 23 trials for relapsed/refractory (r/r) lymphoma/leukemia; 15 trials for hepatitis C virus (HCV); and 10 trials for chronic kidney disease (CKD), with specific focus on stages 3 and 4.
Per disease domain, eligibility criteria from the sample trials were derived from CTKB [33]. Using the mapped standardized concepts, criteria were first selected if they were present in at least 25% of all trials. Next, criteria were removed if their standardized concept could not be reasonably captured in EHR data (e.g., “contraception”) or was too vague (e.g., “treatment”). Finally, the standardized concepts for all trials were manually checked against their source ClinicalTrials.gov criterion by one author (JRR) to ensure correct mapping and to define timing requirements based on what was commonly reported within the ClinicalTrials.gov entries. In the event a temporal restriction could not be identified from manual review, standard constraints of 365 days for chronic-related criteria and 30 days for acute-related criteria were applied. The final result was the set of eligibility criteria for cohort construction and analysis.
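The two rule-based steps above (the 25% prevalence filter and the default look-back windows) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the "Hypothetical rare criterion" entry are assumptions introduced here, while the trial counts for no HIV and adequate ANC are taken from Table 2.

```python
from typing import Dict, List, Optional

# Default look-back windows stated in the text (in days).
CHRONIC_DEFAULT_DAYS = 365
ACUTE_DEFAULT_DAYS = 30


def select_common_criteria(
    trial_counts: Dict[str, int], n_trials: int, min_share: float = 0.25
) -> List[str]:
    """Keep criteria that appear in at least `min_share` of the trials."""
    return [name for name, n in trial_counts.items() if n / n_trials >= min_share]


def lookback_days(reported: Optional[int], is_chronic: bool) -> int:
    """Use the window found in manual review, else fall back to the default."""
    if reported is not None:
        return reported
    return CHRONIC_DEFAULT_DAYS if is_chronic else ACUTE_DEFAULT_DAYS


# Trial counts out of 23 r/r lymphoma/leukemia trials (Table 2), plus one
# hypothetical criterion that falls below the 25% threshold.
counts = {"No HIV": 20, "Adequate ANC": 9, "Hypothetical rare criterion": 3}
selected = select_common_criteria(counts, n_trials=23)
print(selected)
```

The manual steps (dropping concepts that cannot be captured in EHR data and checking mappings against ClinicalTrials.gov) are not represented in this sketch.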
2.2. Cohort Construction
Figure 2 illustrates the basic cohort construction within each disease domain. The general process involved creating baseline cohorts followed by applying eligibility criteria combinations to create cohort pairs. Each pair consisted of two mutually exclusive cohorts: a qualifying cohort, which contained patients meeting the selected eligibility criteria; and a non-qualifying cohort, which contained all other patients from the baseline cohort. Baseline cohorts generally consisted of an index event with condition requirements and follow-up specifications. To define the calendar time range in which a patient could enter a baseline cohort, the earliest recruitment date and the latest recruitment date among all trials within the disease domain of interest were used. Patients could have only one entry per baseline cohort. For all cohorts, the outcome of interest was all-cause hospitalization (length of stay > 1 day); inpatient index events, as well as patients who had hospitalizations within the 30 days before their selected index event, were excluded. Follow-up time for each patient started on the index event and lasted up to 180 days (365 days for the sensitivity analyses) or until the end of patient data. Despite similar setups, each disease domain baseline cohort had its own nuances. These details were informed by UpToDate resources, clinical input, RxNav, and validated phenotypes when applicable [37,38]. Additionally, eligibility criteria definitions were represented by validated rule-based phenotypes when possible [39–49]. Code details are available elsewhere (Supplemental Material 1).
Figure 2:
Cohort construction overview
Rx = medication; dx = diagnosis; r/r = relapsed/refractory; HCV = hepatitis C virus; CKD = chronic kidney disease (specific focus on stages 3 and 4); EC = eligibility criterion. Yellow stars refer to index event, while red X’s refer to either outcome, end of data, or end of follow-up (whichever occurs first for each patient). Specific to the r/r lymphoma/leukemia, condition era specifications were used (represented as blue boxes). Codes and further details are available in Supplemental Material 1.
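As a rough sketch of the cohort-pair construction described in this section, the snippet below enumerates every non-empty criteria combination and splits a baseline cohort into qualifying and non-qualifying sets. The patient identifiers and their criteria flags are hypothetical, and the in-memory dictionary stands in for what would be queries against the EHR data.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def all_combinations(criteria: List[str]) -> List[Tuple[str, ...]]:
    """Every non-empty subset of the criteria (2**n - 1 in total)."""
    return [
        combo
        for size in range(1, len(criteria) + 1)
        for combo in combinations(criteria, size)
    ]


def split_cohort(
    baseline: Dict[str, Set[str]], combo: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Split a baseline cohort into mutually exclusive (qualifying, non-qualifying) sets.

    `baseline` maps a patient id to the set of criteria that patient meets.
    """
    required = set(combo)
    qualifying = {pid for pid, met in baseline.items() if required <= met}
    non_qualifying = set(baseline) - qualifying
    return qualifying, non_qualifying


# Hypothetical baseline cohort of four patients and three criteria.
criteria = ["No HIV", "No active infection", "Adequate ANC"]
baseline = {
    "p1": {"No HIV", "No active infection", "Adequate ANC"},
    "p2": {"No HIV", "Adequate ANC"},
    "p3": {"No active infection"},
    "p4": {"No HIV", "No active infection"},
}
pairs = {combo: split_cohort(baseline, combo) for combo in all_combinations(criteria)}
print(len(pairs))
```

With 9 criteria this enumeration yields 2^9 - 1 = 511 combinations, and with 7 criteria it yields 127.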
2.2.1. Relapsed/Refractory Lymphoma/Leukemia
The index event for this cohort was defined as a medication that was part of a treatment regimen to address the r/r lymphoma/leukemia. The medication could be part of a chemotherapy regimen or a corticosteroid (Supplemental Material 1). Additionally, entry was predicated on a longstanding history of lymphoma/leukemia, loosely modeled on an available EHR phenotype [42]. This condition requirement was defined as a diagnosis period (i.e., condition era) for lymphoma or leukemia that subsumes the index event. Furthermore, there must be at least two more diagnoses for lymphoma or leukemia occurring within 180 days of one another, in which the latest diagnosis of that set was at least 180 days before the start of the diagnosis period that subsumes the index event. The intuition behind this setup was to approximate an established history of lymphoma or leukemia to potentially suggest a relapse (i.e., cancer reappearance) or refractory (i.e., failed prior treatment) event in light of EHR data quality limitations [50]. Date of entry was between January 2011 and December 2018. Regarding follow-up, if hospitalization for hematopoietic stem cell transplantation occurred within the 180 days after index, those individuals were excluded because their index event was likely tied to preparation for that procedure, making their hospitalization expected (i.e., undergoing treatment is a common part of procedure preparation).
2.2.2. Hepatitis C Virus
The index event for this cohort was defined as a medication that was part of an antiviral therapy (Supplemental Material 1). Entry related to the medication must have also had a diagnosis for HCV within the same encounter, as well as another HCV diagnosis on a different day recorded within the year before index [45]. Date of entry was between January 2010 and February 2017. For follow-up, if hospitalization for liver transplantation occurred during the 180-day period, those individuals were excluded because transplantation was considered a planned procedure to address any underlying liver disease associated with HCV, as opposed to a potential consequence of a chosen medication.
2.2.3. Chronic Kidney Disease
Unlike the prior two cohorts, index events were not based on medications, as therapeutic treatment is often focused on different indications that may be related to the CKD, such as hypertension and diabetes [51]. Thus, index events were based on estimated glomerular filtration rate (eGFR) and diagnosis requirements per the G-staging portion of a validated phenotype [39]. To enter, a patient must have had an eGFR of at least 15 mL/min/1.73 m², but less than 60 mL/min/1.73 m². Then, within the past 90 days, the patient must have had at least one of the following: (1) a relevant CKD diagnosis, which may occur on the index date; (2) another eGFR value that is less than 90 mL/min/1.73 m². Additionally, there must have been no evidence of kidney transplantation or dialysis within the past 90 days, and no evidence of acute kidney injury (AKI) within the past 30 days (Supplemental Material 1). Date of entry was between January 2010 and January 2019. For follow-up, if evidence of AKI occurred within 30 days after index, those patients were removed, as it was assumed the index event was related to that AKI event.
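The entry rules above can be read as a single predicate over one candidate index event. The sketch below is an illustration under stated assumptions rather than the phenotype implementation: the function and argument names are hypothetical, look-back windows are assumed to include the index date, and the "other" eGFR value is assumed to fall strictly before the index date (the text does not specify either point). The post-index AKI exclusion and the calendar window are omitted.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def within_past(event: date, index: date, days: int, include_index: bool = True) -> bool:
    """True if `event` falls in the look-back window ending at `index`."""
    start = index - timedelta(days=days)
    if include_index:
        return start <= event <= index
    return start <= event < index


def qualifies_as_ckd_index(
    index_date: date,
    index_egfr: float,
    other_egfrs: Iterable[Tuple[date, float]],
    ckd_dx_dates: Iterable[date],
    transplant_or_dialysis_dates: Iterable[date],
    aki_dates: Iterable[date],
) -> bool:
    """Check the stage 3-4 CKD entry rules for one candidate index event."""
    # Index eGFR must be at least 15 and below 60 mL/min/1.73 m^2.
    if not (15 <= index_egfr < 60):
        return False
    # Supporting evidence within 90 days: a CKD diagnosis or another eGFR < 90.
    has_dx = any(within_past(d, index_date, 90) for d in ckd_dx_dates)
    has_second_egfr = any(
        value < 90 and within_past(d, index_date, 90, include_index=False)
        for d, value in other_egfrs
    )
    if not (has_dx or has_second_egfr):
        return False
    # No kidney transplantation or dialysis within the past 90 days.
    if any(within_past(d, index_date, 90) for d in transplant_or_dialysis_dates):
        return False
    # No acute kidney injury within the past 30 days.
    if any(within_past(d, index_date, 30) for d in aki_dates):
        return False
    return True


# Hypothetical patient: index eGFR of 45 with a prior eGFR of 70 six weeks earlier.
idx = date(2015, 6, 1)
print(qualifies_as_ckd_index(idx, 45.0, [(date(2015, 4, 15), 70.0)], [], [], []))
```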
2.3. Analysis
With cohorts constructed, analyses were performed within each disease domain, focused on changes in patient counts and hospitalization risk estimations for each eligibility criteria combination, starting with individual criterion effects. Qualifying patient counts refer to the number of patients that met the specific criterion applied. Risk of hospitalization was measured as unadjusted hazard ratios (HRs) from Cox proportional hazards models with all-cause hospitalization (length of stay > 1 day) serving as the outcome (codes available in Supplemental Material 1); follow-up ended at first occurrence of hospitalization, 180 days, or end of data. The Cox models were univariable models adopted for each criteria combination, with the non-qualifying group serving as the reference. A follow-up of 180 days was chosen to simulate a typical follow-up period for a trial (or a period of time for an interim analysis of a longer-term trial).
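For reference, a univariable Cox model with a single qualifying indicator can be written in its standard form as below. The symbols h_0, beta, and x are notation introduced here for illustration and do not appear in the source.

```latex
% x = 1 for the qualifying cohort, x = 0 for the non-qualifying (reference) cohort
h(t \mid x) = h_0(t)\,\exp(\beta x)

% The hazard ratio reported per criteria combination is then
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}
```

Under this form, an HR below 1 means the qualifying cohort has a lower hazard of all-cause hospitalization than the non-qualifying cohort over follow-up.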
Next, all eligibility criteria combinations were analyzed to identify patterns in changes between patient counts and risk of hospitalization. Because some criteria combinations might identify a small cohort, cohort pairs had to pass a power analysis where enough events were available to detect a 50% reduction in hospitalization risk at 80% power and alpha of 0.05. Although a 50% reduction is relatively large, this was chosen because a key component of the analysis is to find criteria combinations highly likely to result in a safe patient sample. For adequately powered cohort pairs, counts and hospitalization risk were visualized using scatterplots. To identify patterns resulting from particular criteria combinations, k-means clustering analysis was applied on normalized observations of qualifying patient counts and HR estimates. K-means clustering was chosen because the analytic task of identifying criteria patterns was viewed as an unsupervised task to identify which groups of criteria emerged as measured against counts vs HRs, and the number of criteria combinations was not known a priori. The number of clusters was informed by elbow plots and minimum Bayesian information criterion, with scatterplots created to visualize the clustering results. Using the scatterplots, clusters of criteria combinations that had the largest qualifying patient counts but the lowest HR estimates (i.e., criteria combinations present in the bottom-right quadrant of a scatterplot) were considered to have the most promising criteria combinations. In order to determine which criteria were driving these combinations, a stepwise approach was applied in which the largest differences in prevalence of criteria applied between clusters were examined. Based on those differences, clusters were partitioned and re-examined for additional differences, with this process repeating until a clear pattern emerged or saturation occurred.
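The clustering step can be sketched in plain Python as below. This is a simplified illustration, not the authors' R analysis: the six (count, HR) observations are hypothetical, min-max scaling is assumed as the normalization, the number of clusters is fixed at 2 instead of being chosen from elbow plots or BIC, and a deterministic farthest-point initialization is used in place of whatever initialization the original analysis applied.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic start: first point, then repeatedly the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = farthest_point_init(points, k)
    labels = [-1] * len(points)
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Hypothetical powered combinations: qualifying patient counts and HR estimates.
counts = [300, 310, 320, 500, 510, 520]
hrs = [0.60, 0.61, 0.62, 0.42, 0.43, 0.41]
points = list(zip(min_max(counts), min_max(hrs)))
labels, centers = kmeans(points, k=2)

# The most promising cluster has a high normalized count and a low normalized HR.
best = max(range(2), key=lambda c: centers[c][0] - centers[c][1])
print(labels, best)
```

Scoring clusters by normalized count minus normalized HR is only one way to operationalize "bottom-right quadrant"; the source describes visual inspection of the scatterplots.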
As a complementary check, boxplots for each individual eligibility criterion illustrating the distributions of patient counts and HRs of powered criteria combinations with that criterion were examined. As a sensitivity analysis, follow-up was extended from 180 days to 365 days to explore robustness of the 180-day analysis results. From an analysis standpoint, the 365-day analysis applies the same procedure as the 180-day analysis, except the cohorts involved have a follow-up time of 365 days instead of 180 days. All analyses were performed in R v4.0.3.
3. RESULTS
An overview of the number of patients and criteria explored for the three disease domains is provided in Table 1. Per disease domain, the following are presented: (1) the eligibility criteria found and their individual effects; (2) scatterplots of qualifying cohort counts versus hospitalization risk; and (3) clustering observations.
Table 1:
Summary of disease domains and eligibility criteria explored
| Relapsed/Refractory Lymphoma/Leukemia | Hepatitis C Virus | Chronic Kidney Disease |
|---|---|---|
| Total Number of Patients | ||
| 623 | 751 | 23893 |
| Total Number of Hospitalizations | ||
| 122 | 79 | 3357 |
| Criteria Explored (% of Patients that Qualify) | ||
| No HIV (98.56%) | No HIV (82.29%) | Adequate eGFR (89.08%) |
| No HBV/HCV (98.39%) | No HBV (97.34%) | No prior malignancy (65.26%) |
| Not pregnant (99.84%) | No HCC (90.15%) | Not pregnant (99.97%) |
| No prior chemotherapy or radiotherapy (94.70%) | No prior non-liver solid organ transplant (95.07%) | No prior use of rituximab (99.79%) |
| No prior malignancy (61.80%) | Not pregnant (99.60%) | No HIV (96.77%) |
| Adequate eGFR (98.23%) | No decompensated liver disease/cirrhosis (49.00%) | No HBV/HCV (97.38%) |
| No active infection (84.27%) | No prior non-HCC malignancy (79.09%) | No active infection (83.48%) |
| Adequate ANC (96.95%) | | No prior corticosteroid use (88.21%) |
| No prior corticosteroid use (94.22%) | | No CHF (89.55%) |
HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; eGFR = estimated glomerular filtration rate; ANC = absolute neutrophil count; HCC = hepatocellular carcinoma; CHF = congestive heart failure
3.1. Relapsed/Refractory Lymphoma/Leukemia
From the set of 23 r/r lymphoma/leukemia trials, 9 criteria were identified (Table 2). The most common criteria were: no HIV (20 [87%] trials); no HBV/HCV (19 [83%]); and not pregnant (19 [83%]). The baseline cohort contained 623 r/r lymphoma/leukemia patients with 122 hospitalizations. When applying criteria individually, the following led to the smallest cohorts: no prior malignancy (385 [62%] patients); no active infection (525 [84%]); and no prior corticosteroid use (587 [94%]). Four criteria had an HR below 1 with a 95% CI that also excluded 1: adequate absolute neutrophil count (ANC; 0.41 [95% CI: 0.19, 0.88]); no active infection (0.48 [0.32, 0.72]); no prior chemotherapy or radiotherapy (0.49 [0.26, 0.91]); and no prior malignancy (0.60 [0.42, 0.85]), which was the only individually powered criterion.
Table 2:
Individual eligibility criteria overview for relapsed/refractory lymphoma/leukemia sample, follow-up 180 days
| Eligibility Criteria Label | Eligibility Criteria Details | Trials Implementing the Criterion, N (%) | Patients, N (%): Q | Patients, N (%): nQ | Events, N (%): Q | Events, N (%): nQ | Incidence Rate per 100 person-years (95% CI): Q | Incidence Rate per 100 person-years (95% CI): nQ | Hazard Ratio (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| <Start> | N/A | 23 (100) | 623 (100) | | 122 (100) | | 49.44 (41.40, 59.04) | | N/A |
| No HIV | No HIV within the past 365 days | 20 (86.96) | 614 (98.56) | 9 (1.44) | 119 (97.54) | 3 (2.46) | 48.87 (40.83, 58.48) | 93.18 (30.05, 288.90) | 0.55 (0.18, 1.73) |
| No HBV/HCV | No HBV/HCV within the past 365 days | 19 (82.61) | 613 (98.39) | 10 (1.61) | 118 (96.72) | 4 (3.28) | 48.44 (40.44, 58.02) | 126.93 (47.64, 338.21) | 0.41 (0.15, 1.12) |
| Not pregnant | No evidence of current pregnancy within the past 60 days | 19 (82.61) | 622 (99.84) | 1 (0.16) | 122 (100) | 0 (0) | 49.54 (41.49, 59.16) | - | - |
| No prior chemotherapy or radiotherapy | No prior chemotherapy or radiotherapy within the past 14 days (excludes index) | 18 (78.26) | 590 (94.70) | 33 (5.30) | 111 (90.98) | 11 (9.02) | 46.99 (39.01, 56.60) | 104.44 (57.84, 188.59) | 0.49 (0.26, 0.91) |
| No prior malignancy | No prior malignancy (beside lymphoma or leukemia related cancers, non-melanoma skin cancer, in-situ cancers, benign tumor, lipomatous tumor, or uncertain behavior) within the past 1095 days | 17 (73.91) | 385 (61.80) | 238 (38.20) | 61 (50.00) | 61 (50.00) | 39.03 (30.37, 50.17) | 67.43 (52.46, 86.66) | 0.60 (0.42, 0.85)* |
| Adequate eGFR | Most recent eGFR measure within the past 180 days ≥ 30 mL/min/1.73m^2 (per MDRD equation) | 11 (47.83) | 612 (98.23) | 11 (1.77) | 118 (96.72) | 4 (3.28) | 48.49 (40.48, 58.07) | 118.49 (44.47, 315.72) | 0.47 (0.17, 1.27) |
| No active infection | No active infection within the past 30 days | 10 (43.48) | 525 (84.27) | 98 (15.73) | 91 (74.59) | 31 (25.41) | 42.39 (34.52, 52.06) | 96.59 (67.93, 137.35) | 0.48 (0.32, 0.72) |
| Adequate ANC | Most recent ANC measure within the past 180 days ≥ 1000/mm3 | 9 (39.13) | 604 (96.95) | 19 (3.05) | 115 (94.26) | 7 (5.74) | 47.74 (39.76, 57.31) | 119.98 (57.20, 251.67) | 0.41 (0.19, 0.88) |
| No prior corticosteroid use | No prior corticosteroid use within the past 7 days (excludes index) | 9 (39.13) | 587 (94.22) | 36 (5.78) | 112 (91.80) | 10 (8.20) | 47.50 (39.47, 57.16) | 91.36 (49.16, 169.80) | 0.56 (0.29, 1.06) |
Q = qualifying cohort; nQ = non-qualifying cohort; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; eGFR = estimated glomerular filtration rate; ANC = absolute neutrophil count; N/A = non-applicable; MDRD = modification of diet in renal disease study. Hazard ratios have nQ as the reference group (* = powered; - = not calculated given no events in a particular group). All eligibility criteria assessment timeframes include the date of the index event unless specified otherwise. For criteria that rely on lab values: if a patient did not have a lab value present or found during the criteria assessment period, that patient was assumed to have an adequate value and thus did not get excluded. Further code details are provided in Supplemental Material 1.
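To illustrate the power screen described in Section 2.3, the sketch below uses Schoenfeld's approximation for the number of events needed in a two-group Cox comparison. The choice of this formula is an assumption made here, since the text does not state which power calculation was used; the function name is likewise hypothetical. The patient counts and the 122 observed events come from Table 2.

```python
import math
from statistics import NormalDist


def required_events(
    share_qualifying: float,
    hazard_ratio: float = 0.5,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Schoenfeld's approximation for the total events needed (two-sided test).

    d = (z_{1-alpha/2} + z_{power})^2 / (p * (1 - p) * ln(HR)^2),
    where p is the share of patients in the qualifying cohort.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p = share_qualifying
    return math.ceil((z_alpha + z_power) ** 2 / (p * (1 - p) * math.log(hazard_ratio) ** 2))


observed_events = 122  # baseline r/r lymphoma/leukemia cohort, Table 2

# No prior malignancy: 385 of 623 patients qualify.
need_malignancy = required_events(385 / 623)
# No active infection: 525 of 623 patients qualify.
need_infection = required_events(525 / 623)

print(need_malignancy, need_malignancy <= observed_events)
print(need_infection, need_infection <= observed_events)
```

Under this assumed formula, the more balanced split for no prior malignancy needs roughly 70 events, which the 122 observed events cover, while the unbalanced split for no active infection needs slightly more than 122. That pattern agrees with no prior malignancy being the only individually powered criterion in Table 2, although the agreement does not confirm that this was the calculation used.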
Of the 511 possible criteria combinations, 384 (75%) were powered (Figure 3). The range of qualifying patient counts was 297 to 521 patients, while the range of the HRs was 0.41 to 0.62. The patterns of 6 clusters of criteria combinations were analyzed. No prior malignancy led to the largest decreases in patient counts for all criteria combinations, as evidenced by comparing Clusters 3, 4, and 6 (i.e., those criteria combinations that apply it) to Clusters 1, 2, and 5 (i.e., those criteria combinations that do not apply it). However, the latter set of clusters contained combinations with lower hospitalization risk relative to the former set, suggesting the no prior malignancy criterion is not required to achieve lower risks. When examining Clusters 1, 2, and 5, all combinations within these clusters generally applied the no infection criterion and Cluster 2 contained the lowest risks, where application of adequate ANC was the primary difference in criteria applied. The observations are generally confirmed by the boxplots (Figure 4). Of note, no prior chemotherapy/radiotherapy had generally lower hazard ratios, which is confirmed by comparing how the application of this criterion affects the clusters that involved no prior malignancy (i.e., Cluster 4 compared to Clusters 3 and 6). For the sensitivity analysis, applying no infection without the no corticosteroid use criterion yielded a group of criteria combinations that minimized hospitalization risk while preserving more available patients relative to other clusters (Supplemental Material 2; 3; 4).
Figure 3:
Cluster patterns of eligibility criteria between patient counts and hospitalization risk, relapsed/refractory lymphoma/leukemia, follow-up 180 days
EC = eligibility criteria; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; chemo/rad = chemotherapy/radiotherapy; eGFR = estimated glomerular filtration rate; ANC = absolute neutrophil count
Figure 4:
Boxplots for each eligibility criterion displaying the distributions of the patient counts and hazard ratios for all powered eligibility criteria combinations that applied that particular criterion (as displayed per the “N” label on the x-axis), relapsed/refractory lymphoma/leukemia, follow-up 180 days
3.2. Hepatitis C Virus
From the set of 15 HCV trials, 7 criteria were identified (Table 3). The most common criteria were: no HIV (15 trials [100%]); no HBV (15 [100%]); and no hepatocellular carcinoma (HCC; 11 [73%]). The baseline cohort contained 751 HCV patients with 79 hospitalizations. When applying criteria individually, the following led to the smallest cohorts: no decompensated liver disease/cirrhosis (368 patients [49%]); no prior non-HCC malignancy (594 [79%]); and no HIV (618 [82%]). Four criteria had an HR below 1 with a 95% CI that also excluded 1: not pregnant (0.06 [95% CI: 0.01, 0.25]); no HCC (0.38 [0.22, 0.65]); no decompensated liver disease/cirrhosis (0.42 [0.26, 0.69], which was the only individually powered criterion); and no HIV (0.47 [0.29, 0.76]).
Table 3:
Individual eligibility criteria overview for hepatitis C virus sample, follow-up 180 days
| Eligibility Criteria Label | Eligibility Criteria Details | Trials Implementing the Criterion, N (%) | Patients, N (%): Q | Patients, N (%): nQ | Events, N (%): Q | Events, N (%): nQ | Incidence Rate per 100 person-years (95% CI): Q | Incidence Rate per 100 person-years (95% CI): nQ | Hazard Ratio (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| <Start> | N/A | 15 (100) | 751 (100) | | 79 (100) | | 23.50 (18.85, 29.29) | | N/A |
| No HIV | No HIV within the past 365 days | 15 (100) | 618 (82.29) | 133 (17.71) | 55 (69.62) | 24 (30.38) | 19.71 (15.13, 25.67) | 41.98 (28.14, 62.64) | 0.47 (0.29, 0.76) |
| No HBV | No HBV within the past 365 days | 15 (100) | 731 (97.34) | 20 (2.66) | 75 (94.94) | 4 (5.06) | 22.87 (18.24, 28.68) | 48.39 (18.16, 128.94) | 0.47 (0.17, 1.29) |
| No HCC | No HCC within the past 365 days | 11 (73.33) | 677 (90.15) | 74 (9.85) | 62 (78.48) | 17 (21.52) | 20.36 (15.87, 26.12) | 53.62 (33.33, 86.25) | 0.38 (0.22, 0.65) |
| No prior non-liver solid organ transplant | No evidence of a non-liver solid organ transplant (includes heart, lung, kidney, pancreas, intestine, or other) at any time prior to index | 10 (66.67) | 714 (95.07) | 37 (4.93) | 73 (92.41) | 6 (7.59) | 22.87 (18.18, 28.76) | 35.40 (15.90, 78.79) | 0.65 (0.28, 1.48) |
| Not pregnant | No evidence of current pregnancy within the past 60 days | 9 (60.00) | 748 (99.60) | 3 (0.40) | 77 (97.47) | 2 (2.53) | 22.94 (18.35, 28.68) | 367.09 (91.80, 1467.81) | 0.06 (0.01, 0.25) |
| No decompensated liver disease/cirrhosis | No decompensated liver disease/cirrhosis within the past 365 days | 9 (60.00) | 368 (49.00) | 383 (51.00) | 23 (29.11) | 56 (70.89) | 13.89 (9.23, 20.89) | 32.83 (25.27, 42.66) | 0.42 (0.26, 0.69)* |
| No prior non-HCC malignancy | No prior malignancy (beside HCC, benign tumor, or lipomatous tumor) within the past 1095 days | 8 (53.33) | 594 (79.09) | 157 (20.91) | 57 (72.15) | 22 (27.85) | 21.53 (16.60, 27.91) | 30.81 (20.29, 46.79) | 0.70 (0.43, 1.14) |
Q = qualifying cohort; nQ = non-qualifying cohort; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCC = hepatocellular carcinoma; N/A = non-applicable. Hazard ratios have nQ as the reference group (* = powered). All eligibility criteria assessment timeframes include the date of the index event unless specified otherwise. Further code details are provided in Supplemental Material 1.
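As a reading aid for the incidence rate columns, the sketch below applies a common log-scale normal approximation for the confidence interval of a rate, rate × exp(±1.96 / √events). Whether the authors used this method is not stated in the text and is an assumption here; the function name is hypothetical. It is shown because it reproduces the reported baseline interval for the HCV cohort (23.50 per 100 person-years, 79 events) to within rounding.

```python
import math
from typing import Tuple


def rate_ci(rate_per_100py: float, events: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for an incidence rate using the log-scale standard error 1/sqrt(events)."""
    factor = math.exp(z / math.sqrt(events))
    return rate_per_100py / factor, rate_per_100py * factor


# Baseline HCV cohort from Table 3: 79 events, 23.50 per 100 person-years.
lower, upper = rate_ci(23.50, 79)
print(round(lower, 2), round(upper, 2))
```

The interval width depends only on the number of events, which is why criteria with very few events in the non-qualifying cohort (for example, not pregnant) show very wide intervals.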
Of the 127 possible criteria combinations, 90 (71%) were powered (Figure 5). The range of qualifying patient counts was 221 to 527 patients, while the range of HRs was 0.12 to 0.56. The patterns of 4 clusters of criteria combinations were analyzed. No decompensated liver disease/cirrhosis led to the largest decreases in patient counts for all criteria combinations, but also the lowest hospitalization risks, as evidenced by comparing Clusters 1 and 4 (i.e., those criteria combinations that apply it) to Clusters 2 and 3 (i.e., those criteria combinations that do not apply it). When comparing Clusters 1 and 4, the no HIV criterion led to the largest reduction in hospitalization risk. In terms of maximizing patient count and minimizing hospitalization risk, the most promising criteria combinations appeared in Cluster 2 by applying no HIV and no HCC while forgoing no decompensated liver disease/cirrhosis (the combination optimizing these parameters was no HIV, no HCC, and no non-liver organ transplant). Similar patterns were found in the sensitivity analysis and boxplot analyses (Figure 6; Supplemental Material 5; 6; 7).
Figure 5:
Cluster patterns of eligibility criteria between patient counts and hospitalization risk, hepatitis C virus, follow-up 180 days
EC = eligibility criteria; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCC = hepatocellular carcinoma
Figure 6:
Boxplots for each eligibility criterion displaying the distributions of the patient counts and hazard ratios for all powered eligibility criteria combinations that applied that particular criterion (as displayed per the “N” label on the x-axis), hepatitis C virus, follow-up 180 days
HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCC = hepatocellular carcinoma
3.3. Chronic Kidney Disease
From the set of 10 CKD trials, 9 criteria were identified (Table 4). The most common criteria were: adequate eGFR (8 trials [80%]); no prior malignancy (6 [60%]); not pregnant (6 [60%]); and no prior use of rituximab (6 [60%]). For this set of trials, note the application of adequate eGFR as a criterion is to specify a minimum threshold for the indexed eGFR (which differed from the implemented index event). The baseline cohort contained 23893 CKD patients with 3357 hospitalizations. When applying criteria individually, the following led to the smallest cohorts: no prior malignancy (15592 [65%] patients); no active infection (19947 [83%]); and no prior corticosteroid use (21076 [88%]). Five criteria had an HR below 1 with a 95% CI that also excluded 1: no congestive heart failure (CHF; 0.55 [95% CI: 0.50, 0.60]); no prior corticosteroid use (0.64 [0.58, 0.70]); no prior malignancy (0.70 [0.65, 0.75]); no active infection (0.82 [0.75, 0.89]); and adequate eGFR (0.84 [0.76, 0.94]). All of these were powered.
Table 4:
Individual eligibility criteria overview for chronic kidney disease sample, follow-up 180 days
| Eligibility Criteria Label | Eligibility Criteria Details | Trials Implementing the Criterion, N (%) | Patients, N (%): Q | Patients, N (%): nQ | Events, N (%): Q | Events, N (%): nQ | Incidence Rate per 100 person-years (95% CI): Q | Incidence Rate per 100 person-years (95% CI): nQ | Hazard Ratio (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| <Start> | N/A | 10 (100) | 23893 (100) | | 3357 (100) | | 33.61 (32.49, 34.76) | | N/A |
| Adequate eGFR | Index eGFR measure ≥ 30 mL/min/1.73m^2 (per MDRD equation) | 8 (80.00) | 21284 (89.08) | 2609 (10.92) | 2943 (87.67) | 414 (12.33) | 32.94 (31.77, 34.15) | 39.25 (35.64, 43.22) | 0.84 (0.76, 0.94)* |
| No prior malignancy | No prior malignancy (beside non-melanoma skin cancer, melanoma in situ, carcinoma in situ of the cervix, benign tumor, or lipomatous tumor) within the past 1095 days | 6 (60.00) | 15592 (65.26) | 8301 (34.74) | 1900 (56.60) | 1457 (43.40) | 29.17 (27.89, 30.52) | 41.91 (39.81, 44.11) | 0.70 (0.65, 0.75)* |
| Not pregnant | No evidence of current pregnancy within the past 60 days | 6 (60.00) | 23886 (99.97) | 7 (0.03) | 3357 (100) | 0 (0) | 33.62 (32.50, 34.77) | - | - |
| No prior use of rituximab | No use of rituximab within past 60 days (excludes index) | 6 (60.00) | 23842 (99.79) | 51 (0.21) | 3351 (99.82) | 6 (0.18) | 33.62 (32.50, 34.78) | 27.04 (12.15, 60.19) | 1.24 (0.56, 2.76) |
| No HIV | No HIV within the past 365 days | 4 (40.00) | 23122 (96.77) | 771 (3.23) | 3300 (98.30) | 57 (1.70) | 34.27 (33.12, 35.46) | 15.82 (12.20, 20.51) | 2.14 (1.64, 2.78)* |
| No HBV/HCV | No HBV/HCV within the past 365 days | 4 (40.00) | 23266 (97.38) | 627 (2.62) | 3247 (96.72) | 110 (3.28) | 33.41 (32.28, 34.58) | 40.56 (33.64, 48.89) | 0.83 (0.68, 1.00)* |
| No active infection | No active infection within the past 30 days | 4 (40.00) | 19947 (83.48) | 3946 (16.52) | 2704 (80.55) | 653 (19.45) | 32.38 (31.19, 33.63) | 39.83 (36.89, 43.01) | 0.82 (0.75, 0.89)* |
| No prior corticosteroid use | No prior corticosteroid use within the past 30 days (excludes index) | 4 (40.00) | 21076 (88.21) | 2817 (11.79) | 2794 (83.23) | 563 (16.77) | 31.53 (30.38, 32.72) | 49.94 (45.98, 54.24) | 0.64 (0.58, 0.70)* |
| No CHF | No CHF within the past 365 days | 3 (30.00) | 21397 (89.55) | 2496 (10.45) | 2772 (82.57) | 585 (17.43) | 30.94 (29.81, 32.12) | 56.77 (52.35, 61.56) | 0.55 (0.50, 0.60)* |
Q = qualifying cohort; nQ = non-qualifying cohort; eGFR = estimated glomerular filtration rate; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; CHF = congestive heart failure; N/A = non-applicable; MDRD = modification of diet in renal disease study. Hazard ratios have nQ as the reference group (* = powered; - = not calculated given no events in a particular group). All eligibility criteria assessment timeframes include the date of the index event unless specified otherwise. For criteria that rely on lab values: if a patient did not have a lab value present or found during the criteria assessment period, that patient was assumed to have an adequate value and thus did not get excluded. Further code details are provided in Supplemental Material 1.
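The arithmetic behind Table 4 can be checked with a short script. The sketch below copies three rows from the table and recomputes the retained percentages; it also computes a crude incidence rate ratio (Q over nQ) for comparison with the reported hazard ratio. The rate ratio is only an illustrative approximation and is an assumption of this sketch, not the model used to estimate the HRs; the function and variable names are hypothetical.

```python
# Values copied from Table 4 (CKD, follow-up 180 days).
BASELINE_PATIENTS = 23893
BASELINE_EVENTS = 3357

# label: (Q patients, nQ patients, Q events, nQ events, Q rate, nQ rate, reported HR)
TABLE4 = {
    "Adequate eGFR": (21284, 2609, 2943, 414, 32.94, 39.25, 0.84),
    "No prior malignancy": (15592, 8301, 1900, 1457, 29.17, 41.91, 0.70),
    "No CHF": (21397, 2496, 2772, 585, 30.94, 56.77, 0.55),
}


def pct(part, whole):
    """Percentage rounded to two decimals, as displayed in the table."""
    return round(100 * part / whole, 2)


def summarize(label):
    """Recompute the derived quantities for one criterion row."""
    q_n, nq_n, q_ev, nq_ev, q_rate, nq_rate, hr = TABLE4[label]
    # Every baseline patient and event falls in exactly one of Q or nQ.
    assert q_n + nq_n == BASELINE_PATIENTS
    assert q_ev + nq_ev == BASELINE_EVENTS
    return {
        "patients_retained_pct": pct(q_n, BASELINE_PATIENTS),
        "events_retained_pct": pct(q_ev, BASELINE_EVENTS),
        # Crude rate ratio (Q / nQ): an approximation, not the reported HR model.
        "crude_rate_ratio": round(q_rate / nq_rate, 2),
        "reported_hr": hr,
    }


for label in TABLE4:
    print(label, summarize(label))
```

For these rows the crude rate ratio lands close to the reported HR, which is a useful sanity check when transcribing the table, though the two quantities are not interchangeable.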
Of the 511 possible criteria combinations, 508 (99%) were powered (Figure 7). Qualifying patient counts ranged from 8824 to 23226 patients, and HRs ranged from 0.55 to 2.16. The patterns of 8 clusters of criteria combinations were analyzed. No prior malignancy led to the largest decrease in patient counts across all criteria combinations (Clusters 6 and 7, and to a lesser extent, Cluster 1). For hospitalization risk, however, criteria combinations that applied no CHF tended to reduce risk the most (Clusters 1, 2, and 7). Some clusters showed higher hospitalization risk (i.e., HR > 1) in the qualifying cohort than in the non-qualifying cohort (Clusters 3 and 5). This suggests that choosing not to apply certain criteria (particularly no prior malignancy, no CHF, no corticosteroid use, and no infection) can, alone or in combination with other criteria, lead to an available patient pool with increased hospitalization risk. Similar patterns were found in the sensitivity analysis and boxplot analyses (Figure 8; Supplemental Materials 8, 9, and 10).
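The count of 511 follows from enumerating every non-empty subset of the 9 criteria (2^9 − 1). A minimal sketch of that enumeration is shown below, using the criteria labels from Table 4; the helper name `all_combinations` is hypothetical and the study's actual implementation may differ.

```python
from itertools import combinations

# The nine CKD criteria labels from Table 4.
CRITERIA = [
    "Adequate eGFR",
    "No prior malignancy",
    "Not pregnant",
    "No prior use of rituximab",
    "No HIV",
    "No HBV/HCV",
    "No active infection",
    "No prior corticosteroid use",
    "No CHF",
]


def all_combinations(criteria):
    """Yield every non-empty subset of the criteria as a tuple."""
    for size in range(1, len(criteria) + 1):
        yield from combinations(criteria, size)


combos = list(all_combinations(CRITERIA))
print(len(combos))  # 511, i.e. 2**9 - 1

# Number of combinations that apply one particular criterion.
with_chf = sum("No CHF" in combo for combo in combos)
print(with_chf)  # 256, i.e. 2**8
```

Because each criterion appears in 256 of the 511 combinations, 256 is the upper bound on the number of powered combinations that can apply any single criterion.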
Figure 7:
Cluster patterns of eligibility criteria between patient counts and hospitalization risk, chronic kidney disease, follow-up 180 days
EC = eligibility criteria; eGFR = estimated glomerular filtration rate; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; CHF = congestive heart failure
Figure 8:
Boxplots for each eligibility criterion displaying the distributions of the patient counts and hazard ratios for all powered eligibility criteria combinations that applied that particular criterion (as displayed per the “N” label on the x-axis), chronic kidney disease, follow-up 180 days
eGFR = estimated glomerular filtration rate; HIV = human immunodeficiency virus; HBV = hepatitis B virus; HCV = hepatitis C virus; CHF = congestive heart failure; “N” label refers to count of criteria combinations applying that particular criterion; panel A is both distributions while panel B is a magnification of the hazard ratio boxplots (in which outliers > 1 are set to 1 for visualization purposes)
4. DISCUSSION
Using a convenience sample of trials, eligibility criteria were extracted and their effects, both individually and in combination, on patient counts and hospitalization risk across three different disease domains were assessed. The most promising criteria combinations that reduced hospitalization risk while preserving higher patient counts were as follows: for r/r lymphoma/leukemia, applying no infection and adequate ANC while forgoing no prior malignancy; for HCV, applying no HIV and no HCC while forgoing no decompensated liver disease/cirrhosis; and for CKD, applying no CHF.
The key contribution of this study is a data-driven estimation of the relative reduction in hospitalization compared to reductions in eligible patient counts to guide criteria selection for future clinical trials. For r/r lymphoma/leukemia, no prior malignancy led to the greatest reduction in patient count, echoing prior evaluations estimating secondary cancers to be prevalent amongst this group [52,53]. However, an important observation is the existence of criteria combinations that did not include this criterion and still allowed for reduced hospitalization risk. This observation indicates that r/r lymphoma/leukemia individuals with a prior malignancy are not necessarily at increased risk of hospitalization and that this criterion could be re-evaluated to allow for their trial participation [21]. For HCV, the no decompensated liver disease/cirrhosis criterion led to the greatest reduction in patient count and hospitalization risk. This observation is relatively confirmatory given this criterion refers to failing liver function, likely signaling increased disease burden or prolonged HCV infection requiring treatment in an inpatient setting [54]. There did exist possible criteria combinations (specifically no HIV and no HCC) that could forgo this criterion and match similar reductions in hospitalization risk while preserving a larger available patient pool, but such a decision needs to be informed by the etiology of the decompensation as it relates to HCV, as well as the severity of the decompensation [55]. For CKD, no prior malignancy led to the greatest reductions in patient counts, which is unsurprising as these are often common comorbidities [56]. However, excluding CHF was generally the main driver for reducing hospitalization risk, which is expected given CHF is a common reason for hospitalization and often co-occurs with CKD [57,58].
Although this approach focused on informing how eligibility criteria impact safety and patient counts, it can also shed light on how criteria can be applied for other reasons. For example, some criteria are implemented for ethical reasons and this analysis can quantify their impact. Excluding pregnant individuals is often required for this reason; however, based on all three disease domains from this study, there appears to be minimal effect given so few pregnant patients were identified. Alternatively, some criteria are applied broadly regardless of disease, but this analysis demonstrates certain disease domains may be more heavily impacted. This is evidenced by HIV, in which there was minimal impact in the r/r lymphoma/leukemia and CKD trials, but prominent impact in the HCV trials regarding hospitalization risk. Likewise, some criteria are applied because of prior safety knowledge; this approach is amenable to such cases, either by requiring the baseline cohort to incorporate those requirements or by having every criteria combination automatically include those criteria. Ultimately, this approach can be tailored to inform a specific trial’s safety needs for eligibility criteria selection.
Undoubtedly, these results are driven by the underlying data. The sampled trials were conducted at a large academic medical center, which may be prone to conducting trials in higher-risk patients given available resources, such as the availability of a transplant center to allow for liver transplant trials to be conducted. As a result, the criteria sampled might not be representative or reflective of the greater clinical trial environment for some of these diseases. Likewise, the EHR data available were also from a large academic medical center, which may contain patients with specific underlying conditions that would otherwise not be found in other healthcare environments [59]. This may result in patients having a different comorbidity burden, which in turn affects risk estimates and patient count availability.
Despite these concerns, this analytic approach is promising from a trial planning standpoint. The most immediate advantage is its ability to provide quantifiable estimation of how certain criteria affect safety and patient count, thus allowing for a more informed criteria selection process. From a safety standpoint, this approach provides a sense of which patients might be prone to requiring further medical attention that may be further exacerbated by an experimental therapy. From a patient count standpoint, this approach allows investigators to determine which criteria are most restrictive in terms of who is available for recruitment. Building on this advantage, the approach can also inform site selection by evaluating how many patients are available and whether those patients may have particular risks at those sites. Many trials are conducted as multi-site trials to provide a better opportunity for a larger and more diverse sample of participants [60]. Unfortunately, individual sites can still suffer from recruitment challenges and hinder the overall trial [12]. This approach can provide a means for estimating patient availability while also taking into account the impact of different design decisions. This can enhance site engagement, a process centered on including sites in the planning and implementation of a trial, so that sites can inform the impact of certain eligibility criteria on recruitment strategies [61,62]. Of particular promise is the scalability of this approach: because it is based on cohort construction strategies and phenotype syntax that utilize conventions of a common data model, it can be transferred to other sites (under the condition that a site has transformed its data to the same common data model).
Another advantage is the flexibility of adapting portions of this approach to fit different investigators’ needs. In general, the approach can be summarized as follows (echoing Figure 1): (1) select a disease domain, or otherwise patient population, of interest; (2) choose a sample of eligibility criteria; (3) create cohort pairs using all combinations of the sampled criteria; and (4) analyze the effects of the criteria. The details of these particular steps were pursued with the perspective that the eligibility criteria were not known a priori. However, those details, and subsequently the approach, can be changed to accommodate investigators’ needs if the criteria are already known. For example, the sampling procedure provided in step 2 would not need to be pursued and the selected criteria can be set as the starting sample. As another example, if investigators already have certain criteria combinations of interest to assess, they can consider viewing the analysis step as a supervised task and not pursue an unsupervised approach such as k-means clustering. Instead, the analysis method of choice could be running regression models on patient counts and HR estimates with criteria as inputs and using the coefficients as the metric for informing which criteria should be considered. Ultimately, this approach can accommodate both perspectives of knowing and not knowing criteria to assess beforehand.
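Steps (3) and (4) above can be sketched on a toy example. The patient records, field names, and helper functions below are hypothetical and exist only for illustration. The sketch also substitutes a simple event proportion for the hazard ratio used in the study, since a time-to-event model is beyond the scope of a short example.

```python
from itertools import combinations

# Hypothetical toy patients: a flag is True when the patient MEETS the criterion.
PATIENTS = [
    {"id": 1, "no_chf": True, "no_infection": True, "hospitalized": False},
    {"id": 2, "no_chf": False, "no_infection": True, "hospitalized": True},
    {"id": 3, "no_chf": True, "no_infection": False, "hospitalized": True},
    {"id": 4, "no_chf": True, "no_infection": True, "hospitalized": False},
    {"id": 5, "no_chf": False, "no_infection": False, "hospitalized": True},
]
CRITERIA = ["no_chf", "no_infection"]


def cohort_pair(patients, combo):
    """Split patients into qualifying (Q) and non-qualifying (nQ) cohorts."""
    q = [p for p in patients if all(p[c] for c in combo)]
    nq = [p for p in patients if not all(p[c] for c in combo)]
    return q, nq


def event_proportion(cohort):
    """Share of the cohort hospitalized; a stand-in for the study's HR."""
    if not cohort:
        return None
    return sum(p["hospitalized"] for p in cohort) / len(cohort)


def evaluate(patients, criteria):
    """Step 3 and step 4: build a cohort pair per combination and summarize it."""
    results = {}
    for size in range(1, len(criteria) + 1):
        for combo in combinations(criteria, size):
            q, nq = cohort_pair(patients, combo)
            results[combo] = {
                "n_q": len(q),
                "n_nq": len(nq),
                "risk_q": event_proportion(q),
                "risk_nq": event_proportion(nq),
            }
    return results


results = evaluate(PATIENTS, CRITERIA)
for combo, summary in results.items():
    print(combo, summary)
```

When the criteria of interest are already known, `CRITERIA` would simply be set to that list, and the loop could be restricted to the specific combinations under consideration.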
Although the case studies focused on a more traditional clinical trial planning standpoint, the approach can be adapted for other trial types. In particular, this approach can assist in planning precision trials, which are designed to take into account dynamic changes within patients as they proceed in the trial [63]. One proposed feature of a precision trial is treatment-targeted enrichment. This feature involves measuring a short-term response after exposure to an intervention of interest and then tailoring the trial accordingly to those responses. For the purposes of this approach, adaptation would require building a cohort of individuals exposed to interventions that have a similar mechanism to the intervention of interest, and then using the responses to stratify the cohort in order to assess which eligibility criteria are most pertinent for which responses. In this situation, the responses would be the index date conditional on exposure to interventions similar to the one of interest. The result of this analysis would then allow an investigator to either: (1) inform criteria selection for the overall trial if interested in all responses; or (2) tailor the trial to those with the most promising responses and thus use the analysis to inform criteria selection for that group, all while taking into account safety and available patient counts.
For demonstration purposes, hospitalization was chosen as a safety outcome as it is considered a serious adverse event applicable to most disease domains [32]. However, the selection of this as a stand-in for a safety assessment comes with a few caveats. The first is that serious adverse events in clinical trials are typically reported as undesirable experiences that were likely a result of the intervention being studied. For this approach, the index events are not the intervention being studied per se, as the focus is on informing patient characteristics for a future trial of that intervention. The hospitalizations are therefore likely a result of alternative explanations, such as worsening health status related to particular comorbidities or the use of other medications or procedures (although the presumption is that using a studied intervention may worsen the hospitalization event). A possible way to better account for this concern is to index on interventions that have similar mechanisms to the future trial intervention of interest, but this strategy is not necessarily available for all trials, such as those assessing a novel drug mechanism. The second is that a potential trial may actually be interested in high-risk patients. For this focus, the approach can be revised so that the baseline cohort purposely targets such patients, while hospitalization can be replaced with another pertinent event of interest for that group, assuming the RWD source is fit-for-use. This pivot can, by extension, assist with addressing other safety issues that are more relevant for a potential future trial. Finally, the third caveat is the existence of expected hospitalizations unaccounted for during the follow-up assessment. This approach attempted to address those expected hospitalizations as best as possible, as exemplified by the baseline cohort construction for r/r lymphoma/leukemia and HCV, but data limitations can make it difficult to distinguish between expected and unexpected hospitalizations for some criteria (e.g., pregnancy-related criteria).
There are limitations to take into account, beyond the aforementioned sample considerations. Data quality concerns remain pertinent as EHR data can be rife with misleading entries or incomplete records, although the use of validated phenotypes hopefully mitigates this concern [50]. A particularly important EHR data quality concern is related to the capture of hospitalizations. Specifically, the CUIMC data did not necessarily capture hospitalizations that occur in other healthcare systems; likewise, CUIMC may have patient referral entries that only have a few instances for these patients, meaning that information on those patients may be incomplete for analysis. Beyond EHR data, ClinicalTrials.gov entries also had a data quality concern as temporal restrictions for criteria were infrequently provided. In particular, condition-related criteria often did not have a temporal restriction specified, ultimately requiring an assumed timeframe for cohort construction that may inaccurately reflect how these criteria were intended to be assessed. Another limitation is how the clinical trials were aggregated. Although they were grouped by similar disease domain, each trial has its own focus with its own nuances. For example, many of the CKD trials were tied to a specific kidney disease, so although all the trials in this group fit the CKD classification, this came at the expense of forgoing a more specific disease label. Finally, there are a few analytic limitations. The HRs provide hospitalization risk estimates that were unadjusted, meaning that there can be other clinical characteristics from patients that increased their risk of hospitalization beyond the examined criteria, such as another treatment regimen or a prior invasive surgery. 
Although such confounding may affect the HR estimates, the main focus of this approach is to examine incremental changes as different criteria are incorporated and to use those changes to identify criteria patterns, which still provides valuable insight for understanding criteria impact. Likewise, other safety considerations exist beyond hospitalizations; this methodology can accommodate those by replacing hospitalization with the assessment of interest. Additionally, the use of k-means clustering provides a straightforward methodology for identifying potential patterns, but changes to the underlying data can easily distort findings and lead to different interpretations.
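The k-means step can be illustrated with a minimal, dependency-free sketch. It clusters only the seven powered single criteria from Table 4 (not the 508 powered combinations analyzed in the study) on two features, qualifying patient count and HR. Several choices here are assumptions of the sketch rather than details confirmed by the study: z-score scaling of both features, k = 3, and deterministic seeding with the first k rows.

```python
import math

# (qualifying patients, hazard ratio) for the powered single criteria in Table 4.
POINTS = {
    "Adequate eGFR": (21284, 0.84),
    "No prior malignancy": (15592, 0.70),
    "No HIV": (23122, 2.14),
    "No HBV/HCV": (23266, 0.83),
    "No active infection": (19947, 0.82),
    "No prior corticosteroid use": (21076, 0.64),
    "No CHF": (21397, 0.55),
}


def standardize(rows):
    """Z-score each column so counts and HRs share a scale (an assumption)."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled_cols.append([(v - mean) / sd for v in col])
    return list(zip(*scaled_cols))


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; the first k rows seed the centroids for determinism."""
    centroids = [rows[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


names = list(POINTS)
labels = kmeans(standardize(list(POINTS.values())), k=3)
clusters = dict(zip(names, labels))
for name, label in clusters.items():
    print(label, name)
```

With this seeding, the sketch separates no HIV (the only criterion with HR > 1) and no prior malignancy (the smallest qualifying cohort) from the remaining five criteria. A different seed, scaling, or value of k can change the grouping, which is the sensitivity noted above.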
5. CONCLUSION
Using a sample of clinical trials from three diverse disease domains, this study contributes a novel approach for proactively assessing the potential impact of eligibility criteria on hospitalization risk and eligible patient counts. Through clustering analyses, criteria patterns can be identified and interpreted in order to better inform whether certain criteria should be re-evaluated or even applied. Ultimately, EHR data can be a useful resource for guiding clinical trial criteria selection in terms of balancing safety concerns and available patients for recruitment when planning future clinical trials.
Supplementary Material
HIGHLIGHTS.
This study presents an approach for helping select trial eligibility criteria.
It informs criteria’s impact on recruitment pool size and safety using EHR data.
It explores this approach through 3 case studies.
Case study findings demonstrate varying effects when choosing different criteria.
Funding
This research was funded by National Library of Medicine grants R01LM009886 (PI: Weng) and 5T15LM007079 (PI: Hripcsak). The funding agency was not involved in the study.
Footnotes
Conflict of Interest
All authors have no conflict of interest to disclose.
Ethical approval
This study has been approved by Columbia University IRB.
REFERENCES
- 1. Beaver JA, Ison G, Pazdur R. Reevaluating Eligibility Criteria - Balancing Patient Protection and Participation in Oncology Trials. N Engl J Med 2017;376:1504–5. doi: 10.1056/NEJMp1615879
- 2. Kim ES, Bernstein D, Hilsenbeck SG, et al. Modernizing Eligibility Criteria for Molecularly Driven Trials. J Clin Oncol 2015;33:2815–20. doi: 10.1200/JCO.2015.62.1854
- 3. Williams RJ, Tse T, DiPiazza K, et al. Terminated Trials in the ClinicalTrials.gov Results Database: Evaluation of Availability of Primary Outcome Data and Reasons for Termination. PLOS ONE 2015;10:e0127242. doi: 10.1371/journal.pone.0127242
- 4. Rothwell PM. External validity of randomised controlled trials: “to whom do the results of this trial apply?” Lancet 2005;365:82–93. doi: 10.1016/S0140-6736(04)17670-8
- 5. Duma N, Kothadia SM, Azam TU, et al. Characterization of Comorbidities Limiting the Recruitment of Patients in Early Phase Clinical Trials. Oncologist 2019;24:96–102. doi: 10.1634/theoncologist.2017-0687
- 6. He Z, Carini S, Sim I, et al. Visual aggregate analysis of eligibility features of clinical trials. J Biomed Inform 2015;54:241–55. doi: 10.1016/j.jbi.2015.01.005
- 7. Hao T, Rusanov A, Boland MR, et al. Clustering clinical trials with similar eligibility criteria features. Journal of Biomedical Informatics 2014;52:112–20. doi: 10.1016/j.jbi.2014.01.009
- 8. Luo Z, Miotto R, Weng C. A human-computer collaborative approach to identifying common data elements in clinical trial eligibility criteria. J Biomed Inform 2013;46:33–9. doi: 10.1016/j.jbi.2012.07.006
- 9. American Diabetes Association. 6. Glycemic Targets: Standards of Medical Care in Diabetes—2020. Diabetes Care 2020;43:S66–76. doi: 10.2337/dc20-S006
- 10. Khera R, Lu Y, Lu J, et al. Impact of 2017 ACC/AHA guidelines on prevalence of hypertension and eligibility for antihypertensive treatment in United States and China: nationally representative cross sectional study. BMJ 2018;362. doi: 10.1136/bmj.k2357
- 11. Lewington S, Clarke R, Qizilbash N, et al. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–13. doi: 10.1016/s0140-6736(02)11911-8
- 12. Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp Clin Trials Commun 2018;11:156–64. doi: 10.1016/j.conctc.2018.08.001
- 13. Jin S, Pazdur R, Sridhara R. Re-Evaluating Eligibility Criteria for Oncology Clinical Trials: Analysis of Investigational New Drug Applications in 2015. J Clin Oncol 2017;35:3745–52. doi: 10.1200/JCO.2017.73.4186
- 14. Weng C. Optimizing Clinical Research Participant Selection with Informatics. Trends Pharmacol Sci 2015;36:706–9. doi: 10.1016/j.tips.2015.08.007
- 15. Van Spall HGC, Toren A, Kiss A, et al. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review. JAMA 2007;297:1233–40. doi: 10.1001/jama.297.11.1233
- 16. Persad GC, Little RF, Grady C. Including Persons With HIV Infection in Cancer Clinical Trials. JCO 2008;26:1027–32. doi: 10.1200/JCO.2007.14.5532
- 17. Karim S, Xu Y, Kong S, et al. Generalisability of Common Oncology Clinical Trial Eligibility Criteria in the Real World. Clinical oncology (Royal College of Radiologists (Great Britain)) 2019;31:e160–6. doi: 10.1016/j.clon.2019.05.003
- 18. Malik L, Lu D. Eligibility criteria for phase I clinical trials: tight vs loose? Cancer Chemother Pharmacol 2019;83:999–1002. doi: 10.1007/s00280-019-03801-w
- 19. Shah JJ, Abonour R, Gasparetto C, et al. Analysis of Common Eligibility Criteria of Randomized Controlled Trials in Newly Diagnosed Multiple Myeloma Patients and Extrapolating Outcomes. Clin Lymphoma Myeloma Leuk 2017;17:575–583.e2. doi: 10.1016/j.clml.2017.06.013
- 20. Rahman NA, Ison G, Beaver JA. Broadening Eligibility Criteria for Oncology Clinical Trials: Current Advances and Future Directions. Clinical Pharmacology & Therapeutics 2020;108:419–21. doi: 10.1002/cpt.1919
- 21. Lichtman SM, Harvey RD, Damiette Smit M-A, et al. Modernizing Clinical Trial Eligibility Criteria: Recommendations of the American Society of Clinical Oncology-Friends of Cancer Research Organ Dysfunction, Prior or Concurrent Malignancy, and Comorbidities Working Group. J Clin Oncol 2017;35:3753–9. doi: 10.1200/JCO.2017.74.4102
- 22. Uldrick TS, Ison G, Rudek MA, et al. Modernizing Clinical Trial Eligibility Criteria: Recommendations of the American Society of Clinical Oncology-Friends of Cancer Research HIV Working Group. J Clin Oncol 2017;35:3774–80. doi: 10.1200/JCO.2017.73.7338
- 23. Roes KCB, van der Zande ISE, van Smeden M, et al. Towards an appropriate framework to facilitate responsible inclusion of pregnant women in drug development programs. Trials 2018;19:123. doi: 10.1186/s13063-018-2495-9
- 24. Herrera AP, Snipes SA, King DW, et al. Disparate Inclusion of Older Adults in Clinical Trials: Priorities and Opportunities for Policy and Practice Change. Am J Public Health 2010;100:S105–12. doi: 10.2105/AJPH.2009.162982
- 25. Makady A, de Boer A, Hillege H, et al. What Is Real-World Data? A Review of Definitions Based on Literature and Stakeholder Interviews. Value in Health 2017;20:858–65. doi: 10.1016/j.jval.2017.03.008
- 26. Evans SR, Paraoan D, Perlmutter J, et al. Real-World Data for Planning Eligibility Criteria and Enhancing Recruitment: Recommendations from the Clinical Trials Transformation Initiative. Ther Innov Regul Sci Published Online First: 3 January 2021. doi: 10.1007/s43441-020-00248-7
- 27. Melzer G, Maiwald T, Prokosch H-U, et al. Leveraging Real-World Data for the Selection of Relevant Eligibility Criteria for the Implementation of Electronic Recruitment Support in Clinical Trials. Appl Clin Inform 2021;12:17–26. doi: 10.1055/s-0040-1721010
- 28. Rogers JR, Lee J, Zhou Z, et al. Contemporary use of real-world data for clinical trial conduct in the United States: a scoping review. Journal of the American Medical Informatics Association 2021;28:144–54. doi: 10.1093/jamia/ocaa224
- 29. Shortreed SM, Rutter CM, Cook AJ, et al. Improving pragmatic clinical trial design using real-world data. Clinical trials (London, England) 2019;16:273–82. doi: 10.1177/1740774519833679
- 30. Kim JH, Ta CN, Liu C, et al. Towards clinical data-driven eligibility criteria optimization for interventional COVID-19 clinical trials. Journal of the American Medical Informatics Association 2021;28:14–22. doi: 10.1093/jamia/ocaa276
- 31. Liu R, Rizzo S, Whipple S, et al. Evaluating eligibility criteria of oncology trials using real-world data and AI. Nature 2021;592:629–33. doi: 10.1038/s41586-021-03430-5
- 32. What is a Serious Adverse Event? | FDA. https://www.fda.gov/safety/reporting-serious-problems-fda/what-serious-adverse-event (accessed 2 May 2021).
- 33. Liu H, Chi Y, Butler A, et al. A knowledge base of clinical trial eligibility criteria. Journal of Biomedical Informatics 2021;117:103771. doi: 10.1016/j.jbi.2021.103771
- 34. Tasneem A, Aberle L, Ananth H, et al. The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent Regrouping by Clinical Specialty. PLOS ONE 2012;7:e33677. doi: 10.1371/journal.pone.0033677
- 35. CommonDataModel: Definition and DDLs for the OMOP Common Data Model (CDM). Observational Health Data Sciences and Informatics 2018. https://github.com/OHDSI/CommonDataModel (accessed 5 Jan 2018).
- 36. Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform 2015;216:574–8.
- 37. About Us. UpToDate. https://www.uptodate.com/home/about-us (accessed 1 Jun 2021).
- 38. RxNav Home Page. RxNav. https://rxnav.nlm.nih.gov/ (accessed 1 Dec 2020).
- 39. Shang N, Khan A, Polubriaginof F, et al. Medical records-based chronic kidney disease phenotype for clinical care and “big data” observational and genetic studies. npj Digit Med 2021;4:1–13. doi: 10.1038/s41746-021-00428-1
- 40. Chen R, Ryan P, Natarajan K, et al. Treatment Patterns for Chronic Comorbid Conditions in Patients With Cancer Using a Large-Scale Observational Data Network. JCO Clinical Cancer Informatics 2020;:171–83. doi: 10.1200/CCI.19.00107
- 41. Wheless L, Baker L, Edwards L, et al. Development of Phenotyping Algorithms for the Identification of Organ Transplant Recipients: Cohort Study. JMIR Med Inform 2020;8. doi: 10.2196/18001
- 42. Phillips CA, Razzaghi H, Aglio T, et al. Development and evaluation of a computable phenotype to identify pediatric patients with leukemia and lymphoma treated with chemotherapy using electronic health record data. Pediatr Blood Cancer 2019;66:e27876. doi: 10.1002/pbc.27876
- 43. Paul DW, Neely NB, Clement M, et al. Development and validation of an electronic medical record (EMR)-based computed phenotype of HIV-1 infection. J Am Med Inform Assoc 2018;25:150–7. doi: 10.1093/jamia/ocx061
- 44. Tison GH, Chamberlain AM, Pletcher MJ, et al. Identifying Heart Failure using EMR-based algorithms. Int J Med Inform 2018;120:1–7. doi: 10.1016/j.ijmedinf.2018.09.016
- 45. Niu B, Forde KA, Goldberg DS. Coding algorithms for identifying patients with cirrhosis and hepatitis B or C virus using administrative data. Pharmacoepidemiol Drug Saf 2015;24:107–11. doi: 10.1002/pds.3721
- 46. Andrade SE, Toh S, Houstoun M, et al. Surveillance of Medication Use During Pregnancy in the Mini-Sentinel Program. Matern Child Health J 2016;20:895–903. doi: 10.1007/s10995-015-1878-8
- 47. Goldberg DS, Lewis JD, Halpern SD, et al. Validation of a coding algorithm to identify patients with hepatocellular carcinoma in an administrative database. Pharmacoepidemiol Drug Saf 2013;22:103–7. doi: 10.1002/pds.3367
- 48. Goldberg D, Lewis J, Halpern S, et al. Validation of a coding algorithm to identify patients with end-stage liver disease in an administrative database. Pharmacoepidemiol Drug Saf 2012;21:765–9. doi: 10.1002/pds.3290
- 49. Patkar NM, Curtis JR, Teng GG, et al. Administrative codes combined with medical records based criteria accurately identified bacterial infections among rheumatoid arthritis patients. Journal of Clinical Epidemiology 2009;62:321–327.e7. doi: 10.1016/j.jclinepi.2008.06.006
- 50. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013;20:144–51. doi: 10.1136/amiajnl-2011-000681
- 51. Stevens PE, Levin A, Kidney Disease: Improving Global Outcomes Chronic Kidney Disease Guideline Development Work Group Members. Evaluation and management of chronic kidney disease: synopsis of the kidney disease: improving global outcomes 2012 clinical practice guideline. Ann Intern Med 2013;158:825–30. doi: 10.7326/0003-4819-158-11-201306040-00007
- 52. Varettoni M, Tedeschi A, Arcaini L, et al. Risk of second cancers in Waldenström macroglobulinemia. Annals of Oncology 2012;23:411–5. doi: 10.1093/annonc/mdr119
- 53. Tsimberidou A-M, Wen S, McLaughlin P, et al. Other Malignancies in Chronic Lymphocytic Leukemia/Small Lymphocytic Lymphoma. J Clin Oncol 2009;27:904–10. doi: 10.1200/JCO.2008.17.5398
- 54. Ginès P, Fernández J, Durand F, et al. Management of critically-ill cirrhotic patients. Journal of Hepatology 2012;56:S13–24. doi: 10.1016/S0168-8278(12)60003-8
- 55. Solà E, Pose E, Campion D, et al. Endpoints and design of clinical trials in patients with decompensated cirrhosis: Position paper of the LiverHope Consortium. Journal of Hepatology 2021;74:200–19. doi: 10.1016/j.jhep.2020.08.009
- 56.Malyszko J, Tesarova P, Capasso G, et al. The link between kidney disease and cancer: complications and treatment. The Lancet 2020;396:277–87. doi: 10.1016/S0140-6736(20)30540-7 [DOI] [PubMed] [Google Scholar]
- 57.Roger VL. Epidemiology of Heart Failure. Circulation Research 2013;113:646–59. doi: 10.1161/CIRCRESAHA.113.300268 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Ahmed A, Campbell RC. Epidemiology of Chronic Kidney Disease in Heart Failure. Heart Fail Clin 2008;4:387–99. doi: 10.1016/j.hfc.2008.03.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.White C, Reschovsky JD, Bond AM. Understanding Differences Between High- And Low-Price Hospitals: Implications For Efforts To Rein In Costs. Health Affairs 2014;33:324–31. doi: 10.1377/hlthaff.2013.0747 [DOI] [PubMed] [Google Scholar]
- 60.Weinberger M, Oddone EZ, Henderson WG, et al. Multisite randomized controlled trials in health services research: scientific challenges and operational issues. Med Care 2001;39:627–34. doi: 10.1097/00005650-200106000-00010 [DOI] [PubMed] [Google Scholar]
- 61.Goodlett D, Hung A, Feriozzi A, et al. Site engagement for multi-site clinical trials. Contemporary Clinical Trials Communications 2020;19:100608. doi: 10.1016/j.conctc.2020.100608 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Huang GD, Bull J, Johnston McKee K, et al. Clinical trials recruitment planning: A proposed framework from the Clinical Trials Transformation Initiative. Contemporary Clinical Trials 2018;66:74–9. doi: 10.1016/j.cct.2018.01.003 [DOI] [PubMed] [Google Scholar]
- 63.Lenze EJ, Rodebaugh TL, Nicol GE. A Framework for Advancing Precision Medicine in Clinical Trials for Mental Disorders. JAMA Psychiatry 2020;77:663–4. doi: 10.1001/jamapsychiatry.2020.0114 [DOI] [PubMed] [Google Scholar]