Abstract
Overly restrictive and poorly designed eligibility criteria reduce the generalizability of clinical trial results. We conducted a study to identify and quantify the impact of study traits extracted from eligibility criteria on the age of study populations in Alzheimer’s Disease (AD) clinical trials. Using machine learning methods and SHapley Additive exPlanation (SHAP) values, we identified 30 and 34 study traits that excluded older patients from AD trials in our two target populations, respectively. We also found that the magnitude of each study trait’s impact on the age distribution of the generated study populations differed across racial-ethnic groups. To the best of our knowledge, this is the first study to quantify the impact of eligibility criteria on the age of AD trial participants. Our research is a first step toward addressing overly restrictive eligibility criteria in AD clinical trials.
Introduction
Clinical trials, especially randomized controlled trials (RCTs), are the gold standard for assessing drug treatment effects and safety.1 When designing clinical trials, three patient populations are of interest (Figure 1). The target population consists of the patients to whom the trial findings will be applied, the study population consists of the patients who meet the trial eligibility criteria, and the study sample consists of the patients enrolled in the trial. Ideally, the study sample is representative of the study population, which in turn is representative of the target population. However, clinical trial investigators and sponsors tend to focus on assessing efficacy and safety in the study sample (i.e., internal validity) and pay little attention to trial generalizability (i.e., external validity), which reflects how well trial findings apply to the target population.2 As a result, clinical trials often adopt overly strict eligibility criteria in the hope of maximizing efficacy while minimizing adverse outcomes, yielding study samples that are less representative of the real-world patients who need the treatments.3–5 To address the low generalizability of clinical trials, regulatory agencies, including the FDA,6 have issued guidance on broadening trial eligibility criteria to increase the diversity of clinical trial study populations.
Figure 1.

Population in clinical trials
Although older adults are more likely to use prescription drugs and therapies, they are often excluded from, and thus underrepresented in, clinical trials, including Alzheimer’s Disease (AD) trials,7,8 because of overly restrictive eligibility criteria.9–13 For example, Hutchins et al. reported that patients over 65 years old were underrepresented in cancer treatment trials.11 A recent study by Hsu et al. found that older rectal cancer patients were more likely to be excluded from RCTs.12 Older adults are also excluded from AD drug trials despite being the primary target population of these drugs; AD patients over 80 years old were reported to be rarely included in AD drug trials.7 Gill et al. found that fewer than half of older donepezil users (donepezil is an FDA-approved medication for AD) were eligible for the trials of that medication.8 There is therefore a need to explore and measure the impact of eligibility criteria on the population representativeness of clinical trials.
In the past decade, the widespread adoption of electronic health records (EHRs) has made real-world clinical data available for investigating the influence of eligibility criteria on trial participants.5,14–18 EHRs are real-time, patient-centered medical records that can be accessed instantly and securely by authorized users.19 EHRs have been widely used both to develop eligibility criteria20,21 and to study population representativeness22 in clinical trials. In a global Phase III endocrinology study, the stakeholders used EHRs to determine that an eligibility criterion based on a particular background medication excluded a significant proportion of patients from the trial.23 EHRs thus provide an opportunity to examine the underrepresentation of older adults in AD trials with data-driven methods and to identify potential solutions.
In this study, we used data-driven algorithms with EHRs to measure the impact of altering eligibility criteria on the age representativeness of AD clinical trial populations. Our primary goal was to identify eligibility criteria that explicitly exclude older patients from AD trials and to assess their impact on the age distribution of the study population. One challenge in assessing the eligibility criteria of clinical trials is the large number of complex criteria, which differ even among trials in the same category. To address this challenge, we reviewed the eligibility criteria of all clinical trials assessing the safety and efficacy of FDA-approved AD drugs, decomposed them into study traits, and analyzed how each study trait affected the age of the study population. Study traits included demographics (e.g., age, race-ethnicity), diagnoses (e.g., history of diabetes), lab test results (e.g., fasting glucose), and other characteristics used in the eligibility criteria to define the study population. For example, the eligibility criterion “the patients must be over 65 years old and have a history of diabetes” was decomposed into the study traits age and diabetes. The second challenge was the complex interactions among the large number of study traits and the need to isolate each study trait’s impact on the age distribution of the study population. To address this challenge, we experimented with multiple machine learning methods to associate the study traits with the age of the study population, and we used SHapley Additive exPlanation (SHAP) values to attribute the impact on age to each study trait. SHAP is an artificial intelligence (AI) explanation tool based on Shapley values (SVs).
The SV was proposed as a measure of each player’s contribution to the payoff of a cooperative game.24 By analogy, SHAP values attribute the change in average age between a hypothetical study population and the target population (the payoff) to each study trait (a player on the team) when that study trait is used as an eligibility criterion. With this approach, we identified study traits that exclude older AD patients from clinical trials. To the best of our knowledge, this is the first study to identify and quantify the impact of eligibility criteria on the age of AD clinical trial participants.
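To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a toy two-player game by averaging each player’s marginal contribution over every order in which players can join the coalition. The players "A" and "B" and the payoff table are hypothetical stand-ins for study traits and the resulting reduction in mean age; they are illustrative numbers, not data from this study.

```python
from itertools import permutations


def shapley_by_permutations(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every order in which the players can join the coalition.

    `value` maps a frozenset of players (a coalition) to its payoff.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical payoffs: reduction in mean age (years) when the traits in the
# coalition are used together as exclusion criteria. Illustrative only.
PAYOFF = {
    frozenset(): 0.0,
    frozenset({"A"}): 2.0,
    frozenset({"B"}): 1.0,
    frozenset({"A", "B"}): 4.0,
}

phi = shapley_by_permutations(["A", "B"], PAYOFF.__getitem__)
print(phi)  # {'A': 2.5, 'B': 1.5}
```

The two values sum to the payoff of the full coalition (4.0), which is the efficiency property that lets SHAP split a total change in predicted age across study traits.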
Methods
Alzheimer’s Disease trial eligibility criteria and study traits
We reviewed the eligibility criteria of all US-based AD trials of the FDA-approved drugs donepezil, galantamine, rivastigmine, and memantine registered on ClinicalTrials.gov to create study traits. All 14 Phase III and Phase IV AD trials assessing the safety and efficacy of these 4 drugs were included. From the 14 AD trials, we extracted 234 eligibility criteria and generated a library of 204 unique study traits, of which 150 were computable (i.e., identifiable in EHRs). Two computable study traits, cardiovascular disease and acetylcholinesterase inhibitor, appeared in more than half of the 14 AD trials (n = 10 and 8, respectively), whereas most traits appeared in only one AD trial.
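The trait-frequency count described above can be sketched as a simple tally over a trial-to-trait mapping. The mapping below is hypothetical (three made-up trials, not the 14 reviewed trials); it only illustrates how traits used by more than half of the trials would be identified.

```python
from collections import Counter

# Hypothetical trial-to-trait mapping (illustrative; not the 14 reviewed trials).
TRIAL_TRAITS = {
    "trial_1": {"cardiovascular disease", "acetylcholinesterase inhibitor", "hypertension"},
    "trial_2": {"cardiovascular disease", "renal failure"},
    "trial_3": {"cardiovascular disease", "acetylcholinesterase inhibitor"},
}


def trait_counts(trial_traits):
    """Number of trials whose eligibility criteria use each study trait."""
    return Counter(t for traits in trial_traits.values() for t in traits)


def traits_in_majority(trial_traits):
    """Study traits used by more than half of the trials, with their counts."""
    n_trials = len(trial_traits)
    return {t: c for t, c in trait_counts(trial_traits).items() if c > n_trials / 2}


majority = traits_in_majority(TRIAL_TRAITS)
print(majority)
```

Storing each trial’s traits as a set ensures a trait is counted at most once per trial even if several of its criteria mention it.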
Data source and study cohorts
We obtained real-world data from 2012 to 2021 from the OneFlorida+ network, a large clinical research network (CRN) in PCORnet, the national network funded by the Patient-Centered Outcomes Research Institute (PCORI).25 OneFlorida+ contains robust longitudinal, linked patient-level data, including EHRs, Medicaid claims, cancer registries, and vital statistics, for approximately 16.8 million patients in Florida, 2.1 million in Georgia (via Emory University), and 9.8 thousand in Alabama (via the University of Alabama at Birmingham). The OneFlorida+ data are a Health Insurance Portability and Accountability Act (HIPAA) limited data set containing detailed patient-level variables, including demographics, encounters, diagnoses, procedures, vitals, medications, and laboratory results.
We defined two AD trial target populations in OneFlorida+: (1) AD patients, i.e., all patients diagnosed with AD; and (2) AD patients on medication, i.e., patients diagnosed with AD and taking one of the 4 AD medications (donepezil, galantamine, rivastigmine, or memantine). In the OneFlorida+ EHRs, AD diagnoses were identified using ICD-9-CM and ICD-10-CM codes (ICD-9-CM: 331.0; ICD-10-CM: G30.0, G30.1, G30.8, and G30.9), and AD medications were identified using their corresponding National Drug Code (NDC) and RxNorm concept unique identifier (RxCUI) codes. This study was approved by the University of Florida Institutional Review Board (IRB).
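A minimal sketch of how the two target populations could be assembled from diagnosis and prescription records, assuming simple (patient_id, code) pairs. The ICD codes are the ones listed above; the medication identifiers are hypothetical placeholders, since the actual NDC and RxCUI code lists are not given in the text, and the record layout is likewise an assumption rather than the OneFlorida+ schema.

```python
# ICD codes for AD as listed in the Methods; dots are stripped so that
# "G30.9" and "G309" match. ICD-9 and ICD-10 codes share one set here
# because none of these codes collide across the two systems.
AD_ICD_CODES = {"3310", "G300", "G301", "G308", "G309"}

# Hypothetical placeholders for the NDC/RxCUI codes of donepezil, galantamine,
# rivastigmine, and memantine; the actual code lists are not reproduced here.
AD_MEDICATION_CODES = {
    "HYPOTHETICAL_DONEPEZIL",
    "HYPOTHETICAL_GALANTAMINE",
    "HYPOTHETICAL_RIVASTIGMINE",
    "HYPOTHETICAL_MEMANTINE",
}


def normalize_icd(code):
    """Uppercase an ICD code and drop the dot and surrounding whitespace."""
    return code.strip().replace(".", "").upper()


def build_target_populations(diagnoses, prescriptions):
    """Return (AD patients, AD patients on medication) as sets of patient IDs.

    diagnoses: iterable of (patient_id, icd_code) pairs
    prescriptions: iterable of (patient_id, drug_code) pairs
    """
    ad = {pid for pid, code in diagnoses if normalize_icd(code) in AD_ICD_CODES}
    on_drug = {pid for pid, code in prescriptions if code in AD_MEDICATION_CODES}
    return ad, ad & on_drug


diagnoses = [("p1", "G30.9"), ("p2", "331.0"), ("p3", "I10"), ("p1", "I48.91")]
prescriptions = [("p2", "HYPOTHETICAL_MEMANTINE"), ("p3", "HYPOTHETICAL_DONEPEZIL")]
ad_patients, ad_on_medication = build_target_populations(diagnoses, prescriptions)
```

Patient p3 takes a listed drug but has no AD diagnosis, so the intersection keeps them out of the "AD patients on medication" population, matching the definition above.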
Data analysis
The primary outcome of our study was the age distribution of each of the two target populations. The predictors were the 150 computable study traits identified in EHRs, each coded as a binary variable (0 = condition not present; 1 = condition present) indicating whether a patient had the condition described by that study trait.
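The binary coding of predictors can be sketched as follows, assuming each patient’s identified conditions are available as a set of trait names. The three-trait list is a hypothetical subset standing in for the 150 computable study traits.

```python
# Hypothetical subset of the 150 computable study traits, in a fixed column order.
STUDY_TRAITS = ["atrial fibrillation", "hypertension", "renal failure"]


def encode_traits(patient_conditions, traits=STUDY_TRAITS):
    """Map each patient's set of conditions to a 0/1 vector (1 = condition present)."""
    return {
        pid: [1 if trait in conditions else 0 for trait in traits]
        for pid, conditions in patient_conditions.items()
    }


X = encode_traits({"p1": {"hypertension", "renal failure"}, "p2": set()})
print(X)  # {'p1': [0, 1, 1], 'p2': [0, 0, 0]}
```

Fixing the column order once keeps every patient’s vector aligned with the same study traits, which is what the regression models and SHAP attributions rely on.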
First, we explored which machine learning model best captured the associations between the age distribution of the target populations and the study traits. We compared four regression models: linear regression (LR), support vector regression (SVR), the extreme gradient boosting (XGBoost) regressor, and the AdaBoost regressor. SVR is a supervised machine learning method based on Vapnik-Chervonenkis (support vector) theory.26 Unlike other regressors, SVR ignores errors that fall within a predefined threshold (the ε-insensitive margin) when fitting the model, which can improve its accuracy and generalizability. The XGBoost regressor is a gradient boosting-based ensemble learning method that replaces the exact greedy algorithm with an efficient approximation algorithm for tree split selection.27 XGBoost is widely known in clinical research for its efficiency and high accuracy compared with other tree-based machine learning methods.28 AdaBoost is an ensemble learning method that uses relative errors and a reweighting mechanism to improve its performance on hard-to-predict samples.29 AdaBoost performs well when the prediction subgroups overlap heavily. We randomly split each AD target population into training and test sets at a 2:1 ratio. The four regressors were trained on the training set and evaluated on the test set.
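The evaluation protocol (a random 2:1 split, then MSE and R2 on the held-out third) can be sketched in plain Python. The helper names below are our own illustrative re-implementations, not the code used in the study, and the fixed seed is an assumption added for reproducibility.

```python
import random


def split_two_to_one(rows, seed=0):
    """Shuffle rows and split them roughly 2:1 into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted ages."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


train, test = split_two_to_one(range(30))
print(len(train), len(test))  # 20 10
print(mean_squared_error([80, 70], [78, 74]))  # 10.0
```

In practice the four regressors are available as `LinearRegression`, `SVR`, and `AdaBoostRegressor` in scikit-learn and `XGBRegressor` in the xgboost package; the text does not state which implementations or hyperparameters were used. Note that on a shared test set, R2 = 1 − MSE / Var(y_test), so the two metrics rank models identically.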
Second, to quantify the impact on the age of our AD target populations of applying each study trait as an exclusion or inclusion criterion, we calculated the SHAP value of each study trait. All SHAP values were calculated from the best-performing model using equation (1), as described in the literature.30 In short, the SHAP value attributes the change in the model output to each predictor as
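$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)\right] \tag{1}$$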
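where $\phi_i$ is the SHAP value of predictor $i$; $F$ is the set of all predictors; $S$ is a subset of $F$ that does not contain $i$; $i$ is a single predictor (i.e., a study trait in our study); $x_S$ denotes the values of the predictors in $S$; and $f_S(x_S)$ is the model output when only the predictors in $S$ are known.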
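Equation (1) can be evaluated literally for a tiny model by enumerating every subset $S$ of the other predictors. The sketch below assumes that $f_S(x_S)$ is approximated by setting predictors outside $S$ to a baseline value (0, condition absent); the SHAP library instead takes expectations over background data and, for tree models such as XGBoost, uses its TreeExplainer algorithm, because exact enumeration is exponential in $|F|$ and impractical for 150 traits. The two-trait age model is hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shap(f, x, baseline):
    """Exact SHAP values from equation (1) by enumerating subsets S of the
    predictors other than i.

    Assumption: f_S(x_S) is approximated by evaluating f with every predictor
    outside S set to its baseline value (here, 0 = condition absent).
    """
    n = len(x)

    def f_subset(S):
        return f([x[k] if k in S else baseline[k] for k in range(n)])

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                S = set(subset)
                # Shapley weight |S|! (|F| - |S| - 1)! / |F|!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (f_subset(S | {i}) - f_subset(S))
        phi.append(total)
    return phi


def toy_age_model(z):
    """Hypothetical predicted age from two binary study traits."""
    return 80 - 2 * z[0] - 1 * z[1] - 1 * z[0] * z[1]


x, baseline = [1, 1], [0, 0]
phi = exact_shap(toy_age_model, x, baseline)
print(phi)  # [-2.5, -1.5]
```

The interaction term (−1 year) is split evenly between the two traits, and the values add up to f(x) − f(baseline) = 76 − 80 = −4.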
Because SHAP values are calculated at the individual patient level, we aggregated them across patients, converting the local explanations (per patient, per study trait) into a global explanation (per study trait) using equation (2). Briefly, for each study trait, the global SHAP value was calculated as the difference between the median SHAP value of patients with the study trait (condition present) and the median SHAP value of patients without the study trait (condition not present). The larger the absolute global SHAP value of a study trait, the greater its impact on the age of the participants. A global SHAP value greater than zero indicates that the average age of the study population would be (1) lower than that of the target population if the study trait were used as an exclusion criterion, or (2) higher than that of the target population if the study trait were used as an inclusion criterion. Conversely, a global SHAP value less than zero indicates that the average age of the study population would be (1) higher than that of the target population if the study trait were used as an exclusion criterion, or (2) lower than that of the target population if the study trait were used as an inclusion criterion.
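$$\mathrm{SHAP}_{\mathrm{global}} = \operatorname{median}\left\{\mathrm{SHAP}_{\mathrm{local},i}\left(x_{\text{study trait}} = 1\right)\right\} - \operatorname{median}\left\{\mathrm{SHAP}_{\mathrm{local},j}\left(x_{\text{study trait}} = 0\right)\right\} \tag{2}$$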
where $\mathrm{SHAP}_{\mathrm{global}}$ is the global SHAP value of a study trait; $\{\mathrm{SHAP}_{\mathrm{local},i}(x_{\text{study trait}} = 1)\}$ is the set of local SHAP values of patients $i$ whose value for the study trait is 1; and $\{\mathrm{SHAP}_{\mathrm{local},j}(x_{\text{study trait}} = 0)\}$ is the set of local SHAP values of patients $j$ whose value for the study trait is 0.
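Equation (2) reduces to a difference of two medians. The sketch below applies it to one study trait, assuming the per-patient SHAP values and the trait’s 0/1 column are already available as parallel lists; the numbers are made up for illustration.

```python
from statistics import median


def global_shap(local_shap, trait_values):
    """Equation (2): median local SHAP value among patients with the trait (1)
    minus the median among patients without it (0).

    A positive result means that excluding patients with the trait would
    lower the average age of the study population.
    """
    with_trait = [s for s, v in zip(local_shap, trait_values) if v == 1]
    without_trait = [s for s, v in zip(local_shap, trait_values) if v == 0]
    return median(with_trait) - median(without_trait)


# Hypothetical local SHAP values (years) for six patients and one study trait.
local = [3.0, 2.0, 4.0, -0.5, -1.0, 0.0]
has_trait = [1, 1, 1, 0, 0, 0]
print(global_shap(local, has_trait))  # 3.5
```

Using medians rather than means keeps a few patients with extreme SHAP values from dominating the trait-level summary.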
We examined the impact of the study traits on the age of the two AD target populations overall and stratified by race-ethnicity. The non-Hispanic other group was excluded from the stratified analysis because of its small sample size. All analyses were performed using Python 3.8.
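The stratified analysis can be sketched by computing the equation (2) median difference separately within each racial-ethnic group. The record layout (group, local SHAP value, trait value) is an assumption for illustration; only the exclusion of the non-Hispanic other group is stated in the text, so other exclusions (for example, unknown race-ethnicity) would need to be added to the hypothetical `EXCLUDED_GROUPS` set if intended.

```python
from collections import defaultdict
from statistics import median

# Stated in the Methods: excluded because of small sample size.
EXCLUDED_GROUPS = {"Non-Hispanic Other"}


def stratified_global_shap(records):
    """records: iterable of (race_ethnicity, local_shap, trait_value) for one trait.

    Returns the equation (2) global SHAP value computed within each group.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for group, shap_value, trait_value in records:
        if group in EXCLUDED_GROUPS:
            continue
        groups[group][trait_value].append(shap_value)
    return {
        g: median(vals[1]) - median(vals[0])
        for g, vals in groups.items()
        if vals[0] and vals[1]  # need patients both with and without the trait
    }


records = [
    ("Hispanic", 2.0, 1), ("Hispanic", 0.0, 0),
    ("Non-Hispanic Black", 1.0, 1), ("Non-Hispanic Black", -1.0, 0),
    ("Non-Hispanic Other", 5.0, 1), ("Non-Hispanic Other", 0.0, 0),
]
by_group = stratified_global_shap(records)
print(by_group)  # {'Hispanic': 2.0, 'Non-Hispanic Black': 2.0}
```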
Results
We extracted data on 130,146 AD patients and 20,579 AD patients on medication from OneFlorida+ between January 2012 and July 2021. The demographic characteristics of the two AD target populations are shown in Table 1.
Table 1.
Demographic information of the AD target populations.
| Characteristic | AD patients (n = 130,146) | AD patients on medication (n = 20,579) |
|---|---|---|
| Age, mean (SD), years | 83.5 (9.5) | 77.6 (9.8) |
| Sex | | |
| Female | 88,235 (67.8%) | 13,130 (63.8%) |
| Male | 41,917 (32.2%) | 7,448 (36.2%) |
| Unknown | 10 (<0.1%) | 1 (<0.1%) |
| Race-ethnicity | | |
| Non-Hispanic White | 63,916 (49.1%) | 8,410 (40.9%) |
| Non-Hispanic Black | 15,558 (12.0%) | 2,814 (13.7%) |
| Non-Hispanic Other | 1,388 (1.1%) | 285 (1.4%) |
| Hispanic | 32,916 (25.3%) | 5,312 (25.8%) |
| Unknown | 16,384 (12.6%) | 3,758 (18.3%) |
When examining the associations between the age distribution of the target populations and the study traits, XGBoost (MSE = 54.14, R2 = 0.315) outperformed the other three regressors in terms of mean squared error (MSE) and R2 score (LR: MSE = 70.55, R2 = 0.262; SVR: MSE = 65.83, R2 = 0.315; AdaBoost: MSE = 66.99, R2 = 0.315). We therefore used XGBoost for the remaining analyses. In general, the machine learning methods (SVR, XGBoost, and AdaBoost) produced smaller MSE and higher R2 values than LR in estimating patients’ age, suggesting that machine learning models outperformed linear regression in capturing the association between the age distribution of the target populations and the study traits. Unlike statistical models, these machine learning models can account for interactions between study traits without specifying them explicitly, which would be hard to implement with a statistical model such as LR given the 150 study traits included in our analysis.
For the AD patients and AD patients on medication target populations, we identified 30 and 34 study traits, respectively, all originating from exclusion criteria, that would exclude older patients from AD trials if used as eligibility criteria (i.e., patients with the trait would not be allowed to participate in AD trials) (Figure 2). In Figure 2, each bar represents the global SHAP value of the corresponding study trait, and study traits are ranked in decreasing order of global SHAP value. For the target population of AD patients (Figure 2A), the top study traits that would exclude older patients if used as exclusion criteria were deafness and hearing impairment (age reduction = 2.99 years), atrial fibrillation (2.11 years), renal failure (1.02 years), heart block (0.96 years), and hypertension (0.93 years). For the target population of AD patients on medication (Figure 2B), the top study traits that would exclude older patients were atrial fibrillation (age reduction = 2.72 years), degenerative brain disorder (1.28 years), heart block (1.20 years), deafness and hearing impairment (1.14 years), renal failure (1.14 years), and blindness and visual impairment (0.96 years).
Figure 2.
The global SHAP values of the study traits that would exclude older AD patients in 2 target populations.
We calculated the unadjusted prevalence of the study traits that would exclude older patients in our target populations (Table 2). In the target population of AD patients, 7 of the 10 study traits with the largest global SHAP values (atrial fibrillation, renal failure, hypertension, hypothyroidism, cardiovascular disease, urinary tract infection, and history of mental disorder) were highly prevalent (prevalence > 10%). In the target population of AD patients on medication, 7 of the top 10 study traits were also highly prevalent (atrial fibrillation, degenerative brain disorder, heart block, renal failure, diseases of the circulatory system, urinary tract infection, and history of mental disorder).
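The "7 of the top 10" counts can be checked directly against Table 2. The sketch below copies the prevalence percentages of the 10 highest-ranked study traits in each target population from Table 2 and applies the > 10% threshold used in the text.

```python
# Unadjusted prevalence (%) of the 10 study traits with the largest global
# SHAP values, copied from Table 2 in rank order.
AD_TOP10 = {
    "Deafness and hearing impairment": 3.0,
    "Atrial fibrillation": 21.2,
    "Renal failure": 37.2,
    "Heart block": 9.2,
    "Hypertension": 57.8,
    "Hypothyroidism": 18.4,
    "Cardiovascular disease": 60.4,
    "Urinary tract infection": 39.9,
    "Blindness and visual impairment": 1.6,
    "History of mental disorder": 68.0,
}
AD_MED_TOP10 = {
    "Atrial fibrillation": 18.4,
    "Degenerative brain disorder": 49.4,
    "Heart block": 11.2,
    "Deafness and hearing impairment": 3.7,
    "Renal failure": 34.7,
    "Blindness and visual impairment": 1.5,
    "Cerebral trauma": 3.2,
    "Diseases of the circulatory system": 68.1,
    "Urinary tract infection": 35.3,
    "History of mental disorder": 63.4,
}


def highly_prevalent(prevalence, threshold=10.0):
    """Study traits whose prevalence exceeds the threshold (in percent)."""
    return [trait for trait, pct in prevalence.items() if pct > threshold]


print(len(highly_prevalent(AD_TOP10)), len(highly_prevalent(AD_MED_TOP10)))  # 7 7
```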
Table 2.
Unadjusted prevalence of the conditions described by the study traits in the two target populations.
| AD patients: study trait | Patients with trait, n (%) | AD patients on medication: study trait | Patients with trait, n (%) |
|---|---|---|---|
| Deafness and hearing impairment | 3,961 (3.0%) | Atrial fibrillation | 3,787 (18.4%) |
| Atrial fibrillation | 27,535 (21.2%) | Degenerative brain disorder | 10,160 (49.4%) |
| Renal failure | 48,446 (37.2%) | Heart block | 2,302 (11.2%) |
| Heart block | 12,004 (9.2%) | Deafness and hearing impairment | 759 (3.7%) |
| Hypertension | 75,228 (57.8%) | Renal failure | 7,148 (34.7%) |
| Hypothyroidism | 23,894 (18.4%) | Blindness and visual impairment | 317 (1.5%) |
| Cardiovascular disease | 78,570 (60.4%) | Cerebral trauma | 654 (3.2%) |
| Urinary tract infection | 51,934 (39.9%) | Diseases of the circulatory system | 14,022 (68.1%) |
| Blindness and visual impairment | 2,102 (1.6%) | Urinary tract infection | 7,261 (35.3%) |
| History of mental disorder | 88,445 (68.0%) | History of mental disorder | 13,039 (63.4%) |
| Intolerance to lactose | 133 (0.1%) | Opioid medications | 13,497 (65.6%) |
| Diseases of the circulatory system | 92,230 (70.9%) | Disorder of coronary artery | 5,681 (27.6%) |
| Atrioventricular block | 2,877 (2.2%) | Cardiovascular disease | 11,837 (57.5%) |
| Disorder of coronary artery | 37,034 (28.5%) | Gastric ulcer | 427 (2.1%) |
| Sick sinus syndrome | 4,861 (3.7%) | Hypothyroidism | 3,534 (17.2%) |
| Disorder of respiratory system | 65,846 (50.6%) | Hypertension | 11,954 (58.1%) |
| Degenerative brain disorder | 67,081 (51.5%) | Use vitamin | 3,717 (18.1%) |
| Use vitamin | 7,907 (6.1%) | Disorder of respiratory system | 9,964 (48.4%) |
| Cardiac dysrhythmias | 23,034 (17.7%) | Sick sinus syndrome | 856 (4.2%) |
| Pregnant | 47 (<0.1%) | Kidney disease | 7,836 (38.1%) |
| Use vitamin B12 | 9,105 (7.0%) | Incontinence of feces | 247 (1.2%) |
| Abnormal cardiac conduction | 8,151 (6.3%) | Macrocytic anemia | 625 (3.0%) |
| Cerebral trauma | 3,782 (2.9%) | Neoplasm of prostate | 322 (1.6%) |
| Transient ischemic attack | 8,119 (6.2%) | Disorder of thyroid | 3,947 (19.2%) |
| Space-occupying lesion of brain | 1,242 (1.0%) | Bronchodilator medications | 7,204 (35.0%) |
| Disorder of digestive system | 62,337 (47.9%) | Cardiac dysrhythmias | 3,776 (18.3%) |
| Duodenal ulcer | 1,535 (1.2%) | Myocardial infarction | 2,713 (13.2%) |
| Kidney disease | 51,880 (39.9%) | Warfarin | 1,236 (6.0%) |
| Substance abuse | 5,060 (3.9%) | Virus infections of CNS | 12 (0.1%) |
| Cerebral infarction | 15,177 (11.7%) | Disorder of lung | 3,822 (18.6%) |
Figure 3 summarizes the results of the race-ethnicity-stratified analysis, showing the exclusion criteria with the 10 largest global SHAP values among non-Hispanic white, non-Hispanic black, and Hispanic patients in the 2 AD target populations.
Figure 3.
Study traits with the 10 largest global SHAP values in different racial-ethnic groups.
In non-Hispanic white patients, deafness and hearing impairment (age reduction = 3.00 years) had the largest global SHAP value in the AD patient population, followed by atrial fibrillation (2.31 years) and blindness and visual impairment (1.20 years); in AD patients on medication, cardiovascular disease had the largest impact on patients’ age (age reduction = 2.66 years), followed by atrial fibrillation (1.21 years) and degenerative brain disorder (1.07 years).
In non-Hispanic black patients, cardiovascular disease (age reduction = 2.09 years) had the largest global SHAP value in the AD patient population, followed by deafness and hearing impairment (1.94 years) and degenerative brain disorder (1.14 years); in AD patients on medication, atrial fibrillation had the largest impact on patients’ age (age reduction = 2.79 years), followed by macrocytic anemia (1.52 years) and renal failure (1.11 years).
In Hispanic patients, atrial fibrillation had the largest global SHAP value in both target populations. In AD patients (age reduction = 1.82 years), it was followed by deafness and hearing impairment (1.34 years) and renal failure (1.28 years); in AD patients on medication (age reduction = 4.13 years), it was followed by heart block (1.65 years) and degenerative brain disorder (0.88 years).
CNS: central nervous system.
Discussion
In this study, we identified and quantified the impact of 30 and 34 study traits that could exclude older patients if used as exclusion criteria in the target populations of AD patients and AD patients on medication, respectively. Most of the identified study traits were shared by the two target populations, with 19 study traits appearing in both. Additionally, the magnitude of each study trait’s impact on the age distribution of the generated study populations, when used as an exclusion criterion, differed across racial-ethnic groups.
Among the study traits extracted from the 14 clinical trials, the top study traits common to both target populations that had a large impact on age (i.e., would lower the age of AD trial participants if used as exclusion criteria) included atrial fibrillation, degenerative brain disorder, heart block, hypertension, hypothyroidism, cardiovascular disease, urinary tract infection, diseases of the circulatory system, and history of mental disorder. Using these study traits as exclusion criteria in AD clinical trials would greatly reduce the age representativeness of the study population. In addition, some study traits (e.g., warfarin, macrocytic anemia, intolerance to lactose) had a large effect (a large global SHAP value) on age in certain racial-ethnic groups but not in the overall population. Using such traits as exclusion criteria could exclude older AD patients only in certain racial-ethnic groups, which could result in selection bias and underrepresentation of those groups in AD clinical trials. Thus, such study traits should be avoided where possible when designing exclusion criteria.
Cardiovascular disease was the most common study trait in our study, appearing in 10 of the 14 AD clinical trials. Overall, its global SHAP value ranked 7th in the AD patient population and 13th in the AD patients on medication population. Among racial-ethnic groups, the global SHAP value of cardiovascular disease ranked 1st in the AD patient population among non-Hispanic black patients, 1st in the AD patients on medication population among non-Hispanic white patients, and 4th in the AD patients on medication population among Hispanic and non-Hispanic black patients.
In addition, we found that most study traits were not shared across the 14 AD clinical trials: only two study traits (cardiovascular disease and acetylcholinesterase inhibitor) appeared in more than half of the trials, while most traits appeared in only one trial. This finding suggests great variability in the eligibility criteria of different AD trials. Whether this variability is necessary, and whether the eligibility criteria of AD clinical trials should be standardized, remains to be studied.
Our study does not seek to replace the role of clinicians in trial design, but rather to lay a foundation for informatics tools that support data-driven, evidence-based trial design with better generalizability. Through generalizability assessment, we aim to prompt trialists to balance internal and external validity by adjusting trial eligibility criteria. For example, hypertension is a widely adopted exclusion criterion that may have little effect on trial safety and outcomes; based on our findings, it may be worth reconsidering in trial design to improve trial generalizability.
Our analysis has the following limitations. First, the accuracy of our conclusions may be affected by inaccurate or nonspecific ICD codes. Misclassification due to such codes is a known limitation of EHR data and could lead to misidentification of the conditions described by study traits among AD patients. Second, our conclusions may not generalize to AD patients in states other than Florida. Our analysis was conducted with AD patients in Florida, who may have different health baselines from patients nationwide.
Conclusion
In this study, we measured the impact of 150 study traits derived from 14 clinical trials of 4 FDA-approved AD medications. We identified 30 and 34 study traits that, when used as exclusion criteria, could lower the age of the study population (i.e., exclude older patients) relative to our two target populations, respectively. Additionally, we found that some study traits had differential impacts on the age of the study population across racial-ethnic groups. Our research is a first step toward addressing overly restrictive eligibility criteria in AD clinical trials.
Acknowledgment
This work was supported in part by NIH grants R21AG068717 and R21CA253394. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
References
- 1. Spieth PM, Kubasch AS, Penzlin AI, et al. Randomized controlled trials – a matter of design. Neuropsychiatr Dis Treat. 2016;12:1341–1349. doi: 10.2147/NDT.S101938.
- 2. Martin F, Susan SM. Improving the external validity of clinical trials: the case of multiple chronic conditions. J Comorb. 2013;3:30–35. doi: 10.15256/joc.2013.3.27.
- 3. Karim S, Xu Y, Kong S, et al. Generalisability of Common Oncology Clinical Trial Eligibility Criteria in the Real World. Clinical Oncology. 2019;31:e160–e166. doi: 10.1016/j.clon.2019.05.003.
- 4. He J, Morales DR, Guthrie B. Exclusion rates in randomized controlled trials of treatments for physical conditions: a systematic review. Trials. 2020;21:228. doi: 10.1186/s13063-020-4139-0.
- 5. Li Q, Guo Y, He Z, et al. Using Real-World Data to Rationalize Clinical Trials Eligibility Criteria Design: A Case Study of Alzheimer’s Disease Trials. AMIA Annu Symp Proc. 2021;2020:717–726.
- 6. Center for Drug Evaluation and Research. Enhancing the Diversity of Clinical Trial Populations – Eligibility Criteria, Enrollment Practices, and Trial Designs Guidance for Industry. U.S. Food and Drug Administration. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/enhancing-diversity-clinical-trial-populations-eligibility-criteria-enrollment-practices-and-trial (2020, accessed 2 March 2022).
- 7. Banzi R, Camaioni P, Tettamanti M, et al. Older patients are still under-represented in clinical trials of Alzheimer’s disease. Alzheimer’s Research & Therapy. 2016;8:32. doi: 10.1186/s13195-016-0201-2.
- 8. Gill S, Bronskill S, Mamdani M, et al. Representation of patients with dementia in clinical trials of donepezil. The Canadian Journal of Clinical Pharmacology = Journal canadien de pharmacologie clinique.
- 9. Lewis JH, Kilgore ML, Goldman DP, et al. Participation of Patients 65 Years of Age or Older in Cancer Clinical Trials. JCO. 2003;21:1383–1389. doi: 10.1200/JCO.2003.08.010.
- 10. Talarico L, Chen G, Pazdur R. Enrollment of Elderly Patients in Clinical Trials for Cancer Drug Registration: A 7-Year Experience by the US Food and Drug Administration. JCO. 2004;22:4626–4631. doi: 10.1200/JCO.2004.02.175.
- 11. Hutchins LF, Unger JM, Crowley JJ, et al. Underrepresentation of Patients 65 Years of Age or Older in Cancer-Treatment Trials. N Engl J Med. 1999;341:2061–2067. doi: 10.1056/NEJM199912303412706.
- 12. Hsu S, Rosen KJ, Cupertino A, et al. Generalizability of Randomized Controlled Trials in Rectal Cancer. J Gastrointest Surg. 2022;26:453–465. doi: 10.1007/s11605-021-05192-x.
- 13. Ruiter R, Burggraaf J, Rissmann R. Under-representation of elderly in clinical trials: An analysis of the initial approval documents in the Food and Drug Administration database. British Journal of Clinical Pharmacology. 2019;85:838–844. doi: 10.1111/bcp.13876.
- 14. Zhang R, Simon G, Yu F. Advancing Alzheimer’s research: A review of big data promises. International Journal of Medical Informatics. 2017;106:48–56. doi: 10.1016/j.ijmedinf.2017.07.002.
- 15. Towards clinical data-driven eligibility criteria optimization for interventional COVID-19 clinical trials. Journal of the American Medical Informatics Association. https://academic.oup.com/jamia/article/28/1/14/6015812?login=true (accessed 8 March 2022).
- 16. Meystre SM, Heider PM, Kim Y, et al. Automatic trial eligibility surveillance based on unstructured clinical data. International Journal of Medical Informatics. 2019;129:13–19. doi: 10.1016/j.ijmedinf.2019.05.018.
- 17. Rogers JR, Pavisic J, Ta CN, et al. Leveraging electronic health record data for clinical trial planning by assessing eligibility criteria’s impact on patient count and safety. J Biomed Inform. 2022;127:104032. doi: 10.1016/j.jbi.2022.104032.
- 18. Melzer G, Maiwald T, Prokosch H-U, et al. Leveraging Real-World Data for the Selection of Relevant Eligibility Criteria for the Implementation of Electronic Recruitment Support in Clinical Trials. Appl Clin Inform. 2021;12:17–26. doi: 10.1055/s-0040-1721010.
- 19. What is an electronic health record (EHR)? HealthIT.gov. https://www.healthit.gov/faq/what-electronic-health-record-ehr (accessed 8 September 2021).
- 20. Evans SR, Paraoan D, Perlmutter J, et al. Real-World Data for Planning Eligibility Criteria and Enhancing Recruitment: Recommendations from the Clinical Trials Transformation Initiative. Ther Innov Regul Sci. 2021;55:545–552. doi: 10.1007/s43441-020-00248-7.
- 21. Lai YS, Afseth JD. A review of the impact of utilising electronic medical records for clinical research recruitment. Clinical Trials. 2019;16:194–203. doi: 10.1177/1740774519829709.
- 22. Rogers JR, Hripcsak G, Cheung YK, et al. Clinical comparison between trial participants and potentially eligible patients using electronic health record data: A generalizability assessment method. J Biomed Inform. 2021;119:103822. doi: 10.1016/j.jbi.2021.103822.
- 23. Case Study: Using Real-World Data to Expand Eligibility Criteria for Phase III Endocrinology Study. CTTI. https://ctti-clinicaltrials.org/topics/novel/real-world-data2/case-study-using-real-world-data-to-expand-eligibility-criteria-for-phase-iii-endocrinology-study/ (2021, accessed 3 March 2022).
- 24. Štrumbelj E, Kononenko I, Robnik Šikonja M. Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering. 2009;68:886–904.
- 25. Shenkman E, Hurt M, Hogan W, et al. OneFlorida Clinical Research Consortium: Linking a Clinical and Translational Science Institute With a Community-Based Distributive Medical Education Model. Acad Med. 2018;93:451–455. doi: 10.1097/ACM.0000000000002029.
- 26. Awad M, Khanna R. Support Vector Regression. In: Awad M, Khanna R, editors. Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers. Berkeley, CA: Apress; pp. 67–80.
- 27. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785–794.
- 28. Nwanosike EM, Conway BR, Merchant HA, et al. Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review. International Journal of Medical Informatics. 2022;159:104679. doi: 10.1016/j.ijmedinf.2021.104679.
- 29. Freund Y, Schapire RE. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences. 1997;55:119–139.
- 30. Lundberg SM, Lee S-I. A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems 30 (NIPS 2017).


