2020 Apr 22;11:420. doi: 10.3389/fphar.2020.00420

Table 1.

Examples of post-marketing data used to provide drug information in real-world patient populations and approaches to better characterize and assess the differences between clinical trial and real-world patients.

Post-marketing data	Data type	Advantages	Disadvantages	Examples
Sources (Camm and Fox, 2018)	Claims Data	Encompasses large patient population (10³–10⁶); can be used to study rare events and evaluate economic impact	Lack of randomization; data quality concerns (e.g., missing data, coding errors); limited validation; minimal information on health outcomes	Medicare claims data demonstrated decreased risk of ischemic stroke, intracranial hemorrhage and death with dabigatran 150 mg twice daily as compared to warfarin but increased risk of major gastrointestinal hemorrhage in elderly patients with nonvalvular atrial fibrillation. Dabigatran 75 mg twice daily was indistinguishable from warfarin except for a lower risk of intracranial hemorrhage with dabigatran (Graham et al., 2015).
	Registries	Encompasses large and diverse population; captures real time data; can be used to identify cost-effective treatment options	Lack of randomization; data quality concerns (e.g., missing data); data not collected at defined intervals	U.K. transplant registry data suggested significant benefit for graft survival with prolonged-release tacrolimus as compared to immediate-release tacrolimus with a number needed to treat of 14 to avoid one graft loss and 18 to avoid one death (Muduma et al., 2016).
	EHRs	Captures real-time treatment, outcomes and procedures; can be used to study rare conditions	Requires sophisticated data management and statistical tools; data quality concerns (e.g., missing data, coding errors, recall biases); lack of randomization	Electronic health care data were utilized to evaluate the benefits of switching first-line fever coverage from piperacillin-tazobactam to cefepime in pediatric stem cell transplant patients. Researchers saw a reduction in nephrotoxin-associated acute kidney injury episodes with no increases in treatment failures or infection rates (Benoit et al., 2019).
Examples of Approaches and Applications	GIST (ClinicalTrials.gov + EHR data or NHANES data) (Weng et al., 2014; He et al., 2015)	Patient representative analysis of clinical trials using EHR data or public survey datasets (NHANES data); NHANES data not limited to admitted patients and is well-structured and readily analyzed	Univariate model; lack of longitudinal analysis and use of self-reported medical conditions with NHANES data; data quality issues (EHRs and ClinicalTrials.gov carry potential for missing data)	When applied to type II diabetes clinical trials and EHR data, the GIST approach found that most studies are more generalizable with regard to age than with regard to hemoglobin A1c (HbA1c): more than 70% of studies enroll patients with HbA1c between 7% and 10.5%, a range that covers only 38% of real-world patients, whereas most studies allow patients aged 18–80 years and only 10% of the real-world population falls outside this range (Weng et al., 2014). He et al. later validated the GIST approach using clinical trial data and NHANES data and concluded that patients enrolled in type II diabetes trials are younger, with lower body mass index (BMI) and higher HbA1c, than the general patient population (He et al., 2015).
	mGIST (ClinicalTrials.gov + NHANES data) (He et al., 2016)	Patient representative analysis of clinical trials using public survey datasets (NHANES); multivariate model; more effective and efficient in comparing representativeness of multiple study sets; NHANES data not limited to admitted patients and is well-structured and readily analyzed	Lack of longitudinal analysis and use of self-reported medical conditions (NHANES data); does not assess clinical relevance of factors (each variable weighted equally); data quality issues with ClinicalTrials.gov (potential for missing data)	Using the multivariate GIST metric, He et al. concluded that a significant portion of type II diabetic patients are eligible for fewer than 40% of clinical studies. Those aged >70 years are likely not eligible for most studies.
	MAGIC (ClinicalTrials.gov + NHANES data) (He et al., 2016)	Algorithm to identify underrepresented subpopulations in clinical trials; comparable to other methods of characterizing underrepresented population studies; NHANES data not limited to admitted patients and is well-structured and readily analyzed	May yield large number of subgroups with large variable ranges (does not aggregate similar subgroups); similar limitations with data sources as GIST/mGIST (lack of longitudinal analysis, use of self-reported medical conditions, does not assess clinical relevance of factors, data quality issues)	MAGIC identified 50 combinations of underrepresented population subgroups in type II diabetes clinical trials (e.g., elderly obese pre-diabetic female, elderly overweight pre-diabetic male, elderly obese diabetic male). Researchers also concluded that 94% of type II diabetic patients would qualify for 20% of clinical studies but only a quarter would qualify for half of the studies (He et al., 2016).

EHR, electronic health record; GIST, Generalizability Index for Study Traits; mGIST, Multivariate Generalizability Index for Study Traits; NHANES, National Health and Nutrition Examination Survey; MAGIC, Multivariate Underrepresented Subgroup Identification.
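
The representativeness comparisons summarized in Table 1 can be illustrated with a minimal sketch. This is a simplified stand-in and not the published GIST, mGIST, or MAGIC definitions: it only checks whether each real-world patient falls inside each study's eligibility window, then averages. All function names, the trait keys (`age`, `hba1c`), and the toy studies and patients below are hypothetical and were invented for illustration; they do not reproduce any result reported in the cited papers.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each study maps a trait name to an
# inclusive (low, high) eligibility window; a missing trait means
# the study places no restriction on it.
Criteria = Dict[str, Tuple[float, float]]
Patient = Dict[str, float]


def is_eligible(patient: Patient, criteria: Criteria, traits: List[str]) -> bool:
    """Return True if the patient is inside the window for every listed trait."""
    for trait in traits:
        if trait in criteria:
            low, high = criteria[trait]
            if not (low <= patient[trait] <= high):
                return False
    return True


def eligible_fraction(patient: Patient, studies: List[Criteria], traits: List[str]) -> float:
    """Fraction of studies the patient could enter, judged on the listed traits."""
    return sum(is_eligible(patient, s, traits) for s in studies) / len(studies)


def generalizability_score(patients: List[Patient], studies: List[Criteria],
                           traits: List[str]) -> float:
    """Mean eligible fraction across patients (simplified, equal trait weights)."""
    return sum(eligible_fraction(p, studies, traits) for p in patients) / len(patients)


def share_eligible_for_at_least(patients: List[Patient], studies: List[Criteria],
                                traits: List[str], threshold: float) -> float:
    """Share of patients eligible for at least `threshold` (0 to 1) of the studies."""
    hits = sum(eligible_fraction(p, studies, traits) >= threshold for p in patients)
    return hits / len(patients)


# Toy data, invented for illustration only.
studies: List[Criteria] = [
    {"age": (18, 80), "hba1c": (7.0, 10.5)},
    {"age": (18, 65), "hba1c": (7.0, 10.0)},
    {"age": (30, 75)},                        # no HbA1c restriction
    {"age": (18, 80), "hba1c": (6.5, 11.0)},
]
patients: List[Patient] = [
    {"age": 45, "hba1c": 6.2},
    {"age": 72, "hba1c": 8.1},
    {"age": 83, "hba1c": 7.5},
    {"age": 58, "hba1c": 9.0},
]

# Single-trait scores (univariate view) and a joint score (multivariate view).
age_score = generalizability_score(patients, studies, ["age"])
hba1c_score = generalizability_score(patients, studies, ["hba1c"])
joint_score = generalizability_score(patients, studies, ["age", "hba1c"])
half_share = share_eligible_for_at_least(patients, studies, ["age", "hba1c"], 0.5)

print(age_score, hba1c_score, joint_score, half_share)
```

Scoring one trait at a time mirrors the univariate comparison in the GIST row, while passing several traits together mirrors the multivariate view in the mGIST row, with every trait weighted equally (the limitation the table notes). The last helper produces statements of the form "X% of patients qualify for at least Y% of studies", as in the MAGIC row.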