BMJ. 2023 Feb 7;380:e071058. doi: 10.1136/bmj-2022-071058

Table 1.

Examples of large datasets with clustering

| Dataset | IMPACT [29] | EPIC [30] | CPRD [31] | MIMIC-III [32] |
| --- | --- | --- | --- | --- |
| Population | Patients with a head injury | Volunteers agreeing to participate | Patients attending primary care practices in the UK | Patients admitted to the Beth Israel Deaconess Medical Center (Boston, MA, USA) |
| Data source | IPD from multiple studies | IPD from a prospective multicentre study | Linked database with EHR data | Hospital database with EHR data |
| Total sample size | 11 022 | 519 978 | 11 299 221 | 38 597 |
| No of clusters | 15 studies | 23 centres | 674 general practices | 5 critical care units |
| Heterogeneity in study designs | Phase 3 clinical trials; observational cohort studies | Observational cohort studies; nested case-control studies | Not applicable | Not applicable |
| Heterogeneity in included populations | Data collection from 1984 to 1997; data from high, low, and middle income countries; variable severity of brain injury | Participant enrolment from 1992 to 2000; data from 10 European countries; heterogeneity in participant recruitment schemes | Data collection from 1987 to present; data from England, Wales, Scotland, and Northern Ireland | Data collection from 2001 to 2012; variable patient ethnic group and social status, among other factors |
| Heterogeneity in data quality | Variable classification for head injuries; variable time points for outcome assessment | Lack of standardised procedures across cohorts; heterogeneity in dietary assessment methods; heterogeneity in anthropometric measurement methods; heterogeneity in questionnaires across countries | Selective linkage with other databases; large variation in data recording between practices; variable frequency of data recording by age, sex, and underlying morbidity; informative missingness of patient characteristics; non-standardised definitions of diagnoses and outcomes; possible variation in extent of misclassification between diseases | Different critical care information systems in place during data collection; protected health information removed from free text fields |
| Heterogeneity in level of care | Variability in level of local care; clear improvement of treatment standards over time | Not applicable | Not applicable | Variable efforts to health prevention owing to variability in health insurance programmes among patients [33] |

IPD=individual participant data; EHR=electronic health records data; IMPACT=International Mission for Prognosis And Clinical Trial; EPIC=European Prospective Investigation into Cancer and Nutrition; CPRD=Clinical Practice Research Datalink; MIMIC-III=Medical Information Mart for Intensive Care III.