Author manuscript; available in PMC: 2021 Sep 1.
Published in final edited form as: Diabetologia. 2020 Jul 15;63(9):1694–1705. doi: 10.1007/s00125-020-05217-1

Table 1.

Examples of RWD sources and applications to diabetes research

RWD source | Merits | Caveats | Potential areas for application in diabetes research
Administrative claims data:
Insurance claims for pharmacy prescriptions and medical inpatient and outpatient visits submitted for billing purposes by government or commercial payers
Include cost information, date/place of service and patient demographics, all linked by a common patient identifier
Longitudinally follow patients as they navigate through the healthcare system
Reliable for studying important medical encounters, diagnoses and treatment using variables that are captured for reimbursement purposes
Provide information on large samples of patients and their families, considered to be representative of the target population (commercially insured/populations under public health insurance programs)
Demographically and geographically diverse, relatively low cost and time-efficient vs RCTs
Primary purpose for data collection is administrative rather than for research
Key clinical variables (e.g. severity), medications for which patients pay out-of-pocket, patient-reported outcomes, lifestyle variables and laboratory results are typically not captured
Loss to follow-up, particularly in commercial claims data when patients switch employers/health plans (the censoring date is known because an enrolment file is available)
Identification of disease/treatment depends on the accuracy of the billing codes used, and data require validation prior to use, particularly for hard-to-diagnose rare conditions
Can be used in real-world studies to compare the effectiveness and safety of glucose-lowering therapies using an active comparator new-user design [42, 59], and for patient characterisation, treatment utilisation [91], health policy/cost [92] and burden of illness [93] research
Can be used to estimate basic prevalence or incidence measures of conditions within diabetes populations given large sample sizes and representativeness
EHR data:
Data from patients’ electronic medical records
Data typically include information on medical diagnoses, procedures, medications, free text with physician notes, vital signs at each visit, laboratory results, clinical variables
Data collected to capture clinical care and contain rich data on clinical variables or other important confounders
May provide rationale for treatment decisions depending on the quality of free text
Variability in the quality of data as clinical variables are often missing and may be recorded differently by different physicians
Follow-up only available as long as patients remain in the healthcare system and seek care (the censoring date is unknown because there is no enrolment file)
Typically, data from only one place of service are available, and capture of information from other types of practices is often unreliable (e.g. in a general practice system, specialist data may not be accurately captured for all patients; hospitalisations for acute problems outside the system may not be captured)
Assessing comparative effectiveness or safety, treatment patterns and patient characterisation
Typically less useful for cost assessments or prevalence/incidence estimation
Analyses of EHR data have been shown to improve glycaemic control and reduce emergency department visits and non-elective hospitalisations [94, 95]
Patient-generated data:
Data from surveys, questionnaires, smartphone apps and social media that allow continuous data capture
Information is provided mainly by patients, rather than by providers
Questionnaire/survey data sources provide data on quality-of-life measures, which are hard to find in other data sources
Can be used as external validation datasets
May be particularly relevant to pharmacovigilance, especially for rare adverse events associated with treatments, and to studying factors predicting patients’ adherence, behaviours and attitudes
Some data include real-time monitoring to allow tracking of selected measures and symptoms
Use of these sources implies reliance on self-reported variables, leading to recall bias, selective reporting and missing data on important patient characteristics and medical variables
Limited generalisability and internal validity, as the clinical outcomes reported are often not validated and authenticity is often unverifiable
Utility only in specific settings after careful evaluation and vetting
The FDA-approved WellDoc BlueStar System is a healthcare app that provides secure capture of blood glucose data and aids in diabetes self-management [96]
Patient registries:
Repositories of rich information on specific disease or treatment
Include data on patients’ characteristics and medical variables, including rich clinical information on disease or treatments of interest
Allow long patient follow-up
Useful in areas where richness of information related to a specific disease/treatment is desirable (e.g. rare tumours) and in unique populations (e.g. pregnancy registries)
Validity depends heavily on which types of patients are selected into the registry (voluntary vs mandatory enrolment)
Expensive to maintain
May not contain information on other comorbidities or concurrent treatment; more potential for missing data
The Diabetes Collaborative Registry, organised by the leading societies in diabetes research, provides RWD on diabetes patient care and treatment [17]
Data linkages:
Data from two or more sources are linked to bring together the information needed, assuming appropriate safeguards are applied
Bring together data from disparate sources, allowing capture of the comprehensive information needed in a particular research setting (e.g. linking administrative claims with EHRs would combine the longitudinal follow-up and cost information of claims, which may be lacking in EHRs, with the clinical variables of EHRs, which are incomplete in claims)
Help minimise missing data on key variables, reducing misclassification
Validity of results depends on the quality of linkage
Expensive to link and maintain linked data sources
Challenges in linking data due to different purposes of data collection, discrepancies in data recording, legal/confidentiality issues
Several studies using linked data are being conducted in diabetes patients, predicting hospital admissions [97], cancer outcomes [98] and weight gain with diabetes treatments [99]

EHR, electronic health record
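
As an illustrative aside on the "Data linkages" row and on the claims caveat about known censoring dates, the minimal Python sketch below shows one way a deterministic linkage on a shared patient identifier might look. All names and values in it (ClaimsEnrolment, EhrLab, link_records, follow_up_days, the patient IDs and dates) are hypothetical assumptions introduced for illustration; they are not taken from the article or from any real data source. Real linkages usually also require probabilistic matching, validation and privacy safeguards, which are omitted here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class ClaimsEnrolment:
    """Hypothetical enrolment-file row from a claims database."""
    patient_id: str
    start: date
    end: date  # disenrolment date, i.e. the known censoring date


@dataclass
class EhrLab:
    """Hypothetical laboratory result taken from an EHR."""
    patient_id: str
    measured_on: date
    hba1c_percent: float


def link_records(
    enrolments: List[ClaimsEnrolment], labs: List[EhrLab]
) -> Dict[str, Tuple[ClaimsEnrolment, List[EhrLab]]]:
    """Deterministic linkage on patient_id (an assumed shared identifier).

    EHR results are kept only if they fall inside the patient's claims
    enrolment window, so both sources observe the same period.
    """
    linked: Dict[str, Tuple[ClaimsEnrolment, List[EhrLab]]] = {
        e.patient_id: (e, []) for e in enrolments
    }
    for lab in labs:
        entry = linked.get(lab.patient_id)
        if entry is None:
            continue  # no claims enrolment: cannot be linked
        enrolment, kept = entry
        if enrolment.start <= lab.measured_on <= enrolment.end:
            kept.append(lab)
    return linked


def follow_up_days(
    enrolment: ClaimsEnrolment, index_date: date, study_end: date
) -> int:
    """Days from index date to the earlier of disenrolment or study end."""
    censor_date = min(enrolment.end, study_end)
    return max((censor_date - index_date).days, 0)


# Made-up example data
enrolments = [
    ClaimsEnrolment("p1", date(2018, 1, 1), date(2019, 6, 30)),
    ClaimsEnrolment("p2", date(2018, 1, 1), date(2020, 12, 31)),
]
labs = [
    EhrLab("p1", date(2018, 3, 1), 8.1),
    EhrLab("p1", date(2019, 9, 1), 7.2),  # after disenrolment, dropped
    EhrLab("p3", date(2018, 5, 1), 6.9),  # no enrolment record, dropped
]
linked = link_records(enrolments, labs)
```

In this sketch, a patient who appears in only one source is simply not linked, which illustrates why the validity of results depends on the quality of the linkage.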