2018 Dec 5;28(4):439–442. doi: 10.1002/pds.4697

Table 2.

Considerations for choosing RWD sources for research studies

Key Considerations
Adequate sample size
  • The RWD can address the scientific question with sufficient confidence.

  • There are sufficient persons, follow‐up time, and relevant observations to address the scientific question.

  • Absent specific feasibility counts, a rough sample size can be estimated by applying the crude prevalence to the total person‐lives in the database (before any entry criteria are applied).
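The crude estimate in the last bullet is simple arithmetic: multiply the crude prevalence by the total person‐lives in the database. A minimal Python sketch follows; the function name and the illustrative figures (2% prevalence, 10 million person‐lives) are hypothetical and not drawn from any specific data source. Because no entry criteria are applied, the result is likely to overstate the eligible cohort.

```python
def crude_sample_size(prevalence: float, person_lives: int) -> int:
    """Rough count of persons with the condition: crude prevalence x person-lives.

    Hypothetical helper for illustration. No entry criteria (eg, continuous
    enrollment or washout periods) are applied, so this is a crude estimate,
    not a feasibility count.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    if person_lives < 0:
        raise ValueError("person_lives must be non-negative")
    return round(prevalence * person_lives)


# Illustrative figures only: 2% crude prevalence in 10 million person-lives.
estimate = crude_sample_size(0.02, 10_000_000)
print(estimate)
```

Here the sketch gives 200,000 persons. Applying the study's entry criteria would reduce that number, so a feasibility query against the actual data source remains preferable when one can be run.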

Research data element definitions and validation
  • Essential data elements are coded consistently in the RWD health care system (ie, codes adequately capture the research data fields, eg, disease, outcome, treatment, and, if relevant, critical covariates).

    • Systematic errors (eg, downcoding or upcoding) in the study population and essential data element definitions are identified and minimized; where possible, pre‐specified sensitivity analyses can assess their potential impact.

    • Definitions for essential data elements (eg, population and outcome) are unlikely to capture codes recorded only for “screening” or to “rule out” a specific diagnosis in clinical practice.

  • Needed coding algorithms (eg, computable phenotypes) are available and validated for essential data elements.

    • If additional validation is needed for the research purpose and regulatory decision, it can be performed within the data source.

  • Covariates or confounders critical to the research question are available.

    • If needed, variables that correlate highly with key missing confounders are available and can be used in their place.
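Validating a coding algorithm such as a computable phenotype typically means comparing it against a reference standard (for example, medical record review) and summarizing the resulting 2×2 table. The sketch below assumes those counts have already been tabulated; the class, function, and example counts are hypothetical. Many validation studies review only algorithm‐positive records, in which case only the positive predictive value (PPV) can be estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """2x2 counts comparing an algorithm with a reference standard (hypothetical)."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a row or column of the table is empty.
    return num / den if den else float("nan")


def validation_metrics(c: ValidationCounts) -> dict[str, float]:
    """Standard accuracy measures of the algorithm against the reference standard."""
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Illustrative counts only, not from any real validation study.
metrics = validation_metrics(ValidationCounts(tp=90, fp=10, fn=30, tn=870))
print(metrics)
```

With these illustrative counts the algorithm has a PPV of 0.90 and a sensitivity of 0.75. Whether such values are adequate depends on the research purpose and regulatory decision, and estimates of this kind can also inform the pre‐specified sensitivity analyses for systematic error.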

Missingness and completeness
  • Consideration has been given to whether essential elements of the research question may be systematically missing because patients seek care out of network or change insurance coverage, and to whether the outcome can be captured reliably over time within the RWD source.

    • The level of systematic error will not substantially affect study interpretation.

  • Discrepancies between linked data sources (claims and EHR) in the data elements needed for the specific research question will not affect interpretation of the study results.

  • When data from multiple health care systems are combined, differences in coverage policies or benefit designs do not affect the ability to address the research question.

Abbreviations: EHR, electronic health record; RWD, real‐world data.
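For a binary data element recorded in both linked sources (eg, a diagnosis flag from claims and from the EHR), agreement can be summarized before judging whether discrepancies could affect interpretation. The following is a minimal sketch assuming patient‐level records are already linked and aligned; the function name and sample data are hypothetical, and Cohen's kappa is only one of several possible agreement measures.

```python
from collections.abc import Sequence


def linked_agreement(claims: Sequence[bool], ehr: Sequence[bool]) -> dict[str, float]:
    """Compare one binary data element across linked claims and EHR records.

    Hypothetical helper. Each position is one linked patient; True means the
    element is present in that source. Returns Cohen's kappa plus counts of
    discordant patients in each direction.
    """
    if len(claims) != len(ehr) or len(claims) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(claims)
    pairs = list(zip(claims, ehr))
    both = sum(1 for c, e in pairs if c and e)
    neither = sum(1 for c, e in pairs if not c and not e)
    claims_only = sum(1 for c, e in pairs if c and not e)
    ehr_only = sum(1 for c, e in pairs if e and not c)

    observed = (both + neither) / n
    p_claims = (both + claims_only) / n
    p_ehr = (both + ehr_only) / n
    # Agreement expected by chance if the two sources recorded the element independently.
    expected = p_claims * p_ehr + (1 - p_claims) * (1 - p_ehr)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = float("nan") if expected == 1 else (observed - expected) / (1 - expected)
    return {"kappa": kappa, "claims_only": claims_only, "ehr_only": ehr_only}


# Illustrative data only: 10 linked patients.
claims_flags = [True, True, True, True, False, False, False, False, False, False]
ehr_flags = [True, True, True, False, True, False, False, False, False, False]
print(linked_agreement(claims_flags, ehr_flags))
```

The direction of discordance (claims‐only vs EHR‐only) may be more informative than kappa alone, since a lopsided split can suggest that one source systematically under‐captures the element.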