Table 1.
Summary of challenges, examples, and applied solutions in identifying real-world data for comparative efficacy using externally controlled trials.
Challenges | Examples | Application of solutions
Data source identification of rare conditions
Indecision gaps due to abundance of real-world data | It is difficult to parse out important data sources, rare disease candidates, and data linkage options. | Machine learning applications can improve accuracy and quality (type and frequency) in data source selection and patient selection [23,67,71]
Outcome and covariate
Poorly defined variables or inconsistent definitions between clinical trial and real-world data, limiting the comparability of real-world data | The conceptual definition of a data element does not align with its operational definition. |
Medical claims data might have limited use to support regulatory-grade decision-making | Claims data have limited clinical outcome data. | Combine with EHRs to expand the applicability, coverage, and depth of data [77]
Follow-up
Difficult to capture continuity of care in a single data source | Diagnosis is spread across multiple physicians; if the patient moves and seeks care outside of the care network, follow-up data will be lost. |
Time selection
Timing of therapy | Patient has multiple lines of treatment; what should be considered the index date? | Define a proper index date or “time zero” following the target trial emulation framework [52]
Timing of data collection – inconsistent standard of care over time | Data may be present, but are not current enough to provide a reasonable comparison to the current standard of care. |
Geography
External control arm not generalizable to clinical practice | Geographic representation is limited when the main external control arm data source is from outside the country of interest. Two unlinked data sources with available data may be selected to obtain a sufficient sample size, but it is unclear whether patients overlap across care networks. |
Analysis phase
Data loss or insufficient sample size to achieve adequate power | Matching during the analysis phase reduces the power to detect an effect. |
Avoiding the appearance of a post-hoc or cherry-picked analysis | Data dredging/post-hoc analysis (eg, regulators can assume the most appealing analysis was conducted). | Transparent prespecified description of data selection, data provenance, and the statistical analysis plan [3]
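
As an illustration of the “time zero” solution listed under Time selection, the following minimal Python sketch assigns each external control patient an index date at the start of one prespecified line of therapy. The treatment records, the function name `assign_index_dates`, and the choice of second-line therapy are assumptions made for illustration only; they are not taken from the target trial emulation framework [52] or from any data source discussed here.

```python
from datetime import date

# Hypothetical treatment history: (patient_id, line_of_therapy, start_date).
treatment_lines = [
    ("P1", 1, date(2020, 1, 10)),
    ("P1", 2, date(2020, 9, 3)),
    ("P2", 1, date(2019, 5, 20)),
    ("P2", 2, date(2020, 2, 14)),
    ("P2", 3, date(2021, 1, 8)),
    ("P3", 1, date(2021, 6, 1)),
]


def assign_index_dates(lines, eligible_line):
    """Return {patient_id: index_date}, using the start of `eligible_line` as time zero.

    Patients who never started that line are excluded rather than assigned
    a different line, so time zero has the same meaning for every patient.
    """
    index_dates = {}
    for patient_id, line_number, start in lines:
        if line_number == eligible_line:
            # Keep the earliest start if the same line is recorded more than once.
            previous = index_dates.get(patient_id)
            if previous is None or start < previous:
                index_dates[patient_id] = start
    return index_dates


# Assumption: the trial enrolled patients at the start of second-line therapy.
index_dates = assign_index_dates(treatment_lines, eligible_line=2)
```

Fixing the eligible line before looking at outcomes keeps the index date aligned with the trial's enrollment point, and stating it in the analysis plan also supports the prespecification listed under Analysis phase.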