2020 Oct 29;9(10):e18366. doi: 10.2196/18366

Table 3.

A summary of the analytical framework suggested in this study.

Problem: Patient care programs evolve in response to clinical and reimbursement changes, without regard to research, which leads to confusion about the flow of data for research.
Solution: Create a data flow diagram.
Advantages:
  • Provides insight into the underlying structure of the data
  • Identifies the main patient cohorts available in the registry
Limitations: Not suitable for registries that lack a patient-centered care component for chronic disease
Other options:
  • Unified Modeling Language [25]
  • Conceptual Modeling [26]

Problem: Data collected in clinical settings are prone to many data quality issues.
Solution: Use the Kahn et al [23] framework to evaluate the quality of the data accumulated in the registry against 5 dimensions: accuracy, completeness, consistency, validity, and uniqueness.
Advantages: Provides specific operational approaches to determine the quality of data in a patient-centered registry
Limitations:
  • Not appropriate for multisite registries
  • Not appropriate for cleaning unstructured data sets (eg, text cleaning)
Other options: Achilles Heel Data Quality Tool [27]

Problem: Patients may flow in and out of clinical care based on clinical needs, which leads to confusion when creating cohorts.
Solution: Use visualization techniques to display all possible enrollment instances in the registry (no enrollment, a single enrollment, or multiple enrollments).
Advantages:
  • Helps the research team define key points of time in a patient’s flow (eg, eligibility date and start date) that account for the majority of patients
  • Helps create rule-based algorithms to create comparable patient cohorts for a study
Limitations: Needs a deep understanding of patients’ flow in the registry and of the standard definitions adhered to in clinical practice as patients enter and leave treatment
Other options: Use unsupervised machine learning algorithms (eg, deep learning) to create initial patient cohorts for human review [28]
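
As a rough illustration of two rows of the table, the sketch below labels each patient by enrollment instance (none, single, or multiple) and runs simple checks for two of the five quality dimensions, completeness and uniqueness. It is not the study's implementation: the `Enrollment` record, its field names, and all three helper functions are hypothetical, and a real registry would need site-specific definitions of eligibility and start dates.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Enrollment:
    """One enrollment episode. All field names are hypothetical."""
    patient_id: str
    start_date: Optional[date]
    end_date: Optional[date] = None  # None: no recorded end date


def enrollment_status(patient_ids, enrollments):
    """Label each patient 'none', 'single', or 'multiple' by episode count."""
    counts = Counter(e.patient_id for e in enrollments)
    labels = {}
    for pid in patient_ids:
        n = counts.get(pid, 0)
        labels[pid] = "none" if n == 0 else "single" if n == 1 else "multiple"
    return labels


def completeness(enrollments, field):
    """Fraction of records whose `field` is not missing (0.0 if no records)."""
    if not enrollments:
        return 0.0
    filled = sum(1 for e in enrollments if getattr(e, field) is not None)
    return filled / len(enrollments)


def duplicate_records(enrollments):
    """Return each record that occurs more than once (uniqueness check)."""
    counts = Counter(enrollments)
    return [record for record, n in counts.items() if n > 1]
```

Exact duplicates inflate the episode count, so under these assumptions it makes sense to run `duplicate_records` and remove repeats before calling `enrollment_status`; otherwise a patient with one real episode entered twice would be labeled as having multiple enrollments.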