2020 Mar 4;2019:755–764.

Table 1.

Phenotype algorithm workflow model

| Domain | Step | Description | Potential Challenges to Portability |
|---|---|---|---|
| Data | Data Collection | The processes by which data is collected within the source EHR, and its intended purpose. | Only data that is collected can be used for electronic phenotyping. How data is collected at a local institution (vocabulary used, frequency of collection, etc.) determines how that institution authors a phenotype algorithm. Modality of data collection (e.g., structured, narrative text, images) can affect how and whether the data is used in executing a phenotype algorithm. |
| | Data Preparation | Extract-Transform-Load (ETL) processes through which data is consolidated into an integrated data repository (IDR). | The need to transform the shape of the data from an IDR into a common data model (CDM). Effort to convert data from one modality to another (e.g., natural language processing to obtain structured results). Mapping of local terms to a standard vocabulary term (national standard or prescribed by CDM), and potential lossy mappings or semantic drift. |
| Authoring | Define Value Sets | Identifying the medical terms that are used to represent data elements within the phenotype algorithm logic. | Not all terminologies/vocabularies are fully implemented at all institutions. Value sets may list all codes, or may list codes at the top level of a hierarchy that need to be expanded. |
| | Define Logic | Create a representation of the required data elements, and how the elements are related by different operators (e.g., Boolean, temporal) to create a phenotype algorithm. | The modality of the logic representation (narrative, intermediate representation, programming language), and what system(s) may understand it. Strictness of the logic, considering local instead of broader data availability. |
| Implementation | Distribution | The mechanism by which a phenotype algorithm is transmitted from the author to an implementing site. | Automated vs. manual approach. Policies that require human review and approval before execution. |
| | Translation | How the phenotype algorithm is converted into an executable representation that may be directly applied to the institutional data model. | Automated vs. manual approach. Technology-specific customizations (e.g., database schema names, table names). Information loss when elements of a data model do not have a direct translation or differ in granularity. |
| | Execution | The computation process by which the executable representation is applied to an institutional data warehouse, and results are retrieved. | Syntax errors that require human intervention and correction. |
| | Validation | A formal or informal comparison of the execution results against a reference standard. | Lack of detailed information concerning the inclusion and exclusion implications across multiple phenotype algorithm steps. Lack of access to source data to evaluate results. |
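
Two of the challenges listed in Table 1 can be made concrete with a small example: a value set that lists only top-level codes and must be expanded, and local terms that lack a standard equivalent. The following is a minimal sketch, not the authors' method. The code hierarchy, the local codes, and the function names (`expand_value_set`, `classify_local_codes`) are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy hierarchy: parent code -> direct child codes.
# These are illustrative codes, not drawn from any real terminology.
HIERARCHY = {
    "DM2": ["DM2.1", "DM2.2"],
    "DM2.2": ["DM2.2a"],
}

# Hypothetical local-to-standard mapping at one implementing site.
# None marks a local term with no standard equivalent (a lossy mapping).
LOCAL_TO_STANDARD = {
    "LOC-001": "DM2.1",
    "LOC-002": "DM2.2a",
    "LOC-003": None,
}


def expand_value_set(top_level_codes, hierarchy):
    """Return the top-level codes together with all of their descendants."""
    expanded = set()
    stack = list(top_level_codes)
    while stack:
        code = stack.pop()
        if code in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(code)
        stack.extend(hierarchy.get(code, []))
    return expanded


def classify_local_codes(mapping, value_set):
    """Split local codes into those matching the value set and those unmapped."""
    matched, unmapped = [], []
    for local_code, standard_code in mapping.items():
        if standard_code is None:
            unmapped.append(local_code)
        elif standard_code in value_set:
            matched.append(local_code)
    return sorted(matched), sorted(unmapped)


value_set = expand_value_set(["DM2"], HIERARCHY)
matched, unmapped = classify_local_codes(LOCAL_TO_STANDARD, value_set)
print(sorted(value_set))
print("matched:", matched, "unmapped:", unmapped)
```

In this sketch, a site that compared its data only against the unexpanded top-level code would match none of its local codes, while the unmapped local term would be silently excluded unless it were reported for review.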