Table 1.
Domain | Step | Description | Potential Challenges to Portability |
Data | Data Collection | The processes by which data is collected within the source EHR, and its intended purpose. | Only data that is collected can be used for electronic phenotyping. How data is collected at a local institution (vocabulary used, frequency of collection, etc.) determines how that institution authors a phenotype algorithm. Modality of data collection (e.g., structured, narrative text, images) can affect how, and whether, the data can be used in executing a phenotype algorithm. |
Data | Data Preparation | Extract-Transform-Load (ETL) processes through which data is consolidated into an integrated data repository (IDR). | The need to transform the shape of the data from an IDR into a common data model (CDM). Effort to convert data from one modality to another (e.g., natural language processing to obtain structured results). Mapping of local terms to a standard vocabulary term (national standard or prescribed by the CDM), and potential lossy mappings or semantic drift. |
Authoring | Define Value Sets | Identifying the medical terms that are used to represent data elements within the phenotype algorithm logic. | Not all terminologies/vocabularies are fully implemented at all institutions. Value sets may list all codes, or may list codes at the top level of a hierarchy that need to be expanded. |
Authoring | Define Logic | Create a representation of the required data elements, and how the elements are related by different operators (e.g., Boolean, temporal) to create a phenotype algorithm. | The modality of the logic representation (narrative, intermediate representation, programming language), and what system(s) may understand it. Strictness of the logic, considering local instead of broader data availability. |
Implementation | Distribution | The mechanism by which a phenotype algorithm is transmitted from the author to an implementing site. | Automated vs. manual approach. Policies that require human review and approval before execution. |
Implementation | Translation | How the phenotype algorithm is converted into an executable representation that may be directly applied to the institutional data model. | Automated vs. manual approach. Technology-specific customizations (e.g., database schema names, table names). Information loss when elements of a data model do not have a direct translation or differ in granularity. |
Implementation | Execution | The computation process by which the executable representation is applied to an institutional data warehouse, and results are retrieved. | Syntax errors that require human intervention and correction. |
Implementation | Validation | A formal or informal comparison of the execution results against a reference standard. | Lack of detailed information concerning the inclusion and exclusion implications across multiple phenotype algorithm steps. Lack of access to source data to evaluate results. |
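
Two of the challenges listed in the table, value sets that list only top-level codes of a hierarchy and lossy mapping of local terms to a standard vocabulary, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the code hierarchy, the local-to-standard mapping, the codes themselves, and the function names (`expand_value_set`, `map_local_codes`, `is_case`) are hypothetical and do not come from any real terminology, CDM, or phenotyping tool.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical parent -> children hierarchy (made-up codes, not a real vocabulary).
HIERARCHY: Dict[str, List[str]] = {
    "DM": ["DM.1", "DM.2"],
    "DM.1": ["DM.1.a"],
}

# Hypothetical mapping from local institutional terms to standard codes.
LOCAL_TO_STANDARD: Dict[str, str] = {
    "LOCAL-17": "DM.1.a",
    "LOCAL-22": "DM.2",
}


def expand_value_set(top_level: Iterable[str],
                     hierarchy: Dict[str, List[str]]) -> Set[str]:
    """Return the listed codes plus every descendant found in the hierarchy."""
    expanded: Set[str] = set()
    stack = list(top_level)
    while stack:
        code = stack.pop()
        if code in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(code)
        stack.extend(hierarchy.get(code, []))
    return expanded


def map_local_codes(local_codes: Iterable[str],
                    mapping: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split local codes into (mapped standard codes, unmapped local codes)."""
    mapped: Set[str] = set()
    unmapped: Set[str] = set()
    for code in local_codes:
        if code in mapping:
            mapped.add(mapping[code])
        else:
            unmapped.add(code)  # information that would be lost in translation
    return mapped, unmapped


def is_case(patient_codes: Set[str], value_set: Set[str]) -> bool:
    """Simplest possible Boolean logic: any patient code is in the value set."""
    return not value_set.isdisjoint(patient_codes)


value_set = expand_value_set({"DM"}, HIERARCHY)
mapped, unmapped = map_local_codes({"LOCAL-17", "LOCAL-99"}, LOCAL_TO_STANDARD)
```

In this sketch, a site that applied the unexpanded value set `{"DM"}` would miss a patient recorded only with the descendant code `DM.1.a`, and the unmapped local code is returned separately so that the loss is visible rather than silent.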