AMIA Summits on Translational Science Proceedings. 2016 Jul 20;2016:167–175.

A Decompositional Approach to Executing Quality Data Model Algorithms on the i2b2 Platform

Huan Mo 1, Guoqian Jiang 2, Jennifer A Pacheco 3, Richard Kiefer 2, Luke V Rasmussen 3, Jyotishman Pathak 4, Joshua C Denny 1, William K Thompson 5
PMCID: PMC5001760  PMID: 27570665

Abstract

The Quality Data Model (QDM) is an established standard for representing electronic clinical quality measures on electronic health record (EHR) repositories. Informatics for Integrating Biology and the Bedside (i2b2) is a widely used platform for implementing clinical data repositories. However, translation from QDM to i2b2 is challenging, since QDM allows for complex queries beyond the capability of single i2b2 messages. We have developed an approach to decompose complex QDM algorithms into workflows of single i2b2 messages, and execute them on the KNIME data analytics platform. Each workflow operation module is composed of parameter lists, a template for the i2b2 message, a mechanism to create parameter updates, and a web service call to i2b2. The communication between workflow modules relies on passing keys of i2b2 result sets. As a demonstration of validity, we describe the implementation and execution of a type 2 diabetes mellitus phenotype algorithm against an i2b2 data repository.

Introduction

The widespread adoption of electronic health records (EHR), driven in part by Meaningful Use regulations,1 has been a key enabler in the ability to measure clinical quality and population health on a large scale.2 In order to help achieve the full promise of these goals, the Office of the National Coordinator has promoted the development of the Quality Data Model (QDM), and the Health Quality Measure Format (HQMF), a Health Level Seven (HL7) standard for expressing electronic clinical quality measures.3 Leveraging these quality standards and the tools developed to support them, we have proposed in previous work to extend the use of the QDM to encompass EHR-driven clinical research, in particular for the creation of phenotype algorithms.4 Phenotype algorithms typically select cohorts of patients based on logical criteria expressed over EHR data such as diagnoses, medications, procedures, and encounters. Such algorithms have taken on an increasingly important role in clinical research as EHR data have become more widely available.5,6 As part of our efforts to achieve this goal, we have investigated ways to map QDM-based algorithms into executable artifacts, to reduce the degree of manual effort required to implement a phenotype algorithm once it has been specified.7,8

Mapping a QDM algorithm to an executable form inevitably raises the issue of how to link the algorithm logic (expressed using QDM-based concepts) to actual data stored in an individual healthcare institution’s local data repository. This issue has motivated us to leverage the data warehousing platform developed by the Informatics for Integrating Biology and the Bedside (i2b2), an NIH-funded National Center for Biomedical Computing.9 The i2b2 platform is widely adopted as a research data repository for EHRs at over 80 sites nationwide.10 This leads us to target the translation of QDM-based algorithms into executable artifacts that query i2b2 repositories.

The i2b2 platform has a modular architecture, where communications between modules use XML messages and RESTful web services. In particular, an i2b2 web client posts XML messages representing data queries to an i2b2 server containing patient data, and receives in return XML messages that contain keys for result sets. The web client can also use these result keys to construct next-step queries or to request patient or encounter lists. A previous effort to execute QDM-based queries on i2b2 mostly focused on creating a single i2b2 XML query message from a single HQMF document.11 This strategy is effective for many algorithms, but it is difficult to scale this approach to the full complexity of the QDM. This limits its potential with respect to many publicly available clinical quality measures12 and phenotype algorithms.13 Here, we propose a flexible, decompositional approach to break down a QDM algorithm into a workflow of unit i2b2 messages, where each message implements one step of a well-defined QDM operation.

In previous work,8 we have demonstrated the ability to implement QDM-based phenotype algorithms as executable workflows for the open source KNIME data analytics platform (https://www.knime.org/). This platform enables the graphical creation and execution of data workflows that can read, transform, visualize, and write data in various formats. KNIME workflows are persisted as sets of text file descriptors, which can be bundled together, exported and shared with other users. To support the generation of KNIME workflows from QDM algorithms, we have previously developed a software tool to automatically translate QDM into KNIME workflow descriptors,14 which can subsequently be edited manually to link to local data repositories. In the work described here, we build upon these previous efforts, using KNIME workflows to orchestrate the execution of sequences of i2b2 query messages.

Methods

We manually decomposed a QDM algorithm into basic units of functionality, as defined in a recent formalization of the QDM.15 Each of these units of functionality is implemented in the KNIME workflow as a group of nodes that are bundled together into a single unit of workflow functionality. We call this bundle of nodes a QDM Operator Module. We link these modules together into an executable KNIME workflow to implement a complete QDM algorithm. The i2b2 query results from each QDM Operator Module are passed to subsequent modules in the form of i2b2 result set keys.

QDM operator modules

We implemented a set of QDM Operator Modules that can handle a variety of QDM components, such as clinical data elements, logical operators (AND, OR, AND-NOT), temporal operators (point to point such as STARTS BEFORE START OF, and point to duration such as STARTS DURING), and a patient list retriever.
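The key-passing design can be illustrated with a minimal Python sketch. It is not the KNIME implementation: `FakeResultStore` is a hypothetical in-memory stand-in for the i2b2 server, and the function names and patient identifiers are assumptions made only for illustration. Each function plays the role of one operator module, consuming result set keys and returning a new key.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeResultStore:
    """Hypothetical in-memory stand-in for the server: maps keys to patient sets."""
    sets: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def save(self, patients):
        # Store a result set and hand back only its key, as a server would.
        key = f"rs-{next(self._ids)}"
        self.sets[key] = frozenset(patients)
        return key


def data_element(store, patients):
    """Leaf module: registers the patients matching one data element."""
    return store.save(patients)


def logical(store, op, left_key, right_key):
    """Logical operator module: consumes two keys and produces one new key."""
    left, right = store.sets[left_key], store.sets[right_key]
    if op == "AND":
        result = left & right
    elif op == "OR":
        result = left | right
    elif op == "AND-NOT":
        result = left - right
    else:
        raise ValueError(f"unsupported operator: {op}")
    return store.save(result)


def patient_list(store, key):
    """Patient list retriever: resolves a key to a sorted list of patient ids."""
    return sorted(store.sets[key])


# Illustrative workflow: (diagnosis AND medication) AND-NOT exclusion.
store = FakeResultStore()
dx = data_element(store, {1, 2, 3})
med = data_element(store, {2, 3, 4})
excl = data_element(store, {3})
both = logical(store, "AND", dx, med)
cohort = logical(store, "AND-NOT", both, excl)
print(patient_list(store, cohort))
```

Because every step exchanges only keys, a larger expression is built by wiring more single-operation steps together rather than by enlarging any one query.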

Although a single i2b2 message can represent a query with combined aggregative, logical, and temporal operations, in our implementation we restrict each QDM operator module to perform only a single unit of QDM operation as defined by the QDM formal model. We then compose these single steps into larger operations programmatically using KNIME workflows, allowing us to build queries of arbitrary complexity. Each QDM Operator Module contains the following core components:

  • Parameter List: This list contains account credentials, keys for patient or encounter lists from upstream operations, and attributes of an operation (e.g., “starts less than 3 days before start of”). Each parameter has a name and a value (Figure 1A). The Table Creator nodes (Figure 2A) that store these parameters are designed to be updatable by external applications that construct KNIME workflows, such as our QDM translation tool.

  • Template of i2b2 Messages: We generated i2b2 XML message templates by performing related queries on an i2b2 Web Client, obtaining the resulting XML messages (Figure 1C), and placing abstractions of these messages as templates in KNIME nodes (Figure 2B).

  • XPath Maps and Update of XML Messages: For each parameter that requires an update on the XML message template, we assign an XPath to indicate the destination (Figure 1B, Figure 2C). In KNIME, we use a Java Snippet node to write each parameter value into the XML message at the location given by its XPath (Figure 2D).

  • RESTful Communications: Once the XML message has been updated, the module POSTs it to the i2b2 server (Figure 2E). The module then extracts the result set keys from the response for subsequent operations to use (Figure 3F). To retrieve patient lists, the module POSTs an XML message with the result key as patient_set_coll_id to the i2b2 server.16
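The components listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the KNIME nodes themselves: the message template and the response are hypothetical and heavily simplified (real i2b2 messages are namespaced and much larger), every element name other than `patient_set_coll_id` is invented for the example, and the HTTP POST is omitted so that the sketch runs without a server.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified template; not an actual i2b2 message.
TEMPLATE = """
<request>
  <security><username/><password/></security>
  <query>
    <left><patient_set_coll_id/></left>
    <right><patient_set_coll_id/></right>
    <constraint><operator/><comparison/><span_value/><span_units/></constraint>
  </query>
</request>
"""

# Parameter list: name/value pairs (credentials, upstream keys, attributes).
PARAMETERS = {
    "username": "example-user",
    "password": "example-password",
    "left_key": "101",
    "right_key": "102",
    "operator": "STARTS BEFORE START OF",
    "comparison": "LESS",
    "span_value": 3,
    "span_units": "DAY",
}

# XPath map: where each parameter value belongs in the template.
XPATH_MAP = {
    "username": "security/username",
    "password": "security/password",
    "left_key": "query/left/patient_set_coll_id",
    "right_key": "query/right/patient_set_coll_id",
    "operator": "query/constraint/operator",
    "comparison": "query/constraint/comparison",
    "span_value": "query/constraint/span_value",
    "span_units": "query/constraint/span_units",
}


def fill_template(template, parameters, xpath_map):
    """Write each parameter value into the element its XPath points to."""
    root = ET.fromstring(template)
    for name, value in parameters.items():
        target = root.find(xpath_map[name])
        if target is None:
            raise KeyError(f"no element at {xpath_map[name]!r} for {name!r}")
        target.text = str(value)
    return root


def extract_outputs(response_xml):
    """Pull the three module outputs out of a (hypothetical) response."""
    root = ET.fromstring(response_xml)
    return {
        "patient_set_key": root.findtext("patient_set_key"),
        "encounter_set_key": root.findtext("encounter_set_key"),
        "patient_count": int(root.findtext("patient_count")),
    }


message = fill_template(TEMPLATE, PARAMETERS, XPATH_MAP)
body = ET.tostring(message, encoding="unicode")  # payload that would be POSTed

# Hypothetical response standing in for what the server would return.
RESPONSE = """
<response>
  <patient_set_key>201</patient_set_key>
  <encounter_set_key>202</encounter_set_key>
  <patient_count>7</patient_count>
</response>
"""
outputs = extract_outputs(RESPONSE)
print(outputs)
```

Keeping the XPath map separate from the template means a module can be reconfigured by editing parameter values alone, without touching the message structure.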

Figure 1.

Updating parameters on an i2b2 message for a QDM Operator Module. A: a parameter list; B: an XPath map for the parameters; C: a template i2b2 message generated in the i2b2 web client.

Figure 2.

An example of a QDM Operator Module. This module is for temporal operators. The three input ports are (from top to bottom): connection information and log-in credentials, the key for the left-side encounter list from the previous operation, and the key for the right-side encounter list. A: the list of operation-specific parameters; B: the XML message template; C: XPaths for parameters; D: the Java Snippet to update the parameters on the XML message; E: RESTful communication with the i2b2 server. The rest of the module is shown in Figure 3.

Figure 3.

An example of a QDM Operator Module, continued from Figure 2. F: extracting information from the response message from an i2b2 server. The three output ports are: the key for the result patient set, the key for the result encounter set, and the count of patients.

A test case

As a demonstration test case, we use a version of the type 2 diabetes mellitus (T2DM) phenotype algorithm created as part of the Electronic Medical Records and Genomics (eMERGE) network17 (Figure 4). This version of the T2DM algorithm was also previously used in the AMIA 2015 Demonstration S32 (PhEMA: Phenotype Modeling, Sharing and Execution Architecture).18 The T2DM algorithm requires a set of complex nested Boolean logical operations over diagnoses, labs, and medications. It also includes a point-to-point temporal operation on timing of medication use.

Figure 4.

High-level overview of the type 2 diabetes mellitus (T2DM) phenotype algorithm. The T2DM algorithm contains multiple nested Boolean operators expressed over diagnoses, labs, and medications. It also includes a point-to-point temporal operation on timing of medication use. Use of the disjunctive OR operator allows for multiple paths to be selected into a T2DM cohort. The algorithm uses corresponding ICD9 codes for Diagnosis, Active, RxNorm codes of oral hypoglycemics for T2DM medications, and those of insulins for T1DM medications.

This level of complexity provides a good test of the composition of various QDM Operator Modules into a single executable workflow.

Results

We successfully used the QDM Operator Modules on KNIME to implement the test T2DM phenotype algorithm (Figure 5), and performed the query on an i2b2 Virtual Machine (VM) server instance. Of the 133 demonstration patients provided by the i2b2 repository, none met the proposed inclusion criteria. Eleven patients have T2DM ICD-9 codes, but the only patient taking an oral hypoglycemic has no T2DM diagnosis (Table 1). This result is in accordance with the logic of the algorithm, which properly excluded all test patients from the T2DM cohort.

Figure 5.

The KNIME i2b2 implementation of the eMERGE Type 2 Diabetes Mellitus Phenotype Algorithm modeled with QDM.

Table 1.

Diabetes-Related Records of i2b2 Demonstration Patients. T1DM: Type 1 Diabetes Mellitus; T2DM: Type 2 Diabetes Mellitus; Dx: Diagnoses, with ICD-9 codes; Gluc: plasma glucose measure; A1C: Hemoglobin A1c; T1DM Med: mostly insulin; T2DM Med: mostly oral hypoglycemic. 1 or 0 indicates present or absent.

| patient_id | T1DM Dx | T2DM Dx | Gluc > 200 mg/dL | A1C ≥ 6.5% | T1DM Med | T2DM Med |
| --- | --- | --- | --- | --- | --- | --- |
| 1000000011 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000013 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1000000029 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000043 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1000000045 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000070 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1000000083 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1000000087 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000096 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000108 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1000000109 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000119 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1000000124 | 0 | 0 | 0 | 0 | 0 | 1 |
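The counts quoted in the Results text can be re-derived from Table 1 with a short Python check. The rows below are transcribed from the table. The script only tallies columns and does not reproduce the phenotype algorithm itself, whose full logic is given in Figure 4.

```python
# Rows transcribed from Table 1:
# (patient_id, T1DM Dx, T2DM Dx, Gluc > 200, A1C >= 6.5%, T1DM Med, T2DM Med)
ROWS = [
    (1000000011, 0, 1, 0, 0, 0, 0),
    (1000000013, 0, 1, 0, 1, 0, 0),
    (1000000029, 0, 1, 0, 0, 0, 0),
    (1000000043, 0, 0, 0, 1, 0, 0),
    (1000000045, 0, 1, 0, 0, 0, 0),
    (1000000070, 1, 1, 0, 0, 0, 0),
    (1000000083, 0, 1, 0, 0, 1, 0),
    (1000000087, 0, 1, 0, 0, 0, 0),
    (1000000096, 0, 1, 0, 0, 0, 0),
    (1000000108, 0, 1, 0, 0, 1, 0),
    (1000000109, 0, 1, 0, 0, 0, 0),
    (1000000119, 0, 1, 0, 0, 0, 0),
    (1000000124, 0, 0, 0, 0, 0, 1),
]

# Patients with a T2DM diagnosis code, and patients on a T2DM medication.
t2dm_dx = [row[0] for row in ROWS if row[2] == 1]
t2dm_med = [row[0] for row in ROWS if row[6] == 1]

# Patients who have both a T2DM diagnosis and a T2DM medication.
dx_and_med = sorted(set(t2dm_dx) & set(t2dm_med))

print(len(t2dm_dx))   # 11
print(t2dm_med)       # [1000000124]
print(dx_and_med)     # []
```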

In the implementation, we encountered some difficulty with the QDM data element modules, because the code-based value set system in the QDM and the path-based ontology system in i2b2 are designed differently. We used an external KNIME workflow to look up the i2b2 path sets for the QDM value sets directly in the i2b2 metadata database of the Ontology Cell. We then loaded these i2b2 path sets into the data element modules in the demonstration workflow.
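The kind of lookup this involves can be sketched in Python with an in-memory SQLite table. This is an assumption-laden illustration rather than the external KNIME workflow: the table name `ontology`, the rows, and the concept paths are hypothetical, and the column names `c_basecode` and `c_fullname` reflect our understanding of the i2b2 metadata schema and should be checked against a local installation.

```python
import sqlite3

# Hypothetical stand-in for an Ontology Cell metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ontology (c_fullname TEXT, c_basecode TEXT)")
conn.executemany(
    "INSERT INTO ontology VALUES (?, ?)",
    [
        ("\\Diagnoses\\Diabetes\\250.00\\", "ICD9:250.00"),
        ("\\Diagnoses\\Diabetes\\250.01\\", "ICD9:250.01"),
        ("\\Diagnoses\\Diabetes\\250.02\\", "ICD9:250.02"),
    ],
)


def paths_for_value_set(connection, codes):
    """Return the concept paths whose base code appears in the value set."""
    codes = list(codes)
    if not codes:
        return []
    # One placeholder per code keeps the query parameterized.
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT c_fullname FROM ontology "
        f"WHERE c_basecode IN ({placeholders}) ORDER BY c_fullname"
    )
    return [row[0] for row in connection.execute(sql, codes)]


# Illustrative value set expressed as prefixed codes.
value_set = ["ICD9:250.00", "ICD9:250.02"]
paths = paths_for_value_set(conn, value_set)
for path in paths:
    print(path)
```

The resulting path list is what a data element module would need in place of the original code list.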

The T2DM KNIME demonstration implementation is available for download at: www.ProjectPhema.org.

Discussion

We have described our solution for integrating the i2b2 platform with complex phenotype representation models such as the QDM. We adopted the strategy of decomposing a QDM algorithm into a workflow of granular i2b2 queries, each of which is represented in the KNIME workflow as a single QDM Operator Module. Each module executes a single i2b2 query, and passes the i2b2 result set key to subsequent modules in the workflow. This design potentially enables i2b2 to support phenotype algorithms with arbitrary complexity. Our implementation of the eMERGE T2DM algorithm demonstrates the validity of this decompositional design.

We used the open source KNIME platform as a proof of concept to offer an easy-to-test prototype. KNIME provides an excellent workflow management service and graphical representation of a pipeline of modules. As our previous work has demonstrated,8 KNIME users can easily modify these workflows to adapt to local EHR availabilities or secondary projects. With a standardized repository system like i2b2, this new effort eliminates the need to write local SQL queries. We are also developing EM2KN,14 a Java-based tool that will translate a QDM algorithm represented in HQMF format into an executable i2b2 workflow in the KNIME platform. We will integrate eM2Kn with this i2b2 solution. Future work will demonstrate this approach on an expanded set of clinical quality measures and algorithms.

However, our design is not tied to any particular implementation platform. Each QDM Operator Module only requires a template i2b2 XML message from the web client, a list of case-specific parameters, a list of XPaths to indicate the destination of parameter values, a RESTful communication service, and an XML parser for response messages. The communication between different modules uses result set keys. Alternative implementation platforms could be used with this same approach.

The i2b2 infrastructure utilizes a modular design, an open messaging system, and reusable result sets. These qualities facilitate the development of external applications to complement existing i2b2 client applications. Our implementation provides another demonstration of the flexibility and power of the i2b2 architecture. The ecosystem of third-party applications enabled by this architecture makes the i2b2 platform extremely useful as a common EHR repository. Our work provides an additional alternative to the i2b2 Web Client and Workbench, which are no longer the only two client applications available for consuming i2b2 services.

However, the i2b2 ontology system for queries is still designed specifically for the interface of these two i2b2 native clients. For example, all queries have to go through the Ontology Cell, and have to use long concept paths as keys. This design makes dynamic value-set-based data element models (such as the QDM) difficult to adapt to i2b2. In the work described here, we map value sets to i2b2 concept paths externally. We therefore suggest that the i2b2 query messaging system provide mechanisms to bypass the Ontology Cell and query base codes directly.

A potential limitation of the work described here is that we have not assessed the real-world performance characteristics of our decompositional approach. All our tests were performed on i2b2 demonstration instances. Moving forward, a production instance of an i2b2 EHR repository will be required for this assessment.

Acknowledgements

This work has been supported by funding from PhEMA (R01 GM105688), iPGx (R01 GM103859), and eMERGE (U01 HG006379, U01 HG006378, U01 HG006388).

References


