Objective
The objective is to describe the technical process, challenges, and lessons learned in scaling up from a local to a regional syndromic surveillance system using the MetroChicago Health Information Exchange (HIE) and Geographic Utilization of Artificial Intelligence in Real-Time for Disease Identification and Alert Notification (GUARDIAN) collaborative initiative.
Introduction
Adoption of electronic medical records is on the rise due to the Health Information Technology for Economic and Clinical Health Act and meaningful use incentives. Simultaneously, numerous HIE initiatives provide data sharing flexibility to streamline clinical care. Because centralized HIE models make consolidated data available, conducting syndromic surveillance using locally developed systems, such as GUARDIAN, is becoming feasible. During the past year, Chicago has embarked on a city-wide HIE deployment campaign. Perhaps the most distinctive aspect of this endeavor is that the data warehouse for the HIE is intricately tied to the GUARDIAN syndromic surveillance system.
Methods
The GUARDIAN surveillance system has been running continuously at Rush University Medical Center (RUMC) for the past six years. In order to support real-time processing and analysis, the components of the system were deployed over six servers within the RUMC data center, specifically four processing servers plus two database servers configured as a single failover cluster.
Health Level Seven (HL7) messages were received through transmission control protocol (TCP) connections and stored in a database-backed work queue. Using multiple servers, these messages were processed through a series of stages: HL7 parsing, patient de-identification and matching, Natural Language Processing (NLP) of free text, comparison of new patient data to stored disease profiles, report generation, and user interaction through a web-based user interface. Based on the load metrics over the past six years, we have been able to scale up to a twelve-server deployment (ten processing servers plus two database servers configured as a single failover cluster), which will support up to 30 hospitals within metropolitan Chicago.
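The database-backed work queue and staged processing described above can be illustrated with a minimal sketch. It is not the GUARDIAN implementation: the table schema, the function names (`open_queue`, `enqueue`, `process_next`, `split_segments`), and the use of SQLite are assumptions made purely for illustration. Only the stage order is taken from the text.

```python
import sqlite3

# Stage order follows the text; the short stage names are hypothetical.
STAGES = ["parse", "deidentify", "nlp", "compare", "report"]


def open_queue(path=":memory:"):
    """Create (or open) a queue table; the schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " stage TEXT NOT NULL)"
    )
    return db


def enqueue(db, raw_message):
    """Store a newly received message at the first stage."""
    cur = db.execute(
        "INSERT INTO queue (payload, stage) VALUES (?, ?)",
        (raw_message, STAGES[0]),
    )
    db.commit()
    return cur.lastrowid


def process_next(db, stage, handler):
    """Run handler on the oldest message waiting at `stage`, then advance it.

    Returns the message id, or None if nothing is waiting at that stage.
    """
    row = db.execute(
        "SELECT id, payload FROM queue WHERE stage = ? ORDER BY id LIMIT 1",
        (stage,),
    ).fetchone()
    if row is None:
        return None
    msg_id, payload = row
    result = handler(payload)
    idx = STAGES.index(stage)
    next_stage = STAGES[idx + 1] if idx + 1 < len(STAGES) else "done"
    db.execute(
        "UPDATE queue SET payload = ?, stage = ? WHERE id = ?",
        (result, next_stage, msg_id),
    )
    db.commit()
    return msg_id


def split_segments(raw):
    """Toy 'parse' handler: HL7 v2 segments are separated by carriage returns."""
    return "\n".join(segment for segment in raw.split("\r") if segment)
```

In this sketch each processing server would call `process_next` for the stage it owns. A multi-server deployment would additionally need the database to lock or claim rows so that two workers never take the same message; that part is omitted here.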
Results
Based on our experience, the challenges, solutions, and lessons learned are listed below:
- Data structure to support rapid analysis
  a. Data are stored in a hierarchical tree structure
  b. Free-text components of the patient chart are indexed by their unique concepts (as defined by the National Library of Medicine’s Metathesaurus)
  c. A sophisticated NLP system analyzes and indexes free text
- Identification/De-identification
  a. For patient data, a cryptographically secure one-way hash (salted SHA-256) was used
  b. A randomly generated internal patient identifier links HL7 messages
  c. For public health, re-identification can be carried out using the hash of the patient identifier and as much identifying information as is stored (e.g., the age and gender of the patient, the rough admit time and date, and the chief complaint)
- On-boarding hospitals
  a. HIE integration took less than 4 hours for the majority of hospitals
  b. Existing HL7 messages were redirected
  c. Compatibility issues arose between the site-to-site Virtual Private Network (VPN) and the modes of TCP application. The lesson is to open more ports than are typically needed over the site-to-site VPN and to set up the TCP receiver services to operate in all modes
- System performance metrics
  a. Though the average volume of messages increased from 26,000 to 45,000 per day, the average processing time per HL7 message from parsing to application of NLP remained similar (24 vs. 18 minutes)
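The de-identification approach listed above (a salted SHA-256 one-way hash plus a randomly generated internal identifier) can be sketched as follows. This is only an illustration under stated assumptions: how GUARDIAN combines the salt with the identifier, the salt length, and the format of the internal identifier are not given in the text, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_patient_id(patient_id: str, salt: bytes) -> str:
    """Return a salted SHA-256 digest of a patient identifier.

    Assumption: the salt is simply prepended to the UTF-8 encoded identifier.
    """
    return hashlib.sha256(salt + patient_id.encode("utf-8")).hexdigest()


def new_internal_id() -> str:
    """Generate a random internal identifier used to link a patient's messages.

    Assumption: 16 random bytes rendered as hex; the real format is not stated.
    """
    return secrets.token_hex(16)


def matches(patient_id: str, salt: bytes, stored_hash: str) -> bool:
    """Check a candidate identifier against a stored hash in constant time,
    as a party holding the original identifier and the salt could do
    during re-identification."""
    return hmac.compare_digest(hash_patient_id(patient_id, salt), stored_hash)
```

Because the hash is one-way, the stored value cannot be reversed into the identifier. Re-identification in this sketch works only by re-hashing a known candidate identifier with the same salt and comparing the result, which is consistent with the text's note that stored contextual details (age, gender, approximate admit time, chief complaint) help narrow the candidates.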
Conclusions
GUARDIAN demonstrated scaling up from two to seven hospitals with similar performance measures in terms of on-boarding and message processing times. The GUARDIAN system and its associated data warehouse were successfully expanded from a deployment that supported a single group of hospitals to one that can potentially support the hospitals of a major city.
Acknowledgments
GUARDIAN is funded by the US Department of Defense, Telemedicine and Advanced Technology Research Center, award numbers W81XWH-09-1-0662 and W81XWH-11-1-0711.
