Abstract
In this paper, we present the design and implementation of a regional health information system that reconciles patient clinical data from heterogeneous Point of Service (POS) applications and supports complex clinical queries. We propose a simple XML format for the representation of clinical documents and a messaging-based protocol for exchanging clinical documents, to facilitate the adoption of international standards such as CDA and the IHE XDS profile by local application vendors. We also propose to use a hybrid relational-XML database for the storage of CDA documents that leverages both relational and XML optimization techniques to improve the performance of flexible clinical queries. This system has been deployed in a pilot phase to a regional health information organization operated by a top hospital in Beijing, China.
Introduction
Health Information Exchange (HIE) is the electronic movement of health-related information among organizations according to nationally recognized standards and policies, as defined by the National Alliance for Health Information Technology in 2008. HIE has gained worldwide adoption, with the United States leading the way, because of its promise to improve the quality, safety, and efficiency of patient care 1. The potential for HIE to reduce costs and improve the quality of health care in ambulatory primary care settings is also well recognized 2. If HIE initiatives are operated at the regional level, the governing organization is often called a “Regional Health Information Organization” (RHIO).
In China, building an effective and shareable health information system is one of the eight key building blocks of the China Health Reform proposed in 2009 3. Here, the regional health information network is the focus initiative believed to improve healthcare quality while reducing cost. Regional HIE is still at an early stage in China due to the lack of interoperability standards and implementation guidelines. To foster its development, the Steering Committee of Health Informatics led by the China Ministry of Health published the “Guideline for Building the Regional Health Information Platform based on Electronic Health Records” in 2009. This guideline describes a technical architecture for the regional health information platform, which includes a regional HIE to exchange data among disparate healthcare information systems and a central EHR (Electronic Health Records) data repository that stores a longitudinal electronic record of patient health information generated by encounters in any care delivery setting. The committee also published the “Architecture and Data standard of Electronic Health Records” (a.k.a. MOH_EHR) in 2009, which defines a conceptual model of EHR and the data elements to be collected in EHR.
To pilot such a regional health information platform, we have been partnering with a RHIO in China since 2009. This RHIO is driven by the Peking University People’s Hospital (PKUPH), one of the top hospitals in China, with more than 1.5 million outpatient visits annually and 1500 beds, connecting more than 10 community healthcare service centers and 40 healthcare employee service units. We aim to build an interoperable regional health information platform, so we use the well-known HL7 CDA (Clinical Document Architecture) 4 standard for the representation of clinical documents and the IHE XDS (Cross-Enterprise Document Sharing) profile 5 for the exchange of clinical documents. This system has been successfully implemented and deployed at this RHIO in a pilot phase. During the design and implementation, we faced two main challenges:
How to use both international and local standards for the integration with the Point of Service (POS) applications developed by local vendors. The adoption of international standards is still slow in China, a developing country, and neither CDA nor the IHE XDS profile has been accepted as a national standard. Meanwhile, there are more than 500 health information system vendors in China, 85 percent of which are medium-size or small-size companies 6. These companies are not actively involved in international standards. According to the IHE-C Connectathon testing results from 2008 to 2010, only three local vendors supported the IHE XDS Document Consumer actor in 2010 7. So it is not practicable to expect every POS application vendor to submit clinical documents in the form of CDA via the IHE XDS protocol.
How to build an EHR data repository that supports answering complex clinical queries over large volumes of CDA documents. After clinical documents from disparate health information systems are collected, the clinical data in the documents should not only be accessible via the IHE XDS profile at document granularity, but also be enabled for the “secondary use of health data” 8 by supporting complex clinical data usage scenarios, such as risk assessment, chronic disease management and clinical trials. In addition, the PKUPH has accumulated data on more than 10 million patients with at least 100 million clinical documents. To manage such large volumes of data efficiently, a high-performance physical schema design and query answering implementation is needed.
We believe that the challenges we face are not unique, especially for developing countries. To address the two challenges, we propose a front-end adaptor for the POS vendors to facilitate the adoption of the international standards, and a hybrid relational-XML database 11 for the storage of CDA documents. The contributions of this work can be summarized as follows.
We propose a methodology to design relatively simple XML interfaces derived from CDA templates for the representation of clinical documents, and a messaging-based protocol for exchanging clinical documents. The simple XML format is derived by “marrying” international standards such as HL7-ASTM CCD 9 and the IHE PCC content profile 10 with the local data element standard MOH_EHR. We then transform the simple XML format to CDA format and submit the CDA documents via the IHE XDS profile. The main advantage is that the POS application vendors need no prior knowledge of CDA or the IHE XDS profile. This helps to lower the high barrier to the adoption of such international standards.
We propose to use a hybrid relational-XML database 11 for the storage of CDA documents, supporting both patient-centric data services to query for a patient’s vital signs, problems, medications and other clinical data, and population-centric data services to query for a list of eligible patients satisfying a list of clinical conditions. We present a simple but efficient physical schema for the EHR data repository. The preliminary experimental results show that query performance is good, especially for queries of clinical data for a given patient.
Methods
The overall design of the regional health information platform is shown in Figure 1. This system consists of the following components:
Data Registries: These registry systems manage the information required to uniquely identify the actors and resources in the platform. The identified elements include the demographic data of the patients (EMPI Registry, Enterprise Master Patient Index), the providers of care (Provider Registry) and the terminologies used to describe diseases, procedures or other clinical data (Terminology Registry).
XDS Document Registry and Repository: An implementation of the IHE XDS profile providing a standards-based system for managing the sharing of clinical documents among healthcare enterprises. It includes a document repository responsible for storing documents in a transparent, secure, reliable and persistent manner and responding to document retrieval requests. It also has a document registry responsible for storing metadata about those documents so that the documents of interest for the care of a patient can be easily found, selected and retrieved.
POS Document Exchanger: It interacts with the POS applications to request and receive clinical documents, and submits the collected documents to the XDS Document Registry and Repository.
EHR Data Repository: It stores the CDA documents in a hybrid relational-XML database and provides fine-grained access to the clinical data, including patient-centric queries of clinical data for vital signs, allergies, medications, immunizations, diagnoses, procedures and visit histories, and population-centric queries for eligible patients satisfying a set of clinical conditions.
EHR Data Loader: It fetches newly submitted CDA documents from the XDS document repository, shreds and loads the CDA documents into the EHR data repository.
Regional Health Service Bus: It provides messaging, transformation and communication mechanism for all the components.
EHR Portal: It provides a single, seamlessly integrated view of all available, relevant clinical information about patients regardless of its source. It displays the EHR document view by querying the XDS Document Registry and Repository, and the EHR data view by querying the EHR data repository. In addition, for the EHR data, it shows the link to the CDA document from which the data is extracted.
POS Applications: The Point of Service (POS) applications are the information systems in the healthcare enterprises, such as Lab Information System, Health Information System, Radiology Information System, Electronic Medical Record systems, etc. These systems are the data sources of the regional health information platform.
POS Document Exchanger
The IHE XDS Profile is the foremost framework to share and exchange clinical documents in the form of CDA among healthcare enterprises. But in our experience, it is difficult to directly use CDA and XDS for regional health information exchange because the local POS application vendors have limited development experience with international standards such as CDA and the IHE integration profiles. To address this issue, we designed a simple document exchange protocol for POS application vendors. At the same time, the architecture allows POS application vendors to directly submit CDA documents into the XDS Registry and Repository if they have sufficient development skills in international standards.
In the protocol, the clinical documents to be submitted by POS applications are in a simple XML format, which is close to the format of their original data and is much simpler than CDA. For each specific kind of clinical document, like an admission record or a lab test result report, we design a simple XML format, rather than using the general CDA format. The design of the simple XML format is based on the HL7-ASTM CCD specification, the IHE PCC content modules and the EHR Data Elements in the MOH_EHR specification, which defines commonly used data elements in EHR content, each with an identifier, name, description, data type and data range. To design the simple interface XML format for a clinical document, we propose the methodology shown in Figure 2. We first identify the IHE PCC or CCD section or entry templates in this document, find the related data elements in the MOH_EHR specification, and then define the mappings between the MOH_EHR data elements and the elements in the CDA entries. Based on the mappings, we build a template model for the documents, sections and entries. Finally, we derive a simple XML format by replacing the CDA elements with MOH_EHR data elements, and supplement it with schematron rules and the XSLT transformations from the simple interface XML document to CDA.
For example, for the Admission Diagnosis Section there is a mapping from the effectiveTime of the problem observation to the MOH_EHR data element “DiagnosisTime”. We can then build a template model, based on the IHE PCC content modules, that represents the basic structure of this section: an Admission Diagnosis Section (“AdmissionDiagnosisSection”) includes a problem concern entry (“AdmissionDiagnosisAct”), which includes a problem observation (“AdmissionDiagnosisObservation”). Based on the template model and mappings, the element hierarchy in the simple XML format will be like “AdmissionDiagnosisSection/AdmissionDiagnosisAct/AdmissionDiagnosisObservation/DiagnosisTime”. Figure 3 shows a sample of the template model, the mapping, and the customized XML schema. Figure 4 shows that the simple XML format (L) is much simpler than the transformed CDA format (R) of the admission diagnosis section.
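The mapping in this example can be illustrated with a minimal sketch. The system described here uses XSLT for the transformation; the Python below is only a stand-in that follows the same element path. The output is a simplified, namespace-free CDA-like fragment, the timestamp value is made up, and placing DiagnosisTime in the low bound of effectiveTime is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Simple interface XML following the element path given in the text.
# The timestamp value is an illustrative assumption.
SIMPLE_XML = """
<AdmissionDiagnosisSection>
  <AdmissionDiagnosisAct>
    <AdmissionDiagnosisObservation>
      <DiagnosisTime>20100315</DiagnosisTime>
    </AdmissionDiagnosisObservation>
  </AdmissionDiagnosisAct>
</AdmissionDiagnosisSection>
"""


def simple_to_cda_section(simple_xml: str) -> ET.Element:
    """Map the simple format onto a simplified CDA-like section.

    Template ids, codes and the HL7 namespace are omitted for brevity.
    """
    src = ET.fromstring(simple_xml)
    section = ET.Element("section")
    for act_src in src.findall("AdmissionDiagnosisAct"):
        # Problem concern entry
        entry = ET.SubElement(section, "entry")
        act = ET.SubElement(entry, "act", classCode="ACT", moodCode="EVN")
        for obs_src in act_src.findall("AdmissionDiagnosisObservation"):
            # Problem observation nested in the concern entry
            rel = ET.SubElement(act, "entryRelationship", typeCode="SUBJ")
            obs = ET.SubElement(rel, "observation", classCode="OBS", moodCode="EVN")
            time_text = obs_src.findtext("DiagnosisTime")
            if time_text is not None:
                # Assumption: DiagnosisTime maps to effectiveTime/low
                eff = ET.SubElement(obs, "effectiveTime")
                ET.SubElement(eff, "low", value=time_text.strip())
    return section


cda_section = simple_to_cda_section(SIMPLE_XML)
cda_text = ET.tostring(cda_section, encoding="unicode")
```

Even in this reduced form, the target structure needs more nesting and more fixed attributes than the simple format, which is the burden the adaptor removes from POS vendors.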
In the protocol, document submission is based on an asynchronous messaging mechanism, which is easier to implement than the IHE XDS profile. With asynchronous messaging, the POS applications are loosely coupled to the regional health information platform. The protocol has two transactions, each triggered by an event: a new EHR record is created, or a new POS document is generated. When an EHR record for a patient is created, the POS Document Exchanger publishes a message to the messaging server. All the POS applications subscribe to this kind of message. After receiving this message, each POS application answers by sending back all the historical clinical documents of this patient. When a new POS document is generated for a patient with an EHR record, the POS system sends a message with the new document to the POS Document Exchanger.
The POS Document Exchanger implements this document exchange protocol and serves as the adaptor to the XDS Registry and Repository for the POS applications. It fetches the submitted XML documents from the messaging server, transforms them to the standard CDA format and submits the CDA documents to the XDS Registry and Repository using the IHE XDS protocol.
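The two transactions of the protocol can be sketched with an in-memory publish/subscribe broker. This is only an illustration under stated assumptions: the class names, topic names and message fields are hypothetical, delivery here is synchronous and in-process, whereas the deployed system relies on an asynchronous message queue, and the transformation to CDA and the XDS submission are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical topic names, one per protocol event.
NEW_EHR_RECORD = "ehr.record.created"
NEW_POS_DOCUMENT = "pos.document.generated"


class MessagingServer:
    """Tiny in-memory broker standing in for a real message queue."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in list(self._handlers[topic]):
            handler(message)


class PosDocumentExchanger:
    """Collects simple XML documents sent by POS applications."""

    def __init__(self, server: MessagingServer) -> None:
        self.server = server
        self.received: List[dict] = []
        server.subscribe(NEW_POS_DOCUMENT, self.received.append)

    def create_ehr_record(self, patient_id: str) -> None:
        # Transaction 1 trigger: announce the new EHR record.
        self.server.publish(NEW_EHR_RECORD, {"patient_id": patient_id})


class PosApplication:
    def __init__(self, name: str, server: MessagingServer,
                 history: Dict[str, List[str]]) -> None:
        self.name = name
        self.server = server
        self.history = history              # patient id -> simple XML documents
        self.known_patients: Set[str] = set()
        server.subscribe(NEW_EHR_RECORD, self._on_new_ehr_record)

    def _send(self, patient_id: str, document: str) -> None:
        self.server.publish(NEW_POS_DOCUMENT, {
            "source": self.name, "patient_id": patient_id, "document": document})

    def _on_new_ehr_record(self, message: dict) -> None:
        # Transaction 1 response: send all historical documents of the patient.
        patient_id = message["patient_id"]
        self.known_patients.add(patient_id)
        for document in self.history.get(patient_id, []):
            self._send(patient_id, document)

    def new_document(self, patient_id: str, document: str) -> None:
        # Transaction 2: forward new documents only for patients with an EHR record.
        self.history.setdefault(patient_id, []).append(document)
        if patient_id in self.known_patients:
            self._send(patient_id, document)


server = MessagingServer()
exchanger = PosDocumentExchanger(server)
lis = PosApplication("LIS", server, {"P001": ["<LabReport id='1'/>"]})
exchanger.create_ehr_record("P001")              # historical document is sent
lis.new_document("P001", "<LabReport id='2'/>")  # new document is sent
lis.new_document("P999", "<LabReport id='3'/>")  # no EHR record, nothing is sent
```

The POS application only needs to subscribe to one message type and publish another, which is the loose coupling the protocol aims for.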
EHR Data Repository
The EHR Data Repository persistently stores the CDA documents and provides the EHR data services to applications such as the EHR portal. One fundamental problem is the choice of database: relational, XML or other. Our choice is a hybrid relational-XML database. We describe the rationale, the physical schema design and the supported data services.
Since CDA is an XML-based standard format, it is straightforward to choose an XML database as the data store. The simplest schema for CDA documents in an XML database is to store the entire CDA documents in one table and implement all queries on the data in XQuery. However, this solution cannot guarantee query performance, because XML indexes are not as efficient as relational indexes, and there are still many limitations on XML columns, as some powerful relational optimization techniques, such as range partitioning and multi-dimensional clustering, cannot be applied. Another approach is to shred CDA documents into a relational database with many tables. This approach suffers from poor query performance, because queries over shredded XML data often require multi-way SQL joins to reconstruct the XML fragment, which is inefficient and hard to maintain. Thus, we believe the best practice is to use hybrid storage, i.e. with some of the data in relational format and some of the data in XML format. To design the hybrid schema, we propose a hybrid model to store CDA documents according to the data access patterns of EHR applications. The basic idea is to store the XML elements or attributes likely to be query filter parameters as the normalized relational part and the elements likely to be returned by the queries as the XML part, and then to de-normalize the relational part to improve query performance using well-known de-normalization techniques, such as collapsing relationships, horizontal and vertical partitioning, redundant attributes and derived attributes 12.
Figure 5 shows the main part of the hybrid schema. There is one table for each of the XML elements Act, Document, Section, Organizer, Observation, Procedure, SubstanceAdministration and Encounter in a CDA document, because each element may be returned by a query, such as a query for a patient’s past medical history section or a lab test result organizer. The common columns are: ACT_ID (the sequence id generated by the database), TEMPLATEID (an XML column representing the child element templateId), CODE (a relational column shredded from the child element code, referring to the code id in the Terminology Registry), EFFECTIVETIME_LOW and EFFECTIVETIME_HIGH (relational columns shredded from the effectiveTime element in CDA), XMLCONTENT (an XML column representing the corresponding element in CDA), DOCUMENT_ID (the foreign key indicating which document this element belongs to), EMPI_ID (the unique id of the patient in the EMPI Registry), and ENCOUNTER_ID (the foreign key indicating which encounter this element belongs to). Additionally, for the Observation element, more relational data is shredded from its child elements value and interpretationCode to support queries such as finding patients whose HbA1c is greater than 7.0 or higher than the reference range. Figure 6 shows the data loading diagram for the table OBSERVATION.
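The idea of shredding filter fields into relational columns while keeping the returned fragment as XML can be sketched as follows. This is not the deployed loader, which uses DB2 pureXML stored procedures: SQLite is used here as a stand-in, the XML column is a plain TEXT column, only a subset of the columns is shown, the code column is named CODE_ID as in the experiment description, and the observation fragment, the local code "HBA1C" and the identifiers are made-up illustrations without the HL7 namespace.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative observation fragment; codes and values are assumptions.
OBSERVATION_XML = (
    '<observation classCode="OBS" moodCode="EVN">'
    '<code code="HBA1C" displayName="HbA1c"/>'
    '<effectiveTime><low value="20100301"/><high value="20100301"/></effectiveTime>'
    '<value value="7.4" unit="%"/>'
    '<interpretationCode code="H"/>'
    '</observation>'
)


def attr(elem, path, name):
    """Return an attribute of the element at `path`, or None if absent."""
    node = elem.find(path)
    return node.get(name) if node is not None else None


def shred_observation(xml_text, document_id, empi_id, encounter_id):
    """Extract likely filter fields; keep the whole fragment as XMLCONTENT."""
    obs = ET.fromstring(xml_text)
    value = attr(obs, "value", "value")
    return {
        "CODE_ID": attr(obs, "code", "code"),
        "EFFECTIVETIME_LOW": attr(obs, "effectiveTime/low", "value"),
        "EFFECTIVETIME_HIGH": attr(obs, "effectiveTime/high", "value"),
        "VALUE_NUMERIC": float(value) if value is not None else None,
        "INTERPRETATIONCODE_ID": attr(obs, "interpretationCode", "code"),
        "DOCUMENT_ID": document_id,
        "EMPI_ID": empi_id,
        "ENCOUNTER_ID": encounter_id,
        "XMLCONTENT": xml_text,
    }


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OBSERVATION (
        ACT_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        CODE_ID TEXT, EFFECTIVETIME_LOW TEXT, EFFECTIVETIME_HIGH TEXT,
        VALUE_NUMERIC REAL, INTERPRETATIONCODE_ID TEXT,
        DOCUMENT_ID INTEGER, EMPI_ID TEXT, ENCOUNTER_ID INTEGER,
        XMLCONTENT TEXT  -- stand-in for an XML column
    )""")

row = shred_observation(OBSERVATION_XML, document_id=1,
                        empi_id="EMPI-0001", encounter_id=10)
conn.execute("""
    INSERT INTO OBSERVATION (CODE_ID, EFFECTIVETIME_LOW, EFFECTIVETIME_HIGH,
        VALUE_NUMERIC, INTERPRETATIONCODE_ID, DOCUMENT_ID, EMPI_ID,
        ENCOUNTER_ID, XMLCONTENT)
    VALUES (:CODE_ID, :EFFECTIVETIME_LOW, :EFFECTIVETIME_HIGH,
        :VALUE_NUMERIC, :INTERPRETATIONCODE_ID, :DOCUMENT_ID, :EMPI_ID,
        :ENCOUNTER_ID, :XMLCONTENT)""", row)

# Filter on relational columns, return the stored XML fragment.
matches = conn.execute(
    "SELECT EMPI_ID, XMLCONTENT FROM OBSERVATION "
    "WHERE CODE_ID = 'HBA1C' AND VALUE_NUMERIC > 7.0").fetchall()
```

The query never parses XML at filter time; the XML fragment is only read back once the relational predicates have selected the rows.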
EHR data services can be implemented based on the SQL/XML capabilities provided by the hybrid relational-XML database. At present, the supported EHR data services include a patient-centric data service, a population-centric data service and an EHR alert service. The patient-centric data service is similar to the IHE Query for Existing Data Profile (QED) 13, which queries clinical data for a given patient, including vital signs, problems, medications, etc. For example, one is able to query for the HbA1c test results of patient A in the last year. In addition, the patient-centric data service includes services for dynamically building a referral summary for a hospital visit, including the diagnoses, imaging results, lab test results and medications during this visit. The population-centric data service supports querying for a list of patients satisfying a conjunction of conditions on demographic data and clinical data, for example, patients older than 50 with Type II diabetes and a medication history of Metformin Hydrochloride Tablets. The population-centric data service can be used to support clinical trials and health alerts. The EHR alert service generates a list of alerts and reminders for each patient based on his or her clinical data, with respect to a set of user-defined alert conditions.
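The two example queries above can be sketched against a reduced schema. As before, this is an assumption-laden illustration rather than the deployed SQL/XML: SQLite stands in for the hybrid database, the PATIENT table, the local codes "HBA1C" and "METFORMIN", the sample rows and the fixed reference date are hypothetical, and "E11" is the ICD-10 category for type 2 diabetes mellitus. Dates are stored as YYYYMMDD strings so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PATIENT (EMPI_ID TEXT PRIMARY KEY, BIRTH_DATE TEXT);
    CREATE TABLE OBSERVATION (
        EMPI_ID TEXT, CODE_ID TEXT, EFFECTIVETIME_LOW TEXT,
        VALUE_NUMERIC REAL, XMLCONTENT TEXT);
    CREATE TABLE SUBSTANCEADMINISTRATION (
        EMPI_ID TEXT, CODE_ID TEXT, XMLCONTENT TEXT);
    CREATE INDEX IDX_OBS_PATIENT
        ON OBSERVATION (EMPI_ID, CODE_ID, EFFECTIVETIME_LOW);
""")
conn.executemany("INSERT INTO PATIENT VALUES (?, ?)", [
    ("A", "19500101"), ("B", "19800101"), ("C", "19450101")])
conn.executemany("INSERT INTO OBSERVATION VALUES (?, ?, ?, ?, ?)", [
    ("A", "HBA1C", "20100601", 7.4, "<observation/>"),
    ("A", "HBA1C", "20080601", 6.9, "<observation/>"),
    ("A", "E11", "20050101", None, "<observation/>"),
    ("B", "E11", "20090101", None, "<observation/>"),
    ("C", "E11", "20070101", None, "<observation/>"),
])
conn.executemany("INSERT INTO SUBSTANCEADMINISTRATION VALUES (?, ?, ?)", [
    ("A", "METFORMIN", "<substanceAdministration/>"),
    ("B", "METFORMIN", "<substanceAdministration/>"),
])


def shift_years(date: str, years: int) -> str:
    """Shift a YYYYMMDD string by whole years (month and day are kept)."""
    return str(int(date[:4]) + years) + date[4:]


def hba1c_last_year(conn, empi_id: str, as_of: str):
    """Patient-centric: HbA1c results of one patient in the last year."""
    return conn.execute(
        "SELECT EFFECTIVETIME_LOW, VALUE_NUMERIC FROM OBSERVATION "
        "WHERE EMPI_ID = ? AND CODE_ID = 'HBA1C' "
        "AND EFFECTIVETIME_LOW >= ? AND EFFECTIVETIME_LOW <= ? "
        "ORDER BY EFFECTIVETIME_LOW",
        (empi_id, shift_years(as_of, -1), as_of)).fetchall()


def diabetics_over_50_on_metformin(conn, as_of: str):
    """Population-centric: conjunction of demographic and clinical conditions."""
    rows = conn.execute(
        "SELECT p.EMPI_ID FROM PATIENT p "
        "WHERE p.BIRTH_DATE < ? "
        "AND EXISTS (SELECT 1 FROM OBSERVATION o "
        "            WHERE o.EMPI_ID = p.EMPI_ID AND o.CODE_ID = 'E11') "
        "AND EXISTS (SELECT 1 FROM SUBSTANCEADMINISTRATION s "
        "            WHERE s.EMPI_ID = p.EMPI_ID AND s.CODE_ID = 'METFORMIN') "
        "ORDER BY p.EMPI_ID",
        (shift_years(as_of, -50),)).fetchall()
    return [empi_id for (empi_id,) in rows]


AS_OF = "20110101"  # fixed reference date to keep the example deterministic
patient_a_hba1c = hba1c_last_year(conn, "A", AS_OF)
eligible_patients = diabetics_over_50_on_metformin(conn, AS_OF)
```

In the sample data, patient B is too young and patient C has no Metformin record, so only patient A satisfies the population-centric conditions.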
EHR Portal
The EHR portal provides an integrated view on the clinical data for different roles, including clinicians, patients, clinical researchers and healthcare service managers. In the document view, the EHR portal provides the browsing of the clinical documents organized by folders in XDS Registry and querying of clinical documents based on the XDS metadata, including the effective time, the document code and the custodian organization.
In the data view, the EHR portal provides an integrated view of a patient’s clinical data, including a customizable EHR data summary, summarized basic health information, a list of encounters with summaries, a list of physical examination results, a list of imaging results, a list of lab test results, a list of medications and a list of regular body checkup results. The EHR data summary (shown in Figure 7) includes the latest diagnoses, encounters and medications, and the examination and lab test results of interest. The summarized basic health information includes the demographic data from the EMPI Registry, the past medical history, the allergies, the social history, the family history, the immunizations and the procedure history. The EHR portal also provides a query builder for clinical researchers to build complex clinical data queries on the EHR data repository; the query conditions can be demographic data, vital signs, lab test results, diagnoses, procedures and medications.
Results
This system has been implemented in Java, using several open source packages and several industrial-strength IBM software products and solutions. The XDS document registry and repository uses IBM Health Information Exchange (HIE) Solution V1.0. The POS document exchanger uses the open source IHE Integration profiles V1.0 from the Open Health Tools organization to interact with the XDS Registry and Repository. The Regional Health Service Bus is implemented on IBM WebSphere Message Queue and Broker V6.1, while the CDA Converter is implemented in the ESQL language in WebSphere Message Broker for performance reasons. The EHR data repository is implemented on IBM DB2 pureXML V9.7, which provides a performance-proven hybrid storage of relational and XML data. The EHR Portal is implemented on IBM WebSphere Portal Server V6.1. The EHR Data Loader is implemented using stored procedures plus the XMLTable function supported in DB2 pureXML.
This system has connected 6 POS applications. Additionally, we have designed 14 CDA document templates and 46 section templates in the simple XML format, as shown in Table 1, leveraging the CCD, IHE PCC content modules and MOH_EHR data element specifications. Using our simple document exchange protocol, it takes fewer than 10 person-days for each POS application vendor to develop the module that submits documents to the regional health information platform.
Table 1.
POS Application | Categories of Clinical Document |
---|---|
RIS system | Imaging result report (with a link to the image on the PACS server) |
LIS system | Lab test result report |
EMR system | Admission note, Procedure record, Discharge summary, Summary of Death, Summarization of episode note in 24 Hours, Summary of Death in 24 Hours, Outpatient consultation note |
BodyCheckup System | Body checkup result report, health questionnaire form |
Hospital Information Sys. | Prescription, Payment sheet |
Regional Health Collaboration System | Referral summary |
This system has been deployed at the RHIO driven by the PKUPH. During the pilot phase, we have created EHR records for 3,361 patients and the POS applications have submitted 55,359 CDA level 3 documents to the repository. In detail, there are 31,401 lab test reports, 3,351 prescriptions, 6,399 imaging reports, 4,666 body checkup reports, 209 admission records, 82 procedure notes, 209 discharge summaries, and 9,044 payment sheets. We have conducted preliminary user acceptance testing with several physicians from PKUPH and healthcare service managers from employers. The feedback is positive in that the EHR portal provides an integrated view of patient health data, which was not possible before the deployment of this system.
To gain hands-on experience with the performance of the EHR data repository using DB2 pureXML V9.7, we performed a preliminary experimental study in which 10K, 100K, and 1M CDA level 3 documents were loaded into the EHR data repository in three runs. These documents are lab test reports exported from the LIS system in PKUPH with privacy information protected. All the experiments were carried out on an IBM X3650 server with an Intel(R) Xeon 5160 3.00GHz CPU and 8 GB of RAM. We designed three typical clinical queries on the EHR data repository: (1) Q1: query all lab test organizers for a patient (returned as the XML element organizer in CDA documents); (2) Q2: query all the abnormal lab test observations in the last year for a patient (returned as the XML element observation in CDA documents); (3) Q3: query all the patients whose Triglyceride test has the interpretation code HIGH or whose HbA1c result is larger than 6.2 (returned as a list of patient EMPI_IDs). To compare against the query performance of a native XML database, we also designed a schema without the relational columns that are extracted from the XML fragments to improve query performance. For example, in the table OBSERVATION, the columns CODE_ID, EFFECTIVETIME_LOW, EFFECTIVETIME_HIGH, VALUE_CD, VALUE_STRING, VALUE_NUMERIC and INTERPRETATIONCODE_ID were removed. In each schema, the relational indexes and XML indexes related to the queries were created. The query response times are reported as the average of 5 runs without any specific database performance tuning.
Table 2 shows the query response times of the designed queries on the different datasets. On the 10K and 100K datasets, Q2 and Q3 are much faster on the hybrid schema, while Q1 is comparable on both schemas; the gain comes from relational indexes being more efficient than XML indexes. As it is common to compress XML data by 60 to 70 percent, XML compression can also boost performance because it reduces I/O operations by enabling the database to store more data per page, as the 1M results with compression show. In addition, query answering for Q1 and Q2 is more efficient than for Q3, because when the patient id is fixed the database can quickly identify the related rows by scanning only the index on the column EMPI_ID. The query performance of Q3 is relatively slow because it returns about 20 thousand rows without paging support.
Table 2.
Data Set | Storage Type | Q1(s) | Q2(s) | Q3(s) |
---|---|---|---|---|
10K | XML | 0.002071 | 0.010123 | 2.449527 |
10K | Hybrid | 0.002413 | 0.001897 | 0.091869 |
100K | XML | 0.002305 | 0.013455 | 3.529810 |
100K | Hybrid | 0.002547 | 0.002272 | 0.392873 |
1M | XML | N/A | N/A | N/A |
1M | Hybrid | 0.364126 | 0.604943 | 6.573756 |
1M | Hybrid+XML compress | 0.011244 | 0.015008 | 3.045117 |
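The relative gains implied by Table 2 can be derived with a few lines of arithmetic. The sketch below only divides the response times reported in the table; the variable names are illustrative, and a ratio above 1 means the second configuration is faster.

```python
# Response times in seconds, copied from Table 2.
TIMES = {
    ("10K", "XML"): {"Q1": 0.002071, "Q2": 0.010123, "Q3": 2.449527},
    ("10K", "Hybrid"): {"Q1": 0.002413, "Q2": 0.001897, "Q3": 0.091869},
    ("100K", "XML"): {"Q1": 0.002305, "Q2": 0.013455, "Q3": 3.529810},
    ("100K", "Hybrid"): {"Q1": 0.002547, "Q2": 0.002272, "Q3": 0.392873},
    ("1M", "Hybrid"): {"Q1": 0.364126, "Q2": 0.604943, "Q3": 6.573756},
    ("1M", "Hybrid+XML compress"): {"Q1": 0.011244, "Q2": 0.015008, "Q3": 3.045117},
}


def speedup(dataset: str, baseline: str, candidate: str) -> dict:
    """Ratio baseline time / candidate time for each query."""
    base = TIMES[(dataset, baseline)]
    cand = TIMES[(dataset, candidate)]
    return {query: base[query] / cand[query] for query in base}


hybrid_vs_xml_10k = speedup("10K", "XML", "Hybrid")
hybrid_vs_xml_100k = speedup("100K", "XML", "Hybrid")
compress_vs_hybrid_1m = speedup("1M", "Hybrid", "Hybrid+XML compress")

for label, ratios in [
    ("10K hybrid vs XML", hybrid_vs_xml_10k),
    ("100K hybrid vs XML", hybrid_vs_xml_100k),
    ("1M compressed vs uncompressed hybrid", compress_vs_hybrid_1m),
]:
    print(label, {query: round(ratio, 2) for query, ratio in ratios.items()})
```

The ratios show that the hybrid schema helps most on Q2 and Q3, while Q1 is slightly slower on the hybrid schema at 10K and 100K documents.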
Discussion
Standards Adoption
We adopted the IHE XDS profile as the clinical document exchange and sharing standard and CDA as the clinical document format standard. The combination of CDA and the IHE XDS profile is a good choice for the sharing of clinical documents among healthcare enterprises. But we found that the local POS application developers have limited skills in implementing international standards, especially complex HL7 standards such as CDA. So we designed a front-end adaptor for the POS vendors, including the simple XML format for clinical documents and the document exchange protocol. We believe this is the key factor in the success of this project because all the EHR data comes from the various POS applications. Project delivery could easily have been delayed if we had required all the POS vendors to submit documents in the form of CDA via the IHE XDS profile, because the cost for each vendor of learning such standards cannot be fully determined in advance.
For the terminology standard, we adopted the ICD-10 for diseases and ICD-9-CM for procedures. But for the lab test data, we did not adopt the LOINC standard because it is not supported by the local LIS applications and the mapping is difficult. We also did not adopt the SNOMED CT, because China is not a national member of the IHTSDO and has no license for SNOMED CT use.
For the EHR data service standard, we noticed that the IHE QED profile serves this goal. But after a closer look at the IHE QED profile, we think it is not compatible with the CDA model, because the messaging format of QED is based on the HL7 care provision domain, whose underlying clinical statement model is not exactly the same as that of CDA.
CDA Repository
The efficient management, storage and retrieval of CDA documents is a challenge for EHR systems. There have been proposals for CDA storage in a relational database 14, an object-relational database 15 and an XML database 16, 17. Based on the characteristics of CDA documents and the requirements of EHR applications, we propose to use a hybrid relational-XML database for the storage of CDA documents. We believe that a hybrid relational-XML database is more suitable for CDA storage than a relational database or an XML database. Yuwen et al. 18 also propose to use a hybrid relational-XML database to store CDA documents; however, the schema design is not clear and the tests were performed on only 100 CDA documents. Our current CDA repository does not support the context propagation mechanism and the negation indicator in the CDA specification; we will further extend its functionality and evolve it into a more general CDA repository.
Conclusion
This paper describes the design and implementation of a regional health information system based on electronic health records. We mainly highlight (1) the simple document exchange protocol that enables local application vendors in China to adopt international interoperability standards such as CDA and the IHE XDS profile; and (2) the use of hybrid relational-XML storage of CDA documents, which leverages both relational and XML optimization techniques to improve the performance of flexible clinical queries.
Acknowledgments
We thank Peking University People’s Hospital for their great support and Sarah Knoop for her careful proofreading of this paper.
References
- 1. eHealth Initiative. The state of health information exchange in 2010: connecting the nation to achieve meaningful use. 2010.
- 2. Fontaine P, Ross SE, Zink T, et al. Systematic review of health information exchange in primary care practices. J Am Board Fam Med. 2010;23(5):655–670. doi: 10.3122/jabfm.2010.05.090192.
- 3. Chen Z. Launch of the health-care reform plan in China. Lancet. 2009;373(9672):1322–4. doi: 10.1016/S0140-6736(09)60753-4.
- 4. Dolin RH, Alschuler L, Boyer S, et al. HL7 clinical document architecture, release 2.0. J Am Med Inform Assoc. 2006;13(1):30–9. doi: 10.1197/jamia.M1888.
- 5. ACC, HIMSS and RSNA. Integrating the healthcare enterprise IT infrastructure technical framework volume 1: integration profiles. 2007.
- 6. CHIMA and Accenture. The white paper on China’s hospital information systems. 2008. Available at: http://www.chima.org.cn/pe/DataCenter/UploadFiles_8400/200812/20081219115545203.pdf
- 7. The IHE-C Connectathon 2008, 2009, 2010 testing results. Available at: http://www.ihec.org/intro/news/
- 8. Safran C, Bloomrosen M, Hammond WE, et al. Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. J Am Med Inform Assoc. 2007 Jan-Feb;14(1):1–9. doi: 10.1197/jamia.M2273.
- 9. HL7 and ASTM. HL7 Implementation Guide: CDA Release 2 – Continuity of Care Document (CCD). 2007.
- 10. Moore S. IHE Patient Care Coordination Technical Framework Supplement: CDA Content Modules. 2009.
- 11. Beyer KS, Cochrane RJ, Josifovski V, et al. System RX: one part relational, one part XML. Proc. of SIGMOD Conference; 2005.
- 12. Shin SK, Sanders GL. Denormalization strategies for data retrieval from data warehouses. Decision Support Systems. 2006:267–282.
- 13. IHE QED Profile: http://wiki.ihe.net/index.php?title=PCC-1#Query_Existing_Data.
- 14. Eggebraaten TJ, Tenner JW, Dubbels JC. A health-care data model based on the HL7 Reference Information Model. IBM Systems Journal. 2007;46(1).
- 15. Liang Z, Bodorik P, Shepherd M. Storage model for CDA documents. Proceedings of the 36th Hawaii International Conference on System Sciences.
- 16. Li H, Dua H, Lu X, et al. A clinical document repository for CDA documents. The 1st International Conference Bioinformatics and Biomedical Engineering; 2007.
- 17. Bianchi S, Burla A, Conti C, et al. Semantic warehousing of diverse biomedical information. NGITS. 2009:73–85.
- 18. Yuwen S, Yang X, Li H. Research on the EMR storage model. 2009 International Forum on Computer Science-Technology and Applications.