Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results for Microbiology from the PHIS+ Consortium

Ramkiran Gouripeddi; Phillip B Warner; Peter Mo; James E Levin; Rajendu Srivastava; Samir S Shah; David de Regt; Eric Kirkendall; Jonathan Bickel; E Kent Korgenski; Michelle Precourt; Richard L Stepanek; Joyce A Mitchell; Scott P Narus; Ron Keren

2012 Nov 3;2012:281–290.

PMCID: PMC3540481 PMID: 23304298

Abstract

Microbiology study results are necessary for conducting many comparative effectiveness research studies. Unlike core laboratory test results, microbiology results have a complex structure. Federating and integrating microbiology data from six disparate electronic medical record systems is challenging and requires a team with varied skills. The PHIS+ consortium, which is a partnership between members of the Pediatric Research in Inpatient Settings (PRIS) network, the Children’s Hospital Association, and the University of Utah, has used “FURTHeR” for federating laboratory data. We present our process and initial results for federating microbiology data from six pediatric hospitals.

Introduction

A great deal of infectious diseases research has been conducted using administrative data. In these studies, patients are defined as having an infection based on the presence of specific International Classification of Diseases, 9th Revision, Clinical Modification (ICD9-CM) codes and/or charges for diagnostic tests and therapies. However, in the absence of microbiology test results, it is difficult to validate the accuracy of case ascertainment strategies that rely on administrative data alone. The availability of microbiology test results could greatly enhance the accuracy of case ascertainment and, for bacterial infections, allow investigators to include antimicrobial susceptibility as a variable in their analyses.

As part of an Agency for Healthcare Research and Quality (AHRQ) funded project to generate new high quality evidence on the comparative effectiveness of healthcare interventions for hospitalized children, the Pediatric Research in Inpatient Settings (PRIS) network, the University of Utah Biomedical Informatics Core, and the Children’s Hospital Association (CHA) (formerly called Child Health Corporation of America) have partnered to augment CHA’s existing electronic database of detailed administrative data - the Pediatric Health Information System (PHIS) - with laboratory and radiology results from six children’s hospitals across multiple sites of care (inpatient, outpatient, emergency department, and same day surgery)¹. The augmented database, called PHIS+, has been developed on the Federated Utah Research and Translational Health electronic Repository (FURTHeR) platform²⁻⁴. This platform, which was developed by the Biomedical Informatics Core (BMIC) researchers from the University of Utah (UU), federates and integrates heterogeneous data from multiple data sources, providing syntactic and semantic data interoperability for clinical and translational research purposes. As part of the AHRQ-funded project, we will perform four comparative effectiveness research (CER) studies, three of which involve infectious diseases (pneumonia, osteomyelitis, and appendicitis). In order to federate the microbiology data, we created a data model to describe the results.

Microbiology results have complex data models, which become even more complicated when using data from multiple sources. Most previous attempts at building a data model for microbiology results have been limited in their scope, either addressing the needs of a single data source or creating a repository for a specific use-case⁵⁻⁷. In this paper we describe the processes and initial results from the PHIS+ Microbiology database development efforts for the purposes of conducting pediatric CER studies.

Microbiology Data Federation Process Description

Microbiology Working Group

The PHIS+ consortium assembled a microbiology working group, which included specialists in pediatric infectious diseases and hospital medicine, information technology, microbiology, and informatics. Microbiology data are different from typical laboratory data in several important ways. They have multiple elements with different cardinalities between them, have a lag time for reporting final results, and often have multiple preliminary reports before a final result is reported. Queries may focus on organism incidence, specimen source, or susceptibilities of pathogenic organisms. Negative results (cultures with no growth) may also be important. Based on these discussions we concluded that a model needed to support these temporal aspects and multiple query types.

Using a simplified micro data model (Figure 1), we analyzed each of the fields that could be mapped to standard terminology. We discussed the availability of various time stamps associated with microbiology data. The microbiology working group representatives discussed their microbiology data with their laboratory and IT personnel to determine if they could provide data as represented in the simplified data model.

Pilot Study

The six contributing institutions use different laboratory information systems, electronic medical records and data warehouse systems as described previously¹. To understand each institution’s data and to verify that the data could be provided as specified by the simplified data model, each institution initially submitted one week’s sample of de-identified data. Our goals were to determine whether 1) the sites could provide the data, 2) the data could be provided in a discrete manner, and 3) data were coded using an in-house or standard terminology. Five of the six sites were able to extract and provide data as indicated by a common data model. Some microbiology tests, such as the rapid streptococcal antigen test, are similar to typical laboratory tests consisting of a test and a result value. They have neither the temporal aspect of culture studies nor their complex data model. Thus, we decided to group these non-culture microbiology studies with the laboratory component of PHIS+¹.

Standard Terminology Mapping

As a next step in the process, the informatics team identified fields that could be mapped to standard terminology. As part of FURTHeR’s regular process for mapping terms to standard terminology, the BMIC follows recommendations provided by Healthcare Information Technology Standards Panel (HITSP)⁸. Based on these recommendations, we used Systematized Nomenclature of Medicine--Clinical Terms (SNOMED)⁹ as the standard for specimens. After analyzing the local specimen terms and the coverage SNOMED provides, we decided that each local specimen term could be mapped to a SNOMED Specimen Type concept (e.g., swab specimen from wound), which is the specimen substance submitted for microbiology testing; and to a SNOMED Body Site concept (e.g., chest), which is the body structure from where the specimen was obtained. We considered using the PHIS drug hierarchy, a proprietary classification system employed by CHA, for antimicrobials but instead chose RxNorm¹⁰.

RxNorm is HITSP’s recommended standard for drugs and is updated on a regular basis. SNOMED was the choice for micro-organisms. One specific concern centered on the potential for future changes in names of micro-organisms and their taxonomical classification. For example, the bacterium Enterobacter sakazakii was recently reclassified in a novel genus called Cronobacter which resulted in the now recognized Cronobacter sakazakii¹¹. SNOMED provides regular updates that incorporate such changes using synonyms and relationships to manage them. Some of the local terms were ambiguous and could not be mapped to standard terminology. Sometimes local terms were in a coded form (e.g. ‘NPSUC’ for ‘nasal washing’), and others had multiple potential mappings (e.g. ‘tip’ which turns out to be a ‘catheter tip specimen’). We discussed these local terms with the respective sites and arrived at suitable mappings. We also had the IT and microbiology lab staffs at each of the sites review all of the mappings in their one week files and provide any feedback. Mappings were updated based on their feedback. By following this process we were able to map all the local terms in the one week files to standard terminology. The choices of the standards are described in the Common Data Model (CDM) section below.

Review of Mappings

Final mappings were reviewed by three pediatric infectious diseases (ID) specialists. During this review we gained a better understanding of what researchers would expect to be stored in the PHIS+ database. We discussed the modeling, choices of terminology standards for each of the fields, incorporated the group’s opinions and made the following observations:

A local specimen will usually have a specimen type (but not in all cases) and may or may not have a body site. (e.g., a local specimen ‘aspirate’ is a specimen obtained by aspiration but we need the body site information to determine the fluid aspirated).
A LOINC concept for a microbiological culture study is post-coordinated for the culture and results (e.g. bacteria identified in “anterior nares by aerobic culture”). We were interested in culture studies alone and therefore preferred to map them to SNOMED.
Sites represented their micro-organisms in multiple ways and most of these have specific mappings to SNOMED.
- ○ Most often, micro-organisms are provided at the species level but at other times they are at genus level (e.g., Staphylococcus aureus vs. Staphylococcus species). Rarely organisms were provided in a more complex manner indicating that they belonged to a particular genus but were not a particular species (e.g. Staphylococcus species, not aureus). In our discussions we decided that in such cases it is sufficient to provide mappings at the genus level.
- ○ At times, micro-organisms are provided as microbiological classification terms (e.g., coagulase-negative Staphylococcus, Gram-negative rods).
- ○ Culture studies that were negative for organisms or growth are often reported and we decided that it was important to store these results. Results indicating no organisms in a culture study are provided in multiple ways. Culture studies usually have a report of “no organism or growth seen”, but at times they could report “organism X not seen” or “normal flora”. Sometimes the reports have more details such as “no growth seen after two days”. The microbiology working group, considering a CER perspective, concluded that it was not important to model these complex negations and the time-associated qualifiers; it would be sufficient to map all of these to a standard concept that denoted “no organism or growth”. We also decided to retain the local term phrasing for these negative results in the PHIS+ database, making it available for researchers interested in this level of detail.
- ○ At times, specific strains of a species are provided as the organism. SNOMED provides some organism strains and we decided to map local organism strains to the corresponding SNOMED concepts. In cases where a local strain did not have a corresponding SNOMED map, we mapped it at the species level and submitted that particular strain to IHTSDO for their review and inclusion into SNOMED (e.g. strains of Enterococcus faecalis). We will update the mappings of those organisms in the PHIS+ database as they become available in SNOMED.
Some sites were not able to provide a unique specimen identifier. It was possible, however, to hypothesize a common specimen by using the specimen type, source, collection time, and number of cultures to determine whether multiple specimens were divided portions of one specimen. For example, a blood specimen provided for two different cultures and obtained from a venous line at a particular time can be assumed to be one specimen that has been divided into two bottles. In the absence of an explicit specimen ID, inferring one from other fields could be complicated and error prone. We therefore decided to leave the specimen ID field blank when it was not explicitly provided by the sites.
An organism can initially be reported by the microscopic morphology as “Gram-positive cocci” and then later be identified as a specific organism from culture results (“Staphylococcus aureus”). We therefore decided to request a preliminary reporting date/time and final reporting date/time for each organism. In addition to these time-stamps we have the susceptibility time that is updated when the organism’s susceptibility data become available.

Common Data Model (CDM) for Data Submission

Using the one week data samples provided, BMIC had discussions with the five sites that were able to provide their data in a discrete manner to understand the structure of their microbiology data models. During these discussions, we verified the unique identifiers used for patients, encounters, specimens, culture studies, and sequence numbers used for micro-organisms and antimicrobial susceptibility tests. We also noted the various fields and time-stamps that each site associated with patients, encounters, specimens, culture studies, micro-organisms and susceptibility tests. We then developed graphical representations of each site’s micro data model and had them reviewed by each site. Next, using these individual models, we came up with a harmonized model (Figure 2).

Figure 2: Assessment of metadata harmonization. Each site’s fields have been grouped into the CDM’s fields, associated sub-fields, and data elements.

FURTHeR’s processing of local terminology and data models requires knowledge about how local systems define, use and store their data. As many of the sites have fields that are free-text and are not in a coded form, we decided that rather than using local terminology it would be better to discover metadata during the processing of actual data files. This process was later supplemented with discussions with sites having ambiguous terms in their files.

The BMIC developed file format specifications for sites to submit future microbiology data based on the harmonized model. These file specifications were reviewed by the microbiology working group for domain coverage, by IT personnel at each site for their ability to provide data in this format, and finally with CHA to detect and handle fields with protected health information (PHI). We developed a final specification based on the feedback we received. With this file format, we are allowing sites some flexibility in how they will report their microbiology results in order to satisfy their local storage/reporting capabilities. FURTHeR allows heterogeneous data embedded in the harmonized data model and maps it to a common PHIS+ Micro Data Model.

Using the developed file format, we requested that sites submit their data as a text file based on the HL7 v2.x message syntax¹², using a pipe delimiter between data fields. As depicted in the harmonized model in Figure 2, each patient could have one or more encounters and could provide one or more specimens in a given encounter. Each specimen could have one or more microbiological studies performed on it, and each of these studies in turn could report zero or more microorganisms. Also, susceptibility tests with multiple antibiotics may be performed. We have included all administrative data of a patient encounter in the Patient Data Element (PDE); specimen and microbiological culture/study information in the Culture Data Element (CDE); organism related information in the Organism Data Element (ODE); and susceptibility information in Susceptibility Data Element (SDE). Due to lack of specimen identifiers with some sites we had to combine all specimen information with that of the culture study and include it in the CDE. Each row in the file ends with a carriage return. A description of the file format, data elements and field contents, and example culture studies explaining the hierarchical nature of the data were provided to the sites.

To maintain the hierarchical nature of the microbiological data model, we specified the use of a set of HL7-like type tags¹² and sequence numbers in the file format. The tags indicate the type of data element and the beginning and end of each row of data. PDEs, CDEs, ODEs and SDEs are tagged at the beginning and end using the following tags respectively: “PDE” and “PEOL”, “CDE” and “CEOL”, “ODE” and “OEOL”, and “SDE” and “SEOL”. Patient administrative information for each encounter is reported in a new PDE row. A CDE, which includes the culture study, is reported in a new row under the PDE corresponding to the patient for whom the study was provided. The organisms reported in that culture study are present below the corresponding culture study result row. The susceptibility tests performed on each of these organisms are present below the corresponding organism row. Each PDE row has a sequence number [PDE] (PFSN). A CDE below a PDE is designated as belonging to that PDE. A CDE will have the PFSN to which it belongs and a File Sequence Number [CDE] (CFSN). This ties the culture study to the patient from whom the specimen was obtained. All ODE rows under a CDE row are designated as belonging to that CDE and are numbered incrementally starting from “1” until the next CDE row is encountered. Each ODE row will have the PFSN and the CFSN to which it belongs and a File Sequence Number [ODE] (OFSN). This ties the organism to the culture in which it was observed and the patient from whom the specimen was obtained. All SDE rows after an ODE row are designated as belonging to that ODE and are numbered incrementally starting from “1”. Each SDE row will have the PFSN, CFSN and OFSN to which it belongs and a File Sequence Number [SDE] (SFSN). This ties the susceptibility test to the organism it was performed on, the culture in which the organism was observed and the patient from whom the specimen was obtained.
A CDE cannot exist without a parent PDE, an ODE cannot exist without a parent CDE, and an SDE cannot exist without a parent ODE.
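
The row layout described above can be illustrated with a minimal Python sketch. It assumes each row has the shape start tag, sequence numbers, payload fields, end tag, separated by pipes, with the sequence numbers ordered PFSN, CFSN, OFSN, SFSN as in the Figure 3 example. The payload fields and the names `Row` and `parse_row` are hypothetical and are not part of the PHIS+ file specification.

```python
from dataclasses import dataclass

# Start tag -> (end tag, number of leading sequence numbers), per the text above.
TAGS = {
    "PDE": ("PEOL", 1),  # PFSN
    "CDE": ("CEOL", 2),  # PFSN, CFSN
    "ODE": ("OEOL", 3),  # PFSN, CFSN, OFSN
    "SDE": ("SEOL", 4),  # PFSN, CFSN, OFSN, SFSN
}

@dataclass
class Row:
    tag: str
    fsns: tuple   # parent sequence numbers first, this row's own number last
    fields: list  # remaining payload fields (layout is hypothetical)

    @property
    def parent(self):
        """Sequence numbers of the parent row; empty tuple for a PDE."""
        return self.fsns[:-1]

def parse_row(line: str) -> Row:
    """Split one pipe-delimited row and check its start and end tags."""
    parts = line.rstrip("\r\n").split("|")
    tag = parts[0]
    if tag not in TAGS:
        raise ValueError(f"unknown start tag: {tag!r}")
    end_tag, n = TAGS[tag]
    if parts[-1] != end_tag:
        raise ValueError(f"{tag} row must end with {end_tag}")
    if len(parts) < n + 2:
        raise ValueError(f"{tag} row needs {n} sequence numbers")
    fsns = tuple(int(p) for p in parts[1:1 + n])
    return Row(tag, fsns, parts[1 + n:-1])
```

Because every row carries the sequence numbers of its ancestors, the parent of any row can be read directly from the row itself, which is what allows the hierarchy to be rebuilt even if rows arrive out of order.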

Creation of the PHIS+ Micro Data Model

As a final step in the modeling efforts, BMIC enhanced FURTHeR’s processing capabilities by developing and incorporating a conceptual model for microbiology data storage. The entities and their relationships in this PHIS+ Micro Data Model were extracted from the harmonized data model and the submission file format specifications. Based on this conceptual model, we developed the logical and physical database models for microbiology data storage at CHA. The PHIS+ model consists of five data elements; the four present in the input file and a Specimen Data Element (SpDE) that will be created from the input CDE (Figure 4). Each data element in the storage model has its associated sequence number and all the input fields. Apart from these it also has the translated fields containing mappings to standard terminology (Table 1).

Figure 4: Entity Relationship Diagram of the Harmonized and PHIS+ Micro Data Models.

Table 1: Mapped fields within the input data element, corresponding fields in the output database and the choices of standards for mappings.

Submitted Data File Format (Input)		PHIS+ Micro Model (Output)		Standard Terminology
Data Element	Field Name	Data Element	Field Name	Standard Terminology
CDE	Specimen	SpDE	spe_cd	SNOMED CT
		SpDE	spe_bodysite_cd	SNOMED CT
CDE	Local Culture Code	CDE	cul_cd	SNOMED CT
CDE		CDE	cul_stain_cd	SNOMED CT
CDE	Culture Result	CDE	cul_result	SNOMED CT
CDE	Culture Normalcy	CDE	cul_normalcy	HL7
ODE	Organism Code	ODE	org_cd	SNOMED CT
SDE	Local Antibiotic Code	SDE	sus_ab_cd	RxNorm
SDE	Susceptibility Test Code	SDE	sus_test_cd	LOINC
SDE	Susceptibility Test Interpretation	SDE	sus_interp_cd	HL7

Collection of One Month Sample

We then began collecting one month’s sample of data from the sites using the developed microbiology file format for submission. In this step we wanted to confirm that the sites could 1) generate data in the specified format, 2) use the data sample for software development, 3) map the local terms in these files to standards, and 4) process the files using FURTHeR. Because of our data use agreements and IRB protocols, the BMIC could receive only de-identified files. The sites were instructed to compile all the microbiology results from the month of January 2009 and submit them to CHA. CHA then stripped all the PHI fields (i.e., Patient ID, Billing Number, Encounter ID) and set all date/time fields to the same value (“January 1, 2009, 9:00 am”) before forwarding them to BMIC. The de-identified files and original files could be joined via the Sequence Number fields. All files were transmitted using secure FTP. All contributing sites were able to generate the data files in the specified format.
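
The de-identification step described above can be sketched as follows. This is an illustration only: it assumes rows are held as dictionaries, that stripping a PHI field means blanking it, and that the field names (`patient_id`, `billing_number`, `encounter_id`, `pfsn`) are hypothetical stand-ins for the actual file fields.

```python
from datetime import datetime

# Hypothetical field names; the text names the PHI fields only informally.
PHI_FIELDS = {"patient_id", "billing_number", "encounter_id"}
FIXED_TIMESTAMP = datetime(2009, 1, 1, 9, 0)  # "January 1, 2009, 9:00 am"

def deidentify(record: dict) -> dict:
    """Return a copy with PHI blanked and every datetime set to one value."""
    clean = {}
    for key, value in record.items():
        if key in PHI_FIELDS:
            clean[key] = ""
        elif isinstance(value, datetime):
            clean[key] = FIXED_TIMESTAMP
        else:
            clean[key] = value  # sequence numbers and coded fields pass through
    return clean

def rejoin(original_rows, deidentified_rows, key="pfsn"):
    """Pair original and de-identified rows on a shared sequence number."""
    by_key = {row[key]: row for row in original_rows}
    return [(by_key[row[key]], row) for row in deidentified_rows]
```

Leaving the sequence numbers untouched is what keeps the de-identified rows joinable to the originals held by CHA.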

Terminology Mappings and use of Metamap

Upon receiving the de-identified data files, BMIC analyzed them to identify inconsistencies with the file formats, as well as the relationships between the data elements and the fields in each data element. Sites were asked to resubmit their data when inconsistencies were identified. Unique terms in the input fields listed in Figure 3 in each site’s file were extracted and, following a process similar to that used for the one week files described above and previously¹, each of these local terms was mapped to the corresponding standard terminology. Many of these local terms are free-text entries, so the BMIC terminologist used a local install of Metamap¹³ to semi-automate the mapping process. This was followed by a review of the generated mappings. These local terms and their mappings to standards were added to the terminology server in the site namespaces.

Figure 3: Example Micro Result showing the use of tags at the beginning and end of each row. The tag at the beginning of each row is followed by sequence numbers that tie a child data element with its parent. The SDE element belongs to the PDE having the PFSN as 321, and the CDE with CFSN as 321. It has an OFSN of 1, which indicates that it belongs to the first organism reported for this culture study. It is the 12th susceptibility test performed on this organism, indicated by the SFSN of 12.

Software Development, Data Processing and Error Logs

The input files are processed using a command-line-driven framework that implements FURTHeR’s translation engine³. The system may be run on any UNIX-based or Microsoft Windows operating system. Supported databases currently include MySQL and Microsoft SQL Server.

Input Data Format Problems and Considerations: The initial format expected each PDE followed by one or more CDEs, each of which was followed by its related ODEs, and so forth. However, data sent by CHA were sorted in a way that did not preserve the hierarchical ordering of the data. In other words, all PDEs were grouped at the top of the file, followed by all CDEs, and so forth. This introduces a data quality issue (data element association must be maintained), as FURTHeR’s file processing is batched³, where the ordering of the data and dependencies between each element cannot be considered during processing. Furthermore, a configurable number of lines are parsed and transformed into in-memory objects, processed, then persisted as a group to the database (minimizing round-trips to the database, thereby optimizing database persistence performance), thus allowing the framework to restrict the maximum number of in-memory objects, thereby reducing the memory workload and potential for crashing. These features require the framework to account for micro data elements by either requesting dependencies from the database for each element, or maintaining an internal record of all data elements encountered during processing. For performance reasons, we decided to maintain an internal accounting of data elements, as described below.

Data entities within the database are associated by foreign-key (FK) relationships using the Batch ID from the Batch table, and the list of FSNs (see Figure 4), as opposed to the typical method of foreign-key-to-primary-key relationships. An accounting of this composite FK constraint allows the framework to know which data elements have been encountered during processing (versus the alternative, i.e., querying the database for dependencies before attempting to persist each data element). If the dependencies for a given data element (namely the FSNs, as the Batch ID is always readily available) have not yet been persisted, the framework caches the data element until its dependencies are encountered after further processing. Once a data element is encountered, it is added to the data set (page) that is to be persisted and subsequently removed from the internal data element cache. This approach removes the potential for FK constraint violations. Finally, any data elements remaining in the cache after processing is complete, are considered orphans (i.e., the FSN dependencies for each were not encountered during processing), and are reported to the end user (a CHA administrator).
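
The internal accounting described above can be sketched in a few lines of Python. This is a simplified illustration, not FURTHeR's implementation: a plain dictionary stands in for the cache, a list stands in for the page of elements to be persisted, each element is reduced to its own sequence numbers and those of its parent, and the name `order_for_persistence` is hypothetical.

```python
from collections import defaultdict

def order_for_persistence(elements):
    """elements: iterable of (own_fsns, parent_fsns) tuples within one batch.

    Returns (persist_order, orphans). A root element has parent_fsns == ().
    """
    seen = {()}                  # keys already queued for persistence
    waiting = defaultdict(list)  # parent key -> elements cached until it appears
    persist_order = []

    def release(key):
        # Queue an element, then any cached descendants that were waiting on it.
        stack = [key]
        while stack:
            current = stack.pop()
            seen.add(current)
            persist_order.append(current)
            stack.extend(waiting.pop(current, []))

    for own, parent in elements:
        if parent in seen:
            release(own)
        else:
            waiting[parent].append(own)

    # Anything still cached never saw its parent: report it as an orphan.
    orphans = [own for cached in waiting.values() for own in cached]
    return persist_order, orphans
```

In this sketch a parent is always queued before its children, so a composite foreign-key constraint on the sequence numbers would not be violated, and no database lookup is needed to check for dependencies.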

We use EHCache¹⁴ to temporarily store the data elements with missing dependencies. We use an internal hash-set to maintain the hierarchy of encountered data elements; we leverage two lists of sequence numbers, the first being the list of dependencies (for finding dependencies), while the other is the list of sequence numbers identifying the current element (for updating the internal hash-set for the encountered element). To support generic processing of each Micro data element, we used the Abstract Factory pattern¹⁵; to support multiplicity of data elements, we used the Builder pattern¹⁵ to construct the object indicated by each line of the input file. Finally, we used the Java Persistence API¹⁶ (specifically Hibernate¹⁷) for persistence.

Translations: Translations are performed using a translation engine interacting with our terminology server (Apelon DTS). The system logs translation errors, which consist of translations that were not found and translations that returned too many results. These errors are aggregated into sections and assist the terminologist in determining what caused the translations to fail. The system also logs errors pertaining to data quality, which include parse errors (newlines, date formats), improperly formatted lines, orphan lines, duplicate lines, and missing data for required fields. Once processing of the input file has completed, the logs are printed to STDOUT in an aggregated format. While FURTHeR logs these general data errors, specific data quality checks related to each test type and site are expected to change with time; these changes will be managed by CHA, who will take over all PHIS+ work beyond the grant period.
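
The two kinds of translation error described above can be illustrated with a short Python sketch. It makes several assumptions: an in-memory dictionary stands in for the terminology server (no Apelon DTS calls are shown), the codes in any usage are placeholders, and the names `translate_all` and `format_report` are hypothetical.

```python
from collections import defaultdict

def translate_all(terms, mappings):
    """terms: iterable of (field, local_term); mappings: {(field, local_term): [codes]}.

    Returns (translated, errors), where errors groups failed terms by reason and field.
    """
    translated = {}
    errors = {"not_found": defaultdict(set), "too_many": defaultdict(set)}
    for field, term in terms:
        candidates = mappings.get((field, term), [])
        if len(candidates) == 1:
            translated[(field, term)] = candidates[0]
        elif not candidates:
            errors["not_found"][field].add(term)
        else:
            errors["too_many"][field].add(term)
    return translated, errors

def format_report(errors):
    """Render the aggregated errors as text lines, one section per reason."""
    lines = []
    for reason in ("not_found", "too_many"):
        lines.append(f"[{reason}]")
        for field in sorted(errors[reason]):
            lines.append(f"  {field}: {', '.join(sorted(errors[reason][field]))}")
    return lines
```

Collecting each failed term once per field, rather than once per row, keeps the report short enough for a terminologist to review.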

Microbiology Data Results

The BMIC was able to use the modified FURTHeR platform to read the one month text data file, perform the translations using the mappings stored in the terminology server and load rows into their corresponding tables in the database. Table 2 shows the number of records for each data type in the database. The table also contains the number of unique local terms and their corresponding standard codes.

Table 2: Micro data result counts from the six hospitals (sites have been anonymized).

Sites		A	B	C	D	E	F
Number of Records in the Database	PDE	10139	12982	5445	1535	1640	4364
	CDE	10139	12982	5445	1535	1640	4364
	SpDE	10139	12982	5445	1535	1640	4364
	ODE	839	1633	988	626	506	4459
	SDE	10302	12829	6094	11739	5594	7349
Number of Unique Mapped Fields and their Corresponding Standards	Specimen	95	305	70	53	156	251
	SNOMED (Specimen Type)	82	128	62	50	130	146
	SNOMED (Body Site)	31	43	17	14	23	69
	Local Culture Code	59	34	33	22	61	28
	SNOMED (Culture)	52	31	24	21	56	28
	SNOMED (Stain)	12	3	14	1	3	1
	Culture Normalcy	-	2	-	-	2	2
	HL7	-	2	-	-	2	2
	Organism Code	55	53	83	38	53	130
	SNOMED	53	50	83	36	49	56
	Local Antibiotic Code	48	58	77	32	52	59
	RxNorm	42	50	73	30	48	54
	Susceptibility Test Code	102	113	126	56	76	97
	LOINC	102	113	126	56	76	97
	Susceptibility Test Interpretation	5	3	5	5	5	3
	HL7	3	3	5	5	5	3

Next Steps

We are currently collecting one year’s sample of micro data (2009) using the same file specifications and processes. BMIC will use this sample for further testing and make any necessary updates to the software. The local terms in these files will be translated to standard codes and included in the PHIS+ database. We will also review the mappings generated from these files for quality control. We will be collecting back-fills from 2007 onwards and prospective loads in a similar manner.

Discussion

Through the collaboration of multiple domain experts at multiple sites, the PHIS+ micro working group and BMIC successfully mapped hospital microbiology data to a common, standards-based database. As with the laboratory data federation¹, we found that dealing with data from the six pediatric hospitals simultaneously helped reveal various issues and sped up the modeling process. Our exploration of the different ways in which data were modeled across these sites facilitated the development of an accommodative model. For example, sites that report “no organism or growth” results were allowed to submit these data as present in their systems and we simply included processes to integrate these fields. We also found that working with smaller samples initially helped sites generate files as specified in the file format and helped BMIC in developing the software processes.

Reviewing the models and mappings with the ID specialists helped the BMIC to understand the needed representation of microbiology data for CER studies. More complex models could have been developed from the data provided by the sites, which would then have their local terms mapped to very granular standard terminology mappings. But we realized this level of granularity would not have been useful for the PHIS+ context. For example, local organism terms associated with various qualifiers could have been managed by post-coordinating SNOMED concepts for representing them, but this would not have been useful. We will consider sharing the data mapping processes to the CDM model along with any metadata rules and logic available via a public resource in the near future, once the model has stabilized and the software is deployed in production.

Many sites have local terms as free text entries. A local term can be provided in different variations and combinations (e.g. Arterial blood, blood arterial, art. Blood) and is prone to errors in spellings. It is a tedious task to map and manage all these variants. While the terminologist is currently using Metamap as a semi-automated mapping tool, BMIC plans to integrate this into FURTHeR. Some sites have started coding their fields with in-house vocabularies and this will improve the quality and efficiency of our mapping process.
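
The variant problem described above can be made concrete with a small Python sketch. It does not use Metamap; it substitutes a much simpler technique (lowercasing, stripping punctuation, expanding a small abbreviation table, and sorting tokens) purely for illustration. The abbreviation table and the function names are hypothetical.

```python
import re

# Hypothetical abbreviation table; a real one would be curated with each site.
ABBREVIATIONS = {"art": "arterial"}

def normalize_term(term: str) -> str:
    """Order-insensitive, case-insensitive key for a free-text local term."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(sorted(expanded))

def group_variants(terms):
    """Group raw terms that share a normalized key."""
    groups = {}
    for term in terms:
        groups.setdefault(normalize_term(term), []).append(term)
    return groups
```

With this key, the three variants in the example above fall into one group, so only one mapping decision is needed for them. The sketch does not handle spelling errors, and sorting tokens discards word order, so any grouping it produces would still need review by a terminologist.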

The main advantage of a hierarchical format with our four data elements tagged at the beginning and end is that it keeps the file size small. Each culture could report multiple organisms and each of these organisms could have multiple antimicrobial susceptibility results. Combining all of the elements into a single row, with the susceptibility test result as the unique field in each row, would produce a lot of redundant data. A potential problem is that some new institutions joining PHIS+ may not be able to provide data as specified in the file format. Our harmonized model was developed based on the data models of the five sites that could provide their data discretely. What was encouraging was that this data model was suitable for the sixth site, which could not provide discrete data at the time of the model development. Nevertheless, there still remains the potential problem of a site not being able to provide data in the specified format. We are considering implementing a metadata repository (MDR) to overcome this problem. The MDR translates each local model to the FURTHeR data model and liberates the system from dependencies arising from new data sources/elements². The MDR could use the metadata associated with each field and provide a means to combine different fields in different sites’ input files that provide the same information (e.g. “no organism or growth”). FURTHeR already uses an MDR in its normal operating mode², but it was not included in the PHIS+ architecture for the sake of operational simplicity. Adding the MDR would make FURTHeR even more generalizable in terms of accommodating new sites, and also outside of PHIS+, with minimum local customization and central efforts towards metadata harmonization and mappings. As most of the software, metadata and terminology management technologies are easily available¹⁻⁴, most of the local customization will be limited to mapping local terminologies to those of the FURTHeR model.

Although we considered the HL7 Microbiology messaging specification¹², the site-wise metadata variability and our centralized architecture led us to devise our own file format specification. Requiring the hospitals to generate HL7 messages within the short duration of the grant period would have been an even bigger challenge. Our aim in this effort was to standardize the data in order to support all types of CER studies, including not only those with therapeutic comparators but also others, such as those comparing clinical care processes. Supporting these studies would have required extending the HL7 specifications, which are primarily developed for public health reporting¹². Our plan to introduce an MDR will give us more flexibility with regard to the submission file format, so that sites can submit data based on what is available to them. At that time we will revisit the standard messaging options and consider the best option for PHIS+. A complete discussion of our messaging efforts will be presented in a separate publication.
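To make the idea of MDR-driven flexibility more concrete, the following Python sketch shows one way that per-site field metadata could translate differently shaped submissions into a common record. It is a simplified illustration, not the FURTHeR MDR: the site names, field names, the `NO_GROWTH` marker and the list of local "no growth" values are all assumptions.

```python
# Hypothetical metadata: maps each site's local field name to a common field.
FIELD_MAP = {
    "site_a": {"org_name": "organism", "spec_src": "specimen_source"},
    "site_b": {"isolate": "organism", "source": "specimen_source"},
}

# Hypothetical local values that all mean "no organism or growth".
NO_GROWTH_VALUES = {"no growth", "no organisms seen", "ng"}


def to_common(site, record):
    """Translate one local record into the common model using the metadata."""
    mapping = FIELD_MAP[site]
    common = {}
    for local_field, value in record.items():
        target = mapping.get(local_field)
        if target is None:
            continue  # fields without metadata are skipped in this sketch
        if target == "organism" and value.strip().lower() in NO_GROWTH_VALUES:
            value = "NO_GROWTH"  # single common marker for all local variants
        common[target] = value
    return common


record_a = to_common("site_a", {"org_name": "No Growth", "spec_src": "blood"})
record_b = to_common("site_b", {"isolate": "NG", "source": "urine", "note": "x"})
```

Under these assumptions, adding a site means adding an entry to the metadata rather than changing the translation code, which is the kind of decoupling the MDR is intended to provide.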

We considered XML as the preferred format for microbiology data. However, because many of the individual hospitals could not support such a format, we continued to use the delimited file described above. Time constraints also guided this decision: XML would have required maintaining not only parsing software (with XML streaming APIs needed to minimize memory usage) but also a schema. Although XML would have been an ideal format given the hierarchical nature of the data, the maintenance and continued support required of all interested parties could have become a drawback. In the future, however, a tool that creates such XML files in a generic manner (accounting for the disparate data models of the various institutions) could be provided to all sites.
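The remark about streaming APIs can be illustrated with a short Python sketch that reads cultures one at a time with the standard library's `xml.etree.ElementTree.iterparse` instead of loading a whole file into memory. The element and attribute names are hypothetical, since no XML schema was defined for PHIS+, and the in-memory sample stands in for a large submission file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout; no such schema was defined in this study.
SAMPLE = b"""<cultures>
  <culture id="C1">
    <organism name="Escherichia coli">
      <susceptibility antimicrobial="ampicillin" interpretation="R"/>
    </organism>
  </culture>
  <culture id="C2"/>
</cultures>"""


def iter_cultures(stream):
    """Yield (culture id, organism names) as each culture element completes."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "culture":
            organisms = [org.get("name") for org in elem.findall("organism")]
            yield elem.get("id"), organisms
            # Drop the children of the finished culture to limit memory use.
            elem.clear()


cultures = list(iter_cultures(io.BytesIO(SAMPLE)))
```

In this sketch the cleared culture elements still remain attached to the root as empty nodes, so a production parser for very large files would also need to detach them from their parent.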

We must also highlight that this study was possible because the microbiology working group brought together diverse skills, including informaticists, researchers, microbiologists, and hospital medicine and infectious diseases physicians. We were helped by the clinical and IT personnel at each site and at CHA, who reviewed each step, ensuring their buy-in. The study depended on good communication among the six institutions, BMIC and CHA, and the project managers (PMs) across all the sites provided these channels. The effort was completed in less than a year, with many of the staff working on multiple simultaneous tasks, both related and unrelated to the project. The core informatics development work was done by a team of five consisting of a software engineer, data architects and terminologists.

Conclusion

During the second phase of the PHIS+ grant period, we were able to integrate microbiology data from six disparate sources and store it in a common database at CHA. We encountered multiple challenges but, with input from a multi-disciplinary team, were able to accomplish this work within a year. The next steps in the project will be to collect all microbiology data from 2007 onwards, process it and store it in the PHIS+ database. We will also incorporate radiology reports into the database and then perform CER studies using these data. Successful completion of the CER studies will be the true test of the value of this federated database.

Acknowledgments

The authors would like to thank the Oversight Committee and Information Technology Committee members and their staff at each institution and CHA. We especially thank the clinical microbiology laboratory staff at each site. We would like to acknowledge the extensive contributions of our PMs, Lauren Tanzer (CHOP), Matthew Whittaker (BMIC) and Jebi Miller (CHA), the PRIS Network Manager, Jaime Blank, and the PRIS Research Network. We acknowledge the terminology support provided by Apelon. FURTHeR development was supported by the NCRR and the NCATS, NIH, through Grant UL1RR025764 and supplement 3UL1RR025764-02S2. This project was funded under grant number R01 HS019862 from the AHRQ, U.S. Department of Health and Human Services (HHS). The opinions expressed [in this document] are those of the authors and do not reflect the official position of AHRQ or the HHS.

References

1. Narus SP, Srivastava R, Gouripeddi R, et al. Federating Clinical Data from Six Pediatric Hospitals: Process and Initial Results from the PHIS+ Consortium. AMIA Annu Symp Proc. 2011;2011:994–1003.
2. Bradshaw RL, Matney S, Livne OE, et al. Architecture of a Federated Query Engine for Heterogeneous Resources. AMIA Annu Symp Proc. 2009;2009:70–74.
3. Livne OE, Schultz ND, Narus SP. Federated Querying Architecture with Clinical & Translational Health IT Application. Journal of Medical Systems. 2011;35(5):1211–1224. doi: 10.1007/s10916-011-9720-3.
4. Matney S, Bradshaw RL, Livne OE, et al. Developing a Semantic Framework for Clinical and Translational Research. AMIA Summit on Translational Bioinformatics. 2011. Available at: http://proceedings.amia.org/16pc81/. Accessed March 12, 2012.
5. Ormond-Walshe S. Computerised databases in infection control. Nurs Stand. 2000;14(18):43–45. doi: 10.7748/ns2000.01.14.18.43.c2748.
6. Wisniewski MF, Kieszkowski P, Zagorski BM, et al. Development of a Clinical Data Warehouse for Hospital Infection Control. J Am Med Inform Assoc. 2003;10(5):454–462. doi: 10.1197/jamia.M1299.
7. Ueki S, Kayaba H, Tomita N, et al. Development of a microbiology data warehouse (Akita-ReNICS) for networking hospitals in a medical region. Rinsho Byori. 2011;59(4):364–371.
8. HITSP Clinical Document and Message Terminology Component. Available at: http://wiki.hitsp.org/docs/C80/C80-1.html. Accessed March 12, 2012.
9. SNOMED CT. Available at: http://www.ihtsdo.org/snomed-ct/. Accessed March 12, 2012.
10. RxNorm. Available at: http://www.nlm.nih.gov/research/umls/rxnorm/. Accessed March 12, 2012.
11. Iversen C, Lehner A, et al. The taxonomy of Enterobacter sakazakii: proposal of a new genus Cronobacter gen. nov. and descriptions of Cronobacter sakazakii comb. nov. Cronobacter sakazakii subsp. sakazakii, comb. nov., Cronobacter sakazakii subsp. malonaticus subsp. nov., Cronobacter turicensis sp. nov., Cronobacter muytjensii sp. nov., Cronobacter dublinensis sp. nov. and Cronobacter genomospecies 1. BMC Evol Biol. 2007;7:64. doi: 10.1186/1471-2148-7-64.
12. Implementation Guide for Transmission of Microbiology Result Messages as Public Health Information using Version 2.3.1 of the Health Level Seven (HL7) Standard Protocol. Available at: ftp://ftp.ihe.net/MarketingAndPresentations/HIMSS04/vendor_workshop/1_Microbiology_Implementation_Guide_2003-05-27.pdf.
13. MetaMap. Available at: http://metamap.nlm.nih.gov/. Accessed March 12, 2012.
14. Ehcache | Performance at Any Scale. Available at: http://www.ehcache.org/. Accessed March 12, 2012.
15. Gamma E, Helm R, Johnson R, Vlissides J. Design Patterns: Elements of Reusable Object-Oriented Software. 1st ed. Addison-Wesley Professional; 1994.
16. The Java Community Process(SM) Program - JSRs: Java Specification Requests - detail JSR# 317. Available at: http://jcp.org/en/jsr/detail?id=317. Accessed March 12, 2012.
17. Bauer C, King G. Hibernate in Action. Manning Publications; 2004.
