Author manuscript; available in PMC: 2013 Jul 1.
Published in final edited form as: Med Care. 2012 Jul;50(Suppl):S49–S59. doi: 10.1097/MLR.0b013e318259c02b

Table 3.

Comparison of data flow and transformation from EHR to aggregated analysis. Grey cells indicate when data first leaves the control of the local site.

Columns: CER Project Name | Raw Data | Natural Language Processing | Data Normalization | Data Aggregation | Data Analysis Tools

SPAN
  Raw data: Data from the local EHR, billing, and other sources are accessed in their native form, or extracted and stored in a research data warehouse.
  Natural Language Processing: No.
  Data Normalization: Each site transforms local data to the common data model (HMORN VDW) using native coding systems (NDC, ICD-9, HCPCS, LOINC); transformations are checked centrally for consistency.
  Data Aggregation: A centrally developed application (analytic programs, queries) runs at the local site; results are sent to the Data Coordinating Center or study lead site.
  Data Analysis Tools: Menu-driven query tools for encounter-level VDW datasets are under development; tools are available for querying aggregate data.

WICER
  Raw data: Extracted from local EHRs and sent to a central data warehouse, where it is combined, by patient, with data from multiple organizations (HIE warehouse model).
  Natural Language Processing: Yes, using the general-purpose MedLEE system [28].
  Data Normalization: The central database uses Columbia's Medical Entities Dictionary (MED) [29] to map to LOINC, ICD, and SNOMED.
  Data Aggregation: Central site.
  Data Analysis Tools: A tool is under development to allow end users to identify patients with characteristics of interest, specify query constraints, and combine data elements.

CER-HUB
  Raw data: Data extracted from the local EHR and stored in a local data warehouse.
  Natural Language Processing: Yes, using project-specific, knowledge-based MediClass applications [11].
  Data Normalization: The local site uses a centrally developed data processor (a MediClass application) to populate Clinical Research Documents, using the Unified Medical Language System as the standard knowledge base.
  Data Aggregation: Centrally developed processors are downloaded to create study-specific clinical events, which are transmitted to the central site for aggregation and analysis.
  Data Analysis Tools: Tools are available for researchers to create new and review existing NLP query modules and test them against new data sets; tools are in development to enable investigators to analyze and review final aggregated results.

RPDR
  Raw data: Data extracted from local systems (some daily, some monthly) and aggregated centrally in a data warehouse.
  Natural Language Processing: Yes, using project-specific pattern recognition models [30,31].
  Data Normalization: Data are mapped to ICD-9-CM and COSTAR (for diseases), NDC (for medications), LOINC (for labs), and CPT and HCPCS (for procedures); data are stored centrally in local coding systems and dynamically mapped in the query tool.
  Data Aggregation: All aggregate data are derived from queries generated through the query tool accessing the central data repository; detailed data on patients are gathered from the central repository and through enterprise web services.
  Data Analysis Tools: A drag-and-drop web query tool allows users to construct ad hoc Boolean queries for hypothesis generation from structured data, to obtain aggregate totals, and to graph age, race, gender, and vital signs.

INPC COMET-AD
  Raw data: Extracted from the local EHR (or payer) and sent to a central data warehouse; stored distinctly but can be combined at the patient level across multiple organizations (HIE model).
  Natural Language Processing: Yes, using project-specific pattern recognition models [32].
  Data Normalization: Data are normalized and coded using standard vocabularies (LOINC, ICD-9, CPT, NDC, etc.) when and where appropriate.
  Data Aggregation: Specially developed data aggregation routines are developed and run centrally.
  Data Analysis Tools: All SAS and other statistical data analysis is done at the central site; new data analysis tools are under development.

SCOAP-CERTN
  Raw data: Data are extracted from source systems and transmitted via VPN channels to the central data warehouse servers; the message stream is queued and then parsed.
  Natural Language Processing: Yes; text mining tools operate on the data within the central data warehouse, using hybrid text mining (rule-based and statistical) to extract and tag the text and derive the data elements needed for QA/QI and CER.
  Data Normalization: The parsed data stream is either mapped directly onto the HL7 data model or loaded into a staging table, from which it is mapped to the HL7-derived and augmented data model.
  Data Aggregation: Data extracted from local systems are sent to a secure remote server cluster running Amalga.
  Data Analysis Tools: Sites and SCOAP-CERTN personnel will use secure web sites and interfaces, with appropriate authentication, authorization, and logging, to query the Amalga data warehouse.
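The key architectural distinction in the table is when data leave local control: in the SPAN-style distributed model, centrally authored queries execute at each site against its common-data-model tables, and only aggregates are returned, whereas the HIE-style projects (WICER, INPC, SCOAP-CERTN) centralize the raw data first. The distributed pattern can be sketched as follows; this is a minimal illustration, not any project's actual code, and all function names, record fields, and data are hypothetical.

```python
# Sketch of a SPAN-style distributed query: a centrally developed analytic
# program runs at each local site, and only aggregate counts -- never
# encounter-level records -- are sent to the Data Coordinating Center.
from collections import Counter

def run_central_query_locally(local_records, icd9_codes):
    """Centrally distributed program executed at the local site.
    Returns only aggregate counts per diagnosis code."""
    counts = Counter()
    for rec in local_records:
        if rec["dx"] in icd9_codes:
            counts[rec["dx"]] += 1
    return dict(counts)

def coordinating_center_merge(site_results):
    """Data Coordinating Center combines aggregate results across sites."""
    total = Counter()
    for result in site_results:
        total.update(result)
    return dict(total)

# Hypothetical encounter-level data held at two sites (never transmitted).
site_a = [{"dx": "250.00"}, {"dx": "401.9"}, {"dx": "250.00"}]
site_b = [{"dx": "250.00"}, {"dx": "496"}]

query = {"250.00", "401.9"}  # ICD-9 codes of interest
results = [run_central_query_locally(s, query) for s in (site_a, site_b)]
print(coordinating_center_merge(results))  # {'250.00': 3, '401.9': 1}
```

The trade-off the table implies: the distributed pattern keeps identifiable data behind each site's firewall at the cost of requiring every site to conform to a common data model before queries can run, while the centralized (HIE) pattern pushes that mapping work to the central warehouse.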