Clinical Medicine & Research. 2014 Sep;12(1-2):94. doi: 10.3121/cmr.2014.1250.c3-2

C3-2: All-In-One: How Group Health Organizes Clinical Text from Clarity for Research

Scott Halgrim 1, David Carrell 1, Diem-Thy Tran 1
PMCID: PMC4453310

Abstract

Background/Aims

Group Health Cooperative runs Epic as its electronic medical record system. The clinical text stored in Clarity, Epic’s relational reporting database, is valuable to research at Group Health Research Institute (GHRI). However, several factors limited GHRI’s access to these data: 1) GHRI was allotted only a limited window during the day, because reports for Group Health’s care delivery side took priority, 2) Clarity splits text notes into lines of about 5,000 characters, and 3) GHRI was restricted in managing the database itself, for example in adding a full-text index. We sought to make the clinical text more valuable for research by making it available at all times, combining all the content of a note into one record, and allowing for more database management options.

Methods

We have developed a nightly Python process that moves clinical text from four Clarity tables into one full-text-indexed table on our own server. In addition, we store metadata about each note--including note type, encounter date, department, and provider--in a parallel table.
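The note-combining step can be illustrated with a minimal sketch. This is not the GHRI code: the `NoteLine` record, its field names, and the `combine_note_lines` function are hypothetical, and the sketch assumes each Clarity line carries a note identifier and a line number and that lines can be rejoined without a separator.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class NoteLine(NamedTuple):
    """One line of a split note (hypothetical layout, not the Clarity schema)."""
    note_id: str
    line_no: int
    text: str


def combine_note_lines(lines: Iterable[NoteLine]) -> dict[str, str]:
    """Group lines by note and join them in line order into one text per note."""
    grouped: dict[str, list[NoteLine]] = defaultdict(list)
    for line in lines:
        grouped[line.note_id].append(line)
    # Assumption: lines are rejoined with no separator between them.
    return {
        note_id: "".join(p.text for p in sorted(parts, key=lambda p: p.line_no))
        for note_id, parts in grouped.items()
    }


# Illustrative rows only; lines may arrive out of order.
sample = [
    NoteLine("N1", 2, "reports mild cough."),
    NoteLine("N1", 1, "Patient "),
    NoteLine("N2", 1, "Follow-up in two weeks."),
]
combined = combine_note_lines(sample)
```

Each resulting record holds the complete text of one note, which is the form a single full-text-indexed table would need.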

Results

This conversion process, begun in 2010, converts about 60,000 notes per night and has converted every extant note in Group Health’s Clarity database, for a total of 123 million notes as of October 2013. The notes’ availability has sped development of sophisticated NLP algorithms in the years since the process began. Another benefit is the nightly automated status e-mails sent to the developers: when several years of historical notes were recently imported from legacy systems into Epic, GHRI knew immediately that a greater history of notes was available for research.

Conclusions

The text store at GHRI has strengthened research and grant submissions. Due to Clarity’s consistent data model and large footprint throughout the nation’s medical community, the solution should be easily transferable to other sites wishing to realize the same advantages. The solution is amenable to enhancement as needs arise for more metadata or for clinical text from other parts of Clarity. Remaining challenges are tracking changes to notes in Clarity and improving performance.

Keywords: Natural language processing, Clarity


Articles from Clinical Medicine & Research are provided here courtesy of Marshfield Clinic
