Abstract
At the Ohio State University Medical Center, a significant amount of valuable data pertaining to a patient’s visit is stored in the form of dictated reports such as discharge summaries, cardiology reports, and radiology reports. We have implemented a conceptual search capability to facilitate more comprehensive content mining of clinical free text.
Introduction
To implement conceptual searches on free-text documents in our Information Warehouse (IW), we have used the Unified Medical Language System (UMLS) Metathesaurus from the NIH. We have built an indexing scheme based on the Metathesaurus (1) that lets users search for documents containing desired concepts. Retrieving text reports that contain specific concepts is valuable for several reasons: 1) It allows principal investigators (PIs) to capture otherwise overlooked data efficiently. 2) It can help PIs recruit patients for clinical trials. 3) It may be used to de-identify dictated reports.
Prototype
To explore the potential capabilities of such a system we have developed a prototype. The prototype allows users to interactively search for concepts and view the documents that contain the concepts of interest. The following features are available:
Retrieves concepts related to a given word or phrase.
Allows users to interactively refine their search and add concepts to those already selected.
Enables users to browse and to visualize preprocessed dictated reports through a web interface.
Allows searching through multiple categories of reports.
Employs multiple dictionaries (SNOMED CT, etc.).
Facilitates viewing multiple documents side by side.
Methodology
The prototype’s functionality can be broken into two categories: Preprocessing and Browsing.
Preprocessing
The contents of each text document are broken down into phrases using the UMLS Natural Language Processing (NLP) tools (2). Each phrase is then queried against UMLS, and a frequency score is generated per concept within the document. Finally, each document is placed into a concept index with the following columns: document identifier, frequency of the concept, number of times the concept has occurred, and number of phrases in the document (Figure 1).
Figure 1.
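The preprocessing steps can be illustrated with a minimal Python sketch. It is not the prototype's code: the phrase splitter is a naive stand-in for the UMLS NLP tools, `TOY_CONCEPTS` is a hypothetical lookup table in place of a Metathesaurus query, the concept identifiers are made-up placeholders, and the frequency score is assumed to be occurrences divided by the number of phrases, which the text does not specify.

```python
from collections import Counter
from typing import NamedTuple

# Hypothetical phrase-to-concept lookup standing in for a Metathesaurus query.
# The identifiers are made-up placeholders, not real UMLS concept identifiers.
TOY_CONCEPTS = {
    "chest pain": "CONCEPT_CHEST_PAIN",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
}


class IndexRow(NamedTuple):
    """One row of the concept index, mirroring the four columns in the text."""
    doc_id: str
    concept_id: str
    frequency: float   # assumption: occurrences / phrase_count
    occurrences: int
    phrase_count: int


def split_into_phrases(text: str) -> list[str]:
    """Naive stand-in for an NLP phrase chunker: split on commas and periods."""
    normalized = text.lower().replace(".", ",")
    return [p.strip() for p in normalized.split(",") if p.strip()]


def index_document(doc_id: str, text: str) -> list[IndexRow]:
    """Map each phrase to a concept and emit one index row per concept."""
    phrases = split_into_phrases(text)
    counts = Counter(TOY_CONCEPTS[p] for p in phrases if p in TOY_CONCEPTS)
    total = len(phrases)
    return [
        IndexRow(doc_id, concept, n / total, n, total)
        for concept, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    report = "Chest pain, dyspnea. Shortness of breath, no fever."
    for row in index_document("doc-001", report):
        print(row)
```

Two different phrases ("dyspnea" and "shortness of breath") map to the same placeholder concept here, which is the property that lets a concept index find documents a plain keyword search would miss.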
Browsing
Browsing provides two main functionalities: Interactive Concept Search and Document Browsing.
Interactive Concept Search: Users can submit words or phrases through the web interface. For each submission, a set of sibling concepts and their children is returned. Since a concept may be related to many other concepts, users can make interactive selections at this stage to narrow down their search. For example, the word “cold” may mean “flu”, “cold weather”, or “Chronic Obstructive Lung Disease” (Figure 2).
Document Browsing: Once users have finalized their selection of concepts, they may proceed to the document browsing stage. Here, document identifiers matching their search criteria are returned to the user as hyperlinks.
Figure 2.
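The two browsing steps described above can be sketched in Python as follows. Everything in this example is an assumption made for illustration: the hierarchy in `CHILDREN`, the term matches, the index contents, the `BASE_URL` placeholder, and the choice to return documents containing any selected concept ranked by summed frequency. The source does not describe how the prototype combines or ranks concepts.

```python
# Hypothetical concept hierarchy (parent -> children); labels are illustrative.
CHILDREN = {
    "respiratory disorder": ["common cold", "chronic obstructive lung disease"],
    "weather condition": ["cold weather", "hot weather"],
    "chronic obstructive lung disease": ["emphysema"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

# Hypothetical mapping from a search term to the concepts it matches directly.
TERM_MATCHES = {
    "cold": ["common cold", "cold weather", "chronic obstructive lung disease"],
}

# Hypothetical concept index: concept -> {document identifier: frequency score}.
CONCEPT_INDEX = {
    "common cold": {"doc-001": 0.25, "doc-002": 0.5},
    "cold weather": {"doc-002": 0.25},
    "emphysema": {"doc-003": 0.125},
}

BASE_URL = "https://example.org/reports/"  # placeholder, not a real endpoint


def candidate_concepts(term: str) -> list[str]:
    """Return sibling concepts of each match, plus the siblings' children."""
    seen: dict[str, None] = {}  # dict keeps insertion order and removes repeats
    for match in TERM_MATCHES.get(term.lower(), []):
        parent = PARENT.get(match)
        siblings = CHILDREN[parent] if parent else [match]
        for sibling in siblings:
            seen.setdefault(sibling)
            for child in CHILDREN.get(sibling, []):
                seen.setdefault(child)
    return list(seen)


def find_documents(selected: list[str]) -> list[str]:
    """Return hyperlinks to documents containing any selected concept."""
    scores: dict[str, float] = {}
    for concept in selected:
        for doc_id, freq in CONCEPT_INDEX.get(concept, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + freq
    # Assumed ranking: highest summed frequency first, ties broken by identifier.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [f'<a href="{BASE_URL}{d}">{d}</a>' for d in ranked]


if __name__ == "__main__":
    print(candidate_concepts("cold"))          # user narrows this list
    for link in find_documents(["common cold", "emphysema"]):
        print(link)
```

In this sketch the user would pick from the list returned by `candidate_concepts` before calling `find_documents`, mirroring the narrowing step that resolves an ambiguous term such as "cold".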
Results and Conclusions
Figures 1 and 2 illustrate conceptual searches through large repositories of preprocessed clinical free text. Enabling concept-based searches on free text gives researchers flexibility beyond the capabilities of basic text search. Although we have so far applied our system to already de-identified documents, we plan to extend the application to serve as a de-identification tool as well.
Acknowledgements
Special thanks to Jianhua Liu, Scott Silvey, Michael Truman, and Kabardhi Pasuparthi.
References
- 1. Lindberg D, et al. The Unified Medical Language System. Methods Inf Med. 1993;32(4):281–91. doi: 10.1055/s-0038-1634945.
- 2. Browne C, et al. The Specialist Lexicon 06/2000. NLM; Bethesda, MD.


