AMIA Annual Symposium Proceedings. 2006;2006:918.

An Indexing Scheme for Medical Free Text Searches: A Prototype

Selnur Erdal 1, Jyoti Kamal 1
PMCID: PMC1839497  PMID: 17238537

Abstract

At the Ohio State University Medical Center, a significant amount of valuable data pertaining to a patient’s visit is stored in the form of dictated reports such as discharge summaries, cardiology reports, and radiology reports. We have implemented conceptual search capability to facilitate more comprehensive content mining from clinical free text.

Introduction

To implement conceptual searches on free-text documents in our Information Warehouse (IW), we used the UMLS Metathesaurus from the NIH. We built an indexing scheme based on the Metathesaurus (1) that lets users search for documents containing desired concepts. Retrieving text reports that contain specific concepts is useful for several reasons: 1) It allows PIs to capture otherwise overlooked data efficiently. 2) It may help PIs who are recruiting patients for clinical trials. 3) It may be used to de-identify dictated reports.

Prototype

To explore the potential capabilities of such a system we have developed a prototype. The prototype allows users to interactively search for concepts and view the documents that contain the concepts of interest. The following features are available:

  • Retrieves concepts related to a given word or phrase.

  • Allows users to interactively refine their search by adding concepts to those already selected.

  • Enables users to browse and to visualize preprocessed dictated reports through a web interface.

  • Allows searching through multiple categories of reports.

  • Employs multiple dictionaries (SNOMED CT, etc.).

  • Facilitates viewing multiple documents side by side.

Methodology

The prototype’s functionality can be broken into two categories: Preprocessing and Browsing.

Preprocessing

The contents of each text document are broken down into phrases using the UMLS Natural Language Processing (NLP) tools (2). Each phrase is then queried against UMLS, and a frequency score is generated for each concept within the document. Finally, each document is placed into a concept index with the following columns: document identifier, frequency of the concept, number of times the concept has occurred, and number of phrases in the document (Figure 1).
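The preprocessing step can be illustrated with a minimal sketch. It is not the prototype's code: the phrase splitter is a naive stand-in for the UMLS NLP tools, `TOY_LEXICON` and its concept identifiers are made up in place of a real UMLS query, and the frequency score is assumed to be the number of occurrences divided by the number of phrases, which the text does not specify. A concept identifier column is also assumed, since the index is organized by concept.

```python
import re
from collections import Counter
from typing import NamedTuple

# Hypothetical phrase-to-concept lookup standing in for a UMLS query.
# The identifiers are invented for illustration; they are not real UMLS CUIs.
TOY_LEXICON = {
    "chest pain": "CONCEPT_CHEST_PAIN",
    "shortness of breath": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
}


class IndexRow(NamedTuple):
    doc_id: str
    concept_id: str     # assumed column, implied by a per-concept index
    frequency: float    # assumed definition: occurrences / phrase_count
    occurrences: int
    phrase_count: int


def split_phrases(text: str) -> list[str]:
    """Naive stand-in for an NLP phrase chunker: split on punctuation and 'and'."""
    parts = re.split(r"[.,;:]|\band\b", text.lower())
    return [p.strip() for p in parts if p.strip()]


def index_document(doc_id: str, text: str) -> list[IndexRow]:
    """Map each phrase to a concept and emit one index row per concept found."""
    phrases = split_phrases(text)
    counts = Counter(TOY_LEXICON[p] for p in phrases if p in TOY_LEXICON)
    total = len(phrases)
    return [
        IndexRow(doc_id, concept_id, n / total, n, total)
        for concept_id, n in sorted(counts.items())
    ]


rows = index_document("D1", "Chest pain, dyspnea and shortness of breath. No fever.")
for row in rows:
    print(row)
```

In this sketch, "dyspnea" and "shortness of breath" map to the same concept, so both phrases count toward a single row. A plain keyword index would treat them as unrelated strings.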

Figure 1.

Browsing

Browsing provides two main functionalities: Interactive Concept Search and Document Browsing.

  1. Interactive Concept Search: Users can submit words or phrases through the web interface. Each submission returns a set of sibling concepts and their children. Since a concept may be related to many other concepts, users can make interactive selections at this stage to narrow down their search. For example, the word “cold” may mean “flu”, “cold weather”, or “Chronic Obstructive Lung Disease” (Figure 2).

  2. Document Browsing: Once users have finalized their selection of concepts, they may proceed to the document browsing stage. Here, the identifiers of documents matching their search criteria are returned to the user as hyperlinks.
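The two browsing stages can be sketched as follows. This is an illustration under stated assumptions, not the prototype's implementation: `CANDIDATES`, `CHILDREN`, and `INDEX_ROWS` are hypothetical in-memory tables standing in for the Metathesaurus and the concept index. Expanding a selection to its child concepts and ranking documents by summed frequency are also assumptions, as the text does not describe either.

```python
from collections import defaultdict

# Hypothetical candidate concepts for an ambiguous term (the "cold" example).
CANDIDATES = {
    "cold": ["flu", "cold weather", "chronic obstructive lung disease"],
}

# Hypothetical child concepts, for illustration only.
CHILDREN = {
    "chronic obstructive lung disease": ["chronic bronchitis", "emphysema"],
}

# Hypothetical concept index rows: (document identifier, concept, frequency).
INDEX_ROWS = [
    ("D1", "flu", 0.10),
    ("D2", "emphysema", 0.25),
    ("D3", "chronic obstructive lung disease", 0.05),
    ("D3", "cold weather", 0.02),
]


def suggest_concepts(term):
    """Stage 1: return each candidate concept for a term, with its children."""
    return {c: CHILDREN.get(c, []) for c in CANDIDATES.get(term.lower(), [])}


def find_documents(selected, include_children=True):
    """Stage 2: return identifiers of documents containing any selected concept.

    Documents are ordered by summed frequency, highest first (an assumed ranking).
    """
    wanted = set(selected)
    if include_children:
        for concept in selected:
            wanted.update(CHILDREN.get(concept, []))
    scores = defaultdict(float)
    for doc_id, concept, frequency in INDEX_ROWS:
        if concept in wanted:
            scores[doc_id] += frequency
    return sorted(scores, key=lambda d: (-scores[d], d))


options = suggest_concepts("cold")            # the user reviews the candidates
chosen = ["chronic obstructive lung disease"]  # and picks the intended meaning
print(find_documents(chosen))
```

Once the user picks the lung disease sense of "cold", documents that mention only flu or cold weather drop out of the result. A literal search for the string "cold" could not make that distinction.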

Figure 2.

Results and Conclusions

Figures 1 and 2 illustrate conceptual searches through large repositories of preprocessed clinical free text. Enabling concept-based searches on free text gives researchers considerable flexibility, and conceptual search adds value beyond the capabilities of basic text search. Although we have so far applied our system only to already de-identified documents, we plan to extend the application to serve as a de-identification tool as well.

Acknowledgements

Special thanks to Jianhua Liu, Scott Silvey, Michael Truman, and Kabardhi Pasuparthi.

References


Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
