Development of a Natural Language Processing System to Identify Timing and Status of Colonoscopy Testing in Electronic Medical Records

Joshua C Denny; Josh F Peterson; Neesha N Choma; Hua Xu; Randolph A Miller; Lisa Bastarache; Neeraja B Peterson

. 2009 Nov 14;2009:141.

PMCID: PMC2815478 PMID: 20351837

Abstract

Colorectal cancer (CRC) screening rates are low despite proven benefits. We developed natural language processing (NLP) algorithms to identify temporal expressions and status indicators, such as “patient refused” or “test scheduled.” We incorporated the algorithms into the KnowledgeMap Concept Identifier system to detect references to completed colonoscopies within electronic text. We evaluated the modified NLP system on 200 randomly selected electronic medical records (EMRs) from a primary care population aged ≥50 years. The system detected completed colonoscopies with recall and precision of 0.93 and 0.92, respectively, and was superior to a query of colonoscopy billing codes for determining screening status.

Introduction

Screening for CRC is recommended for average-risk individuals aged 50 years and older but is underutilized.¹ Current methods for determining CRC screening status (patient self-report, physician report, medical claims data, and manual chart abstraction) are either time-consuming and expensive or inaccurate. We investigated the use of an NLP system to detect the timing and receipt of colonoscopies.

Methods

The primary outcome measures were the recall and precision of the NLP algorithm’s identification of completed colonoscopies, compared with a gold standard review of all available data by a physician. We extended an existing NLP system, the KnowledgeMap Concept Identifier,² with novel NLP algorithms to 1) identify and interpret time descriptors (e.g., “6/2003” or “5 years ago”) and associate them with clinical events; and 2) assign values for concept certainty and status (e.g., “never had colonoscopy” or “needs a colonoscopy”) to each identified concept.
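
The two steps above can be illustrated with a minimal Python sketch. It is not the KnowledgeMap implementation, whose rules are not given here. The regular expressions, the status labels, the function names (find_year, find_status), and the default of "completed" when no cue is present are all assumptions made for illustration, and the sketch covers only the example phrasings quoted in the text.

```python
import re
from datetime import date

# Hypothetical patterns covering only the two time formats quoted in the text.
ABSOLUTE = re.compile(r"\b(?P<month>1[0-2]|0?[1-9])/(?P<year>(?:19|20)\d{2})\b")
RELATIVE = re.compile(r"\b(?P<n>\d+)\s+years?\s+ago\b", re.IGNORECASE)

# Hypothetical status cues, checked in order; the labels are illustrative.
STATUS_CUES = [
    ("negated", re.compile(r"\bnever had\b", re.IGNORECASE)),
    ("refused", re.compile(r"\brefused\b", re.IGNORECASE)),
    ("scheduled", re.compile(r"\bscheduled\b", re.IGNORECASE)),
    ("needed", re.compile(r"\bneeds? an?\b", re.IGNORECASE)),
]


def find_year(sentence, note_date):
    """Return the calendar year a time descriptor refers to, or None."""
    match = ABSOLUTE.search(sentence)
    if match:
        return int(match.group("year"))
    match = RELATIVE.search(sentence)
    if match:
        # Relative descriptors are resolved against the date of the note.
        return note_date.year - int(match.group("n"))
    return None


def find_status(sentence):
    """Return the first matching status label.

    Assumption: a mention with no cue is treated as a completed test.
    """
    for label, pattern in STATUS_CUES:
        if pattern.search(sentence):
            return label
    return "completed"


note_date = date(2008, 1, 15)  # hypothetical note date
for text in [
    "Colonoscopy 6/2003 was normal.",
    "Had a colonoscopy 5 years ago.",
    "Patient has never had colonoscopy.",
    "Needs a colonoscopy.",
]:
    print(text, find_year(text, note_date), find_status(text))
```

A rule-based approach like this is easy to inspect, but a production system would also need to link each time descriptor to the correct clinical event when a sentence mentions several, which this sketch does not attempt.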

Results

Manual review identified 159 unique completed colonoscopies in the test set. The NLP system identified 1,208 sentences with references to colonoscopies, of which 518 contained a timing reference and 514 contained a status indicator. For identifying and assigning timing information to the colonoscopies, recall and precision were 0.94 and 0.95, respectively. For detecting status indicators, recall and precision were 0.82 and 0.95, respectively. Overall, the system detected completed colonoscopies with a recall and precision of 0.93 and 0.92, respectively. A query of colonoscopy billing codes identified 106 (67%) of the colonoscopies detected by NLP and one colonoscopy that had not previously been detected.
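
For readers unfamiliar with the two metrics, the sketch below shows how recall and precision are conventionally computed against a gold standard: recall is the fraction of gold standard items the system found, and precision is the fraction of system output that is correct. The function name and the patient/year identifiers are hypothetical and are not data from this study.

```python
def recall_precision(predicted, gold):
    """Compute (recall, precision) for predicted items against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical identifiers in the form "patient:year"; not study data.
gold = {"pt1:2003", "pt2:2005", "pt3:1999", "pt4:2007"}
predicted = {"pt1:2003", "pt2:2005", "pt3:1999", "pt5:2001"}

recall, precision = recall_precision(predicted, gold)
print(recall, precision)  # 0.75 0.75
```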

Conclusion

NLP algorithms that detect timing and status in EMRs can identify patients who received colonoscopies with high recall and precision, outperforming queries of billing records. These data suggest that a robust system to identify receipt of CRC testing should incorporate NLP methods.

References

1. Winawer S, Fletcher R, Rex D, et al. Colorectal cancer screening and surveillance: clinical guidelines and rationale-Update based on new evidence. Gastroenterology. 2003 Feb;124(2):544–560. doi: 10.1053/gast.2003.50044.
2. Denny JC, Smithers JD, Miller RA, Spickard A 3rd. “Understanding” medical school curriculum content using KnowledgeMap. J Am Med Inform Assoc. 2003 Jul–Aug;10(4):351–362. doi: 10.1197/jamia.M1176.

