2010 Jul-Aug;17(4):383–388. doi: 10.1136/jamia.2010.004804

Table 3.

Different NLP approaches to determining from EMR records whether a colonoscopy was actually performed (completed)

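The last four method columns are "Concept-identification with:" the named component(s).

| Measure | Concept-match only* | Negation† | Status‡ | Date§ | Date & Status |
| --- | --- | --- | --- | --- | --- |
| Manually verified No. of completed colonoscopies from result set identified by method (TP) | 367 | 367 | 358 | 349 | 340 |
| No. of ‘completed colonoscopies’ inferred from method (TP+FP) | 1208 | 1174 | 940 | 396 | 359 |
| Recall | 1.00 | 1.00 | 0.98 | 0.95 | 0.93 |
| Precision | 0.30 | 0.31 | 0.38 | 0.88 | 0.95 |
| F measure | 0.47 | 0.48 | 0.55 | 0.91 | 0.94 |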

Test sample contained 1208 sentences extracted verbatim from electronic medical records (EMR). Each of the 1208 sentences contained at least one reference to a project-specified colonoscopy-related concept. Manual review determined that of the 1208 sentences, only 367 referred to colonoscopies actually performed. Note that status and date methods were able to eliminate references to future not-yet-completed events, discussions of a patient's need for a colonoscopy, and other remarks not pertaining to actual, completed colonoscopies.
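The recall, precision, and F measure rows can be recomputed from the two count rows. The sketch below assumes the standard definitions (precision = TP / (TP+FP), recall = TP / all manually verified completed colonoscopies, F = harmonic mean of the two); the table does not spell out these formulas, and the names `TOTAL_COMPLETED`, `counts`, and `prf` are illustrative only.

```python
# Counts copied from Table 3. The recall denominator is assumed to be the
# 367 sentences that manual review found to describe completed colonoscopies.
TOTAL_COMPLETED = 367

# method -> (TP, TP + FP)
counts = {
    "Concept-match only": (367, 1208),
    "Negation": (367, 1174),
    "Status": (358, 940),
    "Date": (349, 396),
    "Date & Status": (340, 359),
}


def prf(tp, inferred, total_relevant=TOTAL_COMPLETED):
    """Return (precision, recall, F measure) under the standard definitions."""
    precision = tp / inferred
    recall = tp / total_relevant
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


for method, (tp, inferred) in counts.items():
    p, r, f = prf(tp, inferred)
    print(f"{method:20s} recall={r:.2f} precision={p:.2f} F={f:.2f}")
```

Under these assumptions the rounded values agree with the table: for example, the "Date & Status" method gives 340/359 for precision and 340/367 for recall.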

* ‘Concept-match only’ simply identifies a ‘colonoscopy event’ based on straightforward concept name matching (with synonymy), independent of surrounding text describing corresponding colonoscopy status, state of negation, or reference date(s).

† ‘Negation’ refers to ‘concept-match only’, as above, augmented by a negation tagger that removed any negated concept from consideration.

‡ ‘Status’ refers to application of a status algorithm to remove non-receipt statuses (eg, scheduled, discussed).

§ ‘Date’ refers to use of the date detection algorithm to include only present-day and past events.

FP, false positive; NLP, natural language processing; TP, true positive.
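The cascade of filters described in the footnotes (concept match, then negation, status, and date) can be illustrated with a toy rule-based sketch. This is not the study's implementation: the negation tagger, status algorithm, and date detection algorithm are not specified here, so every pattern below (`CONCEPT`, `NEGATION`, `NON_RECEIPT`, `YEAR`) and the function `is_completed` are hypothetical stand-ins chosen only to show how successive filters trade recall for precision.

```python
import re
from datetime import date

# Hypothetical cue lists; the real system's lexicons are not given in the table.
CONCEPT = re.compile(r"\bcolonoscop(?:y|ies)\b", re.I)
NEGATION = re.compile(r"\b(?:no|not|never|denies|without)\b", re.I)
NON_RECEIPT = re.compile(
    r"\b(?:scheduled|schedule|needs?|recommend(?:ed)?|refused|declined|due for)\b",
    re.I,
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def is_completed(sentence, note_date):
    """Toy 'Date & Status' style filter: True if the sentence looks like a
    completed colonoscopy as of note_date."""
    if not CONCEPT.search(sentence):       # concept-match step
        return False
    if NEGATION.search(sentence):          # negation step
        return False
    if NON_RECEIPT.search(sentence):       # status step (non-receipt cues)
        return False
    years = [int(y) for y in YEAR.findall(sentence)]
    if any(y > note_date.year for y in years):  # date step: drop future events
        return False
    return True


note_date = date(2008, 6, 1)  # assumed date of the clinical note
examples = [
    "Colonoscopy in 2005 showed two polyps.",
    "Patient needs a colonoscopy.",
    "Patient has never had a colonoscopy.",
    "Colonoscopy in 2009.",
]
for s in examples:
    print(is_completed(s, note_date), "-", s)
```

Each added filter can only remove sentences, which is consistent with the pattern in the table: the inferred count (TP+FP) falls sharply while a few true positives are also lost.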
