2020 Jan 16;27(3):407–418. doi: 10.1093/jamia/ocz207

Figure 5.

Comparison of medication extraction natural language processing (NLP) systems for entity-level precision, recall, and F1 performance measures on test sets. The test sets consisted of 50 notes each for tacrolimus and lamotrigine, and 110 notes for allopurinol. Here, n refers to the number of annotations for that drug-entity combination in the gold standard dataset. P, R, and F1 represent precision, recall, and F-measure (F1 score), respectively. The drug entities presented here reflect a restricted list of entities that have been standardized across all 4 NLP systems to ensure comparability. Symbols and lines represent estimates and 95% bootstrapped confidence intervals, respectively. Arrows along the bottom x-axis indicate that either part or all of the confidence interval is below 0.80. Numeric results for this figure can be found in Supplementary Table 2.
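The measures named in the caption can be illustrated with a short sketch. The caption does not state how the bootstrap was carried out, so the code below makes assumptions: it resamples whole notes with replacement and takes percentile intervals. The function names `prf1` and `bootstrap_ci`, and the per-note `(tp, fp, fn)` input format, are hypothetical and are not taken from the paper.

```python
import random


def prf1(tp, fp, fn):
    """Entity-level precision, recall, and F1 from true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_ci(note_counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for P, R, and F1.

    note_counts: list of (tp, fp, fn) tuples, one per note.
    Assumption: the resampling unit is the note; the paper may have
    used a different unit or interval type.
    """
    rng = random.Random(seed)
    n = len(note_counts)
    samples = {"P": [], "R": [], "F1": []}
    for _ in range(n_boot):
        # Draw n notes with replacement and pool their counts.
        resample = [note_counts[rng.randrange(n)] for _ in range(n)]
        tp = sum(c[0] for c in resample)
        fp = sum(c[1] for c in resample)
        fn = sum(c[2] for c in resample)
        p, r, f = prf1(tp, fp, fn)
        samples["P"].append(p)
        samples["R"].append(r)
        samples["F1"].append(f)
    ci = {}
    for name, values in samples.items():
        values.sort()
        lo = values[int((alpha / 2) * n_boot)]
        hi = values[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
        ci[name] = (lo, hi)
    return ci
```

Under these assumptions, an interval would be flagged with an arrow in the figure when its lower bound falls below 0.80, for example `bootstrap_ci(counts)["F1"][0] < 0.80`.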