2016 Nov 30;4(4):e40. doi: 10.2196/medinform.6373

Table 4.

Performance of natural language processing systems with and without the additional features.

System                   P5^a   R5^b   F5^c   P10^d  R10^e  F10^f  AUC-ROC_ranking^g  AUC-ROC_KE^h
FOCUS-base^i             0.413  0.256  0.295  0.331  0.401  0.337  0.911              0.840
FOCUS^j                  0.462  0.305  0.341  0.369  0.464  0.381  0.940              0.866
P (FOCUS vs FOCUS-base)  .03    .02    .02    .003   <.001  .001   <.001              <.001
RF-base^k                0.349  0.219  0.251  0.303  0.381  0.315  0.848              0.781
RF^l                     0.409  0.267  0.299  0.339  0.416  0.346  0.891              0.821
P (RF vs RF-base)        .003   .01    .01    .01    .10    .046   <.001              <.001

^a P5: precision at rank 5.

^b R5: recall at rank 5.

^c F5: F-score at rank 5.

^d P10: precision at rank 10.

^e R10: recall at rank 10.

^f F10: F-score at rank 10.

^g AUC-ROC_ranking: area under the receiver operating characteristic curve, computed on the candidate terms extracted by a system.

^h AUC-ROC_KE: area under the receiver operating characteristic curve (KE: keyphrase extraction), computed using all gold-standard important terms as positive examples.

^i FOCUS-base: Finding impOrtant medical Concepts most Useful to patientS; uses only the baseline features.

^j FOCUS: Finding impOrtant medical Concepts most Useful to patientS; uses the baseline features plus the additional features.

^k RF-base: random forest; uses only the baseline features.

^l RF: random forest; uses the baseline features plus the additional features.
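The footnotes name the metrics but do not spell out how they are computed. The Python sketch below illustrates one plausible way to compute them for a single hypothetical note. Several assumptions are not stated in the table. Each extracted candidate term carries a real-valued importance score and is ranked by it. P@k divides by k even when fewer than k candidates exist. AUC-ROC is taken as the fraction of (positive, negative) pairs ordered correctly, with ties counting one half. For AUC-ROC_KE, gold-standard terms the system never extracted are given a score of negative infinity. The terms and scores are invented, and the helper names (`prf_at_k`, `auc_ranking`, `auc_ke`) are hypothetical. Scores in the table are presumably averaged over notes. The reported F5 for FOCUS-base (0.295) is below the harmonic mean of its reported P5 and R5 (about 0.316). That fits averaging per-note F-scores rather than combining averaged P and R, although the table does not say which averaging was used.

```python
# Hypothetical sketch of per-note metrics resembling those in Table 4.
# Assumptions (not from the paper): candidates have real-valued scores,
# P@k divides by k, and unextracted gold terms score -inf for AUC-ROC_KE.
from typing import Dict, List, Sequence, Set, Tuple


def prf_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score of the top-k ranked candidate terms."""
    hits = sum(1 for term in ranked[:k] if term in gold)
    precision = hits / k  # assumption: denominator is k even if fewer candidates exist
    recall = hits / len(gold) if gold else 0.0
    f_score = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_score


def auc(pos: List[float], neg: List[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative score")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_ranking(scores: Dict[str, float], gold: Set[str]) -> float:
    """Positives and negatives are both drawn only from the extracted candidates."""
    pos = [s for term, s in scores.items() if term in gold]
    neg = [s for term, s in scores.items() if term not in gold]
    return auc(pos, neg)


def auc_ke(scores: Dict[str, float], gold: Set[str]) -> float:
    """Every gold term is a positive; gold terms never extracted get -inf (an assumption)."""
    pos = [scores.get(term, float("-inf")) for term in gold]
    neg = [s for term, s in scores.items() if term not in gold]
    return auc(pos, neg)


# Hypothetical toy note: candidate-term scores and gold-standard important terms.
scores = {"warfarin": 0.9, "INR": 0.8, "bid": 0.7, "stroke": 0.4, "patient": 0.2}
gold = {"warfarin", "stroke", "atrial fibrillation"}
ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first

print(tuple(round(x, 3) for x in prf_at_k(ranked, gold, 5)))  # (0.4, 0.667, 0.5)
print(round(auc_ranking(scores, gold), 3))  # 4 of 6 pairs correct: 0.667
print(round(auc_ke(scores, gold), 3))       # 4 of 9 pairs correct: 0.444
```

In this toy note, AUC-ROC_KE is lower than AUC-ROC_ranking because the missed gold term ("atrial fibrillation") counts as a positive ranked below every candidate. This mirrors how AUC-ROC_KE sits below AUC-ROC_ranking for every system in the table. The pairwise loop is quadratic in the number of terms. That is fine for one note, but for larger inputs a library routine such as scikit-learn's `roc_auc_score` computes the same tie-aware quantity for binary labels.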