2016 Nov 30;4(4):e40. doi: 10.2196/medinform.6373

Table 3.

Performance of different natural language processing systems.

System                 P5[a]   R5[b]   F5[c]   P10[d]   R10[e]   F10[f]   AUC-ROC_ranking[g]   AUC-ROC_KE[h]
Adapted KEA++[i]       0.333   0.211   0.239   0.281    0.362    0.292    0.890                0.780
RF[j]                  0.409   0.267   0.299   0.339    0.416    0.346    0.891                0.821
FOCUS[k]               0.462   0.305   0.341   0.369    0.464    0.381    0.940                0.866
P value (FOCUS vs RF)  .01     .01     .01     .045     .03      .02      <.001                <.001

[a] P5: precision at rank 5.

[b] R5: recall at rank 5.

[c] F5: F-score at rank 5.

[d] P10: precision at rank 10.

[e] R10: recall at rank 10.

[f] F10: F-score at rank 10.

[g] AUC-ROC_ranking: area under the receiver operating characteristic curve computed on the candidate terms extracted by a system.

[h] AUC-ROC_KE: area under the receiver operating characteristic curve (KE: keyphrase extraction) computed by using all the gold-standard important terms as positive examples.

[i] KEA++: extension of the keyphrase extraction algorithm KEA.

[j] RF: random forest.

[k] FOCUS: Finding impOrtant medical Concepts most Useful to patientS.
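
The rank-based metrics defined in the footnotes can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `precision_recall_f_at_k` and `auc_roc` and the toy term lists are hypothetical, precision at rank k is assumed to divide by k even when fewer than k terms are returned, and AUC-ROC is computed with the generic pairwise (Mann-Whitney) formulation rather than any procedure specific to the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f_at_k(
    ranked: Sequence[str], gold: Set[str], k: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for the top-k ranked terms.

    Assumption: precision divides by k, not by the number of terms returned.
    """
    top_k = ranked[:k]
    hits = sum(1 for term in top_k if term in gold)
    precision = hits / k if k else 0.0
    recall = hits / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive item scores higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): a system's ranked terms and the gold-standard set.
ranked_terms = ["a", "b", "c", "d", "e"]
gold_terms = {"a", "c", "x", "y"}
p5, r5, f5 = precision_recall_f_at_k(ranked_terms, gold_terms, k=5)

# Toy scores with binary importance labels (1 = important term).
auc = auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

If such scores are averaged over documents, the mean F-score need not equal the harmonic mean of the mean precision and mean recall, which may explain why the F values in the table differ slightly from what the tabulated P and R values alone would give.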