2016 Aug 4;24(1):24–42. doi: 10.1177/1460458216656471

Table 3.

Precision, recall and F1 score (in %) for detecting HAIs using GTB, optimized GTB, SVM and optimized SVM given the different preprocessing methods.

                 GTB                 GTB optimized       SVM                 SVM optimized
                    P     R    F1       P     R    F1       P     R    F1       P     R    F1
TF 1000          83.4  90.6  86.7    79.6  92.2  85.2    76.3  79.8  78.0    80.2  88.1  83.7
Lemma            79.3  87.6  83.0    76.5  92.2  83.1    60.1 100.0  75.1    78.9  88.2  83.1
Stem             82.4  88.3  85.0    79.7  93.7  85.7    60.1 100.0  75.1    80.7  89.8  84.8
Stop             79.0  83.6  80.6    79.0  93.0  85.0    76.5  78.3  77.4    83.1  89.8  84.8
IST              79.0  86.0  81.7    76.7  89.1  81.9    73.0  65.1  68.9    72.9  84.5  78.0
TF-IDF 1000      81.7  91.2  86.0    79.5  92.1  84.9    60.1 100.0  75.1    78.1  89.7  82.8
LS-TFIDF 1000    80.2  84.4  81.9    78.9  91.3  84.2    60.1 100.0  75.1    72.7  88.9  79.3
SS-TFIDF 1000    78.6  85.8  81.6    78.8  93.0  85.0    60.1 100.0  75.1    75.3  86.6  79.8

GTB: gradient tree boosting; SVM: support vector machine; TF: term frequency; IST: infection-specific term; TF-IDF: term frequency–inverse document frequency.

In total, the material comprised 213 HRs, of which 128 contained HAI. Labelling every record as containing HAI therefore gives a baseline precision of 60 percent, a recall of 100 percent and an F-score of 75 percent.
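
The baseline figures follow directly from the record counts. The sketch below reproduces the arithmetic, assuming the baseline is a classifier that marks all 213 records as containing HAI; the function name `baseline_scores` and its parameters are illustrative and do not come from the paper.

```python
def baseline_scores(total, positives):
    """Precision, recall and F1 for a classifier that labels every record as positive.

    Assumption: this all-positive rule is the baseline behind the reported figures.
    """
    tp = positives           # every true HAI record is flagged
    fp = total - positives   # every non-HAI record is flagged as well
    fn = 0                   # nothing is labelled negative, so nothing is missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = baseline_scores(total=213, positives=128)
print(f"P={100 * p:.1f} R={100 * r:.1f} F1={100 * f1:.1f}")
# prints: P=60.1 R=100.0 F1=75.1
```

To one decimal place these are the values 60.1, 100.0 and 75.1 that appear in several of the non-optimized SVM rows of Table 3, which suggests that in those settings the classifier assigned every record to the HAI class.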