J Am Med Inform Assoc. 2018 Oct 1;25(10):1274–1283. doi: 10.1093/jamia/ocy114

Table 2.

Performance metrics for selected system submissions for subtask-2, baselines, and system ensembles. Micro-averaged precision, recall, and F1-scores are shown for the definite intake (class 1) and possible intake (class 2) classes. The highest F1-score over the evaluation dataset is shown in bold. Detailed discussions about the approaches can be found in the system description papers referenced (when available)

| System/Team | Micro-averaged precision (classes 1 and 2) | Micro-averaged recall (classes 1 and 2) | Micro-averaged F1-score (classes 1 and 2) |
| --- | --- | --- | --- |
| Baseline 1: Naïve Bayes | 0.359 | 0.503 | 0.419 |
| Baseline 2: SVMs | 0.652 | 0.436 | 0.523 |
| Baseline 3: Random Forest | 0.628 | 0.487 | 0.549 |
| InfyNLP44 (Infosys Ltd) | 0.725 | 0.664 | 0.693 |
| UKNLP41 (University of Kentucky) | 0.701 | 0.677 | 0.689 |
| NRC-Canada35 | 0.708 | 0.642 | 0.673 |
| TJIIP (Tongji University, China) | 0.691 | 0.641 | 0.665 |
| TurkuNLP43 (University of Turku) | 0.701 | 0.630 | 0.663 |
| CSaRUS-CNN50 (Arizona State University) | 0.709 | 0.604 | 0.652 |
| NTTMU53 (Multiple Universities, Taiwan) | 0.690 | 0.554 | 0.614 |
| Ensemble all: majority vote | 0.736 | 0.657 | 0.694 |
| Ensemble top 10: majority vote | 0.726 | 0.679 | **0.702** |
| Ensemble top 7: majority vote | 0.724 | 0.673 | 0.697 |
| Ensemble top 5: majority vote | 0.723 | 0.667 | 0.694 |
| Ensemble top submissions from top 5 teams: majority vote | 0.727 | 0.673 | 0.699 |
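To make the table's two ingredients concrete, the sketch below shows (a) a per-tweet majority vote over several systems' predicted labels, as used for the ensemble rows, and (b) a micro-averaged precision/recall/F1 computed over the definite-intake (class 1) and possible-intake (class 2) labels only. This is a minimal illustration, not the shared task's official evaluation script: the function names, the integer label encoding, and the treatment of any remaining label (e.g., a non-intake class) as negative are assumptions for the example.

```python
from collections import Counter

def majority_vote(system_predictions):
    """Combine equal-length label lists (one per system) by per-instance
    majority vote; ties resolve to the label seen first by Counter."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*system_predictions)]

def micro_prf(y_true, y_pred, positive_labels=(1, 2)):
    """Micro-averaged precision, recall, and F1 restricted to the given
    classes (assumed here: 1 = definite intake, 2 = possible intake);
    any other label is treated as negative."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        for c in positive_labels:
            if pred == c and gold == c:
                tp += 1            # correct prediction of a positive class
            elif pred == c and gold != c:
                fp += 1            # predicted class c, gold is something else
            elif gold == c and pred != c:
                fn += 1            # missed an instance of class c
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy usage: three systems' labels over five tweets (3 = assumed non-intake class).
gold = [1, 2, 3, 1, 2]
systems = [[1, 2, 3, 1, 3],
           [1, 2, 3, 3, 2],
           [2, 2, 3, 1, 2]]
ensemble = majority_vote(systems)          # -> [1, 2, 3, 1, 2]
print(micro_prf(gold, ensemble))           # -> (1.0, 1.0, 1.0) on this toy data
```

The same micro-averaged restriction to classes 1 and 2 is available in scikit-learn via `precision_recall_fscore_support(y_true, y_pred, labels=[1, 2], average='micro')`, which is a convenient cross-check for the hand-rolled version above.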