Table 13. Evaluation results on the test portion of the i2b2/VA dataset.
| | Precision | Recall | F1 |
|---|---|---|---|
| Baseline | 0.517 | 0.597 | 0.541 |
| Xu et al. [51] | 0.906 | 0.925 | 0.915 |
| Bio-SCoRes | 0.838 | 0.881 | 0.858 |
| – BCUBED | 0.964 | 0.944 | 0.954 |
| – MUC | 0.735 | 0.830 | 0.779 |
| – CEAF | 0.815 | 0.868 | 0.841 |
| *By semantic type* | | | |
| – Test | 0.796 | 0.700 | 0.735 |
| – Person | 0.816 | 0.903 | 0.850 |
| – Problem | 0.774 | 0.851 | 0.808 |
| – Treatment | 0.759 | 0.826 | 0.789 |
| *No post-processing* | | | |
| Bio-SCoRes | 0.800 | 0.871 | 0.832 |
| – BCUBED | 0.966 | 0.941 | 0.953 |
| – MUC | 0.655 | 0.807 | 0.723 |
| – CEAF | 0.777 | 0.869 | 0.821 |
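The composite Bio-SCoRes row lines up with the unweighted mean of its three per-metric rows (BCUBED, MUC, CEAF). The table itself does not state this, so treat the following as a sketch that checks the assumption against the reported numbers rather than as the authors' scoring code. The names `METRICS` and `unweighted_mean` are hypothetical and used only for illustration.

```python
# Per-metric (precision, recall, F1) for Bio-SCoRes with post-processing,
# copied from Table 13.
METRICS = {
    "BCUBED": (0.964, 0.944, 0.954),
    "MUC": (0.735, 0.830, 0.779),
    "CEAF": (0.815, 0.868, 0.841),
}


def unweighted_mean(scores):
    """Average (P, R, F1) column-wise over the given metric rows.

    Assumption: the composite score weights each metric equally.
    """
    n = len(scores)
    return tuple(sum(row[i] for row in scores) / n for i in range(3))


composite = unweighted_mean(list(METRICS.values()))
print(tuple(round(x, 3) for x in composite))  # (0.838, 0.881, 0.858)
```

The result reproduces the reported composite row (0.838, 0.881, 0.858). Applying the same averaging to the "No post-processing" rows can differ from the table by 0.001, because the per-metric inputs are already rounded to three decimals.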