Table 3.
Results on the i2b2 2010 dataset. The best scores are bolded.
Model | Present | Absent | Hypothetical | Possible | Conditional | Not Associated | micro F-1 |
---|---|---|---|---|---|---|---|
Logistic Regression | 0.900 | 0.842 | 0.833 | 0.464 | 0.471 | 0.596 | 0.850 |
Roberts et al. [8]* | 0.962 | 0.947 | 0.895 | 0.684 | 0.423 | 0.861 | 0.928 |
Jiang et al. [9]* | 0.960 | 0.954 | 0.904 | 0.666 | 0.391 | 0.863 | 0.931 |
Demner et al. [10]⋄ | 0.957 | 0.940 | 0.626 | 0.859 | 0.384 | 0.835 | 0.933 |
Clark et al. [7]* | 0.958 | 0.937 | 0.890 | 0.630 | 0.422 | 0.869 | 0.934 |
de Bruijn et al. [11]⋄ | 0.959 | 0.942 | 0.884 | 0.643 | 0.263 | 0.824 | 0.936 |
BERT model | 0.959 | 0.955 | 0.902 | 0.760 | 0.000 | 0.000 | 0.936 |
Prompt model | 0.971 | 0.968 | 0.921 | 0.763 | 0.485 | 0.875 | 0.954 |
* The numbers were reported in the i2b2 2010/VA challenge [24] and are not directly comparable with our model.
⋄ The numbers were computed from the confusion matrices reported in the original papers and are not directly comparable with our model.
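
One of the table notes states that some scores were computed from reported confusion matrices. As a rough illustration of how per-class F1 and micro F1 can be recovered from such a matrix, the sketch below may help. It assumes rows are gold labels and columns are predicted labels; the function name `f1_from_confusion` and the toy counts are hypothetical and are not taken from any of the cited papers.

```python
from typing import Dict, List, Tuple


def f1_from_confusion(
    matrix: List[List[int]], labels: List[str]
) -> Tuple[Dict[str, float], float]:
    """Return (per-class F1, micro F1) for a square confusion matrix.

    Assumption: matrix[i][j] counts items with gold label i predicted as j.
    """
    n = len(labels)
    per_class: Dict[str, float] = {}
    tp_total = fp_total = fn_total = 0

    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                          # gold i, predicted as something else
        fp = sum(matrix[r][i] for r in range(n)) - tp     # predicted i, gold is something else
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_total += tp
        fp_total += fp
        fn_total += fn

    # Micro F1 pools the counts over all classes before computing F1
    micro_denom = 2 * tp_total + fp_total + fn_total
    micro = 2 * tp_total / micro_denom if micro_denom else 0.0
    return per_class, micro


# Toy two-class example with made-up counts (not data from the table)
toy_labels = ["present", "absent"]
toy_matrix = [
    [8, 2],  # gold "present": 8 predicted present, 2 predicted absent
    [1, 9],  # gold "absent": 1 predicted present, 9 predicted absent
]
per_class_f1, micro_f1 = f1_from_confusion(toy_matrix, toy_labels)
```

In a single-label setting where every instance receives exactly one prediction, micro F1 computed this way equals overall accuracy, which is why a class with an F1 of zero can coexist with a high micro F1 when that class is rare.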