Author manuscript; available in PMC: 2022 Aug 16.

Published in final edited form as: J Biomed Inform. 2022 Jul 8;132:104139. doi: 10.1016/j.jbi.2022.104139

Table 3.

Results on the i2b2 2010 dataset. The best scores are bolded.

Model	Present	Absent	Hypothetical	Possible	Conditional	Not Associated	micro F-1
Logistic Regression	0.900	0.842	0.833	0.464	0.471	0.596	0.850
Roberts et al. [8]^*	0.962	0.947	0.895	0.684	0.423	0.861	0.928
Jiang et al. [9]^*	0.960	0.954	0.904	0.666	0.391	0.863	0.931
Demner et al. [10]^⋄	0.957	0.940	0.626	0.859	0.384	0.835	0.933
Clark et al. [7]^*	0.958	0.937	0.890	0.630	0.422	0.869	0.934
de Bruijn et al. [11]^⋄	0.959	0.942	0.884	0.643	0.263	0.824	0.936
BERT model	0.959	0.955	0.902	0.760	0.000	0.000	0.936
Prompt model	0.971	0.968	0.921	0.763	0.485	0.875	0.954

^* The numbers were reported in the i2b2 2010/VA challenge [24] and are not directly comparable with our model.

^⋄ The numbers were computed from the confusion matrices reported in the original papers and are not directly comparable with our model.
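The ^⋄ footnote states that some scores were derived from published confusion matrices. The following is a minimal sketch of how per-class F1 and micro-averaged F1 can be computed from such a matrix, assuming the standard F1 definitions and a layout where rows are gold labels and columns are predicted labels. The function names (`per_class_f1`, `micro_f1`) and the toy matrix are hypothetical illustrations, not the authors' code or data from Table 3.

```python
from typing import Dict, Sequence


def per_class_f1(confusion: Sequence[Sequence[int]],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Per-class F1 from a square confusion matrix.

    Assumption: confusion[i][j] counts instances whose gold label is
    labels[i] and whose predicted label is labels[j].
    """
    n = len(labels)
    scores: Dict[str, float] = {}
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, gold differs
        fn = sum(confusion[k][j] for j in range(n)) - tp  # gold k, predicted differs
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        scores[labels[k]] = 2 * tp / denom if denom else 0.0
    return scores


def micro_f1(confusion: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN over all classes.

    In single-label classification each off-diagonal cell is one FP (for
    the predicted class) and one FN (for the gold class), so pooled FP
    equals pooled FN and micro F1 reduces to accuracy.
    """
    n = len(confusion)
    tp = sum(confusion[k][k] for k in range(n))
    total = sum(sum(row) for row in confusion)
    errors = total - tp  # pooled FP == pooled FN
    denom = 2 * tp + 2 * errors
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy 3-class matrix for illustration only (not data from Table 3).
    labels = ["present", "absent", "possible"]
    cm = [[8, 1, 1],
          [0, 9, 1],
          [2, 0, 3]]
    print(per_class_f1(cm, labels))  # {'present': 0.8, 'absent': 0.9, 'possible': 0.6}
    print(micro_f1(cm))              # 0.8
```

Writing F1 as 2TP / (2TP + FP + FN) avoids computing precision and recall separately, so only one zero-denominator check is needed per class. If an evaluation excludes some classes from the micro average, the pooled FP and FN would no longer be equal, and `micro_f1` would need to pool counts over the included classes only.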