Table 1. Accuracy on (a) the CASI test set, (b) the i2b2 hand-labeled test set, (c) the i2b2 RS test set, and (d) the MIMIC-III test set.

| Sampling method | (a) Macro | (a) Micro | (b) Macro | (b) Micro | (c) Macro | (c) Micro | (d) Macro | (d) Micro |
|---|---|---|---|---|---|---|---|---|
| Control | 0.672 | 0.673 | 0.702 | 0.682 | 0.869 | 0.850 | 0.948 | 0.917 |
| Control + global | 0.686* | 0.687 | 0.738 | 0.745 | 0.877* | **0.862** | 0.955* | 0.929 |
| SWR | 0.705* | 0.708 | 0.701 | 0.680 | 0.864 | 0.834 | 0.948 | 0.914 |
| SWR + global | 0.715* | 0.712 | 0.701 | 0.677 | 0.873* | 0.850 | 0.956* | 0.931 |
| Relatives | 0.813* | 0.806 | 0.833* | 0.795 | 0.873 | 0.827 | 0.945 | 0.910 |
| Relatives + global | 0.825** | 0.820 | 0.855** | 0.816 | 0.886** | 0.842 | 0.954** | 0.925 |
| Relatives + global + HP | **0.841\*\*\*** | **0.834** | **0.859** | **0.825** | **0.889\*\*\*** | 0.848 | **0.961\*\*\*** | **0.935** |
| Clinical BERT | 0.648 | 0.643 | 0.602 | 0.591 | 0.824 | 0.788 | 0.917 | 0.871 |
| Clinical BERT + Relatives | 0.721**** | 0.717 | 0.690**** | 0.699 | – | – | – | – |
*p < 0.05 (one-sided Wilcoxon signed-rank test compared with Control model).
**p < 0.02 (one-sided Wilcoxon signed-rank test compared with Relatives model).
***p < 0.01 (one-sided Wilcoxon signed-rank test compared with Relatives + global model).
****p < 0.03 (one-sided Wilcoxon signed-rank test compared with Clinical BERT model).
We sample training data with replacement (SWR) and augment it with related medical concepts (Relatives). We also report results when the ontology is incorporated during pretraining (HP) and when the global context of the note is included (global). The test sets for (a) CASI and (b) i2b2 hand-labeled are hand-labeled, while those for (c) i2b2 RS and (d) MIMIC-III are generated by RS. Bolded values indicate the best-performing model for each column. We omitted running Clinical BERT + Relatives on the RS datasets due to computational constraints.
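The significance markers in the footnotes come from one-sided Wilcoxon signed-rank tests between paired results of two models. Below is a minimal sketch of such a test using SciPy; the paired accuracy vectors are hypothetical, and pairing on per-abbreviation accuracies is an assumption rather than a detail stated in the table.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired per-abbreviation accuracies for two models.
# The real pairing unit (abbreviation, fold, or seed) is an assumption.
control   = np.array([0.61, 0.70, 0.68, 0.75, 0.64, 0.72, 0.66, 0.71])
relatives = np.array([0.69, 0.74, 0.71, 0.80, 0.70, 0.78, 0.65, 0.76])

# alternative="greater" makes the test one-sided:
# H1: the median of (relatives - control) is greater than zero.
stat, p = wilcoxon(relatives, control, alternative="greater")
print(f"W = {stat:.1f}, one-sided p = {p:.4f}")
```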
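Each column pair reports macro and micro accuracy. Assuming the conventional definitions (macro averages per-abbreviation accuracies so rare abbreviations weigh equally, while micro pools every prediction), the distinction can be sketched as follows:

```python
from collections import defaultdict

def macro_micro_accuracy(records):
    """records: iterable of (abbreviation, is_correct) pairs.

    Conventional definitions (assumed here, not spelled out in the table):
    micro accuracy pools every prediction, while macro accuracy averages
    per-abbreviation accuracies so rare abbreviations count equally.
    """
    per_abbrev = defaultdict(list)
    for abbrev, correct in records:
        per_abbrev[abbrev].append(correct)
    n_total = sum(len(hits) for hits in per_abbrev.values())
    micro = sum(sum(hits) for hits in per_abbrev.values()) / n_total
    macro = sum(sum(hits) / len(hits) for hits in per_abbrev.values()) / len(per_abbrev)
    return macro, micro

# "ra" is seen three times, "pt" once; macro weights both abbreviations
# equally, so macro (0.33) diverges from micro (0.50) here.
print(macro_micro_accuracy([("ra", 1), ("ra", 1), ("ra", 0), ("pt", 0)]))
```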