Table 1. Accuracy on (a) the CASI test set, (b) the i2b2 hand-labeled test set, (c) the i2b2 RS test set, and (d) the MIMIC-III test set.

| Sampling method | (a) Macro | (a) Micro | (b) Macro | (b) Micro | (c) Macro | (c) Micro | (d) Macro | (d) Micro |
|---|---|---|---|---|---|---|---|---|
| Control | 0.672 | 0.673 | 0.702 | 0.682 | 0.869 | 0.850 | 0.948 | 0.917 |
| Control + global | 0.686* | 0.687 | 0.738 | 0.745 | 0.877* | **0.862** | 0.955* | 0.929 |
| SWR | 0.705* | 0.708 | 0.701 | 0.680 | 0.864 | 0.834 | 0.948 | 0.914 |
| SWR + global | 0.715* | 0.712 | 0.701 | 0.677 | 0.873* | 0.850 | 0.956* | 0.931 |
| Relatives | 0.813* | 0.806 | 0.833* | 0.795 | 0.873 | 0.827 | 0.945 | 0.910 |
| Relatives + global | 0.825** | 0.820 | 0.855** | 0.816 | 0.886** | 0.842 | 0.954** | 0.925 |
| Relatives + global + HP | **0.841\*\*\*** | **0.834** | **0.859** | **0.825** | **0.889\*\*\*** | 0.848 | **0.961\*\*\*** | **0.935** |
| Clinical BERT | 0.648 | 0.643 | 0.602 | 0.591 | 0.824 | 0.788 | 0.917 | 0.871 |
| Clinical BERT + Relatives | 0.721**** | 0.717 | 0.690**** | 0.699 | – | – | – | – |
*p < 0.05 (one-sided Wilcoxon signed-rank test compared with Control model).
**p < 0.02 (one-sided Wilcoxon signed-rank test compared with Relatives model).
***p < 0.01 (one-sided Wilcoxon signed-rank test compared with Relatives + global model).
****p < 0.03 (one-sided Wilcoxon signed-rank test compared with Clinical BERT model).
We sample training data with replacement (SWR) and augment it with related medical concepts (Relatives). We also report results when the ontology is incorporated during pretraining (HP) and when the global context of the note is included (global). The test sets for (a) CASI and (b) i2b2 hand-labeled are hand-labeled, while those for (c) i2b2 RS and (d) MIMIC-III are generated by RS. Bolded values indicate the best-performing model for each column. We omitted running Clinical BERT + Relatives on the RS datasets due to computational constraints.
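The significance markers in the footnotes come from one-sided Wilcoxon signed-rank tests between paired results of two models. Below is a minimal sketch of such a test using SciPy; the paired accuracy vectors are hypothetical, and pairing on per-abbreviation accuracies is an assumption rather than a detail stated in the table.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired per-abbreviation accuracies for two models.
# The real pairing unit (abbreviation, fold, or seed) is an assumption.
control   = np.array([0.61, 0.70, 0.68, 0.75, 0.64, 0.72, 0.66, 0.71])
relatives = np.array([0.69, 0.74, 0.71, 0.80, 0.70, 0.78, 0.65, 0.76])

# alternative="greater" makes the test one-sided:
# H1: the median of (relatives - control) is greater than zero.
stat, p = wilcoxon(relatives, control, alternative="greater")
print(f"W = {stat:.1f}, one-sided p = {p:.4f}")
```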
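Each column pair reports macro and micro accuracy. Assuming the conventional definitions (macro averages per-abbreviation accuracies so rare abbreviations weigh equally, while micro pools every prediction), the distinction can be sketched as follows:

```python
from collections import defaultdict

def macro_micro_accuracy(records):
    """records: iterable of (abbreviation, is_correct) pairs.

    Conventional definitions (assumed here, not spelled out in the table):
    micro accuracy pools every prediction, while macro accuracy averages
    per-abbreviation accuracies so rare abbreviations count equally.
    """
    per_abbrev = defaultdict(list)
    for abbrev, correct in records:
        per_abbrev[abbrev].append(correct)
    n_total = sum(len(hits) for hits in per_abbrev.values())
    micro = sum(sum(hits) for hits in per_abbrev.values()) / n_total
    macro = sum(sum(hits) / len(hits) for hits in per_abbrev.values()) / len(per_abbrev)
    return macro, micro

# "ra" is seen three times, "pt" once; macro weights both abbreviations
# equally, so macro (0.33) diverges from micro (0.50) here.
print(macro_micro_accuracy([("ra", 1), ("ra", 1), ("ra", 0), ("pt", 0)]))
```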