Table 2.
Performance of the models using both the clinical history and the impression sections. Bold text marks the best performance on each dataset. The p-value column reports the model comparison, with ClinicalBERT++ as the reference (ref).
| Labels | Model | History and impression section | | | | Support |
|---|---|---|---|---|---|---|
| | | Precision | Recall | F1-score | p-value | |
| Acute | BERT | 0.94 | 0.95 | 0.95 | p > 0.5 | 1436 |
| Critical | BERT | 0.82 | 0.86 | 0.83 | | 136 |
| Non-acute | BERT | 0.97 | 0.97 | 0.97 | | 2247 |
| Overall (weighted) | BERT | 0.96 | 0.96 | 0.96 | | 3819 |
| Acute | ClinicalBERT++ | 0.93 | 0.96 | 0.95 | ref | 1436 |
| Critical | ClinicalBERT++ | 0.92 | 0.84 | 0.88 | | 136 |
| Non-acute | ClinicalBERT++ | 0.98 | 0.96 | 0.97 | | 2247 |
| Overall (weighted) | ClinicalBERT++ | 0.96 | 0.96 | 0.96 | | 3819 |
| External test | | | | | | |
| Acute | BERT | 0.31 | 0.89 | 0.45 | p < 0.01 | 325 |
| Critical | BERT | 0.23 | 0.33 | 0.27 | | 24 |
| Non-acute | BERT | 0.97 | 0.61 | 0.75 | | 1690 |
| Overall (weighted) | BERT | 0.85 | 0.65 | 0.69 | | 2039 |
| Acute | ClinicalBERT++ | 0.64 | 0.55 | 0.60 | ref | 325 |
| Critical | ClinicalBERT++ | 0.81 | 0.54 | 0.65 | | 24 |
| Non-acute | ClinicalBERT++ | 0.92 | 0.94 | 0.93 | | 1690 |
| Overall (weighted) | ClinicalBERT++ | 0.87 | 0.88 | 0.87 | | 2039 |
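The "Overall (weighted)" rows can be read as support-weighted averages of the per-class scores. The table does not state the formula, so this is an assumption based on the usual meaning of "weighted" in classification reports. The sketch below recomputes the overall F1-score for ClinicalBERT++ on the external test set from the rounded per-class values in Table 2. The helper name `weighted_average` is hypothetical and used only for illustration.

```python
def weighted_average(scores, supports):
    """Return the support-weighted mean of per-class scores.

    Assumption: "weighted" in Table 2 means each class score is weighted
    by its support (number of reports in that class).
    """
    total = sum(supports)
    return sum(score * n for score, n in zip(scores, supports)) / total


# Per-class F1-scores and supports for ClinicalBERT++ on the external test
# set, in the order Acute, Critical, Non-acute (values copied from Table 2).
f1_scores = [0.60, 0.65, 0.93]
supports = [325, 24, 1690]

overall_f1 = weighted_average(f1_scores, supports)
print(round(overall_f1, 2))  # 0.87, matching the reported overall F1-score
```

Because the per-class inputs are already rounded to two decimals, values recomputed this way can differ from the reported ones by about 0.01. The recall for the same model, for example, recomputes to 0.87 against a reported 0.88.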