2022 Nov 7;36(1):105–113. doi: 10.1007/s10278-022-00712-w

Table 2.

Performance of the models using both the clinical history and impression sections. Bold text marks the best performance on each dataset. The p-value column reports the comparison between models, with ClinicalBERT++ as the reference (ref).

Labels              Model           History and impression section         Support
                                    Precision  Recall  F1-score  p-value
Acute               BERT            0.94       0.95    0.95      p > 0.5   1436
Critical                            0.82       0.86    0.83                136
Non-acute                           0.97       0.97    0.97                2247
Overall (weighted)                  0.96       0.96    0.96                3819
Acute               ClinicalBERT++  0.93       0.96    0.95      ref       1436
Critical                            0.92       0.84    0.88                136
Non-acute                           0.98       0.96    0.97                2247
Overall (weighted)                  0.96       0.96    0.96                3819

External test
Acute               BERT            0.31       0.89    0.45      p < 0.01  325
Critical                            0.23       0.33    0.27                24
Non-acute                           0.97       0.61    0.75                1690
Overall (weighted)                  0.85       0.65    0.69                2039
Acute               ClinicalBERT++  0.64       0.55    0.60      ref       325
Critical                            0.81       0.54    0.65                24
Non-acute                           0.92       0.94    0.93                1690
Overall (weighted)                  0.87       0.88    0.87                2039
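
The "Overall (weighted)" rows can be read as support-weighted averages of the per-class scores. The sketch below illustrates that reading under the assumption that "weighted" means each class contributes in proportion to its support, which is the usual convention for classification reports; the table does not state the formula. The function name `support_weighted_average` is hypothetical and is not taken from the article.

```python
def support_weighted_average(scores, supports):
    """Average per-class scores, weighting each class by its support.

    Assumption: "weighted" in the table means weighting by class support.
    """
    if len(scores) != len(supports):
        raise ValueError("scores and supports must have the same length")
    total = sum(supports)
    if total == 0:
        raise ValueError("total support must be positive")
    return sum(s * n for s, n in zip(scores, supports)) / total


# ClinicalBERT++ precision on the external test set, as listed in Table 2
# (order: Acute, Critical, Non-acute).
precision = [0.64, 0.81, 0.92]
support = [325, 24, 1690]

overall_precision = support_weighted_average(precision, support)
print(f"{overall_precision:.3f}")
```

For this row the result rounds to 0.87, which agrees with the reported overall precision. Because the per-class values in the table are already rounded to two decimals, recomputing other overall entries this way can differ from the reported figure by 0.01.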