Table 5:
Transfer learning performance (F1-score) of RadBERT-CL, BERT, and BlueBERT when only a small amount of labeled data is available. Fine-tuning uses 400 randomly selected reports, and the F1-score is reported on the remaining 287 of the 687 high-quality manually annotated reports. Reported results are the mean F1-scores over 10 training runs with random splits, rounded to three decimal places. RadBERT-CL shows significant improvements in both the Linear Evaluation setting (freeze the encoder f(.) parameters and train only the classifier layer) and the Full-Network Evaluation setting (train the encoder f(.) and classifier layer end-to-end).
Model | Linear Evaluation | Full-Network Evaluation
---|---|---
BERT-uncased | 0.137 ±0.012 | 0.477 ±0.009
BlueBERT-uncased | 0.153 ±0.005 | 0.480 ±0.007
RadBERT-CL (Algorithm 3, pre-trained using 687 test reports) | 0.258 ±0.015 | 0.543 ±0.021
RadBERT-CL (Algorithm 3, pre-trained using full MIMIC-CXR unlabeled data) | 0.282 ±0.011 | 0.591 ±0.019
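The two evaluation settings in the caption differ only in which parameters receive gradients. A minimal PyTorch sketch, with a hypothetical stand-in encoder and classifier head (the module sizes and learning rates are illustrative assumptions, not the paper's configuration):

```python
import torch
from torch import nn

# Hypothetical stand-ins for the pre-trained encoder f(.) and the
# task classifier head; 768 matches the BERT-base hidden size.
encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU())
classifier = nn.Linear(768, 14)

# Linear Evaluation: freeze the encoder, optimize only the classifier.
for p in encoder.parameters():
    p.requires_grad = False
opt_linear = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Full-Network Evaluation: unfreeze and train end-to-end.
for p in encoder.parameters():
    p.requires_grad = True
opt_full = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=2e-5
)
```

Linear evaluation is the standard probe of representation quality in contrastive learning: since the encoder is fixed, any F1 gain must come from better pre-trained features rather than from fine-tuning capacity.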