
Table 5:

Transfer learning performance (F1-score) of RadBERT-CL, BERT, and BlueBERT when only a small amount of labeled data is available. Fine-tuning is done on 400 randomly selected reports, and the F1-score is reported on the remaining 287 of the 687 high-quality manually annotated reports. Reported results are the mean F1-scores of 10 random training runs, rounded to three decimal places. RadBERT-CL shows significant improvements in both the Linear Evaluation setting (the encoder f(.) parameters are frozen and only the classifier layer is trained) and the Full-Network Evaluation setting (the encoder f(.) and classifier layer are trained end-to-end); a minimal code sketch of the two settings follows the table.

Model                                                                      Linear Evaluation   Full-Network Evaluation
BERT-uncased                                                               0.137 ±0.012        0.477 ±0.009
BlueBERT-uncased                                                           0.153 ±0.005        0.480 ±0.007
RadBERT-CL (Algorithm 3; pre-trained using 687 test reports)               0.258 ±0.015        0.543 ±0.021
RadBERT-CL (Algorithm 3; pre-trained using full MIMIC-CXR unlabelled data) 0.282 ±0.011        0.591 ±0.019
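
The two evaluation settings in Table 5 differ only in whether the encoder f(.) is updated during fine-tuning. The following PyTorch sketch (not the authors' code) illustrates that difference; the checkpoint name, label count, and learning rate are illustrative assumptions, and any BERT-style encoder (e.g. a RadBERT-CL checkpoint) could be substituted.

import torch
from transformers import AutoModel

ENCODER_NAME = "bert-base-uncased"  # assumption: any BERT-style checkpoint, e.g. a RadBERT-CL checkpoint
NUM_LABELS = 14                     # assumption: number of report labels

class ReportClassifier(torch.nn.Module):
    """Encoder f(.) followed by a single linear classifier layer."""
    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)            # encoder f(.)
        self.classifier = torch.nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = hidden.last_hidden_state[:, 0]                         # [CLS] representation
        return self.classifier(cls_repr)

def build_optimizer(model: ReportClassifier, linear_eval: bool, lr: float = 2e-5):
    """Linear Evaluation: freeze f(.) and train only the classifier.
    Full-Network Evaluation: train encoder and classifier end-to-end."""
    if linear_eval:
        for p in model.encoder.parameters():
            p.requires_grad = False                                       # freeze encoder f(.)
        params = model.classifier.parameters()
    else:
        params = model.parameters()
    return torch.optim.AdamW(params, lr=lr)

Calling build_optimizer(model, linear_eval=True) corresponds to the Linear Evaluation column, and linear_eval=False to the Full-Network Evaluation column; everything else in the training loop stays the same.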