. 2021 Aug 1;28(10):2108–2115. doi: 10.1093/jamia/ocab126

Table 2.

Clinical information extraction performance of MT-Clinical BERT vs hyperparameter-searched Clinical BERT fine-tuning runs. All span-level metrics are exact match. The MT-Clinical BERT column reports a single feature encoder trained with round-robin multitask learning and individual task-specific heads. The MTL Loss Summation column reports a multitask feature encoder trained with loss summation. All other reported results come from task-specific BERT models. Values in parentheses equal the MT-Clinical BERT score minus the Optimized Clinical BERT score. Higher is better.

| Task | MT-Clinical BERT | MTL Loss Summation | Optimized Clinical BERT | Clinical BERT [8] |
|---|---|---|---|---|
| n2c2-2019 | 86.7 (−0.5) | 84.5 | 87.2 | |
| MedNLI | 80.5 (−2.3) | 80.2 | 82.8 | 82.7 |
| MedRQE | 76.5 (−3.6) | 77.5 | 80.1 | |
| n2c2-2018 | 87.4 (−0.7) | 85.5 | 88.1 | |
| i2b2-2014 | 91.9 (−3.6) | 94.2 | 95.5 | 92.7 |
| i2b2-2012 | 84.1 (+0.2) | 84.8 | 83.9 | 78.9 |
| i2b2-2010 | 89.5 (−0.3) | 90.6 | 89.8 | 87.8 |
| quaero-2014 | 49.1 (−6.4) | 52.2 | 55.5 | |
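
The caption contrasts two ways of training a shared encoder on several tasks: round-robin updates and loss summation. The toy sketch below may help make the difference concrete. It is not the authors' implementation: a single scalar `w` stands in for the shared encoder, task-specific heads are omitted, and the task names, targets, and learning rate are hypothetical values chosen only for illustration.

```python
# Toy illustration only: one shared scalar stands in for the shared encoder,
# and each "task" is a quadratic loss with its own optimum (hypothetical values).
TASK_TARGETS = {"task_a": 1.0, "task_b": 3.0}
LEARNING_RATE = 0.1


def grad(w, target):
    """Gradient of the per-task loss (w - target)**2 with respect to w."""
    return 2.0 * (w - target)


def train_round_robin(steps, w=0.0):
    """Visit the tasks in a fixed cycle; each step updates w on one task only."""
    names = list(TASK_TARGETS)
    for step in range(steps):
        target = TASK_TARGETS[names[step % len(names)]]
        w -= LEARNING_RATE * grad(w, target)
    return w


def train_loss_summation(steps, w=0.0):
    """Each step updates w on the sum of all task losses."""
    for _ in range(steps):
        total_grad = sum(grad(w, t) for t in TASK_TARGETS.values())
        w -= LEARNING_RATE * total_grad
    return w


if __name__ == "__main__":
    print("round robin:   ", round(train_round_robin(200), 3))
    print("loss summation:", round(train_loss_summation(200), 3))
```

In this toy setting, loss summation settles at the minimum of the summed loss, while round robin keeps moving back and forth around it because each step follows only one task's gradient. The sketch says nothing about which scheme performs better on real tasks; for that, the table above is the reference.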