2021 Aug 1;28(10):2108–2115. doi: 10.1093/jamia/ocab126

Table 2.

Clinical information extraction performance of MT-Clinical BERT versus Clinical BERT fine-tuning runs selected by hyperparameter search. All span-level metrics are exact match. The MT-Clinical BERT column reports a single multitask feature encoder trained round-robin across tasks, with an individual task-specific head per task. The MTL Loss Summation column reports a multitask feature encoder trained with loss summation. All other reported results come from task-specific BERT models. Values in parentheses are the difference from the Optimized Clinical BERT column. Higher is better.

Task	MT-Clinical BERT	MTL Loss Summation	Optimized Clinical BERT	Clinical BERT⁸
n2c2-2019	86.7 (−0.5)	84.5	87.2	–
MedNLI	80.5 (−2.3)	80.2	82.8	82.7
MedRQE	76.5 (−3.6)	77.5	80.1	–
n2c2-2018	87.4 (−0.7)	85.5	88.1	–
i2b2-2014	91.9 (−3.6)	94.2	95.5	92.7
i2b2-2012	84.1 (+0.2)	84.8	83.9	78.9
i2b2-2010	89.5 (−0.3)	90.6	89.8	87.8
quaero-2014	49.1 (−6.4)	52.2	55.5	–
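
The parenthesized values in the MT-Clinical BERT column can be reproduced from the table itself. The following is a minimal sketch, not the authors' evaluation code: it assumes each parenthesized value is the MT-Clinical BERT score minus the Optimized Clinical BERT score, rounded to one decimal place. The names `scores` and `delta` are hypothetical and used only for illustration.

```python
# Scores copied from Table 2 as (MT-Clinical BERT, Optimized Clinical BERT).
scores = {
    "n2c2-2019": (86.7, 87.2),
    "MedNLI": (80.5, 82.8),
    "MedRQE": (76.5, 80.1),
    "n2c2-2018": (87.4, 88.1),
    "i2b2-2014": (91.9, 95.5),
    "i2b2-2012": (84.1, 83.9),
    "i2b2-2010": (89.5, 89.8),
    "quaero-2014": (49.1, 55.5),
}


def delta(multitask: float, single_task: float) -> float:
    """Difference between the multitask and single-task scores, to one decimal.

    Assumption: this is how the parenthesized values in the table are defined.
    """
    return round(multitask - single_task, 1)


# Per-task gap between the shared encoder and the tuned task-specific model.
deltas = {task: delta(mt, opt) for task, (mt, opt) in scores.items()}

for task, d in deltas.items():
    print(f"{task:<12} {d:+.1f}")
```

Under that assumption, each computed difference matches the value printed in parentheses in the corresponding row; i2b2-2012 is the only task where the multitask encoder scores higher than the tuned task-specific model.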