Table 2.
Clinical information extraction performance of MT-Clinical BERT vs. hyperparameter-searched Clinical BERT fine-tuning runs. All span-level metrics are exact match. Results in the MT-Clinical BERT column come from a single feature encoder trained with round-robin multitask learning and individual task-specific heads; values in parentheses give the difference from Optimized Clinical BERT. Results in the MTL Loss Summation column come from a multitask feature encoder trained with loss summation. All other reported results come from task-specific BERT models. Higher is better.
Task | MT-Clinical BERT | MTL Loss Summation | Optimized Clinical BERT | Clinical BERT [8] |
---|---|---|---|---|
n2c2-2019 | 86.7 (−0.5) | 84.5 | 87.2 | – |
MedNLI | 80.5 (−2.3) | 80.2 | 82.8 | 82.7 |
MedRQE | 76.5 (−3.6) | 77.5 | 80.1 | – |
n2c2-2018 | 87.4 (−0.7) | 85.5 | 88.1 | – |
i2b2-2014 | 91.9 (−3.6) | 94.2 | 95.5 | 92.7 |
i2b2-2012 | 84.1 (+0.2) | 84.8 | 83.9 | 78.9 |
i2b2-2010 | 89.5 (−0.3) | 90.6 | 89.8 | 87.8 |
quaero-2014 | 49.1 (−6.4) | 52.2 | 55.5 | – |
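
The parenthesized values in the MT-Clinical BERT column appear to be the difference between that column and the Optimized Clinical BERT column. The short sketch below recomputes them from the scores in the table as a consistency check. The names `scores` and `delta` are illustrative only and do not come from the original work.

```python
# Scores copied from Table 2: (MT-Clinical BERT, Optimized Clinical BERT).
scores = {
    "n2c2-2019": (86.7, 87.2),
    "MedNLI": (80.5, 82.8),
    "MedRQE": (76.5, 80.1),
    "n2c2-2018": (87.4, 88.1),
    "i2b2-2014": (91.9, 95.5),
    "i2b2-2012": (84.1, 83.9),
    "i2b2-2010": (89.5, 89.8),
    "quaero-2014": (49.1, 55.5),
}


def delta(multitask: float, optimized: float) -> float:
    """Difference between the multitask and the task-specific score, to one decimal."""
    return round(multitask - optimized, 1)


# Assumption: the baseline for the parenthesized deltas is Optimized Clinical BERT.
deltas = {task: delta(mt, opt) for task, (mt, opt) in scores.items()}

for task, d in deltas.items():
    print(f"{task}: {d:+.1f}")
```

Under this assumption, the recomputed differences agree with the parenthesized values for all eight tasks; i2b2-2012 is the only task where the multitask model scores higher than the task-specific one.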