Table 2.
Transformer | Semantic textual similarity | Natural language inference | Question answering | Question answering | Question answering | Question answering |
---|---|---|---|---|---|---|
Dataset | 2019 n2c2 [66] | MedNLI [71] | emrQA medication [77] | emrQA medication [77] | emrQA relation [77] | emrQA relation [77] |
Metric | Pearson correlation | Accuracy | F1 score | Exact Match | F1 score | Exact Match |
BioBERT | 0.8744 | 0.8050 | 0.6997 | 0.2475 | 0.9262 | 0.8361 |
ClinicalBERT | 0.8787 | 0.8270 | 0.6905 | 0.2406 | 0.9306 | 0.8533 |
BioMegatron | 0.8806 | 0.8390 | 0.7231 | 0.2882 | 0.9405 | 0.879 |
GatorTron-base (1/4 data) | 0.8675 | 0.8643 | 0.7281 | 0.2952 | 0.9390 | 0.8579 |
GatorTron-base | 0.8810 | 0.8670 | 0.7181 | 0.2978 | 0.9543 | 0.9029 |
GatorTron-medium | **0.8903** | 0.8720 | 0.7354 | 0.3018 | 0.9677 | 0.9243 |
GatorTron-large | 0.8896 | **0.9020** | **0.7408** | **0.3155** | **0.9719** | **0.9310** |
The best evaluation scores are presented in bold.
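
For readers unfamiliar with the metrics named in the column headers, the following is a minimal sketch of how such scores are commonly computed. It is not the evaluation code behind the table. The function names are hypothetical, and the text normalization (lowercasing and whitespace tokenization) is an assumption in the style of common extractive question-answering evaluations; the actual scripts may normalize differently.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def accuracy(preds, golds):
    """Fraction of predicted labels that match the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def exact_match(pred, gold):
    """1.0 if the answers match after normalization, else 0.0.

    Assumption: normalization is lowercasing plus stripping whitespace.
    """
    return float(pred.strip().lower() == gold.strip().lower())


def token_f1(pred, gold):
    """Token-overlap F1 between a predicted and a gold answer span.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = pred.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a dataset-level F1 or Exact Match score would be the mean of the per-question values. The gap between the two metrics on emrQA medication in the table is consistent with this definition: a prediction that overlaps the gold span but adds or drops a token earns partial F1 credit and zero Exact Match credit.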