NPJ Digit Med. 2022 Dec 26;5:194. doi: 10.1038/s41746-022-00742-2

Table 2.

Comparison of GatorTron with existing biomedical and clinical transformer models for semantic textual similarity, natural language inference, and question answering.

| Transformer | 2019 n2c2 [66]: Pearson correlation | MedNLI [71]: accuracy | emrQA medication [77]: F1 score | emrQA medication [77]: exact match | emrQA relation [77]: F1 score | emrQA relation [77]: exact match |
|---|---|---|---|---|---|---|
| BioBERT | 0.8744 | 0.8050 | 0.6997 | 0.2475 | 0.9262 | 0.8361 |
| ClinicalBERT | 0.8787 | 0.8270 | 0.6905 | 0.2406 | 0.9306 | 0.8533 |
| BioMegatron | 0.8806 | 0.8390 | 0.7231 | 0.2882 | 0.9405 | 0.8790 |
| GatorTron-base (1/4 data) | 0.8675 | 0.8643 | 0.7281 | 0.2952 | 0.9390 | 0.8579 |
| GatorTron-base | 0.8810 | 0.8670 | 0.7181 | 0.2978 | 0.9543 | 0.9029 |
| GatorTron-medium | **0.8903** | 0.8720 | 0.7354 | 0.3018 | 0.9677 | 0.9243 |
| GatorTron-large | 0.8896 | **0.9020** | **0.7408** | **0.3155** | **0.9719** | **0.9310** |

The best evaluation scores are presented in bold.
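The metrics reported in this table are standard for the three tasks: Pearson correlation between predicted and gold similarity scores for 2019 n2c2, label accuracy for MedNLI, and SQuAD-style exact match and token-level F1 for the two emrQA subsets. The sketch below is illustrative only (it is not the authors' evaluation code; the function names and toy inputs are assumptions) and shows how such scores are commonly computed.

```python
# Illustrative sketch of the evaluation metrics in Table 2
# (not the paper's actual evaluation code).
from collections import Counter

def pearson_correlation(preds, golds):
    """Pearson r between predicted and gold similarity scores (STS)."""
    n = len(preds)
    mean_p, mean_g = sum(preds) / n, sum(golds) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(preds, golds))
    ss_p = sum((p - mean_p) ** 2 for p in preds) ** 0.5
    ss_g = sum((g - mean_g) ** 2 for g in golds) ** 0.5
    return cov / (ss_p * ss_g)

def accuracy(pred_labels, gold_labels):
    """Fraction of NLI examples whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(pred_labels, gold_labels)) / len(gold_labels)

def exact_match(pred_answer, gold_answer):
    """1.0 if the predicted answer string equals the gold answer, else 0.0."""
    return float(pred_answer.strip().lower() == gold_answer.strip().lower())

def token_f1(pred_answer, gold_answer):
    """SQuAD-style token-level F1 (simplified: no punctuation/article stripping)."""
    pred_tokens = pred_answer.lower().split()
    gold_tokens = gold_answer.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy usage
print(pearson_correlation([1.0, 2.5, 4.0], [1.2, 2.4, 3.9]))
print(exact_match("twice daily", "twice daily"))
print(token_f1("twice a day", "twice daily"))
```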