2022 Dec 26;5:194. doi: 10.1038/s41746-022-00742-2

Table 2.

Comparison of GatorTron with existing biomedical and clinical transformer models for semantic textual similarity, natural language inference, and question answering.

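Transformer	Semantic textual similarity: 2019 n2c2⁶⁶ (Pearson correlation)	Natural language inference: MedNLI⁷¹ (accuracy)	Question answering: emrQA medication⁷⁷ (F1 score)	Question answering: emrQA medication⁷⁷ (exact match)	Question answering: emrQA relation⁷⁷ (F1 score)	Question answering: emrQA relation⁷⁷ (exact match)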
BioBERT	0.8744	0.8050	0.6997	0.2475	0.9262	0.8361
ClinicalBERT	0.8787	0.8270	0.6905	0.2406	0.9306	0.8533
BioMegatron	0.8806	0.8390	0.7231	0.2882	0.9405	0.8790
GatorTron-base (1/4 data)	0.8675	0.8643	0.7281	0.2952	0.9390	0.8579
GatorTron-base	0.8810	0.8670	0.7181	0.2978	0.9543	0.9029
GatorTron-medium	0.8903	0.8720	0.7354	0.3018	0.9677	0.9243
GatorTron-large	0.8896	0.9020	0.7408	0.3155	0.9719	0.9310

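The best evaluation scores are presented in bold in the original table: GatorTron-medium scores highest on semantic textual similarity (0.8903), and GatorTron-large scores highest in every other column.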