Bioinformatics. 2023 May 12;39(5):btad310. doi: 10.1093/bioinformatics/btad310

Table 4. Performance of different deep learning model variants on the BioRED test set.^a

| PLM        | Output layer | Efficiency (GPU) | Efficiency (CPU) | F1-score (Baseline) | F1-score (AIO) |
|------------|--------------|------------------|------------------|---------------------|----------------|
| PubMedBERT | CRF          | 27 s             | 116 s            | 89.34               | 91.26          |
| PubMedBERT | Softmax      | 17 s             | 110 s            | 88.98               | 91.00          |
| BioBERT    | CRF          | 29 s             | 120 s            | 88.66               | 90.29          |
| BioBERT    | Softmax      | 18 s             | 113 s            | 88.33               | 90.06          |
| Bioformer  | CRF          | 21 s             | 43 s             | 88.65               | 90.28          |
| Bioformer  | Softmax      | 12 s             | 40 s             | 88.35               | 90.19          |
^a Baseline: the model trained on the original BioRED training set. AIO: the AIONER model trained on the merged training set. All AIONER models significantly outperform their corresponding baselines (two-sided Wilcoxon signed-rank test, P < 0.05). Bold indicates the best score in each efficiency and F1-score column. Efficiency values are processing times (in seconds) on the BioRED test set (100 abstracts). All models were evaluated on the same GPU (Tesla V100-SXM2-32GB) and CPU [Intel(R) Xeon(R) Gold 6226 @ 2.70 GHz, 24 cores]. BioBERT and PubMedBERT have nearly identical processing times because their model architectures and parameter counts are similar.
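The significance claim in the footnote rests on a paired, two-sided Wilcoxon signed-rank test. As an illustrative sketch only, the code below implements that test from scratch using the standard normal approximation (average ranks for ties, zero differences dropped); the paired scores passed to it are hypothetical placeholders, not the paper's per-document F1 values.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W+, p), where W+ is the sum of ranks of positive
    paired differences. Zero differences are discarded and tied
    absolute differences receive their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average of tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Normal approximation to the null distribution of W+.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w_plus, p

# Hypothetical paired scores (e.g. AIO vs. baseline per document):
w, p = wilcoxon_signed_rank([1, 2, 3, 4, 5], [0, 0, 0, 0, 0])
```

In practice a library routine such as `scipy.stats.wilcoxon` would be used; the normal approximation above is only reliable for moderately large samples (roughly n > 20), whereas exact P-values are preferred for small ones.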