Table 4.
| PLM | Output layer | Efficiency (GPU) | Efficiency (CPU) | F1-score (Baseline) | F1-score (AIO) |
|---|---|---|---|---|---|
| PubMedBERT | CRF | 27s | 116s | **89.34** | **91.26** |
| PubMedBERT | Softmax | 17s | 110s | 88.98 | 91.00 |
| BioBERT | CRF | 29s | 120s | 88.66 | 90.29 |
| BioBERT | Softmax | 18s | 113s | 88.33 | 90.06 |
| Bioformer | CRF | 21s | 43s | 88.65 | 90.28 |
| Bioformer | Softmax | **12s** | **40s** | 88.35 | 90.19 |
Baseline: the model trained on the original BioRED training set. AIO: the AIONER model trained on the merged training set. All AIONER models significantly outperform their corresponding baselines in a two-sided Wilcoxon signed-rank test (P < 0.05). Bold indicates the best efficiency and F1-score. Efficiency is reported as processing time (seconds) on the BioRED test set (100 abstracts). All models were evaluated on the same GPU (Tesla V100-SXM2-32GB) and CPU [Intel(R) Xeon(R) Gold 6226 CPU @ 2.70 GHz, 24 cores]. The processing times of BioBERT and PubMedBERT are nearly identical, as their model architectures and parameter counts are similar.
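As an aside, the significance comparison named in the footnote can be sketched in pure Python. The snippet below is a minimal normal-approximation version of the two-sided Wilcoxon signed-rank test applied to paired per-document F1 scores; the score lists are illustrative placeholders, not the actual BioRED results, and the function name is our own.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p_value), where W is the smaller of the positive- and
    negative-rank sums. Zero differences are discarded, and tied
    absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(x, y) if b - a != 0]
    n = len(diffs)
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    # Null distribution of W is approximately normal for moderate n.
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail
    return w, p_value

# Hypothetical paired per-document F1 scores (placeholders).
baseline_f1 = [0.88, 0.90, 0.87, 0.91, 0.89, 0.86, 0.92, 0.88]
aioner_f1 = [0.90, 0.92, 0.89, 0.93, 0.90, 0.88, 0.93, 0.91]

w, p = wilcoxon_signed_rank(baseline_f1, aioner_f1)
print(f"W = {w}, p = {p:.4f}")
```

In practice one would use `scipy.stats.wilcoxon`, which additionally offers an exact null distribution for small samples; the hand-rolled version above just makes the mechanics of the test explicit.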