Brief Bioinform. 2024 Jan 2;25(1):bbad493. doi: 10.1093/bib/bbad493

Table 4.

Performance of LLMs for RE compared to SOTA on selected datasets (F1-score in %)

LM        Method                          BC5CDR   CHEMPROT   DDI     GAD
SOTA      Task fine-tuning                57.03    77.24      82.36   83.96
BioGPT    Task fine-tuning and few-shot   46.17    –          40.76   –
GPT-3     Few-shot                        –        25.90      16.10   66.00
SPIRES    Zero-shot                       40.65    –          –       –
GPT-3.5   Zero-shot                       –        57.43      33.49   –
GPT-3.5   One-shot                        –        61.91      34.40   –
ChatGPT   Zero-shot or few-shot           –        34.16      51.62   52.43
GPT-4     Zero-shot                       –        66.18      63.25   –
GPT-4     One-shot                        –        65.43      65.58   –
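All scores in the table are F1-scores, the standard evaluation metric for relation extraction: predicted relation triples are compared against gold-standard annotations, and precision and recall over exact matches are combined into their harmonic mean. A minimal sketch of that computation, using hypothetical drug–condition triples purely for illustration:

```python
def f1_score(predicted: set, gold: set) -> float:
    """Micro F1 over exact-match relation triples."""
    tp = len(predicted & gold)           # true positives: triples in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)      # fraction of predictions that are correct
    recall = tp / len(gold)              # fraction of gold relations recovered
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold annotations and model predictions (not from any dataset above)
gold = {("aspirin", "headache", "treats"),
        ("warfarin", "aspirin", "interacts"),
        ("ibuprofen", "fever", "treats")}
pred = {("aspirin", "headache", "treats"),
        ("warfarin", "aspirin", "interacts"),
        ("aspirin", "fever", "treats")}   # one incorrect triple

print(round(100 * f1_score(pred, gold), 2))  # F1 as a percentage, as in the table
```

Here 2 of 3 predictions are correct and 2 of 3 gold relations are recovered, so precision = recall = F1 ≈ 66.67%.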