Brief Bioinform. 2024 Jan 2;25(1):bbad493. doi: 10.1093/bib/bbad493

Table 4.

Performance of LLMs for RE compared to SOTA on selected datasets (F1-score in %)

LM        Method                          BC5CDR   CHEMPROT   DDI     GAD
SOTA      Task fine-tuning                57.03    77.24      82.36   83.96
BioGPT    Task fine-tuning and few-shot   46.17    –          40.76   –
GPT-3     Few-shot                        –        25.90      16.10   66.00
SPIRES    Zero-shot                       40.65    –          –       –
GPT-3.5   Zero-shot                       –        57.43      33.49   –
GPT-3.5   One-shot                        –        61.91      34.40   –
ChatGPT   Zero-shot or few-shot           –        34.16      51.62   52.43
GPT-4     Zero-shot                       –        66.18      63.25   –
GPT-4     One-shot                        –        65.43      65.58   –
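All scores in the table are F1-scores, the standard evaluation metric for relation extraction: predicted relation triples are compared against gold-standard annotations, and precision and recall over exact matches are combined into their harmonic mean. A minimal sketch of that computation, using hypothetical drug–condition triples purely for illustration:

```python
def f1_score(predicted: set, gold: set) -> float:
    """Micro F1 over exact-match relation triples."""
    tp = len(predicted & gold)           # true positives: triples in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)      # fraction of predictions that are correct
    recall = tp / len(gold)              # fraction of gold relations recovered
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold annotations and model predictions (not from any dataset above)
gold = {("aspirin", "headache", "treats"),
        ("warfarin", "aspirin", "interacts"),
        ("ibuprofen", "fever", "treats")}
pred = {("aspirin", "headache", "treats"),
        ("warfarin", "aspirin", "interacts"),
        ("aspirin", "fever", "treats")}   # one incorrect triple

print(round(100 * f1_score(pred, gold), 2))  # F1 as a percentage, as in the table
```

Here 2 of 3 predictions are correct and 2 of 3 gold relations are recovered, so precision = recall = F1 ≈ 66.67%.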