[Preprint]. 2023 Oct 17:arXiv:2306.10070v2. [Version 2]

Table 3.

Performance of LLMs for NER compared to SOTA on selected datasets (F1-score in %).

| Language Model | Method | BC2GM | BC5CDR-chemical | BC5CDR-disease | JNLPBA | NCBI-disease |
|---|---|---|---|---|---|---|
| SOTA | Task fine-tuning | 84.52 | 93.33 | 85.62 | 79.10 | 87.82 |
| GPT-3 | Few-shot | 41.40 | 73.00 | 43.60 | – | 51.40 |
| GPT-3.5 | Zero-shot | – | 29.25 | – | – | 24.05 |
| GPT-3.5 | One-shot | – | 18.03 | – | – | 12.73 |
| ChatGPT | Zero-shot or few-shot | 37.54 | 60.30 | 51.77 | 41.25 | 50.49 |
| GPT-4 | Zero-shot | – | 74.43 | – | – | 56.73 |
| GPT-4 | One-shot | – | 82.07 | – | – | 48.37 |
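The scores in the table are entity-level F1, the standard metric for these NER benchmarks. A minimal sketch of how it is computed (the helper `entity_f1` and the example spans are illustrative, not taken from the paper; benchmarks such as BC5CDR typically require an exact span-and-type match):

```python
def entity_f1(gold, pred):
    """Entity-level F1 in %. gold, pred: sets of (start, end, type) tuples.

    A predicted entity counts as a true positive only if its span offsets
    and its type exactly match a gold entity.
    """
    tp = len(gold & pred)                      # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)

# Illustrative example: one of two predictions matches a gold entity.
gold = {(0, 9, "Disease"), (24, 31, "Chemical")}
pred = {(0, 9, "Disease"), (40, 47, "Chemical")}
print(round(entity_f1(gold, pred), 2))  # P = R = 0.5, so F1 = 50.0
```

Because the metric demands exact boundary matches, small span disagreements (e.g. including or dropping a modifier) are scored as full errors, which partly explains the large gap between zero-shot LLMs and fine-tuned SOTA models.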