[Preprint]. 2023 Oct 17:arXiv:2306.10070v2. [Version 2]

Table 3.

Performance of LLMs for NER compared to SOTA on selected datasets (F1-score in %).

| Language Model | Method | BC2GM | BC5CDR-chemical | BC5CDR-disease | JNLPBA | NCBI-disease |
|---|---|---|---|---|---|---|
| SOTA | Task fine-tuning | 84.52 | 93.33 | 85.62 | 79.10 | 87.82 |
| GPT-3 | Few-shot | 41.40 | 73.00 | 43.60 | – | 51.40 |
| GPT-3.5 | Zero-shot | – | 29.25 | – | – | 24.05 |
| GPT-3.5 | One-shot | – | 18.03 | – | – | 12.73 |
| ChatGPT | Zero-shot or few-shot | 37.54 | 60.30 | 51.77 | 41.25 | 50.49 |
| GPT-4 | Zero-shot | – | 74.43 | – | – | 56.73 |
| GPT-4 | One-shot | – | 82.07 | – | – | 48.37 |
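The scores in the table are entity-level F1, the standard metric for these NER benchmarks. A minimal sketch of how it is computed (the helper `entity_f1` and the example spans are illustrative, not taken from the paper; benchmarks such as BC5CDR typically require an exact span-and-type match):

```python
def entity_f1(gold, pred):
    """Entity-level F1 in %. gold, pred: sets of (start, end, type) tuples.

    A predicted entity counts as a true positive only if its span offsets
    and its type exactly match a gold entity.
    """
    tp = len(gold & pred)                      # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)

# Illustrative example: one of two predictions matches a gold entity.
gold = {(0, 9, "Disease"), (24, 31, "Chemical")}
pred = {(0, 9, "Disease"), (40, 47, "Chemical")}
print(round(entity_f1(gold, pred), 2))  # P = R = 0.5, so F1 = 50.0
```

Because the metric demands exact boundary matches, small span disagreements (e.g. including or dropping a modifier) are scored as full errors, which partly explains the large gap between zero-shot LLMs and fine-tuned SOTA models.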