. 2024 Jul 31;3(9):1822–1831. doi: 10.1039/d4dd00091a

Compound name recognition results for the fine-tuned model, ChemDataExtractor, and MatSciBert on the test set of USPTO-ORD-100K. In this task, each model extracts a set of compound names (entities) from unstructured text, and the extracted set is then evaluated against the ground truth.

Model	Accurate	Removal	Addition	Alteration	Total
Fine-tuned	94.9%	4.1%	2.2%	1.0%	78 408
ChemDataExtractor	76.1%	16.0%	22.7%	8.0%
MatSciBert	96.6%	2.2%	2.4%	1.2%
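
The set-based evaluation described in the caption can be illustrated with a minimal Python sketch. The source does not define the four categories precisely, so the definitions here are assumptions: a record is "accurate" when the extracted and ground-truth sets are identical, a "removal" is a ground-truth name the model missed, an "addition" is an extracted name absent from the ground truth, and an "alteration" is a missed name paired with a near-identical extra name. The function names `compare_entities` and `label_rates`, the fuzzy-matching rule, and the `threshold` value are all hypothetical and are not the authors' method.

```python
from collections import Counter
from difflib import SequenceMatcher


def compare_entities(predicted, truth, threshold=0.8):
    """Label one record by comparing extracted names with ground-truth names.

    Assumed definitions (not taken from the source):
      removal    - a ground-truth name is missing from the prediction
      addition   - a predicted name does not appear in the ground truth
      alteration - a missing name and an extra name are near-identical strings
      accurate   - none of the above
    """
    predicted, truth = set(predicted), set(truth)
    missing = truth - predicted   # names the model failed to return
    extra = predicted - truth     # names the model returned in error

    # Pair each missing name with at most one similar extra name.
    altered = {}
    for t in sorted(missing):
        for p in sorted(extra):
            if p in altered.values():
                continue
            if SequenceMatcher(None, t, p).ratio() >= threshold:
                altered[t] = p
                break
    missing -= set(altered)
    extra -= set(altered.values())

    labels = set()
    if missing:
        labels.add("removal")
    if extra:
        labels.add("addition")
    if altered:
        labels.add("alteration")
    return labels or {"accurate"}


def label_rates(records, threshold=0.8):
    """Fraction of records carrying each label; one record may carry several."""
    counts = Counter()
    for predicted, truth in records:
        counts.update(compare_entities(predicted, truth, threshold))
    return {label: n / len(records) for label, n in counts.items()}
```

Under these assumed definitions a single record can carry more than one error label, so the per-label rates need not sum to 100%. That would be consistent with the rows of the table, which sum to slightly more than 100%, but the source does not confirm this reading.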