. 2024 Jul 31;3(9):1822–1831. doi: 10.1039/d4dd00091a

Compound name recognition results for the fine-tuned model, ChemDataExtractor, and the MatSciBert model on the USPTO-ORD-100K test set. In this task, a set of compound names (entities) is extracted from unstructured text and then evaluated against the ground truth.

Model              Accurate  Removal  Addition  Alteration  Total
Fine-tuned         94.9%     4.1%     2.2%      1.0%        78 408
ChemDataExtractor  76.1%     16.0%    22.7%     8.0%
MatSciBert         96.6%     2.2%     2.4%      1.2%
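
A set-level comparison of this kind could be scored roughly as in the sketch below. The category definitions are assumptions read off the column names, not the paper's stated procedure: "accurate" is taken to mean an exact match, "alteration" a near match, "removal" a ground-truth name with no counterpart, and "addition" an extracted name with no counterpart. In each row of the table, Accurate, Removal, and Alteration sum to roughly 100%, which suggests that all rates are expressed relative to the number of ground-truth entities, so the sketch does the same. The function name `compare_entities`, the `similarity` threshold, and the use of `difflib` string similarity to detect alterations are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def compare_entities(predicted, truth, similarity=0.8):
    """Compare extracted compound names against ground-truth names.

    Assumed (hypothetical) category definitions:
      accurate   - ground-truth name extracted exactly
      alteration - ground-truth name paired with a similar, non-identical prediction
      removal    - ground-truth name with no exact or similar prediction
      addition   - predicted name with no exact or similar ground-truth name
    """
    pred = set(predicted)
    gold = set(truth)

    accurate = pred & gold
    unmatched_pred = pred - gold
    unmatched_gold = gold - pred

    # Greedily pair each unmatched ground-truth name with its most similar
    # unused prediction, provided the similarity reaches the threshold.
    altered = {}
    used = set()
    for g in sorted(unmatched_gold):
        best, best_score = None, similarity
        for p in sorted(unmatched_pred - used):
            score = SequenceMatcher(None, g, p).ratio()
            if score >= best_score:
                best, best_score = p, score
        if best is not None:
            altered[g] = best
            used.add(best)

    removed = unmatched_gold - set(altered)
    added = unmatched_pred - used

    counts = {
        "accurate": len(accurate),
        "removal": len(removed),
        "addition": len(added),
        "alteration": len(altered),
        "total": len(gold),
    }
    # Rates are relative to the number of ground-truth entities (an assumption).
    rates = {
        key: (value / len(gold) if gold else 0.0)
        for key, value in counts.items()
        if key != "total"
    }
    return counts, rates


if __name__ == "__main__":
    extracted = ["toluene", "sodium hydride", "ethyl acetat", "water"]
    reference = ["toluene", "sodium hydride", "ethyl acetate", "THF"]
    counts, rates = compare_entities(extracted, reference)
    print(counts)
    print(rates)
```

Under these assumptions the addition rate is not bounded by the other three categories, which would be consistent with the ChemDataExtractor row, where the four percentages together exceed 100%.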