2024 Jul 31;3(9):1822–1831. doi: 10.1039/d4dd00091a

Evaluation results at the leaf field level (Evaluation Metric 2) for structured records extracted using the fine-tuned LLaMA-7B model. For each record in the test set of USPTO-ORD-100K, an ORD-formatted JSON record is extracted from the unstructured text and evaluated against the ground truth using Evaluation Metric 2. * These fields do not belong to any of the five field types (identifiers, amount, reaction role, condition, workup). In this dataset, all of them are leaf fields of ProductCompound, including texture, isolated_color, and yield-related measurements.

| Message type | Field type | Accurate | Removal | Addition | Alteration | Total |
|---|---|---|---|---|---|---|
| ProductCompound & Compound | Identifiers | 100 958 (93.5%) | 5490 (5.1%) | 2590 (2.4%) | 1566 (1.5%) | 108 014 |
| ProductCompound & Compound | Amount | 74 209 (95.2%) | 3434 (4.4%) | 2182 (2.8%) | 300 (0.4%) | 77 943 |
| ProductCompound & Compound | Reaction role | 48 262 (89.3%) | 2797 (5.2%) | 1264 (2.3%) | 2978 (5.5%) | 54 037 |
| ReactionConditions | Condition | 26 782 (98.3%) | 298 (1.1%) | 391 (1.4%) | 176 (0.7%) | 27 256 |
| ReactionWorkup | Workup | 178 733 (94.0%) | 8360 (4.4%) | 10 189 (5.4%) | 3156 (1.7%) | 190 249 |
| | Other* | 31 794 (84.8%) | 5261 (14.0%) | 2240 (6.0%) | 439 (1.2%) | 37 494 |
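
The relationship between the columns is not spelled out in the caption, so the following is only a minimal sketch under an inferred assumption: in every row the Accurate, Removal, and Alteration counts sum to Total (Addition is not part of that sum), and each percentage appears to be the count divided by Total. The names `ROWS`, `shares`, and `total_is_consistent` are hypothetical helpers for illustration, not part of the paper's evaluation code. Recomputed shares agree with the printed ones to within about 0.1 percentage point.

```python
# Counts copied from the table above, in column order:
# (accurate, removal, addition, alteration, total).
# The dictionary layout and all names here are illustrative only.
ROWS = {
    "Identifiers":   (100958, 5490, 2590, 1566, 108014),
    "Amount":        (74209, 3434, 2182, 300, 77943),
    "Reaction role": (48262, 2797, 1264, 2978, 54037),
    "Condition":     (26782, 298, 391, 176, 27256),
    "Workup":        (178733, 8360, 10189, 3156, 190249),
    "Other":         (31794, 5261, 2240, 439, 37494),
}


def shares(accurate, removal, addition, alteration, total):
    """Return each count as a percentage of the row total."""
    counts = (accurate, removal, addition, alteration)
    return tuple(100.0 * count / total for count in counts)


def total_is_consistent(accurate, removal, addition, alteration, total):
    """Check the inferred identity: Total = Accurate + Removal + Alteration.

    Assumption drawn from the numbers themselves: added fields have no
    counterpart in the ground truth, so they are not counted in Total.
    """
    return accurate + removal + alteration == total


if __name__ == "__main__":
    for name, row in ROWS.items():
        formatted = ", ".join(f"{p:.1f}%" for p in shares(*row))
        print(f"{name}: {formatted} | total consistent: {total_is_consistent(*row)}")
```

Read this way, Accurate, Removal, and Alteration partition the ground-truth leaf fields, while Addition counts extra fields present only in the extracted record, which is why the four percentages in a row can sum to more than 100%.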