. 2023 Aug 19;14:5045. doi: 10.1038/s41467-023-40782-0

Table 3.

Benchmark results for datasets without added distortions: catastrophic and severe failure rates of each model/tool on each dataset

	JPO		CLEF		USPTO		UOB		USPTO Big		Indigo		Img2Mol Test		DECIMER-Hand drawn		DECIMER-Test non-augmented
	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3	T_E	T_<=0.3
OSRA	14%	19%	4%	4%	2%	2%	2%	2%	8%	92%	25%	42%	34%	63%	49%	73%	43%	58%
MolVec	6%	8%	3%	3%	2%	2%	2%	2%	21%	45%	28%	35%	36%	55%	34%	62%	37%	47%
Imago	23%	25%	7%	7%	3%	3%	6%	7%	19%	98%	23%	92%	27%	91%	57%	67%	35%	79%
Img2Mol	2%	7%	3%	3%	3%	3%	1%	1%	1%	2%	1%	2%	0.29%	0.32%	2%	26%	4%	4%
SwinOCSR	6%	9%	5%	6%	2%	3%	0.21%	0.33%	3%	6%	5%	8%	8%	12%	3%	12%	11%	28%
MolScribe	1%	2%	3%	3%	0.37%	0.4%	0.02%	0.02%	0.22%	0.23%	1%	1%	1%	2%	5%	17%	2%	3%
DECIMER	3%	3%	2%	2%	1%	1%	0%	0%	0.25%	0.45%	0.20%	0.21%	2%	3%	5%	17%	4%	4%

T_E: percentage of predictions that are invalid or have a Tanimoto similarity of zero (catastrophic failure). T_<=0.3: percentage of predictions with a Tanimoto similarity less than or equal to 0.3 (severe failure).

The best result for each metric on each dataset is marked in bold.
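
The two failure metrics defined in the table footnote can be illustrated with a minimal sketch. It assumes the per-image Tanimoto similarities have already been computed and that an invalid prediction is represented as None. The function name failure_rates and the sample scores are hypothetical and are not taken from the paper. The sketch also assumes that invalid predictions count toward the severe failure rate, which is consistent with T_<=0.3 never being lower than T_E in the table but is not stated explicitly.

```python
from typing import Optional, Sequence, Tuple


def failure_rates(
    similarities: Sequence[Optional[float]],
    threshold: float = 0.3,
) -> Tuple[float, float]:
    """Return (T_E, T_<=threshold) as percentages of all predictions.

    Each entry is the Tanimoto similarity between a predicted and a
    reference structure, or None for an invalid prediction (assumed encoding).
    """
    if not similarities:
        raise ValueError("no predictions given")
    n = len(similarities)
    # Catastrophic failure: invalid prediction or similarity of exactly zero.
    catastrophic = sum(1 for s in similarities if s is None or s == 0.0)
    # Severe failure: similarity at or below the threshold.
    # Assumption: invalid predictions are counted here as well.
    severe = sum(1 for s in similarities if s is None or s <= threshold)
    return 100.0 * catastrophic / n, 100.0 * severe / n


# Made-up scores for ten predictions, purely for illustration.
scores = [1.0, 0.0, None, 0.25, 0.3, 0.8, 1.0, 0.95, 0.31, 1.0]
t_e, t_le = failure_rates(scores)
print(f"T_E = {t_e:.0f}%, T_<=0.3 = {t_le:.0f}%")  # T_E = 20%, T_<=0.3 = 40%
```

Because every catastrophic failure also satisfies the threshold condition under these assumptions, the severe failure rate is always at least as large as the catastrophic one.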