Table 4.
Benchmark results for datasets with added distortions, such as mild shearing and rotation: catastrophic and severe failure rates of each model/tool on each dataset.
B. Benchmark results for datasets with distortions

| Model/tool | JPO (dist) TE | JPO (dist) T<=0.3 | CLEF (dist) TE | CLEF (dist) T<=0.3 | USPTO (dist) TE | USPTO (dist) T<=0.3 | UOB (dist) TE | UOB (dist) T<=0.3 | USPTO_big (dist) TE | USPTO_big (dist) T<=0.3 | Indigo (dist) TE | Indigo (dist) T<=0.3 | DECIMER-Test augmented TE | DECIMER-Test augmented T<=0.3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OSRA | 18% | 23% | 19% | 20% | 25% | 26% | 4% | 5% | 11% | 97% | 25% | 62% | 62% | 81% |
| MolVec | 10% | 12% | 12% | 13% | 15% | 16% | 3% | 3% | 5% | 92% | 30% | 50% | 56% | 67% |
| Imago | 42% | 46% | 28% | 29% | 16% | 16% | 27% | 29% | 9% | 99% | 23% | 95% | 42% | 92% |
| Img2Mol | 3% | 7% | 3% | 4% | 3% | 3% | 1% | 1% | 1% | 6% | 1% | 3% | 4% | 8% |
| SwinOCSR | 5% | 10% | 5% | 6% | 2% | 3% | 0.14% | 0.23% | 7% | 28% | 7% | 17% | 29% | 47% |
| MolScribe | **0.44%** | **1%** | 3% | 3% | **0.39%** | **0.43%** | **0%** | **0%** | **0.23%** | **0.27%** | 1% | 1% | 20% | 25% |
| DECIMER | 3% | 4% | **2%** | **2%** | 1% | 1% | **0%** | **0%** | 0.39% | 0.74% | **0.16%** | **0.19%** | **3%** | **3%** |
TE: Percentage of predictions with Tanimoto similarity values of zero and invalid predictions (catastrophic failure). T<=0.3: The percentage of predictions with Tanimoto similarity less than or equal to 0.3 (severe failure).
The best result for each metric on each dataset is marked in bold.
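
The two failure rates defined in the table notes can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `failure_rates`, the use of `None` to mark an invalid prediction, and the choice to count invalid predictions toward the severe-failure rate as well (treating them as similarity 0) are all assumptions made for this example.

```python
from typing import Dict, Optional, Sequence


def failure_rates(
    similarities: Sequence[Optional[float]],
    severe_threshold: float = 0.3,
) -> Dict[str, float]:
    """Return catastrophic and severe failure rates as percentages.

    `similarities` holds one Tanimoto similarity per prediction.
    None marks an invalid prediction (an assumption of this sketch).
    """
    if not similarities:
        raise ValueError("need at least one prediction")
    n = len(similarities)

    # Catastrophic failure (TE): invalid prediction or similarity of zero.
    catastrophic = sum(1 for s in similarities if s is None or s == 0.0)

    # Severe failure (T<=0.3): similarity at or below the threshold.
    # Assumption: invalid predictions are counted here too, as if they
    # had similarity 0.
    severe = sum(
        1 for s in similarities if s is None or s <= severe_threshold
    )

    return {
        "catastrophic_pct": 100.0 * catastrophic / n,
        "severe_pct": 100.0 * severe / n,
    }


# Hypothetical similarity values for ten predictions (not from the paper).
example = [1.0, 0.0, None, 0.25, 0.3, 0.8, 1.0, 0.95, 0.31, 1.0]
rates = failure_rates(example)
print(rates)
```

Under these assumptions every catastrophic failure is also a severe failure, which matches the table, where the T<=0.3 value is never lower than the TE value for the same model and dataset.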