Table 3.
Benchmark results for datasets without added distortions: catastrophic and severe failure rates of each model/tool on each dataset.
Model | JPO TE | JPO T<=0.3 | CLEF TE | CLEF T<=0.3 | USPTO TE | USPTO T<=0.3 | UOB TE | UOB T<=0.3 | USPTO Big TE | USPTO Big T<=0.3 | Indigo TE | Indigo T<=0.3 | Img2Mol Test TE | Img2Mol Test T<=0.3 | DECIMER-Hand drawn TE | DECIMER-Hand drawn T<=0.3 | DECIMER-Test non-augmented TE | DECIMER-Test non-augmented T<=0.3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
OSRA | 14% | 19% | 4% | 4% | 2% | 2% | 2% | 2% | 8% | 92% | 25% | 42% | 34% | 63% | 49% | 73% | 43% | 58% |
MolVec | 6% | 8% | 3% | 3% | 2% | 2% | 2% | 2% | 21% | 45% | 28% | 35% | 36% | 55% | 34% | 62% | 37% | 47% |
Imago | 23% | 25% | 7% | 7% | 3% | 3% | 6% | 7% | 19% | 98% | 23% | 92% | 27% | 91% | 57% | 67% | 35% | 79% |
Img2Mol | 2% | 7% | 3% | 3% | 3% | 3% | 1% | 1% | 1% | 2% | 1% | 2% | 0.29% | 0.32% | 2% | 26% | 4% | 4% |
SwinOCSR | 6% | 9% | 5% | 6% | 2% | 3% | 0.21% | 0.33% | 3% | 6% | 5% | 8% | 8% | 12% | 3% | 12% | 11% | 28% |
MolScribe | 1% | 2% | 3% | 3% | 0.37% | 0.4% | 0.02% | 0.02% | 0.22% | 0.23% | 1% | 1% | 1% | 2% | 5% | 17% | 2% | 3% |
DECIMER | 3% | 3% | 2% | 2% | 1% | 1% | 0% | 0% | 0.25% | 0.45% | 0.20% | 0.21% | 2% | 3% | 5% | 17% | 4% | 4% |
TE: Percentage of predictions that are invalid or have a Tanimoto similarity of zero (catastrophic failure). T<=0.3: Percentage of predictions with a Tanimoto similarity less than or equal to 0.3 (severe failure).
The best result for each metric on each dataset is marked in bold.
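
The two failure rates defined in the table notes can be illustrated with a minimal sketch. It assumes that per-molecule Tanimoto similarities are already available as a list and that an invalid prediction is encoded as `None` and treated as a similarity of zero, so it counts toward both rates. The function name `failure_rates` and this encoding are hypothetical and not taken from the benchmark code.

```python
from typing import Optional, Sequence, Tuple


def failure_rates(
    similarities: Sequence[Optional[float]],
    severe_threshold: float = 0.3,
) -> Tuple[float, float]:
    """Return (TE, T<=threshold) as percentages.

    Assumption: None marks an invalid prediction and is scored as 0.0,
    so it counts as both a catastrophic and a severe failure.
    """
    if not similarities:
        raise ValueError("similarities must not be empty")

    # Map invalid predictions to a similarity of zero.
    scores = [0.0 if s is None else s for s in similarities]

    # Catastrophic failure: similarity of exactly zero (or invalid).
    catastrophic = sum(1 for s in scores if s == 0.0)
    # Severe failure: similarity at or below the threshold.
    severe = sum(1 for s in scores if s <= severe_threshold)

    n = len(scores)
    return 100.0 * catastrophic / n, 100.0 * severe / n


# Five predictions: one zero-similarity, one invalid, one at 0.25.
te, t_severe = failure_rates([1.0, 0.0, None, 0.25, 0.8])
print(te, t_severe)  # 40.0 60.0
```

Under this convention the severe failure rate can never be smaller than the catastrophic one, which matches every column pair in the table.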