Accuracy and Tanimoto similarity are reported in %. Best results are in bold. Benchmark: performance on the benchmark datasets described in Section 3.2.2. Depiction: results for the molecular optical recognition task for different cheminformatics depiction libraries, using a random subset of 5,000 compounds from the Img2Mol test set, each depicted five times (with the previously mentioned augmentations) by each of the three libraries.
|  | Img2Mol |  | MolVec 0.9.8 |  | Imago 2.0 |  | OSRA 2.1 |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Accuracy | Tanimoto | Accuracy | Tanimoto | Accuracy | Tanimoto | Accuracy | Tanimoto |
| **Benchmark** |  |  |  |  |  |  |  |  |
| Img2Mol | **88.25** | **95.27** | 2.59 | 13.03 | 0.02 | 4.74 | 2.59 | 13.03 |
| STAKER | **64.33** | **83.76** | 5.32 | 31.78 | 0.07 | 5.06 | 5.23 | 26.98 |
| USPTO | **42.29** | **73.07** | 30.68 | 65.50 | 5.07 | 7.28 | 6.37 | 44.21 |
| UoB | **78.18** | **88.51** | 75.01 | 86.88 | 5.12 | 7.19 | 70.89 | 85.27 |
| CLEF | **48.84** | **78.04** | 44.48 | 76.61 | 26.72 | 41.29 | 17.04 | 58.84 |
| JPO | 45.14 | **69.43** | **49.48** | 66.46 | 23.18 | 37.47 | 33.04 | 49.62 |
| **Depiction** |  |  |  |  |  |  |  |  |
| RDKit | **93.4 ± 0.2** | **97.4 ± 0.1** | 3.7 ± 0.3 | 24.7 ± 0.1 | 0.3 ± 0.1 | 17.9 ± 0.3 | 4.4 ± 0.4 | 17.5 ± 0.5 |
| OE | **89.5 ± 0.2** | **95.8 ± 0.1** | 33.4 ± 0.4 | 57.4 ± 0.3 | 12.3 ± 0.2 | 32.0 ± 0.2 | 26.3 ± 0.4 | 50.0 ± 0.4 |
| Indigo | **79.0 ± 0.3** | **91.5 ± 0.1** | 22.2 ± 0.5 | 37.0 ± 0.5 | 4.2 ± 0.2 | 19.7 ± 0.2 | 22.6 ± 0.2 | 41.0 ± 0.2 |
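The Tanimoto similarity reported above is the Jaccard coefficient over molecular fingerprint bits; in practice it is typically computed on hashed circular fingerprints (e.g. Morgan/ECFP) via a cheminformatics toolkit such as RDKit. A minimal sketch of the metric itself, using plain Python sets of on-bit indices as a stand-in for real fingerprints (the bit values below are hypothetical):

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy on-bit sets standing in for a predicted vs. ground-truth fingerprint.
pred = frozenset({1, 4, 7, 9})
truth = frozenset({1, 4, 7, 12})
print(round(tanimoto(pred, truth) * 100, 2))  # similarity in %, as in the table → 60.0
```

With real molecules, the same ratio is taken over the on-bits of the fingerprints of the predicted and reference structures, so a value of 100% corresponds to fingerprint-identical molecules.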