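Edit distance between product and reactant SMILES with and without root alignment. All values except data size are averages. Dataset×m: dataset with m-fold data augmentation. Pro.: product SMILES. Rea.: reactant SMILES. Percentages give the change in edit distance relative to the w/o column.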
Dataset | Data size | Length (Pro.) | Length (Rea.) | Edit distance (w/o) | Edit distance (w/) |
---|---|---|---|---|---|
USPTO-50K×1 | 50 016 | 43.4 | 47.4 | 17.9 | 14.1 (−21%) |
USPTO-50K×5 | 250 060 | 45.1 | 49.6 | 28.3 | 14.1 (−50%) |
USPTO-50K×10 | 500 160 | 45.3 | 49.9 | 30.0 | 14.1 (−53%) |
USPTO-50K×20 | 1 000 240 | 45.4 | 50.0 | 30.2 | 14.1 (−53%) |
USPTO-MIT×1 | 482 132 | 40.6 | 46.1 | 17.0 | 13.5 (−21%) |
USPTO-MIT×5 | 2 410 660 | 41.6 | 47.0 | 26.7 | 13.5 (−49%) |
USPTO-FULL×1 | 960 198 | 41.4 | 48.1 | 19.8 | 16.6 (−16%) |
USPTO-FULL×5 | 4 800 990 | 43.1 | 50.4 | 29.2 | 16.6 (−43%) |
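The two edit-distance columns and the percentage in parentheses can be reproduced in outline with the sketch below. It assumes plain character-level Levenshtein distance between a product SMILES and its reactant SMILES. The table does not say whether distances are computed over characters or SMILES tokens, and the function names (`edit_distance`, `mean_edit_distance`, `relative_change`) are hypothetical. The example pair uses two valid SMILES for ethanol, `CCO` and `OCC`, which differ only in the atom chosen as the root. It illustrates why aligning roots can shrink the distance even when the molecule is unchanged.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b.

    Assumption: operates on raw characters, not on SMILES tokens.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average edit distance over (product, reactant) SMILES pairs."""
    pairs = list(pairs)
    return sum(edit_distance(p, r) for p, r in pairs) / len(pairs)


def relative_change(without: float, with_: float) -> int:
    """Percent change of the average edit distance, rounded to an integer."""
    return round(100 * (with_ - without) / without)


if __name__ == "__main__":
    # Ethanol written from two different root atoms vs. the same root atom.
    print(edit_distance("CCO", "OCC"))  # 2
    print(edit_distance("CCO", "CCO"))  # 0
    # Percentage for the USPTO-50K×1 row from its two table averages.
    print(relative_change(17.9, 14.1))  # -21
```

Applied to the averages in the table, `relative_change` recovers every parenthesized percentage, for example −50% for USPTO-50K×5 and −43% for USPTO-FULL×5.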