2022 Jul 12;13(31):9023–9034. doi: 10.1039/d2sc02763a

Edit distance with and without root alignment. Except for the data size, all values are averages. Dataset×m: m-fold data augmentation. Pro.: product SMILES. Rea.: reactant SMILES. w/o: without root alignment. w/: with root alignment.

| Dataset | Data size | Length (Pro.) | Length (Rea.) | Edit distance (w/o) | Edit distance (w/) |
|---|---|---|---|---|---|
| USPTO-50K×1 | 50 016 | 43.4 | 47.4 | 17.9 | 14.1 (−21%) |
| USPTO-50K×5 | 250 060 | 45.1 | 49.6 | 28.3 | 14.1 (−50%) |
| USPTO-50K×10 | 500 160 | 45.3 | 49.9 | 30.0 | 14.1 (−53%) |
| USPTO-50K×20 | 1 000 240 | 45.4 | 50.0 | 30.2 | 14.1 (−53%) |
| USPTO-MIT×1 | 482 132 | 40.6 | 46.1 | 17.0 | 13.5 (−21%) |
| USPTO-MIT×5 | 2 410 660 | 41.6 | 47.0 | 26.7 | 13.5 (−49%) |
| USPTO-FULL×1 | 960 198 | 41.4 | 48.1 | 19.8 | 16.6 (−16%) |
| USPTO-FULL×5 | 4 800 990 | 43.1 | 50.4 | 29.2 | 16.6 (−43%) |
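
The percentages in parentheses appear to be the change in average edit distance with root alignment relative to the value without it. The caption does not state this formula; it is an inference that happens to reproduce every row when rounded to the nearest integer. A minimal sketch of that arithmetic, with the hypothetical helper name `relative_change` and the values copied from the table:

```python
# Average edit distances (without, with root alignment), copied from the table.
# Assumption: the bracketed percentage is (with - without) / without,
# rounded to the nearest integer. This is inferred, not stated in the caption.
EDIT_DISTANCES = {
    "USPTO-50K×1": (17.9, 14.1),
    "USPTO-50K×5": (28.3, 14.1),
    "USPTO-50K×10": (30.0, 14.1),
    "USPTO-50K×20": (30.2, 14.1),
    "USPTO-MIT×1": (17.0, 13.5),
    "USPTO-MIT×5": (26.7, 13.5),
    "USPTO-FULL×1": (19.8, 16.6),
    "USPTO-FULL×5": (29.2, 16.6),
}


def relative_change(without: float, with_alignment: float) -> int:
    """Percent change of the aligned distance relative to the unaligned one."""
    return round(100 * (with_alignment - without) / without)


if __name__ == "__main__":
    for dataset, (without, with_alignment) in EDIT_DISTANCES.items():
        print(f"{dataset}: {relative_change(without, with_alignment)}%")
```

Read this way, the aligned distance stays constant within each dataset as augmentation increases, while the unaligned distance grows, so the relative reduction widens from about 21% to about 53% on USPTO-50K.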