Evaluation of single-step retrosynthetic models. The test data set consisted of 10k entries. For every reaction we generated 10 predictions, which resulted in 100k precursor suggestions. The table reports the round-trip accuracy (RT), the coverage (Cov.), the class diversity (CD), the inverse of the Jensen–Shannon divergence of the class likelihood distributions (1/JSD), the percentage of invalid SMILES (ismi) and the human expert evaluation (hu. ev.). Models with the “_i” suffix were trained on an inchified data set. Models starting with “ste” were trained on the stereo data set and those starting with “pist” on the pistachio data set.
Retro model | Forw. model | Test data | RT [%] | Cov. [%] | CD | 1/JSD | ismi [%] | hu. ev. |
---|---|---|---|---|---|---|---|---|
ste_i | pist_i | ste | 81.2 | 95.1 | 1.8 | 16.5 | 0.5 | − |
ste_i | pist_i | pist | 79.1 | 93.8 | 1.8 | 20.6 | 1.1 | − |
pist_i | pist_i | pist | 74.9 | 95.3 | 2.1 | 22.0 | 0.5 | + |
pist | pist_i | pist | 71.1 | 92.6 | 2.1 | 27.2 | 0.6 | ++ |
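
The metrics in the table can be illustrated with a minimal sketch on toy data. The definitions used here are assumptions, since the caption does not spell them out: round-trip accuracy is taken as the share of suggestions from which the forward model recovers the target, coverage as the share of targets with at least one such suggestion, class diversity as the mean number of distinct reaction classes among those suggestions per target, and the Jensen–Shannon divergence in its generalized, uniformly weighted form (entropy of the mean distribution minus the mean entropy). The names `records`, `summarize`, `entropy` and `jensen_shannon` are hypothetical and do not come from the original work.

```python
import math
from collections import defaultdict

# Hypothetical toy records, one per precursor suggestion:
# (target id, did the forward model recover the target?, predicted reaction class)
records = [
    ("T1", True, "acylation"),
    ("T1", True, "reduction"),
    ("T2", True, "acylation"),
    ("T2", False, "acylation"),
    ("T3", False, "oxidation"),
    ("T3", False, "oxidation"),
    ("T4", True, "reduction"),
    ("T4", False, "oxidation"),
]


def summarize(records):
    """Return (round-trip accuracy %, coverage %, class diversity).

    All three follow the assumed definitions given in the lead-in.
    """
    by_target = defaultdict(list)
    for target, recovered, rxn_class in records:
        by_target[target].append((recovered, rxn_class))

    n_targets = len(by_target)
    # Share of all suggestions that pass the round trip.
    round_trip = 100.0 * sum(rec for _, rec, _ in records) / len(records)
    # Share of targets with at least one suggestion that passes the round trip.
    coverage = 100.0 * sum(
        any(rec for rec, _ in recs) for recs in by_target.values()
    ) / n_targets
    # Mean number of distinct classes among round-trip-valid suggestions.
    diversity = sum(
        len({cls for rec, cls in recs if rec}) for recs in by_target.values()
    ) / n_targets
    return round_trip, coverage, diversity


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def jensen_shannon(distributions):
    """Generalized JSD with uniform weights: H(mean of P_i) - mean of H(P_i)."""
    n = len(distributions)
    mixture = [sum(column) / n for column in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / n


rt, cov, cd = summarize(records)
print(f"RT = {rt:.1f} %, Cov. = {cov:.1f} %, CD = {cd:.1f}")

# Two fully disjoint class distributions give the maximum JSD for two inputs, ln 2.
jsd = jensen_shannon([[1.0, 0.0], [0.0, 1.0]])
print(f"1/JSD = {1.0 / jsd:.3f}")
```

Under these assumptions, a larger 1/JSD means the class likelihood distributions are more similar to one another. The inverse is undefined when all distributions are identical (JSD = 0), so a real evaluation would need to handle that case explicitly.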