Evaluation of single-step retrosynthetic models. The test data set consisted of 10k reactions; for every reaction we generated 10 predictions, giving 100k precursor suggestions in total. The table reports the round-trip accuracy (RT), coverage (Cov.), class diversity (CD), the inverse of the Jensen–Shannon divergence of the class likelihood distributions (1/JSD), the percentage of invalid SMILES (ismi) and the human expert evaluation (hu. ev.). Models with the “_i” suffix were trained on an inchified data set. Models starting with “ste” were trained on the stereo data set and those starting with “pist” on the pistachio data set.
| Retro model | Forw. model | Test data | RT [%] | Cov. [%] | CD | 1/JSD | ismi [%] | hu. ev. |
|---|---|---|---|---|---|---|---|---|
| ste_i | pist_i | ste | 81.2 | 95.1 | 1.8 | 16.5 | 0.5 | − |
| ste_i | pist_i | pist | 79.1 | 93.8 | 1.8 | 20.6 | 1.1 | − |
| pist_i | pist_i | pist | 74.9 | 95.3 | 2.1 | 22.0 | 0.5 | + |
| pist | pist_i | pist | 71.1 | 92.6 | 2.1 | 27.2 | 0.6 | ++ |
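
The caption names the metrics without defining them, so the following is only a minimal sketch of how three of them could be computed from per-target suggestions. It assumes that round-trip accuracy is the share of precursor suggestions from which a forward model recovers the target, that coverage is the share of targets with at least one such suggestion, and that class diversity is the mean number of distinct reaction classes among the recovered suggestions per target. The function name `evaluate`, the input layout and the toy data are hypothetical and not taken from the source.

```python
def evaluate(suggestions):
    """Compute RT, Cov. and CD under the assumptions stated above.

    `suggestions` maps each target to a list of (reaction_class, recovered)
    pairs, one pair per precursor suggestion. `recovered` is True when the
    forward model predicts the target back from the suggested precursors.
    """
    n_targets = len(suggestions)
    n_total = sum(len(s) for s in suggestions.values())
    # Suggestions that pass the round trip.
    n_recovered = sum(ok for s in suggestions.values() for _, ok in s)
    # Targets with at least one suggestion that passes the round trip.
    n_covered = sum(any(ok for _, ok in s) for s in suggestions.values())
    # Distinct reaction classes among the recovered suggestions, per target.
    n_classes = sum(len({c for c, ok in s if ok}) for s in suggestions.values())
    return {
        "round_trip": 100.0 * n_recovered / n_total,   # RT [%]
        "coverage": 100.0 * n_covered / n_targets,     # Cov. [%]
        "class_diversity": n_classes / n_targets,      # CD
    }


# Toy data (hypothetical): two targets, five suggestions in total.
toy = {
    "target_A": [("class_1", True), ("class_2", True), ("class_2", False)],
    "target_B": [("class_1", False), ("class_1", False)],
}
result = evaluate(toy)
print(result)
```

On the toy data, two of five suggestions pass the round trip and only one of the two targets is covered, which illustrates how a model can score differently on RT and Cov.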