2020 Mar 3;11(12):3316–3325. doi: 10.1039/c9sc05704h

Evaluation of single-step retrosynthetic models. The test data set consisted of 10k entries. For every reaction we generated 10 predictions, giving 100k precursor suggestions in total. The table reports the round-trip accuracy (RT), the coverage (Cov.), the class diversity (CD), the inverse of the Jensen–Shannon divergence of the class likelihood distributions (1/JSD), the percentage of invalid SMILES (ismi) and the human expert evaluation (hu. ev.). Models with the “_i” suffix were trained on an inchified data set. Models whose names start with “ste” were trained on the stereo data set and those whose names start with “pist” on the pistachio data set.

Retro model	Forw. model	Test data	RT [%]	Cov. [%]	CD	1/JSD	ismi [%]	hu. ev.
ste_i	pist_i	ste	81.2	95.1	1.8	16.5	0.5	−
ste_i	pist_i	pist	79.1	93.8	1.8	20.6	1.1	−
pist_i	pist_i	pist	74.9	95.3	2.1	22.0	0.5	+
pist	pist_i	pist	71.1	92.6	2.1	27.2	0.6	++
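
The first three metrics in the table can be illustrated with a minimal sketch. It is not the authors' evaluation code, and the definitions it uses are assumptions: a suggestion counts as round-trip valid when the forward model maps the suggested precursors back to the target, coverage is the share of targets with at least one valid suggestion, and class diversity is the mean number of distinct reaction classes among the valid suggestions per target. The function name `evaluate`, the tuple layout and the toy records are hypothetical, and SMILES strings are assumed to be canonicalized already so that plain string equality is meaningful.

```python
from collections import defaultdict


def evaluate(suggestions):
    """Compute RT [%], Cov. [%] and CD from precursor suggestions.

    `suggestions` is an iterable of (target, forward_product, reaction_class)
    tuples, one per precursor suggestion (hypothetical layout). The
    forward_product is what a forward model predicts from the suggested
    precursors.
    """
    total = 0
    valid = 0
    targets = set()
    valid_classes = defaultdict(set)  # target -> classes of valid suggestions

    for target, forward_product, reaction_class in suggestions:
        total += 1
        targets.add(target)
        # Assumed round-trip criterion: forward prediction recovers the target.
        if forward_product == target:
            valid += 1
            valid_classes[target].add(reaction_class)

    round_trip = 100.0 * valid / total
    coverage = 100.0 * len(valid_classes) / len(targets)
    # Assumed averaging: over all targets, uncovered targets contribute zero.
    class_diversity = sum(len(c) for c in valid_classes.values()) / len(targets)
    return round_trip, coverage, class_diversity


# Toy records with placeholder molecule names instead of real SMILES.
toy = [
    ("A", "A", "acylation"),
    ("A", "A", "reduction"),
    ("A", "X", "acylation"),
    ("B", "B", "acylation"),
    ("B", "B", "acylation"),
    ("C", "Y", "oxidation"),
]
rt, cov, cd = evaluate(toy)
print(f"RT = {rt:.1f} %, Cov. = {cov:.1f} %, CD = {cd:.1f}")
```

In the toy data four of six suggestions pass the round trip and two of three targets are covered, so a model can score well on coverage while still producing many invalid suggestions; this is why the table reports both columns.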