2023 Nov 23;16(2):239–248. doi: 10.1038/s41557-023-01360-5

Table 1.

Model performance of the GNNs

	Reaction yield r value	Reaction yield m.a.e. (%)	Binary reaction outcome (random split), AUC (%)	Binary reaction outcome (substrate split), AUC (%)
GTNN2D	0.896 ± 0.006	4.53 ± 0.09	91.8 ± 2.1	52 ± 2
GNN2D	0.866 ± 0.005	5.61 ± 0.06	87.5 ± 1.0	51 ± 2
GTNN3D	0.884 ± 0.01	4.51 ± 0.11	91.4 ± 0.7	58 ± 4
GNN3D	0.877 ± 0.001	5.33 ± 0.34	89.4 ± 0.8	65 ± 5
GTNN2DQM	0.898 ± 0.003	4.41 ± 0.17	90.9 ± 1.5	53 ± 5
GNN2DQM	0.876 ± 0.01	5.41 ± 0.10	89.0 ± 1.1	59 ± 5
GTNN3DQM	0.890 ± 0.01	4.23 ± 0.08	91.8 ± 0.9	67 ± 2
GNN3DQM	0.890 ± 0.006	4.88 ± 0.24	89.1 ± 0.9	64 ± 4
ECFP4NN	0.885 ± 0.0006	4.55 ± 0.14	89.3 ± 1.3	52 ± 3
	F-score (%)	PPV (%)	TPR (%)	Accuracy (%)
aGNN2D	38 ± 5	56 ± 1	30 ± 6	88 ± 1
aGNN2DQM	39 ± 2	54 ± 2	30 ± 3	88 ± 0.3
aGNN3D	59 ± 3	62 ± 2	56 ± 4	90 ± 1
aGNN3DQM	60 ± 4	62 ± 2	59 ± 6	90 ± 1

The top of the table shows the performance of the nine investigated neural networks in predicting binary reaction outcomes and reaction yields. The Pearson correlation coefficient (r) and the mean absolute error (m.a.e.) were used to quantify reaction yield predictions. Balanced accuracy (AUC) was used to quantify binary reaction outcome predictions. The bottom of the table shows the performance of the four different aGNNs for regioselectivity prediction in terms of F-score, positive predictive value (PPV), true-positive rate (TPR) and accuracy. The numbers represent the mean and standard deviation for N = 3 independent neural network runs. The numbers in bold indicate the best performance for each of the individual metrics.
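
For readers who want to reproduce numbers of this kind, the following is a minimal sketch of how the metrics named in the caption are conventionally computed. It is not the authors' code. The function names are hypothetical, the F-score is assumed to be the F1 score (harmonic mean of PPV and TPR), and the sample standard deviation is assumed for the "mean ± s.d." summary over runs; the source does not state either choice. AUC is left out of the sketch.

```python
import math
from statistics import mean, stdev


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted yields."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, in the same units as the yields (here %)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_metrics(labels, preds):
    """PPV, TPR, F-score and accuracy from binary labels (1 = positive)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for l, p in pairs if l == 1 and p == 1)
    fp = sum(1 for l, p in pairs if l == 0 and p == 1)
    fn = sum(1 for l, p in pairs if l == 1 and p == 0)
    tn = sum(1 for l, p in pairs if l == 0 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0  # positive predictive value
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate
    # Assumption: F-score means F1, the harmonic mean of PPV and TPR.
    f_score = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"f_score": f_score, "ppv": ppv, "tpr": tpr, "accuracy": accuracy}


def summarize_runs(values):
    """Mean and sample standard deviation over independent runs (N >= 2)."""
    # Assumption: sample (N - 1) standard deviation; the source does not say.
    return mean(values), stdev(values)
```

With three independent runs, `summarize_runs` would be called once per metric and per model, giving one "mean ± s.d." cell of the table.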