2021 Jun 14;4:90. doi: 10.1038/s42004-021-00528-9

Table 1.

Comparison of the performance of our developed DNNs and other prediction tools.

	Test set (graphs generated from)						SAMPL6 dataset		Martel dataset
	Original SMILES		Randomly selected^c		Original SMILES without ions		SAMPL6 dataset		Martel dataset
Model	rmse^d	sdev	rmse^d	sdev	rmse^d	sdev	rmse^d	sdev	rmse^d	sdev
DNN_taut^a	0.47	±0.02	0.47	±0.02	0.45	±0.02	0.33	±0.05	1.23	±0.03
DNN_mono^a	0.50	±0.02	0.80	±0.03	0.49	±0.02	0.31	±0.06	1.35	±0.02
ACD/GALAS^b	0.50	±0.03	0.65	±0.03	0.36	±0.02	0.51	±0.09	1.44	±0.04
ALOGPS^b	0.50	±0.02	0.66	±0.03	0.45	±0.02	0.45	±0.06	1.25	±0.03
COSMO-RS^b	0.97	±0.03	–	–	0.77	±0.03	0.37	±0.09	0.93	±0.03
DataWarrior^b	0.80	±0.02	0.92	±0.02	0.75	±0.02	0.60	±0.16	1.61	±0.04
JChem^b	0.72	±0.02	0.74	±0.03	0.69	±0.02	0.39	±0.08	1.23	±0.03
KOWWIN^b	0.65	±0.04	0.92	±0.04	0.51	±0.02	0.53	±0.09	1.38	±0.04
OCHEM^b	0.34	±0.02	0.65	±0.03	0.27	±0.02	0.49	±0.12	1.32	±0.03

The root mean square error (rmse) and its standard deviation (sdev) for the log P prediction are given for different SMILES inputs of the test set, for the set of 11 chemicals from the SAMPL6 challenge, and for the Martel dataset (707 chemicals). Results for each individual SMILES representation of the test set are given in Supplementary Table 2.

^aIntroduced in this work.

^bAlready existing prediction tool.

^cOnly one SMILES representation is randomly selected for each chemical (from the test set, which also includes tautomers).

^dThe mean value and sdev were estimated using bootstrapping. Random sampling with replacement was used to generate N = 1000 datasets per analyzed test set, and one rmse was calculated per dataset. If the rmse value of the original test set deviated from the mean of this rmse distribution, the mean value was reported to symmetrize the confidence intervals. The sdev was determined as the standard error of the mean. A detailed description is given in Vorberg and Tetko⁷⁴.
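
The bootstrapping procedure in footnote d can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `rmse` and `bootstrap_rmse`, the fixed random seed, and the use of the standard deviation of the bootstrap rmse distribution as the reported sdev are all assumptions made for illustration. The exact procedure is described in Vorberg and Tetko⁷⁴.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse(y_true, y_pred, n_resamples=1000, seed=0):
    """Return (mean, sdev) of the rmse over bootstrap resamples.

    Hypothetical helper: each resample draws len(y_true) pairs of
    (experimental, predicted) values with replacement, and one rmse
    is computed per resample.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Sample indices so experimental and predicted values stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    # Assumption: the spread of the bootstrap distribution is used as the sdev.
    return statistics.fmean(values), statistics.stdev(values)
```

With experimental and predicted log P values held in two lists, a call such as `bootstrap_rmse(logp_exp, logp_pred)` (both list names are placeholders) would return a mean rmse and an sdev that could be reported as one rmse/sdev pair of the kind shown in the table. Resampling index pairs rather than the two lists independently keeps each prediction matched to its own experimental value.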