. 2021 Jun 14;4:90. doi: 10.1038/s42004-021-00528-9

Table 1.

Comparison of the performance of the DNNs developed in this work with other log P prediction tools.

Each column gives rmse^d ±sdev for:
(1) test set, graphs generated from the original SMILES
(2) test set, graphs generated from one randomly selected SMILES per chemical^c
(3) test set, graphs generated from the original SMILES without ions
(4) SAMPL6 dataset
(5) Martel dataset

Model           (1)          (2)          (3)          (4)          (5)
DNNtaut^a       0.47 ±0.02   0.47 ±0.02   0.45 ±0.02   0.33 ±0.05   1.23 ±0.03
DNNmono^a       0.50 ±0.02   0.80 ±0.03   0.49 ±0.02   0.31 ±0.06   1.35 ±0.02
ACD/GALAS^b     0.50 ±0.03   0.65 ±0.03   0.36 ±0.02   0.51 ±0.09   1.44 ±0.04
ALOGPS^b        0.50 ±0.02   0.66 ±0.03   0.45 ±0.02   0.45 ±0.06   1.25 ±0.03
COSMO-RS^b      0.97 ±0.03   –            0.77 ±0.03   0.37 ±0.09   0.93 ±0.03
DataWarrior^b   0.80 ±0.02   0.92 ±0.02   0.75 ±0.02   0.60 ±0.16   1.61 ±0.04
JChem^b         0.72 ±0.02   0.74 ±0.03   0.69 ±0.02   0.39 ±0.08   1.23 ±0.03
KOWWIN^b        0.65 ±0.04   0.92 ±0.04   0.51 ±0.02   0.53 ±0.09   1.38 ±0.04
OCHEM^b         0.34 ±0.02   0.65 ±0.03   0.27 ±0.02   0.49 ±0.12   1.32 ±0.03

The root mean square error (rmse) of the log P predictions and its uncertainty (sdev) are given for the test set (for different SMILES inputs), for the 11 chemicals of the SAMPL6 challenge, and for the Martel dataset (707 chemicals). Results for each individual SMILES representation in the test set are given in Supplementary Table 2.

^a Introduced in this work.

^b Existing prediction tool.

^c Only one SMILES representation is randomly selected for each chemical (from the test set, which also includes tautomers).

^d Mean values and their uncertainty (sdev) were estimated by bootstrapping: random sampling with replacement was used to generate N = 1000 datasets per analyzed test set. If the rmse of the original test set deviated from the mean of the resulting rmse distribution (N = 1000; one rmse per dataset), the mean was reported instead to symmetrize the confidence intervals. The sdev was determined as the standard error. A detailed description is given in Vorberg and Tetko (ref. 74).
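The bootstrap procedure in footnote d can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes paired experimental and predicted log P values, computes rmse = sqrt(sum((observed - predicted)^2) / n), resamples chemicals with replacement to build N = 1000 datasets, and returns the mean and standard deviation of the resulting rmse distribution. The function names `rmse` and `bootstrap_rmse`, the fixed random seed, and treating the standard deviation of the bootstrap distribution as sdev are assumptions of this sketch; the exact procedure of Vorberg and Tetko may differ. The example values are hypothetical and are not taken from the table.

```python
import math
import random
from statistics import mean, stdev


def rmse(observed, predicted):
    """Root mean square error between paired experimental and predicted log P values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse(observed, predicted, n_datasets=1000, seed=0):
    """Bootstrap the rmse by resampling chemicals with replacement.

    Builds n_datasets resampled datasets of the same size as the input,
    computes one rmse per dataset, and returns (mean, standard deviation)
    of that rmse distribution. Using the standard deviation as sdev is an
    assumption of this sketch.
    """
    if n_datasets < 2:
        raise ValueError("need at least two bootstrap datasets")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    pairs = list(zip(observed, predicted))
    rmses = []
    for _ in range(n_datasets):
        sample = rng.choices(pairs, k=len(pairs))  # sampling with replacement
        rmses.append(rmse([o for o, _ in sample], [p for _, p in sample]))
    return mean(rmses), stdev(rmses)


if __name__ == "__main__":
    # Hypothetical log P values, for illustration only (not data from Table 1).
    obs = [1.2, 2.5, 3.1, 0.4, -0.8, 4.0, 2.2, 1.7]
    pred = [1.0, 2.9, 2.8, 0.9, -0.5, 3.6, 2.0, 2.1]
    print(f"rmse of the original set: {rmse(obs, pred):.2f}")
    m, sd = bootstrap_rmse(obs, pred)
    # Following footnote d, the bootstrap mean is the value that is reported.
    print(f"reported rmse: {m:.2f} ± {sd:.2f}")
```

Because the mean of the bootstrap rmse distribution need not coincide exactly with the rmse of the original set, the sketch prints both; under footnote d, the bootstrap mean is the one reported alongside sdev.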