2023 Sep 10;36(10):1561–1573. doi: 10.1021/acs.chemrestox.3c00032

Table 2. Overall Performance for Classification and Regression Tasks on Toxicity-Related Data Sets of TDCommons (ref 38) and ToxBenchmark (ref 37)^a

| Data set | Ames ↑ | hERG ↑ | DILI ↑ | Skin Reaction ↑ | LD50 ↓ | ToxBenchmark ↑ |
| --- | --- | --- | --- | --- | --- | --- |
| No. molecules | 7,269 | 650 | 470 | 403 | 7,353 | 6,489 |
| Label dist. | 0.55:0.45 | 0.68:0.32 | 0.5:0.5 | 0.68:0.32 | 2.54/0.95 | 0.53:0.47 |
| AttrMasking | 0.842 ± 0.008 | 0.778 ± 0.046 | 0.919 ± 0.008 | - | 0.685 ± 0.025 | - |
| AttentiveFP | 0.814 ± 0.008 | 0.825 ± 0.007 | 0.886 ± 0.015 | - | 0.678 ± 0.012 | - |
| Fingerprint-based | 0.865 ± 0.002 | 0.875 ± 0.003 | 0.937 ± 0.004 | - | 0.588 ± 0.005 | 0.86 ± 0.01 |
| SMILES-T | 0.697 ± 0.011 | 0.703 ± 0.056 | 0.760 ± 0.041 | 0.633 ± 0.051 | 0.715 ± 0.012 | 0.720 ± 0.014 |
| ET (single) | 0.836 ± 0.003 | 0.839 ± 0.017 | 0.878 ± 0.013 | 0.662 ± 0.033 | 0.653 ± 0.008 | 0.881 ± 0.008 |
| ET (multi) | 0.804 ± 0.004 | 0.763 ± 0.021 | 0.885 ± 0.030 | 0.581 ± 0.055 | 0.660 ± 0.01 | 0.879 ± 0.005 |
^a For Ames, hERG, DILI, Skin Reaction, and ToxBenchmark, we report the normalized label distribution as active:inactive. For LD50, the mean and standard deviation are given (mean/standard deviation). The equivariant transformer is denoted as ET; (single) and (multi) distinguish single-conformer from multi-conformer training. We report the standard deviation over five runs with different seeds (the value after ±). Two numbers in one column are written in bold if their standard deviations overlap.
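
The reporting convention in the footnote can be sketched in a few lines of Python. This is an illustration only, not the authors' evaluation code: the function names are hypothetical, the sample standard deviation is an assumption (the footnote does not say which estimator was used), and "standard deviations overlap" is read here as "the mean ± standard deviation intervals of two entries overlap", which is also an assumption. The example values are the ToxBenchmark and LD50 columns of Table 2.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores.

    Assumption: sample standard deviation; the source does not specify.
    """
    return mean(scores), stdev(scores)


def intervals_overlap(a, b):
    """True if the mean ± std intervals of a and b overlap.

    Assumption: this is one reading of "standard deviations overlap".
    """
    (mean_a, std_a), (mean_b, std_b) = a, b
    return abs(mean_a - mean_b) <= std_a + std_b


def bold_candidates(results, higher_is_better=True):
    """Names of the best entry and of every entry whose interval overlaps it."""
    pick = max if higher_is_better else min
    best = pick(results, key=lambda name: results[name][0])
    return sorted(
        name for name in results
        if intervals_overlap(results[name], results[best])
    )


# (mean, std) pairs copied from the ToxBenchmark column (higher is better).
toxbenchmark = {
    "Fingerprint-based": (0.86, 0.01),
    "SMILES-T": (0.720, 0.014),
    "ET (single)": (0.881, 0.008),
    "ET (multi)": (0.879, 0.005),
}

# (mean, std) pairs copied from the LD50 column (lower is better).
ld50 = {
    "AttrMasking": (0.685, 0.025),
    "AttentiveFP": (0.678, 0.012),
    "Fingerprint-based": (0.588, 0.005),
    "SMILES-T": (0.715, 0.012),
    "ET (single)": (0.653, 0.008),
    "ET (multi)": (0.660, 0.01),
}

print(bold_candidates(toxbenchmark, higher_is_better=True))
print(bold_candidates(ld50, higher_is_better=False))
```

Under this reading, an entry is highlighted whenever its interval touches that of the best entry in the column, so a column can have more than one highlighted value. Whether this matches the bolding in the published table depends on the authors' exact criterion, which the footnote does not spell out.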