2016 Jan 22;8:2. doi: 10.1186/s13321-016-0113-y

Table 1.

The number of compounds and average properties of molecules of the analyzed datasets and their drug-like subsets

| Dataset | Type | N (whole set) | Average T (°C) | Average MW | Average NA | Drug-like set (% of the total set) |
|---|---|---|---|---|---|---|
| PATENTS | Training | 241,958 | 159 | 357 | 25 | 89 |
| PATENTS, decomposing | Training | 13,785 | 209 | 358 | 25 | 76 |
| PATENTS, non-decomposing | Training | 228,173 | 155 | 357 | 25 | 93 |
| Bergström | Validation | 277 | 151 | 295 | 20.8 | 92 |
| Bradley | Validation | 2,878 | 59 | 174 | 11.4 | 53 |
| OCHEM | Validation | 21,832 | 117 | 249 | 16.7 | 73 |
| Enamine | Validation | 22,449 | 143 | 223 | 14.9 | 91 |
| COMBINED | Validation, merge of four sets | 47,436 | 126 | 233 | 15.6 | 81 |

MW: molecular weight; NA: number of non-hydrogen atoms
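
The table's totals can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check, not part of the original analysis: it uses the counts and average temperatures transcribed from Table 1, and the variable names are made up for this example. Because the per-set averages are already rounded, the size-weighted mean temperature is expected to agree with the COMBINED row only approximately.

```python
# Counts (N) and average T (°C) transcribed from Table 1.
# All variable names here are illustrative assumptions.
patents_total = 241_958
patents_subsets = {"Decomposing": 13_785, "Non-decomposing": 228_173}

# dataset name -> (N, average T in °C)
validation = {
    "Bergström": (277, 151),
    "Bradley": (2_878, 59),
    "OCHEM": (21_832, 117),
    "Enamine": (22_449, 143),
}
combined_n, combined_t = 47_436, 126  # COMBINED row of the table

# The two PATENTS subsets should add up to the full PATENTS set.
subsets_sum = sum(patents_subsets.values())

# The four validation sets should add up to the COMBINED set.
validation_sum = sum(n for n, _ in validation.values())

# Size-weighted mean of the rounded per-set average temperatures.
weighted_t = sum(n * t for n, t in validation.values()) / validation_sum

print("PATENTS subsets sum to total:", subsets_sum == patents_total)
print("Validation sets sum to COMBINED:", validation_sum == combined_n)
print("Weighted mean T (°C):", round(weighted_t, 1), "vs reported", combined_t)
```

Both count checks hold exactly, and the weighted mean temperature falls within rounding distance of the reported COMBINED value.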