Table 1.
Dataset | Type | Whole set, N | Whole set, average T (°C) | Whole set, average MW | Whole set, average NA | Drug-like set, % of the total set |
---|---|---|---|---|---|---|
PATENTS | Training | 241,958 | 159 | 357 | 25 | 89 |
Decomposing | Training | 13,785 | 209 | 358 | 25 | 76 |
Non-decomposing | Training | 228,173 | 155 | 357 | 25 | 93 |
Bergström | Validation | 277 | 151 | 295 | 20.8 | 92 |
Bradley | Validation | 2,878 | 59 | 174 | 11.4 | 53 |
OCHEM | Validation | 21,832 | 117 | 249 | 16.7 | 73 |
Enamine | Validation | 22,449 | 143 | 223 | 14.9 | 91 |
COMBINED | Validation, merge of four sets | 47,436 | 126 | 233 | 15.6 | 81 |
MW: molecular weight; NA: number of non-hydrogen atoms