Table 2.
Dataset | ML Model | Training set size | Error at full training size | Improvement vs. dGBDT | Expected error at doubled training size (reduction) | Training size for 10% error reduction (factor of training size) |
---|---|---|---|---|---|---|
GDSC1 | dGBDT | 115,863 | 0.0665 | N/A | 0.0661 (0.68%) | N/A |
GDSC1 | hGBDT | 115,863 | 0.0611 | 8.16% | 0.0586 (4.14%) | 649,056 (x5.6) |
GDSC1 | sNN | 115,863 | 0.0602 | 9.46% | 0.0560 (7.07%) | 312,381 (x2.7) |
GDSC1 | mNN | 115,863 | 0.0574 | 13.69% | 0.0532 (7.33%) | 304,224 (x2.6) |
GDSC2 | dGBDT | 78,423 | 0.0586 | N/A | 0.0581 (0.93%) | N/A |
GDSC2 | hGBDT | 78,423 | 0.0518 | 11.69% | 0.0496 (4.15%) | 598,003 (x7.6) |
GDSC2 | sNN | 78,423 | 0.0512 | 12.70% | 0.0478 (6.58%) | 232,820 (x3.0) |
GDSC2 | mNN | 78,423 | 0.0509 | 13.21% | 0.0477 (6.26%) | 247,656 (x3.2) |
CTRP | dGBDT | 203,650 | 0.0497 | N/A | 0.0495 (0.34%) | N/A |
CTRP | hGBDT | 203,650 | 0.0429 | 13.63% | 0.0407 (5.15%) | 789,843 (x3.9) |
CTRP | sNN | 203,650 | 0.0384 | 22.60% | 0.0345 (10.17%) | 402,308 (x2.0) |
CTRP | mNN | 203,650 | 0.0355 | 28.58% | 0.0302 (14.96%) | 322,865 (x1.6) |
NCI-60 | dGBDT | 675,000 | 0.0554 | N/A | 0.0554 (0.04%) | N/A |
NCI-60 | hGBDT | 675,000 | 0.0326 | 41.16% | 0.0313 (3.93%) | 18,355,942 (x27.2) |
NCI-60 | sNN | 675,000 | 0.0333 | 39.95% | 0.0311 (6.59%) | 2,109,907 (x3.1) |
NCI-60 | mNN | 675,000 | 0.0321 | 42.17% | 0.0305 (4.69%) | 5,175,827 (x7.6) |
The third column gives the full training set size |T|. The remaining columns, from left to right, are: the prediction error of models trained with the full training set; the improvement in prediction error relative to the dGBDT baseline; the expected prediction error if the training set size is doubled (in parentheses, the percentage reduction in the error score relative to the full-training-set error); and the training set size required to reduce the error score by 10% (in parentheses, the required increase in sample size as a factor of |T|).
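
The two derived quantities in the table can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that "improvement" means the relative reduction in error with respect to the dGBDT baseline, and that the factor in parentheses is the required training size divided by |T|. The function names are hypothetical, and the inputs are the rounded GDSC1 values from the table, so the computed improvement (about 8.12%) differs slightly from the reported 8.16%, which was presumably derived from unrounded error scores.

```python
def relative_reduction(baseline_error: float, error: float) -> float:
    """Percentage reduction of `error` relative to `baseline_error` (assumed definition)."""
    return 100.0 * (baseline_error - error) / baseline_error


def size_factor(required_size: int, training_size: int) -> float:
    """Required training size expressed as a multiple of the full training size |T|."""
    return required_size / training_size


# Rounded GDSC1 values copied from Table 2.
training_size = 115_863    # |T|
dgbdt_error = 0.0665       # dGBDT error at full training size
hgbdt_error = 0.0611       # hGBDT error at full training size
hgbdt_required = 649_056   # training size needed for a 10% error reduction

improvement = relative_reduction(dgbdt_error, hgbdt_error)
factor = size_factor(hgbdt_required, training_size)

print(f"hGBDT improvement vs. dGBDT: {improvement:.2f}%")
print(f"hGBDT required size factor: x{factor:.1f}")
```

The same two functions apply to any other row; only the baseline error and |T| change with the dataset.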