2021 May 17;22:252. doi: 10.1186/s12859-021-04163-y

Table 2.

Prediction errors of all dataset-model combinations

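| Dataset | ML Model | mK=\|T\| | y~K | Δy~ | y~(m=2\|T\|) | m(y~=0.9y~K) |
|---|---|---|---|---|---|---|
| GDSC1 | dGBDT | 115,863 | 0.0665 | N/A | 0.0661 (0.68%) | N/A |
| GDSC1 | hGBDT | 115,863 | 0.0611 | 8.16% | 0.0586 (4.14%) | 649,056 (x5.6) |
| GDSC1 | sNN | 115,863 | 0.0602 | 9.46% | 0.0560 (7.07%) | 312,381 (x2.7) |
| GDSC1 | mNN | 115,863 | 0.0574 | 13.69% | 0.0532 (7.33%) | 304,224 (x2.6) |
| GDSC2 | dGBDT | 78,423 | 0.0586 | N/A | 0.0581 (0.93%) | N/A |
| GDSC2 | hGBDT | 78,423 | 0.0518 | 11.69% | 0.0496 (4.15%) | 598,003 (x7.6) |
| GDSC2 | sNN | 78,423 | 0.0512 | 12.70% | 0.0478 (6.58%) | 232,820 (x3.0) |
| GDSC2 | mNN | 78,423 | 0.0509 | 13.21% | 0.0477 (6.26%) | 247,656 (x3.2) |
| CTRP | dGBDT | 203,650 | 0.0497 | N/A | 0.0495 (0.34%) | N/A |
| CTRP | hGBDT | 203,650 | 0.0429 | 13.63% | 0.0407 (5.15%) | 789,843 (x3.9) |
| CTRP | sNN | 203,650 | 0.0384 | 22.60% | 0.0345 (10.17%) | 402,308 (x2.0) |
| CTRP | mNN | 203,650 | 0.0355 | 28.58% | 0.0302 (14.96%) | 322,865 (x1.6) |
| NCI-60 | dGBDT | 675,000 | 0.0554 | N/A | 0.0554 (0.04%) | N/A |
| NCI-60 | hGBDT | 675,000 | 0.0326 | 41.16% | 0.0313 (3.93%) | 18,355,942 (x27.2) |
| NCI-60 | sNN | 675,000 | 0.0333 | 39.95% | 0.0311 (6.59%) | 2,109,907 (x3.1) |
| NCI-60 | mNN | 675,000 | 0.0321 | 42.17% | 0.0305 (4.69%) | 5,175,827 (x7.6) |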

y~K: prediction error of models trained with the full training set size.
Δy~: improvement in prediction error as compared with the dGBDT baseline.
y~(m=2|T|): expected prediction error if the training size is doubled (in parentheses is the percentage reduction in the error score as compared with y~K).
m(y~=0.9y~K): training size required to reduce the error score by 10% (in parentheses is the required increase in sample size as a factor of |T| to achieve the score).
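The derived columns are simple ratios, and the following minimal Python sketch shows how they could be recomputed from the printed entries. The helper names `percent_reduction` and `size_factor` are hypothetical and are not taken from the article. Because the inputs are the rounded values shown in the table, the recomputed percentages come out close to, but not identical with, the published ones, which were presumably computed from unrounded scores.

```python
def percent_reduction(reference_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than reference_error."""
    return (reference_error - new_error) / reference_error * 100.0


def size_factor(required_size: int, full_size: int) -> float:
    """Required training size expressed as a multiple of the full size |T|."""
    return required_size / full_size


# GDSC1 values copied from the table (rounded as printed).
full_size = 115_863            # |T| for GDSC1
dgbdt_error = 0.0665           # baseline error at full size
hgbdt_error = 0.0611           # hGBDT error at full size
hgbdt_error_doubled = 0.0586   # hGBDT expected error at m = 2|T|
hgbdt_required_size = 649_056  # size needed for a 10% lower hGBDT error

# Improvement over the dGBDT baseline (the table prints 8.16%).
improvement_vs_baseline = percent_reduction(dgbdt_error, hgbdt_error)

# Reduction from doubling the training set (the table prints 4.14%).
reduction_when_doubled = percent_reduction(hgbdt_error, hgbdt_error_doubled)

# Sample size increase as a factor of |T| (the table prints x5.6).
factor = size_factor(hgbdt_required_size, full_size)

print(f"improvement vs baseline: {improvement_vs_baseline:.2f}%")
print(f"reduction when doubled:  {reduction_when_doubled:.2f}%")
print(f"required size factor:    x{factor:.1f}")
```

The same two helpers apply to every row; for example, the NCI-60 hGBDT entry of 18,355,942 samples against |T| = 675,000 gives a factor of about 27.2.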