Table 1.
Model name | Actives | Size | ROC | F1 | Kappa | MCC | Domain |
---|---|---|---|---|---|---|---|
Broad | 150 | 454 | 0.67 | 0.54 | 0.23 | 0.25 | 0.36 |
Broad + EGFR | 289 | 1486 | 0.81 | 0.52 | 0.36 | 0.39 | 0.33 |
EGFR | 147 | 1064 | 0.84 | 0.44 | 0.30 | 0.37 | 0.28 |
The datasets are named “Broad”21 and “EGFR”20; both were curated to remove problematic molecules before model building. Values are from fivefold cross-validation. The “Broad” dataset is so named because its data came from a chordoma screen at the Broad Institute. The “EGFR” dataset is so named because its data came from a paper that highlighted the activity of EGFR compounds in chordoma; the name is not meant to imply that the dataset consists entirely of EGFR compounds. Both datasets contain a wide variety of compounds that inhibit a broad range of targets.
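
For readers less familiar with the F1, Kappa, and MCC columns, the following is a minimal sketch of how these three statistics follow from a single binary confusion matrix. It is not the authors' pipeline: the function name `binary_metrics` and the example counts are hypothetical, and the source does not state how the metrics were computed or combined across the five folds.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return F1, Cohen's kappa, and MCC for one binary confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (actives = positive class).
    """
    n = tp + fp + fn + tn

    # F1: harmonic mean of precision and recall.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0

    # Matthews correlation coefficient: correlation between predicted
    # and true labels; defined as 0 when any marginal total is zero.
    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {"f1": f1, "kappa": kappa, "mcc": mcc}


# Hypothetical counts for illustration only (not taken from Table 1).
example = binary_metrics(tp=40, fp=20, fn=10, tn=30)
print({name: round(value, 3) for name, value in example.items()})
```

Because Kappa and MCC both correct for chance agreement, they tend to be more informative than F1 alone when actives make up a small share of a dataset, as is the case for the three models in the table.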