2020 Jul 31;10:12982. doi: 10.1038/s41598-020-70026-w

Table 1.

Chordoma Bayesian model statistics.

Model name	Actives	Size	ROC	F1	Kappa	MCC	Domain
Broad	150	454	0.67	0.54	0.23	0.25	0.36
Broad + EGFR	289	1486	0.81	0.52	0.36	0.39	0.33
EGFR	147	1064	0.84	0.44	0.30	0.37	0.28

The datasets were named “Broad”²¹ and “EGFR”²⁰ and underwent curation to remove problematic molecules before model building. Statistics are from fivefold cross-validation. The “Broad” dataset is so named because its data came from a chordoma screen at the Broad Institute. The “EGFR” dataset is so named because its data came from a paper that highlighted the activity of EGFR compounds in chordoma; the name is not meant to imply that the dataset consists entirely of EGFR compounds. Both datasets contain a wide variety of compounds that inhibit a broad range of targets.
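
As a rough illustration of how the F1, Kappa, and MCC columns relate to a binary confusion matrix, the Python sketch below computes the three scores from raw counts using their standard textbook definitions. It is not the authors' modeling code: the function name `classification_metrics` and the example counts are hypothetical, and reading "Actives" and "Size" as the number of active compounds and the total number of compounds per dataset is an assumption.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return F1, Cohen's kappa, and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; standard definitions only.
    """
    n = tp + fp + tn + fn

    # F1: harmonic mean of precision and recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"f1": f1, "kappa": kappa, "mcc": mcc}


# (actives, size) per model, copied from Table 1.
# Assumption: "Size" is the total number of compounds in the dataset.
TABLE_1 = {
    "Broad": (150, 454),
    "Broad + EGFR": (289, 1486),
    "EGFR": (147, 1064),
}

# Fraction of each dataset labeled active, under the assumption above.
active_fraction = {name: a / s for name, (a, s) in TABLE_1.items()}

# Hypothetical counts, chosen only to exercise the function.
example = classification_metrics(tp=40, fp=10, tn=40, fn=10)
```

Under that reading of the columns, the active fraction differs noticeably between the three datasets, which is one reason to consult chance-corrected scores such as Kappa and MCC alongside ROC and F1 when comparing the rows.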