Author manuscript; available in PMC: 2021 Dec 8.
Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2020 Dec 8;17(6):1846–1857. doi: 10.1109/TCBB.2019.2910061

TABLE I.

Population Statistics for Our Datasets and Tasks.

Dataset Statistics:

| Dataset | Number of Samples | # Cell Lines | Most Frequent Cell Line (Samples) | Least Frequent Cell Line (Samples) |
|---|---|---|---|---|
| Full LINCS | 156,461 | 36 | MCF7 (26,546) | NCIH716 (8) |
| Prostate Only LINCS | 25,565 | 2 | PC3 (13,625) | VCAP (11,940) |
| MGH NeuroBank (Level 4) | 5602 | 5 | N/A (1133) | N/A (1109) |
| MGH NeuroBank (Level 5) | 1894 | 5 | N/A (380) | N/A (377) |
Task Statistics:

| Dataset | Task | # Classes | Most Frequent Class (Samples) | Least Frequent Class (Samples) |
|---|---|---|---|---|
| LINCS (Full) | Primary Site | 12 | Prostate (43,686) | Ovary (415) |
| LINCS (Full) | Subtype | 14 | Adenocarcinoma (53,245) | Embryonal Kidney (1384) |
| LINCS (Full) | MOA | 49 | DMSO (25,638) | IKK Inhibitor (828) |
| LINCS (Prostate Only) | MOA | 9 | DMSO (8833) | Serotonin Receptor Antagonist (1029) |
| MGH NeuroBank (Level 4) | Perturbagen | 60 | DMSO (383) | Ruboxistaurin (78) |
| MGH NeuroBank (Level 5) | Perturbagen | 60 | DMSO (130) | Ruboxistaurin (27) |