Letter. 2019 Sep 10;20:195. doi: 10.1186/s13059-019-1794-0

Table 1.

Challenge data characteristics

| Challenge | Data types | Data cohorts | N samples | Size | Open |
|---|---|---|---|---|---|
| Digital Mammography | Human clinical; Imaging | Kaiser Permanente | 80k patients (640k images) | 13 TB | No |
| | | MSSM | 1k (15k) | 0.3 TB | No |
| | | Karolinska | 69k (663k) | 13.2 TB | No |
| | | UCSF | 42k (500k) | 10 TB | No |
| | | CRUK | 7k | | No |
| | | Total | 200k (1818k) | 36.5 TB | |
| Multiple Myeloma | Human clinical; gene expression; DNA-seq; cytogenetics | MMRF | 797 | 11 GB | Yes |
| | | PUBLIC | 1444 | 1 GB | Yes |
| | | DFCI | 294 | 76 GB | No |
| | | UAMS | 463 | 6 GB | No |
| | | M2Gen | 105 | 41 GB | No |
| | | Total | 3103 | 135 GB | |
| SMC-Het | | All | 76 | 22 GB | No |
| SMC-RNA | Simulated; Human clinical; RNA-seq | Training | 31 | 290 GB | Yes |
| | | Test | 20 | 197 GB | Yes |
| | | Real | 32 | 265 GB | No |

Data cohorts describe the source of the data used in each challenge. MSSM, Mount Sinai School of Medicine; UCSF, University of California San Francisco; CRUK, Cancer Research UK; MMRF, Multiple Myeloma Research Foundation; DFCI, Dana-Farber Cancer Institute; UAMS, University of Arkansas for Medical Sciences. In SMC-RNA, Training denotes synthetically generated data provided to participants, Test denotes synthetically generated held-out data, and Real denotes cell lines spiked in with known constructs. For Digital Mammography, N samples gives the number of patients, with the number of images in parentheses. Open indicates whether the data were publicly available to participants.
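As a quick consistency check, the per-cohort figures in Table 1 can be summed and compared with the reported totals. The sketch below hard-codes values transcribed from the table. CRUK reports neither an image count nor a size, so those entries are treated as zero, which is an assumption made only for this check. The names `MAMMOGRAPHY`, `MYELOMA`, and `check_totals` are hypothetical.

```python
# Values transcribed from Table 1.
# Assumption: CRUK has no reported image count or size, so both are recorded as 0.
MAMMOGRAPHY = {
    # cohort: (patients in thousands, images in thousands, size in TB)
    "Kaiser Permanente": (80, 640, 13.0),
    "MSSM": (1, 15, 0.3),
    "Karolinska": (69, 663, 13.2),
    "UCSF": (42, 500, 10.0),
    "CRUK": (7, 0, 0.0),
}

MYELOMA = {
    # cohort: (samples, size in GB)
    "MMRF": (797, 11),
    "PUBLIC": (1444, 1),
    "DFCI": (294, 76),
    "UAMS": (463, 6),
    "M2Gen": (105, 41),
}


def check_totals():
    """Sum the per-cohort columns for the two challenges that report totals."""
    patients = sum(v[0] for v in MAMMOGRAPHY.values())
    images = sum(v[1] for v in MAMMOGRAPHY.values())
    # Round to one decimal place to absorb floating-point error in the TB sum.
    size_tb = round(sum(v[2] for v in MAMMOGRAPHY.values()), 1)
    mm_samples = sum(v[0] for v in MYELOMA.values())
    mm_gb = sum(v[1] for v in MYELOMA.values())
    return patients, images, size_tb, mm_samples, mm_gb


if __name__ == "__main__":
    p, i, tb, n, gb = check_totals()
    print(f"Digital Mammography: {p}k patients, {i}k images, {tb} TB")
    print(f"Multiple Myeloma: {n} samples, {gb} GB")
```

The image count, storage, and Multiple Myeloma sums match the reported totals exactly. The patient count sums to 199k, which is consistent with the rounded figure of 200k given in the table.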