Table 1.
Challenge | Data types | Data cohorts | N samples | Size | Open |
---|---|---|---|---|---|
Digital Mammography | Human clinical; Imaging | Kaiser Permanente | 80k patients (640k images) | 13 TB | No |
| | | MSSM | 1k (15k) | 0.3 TB | No |
| | | Karolinska | 69k (663k) | 13.2 TB | No |
| | | UCSF | 42k (500k) | 10 TB | No |
| | | CRUK | 7k | | No |
| | | Total | 200k (1818k) | 36.5 TB | |
Multiple Myeloma | Human clinical; gene expr; DNAseq; Cytogenetics | MMRF | 797 | 11 GB | Yes |
| | | PUBLIC | 1444 | 1 GB | Yes |
| | | DFCI | 294 | 76 GB | No |
| | | UAMS | 463 | 6 GB | No |
| | | M2Gen | 105 | 41 GB | No |
| | | Total | 3103 | 135 GB | |
SMC-Het | | All | 76 | 22 GB | No |
SMC-RNA | Simulated; Human clinical; RNA-seq | Training | 31 | 290 GB | Yes |
| | | Test | 20 | 197 GB | Yes |
| | | Real | 32 | 265 GB | No |
Data cohorts describe the source of the data used in the challenge. Abbreviations: MSSM, Mount Sinai School of Medicine; UCSF, University of California San Francisco; CRUK, Cancer Research UK; MMRF, Multiple Myeloma Research Foundation; DFCI, Dana-Farber Cancer Institute; UAMS, University of Arkansas for Medical Sciences. For SMC-RNA, Training is synthetically generated data provided to participants, Test is held-out synthetically generated data, and Real is cell lines spiked with known constructs. The number of samples for Digital Mammography gives the number of patients, with the number of images in parentheses. Open indicates whether the data were publicly available to participants.