Table 2.
Project | Samples | Data type | Size | Description |
---|---|---|---|---|
TCGA | Primary cancers, matched normal samples, some metastatic samples | Gene expression, DNA mutations, DNA methylation, chromatin accessibility, CNA, protein expression, histopathology images | 11,315 cancer genomes from 33 cancer types | Joint effort between the US National Cancer Institute and the US National Human Genome Research Institute |
ICGC | Primary cancers, matched normal samples, some metastatic samples | Gene expression, DNA mutations, DNA methylation, CNA, protein expression | 25,000 cancer genomes from 22 cancer types | A global cancer genomics effort for documenting somatic mutations that drive common tumour types |
PCAWG | Samples from TCGA and ICGC | DNA variations from whole-genome sequencing | 2,658 cancer genomes from 38 tumour types | Revealed 288,457 structural variations across topologically associated domains [152] |
LINCS | Human cell lines | Differential gene expression upon chemical or genetic perturbation | 1.4 million gene expression profiles in 50 cell types, focused on approximately 1,000 landmark genes | Probes how cell models respond to chemical or genetic perturbations using a reduced-representation expression assay that measures the approximately 1,000 genes most representative of variation in the transcriptome [16] |
CCLE | Human cancer cell lines | Gene expression, DNA mutations, promoter methylation, CNA, metabolomics, drug sensitivity, CRISPR/RNAi genome-wide screens, protein expression for a few targets | 1,072 cell lines | Provides a data encyclopedia of human cancer cell lines [178] |
CPTAC | Human cancers and normal tissue | Protein expression and post-translational modifications | Almost 4,000 samples from 14 tumour sites | A US national effort to understand the molecular basis of cancer through large-scale proteogenomics |
Human Protein Atlas | Human cancers, normal tissues, cell models | IHC images, gene expression | 3.1 million annotated IHC tissue images for most protein-coding genes, spanning 17 cancer types | Aims to map all human proteins in tumours and tissues using IHC [179] |
GENIE | Human cancers | Somatic mutations from targeted sequencing of common cancer-related genes | 136,096 cases from 110 cancer sites | A registry assembled by 19 cancer centres worldwide that aggregates sequencing data obtained during routine clinical care of patients with cancer |
CAMELYON | Sentinel lymph nodes of patients with metastatic breast cancer | H&E-stained slides | 1,399 whole-slide images with annotations of metastatic regions | A challenge to evaluate new and existing algorithms for automated detection and classification of breast cancer metastases in whole-slide images of lymph nodes [110] |
TARGET | Paediatric cancers | Gene expression, DNA mutations (whole-genome and whole-exome sequencing), DNA methylation | 6,196 cancer genomes spanning 9 cancer types | Applies a comprehensive genomic approach to determine the molecular changes that drive childhood cancers |
CAMELYON, Cancer Metastases in Lymph Nodes Challenge; CCLE, Cancer Cell Line Encyclopedia; CNA, copy number alteration; CPTAC, Clinical Proteomic Tumour Analysis Consortium; GENIE, Genomics Evidence Neoplasia Information Exchange; H&E, haematoxylin and eosin; ICGC, International Cancer Genome Consortium; IHC, immunohistochemistry; LINCS, Library of Integrated Network-Based Cellular Signatures; PCAWG, Pan-Cancer Analysis of Whole Genomes; TARGET, Therapeutically Applicable Research to Generate Effective Treatments; TCGA, The Cancer Genome Atlas.