2022 Sep 5;22(11):625–639. doi: 10.1038/s41568-022-00502-0

Table 2.

Large-scale projects generating cancer genomic datasets

| Project | Samples | Data type | Size | Description |
| --- | --- | --- | --- | --- |
| TCGA | Primary cancers, matched normal samples, some metastatic samples | Gene expression, DNA mutations, DNA methylation, chromatin accessibility, CNA, protein expression, histopathology images | 11,315 cancer genomes from 33 cancer types | Joint effort between the US National Cancer Institute and the US National Human Genome Research Institute |
| ICGC | Primary cancers, matched normal samples, some metastatic samples | Gene expression, DNA mutations, DNA methylation, CNA, protein expression | 25,000 cancer genomes from 22 cancer types | A global cancer genomics effort for documenting somatic mutations that drive common tumour types |
| PCAWG | Samples from TCGA and ICGC | DNA variations from whole-genome sequencing | 2,658 cancer genomes from 38 tumour types | Revealed 288,457 structural variations across topologically associated domains (ref. 152) |
| LINCS | Human cell lines | Differential expression upon treatment or genetic perturbations | 1.4 million gene expression profiles in 50 cell types, focused on approximately 1,000 landmark genes | Probes how cell models respond to chemical or genetic perturbations through use of microarrays focused on approximately 1,000 genes that are most representative of variations in the transcriptome (ref. 16) |
| CCLE | Human cancer cell lines | Gene expression, DNA mutations, promoter methylation, CNA, metabolomics, drug sensitivity, CRISPR/RNAi genome-wide screens, protein expression for a few targets | 1,072 cell lines | Provides a data encyclopedia of human cancer cell lines (ref. 178) |
| CPTAC | Human cancers and normal tissue | Protein expression and post-translational modifications | Almost 4,000 samples from 14 tumour sites | A national effort to understand the molecular basis of cancer through large-scale proteome genomics |
| Human Protein Atlas | Human cancers, normal tissues, cell models | IHC images, gene expression | 3.1 million annotated IHC tissue images for most protein-coding genes, spanning 17 cancer types | Aims to map all human proteins in tumours and tissues using IHC (ref. 179) |
| GENIE | Human cancers | Exome mutations focused on common cancer-related genes | 136,096 cases from 110 cancer sites | A registry assembled through 19 cancer centres worldwide, aggregating sequencing data obtained during routine medical practice from patients with cancer |
| CAMELYON | Sentinel lymph nodes of patients with metastatic breast cancer | H&E-stained slides | 1,399 whole-slide images with pathology annotations of metastases regions | A challenge to evaluate new and existing algorithms for automated detection and classification of breast cancer metastases in whole-slide images of lymph nodes (ref. 110) |
| TARGET | Paediatric cancers | Gene expression, DNA mutation (whole-genome and whole-exome sequencing), DNA methylation | 6,196 cancer genomes spanning 9 cancer types | Applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers |

CCLE, Cancer Cell Line Encyclopedia; CNA, copy number alteration; CPTAC, Clinical Proteomic Tumour Analysis Consortium; H&E, haematoxylin and eosin; ICGC, International Cancer Genome Consortium; IHC, immunohistochemistry; PCAWG, Pan-Cancer Analysis of Whole Genomes; TARGET, Therapeutically Applicable Research to Generate Effective Treatments; TCGA, The Cancer Genome Atlas.