. 2017 Jan 31;50(1):12–19. doi: 10.5483/BMBRep.2017.50.1.135

Table 1.

Cancer-related, high-throughput data repositories. The databases shown in Fig. 1B are described here with additional information, including the number of available data sets, the data types, and the websites. Entry counts were current as of 05/02/2016.

| Name | Description | Website | Cancer-related data |
| --- | --- | --- | --- |
| TCGA | The Cancer Genome Atlas (TCGA): now one of the programs of NCI's newly established Center for Cancer Genomics (11) | cancergenome.nih.gov | 34 cancer studies (types), 11,091 samples |
| dbGaP | The database of Genotypes and Phenotypes (dbGaP): archive of human genotype and phenotype data | www.ncbi.nlm.nih.gov/gap | 991 datasets |
| SRA | Sequence Read Archive (SRA): raw sequencing files and alignment files from next-generation sequencing | www.ncbi.nlm.nih.gov/sra | 1,950 cancer studies |
| cBioPortal | Multi-functional platform supporting intuitive visualization, clinical pie charts, and simple data access (75); includes TCGA data visualization | cbioportal.org | 126 cancer genomics studies, 26,080 samples |
| ICGC | The International Cancer Genome Consortium (ICGC): global-scale cancer genome projects (16) | dcc.icgc.org/ | 66 cancer projects, 17,867 donors |
| ArrayExpress | An archive of functional genomics data (76) | www.ebi.ac.uk/arrayexpress | 14,974 datasets |
| EGA | The European Genome-phenome Archive (EGA) | www.ebi.ac.uk/ega/home | 1,997 datasets |
| UCSC CGB | UCSC Cancer Genomics Browser (UCSC CGB): interactive heat-map-based visualization and downloads of ready-to-use, tab-delimited genomic and clinical data (77); includes TCGA data visualization | genome-cancer.ucsc.edu | 720 datasets |
| GEO | The Gene Expression Omnibus (GEO) (19): a representative public repository for microarray and next-generation sequencing data sets | www.ncbi.nlm.nih.gov/geo | 19,554 datasets |
| ENCODE | The Encyclopedia of DNA Elements (ENCODE) Consortium: decoding functional elements in DNA (17) | www.encodeproject.org | Data on cancer cell lines available |
| CCLE | The Cancer Cell Line Encyclopedia (CCLE) project: genomic data and visualization for about 1,000 cell lines, with drug sensitivity data available for these cell lines (78) | www.broadinstitute.org/ccle/home | Genomic characterization of 1,000 cell lines |
| PeptideAtlas | An archive of proteome information (21) | www.peptideatlas.org | 99 datasets |
| PRIDE | PRoteomics IDEntifications (PRIDE) database: protein and peptide identifications and post-translational modifications (22); mass spectrometry-based proteomics data available | www.ebi.ac.uk/pride/archive | 290 datasets |
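Entry counts like those in Table 1 change over time. For the three NCBI-hosted repositories (dbGaP, SRA, GEO), one way to get a current count is NCBI's E-utilities ESearch service. The sketch below is illustrative only and is not necessarily how the counts in the table were obtained. It assumes the search term "cancer" (hypothetical; the table does not state its query) and uses the Entrez database names `gap`, `sra`, and `gds` (GEO DataSets). The helper names `esearch_count_url` and `fetch_count` are hypothetical. Any numbers it returns today will differ from the 2016 values above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint of NCBI's E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for the NCBI-hosted repositories in Table 1.
ENTREZ_DB = {
    "dbGaP": "gap",
    "SRA": "sra",
    "GEO": "gds",  # GEO DataSets
}


def esearch_count_url(repository: str, term: str = "cancer") -> str:
    """Build an ESearch URL that asks only for the number of matching records.

    The default term "cancer" is an illustrative assumption, not the
    query behind the counts in Table 1.
    """
    params = {
        "db": ENTREZ_DB[repository],
        "term": term,
        "rettype": "count",  # return the hit count instead of a list of IDs
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_count(url: str) -> int:
    """Perform the request (needs network access) and return the hit count."""
    with urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    # ESearch JSON responses report the count as a string.
    return int(data["esearchresult"]["count"])


if __name__ == "__main__":
    # Only print the URLs; call fetch_count(...) yourself when online.
    for name in ENTREZ_DB:
        print(name, esearch_count_url(name))
```

Building the URL separately from fetching keeps the query inspectable offline. Without an API key, NCBI asks clients to stay under three requests per second, so a loop that calls `fetch_count` over many terms should pause between calls.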