2022 Sep 5;22(11):625–639. doi: 10.1038/s41568-022-00502-0

Table 3.

Data repositories hosting cancer genomics data

| Repository | Datasets included | Sample size | Description |
| --- | --- | --- | --- |
| GDC | 20 data-generation programmes, including TCGA, TARGET, GENIE and CPTAC | 85,552 cases from 67 primary cancer sites | Provides the cancer research community with a unified repository that enables data sharing across genomic studies |
| IDC | 115 data collections, including cohorts from TCGA, CPTAC and other projects | 61,134 cases from 21 primary cancer sites | Connects researchers with publicly available cancer imaging data and provides a cloud computing environment integrated with other cancer research data commons (ref. 180) |
| TCIA | 169 data collections, including cohorts from TCGA, CPTAC and other projects | 65,508 cases from 69 disease types, including cancer and non-cancer types (for example, COVID-19) | De-identifies and hosts cancer medical images for public download but, unlike IDC, does not provide a cloud computing environment. Parts of its data are included in IDC. Also includes some private data collections |
| GEO | 177,063 data series; 53,740 contain ‘cancer’ as a keyword | 5,102,810 samples; 1,118,082 contain ‘cancer’ as a keyword in their metadata | Hosts data submissions from various studies. It contains many individual biology studies that may support knowledge rediscovery |
| ArrayExpress | 16,345 experiments; 3,293 contain ‘cancer’ as a keyword | 894,309 samples; 236,935 contain ‘cancer’ as a keyword in their metadata | A popular genomics data repository |
| FDC | 81,883 human datasets deposited in GEO and ArrayExpress | 3,707,349 samples in total, not restricted to cancer | Helps researchers annotate metadata in GEO and ArrayExpress to enable automatic algorithmic analysis and knowledge rediscovery (ref. 34) |

CPTAC, Clinical Proteomic Tumour Analysis Consortium; FDC, Framework for Data Curation; GDC, Genomic Data Commons; GEO, Gene Expression Omnibus; IDC, Imaging Data Commons; TARGET, Therapeutically Applicable Research to Generate Effective Treatments; TCGA, The Cancer Genome Atlas; TCIA, The Cancer Imaging Archive.