Bioinformatics. 2013 Dec 6;30(4):590–592. doi: 10.1093/bioinformatics/btt710

ChromoHub V2: cancer genomics

Muhammad A Shah 1, Emily L Denton 1, Lihua Liu 1, Matthieu Schapira 1,2,*
PMCID: PMC3928521  PMID: 24319001

Abstract

Summary: Cancer genomics data produced by next-generation sequencing support the notion that epigenetic mechanisms play a central role in cancer. We have previously developed Chromohub, an open access online interface where users can map chemical, structural and biological data from public repositories on phylogenetic trees of protein families involved in chromatin-mediated signaling. Here, we describe a cancer genomics interface that was recently added to Chromohub; the frequency of mutation, amplification and change in expression of chromatin factors across large cohorts of cancer patients is regularly extracted from The Cancer Genome Atlas and the International Cancer Genome Consortium and can now be mapped on phylogenetic trees of epigenetic protein families. Explorers of chromatin signaling can now easily navigate the cancer genomics landscape of writers, readers and erasers of histone marks, chromatin remodeling complexes, histones and their chaperones.

Availability and implementation: http://www.thesgc.org/chromohub/.

Contact: matthieu.schapira@utoronto.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Chromohub is an online interface that allows the epigenetics research community to project biological, structural and chemical data on phylogenetic trees of protein families involved in chromatin-mediated signaling (Liu et al., 2012). The interface is a useful hub for cell biologists to find chemical inhibitors targeting their proteins of interest, medicinal chemists to inspect the structural coverage of specific binding sites or structural biologists to visualize the disease association of phylogenetic neighbors to the construct they crystallized. We previously described how protein families were assembled, phylogenetic trees generated and biological, structural and chemical data extracted from public repositories and mapped on the trees (Liu et al., 2012). We have now added to Chromohub a large section entirely focused on genomic data from cancer patients extracted from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).

Recent landmark next-generation sequencing campaigns of large cancer patient cohorts have revealed recurrent alterations of genes involved in epigenetic mechanisms (Biankin et al., 2012; Dalgliesh et al., 2010; Ellis et al., 2012; Ho et al., 2013; Jones et al., 2012; Le Gallo et al., 2012; Morin et al., 2011; Pugh et al., 2012; Robinson et al., 2012; Schwartzentruber et al., 2012; Stephens et al., 2012; Varela et al., 2011; Zhang et al., 2012). These results support the notion that chromatin-mediated signaling may be central to cancer initiation and progression (Baylin and Jones, 2011; You and Jones, 2012). The data associated with most of these and other unbiased cancer genomic projects were deposited into TCGA and the ICGC repositories, and made publicly accessible to the scientific community. Chromohub users can now map cancer genomics data on phylogenetic trees of protein families involved in epigenetic mechanisms.

2 METHODS

2.1 Data sources

RNASeq gene expression data, promoter and full genome methylation data and somatic mutation data were downloaded from TCGA’s Firehose data run (https://confluence.broadinstitute.org/display/GDAC/Dashboard-Stddata). GISTIC copy number data were downloaded via TCGA’s Firehose analyses run (https://confluence.broadinstitute.org/display/GDAC/Dashboard-Analyses). Somatic mutation data were also extracted from ICGC’s Data Portal (http://dcc.icgc.org/). All data were stored in a MySQL database. A list detailing all datasets underlying Chromohub’s cancer genomics interface as of November 2013 is provided in Supplementary Table S1.

2.2 Somatic mutations

Only data derived from patients with both a tumor and a matched normal sample were used. For each patient, identified by an anonymized patient identification code, the overall number of genes mutated within the patient’s genome is stored and is used to filter out hypermutated genomes. A protein image is presented showing all mutations matching the set cutoffs; hovering over a mutation shows the amino acid change. When not explicitly specified by TCGA or ICGC, amino acid mutations are derived from genomic location, strand and mutated nucleotide.
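The hypermutation filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Chromohub's implementation: the function name, the input layout (one `(patient_id, gene)` pair per somatic mutation, plus the full list of patients with matched tumor and normal samples) and the cutoff parameter are all hypothetical.

```python
from collections import defaultdict


def mutation_frequency(mutations, cohort, max_mutated_genes):
    """Fraction of retained patients in which each gene is mutated.

    mutations: iterable of (patient_id, gene) pairs (hypothetical layout).
    cohort: all patient IDs with matched tumor and normal samples.
    max_mutated_genes: patients with more mutated genes are excluded.
    """
    genes_by_patient = {patient_id: set() for patient_id in cohort}
    for patient_id, gene in mutations:
        if patient_id in genes_by_patient:
            genes_by_patient[patient_id].add(gene)

    # Drop hypermutated genomes using the per-patient count of mutated genes.
    retained = [
        genes for genes in genes_by_patient.values()
        if len(genes) <= max_mutated_genes
    ]
    if not retained:
        return {}

    counts = defaultdict(int)
    for genes in retained:
        for gene in genes:
            counts[gene] += 1
    return {gene: n / len(retained) for gene, n in counts.items()}
```

Counting each gene once per patient keeps the result a per-sample frequency, which is the quantity that is mapped on the trees.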

2.3 RNASeq gene expression

Only data from patients with matched tumor and normal samples were used. RSEM values are used to quantify messenger RNA (mRNA) expression levels (RNASeq V2 data). A log2 fold change in gene expression is calculated from RSEM values of tumor and matched normal samples as follows:

$$\log_2(\text{fold change}) = \log_2\left(\frac{\mathrm{RSEM}_{\mathrm{tumor}}}{\mathrm{RSEM}_{\mathrm{normal}}}\right)$$

Underexpressed genes have negative log2 values; overexpressed genes have positive log2 values. Each gene is also assigned a rank, determined by ordering all genes with available data by their frequency of over- or underexpression at the specified cutoffs.
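As a rough sketch of this calculation, the Python below computes the log2 fold change for matched tumor and normal RSEM values, the frequency of overexpression at a cutoff, and a rank per gene. The function names and input layout are hypothetical, and skipping pairs with a zero RSEM value is an assumption; the text does not say how such values are handled.

```python
import math


def log2_fold_change(rsem_tumor, rsem_normal):
    """log2 of the tumor/normal RSEM ratio; negative means underexpressed."""
    return math.log2(rsem_tumor / rsem_normal)


def overexpression_frequency(pairs, cutoff):
    """Fraction of patients whose log2 fold change is at or above the cutoff.

    pairs: list of (rsem_tumor, rsem_normal) for one gene, one per patient.
    Pairs containing a zero are skipped (assumption, not from the source).
    """
    changes = [log2_fold_change(t, n) for t, n in pairs if t > 0 and n > 0]
    if not changes:
        return None
    return sum(change >= cutoff for change in changes) / len(changes)


def rank_genes(frequencies):
    """Rank genes from most to least frequently altered (1 = most frequent)."""
    ordered = sorted(frequencies, key=frequencies.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}
```

The same frequency function can be mirrored for underexpression by testing `change <= -cutoff` instead.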

2.4 Copy number variation

The GISTIC 2.0 algorithm (Mermel et al., 2011) is used to produce copy number variation data. This preprocessing step is conducted by TCGA’s GDAC Firehose and the results are provided. For each patient, identified by an anonymized patient identification code, the overall number of genes with gains/losses within the patient’s genome is stored and is used to filter out genomes with a high number of aberrations.
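A minimal Python sketch of this filter follows, assuming per-gene GISTIC calls are available as integers from -2 to 2 for each patient. The labels mirror the correspondence given in the Results section; the function name, the nested-dictionary layout and the cutoff parameter are hypothetical.

```python
# Meaning of thresholded GISTIC values, as described in the Results section.
GISTIC_LABELS = {
    2: "high copy number gain",
    1: "low copy number gain",
    -1: "heterozygous deletion",
    -2: "homozygous deletion",
}


def copy_number_frequency(gistic, gene, value, max_altered_genes):
    """Fraction of retained patients with the given GISTIC value for a gene.

    gistic: {patient_id: {gene: GISTIC value}} (hypothetical layout).
    max_altered_genes: patients with more genes gained or lost are excluded.
    """
    retained = [
        calls for calls in gistic.values()
        if sum(v != 0 for v in calls.values()) <= max_altered_genes
    ]
    if not retained:
        return None
    return sum(calls.get(gene, 0) == value for calls in retained) / len(retained)
```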

2.5 GISTIC copy number variation versus RNASeq gene expression

Anonymous patient identification numbers, provided by TCGA, were used to identify patients for whom both GISTIC copy number and RNASeq gene expression data were available. These data were used to find correlations between copy number variation and gene expression levels in tumor samples.
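The matching step can be illustrated as an intersection of patient identifiers followed by a correlation over the shared patients. The text does not name the correlation measure, so the Pearson coefficient used here is an assumption, as are the function names and the dictionary layout.

```python
import math


def matched_values(copy_number, expression):
    """Return paired values for patients present in both datasets.

    copy_number, expression: {patient_id: value} for one gene (hypothetical).
    """
    shared = sorted(set(copy_number) & set(expression))
    return [copy_number[p] for p in shared], [expression[p] for p in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient; None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation in one variable
    return sxy / math.sqrt(sxx * syy)
```

Patients present in only one of the two datasets are dropped before the coefficient is computed.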

2.6 Promoter methylation in cancer

Promoter methylation data are downloaded exclusively from TCGA’s Firehose, but are derived from two platforms, Human Methylation 27 k (strictly promoter methylation) and Human Methylation 450 k (whole genome methylation). For the Human Methylation 450 k array, the promoter was defined as the 1000 bp upstream of the transcription start site, which was determined for all genes using coordinates from the refGene table from the UCSC table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start).
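One way to derive such a promoter window from refGene coordinates is sketched below. It assumes the table's `txStart`, `txEnd` and `strand` fields with 0-based, half-open coordinates, so that the transcription start site is `txStart` on the plus strand and `txEnd` on the minus strand. The function names are hypothetical, and probe positions are assumed to be 0-based as well.

```python
def promoter_window(tx_start, tx_end, strand, upstream=1000):
    """Return the half-open interval covering `upstream` bp before the TSS.

    Upstream means lower coordinates on the plus strand and higher
    coordinates on the minus strand.
    """
    if strand == "+":
        return max(0, tx_start - upstream), tx_start
    if strand == "-":
        return tx_end, tx_end + upstream
    raise ValueError(f"unexpected strand: {strand!r}")


def probe_in_promoter(position, window):
    """True if a 0-based probe position falls inside the promoter window."""
    start, end = window
    return start <= position < end
```

Handling the strand explicitly matters because a fixed "txStart minus 1000" rule would select the 3' end of genes on the minus strand.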

3 RESULTS

Rather than listing gene-specific links to existing cancer genomics portals, Chromohub provides integrated data focused on chromatin signaling. On phylogenetic trees of protein families involved in epigenetic mechanisms, users can visualize the percentage of tumor samples across large patient cohorts in which a gene is mutated (compared with a non-tumor sample from the same patient). Highly mutated genomes can be excluded from the analysis by setting a threshold for the maximum number of genes mutated in a sample. The output is grouped by cancer type. As of October 2013, 16 cancer types are represented by cohorts of >100 patients. High or low copy number gains as well as heterozygous and homozygous deletions [corresponding to GISTIC values of 2, 1, −1 and −2, respectively (Mermel et al., 2011)] can also be plotted on phylogenetic trees. Statistically relevant data (>100 patients) are available for nine cancer types. Unlike mutation data, copy numbers are compared with those in the reference human genome.

In addition to chromosomal aberrations, changes in transcription profiles are also available: mRNA levels are compared between tumor and non-tumor samples from the same patient and tissue. This provides a bird’s eye view of genes that are overexpressed or repressed in specific cancer types for any protein family related to epigenetic mechanisms. Orthogonal data types can be projected on a tree simultaneously. For instance, combining mRNA expression and mutation data, users can rapidly see that the histone methyltransferase MLL3 is mutated in 7% (54 of 776) and repressed in 21% (23 of 107) of breast cancer samples, suggesting that this gene acts as a tumor suppressor.

Change in expression of a given gene is generally not driving cancer initiation or progression, but simply a passenger event (Hanahan and Weinberg, 2011), unless it is directly caused by a chromosomal amplification or deletion (Beroukhim et al., 2007; Eifert and Powers, 2012). To identify candidate driver events affecting chromatin factors, Chromohub allows users to automatically highlight genes where overexpression correlates with copy number gains. Using this approach, one can rapidly see that, among genes containing a Tudor domain (which binds methylated lysines and arginines), FXR1 is overexpressed and amplified in 53% (18 of 34) of lung squamous cell carcinoma patients.
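The kind of figure quoted above (a gene both overexpressed and amplified in a fraction of patients) can be illustrated with a small co-occurrence count. This is a sketch only: the function name, the cutoffs and the dictionary layout are hypothetical, and the text does not specify how Chromohub decides that overexpression correlates with copy number gain.

```python
def co_altered_fraction(log2_fc, gistic, fc_cutoff=1.0, gain_cutoff=1):
    """Fraction of patients in which a gene is both overexpressed and gained.

    log2_fc: {patient_id: log2 fold change} for one gene (hypothetical).
    gistic: {patient_id: GISTIC value} for the same gene (hypothetical).
    Only patients present in both datasets are considered.
    """
    shared = set(log2_fc) & set(gistic)
    if not shared:
        return None
    hits = sum(
        log2_fc[p] >= fc_cutoff and gistic[p] >= gain_cutoff for p in shared
    )
    return hits / len(shared)
```

A result of 18 hits among 34 shared patients would be reported as 53%, since 18 / 34 is approximately 0.529.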

4 CONCLUSION

Dysregulation of the chromatin signaling platform plays a major role in cancer (Baylin and Jones, 2011; Timp and Feinberg, 2013; You and Jones, 2012); chromosomal aberrations and transcriptional alteration affecting chromatin factors can drive initiation and development of specific cancer types. The new Chromohub interface is a simple tool to navigate the cancer genomics of epigenetic mechanisms.

Funding: The SGC is a registered charity (1097737) that receives funds from AbbVie, Boehringer Ingelheim, the Canada Foundation for Innovation, the Canadian Institutes for Health Research, Genome Canada through the Ontario Genomics Institute [OGI-055], GlaxoSmithKline, Janssen, Eli Lilly Canada, the Novartis Research Foundation, the Ontario Ministry of Economic Development and Innovation, Pfizer, Takeda and the Wellcome Trust [092809/Z/10/Z].

Conflicts of Interest: none declared.

REFERENCES

  1. Baylin SB, Jones PA. A decade of exploring the cancer epigenome–biological and translational implications. Nat. Rev. Cancer. 2011;11:726–734. doi: 10.1038/nrc3130.
  2. Beroukhim R, et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. U. S. A. 2007;104:20007–20012. doi: 10.1073/pnas.0710052104.
  3. Biankin AV, et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature. 2012;491:399–405. doi: 10.1038/nature11547.
  4. Dalgliesh GL, et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature. 2010;463:360–363. doi: 10.1038/nature08672.
  5. Eifert C, Powers RS. From cancer genomes to oncogenic drivers, tumour dependencies and therapeutic targets. Nat. Rev. Cancer. 2012;12:572–578. doi: 10.1038/nrc3299.
  6. Ellis MJ, et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature. 2012;486:353–360. doi: 10.1038/nature11143.
  7. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–674. doi: 10.1016/j.cell.2011.02.013.
  8. Ho AS, et al. The mutational landscape of adenoid cystic carcinoma. Nat. Genet. 2013;45:791–798. doi: 10.1038/ng.2643.
  9. Jones DT, et al. Dissecting the genomic complexity underlying medulloblastoma. Nature. 2012;488:100–105. doi: 10.1038/nature11284.
  10. Le Gallo M, et al. Exome sequencing of serous endometrial tumors identifies recurrent somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes. Nat. Genet. 2012;44:1310–1315. doi: 10.1038/ng.2455.
  11. Liu L, et al. ChromoHub: a data hub for navigators of chromatin-mediated signalling. Bioinformatics. 2012;28:2205–2206. doi: 10.1093/bioinformatics/bts340.
  12. Mermel CH, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12:R41. doi: 10.1186/gb-2011-12-4-r41.
  13. Morin RD, et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011;476:298–303. doi: 10.1038/nature10351.
  14. Pugh TJ, et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature. 2012;488:106–110. doi: 10.1038/nature11329.
  15. Robinson G, et al. Novel mutations target distinct subgroups of medulloblastoma. Nature. 2012;488:43–48. doi: 10.1038/nature11213.
  16. Schwartzentruber J, et al. Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma. Nature. 2012;482:226–231. doi: 10.1038/nature10833.
  17. Stephens PJ, et al. The landscape of cancer genes and mutational processes in breast cancer. Nature. 2012;486:400–404. doi: 10.1038/nature11017.
  18. Timp W, Feinberg AP. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat. Rev. Cancer. 2013;13:497–510. doi: 10.1038/nrc3486.
  19. Varela I, et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature. 2011;469:539–542. doi: 10.1038/nature09639.
  20. You JS, Jones PA. Cancer genetics and epigenetics: two sides of the same coin? Cancer Cell. 2012;22:9–20. doi: 10.1016/j.ccr.2012.06.008.
  21. Zhang J, et al. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature. 2012;481:157–163. doi: 10.1038/nature10725.

Articles from Bioinformatics are provided here courtesy of Oxford University Press