Abstract
Computational analysis of master regulators through the search for transcription factor binding sites followed by analysis of signal transduction networks of a cell is a new approach of causal analysis of multi-omics data.
This paper contains results on analysis of multi-omics data that include transcriptomics, proteomics and epigenomics data of methotrexate (MTX) resistant colon cancer cell line. The data were used for analysis of mechanisms of resistance and for prediction of potential drug targets and promising compounds for reverting the MTX resistance of these cancer cells. We present all results of the analysis including the lists of identified transcription factors and their binding sites in genome and the list of predicted master regulators – potential drug targets.
This data was generated in the study recently published in the article “Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer” (Kel et al., 2016) [4].
These data are of interest for researchers from the field of multi-omics data analysis and for biologists who are interested in identification of novel drug targets against NTX resistance.
Specifications Table
Subject area | Biology |
More specific subject area | Analysis of molecular mechanisms of diseases using NGS, microarrays and novel proteomics technologies |
Type of data | Table, text file, graph, figure |
How data was acquired | The data were generated with the help of geneXplain platform version 3.1 and 4.0, using databases: TRANSFAC release 2016.2 and TRANSPATH 2016.2. |
Data format | Filtered, analyzed |
Experimental factors | The samples were used from two states of the cell line of colon cancer HT29: sensitive cells line versus resistant cells. |
Experimental features | Different omics data were generated in different studies. We extracted the experimental raw data from three repositories: GEO for transcriptomics data, database PRIDE for proteomics data and SRA archives for the epigenomic ChIP-seq data. |
Data source location | Wolfenbuettel, Germany, 38302 |
Data accessibility | The data is with this article and the initial raw data files are located in the PRIDE database with the project accession number PRIDE:PRD000369(http://www.ebi.ac.uk/pride/archive/projects/PRD000369); gene expression data is at Gene Expression Omnibus, data entries GEO:GSE11440and GEO:GSE53602. |
Processed data and results of data analysis are available in this article and in the publicly accessible section of geneXplain platform at:http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/TFs/TF%20sel1%20Transpath%20peptides%20Up%20Upstream%2012%20HT29_protein_context%20viz10all&anonymous=true |
Value of the data
-
•
Lists of up-regulated and down-regulated genes in MTX resistant cells (Table 1A, 1B, Supplementary material) can help researchers to identify biomarkers of MTX resistance.
-
•
List of predicted transcription factor binding sites (Table. 2, Supplementary material) can be used by other researchers for designing further experiment for experimental validation of gene regulatory mechanisms of MTX resistance.
-
•
List of predicted master regulators (Table. 7, Supplementary material) that can be used for targeted knockout experiments to further investigate the molecular mechanisms of chemotherapy resistance of cancer.
1. Data
We here present the results of the analysis of the data of three different omics experiments, namely, transcriptomics, proteomics and epigenomics, that were performed independently in the same type of cell line. After necessary preprocessing of the obtained raw data we performed a special type of computational analysis, which we call “upstream analysis” that helps to integrate these three omics data types and identify master regulators of the methotrexate resistance of colon cancer. We identified master regulators through the search for transcription factor binding sites followed by analysis of signal transduction networks of the cancer cells under study. The found master regulators helped to identify chemical compounds and existing drugs as inhibitors of those master regulators and therefore as potentially helpful for reverting the obtained MTX resistance.
2. Experimental design, materials and methods
-
1)
At the first step we analysed the transcriptomics data and compared the MTX resistant and MTX sensitive cells. We revealed differentially expressed genes (DEG) using Limma analysis [1] with the p-value cut-off 0.05 (corrected for the multiple testing). Among them, we found 1951 up-regulated genes (Table 1A Up-regulated genes in_MTXresistant Ensembl.txt, Supplementary material) and 2185 down-regulated genes (Table 1B Down-regulated genes in_MTXresistant Ensembl.txt, Supplementary material). Also we extracted a list of genes that did not have significant differences between MTX sensitive and MTX resistant cells (with p-value>0.5 and LogFC>−0.01 and <0.01) (Table 1C, Non-changed genes in_MTXresistant Ensembl.txt, Supplementary material).
-
2)
At the next step we applied the F-Match algorithm [2] and identified transcription factor binding sites that are overrepresented in promoters of MTX resistant cells in comparison with promoters of MTX sensitive cells. The promoter length was defined from −1000 bp till +100 bp around the transcription start site (TSS). We selected 16 TRANSFAC position weight matrices (PWMs) according to their p-value and frequency ratio cut-offs (P_value<0.01 & Yes_No_ratio>1.2).
(Table 2 Site optimisation summary Up-regulated genes Ensembl FC1.5 sites -1000.100 non-redundant_minSUM_filtered.txt, Supplementary material)
-
3)
At the next step we applied the CMA algorithm [3] and identified pairs of transcription factor binding sites overrepresented at the promoters of up-regulated genes in MTX resistant cells. We identified 6 pairs of TRANSFAC PWMs the matches of which are clustered in these promoters.
Link: (also showing the positions of the identified site pairs in the promoters of the up-regulated genes under study)
Table 3 CMA sites in promoters UpFC1.5 track.interval in Supplementary materials gives genomic coordinates (build GRCh37) of the identified transcription factor binding site pairs in the promoters of the up-regulated genes under study.
-
4)
At the next step we identified peaks of the CDK8 antibody ChIP-seq data in HT29 cell line using the peak calling program MACS [26] (without control and with almost all default parameters, except parameter “Enrichment ratio”, which was set to value 5 in order to achieve higher number of peaks). We identified 29,400 peaks of CDK8 complex binding in the whole human genome. These peaks were mapped to the vicinity of 17,115 genes in human genome (−2000 +2000 around 5′ and 3′ borders of the genes). The information about all these genes with the position of these peaks and the schemas of peak locations in the gene structure is presented here:
Link:
We retrieved the common genes of this list with the list of upregulated genes in MTX resistant cellsand identified 1347 genes that contain such peaks in their potential regulatory regions (in 5′ regions, in introns, and 3′ regions of the genes). The result of such overlap is shown in Fig. 1 below.
As a result we extracted 710 genomic intervals of 400 bp length each around summits of CDK8 peaks in the up-regulated genes. We consider these intervals as potential MTX resistance enhancers.
(Table 4 CDK8_400_summit_UpFC1.0_in_MTXresistant.interval, Supplementary material).
-
5)
We performed a site frequency analysis (F-Match) and composite site analysis (CMA) in those MTX resistance enhancers in a similar same way as we did in promoters of Up-regulated genes. The results of this analysis is present in Fig. 2 below (see also the data in Table 5 Site optimization summary_CDK8_400_summit_DnFC1.0.txt, Supplementary material).
-
6)
At the next step we performed the master regulator search as it is described in [2] with a modified algorithm described in the paper [4], using proteomics data as “context proteins”. The proteomics data were matched to the proteins in TRANSPATH database [5]. The list of the TRANSPATH matched proteins found in HT29 cell line is in Table 6 HT29_colon_cancer_cell_line Ensembl proteins Proteins Transpath peptides a annotated.txt, Supplementary material.
The master regulator search revealed 48 master-regulator proteins that were either found by the proteomics analysis or whose genes were significantly up-regulated. The list of all revealed master regulators is presented in Table 7 Master regulators from TFs filtered.txt, Supplementary material.
Link:
Acknowledgements
This work was done with the financial support of Targeted Program “Research and Development on Priority Directions of Science and Technology in Russia, 2014–2021”, Contract no. 14.604.21.0101, Unique Identifier of the Applied Scientific Project: RFMEFI60414×0101. The work was partially supported (VP) in the framework of the Russian State Academies of Sciences Fundamental Research Program for 2013–2020. This work was also supported by the following grants of the EU FP7 program: “SysMedIBD” no. 305564, “RESOLVE” no. 305707 and “MIMOMICS” no. 305280.
Footnotes
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.11.096.
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.11.096.
Transparency document. Supplementary material
.
Appendix A. Supplementary material
.
References
- 1.Smyth G.K. Limma. Linear models for microarray data. In: Gentleman R., Carey V., Dudoit S., Irizarry R., Huber W., editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer; New York: 2005. pp. 397–420. [Google Scholar]
- 2.Kel A., Voss N., Jauregui R., Kel-Margoulis O., Wingender E. Beyond microarrays: find key transcription factors controlling signal transduction pathways. BMC Bioinform. 2006;7:S13. doi: 10.1186/1471-2105-7-S2-S13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Waleev T., Shtokalo D., Konovalova T., Voss N., Cheremushkin E., Stegmaier P., Kel-Margoulis O., Wingender E., Kel A. Composite module analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 2006:W541–W545. doi: 10.1093/nar/gkl342. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Kel A., Stegmaier P., Valeev T., Koschmann J., Poroikov V., Kel-Margoulis O., Wingender E. Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer. EuPA Open Proteom. 2016;34:1–6. doi: 10.1016/j.euprot.2016.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Krull M., Pistor S., Voss N., Kel A., Reuter I., Kronenberg D., Michael H., Schwarzer K., Potapov A., Choi C., Kel-Margoulis O., Wingender E. TRANSPATH: an information resource for storing and visualizing signaling pathways and their pathological aberrations. Nucleic Acids Res. 2006;34:D546–D551. doi: 10.1093/nar/gkj107. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.