Data in Brief. 2016 Dec 6;10:499–504. doi: 10.1016/j.dib.2016.11.096

Data on master regulators and transcription factor binding sites found by upstream analysis of multi-omics data on methotrexate resistance of colon cancer

Alexander E Kel a,b,c,
PMCID: PMC5196090  PMID: 28054015

Abstract

Computational analysis of master regulators, through a search for transcription factor binding sites followed by analysis of the cell's signal transduction networks, is a new approach to causal analysis of multi-omics data.

This paper contains the results of an analysis of multi-omics data, comprising transcriptomics, proteomics and epigenomics data, from a methotrexate (MTX) resistant colon cancer cell line. The data were used to analyze the mechanisms of resistance and to predict potential drug targets and promising compounds for reverting the MTX resistance of these cancer cells. We present all results of the analysis, including the lists of identified transcription factors and their binding sites in the genome and the list of predicted master regulators, which are potential drug targets.

These data were generated in the study recently published in the article “Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer” (Kel et al., 2016) [4].

These data are of interest to researchers in the field of multi-omics data analysis and to biologists interested in identifying novel drug targets against MTX resistance.


Specifications Table

Subject area: Biology
More specific subject area: Analysis of molecular mechanisms of diseases using NGS, microarrays and novel proteomics technologies
Type of data: Table, text file, graph, figure
How data was acquired: The data were generated with the geneXplain platform, versions 3.1 and 4.0, using the databases TRANSFAC release 2016.2 and TRANSPATH 2016.2.
Data format: Filtered, analyzed
Experimental factors: Samples were taken from two states of the colon cancer cell line HT29: MTX-sensitive cells versus MTX-resistant cells.
Experimental features: Different omics data were generated in different studies. We extracted the experimental raw data from three repositories: GEO for transcriptomics data, the PRIDE database for proteomics data and the SRA archives for the epigenomic ChIP-seq data.
Data source location: Wolfenbuettel, Germany, 38302
Data accessibility: The data are provided with this article. The initial raw data files are located in the PRIDE database under the project accession number PRIDE:PRD000369 (http://www.ebi.ac.uk/pride/archive/projects/PRD000369); gene expression data are at Gene Expression Omnibus, data entries GEO:GSE11440 and GEO:GSE53602. Processed data and results of the data analysis are available in this article and in the publicly accessible section of the geneXplain platform at: http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/TFs/TF%20sel1%20Transpath%20peptides%20Up%20Upstream%2012%20HT29_protein_context%20viz10all&anonymous=true

Value of the data

  • Lists of up-regulated and down-regulated genes in MTX resistant cells (Table 1A, 1B, Supplementary material) can help researchers to identify biomarkers of MTX resistance.

  • The list of predicted transcription factor binding sites (Table 2, Supplementary material) can be used by other researchers to design further experiments that validate the gene regulatory mechanisms of MTX resistance.

  • The list of predicted master regulators (Table 7, Supplementary material) can be used for targeted knockout experiments to further investigate the molecular mechanisms of chemotherapy resistance of cancer.

1. Data

Here we present the results of analyzing data from three different omics experiments (transcriptomics, proteomics and epigenomics) that were performed independently in the same type of cell line. After the necessary preprocessing of the raw data, we performed a special type of computational analysis, which we call “upstream analysis”, that integrates these three omics data types and identifies master regulators of the methotrexate resistance of colon cancer. We identified master regulators through a search for transcription factor binding sites followed by analysis of the signal transduction networks of the cancer cells under study. The master regulators found helped to identify chemical compounds and existing drugs that inhibit those master regulators and are therefore potentially helpful for reverting the acquired MTX resistance.

2. Experimental design, materials and methods

Link:

http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/HT29_ChIP-seq/Track%20genes&anonymous=true.

We intersected this list with the list of genes up-regulated in MTX resistant cells and identified 1347 genes that contain such peaks in their potential regulatory regions (in 5′ regions, in introns, and in 3′ regions of the genes). The result of this overlap is shown in Fig. 1 below.

Fig. 1. Venn diagram of the overlap between genes associated with at least one peak of the CDK8 antibody ChIP-seq signal and the list of genes up-regulated in MTX resistant cells.
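The overlap summarized in Fig. 1 amounts to a set intersection of two gene lists. The following is a minimal Python sketch under the assumption that each list is a plain collection of gene identifiers; the function name `venn_counts` and the toy identifiers are hypothetical and are not taken from the dataset.

```python
def venn_counts(peak_genes, upregulated_genes):
    """Return the sizes of the three regions of a two-set Venn diagram."""
    peaks = set(peak_genes)
    up = set(upregulated_genes)
    return {
        "peak_only": len(peaks - up),         # genes with a peak, not up-regulated
        "both": len(peaks & up),              # genes in the overlap
        "upregulated_only": len(up - peaks),  # up-regulated genes without a peak
    }


# Hypothetical toy identifiers, for illustration only.
peak_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
upregulated_genes = ["GENE_B", "GENE_D", "GENE_E"]

counts = venn_counts(peak_genes, upregulated_genes)
print(counts)
```

Converting both lists to sets first removes duplicate identifiers, so each gene is counted once per region.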

As a result, we extracted 710 genomic intervals, each 400 bp long, around the summits of CDK8 peaks in the up-regulated genes (Table 4 CDK8_400_summit_UpFC1.0_in_MTXresistant.interval, Supplementary material). We consider these intervals to be potential MTX resistance enhancers.
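Extracting a fixed-length interval around a peak summit can be sketched as below. This is an illustration under stated assumptions, not the procedure used on the geneXplain platform: it assumes a window centered symmetrically on the summit, 0-based half-open coordinates, and no clipping at the chromosome end. The names `Interval` and `summit_window` are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def summit_window(chrom, summit, width=400):
    """Return a window of `width` bp centered on `summit`.

    If the summit lies closer than width/2 to the chromosome start,
    the window is shifted right so that it keeps its full width.
    """
    half = width // 2
    start = max(0, summit - half)
    return Interval(chrom, start, start + width)


# Hypothetical summit positions, for illustration only.
windows = [summit_window("chr1", 1000), summit_window("chr1", 50)]
print(windows)
```

Each returned interval has `end - start == width`, which matches the fixed 400 bp length described above.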

  • 5)

    We performed a site frequency analysis (F-Match) and a composite site analysis (CMA) on those MTX resistance enhancers in the same way as we did for the promoters of up-regulated genes. The results of this analysis are presented in Fig. 2 below (see also the data in Table 5 Site optimization summary_CDK8_400_summit_DnFC1.0.txt, Supplementary material).

  • 6)

    In the next step, we performed the master regulator search as described in [2], with the modified algorithm described in [4], using proteomics data as “context proteins”. The proteomics data were matched to the proteins in the TRANSPATH database [5]. The list of TRANSPATH-matched proteins found in the HT29 cell line is in Table 6 HT29_colon_cancer_cell_line Ensembl proteins Proteins Transpath peptides a annotated.txt, Supplementary material.
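The general idea of an upstream search in a signaling network can be illustrated with a simplified sketch. It is not the algorithm of [2] or its modification in [4]: it assumes a directed graph whose edges point from an upstream molecule to its downstream target, uses a plain breadth-first search with a fixed step limit, and ranks candidates by how many of the identified transcription factors they reach. The optional `allowed` set is a crude stand-in for restricting candidates to proteins supported by other data; all names and the toy network are hypothetical.

```python
from collections import deque


def reachable_within(graph, source, max_steps):
    """Return nodes reachable from `source` in at most `max_steps` edges (BFS)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(source)
    return seen


def rank_master_regulators(graph, target_tfs, max_steps, allowed=None):
    """Rank candidate nodes by the number of target TFs they reach downstream."""
    targets = set(target_tfs)
    scores = {}
    for candidate in graph:
        if allowed is not None and candidate not in allowed:
            continue  # assumed filter: keep only externally supported proteins
        hits = reachable_within(graph, candidate, max_steps) & targets
        if hits:
            scores[candidate] = len(hits)
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: upstream node -> list of downstream nodes.
graph = {
    "R1": ["K1", "K2"],
    "R2": ["K2"],
    "K1": ["TF_A"],
    "K2": ["TF_B", "TF_C"],
}
ranking = rank_master_regulators(graph, ["TF_A", "TF_B", "TF_C"], max_steps=2)
print(ranking)
```

In this toy network `R1` ranks first because it reaches all three transcription factors within two steps. A realistic implementation would also weight edges and assess the significance of each score, which this sketch omits.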

Fig. 2. Result of the composite site analysis (CMA) in MTX resistance enhancers. Detailed information on the search algorithm is given. Module 1 represents the list of PWMs that the algorithm included in the composite module. Two histograms, red and blue, show the difference in the score of the composite module between the Yes-set (enhancers) and the No-set (non-regulated regions of the genome).
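How well a score separates the Yes-set from the No-set can be quantified in several ways. The sketch below computes one common measure, the probability that a randomly chosen Yes-set score exceeds a randomly chosen No-set score (equivalent to the area under the ROC curve). This is an illustrative choice and is not claimed to be the fitness function used by CMA; the function name and the toy scores are hypothetical.

```python
def separation_auc(yes_scores, no_scores):
    """Probability that a Yes-set score exceeds a No-set score (ties count 0.5)."""
    if not yes_scores or not no_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for y in yes_scores:
        for n in no_scores:
            if y > n:
                wins += 1.0
            elif y == n:
                wins += 0.5
    return wins / (len(yes_scores) * len(no_scores))


# Hypothetical module scores, for illustration only.
yes_scores = [3.0, 4.0, 5.0]  # e.g. enhancer sequences
no_scores = [1.0, 2.0, 3.0]   # e.g. background sequences
auc = separation_auc(yes_scores, no_scores)
print(round(auc, 3))
```

A value near 1.0 means the two histograms barely overlap, while 0.5 means the score does not distinguish the two sets. The pairwise loop is quadratic in the set sizes, which is acceptable for a sketch but would be replaced by a rank-based computation for large sets.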

The master regulator search revealed 48 master-regulator proteins that were either found by the proteomics analysis or whose genes were significantly up-regulated. The list of all revealed master regulators is presented in Table 7 Master regulators from TFs filtered.txt, Supplementary material.

Link:

http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/TFs/TF%20sel1%20Transpath%20peptides%20Up%20Upstream%2010%20HT29_protein_context%20annotated%20filtered&anonymous=true.

Acknowledgements

This work was done with the financial support of Targeted Program “Research and Development on Priority Directions of Science and Technology in Russia, 2014–2021”, Contract no. 14.604.21.0101, Unique Identifier of the Applied Scientific Project: RFMEFI60414X0101. The work was partially supported (VP) in the framework of the Russian State Academies of Sciences Fundamental Research Program for 2013–2020. This work was also supported by the following grants of the EU FP7 program: “SysMedIBD” no. 305564, “RESOLVE” no. 305707 and “MIMOMICS” no. 305280.

Footnotes

Transparency document

Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.11.096.

Appendix A

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.11.096.

Transparency document. Supplementary material: mmc1.pdf (1.2MB, pdf)

Appendix A. Supplementary material: mmc2.zip (557.6KB, zip)

References

  • 1. Smyth G.K. Limma. Linear models for microarray data. In: Gentleman R., Carey V., Dudoit S., Irizarry R., Huber W., editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer; New York: 2005. pp. 397–420.
  • 2. Kel A., Voss N., Jauregui R., Kel-Margoulis O., Wingender E. Beyond microarrays: find key transcription factors controlling signal transduction pathways. BMC Bioinform. 2006;7:S13. doi: 10.1186/1471-2105-7-S2-S13.
  • 3. Waleev T., Shtokalo D., Konovalova T., Voss N., Cheremushkin E., Stegmaier P., Kel-Margoulis O., Wingender E., Kel A. Composite module analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 2006:W541–W545. doi: 10.1093/nar/gkl342.
  • 4. Kel A., Stegmaier P., Valeev T., Koschmann J., Poroikov V., Kel-Margoulis O., Wingender E. Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer. EuPA Open Proteom. 2016;34:1–6. doi: 10.1016/j.euprot.2016.09.002.
  • 5. Krull M., Pistor S., Voss N., Kel A., Reuter I., Kronenberg D., Michael H., Schwarzer K., Potapov A., Choi C., Kel-Margoulis O., Wingender E. TRANSPATH: an information resource for storing and visualizing signaling pathways and their pathological aberrations. Nucleic Acids Res. 2006;34:D546–D551. doi: 10.1093/nar/gkj107.
