Skip to main content
Data in Brief logoLink to Data in Brief
. 2018 Jun 26;19:1902–1906. doi: 10.1016/j.dib.2018.06.048

Dataset for regulation between lncRNAs and their nearby protein-coding genes in human cancers

Zhi Liu a, Juncheng Dai a, Hongbing Shen a,b,
PMCID: PMC6141275  PMID: 30229065

Abstract

This article contains data related to the research article entitled “Systematic analysis reveals long noncoding RNAs regulating neighboring transcription factors in human cancers” (Liu et al., 2018 in press) [1]. Long noncoding RNAs (lncRNAs) are proposed to play essential roles in modulating the expression of the nearby loci. In this study, we systematically investigated the relationship between lncRNAs and their neighboring genes based on the genomic location of genes and the transcriptome expression profiles from TCGA samples across 12 tumor types. Position conservation analysis was applied to find lncRNAs conserved by position across vertebrate species. Gene ontology and enrichment analysis identified TF genes as a specific type of protein-coding genes that adjacent to highly positionally conserved lncRNA. The expression correlation of lncRNAs and their adjacent TFs were assessed across tumors to define significant co-expressed lncRNA-TF pairs, and a causal inference test (CIT) was used to infer the causal regulation of lncRNA on its nearby TF genes. A list of candidate lncRNA/TF regulation pairs in tumors was provided.

Keywords: LncRNA, Cancer, Transcription factors


Specifications Table

Subject area Biology
More specific subject area Gene expression
Type of data Tables
How data was acquired Gene expression extracted from RNA-seq was downloaded from TANRIC and TCGA database.
Data format Analyzed
Experimental factors The expression of lncRNA and protein-coding genes were extracted from the total expression profiles.
Experimental features Position conservation analysis was conducted on lncRNAs across ten vertebrate species to find lncRNAs conserved by position. Co-expression and causal inference test were used to infer causal relationship between lncRNA and their adjacent TF genes.
Data source location N/A
Data accessibility With this article
Related research article Systematic analysis reveals long noncoding RNAs regulating neighboring transcription factors in human cancers. BBA molecular basis of disease. (In press)

Value of the data

  • The position conservation analysis of lncRNAs across species provides a reference for inferring the functionality of lncRNAs from the conservation perspective of view.

  • The significant adjacency between positional conserved lncRNA and TF genes provides clues to study the regulation mechanism of lncRNAs on gene expression.

  • The provided list of candidate lncRNA/TF regulation pairs can be used for experimental validation to investigate the function of lncRNA in tumors.

1. Data

1.1. GO enrichment of protein coding genes nearby lncRNA

GO items enriched by protein coding genes located in regions 1 Mb upstream and downstream lncRNA loci were presented in Table S1.

1.2. Position conservation of lncRNAs

The existence and absence of syntenic counterparts of human lncRNAs across other vertebrate species were listed in Table S2. LncRNAs that have syntenic lncRNAs in at least four species were classified as highly conserved ones (HC), and used in the following analysis. In total, 769 lncRNA/TF pairs were classified as HC pairs (Table S3). The detailed results were discribed [1].

1.3. Co-expression between lncRNA and TF genes

There were 266 of 769 HC lncRNA/TF pairs were significantly correlated in at least one tumor type, involving 159 TF genes and 253 lncRNAs (Table S4). Of those, 206 were consistently co-expressed in at least two tumor types.

1.4. Candidate lncRNA/TF regulation pairs

To prioritize the true lncRNA/TF regulatory pairs involved in tumors, we combined the results of co-expression (Table S4) and CIT (Table S5) and take advantage of pan-cancer dataset to define a confident list of pairs as those passed both co-expression test and CIT in more than two tumor types. Finally, we provided a list of 28 lncRNA/TF regulation pairs (Table 1).

Table 1.

Candidate lncRNA/TF regulation pairs in TCGA tumors.

lncRNA TF genes Tumor type
SENCR ETS1 BLCA,BRCA,HNSC,KIRC,LUAD,LUSC,OV,STAD
RP11-290F20.2 CEBPB BLCA,BRCA,HNSC,KIRC,LUSC,STAD
RP11-290F20.1 CEBPB BLCA,BRCA,HNSC,KIRC,LUSC,STAD
PVT1 MYC BRCA,HNSC,KIRC,LUSC,OV,STAD
KB-1732A1.1 KLF10 BLCA,BRCA,HNSC,KIRC,LUSC,OV
AF064858.8 ETS2 HNSC,KIRC,LUAD,LUSC,OV
AF064858.11 ETS2 HNSC,KIRC,LUAD,LUSC,OV
RP11-796E10.1 SP3 HNSC,LUAD,LUSC,STAD
RP11-57H14.4 TCF7L2 BRCA,LUAD,LUSC,OV
RP11-290F20.3 CEBPB BRCA,LUAD,LUSC,STAD
LINC00511 SOX9 BRCA,KIRC,LUAD,STAD
CASC15 SOX4 BRCA,KIRC,LUSC,STAD
RP6-109B7.4 PPARA BRCA,KIRC,OV
RP11-57A1.1 SOX9 KIRC,LUAD,STAD
RP11-567M16.1 NFATC1 HNSC,LUSC,OV
RP11-51B23.3 TEAD1 BLCA,BRCA,LUAD
RP11-472N13.3 ZEB1 BLCA,BRCA,STAD
RP11-439L18.2 HIVEP2 BRCA,KIRC,STAD
RP11-397A16.2 TCF4 HNSC,KIRC,LUSC
RP11-330O11.3 ZEB1 BLCA,LUAD,LUSC
RP11-221N13.3 HMGA2 BLCA,HNSC,LUSC
PITRM1-AS1 KLF6 BRCA,KIRC,LUAD
LINC01152 SOX9 BRCA,LIHC,LUSC
LINC00261 FOXA2 LUSC,OV,STAD
GATA6-AS1 GATA6 LUAD,LUSC,STAD
CTD-2532K18.2 MSX2 HNSC,KIRC,LUSC
CCAT1 MYC HNSC,LUSC,STAD

2. Experimental design, materials, and methods

2.1. Data and preprocessing

We downloaded TCGA lncRNA and coding gene expression data from the TANRIC database [2] (http://ibl.mdanderson.org/tanric/_design/basic/index.html) and Broad Institute GDAC firehose (http://gdac.broadinstitute.org) respectively. Only samples with paired lncRNA and mRNA expression profiles were used in this study. LncRNA with RPKM >0.1 and coding genes with RPKM >1 in at least 5% of the samples in each tumor types were retained for the following analysis (Table 2).

Table 2.

The number of tumor samples and expressed lncRNAs across tumors.

Tumor No. of tumor samples No. of expressed lncRNA
BLCA 252 5979
BRCA 837 5941
COAD 157 1612
HNSC 426 5149
KIRC 448 6183
KIRP 198 5864
LIHC 200 4969
LUAD 488 6288
LUSC 220 6206
OV 412 6197
STAD 285 6070
THCA 497 5122

2.2. Positional conservation of human lncRNAs across species

Annotations of protein-coding gene orthologs were obtained from EnsemblCompara [3], and lncRNA annotation in other ten species was downloaded from the NONCODE database [4]. To identify syntenic human lncRNAs in other species, we used the method proposed by Hezroni et al. [5]. Briefly, when comparing genome human (H) and species A, and when considering orthologous protein-coding genes G1 and G2 we first identified lncRNAs within 5×105×Genome length(H)/10^9nt of G1 in H and within 5×105Genome length(A)/10^9nt of G2 in A. A lncRNA was considered to be found “upstream” of the protein-coding gene when it overlapped it or ended 5′ to its 5′ end, and “downstream” when it overlapped it or started 3′ to the 3′ end of the protein-coding gene. Two lncRNA L1 and L2 from A and B were considered syntenic, if they were both upstream or both downstream of G1 and G2, with the same relative orientations.

2.3. Co-expression between lncRNA and their nearby TF genes

Pearson correlation coefficient was used to analyze the co-expression between lncRNA and their nearby TF genes. Co-expressed gene pairs were identified with an absolute Pearson correlation coefficient value ≥0.25 and an FDR-adjusted p-value ≤0.05.

2.4. Causal inference analysis of lncRNA/TF regulation

The lncRNA-TF-targets regulation relationships were assessed using the causal inference test (CIT) [6] to test the regulation chain and to select the possible lncRNA-TF regulation pairs. Briefly, the CIT has statistical tests for four conditions, all of which must be met for the TF -mediated causal classification: (1) lncRNA and TF target are associated, (2) lncRNA is associated with T F after adjusting for TF target, (3) TF is associated with TF target after adjusting for lncRNA, and (4) lncRNA is independent of TF target after adjusting for TF. The CIT p-value was defined as the maximum of the component test p values, and a multivariate linear regression was used in the four component tests. The targets of each TF were obtained from the TRRUST database [7], which collect transcriptional regulatory relationships unraveled by sentence-based text-mining.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant numbers 81703306), China Postdoctoral Science Foundation (Grant number 2017M611867), and Jiangsu Planned Projects for Postdoctoral Research Funds (Grant number 1701119C).

Footnotes

Appendix A

Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.06.048.

Transparency document

Transparency document associated with this article can be found in the online version at 10.1016/j.dib.2018.06.048.

Appendix A. Supplementary material

Supplementary material

mmc1.docx (9.7KB, docx)

.

Transparency document. Supplementary material

Supplementary material

mmc2.xlsx (1MB, xlsx)

.

References

  • 1.Liu Z., Dai J., Shen H. Systematic analysis reveals long noncoding RNAs regulating neighboring transcription factors in human tumors. BBA Mol. Basis Dis. 2018 doi: 10.1016/j.bbadis.2018.05.006. In press. [DOI] [PubMed] [Google Scholar]
  • 2.Li J., Han L., Roebuck P., Diao L., Liu L., Yuan Y., Weinstein J.N., Liang H. TANRIC: an interactive open platform to explore the function of lncRNAs in tumor. Tumor Res. 2015;75:3728–3737. doi: 10.1158/0008-5472.CAN-15-0273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Herrero J., Muffato M., Beal K., Fitzgerald S., Gordon L., Pignatelli M., Vilella A.J., Searle S.M.J., Amode R., Brent S., Spooner W., Kulesha E., Yates A., Flicek P. Ensembl comparative genomics resources. Database J. Biol. Databases Curation. 2016;2016:bwa53. doi: 10.1093/database/bav096. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Zhao Y., Li H., Fang S., Kang Y., wu W., Hao Y., Li Z., Bu D., Sun N., Zhang M.Q., Chen R. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res. 2016;44:D203–D208. doi: 10.1093/nar/gkv1252. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hezroni H., Koppstein D., Schwartz M.G., Avrutin A., Bartel D.P., Ulitsky I. Principles of long noncoding rna evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 2015;11:1110–1122. doi: 10.1016/j.celrep.2015.04.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Millstein J., Zhang B., Zhu J., Schadt E.E. Disentangling molecular relationships with a causal inference test. BMC Genet. 2009;10:23. doi: 10.1186/1471-2156-10-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Han H., Cho J.-W., Lee S., Yun A., Kim H., Bae D., Yang S., Kim C.Y., Lee M., Kim E., Lee S., Kang B., Jeong D., Kim Y., Jeon H.-N., Jung H., Nam S., Chung M., Kim J.-H., Lee I. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 2017;46 doi: 10.1093/nar/gkx1013. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material

mmc1.docx (9.7KB, docx)

Supplementary material

mmc2.xlsx (1MB, xlsx)

Articles from Data in Brief are provided here courtesy of Elsevier

RESOURCES