LnCompare: gene set feature analysis for human long non-coding RNAs

Joana Carlevaro-Fita; Leibo Liu; Yuan Zhou; Shan Zhang; Panagiotis Chouvardas; Rory Johnson; Jianwei Li

doi:10.1093/nar/gkz410

Nucleic Acids Res. 2019 May 31;47(W1):W523–W529. doi: 10.1093/nar/gkz410


PMCID: PMC6602513 PMID: 31147707

Abstract

Interest in the biological roles of long noncoding RNAs (lncRNAs) has resulted in growing numbers of studies that produce large sets of candidate genes, for example, genes differentially expressed between two conditions. For sets of protein-coding genes, ontology and pathway analyses are powerful tools for generating new insights from statistical enrichment of gene features. Here we present the LnCompare web server, an equivalent resource for studying the properties of lncRNA gene sets. The Gene Set Feature Comparison mode tests for enrichment amongst a panel of quantitative and categorical features, spanning gene structure, evolutionary conservation, expression, subcellular localization, repetitive sequences and disease association. Moreover, in Similar Gene Discovery mode, users may identify other lncRNAs by similarity across a defined range of features. Comprehensive results may be downloaded in tabular and graphical formats, in addition to the entire feature resource. LnCompare will empower researchers to extract useful hypotheses and candidates from lncRNA gene sets.

INTRODUCTION

Long non-coding RNAs (lncRNAs) are a numerous yet poorly understood class of genes with growing biological and biomedical interest. Their regulatory roles (1) and tissue specificity (2,3) make them promising biomarkers and therapeutic targets (4,5). High-throughput studies on disease or biological systems routinely produce sets of tens to thousands of lncRNA candidates (6). Some examples of such sets are lncRNAs exhibiting differential expression between conditions (7), association with a disease (8) or whose perturbation by CRISPR-Cas9 leads to phenotypic changes (6,9). Important bottlenecks arise in assessing the quality of such sets, and generating functional and mechanistic hypotheses from them (6,7,10,11).

Controlled ontologies describing gene functions or their products’ characteristics, most notably Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (12–14), are powerful and widely used tools for inspecting sets of protein-coding genes (PCGs) (15–18). Unfortunately, functional labels have not yet been directly assigned to lncRNAs, making them inaccessible to ontology analyses. As a result, lncRNA candidate sets cannot be mined for biological insights to the degree which we have come to expect for PCGs. In a similar way, there are a range of tools designed to identify other PCGs with various degrees of similarity to a gene of interest (19–22), but none are available for lncRNAs. In summary, there is a need for tools to analyse lncRNA gene lists and functionally prioritize candidates for further study.

To date, most methods dedicated to revealing functional insights from lncRNA sets rely on the biological properties of PCGs with correlated expression across tissues (5). For example, the LncRNA2Function and Co-LncRNA web servers perform functional enrichment analysis on the coding genes co-expressed with the input lncRNAs (23,24). Lnc-GFP and LncRNAs2Pathways (23,25) follow a similar strategy but introduce more sophisticated graph theory algorithms on co-expression networks. Finally, FARNA (26) considers transcription factor (TF)-lncRNA associations to predict lncRNA functions. However, none of these resources directly explores the features of lncRNA genes and their products.

To address this need, we have developed LnCompare, a web server that compares lncRNA genes across a range of features. LnCompare is based on a comprehensive feature set with more than 100 attributes covering diverse aspects, including gene structure, nucleotide composition, evolutionary conservation, cell and tissue expression, subcellular localization, tissue specificity, repetitive sequence content and phenotypic association. Based on these features, LnCompare has two main functionalities. First, Gene Set Feature Comparison identifies statistically enriched features of lncRNA sets, in a similar way to what is presently done for PCGs (17,18,27,28). Second, Similar Gene Discovery seeks to identify genes similar to a given gene of interest, based on user-defined features.

LnCompare is freely available at http://www.rnanut.net/lncompare/

MATERIALS AND METHODS

Compilation of lncRNA features

We collected and processed various lncRNA datasets from public databases and in-house computational analysis. Altogether these comprise 109 gene/transcript attributes for the GENCODE v24 human lncRNAs annotation (15 941 genes). These features can be classified into six main classes (see Table 1):

Table 1.

Classification and summary of lncRNA features included in LnCompare database

Feature class	Description
Genomic	I) Gene structure and nucleotide composition of lncRNAs: gene length (i.e. entire gene span including exons and introns), exonic length (i.e. length of non-redundant merged exons), GC content and repeat coverage;
	II) Gene location with respect to closest protein-coding gene;
	III) Evolutionary conservation of exons and promoters of lncRNAs using phastCons elements (based on multispecies alignment of 100 vertebrates, 20 mammals or 7 vertebrates).
Cellular expression	Expression, as estimated from ENCODE RNA-seq:
	I) Whole cell, 11 human cell lines;
	II) Cytoplasmic fractions, 15 human cell lines;
	III) Nuclear fractions, 15 human cell lines.
Subcellular localization	Ratio of nuclear to cytoplasmic concentration, 11 human cell lines.
Expression across tissues	Aggregate expression across 16 human tissues: maximum, minimum, mean, median, specificity.
Phenotypic association	Previously discovered association with phenotypes or functions, based on:
	I) Presence in functional and disease databases;
	II) Association with cell proliferation phenotypes in CRISPR-Cas9 screens;
	III) Occurrence of GWAS SNPs in their promoters.
Repetitive elements	The exonic coverage of the 20 most highly overlapping repetitive element classes from Repbase (47).


All data have been compiled at the level of lncRNA genes, not transcripts. For certain features, such as GC content or phastCons overlap, we used an exonic projection of all annotated transcripts of each gene and estimated the feature from that projection. Detailed information on every feature and its source is available in Supplementary Table S1, and is also provided as additional information on the web server. The entire table of lncRNA features for the GENCODE v24 annotation can be downloaded from the ‘Download’ tab in LnCompare.

Statistical analysis

In Gene Set Feature Comparison mode, LnCompare considers both quantitative and categorical features (Figure 1A). By default, the background set is defined as the entire GENCODE annotation. The Wilcoxon test is used to compare quantitative features between the input gene set and the background set (Figure 1B). For categorical features, the hypergeometric test is applied; the detailed formula was described in our previous work (29) (Figure 1C). By default, the quantitative features are sorted by the absolute logarithm of the ratio of the average feature values in the input versus the background, in order to highlight the features where the input lncRNA list and the background diverge most. Similarly, the odds ratio is used to rank the most enriched categorical features.
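The categorical test and the two ranking statistics described above can be sketched in a few lines. This is an illustration only, not the server's code: the exact formula is given in reference (29) and is not reproduced here, so the sketch assumes a one-sided (over-representation) hypergeometric test, an odds ratio computed against the rest of the background, and a base-2 logarithm. All function names are hypothetical.

```python
from math import comb, log2


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k feature-carrying genes in an
    input set of n genes drawn from a background of N genes, K of which
    carry the feature (assumed one-sided enrichment test)."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def odds_ratio(k, n, K, N):
    """Odds of carrying the feature inside the input set versus the rest of
    the background (assumed definition; undefined if a cell count is zero)."""
    a, b = k, n - k                      # input: with / without the feature
    c, d = K - k, (N - K) - (n - k)      # rest of background: with / without
    return (a * d) / (b * c)


def abs_log_ratio(input_values, background_values):
    """Absolute log2 ratio of mean feature values (log base is an assumption)."""
    mean_in = sum(input_values) / len(input_values)
    mean_bg = sum(background_values) / len(background_values)
    return abs(log2(mean_in / mean_bg))
```

Because the absolute value is taken, a feature that is four-fold higher in the input and one that is four-fold lower both score 2, so both would rank equally under this ordering.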

In Similar Gene Discovery, LnCompare calculates the similarity between two lncRNAs based on their features (Figure 2). After comparing several methods (e.g. Pearson correlation, Euclidean distance), cosine similarity was adopted:

$$\mathrm{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

where A and B are the feature vectors of two lncRNAs, with N/A values dropped. By definition, it is the dot product of A and B, divided by the product of the norms of the two vectors. To enable a flexible similarity calculation, users can use either all features or a defined subset of features (Supplementary Table S1). Higher cosine similarity indicates greater similarity.
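A minimal sketch of this calculation follows. The text says that N/A values are dropped but not how; the sketch assumes a feature is skipped whenever it is missing in either vector, and represents missing values as `None`. The function name is hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors.

    Features that are missing (None) in either vector are skipped, which is
    an assumed reading of "N/A values dropped". Returns None when no usable
    features remain or when one vector has zero norm.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    dot = sum(x * y for x, y in pairs)
    norm_a = sqrt(sum(x * x for x, _ in pairs))
    norm_b = sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)
```

Restricting the calculation to a subset of features, as the server allows, amounts to passing shorter vectors that contain only the selected features.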

Figure 2. — Similar Gene Discovery module workflow. The module compares either one gene or several input genes versus a list of background genes (‘1-to-N’ and ‘M-to-N’, respectively). The yellow box represents the input genes; the green box represents the background genes. Similarity comparisons are based on the distance between the two gene feature vectors, calculated by two different methods: cosine similarity and mutual rank.

In addition, for more robust results, we employed mutual rank, which has been successfully applied to establish gene co-expression networks (30):

$$\mathrm{Mutual\ Rank}(A, B) = \sqrt{\mathrm{Rank(Similarity)}_{A\ \mathrm{in}\ B} \times \mathrm{Rank(Similarity)}_{B\ \mathrm{in}\ A}}$$

where Rank(Similarity)_{A in B} denotes the rank of the similarity between A and B among the similarities of B to all other lncRNAs, and Rank(Similarity)_{B in A} is defined in a similar fashion. Lower mutual rank values indicate greater similarity.
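The sketch below illustrates mutual rank on a toy similarity table. It assumes the geometric-mean definition used for co-expression networks (30), with rank 1 assigned to the most similar partner. The gene labels, similarity values and function names are all made up for illustration.

```python
from math import sqrt


def rank_among_partners(sim, query, target):
    """Rank (1 = most similar) of `target` among all partners of `query`."""
    partners = sorted(
        (g for g in sim[query] if g != query),
        key=lambda g: sim[query][g],
        reverse=True,
    )
    return partners.index(target) + 1


def mutual_rank(sim, a, b):
    """Geometric mean of the two reciprocal ranks (assumed definition)."""
    return sqrt(rank_among_partners(sim, a, b) * rank_among_partners(sim, b, a))


# Toy symmetric similarity table (hypothetical values).
sim = {
    "A": {"B": 0.90, "C": 0.50, "D": 0.10},
    "B": {"A": 0.90, "C": 0.95, "D": 0.20},
    "C": {"A": 0.50, "B": 0.95, "D": 0.30},
    "D": {"A": 0.10, "B": 0.20, "C": 0.30},
}
```

In this toy table B is the best partner of A, but A is only the second-best partner of B, so the pair scores sqrt(2) rather than 1. B and C are each other's best partner and score 1, which is the most reciprocal value possible.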

Server implementation

The web server is built on a Linux server using the Apache+MySQL+PHP framework. All graphical visualizations are enabled by the open source G2 package (https://antv.alipay.com/), whilst the display and download of tabular results was established with the JavaScript plugin vis (http://visjs.org/).

DESCRIPTION OF WEB SERVER

The LnCompare web server performs gene set or single gene comparisons of lncRNAs based on diverse features. It is based on the GENCODE version 24 human annotation (31), and therefore only genes with Ensembl ‘ENSG…’ identifiers belonging to this annotation are assessed. When a supported non-GENCODE identifier is provided, LnCompare will attempt to map it to GENCODE. Supported identifiers comprise gene symbols, RefSeq IDs and Ensembl transcript IDs, whose mappings to GENCODE are based on the Ensembl ID mapping file. A successfully mapped gene is included in subsequent analyses. Unrecognized IDs, including out-of-date Ensembl entries, are ignored. The number of successfully found IDs is reported on the results page.
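The identifier handling described above can be pictured as a lookup with a fallback. The sketch is purely illustrative: the function name, the in-memory dictionaries and the two-gene toy mapping are assumptions, and the server's real mapping file and matching rules are not described in the text.

```python
def map_to_gencode(user_ids, gencode_ids, alias_to_gencode):
    """Split user-supplied IDs into mapped GENCODE gene IDs and ignored IDs.

    gencode_ids: set of gene IDs present in the annotation.
    alias_to_gencode: dict from other supported identifiers (e.g. gene
    symbols) to gene IDs. Both are hypothetical stand-ins for the server's
    mapping file.
    """
    mapped, ignored = [], []
    for raw in user_ids:
        ident = raw.strip()
        gene = ident if ident in gencode_ids else alias_to_gencode.get(ident)
        if gene is None:
            ignored.append(ident)       # unrecognized: left out of the analysis
        elif gene not in mapped:
            mapped.append(gene)         # keep each gene once
    return mapped, ignored


# Toy data: two well-known lncRNA genes and their symbols.
gencode_ids = {"ENSG00000229807", "ENSG00000251562"}
alias_to_gencode = {"XIST": "ENSG00000229807", "MALAT1": "ENSG00000251562"}

mapped, ignored = map_to_gencode(
    ["XIST", "ENSG00000251562", "NOT_A_GENE", "XIST"],
    gencode_ids,
    alias_to_gencode,
)
print(f"{len(mapped)} IDs found, {len(ignored)} ignored")
```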

For all analyses, users can populate forms with three different sets of example data using buttons. These are: the ‘Simple Example’ list of six randomly-selected lncRNAs; the ‘CLC’ list of 122 cancer-related lncRNAs (32); the ‘Cell Cycle Example’ of 117 lncRNAs that are differentially expressed between G1S and G2M cell-cycle stages in HeLa cells (33).

LnCompare has two modules, described below. A complete tutorial for both modules can be found in the ‘Help’ tab.

Gene set feature comparison

The Gene Set Feature Comparison module aims to identify features that characterize a user-provided gene set. This input gene set is compared to a defined background gene set, across each feature (Figure 1A). By default, the background is the entire GENCODE annotation, although the user can provide alternative background sets.

The feature-comparison analysis runs differently for quantitative and categorical features. For quantitative features, statistical significance is assessed by Wilcoxon test, while for categorical features the hypergeometric test is used (Figure 1B and C). Results are displayed separately.
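For readers who want to see the quantitative test concretely, here is a standard-library sketch. It assumes that "Wilcoxon test" refers to the two-sample rank-sum (Mann-Whitney) form, and it uses the normal approximation with tie correction and no continuity correction. The server may use an exact test or different settings, and the function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate P-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                      # all values identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))   # two-sided normal tail
```

The normal approximation is reasonable for gene sets of tens of genes or more against a background of thousands, but it is rough for very small inputs such as a six-gene list.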

For quantitative features, LnCompare returns a summary plot with logarithm ratios of the mean feature values for the input and the background sets. For categorical features, equivalent plots display the odds ratio (Figure 1B and C). In addition, corresponding P-values, Benjamini–Hochberg false discovery rates (FDR) (34) and a link to a boxplot (or barplot for categorical features) can be found in tabular format (for an example see Figure 3). Additional information on each feature can be accessed from the ‘?’ button in the table. For both quantitative and categorical features, the user can apply several different cutoffs to the data displayed: the top ten features, ranked by mean ratio or odds ratio (for quantitative and categorical features, respectively); features with P < 0.05; features with FDR < 0.05; or all possible comparisons. Graphical and tabular results can be directly downloaded from the website.
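The Benjamini–Hochberg adjustment behind the FDR column can be written compactly. The sketch below is the standard step-up procedure, not code taken from the server, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Applying an FDR < 0.05 cutoff then means keeping the features whose adjusted value is below 0.05. With roughly 100 features tested per run, this is noticeably stricter than the raw P < 0.05 filter.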

Figure 3. — Results from Gene Set Feature Comparison analysis of CLC. (A) Graphical results displaying features that are significantly different between CLC and background lncRNAs (FDR < 0.05). Feature labels are shown on the y-axis and grey boxes on the left summarize their content. The x-axis indicates the ratio between the mean of CLC genes (input) and background genes for each feature. (B) Table obtained from the same analysis for categorical features. ‘Feature’ and ‘Name’ indicate and describe the feature tested; ‘List’ and ‘background’ show the number of CLC and background genes associated with the feature, respectively. ‘P-value’, ‘FDR’ and ‘Odds Ratio’ from the hypergeometric test are also shown in the table.

Similar gene discovery

LnCompare offers two approaches to compare the similarity of lncRNA genes using the cosine similarity method: (i) 1-to-N comparison, which computes the similarity of one user-provided lncRNA to all remaining GENCODE lncRNAs; (ii) M-to-N comparison, which computes the similarity of every lncRNA from list M to every one in list N (Figure 2). For the M-to-N comparison, the user must provide two lists of Ensembl gene IDs (up to a maximum of 100 in each). After specifying the type of analysis desired (1-to-N or M-to-N) and entering the gene IDs, the user can choose which subsets of lncRNA features to use for the similarity analysis, from the feature classes described above (see ‘Materials and Methods’ section).

For both types of similarity analysis, graphical and tabular outputs display the top 10, 20 or 50 cosine scores (specified by the user) together with the corresponding gene IDs. The table also reports the rank of each score among the partners in the lncRNA1 and lncRNA2 lists, respectively (i.e. the first number is the rank of that pair among all possible partners of the lncRNA from the lncRNA1 list, and the second number is the same for the lncRNA from the lncRNA2 list).

When all the feature classes are selected, LnCompare also provides a mutual rank similarity results section. In this case, graphical and tabular formats show how reciprocal the similarity is between the two genes, with a mutual rank score (see ‘Materials and Methods’ section for more details). Again, the user can select how many output comparisons should be displayed (top 10, 20 or 50 scores). All the tables from this module are available for download.

EVALUATION OF WEB SERVER; A CASE STUDY USING CANCER-RELATED GENES

We tested the performance of LnCompare using a set of 122 experimentally validated cancer lncRNAs from the Cancer LncRNA Census (CLC) (32). CLC genes are curated based on experimentally validated functional roles in tumorigenesis or cancer-related cellular phenotypes, and hence form a useful positive control set of lncRNA genes. The CLC genes make a good test case, since they are known to be characterized by a range of features such as high expression in tumours, spliced length and evolutionary conservation (32). In assessing CLC, we aim to represent two possible scenarios: (i) the user has no prior knowledge of a gene set, and wishes to assess its potential functionality; (ii) the user is aware that this is a set of important lncRNAs, and wishes to study their particular features.

Running the Gene Set Feature Comparison module, we searched for specific features of CLC genes compared to background (all other lncRNAs). Using a cutoff of FDR < 0.05, we observe several quantitative and qualitative traits to be significantly enriched in CLC lncRNAs (Figure 3A). These include high average expression across numerous human cell lines and tissues. CLC genes, on average, also show high exon and promoter conservation across mammals and vertebrates (Figure 3A). Moreover, the CLC set is significantly enriched with lncRNAs from functional and disease databases (Figure 3B). Together, these attributes are consistent with the input gene set being enriched with bona fide functional lncRNAs (32), and point to features (e.g. high expression) that are shared by both cancer-related lncRNAs and PCGs (35,36).

Interestingly, the Gene Set Feature Comparison also reports CLC genes to be on average closer in genomic distance to PCGs, and significantly more likely to be divergently transcribed from protein-coding genes (Figure 3B). This may reflect a common molecular mechanism among CLC lncRNAs that merits further study. Alternatively, it may result from a bias in the literature towards lncRNAs that lie close to PCGs. Such ascertainment biases are an important confounding factor that should be borne in mind when interpreting these results. Finally, we also observe that the CLC set tends to be less tissue specific, more cytoplasmic (their nuclear/cytoplasmic ratio is significantly lower) (Figure 3A), and enriched with hits from proliferation CRISPR screens (Figure 3B).

Similar Gene Discovery ‘M-to-N’ functionality may complement the above analysis to interrogate unknown lists of genes and help to prioritize candidates based on similarity to known lncRNAs. For example, to select potential cancer-associated lncRNAs for experimental validation from a list of novel candidates, one can assess their similarity to CLC genes.

In addition, Similar Gene Discovery ‘1-to-N’ functionality makes it possible to search for the most similar genes to a given lncRNA, in order to discover new, functionally-related lncRNAs. For example, in searching for lncRNAs similar to the X-inactive specific transcript, XIST (37), LnCompare reports the maternally-expressed gene 8 (MEG8) to be the most similar (cosine similarity 0.9). Interestingly, both genes are associated with imprinting, are expressed during early development and have nuclear-restricted localization (38–40). Moreover, MEG8 has been reported to interact with chromatin-binding proteins and repressor complexes (41,42). An important caveat is the fact that some lncRNAs are present in the functional database lncRNAdb, and this will influence the similarity analysis results without necessarily representing biological similarity. In order to eliminate this possible confounder, we removed the phenotype-associated feature class and repeated the analysis. This analysis now identifies ENSG00000272872 as the most similar to XIST. This lncRNA has been linked to various cancers in the lncRNADisease v2.0 database (43), making it an interesting candidate to study.

DISCUSSION

In recent years there has been a dramatic acceleration in the volumes of lncRNA gene candidates emerging from genomic studies. However, we remain broadly ignorant about molecular mechanisms and biological roles of these genes. Classical methods to describe newly discovered genes or to assess gene sets are inefficient for lncRNAs. This has created a need for tools to study the properties of lncRNA sets, in order to formulate new hypotheses from or gauge the success of high-throughput experiments.

To meet this need, we have curated a comprehensive feature set covering diverse quantitative and categorical aspects of lncRNAs from the GENCODE annotation. Using these features, LnCompare searches for those that are significantly over- or under-represented in an input set compared to background. In a set of lncRNAs with known roles in cancer (35), LnCompare identifies a range of characteristic features, including elevated expression and evolutionary conservation. Moreover, it also identifies other attributes including higher cytoplasmic localization and ubiquitous expression. These features may guide researchers to focus on potential molecular activities related to cytoplasmic processes, in contrast to most studies that concentrate on lncRNAs’ roles in chromatin regulation (44–46). Moreover, LnCompare assesses similarity between genes, which can be a powerful strategy to identify new genes playing similar roles to known examples. Conversely, such similarity analysis could be used to predict the roles of novel lncRNAs by similarity to known genes. We anticipate that LnCompare will be useful to the many colleagues who presently study lncRNAs at the global scale and wish to extract more biological insights from their data.


ACKNOWLEDGEMENTS

We acknowledge administrative support from Deborah Re and Silvia Roesselet (DBMR).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Natural Science Foundation of China [81672113]; Swiss National Science Foundation through the National Center of Competence in Research (NCCR) ‘RNA & Disease’; Medical Faculty of the University and University Hospital of Bern; Helmut Horten Stiftung. Funding for open access charge: Core funding.

Conflict of interest statement. None declared.

REFERENCES

1. Marchese F.P., Raimondi I., Huarte M. The multidimensional mechanisms of long noncoding RNA function. Genome Biol. 2017; 18:206.
2. Derrien T., Johnson R., Bussotti G., Tanzer A., Djebali S., Tilgner H., Guernec G., Martin D., Merkel A., Knowles D.G. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012; 22:1775–1789.
3. Mercer T.R., Dinger M.E., Sunkin S.M., Mehler M.F., Mattick J.S. Specific expression of long noncoding RNAs in the mouse brain. Proc. Natl. Acad. Sci. U.S.A. 2008; 105:716–721.
4. Bonetti A., Carninci P. From bench to bedside: the long journey of long non-coding RNAs. Curr. Opin. Syst. Biol. 2017; 3:119–124.
5. Sun M., Kraus W.L. From discovery to function: the expanding roles of long noncoding RNAs in physiology and disease. Endocr. Rev. 2015; 36:25–64.
6. Liu S.J., Horlbeck M.A., Cho S.W., Birk H.S., Malatesta M., He D., Attenello F.J., Villalta J.E., Cho M.Y., Chen Y. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science. 2017; 355:aah7111.
7. Ounzain S., Micheletti R., Beckmann T., Schroen B., Alexanian M., Pezzuto I., Crippa S., Nemir M., Sarre A., Johnson R. et al. Genome-wide profiling of the cardiac transcriptome after myocardial infarction identifies novel heart-specific long non-coding RNAs. Eur. Heart J. 2015; 36:353–368.
8. Gao Y., Wang P., Wang Y., Ma X., Zhi H., Zhou D., Li X., Fang Y., Shen W., Xu Y. et al. Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers. Nucleic Acids Res. 2019; 47:D1028–D1033.
9. Kashi K., Henderson L., Bonetti A., Carninci P. Discovery and functional analysis of lncRNAs: Methodologies to investigate an uncharacterized transcriptome. Biochim. Biophys. Acta. 2015; 1859:3–15.
10. Zhu S., Li W., Liu J., Chen C.-H., Liao Q., Xu P., Xu H., Xiao T., Cao Z., Peng J. et al. Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR–Cas9 library. Nat. Biotechnol. 2016; 34:1279–1286.
11. Ali M.M., Akhade V.S., Kosalai S.T., Subhash S., Statello L., Meryet-Figuiere M., Abrahamsson J., Mondal T., Kanduri C. PAN-cancer analysis of S-phase enriched lncRNAs identifies oncogenic drivers and biomarkers. Nat. Commun. 2018; 9:883.
12. Kanehisa M., Goto S., Sato Y., Kawashima M., Furumichi M., Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014; 42:D199–D205.
13. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000; 25:25–29.
14. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019; 47:D330–D338.
15. Mi H., Huang X., Muruganujan A., Tang H., Mills C., Kang D., Thomas P.D. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017; 45:D183–D189.
16. Carbon S., Ireland A., Mungall C.J., Shu S., Marshall B., Lewis S., AmiGO Hub, Web Presence Working Group. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009; 25:288–289.
17. Eden E., Navon R., Steinfeld I., Lipson D., Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009; 10:48.
18. Reimand J., Arak T., Adler P., Kolberg L., Reisberg S., Peterson H., Vilo J. g:Profiler—a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016; 44:W83–W89.
19. Du Z., Li L., Chen C.-F., Yu P.S., Wang J.Z. G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery. Nucleic Acids Res. 2009; 37:W345–W349.
20. Pesaranghader A., Matwin S., Sokolova M., Beiko R.G. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes. Bioinformatics. 2016; 32:1380–1387.
21. Zhang P., Zhang J., Sheng H., Russo J.J., Osborne B., Buetow K. Gene functional similarity search tool (GFSST). BMC Bioinformatics. 2006; 7:135.
22. Wang J.Z., Du Z., Yu P.S., Chen C.-F. An Efficient Online Tool to Search Top-N Genes with Similar Biological Functions in Gene Ontology Database. 2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007). 2007; Fremont: IEEE; 406–411.
23. Jiang Q., Ma R., Wang J., Wu X., Jin S., Peng J., Tan R., Zhang T., Li Y., Wang Y. LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data. BMC Genomics. 2015; 16:S2.
24. Zhao Z., Bai J., Wu A., Wang Y., Zhang J., Wang Z., Li Y., Xu J., Li X. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data. Database. 2015; 2015:bav082.
25. Guo X., Gao L., Liao Q., Xiao H., Ma X., Yang X., Luo H., Zhao G., Bu D., Jiao F. et al. Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic Acids Res. 2013; 41:e35.
26. Alam T., Uludag M., Essack M., Salhi A., Ashoor H., Hanks J.B., Kapfer C., Mineta K., Gojobori T., Bajic V.B. FARNA: knowledgebase of inferred functions of non-coding RNA transcripts. Nucleic Acids Res. 2017; 45:2838–2848.
27. Beißbarth T., Speed T.P. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004; 20:1464–1465.
28. Zheng Q., Wang X.-J. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008; 36:W358–W363.
29. Li J., Han X., Wan Y., Zhang S., Zhao Y., Fan R., Cui Q., Zhou Y. TAM 2.0: tool for MicroRNA set analysis. Nucleic Acids Res. 2018; 46:W180–W185.
30. Obayashi T., Kagaya Y., Aoki Y., Tadaka S., Kinoshita K. COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 2019; 47:D55–D62.
31. Frankish A., Diekhans M., Ferreira A.-M., Johnson R., Jungreis I., Loveland J., Mudge J.M., Sisu C., Wright J., Armstrong J. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019; 47:D766–D773.
32. Carlevaro-Fita J., Camaioni A.A.L., Feuerbach L., Hong C., Mas-Ponte D., Guigo R., Pedersen J.S., Johnson R., PCAWG Driver Identification Working Group. Unique genomic features and deeply-conserved functions of long non-coding RNAs in the Cancer LncRNA Census (CLC). 2017; bioRxiv doi: 10.1101/152769, 25 August 2017, preprint: not peer reviewed.
33. Murthy T., Bluemn T., Gupta A.K., Reimer M., Rao S., Pillai M.M., Minella A.C. Cyclin-dependent kinase 1 (CDK1) and CDK2 have opposing roles in regulating interactions of splicing factor 3B1 with chromatin. J. Biol. Chem. 2018; 293:10220–10234.
34. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B. 1995; 57:289–300.
35. Lanzós A., Carlevaro-Fita J., Palumbo E., Reverter F., Mularoni L., Guigó R., Johnson R. Discovery of cancer driver long noncoding RNAs across 1112 Tumour Genomes: new candidates and distinguishing features. Sci. Rep. 2017; 7:41544.
36. Furney S.J., Higgins D.G., Ouzounis C.A., López-Bigas N. Structural and functional properties of genes involved in human cancer. BMC Genomics. 2006; 7:3.
37. Brown C.J., Hendrich B.D., Rupert J.L., Lafrenière R.G., Xing Y., Lawrence J., Willard H.F. The human XIST gene: Analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell. 1992; 71:527–542.
38. Hatada I., Morita S., Obata Y., Sotomaru Y., Shimoda M., Kono T. Identification of a new imprinted gene, Rian, on mouse chromosome 12 by fluorescent differential display screening. J. Biochem. 2001; 130:187–190.
39. Charlier C., Segers K., Wagenaar D., Karim L., Berghmans S., Jaillon O., Shay T., Weissenbach J., Cockett N., Gyapay G. et al. Human-ovine comparative sequencing of a 250-kb imprinted domain encompassing the callipyge (clpg) locus and identification of six imprinted transcripts: DLK1, DAT, GTL2, PEG11, antiPEG11, and MEG8. Genome Res. 2001; 11:850–862.
40. Inoue A., Jiang L., Lu F., Zhang Y. Genomic imprinting of Xist by maternal H3K27me3. Genes Dev. 2017; 31:1927–1932.
41. Terashima M., Ishimura A., Wanna-udom S., Suzuki T. MEG8 long noncoding RNA contributes to epigenetic progression of the epithelial-mesenchymal transition of lung and pancreatic cancer cells. J. Biol. Chem. 2018; 293:18016–18030.
42. Guttman M., Donaghey J., Carey B.W., Garber M., Grenier J.K., Munson G., Young G., Lucas A.B., Ach R., Bruhn L. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011; 477:295–300.
43. Bao Z., Yang Z., Huang Z., Zhou Y., Cui Q., Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2019; 47:D1034–D1037.
44. Mercer T.R., Mattick J.S. Structure and function of long noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol. 2013; 20:300–307.
45. Chen L.-L., Carmichael G.G. Altered nuclear retention of mRNAs containing inverted repeats in human embryonic stem Cells: Functional role of a nuclear noncoding RNA. Mol. Cell. 2009; 35:467–478.
46. Guttman M., Garber M., Bernstein B.E., Rinn J.L., Khalil A.M., van Oudenaarden A., Rivea Morales D., Lander E.S., Regev A., Thomas K. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 2009; 106:11667–11672.
47. Bao W., Kojima K.K., Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 2015; 6:11.

Supplementary Materials

gkz410_Supplemental_Files (xlsx, 14.1 KB)
