Nucleic Acids Research. 2022 Nov 4;51(D1):D186–D191. doi: 10.1093/nar/gkac999

LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations

Zhao Li, Lin Liu, Changrui Feng, Yuxin Qin, Jingfa Xiao, Zhang Zhang, Lina Ma
PMCID: PMC9825513  PMID: 36330950

Abstract

LncBook, a comprehensive resource of human long non-coding RNAs (lncRNAs), has been used in a wide range of lncRNA studies across various biological contexts. Here, we present LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook), with significant updates and enhancements as follows: (i) incorporation of 119 722 new transcripts and 9632 new genes, and update of the gene structure of 21 305 lncRNAs; (ii) characterization of conservation features of human lncRNA genes across 40 vertebrates; (iii) integration of lncRNA-encoded small proteins; (iv) enrichment of expression and DNA methylation profiles with more biological contexts and (v) identification of lncRNA–protein interactions and improved prediction of lncRNA-miRNA interactions. Collectively, LncBook 2.0 accommodates a high-quality collection of 95 243 lncRNA genes and 323 950 transcripts and incorporates their abundant annotations at different omics levels, thereby enabling users to decipher the functional significance of lncRNAs in different biological contexts.

INTRODUCTION

LncBook, a curated resource of human long non-coding RNAs (lncRNAs), features comprehensive integration of human lncRNAs and systematic annotation with multi-omics data analysis (1). Since its inception in 2019, LncBook has been widely used in delineating the transcriptional landscape of human lncRNAs (2,3), uncovering lncRNAs’ molecular signatures (4,5), and disentangling the functional relevance of lncRNAs in human diseases (6–8). Over the past several years, considerable efforts have been devoted to identifying (9–13) and characterizing human lncRNAs at different omics levels across diverse biological contexts, e.g. disease, normal tissue/cell line, organ development and subcellular localization (14). In particular, multiple lines of evidence indicate that sequence conservation is a fundamental indicator of lncRNA functional significance (15) and that lncRNA-encoded small proteins are involved in diverse functions and multiple diseases (16–18). Therefore, there is a pressing need to integrate newly reported lncRNAs and to characterize lncRNAs at multiple omics levels and in more biological contexts. Toward this end, we comprehensively integrate lncRNAs together with their curated annotations, including expression and DNA methylation profiles in multiple biological contexts, disease/trait-associated variants, lncRNA-miRNA interactions, lncRNA–protein interactions, evolutionary conservation features and small proteins. As a result, we provide an updated version of LncBook that, compared with the previous version, has been significantly upgraded, expanded and enhanced (Table 1).

Table 1.

Data statistics of two LncBook versions

| Category | Data item | Version 2.0 | Version 1.0 |
|---|---|---|---|
| LncRNA integration and curation | Transcript | 323 950 | 270 044 |
| LncRNA integration and curation | Gene | 95 243 | 140 356 |
| LncRNA integration and curation | Quality control | Remove pseudogenes, small RNAs, miRNA precursors, and transcripts without strand information | - |
| LncRNA integration and curation | Reference file | LncRNA; LncRNA & other genes | LncRNA |
| Multi-omics annotation | Evolutionary conservation | Conservation features across 40 vertebrates | - |
| Multi-omics annotation | Small protein | 34 012 small proteins | - |
| Multi-omics annotation | Genome variation | 959 138 disease/trait-associated variants | 92 725 757 SNPs |
| Multi-omics annotation | DNA methylation profile | 14 cancers and 2 neurodevelopmental disorders | 9 cancers |
| Multi-omics annotation | Expression profile | 9 biological contexts | 1 biological context |
| Multi-omics annotation | LncRNA–protein interaction | 772 745 lncRNA–protein interactions | - |
| Multi-omics annotation | LncRNA–miRNA interaction | Predicted with TargetScan, miRanda and RNAhybrid | Predicted with TargetScan and miRanda |

MATERIALS AND METHODS

LncRNA integration and curation

Building on the previous version, LncBook 2.0 integrated lncRNAs from five resources: RefLnc (9), GENCODE v33 (10), CHESS v2.2 (11), FANTOM-CAT (12) and BIGTranscriptome (13). Redundant transcripts, transcripts arising from background noise or mapping errors, incomplete or short transcripts, and transcripts that may encode proteins were excluded (1). To improve curation quality, LncBook 2.0 also removed lncRNA transcripts without strand information, as well as transcripts identified as miRNA precursors, small RNAs or pseudogenes according to the comparison results generated with GffCompare (19). In addition, four algorithms, namely CPC2 (20), LGC (21), CPAT (22) and PLEK (23), were used to estimate coding potential, and transcripts identified as lncRNAs by at least three algorithms were retained. LncRNAs annotated by HGNC and GENCODE were retained regardless of their coding potential. Following the strategies used by GENCODE (24) and NONCODE (25), we assigned lncRNA transcripts whose exonic regions overlap on the same strand to the same gene. To meet different data analysis needs, we provide both a lncRNA gene annotation file and an integrated annotation file containing lncRNA genes together with other genes (derived from GENCODE).
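The two curation rules described above (the coding-potential vote and the grouping of transcripts into genes) can be illustrated with a minimal sketch. It is not the LncBook pipeline: the function names, the input layout and the use of half-open exon coordinates are assumptions made for illustration, and the calls from CPC2, LGC, CPAT and PLEK are taken as already computed.

```python
from collections import defaultdict

TOOLS = ("CPC2", "LGC", "CPAT", "PLEK")

def keep_as_lncrna(calls, in_reference=False, min_votes=3):
    """calls maps a tool name to True if that tool called the transcript non-coding.

    in_reference marks transcripts annotated by HGNC or GENCODE, which the
    text says are retained regardless of coding potential.
    """
    if in_reference:
        return True
    votes = sum(1 for tool in TOOLS if calls.get(tool, False))
    return votes >= min_votes

def exons_overlap(exons_a, exons_b):
    """True if any exon in exons_a overlaps any exon in exons_b (half-open coordinates)."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in exons_a for s2, e2 in exons_b)

def group_into_genes(transcripts):
    """Group transcripts whose exons overlap on the same strand into genes.

    transcripts: dict of id -> (chrom, strand, [(start, end), ...]).
    Returns a list of sets of transcript ids, one set per gene.
    """
    parent = {tid: tid for tid in transcripts}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_locus = defaultdict(list)
    for tid, (chrom, strand, _) in transcripts.items():
        by_locus[(chrom, strand)].append(tid)

    # Pairwise comparison is quadratic; acceptable for a sketch only.
    for ids in by_locus.values():
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if exons_overlap(transcripts[a][2], transcripts[b][2]):
                    parent[find(a)] = find(b)

    genes = defaultdict(set)
    for tid in transcripts:
        genes[find(tid)].add(tid)
    return list(genes.values())
```

Because overlap is applied transitively, two transcripts that do not overlap each other can still end up in the same gene through a third transcript that overlaps both.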

Data integration and analysis

To perform sequence conservation analysis, genome references, gene annotation files and pairwise alignment chain files for human and 40 vertebrates were downloaded from UCSC Genome Browser Gateway (https://hgdownload.soe.ucsc.edu) (26). We identified lncRNA homologous sequences/genes by considering alignment length and by comparing lncRNA alignments with intron alignments in each species. As lncRNAs generally lack high sequence conservation, and 20% transcript coverage has been adopted previously to identify homologous lncRNAs (27), we collected alignments that are at least 50 nt in length and cover >20% of the transcript. Meanwhile, to reduce the impact of evolutionary distance, homologous sequences/genes were called for each species only if the alignment performance of a lncRNA (measured by alignment length and identity) exceeded the introns’ Q50 threshold, which represents the median level of intron alignments. LncRNA gene age was defined as the earliest occurrence time of its homologous sequence across the 40 species; from latest to earliest, the age categories are ‘Homo’ (human specific), ‘Hominini’, ‘Homininae’, ‘Hominidae’, ‘Hominoidea’, ‘Catarrhini’, ‘Simiiformes’, ‘Haplorrhini’, ‘Primates’, ‘Euarchontoglires’, ‘Boreoeutheria’, ‘Eutheria’, ‘Theria’, ‘Mammalia’, ‘Amniota’, ‘Tetrapoda’ and ‘Euteleostomi’.
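The homolog criteria and the gene-age definition above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes that both alignment length and identity must exceed the intron medians, and the species-to-clade mapping (`clade_of`) is a hypothetical input that the caller supplies.

```python
from statistics import median

# Age categories from the text, ordered from latest to earliest.
AGE_LEVELS = [
    "Homo", "Hominini", "Homininae", "Hominidae", "Hominoidea", "Catarrhini",
    "Simiiformes", "Haplorrhini", "Primates", "Euarchontoglires",
    "Boreoeutheria", "Eutheria", "Theria", "Mammalia", "Amniota",
    "Tetrapoda", "Euteleostomi",
]

def is_homolog(aln_len, identity, transcript_len,
               intron_aln_lens, intron_identities,
               min_len=50, min_coverage=0.20):
    """Apply the length, coverage and intron-Q50 criteria to one alignment.

    Assumption: 'exceeds the introns' Q50 threshold' is read as exceeding the
    median intron alignment length AND the median intron identity.
    """
    if aln_len < min_len or aln_len / transcript_len <= min_coverage:
        return False
    return (aln_len > median(intron_aln_lens)
            and identity > median(intron_identities))

def gene_age(species_with_homolog, clade_of):
    """Return the earliest age category among species that carry a homolog.

    clade_of maps a species name to the category at which its lineage joins
    the human lineage (hypothetical mapping supplied by the caller).
    A gene with no homolog in any species is 'Homo' (human specific).
    """
    rank = {name: i for i, name in enumerate(AGE_LEVELS)}
    oldest = "Homo"
    for species in species_with_homolog:
        clade = clade_of[species]
        if rank[clade] > rank[oldest]:
            oldest = clade
    return oldest
```

For example, with a mapping that places chimpanzee at 'Hominini' and mouse at 'Euarchontoglires', a gene with homologs in both species would be assigned the age 'Euarchontoglires'.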

High-confidence variants and associations were collected from COSMIC (28), ClinVar (29) and GWAS Catalog (30). For COSMIC (28), variants labeled as ‘Confirmed somatic mutation’ were retained. As suggested by COSMIC (https://cancer.sanger.ac.uk/cosmic/analyses), variants with FATHMM-MKL scores > 0.7 were defined as disease-related (pathogenic) variants. Variants with definite labels such as ‘benign’ or ‘pathogenic’ in ClinVar (29) were collected. For GWAS Catalog (30), we collected associations with P-value < 5 × 10⁻⁸, a threshold widely used to determine association between a common genetic variant and a trait of interest (31,32). Finally, names of diseases and traits were unified according to Human Phenotype Ontology (33) and Experimental Factor Ontology (34), respectively. All variants were assigned to lncRNAs with BEDTools (35).
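The COSMIC and GWAS Catalog filters above reduce to a few threshold checks, sketched here in plain Python. The record layout and field names (`status`, `fathmm_mkl`, `p_value`, `chrom`, `pos`) are hypothetical and do not reflect the actual file formats; the last function is a simplified stand-in for the BEDTools assignment step, not a replacement for it.

```python
GWAS_P_THRESHOLD = 5e-8       # genome-wide significance threshold from the text
FATHMM_MKL_THRESHOLD = 0.7    # COSMIC-suggested cutoff from the text

def keep_cosmic(variant):
    """Keep only variants labeled as confirmed somatic mutations."""
    return variant.get("status") == "Confirmed somatic mutation"

def is_disease_related(variant):
    """A retained COSMIC variant is disease-related if its score exceeds 0.7."""
    return variant.get("fathmm_mkl", 0.0) > FATHMM_MKL_THRESHOLD

def keep_gwas(association):
    """Keep associations that pass the genome-wide significance threshold."""
    return association["p_value"] < GWAS_P_THRESHOLD

def variants_in_lncrna(variants, gene):
    """Return variants located inside one lncRNA gene.

    gene: (chrom, start, end) with 0-based, half-open coordinates (assumed).
    """
    chrom, start, end = gene
    return [v for v in variants
            if v["chrom"] == chrom and start <= v["pos"] < end]
```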

To characterize the DNA methylation profiles of lncRNAs across human diseases, 16 publicly accessible bisulfite-seq datasets were collected from TCGA (The Cancer Genome Atlas) (36) and GEO (Gene Expression Omnibus) (37), covering 14 cancers and 2 neurodevelopmental disorders. Here, the promoter region was defined as the 1500 bp upstream of the transcription start site (−1500 bp relative to the TSS), and the average methylation level of all CpGs in the promoter or body region was calculated. Differentially methylated lncRNA genes were identified by considering fold change, P-value, and maximum and minimum methylation levels.
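As an illustration of the region-level averaging described above, the sketch below derives a strand-aware promoter interval and averages CpG methylation levels within it. It assumes that the promoter is the 1500 bp immediately upstream of the TSS and that coordinates are 0-based and half-open; both are assumptions for this example, as are the function names.

```python
def promoter_interval(start, end, strand, upstream=1500):
    """Return the (start, end) interval upstream of the TSS.

    For a '+' gene the TSS is the gene start; for a '-' gene it is the gene
    end, so 'upstream' points toward larger coordinates.
    """
    if strand == "+":
        return max(0, start - upstream), start
    return end, end + upstream

def mean_methylation(cpgs, interval):
    """Average methylation level of CpGs falling inside the interval.

    cpgs: iterable of (position, level) pairs with level in [0, 1].
    Returns None when no CpG falls inside the interval.
    """
    lo, hi = interval
    levels = [level for pos, level in cpgs if lo <= pos < hi]
    return sum(levels) / len(levels) if levels else None
```

The same `mean_methylation` function can be applied to the gene body by passing the gene's own interval instead of the promoter interval.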

Expression profiles of human lncRNAs were obtained from LncExpDB (38), covering 337 biological conditions that can be further classified into nine biological contexts, namely, normal tissue/cell line, organ development, preimplantation embryo, cell differentiation, subcellular localization, exosome, cancer cell line, virus infection and circadian rhythm. To determine gene expression capacity, genes whose expression values are greater than the upper quantile of the whole transcriptome (including both lncRNA genes and protein-coding genes) in at least one biological condition are considered to have high expression capacity, those below the lower quantile are considered to have low capacity, and the remaining genes are considered to have medium capacity. Featured lncRNA genes are those that are specifically expressed in a certain cell line/tissue, consistently expressed across different cell lines/tissues, differentially expressed in the context of cancer or virus infection, enriched in a specific organelle, dynamically expressed during cell differentiation or embryo/organ development, or periodically expressed with circadian rhythm. In addition, subcellular localization and tissue/normal cell/cancer cell specificity of lncRNA genes were characterized based on the expression profiles, and the related information is listed in ‘Gene Summary’.
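The three-level expression capacity rule above can be written as a small function. The text does not state which quantiles are used, so the two cutoffs are left as parameters; the sketch also assumes that 'low' means the gene stays below the lower cutoff in every condition. Both points are assumptions for illustration.

```python
def expression_capacity(values, lower_cut, upper_cut):
    """Classify a gene as 'high', 'medium' or 'low' expression capacity.

    values: non-empty list of the gene's expression values, one per condition.
    lower_cut / upper_cut: transcriptome-wide lower and upper quantile
    cutoffs, computed elsewhere (the exact quantiles are not given in the text).
    """
    peak = max(values)
    if peak > upper_cut:      # above the upper cutoff in at least one condition
        return "high"
    if peak < lower_cut:      # below the lower cutoff in every condition (assumed)
        return "low"
    return "medium"
```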

Small proteins supported by Ribo-seq or mass spectrometry evidence were integrated from SmProt (39). Small proteins were mapped onto the lncRNAs with BEDTools (35), and those falling entirely and uniquely within lncRNA transcripts were retained.

LncRNA–protein interactions were identified based on a collection of 848 077 RBP (RNA Binding Protein) binding sites of 150 RBPs in HepG2 and K562 cell lines from ENCODE (40). We mapped the RBP binding sites onto the lncRNAs with BEDTools (35), and RBP binding sites falling entirely and uniquely within lncRNA transcripts were retained.
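Both the small-protein and the RBP binding-site mappings rely on the same containment rule, which the following sketch illustrates in plain Python rather than with BEDTools. It assumes that 'uniquely' means the feature is contained in transcripts of exactly one lncRNA gene, and it checks transcript spans only, ignoring exon structure; the data layout is hypothetical.

```python
def assign_contained(features, transcripts):
    """Assign each feature to the single lncRNA gene that fully contains it.

    features: dict of feature id -> (chrom, start, end, strand).
    transcripts: dict of transcript id -> (gene, chrom, start, end, strand).
    Features contained in no gene, or in more than one gene, are dropped.
    """
    assigned = {}
    for fid, (f_chrom, f_start, f_end, f_strand) in features.items():
        genes = {
            gene
            for gene, t_chrom, t_start, t_end, t_strand in transcripts.values()
            if t_chrom == f_chrom and t_strand == f_strand
            and t_start <= f_start and f_end <= t_end
        }
        if len(genes) == 1:
            assigned[fid] = next(iter(genes))
    return assigned
```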

Three tools, namely miRanda (41), TargetScan (42) and RNAhybrid (43), were used to predict lncRNA-miRNA interactions. Interactions supported by all three tools, as well as those supported by any two tools, are listed in the ‘Interaction’ section. Interactions predicted by only one tool are provided in ‘Downloads’.
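The way predictions are split between the ‘Interaction’ section and ‘Downloads’ amounts to counting how many tools support each lncRNA-miRNA pair. A minimal sketch follows, assuming each tool's output has already been parsed into a set of (lncRNA, miRNA) pairs; the function name and input layout are illustrative.

```python
from collections import Counter

def split_by_support(predictions):
    """Split predicted pairs by the number of supporting tools.

    predictions: dict of tool name -> iterable of (lncRNA, miRNA) pairs.
    Returns (shown, download_only): pairs supported by at least two tools,
    and pairs supported by exactly one tool.
    """
    support = Counter()
    for pairs in predictions.values():
        support.update(set(pairs))  # each tool counts once per pair
    shown = {pair for pair, n in support.items() if n >= 2}
    download_only = {pair for pair, n in support.items() if n == 1}
    return shown, download_only
```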

Implementation

LncBook 2.0 was implemented with Spring Boot (https://spring.io/projects/spring-boot), MySQL (https://www.mysql.com) and Apache Tomcat Server (https://tomcat.apache.org). Web interfaces were developed with HTML5, CSS3, AJAX (Asynchronous JavaScript and XML), jQuery (https://jquery.com), Bootstrap (https://getbootstrap.com) and Semantic UI (https://semantic-ui.com). In addition, data visualization was powered by HighCharts (https://www.highcharts.com.cn), ECharts (https://echarts.apache.org), Plotly.js (https://plotly.com) and DataTables (https://datatables.net). Web tools were set up with HTML widgets, NCBI BLAST+ (44) and in-house Python scripts.

IMPROVED CONTENT AND NEW FEATURES

Expanded lncRNA list and enriched multi-omics annotations

LncBook is devoted to providing a comprehensive and high-quality collection of human lncRNAs as well as their annotations based on multi-omics analysis and value-added curation. Compared with the previous version, LncBook 2.0 is significantly improved in the quality of human lncRNA genes, and the comprehensiveness of their multi-omics annotations (Table 1).

LncBook 2.0 features a full list of human lncRNAs, obtained by comprehensively integrating lncRNAs from different resources and curating them with stricter criteria (see details in Materials and Methods). As a result, it incorporates 119 722 new transcripts and 9632 new genes, updates the structure of 21 305 genes and provides a high-quality collection of 323 950 lncRNA transcripts and 95 243 genes, compared with 270 044 transcripts and 140 356 genes in the previous version. Because transcripts whose exonic regions overlap on the same strand are assigned to the same gene, LncBook 2.0 presents a decreased gene count despite the increased number of transcripts. Based on this list, it incorporates more annotations by including new omics profiles and covering more biological contexts.

We characterize conservation features of human lncRNA genes across 40 vertebrates, identify 139 306 homologous genes for 22 347 human lncRNA genes, and integrate 34 012 lncRNA-encoded small proteins for 5743 lncRNA genes. Expression profiles have been enriched with more biological contexts, increasing from 1 (normal tissue/cell line) to 9 (normal tissue/cell line, organ development, preimplantation embryo, cell differentiation, subcellular localization, exosome, cancer cell line, virus infection, circadian rhythm). Moreover, disease types for DNA methylation profiles have been expanded from 9 cancers to 14 cancers, and 2 neurodevelopmental disorders have been included as well. As a result, LncBook 2.0 contains a total of 24 157 featured lncRNA genes for expression (specifically/consistently/differentially/dynamically/periodically expressed) and 19 543 featured lncRNA genes for DNA methylation (hyper/hypomethylated in promoter or body region). Also, we curate 959 138 disease/trait-associated variants associated with 50 165 lncRNA genes, identify 772 745 lncRNA–protein interactions for 2005 lncRNA genes, and predict 146 092 274 lncRNA-miRNA interactions for all lncRNA genes.

Database contents and organization

LncBook 2.0 is a gene-centric resource with user-friendly web interfaces for searching, browsing, visualizing, analyzing and downloading (Figure 1). Each lncRNA gene has its own web page, which is composed of nine sections: gene summary, transcript information, coding potential, conservation, variation, methylation, expression, small protein and interaction. For any lncRNA gene, the annotations are summarized in various tables, sequence conservation status across 40 vertebrates is displayed in a phylogenetic tree, methylation levels of both promoter and body regions across 16 diseases are visualized in boxplots, and expression profiles across 337 biological conditions are represented in a bar chart. Meanwhile, LncBook 2.0 allows interactive visualization of literature curation results in LncRNAWiki (45), expression profiles across diverse biological contexts in LncExpDB (38) and relevant annotations in the integrated databases. In addition, LncBook 2.0 provides dedicated web pages for each omics resource with abundant descriptive terms to enable various customized comparisons. Moreover, curated multi-omics features of all lncRNAs are summarized in tabular form on the ‘Genes’ page. Based on these annotations, LncBook 2.0 presents a series of statistics and analysis results on the ‘Statistics’ page, and deploys several tools for online analysis. Equally important, all the associated data are publicly available on the ‘Downloads’ page, and all tables and figures in LncBook 2.0 can be freely downloaded.

Figure 1.

Data contents and organization in LncBook 2.0. A comprehensive, high-quality collection of human lncRNAs is annotated at different omics levels and organized with user-friendly web interfaces for searching, browsing, visualizing, analyzing and downloading.

Functional lncRNA identification and exploration

As an alternative to experimental examination, bioinformatics association analysis is an efficient approach to investigating the putative functions of lncRNAs using multi-omics data across various biological contexts. Therefore, LncBook 2.0 is committed to providing high-quality functional evidence from evolutionary conservation, genome variation, DNA methylation, gene expression, small proteins and lncRNA-mediated interactions, which can be overviewed for all collected lncRNA genes (Figure 2).

Figure 2.

Screenshot of the ‘Genes’ page. Multi-omics features, including expression capacity, featured expression pattern, featured methylation pattern, disease/trait-associated variation, lncRNA–protein interaction and small protein expression, are summarized in a table that supports customized filtering and sorting. Functional evidence for SATB2-AS1 and WT1-AS inferred from multi-omics association analysis and experimental evidence is described in the manuscript.

Users can start from highly conserved lncRNA genes with more multi-omics associations, for example, by setting the following filters: gene age ≥14, high expression capacity, featured gene for both expression and methylation, possessing disease/trait-associated variants, and encoding small proteins. This yields a list of 100 lncRNA genes (Supplementary Table S1). The multi-omics associations suggest that SATB2-AS1 is closely associated with colorectal cancer: it is highly expressed in colon and rectum, hypermethylated in colon adenocarcinoma, possesses large intestine carcinoma-associated variants, and its encoded small proteins (SPROHSA260428 and SPROHSA260429) are also detected in colorectal cancer samples. Consistently, annotations in LncRNAWiki show that SATB2-AS1 has been reported to be involved in colorectal cancer (46,47). Another lncRNA, WT1-AS, is suggested to be closely associated with leukemia, as it is highly expressed and hypermethylated in leukemia samples, and its encoded small proteins (SPROHSA65308, SPROHSA264911 and SPROHSA326667) are also detected in leukemia samples. Consistently, WT1-AS has been experimentally validated to play an important role in leukemia (48,49). LncBook 2.0 also provides homologous genes for these lncRNAs, which may offer new insights into their biological functions in different species. Notably, 50 of the 100 lncRNA genes are functionally uncharacterized to date and can be regarded as valuable candidates for experimental investigation and in-depth functional research. We stress that more multi-omics associations do not necessarily imply greater functional importance, and users are encouraged to perform customized selection using the omics features of their own interest.
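The filter combination described above can also be reproduced offline on a downloaded gene table. The sketch below assumes a hypothetical row layout; the field names are illustrative and do not correspond to actual column names in the LncBook 2.0 download files, and gene age is taken as the numeric value shown on the website.

```python
from dataclasses import dataclass

@dataclass
class GeneRow:
    # Hypothetical fields mirroring the filters described in the text.
    symbol: str
    gene_age: int
    expression_capacity: str      # "high", "medium" or "low"
    featured_expression: bool
    featured_methylation: bool
    has_trait_variants: bool
    encodes_small_protein: bool

def candidate_genes(rows, min_age=14):
    """Return symbols of genes that pass every filter named in the text."""
    return [
        row.symbol
        for row in rows
        if row.gene_age >= min_age
        and row.expression_capacity == "high"
        and row.featured_expression
        and row.featured_methylation
        and row.has_trait_variants
        and row.encodes_small_protein
    ]
```

Relaxing any single condition, for example lowering `min_age`, is a simple way to explore the customized selections the text encourages.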

DISCUSSION AND FUTURE DEVELOPMENTS

As an important resource of the National Genomics Data Center (50), LncBook, in close partnership with LncExpDB (38) and LncRNAWiki (45), serves as a fundamental resource providing comprehensive, high-quality lncRNAs and their annotations. Considering the growing volume of human lncRNAs, we plan to develop an automatic pipeline and web server to ease lncRNA integration and curation, and to classify these lncRNAs in collaboration with field experts in RNAcentral (51). As more lncRNAs are identified in different species, we plan to improve the conservation annotation by including results generated from lncRNA sequence alignments. Moreover, to better decipher the characteristics of lncRNAs, we will continue to include new omics features, such as lncRNA–DNA/RNA interactions, histone modification regulation, lncRNA modifications/edits and structures, integrate more biological contexts, and perform comparisons between lncRNAs and other types of genes (e.g. protein-coding genes). With the incorporation of more datasets and annotations, we also plan to develop a robust metric to estimate the confidence level of each lncRNA gene and accordingly provide a high-confidence list of functional lncRNAs.

DATA AVAILABILITY

LncBook 2.0 is freely available online at https://ngdc.cncb.ac.cn/lncbook.

ACKNOWLEDGEMENTS

We thank Qianpeng Li, Qiheng Qian, Xiaonan Liu, Mochen Zhang and Chang Liu for their valuable comments and discussions in this work.

Contributor Information

Zhao Li, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Lin Liu, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China.

Changrui Feng, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Yuxin Qin, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Jingfa Xiao, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Zhang Zhang, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Lina Ma, National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030400, XDA19050302]; Youth Innovation Promotion Association of Chinese Academy of Sciences [2019104]; National Natural Science Foundation of China [32030021, 31871328]. Funding for open access charge: Strategic Priority Research Program of the Chinese Academy of Sciences.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Ma L., Cao J., Liu L., Du Q., Li Z., Zou D., Bajic V.B., Zhang Z. LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res. 2019; 47:D128–D134.
  • 2. Lin Y., Pan X., Shen H.-B. lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning. Bioinformatics. 2021; 37:2308–2316.
  • 3. Kraczkowska W., Jagodzinski P.P. The long non-coding RNA landscape of atherosclerotic plaques. Mol. Diagn. Ther. 2019; 23:735–749.
  • 4. Cai B., Cai J., Yin Z., Jiang X., Yao C., Ma J., Xue Z., Miao P., Xiao Q., Cheng Y. et al. Long non-coding RNA expression profiles in neutrophils revealed potential biomarker for prediction of renal involvement in SLE patients. Rheumatology (Oxford). 2021; 60:1734–1746.
  • 5. Zhao X., Tang D., Chen X., Chen S., Wang C. Functional lncRNA-miRNA-mRNA networks in response to baicalein treatment in hepatocellular carcinoma. Biomed. Res. Int. 2021; 2021:8844261.
  • 6. Liu X., Xu Y., Wang R., Liu S., Wang J., Luo Y., Leung K.S., Cheng L. A network-based algorithm for the identification of moonlighting noncoding RNAs and its application in sepsis. Brief. Bioinform. 2021; 22:581–588.
  • 7. Turjya R.R., Khan M.A., Mir Md Khademul Islam A.B. Perversely expressed long noncoding RNAs can alter host response and viral proliferation in SARS-CoV-2 infection. Future Virol. 2020; 15:577–593.
  • 8. Doke M., McLaughlin J.P., Cai J.J., Pendyala G., Kashanchi F., Khan M.A., Samikkannu T. HIV-1 tat and cocaine impact astrocytic energy reservoirs and epigenetic regulation by influencing the LINC01133-hsa-miR-4726-5p-NDUFA9 axis. Mol. Ther. Nucleic Acids. 2022; 29:243–258.
  • 9. Jiang S., Cheng S.J., Ren L.C., Wang Q., Kang Y.J., Ding Y., Hou M., Yang X.X., Lin Y., Liang N. et al. An expanded landscape of human long noncoding RNA. Nucleic Acids Res. 2019; 47:7842–7856.
  • 10. Frankish A., Diekhans M., Ferreira A.M., Johnson R., Jungreis I., Loveland J., Mudge J.M., Sisu C., Wright J., Armstrong J. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019; 47:D766–D773.
  • 11. Pertea M., Shumate A., Pertea G., Varabyou A., Breitwieser F.P., Chang Y.C., Madugundu A.K., Pandey A., Salzberg S.L. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 2018; 19:208.
  • 12. Hon C.C., Ramilowski J.A., Harshbarger J., Bertin N., Rackham O.J., Gough J., Denisenko E., Schmeier S., Poulsen T.M., Severin J. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature. 2017; 543:199–204.
  • 13. You B.H., Yoon S.H., Nam J.W. High-confidence coding and noncoding transcriptome maps. Genome Res. 2017; 27:1050–1062.
  • 14. Li Q., Li Z., Feng C., Jiang S., Zhang Z., Ma L. Multi-omics annotation of human long non-coding RNAs. Biochem. Soc. Trans. 2020; 48:1545–1556.
  • 15. Diederichs S. The four dimensions of noncoding RNA conservation. Trends Genet. 2014; 30:121–123.
  • 16. Zhang C., Zhou B., Gu F., Liu H., Wu H., Yao F., Zheng H., Fu H., Chong W., Cai S. et al. Micropeptide PACMP inhibition elicits synthetic lethal effects by decreasing CtIP and poly(ADP-ribosyl)ation. Mol. Cell. 2022; 82:1297–1312.
  • 17. Li X.L., Pongor L., Tang W., Das S., Muys B.R., Jones M.F., Lazar S.B., Dangelmaier E.A., Hartford C.C., Grammatikakis I. A small protein encoded by a putative lncRNA regulates apoptosis and tumorigenicity in human colorectal cancer cells. Elife. 2020; 9:e53734.
  • 18. Huang J.Z., Chen M., Chen Gao X.C., Zhu S., Huang H., Hu M., Zhu H., Yan G.R. A peptide encoded by a putative lncRNA HOXB-AS3 suppresses colon cancer growth. Mol. Cell. 2017; 68:171–184.
  • 19. Pertea G., Pertea M. GFF utilities: gffread and gffcompare. F1000Res. 2020; 9:304.
  • 20. Kang Y.J., Yang D.C., Kong L., Hou M., Meng Y.Q., Wei L., Gao G. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017; 45:W12–W16.
  • 21. Wang G., Yin H., Li B., Yu C., Wang F., Xu X., Cao J., Bao Y., Wang L., Abbasi A.A. et al. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics. 2019; 35:2949–2956.
  • 22. Wang L., Park H.J., Dasari S., Wang S., Kocher J.P., Li W. CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013; 41:e74.
  • 23. Li A., Zhang J., Zhou Z. PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinf. 2014; 15:311.
  • 24. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S. et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012; 22:1760–1774.
  • 25. Xie C., Yuan J., Li H., Li M., Zhao G., Bu D., Zhu W., Wu W., Chen R., Zhao Y. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res. 2014; 42:D98–D103.
  • 26. Lee B.T., Barber G.P., Benet-Pages A., Casper J., Clawson H., Diekhans M., Fischer C., Gonzalez J.N., Hinrichs A.S., Lee C.M. et al. The UCSC genome browser database: 2022 update. Nucleic Acids Res. 2022; 50:D1115–D1122.
  • 27. Guo C.J., Ma X.K., Xing Y.H., Zheng C.C., Xu Y.F., Shan L., Zhang J., Wang S., Wang Y., Carmichael G.G. et al. Distinct processing of lncRNAs contributes to Non-conserved functions in stem cells. Cell. 2020; 181:621–636.
  • 28. Tate J.G., Bamford S., Jubb H.C., Sondka Z., Beare D.M., Bindal N., Boutselakis H., Cole C.G., Creatore C., Dawson E. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2019; 47:D941–D947.
  • 29. Landrum M.J., Lee J.M., Benson M., Brown G.R., Chao C., Chitipiralla S., Gu B., Hart J., Hoffman D., Jang W. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018; 46:D1062–D1067.
  • 30. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019; 47:D1005–D1012.
  • 31. Fadista J., Manning A.K., Florez J.C., Groop L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur. J. Hum. Genet. 2016; 24:1202–1205.
  • 32. Jannot A.S., Ehret G., Perneger T. P < 5×10(-8) has emerged as a standard of statistical significance for genome-wide association studies. J. Clin. Epidemiol. 2015; 68:460–465.
  • 33. Kohler S., Gargano M., Matentzoglu N., Carmody L.C., Lewis-Smith D., Vasilevsky N.A., Danis D., Balagura G., Baynam G., Brower A.M. et al. The human phenotype ontology in 2021. Nucleic Acids Res. 2021; 49:D1207–D1217.
  • 34. Malone J., Holloway E., Adamusiak T., Kapushesky M., Zheng J., Kolesnikov N., Zhukova A., Brazma A., Parkinson H. Modeling sample variables with an experimental factor ontology. Bioinformatics. 2010; 26:1112–1118.
  • 35. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842.
  • 36. Hutter C., Zenklusen J.C. The cancer genome atlas: creating lasting value beyond its data. Cell. 2018; 173:283–285.
  • 37. Clough E., Barrett T. The gene expression omnibus database. Methods Mol. Biol. 2016; 1418:93–110.
  • 38. Li Z., Liu L., Jiang S., Li Q., Feng C., Du Q., Zou D., Xiao J., Zhang Z., Ma L. LncExpDB: an expression database of human long non-coding RNAs. Nucleic Acids Res. 2021; 49:D962–D968.
  • 39. Li Y., Zhou H., Chen X., Zheng Y., Kang Q., Hao D., Zhang L., Song T., Luo H., Hao Y. et al. SmProt: a reliable repository with comprehensive annotation of small proteins identified from ribosome profiling. Genomics Proteomics Bioinformatics. 2021; 19:602–610.
  • 40. Davis C.A., Hitz B.C., Sloan C.A., Chan E.T., Davidson J.M., Gabdank I., Hilton J.A., Jain K., Baymuradov U.K., Narayanan A.K. et al. The encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 2018; 46:D794–D801.
  • 41. Betel D., Koppal A., Agius P., Sander C., Leslie C. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010; 11:R90.
  • 42. Nam J.W., Rissland O.S., Koppstein D., Abreu-Goodger C., Jan C.H., Agarwal V., Yildirim M.A., Rodriguez A., Bartel D.P. Global analyses of the effect of different cellular contexts on microRNA targeting. Mol. Cell. 2014; 53:1031–1043.
  • 43. Kruger J., Rehmsmeier M. RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res. 2006; 34:W451–W454.
  • 44. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC Bioinf. 2009; 10:421.
  • 45. Liu L., Li Z., Liu C., Zou D., Li Q., Feng C., Jing W., Luo S., Zhang Z., Ma L. LncRNAWiki 2.0: a knowledgebase of human long non-coding RNAs with enhanced curation model and database system. Nucleic Acids Res. 2022; 50:D190–D195.
  • 46. Wang Y.Q., Jiang D.M., Hu S.S., Zhao L., Wang L., Yang M.H., Ai M.L., Jiang H.J., Han Y., Ding Y.Q. et al. SATB2-AS1 suppresses colorectal carcinoma aggressiveness by inhibiting SATB2-Dependent snail transcription and epithelial-mesenchymal transition. Cancer Res. 2019; 79:3542–3556.
  • 47. Xu M., Xu X., Pan B., Chen X., Lin K., Zeng K., Liu X., Xu T., Sun L., Qin J. et al. LncRNA SATB2-AS1 inhibits tumor metastasis and affects the tumor immune cell microenvironment in colorectal cancer by regulating SATB2. Mol. Cancer. 2019; 18:135.
  • 48. Dallosso A.R., Hancock A.L., Malik S., Salpekar A., King-Underwood L., Pritchard-Jones K., Peters J., Moorwood K., Ward A., Malik K.T. et al. Alternately spliced WT1 antisense transcripts interact with WT1 sense RNA and show epigenetic and splicing defects in cancer. RNA. 2007; 13:2287–2299.
  • 49. Wang W., Lyu C., Wang F., Wang C., Wu F., Li X., Gan S. Identification of potential signatures and their functions for acute lymphoblastic leukemia: a study based on the cancer genome atlas. Front. Genet. 2021; 12:656042.
  • 50. CNCB-NGDC Members and Partners. Database resources of the national genomics data center, china national center for bioinformation in 2022. Nucleic Acids Res. 2022; 50:D27–D38.
  • 51. RNAcentral Consortium. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 2021; 49:D212–D220.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
