lncRNADetector: a bioinformatics pipeline for long non-coding RNA identification and MAPslnc: a repository of medicinal and aromatic plant lncRNAs

Bhaskar Shukla; Sanchita Gupta; Gaurava Srivastava; Ashok Sharma; Ashutosh K Shukla; Ajit K Shasany

2021 Mar 18;18(12):2290–2295. doi: 10.1080/15476286.2021.1899673

PMCID: PMC8632071 PMID: 33685383

ABSTRACT

Long non-coding RNAs (lncRNAs) are an emerging class of non-coding RNAs and potent regulatory elements in living cells. High-throughput RNA sequencing analyses have generated a tremendous amount of transcript sequence data. A large proportion of these transcript sequences do not code for proteins and are known as non-coding RNAs. Among them, lncRNAs are a unique class of transcripts longer than 200 nucleotides with diverse biological functions and regulatory mechanisms. Recent studies based on next-generation sequencing technologies indicate that plant genomes contain a substantial number of lncRNAs that are yet to be identified. The computational identification of lncRNAs from these transcripts is challenging because it involves a series of filtering steps. We have developed lncRNADetector, a bioinformatics pipeline for the identification of novel lncRNAs, especially from medicinal and aromatic plant (MAP) species. lncRNADetector has been used to identify 88,459 lncRNAs from 21 MAP species. To provide a knowledge resource for the plant research community towards elucidating the diverse biological roles of lncRNAs, the information generated about MAP lncRNAs (after all filtering steps) through lncRNADetector has been stored and organized in the MAPslnc database (https://lncrnapipe.cimap.res.in). The lncRNADetector web server and the MAPslnc database have been developed to help researchers accurately identify lncRNAs from the next-generation sequencing data of different organisms for downstream studies. To the best of our knowledge, no comparable database of MAP lncRNAs is available to date.

KEYWORDS: Bioinformatics pipeline, long non-coding RNAs (lncRNAs), medicinal and aromatic plants (MAPs)

Introduction

Recent studies have revealed that RNAs play a regulatory role in diverse biological processes such as development and stress responses in plants [1]. Although it has been hypothesized on the basis of high-throughput genomic platforms that approximately 90% of a genome may be transcribed into RNA, only 1–2% of the transcripts code for proteins and the rest are considered to be non-coding [2]. Non-coding RNAs (ncRNAs) comprise several complex families, including housekeeping RNAs and regulatory RNAs. The housekeeping or structural ncRNAs, including tRNA, rRNA, snRNA and snoRNA, are expressed constitutively in the cell [3]. The regulatory RNAs include small RNAs (miRNAs, siRNAs) and long non-coding RNAs (lncRNAs). Earlier, lncRNAs were considered to be a source of transcriptional noise because of their lack of coding potential and lower expression compared with protein-coding transcripts [4]. More recently, lncRNAs have emerged as potent regulators of gene expression [5,6]. Like mRNAs, lncRNAs are transcribed by RNA polymerase II and spliced. Additionally, the number of lncRNAs found is much higher than the number of protein-coding RNAs (mRNAs) [7]. The lncRNA gene Early nodulin40 (ENOD40) in Medicago truncatula [8] has been reported to play a critical role in nodule formation. In Arabidopsis thaliana, the COOLAIR and COLDAIR lncRNAs participate in the regulation of the FLOWERING LOCUS C (FLC) gene, which affects flowering time [9]. These prominent discoveries of lncRNAs in plants underline the interest in exploring their biological functions, which in turn may be responsible for regulating important agronomic traits [10]. Besides their regulatory function, the functional roles and conserved features of lncRNAs are a matter of current debate and research in the scientific community. It could be possible to work out their functional and conserved (evolutionary) aspects once they have been identified completely.
At present, it is difficult to identify all the lncRNAs present in an organism. Bioinformatics makes the task easier and faster. Reports are available on the different computational steps needed for the identification of lncRNAs. A stepwise multi-filtering approach has proved effective in discovering thousands of novel lncRNAs in various organisms [11,12]. For this, different tools and software packages have to be run one by one for the step-by-step analyses. There is therefore a need for an easy-to-use bioinformatics pipeline that integrates the series of filtering steps into a single platform and facilitates the discovery of lncRNAs. Towards this end, we have developed lncRNADetector, a bioinformatics pipeline that combines all the analysis steps in a single platform. It takes the assembled transcript sequences in FASTA format as input and writes the output of every step to a separate file. The steps involve the use of CPC (Coding Potential Calculator), a Support Vector Machine-based tool [13], to discriminate between protein-coding and non-coding transcripts. CPC employs UniRef90 as its protein database [14]. Most currently available tools rely on CPC alone, which might categorize novel protein-coding transcripts as non-coding when no matching records exist in the protein database used by CPC. In lncRNADetector, CPC was chosen because it accepts FASTA and multispecies input, and it is combined with other filtering steps that enhance the accuracy of the results. A few resources have been reported earlier for analysing lncRNAs.
For example, iSeeRNA is an SVM-based classifier and standalone tool for discriminating between protein-coding transcripts and lincRNAs (long intergenic non-coding RNAs), a kind of lncRNA transcribed from intergenic regions of the genome [15]; Sebnif is a bioinformatics pipeline used to identify high-quality single-exonic lincRNAs [16]; and CPC2 is a fast and accurate coding potential calculator based on sequence intrinsic features [17]. However, a need was felt to further increase the accuracy with which existing tools identify lncRNAs. In our analysis, all the identified novel lncRNAs were stored and managed in the form of the MAPslnc database. lncRNA databases for various species, including plants, have been built, but none of them focuses specifically on medicinal and aromatic plants (MAPs). Earlier, the MAPs were considered to be orphan plants, but they are now in high focus and many of them, such as Catharanthus roseus (known for antineoplastic bisindole alkaloids), Papaver somniferum (known for morphinan alkaloids) and Artemisia annua (known for the antimalarial sesquiterpene lactone artemisinin), have acquired the status of model plants [18]. The MAPs that we have chosen are of high importance in terms of their medicinal/aromatic value, and very little information is available on their lncRNAs. The primary objective of developing the MAPslnc database is to provide a specific repository of lncRNAs from MAPs.

Materials and methods

Pipeline overview

The lncRNADetector is a comprehensive bioinformatics pipeline for the identification of lncRNAs from the assembled transcriptome sequencing (NGS) data. It integrates several filtering steps to identify novel lncRNAs (Fig. 1).

Figure 1. Schematic representation of the lncRNADetector workflow

Datasets used for developing lncRNADetector

Datasets from various MAP species were used. The input data for lncRNA identification through lncRNADetector were downloaded in FASTA format from the nucleotide database of the NCBI web portal (https://www.ncbi.nlm.nih.gov). Datasets were selected from MAP species that had a sufficient number of nucleotide sequences of transcript length (>200 bp).

Standard input file formats

To be compatible with both de novo and reference-based assembly software, we have set FASTA as the default input file format for lncRNADetector. The FASTA format allows easy integration of transcriptome data analysis into the lncRNADetector workflow.

lncRNA identification

The source code of a Perl pipeline for the identification of lncRNAs [11] was modified to integrate the standard tools involved in the various identification steps. We have implemented lncRNADetector, a bioinformatics pipeline for the identification of lncRNAs from assembled transcripts. The pipeline is hosted on a web server, which runs the analysis and stores the analysed data. In the first step, we filtered the assembled data by transcript length (>200 bp). Next, we analysed the open reading frames (ORFs) of the transcripts. Only sequences with ORFs shorter than 100 aa were considered further. The filtered sequences were analysed with BLAST (blastx, 2.2.28+) [19] against the patented protein sequences (pataa) database [20], with a default E-value threshold of 1e-3. Sequences reported as ‘No hits found’ in the blastx output were retained and used as input for the CPC tool [13]. Thus, the coding potential of the sequences remaining after blastx was calculated using CPC (0.9-r2), which is an integral part of lncRNADetector. Finally, the sequences marked as non-coding by CPC were re-analysed through blastx against the Pfam protein family database [21], which ultimately resulted in the identification of lncRNAs. As the pataa database contains sequences for protein structures not included in the non-redundant protein sequences (nr) database, the identified lncRNAs were further analysed through blastx against the nr database [20], with an E-value threshold of 1e-5, to enhance the accuracy of the identified lncRNAs [19].
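The first two filters (transcript length and ORF length) can be illustrated with a minimal Python sketch. It is not the Perl code used by lncRNADetector. The ORF definition (an ATG followed by an in-frame stop codon, searched on both strands) and all function names are assumptions made for illustration, since the text does not specify how ORFs are called.

```python
from typing import Dict, Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def longest_orf_aa(seq: str) -> int:
    """Longest ATG-to-stop ORF (in amino acids) over all six reading frames.

    Assumption: only ORFs closed by an in-frame stop codon are counted.
    """
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # codon index of the first ATG since the last stop
            codons = [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]
            for idx, codon in enumerate(codons):
                if codon == "ATG" and start is None:
                    start = idx
                elif codon in STOP_CODONS and start is not None:
                    best = max(best, idx - start)
                    start = None
    return best


def size_and_orf_filter(fasta_text: str, min_len: int = 200,
                        max_orf_aa: int = 100) -> Dict[str, str]:
    """Keep transcripts longer than min_len nt whose longest ORF is < max_orf_aa."""
    kept = {}
    for name, seq in read_fasta(fasta_text):
        if len(seq) > min_len and longest_orf_aa(seq) < max_orf_aa:
            kept[name] = seq
    return kept
```

The default thresholds mirror the values stated in the text (>200 nt, ORF shorter than 100 aa). The surviving sequences would then be passed on to the blastx and CPC steps.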

Recently launched species-neutral lncRNA identification tools, namely the LGC [22] and CPC2 [17] web servers, were chosen to validate the identified lncRNAs. The lncRNAs identified through lncRNADetector were used as input for both web servers and the results were compared.

Relational database of MAPs lncRNAs

Many public lncRNA databases for humans, vertebrates and plants have already been reported, but none of them focuses specifically on lncRNAs from MAPs. Therefore, we have developed the MAPs long non-coding RNA database (MAPslnc), which holds all the information on the lncRNAs identified from different MAP species. The lncRNAs were identified through lncRNADetector with all filtering steps applied. The resulting data generated by lncRNADetector (in FASTA/text files) are imported into the MAPslnc database (hosted on SQL Server) by mapping relational data fields through a .NET script.
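The import of pipeline output into relational fields can be sketched as follows. This is only an illustration: MAPslnc uses SQL Server and a .NET script, whereas the sketch substitutes Python's built-in sqlite3 module, and the table name, column names and helper functions are hypothetical.

```python
import sqlite3

# Hypothetical schema; the actual MAPslnc tables are not described in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS lncrna (
    accession        TEXT PRIMARY KEY,
    species          TEXT NOT NULL,
    length           INTEGER NOT NULL,
    coding_potential REAL,
    sequence         TEXT NOT NULL
)
"""


def load_records(conn, species, records):
    """Insert (accession, coding_potential, sequence) tuples for one species."""
    conn.execute(SCHEMA)
    rows = [(acc, species, len(seq), score, seq) for acc, score, seq in records]
    conn.executemany(
        "INSERT OR REPLACE INTO lncrna VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def count_by_species(conn):
    """Return a {species: number of stored lncRNAs} mapping."""
    cur = conn.execute(
        "SELECT species, COUNT(*) FROM lncrna GROUP BY species ORDER BY species"
    )
    return dict(cur.fetchall())
```

Storing the sequence length as its own column keeps length-based queries simple, at the cost of a value that could also be derived from the sequence.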

Results and discussion

We have implemented lncRNADetector, a bioinformatics pipeline for the identification of lncRNAs from assembled transcripts. It is hosted on a web server, which runs the analysis and stores the analysed data. It can identify lncRNAs, which may have important roles in various biological processes and systems in different organisms. The pipeline is available as a web prediction tool at https://lncrnapipe.cimap.res.in.

The majority of the lncRNAs predicted through lncRNADetector were also validated by other tools, namely LGC and CPC2 (Table 1). Overall, LGC and CPC2 confirmed 98.66% and 99.35% of the predictions, respectively, which corroborates the accuracy of lncRNADetector as an lncRNA prediction tool for MAPs.

Table 1. Comparative validation of lncRNADetector-identified lncRNAs through LGC and CPC2

S. No.	MAP species	lncRNAs identified through lncRNADetector	Validated through LGC ^# (total)	Validated through LGC ^# (%)	Validated through CPC2 ^# (total)	Validated through CPC2 ^# (%)
1	Gloriosa superba	12	12	100	12	100
2	Lippia alba	42	42	100	42	100
3	Plantago ovata	61	61	100	61	100
4	Aloe vera	56	56	100	56	100
5	Mentha spicata	99	99	100	99	100
6	M. arvensis	80	80	100	77	96.25
7	Acorus calamus	45	45	100	45	100
8	Taxus baccata	106	106	100	106	100
9	Andrographis paniculata	74	74	100	74	100
10	Mentha x piperita	225	225	100	223	99.11
11	Plumbago zeylanica	764	758	99.21	763	99.86
12	Centella asiatica	299	296	98.99	299	100
13	Tinospora cordifolia	555	542	97.65	548	98.73
14	Hippophae rhamnoides	1134	1131	99.73	1134	100
15	Crocus sativus	1684	1679	99.70	1679	99.70
16	Mucuna pruriens	29	29	100	29	100
17	Ocimum basilicum	2306	2301	99.78	2281	98.91
18	Withania somnifera	6376	6286	98.58	6366	99.84
19	Rauvolfia serpentina	13,598	13,487	99.18	13,586	99.91
20	Catharanthus roseus	16,507	16,330	98.92	16,464	99.73
21	Papaver somniferum	44,407	43,637	98.26	43,941	98.95
	Total	88,459	87,276	98.66	87,885	99.35

^#Input = Total number of lncRNAs identified in each species through lncRNADetector.
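The percentage columns of Table 1 follow from dividing the number of validated lncRNAs by the number identified through lncRNADetector. The short Python sketch below reproduces that arithmetic for the totals row. The function name is illustrative, and it rounds to two decimals, whereas a few per-species entries in the table appear to be truncated, so the last digit can differ.

```python
def validation_percentage(validated: int, identified: int) -> float:
    """Share of lncRNADetector predictions confirmed by a second tool, in percent."""
    if identified <= 0:
        raise ValueError("identified must be a positive count")
    if not 0 <= validated <= identified:
        raise ValueError("validated must lie between 0 and identified")
    return round(100 * validated / identified, 2)


# Totals row of Table 1: 88,459 predictions, 87,276 confirmed by LGC
# and 87,885 confirmed by CPC2.
lgc_total = validation_percentage(87276, 88459)
cpc2_total = validation_percentage(87885, 88459)
```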

Web server for lncRNADetector

We have developed and deployed a user-friendly web interface for lncRNADetector as a Windows service using C# and ASP.NET (Fig. 2).

Figure 2. Architecture of lncRNADetector

For the convenience of users, we have established a web portal at https://lncrnapipe.cimap.res.in. The user submits a set of nucleotide sequences in FASTA format on the web portal, can set the E-value parameter for blastx, and uploads the sequences for lncRNA identification. Depending on the size of the input data, the web server takes minutes to hours to provide the final output. lncRNADetector runs as a Windows service built with .NET, which executes the lncRNADetector Perl script when the user submits the input. When the user submits multiple nucleotide sequences, lncRNADetector processes them sequentially.

After the identification process has finished, the lncRNA results appear in the browser in tabular form, showing the accession number, description, coding potential of the transcript, length of the transcript, sequence and label; the table can be downloaded as a tab-separated file (.txt). Users can also click ‘Download’ to obtain all the intermediate files generated during the identification process as a ZIP archive. The lncRNADetector web server assigns a unique task ID to each submitted job. Users can track the progress of a submitted task and retrieve the results by task ID on the 'Result' page, which is reached from the homepage. In addition to using the web portal, users can download the stand-alone software package and install it on a Windows system. Complete installation and usage instructions are given on the ‘Help’ page of the web portal.
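For downstream analysis, the downloaded tab-separated result file can be read with a few lines of Python. The sketch below assumes one header row and the column order listed above (accession, description, coding potential, length, sequence, label). The actual file layout should be checked against a real download, and the class and function names are illustrative.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Assumed column order, taken from the description of the results table.
FIELDS = ("accession", "description", "coding_potential",
          "length", "sequence", "label")


@dataclass
class LncRNARecord:
    accession: str
    description: str
    coding_potential: float
    length: int
    sequence: str
    label: str


def parse_results(tsv_text: str, has_header: bool = True) -> List[LncRNARecord]:
    """Parse tab-separated lncRNADetector-style results into records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]
    if has_header and rows:
        rows = rows[1:]  # drop the header row
    records = []
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
        acc, desc, score, length, seq, label = row
        records.append(
            LncRNARecord(acc, desc, float(score), int(length), seq, label)
        )
    return records
```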

MAPslnc: repository of MAPs long non-coding (MAPslnc) RNAs

We have also created a repository of information on the lncRNAs identified through lncRNADetector, which is stored in a SQL Server database (Table 2). The MAPslnc database contains the sequences and their lengths as well as the coding potential of the identified lncRNAs. Presently, the identified lncRNA data of 21 MAP species are freely available from the online MAPslnc repository and can be downloaded as tab-separated files for downstream analysis and utilization.

Table 2. Summary of MAP lncRNAs in MAPslnc identified through step-wise filtering

S. No.	MAP species	Total number of nucleotide sequences in the dataset	Steps 1 & 2: sequences left after transcript size selection + ORF filters	Step 3: sequences left after blastx against pataa	Step 4: sequences left after CPC filter	Step 5: sequences left after blastx against Pfam	Step 6: sequences left after blastx against nr
1	G. superba	82	40	15	15	15	12
2	L. alba	84	73	43	43	43	42
3	P. ovata	208	143	62	61	61	61
4	A. vera	225	107	61	60	60	56
5	M. spicata	261	166	104	104	104	99
6	M. arvensis	286	196	100	100	100	80
7	A. calamus	517	153	49	49	49	45
8	T. baccata	530	282	107	107	107	106
9	A. paniculata	867	431	81	81	77	74
10	Mentha x piperita	1626	774	250	250	249	225
11	P. zeylanica	1899	1283	786	785	773	764
12	C. asiatica	4671	1544	344	337	334	299
13	H. rhamnoides	4991	3637	1191	1188	1178	1134
14	T. cordifolia	5713	1608	587	587	582	555
15	C. sativus	7216	4345	1766	1760	1745	1684
16	M. pruriens	15,818	1226	47	46	43	29
17	O. basilicum	24,171	9453	2603	2599	2501	2306
18	W. somnifera	74,567	18,589	7894	7882	7508	6376
19	R. serpentina	99,622	37,462	15,146	15,126	14,827	13,598
20	C. roseus	108,110	42,684	18,056	18,003	17,700	16,507
21	P. somniferum	184,411	67,916	46,033	45,925	44,893	44,407
	Total	535,875					88,459
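Each row of Table 2 can be read as a filtering funnel. The following Python sketch, with illustrative function names, computes how many sequences each step removes and what share of the input survives all steps, using the Papaver somniferum row as input.

```python
from typing import List, Sequence

# Labels for the transitions between consecutive columns of Table 2.
STEP_LABELS = ("size + ORF", "blastx vs pataa", "CPC",
               "blastx vs Pfam", "blastx vs nr")


def removed_per_step(counts: Sequence[int]) -> List[int]:
    """Number of sequences discarded by each successive filter."""
    pairs = list(zip(counts, counts[1:]))
    if any(later > earlier for earlier, later in pairs):
        raise ValueError("counts must not increase from one filter to the next")
    return [earlier - later for earlier, later in pairs]


def retained_percent(counts: Sequence[int]) -> float:
    """Percentage of the input sequences that survive all filters."""
    return round(100 * counts[-1] / counts[0], 2)


# Papaver somniferum row of Table 2: input, then counts after each step.
p_somniferum = [184411, 67916, 46033, 45925, 44893, 44407]
losses = dict(zip(STEP_LABELS, removed_per_step(p_somniferum)))
```

For this row, the size and ORF filters account for most of the removed sequences, while the CPC step discards only 108.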

The lncRNAs in the MAPslnc repository were obtained by applying the following filtering steps:

Step 1 (Transcript size selection): >200 nucleotides.

Step 2 (ORF): <100 amino acids.

Step 3 (blastx against patented protein sequences [pataa]): No hit in pataa database (E-value threshold 1e-3).

Step 4 (Coding potential calculation): Classified non-coding by CPC.

Step 5 (blastx against Pfam): No hit in Pfam database (E-value threshold 1e-3).

Step 6 (blastx against nr): No hit in nr database (E-value threshold 1e-5).

The results for lncRNAs obtained after step-wise filtering for all the 21 MAP species have been summarized in Table 2.
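The three blastx steps (3, 5 and 6) share one pattern: run blastx at a given E-value threshold and keep only the queries without a hit. The Python sketch below shows one way to express this. It only builds the command line, using the standard BLAST+ options -query, -db, -evalue, -outfmt and -out, and parses tabular output. The use of tabular output (-outfmt 6) is a substitution made here for easier parsing, whereas the pipeline described above looks for ‘No hits found’ in the default report. File and database names are placeholders.

```python
from typing import Iterable, List, Set


def blastx_command(query_fasta: str, database: str,
                   evalue: str, out_path: str) -> List[str]:
    """Build a blastx command line that writes tabular output (outfmt 6)."""
    return ["blastx", "-query", query_fasta, "-db", database,
            "-evalue", evalue, "-outfmt", "6", "-out", out_path]


def queries_with_hits(tabular_text: str) -> Set[str]:
    """Collect query identifiers (first column) from tabular BLAST output."""
    return {
        line.split("\t", 1)[0]
        for line in tabular_text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def no_hit_ids(all_ids: Iterable[str], tabular_text: str) -> List[str]:
    """Queries absent from the tabular output had no hit below the threshold."""
    hit = queries_with_hits(tabular_text)
    return [seq_id for seq_id in all_ids if seq_id not in hit]


# Placeholder file and database names for Step 3 (pataa, E-value 1e-3).
step3 = blastx_command("filtered.fasta", "pataa", "1e-3", "step3.tsv")
```

The command list could be passed to subprocess.run on a machine with BLAST+ and a local copy of the database. Steps 5 and 6 would reuse the same helper with the Pfam and nr databases and the thresholds given in the list above.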

Conclusions

Owing to next-generation sequencing technology, novel lncRNAs of different organisms can be identified through various bioinformatics pipelines. As lncRNAs play a critical role in many regulatory processes in plants, it is important to identify them with high accuracy. The proper storage of identified lncRNAs in a public database is also important for their downstream utilization by plant researchers. The non-redundant protein sequences (nr) database is very large (~65 Gb), and a blastx similarity search against it is computationally expensive. In contrast, blastx against the pataa database, which is much smaller (~900 Mb), gives nearly comparable results in significantly less search time. This time-cost trade-off prompted us to incorporate blastx against the pataa (but not the nr) database in the web portal of lncRNADetector. However, the online repository of identified MAP lncRNAs is based on all six filtering steps of lncRNADetector listed in Table 2. lncRNADetector achieves high prediction accuracy by incorporating a series of filtering steps and offers a valuable single-window platform for the identification of lncRNAs. The MAPslnc repository will be regularly updated to add new lncRNA sequences from other MAP species.

Acknowledgements

The authors are grateful to the Director, CSIR-CIMAP, for his encouragement and for providing the necessary laboratory facilities. They also thank Dr Sumit K Bag, Principal Scientist, CSIR-NBRI, Dr Ashutosh Singh, Associate Professor, Shiv Nadar University, and Dr Vikrant Gupta, Senior Principal Scientist, CSIR-CIMAP, for their valuable suggestions. The help provided by Dr V. Sundaresan, Principal Scientist, CSIR-CIMAP, Research Centre, Bengaluru, in the form of original photographs of the target MAPs is duly acknowledged. Er. Manoj Semwal, Principal Scientist, CSIR-CIMAP, and Mr Sanjay Singh, Technical Officer, CSIR-CIMAP, are gratefully acknowledged for providing the ICT facility for hosting the web server. This work was financially supported by in-house funds provided by CSIR-CIMAP. SG was supported by the Science and Engineering Research Board (SERB), India, in the form of a National Post-Doctoral Fellowship and GS was supported by an ICMR fellowship.

Funding Statement

This work was supported by the CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, India [In-house Project].

Disclosure statement

The authors have declared that no competing interests exist.

Supplementary material

Supplemental data for this article can be accessed here.

References

[1] Szakonyi D, Confraria A, Valerio C, et al. Editorial: Plant RNA Biology. Front Plant Sci. 2019;10:887.
[2] Pertea M. The human transcriptome: an unfinished story. Genes (Basel). 2012;3:344–360.
[3] Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–641.
[4] Liu J, Wang H, Chua NH. Long noncoding RNA transcriptome of plants. Plant Biotechnol J. 2015;13:319–328.
[5] Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009;10:155–159.
[6] Wilusz JE, Sunwoo H, Spector DL. Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 2009;23:1494–1504.
[7] Kornienko AE, Dotter CP, Guenzl PM, et al. Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans. Genome Biol. 2016;17:14.
[8] Campalans A, Kondorosi A, Crespi M. Enod40, a short open reading frame–containing mRNA, induces cytoplasmic localization of a nuclear RNA binding protein in Medicago truncatula. Plant Cell. 2004;16:1047–1059.
[9] Michaels SD, Amasino RM. FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell. 1999;11:949–956.
[10] Sanchita, Trivedi PK, Asif MH. Updates on plant long non-coding RNAs (lncRNAs): the regulatory components. Plant Cell Tiss Organ Cult. 2020;140:259–269.
[11] Zhang W, Han Z, Guo Q, et al. Identification of maize long non-coding RNAs responsive to drought stress. PloS One. 2014;9:e98958.
[12] Mu C, Wang R, Li T, et al. Long non-coding RNAs (lncRNAs) of sea cucumber: large-scale prediction, expression profiling, non-coding network construction, and lncRNA-microRNA-gene interaction analysis of lncRNAs in Apostichopus japonicus and Holothuria glaberrima during LPS challenge and radial organ complex regeneration. Mar Biotechnol (NY). 2016;18:485–499.
[13] Kong L, Zhang Y, Ye ZQ, et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35(Web Server issue):W345–W349.
[14] Suzek BE, Huang H, McGarvey P, et al. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23:1282–1288.
[15] Sun K, Chen X, Jiang P, et al. iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data. BMC Genomics. 2013;14 Suppl 2:S7.
[16] Sun K, Zhao Y, Wang H, et al. Sebnif: an integrated bioinformatics pipeline for the identification of novel large intergenic noncoding RNAs (lincRNAs)--application in human skeletal muscle cells. PloS One. 2014;9:e84500.
[17] Kang YJ, Yang DC, Kong L, et al. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017;45(W1):W12–W16.
[18] Khanuja SPS, Shukla AK. Medicinal plant metabolomes: converging botany and chemistry into health opportunity. In: Iqbal M, Ahmad A, editors. Current trends in medicinal botany. New Delhi: I.K. International Publishing House Pvt. Ltd.; 2014. p. 346–370.
[19] Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
[20] Sayers EW, Agarwala R, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019;47(D1):D23–D28.
[21] El-Gebali S, Mistry J, Bateman A, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47(D1):D427–D432.
[22] Wang G, Yin H, Li B, et al. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics. 2019;35:2949–2956.
