Brief Bioinform. 2018 Jul 31;20(6):2009–2027. doi: 10.1093/bib/bby065

Table 1.

Overview of machine learning-based lncRNA identification tools

| Methods | CPC [21] | CPAT a [24] | CNCI [31] | PLEK [32] | LncRNA-ID [28] | lncRScan-SVM [29] | DeepLNC [38] | COME [26] | CPC2 [35] |
|---|---|---|---|---|---|---|---|---|---|
| Year | 2007 | 2013 | 2013 | 2014 | 2015 | 2015 | 2016 | 2016 | 2017 |
| Category | Protein-coding potential calculator | Protein-coding potential calculator | lncRNA predictor | lncRNA predictor | lncRNA prediction method | lncRNA predictor | lncRNA predictor | Protein-coding potential calculator | Protein-coding potential calculator |
| Input Format | FASTA | FASTA, BED | FASTA, GTF | FASTA | | GTF | FASTA | GTF | FASTA, GTF |
| Species | Multi-species | Human, mouse, fly, zebrafish | Vertebrate, plant | Vertebrate, plant | | Human, mouse | Human | Human, mouse, fly, worm, plant | Multi-species |
| Requirements | Linux, BLAST, Protein database | Linux, Python2.7, R | Linux, Python2.7 | Linux, Python2.7 | | Linux, Python2.7, Biopython | | Linux, R | Linux, Python, Biopython |
| Model | SVM | Logistic regression | SVM | SVM | Balanced random forest | SVM | Deep neural network | Balanced random forest | SVM |
| Features | ORF information, BLASTX [22] | ORF length, transcript length, Fickett TESTCODE score [33, 34], hexamer score | ANT information, codon bias | Improved k-mer frequencies | ORF length and coverage, ribosome interaction [42–44], profile HMM-based alignment [30] | Count of stop codon, exon information, txCdsPredict score [45], PhastCons score [27] | k-mer frequencies | GC content, BLASTX [22], PhastCons score [27], ribosome profiling [3], INFERNAL result [48], expression data [49, 50], histone modification [4] | ORF information, Fickett TESTCODE score [33, 34], isoelectric point [36, 37] |
| Re-training | No | Yes | No | Yes | No | No | No | No | No |
| PubMed | https://www.ncbi.nlm.nih.gov/pubmed/17631615 | https://www.ncbi.nlm.nih.gov/pubmed/23335781 | https://www.ncbi.nlm.nih.gov/pubmed/23892401 | https://www.ncbi.nlm.nih.gov/pubmed/25239089 | https://www.ncbi.nlm.nih.gov/pubmed/26315901 | https://www.ncbi.nlm.nih.gov/pubmed/26437338 | | https://www.ncbi.nlm.nih.gov/pubmed/27608726 | https://www.ncbi.nlm.nih.gov/pubmed/28521017 |
| Software | http://cpc.cbi.pku.edu.cn/download | https://sourceforge.net/projects/rna-cpat | https://github.com/www-bioinfo-org/CNCI | https://sourceforge.net/projects/plek | | https://sourceforge.net/projects/lncrscansvm | https://bioserver.iiita.ac.in/deeplnc | https://github.com/lulab/COME | http://cpc2.cbi.pku.edu.cn/download.php |
| Web Server | http://cpc.cbi.pku.edu.cn | http://lilab.research.bcm.edu/cpat | | | | | | | http://cpc2.cbi.pku.edu.cn |

A brief summary of several lncRNA identification tools. Most of the methods summarized above identify lncRNAs from sequence-derived features alone, and only CPAT and PLEK can be re-trained by users. DeepLNC is available only as a web server, while CNCI, PLEK, lncRScan-SVM and COME can be downloaded for local use. CPC, CPAT and CPC2 are released both as stand-alone applications and as web servers. All the stand-alone tools listed in the table require a Linux operating system.

a

CPAT has updated its features in the latest version. The original feature ‘coverage of ORF’ has been replaced with the feature ‘length of the transcript’. The features ‘Fickett TESTCODE score’ and ‘hexamer score’ are calculated on the ORF region.
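
To make the sequence-derived features in Table 1 more concrete, the following is a minimal Python sketch of how a few of them (transcript length, ORF length, ORF coverage, GC content and k-mer frequencies) might be computed from a single transcript. It is not the implementation used by any tool in the table. The function names are hypothetical, and the ORF definition used here (the longest forward-strand stretch from an ATG to the first in-frame stop codon, stop codon included) is an assumption made for illustration; individual tools may define ORFs differently.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = set("ACGT")


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=3):
    """Relative frequency of every A/C/G/T k-mer (overlapping windows).

    Windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


def longest_orf_length(seq):
    """Length in nt of the longest ATG-to-stop ORF on the forward strand.

    Assumption: the ORF includes the stop codon, and ORFs without an
    in-frame stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def sequence_features(seq):
    """Collect a small feature dictionary for one transcript."""
    length = len(seq)
    orf_len = longest_orf_length(seq)
    return {
        "transcript_length": length,
        "orf_length": orf_len,
        "orf_coverage": orf_len / length if length else 0.0,
        "gc_content": gc_content(seq),
    }


if __name__ == "__main__":
    example = "CCATGAAATAGGG"  # toy sequence, not real data
    print(sequence_features(example))
    print(kmer_frequencies(example, k=2)["AT"])
```

A feature vector built this way could then be passed to a classifier such as logistic regression or an SVM, which is the general pattern shared by the sequence-based tools in the table.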