Table 1.
Methods | CPC [21] | CPAT (a) [24] | CNCI [31] | PLEK [32] | LncRNA-ID [28] | lncRScan-SVM [29] | DeepLNC [38] | COME [26] | CPC2 [35] |
---|---|---|---|---|---|---|---|---|---|
Year | 2007 | 2013 | 2013 | 2014 | 2015 | 2015 | 2016 | 2016 | 2017 |
Category | Protein-coding potential calculator | Protein-coding potential calculator | lncRNA predictor | lncRNA predictor | lncRNA prediction method | lncRNA predictor | lncRNA predictor | Protein-coding potential calculator | Protein-coding potential calculator |
Input Format | FASTA | FASTA, BED | FASTA, GTF | FASTA | | GTF | FASTA | GTF | FASTA, GTF |
Species | Multi-species | Human, mouse, fly, zebrafish | Vertebrate, plant | Vertebrate, plant | | Human, mouse | Human | Human, mouse, fly, worm, plant | Multi-species |
Requirements | Linux, BLAST, Protein database | Linux, Python2.7, R | Linux, Python2.7 | Linux, Python2.7 | | Linux, Python2.7, Biopython | | Linux, R | Linux, Python, Biopython |
Model | SVM | Logistic regression | SVM | SVM | Balanced random forest | SVM | Deep neural network | Balanced random forest | SVM |
Features | ORF information, BLASTX [22] | ORF length, transcript length, Fickett TESTCODE score [33, 34], Hexamer score | ANT information, codon bias | Improved k-mer frequencies | ORF length and coverage, ribosome interaction [42–44], profile HMM-based alignment [30] | Count of stop codon, exon information, txCdsPredict score [45], PhastCons score [27] | k-mer frequencies | GC content, BLASTX [22], PhastCons score [27], ribosome profiling [3], INFERNAL result [48], expression data [49, 50], histone modification [4] | ORF information, Fickett TESTCODE score [33, 34], isoelectric point [36, 37] |
Re-training | No | Yes | No | Yes | No | No | No | No | No |
PubMed | https://www.ncbi.nlm.nih.gov/pubmed/17631615 | https://www.ncbi.nlm.nih.gov/pubmed/23335781 | https://www.ncbi.nlm.nih.gov/pubmed/23892401 | https://www.ncbi.nlm.nih.gov/pubmed/25239089 | https://www.ncbi.nlm.nih.gov/pubmed/26315901 | https://www.ncbi.nlm.nih.gov/pubmed/26437338 | | https://www.ncbi.nlm.nih.gov/pubmed/27608726 | https://www.ncbi.nlm.nih.gov/pubmed/28521017 |
Software | http://cpc.cbi.pku.edu.cn/download | https://sourceforge.net/projects/rna-cpat | https://github.com/www-bioinfo-org/CNCI | https://sourceforge.net/projects/plek | | https://sourceforge.net/projects/lncrscansvm | https://bioserver.iiita.ac.in/deeplnc | https://github.com/lulab/COME | http://cpc2.cbi.pku.edu.cn/download.php |
Web Server | http://cpc.cbi.pku.edu.cn | http://lilab.research.bcm.edu/cpat | | | | | | | http://cpc2.cbi.pku.edu.cn |
A brief summary of several lncRNA identification tools. Most of the methods summarized above identify lncRNAs using sequence-derived features alone, and only CPAT and PLEK can be re-trained by users. DeepLNC is available only as a web server, whereas CNCI, PLEK, lncRScan-SVM and COME can be downloaded for local use. CPC, CPAT and CPC2 are released both as stand-alone applications and as web servers. All the stand-alone tools listed in the table require a Linux operating system.
(a) CPAT has updated its features in the latest version. The original feature ‘coverage of ORF’ has been replaced with the feature ‘length of the transcript’. The features ‘Fickett TESTCODE score’ and ‘hexamer score’ are calculated on the ORF region.
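
To make the notion of sequence-derived features concrete, the following Python sketch computes a few of the simpler quantities named in the Features row: transcript length, ORF length, ORF coverage and k-mer frequencies. It is an illustration only and does not reproduce the implementation of any tool in the table. The function names are hypothetical, and the ORF definition used here (longest ATG-to-stop stretch in the three forward frames, stop codon included) is an assumption, since each tool defines ORFs in its own way. The Fickett TESTCODE and hexamer scores are omitted because they depend on reference tables.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF in the three forward frames.

    Assumption: an ORF must start with ATG and end with an in-frame stop
    codon, and the stop codon is counted in the length.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close the ORF and look for the next one
    return best


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    seq = seq.upper().replace("U", "T")
    total = len(seq) - k + 1
    if total <= 0:
        return {"".join(p): 0.0 for p in product("ACGT", repeat=k)}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {"".join(p): counts["".join(p)] / total
            for p in product("ACGT", repeat=k)}


def sequence_features(seq: str, k: int = 3) -> dict:
    """Bundle the illustrative features for one transcript sequence."""
    orf_len = longest_orf_length(seq)
    return {
        "transcript_length": len(seq),
        "orf_length": orf_len,
        "orf_coverage": orf_len / len(seq) if seq else 0.0,
        "kmer_freq": kmer_frequencies(seq, k),
    }


if __name__ == "__main__":
    feats = sequence_features("GGATGAAACCCTAAGG")
    print(feats["transcript_length"], feats["orf_length"], feats["orf_coverage"])
```

A feature dictionary of this kind could then be flattened into a numeric vector and passed to a classifier such as an SVM or a random forest, which is the general pattern shared by most of the tools in the table.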