2021 Nov 13;33(2):248–270. doi: 10.1007/s00335-021-09928-7

Table 1.

Bioinformatic tools for annotating and classifying lncRNAs from multi-species databases

Species columns give the number of lncRNA genes/transcripts (genome assembly version).

| Database name | Read mapping | Gene modelling | Coding-potential assessment | Human | Mouse | Cow | Pig | Chicken | Dog | Horse |
|---|---|---|---|---|---|---|---|---|---|---|
| Ensembl (v104) | BWA | Exonerate | ORF and PFAM alignment^a | 16 896/46 960 (GRCh38.p13) | 9972/12 601 (GRCm39) | 1488/2199 (ARS_UCD1.2) | 6979/9367 (Sscrofa 11.1) | 5506/8870 (GRCg6a) | 7083/12 283 (CanFam 3.1) | 7244/11 978 (EquCab 3.0) |
| NCBI (v105) | Minimap2 (long-read), Splign (short-read) | Gnomon | Gnomon | 16 375/27 838 (GRCh38.p13) | 13 317/23 542 (GRCm39) | 5183/7254 (ARS_UCD1.2) | 5605/9292 (Sscrofa 11.1) | 5147/8233 (GRCg6a) | 10 823/19 248 (CanFam 3.1) | 6789/10 850 (EquCab 3.0) |
| NONCODE (v6.0) | Literature parsing with RNA-seq keywords | "CuffCompare" to deal with overlapping features | Comparison with RefSeq + "CNIT" | 96 411/173 112 (GRCh38) | 87 890/131 974 (GRCm39) | 22 127/23 515 (UMD 3.1) | 17 811/29 858 (Sscrofa 10.2) | 9527/12 850 (galgal4) | NA | NA |

The number of lncRNAs found for each species with the corresponding assembly used is also presented
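Each species cell follows the pattern "genes/transcripts (assembly)", with thousands sometimes grouped by a space. As a minimal sketch of how such a cell could be read programmatically, the Python below splits one cell into its three parts. The names `LncRNACount` and `parse_cell` are hypothetical helpers introduced for illustration only; they are not part of any of the listed databases or their APIs.

```python
import re
from typing import NamedTuple, Optional


class LncRNACount(NamedTuple):
    """Hypothetical container for one table cell."""
    genes: int
    transcripts: int
    assembly: str


# Matches "genes/transcripts (assembly)"; digits may be grouped with spaces.
_CELL = re.compile(r"^\s*([\d\s]+?)\s*/\s*([\d\s]+?)\s*\(([^)]+)\)\s*$")


def _to_int(text: str) -> int:
    """Drop any whitespace used as a thousands separator, then convert."""
    return int(re.sub(r"\s", "", text))


def parse_cell(cell: str) -> Optional[LncRNACount]:
    """Parse one species cell; return None for 'NA' entries."""
    if cell.strip().upper() == "NA":
        return None
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognised cell format: {cell!r}")
    genes, transcripts, assembly = match.groups()
    return LncRNACount(_to_int(genes), _to_int(transcripts), assembly.strip())


# Example with the Ensembl pig entry from the table.
pig = parse_cell("6979/9367 (Sscrofa 11.1)")
print(pig, round(pig.transcripts / pig.genes, 2))
```

The transcripts-per-gene ratio printed at the end is a derived figure, not a value reported in the table; it is shown only to illustrate what the parsed counts allow.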

^a Gene models containing a substantial open reading frame (ORF) and protein domains (e.g. from Pfam) are classified as coding. For human and mouse annotations, additional manual curations from Gencode
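The rule in the footnote (a substantial ORF together with protein-domain evidence marks a model as coding) can be sketched as a simple filter. This is an illustrative sketch only, not the Ensembl pipeline: the footnote gives no numeric definition of "substantial", so the `MIN_ORF_AA` cutoff below is an assumption, and `GeneModel`, `classify`, and the label "non-coding candidate" are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed cutoff: the footnote says "substantial" without giving a number.
MIN_ORF_AA = 100


@dataclass
class GeneModel:
    """Hypothetical summary of one gene model."""
    name: str
    longest_orf_aa: int                 # longest ORF length, in amino acids
    pfam_domains: Tuple[str, ...] = ()  # Pfam accessions hit by the ORF translation


def classify(model: GeneModel, min_orf_aa: int = MIN_ORF_AA) -> str:
    """Label a model 'coding' only if it has both a long ORF and a domain hit."""
    has_substantial_orf = model.longest_orf_aa >= min_orf_aa
    has_protein_domain = len(model.pfam_domains) > 0
    if has_substantial_orf and has_protein_domain:
        return "coding"
    return "non-coding candidate"


# Made-up example models (not taken from any database).
models = [
    GeneModel("model_A", longest_orf_aa=320, pfam_domains=("PF00069",)),
    GeneModel("model_B", longest_orf_aa=45),
    GeneModel("model_C", longest_orf_aa=210),
]
for m in models:
    print(m.name, classify(m))
```

Requiring both conditions follows the "and" in the footnote; a pipeline that treated either signal alone as sufficient would classify more models as coding.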