Table 1.
Species columns: number of lncRNA genes/transcripts (genome assembly version).

Database name | Read mapping | Gene modelling | Coding-potential assessment | Human | Mouse | Cow | Pig | Chicken | Dog | Horse
---|---|---|---|---|---|---|---|---|---|---
Ensembl (v104) | BWA | Exonerate | ORF and Pfam alignmentᵃ | 16 896/46 960 (GRCh38.p13) | 9972/12 601 (GRCm39) | 1488/2199 (ARS-UCD1.2) | 6979/9367 (Sscrofa11.1) | 5506/8870 (GRCg6a) | 7083/12 283 (CanFam3.1) | 7244/11 978 (EquCab3.0)
NCBI (v105) | Minimap2 (long reads); Splign (short reads) | Gnomon | Gnomon | 16 375/27 838 (GRCh38.p13) | 13 317/23 542 (GRCm39) | 5183/7254 (ARS-UCD1.2) | 5605/9292 (Sscrofa11.1) | 5147/8233 (GRCg6a) | 10 823/19 248 (CanFam3.1) | 6789/10 850 (EquCab3.0)
NONCODE (v6.0) | Literature parsing with RNA-seq keywords | Cuffcompare (to deal with overlapping features) | Comparison with RefSeq + CNIT | 96 411/173 112 (GRCh38) | 87 890/131 974 (GRCm39) | 22 127/23 515 (UMD3.1) | 17 811/29 858 (Sscrofa10.2) | 9527/12 850 (galGal4) | NA | NA
The number of lncRNA genes/transcripts found for each species is also presented, together with the genome assembly used.
ᵃ Gene models containing a substantial open reading frame (ORF) and protein domains (e.g. from Pfam) are classified as coding. For human and mouse annotations, additional manual curations from GENCODE
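Footnote ᵃ describes a rule based on ORF length and protein domains. The following is a minimal Python sketch of a rule of that kind. It is an illustration only, not Ensembl's actual code. The 100-codon threshold and the function names `longest_orf_length` and `classify` are hypothetical. The sketch also assumes that Pfam domain hits have already been computed elsewhere (e.g. with HMMER against Pfam) and are passed in as a boolean. Only the forward strand is scanned, which assumes stranded transcripts.

```python
# Hypothetical sketch of an ORF + protein-domain coding rule; not Ensembl's implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length, in codons (stop codon excluded), of the longest
    complete ATG-initiated ORF in the three forward reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the current open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # ORFs without a stop codon are ignored
    return best


def classify(seq: str, has_pfam_hit: bool, min_orf_codons: int = 100) -> str:
    """Assumed rule: a substantial ORF AND a protein-domain hit means coding."""
    if longest_orf_length(seq) >= min_orf_codons and has_pfam_hit:
        return "coding"
    return "lncRNA candidate"


if __name__ == "__main__":
    orf = "ATG" + "GCT" * 120 + "TAA"  # synthetic 121-codon ORF
    print(classify(orf, has_pfam_hit=True))
    print(classify(orf, has_pfam_hit=False))
    print(classify("ATGAAATAA", has_pfam_hit=True))
```

Requiring both conditions matches the footnote's wording. A transcript with a long ORF but no domain hit therefore stays a lncRNA candidate. This is one reason different pipelines, such as Gnomon or CNIT in the table, can disagree on the same transcript.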