Author manuscript; available in PMC: 2019 Apr 8.
Published in final edited form as: Nat Rev Genet. 2018 Sep;19(9):535–548. doi: 10.1038/s41576-018-0017-y

Table 1 | lncRNA annotations

| Name (version) | Reported size (gene loci) | Methods (a) | Comments | Completeness | Comprehensiveness (b) | Exhaustiveness (c) |
|---|---|---|---|---|---|---|
| NONCODE (v5) | 96,308 | Integration of other databases | The most comprehensive resource | 8.9% | 67,276 | 2.3 |
| MiTranscriptome (v2) | 63,615 | Assembly from short reads | Mainly cancer samples | 4.4% | 45,088 | 4.4 |
| FANTOM CAT (v1) | 27,919 | Assembly, other annotations and CAGE evidence | Mapped 5′ ends using CAGE tags | 15.8% | 27,278 | 3.3 |
| RefSeq (GCF_000001405.37_GRCh38.p11) | 15,791 | Manual (based on cDNA) and automated annotation (based on RNA-seq data) | The oldest annotation | 11.0% | 14,889 | 1.9 |
| GENCODE (v27) | 15,778 | Manual annotation based on cDNA, ESTs and high-quality long-read data | Used by most consortia and integrated with Ensembl | 13.5% | 15,063 | 1.9 |
| BIGTranscriptome (v1) | 14,158 | Assembly, with CAGE and 3P-seq evidence | Full-length transcripts | 27.7% | 12,632 | 2.1 |
| GENCODE+ | 13,434 | Union of GENCODE (v20) and CLS lncRNAs with anchor-merged CLS transcript models | Extension of GENCODE by CLS | 24.0% | 13,434 | 3.3 |
| CLS FL | 807 | lncRNAs from GENCODE+ with CAGE and poly(A) evidence | Full-length transcripts | 71.7% | 807 | 5.5 |
| Protein-coding (d) | 19,502 | GENCODE confident protein-coding transcripts | Not tagged mRNA_end_NF nor mRNA_start_NF in the original GENCODE v27 GTF file | 53.8% | 18,995 | 2.9 |

All numbers correct as of the end of 2017. MiTranscriptome, Functional Annotation of the Mammalian genome (FANTOM) cap analysis of gene expression (CAGE)-associated transcriptome (CAT) and BIGTranscriptome long non-coding RNA (lncRNA) catalogues were lifted over to the Genome Reference Consortium Human Build 38 (GRCh38) genome assembly. 3P-seq, poly(A)-position profiling by sequencing; CLS, capture long-read sequencing; EST, expressed sequence tag; RNA-seq, RNA sequencing.

(a) Assembly in the Methods column refers to transcriptome assembly using short reads from RNA-seq.

(b) Comprehensiveness is the total number of gene loci boundaries defined using buildLoci. To compare gene sets in a consistent way, the assembly patches were excluded, and the gene loci boundaries were redefined using buildLoci, which explains discrepancies between gene numbers presented here and those reported in original publications.

(c) Exhaustiveness is the average number of isoforms per gene locus. Figures for completeness, comprehensiveness and exhaustiveness as presented in FIG. 3 are shown here.

(d) A set of protein-coding transcripts was used as a reference.
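
The definitions in the footnotes on exhaustiveness and on the protein-coding reference set can be illustrated with a minimal Python sketch. It assumes a GENCODE-style GTF in which transcript rows carry quoted `gene_id`, `transcript_id`, `transcript_type` and `tag` attributes, and it groups isoforms by `gene_id` as a simple stand-in for the loci that buildLoci would define. The function names are hypothetical, and this is not the authors' pipeline, so it is not expected to reproduce the figures in the table.

```python
import re
from collections import defaultdict

# Matches quoted GTF attribute pairs such as: gene_id "G1"; tag "basic";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Tags marking transcripts whose start or end could not be confirmed.
INCOMPLETE_TAGS = {"mRNA_start_NF", "mRNA_end_NF"}


def parse_attributes(field):
    """Map each attribute key to the list of its values (keys such as tag can repeat)."""
    attrs = defaultdict(list)
    for key, value in ATTR_RE.findall(field):
        attrs[key].append(value)
    return attrs


def confident_protein_coding(gtf_lines):
    """Yield (gene_id, transcript_id) for protein-coding transcripts without incomplete-end tags."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript rows define isoforms
        attrs = parse_attributes(cols[8])
        if "protein_coding" not in attrs["transcript_type"]:
            continue
        if INCOMPLETE_TAGS & set(attrs["tag"]):
            continue
        yield attrs["gene_id"][0], attrs["transcript_id"][0]


def summarize(pairs):
    """Return (number of loci, mean isoforms per locus).

    Assumption: one gene_id equals one locus; the table instead uses
    loci redefined with buildLoci.
    """
    isoforms = defaultdict(set)
    for gene_id, transcript_id in pairs:
        isoforms[gene_id].add(transcript_id)
    n_loci = len(isoforms)
    if n_loci == 0:
        return 0, 0.0
    n_transcripts = sum(len(t) for t in isoforms.values())
    return n_loci, n_transcripts / n_loci
```

Calling `summarize(confident_protein_coding(open(path)))` on a GTF file, where `path` is a placeholder, would return a locus count and an isoforms-per-locus average analogous to the comprehensiveness and exhaustiveness columns, under the simplifying assumption stated above.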