Author manuscript; available in PMC: 2014 Jul 3.
Published in final edited form as: Cell. 2013 Jul 3;154(1):26–46. doi: 10.1016/j.cell.2013.06.020

Table 1.

Large-Scale Efforts to Catalog lincRNA Loci and Transcripts

| Reference | Data for Transcript Reconstruction | Genomic Features and Filters | Coding-Potential Filters | Number of lincRNAs |
|---|---|---|---|---|
| Mouse | | | | |
| Ravasi et al., 2006 | cDNAs | | Manual curation, ORF length, CRITICA | 13,502 transcripts |
| Ponjavic et al., 2007 | cDNAs, CAGE | | Manual curation, ORF length, BLAST, CRITICA | 3,122 transcripts |
| Guttman et al., 2009 | Chromatin marks, tiling arrays | Collection of approximate exonic regions, chromatin domain ≥5 kb | CSF | 1,675 loci (1,250 conservatively defined) |
| Guttman et al., 2010 | RNA-seq | Multi-exon only | CSF | 1,140 lincRNA transcripts |
| Sigova et al., 2013 | RNA-seq, cDNAs, chromatin marks, | Antisense overlap with mRNA introns allowed, ≥100 nt mature length | CPC | 1,664 loci |
| Human | | | | |
| Khalil et al., 2009 | Chromatin marks, tiling arrays | Collection of approximate exonic regions, chromatin domain ≥5 kb | CSF | 3,289 loci |
| Jia et al., 2010 | cDNAs | Overlap with mRNAs allowed | | 5,446 transcripts |
| Ørom et al., 2010 | cDNAs | Restricted to loci >1 kb away from known protein-coding genes, ≥200 nt mature length | Manual curation based on length, conservation and other characteristics of the ORFs | 3,019 transcripts from 2,286 loci |
| Cabili et al., 2011 | RNA-seq | Multi-exon only, ≥200 nt mature length | PhyloCSF, Pfam | 8,195 transcripts (4,662 in the stringent set) |
| Derrien et al., 2012 | cDNAs | Overlap with mRNAs allowed (intergenic transcripts reported separately), ≥200 nt mature length | Manual curation based on length, conservation and other characteristics of the ORFs | 14,880 transcripts from 9,277 loci, including 9,518 intergenic transcripts |
| Sigova et al., 2013 | RNA-seq, cDNAs, chromatin marks, | Antisense overlap with mRNA introns allowed, ≥100 nt mature length | CPC | 3,548 loci from embryonic stem cells, and 3,986 loci from endodermal cells |
| Frog | | | | |
| Tan et al., 2013 | RNA-seq | >25 kb away from known protein-coding genes or on a different strand from the neighboring genes, ≥200 nt mature length | ORF length, BLAST, Pfam | 6,686 transcripts from 3,859 loci |
| Zebrafish | | | | |
| Ulitsky et al., 2011 | RNA-seq, cDNAs, 3P-seq, chromatin marks | Antisense overlap with mRNA introns allowed, ≥200 nt mature length | CPC | 691 transcripts from 567 loci |
| Pauli et al., 2012 | RNA-seq | Stringent criteria for single exon, intron overlap with mRNA allowed, ≥160 nt mature length | ORF length, PhyloCSF, BLAST, Pfam | 397 intergenic and 184 intronic overlapping transcripts |
| Fly | | | | |
| Tupy et al., 2005 | cDNA | | Manual curation based on ORF length, conservation and other characteristics, Ka/Ks test, QRNA | 17 transcripts |
| Young et al., 2012 | RNA-seq | ≥200 nt locus length | | 1,119 transcripts |
| Nematode | | | | |
| Nam and Bartel, 2012 | RNA-seq, 3P-seq | ≥100 nt mature length | CPC, RNAcode, ribosome profiling, polysome association | 262 lincRNA transcripts from 170 loci |
| Arabidopsis | | | | |
| Liu et al., 2012a | cDNA, tiling arrays, RNA-seq | In part a collection of approximate exonic regions, >500 bp away from protein-coding genes, no overlap with transposable elements allowed, ≥200 nt mature length | ORF length | 6,480 transcription units from tiling arrays, 278 transcripts from RNA-seq |
| Maize | | | | |
| Boerner and McGinnis, 2012 | cDNA | Both sense overlap with introns and antisense overlap with mRNA or introns allowed, ≥200 nt mature length | ORF length | 2,492 transcripts |
| Plasmodium falciparum | | | | |
| Broadbent et al., 2011 | Tiling arrays | Collection of approximate exonic regions, ≥200 nt mature length | BLAST | 60 transcripts |

Transcripts overlapping protein-coding sequences on either strand were excluded unless noted otherwise. Coding-potential filters included: ORF length; similarity to known protein-coding regions (BLAST); substitution patterns in whole-genome alignments, quantified by CRITICA (Badger and Olsen, 1999), CSF (Lin et al., 2007), PhyloCSF (Lin et al., 2011), QRNA (Rivas and Eddy, 2001; Rivas et al., 2001), or RNAcode (Washietl et al., 2011), as indicated; the CPC algorithm, which evaluates ORF properties and similarity to known proteins (Kong et al., 2007); the HMMER algorithm, which tests for potential to encode a known protein domain (Pfam); ribosome profiling; and polyribosome association. Only the criteria used to define each lincRNA collection are listed; criteria used solely for characterization are omitted.
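As a rough illustration of the two simplest criteria in the table, a minimum mature length and an ORF-length filter, the following Python sketch scans a transcript for its longest ATG-initiated ORF and applies both cutoffs. The function names, the 200 nt default, and especially the 100-codon ORF cutoff are assumptions chosen for illustration; the table does not state the ORF threshold used by any study, and each study combined ORF length with the other filters listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Return the length, in codons excluding the stop codon, of the longest
    ATG-initiated ORF in the three forward frames of a transcript.

    Only ORFs terminated by an in-frame stop codon are counted; an open
    stretch that runs off the 3' end is ignored (a simplifying assumption).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_basic_filters(seq, min_length=200, max_orf_codons=100):
    """Keep a transcript if it is long enough and its longest ORF is short.

    Both thresholds are hypothetical defaults, not values taken from any
    study in the table.
    """
    return len(seq) >= min_length and longest_orf_codons(seq) < max_orf_codons


if __name__ == "__main__":
    candidate = "ACGT" * 60                      # 240 nt, no ATG in any frame
    coding = "ATG" + "GCT" * 120 + "TAA"         # 121-codon ORF
    print(passes_basic_filters(candidate))       # kept
    print(passes_basic_filters(coding))          # rejected by the ORF cutoff
```

A filter like this only removes transcripts with long ORFs; it does not detect short conserved ORFs, which is why many of the studies above paired it with alignment-based methods such as PhyloCSF or with similarity searches.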