Table 1.
Reference | Data for Transcript Reconstruction | Genomic Features and Filters | Coding-Potential Filters | Number of lincRNAs |
---|---|---|---|---|
Mouse | ||||
Ravasi et al., 2006 | cDNAs | | Manual curation, ORF length, CRITICA | 13,502 transcripts |
Ponjavic et al., 2007 | cDNAs, CAGE | | Manual curation, ORF length, BLAST, CRITICA | 3,122 transcripts |
Guttman et al., 2009 | Chromatin marks, tiling arrays | Collection of approximate exonic regions, chromatin domain ≥5 kb | CSF | 1,675 loci (1,250 conservatively defined) |
Guttman et al., 2010 | RNA-seq | Multi-exon only | CSF | 1,140 lincRNA transcripts |
Sigova et al., 2013 | RNA-seq, cDNAs, chromatin marks | Antisense overlap with mRNA introns allowed, ≥100 nt mature length | CPC | 1,664 loci |
| ||||
Human | ||||
Khalil et al., 2009 | Chromatin marks, tiling arrays | Collection of approximate exonic regions, chromatin domain ≥5 kb | CSF | 3,289 loci |
Jia et al., 2010 | cDNAs | Overlap with mRNAs allowed | | 5,446 transcripts |
Ørom et al., 2010 | cDNAs | Restricted to loci >1 kb away from known protein-coding genes, ≥200 nt mature length | Manual curation based on length, conservation and other characteristics of the ORFs | 3,019 transcripts from 2,286 loci |
Cabili et al., 2011 | RNA-seq | Multi-exon only, ≥200 nt mature length | PhyloCSF, Pfam | 8,195 transcripts (4,662 in the stringent set) |
Derrien et al., 2012 | cDNAs | Overlap with mRNAs allowed (intergenic transcripts reported separately), ≥200 nt mature length | Manual curation based on length, conservation and other characteristics of the ORFs | 14,880 transcripts from 9,277 loci, including 9,518 intergenic transcripts |
Sigova et al., 2013 | RNA-seq, cDNAs, chromatin marks | Antisense overlap with mRNA introns allowed, ≥100 nt mature length | CPC | 3,548 loci from embryonic stem cells, and 3,986 loci from endodermal cells |
| ||||
Frog | ||||
Tan et al., 2013 | RNA-seq | >25 kb away from known protein-coding genes or on a different strand from the neighboring genes, ≥200 nt mature length | ORF length, BLAST, Pfam | 6,686 transcripts from 3,859 loci |
| ||||
Zebrafish | ||||
Ulitsky et al., 2011 | RNA-seq, cDNAs, 3P-seq, chromatin marks | Antisense overlap with mRNA introns allowed, ≥200 nt mature length | CPC | 691 transcripts from 567 loci |
Pauli et al., 2012 | RNA-seq | Stringent criteria for single exon, intron overlap with mRNA allowed, ≥160 nt mature length | ORF length, PhyloCSF, BLAST, Pfam | 397 intergenic and 184 intronic overlapping transcripts |
| ||||
Fly | ||||
Tupy et al., 2005 | cDNA | | Manual curation based on ORF length, conservation and other characteristics, Ka/Ks test, QRNA | 17 transcripts |
Young et al., 2012 | RNA-seq | ≥200 nt locus length | | 1,119 transcripts |
| ||||
Nematode | ||||
Nam and Bartel, 2012 | RNA-seq, 3P-seq | ≥100 nt mature length | CPC, RNAcode, ribosome profiling, polysome association | 262 lincRNA transcripts from 170 loci |
| ||||
Arabidopsis | ||||
Liu et al., 2012a | cDNA, tiling arrays, RNA-seq | In part a collection of approximate exonic regions, >500 bp away from protein-coding genes, no overlap with transposable elements allowed, ≥200 nt mature length | ORF length | 6,480 transcription units from tiling arrays, 278 transcripts from RNA-seq |
| ||||
Maize | ||||
Boerner and McGinnis, 2012 | cDNA | Both sense overlap with introns and antisense overlap with mRNA or introns allowed, ≥200 nt mature length | ORF length | 2,492 transcripts |
| ||||
Plasmodium falciparum | ||||
Broadbent et al., 2011 | Tiling arrays | Collection of approximate exonic regions, ≥200 nt mature length | BLAST | 60 transcripts |
Transcripts overlapping protein-coding sequences on either strand were excluded unless noted otherwise. Coding-potential filters included: ORF length; similarity to known protein-coding regions (BLAST); substitution patterns in whole-genome alignments, quantified by CRITICA (Badger and Olsen, 1999), CSF (Lin et al., 2007), PhyloCSF (Lin et al., 2011), QRNA (Rivas and Eddy, 2001; Rivas et al., 2001), or RNAcode (Washietl et al., 2011), as indicated; the CPC algorithm, which evaluates ORF properties and similarity to known proteins (Kong et al., 2007); the HMMER algorithm, which tests for potential to encode a known protein domain (Pfam); ribosome profiling; and polyribosome association. Only the criteria used to define each lincRNA collection (and not those used solely for characterization) are listed.
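As a rough illustration of the two simplest criteria in the table, a minimum mature length and an ORF-length filter, the following Python sketch may help. It is not the procedure of any study listed above. The function names, the 100-codon ORF cutoff, and the choice to count only ATG-initiated ORFs that end at an in-frame stop codon are all assumptions made for this example; the 200 nt default simply mirrors the most common length threshold in the table.

```python
# Illustrative sketch only. Names and thresholds are assumptions for this
# example and are not taken from any study in Table 1.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-initiated ORF
    found in the three forward reading frames of a transcript sequence.
    ORFs that run off the 3' end without a stop codon are not counted."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_filters(seq: str, min_length_nt: int = 200,
                   max_orf_codons: int = 100) -> bool:
    """Keep a transcript if it is long enough and has no long ORF.
    Both defaults are assumed values chosen for illustration."""
    return (len(seq) >= min_length_nt
            and longest_orf_codons(seq) < max_orf_codons)


if __name__ == "__main__":
    coding_like = "ATG" + "AAA" * 120 + "TAA"  # one 121-codon ORF
    noncoding_like = "A" * 250                 # no start codon at all
    print(passes_filters(coding_like))
    print(passes_filters(noncoding_like))
```

Only the forward strand is scanned because the transcripts are assumed to be strand-specific. A filter of this kind is far cruder than comparative methods such as PhyloCSF or RNAcode, which is presumably why most collections in the table combine several criteria.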