Author manuscript; available in PMC: 2016 Dec 1.
Published in final edited form as: Nat Chem Biol. 2015 Dec;11(12):909–916. doi: 10.1038/nchembio.1964

Figure 2. Integrated genomic and proteomic discovery and validation of smORFs.

a) Combining RNA-Seq data with proteomics identifies novel human smORFs. The RNA-Seq data are assembled into transcripts using Cufflinks and then translated in silico in three frames to generate a searchable proteomics database that includes non-annotated transcripts. Because every protein that could be produced from the assembled RNAs is included, non-annotated proteins, including SEPs, can be found. This approach identified 86 novel human smORFs in 5′-UTRs, 3′-UTRs, coding sequences (CDS), non-coding RNAs, and antisense RNAs. b) Polysomes consist of strings of ribosomes attached to RNAs. While longer ORFs can have many ribosomes attached simultaneously, smORFs should have only a few (2–6) ribosomes per RNA. Ribosome profiling of these short polysomes (referred to as Poly-Ribo-Seq) successfully enriched smORF-containing RNAs and identified 236 smORFs, including 146 whose SEPs have not been identified by proteomics, owing to either their lower level of translation (expressed as RPKM, ribosome-protected reads per kilobase per million mapped reads) or lower peptide stability. c) Ribosome sequencing (Ribo-Seq) of cytomegalovirus (CMV)-infected human cells revealed many novel viral smORFs. Ribo-Seq measures ribosome occupancy on mRNA and is used as a surrogate for protein translation. The addition of harringtonine and lactimidomycin (LTM) stalls the ribosome at the translation initiation codon, which allows the start of an ORF to be defined. This approach led to the identification of many novel CMV ORFs, most of which were smORFs. Several of these were validated by proteomics.
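
The three-frame in silico translation described in panel a can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the requirement for an ATG start codon, and the length bounds (`min_aa`, `max_aa`) are hypothetical choices made here for illustration only, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon ('*' = stop, 'X' = unknown)."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translate(transcript):
    """Return the translations of the three forward reading frames."""
    return [translate(transcript[frame:]) for frame in range(3)]


def candidate_smorfs(transcript, min_aa=2, max_aa=100):
    """Yield (frame, peptide) for Met-initiated, stop-terminated ORFs.

    The ATG-start rule and the length bounds are illustrative assumptions,
    not criteria taken from the figure legend.
    """
    for frame, protein in enumerate(three_frame_translate(transcript)):
        # Split at stop codons; the final segment lacks a stop, so drop it.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start == -1:
                continue
            peptide = segment[start:]
            if min_aa <= len(peptide) <= max_aa:
                yield frame, peptide
```

Calling `three_frame_translate` on each assembled transcript and writing the resulting sequences to a FASTA file would give a database of the kind the legend describes; `candidate_smorfs` only shows one possible way to narrow that list to short ORFs.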