

Nucleic Acids Res. 2021 Aug 14;49(16):9132–9153. doi: 10.1093/nar/gkab710


© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


Figure 1. Combining short-read and long-read RNA-seq to assemble an hPSC-specific transcriptome. (A) Read density of different experimental techniques across the length of the transcript. Pileups of hPSC deepCAGE, short-read RNA-seq and polyA-seq reads across the transcripts, from the 5′ ends to the 3′ ends, plus the flanking 2 kb regions. Each transcript is scaled to the same length and oriented to the same strand. DeepCAGE specifically sequences the 5′ ends of transcripts and can identify transcription start sites (TSSs). DeepCAGE data are measured in normalized tag counts and were taken from Ref. (47). PolyA-seq data are from the 3′ RNA-seq data set GSE138759 (49) and are measured in normalized counts. RNA-seq (SR) refers to pileups of the short-read (SR) RNA-seq data only. The SR sample accessions used in this study are listed in Supplementary Table S1. (B) The number of transcripts (in thousands) supported by short reads only (SR), long reads only (LR), or both (SR + LR). (C) The number of transcripts (in thousands) classified as matching (all internal exon boundaries match a GENCODE transcript exactly; the exact 5′ and 3′ ends of the transcript are not enforced), variant (shares any exon or overlapping exon segment with a GENCODE transcript) or novel (does not share any exonic nucleotide with a GENCODE transcript). (D) Pie charts showing the proportion of nucleotide sequences at the 5′ and 3′ splice sites. The transcripts are divided into the matching, variant and novel classes, and all GENCODE transcripts are shown for comparison. (E) Violin plots showing normalized RNA counts for matching, variant and novel transcripts, for RNA-seq (from short-read data) and deepCAGE data. RNA-seq is presented in log2 transcripts per million (TPM). DeepCAGE is presented in log2 normalized tag counts. Because deepCAGE only sequences the 5′ ends, only transcripts with unique 5′ ends were used in the analysis of deepCAGE data. (F) Number and percentage of coding and noncoding transcripts by transcript class. Coding and noncoding here refer to the prediction by FEELnc. Novel transcripts share no exonic nucleotides with GENCODE; variant transcripts overlap the GENCODE annotations by at least one base pair; matching transcripts have exactly matching internal exon splice sites. ‘All’ denotes all assembled hPSC transcripts. (G) RNA levels of coding and noncoding transcripts, for short-read RNA-seq (left violins) or deepCAGE data (right violins). For deepCAGE, only transcripts with a unique 5′ end were used.
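
The matching / variant / novel definitions in panels (C) and (F) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the half-open `(start, end)` exon representation, and the assumption that all transcripts lie on the same chromosome and strand are hypothetical choices made for illustration. The sketch also assumes that a single-exon transcript, which has no internal exon boundaries, cannot be called matching; the caption does not say how such transcripts were handled.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a half-open interval [start, end),
# and every transcript is assumed to be on the same chromosome and strand.
Exon = Tuple[int, int]
Transcript = List[Exon]


def intron_chain(exons: Transcript) -> Tuple[Tuple[int, int], ...]:
    """Return the internal exon boundaries as (donor, acceptor) pairs.

    The outer 5' and 3' ends are deliberately left out, mirroring the
    caption's note that exact transcript ends are not enforced.
    """
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)
    )


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if any exon of `a` shares at least one base with an exon of `b`."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


def classify(transcript: Transcript, references: List[Transcript]) -> str:
    """Label a transcript as 'matching', 'variant' or 'novel'."""
    chain = intron_chain(transcript)
    # Assumption: an empty intron chain (single exon) never counts as matching.
    if chain and any(chain == intron_chain(ref) for ref in references):
        return "matching"
    if any(exons_overlap(transcript, ref) for ref in references):
        return "variant"
    return "novel"


# Toy reference annotation with one three-exon transcript.
reference = [[(100, 200), (300, 400), (500, 600)]]

same_splicing = [(150, 200), (300, 400), (500, 650)]   # different ends only
shifted_exon = [(100, 200), (300, 450), (500, 600)]    # one boundary differs
elsewhere = [(1000, 1100), (1200, 1300)]               # no shared bases
```

With the toy reference above, `classify(same_splicing, reference)` gives `"matching"`, `classify(shifted_exon, reference)` gives `"variant"`, and `classify(elsewhere, reference)` gives `"novel"`. A real annotation comparison would index the reference by chromosome and strand rather than scanning every transcript.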