Author manuscript; available in PMC: 2021 Jul 19.

Published in final edited form as: Nat Methods. 2018 Sep 10;15(10):832–836. doi: 10.1038/s41592-018-0114-z

(A) ‘Sashimi plots’¹⁸ of RNA-seq reads mapped to a region within the Coiled-coil Domain Containing 173 (CCDC173) gene locus, showing the annotated ENSEMBL transcripts (blue), the poly(A) sites (PAS) annotated in the PolyAsite atlas (vertical black lines, http://polyasite.unibas.ch) and the densities of RNA-seq reads (gray) from fallopian tube and testis samples. The novel terminal exon is marked by the red dashed box, gray arcs indicate putative splice junctions, and numbers on the arcs indicate supporting reads (for clarity, only splice junctions supported by at least 10% of the maximum number of split reads between two exons in the genomic locus are shown; see also Supplementary Figure 2A). (B) Flow of the data through TECtool (input and output file formats are indicated in parentheses). (C) Outline of the main computational steps. Step 1: Selection of PAS located within regions that are ‘intronic’ with respect to the input annotation (see ‘Annotation (GTF)’ in (B)) (red arrow), and not exonic, intergenic or antisense (black arrows). Step 2: Identification of the ‘feature’ region of the putative novel terminal exon (red line), extending from the ‘intronic’ poly(A) site up to the closest annotated exon upstream (blue box with red border). Step 3: Identification of reads that map uniquely to the feature region. Step 4: Definition of terminal exon boundaries (red box), given by a splice site at the 5’ end (inferred from split reads) and the ‘intronic’ poly(A) site at the 3’ end. Putative terminal exons are classified with a Bayes classifier. Step 5: The newly identified terminal exons are linked to the upstream exons to which they were found to be spliced, based on split reads, to generate novel isoforms. Step 6: Prediction of protein coding regions in the newly identified transcripts.
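Steps 1 and 2 of panel (C) can be illustrated with a minimal Python sketch. This is not TECtool's code: the `Exon` class and the functions `is_intronic` and `feature_region` are hypothetical names introduced here. The sketch assumes a single plus-strand gene with 0-based, half-open coordinates, so the antisense case is not modeled. It also assumes that the feature region starts at the 3’ end of the closest upstream exon; the caption does not state whether that exon is itself part of the region.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Exon:
    """Hypothetical annotated exon on the plus strand (0-based, half-open)."""
    start: int
    end: int


def is_intronic(pas: int, exons: List[Exon]) -> bool:
    """Step 1 (sketch): a poly(A) site counts as 'intronic' if it lies
    inside the annotated gene span but outside every annotated exon."""
    if not exons:
        return False
    gene_start = min(e.start for e in exons)
    gene_end = max(e.end for e in exons)
    if not (gene_start <= pas < gene_end):
        return False  # outside the gene span: treated as intergenic
    return not any(e.start <= pas < e.end for e in exons)


def feature_region(pas: int, exons: List[Exon]) -> Optional[Tuple[int, int]]:
    """Step 2 (sketch): region from the end of the closest upstream exon
    to the intronic poly(A) site. Excluding the upstream exon itself is
    an assumption of this sketch."""
    if not is_intronic(pas, exons):
        return None
    upstream_end = max(e.end for e in exons if e.end <= pas)
    return (upstream_end, pas + 1)


# Toy annotation with two exons and one intron between them.
exons = [Exon(100, 200), Exon(500, 600)]
region = feature_region(350, exons)
```

With this toy annotation, a site at position 350 falls in the intron and yields a feature region, whereas sites at 150 (exonic) or 50 (outside the gene span) are rejected. Reads mapping uniquely to the returned interval would then be the input to Step 3.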