Slinker: Visualising novel splicing events in RNA-Seq data

Breon Schmidt; Marek Cmero; Paul Ekert; Nadia Davidson; Alicia Oshlack

doi:10.12688/f1000research.74836.1

. 2021 Dec 7;10:1255. [Version 1] doi: 10.12688/f1000research.74836.1

Slinker: Visualising novel splicing events in RNA-Seq data

Breon Schmidt ^1,², Marek Cmero ^1,^3,⁴, Paul Ekert ^1,^3,^5,^6,⁷, Nadia Davidson ^1,^2,⁴, Alicia Oshlack ^1,^2,^3,^a

PMCID: PMC8749905 PMID: 35035899

Abstract

Visualisation of the transcriptome relative to a reference genome is fraught with sparsity. This is due to RNA sequencing (RNA-Seq) reads being predominantly mapped to exons that account for just under 3% of the human genome. Recently, we have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data through the omission of supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, the loss of information in only retaining annotated exons presents significant drawbacks. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses Stringtie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts, converted to a superTranscript, of which rich visualisations are made through Plotly with associated annotation and coverage information. Slinker was validated on five novel splicing events of rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on Github.

Keywords: RNA-Seq, Visualisation, Novel Splicing Events, superTranscripts, bioinformatics

Introduction

Genomic variants often carry through to the transcriptome. Through these events, gene products can be inhibited, upregulated, or modified, which disrupts typical function. Understanding how a genomic variant has altered transcription can offer insights into causal mechanisms for pathogenesis or potential targets for therapeutic intervention. ¹ ^, ² DNA variants can be identified through genome sequencing and transcriptional disruption can be measured by RNA sequencing (RNA-Seq). In addition, RNA-Seq data can be used to detect an assortment of variants such as single nucleotide polymorphisms (SNPs), deletions or insertions, and altered splicing. ² ^– ⁶ It is common practice to visualise any novel events relative to control samples with tools such as the Integrative Genomics Viewer (IGV). ⁷ This process involves aligning sequencing reads to a reference genome. However, when visualising RNA-Seq data, the size of introns are typically significantly larger than the exons that harbour most of the aligned reads. ⁸ ^, ⁹ As a consequence, these visualisations are sparse. The intronic regions with low coverage offer little information, yet significantly reduce the interpretability of the transcribed regions.

In an effort to provide succinct representations of RNA-Seq data, we previously developed the concept of superTranscripts. ¹⁰ A superTranscript is the flattened collection of transcripts within a gene and includes only the concatenated sequence of its exons. Subsequently, alignment to a superTranscript reference mostly resolves the sparsity problem through elimination of low-information intronic regions. This concept was further extended in our previous work with the release of Clinker, which uses superTranscripts to visualise fusion genes. ¹¹ In the Clinker pipeline, the pre-generated superTranscripts of genes identified as belonging to a fusion are combined to form a new reference. The sequencing reads are then mapped to this with the splice-aware alignment tool, STAR. ¹² Any splice junctions spanning the two genes in the superTranscripts are indicative of a fusion breakpoint. Although this method offers succinct representations of fusion events, it is not applicable to other types of events that can produce novel structures in transcripts. Specifically, in both the rare disease and cancer context, it is important to visualise any events that might occur within the excised intronic regions, or those that add sequences to the reference transcripts upstream or downstream of annotated exons.

To address these requirements in the visualisation of novel transcripts, we present Slinker, a superTranscript generation and visualisation method. Slinker builds on the concept of visualisation using superTranscripts, but uses genome-guided assembly rather than predefined annotation to incorporate the novel transcribed regions of interest into a reference. Through this, Slinker retains novel transcribed regions outside of annotated exons, that can be described as: novel exons, retained introns, skipped exons, truncated exons, and extended exons ( Figure 1).

Figure 1. — Compared to the reference annotation: retained introns are defined by the inclusion of regions between exons, up to and including the exon boundary. Novel exons are defined by the inclusion of regions between exons, not including the exon boundaries. Skipped exons are defined by the existence of a splice junction in the case that starts and ends at known exon boundaries, but does not exist within the controls. Truncated exons occur when an exon in the case does not extend to the annotated exon boundary. Conversely, an extended exon is where an assembled exon within the case exceeds an annotated exon boundary.

These events can be an important consequence of genomic variants and may be associated with a disease phenotype. To demonstrate the utility of Slinker, we applied it to rare disease variants, discovered with the aid of RNA-Seq in genes associated with primary muscle disorders in skeletal muscle ² and events detected in childhood leukemia. ⁶ ^, ¹³ We determined that Slinker offers a succinct and complementary method to visualise and explore RNA-Seq data. Slinker is a Bpipe pipeline publicly available via Github ( Extended data).

Methods

Slinker is a pipeline that is used to visualise novel gene events using RNA-Seq. The main steps in the pipeline are outlined in Figure 2. Slinker requires multiple inputs: a gene name, aligned RNA-Seq reads (for the case and each control), a genomic sequence reference, and a transcriptome reference annotation. A custom annotation can also be supplied. These must be provided in GTF format with the naming conventions for the attributes reflecting those used within Ensembl annotations. ¹⁴ Upon completion, a PNG file containing a static image of the visualisation and an interactive HTML version are generated and saved to a user-defined destination.

The first step in the Slinker pipeline is to filter the gene transcripts to be visualised by the level of support in the Ensembl annotation. Specifically, all transcripts that either have a Transcript Support Level of 1 or are lacking this information are retained, the rest are filtered out. This ensures that only the best-supported transcripts are included in the assembly. The next step uses Samtools (v1.13) to extract all the reads aligned to the genomic region that the user-supplied gene defines in each input BAM. From each of these reduced BAM files, Stringtie2 (v2.1.7) performs genome-guided assembly using either the supplied annotation or the hg19 or hg38 references packaged with Slinker. ¹⁵ The output of Stringtie2 is a GTF file for each BAM that contains all potential reference and novel transcripts existing within that sample. The transcripts are further filtered by the Stringtie2 coverage estimate, removing any novel transcripts that do not meet the defined threshold (default, c = 1). These transcripts are then merged into a single GTF file. The reason for generating transcripts for both the case and controls is so that deletion events in the case are exposed. Next, we flatten these transcripts into a single linear representation, with concatenated exons, and then visually orient the strand left-to-right, creating a data-driven superTranscript.

A reference is then created from the generated superTranscript through gffread (v0.9.9) and STAR’s (v2.7.3a) genomeGenerate mode. ¹² ^, ¹⁶ The reads extracted from each input BAM that overlap the gene of interest are then aligned to this new reference using STAR in alignReads mode, specifically with all splice junction score penalties set to 0. Assembled transcripts are then annotated according to the new superTranscript coordinates. Finally, a custom plotting package using the Python library, Plotly (v4.0 and above), is used to generate both a static image and interactive HTML output. These plots contain coverage and splice junction tracks for the test sample and controls with the superTranscript annotations included for context. Slinker automatically detects and highlights novel splicing events in the gene of interest. These events are discovered through comparisons between the reference and novel transcripts and highlighted according to the novel event’s proximity to known exon boundaries, or whether an exon has been skipped. For instance, retained introns are defined by the inclusion of the region between exons in a gene, including the exon boundary. Novel exons are defined by the inclusion of regions between exons, not including the exon boundaries. Skipped exons are defined by the existence of a splice junction in the studied case that starts and ends at known exon boundaries, but does not exist within the controls. Truncated exons occur when an exon does not extend to the annotated exon boundary in the case. Conversely, an extended exon is where an assembled exon within the case exceeds an annotated exon boundary.

Slinker can be executed through a single command line option, where the inputs and outputs of each particular stage of the pipeline are handled by Bpipe. ¹⁷ However, users who do not wish to use Bpipe may also run each step manually. The core dependencies of Slinker are the STAR aligner, Stringtie2, Plotly (v4 and above), gffread, and Samtools. ¹² ^, ¹⁵ ^, ¹⁶ ^, ¹⁸ ^, ¹⁹ Runtime for a single gene is under 10 minutes with 20gb of memory and four cores allocated, with a control and case sample containing approximately 125 and 275 million reads, respectively.

Results

In order to demonstrate the utility of Slinker for visualising transcriptional variants that are associated with rare disease, we applied it to muscle biopsy samples obtained from Cummings et al. ² Of these, four were selected for visualisation based on the existence of relevant novel splicing events in disease related genes ( Table 1). These events were found to have consensus between Cummings et al. ² and the variant caller, MINTIE. ⁶ Three skeletal muscle controls were selected from the Genotype-Tissue Expression (GTEx) consortium that were also used as controls within Cummings et al. ² ^, ²⁰ The selected samples were then aligned to hg38 with the STAR aligner in 2-pass mode, of which the resulting BAM files were inputted into the Slinker pipeline. ¹²

Table 1. A list of the cases and controls used within this study to validate Slinker.

The GTEX samples were used as controls for the COL6A1, POMGNT1, RYR1, and NEB genes. The B-ALL20_7 and B-ALL9_4 samples were validated through Multiplex-ligation dependent probe amplification (MLPA) to not contain IKZF1 deletions and were used as controls against samples that were validated to have DEL4-7, DEL2-7, and DEL4-8.

Sample ID	Gene	Variant	Source	Verified
Cases
SRR5020918	COL6A1	Novel Exon	( Cummings et al. 2017)	MINTIE
SRR5034830	POMGNT1	Skipped Exon	( Cummings et al. 2017)	MINTIE
SRR5034830	POMGNT1	Retained Intron	( Cummings et al. 2017)	MINTIE
SRR5033001	RYR1	Truncated Exon	( Cummings et al. 2017)	MINTIE
SRR5038795	NEB	Extended Exon	( Cummings et al. 2017)	MINTIE
B-ALL18_4	IKZF1	DEL4-7	( Brown et al. 2020)	MLPA
B-ALL19_11	IKZF1	DEL2-7	( Brown et al. 2020)	MLPA
B-ALL7_8	IKZF1	DEL4-8	( Brown et al. 2020)	MLPA
Controls
SRR811771	Multiple	N/A	( Carithers et al. 2015)	N/A
SRR810249	Multiple	N/A	( Carithers et al. 2015)	N/A
SRR809595	Multiple	N/A	( Carithers et al. 2015)	N/A
B-ALL20_7	IKZF1	N/A	( Brown et al. 2020)	MLPA
B-ALL9_4	IKZF1	N/A	( Brown et al. 2020)	MLPA

Open in a new tab

Figure 3 demonstrates Slinker’s superTranscript visualisation for the RYR1 gene, which has 105 exons across 154kb of the genome. This was compared to the visualisation of the same region using a refined IGV sashimi plot. Log coverage settings were enabled in both Slinker and IGV. Slinker clearly revealed the novel event of a truncated exon in exon 25, which was far less obvious in the genomic view that included the unnecessary splice junctions and alignment sparsity.

Next, we demonstrated Slinker’s ability to visualise a range of novel events involving both added and deleted sequences in transcripts, each confirmed through MINTIE. Figure 4A depicts two novel splicing events in POMGNT1 found within a single case sample, and is highlighted through a comparison with the three GTEX controls ( Table 1). In this case, we could see a retained intron in the genome-guided assembly (highlighted in yellow) and an exon-skipping event (highlighted in purple) in exon 7. Applying Slinker to the COL6A1 gene in another sample clearly revealed the novel exon as reported by Cummings et al. ² ( Figure 4B). Furthermore, a truncated exon is shown in the RYR1 gene ( Figure 3A) and an extended exon is shown within the NEB gene ( Figure 5).

Figure 5. — In this case, the TSL filter was not applied and a reference with all transcripts was passed through to Slinker. This was due to the large number of transcripts of NEB, which were assembled by Stringtie and masked the interesting event. This demonstrates the utility of being able to pass through custom annotations which may be more suitable to the visualisation at hand.

Finally, we applied Slinker to three childhood B-Cell Acute Lymphoblastic Leukemia samples obtained from the Royal Children’s Hospital (RCH), that harboured various IKZF1 deletions; these were validated in DNA using a multiplex-ligation-dependent probe amplification (MLPA) assay ( Table 1). Figure 6A depicts an exon 4-7 deletion within one such case compared to two other leukemia samples in the RCH cohort. These two samples were chosen due to their high IKZF1 expression relative to other samples within the cohort, but did not contain a deletion either. In addition to the automatic highlighting denoting a SE event, a clear drop in expression across exons 4 and 7 can be seen in the case relative to the control samples. The 2-7 deletion event ( Figure 6B) was also clearly highlighted as a SE event. However, this was not true for the 4-8 deletion sample as no splice junctions existed and the deletion included the final exon ( Figure 6C). However, the drop in coverage between the case and controls in this succinct form demonstrates the utility of Slinker for providing a visual comparison between any samples. An interesting aspect of the 4-7 IKZF1 deletion was the assembly of multiple transcripts containing the deletion event ( Figure 6B). This demonstrates that Slinker may help provide more information for deletion events over exclusive alignment to predefined superTranscripts.

Figure 6. — A. An exon 4-7 deletion in IKZF1. A clear dip in coverage in the case compared to the controls can be seen within the purple highlighted region. Slinker was run using only a single transcript (ENST00000331349), which was representative of the canonical. The highlighted novel exon was annotated in a transcript that is within the greater reference. Transcripts were filtered using the minimum coverage parameter to include all assemblies with the known deletion event. Coverage was normalised by total reads in IKZF1. B. A 2-7 deletion in IKZF1. A clear dip in coverage in the case in comparison to the controls can be seen within the purple highlighted region. In this figure, Slinker was run using only a single transcript (ENST00000331349) which was representative of the canonical. Coverage was normalised by total reads in IKZF1. C. A 4-8 deletion in IKZF1. This comparison demonstrates a slight change in coverage post exon 3 to the end of the transcript in the case. This is representative of the deletion event known to have occurred. Coverage was normalised by total reads in IKZF1.

Discussion

DNA mutations can be causal drivers of disease. ² ^, ⁶ While genomic sequencing is becoming more commonly used to diagnose genetic diseases, there are still many cases where the variants cannot be directly identified as disease-causing, and are therefore called variants of unknown significance. Mutations with the potential to impact splicing are of particular interest. One way to assess the effect of a variant is to investigate the resulting transcript using RNA sequencing and compare these transcripts to a set of controls. While there are several approaches proposed for this purpose, the visualisation of this data can be improved. ² ^, ²¹ ^, ²² Here we address this issue with Slinker, which was built on the superTranscript visualisation framework. One of the fundamental advantages of the superTranscript method is the removal of the visualisation sparsity due to uninformative intronic information. However, novel splicing events can retain an intronic sequence, and it is therefore necessary to first determine which sequences are transcribed in the sample. Slinker utilizes the Stringtie2 data-driven method for assembling the transcripts in the sample which is then combined with reference transcripts to capture all potentially expressed sequences.

Due to the large insertion of sequences relative to the superTranscript, retained introns and novel exons were expected to be the simplest events to visualise with Slinker. This is reflected in Figure 4A and 4B. The same is true for exon skipping, where the existence of a splice junction at annotated exon boundaries in the case versus lack thereof in the control was clear ( Figure 4A). However, truncated and extended exons appeared to be less obvious in Slinker given the relatively small amount of sequence which were removed/added ( Figures 3A and 5). Nevertheless, the bespoke superTranscripts, in conjunction with the highlighting and comparisons with controls could reveal these types of events in the static visualisation even in the largest of genes ( Figure 3A). In addition, Slinker generates an interactive plot so that the user can simply zoom in on these regions to better understand the event whilst also zooming out to appreciate the greater context if required.

Though Slinker was developed to highlight novel splicing events in rare diseases, it can also be used to visualise cancer transcripts. Slinker was applied to three B-Cell Acute Lymphoblastic Leukemia (B-ALL) cases harbouring 4-7, 4-8, and 2-7 IKZF1 deletions. Deletions within this gene are a known risk factor in aggressive forms of this cancer and are used to monitor disease progression. ¹³ In each of these examples, a clear deletion event could be seen through either the presence of skipped exons or the relative drop in coverage when compared to the controls ( Figure 6). However, generally, cancer is a genetically complex disease and consequently a genetic visualisation is also challenging. ²³ Nevertheless, these results demonstrate that Slinker can be an appropriate choice for providing a succinct visualisation of this complexity and may be applicable to a broader set of diseases.

Further work may involve improving this aspect of the software, including highlighting other novel events, such as inversions and duplications. However, this is limited by the assembler’s capacity to retrieve an accurate assembly, including the variant, and for a superTranscript to be an appropriate visual representation. ⁶

Conclusion

Slinker is a bespoke superTranscript generation and visualisation tool with a demonstrated ability to succinctly present novel splicing events in RNA-Seq data. Its key advantage consists in removing redundant information without manual intervention, leaving room for more useful information, such as the expression across the entire gene.

We have produced a tool that has been validated on a number of known novel splice variants and is publicly available from Github ( https://github.com/Oshlack/Slinker).

Data availability

Underlying data

European Genome-Phenome Archive: The application of RNA sequencing for the diagnosis and genomic classification of pediatric acute lymphoblastic leukemia, https://identifiers.org/ega.study:EGAS00001004212 ²⁰ This study contains the paediatric B-ALL samples (B-ALL18_4, B-ALL19_11, B-ALL7_8, B-ALL8_1, and B-ALL9_4).

NCBI dbGaP: Genetics of Inherited Muscle Disease, https://identifiers.org/dbgap:phs000655.v3.p1

NCBI dbGaP: Common Fund (CF) Genotype-Tissue Expression Project (GTEx), https://identifiers.org/dbgap:phs000424.v6.p1

Extended data

Analysis code available from: https://github.com/Oshlack/Slinker

Archived code as at time of publication: https://doi.org/10.5281/zenodo.5719747

License: MIT

Competing interests

No competing interests were disclosed.

Acknowledgements

•
Tumour samples and coded data were supplied by the Children’s Cancer Centre Tissue Bank at the Murdoch Children’s Research Institute and The Royal Children’s Hospital. Establishment and running of the Children’s Cancer Centre Tissue Bank is made possible through generous support by Cancer In Kids @ RCH, The Royal Children’s Hospital Foundation and the Murdoch Children’s Research Institute.

Funding Statement

This work was supported by the NHMRC under project grant APP1140626.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 1; peer review: 3 approved]

References

1. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, et al. : Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 2016 May;17(5):257–271. 10.1038/nrg.2016.10 [DOI] [PMC free article] [PubMed] [Google Scholar]
2. Cummings BB, Marshall JL, Tukiainen T, et al. : Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 2017 Apr 19;9(386) 10.1126/scitranslmed.aal5209 [DOI] [PMC free article] [PubMed] [Google Scholar]
3. Kim D, Pertea G, Trapnell C, et al. : TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36. 10.1186/gb-2013-14-4-r36 [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Van der Auwera GA, O’Connor BD: Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. O’Reilly Media, Inc.;2020;496. [Google Scholar]
5. Mortazavi A, Williams BA, McCue K, et al. : Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008 Jul;5(7):621–628. 10.1038/nmeth.1226 [DOI] [PubMed] [Google Scholar]
6. Cmero M, Schmidt B, Majewski IJ, et al. : MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data. Cold Spring Harbor Laboratory;2020 [cited 2020 Nov 30];2020.06.03.131532. 10.1101/2020.06.03.131532v1.abstract [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. : Integrative genomics viewer. Nat. Biotechnol. 2011 Jan 1;29(1):24–26. 10.1038/nbt.1754 [DOI] [PMC free article] [PubMed] [Google Scholar]
8. Venter JC, Adams MD, Myers EW, et al. : The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304–1351. 10.1126/science.1058040 [DOI] [PubMed] [Google Scholar]
9. ENCODE Project Consortium : An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57–74. 10.1038/nature11247 [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Davidson NM, Hawkins ADK, Oshlack A: SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes. Genome Biol. 2017 Aug 4;18(1):148. 10.1186/s13059-017-1284-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Schmidt BM, Davidson NM, Hawkins ADK, et al. : Clinker: visualizing fusion genes detected in RNA-seq data. Gigascience. 2018 Jul 1;7(7). 10.1093/gigascience/giy079 [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Dobin A, Davis CA, Schlesinger F, et al. : STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15–21. 10.1093/bioinformatics/bts635 [DOI] [PMC free article] [PubMed] [Google Scholar]
13. Brown LM, Lonsdale A, Zhu A, et al. : The application of RNA sequencing for the diagnosis and genomic classification of pediatric acute lymphoblastic leukemia. Blood Adv. 2020 Mar 10;4(5):930–942. 10.1182/bloodadvances.2019001008 [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Howe KL, Achuthan P, Allen J, et al. : Ensembl 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D884–D891. 10.1093/nar/gkaa942 [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Kovaka S, Zimin AV, Pertea GM, et al. : Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec 16;20(1):278. 10.1186/s13059-019-1910-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Pertea G, Pertea M: GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304. 10.12688/f1000research.23297.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Sadedin SP, Pope B, Oshlack A: Bpipe: a tool for running and managing bioinformatics pipelines. Bioinformatics. 2012 Jun 1;28(11):1525–1526. 10.1093/bioinformatics/bts167 [DOI] [PubMed] [Google Scholar]
18. Collaborative data science. Montréal, QC: Plotly Technologies Inc;2015. [Google Scholar]
19. Li H, Handsaker B, Wysoker A, et al. : The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078–2079. 10.1093/bioinformatics/btp352 [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Carithers LJ, Ardlie K, Barcus M, et al. : A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv Biobank. 2015 Oct;13(5):311–319. 10.1089/bio.2015.0032 [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Yépez VA, Mertes C, Müller MF, et al. : Detection of aberrant gene expression events in RNA sequencing data. Nat. Protoc. 2021 Feb;16(2):1276–1296. 10.1038/s41596-020-00462-5 [DOI] [PubMed] [Google Scholar]
22. Ulirsch JC, Verboon JM, Kazerounian S, et al. : The Genetic Landscape of Diamond-Blackfan Anemia. Am. J. Hum. Genet. 2019 Feb 7;104(2):356. 10.1016/j.ajhg.2018.12.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
23. Greaves M, Maley CC: Clonal evolution in cancer. Nature. 2012 Jan 18;481(7381):306–313. 10.1038/nature10762 [DOI] [PMC free article] [PubMed] [Google Scholar]

F1000Res. 2022 Feb 3. doi: 10.5256/f1000research.78631.r118438

Reviewer response for version 1

Amy Huei-Yi Lee ¹

Visualizing RNA-Seq data using the reference genome via a tool like Integrative Genomics Viewer is challenged by the sparsity of transcriptome data, as RNA-Seq reads predominately map to exons. This impedes our abilities to detect SNPs, deletions/insertions and altered splicing events using RNA-Seq reads, and the associative impact on expression levels of the impacted gene. Schmidt et al. extend the concept of superTranscripts, which is a concatenated sequence of all the exons with a gene, with genome-guided assembly to allow visualization of novel exons, retained introns, skipped exons, truncated exons and extended exons. The manuscript is very well written with clear, detailed figures (especially the beautiful schematic of the workflow as shown in Figure 2). This is definitely a tool that I am looking forward to using in my own work.

I only have two minor comments:

The colors used in Fig 1 need to be explained in the legends – and the green exons/red introns should probably be different colors to make it colorblind friendly.
The comparison of case and control requires coverage/sequencing depth to help with detection of some of the novel splice events (as shown in Fig 3 – 6). What happens when the sequencing depth between case and control are drastically different?

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Systems immunology, multi-omics integration, neonatal sepsis, host-pathogen interactions

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2022 Jan 7. doi: 10.5256/f1000research.78631.r102214

Reviewer response for version 1

Stephen Montgomery ¹

Schmidt et al provide a pipeline for visualizing novel splicing events in RNA-seq data. The key limitations with existing approaches that they have identified are that sparsity of mapped RNA-seq reads to the genome making visualization of novel events harder to detect and further, that existing approaches are mainly guided by existing/known gene annotations. The combined approach appears useful however there are a number of minor weaknesses.

In practice, novel products/events are first found statistically and then visualized. It appears that the authors are suggesting that this is mostly done first by manual inspection and Slinker makes it easier to find them. From the examples provided with Slinker or with IGV it would be very difficult to eyeball the differences without considerable experience. In fact, even the authors in the figures have to highlight the critical regions so the readers know what to look at. I am not entirely sure the problem they are describing is that large or they have sufficiently solved it.
Overall, I think the clearly described pipeline in combination with the visualizations provides more utility and impact. I would be interested in knowing how dependent the pipeline is on the specific tools or is it compatible with specific conventional file formats and the user can change the underlying components as desired. Can the genome build be easily changed?
I think Figure 1 would benefit from an example of the corresponding superTranscripts. It wasn't immediately clear to me how a Skipped Exon would be conveniently visualized compared to the retained intron in Figure 2. Overall, some better documentation on how to visualize specific events as the practical examples were hard to read and very small.
The examples (Fig 3, 4, 5) have abundant text that is nearly unreadable. I am a bit concerned that the output of Slinker is hard to convert into publication quality figures. Does it output as vector graphics for Illustrator/Inkscape if someone was making a multi-panel figure?
In some of the figures there appear to be other events but they are not highlighted. It would be interesting for the authors to comment on that. For instance in Figure 3 exon 69-70.
The number of controls compared to is pretty small. In practice, it may be useful to compare to 100s or 1000s of controls to find an aberrant event. Is this possible? Figure 4 focused on 3 GTEx control samples only.
With the splice junction penalty score set to 0, are there many more false positives than true events? How many events would visually look interesting in other randomly selected comparisons of a couple samples between genes? Does setting this score influence the recovery of the known disease event examples?
In the discussion, 1st para, the word mutation and variant is used interchangeably. I would recommend being more precise here.
You don't have to cite it. But you might find our splicePlot tool interesting too. It was designed for sQTL but one can summarize data across lots of samples. However, not with the flexibility you have here since it only focused on junctions that one previously prioritized/found interesting. https://doi.org/10.1093/bioinformatics/btt733

Overall, nice work. The paper was well-written. I hope the tool provides some help for people looking to explore novel junctions.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Computational biology, transcriptomics, human genetics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

1. : SplicePlot: a utility for visualizing splicing quantitative trait loci. Bioinformatics .2014;30(7) : 10.1093/bioinformatics/btt733 1025-6 10.1093/bioinformatics/btt733 [DOI] [PMC free article] [PubMed] [Google Scholar]

F1000Res. 2022 Jan 5. doi: 10.5256/f1000research.78631.r118437

Reviewer response for version 1

George Wiggins ¹

The authors have developed a novel method (Slinker) to visualize RNA isoforms that improves interpretation. The method requires condensing the genomic region to view only the sequence that is supported by RNA-sequencing reads.

In my opinion Slinker does address the need in visualizing RNA isoforms that are assembled from RNA-sequencing data. A limitation that should be discussed is the reliance on StringTie2 for transcript assemble. RNA isoform discovery is a growing field, in which Slinker would offer great use, if adaptable to the development of new tools (e.g. FLAIR, bambu).

Minor comments are listed below:

I have assumed long and short-read sequencing can be visualized equally? If not, this needs to be highlighted as a limitation in the discussion.
In the discussion, the authors highlight how RNA species can help determine the pathogenicity of a given variants. For interpretation of RNA splicing events, it is important to determine whether splicing shift the coding frame. Is this possible to implement? This is not a requirement for indexing but a suggested improvement to Slinker.

All figures are presented well but could be improved with increased font sizes. Additionally, is it possible to provide an HTML version of the one plot in the supplementary information or in the github repo?

Figure3A has purple shading at the 5` end of the gene that I assume is not relevant, this should be removed for clarity.
Please provide definitions for acronyms used in plots, e.g. superTranscript panels have TE, NE, TF etc.

Is the rationale for developing the new method (or application) clearly explained?

Yes

Is the description of the method technically sound?

Yes

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes

Are sufficient details provided to allow replication of the method development and its use by others?

Yes

Reviewer Expertise:

Cancer genetics, RNA-sequencing, bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Underlying data

NCBI dbGaP: Genetics of Inherited Muscle Disease, https://identifiers.org/dbgap:phs000655.v3.p1

NCBI dbGaP: Common Fund (CF) Genotype-Tissue Expression Project (GTEx), https://identifiers.org/dbgap:phs000424.v6.p1

Extended data

Analysis code available from: https://github.com/Oshlack/Slinker

Archived code as at time of publication: https://doi.org/10.5281/zenodo.5719747

License: MIT

[ref1] 1. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, et al. : Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 2016 May;17(5):257–271. 10.1038/nrg.2016.10 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] 2. Cummings BB, Marshall JL, Tukiainen T, et al. : Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 2017 Apr 19;9(386) 10.1126/scitranslmed.aal5209 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] 3. Kim D, Pertea G, Trapnell C, et al. : TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36. 10.1186/gb-2013-14-4-r36 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4. Van der Auwera GA, O’Connor BD: Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. O’Reilly Media, Inc.;2020;496. [Google Scholar]

[ref5] 5. Mortazavi A, Williams BA, McCue K, et al. : Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008 Jul;5(7):621–628. 10.1038/nmeth.1226 [DOI] [PubMed] [Google Scholar]

[ref6] 6. Cmero M, Schmidt B, Majewski IJ, et al. : MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data. Cold Spring Harbor Laboratory;2020 [cited 2020 Nov 30];2020.06.03.131532. 10.1101/2020.06.03.131532v1.abstract [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] 7. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. : Integrative genomics viewer. Nat. Biotechnol. 2011 Jan 1;29(1):24–26. 10.1038/nbt.1754 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] 8. Venter JC, Adams MD, Myers EW, et al. : The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304–1351. 10.1126/science.1058040 [DOI] [PubMed] [Google Scholar]

[ref9] 9. ENCODE Project Consortium : An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57–74. 10.1038/nature11247 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. Davidson NM, Hawkins ADK, Oshlack A: SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes. Genome Biol. 2017 Aug 4;18(1):148. 10.1186/s13059-017-1284-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] 11. Schmidt BM, Davidson NM, Hawkins ADK, et al. : Clinker: visualizing fusion genes detected in RNA-seq data. Gigascience. 2018 Jul 1;7(7). 10.1093/gigascience/giy079 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] 12. Dobin A, Davis CA, Schlesinger F, et al. : STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15–21. 10.1093/bioinformatics/bts635 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] 13. Brown LM, Lonsdale A, Zhu A, et al. : The application of RNA sequencing for the diagnosis and genomic classification of pediatric acute lymphoblastic leukemia. Blood Adv. 2020 Mar 10;4(5):930–942. 10.1182/bloodadvances.2019001008 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref14] 14. Howe KL, Achuthan P, Allen J, et al. : Ensembl 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D884–D891. 10.1093/nar/gkaa942 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] 15. Kovaka S, Zimin AV, Pertea GM, et al. : Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec 16;20(1):278. 10.1186/s13059-019-1910-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] 16. Pertea G, Pertea M: GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304. 10.12688/f1000research.23297.1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] 17. Sadedin SP, Pope B, Oshlack A: Bpipe: a tool for running and managing bioinformatics pipelines. Bioinformatics. 2012 Jun 1;28(11):1525–1526. 10.1093/bioinformatics/bts167 [DOI] [PubMed] [Google Scholar]

[ref18] 18. Collaborative data science. Montréal, QC: Plotly Technologies Inc;2015. [Google Scholar]

[ref19] 19. Li H, Handsaker B, Wysoker A, et al. : The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078–2079. 10.1093/bioinformatics/btp352 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] 20. Carithers LJ, Ardlie K, Barcus M, et al. : A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv Biobank. 2015 Oct;13(5):311–319. 10.1089/bio.2015.0032 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref21] 21. Yépez VA, Mertes C, Müller MF, et al. : Detection of aberrant gene expression events in RNA sequencing data. Nat. Protoc. 2021 Feb;16(2):1276–1296. 10.1038/s41596-020-00462-5 [DOI] [PubMed] [Google Scholar]

[ref22] 22. Ulirsch JC, Verboon JM, Kazerounian S, et al. : The Genetic Landscape of Diamond-Blackfan Anemia. Am. J. Hum. Genet. 2019 Feb 7;104(2):356. 10.1016/j.ajhg.2018.12.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] 23. Greaves M, Maley CC: Clonal evolution in cancer. Nature. 2012 Jan 18;481(7381):306–313. 10.1038/nature10762 [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Slinker: Visualising novel splicing events in RNA-Seq data

Breon Schmidt

Marek Cmero

Paul Ekert

Nadia Davidson

Alicia Oshlack

Roles

Abstract

Introduction

Figure 1. Depiction of each of the novel splicing events that Slinker is designed to visualise.

Methods

Figure 2. A schematic of the Slinker pipeline for a single user-input gene.

Results

Table 1. A list of the cases and controls used within this study to validate Slinker.

Figure 3. Integrated Genomics Viewer (IGV) and Slinker visualisations of the RYR1 gene.

Figure 5. Slinker visualisation in a sample with an extended exon event (red vertical highlight) between exons 143-144 in the NEB gene.

Figure 6. Deletions within the IKZF1 gene.

Discussion

Conclusion

Data availability

Underlying data

Extended data

Competing interests

Acknowledgements

Funding Statement

References

Reviewer response for version 1

Amy Huei-Yi Lee

Roles

Reviewer response for version 1

Stephen Montgomery

Roles

References

Reviewer response for version 1

George Wiggins

Roles

Associated Data

Data Availability Statement

Underlying data

Extended data

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases