2021 Aug;31(8):1462–1473. doi: 10.1101/gr.274696.120

Figure 1.

Principle of the method. (A) Two transcripts are depicted, in which each gray box represents a genomic interval and contains the corresponding protein sequence. (Below) The minimal SG is shown, with the nodes (n1, n2, n3, n4) corresponding to subexons. The start and end nodes are added for convenience. Each structural edge in red corresponds to an intron, and each induced edge in green corresponds to a junction located inside the initial genomic interval (such as the donor site of exon AEIGV). (B) Close-up view of three SGs corresponding to three orthologous genes from human, gorilla, and cow, along with three examples of ESGs summarizing the same information. The nodes in the ESGs represent s-exons, or multiple sequence alignments (MSAs) of exonic regions. The details of the ESG scores, computed from Equation 1, are given in the inset table, with σmatch = 1, σmismatch = −0.5, and σgap = 0 for the MSA scores, and edge penalties σS = 0.5 and σI = 2. The best-scored ESG combines compactness (parsimony) with good-quality alignments. (C) Main steps of the ESG construction in ThorAxe. The input genes and transcripts are depicted at the top, with exons displayed as boxes. ThorAxe's first step consists of grouping similar exons together. Here, three clusters are identified, colored in red (1), cyan (2), and blue (3); note that cluster 2 groups multiple exons in human and cow. Then, subexons are defined based on intra-species transcript variability. For instance, the first exon from gorilla is split into two subexons. The subexons would be the nodes in the species-specific minimal SGs, although the latter are not explicitly computed by ThorAxe. The next step consists of aligning the sequences belonging to each cluster (with some padding “X” between mutually exclusive subexons) and identifying the spliced exons (s-exons) as blocks in the alignment. We keep track of the cluster IDs in the s-exon IDs, to ease interpretability.
Finally, ThorAxe builds an ESG in which the nodes are the s-exons. For the sake of clarity, multiedges are visualized as single edges.
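
The subexon and minimal SG notions from panels A and C can be illustrated with a minimal sketch. This is not ThorAxe's code, and the caption notes that ThorAxe does not explicitly compute the species-specific SGs. The sketch assumes that exons are given as half-open genomic intervals on a single strand, sorted within each transcript, and that an exon is split at every boundary used by another transcript of the same gene. The coordinates, the function names, and the "terminal" label for edges touching the start and end nodes are hypothetical choices made for illustration.

```python
def split_into_subexons(transcripts):
    """Split each exon at every exon boundary used by any transcript.

    `transcripts` maps a transcript name to a sorted list of
    (start, end) half-open intervals (an assumed input format).
    """
    boundaries = sorted(
        {b for exons in transcripts.values() for exon in exons for b in exon}
    )
    split = {}
    for name, exons in transcripts.items():
        pieces = []
        for start, end in exons:
            # Boundaries falling inside this exon, including its own ends.
            cuts = [b for b in boundaries if start <= b <= end]
            pieces.extend(zip(cuts, cuts[1:]))
        split[name] = pieces
    return split


def minimal_splicing_graph(transcripts):
    """Return (nodes, edges) with subexons as nodes and typed edges."""
    split = split_into_subexons(transcripts)
    nodes = sorted({piece for pieces in split.values() for piece in pieces})
    edges = {}
    for pieces in split.values():
        path = ["start", *pieces, "end"]
        for a, b in zip(path, path[1:]):
            if a == "start" or b == "end":
                kind = "terminal"    # hypothetical label for start/end edges
            elif a[1] == b[0]:
                kind = "induced"     # junction inside an original exon
            else:
                kind = "structural"  # the two subexons flank an intron
            edges[(a, b)] = kind
    return nodes, edges


# Invented coordinates: two transcripts that yield four subexons.
example_transcripts = {
    "t1": [(0, 30), (50, 80), (100, 130)],
    "t2": [(0, 18), (100, 130)],  # uses an internal donor site in exon 1
}
nodes, edges = minimal_splicing_graph(example_transcripts)
for (a, b), kind in edges.items():
    print(a, "->", b, kind)
```

In this toy input, the first exon of `t1` is split at position 18 because `t2` ends its first exon there, so the edge between the two resulting subexons is induced, whereas every edge spanning a genomic gap is structural.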