eLife. 2016 Nov 28;5:e19090. doi: 10.7554/eLife.19090

Figure 5. Summary of IES structural features.

Red lines = MIC DNA. Blue lines = MAC DNA. A representative IES is indicated by the open red box. IESs were identified as described in Figure 5—figure supplement 1. Their size distribution is shown in Figure 5—figure supplement 2. The excision endpoint found in the SB210 MAC genome is indicated by the slanted lines converging to the right. Sequences from a large progeny pool representing multiple, independent excision events show that most progeny share the parental endpoint, but variation within a limited range is common, as shown in detail in Figure 5—figure supplement 3. The left terminal junction sequence is shown enlarged below and to the left. Short Terminal Direct Repeats (TDRs) are often found; they are generally very AT-rich and have a slight sequence pattern bias. A 4 bp TDR sequence logo is shown as an example. More detailed characterization of endpoint TDRs is presented in Figure 5—figure supplement 4.

DOI: http://dx.doi.org/10.7554/eLife.19090.012

Figure 5—figure supplement 1. Read alignment methods used for IES identification.

(A) MAC Sanger sequencing reads (blue arrowed bars) align to MAC scaffolds (thick blue bar) along their entire length, but their alignment to the MIC scaffold (thick red bar) is interrupted by an IES (black bar). (B) Alignment of Illumina MIC reads (red arrowed bars) to the margins of IESs is uninterrupted in MIC scaffolds, but broken at 'residual' IES locations in the MAC genome. (C) Short direct repeats (grey) at IES/MDS junctions in the MIC lead to overlapping read alignments to MAC scaffolds.
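
The overlap described in panel C can be illustrated with a minimal Python sketch. It assumes an IES is annotated on a MIC sequence as a 0-based, half-open interval (a convention chosen here for illustration, not taken from the paper). Under that assumption, the length of the direct repeat equals the number of positions by which the excision window can slide without changing the resulting MAC sequence. The function names and the toy sequence are hypothetical.

```python
def tdr_length(mic: str, start: int, end: int) -> int:
    """Direct-repeat length for an IES annotated as mic[start:end].

    Hypothetical helper: counts how far the excision window can slide
    right and left while still producing the same MAC sequence.
    """
    ies_len = end - start
    # Slide right: the first IES bases are repeated just after the IES.
    right = 0
    while (right < ies_len and end + right < len(mic)
           and mic[start + right] == mic[end + right]):
        right += 1
    # Slide left: the bases just before the IES are repeated at its end.
    left = 0
    while (left < ies_len and start - left - 1 >= 0
           and mic[start - left - 1] == mic[end - left - 1]):
        left += 1
    return left + right


def excise(mic: str, start: int, end: int) -> str:
    """Remove mic[start:end] and join the flanks (toy model of excision)."""
    return mic[:start] + mic[end:]


# Toy MIC sequence: flank, repeat copy, IES interior, repeat copy, flank.
mic = "GGCC" + "TTAA" + "CGCGCG" + "TTAA" + "GGAT"

# Two equally valid annotations of the same IES give the same MAC product
# and the same repeat length.
print(tdr_length(mic, 4, 14))                      # 4
print(tdr_length(mic, 6, 16))                      # 4
print(excise(mic, 4, 14) == excise(mic, 6, 16))    # True
```

Because either copy of the repeat can be assigned to the MAC-destined flank, reads from both sides of the junction align across the same few bases, which is the overlap drawn in panel C.
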
Figure 5—figure supplement 2. Size distribution of 7551 high-confidence IESs.

Note that the x-axis is log base two transformed.
Figure 5—figure supplement 3. IES excision variability.

(A) The number of variant excision endpoints detected within the progeny pool at each IES site (calculated using excision sites for which data are available for both SB210 and progeny). It is difficult to reliably quantify the degree of endpoint variation because the values depend on several factors, including the number of progeny cells used in DNA purification, the depth of sequencing coverage, the method of mapping endpoints, and the criteria by which endpoints are validated. For this study, endpoints were mapped by the 'split read alignment' method (Figure 5—figure supplement 1A). For validation, at least three identical, independent read alignments were required. The number of progeny cells and the sequencing coverage are described in 'Materials and methods'. (B) The positions of progeny pool read alignment breakpoints were mapped relative to the SB210 read alignment breakpoint reference (distances in either direction were added together). The greatest number of progeny breakpoints is identical to the SB210 reference (point 0). Nearly identical results were observed in comparison to SB1969 (data not shown). The frequency of alternative breakpoints generally decreases with increasing distance from the reference, with the exception of a small peak at a distance of 4 bp.
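
The validation rule and the distance tally described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that breakpoints are integer coordinates, that the input reads have already been reduced to independent alignments, and that "added together" means pooling breakpoints by absolute distance from the reference. The function names and the toy coordinates are hypothetical and do not represent data from the study.

```python
from collections import Counter

MIN_SUPPORT = 3  # caption: at least three identical, independent read alignments


def validated_endpoints(read_breakpoints, min_support=MIN_SUPPORT):
    """Keep breakpoints supported by at least `min_support` identical reads.

    Assumes each list entry comes from an independent read alignment.
    """
    counts = Counter(read_breakpoints)
    return {pos for pos, n in counts.items() if n >= min_support}


def distance_histogram(endpoints, reference):
    """Count validated breakpoints by absolute distance from the reference.

    Pooling both directions is an interpretation of the caption.
    """
    return dict(Counter(abs(pos - reference) for pos in endpoints))


# Made-up breakpoint coordinates for one IES site; reference breakpoint at 100.
reads = [100] * 4 + [102] * 3 + [98] * 3 + [101] * 2 + [99]

valid = validated_endpoints(reads)
print(sorted(valid))                      # [98, 100, 102]
print(len(valid - {100}))                 # 2 variant endpoints at this site
print(distance_histogram(valid, 100))
```

Breakpoints seen in fewer than three reads (101 and 99 in the toy input) are discarded, which shows why the measured degree of variation depends on sequencing depth and on the validation threshold.
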
Figure 5—figure supplement 4. IES/MDS junctions.

Short terminal direct repeats (TDRs) at IES termini were identified by examining alignments between the MIC and MAC genomes at these termini to identify alignment overlaps (i.e. short MAC sequences at precisely the site of excision that align to both precise ends of the IES). These results were confirmed by alignment of MIC sequencing reads to the MAC genome assembly (see Figure 5—figure supplement 1C). (A) TDR length. Numbers of junctions with TDRs of the indicated lengths, showing that IESs with no TDR constitute the largest class. (B) A+T richness. For each of the five TDR classes between 0 and 4 bp, the direct repeat (or two flanking bases, in the case of no overlap) plus six bases on either side were extracted from the MIC genome sequence and aligned (MAC-destined sequence to the left; MIC-limited sequence to the right). Each arrow indicates the center of the TDR (or, in the case of No TDR, the junction point). Sequence logos derived from the alignments show that the TDRs are more AT-rich than surrounding sequence. Bases within the four, three, and two base direct repeats are approximately 97% AT overall and the one base direct repeats are 92% AT, whereas the two bases flanking the 'zero overlap' junctions are 80% AT, similar to the adjacent sequence composition. (C) Sequence pattern bias. In addition to overall AT-richness, the sequence patterns of the TDRs are not entirely random. We compared the frequency of each of the possible TDRs between 2 and 4 bp in length that consist of only As and Ts. Reverse complementary sequences were found to have approximately equal frequencies, as expected because the orientation of the sequenced strand is random, and they were grouped together. This makes for 10 groupings of 4 bp TDRs, 4 groupings of 3 bp TDRs, and 3 groupings of 2 bp TDRs. 
As shown in this panel, the frequencies of the groupings are unequal; the most common are: 4-mer TTAA (palindromic), 3-mer TTA/TAA, and 2-mer TT/AA (the latter two are pairs of reverse complementary sequences). Furthermore, it is notable that the four most common groupings of 4-mers all contain one member with a 5' TT dinucleotide (red font) and together account for two thirds of the total 4-mers. Likewise, the two (out of four total) 3-mer groupings containing a 5' TT dinucleotide account for two thirds of 3-mers, and the single TT/AA 2-mer grouping accounts for over three quarters of all 2-mers. These findings suggest that IES junctions have a slight bias in favor of beginning with TT and an extended weak consensus of 5'-TT(A)(A)-3'; the most common 2, 3, and 4 bp TDRs (which are still far from the majority) include successively more of this consensus, from left to right.
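
The grouping counts quoted in panel C (10, 4, and 3) follow from simple enumeration, which the short Python sketch below reproduces. It lists every sequence of a given length built only from A and T, merges each sequence with its reverse complement, and counts the resulting groupings. The function names are hypothetical, and the sketch says nothing about observed frequencies.

```python
from itertools import product

_COMPLEMENT = str.maketrans("AT", "TA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence made only of A and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def at_only_groupings(k: int):
    """All A/T-only k-mers, grouped with their reverse complements.

    Each grouping is a sorted tuple; palindromes form one-member groupings.
    """
    groups = set()
    for letters in product("AT", repeat=k):
        seq = "".join(letters)
        groups.add(tuple(sorted({seq, reverse_complement(seq)})))
    return sorted(groups)


def groupings_with_5prime_tt(k: int):
    """Groupings in which at least one member starts with TT."""
    return [g for g in at_only_groupings(k) if any(s.startswith("TT") for s in g)]


for k in (4, 3, 2):
    print(k, len(at_only_groupings(k)), len(groupings_with_5prime_tt(k)))
# 4 10 4
# 3 4 2
# 2 3 1
```

The second count in each row matches the caption: four 4-mer groupings, two 3-mer groupings, and one 2-mer grouping contain a member beginning with TT.
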