Abstract
We previously reported a unique genome architecture in the mitochondria of diplonemids, in which genes are systematically fragmented and the gene pieces are dispersed across numerous circular chromosomes. Genes are split into up to 12 short fragments (modules), which are transcribed separately and joined in a way that differs from known trans-splicing. Further, the cox1 mRNA includes six non-encoded uridines, indicating RNA editing. In the absence of recognizable cis-elements, we postulated that trans-splicing and RNA editing are directed by trans-acting molecules. Here, we provide insight into these post-transcriptional processes by investigating transcription, RNA processing, trans-splicing and RNA editing in cox1 and at a newly discovered editing site in cob. We show that module precursor transcripts are up to several thousand nt long and are processed accurately at their 5′ and 3′ termini to yield the short, coding-only regions. Processing at the 5′ and 3′ ends occurs independently, and a processed terminus engages in trans-splicing even if the module’s other terminus is still unprocessed. Moreover, only cognate module transcripts are joined, though without directionality. In contrast, module transcripts requiring RNA editing trans-splice only once editing is completed. Finally, experimental and computational analyses suggest the existence of RNA trans-factors with the potential to guide both trans-splicing and RNA editing.
Keywords: trans-splicing, U-insertion RNA editing, Diplonema papillatum, Euglenozoa
Introduction
Arguably the most eccentric genome architecture and gene structure are found in the mitochondria of diplonemids (Euglenozoa), a group of free-living unicellular flagellates with a phagotrophic or osmotrophic mode of nutrition. Diplonemids are the sister group of the notorious kinetoplastids, some of whose members cause serious diseases in humans. The third group within Euglenozoa is the euglenids, which emerged prior to the split of diplonemids and kinetoplastids.1
The first molecular study of diplonemid mtDNA was published in 1999 by L. Simpson’s group2 and indicated that the mitochondrial genome of Diplonema papillatum consists of a complex array of small, covalently closed DNA molecules. Today, we know that this mtDNA is composed of hundreds of 6- and 7-kbp circular molecules, termed A-class and B-class chromosomes, respectively.3 Most of each chromosome’s sequence is non-coding and quasi-identical among members of the same class. Only a small region, the “cassette,” is unique to each chromosome. The cassette encloses a gene fragment of ~70–350 bp4 that is flanked on each side by non-coding sequence averaging 50 nt in length (Fig. 1A). All genes in Diplonema mtDNA appear to be fragmented; so far, not a single contiguous gene has been found in this genome. The cox1 gene, for example, is broken up into nine pieces and therefore requires nine different chromosomes to specify its coding region. To generate the contiguous cox1 mRNA, gene pieces are transcribed individually and then assembled by trans-splicing, as evidenced by transcript intermediates readily visible in Northern hybridization experiments.4
Figure 1. Transcription and transcript processing in Diplonema mitochondria. (A) Structure of mitochondrial chromosomes, showing regions unique to a given chromosome (“cassette”), which include coding (“gene module”) and non-coding sequence (“unique flanking regions”); regions common to a given chromosome class (“class-specific constant regions”); and the portion of the constant regions that is shared by A- and B-class chromosomes (“shared”). (B) Strandedness of the modules relative to the constant region. (C‒E) Observed types and abundance of putative RNA processing intermediates that include a single module (mono-module transcripts): 5′-terminal (first) modules (C), internal modules (D) and 3′-terminal (last) modules (E) of all genes combined. 5′, 5′-terminal module; i, internal module; 3′, 3′-terminal module. Note that the 5′-terminal module includes a non-coding 5′UTR of ~25 nt. Data are taken from Table S1. The total counts of detected 5′-terminal, internal and 3′-terminal mono-modules with defined length of adjacent regions are four, 35 and 38, respectively. In (E), the proportion of polyadenylated 3′ modules is overestimated (indicated by < 26%, < 21%), since one library was enriched in poly(A) RNA. The low number of 5′-terminal mono-modules is due to the experimental design. Although 5′-terminal mono-modules carrying both adjacent regions or a 3′-adjacent region have not been detected among the four 5′-terminal mono-modules, this type of intermediate is likely present among transcripts whose 3′ end remained unknown because of the choice of RT-PCR primers [see Table S1, e.g., cox1-m1 (dp4030), cox2-m1 (dp9347), and cox3-m1 (dp9346,56,69)]. (F) Length distribution of 5′- and 3′-adjacent regions of mono-modules. The gray shades represent the percentage of observed intermediates with 5′ and/or 3′ extensions of length 0 nt, 1–25 nt, 1–100 nt, 1–300 nt, 1–500 nt, etc., up to 1–1,100 nt. Note the steep drop in percentage immediately adjacent to the module boundaries, except for 5′ modules, which include a 5′UTR of 26–27 nt. The longest observed 5′ extensions of 5′ modules are shorter than those of internal and 3′ modules because of the smaller sample size of the former. For details, see Table S1.
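The binning in panel (F) is cumulative: each “1–n nt” class contains all shorter nonzero extensions, so the classes overlap rather than partition the data. A minimal Python sketch of such a tally follows. The extension lengths in the usage line are invented placeholders, not values from Table S1, and the default upper bounds include only those named in the caption (the intermediate bounds hidden behind “etc.” are not specified).

```python
def cumulative_extension_bins(lengths, upper_bounds=(25, 100, 300, 500, 1100)):
    """Percentage of intermediates per nested length class, in the style of Fig. 1F.

    lengths: extension lengths (nt) beyond a module boundary, one per intermediate.
    Returns {"0 nt": pct, "1-25 nt": pct, ...}. The 1-n classes are nested,
    so their percentages do not sum to 100.
    """
    if not lengths:
        raise ValueError("need at least one observed intermediate")
    n = len(lengths)
    bins = {"0 nt": 100.0 * sum(1 for x in lengths if x == 0) / n}
    for ub in upper_bounds:
        bins[f"1-{ub} nt"] = 100.0 * sum(1 for x in lengths if 1 <= x <= ub) / n
    return bins


# Hypothetical extension lengths, for illustration only.
example = cumulative_extension_bins([0, 0, 12, 40, 250, 1050])
print(example)
```

Nesting the classes makes the drop at the module boundary visible as the gap between the “0 nt” class and the narrowest nonzero class.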
The unorthodox genome organization and gene structure of D. papillatum, and of diplonemids in general,5 contrast with a rather ordinary set of mitochondrial genes that closely resembles the gene complement of kinetoplastid mitochondria. Genes identified so far in D. papillatum mtDNA encode components of the respiratory chain and oxidative phosphorylation, i.e., ATP synthase subunit 6, apocytochrome b, cytochrome c oxidase subunits 1–3 and NADH dehydrogenase subunits 1, 4, 5, 7 and 8, as well as the mitochondrial large subunit rRNA.6 The gene for the small subunit rRNA is believed to be present in D. papillatum mtDNA as well, as in other eukaryotes, but it has remained undetected, most likely because its sequence is highly divergent. All recognized mitochondrial genes of D. papillatum are trans-spliced.
A well-known mechanism of mitochondrial trans-splicing involves discontinuous Group I or Group II introns, in which cognate exons are brought into close proximity through intermolecular pairing that forms a distinctive intron RNA secondary structure.7,8 However, even the most sensitive in silico search failed to detect intron-typical sequence patterns, conserved residues or sequence-complementary motifs at module boundaries.5 In the absence of recognizable cis-elements, we postulated trans-acting factors that would guide both trans-splicing and RNA editing.4,6 Experimental studies of Diplonema mitochondrial transcripts have been extremely challenging because of the difficulty of obtaining sufficient quantities of cell material, mitochondria and RNA. Here, we comprehensively analyze post-transcriptional processes in Diplonema mitochondria and identify, by experimental and in silico methods, trans-factors with the potential to direct and control the various RNA maturation steps.
Results
Primary mitochondrial transcripts
Transcription of mitochondrial gene modules in Diplonema is thought to start, as in animal mitochondria,16 at the replication origin. Previously, we tentatively mapped the origin by in silico methods to the shared constant region of the chromosomes (Fig. 1A and ref. 6). Aiming at a more precise experimental determination of transcription start sites, we performed in vitro RNA capping experiments: in most mitochondria (with a few notable exceptions, e.g., Neurospora crassa17), primary transcripts carry a 5′-triphosphate end that can be labeled with α-32P-GTP and guanylyl transferase. Nuclear mRNAs, in contrast, are capped during transcription and therefore are not labeled. The exception among nuclear transcripts is cytosolic 5S rRNA (or a portion of it), which retains a 5′-triphosphate.
In Diplonema mitochondria, we expected four major groups of primary gene module transcripts, corresponding to A-class and B-class chromosomes, each in orientation “(+)” and “(-)” (Fig. 1B). All these primary transcripts should be longer than 1.2 kb (the minimum distance between modules and the shared constant region). This lower-bound size estimate is corroborated by module precursors that were identified in cDNA libraries and by RT-PCR experiments (see below and Fig. 1F). Yet, capping of Diplonema total RNA yielded only two relatively small labeled bands (0.12 kb and 0.3 kb; Fig. 2A). Although the quantity of labeled material was insufficient for RNA sequencing or for use as a hybridization probe, we can still draw specific inferences about the nature of these RNA species. The 0.12-kb molecule is almost certainly cytosolic 5S rRNA, based on its size and its high abundance in ethidium bromide staining (Fig. 2B and ref. 18). In contrast, the 0.3-kb band is apparently of mitochondrial origin, given its low concentration (not visible by staining) and high capping efficiency; it most likely represents the equivalent of human mitochondrial 7S RNA, which primes mtDNA replication.19 Human 7S RNA is a stable RNA whose synthesis is driven by the promoter for transcription of L-strand-encoded genes. Both the human mitochondrial L- and H2-strand polycistronic transcripts can be of nearly full-genome length but are rather short-lived.20 The same appears to apply to Diplonema mitochondria, where primary transcripts are apparently processed too rapidly to be detected by the method applied.

Figure 2. Transcripts bearing a 5′-triphosphate. (A) Radiogram of total RNA capped with α-32P-GTP and guanylyl transferase and separated on a denaturing polyacrylamide gel (5%). The band of ~0.12 kb is believed to be cytosolic 5S rRNA, which across eukaryotes possesses a 5′-triphosphate. The band of ~0.3 kb is most likely mitochondrial 7S RNA, based on abundance and size. (B) Ethidium bromide-stained total RNA separated on the same gel as the capped RNA. The prominent bands are cytosolic LSU rRNA (~3.5 kb), SSU rRNA (~2 kb), 5.8S rRNA (0.17 kb) and 5S rRNA (0.12 kb), as determined earlier by others.18
End-processing of gene module transcripts
In Diplonema, the RNAs transcribed from individual chromosomes undergo multiple maturation steps, which we investigated by three experimental procedures. First, since mitochondrial mRNAs of Diplonema are polyadenylated, we constructed classical full-length cDNA libraries by priming first-strand reverse transcription with an anchored oligo-dT primer that anneals to the proximal region of the poly(A) tail; second-strand synthesis was primed with an oligonucleotide binding to all cDNA 3′ ends (see Materials and Methods). Second, double-stranded cDNA was produced as above, but transcripts were PCR-amplified using various combinations of gene-specific primers. Third, RNA was circularized and then subjected to RT-PCR with diverse pairs of “divergent” gene-specific primers (Fig. 3A). In addition to (mature) mRNAs, these experiments detected two kinds of incomplete transcripts, one containing multiple modules (“oligo-module transcripts”) and the other containing a single module (“mono-module transcripts”). We also encountered several transcripts consisting exclusively of module-flanking regions, with one terminus corresponding exactly to the nucleotide adjacent to a module. Such large chunks of flanking regions appear to be liberated by precise endonucleolytic cleavage. Tables S1 and S2 compile detailed information on transcripts that include modules or exclusively flanking regions, respectively. These tables also list the corresponding clones that are referred to in the following paragraphs.
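The read-out of the circularization assay can be sketched in a few lines. Once an RNA’s 3′ end has been ligated to its own 5′ end, an amplicon from divergent primers reads through the ligation point, and any nucleotides found between the known 3′-terminal and 5′-terminal module sequences are extra to the module (for example non-encoded Us, a poly(A) tail or unprocessed flanking sequence). The read alone shows what lies between the two termini, not necessarily which terminus it belongs to. The Python sketch below assumes an error-free read in the sense orientation; the function name and toy sequences are hypothetical.

```python
def circularization_junction(read, module_3p_end, module_5p_start):
    """Return the nucleotides lying between a module's 3'-terminal and
    5'-terminal sequences in a read across the circularization point.

    Returns None if either anchor is missing from the read; an empty string
    means the two module termini were ligated directly to each other.
    """
    i = read.find(module_3p_end)
    if i < 0:
        return None
    start = i + len(module_3p_end)
    j = read.find(module_5p_start, start)
    if j < 0:
        return None
    return read[start:j]


# Toy read: module 3' end "CAUCG", six extra Us, then module 5' start "AUGGC".
print(circularization_junction("GGCAUCGUUUUUUAUGGCUA", "CAUCG", "AUGGC"))  # UUUUUU
# Toy read in which the module termini are joined with nothing in between.
print(repr(circularization_junction("GGCAUCGAUGGCUA", "CAUCG", "AUGGC")))  # ''
```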

Figure 3. Design of the experimental and computational search for trans-splicing and RNA-editing guides. mi, upstream module; mi+1, downstream module. Upper gray bars, module transcripts. Black lines, hypothetical guides. Thin vertical lines, pairing between module transcripts and guide. (A and B) RT-PCR experiments. Arrows, location of primers for the reverse transcriptase (RT) and PCR reactions. Lower gray bar, the resulting amplicon, where light gray indicates the sequences that originate from the primers and dark gray indicates the central region that originates from the template. For primers used, see Table S9. (A) RT-PCR after RNA circularization to detect the distal regions of antisense RNAs. The short vertical bar indicates the circularization point. (B) RT-PCR to detect the central portion of antisense RNAs that are complementary to module junctions. (C and D) In silico search. L, bridge length; d1, d2, distance of the match from the 3′-module boundary and the 5′-module boundary, respectively; a (“anchors”), stretch of 100% sequence complementarity between module and guide. (C) “Regular” conformation (topology 1). (D) Permuted conformation (topology 2).
Among mono-module transcripts, we found fully and partially processed modules, the different types of which are depicted in Figure 1C‒E. Fully processed modules consist exclusively of coding region (e.g., cox1-m5 clone dp7341), except for the 5′-terminal (“first”) module of genes, which retains a 5′ untranslated leader (5′UTR) that is typically ~25 nt long (e.g., cob-m1, clone dp5996). Partially processed mono-module transcripts include flanking regions of non-coding sequence that are up to ~1,150 nt long, reaching far into the chromosome’s constant region (see Fig. 1A and F; Table S1). Flanking regions may border the upstream, the downstream or both sides of the module (e.g., nad4-m7).
As mentioned above, gene modules are encoded on either A- or B-class chromosomes, in either orientation with respect to the constant region [referred to as A(+), A(-), B(+) and B(-); see Fig. 1B and Table S1, column 2]. Interestingly, immature module transcripts whose adjacent regions reach into the constant regions of chromosomes have the potential to pair with one another, notably A(+) with A(-) precursors and B(+) with B(-) precursors (Fig. 4A), because transcripts from opposite orientations carry complementary copies of the class-specific constant region. Given the relatively high steady-state concentration of precursors in mitochondria, such intermolecular hybridization is quite likely to occur. This pairing would evidently not align modules in the correct order for trans-splicing, but it might serve to “herd” the hundred or so distinct module transcripts for further processing by a dedicated machinery.
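The pairing argument rests on strand complementarity: a precursor from a (+)-oriented module that extends into the constant region contains one strand of that region, whereas a precursor from a (-)-oriented module of the same class contains the other. The Python sketch below checks this for a toy constant-region segment. The sequence is invented, and real pairing would also depend on overlap length, secondary structure and accessibility, which the sketch ignores.

```python
# Base complements for RNA: A<->U, G<->C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def fully_complementary(a: str, b: str) -> bool:
    """True if a and b (both written 5'->3') form a perfect antiparallel duplex."""
    return len(a) == len(b) and reverse_complement(a) == b


# Hypothetical constant-region segment as read in a (+)-module precursor.
plus_precursor_segment = "ACGGUCAAGU"
# A (-)-module precursor is transcribed from the opposite strand of the same region.
minus_precursor_segment = reverse_complement(plus_precursor_segment)
print(minus_precursor_segment)  # ACUUGACCGU
print(fully_complementary(plus_precursor_segment, minus_precursor_segment))  # True
```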

Figure 4. Potential transcript interactions and maturation pathway. (A) Potential pairing of module precursor transcripts encoded in opposite orientations on a given chromosome class. Transcripts from (+)-orientation chromosomes have the propensity to pair with counterparts from (-)-orientation chromosomes of the same class. Such pairing could form foci of precursors in the organelle, but would not align cognate modules. (B‒D) Inferred transcript maturation pathways for “first” (5′) modules (B), internal modules (C) and “last” (3′) modules (D). Note that 5′ modules include a non-coding 5′UTR of ~25 nt. Black or striped boxes represent modules. Gray boxes are unique flanking regions. Thin bars represent constant regions. Interrupted thin bars indicate that a neighboring module may or may not be present. 5′, 5′-terminal module; i, internal module; 3′, 3′-terminal module of a given gene. AAAA, poly(A) tail.
Transcripts of 3′-terminal (“last”) modules are a special case. They occur with processed and unprocessed ends, as well as with a poly(A) tail attached directly 3′-adjacent to the coding region (e.g., cox1-m9, clone dp5927). We detected only a few last-module transcripts that are 3′-processed but not polyadenylated, suggesting that the two steps are tightly coordinated.
A tentative estimate of the relative abundance of processing intermediates indicates that the steady-state concentration of mono-modules retaining both 5′- and 3′-adjacent regions is generally higher than that of partially or fully end-processed transcripts (see Fig. 1C–E; Table S3).
Trans-splicing of gene module transcripts
Whereas the single-module transcripts described above provide insight into end processing, transcripts containing several modules reveal how trans-splicing proceeds. The observed oligo-module transcripts include various numbers of modules, covering virtually any interval of the mRNA (e.g., cox1 modules 3‒6 and cox1 modules 6‒9; Table S4). Further, in all of them the modules are arranged in the correct order; thus, they represent putative intermediates of the trans-splicing process.
We analyzed to what extent trans-splicing depends on module end-processing and, in the case of terminal modules, on polyadenylation. Several oligo-modules were found to retain a 5′- or a 3′-flanking region (Tables S1 and S4). Moreover, oligo-modules that include a last module may or may not be polyadenylated (e.g., cox1-m9, clones dp5977 and dp0655). Thus, partially processed modules can readily engage in trans-splicing, and polyadenylation of the last module does not seem to be required for joining with its upstream partner. Taken together, these data allowed us to infer the assembly line by which transcripts are built in Diplonema mitochondria (Fig. 4B‒D).
Non-encoded nucleotides in mitochondrial transcripts
In general, genomic and cDNA sequences of Diplonema mitochondria are congruent, with a few notable exceptions. Most conspicuous is the occurrence of six non-encoded Us in the cox1 transcript exactly between modules 4 and 5,4 an RNA editing event that is evolutionarily conserved across diplonemids.5 Here, we examine how exactly these extra Us are added. They may be inserted after ligation of the two module transcripts or, alternatively, attached prior to ligation, either to the 3′ end of module 4 or to the 5′ end of module 5 (Fig. S1). The insertion scenario implies the existence of an RNA intermediate that contains cox1 modules 4 and 5 joined without the Us in between (abbreviated “m4-m5”). If the attachment scenario is correct, all transcripts that contain both modules should also carry the six Us in between; further, one should find one of the two mono-modules with six Us attached, either module 4 with Us at its 3′ end (abbreviated “m4-6xU”) or module 5 with Us at its 5′ end (abbreviated “6xU-m5”).
These alternatives were tested by RT-PCR experiments that specifically amplify cox1 module 4, module 5 or the corresponding junction, using circularized RNA as template and divergent primers placed in module 4 or 5 (same principle as shown in Fig. 3A). Further, poisoned primer extension was performed to probe the 5′-adjacent region of module 5 (described in Supplementary Information and Table S5). None of the experiments detected RNA species of the type m4-m5 or 6xU-m5. Instead, we found transcripts corresponding to m4-6xU. Remarkably, m4-6xU transcripts were detected only when RNA was treated with a 3′-phosphatase prior to circularization (Table S5). Apparently, the m4-6xU transcript in mitochondria carries a 3′-phosphate that blocks RNA ligation. How this phosphate group may arise is addressed in the Discussion. In sum, these results strongly suggest that RNA editing in Diplonema does not proceed by insertion in the strict sense (cleavage, nucleotide addition and resealing), but rather by appending extra nucleotides to a module transcript prior to trans-splicing.
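The decision logic of this experiment can be written down explicitly. The Python sketch below encodes which intermediates each scenario requires or excludes, following the predictions stated in the preceding paragraph, and filters the scenarios against a set of observed species. It treats non-detection as absence, which is a simplifying assumption of the sketch: a negative RT-PCR result argues against, but does not prove the absence of, a low-abundance species.

```python
# Predicted intermediates for how the six non-encoded Us enter the cox1 m4/m5 junction.
PREDICTIONS = {
    "insertion after ligation": {"required": {"m4-m5"}, "excluded": set()},
    "attachment to 3' end of module 4": {"required": {"m4-6xU"}, "excluded": {"m4-m5"}},
    "attachment to 5' end of module 5": {"required": {"6xU-m5"}, "excluded": {"m4-m5"}},
}


def consistent_scenarios(observed):
    """Return scenarios whose required species were all observed and whose
    excluded species were not (non-detection is treated as absence)."""
    return [
        name
        for name, p in PREDICTIONS.items()
        if p["required"] <= observed and not (p["excluded"] & observed)
    ]


# Species reported above: m4-6xU detected; m4-m5 and 6xU-m5 not detected.
print(consistent_scenarios({"m4-6xU"}))  # ["attachment to 3' end of module 4"]
```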
The cob transcript, too, seems to undergo RNA editing, in a less conspicuous but equally intriguing way (Fig. 5A). The 3′ end of the cob mRNA includes three non-encoded Us just upstream of the poly(A) tail, as observed in cDNA library clones and in independent experiments involving RNA circularization and RT-PCR with primers within the terminal module. These additional Us generate a Phe codon in the cob mRNA sequence, plus the first position of the stop codon, which is completed to UAA by polyadenylation. We predict that the edited terminal cob module 6 also carries a 3′-phosphate group, just like the edited module 4 of cox1, and that U appendage to the cob module is a prerequisite for polyadenylation.

Figure 5. Inconsistencies between gene and transcript sequences in the 3′-end regions of genes. (A) RNA editing (red letters) and completion of the stop codon by polyadenylation at the 3′ end of cob. (B and C) Completion of stop codons by polyadenylation. The genome-encoded 3′-terminal U of the last module of atp6 and cox1 is completed to UAA by the poly(A) tail, generating the only stop codon in the atp6 reading frame and a second stop codon, immediately following UAG, in the cox1 reading frame. Completion of stop codons by polyadenylation most probably occurs in all protein-coding genes of Diplonema mtDNA. (B) Lower part: the square bracket with sequences in gray font indicates artifacts due to the use of an anchored oligo-dT primer for cDNA synthesis; see Supplementary Information. Bold font and underlining highlight nucleotides that are identical in genomic and transcriptomic sequences. Lower-case letters in genome sequences denote non-coding regions. For adenines (As) in a transcript sequence set in bold but not underlined, it cannot be inferred whether they are encoded or added post-transcriptionally. Genomic, sequence of clones from mtDNA; cDNA/polyAlib, cDNA sequences of clones in cDNA libraries generated by reverse transcription of poly(A) RNA using an anchored oligo-dT primer; cDNA/RNAcirc, cDNA sequences of clones obtained from circularized RNA that was reverse-transcribed and amplified using gene-specific primers (see Materials and Methods). When only a single clone exists for a given sequence, the clone ID is given in parentheses. When multiple clones share the same sequence, the total number of such clones is indicated in parentheses. The corresponding clones are as follows.
cob genomic: dp4155, dp4608, dp4735, dp4941, dp4980, dp4984; cob cDNA/polyAlib: dp0205, dp0314, dp0317, dp1021, dp4278; atp6 genomic: dp4241, dp4242, dp4246, dp4887, dp4896; atp6 cDNA/RNAcirc: dp9537, dp10201, dp10202, dp10245; atp6 cDNA/polyAlib: dp1971, dp0414; cox1 genomic: dp4216, dp3328-4, dp3207; cox1 cDNA: dp6005 and 83 additional clones.
We also observed differences between mitochondrial genomic and transcriptomic sequences that do not involve the addition of Us. These differences pertain to the termination codon, which is generated post-transcriptionally by polyadenylation, as experimentally confirmed for atp6 (Fig. 5B); this most likely applies to all other genes as well. Probably the only terminal module that encloses an encoded stop codon is that of cox1; curiously, polyadenylation creates an additional stop codon there (Fig. 5C). Post-transcriptionally generated stop codons were first reported for human mitochondria, where, in contrast to Diplonema, this feature coincides with an extreme reduction of both intergenic regions and overall genome size.24
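How appended Us and the poly(A) tail complete the reading frame can be traced codon by codon. The Python sketch below labels each in-frame codon by where its last nucleotide comes from: the gene module, the appended Us, or the poly(A) tail. The three input sequences are short invented stand-ins that mimic the cob, atp6 and cox1 situations described above; they are not the real Diplonema sequences, which are shown in Figure 5, and the standard stop codons UAA, UAG and UGA are assumed.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def annotate_terminal_codons(encoded, appended_u="", polya="AA"):
    """Label in-frame codons of encoded + appended Us + poly(A).

    encoded must start in frame. Only codons that begin before the poly(A)
    tail are reported, so two As suffice to complete any split codon.
    Returns (codon, source_of_last_nucleotide, is_stop) tuples.
    """
    transcript = encoded + appended_u + polya
    edited_end = len(encoded) + len(appended_u)
    result = []
    for i in range(0, edited_end, 3):
        codon = transcript[i:i + 3]
        if len(codon) < 3:
            break  # poly(A) too short to complete this codon
        if i + 3 <= len(encoded):
            source = "encoded"
        elif i + 3 <= edited_end:
            source = "appended U"
        else:
            source = "poly(A)"
        result.append((codon, source, codon in STOP_CODONS))
    return result


# cob-like: three appended Us form a Phe codon (UUU) and the U of UAA.
cob_like = annotate_terminal_codons("AUGGCUU", appended_u="UUU")
# atp6-like: an encoded terminal U becomes UAA only after polyadenylation.
atp6_like = annotate_terminal_codons("AUGGCUU")
# cox1-like: an encoded UAG, followed by a U that poly(A) turns into a second stop.
cox1_like = annotate_terminal_codons("AUGGCUUAGU")
for row in cox1_like:
    print(row)
```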
Experimental detection of postulated RNAs guiding post-transcriptional processes
Earlier, we showed by rigorous in silico analyses that trans-splicing in Diplonema mitochondria is almost certainly not directed by cis-elements, i.e., sequence motifs located in modules or their flanking regions.5 Therefore, we posited trans-acting matchmaking factors, which could be RNA, protein or DNA molecules. Here, we describe a set of experiments that test for the existence of RNAs that may guide module trans-splicing as well as RNA editing. We refer to these hypothetical molecules as RNAs guiding post-transcriptional processes (ppRNAs).
First, we searched for molecules resembling the guide RNAs (gRNAs) that direct RNA editing in kinetoplastid mitochondria. These RNAs are characterized by a length of 50‒70 nt, high abundance, a 5′-triphosphate and a 3′-poly(U) tract.25 Yet, electrophoretic separation of RNAs, capping experiments (see above) and in vitro incorporation of radiolabeled uridine (not shown) did not reveal any Diplonema RNA species akin to kinetoplastid gRNAs. Since the posited ppRNAs may be present in Diplonema at only low concentrations, we applied the more sensitive RT-PCR methodology to total Diplonema RNA. We searched for molecules that are antisense to mRNA and cover several or even all module junctions of a gene at once. One experiment targeted antisense transcripts including modules 5‒9 of cox1 (spanning four junctions); antisense transcripts of three additional genes, spanning two to four junctions, were tested as well. However, no amplicons were detected (results not shown), refuting the hypothesis that long antisense RNAs direct trans-splicing of multiple modules simultaneously.
A second series of experiments tested for the presence of ppRNAs that cover only a single module junction. These RT-PCR experiments used a primer pair that targets the central region of hypothetical ppRNAs (Fig. 3B). Here, we obtained amplicons of the expected size and sequence for all five examined cox1 junctions, M2/M3, M3/M4, M4/M5, M5/M6 and M8/M9 (Fig. 6); Table 1 (“Central”) compiles the results of the individual experiments. However, the exact sequence of the transcript that served as template for the RT product cannot be inferred from these experiments, for two reasons. First, the primers were designed to anneal within a few nucleotides of the module junctions (to increase the chance of detecting the postulated molecules); consequently, the resulting amplicons include only two to eight “novel” nucleotides (Fig. 6, red labels), while most of the sequence originates from the primers. Second, the primer-derived sequence of the amplicons may not fully correspond to the sequence of the targeted RNA. This is because the primers were designed on the assumption that the hypothetical ppRNA is the exact reverse complement of the pre-mRNA, whereas the pre-mRNA:ppRNA duplex may contain G:U pairs. In addition, the primers may extend beyond the 5′ and 3′ termini of the hypothetical ppRNA.
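The second caveat can be made concrete. In the Python sketch below, a hypothetical ppRNA pairs with a stretch of pre-mRNA through one G:U pair: it is still a perfect partner when wobble pairs are allowed, yet it differs at one position from the exact reverse complement on which the primers were based. Because an amplicon copies the primer sequence rather than the template at primer positions, such a difference would be invisible in the primer-derived part of a clone. All sequences are invented for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Exact Watson-Crick reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def unpaired_positions(top: str, bottom: str, allow_wobble: bool) -> int:
    """Count positions that cannot pair when top and bottom (both 5'->3')
    are aligned antiparallel over their full length."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return sum((t, b) not in allowed for t, b in zip(top, reversed(bottom)))


pre_mrna = "GGCAUCGA"                        # hypothetical junction stretch
primer_model = reverse_complement(pre_mrna)  # the ppRNA sequence the primer design assumes
pp_rna = primer_model[:-1] + "U"             # same, but U opposite the 5'-terminal G

print(primer_model, pp_rna)                                      # UCGAUGCC UCGAUGCU
print(unpaired_positions(pre_mrna, pp_rna, allow_wobble=True))   # 0
print(unpaired_positions(pre_mrna, pp_rna, allow_wobble=False))  # 1
print(sum(a != b for a, b in zip(primer_model, pp_rna)))         # 1
```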
Figure 6. Experimentally detected RNAs potentially guiding trans-splicing and RNA editing of cox1. The cDNA sequence at junctions is shown in blue (upstream module) and green (downstream module). -|, 3′ end of upstream module; |-, 5′ end of downstream module. dp9241 to dp8689 are RT-PCR clones representing RNAs that are complementary (antisense) to module junctions. The sequence portion in black originates from primers, and the underlined red stretch is the sequence originating from the antisense RNA. oli88 (RT) etc., oligonucleotide primers used to prime the reverse transcriptase reaction; oli80 etc., oligonucleotide primers used in the PCR reaction together with the RT primer. For primers, see Table S9.
Table 1. Experimental detection of RNAs potentially guiding cox1 trans-splicing and RNA editing.
| Targeted ppRNA regionᵃ | cox1 junction | Primers used | Clone series from separate experimentsᵇ | Obtained sequences (no. of distinct types / no. of clones of a given type)ᶜ |
|---|---|---|---|---|
| Central | Module 2/3 | oli88 (RT) + oli80 | dp7901–96, dp8373–96, dp8401–24, dp9237–72 | ppRNA candidates (1/32); spurious sequences (14/21) |
| Central | Module 3/4 | oli146 (RT) + oli147 | dp8425–72, dp9273–84 | ppRNA candidates (1/1); spurious sequences (2/3) |
| Central | Module 4/5 | oli129 (RT) + oli109; oli129 (RT) + Smart | dp6401–96, dp6501–96, dp6601–96, dp7101–96, dp6901–96 | ppRNA candidates (1/67); spurious sequences (37/59) |
| Central | Module 5/6 | oli150 (RT) + oli151 | dp8473–96, dp9285–96 | ppRNA candidates (1/5); spurious sequences (1/1) |
| Central | Module 8/9 | oli154 (RT) + oli41 | dp8649–96, dp9337–96 | ppRNA candidates (1/1); spurious sequences (16/61) |
| Distal | Module 2/3 | oli138 (RT) + oli139 | dp8001–24, dp8037–48 | spurious sequences (7/30) |
| Distal | Module 3/4 | oli148 (RT) + oli149 | dp8149–72 | spurious sequences (2/24) |
| Distal | Module 5/6 | oli152 (RT) + oli153 | dp8173–96 | ppRNA candidates (1/1); spurious sequences (6/14) |
| Distal | Module 7/8 | oli84 (RT) + oli41; oli141 (RT) + oli140 | dp7801–96, dp8025–36, dp8049–96 | spurious sequences (34/73) |
a Determination of central ppRNA regions by “convergent” RT-PCR, and of distal regions by “divergent” RT-PCR on circularized RNA (see Fig. 3A and B).
b Each clone series (dp7901–96, dp8373–96, etc.) was obtained from a separate RT-PCR reaction. Two different RNA preparations were used. In the experiments targeting the module 4/5 junction, both preparations were used (preparation #1 for dp64xx, dp65xx, dp66xx and dp69xx, and preparation #2 for dp71xx), and the number and nature of the resulting amplicons were very similar. All other experiments were conducted with preparation #2.
c Sequences designated as being of the same type are ≥ 99% identical in the stretch between the primers; those designated ppRNA candidates are 100% identical in this stretch. Additional but irrelevant differences between reads occur at their start and end, where the primers may be present fully or only partially, and adjacent vector sequences may or may not be included. Sequences considered to originate from ppRNA candidates must satisfy the following criteria. In experiments targeting the central region of the hypothetical ppRNA, candidates must carry both primers in the correct orientation; further, the primer used for RT must prime the antisense transcript, and the sequence between the primers must match that of the corresponding module junction in mRNA (U:G pairs allowed, but not insertions/deletions). In experiments targeting the distal region of the hypothetical ppRNA, candidates must carry both primers in the correct orientation with ≥ 1 nt in between, and must either match yet-unassigned mitochondrial coding regions (within cassettes) or be the predominant type of cloned amplicon. “Spurious sequences” include (1) mitochondrial sequences where one or both primers have annealed unspecifically at a site with < 60% sequence identity on the sense (coding) or antisense strand; (2) nuclear sequences; and (3) sequences matching neither the available mtDNA or nuclear genome sequences nor the vector. A detailed listing of the results, the characterization of spurious sequences and the actual sequences of relevant clones are compiled in Table S6 and the corresponding footnote.
In an attempt to characterize the distal regions and the length of the detected RNAs, we conducted “divergent” RT-PCR on circularized RNAs (Fig. 3A), expecting to discover sequences that match presumptive mitochondrial coding regions (in cassettes). Yet, no significant hit with unassigned cassettes was found, and the amplicon clones differ in sequence from one another, suggesting that these particular RT-PCR products are mostly spurious (Table 1, “Distal”). Nevertheless, a single candidate (dp8189) was found that matches junction M5/M6 and covers ~60 nt of both modules in antisense direction (see Table S6 for a detailed analysis and sequence). Divergent RT-PCR may have yielded such sparse results because of the short length of the target RNA combined with mismatches between primers and target RNA. Finally, the length of potential ppRNAs was also investigated by primer extension (run-off reverse transcription) targeting the antisense RNA that covers cox1 junction M4/M5, which appears to be the most highly expressed ppRNA candidate (see Table 1). However, no signal was detected, most likely because of the lower sensitivity of this method compared with RT-PCR.
Given the limited sequence information obtained for ppRNA candidates, their coding regions in the (nuclear or mitochondrial) genome cannot be determined unambiguously. Therefore, we mapped the candidates’ genomic positions by in silico analyses, as described in the following section.
In silico detection of postulated trans-factors guiding post-transcriptional processes
Potential trans-acting elements directing cox1 trans-splicing and RNA editing were searched for computationally in the available genome and transcriptome sequences. The in silico analyses were designed to test many more scenarios, and more rigorously, than would be feasible experimentally. For example, we allowed guiding factors to be not only (nucleus- and mitochondrion-encoded) RNA but also DNA molecules. (Note that uptake of RNA and DNA by mitochondria has been well documented;27 reviewed in ref. 28.) Moreover, we allowed guiding factors to bind to as few as six contiguous nucleotides (“anchor,” parameter a = 6; see Fig. 3C) in each of the neighboring modules, with up to three of the six pairs in the module/guide duplex region being G:U.
The first and simplest guide model we searched for required the binding sites of the trans-factor to be directly adjacent to the module junction, and the trans-factor to be collinear with the sequence across the junction (distances d1, d2 = 0; no loop or bridge, L = 0; see Fig. 3C). Further, each guide was scrutinized for its potential to mis-assemble, i.e., to direct joining of non-cognate modules (e.g., modules 2 and 4); such guides were removed. Finally, a data set had to include guides for at least six of the eight cox1 junctions (to account for incomplete genome sequences); otherwise, no candidates are reported. For this model, we detected 34 candidates of distinct sequence in the mitochondrial genome data, nearly four times as many (135) in nuclear ESTs, and close to 3,000 (2,873) in nuclear genome data (Table 2, column 3). The guide candidates detected in mtDNA match all cox1 junctions except M1/M2 and M3/M4 (Table S7). Surprisingly, only one candidate is predicted to reside on one of the six fully sequenced chromosomes (out of an estimated total of > 100 mitochondrial chromosomes). This finding refutes our earlier hypothesis that ppRNAs are encoded in the constant region of all chromosomes.
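The pairing rules of this simplest model (a = 6 anchors on either side of the junction, no bridge, limited G:U pairs, removal of guides that could join non-cognate modules) can be illustrated with a short sketch. It is a minimal sketch under stated assumptions, not the authors' search software (whose algorithm is given in the Supplementary Information): sequences use the DNA alphabet with T standing for U, pairing is antiparallel, and the limit of three G:U pairs is applied to each six-nucleotide duplex separately. The function names and toy sequences are hypothetical.

```python
# Minimal sketch of the simplest guide model (Topology 1, L = 0, d1 = d2 = 0).
# Assumptions (hypothetical, not the authors' software): DNA alphabet with T for U,
# antiparallel pairing, and at most `max_gu` G:T (G:U) pairs per anchor duplex.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}


def duplex_ok(target: str, guide_site: str, max_gu: int) -> bool:
    """True if guide_site pairs antiparallel with target with <= max_gu wobble pairs."""
    if len(target) != len(guide_site):
        return False
    wobble = 0
    # target is read 5'->3', guide_site 3'->5' (antiparallel duplex)
    for t, g in zip(target, reversed(guide_site)):
        if (t, g) in WATSON_CRICK:
            continue
        if (t, g) in WOBBLE:
            wobble += 1
            continue
        return False  # mismatch: this position cannot pair
    return wobble <= max_gu


def guides_junction(up: str, down: str, guide: str, a: int = 6, max_gu: int = 3) -> bool:
    """True if a 2a-nt guide bridges the 3' end of `up` and the 5' end of `down`."""
    if len(guide) != 2 * a or len(up) < a or len(down) < a:
        return False
    # The guide's 5' half pairs with the start of the downstream module,
    # its 3' half with the end of the upstream module.
    return duplex_ok(down[:a], guide[:a], max_gu) and duplex_ok(up[-a:], guide[a:], max_gu)


def is_specific(modules: list, j: int, guide: str, a: int = 6, max_gu: int = 3) -> bool:
    """True if guide joins modules j and j+1 but no other (non-cognate) ordered pair."""
    for x, up in enumerate(modules):
        for y, down in enumerate(modules):
            if x == y or (x, y) == (j, j + 1):
                continue
            if guides_junction(up, down, guide, a, max_gu):
                return False  # guide could mis-assemble modules x and y
    return guides_junction(modules[j], modules[j + 1], guide, a, max_gu)


# Toy usage with hypothetical sequences
up, down = "TTTTTTACGCAT", "GACTTGAAAAAA"
print(guides_junction(up, down, "CAAGTCATGCGT"))                   # True
print(is_specific([up, down, "GGGGGGACGCAT"], 0, "CAAGTCATGCGT"))  # False
```

In the second call, the hypothetical third module ends with the same six nucleotides as the upstream module, so the guide could also join it to the downstream module and would be discarded as a potential mis-assembler.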
Table 2. Computational detection of cox1 trans-splicing and editing guidesᵃ.

| Data setᵇ | Size of data set (nt) | Number of guides: Topology 1; L = 0; d1, d2 = 0 | Number of guides (mean)ᶜ: Topology 1; L = 0..50; d1, d2 = 0..83 | Number of guides (mean)ᶜ: Topology 1; L = 0..5; d1, d2 = 0..83 | Number of guides (mean)ᶜ: Topology 2; L = 0..50; d1, d2 = 0..83 | Number of guides (mean)ᶜ: Topology 2; L = 0..50; d1, d2 = 0..5 |
|---|---|---|---|---|---|---|
| nuc genome | 93,641,047 | 2,873 | N. d. | 367,275,143 | N. d. | 27,321,323 |
| nuc ESTs | 2,816,174 | 135 | N. d. | 3,916,582 | N. d. | 197,463 |
| mt genome | 633,395 | 34 | 15,653,575 | N. d. | 15,596,645 | N. d. |
| mt cDNAs | 68,395 | 0 | 37,520 | N. d. | 35,799 | N. d. |
a For an illustrative description of parameters, see Figure 3C and D. Minimum match length of paired regions was set to a = 6. The number of allowed G:U pairs is ≤ 3. Guides that have the potential to direct also joining of non-cognate modules have been eliminated. For each data set, the minimum number of junctions covered by guides of a given structural class is six (out of eight); otherwise the number of detected guides is set to zero. Numbers of guides count those with distinct sequence. N. d., not determined.
b nuc genome, nuclear genome sequences; mt genome, mitochondrial genome sequences; mt cDNA, mitochondrial cDNA sequences; nuc ESTs, nuclear EST sequences of D. papillatum. For details, see Materials and Methods.
c The minimum and maximum values deviate from the mean by ~5% (see Table S8).
More complex guide models were tested as well. We allowed the guide-binding sites in modules to occur at various distances from the corresponding junction (d1, d2 > 0) and the two module-binding sites in guides to be at various distances from one another (L > 0). We also considered that the two binding sites on the guiding molecule may be arranged in permuted order (Topologies 1 and 2; Fig. 3C and D). Although permuted structural RNAs are not without precedent (e.g., ref. 29), they have been widely ignored in computational searches (but see ref. 30). The combination of the above parameters yields nearly 720,000 distinct guide classes, i.e., sets of guides that share identical parameters and conformation (Table S8). For this extended guide model, candidates were detected in all data sets and in very large numbers (Table 2, columns 4–7; for statistics of search results, see Table S8). In summary, the Diplonema mitochondrial genome (as well as its nuclear genome) does contain candidates for trans-factors that could direct trans-splicing and RNA editing in mitochondria.
Discussion
End processing and trans-splicing of gene modules proceed in parallel
We comprehensively investigated the post-transcriptional processes in Diplonema mitochondria that generate full-length transcripts from multiple, separately transcribed gene pieces (modules). Analysis of the immature transcript population (mono-module and oligo-module transcripts) provided insight into the transcription, transcript-end processing and trans-splicing of gene modules, allowing reconstruction of the full post-transcriptional processing pathway in Diplonema mitochondria. Apparently, transcription of individual modules initiates and terminates in the shared constant regions of chromosomes (see Fig. 1A), as evidenced by mono-module transcripts that include long flanking regions (see Fig. 1F). These non-coding flanking regions are subsequently removed from mono-module transcripts by precise endonucleolytic cleavage, as indicated by our detection of sizeable transcripts that start or end directly at a module boundary and consist exclusively of flanking region. Still, the flanking regions retained in module precursors are tremendously variable in length (e.g., cox1-m2 with 22- to 943-nt-long 5′ extensions; Fig. 1F; Table S1), which might reflect additional exonucleolytic trimming, although reverse transcriptase stalling prematurely in RT-PCR experiments might contribute to this variability as well. Most importantly, the results show that 5′- and 3′-end processing of mono-module precursor transcripts occur independently of one another, and that 5′-end processing of terminal modules is independent of the polyadenylation step, which apparently takes place rapidly after 3′-end processing.
Mono-module transcripts are able to engage in trans-splicing as soon as one of their two termini is end-processed (Fig. 4B–D), since many oligo-module transcripts include 5′- or 3′-flanking regions. In addition, trans-splicing seems to start with any pair of cognate modules, without any directionality (such as transcript elongation from 3′ to 5′), as indicated by the simultaneous presence of oligo-modules covering a 5′-terminal, a central or a 3′-terminal portion of the mature transcript. Taken together, module-end processing and trans-splicing in Diplonema mitochondria are highly parallel rather than serial. It is further noteworthy that trans-splicing in this system is highly accurate, as mis-joined modules, whether across genes or within a gene but out of order, have not been observed.
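The two observations above (no directionality, no misjoining) amount to classifying each oligo-module transcript by the portion of the mature mRNA it covers. The sketch below is hypothetical and not part of the authors' analysis pipeline: it assumes each read has already been mapped to an ordered (5′→3′) list of (gene, module number) pairs, the category names are ours, and cox1 is taken to consist of nine modules.

```python
# Hypothetical sketch: classify an observed oligo-module transcript, given as an
# ordered (5'->3') list of (gene, module number) pairs from read mapping.
from collections import Counter
from typing import List, Tuple


def classify_oligo(read: List[Tuple[str, int]], n_modules: int) -> str:
    """Return the portion of the mature mRNA a read covers, or 'misjoined'."""
    if len(read) < 2:
        return "mono-module"
    genes = {gene for gene, _ in read}
    idx = [number for _, number in read]
    # Cognate joining: a single gene and consecutive module numbers, 5'->3'
    if len(genes) != 1 or any(b != a + 1 for a, b in zip(idx, idx[1:])):
        return "misjoined"
    if idx[0] == 1 and idx[-1] == n_modules:
        return "complete"
    if idx[0] == 1:
        return "5'-terminal"
    if idx[-1] == n_modules:
        return "3'-terminal"
    return "central"


reads = [
    [("cox1", 1), ("cox1", 2)],
    [("cox1", 4), ("cox1", 5), ("cox1", 6)],
    [("cox1", 8), ("cox1", 9)],
]
print(Counter(classify_oligo(r, n_modules=9) for r in reads))
```

Finding all three non-complete categories at once, and no "misjoined" reads, corresponds to the parallel, non-directional and accurate joining described in the text.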
Candidates identified for trans-factors that guide post-transcriptional processes
Earlier we showed that trans-splicing in Diplonema mitochondria is most likely not guided by sequence elements in cis,5 and we therefore postulated the existence of trans-acting factors [termed post-transcriptional processes guiding RNAs (ppRNAs)]. Here, we confirmed experimentally that low-abundance antisense RNAs covering a single module junction indeed exist (Fig. 6); these RNAs have the potential to direct trans-splicing as well as RNA editing. Although we could determine only a small portion of the sequence of the presumed ppRNAs (see Results), the data suggest that these molecules are rather short. The limited sequence information also precluded unambiguous mapping of the presumed guides to the nuclear or mitochondrial genome. It should be noted, however, that ppRNAs may not be directly encoded by the genome after all, but may instead be reverse-transcribed from mRNAs and then transmitted epigenetically to daughter cells, as in the case of RNA-mediated genome rearrangements in ciliates.26
Instead of mapping the presumed ppRNAs to genome sequence, we computationally predicted guide candidates for cox1 in the Diplonema genome sequences. This analysis allowed us to test numerous different structures and conformations (see Fig. 3C and D) and to exclude solutions that lead to module misjoining. As expected, the large nuclear genome has the potential to encode numerous trans-acting factors, but the most important result is that in the available mtDNA sequence (which represents an estimated 50% of the entire mitochondrial genome), we located 34 distinct ppRNA candidates for six of the eight cox1 junctions (Table 2). This finding supports the view that the mitochondrion itself encodes the guides for trans-splicing and RNA editing of mitochondrial genes in Diplonema.
RNA editing at two sites by U-appendage
Uridine (U)-based mitochondrial RNA editing is known from plants and kinetoplastids, where it involves nucleotide modifications and insertions/deletions, respectively (reviewed in ref. 22). In contrast, RNA editing in Diplonema mitochondria relies on U addition. More precisely, editing at the first site, in cox1, proceeds by appending six Us to module 4 of cox1 prior to trans-splicing to module 5 (Fig. S1). The finding that the Us are found attached only to module 4, but not to module 5, makes it highly unlikely that the non-encoded Us represent an overlooked mini-module, because modules show no preference as to which of their two neighbors they join first. Note that we never encountered intermediates with an incomplete or excessive number of Us, indicating that U attachment is rapid, highly precise and tightly coordinated with module joining. Since cox1 modules 4 and 5 apparently do not trans-splice prior to U addition, RNA editing is a crucial prerequisite for the biosynthesis of the cox1 mRNA as a whole. The second site of RNA editing in Diplonema mitochondria, reported here for the first time, is located at the 3′ end of the terminal cob module and involves appendage of three Us (Fig. 5A).
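The claim that intermediates never carry an incomplete or excessive number of Us rests on tallying, across reads, how many non-encoded Us follow the encoded 3′ end of a module. A minimal sketch of such a tally follows, assuming cDNA reads in which T stands for U; the helper names are hypothetical, and the six-nucleotide "module end" in the usage example is a placeholder, not the actual cox1 module 4 sequence.

```python
# Hypothetical sketch: count non-encoded U residues (T in cDNA reads) appended
# directly after the encoded 3' end of a module, e.g. cox1 module 4.
import re
from collections import Counter
from typing import Iterable, Optional

U_TRACT = re.compile(r"T*")


def appended_u_count(read: str, module_end: str) -> Optional[int]:
    """Length of the T run directly after module_end in read, or None if absent."""
    pos = read.find(module_end)  # first occurrence; module_end should be long enough to be unique
    if pos < 0:
        return None
    tail = read[pos + len(module_end):]
    # Caveat: if the downstream module starts with T, those residues are counted too.
    return len(U_TRACT.match(tail).group(0))


def tract_histogram(reads: Iterable[str], module_end: str) -> Counter:
    """Tally appended-U tract lengths over all reads containing module_end."""
    counts = (appended_u_count(r, module_end) for r in reads)
    return Counter(c for c in counts if c is not None)


# Toy reads; "GCAGCA" is a placeholder, not the real cox1 module 4 3' end.
reads = ["AAGCAGCATTTTTTGA", "CCGCAGCATTTTTT", "AAGCAGCA", "GGGGGG"]
print(tract_histogram(reads, "GCAGCA"))
```

Under the observations reported above, such a histogram for cox1 module 4 would show only unedited ends (0) and fully edited ends (6), with no intermediate or longer tracts.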
Non-encoded Us in mitochondrial transcripts were also observed in a close relative of dinoflagellates,23 where two mRNAs carry a U tract at their 5′ end. It remains unknown whether these extra nucleotides indeed originate from RNA editing or rather from imprecise transcription. Finally, RNA editing involving terminal homo-oligomer addition has been discovered in a single mitochondrial gene (cox3) of select dinoflagellates. Their cox3 gene is bipartite, and the transcripts of both fragments are oligoadenylated, as are all mRNAs in these organisms. Interestingly, five nucleotides of the A-tail of the upstream cox3 fragment are retained in the mRNA.21
Parallels of mitochondrial post-transcriptional processes in Diplonema and kinetoplastids
We showed above that processing of module-precursor RNAs in Diplonema mitochondria involves base-precise endonucleolytic cleavage, suggesting that not only trans-splicing and RNA editing, but also end-processing, may be guided by the postulated trans-factors. Interestingly, end-processing in Diplonema mitochondria is formally equivalent to the first step of RNA editing in kinetoplastids. There, pre-mRNA is cleaved within a sequence stretch to which a gRNA is bound, the cut site being immediately adjacent to the anchor region. The reaction is performed by endoribonucleolytic enzymes that are part of the editosome (reviewed in ref. 31).
It came as a surprise that the RNA-editing intermediate cox1 Module4-UUUUUU (m4-6xU) carries a 3′-nucleoside monophosphate (3′NMP). This may reflect a mechanistic similarity between U-“appendage” editing in Diplonema and U-insertion editing in kinetoplastids. In Trypanosoma brucei, in vitro assays with mitochondrial extracts have shown that TUTase occasionally appends more Us at the pre-mRNA cleavage site than specified by the corresponding gRNA, and it was suggested that excess nucleotides are trimmed by an exonuclease (3′→5′ exoUase; reviewed in ref. 32). Functional characterization of mitochondrial proteins in trypanosomes revealed three enzymes implicated in U-insertion/deletion RNA editing: TbMP42, TbMP99 and TbMP100.33,34 TbMP42 displays exoUase activity in vitro, leaving 3′-monophosphate ends that, as in Diplonema, cannot be ligated to the 5′ terminus of the downstream pre-mRNA cleavage fragment. In turn, TbMP99 and TbMP100 exhibit 3′-specific nucleotidyl phosphatase activity, converting the 3′NMP to a 3′-hydroxyl group, which permits pre-mRNA re-ligation.35 This dephosphorylation step was proposed to serve as quality control in trypanosome RNA editing: the 3′-monophosphate would be removed from the terminal U, and the pre-mRNA re-sealed, only when the number of inserted nucleotides permits full pairing with the gRNA.35
Kinetoplastid editosomes include virtually all catalytic activities required for post-transcriptional processes in Diplonema mitochondria, notably an endoribonuclease for module-end processing, a TUTase for U addition, an exoUase leaving a 3′NMP, a 3′-nucleotidyl phosphatase that “repairs” 3′ termini generated by this exonuclease and, finally, an RNA ligase for module joining. This raises the question of whether in Diplonema mitochondria these activities are exerted by homologs of the kinetoplastid enzymes and whether the enzymes are also organized in a multi-functional protein complex. If so, this might allow us to trace the basic eukaryotic machinery from which the kinetoplastids’ editosome may have evolved. However, it is equally possible that the modes of transcript maturation in Diplonema and kinetoplastid mitochondria are fundamentally different. For example, module ligation in Diplonema mitochondria might not be enzymatic, but rather a yet undescribed RNA-catalyzed reaction. To address this question, it will be necessary to characterize the catalytic entities that carry out the post-transcriptional processes in Diplonema mitochondria.
Conclusion and Outlook
Cis- and trans-splicing of traditional introns (spliceosomal, Group I, Group II or archaeal/tRNA introns) intimately link removal of flanking sequences and exon joining, so that splicing intermediates are generally difficult to examine. In contrast, trans-splicing in Diplonema mitochondria takes place in clearly separated steps with readily detectable intermediates, which permitted straightforward investigation of the underlying processes as reported here.
This study detected candidates for trans-factors that direct trans-splicing and RNA editing in Diplonema mitochondria. The next logical step is to validate the predicted function of these candidates. For example, we expect engineered DNA or RNA guides to mediate trans-splicing of non-cognate module transcripts, and to direct appendage of an arbitrary number of Us to a gene module transcript that is not uridylated in vivo. Another approach, recently initiated in our laboratory, is to identify and isolate mitochondrial protein complexes of Diplonema that display trans-splicing and RNA editing activity, and to investigate which of the complex constituents act as guides. This approach can detect guides that are protein molecules as well as RNA or DNA.
A more general question concerns the biological role of such complicated post-transcriptional processes. We believe that they offer an effective handle for regulation at various levels: polyadenylation generates stop codons, thereby affecting the efficiency of mRNA translation; gene module-end processing controls the steady-state levels of the building blocks from which mRNAs are made; and, finally, RNA editing events act as checkpoints during transcript maturation.
Materials and Methods
Sequences deposited in public-domain databases
About 17 kbp of newly determined sequence was deposited in GenBank under the accession numbers JQ302962, JQ314396 and JQ302963. These entries contain the sequences of the entire chromosome A4005, which carries Module 8 of nad7; the entire chromosome A3216, which carries the 3′-terminal module of the yet unidentified gene X2; and the 3′-terminal piece of the gene specifying mitochondrial large subunit (LSU) rRNA, plus adjacent regions.
Strain, culture and extraction of mtDNA and RNA
Diplonema papillatum (ATCC 50162) was obtained from the American Type Culture Collection. The organism was cultivated axenically at ~20°C in artificial seawater enriched with 1% fetal horse serum (Wisent) and 0.1% Bactotryptone. Mitochondrial DNA was extracted from an organelle-enriched fraction isolated by differential and sucrose gradient centrifugation.9 RNA was extracted either from the mitochondria-enriched fraction or from total cell lysate by a homemade Trizol substitute (see ref. 10). Residual DNA was removed from RNA preparations either by RNeasy (Qiagen) column purification or by digestion with RNase-free DNase I (Roche) followed by phenol-chloroform extraction.
RNA capping
DNase-treated RNA was labeled with α-(32P)-GTP in the presence of capping enzyme (ScriptCap, Epicentre Biotechnologies), followed by phenol-chloroform extraction and electrophoretic separation on denaturing polyacrylamide gels of various concentrations (5%, 12% and 16%, and 4–10% and 10–20% gradients).
Primer extension
Run-off reverse transcription experiments aimed to determine the length of potential antisense guiding RNAs. We followed the protocol devised by Promega (www.promega.com/resources/protocols/technical-bulletins/0/primer-extension-system-amv-reverse-transcriptase-protocol/). Briefly, a primer labeled at its 5′ end with γ-(32P)-ATP was annealed with 4, 50 or 200 μg of DNase-treated poly(A), mitochondria-enriched or total RNA, respectively, and then incubated with AMV reverse transcriptase (Roche) in the presence of 1 mM dNTPs and 40 μM pyrophosphate. As positive controls, we used the primers dp145, dp153 and 138, which anneal with cox1 modules 1 and 5 and the terminal module of rnl, respectively. These controls gave rise to predominant products of 220, 180 and 340 nt arising from reverse transcription of the corresponding processed modules, together with minor bands representing precursors and trans-splicing intermediates (for primer sequences, see Table S9). Negative controls omitted either RT or RNA. Oligonucleotide dp207 was used for primer extension of the hypothetical antisense RNA complementary to the M4/M5 junction of cox1. The samples were separated on an 8% polyacrylamide gel (19:1) containing 7 M urea, resolving a size range of 20–1,000 nt. The φX174 (Hinf) marker (Promega), the low-range RNA ladder and the 1-kbp plus DNA ladder (Fermentas), which we also end-labeled with γ-(32P)-ATP, served as size markers. After migration, the gel was exposed to X-ray film to visualize the bands.
RNA circularization
DNase-treated RNA was incubated with tobacco acid pyrophosphatase (TAP, Epicentre) and T4 polynucleotide kinase (PNK, New England Biolabs). We used both the unmodified PNK enzyme M0201, which possesses 3′-phosphatase activity, and the engineered form M0236, which lacks this activity. RNA was diluted to 20 ng/μL and circularized using T4 RNA ligase (Roche).
RT-PCR
The first strand (cDNA) was generated with PowerScript reverse transcriptase of the Creator SMART cDNA library construction kit (Clontech) or avian myeloblastosis virus (AMV) reverse transcriptase (Roche). PCR was performed with a Takara PCR kit (Takara Bio Inc.), typically for 35 cycles. Generally, two gene-specific primers were used (Fig. 3A and B), but for certain RT-PCR experiments, PCR amplification was conducted with only one gene-specific primer (for first-strand synthesis) plus the SMART IV primer, which anneals to the non-templated extension that reverse transcriptase adds at the end of the first-strand cDNA corresponding to the RNA 5′ end.10 Primer sequences are given in Table S9. For all RT-PCR experiments aiming at the detection of RNAs that mediate trans-splicing and RNA editing, a negative control was performed without template RNA. Other negative controls used primer combinations not expected to yield an amplicon, namely RT-PCR in which one of the two primers had an insertion or mismatches at its 3′ end relative to the target sequence, and another control in which both primers would bind to the same strand. Controls without template did not yield a (visible) product, whereas controls with inappropriate primers produced very low amounts of amplicon. Sequencing showed that the latter RT-PCR products were artifactual, originating from non-specific priming of the sense strand.
Cloning and sequencing of amplicons
Amplicon termini were rendered blunt with T7 DNA polymerase and the Klenow fragment of DNA polymerase I (New England Biolabs), gel-purified on agarose, phosphorylated with T4 PNK (New England Biolabs) and ligated into the vector pBFL6cat, an in-house constructed, small pBluescript derivative. cDNA libraries were cloned into pDNR-LIB (Clontech). After transformation into E. coli DH5α, plasmid DNA was extracted using the Qiagen 96-well miniprep kit. Sequencing reactions were performed with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and analyzed on an ABI 370 Analyzer.
Clustering of reads and sequence assembly
Reads obtained in experiments aiming at the detection of trans-splicing guide RNAs contained large numbers of identical sequences and were therefore clustered prior to analysis with the tool CD-HIT.11 Default parameters were used except for the c and g parameters, which were set to 0.9 (90% identity) and 1 (a sequence is assigned to the most similar cluster), respectively. Representative reads obtained by clustering were assembled with phred/phrap12 using highly stringent parameters (-minmatch 300 -maxgap 10 -repeat_stringency 0.99 -shatter_greedy -q 95 -penalty -9 -minscore 200). Contigs were inspected using consed13 to identify potential misassemblies. The consensus sequence in Masterfile format was generated using the in-house tool cosmea. Bioinformatics tools and the Masterfile grammar are described at www.megasun.bch.umontreal.ca/ogmp/ogmpid.html. Software is available on request.
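The effect of the two CD-HIT settings (c = 0.9, g = 1) can be illustrated with a greedy incremental clustering sketch. This is an illustrative stand-in, not CD-HIT itself: CD-HIT uses short-word filtering and its own alignment-based identity, whereas here identity is approximated with Python's standard-library `difflib.SequenceMatcher` (matched residues divided by the length of the shorter sequence). Only the meaning of the c and g parameters is mirrored; function names are hypothetical.

```python
# CD-HIT-like greedy incremental clustering (illustrative stand-in, not CD-HIT).
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def identity(a: str, b: str) -> float:
    """Matched residues / length of the shorter sequence (rough proxy for CD-HIT identity)."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: List[str], c: float = 0.9, g: int = 1) -> Dict[str, List[str]]:
    """Cluster reads longest-first; keys are cluster representatives.

    g = 0: join the first cluster whose representative reaches identity >= c.
    g = 1: join the most similar such cluster (as with CD-HIT's g = 1).
    """
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        best_rep: Optional[str] = None
        best_id = -1.0
        for rep in clusters:
            ident = identity(seq, rep)
            if ident >= c and ident > best_id:
                best_rep, best_id = rep, ident
                if g == 0:
                    break  # first qualifying cluster wins
        if best_rep is None:
            clusters[seq] = [seq]  # seq becomes a new representative
        else:
            clusters[best_rep].append(seq)
    return clusters


reads = ["ACGTACGTAC" * 3, "ACGTACGTAC" * 3, "TTTTGGGGCCCCAAAA" * 2]
for rep, members in greedy_cluster(reads, c=0.9, g=1).items():
    print(len(members), rep)
```

With g = 1 every existing representative is compared before a read is assigned, which costs more comparisons than g = 0 but places near-duplicate reads in their closest cluster.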
Analysis of amplicon sequences
For each contig in a Masterfile, primers were located by BLAST searches, and amplicon “inserts” by either BLAST or FASTA searches14,15 against available mitochondrial and nuclear genome and EST sequences of D. papillatum. Using the in-house MotSearch program, we searched with regular expressions corresponding to hypothetical guiding RNAs whose sequence is complementary to module junctions, allowing G:T pairs in addition to canonical base pairs.
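MotSearch is an in-house program whose pattern syntax is not described here; the sketch below merely shows one way to build such a regular expression with Python's standard `re` module. It assumes the DNA alphabet (T for U) and antiparallel pairing: opposite each target base the guide may carry the Watson-Crick partner or, for target G and T, the G:T wobble partner. Function names and sequences are hypothetical.

```python
# Hypothetical sketch (not MotSearch): build a regular expression matching any
# guide that pairs antiparallel with a module-junction sequence, accepting
# canonical pairs and G:T wobble pairs.
import re

# Bases a guide may carry opposite each target base (T stands for U).
PARTNERS = {"A": "T", "C": "G", "G": "[CT]", "T": "[AG]"}


def guide_regex(junction: str) -> str:
    """Regex for guides complementary to `junction`, written 5'->3' on the guide."""
    # The guide runs antiparallel, so the target is read from its 3' end.
    return "".join(PARTNERS[base] for base in reversed(junction.upper()))


def find_guides(junction: str, genome: str) -> list:
    """Start positions of candidate guides on the given strand of `genome`."""
    # Zero-width lookahead so that overlapping candidates are all reported.
    pattern = re.compile("(?=" + guide_regex(junction) + ")")
    return [m.start() for m in pattern.finditer(genome.upper())]


junction = "ACGCATGACTTG"  # toy junction: 6 nt upstream + 6 nt downstream module
print(guide_regex(junction))                          # [CT][AG][AG]GT[CT][AG]TG[CT]GT
print(find_guides(junction, "GGGGCAAGTCATGCGTGGGG"))  # [4]
```

The expression does not limit the number of G:T pairs; a cap such as the three G:U pairs used in the in silico search would have to be applied to the matches afterwards.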
Computational detection of guiding RNAs or DNAs
Guiding RNAs and DNAs were searched in all available mitochondrial genomic DNA, mitochondrial cDNA, nuclear EST and nuclear genome sequences. The mitochondrial genomic sequences (~350 kbp in total) represent approximately 50% of the genome and consist of six completely sequenced chromosomes and two collections of incomplete chromosome contigs. The mitochondrial chromosomes carry the modules cox1-m9, cox1-m9 (second copy), cox1-m4, geneX-m(k), nad7-m6 and nad7-m8. Chromosome names (sizes and NCBI accession numbers in parentheses) are: dp3207L.all (5,856 nt; EU123536); dp3208L.all (5,661 nt; HQ288823); dp3209bT.all (7,182 nt; EU123537); dp3216-X2-L.all (5,763 nt); dp4001.all (nad7-m6; 5,794 nt; HQ288824); dp4005.all (5,763 nt; JQ302962). The two overlapping collections of incomplete chromosome contigs are dpapimt.all (219,754 nt, sequenced in-house) and prag-mt-readings-phred-assembled_2010apr19.all (377,456 nt, sequenced by the Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague). The mitochondrial cDNAs (dpapemt.all_stringent; 68,404 nt) include the GenBank entries HQ288819–22, EU123538 and JQ302963. Nuclear EST sequences (4,952 sequences; generated by us previously) were downloaded from our publicly accessible TBestDB (www.tbestdb.bcm.umontreal.ca). Nuclear genome data6 included 79,784 contigs of 93,641,047 nt total length and probably represent 50–75% of the entire genome.
In these data sets, we searched for guiding trans-factors with the following parameter combinations: match length of the guide in module i and module i+1, a = 6; distances of the match from the module boundary, d1, d2 = 0, 1, …, 83; bridge length L = 0, 1, …, 50; topologies 1 and 2. For an illustrative representation of parameters, see Figure 3C and D. The combination of all parameters yields 719,712 guide groups. The underlying assumption is that all guide molecules belong to one and the same group. We searched all these groups, except when the first 10 groups tested yielded an excessively large number of candidates (> 100,000), in which case a similarly large number of candidates could be expected for all other groups. The detected guide candidates were filtered to eliminate those that would guide module mis-assembly (e.g., joining of module 1 with module 3). The groups were screened for their capacity to guide the joining of at least 6 of the 8 cox1 junctions; otherwise, the group was discarded. We calculated the number of guides located at different positions in the query sequence and, in addition, the number of guides with distinct sequence. Note that even guide groups with L = 0 can have several members of different sequence, since G:U pairs are permitted at any position in the duplex regions. The algorithm is described in the Supplementary Information.
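The figure of 719,712 guide groups follows directly from the parameter ranges: 2 topologies × 84 values of d1 × 84 values of d2 × 51 values of L (a is fixed at 6 and does not add to the count). The sketch below reproduces this arithmetic and the six-of-eight junction-coverage filter; the function and variable names are hypothetical, and only the parameter ranges and thresholds come from the text.

```python
# Sketch of the parameter grid and the junction-coverage filter described above.
# Names are hypothetical; only the parameter ranges come from the text.
from itertools import product

A = 6                      # anchor length a (fixed; not part of the count)
D_VALUES = range(0, 84)    # d1, d2 = 0, 1, ..., 83
L_VALUES = range(0, 51)    # bridge length L = 0, 1, ..., 50
TOPOLOGIES = (1, 2)

N_GROUPS = len(TOPOLOGIES) * len(D_VALUES) ** 2 * len(L_VALUES)  # 2 * 84^2 * 51


def guide_groups():
    """Yield every (topology, d1, d2, L) guide group lazily."""
    yield from product(TOPOLOGIES, D_VALUES, D_VALUES, L_VALUES)


COX1_JUNCTIONS = [f"M{i}/M{i + 1}" for i in range(1, 9)]  # 9 modules -> 8 junctions


def passes_coverage(guides_per_junction: dict, min_covered: int = 6) -> bool:
    """Keep a group only if guides were found for >= min_covered cox1 junctions."""
    covered = sum(1 for j in COX1_JUNCTIONS if guides_per_junction.get(j, 0) > 0)
    return covered >= min_covered


print(N_GROUPS)  # 719712
# Guides for all junctions except M1/M2 and M3/M4 (6 of 8) pass the filter.
hits = {j: 1 for j in COX1_JUNCTIONS if j not in ("M1/M2", "M3/M4")}
print(passes_coverage(hits))  # True
```

Generating the groups lazily avoids holding all 719,712 parameter tuples in memory when each group is searched in turn.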
The search in the nuclear data set was executed on a Sun SPARC Enterprise M9000 server with 64 quad-core 2.52 GHz SPARC64 VII processors capable of chip multi-threading with two hardware threads, and 2 TB of memory. Execution times for searching topology 1 (384 guide groups) and topology 2 (306 guide groups) were 54.20 h and 7.90 h, respectively. Smaller data sets were analyzed on an iMac with a 2.66 GHz quad-core Intel Core i5.
Supplementary Material
Acknowledgments
We thank S. Teijeiro and M. Aoulad-Aissa (Université de Montréal) for excellent technical assistance. Further, we acknowledge William Marande (Museum National d’Histoire Naturelle), Sivakumar Kannan (NCBI) and Amir Malekpour (University of Teheran) for conducting preliminary experiments in the context of their PhD and post-doctoral training under the supervision of G.B.: RNA circularization and poisoned primer experiments (W.M.) and in silico searches of RNA and DNA trans-factors (S.K. and A.M.). We also thank B. Franz Lang (Université de Montréal) and Julius Lukes (University of South Bohemia) for advice and discussions, B. Franz Lang and Matus Valach for critical comments on the manuscript, and Matus Valach for help in primer extension experiments. We acknowledge the Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, in particular Cestmir Vlcek, Jan Paces and Jakub Ridl, for 454 DNA sequencing.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Funding
This work was supported by operating grants from the Canadian Institutes of Health Research (CIHR, grant MOP-79309; G.B.) and the Natural Sciences and Engineering Research Council (NSERC, grant 250909-2006; M.T.), a graduate student award from the Faculty of Graduate and Postdoctoral Studies (FESP; Université de Montréal; Y.Y.) and a Ph.D. scholarship from the Programme Canadien de Bourses de la Francophonie (PCBF scholarship; G.N.K.).
Footnotes
Previously published online: www.landesbioscience.com/journals/rnabiology/article/23340
References
- 1. Simpson AG, Roger AJ. Protein phylogenies robustly resolve the deep-level relationships within Euglenozoa. Mol Phylogenet Evol. 2004;30:201–12. doi: 10.1016/S1055-7903(03)00177-5.
- 2. Maslov DA, Yasuhira S, Simpson L. Phylogenetic affinities of Diplonema within the Euglenozoa as inferred from the SSU rRNA gene and partial COI protein sequences. Protist. 1999;150:33–42. doi: 10.1016/S1434-4610(99)70007-6.
- 3. Marande W, Lukeš J, Burger G. Unique mitochondrial genome structure in diplonemids, the sister group of kinetoplastids. Eukaryot Cell. 2005;4:1137–46. doi: 10.1128/EC.4.6.1137-1146.2005.
- 4. Marande W, Burger G. Mitochondrial DNA as a genomic jigsaw puzzle. Science. 2007;318:415. doi: 10.1126/science.1148033.
- 5. Kiethega G, Turcotte M, Burger G. Conserved cox1 trans-splicing and RNA editing lacking conserved sequence patterns. Mol Biol Evol. 2011;28:2425–58. doi: 10.1093/molbev/msr075.
- 6. Vlcek C, Marande W, Teijeiro S, Lukeš J, Burger G. Systematically fragmented genes in a multipartite mitochondrial genome. Nucleic Acids Res. 2011;39:979–88. doi: 10.1093/nar/gkq883.
- 7. Bonen L. Trans-splicing of pre-mRNA in plants, animals, and protists. FASEB J. 1993;7:40–6. doi: 10.1096/fasebj.7.1.8422973.
- 8. Moreira S, Breton S, Burger G. (2012) Unscrambling of genetic information at the RNA level. Wiley Interdisc Rev RNA
- 9. Lang BF, Burger G. Purification of mitochondrial and plastid DNA. Nat Protoc. 2007;2:652–60. doi: 10.1038/nprot.2007.58.
- 10. Rodriguez-Ezpeleta N, Teijeiro S, Forget L, Burger G, Lang BF. (2009) In Parkinson, J. (ed.), Methods in Molecular Biology: Expressed Sequence Tags (ESTs) Humana Press, Totowa, NJ, Vol. 533, pp. 33-47.
- 11. Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26:680–2. doi: 10.1093/bioinformatics/btq003.
- 12. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186–94.
- 13. Gordon D. (2003) Viewing and editing assembled sequences using Consed. Curr Protoc Bioinformatics, Chapter 11, Unit 11.12.
- 14. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. doi: 10.1016/S0022-2836(05)80360-2.
- 15. Pearson WR. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol. 2000;132:185–219. doi: 10.1385/1-59259-192-2:185.
- 16. Falkenberg M, Larsson NG, Gustafsson CM. DNA replication and transcription in mammalian mitochondria. Annu Rev Biochem. 2007;76:679–99. doi: 10.1146/annurev.biochem.76.060305.152028.
- 17. Kennell JC, Lambowitz AM. Development of an in vitro transcription system for Neurospora crassa mitochondrial DNA and identification of transcription initiation sites. Mol Cell Biol. 1989;9:3603–13. doi: 10.1128/mcb.9.9.3603.
- 18. Sturm NR, Maslov DA, Grisard EC, Campbell DA. Diplonema spp. possess spliced leader RNA genes similar to the Kinetoplastida. J Eukaryot Microbiol. 2001;48:325–31. doi: 10.1111/j.1550-7408.2001.tb00321.x.
- 19. Lee DY, Clayton DA. Initiation of mitochondrial DNA replication by transcription and R-loop processing. J Biol Chem. 1998;273:30614–21. doi: 10.1074/jbc.273.46.30614.
- 20. Bonawitz ND, Clayton DA, Shadel GS. Initiation and beyond: multiple functions of the human mitochondrial transcription machinery. Mol Cell. 2006;24:813–25. doi: 10.1016/j.molcel.2006.11.024.
- 21. Jackson CJ, Norman JE, Schnare MN, Gray MW, Keeling PJ, Waller RF. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria. BMC Biol. 2007;5:41. doi: 10.1186/1741-7007-5-41.
- 22. Gray MW. Diversity and evolution of mitochondrial RNA editing systems. IUBMB Life. 2003;55:227–33. doi: 10.1080/1521654031000119425.
- 23. Slamovits CH, Saldarriaga JF, Larocque A, Keeling PJ. The highly reduced and fragmented mitochondrial genome of the early-branching dinoflagellate Oxyrrhis marina shares characteristics with both apicomplexan and dinoflagellate mitochondrial genomes. J Mol Biol. 2007;372:356–68. doi: 10.1016/j.jmb.2007.06.085.
- 24. Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, et al. Sequence and organization of the human mitochondrial genome. Nature. 1981;290:457–65. doi: 10.1038/290457a0.
- 25. Blum B, Simpson L. Guide RNAs in kinetoplastid mitochondria have a nonencoded 3′ oligo(U) tail involved in recognition of the preedited region. Cell. 1990;62:391–7. doi: 10.1016/0092-8674(90)90375-O.
- 26. Nowacki M, Vijayan V, Zhou Y, Schotanus K, Doak TG, Landweber LF. RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature. 2008;451:153–8. doi: 10.1038/nature06452.
- 27. Koulintchenko M, Konstantinov Y, Dietrich A. Plant mitochondria actively import DNA via the permeability transition pore complex. EMBO J. 2003;22:1245–54. doi: 10.1093/emboj/cdg128.
- 28. Salinas T, Duchêne AM, Maréchal-Drouard L. Recent advances in tRNA mitochondrial import. Trends Biochem Sci. 2008;33:320–9. doi: 10.1016/j.tibs.2008.04.010.
- 29. Keiler KC, Shapiro L, Williams KP. tmRNAs that encode proteolysis-inducing tags are found in all known bacterial genomes: A two-piece tmRNA functions in Caulobacter. Proc Natl Acad Sci USA. 2000;97:7778–83. doi: 10.1073/pnas.97.14.7778.
- 30. Soma A, Onodera A, Sugahara J, Kanai A, Yachie N, Tomita M, et al. Permuted tRNA genes expressed via a circular RNA intermediate in Cyanidioschyzon merolae. Science. 2007;318:450–3. doi: 10.1126/science.1145718.
- 31. Stuart KD, Schnaufer A, Ernst NL, Panigrahi AK. Complex management: RNA editing in trypanosomes. Trends Biochem Sci. 2005;30:97–105. doi: 10.1016/j.tibs.2004.12.006.
- 32. Byrne EM, Connell GJ, Simpson L. Guide RNA-directed uridine insertion RNA editing in vitro. EMBO J. 1996;15:6758–65.
- 33. Brecht M, Niemann M, Schlüter E, Müller UF, Stuart K, Göringer HU. TbMP42, a protein component of the RNA editing complex in African trypanosomes, has endo-exoribonuclease activity. Mol Cell. 2005;17:621–30. doi: 10.1016/j.molcel.2005.01.018.
- 34.Kang X, Rogers K, Gao G, Falick AM, Zhou S, Simpson L. Reconstitution of uridine-deletion precleaved RNA editing with two recombinant enzymes. Proc Natl Acad Sci USA. 2005;102:1017–22. doi: 10.1073/pnas.0409275102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Niemann M, Kaibel H, Schlüter E, Weitzel K, Brecht M, Göringer HU. Kinetoplastid RNA editing involves a 3′ nucleotidyl phosphatase activity. Nucleic Acids Res. 2009;37:1897–906. doi: 10.1093/nar/gkp049. [DOI] [PMC free article] [PubMed] [Google Scholar]