Abstract
The pentatricopeptide repeat modules of PPR proteins are key to their sequence-specific binding to RNAs. Gene families encoding PPR proteins are greatly expanded in land plants, where hundreds of them participate in RNA maturation, mainly in mitochondria and chloroplasts. Many plant PPR proteins contain additional carboxyterminal domains and have been identified as essential factors for specific events of C-to-U RNA editing, which is abundant in the two endosymbiotic plant organelles. Among those carboxyterminal domain additions to plant PPR proteins, the so-called DYW domain is particularly interesting given its similarity to cytidine deaminases. The frequency of organelle C-to-U RNA editing and the diversity of DYW-type PPR proteins correlate well in plants, and both were recently identified outside of land plants, in the protist Naegleria gruberi. Here we present a systematic survey of PPR protein genes and report the identification of additional DYW-type PPR proteins in the protists Acanthamoeba castellanii, Malawimonas jakobiformis, and Physarum polycephalum. Moreover, DYW domains were also found in basal branches of multicellular lineages outside of land plants, including the alga Nitella hyalina and the rotifers Adineta ricciae and Philodina roseola. Intriguingly, the well-characterized and curious patterns of mitochondrial RNA editing in the slime mold Physarum also include examples of C-to-U changes. Finally, we identify candidate sites for mitochondrial RNA editing in Malawimonas, further supporting a link between DYW-type PPR proteins and C-to-U editing, which may have remained hitherto unnoticed in additional eukaryote lineages.
Keywords: chloroplasts, cytidine deaminase, mitochondria, pentatricopeptide repeats, protist genomes, pyrimidine conversion editing, RNA-binding proteins
Introduction
Since the original recognition of the pentatricopeptide repeat motif,1 numerous publications have reported on the function of PPR proteins in RNA maturation (such as processing, splicing, and RNA editing) or stabilization.2,3 Most of these reports concern PPR proteins in plants, where the gene family is highly diversified and includes more than 400 members in flowering plants.4 The characterization of ever more PPR protein interactions with their RNA targets has recently allowed a PPR-RNA sequence binding code to be deduced.5,6 Within each pentatricopeptide repeat, the amino acid identities at two crucial positions are most critical for recognizing one cognate RNA nucleotide, i.e., two amino acids per repeat specify one nucleotide (Fig. 1A).
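The approximately binary code summarized in Fig. 1A can be expressed as a small lookup: position 6 (or 4) selects purine versus pyrimidine, position 1’ (or 34) selects amino versus keto, and the intersection of the two classes yields one nucleotide. The Python sketch below covers only the residue classes named in the Figure 1 legend. Returning "Y" for the ambiguous N/N combination, returning "N" for any other residue pair, and reading the array in parallel orientation (first repeat to the 5′-most nucleotide) are assumptions of this illustration, and the function names are hypothetical.

```python
# Sketch of the approximately binary PPR-RNA recognition code of Fig. 1A.
# Only the residue classes named in the Figure 1 legend are covered.

PURINE = frozenset("AG")      # R
PYRIMIDINE = frozenset("CU")  # Y
AMINO = frozenset("AC")       # M: nucleotides with amino groups
KETO = frozenset("GU")        # K: nucleotides with keto functions

# Position 6 (or 4): T -> purine, N -> pyrimidine
POSITION_6 = {"T": PURINE, "N": PYRIMIDINE}
# Position 1' (or 34): N or S -> amino, D -> keto
POSITION_1P = {"N": AMINO, "S": AMINO, "D": KETO}

# IUPAC symbols for every nucleotide set this sketch can return
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("U"): "U",
    PYRIMIDINE: "Y", frozenset("ACGU"): "N",
}


def predict_nucleotide(aa6: str, aa1p: str) -> str:
    """IUPAC symbol predicted for one PPR from its two key residues."""
    if aa6 == "N" and aa1p == "N":
        # Described as ambiguous in Fig. 1A; "any pyrimidine" is an assumption.
        return "Y"
    class6 = POSITION_6.get(aa6)
    class1p = POSITION_1P.get(aa1p)
    if class6 is None or class1p is None:
        return "N"  # residue pair not covered by the simplified code
    return IUPAC[class6 & class1p]


def predict_target(repeats: list[tuple[str, str]]) -> str:
    """One nucleotide per repeat; assumes parallel (N-term to 5') binding."""
    return "".join(predict_nucleotide(a6, a1p) for a6, a1p in repeats)
```

For example, a hypothetical five-repeat array with the residue pairs T/D, N/S, T/N, N/D, and N/N would be read as GCAUY. Actual predictions also depend on additional PPR positions and on the L/S motif context,6 which this sketch ignores.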
Figure 1. (A) The RNA-binding properties of pentatricopeptide repeat (PPR) arrays are determined to a large extent by two crucial amino acid positions located in the first α-helix of each PPR and in the region connecting to the following repeat. This “RNA recognition code” is approximately binary: position 6 (or alternatively 4, depending on the PPR domain model) distinguishes purines (R = A/G, most often addressed by T = threonine) from pyrimidines (Y = C/U, most often addressed by N = asparagine), and position 1’ (or alternatively 34) distinguishes nucleotides with amino groups (M = A/C, most often addressed by N = asparagine or S = serine) from those with keto functions (K = G/U, addressed by D = aspartate). Nucleotide recognition is ambiguous when positions 6/4 and 1’/34 are both occupied by asparagines and may be fine-tuned by additional amino acid positions in the PPRs.6 Many plant PPR proteins feature “long” (L) and “short” (S) variants of the classic “P” PPR motif, with L motifs showing much lower sequence conservation. All characterized plant RNA editing factors are of the PLS-type and carry at least an extra “E” domain downstream of the unique L2 and S2 PPRs terminating the PPR arrays and frequently (e.g., all PLS proteins in Naegleria gruberi and Physcomitrella patens) also a DYW domain, which has similarity to deaminases. (B) The DYW domain features highly conserved motifs that are likely involved in Zn2+ ion coordination and deaminase functionality, including a universally conserved histidine and glutamate at alignment positions 28 and 30 and universally conserved cysteines at positions 56 and 59.
The known 10 Physcomitrella and 11 Naegleria DYW proteins (one example of each on top) were used to derive a consensus sequence (bold), which served to identify novel occurrences of DYW domains in sequence data outside of land plants, revealed here in the protists Acanthamoeba castellanii (ELR21770) and Malawimonas jakobiformis (EST EC721053, extended), in the myxomycete Physarum polycephalum (EST EL565806 shown as an example), in the charophyte alga Nitella hyalina (EST HO499284, with two frameshifts assumed), and in the rotifers Adineta ricciae (TSA HE690097 shown as an example) and Philodina roseola (ACD54792, additional canonical intron assumed in the 3′ sequence region of EU643490). (C) WebLogo profiles for the DYW domains discovered in Physarum polycephalum (15 complete and one partial) and Adineta ricciae (seven complete and 19 partial).
Many plant PPR proteins have been identified as site-specific RNA editing factors in chloroplasts or mitochondria, targeting either individual C-to-U editing sites or small families of sites with related upstream recognition sequences.7-10 The plant PPR proteins serving as RNA editing factors are of the so-called PLS-type, featuring tandem arrays of classic “P-type” PPRs along with “L” (long, 35/36 amino acids) and “S” (short, 31 amino acids) PPR variants (Fig. 1A). The plant PLS-type PPR proteins identified as RNA editing factors also feature additional carboxyterminal protein domains directly following the PLS repeat arrays, either an extended “E” domain alone or an E domain followed by the “DYW” domain. The DYW domain, named for the tripeptide consensus found at the very end of most proteins of this kind (Fig. 1), is an attractive candidate for involvement in the biochemical conversion of C to U. A weak similarity to cytidine deaminases was recognized early,11 and the DYW domain has subsequently been shown to feature a characteristic cytidine deaminase fold.12 Moreover, the occurrence of the DYW domain and organelle RNA editing correlate well among plants.13 This intriguing correlation prompted us to investigate mitochondrial RNA editing in the protist Naegleria gruberi after the initial discovery of PLS-type PPR proteins with a carboxyterminal DYW domain in this taxon.14 Indeed, we subsequently identified two sites of C-to-U RNA editing in the mitochondrial transcriptome of Naegleria.15 As intriguing as the correlation between the DYW domain and C-to-U RNA editing in organelles appears, however, a biochemical demonstration that DYW proteins have deaminase activity on RNA is still lacking.
Here we report an extensive survey of PPR protein genes across the eukaryotic clades for which sequence data are available, which led to the discovery of loci encoding the DYW domain in unexpectedly diverse eukaryotes.
Results
DYW-type PPR protein genes exist in at least four eukaryotic domains
DYW domain proteins had previously been identified in only two eukaryotic super-clades: Archaeplastida (or “Plantae” sensu lato, i.e., all taxa that trace back to the primary endosymbiosis giving rise to chloroplasts) and Excavata (including the heterolobosean Naegleria gruberi). To search for DYW domains in other eukaryotic lineages, we used a 94 amino acid DYW domain consensus sequence derived from the known Physcomitrella patens and Naegleria gruberi sequences (Fig. 1B) as a TBLASTN query against the diverse nucleotide databases at the National Center for Biotechnology Information (NCBI). This strategy identified novel DYW proteins in several additional taxa outside of land plants: the protists Acanthamoeba castellanii and Malawimonas jakobiformis, the slime mold Physarum polycephalum, the bdelloid rotifers Adineta ricciae and Philodina roseola, and the charophyte alga Nitella hyalina (Fig. 2). These discoveries demonstrate that DYW proteins exist in at least two additional eukaryotic super-clades, the Amoebozoa (Acanthamoeba and Physarum) and the Opisthokonta (rotifers), and they expand the known distribution in plants to include a green alga. Thus, DYW domains are present in both super-domains of eukaryotes, the Unikonta (including Amoebozoa and Opisthokonta) and the Bikonta (including Archaeplastida and Excavata).
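How the consensus query was computed is not detailed here; a simple majority-rule consensus is one plausible way to condense an alignment such as that of the 10 Physcomitrella and 11 Naegleria DYW domains into a single sequence. The Python sketch below assumes pre-aligned sequences of equal length and a 50% majority threshold (both assumptions, as is the function name); columns without a clear majority are written as X (any residue), and gap-dominated columns are dropped.

```python
from collections import Counter


def majority_consensus(alignment: list[str], threshold: float = 0.5,
                       gap: str = "-") -> str:
    """Majority-rule consensus of equal-length aligned protein sequences.

    A column contributes its most frequent symbol if that symbol's share
    exceeds `threshold`; gap-dominated columns are dropped; others become X.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            if symbol != gap:
                consensus.append(symbol)
            # a gap-dominated column is simply left out of the consensus
        else:
            consensus.append("X")  # no clear majority at this position
    return "".join(consensus)
```

For instance, the toy alignment DYW, DYW, EYW (hypothetical fragments, not real DYW domain data) yields DYW. A profile-based query would retain more of the alignment's information, at the cost of no longer being a single TBLASTN query sequence.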
Figure 2. Phylogenetic overview of a survey of loci encoding classic P-type PPR proteins and those with carboxyterminal DYW domain additions (bold, underlined) in diverse taxa as discussed in the text. Taxon sampling was focused on protist diversity, with taxa given when genome data or extensive transcriptome data are available or if DYW domains were discovered in the course of this study (double arrows). Taxonomic terms implemented at the NCBI are given in bold; the 20 top-level eukaryote domains as currently recognized in the NCBI taxonomy are additionally underlined. Other taxonomic terms currently under debate and not (yet) implemented in the NCBI taxonomy are given in quotation marks. Where genome or transcriptome data of diverse species are available, this is indicated with “spp.” The tilde “~” indicates PPR protein number estimates from completed genome data, the “>” sign the minimal number of such loci estimated from transcriptome data, and number ranges indicate variability where data are available from more than one species of a given genus. The respective nucleotide database subset is indicated for genome or transcriptome data where PPR loci estimates are based on TBLASTN searches alone, since no gene/protein models have as yet been deposited in the protein sequence database. Question marks indicate the four top-level clades (Centrohelida, Glaucophyta, Jakobida, Oxymonadida) for which such sequence data are as yet lacking. PPR protein gene numbers appear to be generally low in fungi and animals, with the exception of the rotifer sequences reported here. In contrast, they are abundant in plants, for which numbers are given only for the moss Physcomitrella patens and the flowering plant Arabidopsis thaliana, as examples.
Acanthamoeba castellanii
The Acanthamoeba castellanii genome sequence was published very recently.16 We had previously identified a single DYW domain hit located at the carboxy-terminus of a protein with 10 PPRs in A. castellanii WGS entry KB007900. The Acanthamoeba DYW domain (Fig. 1B) shows perfect conservation of all highly conserved positions, including the universally conserved cysteine and histidine residues presumably responsible for zinc coordination in the deaminase fold, and even carries the name-giving canonical DYW motif at the carboxy-end, which is well conserved among the plant homologs.
Scanning for additional, i.e., “non-DYW,” PPR proteins in Acanthamoeba, we identified 28 homologs with PPR arrays but without a carboxyterminal DYW domain (Fig. 2). Our own database searches for PPR protein homologies agree very well with the respective annotation of the now available Acanthamoeba protein models based on the recognition of PPR motifs in the conserved domain database (CDD) at the NCBI.17 Although a significant amount of horizontal gene transfer (HGT) has been assumed to shape the Acanthamoeba genome,16 none of the PPR loci shows particularly high similarity to individual homologs hitherto identified in other taxa that could make them convincing candidates for recent HGT events.
Physarum polycephalum
A number of DYW domain proteins have been discovered in a separate branch of the Amoebozoa, in the myxomycete Physarum polycephalum (Fig. 1B). This finding is particularly noteworthy given the complex patterns of mitochondrial RNA editing known in Physarum.18 We initially identified sequences encoding a DYW domain in three ESTs obtained in an early transcriptome study.19 This observation prompted us to scan preliminary genome assemblies of the ongoing Physarum polycephalum sequencing project (http://genome.wustl.edu/genomes/detail/physarum-polycephalum), which revealed additional loci encoding DYW domains. The existing contigs in the Physarum genome database are relatively small, largely owing to the repetitive nature of the genome and to recombination. Moreover, introns are abundant, vary greatly in size, and often have non-canonical splice sites, which makes gene prediction difficult; accordingly, assembly of the Physarum genome has proven very challenging. The Physarum polycephalum genomic contigs of assembly version 7.3.1 were deposited in GenBank (accessions KE340389 ff.) while this paper was under revision.
We currently estimate the number of PPR proteins with DYW domains in Physarum at ca. 16–20 and predict that the total number of PPR proteins, i.e., including those lacking a DYW domain, will likely exceed 100 in this myxomycete (Fig. 2). Unlike their flowering plant homologs,3 PPR protein-coding sequences in Physarum are disrupted by numerous introns. Using targeted RT-PCR, we have confirmed transcription and functional splicing for selected DYW-type PPR loci. These proteins are prime candidates for factors involved in the four (or possibly five) specific C-to-U conversions within the cox1 mRNA20-22 and are thus of particular interest for further study.
Adineta ricciae and Philodina roseola
Our finding of DYW-type PPR proteins in the metazoan branch of the Opisthokonta clade within the Unikonta is also remarkable. A total of 26 non-overlapping transcriptome shotgun assembly (TSA) entries encoding unequivocal DYW domain signature motifs (seven complete or near-complete and 19 partial) were identified for the bdelloid rotifer Adineta ricciae. In the longest of these entries (HE690097, 1,062 bp), at least four PPRs are present in the partial CDS upstream of the well-conserved DYW domain (Fig. 1B). Likewise, we found a DYW domain encoded in the genomic DNA entry (EU643490) of another rotifer, Philodina roseola. The corresponding protein model (ACD54792) could easily be improved by assuming an additional splicing event near the carboxy-terminus, which creates a classic and highly conserved DYW domain end. Ten PPRs are clearly identified upstream of the Philodina DYW domain. Again, all crucial positions are perfectly conserved in this rotifer DYW domain (Fig. 1B). We additionally scanned the Adineta TSA data for further (non-DYW) PPR sequences and identified 25 such hits (Fig. 2).
Malawimonas jakobiformis
A DYW domain has also been detected in the deeply branching protist Malawimonas jakobiformis. Malawimonas may belong to the Excavata (Fig. 2), although its true phylogenetic affiliation is currently unsettled.23-26 Like its homologs in other taxa, the DYW domain found in Malawimonas jakobiformis EST EC721053 (Fig. 1B) retains all highly conserved sequence motifs, including universal positions that are likely indispensable for deaminase functionality. This initial EST sequence was subsequently extended by two overlapping ESTs, resulting in a 1,106 bp contig containing a partial CDS encoding five PPRs upstream of the DYW domain. No reasonable estimate of the total number of PPR proteins in Malawimonas can be made as yet, given the limited collection of ca. 14,000 ESTs currently available. However, we have already identified three additional ESTs encoding classic P-type PPR arrays (Fig. 2).
Nitella hyalina
PPR proteins are widespread within both major branches of the Viridiplantae (Fig. 2), but although DYW domains are abundant in land plants (Embryophyta), they had not been described in other taxa within the Streptophyta or in any of the Chlorophyta. Our discovery of a DYW domain encoded by an EST (HO499284) of the streptophyte alga Nitella hyalina was therefore somewhat surprising. Again, all relevant amino acid positions of the Nitella hyalina DYW domain are well conserved in comparison to the homologs in other taxa (Fig. 1B). The finding of a DYW domain in Nitella is particularly noteworthy given that, as a charophycean alga, it may be very closely related to the land plant lineage. As in the case of Malawimonas, the limited coverage of the transcriptome does not yet allow an estimate of the total size of the PPR protein gene family in Nitella hyalina. However, we consistently identified ca. 10–25 PPR loci lacking DYW domains in TSAs obtained for six other streptophyte algal taxa27 (Fig. 2). Very recently, a new Nitella mirabilis TSA data set was added (PRJNA158153), revealing at least 25 PPR coding sequences without DYW domains (Fig. 2). Interestingly, although their PPR protein numbers are similar to those of the chlorophyte algae, the majority of these Nitella sequences consistently show top similarity hits among their land plant homologs, supporting the assumed phylogenetic affiliation and likely reflecting vertical transmission of the PPR genes.
Classic “P-type” PPR proteins in other eukaryotes
Aside from the above cases of taxa with novel occurrences of the DYW domain, we also compiled the numbers of non-DYW-PPR proteins encoded in other taxa for which either complete genome sequence or significant transcriptome data are available. This analysis included data from extensive transcriptome studies with good coverage for taxa representing protist diversity, but where complete genome data are currently lacking, including representatives of both the AH clade such as the chlorophyte Botryococcus braunii and the cryptophyte (Katablepharidophyta) Roombia truncata, and of the SAR clade such as the oomycete Bremia lactucae, the chromerid Chromera velia, and the dinoflagellates (Dinophyceae) Symbiodinium spec. and Lingulodinium polyedrum (Fig. 2).
PPR protein gene numbers vary widely across the diverse eukaryote clades. For instance, whereas land plants generally contain hundreds of PPR proteins, as exemplified by the model plant Arabidopsis thaliana (Fig. 2), other members of the Archaeplastida typically have between seven and 25. Similar diversity is observed in other clades, ranging from zero in Entamoeba to likely more than 100 in Physarum in the Amoebozoa, from zero in Giardia to more than 40 in Naegleria in the Excavata, and from zero in Cryptosporidium to ca. 75 in Ectocarpus in the SAR clade (Fig. 2). In addition, whereas only low numbers of PPR hits could be identified in most members of the Alveolata, PPR gene families seem to be greatly expanded in the dinoflagellates (Dinophyceae), with more than 200 and 800 potential members tentatively identified in the Lingulodinium and Alexandrium tamarense transcriptome data, respectively.28
The large variability of PPR protein gene numbers may reflect expansion and neo-functionalization of the family in some lineages (as in plants) for RNA editing and other forms of RNA processing in chloroplasts and mitochondria, and/or functional reductions due to simplified or parasitic lifestyles (e.g., in Cryptosporidium, Entamoeba, Giardia). In this regard, it is striking that the PPR family is significantly expanded in both Physarum and the dinoflagellates, given the high frequency (~2–4%) and variety of types of RNA editing within their organelles.
Apart from the rotifers reported here, no similar gene family expansions have been observed in the Opisthokonta, the clade comprising animals, fungi, and their unicellular relatives. Animal (Metazoa) genomes contain only a few PPR protein genes, generally between five and 10, and possibly as few as 1–3 in basal lineages like Trichoplax (Placozoa), Capsaspora (Ichthyosporea), or Amphimedon (Porifera). The numbers of PPR genes in fungal genomes appear to be only slightly larger, although up to 15 PPR proteins may be encoded in some yeasts.29 It should be noted, however, that PPR recognition profiles may be biased by the overwhelming predominance of plant PPRs among known sequences, and that PPR motifs may in fact have diverged in other taxa.
To address whether distinct variants of the PPR motif might be present in different eukaryote clades, we collected PPR motifs for selected taxa with substantial numbers of PPR proteins, including Naegleria (Excavata), Acanthamoeba (Amoebozoa), Ectocarpus and Nannochloropsis (Stramenopila), Guillardia (Hacrobia), and the red algae Cyanidioschyzon and Galdieria (Rhodophyta). Taxon-specific PPR alignments were created and used to obtain WebLogo profiles to look for differential conservation of PPR positions (Fig. 3). Indeed, we found several examples of apparent taxon-specific relaxation or increase of conservation at certain positions, as highlighted in Figure 3. Notable cases are PPR positions 3, 9, 10, and 14, at which both stramenopile taxa, Ectocarpus and Nannochloropsis, show increased conservation of the amino acids Y, A, C, and G, respectively. In contrast, a marked reduction in conservation is seen at four positions within the P-type PPR motifs in Naegleria, including the critical position 34 (1’ in the nomenclature of Barkan et al., 2012) that has been shown to be a major contributor to RNA sequence recognition in plants.5,6,30
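Sequence logos plot, per alignment position, an information content equal to log2(20) minus the Shannon entropy of the observed amino acid frequencies. A minimal Python sketch of that quantity, which could be used to compare the same PPR position between taxon-specific alignments, follows. It omits the small-sample correction described for sequence logos, so values for small alignments will differ from the published profiles; the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20,
                       gap: str = "-") -> float:
    """Information content (bits) of one alignment column.

    R = log2(alphabet_size) - H, with H the Shannon entropy of the
    non-gap residues. No small-sample correction is applied.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position information content for equal-length aligned motifs."""
    return [column_information("".join(col)) for col in zip(*alignment)]
```

A position occupied by a single residue scores log2(20), about 4.32 bits, whereas one split evenly between two residues scores exactly 1 bit less. Comparing such values position by position between two taxon-specific profiles would flag shifts of the kind marked by arrows in Figure 3.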
Figure 3. P-type PPRs were collected from PPR proteins of individual species as indicated and aligned to create individual WebLogo profiles to check for potentially taxon-dependent conservation of PPR positions. Blue and red arrows indicate examples of increased or decreased conservation, respectively, in a given taxon at a certain homologous position. Stippled lines indicate positions 6 and 1’ (or 4 and 34, respectively), which are assumed to have key roles in recognition of RNA nucleotides.
PPR proteins in prokaryotes
Pentatricopeptide repeats are assumed to be a eukaryotic invention and absent in prokaryotes. However, we did identify rare and isolated cases of PPR proteins encoded in bacteria and a single case in an archaeon (Fig. 2). The genome of the α-proteobacterium Rhodobacter sphaeroides strain ATCC17029 encodes a large PPR protein (YP_001044140) with 18 PPRs. This unique protein is most similar to a protein sequence encoded by an uncultured archaeon (AAU82142). Another large PPR protein coding sequence (CAD17344), containing only five PPRs, is conserved among different strains of Ralstonia solanacearum, a β-proteobacterial plant pathogen. Two similar PPR proteins (CAX56557 and ADP11637) are encoded by γ-proteobacterial plant pathogens, Erwinia pyrifoliae and a related Erwinia spec. Another unrelated PPR protein (EKV55103) is encoded in a different Erwinia species, E. amylovora. Finally, we also found a short PPR protein (CCB89200) in the genome of Simkania negevensis (Chlamydiae). The punctate and rare occurrence of PPR protein genes in only these taxa among the numerous bacterial genomes now available, together with the close association of at least two of these genera with plants in nature, makes these loci obvious candidates for plant-bacterial HGT.
RNA editing candidate sites
The phylogenetically distant protists Acanthamoeba castellanii and Malawimonas jakobiformis feature compact circular mtDNAs with a rich gene complement similar to that of Naegleria gruberi, in which we discovered two sites of RNA editing after the initial finding of DYW-type PPR protein genes in its genome.15 The DYW domains now recognized in Acanthamoeba and Malawimonas prompted us to check their mitochondrial coding sequences for candidate sites of C-to-U RNA editing. To this end, we used our recently updated PREPACT 2.0 tool, which now allows scanning of entire organelle genomes for that purpose.31 Whereas only seven weak candidate sites for C-to-U RNA editing were identified in the Acanthamoeba castellanii mtDNA, we found 17 strong candidate sites for C-to-U editing that would restore highly conserved codon identities in Malawimonas jakobiformis mitochondrial transcripts. Preliminary data from first RNA-Seq experiments indeed provide evidence for RNA editing in Malawimonas, but this awaits further confirmation (VK and BF Lang, Montreal). Similar to Acanthamoeba, the Nitella mtDNA displays only weak candidate sites for C-to-U editing, which may nevertheless warrant spot checks at the transcript level.
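PREPACT's own procedure is not described here, but the underlying notion of a C-to-U candidate site can be illustrated: a genomic codon that disagrees with the amino acid expected from conserved homologs and that would encode the expected residue after a single C-to-U change. The Python sketch below assumes the standard genetic code and an expected protein already aligned codon by codon to the CDS; both are simplifications, since many mitochondria use variant genetic codes (e.g., UGA for tryptophan) and real scans rely on alignments to reference homologs. The function name is hypothetical.

```python
# Standard genetic code, codons enumerated in UCAG x UCAG x UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def candidate_c_to_u_sites(cds: str, expected_protein: str,
                           code: dict[str, str] = STANDARD_CODE) -> list[int]:
    """1-based CDS positions where a single C->U change would convert a
    codon that disagrees with the expected amino acid into one that agrees."""
    rna = cds.upper().replace("T", "U")
    sites = []
    for index, expected in enumerate(expected_protein):
        start = 3 * index
        codon = rna[start:start + 3]
        if len(codon) < 3:
            break  # expected protein is longer than the CDS
        if code.get(codon) == expected:
            continue  # codon already encodes the expected residue
        for offset, base in enumerate(codon):
            if base == "C":
                edited = codon[:offset] + "U" + codon[offset + 1:]
                if code.get(edited) == expected:
                    sites.append(start + offset + 1)
    return sites
```

With the toy input ATGCCATGG and the expected protein MLW, position 5 is reported, because the CCA (Pro) codon would read CUA (Leu) after editing. Likewise, an ACG codon where Met is expected yields a candidate at its second position, mirroring the classic creation of an AUG start codon by editing.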
Outlook and Discussion
Further refining the RNA recognition code of PPR repeats5,6 is a prime future goal for understanding PPR protein functionality. This will require taking the diversity of PPR repeats and PPR proteins in the living world into account. Any prediction of RNA sequence targets will have to be tested against the respective full organellar, or possibly also the nucleo-cytosolic, transcriptomes of the given organisms. The recognition of target sequences by their cognate RNA editing factors in flowering plants is complicated by the presence of additional factors, alternatively named MORFs32 or RIPs,33 and in this regard other organisms may be a welcome alternative for elucidating the primary interactions of PPRs with RNA sequences. In Naegleria gruberi, the 11 encoded DYW-type proteins far outnumber the two demonstrated sites of mitochondrial C-to-U editing, and the same holds for Physarum polycephalum (ca. 16–20 vs. 4–5), which raises the interesting possibility that some transcripts outside of their mitochondria may also be targets of C-to-U editing.
It is likely no coincidence that we could identify DYW-type PPR proteins in Acanthamoeba, Naegleria, and the bdelloid rotifers, all of which are taxa for which massive horizontal gene transfer is assumed.16,34,35 Like the rotifers, Physarum polycephalum has a strong association with plant material in its natural habitat, and it is tempting to speculate that among the likely large number of PPRs in the myxomycete at least those of the DYW-type originated via HGT from plants. Hence, these may turn out to be typical “you-are-what-you-eat” cases of HGT. At present, however, none of the DYW-type PPR sequences in any of the above taxa shows a particularly high similarity to any of the hundreds of plant homologs already deposited in the database, which range in diversity from mosses to flowering plants. Accordingly, if HGT were indeed the origin of the DYW-type PPR proteins outside of plants, these would be ancient cases with similarities obliterated by sequence drift or functional adaptation. An ancient acquisition is indeed somewhat supported for the two rotifers by the observation that, although significantly diverged, the Philodina and Adineta DYW domains identify each other as closest relatives.
Materials and Methods
The diverse sections of the NCBI nucleotide database, i.e., the default non-redundant nucleotide collection, RNA reference sequences (refseq_rna), whole genome shotgun sequences (WGS), high-throughput genomic sequences (HTGS), genomic survey sequences (GSS), transcriptome shotgun assemblies (TSA), expressed sequence tags (EST), and NCBI_genomes, were queried by TBLASTN with the 94 amino acid DYW domain consensus sequence shown in Figure 1B, taxonomically excluding vascular plant (tracheophyte) entries.
All hits were re-checked by BLASTX against the protein database and by BLASTN against their respective nucleotide database subset to exclude contaminations or mix-ups during sequencing projects. This procedure excluded two hits among bovine ESTs that were found to be identical to potato ESTs, three ESTs from the Antarctic nematode Plectus murrayi36 that were identical to oil palm ESTs encoding DYW proteins, and hits within chromosome assemblies of Pongo abelii (AC231278) and Danio rerio (CU280367), which were found to be identical to Zea mays and Medicago truncatula sequences, respectively. Two additional significant DYW domain similarities were identified in Laccaria bicolor and Aureococcus (FC045055) but lacked conservation of key amino acid positions (Fig. 1B). These cases might represent pseudogenes after HGT but were not considered further.
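The contamination check described above amounts to flagging candidate hits that are (near-)identical to a sequence from an unrelated, here plant, source. A minimal Python sketch follows; the record type, the coarse taxon labels, and the 99% identity and 95% query coverage cut-offs are all hypothetical stand-ins for the identity criterion described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReciprocalHit:
    """Best BLASTN match found when a candidate DYW hit is searched again."""
    query_id: str            # candidate hit, e.g. an EST accession
    subject_id: str          # best-matching database entry
    subject_group: str       # hypothetical coarse taxon label, e.g. "plant"
    percent_identity: float
    alignment_length: int
    query_length: int


def likely_contaminants(hits, foreign_groups=frozenset({"plant"}),
                        min_identity=99.0, min_coverage=0.95) -> set[str]:
    """Query IDs that are (near-)identical over most of their length to a
    sequence from a foreign group, suggesting contamination or a mix-up."""
    flagged = set()
    for hit in hits:
        coverage = hit.alignment_length / hit.query_length
        if (hit.subject_group in foreign_groups
                and hit.percent_identity >= min_identity
                and coverage >= min_coverage):
            flagged.add(hit.query_id)
    return flagged
```

In a toy set of three records (hypothetical IDs), only a query identical over 96% of its length to a plant sequence would be flagged; a perfect match to an animal entry or an 85% match to a plant entry would not.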
For completed genome projects with reliable protein models, PPR proteins were compiled by searching for annotations of PPR motifs as defined in the conserved domain database (CDD:20522/PPR_2/pfam13041 or superfamily cl03252, including CDD:211604/TIGR00756, PPR/pfam01535, and PPR_3/pfam13812). The results coincided very well with independent homology searches using PPR proteins with long PPR arrays as queries (e.g., Acanthamoeba ELR25026). This approach was also used to identify PPR-encoding nucleotide sequences by TBLASTN. An expectation value (E-value) threshold of 0.01 was applied to determine the number of significant hits in TBLASTN searches of the respective NCBI nucleotide database subsets outlined above, as indicated in Figure 2. Given their highly significant sequence conservation signatures, it appears likely that DYW domains are truly absent in taxa with extensive genome or transcriptome coverage in which none were found (see Fig. 2). TPRpred37 was used to identify PPRs within protein sequences and to collect them for profiling with WebLogo.38
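Counting significant TBLASTN hits at E ≤ 0.01 can be sketched for BLAST+ tabular output (-outfmt 6), whose 12 default columns end with the E-value and bit score. The Python function below keeps the best E-value per subject sequence and counts subjects at or below the threshold. Treating each subject as one hit is an assumption of this sketch, and overlapping ESTs or TSA contigs from the same locus would still need to be merged to arrive at locus estimates like those in Figure 2.

```python
def count_significant_subjects(tabular_lines, evalue_threshold=0.01) -> int:
    """Count distinct subject sequences whose best E-value is <= threshold.

    Expects BLAST+ -outfmt 6 lines with the 12 default columns: query ID,
    subject ID, pident, length, mismatch, gapopen, qstart, qend, sstart,
    send, evalue, bitscore.
    """
    best_evalue = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        # several HSPs per subject are possible; keep the best one
        if evalue < best_evalue.get(subject, float("inf")):
            best_evalue[subject] = evalue
    return sum(1 for e in best_evalue.values() if e <= evalue_threshold)
```

Keeping only the best HSP per subject prevents a single long sequence with several weak local alignments from being counted more than once.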
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Footnotes
Previously published online: www.landesbioscience.com/journals/rnabiology/article/25755
References
- 1.Small ID, Peeters N. The PPR motif - a TPR-related motif prevalent in plant organellar proteins. Trends Biochem Sci. 2000;25:46–7. doi: 10.1016/S0968-0004(99)01520-0. [DOI] [PubMed] [Google Scholar]
- 2.Schmitz-Linneweber C, Small I. Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 2008;13:663–70. doi: 10.1016/j.tplants.2008.10.001. [DOI] [PubMed] [Google Scholar]
- 3.O’Toole N, Hattori M, Andres C, Iida K, Lurin C, Schmitz-Linneweber C, et al. On the expansion of the pentatricopeptide repeat gene family in plants. Mol Biol Evol. 2008;25:1120–8. doi: 10.1093/molbev/msn057. [DOI] [PubMed] [Google Scholar]
- 4.Lurin C, Andrés C, Aubourg S, Bellaoui M, Bitton F, Bruyère C, et al. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 2004;16:2089–103. doi: 10.1105/tpc.104.022236. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Barkan A, Rojas M, Fujii S, Yap A, Chong YS, Bond CS, et al. A combinatorial amino acid code for RNA recognition by pentatricopeptide repeat proteins. PLoS Genet. 2012;8:e1002910. doi: 10.1371/journal.pgen.1002910. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Yagi Y, Hayashi S, Kobayashi K, Hirayama T, Nakamura T. Elucidation of the RNA recognition code for pentatricopeptide repeat proteins involved in organelle RNA editing in plants. PLoS One. 2013;8:e57286. doi: 10.1371/journal.pone.0057286. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Finster S, Legen J, Qu Y, Schmitz-Linneweber C. Land plant RNA editing or: don't be fooled by plant organellar DNA sequences. In: Bock R, Knoop V. Genomics of Chloroplasts and Mitochondria. Springer, Dordrecht. 2012; 35:293-321 [Google Scholar]
- 8. Knoop V. When you can’t trust the DNA: RNA editing changes transcript sequences. Cell Mol Life Sci. 2011;68:567–86. doi: 10.1007/s00018-010-0538-9.
- 9. Fujii S, Small I. The evolution of RNA editing and pentatricopeptide repeat genes. New Phytol. 2011;191:37–47. doi: 10.1111/j.1469-8137.2011.03746.x.
- 10. Chateigner-Boutin AL, Small I. Organellar RNA editing. Wiley Interdiscip Rev RNA. 2011;2:493–506. doi: 10.1002/wrna.72.
- 11. Salone V, Rüdinger M, Polsakiewicz M, Hoffmann B, Groth-Malonek M, Szurek B, et al. A hypothesis on the identification of the editing enzyme in plant organelles. FEBS Lett. 2007;581:4132–8. doi: 10.1016/j.febslet.2007.07.075.
- 12. Iyer LM, Zhang D, Rogozin IB, Aravind L. Evolution of the deaminase fold and multiple origins of eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids Res. 2011;39:9473–97. doi: 10.1093/nar/gkr691.
- 13. Rüdinger M, Volkmar U, Lenz H, Groth-Malonek M, Knoop V. Nuclear DYW-type PPR gene families diversify with increasing RNA editing frequencies in liverwort and moss mitochondria. J Mol Evol. 2012;74:37–51. doi: 10.1007/s00239-012-9486-3.
- 14. Knoop V, Rüdinger M. DYW-type PPR proteins in a heterolobosean protist: plant RNA editing factors involved in an ancient horizontal gene transfer? FEBS Lett. 2010;584:4287–91. doi: 10.1016/j.febslet.2010.09.041.
- 15. Rüdinger M, Fritz-Laylin L, Polsakiewicz M, Knoop V. Plant-type mitochondrial RNA editing in the protist Naegleria gruberi. RNA. 2011;17:2058–62. doi: 10.1261/rna.02962911.
- 16. Clarke M, Lohan AJ, Liu B, Lagkouvardos I, Roy S, Zafar N, et al. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 2013;14:R11. doi: 10.1186/gb-2013-14-2-r11.
- 17. Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, et al. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res. 2013;41(Database issue):D348–52. doi: 10.1093/nar/gks1243.
- 18. Bundschuh R, Altmüller J, Becker C, Nürnberg P, Gott JM. Complete characterization of the edited transcriptome of the mitochondrion of Physarum polycephalum using deep sequencing of RNA. Nucleic Acids Res. 2011;39:6044–55. doi: 10.1093/nar/gkr180.
- 19. Barrantes I, Glockner G, Meyer S, Marwan W. Transcriptomic changes arising during light-induced sporulation in Physarum polycephalum. BMC Genomics. 2010;11:115. doi: 10.1186/1471-2164-11-115.
- 20. Gott JM, Visomirski LM, Hunter JL. Substitutional and insertional RNA editing of the cytochrome c oxidase subunit 1 mRNA of Physarum polycephalum. J Biol Chem. 1993;268:25483–6.
- 21. Byrne EM, Gott JM. Unexpectedly complex editing patterns at dinucleotide insertion sites in Physarum mitochondria. Mol Cell Biol. 2004;24:7821–8. doi: 10.1128/MCB.24.18.7821-7828.2004.
- 22. Gott JM. Mechanisms and functions of RNA editing in Physarum polycephalum. In: Maas S, editor. RNA Editing: Current Research and Future Trends. Norfolk: Caister Academic Press; 2013. p. 17–40.
- 23. Brugerolle G, Bricheux G, Philippe H, Coffea G. Collodictyon triciliatum and Diphylleia rotans (=Aulacomonas submarina) form a new family of flagellates (Collodictyonidae) with tubular mitochondrial cristae that is phylogenetically distant from other flagellate groups. Protist. 2002;153:59–70. doi: 10.1078/1434-4610-00083.
- 24. Zhao S, Burki F, Bråte J, Keeling PJ, Klaveness D, Shalchian-Tabrizi K. Collodictyon--an ancient lineage in the tree of eukaryotes. Mol Biol Evol. 2012;29:1557–68. doi: 10.1093/molbev/mss001.
- 25. Derelle R, Lang BF. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol Biol Evol. 2012;29:1277–89. doi: 10.1093/molbev/msr295.
- 26. O'Kelly CJ, Nerad TA. Malawimonas jakobiformis n. gen., n. sp. (Malawimonadidae n. fam.): A Jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J Eukaryot Microbiol. 1999;46:522–31. doi: 10.1111/j.1550-7408.1999.tb06070.x.
- 27. Timme RE, Bachvaroff TR, Delwiche CF. Broad phylogenomic sampling and the sister lineage of land plants. PLoS One. 2012;7:e29696. doi: 10.1371/journal.pone.0029696.
- 28. Roy S, Morse D. A full suite of histone and histone modifying genes are transcribed in the dinoflagellate Lingulodinium. PLoS One. 2012;7:e34340. doi: 10.1371/journal.pone.0034340.
- 29. Lipinski KA, Puchta O, Surendranath V, Kudla M, Golik P. Revisiting the yeast PPR proteins--application of an Iterative Hidden Markov Model algorithm reveals new members of the rapidly evolving family. Mol Biol Evol. 2011;28:2935–48. doi: 10.1093/molbev/msr120.
- 30. Nakamura T, Yagi Y, Kobayashi K. Mechanistic insight into pentatricopeptide repeat proteins as sequence-specific RNA-binding proteins for organellar RNAs in plants. Plant Cell Physiol. 2012;53:1171–9. doi: 10.1093/pcp/pcs069.
- 31. Lenz H, Knoop V. PREPACT 2.0: Predicting C-to-U and U-to-C RNA editing in organelle genome sequences with multiple references and curated RNA editing annotation. Bioinform Biol Insights. 2013;7:1–19. doi: 10.4137/BBI.S11059.
- 32. Takenaka M, Zehrmann A, Verbitskiy D, Kugelmann M, Härtel B, Brennicke A. Multiple organellar RNA editing factor (MORF) family proteins are required for RNA editing in mitochondria and plastids of plants. Proc Natl Acad Sci USA. 2012;109:5104–9. doi: 10.1073/pnas.1202452109.
- 33. Bentolila S, Heller WP, Sun T, Babina AM, Friso G, van Wijk KJ, et al. RIP1, a member of an Arabidopsis protein family, interacts with the protein RARE1 and broadly affects RNA editing. Proc Natl Acad Sci USA. 2012;109:E1453–61. doi: 10.1073/pnas.1121465109.
- 34. Gladyshev EA, Meselson M, Arkhipova IR. Massive horizontal gene transfer in bdelloid rotifers. Science. 2008;320:1210–3. doi: 10.1126/science.1156407.
- 35. Fritz-Laylin LK, Prochnik SE, Ginger ML, Dacks JB, Carpenter ML, Field MC, et al. The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell. 2010;140:631–42. doi: 10.1016/j.cell.2010.01.032.
- 36. Adhikari BN, Wall DH, Adams BJ. Desiccation survival in an Antarctic nematode: molecular analysis using expressed sequenced tags. BMC Genomics. 2009;10:69. doi: 10.1186/1471-2164-10-69.
- 37. Karpenahalli MR, Lupas AN, Söding J. TPRpred: a tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences. BMC Bioinformatics. 2007;8:2. doi: 10.1186/1471-2105-8-2.
- 38. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90. doi: 10.1101/gr.849004.



