Abstract
A detailed annotation of non-protein coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, the presumably most basal metazoan whose genome has been published to-date. Since blast identified only a small fraction of the best-conserved ncRNAs—in particular rRNAs, tRNAs and some snRNAs—we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
INTRODUCTION
The phylum Placozoa consists of only one recognized species—the marine dweller Trichoplax adhaerens. Extensive genetic variation between individual placozoan lineages, however, suggests the existence of different species (1). The phylogenetic position of the phylum Placozoa has been the subject of contention dating from the 19th century. Originally, Placozoa were regarded to represent the base of Metazoa, later they were seen as derived (secondarily reduced) with sponges being considered to be the most basal metazoans [see e.g. (2,3,4) for overview and discussion]. Most recently, a basal position among all diploblastic animals has been suggested (5).
Trichoplax lacks tissues, organs and any type of symmetry. It is composed of only a few hundred to a few thousand cells. This organism has a simple upper and lower epithelium, which surround a network of fiber cells, and as such has an irregular, three-layered, sandwich-type organization. Only five different cell types have so far been described: upper and lower epithelial cells, gland cells, fiber cells and a recently discovered type of small cells that are arranged in a relatively evenly spaced pattern within the marginal zone, where upper and lower epithelia meet (6). It is therefore among the simplest multi-cellular organisms. With 106 Mb, the nuclear genome of T. adhaerens, which has recently been completely sequenced (7), is among the smallest animal genomes.
So far, the non-coding RNA complement of Placozoa has not been studied. The genome-wide annotation of non-coding RNAs has turned out to be a more complex and demanding problem than one might think. While a few exceptional classes of RNA genes, first and foremost rRNAs and tRNAs, are readily found and annotated by blast and the widely used tRNA detector tRNAscan-SE (8), most other ncRNAs are comparably poorly conserved and hard to find within complete genomes. This is in particular true whenever the sensitivity of comparative approaches is limited by large evolutionary distances to the closest well-annotated genomes. The placozoan T. adhaerens is a prime example of this situation: in the concatenated set of 104 slowly evolving single-copy nuclear protein-coding genes used for phylogenetic analysis in (7), for instance, the distances from Trichoplax to Amphimedon, Nematostella and Human are 0.44, 0.34 and 0.32 substitutions per site, respectively. As a consequence, only highly conserved DNA is alignable at all, and homology-based gene finding becomes a non-trivial task.
In this contribution, we primarily report on a careful annotation of those Trichoplax ncRNA genes that have well-described homologs in other animals. In addition, we describe computational surveys for novel ncRNA candidates. For a subset of the annotated ncRNAs we verify expression to demonstrate that the predicted homologs are functional genes.
MATERIALS AND METHODS
Sequence data and databases
The Triad1 assembly of the genome of T. adhaerens (7) was downloaded from the website of the Joint Genome Institute (http://genome.jgi-psf.org/Triad1/). For comparison, we used the Nemve1 (http://genome.jgi-psf.org/Nemve1/) assembly of Nematostella vectensis (9), as well as the available shotgun traces of Hydra magnipapillata, Amphimedon queenslandica, Porites lobata, Acropora millepora and Acropora palmata (downloaded from the NCBI trace archive).
Known ncRNA sequences were extracted from the Rfam (10) and NonCode (11) databases. In addition we used the collection of metazoan snRNAs from (12). [The snRNAs found in the current study were made available to (12)].
Software
Homology searches were performed using NCBI blastall 2.2.6 (13), infernal (14), fragrep (15) and the novel GotohScan method described below in detail. Alignments were edited in the emacs editor using ralee mode (16). RNA secondary structures were computed using the Vienna RNA Package (17), in particular the programs RNAfold for individual structures, RNAalifold (18,19) for consensus structures of aligned RNA sequences and RNAcofold (20) for interaction structures. We used RNAmicro (21) in the updated version (1.3) (http://www.bioinf.uni-leipzig.de/~jana/software/RNAmicro.html) to identify microRNA candidates from multiple alignments. The analysis of putative snoRNAs was performed using snoReport (22); targets for box H/ACA snoRNAs were predicted using a preliminary version of snoplex (H.Tafer et al., in preparation). The genome-wide screens for conserved secondary structure elements were performed using RNAz (23) as described below.
RNAz screens
We used multiz (24) to produce a three-way alignment of Trichoplax, Nematostella and Hydra. Only the blocks that contained Trichoplax and at least one of the two cnidarian species were used for further analysis. In addition, we prepared a six-way alignment using NcDNAlign (25) that includes the genomic data of the six basal metazoa listed in the previous paragraph. The Trichoplax sequence was used as reference and only alignment blocks containing at least three species were processed further.
These two sets of input alignments were passed to the RNAz pipeline and processed in the same way: alignments longer than 120 nt were cut into 120 nt slices in 40 nt steps. In a series of filtering steps, sequences were removed from the individual alignments or alignment slices if they (i) were shorter than 50 nt, (ii) contained more than 25% gap characters or (iii) had a base composition outside the definition range of RNAz. All pre-processing steps were performed using the script rnazWindows.pl of the current release of the RNAz package. Overlapping slices with a positive ncRNA classification probability of P > 0.5 were combined using rnazCluster.pl into a single annotation element, which we refer to as a locus. In order to estimate the false discovery rate (FDR) of the screen, we repeated the entire procedure with shuffled input alignments using rnazRandomizeAln.pl.
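The slicing and the first two filters can be illustrated with a minimal Python sketch. It is not a reimplementation of rnazWindows.pl: the function name `slice_alignment`, the dictionary representation of an alignment, the reading of "shorter than 50 nt" as ungapped length, and the rule that a slice needs at least two remaining sequences are assumptions made for illustration, and the base-composition filter is omitted.

```python
def slice_alignment(aln, window=120, step=40, min_len=50, max_gap_frac=0.25):
    """Yield (start, block) pairs for overlapping slices of an alignment.

    aln maps a sequence name to its gapped row; all rows have equal length.
    Hypothetical helper: it only mimics the window size, step size and the
    length/gap filters quoted in the text.
    """
    length = len(next(iter(aln.values())))
    if length <= window:
        starts = [0]                      # short alignments are kept whole
    else:
        # A trailing remainder shorter than one step is ignored in this sketch.
        starts = range(0, length - window + 1, step)
    for start in starts:
        block = {}
        for name, row in aln.items():
            seg = row[start:start + window]
            gaps = seg.count("-")
            if len(seg) - gaps < min_len:          # (i) too short (assumed: ungapped length)
                continue
            if gaps / len(seg) > max_gap_frac:     # (ii) too many gap characters
                continue
            block[name] = seg
        if len(block) >= 2:               # assumed: a slice needs two or more rows
            yield start, block


# Toy input: a 200-column alignment in which the third row is all gaps.
toy = {"tri": "A" * 200, "nem": "C" * 200, "hyd": "-" * 200}
windows = list(slice_alignment(toy))
```

With the toy input, slices start at columns 0, 40 and 80, and the all-gap row is dropped from each of them.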
GotohScan
Blast failed to identify many of the ncRNAs that are reasonably expected to be present in the Trichoplax genome, for example homologs of the U4atac snRNA, the U3 snoRNA, or the RNA component of RNase MRP. These could not be detected by means of blast, even with relaxed parameter settings. We therefore decided to use a computationally more costly but more sensitive full dynamic programming approach. Instead of using a local (Smith-Waterman) implementation such as ssearch (26) or its partition function version (27), we suggest that a 'semi-global' alignment approach is more natural for the homology search problems at hand. In a semi-global alignment, the best match of the 'complete' query sequence to the genomic DNA is sought. Because relatively long insertions and deletions occur, an affine gap cost model becomes necessary. This problem is solved by the following straightforward modification of Gotoh's dynamic programming algorithm (28).
Denote the query sequence by Q = q1,q2,…,qm and the genomic 'subject' sequence by P = p1,p2,…,pn. Note that the problem is not symmetric, since deletions of the ends of P do not incur costs, while deletions of the ends of Q are fully penalized. As usual, denote by Sij the score of the optimal alignment of the prefixes Q[1…i] and P[1…j], respectively. The values of Dij and Fij are the optimal scores of alignments of Q[1…i] and P[1…j] with the constraint that the alignment ends in an insertion or a deletion, respectively. The recursions read
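$$
\begin{aligned}
S_{ij} &= \max\{\, S_{i-1,j-1} + \sigma(q_i,p_j),\; D_{ij},\; F_{ij} \,\} \\
D_{ij} &= \max\{\, S_{i-1,j} + \gamma_o,\; D_{i-1,j} + \gamma_e \,\} \\
F_{ij} &= \max\{\, S_{i,j-1} + \gamma_o,\; F_{i,j-1} + \gamma_e \,\}
\end{aligned}
\tag{1}
$$
where σ(qi,pj) denotes the substitution score and γo and γe denote the (negative) gap-open and gap-extension scores, with the initializations
$$
S_{0,j} = 0 \;\; (0 \le j \le n), \qquad S_{i,0} = \gamma_o + (i-1)\,\gamma_e \;\; (1 \le i \le m), \qquad D_{0,j} = F_{i,0} = -\infty .
$$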
In this full version, the algorithm requires 𝒪(n×m) time and memory, where n is the length of the genome and m is the length of the query sequence. While the time requirement is uncritical on off-the-shelf PCs even for large genomes, the memory consumption must be reduced. It is sufficient to compute, for every position k in the genome, the score of the best alignment of the query that has its last match in k. For this purpose, we only need to store the values of the current column Sij and Dij and of the previous column Si−1,j and Di−1,j, i.e. these two quadratic arrays can be replaced by linear arrays of length m. From the F array only the current value Fij and the previous value Fi,j−1 need to be stored. The alignments themselves need to be computed only for a very small subset of endpoints k of the forward recursion, namely those with nearly optimal score. For each endpoint, the alignment can be obtained by standard backtracing in 𝒪(m²) time and space.
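The linear-memory forward pass can be sketched in Python as follows. This is an illustration, not the GotohScan C implementation. It assumes the standard affine-gap recursion: S is the maximum of a diagonal step plus substitution score, D and F; D extends a gap along the query index and F a gap along the genome index. The scoring values are hypothetical placeholders. Because the sketch sweeps the genome in the outer loop, it keeps S and F as arrays of length m + 1 and D as a scalar; the bookkeeping in GotohScan itself may be arranged differently.

```python
NEG_INF = float("-inf")


def semiglobal_scan(query, subject, match=2, mismatch=-1,
                    gap_open=-4, gap_extend=-1):
    """Score of the best alignment of the complete query ending at each
    subject position (list index j-1 for 1-based position j).

    Scoring parameters are illustrative assumptions, not GotohScan defaults.
    A gap of length L costs gap_open + (L - 1) * gap_extend.
    """
    m = len(query)
    # Column j = 0: no subject consumed, the query prefix faces gaps only.
    prev_S = [0] + [gap_open + (i - 1) * gap_extend for i in range(1, m + 1)]
    prev_F = [NEG_INF] * (m + 1)
    end_scores = []
    for pj in subject:
        cur_S = [0] * (m + 1)        # S[0][j] = 0: leading subject is free
        cur_F = [NEG_INF] * (m + 1)
        D = NEG_INF                  # D[0][j]
        for i in range(1, m + 1):
            # D depends on row i-1 of the current column.
            D = max(cur_S[i - 1] + gap_open, D + gap_extend)
            # F depends on row i of the previous column.
            cur_F[i] = max(prev_S[i] + gap_open, prev_F[i] + gap_extend)
            sub = match if query[i - 1] == pj else mismatch
            cur_S[i] = max(prev_S[i - 1] + sub, D, cur_F[i])
        end_scores.append(cur_S[m])  # trailing subject is free: read off row m
        prev_S, prev_F = cur_S, cur_F
    return end_scores


scores = semiglobal_scan("ACGT", "TTACGTTT")
best_end = scores.index(max(scores)) + 1   # 1-based end position of the best hit
```

Only the scores of the last row are retained, one per genome position; backtracing would be restarted from the few endpoints with near-optimal score.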
GotohScan is not the only implementation of semi-global alignment. Alternative approaches use a scoring based on block alignments (29) or employ Hidden Markov Models (30). We constructed our own version since this allowed us to optimize the performance for large genomes on the available off-the-shelf PC hardware and to estimate E-values directly from the observed score distribution. To this end, the current C implementation of GotohScan stores a histogram of all the scores for each query sequence over all database sequences. The locally maximal scores for each query are computed via a simple divide-and-conquer implementation that starts with the global maximum and continues with the next maxima to the left and right that are at least m (the length of the query sequence) nucleotides away from the global maximum. A priority queue is used to hold a fixed number of these top-scoring positions. It is initialized only after the first database sequence (typically the longest chromosome or scaffold), while the following high-scoring positions are inserted according to the alignment score. This minimizes the effort for backtracing candidate alignments. Figure 1a shows examples of score histograms.
Figure 1.
(a) Histogram of score distribution for U4atac, U17 and RNase MRP. (b) Fitting the GotohScan score distribution of U4atac to known density functions.
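The selection of well-separated local maxima described above can be sketched as follows. The function names are hypothetical and the sketch processes a single score vector; the staged initialization of the priority queue after the first database sequence is not modelled.

```python
import heapq


def separated_maxima(scores, m):
    """Divide and conquer: take the maximum of an interval, then recurse
    on the parts lying at least m positions to its left and right."""
    found = []
    stack = [(0, len(scores))]           # half-open intervals [lo, hi)
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pos = max(range(lo, hi), key=scores.__getitem__)
        found.append((scores[pos], pos))
        stack.append((lo, pos - m + 1))  # positions <= pos - m
        stack.append((pos + m, hi))      # positions >= pos + m
    return found


def top_hits(scores, m, k):
    """Keep the k best (score, position) pairs, best first."""
    return heapq.nlargest(k, separated_maxima(scores, m))


example = [1, 5, 2, 0, 3, 9, 4, 1, 7, 2]
best_two = top_hits(example, m=3, k=2)   # [(9, 5), (7, 8)]
```

Any two reported positions are at least m apart, so each reported endpoint corresponds to a distinct candidate alignment.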
Empirically, we found that the score histogram, with respect to one query sequence against all database sequences, closely follows a Gamma distribution
$$
f(s;k,\theta) = \frac{s^{k-1}\, e^{-s/\theta}}{\theta^{k}\,\Gamma(k)} \tag{2}
$$
see Figure 1b. Thus, we fitted a Gamma distribution to the histogram of alignment scores and used it to calculate E-values for each of the elements in the priority queue.
The GotohScan program uses only the high-density portion of the score histogram to estimate the characteristic quantities ln 〈s〉 and 〈ln s〉:
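$$
\ln\langle s\rangle = \ln\!\left(\frac{1}{N}\sum_{i=a}^{b} i\,\mathrm{Sdistr}[i]\right), \qquad
\langle \ln s\rangle = \frac{1}{N}\sum_{i=a}^{b} \mathrm{Sdistr}[i]\,\ln i \tag{3}
$$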
with a and b as limits of the high-density portion of the score distribution and N the number of alignments in this range. Sdistr[i] is the number of alignments with score i. From these we estimated the scale and shape parameters θ and k by least-square fitting of log f(s;k,θ) against the logarithm of the score histogram, restricting the fitting interval to [a:b] of the score distribution. The calculation of E-values then proceeds by using the asymptotic expansion (31) of the incomplete Gamma function:
$$
\log \Gamma(k,z) = (k-1)\log z - z + U_k(z) \tag{4}
$$
where U_k(z) = log[1 + (k−1)/z + (k−1)(k−2)/z² + …] → 0 for large arguments.
In the last step, the E-values for all high-scoring positions, stored in the priority queue, are calculated and only those with an E-value lower than a given threshold are returned.
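As a rough illustration of this E-value computation, the Python sketch below derives Gamma parameters from a score histogram and evaluates the upper tail with the asymptotic expansion. Two points are assumptions rather than GotohScan's procedure: instead of the least-squares fit described above, the shape k is obtained from ln⟨s⟩ − ⟨ln s⟩ with a standard closed-form approximation to the maximum-likelihood estimate, and the E-value is taken to be the tail probability multiplied by the number of scored positions. All function names are hypothetical.

```python
import math


def estimate_gamma(hist, a, b):
    """Shape k and scale theta from a histogram {score: count},
    restricted to positive scores a..b (the high-density portion)."""
    items = [(s, c) for s, c in hist.items() if a <= s <= b]
    n = sum(c for _, c in items)
    mean = sum(s * c for s, c in items) / n
    mean_log = sum(c * math.log(s) for s, c in items) / n
    x = math.log(mean) - mean_log            # ln<s> - <ln s>, positive by Jensen
    # Closed-form approximation to the ML estimate of the shape (assumption:
    # used here in place of the least-squares fit of the paper).
    k = (3 - x + math.sqrt((x - 3) ** 2 + 24 * x)) / (12 * x)
    return k, mean / k


def log_upper_gamma(k, z, terms=3):
    """ln Gamma(k, z) from the asymptotic expansion, valid for large z."""
    series, term = 1.0, 1.0
    for j in range(1, terms):
        term *= (k - j) / z                  # (k-1)/z, (k-1)(k-2)/z^2, ...
        series += term
    return (k - 1) * math.log(z) - z + math.log(series)


def evalue(score, k, theta, n_positions):
    """Expected number of positions scoring at least `score` (assumed definition)."""
    log_tail = log_upper_gamma(k, score / theta) - math.lgamma(k)
    return n_positions * math.exp(log_tail)


k_hat, theta_hat = estimate_gamma({8: 2, 10: 5, 12: 3}, a=8, b=12)
```

For integer k the expansion terminates, so for k = 1 and k = 2 the function reproduces the exact values −z and log(z + 1) − z; for other shapes it is only accurate far in the tail, which is where candidate hits are expected.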
Target prediction
The targets of the novel box H/ACA snoRNA candidate were computed using the novel run-time efficient snoplex program (H.Tafer et al., in preparation). This tool implements a dynamic programming algorithm to compute the binding energy of the snoRNA sequence to its target together with the energy of the snoRNA structure itself. In order to assess putative binding sites, snoplex furthermore considers the initial energy of the snoRNA structure, the energy that is necessary to open the target site and the duplex energy, which also depends on the surrounding snoRNA structure. Given a snoRNA sequence, snoplex scans the target RNA sequence and returns the set of thermodynamically most stable interaction structures.
Experimental verification of expression
Our experimental approach is based on (32). Approximately 400 cultured Trichoplax animals were collected (Grell strain; Haplotype 1) and small RNAs were purified with the mirPremier microRNA Isolation Kit (Sigma), following the protocol for mammalian cell cultures. To exclude genomic DNA contamination of the purified small RNA samples, digestion with DNaseI (Fermentas) was performed following the manufacturer's protocol. A poly-(A) tail of approximately 20 nt was added to the small RNAs using the Poly(A) Tailing Kit (Ambion). Following this, reverse transcription was performed using SuperScript II Reverse Transcriptase (Invitrogen) and a modified poly-d(T) primer (5′-AAGCAGTGGTATCAACGCAGAGT(T)3VN). Small RNAs were amplified using a universal reverse primer (5′-AAGCAGTGGTATCAACGCAGAGT) and forward primers specific to the predicted small RNA of interest. Putative products were cloned into the pGEM-T vector (Promega) and positive clones were sequenced using the services of Macrogen (Korea). A full list of primers and protocols can be supplied upon request.
RESULTS
tRNAs
The Trichoplax genome contains 49 canonical tRNA genes, a single selenocysteine-tRNA gene and one tRNA pseudogene recognizable by tRNAscan-SE (Table 1).
Table 1.
Summary of tRNA genes arranged by anti-codon
| First base | |||||
|---|---|---|---|---|---|
| Second base | Third base | A | C | G | T | 
| A | A | Leua | ψ | Leu | |
| C | Val | Val | Val | ||
| G | Leu+(ψ) | Leu | Leu | ||
| T | Met2 | Ile | Ilea | ||
| C | A | Trp | Cys2 | SeC | |
| C | Gly | Gly | Gly2 | ||
| G | Arg | Arg2 | Arga | ||
| T | Arg | Ser | Arga | ||
| G | A | Ser +(3ψ) | Ser | Ser | |
| C | Ala | Ala | Ala | ||
| G | Pro | Pro | Pro | ||
| T | Thr | Thr | Thr | ||
| T | A | Thya | |||
| C | Glu | Asp | Glu2 | ||
| G | Gln | His | Gln | ||
| T | Lys | Asn | Lys | ||
a indicates tRNAs with introns. The multiplicity of genes with more than one copy is indicated by a superscript. SeC indicates the selenocysteine tRNA.
Interestingly, apart from these genes the Trichoplax genome is essentially devoid of tRNA-like sequences. A blast search revealed only a small cluster of four sequences derived from tRNA-Ser(AGA) located just downstream of the functional tRNA on scaffold 3, and a single degraded pseudogene probably derived from tRNA-Leu(TAG) on scaffold 13. These are indicated in parentheses in Table 1.
Ribosomal RNAs
In eukaryotes, rRNAs (except 5S) are processed from a polycistronic ‘rRNA operon’ which consists of SSU (18S), 5.8S, and LSU (28S) RNAs, two ‘internal spacers’ ITS-1 and ITS-2, and two ‘external spacers’, reviewed in (33). Trichoplax is no exception, see Figure 2. The rRNA sequences have already received considerable attention in a phylogenetic context, see (1,34–36). The pre-rRNA sequence appears in several copies throughout the genome. Somewhat disappointingly, the Triad1 assembly contains none of them in complete and uninterrupted form. The consensus sequence of the pre-rRNA can be easily constructed starting from the previously published sequences and the five fairly complete genomic loci [on scaffolds 22, 40 (two), 50 and 734] together with a partial copy on scaffold 34. Only the exact ends of the external transcribed spacers (ETSs) remain uncertain. Figure 2 summarizes the blastn matches of the pre-rRNA to the Trichoplax genome.
Figure 2.
Trichoplax pre-rRNA cluster reconstructed from previously published sequences L10828, Z22783, AY652578 (SSU), AY303975, AY652583 (LSU), U65478 (internal spacers and 5.8S) and Triad1 genomic sequence. Blast hits of the pre-rRNA to the Triad1 genome assembly are shown below as in the JGI genome browser.
The 5S rRNA sequence of Trichoplax has long been known (37). The current genome assembly contains nine 5S RNA genes, one of which is a degraded pseudogene. Interestingly, there are three anti-parallel pairs (two head-to-head and one tail-to-tail which contains the pseudogene).
Spliceosomal snRNAs
Splicing of mRNAs is a feature common to almost all eukaryotic organisms. The spliceosome consists of more than a hundred protein components and five small RNAs that perform crucial catalytic functions, see (38,39) for reviews. The major spliceosome, containing U1, U2, U4, U5 and U6 snRNAs, splices more than 98% of protein-coding genes in metazoans, plants and fungi. A small number of protein-coding mRNAs are processed by the minor spliceosome, which contains U11, U12, U4atac, U5 and U6atac snRNAs (40). Previously, nothing was known about placozoan snRNAs. With the exception of U4atac, the snRNAs were easily found by blastn. The U4atac snRNA was found only by GotohScan, with an E-value of 9e−19. The expression of U4atac was also verified experimentally. With the exception of two U6 genes, each snRNA is encoded by a single gene in the Trichoplax genome.
Their secondary structures (Figure 3) closely conform to the metazoan consensus (12), with slightly shorter stems II of U11 snRNA and IV of U12 snRNA. The U12 contains a 5 nt insert, indicated in red in Figure 3.
Figure 3.
RNA secondary structures of major spliceosomal (U1, U2, U4, U5, U6) and minor spliceosomal (U11, U12, U4atac, U5, U6atac) snRNAs. For U4/U6 and U4atac/U6atac the interaction structures computed by means of RNAcofold are shown. The 5 nt insert (relative to other metazoa) is highlighted in the U12.
In contrast to many other invertebrates, Trichoplax snRNAs feature a clearly recognizable proximal sequence element (PSE) (12,41), which is easily detected by MEME (42,43), see Table 2. In line with other species, the PSE is shared between the pol-II and pol-III transcribed snRNAs. On average, the PSEs differ by 3 nt from the consensus.
Table 2.
PSE and location of snRNAs in T. adhaerens [The sequence logo was generated using aln2pattern (15)]
RNase P, RNase MRP, SRP RNA
The ribonucleoprotein complexes RNase P and RNase MRP are involved in tRNA and rRNA processing, respectively. Their RNA subunits, which play an essential role in their enzymatic activities, are structurally and evolutionarily related, see e.g. (44,45,46).
RNase P RNA is typically easy to find in genomic DNA, at least within metazoa. The RNase MRP RNA, which is also expected to be present throughout metazoa, is typically much less conserved. Despite substantial efforts (44), RNase MRP RNA homologs have escaped discovery in many bilaterian clades. Not surprisingly, therefore, the Trichoplax RNase P RNA was easily identified by blastn using the Rfam sequences as query. The RNase P sequence is easily verified using infernal and the corresponding Rfam model.
With standard parameters, blastn does not find an MRP RNA homologue. A dedicated, much less stringent, blastn search returns two nearly identical candidates. GotohScan, on the other hand, easily detects the same two loci. The E-values for these two candidates were 3e−5 and 3e−4, respectively. The infernal-based automatic test for homology to an MRP RNA covariance model provided through the Rfam website remained unsuccessful. A manually created alignment containing both metazoan and fungal RNase MRP sequences shows, however, that the Trichoplax MRP candidates share the crucial features with both of them, leaving little doubt that we have indeed identified the true MRP sequence. Figure 4 shows the homology-based secondary structure model.
Figure 4.
Secondary structure of T. adhaerens RNase MRP RNA inferred from the multiple alignment of metazoan RNase MRP RNAs provided in the Electronic Supplement.
The signal recognition particle (SRP) binds to the signal peptide emerging from the exit tunnel of the ribosome and targets the signal peptide-bearing proteins to the prokaryotic plasma membrane or the eukaryotic endoplasmic reticulum membrane (47). Its RNA component, called 7SL or SRP RNA, is well conserved and hence easy to identify by blast comparison starting from the SRP RNA sequences compiled in the SRPDB (48). The Trichoplax SRP RNA is shown in Figure 5.
Figure 5.
Structural alignments of the Trichoplax RNase P RNA, SRP RNA and U3 snoRNA sequences with the corresponding Rfam alignments as computed by infernal. The first line contains the structure annotation of the two sequences that are aligned in the second and fourth line. Line three in-between describes the sequence conservation and the type of substitution.
Small nucleolar RNAs
The two classes of snoRNAs, box H/ACA snoRNAs and box C/D snoRNAs, are mutually unrelated in both their function (directing two different chemical modifications of single residues in their target RNA) and their structure, reviewed e.g. in (49).
The U3 snoRNA belongs to the box C/D snoRNA class by virtue of its structural characteristics. It is, however, exceptional in several respects. It contains additional well-conserved sequence motifs which appear to be exclusive to U3 snoRNAs. Instead of directing a modification of single rRNA residues, it is required in the early steps of rRNA maturation, in particular for the cleavage of the 5′ETS and 18S rRNA maturation, see e.g. (50,51,52). Taken together, these features may explain that the U3 snoRNA sequence is much better conserved than all other snoRNAs; in fact, it is the only one that can be found directly by a blast search. The candidate sequence was easily verified by infernal-alignment to the corresponding Rfam model, Figure 5. Its expression was verified experimentally.
The box H/ACA U17 is also involved in the nucleolytic processing of pre-rRNA. Although it has been reported to be the best-conserved box H/ACA snoRNA and ubiquitous among eukaryotes (53), no Trichoplax homologue was found using blast. Not surprisingly, no other snoRNA homologs were detected by means of blast.
The U17 gene was readily identified by GotohScan with an E-value of 1e−16, however. We therefore conducted a survey of the Trichoplax genome for homologues of all 244 known human box C/D snoRNAs (belonging to 107 distinct snoRNA families) and all 94 known human box H/ACA snoRNAs (82 families) extracted from the snoRNA-LBME-db (http://www-snorna.biotoul.fr) (54).
The initial search, for which we used very non-stringent score cut-offs, produced a candidate set of 22 H/ACA and 18 C/D snoRNAs. Upon manual inspection, most of these sequences neither fold into secondary structures characteristic of snoRNAs nor match a query sequence unambiguously. Thus we used snoReport (22) to check both secondary structures and sequence motifs. Candidates not recognized by snoReport were removed.
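The sequence-motif part of such a check can be illustrated with a toy Python filter for box C/D candidates. It is not snoReport, which also evaluates secondary structure with a trained classifier. The sketch only looks for the commonly cited consensus motifs, box C (RUGAUGA) and box D (CUGA), written here in the DNA alphabet; the function name and the requirement that box D lies downstream of box C are assumptions.

```python
import re

BOX_C = re.compile(r"[AG]TGATGA")   # RUGAUGA consensus, DNA alphabet
BOX_D = "CTGA"                      # CUGA consensus, DNA alphabet


def cd_box_positions(seq):
    """Return (box_C_start, box_D_start) or None.

    Hypothetical helper: takes the first box C and the last box D that
    starts downstream of it; positions are 0-based.
    """
    seq = seq.upper().replace("U", "T")
    c_match = BOX_C.search(seq)
    if c_match is None:
        return None
    d_start = seq.rfind(BOX_D)
    if d_start < c_match.end():
        return None
    return c_match.start(), d_start


candidate = "GG" + "ATGATGA" + "C" * 30 + "CTGA" + "GG"
boxes = cd_box_positions(candidate)   # (2, 39)
```

A motif match of this kind is necessary but far from sufficient, which is why candidates were additionally compared with alignments of the individual snoRNA families.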
In the next step, we manually added the remaining candidate sequences to multiple sequence alignments of individual snoRNA families. These were retrieved from Rfam, constructed from the sequences provided through the snoRNA-LBME-db and (in the case of the U71 snoRNA) compiled from sequences deposited in GenBank. This stringent filtering step left 3 H/ACA and 4 C/D snoRNAs (not including U3) (Table 3). The multiple sequence alignments, see also Figure 6, are provided in the Electronic Supplement.
Table 3.
Small nucleolar RNAs in Trichoplax
| Name | Class | Target | Conservation | Note | 
|---|---|---|---|---|
| U3 | C/D | 18S 5-22b | Eukaryotes | Verified | 
|  |  | 18S 1129-1140b |  |  |
| U18 | C/D | 28S A740b | Eukaryotes | – | 
| U36 | C/D | 18S A615b | Eukaryotes | – | 
| U76 | C/D | 28S A1549b | Vertebrates | – | 
| U106 | C/D | 28S A2227 | Vertebrates | – | 
| U17 | H/ACA | a | Eukaryotes | |
| U71? | H/ACA | – | Vertebrates | Uncertain | 
| sc.3857:103-213(-) | H/ACA | 28S U1370 U1884 | Novel | 
a The U17 snoRNA probably targets the 5′ETS; the exact target is still unknown, however (55,53).
b Target sites homologous to the ones in human rRNAs.
Figure 6.
Top: secondary structure model of a novel H/ACA snoRNA (l.h.s.) and the best snoplex prediction of its target sites in the rRNA operon (r.h.s.). Below: alignment of U18 snoRNA sequences from several Metazoa. Boxes and the conserved target binding site are indicated.
The U71 candidate shows enough sequence identity with the vertebrate sequences to make its homology with the vertebrate U71 snoRNA very likely. Its putative target site, however, is not conserved between human and Trichoplax; we thus list it as an uncertain candidate.
Two of the H/ACA candidates were found using ACA1 as query but their homology to the human ACA1 snoRNAs cannot be established. Nevertheless, upon inspection, both show all hallmarks of box H/ACA snoRNAs. However, the corresponding primers for the candidate located on scaffold 4365 amplified a sequence fragment located immediately upstream of the predicted snoRNA (see Electronic Supplement for the corresponding alignment). Furthermore, no plausible target site could be identified for this candidate. We therefore did not include it in the list of snoRNAs. The second candidate, ‘sc.3857:103-213(-)’, on the other hand, exhibits two plausible rRNA targets on 28S rRNA (U1370 and U1884) and most likely constitutes a novel snoRNA. In addition, snoplex identifies two additional possible targets in the 18S rRNA (see Electronic Supplement).
This leaves the exceptional U17 snoRNA as the only box H/ACA snoRNA in the Trichoplax genome that can be identified unambiguously by computational means. For three of the four box C/D snoRNA candidates (U18, U36, U76) we find nearly absolute conservation of the target-binding motifs, which are homologous to the corresponding target sites in human. For the U106 snoRNA candidate we can also identify a plausible target site in the 28S rRNA, which, however, is not homologous to that of the human U106 snoRNA.
The putative host genes of the Trichoplax snoRNAs are not conserved in human. It is known, however, that snoRNAs can change their genomic location on evolutionary time-scales. For instance, several host gene switches are observed for U17 already within vertebrates (56), see also (57). Furthermore, several human snoRNA host genes are non-coding (e.g. the GAS5 transcript for U76 and the unnamed host gene of U71) or are poorly described ORFs (such as C20orf199 for snoRNA U106), making it virtually impossible to determine whether they are homologous between human and Trichoplax.
No microRNAs
Homology-based searches for microRNAs remained unsuccessful with both blast and GotohScan, using the complete set of pre-microRNA hairpins listed in miRBase (release 12.0) as queries. Both short blast hits and weak GotohScan signals were analysed. Removing all sequences for which sequence conservation was very poor on the putative mature microRNA sequence and/or the putative precursor did not fold into the characteristic hairpin structure left a single candidate possibly homologous to mir-789. The best-conserved region is located opposite to the annotated mature sequence from Caenorhabditis species. Hence, this candidate also remains inconclusive.
Ab initio ncRNA prediction
An alternative to direct homology-based annotation is the ab initio prediction of ncRNAs. In particular, RNAz (23) has proved useful in a wide variety of species, including screens of the human genome compared against (mostly) mammals (58,59), teleost fishes (60), urochordates (61), nematodes (62), flies (63), yeasts (64) and Plasmodium (65). In brief, RNAz is a machine learning tool that determines for a slice of aligned genomic DNA whether it encodes a structured RNA, based on measures of thermodynamic stability and evolutionary conservation (23).
In the case of Trichoplax, the use of comparative genomics is limited by the comparably large distance to other sequenced genomes, because most of the genome cannot be unambiguously aligned with better understood genomes. We therefore investigated two different genome-wide alignments. In the first screen, we used three-species multiz alignments (24) of T. adhaerens and the Cnidaria Hydra magnipapillata and N. vectensis. We used all alignment blocks containing Trichoplax and at least one of the two cnidarians.
A second screen was performed using NcDNAlign alignments (25) constructed from T. adhaerens, P. lobata and shotgun traces from A. queenslandica, A. millepora, A. palmata and H. magnipapillata. This screen was limited to alignment blocks containing Trichoplax and at least two other species. As expected, the large evolutionary distances in both screens limit the sensitivity of the comparative approach and preclude the detection of Placozoan-specific ncRNAs.
Both alignment sets were screened with RNAz; the corresponding results are compiled in Table 4. The restrictive NcDNAlign alignments revealed no novel ncRNAs. Of only 101 loci, 11 were identified as false positives mapping to four different protein-coding gene families, while the remaining hits coincide with ncRNAs that have already been identified by homology-based annotation. With the much more liberal multiz alignments we obtained 3027 RNAz hits comprising 1416 distinct genomic loci that show 'some' sign of evolutionarily conserved secondary structure. Of these, 382 loci correspond to annotated ncRNAs, while 1088 (77%) overlap known protein-coding regions or known repetitive elements. Twelve of the remaining loci are supported by ESTs and may constitute novel ncRNAs. The remaining 193 hits contain the U3 and U17 snoRNA genes, which were found by blast and/or GotohScan.
Table 4.
RNAz screens of T. adhaerens genome
|  | multiz | NcDNAlign | Known |
|---|---|---|---|
| Aligned DNA (nt) | 4 837 148 | 135 140 | – |
| alignments | 35 039 | 744 | – | 
| RNAz P > 0.5 | 1416 | 101 | – | 
| FDR random | 56% (797) | 43% (43) | – | 
| RNAz P > 0.9 | 751 | 79 | – | 
| FDR | 27% (386) | 15% (15) | – | 
| tRNAs | 39 | 35 | 50+1 | 
| 5S rRNA | 6 | 8 | 9 | 
| rRNA operon | 33+3 | 43 | –a | 
| snRNAs | 6 | 4 | 10 | 
| MRP, P, 7SL | 1 | 0 | 3 | 
| Protein coding | 1022 | 11 | 96 963 | 
| Repeat elements | 66 | 1 | – | 
| Total annotated | 1211 | 101 | – | 
| Unannotated with EST | 12 | 0 | – | 
| Without annotation | 205 | 0 | – | 
aThe rDNA operons appear as series of multiple RNAz hits. Known refers to all ncRNAs that have been reported previously and those that have been identified by homology search in this study.
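The FDR entries at P > 0.5 in Table 4 are consistent with dividing the number of hits obtained on shuffled alignments by the number of hits on the real alignments. The short Python check below reproduces these two percentages; the function name is hypothetical, and reading the bracketed numbers as hit counts on the shuffled data is an assumption.

```python
def empirical_fdr(hits_shuffled, hits_real):
    """Fraction of real hits expected to be false, estimated from shuffled input."""
    return hits_shuffled / hits_real


multiz_fdr = empirical_fdr(797, 1416)     # multiz screen, P > 0.5
ncdnalign_fdr = empirical_fdr(43, 101)    # NcDNAlign screen, P > 0.5
print(f"{multiz_fdr:.0%} {ncdnalign_fdr:.0%}")   # prints "56% 43%"
```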
Figure 7 summarizes the distribution of the RNAz classification scores of the MultiZ-based screen. Many of the known ncRNAs appear with moderate classification probability, with a significant enrichment observed only for scores close to one. This reflects the high expected FDR of these data, which are largely based on pairwise alignments. This implies that the initial candidates of this screen need to be post-processed with respect to gene annotation and/or other filtering methods. Indeed, the majority of predictions—even somewhat more than the estimated FDR—are located in the protein-coding regions (Table 4). The data nevertheless provide at least statistical evidence for a set of about 100–200 novel structured RNA elements.
Figure 7.
Distribution of RNAz classification scores for known (true positive) ncRNAs (black), all predictions (grey), and only those that are identified as coding or repetitive (maroon). Note the logarithmic scale: there are more than 100 non-annotated predictions with a classification confidence above 99%.
The 744 NcDNalign alignments were searched with RNAmicro for possible microRNAs. After removing known ncRNAs (in particular the U5 snRNA and several hits to hairpins in the rRNA operon), exons of annotated protein-coding genes and repetitive elements recognized by RepeatMasker, we retained 82 candidates. Since RNAmicro evaluates the alignment and the corresponding consensus fold, we also checked whether the Trichoplax candidate sequences alone fold into a microRNA-like hairpin structure. Sixty-four sequences passed this filter. Most of these sequences appear to be repetitive, mapping to more than three distinct loci in the Trichoplax genome, leaving 13 microRNA-like hairpins that are conserved between Trichoplax and Nematostella. However, none of these candidates resembles any of the 40 N. vectensis or the eight A. queenslandica microRNAs described in (66). We thus suggest that these conserved hairpins are not microRNAs. Instead, they might belong to a previously undescribed class of hairpin structures.
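The filter cascade described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: each candidate is assumed to carry a dot-bracket secondary structure for the Trichoplax sequence alone (for example as produced by a folding program such as RNAfold), a flag for overlap with existing annotation, and the number of genomic loci it maps to. The field names and the `min_pairs` threshold are hypothetical; only the cut-off of three loci comes from the text.

```python
def is_single_hairpin(structure, min_pairs=20):
    """True if a dot-bracket string is one unbranched stem-loop.

    A structure is unbranched when every '(' precedes every ')';
    bulges and interior loops within the stem are allowed.
    """
    n_open = structure.count("(")
    if n_open == 0 or n_open != structure.count(")"):
        return False
    if structure.rfind("(") > structure.find(")"):
        return False  # an opening bracket after a closing one: branched
    return n_open >= min_pairs


def filter_candidates(candidates, min_pairs=20, max_loci=3):
    """Apply the three filters in the order given in the text."""
    kept = []
    for cand in candidates:
        if cand["annotated"]:
            continue  # known ncRNA, coding exon or repeat
        if not is_single_hairpin(cand["structure"], min_pairs):
            continue  # does not fold into a hairpin on its own
        if cand["genomic_loci"] > max_loci:
            continue  # repetitive: more than three distinct loci
        kept.append(cand["name"])
    return kept


# Toy candidates, invented for illustration (short stems, so min_pairs=4).
toy = [
    {"name": "c1", "annotated": False, "structure": "((((.((....)).))))", "genomic_loci": 1},
    {"name": "c2", "annotated": True, "structure": "((((......))))", "genomic_loci": 1},
    {"name": "c3", "annotated": False, "structure": "((..))..((..))", "genomic_loci": 1},
    {"name": "c4", "annotated": False, "structure": "((((......))))", "genomic_loci": 7},
]
kept = filter_candidates(toy, min_pairs=4)
print(kept)
```

A realistic hairpin test would also consider stem length, loop size and folding energy; the bracket-order check only captures the requirement that the fold is a single stem-loop.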
DISCUSSION
We have reported here on a comprehensive computational study of non-protein-coding RNA genes in the genome of the placozoan T. adhaerens. We observed that only a limited set of the best-conserved ncRNAs, in particular tRNAs, rRNAs and a few additional ‘housekeeping’ RNAs are readily found by means of blastn. We have therefore developed a more sensitive tool, GotohScan, which implements a full semi-global dynamic programming algorithm. Using this method, we were able to detect homologs of several fast-evolving ncRNAs, including a few box C/D and box H/ACA snoRNAs, the RNase MRP RNA, and the full complement of spliceosomal snRNAs.
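In a semi-global alignment, in the usual sense of the term, the query is aligned over its full length while the alignment may begin and end anywhere in the much longer target without penalty. The sketch below illustrates this with affine gap costs using the three-state recursion of Gotoh (28). It is not the GotohScan implementation: the function name, the scoring parameters and the return convention are placeholders chosen for the example.

```python
NEG = float("-inf")


def semiglobal_affine(query, target, match=2, mismatch=-1,
                      gap_open=-4, gap_extend=-1):
    """Best score of the whole query against any substring of target.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Returns (score, end), where end is the 1-based target position at
    which the best alignment ends (0 if it ends before the first base).
    """
    n, m = len(query), len(target)
    # Three states per cell: M ends in a (mis)match column, X ends with
    # a query base opposite a gap, Y ends with a target base opposite a gap.
    # Row 0: the alignment may start at any target position for free.
    M = [0] * (m + 1)
    X = [NEG] * (m + 1)
    Y = [NEG] * (m + 1)
    for i in range(1, n + 1):
        new_M = [NEG] * (m + 1)
        new_X = [NEG] * (m + 1)
        new_Y = [NEG] * (m + 1)
        # Column 0: the first i query bases are all opposite gaps.
        new_X[0] = gap_open + (i - 1) * gap_extend
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            new_M[j] = max(M[j - 1], X[j - 1], Y[j - 1]) + s
            new_X[j] = max(M[j] + gap_open, Y[j] + gap_open,
                           X[j] + gap_extend)
            new_Y[j] = max(new_M[j - 1] + gap_open, new_X[j - 1] + gap_open,
                           new_Y[j - 1] + gap_extend)
        M, X, Y = new_M, new_X, new_Y
    # Last row: the alignment may stop at any target position for free.
    best = [max(M[j], X[j], Y[j]) for j in range(m + 1)]
    end = max(range(m + 1), key=best.__getitem__)
    return best[end], end


print(semiglobal_affine("ACGT", "TTACGTTT"))  # (8, 6)
```

The memory requirement is linear in the target length because only two rows are kept, while the running time grows with the product of query and target lengths. This is the price of the exhaustive search compared with a seed-based heuristic such as blast.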
In addition to the homology-based annotation, we conducted surveys of evolutionarily conserved RNA secondary structures using RNAz and RNAmicro. Owing to the large evolutionary distance between Trichoplax and other sequenced genomes, however, the sensitivity of these screens was rather low. Nevertheless, a handful of novel ncRNA candidates were found.
Due to the small size and slow growth of T. adhaerens, it is hard, if not impossible, to obtain sufficient amounts of RNA to verify the expression of ncRNA candidates directly by Northern blots. Instead, we used a PCR-based approach introduced by Ro et al. (32), which requires much smaller quantities of RNA. We did not attempt to validate the entire set of predictions but rather selected a small subset, consisting of a few of the homologs detected by GotohScan and a small collection of novel predictions. Due to the small amount of RNA, the sensitivity is still limited. Nevertheless, we unambiguously identified a few previously undescribed Trichoplax ncRNAs, namely U4atac, as a representative of the minor spliceosome, the U3 snoRNA, and a putative novel ncRNA on scaffold 3857.
Our computational annotation of the Trichoplax genome reveals much of the expected ncRNA repertoire. Most ncRNAs are single-copy genes or appear in very small copy numbers. This contrasts with the situation in many of the higher metazoa, for which more detailed ncRNA annotations are available [e.g. Caenorhabditis elegans (67), Drosophila (63,68) and the Rfam-based annotation in mammalian genomes]. In particular, the small copy number of tRNAs and other pol-III transcripts is surprising, since these genes appear in dozens or hundreds of copies in many bilaterian genomes.
The lack of microRNAs is surprising at first glance. While a few orthologous microRNAs, in particular the mir-100 family, are shared between Cnidaria and Bilateria (69,70), we found no trace of these genes in Trichoplax. Nor did we find a homolog of any of the eight sponge microRNAs (66). Our analysis is thus consistent with the recent report based on short RNA sequencing (66) that Trichoplax does not have microRNAs. The continuing expansion of the repertoire of microRNAs and their targets has been associated with both major body-plan innovations and the emergence of phenotypic variation in closely related species (69–73). The microRNA precursors of Cnidaria and Bilateria are imperfectly paired hairpin structures about 80 nt in length. In contrast, the precursors of the recently discovered miRNAs of the sponge A. queenslandica (66) are not orthologous to any of the cnidarian/bilaterian microRNA families and resemble the structurally more diverse and more complex RNAs described in slime molds (74), algae (75,76) and plants (77–79). Under the hypothesis of monophyletic diploblasts, which has recently gained substantial support (5,80), Placozoa have secondarily lost their ability to produce microRNAs, while sponges have secondarily relaxed the constraints on precursor structures. The complete loss of microRNAs in Placozoa is consistent with the morphological simplicity of Trichoplax. Although Argonaute, Dicer and Drosha proteins could be found in Trichoplax, no homolog of Pasha, which partners with Drosha during miRNA biogenesis, was found. Since all these core RNAi proteins, except Pasha, are also involved in non-miRNA-related processes, it is likely that Pasha has been discarded together with the miRNAs in Trichoplax (66).
De novo predictions of evolutionarily conserved RNAs suggest that the Trichoplax genome may have preserved some ncRNAs characteristic of basal metazoans, such as the handful of hairpin structures that are conserved between Trichoplax and Nematostella. We do not know at this point, however, whether these purely computational predictions are expressed in vivo, or what their function might be.
Our survey also misses several ncRNA classes that we should expect to be present in Trichoplax, in particular telomerase RNA, U7 snRNA [which is involved in histone 3′-end processing (81)], the Ro-associated Y RNAs, the RNA components of the vault complex (the Trichoplax genome contains the Major Vault Protein) and possibly also a 7SK RNA. In contrast to microRNAs, however, recent studies have highlighted how difficult it is to identify these particular classes of RNA from genomic DNA: telomerase RNA evolves so rapidly that, despite its size of over 300 nt, it has not been identified so far in any invertebrate species (82). A similarly fast evolution is observed for the 7SK RNA (83,84). Due to their small size and weak sequence constraints, U7 snRNA (85,86), Y RNAs (87,88) and vault RNAs [P. F. Stadler et al. (submitted for publication)] are also largely unknown beyond deuterostomes (in some cases Drosophilids or C. elegans, where homologs were discovered independently). Our failure to find these genes thus most likely points to the limitations of the currently available homology search methodology rather than to the absence of these RNA classes in the Trichoplax genome.
SUPPLEMENTAL INFORMATION
An Electronic Supplement provides a complete set of coordinates of all described putative RNA elements, alignments of snoRNAs, RNase MRP and genomic locations of the snoRNA targets. The data can be accessed in machine readable formats at http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/08-024/.
FUNDING
Deutsche Forschungsgemeinschaft (through the ‘Graduierten-Kolleg Wissensrepräsentation’, University of Leipzig); Austrian Fonds zur Förderung der Wissenschaftlichen Forschung (project P19411 ‘Genomdynamik’); Sixth Framework Programme of the European Union (projects SYNLET and EMBIO); Alexander von Humboldt Foundation and John Templeton Foundations, and an Alexander von Humboldt Research Fellowship (D.dJ.). Funding for open access charge: Deutsche Forschungsgemeinschaft (DFG).
Conflict of interest statement. None declared.
REFERENCES
- 1. Voigt O, Collins AG, Pearse VB, Pearse JS, Ender A, Hadrys H, Schierwater B. Placozoa—no longer a phylum of one. Curr. Biol. 2004;14:R944–R945. doi: 10.1016/j.cub.2004.10.036.
- 2. Syed T, Schierwater B. Trichoplax adhaerens: discovered as a missing link, forgotten as a hydrozoan, re-discovered as a key to metazoan evolution. Vie et Milieu. 2002;52:177–187.
- 3. Collins AG, Cartwright P, McFadden CS, Schierwater B. Phylogenetic context and basal metazoan model systems. Integr. Compar. Biol. 2005;45:585–594. doi: 10.1093/icb/45.4.585.
- 4. Miller DJ, Ball EE. Animal evolution: Trichoplax, trees and taxonomic turmoil. Curr. Biol. 2008;18:R1003–R1005. doi: 10.1016/j.cub.2008.09.016.
- 5. Schierwater B, Eitel M, Jakob W, Osigus H-J, Hadrys H, Dellaporta S, Kolokotronis S-O, DeSalle R. Concatenated molecular and morphological analysis sheds light on early metazoan evolution and fuels a modern ‘Urmetazoon’ hypothesis. PLoS Biol. 2008;7. doi: 10.1371/journal.pbio.1000020.
- 6. Jakob W, Sagasser S, Dellaporta S, Holland P, Kuhn K, Schierwater B. The Trox-2 Hox/ParaHox gene of Trichoplax (Placozoa) marks an epithelial boundary. Dev. Genes Evol. 2004;214:170–175. doi: 10.1007/s00427-004-0390-8.
- 7. Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Carpenter ML, Signorovitch AY, et al. The Trichoplax genome and the nature of placozoans. Nature. 2008;454:955–960. doi: 10.1038/nature07191.
- 8. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
- 9. Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry A, Shapiro H, Lindquist E, Kapitonov V, et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 2007;317:86–94. doi: 10.1126/science.1139158.
- 10. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005;33:D121–D124. doi: 10.1093/nar/gki081.
- 11. He S, Liu C, Skogerbø G, Zhao H, Wang J, Liu T, Bai B, Zhao Y, Chen R. NONCODE v2.0: decoding the non-coding. Nucleic Acids Res. 2008;36:D170–D172. doi: 10.1093/nar/gkm1011.
- 12. Marz M, Kirsten T, Stadler PF. Evolution of spliceosomal snRNA genes in metazoan animals. J. Mol. Evol. 2008 Nov 22 [Epub ahead of print]. doi: 10.1007/s00239-008-9149-6.
- 13. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- 14. Nawrocki EP, Eddy SR. Query-dependent banding for faster RNA similarity searches. PLoS Comp. Biol. 2007;3:e56. doi: 10.1371/journal.pcbi.0030056.
- 15. Mosig A, Chen JL, Stadler PF. Homology search with fragmented nucleic acid sequence patterns. In: Giancarlo R, Hannenhalli S, editors. Algorithms in Bioinformatics (WABI 2007). Vol. 4645 of Lecture Notes in Computer Science. Berlin: Springer Verlag; 2007. pp. 335–345.
- 16. Griffiths-Jones S. RALEE—RNA alignment editor in Emacs. Bioinformatics. 2005;21:257–259. doi: 10.1093/bioinformatics/bth489.
- 17. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer SL, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 1994;125:167–188.
- 18. Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 2002;319:1059–1066. doi: 10.1016/S0022-2836(02)00308-X.
- 19. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF. RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics. 2008;9:474. doi: 10.1186/1471-2105-9-474.
- 20. Bernhart SH, Tafer H, Mückstein U, Flamm C, Stadler PF, Hofacker IL. Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol. Biol. 2006;1:3. doi: 10.1186/1748-7188-1-3.
- 21. Hertel J, Stadler PF. Hairpins in a haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics. 2006;22:e197–e202. doi: 10.1093/bioinformatics/btl257.
- 22. Hertel J, Hofacker IL, Stadler PF. snoReport: computational identification of snoRNAs with unknown targets. Bioinformatics. 2008;24:158–164. doi: 10.1093/bioinformatics/btm464.
- 23. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA. 2005;102:2454–2459. doi: 10.1073/pnas.0409169102.
- 24. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104.
- 25. Rose D, Hertel J, Reiche K, Stadler PF, Hackermüller J. NcDNAlign: plausible multiple alignments of non-protein-coding genomic sequences. Genomics. 2008;92:65–74. doi: 10.1016/j.ygeno.2008.04.003.
- 26. Pearson WR. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics. 1991;11:635–650. doi: 10.1016/0888-7543(91)90071-l.
- 27. Roshan U, Chikkagoudar S, Livesay DR. Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities. BMC Bioinformatics. 2008;9:61. doi: 10.1186/1471-2105-9-61.
- 28. Gotoh O. An improved algorithm for matching biological sequences. J. Mol. Biol. 1982;162:705–708. doi: 10.1016/0022-2836(82)90398-9.
- 29. Basu K, Sriraam N, Richard RJ. A pattern matching approach for the estimation of alignment between any two given DNA sequences. J. Med. Syst. 2007;31:247–253. doi: 10.1007/s10916-007-9062-3.
- 30. Kann MG, Sheetlin SL, Park Y, Bryant SH, Spouge JL. The identification of complete domains within protein sequences using accurate e-values for semi-global alignment. Nucleic Acids Res. 2007;35:4678–4685. doi: 10.1093/nar/gkm414.
- 31. Davis PJ. Gamma function and related function. In: Abramowitz M, Stegun IA, editors. Handbook of Mathematical Functions. Washington, DC: National Bureau of Standards; 1964. pp. 253–266.
- 32. Ro S, Park C, Jin J, Sanders KM, Yan W. A PCR-based method for detection and quantification of small RNAs. Biochem. Biophys. Res. Comm. 2006;351:756–763. doi: 10.1016/j.bbrc.2006.10.105.
- 33. Nazar RN. Ribosomal RNA processing and ribosome biogenesis in eukaryotes. IUBMB Life. 2004;56:457–465. doi: 10.1080/15216540400010867.
- 34. Wainright PO, Hinkle G, Sogin ML, Stickel SK. The monophyletic origins of the metazoa; an unexpected evolutionary link with fungi. Science. 1993;260:340–342. doi: 10.1126/science.8469985.
- 35. Odorico DM, Miller DJ. Internal and external relationships of the Cnidaria: implications of primary and predicted secondary structure of the 5′-end of the 23S-like rDNA. Proc. R. Soc. Lond. B Biol. Sci. 1997;264:77–82. doi: 10.1098/rspb.1997.0011.
- 36. da Silva FB, Muschner V, Bonatto SL. Phylogenetic position of Placozoa based on large subunit (LSU) and small subunit (SSU) rRNA genes. Genet. Mol. Biol. 2007;30:127–132.
- 37. Val'ekho-Roman KM, Bobrova VK, Troitskiî AV, Tsetlin AB, Okshteîn IL. New data on Trichoplax: the nucleotide sequence of 5S rRNA. Dokl Akad Nauk SSSR. 1990;311:500–503.
- 38. Nilsen TW. The spliceosome: the most complex macromolecular machine in the cell? Bioessays. 2003;25:1147–1149. doi: 10.1002/bies.10394.
- 39. Valadkhan S. The spliceosome: caught in a web of shifting interactions. Curr. Opin. Struct. Biol. 2007;17:310–315. doi: 10.1016/j.sbi.2007.05.001.
- 40. Tarn WY, Yario TA, Steitz JA. U12 snRNAs in vertebrates: Evolutionary conservation of 5′ sequences implicated in splicing of premRNAs containing a minor class of introns. RNA. 1995;1:644–656.
- 41. Hernandez N. Small nuclear RNA genes: a model system to study fundamental mechanisms of transcription. J. Biol. Chem. 2001;276:26733–26736. doi: 10.1074/jbc.R100032200.
- 42. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1994. pp. 28–36.
- 43. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34:W369–W373. doi: 10.1093/nar/gkl198.
- 44. Piccinelli P, Rosenblad MA, Samuelsson T. Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 2005;33:4485–4495. doi: 10.1093/nar/gki756.
- 45. Woodhams MD, Stadler PF, Penny D, Collins LJ. RNAse MRP and the RNA processing cascade in the eukaryotic ancestor. BMC Evol. Biol. 2007;7:S13. doi: 10.1186/1471-2148-7-S1-S13.
- 46. Willkomm DK, Hartmann RK. An important piece of the RNase P jigsaw solved. Trends Biochem. Sci. 2007;32:247–250. doi: 10.1016/j.tibs.2007.04.005.
- 47. Nagai K, Oubridge C, Kuglstatter A, Menichelli E, Isel C, Jovine L. Structure, function and evolution of the signal recognition particle. EMBO J. 2003;22:3479–3485. doi: 10.1093/emboj/cdg337.
- 48. Alm Rosenblad M, Gorodkin J, Knudsen B, Zwieb C, Samuelsson T. SRPDB (signal recognition particle database). Nucleic Acids Res. 2003;31:D363–D364. doi: 10.1093/nar/gkg107.
- 49. Bachellerie J-P, Cavaillé J, Hüttenhofer A. The expanding snoRNA world. Biochimie. 2002;84:775–790. doi: 10.1016/s0300-9084(02)01402-5.
- 50. Gerbi SA, Borovjagin AV, Ezrokhi M, Lange TS. Ribosome biogenesis: role of small nucleolar RNA in maturation of eukaryotic rRNA. Cold Spring Harbor Symp. Quant. Biol. 2001;LXVI:575–590. doi: 10.1101/sqb.2001.66.575.
- 51. Marmier-Gourrier N, Cléry A, Senty-Ségault V, Charpentier B, Schlotter F, Leclerc F, Fournier R, Branlant C. A structural, phylogenetic and functional study of 15.5-kD/Snu13 protein binding on U3 small nucleolar RNA. RNA. 2003;9:821–838. doi: 10.1261/rna.2130503.
- 52. Cléry A, Senty-Ségault V, Leclerc F, Raué HA, Branlant C. Analysis of sequence and structural features that identify the B/C motif of U3 small nucleolar RNA as the recognition site for the Snu13p-Rrp9p protein pair. Mol. Cell. Biol. 2007;27:1191–1206. doi: 10.1128/MCB.01287-06.
- 53. Atzorn V, Fragapane P, Kiss T. U17/snR30 is a ubiquitous snoRNA with two conserved sequence motifs essential for 18S rRNA production. Mol. Cell. Biol. 2004;24:1769–1778. doi: 10.1128/MCB.24.4.1769-1778.2004.
- 54. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006;34:D158–D162. doi: 10.1093/nar/gkj002.
- 55. Enright CA, Maxwell ES, Eliceiri GL, Sollner-Webb B. 5′ ETS rRNA processing facilitated by four small RNAs: U14, E3, U17 and U3. RNA. 1996;2:1094–1099.
- 56. Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, et al. Evolutionary patterns of non-coding RNAs. Theory Biosci. 2005;123:301–369. doi: 10.1016/j.thbio.2005.01.002.
- 57. Weber MJ. Mammalian small nucleolar RNAs are mobile genetic elements. PLoS Genet. 2006;2:e205. doi: 10.1371/journal.pgen.0020205.
- 58. Washietl S, Hofacker IL, Lukasser M, Hüttenhofer A, Stadler PF. Mapping of conserved RNA secondary structures predicts thousands of functional non-coding RNAs in the human genome. Nature Biotech. 2005;23:1383–1390. doi: 10.1038/nbt1144.
- 59. Washietl S, Pedersen JS, Korbel JO, Gruber A, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, Stocsits C, Tanzer A, et al. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 2007;17:852–864. doi: 10.1101/gr.5650707.
- 60. Rose D, Jöris J, Hackermüller J, Reiche K, Li Q, Stadler PF. Duplicated RNA genes in teleost fish genomes. J. Bioinf. Comp. Biol. 2008;6:1157–1175. doi: 10.1142/s0219720008003886.
- 61. Missal K, Rose D, Stadler PF. Non-coding RNAs in Ciona intestinalis. Bioinformatics. 2005;21(S2):i77–i78. doi: 10.1093/bioinformatics/bti1113. [Proceedings ECCB/JBI'05, Madrid]
- 62. Missal K, Zhu X, Rose D, Deng W, Skogerbø G, Chen R, Stadler PF. Prediction of structured non-coding RNAs in the genome of the nematode Caenorhabditis elegans. J. Exp. Zool. B Mol. Dev. Evol. 2006;306B:379–392. doi: 10.1002/jez.b.21086.
- 63. Rose RD, Hackermüller J, Washietl S, Findeiß S, Reiche K, Hertel J, Stadler PF, Prohaska SJ. Computational RNomics of drosophilids. BMC Genomics. 2007;8:406. doi: 10.1186/1471-2164-8-406.
- 64. Steigele S, Huber W, Fried C, Stadler PF, Nieselt K. Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions. BMC Biol. 2007;5:25. doi: 10.1186/1741-7007-5-25.
- 65. Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, et al. Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum. Genome Res. 2008;18:281–292. doi: 10.1101/gr.6836108.
- 66. Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N, Degnan AM, Rokhsar DS, Bartel DP. Early origins and evolution of miRNAs and Piwi-interacting RNAs in animals. Nature. 2008;455:1193–1197. doi: 10.1038/nature07415.
- 67. Stricklin SL, Griffiths-Jones S, Eddy SR. C. elegans noncoding RNA genes. WormBook. 2005. http://www.wormbook.org/chapters/www_noncodingRNA/noncodingRNA.html [last accessed October 8, 2008].
- 68. Stark A, Kheradpour P, Parts L, Brennecke J, Hodges E, Hannon GJ, Kellis M. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. 2007;17:1865–1879. doi: 10.1101/gr.6593807.
- 69. Sempere LF, Cole CN, McPeek MA, Peterson KJ. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J. Exp. Zool. B Mol. Dev. Evol. 306B:575–588. doi: 10.1002/jez.b.21118.
- 70. Prochnik SE, Rokhsar DS, Aboobaker AA. Evidence for a microRNA expansion in the bilaterian ancestor. Dev. Genes Evol. 2007;217:73–77. doi: 10.1007/s00427-006-0116-1.
- 71. Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker IL, Stadler PF, The Students of Bioinformatics Computer Labs 2004 and 2005. The expansion of the metazoan microRNA repertoire. BMC Genomics. 2006;7:25. doi: 10.1186/1471-2164-7-25.
- 72. Niwa R, Slack FJ. The evolution of animal microRNA function. Curr. Opin. Gen. Devel. 2007;17:145–150. doi: 10.1016/j.gde.2007.02.004.
- 73. Lee CT, Risom T, Strauss WM. Evolutionary conservation of microRNA regulatory circuits: an examination of microRNA gene complexity and conserved microRNA-target interactions through metazoan phylogeny. DNA Cell Biol. 2007;26:209–218. doi: 10.1089/dna.2006.0545.
- 74. Hinas A, Reimegard J, Wagner EG, Nellen W, Ambros V, Söderbom F. The small RNA repertoire of Dictyostelium discoideum and its regulation by components of the RNAi pathway. Nucleic Acids Res. 2007;35:6714–6726. doi: 10.1093/nar/gkm707.
- 75. Zhao T, Li G, Mi S, Li S, Hannon GJ, Wang XJ, Qi Y. A complex system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii. Genes Dev. 2007;21:1190–1203. doi: 10.1101/gad.1543507.
- 76. Molnár A, Schwach F, Studholme DJ, Thuenemann EC, Baulcombe DC. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature. 2007;447:1126–1129. doi: 10.1038/nature05903.
- 77. Zhang B, Pan X, Cannon CH, Cobb GP, Anderson TA. Conservation and divergence of plant microRNA genes. Plant J. 2006;46:243–259. doi: 10.1111/j.1365-313X.2006.02697.x.
- 78. Axtell MJ, Snyder JA, Bartel DP. Common functions for diverse small RNAs of land plants. Plant Cell. 2007;19:1750–1769. doi: 10.1105/tpc.107.051706.
- 79. Sunkar R, Jagadeeswaran G. In silico identification of conserved microRNAs in large number of diverse plant species. BMC Plant Biol. 2008;8:37. doi: 10.1186/1471-2229-8-37.
- 80. Dunn CW, Hejnol A, Matus DQ, Pang K, Browne W, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, et al. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature. 2008;452:745–749. doi: 10.1038/nature06614.
- 81. Marzluff WF. Metazoan replication-dependent histone mRNAs: a distinct set of RNA polymerase II transcripts. Curr. Opin. Cell. Biol. 2005;17:274–280. doi: 10.1016/j.ceb.2005.04.010.
- 82. Xie M, Mosig A, Qi X, Li Y, Stadler PF, Chen JJ-L. Size variation and structural conservation of vertebrate telomerase RNA. J. Biol. Chem. 2008;283:2049–2059. doi: 10.1074/jbc.M708032200.
- 83. Gruber AR, Koper-Emde D, Marz M, Tafer H, Bernhart S, Obernosterer G, Mosig A, Hofacker IL, Stadler PF, Benecke B-J. Invertebrate 7SK snRNAs. J. Mol. Evol. 2008;66:107–115. doi: 10.1007/s00239-007-9052-6.
- 84. Gruber A, Kilgus C, Mosig A, Hofacker IL, Hennig W, Stadler PF. Arthropod 7SK RNA. Mol. Biol. Evol. 2008;25:1923–1930. doi: 10.1093/molbev/msn140.
- 85. Marz M, Mosig A, Stadler BMR, Stadler PF. U7 snRNAs: A computational survey. Geno. Prot. Bioinf. 2007;5:187–195. doi: 10.1016/S1672-0229(08)60006-6.
- 86. López MD, Samuelsson T. Early evolution of histone mRNA 3′ end processing. RNA. 2008;14:1–10. doi: 10.1261/rna.782308.
- 87. Mosig A, Guofeng M, Stadler BMR, Stadler PF. Evolution of the vertebrate Y RNA cluster. Theory Biosci. 2007;126:9–14. doi: 10.1007/s12064-007-0003-y.
- 88. Perreault J, Perreault J-P, Boire G. The Ro associated Y RNAs in metazoans: evolution and diversification. Mol. Biol. Evol. 24:1678–1689. doi: 10.1093/molbev/msm084.







