ABSTRACT
We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignment, structural pairwise sequence alignment and structural motif finding are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with information on evaluation and downloading. This website is freely available to all users and there is no login requirement.
INTRODUCTION
Comparisons, alignments and motif identification are essential procedures for extracting valuable information from biological sequences. Many effective software tools for these purposes are available for use with amino acid and DNA sequences, but their effectiveness for RNA sequences is limited because they do not account for possible secondary structures. Practical analyses of multiple RNA sequences in light of their secondary structures have been difficult because of their extremely high computational costs. Nevertheless, several algorithms have been proposed, and a few websites offer software tools that support structure-based analyses of RNA sequences, e.g. Vienna RNA Package (http://www.tbi.univie.ac.at/~ivo/RNA/), Sfold (http://sfold.wadsworth.org) and BiBiServ (http://bibiserv.techfak.uni-bielefeld.de/).
Recent progress in RNA sequence analysis has created a demand for rapid and accurate structure-based analyses of multiple RNA sequences. To this end, we have developed several software tools for comparison (1,2), alignment (3–8) and motif identification (9) of multiple RNA sequences; searches for conserved miRNAs (10); prediction of common secondary structures from multiple sequence alignments (11) and calculation of base-pairing probabilities for long sequences (12). Using these software tools, we have developed an integrated web server and stand-alone web servers (software.ncrna.org) that support multiple alignment, pairwise alignment and extraction of structural motifs of RNA sequences.
METHODS
The integrated web server and the stand-alone web servers we developed offer three types of RNA sequence analyses based on common potential secondary structures: pairwise alignment, multiple alignment and structural motif extraction. SCARNA, PHMMTS (pair hidden Markov models on tree structures), PSTAG (pair stochastic tree adjoining grammar), Murlet and MXSCARNA can be used on their stand-alone web servers as well as on our integrated server. In addition, the source code for PHMMTS, PSTAG, Murlet and MXSCARNA is available for download. Brief introductions to these tools follow; detailed evaluations are described in refs (3–9), and summaries are given on the website (http://software.ncrna.org).
Pairwise alignment
SCARNA (3) is a rapid structural pairwise alignment tool for RNA sequences of unknown secondary structure. This program separately aligns the 5′ and 3′ regions of stem candidates, which are extracted from each RNA sequence in light of base-pairing probabilities (12,13), by use of an engineered DP algorithm that incorporates rough consideration of consistency. We compared SCARNA with several other alignment tools by using Gardner's benchmark dataset (14) and a dataset comprising 5S ribosomal RNA, 5.8S ribosomal RNA and Hammerhead ribozyme from the Rfam database (15). The alignment accuracies of SCARNA for sequences with low similarities were not as high as those of programs that evaluate secondary structures more strictly, e.g. Foldalign (16), Dynalign (17) and PMcomp (18). However, the computational speed of SCARNA was approximately one order of magnitude faster (i.e. <1 min for 1000 bases) and allowed alignment of sequences longer than 1000 bases.
PHMMTS (4,5) and PSTAG (6) are tools for aligning RNA sequences of unknown secondary structure to RNA sequences with known secondary structure. PHMMTS evaluates only pseudoknot-free structures, whereas PSTAG can accept pseudoknotted structures. When compared with ClustalW (19) by using tRNA and Hammerhead ribozyme datasets, PHMMTS was more accurate in regard to correct assignment of secondary structures. In a comparison with PHMMTS and ClustalW by using RNA sequences of HDV_ribozyme, an RNA family in PseudoBase (20) that includes pseudoknotted structures, PSTAG was more accurate in correct assignment of secondary structures.
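The distinction between pseudoknot-free and pseudoknotted structures can be made concrete with a small check. The following is a minimal sketch, not code from PHMMTS or PSTAG: it assumes a structure is given as a list of 0-based base pairs (i, j), and the function name `has_pseudoknot` is hypothetical. Two pairs (i, j) and (k, l) cross, and therefore form a pseudoknot, when i < k < j < l.

```python
from itertools import combinations

def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Normalise each pair so that the 5' index comes first, then sort.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        # After sorting, i <= k. The pairs cross when k lies inside (i, j)
        # while its partner l lies outside.
        if i < k < j < l:
            return True
    return False

nested = [(0, 9), (1, 8), (2, 7)]             # a simple hairpin stem
crossing = [(0, 6), (1, 5), (3, 10), (4, 9)]  # two stems that interleave

print(has_pseudoknot(nested))    # False
print(has_pseudoknot(crossing))  # True
```

Under this definition, a template structure for which the check returns True falls outside what a pseudoknot-free model can represent.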
Multiple alignment
Murlet (7) and MXSCARNA (8) are structural multiple alignment tools for RNA sequences. Murlet is based on pair SCFG (stochastic context-free grammar), has dramatically reduced computational costs, and is applicable to RNA sequences as long as 300 bases. MXSCARNA is an extension of SCARNA that offers progressive alignment and is applicable to RNA sequences as long as 5000 bases; however, its accuracy for sequences longer than 500 bases has not been confirmed, and the web server restricts input lengths to 1000 bases. We validated Murlet and MXSCARNA by using the BRAlibaseII benchmark dataset (14) and the dataset of Kiryu et al. (7). Both tools showed accuracies in SPS (sum-of-pairs score) comparable with those of ProbCons (21). The accuracies of the predicted common secondary structures were evaluated by MCC (Matthews correlation coefficient), and both tools showed accuracy comparable with that of Stemloc (22).
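For readers unfamiliar with the two measures, the sketch below shows one common way to compute them; it is an illustration under stated assumptions, not the evaluation code used in refs (7,8). SPS is taken here as the fraction of residue pairs aligned in the reference alignment that are also aligned in the predicted one. MCC is computed from base pairs, with true negatives counted over all L(L-1)/2 possible pairs of a sequence of length L; other conventions for the true negatives exist. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def aligned_pairs(alignment):
    """Collect residue pairs placed in the same column of a gapped alignment."""
    counters = [0] * len(alignment)  # ungapped position reached in each row
    pairs = set()
    for column in zip(*alignment):
        present = [(row, counters[row]) for row, ch in enumerate(column) if ch != "-"]
        pairs.update(combinations(present, 2))
        for row, _ in present:
            counters[row] += 1
    return pairs

def sum_of_pairs_score(reference, predicted):
    """Fraction of reference residue pairs recovered by the predicted alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(predicted)) / len(ref)

def mcc(reference_pairs, predicted_pairs, length):
    """Matthews correlation coefficient over base pairs (i, j) with i < j."""
    ref, pred = set(reference_pairs), set(predicted_pairs)
    tp = len(ref & pred)
    fp = len(pred - ref)
    fn = len(ref - pred)
    # Assumption: every other possible pair counts as a true negative.
    tn = length * (length - 1) // 2 - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

reference = ["GGA-UCC", "GGAAUCC"]
predicted = ["GGAU-CC", "GGAAUCC"]
print(sum_of_pairs_score(reference, predicted))

print(mcc([(0, 9), (1, 8), (2, 7)], [(0, 9), (1, 8), (3, 6)], length=10))
```

In the toy alignment, five of the six reference residue pairs are recovered; in the toy structure, two of the three reference base pairs are predicted correctly.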
Motif extraction
RNAmine (9) is a tool for extraction of structural motifs. This program uses a graph-mining technique to identify local sequences with frequent stem patterns from among a set of RNA sequences. RNAmine is currently available only on the integrated web server.
Additional tools
In addition to the six software programs described, the tools SOKOS/CAN (1), Stem Kernel (2), miRRim (10), McCaskill-MEA (11) and Rfold (12) are available for download. Although pairwise alignment is the standard method, kernels can be used as similarity measures in an alternative approach for comparing two biological sequences. SOKOS/CAN and Stem Kernel are tools for sequence comparison, both of which use features of the potential secondary structures to calculate the kernel function. SOKOS/CAN calculates the marginalized kernel on SCFG, and Stem Kernel compares the sequences by a kernel based on all possible stem patterns.
Predicting non-coding RNAs is difficult because general characteristic sequence patterns are not known. For specific families of non-coding RNAs, however, realistic predictions are possible. We developed miRRim (10) as a tool for finding conserved miRNAs.
McCaskill-MEA (11) is a method used to predict consensus secondary structures from given multiple alignments. Rfold (12) is a tool for calculating the local base-pairing probabilities without using sliding windows; it is based on the full energy model of the Vienna RNA Package (23).
DESCRIPTION OF SERVICES
Table 1 shows a list of software tools available at http://software.ncrna.org/.
Table 1.
Software tool | Function | Pseudo-knot | Download | Integrated web server: max. no. of seq. | Integrated web server: max. length | Stand-alone web server |
---|---|---|---|---|---|---|
SCARNA | Pairwise alignment | Yes | N/A | 5 | 1000 | Yes |
PHMMTS | Pairwise alignment (to known structure) | No | C++ source | 5 | 1000 | Yes |
PSTAG | Pairwise alignment (to known structure) | Yes | C++ source | 5 | 70 | Yes |
MXSCARNA | Multiple alignment | No | C++ source | 10 | 1000 | Yes |
Murlet | Multiple alignment | No | C++ source | 5 | 300 | Yes |
RNAmine | Motif extraction | No | contact | 10 | 500 | No |
SOKOS/CAN | Sequence comparison | No | C source | N/A | N/A | No |
Stem Kernel | Sequence comparison | Yes | C++ source | N/A | N/A | No |
miRRim | miRNA finding | No | Source script | N/A | N/A | No |
McCaskill-MEA | Common secondary structure prediction | No | C++ source | N/A | N/A | No |
Rfold | Base pairing probabilities | No | C++ source | N/A | N/A | No |
The integrated web server and the stand-alone web servers offer web interfaces for use of SCARNA, PHMMTS, PSTAG, Murlet, MXSCARNA and RNAmine. On the integrated web server, users can select one of the service types: multiple alignment, pairwise alignment or structural motif extraction. On the menu for ‘multiple alignment’, users can select either Murlet or MXSCARNA. Either direct input or uploading of a file of RNA sequences in multi-FASTA format is accepted. The server outputs a multiple alignment with annotations of the predicted common secondary structure, a figure of the structure and a phylogenetic tree of the sequences.
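Before submitting a multi-FASTA file, it can be useful to check it locally against the limits of the integrated web server listed in Table 1. The following is a minimal sketch of such a check; the parser and the function names are hypothetical and are not part of the servers, and how the server itself reacts to oversized input is not described here.

```python
# Limits of the integrated web server, copied from Table 1:
# tool -> (max. number of sequences, max. sequence length)
LIMITS = {
    "SCARNA": (5, 1000),
    "PHMMTS": (5, 1000),
    "PSTAG": (5, 70),
    "MXSCARNA": (10, 1000),
    "Murlet": (5, 300),
    "RNAmine": (10, 500),
}

def parse_multi_fasta(text):
    """Return a list of (name, sequence) tuples from multi-FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def check_limits(records, tool):
    """Return a list of problems; an empty list means the input fits."""
    max_count, max_length = LIMITS[tool]
    problems = []
    if len(records) > max_count:
        problems.append(f"{len(records)} sequences exceed the limit of {max_count}")
    for name, seq in records:
        if len(seq) > max_length:
            problems.append(f"{name}: {len(seq)} bases exceed the limit of {max_length}")
    return problems

example = ">seq1\nGGGAAAUCC\nGGA\n>seq2\nGCGCUUCGGCGC\n"
records = parse_multi_fasta(example)
print(records)
print(check_limits(records, "Murlet"))
```

Sequences split over several lines are joined, so the length check applies to the full sequence rather than to individual lines.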
On the menu for ‘pairwise alignment’, users can select SCARNA, PHMMTS or PSTAG. In SCARNA, either direct input or uploading of a file of RNA sequences in multi-FASTA format is accepted. The server outputs a pairwise alignment with annotations of the predicted common secondary structures and a figure of the structure that includes the two aligned sequences. PHMMTS and PSTAG accept either direct input or uploading of a file of RNA sequences of unknown secondary structures in multi-FASTA format as query sequences, and direct input of an RNA sequence of known secondary structure and its secondary structure in dot-bracket format as the template structure. The server outputs the result of alignments of the query sequences to the template structure with annotations of the secondary structures and the same kind of figures of the structures as SCARNA.
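In dot-bracket format, each position of the sequence is annotated with "(" or ")" if it is paired and with "." if it is unpaired, and matching parentheses mark the two partners of a base pair. The sketch below converts such a string into a list of base pairs with a stack; the function name is hypothetical. It handles plain parentheses only, since the notation expected for pseudoknotted templates in PSTAG is not specified here.

```python
def parse_dot_bracket(structure):
    """Convert a dot-bracket string into a sorted list of 0-based base pairs."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # remember the 5' partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

sequence = "GGGAAAUCCC"
structure = "(((....)))"
assert len(sequence) == len(structure)  # template and structure must align

for i, j in parse_dot_bracket(structure):
    print(i, j, sequence[i] + "-" + sequence[j])
# prints:
# 0 9 G-C
# 1 8 G-C
# 2 7 G-C
```

A check that the sequence and its structure string have equal length, as in the example, catches a common input mistake before submission.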
On the ‘structural motif extraction’ menu, users can select RNAmine, which accepts either direct input or uploading of a file of RNA sequences in multi-FASTA format. The server outputs the extracted motifs as abstract figures of the secondary structures and the list of the members by sequence name and positions. Detailed figures and the structure-annotated sequence are linked to the members.
For each of the software tools described, after the server outputs the results, the user can continue to a homology search of the RNA sequences against various genomes by BLAT. Search hits are linked to the UCSC GenomeBrowser for functional RNAs (24).
FUTURE PLANS
In addition to web servers, web services for sequence analysis tools are desirable. We have already developed SOAP-based web services for Murlet, MXSCARNA, PHMMTS and PSTAG. The services will start shortly.
CONCLUSION
We have developed web servers for analysis of RNA sequences in light of their secondary structures. The web servers offer six software tools for multiple sequence alignment, pairwise alignment and extraction of structural motifs of RNA sequences. These servers run at practical speeds on tasks that have been thought to require high computational costs.
ACKNOWLEDGEMENTS
This work was supported in part by the Functional RNA Project funded by the New Energy and Industrial Technology Development Organization (NEDO) of Japan and by a Grant-in-Aid for Scientific Research on Priority Areas (Comparative Genomics) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. The authors thank Ivo Hofacker, the author of Vienna RNA package, and Chuong Do, the author of ProbCons, because some of our programs include their parameters and/or source codes. The authors thank the Japan Biological Informatics Consortium (JBIC) for its support through the Functional RNA Project and our colleagues in the Computational Biology Research Center (CBRC) for their useful discussions. Funding to pay the Open Access publication charges for this article was provided by Grant-in-Aid for Scientific Research on Priority Areas (Comparative Genomics).
Conflict of interest statement. None declared.
REFERENCES
- 1. Kin T, Tsuda K, Asai K. Marginalized kernels for RNA sequence data analysis. Genome Inform. 2002;13:112–122.
- 2. Sakakibara Y, Popendorf K, Ogawa N, Asai K, Sato K. Stem kernels for RNA sequence analyses. J. Bioinform. Comput. Biol. 2007;5:1103–1122. doi: 10.1142/s0219720007003028.
- 3. Tabei Y, Tsuda K, Kin T, Asai K. SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments. Bioinformatics. 2006;22:1723–1729. doi: 10.1093/bioinformatics/btl177.
- 4. Sakakibara Y. Pair hidden Markov models on tree structures. Bioinformatics. 2003;19:i232–i240. doi: 10.1093/bioinformatics/btg1032.
- 5. Sato K, Sakakibara Y. RNA secondary structural alignment with conditional random fields. Bioinformatics. 2005;21:ii237–ii242. doi: 10.1093/bioinformatics/bti1139.
- 6. Matsui H, Sato K, Sakakibara Y. Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Bioinformatics. 2005;21:2611–2617. doi: 10.1093/bioinformatics/bti385.
- 7. Kiryu H, Tabei Y, Kin T, Asai K. Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics. 2007;23:1588–1598. doi: 10.1093/bioinformatics/btm146.
- 8. Tabei Y, Kiryu H, Kin T, Asai K. A fast structural multiple alignment method for long RNA sequences. BMC Bioinform. 2008;9:33. doi: 10.1186/1471-2105-9-33.
- 9. Hamada M, Tsuda K, Kudo T, Kin T, Asai K. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics. 2006;22:2480–2487. doi: 10.1093/bioinformatics/btl431.
- 10. Terai G, Komori T, Asai K, Kin T. miRRim: a novel system to find conserved miRNAs with high sensitivity and specificity. RNA. 2007;13:2081–2090. doi: 10.1261/rna.655107.
- 11. Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics. 2007;23:434–441. doi: 10.1093/bioinformatics/btl636.
- 12. Kiryu H, Kin T, Asai K. Rfold: an exact algorithm for computing local base pairing probabilities. Bioinformatics. 2008;24:367–373. doi: 10.1093/bioinformatics/btm591.
- 13. McCaskill J. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990;29:1105–1119. doi: 10.1002/bip.360290621.
- 14. Gardner P, Wilm A, Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. 2005;33:2433–2439. doi: 10.1093/nar/gki541.
- 15. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441. doi: 10.1093/nar/gkg006.
- 16. Torarinsson E, Havgaard JH, Gorodkin J. Multiple structural alignment and clustering of RNA sequences. Bioinformatics. 2007;23:926–932. doi: 10.1093/bioinformatics/btm049.
- 17. Harmanci AO, Sharma G, Mathews DH. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics. 2007;8:130. doi: 10.1186/1471-2105-8-130.
- 18. Hofacker IL, Bernhart SH, Stadler PF. Alignment of RNA base pairing probability matrices. Bioinformatics. 2004;20:2222–2227. doi: 10.1093/bioinformatics/bth229.
- 19. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- 20. van Batenburg FHD, Gultyaev AP, Pleij WA, Ng J, Oliehoek J. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Res. 2000;28:201–204. doi: 10.1093/nar/28.1.201.
- 21. Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Research. 2005;15:330–340. doi: 10.1101/gr.2821705.
- 22. Holmes I. Accelerated probabilistic inference of RNA structure evolution. BMC Bioinform. 2005;6:73. doi: 10.1186/1471-2105-6-73.
- 23. Hofacker I. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. doi: 10.1093/nar/gkg599.
- 24. Kin T, Yamada K, Terai G, Okida H, Yoshinari Y, Ono Y, Kojima A, Kimura Y, Komori T, Asai K. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. 2006;35:D145–D148. doi: 10.1093/nar/gkl837.