Abstract
The structures of many non-coding RNAs (ncRNAs) are conserved by evolution to a greater extent than their sequences. By predicting the conserved structure of two or more homologous sequences, the accuracy of secondary structure prediction can be improved as compared to structure prediction for a single sequence. This unit provides protocols for the use of four programs in the RNAstructure suite for prediction of conserved structures: Multilign, TurboFold, Dynalign, and PARTS. These programs can be run through web servers, on the command line, or with graphical interfaces.
Keywords: Bioinformatics, RNA Analysis, RNA, Structure and Folding, Informatics
RNAstructure is a software package for RNA secondary structure prediction and analysis. It contains four programs for predicting conserved RNA secondary structures: Multilign (Xu and Mathews, 2011), TurboFold (Harmanci et al., 2011), Dynalign (Mathews and Turner, 2002), and PARTS (Harmanci et al., 2008). Multilign and TurboFold are designed for use with three or more homologous sequences. Dynalign and PARTS are designed to work on two sequences. All four programs are available as web servers, as command line interfaces, and as source code (Bellaousov et al., 2013; Reuter and Mathews, 2010). Graphical interfaces are available for Dynalign, Multilign, and TurboFold. Basic Protocol 1 is the prediction of conserved structures for three or more sequences with the web servers. Basic Protocol 2 is the prediction of conserved structures for two sequences with the web servers. Alternate Protocol 1 is the prediction of conserved structures for three or more sequences using TurboFold with the graphical user interface. Alternate Protocol 2 is the prediction of a conserved structure in two sequences using Dynalign with the graphical user interface.
Basic Protocol 1: Predicting a structure conserved in three or more sequences with the RNAstructure web server
This protocol details the use of the RNAstructure web server to predict the conserved structure in three or more sequences (Bellaousov et al., 2013). It assumes basic familiarity with using the World Wide Web and web browsers. An example is drawn from three tRNA sequences, obtained from the Sprinzl tRNA database (Sprinzl and Vassilenko, 2005).
This protocol is suitable for use when three or more homologous sequences are available. The web server uses both of the two methods available in RNAstructure to predict conserved structures with three or more sequences. The two predictions can provide alternative hypotheses for the conserved structure.
Necessary Resources
Software
A web browser is required for accessing the RNAstructure web servers.
Connect to the web server and submit sequences
1. Point a browser to http://rna.urmc.rochester.edu/RNAstructureWeb/, the RNAstructure web server.
2. Choose the link “Predict a Secondary Structure Common to Three or More Sequences”.
The RNAstructure web servers are organized around two types of themes: biological problems and programs. The “Predict a Secondary Structure Common to Three or More Sequences” server is a problem-themed server, and it runs two programs, Multilign (Xu and Mathews, 2011) and TurboFold (Harmanci et al., 2011). Farther down the main server page, the individual Multilign and TurboFold servers can be chosen. The problem-themed servers are generally more popular and are convenient to use because they provide alternative hypotheses for the structure. The program servers, however, complete their calculations in less time.
Enter the sequences
3. Sequences are entered using a multiple sequence FASTA format (Figure 12.4.1). Sequences can be entered for the webserver either by file upload or by pasting the sequences into a text box. Figure 12.4.2 shows a screen shot of the web server form.
Example sequences can also be displayed by clicking on the link directly above the text box labeled “click here to add example sequences to the box.” This pastes the tRNA sequences RA7680, RD0260, and RD0500 (Sprinzl and Vassilenko, 2005). These sequences are used for the example shown here. For the webservers, there are limits to the number and lengths of sequences that can be uploaded. The limits are detailed at: http://rna.urmc.rochester.edu/RNAstructureWeb/Information/Limitations.html. As of this writing, up to ten sequences can be entered and each sequence must be 500 nucleotides or shorter. These limits are designed to provide users with a reasonable rate of throughput, and they might change as dictated by server demand or as hardware is upgraded.
Note that lowercase nucleotides are not allowed to pair in structure prediction. It is therefore important that most nucleotides be uppercase. For the tRNA sequences used in this example, lowercase nucleotides are modified nucleotides that cannot be accommodated in a helix (Mathews et al., 1999).
Figure 12.4.1.

The RNAstructure multiple sequence FASTA format. Sequences are uploaded to the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” web server in multiple sequence FASTA format. For each sequence, the first line, a title line, needs to start with “>”. Subsequent lines for each sequence should only contain sequence and whitespace, which is ignored. Lowercase nucleotides will be forced single stranded in structure prediction. X can also be used to indicate a nucleotide that neither pairs nor stacks. Note that “T” is treated as “U”. Each subsequent sequence also begins with a title line starting with “>”, which indicates that a new sequence is starting.
Figure 12.4.2.
A screen shot of the web form for the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” web server. Clicking “Click here to add example sequences to the box” will paste the example sequences used here.
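The multiple sequence FASTA format of Figure 12.4.1 and the server limits quoted under step 3 can be checked before submission. The following is a minimal sketch, not part of RNAstructure: the function names are hypothetical, the limits (ten sequences, 500 nucleotides each) are the values quoted in this unit and may have changed, and the "most nucleotides lowercase" warning uses an arbitrary 50% cutoff chosen only for illustration.

```python
from typing import List, Tuple

MAX_SEQUENCES = 10  # limit quoted in this unit; the server's limits may change
MAX_LENGTH = 500    # nucleotides per sequence, same caveat


def parse_multi_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (title, sequence) pairs from multiple sequence FASTA text."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A title line starts a new sequence.
            records.append((line[1:].strip(), []))
        elif records:
            # Whitespace inside sequence lines is ignored.
            records[-1][1].append("".join(line.split()))
        else:
            raise ValueError("sequence data found before the first '>' title line")
    return [(title, "".join(parts)) for title, parts in records]


def check_server_limits(records: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if len(records) < 3:
        problems.append("fewer than three sequences")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"more than {MAX_SEQUENCES} sequences")
    for title, seq in records:
        if len(seq) > MAX_LENGTH:
            problems.append(f"{title}: {len(seq)} nt exceeds {MAX_LENGTH}")
        # Lowercase nucleotides are forced single stranded (illustrative cutoff).
        lowercase = sum(1 for base in seq if base.islower())
        if seq and lowercase > len(seq) / 2:
            problems.append(f"{title}: most nucleotides are lowercase and cannot pair")
    return problems
```

A typical use would be to read the FASTA file, call `parse_multi_fasta`, and inspect the list returned by `check_server_limits` before uploading.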
Select options
4. The first adjustable option is the absolute temperature for structure prediction. By default, structure prediction is performed for folding at 37 °C, i.e. 310.15 K.
Structure prediction assumes that structures have folded to their lowest free energy structure, i.e. that they have reached equilibrium. To determine folding stability, a set of nearest neighbor free energy parameters is used (Lu et al., 2006; Mathews et al., 2004; Mathews et al., 1999; Xia et al., 1998). These parameters are most accurate at 310.15 K. They are also accurate between 293 and 333 K (Lu et al., 2006). For most calculations, it is reasonable to choose the default temperature, even if one or more sequences are from an organism that lives at a different temperature.
5. Next, options for Multilign can be changed.
For most calculations, the default parameters are the best choice, and these parameters were used for accuracy benchmarks (Xu and Mathews, 2011). A complete explanation of the parameters is available through the online help, which can be reached by clicking the link at “If you need specific help using the Predict a Secondary Structure Common to Three or More Sequences server, please click here.” This appears immediately above the sequence entry portion of the web form.
6. Next, TurboFold options can be changed.
The first option is the “Folding Mode.” Three modes are available: Maximum Expected Accuracy, Pseudoknots, or Threshold. Maximum Expected Accuracy, the default mode, provides a reasonable balance between predicting complete structures and avoiding the prediction of incorrect pairs (Harmanci et al., 2011). Pseudoknots mode can predict pseudoknots, a type of secondary structure that is difficult to predict (Liu et al., 2010; Seetin and Mathews, 2012b). This is the only mode that can predict pseudoknots, but it tends to predict more incorrect pairs than the other two modes. The last mode, Threshold, predicts fewer pairs, but the pairs are more likely to be correctly predicted (Harmanci et al., 2011).
The next parameters are set to default values, which are the best parameters for most predictions. A complete explanation of the parameters is available through the online help, which can be reached by clicking the link at “If you need specific help using the Predict a Secondary Structure Common to Three or More Sequences server, please click here.” This appears immediately above the sequence entry portion of the web form.
Enter an email address and start the calculation
7. Optionally, an email address can be provided. If an email address is provided, email is sent when the calculation is complete, and this email contains a link to the location of the results. If an email address is not provided, it is imperative that the browser window remain open to the web server after the calculation is started to make sure the results remain accessible.
8. Start the calculation by clicking “Submit Query.” The web server will now move to the results page. This page refreshes automatically until the calculation is complete, and the results can be displayed.
Examine and download the results
9. On the results page, the results from Multilign are displayed above TurboFold. First, the alignment generated by Multilign is available for download as a plain text file, at the link labeled “Click here to download the alignment file.”
10. Next, structures are displayed for each sequence. Using the web form example, these are structures for each of the three sequences, and a screenshot is shown in Figure 12.4.3.
For structure display, an SVG graphic is displayed to indicate nucleotides in canonical base pairs. When there are alternative structures, called suboptimal structures, buttons labeled “Previous” and “Next” are available to change the currently displayed structure. Immediately beneath the structure display, all structures are available for download as Adobe pdf, Adobe Postscript (PS), or ct format, which is a plain text format that indicates the locations of base pairs (Figure 12.4.4). Then, individual structures are available for download as Adobe pdf, SVG, Adobe Postscript (PS), jpeg, or ct format.
Adobe pdf and Adobe Postscript are vector formats, and can therefore be edited using drawing tools, such as Inkscape or Adobe Illustrator. SVG is another vector graphics format, but is generally designed for web display. Jpeg is a raster image format, and it cannot be easily modified or scaled.
For Multilign, the folding free energy for each predicted structure is provided as the ENERGY at the bottom of the drawing. These folding free energies are in kcal/mol. For TurboFold, scores are provided for each structure. When using the default parameters, Maximum Expected Accuracy mode and Maximum Expected Accuracy Gamma of one, the score is twice the sum of the pair probabilities for all pairs, plus the sum of the probability for being unpaired for all single-stranded nucleotides.
11. Also provided after each program is the command line that was executed by the server to run the program. Clicking the name of the program links to the help pages for the command line program. Clicking the name of the configuration file, a plain text file, links to the plain text file that specified the input parameters. This information is useful for learning how to run the programs on the command line.
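The TurboFold score described under step 10 (default Maximum Expected Accuracy mode with a gamma of one) can be illustrated with a short calculation. This is a sketch of the arithmetic only, under the assumption that pair and unpaired probabilities are already available as dictionaries; the function name and data layout are hypothetical and do not reflect how TurboFold stores probabilities internally.

```python
from typing import Dict, Iterable, Tuple


def mea_score_gamma_one(
    pairs: Iterable[Tuple[int, int]],
    pair_prob: Dict[Tuple[int, int], float],
    unpaired_prob: Dict[int, float],
    length: int,
) -> float:
    """Twice the summed pair probabilities plus the summed unpaired probabilities.

    pairs         : base pairs (i, j) in the predicted structure, 1-indexed
    pair_prob     : probability of each predicted pair
    unpaired_prob : probability of being unpaired, for each nucleotide index
    length        : number of nucleotides in the sequence
    """
    paired = set()
    score = 0.0
    for i, j in pairs:
        score += 2.0 * pair_prob[(i, j)]
        paired.update((i, j))
    # Every nucleotide not in a predicted pair contributes its unpaired probability.
    for k in range(1, length + 1):
        if k not in paired:
            score += unpaired_prob[k]
    return score
```

For a four-nucleotide toy input with one pair of probability 0.9 and two unpaired nucleotides with unpaired probabilities 0.8 and 0.7, the score is 2 × 0.9 + 0.8 + 0.7.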
Figure 12.4.3.
A screen shot of the results page for the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” web server. The top link, “Click here to download the alignment file,” provides the plain text alignment file produced by Multilign. The drawing at the top of the page, shown here, is the Multilign structure prediction for the first sequence. Below are the structure predictions for the remaining sequences, and then the structure predictions by TurboFold.
Figure 12.4.4.
The RNAstructure ct file format. A ct (connectivity table) file contains secondary structure information for a sequence. The format used by RNAstructure is as follows. The start of the first line is the number of nucleotides in the sequence. The end of the first line is the title of the structure. Each of the following lines provides information about a given base in the sequence. Each base has its own line, with these elements in order: nucleotide number (starting with 1), base (A, C, G, T, U, X), the nucleotide connection in the 5′ direction, the nucleotide connection in the 3′ direction, number of the base to which the current nucleotide is paired (no pairing is indicated by 0, zero), and natural numbering (this will be the nucleotide index repeated for the calculations described in this unit). The ct file may hold multiple structures for a single sequence. This is done by repeating the format for each structure without blank lines between structures. The example shown here is the structure predicted for RA7680 by the “Predict a Secondary Structure Common to Three or More Sequences” webserver, as illustrated in Basic Protocol 1. “…” indicates parts of the ct file not displayed in the figure.
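The ct layout described in Figure 12.4.4 is simple enough to read with a few lines of code. The following is a minimal sketch of a reader written from that description alone; the names `CtStructure` and `parse_ct` are hypothetical, it is not the parser used by RNAstructure, and it does no validation beyond checking that each structure has the declared number of lines.

```python
from typing import Dict, List, NamedTuple


class CtStructure(NamedTuple):
    title: str
    sequence: str
    pairs: Dict[int, int]  # nucleotide index -> partner index, paired bases only


def parse_ct(text: str) -> List[CtStructure]:
    """Read every structure from ct-formatted text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    structures: List[CtStructure] = []
    pos = 0
    while pos < len(lines):
        # Header line: nucleotide count first, the rest is the title.
        header = lines[pos].split(None, 1)
        n = int(header[0])
        title = header[1].strip() if len(header) > 1 else ""
        bases: List[str] = []
        pairs: Dict[int, int] = {}
        for row in lines[pos + 1 : pos + 1 + n]:
            fields = row.split()
            index, base, partner = int(fields[0]), fields[1], int(fields[4])
            bases.append(base)
            if partner != 0:  # 0 means the nucleotide is unpaired
                pairs[index] = partner
        if len(bases) != n:
            raise ValueError(f"structure '{title}' is truncated")
        structures.append(CtStructure(title, "".join(bases), pairs))
        pos += 1 + n  # the next structure, if any, follows immediately
    return structures
```

Because paired bases appear in both directions in the table, a pair between nucleotides i and j is stored under both keys; filtering for `i < j` yields each pair once.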
Basic Protocol 2: Predicting a structure conserved in two sequences with the RNAstructure web server
This protocol details the use of the RNAstructure web server to predict the structure conserved in two sequences (Bellaousov et al., 2013). It assumes basic familiarity with using the World Wide Web and web browsers. An example is drawn from tRNA sequences, obtained from the Sprinzl tRNA database (Sprinzl and Vassilenko, 2005).
This protocol is suitable for use when two homologous sequences are available. The web server runs the two programs in RNAstructure that predict the conserved structure in two homologous sequences. The results from the two programs are alternative hypotheses for the structure.
Necessary Resources
Software
A web browser is required for accessing the RNAstructure web servers.
Connect to the web server and submit sequences
1. Point a browser to http://rna.urmc.rochester.edu/RNAstructureWeb/, the RNAstructure web server.
2. Choose the link “Predict a Secondary Structure Common to Two Sequences”.
The RNAstructure web servers are organized around two types of themes: biological problems and programs. The “Predict a Secondary Structure Common to Two Sequences” server is a problem-themed server. This server runs both Dynalign (Harmanci et al., 2007; Mathews, 2005; Mathews and Turner, 2002; Uzilov et al., 2006) and PARTS (Harmanci et al., 2008). Farther down the page, Dynalign and PARTS servers can be directly chosen. The problem-themed servers are generally more popular and are convenient to use because they provide alternative hypotheses for the structure. The program servers, however, complete their calculations in less time.
Enter the Sequences
3. Sequences are entered by either upload of a FASTA file (Figure 12.4.5) or by pasting raw sequence information into two text boxes, labeled “Sequence 1” and “Sequence 2.” Figure 12.4.6 shows a screen shot of the web server form. If sequence information is pasted directly into a text box, then a title can also be entered in the “Title” field.
Example sequences can also be displayed by clicking on the link directly above the text box labeled “Click here to add example sequences to the box.” This pastes the tRNA sequences RA7680 and RD0500 (Sprinzl and Vassilenko, 2005) into the text boxes. These sequences are used for the example shown here.
For the webservers, there are limits to the number and lengths of sequences that can be uploaded. The limits are detailed at: http://rna.urmc.rochester.edu/RNAstructureWeb/Information/Limitations.html. As of this writing, each sequence must be 250 nucleotides or shorter. These limits are designed to provide users with a reasonable rate of throughput, and they might change as dictated by server demand or as hardware is upgraded. Note that lowercase nucleotides are not allowed to pair in structure prediction. It is therefore important that most nucleotides be uppercase. For the tRNA sequences used in this example, lowercase nucleotides are modified nucleotides that cannot be accommodated in a helix (Mathews et al., 1999).
Figure 12.4.5.

The FASTA file format. Sequences can be uploaded to the RNAstructure webserver in FASTA format. For FASTA, the first line, a title line, needs to start with “>”. Subsequent lines should only contain sequence and whitespace, which is ignored. Lowercase nucleotides will be forced single stranded in structure prediction. X can also be used to indicate a nucleotide that neither pairs nor stacks.
Figure 12.4.6.
A screen shot of the web form for the RNAstructure “Predict a Secondary Structure Common to Two Sequences” web server. Clicking “Click here to add an example sequence to both sequence boxes” will paste the example sequences used here.
Select options
4. The first set of adjustable options is for Dynalign. For most calculations, it is best to leave these parameters at their default values.
Maximum % Energy Difference, Maximum Number of Structures, Structure Window Size, and Alignment Window Size all affect the number of structures in the predicted set of suboptimal structures or the extent of difference among them. The Gap Penalty can also be changed, and this is a penalty per gap inserted in the structure alignment. Finally, the ability to insert single base pairs in one sequence relative to another can be turned off.
A complete explanation of the parameters is available through the online help, which can be reached by clicking the link at “If you need specific help using the Predict a Secondary Structure Common to Two Sequences server, please click here.” This appears immediately above the sequence entry portion of the web form.
5. Next, Optional Data can be entered to constrain the structure prediction. These constraints can be uploaded as plain text files (Figure 12.4.7).
6. Next, PARTS options can be changed. For most calculations, it is best to leave these parameters at their default values.
The first option is the “Mode.” Three modes are available: MAP (Maximum A Posteriori), Pairing Probability Prediction, or Stochastic Sampling. MAP mode, the default, provides a single, most likely structure for each sequence. Pairing Probability Prediction predicts the base pairing probabilities. Stochastic Sampling generates a representative sample of conserved structures. For most structure prediction applications, MAP mode provides the most information.
The next set of parameters is set to default values. For Stochastic Sampling mode, the sample size can be changed. For Pairing Probability Prediction, the output is a color plot to indicate the −log10 of the probability of base pair formation. The minimum and maximum values displayed in the plot can be changed. By default the Minimum is blank, meaning that there is no minimum. The Maximum defaults to 2, which means only base pairs of greater than 1% pairing probability will be displayed.
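The relationship between the plot Maximum and the pairing probability cutoff is a one-line conversion, sketched below. The function names are hypothetical, and how the server treats a value exactly at the Minimum or Maximum is an assumption made here for illustration (both bounds are treated as inclusive).

```python
import math
from typing import Optional


def neg_log10(probability: float) -> float:
    """Return -log10(p), the quantity shown in the color plot."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log10(probability)


def shown_in_plot(
    probability: float,
    minimum: Optional[float] = None,  # blank Minimum on the form: no lower bound
    maximum: Optional[float] = 2.0,   # default Maximum quoted in the text
) -> bool:
    """Decide whether a pair falls inside the displayed -log10 range."""
    value = neg_log10(probability)
    if minimum is not None and value < minimum:
        return False
    return maximum is None or value <= maximum
```

With the default Maximum of 2, a pair with probability 0.5 (value about 0.3) is displayed, whereas a pair with probability 0.001 (value 3) is not.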
Figure 12.4.7.
The RNAstructure constraint file format. Folding constraint files are plain text files. These can be manually edited. For multiple entries of a specific type of constraint, entries are each listed on a separate line. Note that all specifiers, followed by “−1” or “−1 −1”, are expected by RNAstructure. For all specifiers that take two arguments, it is assumed that the first argument is the 5′ nucleotide.
Panel A shows the specification of the fields. The constraints are XA, nucleotides that will be double-stranded; XB, nucleotides that will be single-stranded (unpaired); XC, nucleotides accessible to chemical modification; XD1 and XD2, forced base pair between XD1 and XD2; XE, nucleotides accessible to FMN cleavage (U in GU pair); and XF1 and XF2, a base pair prohibited between nucleotides XF1 and XF2. All nucleotide indexes are from numbering 5′ to 3′, with the nucleotide at the 5′ end with index of 1.
Panel B shows an example.
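Constraint files of this kind can be generated programmatically rather than edited by hand. The sketch below is an illustration only: the section labels it writes (DS:, SS:, Mod:, Pairs:, FMN:, Forbids:) and their order are recalled from the RNAstructure file-format documentation and are not stated in the text above, so they should be verified against Figure 12.4.7 before use. The function name and argument names are hypothetical, and the terminators are written with an ASCII hyphen.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]


def format_constraints(
    double_stranded: Iterable[int] = (),    # XA
    single_stranded: Iterable[int] = (),    # XB
    modified: Iterable[int] = (),           # XC
    forced_pairs: Iterable[Pair] = (),      # XD1, XD2
    fmn_cleaved: Iterable[int] = (),        # XE
    forbidden_pairs: Iterable[Pair] = (),   # XF1, XF2
) -> str:
    """Return constraint-file text; every section is written, even when empty."""
    lines: List[str] = []

    def one_column(label: str, values: Iterable[int]) -> None:
        lines.append(label)
        lines.extend(str(v) for v in values)
        lines.append("-1")  # terminator for single-argument specifiers

    def two_column(label: str, pairs: Iterable[Pair]) -> None:
        lines.append(label)
        for a, b in pairs:
            five_prime, three_prime = sorted((a, b))  # 5' nucleotide first
            lines.append(f"{five_prime} {three_prime}")
        lines.append("-1 -1")  # terminator for two-argument specifiers

    # Section labels below are an assumption; see the lead-in.
    one_column("DS:", double_stranded)
    one_column("SS:", single_stranded)
    one_column("Mod:", modified)
    two_column("Pairs:", forced_pairs)
    one_column("FMN:", fmn_cleaved)
    two_column("Forbids:", forbidden_pairs)
    return "\n".join(lines) + "\n"
```

Writing every section, including empty ones, follows the note in the caption that all specifiers and their terminators are expected by RNAstructure.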
Enter an email address and start the calculation
7. Optionally, an email address can be provided. If an email address is provided, email is sent when the calculation is complete, and this email contains a link to the location of the results. If an email address is not provided, it is imperative that the browser window remain open to the web server after the calculation is started to make sure the results remain accessible.
8. Start the calculation by clicking “Submit Query.” The web server will now move to the results page. This page refreshes automatically until the calculation is complete, and the results can be displayed.
Examine and download the results
9. On the results page, the results from Dynalign are displayed above PARTS. First, the alignment generated by Dynalign is available for download as a plain text file, at the link labeled “Click here to download the alignment file.”
10. Next, structures are displayed for each sequence. A screenshot is shown in Figure 12.4.8.
For structure display, an SVG graphic is displayed to indicate nucleotides in canonical base pairs. When there are alternative structures, called suboptimal structures, buttons labeled “Previous” and “Next” are available to change the currently displayed structure. Immediately beneath the structure display, all structures are available for download as Adobe pdf, Adobe Postscript (PS), or ct format, which is a plain text format that indicates the locations of base pairs (Figure 12.4.4). Then, individual structures are available for download as Adobe pdf, SVG, Adobe Postscript (PS), jpeg, or ct format.
Adobe pdf and Adobe Postscript are vector formats, and can therefore be edited using drawing tools, such as Inkscape or Adobe Illustrator. SVG is another vector graphics format, but is generally designed for web display. Jpeg is a raster image format, and it cannot be easily modified or scaled.
For Dynalign, the total folding free energy change is provided for each structure. This free energy change is the sum of the folding free energy change for the structure of each sequence, plus a gap penalty for each gap in the sequence alignment. The total folding free energy is displayed as the ENERGY, and it is in units of kcal/mol.
11. Also provided after each program is the command line that was executed by the server to run the program. Clicking the name of the program links to the help pages for the command line program. Clicking the name of the configuration file, a plain text file, links to the file that specified the input parameters. This information is useful for learning how to run the programs on the command line.
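The Dynalign ENERGY described under step 10 is a simple sum, which the following sketch makes explicit. The function name is hypothetical and the numbers in the usage note are invented for illustration; they are not Dynalign defaults or results.

```python
def dynalign_total_energy(
    energy_seq1: float,  # folding free energy change of structure 1, kcal/mol
    energy_seq2: float,  # folding free energy change of structure 2, kcal/mol
    gap_count: int,      # number of gaps in the sequence alignment
    gap_penalty: float,  # penalty per gap, kcal/mol
) -> float:
    """Sum of both folding free energy changes plus a penalty for each gap."""
    if gap_count < 0:
        raise ValueError("gap_count cannot be negative")
    return energy_seq1 + energy_seq2 + gap_count * gap_penalty
```

For example, two structures of −25.0 and −22.5 kcal/mol aligned with three gaps at a hypothetical penalty of 0.5 kcal/mol per gap give a total of −46.0 kcal/mol.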
Figure 12.4.8.
A screen shot of the results page for the RNAstructure “Predict a Secondary Structure Common to Two Sequences” web server. “Click here to download the alignment file” provides a link to the plain text alignment file produced by Dynalign. The first structure drawing, shown here, is the structure predicted by Dynalign for the first sequence. The predicted structures for the second sequence are shown below. Farther down the page are the structures predicted by PARTS.
Alternate Protocol 1: Predicting a structure conserved in three or more sequences with TurboFold in the RNAstructure graphical interface
This protocol details step-by-step instructions for the use of the TurboFold algorithm in the graphical user interface of RNAstructure. It assumes some basic familiarity with point and click interfaces. An example is drawn from two tRNA sequences, RA7680 and RD0260, derived from the Sprinzl database (Sprinzl et al., 1998).
TurboFold is one of the two programs in RNAstructure that predict conserved structures in multiple homologs (Harmanci et al., 2011). The other program, Multilign, has a similar graphical interface (Xu and Mathews, 2011). For many structure predictions, it is best to run both programs because they can provide alternative hypotheses for the structure. In general, TurboFold works better when the pairwise sequence identities are relatively high (≥70%) and Multilign works better when the pairwise sequence identities are lower. Multilign provides a multiple sequence alignment that is informed by structure, but TurboFold is faster.
Necessary Resources
The software package, RNAstructure, can be downloaded from the World-Wide-Web at http://rna.urmc.rochester.edu/RNAstructure.html. Registration is required for download, so that a count of those using the software can be maintained. The list of registered users is not shared with others and not used for any other purpose.
The RNAstructure graphical interface is available for Windows, both 32 and 64 bit, and is known to run on Windows XP, Windows Vista, Windows 7, and Windows 8. The graphical interface is also available in Java for use on Macintosh OS X or Linux, in 32 or 64 bit.
Download and install RNAstructure
1. Download RNAstructure from http://rna.urmc.rochester.edu/RNAstructure.html.
2a. For Windows, download either the 32 bit Windows Native Interface or the 64 bit Windows Native Interface. If in doubt as to whether the version of Windows being used is 32 bit or 64 bit, download the 32 bit because this will work in both environments. The 64 bit version, run in a 64 bit environment, is capable of predicting structures for longer sequences because it can address more memory. Double-click on the zip file (RNAstructure.zip or RNAstructure64bit.zip). Then, double-click on setup.exe and install the software. RNAstructure can then be run from the start list or from the start screen on Windows 8.
2b. For Macintosh OS X, download the Mac OS X interface by clicking the link, “JAVA Mac OS-X Interface as tarball.” Double-click RNAstructureForMac.tgz to extract the files. Double-click on the RNAstructure directory. RNAstructure can now be launched by double-clicking the RNAstructure icon. If desired, the whole RNAstructure directory can be dragged to the application folder.
Note that on OS X Mountain Lion or later, an error message may appear that states the executable is “damaged.” This is a feature that prevents software downloaded from the internet from running. To run RNAstructure, open “System Preferences.” Under “Security & Privacy,” click the lock to be able to make changes and select “Anywhere” under “Allow applications downloaded from …” Then when RNAstructure is run, a prompt will appear to query if it is OK to run RNAstructure.
2c. For Linux, download either “JAVA 32-bit Linux Interface as tarball” or “JAVA 64-bit Linux Interface as tarball.” If in doubt as to whether the operating system is 32 bit or 64 bit, download the 32 bit because this will work in both environments. The 64 bit version run in a 64 bit environment is capable of predicting structures for longer sequences because it can address more memory. Extract the files using tar -xzvf RNAstructureForLinux.tgz or tar -xzvf RNAstructureForLinux64bit.tgz. RNAstructure can now be started by executing RNAstructureScript, found in RNAstructure/exe/, which sets environment variables and launches the JAVA virtual machine.
Note that RNAstructure requires Oracle JAVA, which can be found at: http://java.com/en/download/linux_manual.jsp.
3. Help for installing and launching RNAstructure is available online at: http://rna.urmc.rochester.edu/Overview/index.html.
Enter the sequences
4. Start the RNAstructure GUI.
5. To enter sequences in RNAstructure, use the sequence editor, which can be opened either by selecting New Sequence from the File menu or by clicking the New Sequence icon at the far left of the toolbar. The sequence editor will open as shown in Figure 12.4.9.
Note that items in the toolbar, located directly under the menu bar on Windows and Linux and at the top of the window on OS X, identify themselves with a pop-up label if the mouse pointer is placed over that icon.
6. Enter a title in the Title field at the top. This title will be used to label the output from calculations. The Comment field provides an opportunity to save comments with the sequence. Finally, enter the sequence, from 5′ to 3′, in the Sequence field.
Sequences can be entered manually or pasted from other programs by cut and paste. A sequence copied to the clipboard can be pasted into the Sequence field by first clicking in the Sequence field to place the cursor and then choosing Paste from the Edit menu item or typing Ctrl-V. Several tools are available in the sequence editor. Clicking the Format Sequence button places the sequence in columns of five nucleotides with fifty nucleotides per line. If the Read Sequence button is clicked, the sequence is read aloud (over the computer’s speakers) from 5′ to 3′. This can be canceled by clicking the same button, which has its label changed to Cancel Read. Also, the sequence can be read while it is typed by choosing Read Sequence While Typing on the Read menu item. The secondary structure can be predicted for a single sequence by clicking Fold as RNA or Fold as DNA buttons. Protocols for secondary structure prediction for a single sequence using RNAstructure are detailed in unit 12.6.
7. Save the sequence either by choosing Save Sequence under the File menu item or by clicking the disk icon on the toolbar. This sequence can later be accessed by opening the file with the sequence editor, either by choosing Open Sequence under the File menu item or by clicking the Open Sequence icon on the toolbar.
8. Repeat steps 4-7 to enter the second sequence. Each sequence needs to be saved in its own file.
An example calculation will be run below with two sequences, RA7680 and RD0260, which are included with RNAstructure as examples, so they do not need to be entered.
Figure 12.4.9.
A screen shot of the RNAstructure sequence editor. This illustrates the sequence editor as it appears on Microsoft Windows 7 for RNAstructure 5.6. The Java versions for Linux and Macintosh have the same items. Note that the menu appears on the Macintosh menu bar when running on OS X. The tRNA sequence, RA7680, has been opened from disk. Note that lowercase nucleotides are not allowed to pair in structure predictions. For tRNA sequences, this is a convenient way to specify modified nucleotides that cannot base pair in a helix (Mathews et al., 1999). It is important that most nucleotides be uppercase.
Starting TurboFold
9. After the sequences have been entered and saved using the sequence editor, proceed to the TurboFold input form by choosing RNA TurboFold. Figure 12.4.10 shows the TurboFold input form as it appears on Microsoft Windows 7.
10. Choose a sequence file by clicking the Sequence File button. This will open the standard file chooser dialog box. After a sequence is chosen, its name will appear in the text box next to the Sequence File button. A default name for the structure output is also generated and appears in the text box next to the CT file button. This output file name can be changed by clicking the CT File button. The sequence can now be added to the list of sequences by clicking the button labeled ADD -->. The sequence now appears in the sequence list at the right.
11. Choose each sequence successively until all have been chosen. Any number of sequences can be selected. For this example calculation, choose RA7680 and RD0260.
Example files are provided with the RNAstructure installation. On Microsoft Windows, the default location is the user’s documents folder, e.g. C:\Users\dhm\Documents\RNAstructure_examples. On Linux and Macintosh, the files are in the examples folder in the RNAstructure folder.
12. If a sequence is mistakenly chosen, it can be removed by typing the sequence number in the box adjacent to the Delete Sequence button, and then clicking the Delete Sequence button.
Figure 12.4.10.
A screen shot of the RNAstructure TurboFold input form. This is the form as it appears on Microsoft Windows 7. The Java versions for Linux and Macintosh have the same items. Note that the menu appears on the Macintosh menu bar when running on OS X. The two example sequences used here, RA7680.seq and RD0260.seq have been selected and now appear in the list to the right.
Set optional parameters and start the calculation
13. The first option is the structure prediction mode. Three options are available, Maximum Expected Accuracy, Probability Threshold, or ProbKnot/TurboKnot. Maximum expected accuracy, the default mode, provides a reasonable balance between predicting complete structures and avoiding the prediction of incorrect pairs (Harmanci et al., 2011). ProbKnot/TurboKnot mode can predict pseudoknots, a type of secondary structure that is difficult to predict (Liu et al., 2010; Seetin and Mathews, 2012b). This is the only mode that can predict pseudoknots, but it tends to predict more pairs that are incorrect than the other two modes. The last mode, Probability Threshold, predicts fewer pairs than the other modes, but the pairs are more likely to be correctly predicted (Harmanci et al., 2011).
There is no single best choice for the folding mode. Maximum Expected Accuracy provides a good balance between predicting pairs correctly and minimizing incorrectly predicted pairs, and can be used in most situations. It is also worth repeating a calculation in the ProbKnot/TurboKnot mode to predict possible pseudoknots.
14. The remaining parameters have default values that are the best parameters for most structure predictions. These are categorized either as parameters that affect all types of calculations, “General Parameters,” or as parameters that affect only a single mode of structure prediction and are labeled as such. The online help in the TurboFold topic provides a detailed explanation of these parameters. The help can be started by choosing Help Topics under the Help menu item.
15. Start the calculation by clicking the START button. A progress dialog box opens to display the progress of the calculation.
Display the predicted structures
16. When the calculation is complete, a dialog box opens to select “Draw Structures” or “Cancel.” Choose “Draw Structures.”
17. Two windows will open for each sequence. The first window is the structure drawing window, which shows the predicted secondary structure and any suboptimal structures. The second window is the probability dot plot window. The probability dot plots show the −log10 of the pairing probability for all possible pairs.
The information in the probability dot plot windows summarizes all possible competing pairs in the folding ensemble, informed by the fact that all sequences must fold to homologous structures. This provides detailed information that is more easily summarized in structure color annotation as explained in the next step.
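The relationship between a pairing probability and the value shown in the dot plot can be illustrated with a short Python sketch. The function names are hypothetical and are not part of RNAstructure; the sketch only applies the −log10 transformation described above.

```python
import math


def dot_plot_value(probability: float) -> float:
    """Return -log10 of a pairing probability, the quantity shown in the dot plot."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be greater than 0 and at most 1")
    return -math.log10(probability)


def probability_from_dot_plot(value: float) -> float:
    """Invert the transformation: recover the probability from a dot plot value."""
    return 10.0 ** (-value)


# Highly probable pairs have values near 0; improbable pairs have large values.
for p in (0.99, 0.5, 0.01):
    print(f"probability {p} -> dot plot value {dot_plot_value(p):.3f}")
```

Smaller dot plot values therefore correspond to more probable pairs.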
18. Color annotate the predicted structures with base pairing probabilities. For each sequence, click on the structure drawing window. Then choose Add Probability Color Annotation under the Annotations menu option. This will open a file choosing dialog box, which should be used to select the partition function save file for that sequence. The partition function save file has the same name as the .ct file, but a .pfs extension. After color annotating the structures for each sequence, display the color annotation key by choosing Show Probability Color Annotation Key under the Annotations menu option.
Figure 12.4.11, an example screenshot, shows the structure drawing modules for the results of the TurboFold calculation.
Structures are now colored according to probabilities. For base paired nucleotides, the probability is that of being in the specific base pair that is drawn. For unpaired nucleotides, the probability is that of being unpaired. The color annotation key explains the bands of pairing probability. The highest probabilities are red (≥ 99%), then orange (99% > probability ≥ 95%), yellow (95% > probability ≥ 90%), dark green (90% > probability ≥ 80%), light green (80% > probability ≥ 70%), light blue (70% > probability ≥ 60%), dark blue (60% > probability ≥ 50%), and purple (< 50%).
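The color bands can be written as a small lookup, which may be convenient when reproducing the annotation outside of the graphical interface. This is a minimal Python sketch; the function name is hypothetical, and a probability of exactly 50% is assigned to dark blue here, which is an assumption about the boundary.

```python
def probability_color(probability: float) -> str:
    """Map a probability (0 to 1) to the color band described in the key."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Lower bound of each band, from most to least probable.
    bands = [
        (0.99, "red"),
        (0.95, "orange"),
        (0.90, "yellow"),
        (0.80, "dark green"),
        (0.70, "light green"),
        (0.60, "light blue"),
        (0.50, "dark blue"),  # assumption: exactly 50% is treated as dark blue
    ]
    for lower_bound, color in bands:
        if probability >= lower_bound:
            return color
    return "purple"


for p in (0.995, 0.92, 0.65, 0.30):
    print(p, probability_color(p))
```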
Color annotation provides information about the confidence in the prediction of a specific pair. It has been demonstrated that more probable pairs are more likely to be correctly predicted (Harmanci et al., 2011; Mathews, 2004). When using the default TurboFold mode of Maximum Expected Accuracy, a score is given with each predicted structure. The score is twice the value of MEA Gamma times the sum of the pairing probabilities for all pairs, plus the sum of the unpairing probabilities for all unpaired nucleotides. The score is labeled ENERGY.
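The score described above can be reproduced from the probabilities with the following Python sketch. The function name, the example probabilities, and the value used for MEA Gamma are hypothetical and chosen only for illustration.

```python
def mea_score(pair_probabilities, unpaired_probabilities, gamma):
    """Score = 2 * gamma * (sum of pair probabilities) + (sum of unpaired probabilities).

    pair_probabilities: one probability per predicted base pair.
    unpaired_probabilities: one probability of being unpaired per unpaired nucleotide.
    gamma: the MEA Gamma weight (the value used below is illustrative only).
    """
    return 2.0 * gamma * sum(pair_probabilities) + sum(unpaired_probabilities)


# Hypothetical structure with two pairs and three unpaired nucleotides.
score = mea_score([0.9, 0.8], [0.7, 0.6, 0.5], gamma=1.0)
print(f"score = {score:.2f}")
```

Larger values of MEA Gamma give more weight to paired nucleotides relative to unpaired nucleotides.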
19. To zoom in and out of a structure, the Ctrl-right arrow and Ctrl-left arrow keys can be used, respectively. Alternatively, the zoom level can be chosen by clicking Zoom under the Draw menu item. The structure can be drawn clockwise or counterclockwise by checking or unchecking Render Clockwise under the Draw menu option, respectively.
20. The currently displayed structure can be changed by typing Ctrl-up arrow or Ctrl-down arrow to move the structure number up or down, respectively. Alternatively, the structure to display can be chosen from a dialog box by choosing Goto Structure Number under the Draw menu option. Structure number 1 is the structure first displayed and is the best estimate of the structure. The number of the currently displayed structure is indicated in the upper-left corner of the drawing window. The suboptimal structures are alternative hypotheses.
Figure 12.4.11.
A screen shot of the structure drawing window from RNAstructure, showing the output of the TurboFold calculation. This is the output as it appears on Microsoft Windows 7. The structure shown is the one predicted for RD0260. The steps have been followed to add color annotation and to display the color annotation key. The probability dot plot windows have been closed.
Getting help with RNAstructure
21. RNAstructure includes an online help, available by clicking Help Topics under the Help menu item. The online help is organized into chapters covering the different functions. An index is also available by clicking the index button from the online help window. The help is displayed in the default web browser.
Alternate Protocol 2: Predicting a structure conserved in two sequences with Dynalign in the RNAstructure graphical interface
This protocol details step-by-step instructions for the use of the Dynalign algorithm in the graphical user interface of RNAstructure. It assumes some basic familiarity with point-and-click interfaces. An example is drawn from two tRNA sequences, RA7680 and RD0260, derived from the Sprinzl database (Sprinzl et al., 1998). Dynalign is appropriate to use when two homologous sequences are available.
Necessary Resource List
The software package, RNAstructure, can be downloaded from the World-Wide-Web at http://rna.chem.rochester.edu/RNAstructure.html. Registration is required for download, so that a count of those using the software can be maintained. The list of registered users is not shared with others and not used for any other purpose.
Download and install RNAstructure
1. Download and install the RNAstructure graphical user interface according to steps 1 to 3 in Alternate Protocol 1, above.
Enter the sequences
2. Enter sequences using the RNAstructure sequence editor as detailed in steps 4 to 8 of Alternate Protocol 1, above.
The two sequences for the sample calculation, RA7680 and RD0260, are included with RNAstructure as examples; therefore they do not need to be entered.
Start Dynalign
3. After both sequences have been entered and saved using the sequence editor, proceed to the Dynalign window by choosing RNA Dynalign under the RNA menu item or by clicking the Dynalign icon on the toolbar. Figure 12.4.12 illustrates the Dynalign input form window.
4. Choose the first sequence by clicking the button labeled Sequence File 1. The standard file chooser dialog box will appear for selecting the sequence file. For example, select RA7680.seq from the RNAstructure examples folder. Repeat this for the second sequence by clicking the Sequence File 2 button and select RD0260.seq.
Example files are provided with the RNAstructure installation. On Microsoft Windows, the default location is the user’s documents folder, C:\Users\dhm\Documents\RNAstructure_examples. On Linux and Macintosh, the files are in the examples folder in the RNAstructure folder.
5. Default values will now be displayed for the user-adjustable parameters and for the output file names. The predicted structures will be output in ct files and the predicted alignment is output in plain text. The names of these files can be changed from the default by clicking the buttons labeled CT File 1, CT File 2, or Alignment File.
The default parameter values are the best to use for most structure predictions. The Gap Penalty is a free energy penalty that is applied per gap inserted in the alignment. It is an empirical value that calibrates the sequence alignment score to the structure prediction folding free energy change. It is generally best to allow single base pair inserts in the structures, with the Single BP Inserts Allowed checked, because there is variability in helix length that can be observed in sets of homologous sequences.
There is a set of parameters that control the generation of suboptimal structures, boxed in the Suboptimal Structure Parameters box (Mathews, 2005). These are structures that represent alternative hypotheses for the conserved structure. Max % Energy Difference controls how much higher the folding free energy change can be in the set of structures, and Max Number of Structures is an absolute limit on the number of structures. Setting these values higher will generate more alternative hypotheses. The Structure Window Size and Alignment Window Size ensure that suboptimal structures are substantially different from each other. To generate more, similar suboptimal structures, these can be set to integers as low as 0. To generate fewer, more distinct structures, these parameters can be set to larger integer values.
By default, Generate Save File is checked. This file is named by concatenating the two CT file names with a “.” and appending .dsv (Dynalign save file format). This file can be used to display all conserved pairs in low free energy structures with the DotPlot Dynalign utility.
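The combined effect of Max % Energy Difference and Max Number of Structures can be illustrated with a rough Python sketch. This is not the Dynalign implementation: the function name and example energies are hypothetical, the percentage is assumed to be taken relative to the lowest free energy change, and the two window size parameters are not modeled.

```python
def filter_suboptimal(energies, max_percent_difference, max_structures):
    """Select candidate structures by folding free energy change (kcal/mol).

    A structure is kept if its free energy change is no more than
    max_percent_difference percent higher than the lowest value, and at most
    max_structures structures are returned, most stable first.
    """
    if not energies:
        return []
    ordered = sorted(energies)  # the most negative (most stable) value comes first
    best = ordered[0]
    cutoff = best + abs(best) * max_percent_difference / 100.0
    kept = [energy for energy in ordered if energy <= cutoff]
    return kept[:max_structures]


# Hypothetical total free energy changes for four candidate structures.
candidates = [-30.0, -29.0, -25.0, -20.0]
print(filter_suboptimal(candidates, max_percent_difference=20, max_structures=3))
print(filter_suboptimal(candidates, max_percent_difference=20, max_structures=2))
```

Raising either limit admits more candidates, which matches the statement that higher values generate more alternative hypotheses.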
6. Begin the calculation by clicking the START button. A window, labeled Dynalign in Progress, will open to show the progress of the calculation with a progress bar.
Figure 12.4.12.
A screen shot of the Dynalign input form for RNAstructure. This is the form as it appears on Microsoft Windows 7. The example sequences, RA7680.seq and RD0260.seq, have already been selected for input.
Viewing the output
7. When the Dynalign calculation is complete, a dialog box appears that gives the options of “draw structures” and “cancel.” Click the draw structures button to display the structures on the screen. This opens the interactive drawing tool incorporated in RNAstructure. The structure of each of the two sequences is drawn in a separate window. Figure 12.4.13 shows the structure for the RA7680 tRNA sequence, predicted using Dynalign.
Note that the structures are saved to disk as .ct files (Figure 12.4.4). These can be viewed as drawings at any time by opening the RNAstructure drawing tool (the Draw menu item under the File menu) and selecting the .ct file to display.
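Because .ct files are plain text, the predicted pairs can also be extracted with a short script. The Python sketch below assumes the commonly used column layout of the format (a header line with the sequence length and a title, followed by one line per nucleotide giving the index, the base, the previous index, the next index, the pairing partner or 0 if unpaired, and the historical numbering). It reads only the first structure, and the example record, including its energy value, is hypothetical.

```python
def parse_ct(text):
    """Parse the first structure in CT-format text.

    Returns (title, sequence, pairs), where pairs is a list of (i, j)
    tuples with i < j, using 1-based nucleotide indices.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split(maxsplit=1)
    length = int(header[0])
    title = header[1] if len(header) > 1 else ""
    sequence = []
    pairs = []
    for line in lines[1:length + 1]:
        fields = line.split()
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        sequence.append(base)
        if partner > index:  # record each pair once, from its 5' nucleotide
            pairs.append((index, partner))
    return title, "".join(sequence), pairs


# Hypothetical nine-nucleotide hairpin used only to exercise the parser.
EXAMPLE_CT = """\
    9  ENERGY = -1.0  hypothetical hairpin
    1 G       0    2    9    1
    2 G       1    3    8    2
    3 G       2    4    7    3
    4 A       3    5    0    4
    5 A       4    6    0    5
    6 A       5    7    0    6
    7 C       6    8    3    7
    8 C       7    9    2    8
    9 C       8    0    1    9
"""

title, sequence, pairs = parse_ct(EXAMPLE_CT)
print(title)
print(sequence)
print(pairs)
```

To read a file written by RNAstructure, the file contents can be passed to the same function, for example with `parse_ct(open(path).read())`, where `path` is the location of the .ct file.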
For Dynalign, the total folding free energy change is provided for each structure. This free energy change is the sum of the folding free energy change for the structure of each sequence, plus a gap penalty for each gap in the sequence alignment. The total folding free energy is displayed as the ENERGY, and it is in units of kcal/mol.
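The composition of the total can be written out as a short Python sketch. The function name and all numerical values are hypothetical; the sketch only restates the sum described above, with the gap penalty applied once per gap in the alignment.

```python
def dynalign_total_energy(dg_structure_1, dg_structure_2, gap_penalty, number_of_gaps):
    """Total free energy change (kcal/mol) reported as ENERGY for a Dynalign result.

    dg_structure_1, dg_structure_2: folding free energy change of each structure.
    gap_penalty: free energy penalty applied per gap in the alignment.
    number_of_gaps: count of gaps in the predicted sequence alignment.
    """
    return dg_structure_1 + dg_structure_2 + gap_penalty * number_of_gaps


# Illustrative values only: two structures and an alignment with three gaps.
total = dynalign_total_energy(-25.0, -27.5, gap_penalty=0.4, number_of_gaps=3)
print(f"total = {total:.1f} kcal/mol")
```

Because the gap penalty is positive, each additional gap makes the total less favorable, which is how the alignment and the structures are balanced against each other.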
8. To zoom in and out of the structure, the Ctrl-right arrow and Ctrl-left arrow keys can be used. Alternatively, the zoom level can be chosen by clicking Zoom under the Draw menu item. The structure can be drawn clockwise or counterclockwise by checking or unchecking Render Clockwise under the Draw menu option, respectively.
9. The currently displayed structure can be changed by typing Ctrl-up arrow or Ctrl-down arrow to move the structure number up or down, respectively. Alternatively, the structure to display can be chosen from a dialog box by choosing Goto Structure Number under the Draw menu option. Structure number 1 is the structure first displayed and is the best estimate of the structure. The number of the currently displayed structure is indicated in the upper-left corner of the drawing window. The suboptimal structures are alternative hypotheses.
10. The sequence alignment is saved as plain text with the extension .ali. It can be viewed with any text editor.
On Windows, a convenient text editor is WordPad, which comes with Windows. To view this file, double-click on the .ali file in Windows Explorer or in the folder in which it was saved. If your computer has never opened a .ali file, Windows will ask which program to use to open the file. First, choose the option “Select a program from a list of installed programs” and click the OK button. On the next page, choose WordPad from the list of programs and click the checkbox next to “Always use this program to open this type of file” so that the next .ali file will be opened by WordPad.
On the Macintosh, a convenient viewer is TextEdit. To open the file, double-click it. If your computer has never opened a .ali file, it will prompt you to choose an application. Click on “Choose Application...”, then choose “All Applications” next to Enable, and finally select the program TextEdit. TextEdit will then be the default viewer for .ali files.
Figure 12.4.13.
A screen shot of Dynalign output. This is the output as it appears on Microsoft Windows 7. The structure predicted for RA7680 is on top of the structure predicted for RD0260.
Getting help with RNAstructure
11. RNAstructure includes an online help, available by clicking Help Topics under the Help menu item. The online help is organized into chapters covering the different functions. An index is also available by clicking the index button from the online help window. The help is displayed in the default web browser.
COMMENTARY
Background Information
The prediction of RNA secondary structure has been shown to be about 73% accurate at predicting base pairs for a diverse database of sequences, shorter than 700 nucleotides, with known structures (Mathews et al., 2004). This accuracy can be improved by constraining or restraining prediction with the use of experimentally-determined constraints or restraints (Deigan et al., 2009; Hajdin et al., 2013; Mathews et al., 2004; Mathews et al., 1999), such as enzymatic cleavage (Knapp, 1989), FMN cleavage (Burgstaller and Famulok, 1997), chemical modification data (Cordero et al., 2012; Ehresmann et al., 1987), or SHAPE data (Merino et al., 2005). Unit 12.6 details the prediction of RNA secondary structure with RNAstructure, including the use of experimental data to improve structure predictions.
The most accurate method to determine RNA secondary structure is comparative analysis (Pace et al., 1999; Seetin and Mathews, 2012a). In this approach, the common structure is determined for a large set of homologous sequences. It has been observed that RNA structure is conserved in evolution to a greater extent than sequence, and therefore the set of conserved structures is much smaller than the set of possible structures for a single sequence. When comparative analysis is performed by an expert, the results are quite accurate when compared to crystal structures solved subsequently. For example, for ribosomal RNA, more than 97% of base pairs in secondary structures determined by comparative analysis were observed in crystal structures (Gutell et al., 2002).
Methods that predict conserved RNA secondary structures emulate comparative analysis and are, on average, more accurate than single sequence structure prediction methods. There are reviews available that discuss the approaches used for predicting conserved structures (Bernhart and Hofacker, 2009; Mathews et al., 2006; Reeder et al., 2006; Seetin and Mathews, 2012a). To date, no computational method can completely replace manual comparative analysis. Therefore, the available software tools should be used as starting points to develop hypotheses for the conserved structure.
The RNAstructure package contains two methods to predict structures conserved in two sequences, Dynalign (Harmanci et al., 2007; Mathews, 2005; Mathews and Turner, 2002; Uzilov et al., 2006) and PARTS (Harmanci et al., 2008; Harmanci et al., 2009), and it also contains two methods for predicting structures conserved in three or more sequences, Multilign and TurboFold (Harmanci et al., 2011; Xu and Mathews, 2011). Dynalign uses a scoring scheme that does not consider sequence identity, and hence it works better than PARTS when the pairwise sequence identity is low, i.e., below 60%. PARTS, however, estimates pair probabilities for conserved structures, and these estimate the reliability of predicted pairs (Harmanci et al., 2008). Multilign uses multiple Dynalign calculations to predict structures conserved in three or more sequences. It can therefore work when the pairwise sequence identities are low. Additionally, Multilign predicts the multiple sequence alignment that reflects the structure. TurboFold is faster than Multilign and estimates pair probabilities in the conserved structures. TurboFold, however, uses sequence alignment and therefore requires sequence conservation in the input set of sequences. TurboFold also does not predict the sequence alignment.
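The guidance in the preceding paragraph can be condensed into a rough decision helper, sketched below in Python. The function is hypothetical and is not part of RNAstructure. The 60% identity threshold is stated in the text only for the comparison of Dynalign and PARTS; applying the same threshold to the choice between Multilign and TurboFold is an assumption made for illustration.

```python
def suggest_program(n_sequences, pairwise_identity, want_pair_probabilities=False):
    """Suggest a starting program from the number of sequences and their similarity.

    pairwise_identity: typical pairwise sequence identity as a fraction (0 to 1).
    want_pair_probabilities: True if estimated pair probabilities are needed.
    """
    if n_sequences < 2:
        raise ValueError("conserved structure prediction needs at least two sequences")
    low_identity = pairwise_identity < 0.60
    if n_sequences == 2:
        if low_identity:
            return "Dynalign"
        return "PARTS" if want_pair_probabilities else "Dynalign or PARTS"
    if low_identity:  # assumption: the same 60% threshold is reused here
        return "Multilign"
    return "TurboFold" if want_pair_probabilities else "TurboFold or Multilign"


print(suggest_program(2, 0.45))
print(suggest_program(2, 0.80, want_pair_probabilities=True))
print(suggest_program(5, 0.40))
print(suggest_program(3, 0.75, want_pair_probabilities=True))
```

Any such rule is only a starting point; where calculation time allows, running more than one program and comparing the predicted structures gives additional hypotheses to evaluate.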
The protocols for the RNAstructure GUI covered TurboFold (Alternate Protocol 1) and Dynalign (Alternate Protocol 2). The Multilign GUI input form shares many features with the TurboFold input form, and therefore Multilign is not explicitly presented. The online help can be consulted about the meaning of parameters in the input form. PARTS can be run on the webserver or as a command line tool at this time.
Suggestions for Further Analysis
XRNA is a program that can make publication quality structure drawings. It is available from the UC Santa Cruz RNA Center at http://rna.ucsc.edu/rnacenter/xrna/xrna.html. A second useful program for manipulating structure drawings is VARNA, which allows interactive manipulation of the drawing (Darty et al., 2009).
Acknowledgement
Continued development and support of RNAstructure and the writing of these protocols were supported by National Institutes of Health grant R01 GM076485.
Literature Cited
- Bellaousov S, Reuter JS, Seetin MG, Mathews DH. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res. 2013;41:W471–474. doi: 10.1093/nar/gkt290.
- Bernhart SH, Hofacker IL. From consensus structure prediction to RNA gene finding. Brief. Funct. Genomic Proteomic. 2009;8:461–471. doi: 10.1093/bfgp/elp043.
- Burgstaller P, Famulok M. Flavin-dependent photocleavage of RNA at G.U base pairs. J. Am. Chem. Soc. 1997;119:1137–1138.
- Cordero P, Kladwang W, VanLang CC, Das R. Quantitative dimethyl sulfate mapping for automated RNA secondary structure inference. Biochemistry. 2012;51:7037–7039. doi: 10.1021/bi3008802.
- Darty K, Denise A, Ponty Y. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics. 2009;25:1974–1975. doi: 10.1093/bioinformatics/btp250.
- Deigan KE, Li TW, Mathews DH, Weeks KM. Accurate SHAPE-directed RNA structure determination. Proc. Natl. Acad. Sci. U.S.A. 2009;106:97–102. doi: 10.1073/pnas.0806929106.
- Ehresmann C, Baudin F, Mougel M, Romby P, Ebel J, Ehresmann B. Probing the structure of RNAs in solution. Nucleic Acids Res. 1987;15:9109–9128. doi: 10.1093/nar/15.22.9109.
- Gutell RR, Lee JC, Cannone JJ. The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. 2002;12:301–310. doi: 10.1016/s0959-440x(02)00339-1.
- Hajdin CE, Bellaousov S, Huggins W, Leonard CW, Mathews DH, Weeks KM. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl. Acad. Sci. U.S.A. 2013;110:5498–5503. doi: 10.1073/pnas.1219988110.
- Harmanci AO, Sharma G, Mathews DH. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics. 2007;8:130. doi: 10.1186/1471-2105-8-130.
- Harmanci AO, Sharma G, Mathews DH. PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction. Nucleic Acids Res. 2008;36:2406–2417. doi: 10.1093/nar/gkn043.
- Harmanci AO, Sharma G, Mathews DH. Stochastic sampling of the RNA structural alignment space. Nucleic Acids Res. 2009;37:4063–4075. doi: 10.1093/nar/gkp276.
- Harmanci AO, Sharma G, Mathews DH. TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics. 2011;12:108. doi: 10.1186/1471-2105-12-108.
- Knapp G. Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol. 1989;180:192–212. doi: 10.1016/0076-6879(89)80102-8.
- Liu B, Mathews DH, Turner DH. RNA pseudoknots: folding and finding. F1000 Biol. Rep. 2010;2:8.
- Lu ZJ, Turner DH, Mathews DH. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 2006;34:4912–4924. doi: 10.1093/nar/gkl472.
- Mathews DH. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA. 2004;10:1178–1190. doi: 10.1261/rna.7650904.
- Mathews DH. Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics. 2005;21:2246–2253. doi: 10.1093/bioinformatics/bti349.
- Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 2004;101:7287–7292. doi: 10.1073/pnas.0401799101.
- Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700.
- Mathews DH, Schroeder SJ, Turner DH, Zuker M. Predicting RNA secondary structure. In: Gesteland RF, Cech TR, Atkins JF, editors. The RNA World. 3rd Ed. Cold Spring Harbor Laboratory Press; Cold Spring Harbor: 2006. pp. 631–657.
- Mathews DH, Turner DH. Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. 2002;317:191–203. doi: 10.1006/jmbi.2001.5351.
- Merino EJ, Wilkinson KA, Coughlan JL, Weeks KM. RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 2005;127:4223–4231. doi: 10.1021/ja043822v.
- Pace NR, Thomas BC, Woese CR. Probing RNA structure, function, and history by comparative analysis. In: Gesteland RF, Cech TR, Atkins JF, editors. The RNA World. 2nd Ed. Cold Spring Harbor Laboratory Press; 1999. pp. 113–141.
- Reeder J, Hochsmann M, Rehmsmeier M, Voss B, Giegerich R. Beyond Mfold: recent advances in RNA bioinformatics. J. Biotechnol. 2006;124:41–55. doi: 10.1016/j.jbiotec.2006.01.034.
- Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics. 2010;11:129. doi: 10.1186/1471-2105-11-129.
- Seetin MG, Mathews DH. RNA structure prediction: an overview of methods. Methods Mol. Biol. 2012a;905:99–122. doi: 10.1007/978-1-61779-949-5_8.
- Seetin MG, Mathews DH. TurboKnot: Rapid Prediction of Conserved RNA Secondary Structures Including Pseudoknots. Bioinformatics. 2012b;28:792–798. doi: 10.1093/bioinformatics/bts044.
- Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998;26:148–153. doi: 10.1093/nar/26.1.148.
- Sprinzl M, Vassilenko KS. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 2005;33:D139–140. doi: 10.1093/nar/gki012.
- Uzilov AV, Keegan JM, Mathews DH. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics. 2006;7:173. doi: 10.1186/1471-2105-7-173.
- Xia T, SantaLucia J, Jr., Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick pairs. Biochemistry. 1998;37:14719–14735. doi: 10.1021/bi9809425.
- Xu Z, Mathews DH. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences. Bioinformatics. 2011;27:626–632. doi: 10.1093/bioinformatics/btq726.











