Table 2. Benchmarks against RNA structural databases.
Program | BRAliBase 2.1 (Acc/Sn/PPV) | Consan mix80 (Acc/Sn/PPV) |
ClustalW | 0.85/0.86/0.86 | 0.65/0.65/0.68 |
DIALIGN | 0.82/0.83/0.85 | 0.76/0.75/0.82 |
FSA | 0.90/0.91/0.94 | 0.77/0.74/0.92 |
FSA (--maxsn) | 0.91/0.92/0.92 | 0.78/0.78/0.86 |
MAFFT | 0.90/0.91/0.91 | 0.77/0.78/0.77 |
MUSCLE | 0.90/0.91/0.90 | 0.74/0.76/0.74 |
ProbConsRNA | 0.91/0.92/0.92 | (failed to align) |
T-Coffee | 0.81/0.82/0.84 | 0.38/0.33/0.40 |
SeqAn::T-Coffee | 0.89/0.90/0.90 | (failed to align) |
Comparisons of the accuracies (Acc), sensitivities (Sn) and positive predictive values (PPV) of FSA and other alignment methods on the BRAliBase 2.1 dataset of small RNAs [26] and the Consan mix80 dataset of small- and large-subunit ribosomal RNAs [27]. The BRAliBase 2.1 dataset consisted of all alignments with 15 sequences (the largest alignments in the dataset). The mix80 dataset presents difficult alignment problems: its four alignments each contain 107 to 254 sequences approximately 1–4 kilobases long, with average percent identity below 50%. Two programs, ProbConsRNA and SeqAn::T-Coffee, failed to align these large datasets. When run in --fast mode, FSA considers only a subset (∼20% in this case) of all sequence pairs. Because the mix80 dataset consists of long sequences, FSA automatically uses anchoring for speed; it does not use anchoring on the short sequences of BRAliBase 2.1.
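The caption reports Sn and PPV without defining them here. A common convention for alignment benchmarks, assumed in the sketch below, scores pairs of residues placed in the same column: Sn is the fraction of reference (true) homologous pairs that the predicted alignment recovers, and PPV is the fraction of predicted pairs that appear in the reference. This is a minimal illustration, not FSA's own scoring code; the function names (`aligned_pairs`, `sn_ppv`), the dict-of-gapped-strings input format and the gap characters `-` and `.` are hypothetical choices. The Acc column is not reproduced because its exact definition is not given in this section.

```python
from typing import Dict, Set, Tuple

# Assumption (hypothetical): '-' and '.' are the only gap characters.
GAP_CHARS = {"-", "."}

# A homologous pair: (seq_a, residue_index_a, seq_b, residue_index_b) with seq_a < seq_b.
ResiduePair = Tuple[str, int, str, int]


def aligned_pairs(rows: Dict[str, str]) -> Set[ResiduePair]:
    """Collect every pair of residues that an alignment places in the same column."""
    if not rows:
        return set()
    names = sorted(rows)  # fixed order so (a, b) pairs match across alignments
    length = len(rows[names[0]])
    if any(len(rows[n]) != length for n in names):
        raise ValueError("all alignment rows must have the same length")
    next_index = {n: 0 for n in names}  # ungapped position of each sequence's next residue
    pairs: Set[ResiduePair] = set()
    for col in range(length):
        present = []
        for n in names:
            if rows[n][col] not in GAP_CHARS:
                present.append((n, next_index[n]))
                next_index[n] += 1
        # Every two residues sharing a column count as one homology call.
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                (a, pos_a), (b, pos_b) = present[i], present[j]
                pairs.add((a, pos_a, b, pos_b))
    return pairs


def ungapped(rows: Dict[str, str]) -> Dict[str, str]:
    """Strip gap characters so two alignments of the same sequences can be checked."""
    return {n: "".join(c for c in s if c not in GAP_CHARS) for n, s in rows.items()}


def sn_ppv(reference: Dict[str, str], predicted: Dict[str, str]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) of predicted vs. reference."""
    if ungapped(reference) != ungapped(predicted):
        raise ValueError("alignments must contain the same ungapped sequences")
    ref = aligned_pairs(reference)
    pred = aligned_pairs(predicted)
    true_pos = len(ref & pred)
    sn = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sn, ppv


if __name__ == "__main__":
    reference = {"s1": "ACG-U", "s2": "A-GCU"}
    predicted = {"s1": "ACGU", "s2": "AGCU"}
    sn, ppv = sn_ppv(reference, predicted)
    print(f"Sn={sn:.2f} PPV={ppv:.2f}")  # prints "Sn=0.67 PPV=0.50"
```

Residues are keyed by sequence name and ungapped position, so alignments with different numbers of columns can still be compared. Neither measure scores residues left unaligned against gaps, so an accuracy measure that also credits correct gap placement can rank methods differently from Sn or PPV alone.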