Table 5. Benchmarks against simulated unrelated genomic DNA.
Program | Genomic DNA |
CHAOS/DIALIGN | 10% |
ClustalW | 96% |
DIALIGN-TX | 20% |
FSA (–exonerate) | 1% |
FSA (–exonerate –maxsn) | 4% |
MAVID | 17% |
MLAGAN | 30% |
Pecan | 1% |
TBA | 0% |
Large-scale random sequence tests for genomic alignment programs. As in Table 4, table entries are the fraction of random sequence aligned, calculated by taking a sum-of-pairs over pairwise alignments. FSA aligns a small fraction of random genomic sequence in both its default and maximium-sensitivity (–maxsn) modes. TBA did not align a single base in these tests and was thus the best performer. As the three best-performing programs in this test, TBA, Pecan and FSA –exonerate, all use inexact sequence matches as anchors, the relative performance of these three programs can be explained by the stringency of the anchoring thresholds used: TBA uses the highest threshold by default, Pecan the next-highest and FSA the lowest. All three of these programs show good base-level specificity on the simulated alignments of Table 3, for which TBA has the highest specificity on one dataset and FSA on two. The random sequence tests consisted of 50 datasets, each with 10 random DNA sequences (uniform base distribution) of length 50 kb. All programs were run with default parameters. For genomic aligners that required a phylogenetic tree, we used the guide tree computed by ClustalW (rooted via the midpoint algorithm of the PHYLIP [64] retree program).