Table 3.
| Method | Subtree size | Time (50) | Agreement (50) | Time (100) | Agreement (100) | Time (500) | Agreement (500) | Time (1000) | Agreement (1000) |
|---|---|---|---|---|---|---|---|---|---|
| Pecan¹ | – | 21.9 | 0.914 | 297. | 0.879 | –ᵃ | –ᵃ | –ᵃ | –ᵃ |
| Prune w/Pecan | 60% | 7.26 | 0.880 | 39.2 | 0.862 | –ᵃ | –ᵃ | –ᵃ | –ᵃ |
| | 30% | 3.13 | 0.909 | 19.6 | 0.839 | –ᵃ | –ᵃ | –ᵃ | –ᵃ |
| | 15% | 7.26 | 0.912 | 13.3 | 0.878 | 125. | 0.844 | –ᵃ | –ᵃ |
| | 7% | 4.24 | 0.909 | 13.5 | 0.849 | 29.1 | 0.907 | 122. | 0.877 |
| FSA² | – | 63.1 | 0.933 | 266. | 0.856 | –ᵃ | –ᵃ | –ᵃ | –ᵃ |
| Prune w/FSA | 60% | 33.8 | 0.912 | 78.9 | 0.838 | 589. | 0.871 | –ᵃ | –ᵃ |
| | 30% | 10.5 | 0.893 | 23.8 | 0.838 | 142. | 0.879 | –ᵃ | –ᵃ |
| | 15% | 4.25 | 0.885 | 17.1 | 0.857 | 40.8 | 0.877 | 150. | 0.861 |
| | 7% | 3.00 | 0.866 | 4.23 | 0.842 | 12.7 | 0.903 | 34.8 | 0.887 |
| MUSCLE³ | – | 55.6 | 0.905 | 138. | 0.799 | –ᵇ | –ᵇ | –ᵇ | –ᵇ |
| Prune w/MUSCLE | 60% | 40.7 | 0.899 | 77.9 | 0.777 | 886. | 0.862 | –ᵇ | –ᵇ |
| | 30% | 24.7 | 0.896 | 42.8 | 0.777 | 368. | 0.883 | –ᵇ | –ᵇ |
| | 15% | 15.1 | 0.905 | 29.1 | 0.828 | 185. | 0.899 | 440. | 0.900 |
| | 7% | 24.7 | 0.905 | 18.8 | 0.841 | 114. | 0.924 | 228. | 0.928 |
| MAFFT⁴ | – | 3.17 | 0.897 | 5.39 | 0.806 | 20.1 | 0.886 | 25.2 | 0.912 |
| SATé⁵ | – | 101. | 0.915 | 301. | 0.840 | –ᵇ | –ᵇ | –ᵇ | –ᵇ |

Column pairs give run-time (minutes) and agreement score for problems of 50, 100, 500, and 1000 leaves; "Subtree size" is the maximum fraction of tree nodes per sub-problem when Prune is used.
¹ Pecan was run with default parameters.
² FSA was run with the --exonerate, --anchored, --softmasked, and --fast flags.
³ MUSCLE was run with default parameters.
⁴ MAFFT was run with the --treein option.
⁵ SATé was run with the -t option but limited to two iterations; we found that additional iterations improved accuracy only marginally.
ᵃ Most of these problems could not be aligned because the aligner ran out of memory.
ᵇ Most of these problems took longer than three days and were aborted.
Run-time and average agreement scores of Prune alignments on datasets of different sizes. Several sets of simulated alignment problems were generated from a 10-kilobase root sequence. Neutral evolution of each root sequence was simulated over species trees with 50, 100, 500, and 1000 leaves. Fifty problems were generated per tree size, for a total of two hundred test alignment problems. The agreement score and run-time (in minutes) for each problem size are averages over the fifty simulated alignments. Each underlying alignment method (Pecan, FSA, and MUSCLE) was first tested directly on the dataset. Prune was then used to break the problems into sub-trees containing at most 60%, 30%, 15%, and 7% of the nodes in the entire tree; the largest number of stages was six, but most problems required no more than three. Pecan, FSA, and MUSCLE were each used as the underlying alignment method for Prune. For comparison, we also aligned the datasets with MAFFT and SATé. To ensure a fair comparison, the true tree topology was passed to SATé (via the -t option) and to MAFFT (via the poorly documented --treein option). Some alignment algorithms could not be applied to the large problems because of very long run-times and memory issues. With Prune, we were able to use Pecan, FSA, and MUSCLE to solve alignment problems far larger than they could handle alone. Prune achieved a very large speedup with little loss of accuracy, and in some cases with an increase in accuracy.