Table 2.
Method | Sub-problem size | 60 kb: Time (min) | 60 kb: Agreement | 150 kb: Time (min) | 150 kb: Agreement | 500 kb: Time (min) | 500 kb: Agreement | 1000 kb: Time (min) | 1000 kb: Agreement
---|---|---|---|---|---|---|---|---|---
Pecan¹ | | 3.43 | 0.896 | 10.6 | 0.905 | 46.9 | 0.906 | 100 | 0.906
Crumble w/Pecan | 60% | 3.29 | 0.894 | 7.18 | 0.904 | 21.5 | 0.905 | 51.9 | 0.906
Crumble w/Pecan | 30% | 2.56 | 0.889 | 4.66 | 0.903 | 11.9 | 0.905 | 23.5 | 0.905
Crumble w/Pecan | 15% | 2.39 | 0.859 | 3.77 | 0.893 | 8.29 | 0.903 | 13.9 | 0.905
FSA² | | 37.4 | 0.886 | (a) | (a) | (a) | (a) | (a) | (a)
Crumble w/FSA | 60% | 25.8 | 0.881 | 69.8 | 0.903 | (a) | (a) | (a) | (a)
Crumble w/FSA | 30% | 21.0 | 0.873 | 39.2 | 0.898 | (a) | (a) | (a) | (a)
Crumble w/FSA | 15% | 17.7 | 0.849 | 25.5 | 0.893 | 104. | 0.811 | (a) | (a)
MUSCLE³ | | (a) | (a) | (a) | (a) | (a) | (a) | (a) | (a)
Crumble w/MUSCLE | 60% | (a) | (a) | (a) | (a) | (a) | (a) | (a) | (a)
Crumble w/MUSCLE | 30% | 128 | 0.707 | (a) | (a) | (a) | (a) | (a) | (a)
Crumble w/MUSCLE | 15% | 63.1 | 0.679 | 251. | 0.705 | (a) | (a) | (a) | (a)

1 Pecan was run with default parameters.
2 FSA was run with the --exonerate, --anchored, and --softmasked flags.
3 MUSCLE was run with default parameters.
a The majority of these problems could not be aligned because the aligner ran out of memory.

Run time and average agreement score of Crumble alignments for datasets of different sizes. Sets of simulated alignment problems were generated from root sequences of 60, 150, 500, and 1000 kilobases. The neutral evolution of each root sequence was simulated over a nine-species tree. Fifty problems were generated per root size, for a total of two hundred test alignment problems. The agreement and run time (in minutes) reported for each problem size are averages over the fifty simulated alignments. Crumble was used to break each problem into sub-problems by setting the approximate core size to 60%, 30%, or 15% of the length of the original problem; each block was allowed to be at most 4 kb larger than that, as measured in any of the sequences. Pecan, FSA, and MUSCLE were used as the underlying alignment methods, and PrePecan was used to generate the constraints. We were unable to apply FSA directly (without Crumble) to problems of 150 kb or larger because FSA required more than the 4 GB of memory available per cluster node. Using Crumble, we were able to run FSA on problems as large as half a megabase. MUSCLE had more severe memory issues, but with Crumble we were able to use it on problems as large as 150 kb. For Pecan, Crumble achieved more than a sevenfold speedup with almost no loss of accuracy on the largest problem size (100 minutes directly versus 13.9 minutes with 15% sub-problems, at agreement scores of 0.906 and 0.905, respectively).
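
The speedup claim for Pecan can be checked directly from the 1000 kb columns of the table. The short Python sketch below is only an illustration of that arithmetic; the variable and function names are ours and are not part of Crumble, Pecan, or any other tool mentioned here. It divides the direct Pecan run time by each Crumble run time and reports the corresponding change in agreement score.

```python
# Values copied from the 1000 kb columns of Table 2 (times in minutes).
# All names below are illustrative and not part of any aligner's interface.
pecan_direct_min = 100.0
pecan_direct_agreement = 0.906

crumble_pecan_min = {"60%": 51.9, "30%": 23.5, "15%": 13.9}
crumble_pecan_agreement = {"60%": 0.906, "30%": 0.905, "15%": 0.905}


def speedup(baseline_min: float, crumble_min: float) -> float:
    """Ratio of the direct run time to the Crumble run time."""
    return baseline_min / crumble_min


# Speedup and agreement loss for each sub-problem size.
speedups = {
    size: speedup(pecan_direct_min, minutes)
    for size, minutes in crumble_pecan_min.items()
}
agreement_loss = {
    size: pecan_direct_agreement - score
    for size, score in crumble_pecan_agreement.items()
}

for size in crumble_pecan_min:
    print(f"{size}: speedup {speedups[size]:.2f}x, "
          f"agreement loss {agreement_loss[size]:.3f}")
```

With 15% sub-problems the ratio is 100 / 13.9, or roughly 7.2, while the agreement score drops by 0.001, which matches the "more than sevenfold" statement in the caption.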