BMC Bioinformatics. 2011 May 10;12:144. doi: 10.1186/1471-2105-12-144

Table 2.

Crumble results for different-sized simulated datasets and underlying alignment methods.

Aligner             Core  60 kb              150 kb             500 kb             1000 kb
                          Time    Agreement  Time    Agreement  Time    Agreement  Time    Agreement
Pecan1                    3.43    0.896      10.6    0.905      46.9    0.906      100     0.906
Crumble w/Pecan     60%   3.29    0.894      7.18    0.904      21.5    0.905      51.9    0.906
                    30%   2.56    0.889      4.66    0.903      11.9    0.905      23.5    0.905
                    15%   2.39    0.859      3.77    0.893      8.29    0.903      13.9    0.905

FSA2                      37.4    0.886      -a      -a         -a      -a         -a      -a
Crumble w/FSA       60%   25.8    0.881      69.8    0.903      -a      -a         -a      -a
                    30%   21.0    0.873      39.2    0.898      -a      -a         -a      -a
                    15%   17.7    0.849      25.5    0.893      104     0.811      -a      -a

MUSCLE3                   -a      -a         -a      -a         -a      -a         -a      -a
Crumble w/MUSCLE    60%   -a      -a         -a      -a         -a      -a         -a      -a
                    30%   128     0.707      -a      -a         -a      -a         -a      -a
                    15%   63.1    0.679      251     0.705      -a      -a         -a      -a

1 Pecan was run with default parameters.

2 FSA was run with the --exonerate, --anchored, and --softmasked flags.

3 MUSCLE was run with default parameters.

a The majority of these problems could not be aligned because the aligner ran out of memory.

The run-time and average agreement score of Crumble alignments on datasets of different sizes. Several sets of simulated alignment problems were generated using root sequences of 60, 150, 500, and 1000 kilobases. The neutral evolution of each root sequence was simulated over a nine-species tree. Fifty problems were generated per root size, for a total of two hundred test alignment problems. The agreement and run-time (in minutes) reported for each problem size are averages over the fifty simulated alignments. Crumble was used to break each problem into sub-problems that were 60%, 30%, and 15% of the length of the original problem: the approximate core size was set to 60%, 30%, or 15% of the original problem length, and each block was allowed to be at most 4 kb larger, as measured in any of the sequences. Pecan, FSA, and MUSCLE were used as the underlying alignment methods. PrePecan was used to generate the constraints. We were unable to apply FSA directly (without Crumble) to problems of 150 kb or larger because FSA required more than the 4 GB of memory available per cluster node. Using Crumble, we were able to run FSA on problems as large as half a megabase. MUSCLE had more severe memory issues, but with Crumble we were able to use it on problems as large as 150 kb. For Pecan, Crumble achieved more than a sevenfold speedup with almost no loss of accuracy on the largest problem size.
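As a quick check of the final claim, the short sketch below recomputes the speedup and the change in agreement from the 1000 kb column of Table 2 (Pecan alone versus Crumble w/Pecan at the 15% core size); the variable names are illustrative only and not part of the original work.

    # Values taken from the 1000 kb column of Table 2.
    pecan_time_min, pecan_agreement = 100.0, 0.906        # Pecan run directly
    crumble_time_min, crumble_agreement = 13.9, 0.905     # Crumble w/Pecan, 15% core

    speedup = pecan_time_min / crumble_time_min           # ~7.2x, i.e. "more than a sevenfold speedup"
    agreement_loss = pecan_agreement - crumble_agreement  # 0.001, "almost no loss of accuracy"

    print(f"speedup: {speedup:.1f}x, agreement loss: {agreement_loss:.3f}")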