Table 1.
Method | Covered bases | Covered (%) | FP bases | FP (%) | FDR (%) | Time (h) |
---|---|---|---|---|---|---|
nhmmer | 278 140 893 | 53.88 | 159 028 | 0.03 | 0.06 | 595 |
cross_match | 263 131 978 | 50.97 | 282 672 | 0.05 | 0.11 | 2682 |
rmblastn | 257 212 437 | 49.82 | 201 430 | 0.04 | 0.08 | 59 |
blastn (sensitive) | 231 296 716 | 44.80 | 135 832 | 0.03 | 0.06 | 28 |
blastn | 201 836 787 | 39.10 | 68 743 | 0.01 | 0.03 | 18 |
Covered bases were computed by searching each entry's model or consensus sequence against a 516.2-Mb benchmark drawn from human chromosomes 1, 2 and 19. FP nucleotides were computed as described in the text. FDR is the ratio of FP nucleotides to covered nucleotides. The software nhmmer (version snap-10162012) was run with default parameters, after building models with the flags (--hand --maxinsertlen 10), which ensure one match state for each position in the consensus and limit insert length parameterization, respectively. The software cross_match (v.0.990329) was run using RepeatMasker parameters calculated to be optimal for copies 25% diverged from their original sequence and in a background of 41% GC DNA (-gap_init -25 -gap_ext -5 -minmatch 7 -bandwidth 14 -masklevel 10 -matrix 25p41g.matrix -minscore 200). The software rmblastn (2.2.23+) was run with parameters that mirror those of cross_match (-gapopen 20 -gapextend 5 -complexity_adjust -word_size 7 -xdrop_ungap 400 -xdrop_gap_final 800 -xdrop_gap 100 -min_raw_gapped_score 200 -dust no -matrix 25p41g.matrix). The software blastn (2.2.25+) was run with basic settings (-wordsize 7) and with sensitive settings (-reward 1 -penalty -1 -gapopen 2 -gapextend 1 -wordsize 7). For all tools, entry-specific score thresholds were chosen to meet a target FDR of 0.2%, as described in the text. Runtime was measured on a single thread of a 2.66 GHz Intel Gainestown (X5550) processor. The results show that the speed of nhmmer lies between that of rmblastn and cross_match.
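As a check on how the percentage columns relate to the raw counts, the following minimal Python sketch recomputes Covered (%) and FDR (%) from the covered and FP base counts in Table 1. The function and variable names are illustrative and not part of any tool listed above. The benchmark size is taken as the rounded 516.2 Mb quoted in the footnote, so recomputed coverage values could differ slightly in the last digit from those obtained with the exact benchmark length.

```python
# Benchmark size as quoted in the footnote (rounded; an assumption for this sketch).
BENCHMARK_BASES = 516.2e6

# (covered bases, false-positive bases) copied from Table 1.
TABLE_1 = {
    "nhmmer": (278_140_893, 159_028),
    "cross_match": (263_131_978, 282_672),
    "rmblastn": (257_212_437, 201_430),
    "blastn (sensitive)": (231_296_716, 135_832),
    "blastn": (201_836_787, 68_743),
}


def covered_percent(covered, total=BENCHMARK_BASES):
    """Fraction of the benchmark covered by hits, as a percentage."""
    return 100.0 * covered / total


def fdr_percent(covered, fp):
    """FDR as defined in the footnote: FP nucleotides over covered nucleotides."""
    return 100.0 * fp / covered


if __name__ == "__main__":
    for method, (covered, fp) in TABLE_1.items():
        print(f"{method:20s} covered={covered_percent(covered):6.2f}%  "
              f"FDR={fdr_percent(covered, fp):5.2f}%")
```

Note that FDR is normalized by covered bases, whereas the FP (%) column is normalized by the full benchmark, which is why the two differ by roughly a factor of two for each method.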