2012 Nov 30;41(Database issue):D70–D82. doi: 10.1093/nar/gks1265

Table 1.

Coverage and false discovery on benchmark data

Method              Covered bases  Covered (%)  FP bases  FP (%)  FDR (%)  Time (h)
nhmmer              278 140 893    53.88        159 028   0.03    0.06     595
cross_match         263 131 978    50.97        282 672   0.05    0.11     2682
rmblastn            257 212 437    49.82        201 430   0.04    0.08     59
blastn (sensitive)  231 296 716    44.80        135 832   0.03    0.06     28
blastn              201 836 787    39.10        68 743    0.01    0.03     18

Covered bases were computed by searching each entry's model or consensus sequence against a 516.2-Mb benchmark drawn from human chromosomes 1, 2 and 19. False-positive (FP) nucleotides were computed as described in the text. The false discovery rate (FDR) is the ratio of FP nucleotides to covered nucleotides. The software nhmmer (version snap-10162012) was run with default parameters, after building models with the flags (--hand --maxinsertlen 10) to ensure one match state for each position in the consensus and to limit insert length parameterization, respectively. The software cross_match (v.0.990329) was run using RepeatMasker parameters calculated to be optimal for copies 25% diverged from their original sequence in a background of 41% GC DNA (-gap_init -25 -gap_ext -5 -minmatch 7 -bandwidth 14 -masklevel 10 -matrix 25p41g.matrix -minscore 200). The software rmblastn (2.2.23+) was run with parameters that mirror those of cross_match (-gapopen 20 -gapextend 5 -complexity_adjust -word_size 7 -xdrop_ungap 400 -xdrop_gap_final 800 -xdrop_gap 100 -min_raw_gapped_score 200 -dust no -matrix 25p41g.matrix). The software blastn (2.2.25+) was run with basic settings (-wordsize 7) and with sensitive settings (-reward 1 -penalty -1 -gapopen 2 -gapextend 1 -wordsize 7). For all tools, entry-specific score thresholds were chosen to meet a target FDR of 0.2%, as described in the text. Runtimes were measured using a single thread on a 2.66 GHz Intel Gainestown (X5550) processor. At 595 h, nhmmer is slower than rmblastn (59 h) but faster than cross_match (2682 h).
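The percentage columns follow from the raw base counts, and the arithmetic can be checked with a short script. The following is a minimal sketch, not the authors' code: the names BENCHMARK_BASES, TABLE_1, percent and summarize are hypothetical, and the benchmark size is taken as the rounded 516.2 Mb given in the footnote, so the recomputed coverage percentages may differ from the table in the last digit. The FDR does not depend on the benchmark size and is reproduced exactly after rounding.

```python
# Benchmark size from the footnote (516.2 Mb). This is a rounded value, so
# coverage percentages computed from it may be off in the last digit.
BENCHMARK_BASES = 516_200_000

# method -> (covered bases, FP bases, runtime in hours), copied from Table 1.
TABLE_1 = {
    "nhmmer": (278_140_893, 159_028, 595),
    "cross_match": (263_131_978, 282_672, 2682),
    "rmblastn": (257_212_437, 201_430, 59),
    "blastn (sensitive)": (231_296_716, 135_832, 28),
    "blastn": (201_836_787, 68_743, 18),
}


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


def summarize(method):
    """Recompute the derived columns of Table 1 for one method."""
    covered, fp, hours = TABLE_1[method]
    return {
        "covered_pct": percent(covered, BENCHMARK_BASES),  # Covered (%)
        "fp_pct": percent(fp, BENCHMARK_BASES),            # FP (%)
        "fdr_pct": percent(fp, covered),                   # FDR (%) = FP / covered
        "hours": hours,
    }


if __name__ == "__main__":
    for method in TABLE_1:
        s = summarize(method)
        print(
            f"{method:20s} covered={s['covered_pct']:.2f}% "
            f"FP={s['fp_pct']:.2f}% FDR={s['fdr_pct']:.2f}% time={s['hours']} h"
        )
```

Every recomputed FDR is well below the 0.2% target used to choose the entry-specific score thresholds.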