Table 1. Comparison of the old Rfam 11.0 BLAST and Infernal 1.0 search strategy versus the new Rfam 12.0 Infernal 1.1 search strategy for 15 of 200 randomly chosen families.
Accession | Family ID | Length (nt) | # of seed seqs | Time new (h) | Time old (h) | Time (old/new) | New total hits | Old total hits | New unique hits | Old unique hits |
---|---|---|---|---|---|---|---|---|---|---|
Top five families | | | | | | | | | | |
RF00028 | Intron_gpI | 251 | 12 | 125.0 | 357.2 | 2.8 | 71 433 | 60 264 | 11 175 | 1 |
RF00026 | U6 | 104 | 188 | 31.2 | 181.1 | 5.8 | 66 517 | 62 174 | 4367 | 14 |
RF00003 | U1 | 166 | 100 | 11.6 | 64.0 | 5.5 | 15 770 | 14 867 | 904 | 1 |
RF00162 | SAM | 108 | 433 | 8.3 | 590.0 | 70.8 | 4905 | 4797 | 108 | 0 |
RF00050 | FMN | 140 | 144 | 17.1 | 169.9 | 23.9 | 4381 | 4306 | 76 | 1 |
Middle five families | | | | | | | | | | |
RF01426 | snoR126 | 101 | 4 | 40.3 | 7.3 | 0.2 | 78 | 66 | 12 | 0 |
RF01252 | snR5 | 196 | 11 | 41.1 | 9.8 | 0.2 | 76 | 72 | 4 | 0 |
RF00544 | snopsi28S-3327 | 143 | 14 | 11.3 | 15.1 | 1.3 | 75 | 74 | 1 | 0 |
RF00439 | SNORD87 | 85 | 10 | 26.8 | 12.6 | 0.5 | 75 | 74 | 1 | 0 |
RF01537 | TB11Cs2H1 | 70 | 7 | 5.8 | 7.3 | 1.3 | 74 | 73 | 1 | 0 |
Bottom five families | | | | | | | | | | |
RF01439 | S_pombe_snR36 | 164 | 2 | 25.0 | 1.7 | 0.1 | 5 | 2 | 3 | 0 |
RF01448 | S_pombe_snR93 | 143 | 2 | 11.0 | 1.5 | 0.1 | 4 | 3 | 1 | 0 |
RF00967 | mir-281 | 83 | 2 | 6.0 | 2.6 | 0.4 | 4 | 4 | 0 | 0 |
RF00925 | MIR1027 | 142 | 2 | 20.4 | 1.6 | 0.1 | 3 | 3 | 0 | 0 |
RF01576 | DdR8 | 88 | 2 | 10.4 | 1.6 | 0.2 | 2 | 2 | 0 | 0 |
All 200 | - | - | - | 4222.2 | 4069.8 | 0.96 | 201 814 | 179 681 | 22 312 | 53 |
The top five, middle five and bottom five families are shown, ranked by the number of hits found above Rfam GA thresholds using the new search strategy. Identical Rfam 12.0 score thresholds and CM parameters were used for both strategies (new: Rfam 12.0 CM file in Infernal 1.1 format; old: Rfam 12.0 CM file converted to Infernal 1.0 format using Infernal 1.1's cmconvert program). For each family, columns 1–4 give the Rfam accession, family identifier, model length in nucleotides and number of sequences in the seed alignment; columns 5–7 give the running time in hours for the new strategy, the running time in hours for the old strategy and the ratio of the two running times (old/new), respectively; columns 8 and 9 give the number of hits found above the per-family Rfam 12.0 thresholds by the new and old strategies, respectively; column 10 gives the number of unique hits found by the new strategy but not the old; and column 11 gives the number of unique hits found by the old strategy but not the new. A unique hit is a hit found by one strategy that none of the hits found by the other strategy overlaps by ≥1 nucleotide on the same strand. The 200 families were randomly chosen from the 2190 families present in both Rfam 12.0 and Rfam 11.0, the last release for which the old strategy was used. MIR1122 (RF00906) was initially included in the 200, but we replaced it with another random choice (SNORD97, RF01291) after learning that MIR1122 is clearly related to a MITE (miniature inverted-repeat transposable element) in plants and that the curators of the microRNA database miRBase (4) suspect it may not be a true miRNA gene. If the family is removed from miRBase, it will also be removed from Rfam.
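
The unique-hit definition in the note above can be illustrated with a minimal Python sketch. This is not the code used to produce Table 1. The `Hit` record, the `overlaps` and `unique_hits` helpers and the example coordinates are all hypothetical, and the sketch assumes that each hit is described by a sequence name, two 1-based inclusive coordinates and a strand.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit record; field names are assumptions, not Rfam's schema."""
    seq_id: str   # name of the target sequence
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive; may be smaller than start
    strand: str   # "+" or "-"


def overlaps(a: Hit, b: Hit) -> bool:
    """True if a and b share >= 1 nucleotide on the same sequence and strand."""
    if a.seq_id != b.seq_id or a.strand != b.strand:
        return False
    # Normalise coordinates so that lo <= hi regardless of how they were reported.
    a_lo, a_hi = sorted((a.start, a.end))
    b_lo, b_hi = sorted((b.start, b.end))
    return a_lo <= b_hi and b_lo <= a_hi


def unique_hits(own: List[Hit], other: List[Hit]) -> List[Hit]:
    """Hits in `own` that no hit in `other` overlaps on the same strand."""
    return [h for h in own if not any(overlaps(h, o) for o in other)]


# Made-up example hits, for illustration only.
new_hits = [
    Hit("seqA", 10, 50, "+"),
    Hit("seqA", 200, 260, "+"),
    Hit("seqB", 90, 30, "-"),
]
old_hits = [
    Hit("seqA", 40, 80, "+"),   # overlaps the first new hit
    Hit("seqB", 30, 90, "+"),   # same region as the third new hit, opposite strand
]

new_unique = unique_hits(new_hits, old_hits)
old_unique = unique_hits(old_hits, new_hits)
print(len(new_unique), len(old_unique))
```

The all-against-all comparison is quadratic in the number of hits, which keeps the sketch short. For families with tens of thousands of hits, grouping hits by sequence and strand and sorting by coordinate would likely be preferable.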