Table 2. Summary statistics for Rfam-based annotation of RNAs in various genomes and metagenomics data sets.
| Genome/data set | Size (Mb) | # of hits | # of families | CPU time (hours) | Mb/hour |
|---|---|---|---|---|---|
| Homo sapiens | 3099.7 | 14 508 | 796 | 650 | 4.8 |
| Sus scrofa (pig) | 2808.5 | 6177 | 625 | 460 | 6.1 |
| Drosophila melanogaster | 168.7 | 4321 | 156 | 30 | 5.7 |
| Caenorhabditis elegans | 100.3 | 1022 | 175 | 20 | 5.2 |
| Saccharomyces cerevisiae | 12.2 | 376 | 96 | 1.7 | 7.3 |
| Escherichia coli | 4.6 | 256 | 112 | 0.46 | 10.2 |
| Bacillus subtilis | 4.1 | 211 | 52 | 0.57 | 7.2 |
| Methanocaldococcus jannaschii | 1.7 | 257 | 18 | 0.31 | 5.6 |
| Aquifex aeolicus | 1.6 | 52 | 7 | 0.22 | 7.3 |
| Borrelia burgdorferi | 0.9 | 44 | 7 | 0.22 | 4.1 |
| Human immunodeficiency virus (HIV) | 0.01 | 12 | 10 | 0.016 | 0.63 |
| Human gut microbiome sample (sample ERS167139, 454 sequencing) | 166.1 | 4342 | 54 | 22 | 7.7 |
| Human gut microbiome sample (sample ERS235581, Illumina HiSeq sequencing) (28) | 52.9 | 3159 | 47 | 8.5 | 6.2 |
| Ocean metagenome (sample SRS580499, Illumina Genome Analyzer) | 44.3 | 6692 | 59 | 13 | 3.5 |
The cmsearch program of Infernal 1.1 was used with Rfam 12.0 CM files and the following command-line options: --noali --cut_ga --rfam --nohmmonly --cpu 0. Overlapping hits were removed so that no nucleotide was matched by more than one family: where hits overlapped, the hit with the lower E-value was kept (or, for tied E-values, the hit with the higher bit score). All searches were run as single execution threads on 3.0 GHz Intel Xeon processors. The Homo sapiens, Sus scrofa, Drosophila melanogaster and Saccharomyces cerevisiae genomes were obtained from Ensembl release 76 (http://www.ensembl.org/) (26), and the Escherichia coli (K12 substr MG1655), Bacillus subtilis (BSn5), Methanocaldococcus jannaschii (DSM 2661), Aquifex aeolicus (VF5) and Borrelia burgdorferi (CA-11 2A) genomes were obtained from release 23 of Ensembl Genomes (http://ensemblgenomes.org/) (27). For all of these, the sequence file searched was downloaded via FTP and has the suffix .dna.toplevel.fa.gz. The HIV genome used is ENA accession AJ291720. The metagenomic samples were downloaded from the EBI Metagenomics Portal (https://www.ebi.ac.uk/metagenomics/) (29) and can be accessed by the sample accessions listed in the table. The ‘CPU time’ and ‘Mb/hour’ columns are rounded to two significant digits.
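The overlap-removal rule described in this footnote can be illustrated with a minimal Python sketch. This is not the script used to produce the table. The `Hit` record, its field names, and the greedy strategy (rank all hits, then keep each hit that does not overlap an already-kept one) are assumptions made for illustration. The sketch also assumes that overlaps are judged on the same target sequence regardless of strand, and the family accessions and coordinates in the example are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one hit; field names are illustrative."""
    seq_id: str       # name of the target sequence
    family: str       # family accession or name
    start: int        # 1-based, inclusive; may exceed `end` on the reverse strand
    end: int
    evalue: float
    bit_score: float


def span(hit: Hit) -> tuple[int, int]:
    """Return (low, high) coordinates regardless of strand."""
    return (min(hit.start, hit.end), max(hit.start, hit.end))


def overlaps(a: Hit, b: Hit) -> bool:
    """True if both hits are on the same sequence and share a nucleotide."""
    if a.seq_id != b.seq_id:
        return False
    a_lo, a_hi = span(a)
    b_lo, b_hi = span(b)
    return a_lo <= b_hi and b_lo <= a_hi


def remove_overlaps(hits: list[Hit]) -> list[Hit]:
    """Greedy filter: lower E-value wins; ties go to the higher bit score."""
    ranked = sorted(hits, key=lambda h: (h.evalue, -h.bit_score))
    kept: list[Hit] = []
    for hit in ranked:
        # Assumption: a hit is dropped if it overlaps any better-ranked kept hit.
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


# Made-up example: the first two hits overlap on chr1, the third stands alone.
example = [
    Hit("chr1", "RF00001", 100, 220, 1e-20, 85.0),
    Hit("chr1", "RF00002", 200, 350, 1e-30, 110.0),
    Hit("chr1", "RF00005", 500, 430, 1e-12, 60.0),
]
print([h.family for h in remove_overlaps(example)])  # ['RF00002', 'RF00005']
```

The pairwise scan is quadratic in the number of hits, which is fine for a sketch. At genome scale one would more likely group hits by sequence and sort them by coordinate first.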