2020 Apr 27;48(10):5217–5234. doi: 10.1093/nar/gkaa265

Table 2.

Metagenomics software based on probabilistic and signal processing algorithms. Six main application areas are highlighted: containment, downsampling, probe design, profiling, resemblance and taxonomic classification. Speed indicates the relative computational speed of CPU operations, memory indicates the relative maximum RAM used during the index construction and query steps, and year indicates the publication year. More ‘⋆’s mean better time and memory efficiency; fewer ‘⋆’s indicate more resource-intensive tools. Performance estimates based only on literature comparisons are marked in gray (‘⋆’). The star ratings (1–5) correspond roughly to run time (days, hours, minutes, seconds and milliseconds) and memory (>64 GB (server), >16 GB (workstation), >1 GB, >16 MB and <16 MB). The datasets used were Shakya et al. (133) (downsampling, profiling and taxonomic classification), 99 sequencing experiments from SRA (132) (containment), 1028 E. coli genomes from NCBI RefSeq (134) (resemblance) and a dataset containing Coronavirus, West Nile virus, Zika virus, Yellow Fever virus and Ebola virus genomes from NCBI RefSeq (134) (probe design). Tools supporting multithreading were run with 30 threads. KrakenUniq and Kraken2 were run on their standard databases and are expected to show better memory efficiency if the MiniKraken database is used instead. BioBloom Tools and Opal were indexed using the training data provided by Opal, which is much smaller than the databases used by the other tools. MetaMaps, unlike the other tools in its category, is a classifier designed specifically for long-read sequences. The datasets and results for each tool can be found at https://gitlab.com/treangenlab/hashreview
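
The star scale described in this caption can be illustrated with a minimal Python sketch. It is not the authors' scoring code: the function names `time_stars` and `memory_stars` are hypothetical, and the exact cut-offs are assumptions. The caption gives only rough unit classes for time, so the boundaries here are placed at one day, one hour, one minute and one second. Memory units are assumed to be binary (1 GB = 1024^3 bytes), and a value of exactly 16 MB, which the caption leaves unspecified, is assigned five stars.

```python
MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3


def time_stars(seconds: float) -> int:
    """Map a run time to 1-5 stars (days, hours, minutes, seconds, milliseconds).

    The boundaries between classes are assumed, not taken from the paper.
    """
    if seconds >= 86_400:  # one day or longer
        return 1
    if seconds >= 3_600:   # hours
        return 2
    if seconds >= 60:      # minutes
        return 3
    if seconds >= 1:       # seconds
        return 4
    return 5               # sub-second (milliseconds)


def memory_stars(peak_bytes: int) -> int:
    """Map peak RAM to 1-5 stars using the thresholds listed in the caption."""
    if peak_bytes > 64 * GB:  # server class
        return 1
    if peak_bytes > 16 * GB:  # workstation class
        return 2
    if peak_bytes > 1 * GB:
        return 3
    if peak_bytes > 16 * MB:
        return 4
    return 5                  # 16 MB or less (assumed handling of the boundary)
```

For example, under these assumptions a tool that finishes in 90 seconds with a 32 GB peak would receive three stars for speed and two for memory.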