PeerJ. 2021 May 5;9:e11348. doi: 10.7717/peerj.11348

Table 6. Feature comparison between dRep, Assembly-Dereplicator (A-D) and TQMD.

| Feature | dRep | A-D | TQMD |
|---|---|---|---|
| main engine(s) | Mash + ANIm (or gANI) | Mash | JELLYFISH or Mash |
| other dependencies | CheckM (optional) | none | QUAST (optional), RNAmmer (optional), CD-HIT-EST (optional), Forty-Two (optional), CheckM (optional) |
| relational database | N | N | Y |
| genome source | custom | custom | RefSeq, GenBank, custom |
| taxonomic filters | N | N | Y (when downloading and clustering) |
| automatic genome download | N | N | Y |
| distance metric(s) | Mash distance (estimated JI) then ANI | Mash distance (estimated JI) | 1-JI (exact) or Mash distance (estimated JI) or 1-IGF (exact) |
| heuristic(s) | biphasic approach: Mash for fast and rough clustering followed by ANI for slow and accurate clustering | d-and-c strategy (serial) | iterative greedy algorithm (serial) + d-and-c strategy (parallel) |
| stop condition(s) | unspecified | first failure to dereplicate any serial batch | any of 3 possible cut-offs (number of rounds, number of representatives, clustering ratio) |
| d-and-c dividing scheme | unspecified | random | random or taxonomic |
| selection of representatives | formula based on genome size, assembly quality and contamination level (incl. strain heterogeneity) | assembly quality | formula based on genome size, assembly quality, annotation richness and contamination level (fully customisable with 30 possible metrics) |
| parameterization of representative selection | Y (parameter weights) | N | Y (simplified formula) |
| grid engine support | N | N | Y (SGE/OGE) (optional) |
| distribution | source (pip), conda, Galaxy | source | source (Bitbucket), Singularity container |
| CPU usage | fixed on launch | fixed on launch | specified as a maximum (decreases over time) |

Notes.

JI: Jaccard Index
IGF: Identical Genome Fraction
ANI: average nucleotide identity
d-and-c: divide-and-conquer
SGE/OGE: Sun/Open Grid Engine
Y: present feature
N: absent feature
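
To make the "1-JI (exact)" distance and the "iterative greedy algorithm" entries of the table more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code of dRep, Assembly-Dereplicator or TQMD: the function names (`kmers`, `jaccard_distance`, `greedy_dereplicate`), the k-mer size and the distance threshold are all hypothetical, reverse-complement (canonical) k-mers are ignored, and input order stands in for whatever ranking a real tool uses to pick representatives.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Exact 1 - JI, where JI = |a & b| / |a | b| (Jaccard Index)."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def greedy_dereplicate(genomes: Dict[str, str], k: int, max_dist: float) -> List[str]:
    """One greedy pass over the genomes in input order.

    A genome becomes a new representative only if its distance to every
    representative kept so far is greater than max_dist (assumed threshold).
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    reps: List[str] = []
    for name in genomes:
        if all(jaccard_distance(sets[name], sets[r]) > max_dist for r in reps):
            reps.append(name)
    return reps


if __name__ == "__main__":
    toy = {
        "g1": "ACGTACGTAC",
        "g2": "ACGTACGTAC",  # exact duplicate of g1
        "g3": "TTTTGGGGCC",  # shares no 4-mer with g1
    }
    print(greedy_dereplicate(toy, k=4, max_dist=0.2))
```

In this toy run the duplicate `g2` is absorbed by `g1`, while `g3` is kept as a second representative. An iterative variant would repeat such passes until a stop condition of the kind listed in the table is met (for example a maximum number of rounds or a target number of representatives).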