Table 6. Feature comparison between dRep, Assembly-Dereplicator (A-D) and TQMD.
Feature | dRep | A-D | TQMD
---|---|---|---
main engine(s) | Mash + ANIm (or gANI) | Mash | JELLYFISH or Mash |
other dependencies | CheckM (optional) | none | QUAST (optional), RNAmmer (optional), CD-HIT-EST (optional), Forty-Two (optional), CheckM (optional) |
relational database | N | N | Y |
genome source | custom | custom | RefSeq, GenBank, custom |
taxonomic filters | N | N | Y (when downloading and clustering) |
automatic genome download | N | N | Y |
distance metric(s) | Mash distance (estimated JI) then ANI | Mash distance (estimated JI) | 1-JI (exact) or Mash distance (estimated JI) or 1-IGF (exact) |
heuristic(s) | biphasic approach: Mash for fast and rough clustering followed by ANI for slow and accurate clustering | d-and-c strategy (serial) | iterative greedy algorithm (serial) + d-and-c strategy (parallel) |
stop condition(s) | unspecified | first failure to dereplicate any serial batch | any of 3 possible cut-offs (number of rounds, number of representatives, clustering ratio) |
d-and-c dividing scheme | unspecified | random | random or taxonomic |
selection of representatives | formula based on genome size, assembly quality and contamination level (incl. strain heterogeneity) | assembly quality | formula based on genome size, assembly quality, annotation richness and contamination level (fully customisable with 30 possible metrics) |
parameterization of representative selection | Y (parameter weights) | N | Y (simplified formula) |
grid engine support | N | N | Y (SGE/OGE) (optional) |
distribution | source (pip), conda, Galaxy | source | source (Bitbucket), Singularity container |
CPU usage | fixed on launch | fixed on launch | specified as a maximum (decreases over time) |
Notes.
- JI: Jaccard Index
- IGF: Identical Genome Fraction
- ANI: average nucleotide identity
- d-and-c: divide-and-conquer
- SGE/OGE: Sun/Open Grid Engine
- Y: present feature
- N: absent feature
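
To make the "distance metric(s)" and "heuristic(s)" rows more concrete, the sketch below computes an exact 1-JI distance on k-mer sets, converts a Jaccard index into a Mash distance using the published Mash formula D = -(1/k) ln(2j / (1 + j)), and runs a single greedy dereplication pass. It is a minimal illustration under stated assumptions, not the implementation of any of the three tools: the function names (`kmers`, `jaccard_distance`, `mash_distance`, `greedy_dereplicate`), the toy sequences and the threshold are all hypothetical, and real tools work on canonical k-mers, sketches and full assemblies rather than short strings.

```python
import math


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq (no canonicalisation, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int) -> float:
    """Exact 1-JI distance between the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0  # both sequences shorter than k: treat as identical
    return 1.0 - len(ka & kb) / len(union)


def mash_distance(j: float, k: int) -> float:
    """Mash distance from a Jaccard index j: D = -(1/k) * ln(2j / (1 + j))."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximal distance
    return min(1.0, -math.log(2.0 * j / (1.0 + j)) / k)


def greedy_dereplicate(genomes: dict[str, str], k: int, threshold: float) -> list[str]:
    """One greedy pass: a genome becomes a representative only if it lies
    farther than `threshold` from every representative chosen so far."""
    representatives: list[str] = []
    for name, seq in genomes.items():
        if all(jaccard_distance(seq, genomes[r], k) > threshold
               for r in representatives):
            representatives.append(name)
    return representatives


if __name__ == "__main__":
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAC", "c": "TTTTTTTTTT"}
    print(greedy_dereplicate(toy, k=4, threshold=0.1))
```

In this toy run, genome `b` is identical to `a` and is therefore absorbed, whereas `c` shares no k-mer with `a` and is kept as a second representative. An iterative scheme of the kind listed for TQMD would repeat such passes until one of its cut-offs (number of rounds, number of representatives, clustering ratio) is reached; those cut-offs are not modelled here.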