TABLE 2.
Analysis tool (reference[s]) | Concept | Computational requirement | Speed | Assembly quality | Preferred sequencing technology(ies) | Web address(es) | Input format | Output format(s) |
---|---|---|---|---|---|---|---|---|
Web-based | ||||||||
Velvet (103, 126) | de Bruijn graph-based assembly that resolves repeat-rich regions; can be used for de novo or reference-guided assembly; requires paired reads with 20- to 25-fold coverage | Mid* | Medium* | Low* | Illumina | https://cge.cbs.dtu.dk/services/Assembler/ | FASTA, FASTQ, SAM, or BAM | AMOS, modified FASTA |
SPAdes/hybridSPAdes (112) | de Bruijn graph-based assembler for de novo assembly of short and long reads | Low** | Low** | Mid*/** | Mixed input (Illumina, Ion Torrent, PacBio CLR, Oxford Nanopore) | https://cge.cbs.dtu.dk/services/SPAdes/ | FASTA, FASTQ, or BAM | FASTA, FASTQ, FASTG |
Command line | ||||||||
IDBA-UD (108) | de Bruijn graph-based assembly designed for repeat-rich data sets with highly uneven sequencing depth | Low* | Medium* | Mid* | Illumina | http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ | FASTA | FASTA |
RAY (96) | de Bruijn graph-based assembly that uses seeds instead of Eulerian walks; used for de novo assembly; designed for short reads | Low*** | Fast*** | Low*** | Mixed input (454, Illumina, Ion Torrent) | http://denovoassembler.sourceforge.net/ | FASTA, FASTQ, or SFF | FASTA, TXT |
Minimap/miniasm (116) | OLC framework in which minimap computes overlaps and miniasm performs read trimming and unitig construction; can be used for de novo or reference-guided assembly | Low** | High** | High*/** | PacBio, Oxford Nanopore | https://github.com/lh3/minimap, https://github.com/lh3/miniasm | FASTA | GFA, PAF |
Canu (118) | OLC framework that computes overlaps and performs read correction, read trimming, and unitig construction; used for de novo assembly | Mid** | Low** | High*/** | PacBio, Oxford Nanopore | https://github.com/marbl/canu | FASTA or FASTQ | FASTA |
All quantitative performance measures were taken from data reported previously, as indicated. CLR, continuous long reads; GFA, graphical fragment assembly; OLC, overlap-layout-consensus; PAF, pairwise mapping format; SFF, standard flowgram format (454 data format); *, E. coli K-12 MG1655 data set (110); **, Enterobacter kobei data set (233); ***, Illumina data from E. coli (SRA accession number SRX000429) (234). Note that for SPAdes, only the nonhybrid tool is accessible as a Web-based tool.
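The table groups the assemblers by two concepts: de Bruijn graph-based assembly and the overlap-layout-consensus (OLC) framework. The following is a minimal, illustrative Python sketch of the core idea behind each, not code from any of the listed tools. It assumes error-free reads and ignores reverse complements, coverage thresholds, error correction, and consensus calling, all of which real assemblers handle. The function names `de_bruijn_graph`, `unitigs`, and `suffix_prefix_overlap` are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; every k-mer in the reads adds an edge
    from its (k-1)-prefix to its (k-1)-suffix (duplicates collapse)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contigs (unitigs).
    Simplification: isolated cycles with no branch point are skipped."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indegree[v] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for u in sorted(nodes):
        if one_in_one_out(u):
            continue  # interior node; it is reached from a path start
        for v in sorted(graph.get(u, ())):
            path = u + v[-1]
            # Extend while the path stays unambiguous (no repeat-induced branch)
            while one_in_one_out(v):
                v = next(iter(graph[v]))
                path += v[-1]
            contigs.append(path)
    return contigs


def suffix_prefix_overlap(a, b, min_len):
    """OLC overlap step: length of the longest exact suffix of `a` that
    equals a prefix of `b`, or 0 if shorter than `min_len`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


if __name__ == "__main__":
    reads = ["ACGTTGC", "GTTGCATG"]
    print(unitigs(de_bruijn_graph(reads, 4)))      # ['ACGTTGCATG']
    print(suffix_prefix_overlap(*reads, min_len=3))  # 5
```

With two overlapping reads, the de Bruijn path recovers the single contig `ACGTTGCATG`, and the OLC-style overlap between the same reads is 5 bases. If the reads instead diverge after a shared k-mer (for example, `CATG` and `CATT` with k = 3), the branch point splits the output into separate unitigs, which is the situation that repeat resolution and paired-read information are used to address.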