2017 Jul-Sep;40(3):553–576. doi: 10.1590/1678-4685-GMB-2016-0230

Table 1. Overview of the tools described in the present review.

Category Tool Main features Dependencies* Reference Download link / webserver
Scaffolding ABySS
  • Paired-end scaffolding.

  • Scaffolding feature already integrated in the ABySS de novo assembly pipeline.

  • Uses the estimated distances generated by the program DistanceEst (from the same package) as input.

  • Allows scaffolding with long reads, such as those generated by PacBio and Oxford Nanopore platforms.

boost libraries:
www.boost.org/
Open MPI:
http://www.open-mpi.org
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
(Simpson et al., 2009) http://www.bcgsc.ca/platform/bioinfo/software/abyss
Scaffolding Bambus 2
  • Paired-end scaffolding.

  • Can be easily integrated with assembly projects that are built on top of the AMOS package.

  • Supports the scaffolding of metagenomes.

  • Requires experience with the AMOS package and its data formats.

AMOS package (Treangen et al., 2011):
http://amos.sourceforge.net/
(Koren et al., 2011) https://sourceforge.net/projects/amos/
Scaffolding MIP
  • Paired-end scaffolding.

  • Supports both paired-end and mate-pair (long range) reads.

lpsolve library:
http://sourceforge.net/projects/lpsolve/
lemon library:
http://lemon.cs.elte.hu/
(Salmela et al., 2011) https://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/
Scaffolding OPERA
  • Paired-end scaffolding.

  • Identifies potentially spurious connections caused by chimeric reads and repetitive genomic elements that may affect the reliability of the scaffolding.

  • Contigs identified as misassembled may be used in the construction of more than one scaffold, although this may sometimes introduce new assembly errors.

BWA (Li and Durbin, 2009):
http://bio-bwa.sourceforge.net/
Bowtie (Langmead et al., 2009):
http://bowtie-bio.sourceforge.net/
Samtools (Li et al., 2009):
http://samtools.sourceforge.net/
(Gao et al., 2011) https://sourceforge.net/projects/operasf
Scaffolding SCARPA
  • Paired-end scaffolding.

  • Uses only contigs longer than the N50 of the assembly for scaffolding.

  • Allows multiple libraries to be used in the same scaffolding project.

None (Donmez and Brudno, 2013) http://compbio.cs.toronto.edu/hapsembler/scarpa.html
Scaffolding SGA
  • Paired-end scaffolding.

  • Scaffolding feature already integrated in the SGA assembly pipeline, which is optimized for Illumina data and large genomes.

  • Uses the estimated distances generated by the program DistanceEst (from the package ABySS) as input, along with the read mapping file in .BAM format.

  • Allows multiple libraries to be used in the same scaffolding project.

Bamtools (Barnett et al., 2011):
https://github.com/pezmaster31/bamtools
BWA (Li and Durbin, 2009):
http://bio-bwa.sourceforge.net/
Samtools (Li et al., 2009):
http://samtools.sourceforge.net/
Sparse-hash library:
http://goog-sparsehash.sourceforge.net/
(Simpson and Durbin, 2012) https://github.com/jts/sga
Scaffolding SOPRA
  • Paired-end scaffolding.

  • Developed to improve the assemblies generated by Velvet and SSAKE, and requires the .AFG files.

  • Supports data from early Illumina and ABI SOLiD platforms, including paired-end and mate-pair reads.

  • Is not fully automated, so a separate script must be run for each step of the scaffolding.

None (Dayarian et al., 2010) http://www.physics.rutgers.edu/~anirvans/SOPRA/
Scaffolding SSPACE
  • Paired-end scaffolding.

  • Trims the edges of the contigs, as they are more susceptible to assembly errors.

  • Requires information about the paired-end library, including mean size of the insert, standard deviation and the relative orientation of the mates.

None (Boetzer et al., 2011) http://www.baseclear.com/genomics/bioinformatics/basetools/
Scaffolding SSPACE-LongRead
  • Paired-end scaffolding.

  • Allows scaffolding with long reads, such as those generated by PacBio and Oxford Nanopore platforms.

None (Boetzer and Pirovano, 2014) http://www.baseclear.com/genomics/bioinformatics/basetools/
Scaffolding MUMmer
  • Single reference-based scaffolding.

  • The result of the alignment must be post-processed to obtain the scaffolds.

(Kurtz et al., 2004) http://mummer.sourceforge.net/
Scaffolding ABACAS
  • Single reference-based scaffolding.

  • Useful when the reference and the target genome are closely related, and the genome to be scaffolded is not larger than the reference genome.

  • Not optimized for bacteria with two or more replicons/chromosomes (e.g., the genus Leptospira).

  • Allows the design of primers for gap-closing.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
Primer3 (Koressaar and Remm, 2007; Untergasser et al., 2012):
http://primer3.ut.ee/
(Assefa et al., 2009) http://abacas.sourceforge.net/
Scaffolding CONTIGuator
  • Single reference-based scaffolding.

  • Useful when the target genome is composed of more than one chromosome / replicon.

  • Allows a more sensitive identification of syntenic regions than ABACAS, as it applies a BLAST search after MUMmer.

ABACAS (Assefa et al., 2009):
http://abacas.sourceforge.net/
BioPython (Python package):
http://biopython.org/
BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
Primer3 (Koressaar and Remm, 2007; Untergasser et al., 2012):
http://primer3.ut.ee/
(Galardini et al., 2011) http://contiguator.sourceforge.net/
Scaffolding Mauve
  • Single reference-based scaffolding.

  • Can be used through either a command-line interface (CLI) or a graphical user interface (GUI).

  • Allows the identification of genomic inversions and translocations.

  • Not optimized for bacteria with two or more replicons/chromosomes.

Java:
https://www.java.com/
(Darling et al., 2004; Rissman et al., 2009) http://darlinglab.org/mauve/mauve.html
Scaffolding FillScaffolds
  • Single reference-based scaffolding.

  • Not optimized for bacteria with two or more replicons/chromosomes.

  • Results may require post-processing to reconstruct the sequence of the scaffold.

Java:
https://www.java.com/
(Muñoz et al., 2010) Supplementary data of Muñoz et al. (2010). http://dx.doi.org/10.1186/1471-2105-11-304
Scaffolding SIS
  • Single reference-based scaffolding.

  • Allows the identification of genomic inversions.

  • Not optimized for bacteria with two or more replicons/chromosomes.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Dias et al., 2012) http://marte.ic.unicamp.br:8747
Scaffolding CAR
  • Single reference-based scaffolding.

  • Allows the identification of genomic inversions and translocations.

  • Also available as a webserver.

  • Not optimized for bacteria with two or more replicons/chromosomes.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
PHP:
https://php.net/
(Lu et al., 2014) http://genome.cs.nthu.edu.tw/CAR/
Scaffolding RACA
  • Multiple reference-based scaffolding.

  • Optimized for large genomes with multiple chromosomes.

  • Can also use paired-end data.

None (Kim et al., 2013) http://bioen-compbio.bioen.illinois.edu/RACA/
Scaffolding Ragout
  • Multiple reference-based scaffolding.

  • Uses phylogenetic information to identify the most probable orientation of the contigs / scaffolds.

Networkx (Python package):
http://networkx.github.io/
Newick (Python package):
http://www.daimi.au.dk/~mailund/newick.html
Sibelia:
http://github.com/bioinf/Sibelia
(Kolmogorov et al., 2014) https://github.com/fenderglass/Ragout
Scaffolding MeDuSa
  • Multiple reference-based scaffolding.

  • Accepts both finished and draft genomes as reference.

BioPython (Python package):
http://biopython.org/
Java:
https://www.java.com/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Bosi et al., 2015) https://github.com/combogenomics/medusa
Assembly integration Minimus
  • Can be easily integrated with assembly projects that are built on top of the AMOS package.

  • Requires experience with the AMOS package and its data formats.

AMOS package (Treangen et al., 2011):
http://amos.sourceforge.net/
(Sommer et al., 2007) https://sourceforge.net/projects/amos/
Assembly integration Reconciliator
  • Corrects the misassembled regions in a target assembly by comparing to an alternative assembly for the same genome.

  • Identifies repetitive regions that suffered compressions or expansions.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Zimin et al., 2008) http://www.genome.umd.edu/
Assembly integration MAIA
  • Allows the integration of two or more assemblies.

  • Accepts a reference genome to perform scaffolding, which is useful for contigs without correspondence in the other assemblies.

Matlab:
https://www.mathworks.com/
MUMmer:
http://mummer.sourceforge.net/
GAIMC (Matlab toolbox):
http://github.com/dgleich/gaimc
(Nijkamp et al., 2010) http://bioinformatics.tudelft.nl
Assembly integration CISA
  • Allows the integration of three or more assemblies.

  • Corrects misassembled regions and compressed / expanded repeated regions.

BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Lin and Liao, 2013) http://sb.nhri.org.tw/CISA/
Assembly integration GAA
  • Uses the alignment between the different contigs in the set of assemblies to generate an assembly graph, which is explored to identify the minimal set of independent paths.

BLAT (Kent, 2002):
https://genome.ucsc.edu/
GSMapper:
http://454.com/
(Yao et al., 2012) http://sourceforge.net/projects/gaa-wugi/
Assembly integration Mix
  • Generates an extension graph that represents the connections between the contigs.

  • Filters the alignment to reduce the ambiguities caused by repetitive sequences.

Networkx (Python package):
http://networkx.lanl.gov/
BioPython (Python package):
http://biopython.org/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Soueidan et al., 2013) https://github.com/cbib/MIX
Assembly integration GAM / GAM-NGS
  • Requires the read files to perform the assembly integration.

  • One of the assemblies to be merged is defined as “master”, while the others are defined as “slaves”.

  • Allows the identification of misassembled regions in the master, which are corrected before the generation of the final assembly.

cmake:
https://cmake.org/
zlib library:
http://www.zlib.net/
boost libraries:
www.boost.org/
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
(Casagrande et al., 2009; Vicedomini et al., 2013) https://github.com/vice87/gam-ngs
Assembly integration Zorro
  • Requires the read files to perform the assembly integration.

  • Remaps the reads back to the contigs and identifies misassembled and repetitive regions based on the coverage.

  • Splits the misassembled contigs and performs the assembly integration using Minimus.

AMOS (Treangen et al., 2011):
http://amos.sourceforge.net/
BioPerl (Perl module):
http://bioperl.org
Bowtie (Langmead et al., 2009):
http://bowtie-bio.sourceforge.net/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Argueso et al., 2009) http://lge.ibi.unicamp.br/zorro/
Gap closing GapCloser
  • Gap-closing feature already integrated in the SOAPdenovo de novo assembly pipeline.

  • Performs a local reassembly in the gap region using the reads located in the edges of the surrounding contigs.

None (Li et al., 2010) http://soap.genomics.org.cn/
Gap closing IMAGE
  • Iteratively performs a remapping of the reads to the contigs, followed by the selection of those that overlap the gap region and a local reassembly.

None (Tsai et al., 2010) https://sourceforge.net/projects/image2
Gap closing GapFiller
  • Iteratively performs a remapping of the reads to the contigs, followed by the selection of those that overlap the gap region and a local reassembly.

  • Requires information about the paired-end library, including mean size of the insert, its standard deviation and the relative orientation of the mates.

None (Boetzer and Pirovano, 2012) http://www.baseclear.com/genomics/bioinformatics/basetools
Gap closing Enly
  • Iteratively performs a remapping of the reads to the contigs, followed by the selection of those that overlap the gap region and a local reassembly.

  • If a reference genome is provided, a new scaffolding step can be performed to improve the assembly.

BioPython (Python package):
http://biopython.org/
BLAST and BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Cdbfasta/cdbyank:
http://compbio.dfci.harvard.edu/tgi/software/
EMBOSS:
http://emboss.sourceforge.net/
Minimo assembler (Treangen et al., 2011):
http://amos.sourceforge.net/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
Phrap:
http://www.phrap.org/phredphrapconsed.html
(Fondi et al., 2014) http://enly.sourceforge.net/
Gap closing FGAP
  • Uses alternative assemblies of the target genome to identify regions that overlap the gap.

Matlab:
https://www.mathworks.com/
(Piro et al., 2014) http://www.bioinfo.ufpr.br/fgap/
Gap closing Sealer
  • Performs a local reassembly of the gap regions using different k-mer sizes, which may help to resolve regions with repetitive sequences.

boost libraries:
www.boost.org/
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
Open MPI:
http://www.open-mpi.org
(Paulino et al., 2015) https://github.com/bcgsc/abyss/tree/sealer-release
Gap closing GMcloser
  • May use both paired-end reads and alternative assemblies to perform the gap-closing.

  • Applies a likelihood analysis to avoid the effect of misassemblies in the alternative assemblies.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Bowtie (Langmead et al., 2009):
http://bowtie-bio.sourceforge.net/
YASS (Noé and Kucherov, 2005):
http://bioinfo.lifl.fr/yass
(Kosugi et al., 2015) https://sourceforge.net/projects/gmcloser/
Gap closing MapRepeat
  • Performs a reference-based scaffolding using a closely-related genome provided by the user.

  • Uses a reference-guided assembly to perform the gap-closing process.

BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
BioPython (Python package):
http://biopython.org/
MIRA:
http://mira-assembler.sourceforge.net
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(Mariano et al., 2015) http://github.com/dcbmariano/maprepeat
Gap closing GapBlaster
  • Allows a manual gap-closing using an alternative assembly of the target genome.

BLAST and BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
(de Sá et al., 2016) https://sourceforge.net/projects/gapblaster2015/
Assembly evaluation REAPR
  • Calculates the accuracy of the assembly based on the coverage after remapping the reads back to the scaffolds.

  • Misassembled regions can be identified as they usually present a discrepant coverage.

  • A new set of scaffolds is generated by splitting the regions identified as misassembled.

File::Basename, File::Copy, File::Spec, File::Spec::Link, Getopt::Long and List::Util (Perl modules):
http://www.cpan.org/
R:
https://www.r-project.org/
(Hunt et al., 2013) http://www.sanger.ac.uk/science/tools/reapr
Assembly evaluation QUAST
  • Calculates several assembly metrics, such as C+G%, N50 and L50.

  • Can be used to compare different assemblies of the same genome, and / or compare them to a reference genome.

boost libraries:
www.boost.org/
Java:
https://www.java.com/
Matplotlib (Python package):
http://matplotlib.org
Time::HiRes (Perl module):
http://www.cpan.org/
(Gurevich et al., 2013) http://bioinf.spbau.ru/quast
Assembly evaluation ALE
  • Calculates the accuracy of the assembly based on the k-mers and C+G% distribution along the scaffolds.

  • Doesn't require a reference genome.

Matplotlib (Python package):
http://matplotlib.org
Mpmath (Python package):
http://mpmath.org
Numpy (Python package):
http://www.numpy.org
Pymix (Python package):
http://www.pymix.org/pymix
Setuptools (Python package):
https://github.com/pypa/setuptools
(Clark et al., 2013) http://www.alescore.org
Assembly evaluation CGAL
  • Calculates the accuracy of the assembly based on the coverage after remapping the reads back to the scaffolds.

None (Rahman and Pachter, 2013) http://bio.math.berkeley.edu/cgal/
Assembly evaluation GMvalue
  • Aligns the assembly to a reference genome (or alternative assembly) to identify misassembled regions.

  • A new set of scaffolds is generated by splitting the regions identified as misassembled.

MUMmer (Kurtz et al., 2004):
http://mummer.sourceforge.net/
BLAST+ (Altschul et al., 1990; Camacho et al., 2009):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Bowtie (Langmead et al., 2009):
http://bowtie-bio.sourceforge.net/
YASS (Noé and Kucherov, 2005):
http://bioinfo.lifl.fr/yass
(Kosugi et al., 2015) https://sourceforge.net/projects/gmcloser/
Assembly correction iCORN
  • Requires paired-end reads.

  • Iteratively identifies and corrects short misassemblies, such as base substitutions and short INDELs.

SNP-o-matic (Manske and Kwiatkowski, 2009):
https://snpomatic.svn.sourceforge.net/svnroot/snpomatic
SSAHA Pileup (Ning et al., 2001):
ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/
(Otto et al., 2010) http://icorn.sourceforge.net/
Assembly correction SEQuel
  • Requires paired-end reads.

  • Interactively identifies and corrects short misassemblies, such as base-substitutions and short INDELs.

  • Performs a local reassembly of the misassembled regions using information from k-mers and paired-end reads.

Java:
https://www.java.com/
JGraphT (Java library):
http://jgrapht.org/
(Ronen et al., 2012) http://bix.ucsd.edu/SEQuel/
Assembly correction GFinisher
  • Doesn't require paired-end reads.

  • Integrates a reference-guided scaffolding step and gap-closing procedures, along with the assembly correction process.

  • Identifies misassembled regions based on the GC-Skew distribution.

Java:
https://www.java.com/
(Guizelini et al., 2016) http://gfinisher.sourceforge.net/
* Considering a computer running UNIX, Linux or Mac OS operating systems (OSs). As Make, sed, awk, GCC, Perl, Bash, Python and the GNU/Unix standard utility set are already included in most distributions / versions of these OSs, these programs were not listed as dependencies.
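
The assembly metrics listed for QUAST in Table 1 (C+G%, N50 and L50) can be illustrated with a minimal Python sketch. This is not QUAST's implementation: the function name `assembly_metrics` is hypothetical, the contigs are assumed to be plain sequence strings already loaded in memory, and computing C+G% over A/C/G/T bases only (ignoring N and other ambiguity codes) is an assumption made here for simplicity.

```python
def assembly_metrics(contigs):
    """Return (N50, L50, C+G%) for a list of contig sequences (strings).

    Hypothetical helper for illustration only; not taken from QUAST.
    """
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly contains no bases")

    # Walk the contigs from longest to shortest. N50 is the length of the
    # contig at which the running sum first reaches half of the total
    # assembly size; L50 is how many contigs were needed to get there.
    running = 0
    n50 = l50 = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:
            n50, l50 = length, count
            break

    # C+G% computed over unambiguous bases only (assumption: N and other
    # IUPAC ambiguity codes are ignored).
    gc = 0
    acgt = 0
    for seq in contigs:
        upper = seq.upper()
        g_and_c = upper.count("G") + upper.count("C")
        gc += g_and_c
        acgt += g_and_c + upper.count("A") + upper.count("T")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0

    return n50, l50, gc_percent


if __name__ == "__main__":
    # Toy assembly with contig lengths 80, 70 and 50 (total 200 bases).
    toy = ["A" * 80, "GC" * 35, "AT" * 25]
    print(assembly_metrics(toy))  # (70, 2, 35.0)
```

In the toy assembly, the longest contig (80 bases) covers less than half of the 200 bases, so the second contig (70 bases) is the one that crosses the halfway point, giving N50 = 70 and L50 = 2. The same N50 value is the threshold a scaffolder such as SCARPA would use when restricting itself to contigs longer than the assembly N50, as described in the table.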