2013 Oct;20(10):714–737. doi: 10.1089/cmb.2013.0084

Table 1.

Comparison of Assemblers on ECOLI-SC, a Single-Cell E. coli Dataset

Assembler [a]   NGA50   # contigs [b]  Longest contig  Total length  MA [c]  MM [d]  IND [e]  Ns [f]   GF (%) [g]  No. genes [h]
Conventional (multicell) assemblers
  A5            13310   745            101584          4441145       8       11.97   0.19     0.00     90.141      3453
  ABySS         68534   179            178720          4345617       5       2.71    2.66     17.07    88.268      3704
  CLC           32277   503            113285          4656964       3       4.76    2.87     7.40     92.378      3768
  EULER-SR      26580   429            140518          4248713       18      9.37    218.72   58.14    85.005      3419
  Ray           53903   296            210612          4649552       13      2.34    0.87     0.00     91.864      3838
  SOAPdenovo    16606   569            87533           4098032       7       114.38  11.08    1295.26  79.861      3038
  Velvet        22648   261            132865          3501984       2       2.07    1.23     0.00     74.254      3098
Single-cell assemblers
  E + V-SC      32051   344            132865          4540286       2       1.85    0.70     0.00     92.162      3793
  IDBA-UD       98306   244            284464          4814043       7       2.08    0.11     0.00     95.763      4062
  SPAdes 2.4    110782  274            268093          4929226       2       3.28    0.49     2.52     96.157      4060
[a] Comparisons were performed with QUAST 1.2 (Gurevich et al., 2013). In each column, the best assembler by that criterion is indicated in bold.

[b] Only contigs of length ≥500 bp were used.

[c] MA: number of misassemblies. Misassemblies are locations on an assembled contig where the left flanking sequence aligns over 1 kb away from the right flanking sequence on the reference.
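The distance criterion in this footnote can be sketched in a few lines of Python. This is only an illustration of the stated 1 kb rule, not the tool's actual implementation, which may apply further rules that are not modeled here. The function name, its parameters, and the coordinates in the usage lines are hypothetical.

```python
def is_misassembly(left_ref_end: int, right_ref_start: int,
                   threshold_bp: int = 1000) -> bool:
    """Flag a breakpoint on a contig as a misassembly.

    left_ref_end:    reference coordinate where the left flanking
                     sequence's alignment ends (hypothetical input).
    right_ref_start: reference coordinate where the right flanking
                     sequence's alignment starts (hypothetical input).
    threshold_bp:    distance cutoff; 1 kb as stated in the footnote.
    """
    # The flanks are adjacent on the contig, so a large separation on
    # the reference means the contig joins regions that are not adjacent.
    return abs(right_ref_start - left_ref_end) > threshold_bp


# Hypothetical coordinates, for illustration only.
print(is_misassembly(10_000, 10_500))  # flanks 500 bp apart on the reference
print(is_misassembly(10_000, 12_500))  # flanks 2.5 kb apart on the reference
```

The first call stays under the 1 kb cutoff and is not flagged; the second exceeds it and is.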

[d] MM: Mismatch (substitution) error rate per 100 kb.

[e] IND: number of indels per 100 kb. MM and IND are measured in aligned regions of the contigs.
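As a rough illustration of how a "per 100 kb" rate such as MM or IND is normalized, the sketch below divides an event count by the number of aligned bases and scales to 100,000. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def rate_per_100kb(event_count: int, aligned_bases: int) -> float:
    """Normalize an event count (mismatches or indels) to a rate per 100 kb.

    aligned_bases is the total length of the aligned regions of the
    contigs, since MM and IND are measured only in aligned regions.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


# Hypothetical counts: 50 mismatches over 2.5 Mb of aligned sequence.
print(rate_per_100kb(50, 2_500_000))  # 2.0
```

Normalizing by aligned length makes assemblies of different sizes comparable, which is why the table reports rates and not raw counts.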

[f] Ns: number of undefined bases (N) per 100 kb.

[g] GF (%): The genome fraction is the fraction of the genome covered by the contigs. For single-cell projects, the total assembly size often exceeds the genome length because of contaminants and other factors (see Woyke et al., 2011). Because the genome fraction counts only the part of the genome that the contigs cover, it filters out these issues.
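A minimal sketch of the genome fraction idea follows, assuming contig alignments are available as half-open (start, end) intervals on the reference. The interval representation, the function name, and the toy numbers are assumptions made for illustration. Overlapping alignments are merged so that no reference position is counted twice, and sequence that does not align to the reference (for example, contaminants) contributes nothing.

```python
from typing import Iterable, Tuple


def genome_fraction(alignments: Iterable[Tuple[int, int]],
                    genome_length: int) -> float:
    """Percentage of the reference covered by at least one alignment.

    alignments: half-open (start, end) intervals on the reference.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        if cur_end is None or start > cur_end:
            # A new, disjoint interval begins; close out the previous one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval; extend the current one.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


# Toy reference of 1000 bp; two of the alignments overlap by 100 bp.
print(genome_fraction([(0, 400), (300, 600), (800, 1000)], 1000))  # 80.0
```

In the toy example the merged intervals cover 600 + 200 = 800 of 1000 positions, so the fraction is 80%, regardless of how much additional unaligned sequence the assembly contains.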

[h] No. genes: number of genes sequenced at full length, out of a list of 4324 annotated E. coli genes from www.ecogene.org.
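To put the gene counts in context, the short sketch below expresses each count from the No. genes column as a percentage of the 4324 annotated genes. The counts are copied from the table; the variable and function names are illustrative only.

```python
TOTAL_ANNOTATED_GENES = 4324  # size of the gene list cited in the footnote

# Full-length gene counts, copied from the "No. genes" column of Table 1.
full_length_genes = {
    "A5": 3453, "ABySS": 3704, "CLC": 3768, "EULER-SR": 3419,
    "Ray": 3838, "SOAPdenovo": 3038, "Velvet": 3098,
    "E + V-SC": 3793, "IDBA-UD": 4062, "SPAdes 2.4": 4060,
}


def gene_percentage(count: int) -> float:
    """Share of the annotated gene list recovered at full length."""
    return 100.0 * count / TOTAL_ANNOTATED_GENES


# Print assemblers from most to fewest full-length genes.
for name, count in sorted(full_length_genes.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {count:>5} {gene_percentage(count):6.2f}%")
```

Sorted this way, the single-cell assemblers IDBA-UD and SPAdes 2.4 come first, with 4062 and 4060 genes respectively.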