2013 Sep 13;14(9):R101. doi: 10.1186/gb-2013-14-9-r101

Table 2.

Genome assembly continuity and correctness using hybrid and self-correction approaches

| Organism | Corrected by | Assembly bp | Number of contigs (expected) | Number of contigs (actual) | N50 (expected) | N50 (actual) | LAP | Number of discordant bases | QV |
|---|---|---|---|---|---|---|---|---|---|
| E. coli K12 | Reference | 4,639,675 | | 1 | 4,639,675 | NA | -9.65E+07 | 4 | >60 |
| | MiSeq 100× | 4,647,253 | 1 | 2 | | 2,367,319 | -9.64E+07 | 3 | >60 |
| | 454 50× | 4,649,004 | 1 | 1 | | 4,649,004 | -9.64E+07 | 3 | >60 |
| | CCS 25× | 4,653,267 | 1 | 1 | | 4,653,267 | -9.64E+07 | 3 | >60 |
| | Self | 4,653,486 | 1 | 1 | | 4,653,486 | -9.64E+07 | 3 | >60 |
| E. coli O157:H7 | Near neighbor | 5,594,477 | | 3 | 3,776,951 | NA | -3.82E+07 | 1,282 | 36.40 |
| | MiSeq 100× | 5,624,394 | 10 | 10 | | 3,089,011 | -3.66E+07 | 4 | >60 |
| | 454 40× | 5,613,057 | 10 | 12 | | 927,294 | -3.67E+07 | 13 | 56.35 |
| | Self | 5,611,389 | 10 | 9 | | 4,324,437 | -3.66E+07 | 0 | >60 |
| B. trehalosi | MiSeq 100× | 2,402,545 | | 6 | | 1,603,511 | -3.28E+07 | 1 | >60 |
| | 454 50× | 2,413,761 | | 4 | | 1,051,672 | -3.27E+07 | 2 | >60 |
| | CCS 25× | 2,411,501 | | 1 | | 2,411,501 | -3.27E+07 | 0 | >60 |
| | Self | 2,411,068 | | 1 | | 2,411,068 | -3.27E+07 | 0 | >60 |
| M. haemolytica | MiSeq 100× | 2,712,467 | | 1 | | 2,712,467 | -3.31E+07 | 0 | >60 |
| | CCS 25× | 2,739,949 | | 2 | | 2,686,992 | -3.31E+07 | 0 | >60 |
| | Self | 2,736,037 | | 1 | | 2,736,037 | -3.31E+07 | 0 | >60 |
| F. tularensis | Near neighbor | 1,895,727 | | 1 | 965,253 | NA | -1.33E+07 | 113 | 42.25 |
| | MiSeq 100× | 1,879,071 | 3 | 10 | | 357,518 | -1.33E+07 | 0 | >60 |
| | 454 50× | 1,863,947 | 3 | 15 | | 201,203 | -1.33E+07 | 0 | >60 |
| | Self | 1,828,135 | 3 | 8 | | 401,731 | -1.33E+07 | 0 | >60 |
| | Self (300×) | 1,877,407 | 3 | 3 | | 573,021 | -1.33E+07 | 0 | >60 |
| S. enterica Newport | Near neighbor | 5,007,719 | | 2 | 4,827,641 | NA | -2.26E+07 | 20 | 53.99 |
| | MiSeq 56× | 5,027,784 | 4 | 2 | | 4,918,796 | -2.24E+07 | 2 | >60 |
| | 454 25× | 5,034,500 | 4 | 3 | | 4,095,943 | -2.24E+07 | 2 | >60 |
| | CCS 22× | 5,030,885 | 4 | 2 | | 4,921,886 | -2.24E+07 | 2 | >60 |
| | Self | 5,029,197 | 4 | 2 | | 4,919,684 | -2.24E+07 | 2 | >60 |

Organism: the genome being assembled. Corrected by: the short-read data used for correction. Assembly bp: the total number of base pairs in all contigs (only contigs containing at least 100 reads are included in all results). Number of contigs (expected): predicted number of contigs for a known reference (or near-neighbor). Number of contigs (actual): the number of contigs comprising the assembly. N50: N such that 50% of the genome is contained in contigs of length ≥N. LAP: the assembly likelihood score. A score closer to zero indicates a better assembly. Number of discordant bases: the number of SNPs and indels identified by mapping MiSeq sequences back to the assembly and recording discrepancies. Each incorrect base is counted (that is, an indel that is a deletion of two bases from the assembly counts as two in this column). QV: estimated from the number of discordant bases as $\mathrm{QV} = 10 \log_{10}\left(\frac{\text{assembly length}}{\text{number of incorrect bases}}\right)$. The QV can be converted to an error probability $P = 10^{-\mathrm{QV}/10}$. Assemblies were generated by Celera Assembler [31] followed by post-processing with Quiver [32]. NA, not available.
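The QV, error-probability, and N50 definitions in the footnote can be checked against the table with a few lines of Python. This is a minimal sketch and not the authors' pipeline: the function names (`quality_value`, `error_probability`, `n50`) are hypothetical, returning infinity for zero discordant bases is an assumption made here (the table reports such rows as >60), and the optional `genome_size` argument is an assumption that lets N50 be computed against either the assembly total or a known genome length.

```python
import math


def quality_value(assembly_length: int, discordant_bases: int) -> float:
    """QV = 10 * log10(assembly length / number of incorrect bases)."""
    if discordant_bases == 0:
        # Assumption: no observed errors means the QV is unbounded.
        return math.inf
    return 10 * math.log10(assembly_length / discordant_bases)


def error_probability(qv: float) -> float:
    """Convert a QV to a per-base error probability, P = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def n50(contig_lengths, genome_size=None):
    """Largest N such that contigs of length >= N cover at least 50% of the total.

    If genome_size is None, the sum of the contig lengths is used as the total.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # the contigs never reach half of genome_size


if __name__ == "__main__":
    # E. coli O157:H7 near-neighbor row: 5,594,477 bp with 1,282 discordant bases.
    qv = quality_value(5_594_477, 1_282)
    print(f"QV = {qv:.2f}")  # prints "QV = 36.40", matching the table
    print(f"P  = {error_probability(qv):.2e}")
```

The same call reproduces the other finite QV entries, for example 113 discordant bases over 1,895,727 bp for the F. tularensis near neighbor gives 42.25.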