Table 1.
Gap closure results obtained on the bacterial datasets
Method | Original | IMAGE | SOAPdenovo | GapFiller | GapFiller-LC |
---|---|---|---|---|---|
Escherichia coli | | | | | |
Genome size (bp) | 4,478,287 | 4,530,961 | 4,490,973 | 4,490,638 | |
Scaffolds | 179 | 179 | 179 | 179 | |
Gap count | 544 | 291 | 16 | 11 | |
Total gap length (bp) | 12,516 | 2,861 | 16 | 130 | |
Errors (SNPs) | 12 | 40 | 33 | 22 | |
Errors (indels) | 4 | 17 | 25 | 9 | |
Errors (misjoins) | 1 | 1 | 1 | 1 | |
N50 | 50,557 | 50,558 | 50,558 | 50,558 | |
Streptomyces coelicolor | | | | | |
Genome size (bp) | 8,558,275 | 8,576,331 | 8,557,720 | 8,558,333 | |
Scaffolds | 115 | 115 | 115 | 115 | |
Gap count | 158 | 63 | 60 | 23 | |
Total gap length (bp) | 9,221 | 4,009 | 1,288 | 806 | |
Errors (SNPs) | 299 | 423 | 406 | 280 | |
Errors (indels) | 664 | 677 | 769 | 686 | |
Errors (misjoins) | 12 | 17 | 18 | 18 | |
N50 | 173,822 | 173,822 | 173,822 | 173,822 | |
Staphylococcus aureus | | | | | |
Genome size (bp) | 2,880,676 | 2,880,926 | 2,881,756 | 2,883,448 | |
Scaffolds | 19 | 19 | 19 | 19 | |
Gap count | 48 | 27 | 27 | 22 | |
Total gap length (bp) | 9,900 | 1,547 | 5,508 | 1,861 | |
Errors (SNPs) | 79 | 260 | 98 | 173 | |
Errors (indels) | 16 | 53 | 26 | 37 | |
Errors (misjoins) | 4 | 13 | 7 | 5 | |
N50 | 1,091,731 | 1,091,333 | 1,092,281 | 1,092,421 | |
Rhodobacter sphaeroides | | | | | |
Genome size (bp) | 4,609,785 | 4,609,466 | 4,609,596 | 4,610,796 | |
Scaffolds | 38 | 38 | 38 | 38 | |
Gap count | 170 | 163 | 161 | 139 | |
Total gap length (bp) | 21,409 | 14,166 | 20,667 | 17,625 | |
Errors (SNPs) | 218 | 410 | 230 | 300 | |
Errors (indels) | 187 | 294 | 190 | 199 | |
Errors (misjoins) | 6 | 10 | 6 | 7 | |
N50 | 3,192,334 | 3,192,075 | 3,192,215 | 3,192,974 | |
Gap closure results obtained on four bacterial datasets show that the GapFiller strategy yields the most accurate finished genomes and a lower gap count than the other methods. The IMAGE method underperforms significantly on all quality measures and is therefore not the preferred choice. Differences between GapFiller and SOAPdenovo are smaller. Interestingly, whereas the gap count after closure is generally lower for GapFiller, SOAPdenovo yields a shorter total gap length in three cases, which suggests that SOAPdenovo is able to close larger gaps. Strikingly, however, the number of errors is significantly higher for SOAPdenovo regardless of the error type (SNPs, indels and misjoins). Even when less strict settings are applied for GapFiller (GapFiller-LC: minimum coverage o = 1, ratio r = 0.5) to shorten the total gap length, our method still yields significantly fewer errors.
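
The gap counts in Table 1 can be turned into a percentage of gaps closed per method, which makes the comparison across datasets of different sizes easier to read. The following is a minimal sketch, not part of the original analysis. It assumes that the four populated columns of the table correspond, in order, to Original, IMAGE, SOAPdenovo and GapFiller (the GapFiller-LC column is empty in the table as given). The names `METHODS`, `GAP_COUNT`, `percent_gaps_closed` and `summarize` are hypothetical and introduced only for illustration.

```python
# Gap counts copied from Table 1.
# ASSUMPTION: the populated columns are, in order,
# Original, IMAGE, SOAPdenovo, GapFiller.
METHODS = ("Original", "IMAGE", "SOAPdenovo", "GapFiller")

GAP_COUNT = {
    "Escherichia coli": (544, 291, 16, 11),
    "Streptomyces coelicolor": (158, 63, 60, 23),
    "Staphylococcus aureus": (48, 27, 27, 22),
    "Rhodobacter sphaeroides": (170, 163, 161, 139),
}


def percent_gaps_closed(counts):
    """Return {method: percent of the original gaps closed} for one dataset."""
    original = counts[0]
    return {
        method: round(100.0 * (original - remaining) / original, 1)
        for method, remaining in zip(METHODS[1:], counts[1:])
    }


def summarize():
    """Apply percent_gaps_closed to every dataset in GAP_COUNT."""
    return {species: percent_gaps_closed(c) for species, c in GAP_COUNT.items()}


if __name__ == "__main__":
    for species, closed in summarize().items():
        print(species, closed)
```

The same pattern could be applied to the error rows (SNPs, indels, misjoins) by subtracting the Original column from each method's value to estimate the errors introduced during gap closure, again under the column-order assumption stated above.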