Table 1.
Dataset | Assembler | #ctgs/scfs | Good Ctgs/scfs | Total aln (Mbp) | Slt | Hvy | Ch | Size @ 10 Mbp | #@ 10 Mbp | Max ctg size | Err per Mbp |
---|---|---|---|---|---|---|---|---|---|---|---|
mockE | SOAPdenovo | 63,014 | 99.3% | 51 | 167 | 131 | 1 | 28,208 | 195 | 249,819 | 5.9 |
mockE | SOAPdenovo_MA | 63,107 | 99.3% | 51 | 166 | 131 | 1 | 28,208 | 195 | 249,819 | 5.8 |
mockE | Velvet | 12,381 | 96.0% | 41 | 269 | 106 | 2 | 46,122 | 128 | 183,815 | 9.2 |
mockE | Velvet_MA | 12,830 | 96.2% | 41 | 256 | 100 | 2 | 42,269 | 137 | 179,673 | 8.7 |
mockE | MetaVelvet | 23,323 | 96.7% | 49 | 474 | 160 | 5 | 62,131 | 93 | 367,458 | 13.0 |
mockE | MetaVelvet_MA | 22,772 | 96.8% | 49 | 462 | 156 | 4 | 62,138 | 91 | 367,458 | 12.7 |
mockE | Meta-IDBA | 22,064 | 95.3% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
mockE | Meta-IDBA_MA | 22,032 | 95.4% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
mockS | SOAPdenovo | 45,251 | 98.8% | 28 | 135 | 99 | 0 | 5,672 | 626 | 186,064 | 8.4 |
mockS | SOAPdenovo_MA | 44,928 | 98.8% | 28 | 135 | 98 | 0 | 5,672 | 626 | 186,064 | 8.3 |
mockS | Velvet | 20,981 | 95.6% | 28 | 498 | 127 | 1 | 6,134 | 770 | 119,120 | 22.4 |
mockS | Velvet_MA | 21,050 | 95.8% | 28 | 485 | 115 | 1 | 6,060 | 775 | 119,120 | 21.5 |
mockS | MetaVelvet | 19,649 | 94.5% | 28 | 518 | 158 | 2 | 13,028 | 351 | 217,330 | 24.2 |
mockS | MetaVelvet_MA | 20,551 | 95.3% | 28 | 517 | 143 | 3 | 6,685 | 622 | 217,330 | 20.1 |
mockS | Meta-IDBA | 4,573 | 92.3% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
mockS | Meta-IDBA_MA | 4,559 | 92.5% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
HMP | SOAPdenovo | 39,028 | 89.9% | 11 | 1,138 | 2,686 | 0 | 9,881 | 514 | 116,204 | 347.6 |
HMP | SOAPdenovo_MA | 35,230 | 89.1% | 11 | 1,138 | 2,618 | 0 | 11,359 | 426 | 238,051 | 341.5 |
HMP | Meta-IDBA | 25,861 | 88.9% | 7 | 718 | 2,102 | 0 | 4,215 | 1,144 | 59,188 | 402.8 |
HMP | Meta-IDBA_MA | 25,698 | 88.7% | 7 | 710 | 2,087 | 0 | 4,215 | 1,144 | 59,188 | 399.6 |
HMPscf | SOAPdenovo | 31,673 | 99.9% | 11 | - | - | 10 | 9,906 | 510 | 116,181 | 0.9 |
HMPscf | SOAPdenovo_MA | 27,231 | 99.9% | 11 | - | - | 10 | 11,359 | 426 | 238,051 | 0.9 |
HMPscf | Meta-IDBA | 20,352 | 99.9% | 7 | - | - | 10 | 4,946 | 939 | 59,188 | 1.4 |
HMPscf | Meta-IDBA_MA | 22,886 | 99.9% | 7 | - | - | 9 | 22,304 | 238 | 66,401 | 1.3 |
Datasets are mockE (mock Even), mockS (mock Staggered), HMP (Tongue dorsum, contig-level analysis), and HMPscf (Tongue dorsum, scaffold-level analysis). All analyses other than HMPscf were done at the contig level. Where necessary, contigs were extracted from scaffolds by splitting at three consecutive Ns. The suffix _MA marks results obtained by running MetAMOS on the contigs produced by the corresponding assembler. #ctgs/scfs: total number of contigs/scaffolds in the assembly. Good Ctgs/scfs: fraction of contigs/scaffolds that mapped without errors to reference genomes. For the HMP dataset (Tongue dorsum contigs), alignments were made only to a small set of genomes estimated by the HMP project to match the genomes in this sample. For the HMPscf dataset, good scaffolds are those without chimeric errors. Total aln (Mbp): total amount of sequence that can be aligned to the reference genomes. Slt: slight mis-assemblies, determined by alignments that cover 80% or more of the aligned contig in a single match. Hvy: heavy mis-assemblies, determined by alignments that cover less than 80% of the aligned contig in a single match or that have two or more matches to a single reference. Ch: chimeras, i.e., contigs with matches to two distinct reference genomes. Neither heavy mis-assemblies nor chimeras count towards reference coverage. Size @ 10 Mbp: the size of the largest contig c such that the combined size of all contigs larger than c exceeds 10 Mbp (similar to the commonly used N50 size). #@ 10 Mbp: smallest number of contigs whose cumulative size exceeds 10 Mbp. Max ctg size: size of the largest contig in the assembly. Err per Mbp: average number of errors per Mbp. Numbers in bold represent the best value for the specific dataset.
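The two size statistics and the scaffold-splitting rule described in the legend can be illustrated with a minimal Python sketch. It is not the code used to produce the table. The function names `split_scaffold` and `size_and_count_at` are hypothetical, the sketch assumes that "three consecutive Ns" means a run of three or more Ns, and it reads Size @ 10 Mbp in the N50-like sense: contigs are sorted from largest to smallest and accumulated until the running total first exceeds the threshold.

```python
import re


def split_scaffold(seq, min_gap=3):
    """Split a scaffold into contigs at runs of at least `min_gap` Ns.

    Assumption: a gap is any run of `min_gap` or more N/n characters;
    shorter runs of Ns stay inside the contig.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(seq) if piece]


def size_and_count_at(lengths, threshold):
    """Return (size, count) at a cumulative-size threshold.

    Contigs are visited from largest to smallest. `count` is the smallest
    number of contigs whose cumulative size exceeds `threshold`, and `size`
    is the length of the last contig needed to get there. Returns None if
    the whole assembly does not exceed the threshold.
    """
    total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if total > threshold:
            return length, count
    return None


# Toy data only; these numbers are not taken from the table.
contigs = split_scaffold("ACGTNNNGGCCNNTTNNNNNA")
print(contigs)  # ['ACGT', 'GGCCNNTT', 'A']

toy_lengths = [50, 10, 40, 30, 20]
print(size_and_count_at(toy_lengths, 100))  # (30, 3)
```

For the statistics in the table the threshold would be 10,000,000 bp. A fixed threshold, unlike N50, does not depend on the total assembly size, which makes the values comparable across assemblers that produce different amounts of sequence.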