Table 2.
Upstream assembler | Contig set | N contigs⁴ | N50⁵ | N covered bases⁶ | Average length⁷ | Maximum length⁸ | MPMB⁹ | Average identity¹⁰ (%) |
---|---|---|---|---|---|---|---|---|
(a) Contigs of A. thaliana genome | | | | | | | | |
Velvet | All¹ | 30 037 | 3515 | 82 844 417 | 2668 | 27 792 | 22.2 | 95.2 |
Velvet | Extendable² | 8615 | 4148 | 28 007 451 | 3262 | 27 398 | 0.3 | 97.6 |
Velvet | Extendable + AlignGraph³ | 5751 | 7876 | 32 467 110 | 5521 | 49 768 | 1.6 | 94.8 |
ABySS | All | 30 972 | 2559 | 69 432 667 | 2206 | 29 760 | 13.4 | 97.2 |
ABySS | Extendable | 11 693 | 2820 | 28 885 212 | 2454 | 16 343 | 0.5 | 98.7 |
ABySS | Extendable + AlignGraph | 8427 | 5484 | 35 859 786 | 4151 | 25 321 | 1.1 | 95.8 |
(b) Contigs of human chromosome 14 | | | | | | | | |
ALLPATHS-LG | All | 4383 | 38 590 | 83 849 397 | 19 201 | 240 764 | 0.3 | 98.9 |
ALLPATHS-LG | Extendable | 1674 | 39 851 | 35 746 095 | 20 806 | 200 495 | 0.1 | 98.9 |
ALLPATHS-LG | Extendable + AlignGraph | 785 | 71 847 | 36 441 001 | 45 358 | 305 880 | 0.0 | 97.5 |
ALLPATHS-LGc | All | 3856 | 43 856 | 83 860 939 | 21 818 | 275 446 | 0.2 | 99.3 |
ALLPATHS-LGc | Extendable | 1296 | 45 719 | 31 457 201 | 24 346 | 275 446 | 0.1 | 99.5 |
ALLPATHS-LGc | Extendable + AlignGraph | 608 | 86 613 | 34 614 465 | 54 406 | 294 615 | 0.0 | 96.9 |
SOAPdenovo | All | 10 865 | 16 855 | 80 135 941 | 7623 | 147 494 | 5.9 | 94.9 |
SOAPdenovo | Extendable | 5613 | 17 412 | 45 246 077 | 8223 | 141 981 | 0.9 | 96.4 |
SOAPdenovo | Extendable + AlignGraph | 3469 | 32 881 | 52 861 640 | 15 271 | 219 841 | 0.5 | 95.0 |
MaSuRCA | All | 19 034 | 5767 | 75 497 302 | 3802 | 53 837 | 13.9 | 98.9 |
MaSuRCA | Extendable | 9241 | 6047 | 38 842 517 | 4199 | 51 249 | 0.2 | 99.2 |
MaSuRCA | Extendable + AlignGraph | 5665 | 11 590 | 43 930 184 | 7666 | 66 758 | 0.4 | 98.1 |
CABOG | All | 3118 | 46 523 | 84 989 190 | 27 401 | 296 888 | 0.3 | 97.3 |
CABOG | Extendable | 1692 | 45 669 | 46 499 763 | 27 089 | 296 888 | 0.0 | 98.7 |
CABOG | Extendable + AlignGraph | 701 | 101 907 | 50 527 605 | 70 362 | 443 952 | 0.1 | 97.6 |
Bambus2 | All | 11 219 | 8378 | 64 011 072 | 5764 | 449 449 | 3.1 | 89.9 |
Bambus2 | Extendable | 6995 | 7521 | 37 857 989 | 5439 | 62 798 | 0.3 | 97.6 |
Bambus2 | Extendable + AlignGraph | 2722 | 19 989 | 39 147 357 | 14 176 | 86 154 | 0.5 | 96.5 |
(c) Scaffolds of human chromosome 14 | | | | | | | | |
SOAPdenovo | All | 3902 | 391 693 | 85 417 248 | 24 397 | 1 852 152 | 1.0 | 82.9 |
SOAPdenovo | Extendable | 901 | 387 309 | 40 296 035 | 47 526 | 1 019 659 | 0.1 | 84.5 |
SOAPdenovo | Extendable + AlignGraph | 767 | 544 209 | 47 823 279 | 63 525 | 2 246 638 | 0.1 | 81.0 |
MaSuRCA | All | 721 | 580 822 | 65 433 305 | 63 876 | 2 943 966 | 1.3 | 57.2 |
MaSuRCA | Extendable | 101 | 289 703 | 5 554 781 | 52 820 | 1 516 804 | 0.0 | 81.9 |
MaSuRCA | Extendable + AlignGraph | 78 | 316 946 | 6 986 224 | 86 552 | 1 573 741 | 0.0 | 83.4 |
CABOG | All | 471 | 387 876 | 81 163 688 | 176 590 | 1 944 475 | 0.1 | 91.9 |
CABOG | Extendable | 146 | 358 688 | 29 372 033 | 200 539 | 1 905 529 | 0.0 | 98.2 |
CABOG | Extendable + AlignGraph | 67 | 906 407 | 33 708 925 | 481 712 | 2 051 503 | 0.0 | 94.1 |
Bambus2 | All | 569 | 319 334 | 64 378 693 | 116 582 | 1 477 847 | 0.1 | 77.4 |
Bambus2 | Extendable | 66 | 272 436 | 6 949 338 | 119 858 | 641 463 | 0.0 | 92.0 |
Bambus2 | Extendable + AlignGraph | 80 | 377 905 | 8 963 132 | 114 852 | 812 353 | 0.1 | 85.4 |
(a) Genomic PE reads from A. thaliana were assembled with Velvet and ABySS. The resulting contigs were extended with AlignGraph, using the genome sequence of A. lyrata as reference. (b–c) The subsequent panels contain assembly results for the human chromosome 14 sample from the GAGE project, where the chimpanzee genome served as reference. (b) Contig assembly results are given for the de novo assemblers ALLPATHS-LG, ALLPATHS-LGc (in cheat mode), SOAPdenovo, MaSuRCA, CABOG and Bambus2. (c) Scaffolded assembly results are given for SOAPdenovo, MaSuRCA, CABOG and Bambus2. The results are organized row-wise as follows: the initial contigs obtained by each de novo assembler¹, the ‘extendable’ subset of the initial contigs that AlignGraph was able to improve², and the extension results obtained with AlignGraph³. The additional columns give the number of contigs⁴, the N50 values⁵, the number of covered bases⁶, the average⁷ and maximum⁸ length of the contigs, the number of misassemblies per million base pairs (MPMB)⁹, and the average identity between the true contigs and the target genome¹⁰. More details on these performance criteria are provided in Section 3.1.5.
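As a rough illustration of how the length-based columns relate to a contig set, the following Python sketch computes the number of contigs, N50, average length and maximum length from a list of contig lengths, together with a misassembly rate per million base pairs. It assumes the conventional N50 definition (the contig length at which the length-sorted contigs first cover at least half of the total assembled bases) and assumes that MPMB is normalized by a base count supplied by the caller. The exact definitions used for this table are those of Section 3.1.5, which may differ, so the function names `n50`, `summarize` and `mpmb` are hypothetical helpers rather than part of AlignGraph.

```python
from typing import Dict, Sequence, Union


def n50(lengths: Sequence[int]) -> int:
    """Conventional N50: walk the contigs from longest to shortest and
    return the length at which the running sum first reaches half of
    the total number of bases. Returns 0 for an empty input."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: Sequence[int]) -> Dict[str, Union[int, float]]:
    """Length statistics analogous to the N contigs, N50, average
    length and maximum length columns (assumed definitions)."""
    return {
        "n_contigs": len(lengths),
        "n50": n50(lengths),
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "maximum_length": max(lengths, default=0),
    }


def mpmb(n_misassemblies: int, n_bases: int) -> float:
    """Misassemblies per million base pairs. Which base count serves
    as the denominator is an assumption left to the caller."""
    return n_misassemblies / n_bases * 1_000_000


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]  # toy values, not taken from the table
    print(summarize(toy_lengths))
    print(mpmb(3, 2_000_000))
```

The toy input is illustrative only. Reproducing the table's values would require the actual contig sets and the alignment-based criteria described in Section 3.1.5.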