Author manuscript; available in PMC: 2013 Jul 10.
Published in final edited form as: Nat Biotechnol. 2012 Jul 1;30(7):693–700. doi: 10.1038/nbt.2280

Table 2. PacBio assembly contiguity.

Organism: The genome being assembled. The median and max lengths of corrected PacBio sequences (PBcR) are given in parentheses. The corrected length is shorter than that of the original PacBio RS sequences because of trimming and the splitting of chimeric sequences. Supplementary Table S2 reports the original PacBio RS sequence lengths before correction. The three reference data sets (Lambda NEB3011, E. coli K12, and S. cerevisiae S228c) were generated using the pre-release PacBio RS, resulting in shorter read lengths. Technology: The read data used for assembly. Pair separation (if applicable) is listed immediately after the coverage. Reference bp: The assumed genome size used for the N50 calculation. Assembly bp: The total number of base pairs in all contigs (only contigs ≥ 10,000 bp are included in all results). # Contigs: The number of contigs comprising the assembly. Max Contig Length: The maximum contig length. N50: N such that 50% of the genome is contained in contigs of length ≥ N. Assemblies for next-gen (Illumina/454) data were generated by Celera Assembler (ref. 11), SOAPdenovo (ref. 49), and ALLPATHS-LG (ref. 19) where possible. Only the best assembly (based on contiguity) in each case is reported.
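The N50 definition in the legend can be illustrated with a minimal Python sketch. The function name `n50` and its parameters are hypothetical, chosen for illustration; this is not the code used to produce the table. It assumes the contig lengths are available as a list of integers, measures N50 against the assumed genome size (Reference bp) rather than the assembly size, and applies the 10,000 bp contig cutoff stated in the legend.

```python
def n50(contig_lengths, genome_size, min_contig=10_000):
    """Return the largest N such that contigs of length >= N
    together cover at least half of genome_size.

    Contigs shorter than min_contig are ignored, mirroring the
    10,000 bp cutoff described in the legend. Returns 0 if the
    retained contigs cover less than half of the genome.
    """
    # Walk the retained contigs from longest to shortest.
    kept = sorted((n for n in contig_lengths if n >= min_contig), reverse=True)
    running_total = 0
    for length in kept:
        running_total += length
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if 2 * running_total >= genome_size:
            return length
    return 0


# Lambda NEB3011, Illumina row: one 48,492 bp contig, reference size 48,502 bp.
print(n50([48_492], genome_size=48_502))  # 48492
```

Because the denominator is the assumed genome size, an assembly that is missing a large part of the genome is penalized even if its individual contigs are long.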

| Organism | Technology | Reference bp | Assembly bp | # Contigs | Max Contig Length | N50 |
|---|---|---|---|---|---|---|
| Lambda NEB3011 | Illumina 100X 200bp | 48 502 | 48 492 | 1 | 48 492 / 48 492 | 48 492 / 48 492 (100%) * |
| (median: 727 max: 3 280) | PacBio PBcR 25X | | 48 440 | 1 | 48 444 / 48 444 | 48 444 / 48 440 (100%) * |
| E. coli K12 | Illumina 100X 500bp | 4 639 675 | 4 462 836 | 61 | 221 615 / 221 553 | 100 338 / 83 037 (82.76%) * |
| (median: 747 max: 3 068) | PacBio PBcR 18X | | 4 465 533 | 77 | 239 058 / 238 224 | 71 479 / 68 309 (95.57%) * |
| | PacBio PBcR 18X + Illumina 50X 500bp | | 4 576 046 | 65 | 238 272 / 238 224 | 93 048 / 89 431 (96.11%) * |
| S. cerevisiae S228c | Illumina 100X 300bp | 12 157 105 | 11 034 156 | 192 | 266 528 / 227 714 | 73 871 / 49 254 (66.68%) * |
| (median: 674 max: 5 994) | PacBio PBcR 13X | | 11 110 420 | 224 | 224 478 / 217 704 | 62 898 / 54 633 (86.86%) * |
| | PacBio PBcR 13X + Illumina 50X 300bp | | 11 286 932 | 177 | 262 846 / 260 794 | 82 543 / 59 792 (72.44%) * |
| E. coli C227-11 | PacBio CCS 50X | 5 504 407 | 4 917 717 | 76 | 249 515 | 100 322 |
| (median: 1 217 max: 14 901) | PacBio PBcR 25X (corrected by 25X CCS) | | 5 207 946 | 80 | 357 234 | 98 774 |
| | PacBio PBcR 25X + CCS 25X | | 5 269 158 | 39 | 647 362 | 227 302 |
| | PacBio PBcR 50X (corrected by 50X CCS) | | 5 445 466 | 35 | 1 076 027 | 376 443 |
| | PacBio PBcR 50X + CCS 25X | | 5 453 458 | 33 | 1 167 060 | 527 198 |
| | Manually Corrected ALLORA Assembly (ref. 9) | | 5 452 251 | 23 | 653 382 | 402 041 |
| E. coli 17-2 | Illumina 100X 300bp | 5 000 000 | 4 975 331 | 62 | 226 141 | 74 940 |
| (median: 886 max: 10 069) | PacBio PBcR 50X | | 4 981 368 | 58 | 318 969 | 143 307 |
| | PacBio PBcR 50X + Illumina 50X 300bp | | 5 022 503 | 55 | 367 911 | 180 932 |
| E. coli JM221 | 454 50X | 5 000 000 | 4 714 344 | 66 | 308 063 | 106 034 |
| (median: 1 216 max: 12 552) | PacBio PBcR 25X | | 5 005 429 | 30 | 631 286 | 314 500 |
| | PacBio PBcR 25X + 454 25X | | 5 008 824 | 30 | 633 667 | 314 500 |
| Melopsittacus undulatus | Illumina 194X (220/500/800 paired-end 2/5/10Kb mate-pairs) | 1.23 Gbp | 1 023 532 850 | 24 181 | 1 050 202 | 47 383 |
| | 454 15.4X (FLX + FLX Plus + 3/8/20Kbp paired-ends) | | 999 168 029 | 16 574 | 751 729 | 75 178 |
| (median: 1 182 max: 14 596) | 454 15.4X + PacBio PBcR 3.83X (corrected by 15.4X 454) | | 1 066 348 480 | 15 328 | 871 294 | 93 069 |
| (median: 997 max: 13 079) | 454 15.4X + PacBio PBcR 3.75X (corrected by 54X Illumina) | | 1 071 356 415 | 15 081 | 1 238 843 | 99 573 |
* For genomes with an available reference, the max contig length and N50 are measured both before and after breaking contigs at assembly mis-joins, and are reported as before / after. The percentages in parentheses give the ratio of the corrected N50 to the original N50. A higher ratio indicates a more correct assembly. Full assembly quality statistics are listed in Supplementary Table S4, following the GAGE assembly evaluation methodology (ref. 12).
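As a worked check of the percentages in the N50 column, the sketch below recomputes the corrected-to-original ratio from the paired N50 values. The function name `n50_ratio_percent` is hypothetical, and rounding to two decimal places is an assumption inferred from how the table prints its percentages.

```python
def n50_ratio_percent(original_n50, corrected_n50):
    """Ratio of corrected N50 (after breaking contigs at mis-joins)
    to original N50, expressed as a percentage with two decimals."""
    if original_n50 <= 0:
        raise ValueError("original_n50 must be positive")
    return round(100.0 * corrected_n50 / original_n50, 2)


# E. coli K12 rows, written as (original N50, corrected N50).
print(n50_ratio_percent(100_338, 83_037))  # Illumina 100X 500bp: 82.76
print(n50_ratio_percent(71_479, 68_309))   # PacBio PBcR 18X: 95.57
```

A ratio close to 100% means that breaking contigs at mis-joins barely changed the N50, which is why a higher ratio is read as a more correct assembly.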