Nat Genet. Author manuscript; available in PMC: 2018 Apr 20.
Published in final edited form as: Nat Genet. 2017 Mar 6;49(4):643–650. doi: 10.1038/ng.3802

Table 1. Assembly statistics

| Assembly¹ | Contigs² | Scaffolds | Unplaced contigs³ | Degenerate contigs⁴ | Contig NG50 (Mbp)⁵ | Scaffold NG50 (Mbp)⁵ | Assembly Size (Gbp) | Assembly in Scaffolds (%) |
|---|---|---|---|---|---|---|---|---|
| PacBio | 3,074 | | | 30,693 | 3.795 | | 2.914 | N/A |
| Optical Map | | 2,944 | | | | 1.487 | 2.748 | N/A |
| PacBio + Optical Map | 1,109 | 333 | 1,242 | 30,693 | 10.197 | 20.623 | 2.910 | 90.89 |
| PacBio + Hi-C | 2,115 | 31 | 959 | 30,693 | 3.795 | 88.799 | 2.910 | 87.97 |
| PacBio + Optical Map + Hi-C | 1,780 | 31 | 571 | 30,693 | 10.197 | 87.347 | 2.910 | 89.05 |
| ARS1 | 680 | 31 | 654 | 29,315 | 18.702 | 87.277 | 2.924 | 88.32 |
¹ Assemblies are listed in the order in which scaffolding technologies were added on the way to the final assembly (ARS1): the original contigs (PacBio) were scaffolded with the optical map and with Hi-C individually, and then with both. Because the optical-map program, IrysScaffold, generates an assembly from the consensus of labelled DNA molecules, we have included scaffold statistics from these data alone (Optical Map) for comparison.

² The number of continuous stretches of sequence within scaffolds that are at least 10 bases long and contain no gaps larger than 3 bases.
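The contig count in footnote 2 can be illustrated by splitting each scaffold at its gaps. The Python sketch below is not the authors' pipeline. It assumes that gaps are encoded as runs of `N`, reads the footnote as allowing gaps of up to 3 bases inside a contig and requiring at least 10 bases per contig, and uses hypothetical function names (`count_contigs`, `total_contigs`).

```python
import re
from typing import Iterable


def count_contigs(scaffold: str, max_gap: int = 3, min_len: int = 10) -> int:
    """Count contigs in one scaffold sequence (hypothetical helper).

    Assumptions, not from the source: gaps are runs of 'N'/'n'; a run longer
    than max_gap splits the scaffold, shorter runs stay inside a contig; a
    piece counts only if it has at least min_len bases once any N-runs at
    its ends are trimmed.
    """
    # Split on any N-run of at least max_gap + 1 bases, e.g. "[Nn]{4,}".
    pieces = re.split(rf"[Nn]{{{max_gap + 1},}}", scaffold)
    # Trim leftover short N-runs at the piece ends, then apply the cutoff.
    return sum(1 for p in pieces if len(p.strip("Nn")) >= min_len)


def total_contigs(scaffolds: Iterable[str]) -> int:
    """Sum contig counts over all scaffolds of an assembly."""
    return sum(count_contigs(s) for s in scaffolds)


# A 2-N gap is kept inside a contig; the 5-N and 10-N gaps split it, and the
# trailing 3-base piece is too short to count.
demo = "ACGTACGTACGT" + "NN" + "ACGTACGTAC" + "NNNNN" + "ACGTACGTACGTA" + "N" * 10 + "ACG"
print(count_contigs(demo))  # 2
```

Changing `max_gap` or `min_len` shows how sensitive a contig count is to the gap definition, which is one reason the footnote states its thresholds explicitly.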

³ Unplaced contigs are input contigs or scaffolds that the Optical Map or Hi-C data did not place into a scaffold; they were excluded from the scaffold counts.

⁴ Degenerate contigs are assembled unitigs supported by fewer than 50 PacBio reads (Supplementary Note). The difference in degenerate contig counts in the final ARS1 assembly (29,315 versus 30,693) is due to PBJelly merging 538 degenerate contigs and to the removal of 840 that had no supporting PacBio read alignments.
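The counts in footnote 4 reconcile exactly with the table: 538 merged plus 840 removed accounts for the whole drop from 30,693 to 29,315 degenerate contigs. The Python sketch below checks that arithmetic and expresses the 50-read cutoff as a simple predicate. The names `MIN_SUPPORTING_READS` and `is_degenerate` and the toy read counts are hypothetical; they do not come from the assembler used in the study.

```python
# Hypothetical constant: a unitig supported by fewer than this many PacBio
# reads is classed as degenerate (footnote 4).
MIN_SUPPORTING_READS = 50


def is_degenerate(supporting_reads: int) -> bool:
    """Return True for a unitig supported by fewer than 50 PacBio reads."""
    return supporting_reads < MIN_SUPPORTING_READS


# Degenerate contig counts taken from Table 1 and footnote 4.
initial_degenerate = 30_693   # PacBio and intermediate assemblies
merged_by_pbjelly = 538       # degenerate contigs merged by PBJelly
removed_unsupported = 840     # removed: no supporting PacBio read alignments

ars1_degenerate = initial_degenerate - merged_by_pbjelly - removed_unsupported
print(ars1_degenerate)        # 29315, the ARS1 value in Table 1

# Toy read counts for three hypothetical unitigs.
print([is_degenerate(n) for n in (12, 49, 50)])  # [True, True, False]
```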

⁵ All NG50 values are calculated relative to the ARS1 assembly size (2.924 Gbp). No scaffolds were generated for the PacBio entry, so only its contig NG50 is reported.
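Footnote 5 fixes the NG50 denominator at the ARS1 size (2.924 Gbp) rather than at each assembly's own size, which keeps the NG50 columns comparable across rows. A minimal Python sketch of the standard NG50 calculation follows. The function name `ng50`, the constant `ARS1_SIZE_BP` (2.924 Gbp written out in bases, so rounded), and the toy lengths are illustrative only.

```python
from typing import Iterable, Optional

ARS1_SIZE_BP = 2_924_000_000  # 2.924 Gbp reference size from footnote 5 (rounded)


def ng50(lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the length L such that sequences of length >= L together cover
    at least half of genome_size, or None if they cover less than half."""
    target = genome_size / 2
    covered = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


toy = [5, 4, 3, 2, 1]              # total length 15
print(ng50(toy, genome_size=20))   # 3: 5 + 4 + 3 = 12 >= 10
print(ng50(toy, genome_size=15))   # 4: 5 + 4 = 9 >= 7.5 (the ordinary N50)
```

Passing the same `genome_size` (here `ARS1_SIZE_BP`) for every assembly follows the convention in footnote 5, so an assembly smaller than ARS1 is not credited with a higher value merely for being smaller.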