Table 1.
Assembly statistics
Assembly¹ | Contigs² | Scaffolds | Unplaced contigs³ | Degenerate contigs⁴ | Contig NG50 (Mbp)⁵ | Scaffold NG50 (Mbp)⁵ | Assembly Size (Gbp) | Assembly in Scaffolds (%) |
---|---|---|---|---|---|---|---|---|
PacBio | 3,074 | – | – | 30,693 | 3.795 | – | 2.914 | N/A |
Optical Map | – | 2,944 | – | – | – | 1.487 | 2.748 | N/A |
PacBio + Optical Map | 1,109 | 333 | 1,242 | 30,693 | 10.197 | 20.623 | 2.910 | 90.89 |
PacBio + Hi-C | 2,115 | 31 | 959 | 30,693 | 3.795 | 88.799 | 2.910 | 87.97 |
PacBio + Optical Map + Hi-C | 1,780 | 31 | 571 | 30,693 | 10.197 | 87.347 | 2.910 | 89.05 |
ARS1 | 680 | 31 | 654 | 29,315 | 18.702 | 87.277 | 2.924 | 88.32 |
¹ Assemblies are listed in order of their inclusion of scaffolding technologies towards the final assembly (ARS1), with the original contigs (PacBio) scaffolded using different technologies (Optical Map and Hi-C, respectively). Because the optical map program, IrysScaffold, generates an assembly from the consensus of labelled DNA molecules, we have included scaffold statistics from these data (Optical Map) for comparison.
² The number of continuous stretches of sequence, each at least 10 bases long, within the scaffolds that contain no gaps larger than 3 bases.
³ Unplaced contigs are defined as input contigs or scaffolds that were not placed in a scaffold by the Optical Map or Hi-C; they were excluded from the scaffold counts.
⁴ Degenerate contigs were assembled unitigs that had fewer than 50 PacBio reads supporting their assembly (Supplementary Note). The lower degenerate contig count in the final ARS1 assembly is due to PBJelly merging of degenerate contigs (538 contigs) and to removal of contigs with no supporting PacBio read alignments (840 contigs): 30,693 − 538 − 840 = 29,315.
⁵ All NG50 values are based on the ARS1 assembly size: 2.924 Gbp. For the PacBio entry, no scaffolds were generated, so only the contig NG50 is reported.
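
The NG50 and contig-count definitions in the footnotes can be illustrated with a minimal Python sketch. It is not the pipeline used to produce Table 1. The function names `ng50` and `count_contigs` are hypothetical, the input lengths are toy values, and the sketch assumes that gaps are represented as runs of `N` in the scaffold sequence. NG50 is taken in its usual sense: the length of the sequence at which the cumulative length, summed from longest to shortest, first reaches half of a fixed genome size (here, the role played by the 2.924 Gbp ARS1 assembly size).

```python
import re


def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    Sequences are summed from longest to shortest; the NG50 is the length
    of the sequence at which the running total first reaches half of
    genome_size. Returns 0 if the sequences never reach that threshold.
    """
    threshold = genome_size / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length
    return 0


def count_contigs(scaffold, max_gap=3, min_len=10):
    """Count gap-free stretches in one scaffold sequence.

    Assumptions (for illustration only): gaps are runs of 'N'; a run longer
    than max_gap bases splits the scaffold into separate contigs, and
    stretches shorter than min_len bases are not counted.
    """
    # Split only on runs of N that are strictly longer than max_gap.
    pieces = re.split(r"[Nn]{%d,}" % (max_gap + 1), scaffold)
    return sum(1 for piece in pieces if len(piece) >= min_len)


if __name__ == "__main__":
    # Toy lengths, not values from Table 1.
    toy_lengths = [50, 40, 30, 20, 10]
    print(ng50(toy_lengths, genome_size=200))

    # Two long stretches separated by a 5-base gap; the 3-base gap is
    # too short to split the second stretch.
    toy_scaffold = "A" * 12 + "N" * 5 + "C" * 4 + "NNN" + "G" * 15
    print(count_contigs(toy_scaffold))
```

Because every assembly in the table is measured against the same genome size, the NG50 values are directly comparable across rows; an N50 computed against each assembly's own total length would not be.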