Fig 1. Evaluation of HV31 de novo assemblies.
(A) Diagram of the de novo assembly workflow. Processes and datasets are represented by blocks with square and rounded corners, respectively. (B) Overview of assembled scaffolds in 8 selected regions. Heterozygous SVs on the unassembled haplotype that are larger than 1 kb are shown as orange diamonds or red triangles. Note that the assembled scaffolds (gray) are often larger than the originally selected immune system regions (blue) defined in Table 1. (C) Contig/scaffold continuity (NG50, y axis) for local contigs (gray) and finished HV31 assembly scaffolds (red) in each region (x axis). NG50 is defined as the length of the longest contig/scaffold that, together with all longer contigs/scaffolds, covers at least 50% of each locus, as determined by alignment to GRCh38. The size of the selected region on the GRCh38 reference is also shown. To ensure comparable results, only the portion of each contig/scaffold that lies within the region boundaries is counted toward the NG50 calculation. (D) The estimated number of errors per megabase in each region, before and after assembly polishing. Error rates are estimated using a modified version of the Merqury algorithm [24], as described in Methods.
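The region-restricted NG50 in panel (C) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes each contig/scaffold is given as a single half-open (start, end) interval on GRCh38 and does not merge overlapping contigs; the function names `clipped_length` and `ng50` are hypothetical.

```python
def clipped_length(start, end, region_start, region_end):
    """Length of the part of [start, end) that lies inside the region."""
    return max(0, min(end, region_end) - max(start, region_start))


def ng50(alignments, region_start, region_end):
    """Region-restricted NG50 for a list of (start, end) reference intervals.

    Assumption: one interval per contig/scaffold, half-open coordinates.
    Returns 0 if the contigs together cover less than half of the region.
    """
    region_size = region_end - region_start
    # Only the length inside the region boundaries is counted.
    lengths = sorted(
        (clipped_length(s, e, region_start, region_end) for s, e in alignments),
        reverse=True,
    )
    covered = 0
    for length in lengths:
        covered += length
        # Stop at the first contig that brings coverage to at least 50%.
        if 2 * covered >= region_size:
            return length
    return 0


# Illustrative coordinates only, not data from the study.
example = [(-100, 300), (300, 500), (500, 600), (900, 1200)]
result = ng50(example, 0, 1000)
```

Clipping before sorting means that a long scaffold extending far beyond a locus does not inflate the statistic, so values stay comparable with the region size shown in the panel.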