2021 May 6;22:139. doi: 10.1186/s13059-021-02330-1

Fig. 4.

a, b Haplotype blocks on chromosome 5 derived from the NA12878 (a) and RPE-1 (b) linked-read data. Each row of haplotype blocks is generated using a different switching-penalty cutoff, from ΔE = 5 to ΔE = 10,000. Only blocks with 50 or more phased variants are shown. The accuracy of each block is estimated as the percentage of its genotypes that are consistent with the block's majority haplotype, determined by comparison with the reference haplotype. Blocks with ≥98% accuracy are colored gray; those with <98% accuracy are colored red, with brightness scaled by accuracy (minimum 50%). Three low-accuracy blocks in the NA12878 genome, each containing a single intra-block switching error, are highlighted with red arrows; at a higher cutoff, each of these blocks is broken into two high-accuracy blocks. Below the haplotype blocks are three tracks of regional variant density, measured in 200 kb bins as the number of total detected variants (blue), phased variants in the final haplotype solution (green), and phased variants in the reference data (purple). (We used the Genome in a Bottle data as the NA12878 reference haplotype.) Bins with more than 20 variants (a density above 1 variant per 10 kb) are omitted. Black arrows highlight large regions with low variant density, including the spinal muscular atrophy (SMA) region on 5q13.2, which consists of large (∼200 kb) segmental duplications with high sequence similarity (>98%) [47] that cannot be resolved by short reads. The other highlighted region, in the RPE-1 genome, reflects loss of heterozygosity. c Average intra-block accuracy (weighted by the number of variants in each block) and N50 length of all haplotype blocks in each sample, generated using different switching-penalty cutoffs. The NA12878 dataset produces longer haplotype blocks because its input molecules are longer (Table 1).
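The per-block accuracy in panels a and b can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each block and the reference map a variant position to the allele (0 or 1) on "haplotype A"; the block's orientation relative to the reference is taken as whichever of the two orientations (same or flipped) agrees with more variants. Under that reading, accuracy can never fall below 50%, which matches the stated minimum. The function names `block_accuracy` and `classify_block` are illustrative only, not the authors' code.

```python
from typing import Dict, Optional

# Hypothetical representation: position -> allele (0 or 1) on haplotype A.
MIN_VARIANTS = 50     # only blocks with >= 50 phased variants are shown
HIGH_ACCURACY = 0.98  # blocks at or above this accuracy are drawn gray


def block_accuracy(block: Dict[int, int], reference: Dict[int, int]) -> Optional[float]:
    """Fraction of shared variants agreeing with the block's majority orientation."""
    shared = [pos for pos in block if pos in reference]
    if not shared:
        return None  # nothing to compare against the reference
    same = sum(1 for pos in shared if block[pos] == reference[pos])
    flipped = len(shared) - same
    # The "majority haplotype": whichever orientation agrees with more variants.
    return max(same, flipped) / len(shared)


def classify_block(block: Dict[int, int], reference: Dict[int, int]) -> str:
    """Return 'hidden', 'unscored', 'gray' (>= 98%) or 'red' (< 98%)."""
    if len(block) < MIN_VARIANTS:
        return "hidden"
    acc = block_accuracy(block, reference)
    if acc is None:
        return "unscored"
    return "gray" if acc >= HIGH_ACCURACY else "red"


# A 130-variant block with one switch error after variant 80 scores 80/130
# and is drawn red; splitting it at the switch yields two 100%-accurate blocks.
ref = {i: 0 for i in range(130)}
blk = {i: (0 if i < 80 else 1) for i in range(130)}
left = {i: blk[i] for i in range(80)}
right = {i: blk[i] for i in range(80, 130)}
print(classify_block(blk, ref), classify_block(left, ref), classify_block(right, ref))
```

The usage lines mirror the caption's red-arrow examples: a single intra-block switch lowers the accuracy of the whole block, while a higher cutoff that splits the block at the switch recovers two high-accuracy blocks.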
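The two summary statistics in panel c can be sketched as follows. This assumes each block is summarized by its number of phased variants and its accuracy, and that block length is given in base pairs (for example, the span from the first to the last phased variant). That length definition, and whether blocks with fewer than 50 variants enter the N50, are assumptions not stated in the caption. N50 is computed in its usual sense: the length L such that blocks of length ≥ L together cover at least half of the total block length.

```python
from typing import List, Tuple


def weighted_accuracy(blocks: List[Tuple[int, float]]) -> float:
    """Mean accuracy weighted by block size; blocks are (n_variants, accuracy)."""
    total = sum(n for n, _ in blocks)
    if total == 0:
        raise ValueError("no variants in any block")
    return sum(n * acc for n, acc in blocks) / total


def n50(lengths: List[int]) -> int:
    """Largest L such that blocks of length >= L cover at least half the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no blocks given")


# Weighting by variant count means a large, slightly inaccurate block
# dominates a small, perfect one.
print(weighted_accuracy([(100, 1.0), (300, 0.9)]))
print(n50([10, 20, 30, 40]))
```

Weighting by variant count keeps many small, perfect blocks from masking errors in the few large blocks that carry most of the phased variants.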
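The variant-density tracks can be approximated by counting variants in fixed 200 kb bins and keeping only bins at or below the 20-variant cutoff (20 variants per 200 kb corresponds to 1 variant per 10 kb). The sketch below assumes 0-based variant positions and a known chromosome length so that empty bins, such as those in a loss-of-heterozygosity region, are reported with a count of 0. The same function would be applied separately to the detected, phased, and reference variant sets. The name `density_track` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BIN_SIZE = 200_000  # 200 kb bins
MAX_SHOWN = 20      # bins with more than 20 variants are omitted


def density_track(positions: Iterable[int], chrom_length: int,
                  bin_size: int = BIN_SIZE, max_count: int = MAX_SHOWN) -> Dict[int, int]:
    """Map bin start -> variant count for every bin with <= max_count variants."""
    counts = Counter(pos // bin_size for pos in positions)
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    track = {}
    for i in range(n_bins):
        n = counts.get(i, 0)  # empty bins count as 0 rather than being skipped
        if n <= max_count:
            track[i * bin_size] = n
    return track


# Bin [400 kb, 600 kb) holds 25 variants and is dropped; the empty last bin is kept.
positions = [10, 150_000, 250_000] + list(range(400_000, 400_025))
print(density_track(positions, chrom_length=700_000))
```

Reporting empty bins explicitly matters here, because the regions of interest (the SMA segmental duplications and the RPE-1 loss-of-heterozygosity region) are exactly those where counts approach zero.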