Author manuscript; available in PMC: 2019 Jan 22.
Published in final edited form as: Nat Methods. 2018 Jul 16;15(8):595–597. doi: 10.1038/s41592-018-0054-7

Fig. 1.


Constructing the Syndip benchmark dataset. The CHM1 and CHM13 cell lines were each sequenced with PacBio and de novo assembled independently. The assembly contigs were aligned to the human reference genome, and differences in the alignments were taken as ‘true’ SNPs and INDELs. Regions covered by exactly one contig from each CHM assembly were identified as confident regions, where true variants can be called with high accuracy. To evaluate diploid variant calling with short reads, equal quantities of DNA from the two cell lines were experimentally mixed. A PCR-free library was constructed from the mix and sequenced to ~45-fold coverage with 151 bp paired-end reads. Variants called from the short reads were then compared against the PacBio variants, taken as truth, to measure variant caller accuracy.
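The confident-region rule in the caption (exactly one contig from each CHM assembly) can be illustrated with a small sweep over alignment intervals. This is only a minimal sketch, not the pipeline used to build Syndip: the function name `confident_regions`, the half-open `(start, end)` interval representation, and the single-chromosome simplification are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) reference intervals
# covered by aligned contigs on a single chromosome.
Interval = Tuple[int, int]


def confident_regions(chm1: List[Interval], chm13: List[Interval]) -> List[Interval]:
    """Return intervals covered by exactly one CHM1 contig and exactly one CHM13 contig."""
    # Sweep events: (position, assembly index, +1 at contig start / -1 at contig end).
    events = []
    for assembly, intervals in enumerate((chm1, chm13)):
        for start, end in intervals:
            events.append((start, assembly, 1))
            events.append((end, assembly, -1))
    events.sort()

    depth = [0, 0]  # current contig depth for CHM1 and CHM13
    regions: List[Interval] = []
    prev = None
    i = 0
    while i < len(events):
        pos = events[i][0]
        # The segment [prev, pos) has constant depth; keep it if depth is 1 and 1.
        if prev is not None and pos > prev and depth == [1, 1]:
            if regions and regions[-1][1] == prev:
                # Merge with the previous segment when they are adjacent.
                regions[-1] = (regions[-1][0], pos)
            else:
                regions.append((prev, pos))
        # Apply every event located at this position before moving on.
        while i < len(events) and events[i][0] == pos:
            depth[events[i][1]] += events[i][2]
            i += 1
        prev = pos
    return regions


# Toy coordinates, not real data.
example = confident_regions([(0, 100), (80, 200)], [(10, 150), (300, 400)])
print(example)
```

In this toy input, positions 80 to 100 are excluded because two CHM1 contigs overlap there, and positions 150 to 200 are excluded because no CHM13 contig covers them.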