2021 May 6;22:139. doi: 10.1186/s13059-021-02330-1

Table 1.

Sources of data for parental haplotype inference and benchmarking

| Sample | Data type | Data source | Read count | Mean depth | Contacts (>1 Mb) | Application |
|---|---|---|---|---|---|---|
| RPE-1 | Bulk WGS | [24] | 228,708,769 (a) | 13× | | Variant calling |
| RPE-1 | Linked reads | New | 941,518,426 (b) | 60× (c) | | Variant calling & local phasing |
| RPE-1 | CCS long reads | New | 4,607,047 (d) | 11× | | Local phasing |
| RPE-1 | Hi-C | [44] | 281,285,484 (e) | | 48,124,211 | Long-range phasing |
| RPE-1 | Single cell with monosomies | New | | | | hi-conf variants and reference haplotypes |
| NA12878 | Linked reads v.1 | 10X Genomics (f) | 422,179,395 (g) | 35× (c) | | Local phasing |
| NA12878 | Linked reads v.2 | 10X Genomics (h) | 423,854,243 (i) | 35× (c) | | Local phasing |
| NA12878 | Hi-C | [35] | 486,848,169 (j) | | 91,428,507 | Long-range phasing |
| NA12878 | Phased VCF | GIAB (k) | | | | hi-conf variants and reference haplotypes |
| NA12878 | Phased VCF | Diploid assembly (l) | | | | hi-conf variants and reference haplotypes |

(a) SRR1778442: median insert 243; 208,151,992 fragments aligned in pair; 2 × 101 bp reads; duplication rate 0.024.

(b) Mean molecular length 24.8 kb; median insert 551; 913,660,083 aligned in pair; 2 × 150 bp reads; duplication rate 0.255.

(c) Excluding the GEMcode sequence and duplicated fragments.

(d) Mean read length 7.1 kb; 4,606,654 aligned.

(e) SRS1045722: median insert 364; 279,027,892 aligned in pair; 2 × 150 bp reads; duplication rate 0.067.

(f) https://support.10xgenomics.com/genome-exome/datasets/2.1.0/NA12878_WGS_210

(g) Mean molecular length 68.7 kb; median insert 349; 407,015,530 aligned in pair; 2 × 150 bp reads; duplication rate 0.062.

(h) https://support.10xgenomics.com/genome-exome/datasets/2.2.1/NA12878_WGS_v2

(i) Mean molecular length 85.6 kb; median insert 370; 418,283,435 aligned in pair; 2 × 150 bp reads; duplication rate 0.079.

(j) SRR1658572: median insert 377; 484,211,662 aligned in pair; 2 × 101 bp reads; duplication rate 0.028.

(k) https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/

(l) http://ftp.dfci.harvard.edu/pub/hli/hifiasm/NA12878-r253/. Phased variants were determined using dipcall (https://github.com/lh3/dipcall) on the sequences of parental chromosomes generated by diploid de novo assembly of the NA12878 genome using PacBio High-Fidelity long reads together with short reads of the parental genomes using hifiasm [40].
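
As a rough sanity check on the mean depth column, the footnoted alignment statistics can be turned into an approximate depth. The sketch below is illustrative only and is not the procedure used to produce the table: the genome size of 3.1 Gb, the function name `approx_depth`, and the treatment of the CCS data as having no duplicates are all assumptions. It covers only the bulk WGS and CCS rows, since the linked-read depths additionally exclude the GEMcode sequence.

```python
# Assumed haploid human genome size in bases; this value is not given in the table.
ASSUMED_GENOME_SIZE = 3.1e9


def approx_depth(n_aligned, bases_per_unit, dup_rate=0.0,
                 genome_size=ASSUMED_GENOME_SIZE):
    """Rough mean depth: non-duplicate aligned bases divided by genome size.

    n_aligned      -- number of aligned fragments (pairs) or single reads
    bases_per_unit -- bases contributed by each fragment or read
    dup_rate       -- fraction of fragments flagged as duplicates
    """
    return n_aligned * bases_per_unit * (1.0 - dup_rate) / genome_size


# RPE-1 bulk WGS: 208,151,992 fragments aligned in pair, 2 x 101 bp, duplication 0.024.
bulk_wgs = approx_depth(208_151_992, 2 * 101, dup_rate=0.024)

# RPE-1 CCS long reads: 4,606,654 aligned reads, mean read length 7.1 kb.
# No duplication rate is reported for this row, so 0 is assumed here.
ccs = approx_depth(4_606_654, 7_100)

print(f"RPE-1 bulk WGS: ~{bulk_wgs:.1f}x")
print(f"RPE-1 CCS long reads: ~{ccs:.1f}x")
```

Under these assumptions both estimates land close to the tabulated values of 13× and 11×.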