Table 1.
Sources of data for parental haplotype inference and benchmarking
Sample | Data type | Data source | Read count | Mean depth | Contacts (>1 Mb) | Application |
---|---|---|---|---|---|---|
RPE-1 | Bulk WGS | [24] | 228,708,769<sup>a</sup> | 13× | | Variant calling |
RPE-1 | Linked reads | New | 941,518,426<sup>b</sup> | 60×<sup>c</sup> | | Variant calling & local phasing |
RPE-1 | CCS long reads | New | 4,607,047<sup>d</sup> | 11× | | Local phasing |
RPE-1 | Hi-C | [44] | 281,285,484<sup>e</sup> | | 48,124,211 | Long-range phasing |
RPE-1 | Single cell with monosomies | New | | | | hi-conf variants and reference haplotypes |
NA12878 | Linked reads v.1 | 10X Genomics<sup>f</sup> | 422,179,395<sup>g</sup> | 35×<sup>c</sup> | | Local phasing |
NA12878 | Linked reads v.2 | 10X Genomics<sup>h</sup> | 423,854,243<sup>i</sup> | 35×<sup>c</sup> | | Local phasing |
NA12878 | Hi-C | [35] | 486,848,169<sup>j</sup> | | 91,428,507 | Long-range phasing |
NA12878 | Phased VCF | GIAB<sup>k</sup> | | | | hi-conf variants and reference haplotypes |
NA12878 | Phased VCF | Diploid assembly<sup>l</sup> | | | | hi-conf variants and reference haplotypes |

<sup>a</sup> SRR1778442: median insert 243; 208,151,992 fragments aligned in pair; 2 × 101 bp reads; duplication rate 0.024.
<sup>b</sup> Mean molecular length 24.8 kb; median insert 551; 913,660,083 aligned in pair; 2 × 150 bp reads; duplication rate 0.255.
<sup>c</sup> Excluding the GEMcode sequence and duplicated fragments.
<sup>d</sup> Mean read length 7.1 kb; 4,606,654 aligned.
<sup>e</sup> SRS1045722: median insert 364; 279,027,892 aligned in pair; 2 × 150 bp reads; duplication rate 0.067.
<sup>f</sup> https://support.10xgenomics.com/genome-exome/datasets/2.1.0/NA12878_WGS_210
<sup>g</sup> Mean molecular length 68.7 kb; median insert 349; 407,015,530 aligned in pair; 2 × 150 bp reads; duplication rate 0.062.
<sup>h</sup> https://support.10xgenomics.com/genome-exome/datasets/2.2.1/NA12878_WGS_v2
<sup>i</sup> Mean molecular length 85.6 kb; median insert 370; 418,283,435 aligned in pair; 2 × 150 bp reads; duplication rate 0.079.
<sup>j</sup> SRR1658572: median insert 377; 484,211,662 aligned in pair; 2 × 101 bp reads; duplication rate 0.028.
<sup>k</sup> https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
<sup>l</sup> http://ftp.dfci.harvard.edu/pub/hli/hifiasm/NA12878-r253/. Phased variants were determined using dipcall (https://github.com/lh3/dipcall) on the sequences of parental chromosomes generated by diploid de novo assembly of the NA12878 genome using PacBio High-Fidelity long reads together with short reads of the parental genomes using hifiasm [40].
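
The mean depths in Table 1 can be related to the footnoted alignment statistics with a back-of-the-envelope calculation. The sketch below is an illustration only, not the procedure used to produce the table: the function name `approx_mean_depth`, the 3.1 Gb genome size, and the 23 bp assumed to be removed from each linked-read pair (barcode plus adjacent bases) are all assumptions introduced here. It also treats the "aligned in pair" counts as read pairs.

```python
def approx_mean_depth(aligned_pairs, read_len_bp, dup_rate,
                      genome_size_bp=3.1e9, trimmed_bp_per_pair=0):
    """Rough mean depth from paired-end summary statistics.

    genome_size_bp and trimmed_bp_per_pair are assumed values,
    not figures taken from the table or its footnotes.
    """
    # Bases contributed by one pair after removing non-genomic sequence.
    usable_bp_per_pair = 2 * read_len_bp - trimmed_bp_per_pair
    # Discount pairs flagged as duplicates.
    unique_pairs = aligned_pairs * (1.0 - dup_rate)
    return unique_pairs * usable_bp_per_pair / genome_size_bp


# RPE-1 bulk WGS: figures from the first footnote of Table 1.
bulk_depth = approx_mean_depth(208_151_992, 101, 0.024)

# RPE-1 linked reads: figures from the second footnote of Table 1;
# the 23 bp trim per pair is a hypothetical value for the barcode region.
linked_depth = approx_mean_depth(913_660_083, 150, 0.255,
                                 trimmed_bp_per_pair=23)

print(f"bulk WGS:     ~{bulk_depth:.1f}x")
print(f"linked reads: ~{linked_depth:.1f}x")
```

Under these assumptions both estimates land close to the 13× and 60× reported in the table. The result shifts by a few percent with the choice of genome size and trimmed length.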