Table 1.
Description of the two 1000 Genomes Project non-singleton variant callsets used to compare genomic coordinate conversions, and of the three chain files used for the liftover conversion.
1000 Genomes Project callset | low coverage | high coverage |
---|---|---|
Release year | 2013 | 2020 |
Number of samples | 2504 | 3202 |
Variant calling | multiple callers | HaplotypeCaller |
Aligned against | GRCh37 | GRCh38 |
Sequencing coverage | 7.4× | 34× |
Number of non-singleton variants | ||
SNVs | 45 595 458 | 63 993 411 |
Bi-allelic indels | 3 398 818 | 9 459 059 |
Multi-allelic indels (split) | 243 179 | 4 123 095 |
Multi-allelic indels (merged) | 108 842 | 1 375 718 |
Non-allelic primitives | 1408 | 0 |
Chain properties | ||
Source assembly | GRCh37 | GRCh38 |
Destination assembly | GRCh38 | T2T-CHM13v2.0 or Clint_PTRv2 |
Generation script | DoSameSpeciesLiftOver.pl | DoBlastzChainNet.pl |
Assembly aligner | BLAT | LASTZ |
Chain file (.over.chain.gz) | hg19ToHg38 | hg38ToHs1 or hg38ToPanTro6 |
Because the high-coverage callset was called exclusively with HaplotypeCaller and was built from longer sequencing reads, it contains an order of magnitude more multi-allelic indels than the low-coverage callset and no non-allelic primitive variants.
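
The "order of magnitude" statement can be checked directly against the counts in Table 1. The following is a minimal sketch rather than part of the original analysis: the dictionary keys and the `fold_change` helper are hypothetical names introduced for illustration, and the numbers are copied from the table rows above.

```python
# Non-singleton variant counts copied from Table 1.
# Key names are illustrative assumptions, not identifiers from the study.
low_coverage = {
    "bi_allelic_indels": 3_398_818,
    "multi_allelic_indels_split": 243_179,
    "multi_allelic_indels_merged": 108_842,
}
high_coverage = {
    "bi_allelic_indels": 9_459_059,
    "multi_allelic_indels_split": 4_123_095,
    "multi_allelic_indels_merged": 1_375_718,
}


def fold_change(low, high):
    """Return high/low count ratios for every variant class present in `low`."""
    return {name: high[name] / low[name] for name in low}


ratios = fold_change(low_coverage, high_coverage)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Under these counts the multi-allelic indel ratios come out well above tenfold for both the split and merged representations, whereas bi-allelic indels increase by less than threefold, which is consistent with the footnote.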