Bioinformatics. 2024 Jan 23;40(2):btae038. doi: 10.1093/bioinformatics/btae038

Table 1.

Description of the two non-singleton variant callsets from the 1000 Genomes Project used to compare genomic coordinate conversions, and of the three chain files used for liftover conversion.

1000 Genomes Project callset      Low coverage               High coverage
Release year                      2013                       2020
Number of samples                 2504                       3202
Variant calling                   Multiple callers           HaplotypeCaller
Aligned against                   GRCh37                     GRCh38
Sequencing coverage               7.4×                       34×
Number of non-singleton variants
  SNVs                            45 595 458                 63 993 411
  Bi-allelic indels               3 398 818                  9 459 059
  Multi-allelic indels (split)    243 179                    4 123 095
  Multi-allelic indels (merged)   108 842                    1 375 718
  Non-allelic primitives          1408                       0
Chain properties
  Source assembly                 GRCh37                     GRCh38
  Destination assembly            GRCh38                     T2T-CHM13v2.0 or Clint_PTRv2
  Generation script               doSameSpeciesLiftOver.pl   doBlastzChainNet.pl
  Assembly aligner                BLAT                       LASTZ
  Chain file (.over.chain.gz)     hg19ToHg38                 hg38ToHs1 or hg38ToPanTro6

Because the high-coverage callset was generated exclusively with HaplotypeCaller and from longer sequencing reads, it contains an order of magnitude more multi-allelic indels and no non-allelic primitive variants.
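The "(split)" and "(merged)" rows appear to count the same multi-allelic indels in two representations: a merged record carries several ALT alleles at one site, while splitting yields one bi-allelic record per ALT allele. The following is a minimal Python sketch of that splitting, assuming a simplified record with only CHROM, POS, REF and ALT. The `Record` class, `split_multiallelic` and `is_indel` are hypothetical names for illustration; genotypes, INFO/FORMAT annotations and allele trimming are ignored, which a dedicated tool such as `bcftools norm -m-` would also have to handle.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal VCF-like record (REF/ALT only, no genotypes)."""
    chrom: str
    pos: int               # 1-based, as in VCF
    ref: str
    alts: Tuple[str, ...]  # one entry per ALT allele


def split_multiallelic(rec: Record) -> List[Record]:
    """Return one bi-allelic record per ALT allele.

    Sketch only: alleles are not trimmed or left-aligned, and any
    per-allele annotations or genotypes would need re-indexing in practice.
    """
    return [Record(rec.chrom, rec.pos, rec.ref, (alt,)) for alt in rec.alts]


def is_indel(ref: str, alt: str) -> bool:
    """Treat an allele pair as an indel when REF and ALT lengths differ."""
    return len(ref) != len(alt)


# One merged multi-allelic indel site with three ALT alleles (toy data) ...
merged = Record("chr1", 1000, "CT", ("C", "CTT", "CTTT"))
# ... becomes three split bi-allelic records, one printed line per ALT allele.
split = split_multiallelic(merged)
for r in split:
    print(r.chrom, r.pos, r.ref, r.alts[0], is_indel(r.ref, r.alts[0]))
```

Under this reading, one merged site with k ALT alleles contributes 1 to the merged count and k to the split count, which is why the split row is always the larger of the two.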
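The chain files listed in the table (UCSC `.over.chain.gz` files) describe the source-to-destination alignment as chains of ungapped blocks separated by gaps. To illustrate how such a file maps coordinates, here is a minimal Python sketch that parses a single chain block in the UCSC chain format and lifts one 0-based source position. It is a simplification under stated assumptions, not the conversion method compared in the article: `parse_chain`, `lift_position` and the toy chain are hypothetical, real files contain many chains that this sketch does not choose among, and converting actual variants additionally requires handling REF/ALT alleles, reverse complementing on `-` strand chains, and variants that overlap gaps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chain:
    t_name: str   # source ("target") sequence name in the chain header
    q_name: str   # destination ("query") sequence name
    q_size: int
    q_strand: str
    # Ungapped aligned blocks as (t_start, q_start, size), 0-based; query
    # coordinates are on the strand given by q_strand.
    blocks: List[Tuple[int, int, int]]


def parse_chain(text: str) -> Chain:
    """Parse ONE chain block: a header line plus alignment data lines."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    t_name, t_pos = header[2], int(header[5])
    q_name, q_size, q_strand, q_pos = header[7], int(header[8]), header[9], int(header[10])
    blocks = []
    for fields in rows[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # gap before next block: dt target bases, dq query bases
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return Chain(t_name, q_name, q_size, q_strand, blocks)


def lift_position(chain: Chain, name: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based source position; return None if it is not in an aligned block."""
    if name != chain.t_name:
        return None
    for t_start, q_start, size in chain.blocks:
        if t_start <= pos < t_start + size:
            q = q_start + (pos - t_start)
            if chain.q_strand == "-":
                # Query coordinates count from the reverse-strand end: convert to forward.
                q = chain.q_size - 1 - q
            return chain.q_name, q, chain.q_strand
    return None


# Toy chain: target [100,110) -> query [500,510), then a gap (5 target bases,
# 10 query bases), then target [115,130) -> query [520,535).
TOY_CHAIN = """
chain 1000 chr1 1000 + 100 130 chr1 2000 + 500 535 1
10 5 10
15
"""

chain = parse_chain(TOY_CHAIN)
print(lift_position(chain, "chr1", 105))  # ('chr1', 505, '+')
print(lift_position(chain, "chr1", 112))  # None: base falls in a gap
print(lift_position(chain, "chr1", 120))  # ('chr1', 525, '+')
```

The strand is returned alongside the position because a `-` strand hit means the variant's alleles would also need reverse complementing, which is one of the details this sketch deliberately leaves out.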