Figure 4.
Examples of different liftover scenarios for SNVs from the low-coverage 1000 Genomes project that fall within a chain gap caused by: (a) a SNV, (b) two SNVs, (c) a SNV and an indel, or (d) a complex variant. Underlined base pairs are covered by the hg19ToHg38.over.chain.gz chain file. Gray base pairs are the 5’ and 3’ anchors for the maximally extended representations of the records. Transanno/liftvcf correctly processes (a), drops (b) and (c), and yields an incorrect record, chr21 44583006 T TT,TC, for (d), while Genozip/DVCF, Picard/LiftoverVcf, and CrossMap/VCF drop all the SNVs. Note that variant rs1211058 (d) is represented by the VCF record chr21 44583007 T C in the high-coverage 1000 Genomes project, but this variant cannot be correctly converted from GRCh37 to GRCh38 without sequence context.