Table 1.
Variant type | Average number of sites (thousands)<sup>a</sup> | Average sum of variant length (Mbp)<sup>b</sup> | Percentage of diploid genome<sup>c</sup> |
---|---|---|---|
All | 5,045.39 | 44.24 | 0.763 |
SNV (including MNPs) | 3,992.73 | 3.99 | 0.069 |
Indel | 1,021.73 | 3.63 | 0.063 |
SV<sup>d</sup> | 30.93 | 36.62 | 0.631 |
STR | 2.65 | 0.19 | 0.003 |
VNTR | 12.58 | 1.36 | 0.023 |
Other low complexity | 2.58 | 0.13 | 0.002 |
SD | 0.55 | 6.25 | 0.108 |
Mobile element | 6.18 | 1.91 | 0.033 |
LINE1 | 0.98 | 0.91 | 0.016 |
ERV | 0.64 | 0.27 | 0.005 |
Alu | 3.49 | 0.48 | 0.008 |
SVA | 1.07 | 0.25 | 0.004 |
Inversion | 0.15 | 23.2 | 0.400 |
Unclassified/mixed | 6.23 | 3.58 | 0.062 |
Abbreviations: ERV, endogenous retrovirus; indel, insertion or deletion; LINE1, long interspersed element 1; MNP, multiple-nucleotide polymorphism; SD, segmental duplication; SINE, short interspersed element; SNV, single-nucleotide variant; STR, short tandem repeat; SV, structural variant; SVA, SINE-VNTR-Alu; VCF, Variant Call Format; VNTR, variable number tandem repeat.
<sup>a</sup> The average number of sites of a given variant type observed within each genome.
<sup>b</sup> The average total length of variant sites.
<sup>c</sup> The percentage of a diploid genome that each variant type represents, assuming a 5.8-Gb diploid euchromatic genome length. Heterochromatin is excluded because of uncertainty around its assembly and alignment. This applies to all variant types except inversions, whose estimates are from Porubsky et al. (115) and are not necessarily restricted to euchromatic sequence.
<sup>d</sup> SVs include all structural variants; the remaining rows are SV subclasses. Unclassified/mixed denotes SVs that could not be reliably annotated. SV counts, excluding inversions, were calculated from Minigraph (89) VCF files released with Liao et al. (93) and provided by Heng Li and Wen-Wei Liao. Small-variant numbers are also from Liao et al. (93) and were calculated using PacBio HiFi sequencing data and DeepVariant (113).
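The percentage column can be reproduced from the length column under the assumption stated in footnote c, a 5.8-Gb (5,800-Mbp) diploid euchromatic genome. The following is a minimal sketch of that arithmetic, not the authors' code; the function and variable names are hypothetical, and only the three top-level rows are included.

```python
# Assumed diploid euchromatic genome length from footnote c: 5.8 Gb = 5,800 Mbp.
DIPLOID_EUCHROMATIC_MBP = 5800.0

# Average summed variant length (Mbp) per genome, copied from Table 1.
variant_length_mbp = {
    "SNV (including MNPs)": 3.99,
    "Indel": 3.63,
    "SV": 36.62,
}


def percent_of_diploid_genome(length_mbp, genome_mbp=DIPLOID_EUCHROMATIC_MBP):
    """Return the share of the diploid genome covered, in percent."""
    return 100.0 * length_mbp / genome_mbp


# Per-type percentages, rounded to three decimals as in the table.
percentages = {
    name: round(percent_of_diploid_genome(mbp), 3)
    for name, mbp in variant_length_mbp.items()
}

# The "All" row is the sum of the three top-level variant types.
total_mbp = round(sum(variant_length_mbp.values()), 2)
total_percent = round(percent_of_diploid_genome(total_mbp), 3)

for name, pct in percentages.items():
    print(f"{name}: {pct:.3f}%")
print(f"All: {total_mbp:.2f} Mbp, {total_percent:.3f}%")
```

Under this assumption the computed values match the SNV, indel, SV, and All rows of the table. The inversion row is left out of the sketch because, per footnote c, its estimate is not necessarily restricted to euchromatic sequence.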