2020 Feb 24;21:182. doi: 10.1186/s12864-020-6569-1

Table 1.

The synthetic datasets and the number of simulated sequence variations in each. The Average Sequence Identity (ASI) is estimated as one minus the ratio of total mismatches to the number of nucleobases.

| Dataset | Genome size | SNV | Small indel | Large indel | ASI |
|---|---|---|---|---|---|
| simHG-1X | 3,088,279,342 | 58,421,383 | 1,001,626 | 285,757 | 97.93% |
| simHG-3X | 3,088,292,247 | 175,100,939 | 962,721 | 275,584 | 93.86% |
| simHG-5X | 3,088,289,999 | 291,714,646 | 919,762 | 263,271 | 89.90% |
| NA12878 | 6,070,700,436 | 3,088,156 | 531,315 | NA | 99.84% |
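
The ASI column can be sanity-checked against the SNV counts alone. The following is a minimal sketch, not the authors' procedure: it assumes ASI is one minus mismatches per nucleobase and counts only SNVs as mismatches, because the table does not give indel lengths. The result is therefore an upper bound on the reported ASI, not a reproduction of it. The names `TABLE_1`, `snv_only_identity`, and `compare` are hypothetical.

```python
# Values copied from Table 1: (genome size, SNV count, reported ASI in percent).
TABLE_1 = {
    "simHG-1X": (3_088_279_342, 58_421_383, 97.93),
    "simHG-3X": (3_088_292_247, 175_100_939, 93.86),
    "simHG-5X": (3_088_289_999, 291_714_646, 89.90),
    "NA12878": (6_070_700_436, 3_088_156, 99.84),
}


def snv_only_identity(genome_size: int, snv_count: int) -> float:
    """Percent identity if SNVs were the only mismatches (an assumption)."""
    return 100.0 * (1.0 - snv_count / genome_size)


def compare(table=TABLE_1):
    """Return {dataset: (snv_only_identity, reported_asi, gap)} in percent."""
    rows = {}
    for name, (size, snvs, reported) in table.items():
        bound = snv_only_identity(size, snvs)
        # The gap is the share of identity loss not explained by SNVs,
        # presumably bases affected by small and large indels.
        rows[name] = (bound, reported, bound - reported)
    return rows


if __name__ == "__main__":
    for name, (bound, reported, gap) in compare().items():
        print(f"{name}: SNV-only {bound:.2f}% vs reported {reported:.2f}% "
              f"(gap {gap:.2f} points)")
```

For every dataset the SNV-only figure sits slightly above the reported ASI, which is consistent with indel bases accounting for the remaining mismatches.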