Table 1.
Bystro, VEP, and ANNOVAR offline command-line performance
Software | Dataset | Samples | Variants | Variants/s | Bystro vs |
---|---|---|---|---|---|
Bystro | 1000G Phase 3 chr1 | 2504 | 1 × 10⁶ | 8156 ± 195 | – |
Bystro | 1000G Phase 3 chr1 | 2504 | 2 × 10⁶ | 8484 ± 67.9 | – |
Bystro | 1000G Phase 3 chr1 | 2504 | 4 × 10⁶ | 8516 ± 57.2 | – |
Bystro | 1000G Phase 3 chr1 | 2504 | 6.5 × 10⁶ | 7779 ± 21.8 | – |
Bystro | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | 5417 ± 76.8 | – |
Bystro | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | 7904 ± 15.9 | – |
VEP | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | 18.67 ± 0.58 | 290× |
VEP | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | 10.00 ± 0.00 | 790× |
ANNOVAR | 1000G Phase 3 chr1 | 2504 | 1 × 10⁶ | 74.67 ± 0.21 | 109× |
ANNOVAR | 1000G Phase 3 chr1 | 2504 | 2 × 10⁶ | 75.32 ± 0.06 | 113× |
ANNOVAR | 1000G Phase 3 chr1 | 2504 | 4 × 10⁶ | 75.15 ± 0.39 | 113× |
ANNOVAR | 1000G Phase 3 chr1 | 2504 | 6.5 × 10⁶ | NA | NA |
ANNOVAR | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | NA | NA |
ANNOVAR | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | NA | NA |
Bystro, VEP, and ANNOVAR were similarly configured with eight threads on Amazon i3.2xlarge servers. “Dataset” refers to the VCF file used. “Variants/s” is the average of three trials. VEP performance was recorded after 2 × 10⁵ sites in consideration of time. In runs of 1 × 10⁶ or more annotated sites, VEP performance did not deviate from the 2 × 10⁵ value. ANNOVAR could not complete the full Phase 1, Phase 3, or Phase 3 chromosome 1 datasets due to memory limitations. Thus, ANNOVAR was compared to Bystro on subsets of 1000 Genomes Phase 3 chromosome 1. Bystro run times included time taken to compress outputs. 1000 Genomes Phase 1 performance reflects IO limitations.
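
The fold differences in the “Bystro vs” column can be rechecked from the throughput values in the table. The sketch below is illustrative and is not part of the benchmark itself. It assumes that each fold difference is the mean Bystro rate divided by the mean rate of the compared tool on the same dataset, rounded to the nearest integer. The source does not state the rounding rule, and the dictionary keys and the `fold_speedup` helper are hypothetical names introduced here.

```python
# Mean throughput in variants/s, copied from Table 1 (standard deviations omitted).
# Dataset labels are shorthand invented for this sketch.
BYSTRO = {
    "Phase 3 chr1, 1e6": 8156,
    "Phase 3 chr1, 2e6": 8484,
    "Phase 3 chr1, 4e6": 8516,
    "Phase 1": 5417,
    "Phase 3": 7904,
}

# Rates for the compared tools; rows reported as NA in the table are left out.
OTHER = {
    ("VEP", "Phase 1"): 18.67,
    ("VEP", "Phase 3"): 10.00,
    ("ANNOVAR", "Phase 3 chr1, 1e6"): 74.67,
    ("ANNOVAR", "Phase 3 chr1, 2e6"): 75.32,
    ("ANNOVAR", "Phase 3 chr1, 4e6"): 75.15,
}


def fold_speedup(bystro_rate, other_rate):
    """Ratio of Bystro throughput to the compared tool's throughput."""
    return bystro_rate / other_rate


# Assumption: the table rounds each ratio to the nearest integer.
speedups = {
    (tool, dataset): round(fold_speedup(BYSTRO[dataset], rate))
    for (tool, dataset), rate in OTHER.items()
}

for (tool, dataset), fold in speedups.items():
    print(f"Bystro vs {tool} on {dataset}: {fold}x")
```

Under these assumptions the computed ratios match the tabulated values (290× and 790× for VEP; 109×, 113×, and 113× for ANNOVAR).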