Table 3.
Online comparison of Bystro and GEMINI/Galaxy in filtering 1 × 10⁶ variants
No. | Program | Query | Time (s) | Variants | Ts/Tv |
---|---|---|---|---|---|
1 | Bystro | cadd > 15 alt:(a \|\| c \|\| t \|\| g) | 0.004 ± 0 | 28,099 | 2.512 |
1 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 | 442 ± 87 | 22,063 | NA |
2 | Bystro | gnomad.exomes.af < .001 cadd > 15 missense | 0.007 ± 0.003 | 6,840 | 3.083 |
2 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "missense_variant" | 77.6 ± 18.6 | 5,160 | NA |
3 | Bystro | gnomad.exomes.af < .001 cadd > 15 nonsynonymous | 0.006 ± 0.001 | 6,840 | 3.083 |
3 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "nonsynonymous_variant" | NA | 0 | NA |
Bystro was compared to the latest hosted version of GEMINI (v0.8.1, on the Galaxy platform) in filtering the 1 × 10⁶ variant subset of 1000 Genomes Phase 3, which was the largest tested file that GEMINI/Galaxy could process. GEMINI requires structured SQL queries, while Bystro allows shorter, unstructured searches. In query 1, Bystro searched for CADD scores only within single-nucleotide polymorphisms (using alt:(a || c || t || g) or, equivalently, the regex query alt:/[actg]/) to make its results comparable with those of GEMINI, which provides no CADD data for insertions and deletions. In queries 2 and 3, Bystro’s search engine returned identical results for the synonymous terms “missense” and “nonsynonymous,” despite annotating such sites only as “nonsynonymous.” In contrast, GEMINI required the specific term “missense_variant” and returned no results for “nonsynonymous_variant.” GEMINI/Galaxy and Bystro returned different variant counts because the latest version of GEMINI on Galaxy (v0.8.1) uses outdated annotation sources. Comparisons between Bystro and GEMINI/Galaxy are further limited because GEMINI does not provide a natural-language parser, annotation field filters, an interactive result browser, per-query statistics, or the ability to filter saved search results. Bystro was also substantially faster, returning all results in < 1 s.
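
The exact-match behavior noted for queries 2 and 3 can be illustrated with a minimal sketch, assuming a toy in-memory SQLite database. Only the table and column names that appear in the GEMINI queries above are taken from the source; the rows, the reduced schema, and the helper `matching_ids` are hypothetical and exist only for illustration. The sketch does not reproduce GEMINI's full schema or Bystro's search engine.

```python
import sqlite3

# Toy in-memory database. Only the columns named in the Table 3 queries are
# modeled; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (
        variant_id   INTEGER PRIMARY KEY,
        cadd_scaled  REAL,
        aaf_exac_all REAL
    );
    CREATE TABLE variant_impacts (
        variant_id INTEGER,
        impact     TEXT
    );
""")
conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", [
    (1, 22.1, 0.0002),  # high CADD, rare
    (2, 9.3, 0.0001),   # fails the CADD filter
    (3, 31.0, 0.2500),  # fails the allele-frequency filter
    (4, 17.8, 0.0005),  # high CADD, rare, but not missense
])
conn.executemany("INSERT INTO variant_impacts VALUES (?, ?)", [
    (1, "missense_variant"),
    (2, "missense_variant"),
    (3, "missense_variant"),
    (4, "synonymous_variant"),
])

# Same filter structure as queries 2 and 3, with the impact term parameterized.
QUERY = """
    SELECT variants.variant_id
    FROM variants
    JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id
    WHERE cadd_scaled > 15
      AND aaf_exac_all < 0.001
      AND variant_impacts.impact = ?
"""


def matching_ids(impact_term):
    """Return sorted variant_ids passing the CADD, frequency, and impact filters."""
    rows = conn.execute(QUERY, (impact_term,)).fetchall()
    return sorted(row[0] for row in rows)


missense_ids = matching_ids("missense_variant")
nonsynonymous_ids = matching_ids("nonsynonymous_variant")
print(missense_ids, nonsynonymous_ids)
```

Because the SQL comparison is a literal string match, a term that is absent from the stored annotations matches no rows, which is consistent with the zero variants reported for GEMINI in query 3.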