2018 Feb 6;19:14. doi: 10.1186/s13059-018-1387-3

Table 2.

Online comparison of Bystro and recent programs in filtering 8.49 × 10⁷ variants from 1000 Genomes

Group | Search query | Time (s) | Variants returned | Tr:Tv
1 | Exonic | 0.030 ± 0.030 | 993,343 | 2.96
2 (a) | cadd > 20 maf < .001 pathogenic expert review missense | 0.029 ± 0.009 | 65 | 1.71
2 (b) | cadd > 20 maf < .001 pathogenic expert’s review non-synonymous | 0.036 ± 0.019 | 65 | 1.71
2 (c) | cadd > 20 maf < .001 pathogen expert-reviewed nonsynonymous | 0.044 ± 0.025 | 65 | 1.71
3 (a) | Early onset breast cancer | 0.046 ± 0.029 | 4335 | 2.51
3 (b) | Early-onset breast cancer | 0.037 ± 0.020 | 4335 | 2.51
3 (c) | Early onset breast cancers | 0.033 ± 0.015 | 4335 | 2.51
4 (a) | Pathogenic nonsense Ehlers-Danlos | 0.038 ± 0.027 | 1 | NA
4 (b) | Pathogenic nonsense E.D.S | 0.078 ± 0.087 | 1 | NA
4 (c) | Pathogenic stopgain eds | 0.040 ± 0.022 | 1 | NA

The full 1000 Genomes Phase 3 VCF file (853 GB, 8.49 × 10⁷ variants, 2504 samples) was filtered in the publicly available Bystro web application using the Bystro natural-language search engine. VEP, GEMINI, and wANNOVAR (not shown) were also tested but could not annotate or filter this dataset. Bystro’s search engine uses a natural-language parser that accepts unstructured queries: the queries within groups 2, 3, and 4 differ only in phrasing, and these differences did not change the results returned, as expected for a search engine that handles normal variation in language. “Tr:Tv” is the transition-to-transversion ratio, calculated automatically by the search engine for each query. The Tr:Tv of 2.96 for the “Exonic” query is close to the ~2.8–3.0 ratio expected in coding regions, suggesting that the search engine accurately identified exonic (coding) variants.
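To make the Tr:Tv column concrete, the ratio can in principle be recomputed from the REF and ALT alleles of the single-nucleotide variants a query returns. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions; every other base change is a transversion. The Python below is a minimal sketch under stated assumptions, not Bystro’s implementation: the function names `is_transition` and `tr_tv_ratio` are hypothetical, variants are assumed to arrive as (ref, alt) string pairs, indels and multi-base alleles are simply skipped, and returning `None` when there are no transversions is a choice made for this sketch.

```python
# Hypothetical helpers, not Bystro's code: compute Tr:Tv from (ref, alt) SNV pairs.
from typing import Iterable, Optional, Tuple

BASES = set("ACGT")
# Transitions: A<->G (purines) and C<->T (pyrimidines); all other changes are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if the single-base change ref->alt is a transition."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def tr_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Return transitions / transversions, or None if there are no transversions.

    Assumption for this sketch: records that are not single-base substitutions
    (indels, multi-base alleles, non-ACGT symbols, or ref == alt) are ignored.
    """
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None  # ratio undefined in this sketch
    return transitions / transversions


if __name__ == "__main__":
    # Three transitions (A>G, C>T, G>A) and one transversion (A>C); the indel is skipped.
    example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("A", "AT")]
    print(tr_tv_ratio(example))  # 3.0
```

Because the ratio is a simple count over returned records, it is cheap to report alongside every query result, which fits the caption’s description of it being calculated automatically for each query.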