TABLE 1.
Variant type | Phenotype | No. of variants selected | FPR (%) | FNR (%) | CPU time (min) | Memory usage (GB)
---|---|---|---|---|---|---
SNPs (90,000; 3.6 MB on disk) | β-Lactam | 4,374 | 3 | 7 | 4.4 | 1.3
SNPs (90,000; 3.6 MB on disk) | Erythromycin | 2,341 | 3 | 63 | 4.1 | 1.3
Unitigs (730,000; 25 MB on disk) | β-Lactam | 8,247 | 5 | 7 | 49.7 | 18
Unitigs (730,000; 25 MB on disk) | Erythromycin | 1,591 | 9 | 39 | 52.6 | 6.9
k-mers (10 million; 603 MB on disk) | β-Lactam | 15,121 | 6 | 7 | 420 | 212
Using a 2:1 training/test split, we tested prediction accuracy for two phenotypes with 90,000 SNP calls (from mapping to a reference genome) and with 730,000 unitigs. We also tested prediction of β-lactam resistance with 10 million variable-length k-mers, to illustrate the heavy computational resources required even for a relatively small data set. File sizes are for the sparse data structures we employ. FPR, false positive rate; FNR, false negative rate.
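To make the evaluation in the table concrete, the sketch below shows a 2:1 random training/test split and the standard definitions of the two error rates reported, FPR = FP / (FP + TN) and FNR = FN / (FN + TP), expressed as percentages. It is a minimal illustration under stated assumptions, not the pipeline used to produce Table 1: the function names `split_two_to_one` and `error_rates` are hypothetical, phenotypes are assumed to be binary resistant (1) / susceptible (0) calls, and the model that produces the predictions (which is not shown here) is left out entirely.

```python
import random


def split_two_to_one(samples, seed=0):
    """Hypothetical helper: shuffle samples, return (training, test) in a 2:1 ratio."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3  # first two thirds go to training
    return shuffled[:cut], shuffled[cut:]


def error_rates(truth, predicted):
    """Hypothetical helper: return (FPR, FNR) in percent for binary labels.

    Assumes 1 = resistant (positive) and 0 = susceptible (negative).
    """
    pairs = list(zip(truth, predicted))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # Undefined rates (no negatives or no positives in the test set) become NaN
    fpr = 100 * fp / (fp + tn) if fp + tn else float("nan")
    fnr = 100 * fn / (fn + tp) if fn + tp else float("nan")
    return fpr, fnr


if __name__ == "__main__":
    train, test = split_two_to_one(range(9))
    print(len(train), len(test))  # 6 3
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    predicted = [0, 0, 0, 1, 1, 1, 0, 1]
    print(error_rates(truth, predicted))  # (25.0, 25.0)
```

In the toy example, one of four susceptible samples is called resistant and one of four resistant samples is missed, giving an FPR and FNR of 25% each. A high FNR such as the 63% for erythromycin with SNPs means most truly resistant test samples were predicted susceptible.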