mBio. 2020 Jul 7;11(4):e01344-20. doi: 10.1128/mBio.01344-20

TABLE 3.

Summary of data sets tested^a

| Data set name | Species | Phenotype(s) and split | Reference(s) | No. of samples | No. of samples for training/test | No. of genetic features |
| --- | --- | --- | --- | --- | --- | --- |
| TB | Mycobacterium tuberculosis | First-line antibiotic resistance: rifampicin, 1,285:2,257; isoniazid, 1,553:2,011; pyrazinamide, 702:2,445; ethambutol, 975:2,551 | 5 | 3,566 | 2,377/1,189 | 6,400 (SNPs) |
| N. gonorrhoeae | Neisseria gonorrhoeae | Antibiotic resistance MICs: azithromycin, cefixime, ciprofloxacin, penicillin, and tetracycline | 53, 61, 83, 84 | 1,595 | NU^b | 550,000 (unitigs) |
| GAS | Streptococcus pyogenes | Virulence, 1,093:637 | 46 | 1,730 | 1,154/576 | 1.1 million (unitigs) |
| SPARC | Streptococcus pneumoniae | Antibiotic resistance MICs: penicillin, erythromycin | 47, 85 | 603 | 400/203 | 90,000 (SNPs), 730,000 (unitigs), 10 million (k-mers) |
| Maela | Streptococcus pneumoniae | Carriage duration; antibiotic resistance: penicillin, 1,661:1,282; erythromycin, 802:2,355; trimethoprim, 609:2,548 | 12, 44 | 3,162 (antibiotic resistance), 2,017 (carriage duration) | 1,404/703 (carriage duration) | 121,000 (SNPs), 1.6 million (unitigs) |
| GPS | Streptococcus pneumoniae | Antibiotic resistance (penicillin) | 1 | 5,820 | NU | 1.7 million (unitigs) |
| Netherlands | Streptococcus pneumoniae | Meningitis/carriage, 693:1,144 | 45 | 1,837 | 1,225/612 | 690,000 (unitigs) |
^a Each data set has a name by which it is referred to in the text. Most data sets have multiple phenotypes available, especially where multiple different antibiotic resistances are routinely phenotyped. Data sets without a training/test split were not evaluated for internal prediction ability as they were instead used with more stringent external validation data sets or were used for GWAS only, and all available samples were used to fit the model.

^b NU, not used.
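
As an arithmetic aid for reading the training/test column, the short Python sketch below computes the fraction of split samples assigned to training for each data set that has a split. The counts are copied from the table above. The names `splits` and `training_fraction` are illustrative and do not come from the source. Each fraction comes out close to two thirds, but this is only a check on the listed counts, not a description of how the splits were drawn.

```python
# Training/test sample counts copied from the table above.
# Data sets marked NU have no split and are omitted.
splits = {
    "TB": (2377, 1189),
    "GAS": (1154, 576),
    "SPARC": (400, 203),
    "Maela (carriage duration)": (1404, 703),
    "Netherlands": (1225, 612),
}


def training_fraction(n_train: int, n_test: int) -> float:
    """Return the fraction of split samples assigned to the training set."""
    return n_train / (n_train + n_test)


# Hypothetical helper structure: data set name -> training fraction.
fractions = {name: training_fraction(*counts) for name, counts in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```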