TABLE 1.
Species (no. of isolates) and data representation method (no. of features) | AUC (validation data) | AUC (test data) |
---|---|---|
E. coli (1,694) | ||
Binary representation (1,119) | 0.98 ± 0.01 | 0.97 |
Scored representation (2,167) | 0.98 ± 0.01 | 0.97 |
Scored + binary representation (4,219) | 0.98 ± 0.01 | 0.98 |
Amino acid representation (52,199) | 0.98 ± 0.01 | 0.97 |
Nucleotide representation (14,483) | 0.98 ± 0.02 | 0.97 |
M. tuberculosis (1,785) | ||
Binary representation (6,735) | 0.94 ± 0.04 | 0.92 |
Scored representation (11,120) | 0.94 ± 0.04 | 0.92 |
Scored + binary representation (21,975) | 0.94 ± 0.04 | 0.92 |
Amino acid representation (261,085) | 0.93 ± 0.04 | 0.92 |
Nucleotide representation (87,205) | 0.93 ± 0.04 | 0.92 |
For E. coli, the model was trained and validated with 1,422 isolates and tested with 272 isolates. For M. tuberculosis, the model was trained and validated with 992 isolates and tested with 793 isolates; all of these M. tuberculosis isolates had complete resistance profiles. AUC, area under the curve.
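
As a reminder of what the AUC values in Table 1 measure, the following is a minimal sketch and not the authors' code. The function name `auc_score` and the toy labels and scores are hypothetical, and treating label 1 as the positive (e.g., resistant) class is an assumption. It computes the AUC as the fraction of positive/negative pairs in which the positive example receives the higher score, counting ties as one half.

```python
from itertools import product


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: iterable of 0/1 class labels (1 = positive class, an assumption).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (hypothetical, not taken from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc_score(labels, scores)
print(round(toy_auc, 3))  # 0.889 (8 of 9 positive/negative pairs ranked correctly)
```

The pairwise form is quadratic in the number of examples, which is fine for a toy illustration; a rank-based implementation would be preferable at the scale of the isolate sets reported above.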