Table 1.
Characteristics of ML models in included studies.
Title | Time Frame | Predicted (Outcome) Variable | Predictors’ Groups | Training/Test Sets | Machine Learning Models Used | Ways to Avoid Data Overfitting | Handling Missing Data | Evaluation of Models’ Performance | Results |
---|---|---|---|---|---|---|---|---|---|
A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection [43]. | 10 years (1 January 2007–31 December 2016). | The proportion of recommendations for second-line antibiotics and the proportion of recommendations for inappropriate antibiotic therapy. | Labs, antibiotics, demographics, geographical, comorbidities and medical history of patients. | Training dataset (n = 10,053 patients; 11,865 specimens) and test dataset (n = 3629 patients; 3941 specimens). | Logistic regression, decision trees, random forest models. | Regularisation was used in the logistic regression model. | No information. | AUROCs were poor for nitrofurantoin (0.56), TMP-SMX (0.59), and ciprofloxacin and levofloxacin (0.64). | The ML model made an antibiotic recommendation for 99% of the specimens and chose the second-line antibiotics ciprofloxacin or levofloxacin for 11% of the specimens versus 34% for clinicians (a 67% reduction). Furthermore, the model’s recommendations resulted in inappropriate antibiotic therapy for 10% of the specimens versus 12% for clinicians (an 18% reduction). |
A hybrid method incorporating a rule-based approach and deep learning for prescription error prediction [44]. | 1 year (1 January–31 December 2018). | Antibiotic prescription errors. | Labs, medical history of patients and antibiotics. | No information. | An advanced rule-based deep neural network (ARDNN). | No information. | Missing height and weight values (2.45% of the data) were imputed by prediction; records with other missing fields were deleted. Data outliers were treated as missing values. | The model achieved a precision of 73%, recall of 81% and F1 score of 77% (worked through after the table). | Of 15,407 prescriptions by clinicians, 179 contained errors. The validated prediction model correctly detected 145 of these 179 errors, corresponding to a recall of 81%, with a precision of 73% and an F1 score of 77%. |
Evaluation of machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs [45]. | 11 months across 2012 and 2013 (phase one: 1 February–30 November 2012; phase two: 18 November–20 December 2013). | Inappropriate prescriptions of piperacillin–tazobactam. | Labs, demographics, geographical and vital signs. | No information. | A supervised learning module, temporal induction of classification models (TIM), which combines instance-based learning and rule induction. | The J-measure was used to quantify the improvement of a rule (the higher a rule’s information content, as reflected by a high J-measure, the higher the model’s predictive accuracy). | No information. | The overall system achieved a precision of 74%, recall of 96% and accuracy of 79%. | 44 learned rules were extracted to identify inappropriate piperacillin–tazobactam prescriptions. When tested against the dataset, the rules alone identified inappropriate prescriptions with a precision of 66%, recall of 64% and accuracy of 71%. |
Personal clinical history predicts antibiotic resistance to urinary tract infections [46]. | 10 years (1 July 2007–30 June 2017). | Mismatched treatment. | Labs, antibiotics, demographics, geographical, temporal and gender-related. | Training dataset: all data collected from 1 July 2007 to 30 June 2016; test dataset: all data collected from 1 July 2016 to 30 June 2017. | Logistic regression and gradient-boosting decision trees (GBDTs). | Model performance on the test set was contrasted with performance on the training set to identify data overfitting. | Missing resistance measurements were recorded as not available (N/A), and such samples were excluded from the models. | AUROC ranged from acceptable (0.70, amoxicillin–clavulanic acid) to excellent (0.83, ciprofloxacin). | The unconstrained algorithm yielded a predicted treatment mismatch rate of 5% (42% lower than the 8.5% mismatch rate of clinicians’ prescriptions); the constrained algorithm yielded a predicted mismatch rate of 6%. |
Using machine learning to guide targeted and locally tailored empiric antibiotic prescribing in a children’s hospital in Cambodia [47]. | 3 years (February 2013–January 2016). | Susceptibility to antibiotics. | Labs, antibiotics, demographics, temporal, socioeconomic conditions and medical history of patients. | The dataset was split 80%/20% into training and test datasets. | Logistic regression, decision trees, random forests, boosted decision trees, linear support vector machines (SVMs), polynomial SVMs, radial SVMs and K-nearest neighbours. | Regularisation was used in the logistic regression model; no details were reported for the other ML models. | Missing data for the binary predictors were treated as “negative”. | The AUROC of the random forest method was excellent at 0.80 for ceftriaxone and acceptable at 0.74 for ampicillin and gentamicin and 0.71 for Gram stain. | The random forest method had the best performance in predicting susceptibility to antibiotics (ceftriaxone, ampicillin and gentamicin) and Gram stain, and will be used to guide appropriate antibiotic therapy. Notably, the authors reported the models’ AUROC values rather than clinical prescribing outcomes. |
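To make the precision, recall and F1 figures in Table 1 concrete, the short sketch below reproduces the arithmetic behind the ARDNN results [44]. The counts of 179 actual errors and 145 detected errors are taken from the study; the total number of flagged prescriptions (~199) is back-calculated from the reported 73% precision and is therefore an assumption, not a figure from the paper.

```python
# Worked example of the metrics reported for the ARDNN study [44].
# 179 actual errors and 145 detections come from the study; the 199
# flagged prescriptions are inferred from the reported 73% precision.
true_positives = 145   # errors the model correctly flagged
actual_errors = 179    # all prescription errors in the dataset
flagged = 199          # assumed total flags, back-calculated from precision

precision = true_positives / flagged               # ≈ 0.73
recall = true_positives / actual_errors            # ≈ 0.81
f1 = 2 * precision * recall / (precision + recall) # ≈ 0.77

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The arithmetic confirms that 145/179 is the recall (the share of actual errors detected), which is why the values in the evaluation column, rather than the swapped figures originally reported in the results column, are internally consistent.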
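Several of the included studies [43,46,47] share the same modelling pattern: a regularised logistic regression (among other models) trained on earlier data, tested on later data, and evaluated by AUROC. The sketch below illustrates that pattern with scikit-learn under stated assumptions; the file name, column names and split date are hypothetical placeholders, not fields from any of the included datasets.

```python
# A minimal sketch of the modelling pattern shared by several studies in
# Table 1 [43,46,47]: L2-regularised logistic regression trained on earlier
# data, tested on later data, and scored with AUROC. All names below are
# hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("urine_cultures.csv", parse_dates=["sample_date"])

# Temporal split, as in [46]: train on the first nine years, test on the last.
train = df[df["sample_date"] < "2016-07-01"]
test = df[df["sample_date"] >= "2016-07-01"]

# Hypothetical binary predictors; missing values are treated as "negative"
# (i.e., 0), mirroring the approach described in [47].
features = ["prior_resistance", "recent_antibiotic_use", "prior_hospitalisation"]
X_train, y_train = train[features].fillna(0), train["resistant"]
X_test, y_test = test[features].fillna(0), test["resistant"]

# C controls the strength of the L2 penalty (smaller C = stronger
# regularisation), one of the overfitting safeguards noted in Table 1.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_train, y_train)

auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUROC = {auroc:.2f}")  # 0.7-0.8 acceptable, 0.8-0.9 excellent
```

A temporal rather than random split mimics deployment, since the model is evaluated only on samples collected after everything it was trained on; it also makes a gap between training and test performance easier to read as overfitting, which is how [46] used it.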