Table 1.
Characteristics of ML models in included studies.
Title | Time Frame | Predicted (Outcome) Variable | Predictors’ Groups | Training/Test Sets | Machine Learning Models Used | Ways to Avoid Data Overfitting | Handling Missing Data | Evaluation of Models’ Performance | Results |
---|---|---|---|---|---|---|---|---|---|
A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection [43]. | 10 years (1 January 2007–31 December 2016). | The proportion of recommendations for second-line antibiotics and the proportion of recommendations for inappropriate antibiotic therapy. | Labs, antibiotics, demographics, geographical, comorbidities and medical history of patients. | Training dataset (n = 10,053 patients; 11,865 specimens) and test dataset (n = 3629 patients; 3941 specimens). | Logistic regression, decision trees, random forest models. | Regularisation was used in the logistic regression model. | No information. | AUROCs were poor for nitrofurantoin (0.56), TMP-SMX (0.59), and ciprofloxacin and levofloxacin (0.64). | The ML model made an antibiotic recommendation for 99% of the specimens and chose the second-line antibiotics ciprofloxacin or levofloxacin for 11% of the specimens versus 34% for clinicians (a 67% reduction). Furthermore, the model’s recommendations resulted in inappropriate antibiotic therapy for 10% of the specimens versus 12% for clinicians (an 18% reduction). |
A hybrid method incorporating a rule-based approach and deep learning for prescription error prediction [44]. | 1 year (1 January–31 December 2018). | Antibiotic prescription errors. | Labs, medical history of patients and antibiotics. | No information. | An advanced rule-based deep neural network (ARDNN). | No information. | Missing height and weight values (2.45% of the data) were imputed by prediction; records with other missing fields were deleted. Data outliers were treated as missing values. | The model achieved a precision of 73%, recall of 81% and F1 score of 77% (worked through after the table). | Of 15,407 prescriptions by clinicians, 179 contained errors. The validated prediction model correctly detected 145 of these 179 errors, corresponding to a recall of 81%, with a precision of 73% and an F1 score of 77%. |
Evaluation of machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs [45]. | 11 months across 2012 and 2013 (phase one: 1 February–30 November 2012; phase two: 18 November–20 December 2013). | Inappropriate prescriptions of piperacillin–tazobactam. | Labs, demographics, geographical and vital signs. | No information. | A supervised learning module, temporal induction of classification models (TIM), which combines instance-based learning and rule induction. | The J-measure was used to quantify the improvement of a rule (the higher a rule’s information content, as reflected by a high J-measure, the higher the model’s predictive accuracy). | No information. | The overall system achieved a precision of 74%, recall of 96% and accuracy of 79%. | 44 learned rules were extracted to identify inappropriate piperacillin–tazobactam prescriptions. When tested against the dataset, the rules alone identified inappropriate prescriptions with a precision of 66%, recall of 64% and accuracy of 71%. |
Personal clinical history predicts antibiotic resistance to urinary tract infections [46]. | 10 years (1 July 2007–30 June 2017). | Mismatched treatment. | Labs, antibiotics, demographics, geographical, temporal and gender-related. | Training dataset: all data collected from 1 July 2007 to 30 June 2016; test dataset: all data collected from 1 July 2016 to 30 June 2017. | Logistic regression and gradient-boosting decision trees (GBDTs). | Model performance on the test set was contrasted with performance on the training set to identify data overfitting. | Missing resistance measurements were recorded as not available (N/A), and such samples were excluded from the models. | AUROC ranged from acceptable (0.70, amoxicillin–clavulanic acid) to excellent (0.83, ciprofloxacin). | The unconstrained algorithm yielded a predicted treatment mismatch rate of 5% (42% lower than the 8.5% mismatch rate of clinicians’ prescriptions); the constrained algorithm yielded a predicted mismatch rate of 6%. |
Using machine learning to guide targeted and locally tailored empiric antibiotic prescribing in a children’s hospital in Cambodia [47]. | 3 years (February 2013–January 2016). | Susceptibility to antibiotics. | Labs, antibiotics, demographics, temporal, socioeconomic conditions and medical history of patients. | The dataset was split 80%/20% into training and test datasets. | Logistic regression, decision trees, random forests, boosted decision trees, linear support vector machines (SVMs), polynomial SVMs, radial SVMs and K-nearest neighbours. | Regularisation was used in the logistic regression model; no details were reported for the other ML models. | Missing data for the binary predictors were treated as “negative”. | The AUROC of the random forest method was excellent at 0.80 for ceftriaxone and acceptable at 0.74 for ampicillin and gentamicin and 0.71 for Gram stain. | The random forest method had the best performance in predicting susceptibility to antibiotics (ceftriaxone, ampicillin and gentamicin) and Gram stain, and will be used to guide appropriate antibiotic therapy. Notably, the authors reported the models’ AUROC values rather than clinical prescribing outcomes. |
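To make the precision, recall and F1 figures in Table 1 concrete, the short sketch below reproduces the arithmetic behind the ARDNN results [44]. The counts of 179 actual errors and 145 detected errors are taken from the study; the total number of flagged prescriptions (~199) is back-calculated from the reported 73% precision and is therefore an assumption, not a figure from the paper.

```python
# Worked example of the metrics reported for the ARDNN study [44].
# 179 actual errors and 145 detections come from the study; the 199
# flagged prescriptions are inferred from the reported 73% precision.
true_positives = 145   # errors the model correctly flagged
actual_errors = 179    # all prescription errors in the dataset
flagged = 199          # assumed total flags, back-calculated from precision

precision = true_positives / flagged               # ≈ 0.73
recall = true_positives / actual_errors            # ≈ 0.81
f1 = 2 * precision * recall / (precision + recall) # ≈ 0.77

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The arithmetic confirms that 145/179 is the recall (the share of actual errors detected), which is why the values in the evaluation column, rather than the swapped figures originally reported in the results column, are internally consistent.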
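Several of the included studies [43,46,47] share the same modelling pattern: a regularised logistic regression (among other models) trained on earlier data, tested on later data, and evaluated by AUROC. The sketch below illustrates that pattern with scikit-learn under stated assumptions; the file name, column names and split date are hypothetical placeholders, not fields from any of the included datasets.

```python
# A minimal sketch of the modelling pattern shared by several studies in
# Table 1 [43,46,47]: L2-regularised logistic regression trained on earlier
# data, tested on later data, and scored with AUROC. All names below are
# hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("urine_cultures.csv", parse_dates=["sample_date"])

# Temporal split, as in [46]: train on the first nine years, test on the last.
train = df[df["sample_date"] < "2016-07-01"]
test = df[df["sample_date"] >= "2016-07-01"]

# Hypothetical binary predictors; missing values are treated as "negative"
# (i.e., 0), mirroring the approach described in [47].
features = ["prior_resistance", "recent_antibiotic_use", "prior_hospitalisation"]
X_train, y_train = train[features].fillna(0), train["resistant"]
X_test, y_test = test[features].fillna(0), test["resistant"]

# C controls the strength of the L2 penalty (smaller C = stronger
# regularisation), one of the overfitting safeguards noted in Table 1.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_train, y_train)

auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUROC = {auroc:.2f}")  # 0.7-0.8 acceptable, 0.8-0.9 excellent
```

A temporal rather than random split mimics deployment, since the model is evaluated only on samples collected after everything it was trained on; it also makes a gap between training and test performance easier to read as overfitting, which is how [46] used it.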