Table 1. Artificial intelligence methods applied to patient safety: study objectives, themes, and findings.
Reference | Objective | Study theme | AIa method | Findings (patient safety outcomes)
Chen et al [57] | To classify alerts as real or artifacts in online noninvasive vital sign data streams and minimize alarm fatigue and missed true instability | Clinical alarms/alerts | KNNb, NBc, LRd, SVMe, RFf | Machine learning (ML) models distinguished clinically relevant arterial O2 saturation, blood pressure, and respiratory rate alerts from artifacts in an online monitoring dataset (AUCg>0.87)
Ansari et al [58] | To minimize false alarms in the ICUh | Clinical alarms/alerts | MMDi, DTj | An ML algorithm combined with MMD effectively suppressed false alarms
Zhang et al [59] | To minimize the rate of false critical arrhythmia alarms | Clinical alarms/alerts | SVM | SVM reduced false alarm rates. The model gave an overall true positive rate of 95% and true negative rate of 85% |
Antink et al [60] | To reduce false alarms by using multimodal cardiac signals recorded from a patient monitor | Clinical alarms/alerts | BCTk, SVM, RF, RDACl | A false alarm reduction score of 65.52 was achieved; employing an alarm-specific strategy, the model performed at a true positive rate of 95% and true negative rate of 78%. False alarms for extreme tachycardia were suppressed with 100% sensitivity and specificity |
Eerikäinen et al [61] | To classify true and false cardiac arrhythmia alarms | Clinical alarms/alerts | RF | Four of 5 false alarms were suppressed; real-time model accuracy was 77.39%
Menard et al [62] | To develop a predictive model that enables Roche/Genentech Quality Program Leads to oversee adverse event reporting at the program, study, site, and patient levels | Clinical alarms/alerts | ML (not disclosed) | The ML method ranked sites by risk of underreporting and enabled real-time safety reporting. The model had an AUC of 0.62, 0.79, and 0.92 for simulation scenarios of 25%, 50%, and 75%, respectively. The project was part of a broader effort at Roche/Genentech Product Development Quality to apply advanced analytics to augment traditional clinical quality assurance approaches
Segal et al [63] | To determine the clinical usefulness of medication error alerts in a real-life inpatient setting | Clinical alarms/alerts | Probabilistic ML | 85% of the alerts were clinically valid, and 80% were considered clinically useful; 43% of the alerts caused changes in subsequent medical orders. Thus, the model detected medication errors
Hu et al [64] | To detect clinical deterioration | Clinical alarms/alerts | NNm | The NN-based model detected health deterioration, such as changes in heart rate variability, more accurately than one of the best-performing early warning scores (ViEWS). The positive predictive value of the NN was 77.58% and the negative predictive value was 99.19%
Kwon et al [65] | To develop alarm systems that predict cardiac arrest early | Clinical alarms/alerts | RF, LR, DEWSn, and MEWSo | DEWS identified more than 50% of patients with in-hospital cardiac arrest 14 hours before the event, giving medical staff enough time to intervene. The AUC and AUPRCp of DEWS were 0.85 and 0.04, respectively, outperforming MEWS (AUC 0.60, AUPRC 0.003), RF (AUC 0.78, AUPRC 0.01), and LR (AUC 0.61, AUPRC 0.007). DEWS reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the other models at the same sensitivity
Gupta and Patrick [66] | To classify clinical incidents | Clinical report | J48q, NB multinomial, and SVM | The selected models performed poorly in classifying incident categories (48.77% at best, using J48) but performed comparatively better in classifying free text (76.49% using NB)
Wang et al [67] | To identify multiple incident types from a single report | Clinical report | Binary relevance, CCr | Binary relevance classifiers improved identification of common incident types, including falls, medications, pressure injury, aggression, and documentation problems; automated identification enabled safety problems to be detected and addressed in a more timely manner
Zhou et al [49] | To extract information from clinical reports | Clinical report | SVM, NB, RF, and MLPs | The ML algorithms identified the originating stages, types, and causes of medication events, improving the efficiency of analyzing and learning from medication event reports in a timely manner (SVM F1 of 0.792; RF F1 of 0.925)
Fong et al [68] | To analyze patient safety reports | Clinical report | NLPt | Pyxis Discrepancy and Pharmacy Delivery Delay were found to be the two main factors affecting patient safety. The NLP models significantly reduced the time required to analyze safety reports
El Messiry et al [69] | To analyze patient feedback | Clinical report | NLP | Care-related complaints were influenced by money and emotion
Chondrogiannis et al [70] | To identify the meaning of abbreviations used in clinical studies | Clinical report | NLP | Each clinical study document contained about 6.8 abbreviations, and each abbreviation had 1.25 meanings on average. This aided the identification of abbreviation meanings
Liang and Gong [71] | To extract information from patient safety reports | Clinical report | Multilabel classification methods | Binary relevance was the best problem transformation algorithm among the multilabeled classifiers. The study provided suggestions on how to implement automated classification of patient safety reports in clinical settings
Ong et al [72] | To identify risk events in clinical incident reports | Clinical report | Text classifiers based on SVM | SVM performed well on datasets with diverse incident types (85.8%) and data with patient misidentification (96.4%). About 90% of false positives were found in “near misses,” and 70% of false negatives occurred due to spelling errors
Taggart et al [73] | To identify bleeding events in clinical notes | Clinical report | NLP, SVM, CNNu, and ETv | Rule-based NLP performed better than the ML approaches, detecting bleeding complications with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value; it can thus be used for quality improvement and prevention programs
Denecke et al [74] | To minimize loss of information during doctor-patient conversations | Clinical report | NLP | The electronic health platform provides an intuitive conversational user interface through which patients connect to their therapist and a self-anamnesis app; the app also allows data sharing among treating therapists
Evans et al [75] | To determine the incident type and the severity of harm outcome | Clinical report | J48, SVM, and NB | The SVM classifier improved the identification of patient safety incidents. Incident reports involving deaths were most easily classified, with an accuracy of 72.82%. The severity classifier was not accurate enough to replace manual scrutiny
Wang et al [76] | To identify the type and severity of patient safety incident reports | Clinical report | CNN and SVM ensemble | The CNN achieved high F scores (>85%) across all test datasets when identifying common incident types, including falls, medications, pressure injury, and aggression, and improved the process by 11.9% to 45.1% across different datasets
Klock et al [47] | To understand the root causes of falls and increase learning from fall reports for better prevention of patient falls | Clinical report | SVM, RF, and RNNw | The model distinguished high- and low-scoring fall reports. Most patient fall report scores were between 0.3 and 0.4, indicating poor report quality
Li et al [77] | To stratify patient safety adverse event risk and predict safety problems of individual patients | Clinical report | Ensemble-ML | The adverse event risk score at the 0.1 level could identify 57.2% of adverse events with 26.3% accuracy from 9.2% of the validation sample. The adverse event risk score of 0.04 could identify 85.5% of adverse events
Murff et al [78] | To identify postoperative surgical complications within a comprehensive electronic medical record | Clinical report | NLP | NLP identified 82% of acute renal failure cases compared with 38% for patient safety indicators. Similar results were obtained for venous thromboembolism (59% vs 46%), pneumonia (64% vs 5%), sepsis (89% vs 34%), and postoperative myocardial infarction (91% vs 89%)
Wang et al [79] | To automate the identification of patient safety incidents in hospitals | Clinical report | Text-based classifiers: LR, SVM | For severity level, the F score was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk) on balanced data. With stratified data, a high recall was achieved for SAC1 (82.8%-84%), but precision was poor (6.8%-11.2%). High-risk (SAC2) and medium-risk (SAC3) incidents were often misclassified. Reports about falls, medications, pressure injury, aggression, and blood tests were identified with high recall and precision
Rosenbaum and Baron [80] | To detect Wrong Blood in Tube errors and mitigate patient harm | Clinical report | LR, SVM | In contrast to the univariate analysis, the best-performing multivariate delta check model (SVM) identified errors with a high degree of accuracy (0.97)
McKnight [81] | To improve the ability to extract clinical information from patient safety reports efficiently | Clinical report | NLP | The semisupervised model categorized patient safety reports into their appropriate patient safety topics and avoided overlaps; 85% of unlabeled reports were assigned correct labels. It helped NCPSx analysts to develop policy and mitigation decisions
Marella et al [82] | To analyze patient safety reports describing health hazards from electronic health records | Clinical report | Text mining based on NB, KNN, and rule induction | The NB kernel performed best, with an AUC of 0.927, accuracy of 0.855, and F score of 0.877. The overall proportion of cases found relevant was comparable between manually and automatically screened cases; 334 reports identified by the model as relevant were not relevant, implying a false-positive rate of 13%. Manual screening identified 4 incorrect predictions, implying a false-negative rate of 29%
Ye et al [83] | To validate a real-time early warning system to predict patients at high risk of inpatient mortality during their hospital episodes | Clinical report | RF, XGBy, SVM, LASSOz, and KNN | The modified early warning system accurately predicted the possibility of death for the top 13.3% (34/255) of patients at least 40.8 hours before death
Fong et al [84] | To identify health information technology-related events from patient safety reports | Clinical report | Unigram and bigram LR, SVM | Unigram models performed better than bigram and combined models, identifying HITaa-related events when trained on PSEbb free-text descriptions from multiple states and health care systems. The unigram LR model gave an AUC of 0.931 and an F1 score of 0.765. LR also showed potential to maintain a faster runtime as more reports are analyzed, and the final HIT model was less complex and more easily shareable
Simon et al [85] | To establish whether patients with type 2 diabetes can safely use PANDITcc and whether its insulin dosing advice is clinically safe | Drug safety | PANDIT | 27 of 74 (36.5%) PANDIT dosing recommendations differed from those provided by diabetes nurses; however, only 1 of these (1.4%) was considered unsafe by the panel
Song et al [86] | To predict drug-drug interactions | Drug safety | SVM | With 10-fold cross-validation, the model improved the identification of drug-drug interactions (AUC>0.97), significantly greater than an analogously developed ML model (0.67)
Hammann et al [87] | To identify drugs that could be suspected of causing adverse reactions in the central nervous system, liver, and kidneys | Drug safety | CHAIDdd and CARTee | CART exhibited high predictive accuracy: 78.94% for allergic reactions, 88.69% for renal reactions, and 90.22% for hepatic reactions. The CHAID model showed high accuracy (89.74%) for central nervous system reactions
Bean et al [88] | To predict adverse drug reactions | Drug safety | LR, SVM, DT, NLP, own model | The proposed model outperformed traditional LR, SVM, and DT, predicting adverse drug reactions with an AUC of 0.92
Hu et al [89] | To predict the appropriateness of initial digoxin dosage and minimize drug-drug adverse interactions | Drug safety | C4.5, KNN, CART, RF, MLP, and LR | In the group without drug-drug interactions, the AUC of RF, MLP, CART, and C4.5 was 0.91, 0.81, 0.79, and 0.784, respectively; in the drug-drug interaction group, the AUC of RF, CART, MLP, and C4.5 was 0.89, 0.79, 0.77, and 0.77, respectively. DT-based approaches and MLP can determine the initial dosage of high-alert digoxin, increasing drug safety in clinical practice
Tang et al [90] | To identify adverse drug effects from unstructured hospital discharge summaries | Drug safety | NLP | A total of 33 trial sets were evaluated by the algorithm and reviewed by pharmacovigilance experts. After every 6 trial sets, drug and adverse event dictionaries were updated, and rules were modified to improve the system. The model identified adverse events with 92% precision and recall |
Hu et al [91] | To predict the dosage of warfarin | Drug safety | KNN, SVRff, NN-BPgg, MThh | The proposed model improved warfarin dosing compared with the baseline (mean absolute error 0.394), reducing mean absolute error by 40.04%
Hasan et al [92] | To improve the medication reconciliation task | Drug safety | LR, KNN | Collaborative filtering identified the top 10 missing drugs about 40% to 50% of the time and therapeutic missing drugs about 50% to 65% of the time
Labovitz et al [93] | To evaluate the use of a mobile AI platform on medication adherence in stroke patients on anticoagulation therapy | Drug safety | Cell phone–based AI platform | Mean (SD) cumulative adherence based on the AI platform was 90.5% (7.5%). Plasma drug concentration levels indicated that adherence was 100% (15/15) and 50% (6/12) in the intervention and control groups, respectively |
Long et al [94] | To improve the reconciliation method | Drug safety | iPad-based software tool with an AI algorithm | All patients completed the task, and the software improved reconciliation: every patient identified at least one error in their electronic medical record medication list. Eight of 10 patients reported that they would use the device in the future, and both clinicians and patients liked the device and preferred to continue using it
Reddy et al [95] | To assess proof of concept, safety, and feasibility of ABC4Dii in a free-living environment over 6 weeks | Drug safety | ABC4D | ABC4D was safe for use as an insulin bolus dosing system. A trend suggesting a reduction in postprandial hypoglycemia was observed: the median (IQR) number of postprandial hypoglycemia episodes within 6 h after a meal was 4.5 (2.0-8.2) in week 1 versus 2.0 (0.5-6.5) in week 6 (P=.10). No episodes of severe hypoglycemia occurred during the study
Schiff et al [96] | To evaluate the performance and clinical usefulness of medication error alerts generated by an alerting system | Drug safety | MedAware, probabilistic ML | 75% of the chart-reviewed alerts generated by MedAware were valid, identifying medication errors; of these valid alerts, 75.0% were clinically useful in flagging potential medication errors
Li et al [97] | To develop a computerized algorithm for medication discrepancy detection and assess its performance on real-world medication reconciliation data | Drug safety | Hybrid system consisting of ML algorithms and NLP | The hybrid algorithm yielded precision (P) of 95.0%, recall (R) of 91.6%, and F value of 93.3% on medication entity identification, and P=98.7%, R=99.4%, and F=99.1% on attribute linkage. The combination of the hybrid system and medication matching system gave P=92.4%, R=90.7%, and F=91.5% on classifying matched medications, and P=71.5%, R=65.2%, and F=68.2% on classifying discrepant medications
Carrell et al [98] | To identify evidence of problem opioid use in electronic health records | Drug safety | NLP | The NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes
Tinoco et al [99] | To evaluate the source of information affecting different adverse events | Drug safety | CSSjj (ML) | CSS detected more hospital-associated infections than manual chart review (92% vs 34%); CSS missed events that were not stored in a coded format |
Onay et al [100] | To distinguish approved drugs from withdrawn drugs and thus reduce adverse drug effects | Drug safety | SVM, boosted and bagged trees (ensemble) | The Gaussian SVM model yielded 78% prediction accuracy for the drug dataset covering all diseases. The ensemble of bagged trees and linear SVM models achieved 89% accuracy for psycholeptic and psychoanaleptic drugs
Cai et al [101] | To discover drug-drug interactions from the Food and Drug Administration’s adverse event reporting system and thus prevent patient harm | Drug safety | Causal Association Rule Discovery (CARD) | CARD demonstrated higher accuracy in identifying known drug interactions than the traditional method (20% vs 10%) and yielded a lower proportion of drug combinations not known to interact (50% for CARD vs 79% for association rule mining)
Dandala et al [102] | To extract adverse drug events from clinical narratives and automate pharmacovigilance | Drug safety | BiLSTMkk, CRF-NNll | Joint modeling improved the identification of adverse drug events from 0.62 to 0.65
Dey et al [103] | To predict and prevent adverse drug reactions at an early stage to enhance drug safety | Drug safety | Deep learning | Neural fingerprints from the deep learning model (AUC=0.72) outperformed all other methods in predicting adverse drug reactions. The model identified important molecular substructures that are associated with specific adverse drug reactions |
Yang et al [104] | To identify medications, adverse drug effects, and their relations in clinical notes | Drug safety | MADEx, LSTM-RNNmm, CRFnn, SVM, RF | MADEx achieved top-three performance (F1 score of 0.8233) for clinical named entity recognition, adverse drug effects, and relation extraction from clinical texts, outperforming traditional methods
Chapman et al [105] | To identify adverse drug effect symptoms and drugs in clinical notes | Drug safety | NLP | The micro-averaged F1 score was 80.9% for named entity recognition, 88.1% for relation extraction, and 61.2% for the integrated systems |
Lian et al [106] | To detect adverse drug reactions | Drug safety | LRMoo, BNMpp, BCP-NNqq | Experimental results showed the usefulness of the proposed pattern discovery method, which improved on the standard baseline for adverse drug reaction detection by 23.83%
Huang et al [107] | To predict adverse drug effects | Drug safety | SVM, LR | An in silico model built on the proposed computational framework achieved satisfactory cardiotoxicity adverse drug reaction prediction performance (median AUC=0.771, accuracy=0.675, sensitivity=0.632, and specificity=0.789)
aAI: artificial intelligence.
bKNN: K-nearest neighbor.
cNB: naive Bayes.
dLR: logistic regression.
eSVM: support vector machine.
fRF: random forest.
gAUC: area under the curve.
hICU: intensive care unit.
iMMD: multimodal section.
jDT: decision tree.
kBCT: binary classification tree.
lRDAC: regularized discriminant analysis classifier.
mNN: neural network.
nDEWS: deep learning–based early warning system.
oMEWS: modified early warning system.
pAUPRC: area under the precision-recall curve.
qJ48: decision tree algorithm.
rCC: classifier chain.
sMLP: multilayer perceptron.
tNLP: natural language processing.
uCNN: convolutional neural network.
vET: extra tree.
wRNN: recurrent neural network.
xNCPS: National Center for Patient Safety.
yXGB: extreme gradient boosting.
zLASSO: least absolute shrinkage and selection operator.
aaHIT: health information technology.
bbPSE: patient safety event.
ccPANDIT: Patient Assisting Net-Based Diabetes Insulin Titration.
ddCHAID: chi-square automatic interaction detector.
eeCART: classification and regression tree.
ffSVR: support vector regression.
ggNN-BP: neural network with backpropagation.
hhMT: model tree.
iiABC4D: Advanced Bolus Calculator For Diabetes.
jjCSS: clinical support system.
kkBiLSTM: bidirectional long short-term memory neural network.
llCRF-NN: conditional random field neural network.
mmLSTM-RNN: long short-term memory-recurrent neural network.
nnCRF: conditional random field.
ooLRM: logistic regression probability model.
ppBNM: Bayesian network model.
qqBCP-NN: Bayesian confidence propagation neural network.
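Several of the incident-report studies in the table (eg, Gupta and Patrick [66], Evans et al [75], Marella et al [82]) classify free-text safety reports with naive Bayes or SVM models. As a minimal illustrative sketch of the underlying technique (the report snippets, labels, and whitespace tokenization here are invented for illustration and do not reproduce any published model), a multinomial naive Bayes text classifier with Laplace smoothing can be built as follows:

```python
import math
from collections import Counter, defaultdict

# Toy multinomial naive Bayes classifier for incident-report snippets.
# All training texts and labels below are invented for illustration only.

def tokenize(text):
    return text.lower().split()

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns (log priors, per-class word log-likelihoods)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    total = sum(class_counts.values())
    priors = {c: math.log(n / total) for c, n in class_counts.items()}
    loglik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)  # add-one (Laplace) smoothing
        loglik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
        loglik[c]["__unk__"] = math.log(1 / denom)  # fallback for unseen words
    return priors, loglik

def classify(text, priors, loglik):
    # Score each class by log prior plus summed word log-likelihoods; return the argmax.
    scores = {
        c: prior + sum(loglik[c].get(w, loglik[c]["__unk__"]) for w in tokenize(text))
        for c, prior in priors.items()
    }
    return max(scores, key=scores.get)

training_reports = [
    ("patient slipped on wet floor and fell", "fall"),
    ("patient found on floor beside bed after fall", "fall"),
    ("wrong dose of insulin administered", "medication"),
    ("medication given to wrong patient", "medication"),
]
priors, loglik = train_nb(training_reports)
print(classify("patient fell out of bed", priors, loglik))          # fall
print(classify("incorrect medication dose given", priors, loglik))  # medication
```

The systems surveyed above use richer features (eg, TF-IDF n-grams) and validated report corpora; this sketch shows only the bag-of-words likelihood scoring that underlies the NB baselines reported in the table.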