2020 Jul 24;8(7):e18599. doi: 10.2196/18599

Table 1.

Evidentiary table of 53 selected publications.

Reference Objective Study theme AIa method Findings (patient safety outcomes)
Chen et al [57] To classify alerts as real or artifacts in online noninvasive vital sign data streams and minimize alarm fatigue and missed true instability Clinical alarms/alerts KNNb, NBc, LRd, SVMe, RFf Machine-learning (ML) models could distinguish clinically relevant pulse arterial O2 saturation, blood pressure, and respiratory rate from artifacts in an online monitoring dataset (AUCg>0.87)

Ansari et al [58] To minimize false alarms in the ICUh Clinical alarms/alerts MMDi, DTj ML algorithm along with MMD was effective in suppressing false alarms
Zhang et al [59] To minimize the rate of false critical arrhythmia alarms Clinical alarms/alerts SVM SVM reduced false alarm rates. The model gave an overall true positive rate of 95% and true negative rate of 85%
Antink et al [60] To reduce false alarms by using multimodal cardiac signals recorded from a patient monitor Clinical alarms/alerts BCTk, SVM, RF, RDACl A false alarm reduction score of 65.52 was achieved; employing an alarm-specific strategy, the model performed at a true positive rate of 95% and true negative rate of 78%. False alarms for extreme tachycardia were suppressed with 100% sensitivity and specificity
Eerikäinen et al [61] To classify true and false cardiac arrhythmia alarms Clinical alarms/alerts RF Out of 5 false alarms, 4 were suppressed; 77.39% real-time model accuracy
Menard et al [62] To develop a predictive model that enables Roche/Genentech Quality Program Leads to oversee adverse event reporting at the program, study, site, and patient levels Clinical alarms/alerts ML (not disclosed) The ML method identified the sites by risk of underreporting and enabled real-time safety reporting. The proposed model had an AUC of 0.62, 0.79, and 0.92 for simulation scenarios of 25%, 50%, and 75%, respectively. This project was part of a broader effort at Roche/Genentech Product Development Quality to apply advanced analytics to augment and complement traditional clinical quality assurance approaches
Segal et al [63] To determine the clinical usefulness of medication error alerts in a real-life inpatient setting Clinical alarms/alerts Probabilistic ML 85% of the alerts were clinically valid, and 80% were considered clinically useful; 43% of the alerts caused changes in subsequent medical orders. Thus, the model detected medication errors
Hu et al [64] To detect clinical deterioration Clinical alarms/alerts NNm NN-based model could detect health deterioration such as heart rate variability more accurately than one of the best-performing early warning scores (ViEWS). The positive predictive value of NN was 77.58% and the negative predictive value was 99.19%
Kwon et al [65] To develop alarm systems that predict cardiac arrest early Clinical alarms/alerts RF, LR, DEWSn, and MEWSo The DEWS identified more than 50% of patients with in-hospital cardiac arrest 14 hours before the event, giving medical staff enough time to intervene. The AUC and AUPRCp of DEWS were 0.85 and 0.04, respectively, outperforming MEWS (AUC 0.60, AUPRC 0.003), RF (AUC 0.78, AUPRC 0.01), and LR (AUC 0.61, AUPRC 0.007). DEWS reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the other models at the same sensitivity

Gupta and Patrick [66] To classify clinical incidents Clinical Report J48q, NB multinomial, and SVM The selected models performed poorly in classifying incident categories (48.77% best, using J48), but performed comparatively better in classifying free text (76.49% using NB).
Wang et al [67] To identify multiple incident types from a single report Clinical Report Compares binary relevance, CCr Binary classifier improved identification of common incident types: falls, medications, pressure injury, aggression, documentation problem, and others. Automated identification enabled safety problems to be detected and addressed in a more timely manner
Zhou et al [49] To extract information from clinical reports Clinical Report SVM, NB, RF, and MLPs ML algorithms identified the medication event originating stages, event types, and causes, respectively. The models improved the efficiency of analyzing the medication event reports and learning from the reports in a timely manner with (SVM) F1 of 0.792 and (RF) F1 of 0.925
Fong et al [68] To analyze patient safety reports Clinical Report NLPt Pyxis Discrepancy and Pharmacy Delivery Delay were found to be the main two factors affecting patient safety. The NLP models significantly reduced the time required to analyze safety reports
El Messiry et al [69] To analyze patient feedback Clinical Report NLP Care-related complaints were influenced by money and emotion
Chondrogiannis et al [70] To identify the meaning of abbreviations used in clinical studies Clinical Report NLP Each clinical study document contained about 6.8 abbreviations. Each abbreviation can have 1.25 meanings on average. This helped in identification of acronyms
Liang and Gong [71] To extract information from patient safety reports Clinical Report Multilabel classification methods Binary relevance was the best problem transformation algorithm in the multilabeled classifiers. It provided suggestions on how to implement automated classification of patient safety reports in clinical settings

Ong et al [72] To identify risk events in clinical incident reports Clinical Report Text classifiers based on SVM SVM performed well on datasets with diverse incident types (85.8%) and data with patient misidentification (96.4%). About 90% of false positives were found in “near-misses” and 70% of false negatives occurred due to spelling errors
Taggart et al [73] To identify bleeding events in clinical notes Clinical Report NLP, SVM, CNNu, and ETv Rule-based NLP was better than the ML approach. NLP detected bleeding complications with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value. It can thus be used for quality improvement and prevention programs
Denecke et al [74] To minimize any loss of information during a doctor-patient conversation Clinical Report NLP Electronic health platform provides an intuitive conversational user interface that patients use to connect to their therapist and self-anamnesis app. The app also allows data sharing among treating therapists
Evans et al [75] To determine the incident type and the severity of harm outcome Clinical Report J48, SVM, and NB The SVM classifier improved the identification of patient safety incidents. Incident reports containing deaths were most easily classified with an accuracy of 72.82%. The severity classifier was not accurate enough to replace manual scrutiny
Wang et al [76] To identify the type and severity of patient safety incident reports Clinical Report CNN and SVM ensemble CNN achieved high F scores (>85%) across all test datasets when identifying common incident types, including falls, medications, pressure injury, and aggression. It improved the process by 11.9% to 45.10% across different datasets
Klock et al [47] To understand the root causes of falls and increase learning from fall reports for better prevention of patient falls Clinical Report SVM, RF, and RNNw The model identified high and low scoring fall reports. Most patient fall report scores were between 0.3 and 0.4, indicating poor quality of reports
Li et al [77] To stratify patient safety adverse event risk and predict safety problems of individual patients Clinical Report Ensemble-ML The adverse event risk score at the 0.1 level could identify 57.2% of adverse events with 26.3% accuracy from 9.2% of the validation sample. The adverse event risk score of 0.04 could identify 85.5% of adverse events
Murff et al [78] To identify postoperative surgical complications within a comprehensive electronic medical record Clinical Report NLP NLP identified 82% of acute renal failure cases compared with 38% for patient safety indicators. Similar results were obtained for venous thromboembolism (59% vs 46%), pneumonia (64% vs 5%), sepsis (89% vs 34%), and postoperative myocardial infarction (91% vs 89%)
Wang et al [79] To automate the identification of patient safety incidents in hospitals Clinical Report Text-based classifier: LR, SVM For severity level, the F score was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk) on balanced data. With stratified data, a high recall was achieved for SAC1 (82.8%-84%), but precision was poor (6.8%-11.2%). High-risk incidents (SAC2) and medium-risk incidents (SAC3) were often misclassified. Reports about falls, medications, pressure injury, aggression, and blood tests were identified with high recall and precision
Rosenbaum and Baron [80] To detect Wrong Blood in Tube errors and mitigate patient harm Clinical Report LR, SVM In contrast to the univariate analysis, the best performing multivariate delta check model (SVM) identified errors with a high degree of accuracy (0.97)
McKnight [81] To improve the ability to extract clinical information from patient safety reports efficiently Clinical Report NLP The semisupervised model categorized patient safety reports into their appropriate patient safety topic and avoided overlaps; 85% of unlabeled reports were assigned correct labels. It helped NCPSx analysts to develop policy and mitigation decisions
Marella et al [82] To analyze patient safety reports describing health hazards from electronic health records Clinical Report Text mining based on: NB, KNN, rule induction The NB kernel performed best, with an AUC of 0.927, accuracy of 0.855, and F score of 0.877. The overall proportion of cases found relevant was comparable between manually and automatically screened cases; 334 reports that the model identified as relevant were identified as not relevant on manual review, implying a false-positive rate of 13%. Manual screening identified 4 incorrect predictions, implying a false-negative rate of 29%
Ye et al [83] To validate a real-time early warning system to predict patients at high risk of inpatient mortality during their hospital episodes Clinical Report RF, XGBy, boosting SVM, LASSOz, and KNN The modified early warning system accurately predicted the possibility of death for the top 13.3% (34/255) of patients at least 40.8 hours before death
Fong et al [84] To identify health information technology-related events from patient safety reports Clinical Report Unigram and Bigram LR, SVM Unigram models performed better than Bigram and combined models. It identified HITaa-related events trained on PSEbb free-text descriptions from multiple states and health care systems. The unigram LR model gave an AUC of 0.931 and an F1 score of 0.765. LR also showed potential to maintain a faster runtime when more reports are analyzed. The final HIT model had less complexity and was more easily sharable
Simon et al [85] To establish whether patients with type 2 diabetes can safely use PANDITcc and whether its insulin dosing advice is clinically safe Drug safety PANDIT 27 out of 74 (36.5%) PANDIT advice differed from those provided by diabetes nurses. However, only one of these (1.4%) was considered unsafe by the panel
Song et al [86] To predict drug-drug interactions Drug safety SVM The 10-fold cross-validation improved the identification of drug-drug interaction with AUC>0.97, which is significantly greater than the analogously developed ML model (0.67)
Hammann et al [87] To identify drugs that could be suspected of causing adverse reactions in the central nervous system, liver, and kidneys Drug safety CHAIDdd and CARTee CART exhibited high predictive accuracy of 78.94% for allergic reactions, 88.69% for renal, and 90.22% for the liver. CHAID model showed a high accuracy of 89.74% for the central nervous system
Bean et al [88] To predict adverse drug reactions Drug safety LR, SVM, DT, NLP, own model The proposed model (own model) outperformed traditional LR, SVM, DT, and predicted adverse drug reactions with an AUC of 0.92
Hu et al [89] To predict the appropriateness of initial digoxin dosage and minimize drug-drug adverse interactions Drug safety C4.5, KNN, CART, RF, MLP, and LR In the non drug-drug interaction group, the AUC of RF, MLP, CART, and C4.5 was 0.91, 0.81, 0.79, and 0.784, respectively; for the drug-drug interaction group, the AUC of RF, CART, MLP, and C4.5 was 0.89, 0.79, 0.77, and 0.77, respectively. DT-based approaches and MLP can determine the initial dosage of a high-alert digoxin medication, which can increase drug safety in clinical practice
Tang et al [90] To identify adverse drug effects from unstructured hospital discharge summaries Drug safety NLP A total of 33 trial sets were evaluated by the algorithm and reviewed by pharmacovigilance experts. After every 6 trial sets, drug and adverse event dictionaries were updated, and rules were modified to improve the system. The model identified adverse events with 92% precision and recall
Hu et al [91] To predict the dosage of warfarin Drug safety KNN, SVRff, NN-BPgg, MThh The proposed model improved warfarin dosage when compared to the baseline (mean absolute error 0.394); reduced mean absolute error by 40.04%
Hasan et al [92] To improve medication reconciliation task Drug safety LR, KNN Collaborative filtering identified the top 10 missing drugs about 40% to 50% of the time and the therapeutic missing drugs about 50% to 65% of the time
Labovitz et al [93] To evaluate the use of a mobile AI platform on medication adherence in stroke patients on anticoagulation therapy Drug safety Cell phone–based AI platform Mean (SD) cumulative adherence based on the AI platform was 90.5% (7.5%). Plasma drug concentration levels indicated that adherence was 100% (15/15) and 50% (6/12) in the intervention and control groups, respectively
Long et al [94] To improve the reconciliation method Drug safety iPad-based software tool with an AI algorithm All patients completed the task. The software improved reconciliation; all patients identified at least one error in their electronic medical record medication list; 8 of 10 patients reported that they would use the device in the future. The entire team (clinical and patients) liked the device and preferred to use it in the future
Reddy et al [95] To assess proof of concept, safety, and feasibility of ABC4Dii in a free-living environment over 6 weeks Drug safety ABC4D ABC4D was safe for use as an insulin bolus dosing system. A trend suggesting a reduction in postprandial hypoglycemia was observed. The median (IQR) number of postprandial hypoglycemia episodes within 6 h after the meal was 4.5 (2.0-8.2) in week 1 versus 2.0 (0.5-6.5) in week 6 (P=.10). No episodes of severe hypoglycemia occurred during the study

Schiff et al [96] To evaluate the performance and clinical usefulness of medication error alerts generated by an alerting system Drug safety MedAware, probabilistic ML 75% of the chart-reviewed alerts generated by MedAware were valid and identified medication errors. Of these valid alerts, 75.0% were clinically useful in flagging potential medication errors
Li et al [97] To develop a computerized algorithm for medication discrepancy detection and assess its performance on real-world medication reconciliation data Drug safety Hybrid system consisting of ML algorithms and NLP The hybrid algorithm yielded precision (P) of 95.0%, recall (R) of 91.6%, and F value of 93.3% on medication entity identification, and P=98.7%, R=99.4%, and F=99.1% on attribute linkage. The combination of the hybrid system and medication matching system gave P=92.4%, R=90.7%, and F=91.5%, and P=71.5%, R=65.2%, and F=68.2% on classifying the matched and the discrepant medications, respectively
Carrell et al [98] To identify evidence of problem opioid use in electronic health records Drug safety NLP The NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes.

Tinoco et al [99] To evaluate the source of information affecting different adverse events Drug safety CSSjj (ML) CSS detected more hospital-associated infections than manual chart review (92% vs 34%); CSS missed events that were not stored in a coded format
Onay et al [100] To distinguish approved drugs from withdrawn drugs and thus reduce adverse drug effects Drug safety SVM, Boosted and Bagged trees (Ensemble) The Gaussian SVM model yielded 78% prediction accuracy for the drug dataset, including all diseases. The ensemble of bagged tree and linear SVM models yielded 89% accuracy for psycholeptic and psychoanaleptic drugs
Cai et al [101] To discover drug-drug interactions from the Food and Drug Administration’s adverse event reporting system and thus prevent patient harm Drug safety Causal Association Rule Discovery (CARD) CARD demonstrated higher accuracy in identifying known drug interactions compared to the traditional method (20% vs 10%); CARD yielded a lower number of drug combinations that are unknown to interact (50% for CARD vs 79% for association rule mining)
Dandala et al [102] To extract adverse drug events from clinical narratives and automate pharmacovigilance Drug safety BiLSTMkk, CRF-NNll Joint modeling improved the identification of adverse drug events from 0.62 to 0.65

Dey et al [103] To predict and prevent adverse drug reactions at an early stage to enhance drug safety Drug safety Deep learning Neural fingerprints from the deep learning model (AUC=0.72) outperformed all other methods in predicting adverse drug reactions. The model identified important molecular substructures that are associated with specific adverse drug reactions
Yang et al [104] To identify medications, adverse drug effects, and their relations from clinical notes Drug safety MADEx, LSTM-RNNmm, CRFnn, SVM, RF MADEx achieved top-three performance (F1 score of 0.8233) for clinical named entity recognition, adverse drug effects, and relations from clinical texts, outperforming traditional methods
Chapman et al [105] To identify adverse drug effect symptoms and drugs in clinical notes Drug safety NLP The micro-averaged F1 score was 80.9% for named entity recognition, 88.1% for relation extraction, and 61.2% for the integrated systems
Lian et al [106] To detect adverse drug reactions Drug safety LRMoo, BNMpp, BCP-NNqq Experimental results showed the usefulness of the proposed pattern discovery method by improving the standard baseline adverse drug reaction by 23.83%
Huang et al [107] To predict adverse drug effects Drug safety SVM, LR The proposed computational framework showed that an in silico model built on this framework can achieve satisfactory cardiotoxicity adverse drug reaction prediction performance (median AUC=0.771, accuracy=0.675, sensitivity=0.632, and specificity=0.789).

aAI: artificial intelligence.

bKNN: K-nearest neighbor.

cNB: naive Bayes.

dLR: logistic regression.

eSVM: support vector machine.

fRF: random forest.

gAUC: area under the curve.

hICU: intensive care unit.

iMMD: multimodal section.

jDT: decision tree.

kBCT: binary classification tree.

lRDAC: regularized discriminant analysis classifier.

mNN: neural network.

nDEWS: deep learning–based early warning system.

oMEWS: modified early warning system.

pAUPRC: area under the precision-recall curve.

qJ48: decision tree algorithm.

rCC: classifier chain.

sMLP: multilayer perceptron.

tNLP: natural language processing.

uCNN: convolutional neural network.

vET: extra tree.

wRNN: recurrent neural network.

xNCPS: National Center for Patient Safety.

yXGB: extreme gradient boosting.

zLASSO: least absolute shrinkage and selection operator.

aaHIT: health information technology.

bbPSE: patient safety event.

ccPANDIT: Patient Assisting Net-Based Diabetes Insulin Titration.

ddCHAID: chi-square automatic interaction detector.

eeCART: classification and regression tree.

ffSVR: support vector regression.

ggNN-BP: neural network-back propagation.

hhMT: model tree.

iiABC4D: Advanced Bolus Calculator For Diabetes.

jjCSS: clinical support system.

kkBiLSTM: bidirectional long short-term memory neural network.

llCRF-NN: conditional random field neural network.

mmLSTM-RNN: long short-term memory-recurrent neural network.

nnCRF: conditional random field.

ooLRM: logistic regression probability model.

ppBNM: Bayesian network model.

qqBCP-NN: Bayesian confidence propagation neural network.
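
Several rows of Table 1 report precision (P), recall (R), and an F value together. As a reading aid, the minimal sketch below recomputes the F value for the Li et al [97] row, assuming that "F value" there means the balanced F1 score, that is, the harmonic mean of precision and recall. That assumption, the function name f_value, and the dictionary labels are illustrative and are not taken from the cited study.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall.

    Inputs and output share the same unit (here, percentages).
    Assumption: the table's "F value" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F %) as listed in the Li et al [97] row.
# The task labels are shorthand for this sketch.
reported = {
    "medication entity identification": (95.0, 91.6, 93.3),
    "attribute linkage": (98.7, 99.4, 99.1),
    "matched medications": (92.4, 90.7, 91.5),
    "discrepant medications": (71.5, 65.2, 68.2),
}

for task, (p, r, f_reported) in reported.items():
    f = f_value(p, r)
    print(f"{task}: computed F = {f:.2f}, reported F = {f_reported}")
```

Under this assumption, each recomputed value lies within 0.1 percentage points of the reported one; small gaps of that size are what rounding of the published precision and recall would produce.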