Table 1. Artificial intelligence methods applied to patient safety: study objectives, themes, and findings.
Reference | Objective | Study theme | AIa method | Findings (patient safety outcomes)
Chen et al [57] | To classify alerts as real or artifacts in online noninvasive vital sign data streams and minimize alarm fatigue and missed true instability | Clinical alarms/alerts | KNNb, NBc, LRd, SVMe, RFf | Machine learning (ML) models distinguished clinically relevant arterial O2 saturation, blood pressure, and respiratory rate alerts from artifacts in an online monitoring dataset (AUCg>0.87)
Ansari et al [58] | To minimize false alarms in the ICUh | Clinical alarms/alerts | MMDi, DTj | An ML algorithm combined with MMD effectively suppressed false alarms
Zhang et al [59] | To minimize the rate of false critical arrhythmia alarms | Clinical alarms/alerts | SVM | SVM reduced false alarm rates. The model gave an overall true positive rate of 95% and true negative rate of 85% |
Antink et al [60] | To reduce false alarms by using multimodal cardiac signals recorded from a patient monitor | Clinical alarms/alerts | BCTk, SVM, RF, RDACl | A false alarm reduction score of 65.52 was achieved; employing an alarm-specific strategy, the model performed at a true positive rate of 95% and true negative rate of 78%. False alarms for extreme tachycardia were suppressed with 100% sensitivity and specificity |
Eerikäinen et al [61] | To classify true and false cardiac arrhythmia alarms | Clinical alarms/alerts | RF | Four of 5 false alarms were suppressed; real-time model accuracy was 77.39%
Menard et al [62] | To develop a predictive model that enables Roche/Genentech Quality Program Leads to oversee adverse event reporting at the program, study, site, and patient levels | Clinical alarms/alerts | ML (not disclosed) | The ML method ranked sites by risk of underreporting and enabled real-time safety reporting. The model had an AUC of 0.62, 0.79, and 0.92 for simulation scenarios of 25%, 50%, and 75%, respectively. The project was part of a broader effort at Roche/Genentech Product Development Quality to apply advanced analytics to augment traditional clinical quality assurance approaches
Segal et al [63] | To determine the clinical usefulness of medication error alerts in a real-life inpatient setting | Clinical alarms/alerts | Probabilistic ML | 85% of the alerts were clinically valid, and 80% were considered clinically useful; 43% of the alerts caused changes in subsequent medical orders. Thus, the model detected medication errors
Hu et al [64] | To detect clinical deterioration | Clinical alarms/alerts | NNm | The NN-based model detected health deterioration, such as changes in heart rate variability, more accurately than one of the best-performing early warning scores (ViEWS). The positive predictive value of the NN was 77.58% and the negative predictive value was 99.19%
Kwon et al [65] | To develop alarm systems that predict cardiac arrest early | Clinical alarms/alerts | RF, LR, DEWSn, and MEWSo | DEWS identified more than 50% of patients with in-hospital cardiac arrest 14 hours before the event, giving medical staff enough time to intervene. The AUC and AUPRCp of DEWS were 0.85 and 0.04, respectively, outperforming MEWS (AUC 0.60, AUPRC 0.003), RF (AUC 0.78, AUPRC 0.01), and LR (AUC 0.61, AUPRC 0.007). DEWS reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the other models at the same sensitivity
Gupta and Patrick [66] | To classify clinical incidents | Clinical report | J48q, NB multinomial, and SVM | The selected models performed poorly in classifying incident categories (48.77% at best, using J48) but performed comparatively better in classifying free text (76.49% using NB)
Wang et al [67] | To identify multiple incident types from a single report | Clinical report | Binary relevance, CCr | Binary relevance classifiers improved identification of common incident types, including falls, medications, pressure injury, aggression, and documentation problems; automated identification enabled safety problems to be detected and addressed in a more timely manner
Zhou et al [49] | To extract information from clinical reports | Clinical report | SVM, NB, RF, and MLPs | The ML algorithms identified the originating stages, types, and causes of medication events, improving the efficiency of analyzing and learning from medication event reports in a timely manner (SVM F1 of 0.792; RF F1 of 0.925)
Fong et al [68] | To analyze patient safety reports | Clinical report | NLPt | Pyxis Discrepancy and Pharmacy Delivery Delay were found to be the two main factors affecting patient safety. The NLP models significantly reduced the time required to analyze safety reports
El Messiry et al [69] | To analyze patient feedback | Clinical report | NLP | Care-related complaints were influenced by money and emotion
Chondrogiannis et al [70] | To identify the meaning of abbreviations used in clinical studies | Clinical report | NLP | Each clinical study document contained about 6.8 abbreviations, and each abbreviation had 1.25 meanings on average. This aided the identification of abbreviation meanings
Liang and Gong [71] | To extract information from patient safety reports | Clinical report | Multilabel classification methods | Binary relevance was the best problem transformation algorithm among the multilabeled classifiers. The study provided suggestions on how to implement automated classification of patient safety reports in clinical settings
Ong et al [72] | To identify risk events in clinical incident reports | Clinical report | Text classifiers based on SVM | SVM performed well on datasets with diverse incident types (85.8%) and data with patient misidentification (96.4%). About 90% of false positives were found in “near misses,” and 70% of false negatives occurred due to spelling errors
Taggart et al [73] | To identify bleeding events in clinical notes | Clinical report | NLP, SVM, CNNu, and ETv | Rule-based NLP performed better than the ML approaches, detecting bleeding complications with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value; it can thus be used for quality improvement and prevention programs
Denecke et al [74] | To minimize loss of information during doctor-patient conversations | Clinical report | NLP | The electronic health platform provides an intuitive conversational user interface through which patients connect to their therapist and a self-anamnesis app; the app also allows data sharing among treating therapists
Evans et al [75] | To determine the incident type and the severity of harm outcome | Clinical report | J48, SVM, and NB | The SVM classifier improved the identification of patient safety incidents. Incident reports involving deaths were most easily classified, with an accuracy of 72.82%. The severity classifier was not accurate enough to replace manual scrutiny
Wang et al [76] | To identify the type and severity of patient safety incident reports | Clinical report | CNN and SVM ensemble | The CNN achieved high F scores (>85%) across all test datasets when identifying common incident types, including falls, medications, pressure injury, and aggression, and improved the process by 11.9% to 45.1% across different datasets
Klock et al [47] | To understand the root causes of falls and increase learning from fall reports for better prevention of patient falls | Clinical report | SVM, RF, and RNNw | The model distinguished high- and low-scoring fall reports. Most patient fall report scores were between 0.3 and 0.4, indicating poor report quality
Li et al [77] | To stratify patient safety adverse event risk and predict safety problems of individual patients | Clinical report | Ensemble-ML | The adverse event risk score at the 0.1 level could identify 57.2% of adverse events with 26.3% accuracy from 9.2% of the validation sample. The adverse event risk score of 0.04 could identify 85.5% of adverse events
Murff et al [78] | To identify postoperative surgical complications within a comprehensive electronic medical record | Clinical report | NLP | NLP identified 82% of acute renal failure cases compared with 38% for patient safety indicators. Similar results were obtained for venous thromboembolism (59% vs 46%), pneumonia (64% vs 5%), sepsis (89% vs 34%), and postoperative myocardial infarction (91% vs 89%)
Wang et al [79] | To automate the identification of patient safety incidents in hospitals | Clinical report | Text-based classifiers: LR, SVM | For severity level, the F score was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk) on balanced data. With stratified data, a high recall was achieved for SAC1 (82.8%-84%), but precision was poor (6.8%-11.2%). High-risk (SAC2) and medium-risk (SAC3) incidents were often misclassified. Reports about falls, medications, pressure injury, aggression, and blood tests were identified with high recall and precision
Rosenbaum and Baron [80] | To detect Wrong Blood in Tube errors and mitigate patient harm | Clinical report | LR, SVM | In contrast to the univariate analysis, the best-performing multivariate delta check model (SVM) identified errors with a high degree of accuracy (0.97)
McKnight [81] | To improve the ability to extract clinical information from patient safety reports efficiently | Clinical report | NLP | The semisupervised model categorized patient safety reports into their appropriate patient safety topics and avoided overlaps; 85% of unlabeled reports were assigned correct labels. It helped NCPSx analysts to develop policy and mitigation decisions
Marella et al [82] | To analyze patient safety reports describing health hazards from electronic health records | Clinical report | Text mining based on NB, KNN, and rule induction | The NB kernel performed best, with an AUC of 0.927, accuracy of 0.855, and F score of 0.877. The overall proportion of cases found relevant was comparable between manually and automatically screened cases; 334 reports identified by the model as relevant were not relevant, implying a false-positive rate of 13%. Manual screening identified 4 incorrect predictions, implying a false-negative rate of 29%
Ye et al [83] | To validate a real-time early warning system to predict patients at high risk of inpatient mortality during their hospital episodes | Clinical report | RF, XGBy, SVM, LASSOz, and KNN | The modified early warning system accurately predicted the possibility of death for the top 13.3% (34/255) of patients at least 40.8 hours before death
Fong et al [84] | To identify health information technology-related events from patient safety reports | Clinical report | Unigram and bigram LR, SVM | Unigram models performed better than bigram and combined models, identifying HITaa-related events when trained on PSEbb free-text descriptions from multiple states and health care systems. The unigram LR model gave an AUC of 0.931 and an F1 score of 0.765. LR also showed potential to maintain a faster runtime as more reports are analyzed, and the final HIT model was less complex and more easily shareable
Simon et al [85] | To establish whether patients with type 2 diabetes can safely use PANDITcc and whether its insulin dosing advice is clinically safe | Drug safety | PANDIT | 27 of 74 (36.5%) PANDIT dosing recommendations differed from those provided by diabetes nurses; however, only 1 of these (1.4%) was considered unsafe by the panel
Song et al [86] | To predict drug-drug interactions | Drug safety | SVM | With 10-fold cross-validation, the model improved the identification of drug-drug interactions (AUC>0.97), significantly greater than an analogously developed ML model (0.67)
Hammann et al [87] | To identify drugs that could be suspected of causing adverse reactions in the central nervous system, liver, and kidneys | Drug safety | CHAIDdd and CARTee | CART exhibited high predictive accuracy: 78.94% for allergic reactions, 88.69% for renal reactions, and 90.22% for hepatic reactions. The CHAID model showed high accuracy (89.74%) for central nervous system reactions
Bean et al [88] | To predict adverse drug reactions | Drug safety | LR, SVM, DT, NLP, own model | The proposed model outperformed traditional LR, SVM, and DT, predicting adverse drug reactions with an AUC of 0.92
Hu et al [89] | To predict the appropriateness of initial digoxin dosage and minimize drug-drug adverse interactions | Drug safety | C4.5, KNN, CART, RF, MLP, and LR | In the group without drug-drug interactions, the AUC of RF, MLP, CART, and C4.5 was 0.91, 0.81, 0.79, and 0.784, respectively; in the drug-drug interaction group, the AUC of RF, CART, MLP, and C4.5 was 0.89, 0.79, 0.77, and 0.77, respectively. DT-based approaches and MLP can determine the initial dosage of high-alert digoxin, increasing drug safety in clinical practice
Tang et al [90] | To identify adverse drug effects from unstructured hospital discharge summaries | Drug safety | NLP | A total of 33 trial sets were evaluated by the algorithm and reviewed by pharmacovigilance experts. After every 6 trial sets, drug and adverse event dictionaries were updated, and rules were modified to improve the system. The model identified adverse events with 92% precision and recall |
Hu et al [91] | To predict the dosage of warfarin | Drug safety | KNN, SVRff, NN-BPgg, MThh | The proposed model improved warfarin dosing compared with the baseline (mean absolute error 0.394), reducing mean absolute error by 40.04%
Hasan et al [92] | To improve the medication reconciliation task | Drug safety | LR, KNN | Collaborative filtering identified the top 10 missing drugs about 40% to 50% of the time and therapeutic missing drugs about 50% to 65% of the time
Labovitz et al [93] | To evaluate the use of a mobile AI platform on medication adherence in stroke patients on anticoagulation therapy | Drug safety | Cell phone–based AI platform | Mean (SD) cumulative adherence based on the AI platform was 90.5% (7.5%). Plasma drug concentration levels indicated that adherence was 100% (15/15) and 50% (6/12) in the intervention and control groups, respectively |
Long et al [94] | To improve the reconciliation method | Drug safety | iPad-based software tool with an AI algorithm | All patients completed the task, and the software improved reconciliation: every patient identified at least one error in their electronic medical record medication list. Eight of 10 patients reported that they would use the device in the future, and both clinicians and patients liked the device and preferred to continue using it
Reddy et al [95] | To assess proof of concept, safety, and feasibility of ABC4Dii in a free-living environment over 6 weeks | Drug safety | ABC4D | ABC4D was safe for use as an insulin bolus dosing system. A trend suggesting a reduction in postprandial hypoglycemia was observed: the median (IQR) number of postprandial hypoglycemia episodes within 6 h after a meal was 4.5 (2.0-8.2) in week 1 versus 2.0 (0.5-6.5) in week 6 (P=.10). No episodes of severe hypoglycemia occurred during the study
Schiff et al [96] | To evaluate the performance and clinical usefulness of medication error alerts generated by an alerting system | Drug safety | MedAware, probabilistic ML | 75% of the chart-reviewed alerts generated by MedAware were valid, identifying medication errors; of these valid alerts, 75.0% were clinically useful in flagging potential medication errors
Li et al [97] | To develop a computerized algorithm for medication discrepancy detection and assess its performance on real-world medication reconciliation data | Drug safety | Hybrid system consisting of ML algorithms and NLP | The hybrid algorithm yielded precision (P) of 95.0%, recall (R) of 91.6%, and F value of 93.3% on medication entity identification, and P=98.7%, R=99.4%, and F=99.1% on attribute linkage. The combination of the hybrid system and medication matching system gave P=92.4%, R=90.7%, and F=91.5% on classifying matched medications, and P=71.5%, R=65.2%, and F=68.2% on classifying discrepant medications
Carrell et al [98] | To identify evidence of problem opioid use in electronic health records | Drug safety | NLP | The NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes
Tinoco et al [99] | To evaluate the source of information affecting different adverse events | Drug safety | CSSjj (ML) | CSS detected more hospital-associated infections than manual chart review (92% vs 34%); CSS missed events that were not stored in a coded format |
Onay et al [100] | To distinguish approved drugs from withdrawn drugs and thus reduce adverse drug effects | Drug safety | SVM, boosted and bagged trees (ensemble) | The Gaussian SVM model yielded 78% prediction accuracy for the drug dataset covering all diseases. The ensemble of bagged trees and linear SVM models achieved 89% accuracy for psycholeptic and psychoanaleptic drugs
Cai et al [101] | To discover drug-drug interactions from the Food and Drug Administration’s adverse event reporting system and thus prevent patient harm | Drug safety | Causal Association Rule Discovery (CARD) | CARD demonstrated higher accuracy in identifying known drug interactions than the traditional method (20% vs 10%) and yielded a lower proportion of drug combinations not known to interact (50% for CARD vs 79% for association rule mining)
Dandala et al [102] | To extract adverse drug events from clinical narratives and automate pharmacovigilance | Drug safety | BiLSTMkk, CRF-NNll | Joint modeling improved the identification of adverse drug events from 0.62 to 0.65
Dey et al [103] | To predict and prevent adverse drug reactions at an early stage to enhance drug safety | Drug safety | Deep learning | Neural fingerprints from the deep learning model (AUC=0.72) outperformed all other methods in predicting adverse drug reactions. The model identified important molecular substructures that are associated with specific adverse drug reactions |
Yang et al [104] | To identify medications, adverse drug effects, and their relations in clinical notes | Drug safety | MADEx, LSTM-RNNmm, CRFnn, SVM, RF | MADEx achieved top-three performance (F1 score of 0.8233) for clinical named entity recognition, adverse drug effects, and relation extraction from clinical texts, outperforming traditional methods
Chapman et al [105] | To identify adverse drug effect symptoms and drugs in clinical notes | Drug safety | NLP | The micro-averaged F1 score was 80.9% for named entity recognition, 88.1% for relation extraction, and 61.2% for the integrated systems |
Lian et al [106] | To detect adverse drug reactions | Drug safety | LRMoo, BNMpp, BCP-NNqq | Experimental results showed the usefulness of the proposed pattern discovery method, which improved on the standard baseline for adverse drug reaction detection by 23.83%
Huang et al [107] | To predict adverse drug effects | Drug safety | SVM, LR | An in silico model built on the proposed computational framework achieved satisfactory cardiotoxicity adverse drug reaction prediction performance (median AUC=0.771, accuracy=0.675, sensitivity=0.632, and specificity=0.789)
aAI: artificial intelligence.
bKNN: K-nearest neighbor.
cNB: naive Bayes.
dLR: logistic regression.
eSVM: support vector machine.
fRF: random forest.
gAUC: area under the curve.
hICU: intensive care unit.
iMMD: multimodal section.
jDT: decision tree.
kBCT: binary classification tree.
lRDAC: regularized discriminant analysis classifier.
mNN: neural network.
nDEWS: deep learning–based early warning system.
oMEWS: modified early warning system.
pAUPRC: area under the precision-recall curve.
qJ48: decision tree algorithm.
rCC: classifier chain.
sMLP: multilayer perceptron.
tNLP: natural language processing.
uCNN: convolutional neural network.
vET: extra tree.
wRNN: recurrent neural network.
xNCPS: National Center for Patient Safety.
yXGB: extreme gradient boosting.
zLASSO: least absolute shrinkage and selection operator.
aaHIT: health information technology.
bbPSE: patient safety event.
ccPANDIT: Patient Assisting Net-Based Diabetes Insulin Titration.
ddCHAID: chi-square automatic interaction detector.
eeCART: classification and regression tree.
ffSVR: support vector regression.
ggNN-BP: neural network with backpropagation.
hhMT: model tree.
iiABC4D: Advanced Bolus Calculator For Diabetes.
jjCSS: clinical support system.
kkBiLSTM: bidirectional long short-term memory neural network.
llCRF-NN: conditional random field neural network.
mmLSTM-RNN: long short-term memory-recurrent neural network.
nnCRF: conditional random field.
ooLRM: logistic regression probability model.
ppBNM: Bayesian network model.
qqBCP-NN: Bayesian confidence propagation neural network.
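Several of the incident-report studies in the table (eg, Gupta and Patrick [66], Evans et al [75], Marella et al [82]) classify free-text safety reports with naive Bayes or SVM models. As a minimal illustrative sketch of the underlying technique (the report snippets, labels, and whitespace tokenization here are invented for illustration and do not reproduce any published model), a multinomial naive Bayes text classifier with Laplace smoothing can be built as follows:

```python
import math
from collections import Counter, defaultdict

# Toy multinomial naive Bayes classifier for incident-report snippets.
# All training texts and labels below are invented for illustration only.

def tokenize(text):
    return text.lower().split()

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns (log priors, per-class word log-likelihoods)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    total = sum(class_counts.values())
    priors = {c: math.log(n / total) for c, n in class_counts.items()}
    loglik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)  # add-one (Laplace) smoothing
        loglik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
        loglik[c]["__unk__"] = math.log(1 / denom)  # fallback for unseen words
    return priors, loglik

def classify(text, priors, loglik):
    # Score each class by log prior plus summed word log-likelihoods; return the argmax.
    scores = {
        c: prior + sum(loglik[c].get(w, loglik[c]["__unk__"]) for w in tokenize(text))
        for c, prior in priors.items()
    }
    return max(scores, key=scores.get)

training_reports = [
    ("patient slipped on wet floor and fell", "fall"),
    ("patient found on floor beside bed after fall", "fall"),
    ("wrong dose of insulin administered", "medication"),
    ("medication given to wrong patient", "medication"),
]
priors, loglik = train_nb(training_reports)
print(classify("patient fell out of bed", priors, loglik))          # fall
print(classify("incorrect medication dose given", priors, loglik))  # medication
```

The systems surveyed above use richer features (eg, TF-IDF n-grams) and validated report corpora; this sketch shows only the bag-of-words likelihood scoring that underlies the NB baselines reported in the table.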