. 2021 Feb 22;11(2):372. doi: 10.3390/diagnostics11020372

Table 2.

The 44 reviewed articles, reporting for each the validation method, the machine learning (ML) models compared, the best-performing model, the metrics used by the authors to evaluate the models, the results of the study, the most relevant laboratory features, and issues with the study.

Reference	Validation	Comparison	Best Performer	BP’s family	Metrics Used	Results		Most Important Laboratory Features for the Model	Issues/Notes
Awad et al. (2017) [18]	CV	RF, DT, NB, PART, Scores (SOFA, SAPS-I, APACHE-II, NEWS, qSOFA)	RF	Trees	AUROC	RF best performance (VS subset) predicting hospital mortality: 0.90 ± 0.01 AUROC AUROC RF (15 variables) at 6 h: 0.82 ± 0.04 SAPS at 24 h (best performer among scores): 0.650 ± 0.012		Vital Signs, age, serum urea nitrogen, respiratory rate max, heart rate max, heart rate min, creatinine max, care unit name, potassium min, GCS min and systolic blood pressure min	Performance metrics for comparison referred to cross-validation results
Escobar et al. (2017) [19]	CV	3 LoR models, Zilberberg model	LoR (automated model)	Regression	AUROC, pseudo-R2, Sensitivity, Specificity, PPV, NPV, NNE, NRI, IDI		AUROC; R2		Performance metrics for comparison referred to cross-validation results
						Age ≥ 65 years	0.546; −0.1131
						Basic model	0.591; −0.0910
						Zilberberg model	0.591; −0.0875
						Enhanced model	0.587; −0.0924
						Automated model	0.605; −0.1033
Richardson and Lidbury (2017) [20]	CV	RF (variable selection) + SVM ***	NE	SVM ***	AUROC, F1, Sensitivity, Specificity, Precision	For both HBV and HCV, 3 balancing methods and 2 feature selectors were tested, showing how they can change SVM performance		HBV: ALT, Age and Sodium; HCV: Age, ALT and Urea
Zhang et al. (2017) [21]	CV	GBT ***	NE	Ensemble ***	RI, H-statistic (features) AUROC, Sensitivity, Specificity (model)	WBC count ≥ 15 × 10^9/L (RI: 49.47, p < 0.001), spinal cord involvement (RI: 26.62, p < 0.001), spinal nerve roots involvement (RI: 10.34, p < 0.001), hyperglycaemia (RI: 3.40, p < 0.001), brain or spinal meninges involvement (RI: 2.45, p = 0.003), EV-A71 infection (RI: 2.24, p < 0.001). Interaction between elevated WBC count and hyperglycaemia (H statistic: 0.231, 95% CI: 0–0.262, p = 0.031), between spinal cord involvement and duration of fever (H statistic: 0.291, 95% CI: 0.035–0.326, p = 0.035), and between brainstem involvement and body temperature (H statistic: 0.313, 95% CI: 0–0.273, p = 0.017) GBT model: 92.3% prediction accuracy, AUROC 0.985, Sensitivity 0.85, Specificity 0.97
Takeuchi et al. (2017) [22]	OOB	Scores (Gunma Score, Kurume Score and Osaka Score), RF	RF	Trees	AUROC, Sensitivity, Specificity, PPV, NPV, Out-of-Bag error estimation	RF: AUROC 0.916, Sensitivity 79.7%, Specificity 87.3%, PPV 85.2%, NPV 82.1%, OOB error rate 15.5%. Sensitivity and Specificity of the scores were: 69.8% and 60.0% GS; 60.6% and 55.4% KS; 24.1% and 77.0% OS. PPV (28.2%–45.1%), NPV (82.0%–86.8%)		Aspartate aminotransferase, lactate dehydrogenase concentrations, percent neutrophils	Performance metrics for comparison referred to cross-validation results
Hernandez et al. (2017) [23]	CV	DT, RF, SVM, Naive Bayes	SVM	SVM	AUROC, AUPRC, Sensitivity, Specificity, PPV, NPV, TP, FP, TN, FN	SVM with SMOTE sampling method and considering 6 features obtained the best results. AUROC, AUPRC, Sensitivity, Specificity: 0.830, 0.884, 0.747, 0.912
Bertsimas et al. (2018) [24]	VS	LoR, Regularized LoR, Optimal Classification Tree, CART, GBT	Optimal Classification Tree *	Trees	Accuracy (threshold 50%), PPV at Sensitivity of 0.6, AUC	Optimal Classification Tree results 60-day mortality, 90-day, 120-day Accuracy: 94.9, 93.3, 86.1 PPV: 20.2, 27.5, 43.1 AUC: 0.86, 0.84, 0.83		Albumin, change in weight, Pulse, WBC count, Haematocrit according to the kind of cancer	The validation set was used only for NN, KNN, and SVM
Jeong et al. (2018) [25]	CV	CERT, CLEAR, PACE, RF, L1-regularized LoR, SVM, NN	RF	Trees	AUROC, F1, Sensitivity, Specificity, PPV, NPV	ML models produced higher average F1-measures (0.629–0.709) and AUROC (0.737–0.816) than the original methods (AUROC 0.020–0.597, F1 0.475–0.563)
Rosenbaum and Baron (2018) [26]	NA	Univariate models, LoR, SVM	SVM	SVM	AUC, Specificity, PPV	AUROC on testing set (simulated WIBT) best univariate (BUN): 0.84 (interquartile range 0.83–0.84) SVM (difference and values): 0.97 (0.96–0.97) LoR (difference and values): 0.93		Difference and Values together	Comparison data among the models not available
Ge et al. (2018) [27]	CV	RNN-LSTM + LoR vs LoR	RNN-LSTM	DL	AUROC, TP, FP	AUROC cross-validation, AUROC testing set Logistic Regression: 0.7751, 0.7412 RNN-LSTM model: 0.8076, 0.7614		Associated with ICU Mortality: Do Not Reanimate, Prednisolone, Disseminated intravascular coagulation; Associated with ICU Survival: Arterial blood gas pH, Oxygen saturation, Pulse
Jonas et al. (2018) [28]	CV	LoR (LASSO), RF ***	NE	NE	NE	LASSO identified as the most predictive of a positive response to vasoreactivity test: 6-MWD, diabetes, HDL-C, creatinine, right atrial pressure, and cardiac index RF identified as the most predictive: NT-proBNP, HDL-C, creatinine, right atrial pressure, and cardiac index 6-MWD, HDL-C, hs-CRP, and creatinine levels best discriminated between long-term-responder and not			Performance metrics for comparison referred to cross-validation results Tool available online
Sahni et al. (2018) [29]	NA	LoR, RF	RF	Trees	AUROC	AUROC RF (demographics, physiological, lab, all comorbidities) 0.85 (0.84–0.86) LoR (demographics, physiological, lab, all comorbidities) 0.91 (0.90–0.92)		Age, BUN, platelet count, haemoglobin, creatinine, systolic blood pressure, BMI, and pulse oximetry readings	Performance metrics for comparison referred to cross-validation results
Rahimian et al. (2018) [30]	CV	CPH, RF, GBC	GBC	Ensemble	AUROC	AUROC (CI95), internal validation variables, CPH, RF, GBC QA: 0.740 (0.739, 0.741), 0.752 (0.751, 0.753), 0.779 (0.777, 0.781) T: 0.805 (0.804, 0.806), 0.825 (0.824, 0.826), 0.848 (0.847, 0.849) external validation QA: 0.736, 0.736, 0.796 T: 0.788, 0.810, 0.826		age, cholesterol ratio, haemoglobin, and platelets, frequency of lab tests, systolic blood pressure, number of admissions during the last year	Tool available online
Foysal et al. (2019) [31]	CV	Regression analysis and SVM ***	NE	SVM	R2 score, Standard error of detection, Accuracy	Accuracy: 98%		NE	Performance metrics for comparison referred to cross-validation results
Xu et al. (2019) [32]	CV	L1 Logistic Regression, Regress and Round, Naive Bayes, NN-MLP, DT, RF, AdaBoost, XGBoost.	XGBoost, RF	NA	AUROC, Sensitivity, Specificity, NPV, PPV	Mean AUROC: 0.77 on testing set AUROC > 0.90 on 22 lab tests out of 43 On external validation: results were different according to lab test considered		NE	DL missed Albumin as OS predictor
Burton et al. (2019) [33]	CV	Heuristic model (LoR) with microscopy thresholds, NN, RF, XGBoost	XGBoost *	Ensemble	AUROC, Accuracy, PPV, NPV, Sensitivity, Specificity, Relative Workload Reduction (%)	AUC Accuracy PPV NPV Sensitivity (%) Specificity (%) Relative Workload Reduction (%) Pregnant patients 0.828, 26.94, 94.6 [±0.56], 26.84 [±1.88], 25.29 [±0.92] Children (<11 years) 0.913, 62.00, 94.8 [±0.88], 55.00 [±2.12], 46.24 [±1.48] Pregnant patients 0.894, 71.65, 95.3 [±0.24], 60.93 [±0.65], 43.38 [±0.41] Combined performance 0.749, 65.65, 47.64 [±0.51], 97.14 [±0.28], 95.2 [±0.22], 60.93 [±0.60], 41.18 [±0.39]		WBC count, Bacterial count, Age, Epithelial cell count, RBC count
Fillmore et al. (2019) [34]	CV	L1 LoR (LASSO), SVM, RF	RF	Trees	Accuracy	LabTest: LR, SVM, RF ALP: 0.98, 0.97, 0.98 ALT: 0.98, 0.94, 0.92 ALB: 0.97, 0.92, 0.98 HDLC: 0.98, 0.91, 0.98 Na: 0.97, 0.98, 0.99 Mg: 0.97, 0.95, 0.99 HGB: 0.97, 0.95, 0.99			Precise performance data on the testing set not provided
Zimmerman et al. (2019) [35]	CV	LiR, LoR, RF, NN-MLP	NN-MLP	DL	AUROC, Accuracy, Sensitivity, Specificity, PPV, NPV	LiR Regression task: RMSEV Linear Backward Selection Model 0.224 Linear All Variables Model 0.224 AUROC, Accuracy, Sensitivity, Specificity, PPV, NPV LR, Backward Selection Model: 0.780, 0.724, 0.697, 0.730, 0.337, 0.924 LR, All Variables Model: 0.783, 0.729, 0.698, 0.736, 0.342, 0.925 RF, Backward Selection Model: 0.772, 0.739, 0.660, 0.754, 0.346, 0.918 RF, All Variables Model: 0.779, 0.742, 0.673, 0.756, 0.352, 0.921 MLP, Backward Selection Model: 0.792, 0.744, 0.684, 0.756, 0.356, 0.924 MLP, All Variables Model: 0.796, 0.743, 0.694, 0.753, 0.357, 0.926		Sex, age, ethnicity, Hypoxemia, mechanical ventilation, Coagulopathy, calcium, potassium, creatinine level	Performance metrics for comparison referred to cross-validation results
Sharafoddini et al. (2019) [36]	CV	LASSO for choosing most important variables. DT, LoR, RF, SAPS-II (score)	Logistic Regression	Regression	AUROC	Including indicators improved the AUROC in all modelling techniques, on average by 0.0511; the maximum improvement was 0.1209		BUN, RDW, anion gap all 3 days. day 1: TBil, phosphate, Ca, and Lac day 2&3: Lac, BE, PO2, and PCO2 day 3: PTT and pH
Matsuo et al. (2019) [37]	CV	NN, CPH, CoxBoost, CoxLasso, Random Survival Forest	NN	DL	Concordance Index, Mean Absolute Error	Progression-free survival (PFS): Concordance index, Mean absolute error (mean ± standard error) CPH: 0.784 ± 0.069, 316.2 ± 128.3 DL: 0.795 ± 0.066, 29.3 ± 3.4 Overall survival (OS): CPH: 0.607 ± 0.039, 43.6 ± 4.3 DL: 0.616 ± 0.041, 30.7 ± 3.6		PFS: BUN, Creatinine, Albumin, (Only DL) WBC, Platelet, Bicarbonate, Haemoglobin OS: BUN (only DL) Bicarbonate (only CPH) Platelet, Creatinine, Albumin
Yang et al. (2019) [38]	OOB	RF ***	NE	Trees ***	OOB	Predicting Outcome (discharge/death) Out-of-bag error 0.073 Accuracy: 0.927 Recall/sensitivity: 0.702 Specificity: 0.973 Precision: 0.840		bicarbonate, phosphate, anion gap, white cell count (total), PTT, platelet, total calcium, chloride, glucose and INR	Not clear how they split dataset and which results are reported
Daunhawer et al. (2019) [39]	CV	L1 Regularized LoR (LASSO), RF	RF+LASSO	NE	AUROC	AUROC cross-validation test set external set RF: 0.933 ± 0.019, 0.927, 0.9329 LASSO: 0.947 ± 0.015, 0.939, 0.9470 RF + LASSO: 0.952 ± 0.013, 0.939, 0.9520		Gestational Age, weight, bilirubin level, and hours since birth
Estiri et al. (2019) [40]	Pl	CAD (Standard deviation and Mahalanobis distance), Hierarchical k-means	Hierarchical k-means	Clustering	FP, TP, FN, TN, Sensitivity, Specificity, and fallout across the eight thresholds	Specificity increases as threshold decreases. The lowest was 0.9938 Sensitivity in 39/41 variable > 0.85, Troponin I = 0.0545, LDL = 0.4867 About sensitivity, 39/41 CAD~ML, 9/41 CAD > ML About FP, in 45/50 ML had less FP than CAD
Kayhanian et al. (2019) [41]	CV	LoR, SVM	SVM	SVM	Sensitivity, Specificity, AUC, J-statistic	Sensitivity, Specificity, J-statistic, AUC Linear model, all variables: 0.75, 0.99, 0.7, 0.9 Linear model, three variables: 0.71, 0.99, 0.74, 0.83 SVM, all variables: 0.63, 1, 0.79, N/A SVM, three variables: 0.8, 0.99, 0.63, N/A		Lactate, pH and glucose
Wang et al. (2019) [42]	CV	Auto-Weka (39 ML algorithms)	RF	Trees	Sensitivity, Specificity, AUROC, Accuracy	Time after ICH, Case number, Best algorithms Sensitivity, Specificity, Accuracy, AUC 1-month: 307 Random forest, 0.774, 0.869, 0.831, 0.899 6 months: 243 Random forest, 0.725, 0.906, 0.839, 0.917		1 month: ventricle compression, GCS, ICH volume, location, Hgb; 6 months: GCS, location, age, ICH volume, gender, DBP, WBC	Connection between HDL-C and reactivity of the pulmonary vasculature is a novel finding
Ye et al. (2019) [43]	NA	Retrospective: RF, XGBoost, Boosting, SVM, LASSO, KNN Prospective: RF	RF	Trees	AUROC, PPV, Sensitivity, Specificity	RF’s AUROC: 0.884 (highest among all other ML models) high-risk sensitivity, PPV, low–moderate risk sensitivity, PPV EWS: 26.7%, 69%, 59.2%, 35.4% ViEWS: 13.7%, 35%, 35.7%, 21.4%		Diagnoses of cardiovascular diseases, congestive heart failure, or renal diseases	No information about tuning
Yang et al. (2020) [44]	CV	LoR, DT (CART), RF, and GBDT	GBDT	Ensemble	AUROC, sensitivity, specificity, agreement with RT-PCR (Agr-PCR)	AUROC; Sensitivity; Specificity; Agr-PCR GBDT 0.854 (0.829–0.878); 0.761 (0.744–0.778); 0.808 (0.795–0.821); 0.791 (0.776–0.805); on cross-validation; GBDT 0.838; 0.758; 0.740 on independent testing set		LDH, CRP, Ferritin	No information about model, training, validation, test
Ma et al. (2020) [45]	CV	RF, XGBoost, LoR for selecting variables for the new model New Model vs Score (CURB-65), XGBoost	New Model	Other	AUROC	AUROC on testing set (13 patients), AUROC on cross-validation New Model: 0.9667, 0.9514 CURB-65: 0.5500, 0.8501 XGBoost: 0.3333, 0.4530		LDH, CRP, Age	Tool available online
Hyun et al. (2020) [46]	NE	k-means***	NE	Clustering***	NE	3 Clusters Cluster 2: abnormal haemoglobin and RBC Cluster 3: highest mortality, intubation, cardiac medications and blood administration		BUN, creatinine, potassium, haemoglobin, and red blood cell
Lee et al. (2020) [47]	CV	RF, SVM, LASSO, Ridge, Elastic Net Regularization, MEWS	RF	Trees	AUROC, AUPRC, BA, Sensitivity, Specificity, F1, PLR, and NLR	AUROC AUPRC Sensitivity Specificity RF OSO: 0.80 (0.76 to 0.84); 0.25 (0.18 to 0.33); 0.70 (0.62 to 0.82); 0.78 (0.66 to 0.83) RF OSR: 0.88 (0.85 to 0.91); 0.39 (0.30 to 0.47); 0.81 (0.76 to 0.89); 0.81 (0.75 to 0.83)		OSO: Troponin I, creatine kinase and CK-MB; OSR: Lactic Acid	Performance metrics for comparison referred to cross-validation results
Morid et al. (2020) [48]	CV	RF, XGBT, Kernel-based Bayesian Network, SVM, LoR, Naive Bayes, KNN, ANN	RF	Trees	AUC, F1, Accuracy	RF Model performances according to the detection method, Accuracy AUC Last recorded Value: 0.581, 0.589 Symbolic pattern detection: 0.706, 0.694 Local structural pattern: 0.781, 0.772 Global structural pattern: 0.744, 0.730 Local & Global: 0.813, 0.809		NE
Yu et al. (2020) [49]	NA	ANN***	NE	DL ***	Checking Proportions (CP), Prediction Accuracy, Aggregated Accuracy (AA)	Threshold for CP.AA. performing test 0.15: 90.14%; 95.83% 0.25: 85.78%; 95.05% 0.35: 79.71%; 93.32% 0.45: 71.70%; 90.95% 0.6: 50.46%; 85.30%		NE	Not included data about performances, but only graph of AUROC of prediction to 1 month (with 4-month history)
Chicco and Jurman (2020) [50]	VS	LiR, RF, One-Rule, DT, ANN, SVM, KNN, Naive Bayes, XGBoost	RF	Trees	MCC, F1, Accuracy, TP, TN, PRAUC, AUROC	MCC F1 Accuracy TP TN PRAUC AUROC All features RF + 0.384, 0.547, 0.740, 0.491, 0.864, 0.657, 0.800 Cr+ EF RF +0.418 0.754 0.585 0.541 0.855 0.541 0.698 Cr+EF+FU time LoR +0.616 0.719 0.838 0.785 0.860 0.617 0.822		Serum Creatinine and Ejection Fraction
Ye et al. (2020) [51]	CV	GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest, stepwise LoR, LoR with RCS	GBDT	Ensemble	AUROC, Recall, Precision, F1	Discrimination AUC GBDT 73.51%, 95% CI 71.36%–75.65% LoR with RCS 70.9%, 95% CI 68.68%–73.12% 0.3 and 0.7 were set as cut-off points for predicting outcomes (GDM or adverse pregnancy outcomes)		GBDT: Fasting blood glucose, HbA1c, triglycerides, and maternal BMI LoR: HbA1c and high-density lipoprotein
Macias et al. (2020) [52]	CV	RF (features) + RNN-LSTM, RF	RNN-LSTM (all variables)	DL	AUROC	AUROC mortality prediction 1 month RF 0.737 RNN (many) expert variables 0.781 ± 0.021 RNN RF variables 0.820 ± 0.015 RNN all variables 0.873 ± 0.021
Lobo et al. (2020) [53]	VS	RNN-LSTM + NN + RNN-LSTM ***	NE	DL	Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE)	Best model performance ME: 0.017; MAE: 0.527; MSE: 0.489; predicting to 1 month with 5 month of history data
Roimi et al. (2020) [54]	CV	6 RF+2 XGBoost, RF, XGBoost, LoR	6 RF+2 XGBoost	Other	AUROC, Brier score	Modelling approach BIDMC RHCC AUROC Derivation set, CV Validation set, Derivation set, CV Validation set Logistic-regression: 0.75 ± 0.06, 0.70 ± 0.02, 0.80 ± 0.08, 0.72 ± 0.02 Random-Forest: 0.82 ± 0.03, 0.85 ± 0.01, 0.90 ± 0.03, 0.88 ± 0.02 Gradient Boosting Trees: 0.84 ± 0.04, 0.84 ± 0.02, 0.93 ± 0.04, 0.88 ± 0.01 Ensemble of models: 0.87 ± 0.03, 0.89 ± 0.01, 0.93 ± 0.03, 0.92 ± 0.01 validating the models of BIDMC over RHCC dataset and vice versa, the AUROCs of the models deteriorated to 0.59 ± 0.07 and 0.60 ± 0.06 for BIDMC and RHCC		Most of the strongest features included patterns of change in the time-series variables	Performance metrics for comparison referred to cross-validation results
Kirk et al. (2020) [55]	NA	SVM (cut-offs features), LoR, Random Forest regression Algorithm	RF	Trees	AUROC	AUROC baseline clinical and demographic values 0.52 inclusion of laboratory value thresholds from the day of discharge 0.54 add daily postoperative laboratory thresholds to the demographic and clinical variables 0.59 add postoperative complications 0.62 random forest regression all features 0.68		white blood cell count, bicarbonate, BUN, and creatinine
Li et al. (2020) [56]	VS	RF, LoR	LoR	Regression	AUROC, Accuracy, Precision, F1, Recall	Prospective cohort results AU-ROC Accuracy Precision F1 score Recall RF: 0.830 (0.770–0.887), 0.916 (0.891–0.936), 0.907 (0.881–0.928), 0.901 (0.874–0.922), 0.917 (0.892–0.937) LoR: 0.858 (0.808–0.903), 0.905 (0.879–0.926), 0.887 (0.859–0.910), 0.883 (0.855–0.906), 0.905 (0.879–0.926)		RBC, SI, BE, Lac, DBP, pH
Balamurugan et al. (2020) [57]	CV	Auto-Weka (Naive Bayes, DT-J48, MLP, SVM) & 4 features selectors ***	NE	NE	AUROC, F1, Precision, Accuracy, Recall, MCC, TPR, FPR	Proposed model: features selected; Accuracy; TP Rate; FP Rate GA + J48: 9; 94.32; 0.925; 0.118; PSO + J48: 9; 96.25; 0.963; 0.163; CFS + J48: 11; 84.63; 0.861; 0.871; EWSORA + J48; 4; 98.72; 0.950; 0.165;		RBC, HGB, HCT, WBC	Performance metrics for comparison referred to cross-validation results
Hu et al. (2020) [58]	CV	XGBoost, RF, LR, Score (APACHE II, PSI)	XGBoost	Ensemble	AUROC	AUROC XGBoost 0.842 (95% CI 0.749–0.928) RF 0.809 (95% CI 0.629–0.891) LR 0.701 (95% CI 0.573–0.825) APACHE II 0.720 (95% CI 0.653–0.784) PSI 0.720 (95% CI 0.654–0.7897)		Fluid balance domain, Laboratory data domain, severity score domain, Management domain, Demographic and symptom domain, Ventilation domain
Aydin et al. (2020) [59]	CV	Naïve Bayes, KNN, SVM, GLM, RF, and DT	DT *	Trees	AUC, Accuracy, Sensitivity, Specificity	AUC (%) Accuracy (%) Sensitivity (%) Specificity (%) RF 99.67; 97.45; 97.79; 97.21 KNN 98.68; 95.58; 95.08; 95.93 NB 98.71; 94.76; 94.06; 95.25 DT 93.97; 94.69; 93.55; 96.55 SVM 96.76; 91.24; 90.32; 91.86 GLM 96.83; 90.96; 90.66; 91.16		Platelet distribution width (PDW), white blood cell count (WBC), neutrophils, lymphocytes
Metsker et al. (2020) [60]	CV	KNN for clustering data and then comparison among Linear Regression, Logistic Regression, ANN, DT, and SVM	ANN	DL	AUROC, F1, Precision, Accuracy, Recall	Model Precision Recall F1 score Accuracy AUC Linear Regression (29 variables) 0.6777, 0.7911, 0.7299 0.7472 ANN (31 variables) 0.7982, 0.8152, 0.8064, 0.8261, 0.8988		Age, Mean Platelet Volume
Voglis et al. (2020) [61]	Bt	Generalized Linear Models (GLM), GLMBoost, Naïve Bayes classifier, and Random Forest	GLMBoost	Ensemble	AUROC, Accuracy, F1, PPV, NPV, Sensitivity, Specificity	AUROC: 84.3% (95% CI 67.0–96.4) Accuracy: 78.4% (95% CI 66.7–88.2) Sensitivity: 81.4% Specificity: 77.5% F1 score: 62.1% NPV: 93.9% PPV: 50%		preoperative serum prolactin, preoperative serum insulin-like growth factor 1 (IGF-1) level, BMI, preoperative serum sodium level

* It was chosen as the most useful, although it was not the best performer; ** Different models were trained with a different number of features; *** A comparison of the ML models was not made; NA: Not available; NE: Not evaluable (meaning not pertinent). For all the other abbreviations, see Appendix B.
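
Many of the threshold-based metrics listed in the "Metrics Used" column (Sensitivity, Specificity, PPV, NPV, Accuracy, F1, J-statistic, MCC) can be derived from the four confusion-matrix counts (TP, FP, TN, FN). The following is a minimal sketch of those standard definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from any of the reviewed studies. It assumes all denominators are non-zero, except for MCC, where a zero denominator is mapped to 0.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Derive common classification metrics from confusion-matrix counts.

    Assumes each class and each predicted label occurs at least once,
    so that the ratios below are defined.
    """
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    youden_j = sensitivity + specificity - 1   # J-statistic
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "youden_j": youden_j,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the table above).
m = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

AUROC, AUPRC and the concordance index are not covered by this sketch, since they depend on the full ranking of predicted scores rather than on a single decision threshold.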