Curr Environ Health Rep. 2022 Dec 17;10(1):45–60. doi: 10.1007/s40572-022-00389-x

Table 2.

Performance metrics for machine learning models to predict drinking water quality

For each performance metric, the table lists its purpose, definition, limitation, score range, and the papers in which it was reported.
Classification
  Accuracy
    Purpose: Determines the proportion of total correct classifications
    Definition: $\frac{TP+TN}{TP+TN+FP+FN}$; ranges between 0 and 1
    Limitation: Provides an overoptimistic estimate of the classifier's ability on the majority class
    Score range: Multi-class: 0.367–0.826; Binomial: 0.67–0.94
    Papers reported: [4•, 5, 7, 8••, 11, 12•, 72–74, 77–84]

  Sensitivity
    Purpose: Determines the model's ability to recall true positives
    Definition: $\frac{TP}{TP+FN}$; ranges between 0 and 1
    Limitation: Sensitive to the classification threshold; a lower threshold leads to higher sensitivity
    Score range: 0.07–0.84
    Papers reported: [4•, 5–7, 8••, 12•, 73, 74, 77–79, 81–84]

  Specificity
    Purpose: Determines the model's ability to correctly classify true negatives
    Definition: $\frac{TN}{TN+FP}$; ranges between 0 and 1
    Limitation: Sensitive to the classification threshold; a higher threshold leads to higher specificity
    Score range: 0.43–0.98
    Papers reported: [4•, 5, 7, 8••, 12•, 22, 23, 27, 28, 30–33]

  Area under the receiver operating characteristic curve (AUC-ROC) or C-statistic
    Purpose: Determines the probability that the model will rank a randomly chosen positive example higher than a randomly chosen negative example
    Definition: Area under the curve of the true positive rate plotted against the false positive rate at classification thresholds between 0 and 1; ranges between 0 and 1
    Limitation: Only applicable to binary classification problems
    Score range: 0.72–0.92
    Papers reported: [4•, 7, 8••, 9•, 10•, 12•, 73, 74, 78, 79, 81]

  Matthews correlation coefficient (MCC)
    Purpose: Measures the association between observed and predicted values
    Definition: $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$; ranges between −1 and 1
    Limitation: Applies to only one classification threshold
    Score range: 0.31–0.72
    Papers reported: [80]

  F1 score
    Purpose: Balances precision and recall (their harmonic mean)
    Definition: $\frac{2TP}{2TP+FP+FN}$; ranges between 0 and 1
    Limitation: Applies to only one classification threshold
    Score range: 0.46–0.74
    Papers reported: [79, 80]

  Cohen's kappa statistic
    Purpose: Determines how well the machine learning classifier's predictions matched the observations
    Definition: $\frac{p_0 - p_e}{1 - p_e}$; ranges between −1 and 1
    Limitation: Not easy to interpret
    Score range: 0.46–0.62
    Papers reported: [7, 8••]

Regression
  Coefficient of determination (R²)
    Purpose: Determines the proportion of variance in the response explained by the predictors
    Definition: $1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \mu_y)^2}$; ranges between 0 and 1
    Limitation: Increases with the number of predictors
    Score range: 0.12–0.85
    Papers reported: [4•, 5, 6, 16, 71•, 82, 83, 85••, 86, 87]

  Mean squared error (MSE)
    Purpose: Measures how spread out the data are around the line of best fit
    Definition: $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$; ranges from 0 to ∞
    Limitation: Depends on the scale of the response variable
    Score range: 0.05–5.18
    Papers reported: [16, 76, 80–83, 85••]

  Mean absolute error (MAE)
    Purpose: Measures the average absolute error between paired observations and predictions
    Definition: $\frac{1}{n}\sum_i |y_i - \hat{y}_i|$; ranges from 0 to ∞
    Limitation: Does not penalize large prediction errors more heavily
    Score range: 0.13–3.06
    Papers reported: [80]

TP, true positives; TN, true negatives; FP, false positives; FN, false negatives; $p_0$, observed overall accuracy of the model; $p_e$, expected agreement between the model predictions and the actual class values occurring by chance; $y_i$, observed value; $\hat{y}_i$, predicted value; $\mu_y$, mean of the observed values; $n$, number of observations
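
As a companion to Table 2, the sketch below shows one way the listed metrics can be computed. It is a minimal illustration assuming Python with NumPy and scikit-learn; the arrays y_true, y_prob, y_obs, and y_hat are made-up placeholders, not data or code from the reviewed papers, and specificity is derived from the confusion matrix because scikit-learn has no dedicated function for it.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, recall_score, roc_auc_score, matthews_corrcoef,
    f1_score, cohen_kappa_score, confusion_matrix,
    r2_score, mean_squared_error, mean_absolute_error,
)

# --- Classification metrics (toy binary labels and predicted probabilities) ---
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # observed classes
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3])  # predicted probabilities
y_pred = (y_prob >= 0.5).astype(int)                          # classify at a 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy:     ", accuracy_score(y_true, y_pred))     # (TP+TN)/(TP+TN+FP+FN)
print("Sensitivity:  ", recall_score(y_true, y_pred))       # TP/(TP+FN)
print("Specificity:  ", tn / (tn + fp))                     # TN/(TN+FP)
print("AUC-ROC:      ", roc_auc_score(y_true, y_prob))      # threshold-free ranking measure
print("MCC:          ", matthews_corrcoef(y_true, y_pred))
print("F1 score:     ", f1_score(y_true, y_pred))           # 2TP/(2TP+FP+FN)
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))  # (p0-pe)/(1-pe)

# --- Regression metrics (toy observed vs. predicted continuous values) ---
y_obs = np.array([2.3, 1.8, 3.1, 4.0, 2.9])
y_hat = np.array([2.1, 2.0, 2.8, 3.7, 3.2])

print("R^2:", r2_score(y_obs, y_hat))
print("MSE:", mean_squared_error(y_obs, y_hat))
print("MAE:", mean_absolute_error(y_obs, y_hat))
```

The threshold-dependent metrics (sensitivity, specificity, F1, MCC, Cohen's kappa) are evaluated here at a 0.5 cutoff, which is the dependence flagged in the Limitation entries above; AUC-ROC is computed from the predicted probabilities and is therefore threshold-free.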