World J Gastroenterol. 2021 Oct 14;27(38):6399–6414. doi: 10.3748/wjg.v27.i38.6399

Table 2.

Most common evaluation metrics found in the state of the art for detection, segmentation and classification tasks

| Term | Symbol | Description |
| --- | --- | --- |
| Positive | P | Number of real positive cases in the data |
| Negative | N | Number of real negative cases in the data |
| True positive | TP | Number of positive cases correctly classified/detected |
| True negative | TN | Number of negative cases correctly classified/detected |
| False positive | FP | Instances incorrectly classified/detected as positive |
| False negative | FN | Instances incorrectly classified/detected as negative |
| Area under curve | AUC | Area under the ROC curve |
| Term | Task | Formulation |
| --- | --- | --- |
| Accuracy | C, D, S | (TP + TN)/(TP + TN + FN + FP) |
| Precision/PPV | C, D, S | TP/(TP + FP) |
| Sensitivity/Recall/TPR | C, D, S | TP/(TP + FN) |
| Specificity/TNR | C, D, S | TN/(TN + FP) |
| FPR | C, D, S | FP/(TN + FP) |
| FNR | C, D, S | FN/(TP + FN) |
| F1-score/Dice index | C, D, S | 2 · (precision · recall)/(precision + recall) |
| F2-score | C, D, S | 5 · (precision · recall)/(4 · precision + recall) |
| IoU/Jaccard index | D, S | (target ∩ prediction)/(target ∪ prediction) |
| AAC | D, S | (detected area ∩ real area)/(real area) |

C: Classification; D: Detection; S: Segmentation. PPV: Positive predictive value; TPR: True positive rate; TNR: True negative rate; FPR: False positive rate; FNR: False negative rate; ROC: Receiver operating characteristic; IoU: Intersection over union; AAC: Annotated area covered.
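As a minimal sketch (not taken from the reviewed works), the formulations in Table 2 can be computed from a pair of binary masks or label vectors once the confusion counts TP, TN, FP, and FN are known. The function and variable names below (`confusion_counts`, `metrics`, `target`, `prediction`) are illustrative assumptions.

```python
# Sketch: Table 2 metrics computed from two binary (0/1) NumPy arrays.
# Assumes the positive class is labeled 1; divisions by zero are not handled.
import numpy as np


def confusion_counts(target: np.ndarray, prediction: np.ndarray):
    """Return TP, TN, FP, FN for two binary arrays of equal shape."""
    t = target.astype(bool)
    p = prediction.astype(bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    return tp, tn, fp, fn


def metrics(target: np.ndarray, prediction: np.ndarray) -> dict:
    tp, tn, fp, fn = confusion_counts(target, prediction)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # sensitivity / TPR
    return {
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),     # TNR
        "fpr": fp / (tn + fp),
        "fnr": fn / (tp + fn),
        "f1_dice": 2 * precision * recall / (precision + recall),
        "f2": 5 * precision * recall / (4 * precision + recall),
        # For binary masks, |target ∩ prediction| = TP and
        # |target ∪ prediction| = TP + FP + FN.
        "iou_jaccard": tp / (tp + fp + fn),
        # AAC: (detected area ∩ real area) / (real area); for binary masks
        # this reduces to TP / (TP + FN), i.e., the same value as recall.
        "aac": tp / (tp + fn),
    }


if __name__ == "__main__":
    gt = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]])
    pred = np.array([[1, 0, 0, 0],
                     [1, 1, 1, 0]])
    for name, value in metrics(gt, pred).items():
        print(f"{name}: {value:.3f}")
```

For classification tasks (task "C"), the same formulas apply with `target` and `prediction` holding per-sample binary labels rather than pixel masks; the segmentation-only entries (IoU, AAC) are then usually omitted.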