. 2024 Jun 15;485(6):1095–1105. doi: 10.1007/s00428-024-03841-5

Table 4.

AI(H) performance on fibrosis-focused tasks, Sirius red slides

Model	Overall accuracy, test (%)*	Overall accuracy, training (%)*	Macro-precision, test (%)†	Macro-precision, training (%)†	Macro-sensitivity (macro-recall), test (%)†	Macro-sensitivity (macro-recall), training (%)†
Tissue detection	99.4	99.9	99.8	99.3	99.0	99.8
Microanatomy (portal area, central vein)	94.0	97.0	67.0	92.4	65.9	83.7
Fibrosis (portal fibrosis, perivenular fibrosis, pericellular fibrosis, nodular fibrosis, cirrhosis)	87.6	97.2	73.3	96.2	68.3	95.1

*Overall accuracy is a standalone metric of how well a machine-learning model performs in multiclass classification. It is the ratio of correct predictions to all predictions; for example, in a three-category classification task (categories A, B, and C), overall accuracy is the sum of correct predictions for categories A, B, and C divided by the total number of predictions

†Precision and sensitivity (also called recall) are paired metrics, meaning that they should be interpreted together rather than individually, that measure how well machine-learning models perform in classification tasks. In binary classification, precision is calculated as TP/(TP + FP), and sensitivity as TP/(TP + FN). In multiclass classification, each category in turn serves as the positive class, with all other categories combined as the negative class, yielding one binary classification per category. Macro-precision and macro-sensitivity are the arithmetic means of these per-category binary precisions and sensitivities, respectively

AI(H), artificial intelligence for hepatitis; FP, false positive; FN, false negative; TP, true positive
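To illustrate the overall-accuracy footnote (*), the following is a minimal sketch that computes overall accuracy from a confusion matrix for a three-category task (A, B, C). The function name `overall_accuracy` and the counts are hypothetical and chosen only for illustration; they are not taken from the AI(H) study or from Table 4.

```python
# Minimal sketch: overall accuracy for a multiclass task from a confusion matrix.
# Rows = true category, columns = predicted category.
# The counts below are HYPOTHETICAL and are not data from Table 4.

def overall_accuracy(confusion):
    """Correct predictions (diagonal cells) divided by all predictions (grand total)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical three-category example (categories A, B, C)
cm = [
    [50, 3, 2],   # true A
    [4, 40, 6],   # true B
    [1, 5, 39],   # true C
]

print(f"Overall accuracy: {overall_accuracy(cm):.1%}")  # → Overall accuracy: 86.0%
```

Here the correct predictions are 50 + 40 + 39 = 129 out of 150 in total, giving 129/150 = 86.0%.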
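The one-vs-rest procedure in the precision/sensitivity footnote (†) can be sketched as follows: each category is treated in turn as the positive class, TP/FP/FN are read from a confusion matrix, and the per-category values are averaged with equal weight. The function names and the confusion-matrix counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not one stated in the source.

```python
# Minimal sketch: macro-precision and macro-sensitivity (macro-recall) via one-vs-rest.
# Rows = true category, columns = predicted category.
# The counts are HYPOTHETICAL; zero-denominator handling (0.0) is an assumed convention.

def per_class_precision_recall(confusion):
    n = len(confusion)
    precisions, recalls = [], []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another category
        fn = sum(confusion[k]) - tp                       # truly k, predicted another category
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return precisions, recalls

def macro_average(values):
    """Unweighted arithmetic mean: every category counts equally."""
    return sum(values) / len(values)

cm = [
    [50, 3, 2],   # true A
    [4, 40, 6],   # true B
    [1, 5, 39],   # true C
]

precisions, recalls = per_class_precision_recall(cm)
macro_precision = macro_average(precisions)
macro_sensitivity = macro_average(recalls)
print(f"Macro-precision: {macro_precision:.1%}, macro-sensitivity: {macro_sensitivity:.1%}")
```

Because each category counts equally in the macro average, a poorly recognized minority category can lower macro-precision and macro-sensitivity even when overall accuracy remains high.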