Table 4.
AI(H) performance on fibrosis-focused tasks, Sirius red slides
| Model | Overall accuracy (%)*, test | Overall accuracy (%)*, training | Macro-precision (%)†, test | Macro-precision (%)†, training | Macro-sensitivity (macro-recall) (%)†, test | Macro-sensitivity (macro-recall) (%)†, training |
|---|---|---|---|---|---|---|
| Tissue detection | 99.4 | 99.9 | 99.8 | 99.3 | 99.0 | 99.8 |
| Microanatomy (portal area, central vein) | 94.0 | 97.0 | 67.0 | 92.4 | 65.9 | 83.7 |
| Fibrosis (portal fibrosis, perivenular fibrosis, pericellular fibrosis, nodular fibrosis, cirrhosis) | 87.6 | 97.2 | 73.3 | 96.2 | 68.3 | 95.1 |
*Overall accuracy is a standalone metric that measures how well machine-learning models perform in multiclass classification. It denotes the ratio of correct predictions to all predictions; for example, for a three-category (categories A, B, and C) classification task, overall accuracy is calculated as the sum of correct predictions on categories A, B, and C divided by the grand total.
†Precision and sensitivity (also called recall) are paired metrics (that is, they should not be used individually) that measure how well machine-learning models perform in classification tasks. In binary classification, precision is calculated as TP/(TP + FP), and sensitivity is calculated as TP/(TP + FN). In multiclass classification, each category in turn is treated as the positive class (with all other categories combined as the negative class), which yields several binary classifications. Macro-precision and macro-sensitivity are the arithmetic means of the individual binary precisions and of the individual binary sensitivities, respectively.
AI(H), artificial intelligence for hepatitis; FP, false positive; FN, false negative; TP, true positive
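
The footnote definitions can be illustrated with a minimal Python sketch that computes overall accuracy, macro-precision, and macro-sensitivity from a confusion matrix. The function name `multiclass_metrics`, the three-category confusion matrix, and the choice to score a category with no predictions (or no samples) as 0 are assumptions made for illustration; they are not taken from the study and do not reproduce the values in Table 4.

```python
from typing import List, Tuple


def multiclass_metrics(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return (overall accuracy, macro-precision, macro-sensitivity).

    confusion[i][j] is the number of samples whose true category is i
    and whose predicted category is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if n == 0 or total == 0:
        raise ValueError("confusion matrix must contain at least one sample")

    # Overall accuracy: correct predictions (the diagonal) over the grand total.
    correct = sum(confusion[k][k] for k in range(n))
    accuracy = correct / total

    precisions = []
    sensitivities = []
    for k in range(n):
        # Treat category k as the positive class, all others as negative.
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        # Assumption: an undefined ratio (zero denominator) is scored as 0.
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
        sensitivities.append(tp / (tp + fn) if (tp + fn) else 0.0)

    # Macro averages: unweighted arithmetic means over the categories.
    macro_precision = sum(precisions) / n
    macro_sensitivity = sum(sensitivities) / n
    return accuracy, macro_precision, macro_sensitivity


# Hypothetical counts for categories A, B, and C (rows: true, columns: predicted).
example = [
    [18, 1, 1],
    [2, 6, 2],
    [0, 1, 4],
]
accuracy, macro_precision, macro_sensitivity = multiclass_metrics(example)
print(f"{accuracy:.3f} {macro_precision:.3f} {macro_sensitivity:.3f}")
```

In this hypothetical example, 28 of 35 predictions are correct, so overall accuracy is 0.80, whereas the macro averages weight the small category C as heavily as the large category A. This is one way macro-precision and macro-sensitivity can fall well below overall accuracy when categories are imbalanced.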