2021 Aug 16;9(9):E1361–E1370. doi: 10.1055/a-1507-4980

Table 2. Algorithm testing on 1431 images, with training and test data split by patient.

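| Gold standard \ Deep learning framework | Colon_aphtae | Colon_debris | Colon_fissure | Colon_non-ulc. inflamm. | Colon_normal | Colon_ulcer | SB_aphtae | SB_fissure | SB_non-ulc. inflamm. | SB_normal | SB_debris | SB_lymph. hyp. | SB_ulcer | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Colon_aphtae | 34 | 2 | 0 | 3 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 45 |
| Colon_debris | 0 | 223 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 243 |
| Colon_fissure | 6 | 1 | 0 | 6 | 0 | 20 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 42 |
| Colon_non-ulc. inflamm. | 21 | 2 | 1 | 39 | 1 | 2 | 1 | 0 | 4 | 0 | 1 | 0 | 0 | 72 |
| Colon_normal | 0 | 6 | 0 | 0 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40 |
| Colon_ulcer | 9 | 1 | 0 | 8 | 0 | 27 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 51 |
| SB_aphtae | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 4 | 0 | 0 | 0 | 11 | 124 |
| SB_fissure | 0 | 0 | 1 | 0 | 0 | 5 | 3 | 50 | 1 | 0 | 3 | 0 | 14 | 77 |
| SB_non-ulc. inflamm. | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 5 |
| SB_normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 186 | 16 | 0 | 0 | 202 |
| SB_debris | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 374 | 0 | 1 | 392 |
| SB_lymph. hyp. | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 8 | 0 | 10 |
| SB_ulcer | 1 | 0 | 0 | 1 | 0 | 8 | 46 | 39 | 0 | 0 | 4 | 0 | 29 | 128 |
| Total | 71 | 236 | 2 | 58 | 50 | 65 | 164 | 105 | 9 | 203 | 404 | 8 | 56 | 1431 |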

The matrix shows the number of images in each combination of gold-standard class (rows) and deep learning framework prediction (columns), with categories split by location (colon or small bowel, SB). Each image was assigned to one of 13 predefined categories. Agreement between the gold standard and the deep learning framework was substantial (κ = 0.74).
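The reported κ can be recomputed from the matrix itself. The sketch below assumes the authors used unweighted Cohen's kappa, which the caption does not state. With N images, observed agreement is p_o = (sum of the diagonal) / N, chance agreement is p_e = sum over classes of (row total × column total) / N², and κ = (p_o - p_e) / (1 - p_e). The names `MATRIX`, `LABELS`, `cohens_kappa` and `per_class_recall` are illustrative and do not come from the paper.

```python
# Sketch: recompute agreement statistics from the Table 2 confusion matrix.
# Assumption: the reported kappa is unweighted Cohen's kappa (not stated in the caption).

LABELS = [
    "Colon_aphtae", "Colon_debris", "Colon_fissure", "Colon_non-ulc. inflamm.",
    "Colon_normal", "Colon_ulcer", "SB_aphtae", "SB_fissure",
    "SB_non-ulc. inflamm.", "SB_normal", "SB_debris", "SB_lymph. hyp.", "SB_ulcer",
]

# Rows: gold standard; columns: deep learning framework (same order as LABELS).
MATRIX = [
    [34, 2, 0, 3, 0, 3, 3, 0, 0, 0, 0, 0, 0],
    [0, 223, 0, 0, 15, 0, 0, 0, 0, 0, 5, 0, 0],
    [6, 1, 0, 6, 0, 20, 0, 9, 0, 0, 0, 0, 0],
    [21, 2, 1, 39, 1, 2, 1, 0, 4, 0, 1, 0, 0],
    [0, 6, 0, 0, 34, 0, 0, 0, 0, 0, 0, 0, 0],
    [9, 1, 0, 8, 0, 27, 0, 5, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 109, 0, 4, 0, 0, 0, 11],
    [0, 0, 1, 0, 0, 5, 3, 50, 1, 0, 3, 0, 14],
    [0, 0, 0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 186, 16, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 374, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8, 0],
    [1, 0, 0, 1, 0, 8, 46, 39, 0, 0, 4, 0, 29],
]


def cohens_kappa(matrix):
    """Return (observed agreement, chance agreement, unweighted Cohen's kappa)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(matrix[i][i] for i in range(k)) / n                      # diagonal / N
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


def per_class_recall(matrix, labels):
    """Fraction of each gold-standard class that the framework labelled correctly."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


p_o, p_e, kappa = cohens_kappa(MATRIX)
print(f"p_o = {p_o:.4f}, p_e = {p_e:.4f}, kappa = {kappa:.2f}")
# → p_o = 0.7778, p_e = 0.1490, kappa = 0.74

# Cross-check the transcribed counts against the Total row of Table 2.
col_totals = [sum(col) for col in zip(*MATRIX)]
assert col_totals == [71, 236, 2, 58, 50, 65, 164, 105, 9, 203, 404, 8, 56]
```

On these counts, 1113 of 1431 images lie on the diagonal (p_o = 7/9 ≈ 0.778), chance agreement is about 0.149, and κ ≈ 0.739, which rounds to the reported 0.74. The match is consistent with, but does not prove, the unweighted-kappa assumption. `per_class_recall` makes the weak classes explicit: Colon_fissure and SB_non-ulc. inflamm. have no correctly classified images.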