Table 2. Algorithm testing on 1431 images with a patient split used for training.
| Gold standard (rows) / Deep learning framework (columns) | Colon_aphtae | Colon_debris | Colon_fissure | Colon_non-ulc. inflamm. | Colon_normal | Colon_ulcer | SB_aphtae | SB_fissure | SB_non-ulc. inflamm. | SB_normal | SB_debris | SB_lymph. hyp. | SB_ulcer | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Colon_aphtae | 34 | 2 | 0 | 3 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 45 |
| Colon_debris | 0 | 223 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 243 |
| Colon_fissure | 6 | 1 | 0 | 6 | 0 | 20 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 42 |
| Colon_non-ulc. inflamm. | 21 | 2 | 1 | 39 | 1 | 2 | 1 | 0 | 4 | 0 | 1 | 0 | 0 | 72 |
| Colon_normal | 0 | 6 | 0 | 0 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 40 |
| Colon_ulcer | 9 | 1 | 0 | 8 | 0 | 27 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 51 |
| SB_aphtae | 0 | 0 | 0 | 0 | 0 | 0 | 109 | 0 | 4 | 0 | 0 | 0 | 11 | 124 |
| SB_fissure | 0 | 0 | 1 | 0 | 0 | 5 | 3 | 50 | 1 | 0 | 3 | 0 | 14 | 77 |
| SB_non-ulc. inflamm. | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 5 |
| SB_normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 186 | 16 | 0 | 0 | 202 |
| SB_debris | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 374 | 0 | 1 | 392 |
| SB_lymph. hyp. | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 8 | 0 | 10 |
| SB_ulcer | 1 | 0 | 0 | 1 | 0 | 8 | 46 | 39 | 0 | 0 | 4 | 0 | 29 | 128 |
| Total | 71 | 236 | 2 | 58 | 50 | 65 | 164 | 105 | 9 | 203 | 404 | 8 | 56 | 1431 |
The matrix shows the number of images by gold-standard classification (rows) and deep learning framework classification (columns), grouped by location (colon or small bowel, SB). Lesions were assigned to one of 13 predefined categories. The inter-modality agreement was substantial (κ = 0.74).
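
The reported agreement can be checked directly from the counts in the matrix. The following is a minimal sketch that assumes the reported κ is the unweighted Cohen's kappa (the table does not state which variant was used); the names `LABELS`, `MATRIX`, and `cohen_kappa` are illustrative and not taken from the source.

```python
# Counts transcribed from Table 2.
# Rows: gold standard; columns: deep learning framework (same category order).
LABELS = [
    "Colon_aphtae", "Colon_debris", "Colon_fissure", "Colon_non-ulc. inflamm.",
    "Colon_normal", "Colon_ulcer", "SB_aphtae", "SB_fissure",
    "SB_non-ulc. inflamm.", "SB_normal", "SB_debris", "SB_lymph. hyp.", "SB_ulcer",
]
MATRIX = [
    [34, 2, 0, 3, 0, 3, 3, 0, 0, 0, 0, 0, 0],
    [0, 223, 0, 0, 15, 0, 0, 0, 0, 0, 5, 0, 0],
    [6, 1, 0, 6, 0, 20, 0, 9, 0, 0, 0, 0, 0],
    [21, 2, 1, 39, 1, 2, 1, 0, 4, 0, 1, 0, 0],
    [0, 6, 0, 0, 34, 0, 0, 0, 0, 0, 0, 0, 0],
    [9, 1, 0, 8, 0, 27, 0, 5, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 109, 0, 4, 0, 0, 0, 11],
    [0, 0, 1, 0, 0, 5, 3, 50, 1, 0, 3, 0, 14],
    [0, 0, 0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 186, 16, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 374, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8, 0],
    [1, 0, 0, 1, 0, 8, 46, 39, 0, 0, 4, 0, 29],
]


def cohen_kappa(matrix):
    """Unweighted Cohen's kappa for a square confusion matrix of counts."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of images on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Chance agreement: sum over categories of (row total * column total) / n^2.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


total = sum(sum(row) for row in MATRIX)
agreed = sum(MATRIX[i][i] for i in range(len(MATRIX)))
kappa = cohen_kappa(MATRIX)

print(f"images: {total}, concordant: {agreed}, accuracy: {agreed / total:.3f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

Under this assumption, the 1113 concordant images out of 1431 give an overall accuracy of about 0.78 and κ ≈ 0.74, consistent with the value stated above.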