Table 2. Masked Reviewer Survey Responses for the Qualitative Evaluation of Digital Wound Assessments.

Values are No./total No. (%) except in P value rows.

| Question | Annotator | Site 1, R1 | Site 1, R2 | Site 1, R3 | Site 2, R1 | Site 2, R2 | Site 2, R3 |
|---|---|---|---|---|---|---|---|
| 1. Area tracing meets definition? | AI | 42/100 (42.0) | 53/110 (48.2) | 67/110 (60.9) | 65/89 (73.0) | 78/85 (91.8) | 47/89 (52.8) |
| | H1 | 65/100 (65.0) | 59/110 (53.6) | 82/110 (74.5) | 67/89 (75.3) | 79/85 (92.9) | 63/89 (70.8) |
| | H2 | 51/100 (51.0) | 53/110 (48.2) | 72/110 (65.5) | 65/89 (73.0) | 82/85 (96.5) | 55/89 (61.8) |
| | P value | .01a | .73 | .11 | .88 | .41 | .04a |
| 2. Which is AI? | AI | 37/105 (35.2) | 42/109 (38.5) | 42/109 (38.5) | 3/89 (3.4) | 42/85 (49.4) | 24/89 (27.0) |
| | H1 | 39/105 (37.1) | 27/109 (24.8) | 33/109 (30.3) | 36/89 (40.4) | 20/85 (23.5) | 44/89 (49.4) |
| | H2 | 29/105 (27.6) | 40/109 (36.7) | 34/109 (31.2) | 50/89 (56.2) | 23/85 (27.1) | 21/89 (23.6) |
| | P value | .51 | .21 | .48 | <.001a | .004a | .004a |
| 3. Which is most accurate? | AI | 32/91 (35.2) | 39/108 (36.1) | 35/109 (32.1) | 19/89 (21.3) | 25/85 (29.4) | 24/89 (27.0) |
| | H1 | 27/91 (29.7) | 32/108 (29.6) | 42/109 (38.5) | 48/89 (53.9) | 38/85 (44.7) | 44/89 (49.4) |
| | H2 | 32/91 (35.2) | 37/108 (34.3) | 32/109 (29.4) | 22/89 (24.7) | 22/85 (25.9) | 21/89 (23.6) |
| | P value | .78 | .76 | .45 | <.001a | .04a | .004a |

Abbreviations: AI, artificial intelligence; H, human; R, reviewer.
a Statistically significant (P < .05). Fisher exact test P values indicate a difference in the frequency of yes answers for question 1 between AI and human traces; χ2 P values indicate bias in the frequency of selection vs random selection.
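
As an illustration of the χ2 comparison against random selection, the following minimal sketch computes a goodness-of-fit statistic for one cell of the table (question 2, Site 2, R1). It assumes that random selection means equal expected counts of one-third per annotator, which the table does not state, so the authors' exact procedure may differ and the resulting values need not reproduce the tabulated P values. The function names `chi_square_uniform` and `p_value_df2` are hypothetical helpers introduced only for this example.

```python
import math


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Assumption: random selection gives every annotator the same expected count.
    """
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2),
    so no statistics library is needed for three categories.
    """
    return math.exp(-statistic / 2)


# Question 2, Site 2, R1: how often AI, H1, and H2 were picked as the AI trace.
counts = [3, 36, 50]
statistic = chi_square_uniform(counts)
p_value = p_value_df2(statistic)
print(f"chi2 = {statistic:.2f}, P = {p_value:.2g}")
```

Under this equal-probability assumption the P value for this cell falls below .001, in line with the tabulated entry. The closed-form P value applies only to three categories (2 degrees of freedom); other category counts would need a general χ2 distribution function.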