Table 3.
Mean, median, range, and standard deviation of scores (Likert-type scale, 1-10) for radiologists’ opinions on the difficulty level of each mock examination, their own performance, and that of the AI tool, together with the differences between the latter two scores
| Examination | Mean score | Median score | Range | Standard deviation |
|---|---|---|---|---|
| How representative of the FRCR was this examination? * | ||||
| 1 | 6.5 | 7.0 | 4-9 | 1.5 |
| 2 | 6.7 | 6.5 | 4-9 | 1.5 |
| 3 | 6.2 | 6.0 | 2-9 | 1.8 |
| 4 | 7.0 | 7.0 | 5-9 | 1.4 |
| 5 | 7.4 | 8.0 | 5-9 | 1.4 |
| 6 | 7.2 | 8.0 | 5-10 | 1.5 |
| 7 | 6.6 | 7.0 | 4-9 | 1.4 |
| 8 | 6.3 | 7.0 | 4-9 | 1.5 |
| 9 | 6.0 | 5.5 | 4-9 | 1.5 |
| 10 | 6.3 | 6.0 | 4-10 | 1.6 |
| How well do you think an AI model would perform on this examination? † | ||||
| 1 | 6.0 | 6.0 | 1-9 | 2.3 |
| 2 | 6.1 | 6.5 | 2-9 | 2.0 |
| 3 | 6.3 | 7.0 | 1-9 | 2.4 |
| 4 | 6.0 | 6.0 | 1-9 | 2.1 |
| 5 | 6.1 | 5.5 | 1-9 | 2.0 |
| 6 | 6.5 | 7.0 | 1-9 | 2.1 |
| 7 | 6.6 | 7.0 | 1-9 | 2.0 |
| 8 | 6.2 | 6.5 | 1-9 | 2.1 |
| 9 | 6.3 | 6.0 | 1-10 | 2.2 |
| 10 | 6.4 | 6.5 | 1-9 | 2.0 |
| How well do you think you performed on this examination? † | ||||
| 1 | 7.0 | 7.0 | 3-10 | 1.6 |
| 2 | 6.5 | 6.5 | 3-9 | 1.7 |
| 3 | 6.8 | 7.0 | 4-10 | 1.8 |
| 4 | 6.1 | 6.0 | 2-9 | 1.8 |
| 5 | 6.2 | 6.0 | 3-10 | 2.0 |
| 6 | 5.8 | 5.0 | 3-9 | 2.0 |
| 7 | 6.2 | 6.0 | 3-9 | 1.8 |
| 8 | 6.3 | 6.0 | 4-9 | 1.5 |
| 9 | 6.4 | 6.0 | 4-9 | 1.6 |
| 10 | 6.2 | 6.0 | 3-9 | 1.7 |
| Differences in scores between radiologists’ self-perception of performance and that of AI tool ‡ | ||||
| 1 | 1.0 | 1 | −4 to 6 | 2.7 |
| 2 | 0.3 | 0 | −4 to 6 | 2.5 |
| 3 | 0.5 | 0 | −4 to 7 | 2.7 |
| 4 | 0.1 | 0 | −5 to 6 | 2.3 |
| 5 | 0.0 | 0 | −5 to 6 | 2.4 |
| 6 | −0.7 | 0 | −6 to 3 | 2.3 |
| 7 | −0.5 | 0 | −6 to 5 | 2.4 |
| 8 | 0.1 | 0 | −4 to 5 | 2.3 |
| 9 | 0.1 | 0 | −5 to 6 | 2.3 |
| 10 | −0.2 | 0 | −6 to 6 | 2.3 |
AI=artificial intelligence; FRCR=Fellowship of the Royal College of Radiologists.
* 1=too easy; 5=about right; 10=too difficult.
† 1=everything incorrect; 5=half correct; 10=perfect, everything correct.
‡ Negative scores denote perception of AI performing better.
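
The sign convention in the final footnote can be checked against the table itself. The sketch below assumes the difference was taken per radiologist as the self-rated score minus the AI score, and that the same respondents answered both questions. Neither assumption is stated in the table. Under them, the reported mean difference for each examination should equal the self-rated mean minus the AI mean, up to rounding. The variable and function names are illustrative only.

```python
# Mean scores transcribed from Table 3, examinations 1 to 10.
self_mean = [7.0, 6.5, 6.8, 6.1, 6.2, 5.8, 6.2, 6.3, 6.4, 6.2]
ai_mean = [6.0, 6.1, 6.3, 6.0, 6.1, 6.5, 6.6, 6.2, 6.3, 6.4]
reported_diff = [1.0, 0.3, 0.5, 0.1, 0.0, -0.7, -0.5, 0.1, 0.1, -0.2]


def recomputed_differences(self_scores, ai_scores):
    """Self-rated mean minus AI mean, rounded to one decimal place.

    Assumed convention: difference = self - AI, so a negative value
    means the AI tool was expected to do better.
    """
    return [round(s - a, 1) for s, a in zip(self_scores, ai_scores)]


recomputed = recomputed_differences(self_mean, ai_mean)

# Largest gap between the recomputed and the reported mean difference.
# Each published mean is rounded to one decimal place, so small gaps
# are expected even if the assumptions hold exactly.
max_gap = max(abs(r - d) for r, d in zip(recomputed, reported_diff))

for exam, (r, d) in enumerate(zip(recomputed, reported_diff), start=1):
    print(f"Examination {exam}: recomputed {r:+.1f}, reported {d:+.1f}")
```

Agreement within rounding is consistent with the assumed convention but does not confirm how the authors computed the differences.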