2022 Dec 21;379:e072826. doi: 10.1136/bmj-2022-072826

Table 3.

Summary of mean, median, range, and standard deviation of scores (Likert-type scale, 1-10) for radiologists’ opinions on the difficulty level of each mock examination, on their own performance, and on that of the AI tool

Examination	Mean score	Median score	Range	Standard deviation
How representative of the FRCR was this examination? ^*
1	6.5	7.0	4-9	1.5
2	6.7	6.5	4-9	1.5
3	6.2	6.0	2-9	1.8
4	7.0	7.0	5-9	1.4
5	7.4	8.0	5-9	1.4
6	7.2	8.0	5-10	1.5
7	6.6	7.0	4-9	1.4
8	6.3	7.0	4-9	1.5
9	6.0	5.5	4-9	1.5
10	6.3	6.0	4-10	1.6
How well do you think an AI model would perform on this examination? ^†
1	6.0	6.0	1-9	2.3
2	6.1	6.5	2-9	2.0
3	6.3	7.0	1-9	2.4
4	6.0	6.0	1-9	2.1
5	6.1	5.5	1-9	2.0
6	6.5	7.0	1-9	2.1
7	6.6	7.0	1-9	2.0
8	6.2	6.5	1-9	2.1
9	6.3	6.0	1-10	2.2
10	6.4	6.5	1-9	2.0
How well do you think you performed on this examination? ^†
1	7.0	7.0	3-10	1.6
2	6.5	6.5	3-9	1.7
3	6.8	7.0	4-10	1.8
4	6.1	6.0	2-9	1.8
5	6.2	6.0	3-10	2.0
6	5.8	5.0	3-9	2.0
7	6.2	6.0	3-9	1.8
8	6.3	6.0	4-9	1.5
9	6.4	6.0	4-9	1.6
10	6.2	6.0	3-9	1.7
Differences between radiologists’ scores for their own performance and for that of the AI tool ^‡
1	1.0	1	−4 to 6	2.7
2	0.3	0	−4 to 6	2.5
3	0.5	0	−4 to 7	2.7
4	0.1	0	−5 to 6	2.3
5	0.0	0	−5 to 6	2.4
6	−0.7	0	−6 to 3	2.3
7	−0.5	0	−6 to 5	2.4
8	0.1	0	−4 to 5	2.3
9	0.1	0	−5 to 6	2.3
10	−0.2	0	−6 to 6	2.3

AI=artificial intelligence; FRCR=Fellowship of the Royal College of Radiologists.

^* 1=too easy; 5=about right; 10=too difficult.

^† 1=everything incorrect; 5=half correct; 10=perfect, everything correct.

^‡ Score given for the AI tool subtracted from the radiologist’s self-rated score; negative scores denote perception of AI performing better.
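The sign convention in the ‡ footnote can be cross-checked against the mean columns. If every radiologist answered both questions, the mean of the per-radiologist differences equals the difference of the two means, up to rounding. The sketch below is a minimal Python check. It assumes that differences were computed as self-rated score minus AI score and that no responses were missing, which the table does not state explicitly. The lists are transcribed from the mean columns above, and `mismatches` is a hypothetical helper name.

```python
# Mean scores transcribed from the table (mock examinations 1-10).
SELF_MEAN = [7.0, 6.5, 6.8, 6.1, 6.2, 5.8, 6.2, 6.3, 6.4, 6.2]  # own performance
AI_MEAN = [6.0, 6.1, 6.3, 6.0, 6.1, 6.5, 6.6, 6.2, 6.3, 6.4]    # AI tool performance
DIFF_MEAN = [1.0, 0.3, 0.5, 0.1, 0.0, -0.7, -0.5, 0.1, 0.1, -0.2]  # tabulated mean difference

# Each mean is rounded to one decimal place (error <= 0.05), so
# (self - AI) can be off by up to 0.10 and the tabulated difference by 0.05.
TOLERANCE = 0.15


def mismatches(sign):
    """Return exam numbers where sign * (self - AI) disagrees with the
    tabulated mean difference by more than the rounding tolerance.
    sign=+1 tests the self - AI convention; sign=-1 tests AI - self."""
    return [
        exam
        for exam, (s, a, d) in enumerate(zip(SELF_MEAN, AI_MEAN, DIFF_MEAN), start=1)
        if abs(sign * (s - a) - d) > TOLERANCE
    ]


print(mismatches(+1))  # → []
print(mismatches(-1))  # → [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

Under the self-minus-AI convention every examination agrees within rounding. The reversed convention fails on nine of ten examinations; examination 5 passes only because its mean difference is close to zero.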