PLoS One. 2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691

Table 3. Comparison between AI vs human with the same reference.

| Criterion | AI wins, n (%) | Human wins, n (%) | Equal, n (%) | Mean difference (AI–human) ± SD |
|---|---|---|---|---|
| Appropriateness of the question | 18 (36%) | 27 (54%) | 5 (10%) | -0.11 ± 1.05 |
| Clarity and specificity | 18 (36%) | 26 (52%) | 6 (12%) | -0.13 ± 1.08 |
| Relevance | 18 (36%) | 27 (54%) | 5 (10%) | -0.32 ± 1.04 |
| Quality of the alternatives & discriminative power | 21 (42%) | 26 (52%) | 3 (6%) | -0.10 ± 0.94 |
| Suitability for graduate medical school exam | 22 (44%) | 28 (56%) | 2 (4%) | -0.14 ± 1.12 |
| Total score | 20 (40%) | 30 (60%) | 0 (0%) | -0.80 ± 4.82 |
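
The arithmetic behind Table 3 can be sanity-checked with a short sketch. It assumes 50 question pairs (inferred from 18 wins corresponding to 36%; the number is not stated in the table itself) and that the total mean difference is the sum of the five per-criterion mean differences. The names `N_PAIRS`, `win_percentage`, and `total_mean_difference` are hypothetical helpers introduced only for this illustration and are not part of the study.

```python
# Per-criterion mean differences (AI minus human), copied from Table 3.
mean_diffs = {
    "Appropriateness of the question": -0.11,
    "Clarity and specificity": -0.13,
    "Relevance": -0.32,
    "Quality of the alternatives & discriminative power": -0.10,
    "Suitability for graduate medical school exam": -0.14,
}

REPORTED_TOTAL = -0.80  # "Total score" row of Table 3

# Assumption: 50 question pairs, inferred from 18 wins = 36%.
N_PAIRS = 50


def win_percentage(count, n=N_PAIRS):
    """Convert a win count into a percentage of all compared pairs."""
    return 100 * count / n


def total_mean_difference(diffs):
    """Sum the per-criterion mean differences, rounded to two decimals."""
    return round(sum(diffs.values()), 2)


if __name__ == "__main__":
    print("AI wins on total score (%):", win_percentage(20))
    print("Sum of per-criterion differences:", total_mean_difference(mean_diffs))
    print("Matches reported total:", total_mean_difference(mean_diffs) == REPORTED_TOTAL)
```

Under these assumptions the per-criterion differences add up to the reported total. The standard deviations cannot be combined the same way, since that would require the per-question scores.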