2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691

Table 3. Comparison between AI vs human with the same reference.

	AI wins	Human wins	Equal	Mean difference (AI–human) ± SD
Appropriateness of the question	18 (36%)	27 (54%)	5 (10%)	-0.11 ± 1.05
Clarity and specificity	18 (36%)	26 (52%)	6 (12%)	-0.13 ± 1.08
Relevance	18 (36%)	27 (54%)	5 (10%)	-0.32 ± 1.04
Quality of the alternatives & discriminative power	21 (42%)	26 (52%)	3 (6%)	-0.10 ± 0.94
Suitability for graduate medical school exam	22 (44%)	28 (56%)	2 (4%)	-0.14 ± 1.12
Total score	20 (40%)	30 (60%)	0 (0%)	-0.80 ± 4.82