2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691

Table 2. Comparison of mean scores between questions generated by AI and by humans.

Criterion	AI (mean ± SD)	Human (mean ± SD)	P value
Appropriateness of the question	7.72 ± 0.83	7.84 ± 0.65	0.45
Clarity and specificity	7.56 ± 0.81	7.69 ± 0.55	0.34
Relevance	7.56 ± 0.94	7.88 ± 0.52	0.04
Quality of the alternatives & discriminative power	7.26 ± 0.68	7.36 ± 0.61	0.46
Suitability for graduate medical school exam	7.25 ± 0.94	7.40 ± 0.72	0.39
Total score	37.36 ± 3.92	38.16 ± 2.62	0.23

SD = standard deviation
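
As a quick consistency check on the table, the short sketch below sums the five criterion means in each column and compares the result with the reported total score. It assumes the total score is the plain sum of the five criteria, which the table suggests but does not state. The names `criterion_means`, `reported_totals`, `summed_means`, and `TOLERANCE` are illustrative and do not come from the source.

```python
# Criterion means copied from Table 2 as (AI, human) pairs.
criterion_means = {
    "Appropriateness of the question": (7.72, 7.84),
    "Clarity and specificity": (7.56, 7.69),
    "Relevance": (7.56, 7.88),
    "Quality of the alternatives & discriminative power": (7.26, 7.36),
    "Suitability for graduate medical school exam": (7.25, 7.40),
}

# Total scores as reported in the last row of Table 2 (AI, human).
reported_totals = (37.36, 38.16)

# Assumption: each of the five means and the total is rounded to two
# decimals, so the sum may differ from the total by up to 6 * 0.005 = 0.03.
TOLERANCE = 0.03


def summed_means(column):
    """Sum the criterion means for one column (0 = AI, 1 = human)."""
    return round(sum(pair[column] for pair in criterion_means.values()), 2)


ai_sum = summed_means(0)
human_sum = summed_means(1)

for label, total, reported in (
    ("AI", ai_sum, reported_totals[0]),
    ("Human", human_sum, reported_totals[1]),
):
    difference = abs(total - reported)
    status = "consistent" if difference <= TOLERANCE else "inconsistent"
    print(f"{label}: sum = {total:.2f}, reported = {reported:.2f}, "
          f"difference = {difference:.2f} ({status})")
```

The sums come to 37.35 (AI) and 38.17 (human), each within 0.01 of the reported totals, which is compatible with rounding of the individual means. The P values cannot be re-derived this way because the table does not give the number of questions per group.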