2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691

Table 2. Comparison of mean scores between questions generated by AI and by humans.

| Criterion | AI (mean ± SD) | Human (mean ± SD) | P |
|---|---|---|---|
| Appropriateness of the question | 7.72 ± 0.83 | 7.84 ± 0.65 | 0.45 |
| Clarity and specificity | 7.56 ± 0.81 | 7.69 ± 0.55 | 0.34 |
| Relevance | 7.56 ± 0.94 | 7.88 ± 0.52 | 0.04 |
| Quality of the alternatives & discriminative power | 7.26 ± 0.68 | 7.36 ± 0.61 | 0.46 |
| Suitability for graduate medical school exam | 7.25 ± 0.94 | 7.40 ± 0.72 | 0.39 |
| Total score | 37.36 ± 3.92 | 38.16 ± 2.62 | 0.23 |

SD = standard deviation
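
As a quick consistency check on Table 2, the sketch below adds up the five per-criterion means and compares the sums with the reported total scores. It assumes the total score is the simple sum of the five criterion scores, which the table does not state explicitly; the variable and function names are illustrative only.

```python
# Mean scores copied from Table 2, in row order (five criteria).
ai_means = [7.72, 7.56, 7.56, 7.26, 7.25]
human_means = [7.84, 7.69, 7.88, 7.36, 7.40]

# Reported "Total score" means from the last row of Table 2.
reported_total = {"AI": 37.36, "Human": 38.16}


def total_gap(means, reported):
    """Difference between the summed criterion means and the reported total.

    Assumption: the total is the sum of the five criteria.
    """
    return round(sum(means) - reported, 2)


gaps = {
    "AI": total_gap(ai_means, reported_total["AI"]),
    "Human": total_gap(human_means, reported_total["Human"]),
}
print(gaps)
```

Under that assumption, both sums fall within 0.01 of the reported totals, a gap that is consistent with each mean having been rounded to two decimals.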