Table 2. Comparison of mean scores between AI-generated and human-written questions.
Criterion | AI (mean ± SD) | Human (mean ± SD) | P |
---|---|---|---|
Appropriateness of the question | 7.72 ± 0.83 | 7.84 ± 0.65 | 0.45 |
Clarity and specificity | 7.56 ± 0.81 | 7.69 ± 0.55 | 0.34 |
Relevance | 7.56 ± 0.94 | 7.88 ± 0.52 | 0.04 |
Quality of the alternatives & discriminative power | 7.26 ± 0.68 | 7.36 ± 0.61 | 0.46 |
Suitability for graduate medical school exam | 7.25 ± 0.94 | 7.40 ± 0.72 | 0.39 |
Total score | 37.36 ± 3.92 | 38.16 ± 2.62 | 0.23 |
SD = standard deviation