Table 1 –
Large Language Model | Accuracy, mean (95% CI) | Completeness, mean (95% CI) | Age-Appropriateness, mean (95% CI) | Possibility of Demographic Bias, mean (95% CI) | Overall Quality, mean (95% CI) |
---|---|---|---|---|---|
GPT-4 | 4.37 (4.27, 4.47) | 4.25 (4.16, 4.34) | 3.95 (3.81, 4.09) | 1.61 (1.49, 1.73) | 3.88 (3.75, 4.01) |
Gemini | 4.55 (4.45, 4.65) | 4.39 (4.28, 4.50) | 3.26 (3.09, 3.43) | 1.16 (1.11, 1.21) | 3.43 (3.26, 3.60) |
P-value | 0.08 | 0.15 | <0.001 | <0.001 | 0.004 |
CI = confidence interval
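
As a reading aid, the intervals in Table 1 can be checked for symmetry around their means and converted to an implied standard error. The sketch below is not the authors' analysis: the table does not say how the intervals were computed, so the normal-approximation multiplier `Z_95 = 1.96` is an assumption, and the names `TABLE_1`, `implied_standard_error`, and `is_symmetric` are hypothetical helpers introduced only for illustration.

```python
# Means and 95% CIs copied from Table 1 as (mean, lower, upper).
TABLE_1 = {
    ("GPT-4", "Accuracy"): (4.37, 4.27, 4.47),
    ("GPT-4", "Completeness"): (4.25, 4.16, 4.34),
    ("GPT-4", "Age-Appropriateness"): (3.95, 3.81, 4.09),
    ("GPT-4", "Possibility of Demographic Bias"): (1.61, 1.49, 1.73),
    ("GPT-4", "Overall Quality"): (3.88, 3.75, 4.01),
    ("Gemini", "Accuracy"): (4.55, 4.45, 4.65),
    ("Gemini", "Completeness"): (4.39, 4.28, 4.50),
    ("Gemini", "Age-Appropriateness"): (3.26, 3.09, 3.43),
    ("Gemini", "Possibility of Demographic Bias"): (1.16, 1.11, 1.21),
    ("Gemini", "Overall Quality"): (3.43, 3.26, 3.60),
}

# Assumption: intervals are mean +/- 1.96 * SE (normal approximation).
Z_95 = 1.96


def implied_standard_error(lower, upper, z=Z_95):
    """Back out the standard error from the width of a symmetric interval."""
    return (upper - lower) / (2 * z)


def is_symmetric(mean, lower, upper, tol=0.006):
    """True if the mean sits at the interval midpoint, up to 2-decimal rounding."""
    return abs((lower + upper) / 2 - mean) <= tol


for (model, metric), (mean, lower, upper) in TABLE_1.items():
    se = implied_standard_error(lower, upper)
    print(f"{model:6s} {metric:32s} mean={mean:.2f} "
          f"SE~{se:.3f} symmetric={is_symmetric(mean, lower, upper)}")
```

Every interval in the table is centered on its reported mean, which is consistent with (but does not prove) a mean ± multiplier × SE construction. If the intervals were instead based on a t distribution or a bootstrap, the implied standard errors would differ slightly.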