Table. Results of the Multinomial Logistic Regression Mixed-Effects Model to Estimate Odds of a Higher Pain Rating<sup>a</sup>
Variable | Odds ratio (95% CI) | P value | Interpretation |
---|---|---|---|
Sex of the rater | 1.10 (0.88-1.37) | .42 | No significant association was found between the age or sex of the rater and pain ratings |
Age of the rater | 0.99 (0.94-1.05) | .77 | |
Medical training year | 1.28 (1.11-1.47) | .001 | Trainees with higher training levels were more likely to give a higher pain rating |
Scenario in the case (kidney stone vs fracture) | 0.46 (0.37-0.58) | <.001 | Cases with fracture pain were more likely to receive higher pain ratings than cases with a kidney stone |
Rater = Gemini Pro vs trainee | 0.70 (0.47-1.03) | .07 | No significant difference was found between pain ratings from LLMs and those from trainees |
Rater = GPT-4 vs trainee | 1.07 (0.45-2.53) | .88 | |
(Race of the patient in the case = Black vs White) × false beliefs | 0.09 (0.01-0.64) | .02 | A significant interaction was found between race of the patient and presence of false beliefs, with Black patients less likely to receive a higher pain rating in the presence of false beliefs; this association is independent of the rater (ie, humans vs LLMs) |
(Rater = Gemini Pro vs trainee) × false beliefs | 0.71 (0.12-4.37) | .71 | |
(Rater = GPT-4 vs trainee) × false beliefs | 0.14 (0.00-1312.63) | .68 | |
Abbreviations: GPT-4, Generative Pre-trained Transformer 4; LLM, large language model.
<sup>a</sup>The variables of race and false beliefs are not presented on their own because their individual results are not interpretable in the presence of a significant interaction.
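
As a reading aid, the sketch below shows how an odds ratio and its 95% CI relate to a log-odds coefficient, a standard error, and a two-sided P value. It assumes Wald-type intervals that are symmetric on the log scale, which the table does not state, and the helper name `wald_from_or` is hypothetical. Because the table values are rounded, the recovered P values only approximate those reported. The GPT-4 × false beliefs row is omitted because its lower bound is rounded to 0.00 and has no logarithm.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate beta, SE, z, and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta - z_crit * SE) to exp(beta + z_crit * SE).
    """
    beta = math.log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Rows copied from the table as (odds ratio, lower bound, upper bound).
rows = {
    "Medical training year": (1.28, 1.11, 1.47),
    "Kidney stone vs fracture": (0.46, 0.37, 0.58),
    "Gemini Pro vs trainee": (0.70, 0.47, 1.03),
}

for name, (odds_ratio, low, high) in rows.items():
    beta, se, z, p = wald_from_or(odds_ratio, low, high)
    print(f"{name}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

An odds ratio below 1 gives a negative coefficient (lower odds of a higher pain rating), and a 95% CI that includes 1 corresponds to a P value above .05 under the same assumption.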