JAMA Netw Open. 2024 Oct 7;7(10):e2437977. doi: 10.1001/jamanetworkopen.2024.37977

Table. Results of the Multinomial Logistic Regression Mixed-Effects Model to Estimate Odds of a Higher Pain Rating^a

Variable | Odds ratio (95% CI) | P value | Interpretation
Sex of the rater | 1.10 (0.88-1.37) | .42 | No significance between age or sex of the rater and pain ratings
Age of the rater | 0.99 (0.94-1.05) | .77 |
Medical training year | 1.28 (1.11-1.47) | .001 | Trainees with higher training levels were more likely to give a higher pain rating
Scenario in the case (kidney stone vs fracture) | 0.46 (0.37-0.58) | <.001 | Cases with fracture pain were more likely to receive higher pain ratings than cases with a kidney stone
Rater = Gemini Pro vs trainee | 0.70 (0.47-1.03) | .07 | No significant difference was found between the pain ratings from LLMs vs trainees
Rater = GPT-4 vs trainee | 1.07 (0.45-2.53) | .88 |
(Race of the patient in the case = Black vs White) × false beliefs | 0.09 (0.01-0.64) | .02 | A significant interaction was found between race of the patient and presence of false beliefs, with Black patients less likely to receive a higher pain rating in the presence of false beliefs; this association is independent of the rater (ie, humans vs LLMs)
(Rater = Gemini Pro vs trainee) × false beliefs | 0.71 (0.12-4.37) | .71 |
(Rater = GPT-4 vs trainee) × false beliefs | 0.14 (0.00-1312.63) | .68 |

Abbreviations: GPT-4, Generative Pre-trained Transformer 4; LLM, large language model.

^a The variables of race and false beliefs are not presented by themselves due to noninterpretability of their results in the presence of significant interaction.
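
As a reading aid, the odds ratios and 95% CIs in the table can be converted back to the log-odds scale to see roughly how each P value arises. The sketch below is not the authors' analysis code. It assumes the intervals are Wald intervals (estimate ± 1.96 standard errors on the log scale), which the table does not state, and the function name `wald_from_or` is hypothetical. Rows whose reported lower bound rounds to 0.00 cannot be back-calculated this way.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate log-odds, SE, z, and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval on the log-odds scale (an assumption, not
    something the table confirms).
    """
    if min(odds_ratio, ci_low, ci_high) <= 0:
        raise ValueError("odds ratio and CI bounds must be positive")
    beta = math.log(odds_ratio)  # coefficient on the log-odds scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, z, p


# Values copied from three rows of the table.
rows = {
    "Medical training year": (1.28, 1.11, 1.47),
    "Scenario (kidney stone vs fracture)": (0.46, 0.37, 0.58),
    "Rater = Gemini Pro vs trainee": (0.70, 0.47, 1.03),
}

for name, (or_value, low, high) in rows.items():
    beta, se, z, p = wald_from_or(or_value, low, high)
    print(f"{name}: log-odds={beta:.3f}, SE={se:.3f}, z={z:.2f}, P={p:.3f}")
```

Because these inputs are rounded to 2 decimal places, the recovered P values only approximate the published ones. An odds ratio below 1 can also be read in the opposite direction by taking its reciprocal: 0.46 for kidney stone vs fracture corresponds to about 2.17 for fracture vs kidney stone.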