Table. Satisfaction With AI and Clinician Responses and Association With the Length of Responses.
Division | AIᵃ: Assessments, No. | AI: Satisfaction estimate (SE)ᵇ | AI: No. of characters, mean (SD) | AI: Satisfaction and the length of response, standardized βᶜ | AI: Satisfaction and the length of response, P value | Clinicians: Assessments, No. | Clinicians: Satisfaction estimate (SE)ᵇ | Clinicians: No. of characters, mean (SD) | Clinicians: Satisfaction and the length of response, standardized βᶜ | Clinicians: Satisfaction and the length of response, P value |
---|---|---|---|---|---|---|---|---|---|---|
Overall | 213 | 3.96 (0.09) | 1470.77 (391.83) | 0.10 | .16 | 195 | 3.05 (0.09) | 254.37 (198.85) | 0.23 | .002 |
Cardiovascular | 78 | 4.09 (0.14) | 1559.04 (424.83) | 0.068 | .58 | 75 | 3.25 (0.14) | 306.36 (221.09) | 0.29 | .02 |
Internal medicine | 87 | 3.82 (0.13) | 1314.72 (347.11) | 0.037 | .72 | 78 | 2.94 (0.14) | 146.31 (109.43) | 0.0056 | .96 |
Endocrinology | 48 | 4.00 (0.19) | 1610.19 (330.87) | 0.25 | .08 | 42 | 2.90 (0.20) | 362.21 (200.79) | 0.31 | .09 |
Abbreviation: AI, artificial intelligence.
ᵃ Responses from Stanford GPT with prompts were assessed for satisfaction because they were graded as the best responses in terms of information quality and empathy.
ᵇ Satisfaction was assessed on a 5-point scale, with 1 being the lowest and 5 the highest. P values for the difference in satisfaction estimates between AI and clinicians were all P < .001 (overall, cardiovascular division, internal medicine division, and endocrinology division). Missing values were handled under a missing-at-random assumption in the statistical model (mixed-effects model).
ᶜ Because the unstandardized β coefficients were very small, we computed standardized β coefficients to present the strength of the association between the length of response and the satisfaction estimate. The standardized β coefficient measures the change, in SDs, of the satisfaction estimate per 1-SD increase in the length of response. The coefficients were adjusted for age, sex, race, and ethnicity.
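The rescaling described in the last footnote can be illustrated with a minimal sketch. It assumes a single predictor and ordinary least squares, which is simpler than the adjusted mixed-effects model reported in the table; the function names `ols_slope` and `standardized_beta` and the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Unstandardized least-squares slope of y on x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def standardized_beta(x, y):
    """Slope expressed in SD units: b * SD(x) / SD(y)."""
    return ols_slope(x, y) * stdev(x) / stdev(y)


# Hypothetical illustration data (NOT from the study):
# response length in characters and satisfaction on a 1-5 scale.
lengths = [120, 250, 310, 480, 900, 1500]
scores = [2, 3, 3, 4, 4, 5]

b = ols_slope(lengths, scores)             # change in score per extra character
beta = standardized_beta(lengths, scores)  # change in SDs of score per 1-SD of length

print(f"unstandardized slope per character: {b:.5f}")
print(f"standardized beta: {beta:.3f}")
```

The unstandardized slope is tiny because a single character is a very small unit of length, whereas the standardized coefficient does not depend on the unit in which length is measured. Adjusting for covariates, as the footnote describes, would require a multivariable model and is outside this sketch.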