[Preprint]. 2024 Aug 7:2024.08.05.24311485. [Version 1] doi: 10.1101/2024.08.05.24311485

Table 2:

Comparisons of the Primary and Secondary Outcomes by Physicians with LLM and with Conventional Resources Only (Scores standardized to 0–100)

| Outcomes | Physicians+LLM, N = 178¹ | Physicians + Conventional Resources Only, N = 197¹ | Difference between Physicians+GPT-4 and Physicians+Conventional Resources Only (based on Generalized Mixed Effect Model, 95% Confidence Interval, p-value) |
| --- | --- | --- | --- |
| Primary Outcome | | | |
| Total Score (n) | 178 | 197 | 6.5 (2.7 to 10.2), p<0.001 |
|   Mean (SD) | 43.0 (17.3) | 35.7 (15.5) | |
|   Median [IQR] | 41.3 [30.6 to 54.1] | 34.4 [22.5 to 47.8] | |
| Secondary Outcomes | | | |
| Management (n) | 178 | 197 | 6.1 (2.5 to 9.7), p=0.001 |
|   Mean (SD) | 40.5 (19.1) | 33.4 (17.3) | |
|   Median [IQR] | 37.5 [26.8 to 52.4] | 30.0 [19.3 to 45.5] | |
| Factual (n) | 69 | 78 | 9.6 (−3.1 to 22.3), p=0.14 |
|   Mean (SD) | 62.9 (37.6) | 53.8 (39.6) | |
|   Median [IQR] | 75.0 [37.5 to 100.0] | 56.2 [15.6 to 100.0] | |
| Diagnostic (n) | 72 | 77 | 12.1 (3.1 to 21.0), p=0.009 |
|   Mean (SD) | 56.8 (37.6) | 45.8 (26.7) | |
|   Median [IQR] | 66.7 [29.2 to 83.3] | 50.0 [33.3 to 66.7] | |
| Specific (n) | 178 | 197 | 6.2 (2.4 to 9.9), p=0.002 |
|   Mean (SD) | 42.4 (20.2) | 34.9 (17.9) | |
|   Median [IQR] | 42.6 [28.1 to 57.4] | 35.2 [20.8 to 48.5] | |
| General (n) | 70 | 80 | 3.3 (−1.3 to 7.9), p=0.2 |
|   Mean (SD) | 29.4 (15.0) | 26.5 (13.0) | |
|   Median [IQR] | 27.3 [18.2 to 39.8] | 24.6 [17.5 to 33.3] | |
| Time Spent in Seconds (n) | 178 | 197 | 119.3 (17.4 to 221.2), p=0.022 |
|   Mean (SD) | 801.5 (417.2) | 690.2 (372.4) | |
|   Median [IQR] | 719.8 [514.6 to 1,010.2] | 570.9 [452.9 to 814.9] | |
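
The difference column in Table 2 is model-based, so it need not equal the simple difference between the two group means. The sketch below makes that comparison explicit using only values transcribed from the table. The dictionary layout and the helper name `raw_difference` are illustrative assumptions, not part of the study's analysis, and the code does not refit the mixed-effects model.

```python
# Values transcribed from Table 2:
# (mean with LLM, mean with conventional resources, reported model-based difference)
table2 = {
    "Total Score": (43.0, 35.7, 6.5),
    "Management": (40.5, 33.4, 6.1),
    "Factual": (62.9, 53.8, 9.6),
    "Diagnostic": (56.8, 45.8, 12.1),
    "Specific": (42.4, 34.9, 6.2),
    "General": (29.4, 26.5, 3.3),
    "Time Spent in Seconds": (801.5, 690.2, 119.3),
}


def raw_difference(llm_mean, conventional_mean):
    """Unadjusted difference of group means, rounded to one decimal place.

    Hypothetical helper for illustration only.
    """
    return round(llm_mean - conventional_mean, 1)


# Unadjusted difference for every outcome in the table.
raw_diffs = {
    name: raw_difference(llm, conv) for name, (llm, conv, _) in table2.items()
}

# Print the unadjusted and the reported model-based difference side by side.
for name, (_, _, model_diff) in table2.items():
    print(f"{name}: raw {raw_diffs[name]:+.1f}, model-based {model_diff:+.1f}")
```

For the primary outcome, the unadjusted gap between the means is larger than the reported model-based estimate of 6.5 points. A gap of this kind can arise when a model accounts for structure in the data that a simple difference of means ignores.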