Table 2: Comparisons of the Primary and Secondary Outcomes by Physicians with LLM and with Conventional Resources Only (Scores standardized to 0–100)
| Outcomes | Physicians+LLM, N = 178 | Physicians + Conventional Resources Only, N = 197 | Difference between Physicians+GPT-4 and Physicians+Conventional Resources Only (based on Generalized Mixed Effect Model, 95% Confidence Interval, p-value) |
|---|---|---|---|
| Primary Outcome | | | |
| Total Score (n) | 178 | 197 | 6.5 (2.7 to 10.2), p<0.001 |
| Mean (SD) | 43.0 (17.3) | 35.7 (15.5) | |
| Median [IQR] | 41.3 [30.6 to 54.1] | 34.4 [22.5 to 47.8] | |
| Secondary Outcomes | | | |
| Management (n) | 178 | 197 | 6.1 (2.5 to 9.7), p=0.001 |
| Mean (SD) | 40.5 (19.1) | 33.4 (17.3) | |
| Median [IQR] | 37.5 [26.8 to 52.4] | 30.0 [19.3 to 45.5] | |
| Factual (n) | 69 | 78 | 9.6 (−3.1 to 22.3), p=0.14 |
| Mean (SD) | 62.9 (37.6) | 53.8 (39.6) | |
| Median [IQR] | 75.0 [37.5 to 100.0] | 56.2 [15.6 to 100.0] | |
| Diagnostic (n) | 72 | 77 | 12.1 (3.1 to 21.0), p=0.009 |
| Mean (SD) | 56.8 (37.6) | 45.8 (26.7) | |
| Median [IQR] | 66.7 [29.2 to 83.3] | 50.0 [33.3 to 66.7] | |
| Specific (n) | 178 | 197 | 6.2 (2.4 to 9.9), p=0.002 |
| Mean (SD) | 42.4 (20.2) | 34.9 (17.9) | |
| Median [IQR] | 42.6 [28.1 to 57.4] | 35.2 [20.8 to 48.5] | |
| General (n) | 70 | 80 | 3.3 (−1.3 to 7.9), p=0.2 |
| Mean (SD) | 29.4 (15.0) | 26.5 (13.0) | |
| Median [IQR] | 27.3 [18.2 to 39.8] | 24.6 [17.5 to 33.3] | |
| Time Spent in Seconds (n) | 178 | 197 | 119.3 (17.4 to 221.2), p=0.022 |
| Mean (SD) | 801.5 (417.2) | 690.2 (372.4) | |
| Median [IQR] | 719.8 [514.6 to 1,010.2] | 570.9 [452.9 to 814.9] | |