[Preprint]. 2024 Aug 7:2024.08.05.24311485. [Version 1] doi: 10.1101/2024.08.05.24311485

Table 2:

Comparison of Primary and Secondary Outcomes for Physicians Using an LLM versus Conventional Resources Only (scores standardized to 0–100)

Outcomes	Physicians + LLM, N = 178¹	Physicians + Conventional Resources Only, N = 197¹	Difference, Physicians + GPT-4 minus Physicians + Conventional Resources Only (generalized mixed-effects model estimate, 95% confidence interval, p-value)
Primary Outcome
Total Score (n)	178	197	6.5 (2.7 to 10.2), p<0.001
Mean (SD)	43.0 (17.3)	35.7 (15.5)
Median [IQR]	41.3 [30.6 to 54.1]	34.4 [22.5 to 47.8]
Secondary Outcomes
Management (n)	178	197	6.1 (2.5 to 9.7), p=0.001
Mean (SD)	40.5 (19.1)	33.4 (17.3)
Median [IQR]	37.5 [26.8 to 52.4]	30.0 [19.3 to 45.5]
Factual (n)	69	78	9.6 (−3.1 to 22.3), p=0.14
Mean (SD)	62.9 (37.6)	53.8 (39.6)
Median [IQR]	75.0 [37.5 to 100.0]	56.2 [15.6 to 100.0]
Diagnostic (n)	72	77	12.1 (3.1 to 21.0), p=0.009
Mean (SD)	56.8 (37.6)	45.8 (26.7)
Median [IQR]	66.7 [29.2 to 83.3]	50.0 [33.3 to 66.7]
Specific (n)	178	197	6.2 (2.4 to 9.9), p=0.002
Mean (SD)	42.4 (20.2)	34.9 (17.9)
Median [IQR]	42.6 [28.1 to 57.4]	35.2 [20.8 to 48.5]
General (n)	70	80	3.3 (−1.3 to 7.9), p=0.2
Mean (SD)	29.4 (15.0)	26.5 (13.0)
Median [IQR]	27.3 [18.2 to 39.8]	24.6 [17.5 to 33.3]
Time Spent in Seconds (n)	178	197	119.3 (17.4 to 221.2), p=0.022
Mean (SD)	801.5 (417.2)	690.2 (372.4)
Median [IQR]	719.8 [514.6 to 1,010.2]	570.9 [452.9 to 814.9]
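The differences in the last column are model-based estimates, so they need not equal the simple difference between the two group means (for example, 43.0 − 35.7 = 7.3 for the total score, versus the reported 6.5). The sketch below sets the unadjusted mean differences, computed only from the means in Table 2, beside the reported model estimates. The names `REPORTED_MEANS`, `unadjusted_difference`, and `comparison_rows` are hypothetical illustrations. The table does not state the mixed-effects model's exact specification, so any reason for the gap beyond "the model is not a raw mean difference" (for instance, accounting for repeated measures per participant) is an assumption.

```python
# Group means as reported in Table 2 (scores on a 0-100 scale; time in seconds).
# Tuple layout: (Physicians + LLM mean, Conventional Resources Only mean,
#                reported generalized mixed-effects model estimate).
# Hypothetical data structure for illustration; values copied from the table.
REPORTED_MEANS = {
    "Total Score": (43.0, 35.7, 6.5),
    "Management": (40.5, 33.4, 6.1),
    "Factual": (62.9, 53.8, 9.6),
    "Diagnostic": (56.8, 45.8, 12.1),
    "Specific": (42.4, 34.9, 6.2),
    "General": (29.4, 26.5, 3.3),
    "Time Spent in Seconds": (801.5, 690.2, 119.3),
}


def unadjusted_difference(llm_mean: float, conventional_mean: float) -> float:
    """Raw difference in group means (LLM minus conventional), rounded to 1 decimal.

    This ignores any model adjustment and is NOT the estimate reported in the table.
    """
    return round(llm_mean - conventional_mean, 1)


def comparison_rows() -> list[tuple[str, float, float]]:
    """Return (outcome, unadjusted difference, reported model estimate) per outcome."""
    rows = []
    for outcome, (llm, conventional, model_estimate) in REPORTED_MEANS.items():
        rows.append((outcome, unadjusted_difference(llm, conventional), model_estimate))
    return rows


if __name__ == "__main__":
    for outcome, raw, model in comparison_rows():
        print(f"{outcome}: unadjusted {raw:+.1f} vs model-based {model:+.1f}")
```

In the time row, for instance, the reported model-based difference of 119.3 seconds is roughly 2 minutes. The unadjusted mean difference is 111.3 seconds.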