2025 May 30;25:150. doi: 10.1186/s12874-025-02532-2

Table 2.

Overview of survey results

| Category | Value | Total: Overall | Total: Chatbot | Total: Human | p-value (Chatbot vs. Human) | Chatbot: ChatFlash | Chatbot: ChatGPT 3.5 | Chatbot: ChatGPT 4.0 | Chatbot: ZenoChat | Human: Researcher A | Human: Researcher B | p-value (overall) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Addition | yes | 102 | 96 (18.18%) | 6 (2.27%) | < 0.001* | 16 | 39 | 30 | 11 | 6 | 0 | < 0.001* |
| | no | 690 | 432 (81.82%) | 258 (97.73%) | | 116 | 93 | 102 | 121 | 126 | 132 | |
| Completeness | complete | 560 | 421 (79.73%) | 139 (52.65%) | < 0.001* | 108 | 108 | 104 | 101 | 65 | 74 | < 0.001* |
| | partial | 131 | 67 (12.69%) | 64 (24.24%) | | 17 | 18 | 14 | 18 | 34 | 30 | |
| | incomplete | 101 | 40 (7.58%) | 61 (23.11%) | | 7 | 6 | 14 | 13 | 33 | 28 | |
| Context | correct | 712 | 488 (92.42%) | 224 (84.85%) | 0.001* | 125 | 116 | 123 | 124 | 110 | 114 | 0.007* |
| | incorrect | 80 | 40 (7.58%) | 40 (15.15%) | | 7 | 16 | 9 | 8 | 22 | 18 | |
| Correctness | correct | 574 | 390 (73.86%) | 184 (69.70%) | 0.116 | 102 | 90 | 90 | 108 | 91 | 93 | 0.051 |
| | partial | 119 | 83 (15.72%) | 36 (13.64%) | | 20 | 25 | 26 | 12 | 16 | 20 | |
| | incorrect | 99 | 55 (10.42%) | 44 (16.67%) | | 10 | 17 | 16 | 12 | 25 | 19 | |
| Interpretation | yes | 105 | 98 (18.56%) | 7 (2.65%) | < 0.001* | 18 | 30 | 43 | 7 | 5 | 2 | < 0.001* |
| | no | 687 | 430 (81.44%) | 257 (97.35%) | | 114 | 102 | 89 | 125 | 127 | 130 | |
| Length | too short | 93 | 19 (3.60%) | 74 (28.03%) | < 0.001* | 1 | 3 | 4 | 11 | 38 | 36 | < 0.001* |
| | perfect | 537 | 363 (68.75%) | 174 (65.91%) | | 104 | 84 | 86 | 89 | 89 | 85 | |
| | too long | 162 | 146 (27.65%) | 16 (6.06%) | | 27 | 45 | 42 | 32 | 5 | 11 | |

Significant differences between groups are indicated by *. Results are presented as absolute counts unless indicated otherwise; percentages are column percentages, i.e. each count divided by the total for its group (chatbot or human).
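
As a rough illustration of how the column percentages in Table 2 arise, the following Python sketch recomputes them for the "Addition" category from the reported counts. It also runs a Pearson chi-square test on the same 2×2 table. The choice of test is an assumption made for illustration only: this table does not state which test produced the reported p-values, and the function names `column_percentages` and `pearson_chi2_2x2` are hypothetical helpers, not part of the study.

```python
import math

# Counts copied from Table 2, "Addition" rows: (chatbot, human).
addition = {"yes": (96, 6), "no": (432, 258)}


def column_percentages(table):
    """Return each cell as a percentage of its column total, rounded to 2 places."""
    totals = [sum(row[j] for row in table.values()) for j in range(2)]
    return {
        label: tuple(round(100 * row[j] / totals[j], 2) for j in range(2))
        for label, row in table.items()
    }


def pearson_chi2_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value.

    Assumption: this is an illustrative test choice, not necessarily the
    one used to produce the p-values reported in the table.
    """
    rows = list(table.values())
    n = sum(sum(r) for r in rows)
    row_totals = [sum(r) for r in rows]
    col_totals = [rows[0][j] + rows[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (rows[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


percentages = column_percentages(addition)
chi2, p_value = pearson_chi2_2x2(addition)
print(percentages)
print(f"chi2 = {chi2:.2f}, p < 0.001: {p_value < 0.001}")
```

The recomputed percentages use the column totals (528 chatbot ratings and 264 human ratings) as denominators, which matches the footnote's statement that percentages refer to the columns.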