Table 2.
Overview of survey results
| Category | Value | Total: overall | Total: Chatbot | Total: Human | p-value (Chatbot vs. Human) | Chatbot: ChatFlash | Chatbot: ChatGPT 3.5 | Chatbot: ChatGPT 4.0 | Chatbot: ZenoChat | Human: Researcher A | Human: Researcher B | p-value (overall) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Addition | yes | 102 | 96 (18.18%) | 6 (2.27%) | < 0.001* | 16 | 39 | 30 | 11 | 6 | 0 | < 0.001* |
| | no | 690 | 432 (81.82%) | 258 (97.73%) | | 116 | 93 | 102 | 121 | 126 | 132 | |
| Completeness | complete | 560 | 421 (79.73%) | 139 (52.65%) | < 0.001* | 108 | 108 | 104 | 101 | 65 | 74 | < 0.001* |
| | partial | 131 | 67 (12.69%) | 64 (24.24%) | | 17 | 18 | 14 | 18 | 34 | 30 | |
| | incomplete | 101 | 40 (7.58%) | 61 (23.11%) | | 7 | 6 | 14 | 13 | 33 | 28 | |
| Context | correct | 712 | 488 (92.42%) | 224 (84.85%) | 0.001* | 125 | 116 | 123 | 124 | 110 | 114 | 0.007* |
| | incorrect | 80 | 40 (7.58%) | 40 (15.15%) | | 7 | 16 | 9 | 8 | 22 | 18 | |
| Correctness | correct | 574 | 390 (73.86%) | 184 (69.70%) | 0.116 | 102 | 90 | 90 | 108 | 91 | 93 | 0.051 |
| | partial | 119 | 83 (15.72%) | 36 (13.64%) | | 20 | 25 | 26 | 12 | 16 | 20 | |
| | incorrect | 99 | 55 (10.42%) | 44 (16.67%) | | 10 | 17 | 16 | 12 | 25 | 19 | |
| Interpretation | yes | 105 | 98 (18.56%) | 7 (2.65%) | < 0.001* | 18 | 30 | 43 | 7 | 5 | 2 | < 0.001* |
| | no | 687 | 430 (81.44%) | 257 (97.35%) | | 114 | 102 | 89 | 125 | 127 | 130 | |
| Length | too short | 93 | 19 (3.60%) | 74 (28.03%) | < 0.001* | 1 | 3 | 4 | 11 | 38 | 36 | < 0.001* |
| | perfect | 537 | 363 (68.75%) | 174 (65.91%) | | 104 | 84 | 86 | 89 | 89 | 85 | |
| | too long | 162 | 146 (27.65%) | 16 (6.06%) | | 27 | 45 | 42 | 32 | 5 | 11 | |
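Significant differences between groups are indicated by *. Results are presented as absolute values unless indicated otherwise; percentages are column percentages, i.e. each count is expressed relative to the total of its own group (Chatbot or Human) within the category.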
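
As a reading aid, the column percentages in the table can be reproduced from the absolute counts. The following is a minimal sketch using the "Addition" rows; the function name `column_percentages` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
def column_percentages(counts):
    """Return each count as a percentage of its column total, rounded to 2 decimals.

    `counts` maps a response value (e.g. "yes", "no") to its absolute count
    within one group (one column of the table).
    """
    total = sum(counts.values())
    return {value: round(100 * n / total, 2) for value, n in counts.items()}


# Absolute counts taken from the "Addition" rows of the table.
addition = {
    "Chatbot": {"yes": 96, "no": 432},
    "Human": {"yes": 6, "no": 258},
}

for group, counts in addition.items():
    print(group, column_percentages(counts))
```

Because the percentages are computed per column, the values within each group sum to 100% for every category, which makes the Chatbot and Human columns directly comparable despite their different totals (528 and 264 ratings, respectively).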