2025 May 30;25:150. doi: 10.1186/s12874-025-02532-2

Table 2.

Overview of survey results

Category	Value	Total				Chatbot				Human		p-value overall
Category	Value	Overall	Chatbot	Human	p-value Chatbot vs. Human	ChatFlash	ChatGPT 3.5	ChatGPT 4.0	ZenoChat	Researcher A	Researcher B	p-value overall
Addition	yes	102	96 (18.18%)	6 (2.27%)	< 0.001*	16	39	30	11	6	0	< 0.001*
Addition	no	690	432 (81.82%)	258 (97.73%)	< 0.001*	116	93	102	121	126	132	< 0.001*
Completeness	complete	560	421 (79.73%)	139 (52.65%)	< 0.001*	108	108	104	101	65	74	< 0.001*
	partial	131	67 (12.69%)	64 (24.24%)		17	18	14	18	34	30
	incomplete	101	40 (7.58%)	61 (23.11%)		7	6	14	13	33	28
Context	correct	712	488 (92.42%)	224 (84.85%)	0.001*	125	116	123	124	110	114	0.007*
Context	incorrect	80	40 (7.58%)	40 (15.15%)	0.001*	7	16	9	8	22	18	0.007*
Correctness	correct	574	390 (73.86%)	184 (69.70%)	0.116	102	90	90	108	91	93	0.051
	partial	119	83 (15.72%)	36 (13.64%)		20	25	26	12	16	20
	incorrect	99	55 (10.42%)	44 (16.67%)		10	17	16	12	25	19
Interpretation	yes	105	98 (18.56%)	7 (2.65%)	< 0.001*	18	30	43	7	5	2	< 0.001*
Interpretation	no	687	430 (81.44%)	257 (97.35%)	< 0.001*	114	102	89	125	127	130	< 0.001*
Length	too short	93	19 (3.60%)	74 (28.03%)	< 0.001*	1	3	4	11	38	36	< 0.001*
	perfect	537	363 (68.75%)	174 (65.91%)		104	84	86	89	89	85
	too long	162	146 (27.65%)	16 (6.06%)		27	45	42	32	5	11

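Significant differences between groups are indicated by *. Results are presented as absolute values unless indicated otherwise; percentages are column percentages, i.e. each count divided by the total number of ratings in its column (528 for Chatbot, 264 for Human).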
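
As a rough illustration of how the column percentages in the table can be reproduced, the sketch below recomputes them for the Addition and Length rows from the Chatbot and Human counts. The dictionary layout and the function name `column_percentages` are hypothetical and chosen only for this example; the column totals are not stated separately in the table and are taken here as the sum of the counts in each column.

```python
# Counts copied from the table as (chatbot, human) pairs.
# The data layout and helper name are illustrative assumptions.
counts = {
    "Addition": {"yes": (96, 6), "no": (432, 258)},
    "Length": {"too short": (19, 74), "perfect": (363, 174), "too long": (146, 16)},
}


def column_percentages(rows):
    """Return {value: (chatbot_pct, human_pct)}, each rounded to two decimals.

    Each percentage is the count divided by the sum of its own column,
    which is what "the percentages refer to the columns" is taken to mean.
    """
    chatbot_total = sum(chatbot for chatbot, _ in rows.values())
    human_total = sum(human for _, human in rows.values())
    return {
        value: (
            round(100 * chatbot / chatbot_total, 2),
            round(100 * human / human_total, 2),
        )
        for value, (chatbot, human) in rows.items()
    }


addition = column_percentages(counts["Addition"])
length = column_percentages(counts["Length"])

for value, (chatbot_pct, human_pct) in addition.items():
    print(f"Addition / {value}: chatbot {chatbot_pct:.2f}%, human {human_pct:.2f}%")
```

Under these assumptions the recomputed values agree with the percentages printed in the table, for example 96 of 528 chatbot ratings (18.18%) and 6 of 264 human ratings (2.27%) for Addition = yes.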