2023 Nov 22;13:20512. doi: 10.1038/s41598-023-46995-z

Table 6.

Results of the correlation analysis for the temperature parameter equal to 1: the Pearson correlation coefficient (with its p-value in brackets), the p-value from the Mann–Whitney U test comparing the values of the discrimination power index for correct and incorrect answers, and Cohen’s d.

	S22	A22	S23
Polish
GPT-3.5
Pearson correlation coefficient (p-value)	− 0.092 (0.200 ns)	− 0.235 (< 0.001***)	− 0.197 (0.006**)
p-value from Mann–Whitney U test	0.156 ns	< 0.001***	0.009**
Cohen’s d	0.184	0.487	0.411
GPT-4
Pearson correlation coefficient (p-value)	− 0.037 (0.607 ns)	− 0.253 (< 0.001***)	− 0.190 (0.008**)
p-value from Mann–Whitney U test	0.294 ns	< 0.001***	0.037*
Cohen’s d	0.086	0.636	0.512
English
GPT-3.5
Pearson correlation coefficient (p-value)	− 0.098 (0.173 ns)	− 0.191 (0.007**)	− 0.109 (0.129 ns)
p-value from Mann–Whitney U test	0.116 ns	0.002**	0.171 ns
Cohen’s d	0.198	0.395	0.221
GPT-4
Pearson correlation coefficient (p-value)	− 0.050 (0.485 ns)	− 0.151 (0.034*)	− 0.045 (0.555 ns)
p-value from Mann–Whitney U test	0.346 ns	0.055 ns	0.929 ns
Cohen’s d	0.121	0.386	0.106
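
The three statistics reported in Table 6 can be illustrated with a minimal sketch. This is not the authors' code. The data are invented for illustration, the 0/1 coding of answer correctness is an assumption, and the function names (`pearson_r`, `cohens_d`, `mann_whitney_u`) are hypothetical. The Mann–Whitney p-value here uses a plain normal approximation without tie or continuity correction, so it may differ slightly from what a statistics package would report.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p-value).

    Assumption: normal approximation, no tie or continuity correction.
    """
    # U counts pairs where a's value exceeds b's; ties count as one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    na, nb = len(a), len(b)
    mu = na * nb / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented example data (NOT from the paper): one entry per question.
# correct[i] is 1 if the model answered question i correctly, else 0 (assumed coding).
correct = [1, 1, 1, 0, 0, 1, 0, 1]
# dpi[i] is a made-up discrimination power index for question i.
dpi = [0.10, 0.25, 0.15, 0.40, 0.35, 0.20, 0.30, 0.05]

dpi_correct = [d for c, d in zip(correct, dpi) if c == 1]
dpi_incorrect = [d for c, d in zip(correct, dpi) if c == 0]

r = pearson_r(correct, dpi)
u, p = mann_whitney_u(dpi_correct, dpi_incorrect)
d = cohens_d(dpi_incorrect, dpi_correct)

print(f"Pearson r = {r:.3f}")
print(f"Mann-Whitney U = {u:.1f}, approximate two-sided p = {p:.3f}")
print(f"Cohen's d (incorrect minus correct) = {d:.3f}")
```

With a binary correctness variable, the Pearson coefficient reduces to the point-biserial correlation, so its sign depends on how correctness is coded. The order of the groups in `cohens_d` likewise fixes the sign of d, and the order used above is only one possible convention.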