Table 4.
Results of the correlation analysis for a temperature parameter equal to 1: Pearson correlation coefficients (p-values in brackets), p-values from the Mann–Whitney U test comparing the index of difficulty for correct and incorrect answers, and Cohen’s d.
| | S22 | A22 | S23 |
|---|---|---|---|
| Polish | | | |
| GPT-3.5 | | | |
| Pearson correlation coefficient (p-value) | 0.319 (< 0.001***) | 0.301 (< 0.001***) | 0.174 (0.015**) |
| p-value from Mann–Whitney U test | < 0.001*** | < 0.001*** | 0.004** |
| Cohen’s d | 0.671 | 0.634 | 0.362 |
| GPT-4 | | | |
| Pearson correlation coefficient (p-value) | 0.338 (< 0.001***) | 0.334 (< 0.001***) | 0.299 (< 0.001***) |
| p-value from Mann–Whitney U test | < 0.001*** | < 0.001*** | < 0.001*** |
| Cohen’s d | 0.834 | 0.861 | 0.829 |
| English | | | |
| GPT-3.5 | | | |
| Pearson correlation coefficient (p-value) | 0.229 (0.001**) | 0.245 (< 0.001***) | 0.219 (0.002**) |
| p-value from Mann–Whitney U test | < 0.001*** | < 0.001*** | 0.003** |
| Cohen’s d | 0.474 | 0.511 | 0.453 |
| GPT-4 | | | |
| Pearson correlation coefficient (p-value) | 0.363 (< 0.001***) | 0.273 (< 0.001***) | 0.307 (< 0.001***) |
| p-value from Mann–Whitney U test | < 0.001*** | 0.001** | < 0.001*** |
| Cohen’s d | 0.933 | 0.714 | 0.802 |
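
The three statistics reported in each block of the table can be illustrated with a minimal sketch. It assumes each question has an index-of-difficulty value and a binary flag marking whether the model answered correctly. The function names and the toy data are hypothetical and are not taken from the study. The Mann–Whitney p-value here uses a tie-corrected normal approximation without continuity correction, which is an assumption; the software used by the authors may compute it differently.

```python
import math
from statistics import NormalDist, mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for group a and a two-sided p-value (normal approximation)."""
    # Tag each value with its group (0 = a, 1 = b) and sort by value.
    combined = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(combined)
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i                   # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    na, nb = len(a), len(b)
    u = rank_sum_a - na * (na + 1) / 2
    mu = na * nb / 2
    var = na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mu) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical toy data: one entry per question.
difficulty = [0.90, 0.80, 0.85, 0.40, 0.30, 0.70, 0.50, 0.95]
correct = [1, 1, 1, 0, 0, 1, 0, 1]

# Split the index of difficulty by answer correctness.
diff_correct = [d for d, c in zip(difficulty, correct) if c == 1]
diff_incorrect = [d for d, c in zip(difficulty, correct) if c == 0]

r = pearson_r(difficulty, correct)           # correlation with correctness
u, p = mann_whitney_u(diff_correct, diff_incorrect)
d = cohens_d(diff_correct, diff_incorrect)   # standardized mean difference

print(f"r = {r:.3f}, U = {u}, p = {p:.3f}, d = {d:.3f}")
```

With a binary correctness flag, the Pearson coefficient is the point-biserial correlation, so its sign matches the sign of Cohen’s d computed as correct minus incorrect.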