Table 5.
Results of the correlation analysis for temperature parameter equal to 0: Pearson correlation coefficients (p-values in brackets), p-values from the Mann–Whitney U test comparing the values of the discrimination power index for correct and incorrect answers, and Cohen’s d effect sizes.
| | S22 | A22 | S23 |
|---|---|---|---|
| Polish | | | |
| GPT-3.5 | | | |
| Pearson correlation coefficient (p-value) | −0.124 (0.083 ns) | −0.243 (< 0.001***) | −0.327 (< 0.001***) |
| p-value from Mann–Whitney U test | 0.053 ns | < 0.001*** | < 0.001*** |
| Cohen’s d | 0.251 | 0.499 | 0.690 |
| GPT-4 | | | |
| Pearson correlation coefficient (p-value) | −0.029 (0.690 ns) | −0.249 (< 0.001***) | −0.185 (0.010**) |
| p-value from Mann–Whitney U test | 0.410 ns | 0.001** | 0.031* |
| Cohen’s d | 0.067 | 0.661 | 0.482 |
| English | | | |
| GPT-3.5 | | | |
| Pearson correlation coefficient (p-value) | −0.103 (0.150 ns) | −0.176 (0.013*) | −0.182 (0.011*) |
| p-value from Mann–Whitney U test | 0.090 ns | 0.005** | 0.009** |
| Cohen’s d | 0.210 | 0.363 | 0.394 |
| GPT-4 | | | |
| Pearson correlation coefficient (p-value) | −0.011 (0.877 ns) | −0.146 (0.041*) | −0.140 (0.051 ns) |
| p-value from Mann–Whitney U test | 0.704 ns | 0.072 ns | 0.137 ns |
| Cohen’s d | 0.026 | 0.368 | 0.380 |