Table 2. Accuracy of large language models in Practical Skills.
| Year | GPT-3.5 | GPT-4.0 | GPT-4o | Copilot | ERNIE Bot-3.5 | SPARK | Qwen-2.5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2023 | 0.550 | 0.775 | 0.833 | 0.792 | 0.775 | 0.758 | 0.908 |
| 2022 | 0.467 | 0.692 | 0.800 | 0.792 | 0.792 | 0.675 | 0.850 |
| 2021 | 0.467 | 0.708 | 0.850 | 0.667 | 0.750 | 0.567 | 0.942 |
| 2020 | 0.475 | 0.642 | 0.783 | 0.592 | 0.800 | 0.700 | 0.933 |
| 2019 | 0.483 | 0.658 | 0.783 | 0.458 | 0.767 | 0.592 | 0.883 |
| Overall | 0.488 | 0.695 | 0.810 | 0.660 | 0.777 | 0.658 | 0.903 |
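
The Overall row appears consistent with the unweighted mean of the five yearly accuracies. The table does not state how Overall was computed, so the sketch below is a sanity check under that assumption (equal weight per year, i.e. the same number of questions each year), not the authors' method. The names `yearly`, `reported_overall`, and `check_overall` are illustrative.

```python
# Yearly accuracies transcribed from Table 2, in row order 2023 down to 2019.
yearly = {
    "GPT-3.5":       [0.550, 0.467, 0.467, 0.475, 0.483],
    "GPT-4.0":       [0.775, 0.692, 0.708, 0.642, 0.658],
    "GPT-4o":        [0.833, 0.800, 0.850, 0.783, 0.783],
    "Copilot":       [0.792, 0.792, 0.667, 0.592, 0.458],
    "ERNIE Bot-3.5": [0.775, 0.792, 0.750, 0.800, 0.767],
    "SPARK":         [0.758, 0.675, 0.567, 0.700, 0.592],
    "Qwen-2.5":      [0.908, 0.850, 0.942, 0.933, 0.883],
}

# "Overall" row as reported in Table 2.
reported_overall = {
    "GPT-3.5": 0.488, "GPT-4.0": 0.695, "GPT-4o": 0.810, "Copilot": 0.660,
    "ERNIE Bot-3.5": 0.777, "SPARK": 0.658, "Qwen-2.5": 0.903,
}


def check_overall(yearly, reported, tol=0.0005 + 1e-9):
    """Return the models whose reported overall accuracy differs from the
    unweighted mean of the yearly values by more than `tol`.

    ASSUMPTION: every year carries equal weight. The default tolerance is
    half a unit in the third decimal place, since the table is rounded to
    three decimals.
    """
    mismatches = []
    for model, values in yearly.items():
        mean = sum(values) / len(values)
        if abs(mean - reported[model]) > tol:
            mismatches.append(model)
    return mismatches


mismatches = check_overall(yearly, reported_overall)
print(mismatches)
```

An empty list means every reported Overall value matches the unweighted yearly mean to within rounding. If the yearly question counts actually differ, a weighted mean would be the appropriate check instead.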