Table 3. Evaluation results with prompts sourced from GitHub.

| License | Metric | GPT-4 | ChatGPT | Copilot | Codex | CodeT5-large | CodeT5-ntp-py | CodeGen-350M | CodeGen-2.7B | CodeParrot-110M | CodeParrot-1.5B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Permissive | EM | 0.60±0.14 | 0.56±0.17 | 0.56±0.22 | 0.68±0.13 | 0.08±0.19 | 0.94±0.11 | 0.94±0.08 | 0.75±0.26 | 0.98±0.04 | 0.98±0.05 |
| | EP | 0.50 | 0.45 | 0.48 | 0.71 | 0.06 | 0.99 | 0.99 | 0.74 | 0.99 | 0.99 |
| Weak Copyleft | EM | 0.70±0.18 | 0.77±0.16 | 0.66±0.20 | 0.60±0.16 | 0.14±0.22 | 0.96±0.08 | 0.92±0.09 | 0.72±0.15 | 0.58±0.25 | 0.99±0.01 |
| | EP | 0.84 | 0.90 | 0.57 | 0.64 | 0.07 | 0.99 | 0.99 | 0.93 | 0.56 | 0.99 |
| Strong Copyleft | EM | 0.56±0.15 | 0.61±0.19 | 0.68±0.14 | 0.71±0.15 | 0.10±0.20 | 0.92±0.17 | 0.80±0.26 | 0.98±0.04 | 0.99±0.02 | 0.99±0.01 |
| | EP | 0.61 | 0.64 | 0.75 | 0.78 | 0.05 | 0.94 | 0.81 | 0.99 | 0.99 | 0.99 |
| All | EM | 0.64±0.20 | 0.67±0.18 | 0.62±0.25 | 0.64±0.20 | 0.11±0.21 | 0.92±0.12 | 0.93±0.09 | 0.76±0.27 | 0.98±0.04 | 0.99±0.01 |
| | EP | 0.61 | 0.65 | 0.64 | 0.75 | 0.04 | 0.99 | 0.99 | 0.76 | 0.99 | 0.99 |