Table 7.
Performance of the mitigation strategies compared to baseline models under the CodeParrot and CodeRL frameworks. The code synthesis capability is evaluated using the HumanEval (CodeParrot) and APPS (CodeRL) benchmarks.
| Model | Baseline | | | | | Model Fine-Tune | | | | | Dynamic Filter | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | EM | EP | Pass@1 | Pass@10 | Pass@100 | EM | EP | Pass@1 | Pass@10 | Pass@100 | EM | EP | Pass@1 | Pass@10 | Pass@100 |
| CodeParrot-110M | 0.98 ± 0.04 | 0.99 | 3.41% | 5.39% | 7.02% | 0.71 ± 0.13 | 0.70 | 3.56% | 5.17% | 7.44% | 0.46 ± 0.08 | 0.00 | 1.17% | 1.69% | 3.28% |
| CodeParrot-1.5B | 0.99 ± 0.01 | 0.99 | 3.84% | 6.77% | 10.02% | 0.67 ± 0.18 | 0.69 | 3.14% | 5.84% | 8.28% | 0.40 ± 0.10 | 0.00 | 1.46% | 2.11% | 2.78% |
| CodeT5-large | 0.11 ± 0.21 | 0.04 | 0.00 | 0.00 | 0.00 | 0.71 ± 0.15 | 0.68 | 1.70e-03 | 7.78e-03 | 0.02 | 0.10 ± 0.20 | 0.00 | 0.00 | 0.00 | 0.00 |
| CodeT5-large-ntp-py | 0.92 ± 0.12 | 0.99 | 1.40e-05 | 1.30e-04 | 8.00e-04 | 0.82 ± 0.10 | 0.87 | 2.48e-03 | 8.64e-03 | 0.02 | 0.47 ± 0.06 | 0.00 | 0.00 | 2.83e-05 | 1.42e-05 |
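
The table does not state how the Pass@k columns were computed. As a minimal sketch, assuming the unbiased estimator commonly used with HumanEval (generate n samples per problem, count the c samples that pass all tests, and estimate 1 − C(n−c, k)/C(n, k)), the per-problem and benchmark-level scores could be obtained as follows. The function names `pass_at_k` and `mean_pass_at_k` and the `(n, c)` input format are illustrative assumptions, not taken from the evaluated frameworks.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For instance, a problem with 3 passing samples out of 10 gives `pass_at_k(10, 3, 1)` of about 0.3, and averaging over all benchmark problems yields a single figure comparable to one cell of the table.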