Author manuscript; available in PMC: 2025 Aug 26.
Published in final edited form as: Proc Mach Learn Res. 2023 Jul;23:40373–40389.

Table 7.

Performance of the mitigation strategies compared to baseline models under the CodeParrot and CodeRL frameworks. The code synthesis capability is evaluated using the HumanEval (CodeParrot) and APPS (CodeRL) benchmarks.

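| Model | Strategy | EM | EP | Pass@1 | Pass@10 | Pass@100 |
|---|---|---|---|---|---|---|
| CodeParrot-110M | Baseline Model | 0.98 ± 0.04 | 0.99 | 3.41% | 5.39% | 7.02% |
| CodeParrot-110M | Fine-Tune | 0.71 ± 0.13 | 0.70 | 3.56% | 5.17% | 7.44% |
| CodeParrot-110M | Dynamic Filter | 0.46 ± 0.08 | 0.00 | 1.17% | 1.69% | 3.28% |
| CodeParrot-1.5B | Baseline Model | 0.99 ± 0.01 | 0.99 | 3.84% | 6.77% | 10.02% |
| CodeParrot-1.5B | Fine-Tune | 0.67 ± 0.18 | 0.69 | 3.14% | 5.84% | 8.28% |
| CodeParrot-1.5B | Dynamic Filter | 0.40 ± 0.10 | 0.00 | 1.46% | 2.11% | 2.78% |
| CodeT5-large | Baseline Model | 0.11 ± 0.21 | 0.04 | 0.00 | 0.00 | 0.00 |
| CodeT5-large | Fine-Tune | 0.71 ± 0.15 | 0.68 | 1.70e-03 | 7.78e-03 | 0.02 |
| CodeT5-large | Dynamic Filter | 0.10 ± 0.20 | 0.00 | 0.00 | 0.00 | 0.00 |
| CodeT5-large-ntp-py | Baseline Model | 0.92 ± 0.12 | 0.99 | 1.40e-05 | 1.30e-04 | 8.00e-04 |
| CodeT5-large-ntp-py | Fine-Tune | 0.82 ± 0.10 | 0.87 | 2.48e-03 | 8.64e-03 | 0.02 |
| CodeT5-large-ntp-py | Dynamic Filter | 0.47 ± 0.06 | 0.00 | 0.00 | 2.83e-05 | 1.42e-05 |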
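
The Pass@1, Pass@10, and Pass@100 columns report functional correctness on the benchmarks. The table does not say how these values were computed; the sketch below assumes the unbiased pass@k estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and do not come from the source.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: budget of samples the metric is defined over (k <= n)

    Assumption: this is the standard unbiased estimator
    1 - C(n - c, k) / C(n, k); the source table does not state
    which estimator its authors used.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs, one per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(100, 3), (100, 0)], k=10)` averages the per-problem estimates for two hypothetical problems with 100 samples each.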