Author manuscript; available in PMC: 2025 Aug 26.

Published in final edited form as: Proc Mach Learn Res. 2023 Jul;23:40373–40389.

Table 7.

Performance of the mitigation strategies compared to baseline models under the CodeParrot and CodeRL frameworks. The code synthesis capability is evaluated using the HumanEval (CodeParrot) and APPS (CodeRL) benchmarks.

Model	Baseline EM	Baseline EP	Baseline Pass@1	Baseline Pass@10	Baseline Pass@100	Model Fine-Tune EM	Model Fine-Tune EP	Model Fine-Tune Pass@1	Model Fine-Tune Pass@10	Model Fine-Tune Pass@100	Dynamic Filter EM	Dynamic Filter EP	Dynamic Filter Pass@1	Dynamic Filter Pass@10	Dynamic Filter Pass@100
CodeParrot-110M	0.98_0.04	0.99	3.41%	5.39%	7.02%	0.71_0.13	0.70	3.56%	5.17%	7.44%	0.46_0.08	0.00	1.17%	1.69%	3.28%
CodeParrot-1.5B	0.99_0.01	0.99	3.84%	6.77%	10.02%	0.67_0.18	0.69	3.14%	5.84%	8.28%	0.40_0.10	0.00	1.46%	2.11%	2.78%
CodeT5-large	0.11_0.21	0.04	0.00	0.00	0.00	0.71_0.15	0.68	1.70e-03	7.78e-03	0.02	0.10_0.20	0.00	0.00	0.00	0.00
CodeT5-large-ntp-py	0.92_0.12	0.99	1.40e-05	1.30e-04	8.00e-04	0.82_0.10	0.87	2.48e-03	8.64e-03	0.02	0.47_0.06	0.00	0.00	2.83e-05	1.42e-05
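
For readers unfamiliar with the Pass@k columns in Table 7, the sketch below shows one common way such a score is computed: the unbiased estimator 1 - C(n-c, k)/C(n, k) that is widely used with HumanEval, where n samples are drawn per problem and c of them pass the unit tests. This is an illustration under stated assumptions; the table does not say which estimator or sample count was used here, and the names `pass_at_k` and `benchmark_pass_at_k` are hypothetical helpers, not part of any benchmark's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem as 1 - C(n - c, k) / C(n, k).

    n: samples generated for the problem
    c: samples that passed all unit tests
    k: budget of attempts being scored (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, so the
    # estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark-level score is the mean of the per-problem estimates, so a call such as `benchmark_pass_at_k([(200, 3), (200, 0)], k=10)` returns a fraction between 0 and 1. Multiplying by 100 gives a percentage comparable in form to the CodeParrot rows.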