Table 5.
Evaluation results of models on prompts derived from individual training datasets. Left: prompts derived by directly sampling the training datasets. Right: prompts derived from filtered datasets in which overlapping corpora were excluded (*NO stands for non-overlap).
| Model╲Dataset | CodeRL EM | CodeRL EP | CodeGen EM | CodeGen EP | CodeParrot EM | CodeParrot EP | CodeRL-NO* EM | CodeRL-NO* EP | CodeGen-NO EM | CodeGen-NO EP | CodeParrot-NO EM | CodeParrot-NO EP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CodeT5-large | 0.28±0.26 | 0.18 | 0.19±0.26 | 0.07 | 0.32±0.26 | 0.22 | 0.31±0.14 | 0.19 | 0.22±0.16 | 0.13 | 0.26±0.19 | 0.17 |
| CodeT5-large-ntp-py | 0.92±0.13 | 0.98 | 0.74±0.21 | 0.84 | 0.64±0.11 | 0.94 | 0.91±0.15 | 0.98 | 0.38±0.21 | 0.24 | 0.40±0.06 | 0.14 |
| CodeGen-350M | 0.59±0.23 | 0.74 | 0.76±0.16 | 0.95 | 0.65±0.25 | 0.68 | 0.33±0.06 | 0.34 | 0.78±0.14 | 0.94 | 0.32±0.05 | 0.28 |
| CodeGen-2.7B | 0.54±0.12 | 0.80 | 0.78±0.15 | 0.96 | 0.66±0.24 | 0.66 | 0.28±0.04 | 0.33 | 0.75±0.11 | 0.98 | 0.36±0.08 | 0.26 |
| CodeParrot-110M | 0.50±0.20 | 0.20 | 0.55±0.17 | 0.63 | 0.66±0.17 | 0.76 | 0.31±0.04 | 0.22 | 0.23±0.06 | 0.30 | 0.71±0.23 | 0.80 |
| CodeParrot-1.5B | 0.58±0.17 | 0.65 | 0.60±0.23 | 0.68 | 0.65±0.17 | 0.73 | 0.34±0.07 | 0.27 | 0.27±0.09 | 0.36 | 0.70±0.20 | 0.73 |
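The caption refers to filtered *-NO datasets from which overlapping corpora were excluded, but does not spell out the filtering step. A minimal sketch of one plausible approach, assuming exact-match deduplication by content hash (the function names and the normalization rule here are illustrative assumptions, not the paper's actual pipeline):

```python
import hashlib

def fingerprint(doc: str) -> str:
    # Assumed normalization: strip surrounding whitespace, then hash the text.
    return hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()

def filter_overlap(target_corpus, other_corpora):
    """Keep only documents in target_corpus that appear in none of the others."""
    seen = set()
    for corpus in other_corpora:
        seen.update(fingerprint(doc) for doc in corpus)
    return [doc for doc in target_corpus if fingerprint(doc) not in seen]

# Toy corpora (hypothetical): one document is shared between the two.
coderl_docs = ["def add(a, b): return a + b", "print('hello')"]
codegen_docs = ["print('hello')", "x = 1"]

# The non-overlap variant keeps only the document unique to the first corpus.
coderl_no = filter_overlap(coderl_docs, [codegen_docs])
```

Prompts for the right-hand columns of the table would then be sampled from a filtered corpus like `coderl_no` rather than from the raw training set.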