Author manuscript; available in PMC: 2025 Aug 26.
Published in final edited form as: Proc Mach Learn Res. 2023 Jul;23:40373–40389.

Table 5.

Evaluation results of models on prompts derived from individual training datasets. Left: prompts were sampled directly from the training datasets. Right: prompts were derived from filtered datasets from which overlapping corpora were excluded (*NO stands for non-overlap).

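| Model | CodeRL EM | CodeRL EP | CodeGen EM | CodeGen EP | CodeParrot EM | CodeParrot EP | CodeRL-NO* EM | CodeRL-NO* EP | CodeGen-NO EM | CodeGen-NO EP | CodeParrot-NO EM | CodeParrot-NO EP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CodeT5-large | 0.28 (0.26) | 0.18 | 0.19 (0.26) | 0.07 | 0.32 (0.26) | 0.22 | 0.31 (0.14) | 0.19 | 0.22 (0.16) | 0.13 | 0.26 (0.19) | 0.17 |
| CodeT5-large-ntp-py | 0.92 (0.13) | 0.98 | 0.74 (0.21) | 0.84 | 0.64 (0.11) | 0.94 | 0.91 (0.15) | 0.98 | 0.38 (0.21) | 0.24 | 0.40 (0.06) | 0.14 |
| CodeGen-350M | 0.59 (0.23) | 0.74 | 0.76 (0.16) | 0.95 | 0.65 (0.25) | 0.68 | 0.33 (0.06) | 0.34 | 0.78 (0.14) | 0.94 | 0.32 (0.05) | 0.28 |
| CodeGen-2.7B | 0.54 (0.12) | 0.80 | 0.78 (0.15) | 0.96 | 0.66 (0.24) | 0.66 | 0.28 (0.04) | 0.33 | 0.75 (0.11) | 0.98 | 0.36 (0.08) | 0.26 |
| CodeParrot-110M | 0.50 (0.20) | 0.20 | 0.55 (0.17) | 0.63 | 0.66 (0.17) | 0.76 | 0.31 (0.04) | 0.22 | 0.23 (0.06) | 0.30 | 0.71 (0.23) | 0.80 |
| CodeParrot-1.5B | 0.58 (0.17) | 0.65 | 0.60 (0.23) | 0.68 | 0.65 (0.17) | 0.73 | 0.34 (0.07) | 0.27 | 0.27 (0.09) | 0.36 | 0.70 (0.20) | 0.73 |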