Author manuscript; available in PMC: 2025 Aug 26.
Published in final edited form as: Proc Mach Learn Res. 2023 Jul;23:40373–40389.

Table 2.

Data statistics of the prompts constructed with CodeIPPrompt. Token counts are measured with the tokenizer of the multilingual CodeGen model and are reported as mean ± standard deviation.

| Subset | # Prompts | # Tokens (avg.) |
|---|---|---|
| License: Permissive | 52.0K | 13.2 ± 7.3 |
| License: Weak Copyleft | 77.1K | 13.2 ± 7.5 |
| License: Strong Copyleft | 50.0K | 13.3 ± 6.2 |
| Language: C | 6.2K | 18.2 ± 9.5 |
| Language: C++ | 6.2K | 18.3 ± 11.5 |
| Language: C# | 37.6K | 14.6 ± 7.9 |
| Language: Python | 30.1K | 11.6 ± 6.0 |
| Language: Java | 99.0K | 12.6 ± 6.1 |
| Total | 179.1K | 13.2 ± 7.1 |
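As a rough consistency check of Table 2 (not part of the original paper), the Total row can be recovered from either the license rows or the language rows. The prompt counts add up, and the pooled mean and standard deviation follow from each subset's count, mean, and standard deviation. The sketch below assumes population standard deviations and works on the rounded counts shown (in thousands), so agreement holds only to the reported precision. The helper names `token_stats` and `pooled_stats` are hypothetical. The default whitespace tokenizer in `token_stats` is only a stand-in for the CodeGen tokenizer used in the paper.

```python
import math
import statistics
from typing import Callable, Iterable, Sequence


def token_stats(prompts: Iterable[str],
                tokenize: Callable[[str], Sequence[object]] = str.split) -> tuple[float, float]:
    """Mean and population standard deviation of per-prompt token counts.

    `tokenize` defaults to whitespace splitting as a placeholder; the table's
    figures were produced with the multilingual CodeGen tokenizer instead.
    """
    counts = [len(tokenize(p)) for p in prompts]
    return statistics.fmean(counts), statistics.pstdev(counts)


def pooled_stats(groups: Sequence[tuple[float, float, float]]) -> tuple[float, float]:
    """Combine per-subset (count, mean, std) into the mean/std of the union.

    Uses E[X^2] = std^2 + mean^2 inside each subset, which is exact only if
    the subset stds are population standard deviations (an assumption here).
    """
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    second_moment = sum(n * (s * s + m * m) for n, m, s in groups) / total
    return mean, math.sqrt(second_moment - mean * mean)


# (# prompts in thousands, mean tokens, std tokens), copied from Table 2.
BY_LICENSE = [(52.0, 13.2, 7.3), (77.1, 13.2, 7.5), (50.0, 13.3, 6.2)]
BY_LANGUAGE = [(6.2, 18.2, 9.5), (6.2, 18.3, 11.5), (37.6, 14.6, 7.9),
               (30.1, 11.6, 6.0), (99.0, 12.6, 6.1)]

if __name__ == "__main__":
    for name, groups in (("license", BY_LICENSE), ("language", BY_LANGUAGE)):
        n = sum(g[0] for g in groups)
        m, s = pooled_stats(groups)
        print(f"{name}: {n:.1f}K prompts, {m:.1f} ± {s:.1f} tokens")
```

Run as a script, both partitions report 179.1K prompts and 13.2 ± 7.1 tokens, which matches the Total row. To approximate the paper's token counts on raw prompts, one could pass the `encode` method of a Hugging Face tokenizer for a multilingual CodeGen checkpoint as `tokenize` (for example `Salesforce/codegen-350M-multi`; the specific checkpoint is an assumption, since the caption does not name one).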