Table 2.
Data statistics of the prompts constructed with CODEIPPROMPT. The number of tokens per prompt is measured with the tokenizer of the multilingual CodeGen model and is reported as the mean, with the standard deviation in subscripts.
| | Permissive | Weak Copyleft | Strong Copyleft | Total | C | C++ | C# | Python | Java |
|---|---|---|---|---|---|---|---|---|---|
| # Prompts | 52.0K | 77.1K | 50.0K | 179.1K | 6.2K | 6.2K | 37.6K | 30.1K | 99.0K |
| # Tokens (Avg.) | $13.2_{7.3}$ | $13.2_{7.5}$ | $13.3_{6.2}$ | $13.2_{7.1}$ | $18.2_{9.5}$ | $18.3_{11.5}$ | $14.6_{7.9}$ | $11.6_{6.0}$ | $12.6_{6.1}$ |
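The per-group statistics above can be reproduced in outline by counting tokens per prompt and taking the mean and standard deviation; the overall average should then match the prompt-weighted average of the group means. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names `token_stats` and `pooled_mean` are hypothetical. A whitespace split stands in for the tokenizer, whereas the table uses the CodeGen tokenizer (for example, the `encode` method of a Hugging Face tokenizer could be passed instead). Whether the reported deviation is the population or the sample standard deviation is not stated, so population standard deviation is assumed here.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple


def token_stats(prompts: Sequence[str],
                tokenize: Callable[[str], List]) -> Tuple[float, float]:
    """Hypothetical helper: mean and (population) std of tokens per prompt.

    `tokenize` is any callable returning a list of tokens; the paper uses the
    multilingual CodeGen tokenizer, str.split is only a stand-in.
    """
    counts = [len(tokenize(p)) for p in prompts]
    return mean(counts), pstdev(counts)


def pooled_mean(groups: Dict[str, Tuple[float, float]]) -> float:
    """Hypothetical helper: prompt-weighted average of per-group means.

    `groups` maps a group name to (number_of_prompts, mean_tokens).
    """
    total = sum(n for n, _ in groups.values())
    return sum(n * m for n, m in groups.values()) / total


# Counts (in thousands) and means taken from the table; counts are rounded,
# so the pooled value is only an approximate consistency check.
by_language = {"C": (6.2, 18.2), "C++": (6.2, 18.3), "C#": (37.6, 14.6),
               "Python": (30.1, 11.6), "Java": (99.0, 12.6)}
by_license = {"Permissive": (52.0, 13.2), "Weak Copyleft": (77.1, 13.2),
              "Strong Copyleft": (50.0, 13.3)}

print(round(pooled_mean(by_language), 1))  # 13.2
print(round(pooled_mean(by_license), 1))   # 13.2
```

Both groupings pool to roughly 13.2 tokens, in line with the Total column, which is a quick way to check that the license and language breakdowns describe the same set of prompts.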