Table 2. Dataset sizes (number of samples) for the full and filtered train, validation, and test splits.
Model | Dataset full name | Short name | Full: Train | Full: Valid | Full: Test | Filtered: Train | Filtered: Valid | Filtered: Test
---|---|---|---|---|---|---|---|---
*Machine datasets* | | | | | | | |
GPT2 | Small-117M | s | 250,000 | 5,000 | 5,000 | 185,622 | 3,732 | 3,722
GPT2 | xl-1542M | xl | 250,000 | 5,000 | 5,000 | 193,052 | 3,868 | 3,851
GPT2 | Small-117M-k40 | s-k | 250,000 | 5,000 | 5,000 | 201,236 | 4,062 | 4,082
GPT2 | xl-1542M-k40 | xl-k | 250,000 | 5,000 | 5,000 | 214,202 | 4,312 | 4,243
GPT3 | 175B | GPT3 | 1,604 | 201 | 201 | 886 | 122 | 101
Grover | Grover-Mega | Grover | 8,000 | 1,000 | 1,000 | 7,740 | 964 | 961
*Human datasets* | | | | | | | |
GPT2 | Webtext | | 250,000 | 5,000 | 5,000 | 190,503 | 3,813 | 3,834
GPT3 | GPT3-webtext | | 1,604 | 201 | 201 | 1,235 | 160 | 155
Grover | realNews | | 8,000 | 1,000 | 1,000 | 7,725 | 972 | 976
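Because the table reports only absolute counts, it can help to express each filtered split as a fraction of its full split. The following is a minimal sketch that does exactly that arithmetic on the numbers in Table 2. It makes no assumption about what the filtering criterion is. The names `SIZES`, `SPLITS`, and `retention` are hypothetical, and the human datasets, which have no short name in the table, are keyed by their full names.

```python
# Split sizes copied from Table 2.
# Each entry: ((full train, full valid, full test), (filtered train, filtered valid, filtered test)).
# Keys are the table's short names; human datasets (no short name) use their full names.
SIZES = {
    "s": ((250_000, 5_000, 5_000), (185_622, 3_732, 3_722)),
    "xl": ((250_000, 5_000, 5_000), (193_052, 3_868, 3_851)),
    "s-k": ((250_000, 5_000, 5_000), (201_236, 4_062, 4_082)),
    "xl-k": ((250_000, 5_000, 5_000), (214_202, 4_312, 4_243)),
    "GPT3": ((1_604, 201, 201), (886, 122, 101)),
    "Grover": ((8_000, 1_000, 1_000), (7_740, 964, 961)),
    "Webtext": ((250_000, 5_000, 5_000), (190_503, 3_813, 3_834)),
    "GPT3-webtext": ((1_604, 201, 201), (1_235, 160, 155)),
    "realNews": ((8_000, 1_000, 1_000), (7_725, 972, 976)),
}

SPLITS = ("train", "valid", "test")


def retention(sizes):
    """Return {dataset: {split: filtered / full}}, i.e. the fraction of samples kept."""
    return {
        name: {
            split: kept / total
            for split, total, kept in zip(SPLITS, full, filtered)
        }
        for name, (full, filtered) in sizes.items()
    }


if __name__ == "__main__":
    # Print one line per dataset with the retained percentage for each split.
    for name, ratios in retention(SIZES).items():
        print(f"{name:>12}: " + ", ".join(f"{s} {r:.1%}" for s, r in ratios.items()))
```

For example, roughly 74% of the `s` training split and roughly 97% of the `Grover` training split remain after filtering.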