Table 2.
Statistics and answer types for all datasets.
| Dataset ID | Examples | Examples with human reference CoTs | Examples with AI-generated CoTs | Number of AI-generated CoTs | Answer type |
|---|---|---|---|---|---|
| AQUA-RAT | 97,975 | 97,975 | 0 | 0 | multiple choice |
| ASDiv | 1218 | 1218 | 0 | 0 | number |
| CommonsenseQA | 12,102 | 10,962 | 1221 | 4417 | multiple choice |
| EntailmentBank | 1840 | 1840 | 0 | 0 | text |
| GSM8K | 8792 | 8792 | 0 | 0 | number |
| MAWPS | 1921 | 1921 | 0 | 0 | number |
| MedQA (USMLE) | 12,723 | 0 | 1273 | 135,640 | multiple choice |
| MedMCQA | 193,155 | 161,558 | 1000 | 106,967 | multiple choice |
| MMLU (medical) | 1242 | 0 | 0 | 0 | multiple choice |
| OpenBookQA | 5957 | 5957 | 100 | 1980 | multiple choice |
| PubmedQA | 1000 | 1000 | 500 | 2500 | multiple choice |
| QED | 6175 | 6175 | 0 | 0 | collection |
| StrategyQA | 2780 | 2290 | 2289 | 6512 | bool |
| SVAMP | 1000 | 1000 | 0 | 0 | number |
| WorldTree V2 | 4367 | 4365 | 100 | 1980 | multiple choice |
Note that AI-generated CoTs are not available for every example, and that multiple CoTs may have been generated for a single example.
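To illustrate the note, the following minimal sketch divides the number of AI-generated CoTs by the number of examples that have at least one, giving the mean number of generated CoTs per covered example. The counts are copied from Table 2; the names `ai_cot_counts` and `mean_cots_per_example` are hypothetical and not part of any released code.

```python
# (examples with AI-generated CoTs, number of AI-generated CoTs), copied from Table 2.
# Datasets with no AI-generated CoTs are omitted to avoid dividing by zero.
ai_cot_counts = {
    "CommonsenseQA": (1221, 4417),
    "MedQA (USMLE)": (1273, 135640),
    "MedMCQA": (1000, 106967),
    "OpenBookQA": (100, 1980),
    "PubmedQA": (500, 2500),
    "StrategyQA": (2289, 6512),
    "WorldTree V2": (100, 1980),
}


def mean_cots_per_example(counts):
    """Return the mean number of generated CoTs per example that has any."""
    return {
        name: n_cots / n_examples
        for name, (n_examples, n_cots) in counts.items()
    }


means = mean_cots_per_example(ai_cot_counts)
for name, mean in means.items():
    print(f"{name}: {mean:.1f} generated CoTs per covered example")
```

A ratio above 1 shows directly that several CoTs were generated for the same example in that dataset.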