Table 7: LLM results on the compositional task.
| Model | Rotation | Curriculum | Successes | Failures | Accuracy |
|---|---|---|---|---|---|
| Llama 2 | Rule-like | Blocked | 601 | 39 | 93.91% |
| Llama 2 | Rule-like | Interleaved | 478 | 98 | 82.99% |
| Llama 2 | Rotated | Blocked | 151 | 489 | 23.59% |
| Llama 2 | Rotated | Interleaved | 141 | 435 | 24.48% |
| GPT-3.5 | Rule-like | Blocked | 77 | 3 | 96.25% |
| GPT-3.5 | Rule-like | Interleaved | 63 | 17 | 78.75% |
| GPT-3.5 | Rotated | Blocked | 0 | 80 | 0.00% |
| GPT-3.5 | Rotated | Interleaved | 1 | 79 | 1.25% |
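
The Accuracy column appears to be the number of successes divided by the total number of trials (successes plus failures), reported as a percentage rounded to two decimals. The following is a minimal sketch that recomputes it from the counts in Table 7, under that assumption; the names `ROWS` and `accuracy_pct` are illustrative and do not come from the original work.

```python
# Rows copied from Table 7:
# (model, rotation, curriculum, successes, failures, reported accuracy in %)
ROWS = [
    ("Llama 2", "Rule-like", "Blocked",     601,  39, 93.91),
    ("Llama 2", "Rule-like", "Interleaved", 478,  98, 82.99),
    ("Llama 2", "Rotated",   "Blocked",     151, 489, 23.59),
    ("Llama 2", "Rotated",   "Interleaved", 141, 435, 24.48),
    ("GPT-3.5", "Rule-like", "Blocked",      77,   3, 96.25),
    ("GPT-3.5", "Rule-like", "Interleaved",  63,  17, 78.75),
    ("GPT-3.5", "Rotated",   "Blocked",       0,  80,  0.00),
    ("GPT-3.5", "Rotated",   "Interleaved",   1,  79,  1.25),
]


def accuracy_pct(successes: int, failures: int) -> float:
    """Return successes as a percentage of all trials.

    Assumption: every trial counts as either a success or a failure.
    """
    total = successes + failures
    return 100.0 * successes / total


for model, rotation, curriculum, s, f, reported in ROWS:
    computed = accuracy_pct(s, f)
    # The reported values are rounded to two decimals, so allow half a unit
    # in the last reported digit.
    assert abs(computed - reported) < 0.005
    print(f"{model:8s} {rotation:10s} {curriculum:12s} "
          f"n={s + f:3d}  accuracy={computed:6.2f}%")
```

Under this reading, every reported accuracy matches its counts. The counts also show that the number of trials differs across rows: 640 (blocked) and 576 (interleaved) for Llama 2, and 80 per condition for GPT-3.5.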