[Preprint]. 2024 Oct 15:arXiv:2402.08674v3. Originally published 2024 Feb 13. [Version 3]

Table 7: LLM results on compositional task.

Model    Rotation   Curriculum   Successes  Failures  Accuracy
Llama 2  Rule-like  Blocked            601        39    93.91%
Llama 2  Rule-like  Interleaved        478        98    82.99%
Llama 2  Rotated    Blocked            151       489    23.59%
Llama 2  Rotated    Interleaved        141       435    24.48%
GPT-3.5  Rule-like  Blocked             77         3    96.25%
GPT-3.5  Rule-like  Interleaved         63        17    78.75%
GPT-3.5  Rotated    Blocked              0        80     0.00%
GPT-3.5  Rotated    Interleaved          1        79     1.25%
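
The table does not state how Accuracy is computed. The values are consistent with Successes divided by the total number of trials (Successes + Failures). The following is a minimal sketch under that assumption. The names `ROWS` and `accuracy_pct` are illustrative and do not come from the paper.

```python
# Rows copied from Table 7: (model, rotation, curriculum, successes, failures).
ROWS = [
    ("Llama 2", "Rule-like", "Blocked", 601, 39),
    ("Llama 2", "Rule-like", "Interleaved", 478, 98),
    ("Llama 2", "Rotated", "Blocked", 151, 489),
    ("Llama 2", "Rotated", "Interleaved", 141, 435),
    ("GPT-3.5", "Rule-like", "Blocked", 77, 3),
    ("GPT-3.5", "Rule-like", "Interleaved", 63, 17),
    ("GPT-3.5", "Rotated", "Blocked", 0, 80),
    ("GPT-3.5", "Rotated", "Interleaved", 1, 79),
]


def accuracy_pct(successes: int, failures: int) -> float:
    """Return accuracy as a percentage.

    Assumption (not stated in the table): accuracy = successes / (successes + failures).
    """
    total = successes + failures
    if total == 0:
        raise ValueError("no trials recorded")
    return 100.0 * successes / total


if __name__ == "__main__":
    # Print each condition with its recomputed accuracy, rounded to two decimals.
    for model, rotation, curriculum, successes, failures in ROWS:
        pct = accuracy_pct(successes, failures)
        print(f"{model:<8} {rotation:<10} {curriculum:<12} {pct:6.2f}%")
```

Rounded to two decimals, the recomputed values agree with the Accuracy column for all eight rows. The counts also show that the conditions differ in size: the Llama 2 rows total 640 trials (Blocked) or 576 trials (Interleaved), while every GPT-3.5 row totals 80.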