[Preprint]. 2024 Oct 15:arXiv:2402.08674v3. Originally published 2024 Feb 13. [Version 3]

Table 7: LLM results on the compositional task.

Model	Rotation	Curriculum	Successes	Failures	Accuracy
Llama 2	Rule-like	Blocked	601	39	93.91%
Llama 2	Rule-like	Interleaved	478	98	82.99%
Llama 2	Rotated	Blocked	151	489	23.59%
Llama 2	Rotated	Interleaved	141	435	24.48%
GPT-3.5	Rule-like	Blocked	77	3	96.25%
GPT-3.5	Rule-like	Interleaved	63	17	78.75%
GPT-3.5	Rotated	Blocked	0	80	0.00%
GPT-3.5	Rotated	Interleaved	1	79	1.25%
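
The Accuracy column appears to be the number of successes divided by the total number of trials (successes plus failures). The table does not state this formula, so the following is a minimal sketch under that assumption; the names `ROWS` and `accuracy_percent` are illustrative and do not come from the paper.

```python
# Rows copied from Table 7:
# (model, rotation, curriculum, successes, failures, reported accuracy in %)
ROWS = [
    ("Llama 2", "Rule-like", "Blocked",     601,  39, 93.91),
    ("Llama 2", "Rule-like", "Interleaved", 478,  98, 82.99),
    ("Llama 2", "Rotated",   "Blocked",     151, 489, 23.59),
    ("Llama 2", "Rotated",   "Interleaved", 141, 435, 24.48),
    ("GPT-3.5", "Rule-like", "Blocked",      77,   3, 96.25),
    ("GPT-3.5", "Rule-like", "Interleaved",  63,  17, 78.75),
    ("GPT-3.5", "Rotated",   "Blocked",       0,  80,  0.00),
    ("GPT-3.5", "Rotated",   "Interleaved",   1,  79,  1.25),
]


def accuracy_percent(successes: int, failures: int) -> float:
    """Assumed definition: successes / (successes + failures), in percent."""
    total = successes + failures
    if total == 0:
        raise ValueError("no trials recorded")
    return 100.0 * successes / total


if __name__ == "__main__":
    for model, rotation, curriculum, s, f, reported in ROWS:
        computed = accuracy_percent(s, f)
        # Print trial count, recomputed accuracy, and the value from the table.
        print(f"{model:8s} {rotation:10s} {curriculum:12s} "
              f"n={s + f:4d}  computed={computed:6.2f}%  reported={reported:6.2f}%")
```

Under this assumption, all eight recomputed values agree with the reported column to two decimal places. The trial counts implied by the table are 640 (blocked) and 576 (interleaved) for Llama 2, and 80 per condition for GPT-3.5.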