Results of experiment 1b. All error bars depict SEM. (A) Training curves and averaged test-phase performance. By the end of training, performance had plateaued for all groups. At test, in contrast to experiment 1a, there was no significant difference in performance between groups. (B) No performance difference between task switch and stay trials. (C) Sigmoid fits to the test-phase choice proportions of the task-relevant (solid lines) and task-irrelevant dimensions (dashed lines). No sensitivity differences were observed along the relevant dimension. However, once again, there was stronger intrusion from the task-irrelevant dimension for Interleaved compared with B200. (D) Conceptual model RDMs. The same reasoning applies as described in Fig. 2D. (E) RDM model correlations at test. Despite equal test performance, the relative advantage of the factorized over the linear model is stronger for B200 than for B2 or Interleaved, suggesting that blocked training did result in better task separation. (F) Bayesian model comparison between unconstrained and constrained models supports the RSA findings. The unconstrained model fits best in the B200 group, whereas the constrained model fits best in the Interleaved group. (G) Mean bias of the decision boundary obtained by the unconstrained model. The bias was smallest for B200, indicating that this group estimated the boundaries with high precision. (H) Mean lapse rates. The B200 group made more nonspecific random errors during the test phase than the Interleaved group, which explains equal test performance despite evidence for successful task factorization. We suspect that limited experience with task switches is more detrimental when rules are nonverbalizable. Asterisks denote significance: *P < 0.05; **P < 0.01; ***P < 0.001.
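
The reasoning in (G) and (H), that an accurately placed decision boundary can coexist with unremarkable accuracy when the lapse rate is high, can be illustrated with a minimal sketch. It assumes a one-dimensional sigmoid choice function with a lapse term; this is a simplification for illustration and not the model fitted in the paper. The function names, parameter names (`slope`, `offset`, `lapse`), and feature levels are all hypothetical.

```python
import math


def choice_probability(x, slope, offset, lapse):
    """Probability of an 'accept' response for feature value x.

    Assumed form (illustrative only): with probability `lapse` the
    response is a coin flip; otherwise it follows a sigmoid whose
    boundary sits at `offset`.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-slope * (x - offset)))
    return lapse / 2.0 + (1.0 - lapse) * sigmoid


def expected_accuracy(levels, slope, offset, lapse):
    """Mean probability of a correct response; 'accept' is correct when x > 0."""
    total = 0.0
    for x in levels:
        p_accept = choice_probability(x, slope, offset, lapse)
        total += p_accept if x > 0 else 1.0 - p_accept
    return total / len(levels)


# Hypothetical signed feature levels along the task-relevant dimension.
levels = [-2, -1, 1, 2]

# Same unbiased boundary and slope; only the lapse rate differs.
acc_no_lapse = expected_accuracy(levels, slope=3.0, offset=0.0, lapse=0.0)
acc_lapsing = expected_accuracy(levels, slope=3.0, offset=0.0, lapse=0.2)

print(f"no lapses:  {acc_no_lapse:.3f}")
print(f"20% lapses: {acc_lapsing:.3f}")
```

Under this assumed form, accuracy cannot exceed `1 - lapse / 2` however precisely the boundary is placed, so a precise boundary and a high lapse rate can yield the same overall accuracy as a less precise boundary with few lapses.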