A, Hierarchical rule structure used during the task, modeled after Werchan et al. (2015). During the learning task, infants saw face–voice/toy–word mappings that could be grouped into hierarchical rule sets, with the faces and voices serving as higher-order contexts. During the generalization task, infants saw a previously learned rule set now paired with a novel face and voice (RS1-A), and one new toy–word pairing was added to that rule set. During the inference test, infants saw the faces and voices from the learning task paired with the novel toy–word mapping from the generalization task. Infants' looking times to pairings that were consistent versus inconsistent with the hierarchical structure were measured. B, The learning task was divided into four 24 s blocks: two in which the higher-order rule switched from one trial to the next (Switch 1 and Switch 2) and two in which the higher-order rule stayed the same from one trial to the next (Stay 1 and Stay 2). The order of blocks was counterbalanced.