(A) The BOLD signal was significantly enhanced in temporal auditory regions, in primary visual areas, and in the right angular gyrus for S2 faces compared with S2 voices. (B) The fusiform gyrus showed a significant decrease in activation in response to S2 voices primed by faces compared with S2 voices primed by voices. The mean percent signal change of the peak voxel is plotted for each condition. Error bars indicate the standard error of the mean. Crossmodal = face prime, unimodal = voice prime, person‐congruent = S1 and S2 same speaker, person‐incongruent = S1 and S2 different speakers, L = left, R = right.