Abstract
Social animals can identify conspecifics by many forms of sensory input. However, whether the neuronal computations that support our ability to identify individuals rely on modality-independent convergence or on ongoing synergistic interactions along the multiple sensory streams remains controversial. Direct neuronal measurements at relevant brain sites could address such questions, but this requires better bridges between the work in humans and that in animal models. We review recent studies in nonhuman primates on voice- and face-identity sensitive pathways and evaluate their correspondence to relevant findings in humans. This synthesis provides insights into converging sensory streams in the primate anterior temporal lobe for identity processing. Furthermore, we advance a model and suggest how alternative neuronal mechanisms could be tested.
Keywords: Face, voice, multisensory, identity, human, primate, temporal lobe
Missing pieces in identity processes
Certain individuals are unmistakable by their visual face or auditory voice characteristics, others by their smell or how they move. Identifying an individual, or any other unique entity, is an instance of the general problem of object identification, a process that occurs at different levels of categorization (e.g., basic, subordinate). At a basic level, identifying objects relies on recognizing the categorical features of the object class; social animals can also perceptually categorize species membership, social rank, body size, or age [1, 2]. However, individuals are unique entities identified by more subtle within-category differences, referred to as subordinate level identification. For example, while a human or monkey might be content to eat “a” banana, social situations critically depend on identifying specific individuals to avoid or interact with. Identifying unique concrete entities, such as specific individuals, can be achieved by input from several sensory modalities; by contrast, the sound of a banana falling onto the ground may be indistinguishable from the sound of another fruit falling.
The nature of the multisensory computations underlying the identification of individuals in the brain remains unclear. In the following we consider two scenarios. First, each sensory input might by itself be sufficient to activate an identity-specific neuronal representation. In this case, unique individual identification likely relies on amodal (Glossary) or modality-independent convergence sites, whose neural representations can be driven by any sensory input. We refer to this as an ‘or gate’. Alternatively, identification may emerge from the synergistic interplay of all available incoming signals that collectively shape the neural representation. In this case, missing input from one sensory stream will alter the neural representations at the site of convergence. We refer to this as a ‘synergistic’ process, which could be additive or non-additive [3]. Pursuing the underlying mechanisms, whatever they may be, and their impact on behavior will reveal how neural activity is used to identify individuals and unique entities.
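To make the distinction concrete, the toy simulation below contrasts the two scenarios for a hypothetical convergence-site neuron driven by an auditory (voice) and a visual (face) input. The firing-rate values, the thresholded ‘or gate’ rule and the multiplicative synergy term are illustrative assumptions, not measured parameters; the point is only that the two mechanisms predict different consequences when one input is removed.

```python
import numpy as np

def or_gate(voice, face, threshold=5.0, rate=50.0):
    """'Or gate': either input alone that exceeds threshold drives
    the full identity response; the output does not depend on the
    particular combination of inputs (hypothetical rule)."""
    return rate if (voice > threshold or face > threshold) else 0.0

def synergistic(voice, face, w_v=1.0, w_f=1.0, w_vf=0.15):
    """Synergistic combination: the response reflects both inputs,
    here via an additive term plus a non-additive (multiplicative)
    interaction term (illustrative parameter values)."""
    return w_v * voice + w_f * face + w_vf * voice * face

voice_in, face_in = 20.0, 15.0  # toy input drive (spikes/s)

for label, inputs in [("voice+face", (voice_in, face_in)),
                      ("voice only", (voice_in, 0.0)),
                      ("face only", (0.0, face_in))]:
    print(f"{label:10s}  or-gate: {or_gate(*inputs):5.1f}  "
          f"synergy: {synergistic(*inputs):5.1f}")
# The 'or gate' output is unchanged when one input is removed,
# whereas the synergistic output changes form -- the signature
# that Box 1 proposes to test with reversible inactivation.
```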
Initial insights into the regions underlying the identification of individual persons were provided by lesion and neuroimaging studies [4-7]. Such work revealed a distributed network of brain regions engaged in extracting different types of sensory features, such as faces [8]. Lesion studies show that damage to face-selective regions in occipital cortex, fusiform cortex and the anterior temporal lobe (ATL) can impair face perception, a disorder known as prosopagnosia [9-12]. The analogous auditory disorder affecting voices (phonagnosia) [13] can arise from damage to parts of the same temporal lobe network involved in prosopagnosia, although the heterogeneity in lesion size and location across patients makes more detailed distinctions difficult [4, 12, 14]. Lesions of the language-dominant (left) ATL are associated with a decline in the ability to name both famous faces and famous voices [15, 16]. Naming a person involves lexical retrieval, which depends on language-related processes in frontal, temporal and parietal regions around the Sylvian fissure [17], including the ATL [18-20].
However, current accounts of the neural processes involved in assessing identity remain equivocal. The most common approaches can identify the large-scale neural substrates but provide limited insights into the overlap, segregation and form of the neuronal representations involved in identity processes, because neuroimaging approaches measure either surrogates of neuronal activity or large-scale neural responses. Consequently, there is a need for direct measures of localized neuronal computations to resolve alternative accounts. Direct neuronal recordings in human patients being monitored for neurosurgery can inform on neuronal function in localized regions of the human brain, while work in animal models can describe neuronal processes at multiple scales directly from the regions of interest and offers greater specificity in neuronal manipulation (activation/inactivation). However, the animal work has not kept pace. The current literature in humans considers multisensory interactions and convergence a research priority, with studies often collecting data from at least two sensory modalities [4, 14] and highlighting the advantage of combining visual (face) and auditory (voice) input for identity recognition [21, 22]. By contrast, neuronal-level studies in animal models have usually been restricted to one sensory modality, e.g., face-identity processes in the visual system. In that respect, recent findings from auditory voice-identity related neuronal studies in monkeys may help to integrate the human and nonhuman animal work and increase our understanding of the organization of identity processing in the brain.
In the following we briefly overview two alternative neurocognitive models of identity processing developed in humans. We then review recent studies on voice- and face-identity processes and multisensory pathways conducted in nonhuman primates and evaluate the correspondences to relevant findings in humans. From this synthesis, we propose a model of primate ATL function for identity-related processes and use it to identify pressing gaps in our understanding. We conclude by suggesting how alternative neuronal mechanisms could be tested.
Human models of identity perception: What are the neuronal mechanisms?
Current theoretical models developed from human studies of face- and voice-identity perception have suggested that the related auditory and visual streams converge in the ATL [4-7, 12, 23]. Convergence in the ATL has also been independently articulated in lesion studies of semantic disorders, where neurosurgical resection, stroke, or degeneration of the ATL (bilaterally or unilaterally [24]) affects a person’s ability to name or recognize an individual by seeing their face or hearing their voice [18, 19, 25-27].
Two prominent, not mutually exclusive models are as follows. A “distributed-only” model proposes that the sensory features important for recognizing an individual engage distinct brain regions, interconnected into a network [4, 18, 19]. This model does not require amodal convergence sites because the interconnectivity allows inputs from different sensory modalities to influence the collective network-wide processes. Damage to any node in a distributed network will selectively disrupt the key contribution of that node and influence, but not necessarily preclude, the function of the rest of the network. For instance, a lesion of voice-sensitive regions might result in phonagnosia and affect voice/face multisensory interactions, but will not disrupt the ability to identify an individual with inputs from the preserved sensory modalities. Another “distributed-plus-hub” model (or related “hub-plus-spoke” models) for identity processing also contains distributed processes but features the ATL as a key hub or convergence site whose function is amodal [4-7, 18, 19, 23]. Crucially, the function of a damaged amodal process cannot be recovered by the rest of the network (for a computational model see: [25]).
Both models rely on multisensory convergence sites, but differ in the processing at these sites. In this paper, we take this a step further to suggest neuronal mechanisms that could be tested even at the level of single neurons. For instance, multisensory convergence in the ATL as an amodal process suggests an ‘or gating’ function, where one or another synaptic input is sufficient to result in neuronal depolarization. The alternative mechanism is a synergistic interaction of the available multisensory inputs, such that the form of the neuronal representation depends on the combination of the different available inputs. It is also possible that the ‘or gating’ occurs as a by-product of a converging synergistic multisensory neural process being assigned a top-down label.
Thus, different scientific lines have converged on questions about the neural multisensory interactions in ATL sites and the identity-related processes that they support. Although animal models cannot directly address questions of interest for lexical retrieval, since naming relies on human language, work in nonhuman animals can clarify which identity processes in the ATL are evolutionarily conserved and the cognitive functions that they support (e.g., perceptual awareness, conceptual knowledge; Questions Box). Recent developments now make it possible to better bridge the gaps between the work in humans and that in other animals.
Outstanding Questions Box.
Is any sensory input at multisensory convergence sites sufficient for the identification of conspecifics, or does identification emerge from the synergistic interaction of all incoming signals?
Which subregions of the human ATL support identity processes, how are they structurally and functionally interconnected, and how does this compare to data in animal models?
How do ATL regions functionally interact with other brain regions, and is interaction with medial temporal lobe structures required for identity recognition and memory?
Can animal studies clarify whether certain ATL processes are crucial for perception and/or conceptual knowledge? Attentional cuing tasks can be used to assess an animal’s perceptual awareness of attended voice or face features. Also, person-specific conceptual knowledge could be modelled in nonhuman animals to assess modality-independent representations of specific individuals—using voices/faces of familiar animals or the animal’s known social categories (social rank, etc.) and adaptation, oddball or other paradigms.
What is the impact of degrading or inactivating a sensory input stream on the neuronal responses at face-voice convergence sites?
Which subregions of the ATL have amodal representations in the identity network, and does this require an ‘or gate’ process or does top-down selection serve as the gating mechanism?
Regardless of the nature of the multisensory mechanisms, what is the pattern of encoding at the neuronal subpopulation level in any of these regions (e.g., distributed, sparse or other coding strategy) and how is this affected by removing sensory inputs?
How do oscillatory mechanisms support identity representations in the ATL, and are these processes specifically involved in person identification or form part of general computational principles that apply across sensory tasks and brain regions?
Face and voice regions in humans and other animals
Face-sensitive neurons, which respond more strongly to faces than to non-face objects, were first identified in the monkey inferior temporal (IT) cortex [28, 29]. Subsequently, neuroimaging studies revealed face-category preferring regions in the human fusiform gyrus and occipital areas [8, 30, 31] and in the fundus and inferior bank of the monkey superior-temporal sulcus (STS) [32-36]. In the auditory modality, voice-sensitive regions have only recently been identified in humans and other animals.
Auditory studies in animal models have shown that neuronal responses typically become increasingly selective for complex sound features along the auditory processing hierarchy [37-42], and that the ventral-stream pathway processing “what” was vocalized in primates involves auditory cortex [43, 44], the anterior superior-temporal gyrus (STG) [38, 45], temporal polar cortex [46], anterior insula [47] and ventrolateral prefrontal cortex (vlPFC) [48, 49]. More direct study of “who” vocalized, rather than “what” was vocalized, requires manipulating voice content.
Regions responding more strongly to voice vs. non-voice categories of sounds were first identified in humans with fMRI [50] and include regions in the STG and STS (Fig. 1A). However, it is known that human voice regions can also strongly respond to or decode speech content [51], raising the concern that voice and speech representations might be functionally interdependent in the human brain and not evident in the same way in the brains of other animals. With the advent of auditory fMRI in nonhuman animals, scientists were able to address this: the comparison of voice vs. non-voice driven responses showed evidence for evolutionary counterparts to human voice regions in the monkey supratemporal plane (STP) (Fig. 1B; [52]) and in the temporal lobe of domesticated dogs (Fig. 1C; [53]).
Figure 1. Temporal lobe voice areas in humans, monkeys and dogs.
(A) Voice-category sensitive sites (voice vs. non-voice sounds; blue) in the human temporal lobe, or those that are voice-identity sensitive (within category; red). The identified sites are projected onto the surface using PySurfer software (https://pysurfer.github.io/) and correspond to the identified peaks of the activity clusters reported in: [50, 57-60, 76, 115]. This representation focuses only on the temporal lobe and the right hemisphere although, as the original reports show, the left hemisphere also has temporal voice-sensitive regions. For a recent probability map of human voice-category sensitive regions see [87]. PAC: primary auditory cortex, TP: temporal pole, a: anterior, p: posterior. (B) Summary of voice-category and voice-identity sensitive sites in the macaque temporal lobe, obtained from peak activity clusters reported in [52]. Also shown are vocalization-sensitive peak responsive sites (purple) reported in other macaque neuroimaging studies [46, 116, 117]. (C) Voice-category sensitive areas in the brains of domesticated dogs [53], showing a cluster in the anterior temporal lobe. rESG: rostral ectosylvian gyrus, SG: Sylvian gyrus, SF: Sylvian fissure, SSS: suprasylvian sulcus, ESS: ectosylvian sulcus, r: rostral (anterior), m: middle, c: caudal (posterior). Images in C kindly provided by A. Andics.
There are multiple voice-category preferring clusters in the primate STP [52], just as there exist several face-category preferring clusters more inferior in the temporal lobe [33, 34, 54]. Yet, complementary findings have now been obtained in both auditory and visual modalities that point to ATL regions being more sensitive to unique identity than more posterior temporal lobe regions (face identity: humans [36, 55], monkeys [32, 56]; voice identity: humans [57-60], monkeys [52], Fig. 1A-B; infant voice-sensitive regions [61, 62]).
Voice cells, face cells and multisensory interactions
In monkeys, targeted neural recordings in voice-identity sensitive fMRI clusters in the ATL provided the first evidence for voice cells (Fig. 2A; [63]). These neurons respond strongly to the category of conspecific voices (Fig. 2B), as well as differentially to specific voices within that category (Fig. 2C-D; [63]). The neurons in the ATL voice region are also sensitive to auditory features in the vocalizations, such as caller identity (Fig. 2C), further qualifying this anterior STP region as a higher-order auditory area [64] and supporting the notion that the ATL is important for identity processes.
Figure 2. Voice- and face-sensitive neuronal response characteristics in monkeys.
(A) Targeting approach for recording from the anterior voice-identity sensitive fMRI cluster (red). Multisensory cortex in the upper bank of the STS is illustrated in yellow. The fundus and the lower bank of the STS can contain face-sensitive clusters (blue, see text). (B) Voice-sensitive neurons show a categorical response to monkey vocalizations produced by many different callers (MVocs) that is two-fold greater than responses to vocalizations from other animals (AVocs) or non-voice natural sounds (NSounds) [63]. (C) Units sensitive to voice (caller) identity are often found within the pool of voice-category preferring units. Such units show comparable responses to two different vocalizations (here the response to “coo” and “grunt” calls is averaged) and differential responses to individual callers (caller M1 vs. M2) [64]. (D) Voice-sensitive neurons respond very selectively to a small subset of the stimuli within the conspecific voices/vocalizations category. (E) Voice-sensitive cells appear to be more stimulus-selective (respond well to smaller percentages of the presented voices, [63]) than face cells, which, by comparison, tend to respond to ~55% of the faces within the face stimuli [35, 65]. Panels A and C are modified from [63] and reproduced with permission from Society for Neuroscience.
However, the functional organization of face- and voice-sensitive clusters in the ATL may not be identical [63]. For example, face cells might be more abundant in fMRI-identified face patches [35] and be less selective to individual static faces (Fig. 2E; [35, 65]). By contrast, voice cells cluster in modest proportions and respond very selectively to a small subset of the presented voice stimuli (Fig. 2E); for further discussion see [63]. This high stimulus-selectivity of auditory ATL neurons is not unexpected [38] and is on par with the selectivity of neurons in the vlPFC [40, 48]. These initial comparisons suggest potential divergences in the neuronal substrates of identity representations in the auditory and visual streams at these processing stages.
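As a hedged illustration of how such stimulus-selectivity comparisons can be made, the sketch below computes the proportion of stimuli that drive a unit significantly above baseline from a trials × stimuli matrix of firing rates. The response criterion and the simulated data are our assumptions for illustration, not the analysis pipeline of the cited studies.

```python
import numpy as np
from scipy import stats

def fraction_responsive(rates, baseline, alpha=0.05):
    """Proportion of stimuli evoking firing significantly above
    baseline (one-sided t-test per stimulus; illustrative criterion).
    rates: (n_trials, n_stimuli) evoked firing rates
    baseline: (n_trials,) pre-stimulus firing rates"""
    n_stim = rates.shape[1]
    responsive = 0
    for s in range(n_stim):
        t, p = stats.ttest_ind(rates[:, s], baseline)
        if t > 0 and p / 2 < alpha:  # one-sided test
            responsive += 1
    return responsive / n_stim

rng = np.random.default_rng(0)
baseline = rng.poisson(5, size=20).astype(float)
# Toy "voice cell": strong responses to only 2 of 12 stimuli
rates = rng.poisson(5, size=(20, 12)).astype(float)
rates[:, :2] += 15
print(f"fraction responsive: {fraction_responsive(rates, baseline):.2f}")
```

On this kind of measure, a unit responding to ~2 of 12 voices would register as far more selective than a face cell responding to ~55% of the presented faces (Fig. 2E).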
Regarding the nature of the multisensory interactions underlying individual identification, there is now substantial evidence for anatomical and functional crosstalk at various stages of the sensory pathways in humans and many other animal species [66-73]. Neuronal responses to voices and dynamic faces have been compared in monkeys between the voice-sensitive anterior (a)STP and the anterior upper-bank of the STS (uSTS) [64, 73, 74], which is part of multisensory association cortex in primates [64, 66, 68, 72, 73, 75, 76]. Anterior uSTS neurons, unlike those in the aSTP, are not particularly sensitive to auditory vocal features [64], which is also the case for more posterior regions of the uSTS [73, 74]. By comparison, however, anterior uSTS neurons show a balance of both auditory and visual responses (Fig. 3B) and are sensitive to the congruency of the presented voice-face pairings: multisensory influences in the uSTS tend to occur more frequently in response to matching than to mismatched audio-visual stimuli (e.g., a monkey face paired with a human voice). By contrast, aSTP neurons exhibit weak visual-only responses [64]. Also, multisensory influences in the aSTP are less selective for correct face/voice pairings and are qualitatively more similar to those reported in and around primary auditory cortex than to those in the STS [70]. These observations are consistent with the evidence for integrative multisensory processes in the human and monkey STS [74, 77], potentially at the cost of decreased specificity for unisensory representations [68]. The results reveal visual modulation in the ATL, but underscore the auditory role of the primate voice-sensitive aSTP, with more robust multisensory integration occurring in the STS.
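One common way to quantify such multisensory influences at the single-unit level, consistent with the subadditive example in Fig. 3A, is to compare the audio-visual response against the sum of the unisensory responses. The sketch below is a minimal version of this test, with simulated firing rates and a simple one-sample t-test standing in for the statistics of the cited studies.

```python
import numpy as np
from scipy import stats

def classify_multisensory(r_a, r_v, r_av, alpha=0.05):
    """Label a unit's audio-visual interaction relative to the
    additive prediction (A + V); illustrative decision rule."""
    additive_pred = r_a.mean() + r_v.mean()
    t, p = stats.ttest_1samp(r_av, additive_pred)
    if p >= alpha:
        return "additive"
    return "superadditive" if r_av.mean() > additive_pred else "subadditive"

rng = np.random.default_rng(1)
r_a = rng.normal(20, 3, 30)   # auditory-only firing rates (spikes/s)
r_v = rng.normal(6, 3, 30)    # visual-only firing rates
r_av = rng.normal(18, 3, 30)  # combined AV: below the ~26 additive prediction
print(classify_multisensory(r_a, r_v, r_av))  # expected: subadditive
```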
Figure 3. Neuronal multisensory influences and effective functional connectivity in the monkey brain.
(A) Example of a nonlinear (subadditive) multisensory unit in voice-sensitive cortex: Firing rates in response to combined audio-visual stimulation (AV, voice and face) significantly differ from the sum of the responses to the unimodal stimuli (A: auditory and V: visual). (B) Neuronal multisensory influences are prominent in voice-sensitive cortex (anterior supratemporal plane; aSTP) but are qualitatively different from those in the anterior superior temporal sulcus (aSTS). For example, aSTS neurons more often display bimodal responses [64]. (C) A study of effective functional connectivity using combined microstimulation and fMRI shows that stimulating voice-sensitive cortex (blue cross) tends to elicit fMRI activity in ATL regions [81]. (D) By contrast, stimulating the aSTS also elicits fMRI activity in frontal cortex, in particular the orbitofrontal cortex (OFC). A: anterior, P: posterior, S: superior, I: inferior. Panel A is modified from [64] and reproduced with permission from Society for Neuroscience. Images for C-D provided by C. Petkov.
A number of studies are also assessing the timing of neuronal responses relative to oscillatory activity, as a mechanism for routing and prioritizing sensory information [78]. For instance, the latencies of auditory cortical responses decrease when a visual face confers a behavioral benefit on the reaction time for detecting an auditory voice [79]. Also, neurons in the monkey voice-sensitive aSTP show crossmodal (face-on-voice) phase-resetting that can predict the form of multisensory neuronal responses [71]. These phase-resetting effects appear to be more similar to those reported in and around primary auditory cortex than to those reported in the STS [72]. Moreover, neurons in the monkey STS show specific patterns of slow oscillatory activity and spike timing that reflect visual category-specific information (faces vs. objects) [80]. Taken together, this suggests that the interplay of individual neurons and the local network context shapes sensory representations. Yet, whether oscillatory processes are specifically involved in identity processing or constitute more general computational principles shared across brain regions remains unclear (Questions Box).
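Crossmodal phase-resetting of the kind described above is typically quantified as an increase in inter-trial phase coherence (ITC) of low-frequency field potentials following the crossmodal input. The sketch below shows one minimal way to compute ITC from band-passed local field potential (LFP) trials with a Hilbert transform, using simulated data and illustrative filter settings rather than the parameters of the cited work.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def itc(lfp_trials, fs, band=(4.0, 8.0)):
    """Inter-trial phase coherence over time for band-passed LFPs.
    lfp_trials: (n_trials, n_samples) array; ITC near 1 indicates
    phase alignment across trials (e.g., after phase-resetting)."""
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, lfp_trials, axis=1)
    phases = np.angle(hilbert(filtered, axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

fs = 1000.0
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(2)
# Simulate theta oscillations with random phase before a "face" input
# at 0.5 s, after which the phase is reset to a common value across
# trials without a change in oscillatory power.
trials = []
for _ in range(40):
    ph = rng.uniform(0, 2 * np.pi)
    pre = np.sin(2 * np.pi * 6 * t[:500] + ph)
    post = np.sin(2 * np.pi * 6 * t[500:])
    trials.append(np.concatenate([pre, post]) + 0.5 * rng.standard_normal(t.size))
coh = itc(np.array(trials), fs)
print(f"ITC before input: {coh[100:400].mean():.2f}, after: {coh[600:900].mean():.2f}")
```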
Interconnectivity between face and voice regions and other areas
Recently, the directional effective connectivity of the voice network was investigated using combined microstimulation and fMRI in monkeys, providing insights into voice-related and multisensory processing pathways in the primate ATL [81]. Stimulating a brain region while scanning with fMRI can reveal the synaptic targets of the stimulated site, a presumption supported by the fact that the target regions activated by stimulation are often consistent with those identified using anterograde neuronal tracing [e.g., 81, 82].
Surprisingly, microstimulating voice-identity sensitive cortex does not strongly activate prefrontal cortex, unlike stimulation of downstream multisensory areas in the STS and upstream auditory cortical areas in the lateral belt [81]: The voice-sensitive area in the primate aSTP seems to interact primarily with an ATL network including the uSTS and regions around the temporal pole (Fig. 3C). By contrast, stimulating the uSTS results in significantly stronger frontal fMRI activation, particularly in orbital frontal cortex (Fig. 3D). These observations suggest that multisensory voice/face processes are integrated in regions such as the uSTS in the ATL prior to having a stronger impact on frontal cortex, providing additional insights to complement those on ATL connectivity [49, 83-86] and neuronal processes [38, 46, 64].
However, there is a noted distinction between the species [52]: human voice-sensitive clusters are often localized in the STS, which in monkeys is classified as multisensory association cortex [3, 66]. Interestingly, a recent probabilistic map of human temporal voice areas suggests that anterior voice-sensitive regions are located in the human STG and posterior ones in the STS [87]. Thus there may be a close correspondence across species in terms of anterior voice-sensitive clusters and multisensory processes in the STS, although this issue is worth evaluating further (Questions Box).
Human neuroimaging studies have shown that voice and face regions in the temporal lobe can each be influenced by the other modality [88-90] and are structurally connected to each other [91]. Another study found that the posterior STS bilaterally and the right aSTS respond preferentially to people-related information regardless of the sensory modality [76], which could be construed as evidence that certain human voice regions in the anterior STS are amodal [92, 93]. However, it is currently unclear whether and how voice and face regions in the human temporal lobe are interconnected with multisensory regions in the STS and those in the temporal pole or frontal cortex, knowledge that is already available in monkeys. Ongoing efforts could be complemented with more direct measures of local ATL neural responses to voices and faces in humans, to compare with intracranial recordings in monkeys.
Human intracranial recordings during face and voice naming
An earlier study recording from neurons in the medial temporal lobe (MTL) of patients reported highly selective responses to pictures of known celebrities, such as Jennifer Aniston [94]. Recently, several studies have been conducted in human subjects performing voice and face naming tasks [26, 92, 95]. One group in particular has developed more extensive coverage of the different ATL regions for subdural cortical recordings [26, 96]. Using a voice or face naming task while recording local-field potentials revealed strikingly similar neuronal responses in the ATL regardless of the form of the sensory input, auditory or visual (Fig. 4). By contrast, electrode contacts over auditory areas in the STG mainly responded to the voice, and those over the visual fusiform gyrus mainly to the face stimulus. Moreover, ATL responses to the voices/faces tended to be in lower frequency bands (strongest in the beta band), whereas unisensory responses in the STG and fusiform gyrus (FG) were in the gamma band (Fig. 4). This might be of interest in relation to suggestions that gamma band activity can be a measure of local or feed-forward processes, while beta band activity could be an indication of top-down feedback [78]. One speculative possibility is that the ATL receives face/voice input from, and provides feedback to, unisensory cortex, consistent with cognitive models whereby the ATL reactivates [15] or routes information held in sensory-specific cortex. Alternatively, even certain feed-forward processes in the ATL might not appear in the gamma range, as the temporal coordination of the neural activity generating oscillations may differ across regions. Tentatively, these human intracranial recording results suggest modality-independent representations in parts of the ATL, while sensory-specific responses dominate in the superior (voice) and inferior (face) portions of the ATL. However, given that the task in these human studies involved lexical retrieval, it remains important to assess face- and voice-sensitive processes using non-linguistic tasks.
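A minimal way to examine such band-limited response profiles, assuming access to intracranial voltage traces for each contact, is to contrast power in the beta and gamma ranges. The band edges and the Welch-based estimator below are conventional choices for illustration, not necessarily those of the cited study.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, band):
    """Mean spectral power of signal x within a frequency band."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 512))
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].mean()

def beta_gamma_ratio(x, fs, beta=(15.0, 30.0), gamma=(30.0, 100.0)):
    """Ratio > 1 suggests a beta-dominated response (as reported for
    ATL contacts), < 1 a gamma-dominated one (STG/FG contacts)."""
    return band_power(x, fs, beta) / band_power(x, fs, gamma)

fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(3)
atl_like = np.sin(2 * np.pi * 22 * t) + 0.3 * rng.standard_normal(t.size)
stg_like = np.sin(2 * np.pi * 70 * t) + 0.3 * rng.standard_normal(t.size)
print(f"ATL-like contact beta/gamma: {beta_gamma_ratio(atl_like, fs):.1f}")
print(f"STG-like contact beta/gamma: {beta_gamma_ratio(stg_like, fs):.1f}")
```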
Figure 4. Anterior temporal lobe neuronal recordings in humans.
Intracranial human recordings from several areas in the temporal lobe during an auditory and visual identity naming task. (A) Regions of the anterior temporal lobe (ATL) are responsive to both a picture and the voice of an individual [26]. In contrast, a visual area in the posterior fusiform gyrus (pFG) responds mainly to the picture, and auditory cortex on the superior-temporal gyrus (STG) to the sound. Note that verbal naming followed the period of recording in response to the faces/voices as stimuli. (B) Some contacts in three patients (L206, L242, L258) show unimodal (picture/voice) responses in the ATL, particularly in the beta band. Other contacts show responses to both. Images were modified by K. Nourski from [26] with permission from Society for Neuroscience.
Establishing better causal relationships with identity perception
Thus far, our understanding of how perturbing neuronal processes or brain regions affects identity-related perception is limited. For practical reasons, recordings in monkeys and many human imaging studies are conducted with passive stimulation or a stimulus-irrelevant task, such as visual fixation. An earlier study showed that microstimulation of monkey IT neurons influenced subsequent face category judgements [97]. Recently, transcranial magnetic stimulation of human temporal voice regions selectively disrupted voice category judgements [98]. In another study, directly stimulating the fusiform gyrus of a human patient warped the patient’s perception of a face [99]. Whether these manipulations would also have affected perception in a sensory modality other than the one studied is a topic for future research.
A primate model of identity processes
We propose a model of individual identity processes in primates, on the basis of the prior synthesis (Key Figure), as follows:
a) Two independent but interacting auditory and visual ventral processing streams extract voice/face features. ATL regions are sensitive to identity features, with other temporal lobe regions evaluating different aspects of voice/face content, such as category membership.

b) The STS is a key conduit between the voice and face processing streams, with the aSTS serving as an ATL convergence site that allows multisensory representations to more strongly influence frontal cortex.

c) Neurons in ATL subregions such as the aSTS and the temporal pole integrate highly subcategorized information specific to unique individuals and concrete entities. Such representations may not be tied to any sensory modality, and the neural mechanisms need to be determined (Box 1). Possibly the ATL can feed back to unisensory processing streams to route specific forms of input.

d) Anatomical connectivity between the primate ATL regions is funneled into the temporopolar cortex [85, 100], but less is known about its functional role in primates in relation to identity processes.

e) Identity recognition is likely to involve MTL structures. Currently it is an open question whether auditory pathways to the MTL in nonhuman primates are less direct than those in humans [101, 102], requiring cross-species comparisons of interconnectivity.
At a regional level, the primate model is generally in agreement with human models of face and voice perception, whereby distinct sensory processing streams engage in prominent multisensory interactions between face and voice areas [4, 5, 12]. One issue that needs addressing is whether human voice regions in the STG/STS are intrinsically more multisensory than the voice regions in the monkey aSTP. It is possible that human auditory voice regions in the STG are difficult to distinguish from neighboring multisensory regions in the STS in group neuroimaging data. Thus the anterior upper-bank of the STS may be a key site of multisensory convergence in both humans and monkeys. The model suggests that candidate convergence sites in the ATL are the aSTS and the temporopolar cortex.
Key Figure. Primate model for identity-processing networks and multisensory convergence pathways.
The model focuses on the auditory pathway involved in extracting voice-identity content in communication signals and the analogous visual pathway. The principles would apply to other sensory input streams although the regions involved may differ. The key features of the model are the initial sensory and category-sensitive processing stages (m/pSTS; visual area TEO and auditory regions in posterior STP/STG). Multisensory influences are present throughout the visual and auditory pathway, but are thought to be qualitatively different in the STS, in relation to, for example, aSTP regions where the auditory modality is dominant [64, 72]. Identity-related processes would primarily involve ATL regions (anterior STP/STG; aSTS; aIT). Not illustrated are interactions with MTL structures such as the entorhinal cortex and hippocampus that could support the recognition of familiar individuals. The model is illustrated to the right on a rendered macaque brain to reveal some of the bidirectional pathways of inter-regional connectivity (yellow), as well as some of the feed-back projections to auditory and visual processing streams (green). A number of multisensory convergence sites are evident, which for identity-related processes in the ATL appear to involve at least the aSTS and regions of temporopolar (TP) cortex.
Box 1. Predicted neuronal mechanisms and impact of lost unisensory input.
Multisensory convergence sites have responses that are influenced by the unisensory streams feeding into the region, but the neuronal mechanisms for these convergence processes could be very different. Two simple mechanisms are illustrated, as is the expected impact of lost unisensory function on neural responses at the convergence site. If the two inputs (‘a’, ‘b’) combine additively [118], multiplicatively or divisively in the neuronal responses, the convergence site will reflect a synergistic combination of the two (‘ab’). Alternatively, if the convergence site functions as an “OR” gate, then the result (‘c’) would differ from the form evident in the sensory inputs, as would a synergistic process (‘ab’), but the neuronal computations involved are different.
Teasing apart the mechanisms requires eliminating or degrading one form of input (such as by using local, reversible molecular or genetic neuronal inactivation in an animal model) while stimulating with multisensory input (e.g., the voice and face of a specific individual). Assessing whether the convergence site then shows more ‘a’ or ‘b’ responsiveness would clarify whether the mechanism is a synergy that is disrupted when one input stream is lost. The alternative is that the loss of input from one sensory stream does not qualitatively alter the form of the responsiveness at the convergence site. It is currently unknown which of these, or other, mechanisms are implemented in any of the multisensory sites identified in the Key Figure.
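As a sketch of the logic of this test, assume that we can record a convergence-site response pattern (a vector of firing rates across stimuli) with both inputs intact and again with one input inactivated; correlating the ‘lesioned’ pattern against the intact patterns then gives a simple readout of whether the site behaves more like an ‘OR’ gate or like a disrupted synergy. The correlation-based decision rule and simulated responses below are our illustrative assumptions, not an established analysis.

```python
import numpy as np

def diagnose_mechanism(r_intact, r_voice_only, r_inactivated):
    """Compare the response pattern after inactivating the face input
    against reference patterns (vectors of responses across stimuli).
    An 'OR' gate predicts the multisensory pattern survives on the
    remaining input; synergy predicts collapse toward the intact
    voice-only pattern (illustrative decision rule)."""
    r_to_intact = np.corrcoef(r_inactivated, r_intact)[0, 1]
    r_to_voice = np.corrcoef(r_inactivated, r_voice_only)[0, 1]
    return "or-gate-like" if r_to_intact > r_to_voice else "synergy-like"

rng = np.random.default_rng(4)
n_stim = 30
r_voice = rng.gamma(2.0, 5.0, n_stim)                 # voice-only pattern
r_face = rng.gamma(2.0, 5.0, n_stim)                  # face-only pattern
r_multi = r_voice + r_face + 0.2 * r_voice * r_face   # intact synergistic site

# Predicted outcome if the site is synergistic and face input is removed:
r_after = r_voice + rng.normal(0, 1.0, n_stim)
print(diagnose_mechanism(r_multi, r_voice, r_after))  # synergy-like
```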
Related predictions can be extended to measures of oscillatory activity rather than firing rates, with the main difference being the patterns of combined multisensory responses at convergence sites: dominant sensory input typically elicits a broadband power increase and strong low-frequency phase alignment, while non-dominant crossmodal inputs can reset the phase of ongoing low-frequency cortical oscillations without a strong increase in power. Such cross-sensory phase-resetting can predict multisensory enhancement or suppression of spiking responses, depending on its phase relationship to the dominant sensory response [71, 119]. In intact synergistic processes, the multisensory oscillatory response resulting from a combination of dominant and non-dominant inputs could be a scaled approximation of the response to the dominant inputs. In contrast, in an ‘OR’ operation, different equally dominant inputs may be combined into a form of oscillatory response that is characteristic of that particular multisensory site. Again, teasing apart the mechanisms requires eliminating or degrading one form of sensory input.
Furthermore, the multisensory computations underlying individual identification remain unclear. First, it remains possible that in certain ATL sites a process resembling convergence at a larger scale might, at a finer scale, be found to be partially segregated by unisensory input [74, 77]. Second, the current findings cannot adjudicate between multisensory ‘synergistic’ and ‘or gate’ mechanisms (Box 1) in specific regions: while the monkey recordings from the uSTS appear consistent with a synergistic process, as suggested by the results of non-additive multisensory interactions, they also reveal independent activation by auditory or visual stimulation (Fig. 3A; [64]). The human ATL recordings that show strikingly similar responses to voice and face stimuli prior to naming, which differ from responses in unisensory regions [26], suggest an ‘or gate’ operation. In humans, when ATL voice and face responsive sites are injured, voice and face naming are both impaired, suggestive of a synergistic interaction [16]. Formally testing the alternative neuronal mechanisms will require inactivating one of the input streams during multisensory stimulation, as we illustrate in Box 1, and might require animal models for adequate specificity.
While the notion of sites with amodal functions may well be disproved in the future, it is a useful concept for generating testable predictions on neuronal processes and multisensory interactions. It is also worth keeping in mind that the ATL is one of several major multisensory convergence sites in the brain, each serving various purposes. For example, the angular gyrus in humans is part of a multiple-demand, cognitive control network [103] that appears to also be present in monkeys [104]. There may also be a gradation between modality-specific and amodal representations in the ATL [19, 86], which our simple model does not capture but which could be explored with computational simulations as well as with additional data on neuronal processes in convergence sites and the regions that influence them. Finally, the picture becomes more complex with feed-back interactions, but these are important to consider, as cognitive ‘reactivation’ of the ATL during retrieval [15] may convert a synergistic process into an or gate.
Identity processes from a broader evolutionary perspective
The proposed primate model may be generalized for testing in other nonhuman animals. Rodents identify each other by odor [105], and odor identity is represented in the olfactory piriform cortex [106, 107] (which is interconnected with the entorhinal cortex [108], one of the regions present in the primate MTL; Fig. 5). Pup odors and vocalization sounds can synergistically interact to influence maternal behavior in mice [109], and there appear to be multisensory interactions between the rodent olfactory and auditory processing systems [110-112]. Moreover, auditory object-identity processes (i.e., the timbre of resonant sources) are being studied in ferrets [113], as is the distinction in songbirds between the neuronal representations of the bird’s own song and the songs of others [114]. A broader comparative approach will clarify evolutionary relationships and allow harnessing the strengths of different animals as neurobiological models.
Concluding Remarks
By reviewing recent voice- and face-related neurobiological work in nonhuman primates and humans, we suggest here a number of principles that may eventually be extended for modeling the basic neural processes involved in subordinate-level, or identity, perception. The proposed model highlights some possible neural mechanisms and the key areas of uncertainty between the primate and human models. We argue that the next steps in understanding the neurobiology of identity perception will benefit from cross-species comparisons, direct access to local neuronal processes in different ATL subregions and causal manipulation of the sensory inputs into convergence sites. We also need information on effective connectivity and to better establish causal relationships between neuronal processes and identity perception and cognition (Questions Box). All such work will need to involve more than just one sensory modality.
Acknowledgments
This work was supported by the Wellcome Trust (Investigator Award to CIP, WT092606AIA), BBSRC (CIP, BB/J009849/1; CK, BB/L027534/1), the Swiss National Science Foundation (CP, P2SKP3_158691) and NIH (TJA, F32-NS087664). We thank Attila Andics for providing the dog MRI and fMRI images and Kirill Nourski for help with illustrating Figure 4. We also thank P. Belin, T. Griffiths, M. Howard, T. Rinne and K. von Kriegstein for useful discussions and the anonymous reviewers for useful comments.
Glossary Box
- Additive/multiplicative/divisive neuronal responses
Multisensory interactions are measured when the response to combined sensory modalities differs from any of the responses to the different modalities in isolation. Additive responses are modeled as the sum of the individual sensory responses. Multiplicative or divisive responses are non-additive, non-linear multisensory responses.
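These interaction models can be written compactly. In the following, R denotes a mean firing rate, the subscripts indicate the stimulation condition (A: auditory, V: visual, AV: combined), and k is a free scaling constant; the divisive form shown is one common convention rather than a unique definition.

```latex
\begin{align*}
\text{additive:}            \quad & R_{AV} = R_A + R_V \\
\text{multiplicative:}      \quad & R_{AV} = k\,R_A R_V \\
\text{divisive (one form):} \quad & R_{AV} = \frac{R_A}{k + R_V} \\
\text{non-additive:}        \quad & R_{AV} \neq R_A + R_V \quad \text{(super- or subadditive)}
\end{align*}
```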
- Amodal
A transmodal or modality-free representation of an environmental object where input from any one or multiple sensory stream(s) can contribute towards identifying the object. Our definition does not require or imply a symbolic or semantic transformation. This is a type of multisensory representation, but unlike multisensory influences between sensory streams, losing one set of unisensory inputs will not preclude identification of the object by any of the other modalities.
- ATL
Anterior temporal lobe (ATL) structures in and around the temporal pole in both hemispheres of the primate brain. This includes the temporal pole and anterior regions of the supratemporal plane (aSTP), superior temporal gyrus (aSTG), superior temporal sulcus (aSTS), middle temporal gyrus (aMTG; a gyrus present in humans but not monkeys) and the inferior temporal cortex (aIT, which includes the inferior temporal gyrus (ITG) and, in humans, the anterior fusiform gyrus; aFG). Medial aspects of the ATL include anterior parts of the amygdala and entorhinal cortex. Functionally distinct ATL modules can be parcellated based on cytoarchitectonics [120] and the sensory profiles of the afferent input streams and efferent projections to frontal areas [100]. Most temporal pole subregions appear to be more strongly interconnected with specific other ATL subregions, while the polar area, TG [120], is connected to all other areas of the temporal pole [85].
- Beta band oscillations
Brain rhythms that fluctuate in the ~15–30 Hz range.
- Depth electrode recordings
Intracerebral recordings from deep cortical, sulcal and sub-cortical structures below the surface of the brain.
- ECoG
Electrocorticography, also known as intracranial electroencephalography (iEEG), typically refers to intracranial recordings from the surface of the brain, as performed in human epilepsy patients being monitored for invasive localization of their epileptogenic foci.
- Gamma band oscillations
EEG or intracranial recordings can measure rhythmic oscillations thought to reflect the coordinated spiking activity of large groups of neurons. Gamma band oscillations occur above 30 Hz.
- Intracranial recordings
Direct extra-cellular electrical recordings from within the gray matter or the surface of the brain.
- Multisensory convergence
Neurons or brain areas receiving input from multiple sensory pathways, such that their responses are affected by inputs in any of the converging sensory modalities. Multisensory convergence is thought to be the basis for the integration of different sensory inputs into a unified, multisensory object representation, but might differ mechanistically from an amodal representation, as we consider in this article.
- Neuroimaging
Brain imaging approaches measuring hemodynamic responses with functional magnetic resonance imaging (fMRI) or functional near infrared spectroscopy (fNIRS), glucose metabolism with positron emission tomography (PET), or electrical (electroencephalography, EEG) or magnetic activity (magnetoencephalography, MEG) from the surface of the head.
- Phonagnosia
A variant of auditory agnosia where a lesion impairs the ability to perceive or recognize the voice of an individual, often with preserved speech comprehension.
- Prosopagnosia
A deficit where a person’s ability to perceive and recognize faces is impaired although their ability to perceive and recognize other objects may be intact. This can result from damage to the face processing network in the temporal lobe. Prosopagnosia can, but does not necessarily always, dissociate from phonagnosia.
- Selectivity
Measure of the size of the stimulus set evoking responses from a neuron or set of neurons, as an indication of its breadth of tuning. This can range from weakly selective neurons that respond to most of the presented stimuli to highly selective neurons that respond to only a small subset.
- Sensitivity
Measure of the stimulus category that drives a neuron or set of neurons. For instance, a voice-sensitive neuron might respond strongly to different voices, but less to non-voice sounds, and thus would carry information about a ‘voice’ category. A voice-identity sensitive neuron would respond selectively to a subset of the voice category stimuli. An extreme case of identity-sensitivity is the traditional notion of an identity-selective “grandmother cell” that responds exclusively to one particular individual in an all-or-nothing fashion.
- Voice or face content
The sensory features of vocalizations or faces that provide indexical cues on the identity of the individual. For example, a number of acoustical factors (including the vocal filtering of the sound generated by the vocal source in the mammalian larynx) could be used to identify an individual by the voice characteristics of their vocalizations. More generally, voice features are related to the identity (timbre) of resonant sources [113, 121].
References
- 1.Bergman TJ, et al. Hierarchical classification by rank and kinship in baboons. Science. 2003;302:1234–1236. doi: 10.1126/science.1087513. [DOI] [PubMed] [Google Scholar]
- 2.Ghazanfar AA, et al. Vocal-tract resonances as indexical cues in rhesus monkeys. Curr Biol. 2007;17:425–430. doi: 10.1016/j.cub.2007.01.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends Cogn Sci. 2006;10:278–285. doi: 10.1016/j.tics.2006.04.008. [DOI] [PubMed] [Google Scholar]
- 4.Blank H, et al. Person recognition and the brain: Merging evidence from patients and healthy individuals. Neurosci Biobehav Rev. 2014;47:717–734. doi: 10.1016/j.neubiorev.2014.10.022. [DOI] [PubMed] [Google Scholar]
- 5.Belin P, et al. Understanding voice perception. British Journal of Psychology. 2011;102:711–725. doi: 10.1111/j.2044-8295.2011.02041.x. [DOI] [PubMed] [Google Scholar]
- 6.Campanella S, Belin P. Integrating face and voice in person perception. Trends Cogn Sci. 2007;11:535–543. doi: 10.1016/j.tics.2007.10.001. [DOI] [PubMed] [Google Scholar]
- 7.Bruce V, Young A. Understanding face recognition. British journal of psychology. 1986;77:305–327. doi: 10.1111/j.2044-8295.1986.tb02199.x. [DOI] [PubMed] [Google Scholar]
- 8.Haxby JV, et al. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science. 2001;293:2425–2430. doi: 10.1126/science.1063736. [DOI] [PubMed] [Google Scholar]
- 9.Busigny T, et al. Face-specific impairment in holistic perception following focal lesion of the right anterior temporal lobe. Neuropsychologia. 2014;56:312–333. doi: 10.1016/j.neuropsychologia.2014.01.018. [DOI] [PubMed] [Google Scholar]
- 10.Yang H, et al. The Anterior Temporal Face Area Contains Invariant Representations of Face Identity That Can Persist Despite the Loss of Right FFA and OFA. Cerebral Cortex. 2014 doi: 10.1093/cercor/bhu289. bhu289. [DOI] [PubMed] [Google Scholar]
- 11.Collins JA, Olson IR. Beyond the FFA: The role of the ventral anterior temporal lobes in face processing. Neuropsychologia. 2014;61:65–79. doi: 10.1016/j.neuropsychologia.2014.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Gainotti G. Is the right anterior temporal variant of prosopagnosia a form of ‘associative prosopagnosia’or a form of ‘multimodal person recognition disorder’? Neuropsychology review. 2013;23:99–110. doi: 10.1007/s11065-013-9232-7. [DOI] [PubMed] [Google Scholar]
- 13.Van Lancker DR, Canter GJ. Impairment of voice and face recognition in patients with hemispheric damage. Brain Cogn. 1982;1:185–195. doi: 10.1016/0278-2626(82)90016-1. [DOI] [PubMed] [Google Scholar]
- 14.Mathias SR, von Kriegstein K. How do we recognise who is speaking? Front Biosci (Scholar edition) 2014;6:92. doi: 10.2741/s417. [DOI] [PubMed] [Google Scholar]
- 15.Damasio H, et al. A neural basis for lexical retrieval. Nature. 1996 doi: 10.1038/380499a0. [DOI] [PubMed] [Google Scholar]
- 16.Waldron EJ, et al. The left temporal pole is a heteromodal hub for retrieving proper names. Frontiers in bioscience (Scholar edition) 2014;6:50. doi: 10.2741/s413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Binder JR, et al. Where Is the Semantic System? A Critical Review and Meta-Analysis of 120 Functional Neuroimaging Studies. Cereb Cortex. 2009 doi: 10.1093/cercor/bhp055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Patterson K, et al. Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience. 2007;8:976–987. doi: 10.1038/nrn2277. [DOI] [PubMed] [Google Scholar]
- 19.Ralph MAL. Neurocognitive insights on conceptual knowledge and its breakdown. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2014;369:20120392. doi: 10.1098/rstb.2012.0392. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Hurley RS, et al. Asymmetric Connectivity between the Anterior Temporal Lobe and the Language Network. Journal of cognitive neuroscience. 2015 doi: 10.1162/jocn_a_00722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Bulthoff I, Newell FN. Distinctive voices enhance the visual recognition of unfamiliar faces. Cognition. 2015;137:9–21. doi: 10.1016/j.cognition.2014.12.006. [DOI] [PubMed] [Google Scholar]
- 22.O’Mahony C, Newell FN. Integration of faces and voices, but not faces and names, in person recognition. Br J Psychol. 2012;103:73–82. doi: 10.1111/j.2044-8295.2011.02044.x. [DOI] [PubMed] [Google Scholar]
- 23.Schweinberger SR, Burton AM. Covert recognition and the neural system for face processing. Cortex. 2003;39:9–30. doi: 10.1016/s0010-9452(08)70071-6. [DOI] [PubMed] [Google Scholar]
- 24.Pobric G, et al. Amodal semantic representations depend on both anterior temporal lobes: evidence from repetitive transcranial magnetic stimulation. Neuropsychologia. 2010;48:1336–1342. doi: 10.1016/j.neuropsychologia.2009.12.036. [DOI] [PubMed] [Google Scholar]
- 25.Rogers TT, et al. Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychological review. 2004;111:205. doi: 10.1037/0033-295X.111.1.205. [DOI] [PubMed] [Google Scholar]
- 26.Abel TJ, et al. Direct physiologic evidence of a heteromodal convergence region for proper naming in human left anterior temporal lobe. J Neurosci. 2015;35:1513–1520. doi: 10.1523/JNEUROSCI.3387-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Drane DL, et al. Famous face identification in temporal lobe epilepsy: Support for a multimodal integration model of semantic memory. Cortex. 2013;49:1648–1667. doi: 10.1016/j.cortex.2012.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Perrett DI, et al. Visual neurones responsive to faces in the monkey temporal cortex. Exp Brain Res. 1982;47:329–342. doi: 10.1007/BF00239352. [DOI] [PubMed] [Google Scholar]
- 29.Bruce C, et al. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol. 1981;46:369–384. doi: 10.1152/jn.1981.46.2.369. [DOI] [PubMed] [Google Scholar]
- 30.Kanwisher N, et al. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci. 1997;17:4302–4311. doi: 10.1523/JNEUROSCI.17-11-04302.1997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Sergent J, et al. Functional neuroanatomy of face and object processing. A positron emission tomography study. Brain. 1992;115(Pt 1):15–36. doi: 10.1093/brain/115.1.15. [DOI] [PubMed] [Google Scholar]
- 32.Freiwald WA, Tsao DY. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 2010;330:845–851. doi: 10.1126/science.1194908. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ku SP, et al. fMRI of the face-processing network in the ventral temporal lobe of awake and anesthetized macaques. Neuron. 2011;70:352–362. doi: 10.1016/j.neuron.2011.02.048. [DOI] [PubMed] [Google Scholar]
- 34.Logothetis NK, et al. Functional imaging of the monkey brain. Nat Neurosci. 1999;2:555–562. doi: 10.1038/9210. [DOI] [PubMed] [Google Scholar]
- 35.Tsao DY, et al. A cortical region consisting entirely of face-selective cells. Science. 2006;311:670–674. doi: 10.1126/science.1119983. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Tsao DY, Livingstone MS. Mechanisms of face perception. Annu Rev Neurosci. 2008;31:411–437. doi: 10.1146/annurev.neuro.30.051606.094238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Chechik G, Nelken I. Auditory abstraction from spectro-temporal features to coding auditory entities. Proc Natl Acad Sci U S A. 2012;109:18968–18973. doi: 10.1073/pnas.1111242109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Kikuchi Y, et al. Hierarchical auditory processing directed rostrally along the monkey’s supratemporal plane. The Journal of Neuroscience. 2010;30:13021–13030. doi: 10.1523/JNEUROSCI.2267-10.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Bizley JK, et al. Auditory cortex represents both pitch judgments and the corresponding acoustic cues. Curr Biol. 2013;23:620–625. doi: 10.1016/j.cub.2013.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Romanski LM, et al. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. J Neurophysiol. 2005;93:734–747. doi: 10.1152/jn.00675.2004. [DOI] [PubMed] [Google Scholar]
- 41.Rauschecker JP, et al. Processing of complex sounds in the macaque nonprimary auditory cortex. Science. 1995;268:111–114. doi: 10.1126/science.7701330. [DOI] [PubMed] [Google Scholar]
- 42.Kajikawa Y, et al. Auditory properties in the parabelt regions of the superior temporal gyrus in the awake macaque monkey: an initial survey. J Neurosci. 2015;35:4140–4150. doi: 10.1523/JNEUROSCI.3556-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Wang X, Kadia SC. Differential representation of species-specific primate vocalizations in the auditory cortices of marmoset and cat. J Neurophysiol. 2001;86:2616–2620. doi: 10.1152/jn.2001.86.5.2616. [DOI] [PubMed] [Google Scholar]
- 44.Recanzone GH. Representation of con-specific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. J Neurosci. 2008;28:13184–13193. doi: 10.1523/JNEUROSCI.3619-08.2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Russ BE, et al. Coding of auditory-stimulus identity in the auditory non-spatial processing stream. J Neurophysiol. 2008;99:87–95. doi: 10.1152/jn.01069.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Poremba A, et al. Species-specific calls evoke asymmetric activity in the monkey’s temporal poles. Nature. 2004;427:448–451. doi: 10.1038/nature02268. [DOI] [PubMed] [Google Scholar]
- 47.Remedios R, et al. An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. J Neurosci. 2009;29:1034–1045. doi: 10.1523/JNEUROSCI.4089-08.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Gifford GW, M. K, Hauser MD, Cohen YE. The neurophysiology of functionally meaningful categories: macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. J Cogn Neurosci. 2005;17:1471–1482. doi: 10.1162/0898929054985464. [DOI] [PubMed] [Google Scholar]
- 49. Plakke B, Romanski LM. Auditory connections and functions of prefrontal cortex. Front Neurosci. 2014;8:199. doi: 10.3389/fnins.2014.00199.
- 50. Belin P, et al. Voice-selective areas in human auditory cortex. Nature. 2000;403:309–312. doi: 10.1038/35002078.
- 51. Formisano E, et al. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science. 2008;322:970–973. doi: 10.1126/science.1164318.
- 52. Petkov CI, et al. A voice region in the monkey brain. Nat Neurosci. 2008;11:367–374. doi: 10.1038/nn2043.
- 53. Andics A, et al. Voice-sensitive regions in the dog and human brain are revealed by comparative fMRI. Curr Biol. 2014;24:574–578. doi: 10.1016/j.cub.2014.01.058.
- 54. Tsao DY, et al. Comparing face patch systems in macaques and humans. Proc Natl Acad Sci U S A. 2008;105:19514–19519. doi: 10.1073/pnas.0809662105.
- 55. Kriegeskorte N, et al. Individual faces elicit distinct response patterns in human anterior temporal cortex. Proc Natl Acad Sci U S A. 2007;104:20600–20605. doi: 10.1073/pnas.0705654104.
- 56. Morin EL, et al. Hierarchical encoding of social cues in primate inferior temporal cortex. Cereb Cortex. 2014. doi: 10.1093/cercor/bhu099.
- 57. Andics A, et al. Neural mechanisms for voice recognition. Neuroimage. 2010;52:1528–1540. doi: 10.1016/j.neuroimage.2010.05.048.
- 58. von Kriegstein K, et al. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Res Cogn Brain Res. 2003;17:48–55. doi: 10.1016/s0926-6410(03)00079-x.
- 59. Chandrasekaran B, et al. Neural processing of what and who information in speech. J Cogn Neurosci. 2011;23:2690–2700. doi: 10.1162/jocn.2011.21631.
- 60. Belin P, Zatorre RJ. Adaptation to speaker’s voice in right anterior temporal lobe. Neuroreport. 2003;14:2105–2109. doi: 10.1097/00001756-200311140-00019.
- 61. Blasi A, et al. Early specialization for voice and emotion processing in the infant brain. Curr Biol. 2011;21:1220–1224. doi: 10.1016/j.cub.2011.06.009.
- 62. Grossmann T, et al. The developmental origins of voice processing in the human brain. Neuron. 2010;65:852–858. doi: 10.1016/j.neuron.2010.03.001.
- 63. Perrodin C, et al. Voice cells in the primate temporal lobe. Curr Biol. 2011;21:1408–1415. doi: 10.1016/j.cub.2011.07.028.
- 64. Perrodin C, et al. Auditory and visual modulation of temporal lobe neurons in voice-sensitive and association cortices. J Neurosci. 2014;34:2524–2537. doi: 10.1523/JNEUROSCI.2805-13.2014.
- 65. Hasselmo ME, et al. The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behav Brain Res. 1989;32:203–218. doi: 10.1016/s0166-4328(89)80054-3.
- 66. Stein BE, Stanford TR. Multisensory integration: current issues from the perspective of the single neuron. Nat Rev Neurosci. 2008;9:255–266. doi: 10.1038/nrn2331.
- 67. Bizley JK, et al. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cereb Cortex. 2007;17:2172–2189. doi: 10.1093/cercor/bhl128.
- 68. Werner S, Noppeney U. Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J Neurosci. 2010;30:2662–2675. doi: 10.1523/JNEUROSCI.5091-09.2010.
- 69. Sugihara T, et al. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. J Neurosci. 2006;26:11138–11147. doi: 10.1523/JNEUROSCI.3550-06.2006.
- 70. Ghazanfar AA, et al. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J Neurosci. 2005;25:5004–5012. doi: 10.1523/JNEUROSCI.0799-05.2005.
- 71. Perrodin C, et al. Natural asynchronies in audiovisual communication signals regulate neuronal multisensory interactions in voice-sensitive cortex. Proc Natl Acad Sci U S A. 2015;112:273–278. doi: 10.1073/pnas.1412817112.
- 72. Chandrasekaran C, Ghazanfar AA. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. J Neurophysiol. 2009;101:773–788. doi: 10.1152/jn.90843.2008.
- 73. Ghazanfar AA, et al. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. J Neurosci. 2008;28:4457–4469. doi: 10.1523/JNEUROSCI.0541-08.2008.
- 74. Dahl CD, et al. Spatial organization of multisensory responses in temporal association cortex. J Neurosci. 2009;29:11924–11932. doi: 10.1523/JNEUROSCI.3437-09.2009.
- 75. Ghazanfar AA, Takahashi DY. The evolution of speech: vision, rhythm, cooperation. Trends Cogn Sci. 2014;18:543–553. doi: 10.1016/j.tics.2014.06.004.
- 76. Watson R, et al. People-selectivity, audiovisual integration and heteromodality in the superior temporal sulcus. Cortex. 2014;50:125–136. doi: 10.1016/j.cortex.2013.07.011.
- 77. Beauchamp MS, et al. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat Neurosci. 2004;7:1190–1192. doi: 10.1038/nn1333.
- 78. Bastos AM, et al. Visual areas exert feedforward and feedback influences through distinct frequency channels. Neuron. 2015;85:390–401. doi: 10.1016/j.neuron.2014.12.018.
- 79. Chandrasekaran C, et al. Dynamic faces speed up the onset of auditory cortical spiking responses during vocal detection. Proc Natl Acad Sci U S A. 2013;110:E4668–4677. doi: 10.1073/pnas.1312518110.
- 80. Turesson HK, et al. Category-selective phase coding in the superior temporal sulcus. Proc Natl Acad Sci U S A. 2012;109:19438–19443. doi: 10.1073/pnas.1217012109.
- 81. Petkov CI, et al. Different forms of effective connectivity in primate frontotemporal pathways. Nat Commun. 2015;6. doi: 10.1038/ncomms7000.
- 82. Matsui T, et al. Direct comparison of spontaneous functional connectivity and effective connectivity measured by intracortical microstimulation: an fMRI study in macaque monkeys. Cereb Cortex. 2011;21:2348–2356. doi: 10.1093/cercor/bhr019.
- 83. Saleem KS, et al. Complementary circuits connecting the orbital and medial prefrontal networks with the temporal, insular, and opercular cortex in the macaque monkey. J Comp Neurol. 2008;506:659–693. doi: 10.1002/cne.21577.
- 84. Frey S, et al. Orbitofrontal contribution to auditory encoding. Neuroimage. 2004;22:1384–1389. doi: 10.1016/j.neuroimage.2004.03.018.
- 85. Pascual B, et al. Large-scale brain networks of the human left temporal pole: a functional connectivity MRI study. Cereb Cortex. 2015;25:680–702. doi: 10.1093/cercor/bht260.
- 86. Binney RJ, et al. Convergent connectivity and graded specialization in the rostral human temporal lobe as revealed by diffusion-weighted imaging probabilistic tractography. J Cogn Neurosci. 2012;24:1998–2014. doi: 10.1162/jocn_a_00263.
- 87. Pernet CR, et al. The human voice areas: spatial organization and inter-individual variability in temporal and extra-temporal cortices. Neuroimage. In press. doi: 10.1016/j.neuroimage.2015.06.050.
- 88. von Kriegstein K, et al. Interaction of face and voice areas during speaker recognition. J Cogn Neurosci. 2005;17:367–376. doi: 10.1162/0898929053279577.
- 89. von Kriegstein K, Giraud AL. Implicit multisensory associations influence voice recognition. PLoS Biol. 2006;4:e326. doi: 10.1371/journal.pbio.0040326.
- 90. Schall S, et al. Early auditory sensory processing of voices is facilitated by visual mechanisms. Neuroimage. 2013;77:237–245. doi: 10.1016/j.neuroimage.2013.03.043.
- 91. Blank H, et al. Direct structural connections between voice- and face-recognition areas. J Neurosci. 2011;31:12906–12915. doi: 10.1523/JNEUROSCI.2091-11.2011.
- 92. Chan AM, et al. First-pass selectivity for semantic categories in human anteroventral temporal lobe. J Neurosci. 2011;31:18119–18129. doi: 10.1523/JNEUROSCI.3122-11.2011.
- 93. Deen B, et al. Functional organization of social perception and cognition in the superior temporal sulcus. Cereb Cortex. 2015. doi: 10.1093/cercor/bhv111.
- 94. Quiroga RQ, et al. Invariant visual representation by single neurons in the human brain. Nature. 2005;435:1102–1107. doi: 10.1038/nature03687.
- 95. Cervenka MC, et al. Electrocorticographic functional mapping identifies human cortex critical for auditory and visual naming. Neuroimage. 2013;69:267–276. doi: 10.1016/j.neuroimage.2012.12.037.
- 96. Abel TJ, et al. Mapping the temporal pole with a specialized electrode array: technique and preliminary results. Physiol Meas. 2014;35:323–337. doi: 10.1088/0967-3334/35/3/323.
- 97. Afraz S-R, et al. Microstimulation of inferotemporal cortex influences face categorization. Nature. 2006;442:692–695. doi: 10.1038/nature04982.
- 98. Bestelmeyer PE, et al. Right temporal TMS impairs voice detection. Curr Biol. 2011;21:R838–839. doi: 10.1016/j.cub.2011.08.046.
- 99. Parvizi J, et al. Electrical stimulation of human fusiform face-selective regions distorts face perception. J Neurosci. 2012;32:14915–14920. doi: 10.1523/JNEUROSCI.2609-12.2012.
- 100. Fan L, et al. Connectivity-based parcellation of the human temporal pole using diffusion tensor imaging. Cereb Cortex. 2014;24:3365–3378. doi: 10.1093/cercor/bht196.
- 101. Munoz-Lopez MM, et al. Anatomical pathways for auditory memory in primates. Front Neuroanat. 2010;4:129. doi: 10.3389/fnana.2010.00129.
- 102. Fritz J, et al. In search of an auditory engram. Proc Natl Acad Sci U S A. 2005;102:9359–9364. doi: 10.1073/pnas.0503998102.
- 103. Duncan J. The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends Cogn Sci. 2010;14:172–179. doi: 10.1016/j.tics.2010.01.004.
- 104. Stoewer S, et al. Frontoparietal activity with minimal decision and control in the awake macaque at 7 T. Magn Reson Imaging. 2010;28:1120–1128. doi: 10.1016/j.mri.2009.12.024.
- 105. Brennan PA. The nose knows who’s who: chemosensory individuality and mate recognition in mice. Horm Behav. 2004;46:231–240. doi: 10.1016/j.yhbeh.2004.01.010.
- 106. Kadohisa M, Wilson DA. Separate encoding of identity and similarity of complex familiar odors in piriform cortex. Proc Natl Acad Sci U S A. 2006;103:15206–15211. doi: 10.1073/pnas.0604313103.
- 107. Gire DH, et al. Information for decision-making and stimulus identification is multiplexed in sensory cortex. Nat Neurosci. 2013;16:991–993. doi: 10.1038/nn.3432.
- 108. Petrulis A, et al. Neural correlates of social odor recognition and the representation of individual distinctive social odors within entorhinal cortex and ventral subiculum. Neuroscience. 2005;130:259–274. doi: 10.1016/j.neuroscience.2004.09.001.
- 109. Okabe S, et al. Pup odor and ultrasonic vocalizations synergistically stimulate maternal attention in mice. Behav Neurosci. 2013;127:432–438. doi: 10.1037/a0032395.
- 110. Budinger E, et al. Multisensory processing via early cortical stages: connections of the primary auditory cortical field with other sensory systems. Neuroscience. 2006;143:1065–1083. doi: 10.1016/j.neuroscience.2006.08.035.
- 111. Cohen L, et al. Multisensory integration of natural odors and sounds in the auditory cortex. Neuron. 2011;72:357–369. doi: 10.1016/j.neuron.2011.08.019.
- 112. Varga AG, Wesson DW. Distributed auditory sensory input within the mouse olfactory cortex. Eur J Neurosci. 2013;37:564–571. doi: 10.1111/ejn.12063.
- 113. Bizley JK, Cohen YE. The what, where and how of auditory-object perception. Nat Rev Neurosci. 2013;14:693–707. doi: 10.1038/nrn3565.
- 114. Poirier C, et al. Own-song recognition in the songbird auditory pathway: selectivity and lateralization. J Neurosci. 2009;29:2252–2258. doi: 10.1523/JNEUROSCI.4650-08.2009.
- 115. Latinus M, et al. Norm-based coding of voice identity in human auditory cortex. Curr Biol. 2013. doi: 10.1016/j.cub.2013.04.055.
- 116. Joly O, et al. Interhemispheric differences in auditory processing revealed by fMRI in awake rhesus monkeys. Cereb Cortex. 2011. doi: 10.1093/cercor/bhr150.
- 117. Gil-da-Costa R, et al. Species-specific calls activate homologs of Broca’s and Wernicke’s areas in the macaque. Nat Neurosci. 2006;9:1064–1070. doi: 10.1038/nn1741.
- 118. Chandrasekaran C, et al. Monkeys and humans share a common computation for face/voice integration. PLoS Comput Biol. 2011;7:e1002165. doi: 10.1371/journal.pcbi.1002165.
- 119. Lakatos P, et al. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron. 2007;53:279–292. doi: 10.1016/j.neuron.2006.12.011.
- 120. Ding SL, et al. Parcellation of human temporal polar cortex: a combined analysis of multiple cytoarchitectonic, chemoarchitectonic, and pathological markers. J Comp Neurol. 2009;514:595–623. doi: 10.1002/cne.22053.
- 121. von Kriegstein K, et al. Neural representation of auditory size in the human voice and in sounds from other resonant sources. Curr Biol. 2007;17:1123–1128. doi: 10.1016/j.cub.2007.05.061.